100% active time on some disks - Windows Mar 8 and Mar 11 releases

I want to report that a high percentage of my disks (about 30-40%, across 16 PCs) show constantly high active time.

I do NOT blame Subspace, but I have had 2 Samsung 980 Pros go faulty (disk not recognized/not ready) while farming Subspace (they were brand new when I bought them, and they are used only to farm Subspace). On both, after some time of farming, the disk becomes inaccessible; after a reboot, it is recognized again.

If, and only if, Subspace is what breaks the disks, could it be because the controller is constantly at high active time, generating heat and making the chip unstable? It is just my theory, but we do not expect such high active time from the Subspace audit process, do we?

It is important to note that I did not see this 100% active time on the Feb 19 release.


As mentioned in the Mar 4 release notes, this happens once because Windows decides to physically write the whole file to disk when a write is triggered at the end of the file. This does not happen on Linux or macOS; I believe it is a bug, and I am not aware of an existing solution: filesystems - How to pre-allocate file on Windows (NTFS) without writing the whole file - Stack Overflow

The alternative is sparse file allocation, but then it would not be possible to guarantee that there is enough space on disk, and the farmer might crash at runtime when it runs out of space.

This happens only once; after the file write is complete, it will not happen again on restarts.
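For anyone curious, here is a minimal sketch of the pattern being described, in Rust. The function name and flow are illustrative assumptions, not the farmer's actual code:

```rust
use std::fs::OpenOptions;
use std::io::{Seek, SeekFrom, Write};

// Illustrative pre-allocation of a plot file. `set_len` reserves the full
// size instantly on every platform, but the write at the end is where NTFS
// differs: Windows zero-fills everything between the old "valid data length"
// and the write offset, i.e. it physically writes the whole file once.
// ext4/APFS just update metadata instead.
fn preallocate(path: &str, size: u64) -> std::io::Result<()> {
    let mut file = OpenOptions::new()
        .read(true)
        .write(true)
        .create(true)
        .open(path)?;
    file.set_len(size)?;
    file.seek(SeekFrom::Start(size - 1))?;
    file.write_all(&[0])?;
    Ok(())
}
```

Marking the file sparse (FSCTL_SET_SPARSE on NTFS) would avoid the zero-fill, which is exactly the trade-off mentioned above: the space would no longer be guaranteed to exist when the farmer needs it.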


I agree the screenshot above shows writes. But look below: no writes, only reads from auditing, yet active time is 100%.

Do you think 100% active time on the SSD controller, running 24/7 for several weeks, can cause the disk to fail soon? As I shared, 2 of my brand-new 980 Pros have gone faulty. They are used only to farm Subspace; I don't know whether that is a coincidence.

My big concern is: can constant auditing, which keeps the SSD controller busy 24/7 for weeks or months, cause it to fail? Why can't we move the audit to RAM, or change the protocol to make auditing light enough to be moved to RAM?

I really doubt the SSD controller is designed to be constantly active at 20%, 30%, let alone the 100% in the screenshot below. The Subspace protocol makes the disks do this job; I think Subspace is the first to do so.

Can you think seriously about this? We are still on testnet; we can still change things. If this really is a harmful factor, please eliminate it and do not bring it to mainnet.

So far the discussion has been about write activity wearing out the memory chips, but my concern is not the memory chips; it is the controller, which Subspace makes work non-stop 24/7 for weeks or months.

Hello Nazar, we are not talking about the Windows issue of allocating all the plot space once at startup, but about disk activity being higher while simply farming on the latest releases. I plotted 4 TB of a fresh 8 TB drive (on 0.1.6 and 0.1.7); it took 6 days, and disk activity rose gradually from 0% to 81% by the time the plot was completed. This resulted in audit times that were too high, as you can see in the pictures. Another strange thing: when it completed the plot and switched to farming only, audit times went from 0.81 s to 1.08 s.

I then tried the same already-completed plot on the 0.1.5 release, and it uses only 15% disk activity, with very good auditing and proving performance, as you can see in the following pics:

This is what I get on release 0.1.6 and above:

This is what I get on 0.1.5 with the exact same drive:

I don't think so. CPU/controller degradation when running within spec is a very unlikely event. 100% utilization can be the result of throttling, which can happen when the SSD controller overheats.

How would you put terabytes of on-disk space magically into very limited RAM? It makes no sense to me.
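For a rough sense of scale (a back-of-envelope calculation using the 4 TB plot and the tens of gigabytes of RAM mentioned elsewhere in this thread): 32 GB / 4096 GB ≈ 0.8%, so RAM could hold well under 1% of the data that auditing has to cover.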

Did Samsung confirm they are faulty and replace them?

This is to be expected. In older releases, Windows was caching things for its own purposes, which likely helped reduce reads a bit, but you paid for it with all the RAM you have. We have now disabled caching and do in fact read from disk every time. If, for your capacity and/or SSD, that means 81% utilization, then that is how much the drive needs to work to read the data. QLC memory is the slowest kind right now, and Samsung's QVO series is optimized for cost and non-demanding applications, so high utilization is not surprising at all, especially on the 8 TB variant, where the controller is probably stressed the most.
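As an illustration of "read from disk every time", this is roughly how OS caching can be bypassed when opening a file on Windows. A hedged sketch: the constant comes from the Windows SDK docs, and the function is not necessarily what the farmer does internally:

```rust
use std::fs::{File, OpenOptions};
use std::os::windows::fs::OpenOptionsExt;

// Value of FILE_FLAG_NO_BUFFERING from the Windows SDK (CreateFileW docs).
const FILE_FLAG_NO_BUFFERING: u32 = 0x2000_0000;

// Every read on a file opened this way goes to the physical disk instead of
// the OS page cache. Caveat: offsets, lengths, and buffer addresses must all
// be sector-aligned when this flag is used.
fn open_uncached(path: &str) -> std::io::Result<File> {
    OpenOptions::new()
        .read(true)
        .custom_flags(FILE_FLAG_NO_BUFFERING)
        .open(path)
}
```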

I guess that is what 33.4 GB of extra RAM gives you. I am still looking for more optimizations. I can introduce an option to bring back the old mode of operation with its high memory usage if some people prefer it, but I'm reluctant to make it the default.

I hope there is a way to get back the higher performance without using many gigabytes of RAM, but I have not found one just yet.

I think that extra RAM usage might be something like a memory leak, because it creeps up over the hours, yet performance is better from the first minutes after starting the farm.

Also, the memory attributed specifically to Space Acres in Task Manager stays constant over time and is quite low; the value you see rising is the total system memory in use. It is as if it uses additional memory to improve things but never frees it when it is no longer needed.

Your assumption is logical and reasonable, but not correct. See Windows is leaking memory when reading random chunks of a large file - Microsoft Q&A for some of the details.
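For reference, the pattern from that Q&A can be reproduced with something as simple as the sketch below (the file name is a placeholder, and a file much larger than RAM is assumed): the process's own working set stays flat, while total system memory use keeps growing.

```rust
use std::fs::File;
use std::os::windows::fs::FileExt;

fn main() -> std::io::Result<()> {
    let file = File::open("plot.bin")?; // placeholder path to a huge file
    let len = file.metadata()?.len();
    let mut buf = vec![0u8; 64 * 1024];
    let range = len - buf.len() as u64;
    let mut offset = 1u64;
    // Crude linear-congruential walk so no external crates are needed; the
    // point is only that reads land at effectively random offsets.
    for _ in 0..10_000_000u64 {
        offset = offset
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407)
            % range;
        file.seek_read(&mut buf, offset)?;
    }
    Ok(())
}
```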