Fake display of high RAM usage or RAM leak on Windows by Subspace farmer

You can try using cacheset to easily control memory
It simply notifies the system to release the memory used by MMF

read_at → read_exact_at → FileExt::seek_read → NtReadFile
However NtReadFile operations may also be initiated by MMF operations from inside the kernel
Since they can be released correctly by the system, they are not leaked memory.

I think I reproduced the issue of not being able to prove in time with a custom farmer. Basically the volume of random reads increases, but it doesn’t really make as much sense to me because performance-wise it should be the same. Will be investigating, but at least I have a way to reproduce it now. Thanks everyone for testing!

Unfortunately, looks like yet another bug in Windows :confused: : GetFileInformationByHandle followed by read with FILE_FLAG_NO_BUFFERING is extremely slow - Microsoft Q&A

Audits and proving are MUCH faster before checking for file size is done and slows down 10x (no wonder it can’t prove in time anymore) afterwards.

There is a chance I’ll be able to work around this too, but with every day I hate Windows a bit more.

Here is a new test build cooking that should address reward misses: Snapshot build · subspace/subspace@a01dbf1 · GitHub
There is also a new warning in case new farm was just created or resized, in those cases farmer restart is necessary to not lose farming rewards because Windows :face_with_symbols_over_mouth: (see link in the previous post for details).

There are some non-farming memory usage optimizations to apply later, but I want to make sure that this part works properly first.

Roger that. I’ll report after few hours.

I don’t get this part. Normally we only add/remove plot after farmer restart (close and then start again), is it sufficient OR we need another restart after restart?

If you create new farm (in the directory that was empty before) or resize existing farm (to bigger or smaller size) then you need to start farmer, let it initialize everything, stop and start again. You will see warning in logs if/when this is necessary.

I ran for more than 30 minutes, and missed 4/4 rewards. I ran with all default options from the same machine.

1 Like

Okay, that is something I can’t reproduce on my end so far.

Here is another test build with a bit more logging that should show how much time it takes to read stuff from disk for proving purposes: Snapshot build · subspace/subspace@6c78ad0 · GitHub

If possible:

  1. Pick one disk
  2. Run proving benchmark on that disk with above build
  3. Run with defaults and wait for the first proving with that build on that disk and provide logs containing a few minutes before and a few minutes after proving

NOTE: I don’t care about benchmarks from multiple disks or other builds, I don’t care about audit or other benchmarks, etc. I only need the information requested in above steps and nothing else.

benchmark prove, will do farming test and update later

E:\chia>subspace-farmer-windows-x86_64-skylake-unbuffered-farming-io-windows-wip.exe benchmark prove g:/sub
Benchmarking prove/plot/rayon/unbuffered
Benchmarking prove/plot/rayon/unbuffered: Warming up for 3.0000 s
Read record chunks in 478.1553ms: 24/None
Read record chunks in 480.8773ms: 24/None
Read record chunks in 482.3749ms: 24/None
Read record chunks in 482.9843ms: 24/None
Read record chunks in 470.6943ms: 24/None
Read record chunks in 483.5461ms: 24/None
Read record chunks in 482.0402ms: 24/None

Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 7.7s.
Benchmarking prove/plot/rayon/unbuffered: Collecting 10 samples in estimated 7.6691 s (10 iterations)
Read record chunks in 484.4155ms: 24/None
Read record chunks in 480.9888ms: 24/None
Read record chunks in 474.2391ms: 24/None
Read record chunks in 484.9637ms: 24/None
Read record chunks in 476.5842ms: 24/None
Read record chunks in 483.2739ms: 24/None
Read record chunks in 472.9133ms: 24/None
Read record chunks in 411.2574ms: 24/None
Read record chunks in 483.9787ms: 24/None
Read record chunks in 474.5075ms: 24/None
Benchmarking prove/plot/rayon/unbuffered: Analyzing
prove/plot/rayon/unbuffered
                        time:   [742.32 ms 756.80 ms 766.27 ms]
                        change: [+49.532% +57.311% +65.250%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) low severe
Benchmarking prove/plot/rayon/regular
Benchmarking prove/plot/rayon/regular: Warming up for 3.0000 s
Read record chunks in 481.967ms: 24/None
Read record chunks in 483.1066ms: 24/None
Read record chunks in 482.7118ms: 24/None
Read record chunks in 481.0941ms: 24/None
Read record chunks in 482.7405ms: 24/None
Read record chunks in 484.3169ms: 24/None
Read record chunks in 469.9897ms: 24/None

Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 7.7s.
Benchmarking prove/plot/rayon/regular: Collecting 10 samples in estimated 7.6634 s (10 iterations)
Read record chunks in 483.8392ms: 24/None
Read record chunks in 480.04ms: 24/None
Read record chunks in 487.8269ms: 24/None
Read record chunks in 480.6634ms: 24/None
Read record chunks in 485.8754ms: 24/None
Read record chunks in 481.0021ms: 24/None
Read record chunks in 485.8655ms: 24/None
Read record chunks in 482.9795ms: 24/None
Read record chunks in 415.5774ms: 24/None
Read record chunks in 434.7999ms: 24/None
Benchmarking prove/plot/rayon/regular: Analyzing
prove/plot/rayon/regular
                        time:   [741.63 ms 757.00 ms 768.75 ms]
                        change: [+49.294% +57.314% +65.385%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 10 measurements (20.00%)
  2 (20.00%) low severe

While the cash is being rebuild one interesting fact, utilization controller is about 57% and speed of read is 55MB/s, with all IO version utilization is 3-5x which mean drive when full plotted would be limited in speed to max ~100MB/s (as its 8TB 100MB/s would be slow) and its on sata3 and should not have issue to use full sata3 speed if needed

You asked to pick up 1 disk, so I picked up 980 Pro and run the prove benchmark for it. Below is result

I’ve been waiting pretty long but since this single disk is only 2TB and perhaps only half of it has been plotted, so I will provide the log later after my sleep.

PS C:\Subspace Farmer> .\subspace-farmer3h_5Mar_2 benchmark prove I:\1
2024-03-05T17:03:08.964913Z  WARN subspace_farmer::single_disk_farm::farming: Failed to turn solution candidates into solutions error=Failed to decode sector contents map: Checksum mismatch sector_index=1367
2024-03-05T17:03:08.971559Z  WARN subspace_farmer::single_disk_farm::farming: Failed to turn solution candidates into solutions error=Failed to decode sector contents map: Checksum mismatch sector_index=1368
2024-03-05T17:03:08.976124Z  WARN subspace_farmer::single_disk_farm::farming: Failed to turn solution candidates into solutions error=Failed to decode sector contents map: Checksum mismatch sector_index=1369
2024-03-05T17:03:08.980551Z  WARN subspace_farmer::single_disk_farm::farming: Failed to turn solution candidates into solutions error=Failed to decode sector contents map: Checksum mismatch sector_index=1370
2024-03-05T17:03:08.984999Z  WARN subspace_farmer::single_disk_farm::farming: Failed to turn solution candidates into solutions error=Failed to decode sector contents map: Checksum mismatch sector_index=1371
2024-03-05T17:03:08.989367Z  WARN subspace_farmer::single_disk_farm::farming: Failed to turn solution candidates into solutions error=Failed to decode sector contents map: Checksum mismatch sector_index=1372
2024-03-05T17:03:08.993658Z  WARN subspace_farmer::single_disk_farm::farming: Failed to turn solution candidates into solutions error=Failed to decode sector contents map: Checksum mismatch sector_index=1373
2024-03-05T17:03:08.998113Z  WARN subspace_farmer::single_disk_farm::farming: Failed to turn solution candidates into solutions error=Failed to decode sector contents map: Checksum mismatch sector_index=1374
2024-03-05T17:03:09.002482Z  WARN subspace_farmer::single_disk_farm::farming: Failed to turn solution candidates into solutions error=Failed to decode sector contents map: Checksum mismatch sector_index=1375
2024-03-05T17:03:09.007080Z  WARN subspace_farmer::single_disk_farm::farming: Failed to turn solution candidates into solutions error=Failed to decode sector contents map: Checksum mismatch sector_index=1376
Benchmarking prove/plot/rayon/unbuffered
Benchmarking prove/plot/rayon/unbuffered: Warming up for 3.0000 s
Read record chunks in 139.9117ms: 20/None
Read record chunks in 133.9035ms: 20/None
Read record chunks in 183.7562ms: 20/None
Read record chunks in 181.8184ms: 20/None
Read record chunks in 182.3581ms: 20/None
Read record chunks in 178.9505ms: 20/None
Read record chunks in 184.1901ms: 20/None

Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 5.6s.
Benchmarking prove/plot/rayon/unbuffered: Collecting 10 samples in estimated 5.5935 s (10 iterations)
Read record chunks in 186.0112ms: 20/None
Read record chunks in 173.6635ms: 20/None
Read record chunks in 175.872ms: 20/None
Read record chunks in 180.0751ms: 20/None
Read record chunks in 177.268ms: 20/None
Read record chunks in 172.7944ms: 20/None
Read record chunks in 184.9057ms: 20/None
Read record chunks in 179.549ms: 20/None
Read record chunks in 181.7635ms: 20/None
Read record chunks in 182.5923ms: 20/None
Benchmarking prove/plot/rayon/unbuffered: Analyzing
prove/plot/rayon/unbuffered
                        time:   [552.48 ms 556.72 ms 560.44 ms]
                        change: [-26.951% -25.990% -25.060%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) low mild
2024-03-05T17:03:27.827029Z  WARN subspace_farmer::single_disk_farm::farming: Failed to turn solution candidates into solutions error=Failed to decode sector contents map: Checksum mismatch sector_index=1367
2024-03-05T17:03:27.833045Z  WARN subspace_farmer::single_disk_farm::farming: Failed to turn solution candidates into solutions error=Failed to decode sector contents map: Checksum mismatch sector_index=1368
2024-03-05T17:03:27.838123Z  WARN subspace_farmer::single_disk_farm::farming: Failed to turn solution candidates into solutions error=Failed to decode sector contents map: Checksum mismatch sector_index=1369
2024-03-05T17:03:27.843643Z  WARN subspace_farmer::single_disk_farm::farming: Failed to turn solution candidates into solutions error=Failed to decode sector contents map: Checksum mismatch sector_index=1370
2024-03-05T17:03:27.850086Z  WARN subspace_farmer::single_disk_farm::farming: Failed to turn solution candidates into solutions error=Failed to decode sector contents map: Checksum mismatch sector_index=1371
2024-03-05T17:03:27.855297Z  WARN subspace_farmer::single_disk_farm::farming: Failed to turn solution candidates into solutions error=Failed to decode sector contents map: Checksum mismatch sector_index=1372
2024-03-05T17:03:27.861055Z  WARN subspace_farmer::single_disk_farm::farming: Failed to turn solution candidates into solutions error=Failed to decode sector contents map: Checksum mismatch sector_index=1373
2024-03-05T17:03:27.866333Z  WARN subspace_farmer::single_disk_farm::farming: Failed to turn solution candidates into solutions error=Failed to decode sector contents map: Checksum mismatch sector_index=1374
2024-03-05T17:03:27.871395Z  WARN subspace_farmer::single_disk_farm::farming: Failed to turn solution candidates into solutions error=Failed to decode sector contents map: Checksum mismatch sector_index=1375
2024-03-05T17:03:27.876122Z  WARN subspace_farmer::single_disk_farm::farming: Failed to turn solution candidates into solutions error=Failed to decode sector contents map: Checksum mismatch sector_index=1376
Benchmarking prove/plot/rayon/regular
Benchmarking prove/plot/rayon/regular: Warming up for 3.0000 s
Read record chunks in 158.1121ms: 20/None
Read record chunks in 187.8623ms: 20/None
Read record chunks in 186.784ms: 20/None
Read record chunks in 189.8522ms: 20/None
Read record chunks in 194.8351ms: 20/None
Read record chunks in 187.5705ms: 20/None
Read record chunks in 193.2731ms: 20/None

Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 5.7s.
Benchmarking prove/plot/rayon/regular: Collecting 10 samples in estimated 5.6827 s (10 iterations)
Read record chunks in 199.6511ms: 20/None
Read record chunks in 189.1162ms: 20/None
Read record chunks in 197.7991ms: 20/None
Read record chunks in 196.683ms: 20/None
Read record chunks in 200.0773ms: 20/None
Read record chunks in 197.4874ms: 20/None
Read record chunks in 193.4025ms: 20/None
Read record chunks in 191.4803ms: 20/None
Read record chunks in 197.3983ms: 20/None
Read record chunks in 186.9175ms: 20/None
Benchmarking prove/plot/rayon/regular: Analyzing
prove/plot/rayon/regular
                        time:   [576.75 ms 580.79 ms 585.02 ms]
                        change: [-23.074% -22.230% -21.355%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild

while the disk cash is syncing it skipped one solution, I guess huge SSD controller utilization is issue, will post update when cash sync and SSD use if drop

disk utilization sometimes got for a time (1-2min) to even 98% while the read is same ~57MB/s defo strange

2024-03-05T18:55:02.955786Z  INFO subspace_farmer::farmer_cache: Piece cache sync 61.23% complete
2024-03-05T18:55:37.803697Z  INFO subspace_farmer::farmer_cache: Piece cache sync 61.44% complete
2024-03-05T18:55:49.541225Z  INFO subspace_farmer::farmer_cache: Piece cache sync 61.64% complete
Read record chunks in 7.1208174s: 24/None
2024-03-05T18:56:08.498216Z  WARN {disk_farm_index=0}: subspace_farmer::single_disk_farm::farming: Proving for solution skipped due to farming time limit slot=2807018 sector_index=1853
2024-03-05T18:56:09.923658Z  INFO subspace_farmer::farmer_cache: Piece cache sync 61.85% complete
1 Like

Got it, thanks! There are more things to debug then.

I still have 60% and sometimes 98% 8TB disk utilization with 38% plotted, I guess IO used do not work with windows like it should as its utilizing controller more then needed, with ~60MB/s read for farming 3TB utilization should be 10-14% as max speed of drive and controller is 500-600MB/s on sata3
and one more skipped solution after ~4h

2024-03-05T21:49:53.656240Z  INFO {disk_farm_index=0}: subspace_farmer::single_disk_farm::plotting: Plotting sector (37.16% complete) sector_index=2781
2024-03-05T21:51:43.401398Z  INFO {disk_farm_index=0}: subspace_farmer::single_disk_farm::plotting: Plotting sector (37.18% complete) sector_index=2782
2024-03-05T21:53:30.758390Z  INFO {disk_farm_index=0}: subspace_farmer::single_disk_farm::plotting: Plotting sector (37.19% complete) sector_index=2783
Read record chunks in 7.1562488s: 24/None
2024-03-05T21:54:14.249869Z  WARN {disk_farm_index=0}: subspace_farmer::single_disk_farm::farming: Proving for solution skipped due to farming time limit slot=2817701 sector_index=1502
2024-03-05T21:55:17.765114Z  INFO {disk_farm_index=0}: subspace_farmer::single_disk_farm::plotting: Plotting sector (37.20% complete) sector_index=2784
2024-03-05T21:57:05.512514Z  INFO {disk_farm_index=0}: subspace_farmer::single_disk_farm::plotting: Plotting sector (37.22% complete) sector_index=2785
2024-03-05T21:58:52.751977Z  INFO {disk_farm_index=0}: subspace_farmer::single_disk_farm::plotting: Plotting sector (37.23% complete) sector_index=2786
2024-03-05T22:00:43.453915Z  INFO {disk_farm_index=0}: subspace_farmer::single_disk_farm::plotting: Plotting sector (37.24% complete) sector_index=2787
2024-03-05T22:02:34.818604Z  INFO {disk_farm_index=0}: subspace_farmer::single_disk_farm::plotting: Plotting sector (37.26% complete) sector_index=2788
2024-03-05T22:04:24.680688Z  INFO {disk_farm_index=0}: subspace_farmer::single_disk_farm::plotting: Plotting sector (37.27% complete) sector_index=2789

How was the behavior with mar-04 in terms of disk utilization? BTW utilization is not about the bandwidth (in Task Manager).

Its fine, at start its 87% with 480MB/s read which last few minutes then it drop to 11-15% on SSD with 3TB plotted which is expected
on NVME use is less which is expected

Would you be able to do audit and prove benchmark at the same time on that disk and see if there is a significant impact of auditing on proving?

run in same time audit and proving with mar-04?

With latest experimental build from this thread: Fake display of high RAM usage or RAM leak on Windows by Subspace farmer - #90 by nazar-pc