Fake display of high RAM usage or RAM leak on Windows by Subspace farmer

Both audit and proving while use 60-90% utilize disk manage to hit higher read speed 200-500MB/s so might look fine in windows talk manager, here are results both run in same time:

E:\chia>subspace-farmer-windows-x86_64-skylake-unbuffered-farming-io-windows-wip.exe benchmark audit g:\sub
Benchmarking audit/plot/rayon/unbuffered
Benchmarking audit/plot/rayon/unbuffered: Warming up for 3.0000 s
Benchmarking audit/plot/rayon/unbuffered: Collecting 10 samples in estimated 6.1106 s (20 iterations)
Benchmarking audit/plot/rayon/unbuffered: Analyzing
audit/plot/rayon/unbuffered
                        time:   [300.70 ms 304.94 ms 309.66 ms]
                        thrpt:  [8887.5 GiB/s 9025.3 GiB/s 9152.5 GiB/s]
                 change:
                        time:   [-4.3081% -2.3559% -0.4516%] (p = 0.03 < 0.05)
                        thrpt:  [+0.4537% +2.4127% +4.5021%]
                        Change within noise threshold.
Found 2 outliers among 10 measurements (20.00%)
  1 (10.00%) low mild
  1 (10.00%) high mild
Benchmarking audit/plot/rayon/regular
Benchmarking audit/plot/rayon/regular: Warming up for 3.0000 s
Benchmarking audit/plot/rayon/regular: Collecting 10 samples in estimated 6.4552 s (30 iterations)
Benchmarking audit/plot/rayon/regular: Analyzing
audit/plot/rayon/regular
                        time:   [171.28 ms 172.14 ms 173.03 ms]
                        thrpt:  [ 15905 GiB/s  15987 GiB/s  16068 GiB/s]
                 change:
                        time:   [-50.426% -49.604% -48.725%] (p = 0.00 < 0.05)
                        thrpt:  [+95.026% +98.429% +101.72%]
                        Performance has improved.
Found 2 outliers among 10 measurements (20.00%)
  1 (10.00%) low mild
  1 (10.00%) high mild
E:\chia>subspace-farmer-windows-x86_64-skylake-unbuffered-farming-io-windows-wip.exe benchmark prove g:\sub
Benchmarking prove/plot/rayon/unbuffered
Benchmarking prove/plot/rayon/unbuffered: Warming up for 3.0000 s
Read record chunks in 120.2104683s: 24/None

Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 1205.6s.
Benchmarking prove/plot/rayon/unbuffered: Collecting 10 samples in estimated 1205.6 s (10 iterations)
Read record chunks in 491.2121ms: 24/None
Read record chunks in 425.4673ms: 24/None
Read record chunks in 484.5339ms: 24/None
Read record chunks in 481.8191ms: 24/None
Read record chunks in 485.5749ms: 24/None
Read record chunks in 472.9214ms: 24/None
Read record chunks in 480.5535ms: 24/None
Read record chunks in 485.3148ms: 24/None
Read record chunks in 484.8486ms: 24/None
Read record chunks in 479.6529ms: 24/None
Benchmarking prove/plot/rayon/unbuffered: Analyzing
prove/plot/rayon/unbuffered
                        time:   [812.13 ms 826.29 ms 837.59 ms]
                        change: [+4.0703% +5.8875% +7.5256%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) low mild
Benchmarking prove/plot/rayon/regular
Benchmarking prove/plot/rayon/regular: Warming up for 3.0000 s
Read record chunks in 443.9145ms: 24/None
Read record chunks in 423.6131ms: 24/None
Read record chunks in 480.1131ms: 24/None
Read record chunks in 484.3911ms: 24/None
Read record chunks in 483.1861ms: 24/None
Read record chunks in 484.8668ms: 24/None
Read record chunks in 484.5676ms: 24/None

Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 7.5s.
Benchmarking prove/plot/rayon/regular: Collecting 10 samples in estimated 7.5052 s (10 iterations)
Read record chunks in 482.7983ms: 24/None
Read record chunks in 476.5681ms: 24/None
Read record chunks in 488.2769ms: 24/None
Read record chunks in 487.8396ms: 24/None
Read record chunks in 478.1048ms: 24/None
Read record chunks in 490.8079ms: 24/None
Read record chunks in 478.372ms: 24/None
Read record chunks in 485.754ms: 24/None
Read record chunks in 479.5896ms: 24/None
Read record chunks in 479.8443ms: 24/None
Benchmarking prove/plot/rayon/regular: Analyzing
prove/plot/rayon/regular
                        time:   [758.29 ms 762.70 ms 766.93 ms]
                        change: [-1.6711% -0.9625% -0.3094%] (p = 0.02 < 0.05)
                        Change within noise threshold.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) low mild
1 Like

so I ran overnight and got only 1 chance to win reward but missed. Here is log a few minutes before and after the event.

CLI: subspace-farmer3h_5Mar.exe farm path=I:\1,size=1900G --farm-during-initial-plotting true --node-rpc-url ws://192.168.2.215:9945 --metrics-endpoints 192.168.2.200:2222 --reward-address stxxxxxx

2024-03-05T19:05:58.497046Z  INFO subspace_farmer::farmer_cache: Piece cache sync 96.75% complete
2024-03-05T19:06:55.843101Z  INFO subspace_farmer::farmer_cache: Piece cache sync 97.36% complete
2024-03-05T19:07:40.337412Z  INFO subspace_farmer::farmer_cache: Piece cache sync 97.97% complete
2024-03-05T19:08:34.322964Z  INFO subspace_farmer::farmer_cache: Piece cache sync 98.59% complete
2024-03-05T19:08:36.160734Z  INFO {disk_farm_index=0}: subspace_farmer::single_disk_farm::plotting: Plotting sector (82.97% complete) sector_index=1476
2024-03-05T19:09:25.064721Z  INFO subspace_farmer::farmer_cache: Piece cache sync 99.20% complete
2024-03-05T19:14:46.211204Z  INFO subspace_farmer::farmer_cache: Piece cache sync 99.81% complete
2024-03-05T19:15:28.384995Z  INFO subspace_farmer::farmer_cache: Finished piece cache synchronization
2024-03-05T19:16:13.177588Z  WARN {disk_farm_index=0}: subspace_farmer::single_disk_farm::farming: Proving for solution skipped due to farming time limit slot=2808227 sector_index=1221
2024-03-05T19:18:53.684999Z  INFO {disk_farm_index=0}: subspace_farmer::single_disk_farm::plotting: Plotting sector (83.02% complete) sector_index=1477
2024-03-05T19:24:39.957210Z  INFO {disk_farm_index=0}: subspace_farmer::single_disk_farm::plotting: Plotting sector (83.08% complete) sector_index=1478
2024-03-05T19:28:16.885277Z  INFO {disk_farm_index=0}: subspace_farmer::single_disk_farm::plotting: Plotting sector (83.14% complete) sector_index=1479
2024-03-05T19:32:17.840446Z  INFO {disk_farm_index=0}: subspace_farmer::single_disk_farm::plotting: Plotting sector (83.19% complete) sector_index=1480
2024-03-05T19:36:13.176416Z  INFO {disk_farm_index=0}: subspace_farmer::single_disk_farm::plotting: Plotting sector (83.25% complete) sector_index=1481
2024-03-05T19:39:50.259021Z  INFO {disk_farm_index=0}: subspace_farmer::single_disk_farm::plotting: Plotting sector (83.31% complete) sector_index=1482
2024-03-05T19:43:41.189302Z  INFO {disk_farm_index=0}: subspace_farmer::single_disk_farm::plotting: Plotting sector (83.36% complete) sector_index=1483
2024-03-05T19:47:37.013683Z  INFO {disk_farm_index=0}: subspace_farmer::single_disk_farm::plotting: Plotting sector (83.42% complete) sector_index=1484
2024-03-05T19:51:10.408739Z  INFO {disk_farm_index=0}: subspace_farmer::single_disk_farm::plotting: Plotting sector (83.47% complete) sector_index=1485

@dragonP I don’t think you were running what I asked you to run though. You should have had ā€œRead record chunks inā€ message and I don’t see it in your logs.
It is also helpful when files are not renamed, so I can see from the command what you were actually running.

I ran the benchmark prove before. I think you missed it: Fake display of high RAM usage or RAM leak on Windows by Subspace farmer - #92 by dragonP

I expected a single message with benchmarks and proving logs from one of the disks together and nothing else. You have provided just benchmarks initially and logs provided later seem to correspond to a different build because they should have also had ā€œRead record chunks inā€ in logs just like reb0rn’s logs above, but they didn’t.

Yes, I was wrong.

I run the first build on 5 Mar. It’s painful for me but let me run the latest build yesterday again with filename unchanged. Hope I can report it soon with the first chance of reward win.

Okay, another build then: Snapshot build Ā· subspace/subspace@e8a2461 Ā· GitHub

This time since random reads are particularly bad on Windows, it will instead read the whole 1G sector for proving purposes, which wastes more bandwidth and memory while proving, but also happens to be faster and I don’t see a much better way to approach it right now :confused:

Hopefully ^ build works reasonably well so I can stop fighting Windows in foreseeable future.

I think reb0rn’s logs are bad enough to conclude that it’ll not work well. Better wait for ^ build and try it instead. Appreciate all the patience with various build, I can see how it might get confusing for everyone, but I feel like we are making progress here.

latest build, no memory issue

SSD 8TB at start had ~500MB/s read then after 3-4min stabilized at 13% utilization

but 2 of my 4TB nvme still have 85% utilization after start (15min) and just ~55MB/s read, very strange as the last fix work for SSD not for nvme drives which all should be way faster and are

4th drive also 4TB nvme also work fine, but will wait it out to see result

after 25min, 1st SSD that had 13% disk utilization just for no reason now show 60% and same read speed

nvme drive that only have normal utilization (7%) is the one that is 100% plotted

first reward was ok on drive that is 60% utlilized

Read record chunks in 867.9578ms: 24/None
2024-03-06T02:16:08.408210Z  INFO {disk_farm_index=1}: subspace_farmer::reward_signing: Successfully signed reward hash 0x76f2c0d1c690a0c07679b9f105fd2151f4c3856b0a7db7e107b59f7bff1ac864
Read record chunks in 2.5995519s: 24/None
2024-03-06T02:32:12.693216Z  INFO {disk_farm_index=3}: subspace_farmer::reward_signing: Successfully signed reward hash 0x9da98ea6eabd238b77bc338842545162d1442677c5794b2c58186c424d2199f1

Thanks @reb0rn. So I’ll take your result as very positive. I have re-run my machine with all plots (rather than just 1 plot as Nazar asked. I run 1 plot for almost 1 hour without any winning chance to test the fetch proof time).

If I can win with multiple plots, then this build sounds great.

Promising, the first chance and it won… Disk farm 5 is 980 Pro (NVMe). I hope SATA SSD can win also.

2024-03-06T02:59:17.545699Z  INFO subspace_farmer::farmer_cache: Piece cache sync 78.06% complete
2024-03-06T02:59:17.722353Z  INFO subspace_farmer::farmer_cache: Piece cache sync 86.73% complete
Read record chunks in 1.0172438s: 20/None
2024-03-06T03:03:45.627353Z  INFO {disk_farm_index=5}: subspace_farmer::reward_signing: Successfully signed reward hash 0x0a0fc927ae6fe9492fead3d162048bdf411d7644e036f9213c35c69988413b8b
1 Like

Drives that do only farming, have perfect utilization <10% but as soon they hit one replot or plot task utilization go to 65-95% on windows task manager and do not drop, while the read speed stay same as it should, is it just glitch in windows drive utilization monitoring or not have no idea
Still waiting to hit first reward on drive which is just farming to see if solution is ā€œfasterā€

this is how it look in windows task when replot is saved to disk

this is my 2nd pc that have few replots to do had one skip but i am not sure if disk read seek was issue or just rpc and new block on network

Read record chunks in 1.3336557s: 32/None
2024-03-06T03:55:11.148527Z  WARN {disk_farm_index=1}: subspace_farmer::single_disk_farm::farming: Failed to send solution to node, skipping further proving for this slot slot=2839350 sector_index=1101 error=RPC call failed: ErrorObject { code: ServerError(-32001), message: "Custom error: Solution was ignored", data: None }

Such increase should be temporary and happens when farmer is pulling pieces from local caches for the upcoming segment rather than from the network (I see you have dip in CPU usage, which likely means you have just finished one segment and it is time to collect pieces for the next segment).

That is likely caused by disk speed. When pulling pieces for the next segment it might load disk for some time, but usually a second to a few if disk is fast. 50MB/s for SSD is not much, at least for linear speeds, but already overwhelmed in your case, likely because you have a lot of plotting already done there and it might slow things down due to read concurrency. Not sure what else to do there given how inadequately Windows deals with things.

What SSD model is that? Also are you using defaults or customized something in CLI related to concurrency?

Those are NVME (kingston 4TB on 7950x) and no disk utilization stay high speed same, as my 2nd pc finished all replotting… restarted farmer to see what solution speed if when utilization is ~7%

The CPU use go down and up as it had only to replot 4 sectors from 2 drive

Still windows task manager utilization might be glitch still have not got solution farmed from drive that only farm

Ok my PC 2 which just farm with 7% NVME (restarted farmer after replot to make NVME utilization drop) utlization just hit block time is almost same as when utilization show 97%… ~980ms

1 Like

@nazar-pc I thought you’re asleep.

Okies. so we have a good result this time. I run the latest build today with all of my plots after I observed @reb0rn could win with multiple plots.

I’ve won 3/3 so far, no miss. I’ll let this build running for now. I’ve almost been timeout for the SATA SSD. This is very much promising, but if you can optimize the prove time more, it’ll be perfect.

I’ll report if result is not as expected after 12 hours or so. If not hear from me, it should be all perfect.

PC spec: CPU 13500, 64GB RAM, 8 plots, total 15.8TiB.
Result: RAM usage 26GB/64GB, win rate: 3/3, miss rate 0/3.

Log (below is log for the last 2 wins, the first win I’ve shared above):
Note: disk 2 is SATA, disk 7 is NVMe.

2024-03-06T03:52:41.231384Z  INFO {disk_farm_index=7}: subspace_farmer::single_disk_farm::plotting: Plotting sector (40.39% complete) sector_index=1475
2024-03-06T03:55:56.686752Z  INFO {disk_farm_index=4}: subspace_farmer::single_disk_farm::plotting: Plotting sector (89.49% complete) sector_index=1592
Read record chunks in 2.9108894s: 20/None
2024-03-06T03:56:29.442267Z  INFO {disk_farm_index=2}: subspace_farmer::reward_signing: Successfully signed reward hash 0x5f0b56a2f5b8543ff42da281fa597ad17b9772788f0416f0a51926412d6ccb37
2024-03-06T03:59:18.720482Z  INFO {disk_farm_index=5}: subspace_farmer::single_disk_farm::plotting: Plotting sector (82.74% complete) sector_index=1472
2024-03-06T04:02:40.742745Z  INFO {disk_farm_index=2}: subspace_farmer::single_disk_farm::plotting: Plotting sector (82.69% complete) sector_index=1471
2024-03-06T04:05:59.031248Z  INFO {disk_farm_index=6}: subspace_farmer::single_disk_farm::plotting: Plotting sector (82.69% complete) sector_index=1471
2024-03-06T04:09:17.697900Z  INFO {disk_farm_index=1}: subspace_farmer::single_disk_farm::plotting: Plotting sector (82.74% complete) sector_index=1472
2024-03-06T04:12:36.037241Z  INFO {disk_farm_index=0}: subspace_farmer::single_disk_farm::plotting: Plotting sector (98.26% complete) sector_index=1472
2024-03-06T04:15:53.359821Z  INFO {disk_farm_index=3}: subspace_farmer::single_disk_farm::plotting: Plotting sector (82.91% complete) sector_index=1475
Read record chunks in 1.1156337s: 20/None
2024-03-06T04:16:50.941068Z  INFO {disk_farm_index=7}: subspace_farmer::reward_signing: Successfully signed reward hash 0xea7cfdf3849ad3ad38d614dd8c654bf35b0b580f987b72cc8140551b6a0014f8
2024-03-06T04:19:07.574344Z  INFO {disk_farm_index=7}: subspace_farmer::single_disk_farm::plotting: Plotting sector (40.42% complete) sector_index=1476
2024-03-06T04:22:23.448053Z  INFO {disk_farm_index=4}: subspace_farmer::single_disk_farm::plotting: Plotting sector (89.54% complete) sector_index=1593
1 Like

It is not a glitch.

Block time is not relevant here, you have ~4s to prove, so should be in a decent shape.

I hope I can, but for now I just want to make it work at all.

Memory usage is still high, it should go down further with more tuning of the same kind, but for other parts of the farmer.

I do not see memory use abuse, 8/32 here with few task that use those ~6-7GB alone so its fine

So after 4 hours, I’ve got 2/9 missed reward. Both happened on SATA SSD, only 2TB. I hope we can improve on this.

2024-03-06T06:11:57.583183Z  INFO {disk_farm_index=5}: subspace_farmer::single_disk_farm::plotting: Plotting sector (83.02% complete) sector_index=1477
2024-03-06T06:15:18.839524Z  INFO {disk_farm_index=2}: subspace_farmer::single_disk_farm::plotting: Plotting sector (82.97% complete) sector_index=1476
Read record chunks in 1.5817618s: 20/None
2024-03-06T06:18:32.179002Z  INFO {disk_farm_index=5}: subspace_farmer::reward_signing: Successfully signed reward hash 0xf86694adde96134a73fab5b6f4494dbc603e119c452991fc3c5cd99913315411
2024-03-06T06:18:40.094664Z  INFO {disk_farm_index=6}: subspace_farmer::single_disk_farm::plotting: Plotting sector (82.97% complete) sector_index=1476
2024-03-06T06:21:59.817234Z  INFO {disk_farm_index=1}: subspace_farmer::single_disk_farm::plotting: Plotting sector (83.02% complete) sector_index=1477
2024-03-06T06:25:17.891877Z  INFO {disk_farm_index=0}: subspace_farmer::single_disk_farm::plotting: Plotting sector (98.60% complete) sector_index=1477
2024-03-06T06:28:39.947803Z  INFO {disk_farm_index=3}: subspace_farmer::single_disk_farm::plotting: Plotting sector (83.19% complete) sector_index=1480
2024-03-06T06:31:58.324284Z  INFO {disk_farm_index=7}: subspace_farmer::single_disk_farm::plotting: Plotting sector (40.55% complete) sector_index=1481
2024-03-06T06:35:19.160498Z  INFO {disk_farm_index=4}: subspace_farmer::single_disk_farm::plotting: Plotting sector (89.83% complete) sector_index=1598
Read record chunks in 3.1487163s: 20/None
2024-03-06T06:38:10.501404Z  WARN {disk_farm_index=1}: subspace_farmer::single_disk_farm::farming: Failed to send solution to node, skipping further proving for this slot slot=2849129 sector_index=1466 error=RPC call failed: ErrorObject { code: ServerError(-32001), message: "Custom error: Solution was ignored", data: None }

Another test build with further changes: Snapshot build Ā· subspace/subspace@21de4db Ā· GitHub
It should reduce memory usage further, bypassing Windows caches for all files used by farmer during normal operation.

Got it. I’ll test and update in 1-2 hours.

I’ve checked again just now, seems build is not completed yet. Please have a look.