Fake display of high RAM usage or RAM leak on Windows by Subspace farmer

Maybe this will help: the Subspace node has a lot of errors with memory pages.

After I stopped the node:

mmap is not a solution; in fact, it is a problem we specifically moved away from in the past because it prevented the use of large files on Windows at all. You can find discussions about this on the forum and GitHub.

The leak is on the farmer, not the node. And those are called “page faults” in English; they are not “errors” in the sense you might expect from the name. They are not necessarily an issue to resolve and are not relevant in this case.

In “Windows is leaking memory when reading random chunks of a large file” (Microsoft Q&A) I discovered that it is kernel-initiated memory mapping (which I didn’t ask for and specifically instructed Windows NOT to do) that is the problem here :disappointed:
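To illustrate the mechanism being discussed (a minimal Python sketch, not the farmer’s actual code, which is Rust): reading random chunks of a memory-mapped file faults the touched pages into memory, and the OS keeps them cached afterwards, which can look like ever-growing RAM usage.

```python
import mmap
import os
import random
import tempfile

CHUNK = 4096  # one typical page

# Create a small stand-in for a large plot file.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(os.urandom(64 * CHUNK))
    path = f.name

with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
    # Random chunk reads: each one may trigger a page fault that maps file
    # pages into memory; the kernel retains them in the page cache afterwards.
    for _ in range(10):
        offset = random.randrange(0, len(mm) - CHUNK + 1)
        chunk = mm[offset:offset + CHUNK]
        assert len(chunk) == CHUNK

os.unlink(path)
print("read 10 random chunks")
```

The page-fault counters mentioned earlier in the thread count exactly these events; they are normal bookkeeping, not errors.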


Here is an experimental build based on the upcoming release of Advanced CLI; please give it a try when you have time: Snapshot build · subspace/subspace@dddd321 · GitHub
It doesn’t fix the issue globally, but it should bypass the issue for farming specifically and hopefully that is good enough for now.

NOTE: After the first start it might seem to not plot anything for a while; that is expected, just let it finish.

I’ve tested on one of my PCs: 13500 CPU, Win 11, 15.8 TiB in total, 64 GB RAM.

The result is amazing: total RAM usage is now only 11 GB, and it has been stable for more than 30 minutes.

Plotting is very slow for the first 2 sectors, but after that it’s back to normal. Maybe slightly longer than the Feb 19 version, but I’m not really sure.

I will have to watch the rewards / reward misses and will report back in a couple of hours.

I’m also testing on a second PC with Windows 10 and will report on that as well.


RAM usage is low, but the results from the two PCs are very bad in terms of reward misses. On both of my PCs the miss rate is up to 60-70% after a 6-hour run. I’m frustrated to say it, but this change is a “no go”.

As below, M0 and M4 are the two machines I’ve tested with the new release. M1, M2, and M3 have an almost-zero reward miss rate.

M0 has missed 7 out of 11 rewards; M4 has missed 4 out of 6.

I have no idea what tool that screenshot is from (not familiar with it) or what all the numbers mean (it would have been more helpful to post it as text without truncated columns, etc.), but it looks like all of your farms are missing rewards, aren’t they?

It would also be helpful to know with which arguments you run each farmer.

Sorry for that.
M0: running for 6 hours, missed rewards 7 out of 11.
M1: running for 6 days, missed rewards 2 out of 331.
M2: running for 21 hours, missed rewards 0 out of 46.
M3: running for 6 days, missed rewards 4 out of 299.
M4: running for almost 6 hours (this is the Win 10 machine I started testing later), missed rewards 4 out of 6.

As mentioned, M0 and M4 were used to test this build, while M1, M2, and M3 have been running the Feb 19 release. The reward miss rates on M1, M2, and M3 are acceptable to me; as you can see, it’s just 1-2%.
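For reference, the miss percentages behind these figures work out as follows (a quick back-of-the-envelope calculation from the numbers reported above, not output from any monitoring tool):

```python
# Reward misses reported per machine: (missed, total attempts).
results = {
    "M0 (test build)": (7, 11),
    "M1 (Feb 19)": (2, 331),
    "M2 (Feb 19)": (0, 46),
    "M3 (Feb 19)": (4, 299),
    "M4 (test build)": (4, 6),
}

for machine, (missed, total) in results.items():
    # e.g. "M0 (test build): 7/11 missed = 63.6%"
    print(f"{machine}: {missed}/{total} missed = {missed / total:.1%}")
```

The test-build machines come out at roughly 64% and 67% missed, versus 0-1.3% on the Feb 19 machines, which matches the 60-70% vs. 1-2% figures quoted in the thread.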

This is the Subspace monitoring tool that almost all of us are using right now; it’s very helpful for monitoring node/farmer status, plot time, and reward misses. I thought you were also using it.

What machines are those, which arguments did you use for the farmers, and what was the miss rate before the upgrade? Is the miss rate constant over time, or did you have a few misses and then it stopped missing?

The miss rate before the upgrade was 1 to 2%, as we can see on M1-M3. M0 to M3 have the same build: 13500 CPU and 64 GB RAM. I was running with record concurrency 6 on them, so I think that’s why rewards are sometimes missed. M4 is the 10900 one, also 64 GB; I had no missed rewards on it with the Feb 19 release since I set record concurrency to only 1 (that CPU uses too much electricity and easily gets hot, so I set it to 1).

The miss rate with Feb 19 was random and rare.

The miss rate with the new build is around 2/3, so it’s frequent. Wins and misses are mixed, with more missed, as the numbers show.

Can you post all the arguments you use on the farmer? That gives me much more information at once than multiple comments explaining them.

Also, it would be helpful if you could compare reward misses with Snapshot build · subspace/subspace@df919f9 · GitHub, which is the same as the test build above, but without the Windows-specific change to unbuffered I/O.

Below is the full CLI, with the same options from M0 to M4 (they have the same build). I also changed the farmer name for M0.

.\subspace-farmer3h_19Feb.exe farm path=C:\1,size=1600G path=E:\1,size=1900G path=F:\1,size=1900G path=H:\1,size=1900G path=I:\1,size=1900G path=J:\1,size=1900G path=K:\1,size=1900G path=L:\1,size=3900G --farm-during-initial-plotting true --in-connections 25 --out-connections 25 --pending-in-connections 25 --pending-out-connections 25 --node-rpc-url ws://192.168.2.205:9945 --metrics-endpoints 192.168.2.200:2222 --plotting-thread-pool-size 20 --replotting-thread-pool-size 20 --farming-thread-pool-size 20 --sector-downloading-concurrency 3 --sector-encoding-concurrency 1 --record-encoding-concurrency 6 --reward-address stxxxxxxx
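As a side note, summing the size=… values in this command gives the total allocated space (my own quick calculation, assuming G here means gigabytes, 10^9 bytes):

```python
# Sizes taken from the path=...,size=... arguments above (in G).
sizes_g = [1600, 1900, 1900, 1900, 1900, 1900, 1900, 3900]
total_g = sum(sizes_g)
print(f"{total_g} G total")                    # 16900 G

# Assuming G = 10^9 bytes, convert to TiB (2^40 bytes):
print(f"~{total_g * 10**9 / 2**40:.2f} TiB")   # ~15.37 TiB
```

That is in the same ballpark as the 15.8 TiB total mentioned earlier in the thread.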

I’d try to run with defaults for all the plotting/replotting/farming/concurrency settings; what you did can make things worse for rewards. Also, that build works slightly differently than feb-19: new builds de-prioritize plotting threads to leave room for farming. For all the testing I’m asking for, please run defaults; don’t mess with it.

I’m running this build now; let’s wait for the result after a few hours. FYI, I ran with the same options as the CLI I shared above.

Please just use defaults for all testing purposes, otherwise the results are not necessarily representative of what they should be.

I’ve just re-run with all defaults. Will report after I get some sleep. Thanks.

subspace-farmer3h_4Mar.exe farm path=C:\1,size=1600G path=E:\1,size=1900G path=F:\1,size=1900G path=H:\1,size=1900G path=I:\1,size=1900G path=J:\1,size=1900G path=K:\1,size=1900G path=L:\1,size=3900G --farm-during-initial-plotting true --node-rpc-url ws://192.168.2.205:9945 --metrics-endpoints 192.168.2.200:2222 --reward-address stxxxxx


I’ve got no hits so far on either of the testing machines, just misses, after an hour and a half: 3 misses on one machine and 2 on the other. RAM usage has decreased to acceptable levels, though. Will continue to monitor.


I am also testing the first build posted. RAM usage is OK, 2 misses at the start, but SSD usage/utilization is quite high at 60-90% (NVMe is about 7%).
I am not sure whether it is still doing something and usage will drop, but it was about 10x less with feb-19.

Plotting and farming in progress, default command line.

Will test the new build now.


Are you still plotting? Are you using defaults for concurrency and things like that? The full farmer command would be helpful as well.

Snapshot build · subspace/subspace@df919f9 · GitHub would be worth testing too, to narrow down the root cause.

Also, for those who are testing this, it would be helpful if you could run auditing/proving benchmarks. From my testing they were a bit faster with the new code, but maybe it depends on the setup. There will be rayon/unbuffered (new) and rayon/regular (old) implementations in there.