High RAM usage for May 15th release

I’ve exhausted the options I can try remotely and am now trying to reproduce this in a VM. If I succeed, I should be able to create a fix.

Is it reproducible with a single farm and a large piece cache, so all pieces are stored locally? I used a 1.8T farm with a 5% piece cache for testing and can’t reproduce it so far on Windows 11.

Just confirmed that the May 6th release also has the memory issue; I didn’t find this out before because I only used that version for a very short time. 96GB of memory ran out after two days, and the node exited after failing to allocate memory. I have other applications running on this machine, but they are not particularly memory-hungry, and they were running alongside older versions of the node and farmer without this “failed to allocate memory” error ever appearing. Even months ago, when the farmer used to allocate lots of memory, it would gradually eat all of it but wouldn’t crash.

You probably need at least four farms. I will try to reproduce it with four farms later today. In the short test I ran, I didn’t see the May 6th release having the memory issue.

Well, if you can try to reduce that to one farm and confirm or deny it, that would help; I ideally need the smallest reproduction. I tried plotting with one and two farms for hours (not days, though) and didn’t notice anything special.

I’m looking for patterns. If we can find one, I can modify the software to accelerate reproduction (for example, I tried doing “plotting” without sector encoding), so we can see the behavior quickly and verify the fix.

That is odd and likely a different issue. What was the last version that didn’t have this problem? It is really important to be sure whether the issue is present or not, or else we’ll spend a lot of time checking the wrong things.

With the May 15th release I was able to recreate the issue with 6 plots. I will check the May 6th release next.

I do not see any memory leak with the May 6th release and 6 concurrent plots. I have not tried it with more than 6 plots.

Can you try fewer, maybe? Also, how long do you typically run it before you see the issue, and how severe is it by then?

I tested the May 15th release with both four and six plots. The four-plot configuration didn’t encounter any issues. Based on my experience, it takes approximately three hours to conclude whether there is a problem. In my setup the farmer normally uses around 40GB of RAM; in the runs that had issues, I saw 60–70GB of RAM usage after three hours.

Thanks, very helpful. All of the farms are still plotting, right?

Yes, they are still plotting using May 6th release.

I think I managed to reproduce something like this; I’ll try to narrow it down and hopefully fix it next.


I have theories about what the root cause might be, but it is a pain to deal with Windows, especially when trying to debug something. As such, I decided to simply constrain piece cache reads to a single read at a time. It was almost like this before, except it was also blocking the executor.

Let me know if it helps with the issue at all. If it does, I’m fine keeping it as is on Windows for now and letting other platforms benefit from higher concurrency in the meantime: Snapshot build · subspace/subspace@27b6eec · GitHub
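For anyone curious what “constrain piece cache reads to a single read at a time” means in practice: the usual way to do this without blocking the executor is a counting semaphore that callers must acquire before touching the cache. Below is a minimal, hypothetical sketch of the idea (not the actual farmer code, which is Rust); the names `MAX_CONCURRENT_READS` and `read_piece` are illustrative only. Setting the limit to 1 corresponds to the 27b6eec build, and raising it to 32 corresponds to the later experiment.

```python
# Hypothetical sketch: bounding concurrent "piece cache reads" with a
# counting semaphore. Not the real farmer implementation.
import threading
import time

MAX_CONCURRENT_READS = 1  # 1 in the 27b6eec build; 32 in the e88371b build

read_semaphore = threading.Semaphore(MAX_CONCURRENT_READS)
state_lock = threading.Lock()
current = 0  # reads in flight right now
peak = 0     # highest observed concurrency

def read_piece(piece_index):
    """Simulated cache read that must hold a permit for its duration."""
    global current, peak
    with read_semaphore:  # wait for a permit instead of blocking the executor
        with state_lock:
            current += 1
            peak = max(peak, current)
        time.sleep(0.01)  # stand-in for the actual disk read
        with state_lock:
            current -= 1

# Eight callers race for permits; only MAX_CONCURRENT_READS run at once.
threads = [threading.Thread(target=read_piece, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("peak concurrent reads:", peak)  # -> 1
```

The trade-off being tested in the thread is exactly this permit count: 1 serializes all reads (safe but slow), while 32 allows much more parallelism at the cost of more buffers alive at once.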

Thanks. I will try this out.

And assuming it works, try Snapshot build · subspace/subspace@e88371b · GitHub (if it doesn’t, then don’t bother). It does the same thing, but restricts the concurrency of piece reading to 32 instead of 1.

UPD: 1 seems to work from my testing, while 32 seems not to, at least not particularly well memory usage-wise.

The 27b6eec build you shared didn’t work. I was able to reproduce the issue with 5 farms after 12 hours of plotting; RAM usage at that point was 80GB.

Thanks. I have no other ideas than bringing back more or less the old code. Here is a build with that change in progress: Snapshot build · subspace/subspace@9c32cd4 · GitHub

Hopefully that works finally.

Thank you. I’ll give this a try and provide an update. Apologies, Windows has been quite frustrating.

Yeah, I wish it was the first time, but it is not :disappointed: