High memory usage for June 11th release

Issue Report

I have about 70TB of plots on two machines. On each machine I run two farmers: one farming the finished plots, and the second plotting and filling up 3 SSDs. The node runs on the AMD machine.
On both machines, the memory of the plotting farmer fluctuates but goes up over time. On the Intel it climbs until the farmer crashes. This may take hours or even a day.

Environment

  • Win11 Home
  • Advanced CLI
  • AMD Ryzen 7950X + Intel 14900K
  • 64GB RAM, 4800MHz

Problem

[Paste any errors or relevant logs here]

Part of the issue here is that you run two farmers on the same machine, which makes little sense. There should be no need to do this and it will hurt farming success rate; I don’t understand why you are doing this.

Can you provide full CLI commands you use for node and farmer?

Also, it would be great if you used the original executable names rather than farmer1 and farmer2, which could be anything, including an old version with the memory leak that was just fixed in jun-11. I also don’t understand why you have two different executable names here; you can run the same one twice. So many questions about this setup.

And lastly, it would really help if you could share all logs from when the farmer started until it reached this level of memory usage.

The reasons for running more than one farmer on a machine have to do with saving time as the farm grows:

  1. Any change to the farmer’s script requires restarting it. The larger the farm, the longer the initial read takes, and therefore I lose farming time.
  2. Adding a new plot to an already-plotting farmer (and restarting it) sends it into piece cache sync, which takes forever, and therefore I lose plotting time.

So basically, I run a new executable whenever I want to initiate plot creation. I didn’t think to reuse the same executable, which probably makes more sense.

Before we delve into the script (which has hardly changed since the beginning of 3h) and logs, I’ll run just one farmer (the plotting one) on each machine to check whether the memory issues persist. I’ll report back soon…

Ok, I ran just one plotting farmer alongside the node, and memory still skyrocketed to 48G. At the time of writing, it has come down to 18G (June 11 release).

.\node.exe
run
--base-path C:\Users\Rig\Desktop\subspace\sub11test\node3h2
--chain gemini-3h
--farmer
--in-peers 96
--out-peers 24
--rpc-listen-on 192.168.0.42:9944
--rpc-methods unsafe
--rpc-cors all
--name ggb2

.\farmer.exe
farm
--reward-address stAiUsXwLjiXHkWUmxkfYZhC6BHkNWwrXJvizzQyXBH4nQWrb
--node-rpc-url ws://192.168.0.42:9944
--listen-on /ip4/192.168.0.42/tcp/30622
--listen-on /ip4/192.168.0.42/udp/30622/quic-v1
--record-encoding-concurrency 3
path=F:\farmer1,size=500GiB
path=G:\farmer1,size=500GiB
path=H:\farmer1,size=500GiB

You can avoid internal benchmarking on restart by specifying the result explicitly like this:

path=F:\farmer1,size=500GiB,record-chunks-mode=ConcurrentChunks

Then restart is much faster.
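With the farm command above, that would look something like this (a sketch; ConcurrentChunks is just the value the benchmark reported for these disks, so use whatever yours printed):

.\farmer.exe
farm
--reward-address stAiUsXwLjiXHkWUmxkfYZhC6BHkNWwrXJvizzQyXBH4nQWrb
--node-rpc-url ws://192.168.0.42:9944
--listen-on /ip4/192.168.0.42/tcp/30622
--listen-on /ip4/192.168.0.42/udp/30622/quic-v1
--record-encoding-concurrency 3
path=F:\farmer1,size=500GiB,record-chunks-mode=ConcurrentChunks
path=G:\farmer1,size=500GiB,record-chunks-mode=ConcurrentChunks
path=H:\farmer1,size=500GiB,record-chunks-mode=ConcurrentChunks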

Can you be more specific? Piece cache sync (assuming it finished successfully last time and you had enough total space) will finish once the whole piece cache has been read, which doesn’t take long either.

And I don’t expect you need to restart often anyway; losing one minute on a restart doesn’t really change anything.

If my assumptions are not correct then there is a potential to improve performance.

This makes no sense because you will trigger piece cache sync from scratch and eventually will not have all the pieces cached locally, which basically means you are losing more time and bandwidth on plotting overall than saving on restart.

Are you really seeing 48G of RAM used by the jun-11 release with just 3×500G farms? That is very bad. If so, how much time does it take you to reproduce this?

The user in “High RAM usage for May 15th release” had tens of farms and a WAY bigger Epyc CPU, and I don’t think they had memory usage this high after the fixes that went into jun-11. Can you double-check that you are definitely using the jun-11 release? I strongly prefer that people don’t rename it to farmer.exe, to prevent potential confusion. You already had farmer1, farmer2 and farmer on screenshots.

I added “record-chunks-mode=ConcurrentChunks” and it works well.

Regarding executables, if I understand you correctly, I should use only the one farmer executable, and I can run it twice or more without hurting the farming success rate? The reason I would want to run it more than once is to save time when initializing new plot creation. For some reason it takes about 7 hours, or even much more, to complete initialization of 3x500GiB plots. Is that normal? Could it be the quality of these specific SSDs, or not enough power delivered to them?

I’ve double-checked and all are the June 11 release. I’ve also changed the executable names as you requested. I’ll let it run and report the memory state soon…

Exactly!

This is a Windows-specific issue. If you had all disks added at once from the very beginning, piece cache would sync and work well for plotting. We also use unplotted disk space for additional cache, which helps smaller farmers, but on Windows this requires the whole file to be written from start to finish, and that is what takes 7 hours. It can be disabled with --plot-cache false, but you’d better have enough piece cache if you do that (you can just increase the cache percentage temporarily to compensate).
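A minimal sketch of that, based on the farm command above (only --plot-cache false is new; the listen addresses, encoding concurrency and remaining path lines stay as before, and you may want to temporarily raise the piece cache percentage to compensate):

.\farmer.exe
farm
--plot-cache false
--reward-address stAiUsXwLjiXHkWUmxkfYZhC6BHkNWwrXJvizzQyXBH4nQWrb
--node-rpc-url ws://192.168.0.42:9944
path=F:\farmer1,size=500GiB,record-chunks-mode=ConcurrentChunks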

jun-18 was released with various fixes, you might want to try it out instead.

Ok, 1.5 hours after updating to the June-18 release:

  1. AMD - node + 2x farmers. One farming all the full SSDs, the second plotting on 3 SSDs.
    Memory of the plotting farmer: 41G
  2. INTEL - 2x farmers. One farming all the full SSDs, the second plotting on 3 SSDs.
    Memory of the plotting farmer: 6G

Could you provide the full farmer command and logs since farmer start on the AMD system where you see the issue? 1.5 hours is a reasonably small reproduction time; we should be able to test various solutions in that environment.

.\subspace-farmer-windows-x86_64-skylake-gemini-3h-2024-jun-18.exe
farm
--reward-address stAiUsXwLjiXHkWUmxkfYZhC6BHkNWwrXJvizzQyXBH4nQWrb
--node-rpc-url ws://192.168.0.42:9944
--listen-on /ip4/192.168.0.42/tcp/30622
--listen-on /ip4/192.168.0.42/udp/30622/quic-v1
--record-encoding-concurrency 3
path=F:\farmer1,size=500GiB,record-chunks-mode=ConcurrentChunks
path=G:\farmer1,size=500GiB,record-chunks-mode=ConcurrentChunks
path=H:\farmer1,size=500GiB,record-chunks-mode=ConcurrentChunks
path=F:\farmer2,size=500GiB,record-chunks-mode=ConcurrentChunks
path=G:\farmer2,size=500GiB,record-chunks-mode=ConcurrentChunks
path=H:\farmer2,size=500GiB,record-chunks-mode=ConcurrentChunks

The farmer reached 52G after about 1.5 hours
Full log in:

Very strange. The amount of space is reasonable though; I’ll try to reproduce it in a VM. If it really takes 1.5 hours, I should be able to catch it myself.

If it’s of any help, I don’t mind running it until it crashes and sharing the log.

Just want to point out that even though I haven’t changed anything on the Intel 14900k, its memory has been very reasonable lately, maybe peaking at 17G but usually at 6G~10G, whereas before it used to reach about 52G and sometimes even crash due to insufficient memory.

Do you have an identical number of farms, with identical sizes, on both machines?
The biggest difference plotting-wise is that the AMD processor has 2 L3 cache groups, which in turn means it will encode more sectors concurrently by default.

You can try --sector-encoding-concurrency 1 on the AMD system to get the same concurrency as on the Intel system, and on the Intel system you can set it to 2 to get behavior similar to the AMD. I’m wondering what results you will get from those experiments.
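As a sketch with the jun-18 command above, that is one extra flag (1 on the AMD, 2 on the Intel; all other options and farm paths stay as before):

.\subspace-farmer-windows-x86_64-skylake-gemini-3h-2024-jun-18.exe
farm
--sector-encoding-concurrency 1
--reward-address stAiUsXwLjiXHkWUmxkfYZhC6BHkNWwrXJvizzQyXBH4nQWrb
--node-rpc-url ws://192.168.0.42:9944
path=F:\farmer1,size=500GiB,record-chunks-mode=ConcurrentChunks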

Waiting for it to crash is not helpful; what is helpful is that it seems to use quite a bit of memory for you fairly quickly. I have not been able to reproduce anything remotely as bad as what you’re describing.

I’m not sure what you mean by number of farms. Is it the number of plots? If so, they’re not identical. Also, the total size of the farms on the two machines is not identical:
AMD - 24TB plotted + 3TB plotting
INTEL - 40TB plotted + 7.5TB plotting

I changed the --sector-encoding-concurrency as you asked, and ran both machines with only the plotting farmers. After one hour, the memory:
AMD 41G, and the sector creation rate came down (its CPU isn’t maxing out)
INTEL 6G

AMD:

I’m now running both machines with the full farm. Seems like the AMD memory is going up…

Hm… what you’re saying doesn’t make a lot of sense given the previous information. The command you provided earlier had just 6 500G farms, which is 3TB, not 24T+3T. I’m very confused as to what is going on. Are you running multiple farmer instances or something?

Under 20G is what I’d expect farmer to use during plotting on such machines, probably under 10G if it is not actively plotting.

As I said before, I’d need the full commands of what you’re running as well as full logs since the farmer started. A few lines in a screenshot don’t give enough information to work with.

Yes, I meant two instances on each machine: one just farming the already-plotted full SSDs, and the second plotting and filling up 3 SSDs. But now I’m running just one instance on each machine with all the farms, which is about 27TB on AMD and 48TB on INTEL.

Since I changed the --sector-encoding-concurrency (AMD: 1, INTEL: 2):

  1. Only once (as I mentioned before) did the AMD reach 41G.
  2. Memory on both machines is now stable at about 19G, at least for the 10 hours since starting.
  3. The AMD sector creation rate came down from 43 sectors/hour to 30 sectors/hour, and its CPU isn’t maxing out, while the INTEL rate stayed the same (33 sectors/hour) and its CPU is maxing out.

So, aside from the sector creation rate dropping significantly on AMD, memory seems OK and stable for now.

Here are the AMD farmer commands: WeTransfer - Send Large Files & Share Photos Online - Up to 2GB Free
Sorry, I haven’t saved full logs for this run. I’ll do that next time…

I can report that both machines have been running smoothly for the last 48 hours, one instance each with all plots, with AMD memory peaking at 27G (INTEL 20G) but most of the time around 20G. Both are plotting.

As I mentioned before, lowering the AMD --sector-encoding-concurrency to 1 lowered its sector creation rate significantly. I’ve now changed it to 2…

So concurrency somehow causes higher memory usage, even on Intel (the fact that it got there once means that it can get there at some point later as well, likely when replotting kicks in).

Very interesting. I have no idea why this might be happening yet; I was running a Windows 11 Pro VM with sector concurrency 2 for a few hours and have not seen anywhere near this dramatic an increase in memory usage. It is a VM running on an AMD Threadripper 7970X host with the number of CPU cores limited to 16.

It went down because lower concurrency is less efficient on that particular processor. Now we just need to figure out why it is using disproportionately more memory with higher concurrency. Can you maybe set sector concurrency on the AMD to 3, just as an experiment, and see how long it takes to consume 40G of memory? Ideally it would do that shortly after start, so we have a reliable reproduction.
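Concretely, a sketch of that experiment (the same jun-18 farm command as before, only with the concurrency value bumped; listen addresses and the remaining farm paths stay unchanged):

.\subspace-farmer-windows-x86_64-skylake-gemini-3h-2024-jun-18.exe
farm
--sector-encoding-concurrency 3
--reward-address stAiUsXwLjiXHkWUmxkfYZhC6BHkNWwrXJvizzQyXBH4nQWrb
--node-rpc-url ws://192.168.0.42:9944
path=F:\farmer1,size=500GiB,record-chunks-mode=ConcurrentChunks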

It is still suboptimal: I see you have many disks with multiple farms on them, which is less efficient for auditing and plotting. That hasn’t been necessary for a very long time now (it was beneficial in some cases in much older versions of the Gemini 3h software, but not anymore). Though it should not be the reason for high memory usage in another process, of course.

Thanks for providing additional details!

Environment

  • Windows 11 23H2 Enterprise Edition
  • CLI gemini-3h-2024-jun-18 skylake
  • AMD EPYC 7763 CPU + Samsung DDR4 512GB memory
  • 10 × 15.36T SSDs

After running for a period of time, the farmer process exits!