High memory usage for June 11th release

P3-1452 · June 15, 2024, 4:42pm

Issue Report

I have about 70TB of plots on two machines. On each machine two farmers - one farming finished plots, and the second plotting and filling up 3 ssd’s. The node is on the amd.
On both machines, the memory of the farmer plotting fluctuates but over time goes up . On the Intel it climbs high and crashes. This may take hours or even a day.

Environment

Win11 Home
Advance CLI
AMD Ryzen 7950x + INTEL 14900k
RAM 64G 4800MHz

Problem

[Paste any errors or relevant logs here]

nazar-pc · June 16, 2024, 4:43am

Part of the issue here is that you run two farmers on the same machine, which makes little sense, there should be no need to do this and it will hurt success rate of farming, I don’t understand why are you doing this.

Can you provide full CLI commands you use for node and farmer?

Also it would be great if you used original executable names and not farmer1 and farmer2 that could be anything, including old version with memory leak that was just fixed in jun-11. I also don’t understand why you have two different executable names here, you can run the same one twice. So many questions about this setup.

And lastly it would really help if you could share all longs since farmer started until this level of memory usage.

P3-1452 · June 16, 2024, 8:29am

The reasons for running more than one farmer on a machine, have to do with time saving with a growing farm:

Any change to the farmer’s script, requires restarting it. The larger the farm is, the more time it takes to read initially and therefore I loose time on farming.
Adding new plot script to an already plotting farmer (and restarting it) sends it into piece cache sync which takes forever, and therefore I loose time on plotting.

So basically, I run a new executable whenever I want to initiate a plot creation. Didn’t think to use the same executable, which probably makes more sense.

Before we delve into script (which hardly changed since the beginning of 3h) and logs, I’ll run just one farmer (the plotting one) on each machine to check if memory issues persist. I’ll report back soon…

P3-1452 · June 16, 2024, 4:22pm

(post deleted by author)

nazar-pc · June 16, 2024, 8:54pm

You can avoid internal benchmarking on restart by specifying the result explicitly like this:

path=F:\farmer1,size=500GiB,record-chunks-mode=ConcurrentChunks

Then restart is much faster.

Can you be more specific? Piece cache sync (assuming it finished successfully last time and you had large enough total space) will finish after all piece cache is read, which also doesn’t take a long time.

And I don’t expect you need to resart often anyway, one minute when you restart doesn’t really change anything.

If my assumptions are not correct then there is a potential to improve performance.

This makes no sense because you will trigger piece cache sync from scratch and eventually will not have all the pieces cached locally, which basically means you are losing more time and bandwidth on plotting overall than saving on restart.

Are you really having 48G of RAM used by jun-11 release with just 3*500G farms? That is very bad. If so how much time does it take you to reproduce this?

User in High RAM usage for May 15th release had tens of farms and WAY bigger Epyc CPU and I don’t think they had this high memory usage after fixes that went into jun-11. Can you double-check that you definitely use jun-11 release? I strongly prefer when people don’t rename it into farmer.exe to prevent potential confusion. You already had farmer1, farmer2 and farmer on screenshots.

P3-1452 · June 18, 2024, 9:17am

I added “record-chunks-mode=ConcurrentChunks” and it works well.

Regarding executables, if I understand you correctly, I should use only the same farmer executable and I can use it even twice or more, and that won’t hurt success rate of farming? The reason I would want to use it more than once is to save time on initialization of a new plot creation. For some reason it takes about 7 hours or even much more to complete initiation of 3x500GiB plots. Is that normal? Could it be the quality of these specific ssd’s or not enough power delivered to them?

I’ve doubles checked and all are June 11 release. I’ve also changed executable names as you required. I’ll let it run and report the memory state soon…

nazar-pc · June 18, 2024, 10:03am

Exactly!

This is a Windows-specific issue. If you had all disks added at once from the very beginning, piece cache would sync and work well for plotting. We also use unplotted disk space for additional cache, which helps smaller farmers, but on Windows this requires the whole to be written from start to finish and that is what takes 7 hours. It can be disabled with --plot-cache false, but you better have enough piece cache if you do that (can just increase percentage to compensate for that temporarily).

jun-18 was released with various fixes, you might want to try it out instead.

P3-1452 · June 18, 2024, 12:22pm

Ok, 1.5 hours after update to June-18 release:

AMD - node + 2x farmers. One farming all full ssd’s, second plotting on 3 ssd’s.
Memory of plotting farmer - 41G
INTEL - 2x farmers. One farming all full ssd’s, second plotting on 3 ssd’s.
Memory of plotting farmer - 6G

P3-1452 · June 18, 2024, 12:22pm

nazar-pc · June 18, 2024, 7:29pm

Could you provide full farmer command and logs since farmer start on AMD system where you see an issue? 1.5 hours is a reasonably small period of time for reproduction, we should be able to test various solutions in that environment.

nazar-pc · June 19, 2024, 9:40am

Very strange. Amount of space is reasonable though, I’ll try to reproduce it in a VM. If it really takes 1.5 hours I should be able to catch it myself.

P3-1452 · June 19, 2024, 10:45am

If it’s of any help, I don’t mind running it till it crashes and share the log.

Just want to point out that even though I haven’t changed anything on the Intel 14900k, its memory has been very reasonable lately, maybe peaking at 17G but usually at 6G~10G. Where as before this, it used to reach about 52G and sometimes even crash because not enough memory.

nazar-pc · June 19, 2024, 12:57pm

Do you have identical number of farms and their sizes on both machines?
The biggest difference plotting-wise is that AMD processor will have 2 L3 cache groups, which in turn means it will encode more sectors concurrently by default.

You can try --sector-encoding-concurrency 1 on AMD system to get the same concurrency as on Intel system and on Intel system you can set it to 2 to get similar behavior to AMD. I’m wondering what results you will get from those experiments.

Waiting for it to crash is not helpful, what is helpful is that it seems to use quite a bit of memory for you fairly quickly. I was not able to reproduce anything remotely as bad as what you’re describing myself before.

P3-1452 · June 19, 2024, 6:33pm

I’m not sure what you mean by number of farms. Is it the number of plots? If so, they’re not identical. Also, the total size of the farms on the machines is not identical
AMD - 24TB plotted + 3TB plotting
INTEL - 40TB plotted + 7.5TB plotting

I changed the --sector-encoding-concurrency as you asked, and ran both machines only with plotting farmers. After one hour, the memory:
AMD 41G + sector creation rate came down (it’s cpu isn’t maxing out)
INTEL 6G

AMD:

I’m now running both machines with full farm. Seems like amd memory going up…

nazar-pc · June 20, 2024, 5:07am

Hm… what you’re saying doesn’t make a lot of sense given previous information. Previously in the command you have provided, there was just 6 500G farms, which is 3TB, not 24T+3T. I’m very confused as to what is going on. Are you running multiple farmer instances or something?

Under 20G is what I’d expect farmer to use during plotting on such machines, probably under 10G if it is not actively plotting.

As I said before, I’d need full commands of what you’re running as well as full logs since farmer has started. A few lines on screenshot are not giving enough information to work with.

P3-1452 · June 21, 2024, 9:24am

Yes, I meant two instances on each machine - one just farming already plotted full ssd’s, and the second plotting and filling up 3 ssd’s. But now I’m running just one instance on each machine with all the farms, that is about 27TB on AMD and 48TB on INTEL.

SINCE I changed the --sector-encoding-concurrency (AMD-1, INTEL-2):

only once (that I mentioned before) did the AMD reach 41G
Now on both machines memory is stable and about 19G. At least for the last 10 hours since starting
AMD sector creation rate came down from 43 sectors/hour to 30 sectors/hour, and cpu isn’t maxing. While INTEL sector creation rate stayed the same (33 sectors/hour) and its cpu is maxing.

So, besides sector creation rate going down significantly on AMD, memory seems ok and stable for now.

This is the AMD farmer commands: WeTransfer - Send Large Files & Share Photos Online - Up to 2GB Free
Sorry, for this run I haven’t saved full logs. I’ll do that next time…

P3-1452 · June 24, 2024, 8:53am

I can report that both machines have been running smoothly for the last 48 hours, one instance each with all plots, with AMD memory peaking at 27G (INTEL 20G) but most of the time around 20G. Both are plotting.

As I mentioned before, lowering the AMD --sector-encoding-concurrency to 1 lowered its sector creation rate significantly. I’ve now changed it to 2…

nazar-pc · June 24, 2024, 11:32am

So concurrency somehow causes higher memory usage, even on Intel (the fact that it got there once means that it can get there at some point later as well, likely when replotting kicks in).

Very interesting, I have no idea why this might be happening yet and I was running in a VM on Windows 11 Pro with sector concurrency 2 for a few hours and have not seen quite this dramatic increase in memory usage. It is a VM running on AMD Threadripper 7970X host with number of CPU cores limited 16.

It went down because lower concurrency is less efficient on that particular processor. Now we just need to figure out why it is using inadequately more memory with higher concurrency. Can you maybe set sector concurrency on AMD to 3 for fun and see how much time it takes for it to consume 40G of memory? Ideally it’d do that shortly after start, so we have a reliable reproduction.

It is still suboptimal, I see you have many disks with multiple farms on them, which is less efficient for auditing and plotting, it is not necessary to do that for a very long time now (it was beneficial in some cases in much older versions of Gemini 3h software, but not anymore). Though it should not be the reason of high memory usage in another process of course.

Thanks for providing additional details!

gensyn_good · June 26, 2024, 2:30am

【environment】
Windows 11 23H2 Enterprise Edition
CLI gemini-3h-2024-jun-18 skylake
AMD EPYC 7763 CPU + Samsung DDR4 512GB Memory
10 pieces of 15.36T SSD

After running for a period of time, the farmer process will exit!

P3-1452 · June 26, 2024, 8:31am

I can report running the AMD on concurrency 3 over more than 24 hours with max memory of 27G as far as I noticed.

On INTEL (concurrency 2) I ran one instance plotting with about 40TB already plotted, memory was around 20G. Later on, added another instance to initiate piece cache sync. Sometime after it started plotting, I found the first instance with 38G memory. It fell to 12G Immediately after closing the second instance. I should say that the second instance was plotting to some same ssd’s as the first instance, though I don’t know if it’s connected to the memory issue.

Regarding many farms on one ssd instead of one farm - I still do that because the initial piece cache sync takes way longer for a 4TB farm than 500GB farm. Of course I could always start earlier…

Topic		Replies	Views
High RAM usage for May 15th release Support	47	384	June 19, 2024
Fake display of high RAM usage or RAM leak on Windows by Subspace farmer Support	132	1371	March 8, 2024
Plotting speed and synchronisation Support	5	671	June 13, 2022
Node fails to create large plots Testing	8	360	March 18, 2023
A plotting application with one SSD is much faster than a plotting application with multiple SSDs Support	8	391	October 22, 2023

High memory usage for June 11th release

Issue Report

Environment

Problem

Related topics