There’s a significant issue I’ve observed with the May 15th release: it causes abnormally high RAM usage. To put it into perspective, I have around 83 TiB of plots on this machine, and when the issue occurred the farmer was nearing completion of the last 5 of the 33 plots. RAM usage with the May 6th and earlier releases was around 40 GB. With the May 15th release, however, usage was 100 GB after just 10 hours of farming and plotting, and 188 GB after 17 hours.
Also, I’d like to double-check that you actually went back to may-06 and confirmed the memory usage issue is gone there, rather than just recalling that it wasn’t a problem back then.
Yes, I ran the May 6th release before creating this thread to confirm there was no memory issue. I also switched back to May 15th to confirm that the memory issue I observed was reproducible. I will run the test release and share the results. Thanks.
I tried the custom build you shared, but I finished plotting around the same time I started testing it. For farming alone, I don’t see any difference between the April 25th, May 6th, and May 15th builds and the custom build; all of them used around 30 GB of RAM. Is there anything I can do to simulate the plotting process?
I see, interesting. So it must be somehow plotting-related and Windows-specific. I still see nothing that would indicate a regression from may-06 to may-15, though.
Depending on how much time it takes to consume a lot of RAM, you can decrease and then increase the size of the farm to create a few sectors that need to be plotted, just to try it out.
P.S. There is no apr-25 release just like there was no may-16 release (I have edited the original post).
I changed the plot sizes and plotted using the custom release that you shared, and it doesn’t have the memory issue.
Moreover, I updated Windows to the latest version and ran the May 15th release to rule out any Windows issues. The memory usage is still high even after the update.
Thanks for fixing the release dates. I have kept local copies of the releases I have tried, and some of them were labeled with their download dates.
Thanks for testing!
So I see only one change that might potentially impact this, which Snapshot build · subspace/subspace@4d4ccd5 · GitHub adds on top of the previous build. Please give it a try once it is built and let me know if the issue is still present. In the meantime, I’ll look into the changes included there to see what might be going on.
Also, if you can provide the exact full command you’re running the farmer with, as well as logs, that’d be great!
Same here. I only noticed this after the farmer printed something like “allocation of some bytes of memory failed” and exited. This happened on several machines, all Win11 with 64 GB of memory and no more than 60-70 TB of plots per machine. It never happened before; I had to revert back to the May 6th version.
So 647b363 introduced a simple change that allows truly concurrent piece cache reads in many cases. This should improve the performance of piece cache reads and, unless concurrency parameters were heavily customized, should not result in higher memory usage.
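To illustrate the general shape of that change (a rough sketch only, assuming a tokio-style async cache; `PieceCache` and `read_piece` here are placeholders, not the actual farmer types): reads that used to be serialized behind an exclusive lock can now be in flight concurrently, e.g. behind a shared read lock.

```rust
use std::sync::Arc;

use futures::stream::{FuturesUnordered, StreamExt};
use tokio::sync::{Mutex, RwLock};

// Placeholder cache type; the real farmer piece cache internals are different.
struct PieceCache;

impl PieceCache {
    async fn read_piece(&self, piece_index: u64) -> Option<Vec<u8>> {
        // Stand-in for an actual disk read.
        let _ = piece_index;
        Some(vec![0u8; 1024 * 1024])
    }
}

// Before: an exclusive lock means piece cache reads are effectively serialized.
async fn read_pieces_serialized(
    cache: Arc<Mutex<PieceCache>>,
    piece_indices: Vec<u64>,
) -> Vec<Option<Vec<u8>>> {
    let mut pieces = Vec::with_capacity(piece_indices.len());
    for piece_index in piece_indices {
        let cache_guard = cache.lock().await;
        pieces.push(cache_guard.read_piece(piece_index).await);
    }
    pieces
}

// After: a shared read lock allows many reads to proceed at the same time.
async fn read_pieces_concurrently(
    cache: Arc<RwLock<PieceCache>>,
    piece_indices: Vec<u64>,
) -> Vec<Option<Vec<u8>>> {
    piece_indices
        .into_iter()
        .map(|piece_index| {
            let cache = Arc::clone(&cache);
            async move { cache.read().await.read_piece(piece_index).await }
        })
        .collect::<FuturesUnordered<_>>()
        .collect::<Vec<_>>()
        .await
}
```

In a sketch like this, more reads can be in flight at once, but each read’s buffer is short-lived, so by itself that shouldn’t add up to 100+ GB.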
Since this is only happening while plotting, it is reads that are causing issues, but I see no reason why this should be happening. Your CPU has 8 CCDs, so the farmer will plot up to 8 sectors concurrently, and there should be no way for that to use over 100 GB of RAM; I know we have even larger farmers on Linux that do not report such issues.
Can you set the environment variable RUST_LOG to info,subspace_farmer::utils::farmer_piece_cache=trace,subspace_farmer_components::plotting=trace, run one of the problematic builds, and upload the collected logs somewhere? That should give me a bit more detail about what your farmer is doing during plotting and why it might be using so much memory.
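For reference, that RUST_LOG value is a standard tracing filter string: a default info level plus trace-level output from the two modules of interest. A minimal sketch of how such a string is typically consumed, assuming a tracing_subscriber::EnvFilter-based setup (the farmer’s actual logging initialization may differ):

```rust
use tracing_subscriber::EnvFilter;

fn main() {
    // Default everything to `info`, but emit `trace`-level events from the two
    // farmer modules relevant to piece cache reads and plotting.
    let filter = EnvFilter::new(
        "info,subspace_farmer::utils::farmer_piece_cache=trace,subspace_farmer_components::plotting=trace",
    );

    tracing_subscriber::fmt().with_env_filter(filter).init();

    tracing::info!("logging initialized with per-module trace directives");
}
```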
Right now I don’t think that particular first bad commit is wrong on its own, but it likely uncovered an issue elsewhere that wasn’t reproducible before (at least not easily).
Please try it with the environment variable RUST_LOG set to info,subspace_farmer::utils::farmer_piece_getter=trace,subspace_farmer_components::plotting=trace and share the logs like you did last time (note that the environment variable has a slightly different value this time).
So far I don’t see any other issues, but it is also hard to blame over 100 GB of RAM on memory allocator behavior alone, so there must be something going on somewhere.
I am happy to hear that the testing is getting us closer to finding the root cause of the leak. I ran 7f94a49 with the changed RUST_LOG value. This version also has the problem; I stopped it at 70 GB of RAM. The good version stabilizes around 40-50 GB on my setup. LOG
I checked the logs and didn’t see anything unusual. Given that it worked before when piece cache reads were blocking, I decided to try constraining piece-getting concurrency. Please give it a try: Snapshot build · subspace/subspace@5321620 · GitHub
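For context on what constraining piece-getting concurrency means in practice, here is a rough sketch (not the code from that snapshot build; `get_piece` is a placeholder) of bounding the number of in-flight piece requests with a semaphore:

```rust
use std::sync::Arc;

use futures::stream::{FuturesUnordered, StreamExt};
use tokio::sync::Semaphore;

// Placeholder for the real piece getter; only here to make the sketch compile.
async fn get_piece(piece_index: u64) -> Option<Vec<u8>> {
    let _ = piece_index;
    Some(vec![0u8; 1024 * 1024])
}

/// Fetch pieces with at most `max_concurrent` requests actually running, so the
/// memory held by in-progress requests stays bounded instead of growing with demand.
async fn get_pieces_bounded(
    piece_indices: Vec<u64>,
    max_concurrent: usize,
) -> Vec<Option<Vec<u8>>> {
    let semaphore = Arc::new(Semaphore::new(max_concurrent));

    piece_indices
        .into_iter()
        .map(|piece_index| {
            let semaphore = Arc::clone(&semaphore);
            async move {
                // The permit is held for the duration of this piece request.
                let _permit = semaphore
                    .acquire_owned()
                    .await
                    .expect("semaphore is never closed");
                get_piece(piece_index).await
            }
        })
        .collect::<FuturesUnordered<_>>()
        .collect::<Vec<_>>()
        .await
}
```

If the memory growth is driven by too many piece requests (and their buffers) being alive at once during plotting, a limit along these lines should flatten it.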
So far it does look to me like allocator misbehavior of some sort, not that this helps you as a user.