For a long time farmers have been saying that plotting is slow on large CPUs; now it is time to change that! I’ve been hacking on NUMA support that should make things much better, and I need folks to test it and provide feedback to confirm it is actually a positive change.
Please read this post to the very end before replying!
What is changing
Several aspects of the farmer’s behavior will be different.
Global thread pools
Previously, plotting/replotting thread pools were created for each farm separately, even though only a configured number of them can be used at a time (by default just one). With the upcoming changes there is a thread pool manager that creates the necessary number of thread pools and allocates them to the farms that are currently plotting/replotting.
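To make the idea concrete, here is a minimal sketch of such a manager built on rayon thread pools; the names and structure are illustrative assumptions, not the farmer’s actual API:

```rust
// Illustrative sketch only, not the farmer's real implementation.
use rayon::{ThreadPool, ThreadPoolBuildError, ThreadPoolBuilder};

/// Owns a fixed set of plotting/replotting thread pools shared by all farms,
/// instead of every farm creating its own pools.
struct PlottingThreadPoolManager {
    pools: Vec<ThreadPool>,
}

impl PlottingThreadPoolManager {
    /// Create `pool_count` pools (e.g. one per concurrently encoded sector or
    /// one per NUMA node), each with `threads_per_pool` worker threads.
    fn new(pool_count: usize, threads_per_pool: usize) -> Result<Self, ThreadPoolBuildError> {
        let pools = (0..pool_count)
            .map(|_| {
                ThreadPoolBuilder::new()
                    .num_threads(threads_per_pool)
                    .build()
            })
            .collect::<Result<Vec<_>, _>>()?;
        Ok(Self { pools })
    }

    /// Hand out one of the shared pools to a farm that is about to plot or
    /// replot a sector.
    fn pool(&self, index: usize) -> &ThreadPool {
        &self.pools[index % self.pools.len()]
    }
}

fn main() -> Result<(), ThreadPoolBuildError> {
    // e.g. 2 pools of 16 threads each: up to 2 sectors encoded concurrently.
    let manager = PlottingThreadPoolManager::new(2, 16)?;
    let sum: u64 = manager.pool(0).install(|| (0..1_000u64).sum());
    println!("computed {sum} on pool 0");
    Ok(())
}
```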
Thread pinning
When a thread pool is created, it is assigned to a set of CPU cores and will only be able to use those cores. Threads are not pinned to cores 1:1, because I noticed that makes plotting A LOT slower (on my machine at least); instead, the OS is free to move threads between cores, but only within the CPU cores allocated to that thread pool. This ensures that plotting of a particular sector only happens on a particular CPU/NUMA node.
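As a rough illustration of "pin to a set of cores, not 1:1", this is what it looks like on Linux via sched_setaffinity (using the libc crate); treat it as a simplified sketch, not the farmer’s cross-platform implementation:

```rust
// Linux-only sketch using the `libc` crate; illustrative, not the farmer's code.
#[cfg(target_os = "linux")]
fn pin_current_thread_to_cores(cores: &[usize]) -> std::io::Result<()> {
    unsafe {
        let mut cpu_set: libc::cpu_set_t = std::mem::zeroed();
        libc::CPU_ZERO(&mut cpu_set);
        for &core in cores {
            libc::CPU_SET(core, &mut cpu_set);
        }
        // pid 0 means "the calling thread"; the scheduler may still move the
        // thread freely, but only between the cores in `cpu_set`.
        if libc::sched_setaffinity(0, std::mem::size_of::<libc::cpu_set_t>(), &cpu_set) != 0 {
            return Err(std::io::Error::last_os_error());
        }
    }
    Ok(())
}

fn main() -> std::io::Result<()> {
    // Allow this thread to run on any of cores 0..16 (e.g. one NUMA node),
    // but never outside of that set.
    #[cfg(target_os = "linux")]
    pin_current_thread_to_cores(&(0..16).collect::<Vec<_>>())?;
    Ok(())
}
```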
NUMA support
On Linux and Windows the farmer will detect NUMA systems and create a number of thread pools corresponding to the number of NUMA nodes. This means the default behavior will change for large CPUs and will consume more memory as a result, but it can be reverted to the previous behavior with the familiar CLI options if desired.
NOTE: You will have to enable NUMA in your motherboard’s BIOS for the farmer to know it exists. This option is definitely present in motherboards for Threadripper/Epyc processors, but might exist in others too. If you don’t enable it, both the OS and the farmer will think you have a single UMA processor and will not be able to apply these optimizations!
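If you want to double-check what the OS will report, on Linux the NUMA topology is exposed under /sys/devices/system/node (tools like numactl --hardware read the same information). Here is a small sketch that counts nodes that way, purely for verification; it assumes the standard sysfs layout:

```rust
// Sketch: count NUMA nodes the way Linux exposes them in sysfs
// (/sys/devices/system/node/node0, node1, ...). With NUMA disabled in the
// BIOS you will only ever see node0 here.
use std::fs;

fn numa_node_count() -> std::io::Result<usize> {
    let nodes = fs::read_dir("/sys/devices/system/node")?
        .filter_map(Result::ok)
        .filter(|entry| {
            let name = entry.file_name();
            let name = name.to_string_lossy();
            // Keep only entries named "node<digits>".
            name.strip_prefix("node")
                .map(|suffix| !suffix.is_empty() && suffix.chars().all(|c| c.is_ascii_digit()))
                .unwrap_or(false)
        })
        .count();
    Ok(nodes.max(1))
}

fn main() {
    match numa_node_count() {
        Ok(count) => println!("OS reports {count} NUMA node(s)"),
        Err(error) => eprintln!("Could not read sysfs: {error}"),
    }
}
```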
Experimental NUMA-aware memory allocator
The mimalloc allocator we are using apparently has opt-in NUMA-aware allocation support, which is now also exposed in the Subspace farmer.
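For context only: using mimalloc as the global allocator in a Rust binary is the standard setup shown below. This snippet is not the farmer’s exact code and does not by itself enable the NUMA-aware mode, which is the separate opt-in toggled in the testing instructions further down.

```rust
// Standard mimalloc-as-global-allocator setup; shown for context only.
use mimalloc::MiMalloc;

#[global_allocator]
static GLOBAL: MiMalloc = MiMalloc;

fn main() {
    // All heap allocations in the program now go through mimalloc.
    let sector = vec![0u8; 1024 * 1024];
    println!("allocated {} bytes via mimalloc", sector.len());
}
```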
What and how to test
For testing purposes I have created a test build here: Snapshot build · subspace/subspace@1e88a23 · GitHub
You’ll need a GitHub account to see the build artifacts at the bottom of that page; for container images, the gemini-3g-backport-numa-support tag can be used.
To confirm that the changes are positive, I’d like you to test the following scenarios:
1. The last release (dec-22) with the default configuration and no tweaks to thread pools, concurrent encodings, etc., just stock behavior
2. The last release again with whatever CLI tweaks you normally use, if you have made any
3. This experimental build with defaults (don’t change CLI options around thread pools, concurrent encodings, etc.; none of that should be necessary anymore)
4. Test 3 again, but with the environment variable NUMA_ALLOCATOR=1 set, to use the experimental NUMA-aware memory allocator that might further improve performance by keeping both compute and memory mostly within the same NUMA node
Results
Please post results in the following format:
- CPU AMD Epyc 7302 x2, RAM 16G DDR4 x16 (meaning the motherboard has two Epyc 7302 processors installed and 16 memory modules of 16G each)
- 5m10s per sector (one sector is encoded at a time by default)
- 6m0s per sector, 4 sectors at a time (meaning the number of downloaded and encoded sectors was manually increased)
- 4m30s per sector, 8 sectors at a time (meaning 8 NUMA nodes)
- 2m90s per sector, 8 sectors at a time (meaning 8 NUMA nodes)
Where the first item is information about your system and the remaining four correspond to tests 1-4 described above.
Important remarks
First of all, these changes only benefit NUMA systems, AND only if they use multiple drives with the same farmer application, so they can take advantage of concurrent encoding.
Please keep this thread low-bandwidth and only use it to post results.
If you don’t know how to test or how to set environment variables, or you see errors in the process, ask in Discord and someone will help you (please don’t tag me directly).