Is this NUMA test result real?

z_W · January 3, 2024, 7:24am

CPU AMD Epyc 7302 x2, RAM 16G DDR4 x16 (meaning motherboard has two Epyc 7302 processors installed and 16 memory modules, 16G each)
5m10s per sector (one sector is encoded at a time by default)
6m0s per sector, 4 sectors at a time (meaning number of downloaded and encoded sectors was manually increased)
4m30s per sector, 8 sectors at a time (meaning 8 NUMA nodes)
2m90s per sector, 8 sectors at a time (meaning 8 NUMA no

nazar-pc · January 3, 2024, 11:49am

Looks plausible, though the impact of the NUMA-aware memory allocator seems huge. 7002-series Epyc processors use an I/O die, so I don’t think there should be a significant difference between accessing any of the memory channels. You do have two physical sockets, though, and maybe crossing from one socket to another is very costly on the 7002 Epyc platform.

If this is confirmed after repeated tests, we might make the NUMA-aware memory allocator the default, because the negative impact on other platforms is limited and the benefit here is massive.

z_W · January 3, 2024, 12:11pm

The optimization in the new version is still not as fast as running multiple instances of the software.

nazar-pc · January 3, 2024, 12:16pm

Your numbers say the opposite though

z_W · January 3, 2024, 12:28pm

This is the test data you gave.

nazar-pc · January 3, 2024, 12:30pm

I mean in your first message version of the farmer with NUMA support is faster than version without NUMA support (even when configured to plot 4 sectors at a time). Why are you saying it is not as fast as running multiple instances?

z_W · January 3, 2024, 12:34pm

I have a server with only two NUMA nodes. Running the test version, the speed is 5m-6m*2, but I can only run one instance of the software. When I run two, the speed drops to over 10 minutes.

Without the NUMA version, I can run four instances, each running stably at 7m*1.

EPYC7302*2 is your CPU

nazar-pc · January 3, 2024, 12:40pm

Hm… the whole point of the new version is to utilize CPU fully, you shouldn’t need more than one instance because it’ll be less efficient, which is exactly what you see. Running multiple instances was a workaround for not supporting NUMA that is no longer necessary.

nazar-pc · January 3, 2024, 12:46pm

Ah, sorry for confusion. Those were just examples, they are made up numbers and just provided for illustration purposes to show how to submit test results.

z_W · January 3, 2024, 1:10pm

My CPU has many cores, but there are only two NUMA nodes. I expect the ideal speed for my CPU to be 7m-8m*4. However, the actual speed is 5m-6m*2.

nazar-pc · January 3, 2024, 1:18pm

Why?

Isn’t this a good thing?

z_W · January 3, 2024, 1:39pm

Assuming I have 4 SSDs: when running four instances of the software (one SSD each), the speed is 7m-8m per sector, 1 sector at a time in each of the 4 instances. Using the new version and opening only one instance of the software for all 4 SSDs, the speed is 5m-6m per sector, 2 sectors at a time.
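
The comparison above is easier to judge as total sectors per hour. A minimal sketch of that arithmetic, assuming the midpoints of the quoted ranges (7.5 and 5.5 minutes) and reading the shorthand as "sectors at a time" times "number of instances"; the function name is hypothetical and not part of the farmer:

```python
def sectors_per_hour(minutes_per_sector, sectors_at_a_time, instances):
    """Aggregate plotting throughput for one setup.

    Each instance finishes `sectors_at_a_time` sectors every
    `minutes_per_sector` minutes.
    """
    return instances * sectors_at_a_time * 60.0 / minutes_per_sector


# Midpoints of the ranges quoted in the post (an assumption for illustration).
multi = sectors_per_hour(7.5, sectors_at_a_time=1, instances=4)
single = sectors_per_hour(5.5, sectors_at_a_time=2, instances=1)

print(f"4 separate instances:  {multi:.1f} sectors/hour")
print(f"1 NUMA-aware instance: {single:.1f} sectors/hour")
```

Under these assumptions the four separate instances come out ahead in total (about 32 versus about 22 sectors per hour), even though each individual sector finishes faster with the single NUMA-aware instance.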

nazar-pc · January 3, 2024, 2:02pm

Did you customize any CLI options related to thread pool size or number of encoded sectors in new version? They will interfere with the intended behavior.

Overall, it is possible that some configurations will still be non-ideal on some CPUs. Your 8272CL, for example, is simply not the most optimal CPU here: it has just 2 NUMA nodes and such a massive number of cores in each that many algorithms will not take full advantage of them.

You should still be able to benefit by running two instances instead of 4. In worst case you’ll just run 4 instances like before. As long as performance doesn’t regress I think it is a win because NUMA support is clearly better than previous default.

BTW, with new version threads are pinned to cores, so if you specify encoding concurrency to 4, you should get very good CPU utilization while also avoiding crossing NUMA nodes with just one farmer.

z_W · January 3, 2024, 2:35pm

When I start a software process, I use the default parameters.

I am trying to start two software processes. Can I start two processes with these parameters?

--sector-downloading-concurrency 4 
--sector-encoding-concurrency 4
--farming-thread-pool-size 10
--plotting-thread-pool-size 16
--replotting-thread-pool-size 8

so?

nazar-pc · January 3, 2024, 2:59pm

If you want to have 4 farms plotted at the same time with one instance of the farmer, I would recommend specifying a single option:

--sector-encoding-concurrency 4

Farmer should be able to calculate all other options automatically in an optimal way.

This will result in half of each CPU being dedicated to plotting a single sector; replotting will be configured to use 1/4 of the CPU cores, and downloading concurrency will be set to the optimal value of 5. If you want overlap between sectors during plotting, you might also add --plotting-thread-pool-size 52: each NUMA node will then be processing 2 farms at the same time, but there will still be no NUMA node crossing.
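
The split described above can be sketched as simple arithmetic. This is only an illustration, not the farmer's actual logic: the function name is hypothetical, and 52 hardware threads per NUMA node is an assumption inferred from the `52` figure in the post.

```python
def plotting_layout(threads_per_node, numa_nodes, encoding_concurrency):
    """Return (sectors_per_node, threads_per_sector) for an even split."""
    # Spread concurrently encoded sectors evenly across NUMA nodes (round up).
    sectors_per_node = -(-encoding_concurrency // numa_nodes)
    # Each sector gets an equal share of its node's hardware threads.
    threads_per_sector = threads_per_node // sectors_per_node
    return sectors_per_node, threads_per_sector


# Assumed hardware: 2 NUMA nodes with 52 hardware threads each.
sectors, threads = plotting_layout(52, 2, 4)
print(f"{sectors} sectors per node, {threads} threads per sector")
```

With an encoding concurrency of 4 on two nodes, each node handles 2 sectors and each sector gets half of that node's threads, which matches the "half of each CPU" description.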

I’m fairly certain it will be more efficient than running multiple farmer instances, especially if you’re not pinning them to NUMA nodes.

z_W · January 3, 2024, 3:04pm

I’ll try the parameters you recommended

z_W · January 3, 2024, 3:49pm

nazar-pc · January 3, 2024, 4:12pm

So ~4m45s per sector, not very fast for such a system. You can set the number of plotting threads to 104 to achieve the same result as running multiple separate farmers. This is up to you to experiment and share the findings.

z_W · January 4, 2024, 4:18am

It is slower.

--farming-thread-pool-size 10
--plotting-thread-pool-size 16
--replotting-thread-pool-size 8

I usually use these parameters to start four instances of the software

z_W · January 7, 2024, 3:27pm

2024-01-07T15:25:39.261781Z  INFO subspace_farmer::commands::farm: NUMA system detected numa_nodes=2
2024-01-07T15:25:39.261794Z  WARN subspace_farmer::commands::farm: Too few disk farms, CPU will not be utilized fully during plotting, same number of farms as NUMA nodes or more is recommended numa_nodes=2 farms_count=4

why is that?
