NUMA-aware plotter slows during replotting

Issue Report

The NUMA-aware plotter slows down as soon as one NUMA node switches to replotting.
It appears that all NUMA nodes then drop to the slower replotting speed.

Environment

Ubuntu 22.04
Pre-release NUMA plotter (likely also applies to Jan-03)

Problem

2023-12-31T11:22:30.780633Z  INFO single_disk_farm{disk_farm_index=1}: subspace_farmer::single_disk_farm::plotting: Plotting sector (99.85% complete) sector_index=3414
2023-12-31T11:31:38.744687Z  INFO single_disk_farm{disk_farm_index=1}: subspace_farmer::single_disk_farm::plotting: Plotting sector (99.88% complete) sector_index=3415
2023-12-31T11:31:52.698893Z  INFO single_disk_farm{disk_farm_index=0}: subspace_farmer::single_disk_farm::plotting: Plotting sector (80.38% complete) sector_index=3233
2023-12-31T11:41:12.563662Z  INFO single_disk_farm{disk_farm_index=1}: subspace_farmer::single_disk_farm::plotting: Plotting sector (99.91% complete) sector_index=3416
2023-12-31T11:42:05.044334Z  INFO single_disk_farm{disk_farm_index=0}: subspace_farmer::single_disk_farm::plotting: Plotting sector (80.41% complete) sector_index=3234
2023-12-31T11:50:25.811524Z  INFO single_disk_farm{disk_farm_index=1}: subspace_farmer::single_disk_farm::plotting: Plotting sector (99.94% complete) sector_index=3417
2023-12-31T11:52:55.426509Z  INFO single_disk_farm{disk_farm_index=0}: subspace_farmer::single_disk_farm::plotting: Plotting sector (80.43% complete) sector_index=3235
2023-12-31T11:59:36.592787Z  INFO single_disk_farm{disk_farm_index=1}: subspace_farmer::single_disk_farm::plotting: Plotting sector (99.97% complete) sector_index=3418
2023-12-31T12:03:49.914380Z  INFO single_disk_farm{disk_farm_index=0}: subspace_farmer::single_disk_farm::plotting: Plotting sector (80.46% complete) sector_index=3236
2023-12-31T12:08:49.026389Z  INFO single_disk_farm{disk_farm_index=1}: subspace_farmer::single_disk_farm::plotting: Initial plotting complete
2023-12-31T12:08:49.175253Z  INFO single_disk_farm{disk_farm_index=1}: subspace_farmer::single_disk_farm::plotting: Replotting sector (0.00% complete) sector_index=214
2023-12-31T12:15:15.124323Z  INFO single_disk_farm{disk_farm_index=0}: subspace_farmer::single_disk_farm::plotting: Plotting sector (80.48% complete) sector_index=3237
2023-12-31T12:22:42.437863Z  INFO single_disk_farm{disk_farm_index=1}: subspace_farmer::single_disk_farm::plotting: Replotting sector (0.67% complete) sector_index=240
2023-12-31T12:27:49.942203Z  INFO single_disk_farm{disk_farm_index=0}: subspace_farmer::single_disk_farm::plotting: Plotting sector (80.51% complete) sector_index=3238
2023-12-31T12:36:30.308317Z  INFO single_disk_farm{disk_farm_index=1}: subspace_farmer::single_disk_farm::plotting: Replotting sector (1.33% complete) sector_index=307
2023-12-31T12:40:25.007955Z  INFO single_disk_farm{disk_farm_index=0}: subspace_farmer::single_disk_farm::plotting: Plotting sector (80.53% complete) sector_index=3239
2023-12-31T12:50:29.088330Z  INFO single_disk_farm{disk_farm_index=1}: subspace_farmer::single_disk_farm::plotting: Replotting sector (2.00% complete) sector_index=318
2023-12-31T12:53:10.126838Z  INFO single_disk_farm{disk_farm_index=0}: subspace_farmer::single_disk_farm::plotting: Plotting sector (80.56% complete) sector_index=3240
2023-12-31T13:04:14.383131Z  INFO single_disk_farm{disk_farm_index=1}: subspace_farmer::single_disk_farm::plotting: Replotting sector (2.67% complete) sector_index=429
2023-12-31T13:05:40.415630Z  INFO single_disk_farm{disk_farm_index=0}: subspace_farmer::single_disk_farm::plotting: Plotting sector (80.58% complete) sector_index=3241
2023-12-31T13:17:59.150840Z  INFO single_disk_farm{disk_farm_index=1}: subspace_farmer::single_disk_farm::plotting: Replotting sector (3.33% complete) sector_index=436
2023-12-31T13:18:15.119444Z  INFO single_disk_farm{disk_farm_index=0}: subspace_farmer::single_disk_farm::plotting: Plotting sector (80.61% complete) sector_index=3242
2023-12-31T13:30:59.399356Z  INFO single_disk_farm{disk_farm_index=0}: subspace_farmer::single_disk_farm::plotting: Plotting sector (80.63% complete) sector_index=3243

Plot times per sector on the farm that is still plotting (disk_farm_index=0) rise from roughly 10 min to over 12 min after the other NUMA node (disk_farm_index=1) starts replotting.
(The change looks small here, but it is much larger on dedicated hardware; on this server, Subspace plotting accounts for only a small percentage of CPU usage.)
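For anyone who wants to check the same thing on their own logs, here is a rough sketch that computes the time between consecutive "Plotting sector" / "Replotting sector" lines per farm. It assumes the log format is exactly as pasted above; the function name `sector_intervals` and the regular expression are my own and not part of the farmer.

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches lines like the ones pasted above (format assumed from this thread):
# <timestamp>  INFO single_disk_farm{disk_farm_index=N}: <module>: Plotting sector (...)
LINE_RE = re.compile(
    r"^(?P<ts>\S+)\s+INFO single_disk_farm\{disk_farm_index=(?P<farm>\d+)\}: "
    r"\S+: (?P<kind>Plotting|Replotting) sector "
)


def sector_intervals(lines):
    """Return {farm_index: [(kind, minutes), ...]}.

    Each entry is the time between two consecutive sector lines of the same
    farm, labelled with the kind (Plotting/Replotting) of the earlier line,
    i.e. the sector that was being worked on during that interval.
    """
    last_seen = {}  # farm index -> (timestamp, kind) of the previous sector line
    intervals = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # e.g. "Initial plotting complete" or unrelated lines
        ts = datetime.strptime(match["ts"], "%Y-%m-%dT%H:%M:%S.%fZ")
        farm = int(match["farm"])
        if farm in last_seen:
            prev_ts, prev_kind = last_seen[farm]
            minutes = (ts - prev_ts).total_seconds() / 60
            intervals[farm].append((prev_kind, round(minutes, 1)))
        last_seen[farm] = (ts, match["kind"])
    return dict(intervals)
```

Feeding it an open log file (for example `sector_intervals(open("farmer.log"))`, where the file name is just a placeholder) gives one list of intervals per farm, so the jump on the farm that is still plotting is easy to spot.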

I can see how this might be possible, and it indeed only affects the case where initial plotting and replotting run at the same time.

Created a GitHub issue about this: Pair plotting and replotting thread pools into pairs (Issue #2385, subspace/subspace)

Yes, the issue only shows up when plotting and replotting run at the same time. A NUMA node that finishes plotting without starting to replot has no effect.

I reproduced the issue on the TR with 8 NUMA nodes. Interestingly, not all nodes slow their plotting. Of the 2 nodes still plotting, one stays at 9-10 min/sector while the other slowed to 15 min/sector (2 NUMA nodes plotting, 3 replotting, 3 on break).

Yes, the slowdown only happens when plotting and replotting overlap.

Will be fixed in the upcoming release with: NUMA support improvements by nazar-pc (Pull Request #2392, subspace/subspace)

Fixed in release gemini-3g-2024-jan-08 (subspace/subspace)

Confirmed working. Thanks!