As requested, I have completed the comparison of performance metrics between a single plotter instance and multiple instances. The analysis covers plotting time, GPU power usage, GPU utilization (%), and GPU memory usage. I've also included nvtop graphs to highlight any significant differences. I hope this helps in understanding how to bring single-instance performance closer to that of multiple instances.
Performance Benchmarks
These performance benchmarks were conducted using two RTX 4090 GPUs with the following approach:
- Snapshot build #383
- A total of nine drives were used to prevent them from becoming a bottleneck.
- After adding each plotter to the cluster, I allowed at least one minute of plotting to reduce the effect of the initial ramp-up period.
- GPU statistics were sampled from each GPU every five seconds and averaged over a 300-second period.
- Sector times were calculated from the sectors successfully plotted during the same 300-second period, so they align with the GPU statistics.
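The sampling and averaging steps above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the script used for these benchmarks: it assumes `nvidia-smi` is on the PATH and reports numeric values for every field, and the function names (`parse_sample`, `collect`, `summarize`, `sector_metrics`) are hypothetical.

```python
import subprocess
import time
from collections import defaultdict

WINDOW_SECONDS = 300   # averaging window described above
INTERVAL_SECONDS = 5   # sampling interval described above

# Standard nvidia-smi query; nounits strips the "%", "MiB" and "W" suffixes.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,utilization.gpu,memory.used,power.draw",
    "--format=csv,noheader,nounits",
]


def parse_sample(output):
    """Parse one CSV reading into {gpu_index: (utilization, memory, power)}."""
    sample = {}
    for line in output.strip().splitlines():
        index, util, mem, power = [field.strip() for field in line.split(",")]
        sample[int(index)] = (float(util), float(mem), float(power))
    return sample


def collect(window=WINDOW_SECONDS, interval=INTERVAL_SECONDS):
    """Poll nvidia-smi every `interval` seconds for `window` seconds."""
    samples = []
    for _ in range(window // interval):
        result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
        samples.append(parse_sample(result.stdout))
        time.sleep(interval)
    return samples


def summarize(samples):
    """Return per-GPU max and average for utilization, memory and power."""
    per_gpu = defaultdict(list)
    for sample in samples:
        for index, values in sample.items():
            per_gpu[index].append(values)
    summary = {}
    for index, rows in per_gpu.items():
        columns = list(zip(*rows))  # one tuple per metric
        summary[index] = [
            {"max": max(col), "avg": sum(col) / len(col)} for col in columns
        ]
    return summary


def sector_metrics(sectors, window=WINDOW_SECONDS):
    """Sectors per minute and average seconds per sector over the window."""
    per_minute = sectors / (window / 60)
    avg_seconds = window / sectors if sectors else float("inf")
    return per_minute, avg_seconds
```

For example, `sector_metrics(51)` gives 10.2 sectors per minute and roughly 5.88 seconds per sector, which is in line with the two-instance result.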
Observations
- The sector times are somewhat skewed by failed sectors, which waste processing time (see the log excerpt below). This is a long-standing issue that has been reported before, and it makes ideal conditions difficult to replicate.
- Gaps in GPU utilization were resolved by running more than four concurrent plotters.
2024-09-15T16:12:45.525621Z WARN {farm_index=3}:{sector_index=37}: subspace_farmer::single_disk_farm::plotting: Failed to plot sector, retrying in 1s error=Low-level plotting error: Plotting progress stream ended before plotting finished
2024-09-15T16:12:45.540373Z WARN {farm_index=3}:{sector_index=31}: subspace_farmer::cluster::nats_client: Received unexpected response stream index, aborting stream actual_index=16 expected_index=15 message_type=subspace_farmer::cluster::plotter::ClusterSectorPlottingProgress response_subject=stream-response.01J7V767AZR6MEJKYSTQAZFBXS
2024-09-15T16:12:45.540633Z WARN {farm_index=3}:{sector_index=31}: subspace_farmer::single_disk_farm::plotting: Failed to plot sector, retrying in 1s error=Low-level plotting error: Plotting progress stream ended before plotting finished
2024-09-15T16:12:45.655387Z INFO {farm_index=0}:{sector_index=19}: subspace_farmer::single_disk_farm::plotting: Plotting sector retry
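To get a feel for how often this happens during a run, the failed-sector warnings can be counted per farm. The snippet below is only a sketch: the regular expression is based on the log excerpt above, and the function name `count_failed_sectors` is hypothetical.

```python
import re
from collections import Counter

# Matches the "Failed to plot sector" warnings in the excerpt above.
FAILED = re.compile(
    r"WARN \{farm_index=(\d+)\}:\{sector_index=(\d+)\}: "
    r"subspace_farmer::single_disk_farm::plotting: Failed to plot sector"
)


def count_failed_sectors(lines):
    """Count failed-sector warnings per farm index."""
    failures = Counter()
    for line in lines:
        match = FAILED.search(line)
        if match:
            failures[int(match.group(1))] += 1
    return failures
```

Applied to the four lines above, this counts two failures, both on farm 3. The `nats_client` warning and the retry message are not counted.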
Thoughts
- I reran the benchmark with a concurrency of six because the initial numbers appeared inconsistent, which is why there are two sets of results for that run.
- A concurrency of eight yielded the best results yesterday, with a sector time of 2.7 seconds. It appears that for these two GPUs, a concurrency of seven or eight is the optimal setting.
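As a rough way to compare the runs, throughput can be expressed relative to the single-instance result. The sketch below copies the sector counts from the Results section. Taking the better of the two six-plotter runs is my own assumption, and the function name `speedups` is hypothetical.

```python
# Sectors plotted per 300-second window, copied from the Results section.
# The six-plotter run appears twice because it was rerun.
SECTORS = {
    1: [35],
    2: [51],
    3: [83],
    4: [68],
    5: [69],
    6: [46, 69],
    7: [102],
    8: [86],
}


def speedups(sectors, baseline=1):
    """Throughput of each concurrency level relative to the baseline.

    Assumption: when a level was run more than once, the best run is used.
    """
    base = max(sectors[baseline])
    return {n: max(runs) / base for n, runs in sectors.items()}


if __name__ == "__main__":
    for instances, ratio in speedups(SECTORS).items():
        print(f"{instances} plotter(s): {ratio:.2f}x")
```

In this data set, seven plotters come out on top at about 2.9 times the single-instance throughput.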
Results
========================================
Monitoring Results - 2024-09-15 10:23:11
Number of Plotter Instances: 1
Plotter Performance: Sectors: 35 | Avg/min: 7.00 | Avg time: 8.58 sec | TiB/day: 9.83
GPU 0:
Utilization: Max: 93%, Avg: 72.65%
Memory Usage: Max: 1940MB, Avg: 1939.65MB
Power Draw: Max: 211.31W, Avg: 180.69W
GPU 1:
Utilization: Max: 91%, Avg: 69.41%
Memory Usage: Max: 1940MB, Avg: 1939.51MB
Power Draw: Max: 233.84W, Avg: 189.48W
========================================
========================================
Monitoring Results - 2024-09-15 08:55:36
Number of Plotter Instances: 2
Plotter Performance: Sectors: 51 | Avg/min: 10.20 | Avg time: 5.88 sec | TiB/day: 14.12
GPU 0:
Utilization: Max: 100%, Avg: 57.55%
Memory Usage: Max: 3098MB, Avg: 3096.27MB
Power Draw: Max: 202.28W, Avg: 119.89W
GPU 1:
Utilization: Max: 100%, Avg: 82.89%
Memory Usage: Max: 3098MB, Avg: 3096.79MB
Power Draw: Max: 243.57W, Avg: 213.80W
========================================
========================================
Monitoring Results - 2024-09-15 09:03:35
Number of Plotter Instances: 3
Plotter Performance: Sectors: 83 | Avg/min: 16.60 | Avg time: 3.61 sec | TiB/day: 23.00
GPU 0:
Utilization: Max: 100%, Avg: 47.44%
Memory Usage: Max: 4255MB, Avg: 4250.44MB
Power Draw: Max: 204.78W, Avg: 124.94W
GPU 1:
Utilization: Max: 100%, Avg: 51.44%
Memory Usage: Max: 4255MB, Avg: 4250.48MB
Power Draw: Max: 241.26W, Avg: 159.30W
========================================
========================================
Monitoring Results - 2024-09-15 09:10:41
Number of Plotter Instances: 4
Plotter Performance: Sectors: 68 | Avg/min: 13.60 | Avg time: 4.41 sec | TiB/day: 18.83
GPU 0:
Utilization: Max: 100%, Avg: 70.15%
Memory Usage: Max: 5412MB, Avg: 5065.62MB
Power Draw: Max: 208.46W, Avg: 152.19W
GPU 1:
Utilization: Max: 100%, Avg: 62.29%
Memory Usage: Max: 5412MB, Avg: 5407.48MB
Power Draw: Max: 243.09W, Avg: 174.67W
========================================
========================================
Monitoring Results - 2024-09-15 09:18:39
Number of Plotter Instances: 5
Plotter Performance: Sectors: 69 | Avg/min: 13.80 | Avg time: 4.34 sec | TiB/day: 19.13
GPU 0:
Utilization: Max: 100%, Avg: 100.00%
Memory Usage: Max: 6570MB, Avg: 6536.79MB
Power Draw: Max: 190.84W, Avg: 187.16W
GPU 1:
Utilization: Max: 100%, Avg: 100.00%
Memory Usage: Max: 6570MB, Avg: 6569.75MB
Power Draw: Max: 245.54W, Avg: 244.37W
========================================
========================================
Monitoring Results - 2024-09-15 09:28:23
Number of Plotter Instances: 6
Plotter Performance: Sectors: 46 | Avg/min: 9.20 | Avg time: 6.52 sec | TiB/day: 12.73
GPU 0:
Utilization: Max: 100%, Avg: 100.00%
Memory Usage: Max: 7727MB, Avg: 7725.86MB
Power Draw: Max: 194.98W, Avg: 189.99W
GPU 1:
Utilization: Max: 100%, Avg: 100.00%
Memory Usage: Max: 7729MB, Avg: 7727.51MB
Power Draw: Max: 245.05W, Avg: 240.14W
========================================
========================================
Monitoring Results - 2024-09-15 09:39:44
Number of Plotter Instances: 6
Plotter Performance: Sectors: 69 | Avg/min: 13.80 | Avg time: 4.34 sec | TiB/day: 19.13
GPU 0:
Utilization: Max: 100%, Avg: 100.00%
Memory Usage: Max: 7729MB, Avg: 7729.00MB
Power Draw: Max: 190.69W, Avg: 188.27W
GPU 1:
Utilization: Max: 100%, Avg: 100.00%
Memory Usage: Max: 7729MB, Avg: 7728.86MB
Power Draw: Max: 245.90W, Avg: 244.38W
========================================
========================================
Monitoring Results - 2024-09-15 09:47:11
Number of Plotter Instances: 7
Plotter Performance: Sectors: 102 | Avg/min: 20.40 | Avg time: 2.94 sec | TiB/day: 28.24
GPU 0:
Utilization: Max: 100%, Avg: 100.00%
Memory Usage: Max: 8886MB, Avg: 8885.75MB
Power Draw: Max: 195.37W, Avg: 193.15W
GPU 1:
Utilization: Max: 100%, Avg: 100.00%
Memory Usage: Max: 8886MB, Avg: 8885.79MB
Power Draw: Max: 240.46W, Avg: 238.56W
========================================
========================================
Monitoring Results - 2024-09-15 09:56:44
Number of Plotter Instances: 8
Plotter Performance: Sectors: 86 | Avg/min: 17.20 | Avg time: 3.48 sec | TiB/day: 23.86
GPU 0:
Utilization: Max: 100%, Avg: 100.00%
Memory Usage: Max: 10044MB, Avg: 10043.86MB
Power Draw: Max: 198.86W, Avg: 196.50W
GPU 1:
Utilization: Max: 100%, Avg: 100.00%
Memory Usage: Max: 10044MB, Avg: 10043.24MB
Power Draw: Max: 238.93W, Avg: 234.99W
========================================