Abnormal plotting progress in farmer cluster

Issue Report

Environment

  • Operating System:Ubuntu 22.04.4 LTS
  • Pulsar/Advanced CLI/Docker:subspace-farmer-ubuntu-x86_64-skylake-gemini-3h-2024-jun-11

Problem

I’ve discovered an issue with the farmer cluster: one of my devices showed a plotting progress of 25%, but when I terminated the process and restarted it, the plotting progress started from 11%. I’ve observed this regression in plotting progress on three devices already.

From the following logs, it can be seen that the progress for some directories is correct, while the progress for others still starts from 11%.

2024-06-12T06:19:00.023146Z  INFO {farm_index=6}:{sector_index=844}: subspace_farmer::single_disk_farm::plotting: Plotting sector (11.78% complete)
2024-06-12T06:19:00.023179Z  INFO {farm_index=9}:{sector_index=1797}: subspace_farmer::single_disk_farm::plotting: Plotting sector (25.08% complete)
2024-06-12T06:19:11.930200Z  INFO {farm_index=9}:{sector_index=1798}: subspace_farmer::single_disk_farm::plotting: Plotting sector (25.10% complete)
2024-06-12T06:19:17.057047Z  INFO {farm_index=8}:{sector_index=836}: subspace_farmer::single_disk_farm::plotting: Plotting sector (11.67% complete)
2024-06-12T06:19:20.532444Z  INFO {farm_index=8}:{sector_index=837}: subspace_farmer::single_disk_farm::plotting: Plotting sector (11.68% complete)
2024-06-12T06:19:20.532491Z  INFO {farm_index=9}:{sector_index=1799}: subspace_farmer::single_disk_farm::plotting: Plotting sector (25.11% complete)
2024-06-12T06:19:21.007924Z  INFO {farm_index=4}:{sector_index=826}: subspace_farmer::single_disk_farm::plotting: Plotting sector (11.53% complete)
2024-06-12T06:19:21.770713Z  INFO {farm_index=4}:{sector_index=827}: subspace_farmer::single_disk_farm::plotting: Plotting sector (11.54% complete)
2024-06-12T06:19:32.795749Z  INFO {farm_index=6}:{sector_index=845}: subspace_farmer::single_disk_farm::plotting: Plotting sector (11.80% complete)
2024-06-12T06:19:34.491403Z  INFO {farm_index=6}:{sector_index=846}: subspace_farmer::single_disk_farm::plotting: Plotting sector (11.81% complete)
2024-06-12T06:19:36.100887Z  INFO {farm_index=6}:{sector_index=847}: subspace_farmer::single_disk_farm::plotting: Plotting sector (11.82% complete)
2024-06-12T06:19:36.210342Z  INFO {farm_index=6}:{sector_index=848}: subspace_farmer::single_disk_farm::plotting: Plotting sector (11.84% complete)
2024-06-12T06:19:36.302947Z  INFO {farm_index=6}:{sector_index=849}: subspace_farmer::single_disk_farm::plotting: Plotting sector (11.85% complete)
2024-06-12T06:19:40.789982Z  INFO {farm_index=6}:{sector_index=850}: subspace_farmer::single_disk_farm::plotting: Plotting sector (11.86% complete)
2024-06-12T06:19:46.113586Z  INFO {farm_index=6}:{sector_index=851}: subspace_farmer::single_disk_farm::plotting: Plotting sector (11.88% complete)
2024-06-12T06:19:47.563037Z  INFO {farm_index=6}:{sector_index=852}: subspace_farmer::single_disk_farm::plotting: Plotting sector (11.89% complete)
2024-06-12T06:19:47.800029Z  INFO {farm_index=6}:{sector_index=853}: subspace_farmer::single_disk_farm::plotting: Plotting sector (11.91% complete)
2024-06-12T06:19:48.628995Z  INFO {farm_index=6}:{sector_index=854}: subspace_farmer::single_disk_farm::plotting: Plotting sector (11.92% complete)
2024-06-12T06:20:04.700004Z  INFO {farm_index=3}:{sector_index=1719}: subspace_farmer::single_disk_farm::plotting: Plotting sector (23.99% complete)
2024-06-12T06:20:11.049328Z  INFO {farm_index=3}:{sector_index=1720}: subspace_farmer::single_disk_farm::plotting: Plotting sector (24.01% complete)
2024-06-12T06:20:30.394135Z  INFO {farm_index=11}:{sector_index=901}: subspace_farmer::single_disk_farm::plotting: Plotting sector (12.58% complete)
2024-06-12T06:20:39.720970Z  INFO {farm_index=3}:{sector_index=1721}: subspace_farmer::single_disk_farm::plotting: Plotting sector (24.02% complete)
2024-06-12T06:20:42.659259Z  INFO {farm_index=3}:{sector_index=1722}: subspace_farmer::single_disk_farm::plotting: Plotting sector (24.04% complete)
2024-06-12T06:20:43.338458Z  INFO {farm_index=6}:{sector_index=855}: subspace_farmer::single_disk_farm::plotting: Plotting sector (11.93% complete)
2024-06-12T06:20:46.530493Z  INFO {farm_index=6}:{sector_index=856}: subspace_farmer::single_disk_farm::plotting: Plotting sector (11.95% complete)
2024-06-12T06:20:46.581016Z  INFO {farm_index=3}:{sector_index=1723}: subspace_farmer::single_disk_farm::plotting: Plotting sector (24.05% complete)
2024-06-12T06:20:49.157432Z  INFO {farm_index=6}:{sector_index=857}: subspace_farmer::single_disk_farm::plotting: Plotting sector (11.96% complete)
2024-06-12T06:20:50.985764Z  INFO {farm_index=6}:{sector_index=858}: subspace_farmer::single_disk_farm::plotting: Plotting sector (11.98% complete)
2024-06-12T06:21:32.231964Z  INFO {farm_index=5}:{sector_index=834}: subspace_farmer::single_disk_farm::plotting: Plotting sector (11.64% complete)
2024-06-12T06:21:32.250923Z  INFO {farm_index=5}:{sector_index=835}: subspace_farmer::single_disk_farm::plotting: Plotting sector (11.66% complete)
2024-06-12T06:21:32.926954Z  INFO {farm_index=5}:{sector_index=836}: subspace_farmer::single_disk_farm::plotting: Plotting sector (11.67% complete)
2024-06-12T06:21:40.592100Z  INFO {farm_index=5}:{sector_index=837}: subspace_farmer::single_disk_farm::plotting: Plotting sector (11.68% complete)
2024-06-12T06:21:42.299103Z  INFO {farm_index=5}:{sector_index=838}: subspace_farmer::single_disk_farm::plotting: Plotting sector (11.70% complete)
2024-06-12T06:21:42.321624Z  INFO {farm_index=5}:{sector_index=839}: subspace_farmer::single_disk_farm::plotting: Plotting sector (11.71% complete)
2024-06-12T06:21:46.532787Z  INFO {farm_index=6}:{sector_index=859}: subspace_farmer::single_disk_farm::plotting: Plotting sector (11.99% complete)
2024-06-12T06:21:46.935228Z  INFO {farm_index=5}:{sector_index=840}: subspace_farmer::single_disk_farm::plotting: Plotting sector (11.73% complete)
2024-06-12T06:21:47.234652Z  INFO {farm_index=6}:{sector_index=860}: subspace_farmer::single_disk_farm::plotting: Plotting sector (12.00% complete)
2024-06-12T06:22:19.819054Z  INFO {farm_index=7}:{sector_index=1887}: subspace_farmer::single_disk_farm::plotting: Plotting sector (26.34% complete)
2024-06-12T06:22:19.835337Z  INFO {farm_index=7}:{sector_index=1888}: subspace_farmer::single_disk_farm::plotting: Plotting sector (26.35% complete)
2024-06-12T06:22:19.853795Z  INFO {farm_index=7}:{sector_index=1889}: subspace_farmer::single_disk_farm::plotting: Plotting sector (26.37% complete)
2024-06-12T06:22:19.885186Z  INFO {farm_index=7}:{sector_index=1890}: subspace_farmer::single_disk_farm::plotting: Plotting sector (26.38% complete)
2024-06-12T06:22:25.708600Z  INFO {farm_index=7}:{sector_index=1891}: subspace_farmer::single_disk_farm::plotting: Plotting sector (26.40% complete)
2024-06-12T06:22:28.180803Z  INFO {farm_index=7}:{sector_index=1892}: subspace_farmer::single_disk_farm::plotting: Plotting sector (26.41% complete)
2024-06-12T06:22:34.929596Z  INFO {farm_index=6}:{sector_index=861}: subspace_farmer::single_disk_farm::plotting: Plotting sector (12.02% complete)
2024-06-12T06:23:44.043431Z  INFO {farm_index=5}:{sector_index=841}: subspace_farmer::single_disk_farm::plotting: Plotting sector (11.74% complete)
2024-06-12T06:23:44.054940Z  INFO {farm_index=5}:{sector_index=842}: subspace_farmer::single_disk_farm::plotting: Plotting sector (11.75% complete)
2024-06-12T06:24:18.089181Z  INFO {farm_index=7}:{sector_index=1893}: subspace_farmer::single_disk_farm::plotting: Plotting sector (26.42% complete)
2024-06-12T06:25:44.544506Z  INFO {farm_index=8}:{sector_index=838}: subspace_farmer::single_disk_farm::plotting: Plotting sector (11.70% complete)
2024-06-12T06:25:45.638903Z  INFO {farm_index=8}:{sector_index=839}: subspace_farmer::single_disk_farm::plotting: Plotting sector (11.71% complete)
2024-06-12T06:25:52.401306Z  INFO {farm_index=11}:{sector_index=902}: subspace_farmer::single_disk_farm::plotting: Plotting sector (12.59% complete)
2024-06-12T06:25:52.415988Z  INFO {farm_index=11}:{sector_index=903}: subspace_farmer::single_disk_farm::plotting: Plotting sector (12.60% complete)
2024-06-12T06:25:52.431115Z  INFO {farm_index=11}:{sector_index=904}: subspace_farmer::single_disk_farm::plotting: Plotting sector (12.62% complete)
2024-06-12T06:25:53.289754Z  INFO {farm_index=11}:{sector_index=905}: subspace_farmer::single_disk_farm::plotting: Plotting sector (12.63% complete)
2024-06-12T06:25:53.311782Z  INFO {farm_index=11}:{sector_index=906}: subspace_farmer::single_disk_farm::plotting: Plotting sector (12.65% complete)
2024-06-12T06:25:54.760099Z  INFO {farm_index=9}:{sector_index=1800}: subspace_farmer::single_disk_farm::plotting: Plotting sector (25.13% complete)

And this version reports these errors.

2024-06-12T06:42:07.121736Z  INFO {farm_index=0}:{sector_index=843}: subspace_farmer::single_disk_farm::plotting: Plotting sector (11.77% complete)
2024-06-12T06:42:09.270457Z  INFO {farm_index=7}:{sector_index=1943}: subspace_farmer::single_disk_farm::plotting: Plotting sector (27.12% complete)
2024-06-12T06:42:09.513422Z  INFO {farm_index=7}:{sector_index=1944}: subspace_farmer::single_disk_farm::plotting: Plotting sector (27.14% complete)
2024-06-12T06:42:09.704056Z  INFO {farm_index=7}:{sector_index=1945}: subspace_farmer::single_disk_farm::plotting: Plotting sector (27.15% complete)
2024-06-12T06:42:10.796887Z  INFO {farm_index=7}:{sector_index=1946}: subspace_farmer::single_disk_farm::plotting: Plotting sector (27.16% complete)
2024-06-12T06:42:12.589177Z  INFO async_nats: event: disconnected
2024-06-12T06:42:12.593107Z  INFO async_nats: event: connected
2024-06-12T06:42:12.613622Z  WARN {farm_index=3}:{sector_index=1741}: subspace_farmer::cluster::nats_client: Received unexpected response stream index, aborting stream actual_index=352 expected_index=351 message_type=subspace_farmer::cluster::plotter::ClusterSectorPlottingProgress
2024-06-12T06:42:12.613694Z  WARN {farm_index=3}:{sector_index=1741}: subspace_farmer::single_disk_farm::plotting: Failed to plot sector, retrying in 1s error=Low-level plotting error: Received only 1840646814892 sector bytes out of 1056896064 expected bytes
2024-06-12T06:42:12.619948Z  WARN {farm_index=5}:{sector_index=874}: subspace_farmer::cluster::nats_client: Received unexpected response stream index, aborting stream actual_index=314 expected_index=313 message_type=subspace_farmer::cluster::plotter::ClusterSectorPlottingProgress
2024-06-12T06:42:12.620049Z  WARN {farm_index=5}:{sector_index=874}: subspace_farmer::single_disk_farm::plotting: Failed to plot sector, retrying in 1s error=Low-level plotting error: Received only 924248092272 sector bytes out of 1056896064 expected bytes
2024-06-12T06:42:12.694940Z  WARN {farm_index=8}:{sector_index=851}: subspace_farmer::cluster::nats_client: Received unexpected response stream index, aborting stream actual_index=67 expected_index=66 message_type=subspace_farmer::cluster::plotter::ClusterSectorPlottingProgress
2024-06-12T06:42:12.694990Z  WARN {farm_index=8}:{sector_index=851}: subspace_farmer::single_disk_farm::plotting: Failed to plot sector, retrying in 1s error=Low-level plotting error: Received only 899473286108 sector bytes out of 1056896064 expected bytes
2024-06-12T06:42:12.696129Z  WARN {farm_index=5}:{sector_index=887}: subspace_farmer::cluster::nats_client: Received unexpected response stream index, aborting stream actual_index=86 expected_index=85 message_type=subspace_farmer::cluster::plotter::ClusterSectorPlottingProgress
2024-06-12T06:42:12.696163Z  WARN {farm_index=5}:{sector_index=887}: subspace_farmer::single_disk_farm::plotting: Failed to plot sector, retrying in 1s error=Low-level plotting error: Received only 937557405696 sector bytes out of 1056896064 expected bytes
2024-06-12T06:42:12.722366Z  WARN {farm_index=5}:{sector_index=878}: subspace_farmer::cluster::nats_client: Received unexpected response stream index, aborting stream actual_index=310 expected_index=309 message_type=subspace_farmer::cluster::plotter::ClusterSectorPlottingProgress
2024-06-12T06:42:12.722396Z  WARN {farm_index=5}:{sector_index=878}: subspace_farmer::single_disk_farm::plotting: Failed to plot sector, retrying in 1s error=Low-level plotting error: Received only 928468126784 sector bytes out of 1056896064 expected bytes
2024-06-12T06:42:12.726315Z  WARN {farm_index=5}:{sector_index=837}: subspace_farmer::cluster::nats_client: Received unexpected response stream index, aborting stream actual_index=109 expected_index=108 message_type=subspace_farmer::cluster::plotter::ClusterSectorPlottingProgress
2024-06-12T06:42:12.726349Z  WARN {farm_index=5}:{sector_index=837}: subspace_farmer::single_disk_farm::plotting: Failed to plot sector, retrying in 1s error=Low-level plotting error: Received only 884754126088 sector bytes out of 1056896064 expected bytes
2024-06-12T06:42:12.759282Z  WARN {farm_index=5}:{sector_index=880}: subspace_farmer::cluster::nats_client: Received unexpected response stream index, aborting stream actual_index=71 expected_index=70 message_type=subspace_farmer::cluster::plotter::ClusterSectorPlottingProgress
2024-06-12T06:42:12.759318Z  WARN {farm_index=5}:{sector_index=880}: subspace_farmer::single_disk_farm::plotting: Failed to plot sector, retrying in 1s error=Low-level plotting error: Received only 930127046836 sector bytes out of 1056896064 expected bytes
2024-06-12T06:42:12.762057Z  WARN {farm_index=10}:{sector_index=835}: subspace_farmer::cluster::nats_client: Received unexpected response stream index, aborting stream actual_index=189 expected_index=188 message_type=subspace_farmer::cluster::plotter::ClusterSectorPlottingProgress
2024-06-12T06:42:12.762128Z  WARN {farm_index=10}:{sector_index=835}: subspace_farmer::single_disk_farm::plotting: Failed to plot sector, retrying in 1s error=Low-level plotting error: Received only 882798878584 sector bytes out of 1056896064 expected bytes
2024-06-12T06:42:12.818441Z  WARN {farm_index=8}:{sector_index=848}: subspace_farmer::cluster::nats_client: Received unexpected response stream index, aborting stream actual_index=65 expected_index=64 message_type=subspace_farmer::cluster::plotter::ClusterSectorPlottingProgress
2024-06-12T06:42:12.818472Z  WARN {farm_index=8}:{sector_index=848}: subspace_farmer::single_disk_farm::plotting: Failed to plot sector, retrying in 1s error=Low-level plotting error: Received only 896298823044 sector bytes out of 1056896064 expected bytes
2024-06-12T06:42:12.822104Z  WARN {farm_index=3}:{sector_index=1739}: subspace_farmer::cluster::nats_client: Received unexpected response stream index, aborting stream actual_index=557 expected_index=556 message_type=subspace_farmer::cluster::plotter::ClusterSectorPlottingProgress
2024-06-12T06:42:12.822147Z  WARN {farm_index=3}:{sector_index=1739}: subspace_farmer::single_disk_farm::plotting: Failed to plot sector, retrying in 1s error=Low-level plotting error: Received only 1838919947144 sector bytes out of 1056896064 expected bytes
2024-06-12T06:42:13.615917Z  INFO {farm_index=3}:{sector_index=1741}: subspace_farmer::single_disk_farm::plotting: Plotting sector retry
2024-06-12T06:42:13.725783Z  INFO {farm_index=5}:{sector_index=878}: subspace_farmer::single_disk_farm::plotting: Plotting sector retry
2024-06-12T06:42:13.762506Z  INFO {farm_index=5}:{sector_index=880}: subspace_farmer::single_disk_farm::plotting: Plotting sector retry
2024-06-12T06:42:13.843811Z  INFO {farm_index=8}:{sector_index=848}: subspace_farmer::single_disk_farm::plotting: Plotting sector retry
2024-06-12T06:42:20.912001Z  WARN {farm_index=3}:{sector_index=1744}: subspace_farmer::cluster::nats_client: Received unexpected response stream index, aborting stream actual_index=4 expected_index=3 message_type=subspace_farmer::cluster::plotter::ClusterSectorPlottingProgress
2024-06-12T06:42:20.912113Z  WARN {farm_index=3}:{sector_index=1744}: subspace_farmer::single_disk_farm::plotting: Failed to plot sector, retrying in 1s error=Low-level plotting error: Plotting progress stream ended before plotting finished
2024-06-12T06:42:21.074467Z  WARN {farm_index=7}:{sector_index=1934}: subspace_farmer::cluster::nats_client: Received unexpected response stream index, aborting stream actual_index=20 expected_index=19 message_type=subspace_farmer::cluster::plotter::ClusterSectorPlottingProgress
2024-06-12T06:42:21.074610Z  WARN {farm_index=7}:{sector_index=1934}: subspace_farmer::single_disk_farm::plotting: Failed to plot sector, retrying in 1s error=Low-level plotting error: Plotting progress stream ended before plotting finished
2024-06-12T06:42:21.147199Z  WARN {farm_index=7}:{sector_index=1929}: subspace_farmer::cluster::nats_client: Received unexpected response stream index, aborting stream actual_index=21 expected_index=20 message_type=subspace_farmer::cluster::plotter::ClusterSectorPlottingProgress
2024-06-12T06:42:21.147300Z  WARN {farm_index=7}:{sector_index=1929}: subspace_farmer::single_disk_farm::plotting: Failed to plot sector, retrying in 1s error=Low-level plotting error: Plotting progress stream ended before plotting finished
2024-06-12T06:42:21.251026Z  WARN {farm_index=7}:{sector_index=1930}: subspace_farmer::cluster::nats_client: Received unexpected response stream index, aborting stream actual_index=21 expected_index=20 message_type=subspace_farmer::cluster::plotter::ClusterSectorPlottingProgress
2024-06-12T06:42:21.251097Z  WARN {farm_index=7}:{sector_index=1930}: subspace_farmer::single_disk_farm::plotting: Failed to plot sector, retrying in 1s error=Low-level plotting error: Plotting progress stream ended before plotting finished
2024-06-12T06:42:21.277070Z  WARN {farm_index=8}:{sector_index=861}: subspace_farmer::cluster::nats_client: Received unexpected response stream index, aborting stream actual_index=21 expected_index=20 message_type=subspace_farmer::cluster::plotter::ClusterSectorPlottingProgress
2024-06-12T06:42:21.277141Z  WARN {farm_index=8}:{sector_index=861}: subspace_farmer::single_disk_farm::plotting: Failed to plot sector, retrying in 1s error=Low-level plotting error: Plotting progress stream ended before plotting finished
2024-06-12T06:42:21.399757Z  WARN {farm_index=1}:{sector_index=842}: subspace_farmer::cluster::nats_client: Received unexpected response stream index, aborting stream actual_index=7 expected_index=6 message_type=subspace_farmer::cluster::plotter::ClusterSectorPlottingProgress
2024-06-12T06:42:21.399848Z  WARN {farm_index=1}:{sector_index=842}: subspace_farmer::single_disk_farm::plotting: Failed to plot sector, retrying in 1s error=Low-level plotting error: Plotting progress stream ended before plotting finished
2024-06-12T06:42:21.501108Z  WARN {farm_index=1}:{sector_index=845}: subspace_farmer::cluster::nats_client: Received unexpected response stream index, aborting stream actual_index=6 expected_index=5 message_type=subspace_farmer::cluster::plotter::ClusterSectorPlottingProgress
2024-06-12T06:42:21.501164Z  WARN {farm_index=1}:{sector_index=845}: subspace_farmer::single_disk_farm::plotting: Failed to plot sector, retrying in 1s error=Low-level plotting error: Plotting progress stream ended before plotting finished
2024-06-12T06:42:21.553631Z  WARN {farm_index=1}:{sector_index=846}: subspace_farmer::cluster::nats_client: Received unexpected response stream index, aborting stream actual_index=6 expected_index=5 message_type=subspace_farmer::cluster::plotter::ClusterSectorPlottingProgress
2024-06-12T06:42:21.553680Z  WARN {farm_index=1}:{sector_index=846}: subspace_farmer::single_disk_farm::plotting: Failed to plot sector, retrying in 1s error=Low-level plotting error: Plotting progress stream ended before plotting finished
2024-06-12T06:42:22.551927Z  WARN {farm_index=10}:{sector_index=836}: subspace_farmer::cluster::nats_client: Received unexpected response stream index, aborting stream actual_index=33 expected_index=32 message_type=subspace_farmer::cluster::plotter::ClusterSectorPlottingProgress
2024-06-12T06:42:22.552042Z  WARN {farm_index=10}:{sector_index=836}: subspace_farmer::single_disk_farm::plotting: Failed to plot sector, retrying in 1s error=Low-level plotting error: Plotting progress stream ended before plotting finished
2024-06-12T06:42:22.555006Z  INFO {farm_index=7}:{sector_index=1929}: subspace_farmer::single_disk_farm::plotting: Plotting sector retry
2024-06-12T06:42:22.558712Z  INFO {farm_index=1}:{sector_index=846}: subspace_farmer::single_disk_farm::plotting: Plotting sector retry
2024-06-12T06:42:23.261322Z  INFO {farm_index=3}:{sector_index=1744}: subspace_farmer::single_disk_farm::plotting: Plotting sector retry
2024-06-12T06:42:23.261360Z  INFO {farm_index=1}:{sector_index=842}: subspace_farmer::single_disk_farm::plotting: Plotting sector retry
2024-06-12T06:42:23.261360Z  INFO {farm_index=5}:{sector_index=887}: subspace_farmer::single_disk_farm::plotting: Plotting sector retry

nats server log

[1136810] 2024/06/12 12:24:54.877233 [INF] 172.29.100.142:39032 - cid:247 - Slow Consumer Detected: MaxPending of 67108864 Exceeded
[1136810] 2024/06/12 12:27:42.766350 [INF] 172.29.100.142:45146 - cid:248 - Slow Consumer Detected: MaxPending of 67108864 Exceeded
[1136810] 2024/06/12 13:10:00.930075 [INF] 172.29.100.139:32804 - cid:250 - Slow Consumer Detected: MaxPending of 67108864 Exceeded
[1136810] 2024/06/12 13:12:52.478814 [INF] 172.29.100.139:52000 - cid:251 - Slow Consumer Detected: MaxPending of 67108864 Exceeded
[1136810] 2024/06/12 14:26:21.683667 [INF] 172.29.100.139:48286 - cid:253 - Slow Consumer Detected: MaxPending of 67108864 Exceeded
[1136810] 2024/06/12 14:37:31.006504 [INF] 172.29.100.139:39906 - cid:254 - Slow Consumer Detected: MaxPending of 67108864 Exceeded
[1136810] 2024/06/12 14:42:11.272434 [INF] 172.29.100.139:44580 - cid:255 - Slow Consumer Detected: MaxPending of 67108864 Exceeded

The plotting progress must be correct. You have different farms and each of them has their own progress. Some 11%, some 25%. Order in which sectors are plotted across different farms is not deterministic, so you may see different output each time you start the software.

This looks like a networking issue where your app lost connection to NATS server for some reason. It reconnected later, but the progress of whatever it was doing will be reset in this case.

This likely means you’re putting too much load on one of the components. Hard to say which one from the logs though. Can you tell me more about the setup? How many things you’re running, what hardware, networking situation, etc.?

Each farmer is plot at the same time. I can confirm that the progress of each farmer is around 25% before the process ends. However, after restarting the process, the progress of some farmers becomes 11%. Suppose there are 6 14T farmers on a device, and 6 farmers start plot at the same time. After a few days, the progress of these 6 farmers will be between 11% and 25%. Is there such a big difference?

172.29.100.11 node
172.29.100.141 nats-server、farmer
172.29.100.142 controller、farmer
172.29.100.143 cache
172.29.100.111~140 plotter*30

CPU:E5-2697 v4 *2
Ethernet:10G

As already said, the process is not deterministic, so it is possible that some farms will make more progress than others.

That is A LOT of plotters for a poor single controller and cache instance. The code is not magic and may not be able to process that many at the same time, receiving side is simply overwhelmed.

I’ll create an experimental build that should scale both controller and cache much further, will post a link once I have it. The expectation was that it wouldn’t be necessary, but I don’t think huge setups will work as is unfortunately.

Either way you should consider running more than one node and controller+cache pair in such setup for performance and redundancy. And maybe even clustered NATS.

I have about 200 plotters, but not all of them are turned on yet. If all of them are turned on, do you recommend dividing these 200 plotters into multiple clusters? If they are to be combined into a cluster, how many controllers, caches, and NATS components are recommended to run?

You don’t have to have multiple clusters. Single cluster can have multiple controllers, caches, etc. I do not have numbers for you as I do not have hardware to experiment with such setups. This is something we expect community to try out and share with each other.

Ok, I will share it after testing.

Here is an experimental build: Snapshot build · subspace/subspace@9c9b294 · GitHub

It will run multiple instances of internal service on all components of farming cluster, which should allow for higher capacity to receive requests internally. By default it run 32 copies, but can be customized with CLI option.

Curious if it works better at your scale.

Today, I tested this version and noticed many frequent warn-level logs. This time, I restarted the process and chose to retain the historical logs. The issue of the P plot progress regression reoccurred. According to the following logs, it can be seen that for the same farmer, the progress before shutdown was significantly greater than the progress after restarting. This happened with multiple farmers. Please take this issue seriously.

farmer log

tail -n 600 /tmp/ssc-farmer-cluster.log |grep -w farm_index=0

2024-06-13T08:35:26.383227Z  INFO {farm_index=0}:{sector_index=2749}: subspace_farmer::single_disk_farm::plotting: Plotting sector (38.37% complete)
2024-06-13T08:36:12.103293Z  INFO {farm_index=0}:{sector_index=2750}: subspace_farmer::single_disk_farm::plotting: Plotting sector (38.39% complete)
2024-06-13T08:37:12.264245Z  INFO {farm_index=0}:{sector_index=2751}: subspace_farmer::single_disk_farm::plotting: Plotting sector (38.40% complete)
2024-06-13T08:37:39.046221Z  INFO {farm_index=0}:{sector_index=2752}: subspace_farmer::single_disk_farm::plotting: Plotting sector (38.41% complete)
2024-06-13T08:38:43.876455Z  INFO {farm_index=0}:{sector_index=2753}: subspace_farmer::single_disk_farm::plotting: Plotting sector (38.43% complete)
2024-06-13T08:39:22.259871Z  INFO {farm_index=0}:{sector_index=2754}: subspace_farmer::single_disk_farm::plotting: Plotting sector (38.44% complete)
2024-06-13T08:40:12.376220Z  INFO {farm_index=0}:{sector_index=2755}: subspace_farmer::single_disk_farm::plotting: Plotting sector (38.46% complete)
2024-06-13T08:41:14.263042Z  INFO {farm_index=0}:{sector_index=2756}: subspace_farmer::single_disk_farm::plotting: Plotting sector (38.47% complete)
2024-06-13T08:42:00.857325Z  INFO {farm_index=0}:{sector_index=2757}: subspace_farmer::single_disk_farm::plotting: Plotting sector (38.48% complete)
2024-06-13T08:42:44.515834Z  INFO {farm_index=0}:{sector_index=2758}: subspace_farmer::single_disk_farm::plotting: Plotting sector (38.50% complete)
2024-06-13T08:43:37.854474Z  INFO {farm_index=0}:{sector_index=2759}: subspace_farmer::single_disk_farm::plotting: Plotting sector (38.51% complete)
2024-06-13T08:44:27.378110Z  INFO {farm_index=0}:{sector_index=2760}: subspace_farmer::single_disk_farm::plotting: Plotting sector (38.53% complete)
2024-06-13T08:45:06.788350Z  INFO {farm_index=0}:{sector_index=2761}: subspace_farmer::single_disk_farm::plotting: Plotting sector (38.54% complete)
2024-06-13T08:46:05.612580Z  INFO {farm_index=0}:{sector_index=2762}: subspace_farmer::single_disk_farm::plotting: Plotting sector (38.55% complete)
2024-06-13T08:46:43.254518Z  INFO {farm_index=0}:{sector_index=2763}: subspace_farmer::single_disk_farm::plotting: Plotting sector (38.57% complete)
2024-06-13T08:47:29.561284Z  INFO {farm_index=0}:{sector_index=2764}: subspace_farmer::single_disk_farm::plotting: Plotting sector (38.58% complete)
2024-06-13T08:53:54.468084Z  INFO {farm_index=0}: subspace_farmer::single_disk_farm::plot_cache: Checking plot cache contents, this can take a while
2024-06-13T08:53:54.471958Z  INFO {farm_index=0}: subspace_farmer::single_disk_farm::plot_cache: Finished checking plot cache contents
2024-06-13T08:54:03.342146Z  INFO {farm_index=0}: subspace_farmer::single_disk_farm: Benchmarking faster proving method
2024-06-13T08:54:07.414347Z  INFO {farm_index=0}: subspace_farmer::single_disk_farm: Faster proving method found fastest_mode=ConcurrentChunks
2024-06-13T08:54:44.185997Z  INFO {farm_index=0}: subspace_farmer::commands::cluster::farmer: Farm 0:
2024-06-13T08:54:44.186007Z  INFO {farm_index=0}: subspace_farmer::commands::cluster::farmer:   ID: 01HY2PPGHAMJWPP0DTFKY9ANMM
2024-06-13T08:54:44.186011Z  INFO {farm_index=0}: subspace_farmer::commands::cluster::farmer:   Genesis hash: 0x0c121c75f4ef450f40619e1fca9d1e8e7fbabc42c895bc4790801e85d5a91c34
2024-06-13T08:54:44.186015Z  INFO {farm_index=0}: subspace_farmer::commands::cluster::farmer:   Public key: 0x34872127034dc4d30a19cfbffb33428c8a8f1e524cb220e925d49ac8a21e7a39
2024-06-13T08:54:44.186020Z  INFO {farm_index=0}: subspace_farmer::commands::cluster::farmer:   Allocated space: 6.9 TiB (7.6 TB)
2024-06-13T08:54:44.186024Z  INFO {farm_index=0}: subspace_farmer::commands::cluster::farmer:   Directory: /disk/nvme0n1/ssc-3h/farmer-1
2024-06-13T08:54:50.469489Z  INFO {farm_index=0}: subspace_farmer::single_disk_farm::plotting: Subscribing to archived segments
2024-06-13T08:54:50.469578Z  INFO {farm_index=0}: subspace_farmer::single_disk_farm::farming: Subscribing to slot info notifications
2024-06-13T08:54:50.469609Z  INFO {farm_index=0}: subspace_farmer::reward_signing: Subscribing to reward signing notifications
2024-06-13T08:56:35.166486Z  INFO {farm_index=0}:{sector_index=1134}: subspace_farmer::single_disk_farm::plotting: Plotting sector (15.83% complete)
2024-06-13T08:56:35.532183Z  INFO {farm_index=0}:{sector_index=1135}: subspace_farmer::single_disk_farm::plotting: Plotting sector (15.84% complete)
2024-06-13T08:56:35.535302Z  INFO {farm_index=0}:{sector_index=1136}: subspace_farmer::single_disk_farm::plotting: Plotting sector (15.86% complete)
2024-06-13T08:56:37.940341Z  INFO {farm_index=0}:{sector_index=1137}: subspace_farmer::single_disk_farm::plotting: Plotting sector (15.87% complete)
2024-06-13T08:56:47.701234Z  INFO {farm_index=0}:{sector_index=1138}: subspace_farmer::single_disk_farm::plotting: Plotting sector (15.88% complete)
2024-06-13T08:56:51.852718Z  INFO {farm_index=0}:{sector_index=1139}: subspace_farmer::single_disk_farm::plotting: Plotting sector (15.90% complete)
tail -n 600 /tmp/ssc-farmer-cluster.log |grep -w farm_index=4

2024-06-13T08:43:58.460272Z  INFO {farm_index=4}:{sector_index=2760}: subspace_farmer::single_disk_farm::plotting: Plotting sector (38.53% complete)
2024-06-13T08:44:53.156748Z  INFO {farm_index=4}:{sector_index=2761}: subspace_farmer::single_disk_farm::plotting: Plotting sector (38.54% complete)
2024-06-13T08:45:34.690771Z  INFO {farm_index=4}:{sector_index=2762}: subspace_farmer::single_disk_farm::plotting: Plotting sector (38.55% complete)
2024-06-13T08:46:32.186574Z  INFO {farm_index=4}:{sector_index=2763}: subspace_farmer::single_disk_farm::plotting: Plotting sector (38.57% complete)
2024-06-13T08:47:17.017952Z  INFO {farm_index=4}:{sector_index=2764}: subspace_farmer::single_disk_farm::plotting: Plotting sector (38.58% complete)
2024-06-13T08:47:53.531921Z  INFO {farm_index=4}:{sector_index=2765}: subspace_farmer::single_disk_farm::plotting: Plotting sector (38.60% complete)
2024-06-13T08:53:54.478707Z  INFO {farm_index=4}: subspace_farmer::single_disk_farm::plot_cache: Checking plot cache contents, this can take a while
2024-06-13T08:53:54.480966Z  INFO {farm_index=4}: subspace_farmer::single_disk_farm::plot_cache: Finished checking plot cache contents
2024-06-13T08:54:32.747557Z  INFO {farm_index=4}: subspace_farmer::single_disk_farm: Benchmarking faster proving method
2024-06-13T08:54:36.217331Z  INFO {farm_index=4}: subspace_farmer::single_disk_farm: Faster proving method found fastest_mode=ConcurrentChunks
2024-06-13T08:54:44.183391Z  INFO {farm_index=4}: subspace_farmer::commands::cluster::farmer: Farm 4:
2024-06-13T08:54:44.183402Z  INFO {farm_index=4}: subspace_farmer::commands::cluster::farmer:   ID: 01HY2PPGHA4J6GGWCW5JVXFC39
2024-06-13T08:54:44.183407Z  INFO {farm_index=4}: subspace_farmer::commands::cluster::farmer:   Genesis hash: 0x0c121c75f4ef450f40619e1fca9d1e8e7fbabc42c895bc4790801e85d5a91c34
2024-06-13T08:54:44.183412Z  INFO {farm_index=4}: subspace_farmer::commands::cluster::farmer:   Public key: 0x3cc8e24bc21c56d6fc7156f67da35caadbbbf47624ba42bf47e2314c16a9e577
2024-06-13T08:54:44.183417Z  INFO {farm_index=4}: subspace_farmer::commands::cluster::farmer:   Allocated space: 6.9 TiB (7.6 TB)
2024-06-13T08:54:44.183421Z  INFO {farm_index=4}: subspace_farmer::commands::cluster::farmer:   Directory: /disk/nvme4n1/ssc-3h/farmer-1
2024-06-13T08:54:50.469881Z  INFO {farm_index=4}: subspace_farmer::single_disk_farm::plotting: Subscribing to archived segments
2024-06-13T08:54:50.469891Z  INFO {farm_index=4}: subspace_farmer::single_disk_farm::farming: Subscribing to slot info notifications
2024-06-13T08:54:50.469907Z  INFO {farm_index=4}: subspace_farmer::reward_signing: Subscribing to reward signing notifications
2024-06-13T08:56:33.418580Z  INFO {farm_index=4}:{sector_index=1166}: subspace_farmer::single_disk_farm::plotting: Plotting sector (16.28% complete)
2024-06-13T08:56:33.897381Z  INFO {farm_index=4}:{sector_index=1167}: subspace_farmer::single_disk_farm::plotting: Plotting sector (16.29% complete)
tail -n 600 /tmp/ssc-farmer-cluster.log |grep -w farm_index=5

2024-06-13T08:43:32.522121Z  INFO {farm_index=5}:{sector_index=2814}: subspace_farmer::single_disk_farm::plotting: Plotting sector (39.28% complete)
2024-06-13T08:44:13.862131Z  INFO {farm_index=5}:{sector_index=2815}: subspace_farmer::single_disk_farm::plotting: Plotting sector (39.29% complete)
2024-06-13T08:45:00.962527Z  INFO {farm_index=5}:{sector_index=2816}: subspace_farmer::single_disk_farm::plotting: Plotting sector (39.31% complete)
2024-06-13T08:45:47.365785Z  INFO {farm_index=5}:{sector_index=2817}: subspace_farmer::single_disk_farm::plotting: Plotting sector (39.32% complete)
2024-06-13T08:46:39.189531Z  INFO {farm_index=5}:{sector_index=2818}: subspace_farmer::single_disk_farm::plotting: Plotting sector (39.34% complete)
2024-06-13T08:47:20.335046Z  INFO {farm_index=5}:{sector_index=2819}: subspace_farmer::single_disk_farm::plotting: Plotting sector (39.35% complete)
2024-06-13T08:48:06.182893Z  INFO {farm_index=5}:{sector_index=2820}: subspace_farmer::single_disk_farm::plotting: Plotting sector (39.36% complete)
2024-06-13T08:53:54.455927Z  INFO {farm_index=5}: subspace_farmer::single_disk_farm::plot_cache: Checking plot cache contents, this can take a while
2024-06-13T08:53:54.459022Z  INFO {farm_index=5}: subspace_farmer::single_disk_farm::plot_cache: Finished checking plot cache contents
2024-06-13T08:54:24.391284Z  INFO {farm_index=5}: subspace_farmer::single_disk_farm: Benchmarking faster proving method
2024-06-13T08:54:28.312305Z  INFO {farm_index=5}: subspace_farmer::single_disk_farm: Faster proving method found fastest_mode=ConcurrentChunks
2024-06-13T08:54:44.183557Z  INFO {farm_index=5}: subspace_farmer::commands::cluster::farmer: Farm 5:
2024-06-13T08:54:44.183566Z  INFO {farm_index=5}: subspace_farmer::commands::cluster::farmer:   ID: 01HY2PPGHANJETGTNFZR9SS2K9
2024-06-13T08:54:44.183570Z  INFO {farm_index=5}: subspace_farmer::commands::cluster::farmer:   Genesis hash: 0x0c121c75f4ef450f40619e1fca9d1e8e7fbabc42c895bc4790801e85d5a91c34
2024-06-13T08:54:44.183575Z  INFO {farm_index=5}: subspace_farmer::commands::cluster::farmer:   Public key: 0xfab748a1fe59006a71d5d2bd83a8289722fe913e0ec792a5427d57d73e1ea636
2024-06-13T08:54:44.183580Z  INFO {farm_index=5}: subspace_farmer::commands::cluster::farmer:   Allocated space: 6.9 TiB (7.6 TB)
2024-06-13T08:54:44.183584Z  INFO {farm_index=5}: subspace_farmer::commands::cluster::farmer:   Directory: /disk/nvme5n1/ssc-3h/farmer-1
2024-06-13T08:54:50.469940Z  INFO {farm_index=5}: subspace_farmer::single_disk_farm::plotting: Subscribing to archived segments
2024-06-13T08:54:50.469953Z  INFO {farm_index=5}: subspace_farmer::single_disk_farm::farming: Subscribing to slot info notifications
2024-06-13T08:54:50.469964Z  INFO {farm_index=5}: subspace_farmer::reward_signing: Subscribing to reward signing notifications
2024-06-13T08:56:26.762402Z  INFO {farm_index=5}:{sector_index=1100}: subspace_farmer::single_disk_farm::plotting: Plotting sector (15.35% complete)
2024-06-13T08:56:26.770741Z  INFO {farm_index=5}:{sector_index=1101}: subspace_farmer::single_disk_farm::plotting: Plotting sector (15.37% complete)
2024-06-13T08:56:26.796280Z  INFO {farm_index=5}:{sector_index=1102}: subspace_farmer::single_disk_farm::plotting: Plotting sector (15.38% complete)
2024-06-13T08:56:26.798384Z  INFO {farm_index=5}:{sector_index=1103}: subspace_farmer::single_disk_farm::plotting: Plotting sector (15.40% complete)
2024-06-13T08:56:26.799884Z  INFO {farm_index=5}:{sector_index=1104}: subspace_farmer::single_disk_farm::plotting: Plotting sector (15.41% complete)
2024-06-13T08:56:27.274767Z  INFO {farm_index=5}:{sector_index=1105}: subspace_farmer::single_disk_farm::plotting: Plotting sector (15.42% complete)

I think you might have networking issue because warnings seem to happen after this each time:

2024-06-12T18:04:24.031918Z  INFO async_nats: event: disconnected
2024-06-12T18:04:24.048468Z  INFO async_nats: event: connected

As to progress being lost, it is possible to lose a bit of progress due to how plotting works: plotting will happen for up to 50 sectors at a time (controlled via --sector-encoding-concurrency, but at the same time only sectors that are plotted sequentially are actually available for farming and will be seeing as plotted on restart. So if you have plotted sectors 1, 10 and 11, without sectors 2-9 being plotted on restart you’ll see that just sector 1 is plotted because there is no information stored about sectors further away.

I will take a look at improving ordering such that you’re more likely to plot sectors sequentially, but you should still look into why farmer looses connection to NATS server.

According to the above logs, it can be seen that the same farmer lost more than 1000 sectors, not just 50.

The sector_index was 2820 before the process ended and 1100 after restarting the process. Did you carefully review this?

tail -n 600 /tmp/ssc-farmer-cluster.log |grep -w farm_index=5

2024-06-13T08:46:39.189531Z  INFO {farm_index=5}:{sector_index=2818}: subspace_farmer::single_disk_farm::plotting: Plotting sector (39.34% complete)
2024-06-13T08:47:20.335046Z  INFO {farm_index=5}:{sector_index=2819}: subspace_farmer::single_disk_farm::plotting: Plotting sector (39.35% complete)
2024-06-13T08:48:06.182893Z  INFO {farm_index=5}:{sector_index=2820}: subspace_farmer::single_disk_farm::plotting: Plotting sector (39.36% complete)
2024-06-13T08:53:54.455927Z  INFO {farm_index=5}: subspace_farmer::single_disk_farm::plot_cache: Checking plot cache contents, this can take a while
2024-06-13T08:53:54.459022Z  INFO {farm_index=5}: subspace_farmer::single_disk_farm::plot_cache: Finished checking plot cache contents
2024-06-13T08:54:24.391284Z  INFO {farm_index=5}: subspace_farmer::single_disk_farm: Benchmarking faster proving method
2024-06-13T08:54:28.312305Z  INFO {farm_index=5}: subspace_farmer::single_disk_farm: Faster proving method found fastest_mode=ConcurrentChunks
2024-06-13T08:54:44.183557Z  INFO {farm_index=5}: subspace_farmer::commands::cluster::farmer: Farm 5:
2024-06-13T08:54:44.183566Z  INFO {farm_index=5}: subspace_farmer::commands::cluster::farmer:   ID: 01HY2PPGHANJETGTNFZR9SS2K9
2024-06-13T08:54:44.183570Z  INFO {farm_index=5}: subspace_farmer::commands::cluster::farmer:   Genesis hash: 0x0c121c75f4ef450f40619e1fca9d1e8e7fbabc42c895bc4790801e85d5a91c34
2024-06-13T08:54:44.183575Z  INFO {farm_index=5}: subspace_farmer::commands::cluster::farmer:   Public key: 0xfab748a1fe59006a71d5d2bd83a8289722fe913e0ec792a5427d57d73e1ea636
2024-06-13T08:54:44.183580Z  INFO {farm_index=5}: subspace_farmer::commands::cluster::farmer:   Allocated space: 6.9 TiB (7.6 TB)
2024-06-13T08:54:44.183584Z  INFO {farm_index=5}: subspace_farmer::commands::cluster::farmer:   Directory: /disk/nvme5n1/ssc-3h/farmer-1
2024-06-13T08:54:50.469940Z  INFO {farm_index=5}: subspace_farmer::single_disk_farm::plotting: Subscribing to archived segments
2024-06-13T08:54:50.469953Z  INFO {farm_index=5}: subspace_farmer::single_disk_farm::farming: Subscribing to slot info notifications
2024-06-13T08:54:50.469964Z  INFO {farm_index=5}: subspace_farmer::reward_signing: Subscribing to reward signing notifications
2024-06-13T08:56:26.762402Z  INFO {farm_index=5}:{sector_index=1100}: subspace_farmer::single_disk_farm::plotting: Plotting sector (15.35% complete)
2024-06-13T08:56:26.770741Z  INFO {farm_index=5}:{sector_index=1101}: subspace_farmer::single_disk_farm::plotting: Plotting sector (15.37% complete)
2024-06-13T08:56:26.796280Z  INFO {farm_index=5}:{sector_index=1102}: subspace_farmer::single_disk_farm::plotting: Plotting sector (15.38% complete)
2024-06-13T08:56:26.798384Z  INFO {farm_index=5}:{sector_index=1103}: subspace_farmer::single_disk_farm::plotting: Plotting sector (15.40% complete)
1 Like

That is definitely very odd, I don’t have a good explanation for it right now I’m araid, I don’t see how it can happen in the code yet.

Could you set environment variable RUST_LOG=info,subspace_farmer=debug and post logs from that? It will print more information about what is happening.

I also see this message in logs:

2024-06-12T12:09:39.523763Z  WARN {farm_index=5}:{sector_index=1113}: subspace_farmer::cluster::nats_client: Received unexpected response stream index, aborting stream actual_index=216 expected_index=215 message_type=subspace_farmer::cluster::plotter::ClusterSectorPlottingProgress

Are you sure you were already running jun-11 at that point? That specific situation was fixed in jun-11 and should no longer happen with a single NATS connection.

Yes, before the process was stopped, it was running the gemini-3h-2024-jun-11 version. I will continue testing after setting the environment variables tomorrow and then provide the logs.

1 Like

Thanks for reporting and patience with this. I understand it must be frustrating this much plotting progress.

I recommend to restart it with mentioned environment variable and maybe not waiting as long this time since it seems to have skipped quite a lot of sectors, so should be reproducible much quicker.

Improved test build that should cause a bit less strain on NATS: Snapshot build · subspace/subspace@0ff84b6 · GitHub

When I set the environment variables today and restarted the process to reproduce the progress loss issue, I found that the progress was normal. This is very strange…

Well, hypothetically something could be racy in there. I also found and fixed one potential issue in Make sure metadata for plotted sectors is written sequentially by nazar-pc · Pull Request #2847 · subspace/subspace · GitHub that you may have hit potentially as well. Would be worth trying next release when it is out.