Cluster farmer timeout while plotting

Issue Report

Environment

  • Operating System:
    – node: Ubuntu24.04 Server, 10GB lan
    – Controller: On the same server with node
    – Cache: On the same server as the node
    – Plot: On the same server as the node, 4090x4
    – Farmer: Ubuntu24.04 Server, 24x7T U.2, 10GB lan, IP: 172.16.0.210
  • Advanced CLI
    nat use binary file start
cat nats.conf
max_payload = 6MB
./nats-server --config ./nats.conf

farm start scripts:

/home/scmy/sub/bin/subspace-farmer-ubuntu-x86_64-skylake-mainnet-2025-jan-14 \
    cluster \
        --nats-server "nats://172.16.0.41:4222" \
    farmer \
        --reward-address "subXXXXXX" \
        path="/data/nvme01,size=7.5T" \
        path="/data/nvme02,size=7.5T" \
	path="/data/nvme03,size=7.5T" \
	path="/data/nvme04,size=7.5T" \
	path="/data/nvme05,size=7.5T" \
	path="/data/nvme06,size=7.5T" \
	path="/data/nvme07,size=7.5T" \
	path="/data/nvme08,size=7.5T" \
	path="/data/nvme09,size=7.5T" \
	path="/data/nvme10,size=7.5T" \
	path="/data/nvme11,size=7.5T" \
	path="/data/nvme12,size=7.5T" \
	path="/data/nvme13,size=7.5T" \
	path="/data/nvme14,size=7.5T" \
	path="/data/nvme15,size=7.5T" \
	path="/data/nvme16,size=7.5T" \
	path="/data/nvme17,size=7.5T" \
	path="/data/nvme18,size=7.5T" \
	path="/data/nvme19,size=7.5T" \
	path="/data/nvme20,size=7.5T" \
	path="/data/nvme21,size=7.5T" \
	path="/data/nvme22,size=7.5T" \
	path="/data/nvme23,size=7.5T" \
	path="/data/nvme24,size=7.5T"

controller start scripts

nohup /home/user/sub/bin/subspace-farmer-ubuntu-x86_64-skylake-mainnet-2025-jan-14 \
    cluster \
        --nats-server "nats://172.16.0.41:4222" \
    controller \
         --node-rpc-url  "ws://localhost:9944" \
        --base-path "/home/user/sub/controller/cache" >> /home/user/sub/logs/controller.log 2>&1 &

cache start controllers

nohup /home/user/sub/bin/subspace-farmer-ubuntu-x86_64-skylake-mainnet-2025-jan-14 \
    cluster \
        --nats-server "nats://172.16.0.41:4222" \
    controller \
         --node-rpc-url  "ws://localhost:9944" \
        --base-path "/home/user/sub/controller/cache" >> /home/user/sub/logs/controller.log 2>&1 &

plotter start scripts

nohup /home/user/sub/bin/subspace-farmer-ubuntu-x86_64-skylake-mainnet-2025-jan-14 \
    cluster \
        --nats-server "nats://172.16.0.41:4222" \
    plotter   --cuda-gpus "0,1,2,3" \
    >> /home/user/sub/logs/plotter.log 2>&1 &

Problem

farmer has some retry error

2025-06-19T01:51:26.136184Z  INFO {farm_index=5}:{sector_index=362}: subspace_farmer::single_disk_farm::plotting: Plotting sector (5.10% complete)
2025-06-19T01:51:30.031482Z ERROR subspace_farmer::commands::cluster::farmer: Farm errored and stopped farm_index=14 error=Failed to retrieve farmer info: request timed out: deadline has elapsed
2025-06-19T01:52:00.032256Z ERROR subspace_farmer::commands::cluster::farmer: Farm errored and stopped farm_index=14 error=Failed to retrieve farmer info: request timed out: deadline has elapsed
2025-06-19T01:52:21.627675Z  WARN {farm_index=5}:{sector_index=361}: subspace_farmer::single_disk_farm::plotting: Failed to plot sector, retrying in 1s error=Low-level plotting error: Timed out without ping from plotter
2025-06-19T01:52:21.790299Z  WARN {farm_index=3}:{sector_index=458}: subspace_farmer::single_disk_farm::plotting: Failed to plot sector, retrying in 1s error=Low-level plotting error: Timed out without ping from plotter
2025-06-19T01:52:22.825276Z  WARN {farm_index=13}:{sector_index=411}: subspace_farmer::single_disk_farm::plotting: Failed to plot sector, retrying in 1s error=Low-level plotting error: Timed out without ping from plotter
2025-06-19T01:52:23.094832Z  WARN {farm_index=6}:{sector_index=374}: subspace_farmer::single_disk_farm::plotting: Failed to plot sector, retrying in 1s error=Low-level plotting error: Timed out without ping from plotter
2025-06-19T01:52:24.295761Z  WARN {farm_index=2}:{sector_index=433}: subspace_farmer::single_disk_farm::plotting: Failed to plot sector, retrying in 1s error=Low-level plotting error: Timed out without ping from plotter
2025-06-19T01:52:24.464232Z  WARN {farm_index=13}:{sector_index=412}: subspace_farmer::single_disk_farm::plotting: Failed to plot sector, retrying in 1s error=Low-level plotting error: Timed out without ping from plotter
2025-06-19T01:52:24.726921Z  WARN {farm_index=18}:{sector_index=313}: subspace_farmer::single_disk_farm::plotting: Failed to plot sector, retrying in 1s error=Low-level plotting error: Timed out without ping from plotter
2025-06-19T01:52:25.419336Z  WARN {farm_index=22}:{sector_index=370}: subspace_farmer::single_disk_farm::plotting: Failed to plot sector, retrying in 1s error=Low-level plotting error: Timed out without ping from plotter
2025-06-19T01:52:26.215304Z  WARN {farm_index=10}:{sector_index=415}: subspace_farmer::single_disk_farm::plotting: Failed to plot sector, retrying in 1s error=Low-level plotting error: Timed out without ping from plotter
2025-06-19T01:52:27.281179Z  WARN {farm_index=18}:{sector_index=314}: subspace_farmer::single_disk_farm::plotting: Failed to plot sector, retrying in 1s error=Low-level plotting error: Received only 62285410 sector bytes out of 1056896064 expected bytes
2025-06-19T01:52:30.033443Z ERROR subspace_farmer::commands::cluster::farmer: Farm errored and stopped farm_index=14 error=Failed to retrieve farmer info: request timed out: deadline has elapsed
2025-06-19T01:53:00.034669Z ERROR subspace_farmer::commands::cluster::farmer: Farm errored and stopped farm_index=14 error=Failed to retrieve farmer info: request timed out: deadline has elapsed
2025-06-19T01:53:14.331705Z  INFO async_nats: event: disconnected
2025-06-19T01:53:14.338558Z  INFO async_nats: event: connected
2025-06-19T01:53:30.036031Z ERROR subspace_farmer::commands::cluster::farmer: Farm errored and stopped farm_index=14 error=Failed to retrieve farmer info: request timed out: deadline has elapsed
2025-06-19T01:53:37.018361Z  INFO {farm_index=17}:{sector_index=462}: subspace_farmer::single_disk_farm::plotting: Plotting sector (6.51% complete)
2025-06-19T01:54:00.037509Z ERROR subspace_farmer::commands::cluster::farmer: Farm errored and stopped farm_index=14 error=Failed to retrieve farmer info: request timed out: deadline has elapsed
2025-06-19T01:54:08.481416Z  INFO {farm_index=17}:{sector_index=463}: subspace_farmer::single_disk_farm::plotting: Plotting sector (6.53% complete)
2025-06-19T01:54:08.482028Z  INFO {farm_index=17}:{sector_index=464}: subspace_farmer::single_disk_farm::plotting: Plotting sector (6.54% complete)
2025-06-19T01:54:30.038721Z ERROR subspace_farmer::commands::cluster::farmer: Farm errored and stopped farm_index=14 error=Failed to retrieve farmer info: request timed out: deadline has elapsed
2025-06-19T01:54:31.950795Z  INFO {farm_index=17}:{sector_index=465}: subspace_farmer::single_disk_farm::plotting: Plotting sector (6.55% complete)
2025-06-19T01:54:31.951322Z  INFO {farm_index=17}:{sector_index=466}: subspace_farmer::single_disk_farm::plotting: Plotting sector (6.57% complete)
2025-06-19T01:54:55.379480Z  INFO {farm_index=17}:{sector_index=467}: subspace_farmer::single_disk_farm::plotting: Plotting sector (6.58% complete)
2025-06-19T01:54:55.380003Z  INFO {farm_index=17}:{sector_index=468}: subspace_farmer::single_disk_farm::plotting: Plotting sector (6.60% complete)
2025-06-19T01:55:00.039854Z ERROR subspace_farmer::commands::cluster::farmer: Farm errored and stopped farm_index=14 error=Failed to retrieve farmer info: request timed out: deadline has elapsed
2025-06-19T01:55:18.049379Z  INFO {farm_index=17}:{sector_index=469}: subspace_farmer::single_disk_farm::plotting: Plotting sector (6.61% complete)
2025-06-19T01:55:18.050003Z  INFO {farm_index=17}:{sector_index=470}: subspace_farmer::single_disk_farm::plotting: Plotting sector (6.62% complete)
2025-06-19T01:55:30.040962Z ERROR subspace_farmer::commands::cluster::farmer: Farm errored and stopped farm_index=14 error=Failed to retrieve farmer info: request timed out: deadline has elapsed
2025-06-19T01:55:42.274445Z  INFO {farm_index=17}:{sector_index=471}: subspace_farmer::single_disk_farm::plotting: Plotting sector (6.64% complete)
2025-06-19T01:55:42.308097Z  INFO {farm_index=17}:{sector_index=472}: subspace_farmer::single_disk_farm::plotting: Plotting sector (6.65% complete)
2025-06-19T01:56:00.042646Z ERROR subspace_farmer::commands::cluster::farmer: Farm errored and stopped farm_index=14 error=Failed to retrieve farmer info: request timed out: deadline has elapsed
2025-06-19T01:56:05.205854Z  INFO {farm_index=17}:{sector_index=473}: subspace_farmer::single_disk_farm::plotting: Plotting sector (6.67% complete)
2025-06-19T01:56:05.207883Z  INFO {farm_index=17}:{sector_index=474}: subspace_farmer::single_disk_farm::plotting: Plotting sector (6.68% complete)

nats log

[758555] 2025/06/19 01:58:24.437801 [INF] 172.16.0.210:53028 - cid:33 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 88 chunks of 5662432 total bytes.
[758555] 2025/06/19 02:06:08.478966 [INF] 172.16.0.210:34662 - cid:34 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 88 chunks of 5662432 total bytes.
[758555] 2025/06/19 02:13:52.001808 [INF] 172.16.0.210:45122 - cid:35 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 88 chunks of 5662432 total bytes.
[758555] 2025/06/19 02:48:32.622433 [INF] 172.16.0.210:56376 - cid:36 - Slow Consumer Detected: MaxPending of 67108864 Exceeded
[758555] 2025/06/19 02:48:35.145185 [INF] 172.16.0.210:56376 - cid:36 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 88 chunks of 5662432 total bytes.
[758555] 2025/06/19 02:53:27.609596 [INF] 172.16.0.210:38096 - cid:37 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 174 chunks of 11324864 total bytes.
[758555] 2025/06/19 03:01:10.153342 [INF] 172.16.0.210:33790 - cid:38 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 88 chunks of 5662432 total bytes.
[758555] 2025/06/19 03:08:54.390576 [INF] 172.16.0.210:54336 - cid:39 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 88 chunks of 5662432 total bytes.
[758555] 2025/06/19 03:16:35.803371 [INF] 172.16.0.210:42274 - cid:40 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 88 chunks of 5662433 total bytes.
[758555] 2025/06/19 03:24:19.880461 [INF] 172.16.0.210:34412 - cid:41 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 88 chunks of 5662432 total bytes.
[758555] 2025/06/19 03:59:02.718683 [INF] 172.16.0.210:38482 - cid:42 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 88 chunks of 5662432 total bytes.
[758555] 2025/06/19 04:03:55.227528 [INF] 172.16.0.210:55434 - cid:43 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 88 chunks of 5662432 total bytes.
[758555] 2025/06/19 04:11:39.014914 [INF] 172.16.0.210:53296 - cid:44 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 88 chunks of 5662432 total bytes.
[758555] 2025/06/19 04:19:22.267332 [INF] 172.16.0.210:40718 - cid:45 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 88 chunks of 5662432 total bytes.
[758555] 2025/06/19 04:27:06.330523 [INF] 172.16.0.210:47204 - cid:46 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 88 chunks of 5662432 total bytes.
[758555] 2025/06/19 04:34:49.542196 [INF] 172.16.0.210:40704 - cid:47 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 88 chunks of 5662432 total bytes.
[758555] 2025/06/19 05:09:33.711950 [INF] 172.16.0.210:37904 - cid:48 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 88 chunks of 5662432 total bytes.
[758555] 2025/06/19 05:14:27.811272 [INF] 172.16.0.210:45110 - cid:49 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 90 chunks of 5801610 total bytes.
[758555] 2025/06/19 05:22:10.082283 [INF] 172.16.0.210:38576 - cid:50 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 88 chunks of 5662432 total bytes.
[758555] 2025/06/19 05:29:54.653596 [INF] 172.16.0.210:48396 - cid:51 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 88 chunks of 5662432 total bytes.
[758555] 2025/06/19 05:37:38.073034 [INF] 172.16.0.210:48380 - cid:52 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 88 chunks of 5662432 total bytes.
[758555] 2025/06/19 05:45:14.799090 [INF] 172.16.0.210:39394 - cid:53 - Slow Consumer Detected: MaxPending of 67108864 Exceeded
[758555] 2025/06/19 05:45:21.351270 [INF] 172.16.0.210:39394 - cid:53 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 174 chunks of 11324866 total bytes.
[758555] 2025/06/19 06:20:04.540960 [INF] 172.16.0.210:34562 - cid:54 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 88 chunks of 5662433 total bytes.
[758555] 2025/06/19 06:24:57.901459 [INF] 172.16.0.210:34588 - cid:55 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 88 chunks of 5662433 total bytes.
[758555] 2025/06/19 06:32:41.910903 [INF] 172.16.0.210:57018 - cid:56 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 88 chunks of 5662433 total bytes.
[758555] 2025/06/19 06:40:23.844188 [INF] 172.16.0.210:34530 - cid:57 - Slow Consumer Detected: MaxPending of 67108864 Exceeded
[758555] 2025/06/19 06:40:26.255352 [INF] 172.16.0.210:34530 - cid:57 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 88 chunks of 5662433 total bytes.
[758555] 2025/06/19 06:48:08.825384 [INF] 172.16.0.210:36862 - cid:58 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 174 chunks of 11324866 total bytes.
[758555] 2025/06/19 06:55:47.108478 [INF] 172.16.0.210:35786 - cid:59 - Slow Consumer Detected: MaxPending of 67108864 Exceeded
[758555] 2025/06/19 06:55:52.671982 [INF] 172.16.0.210:35786 - cid:59 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 88 chunks of 5662433 total bytes.
[758555] 2025/06/19 07:30:34.890604 [INF] 172.16.0.210:52798 - cid:60 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 88 chunks of 5662433 total bytes.
[758555] 2025/06/19 07:35:20.137993 [INF] 172.16.0.210:53128 - cid:61 - Slow Consumer Detected: MaxPending of 67108864 Exceeded
[758555] 2025/06/19 07:35:27.162466 [INF] 172.16.0.210:53128 - cid:61 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 176 chunks of 11464044 total bytes.
[758555] 2025/06/19 07:43:09.644323 [INF] 172.16.0.210:47526 - cid:62 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 88 chunks of 5662433 total bytes.
[758555] 2025/06/19 07:50:54.203138 [INF] 172.16.0.210:56980 - cid:63 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 88 chunks of 5662433 total bytes.
[758555] 2025/06/19 07:58:37.529335 [INF] 172.16.0.210:37210 - cid:64 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 88 chunks of 5662433 total bytes.
[758555] 2025/06/19 08:06:18.531061 [INF] 172.16.0.210:35886 - cid:65 - Slow Consumer Detected: MaxPending of 67108864 Exceeded
[758555] 2025/06/19 08:06:22.572024 [INF] 172.16.0.210:35886 - cid:65 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 88 chunks of 5662433 total bytes.
[758555] 2025/06/19 08:41:05.837315 [INF] 172.16.0.210:38872 - cid:66 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 88 chunks of 5662433 total bytes.
[758555] 2025/06/19 08:45:51.831237 [INF] 172.16.0.210:40640 - cid:67 - Slow Consumer Detected: MaxPending of 67108864 Exceeded
[758555] 2025/06/19 08:45:58.238537 [INF] 172.16.0.210:40640 - cid:67 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 261 chunks of 16987299 total bytes.
[758555] 2025/06/19 08:53:42.466011 [INF] 172.16.0.210:36352 - cid:68 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 90 chunks of 5801611 total bytes.
[758555] 2025/06/19 09:01:21.477297 [INF] 172.16.0.210:33700 - cid:69 - Slow Consumer Detected: MaxPending of 67108864 Exceeded
[758555] 2025/06/19 09:01:26.848853 [INF] 172.16.0.210:33700 - cid:69 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 261 chunks of 16987384 total bytes.
[758555] 2025/06/19 09:09:09.748325 [INF] 172.16.0.210:54990 - cid:70 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 174 chunks of 11324931 total bytes.
[758555] 2025/06/19 09:16:54.065987 [INF] 172.16.0.210:59114 - cid:71 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 88 chunks of 5662433 total bytes.

Is NATS on the node or farmer server? Have you trued using max_payload = 2MB as recommended in the official docs? Farming Cluster | Farm from Anywhere

NATS confirms the config file being used when it starts - do you see something like Using configuration file: ... at startup?

Your cache start command is a controller start - that’s a typo?

And your controller base-path is the cache folder?