Farming cluster

Is the ARM server a Pi or something similar? If so, I don’t think you should enable plotting on it; it might just slow things down.

The ARM server has an 8-core aarch64 CPU: 4 cores at 1.8GHz and the other 4 at 2.8GHz. Not a Pi.

If you run a plotter on that slow ARM machine, then it will plot on that slow ARM machine in addition to x86-64. If you don’t want to use it for plotting - don’t run plotter there.

I have 10 servers, one of which has a large SSD to plot to.
How should I arrange the controller, cache, plotter and farmer across the servers to plot to the SSD quickly?

For plotting, the primary thing that matters is the plotter. You’ll want to run it only on the machine or machines that are fast; you don’t need to run it everywhere. The faster the networking between machines, the better. A 10G network is ideal, but of course not required.

The log formats of the farmers are not consistent, and there are quite a few warning messages. Is this normal?

2024-05-24T07:50:32.111870Z  INFO {farm_index=0}: subspace_farmer::single_disk_farm::plotting: Plotting sector (7.02% complete) sector_index=73
2024-05-24T07:50:32.183444Z  INFO {farm_index=0}: subspace_farmer::single_disk_farm::plotting: Plotting sector (7.12% complete) sector_index=74
2024-05-24T07:50:36.485025Z  INFO {farm_index=0}: subspace_farmer::single_disk_farm::plotting: Plotting sector (7.21% complete) sector_index=75
2024-05-24T07:50:42.236333Z  WARN {farm_index=0}:{public_key=8cea533ae6691fd7167f9565993bbb4e7f4fd17150f8f41c306fc9778e10a071 sector_index=71}: subspace_farmer::single_disk_farm::plotting: Failed to plot sector, retrying in 1s sector_index=74 error=Low-level plotting error: Timed out without ping from plotter
2024-05-24T07:50:44.120945Z  INFO {farm_index=0}:{public_key=8cea533ae6691fd7167f9565993bbb4e7f4fd17150f8f41c306fc9778e10a071 sector_index=71}: subspace_farmer::single_disk_farm::plotting: Plotting sector (7.31% complete) sector_index=76
2024-05-24T07:50:46.891632Z  INFO {farm_index=0}: subspace_farmer::single_disk_farm::plotting: Plotting sector retry sector_index=71
2024-05-24T07:50:50.888259Z  INFO {farm_index=0}: subspace_farmer::single_disk_farm::plotting: Plotting sector retry sector_index=74
2024-05-24T07:50:54.017888Z  INFO {farm_index=0}: subspace_farmer::single_disk_farm::plotting: Plotting sector (7.40% complete) sector_index=77
2024-05-24T07:50:54.359936Z  INFO {farm_index=0}: subspace_farmer::single_disk_farm::plotting: Plotting sector (7.50% complete) sector_index=78
2024-05-24T07:50:57.398469Z  WARN {farm_index=0}: subspace_farmer::single_disk_farm::plotting: Failed to plot sector, retrying in 1s sector_index=76 error=Low-level plotting error: Timed out without ping from plotter
2024-05-24T07:51:00.939702Z  WARN {farm_index=0}:{public_key=8cea533ae6691fd7167f9565993bbb4e7f4fd17150f8f41c306fc9778e10a071 sector_index=79}: subspace_farmer::single_disk_farm::plotting: Failed to plot sector, retrying in 1s sector_index=74 error=Low-level plotting error: Timed out without ping from plotter
2024-05-24T07:51:01.363339Z  INFO {farm_index=0}:{public_key=8cea533ae6691fd7167f9565993bbb4e7f4fd17150f8f41c306fc9778e10a071 sector_index=79}: subspace_farmer::single_disk_farm::plotting: Plotting sector retry sector_index=76
2024-05-24T07:51:01.929071Z  WARN {farm_index=0}:{public_key=8cea533ae6691fd7167f9565993bbb4e7f4fd17150f8f41c306fc9778e10a071 sector_index=79}: subspace_farmer::single_disk_farm::plotting: Failed to plot sector, retrying in 1s sector_index=71 error=Low-level plotting error: Timed out without ping from plotter
2024-05-24T07:51:04.598118Z  INFO {farm_index=0}:{public_key=8cea533ae6691fd7167f9565993bbb4e7f4fd17150f8f41c306fc9778e10a071 sector_index=79}:{public_key=8cea533ae6691fd7167f9565993bbb4e7f4fd17150f8f41c306fc9778e10a071 sector_index=74}: subspace_farmer::single_disk_farm::plotting: Plotting sector (7.60% complete) sector_index=79
2024-05-24T07:51:16.403998Z  WARN {farm_index=0}:{public_key=8cea533ae6691fd7167f9565993bbb4e7f4fd17150f8f41c306fc9778e10a071 sector_index=79}:{public_key=8cea533ae6691fd7167f9565993bbb4e7f4fd17150f8f41c306fc9778e10a071 sector_index=74}: subspace_farmer::single_disk_farm::plotting: Failed to plot sector, retrying in 1s sector_index=76 error=Low-level plotting error: Timed out without ping from plotter
2024-05-24T07:51:19.176133Z  INFO {farm_index=0}:{public_key=8cea533ae6691fd7167f9565993bbb4e7f4fd17150f8f41c306fc9778e10a071 sector_index=79}:{public_key=8cea533ae6691fd7167f9565993bbb4e7f4fd17150f8f41c306fc9778e10a071 sector_index=74}:{public_key=8cea533ae6691fd7167f9565993bbb4e7f4fd17150f8f41c306fc9778e10a071 sector_index=71}:{public_key=8cea533ae6691fd7167f9565993bbb4e7f4fd17150f8f41c306fc9778e10a071 sector_index=80}: subspace_farmer::single_disk_farm::plotting: Plotting sector retry sector_index=71
2024-05-24T07:51:26.535976Z  INFO {farm_index=0}:{public_key=8cea533ae6691fd7167f9565993bbb4e7f4fd17150f8f41c306fc9778e10a071 sector_index=79}:{public_key=8cea533ae6691fd7167f9565993bbb4e7f4fd17150f8f41c306fc9778e10a071 sector_index=74}: subspace_farmer::single_disk_farm::plotting: Plotting sector (7.69% complete) sector_index=80
2024-05-24T07:51:33.469966Z  INFO {farm_index=0}:{public_key=8cea533ae6691fd7167f9565993bbb4e7f4fd17150f8f41c306fc9778e10a071 sector_index=79}:{public_key=8cea533ae6691fd7167f9565993bbb4e7f4fd17150f8f41c306fc9778e10a071 sector_index=74}: subspace_farmer::single_disk_farm::plotting: Plotting sector retry sector_index=76
2024-05-24T07:51:43.480165Z  WARN {farm_index=0}:{public_key=8cea533ae6691fd7167f9565993bbb4e7f4fd17150f8f41c306fc9778e10a071 sector_index=79}:{public_key=8cea533ae6691fd7167f9565993bbb4e7f4fd17150f8f41c306fc9778e10a071 sector_index=74}:{public_key=8cea533ae6691fd7167f9565993bbb4e7f4fd17150f8f41c306fc9778e10a071 sector_index=81}: subspace_farmer::single_disk_farm::plotting: Failed to plot sector, retrying in 1s sector_index=76 error=Low-level plotting error: Timed out without ping from plotter
2024-05-24T07:53:30.455822Z  INFO {farm_index=0}:{public_key=8cea533ae6691fd7167f9565993bbb4e7f4fd17150f8f41c306fc9778e10a071 sector_index=79}:{public_key=8cea533ae6691fd7167f9565993bbb4e7f4fd17150f8f41c306fc9778e10a071 sector_index=74}:{public_key=8cea533ae6691fd7167f9565993bbb4e7f4fd17150f8f41c306fc9778e10a071 sector_index=81}: subspace_farmer::single_disk_farm::plotting: Plotting sector retry sector_index=76
2024-05-24T07:53:52.804138Z  INFO {farm_index=0}:{public_key=8cea533ae6691fd7167f9565993bbb4e7f4fd17150f8f41c306fc9778e10a071 sector_index=79}:{public_key=8cea533ae6691fd7167f9565993bbb4e7f4fd17150f8f41c306fc9778e10a071 sector_index=74}: subspace_farmer::single_disk_farm::plotting: Plotting sector (7.79% complete) sector_index=81
2024-05-24T07:53:52.806595Z  INFO {farm_index=0}:{public_key=8cea533ae6691fd7167f9565993bbb4e7f4fd17150f8f41c306fc9778e10a071 sector_index=79}:{public_key=8cea533ae6691fd7167f9565993bbb4e7f4fd17150f8f41c306fc9778e10a071 sector_index=74}: subspace_farmer::single_disk_farm::plotting: Plotting sector (7.88% complete) sector_index=82
2024-05-24T07:53:52.807674Z  INFO {farm_index=0}:{public_key=8cea533ae6691fd7167f9565993bbb4e7f4fd17150f8f41c306fc9778e10a071 sector_index=79}:{public_key=8cea533ae6691fd7167f9565993bbb4e7f4fd17150f8f41c306fc9778e10a071 sector_index=74}: subspace_farmer::single_disk_farm::plotting: Plotting sector (7.98% complete) sector_index=83
2024-05-24T07:53:52.809282Z  INFO {farm_index=0}:{public_key=8cea533ae6691fd7167f9565993bbb4e7f4fd17150f8f41c306fc9778e10a071 sector_index=79}:{public_key=8cea533ae6691fd7167f9565993bbb4e7f4fd17150f8f41c306fc9778e10a071 sector_index=74}: subspace_farmer::single_disk_farm::plotting: Plotting sector (8.08% complete) sector_index=84

Yeah, logs are off. Plotting warnings are not good and ideally should not happen. Please create a separate forum thread with a description of your setup and we’ll try to figure out why it happens.
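When writing up such a report, it can help to say how often each sector timed out. The following is only a sketch in Python, not part of the farmer tooling: it assumes the WARN line layout shown in the log above, and the function name count_timeouts is made up for illustration. The sample lines are copied from that log.

```python
import re
from collections import Counter

# Matches the trailing fields of a plotting WARN line as it appears in the log above.
# The sector that failed is the sector_index right after "retrying in 1s".
TIMEOUT_RE = re.compile(
    r"WARN .*Failed to plot sector, retrying in 1s "
    r"sector_index=(\d+) error=.*Timed out without ping from plotter"
)


def count_timeouts(lines):
    """Return a Counter mapping sector_index -> number of timeout warnings."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Three lines copied from the farmer log above (two WARN, one INFO).
sample = [
    "2024-05-24T07:50:42.236333Z  WARN {farm_index=0}:"
    "{public_key=8cea533ae6691fd7167f9565993bbb4e7f4fd17150f8f41c306fc9778e10a071 sector_index=71}: "
    "subspace_farmer::single_disk_farm::plotting: Failed to plot sector, retrying in 1s "
    "sector_index=74 error=Low-level plotting error: Timed out without ping from plotter",
    "2024-05-24T07:50:54.017888Z  INFO {farm_index=0}: "
    "subspace_farmer::single_disk_farm::plotting: Plotting sector (7.40% complete) sector_index=77",
    "2024-05-24T07:50:57.398469Z  WARN {farm_index=0}: "
    "subspace_farmer::single_disk_farm::plotting: Failed to plot sector, retrying in 1s "
    "sector_index=76 error=Low-level plotting error: Timed out without ping from plotter",
]

for sector, hits in sorted(count_timeouts(sample).items()):
    print(f"sector_index={sector}: {hits} timeout(s)")
```

To run it on a real log, pass an open file object instead of the sample list.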

Logs should be fixed with Improve farming cluster logging by nazar-pc · Pull Request #2789 · subspace/subspace · GitHub

I expected farming-cluster to accelerate my plotting phase, and here is my test.
Test ONE:
Server 1: controller, cache, plotter, farmer.
It took 172 mins to plot 20GB by itself.

Test TWO:
6 servers are used:
Server 1: controller, cache, plotter, farmer
Server 2, 3, 4, 5: controller, cache, plotter
Server 6: NATS
It took 179 mins to plot 20GB with the cluster.

Note: the farming directory was cleared before each test.

Why is there no plotting speed improvement from the farming-cluster feature? In which cases can we see the benefit of farming-cluster?

By default the farmer will not request more than 8 sectors to be plotted at a time in order to limit memory usage (though I think we’ll increase that significantly thanks to some improvements already made to the plotting process), so if your servers provide more capacity in total, it will not actually be utilized. You can change that by increasing --sector-encoding-concurrency to something like 100.
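A back-of-the-envelope way to picture this cap is sketched below in Python. It is a deliberately simplified model, not how the farmer schedules work internally: the function name and the per-plotter slot counts are hypothetical, and the only numbers taken from this thread are the default cap of 8, the suggested value of 100, and the fact that a 20 GiB farm logs 20 sectors elsewhere in this thread.

```python
def sectors_in_flight(farmer_cap, plotter_slots, sectors_left):
    """Sectors being encoded at once under a simplified model.

    The count is limited by the farmer's request cap, by the total number
    of sectors the plotters can take on, and by how many sectors remain.
    """
    return min(farmer_cap, sum(plotter_slots), sectors_left)


DEFAULT_CAP = 8          # default described in the reply above
SECTORS_IN_FARM = 20     # a 20 GiB farm logs 20 sectors in this thread

one_server = [8]         # hypothetical: one plotter that can take 8 sectors
five_servers = [8] * 5   # hypothetical: five such plotters

print(sectors_in_flight(DEFAULT_CAP, one_server, SECTORS_IN_FARM))    # 8
print(sectors_in_flight(DEFAULT_CAP, five_servers, SECTORS_IN_FARM))  # 8
print(sectors_in_flight(100, five_servers, SECTORS_IN_FARM))          # 20
```

Under these assumptions, adding plotters changes nothing until the cap is raised, and a very small farm limits the gain even then.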

A plot directory created with the non-cluster version cannot be continued with the cluster version, right?

root@172-29-100-142:~# cat /disk/nvme1n1/ssc-3h/farmer-1/single_disk_farm.json |jq
{
  "v0": {
    "id": "01HY2PPKMTBXGD3VGMTV9BW5JW",
    "genesisHash": "0c121c75f4ef450f40619e1fca9d1e8e7fbabc42c895bc4790801e85d5a91c34",
    "publicKey": "e29c3bff319a6b1da300c3d794c1f089e0ef13bf4cfdc7a93f26724ae60ef77c",
    "piecesInSector": 1000,
    "allocatedSpace": 7573101084672
  }
}
root@172-29-100-142:~# 
root@172-29-100-142:~# /root/ssc/subspace-farmer-cluster info /disk/nvme1n1/ssc-3h/farmer-1/
Single disk farm 0:
  ID: 01HY2PPKMTBXGD3VGMTV9BW5JW
  Genesis hash: 0x0c121c75f4ef450f40619e1fca9d1e8e7fbabc42c895bc4790801e85d5a91c34
  Public key: 0xe29c3bff319a6b1da300c3d794c1f089e0ef13bf4cfdc7a93f26724ae60ef77c
  Allocated space: 6.9 TiB (7.6 TB)
  Directory: /disk/nvme1n1/ssc-3h/farmer-1/
root@172-29-100-142:~# 
root@172-29-100-142:~# /root/ssc/subspace-farmer-cluster cluster --nats-servers nats://172.29.100.141:4242 farmer --reward-address stC1HgpMEVpwYKEfiPPcDpLnwhmtV7RfaZHQjuUk1DfMbxxx path=/disk/nvme1n1/ssc-3h/farmer-1,size=7053GiB 
2024-05-27T03:06:45.096417Z  INFO async_nats: event: connected
2024-05-27T03:06:45.096494Z  INFO async_nats: event: connected
2024-05-27T03:06:45.096464Z  INFO async_nats: event: connected
2024-05-27T03:06:45.096488Z  INFO async_nats: event: connected
2024-05-27T03:06:45.096414Z  INFO async_nats: event: connected
2024-05-27T03:06:45.096488Z  INFO async_nats: event: connected
2024-05-27T03:06:45.096517Z  INFO async_nats: event: connected
2024-05-27T03:06:45.096516Z  INFO async_nats: event: connected
2024-05-27T03:06:45.805534Z ERROR {farm_index=0}: subspace_farmer::commands::cluster::farmer: Farm creation failed error=Can't preallocate plot file, probably not enough space on disk: File exists (os error 17)
Error: Can't preallocate plot file, probably not enough space on disk: File exists (os error 17)

Simply reduce the plot size or remove the piece_cache.bin from each plot drive as they are not used in clustering.
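For reference, the numbers in the output above can be checked with a few lines of Python. This is only unit arithmetic on values copied from this thread (allocatedSpace from single_disk_farm.json and size=7053GiB from the farmer command); the variable names are made up for illustration.

```python
GIB = 1024 ** 3  # bytes in one GiB
TIB = 1024 ** 4  # bytes in one TiB
TB = 1000 ** 4   # bytes in one TB

allocated_space = 7573101084672  # "allocatedSpace" from single_disk_farm.json
requested_gib = 7053             # size=7053GiB from the farmer command

# The requested size is exactly what the existing farm already records.
print(requested_gib * GIB == allocated_space)  # True

# These match the "Allocated space: 6.9 TiB (7.6 TB)" line printed by `info`.
print(round(allocated_space / TIB, 1))  # 6.9
print(round(allocated_space / TB, 1))   # 7.6
```

So the size passed to the cluster farmer is the same one the farm was created with; the suggestion above is to pass a smaller value or to delete piece_cache.bin.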

OK, thank you. This is indeed feasible.

With the --sector-encoding-concurrency 100 parameter, the plot time is 3 hours and 20 minutes, longer than without it.

In which cases can a farming cluster perform better than a regular farmer?

./subspace-farmer cluster --nats-server nats://192.168.0.10:4222 farmer --reward-address stXXX path=/data/farm_test,size=20GiB --sector-encoding-concurrency 100
2024-05-27T04:26:06.436001Z  INFO async_nats: event: connected
2024-05-27T04:26:06.436282Z  INFO async_nats: event: connected
2024-05-27T04:26:06.437065Z  INFO async_nats: event: connected
2024-05-27T04:26:06.437555Z  INFO async_nats: event: connected
2024-05-27T04:26:06.438044Z  INFO async_nats: event: connected
2024-05-27T04:26:06.438299Z  INFO async_nats: event: connected
2024-05-27T04:26:06.438353Z  INFO async_nats: event: connected
2024-05-27T04:26:06.439017Z  INFO async_nats: event: connected
2024-05-27T04:26:06.855588Z  INFO {farm_index=0}: subspace_farmer::single_disk_farm::plot_cache: Checking plot cache contents, this can take a while
2024-05-27T04:26:06.856689Z  INFO {farm_index=0}: subspace_farmer::single_disk_farm::plot_cache: Finished checking plot cache contents
2024-05-27T04:26:06.857521Z  INFO {farm_index=0}: subspace_farmer::single_disk_farm: Benchmarking faster proving method
2024-05-27T04:26:08.188449Z  INFO {farm_index=0}: subspace_farmer::single_disk_farm: Faster proving method found fastest_mode=ConcurrentChunks
2024-05-27T04:26:08.219843Z  INFO {farm_index=0}: subspace_farmer::commands::cluster::farmer: Farm 0:
2024-05-27T04:26:08.219854Z  INFO {farm_index=0}: subspace_farmer::commands::cluster::farmer:   ID: 01HYW4RTM2AS2Y9V1FH3WDQXQQ
2024-05-27T04:26:08.219871Z  INFO {farm_index=0}: subspace_farmer::commands::cluster::farmer:   Genesis hash: 0x0c121c75f4ef450f40619e1fca9d1e8e7fbabc42c895bc4790801e85d5a91c34
2024-05-27T04:26:08.219873Z  INFO {farm_index=0}: subspace_farmer::commands::cluster::farmer:   Public key: 0xf0d0ee649c301a031dbbcd5d964e697fad354979a758e029d5dd9cb6c267711f
2024-05-27T04:26:08.219879Z  INFO {farm_index=0}: subspace_farmer::commands::cluster::farmer:   Allocated space: 20.0 GiB (21.5 GB)
2024-05-27T04:26:08.219881Z  INFO {farm_index=0}: subspace_farmer::commands::cluster::farmer:   Directory: /data/farm_test
2024-05-27T04:26:08.220270Z  INFO {farm_index=0}: subspace_farmer::single_disk_farm::farming: Subscribing to slot info notifications
2024-05-27T04:26:08.220305Z  INFO {farm_index=0}: subspace_farmer::reward_signing: Subscribing to reward signing notifications
2024-05-27T04:26:08.222332Z  INFO {farm_index=0}: subspace_farmer::single_disk_farm::plotting: Subscribing to archived segments
2024-05-27T04:26:08.227048Z  INFO {farm_index=0}: subspace_farmer::single_disk_farm::plotting: Plotting sector (0.00% complete) sector_index=0
2024-05-27T04:26:08.230237Z  INFO {farm_index=0}: subspace_farmer::single_disk_farm::plotting: Plotting sector (5.00% complete) sector_index=1
2024-05-27T04:26:08.234533Z  INFO {farm_index=0}: subspace_farmer::single_disk_farm::plotting: Plotting sector (10.00% complete) sector_index=2
2024-05-27T04:26:08.239297Z  INFO {farm_index=0}: subspace_farmer::single_disk_farm::plotting: Plotting sector (15.00% complete) sector_index=3
2024-05-27T04:26:08.246499Z  INFO {farm_index=0}: subspace_farmer::single_disk_farm::plotting: Plotting sector (20.00% complete) sector_index=4
2024-05-27T04:26:08.251019Z  INFO {farm_index=0}: subspace_farmer::single_disk_farm::plotting: Plotting sector (25.00% complete) sector_index=5
2024-05-27T04:26:09.488217Z  INFO {farm_index=0}: subspace_farmer::single_disk_farm::plotting: Plotting sector (30.00% complete) sector_index=6
2024-05-27T04:26:09.983376Z  INFO {farm_index=0}: subspace_farmer::single_disk_farm::plotting: Plotting sector (35.00% complete) sector_index=7
2024-05-27T04:26:09.998948Z  INFO {farm_index=0}: subspace_farmer::single_disk_farm::plotting: Plotting sector (40.00% complete) sector_index=8
2024-05-27T04:26:10.613134Z  INFO {farm_index=0}: subspace_farmer::single_disk_farm::plotting: Plotting sector (45.00% complete) sector_index=9
2024-05-27T04:26:10.931575Z  INFO {farm_index=0}: subspace_farmer::single_disk_farm::plotting: Plotting sector (50.00% complete) sector_index=10
2024-05-27T04:26:10.954486Z  INFO {farm_index=0}: subspace_farmer::single_disk_farm::plotting: Plotting sector (55.00% complete) sector_index=11
2024-05-27T04:26:20.943550Z  WARN {farm_index=0}:{public_key=f0d0ee649c301a031dbbcd5d964e697fad354979a758e029d5dd9cb6c267711f sector_index=12}: subspace_farmer::single_disk_farm::plotting: Failed to plot sector, retrying in 1s sector_index=10 error=Low-level plotting error: Timed out without ping from plotter
2024-05-27T04:38:32.309727Z  INFO {farm_index=0}:{public_key=f0d0ee649c301a031dbbcd5d964e697fad354979a758e029d5dd9cb6c267711f sector_index=12}:{public_key=f0d0ee649c301a031dbbcd5d964e697fad354979a758e029d5dd9cb6c267711f sector_index=10}: subspace_farmer::single_disk_farm::plotting: Plotting sector (60.00% complete) sector_index=12
2024-05-27T04:44:20.580794Z  INFO {farm_index=0}:{public_key=f0d0ee649c301a031dbbcd5d964e697fad354979a758e029d5dd9cb6c267711f sector_index=12}:{public_key=f0d0ee649c301a031dbbcd5d964e697fad354979a758e029d5dd9cb6c267711f sector_index=10}:{public_key=f0d0ee649c301a031dbbcd5d964e697fad354979a758e029d5dd9cb6c267711f sector_index=13}: subspace_farmer::single_disk_farm::plotting: Plotting sector retry sector_index=10
2024-05-27T05:42:56.406019Z  INFO {farm_index=0}: subspace_farmer::single_disk_farm::plotting: Plotting sector (65.00% complete) sector_index=13
2024-05-27T05:44:34.541380Z  INFO {farm_index=0}: subspace_farmer::single_disk_farm::plotting: Plotting sector (70.00% complete) sector_index=14
2024-05-27T06:01:38.951693Z  INFO {farm_index=0}: subspace_farmer::single_disk_farm::plotting: Plotting sector (75.00% complete) sector_index=15
2024-05-27T06:03:51.978759Z  INFO {farm_index=0}: subspace_farmer::single_disk_farm::plotting: Plotting sector (80.00% complete) sector_index=16
2024-05-27T06:04:09.576539Z  INFO {farm_index=0}: subspace_farmer::single_disk_farm::plotting: Plotting sector (85.00% complete) sector_index=17
2024-05-27T06:06:04.091562Z  INFO {farm_index=0}: subspace_farmer::single_disk_farm::plotting: Plotting sector (90.00% complete) sector_index=18
2024-05-27T06:06:45.322035Z  INFO {farm_index=0}: subspace_farmer::single_disk_farm::plotting: Plotting sector (95.00% complete) sector_index=19
2024-05-27T07:46:47.125506Z  INFO {farm_index=0}: subspace_farmer::single_disk_farm::plotting: Initial plotting complete
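The timings quoted for this run can be read straight off the log timestamps. The sketch below is plain Python, not part of the farmer tooling, and the helper name parse_ts is made up. It uses three lines copied from the log above: the first sector, the last sector, and the completion message.

```python
from datetime import datetime


def parse_ts(line):
    """Parse the leading UTC timestamp (microsecond precision) of a log line."""
    return datetime.strptime(line.split()[0], "%Y-%m-%dT%H:%M:%S.%fZ")


first_sector = parse_ts(
    "2024-05-27T04:26:08.227048Z  INFO {farm_index=0}: "
    "subspace_farmer::single_disk_farm::plotting: Plotting sector (0.00% complete) sector_index=0"
)
last_sector = parse_ts(
    "2024-05-27T06:06:45.322035Z  INFO {farm_index=0}: "
    "subspace_farmer::single_disk_farm::plotting: Plotting sector (95.00% complete) sector_index=19"
)
complete = parse_ts(
    "2024-05-27T07:46:47.125506Z  INFO {farm_index=0}: "
    "subspace_farmer::single_disk_farm::plotting: Initial plotting complete"
)

total = complete - first_sector  # whole plotting run
tail = complete - last_sector    # wait after the last sector was requested

print(total)  # 3:20:38.898458
print(tail)   # 1:40:01.803471
```

About half of the 3 hours 20 minutes falls between the last "Plotting sector" line and "Initial plotting complete".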

Hard to answer without knowing what machines you have, what networking you have between them, etc. This thread is already quite long, please create a separate topic and describe all the machines you have, networking between them and what components each of them is running.

System Specs
Alpha has a 12c Ryzen and runs only node, cache and controller + nats.

Farmers are all 7950X or HP Z840s with a v4 Xeon.

Errors
Controller
2024-06-03T12:57:01.149958Z INFO async_nats: event: slow consumers for subscription 1

The above message was getting spammed until I restarted my controller.

Controller Logs

I can upload logs for other parts of the cluster if needed.

System Diagram:
Screenshot 2024-06-03 060545

I have detailed logs with a similar issue, I’ll ping you if more information is needed

Turns out I can benefit from logs with RUST_LOG=info,subspace_farmer=trace, thanks!

It took about two days to replicate the "slow consumers for subscription 1" errors. I rotated the logs and was able to capture it with just under one day of logs. I then realized I only had trace logging on the controller, cache, and plotter (standard logging on the farmers). The logs are a pretty good size: ~1.9GiB (~158MiB compressed).

If you’d like me to run this again with the farmers using extended logs, let me know.

Snapshot build #342

System Specs

Role: NATS (nats.log)
Dual Socket 2680v3 24 cores (48 threads)
Link: 20Gbit

Role: Cache, Controller, Plotter, Farmer (cache.log controller.log plotter.log farmer1.log)
Dual Socket 7742 128 cores (256 threads)
Cache: 100GiB
Plots: 109T
Link: 100Gbit

Role: Node (node.log)
EPYC 7F72 24 Cores (48 Threads)
Plots: 0
Link: 100Gbit

Role: Farmer (farmer3.log)
Dual Socket 2687v4 24 cores (48 threads)
Plots: 189T
Link: 20Gbit

Role: Farmer (farmer4.log)
Dual Socket 2697A 32 Cores (64 Threads)
Plots: 189T
Link: 20Gbit

Role: Farmer (farmer6.log)
Dual Socket 6136 24 Cores (48 Threads)
Plots: 91TB
Link: 20Gbit

Logs

I’m wondering if you’ll be able to reproduce it with "There are too many warning logs in the farmer cluster - #22 by duanyz_aiyo".