Cluster Farm plotter/farmer panic

Qwinn · June 14, 2024, 9:10pm

It may be a pain to read. I use a lot of environment variables to make updating easy. This is an include in all of them.

/mnt/sub/space/subspace.i:

NODEIP="192.168.4.21"
NATSIP="192.168.4.21"
NODERPC="9944"
CONTROLLERIP="192.168.4.21"
FARMPROMETHEUS="2222"
NODEEXEC="subspace-node-ubuntu-x86_64-skylake-gemini-3h-"
FARMEXEC="subspace-farmer-ubuntu-x86_64-skylake-gemini-3h-"
CURRVER="2024-jun-11"
DISKPARMS="record-chunks-mode=ConcurrentChunks"
REWARDADDRESS="blahblahblah"

/mnt/sub/space/startcontroller:

#!/bin/bash

. /mnt/sub/space/subspace.i

$PWD/$FARMEXEC$CURRVER cluster --nats-server nats://127.0.0.1:4222 \
controller \
--base-path /mnt/sub/space/controller \
--node-rpc-url ws://127.0.0.1:$NODERPC \
--listen-on /ip4/$CONTROLLERIP/tcp/30533 \
>> $PWD/controllerS-$CURRVER.log 2>&1

/mnt/sub/space/startcache:

. /mnt/sub/space/subspace.i

$PWD/$FARMEXEC$CURRVER cluster --nats-server nats://127.0.0.1:4222 \
cache \
path=$PWD/cache,size=200G

/mnt/sub/space/startplotter:

. /mnt/sub/space/subspace.i

./subspace-farmer cluster --nats-server nats://$NATSIP:4222 \
    plotter \
    --plotting-thread-pool-size 8 \
    >> $PWD/plotterS-$CURRVER.log 2>&1

Qwinn · June 14, 2024, 9:12pm

Wait a minute. I just noticed that the last one, the plotter, is calling “subspace-farmer”. That must be the #344 build. Ugh.

Let me try fixing that.

EDIT: Sigh, yeah, it looks like that was the issue. Sorry. Trying unusually named builds with my setup makes me prone to this sort of thing. I’ve solved it since by just renaming unusual builds to the current date as if they are a normal release, then letting my scripts run unaltered.
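For anyone with a similar setup, here is a minimal sketch of that approach, not a tested part of my scripts. The helper names `farmer_bin` and `install_snapshot` are made up for illustration, and copying rather than moving the snapshot is an assumption. The idea is that every start script resolves the binary from `FARMEXEC` and `CURRVER` in the include and refuses to start if that file is missing, so a hard-coded name like `./subspace-farmer` cannot sneak in.

```shell
#!/bin/bash
# Sketch only: helper names are hypothetical, not part of any Subspace tooling.

# farmer_bin DIR PREFIX VERSION
# Prints DIR/PREFIXVERSION, or fails if that file is missing or not executable.
farmer_bin() {
    local bin="$1/$2$3"
    if [ ! -x "$bin" ]; then
        echo "farmer binary not found or not executable: $bin" >&2
        return 1
    fi
    printf '%s\n' "$bin"
}

# install_snapshot SRC DIR PREFIX VERSION
# Copies a snapshot build to the release-style name DIR/PREFIXVERSION
# (assumption: copy instead of rename, so the original download is kept).
install_snapshot() {
    cp -- "$1" "$2/$3$4" && chmod +x "$2/$3$4"
}

# Usage in a start script (commented out so the sketch has no side effects):
# . /mnt/sub/space/subspace.i
# BIN=$(farmer_bin "$PWD" "$FARMEXEC" "$CURRVER") || exit 1
# "$BIN" cluster --nats-server "nats://$NATSIP:4222" plotter \
#     --plotting-thread-pool-size 8 >> "$PWD/plotterS-$CURRVER.log" 2>&1
```

With this, a snapshot build is installed once under the name the scripts already expect, and `CURRVER` in the include stays the only thing that changes.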

nazar-pc · June 14, 2024, 9:20pm

Right, that is why I asked for the full commands.
One more mystery resolved, then.

I should have been more careful and made sure it either works or doesn’t, rather than crashing that way, but I was lazy.

Qwinn · June 14, 2024, 9:50pm

Yeah, good call on having me go over those. Hopefully, with my new method for running snapshot builds, it won’t happen again.

Qwinn · June 15, 2024, 1:20am

One very nice improvement in this jun-11 version is I’m not getting occasional timeout errors in the plotter anymore, at least so far. Those were quite annoying.

nazar-pc · June 15, 2024, 1:32am

The test build mentioned above fixes more causes of timeouts, though they are definitely rarer now. I appreciate all the testing; it helps make the software better!

Qwinn · June 15, 2024, 5:05am

Quite welcome.

Off-topic tip: when running farmer-executable cluster plotter --help, the description of the plotting-cpu-cores option mentions a requirement to set --replotting-cpu-cores a certain way, but that option doesn’t appear to exist in cluster plotting (which makes sense).

I’m really impressed at how well jun-11 cluster is working. Very smooth, no errors, good performance and not heavily impacting other processes. Nice work.

(That said, I still haven’t enabled farming or plotting across the LAN. But locally, working great. I’ll start that when all my local disks are fully plotted and replotted. That may take a few days.)

JeanS · June 15, 2024, 11:49am

I have the same problem on 2 of my 3 plotters.

How do I get this fix (the snapshot build) into use, given that I am using a Portainer / Docker Stacks config?

My current plotter config:

version: "3.8"
services:
  ss_plot_zen:
    image: Package farmer · GitHub
    command:
      [
        "cluster",
        "--nats-server", "nats_edge:4222",
        "plotter"
      ]
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints:
          - node.hostname == sub-zen
    environment:
      - TZ=Europe/Finland
    labels:
      js-subspace-plotter.name: "Subspace Plotter - Zen"
    networks:
      subspace_nwk:

nazar-pc · June 15, 2024, 5:36pm

Copy-paste typos: Fix cluster plotter docs by nazar-pc · Pull Request #2853 · subspace/subspace · GitHub

As mentioned in the solution above, you need to make sure you run the same release of the node and all farmer components. If you have mismatches, you may get these issues. Going forward we’ll try to avoid them, though.

JeanS · June 15, 2024, 6:10pm

Server 1 is a Dell Precision 5820 (Intel® Xeon® Processor W-2000 Family) with DDR4 ECC memory.
Everything works fine on this machine (Ubuntu 24.04 LTS as a Proxmox VM): cache, controller, farmer, node and plotter.
I am using gemini-3h-2024-jun-11.
The problem is that this server has a pretty weak CPU compared to the other servers, but it has the most SSD capacity.

Server 2 is a Dell PowerEdge R720 (Xeon) with DDR3 ECC memory.
None of the official builds work on it; the farmer, cache, controller and plotter all crash on startup (there is no node on this server).
My own custom build does work on it, except that the plotter has this ‘Invalid Scalar’ error.

Server 3 is a Threadripper 1950X with DDR4 ECC memory. It runs only a plotter and a farmer (the controller and cache are used from server 2 and the node from server 1). The plotter has this ‘Invalid Scalar’ issue.

It is a bit of a challenge to use exactly the same versions on all servers, but I can continue testing. Otherwise I’ll put servers 2 and 3 on hold.

nazar-pc · June 15, 2024, 6:23pm

When upgrading to jun-11 you have to use the same version everywhere, or you will definitely have issues.
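A quick way to catch a mismatch before starting anything is to compare the version strings of all components in one place. This is only a rough sketch: the `all_same_version` helper is hypothetical, and how you collect the version string for each component (binary file name, image tag, your own notes) is left to you.

```shell
#!/bin/bash
# Sketch only: all_same_version is a hypothetical helper, not a Subspace tool.

# all_same_version LABEL=VERSION...
# Returns 0 if every VERSION equals the first one; otherwise prints each
# mismatching entry to stderr and returns 1.
all_same_version() {
    local first="" entry ver status=0
    for entry in "$@"; do
        ver="${entry#*=}"          # strip the "LABEL=" part
        if [ -z "$first" ]; then
            first="$ver"
        elif [ "$ver" != "$first" ]; then
            echo "version mismatch: $entry (expected $first)" >&2
            status=1
        fi
    done
    return "$status"
}

# Example call; the labels and values are placeholders you fill in yourself:
# all_same_version node=gemini-3h-2024-jun-11 \
#     controller=gemini-3h-2024-jun-11 plotter=gemini-3h-2024-jun-11
```

The check itself knows nothing about the farmer; it only fails loudly when one entry differs from the rest.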
