After receiving the error below, my plotter no longer creates any new sectors. There are no errors on NATS and none on the Farmer; plotting just does not progress any further. Restarting the plotters and farmers fixes the issue.
I noticed in Discord that 3-4 other people have also had this same issue. I can add more logs if needed.
Just happened again. CPU usage is almost 0% after this happens. The Plotter Docker container is still up, but the last log line is just the WARN log.
Also, some more information on my setup in case it helps. I have two plotters with 4090s. One runs NATS, Controller, Cache, and Plotter. The other runs Controller, Cache, and Plotter and is connected to NATS on the first host. My farmers are 6 hosts with 16T each. They connect to NATS on the plotter and run only a Farmer.
Also one thing I just noticed - I only need to restart the Farmer and it starts plotting again.
I am also seeing this behaviour. May be a complete coincidence but I also have a 4090 plotter (need more data points I suspect). Plotting halts after these messages and restarting the farmer gets it going again.
First, start the farmer with RUST_LOG=info,subspace_farmer=trace. This will generate a lot more logs, but should also help with debugging (make sure Docker is configured to retain at least a couple of GB of logs).
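For anyone unsure how to set this up with Docker, here is a minimal sketch, assuming the farmer is started with plain docker run and uses Docker's default json-file log driver. The function name, the image and the farmer arguments are placeholders, not taken from this thread; with Docker Compose the same values would go under environment and logging instead.

```shell
# Sketch only: start a farmer container with trace logging and enough log retention.
# Usage: run_farmer_with_trace_logs <image> [farmer arguments...]
# <image> and the farmer arguments are placeholders; use whatever you already run.
run_farmer_with_trace_logs() {
    image="$1"
    shift
    # max-size=1g with max-file=3 keeps up to roughly 3 GB of rotated logs.
    docker run -d \
        --env RUST_LOG=info,subspace_farmer=trace \
        --log-driver json-file \
        --log-opt max-size=1g \
        --log-opt max-file=3 \
        "$image" "$@"
}
```

The log options only apply to newly created containers, so an existing farmer container has to be recreated for them to take effect.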
Second, it’d help to dump backtraces of threads with gdb when this happens.
Here is how I’d do it:
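A minimal sketch of one common way to capture thread backtraces with gdb, not necessarily the exact commands meant above. It assumes gdb is installed where the farmer process is visible (on the host, or inside the container) and that you are allowed to attach to it, which usually needs root or the SYS_PTRACE capability. The function name and the process name in the example are assumptions.

```shell
# Sketch only: dump backtraces of every thread of a running process to a file.
# Usage: dump_backtraces <pid> <output-file>
dump_backtraces() {
    pid="$1"
    out="$2"
    # -batch makes gdb exit (and detach) after running the commands;
    # "thread apply all bt" prints a backtrace for each thread.
    gdb -p "$pid" -batch -ex "thread apply all bt" > "$out" 2>&1
}

# Example, commented out on purpose. The process name is a guess; check it with ps first.
# dump_backtraces "$(pgrep -o -f subspace-farmer)" farmer-backtraces.txt
```

Run it while plotting is stalled, before restarting the farmer, and attach the resulting file together with the trace logs.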
Okay, I’ve added the RUST_LOG setting and will report back when it happens again. Another interesting thing I noticed: I have 6 farmers attached to the plotter, and ALL of them stop when the plotter throws that WARN. Restarting any one of the Farmers gets plotting going again for just that Farmer.