I had an issue with a drive last night and noticed that if a drive drops out, the farmer just keeps looking for it, printing an error, and stops farming the other drives. Drives do go bad and can disconnect for various reasons, but I understand this does not happen often. Would it be worth making the farmer continue farming the remaining drives when this occurs?
Did the whole farmer process exit? I am planning to make it just print an error about the farm that failed and continue; I just haven't gotten to it yet.
No, it did not exit. It just kept printing the error over and over and did nothing else.
It’s the same topic I posted here the other day. I had the same issue when one of my disks became faulty.
Do you have logs from right before this happened? Ideally as text, screenshots are painful to work with.
Should be addressed by "Improve error handling" by nazar-pc, Pull Request #2639 in subspace/subspace on GitHub.
For me, the whole farmer process exited.
This has happened many times on one of my machines, maybe because something is wrong with a SATA power cable. Every day or so a disk drops out and the farmer exits, and I have to re-plug the disk and relaunch the farmer.
It would be great if the farmer could keep running on the remaining disks and just print a warning. Below is the log from when this happens:
2024-03-24T21:07:32.765924Z INFO {farm_index=0}: subspace_farmer::reward_signing: Successfully signed reward hash 0xf43feccf866e52102357bee67d000e25013c648ea0a9af0fb43db88a19feeb1b
2024-03-24T23:12:03.557530Z INFO {farm_index=1}: subspace_farmer::reward_signing: Successfully signed reward hash 0x9c20001740f45e709b180b013b08152c8f182af7f3b94a2b809b7ead21a4848c
2024-03-24T23:17:20.192270Z WARN subspace_networking::behavior::persistent_parameters: Failed to flush known peers to disk error=A device which does not exist was specified. (os error 433)
Error: Low-level auditing error: Failed read s-bucket 34783 of sector 330: A device which does not exist was specified. (os error 433)
Caused by:
    Failed read s-bucket 34783 of sector 330: A device which does not exist was specified. (os error 433)
You should definitely look into that. While we can make the code behave better in those cases, we won't be able to make a missing drive reappear.
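The behaviour requested in this thread (log an error for the farm that failed and keep farming the others) can be sketched as below. This is a minimal, self-contained illustration under stated assumptions, not the code from PR #2639: `Farm`, `audit`, `audit_all` and the `disconnected` flag are hypothetical stand-ins, and the audit result is a placeholder number rather than anything the real farmer computes.

```rust
/// Hypothetical stand-in for one farm (one disk); not the real subspace-farmer type.
struct Farm {
    index: usize,
    /// Simulates a disk that has dropped out.
    disconnected: bool,
}

impl Farm {
    /// Hypothetical audit step: fails if the underlying disk is gone,
    /// otherwise returns a placeholder value derived from the slot.
    fn audit(&self, slot: u64) -> Result<u64, String> {
        if self.disconnected {
            Err(format!(
                "farm {}: A device which does not exist was specified.",
                self.index
            ))
        } else {
            Ok(slot * 10 + self.index as u64)
        }
    }
}

/// Audits every active farm for one slot. A farm whose audit fails is
/// logged once and removed from the active set, instead of aborting the
/// whole process or retrying the missing disk forever.
/// Returns (results from healthy farms, indices of farms dropped this slot).
fn audit_all(farms: &mut Vec<Farm>, slot: u64) -> (Vec<u64>, Vec<usize>) {
    let mut results = Vec::new();
    let mut failed = Vec::new();
    farms.retain(|farm| match farm.audit(slot) {
        Ok(value) => {
            results.push(value);
            true // keep farming this disk
        }
        Err(error) => {
            eprintln!(
                "WARN farm_index={}: farm failed, continuing without it: {}",
                farm.index, error
            );
            failed.push(farm.index);
            false // drop only this farm; the others keep going
        }
    });
    (results, failed)
}
```

With three farms where farm 1 has disconnected, the first call to `audit_all` warns once about farm 1 and returns results for farms 0 and 2; later calls only touch the two healthy farms. Dropping the failed farm (rather than retrying it) is one possible design choice; re-adding a farm once its drive comes back would need extra logic not shown here.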