Node not able to sync for days

Hello, out of all my nodes, one isn't able to sync and I have no clue why. It is currently stuck at 13k. It was originally synced, then it stopped being able to sync. Eventually I thought the database might be corrupted, so I deleted the node's database. It is still stuck at 13k height. I tried with fewer in/out peers and, at the moment, with a lot. The machine is a 2x 32-core EPYC (Milan) running Fedora 37. Not sure what else to tell you; I think there is something specific to this node.


– firewall ports open
firewall-cmd --list-ports
3389/tcp 30200-30206/tcp 30900-30906/tcp

– listening on ports
lsof -i -P -n | grep LISTEN | grep -i sub
subspace- 248964 srv_subspace 292u IPv4 17774598 0t0 TCP *:30904 (LISTEN)
subspace- 248964 srv_subspace 297u IPv6 17774601 0t0 TCP *:30900 (LISTEN)
subspace- 248964 srv_subspace 299u IPv4 17774603 0t0 TCP *:30900 (LISTEN)
subspace- 248964 srv_subspace 466u IPv4 18735601 0t0 TCP 127.0.0.1:30903 (LISTEN)
subspace- 248964 srv_subspace 467u IPv4 18936226 0t0 TCP 127.0.0.1:30901 (LISTEN)

That's how it has looked for days now. Restarting the node or the server doesn't help. I have two totally identical machines, same hardware and software, and they have no issues. Where and what do I need to dig into to find the reason for this issue?

There was an issue with node sync from DSN in older releases; sep-13-2 fixes that. Also, you have way too many peers; there is no need to customize node options, and it may cause more harm than good. Defaults should work just fine. It might take some time before the node starts importing blocks, but it shouldn't take days or anything like that.
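For reference, a minimal sketch of what running with defaults looks like, assuming the standard Substrate-style CLI the node inherits (the binary name, chain name and other flags below are placeholders, keep your own):

– keep your usual chain/role flags, drop the peer-count overrides
./subspace-node --chain <your-chain> --name "your-node-name"
– i.e. no --in-peers / --out-peers or other networking overrides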

Thanks, sep-13-2 has made no difference so far. As I said, since it doesn't sync I started changing defaults, for example the peer counts; I tried with fewer and with more, and neither had a positive effect.

sep-13-2 takes time to start; give it maybe an hour, depending on what the last block in your archival history is, but it should get unstuck. If you restart the node, you just reset that process to zero. You can get more detailed logs with RUST_LOG=info,subspace_service=trace; they will print some useful details about the sync process so you can see it is doing something.
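For example, a minimal sketch assuming the binary is called subspace-node and is launched directly (if it runs as a systemd service instead, set the variable via Environment= in the unit and restart once to pick it up):

– trace-level details from the sync machinery on top of the usual info logs
RUST_LOG=info,subspace_service=trace ./subspace-node <your-usual-flags>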

Here is another node that went out of sync… In the log it looks good until:

2023-09-16 08:10:55 [Consensus] :sparkles: Imported #453531 (0xca39…b27b)
2023-09-16 08:10:55 [Consensus] :sparkles: Imported #453530 (0xc379…23bb)
2023-09-16 08:10:57 [Consensus] :sparkles: Imported #453532 (0xf412…1fea)
2023-09-16 08:10:58 [Consensus] :zzz: Idle (17 peers), best: #453532 (0xf412…1fea), finalized #232206 (0x9143…9e6f), :arrow_down: 61.8kiB/s :arrow_up: 27.0kiB/s
2023-09-16 08:10:58 [Consensus] :ballot_box: Claimed vote at slot 1694866258
2023-09-16 08:10:59 [Consensus] Received notification to sync from DSN reason=WentOnlineSubspace
2023-09-16 08:11:03 [Consensus] :zzz: Idle (19 peers), best: #453532 (0xf412…1fea), finalized #232206 (0x9143…9e6f), :arrow_down: 13.5kiB/s :arrow_up: 7.0kiB/s
2023-09-16 08:11:08 [Consensus] :zzz: Idle (23 peers), best: #453532 (0xf412…1fea), finalized #232206 (0x9143…9e6f), :arrow_down: 25.0kiB/s :arrow_up: 1.5kiB/s
2023-09-16 08:11:13 [Consensus] :zzz: Idle (25 peers), best: #453532 (0xf412…1fea), finalized #232206 (0x9143…9e6f), :arrow_down: 17.6kiB/s :arrow_up: 0.4kiB/s
2023-09-16 08:11:18 [Consensus] :zzz: Idle (27 peers), best: #453532 (0xf412…1fea), finalized #232206 (0x9143…9e6f), :arrow_down: 33.4kiB/s :arrow_up: 0.6kiB/s
2023-09-16 08:11:23 [Consensus] :zzz: Idle (27 peers), best: #453532 (0xf412…1fea), finalized #232206 (0x9143…9e6f), :arrow_down: 25.9kiB/s :arrow_up: 0.4kiB/s
2023-09-16 08:11:28 [Consensus] :zzz: Idle (26 peers), best: #453532 (0xf412…1fea), finalized #232206 (0x9143…9e6f), :arrow_down: 27.7kiB/s :arrow_up: 0.9kiB/s
2023-09-16 08:11:33 [Consensus] :gear: Syncing 0.0 bps, target=#453539 (25 peers), best: #453532 (0xf412…1fea), finalized #232206 (0x9143…9e6f), :arrow_down: 18.3kiB/s :arrow_up: 0.8kiB/s
2023-09-16 08:11:38 [Consensus] :gear: Syncing 0.0 bps, target=#453540 (22 peers), best: #453532 (0xf412…1fea), finalized #232206 (0x9143…9e6f), :arrow_down: 24.5kiB/s :arrow_up: 0.4kiB/s
2023-09-16 08:11:43 [Consensus] :gear: Syncing 0.0 bps, target=#453540 (21 peers), best: #453532 (0xf412…1fea), finalized #232206 (0x9143…9e6f), :arrow_down: 23.3kiB/s :arrow_up: 0.6kiB/s

…it continues this way for 4-5 hours until there is this "WS transport error: i/o", and then it starts to sync again. This node is currently on sep-11.

2023-09-16 12:24:54 [Consensus] :gear: Syncing 0.0 bps, target=#456180 (15 peers), best: #453724 (0xad40…2cdc), finalized #232206 (0x9143…9e6f), :arrow_down: 50.1kiB/s :arrow_up: 0.4kiB/s
2023-09-16 12:24:59 [Consensus] :gear: Syncing 0.0 bps, target=#456180 (15 peers), best: #453724 (0xad40…2cdc), finalized #232206 (0x9143…9e6f), :arrow_down: 36.3kiB/s :arrow_up: 0.7kiB/s
2023-09-16 12:25:04 [Consensus] :gear: Syncing 0.0 bps, target=#456180 (13 peers), best: #453724 (0xad40…2cdc), finalized #232206 (0x9143…9e6f), :arrow_down: 37.9kiB/s :arrow_up: 0.4kiB/s
2023-09-16 12:25:09 [Consensus] :gear: Syncing 0.0 bps, target=#456180 (13 peers), best: #453724 (0xad40…2cdc), finalized #232206 (0x9143…9e6f), :arrow_down: 43.5kiB/s :arrow_up: 0.4kiB/s
2023-09-16 12:25:12 WS transport error: i/o error: Connection reset by peer (os error 104); terminate connection: 64
2023-09-16 12:25:12 WS transport error: i/o error: Connection reset by peer (os error 104); terminate connection: 58
2023-09-16 12:25:12 WS transport error: i/o error: Connection reset by peer (os error 104); terminate connection: 52
2023-09-16 12:25:12 WS transport error: i/o error: Connection reset by peer (os error 104); terminate connection: 55
2023-09-16 12:25:12 WS transport error: i/o error: Connection reset by peer (os error 104); terminate connection: 54
2023-09-16 12:25:12 WS transport error: i/o error: Connection reset by peer (os error 104); terminate connection: 53
2023-09-16 12:25:12 WS transport error: i/o error: Connection reset by peer (os error 104); terminate connection: 56
2023-09-16 12:25:12 WS transport error: i/o error: Connection reset by peer (os error 104); terminate connection: 59
2023-09-16 12:25:12 WS transport error: i/o error: Connection reset by peer (os error 104); terminate connection: 57
2023-09-16 12:25:12 WS transport error: i/o error: Connection reset by peer (os error 104); terminate connection: 60
2023-09-16 12:25:12 WS transport error: i/o error: Connection reset by peer (os error 104); terminate connection: 34
2023-09-16 12:25:12 WS transport error: i/o error: Connection reset by peer (os error 104); terminate connection: 63
2023-09-16 12:25:12 WS transport error: i/o error: Connection reset by peer (os error 104); terminate connection: 43
2023-09-16 12:25:12 WS transport error: i/o error: Connection reset by peer (os error 104); terminate connection: 37
2023-09-16 12:25:12 WS transport error: i/o error: Connection reset by peer (os error 104); terminate connection: 36
2023-09-16 12:25:12 WS transport error: i/o error: Connection reset by peer (os error 104); terminate connection: 62
2023-09-16 12:25:12 WS transport error: i/o error: Connection reset by peer (os error 104); terminate connection: 46
2023-09-16 12:25:12 WS transport error: i/o error: Connection reset by peer (os error 104); terminate connection: 47
2023-09-16 12:25:12 WS transport error: i/o error: Connection reset by peer (os error 104); terminate connection: 42
2023-09-16 12:25:12 WS transport error: i/o error: Connection reset by peer (os error 104); terminate connection: 51
2023-09-16 12:25:12 WS transport error: i/o error: Connection reset by peer (os error 104); terminate connection: 35
2023-09-16 12:25:12 WS transport error: i/o error: Connection reset by peer (os error 104); terminate connection: 45
2023-09-16 12:25:12 WS transport error: i/o error: Connection reset by peer (os error 104); terminate connection: 61
2023-09-16 12:25:12 WS transport error: i/o error: Connection reset by peer (os error 104); terminate connection: 41
2023-09-16 12:25:12 WS transport error: i/o error: Connection reset by peer (os error 104); terminate connection: 50
2023-09-16 12:25:12 WS transport error: i/o error: Connection reset by peer (os error 104); terminate connection: 48
2023-09-16 12:25:12 WS transport error: i/o error: Connection reset by peer (os error 104); terminate connection: 40
2023-09-16 12:25:12 WS transport error: i/o error: Connection reset by peer (os error 104); terminate connection: 44
2023-09-16 12:25:12 WS transport error: i/o error: Connection reset by peer (os error 104); terminate connection: 49
2023-09-16 12:25:12 WS transport error: i/o error: Connection reset by peer (os error 104); terminate connection: 39
2023-09-16 12:25:12 WS transport error: i/o error: Connection reset by peer (os error 104); terminate connection: 38
2023-09-16 12:25:12 WS transport error: i/o error: Connection reset by peer (os error 104); terminate connection: 33
2023-09-16 12:25:13 WS transport error: i/o error: Connection reset by peer (os error 104); terminate connection: 0
2023-09-16 12:25:14 [Consensus] :gear: Syncing 0.0 bps, target=#456183 (16 peers), best: #453724 (0xad40…2cdc), finalized #232206 (0x9143…9e6f), :arrow_down: 83.4kiB/s :arrow_up: 1.4kiB/s
2023-09-16 12:25:19 [Consensus] :gear: Syncing 0.0 bps, target=#456183 (14 peers), best: #453724 (0xad40…2cdc), finalized #232206 (0x9143…9e6f), :arrow_down: 61.4kiB/s :arrow_up: 0.4kiB/s
2023-09-16 12:25:24 [Consensus] :gear: Syncing 0.0 bps, target=#456184 (12 peers), best: #453724 (0xad40…2cdc), finalized #232206 (0x9143…9e6f), :arrow_down: 89.0kiB/s :arrow_up: 1.2kiB/s
2023-09-16 12:25:29 [Consensus] :gear: Syncing 0.0 bps, target=#456184 (11 peers), best: #453724 (0xad40…2cdc), finalized #232206 (0x9143…9e6f), :arrow_down: 115.3kiB/s :arrow_up: 0.2kiB/s
2023-09-16 12:25:34 [Consensus] :gear: Syncing 0.0 bps, target=#456185 (11 peers), best: #453724 (0xad40…2cdc), finalized #232206 (0x9143…9e6f), :arrow_down: 115.8kiB/s :arrow_up: 0.8kiB/s
2023-09-16 12:25:39 [Consensus] :gear: Syncing 0.0 bps, target=#456189 (11 peers), best: #453724 (0xad40…2cdc), finalized #232206 (0x9143…9e6f), :arrow_down: 130.9kiB/s :arrow_up: 0.7kiB/s
2023-09-16 12:25:44 [Consensus] :gear: Syncing 0.0 bps, target=#456190 (11 peers), best: #453724 (0xad40…2cdc), finalized #232206 (0x9143…9e6f), :arrow_down: 128.2kiB/s :arrow_up: 0.5kiB/s
2023-09-16 12:25:49 [Consensus] :gear: Syncing 0.0 bps, target=#456190 (12 peers), best: #453724 (0xad40…2cdc), finalized #232206 (0x9143…9e6f), :arrow_down: 84.7kiB/s :arrow_up: 0.3kiB/s
2023-09-16 12:25:54 [Consensus] :gear: Syncing 6.1 bps, target=#456192 (12 peers), best: #453755 (0x4921…786e), finalized #232206 (0x9143…9e6f), :arrow_down: 34.7kiB/s :arrow_up: 0.6kiB/s
2023-09-16 12:25:59 [Consensus] :gear: Syncing 5.3 bps, target=#456192 (14 peers), best: #453782 (0x34e3…234d), finalized #232206 (0x9143…9e6f), :arrow_down: 25.5kiB/s :arrow_up: 0.4kiB/s
2023-09-16 12:26:03 [Consensus] :broken_heart: Error importing block 0xc0db2b2e28fab26a0ea9d16ffd9b08c0292cf5b9567855da96f42d3f9b231623: block has an unknown parent
2023-09-16 12:26:03 [Consensus] :broken_heart: Error importing block 0x4f658ff4d54036adc38d200d02e24a5ffa4d4b47b834c90e00897b52e976bb4c: block has an unknown parent
2023-09-16 12:26:04 [Consensus] :gear: Syncing 3.7 bps, target=#456194 (15 peers), best: #453801 (0x7c6b…52b2), finalized #232206 (0x9143…9e6f), :arrow_down: 23.3kiB/s :arrow_up: 1.8kiB/s
2023-09-16 12:26:09 [Consensus] :gear: Syncing 0.0 bps, target=#456194 (15 peers), best: #453801 (0x7c6b…52b2), finalized #232206 (0x9143…9e6f), :arrow_down: 355.2kiB/s :arrow_up: 1.1kiB/s
2023-09-16 12:26:10 [Consensus] :sparkles: Imported #453802 (0xc0db…1623)
2023-09-16 12:26:14 [Consensus] :gear: Syncing 5.3 bps, target=#456195 (8 peers), best: #453828 (0x51f9…2dfd), finalized #232206 (0x9143…9e6f), :arrow_down: 465.5kiB/s :arrow_up: 1.1kiB/s
2023-09-16 12:26:19 [Consensus] :gear: Syncing 5.5 bps, target=#456197 (11 peers), best: #453856 (0x7917…50bf), finalized #232206 (0x9143…9e6f), :arrow_down: 382.1kiB/s :arrow_up: 1.6kiB/s
2023-09-16 12:26:24 [Consensus] :gear: Syncing 5.3 bps, target=#456197 (11 peers), best: #453883 (0xc3c1…1a62), finalized #232206 (0x9143…9e6f), :arrow_down: 414.8kiB/s :arrow_up: 0.4kiB/s
2023-09-16 12:26:29 [Consensus] :gear: Syncing 5.5 bps, target=#456198 (10 peers), best: #453911 (0x1739…5fc6), finalized #232206 (0x9143…9e6f), :arrow_down: 335.4kiB/s :arrow_up: 0.6kiB/s
2023-09-16 12:26:34 [Consensus] :gear: Syncing 5.9 bps, target=#456199 (13 peers), best: #453941 (0xbad7…1b00), finalized #232206 (0x9143…9e6f), :arrow_down: 401.2kiB/s :arrow_up: 1.3kiB/s
2023-09-16 12:26:39 [Consensus] :gear: Syncing 5.5 bps, target=#456199 (13 peers), best: #453969 (0x17b2…5038), finalized #232206 (0x9143…9e6f), :arrow_down: 497.7kiB/s :arrow_up: 0.4kiB/s
2023-09-16 12:26:44 [Consensus] :gear: Syncing 5.7 bps, target=#456200 (12 peers), best: #453998 (0x0279…f2a5), finalized #232206 (0x9143…9e6f), :arrow_down: 426.6kiB/s :arrow_up: 0.8kiB/s

As long as it is making progress there is nothing inherently wrong.
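If you want to check progress without tailing the log, you can also query the node's RPC, as in this sketch (it assumes the default RPC port 9944 on localhost; your lsof output shows custom ports, so adjust accordingly):

– best block vs. the highest block known from peers
curl -s -H 'Content-Type: application/json' -d '{"id":1,"jsonrpc":"2.0","method":"system_syncState","params":[]}' http://127.0.0.1:9944
– peer count and whether the node still reports itself as syncing
curl -s -H 'Content-Type: application/json' -d '{"id":1,"jsonrpc":"2.0","method":"system_health","params":[]}' http://127.0.0.1:9944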

If it gets stuck, it is likely due to a bug in Substrate. But also, your node is not using DSN sync because you're not on the latest release.

If you're not on the latest release, then even if you sync successfully you'll likely get banned by (and yourself ban) a bunch of peers in the process.

Please upgrade to new releases as they come in, especially if you see issues.

Can you expand on what would cause the ban for users running the old versions? Is this a Farmer ban, as we've discussed in the past, or a different type of ban?

Bans can be two-way here.

One can be caused by the local node requesting blocks from other nodes that don't have them anymore (already pruned); the reputation of those peers will then be decreased until it reaches the floor, at which point the peer is banned.

The other happens in the opposite direction, when the local node requests the same blocks over and over again without getting the desired response. Asking for the same blocks many times results in a reputation decrease, down to the point of being banned by the other peers.

So one way or another the node will either sync or get banned, and the probability of syncing decreases over time as more nodes prune their history.
