I’ve been trying to sync a node to mainnet since launch, through a number of setup iterations, and no matter what I do the node lags behind and never catches up.
Latest setup:
Ubuntu 22.04 VM, fully updated as of 10 Nov 2024, running on Proxmox VE 7.4-17.
VM (the same settings are sketched as a Proxmox guest config after the specs):
2 sockets, 8 cores → x86-64-v2-AES numa=1
4GiB RAM [balloon=0]
256G SSD / discard=on, iothread=1
BIOS: OVMF (UEFI) / Display: SPICE (qxl) / Machine q35
SCSI Controller: VirtIO SCSI single
Hypervisor:
40 x Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz (2 Sockets)
datastore SSD: Samsung SSD 860 1TB
Internet connection:
1Gbps / 250Mbps
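For completeness, here is roughly what those VM settings look like in the Proxmox guest config (/etc/pve/qemu-server/<vmid>.conf). VM ID 100 and the local-lvm storage/disk name are placeholders, not my actual values:
bios: ovmf
machine: q35
sockets: 2
cores: 8
cpu: x86-64-v2-AES
numa: 1
memory: 4096
balloon: 0
scsihw: virtio-scsi-single
scsi0: local-lvm:vm-100-disk-0,discard=on,iothread=1,size=256G
vga: qxl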
Systemd unit file:
[Unit]
Description=Subspace Node
Wants=network.target
After=network.target
[Service]
User=subspace
Group=subspace
ExecStart=/home/subspace/.local/bin/subspace-node run \
--name "nodename" \
--base-path /home/subspace/.local/share/subspace-node \
--chain mainnet \
--farmer \
--listen-on /ip4/0.0.0.0/tcp/30333 \
--dsn-listen-on /ip4/0.0.0.0/tcp/30433 \
--rpc-methods unsafe \
--rpc-cors all \
--rpc-listen-on 0.0.0.0:9945
StandardOutput=append:/var/log/subspace/log1.log
StandardError=append:/var/log/subspace/log2.log
KillSignal=SIGINT
Restart=always
RestartSec=10
Nice=-5
LimitNOFILE=100000
[Install]
WantedBy=multi-user.target
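In case it matters, the unit is installed and enabled the usual way. The subspace-node.service file name and the /var/log/subspace ownership below are just how I'd set it up (the unit appends to that directory), nothing official:
sudo install -d -o subspace -g subspace /var/log/subspace
sudo systemctl daemon-reload
sudo systemctl enable --now subspace-node.service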
Ports are forwarded and the corresponding firewall rules are in place.
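For the firewall side (assuming ufw; adjust for whatever you use), the rules are essentially just the two P2P ports from the unit file:
sudo ufw allow 30333/tcp
sudo ufw allow 30433/tcp
sudo ufw status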
When I start a node on a Windows 10 Pro workstation (without the ports forwarded to it), it syncs in under an hour (snap sync + a bit of slow sync). But the Linux node fails to catch up even after a successful snap pre-sync. The Windows workstation is a dual E5-2630 v3 with NVMe storage, a little bit newer than the v2 CPUs in the server running the Ubuntu VM.
AES-NI is passed through to the VM and, as far as I can tell, it’s working from within the VM.
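A quick way to double-check that from inside the guest (nothing Subspace-specific, just generic checks):
grep -o -w aes /proc/cpuinfo | head -n1    # prints "aes" if the flag is exposed to the VM
openssl speed -evp aes-128-cbc             # with AES-NI this should be far faster than a software-only build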
I’m using this binary on the Ubuntu VM: https://github.com/autonomys/subspace/releases/download/mainnet-2024-nov-06/subspace-node-ubuntu-x86_64-v2-mainnet-2024-nov-06
I’ve also set up an Ubuntu 24.04 VM on the same hypervisor/server, with the same virtual hardware, with identical results.
I am happy to provide any additional info needed to troubleshoot this. Cheers.
EDIT1:
- The CPU gets pegged at ~100% for stretches at a time with short breaks in between, and a decent amount of data is being exchanged (a quick sanity check is noted after the log excerpt below).
- I am mostly seeing 0 bps with the occasional blip:
2024-11-10T23:47:41.732354Z INFO Consensus: substrate: ⚙️ Preparing 0.0 bps, target=#55151 (40 peers), best: #54502 (0xc086…c514), finalized #54249 (0xbf32…37b2), ⬇ 7.0MiB/s ⬆ 1.7kiB/s
2024-11-10T23:47:46.733784Z INFO Consensus: substrate: ⚙️ Preparing 0.0 bps, target=#55151 (40 peers), best: #54502 (0xc086…c514), finalized #54249 (0xbf32…37b2), ⬇ 4.4MiB/s ⬆ 1.8kiB/s
2024-11-10T23:47:51.734191Z INFO Consensus: substrate: ⚙️ Preparing 0.0 bps, target=#55152 (40 peers), best: #54502 (0xc086…c514), finalized #54249 (0xbf32…37b2), ⬇ 3.1MiB/s ⬆ 1.2kiB/s
2024-11-10T23:47:56.734485Z INFO Consensus: substrate: ⚙️ Preparing 0.0 bps, target=#55152 (40 peers), best: #54502 (0xc086…c514), finalized #54249 (0xbf32…37b2), ⬇ 1.5MiB/s ⬆ 1.3kiB/s
2024-11-10T23:48:01.735036Z INFO Consensus: substrate: ⚙️ Preparing 0.0 bps, target=#55152 (40 peers), best: #54502 (0xc086…c514), finalized #54249 (0xbf32…37b2), ⬇ 656.0kiB/s ⬆ 1.5kiB/s
2024-11-10T23:48:06.735438Z INFO Consensus: substrate: ⚙️ Preparing 0.6 bps, target=#55153 (40 peers), best: #54505 (0x1c7a…99be), finalized #54249 (0xbf32…37b2), ⬇ 681.4kiB/s ⬆ 1.4kiB/s
2024-11-10T23:48:11.736096Z INFO Consensus: substrate: ⚙️ Preparing 0.0 bps, target=#55153 (40 peers), best: #54505 (0x1c7a…99be), finalized #54249 (0xbf32…37b2), ⬇ 1.5MiB/s ⬆ 1.4kiB/s
2024-11-10T23:48:16.736414Z INFO Consensus: substrate: ⚙️ Preparing 0.0 bps, target=#55153 (40 peers), best: #54505 (0x1c7a…99be), finalized #54249 (0xbf32…37b2), ⬇ 1.4MiB/s ⬆ 1.2kiB/s
2024-11-10T23:48:21.736793Z INFO Consensus: substrate: ⚙️ Preparing 0.0 bps, target=#55153 (40 peers), best: #54505 (0x1c7a…99be), finalized #54249 (0xbf32…37b2), ⬇ 700.2kiB/s ⬆ 1.3kiB/s
2024-11-10T23:48:26.737066Z INFO Consensus: substrate: ⚙️ Preparing 0.0 bps, target=#55154 (40 peers), best: #54505 (0x1c7a…99be), finalized #54249 (0xbf32…37b2), ⬇ 719.8kiB/s ⬆ 1.8kiB/s
2024-11-10T23:48:31.737349Z INFO Consensus: substrate: ⚙️ Preparing 0.0 bps, target=#55154 (40 peers), best: #54505 (0x1c7a…99be), finalized #54249 (0xbf32…37b2), ⬇ 805.8kiB/s ⬆ 1.9kiB/s
2024-11-10T23:48:36.737694Z INFO Consensus: substrate: ⚙️ Preparing 0.0 bps, target=#55155 (40 peers), best: #54505 (0x1c7a…99be), finalized #54249 (0xbf32…37b2), ⬇ 913.8kiB/s ⬆ 1.8kiB/s
2024-11-10T23:48:41.738054Z INFO Consensus: substrate: ⚙️ Preparing 0.0 bps, target=#55156 (40 peers), best: #54505 (0x1c7a…99be), finalized #54249 (0xbf32…37b2), ⬇ 1.3MiB/s ⬆ 2.9kiB/s
2024-11-10T23:48:46.738363Z INFO Consensus: substrate: ⚙️ Preparing 0.0 bps, target=#55156 (40 peers), best: #54505 (0x1c7a…99be), finalized #54249 (0xbf32…37b2), ⬇ 1.1MiB/s ⬆ 2.3kiB/s
2024-11-10T23:48:51.738748Z INFO Consensus: substrate: ⚙️ Preparing 0.0 bps, target=#55157 (40 peers), best: #54505 (0x1c7a…99be), finalized #54249 (0xbf32…37b2), ⬇ 1.0MiB/s ⬆ 5.3kiB/s
2024-11-10T23:48:56.739201Z INFO Consensus: substrate: ⚙️ Preparing 0.0 bps, target=#55157 (40 peers), best: #54505 (0x1c7a…99be), finalized #54249 (0xbf32…37b2), ⬇ 1.2MiB/s ⬆ 1.8kiB/s
2024-11-10T23:49:01.740545Z INFO Consensus: substrate: ⚙️ Preparing 0.1 bps, target=#55157 (40 peers), best: #54506 (0x54a0…ebe9), finalized #54249 (0xbf32…37b2), ⬇ 1.8MiB/s ⬆ 3.2kiB/s
2024-11-10T23:49:06.741623Z INFO Consensus: substrate: ⚙️ Preparing 0.0 bps, target=#55157 (40 peers), best: #54506 (0x54a0…ebe9), finalized #54249 (0xbf32…37b2), ⬇ 947.5kiB/s ⬆ 5.6kiB/s
2024-11-10T23:49:11.741928Z INFO Consensus: substrate: ⚙️ Preparing 0.0 bps, target=#55157 (40 peers), best: #54506 (0x54a0…ebe9), finalized #54249 (0xbf32…37b2), ⬇ 922.1kiB/s ⬆ 2.6kiB/s
2024-11-10T23:49:16.742254Z INFO Consensus: substrate: ⚙️ Preparing 0.0 bps, target=#55158 (40 peers), best: #54506 (0x54a0…ebe9), finalized #54249 (0xbf32…37b2), ⬇ 1.2MiB/s ⬆ 2.2kiB/s
2024-11-10T23:49:21.743441Z INFO Consensus: substrate: ⚙️ Preparing 0.1 bps, target=#55158 (40 peers), best: #54507 (0xf755…6f6f), finalized #54249 (0xbf32…37b2), ⬇ 10.2MiB/s ⬆ 2.3kiB/s
2024-11-10T23:49:26.743792Z INFO Consensus: substrate: ⚙️ Preparing 0.0 bps, target=#55159 (40 peers), best: #54507 (0xf755…6f6f), finalized #54249 (0xbf32…37b2), ⬇ 6.1MiB/s ⬆ 2.1kiB/s
2024-11-10T23:49:31.744359Z INFO Consensus: substrate: ⚙️ Preparing 0.0 bps, target=#55161 (40 peers), best: #54507 (0xf755…6f6f), finalized #54249 (0xbf32…37b2), ⬇ 6.0MiB/s ⬆ 3.6kiB/s
2024-11-10T23:49:36.746226Z INFO Consensus: substrate: ⚙️ Preparing 0.0 bps, target=#55162 (40 peers), best: #54507 (0xf755…6f6f), finalized #54249 (0xbf32…37b2), ⬇ 3.2MiB/s ⬆ 2.9kiB/s
2024-11-10T23:49:41.747390Z INFO Consensus: substrate: ⚙️ Preparing 0.0 bps, target=#55162 (40 peers), best: #54507 (0xf755…6f6f), finalized #54249 (0xbf32…37b2), ⬇ 1.4MiB/s ⬆ 1.4kiB/s
2024-11-10T23:49:46.748105Z INFO Consensus: substrate: ⚙️ Preparing 0.0 bps, target=#55163 (40 peers), best: #54507 (0xf755…6f6f), finalized #54249 (0xbf32…37b2), ⬇ 1.5MiB/s ⬆ 1.2kiB/s
2024-11-10T23:49:51.748403Z INFO Consensus: substrate: ⚙️ Preparing 0.0 bps, target=#55163 (40 peers), best: #54507 (0xf755…6f6f), finalized #54249 (0xbf32…37b2), ⬇ 1.2MiB/s ⬆ 1.7kiB/s
2024-11-10T23:49:56.749505Z INFO Consensus: substrate: ⚙️ Preparing 0.0 bps, target=#55164 (40 peers), best: #54507 (0xf755…6f6f), finalized #54249 (0xbf32…37b2), ⬇ 1.2MiB/s ⬆ 1.5kiB/s
2024-11-10T23:50:01.749820Z INFO Consensus: substrate: ⚙️ Preparing 0.0 bps, target=#55166 (40 peers), best: #54507 (0xf755…6f6f), finalized #54249 (0xbf32…37b2), ⬇ 1.5MiB/s ⬆ 2.4kiB/s
2024-11-10T23:50:06.750095Z INFO Consensus: substrate: ⚙️ Preparing 0.0 bps, target=#55167 (40 peers), best: #54507 (0xf755…6f6f), finalized #54249 (0xbf32…37b2), ⬇ 11.1MiB/s ⬆ 1.2kiB/s
2024-11-10T23:50:11.750362Z INFO Consensus: substrate: ⚙️ Preparing 0.0 bps, target=#55167 (40 peers), best: #54507 (0xf755…6f6f), finalized #54249 (0xbf32…37b2), ⬇ 7.2MiB/s ⬆ 1.6kiB/s
2024-11-10T23:50:16.750728Z INFO Consensus: substrate: ⚙️ Preparing 0.0 bps, target=#55168 (40 peers), best: #54507 (0xf755…6f6f), finalized #54249 (0xbf32…37b2), ⬇ 5.3MiB/s ⬆ 1.7kiB/s
2024-11-10T23:50:21.751755Z INFO Consensus: substrate: ⚙️ Preparing 0.0 bps, target=#55168 (40 peers), best: #54507 (0xf755…6f6f), finalized #54249 (0xbf32…37b2), ⬇ 2.6MiB/s ⬆ 1.1kiB/s
2024-11-10T23:50:26.752029Z INFO Consensus: substrate: ⚙️ Preparing 0.0 bps, target=#55168 (40 peers), best: #54507 (0xf755…6f6f), finalized #54249 (0xbf32…37b2), ⬇ 1.7MiB/s ⬆ 1.1kiB/s
2024-11-10T23:50:31.753052Z INFO Consensus: substrate: ⚙️ Preparing 0.0 bps, target=#55168 (40 peers), best: #54507 (0xf755…6f6f), finalized #54249 (0xbf32…37b2), ⬇ 1.5MiB/s ⬆ 1.3kiB/s
2024-11-10T23:50:36.833906Z INFO Consensus: substrate: ⚙️ Preparing 0.0 bps, target=#55168 (40 peers), best: #54507 (0xf755…6f6f), finalized #54249 (0xbf32…37b2), ⬇ 1.4MiB/s ⬆ 1.4kiB/s
2024-11-10T23:50:41.834936Z INFO Consensus: substrate: ⚙️ Preparing 0.0 bps, target=#55169 (40 peers), best: #54507 (0xf755…6f6f), finalized #54249 (0xbf32…37b2), ⬇ 1.0MiB/s ⬆ 1.5kiB/s
2024-11-10T23:50:46.837103Z INFO Consensus: substrate: ⚙️ Preparing 0.0 bps, target=#55169 (40 peers), best: #54507 (0xf755…6f6f), finalized #54249 (0xbf32…37b2), ⬇ 548.8kiB/s ⬆ 1.7kiB/s
2024-11-10T23:50:51.838123Z INFO Consensus: substrate: ⚙️ Preparing 0.0 bps, target=#55170 (40 peers), best: #54507 (0xf755…6f6f), finalized #54249 (0xbf32…37b2), ⬇ 667.0kiB/s ⬆ 1.5kiB/s
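For anyone reproducing this, a quick way to tell from inside the guest whether that 100% is real guest CPU work or the host stealing cycles is vmstat's steal column:
vmstat 5    # watch "us"/"sy" vs. "st"; "st" near 0 means the guest itself is doing the work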
EDIT2:
- CPU temps are fine. CPU is not throttling.
- Nothing else that’s CPU intensive is running on the hypervisor.
- The last testnet synced without issues on this VM, even without snap sync. Only mainnet seems to have a problem syncing here.
- I deleted the node’s data and restarted the node. This did not help.
EDIT3:
- I’ve set up a Windows Server 2022 VM on the same host. It reaches snap sync and then the process errors out and exits (I’ve seen this happen on the Linux VM too, but there systemd auto-restarts it and it continues). Sync rate is 0 bps, CPU at 100%, with a decent amount of network activity. The full log is on Pastebin; it starts: PS C:\subspace> .\node.ps1 … 2024-11-11T01:05:18.483984Z INFO subspace_node::com …
EDIT4:
- vexr on Discord suggested that I switch the vCPU type to “host”. This appears to have done the trick (the exact change is below). I will edit the post with a final update once the node has fully synced.
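For anyone else landing here: the setting is the VM's CPU type in Proxmox (GUI: Hardware → Processors → Type → host), or from the host CLI as sketched here (VM ID 100 is a placeholder). The VM then needs a full stop/start from Proxmox; a reboot initiated inside the guest won't pick up the new CPU model.
qm set 100 --cpu host
qm stop 100 && qm start 100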
EDIT5:
- Node syncs fine now, stays in sync, and I was able to reduce the number of vCPUs allocated to it.