Node can't catch up with mainnet on Ubuntu VM

I’ve been trying to sync to mainnet since launch, through a number of setup iterations, and no matter what I do my node lags behind and never manages to catch up.

Latest setup:
Ubuntu 22.04 VM, fully updated as of 10 Nov 2024, running on Proxmox VE 7.4-17.

VM:
2 sockets, 8 cores → x86-64-v2-AES numa=1
4GiB RAM [balloon=0]
256G SSD / discard=on, iothread=1
BIOS: OVMF (UEFI) / Display: SPICE (qxl) / Machine q35
SCSI Controller: VirtIO SCSI single
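
For reference, these settings map roughly onto the following lines in the Proxmox VM config (/etc/pve/qemu-server/<vmid>.conf; <vmid> and the disk volume ID are placeholders):

balloon: 0
bios: ovmf
cores: 8
cpu: x86-64-v2-AES
machine: q35
memory: 4096
numa: 1
scsi0: <storage>:vm-<vmid>-disk-0,discard=on,iothread=1,size=256G
scsihw: virtio-scsi-single
sockets: 2
vga: qxl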

Hypervisor:
40 x Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz (2 Sockets)
datastore SSD: Samsung SSD 860 1TB

Internet connection:
1Gbps / 250Mbps

Systemd unit file:

[Unit]
Description=Subspace Node
Wants=network.target
After=network.target

[Service]
User=subspace
Group=subspace
ExecStart=/home/subspace/.local/bin/subspace-node run \
          --name "nodename" \
          --base-path /home/subspace/.local/share/subspace-node \
          --chain mainnet \
          --farmer \
          --listen-on /ip4/0.0.0.0/tcp/30333 \
          --dsn-listen-on /ip4/0.0.0.0/tcp/30433 \
          --rpc-methods unsafe \
          --rpc-cors all \
          --rpc-listen-on 0.0.0.0:9945
StandardOutput=append:/var/log/subspace/log1.log
StandardError=append:/var/log/subspace/log2.log
KillSignal=SIGINT
Restart=always
RestartSec=10
Nice=-5
LimitNOFILE=100000

[Install]
WantedBy=multi-user.target

Ports are forwarded and the corresponding firewall rules are in place.
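
For completeness, the firewall side on the VM looks like this (ufw shown as an example; the port forwarding itself happens on the router):

sudo ufw allow 30333/tcp comment 'subspace node p2p'
sudo ufw allow 30433/tcp comment 'subspace node dsn'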

When I start a node on a Windows 10 Pro workstation (without the ports forwarded to it), it syncs in under an hour (snap sync plus a bit of slow sync). But the Linux node fails to catch up even after a successful snap pre-sync. The Windows workstation is a dual E5-2630 v3 with NVMe storage, only a little newer than the v2 CPUs in the server running the Ubuntu VM.

AES-NI is passed through to the VM and, as far as I can tell, it is working from within the VM.
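
For anyone else checking this, from inside the guest:

grep -m1 -o aes /proc/cpuinfo

prints "aes" when the instruction set is exposed to the VM.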

I’m using this binary on the Ubuntu VM: https://github.com/autonomys/subspace/releases/download/mainnet-2024-nov-06/subspace-node-ubuntu-x86_64-v2-mainnet-2024-nov-06

I’ve also set up an Ubuntu 24.04 VM on the same hypervisor/server with the same virtual hardware, with identical results.

I am happy to provide any additional info needed to troubleshoot this. Cheers.

EDIT1:

  • The CPU gets pegged at ~100% for stretches at a time, with short breaks in between, and a decent amount of data is being exchanged.
  • I am mostly seeing 0.0 bps, with the occasional blip:
2024-11-10T23:47:41.732354Z  INFO Consensus: substrate: ⚙️  Preparing  0.0 bps, target=#55151 (40 peers), best: #54502 (0xc086…c514), finalized #54249 (0xbf32…37b2), ⬇ 7.0MiB/s ⬆ 1.7kiB/s
2024-11-10T23:47:46.733784Z  INFO Consensus: substrate: ⚙️  Preparing  0.0 bps, target=#55151 (40 peers), best: #54502 (0xc086…c514), finalized #54249 (0xbf32…37b2), ⬇ 4.4MiB/s ⬆ 1.8kiB/s
2024-11-10T23:47:51.734191Z  INFO Consensus: substrate: ⚙️  Preparing  0.0 bps, target=#55152 (40 peers), best: #54502 (0xc086…c514), finalized #54249 (0xbf32…37b2), ⬇ 3.1MiB/s ⬆ 1.2kiB/s
2024-11-10T23:47:56.734485Z  INFO Consensus: substrate: ⚙️  Preparing  0.0 bps, target=#55152 (40 peers), best: #54502 (0xc086…c514), finalized #54249 (0xbf32…37b2), ⬇ 1.5MiB/s ⬆ 1.3kiB/s
2024-11-10T23:48:01.735036Z  INFO Consensus: substrate: ⚙️  Preparing  0.0 bps, target=#55152 (40 peers), best: #54502 (0xc086…c514), finalized #54249 (0xbf32…37b2), ⬇ 656.0kiB/s ⬆ 1.5kiB/s
2024-11-10T23:48:06.735438Z  INFO Consensus: substrate: ⚙️  Preparing  0.6 bps, target=#55153 (40 peers), best: #54505 (0x1c7a…99be), finalized #54249 (0xbf32…37b2), ⬇ 681.4kiB/s ⬆ 1.4kiB/s
2024-11-10T23:48:11.736096Z  INFO Consensus: substrate: ⚙️  Preparing  0.0 bps, target=#55153 (40 peers), best: #54505 (0x1c7a…99be), finalized #54249 (0xbf32…37b2), ⬇ 1.5MiB/s ⬆ 1.4kiB/s
2024-11-10T23:48:16.736414Z  INFO Consensus: substrate: ⚙️  Preparing  0.0 bps, target=#55153 (40 peers), best: #54505 (0x1c7a…99be), finalized #54249 (0xbf32…37b2), ⬇ 1.4MiB/s ⬆ 1.2kiB/s
2024-11-10T23:48:21.736793Z  INFO Consensus: substrate: ⚙️  Preparing  0.0 bps, target=#55153 (40 peers), best: #54505 (0x1c7a…99be), finalized #54249 (0xbf32…37b2), ⬇ 700.2kiB/s ⬆ 1.3kiB/s
2024-11-10T23:48:26.737066Z  INFO Consensus: substrate: ⚙️  Preparing  0.0 bps, target=#55154 (40 peers), best: #54505 (0x1c7a…99be), finalized #54249 (0xbf32…37b2), ⬇ 719.8kiB/s ⬆ 1.8kiB/s
2024-11-10T23:48:31.737349Z  INFO Consensus: substrate: ⚙️  Preparing  0.0 bps, target=#55154 (40 peers), best: #54505 (0x1c7a…99be), finalized #54249 (0xbf32…37b2), ⬇ 805.8kiB/s ⬆ 1.9kiB/s
2024-11-10T23:48:36.737694Z  INFO Consensus: substrate: ⚙️  Preparing  0.0 bps, target=#55155 (40 peers), best: #54505 (0x1c7a…99be), finalized #54249 (0xbf32…37b2), ⬇ 913.8kiB/s ⬆ 1.8kiB/s
2024-11-10T23:48:41.738054Z  INFO Consensus: substrate: ⚙️  Preparing  0.0 bps, target=#55156 (40 peers), best: #54505 (0x1c7a…99be), finalized #54249 (0xbf32…37b2), ⬇ 1.3MiB/s ⬆ 2.9kiB/s
2024-11-10T23:48:46.738363Z  INFO Consensus: substrate: ⚙️  Preparing  0.0 bps, target=#55156 (40 peers), best: #54505 (0x1c7a…99be), finalized #54249 (0xbf32…37b2), ⬇ 1.1MiB/s ⬆ 2.3kiB/s
2024-11-10T23:48:51.738748Z  INFO Consensus: substrate: ⚙️  Preparing  0.0 bps, target=#55157 (40 peers), best: #54505 (0x1c7a…99be), finalized #54249 (0xbf32…37b2), ⬇ 1.0MiB/s ⬆ 5.3kiB/s
2024-11-10T23:48:56.739201Z  INFO Consensus: substrate: ⚙️  Preparing  0.0 bps, target=#55157 (40 peers), best: #54505 (0x1c7a…99be), finalized #54249 (0xbf32…37b2), ⬇ 1.2MiB/s ⬆ 1.8kiB/s
2024-11-10T23:49:01.740545Z  INFO Consensus: substrate: ⚙️  Preparing  0.1 bps, target=#55157 (40 peers), best: #54506 (0x54a0…ebe9), finalized #54249 (0xbf32…37b2), ⬇ 1.8MiB/s ⬆ 3.2kiB/s
2024-11-10T23:49:06.741623Z  INFO Consensus: substrate: ⚙️  Preparing  0.0 bps, target=#55157 (40 peers), best: #54506 (0x54a0…ebe9), finalized #54249 (0xbf32…37b2), ⬇ 947.5kiB/s ⬆ 5.6kiB/s
2024-11-10T23:49:11.741928Z  INFO Consensus: substrate: ⚙️  Preparing  0.0 bps, target=#55157 (40 peers), best: #54506 (0x54a0…ebe9), finalized #54249 (0xbf32…37b2), ⬇ 922.1kiB/s ⬆ 2.6kiB/s
2024-11-10T23:49:16.742254Z  INFO Consensus: substrate: ⚙️  Preparing  0.0 bps, target=#55158 (40 peers), best: #54506 (0x54a0…ebe9), finalized #54249 (0xbf32…37b2), ⬇ 1.2MiB/s ⬆ 2.2kiB/s
2024-11-10T23:49:21.743441Z  INFO Consensus: substrate: ⚙️  Preparing  0.1 bps, target=#55158 (40 peers), best: #54507 (0xf755…6f6f), finalized #54249 (0xbf32…37b2), ⬇ 10.2MiB/s ⬆ 2.3kiB/s
2024-11-10T23:49:26.743792Z  INFO Consensus: substrate: ⚙️  Preparing  0.0 bps, target=#55159 (40 peers), best: #54507 (0xf755…6f6f), finalized #54249 (0xbf32…37b2), ⬇ 6.1MiB/s ⬆ 2.1kiB/s
2024-11-10T23:49:31.744359Z  INFO Consensus: substrate: ⚙️  Preparing  0.0 bps, target=#55161 (40 peers), best: #54507 (0xf755…6f6f), finalized #54249 (0xbf32…37b2), ⬇ 6.0MiB/s ⬆ 3.6kiB/s
2024-11-10T23:49:36.746226Z  INFO Consensus: substrate: ⚙️  Preparing  0.0 bps, target=#55162 (40 peers), best: #54507 (0xf755…6f6f), finalized #54249 (0xbf32…37b2), ⬇ 3.2MiB/s ⬆ 2.9kiB/s
2024-11-10T23:49:41.747390Z  INFO Consensus: substrate: ⚙️  Preparing  0.0 bps, target=#55162 (40 peers), best: #54507 (0xf755…6f6f), finalized #54249 (0xbf32…37b2), ⬇ 1.4MiB/s ⬆ 1.4kiB/s
2024-11-10T23:49:46.748105Z  INFO Consensus: substrate: ⚙️  Preparing  0.0 bps, target=#55163 (40 peers), best: #54507 (0xf755…6f6f), finalized #54249 (0xbf32…37b2), ⬇ 1.5MiB/s ⬆ 1.2kiB/s
2024-11-10T23:49:51.748403Z  INFO Consensus: substrate: ⚙️  Preparing  0.0 bps, target=#55163 (40 peers), best: #54507 (0xf755…6f6f), finalized #54249 (0xbf32…37b2), ⬇ 1.2MiB/s ⬆ 1.7kiB/s
2024-11-10T23:49:56.749505Z  INFO Consensus: substrate: ⚙️  Preparing  0.0 bps, target=#55164 (40 peers), best: #54507 (0xf755…6f6f), finalized #54249 (0xbf32…37b2), ⬇ 1.2MiB/s ⬆ 1.5kiB/s
2024-11-10T23:50:01.749820Z  INFO Consensus: substrate: ⚙️  Preparing  0.0 bps, target=#55166 (40 peers), best: #54507 (0xf755…6f6f), finalized #54249 (0xbf32…37b2), ⬇ 1.5MiB/s ⬆ 2.4kiB/s
2024-11-10T23:50:06.750095Z  INFO Consensus: substrate: ⚙️  Preparing  0.0 bps, target=#55167 (40 peers), best: #54507 (0xf755…6f6f), finalized #54249 (0xbf32…37b2), ⬇ 11.1MiB/s ⬆ 1.2kiB/s
2024-11-10T23:50:11.750362Z  INFO Consensus: substrate: ⚙️  Preparing  0.0 bps, target=#55167 (40 peers), best: #54507 (0xf755…6f6f), finalized #54249 (0xbf32…37b2), ⬇ 7.2MiB/s ⬆ 1.6kiB/s
2024-11-10T23:50:16.750728Z  INFO Consensus: substrate: ⚙️  Preparing  0.0 bps, target=#55168 (40 peers), best: #54507 (0xf755…6f6f), finalized #54249 (0xbf32…37b2), ⬇ 5.3MiB/s ⬆ 1.7kiB/s
2024-11-10T23:50:21.751755Z  INFO Consensus: substrate: ⚙️  Preparing  0.0 bps, target=#55168 (40 peers), best: #54507 (0xf755…6f6f), finalized #54249 (0xbf32…37b2), ⬇ 2.6MiB/s ⬆ 1.1kiB/s
2024-11-10T23:50:26.752029Z  INFO Consensus: substrate: ⚙️  Preparing  0.0 bps, target=#55168 (40 peers), best: #54507 (0xf755…6f6f), finalized #54249 (0xbf32…37b2), ⬇ 1.7MiB/s ⬆ 1.1kiB/s
2024-11-10T23:50:31.753052Z  INFO Consensus: substrate: ⚙️  Preparing  0.0 bps, target=#55168 (40 peers), best: #54507 (0xf755…6f6f), finalized #54249 (0xbf32…37b2), ⬇ 1.5MiB/s ⬆ 1.3kiB/s
2024-11-10T23:50:36.833906Z  INFO Consensus: substrate: ⚙️  Preparing  0.0 bps, target=#55168 (40 peers), best: #54507 (0xf755…6f6f), finalized #54249 (0xbf32…37b2), ⬇ 1.4MiB/s ⬆ 1.4kiB/s
2024-11-10T23:50:41.834936Z  INFO Consensus: substrate: ⚙️  Preparing  0.0 bps, target=#55169 (40 peers), best: #54507 (0xf755…6f6f), finalized #54249 (0xbf32…37b2), ⬇ 1.0MiB/s ⬆ 1.5kiB/s
2024-11-10T23:50:46.837103Z  INFO Consensus: substrate: ⚙️  Preparing  0.0 bps, target=#55169 (40 peers), best: #54507 (0xf755…6f6f), finalized #54249 (0xbf32…37b2), ⬇ 548.8kiB/s ⬆ 1.7kiB/s
2024-11-10T23:50:51.838123Z  INFO Consensus: substrate: ⚙️  Preparing  0.0 bps, target=#55170 (40 peers), best: #54507 (0xf755…6f6f), finalized #54249 (0xbf32…37b2), ⬇ 667.0kiB/s ⬆ 1.5kiB/s

EDIT2:

  • CPU temps are fine; the CPU is not throttling.
  • Nothing else CPU-intensive is running on the hypervisor.
  • The last testnet had no issues syncing, even without snap sync. Only mainnet seems to have a problem syncing on this VM.
  • I deleted the node’s data and restarted the node (roughly the steps sketched below). This did not help.
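
The reset amounted to roughly this (assuming the unit above is installed as subspace-node.service; the path matches --base-path in the unit file):

sudo systemctl stop subspace-node
rm -rf /home/subspace/.local/share/subspace-node
sudo systemctl start subspace-node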

EDIT3:

EDIT4:

  • vexr on Discord suggested that I switch the vCPU type to “host” (command below). This appears to have done the trick. I will edit the post with a final update once the node has fully synced.
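
For anyone else on Proxmox, the change is a single setting (<vmid> is a placeholder; the same option is in the GUI under Hardware → Processors → Type):

qm set <vmid> --cpu host

The VM has to be stopped and started again for the new CPU type to take effect.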

EDIT5:

  • Node syncs fine now, stays in sync, and I was able to reduce the number of vCPUs allocated to it.

Yeah, that makes sure the guest has all of the host CPU’s instruction sets at its disposal, so the software can run as efficiently as possible.

I can now confirm that the issue is resolved.

I would suggest putting a warning in the docs about this, as I am sure I am not the only one running a node on Proxmox.

I think you’ll generally want to use “host” regardless of whether you use Proxmox, libvirt, or something else. I’m not sure we should put instructions for configuring virtual machines in the farming documentation, though.
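
For the libvirt case, the equivalent (a sketch; <domain> is a placeholder) is setting the CPU mode in the domain XML via virsh edit <domain>:

<cpu mode='host-passthrough' check='none'/>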