"Failed to allocate bytes" exception when running Docker on aarch64

Running this configuration:

  • Raspberry Pi 4 (8 GiB)
  • Docker Compose
  • gemini-1b-2022-aug-17-aarch64

I’m seeing this error on every start:

subspace-node    | 2022-08-22 09:58:31 Subspace    
subspace-node    | 2022-08-22 09:58:31 ✌️  version 0.1.0-unknown    
subspace-node    | 2022-08-22 09:58:31 ❤️  by Subspace Labs <https://subspace.network>, 2021-2022    
subspace-node    | 2022-08-22 09:58:31 📋 Chain specification: Subspace Gemini 1    
subspace-node    | 2022-08-22 09:58:31 🏷  Node name: counterpoint_aarch64_pi4_hdd    
subspace-node    | 2022-08-22 09:58:31 👤 Role: AUTHORITY    
subspace-node    | 2022-08-22 09:58:31 💾 Database: ParityDb at /var/subspace/chains/subspace_gemini_1b/paritydb/full    
subspace-node    | 2022-08-22 09:58:31 ⛓  Native runtime: subspace-3 (subspace-0.tx0.au0)    
subspace-node    | 
subspace-node    | ====================
subspace-node    | 
subspace-node    | Version: 0.1.0-unknown
subspace-node    | 
subspace-node    |    0: sp_panic_handler::set::{{closure}}
subspace-node    |    1: std::panicking::rust_panic_with_hook
subspace-node    |    2: std::panicking::begin_panic_handler::{{closure}}
subspace-node    |    3: std::sys_common::backtrace::__rust_end_short_backtrace
subspace-node    |    4: rust_begin_unwind
subspace-node    |    5: core::panicking::panic_fmt
subspace-node    |    6: sc_consensus_subspace::archiver::initialize_archiver
subspace-node    |    7: subspace_service::new_partial
subspace-node    |    8: subspace_service::new_full::{{closure}}
subspace-node    |    9: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
subspace-node    |   10: sc_cli::runner::Runner<C>::run_node_until_exit
subspace-node    |   11: subspace_node::main
subspace-node    |   12: std::sys_common::backtrace::__rust_begin_short_backtrace
subspace-node    |   13: std::rt::lang_start::{{closure}}
subspace-node    |   14: main
subspace-node    |   15: __libc_start_main
subspace-node    |   16: <unknown>
subspace-node    | 
subspace-node    | 
subspace-node    | Thread 'main' panicked at 'Failed to make runtime API call during last archived block search: Application(VersionInvalid("cannot create the wasmtime engine: failed to create memory pool mapping: mmap failed to allocate 0x3080000000 bytes: Cannot allocate memory (os error 12)"))', /code/crates/sc-consensus-subspace/src/archiver.rs:72
subspace-node    | 
subspace-node    | This is a bug. Please report it at:
subspace-node    | 
subspace-node    |      https://forum.autonomys.xyz
subspace-node    | 
subspace-node    | 2022-08-22 09:58:35 [PrimaryChain] Cannot create a runtime error=Other("cannot create the wasmtime engine: failed to create memory pool mapping: mmap failed to allocate 0x3080000000 bytes: Cannot allocate memory (os error 12)")

htop suggests there is plenty of RAM available throughout startup:

[htop screenshot: memory usage during startup]

Note that this has only just started happening: I'd managed to sync past 100k blocks on this build before the error first appeared.

Thanks for reporting this, Jim! I'm going to escalate this to the development team for a closer look!

No existing solutions found, please take a look. @Support-L2

Just an update that this is still happening on the sep-06 and sep-10 builds.

subspace-node    | 2022-09-15 21:22:39 Subspace    
subspace-node    | 2022-09-15 21:22:39 ✌️  version 0.1.0-unknown    
subspace-node    | 2022-09-15 21:22:39 ❤️  by Subspace Labs <https://subspace.network>, 2021-2022    
subspace-node    | 2022-09-15 21:22:39 📋 Chain specification: Subspace Gemini 2a    
subspace-node    | 2022-09-15 21:22:39 🏷  Node name: counterpoint_aarch64_pi4_hdd    
subspace-node    | 2022-09-15 21:22:39 👤 Role: AUTHORITY    
subspace-node    | 2022-09-15 21:22:39 💾 Database: ParityDb at /var/subspace/chains/subspace_gemini_2a/paritydb/full    
subspace-node    | 2022-09-15 21:22:39 ⛓  Native runtime: subspace-4 (subspace-0.tx0.au0)    
subspace-node    | 
subspace-node    | ====================
subspace-node    | 
subspace-node    | Version: 0.1.0-unknown
subspace-node    | 
subspace-node    |    0: sp_panic_handler::set::{{closure}}
subspace-node    |    1: std::panicking::rust_panic_with_hook
subspace-node    |    2: std::panicking::begin_panic_handler::{{closure}}
subspace-node    |    3: std::sys_common::backtrace::__rust_end_short_backtrace
subspace-node    |    4: rust_begin_unwind
subspace-node    |    5: core::panicking::panic_fmt
subspace-node    |    6: sc_consensus_subspace::archiver::initialize_archiver
subspace-node    |    7: subspace_service::new_partial
subspace-node    |    8: subspace_service::new_full::{{closure}}
subspace-node    |    9: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
subspace-node    |   10: sc_cli::runner::Runner<C>::run_node_until_exit
subspace-node    |   11: subspace_node::main
subspace-node    |   12: std::sys_common::backtrace::__rust_begin_short_backtrace
subspace-node    |   13: std::rt::lang_start::{{closure}}
subspace-node    |   14: main
subspace-node    |   15: __libc_start_main
subspace-node    |   16: <unknown>
subspace-node    | 
subspace-node    | 2022-09-15 21:22:42 [PrimaryChain] Cannot create a runtime error=Other("cannot create the wasmtime engine: failed to create memory pool mapping: mmap failed to allocate 0x3080000000 bytes: Cannot allocate memory (os error 12)")
subspace-node    | 
subspace-node    | Thread 'main' panicked at 'Failed to make runtime API call during last archived block search: Application(VersionInvalid("cannot create the wasmtime engine: failed to create memory pool mapping: mmap failed to allocate 0x3080000000 bytes: Cannot allocate memory (os error 12)"))', /code/crates/sc-consensus-subspace/src/archiver.rs:72
subspace-node    | 
subspace-node    | This is a bug. Please report it at:
subspace-node    | 
subspace-node    |      https://forum.autonomys.xyz
subspace-node    | 
subspace-node exited with code 1

How much RAM does the system have? I suspect it may simply be running out of free memory at some point.
Enabling swap might help mitigate it.
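
For example, on a Debian-based system a swap file can be added roughly along these lines (the 4 GiB size and the /swapfile path are just placeholders):

sudo fallocate -l 4G /swapfile        # reserve a 4 GiB file for swap
sudo chmod 600 /swapfile              # restrict permissions as swapon requires
sudo mkswap /swapfile                 # format it as swap space
sudo swapon /swapfile                 # enable it immediately
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab   # keep it across reboots

On Raspberry Pi OS images, swap is normally managed by dphys-swapfile, so raising CONF_SWAPSIZE in /etc/dphys-swapfile and restarting that service achieves the same thing.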

It has 8 GiB of RAM and 2 GiB of swap. I don't see memory usage peaking in htop when this error is thrown (it stays flat), otherwise I would assume an OOM kill.

I'd try giving it more swap, especially if it crashes reliably.
It may briefly use much more RAM and crash before you have a chance to notice it.

Upped the swap to 8 GiB and then 16 GiB with the same result. From what I can see, the node is bombing out too fast to be filling the swap, and htop stays completely static.
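
One way to watch the container's memory more closely than htop allows, in case usage spikes between refreshes (standard Docker CLI; the container name matches container_name in the compose file):

docker stats subspace-node              # live CPU/memory usage for the node container
docker stats --no-stream subspace-node  # one-shot snapshot, e.g. for logging just before a crash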

Looks like you're building the node yourself?
Any chance you're running a 32-bit OS and building 32-bit ARM software even though the CPU is technically aarch64?
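
Kernel and userland bitness can be checked quickly with standard commands; a 64-bit kernel over a 32-bit userland is a common Raspberry Pi configuration, and dpkg applies to Debian-based systems:

uname -m                   # kernel architecture: aarch64 vs armv7l
dpkg --print-architecture  # userland/package architecture: arm64 vs armhf
getconf LONG_BIT           # 64 or 32 for the default userland ABI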

It's the aarch64 Docker images I'm using, and the node synced fine (if slowly) right up to block 5735, when the node container started boot-looping with the error message above.

What OS and version is it?

Another thing many libraries do is reserve virtual memory that is much larger than the amount of physical memory.
What does cat /proc/sys/vm/overcommit_memory say? If it says 2, that would be the root cause here.
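
For context, the failing allocation in the log, 0x3080000000 bytes, is 194 GiB of virtual address space, far beyond physical RAM, so it is the size of the reservation rather than resident memory that matters. The overcommit policy can be inspected and, if it turns out to be strict, relaxed along these lines (0 is the kernel's heuristic default, 1 always overcommits, 2 refuses reservations beyond swap plus a fraction of RAM):

cat /proc/sys/vm/overcommit_memory          # current policy: 0, 1 or 2
sudo sysctl -w vm.overcommit_memory=0       # switch back to the heuristic default
echo 'vm.overcommit_memory=0' | sudo tee /etc/sysctl.d/99-overcommit.conf   # persist across reboots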

jim@raspberrypi4:~/subspace-testnet/raspberrypi4 $ lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 11 (bullseye)
Release:        11
Codename:       bullseye
jim@raspberrypi4:~/subspace-testnet/raspberrypi4 $ cat /proc/sys/vm/overcommit_memory
0

All looks good, can you also provide uname -a?

jim@raspberrypi4:~/subspace-testnet/raspberrypi4 $ uname -a
Linux raspberrypi4 5.15.56-v8+ #1575 SMP PREEMPT Fri Jul 22 20:31:26 BST 2022 aarch64 GNU/Linux

I think the last piece of information I'd like to see is the docker-compose.yml you're using.

I did just try commenting out all the monitoring containers but got exactly the same result. I could set you up with access so you can have a play with it yourself if that would help, Nazar? I realise we're probably at the edge of what can be supported on a Pi, but I'm duly following the error message and reporting it :slight_smile:

version: "3.7"

services:

# ********************************************************************************
# The following section should be copied from the official Subspace Docker Compose
# guide at https://github.com/subspace/subspace/blob/main/docs/farming.md
#*********************************************************************************

  node:
    container_name: subspace-node
    # Replace `snapshot-DATE` with latest release (like `snapshot-2022-apr-29`)
    # For running on Aarch64 add `-aarch64` after `DATE`
    image: ghcr.io/subspace/node:gemini-2a-2022-sep-10-aarch64
    volumes:
# Instead of specifying volume (which will store data in `/var/lib/docker`), you can
# alternatively specify path to the directory where files will be stored, just make
# sure everyone is allowed to write there
#      - node-data:/var/subspace:rw
      - "/media/jim/storage/subspace-node:/var/subspace:rw"
    ports:
# If port 30333 is already occupied by another Substrate-based node, replace all
# occurrences of `30333` in this file with another value
      - "0.0.0.0:30336:30336"
    expose:
      - "9615:9615"
    restart: unless-stopped
    command: [
      "--chain", "gemini-2a",
      "--base-path", "/var/subspace",
      "--execution", "wasm",
      "--pruning", "archive",
      #"--pruning", "1024",
      #"--keep-blocks", "1024",
      "--port", "30336",
      "--rpc-cors", "all",
      "--rpc-methods", "safe",
      "--unsafe-ws-external",
      "--validator",
# Expose Prometheus exporter on all interfaces.
# Default is local.
      "--prometheus-external",
# Specify Prometheus exporter.
      "--prometheus-port", "9615",
# Replace `INSERT_YOUR_ID` with your node ID (will be shown in telemetry)
      "--name", "counterpoint_aarch64_pi4_hdd"
    ]
    healthcheck:
      timeout: 5s
# If node setup takes longer than expected, you may want to increase the `interval` and `retries` values.
      interval: 30s
      retries: 5

  farmer:
    container_name: subspace-farmer
    depends_on:
      node:
        condition: service_healthy
    # For running on Aarch64 add `-aarch64` after `DATE`
    image: ghcr.io/subspace/farmer:gemini-2a-2022-sep-10-aarch64
# Un-comment following 2 lines to unlock farmer's RPC
    ports:
#      - "127.0.0.1:9955:9955"
      - "0.0.0.0:40336:40336"
    volumes:
# Instead of specifying volume (which will store data in `/var/lib/docker`), you can
# alternatively specify path to the directory where files will be stored, just make
# sure everyone is allowed to write there
#      - farmer-data:/var/subspace:rw
      - "/media/jim/storage/subspace-farmer:/var/subspace:rw"
    restart: unless-stopped
    command: [
      "--base-path", "/var/subspace",
      "farm",
      "--node-rpc-url", "ws://node:9944",
      "--ws-server-listen-addr", "0.0.0.0:9955",
      "--listen-on", "/ip4/0.0.0.0/tcp/40336",
# Replace `WALLET_ADDRESS` with your Polkadot.js wallet address
      "--reward-address", "st7NQTVwkwMsLyr4xUTfESrF1t7Y3tRggjWTpMQgb48SYQMeS",
# Replace `PLOT_SIZE` with plot size in gigabytes or terabytes, for instance 100G or 2T (but leave at least 10G of disk space for node)
      "--plot-size", "100G"
    ]

# ********************************************************************************
# The above section should be copied from the official Subspace Docker Compose
# guide at https://github.com/subspace/subspace/blob/main/docs/farming.md
#*********************************************************************************

# ********************************************************************************
# The below section defines the containers we need for the monitoring stack
#*********************************************************************************

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    volumes:
      - grafana_data:/var/lib/grafana
      - ./grafana/provisioning:/etc/grafana/provisioning
    environment:
      - GF_SECURITY_ADMIN_USER=<REDACTED>
      - GF_SECURITY_ADMIN_PASSWORD=<REDACTED>
      - GF_USERS_ALLOW_SIGN_UP=false
      - GF_SECURITY_DISABLE_BRUTE_FORCE_LOGIN_PROTECTION=false
    restart: unless-stopped
    ports:
      - 3000:3000

  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    restart: unless-stopped
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'
      - '--web.enable-lifecycle'
    expose:
      - 9090:9090

  node-exporter:
    image: prom/node-exporter:latest
    container_name: node-exporter
    restart: unless-stopped
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.rootfs=/rootfs'
      - '--path.sysfs=/host/sys'
      - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'
    expose:
      - 9100

# ********************************************************************************
# The above section defines the containers we need for the monitoring stack
#*********************************************************************************

volumes:
# ********************************************************************************
# These first two volumes are required by the default setup from the official
# guide at https://github.com/subspace/subspace/blob/main/docs/farming.md
#*********************************************************************************
  node-data:
  farmer-data:
# ********************************************************************************
# These next volumes are required by the monitoring solution
#*********************************************************************************
  grafana_data:
  prometheus_data:

Thanks. I think I'll need to do some more debugging with it myself. I've created an issue in our repo with these details that you can follow for future updates: Wasmtime-related memory allocation failure on aarch64 · Issue #833 · subspace/subspace · GitHub


Wasn't able to reproduce it with either virtualization or real hardware; see the last message on GitHub.

Can you attach lscpu output? Maybe it's something specific to your processor…

jim@raspberrypi4:~ $ lscpu
Architecture:                    aarch64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
CPU(s):                          4
On-line CPU(s) list:             0-3
Thread(s) per core:              1
Core(s) per socket:              4
Socket(s):                       1
Vendor ID:                       ARM
Model:                           3
Model name:                      Cortex-A72
Stepping:                        r0p3
CPU max MHz:                     1800.0000
CPU min MHz:                     600.0000
BogoMIPS:                        108.00
L1d cache:                       128 KiB
L1i cache:                       192 KiB
L2 cache:                        1 MiB
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Mmio stale data:   Not affected
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1:        Mitigation; __user pointer sanitization
Vulnerability Spectre v2:        Vulnerable
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fp asimd evtstrm crc32 cpuid