In this short blog post, we’ll break down what on-chain storage means at Autonomys and why it’s essential to our work.
What is On-Chain Storage?
On-chain storage can be understood in two ways:
- Narrow Sense: Data is stored on-chain if it meets two conditions:
  - It's recorded in the blockchain's history.
  - Every participating node stores it.

  Bitcoin is an example of this type: every full node keeps a complete copy of the chain's data.
- Wide Sense: Data is considered on-chain if:
  - It's recorded in the blockchain's history.
  - Only some nodes store it (or an erasure-coded version of it), but together they still guarantee its availability.

  Ethereum's proposed Danksharding is an example of this.
Currently, the Autonomys network uses the narrow-sense definition of on-chain storage. However, our upcoming sharded version will adopt the wide-sense definition.
Why Does On-Chain Storage Matter?
Compared to off-chain storage, on-chain storage requires no additional trust assumptions. With on-chain storage, all data is within the blockchain itself, eliminating the need to rely on outside entities for data availability.
Consider an example: if we store block headers on-chain but use an off-chain solution like IPFS for block bodies, then our data availability depends on IPFS’s reliability. In the Web3 world, reducing these kinds of dependencies is desirable. On-chain storage, where the blockchain itself guarantees data availability, aligns more closely with Web3’s trustless philosophy.
Why Choose the Wide-Sense Definition?
The narrow-sense approach, where every node stores all data, does not scale. Since every node must download all data, throughput is capped by the bandwidth of the slowest nodes, creating a bottleneck. The wide-sense approach allows us to scale more effectively, offering two ways to boost data throughput:
- Data Replication by a Subset of Nodes:
  - Each node stores a commitment to the data (proving its inclusion in blockchain history).
  - The data itself is stored only by a subset of nodes; as long as at least one honest node in that subset is online, the data remains available.
- Erasure Coding with Distributed Storage:
  - Every node stores a commitment to the data.
  - The data is erasure-coded, and each node stores just a portion of it (i.e., a coded data chunk). The data can be fully recovered as long as enough honest nodes holding coded chunks are available.
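To make the erasure-coding idea concrete, here is a toy sketch: a k-of-n Reed-Solomon-style code over a small prime field, paired with a hash commitment that every node keeps. This is illustrative only, not Autonomys's actual scheme; production systems use much larger fields and polynomial commitments, and all function names here are hypothetical.

```python
# Toy k-of-n erasure code over the prime field GF(257), plus a hash
# commitment. Any k of the n coded chunks suffice to recover the data.
import hashlib

P = 257  # small prime field; each data symbol must be < P

def lagrange_at(points, x0):
    """Evaluate the unique polynomial through `points` at x0, mod P."""
    total = 0
    for j, (xj, yj) in enumerate(points):
        num = den = 1
        for m, (xm, _) in enumerate(points):
            if m != j:
                num = num * (x0 - xm) % P
                den = den * (xj - xm) % P
        total = (total + yj * num * pow(den, P - 2, P)) % P  # Fermat inverse
    return total

def encode(data, n):
    """Systematic encoding: the k data symbols are the polynomial's values
    at x = 1..k; chunks k+1..n are extra evaluations of that polynomial."""
    k = len(data)
    chunks = list(enumerate(data, start=1))
    for x in range(k + 1, n + 1):
        chunks.append((x, lagrange_at(chunks[:k], x)))
    return chunks

def decode(any_k_chunks, k):
    """Recover the original k symbols from ANY k distinct chunks."""
    return [lagrange_at(any_k_chunks, x) for x in range(1, k + 1)]

# Example: 3 data symbols spread over 5 nodes; any 3 surviving nodes suffice.
data = [11, 22, 33]
commitment = hashlib.sha256(bytes(data)).hexdigest()  # what every node keeps
chunks = encode(data, n=5)                            # one chunk per node
recovered = decode([chunks[0], chunks[2], chunks[4]], k=3)
assert recovered == data
assert hashlib.sha256(bytes(recovered)).hexdigest() == commitment
```

The commitment lets any node verify that reconstructed data really matches what was recorded in the chain's history, while the code tolerates up to n - k offline or dishonest nodes.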
At Autonomys, we are incorporating both methods to create modular data domains with over 20 Gbps of throughput. Stay tuned for more details on how this design unfolds.