Data upload scaling to 1.8TBps

Data Availability Committees (DACs) are an effective approach to scaling blockchain throughput by offloading data availability responsibilities to a smaller, dedicated group of nodes. This allows the base chain to process transactions more efficiently without the burden of storing and propagating all transaction data.
One key connection to make here is that domain operators are already defined as a separate class of nodes from farmers. Each domain may have different hardware requirements for its operator set based on the purpose of the domain. Moreover, each domain may be either permissioned, with only whitelisted operators allowed to run it, or permissionless.
Given these insights, operators of a permissioned domain can naturally serve as a DAC for data specific to that domain. These operators, already responsible for executing transactions and maintaining the state of their domain, can extend their role to ensure short-term data availability (farmer plots guarantee long-term data availability). By leveraging the existing trust model and decoupled execution framework, operators can efficiently store and attest to the availability of domain-specific data, allowing the main consensus chain to focus on ordering and settlement.
This allows us to introduce a struct similar, in essence, to a blob on Ethereum: an opaque (to the base chain) set of data bytes to be stored in farmers’ plots. Given how the Archiving protocol is designed, where history is measured in archived segments and grows in increments of 256 MiB at a time, it is only natural for this “blob” to conceptually coincide with a segment. Each segment contains 128 MiB of raw data (and doubles in size due to erasure coding).
If a DAC is allowed to submit solely the KZG segment commitments to the base chain, each raw data segment will effectively take only 48 bytes of blockspace.
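For concreteness, here is a minimal Rust sketch of what such a commitment-only submission could look like; the type and field names are hypothetical, not actual Autonomys runtime types:

```rust
/// Hypothetical sketch (not actual Autonomys runtime types): the only data a
/// DAC posts on the base chain per 128 MiB raw segment is its KZG commitment,
/// a 48-byte compressed BLS12-381 G1 point.
pub struct SegmentCommitment(pub [u8; 48]);

/// Payload a DAC-domain operator would submit to the consensus chain:
/// commitments only, never the underlying segment bytes.
pub struct DomainDaSubmission {
    /// The domain that produced the data.
    pub domain_id: u32,
    /// One commitment per 128 MiB raw (256 MiB erasure-coded) segment.
    pub segment_commitments: Vec<SegmentCommitment>,
}
```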
With our current block size limit of 3.75 MiB, one block can fit
s=\frac{3.75\ \text{MiB}}{48\ \text{B}}=81920
segments. Since each segment contains 128 MiB of raw data, this means a single block can “contain” up to 81920 \times 128\ \text{MiB} = 10240\ \text{GiB} \approx 11\ \text{TB}. Divided by the 6-second block time, this puts the current achievable throughput limit of the network at \approx 1.83\ \text{TBps}.
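A quick arithmetic check of these figures (plain calculation, no protocol code involved):

```rust
fn main() {
    let block_size_bytes = (3.75 * 1024.0 * 1024.0) as u64; // 3.75 MiB block limit
    let commitment_bytes = 48;                               // one KZG commitment
    let segments_per_block = block_size_bytes / commitment_bytes;
    assert_eq!(segments_per_block, 81_920);

    let raw_bytes_per_segment = 128 * 1024 * 1024u64;        // 128 MiB of raw data
    let raw_bytes_per_block = segments_per_block * raw_bytes_per_segment;
    // 10_995_116_277_760 bytes = 10_240 GiB ≈ 11 TB referenced per block
    let block_time_secs = 6.0;
    let throughput = raw_bytes_per_block as f64 / 1e12 / block_time_secs;
    println!("≈ {throughput:.2} TBps"); // ≈ 1.83 TBps
}
```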
Such throughput is conditional on the DAC throughput. While terabyte-per-second bandwidth for a single node is unrealistic, it is still achievable given the modularity of the DecEx framework, which allows us to spin up hundreds of DAC domains with 10 Gbps of bandwidth each and reach terabyte-per-second throughput in aggregate.
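As a rough illustration, assuming 10 Gbps ≈ 1.25 GB/s of usable bandwidth per domain:

800\ \text{domains} \times 1.25\ \text{GB/s} = 1\ \text{TB/s}

so on the order of a few hundred to a thousand such domains already puts the aggregate in terabyte-per-second territory.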


However, the data stored in the domain still needs to be published to the consensus chain; otherwise, the fraud proof may not be generated.

The consensus chain block can only store 3.75 MiB of data. Are there plans to significantly increase the block size in the future, like the block size of Arweave?

We are not currently planning to increase the block size significantly, because we want to keep bandwidth requirements for nodes minimal (unlike Arweave).
What I describe here will still have all data archived in farmer plots and referenced by KZG commitments in block bodies.

If the domain’s source data is not sent to the consensus chain, and only the KZG commitment of the source data is sent, then farmers will not be able to store the source data of these domains.

If the block size cannot be expanded, how can we store very large files? AI data, whether source data or models, is very large.

Depends on what you mean by “send”. The domain data will not be explicitly included in the block body (which is also why we don’t need to increase the block size); only the commitments will be.
During archiving, the domain data corresponding to the commitments will be appended to the global history and farmers will be able to plot only the pieces needed for their individual plots, without having to download all the domain data every single block.
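To illustrate the point, a toy sketch (not the actual Subspace archiving or plotting code) of a farmer fetching only the pieces its plot covers:

```rust
use std::collections::HashSet;

/// Illustrative only: given the indices of pieces newly appended to the global
/// history and the piece indices this farmer's plot is responsible for, return
/// the (typically small) subset the farmer actually has to download.
fn pieces_to_fetch(newly_archived: &[u64], my_plot_pieces: &HashSet<u64>) -> Vec<u64> {
    newly_archived
        .iter()
        .copied()
        .filter(|idx| my_plot_pieces.contains(idx))
        .collect()
}
```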
If we were to put domain data into the block body explicitly (like we do today) and increase the block size to, say, 1 GiB, then we would have to increase farmer bandwidth requirements correspondingly and kill decentralization. We won’t do that, as it conflicts with our years-long commitment to low barriers to participation.

If all DACs act maliciously (for example, if a DAC does not transmit data to farmers), it will be impossible to generate fraud proofs.

That is a valid concern, which is why we plan to initially implement a few permissioned, trusted DACs and later make them permissionless and trustless via the sharding design we’re working on: Autonomys Scalability Roadmap (Part 1) and Autonomys Scalability Roadmap (Part 2)

I look forward to the launch of sharding as soon as possible. With sharding, the entire protocol will be truly complete. Real decentralized storage and decentralized computing are exciting to think about.
