Proving on consensus after the state is pruned

During one of the discussions with the team, we realized that at the moment, for the domains, once a given domain block is confirmed its state is pruned. So if at some point we want to prove something related to domain on consensus state it would not be possible to prove as the state is pruned. This is same case if someone wants to prove something else on consensus runtime for a specific block which is beyond confirmation depth. Ofcourse, we can pull the actual consensus block either through DSN or archival node but to prove it on runtime we either need to inject that state back to the consensus runtime or fetch through through host function. Fetching with host function would require nodes that do not maintain archival history to fetch it through a network and it will be infeasible.

One potential solution we were thinking was to introduce Merkle mountain range into consensus chain that stores consensus block headers. Having this would make it possible to prove anything on the consensus runtime.

Potential storage

  • on runtime, mmr would store the peaks and it should not bloat the overtime
  • on the client which acts as proof providers would need to store all the leaves to generate the required proofs which will bloat the client storage as the chain grows.

Proofs sizes:
Since we just add consensus block header into MMR, there would multiple indirect proofs if we want to prove something on domain state and as a result proof sizes would be much bigger. I think this would be okay ?

Want to gather some feedback and potential alternative data structures instead of MMR. Please let me know what you think.

cc: @nazar-pc @dariolina @Jeremy_Frank

2 Likes

I see your concern and potential solution, however, I fail to see why is this a concern. The definition of the challenge period implies the inability to challenge things beyond that period. So if you need to prove something wrong you should do it before the state is pruned. Would you need to prove something is correct? After the challenge period, everything that was in those blocks is considered correct.
What would you need to prove?

Its not about proving right but rather proving something that has happened on the Domain chain on consensus chain. We do not have a usecase of proving something that existed yet but if such feature does indeed comes in, then there is no way to prove that such event has happened on Domain on Consensus runtime since it does not hold necessary state beyond unconfirmed domain blocks. Having an infrastructure to do that proof verification is required to be added rather than adding such data at a later time. We can ofcourse add the same MMR at a later time but setting the consensus state requires some form trust and migration both of which might not be feasible

Here is background: https://github.com/subspace/subspace/issues/2082#issuecomment-1758267896

In short we have pathological cases with domain runtime upgrade when one of the domains is offline for a prolonged period of time. If/when domain goes back online they should be able to upgrade to the latest runtime version. The problem is that this either means unbounded state on consensus chain to store all intermediate domain runtime versions or access to state at unboudned depth, both of which are very problematic for obvious reasons. This is not necessary for happy case, but definitely required in order to verify domain runtime upgrade fraud proof.

What we thought is that with MMR such a domain would be able to revive itself by providing not only ER that runtime upgrade happened, but also the runtime itself and a proof that it was in fact a correct runtime back then. Maybe we only need this for fraud proof and upgrade can still follow the happy path, but considering the danger I think we should include it during upgrade alongside with ER.

While we don’t need to design the upgrade mechanism itself, we need an infrastructure to be physically able to generate correct proofs in the future without resorting to revival of the domains via governance only.

1 Like

A Merkle mountain range (MMR) is a data structure that supports efficient appends. Also, it is related to vector commitment with efficient updates. Some latest advances are BalanceProofs and this one. To @ved: Do we need aggregatability for our purpose? (See the previous links for its definition.)

Another direction to explore: Just as a Merkle tree can be turned into a Verkle tree, a Merkle mountain range can be turned into a Verkle mountain range.

As an update to this thread, we have found another use case where this may be necessary, and it is XDM. According to discussions in Trusted source of domain state root for cross domain message verification the destination domain will only execute XDM after the challenge period on the source domain has passed and the state has been pruned.

Looks like all the three data structures provide similar efficient performance over leaf appends, proof generation, and proof verification. As for the proof size, all data strucutres are efficient but MMR, I believe, will have smaller proof sizes for recent leaves. Please correct me if you think otherwise. It should be okay to use any though for this use case MMR might be straightforward since MMR is already used in other projects though I’m not aware of rust implementations for the rest of the Data structures. We might need to invest sometime here though.

This is the proof structure I have in mind taking XDM example.

  • Storage proof of the XDM message on the domain
  • Storage proof of the Domain state root on the consensus.
  • Proof of consensus header leaf by one of the above data structure.

Though we can optimise the 1 & 2 proof if the leaf not only holds consensus header but also domain state roots and other domain specific information.

At the moment, we do not need aggregatability because to prove something on consensus, or on mutiple domains, we just need to provide one leaf proof. Though I’m not sure if there is a usecase for multiple leaf proofs down the line.

From my understanding, MMR scheme might not be compatible with Verkle tree unlike Merkle trees here. Maybe others have some thoughts ?

cc: @dariolina @nazar-pc

Great information @ved! Below please find a few quick comments.

  1. MMR is good enough for now. I can explain how to build a Verkle-tree-style MMR after our mainnet launch.

  2. A key question is the following: How can a consensus node track “correct” ERs? To answer this question, let us first make three simple observations. 1) If multiple ERs point to a same domain block, then at most one of them is correct. 2) If a single ER points to a domain block and this ER survives the challenge period, then this ER is considered to be correct based on our security assumption. 3) If there are still multiple ERs pointing to a domain block after its challenge period, then our security assumption is wrong. We have to extend the challenge period. Based on these observations, we have a rough idea of how a consensus node tracks “correct” ERs. Since correct ERs form an increasing sequence, we can use MMR to record the “root” of this sequence.

  3. We are ready to discuss the proof structure by using XDM as an example. Let us view a domain operator as a prover and view a consensus node as a verifier. The prover needs to provide a storage proof for the XDM message regarding a particular ER together with another storage proof for this ER regarding an MMR root.

  4. A downside of the above approach is that we need to maintain MMR for each and every domain. Can we optimize this?

Tracking ER root is not necessary I believe since final confirmed ER has the domain state root that was not challenged. We would need domain state root to prove XDM and another proof to MMR leaf for MMR root

  1. A downside of the above approach is that we need to maintain MMR for each and every domain. Can we optimize this?

Not necessarily. Idea was to include consensus header and list of domain stateroots mapped to domain_id that were confirmed during a consensus block. We would have one MMR for all domains

Thanks @ved! Below please find several quick questions (just for clarification).

  • What do you mean by “final confirmed ER”? Will “final confirmed ER” be pruned from its associated consensus block?

  • XDM is a domain extrinsic. Why not using domain_block_extrinsics_root to prove XDM?

  • Consider a consensus block B_h of height h, which contains bundles from three domains. Suppose that the challenge period is X (consensus) blocks. Then, the consensus block B_{h + X} of height h + X will “confirm” ERs for three domain blocks associated with B_h, right? Or, the header of B_{h + X} will include MMR root with each leaf being a domain state root associated with B_h, right?

Yes, all the confirmed domain block state is pruned and so we want to such data to MMR for proof verification.

We are trying to prove the extrinsic execution since XDM is stored on state once accepted. We want to prove such XDM is acceped by the source domain

B_{h + X}​ will confirm the Domain blocks on B_h and derive new leaf within that block finalization. So B_{h + X} 's header with 3 domain state roots are included in the MMR leaf

To clarify, XDM extrinsic is included in a domain block only after it was sent to destination domain.
When a user submits XDM to the source domain, it is stored only in runtime state until the state root of the best block at that point is confirmed. So we can’t use domain_block_extrinsics_root from source domain.

Thanks @ved and @dariolina! It is all clear now. It seems that a simple solution would be to use the header of B_{h + X} to “store” some form of domain state roots (perhaps together with some other information) from domain blocks associated with B_h. I will elaborate on this shortly.