An Update on Progress
As we approach Phase 2 launch, we want to provide our community with greater visibility into the technical challenges our engineering teams are actively addressing and the progress being made. This update offers a detailed look at both the protocol development work and the infrastructure scaling efforts underway, including the issues we’ve encountered during recent stress testing and how we’re systematically resolving them. Our commitment to transparency means sharing not just our successes, but also the technical hurdles we’re working through as we prepare for mainnet launch.
TLDR
Protocol Progress: The team is in the final stretch toward Phase 2 launch, with significant progress on critical requirements. The “Crossing the Narrow Sea” contest successfully stress-tested our XDM (cross-domain messaging) system, revealing and helping us fix a serious double-minting bug along with other performance issues. We’ve completed essential benchmarking work and implemented numerous stability improvements. The audit process is advancing well, with XDM review nearly complete and only two remaining categories: sudo on domains and domain snap sync functionality.
Astral Challenges & Solutions: Our block explorer is experiencing growing pains as network activity increases, with 32 million extrinsics and 145 million events now on Taurus. We’ve identified and are systematically addressing performance bottlenecks, achieving 5x indexing speed improvements through database optimizations, Redis queue architecture, and eliminating unnecessary API calls. Additional work includes resolving SubQuery incompatibilities and fixing staking indexer accuracy issues.
Bottom Line: Both teams are making progress toward Phase 2 readiness while proactively addressing scalability challenges that come with increased network usage.
Protocol
The protocol team has been focused on finishing Phase 2 requirements. These currently fall into two areas:
- benchmarking extrinsic weights
- addressing XDM (cross-domain messaging) issues that were surfaced during the recent “Crossing the Narrow Sea” contest
The XDM testing was very successful, surfacing many issues that only became apparent under heavy load. These varied in criticality, from documentation gaps to a potentially very serious double-minting bug.
Many in our community are closely following our P1 audit issues on GitHub and have noted that at times this list has been somewhat dynamic. I want to explain how our audit is being conducted and how we are approaching launch readiness. Additionally, I want our community to understand that our team is extremely focused on fully launching the protocol. We are all stakeholders and want to see a successful Phase 2 launch as soon as possible.
In order to minimize the time between code completion and launching with an audited codebase, SR Labs has been conducting an ongoing audit as we merge code that requires auditing. When a pull request that requires auditing is merged, it is tagged with an audit priority tag (P1-P3). This helps SR Labs prioritize what code they should audit next. These tags may change over time as certain features or bug fixes become higher priority, or as features are de-prioritized, such as the permissioned EVM or, potentially, domain snap sync. When we conduct testing, as we recently did with XDM, any serious issue found must be addressed and audited prior to network launch. This is why several new items have been added to the queue recently.
Recent Pull Requests
The recent activity in our GitHub repository reflects the focused efforts described above, with pull requests clustered around key Phase 2 priorities.
XDM Cross-Domain Messaging Improvements
Following the “Crossing the Narrow Sea” contest, we’ve implemented several critical XDM fixes. The aforementioned double-minting bug was addressed in a series of PRs: PR #3514 ensures proper fund-minting sequencing, PR #3522 captures fallible routes so failed transfers are properly marked, and PR #3534 ensures no partial state is left behind during XDM processing. We’ve also optimized XDM processing: PR #3544 addresses the WASM runtime memory allocation issue many operators were experiencing, and PR #3545 adds minimum transfer amounts for cross-chain transactions.
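For readers who want a feel for the failure mode, here is a minimal sketch of why minting must wait until a transfer’s outcome is known: a failed transfer should be marked (so the source chain can refund it), never minted a second time on the destination. This is written in TypeScript purely as an illustration; the runtime itself is Rust, and every name below is invented rather than taken from the Autonomys codebase.

```typescript
// Hypothetical illustration of "mint exactly once, mark failures" for a
// cross-domain transfer. Not Autonomys code; names and types are invented.

type TransferId = string;

interface TransferMsg {
  id: TransferId;
  dst: string;    // destination account
  amount: bigint;
}

class DestinationLedger {
  private balances = new Map<string, bigint>();
  private processed = new Set<TransferId>(); // guards against double-minting
  private failed = new Set<TransferId>();    // failed transfers are marked, not retried blindly

  // Deliver an inbound transfer atomically: either every effect is applied
  // (mark processed + mint), or none are (mark failed so the source chain can refund).
  deliver(msg: TransferMsg, routeOk: boolean): void {
    if (this.processed.has(msg.id) || this.failed.has(msg.id)) {
      return; // redelivery of an already-handled message must be a no-op
    }
    if (!routeOk) {
      this.failed.add(msg.id); // fallible route captured; no partial state written
      return;
    }
    // Stage all effects, then commit them together.
    const next = (this.balances.get(msg.dst) ?? 0n) + msg.amount;
    this.processed.add(msg.id);
    this.balances.set(msg.dst, next);
  }

  balanceOf(account: string): bigint {
    return this.balances.get(account) ?? 0n;
  }
}

// Usage: re-delivering the same message does not mint twice.
const ledger = new DestinationLedger();
const msg = { id: "xdm-42", dst: "0xabc", amount: 100n };
ledger.deliver(msg, true);
ledger.deliver(msg, true); // replay: ignored
console.log(ledger.balanceOf("0xabc")); // 100n
```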
Benchmarking and Weight Calibration
The benchmarking work marked as a Phase 2 requirement is progressing: PR #3543 fixes broken pallet-domains benchmarks, PR #3542 adds benchmarks for the pallet-domains extension, and PR #3541 implements XDM extension benchmarks. These are essential for accurate extrinsic weight calculations before the Phase 2 launch. Now that the benchmarks are written, we will run them on a reference machine and update extrinsic weights accordingly.
Network Stability and DSN Improvements
Several fixes have been made to improve DSN performance. PR #3525 fixes multi-piece object retrieval bugs, PR #3523 improves segment downloading performance during DSN sync, and PR #3519 corrects an off-by-one error in snap sync reconstruction.
Current Audit Status
The auditors have been heavily focused on XDM and expect to complete their review of the recent PRs imminently. This leaves two categories of audit work that we have marked as P1: sudo on domains and domain snap sync.
Sudo on Domains
On Substrate-based chains, Sudo is used to execute critical operations related to consensus, such as runtime upgrades and balance changes. This is currently how Sudo works on our consensus chain.
This changes when it comes to domains, since they derive their security from consensus. We cannot use Sudo directly on domains due to security concerns; any critical executions, such as runtime upgrades, should originate from consensus. Therefore, we have developed a custom Sudo pallet for domains. With this pallet, any Sudo calls on domains are sent from consensus and executed on domains. Since this is a critical piece of code for domains, we want to ensure the audit provides clear approval before deploying it as part of Phase 2.
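As a rough illustration of the control flow, the sketch below shows the essential rule: a domain-side sudo call is executed only when it arrives through the consensus-originated channel, never from an ordinary signed account. The actual pallet is written in Rust; this TypeScript sketch and all of its names are hypothetical.

```typescript
// Hypothetical sketch of origin-gated sudo on a domain. Not the real pallet.

type Origin =
  | { kind: "consensus" }                 // call relayed from the consensus chain
  | { kind: "signed"; account: string };  // ordinary domain transaction

interface Call {
  name: string;        // e.g. "set_code" for a runtime upgrade
  execute(): void;
}

// Only consensus-originated sudo calls are executed on the domain.
function domainSudo(origin: Origin, call: Call): void {
  if (origin.kind !== "consensus") {
    throw new Error("BadOrigin: domain sudo must originate from consensus");
  }
  call.execute();
}

// A runtime upgrade relayed from consensus succeeds...
domainSudo({ kind: "consensus" }, { name: "set_code", execute: () => console.log("upgraded") });

// ...while the same call signed by a domain account is rejected.
try {
  domainSudo({ kind: "signed", account: "0xabc" }, { name: "set_code", execute: () => {} });
} catch (e) {
  console.log((e as Error).message);
}
```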
Domain Snap Sync
Currently, operators must sync their nodes from consensus chain genesis to operate a domain, which creates significant friction, as this sync can take anywhere from many hours to days. We have implemented snap sync for domains, similar to our existing consensus chain snap sync functionality. This allows operators to start fresh nodes and sync to the tip of the domain chain in hours rather than potentially multiple days. While testing this feature we have found several issues that needed to be addressed, which has added to the audit queue. While not mandatory for the Phase 2 launch, including this feature would significantly benefit the domain operator experience. However, if the audit takes too long, we will launch Phase 2 without it.
Issues Under Investigation
We are investigating reports of unreliable piece downloads on the network. This has been seen during plotting/re-plotting, syncing, and object retrieval from the Auto Drive file gateway on the Taurus network. We are looking into whether this is a recent regression or an existing issue that is surfacing under current usage patterns.
Astral
Astral is currently facing several critical issues that have been exacerbated by ongoing XDM testing. As the protocol team fixed bottlenecks, the increased load surfaced new issues within Astral. Additionally, recent fixes to the staking indexer have yet to make it to production due to the bottleneck of slow re-indexing. We are systematically addressing these issues and re-architecting our indexing strategy. This will include removing some features that appear to have little usage but put excessive strain on indexing and/or queries. Specific issues we have been facing and solutions to these issues are detailed below.
Astral Issues
Slow Indexing
During periods of high activity, indexing has been extremely slow, at times barely able to keep up with block production. This creates a frustrating cycle, as fixes need to be applied (see Staking Indexer Showing Incorrect Values below) but the indexer can take many days (or weeks) to catch up. Over the last couple of weeks we have examined the indexing process from the ground up and found many ways to improve indexing speed significantly (5x on average). Improvements include:
- Eliminating unnecessary API calls during indexing, including the account history fetching and space pledge calculations that were major bottlenecks. PR #1594 in Astral addresses this issue.
- Implementing a Redis queue architecture to move account history processing to asynchronous workers, reducing load on the main indexer (see the sketch below)
- Optimizing the database schema by removing string-based sorting and converting to appropriate numeric types
- Reducing database overhead by eliminating unnecessary indexes, each of which added more than 20% overhead
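As one example of the queue-based offload, the sketch below shows how per-account history updates can be enqueued by the indexer and processed by a separate worker, so the main indexing loop never blocks on them. It assumes BullMQ and a local Redis instance; the queue name and payload shape are invented for illustration and are not Astral’s actual code.

```typescript
// Illustrative only: offload account-history processing to a Redis-backed
// queue (BullMQ) so the main indexer loop stays fast. Names are hypothetical.
import { Queue, Worker } from "bullmq";

const connection = { host: "127.0.0.1", port: 6379 };

// Producer side (inside the indexer's block handler): enqueue and move on.
const accountHistoryQueue = new Queue("account-history", { connection });

export async function onBalanceChange(address: string, blockHeight: number): Promise<void> {
  await accountHistoryQueue.add("update", { address, blockHeight });
  // No API calls or heavy aggregation here; the indexer continues with the next event.
}

// Consumer side (separate process): do the expensive work asynchronously.
const worker = new Worker(
  "account-history",
  async (job) => {
    const { address, blockHeight } = job.data as { address: string; blockHeight: number };
    // e.g. recompute the account's history row / space pledge here.
    console.log(`updating history for ${address} at block ${blockHeight}`);
  },
  { connection }
);

worker.on("failed", (job, err) => {
  console.error(`account-history job ${job?.id} failed:`, err);
});
```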
Autonomys “finalization” Incompatible with SubQuery
Autonomys’ usage of finalization within the polkadot-sdk differs from that of most chains, as we are a longest-chain protocol with probabilistic finality. Our usage of the finalization flag is based on archived segments and can lag many tens of thousands of blocks behind the chain head. SubQuery stores every intermediate header in a single entry of the _metadata table, so when the distance between the last finalized block and the chain head becomes very large, the bloated metadata blob inflates memory consumption and lengthens cold-start times, because the indexer must deserialize and verify the entire set before resuming work. The fix is to apply a custom “finalization” threshold rather than relying on the block finalization flag, which required forking SubQuery to handle the custom logic. PR #1 in our forked SubQuery and PR #1594 in Astral resolve this issue.
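Conceptually, the fix replaces “whatever the node reports as finalized” with “best height minus a fixed depth.” The snippet below is a minimal sketch of that threshold logic only; the real change lives in our SubQuery fork, and the names and the depth value here are illustrative.

```typescript
// Hypothetical sketch of a depth-based finalization threshold. Instead of
// trusting the chain's finalization flag (which can lag tens of thousands of
// blocks on Autonomys), treat anything deeper than FINALIZATION_DEPTH below
// the best block as final for indexing purposes.

const FINALIZATION_DEPTH = 100; // illustrative value, not the production setting

export function effectiveFinalizedHeight(
  bestHeight: number,
  reportedFinalizedHeight: number
): number {
  const depthBased = Math.max(0, bestHeight - FINALIZATION_DEPTH);
  // Never go below what the chain itself already finalized.
  return Math.max(depthBased, reportedFinalizedHeight);
}

// With a best height of 3,250,000 and a reported finalized height ~40,000
// blocks behind, the indexer only keeps ~100 unfinalized headers in _metadata
// instead of ~40,000.
console.log(effectiveFinalizedHeight(3_250_000, 3_210_000)); // 3249900
```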
Domain Segment Events Incompatible with SubQuery
A second area where we have incompatibility with SubQuery is in how system events are handled on our domains. To address a significant performance bottleneck in our domain execution, we introduced the concept of event segments. While this resolved our performance issue, it also created incompatibility with some polkadot-sdk ecosystem tooling such as the polkadot-js explorer and SubQuery.
We recently added support in our SubQuery fork to properly handle EventSegments, which now allows us to correctly index XDM transactions on the Taurus Auto EVM domain. The lack of this capability had been blocking our ability to calculate a proper XDM tally for Crossing the Narrow Sea. With the issue resolved, we can now implement Auto EVM indexing of XDM transactions.
Slow Queries as Extrinsics/Event Counts Grow
As data on the networks has grown significantly, with 32 million extrinsics and over 145 million events on Taurus, maintaining filters, sorting, search functionality, and aggregate counts for pagination has become increasingly difficult and slow. While we are working toward more permanent solutions, several temporary measures are being applied to stabilize performance. We’ve implemented a 500k record limit and split queries to improve response times and prevent timeouts. Certain features, such as full page download, are still slow but will succeed if given time. Additionally, we’ve added indexes for aggregate count calculations to speed up pagination and split the ExtrinsicsByAccountId query to improve performance for account-specific data retrieval.
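The record limit works by counting over a bounded subquery instead of the full table, so pagination counts stop at 500k matching rows rather than scanning everything. Below is a rough sketch of that pattern using the node-postgres Pool; the table and column names are illustrative and not Astral’s actual schema.

```typescript
// Illustrative capped-count pattern for pagination: never count more than
// LIMIT_ROWS matching rows. Table/column names are hypothetical.
import { Pool } from "pg";

const pool = new Pool(); // connection settings come from the usual PG* env vars
const LIMIT_ROWS = 500_000;

export async function cappedExtrinsicCount(accountId: string): Promise<number> {
  const sql = `
    SELECT COUNT(*) AS cnt FROM (
      SELECT 1
      FROM extrinsics
      WHERE signer = $1
      LIMIT $2
    ) AS capped;
  `;
  const res = await pool.query(sql, [accountId, LIMIT_ROWS]);
  return Number(res.rows[0].cnt);
}

// A result of exactly 500000 can be displayed as "500k+" rather than an exact
// total, keeping the query cheap even for very active accounts.
```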
Staking Indexer Showing Incorrect Values
The staking indexer has been displaying incorrect values due to improper handling of storage fees during deposit processing. The issue occurred when processing new deposits through the OperatorNominated event flow, where we were incorrectly applying the same storage fee deduction logic used for OperatorRegistered events. This resulted in storage fees being deducted twice (fees were removed from amounts that had already had fees removed), leading to lower estimated_shares values and inaccurate staking data across the platform. The core issue has been resolved in PR #1592, though deployment is pending the completion of re-indexing to ensure data accuracy.
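To make the arithmetic concrete, here is a small worked sketch of the difference between deducting the storage fee once and deducting it twice. The numbers, names, and the 20% fee fraction are invented for the example; only the shape of the mistake matters.

```typescript
// Worked illustration of the double-deduction bug. Values are hypothetical.

const STORAGE_FEE_FRACTION = 0.2; // illustrative, not the protocol's actual parameter

function deductStorageFee(amount: number): number {
  return amount * (1 - STORAGE_FEE_FRACTION);
}

// A nominator deposits 1000 tokens. The storage fee should be deducted once,
// at the point the deposit enters the staking pool.
const deposit = 1000;
const stakedOnce = deductStorageFee(deposit);                    // 800 -> correct basis for shares

// The bug: the OperatorNominated path re-applied the OperatorRegistered
// deduction to an amount that had already had the fee removed.
const stakedTwice = deductStorageFee(deductStorageFee(deposit)); // 640 -> too low

console.log({ stakedOnce, stakedTwice });
// estimated_shares derived from `stakedTwice` under-reports the nominator's stake.
```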
Summary
The protocol team continues focused development toward Phase 2 launch, addressing critical XDM issues discovered during the “Crossing the Narrow Sea” contest and completing benchmarking requirements. Recent fixes include resolving a serious double-minting bug and implementing performance optimizations. The audit process is progressing, with the XDM review nearing completion, leaving sudo on domains and domain snap sync as the remaining audit categories. Meanwhile, Astral faces performance challenges exacerbated by increased network activity, with systematic work underway, including a 5x indexing speed-up and database optimizations, to handle the growing data volume of 32 million extrinsics and 145 million events on Taurus.