Change derivation of PoSpace seed (and HDD-compatible exploit)

Right now we have the following design and implementation:

// Sector ID depends only on the public key hash and the sector index
let sector_id = SectorId::new(public_key_hash, sector_index);
// history_size is only mixed in later, per piece, when deriving the evaluation seed
let seed = sector_id.derive_evaluation_seed(piece_offset, history_size);

However, the observation is that history_size is the same for all pieces in the sector. Technically, the current rules don’t require this, but in practice it holds because there is no reason for it not to.

Now I’m wondering if we should move history size into sector ID derivation instead. I don’t recall there being any value in sector IDs remaining identical regardless of history size.


Moreover, I found an exploit of the current design that may replace random reads with essentially sequential reads during audit, and potentially enable an HDD-compatible plotter.

Note that this is how we derive slot challenge for a sector:

// The slot challenge depends only on sector_id and the global challenge
let sector_slot_challenge = sector_id.derive_sector_slot_challenge(
    global_challenge,
);

And on disk for auditing purposes we store chunks sorted by s-buckets.

Here is what I would do to reduce random reads:

  • Have one sector index/ID reused across multiple physical sectors
  • Plot each physical sector as of a different blockchain height (which is easy to do once history size is non-trivial)
  • Interleave s-buckets of different physical sectors, such that all s-buckets with index 0 come first, then all s-buckets with index 1, etc.

With such a disk layout, one could read a whole bunch of “sectors” with a single large sequential read, without needing to jump between different sections of the disk, because the sector slot challenge will be the same for all of these distinct physical sectors.
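To make the layout concrete, here is a minimal sketch (purely illustrative, not actual farmer code; the names and the fixed per-s-bucket size are assumptions for simplicity) of where a given s-bucket would live on disk and why one s-bucket index across many physical sectors becomes a single contiguous read:

// Hypothetical interleaved layout: all s-buckets with index 0 come first,
// then all s-buckets with index 1, and so on, across `num_sectors`
// physical sectors that share the same sector_id.
fn interleaved_offset(
    s_bucket: u64,
    physical_sector: u64,
    num_sectors: u64,
    s_bucket_size: u64, // assumed fixed size per s-bucket for simplicity
) -> u64 {
    (s_bucket * num_sectors + physical_sector) * s_bucket_size
}

// Because these physical sectors share the same sector_id, they also share
// the sector slot challenge, so the same s-bucket index is audited in each
// of them; that audit becomes one sequential read of
// `num_sectors * s_bucket_size` bytes starting here:
fn audit_read_start(s_bucket: u64, num_sectors: u64, s_bucket_size: u64) -> u64 {
    s_bucket * num_sectors * s_bucket_size
}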

Of course these sectors will still store a random selection of pieces, so there is technically nothing fundamentally wrong with this.

On one hand this may allow designing an HDD-friendly (to a degree at least) auditor, but proving will likely remain a challenge with ~32k random reads across the disk (spaced even further apart than usual).


I’m not sure if there are other exploits of the current design to worry about, but I don’t like it either way and I think we should address this.


If you recall, we also considered such a disk layout for SSDs at some point. IIRC there was also complexity in mapping exactly where one would find the rest of the chunks of the required piece. Also, I don’t think it’s necessarily a bad thing if someone manages to hack plotting to support HDDs. I do doubt proving is feasible, though.

I’m talking specifically about the same sector here; for different sectors it won’t help in any way, while for the same sector with the same amount of metadata it is possible to combine reads more efficiently.

It just feels architecturally wrong to apply history size to the piece offset rather than the whole sector, which sparked this investigation.

I see what you’re saying; it makes sense that if history_size is the same for the whole sector, then it can be added into the mix one step earlier. Mathematically, it shouldn’t make a difference as long as history_size is mixed into the hash, but I also don’t think it’s worth changing at this point, as it doesn’t bring any obvious benefit while it would require an audit sign-off.

Nazar, please see Usage of total_pieces. I believe that this was the reason to include history_size (back then called total_pieces) in the evaluation seed of the PoS table.
It seems to me that we can move it into sector_id instead; however, the moral is that we should be careful about moving it from one place to another.


IIRC that was v1 consensus, which worked significantly differently from v2.3 that we have right now.

Absolutely!

It depends on the implementation. Even if random reads are not really possible on a single HDD, imagine for simplicity that you have a bunch of HDDs and you read, let’s say, 1/8th of the chunks or even the whole sector from each of 8 drives. Then suddenly a sequential read of something like 125 MB/s per drive, which even somewhat old HDDs can do, becomes enough.
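A quick back-of-the-envelope check of that arithmetic (purely illustrative, using only the numbers above):

// 8 drives each reading sequentially at 125 MB/s give roughly 1 GB/s of
// combined throughput, i.e. splitting the audit evenly across the array
// keeps the per-drive requirement modest.
const NUM_DRIVES: u64 = 8;
const PER_DRIVE_MB_PER_S: u64 = 125; // achievable even by fairly old HDDs

fn main() {
    let aggregate = NUM_DRIVES * PER_DRIVE_MB_PER_S;
    println!("{aggregate} MB/s combined"); // 1000 MB/s across the array
}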

My concern is that if this is possible, we’ll need to build such software. If we don’t, then someone else will, likely closed source, and even more likely operated by a farming pool. That would effectively defeat one of the key pillars of the protocol: decentralization.

I would strongly suggest fixing the issue before mainnet, even if that requires a short delay due to audit/dev work, etc.

SSD-only would be the best solution, but it really needs to be airtight. Further workarounds to the SSD I/O requirement might be found once mainnet launches, as more eyes will check the code for such possibilities. Any such discoveries will lead to centralization.

SSD and HDD farming could be an option as long as SSDs keep an economic advantage (due to smaller plot files, for instance). But the implementation has to come from Autonomys; otherwise 3rd-party developers will try to centralize farming. Optional HDD mining will also lead to ballooning netspace, as there are exabytes out there waiting for something to farm.

Again, I’d advocate for a thorough fix now, even at the cost of a delay.

Yes, this grinding attack is real. I believe that we shouldn’t let this happen.

It seems that moving history_size into sector_id is strictly better than our previous method. But it is unclear whether we have other attack surfaces.

Note that our current security analysis assumes a fixed history_size (for simplicity). So, we should extend our security analysis to a more general case.

To make this more concrete, we propose the following changes (see the sketch after this list):

  • sector_id = keyed_hash(public_key_hash, sector_index, history_size) (previously: sector_id = keyed_hash(public_key_hash, sector_index))
  • evaluation_seed = hash(sector_id || piece_offset) (previously: evaluation_seed = hash(sector_id || piece_offset || history_size))
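As a minimal sketch of what the proposed derivation could look like (this is not the actual implementation; the blake3 keyed hash and the exact field widths/encodings are stand-in assumptions for illustration only):

// Proposed: history_size participates in sector_id derivation...
fn derive_sector_id(
    public_key_hash: &[u8; 32],
    sector_index: u64,
    history_size: u64,
) -> [u8; 32] {
    let mut input = Vec::with_capacity(16);
    input.extend_from_slice(&sector_index.to_le_bytes());
    input.extend_from_slice(&history_size.to_le_bytes());
    // keyed hash with the public key hash as the key (stand-in for keyed_hash above)
    *blake3::keyed_hash(public_key_hash, &input).as_bytes()
}

// ...so the evaluation seed no longer needs history_size explicitly,
// since it is already mixed into sector_id
fn derive_evaluation_seed(sector_id: &[u8; 32], piece_offset: u16) -> [u8; 32] {
    let mut input = Vec::with_capacity(34);
    input.extend_from_slice(sector_id);
    input.extend_from_slice(&piece_offset.to_le_bytes());
    *blake3::hash(&input).as_bytes()
}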

As a sanity check, we know that
different history_size → different sector_id → different evaluation_seed → hard to fake storage

Still, will this open new attack surfaces? We will explore this in the coming days.