I have been experimenting with various things (again) and discovered a change that improves performance for CPU-based table creation by ~10%, possibly even more (all other things being equal).
The observation is that sometimes Y values are actually duplicated, which forces the implementation to track the count of these things here:
And then have to loop over them here:
But it would have been much more efficient to have a simple yes/no flag instead.
From consensus point of view this makes no difference, but for farmer this means a breaking change to the on-disk farm format when decoding data. So in case such a change is introduced, at least temporarily both versions should be supported.
This should be especially helpeful for GPUs where loops like this are very costly there.
For context, here is what the distribution of counts looks roughly like:
Count
Occurrences
Occurrences in %
Any
11313485
100%
0
11138124
98.44%
1
173970
1.54%
2
1375
0.01%
3
16
0.0001%
Essentially every next count is about two orders of magnitude less likely to happen. I have not seen counts 7+ in my testing, but I guess they may happen with some probability too.
Only handling 0 and 1 will mean tables will be a little bit smaller (we’ll ignore ~1.5% of potentially valid entries on every table), but since we already have parity backup on the farmer and not all of those entries will have matches anyway, the possibility of the farmer not having encoded pieces (having chunks that can’t participate in rewards) is 0% just like before.
Exploring further optimizations, since we don’t technically need any particular Y in the table, so due to small size of each bucket during matches, SIMD binary search might work really well without rmap creation.
And this is especially applicable for GPU, where such things should be even faster. Have a very good feeling about performance potential here.