We have a problem; how big or small, I’m yet to figure out.
Right now commitment creation for a raw record (with parameters from the v2.3 spec) is slow: it takes ~200ms on my machine (Slow comitment to polynomial with blst_from_scratch · Issue #202 · sifraitech/rust-kzg · GitHub).
It is not as bad during normal usage, but in cases like development, initial node startup and node sync it will occupy a significant amount of time.
Specifically, it takes ~25 seconds to commit to 128 pieces, which I can try to work around with parallelism, but it is still an expensive operation even though we’re not doing witness creation for those large polynomials.
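The 128-piece case at least parallelizes trivially, since each piece’s commitment is independent. A minimal sketch of that workaround, assuming rayon and using hypothetical `Piece`/`Commitment` types with a `commit_to_piece` placeholder for the real KZG call (not the actual archiver code):

```rust
// Sketch only: Piece, Commitment and commit_to_piece are hypothetical
// stand-ins; the real per-piece commitment is the ~200ms KZG operation.
use rayon::prelude::*;

pub struct Piece(pub Vec<u8>);
pub struct Commitment(pub [u8; 48]);

fn commit_to_piece(piece: &Piece) -> Commitment {
    // Placeholder for the expensive polynomial commitment.
    let _ = piece;
    Commitment([0u8; 48])
}

/// Commit to all pieces in parallel instead of sequentially; commitments are
/// independent, so this divides wall-clock time by roughly the core count.
pub fn commit_to_pieces(pieces: &[Piece]) -> Vec<Commitment> {
    pieces.par_iter().map(commit_to_piece).collect()
}
```

This only spreads the cost across cores, though; each individual commitment stays just as expensive, which is why a cheaper algorithm is still worth chasing.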
Is there something we can do beyond parallelism to accelerate it?
First, I could explore other functions they have in `rust-kzg` that may be faster. Hopefully, doing a linear combination with the trusted setup directly, using lower-level primitives than `commit-to-poly`, can help.
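For context on what that lower-level path is: committing to a polynomial in coefficient form boils down to a single linear combination of the trusted-setup G1 points by the coefficients, so anything `commit-to-poly` does around that loop is overhead we might be able to skip. A toy sketch of the core operation, with a placeholder `Group` trait standing in for blst’s G1 type and u64 scalars standing in for ~255-bit field elements (not rust-kzg’s actual API):

```rust
// Toy sketch: `Group` stands in for an elliptic-curve group (blst's G1 in
// practice); only point addition and scalar multiplication are assumed.
// Real scalars are ~255-bit field elements; u64 keeps the sketch simple.
pub trait Group: Copy {
    fn identity() -> Self;
    fn add(self, other: Self) -> Self;
    fn mul_scalar(self, scalar: u64) -> Self;
}

/// Coefficient-form commitment: C = sum_i coeffs[i] * setup[i],
/// where setup[i] = [tau^i]_1 from the trusted setup. Any speedup
/// (Pippenger-style MSM, fixed-base precomputation) replaces this loop.
pub fn linear_combination<G: Group>(setup: &[G], coeffs: &[u64]) -> G {
    setup
        .iter()
        .zip(coeffs)
        .fold(G::identity(), |acc, (point, coeff)| {
            acc.add(point.mul_scalar(*coeff))
        })
}
```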
Second, if `create-polynomial` takes significant time, we can explore keeping the trusted setup in Lagrange form (like Ethereum does) vs the coefficient form we use currently. That, however, would require some modifications to the proof computation functions, either upstream or in a fork.
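For reference, this is standard KZG background rather than anything rust-kzg-specific: both forms compute the same commitment, just over different bases, and the appeal of the Lagrange form is that data already in evaluation form can be committed to directly, without the interpolation that `create-polynomial` currently performs:

```latex
% Coefficient form: setup holds [\tau^i]_1, polynomial given by coefficients c_i
C = \sum_{i=0}^{n-1} c_i \cdot [\tau^i]_1
% Lagrange form: setup holds [L_i(\tau)]_1 for the Lagrange basis polynomials
% L_i over the evaluation domain, polynomial given by its evaluations y_i
C = \sum_{i=0}^{n-1} y_i \cdot [L_i(\tau)]_1
```

Both equal [p(τ)]_1 for the same polynomial p, which is why the two setups are interchangeable in principle; the cost is that the proof computation code has to agree on which basis it is working in.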
So I have tried the rust-kzg devs’ suggestion of porting Variable Base Multiscalar Multiplication (MSM) to `blst-from-scratch` (by copy-pasting and changing types to blst types, forgive me):
- With the `parallel` feature enabled on a 2^15 poly, MSM is better:
  create-polynomial time: [1.6686 ms 1.6810 ms 1.7017 ms]
  commit time: [82.195 ms 83.207 ms 84.294 ms]
  create-witness time: [137.34 ms 137.64 ms 137.90 ms]
- Without parallelism, MSM turned out to be three times worse than our `main`:
  create-polynomial time: [4.8556 ms 4.8603 ms 4.8655 ms]
  commit time: [616.63 ms 616.88 ms 617.17 ms]
  create-witness time: [632.05 ms 633.69 ms 635.44 ms]
- Enabling `parallel` for `blst-from-scratch` does not affect `main` commitment performance because the `g1_linear_combination` function `rust-kzg` uses to commit has no `cfg` for `parallel` (a sketch of what such a cfg-gated path could look like follows this list).
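On that last point, a cfg-gated parallel path looks mostly mechanical: split the (point, scalar) pairs into chunks, reduce each chunk with the existing serial code, then add the partial sums. A sketch reusing the hypothetical `Group` trait and `linear_combination` helper from above (not the actual rust-kzg internals):

```rust
// Sketch only: one shape a `parallel`-gated path could take for a
// linear-combination function like g1_linear_combination. Reuses the
// hypothetical Group trait and serial linear_combination from earlier.
#[cfg(feature = "parallel")]
pub fn linear_combination_parallel<G: Group + Send + Sync>(
    setup: &[G],
    coeffs: &[u64],
) -> G {
    use rayon::prelude::*;

    // One chunk per thread; each chunk is reduced serially, then the
    // per-chunk partial sums are added together.
    let chunk_size = (coeffs.len() / rayon::current_num_threads()).max(1);
    setup
        .par_chunks(chunk_size)
        .zip(coeffs.par_chunks(chunk_size))
        .map(|(points, scalars)| linear_combination(points, scalars))
        .reduce(G::identity, |a, b| a.add(b))
}
```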
How critical do you consider this issue to be? How much time should we spend right now improving commitment performance, or can we move forward with what we have?
I’m already parallelizing commitment creation in Improve archiver performance by nazar-pc · Pull Request #1323 · subspace/subspace · GitHub (still needs some fixes apparently, but should be merged soon), and there is still some more potential for it.
Parallelism would help us in some cases, but ideally we’d have a cheaper algorithm in general. We can’t expect people to run 128-thread machines just to benefit massively from threading; it should become faster in single-threaded mode if possible.
I’m not sure it is a blocker yet, but when we start getting bigger blocks it will become a bigger issue, and it will certainly impact sync time significantly. So not a blocker, but certainly a very important consideration.
Variable-base MSM is slower in a single thread. Another avenue to explore, from a purely mathematical perspective, is that we always multiply the same set of points (aka the trusted setup) by random sets of scalars, not random points by random scalars as in the general case. Maybe there is a way to precompute some operations (for instance this paper). Let’s come back to this when the time comes.
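To make the precomputation idea concrete (this is the textbook fixed-base windowed method, not necessarily what the linked paper proposes): since the setup points never change, per-point tables of small multiples can be built once and reused for every commitment, turning each scalar multiplication into table lookups and additions with no doublings at commit time.

```latex
% Write each b-bit scalar s in base 2^w:
s = \sum_{j=0}^{\lceil b/w \rceil - 1} d_j \, 2^{wj}, \qquad 0 \le d_j < 2^w
% Precompute once per fixed setup point P:
T_{j,k} = k \cdot 2^{wj} \cdot P, \qquad 0 \le k < 2^w
% Then at commitment time each scalar multiplication is lookups and additions:
s \cdot P = \sum_{j} T_{j,\, d_j}
```

The trade-off is memory: with ~255-bit scalars and 2^15 setup points, the tables grow quickly as w increases, so whether this actually wins over plain MSM would need measuring.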