By reading the source code, it's not hard to see that the plotting operation is essentially an iteration over N `Vec`s, involving computations such as checksums. The large number of iterations leads to performance issues. Converting the iteration to CUDA or OpenGL is extremely complex, so is there a more convenient way to compile it to x86 code and offload it to other compute cards? For example, performing the parallel computation on specialized hardware like Intel's VAC or Xeon Phi, and then reading the results back from its memory.
While we do plot one sector at a time, the encoding is heavily parallelizable and will be offloaded to GPUs in the future. Checksum calculation is technically not a hard requirement of the protocol; it is more of a UX improvement that makes it possible to detect disk corruption/inconsistencies, and with BLAKE3 it is parallelized as well, taking a negligible amount of time in the grand scheme of things.
I'm not exactly sure what you're suggesting here, though.