Verified Bedrock2 code for Number-Theoretic Transform #1997
The PR is still in a bit of a rough state, but I'm opening it to see if there is interest in adding it.
The Number-Theoretic Transform is a technique to accelerate polynomial multiplications used in recent lattice-based cryptography for PQC.
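
As a rough illustration of why the transform helps, the sketch below multiplies two polynomials in Z_q[X]/(X^n + 1) in two ways: directly (schoolbook), and by transforming both operands, multiplying pointwise, and transforming back. The parameters q = 17, n = 4, psi = 2 and all function names are assumptions chosen for readability; they are not taken from this PR, and the transforms are written in quadratic time for clarity, whereas a real NTT uses O(n log n) butterflies.

```c
#include <stdint.h>

/* Toy parameters (assumptions, not ML-KEM's or ML-DSA's):
   q = 17, n = 4, and psi = 2 is a primitive 2n-th root of unity mod q
   because 2^4 = 16 = -1 (mod 17). */
enum { Q = 17, N = 4, PSI = 2 };

/* Modular exponentiation by repeated squaring. */
uint32_t pow_mod(uint32_t base, uint32_t exp) {
    uint32_t r = 1;
    base %= Q;
    while (exp > 0) {
        if (exp & 1) r = (r * base) % Q;
        base = (base * base) % Q;
        exp >>= 1;
    }
    return r;
}

/* Schoolbook product in Z_q[X]/(X^N + 1): X^N wraps around to -1. */
void mul_schoolbook(const uint32_t a[N], const uint32_t b[N], uint32_t c[N]) {
    for (int k = 0; k < N; k++) c[k] = 0;
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            uint32_t t = (a[i] * b[j]) % Q;
            if (i + j < N) c[i + j] = (c[i + j] + t) % Q;
            else c[i + j - N] = (c[i + j - N] + Q - t) % Q;
        }
    }
}

/* Forward transform: evaluate a at the N roots psi^(2i+1) of X^N + 1. */
void ntt_naive(const uint32_t a[N], uint32_t out[N]) {
    for (int i = 0; i < N; i++) {
        uint32_t root = pow_mod(PSI, 2 * i + 1);
        uint32_t acc = 0;
        for (int j = N - 1; j >= 0; j--) acc = (acc * root + a[j]) % Q; /* Horner */
        out[i] = acc;
    }
}

/* Inverse transform: a_j = N^{-1} * sum_i v_i * root_i^{-j}. */
void intt_naive(const uint32_t v[N], uint32_t a[N]) {
    uint32_t n_inv = pow_mod(N, Q - 2); /* Fermat inverse, Q is prime */
    for (int j = 0; j < N; j++) {
        uint32_t acc = 0;
        for (int i = 0; i < N; i++) {
            uint32_t root_inv = pow_mod(pow_mod(PSI, 2 * i + 1), Q - 2);
            acc = (acc + v[i] * pow_mod(root_inv, j)) % Q;
        }
        a[j] = (acc * n_inv) % Q;
    }
}

/* Product via the transform: multiplication becomes coefficient-wise. */
void mul_ntt(const uint32_t a[N], const uint32_t b[N], uint32_t c[N]) {
    uint32_t fa[N], fb[N], fc[N];
    ntt_naive(a, fa);
    ntt_naive(b, fb);
    for (int i = 0; i < N; i++) fc[i] = (fa[i] * fb[i]) % Q;
    intt_naive(fc, c);
}
```

Both routines should agree on every input; the gain in practice comes from replacing the quadratic evaluation loops with layered butterflies.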
This PR defines:
- `Polynomial.v`
- `CyclotomicDecomposition.v`, which defines a homomorphism from the type `Pquotl (cyclotomic_decomposition n 0)` to `Pquotl (cyclotomic_decomposition n k)`, where `Pquotl ql` is defined as `Pquotl (ql: list P): Type := { pl: list P | List.Forall2 (fun p q => Peq p (Pmod p q)) pl ql }`, and `cyclotomic_decomposition n i` is the i-th layer decomposition of `X^n + 1`. It also defines various optimizations for the NTT.
- `RupicolaNTT.v`
- `BedrockNTT.v`: I initially tried to automatically synthesize the code using Rupicola, but ended up doing the proof manually.
- `RupicolaBarrettReduction.v` and `RupicolaMontgomeryArithmetic.v`
- `MLKEM.v` and `MLDSA.v`
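
To make the layer decomposition concrete, here is a minimal sketch of a single step, assuming the usual NTT splitting X^n + 1 = (X^(n/2) - zeta)(X^(n/2) + zeta) with zeta^2 = -1; the actual Coq definition may order or index the factors differently. The pair `(lo, hi)` plays the role of a list of polynomials that are each already reduced modulo their factor, which is roughly what `Pquotl` describes. The parameters q = 17, n = 4, zeta = 4 and the function names are hypothetical.

```c
#include <stdint.h>

/* Toy parameters (assumptions): ZETA^2 = 16 = -1 (mod 17),
   so X^4 + 1 = (X^2 - ZETA)(X^2 + ZETA) over Z_17. */
enum { Q = 17, N = 4, ZETA = 4 };

/* Map p (degree < N) to its residues modulo X^(N/2) - ZETA (lo) and
   X^(N/2) + ZETA (hi). In the first quotient X^(N/2) equals ZETA, in the
   second it equals -ZETA, so each residue is a butterfly on coefficient pairs. */
void split_layer(const uint32_t p[N], uint32_t lo[N / 2], uint32_t hi[N / 2]) {
    for (int j = 0; j < N / 2; j++) {
        uint32_t t = (ZETA * p[j + N / 2]) % Q;
        lo[j] = (p[j] + t) % Q;
        hi[j] = (p[j] + Q - t) % Q;
    }
}

/* Inverse map (Chinese remainder theorem): recover p from (lo, hi). */
void merge_layer(const uint32_t lo[N / 2], const uint32_t hi[N / 2], uint32_t p[N]) {
    const uint32_t inv2 = (Q + 1) / 2;   /* 2^{-1} mod Q */
    const uint32_t zeta_inv = Q - ZETA;  /* ZETA^{-1} = -ZETA since ZETA^2 = -1 */
    for (int j = 0; j < N / 2; j++) {
        uint32_t sum = (lo[j] + hi[j]) % Q;       /* equals 2 * p[j] */
        uint32_t diff = (lo[j] + Q - hi[j]) % Q;  /* equals 2 * ZETA * p[j + N/2] */
        p[j] = (sum * inv2) % Q;
        p[j + N / 2] = (((diff * inv2) % Q) * zeta_inv) % Q;
    }
}
```

Applying the same split recursively to each factor, with the appropriate square roots of the previous layer's constants, yields the deeper layers; composing all of them gives the full transform.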
I believe the C code should look like what someone would write after reading the NIST standards with no other reference. In terms of performance, this is slower than the handwritten C reference implementations for Kyber/Dilithium, which use a so-called centered signed representation for field elements and delay reduction of the coefficients to the end of the NTT, instead of systematically reducing at each step as the synthesised code does.
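
The difference between the two reduction strategies can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the product is reduced with a plain `%` rather than the Montgomery or Barrett routines that real implementations use, and neither function is the code generated by this PR or the code of the reference implementations. The modulus 3329 is the ML-KEM one.

```c
#include <stdint.h>

enum { KQ = 3329 }; /* ML-KEM modulus */

/* Eager style: every value is brought back to the canonical range [0, q)
   after each addition, subtraction, and multiplication. */
void butterfly_eager(uint16_t *a, uint16_t *b, uint16_t zeta) {
    uint16_t t = (uint16_t)(((uint32_t)zeta * *b) % KQ);
    uint16_t x = *a;
    *a = (uint16_t)((x + t) % KQ);
    *b = (uint16_t)((x + KQ - t) % KQ);
}

/* Centered representative of v mod q, in [-(q-1)/2, (q-1)/2]. */
int16_t center(int32_t v) {
    int32_t r = v % KQ; /* C remainder carries the sign of v */
    if (r > KQ / 2) r -= KQ;
    if (r < -(KQ / 2)) r += KQ;
    return (int16_t)r;
}

/* Lazy style: only the product is reduced. The sum and difference are left
   unreduced, so magnitudes grow by at most (q-1)/2 per layer. Starting from
   centered inputs, 7 layers stay below 8 * 1664 = 13312, well inside int16_t. */
void butterfly_lazy(int16_t *a, int16_t *b, int16_t zeta) {
    int16_t t = center((int32_t)zeta * *b);
    int16_t x = *a;
    *a = (int16_t)(x + t);
    *b = (int16_t)(x - t);
}

/* Final reduction, applied once after the last layer in the lazy style. */
uint16_t to_canonical(int16_t v) {
    int32_t r = v % KQ;
    if (r < 0) r += KQ;
    return (uint16_t)r;
}
```

Both butterflies compute the same values modulo q; the lazy variant simply saves two reductions per butterfly at the cost of tracking the growth bound.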