[EXPERIMENTAL] Add RV32-IM backend#1119
Draft
hanno-becker wants to merge 1 commit into
Draft
Conversation
78ef84d to
0c721b5
Compare
Contributor
CBMC Results (ML-DSA-65, REDUCE-RAM)Full Results (205 proofs)
|
Contributor
CBMC Results (ML-DSA-44, REDUCE-RAM)Full Results (205 proofs)
|
Contributor
CBMC Results (ML-DSA-87, REDUCE-RAM)Full Results (205 proofs)
|
Contributor
CBMC Results (ML-DSA-44)Full Results (205 proofs)
|
Contributor
CBMC Results (ML-DSA-87)Full Results (205 proofs)
|
Contributor
CBMC Results (ML-DSA-65)Full Results (205 proofs)
|
0df94c0 to
1e15593
Compare
1e15593 to
c61ca7e
Compare
66858dd to
28b6232
Compare
This commit adds an experimental arithmetic backend for RISCV32-IM: The backend includes: - A 2+2+2+2 forward NTT - A 2+2+2+2 inverse NTT - A poly-poly base multiplication (but not vector-vector) Modular arithmetic in the NTT/inverse-NTT uses Barrett multiplication as in the AArch64 backend. A notable difference is that RV32-IM does not provide SQRDMULH, so we fall back to a plain MULH: This affects the 'magic constant' slightly, and also increases the output bound of the modular multiplication to 5/4 * MLDSA_Q -- this is already detailed in proofs/isabelle/neon_ntt. The larger multiplication bound is reflected in a larger output bound for the forward NTT, which has already been adopted in a previous commit and is known not to cause conflicts with the rest of the code. For the inverse NTT, however, a bound > MLDSA_Q for the output is problematic. Hence, for the final scaling step in the inverse NTT, we 'emulate' an SQRDMULH using MULH + ADD + SRAI. This kernel still needs to be proved correct in the 'Neon' NTT paper adaptation in proofs/isabelle/neon_ntt. We are interest in different classes of RV32-IM CPUs, some with fast, some with slow multipliers. For CPUs with very slow multiplier, such as Hummingbird E203 (https://github.com/riscv-mcu/e203_hbirdv2), cycles can be saved by leveraging MLDSA_Q = 2^23 - 2^13 + 1 and rewriting the low-multiplication by MLDSA_Q as a sequence of SHIFT+ADD/SUB. We offer this alternative version of the backend under a compile-time flag MLD_USE_NATIVE_RV32IM_SLOW_MULTIPLIER. CI runs the functional, KAT, ACVP and wycheproof suites under qemu-riscv32 for both the default and the slow-multiplier variant. Signed-off-by: Hanno Becker <beckphan@amazon.co.uk>
28b6232 to
23bbd01
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This commit adds an experimental arithmetic backend for RISCV32-IM:
The backend includes:
Modular arithmetic in the NTT/inverse-NTT uses Barrett multiplication
as in the AArch64 backend. A notable difference is that RV32-IM does
not provide SQRDMULH, so we fall back to a plain MULH: This affects
the 'magic constant' slightly, and also increases the output bound
of the modular multiplication to 5/4 * MLDSA_Q -- this is already
detailed in proofs/isabelle/neon_ntt. The larger multiplication bound
is reflected in a larger output bound for the forward NTT, which has
already been adopted in a previous commit and is known not to cause
conflicts with the rest of the code. For the inverse NTT, however,
a bound > MLDSA_Q for the output is problematic. Hence, for the final
scaling step in the inverse NTT, we 'emulate' an SQRDMULH using
MULH + ADD + SRAI. This kernel still needs to be proved correct
in the 'Neon' NTT paper adaptation in proofs/isabelle/neon_ntt.
We are interest in different classes of RV32-IM CPUs, some with fast,
some with slow multipliers. For CPUs with very slow multiplier, such
as Hummingbird E203 (https://github.com/riscv-mcu/e203_hbirdv2), cycles
can be saved by leveraging MLDSA_Q = 2^23 - 2^13 + 1 and rewriting
the low-multiplication by MLDSA_Q as a sequence of SHIFT+ADD/SUB.
We offer this alternative version of the backend under a compile-time
flag MLD_USE_NATIVE_RV32IM_SLOW_MULTIPLIER.
CI runs the functional, KAT, ACVP and wycheproof suites under
qemu-riscv32 for both the default and the slow-multiplier variant.