C: Switch to [inv]NTT with 2+2+2+1 structure #1696

Draft
hanno-becker wants to merge 1 commit into main from c_ntt

Conversation

@hanno-becker

Contributor

Rewrite mlk_poly_ntt_c / mlk_poly_invntt_tomont_c to process two layers at a time, with three 2-layer passes plus the leftover layer 7 as a single layer.
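
To illustrate the 2+2+2+1 schedule, here is a minimal sketch in plain modular arithmetic (no Montgomery form), not the code from this PR. It assumes twiddles `zetas[i] = 17^bitrev7(i) mod 3329` and the usual layer-by-layer indexing; the names `ntt_layer`, `ntt_2_layers`, `ntt_2221` and `schedules_agree` are hypothetical. The point is only that merging layers L and L+1 over groups of four coefficients performs the same butterflies as two separate passes.

```c
#include <assert.h>
#include <stdint.h>

#define N 256
#define Q 3329

/* Plain-domain twiddles (assumption): zetas[i] = 17^bitrev7(i) mod Q. */
static int32_t zetas[128];

static unsigned bitrev7(unsigned x)
{
  unsigned r = 0, i;
  for (i = 0; i < 7; i++)
    r = (r << 1) | ((x >> i) & 1u);
  return r;
}

static void init_zetas(void)
{
  unsigned i, e;
  for (i = 0; i < 128; i++)
  {
    int32_t z = 1;
    for (e = bitrev7(i); e > 0; e--)
      z = (z * 17) % Q;
    zetas[i] = z;
  }
}

/* Cooley-Tukey butterfly on canonical residues in [0, Q). */
static void butterfly(int32_t *a, int32_t *b, int32_t zeta)
{
  int32_t t = (zeta * *b) % Q;
  *b = (*a + Q - t) % Q;
  *a = (*a + t) % Q;
}

/* One layer (1-based): 2^(layer-1) blocks, butterfly distance N >> layer. */
static void ntt_layer(int32_t r[N], unsigned layer)
{
  unsigned len = N >> layer, blocks = 1u << (layer - 1), b, j;
  for (b = 0; b < blocks; b++)
    for (j = 0; j < len; j++)
      butterfly(&r[2 * len * b + j], &r[2 * len * b + j + len],
                zetas[blocks + b]);
}

/* Layers `layer` and `layer + 1` merged: four coefficients per step. */
static void ntt_2_layers(int32_t r[N], unsigned layer)
{
  unsigned len = N >> layer, half = len >> 1;
  unsigned blocks = 1u << (layer - 1), b, j;
  assert(layer >= 1 && layer <= 6);
  for (b = 0; b < blocks; b++)
  {
    int32_t z0 = zetas[blocks + b];             /* layer `layer`      */
    int32_t z1 = zetas[2 * blocks + 2 * b];     /* layer `layer + 1`  */
    int32_t z2 = zetas[2 * blocks + 2 * b + 1];
    int32_t *c = &r[2 * len * b];
    for (j = 0; j < half; j++)
    {
      butterfly(&c[j], &c[j + len], z0);
      butterfly(&c[j + half], &c[j + len + half], z0);
      butterfly(&c[j], &c[j + half], z1);
      butterfly(&c[j + len], &c[j + len + half], z2);
    }
  }
}

static void ntt_layerwise(int32_t r[N])
{
  unsigned layer;
  for (layer = 1; layer <= 7; layer++)
    ntt_layer(r, layer);
}

/* Three 2-layer passes (layers 1-2, 3-4, 5-6) plus layer 7 on its own. */
static void ntt_2221(int32_t r[N])
{
  ntt_2_layers(r, 1);
  ntt_2_layers(r, 3);
  ntt_2_layers(r, 5);
  ntt_layer(r, 7);
}

/* Usage: returns 1 if both schedules give identical coefficients. */
static int schedules_agree(void)
{
  int32_t x[N], y[N];
  unsigned i;
  init_zetas();
  for (i = 0; i < N; i++)
    x[i] = y[i] = (int32_t)((i * 7u + 3u) % Q);
  ntt_layerwise(x);
  ntt_2221(y);
  for (i = 0; i < N; i++)
    if (x[i] != y[i])
      return 0;
  return 1;
}
```

The merged pass loads and stores each coefficient once per two layers, which is presumably where the gain in the C backend comes from; the sketch does not measure that.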

Introduces shared mlk_ct_butterfly and mlk_gs_butterfly helpers; the inverse 2-layer block applies four GS butterflies and then Barrett-reduces the additive outputs explicitly.
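
A rough sketch of the two butterfly shapes and of an inverse 2-layer block follows. It is an illustration under assumptions, not the PR's code: `mulmod` is a plain modular product standing in for `mlk_fqmul`, the twiddles are arbitrary nonzero constants, the input bound `|c| < Q` is chosen for this sketch, and the choice of which outputs to Barrett-reduce (the two that are pure sums) is a guess at what "additive outputs" refers to.

```c
#include <assert.h>
#include <stdint.h>

#define Q 3329

/* Plain modular product with result in (-Q, Q); stand-in for mlk_fqmul. */
static int16_t mulmod(int32_t a, int32_t b) { return (int16_t)((a * b) % Q); }

/* Barrett reduction to a centered representative, in the style of the Kyber
 * reference code. Assumes arithmetic right shift of negative values. */
static int16_t barrett_reduce(int16_t a)
{
  const int32_t v = ((1 << 26) + Q / 2) / Q;
  int32_t t = (v * a + (1 << 25)) >> 26;
  return (int16_t)(a - t * Q);
}

/* Cooley-Tukey: (a, b) -> (a + zeta*b, a - zeta*b), sums left unreduced. */
static void ct_butterfly(int16_t *a, int16_t *b, int16_t zeta)
{
  int16_t t = mulmod(*b, zeta);
  *b = (int16_t)(*a - t);
  *a = (int16_t)(*a + t);
}

/* Gentleman-Sande: (a, b) -> (a + b, (a - b) * zeta_inv), sum unreduced. */
static void gs_butterfly(int16_t *a, int16_t *b, int16_t zeta_inv)
{
  int16_t t = *a;
  *a = (int16_t)(t + *b);
  *b = mulmod(t - *b, zeta_inv);
}

/* Forward 2-layer block on four coefficients. */
static void ntt_2_layers_block(int16_t c[4], int16_t z0, int16_t z1, int16_t z2)
{
  ct_butterfly(&c[0], &c[2], z0);
  ct_butterfly(&c[1], &c[3], z0);
  ct_butterfly(&c[0], &c[1], z1);
  ct_butterfly(&c[2], &c[3], z2);
}

/* Inverse 2-layer block: four GS butterflies, then an explicit Barrett
 * reduction of the two outputs that are pure sums (c[0] and c[1]). */
static void invntt_2_layers_block(int16_t c[4], int16_t z0inv, int16_t z1inv,
                                  int16_t z2inv)
{
  int i;
  for (i = 0; i < 4; i++)
    assert(c[i] > -Q && c[i] < Q); /* bound assumed by this sketch */
  gs_butterfly(&c[0], &c[1], z1inv);
  gs_butterfly(&c[2], &c[3], z2inv);
  gs_butterfly(&c[0], &c[2], z0inv);
  gs_butterfly(&c[1], &c[3], z0inv);
  c[0] = barrett_reduce(c[0]);
  c[1] = barrett_reduce(c[1]);
}

/* Modular inverse via Fermat: z^(Q-2) mod Q. */
static int16_t inverse_mod_q(int16_t z)
{
  int32_t r = 1;
  int i;
  for (i = 0; i < Q - 2; i++)
    r = (r * z) % Q;
  return (int16_t)r;
}

/* Usage: forward block followed by inverse block scales each input by 4. */
static int roundtrip_ok(void)
{
  const int16_t x[4] = {100, -200, 3000, -1234};
  const int16_t z[3] = {17, 289, 2000}; /* arbitrary nonzero twiddles */
  int16_t zinv[3], c[4];
  int i;
  for (i = 0; i < 3; i++)
    zinv[i] = inverse_mod_q(z[i]);
  for (i = 0; i < 4; i++)
    c[i] = x[i];
  ntt_2_layers_block(c, z[0], z[1], z[2]);
  for (i = 0; i < 4; i++)
    c[i] = barrett_reduce(c[i]);
  invntt_2_layers_block(c, zinv[0], zinv[1], zinv[2]);
  for (i = 0; i < 4; i++)
    if ((c[i] - 4 * x[i]) % Q != 0)
      return 0;
  return 1;
}
```

With inputs bounded by `Q` in absolute value, the unreduced sums stay below `4 * Q`, so they fit in `int16_t` before the final reduction. The factor 4 comes from each GS layer doubling its inputs; a full inverse NTT would fold that scaling into a final constant.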

mlk_fqmul now takes a precomputed b_twisted = b * MLKEM_Q^{-1} mod 2^16 and uses a hi-mul / lo-mul-and-correct sequence in place of an inline mlk_montgomery_reduce, dropping the QINV multiply. The mlk_zetas table is regenerated as int16_t[128][2] of (zeta_mont, zeta_twisted) pairs.
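
The twisted multiplication can be sketched as below. This is an illustration of the general technique, not the PR's implementation: `twist`, `fqmul_twisted`, `montgomery_reduce_ref` and the macro `QINV_U16` are hypothetical names, and the code assumes two's-complement narrowing to `int16_t` and arithmetic right shift of negative values. The idea is that `lo = a * b_twisted mod 2^16` equals `a * b * Q^{-1} mod 2^16`, so `a * b` and `lo * Q` share their low 16 bits and the Montgomery result is the difference of the two high halves.

```c
#include <assert.h>
#include <stdint.h>

#define Q 3329
#define QINV_U16 62209u /* Q^{-1} mod 2^16 (3329 * 62209 = 1 mod 65536) */

/* Precompute b_twisted = b * Q^{-1} mod 2^16, as a signed 16-bit value. */
static int16_t twist(int16_t b)
{
  return (int16_t)(uint16_t)((uint32_t)(uint16_t)b * QINV_U16);
}

/* Montgomery product a * b * 2^{-16} mod Q using a precomputed twist of b. */
static int16_t fqmul_twisted(int16_t a, int16_t b, int16_t b_twisted)
{
  int32_t prod;
  int16_t hi, lo, corr;
  assert(b_twisted == twist(b));
  prod = (int32_t)a * b;
  hi = (int16_t)(prod >> 16);                       /* hi-mul */
  lo = (int16_t)(uint16_t)((uint32_t)(uint16_t)a *
                           (uint16_t)b_twisted);    /* lo-mul */
  corr = (int16_t)(((int32_t)lo * Q) >> 16);        /* correction term */
  return (int16_t)(hi - corr);
}

/* Classical Montgomery reduction with an explicit QINV multiply, for
 * comparison only. */
static int16_t montgomery_reduce_ref(int32_t a)
{
  int16_t t = (int16_t)(uint16_t)((uint32_t)a * QINV_U16);
  return (int16_t)((a - (int32_t)t * Q) >> 16);
}

/* Usage: compare both variants over a sample of inputs. */
static int fqmul_matches_reference(void)
{
  int32_t a, b;
  for (a = -32768; a <= 32767; a += 97)
    for (b = -1664; b <= 1664; b += 13)
    {
      int16_t r = fqmul_twisted((int16_t)a, (int16_t)b, twist((int16_t)b));
      if (r != montgomery_reduce_ref(a * b))
        return 0;
    }
  return 1;
}
```

Under these assumptions, one row of a table of pairs would be built as `{zeta_mont, twist(zeta_mont)}`, which moves the `Q^{-1}` multiplication from run time to table-generation time.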

hanno-becker changed the title from "C-ref NTT: merge layer pairs, switch fqmul to twisted Montgomery" to "C: Switch to [inv]NTT with 2+2+2+1 structure" on May 15, 2026
hanno-becker added the "benchmark" label (this PR should be benchmarked in CI) on May 15, 2026
hanno-becker force-pushed the c_ntt branch 2 times, most recently from 5b598cc to 69e4999, on May 15, 2026 13:46
@oqs-bot

oqs-bot commented May 15, 2026

Contributor

CBMC Results (ML-KEM-512)

⚠️ Attention Required

Proof Status Current Previous Change
**TOTAL** ⚠️ 1689s 1202s +40.5%
Full Results (195 proofs)
Proof Status Current Previous Change
**TOTAL** ⚠️ 1689s 1202s +40.5%
mlk_ntt_2_layers_block 402s - new
mlk_indcpa_keypair_derand 244s 232s +5%
mlk_indcpa_enc 161s 158s +2%
mlk_rej_uniform_c 108s 106s +2%
mlk_polyvec_basemul_acc_montgomery_cached_c 53s 49s +8%
mlk_poly_rej_uniform 29s 28s +4%
mlk_keccak_squeezeblocks_x4 26s 24s +8%
mlk_ntt_2_layers 24s - new
poly_ntt_native 24s 23s +4%
mlk_ntt_layer 22s 25s -12%
mlk_poly_reduce_native 19s 17s +12%
keccakf1600x4_permute_native_x4 16s 19s -16%
mlk_poly_decompress_d10_native 15s 12s +25%
mlk_indcpa_dec 14s 15s -7%
mlk_poly_decompress_d4_native 13s 13s +0%
mlk_invntt_2_layers_block 11s - new
mlk_polyvec_add 11s 12s -8%
mlk_poly_frommsg 9s 10s -10%
mlk_poly_rej_uniform_x4 9s 5s +80%
mlk_keccak_squeezeblocks 8s 6s +33%
polyvec_basemul_acc_montgomery_cached_native 8s 6s +33%
mlk_invntt_layer 7s 4s +75%
mlk_keccak_squeeze_once 7s 7s +0%
mlk_ntt_butterfly_block 7s 7s +0%
mlk_poly_frombytes_native 7s 8s -12%
mlk_poly_invntt_tomont_c 6s 2s +200%
rej_uniform_native_x86_64 6s 6s +0%
kem_check_pk 5s 2s +150%
mlk_fqmul 5s 13s -62%
mlk_keccakf1600_permute_c 5s 4s +25%
mlk_poly_cbd_eta2 5s 7s -29%
mlk_poly_decompress_d11_c 5s 3s +67%
mlk_poly_ntt 5s 6s -17%
nttunpack_native_x86_64 5s 3s +67%
poly_decompress_d10_native_x86_64 5s 2s +150%
poly_decompress_d4_native_x86_64 5s 7s -29%
sys_check_capability 5s 3s +67%
keccakf1600_permute_native 4s 2s +100%
kem_enc 4s 2s +100%
mlk_ct_cmov_zero 4s 2s +100%
mlk_keccak_absorb_once_x4 4s 4s +0%
mlk_montgomery_reduce 4s 2s +100%
mlk_poly_compress_d11_c 4s 2s +100%
mlk_poly_compress_du 4s 2s +100%
mlk_polyvec_reduce 4s 2s +100%
mlk_polyvec_tomont 4s 1s +300%
mlk_scalar_decompress_d4 4s 2s +100%
poly_frombytes_native_x86_64 4s 5s -20%
poly_invntt_tomont_native 4s 2s +100%
poly_mulcache_compute_native_aarch64 4s 3s +33%
polyvec_basemul_acc_montgomery_cached_k3_native_x86_64 4s 3s +33%
intt_native_x86_64 3s 2s +50%
keccak_f1600_x1_native_aarch64 3s 3s +0%
keccak_f1600_x1_native_aarch64_v84a 3s 3s +0%
keccak_f1600_x4_native_aarch64_v84a 3s 5s -40%
keccakf1600x4_extract_bytes_native 3s 2s +50%
keccakf1600x4_xor_bytes_native 3s 3s +0%
kem_check_sk 3s 1s +200%
kem_dec 3s 6s -50%
mlk_barrett_reduce 3s 2s +50%
mlk_check_pct 3s 1s +200%
mlk_ct_get_optblocker_i32 3s 2s +50%
mlk_ct_memcmp 3s 2s +50%
mlk_gen_matrix 3s 1s +200%
mlk_invntt_2_layers 3s - new
mlk_keccak_absorb_once 3s 2s +50%
mlk_keccakf1600_extract_bytes 3s 3s +0%
mlk_keccakf1600x4_extract_bytes_c 3s 2s +50%
mlk_keccakf1600x4_xor_bytes 3s 1s +200%
mlk_poly_add 3s 2s +50%
mlk_poly_cbd_eta1 3s 2s +50%
mlk_poly_compress_d5_c 3s 1s +200%
mlk_poly_compress_d5_native 3s 5s -40%
mlk_poly_decompress_d4_c 3s 1s +200%
mlk_poly_decompress_du 3s 2s +50%
mlk_poly_frombytes 3s 3s +0%
mlk_poly_mulcache_compute 3s 3s +0%
mlk_poly_reduce_c 3s 1s +200%
mlk_poly_tobytes 3s 1s +200%
mlk_poly_tobytes_c 3s 4s -25%
mlk_poly_tomont_native 3s 3s +0%
mlk_polyvec_compress_du 3s 2s +50%
mlk_polyvec_invntt_tomont 3s 2s +50%
mlk_polyvec_permute_bitrev_to_custom_native 3s 2s +50%
mlk_scalar_compress_d1 3s 1s +200%
mlk_scalar_compress_d4 3s 2s +50%
mlk_scalar_decompress_d10 3s 2s +50%
mlk_scalar_decompress_d5 3s 3s +0%
mlk_scalar_signed_to_unsigned_q 3s 2s +50%
mlk_sha3_512 3s 3s +0%
mlk_shake128x4_absorb_once 3s 1s +200%
mlk_shake256x4 3s 2s +50%
mlk_value_barrier_u8 3s 2s +50%
poly_compress_d10_native_x86_64 3s 2s +50%
poly_compress_d5_native_x86_64 3s 1s +200%
poly_decompress_d11_native_x86_64 3s 4s -25%
poly_reduce_native_aarch64 3s 2s +50%
poly_tobytes_native_x86_64 3s 1s +200%
polyvec_basemul_acc_montgomery_cached_k2_native_aarch64 3s 2s +50%
polyvec_basemul_acc_montgomery_cached_k4_native_aarch64 3s 4s -25%
rej_uniform_native 3s 2s +50%
intt_native_aarch64 2s 2s +0%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 2s 2s +0%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 2s 4s -50%
kem_enc_derand 2s 2s +0%
kem_keypair 2s 3s -33%
kem_keypair_derand 2s 2s +0%
mlk_ct_cmask_neg_i16 2s 2s +0%
mlk_ct_cmask_nonzero_u16 2s 2s +0%
mlk_ct_get_optblocker_u32 2s 1s +100%
mlk_ct_get_optblocker_u8 2s 2s +0%
mlk_ct_sel_uint8 2s 1s +100%
mlk_enc_getnoise_eta1_eta2 2s 3s -33%
mlk_gen_matrix_serial 2s 3s -33%
mlk_keccakf1600_xor_bytes (big endian) 2s 5s -60%
mlk_keccakf1600x4_permute 2s 1s +100%
mlk_keccakf1600x4_xor_bytes_c 2s 2s +0%
mlk_keypair_getnoise_eta1 2s 2s +0%
mlk_matvec_mul 2s 1s +100%
mlk_poly_compress_d10 2s 2s +0%
mlk_poly_compress_d10_c 2s 3s -33%
mlk_poly_compress_d11 2s 2s +0%
mlk_poly_compress_d11_native 2s 2s +0%
mlk_poly_compress_d4 2s 1s +100%
mlk_poly_compress_d4_c 2s 2s +0%
mlk_poly_compress_d4_native 2s 2s +0%
mlk_poly_compress_d5 2s 3s -33%
mlk_poly_compress_dv 2s 2s +0%
mlk_poly_decompress_d11 2s 4s -50%
mlk_poly_decompress_d11_native 2s 1s +100%
mlk_poly_decompress_d5 2s 2s +0%
mlk_poly_decompress_d5_native 2s 3s -33%
mlk_poly_decompress_dv 2s 5s -60%
mlk_poly_frombytes_c 2s 3s -33%
mlk_poly_getnoise_eta1122_4x 2s 2s +0%
mlk_poly_getnoise_eta1_4x 2s 3s -33%
mlk_poly_getnoise_eta1_4x_native 2s 2s +0%
mlk_poly_getnoise_eta2 2s 4s -50%
mlk_poly_invntt_tomont 2s 1s +100%
mlk_poly_mulcache_compute_c 2s 3s -33%
mlk_poly_mulcache_compute_native 2s 3s -33%
mlk_poly_ntt_c 2s 4s -50%
mlk_poly_sub 2s 3s -33%
mlk_poly_tobytes_native 2s 2s +0%
mlk_poly_tomont 2s 2s +0%
mlk_poly_tomont_c 2s 2s +0%
mlk_poly_tomsg 2s 3s -33%
mlk_polymat_permute_bitrev_to_custom 2s 2s +0%
mlk_polyvec_basemul_acc_montgomery_cached 2s 1s +100%
mlk_polyvec_decompress_du 2s 2s +0%
mlk_polyvec_frombytes 2s 2s +0%
mlk_polyvec_mulcache_compute 2s 3s -33%
mlk_polyvec_permute_bitrev_to_custom 2s 2s +0%
mlk_polyvec_tobytes 2s 3s -33%
mlk_rej_uniform 2s 3s -33%
mlk_scalar_compress_d10 2s 1s +100%
mlk_scalar_compress_d11 2s 3s -33%
mlk_scalar_compress_d5 2s 2s +0%
mlk_scalar_decompress_d11 2s 1s +100%
mlk_sha3_256 2s 1s +100%
mlk_shake128_absorb_once 2s 2s +0%
mlk_shake128_squeezeblocks 2s 1s +100%
mlk_shake256 2s 1s +100%
mlk_value_barrier_u32 2s 2s +0%
ntt_native_aarch64 2s 2s +0%
poly_compress_d11_native_x86_64 2s 3s -33%
poly_decompress_d5_native_x86_64 2s 1s +100%
poly_mulcache_compute_native_x86_64 2s 5s -60%
poly_reduce_native_x86_64 2s 3s -33%
poly_tomont_native_aarch64 2s 3s -33%
poly_tomont_native_x86_64 2s 1s +100%
polyvec_basemul_acc_montgomery_cached_k2_native_x86_64 2s 2s +0%
polyvec_basemul_acc_montgomery_cached_k3_native_aarch64 2s 2s +0%
rej_uniform_native_aarch64 2s 2s +0%
keccak_f1600_x4_native_avx2 1s 2s -50%
mlk_ct_cmask_nonzero_u8 1s 1s +0%
mlk_ct_sel_int16 1s 2s -50%
mlk_keccakf1600_extract_bytes (big endian) 1s 2s -50%
mlk_keccakf1600_permute 1s 4s -75%
mlk_keccakf1600_xor_bytes 1s 2s -50%
mlk_keccakf1600x4_extract_bytes 1s 2s -50%
mlk_poly_compress_d10_native 1s 1s +0%
mlk_poly_decompress_d10 1s 3s -67%
mlk_poly_decompress_d10_c 1s 1s +0%
mlk_poly_decompress_d4 1s 1s +0%
mlk_poly_decompress_d5_c 1s 1s +0%
mlk_poly_reduce 1s 3s -67%
mlk_polyvec_ntt 1s 2s -50%
mlk_shake128x4_squeezeblocks 1s 3s -67%
mlk_value_barrier_i32 1s 3s -67%
ntt_native_x86_64 1s 2s -50%
poly_compress_d4_native_x86_64 1s 3s -67%
poly_getnoise_eta1122_4x_native 1s 2s -50%
poly_tobytes_native_aarch64 1s 1s +0%
polyvec_basemul_acc_montgomery_cached_k4_native_x86_64 1s 5s -80%

@oqs-bot

oqs-bot commented May 15, 2026

CBMC Results (ML-KEM-768)

⚠️ Attention Required

Proof Status Current Previous Change
**TOTAL** ⚠️ 1727s 1303s +32.5%
Full Results (195 proofs)
Proof Status Current Previous Change
**TOTAL** ⚠️ 1727s 1303s +32.5%
mlk_ntt_2_layers_block 436s - new
mlk_indcpa_keypair_derand 196s 202s -3%
mlk_indcpa_enc 172s 177s -3%
mlk_rej_uniform_c 122s 136s -10%
mlk_polyvec_basemul_acc_montgomery_cached_c 43s 46s -7%
mlk_poly_rej_uniform 32s 35s -9%
mlk_keccak_squeezeblocks_x4 27s 25s +8%
mlk_ntt_2_layers 27s - new
poly_ntt_native 27s 28s -4%
mlk_ntt_layer 25s 35s -29%
mlk_poly_reduce_native 19s 21s -10%
keccakf1600x4_permute_native_x4 17s 17s +0%
polyvec_basemul_acc_montgomery_cached_native 16s 19s -16%
mlk_indcpa_dec 15s 14s +7%
mlk_poly_decompress_d4_native 14s 13s +8%
mlk_poly_decompress_d10_native 13s 16s -19%
mlk_poly_frommsg 12s 8s +50%
mlk_invntt_2_layers_block 11s - new
mlk_poly_ntt 10s 7s +43%
mlk_ntt_butterfly_block 9s 10s -10%
mlk_poly_frombytes_native 9s 12s -25%
mlk_polyvec_add 9s 9s +0%
mlk_invntt_layer 8s 5s +60%
mlk_keccak_squeeze_once 8s 7s +14%
mlk_keccak_squeezeblocks 8s 10s -20%
mlk_poly_compress_d4_c 8s 2s +300%
kem_dec 7s 5s +40%
mlk_poly_rej_uniform_x4 7s 6s +17%
rej_uniform_native_x86_64 7s 7s +0%
poly_frombytes_native_x86_64 6s 5s +20%
mlk_keccak_absorb_once_x4 5s 7s -29%
mlk_keccakf1600_permute_c 5s 6s -17%
mlk_poly_cbd_eta1 5s 2s +150%
mlk_scalar_signed_to_unsigned_q 5s 2s +150%
rej_uniform_native_aarch64 5s 3s +67%
intt_native_aarch64 4s 1s +300%
intt_native_x86_64 4s 1s +300%
kem_check_pk 4s 3s +33%
mlk_ct_cmask_nonzero_u16 4s 1s +300%
mlk_enc_getnoise_eta1_eta2 4s 3s +33%
mlk_fqmul 4s 14s -71%
mlk_gen_matrix_serial 4s 3s +33%
mlk_keccak_absorb_once 4s 3s +33%
mlk_keccakf1600x4_permute 4s 1s +300%
mlk_keypair_getnoise_eta1 4s 3s +33%
mlk_poly_compress_d10_c 4s 3s +33%
mlk_poly_compress_du 4s 2s +100%
mlk_poly_decompress_d11 4s 2s +100%
mlk_poly_getnoise_eta1_4x_native 4s 6s -33%
mlk_poly_getnoise_eta2 4s 3s +33%
mlk_poly_tomont_native 4s 3s +33%
mlk_polyvec_reduce 4s 2s +100%
mlk_scalar_decompress_d4 4s 4s +0%
mlk_shake256x4 4s 5s -20%
mlk_value_barrier_u8 4s 1s +300%
poly_compress_d5_native_x86_64 4s 6s -33%
poly_decompress_d10_native_x86_64 4s 3s +33%
poly_decompress_d5_native_x86_64 4s 3s +33%
polyvec_basemul_acc_montgomery_cached_k2_native_aarch64 4s 4s +0%
polyvec_basemul_acc_montgomery_cached_k3_native_x86_64 4s 5s -20%
mlk_check_pct 3s 2s +50%
mlk_ct_cmask_nonzero_u8 3s 1s +200%
mlk_ct_get_optblocker_u8 3s 2s +50%
mlk_ct_sel_uint8 3s 3s +0%
mlk_gen_matrix 3s 3s +0%
mlk_keccakf1600x4_extract_bytes_c 3s 2s +50%
mlk_matvec_mul 3s 1s +200%
mlk_poly_cbd_eta2 3s 3s +0%
mlk_poly_compress_d10_native 3s 3s +0%
mlk_poly_compress_d4 3s 2s +50%
mlk_poly_compress_d5_native 3s 2s +50%
mlk_poly_decompress_d4_c 3s 1s +200%
mlk_poly_decompress_d5_c 3s 2s +50%
mlk_poly_getnoise_eta1_4x 3s 3s +0%
mlk_poly_invntt_tomont_c 3s 3s +0%
mlk_poly_mulcache_compute 3s 2s +50%
mlk_poly_tomont 3s 3s +0%
mlk_polyvec_frombytes 3s 3s +0%
mlk_polyvec_invntt_tomont 3s 3s +0%
mlk_polyvec_ntt 3s 3s +0%
mlk_polyvec_permute_bitrev_to_custom_native 3s 4s -25%
mlk_scalar_compress_d1 3s 2s +50%
mlk_scalar_compress_d4 3s 3s +0%
mlk_scalar_compress_d5 3s 3s +0%
mlk_scalar_decompress_d10 3s 3s +0%
mlk_shake128x4_squeezeblocks 3s 2s +50%
mlk_shake256 3s 1s +200%
mlk_value_barrier_i32 3s 3s +0%
ntt_native_aarch64 3s 4s -25%
ntt_native_x86_64 3s 2s +50%
nttunpack_native_x86_64 3s 4s -25%
poly_compress_d10_native_x86_64 3s 3s +0%
poly_compress_d11_native_x86_64 3s 3s +0%
poly_decompress_d11_native_x86_64 3s 2s +50%
poly_decompress_d4_native_x86_64 3s 4s -25%
poly_getnoise_eta1122_4x_native 3s 5s -40%
poly_mulcache_compute_native_aarch64 3s 4s -25%
poly_reduce_native_aarch64 3s 3s +0%
poly_tobytes_native_aarch64 3s 3s +0%
polyvec_basemul_acc_montgomery_cached_k4_native_aarch64 3s 4s -25%
keccak_f1600_x1_native_aarch64 2s 5s -60%
keccak_f1600_x1_native_aarch64_v84a 2s 1s +100%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 2s 2s +0%
keccak_f1600_x4_native_avx2 2s 2s +0%
keccakf1600_permute_native 2s 1s +100%
keccakf1600x4_extract_bytes_native 2s 3s -33%
keccakf1600x4_xor_bytes_native 2s 2s +0%
kem_check_sk 2s 3s -33%
kem_enc 2s 3s -33%
kem_enc_derand 2s 3s -33%
kem_keypair 2s 2s +0%
mlk_ct_cmask_neg_i16 2s 3s -33%
mlk_ct_cmov_zero 2s 1s +100%
mlk_ct_get_optblocker_i32 2s 2s +0%
mlk_ct_sel_int16 2s 3s -33%
mlk_invntt_2_layers 2s - new
mlk_keccakf1600_extract_bytes 2s 3s -33%
mlk_keccakf1600_extract_bytes (big endian) 2s 5s -60%
mlk_keccakf1600_permute 2s 2s +0%
mlk_keccakf1600_xor_bytes 2s 3s -33%
mlk_keccakf1600_xor_bytes (big endian) 2s 2s +0%
mlk_keccakf1600x4_xor_bytes 2s 3s -33%
mlk_keccakf1600x4_xor_bytes_c 2s 1s +100%
mlk_poly_add 2s 3s -33%
mlk_poly_compress_d10 2s 3s -33%
mlk_poly_compress_d11 2s 2s +0%
mlk_poly_compress_d11_c 2s 2s +0%
mlk_poly_compress_d4_native 2s 3s -33%
mlk_poly_compress_d5_c 2s 4s -50%
mlk_poly_decompress_d10 2s 4s -50%
mlk_poly_decompress_d10_c 2s 2s +0%
mlk_poly_decompress_d11_c 2s 3s -33%
mlk_poly_decompress_d11_native 2s 2s +0%
mlk_poly_decompress_d4 2s 1s +100%
mlk_poly_decompress_d5 2s 3s -33%
mlk_poly_decompress_du 2s 2s +0%
mlk_poly_decompress_dv 2s 3s -33%
mlk_poly_frombytes 2s 2s +0%
mlk_poly_frombytes_c 2s 1s +100%
mlk_poly_getnoise_eta1122_4x 2s 1s +100%
mlk_poly_invntt_tomont 2s 3s -33%
mlk_poly_mulcache_compute_native 2s 2s +0%
mlk_poly_reduce 2s 1s +100%
mlk_poly_sub 2s 1s +100%
mlk_poly_tobytes_native 2s 2s +0%
mlk_poly_tomsg 2s 3s -33%
mlk_polymat_permute_bitrev_to_custom 2s 3s -33%
mlk_polyvec_basemul_acc_montgomery_cached 2s 3s -33%
mlk_polyvec_decompress_du 2s 2s +0%
mlk_polyvec_mulcache_compute 2s 4s -50%
mlk_polyvec_tomont 2s 1s +100%
mlk_rej_uniform 2s 3s -33%
mlk_sha3_256 2s 2s +0%
mlk_sha3_512 2s 1s +100%
mlk_shake128_squeezeblocks 2s 2s +0%
mlk_shake128x4_absorb_once 2s 2s +0%
mlk_value_barrier_u32 2s 1s +100%
poly_invntt_tomont_native 2s 2s +0%
poly_mulcache_compute_native_x86_64 2s 2s +0%
poly_tobytes_native_x86_64 2s 1s +100%
poly_tomont_native_aarch64 2s 3s -33%
poly_tomont_native_x86_64 2s 1s +100%
polyvec_basemul_acc_montgomery_cached_k2_native_x86_64 2s 2s +0%
polyvec_basemul_acc_montgomery_cached_k3_native_aarch64 2s 2s +0%
polyvec_basemul_acc_montgomery_cached_k4_native_x86_64 2s 1s +100%
rej_uniform_native 2s 4s -50%
sys_check_capability 2s 4s -50%
keccak_f1600_x4_native_aarch64_v84a 1s 4s -75%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 1s 1s +0%
kem_keypair_derand 1s 3s -67%
mlk_barrett_reduce 1s 2s -50%
mlk_ct_get_optblocker_u32 1s 3s -67%
mlk_ct_memcmp 1s 3s -67%
mlk_keccakf1600x4_extract_bytes 1s 1s +0%
mlk_montgomery_reduce 1s 1s +0%
mlk_poly_compress_d11_native 1s 2s -50%
mlk_poly_compress_d5 1s 4s -75%
mlk_poly_compress_dv 1s 1s +0%
mlk_poly_decompress_d5_native 1s 1s +0%
mlk_poly_mulcache_compute_c 1s 3s -67%
mlk_poly_ntt_c 1s 2s -50%
mlk_poly_reduce_c 1s 6s -83%
mlk_poly_tobytes 1s 2s -50%
mlk_poly_tobytes_c 1s 2s -50%
mlk_poly_tomont_c 1s 1s +0%
mlk_polyvec_compress_du 1s 1s +0%
mlk_polyvec_permute_bitrev_to_custom 1s 1s +0%
mlk_polyvec_tobytes 1s 1s +0%
mlk_scalar_compress_d10 1s 1s +0%
mlk_scalar_compress_d11 1s 2s -50%
mlk_scalar_decompress_d11 1s 1s +0%
mlk_scalar_decompress_d5 1s 2s -50%
mlk_shake128_absorb_once 1s 2s -50%
poly_compress_d4_native_x86_64 1s 2s -50%
poly_reduce_native_x86_64 1s 3s -67%

@oqs-bot

oqs-bot commented May 15, 2026

CBMC Results (ML-KEM-1024)

⚠️ Attention Required

Proof Status Current Previous Change
**TOTAL** ⚠️ 1562s 1207s +29.4%
Full Results (195 proofs)
Proof Status Current Previous Change
**TOTAL** ⚠️ 1562s 1207s +29.4%
mlk_ntt_2_layers_block 376s - new
mlk_indcpa_enc 135s 141s -4%
mlk_indcpa_keypair_derand 123s 118s +4%
mlk_rej_uniform_c 113s 117s -3%
mlk_polyvec_basemul_acc_montgomery_cached_c 70s 75s -7%
polyvec_basemul_acc_montgomery_cached_native 33s 34s -3%
mlk_poly_rej_uniform 29s 30s -3%
mlk_keccak_squeezeblocks_x4 24s 25s -4%
mlk_ntt_2_layers 23s - new
mlk_ntt_layer 23s 29s -21%
poly_ntt_native 23s 25s -8%
mlk_poly_reduce_native 18s 24s -25%
keccakf1600x4_permute_native_x4 15s 17s -12%
mlk_poly_decompress_d11_native 13s 14s -7%
mlk_poly_decompress_d5_native 12s 14s -14%
mlk_invntt_2_layers_block 11s - new
mlk_polyvec_add 11s 11s +0%
kem_dec 8s 5s +60%
mlk_indcpa_dec 8s 10s -20%
mlk_invntt_layer 8s 9s -11%
mlk_keccak_squeezeblocks 8s 8s +0%
mlk_poly_frombytes_native 8s 7s +14%
mlk_poly_ntt 8s 8s +0%
mlk_keccak_absorb_once_x4 7s 5s +40%
mlk_keccak_squeeze_once 7s 10s -30%
mlk_poly_frommsg 7s 6s +17%
mlk_ntt_butterfly_block 6s 8s -25%
mlk_poly_compress_d10_native 6s 2s +200%
mlk_polymat_permute_bitrev_to_custom 6s 7s -14%
rej_uniform_native_x86_64 6s 5s +20%
mlk_fqmul 5s 16s -69%
mlk_gen_matrix_serial 5s 6s -17%
mlk_keccakf1600_permute_c 5s 7s -29%
mlk_poly_tomont_c 5s 1s +400%
mlk_shake128x4_absorb_once 5s 4s +25%
poly_getnoise_eta1122_4x_native 5s 5s +0%
kem_check_pk 4s 3s +33%
kem_keypair_derand 4s 3s +33%
mlk_gen_matrix 4s 6s -33%
mlk_invntt_2_layers 4s - new
mlk_keccak_absorb_once 4s 4s +0%
mlk_matvec_mul 4s 3s +33%
mlk_montgomery_reduce 4s 2s +100%
mlk_poly_compress_d11 4s 1s +300%
mlk_poly_compress_d11_c 4s 4s +0%
mlk_poly_compress_du 4s 2s +100%
mlk_poly_decompress_d10 4s 1s +300%
mlk_poly_getnoise_eta1_4x 4s 3s +33%
mlk_poly_invntt_tomont_c 4s 3s +33%
mlk_poly_rej_uniform_x4 4s 7s -43%
mlk_polyvec_mulcache_compute 4s 3s +33%
mlk_polyvec_permute_bitrev_to_custom_native 4s 3s +33%
mlk_polyvec_tobytes 4s 2s +100%
ntt_native_aarch64 4s 2s +100%
nttunpack_native_x86_64 4s 3s +33%
poly_decompress_d11_native_x86_64 4s 5s -20%
poly_decompress_d5_native_x86_64 4s 4s +0%
poly_mulcache_compute_native_aarch64 4s 1s +300%
keccak_f1600_x1_native_aarch64_v84a 3s 3s +0%
keccak_f1600_x4_native_aarch64_v84a 3s 2s +50%
keccak_f1600_x4_native_avx2 3s 1s +200%
keccakf1600_permute_native 3s 2s +50%
kem_check_sk 3s 5s -40%
kem_enc 3s 2s +50%
kem_enc_derand 3s 3s +0%
mlk_check_pct 3s 1s +200%
mlk_ct_cmask_nonzero_u16 3s 4s -25%
mlk_ct_cmov_zero 3s 4s -25%
mlk_ct_memcmp 3s 2s +50%
mlk_enc_getnoise_eta1_eta2 3s 4s -25%
mlk_keccakf1600_extract_bytes 3s 2s +50%
mlk_keccakf1600_extract_bytes (big endian) 3s 3s +0%
mlk_keccakf1600_xor_bytes 3s 2s +50%
mlk_keccakf1600_xor_bytes (big endian) 3s 2s +50%
mlk_keccakf1600x4_permute 3s 1s +200%
mlk_poly_add 3s 2s +50%
mlk_poly_cbd_eta1 3s 3s +0%
mlk_poly_cbd_eta2 3s 2s +50%
mlk_poly_compress_d5 3s 3s +0%
mlk_poly_decompress_d11 3s 4s -25%
mlk_poly_decompress_d11_c 3s 4s -25%
mlk_poly_decompress_d5 3s 3s +0%
mlk_poly_decompress_d5_c 3s 2s +50%
mlk_poly_decompress_dv 3s 3s +0%
mlk_poly_reduce 3s 1s +200%
mlk_poly_reduce_c 3s 4s -25%
mlk_poly_tobytes_c 3s 2s +50%
mlk_poly_tomont_native 3s 1s +200%
mlk_polyvec_ntt 3s 3s +0%
mlk_polyvec_permute_bitrev_to_custom 3s 2s +50%
mlk_polyvec_reduce 3s 2s +50%
mlk_scalar_compress_d10 3s 1s +200%
mlk_scalar_compress_d4 3s 1s +200%
mlk_scalar_decompress_d10 3s 2s +50%
mlk_scalar_decompress_d11 3s 1s +200%
mlk_sha3_256 3s 2s +50%
mlk_sha3_512 3s 1s +200%
mlk_shake128_absorb_once 3s 2s +50%
mlk_shake256x4 3s 4s -25%
mlk_value_barrier_u32 3s 1s +200%
poly_compress_d10_native_x86_64 3s 1s +200%
poly_compress_d5_native_x86_64 3s 2s +50%
poly_decompress_d10_native_x86_64 3s 4s -25%
poly_frombytes_native_x86_64 3s 4s -25%
poly_invntt_tomont_native 3s 1s +200%
poly_tobytes_native_x86_64 3s 3s +0%
poly_tomont_native_aarch64 3s 3s +0%
poly_tomont_native_x86_64 3s 2s +50%
polyvec_basemul_acc_montgomery_cached_k2_native_x86_64 3s 4s -25%
polyvec_basemul_acc_montgomery_cached_k3_native_aarch64 3s 4s -25%
polyvec_basemul_acc_montgomery_cached_k3_native_x86_64 3s 2s +50%
rej_uniform_native_aarch64 3s 3s +0%
intt_native_aarch64 2s 1s +100%
intt_native_x86_64 2s 2s +0%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 2s 1s +100%
keccakf1600x4_extract_bytes_native 2s 3s -33%
keccakf1600x4_xor_bytes_native 2s 1s +100%
kem_keypair 2s 4s -50%
mlk_barrett_reduce 2s 3s -33%
mlk_ct_cmask_neg_i16 2s 1s +100%
mlk_ct_get_optblocker_i32 2s 1s +100%
mlk_ct_get_optblocker_u32 2s 3s -33%
mlk_ct_sel_int16 2s 2s +0%
mlk_ct_sel_uint8 2s 1s +100%
mlk_keccakf1600x4_extract_bytes 2s 3s -33%
mlk_keccakf1600x4_extract_bytes_c 2s 4s -50%
mlk_keccakf1600x4_xor_bytes 2s 2s +0%
mlk_keccakf1600x4_xor_bytes_c 2s 4s -50%
mlk_poly_compress_d11_native 2s 1s +100%
mlk_poly_compress_d4 2s 1s +100%
mlk_poly_compress_d4_c 2s 5s -60%
mlk_poly_compress_d4_native 2s 1s +100%
mlk_poly_compress_dv 2s 2s +0%
mlk_poly_decompress_d10_native 2s 3s -33%
mlk_poly_decompress_d4 2s 2s +0%
mlk_poly_decompress_d4_c 2s 4s -50%
mlk_poly_getnoise_eta1122_4x 2s 1s +100%
mlk_poly_getnoise_eta1_4x_native 2s 2s +0%
mlk_poly_getnoise_eta2 2s 3s -33%
mlk_poly_mulcache_compute 2s 4s -50%
mlk_poly_ntt_c 2s 4s -50%
mlk_poly_sub 2s 2s +0%
mlk_poly_tobytes 2s 2s +0%
mlk_poly_tobytes_native 2s 1s +100%
mlk_poly_tomont 2s 4s -50%
mlk_polyvec_basemul_acc_montgomery_cached 2s 1s +100%
mlk_polyvec_compress_du 2s 2s +0%
mlk_polyvec_frombytes 2s 4s -50%
mlk_polyvec_invntt_tomont 2s 2s +0%
mlk_polyvec_tomont 2s 3s -33%
mlk_rej_uniform 2s 1s +100%
mlk_scalar_compress_d1 2s 2s +0%
mlk_scalar_compress_d5 2s 1s +100%
mlk_scalar_decompress_d4 2s 4s -50%
mlk_scalar_decompress_d5 2s 2s +0%
mlk_scalar_signed_to_unsigned_q 2s 4s -50%
mlk_shake128_squeezeblocks 2s 2s +0%
mlk_shake128x4_squeezeblocks 2s 1s +100%
mlk_shake256 2s 2s +0%
mlk_value_barrier_u8 2s 2s +0%
poly_mulcache_compute_native_x86_64 2s 6s -67%
poly_reduce_native_aarch64 2s 2s +0%
poly_reduce_native_x86_64 2s 3s -33%
polyvec_basemul_acc_montgomery_cached_k4_native_aarch64 2s 2s +0%
polyvec_basemul_acc_montgomery_cached_k4_native_x86_64 2s 4s -50%
rej_uniform_native 2s 3s -33%
keccak_f1600_x1_native_aarch64 1s 1s +0%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 1s 2s -50%
mlk_ct_cmask_nonzero_u8 1s 1s +0%
mlk_ct_get_optblocker_u8 1s 2s -50%
mlk_keccakf1600_permute 1s 2s -50%
mlk_keypair_getnoise_eta1 1s 5s -80%
mlk_poly_compress_d10 1s 2s -50%
mlk_poly_compress_d10_c 1s 2s -50%
mlk_poly_compress_d5_c 1s 4s -75%
mlk_poly_compress_d5_native 1s 1s +0%
mlk_poly_decompress_d10_c 1s 3s -67%
mlk_poly_decompress_d4_native 1s 1s +0%
mlk_poly_decompress_du 1s 1s +0%
mlk_poly_frombytes 1s 4s -75%
mlk_poly_frombytes_c 1s 5s -80%
mlk_poly_invntt_tomont 1s 2s -50%
mlk_poly_mulcache_compute_c 1s 4s -75%
mlk_poly_mulcache_compute_native 1s 3s -67%
mlk_poly_tomsg 1s 2s -50%
mlk_polyvec_decompress_du 1s 4s -75%
mlk_scalar_compress_d11 1s 2s -50%
mlk_value_barrier_i32 1s 3s -67%
ntt_native_x86_64 1s 2s -50%
poly_compress_d11_native_x86_64 1s 4s -75%
poly_compress_d4_native_x86_64 1s 3s -67%
poly_decompress_d4_native_x86_64 1s 4s -75%
poly_tobytes_native_aarch64 1s 4s -75%
polyvec_basemul_acc_montgomery_cached_k2_native_aarch64 1s 2s -50%
sys_check_capability 1s 1s +0%

Rewrite mlk_poly_ntt_c / mlk_poly_invntt_tomont_c to process two
layers at a time, with three 2-layer passes plus the leftover layer 7
as a single layer.

Introduces shared mlk_ct_butterfly and mlk_gs_butterfly helpers;
the inverse 2-layer block applies four GS butterflies and then
Barrett-reduces the additive outputs explicitly.

mlk_fqmul now takes a precomputed b_twisted = b * MLKEM_Q^{-1} mod 2^16
and uses a hi-mul / lo-mul-and-correct sequence in place of an inline
mlk_montgomery_reduce, dropping the QINV multiply. The mlk_zetas table
is regenerated as int16_t[128][2] of (zeta_mont, zeta_twisted) pairs.

Signed-off-by: Hanno Becker <beckphan@amazon.co.uk>
hanno-becker added and removed the "benchmark" label (this PR should be benchmarked in CI) on May 16, 2026

@oqs-bot oqs-bot left a comment

Mac Mini (M1, 2020) benchmarks

Details
Benchmark suite Current: ee59089 Previous: 070028c Ratio
ML-KEM-512 keypair 12320 cycles 12319 cycles 1.00
ML-KEM-512 encaps 14999 cycles 14997 cycles 1.00
ML-KEM-512 decaps 19554 cycles 19549 cycles 1.00
ML-KEM-768 keypair 21264 cycles 21264 cycles 1
ML-KEM-768 encaps 23873 cycles 23871 cycles 1.00
ML-KEM-768 decaps 30416 cycles 30423 cycles 1.00
ML-KEM-1024 keypair 30328 cycles 30327 cycles 1.00
ML-KEM-1024 encaps 34574 cycles 34573 cycles 1.00
ML-KEM-1024 decaps 44191 cycles 44191 cycles 1

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Intel Xeon 4th gen (c7i)

Details
Benchmark suite Current: ee59089 Previous: 070028c Ratio
ML-KEM-512 keypair 12042 cycles 12031 cycles 1.00
ML-KEM-512 encaps 13614 cycles 13792 cycles 0.99
ML-KEM-512 decaps 17818 cycles 17802 cycles 1.00
ML-KEM-768 keypair 21294 cycles 21035 cycles 1.01
ML-KEM-768 encaps 22008 cycles 22107 cycles 1.00
ML-KEM-768 decaps 28034 cycles 28330 cycles 0.99
ML-KEM-1024 keypair 29563 cycles 29964 cycles 0.99
ML-KEM-1024 encaps 31689 cycles 31704 cycles 1.00
ML-KEM-1024 decaps 39367 cycles 39312 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Intel Xeon 4th gen (c7i) (no-opt)

Details
Benchmark suite Current: ee59089 Previous: 070028c Ratio
ML-KEM-512 keypair 24431 cycles 28199 cycles 0.87
ML-KEM-512 encaps 29933 cycles 36622 cycles 0.82
ML-KEM-512 decaps 37694 cycles 45214 cycles 0.83
ML-KEM-768 keypair 40053 cycles 46304 cycles 0.87
ML-KEM-768 encaps 50467 cycles 55843 cycles 0.90
ML-KEM-768 decaps 59728 cycles 69876 cycles 0.85
ML-KEM-1024 keypair 64223 cycles 70436 cycles 0.91
ML-KEM-1024 encaps 76545 cycles 82480 cycles 0.93
ML-KEM-1024 decaps 86915 cycles 99348 cycles 0.87

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

AMD EPYC 3rd gen (c6a)

Details
Benchmark suite Current: ee59089 Previous: 070028c Ratio
ML-KEM-512 keypair 14215 cycles 14239 cycles 1.00
ML-KEM-512 encaps 15990 cycles 15964 cycles 1.00
ML-KEM-512 decaps 21534 cycles 21528 cycles 1.00
ML-KEM-768 keypair 25122 cycles 24710 cycles 1.02
ML-KEM-768 encaps 25669 cycles 25470 cycles 1.01
ML-KEM-768 decaps 33537 cycles 33335 cycles 1.01
ML-KEM-1024 keypair 34894 cycles 37146 cycles 0.94
ML-KEM-1024 encaps 36116 cycles 36786 cycles 0.98
ML-KEM-1024 decaps 47225 cycles 46716 cycles 1.01

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

AMD EPYC 4th gen (c7a)

Details
Benchmark suite Current: ee59089 Previous: 070028c Ratio
ML-KEM-512 keypair 12789 cycles 12790 cycles 1.00
ML-KEM-512 encaps 14279 cycles 14273 cycles 1.00
ML-KEM-512 decaps 19139 cycles 19129 cycles 1.00
ML-KEM-768 keypair 22564 cycles 22413 cycles 1.01
ML-KEM-768 encaps 23063 cycles 23072 cycles 1.00
ML-KEM-768 decaps 30067 cycles 30061 cycles 1.00
ML-KEM-1024 keypair 34215 cycles 33027 cycles 1.04
ML-KEM-1024 encaps 33003 cycles 33126 cycles 1.00
ML-KEM-1024 decaps 42408 cycles 42412 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'AMD EPYC 4th gen (c7a)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

Benchmark suite Current: ee59089 Previous: 070028c Ratio
ML-KEM-1024 keypair 34215 cycles 33027 cycles 1.04

This comment was automatically generated by workflow using github-action-benchmark.
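
For reference, the ratio in the alert above can be reproduced with a few lines of C. This is a sketch that assumes the ratio is simply current cycles divided by previous cycles and that an alert fires when it exceeds the threshold; the function names are hypothetical and the benchmark workflow's actual logic may differ.

```c
#include <assert.h>

/* Ratio of current to previous cycle counts (lower is better). */
static double bench_ratio(unsigned long current, unsigned long previous)
{
  assert(previous > 0);
  return (double)current / (double)previous;
}

/* Returns 1 if the ratio is strictly above the alert threshold. */
static int exceeds_threshold(double ratio, double threshold)
{
  return ratio > threshold;
}
```

With the numbers from the table, `bench_ratio(34215, 33027)` is slightly above 1.03, so the ML-KEM-1024 keypair row is flagged, while the ML-KEM-768 keypair row on the same machine (22564 vs. 22413 cycles) is not.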

@oqs-bot oqs-bot left a comment

ppc64le (POWER10) benchmarks

Details
Benchmark suite Current: ee59089 Previous: 070028c Ratio
ML-KEM-512 keypair 52364 cycles 59434 cycles 0.88
ML-KEM-512 encaps 62294 cycles 72134 cycles 0.86
ML-KEM-512 decaps 76687 cycles 92082 cycles 0.83
ML-KEM-768 keypair 89778 cycles 99316 cycles 0.90
ML-KEM-768 encaps 103607 cycles 115930 cycles 0.89
ML-KEM-768 decaps 122375 cycles 141912 cycles 0.86
ML-KEM-1024 keypair 139005 cycles 150195 cycles 0.93
ML-KEM-1024 encaps 155445 cycles 169079 cycles 0.92
ML-KEM-1024 decaps 177695 cycles 200415 cycles 0.89

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Graviton4

Details
Benchmark suite Current: ee59089 Previous: 070028c Ratio
ML-KEM-512 keypair 17690 cycles 17644 cycles 1.00
ML-KEM-512 encaps 20636 cycles 20596 cycles 1.00
ML-KEM-512 decaps 27083 cycles 27048 cycles 1.00
ML-KEM-768 keypair 29976 cycles 29903 cycles 1.00
ML-KEM-768 encaps 32752 cycles 32771 cycles 1.00
ML-KEM-768 decaps 42010 cycles 41962 cycles 1.00
ML-KEM-1024 keypair 43720 cycles 43743 cycles 1.00
ML-KEM-1024 encaps 48775 cycles 48657 cycles 1.00
ML-KEM-1024 decaps 61383 cycles 61383 cycles 1

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Intel Xeon 3rd gen (c6i)

Details
Benchmark suite Current: ee59089 Previous: 070028c Ratio
ML-KEM-512 keypair 17600 cycles 17540 cycles 1.00
ML-KEM-512 encaps 19907 cycles 19937 cycles 1.00
ML-KEM-512 decaps 26420 cycles 26445 cycles 1.00
ML-KEM-768 keypair 31206 cycles 31159 cycles 1.00
ML-KEM-768 encaps 31864 cycles 32046 cycles 0.99
ML-KEM-768 decaps 41472 cycles 41536 cycles 1.00
ML-KEM-1024 keypair 43815 cycles 43957 cycles 1.00
ML-KEM-1024 encaps 45880 cycles 45616 cycles 1.01
ML-KEM-1024 decaps 58050 cycles 58219 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

AMD EPYC 3rd gen (c6a) (no-opt)

Details
| Benchmark suite | Current: ee59089 | Previous: 070028c | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 33786 cycles | 40258 cycles | 0.84 |
| ML-KEM-512 encaps | 41402 cycles | 48385 cycles | 0.86 |
| ML-KEM-512 decaps | 51313 cycles | 62592 cycles | 0.82 |
| ML-KEM-768 keypair | 54176 cycles | 63729 cycles | 0.85 |
| ML-KEM-768 encaps | 65367 cycles | 74928 cycles | 0.87 |
| ML-KEM-768 decaps | 78299 cycles | 93722 cycles | 0.84 |
| ML-KEM-1024 keypair | 84344 cycles | 95285 cycles | 0.89 |
| ML-KEM-1024 encaps | 98201 cycles | 109505 cycles | 0.90 |
| ML-KEM-1024 decaps | 114088 cycles | 132331 cycles | 0.86 |

@oqs-bot oqs-bot left a comment

AMD EPYC 4th gen (c7a) (no-opt)

Details
| Benchmark suite | Current: ee59089 | Previous: 070028c | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 30684 cycles | 36614 cycles | 0.84 |
| ML-KEM-512 encaps | 36951 cycles | 43076 cycles | 0.86 |
| ML-KEM-512 decaps | 45696 cycles | 55713 cycles | 0.82 |
| ML-KEM-768 keypair | 49371 cycles | 58664 cycles | 0.84 |
| ML-KEM-768 encaps | 58602 cycles | 67519 cycles | 0.87 |
| ML-KEM-768 decaps | 69939 cycles | 84462 cycles | 0.83 |
| ML-KEM-1024 keypair | 76257 cycles | 88980 cycles | 0.86 |
| ML-KEM-1024 encaps | 87602 cycles | 99212 cycles | 0.88 |
| ML-KEM-1024 decaps | 101779 cycles | 120642 cycles | 0.84 |

@oqs-bot oqs-bot left a comment

Graviton4 (no-opt)

Details
| Benchmark suite | Current: ee59089 | Previous: 070028c | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 34213 cycles | 35412 cycles | 0.97 |
| ML-KEM-512 encaps | 38932 cycles | 40111 cycles | 0.97 |
| ML-KEM-512 decaps | 48961 cycles | 51138 cycles | 0.96 |
| ML-KEM-768 keypair | 54993 cycles | 56668 cycles | 0.97 |
| ML-KEM-768 encaps | 63100 cycles | 65152 cycles | 0.97 |
| ML-KEM-768 decaps | 75917 cycles | 79299 cycles | 0.96 |
| ML-KEM-1024 keypair | 85413 cycles | 87866 cycles | 0.97 |
| ML-KEM-1024 encaps | 94656 cycles | 96875 cycles | 0.98 |
| ML-KEM-1024 decaps | 111743 cycles | 115827 cycles | 0.96 |

@oqs-bot oqs-bot left a comment

Graviton3

Details
| Benchmark suite | Current: ee59089 | Previous: 070028c | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 18675 cycles | 18637 cycles | 1.00 |
| ML-KEM-512 encaps | 21886 cycles | 21874 cycles | 1.00 |
| ML-KEM-512 decaps | 28890 cycles | 28863 cycles | 1.00 |
| ML-KEM-768 keypair | 31630 cycles | 31540 cycles | 1.00 |
| ML-KEM-768 encaps | 34788 cycles | 34773 cycles | 1.00 |
| ML-KEM-768 decaps | 44835 cycles | 44778 cycles | 1.00 |
| ML-KEM-1024 keypair | 46068 cycles | 46080 cycles | 1.00 |
| ML-KEM-1024 encaps | 51494 cycles | 51490 cycles | 1.00 |
| ML-KEM-1024 decaps | 65004 cycles | 65028 cycles | 1.00 |

@oqs-bot oqs-bot left a comment

Intel Xeon 3rd gen (c6i) (no-opt)

Details
| Benchmark suite | Current: ee59089 | Previous: 070028c | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 39358 cycles | 45700 cycles | 0.86 |
| ML-KEM-512 encaps | 46964 cycles | 54451 cycles | 0.86 |
| ML-KEM-512 decaps | 57447 cycles | 69774 cycles | 0.82 |
| ML-KEM-768 keypair | 62990 cycles | 74220 cycles | 0.85 |
| ML-KEM-768 encaps | 74954 cycles | 86044 cycles | 0.87 |
| ML-KEM-768 decaps | 88809 cycles | 106669 cycles | 0.83 |
| ML-KEM-1024 keypair | 100417 cycles | 112098 cycles | 0.90 |
| ML-KEM-1024 encaps | 110546 cycles | 124743 cycles | 0.89 |
| ML-KEM-1024 decaps | 128093 cycles | 150712 cycles | 0.85 |

@oqs-bot oqs-bot left a comment

Arm Cortex-A55 (Snapdragon 888) benchmarks

Details
| Benchmark suite | Current: ee59089 | Previous: 070028c | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 59837 cycles | 59732 cycles | 1.00 |
| ML-KEM-512 encaps | 67447 cycles | 67418 cycles | 1.00 |
| ML-KEM-512 decaps | 86186 cycles | 86116 cycles | 1.00 |
| ML-KEM-768 keypair | 97444 cycles | 97471 cycles | 1.00 |
| ML-KEM-768 encaps | 110872 cycles | 111029 cycles | 1.00 |
| ML-KEM-768 decaps | 137941 cycles | 137995 cycles | 1.00 |
| ML-KEM-1024 keypair | 154689 cycles | 154794 cycles | 1.00 |
| ML-KEM-1024 encaps | 171850 cycles | 171103 cycles | 1.00 |
| ML-KEM-1024 decaps | 209734 cycles | 208406 cycles | 1.01 |

@oqs-bot oqs-bot left a comment

Arm Cortex-A76 (Raspberry Pi 5) benchmarks

Details
| Benchmark suite | Current: ee59089 | Previous: 070028c | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 28264 cycles | 28220 cycles | 1.00 |
| ML-KEM-512 encaps | 34156 cycles | 34107 cycles | 1.00 |
| ML-KEM-512 decaps | 44374 cycles | 44335 cycles | 1.00 |
| ML-KEM-768 keypair | 47618 cycles | 47615 cycles | 1.00 |
| ML-KEM-768 encaps | 53933 cycles | 53937 cycles | 1.00 |
| ML-KEM-768 decaps | 68339 cycles | 68365 cycles | 1.00 |
| ML-KEM-1024 keypair | 70245 cycles | 70246 cycles | 1.00 |
| ML-KEM-1024 encaps | 78734 cycles | 78726 cycles | 1.00 |
| ML-KEM-1024 decaps | 98418 cycles | 98445 cycles | 1.00 |

@oqs-bot oqs-bot left a comment

Graviton3 (no-opt)

Details
| Benchmark suite | Current: ee59089 | Previous: 070028c | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 37514 cycles | 38880 cycles | 0.96 |
| ML-KEM-512 encaps | 42772 cycles | 44586 cycles | 0.96 |
| ML-KEM-512 decaps | 53687 cycles | 56659 cycles | 0.95 |
| ML-KEM-768 keypair | 60007 cycles | 62298 cycles | 0.96 |
| ML-KEM-768 encaps | 68948 cycles | 72317 cycles | 0.95 |
| ML-KEM-768 decaps | 82493 cycles | 87701 cycles | 0.94 |
| ML-KEM-1024 keypair | 93052 cycles | 96154 cycles | 0.97 |
| ML-KEM-1024 encaps | 103167 cycles | 106126 cycles | 0.97 |
| ML-KEM-1024 decaps | 121291 cycles | 126570 cycles | 0.96 |

@oqs-bot oqs-bot left a comment

Graviton2

Details
| Benchmark suite | Current: ee59089 | Previous: 070028c | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 28238 cycles | 28269 cycles | 1.00 |
| ML-KEM-512 encaps | 34162 cycles | 34122 cycles | 1.00 |
| ML-KEM-512 decaps | 44342 cycles | 44378 cycles | 1.00 |
| ML-KEM-768 keypair | 47642 cycles | 47674 cycles | 1.00 |
| ML-KEM-768 encaps | 53923 cycles | 53908 cycles | 1.00 |
| ML-KEM-768 decaps | 68400 cycles | 68363 cycles | 1.00 |
| ML-KEM-1024 keypair | 70382 cycles | 70273 cycles | 1.00 |
| ML-KEM-1024 encaps | 78782 cycles | 78768 cycles | 1.00 |
| ML-KEM-1024 decaps | 98584 cycles | 98473 cycles | 1.00 |

@oqs-bot oqs-bot left a comment

Graviton2 (no-opt)

Details
| Benchmark suite | Current: ee59089 | Previous: 070028c | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 55880 cycles | 59124 cycles | 0.95 |
| ML-KEM-512 encaps | 65578 cycles | 68626 cycles | 0.96 |
| ML-KEM-512 decaps | 82286 cycles | 87341 cycles | 0.94 |
| ML-KEM-768 keypair | 90360 cycles | 95326 cycles | 0.95 |
| ML-KEM-768 encaps | 104886 cycles | 109860 cycles | 0.95 |
| ML-KEM-768 decaps | 126772 cycles | 134332 cycles | 0.94 |
| ML-KEM-1024 keypair | 140425 cycles | 147915 cycles | 0.95 |
| ML-KEM-1024 encaps | 157358 cycles | 163791 cycles | 0.96 |
| ML-KEM-1024 decaps | 185033 cycles | 195404 cycles | 0.95 |

@oqs-bot oqs-bot left a comment

Arm Cortex-A72 (Raspberry Pi 4) benchmarks

Details
| Benchmark suite | Current: ee59089 | Previous: 070028c | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 51033 cycles | 50865 cycles | 1.00 |
| ML-KEM-512 encaps | 58913 cycles | 58841 cycles | 1.00 |
| ML-KEM-512 decaps | 74849 cycles | 74794 cycles | 1.00 |
| ML-KEM-768 keypair | 86918 cycles | 86024 cycles | 1.01 |
| ML-KEM-768 encaps | 95132 cycles | 94487 cycles | 1.01 |
| ML-KEM-768 decaps | 118067 cycles | 119530 cycles | 0.99 |
| ML-KEM-1024 keypair | 130121 cycles | 130071 cycles | 1.00 |
| ML-KEM-1024 encaps | 142355 cycles | 142892 cycles | 1.00 |
| ML-KEM-1024 decaps | 174938 cycles | 173373 cycles | 1.01 |

@oqs-bot oqs-bot left a comment

SpacemiT K1 8 (Banana Pi F3) benchmarks

Details
| Benchmark suite | Current: ee59089 | Previous: 070028c | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 155485 cycles | 155504 cycles | 1.00 |
| ML-KEM-512 encaps | 163394 cycles | 163399 cycles | 1.00 |
| ML-KEM-512 decaps | 206591 cycles | 206667 cycles | 1.00 |
| ML-KEM-768 keypair | 249903 cycles | 249893 cycles | 1.00 |
| ML-KEM-768 encaps | 270434 cycles | 270406 cycles | 1.00 |
| ML-KEM-768 decaps | 332188 cycles | 332823 cycles | 1.00 |
| ML-KEM-1024 keypair | 395688 cycles | 395922 cycles | 1.00 |
| ML-KEM-1024 encaps | 422596 cycles | 423034 cycles | 1.00 |
| ML-KEM-1024 decaps | 506524 cycles | 506558 cycles | 1.00 |
