x86_64 + HOL-Light: Replace poly_use_hint AVX2 intrinsics with hand-written assembly and HOL-Light proofs#1189

Open: jakemas wants to merge 1 commit into main from use-hint-x86-proofs-pr

@jakemas commented Jun 18, 2026
Contributor

Resolves #484
Resolves #485

This is the only remaining intrinsic to be picked up and proved. I know @willieyz has a PR in flight; I've taken it, brought it up to main, fixed conflicts, adjusted the asm, fixed #1074/#1076, and added HOL-Light and CBMC proofs and supporting infra.

Replaces the poly_use_hint_32 (ML-DSA-65/87) and poly_use_hint_88 (ML-DSA-44) AVX2 intrinsics with hand-written assembly, and adds HOL-Light functional-correctness and constant-time / memory-safety proofs for both x86_64 routines.
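For readers unfamiliar with the routine, here is a minimal scalar sketch of what a per-coefficient use_hint computes for the two parameter sets, written in the style of the public Dilithium reference code. It is not the assembly from this PR and not the project's C API: the names `sketch_decompose`, `sketch_use_hint`, `GAMMA2_32` and `GAMMA2_88` are hypothetical, the input is assumed to lie in [0, Q), and the code uses branches for readability and makes no constant-time claim.

```c
#include <stdint.h>

#define Q 8380417                  /* ML-DSA modulus */
#define GAMMA2_32 ((Q - 1) / 32)   /* ML-DSA-65/87 (hypothetical macro name) */
#define GAMMA2_88 ((Q - 1) / 88)   /* ML-DSA-44 (hypothetical macro name) */

/* Hypothetical helper: split a in [0, Q) as a = a1*2*gamma2 + a0 (mod Q),
 * with a0 centered around zero. Relies on arithmetic right shift of
 * negative values, as the reference code does. */
static int32_t sketch_decompose(int32_t *a0, int32_t a, int32_t gamma2)
{
  int32_t a1 = (a + 127) >> 7;
  if (gamma2 == GAMMA2_32) {
    a1 = (a1 * 1025 + (1 << 21)) >> 22;
    a1 &= 15;                        /* wrap high part 16 -> 0 */
  } else {                           /* assumed to be GAMMA2_88 */
    a1 = (a1 * 11275 + (1 << 23)) >> 24;
    a1 ^= ((43 - a1) >> 31) & a1;    /* wrap high part 44 -> 0 */
  }
  *a0 = a - a1 * 2 * gamma2;
  *a0 -= (((Q - 1) / 2 - *a0) >> 31) & Q;
  return a1;
}

/* Hypothetical helper: correct the high part a1 using a one-bit hint. */
static int32_t sketch_use_hint(int32_t a, unsigned int hint, int32_t gamma2)
{
  int32_t a0;
  int32_t a1 = sketch_decompose(&a0, a, gamma2);

  if (hint == 0) {
    return a1;
  }
  if (gamma2 == GAMMA2_32) {
    /* high parts live in 0..15 */
    return (a0 > 0) ? ((a1 + 1) & 15) : ((a1 - 1) & 15);
  }
  /* high parts live in 0..43 */
  if (a0 > 0) {
    return (a1 == 43) ? 0 : a1 + 1;
  }
  return (a1 == 0) ? 43 : a1 - 1;
}
```

A vectorized routine would apply the same per-coefficient logic to several 32-bit lanes at once; the sketch only illustrates the arithmetic that the proofs need to capture.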

Builds on @willieyz's original assembly (#940). Changes from that code:

Performance

The performance degradation we previously saw in willieyz's PR is fixed.
poly_use_hint component benchmark, median cycles on Intel Xeon Platinum 8175M, OPT=1 CYCLES=PMU:

| Variant | AVX2 intrinsics (baseline, main) | @willieyz asm (#940) | Hand-written AVX2 asm (this PR) |
|---|---|---|---|
| use_hint_32 (ML-DSA-65/87) | ~314 | ~300 | ~303 |
| use_hint_88 (ML-DSA-44) | ~377 | ~351 | ~356 |

@jakemas force-pushed the use-hint-x86-proofs-pr branch 4 times, most recently from 9981d8b to 5dfcb0c on June 18, 2026 23:44
oqs-bot commented Jun 18, 2026
Contributor

CBMC Results (ML-DSA-65, REDUCE-RAM)

Full Results (206 proofs)
Proof Status Current Previous Change
**TOTAL** 1661s 1495s +11.1%
mld_invntt_layer 180s 163s +10%
poly_pointwise_montgomery_c 149s 133s +12%
rej_uniform_native 130s 119s +9%
polyvec_matrix_pointwise_montgomery_yvec 88s 81s +9%
mld_ct_memcmp 73s 65s +12%
mld_ntt_layer 48s 41s +17%
fqmul 44s 39s +13%
polyveck_chknorm 41s 36s +14%
mld_attempt_signature_generation 24s 21s +14%
keccakf1600x4_permute_native 23s 22s +5%
mld_ntt_butterfly_block 23s 21s +10%
poly_chknorm_c 21s 17s +24%
polyt0_unpack 17s 17s +0%
rej_uniform_c 17s 16s +6%
sign_verify_internal 17s 16s +6%
polyveck_decompose 16s 16s +0%
polyvecl_chknorm 16s 14s +14%
mld_check_pct 15s 14s +7%
poly_add 12s 11s +9%
poly_uniform_eta_4x 12s 13s -8%
keccak_absorb_once_x4 10s 8s +25%
polyvecl_ntt 10s 6s +67%
rej_uniform 10s 7s +43%
poly_invntt_tomont_c 9s 10s -10%
polyvec_matrix_pointwise_montgomery_row 9s 11s -18%
poly_power2round 8s 7s +14%
polyveck_caddq 8s 7s +14%
sign 8s 6s +33%
compute_pack_t0_t1 7s 7s +0%
mld_keccakf1600_permute_c 7s 6s +17%
mld_sample_s1_s2_serial 7s 3s +133%
pointwise_acc_native_aarch64 7s 6s +17%
polyveck_invntt_tomont 7s 5s +40%
sign_verify_extmu 7s 4s +75%
keccak_absorb 6s 5s +20%
keccakf1600_permute_native 6s 5s +20%
mld_compute_pack_z 6s 6s +0%
mld_h 6s 3s +100%
poly_caddq_c 6s 3s +100%
poly_invntt_tomont_native 6s 2s +200%
poly_shiftl 6s 4s +50%
polyveck_unpack_eta 6s 2s +200%
power2round 6s 2s +200%
rej_eta_native 6s 4s +50%
sign_pk_from_sk 6s 7s -14%
sign_verify 6s 2s +200%
decompose 5s 2s +150%
intt_native_x86_64 5s 5s +0%
keccak_squeeze 5s 4s +25%
keccak_squeezeblocks_x4 5s 5s +0%
keccakf1600x4_permute 5s 3s +67%
mld_polymat_expand_entry 5s 3s +67%
pointwise_acc_native_x86_64 5s 6s -17%
poly_chknorm_native_x86_64 5s 2s +150%
poly_permute_bitrev_to_custom_optional 5s 4s +25%
poly_pointwise_montgomery 5s 2s +150%
poly_reduce 5s 3s +67%
poly_uniform 5s 3s +67%
poly_uniform_gamma1_4x 5s 2s +150%
polyeta_unpack 5s 7s -29%
polyveck_pack_eta 5s 2s +150%
polyvecl_uniform_gamma1_serial 5s 3s +67%
polyz_unpack_c 5s 6s -17%
rej_eta_c 5s 2s +150%
rej_uniform_eta_native_aarch64 5s 3s +67%
shake256x4_absorb_once 5s 1s +400%
sign_keypair 5s 6s -17%
sign_open 5s 5s +0%
sign_signature_extmu 5s 3s +67%
sign_signature_pre_hash_shake256 5s 6s -17%
sign_verify_pre_hash_internal 5s 2s +150%
intt_native_aarch64 4s 4s +0%
keccakf1600_extract_bytes (big endian) 4s 1s +300%
keccakf1600x4_xor_bytes_native 4s 1s +300%
mld_ct_abs_i32 4s 3s +33%
mld_ct_cmask_neg_i32 4s 4s +0%
mld_keccakf1600x4_xor_bytes_c 4s 1s +300%
mld_sample_s1_s2 4s 4s +0%
ntt_native_aarch64 4s 3s +33%
ntt_native_x86_64 4s 4s +0%
pack_sig_c 4s 3s +33%
pack_sk_rho_key_tr_s2 4s 2s +100%
pointwise_native_aarch64 4s 2s +100%
poly_caddq_native 4s 3s +33%
poly_caddq_native_aarch64 4s 4s +0%
poly_challenge 4s 5s -20%
poly_decompose 4s 2s +100%
poly_decompose_88_native_aarch64 4s 2s +100%
poly_decompose_native 4s 3s +33%
poly_use_hint_native_x86_64 4s - new
polyeta_pack 4s 2s +100%
polyt0_pack 4s 6s -33%
polyt1_unpack 4s 3s +33%
polyveck_ntt 4s 4s +0%
polyveck_reduce 4s 4s +0%
polyvecl_pointwise_acc_montgomery_native 4s 3s +33%
polyvecl_unpack_eta 4s 3s +33%
polyvecl_unpack_z 4s 3s +33%
shake256_finalize 4s 2s +100%
sign_keypair_internal 4s 4s +0%
sign_signature 4s 3s +33%
sign_signature_internal 4s 4s +0%
sign_signature_pre_hash_internal 4s 4s +0%
unpack_pk_t1 4s 1s +300%
unpack_sk_s2hat 4s 2s +100%
caddq 3s 3s +0%
fqscale 3s 4s -25%
keccak_f1600_x1_native_aarch64 3s 2s +50%
keccakf1600_xor_bytes 3s 2s +50%
keccakf1600x4_extract_bytes 3s 2s +50%
keccakf1600x4_extract_bytes_native 3s 3s +0%
mld_ct_get_optblocker_i64 3s 2s +50%
mld_ct_get_optblocker_u8 3s 3s +0%
mld_keccakf1600_extract_bytes 3s 3s +0%
mld_prepare_domain_separation_prefix 3s 3s +0%
mld_value_barrier_i64 3s 3s +0%
mld_value_barrier_u32 3s 2s +50%
mld_value_barrier_u8 3s 2s +50%
montgomery_reduce 3s 5s -40%
nttunpack_native_x86_64 3s 3s +0%
pack_sig_h 3s 4s -25%
pack_sig_z 3s 3s +0%
pack_sk_s1 3s 4s -25%
pointwise_native_x86_64 3s 3s +0%
poly_caddq 3s 3s +0%
poly_caddq_native_x86_64 3s 2s +50%
poly_chknorm_native 3s 4s -25%
poly_decompose_32_native_aarch64 3s 1s +200%
poly_decompose_c 3s 5s -40%
poly_ntt 3s 3s +0%
poly_ntt_c 3s 2s +50%
poly_sub 3s 3s +0%
poly_uniform_4x 3s 3s +0%
poly_uniform_eta 3s 2s +50%
poly_uniform_gamma1 3s 3s +0%
poly_use_hint_c 3s 3s +0%
poly_use_hint_native 3s 4s -25%
poly_use_hint_native_aarch64 3s 3s +0%
polyt1_pack 3s 4s -25%
polyvec_matrix_expand 3s 1s +200%
polyvec_matrix_expand_serial 3s 3s +0%
polyvecl_pointwise_acc_montgomery_c 3s 3s +0%
polyvecl_uniform_gamma1 3s 2s +50%
polyw1_pack_88 3s 3s +0%
polyz_unpack 3s 3s +0%
polyz_unpack_17_native_aarch64 3s 4s -25%
polyz_unpack_native 3s 2s +50%
polyz_unpack_native_x86_64 3s 3s +0%
reduce32 3s 3s +0%
rej_uniform_native_aarch64 3s 3s +0%
shake128_absorb 3s 3s +0%
shake128_init 3s 2s +50%
shake128_squeeze 3s 2s +50%
shake256_absorb 3s 3s +0%
shake256_init 3s 2s +50%
shake256_squeeze 3s 2s +50%
sk_s1hat_get_poly 3s 3s +0%
sk_s2hat_get_poly 3s 1s +200%
unpack_sk 3s 3s +0%
unpack_sk_s1hat 3s 4s -25%
yvec_get_poly 3s 3s +0%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 2s 2s +0%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 2s 3s -33%
keccak_f1600_x4_native_avx2 2s 2s +0%
keccak_finalize 2s 4s -50%
keccak_init 2s 3s -33%
keccakf1600_xor_bytes (big endian) 2s 3s -33%
keccakf1600x4_xor_bytes 2s 3s -33%
make_hint 2s 4s -50%
mld_ct_cmask_nonzero_u32 2s 4s -50%
mld_ct_cmask_nonzero_u8 2s 4s -50%
mld_ct_get_optblocker_u32 2s 4s -50%
mld_keccakf1600x4_extract_bytes_c 2s 4s -50%
poly_chknorm 2s 5s -60%
poly_chknorm_native_aarch64 2s 4s -50%
poly_invntt_tomont 2s 4s -50%
poly_ntt_native 2s 4s -50%
poly_permute_bitrev_to_custom_optional_native 2s 5s -60%
poly_use_hint 2s 4s -50%
polyvecl_pack_eta 2s 3s -33%
polyvecl_pointwise_acc_montgomery 2s 2s +0%
polyw1_pack 2s 4s -50%
polyw1_pack_32 2s 2s +0%
polyz_pack 2s 3s -33%
polyz_unpack_19_native_aarch64 2s 4s -50%
rej_eta 2s 2s +0%
shake128_finalize 2s 4s -50%
shake128x4_absorb_once 2s 1s +100%
shake128x4_squeezeblocks 2s 2s +0%
shake256 2s 2s +0%
shake256_release 2s 2s +0%
sig_unpack_hints 2s 3s -33%
sign_verify_pre_hash_shake256 2s 5s -60%
sk_t0hat_get_poly 2s 2s +0%
unpack_sk_t0hat 2s 1s +100%
yvec_init 2s 5s -60%
keccak_f1600_x1_native_aarch64_v84a 1s 2s -50%
keccak_f1600_x4_native_aarch64_v84a 1s 1s +0%
keccakf1600_permute 1s 2s -50%
mld_ct_sel_int32 1s 4s -75%
poly_pointwise_montgomery_native 1s 2s -50%
polyveck_pack_w1 1s 2s -50%
shake128_release 1s 1s +0%
shake256x4_squeezeblocks 1s 2s -50%
sys_check_capability 1s 2s -50%
use_hint 1s 3s -67%

oqs-bot commented Jun 18, 2026
Contributor

CBMC Results (ML-DSA-44, REDUCE-RAM)

Full Results (206 proofs)
Proof Status Current Previous Change
**TOTAL** 1474s 1486s -0.8%
mld_invntt_layer 156s 160s -3%
poly_pointwise_montgomery_c 121s 131s -8%
rej_uniform_native 119s 115s +3%
polyvec_matrix_pointwise_montgomery_yvec 109s 111s -2%
mld_ct_memcmp 63s 63s +0%
mld_ntt_layer 44s 40s +10%
fqmul 40s 42s -5%
keccakf1600x4_permute_native 24s 22s +9%
mld_attempt_signature_generation 24s 24s +0%
sign_verify_internal 22s 23s -4%
mld_ntt_butterfly_block 21s 19s +11%
poly_chknorm_c 18s 17s +6%
polyeta_unpack 17s 14s +21%
mld_check_pct 15s 17s -12%
polyt0_unpack 14s 18s -22%
rej_uniform_c 14s 16s -12%
poly_uniform_eta_4x 13s 12s +8%
polyz_unpack_c 11s 11s +0%
compute_pack_t0_t1 9s 8s +12%
poly_add 9s 11s -18%
polyvec_matrix_pointwise_montgomery_row 9s 9s +0%
keccak_absorb_once_x4 8s 9s -11%
sign_keypair 8s 3s +167%
poly_decompose_c 7s 6s +17%
poly_invntt_tomont_c 7s 11s -36%
polyveck_chknorm 7s 9s -22%
rej_uniform 7s 8s -12%
sign 7s 7s +0%
mld_compute_pack_z 6s 6s +0%
poly_uniform_eta 6s 3s +100%
polyvecl_chknorm 6s 5s +20%
sign_pk_from_sk 6s 6s +0%
sign_signature_extmu 6s 5s +20%
sign_signature_pre_hash_internal 6s 5s +20%
caddq 5s 4s +25%
keccak_absorb 5s 6s -17%
mld_keccakf1600_permute_c 5s 5s +0%
ntt_native_x86_64 5s 4s +25%
nttunpack_native_x86_64 5s 3s +67%
pointwise_acc_native_x86_64 5s 6s -17%
pointwise_native_aarch64 5s 2s +150%
poly_caddq_native 5s 3s +67%
poly_challenge 5s 5s +0%
poly_chknorm_native 5s 3s +67%
poly_pointwise_montgomery 5s 3s +67%
poly_power2round 5s 8s -38%
poly_reduce 5s 5s +0%
poly_shiftl 5s 4s +25%
poly_sub 5s 3s +67%
poly_use_hint 5s 4s +25%
polyveck_decompose 5s 5s +0%
polyvecl_ntt 5s 3s +67%
sign_signature_internal 5s 3s +67%
sign_verify 5s 6s -17%
sign_verify_extmu 5s 5s +0%
intt_native_aarch64 4s 2s +100%
keccak_squeezeblocks_x4 4s 3s +33%
keccakf1600_permute 4s 3s +33%
keccakf1600_xor_bytes (big endian) 4s 3s +33%
keccakf1600x4_extract_bytes_native 4s 3s +33%
mld_ct_cmask_nonzero_u32 4s 4s +0%
mld_h 4s 3s +33%
mld_keccakf1600_extract_bytes 4s 4s +0%
mld_polymat_expand_entry 4s 5s -20%
mld_sample_s1_s2 4s 5s -20%
mld_value_barrier_i64 4s 1s +300%
pack_sk_rho_key_tr_s2 4s 2s +100%
pointwise_acc_native_aarch64 4s 6s -33%
poly_caddq_c 4s 4s +0%
poly_caddq_native_aarch64 4s 2s +100%
poly_decompose_88_native_aarch64 4s 2s +100%
poly_invntt_tomont_native 4s 3s +33%
poly_pointwise_montgomery_native 4s 3s +33%
poly_uniform 4s 3s +33%
poly_uniform_4x 4s 3s +33%
poly_use_hint_c 4s 3s +33%
poly_use_hint_native_aarch64 4s 3s +33%
polyt1_unpack 4s 3s +33%
polyvec_matrix_expand_serial 4s 2s +100%
polyveck_invntt_tomont 4s 7s -43%
polyveck_reduce 4s 4s +0%
polyw1_pack_88 4s 2s +100%
polyz_unpack 4s 3s +33%
rej_uniform_eta_native_aarch64 4s 5s -20%
shake128_absorb 4s 1s +300%
shake128_release 4s 2s +100%
shake256 4s 2s +100%
sign_keypair_internal 4s 4s +0%
sign_open 4s 5s -20%
sign_signature 4s 4s +0%
sign_signature_pre_hash_shake256 4s 2s +100%
yvec_get_poly 4s 1s +300%
decompose 3s 3s +0%
fqscale 3s 2s +50%
intt_native_x86_64 3s 3s +0%
keccak_finalize 3s 1s +200%
keccak_init 3s 2s +50%
keccak_squeeze 3s 3s +0%
keccakf1600_extract_bytes (big endian) 3s 1s +200%
keccakf1600_xor_bytes 3s 2s +50%
keccakf1600x4_xor_bytes 3s 2s +50%
mld_ct_cmask_neg_i32 3s 3s +0%
mld_ct_get_optblocker_i64 3s 2s +50%
mld_prepare_domain_separation_prefix 3s 5s -40%
montgomery_reduce 3s 2s +50%
ntt_native_aarch64 3s 2s +50%
pack_sig_c 3s 2s +50%
pointwise_native_x86_64 3s 2s +50%
poly_caddq 3s 2s +50%
poly_caddq_native_x86_64 3s 2s +50%
poly_chknorm_native_x86_64 3s 2s +50%
poly_decompose_native 3s 4s -25%
poly_invntt_tomont 3s 2s +50%
poly_ntt 3s 3s +0%
poly_permute_bitrev_to_custom_optional_native 3s 2s +50%
poly_uniform_gamma1 3s 3s +0%
poly_uniform_gamma1_4x 3s 3s +0%
poly_use_hint_native_x86_64 3s - new
polyt0_pack 3s 5s -40%
polyveck_caddq 3s 4s -25%
polyveck_ntt 3s 2s +50%
polyveck_pack_w1 3s 3s +0%
polyvecl_pointwise_acc_montgomery_native 3s 3s +0%
polyvecl_uniform_gamma1 3s 2s +50%
polyvecl_unpack_eta 3s 4s -25%
polyvecl_unpack_z 3s 4s -25%
polyw1_pack_32 3s 4s -25%
polyz_unpack_19_native_aarch64 3s 3s +0%
polyz_unpack_native 3s 3s +0%
reduce32 3s 3s +0%
rej_eta 3s 2s +50%
rej_eta_native 3s 4s -25%
rej_uniform_native_aarch64 3s 3s +0%
shake128_finalize 3s 1s +200%
shake128_squeeze 3s 1s +200%
shake128x4_squeezeblocks 3s 2s +50%
shake256_init 3s 2s +50%
shake256_squeeze 3s 2s +50%
shake256x4_squeezeblocks 3s 2s +50%
sign_verify_pre_hash_internal 3s 5s -40%
sign_verify_pre_hash_shake256 3s 4s -25%
sys_check_capability 3s 4s -25%
unpack_pk_t1 3s 3s +0%
unpack_sk_s1hat 3s 3s +0%
unpack_sk_s2hat 3s 4s -25%
keccak_f1600_x1_native_aarch64 2s 3s -33%
keccak_f1600_x1_native_aarch64_v84a 2s 2s +0%
keccak_f1600_x4_native_avx2 2s 3s -33%
keccakf1600_permute_native 2s 2s +0%
keccakf1600x4_permute 2s 1s +100%
mld_ct_get_optblocker_u32 2s 2s +0%
mld_ct_get_optblocker_u8 2s 2s +0%
mld_keccakf1600x4_extract_bytes_c 2s 1s +100%
mld_keccakf1600x4_xor_bytes_c 2s 3s -33%
mld_sample_s1_s2_serial 2s 4s -50%
mld_value_barrier_u32 2s 2s +0%
pack_sig_h 2s 3s -33%
pack_sig_z 2s 3s -33%
pack_sk_s1 2s 4s -50%
poly_chknorm 2s 5s -60%
poly_decompose_32_native_aarch64 2s 2s +0%
poly_ntt_c 2s 2s +0%
poly_ntt_native 2s 5s -60%
poly_permute_bitrev_to_custom_optional 2s 2s +0%
poly_use_hint_native 2s 6s -67%
polyeta_pack 2s 2s +0%
polyt1_pack 2s 4s -50%
polyveck_unpack_eta 2s 2s +0%
polyvecl_pack_eta 2s 3s -33%
polyvecl_pointwise_acc_montgomery 2s 4s -50%
polyvecl_pointwise_acc_montgomery_c 2s 3s -33%
polyvecl_uniform_gamma1_serial 2s 5s -60%
polyw1_pack 2s 2s +0%
polyz_unpack_17_native_aarch64 2s 7s -71%
polyz_unpack_native_x86_64 2s 3s -33%
power2round 2s 1s +100%
rej_eta_c 2s 3s -33%
shake128_init 2s 3s -33%
shake256_absorb 2s 2s +0%
shake256_finalize 2s 3s -33%
sk_s1hat_get_poly 2s 4s -50%
sk_s2hat_get_poly 2s 2s +0%
sk_t0hat_get_poly 2s 3s -33%
unpack_sk 2s 1s +100%
unpack_sk_t0hat 2s 1s +100%
use_hint 2s 3s -33%
yvec_init 2s 4s -50%
keccak_f1600_x4_native_aarch64_v84a 1s 4s -75%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 1s 3s -67%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 1s 2s -50%
keccakf1600x4_extract_bytes 1s 3s -67%
keccakf1600x4_xor_bytes_native 1s 3s -67%
make_hint 1s 2s -50%
mld_ct_abs_i32 1s 2s -50%
mld_ct_cmask_nonzero_u8 1s 3s -67%
mld_ct_sel_int32 1s 3s -67%
mld_value_barrier_u8 1s 1s +0%
poly_chknorm_native_aarch64 1s 3s -67%
poly_decompose 1s 4s -75%
polyvec_matrix_expand 1s 2s -50%
polyveck_pack_eta 1s 2s -50%
polyz_pack 1s 2s -50%
shake128x4_absorb_once 1s 3s -67%
shake256_release 1s 3s -67%
shake256x4_absorb_once 1s 1s +0%
sig_unpack_hints 1s 3s -67%

oqs-bot commented Jun 18, 2026
Contributor

CBMC Results (ML-DSA-87, REDUCE-RAM)

Full Results (206 proofs)
Proof Status Current Previous Change
**TOTAL** 1653s 1664s -0.7%
mld_invntt_layer 175s 172s +2%
polyvec_matrix_pointwise_montgomery_yvec 163s 162s +1%
poly_pointwise_montgomery_c 138s 135s +2%
rej_uniform_native 127s 128s -1%
mld_ct_memcmp 72s 69s +4%
mld_ntt_layer 46s 44s +5%
fqmul 43s 44s -2%
mld_attempt_signature_generation 35s 35s +0%
keccakf1600x4_permute_native 25s 22s +14%
mld_ntt_butterfly_block 24s 23s +4%
sign_verify_internal 23s 22s +5%
poly_chknorm_c 19s 19s +0%
polyt0_unpack 17s 15s +13%
polyveck_decompose 17s 16s +6%
rej_uniform_c 16s 16s +0%
polyeta_unpack 15s 15s +0%
mld_check_pct 13s 11s +18%
poly_add 13s 12s +8%
poly_uniform_eta_4x 12s 14s -14%
compute_pack_t0_t1 11s 11s +0%
polyvecl_chknorm 10s 11s -9%
polyvecl_ntt 10s 9s +11%
keccak_absorb_once_x4 8s 8s +0%
pointwise_acc_native_x86_64 8s 7s +14%
poly_invntt_tomont_c 8s 7s +14%
rej_uniform 8s 9s -11%
sign_pk_from_sk 8s 6s +33%
mld_sample_s1_s2_serial 7s 6s +17%
poly_power2round 7s 8s -12%
polyveck_invntt_tomont 7s 7s +0%
sign 7s 7s +0%
sign_keypair_internal 7s 8s -12%
keccak_absorb 6s 6s +0%
mld_sample_s1_s2 6s 5s +20%
pointwise_acc_native_aarch64 6s 8s -25%
poly_caddq_c 6s 5s +20%
poly_decompose_c 6s 10s -40%
poly_ntt 6s 2s +200%
poly_reduce 6s 4s +50%
polyveck_caddq 6s 6s +0%
polyveck_reduce 6s 6s +0%
polyz_unpack_c 6s 7s -14%
rej_eta_native 6s 5s +20%
rej_uniform_native_aarch64 6s 5s +20%
sign_signature_pre_hash_internal 6s 3s +100%
sign_verify_pre_hash_shake256 6s 4s +50%
intt_native_x86_64 5s 1s +400%
keccakf1600x4_permute 5s 2s +150%
mld_compute_pack_z 5s 4s +25%
mld_keccakf1600_permute_c 5s 5s +0%
pointwise_native_aarch64 5s 4s +25%
poly_permute_bitrev_to_custom_optional_native 5s 3s +67%
poly_uniform 5s 3s +67%
poly_use_hint_native 5s 3s +67%
poly_use_hint_native_x86_64 5s - new
polyvec_matrix_expand_serial 5s 4s +25%
polyvec_matrix_pointwise_montgomery_row 5s 9s -44%
sign_signature 5s 7s -29%
intt_native_aarch64 4s 2s +100%
keccak_squeezeblocks_x4 4s 4s +0%
keccakf1600_permute 4s 1s +300%
keccakf1600x4_xor_bytes_native 4s 2s +100%
mld_ct_cmask_neg_i32 4s 3s +33%
mld_polymat_expand_entry 4s 3s +33%
montgomery_reduce 4s 3s +33%
nttunpack_native_x86_64 4s 5s -20%
poly_caddq 4s 4s +0%
poly_caddq_native 4s 5s -20%
poly_caddq_native_aarch64 4s 3s +33%
poly_challenge 4s 6s -33%
poly_chknorm_native_aarch64 4s 3s +33%
poly_ntt_native 4s 2s +100%
poly_permute_bitrev_to_custom_optional 4s 2s +100%
poly_pointwise_montgomery_native 4s 3s +33%
poly_shiftl 4s 5s -20%
poly_uniform_eta 4s 2s +100%
poly_uniform_gamma1 4s 2s +100%
poly_use_hint_c 4s 4s +0%
polyeta_pack 4s 2s +100%
polyveck_unpack_eta 4s 3s +33%
polyvecl_pack_eta 4s 2s +100%
polyvecl_pointwise_acc_montgomery_native 4s 3s +33%
polyvecl_uniform_gamma1 4s 2s +100%
polyvecl_uniform_gamma1_serial 4s 4s +0%
polyvecl_unpack_z 4s 2s +100%
polyz_unpack 4s 3s +33%
sign_keypair 4s 5s -20%
sign_open 4s 6s -33%
sign_signature_extmu 4s 5s -20%
sign_signature_pre_hash_shake256 4s 6s -33%
sign_verify 4s 5s -20%
sign_verify_extmu 4s 4s +0%
sign_verify_pre_hash_internal 4s 2s +100%
sk_s2hat_get_poly 4s 2s +100%
unpack_pk_t1 4s 2s +100%
unpack_sk_s1hat 4s 2s +100%
unpack_sk_s2hat 4s 2s +100%
caddq 3s 2s +50%
fqscale 3s 3s +0%
keccak_f1600_x1_native_aarch64 3s 1s +200%
keccak_f1600_x1_native_aarch64_v84a 3s 2s +50%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 3s 2s +50%
keccak_f1600_x4_native_avx2 3s 1s +200%
keccak_squeeze 3s 3s +0%
keccakf1600_extract_bytes (big endian) 3s 1s +200%
keccakf1600_xor_bytes 3s 4s -25%
mld_ct_abs_i32 3s 1s +200%
mld_ct_cmask_nonzero_u8 3s 6s -50%
mld_ct_sel_int32 3s 2s +50%
mld_h 3s 3s +0%
mld_keccakf1600x4_extract_bytes_c 3s 2s +50%
mld_value_barrier_u32 3s 5s -40%
pack_sig_c 3s 4s -25%
pack_sig_z 3s 3s +0%
pointwise_native_x86_64 3s 5s -40%
poly_caddq_native_x86_64 3s 3s +0%
poly_chknorm 3s 3s +0%
poly_chknorm_native 3s 3s +0%
poly_decompose_88_native_aarch64 3s 3s +0%
poly_decompose_native 3s 2s +50%
poly_invntt_tomont_native 3s 4s -25%
poly_pointwise_montgomery 3s 3s +0%
poly_sub 3s 2s +50%
poly_uniform_gamma1_4x 3s 5s -40%
polyveck_ntt 3s 2s +50%
polyvecl_pointwise_acc_montgomery_c 3s 3s +0%
polyw1_pack 3s 3s +0%
polyw1_pack_32 3s 2s +50%
polyz_unpack_17_native_aarch64 3s 6s -50%
polyz_unpack_19_native_aarch64 3s 3s +0%
power2round 3s 3s +0%
reduce32 3s 3s +0%
rej_eta_c 3s 4s -25%
rej_uniform_eta_native_aarch64 3s 3s +0%
shake128_release 3s 3s +0%
shake256 3s 2s +50%
shake256_absorb 3s 3s +0%
shake256_squeeze 3s 2s +50%
shake256x4_squeezeblocks 3s 5s -40%
sig_unpack_hints 3s 5s -40%
sign_signature_internal 3s 4s -25%
sk_s1hat_get_poly 3s 3s +0%
sk_t0hat_get_poly 3s 3s +0%
sys_check_capability 3s 2s +50%
use_hint 3s 2s +50%
keccak_f1600_x4_native_aarch64_v84a 2s 5s -60%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 2s 4s -50%
keccak_finalize 2s 2s +0%
keccak_init 2s 3s -33%
keccakf1600_permute_native 2s 3s -33%
keccakf1600x4_extract_bytes 2s 5s -60%
keccakf1600x4_extract_bytes_native 2s 4s -50%
keccakf1600x4_xor_bytes 2s 5s -60%
make_hint 2s 3s -33%
mld_ct_cmask_nonzero_u32 2s 3s -33%
mld_ct_get_optblocker_i64 2s 1s +100%
mld_ct_get_optblocker_u32 2s 3s -33%
mld_keccakf1600_extract_bytes 2s 4s -50%
mld_keccakf1600x4_xor_bytes_c 2s 3s -33%
mld_prepare_domain_separation_prefix 2s 4s -50%
mld_value_barrier_i64 2s 4s -50%
mld_value_barrier_u8 2s 2s +0%
ntt_native_aarch64 2s 2s +0%
ntt_native_x86_64 2s 4s -50%
pack_sk_rho_key_tr_s2 2s 2s +0%
pack_sk_s1 2s 3s -33%
poly_chknorm_native_x86_64 2s 5s -60%
poly_decompose 2s 4s -50%
poly_decompose_32_native_aarch64 2s 2s +0%
poly_invntt_tomont 2s 3s -33%
poly_ntt_c 2s 3s -33%
poly_uniform_4x 2s 4s -50%
poly_use_hint 2s 4s -50%
poly_use_hint_native_aarch64 2s 4s -50%
polyt0_pack 2s 2s +0%
polyt1_pack 2s 4s -50%
polyt1_unpack 2s 3s -33%
polyveck_chknorm 2s 2s +0%
polyveck_pack_eta 2s 5s -60%
polyveck_pack_w1 2s 3s -33%
polyvecl_pointwise_acc_montgomery 2s 3s -33%
polyvecl_unpack_eta 2s 4s -50%
polyw1_pack_88 2s 5s -60%
polyz_unpack_native 2s 7s -71%
polyz_unpack_native_x86_64 2s 4s -50%
shake128_absorb 2s 3s -33%
shake128_finalize 2s 2s +0%
shake128x4_squeezeblocks 2s 3s -33%
shake256_release 2s 3s -33%
shake256x4_absorb_once 2s 1s +100%
unpack_sk 2s 5s -60%
unpack_sk_t0hat 2s 3s -33%
yvec_get_poly 2s 4s -50%
yvec_init 2s 4s -50%
decompose 1s 2s -50%
keccakf1600_xor_bytes (big endian) 1s 2s -50%
mld_ct_get_optblocker_u8 1s 1s +0%
pack_sig_h 1s 3s -67%
polyvec_matrix_expand 1s 2s -50%
polyz_pack 1s 1s +0%
rej_eta 1s 3s -67%
shake128_init 1s 4s -75%
shake128_squeeze 1s 2s -50%
shake128x4_absorb_once 1s 2s -50%
shake256_finalize 1s 3s -67%
shake256_init 1s 2s -50%

oqs-bot commented Jun 18, 2026
Contributor

CBMC Results (ML-DSA-65)

Full Results (206 proofs)
Proof Status Current Previous Change
**TOTAL** 2062s 2203s -6.4%
mld_invntt_layer 281s 319s -12%
polyvecl_pointwise_acc_montgomery_c 205s 238s -14%
rej_uniform_native 148s 159s -7%
polyvec_matrix_expand 129s 140s -8%
poly_pointwise_montgomery_c 93s 112s -17%
mld_attempt_signature_generation 67s 68s -1%
sign_verify_internal 66s 67s -1%
mld_ct_memcmp 65s 75s -13%
sign_signature_internal 47s 48s -2%
mld_ntt_layer 44s 46s -4%
fqmul 41s 47s -13%
keccakf1600x4_permute_native 25s 23s +9%
polyvec_matrix_expand_serial 24s 27s -11%
rej_uniform 22s 22s +0%
compute_pack_t0_t1 21s 17s +24%
mld_ntt_butterfly_block 20s 23s -13%
poly_chknorm_c 18s 22s -18%
polyt0_unpack 17s 17s +0%
mld_check_pct 16s 18s -11%
polyvecl_chknorm 16s 15s +7%
poly_uniform_eta_4x 13s 12s +8%
polyveck_decompose 13s 15s -13%
rej_uniform_c 13s 15s -13%
poly_uniform_4x 12s 10s +20%
keccak_absorb_once_x4 10s 9s +11%
poly_add 10s 11s -9%
polyveck_chknorm 10s 12s -17%
mld_compute_pack_z 9s 8s +12%
poly_invntt_tomont_c 9s 9s +0%
polyveck_invntt_tomont 9s 8s +12%
sign_pk_from_sk 9s 6s +50%
polyveck_ntt 8s 7s +14%
mld_keccakf1600_permute_c 7s 7s +0%
mld_polymat_expand_entry 7s 3s +133%
poly_pointwise_montgomery 7s 3s +133%
polyvec_matrix_pointwise_montgomery_yvec 7s 8s -12%
polyvecl_ntt 7s 7s +0%
polyz_unpack_c 7s 7s +0%
sign 7s 8s -12%
pointwise_acc_native_x86_64 6s 7s -14%
poly_challenge 6s 4s +50%
poly_invntt_tomont 6s 1s +500%
poly_ntt_native 6s 1s +500%
polyz_unpack_native_x86_64 6s 3s +100%
sign_keypair 6s 4s +50%
keccak_finalize 5s 2s +150%
keccakf1600_xor_bytes (big endian) 5s 2s +150%
ntt_native_x86_64 5s 4s +25%
pointwise_acc_native_aarch64 5s 6s -17%
poly_caddq_c 5s 5s +0%
poly_decompose 5s 2s +150%
poly_permute_bitrev_to_custom_optional_native 5s 4s +25%
poly_power2round 5s 5s +0%
poly_use_hint 5s 2s +150%
poly_use_hint_native_aarch64 5s 3s +67%
polyveck_caddq 5s 7s -29%
polyw1_pack_88 5s 2s +150%
power2round 5s 2s +150%
rej_uniform_eta_native_aarch64 5s 5s +0%
sign_open 5s 5s +0%
sign_signature_pre_hash_internal 5s 5s +0%
unpack_sk_t0hat 5s 5s +0%
decompose 4s 2s +100%
intt_native_aarch64 4s 5s -20%
keccak_absorb 4s 7s -43%
keccak_f1600_x1_native_aarch64_v84a 4s 3s +33%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 4s 2s +100%
keccak_squeezeblocks_x4 4s 5s -20%
keccakf1600_permute 4s 2s +100%
keccakf1600x4_extract_bytes 4s 2s +100%
mld_ct_cmask_nonzero_u32 4s 2s +100%
mld_keccakf1600x4_extract_bytes_c 4s 2s +100%
mld_sample_s1_s2 4s 6s -33%
mld_value_barrier_u8 4s 3s +33%
ntt_native_aarch64 4s 3s +33%
poly_decompose_32_native_aarch64 4s 4s +0%
poly_permute_bitrev_to_custom_optional 4s 3s +33%
poly_pointwise_montgomery_native 4s 3s +33%
poly_uniform_gamma1_4x 4s 4s +0%
poly_use_hint_native 4s 3s +33%
polyeta_unpack 4s 3s +33%
polyt0_pack 4s 4s +0%
polyw1_pack 4s 4s +0%
polyz_unpack_native 4s 5s -20%
rej_eta_c 4s 4s +0%
rej_uniform_native_aarch64 4s 2s +100%
shake128_absorb 4s 3s +33%
sig_unpack_hints 4s 2s +100%
sign_keypair_internal 4s 4s +0%
sign_signature_pre_hash_shake256 4s 4s +0%
sign_verify 4s 5s -20%
sign_verify_pre_hash_internal 4s 4s +0%
unpack_pk_t1 4s 2s +100%
use_hint 4s 2s +100%
caddq 3s 4s -25%
intt_native_x86_64 3s 3s +0%
keccak_f1600_x1_native_aarch64 3s 4s -25%
keccak_f1600_x4_native_aarch64_v84a 3s 2s +50%
keccak_squeeze 3s 2s +50%
keccakf1600x4_extract_bytes_native 3s 1s +200%
keccakf1600x4_permute 3s 1s +200%
mld_ct_get_optblocker_i64 3s 3s +0%
mld_ct_get_optblocker_u32 3s 2s +50%
mld_keccakf1600x4_xor_bytes_c 3s 4s -25%
mld_sample_s1_s2_serial 3s 8s -62%
montgomery_reduce 3s 3s +0%
nttunpack_native_x86_64 3s 2s +50%
pack_sk_rho_key_tr_s2 3s 4s -25%
pack_sk_s1 3s 2s +50%
pointwise_native_aarch64 3s 3s +0%
poly_caddq_native_aarch64 3s 3s +0%
poly_caddq_native_x86_64 3s 3s +0%
poly_chknorm 3s 4s -25%
poly_chknorm_native_aarch64 3s 3s +0%
poly_decompose_c 3s 3s +0%
poly_decompose_native 3s 1s +200%
poly_invntt_tomont_native 3s 4s -25%
poly_ntt 3s 3s +0%
poly_ntt_c 3s 3s +0%
poly_uniform 3s 4s -25%
poly_uniform_eta 3s 5s -40%
poly_use_hint_native_x86_64 3s - new
polyt1_pack 3s 3s +0%
polyveck_reduce 3s 4s -25%
polyveck_unpack_eta 3s 3s +0%
polyvecl_pack_eta 3s 2s +50%
polyvecl_uniform_gamma1_serial 3s 3s +0%
polyw1_pack_32 3s 7s -57%
polyz_pack 3s 2s +50%
polyz_unpack_17_native_aarch64 3s 2s +50%
reduce32 3s 3s +0%
rej_eta 3s 6s -50%
shake128_init 3s 3s +0%
shake128x4_squeezeblocks 3s 3s +0%
shake256 3s 2s +50%
shake256_absorb 3s 2s +50%
shake256_release 3s 2s +50%
shake256_squeeze 3s 3s +0%
sign_signature 3s 5s -40%
sign_signature_extmu 3s 5s -40%
sign_verify_extmu 3s 4s -25%
sign_verify_pre_hash_shake256 3s 2s +50%
sk_s1hat_get_poly 3s 3s +0%
sk_s2hat_get_poly 3s 4s -25%
unpack_sk 3s 3s +0%
unpack_sk_s1hat 3s 3s +0%
unpack_sk_s2hat 3s 1s +200%
yvec_get_poly 3s 2s +50%
fqscale 2s 2s +0%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 2s 5s -60%
keccak_f1600_x4_native_avx2 2s 2s +0%
keccak_init 2s 2s +0%
keccakf1600_permute_native 2s 1s +100%
keccakf1600_xor_bytes 2s 4s -50%
keccakf1600x4_xor_bytes 2s 3s -33%
keccakf1600x4_xor_bytes_native 2s 1s +100%
make_hint 2s 4s -50%
mld_ct_abs_i32 2s 2s +0%
mld_ct_cmask_nonzero_u8 2s 2s +0%
mld_ct_get_optblocker_u8 2s 1s +100%
mld_ct_sel_int32 2s 2s +0%
mld_h 2s 5s -60%
mld_keccakf1600_extract_bytes 2s 2s +0%
mld_prepare_domain_separation_prefix 2s 3s -33%
mld_value_barrier_u32 2s 2s +0%
pack_sig_c 2s 2s +0%
pack_sig_h 2s 4s -50%
pack_sig_z 2s 4s -50%
pointwise_native_x86_64 2s 5s -60%
poly_caddq 2s 2s +0%
poly_caddq_native 2s 2s +0%
poly_chknorm_native 2s 6s -67%
poly_chknorm_native_x86_64 2s 2s +0%
poly_decompose_88_native_aarch64 2s 4s -50%
poly_reduce 2s 2s +0%
poly_shiftl 2s 4s -50%
poly_sub 2s 3s -33%
poly_uniform_gamma1 2s 4s -50%
poly_use_hint_c 2s 4s -50%
polyeta_pack 2s 3s -33%
polyt1_unpack 2s 3s -33%
polyvec_matrix_pointwise_montgomery_row 2s 2s +0%
polyveck_pack_eta 2s 2s +0%
polyveck_pack_w1 2s 2s +0%
polyvecl_pointwise_acc_montgomery 2s 4s -50%
polyvecl_pointwise_acc_montgomery_native 2s 3s -33%
polyvecl_uniform_gamma1 2s 6s -67%
polyvecl_unpack_eta 2s 5s -60%
polyvecl_unpack_z 2s 3s -33%
polyz_unpack 2s 4s -50%
polyz_unpack_19_native_aarch64 2s 2s +0%
rej_eta_native 2s 4s -50%
shake128_finalize 2s 4s -50%
shake128_release 2s 3s -33%
shake128x4_absorb_once 2s 2s +0%
shake256_finalize 2s 3s -33%
shake256_init 2s 2s +0%
shake256x4_absorb_once 2s 2s +0%
shake256x4_squeezeblocks 2s 2s +0%
sk_t0hat_get_poly 2s 3s -33%
sys_check_capability 2s 4s -50%
yvec_init 2s 5s -60%
keccakf1600_extract_bytes (big endian) 1s 2s -50%
mld_ct_cmask_neg_i32 1s 1s +0%
mld_value_barrier_i64 1s 5s -80%
shake128_squeeze 1s 2s -50%

oqs-bot commented Jun 18, 2026
Contributor

CBMC Results (ML-DSA-44)

Full Results (206 proofs)
Proof Status Current Previous Change
**TOTAL** 1820s 1841s -1.1%
mld_invntt_layer 288s 302s -5%
rej_uniform_native 148s 152s -3%
polyvecl_pointwise_acc_montgomery_c 120s 130s -8%
poly_pointwise_montgomery_c 102s 105s -3%
mld_ct_memcmp 68s 72s -6%
mld_attempt_signature_generation 62s 64s -3%
mld_ntt_layer 46s 43s +7%
fqmul 44s 42s +5%
keccakf1600x4_permute_native 26s 25s +4%
polyvec_matrix_expand 26s 27s -4%
sign_verify_internal 26s 27s -4%
sign_signature_internal 23s 21s +10%
mld_ntt_butterfly_block 22s 24s -8%
rej_uniform 21s 21s +0%
poly_chknorm_c 19s 22s -14%
polyt0_unpack 17s 17s +0%
compute_pack_t0_t1 16s 16s +0%
mld_check_pct 16s 14s +14%
polyeta_unpack 15s 18s -17%
polyveck_chknorm 14s 12s +17%
poly_uniform_eta_4x 13s 11s +18%
polyz_unpack_c 12s 10s +20%
rej_uniform_c 12s 12s +0%
poly_add 11s 9s +22%
polyvec_matrix_pointwise_montgomery_yvec 11s 9s +22%
keccak_absorb_once_x4 10s 10s +0%
poly_invntt_tomont_c 10s 10s +0%
poly_uniform_4x 10s 10s +0%
polyvec_matrix_expand_serial 9s 9s +0%
pointwise_acc_native_x86_64 8s 7s +14%
sign_pk_from_sk 8s 7s +14%
poly_caddq_c 7s 5s +40%
polyveck_decompose 7s 8s -12%
sign 7s 8s -12%
sign_verify_pre_hash_internal 7s 4s +75%
keccak_absorb 6s 7s -14%
mld_compute_pack_z 6s 9s -33%
mld_keccakf1600_permute_c 6s 7s -14%
mld_polymat_expand_entry 6s 2s +200%
poly_shiftl 6s 2s +200%
poly_use_hint_c 6s 8s -25%
polyz_unpack_native_x86_64 6s 2s +200%
sign_open 6s 6s +0%
sign_signature 6s 3s +100%
unpack_sk 6s 3s +100%
fqscale 5s 3s +67%
mld_ct_cmask_nonzero_u32 5s 5s +0%
mld_ct_get_optblocker_u8 5s 3s +67%
montgomery_reduce 5s 3s +67%
pack_sig_c 5s 2s +150%
pointwise_acc_native_aarch64 5s 5s +0%
poly_caddq 5s 3s +67%
poly_challenge 5s 4s +25%
poly_permute_bitrev_to_custom_optional 5s 2s +150%
poly_uniform_gamma1_4x 5s 6s -17%
poly_use_hint_native_aarch64 5s 5s +0%
polyt0_pack 5s 4s +25%
polyt1_unpack 5s 3s +67%
polyveck_caddq 5s 4s +25%
polyveck_ntt 5s 4s +25%
polyvecl_chknorm 5s 3s +67%
polyvecl_uniform_gamma1 5s 3s +67%
polyvecl_unpack_eta 5s 2s +150%
rej_eta 5s 2s +150%
rej_eta_native 5s 4s +25%
sig_unpack_hints 5s 2s +150%
sign_keypair_internal 5s 2s +150%
sign_signature_pre_hash_shake256 5s 4s +25%
sign_verify 5s 4s +25%
sk_t0hat_get_poly 5s 5s +0%
decompose 4s 2s +100%
keccakf1600x4_extract_bytes 4s 2s +100%
mld_prepare_domain_separation_prefix 4s 3s +33%
mld_sample_s1_s2_serial 4s 4s +0%
mld_value_barrier_i64 4s 3s +33%
mld_value_barrier_u32 4s 3s +33%
ntt_native_x86_64 4s 4s +0%
pack_sig_h 4s 6s -33%
pointwise_native_aarch64 4s 5s -20%
poly_caddq_native 4s 5s -20%
poly_chknorm_native 4s 3s +33%
poly_decompose_88_native_aarch64 4s 3s +33%
poly_pointwise_montgomery 4s 2s +100%
poly_reduce 4s 5s -20%
poly_sub 4s 3s +33%
poly_uniform 4s 3s +33%
polyvecl_pack_eta 4s 4s +0%
polyvecl_uniform_gamma1_serial 4s 2s +100%
polyvecl_unpack_z 4s 3s +33%
polyz_pack 4s 2s +100%
polyz_unpack_19_native_aarch64 4s 3s +33%
polyz_unpack_native 4s 5s -20%
power2round 4s 1s +300%
rej_eta_c 4s 5s -20%
shake128_squeeze 4s 1s +300%
sign_signature_extmu 4s 4s +0%
sign_verify_pre_hash_shake256 4s 4s +0%
sk_s1hat_get_poly 4s 2s +100%
unpack_sk_t0hat 4s 3s +33%
yvec_init 4s 3s +33%
caddq 3s 2s +50%
intt_native_aarch64 3s 6s -50%
intt_native_x86_64 3s 4s -25%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 3s 4s -25%
keccak_squeezeblocks_x4 3s 4s -25%
keccakf1600_permute_native 3s 2s +50%
keccakf1600_xor_bytes 3s 4s -25%
keccakf1600x4_permute 3s 4s -25%
keccakf1600x4_xor_bytes 3s 1s +200%
make_hint 3s 5s -40%
mld_ct_abs_i32 3s 1s +200%
mld_ct_cmask_nonzero_u8 3s 2s +50%
mld_ct_get_optblocker_i64 3s 2s +50%
mld_h 3s 4s -25%
mld_keccakf1600_extract_bytes 3s 3s +0%
mld_keccakf1600x4_extract_bytes_c 3s 2s +50%
mld_keccakf1600x4_xor_bytes_c 3s 1s +200%
mld_sample_s1_s2 3s 3s +0%
mld_value_barrier_u8 3s 1s +200%
ntt_native_aarch64 3s 5s -40%
nttunpack_native_x86_64 3s 5s -40%
pack_sig_z 3s 2s +50%
pack_sk_s1 3s 3s +0%
poly_caddq_native_x86_64 3s 4s -25%
poly_chknorm 3s 4s -25%
poly_chknorm_native_aarch64 3s 3s +0%
poly_chknorm_native_x86_64 3s 4s -25%
poly_decompose 3s 3s +0%
poly_decompose_32_native_aarch64 3s 5s -40%
poly_decompose_c 3s 3s +0%
poly_decompose_native 3s 3s +0%
poly_ntt_c 3s 5s -40%
poly_ntt_native 3s 3s +0%
poly_permute_bitrev_to_custom_optional_native 3s 2s +50%
poly_pointwise_montgomery_native 3s 4s -25%
poly_uniform_eta 3s 7s -57%
poly_use_hint 3s 3s +0%
poly_use_hint_native_x86_64 3s - new
polyt1_pack 3s 4s -25%
polyveck_invntt_tomont 3s 4s -25%
polyveck_unpack_eta 3s 4s -25%
polyvecl_ntt 3s 5s -40%
polyvecl_pointwise_acc_montgomery_native 3s 6s -50%
polyz_unpack 3s 3s +0%
rej_uniform_eta_native_aarch64 3s 3s +0%
shake128_release 3s 3s +0%
shake256 3s 2s +50%
shake256_absorb 3s 2s +50%
shake256_init 3s 3s +0%
shake256_release 3s 1s +200%
shake256_squeeze 3s 1s +200%
shake256x4_absorb_once 3s 1s +200%
sign_keypair 3s 4s -25%
sys_check_capability 3s 2s +50%
unpack_sk_s1hat 3s 3s +0%
unpack_sk_s2hat 3s 3s +0%
yvec_get_poly 3s 3s +0%
keccak_f1600_x1_native_aarch64 2s 3s -33%
keccak_f1600_x1_native_aarch64_v84a 2s 4s -50%
keccak_f1600_x4_native_aarch64_v84a 2s 1s +100%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 2s 3s -33%
keccak_f1600_x4_native_avx2 2s 3s -33%
keccakf1600_extract_bytes (big endian) 2s 4s -50%
keccakf1600_xor_bytes (big endian) 2s 2s +0%
keccakf1600x4_extract_bytes_native 2s 3s -33%
pack_sk_rho_key_tr_s2 2s 4s -50%
pointwise_native_x86_64 2s 3s -33%
poly_caddq_native_aarch64 2s 5s -60%
poly_invntt_tomont 2s 4s -50%
poly_invntt_tomont_native 2s 4s -50%
poly_power2round 2s 5s -60%
poly_uniform_gamma1 2s 2s +0%
poly_use_hint_native 2s 2s +0%
polyeta_pack 2s 3s -33%
polyvec_matrix_pointwise_montgomery_row 2s 2s +0%
polyveck_pack_eta 2s 3s -33%
polyveck_pack_w1 2s 2s +0%
polyveck_reduce 2s 2s +0%
polyvecl_pointwise_acc_montgomery 2s 3s -33%
polyw1_pack 2s 3s -33%
polyw1_pack_32 2s 3s -33%
polyw1_pack_88 2s 8s -75%
polyz_unpack_17_native_aarch64 2s 3s -33%
rej_uniform_native_aarch64 2s 3s -33%
shake128_absorb 2s 1s +100%
shake128_init 2s 1s +100%
sign_signature_pre_hash_internal 2s 3s -33%
sign_verify_extmu 2s 3s -33%
sk_s2hat_get_poly 2s 4s -50%
unpack_pk_t1 2s 3s -33%
use_hint 2s 2s +0%
keccak_finalize 1s 3s -67%
keccak_init 1s 3s -67%
keccak_squeeze 1s 5s -80%
keccakf1600_permute 1s 2s -50%
keccakf1600x4_xor_bytes_native 1s 2s -50%
mld_ct_cmask_neg_i32 1s 2s -50%
mld_ct_get_optblocker_u32 1s 1s +0%
mld_ct_sel_int32 1s 3s -67%
poly_ntt 1s 2s -50%
reduce32 1s 3s -67%
shake128_finalize 1s 3s -67%
shake128x4_absorb_once 1s 3s -67%
shake128x4_squeezeblocks 1s 2s -50%
shake256_finalize 1s 4s -75%
shake256x4_squeezeblocks 1s 2s -50%

@oqs-bot

oqs-bot commented Jun 18, 2026

CBMC Results (ML-DSA-87)

Full Results (206 proofs)
Proof Status Current Previous Change
**TOTAL** 2486s 2584s -3.8%
polyvecl_pointwise_acc_montgomery_c 327s 367s -11%
mld_invntt_layer 303s 321s -6%
polyvec_matrix_expand 226s 244s -7%
rej_uniform_native 155s 160s -3%
mld_attempt_signature_generation 108s 107s +1%
poly_pointwise_montgomery_c 104s 112s -7%
sign_signature_internal 69s 66s +5%
mld_ct_memcmp 68s 73s -7%
sign_verify_internal 59s 64s -8%
polyvec_matrix_expand_serial 49s 50s -2%
mld_ntt_layer 44s 45s -2%
fqmul 42s 45s -7%
compute_pack_t0_t1 33s 32s +3%
polyvec_matrix_pointwise_montgomery_yvec 30s 32s -6%
keccakf1600x4_permute_native 23s 22s +5%
mld_ntt_butterfly_block 23s 22s +5%
rej_uniform 23s 23s +0%
poly_chknorm_c 22s 19s +16%
mld_check_pct 18s 16s +12%
polyt0_unpack 17s 17s +0%
rej_uniform_c 16s 15s +7%
polyeta_unpack 15s 17s -12%
poly_uniform_eta_4x 14s 13s +8%
poly_add 10s 13s -23%
poly_uniform_4x 10s 13s -23%
polyvecl_ntt 10s 9s +11%
keccak_absorb_once_x4 9s 12s -25%
poly_invntt_tomont_c 9s 9s +0%
polyveck_decompose 9s 9s +0%
polyveck_invntt_tomont 9s 10s -10%
sign 9s 6s +50%
mld_compute_pack_z 8s 8s +0%
pointwise_acc_native_x86_64 8s 8s +0%
polyveck_ntt 8s 7s +14%
unpack_sk_t0hat 8s 8s +0%
mld_keccakf1600_permute_c 7s 7s +0%
mld_sample_s1_s2 7s 6s +17%
pointwise_acc_native_aarch64 7s 8s -12%
poly_uniform_gamma1_4x 7s 3s +133%
polyveck_caddq 7s 6s +17%
caddq 6s 3s +100%
poly_caddq_native 6s 2s +200%
poly_decompose_c 6s 7s -14%
polyvecl_chknorm 6s 5s +20%
sign_open 6s 3s +100%
sign_pk_from_sk 6s 7s -14%
sign_verify_pre_hash_internal 6s 6s +0%
decompose 5s 2s +150%
keccak_absorb 5s 6s -17%
keccak_init 5s 3s +67%
mld_prepare_domain_separation_prefix 5s 5s +0%
mld_sample_s1_s2_serial 5s 7s -29%
pack_sig_c 5s 2s +150%
poly_caddq_c 5s 6s -17%
poly_caddq_native_aarch64 5s 4s +25%
poly_decompose 5s 4s +25%
poly_power2round 5s 6s -17%
poly_reduce 5s 1s +400%
polyveck_chknorm 5s 4s +25%
polyz_unpack_c 5s 5s +0%
shake128x4_squeezeblocks 5s 1s +400%
sign_signature_extmu 5s 5s +0%
sign_signature_pre_hash_shake256 5s 4s +25%
sign_verify 5s 5s +0%
sign_verify_pre_hash_shake256 5s 3s +67%
sk_s2hat_get_poly 5s 3s +67%
unpack_sk_s1hat 5s 3s +67%
keccak_squeeze 4s 2s +100%
keccak_squeezeblocks_x4 4s 4s +0%
keccakf1600x4_xor_bytes 4s 4s +0%
mld_ct_get_optblocker_u32 4s 3s +33%
mld_h 4s 4s +0%
mld_value_barrier_u32 4s 4s +0%
montgomery_reduce 4s 4s +0%
ntt_native_aarch64 4s 2s +100%
ntt_native_x86_64 4s 2s +100%
nttunpack_native_x86_64 4s 3s +33%
pack_sig_z 4s 1s +300%
pack_sk_s1 4s 3s +33%
pointwise_native_x86_64 4s 3s +33%
poly_challenge 4s 6s -33%
poly_chknorm_native_aarch64 4s 7s -43%
poly_permute_bitrev_to_custom_optional 4s 2s +100%
poly_pointwise_montgomery_native 4s 4s +0%
poly_shiftl 4s 3s +33%
poly_sub 4s 3s +33%
poly_uniform_eta 4s 5s -20%
poly_uniform_gamma1 4s 3s +33%
poly_use_hint_native_aarch64 4s 3s +33%
poly_use_hint_native_x86_64 4s - new
polyt0_pack 4s 5s -20%
polyt1_unpack 4s 2s +100%
polyveck_pack_eta 4s 4s +0%
polyveck_pack_w1 4s 5s -20%
polyveck_unpack_eta 4s 3s +33%
polyvecl_pack_eta 4s 4s +0%
polyvecl_pointwise_acc_montgomery_native 4s 2s +100%
polyvecl_uniform_gamma1_serial 4s 2s +100%
polyw1_pack_88 4s 3s +33%
polyz_unpack_19_native_aarch64 4s 4s +0%
polyz_unpack_native_x86_64 4s 5s -20%
power2round 4s 4s +0%
shake128_absorb 4s 1s +300%
sign_keypair 4s 4s +0%
sign_keypair_internal 4s 8s -50%
sign_signature 4s 3s +33%
sys_check_capability 4s 3s +33%
unpack_pk_t1 4s 3s +33%
use_hint 4s 3s +33%
intt_native_aarch64 3s 3s +0%
intt_native_x86_64 3s 5s -40%
keccak_f1600_x4_native_aarch64_v84a 3s 1s +200%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 3s 4s -25%
keccakf1600_permute 3s 3s +0%
keccakf1600x4_extract_bytes 3s 2s +50%
keccakf1600x4_permute 3s 3s +0%
make_hint 3s 4s -25%
mld_ct_cmask_nonzero_u32 3s 3s +0%
mld_ct_get_optblocker_u8 3s 1s +200%
mld_ct_sel_int32 3s 2s +50%
mld_keccakf1600_extract_bytes 3s 3s +0%
mld_keccakf1600x4_extract_bytes_c 3s 2s +50%
mld_keccakf1600x4_xor_bytes_c 3s 1s +200%
mld_polymat_expand_entry 3s 4s -25%
mld_value_barrier_u8 3s 2s +50%
pack_sig_h 3s 3s +0%
pointwise_native_aarch64 3s 3s +0%
poly_caddq_native_x86_64 3s 3s +0%
poly_chknorm 3s 3s +0%
poly_chknorm_native 3s 4s -25%
poly_chknorm_native_x86_64 3s 4s -25%
poly_decompose_32_native_aarch64 3s 2s +50%
poly_decompose_native 3s 3s +0%
poly_invntt_tomont 3s 2s +50%
poly_invntt_tomont_native 3s 5s -40%
poly_ntt 3s 6s -50%
poly_ntt_native 3s 4s -25%
poly_pointwise_montgomery 3s 5s -40%
poly_uniform 3s 3s +0%
poly_use_hint 3s 2s +50%
poly_use_hint_native 3s 4s -25%
polyeta_pack 3s 2s +50%
polyt1_pack 3s 2s +50%
polyveck_reduce 3s 6s -50%
polyvecl_unpack_eta 3s 4s -25%
polyvecl_unpack_z 3s 4s -25%
polyw1_pack 3s 3s +0%
polyz_unpack 3s 3s +0%
polyz_unpack_17_native_aarch64 3s 4s -25%
polyz_unpack_native 3s 4s -25%
rej_eta 3s 5s -40%
rej_eta_native 3s 5s -40%
rej_uniform_eta_native_aarch64 3s 5s -40%
rej_uniform_native_aarch64 3s 3s +0%
shake128_release 3s 3s +0%
shake128x4_absorb_once 3s 4s -25%
shake256_absorb 3s 3s +0%
shake256_finalize 3s 2s +50%
shake256_squeeze 3s 2s +50%
shake256x4_absorb_once 3s 2s +50%
sig_unpack_hints 3s 6s -50%
sign_signature_pre_hash_internal 3s 4s -25%
sign_verify_extmu 3s 3s +0%
sk_t0hat_get_poly 3s 2s +50%
unpack_sk 3s 4s -25%
yvec_init 3s 3s +0%
fqscale 2s 5s -60%
keccak_f1600_x1_native_aarch64 2s 2s +0%
keccak_f1600_x1_native_aarch64_v84a 2s 4s -50%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 2s 1s +100%
keccak_f1600_x4_native_avx2 2s 4s -50%
keccakf1600_extract_bytes (big endian) 2s 1s +100%
keccakf1600_permute_native 2s 2s +0%
keccakf1600_xor_bytes 2s 2s +0%
keccakf1600_xor_bytes (big endian) 2s 2s +0%
keccakf1600x4_extract_bytes_native 2s 3s -33%
keccakf1600x4_xor_bytes_native 2s 3s -33%
mld_ct_abs_i32 2s 2s +0%
mld_ct_cmask_neg_i32 2s 4s -50%
mld_ct_get_optblocker_i64 2s 4s -50%
mld_value_barrier_i64 2s 2s +0%
pack_sk_rho_key_tr_s2 2s 5s -60%
poly_caddq 2s 3s -33%
poly_decompose_88_native_aarch64 2s 2s +0%
poly_ntt_c 2s 3s -33%
poly_permute_bitrev_to_custom_optional_native 2s 4s -50%
poly_use_hint_c 2s 2s +0%
polyvec_matrix_pointwise_montgomery_row 2s 4s -50%
polyvecl_pointwise_acc_montgomery 2s 4s -50%
polyvecl_uniform_gamma1 2s 4s -50%
polyw1_pack_32 2s 3s -33%
polyz_pack 2s 1s +100%
reduce32 2s 3s -33%
rej_eta_c 2s 3s -33%
shake128_finalize 2s 2s +0%
shake128_init 2s 2s +0%
shake128_squeeze 2s 4s -50%
shake256_release 2s 6s -67%
shake256x4_squeezeblocks 2s 2s +0%
sk_s1hat_get_poly 2s 2s +0%
yvec_get_poly 2s 1s +100%
keccak_finalize 1s 2s -50%
mld_ct_cmask_nonzero_u8 1s 4s -75%
shake256 1s 2s -50%
shake256_init 1s 4s -75%
unpack_sk_s2hat 1s 5s -80%

@jakemas jakemas force-pushed the use-hint-x86-proofs-pr branch from 5dfcb0c to 2b012e5 Compare June 18, 2026 23:59

@github-actions github-actions Bot left a comment

Mac Mini (M1, 2020) benchmarks (opt)

Details
Benchmark suite Current: 2b012e5 Previous: e5884bb Ratio
ML-DSA-44 keypair 46543 cycles 46505 cycles 1.00
ML-DSA-44 sign 131110 cycles 131069 cycles 1.00
ML-DSA-44 verify 47346 cycles 47311 cycles 1.00
ML-DSA-65 keypair 81683 cycles 81686 cycles 1.00
ML-DSA-65 sign 215314 cycles 215297 cycles 1.00
ML-DSA-65 verify 79298 cycles 79302 cycles 1.00
ML-DSA-87 keypair 132406 cycles 132401 cycles 1.00
ML-DSA-87 sign 277266 cycles 277556 cycles 1.00
ML-DSA-87 verify 134048 cycles 134051 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions Bot left a comment

Mac Mini (M1, 2020) benchmarks (no-opt)

Details
Benchmark suite Current: 2b012e5 Previous: e5884bb Ratio
ML-DSA-44 keypair 112979 cycles 112792 cycles 1.00
ML-DSA-44 sign 401852 cycles 401073 cycles 1.00
ML-DSA-44 verify 119681 cycles 119472 cycles 1.00
ML-DSA-65 keypair 193030 cycles 192928 cycles 1.00
ML-DSA-65 sign 650200 cycles 649950 cycles 1.00
ML-DSA-65 verify 192928 cycles 192868 cycles 1.00
ML-DSA-87 keypair 318757 cycles 318801 cycles 1.00
ML-DSA-87 sign 828689 cycles 828900 cycles 1.00
ML-DSA-87 verify 326680 cycles 326715 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions Bot left a comment

Arm Cortex-A76 (Raspberry Pi 5) benchmarks (opt)

Details
Benchmark suite Current: 2b012e5 Previous: e5884bb Ratio
ML-DSA-44 keypair 112165 cycles 112135 cycles 1.00
ML-DSA-44 sign 353502 cycles 353883 cycles 1.00
ML-DSA-44 verify 117018 cycles 117213 cycles 1.00
ML-DSA-65 keypair 194781 cycles 194346 cycles 1.00
ML-DSA-65 sign 583902 cycles 583663 cycles 1.00
ML-DSA-65 verify 192714 cycles 193077 cycles 1.00
ML-DSA-87 keypair 320922 cycles 320080 cycles 1.00
ML-DSA-87 sign 747323 cycles 747211 cycles 1.00
ML-DSA-87 verify 318745 cycles 317932 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions Bot left a comment

Arm Cortex-A76 (Raspberry Pi 5) benchmarks (no-opt)

Details
Benchmark suite Current: 2b012e5 Previous: e5884bb Ratio
ML-DSA-44 keypair 211556 cycles 211551 cycles 1.00
ML-DSA-44 sign 758823 cycles 759909 cycles 1.00
ML-DSA-44 verify 229004 cycles 229380 cycles 1.00
ML-DSA-65 keypair 376999 cycles 378142 cycles 1.00
ML-DSA-65 sign 1247052 cycles 1247244 cycles 1.00
ML-DSA-65 verify 371175 cycles 371972 cycles 1.00
ML-DSA-87 keypair 600407 cycles 601369 cycles 1.00
ML-DSA-87 sign 1582648 cycles 1581866 cycles 1.00
ML-DSA-87 verify 616058 cycles 617247 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Intel Xeon 4th gen (c7i)

Details
Benchmark suite Current: 2b012e5 Previous: e5884bb Ratio
ML-DSA-44 keypair 43539 cycles 43355 cycles 1.00
ML-DSA-44 sign 130496 cycles 130544 cycles 1.00
ML-DSA-44 verify 45200 cycles 45221 cycles 1.00
ML-DSA-65 keypair 75740 cycles 75384 cycles 1.00
ML-DSA-65 sign 214734 cycles 214301 cycles 1.00
ML-DSA-65 verify 74318 cycles 74395 cycles 1.00
ML-DSA-87 keypair 123030 cycles 123229 cycles 1.00
ML-DSA-87 sign 271641 cycles 270971 cycles 1.00
ML-DSA-87 verify 120752 cycles 120534 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Intel Xeon 3rd gen (c6i)

Details
Benchmark suite Current: 2b012e5 Previous: e5884bb Ratio
ML-DSA-44 keypair 61570 cycles 61347 cycles 1.00
ML-DSA-44 sign 188839 cycles 189018 cycles 1.00
ML-DSA-44 verify 66442 cycles 66490 cycles 1.00
ML-DSA-65 keypair 110091 cycles 107823 cycles 1.02
ML-DSA-65 sign 314575 cycles 311956 cycles 1.01
ML-DSA-65 verify 110040 cycles 108642 cycles 1.01
ML-DSA-87 keypair 171189 cycles 171470 cycles 1.00
ML-DSA-87 sign 379547 cycles 378524 cycles 1.00
ML-DSA-87 verify 171219 cycles 169439 cycles 1.01

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Intel Xeon 4th gen (c7i) (no-opt)

Details
Benchmark suite Current: 2b012e5 Previous: e5884bb Ratio
ML-DSA-44 keypair 91607 cycles 91483 cycles 1.00
ML-DSA-44 sign 351599 cycles 351948 cycles 1.00
ML-DSA-44 verify 99770 cycles 99772 cycles 1.00
ML-DSA-65 keypair 153995 cycles 153840 cycles 1.00
ML-DSA-65 sign 572344 cycles 570802 cycles 1.00
ML-DSA-65 verify 159837 cycles 159638 cycles 1.00
ML-DSA-87 keypair 255277 cycles 255330 cycles 1.00
ML-DSA-87 sign 726258 cycles 726314 cycles 1.00
ML-DSA-87 verify 263867 cycles 264290 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

AMD EPYC 3rd gen (c6a)

Details
Benchmark suite Current: 2b012e5 Previous: e5884bb Ratio
ML-DSA-44 keypair 55426 cycles 55817 cycles 0.99
ML-DSA-44 sign 159150 cycles 160142 cycles 0.99
ML-DSA-44 verify 57595 cycles 58345 cycles 0.99
ML-DSA-65 keypair 95738 cycles 96668 cycles 0.99
ML-DSA-65 sign 260961 cycles 264062 cycles 0.99
ML-DSA-65 verify 96292 cycles 97394 cycles 0.99
ML-DSA-87 keypair 154498 cycles 156370 cycles 0.99
ML-DSA-87 sign 323133 cycles 326034 cycles 0.99
ML-DSA-87 verify 151292 cycles 152600 cycles 0.99

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Intel Xeon 3rd gen (c6i) (no-opt)

Details
Benchmark suite Current: 2b012e5 Previous: e5884bb Ratio
ML-DSA-44 keypair 154549 cycles 154341 cycles 1.00
ML-DSA-44 sign 590341 cycles 588680 cycles 1.00
ML-DSA-44 verify 169731 cycles 169051 cycles 1.00
ML-DSA-65 keypair 262129 cycles 262010 cycles 1.00
ML-DSA-65 sign 961211 cycles 962118 cycles 1.00
ML-DSA-65 verify 271442 cycles 271267 cycles 1.00
ML-DSA-87 keypair 431136 cycles 431402 cycles 1.00
ML-DSA-87 sign 1212224 cycles 1210399 cycles 1.00
ML-DSA-87 verify 447543 cycles 447214 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Graviton4

Details
Benchmark suite Current: 2b012e5 Previous: e5884bb Ratio
ML-DSA-44 keypair 67229 cycles 67424 cycles 1.00
ML-DSA-44 sign 198371 cycles 198161 cycles 1.00
ML-DSA-44 verify 70305 cycles 70119 cycles 1.00
ML-DSA-65 keypair 119276 cycles 119402 cycles 1.00
ML-DSA-65 sign 326136 cycles 326062 cycles 1.00
ML-DSA-65 verify 116872 cycles 116765 cycles 1.00
ML-DSA-87 keypair 196540 cycles 196679 cycles 1.00
ML-DSA-87 sign 421405 cycles 421900 cycles 1.00
ML-DSA-87 verify 193296 cycles 193403 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Graviton3

Details
Benchmark suite Current: 2b012e5 Previous: e5884bb Ratio
ML-DSA-44 keypair 71403 cycles 71551 cycles 1.00
ML-DSA-44 sign 208982 cycles 208983 cycles 1.00
ML-DSA-44 verify 74836 cycles 74733 cycles 1.00
ML-DSA-65 keypair 125933 cycles 125927 cycles 1.00
ML-DSA-65 sign 345627 cycles 345490 cycles 1.00
ML-DSA-65 verify 124118 cycles 124185 cycles 1.00
ML-DSA-87 keypair 207065 cycles 206521 cycles 1.00
ML-DSA-87 sign 444070 cycles 439805 cycles 1.01
ML-DSA-87 verify 204100 cycles 204446 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

AMD EPYC 3rd gen (c6a) (no-opt)

Details
Benchmark suite Current: 2b012e5 Previous: e5884bb Ratio
ML-DSA-44 keypair 133180 cycles 134773 cycles 0.99
ML-DSA-44 sign 518749 cycles 524514 cycles 0.99
ML-DSA-44 verify 146562 cycles 147960 cycles 0.99
ML-DSA-65 keypair 225259 cycles 228152 cycles 0.99
ML-DSA-65 sign 846103 cycles 854970 cycles 0.99
ML-DSA-65 verify 234865 cycles 238032 cycles 0.99
ML-DSA-87 keypair 367122 cycles 371227 cycles 0.99
ML-DSA-87 sign 1060070 cycles 1069626 cycles 0.99
ML-DSA-87 verify 381059 cycles 384666 cycles 0.99

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

AMD EPYC 4th gen (c7a)

Details
Benchmark suite Current: 2b012e5 Previous: e5884bb Ratio
ML-DSA-44 keypair 46429 cycles 46780 cycles 0.99
ML-DSA-44 sign 139202 cycles 138851 cycles 1.00
ML-DSA-44 verify 49535 cycles 49217 cycles 1.01
ML-DSA-65 keypair 82794 cycles 82672 cycles 1.00
ML-DSA-65 sign 227727 cycles 227571 cycles 1.00
ML-DSA-65 verify 82058 cycles 82025 cycles 1.00
ML-DSA-87 keypair 129470 cycles 130121 cycles 0.99
ML-DSA-87 sign 280908 cycles 280059 cycles 1.00
ML-DSA-87 verify 128521 cycles 128980 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Graviton4 (no-opt)

Details
Benchmark suite Current: 2b012e5 Previous: e5884bb Ratio
ML-DSA-44 keypair 127570 cycles 127652 cycles 1.00
ML-DSA-44 sign 441188 cycles 441196 cycles 1.00
ML-DSA-44 verify 136371 cycles 136381 cycles 1.00
ML-DSA-65 keypair 220497 cycles 220706 cycles 1.00
ML-DSA-65 sign 714325 cycles 713750 cycles 1.00
ML-DSA-65 verify 221016 cycles 220752 cycles 1.00
ML-DSA-87 keypair 364546 cycles 365114 cycles 1.00
ML-DSA-87 sign 915587 cycles 921254 cycles 0.99
ML-DSA-87 verify 370845 cycles 370795 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Graviton3 (no-opt)

Details
Benchmark suite Current: 2b012e5 Previous: e5884bb Ratio
ML-DSA-44 keypair 137943 cycles 138043 cycles 1.00
ML-DSA-44 sign 486093 cycles 486132 cycles 1.00
ML-DSA-44 verify 149025 cycles 149080 cycles 1.00
ML-DSA-65 keypair 241723 cycles 241858 cycles 1.00
ML-DSA-65 sign 792077 cycles 791595 cycles 1.00
ML-DSA-65 verify 242156 cycles 241299 cycles 1.00
ML-DSA-87 keypair 395769 cycles 396308 cycles 1.00
ML-DSA-87 sign 1013650 cycles 1019320 cycles 0.99
ML-DSA-87 verify 403627 cycles 403778 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions Bot left a comment

Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)

Details
Benchmark suite Current: 2b012e5 Previous: e5884bb Ratio
ML-DSA-44 keypair 227585 cycles 229314 cycles 0.99
ML-DSA-44 sign 617247 cycles 633574 cycles 0.97
ML-DSA-44 verify 223087 cycles 223313 cycles 1.00
ML-DSA-65 keypair 383083 cycles 387745 cycles 0.99
ML-DSA-65 sign 991379 cycles 1000126 cycles 0.99
ML-DSA-65 verify 365888 cycles 368820 cycles 0.99
ML-DSA-87 keypair 631971 cycles 640246 cycles 0.99
ML-DSA-87 sign 1311893 cycles 1340002 cycles 0.98
ML-DSA-87 verify 618225 cycles 625960 cycles 0.99

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Graviton2

Details
Benchmark suite Current: 2b012e5 Previous: e5884bb Ratio
ML-DSA-44 keypair 112409 cycles 112383 cycles 1.00
ML-DSA-44 sign 354351 cycles 353857 cycles 1.00
ML-DSA-44 verify 117504 cycles 117230 cycles 1.00
ML-DSA-65 keypair 194514 cycles 194621 cycles 1.00
ML-DSA-65 sign 584390 cycles 584356 cycles 1.00
ML-DSA-65 verify 193521 cycles 193140 cycles 1.00
ML-DSA-87 keypair 320989 cycles 320887 cycles 1.00
ML-DSA-87 sign 747858 cycles 746586 cycles 1.00
ML-DSA-87 verify 318073 cycles 318597 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

AMD EPYC 4th gen (c7a) (no-opt)

Details
Benchmark suite Current: 2b012e5 Previous: e5884bb Ratio
ML-DSA-44 keypair 121906 cycles 118559 cycles 1.03
ML-DSA-44 sign 466366 cycles 458188 cycles 1.02
ML-DSA-44 verify 134415 cycles 130934 cycles 1.03
ML-DSA-65 keypair 200533 cycles 200695 cycles 1.00
ML-DSA-65 sign 745242 cycles 742901 cycles 1.00
ML-DSA-65 verify 208898 cycles 209938 cycles 1.00
ML-DSA-87 keypair 330451 cycles 330620 cycles 1.00
ML-DSA-87 sign 937675 cycles 935710 cycles 1.00
ML-DSA-87 verify 344116 cycles 343441 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Graviton2 (no-opt)

Details
Benchmark suite Current: 2b012e5 Previous: e5884bb Ratio
ML-DSA-44 keypair 212331 cycles 211560 cycles 1.00
ML-DSA-44 sign 761162 cycles 759550 cycles 1.00
ML-DSA-44 verify 229987 cycles 229101 cycles 1.00
ML-DSA-65 keypair 378737 cycles 377267 cycles 1.00
ML-DSA-65 sign 1248155 cycles 1247098 cycles 1.00
ML-DSA-65 verify 373444 cycles 371496 cycles 1.01
ML-DSA-87 keypair 602568 cycles 600625 cycles 1.00
ML-DSA-87 sign 1584683 cycles 1584726 cycles 1.00
ML-DSA-87 verify 618500 cycles 616267 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions Bot left a comment

Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)

Details
Benchmark suite Current: 2b012e5 Previous: e5884bb Ratio
ML-DSA-44 keypair 314774 cycles 316204 cycles 1.00
ML-DSA-44 sign 1195084 cycles 1200034 cycles 1.00
ML-DSA-44 verify 344766 cycles 346175 cycles 1.00
ML-DSA-65 keypair 586747 cycles 573340 cycles 1.02
ML-DSA-65 sign 1934847 cycles 1950745 cycles 0.99
ML-DSA-65 verify 562256 cycles 540488 cycles 1.04
ML-DSA-87 keypair 872532 cycles 851822 cycles 1.02
ML-DSA-87 sign 2477111 cycles 2431803 cycles 1.02
ML-DSA-87 verify 898628 cycles 891771 cycles 1.01

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions Bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

Benchmark suite Current: 2b012e5 Previous: e5884bb Ratio
ML-DSA-65 verify 562256 cycles 540488 cycles 1.04
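
For reference, the ratio in the alert can be recomputed from the two cycle counts. The sketch below assumes the alert fires when current / previous exceeds the threshold, as the message above suggests; the helper name is illustrative and not part of github-action-benchmark.

```python
# Hypothetical helper for illustration; not part of github-action-benchmark.
ALERT_THRESHOLD = 1.03  # threshold quoted in the alert above


def regression_ratio(current_cycles: int, previous_cycles: int) -> float:
    """Return current / previous; values above 1.0 mean the commit is slower."""
    return current_cycles / previous_cycles


# ML-DSA-65 verify on the Cortex-A72 (no-opt) run, taken from the alert row.
ratio = regression_ratio(562256, 540488)
print(f"ratio = {ratio:.2f}, alert = {ratio > ALERT_THRESHOLD}")
```

The affected routine is x86_64-only, so a 4% swing on an Arm no-opt run is more plausibly measurement noise than an effect of this change, but that is a judgment call rather than something the numbers prove.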

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions Bot left a comment

Arm Cortex-A55 (Snapdragon 888) benchmarks (opt)

Details
Benchmark suite Current: 2b012e5 Previous: e5884bb Ratio
ML-DSA-44 keypair 271763 cycles 271174 cycles 1.00
ML-DSA-44 sign 821008 cycles 824814 cycles 1.00
ML-DSA-44 verify 276025 cycles 275199 cycles 1.00
ML-DSA-65 keypair 465651 cycles 464395 cycles 1.00
ML-DSA-65 sign 1327127 cycles 1344742 cycles 0.99
ML-DSA-65 verify 451152 cycles 453340 cycles 1.00
ML-DSA-87 keypair 803172 cycles 800977 cycles 1.00
ML-DSA-87 sign 1864325 cycles 1859208 cycles 1.00
ML-DSA-87 verify 781387 cycles 774585 cycles 1.01

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions Bot left a comment

Arm Cortex-A55 (Snapdragon 888) benchmarks (no-opt)

Details
Benchmark suite Current: 2b012e5 Previous: e5884bb Ratio
ML-DSA-44 keypair 465923 cycles 463353 cycles 1.01
ML-DSA-44 sign 2142098 cycles 2132217 cycles 1.00
ML-DSA-44 verify 558746 cycles 555186 cycles 1.01
ML-DSA-65 keypair 784684 cycles 783632 cycles 1.00
ML-DSA-65 sign 3495012 cycles 3480344 cycles 1.00
ML-DSA-65 verify 868294 cycles 864386 cycles 1.00
ML-DSA-87 keypair 1269194 cycles 1259586 cycles 1.01
ML-DSA-87 sign 4369517 cycles 4283557 cycles 1.02
ML-DSA-87 verify 1391355 cycles 1383524 cycles 1.01

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions Bot left a comment

SpacemiT K1 8 (Banana Pi F3) benchmarks (no-opt)

Details
Benchmark suite Current: 2b012e5 Previous: e5884bb Ratio
ML-DSA-44 keypair 760366 cycles 759827 cycles 1.00
ML-DSA-44 sign 3140093 cycles 3139576 cycles 1.00
ML-DSA-44 verify 859520 cycles 859029 cycles 1.00
ML-DSA-65 keypair 1284878 cycles 1285683 cycles 1.00
ML-DSA-65 sign 5074016 cycles 5080630 cycles 1.00
ML-DSA-65 verify 1363408 cycles 1364316 cycles 1.00
ML-DSA-87 keypair 2112289 cycles 2112201 cycles 1.00
ML-DSA-87 sign 6350994 cycles 6353149 cycles 1.00
ML-DSA-87 verify 2228031 cycles 2227965 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@jakemas jakemas force-pushed the use-hint-x86-proofs-pr branch 4 times, most recently from 03b65ee to f8c8ba3 Compare June 19, 2026 03:42
@jakemas jakemas marked this pull request as ready for review June 19, 2026 03:56
@jakemas jakemas requested a review from a team as a code owner June 19, 2026 03:56
@jakemas jakemas force-pushed the use-hint-x86-proofs-pr branch 2 times, most recently from 04bb175 to 2f73667 Compare June 24, 2026 17:39
…ritten assembly and HOL-Light proofs

Replaces the poly_use_hint_32 (ML-DSA-65/87) and poly_use_hint_88 (ML-DSA-44)
AVX2 intrinsics with hand-written assembly, and adds HOL-Light functional
correctness and constant-time/memory-safety proofs for both x86_64 routines.

Signed-off-by: Jake Massimo <jakemas@amazon.com>
Co-authored-by: willieyz <willieyz@users.noreply.github.com>
@jakemas jakemas force-pushed the use-hint-x86-proofs-pr branch from 2f73667 to 9e615b9 Compare June 25, 2026 16:51

@mkannwischer mkannwischer left a comment

Thanks @jakemas for moving this forward!

Comment on lines +101 to +103
(* The decompose helper lemmas (A1_BOUND, A1_WRAP, BARRETT_INTERVAL_32) and *)
(* the DIV/MOD equivalence tactics are arch-independent and shared with the *)
(* AArch64 proof of the same routine. *)

If they are shared, can we move them to a shared file, please?

// Compute a1 = round-(a / 523776) ≈ round(a * 1074791425 /
// 2^49), where round-() denotes "round half down". This is
// exact for 0 <= a < Q. Note that half is rounded down since
// 1074791425 / 2^49 ≲ 1 / 523776.

These comments seem to describe the AArch64 code, not the x86_64 code. Can you please adjust them to describe what this code is doing?

Comment on lines +77 to +80
/* Reference: The reference avx2 implementation checks a0 >= 0, which is
* different from the specification and the reference C implementation. We
* follow the specification and check a0 > 0.
*/

This has not been true for about 3 weeks, since pq-crystals/dilithium@bba9534. We should reference that commit here.
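
To make the difference between the two comparisons concrete, here is a small Python model of scalar use_hint for GAMMA2 = (Q-1)/32. It assumes the decompose formulas of the reference C implementation; the function names and the `strict` flag are illustrative and do not appear in the PR. The two variants can only disagree when a0 == 0, for example at a = 0 with the hint bit set.

```python
Q = 8380417
GAMMA2 = (Q - 1) // 32  # ML-DSA-65/87 parameter


def decompose_32(a: int) -> tuple:
    """Split 0 <= a < Q into (a1, a0) with a = a1 * 2 * GAMMA2 + a0 (mod Q)."""
    a1 = (a + 127) >> 7
    a1 = (a1 * 1025 + (1 << 21)) >> 22
    a1 &= 15  # wrap the top bucket (16) back to 0
    a0 = a - a1 * 2 * GAMMA2
    if a0 > (Q - 1) // 2:  # centre a0 around zero
        a0 -= Q
    return a1, a0


def use_hint_32(a: int, hint: int, strict: bool = True) -> int:
    """strict=True compares a0 > 0; strict=False models the a0 >= 0 variant."""
    a1, a0 = decompose_32(a)
    if hint == 0:
        return a1
    positive = a0 > 0 if strict else a0 >= 0
    return (a1 + 1) & 15 if positive else (a1 - 1) & 15


# a = 0 decomposes to a1 = 0, a0 = 0, so the two comparisons pick different neighbours.
print(use_hint_32(0, 1, strict=True), use_hint_32(0, 1, strict=False))
```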

Comment on lines +147 to +155
let BARRETT_INTERVAL_32 = prove(
`!a lo hi k.
lo <= a /\ a <= hi /\
k * 262144 <= (2 * lo * 1074791425) DIV 4294967296 + 131072 /\
(2 * hi * 1074791425) DIV 4294967296 + 131072 < (k + 1) * 262144 /\
k * 4194304 <= (lo + 127) DIV 128 * 1025 + 2097152 /\
(hi + 127) DIV 128 * 1025 + 2097152 < (k + 1) * 4194304
==> ((2 * a * 1074791425) DIV 4294967296 + 131072) DIV 262144 = k /\
((a + 127) DIV 128 * 1025 + 2097152) DIV 4194304 = k`,

I find this rather confusing: the theorem seems to mix the AArch64 branches (k * 262144 <= (2 * lo * 1074791425) DIV 4294967296 + 131072) with the x86 branches (k * 4194304 <= (lo + 127) DIV 128 * 1025 + 2097152), but the AArch64 branches are dead in this proof.
It seems easiest to remove the dead branches and to stop claiming that this theorem is shared with AArch64.
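
As a numeric spot check on the two quotient formulas in the theorem's conclusion, the sketch below evaluates both around the rounding boundary at 261888 (half of 523776). The function names are made up for illustration, and which formula belongs to which architecture is taken from the comment above. This only samples a few points and is no substitute for the HOL-Light proof.

```python
def a1_mulhi(a: int) -> int:
    """Multiply-high form: (2*a*1074791425) DIV 2^32, then rounding shift by 18."""
    return ((2 * a * 1074791425) // 4294967296 + 131072) // 262144


def a1_shift(a: int) -> int:
    """Shift form: (a + 127) DIV 128, times 1025, then rounding shift by 22."""
    return ((a + 127) // 128 * 1025 + 2097152) // 4194304


def first_mismatch(lo: int, hi: int):
    """Return the first a in [lo, hi] where the two formulas differ, or None."""
    for a in range(lo, hi + 1):
        if a1_mulhi(a) != a1_shift(a):
            return a
    return None


# Both formulas are non-decreasing in a, which is why checking the endpoints
# of an interval [lo, hi] pins down the quotient k on the whole interval.
for a in (0, 261888, 261889, 523776):
    print(a, a1_mulhi(a), a1_shift(a))
```

At these sample points both forms round the half-way value 261888 down to 0 and move to 1 at 261889.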
