Speed up C-reference NTT/invNTT with twisted zetas + 2-layer merging#1118

Draft: hanno-becker wants to merge 5 commits into main from c_ntt_2

@hanno-becker hanno-becker commented May 12, 2026

Replace the single-layer C-reference forward and inverse NTT in
mldsa/src/poly.c with implementations that merge two layers at a time.
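
As a rough illustration of what merging two layers means, the sketch below applies two consecutive Cooley-Tukey layers to four coefficients held in local variables, so each coefficient is loaded and stored once instead of twice. It is not the code from this PR: the names `butterfly`, `ntt_layer_sketch` and `ntt_2_layers_sketch`, the plain `%`-based modular multiply, and the twiddle layout (`z1` indexed per outer block, `z2` per sub-block) are all assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

#define Q 8380417 /* ML-DSA modulus */

/* Plain modular multiply; a real implementation would use Montgomery
 * arithmetic instead. Result lies in (-Q, Q). */
static int32_t mulmod(int32_t a, int32_t b)
{
  return (int32_t)(((int64_t)a * b) % Q);
}

/* One Cooley-Tukey butterfly: (x, y) -> (x + z*y, x - z*y) mod Q. */
static void butterfly(int32_t *x, int32_t *y, int32_t z)
{
  int32_t t = mulmod(z, *y);
  *y = (*x - t) % Q;
  *x = (*x + t) % Q;
}

/* Hypothetical single-layer pass: blocks of size 2*len, one twiddle each. */
static void ntt_layer_sketch(int32_t *r, size_t n, size_t len,
                             const int32_t *z)
{
  size_t start, j, k = 0;
  for (start = 0; start < n; start += 2 * len, k++)
  {
    for (j = start; j < start + len; j++)
    {
      butterfly(&r[j], &r[j + len], z[k]);
    }
  }
}

/* Hypothetical merged pass: layers `len` and `len / 2` in one sweep.
 * Assumes len is even and n is a multiple of 2*len. */
static void ntt_2_layers_sketch(int32_t *r, size_t n, size_t len,
                                const int32_t *z1, const int32_t *z2)
{
  size_t start, j, k = 0, h = len / 2;
  for (start = 0; start < n; start += 2 * len, k++)
  {
    for (j = start; j < start + h; j++)
    {
      /* Load four coefficients once. */
      int32_t c0 = r[j], c1 = r[j + h];
      int32_t c2 = r[j + len], c3 = r[j + len + h];
      /* First layer: distance len, shared twiddle z1[k]. */
      butterfly(&c0, &c2, z1[k]);
      butterfly(&c1, &c3, z1[k]);
      /* Second layer: distance len/2, one twiddle per sub-block. */
      butterfly(&c0, &c1, z2[2 * k]);
      butterfly(&c2, &c3, z2[2 * k + 1]);
      /* Store four coefficients once. */
      r[j] = c0;
      r[j + h] = c1;
      r[j + len] = c2;
      r[j + len + h] = c3;
    }
  }
}
```

Calling `ntt_layer_sketch` twice (with `len`, then `len / 2`) and calling `ntt_2_layers_sketch` once produce identical arrays for any twiddle values, because the merged loop performs the same butterflies in a different order.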

Also, store each twiddle alongside its precomputed twist, letting
mld_fqmul(a, b, b_twisted) drop the multiplication by MLDSA_Q^{-1}
that was previously hidden inside mld_montgomery_reduce.
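
The idea can be sketched as follows, assuming a Montgomery radix of 2^32 and taking the twist of a constant `b` to be `b * Q^{-1} mod 2^32`. The function names and the `QINV` macro are hypothetical stand-ins rather than the PR's code. Only the modulus 8380417 and the 32-bit inverse 58728449 are standard ML-DSA constants.

```c
#include <stdint.h>

#define Q 8380417      /* ML-DSA modulus */
#define QINV 58728449u /* Q^{-1} mod 2^32 */

/* Classic Montgomery reduction: returns a * 2^{-32} mod Q.
 * The uint32_t -> int32_t casts assume two's-complement wraparound
 * (implementation-defined before C23, but universal in practice). */
static int32_t montgomery_reduce_sketch(int64_t a)
{
  /* Low 32 bits of a * Q^{-1}: this is the multiply the twist removes. */
  int32_t t = (int32_t)((uint32_t)a * QINV);
  /* a - t*Q is an exact multiple of 2^32, so the division is exact. */
  return (int32_t)((a - (int64_t)t * Q) / 4294967296LL);
}

/* Precompute the twist of a constant (e.g. a twiddle factor). */
static int32_t twist_sketch(int32_t b)
{
  return (int32_t)((uint32_t)b * QINV);
}

/* Multiply by a constant whose twist is known. The low half of
 * a * b * Q^{-1} is obtained from a * b_twisted directly, so no
 * multiplication by QINV happens at run time. */
static int32_t fqmul_twisted_sketch(int32_t a, int32_t b, int32_t b_twisted)
{
  int32_t t = (int32_t)((uint32_t)a * (uint32_t)b_twisted);
  return (int32_t)(((int64_t)a * b - (int64_t)t * Q) / 4294967296LL);
}
```

Both paths compute the same `t`, since `(a * b) * QINV` and `a * (b * QINV)` agree modulo 2^32. `fqmul_twisted_sketch(a, b, twist_sketch(b))` therefore returns the same value as `montgomery_reduce_sketch((int64_t)a * b)`, while trading one run-time multiplication for one extra table entry per twiddle.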

Mirrors pq-code-package/mlkem-native/463 (@rod-chapman) and pq-code-package/mlkem-native/683

@hanno-becker hanno-becker changed the title [WIP] Speed up C-reference NTT/invNTT with twisted zetas + 2-layer me… [WIP] Speed up C-reference NTT/invNTT with twisted zetas + 2-layer merging May 12, 2026

@github-actions github-actions Bot left a comment

Mac Mini (M1, 2020) benchmarks (opt)

Benchmark suite Current: 38f3f91 Previous: a71b5d2 Ratio
ML-DSA-44 keypair 46538 cycles 46536 cycles 1.00
ML-DSA-44 sign 131062 cycles 131058 cycles 1.00
ML-DSA-44 verify 47344 cycles 47346 cycles 1.00
ML-DSA-65 keypair 81686 cycles 81682 cycles 1.00
ML-DSA-65 sign 215381 cycles 215367 cycles 1.00
ML-DSA-65 verify 79305 cycles 79306 cycles 1.00
ML-DSA-87 keypair 132409 cycles 132411 cycles 1.00
ML-DSA-87 sign 277469 cycles 277415 cycles 1.00
ML-DSA-87 verify 134241 cycles 134234 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions Bot left a comment

Mac Mini (M1, 2020) benchmarks (no-opt)

Benchmark suite Current: 38f3f91 Previous: a71b5d2 Ratio
ML-DSA-44 keypair 113288 cycles 112746 cycles 1.00
ML-DSA-44 sign 403510 cycles 400854 cycles 1.01
ML-DSA-44 verify 121569 cycles 120116 cycles 1.01
ML-DSA-65 keypair 193992 cycles 192886 cycles 1.01
ML-DSA-65 sign 651150 cycles 649888 cycles 1.00
ML-DSA-65 verify 194758 cycles 192947 cycles 1.01
ML-DSA-87 keypair 319376 cycles 318753 cycles 1.00
ML-DSA-87 sign 831047 cycles 828832 cycles 1.00
ML-DSA-87 verify 329038 cycles 326641 cycles 1.01

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Intel Xeon 4th gen (c7i)

Benchmark suite Current: 38f3f91 Previous: a71b5d2 Ratio
ML-DSA-44 keypair 45392 cycles 45452 cycles 1.00
ML-DSA-44 sign 136338 cycles 136127 cycles 1.00
ML-DSA-44 verify 47463 cycles 47248 cycles 1.00
ML-DSA-65 keypair 78478 cycles 78548 cycles 1.00
ML-DSA-65 sign 221924 cycles 222310 cycles 1.00
ML-DSA-65 verify 77951 cycles 77415 cycles 1.01
ML-DSA-87 keypair 126284 cycles 124515 cycles 1.01
ML-DSA-87 sign 279614 cycles 275775 cycles 1.01
ML-DSA-87 verify 123991 cycles 122738 cycles 1.01

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions Bot left a comment

SpacemiT K1 8 (Banana Pi F3) benchmarks (no-opt)

Benchmark suite Current: 38f3f91 Previous: a71b5d2 Ratio
ML-DSA-44 keypair 719602 cycles 820317 cycles 0.88
ML-DSA-44 sign 2246699 cycles 3224057 cycles 0.70
ML-DSA-44 verify 762835 cycles 917185 cycles 0.83
ML-DSA-65 keypair 1251304 cycles 1391201 cycles 0.90
ML-DSA-65 sign 3627167 cycles 5232394 cycles 0.69
ML-DSA-65 verify 1252102 cycles 1464903 cycles 0.85
ML-DSA-87 keypair 2106818 cycles 2299598 cycles 0.92
ML-DSA-87 sign 4794070 cycles 6620374 cycles 0.72
ML-DSA-87 verify 2120585 cycles 2408309 cycles 0.88

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Intel Xeon 4th gen (c7i) (no-opt)

Benchmark suite Current: 38f3f91 Previous: a71b5d2 Ratio
ML-DSA-44 keypair 94177 cycles 94270 cycles 1.00
ML-DSA-44 sign 310203 cycles 329827 cycles 0.94
ML-DSA-44 verify 97031 cycles 98781 cycles 0.98
ML-DSA-65 keypair 158476 cycles 161555 cycles 0.98
ML-DSA-65 sign 500936 cycles 538788 cycles 0.93
ML-DSA-65 verify 157551 cycles 160081 cycles 0.98
ML-DSA-87 keypair 261642 cycles 264477 cycles 0.99
ML-DSA-87 sign 650338 cycles 695417 cycles 0.94
ML-DSA-87 verify 260657 cycles 266020 cycles 0.98

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

AMD EPYC 3rd gen (c6a)

Benchmark suite Current: 38f3f91 Previous: a71b5d2 Ratio
ML-DSA-44 keypair 56284 cycles 57221 cycles 0.98
ML-DSA-44 sign 167843 cycles 166930 cycles 1.01
ML-DSA-44 verify 60027 cycles 58283 cycles 1.03
ML-DSA-65 keypair 99358 cycles 96734 cycles 1.03
ML-DSA-65 sign 272455 cycles 270287 cycles 1.01
ML-DSA-65 verify 99857 cycles 97285 cycles 1.03
ML-DSA-87 keypair 158208 cycles 161661 cycles 0.98
ML-DSA-87 sign 334417 cycles 335089 cycles 1.00
ML-DSA-87 verify 157879 cycles 153800 cycles 1.03

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'AMD EPYC 3rd gen (c6a)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

Benchmark suite Current: dfda657 Previous: a71b5d2 Ratio
ML-DSA-87 verify 158518 cycles 153800 cycles 1.03

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

AMD EPYC 4th gen (c7a)

Benchmark suite Current: 38f3f91 Previous: a71b5d2 Ratio
ML-DSA-44 keypair 46964 cycles 47162 cycles 1.00
ML-DSA-44 sign 144418 cycles 144655 cycles 1.00
ML-DSA-44 verify 49951 cycles 50104 cycles 1.00
ML-DSA-65 keypair 84031 cycles 83041 cycles 1.01
ML-DSA-65 sign 232816 cycles 229850 cycles 1.01
ML-DSA-65 verify 83951 cycles 83119 cycles 1.01
ML-DSA-87 keypair 131766 cycles 131179 cycles 1.00
ML-DSA-87 sign 281804 cycles 281956 cycles 1.00
ML-DSA-87 verify 129740 cycles 129801 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

AMD EPYC 3rd gen (c6a) (no-opt)

Benchmark suite Current: 38f3f91 Previous: a71b5d2 Ratio
ML-DSA-44 keypair 130860 cycles 133677 cycles 0.98
ML-DSA-44 sign 492691 cycles 522396 cycles 0.94
ML-DSA-44 verify 142552 cycles 146685 cycles 0.97
ML-DSA-65 keypair 219946 cycles 223803 cycles 0.98
ML-DSA-65 sign 797000 cycles 850834 cycles 0.94
ML-DSA-65 verify 227832 cycles 233807 cycles 0.97
ML-DSA-87 keypair 366825 cycles 375278 cycles 0.98
ML-DSA-87 sign 1017924 cycles 1083775 cycles 0.94
ML-DSA-87 verify 377803 cycles 387875 cycles 0.97

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Intel Xeon 3rd gen (c6i)

Benchmark suite Current: 38f3f91 Previous: a71b5d2 Ratio
ML-DSA-44 keypair 61870 cycles 61851 cycles 1.00
ML-DSA-44 sign 190809 cycles 191604 cycles 1.00
ML-DSA-44 verify 66296 cycles 66346 cycles 1.00
ML-DSA-65 keypair 111571 cycles 116244 cycles 0.96
ML-DSA-65 sign 320974 cycles 322314 cycles 1.00
ML-DSA-65 verify 111271 cycles 113021 cycles 0.98
ML-DSA-87 keypair 172676 cycles 172777 cycles 1.00
ML-DSA-87 sign 380745 cycles 384407 cycles 0.99
ML-DSA-87 verify 172559 cycles 174711 cycles 0.99

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

AMD EPYC 4th gen (c7a) (no-opt)

Benchmark suite Current: 38f3f91 Previous: a71b5d2 Ratio
ML-DSA-44 keypair 115084 cycles 118552 cycles 0.97
ML-DSA-44 sign 407771 cycles 446050 cycles 0.91
ML-DSA-44 verify 123860 cycles 128938 cycles 0.96
ML-DSA-65 keypair 195940 cycles 202120 cycles 0.97
ML-DSA-65 sign 650499 cycles 718282 cycles 0.91
ML-DSA-65 verify 200657 cycles 207260 cycles 0.97
ML-DSA-87 keypair 325260 cycles 334395 cycles 0.97
ML-DSA-87 sign 836275 cycles 919394 cycles 0.91
ML-DSA-87 verify 331657 cycles 342499 cycles 0.97

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Graviton4

Benchmark suite Current: 38f3f91 Previous: a71b5d2 Ratio
ML-DSA-44 keypair 67289 cycles 67284 cycles 1.00
ML-DSA-44 sign 201443 cycles 201465 cycles 1.00
ML-DSA-44 verify 70197 cycles 70236 cycles 1.00
ML-DSA-65 keypair 119340 cycles 119592 cycles 1.00
ML-DSA-65 sign 327977 cycles 328455 cycles 1.00
ML-DSA-65 verify 116780 cycles 116975 cycles 1.00
ML-DSA-87 keypair 196703 cycles 196660 cycles 1.00
ML-DSA-87 sign 425032 cycles 424673 cycles 1.00
ML-DSA-87 verify 193211 cycles 193003 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Intel Xeon 3rd gen (c6i) (no-opt)

Benchmark suite Current: 38f3f91 Previous: a71b5d2 Ratio
ML-DSA-44 keypair 144207 cycles 150243 cycles 0.96
ML-DSA-44 sign 482947 cycles 543993 cycles 0.89
ML-DSA-44 verify 152571 cycles 162793 cycles 0.94
ML-DSA-65 keypair 248619 cycles 253828 cycles 0.98
ML-DSA-65 sign 797728 cycles 879250 cycles 0.91
ML-DSA-65 verify 250855 cycles 261051 cycles 0.96
ML-DSA-87 keypair 414648 cycles 428028 cycles 0.97
ML-DSA-87 sign 1021025 cycles 1133779 cycles 0.90
ML-DSA-87 verify 417342 cycles 438707 cycles 0.95

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions Bot left a comment

Arm Cortex-A76 (Raspberry Pi 5) benchmarks (opt)

Benchmark suite Current: 38f3f91 Previous: a71b5d2 Ratio
ML-DSA-44 keypair 112457 cycles 112463 cycles 1.00
ML-DSA-44 sign 354616 cycles 354285 cycles 1.00
ML-DSA-44 verify 117054 cycles 117088 cycles 1.00
ML-DSA-65 keypair 194541 cycles 194650 cycles 1.00
ML-DSA-65 sign 584282 cycles 584287 cycles 1.00
ML-DSA-65 verify 193240 cycles 192995 cycles 1.00
ML-DSA-87 keypair 320612 cycles 321252 cycles 1.00
ML-DSA-87 sign 748693 cycles 749933 cycles 1.00
ML-DSA-87 verify 317879 cycles 318651 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Graviton4 (no-opt)

Benchmark suite Current: 38f3f91 Previous: a71b5d2 Ratio
ML-DSA-44 keypair 125226 cycles 128439 cycles 0.97
ML-DSA-44 sign 421018 cycles 444902 cycles 0.95
ML-DSA-44 verify 134116 cycles 136577 cycles 0.98
ML-DSA-65 keypair 216986 cycles 220139 cycles 0.99
ML-DSA-65 sign 681415 cycles 718637 cycles 0.95
ML-DSA-65 verify 218343 cycles 221218 cycles 0.99
ML-DSA-87 keypair 361874 cycles 365464 cycles 0.99
ML-DSA-87 sign 886549 cycles 917775 cycles 0.97
ML-DSA-87 verify 368411 cycles 371436 cycles 0.99

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Graviton3

Benchmark suite Current: 38f3f91 Previous: a71b5d2 Ratio
ML-DSA-44 keypair 71566 cycles 71503 cycles 1.00
ML-DSA-44 sign 211564 cycles 211366 cycles 1.00
ML-DSA-44 verify 74848 cycles 74967 cycles 1.00
ML-DSA-65 keypair 125946 cycles 125922 cycles 1.00
ML-DSA-65 sign 347535 cycles 348013 cycles 1.00
ML-DSA-65 verify 123867 cycles 124042 cycles 1.00
ML-DSA-87 keypair 206188 cycles 206707 cycles 1.00
ML-DSA-87 sign 443030 cycles 447437 cycles 0.99
ML-DSA-87 verify 204440 cycles 204174 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Graviton3 (no-opt)

Benchmark suite Current: 38f3f91 Previous: a71b5d2 Ratio
ML-DSA-44 keypair 135363 cycles 137989 cycles 0.98
ML-DSA-44 sign 457401 cycles 481848 cycles 0.95
ML-DSA-44 verify 145386 cycles 148733 cycles 0.98
ML-DSA-65 keypair 237653 cycles 240592 cycles 0.99
ML-DSA-65 sign 742121 cycles 785306 cycles 0.95
ML-DSA-65 verify 236161 cycles 241073 cycles 0.98
ML-DSA-87 keypair 390893 cycles 395138 cycles 0.99
ML-DSA-87 sign 958819 cycles 1005113 cycles 0.95
ML-DSA-87 verify 396238 cycles 403185 cycles 0.98

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions Bot left a comment

Arm Cortex-A76 (Raspberry Pi 5) benchmarks (no-opt)

Benchmark suite Current: 38f3f91 Previous: a71b5d2 Ratio
ML-DSA-44 keypair 204952 cycles 212493 cycles 0.96
ML-DSA-44 sign 690273 cycles 756435 cycles 0.91
ML-DSA-44 verify 218086 cycles 229158 cycles 0.95
ML-DSA-65 keypair 369004 cycles 378664 cycles 0.97
ML-DSA-65 sign 1129113 cycles 1240500 cycles 0.91
ML-DSA-65 verify 356558 cycles 372168 cycles 0.96
ML-DSA-87 keypair 589071 cycles 602034 cycles 0.98
ML-DSA-87 sign 1454044 cycles 1579603 cycles 0.92
ML-DSA-87 verify 596654 cycles 618336 cycles 0.96

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Graviton2

Benchmark suite Current: 38f3f91 Previous: a71b5d2 Ratio
ML-DSA-44 keypair 112686 cycles 112405 cycles 1.00
ML-DSA-44 sign 356052 cycles 354779 cycles 1.00
ML-DSA-44 verify 117667 cycles 117271 cycles 1.00
ML-DSA-65 keypair 194353 cycles 194498 cycles 1.00
ML-DSA-65 sign 585143 cycles 584927 cycles 1.00
ML-DSA-65 verify 193113 cycles 193003 cycles 1.00
ML-DSA-87 keypair 321039 cycles 321197 cycles 1.00
ML-DSA-87 sign 749458 cycles 749906 cycles 1.00
ML-DSA-87 verify 318256 cycles 318296 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions Bot left a comment

Arm Cortex-A55 (Snapdragon 888) benchmarks (opt)

Benchmark suite Current: 38f3f91 Previous: a71b5d2 Ratio
ML-DSA-44 keypair 270334 cycles 270813 cycles 1.00
ML-DSA-44 sign 814667 cycles 814217 cycles 1.00
ML-DSA-44 verify 274970 cycles 273907 cycles 1.00
ML-DSA-65 keypair 467712 cycles 467318 cycles 1.00
ML-DSA-65 sign 1367463 cycles 1320861 cycles 1.04
ML-DSA-65 verify 456340 cycles 451480 cycles 1.01
ML-DSA-87 keypair 805783 cycles 802075 cycles 1.00
ML-DSA-87 sign 1881318 cycles 1880613 cycles 1.00
ML-DSA-87 verify 787853 cycles 779252 cycles 1.01

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Graviton2 (no-opt)

Benchmark suite Current: 38f3f91 Previous: a71b5d2 Ratio
ML-DSA-44 keypair 206083 cycles 212221 cycles 0.97
ML-DSA-44 sign 691174 cycles 758032 cycles 0.91
ML-DSA-44 verify 218535 cycles 229778 cycles 0.95
ML-DSA-65 keypair 369344 cycles 378417 cycles 0.98
ML-DSA-65 sign 1129681 cycles 1241106 cycles 0.91
ML-DSA-65 verify 356857 cycles 372482 cycles 0.96
ML-DSA-87 keypair 589415 cycles 603782 cycles 0.98
ML-DSA-87 sign 1455802 cycles 1581844 cycles 0.92
ML-DSA-87 verify 596976 cycles 618440 cycles 0.97

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions Bot left a comment

Arm Cortex-A55 (Snapdragon 888) benchmarks (no-opt)

Benchmark suite Current: 38f3f91 Previous: a71b5d2 Ratio
ML-DSA-44 keypair 391571 cycles 458494 cycles 0.85
ML-DSA-44 sign 1483786 cycles 2126863 cycles 0.70
ML-DSA-44 verify 443220 cycles 552683 cycles 0.80
ML-DSA-65 keypair 676713 cycles 770631 cycles 0.88
ML-DSA-65 sign 2440668 cycles 3460057 cycles 0.71
ML-DSA-65 verify 707703 cycles 857490 cycles 0.83
ML-DSA-87 keypair 1128071 cycles 1249666 cycles 0.90
ML-DSA-87 sign 3173423 cycles 4303345 cycles 0.74
ML-DSA-87 verify 1174442 cycles 1370001 cycles 0.86

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions Bot left a comment

Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)

Benchmark suite Current: 38f3f91 Previous: a71b5d2 Ratio
ML-DSA-44 keypair 223259 cycles 222102 cycles 1.01
ML-DSA-44 sign 616370 cycles 622325 cycles 0.99
ML-DSA-44 verify 223427 cycles 227406 cycles 0.98
ML-DSA-65 keypair 396333 cycles 385188 cycles 1.03
ML-DSA-65 sign 1033678 cycles 1017117 cycles 1.02
ML-DSA-65 verify 378398 cycles 371026 cycles 1.02
ML-DSA-87 keypair 656252 cycles 657858 cycles 1.00
ML-DSA-87 sign 1362649 cycles 1413224 cycles 0.96
ML-DSA-87 verify 638462 cycles 647577 cycles 0.99

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions Bot left a comment

Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)

Benchmark suite Current: 38f3f91 Previous: a71b5d2 Ratio
ML-DSA-44 keypair 290502 cycles 317617 cycles 0.91
ML-DSA-44 sign 980030 cycles 1205867 cycles 0.81
ML-DSA-44 verify 311188 cycles 362564 cycles 0.86
ML-DSA-65 keypair 549669 cycles 577543 cycles 0.95
ML-DSA-65 sign 1648606 cycles 1961272 cycles 0.84
ML-DSA-65 verify 505767 cycles 556667 cycles 0.91
ML-DSA-87 keypair 835538 cycles 912005 cycles 0.92
ML-DSA-87 sign 2090280 cycles 2489549 cycles 0.84
ML-DSA-87 verify 852209 cycles 953121 cycles 0.89

This comment was automatically generated by workflow using github-action-benchmark.

oqs-bot commented May 12, 2026

CBMC Results (ML-DSA-87, REDUCE-RAM)

⚠️ Attention Required

Proof Status Current Previous Change
TOTAL ⚠️ 2949s 1624s +81.6%
fqscale - 5s -
poly_invntt_tomont_c ⚠️ 21s 8s +162%
Full Results (206 proofs)
Proof Status Current Previous Change
TOTAL ⚠️ 2949s 1624s +81.6%
mld_invntt_2_layers_block 665s - new
mld_ntt_2_layers_block 536s - new
mld_ntt_2_layers 273s - new
polyvec_matrix_pointwise_montgomery_yvec 155s 162s -4%
rej_uniform_native 141s 124s +14%
mld_invntt_2_layers 94s - new
poly_pointwise_montgomery_c 94s 138s -32%
mld_ct_memcmp 65s 71s -8%
fqmul 43s 43s +0%
mld_attempt_signature_generation 34s 34s +0%
sign_verify_internal 25s 21s +19%
keccakf1600x4_permute_native 24s 21s +14%
poly_invntt_tomont_c ⚠️ 21s 8s +162%
polyt0_unpack 18s 17s +6%
polyveck_decompose 17s 17s +0%
rej_uniform 15s 9s +67%
polyeta_unpack 14s 14s +0%
rej_uniform_c 13s 16s -19%
poly_chknorm_c 12s 20s -40%
poly_uniform_eta_4x 12s 14s -14%
mld_check_pct 11s 14s -21%
poly_add 11s 12s -8%
polyvecl_chknorm 11s 11s +0%
compute_pack_t0_t1 9s 11s -18%
keccak_absorb_once_x4 9s 7s +29%
mld_sample_s1_s2_serial 8s 4s +100%
pointwise_acc_native_aarch64 8s 7s +14%
polyvecl_ntt 8s 7s +14%
sign 8s 6s +33%
mld_keccakf1600_permute_c 7s 6s +17%
pointwise_acc_native_x86_64 7s 6s +17%
poly_uniform_gamma1 7s 4s +75%
polyveck_caddq 7s 6s +17%
keccak_absorb 6s 6s +0%
mld_keccakf1600_extract_bytes 6s 4s +50%
mld_sample_s1_s2 6s 7s -14%
poly_caddq_c 6s 3s +100%
poly_caddq_native_x86_64 6s 4s +50%
poly_use_hint_native 6s 2s +200%
polyt0_pack 6s 5s +20%
polyvec_matrix_pointwise_montgomery_row 6s 8s -25%
polyveck_chknorm 6s 2s +200%
polyveck_invntt_tomont 6s 6s +0%
sign_signature_extmu 6s 4s +50%
make_hint 5s 5s +0%
mld_compute_pack_z 5s 5s +0%
mld_prepare_domain_separation_prefix 5s 5s +0%
mld_value_barrier_i64 5s 2s +150%
mld_value_barrier_u32 5s 4s +25%
ntt_native_aarch64 5s 4s +25%
poly_challenge 5s 4s +25%
poly_ntt_native 5s 3s +67%
poly_uniform 5s 4s +25%
polyvecl_pointwise_acc_montgomery 5s 4s +25%
polyz_unpack_c 5s 5s +0%
power2round 5s 3s +67%
shake256x4_squeezeblocks 5s 2s +150%
sign_pk_from_sk 5s 5s +0%
sign_signature 5s 3s +67%
sign_signature_internal 5s 6s -17%
sign_signature_pre_hash_internal 5s 2s +150%
sign_verify_extmu 5s 4s +25%
yvec_init 5s 2s +150%
caddq 4s 3s +33%
intt_native_x86_64 4s 4s +0%
keccak_squeezeblocks_x4 4s 5s -20%
keccakf1600_permute_native 4s 2s +100%
mld_ct_cmask_nonzero_u32 4s 2s +100%
mld_keccakf1600x4_xor_bytes_c 4s 5s -20%
mld_polymat_expand_entry 4s 3s +33%
montgomery_reduce 4s 4s +0%
nttunpack_native_x86_64 4s 2s +100%
pack_sig_h 4s 4s +0%
poly_chknorm_native_x86_64 4s 3s +33%
poly_decompose_32_native_aarch64 4s 3s +33%
poly_decompose_c 4s 4s +0%
poly_decompose_native 4s 4s +0%
poly_permute_bitrev_to_custom_optional 4s 2s +100%
poly_power2round 4s 4s +0%
poly_reduce 4s 5s -20%
poly_shiftl 4s 3s +33%
poly_uniform_4x 4s 3s +33%
poly_uniform_eta 4s 4s +0%
poly_uniform_gamma1_4x 4s 2s +100%
polyeta_pack 4s 2s +100%
polyvec_matrix_expand_serial 4s 2s +100%
polyveck_reduce 4s 6s -33%
polyvecl_pack_eta 4s 4s +0%
polyvecl_uniform_gamma1 4s 2s +100%
polyw1_pack 4s 4s +0%
polyw1_pack_88 4s 3s +33%
polyz_pack 4s 4s +0%
polyz_unpack_native 4s 5s -20%
rej_eta 4s 2s +100%
rej_eta_c 4s 3s +33%
rej_uniform_native_aarch64 4s 5s -20%
shake128_finalize 4s 3s +33%
shake256_squeeze 4s 2s +100%
shake256x4_absorb_once 4s 2s +100%
sig_unpack_hints 4s 2s +100%
sign_keypair 4s 6s -33%
sign_signature_pre_hash_shake256 4s 7s -43%
sk_t0hat_get_poly 4s 8s -50%
unpack_sk_s1hat 4s 4s +0%
yvec_get_poly 4s 1s +300%
keccak_f1600_x1_native_aarch64 3s 2s +50%
keccak_squeeze 3s 2s +50%
keccakf1600_extract_bytes (big endian) 3s 5s -40%
keccakf1600_xor_bytes 3s 1s +200%
keccakf1600x4_extract_bytes 3s 3s +0%
keccakf1600x4_extract_bytes_native 3s 2s +50%
mld_ct_cmask_neg_i32 3s 2s +50%
mld_ct_cmask_nonzero_u8 3s 5s -40%
mld_ct_get_optblocker_u8 3s 2s +50%
pack_sig_z 3s 5s -40%
pack_sk_rho_key_tr_s2 3s 4s -25%
pack_sk_s1 3s 2s +50%
poly_caddq_native 3s 2s +50%
poly_caddq_native_aarch64 3s 1s +200%
poly_chknorm 3s 5s -40%
poly_chknorm_native 3s 4s -25%
poly_chknorm_native_aarch64 3s 3s +0%
poly_decompose_88_native_aarch64 3s 2s +50%
poly_invntt_tomont 3s 2s +50%
poly_invntt_tomont_native 3s 4s -25%
poly_pointwise_montgomery 3s 3s +0%
poly_use_hint_native_aarch64 3s 2s +50%
polyt1_pack 3s 4s -25%
polyt1_unpack 3s 6s -50%
polyveck_pack_eta 3s 4s -25%
polyveck_pack_w1 3s 4s -25%
polyveck_unpack_eta 3s 4s -25%
polyvecl_pointwise_acc_montgomery_native 3s 4s -25%
polyvecl_uniform_gamma1_serial 3s 2s +50%
polyz_unpack 3s 4s -25%
reduce32 3s 4s -25%
rej_eta_native 3s 4s -25%
shake128x4_squeezeblocks 3s 3s +0%
shake256_absorb 3s 1s +200%
shake256_init 3s 4s -25%
sign_keypair_internal 3s 5s -40%
sign_open 3s 6s -50%
sign_verify 3s 4s -25%
sign_verify_pre_hash_shake256 3s 2s +50%
sys_check_capability 3s 3s +0%
unpack_sk 3s 4s -25%
unpack_sk_s2hat 3s 3s +0%
use_hint 3s 2s +50%
decompose 2s 2s +0%
intt_native_aarch64 2s 4s -50%
keccak_f1600_x1_native_aarch64_v84a 2s 3s -33%
keccak_f1600_x4_native_aarch64_v84a 2s 2s +0%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 2s 2s +0%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 2s 2s +0%
keccak_f1600_x4_native_avx2 2s 4s -50%
keccak_finalize 2s 2s +0%
keccak_init 2s 3s -33%
keccakf1600_permute 2s 2s +0%
keccakf1600_xor_bytes (big endian) 2s 2s +0%
keccakf1600x4_permute 2s 2s +0%
keccakf1600x4_xor_bytes_native 2s 2s +0%
mld_ct_abs_i32 2s 2s +0%
mld_ct_get_optblocker_u32 2s 3s -33%
mld_ct_sel_int32 2s 1s +100%
mld_h 2s 4s -50%
mld_keccakf1600x4_extract_bytes_c 2s 2s +0%
mld_value_barrier_u8 2s 3s -33%
ntt_native_x86_64 2s 1s +100%
pointwise_native_aarch64 2s 2s +0%
pointwise_native_x86_64 2s 3s -33%
poly_caddq 2s 2s +0%
poly_decompose 2s 2s +0%
poly_ntt 2s 2s +0%
poly_ntt_c 2s 3s -33%
poly_permute_bitrev_to_custom_optional_native 2s 3s -33%
poly_pointwise_montgomery_native 2s 2s +0%
poly_sub 2s 3s -33%
poly_use_hint_c 2s 2s +0%
polyveck_ntt 2s 3s -33%
polyvecl_pointwise_acc_montgomery_c 2s 4s -50%
polyvecl_unpack_eta 2s 2s +0%
polyvecl_unpack_z 2s 5s -60%
polyw1_pack_32 2s 4s -50%
polyz_unpack_17_native_aarch64 2s 2s +0%
polyz_unpack_19_native_aarch64 2s 3s -33%
polyz_unpack_native_x86_64 2s 4s -50%
rej_uniform_eta_native_aarch64 2s 2s +0%
shake128_absorb 2s 2s +0%
shake128_init 2s 3s -33%
shake128_release 2s 3s -33%
shake128_squeeze 2s 2s +0%
shake128x4_absorb_once 2s 2s +0%
shake256 2s 1s +100%
shake256_finalize 2s 2s +0%
shake256_release 2s 2s +0%
sign_verify_pre_hash_internal 2s 2s +0%
sk_s1hat_get_poly 2s 2s +0%
sk_s2hat_get_poly 2s 1s +100%
unpack_pk_t1 2s 4s -50%
unpack_sk_t0hat 2s 2s +0%
fqscale - 5s -
keccakf1600x4_xor_bytes 1s 2s -50%
mld_ct_get_optblocker_i64 1s 1s +0%
pack_sig_c 1s 2s -50%
poly_use_hint 1s 4s -75%
polyvec_matrix_expand 1s 1s +0%

oqs-bot commented May 12, 2026

CBMC Results (ML-DSA-65, REDUCE-RAM)

⚠️ Attention Required

Proof Status Current Previous Change
TOTAL ⚠️ 3080s 1583s +94.6%
fqscale - 3s -
poly_invntt_tomont_c ⚠️ 25s 10s +150%
Full Results (206 proofs)
Proof Status Current Previous Change
TOTAL ⚠️ 3080s 1583s +94.6%
mld_invntt_2_layers_block 723s - new
mld_ntt_2_layers_block 610s - new
mld_ntt_2_layers 297s - new
rej_uniform_native 151s 129s +17%
mld_invntt_2_layers 104s - new
poly_pointwise_montgomery_c 100s 141s -29%
polyvec_matrix_pointwise_montgomery_yvec 87s 83s +5%
mld_ct_memcmp 72s 66s +9%
fqmul 45s 43s +5%
polyveck_chknorm 40s 39s +3%
poly_invntt_tomont_c ⚠️ 25s 10s +150%
keccakf1600x4_permute_native 24s 24s +0%
mld_attempt_signature_generation 22s 21s +5%
polyt0_unpack 18s 15s +20%
polyveck_decompose 17s 18s -6%
mld_check_pct 15s 16s -6%
rej_uniform_c 15s 17s -12%
polyvecl_chknorm 14s 16s -12%
sign_verify_internal 14s 16s -12%
rej_uniform 13s 9s +44%
poly_add 12s 11s +9%
poly_chknorm_c 12s 18s -33%
poly_uniform_eta_4x 12s 11s +9%
keccak_absorb_once_x4 10s 9s +11%
polyz_unpack_c 9s 5s +80%
sign 9s 7s +29%
mld_keccakf1600_permute_c 8s 7s +14%
polyvec_matrix_pointwise_montgomery_row 8s 10s -20%
polyveck_caddq 8s 6s +33%
keccak_absorb 7s 9s -22%
keccak_squeezeblocks_x4 7s 4s +75%
mld_compute_pack_z 7s 5s +40%
pointwise_acc_native_x86_64 7s 7s +0%
polyveck_invntt_tomont 7s 7s +0%
polyvecl_ntt 7s 8s -12%
compute_pack_t0_t1 6s 6s +0%
mld_polymat_expand_entry 6s 3s +100%
mld_prepare_domain_separation_prefix 6s 5s +20%
pointwise_acc_native_aarch64 6s 5s +20%
poly_ntt_native 6s 1s +500%
rej_eta_c 6s 5s +20%
sign_signature_internal 6s 5s +20%
sign_signature_pre_hash_shake256 6s 5s +20%
sign_verify_pre_hash_shake256 6s 4s +50%
keccakf1600_permute_native 5s 3s +67%
keccakf1600_xor_bytes (big endian) 5s 3s +67%
keccakf1600x4_permute 5s 3s +67%
keccakf1600x4_xor_bytes_native 5s 2s +150%
mld_h 5s 4s +25%
mld_sample_s1_s2_serial 5s 4s +25%
ntt_native_aarch64 5s 3s +67%
poly_caddq_native_x86_64 5s 2s +150%
poly_challenge 5s 5s +0%
poly_decompose_32_native_aarch64 5s 3s +67%
poly_decompose_c 5s 4s +25%
poly_shiftl 5s 3s +67%
poly_uniform_eta 5s 4s +25%
polyeta_unpack 5s 5s +0%
polyt0_pack 5s 3s +67%
polyw1_pack_32 5s 4s +25%
polyz_unpack_19_native_aarch64 5s 4s +25%
shake256x4_squeezeblocks 5s 3s +67%
sign_verify_pre_hash_internal 5s 5s +0%
caddq 4s 3s +33%
mld_ct_cmask_nonzero_u32 4s 4s +0%
mld_sample_s1_s2 4s 7s -43%
montgomery_reduce 4s 2s +100%
pack_sig_h 4s 2s +100%
pointwise_native_x86_64 4s 3s +33%
poly_caddq_native_aarch64 4s 2s +100%
poly_chknorm_native_aarch64 4s 4s +0%
poly_chknorm_native_x86_64 4s 4s +0%
poly_decompose 4s 3s +33%
poly_decompose_88_native_aarch64 4s 2s +100%
poly_decompose_native 4s 2s +100%
poly_invntt_tomont_native 4s 2s +100%
poly_pointwise_montgomery_native 4s 5s -20%
poly_power2round 4s 8s -50%
poly_reduce 4s 5s -20%
poly_sub 4s 3s +33%
poly_use_hint 4s 2s +100%
polyeta_pack 4s 3s +33%
polyt1_unpack 4s 3s +33%
polyvec_matrix_expand 4s 3s +33%
polyveck_pack_w1 4s 3s +33%
polyveck_reduce 4s 6s -33%
polyvecl_pointwise_acc_montgomery 4s 4s +0%
polyw1_pack_88 4s 3s +33%
rej_uniform_eta_native_aarch64 4s 3s +33%
shake256_init 4s 3s +33%
sign_keypair 4s 5s -20%
sign_pk_from_sk 4s 4s +0%
sign_signature 4s 3s +33%
sign_signature_extmu 4s 2s +100%
sign_verify 4s 5s -20%
sk_s2hat_get_poly 4s 4s +0%
decompose 3s 4s -25%
intt_native_aarch64 3s 4s -25%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 3s 1s +200%
keccak_squeeze 3s 4s -25%
make_hint 3s 4s -25%
mld_ct_abs_i32 3s 2s +50%
mld_ct_cmask_neg_i32 3s 3s +0%
mld_ct_cmask_nonzero_u8 3s 5s -40%
mld_ct_get_optblocker_u8 3s 3s +0%
mld_keccakf1600x4_extract_bytes_c 3s 2s +50%
mld_value_barrier_u8 3s 2s +50%
ntt_native_x86_64 3s 4s -25%
nttunpack_native_x86_64 3s 5s -40%
pack_sig_c 3s 4s -25%
pointwise_native_aarch64 3s 4s -25%
poly_caddq_c 3s 5s -40%
poly_caddq_native 3s 3s +0%
poly_chknorm 3s 3s +0%
poly_ntt 3s 3s +0%
poly_permute_bitrev_to_custom_optional 3s 1s +200%
poly_pointwise_montgomery 3s 3s +0%
poly_uniform 3s 5s -40%
poly_uniform_gamma1_4x 3s 3s +0%
poly_use_hint_native 3s 4s -25%
poly_use_hint_native_aarch64 3s 2s +50%
polyveck_ntt 3s 3s +0%
polyveck_pack_eta 3s 4s -25%
polyveck_unpack_eta 3s 3s +0%
polyvecl_pointwise_acc_montgomery_c 3s 4s -25%
polyvecl_pointwise_acc_montgomery_native 3s 4s -25%
polyz_pack 3s 3s +0%
polyz_unpack 3s 4s -25%
polyz_unpack_17_native_aarch64 3s 2s +50%
polyz_unpack_native 3s 2s +50%
reduce32 3s 3s +0%
rej_eta 3s 4s -25%
rej_eta_native 3s 3s +0%
rej_uniform_native_aarch64 3s 4s -25%
shake128_finalize 3s 2s +50%
shake128_release 3s 3s +0%
shake128x4_absorb_once 3s 2s +50%
shake256_absorb 3s 1s +200%
shake256_release 3s 2s +50%
shake256x4_absorb_once 3s 5s -40%
sig_unpack_hints 3s 3s +0%
sign_keypair_internal 3s 7s -57%
sign_open 3s 4s -25%
sign_signature_pre_hash_internal 3s 7s -57%
sign_verify_extmu 3s 2s +50%
sk_s1hat_get_poly 3s 1s +200%
sk_t0hat_get_poly 3s 5s -40%
unpack_sk_s1hat 3s 2s +50%
use_hint 3s 5s -40%
yvec_get_poly 3s 2s +50%
intt_native_x86_64 2s 3s -33%
keccak_f1600_x1_native_aarch64 2s 2s +0%
keccak_f1600_x1_native_aarch64_v84a 2s 2s +0%
keccak_f1600_x4_native_aarch64_v84a 2s 3s -33%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 2s 3s -33%
keccak_f1600_x4_native_avx2 2s 2s +0%
keccak_finalize 2s 3s -33%
keccakf1600_extract_bytes (big endian) 2s 3s -33%
keccakf1600_permute 2s 2s +0%
keccakf1600_xor_bytes 2s 1s +100%
keccakf1600x4_extract_bytes_native 2s 4s -50%
keccakf1600x4_xor_bytes 2s 2s +0%
mld_ct_sel_int32 2s 3s -33%
mld_keccakf1600x4_xor_bytes_c 2s 2s +0%
mld_value_barrier_i64 2s 3s -33%
mld_value_barrier_u32 2s 3s -33%
pack_sk_rho_key_tr_s2 2s 3s -33%
pack_sk_s1 2s 2s +0%
poly_caddq 2s 4s -50%
poly_chknorm_native 2s 2s +0%
poly_ntt_c 2s 3s -33%
poly_permute_bitrev_to_custom_optional_native 2s 5s -60%
poly_uniform_gamma1 2s 3s -33%
poly_use_hint_c 2s 2s +0%
polyt1_pack 2s 1s +100%
polyvec_matrix_expand_serial 2s 2s +0%
polyvecl_pack_eta 2s 3s -33%
polyvecl_uniform_gamma1 2s 4s -50%
polyvecl_uniform_gamma1_serial 2s 4s -50%
polyvecl_unpack_eta 2s 2s +0%
polyvecl_unpack_z 2s 3s -33%
polyw1_pack 2s 2s +0%
polyz_unpack_native_x86_64 2s 3s -33%
power2round 2s 2s +0%
shake128_absorb 2s 1s +100%
shake128_init 2s 2s +0%
shake128_squeeze 2s 2s +0%
shake128x4_squeezeblocks 2s 3s -33%
shake256 2s 3s -33%
shake256_finalize 2s 1s +100%
shake256_squeeze 2s 2s +0%
sys_check_capability 2s 2s +0%
unpack_pk_t1 2s 5s -60%
unpack_sk 2s 3s -33%
unpack_sk_s2hat 2s 4s -50%
yvec_init 2s 7s -71%
fqscale - 3s -
keccak_init 1s 2s -50%
keccakf1600x4_extract_bytes 1s 4s -75%
mld_ct_get_optblocker_i64 1s 1s +0%
mld_ct_get_optblocker_u32 1s 2s -50%
mld_keccakf1600_extract_bytes 1s 2s -50%
pack_sig_z 1s 1s +0%
poly_invntt_tomont 1s 3s -67%
poly_uniform_4x 1s 3s -67%
unpack_sk_t0hat 1s 3s -67%

@oqs-bot

oqs-bot commented May 12, 2026

Contributor

CBMC Results (ML-DSA-44, REDUCE-RAM)

⚠️ Attention Required

Proof Status Current Previous Change
**TOTAL** ⚠️ 3004s 1628s +84.5%
fqscale - 5s -
poly_invntt_tomont_c ⚠️ 21s 9s +133%
Full Results (206 proofs)
Proof Status Current Previous Change
**TOTAL** ⚠️ 3004s 1628s +84.5%
mld_invntt_2_layers_block 702s - new
mld_ntt_2_layers_block 586s - new
mld_ntt_2_layers 281s - new
rej_uniform_native 148s 127s +17%
polyvec_matrix_pointwise_montgomery_yvec 117s 122s -4%
poly_pointwise_montgomery_c 98s 142s -31%
mld_invntt_2_layers 97s - new
mld_ct_memcmp 66s 69s -4%
fqmul 45s 42s +7%
mld_attempt_signature_generation 24s 27s -11%
keccakf1600x4_permute_native 23s 22s +5%
sign_verify_internal 22s 24s -8%
poly_invntt_tomont_c ⚠️ 21s 9s +133%
polyt0_unpack 19s 18s +6%
polyeta_unpack 16s 15s +7%
mld_check_pct 15s 14s +7%
rej_uniform_c 15s 17s -12%
poly_chknorm_c 14s 21s -33%
poly_uniform_eta_4x 14s 15s -7%
compute_pack_t0_t1 11s 9s +22%
polyz_unpack_c 11s 10s +10%
rej_uniform 11s 8s +38%
poly_add 10s 12s -17%
keccak_absorb_once_x4 9s 10s -10%
mld_compute_pack_z 9s 5s +80%
polyvec_matrix_pointwise_montgomery_row 9s 10s -10%
mld_keccakf1600_permute_c 7s 8s -12%
mld_sample_s1_s2 7s 5s +40%
poly_reduce 7s 3s +133%
polyveck_chknorm 7s 7s +0%
polyveck_decompose 7s 7s +0%
polyvecl_ntt 7s 5s +40%
sign 7s 6s +17%
sign_pk_from_sk 7s 6s +17%
keccak_absorb 6s 8s -25%
keccak_squeezeblocks_x4 6s 5s +20%
pointwise_acc_native_x86_64 6s 6s +0%
poly_challenge 6s 4s +50%
poly_decompose_c 6s 8s -25%
polyvecl_unpack_eta 6s 2s +200%
sign_keypair_internal 6s 5s +20%
sign_signature_internal 6s 8s -25%
sign_signature_pre_hash_shake256 6s 5s +20%
mld_keccakf1600x4_extract_bytes_c 5s 2s +150%
montgomery_reduce 5s 2s +150%
ntt_native_x86_64 5s 3s +67%
pointwise_acc_native_aarch64 5s 5s +0%
poly_ntt_c 5s 4s +25%
poly_power2round 5s 6s -17%
poly_uniform_gamma1 5s 2s +150%
polyvec_matrix_expand 5s 1s +400%
polyveck_invntt_tomont 5s 5s +0%
polyveck_ntt 5s 2s +150%
polyvecl_chknorm 5s 6s -17%
polyz_pack 5s 5s +0%
sign_keypair 5s 3s +67%
sign_verify_extmu 5s 4s +25%
unpack_sk 5s 2s +150%
unpack_sk_t0hat 5s 3s +67%
decompose 4s 3s +33%
intt_native_x86_64 4s 2s +100%
keccak_f1600_x1_native_aarch64 4s 3s +33%
keccak_squeeze 4s 4s +0%
keccakf1600_permute 4s 2s +100%
keccakf1600x4_extract_bytes 4s 2s +100%
keccakf1600x4_extract_bytes_native 4s 3s +33%
mld_h 4s 5s -20%
mld_prepare_domain_separation_prefix 4s 4s +0%
nttunpack_native_x86_64 4s 2s +100%
pack_sk_rho_key_tr_s2 4s 6s -33%
poly_caddq_c 4s 3s +33%
poly_caddq_native 4s 4s +0%
poly_chknorm_native 4s 4s +0%
poly_chknorm_native_x86_64 4s 2s +100%
poly_ntt_native 4s 4s +0%
poly_pointwise_montgomery 4s 3s +33%
poly_sub 4s 4s +0%
poly_uniform 4s 5s -20%
poly_uniform_eta 4s 8s -50%
poly_use_hint_native 4s 3s +33%
polyeta_pack 4s 3s +33%
polyt0_pack 4s 3s +33%
polyt1_pack 4s 3s +33%
polyvec_matrix_expand_serial 4s 2s +100%
polyveck_reduce 4s 7s -43%
polyvecl_pack_eta 4s 2s +100%
polyvecl_unpack_z 4s 3s +33%
polyw1_pack_88 4s 3s +33%
polyz_unpack_19_native_aarch64 4s 4s +0%
polyz_unpack_native_x86_64 4s 3s +33%
reduce32 4s 3s +33%
rej_eta_native 4s 4s +0%
rej_uniform_eta_native_aarch64 4s 3s +33%
rej_uniform_native_aarch64 4s 3s +33%
shake128_init 4s 2s +100%
sign_signature 4s 6s -33%
sign_verify_pre_hash_shake256 4s 3s +33%
unpack_sk_s2hat 4s 2s +100%
use_hint 4s 3s +33%
caddq 3s 1s +200%
intt_native_aarch64 3s 4s -25%
keccak_f1600_x4_native_aarch64_v84a 3s 3s +0%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 3s 3s +0%
keccakf1600_permute_native 3s 4s -25%
keccakf1600_xor_bytes (big endian) 3s 2s +50%
keccakf1600x4_permute 3s 2s +50%
keccakf1600x4_xor_bytes_native 3s 3s +0%
mld_ct_abs_i32 3s 3s +0%
mld_ct_cmask_nonzero_u32 3s 2s +50%
mld_ct_get_optblocker_u32 3s 3s +0%
mld_ct_get_optblocker_u8 3s 4s -25%
mld_keccakf1600x4_xor_bytes_c 3s 2s +50%
mld_sample_s1_s2_serial 3s 3s +0%
pack_sig_c 3s 3s +0%
pack_sig_z 3s 3s +0%
pointwise_native_aarch64 3s 6s -50%
pointwise_native_x86_64 3s 3s +0%
poly_caddq_native_x86_64 3s 2s +50%
poly_chknorm_native_aarch64 3s 4s -25%
poly_decompose 3s 5s -40%
poly_decompose_32_native_aarch64 3s 3s +0%
poly_decompose_88_native_aarch64 3s 4s -25%
poly_invntt_tomont 3s 4s -25%
poly_ntt 3s 1s +200%
poly_permute_bitrev_to_custom_optional 3s 3s +0%
poly_permute_bitrev_to_custom_optional_native 3s 3s +0%
poly_use_hint 3s 3s +0%
poly_use_hint_c 3s 1s +200%
polyt1_unpack 3s 5s -40%
polyveck_pack_eta 3s 4s -25%
polyveck_pack_w1 3s 3s +0%
polyveck_unpack_eta 3s 3s +0%
polyvecl_pointwise_acc_montgomery 3s 1s +200%
polyvecl_pointwise_acc_montgomery_c 3s 4s -25%
polyvecl_uniform_gamma1 3s 3s +0%
polyw1_pack 3s 3s +0%
polyw1_pack_32 3s 2s +50%
polyz_unpack_native 3s 4s -25%
power2round 3s 5s -40%
rej_eta 3s 4s -25%
rej_eta_c 3s 4s -25%
shake128_absorb 3s 2s +50%
shake128_squeeze 3s 4s -25%
shake128x4_squeezeblocks 3s 1s +200%
shake256_finalize 3s 4s -25%
shake256_release 3s 3s +0%
shake256_squeeze 3s 3s +0%
shake256x4_absorb_once 3s 1s +200%
shake256x4_squeezeblocks 3s 3s +0%
sign_open 3s 4s -25%
sign_signature_extmu 3s 2s +50%
sign_signature_pre_hash_internal 3s 5s -40%
sign_verify 3s 4s -25%
sign_verify_pre_hash_internal 3s 5s -40%
sk_s2hat_get_poly 3s 3s +0%
unpack_sk_s1hat 3s 4s -25%
yvec_get_poly 3s 5s -40%
keccak_f1600_x1_native_aarch64_v84a 2s 4s -50%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 2s 5s -60%
keccak_f1600_x4_native_avx2 2s 4s -50%
keccak_finalize 2s 2s +0%
keccak_init 2s 3s -33%
keccakf1600_extract_bytes (big endian) 2s 2s +0%
keccakf1600_xor_bytes 2s 4s -50%
keccakf1600x4_xor_bytes 2s 2s +0%
mld_ct_cmask_nonzero_u8 2s 2s +0%
mld_ct_get_optblocker_i64 2s 2s +0%
mld_ct_sel_int32 2s 2s +0%
mld_keccakf1600_extract_bytes 2s 4s -50%
mld_polymat_expand_entry 2s 3s -33%
mld_value_barrier_i64 2s 4s -50%
mld_value_barrier_u32 2s 1s +100%
mld_value_barrier_u8 2s 2s +0%
ntt_native_aarch64 2s 4s -50%
pack_sk_s1 2s 5s -60%
poly_caddq 2s 5s -60%
poly_caddq_native_aarch64 2s 5s -60%
poly_chknorm 2s 2s +0%
poly_decompose_native 2s 5s -60%
poly_invntt_tomont_native 2s 3s -33%
poly_pointwise_montgomery_native 2s 3s -33%
poly_shiftl 2s 4s -50%
polyveck_caddq 2s 6s -67%
polyvecl_pointwise_acc_montgomery_native 2s 5s -60%
polyvecl_uniform_gamma1_serial 2s 4s -50%
polyz_unpack 2s 3s -33%
polyz_unpack_17_native_aarch64 2s 4s -50%
shake128_finalize 2s 3s -33%
shake128_release 2s 4s -50%
shake128x4_absorb_once 2s 3s -33%
shake256 2s 3s -33%
shake256_absorb 2s 3s -33%
shake256_init 2s 1s +100%
sig_unpack_hints 2s 3s -33%
unpack_pk_t1 2s 3s -33%
yvec_init 2s 3s -33%
fqscale - 5s -
make_hint 1s 4s -75%
mld_ct_cmask_neg_i32 1s 1s +0%
pack_sig_h 1s 4s -75%
poly_uniform_4x 1s 2s -50%
poly_uniform_gamma1_4x 1s 2s -50%
poly_use_hint_native_aarch64 1s 2s -50%
sk_s1hat_get_poly 1s 2s -50%
sk_t0hat_get_poly 1s 3s -67%
sys_check_capability 1s 3s -67%

@oqs-bot

oqs-bot commented May 12, 2026

Contributor

CBMC Results (ML-DSA-87)

⚠️ Attention Required

Proof Status Current Previous Change
**TOTAL** ⚠️ 3700s 2349s +57.5%
fqscale - 3s -
Full Results (206 proofs)
Proof Status Current Previous Change
**TOTAL** ⚠️ 3700s 2349s +57.5%
mld_invntt_2_layers_block 715s - new
mld_ntt_2_layers_block 574s - new
polyvecl_pointwise_acc_montgomery_c 335s 296s +13%
mld_ntt_2_layers 284s - new
polyvec_matrix_expand 225s 213s +6%
mld_attempt_signature_generation 109s 105s +4%
rej_uniform_native 106s 149s -29%
mld_invntt_2_layers 101s - new
poly_pointwise_montgomery_c 93s 99s -6%
mld_ct_memcmp 70s 64s +9%
sign_signature_internal 67s 64s +5%
sign_verify_internal 60s 57s +5%
polyvec_matrix_expand_serial 48s 48s +0%
fqmul 44s 41s +7%
compute_pack_t0_t1 33s 32s +3%
polyvec_matrix_pointwise_montgomery_yvec 30s 32s -6%
keccakf1600x4_permute_native 23s 24s -4%
mld_check_pct 17s 16s +6%
rej_uniform 17s 21s -19%
polyeta_unpack 16s 16s +0%
polyt0_unpack 15s 16s -6%
poly_chknorm_c 14s 19s -26%
poly_invntt_tomont_c 13s 7s +86%
rej_uniform_c 13s 12s +8%
poly_uniform_eta_4x 12s 12s +0%
polyvecl_ntt 12s 10s +20%
polyveck_decompose 11s 8s +38%
poly_add 10s 11s -9%
poly_uniform_4x 10s 12s -17%
polyveck_invntt_tomont 10s 7s +43%
keccak_absorb_once_x4 9s 9s +0%
polyveck_ntt 9s 7s +29%
mld_compute_pack_z 8s 6s +33%
polyveck_caddq 8s 7s +14%
polyveck_chknorm 8s 5s +60%
unpack_sk_t0hat 8s 5s +60%
keccak_absorb 7s 8s -12%
mld_keccakf1600_permute_c 7s 6s +17%
mld_sample_s1_s2 7s 5s +40%
pointwise_acc_native_aarch64 7s 8s -12%
poly_decompose_c 7s 7s +0%
poly_uniform_eta 7s 5s +40%
poly_use_hint 7s 3s +133%
decompose 6s 2s +200%
mld_keccakf1600x4_extract_bytes_c 6s 3s +100%
pointwise_acc_native_x86_64 6s 7s -14%
poly_caddq_c 6s 7s -14%
polyz_unpack_c 6s 3s +100%
sign 6s 8s -25%
keccakf1600x4_extract_bytes 5s 2s +150%
mld_sample_s1_s2_serial 5s 6s -17%
ntt_native_aarch64 5s 2s +150%
pack_sig_z 5s 2s +150%
pack_sk_s1 5s 4s +25%
poly_chknorm 5s 4s +25%
poly_chknorm_native_aarch64 5s 3s +67%
poly_decompose_32_native_aarch64 5s 3s +67%
poly_invntt_tomont 5s 2s +150%
poly_power2round 5s 4s +25%
polyt1_pack 5s 2s +150%
polyt1_unpack 5s 4s +25%
polyveck_reduce 5s 3s +67%
polyvecl_chknorm 5s 4s +25%
polyvecl_pointwise_acc_montgomery 5s 3s +67%
polyvecl_pointwise_acc_montgomery_native 5s 5s +0%
polyvecl_unpack_z 5s 1s +400%
polyw1_pack 5s 6s -17%
polyw1_pack_32 5s 4s +25%
polyz_unpack_19_native_aarch64 5s 5s +0%
reduce32 5s 3s +67%
shake128x4_absorb_once 5s 5s +0%
sign_keypair_internal 5s 2s +150%
sign_pk_from_sk 5s 7s -29%
sign_signature 5s 5s +0%
use_hint 5s 2s +150%
keccak_f1600_x1_native_aarch64_v84a 4s 3s +33%
keccak_squeezeblocks_x4 4s 2s +100%
keccakf1600_permute_native 4s 2s +100%
mld_keccakf1600x4_xor_bytes_c 4s 2s +100%
mld_polymat_expand_entry 4s 2s +100%
montgomery_reduce 4s 2s +100%
ntt_native_x86_64 4s 4s +0%
pointwise_native_aarch64 4s 4s +0%
poly_chknorm_native 4s 3s +33%
poly_decompose 4s 4s +0%
poly_pointwise_montgomery 4s 1s +300%
poly_pointwise_montgomery_native 4s 2s +100%
poly_uniform_gamma1_4x 4s 3s +33%
poly_use_hint_c 4s 5s -20%
poly_use_hint_native 4s 2s +100%
polyt0_pack 4s 3s +33%
polyvec_matrix_pointwise_montgomery_row 4s 3s +33%
polyvecl_uniform_gamma1 4s 3s +33%
polyvecl_uniform_gamma1_serial 4s 2s +100%
polyvecl_unpack_eta 4s 5s -20%
rej_eta_c 4s 3s +33%
rej_uniform_eta_native_aarch64 4s 2s +100%
rej_uniform_native_aarch64 4s 5s -20%
shake128x4_squeezeblocks 4s 2s +100%
shake256_finalize 4s 4s +0%
shake256x4_absorb_once 4s 2s +100%
sign_verify 4s 6s -33%
sign_verify_pre_hash_shake256 4s 5s -20%
sk_s1hat_get_poly 4s 2s +100%
intt_native_aarch64 3s 2s +50%
keccak_f1600_x1_native_aarch64 3s 4s -25%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 3s 1s +200%
keccak_init 3s 3s +0%
keccakf1600_permute 3s 3s +0%
keccakf1600_xor_bytes 3s 5s -40%
keccakf1600_xor_bytes (big endian) 3s 5s -40%
keccakf1600x4_xor_bytes 3s 4s -25%
mld_ct_abs_i32 3s 3s +0%
mld_ct_cmask_nonzero_u8 3s 2s +50%
mld_ct_get_optblocker_u8 3s 4s -25%
mld_h 3s 4s -25%
mld_prepare_domain_separation_prefix 3s 4s -25%
mld_value_barrier_u8 3s 2s +50%
pack_sig_c 3s 3s +0%
poly_caddq_native 3s 4s -25%
poly_caddq_native_aarch64 3s 5s -40%
poly_challenge 3s 4s -25%
poly_decompose_native 3s 2s +50%
poly_invntt_tomont_native 3s 2s +50%
poly_ntt 3s 1s +200%
poly_permute_bitrev_to_custom_optional 3s 2s +50%
poly_permute_bitrev_to_custom_optional_native 3s 2s +50%
poly_shiftl 3s 3s +0%
poly_sub 3s 3s +0%
poly_uniform 3s 4s -25%
poly_use_hint_native_aarch64 3s 2s +50%
polyveck_pack_eta 3s 4s -25%
polyveck_unpack_eta 3s 2s +50%
polyw1_pack_88 3s 1s +200%
power2round 3s 4s -25%
rej_eta_native 3s 3s +0%
shake128_release 3s 3s +0%
shake256_absorb 3s 2s +50%
shake256x4_squeezeblocks 3s 2s +50%
sig_unpack_hints 3s 6s -50%
sign_open 3s 4s -25%
sign_signature_pre_hash_internal 3s 3s +0%
sign_signature_pre_hash_shake256 3s 3s +0%
sign_verify_extmu 3s 4s -25%
sk_t0hat_get_poly 3s 2s +50%
yvec_init 3s 6s -50%
caddq 2s 3s -33%
intt_native_x86_64 2s 2s +0%
keccak_f1600_x4_native_aarch64_v84a 2s 2s +0%
keccak_f1600_x4_native_avx2 2s 1s +100%
keccak_finalize 2s 2s +0%
keccakf1600_extract_bytes (big endian) 2s 3s -33%
keccakf1600x4_extract_bytes_native 2s 4s -50%
keccakf1600x4_permute 2s 3s -33%
keccakf1600x4_xor_bytes_native 2s 1s +100%
make_hint 2s 4s -50%
mld_ct_cmask_neg_i32 2s 2s +0%
mld_ct_get_optblocker_i64 2s 2s +0%
mld_keccakf1600_extract_bytes 2s 2s +0%
mld_value_barrier_i64 2s 3s -33%
nttunpack_native_x86_64 2s 4s -50%
pack_sig_h 2s 3s -33%
pack_sk_rho_key_tr_s2 2s 2s +0%
pointwise_native_x86_64 2s 4s -50%
poly_caddq 2s 2s +0%
poly_caddq_native_x86_64 2s 4s -50%
poly_chknorm_native_x86_64 2s 2s +0%
poly_ntt_native 2s 7s -71%
poly_uniform_gamma1 2s 3s -33%
polyeta_pack 2s 4s -50%
polyveck_pack_w1 2s 4s -50%
polyz_pack 2s 3s -33%
polyz_unpack 2s 4s -50%
polyz_unpack_17_native_aarch64 2s 2s +0%
polyz_unpack_native 2s 2s +0%
polyz_unpack_native_x86_64 2s 4s -50%
rej_eta 2s 3s -33%
shake128_absorb 2s 3s -33%
shake128_finalize 2s 3s -33%
shake128_init 2s 3s -33%
shake128_squeeze 2s 4s -50%
shake256 2s 2s +0%
shake256_release 2s 2s +0%
shake256_squeeze 2s 1s +100%
sign_keypair 2s 7s -71%
sign_signature_extmu 2s 5s -60%
sign_verify_pre_hash_internal 2s 5s -60%
sys_check_capability 2s 3s -33%
unpack_pk_t1 2s 1s +100%
unpack_sk 2s 2s +0%
unpack_sk_s1hat 2s 1s +100%
unpack_sk_s2hat 2s 3s -33%
yvec_get_poly 2s 3s -33%
fqscale - 3s -
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 1s 1s +0%
keccak_squeeze 1s 3s -67%
mld_ct_cmask_nonzero_u32 1s 3s -67%
mld_ct_get_optblocker_u32 1s 2s -50%
mld_ct_sel_int32 1s 3s -67%
mld_value_barrier_u32 1s 3s -67%
poly_decompose_88_native_aarch64 1s 3s -67%
poly_ntt_c 1s 3s -67%
poly_reduce 1s 2s -50%
polyvecl_pack_eta 1s 2s -50%
shake256_init 1s 2s -50%
sk_s2hat_get_poly 1s 3s -67%

@oqs-bot

oqs-bot commented May 12, 2026

Contributor

CBMC Results (ML-DSA-65)

⚠️ Attention Required

Proof Status Current Previous Change
**TOTAL** ⚠️ 3322s 2036s +63.2%
fqscale - 3s -
Full Results (206 proofs)
Proof Status Current Previous Change
**TOTAL** ⚠️ 3322s 2036s +63.2%
mld_invntt_2_layers_block 697s - new
mld_ntt_2_layers_block 566s - new
mld_ntt_2_layers 284s - new
polyvecl_pointwise_acc_montgomery_c 208s 208s +0%
polyvec_matrix_expand 133s 130s +2%
rej_uniform_native 110s 145s -24%
mld_invntt_2_layers 97s - new
poly_pointwise_montgomery_c 96s 89s +8%
mld_ct_memcmp 66s 66s +0%
mld_attempt_signature_generation 65s 66s -2%
sign_verify_internal 64s 65s -2%
sign_signature_internal 47s 48s -2%
fqmul 41s 40s +2%
polyvec_matrix_expand_serial 24s 25s -4%
keccakf1600x4_permute_native 22s 22s +0%
rej_uniform 18s 22s -18%
compute_pack_t0_t1 17s 17s +0%
mld_check_pct 16s 14s +14%
poly_chknorm_c 16s 20s -20%
polyt0_unpack 16s 15s +7%
polyvecl_chknorm 16s 15s +7%
polyveck_decompose 14s 14s +0%
poly_invntt_tomont_c 13s 11s +18%
poly_uniform_4x 12s 9s +33%
polyveck_chknorm 12s 9s +33%
poly_uniform_eta_4x 11s 10s +10%
mld_compute_pack_z 10s 9s +11%
mld_keccakf1600_permute_c 10s 8s +25%
poly_add 10s 13s -23%
rej_uniform_c 10s 12s -17%
polyvec_matrix_pointwise_montgomery_yvec 9s 9s +0%
keccak_absorb_once_x4 8s 12s -33%
pointwise_acc_native_x86_64 8s 7s +14%
polyveck_invntt_tomont 8s 9s -11%
sign 8s 6s +33%
polyvecl_ntt 7s 6s +17%
polyvecl_uniform_gamma1 7s 4s +75%
sign_pk_from_sk 7s 4s +75%
sign_verify_pre_hash_shake256 7s 5s +40%
unpack_sk_t0hat 7s 5s +40%
mld_value_barrier_u8 6s 1s +500%
pointwise_acc_native_aarch64 6s 4s +50%
poly_decompose_c 6s 3s +100%
polyveck_ntt 6s 7s -14%
polyz_unpack_c 6s 6s +0%
reduce32 6s 3s +100%
sign_signature_extmu 6s 3s +100%
sign_verify_extmu 6s 3s +100%
keccak_absorb 5s 9s -44%
mld_ct_cmask_nonzero_u32 5s 2s +150%
mld_keccakf1600_extract_bytes 5s 2s +150%
mld_prepare_domain_separation_prefix 5s 2s +150%
pointwise_native_aarch64 5s 2s +150%
pointwise_native_x86_64 5s 2s +150%
poly_caddq_c 5s 4s +25%
poly_caddq_native 5s 4s +25%
poly_chknorm_native_aarch64 5s 3s +67%
poly_power2round 5s 4s +25%
poly_reduce 5s 2s +150%
poly_uniform 5s 3s +67%
poly_uniform_gamma1_4x 5s 3s +67%
polyt1_unpack 5s 4s +25%
polyvec_matrix_pointwise_montgomery_row 5s 2s +150%
polyveck_reduce 5s 3s +67%
sign_keypair 5s 6s -17%
sign_open 5s 5s +0%
unpack_sk 5s 5s +0%
keccak_squeezeblocks_x4 4s 5s -20%
keccakf1600x4_extract_bytes_native 4s 2s +100%
mld_ct_abs_i32 4s 3s +33%
mld_ct_get_optblocker_u32 4s 2s +100%
mld_h 4s 2s +100%
mld_polymat_expand_entry 4s 1s +300%
mld_sample_s1_s2 4s 6s -33%
ntt_native_aarch64 4s 2s +100%
nttunpack_native_x86_64 4s 4s +0%
pack_sig_c 4s 5s -20%
poly_caddq 4s 3s +33%
poly_challenge 4s 2s +100%
poly_chknorm 4s 3s +33%
poly_decompose_32_native_aarch64 4s 4s +0%
poly_ntt 4s 2s +100%
poly_ntt_native 4s 1s +300%
poly_permute_bitrev_to_custom_optional_native 4s 2s +100%
poly_shiftl 4s 4s +0%
poly_uniform_eta 4s 3s +33%
poly_use_hint_c 4s 3s +33%
poly_use_hint_native 4s 4s +0%
polyt0_pack 4s 5s -20%
polyveck_caddq 4s 6s -33%
polyvecl_pack_eta 4s 2s +100%
polyw1_pack_88 4s 2s +100%
polyz_unpack_native 4s 3s +33%
polyz_unpack_native_x86_64 4s 3s +33%
rej_eta_native 4s 5s -20%
rej_uniform_eta_native_aarch64 4s 4s +0%
shake128_init 4s 4s +0%
shake128_release 4s 2s +100%
sign_keypair_internal 4s 4s +0%
sign_signature_pre_hash_internal 4s 3s +33%
sign_signature_pre_hash_shake256 4s 4s +0%
unpack_pk_t1 4s 2s +100%
yvec_get_poly 4s 2s +100%
yvec_init 4s 3s +33%
caddq 3s 2s +50%
decompose 3s 3s +0%
keccak_f1600_x1_native_aarch64_v84a 3s 1s +200%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 3s 3s +0%
keccak_f1600_x4_native_avx2 3s 3s +0%
keccakf1600_extract_bytes (big endian) 3s 1s +200%
keccakf1600_permute 3s 3s +0%
keccakf1600x4_permute 3s 3s +0%
make_hint 3s 2s +50%
mld_ct_sel_int32 3s 2s +50%
mld_keccakf1600x4_extract_bytes_c 3s 2s +50%
mld_sample_s1_s2_serial 3s 5s -40%
mld_value_barrier_i64 3s 3s +0%
mld_value_barrier_u32 3s 1s +200%
montgomery_reduce 3s 2s +50%
pack_sig_z 3s 4s -25%
pack_sk_s1 3s 5s -40%
poly_chknorm_native 3s 4s -25%
poly_decompose_native 3s 4s -25%
poly_invntt_tomont 3s 2s +50%
poly_permute_bitrev_to_custom_optional 3s 3s +0%
poly_uniform_gamma1 3s 3s +0%
poly_use_hint_native_aarch64 3s 5s -40%
polyeta_pack 3s 4s -25%
polyeta_unpack 3s 6s -50%
polyveck_pack_eta 3s 3s +0%
polyveck_pack_w1 3s 2s +50%
polyvecl_pointwise_acc_montgomery 3s 4s -25%
polyvecl_unpack_z 3s 3s +0%
polyw1_pack 3s 5s -40%
polyz_pack 3s 5s -40%
polyz_unpack 3s 4s -25%
rej_eta 3s 2s +50%
rej_uniform_native_aarch64 3s 2s +50%
shake128_absorb 3s 3s +0%
shake128_squeeze 3s 1s +200%
shake256_init 3s 3s +0%
shake256_release 3s 3s +0%
shake256x4_absorb_once 3s 4s -25%
sign_signature 3s 2s +50%
sign_verify 3s 4s -25%
sk_s2hat_get_poly 3s 3s +0%
sk_t0hat_get_poly 3s 2s +50%
unpack_sk_s1hat 3s 3s +0%
unpack_sk_s2hat 3s 3s +0%
keccak_f1600_x1_native_aarch64 2s 2s +0%
keccak_f1600_x4_native_aarch64_v84a 2s 2s +0%
keccak_init 2s 1s +100%
keccak_squeeze 2s 1s +100%
keccakf1600_permute_native 2s 1s +100%
keccakf1600_xor_bytes 2s 3s -33%
keccakf1600_xor_bytes (big endian) 2s 2s +0%
keccakf1600x4_extract_bytes 2s 3s -33%
keccakf1600x4_xor_bytes 2s 3s -33%
keccakf1600x4_xor_bytes_native 2s 3s -33%
mld_ct_cmask_nonzero_u8 2s 3s -33%
mld_ct_get_optblocker_i64 2s 2s +0%
mld_ct_get_optblocker_u8 2s 2s +0%
ntt_native_x86_64 2s 2s +0%
pack_sig_h 2s 5s -60%
pack_sk_rho_key_tr_s2 2s 4s -50%
poly_caddq_native_aarch64 2s 4s -50%
poly_chknorm_native_x86_64 2s 2s +0%
poly_decompose 2s 2s +0%
poly_invntt_tomont_native 2s 2s +0%
poly_ntt_c 2s 4s -50%
poly_pointwise_montgomery 2s 2s +0%
poly_pointwise_montgomery_native 2s 4s -50%
poly_sub 2s 5s -60%
poly_use_hint 2s 6s -67%
polyt1_pack 2s 4s -50%
polyvecl_pointwise_acc_montgomery_native 2s 4s -50%
polyvecl_unpack_eta 2s 1s +100%
polyw1_pack_32 2s 1s +100%
polyz_unpack_19_native_aarch64 2s 3s -33%
power2round 2s 2s +0%
rej_eta_c 2s 2s +0%
shake128_finalize 2s 2s +0%
shake128x4_squeezeblocks 2s 2s +0%
shake256 2s 3s -33%
shake256_absorb 2s 2s +0%
shake256_finalize 2s 3s -33%
shake256_squeeze 2s 3s -33%
shake256x4_squeezeblocks 2s 1s +100%
sig_unpack_hints 2s 3s -33%
sign_verify_pre_hash_internal 2s 6s -67%
sys_check_capability 2s 3s -33%
use_hint 2s 3s -33%
fqscale - 3s -
intt_native_aarch64 1s 2s -50%
intt_native_x86_64 1s 4s -75%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 1s 2s -50%
keccak_finalize 1s 3s -67%
mld_ct_cmask_neg_i32 1s 2s -50%
mld_keccakf1600x4_xor_bytes_c 1s 3s -67%
poly_caddq_native_x86_64 1s 2s -50%
poly_decompose_88_native_aarch64 1s 4s -75%
polyveck_unpack_eta 1s 4s -75%
polyvecl_uniform_gamma1_serial 1s 3s -67%
polyz_unpack_17_native_aarch64 1s 3s -67%
shake128x4_absorb_once 1s 4s -75%
sk_s1hat_get_poly 1s 3s -67%

@oqs-bot

oqs-bot commented May 12, 2026

Contributor

CBMC Results (ML-DSA-44)

⚠️ Attention Required

Proof Status Current Previous Change
**TOTAL** ⚠️ 3075s 1727s +78.1%
fqscale - 2s -
Full Results (206 proofs)
Proof Status Current Previous Change
**TOTAL** ⚠️ 3075s 1727s +78.1%
mld_invntt_2_layers_block 716s - new
mld_ntt_2_layers_block 587s - new
mld_ntt_2_layers 282s - new
polyvecl_pointwise_acc_montgomery_c 124s 117s +6%
rej_uniform_native 108s 140s -23%
mld_invntt_2_layers 97s - new
poly_pointwise_montgomery_c 89s 97s -8%
mld_ct_memcmp 70s 70s +0%
mld_attempt_signature_generation 65s 64s +2%
fqmul 45s 40s +12%
sign_verify_internal 29s 29s +0%
polyvec_matrix_expand 27s 24s +12%
keccakf1600x4_permute_native 21s 24s -12%
sign_signature_internal 20s 19s +5%
rej_uniform 17s 19s -11%
polyt0_unpack 16s 18s -11%
poly_invntt_tomont_c 15s 7s +114%
polyeta_unpack 15s 14s +7%
mld_check_pct 14s 15s -7%
rej_uniform_c 14s 13s +8%
compute_pack_t0_t1 13s 15s -13%
poly_uniform_eta_4x 13s 12s +8%
poly_chknorm_c 12s 20s -40%
poly_uniform_4x 12s 11s +9%
poly_add 11s 11s +0%
polyveck_decompose 11s 9s +22%
keccak_absorb_once_x4 10s 10s +0%
polyveck_chknorm 10s 12s -17%
polyvec_matrix_expand_serial 9s 7s +29%
polyvec_matrix_pointwise_montgomery_yvec 9s 8s +12%
polyz_unpack_c 9s 10s -10%
mld_compute_pack_z 7s 7s +0%
pointwise_acc_native_x86_64 7s 7s +0%
poly_chknorm_native_aarch64 7s 5s +40%
poly_use_hint_c 7s 5s +40%
poly_caddq_c 6s 8s -25%
poly_uniform_eta 6s 2s +200%
polyvecl_uniform_gamma1 6s 1s +500%
sign 6s 10s -40%
sign_open 6s 4s +50%
sign_pk_from_sk 6s 6s +0%
sign_signature_pre_hash_internal 6s 2s +200%
keccak_finalize 5s 2s +150%
keccak_init 5s 1s +400%
keccak_squeezeblocks_x4 5s 2s +150%
mld_ct_cmask_neg_i32 5s 1s +400%
mld_h 5s 5s +0%
mld_keccakf1600_permute_c 5s 7s -29%
mld_prepare_domain_separation_prefix 5s 3s +67%
ntt_native_aarch64 5s 4s +25%
pointwise_acc_native_aarch64 5s 5s +0%
poly_caddq_native_aarch64 5s 2s +150%
poly_challenge 5s 5s +0%
poly_power2round 5s 4s +25%
poly_uniform 5s 4s +25%
poly_uniform_gamma1_4x 5s 3s +67%
poly_use_hint 5s 4s +25%
poly_use_hint_native 5s 4s +25%
poly_use_hint_native_aarch64 5s 5s +0%
polyveck_caddq 5s 2s +150%
polyvecl_ntt 5s 4s +25%
rej_uniform_eta_native_aarch64 5s 2s +150%
sig_unpack_hints 5s 2s +150%
sign_keypair_internal 5s 3s +67%
sign_signature 5s 4s +25%
sign_signature_extmu 5s 3s +67%
sign_verify_pre_hash_internal 5s 4s +25%
keccak_absorb 4s 8s -50%
keccakf1600x4_xor_bytes_native 4s 2s +100%
make_hint 4s 2s +100%
mld_ct_sel_int32 4s 3s +33%
mld_value_barrier_u8 4s 4s +0%
pack_sig_z 4s 2s +100%
pointwise_native_aarch64 4s 5s -20%
pointwise_native_x86_64 4s 2s +100%
poly_caddq_native 4s 5s -20%
poly_decompose 4s 4s +0%
poly_permute_bitrev_to_custom_optional 4s 3s +33%
poly_uniform_gamma1 4s 2s +100%
polyeta_pack 4s 3s +33%
polyveck_invntt_tomont 4s 6s -33%
polyveck_pack_eta 4s 3s +33%
polyvecl_chknorm 4s 1s +300%
polyvecl_pack_eta 4s 2s +100%
polyvecl_uniform_gamma1_serial 4s 5s -20%
polyvecl_unpack_eta 4s 2s +100%
polyz_pack 4s 3s +33%
polyz_unpack_native_x86_64 4s 1s +300%
rej_eta_native 4s 5s -20%
shake256_absorb 4s 4s +0%
shake256x4_absorb_once 4s 2s +100%
sign_keypair 4s 6s -33%
sign_signature_pre_hash_shake256 4s 4s +0%
sign_verify_extmu 4s 5s -20%
sign_verify_pre_hash_shake256 4s 5s -20%
sk_t0hat_get_poly 4s 5s -20%
unpack_sk_s2hat 4s 4s +0%
use_hint 4s 5s -20%
keccak_f1600_x4_native_aarch64_v84a 3s 2s +50%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 3s 3s +0%
keccak_squeeze 3s 3s +0%
keccakf1600_permute 3s 1s +200%
keccakf1600_xor_bytes 3s 2s +50%
keccakf1600x4_extract_bytes 3s 3s +0%
mld_ct_abs_i32 3s 1s +200%
mld_ct_cmask_nonzero_u32 3s 2s +50%
mld_ct_get_optblocker_i64 3s 2s +50%
mld_ct_get_optblocker_u32 3s 2s +50%
mld_keccakf1600x4_xor_bytes_c 3s 2s +50%
mld_polymat_expand_entry 3s 2s +50%
mld_sample_s1_s2 3s 3s +0%
montgomery_reduce 3s 3s +0%
ntt_native_x86_64 3s 4s -25%
pack_sig_c 3s 4s -25%
pack_sk_rho_key_tr_s2 3s 3s +0%
poly_caddq 3s 4s -25%
poly_chknorm 3s 2s +50%
poly_chknorm_native 3s 3s +0%
poly_chknorm_native_x86_64 3s 3s +0%
poly_decompose_32_native_aarch64 3s 2s +50%
poly_decompose_c 3s 4s -25%
poly_decompose_native 3s 2s +50%
poly_invntt_tomont 3s 1s +200%
poly_invntt_tomont_native 3s 3s +0%
poly_ntt 3s 1s +200%
poly_ntt_native 3s 5s -40%
poly_pointwise_montgomery_native 3s 3s +0%
poly_reduce 3s 4s -25%
polyt0_pack 3s 5s -40%
polyt1_pack 3s 3s +0%
polyt1_unpack 3s 3s +0%
polyveck_ntt 3s 6s -50%
polyveck_reduce 3s 2s +50%
polyveck_unpack_eta 3s 4s -25%
polyvecl_pointwise_acc_montgomery_native 3s 6s -50%
polyvecl_unpack_z 3s 3s +0%
polyw1_pack 3s 2s +50%
polyw1_pack_32 3s 2s +50%
polyw1_pack_88 3s 3s +0%
polyz_unpack 3s 4s -25%
polyz_unpack_17_native_aarch64 3s 2s +50%
polyz_unpack_native 3s 3s +0%
power2round 3s 3s +0%
reduce32 3s 3s +0%
rej_eta 3s 3s +0%
rej_uniform_native_aarch64 3s 3s +0%
shake256 3s 2s +50%
shake256_init 3s 3s +0%
shake256_release 3s 1s +200%
shake256x4_squeezeblocks 3s 3s +0%
sk_s2hat_get_poly 3s 3s +0%
sys_check_capability 3s 2s +50%
unpack_pk_t1 3s 4s -25%
unpack_sk 3s 2s +50%
unpack_sk_s1hat 3s 3s +0%
yvec_get_poly 3s 3s +0%
yvec_init 3s 2s +50%
caddq 2s 5s -60%
decompose 2s 2s +0%
intt_native_aarch64 2s 2s +0%
intt_native_x86_64 2s 3s -33%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 2s 3s -33%
keccak_f1600_x4_native_avx2 2s 2s +0%
keccakf1600_permute_native 2s 3s -33%
keccakf1600_xor_bytes (big endian) 2s 3s -33%
keccakf1600x4_extract_bytes_native 2s 2s +0%
keccakf1600x4_permute 2s 3s -33%
keccakf1600x4_xor_bytes 2s 2s +0%
mld_ct_cmask_nonzero_u8 2s 2s +0%
mld_ct_get_optblocker_u8 2s 2s +0%
mld_keccakf1600_extract_bytes 2s 3s -33%
mld_keccakf1600x4_extract_bytes_c 2s 1s +100%
mld_sample_s1_s2_serial 2s 4s -50%
mld_value_barrier_i64 2s 2s +0%
nttunpack_native_x86_64 2s 3s -33%
pack_sig_h 2s 5s -60%
pack_sk_s1 2s 3s -33%
poly_caddq_native_x86_64 2s 2s +0%
poly_decompose_88_native_aarch64 2s 3s -33%
poly_ntt_c 2s 2s +0%
poly_permute_bitrev_to_custom_optional_native 2s 4s -50%
poly_pointwise_montgomery 2s 3s -33%
poly_shiftl 2s 3s -33%
poly_sub 2s 2s +0%
polyvec_matrix_pointwise_montgomery_row 2s 3s -33%
polyveck_pack_w1 2s 4s -50%
polyvecl_pointwise_acc_montgomery 2s 2s +0%
polyz_unpack_19_native_aarch64 2s 2s +0%
rej_eta_c 2s 4s -50%
shake128_absorb 2s 1s +100%
shake128_init 2s 2s +0%
shake128_release 2s 5s -60%
shake128_squeeze 2s 1s +100%
shake128x4_squeezeblocks 2s 1s +100%
shake256_finalize 2s 2s +0%
shake256_squeeze 2s 3s -33%
sign_verify 2s 2s +0%
sk_s1hat_get_poly 2s 5s -60%
fqscale - 2s -
keccak_f1600_x1_native_aarch64 1s 2s -50%
keccak_f1600_x1_native_aarch64_v84a 1s 4s -75%
keccakf1600_extract_bytes (big endian) 1s 1s +0%
mld_value_barrier_u32 1s 1s +0%
shake128_finalize 1s 4s -75%
shake128x4_absorb_once 1s 6s -83%
unpack_sk_t0hat 1s 3s -67%

@hanno-becker hanno-becker changed the title [WIP] Speed up C-reference NTT/invNTT with twisted zetas + 2-layer merging Speed up C-reference NTT/invNTT with twisted zetas + 2-layer merging May 13, 2026

@github-actions github-actions Bot left a comment


⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Arm Cortex-A55 (Snapdragon 888) benchmarks (opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

Benchmark suite Current: 38f3f91 Previous: a71b5d2 Ratio
ML-DSA-65 sign 1367463 cycles 1320861 cycles 1.04

This comment was automatically generated by workflow using github-action-benchmark.

@rod-chapman rod-chapman left a comment

Contributor


Looks good. One suggestion to improve proof times.


# Disable any setting of EXTERNAL_SAT_SOLVER, and choose SMT backend instead
EXTERNAL_SAT_SOLVER=
CBMCFLAGS=--bitwuzla

Contributor


On my laptop, the proof of this new implementation takes 131s with bitwuzla, so I tried z3, which completes the proof in about 26s. I suggest switching to CBMCFLAGS=--smt2.

@rod-chapman rod-chapman force-pushed the c_ntt_2 branch 2 times, most recently from efd51e6 to d1bcc5d Compare June 22, 2026 09:28
@rod-chapman

Contributor

22nd June updates.

Proof of mld_fqscale() fails as expected, since it does not meet the requirement to return a value bounded by MLDSA_Q, following the relaxation of the bound returned by mld_fqmul().

All other proofs, including those of the merged NTT and INTT, are OK for all parameter sets, with and without MLD_CONFIG_REDUCE_RAM.
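To make the mld_fqscale() issue concrete, here is a minimal sketch of one way a loosely bounded product could be brought back below MLDSA_Q in magnitude: a reduce32-style step applied to the result. This is an illustration only, not the fix chosen on this branch. The `sketch_` names are hypothetical, and the sketch assumes the relaxed output of mld_fqmul() still lies within the input range `a <= 2^31 - 2^22 - 1` and that `>>` on negative values is an arithmetic shift.

```c
#include <stdint.h>

#define SKETCH_Q 8380417 /* ML-DSA modulus (hypothetical name for this sketch) */

/* reduce32-style step: for a <= 2^31 - 2^22 - 1, returns r with
 * r == a (mod SKETCH_Q) and |r| < SKETCH_Q.
 * Relies on arithmetic right shift for negative a. */
static int32_t sketch_reduce32(int32_t a)
{
  int32_t t = (a + (1 << 22)) >> 23; /* rounded quotient estimate a / 2^23 */
  return a - t * SKETCH_Q;
}

/* Hypothetical wrapper: tighten a loosely bounded fqmul result so that the
 * caller can again rely on a bound of SKETCH_Q. */
static int32_t sketch_tighten(int32_t loosely_bounded)
{
  return sketch_reduce32(loosely_bounded);
}
```

The extra step costs a shift, a multiply and a subtraction per coefficient, so whether it belongs inside mld_fqscale() or whether the bound should be recovered some other way is a design decision for the authors.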

hanno-becker and others added 5 commits June 24, 2026 11:38
Replace the single-layer C-reference forward and inverse NTT in
`mldsa/src/poly.c` with one that merges two layers each.

Also, store each twiddle alongside its precomputed twist, letting
`mld_fqmul(a, b, b_twisted)` drop the multiply with MLDSA_Q^{-1}
that was previously hidden inside `mld_montgomery_reduce`.

Mirrors pq-code-package/mlkem-native/#463 and pq-code-package/mlkem-native/#683
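The twisting idea described above can be sketched as follows. This is a hedged illustration, not the code in `mldsa/src/poly.c`: all `sketch_` names are hypothetical, and the sketch assumes Montgomery radix 2^32, two's-complement narrowing conversions and arithmetic right shift. It shows why storing `b * Q^{-1} mod 2^32` next to each twiddle `b` removes one multiply per butterfly: `a * (b * Q^{-1})` and `(a * b) * Q^{-1}` agree modulo 2^32.

```c
#include <stdint.h>

#define SKETCH_Q 8380417     /* ML-DSA modulus */
#define SKETCH_QINV 58728449 /* Q^{-1} mod 2^32 */

/* Low 32 bits of x*y, reinterpreted as signed. The multiply itself is done
 * on unsigned operands so that wrap-around is well defined. */
static int32_t sketch_mul_lo32(int32_t x, int32_t y)
{
  return (int32_t)((uint32_t)x * (uint32_t)y);
}

/* Classic Montgomery reduction: a value congruent to a * 2^{-32} mod Q.
 * Note the multiply by SKETCH_QINV on every call. */
static int32_t sketch_montgomery_reduce(int64_t a)
{
  int32_t t = sketch_mul_lo32((int32_t)a, SKETCH_QINV);
  return (int32_t)((a - (int64_t)t * SKETCH_Q) >> 32);
}

/* Twist of a constant b; computed once, e.g. when building the zeta table. */
static int32_t sketch_twist(int32_t b)
{
  return sketch_mul_lo32(b, SKETCH_QINV);
}

/* Twisted multiply: same result as sketch_montgomery_reduce(a * b), but the
 * QINV multiply has been moved into the precomputed b_twisted. */
static int32_t sketch_fqmul_twisted(int32_t a, int32_t b, int32_t b_twisted)
{
  int32_t t = sketch_mul_lo32(a, b_twisted);
  return (int32_t)(((int64_t)a * b - (int64_t)t * SKETCH_Q) >> 32);
}
```

In this sketch the twisted and untwisted routines return identical values, so only the number of multiplies differs; the twiddle table doubles in size in exchange.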

Signed-off-by: Hanno Becker <beckphan@amazon.co.uk>
Signed-off-by: Rod Chapman <rodchap@amazon.com>
Signed-off-by: Rod Chapman <rodchap@amazon.com>
Signed-off-by: Rod Chapman <rodchap@amazon.com>
Proof of mld_fqscale() now expected to fail, since it returns a value
bounded by MLD_FQMUL_BOUND which is too weak. mld_fqscale() needs
to be updated to return a value bounded by MLDSA_Q before this
branch can be completed.

Signed-off-by: Rod Chapman <rodchap@amazon.com>

3 participants