C: Switch to [inv]NTT with 2+2+2+1 structure #1696
CBMC Results (ML-KEM-512): Full Results (195 proofs)
CBMC Results (ML-KEM-768): Full Results (195 proofs)
CBMC Results (ML-KEM-1024): Full Results (195 proofs)
Rewrite mlk_poly_ntt_c / mlk_poly_invntt_tomont_c to process two
layers at a time, with three 2-layer passes plus the leftover layer 7
as a single layer.
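
As a rough illustration of what processing two layers at a time means, the sketch below merges two Cooley-Tukey layers on a group of four coefficients, so each coefficient is loaded and stored once for both layers. The names `mulmod`, `ct` and `ct_two_layers`, the stride parameter `s`, and the use of a plain `%` reduction instead of Montgomery arithmetic are assumptions made for this example; the PR's actual loop structure and bounds handling may differ. Inputs are assumed to be small enough that the `int32_t` products do not overflow.

```c
#include <stdint.h>

#define MLKEM_Q 3329

/* Plain modular product in [0, MLKEM_Q); a stand-in for the Montgomery multiply. */
static int32_t mulmod(int32_t a, int32_t b)
{
  int32_t r = (a * b) % MLKEM_Q;
  return r < 0 ? r + MLKEM_Q : r;
}

/* One Cooley-Tukey butterfly: (x, y) -> (x + zeta*y, x - zeta*y). */
static void ct(int32_t *x, int32_t *y, int32_t zeta)
{
  int32_t t = mulmod(*y, zeta);
  *y = *x - t;
  *x = *x + t;
}

/* Two merged layers on the coefficients r[0], r[s], r[2s], r[3s].
 * First layer:  pairs (0, 2s) and (s, 3s), both with zeta_a.
 * Second layer: pair (0, s) with zeta_b0 and pair (2s, 3s) with zeta_b1. */
static void ct_two_layers(int32_t *r, unsigned s, int32_t zeta_a,
                          int32_t zeta_b0, int32_t zeta_b1)
{
  int32_t c0 = r[0], c1 = r[s], c2 = r[2 * s], c3 = r[3 * s];

  ct(&c0, &c2, zeta_a);
  ct(&c1, &c3, zeta_a);
  ct(&c0, &c1, zeta_b0);
  ct(&c2, &c3, zeta_b1);

  r[0] = c0;
  r[s] = c1;
  r[2 * s] = c2;
  r[3 * s] = c3;
}
```

With seven layers in total, three such merged passes cover layers 1 to 6, which leaves layer 7 to be handled on its own, as the description states.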
Introduces shared mlk_ct_butterfly and mlk_gs_butterfly helpers;
the inverse 2-layer block applies four GS butterflies and then
Barrett-reduces the additive outputs explicitly.
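
The two butterfly shapes named above can be sketched as follows. This is a minimal illustration, not the PR's code: `ct_butterfly`, `gs_butterfly` and `mulmod` here are hypothetical stand-ins, the sign convention of the GS difference is an assumption, and a plain `%` reduction replaces `mlk_fqmul`. The GS sum is deliberately left unreduced, mirroring the description that reduction of the additive outputs happens afterwards. Inputs are assumed to be small enough to stay within `int16_t`.

```c
#include <stdint.h>

#define MLKEM_Q 3329

/* Stand-in for mlk_fqmul: plain modular product, result in [0, MLKEM_Q). */
static int16_t mulmod(int16_t a, int16_t b)
{
  int32_t r = ((int32_t)a * b) % MLKEM_Q;
  return (int16_t)(r < 0 ? r + MLKEM_Q : r);
}

/* Cooley-Tukey butterfly: (a, b) -> (a + zeta*b, a - zeta*b). */
static void ct_butterfly(int16_t *a, int16_t *b, int16_t zeta)
{
  int16_t t = mulmod(*b, zeta);
  *b = (int16_t)(*a - t);
  *a = (int16_t)(*a + t);
}

/* Gentleman-Sande butterfly: (a, b) -> (a + b, (a - b)*zeta).
 * The sum is not reduced here; the caller reduces it later. */
static void gs_butterfly(int16_t *a, int16_t *b, int16_t zeta)
{
  int16_t t = *a;
  *a = (int16_t)(t + *b);
  *b = mulmod((int16_t)(t - *b), zeta);
}
```

Applying `ct_butterfly` with `zeta` and then `gs_butterfly` with the modular inverse of `zeta` returns the inputs scaled by 2 (modulo `MLKEM_Q`). For example, starting from `(5, 7)` with `zeta = 17` and inverse `1175`, the pair becomes `(124, -114)` and then `(10, 14)`.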
mlk_fqmul now takes a precomputed b_twisted = b * MLKEM_Q^{-1} mod 2^16
and uses a hi-mul / lo-mul-and-correct sequence in place of an inline
mlk_montgomery_reduce, dropping the QINV multiply. The mlk_zetas table
is regenerated as int16_t[128][2] of (zeta_mont, zeta_twisted) pairs.
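
The hi-mul / lo-mul-and-correct idea can be sketched as below. The helper names `low16`, `high16`, `twist` and `fqmul_twisted` are hypothetical and the actual `mlk_fqmul` signature and bounds may differ. The sketch assumes `MLKEM_Q = 3329`, for which `Q^{-1} mod 2^16 = 62209`. Because the twisted constant already contains the factor `Q^{-1}`, the low product `a * b_twisted` directly yields `a * b * Q^{-1} mod 2^16`, so no separate multiply by QINV is needed at run time.

```c
#include <stdint.h>

#define MLKEM_Q 3329
#define QINV 62209u /* MLKEM_Q^{-1} mod 2^16 */

/* Low 16 bits of x, interpreted as a signed 16-bit value. */
static int16_t low16(int32_t x)
{
  int32_t u = (int32_t)((uint32_t)x & 0xFFFFu);
  return (int16_t)(u < 0x8000 ? u : u - 0x10000);
}

/* floor(x / 2^16), written without relying on signed right shifts. */
static int16_t high16(int32_t x)
{
  int32_t u = (int32_t)((uint32_t)x & 0xFFFFu);
  return (int16_t)((x - u) / 0x10000);
}

/* Precompute b_twisted = b * MLKEM_Q^{-1} mod 2^16 for a fixed constant b. */
static int16_t twist(int16_t b)
{
  uint32_t p = (uint32_t)(uint16_t)b * QINV;
  return low16((int32_t)(p & 0xFFFFu));
}

/* Returns a value congruent to a * b * 2^{-16} modulo MLKEM_Q. */
static int16_t fqmul_twisted(int16_t a, int16_t b, int16_t b_twisted)
{
  int16_t hi = high16((int32_t)a * b);           /* high half of a*b        */
  int16_t lo = low16((int32_t)a * b_twisted);    /* a*b*Q^{-1} mod 2^16     */
  int16_t corr = high16((int32_t)lo * MLKEM_Q);  /* high half of lo*MLKEM_Q */
  return (int16_t)(hi - corr);
}
```

The low halves of `a*b` and `lo*MLKEM_Q` agree modulo 2^16, so subtracting the two high halves gives exactly `(a*b - lo*MLKEM_Q) / 2^16`. As a small check, `fqmul_twisted(1, 1, twist(1))` yields `169`, which is `2^{-16} mod 3329`.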
Signed-off-by: Hanno Becker <beckphan@amazon.co.uk>
oqs-bot left a comment

Mac Mini (M1, 2020) benchmarks

| Benchmark suite | Current: ee59089 | Previous: 070028c | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 12320 cycles | 12319 cycles | 1.00 |
| ML-KEM-512 encaps | 14999 cycles | 14997 cycles | 1.00 |
| ML-KEM-512 decaps | 19554 cycles | 19549 cycles | 1.00 |
| ML-KEM-768 keypair | 21264 cycles | 21264 cycles | 1 |
| ML-KEM-768 encaps | 23873 cycles | 23871 cycles | 1.00 |
| ML-KEM-768 decaps | 30416 cycles | 30423 cycles | 1.00 |
| ML-KEM-1024 keypair | 30328 cycles | 30327 cycles | 1.00 |
| ML-KEM-1024 encaps | 34574 cycles | 34573 cycles | 1.00 |
| ML-KEM-1024 decaps | 44191 cycles | 44191 cycles | 1 |

This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot left a comment

Intel Xeon 4th gen (c7i)

| Benchmark suite | Current: ee59089 | Previous: 070028c | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 12042 cycles | 12031 cycles | 1.00 |
| ML-KEM-512 encaps | 13614 cycles | 13792 cycles | 0.99 |
| ML-KEM-512 decaps | 17818 cycles | 17802 cycles | 1.00 |
| ML-KEM-768 keypair | 21294 cycles | 21035 cycles | 1.01 |
| ML-KEM-768 encaps | 22008 cycles | 22107 cycles | 1.00 |
| ML-KEM-768 decaps | 28034 cycles | 28330 cycles | 0.99 |
| ML-KEM-1024 keypair | 29563 cycles | 29964 cycles | 0.99 |
| ML-KEM-1024 encaps | 31689 cycles | 31704 cycles | 1.00 |
| ML-KEM-1024 decaps | 39367 cycles | 39312 cycles | 1.00 |
oqs-bot left a comment

Intel Xeon 4th gen (c7i) (no-opt)

| Benchmark suite | Current: ee59089 | Previous: 070028c | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 24431 cycles | 28199 cycles | 0.87 |
| ML-KEM-512 encaps | 29933 cycles | 36622 cycles | 0.82 |
| ML-KEM-512 decaps | 37694 cycles | 45214 cycles | 0.83 |
| ML-KEM-768 keypair | 40053 cycles | 46304 cycles | 0.87 |
| ML-KEM-768 encaps | 50467 cycles | 55843 cycles | 0.90 |
| ML-KEM-768 decaps | 59728 cycles | 69876 cycles | 0.85 |
| ML-KEM-1024 keypair | 64223 cycles | 70436 cycles | 0.91 |
| ML-KEM-1024 encaps | 76545 cycles | 82480 cycles | 0.93 |
| ML-KEM-1024 decaps | 86915 cycles | 99348 cycles | 0.87 |
oqs-bot left a comment

AMD EPYC 3rd gen (c6a)

| Benchmark suite | Current: ee59089 | Previous: 070028c | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 14215 cycles | 14239 cycles | 1.00 |
| ML-KEM-512 encaps | 15990 cycles | 15964 cycles | 1.00 |
| ML-KEM-512 decaps | 21534 cycles | 21528 cycles | 1.00 |
| ML-KEM-768 keypair | 25122 cycles | 24710 cycles | 1.02 |
| ML-KEM-768 encaps | 25669 cycles | 25470 cycles | 1.01 |
| ML-KEM-768 decaps | 33537 cycles | 33335 cycles | 1.01 |
| ML-KEM-1024 keypair | 34894 cycles | 37146 cycles | 0.94 |
| ML-KEM-1024 encaps | 36116 cycles | 36786 cycles | 0.98 |
| ML-KEM-1024 decaps | 47225 cycles | 46716 cycles | 1.01 |
oqs-bot left a comment

AMD EPYC 4th gen (c7a)

| Benchmark suite | Current: ee59089 | Previous: 070028c | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 12789 cycles | 12790 cycles | 1.00 |
| ML-KEM-512 encaps | 14279 cycles | 14273 cycles | 1.00 |
| ML-KEM-512 decaps | 19139 cycles | 19129 cycles | 1.00 |
| ML-KEM-768 keypair | 22564 cycles | 22413 cycles | 1.01 |
| ML-KEM-768 encaps | 23063 cycles | 23072 cycles | 1.00 |
| ML-KEM-768 decaps | 30067 cycles | 30061 cycles | 1.00 |
| ML-KEM-1024 keypair | 34215 cycles | 33027 cycles | 1.04 |
| ML-KEM-1024 encaps | 33003 cycles | 33126 cycles | 1.00 |
| ML-KEM-1024 decaps | 42408 cycles | 42412 cycles | 1.00 |
oqs-bot left a comment

⚠️ Performance Alert ⚠️

A possible performance regression was detected for benchmark 'AMD EPYC 4th gen (c7a)'.
The benchmark result of this commit is worse than the previous benchmark result, exceeding the threshold of 1.03.

| Benchmark suite | Current: ee59089 | Previous: 070028c | Ratio |
|---|---|---|---|
| ML-KEM-1024 keypair | 34215 cycles | 33027 cycles | 1.04 |
oqs-bot left a comment

ppc64le (POWER10) benchmarks

| Benchmark suite | Current: ee59089 | Previous: 070028c | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 52364 cycles | 59434 cycles | 0.88 |
| ML-KEM-512 encaps | 62294 cycles | 72134 cycles | 0.86 |
| ML-KEM-512 decaps | 76687 cycles | 92082 cycles | 0.83 |
| ML-KEM-768 keypair | 89778 cycles | 99316 cycles | 0.90 |
| ML-KEM-768 encaps | 103607 cycles | 115930 cycles | 0.89 |
| ML-KEM-768 decaps | 122375 cycles | 141912 cycles | 0.86 |
| ML-KEM-1024 keypair | 139005 cycles | 150195 cycles | 0.93 |
| ML-KEM-1024 encaps | 155445 cycles | 169079 cycles | 0.92 |
| ML-KEM-1024 decaps | 177695 cycles | 200415 cycles | 0.89 |
oqs-bot left a comment

Graviton4

| Benchmark suite | Current: ee59089 | Previous: 070028c | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 17690 cycles | 17644 cycles | 1.00 |
| ML-KEM-512 encaps | 20636 cycles | 20596 cycles | 1.00 |
| ML-KEM-512 decaps | 27083 cycles | 27048 cycles | 1.00 |
| ML-KEM-768 keypair | 29976 cycles | 29903 cycles | 1.00 |
| ML-KEM-768 encaps | 32752 cycles | 32771 cycles | 1.00 |
| ML-KEM-768 decaps | 42010 cycles | 41962 cycles | 1.00 |
| ML-KEM-1024 keypair | 43720 cycles | 43743 cycles | 1.00 |
| ML-KEM-1024 encaps | 48775 cycles | 48657 cycles | 1.00 |
| ML-KEM-1024 decaps | 61383 cycles | 61383 cycles | 1 |
oqs-bot left a comment

Intel Xeon 3rd gen (c6i)

| Benchmark suite | Current: ee59089 | Previous: 070028c | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 17600 cycles | 17540 cycles | 1.00 |
| ML-KEM-512 encaps | 19907 cycles | 19937 cycles | 1.00 |
| ML-KEM-512 decaps | 26420 cycles | 26445 cycles | 1.00 |
| ML-KEM-768 keypair | 31206 cycles | 31159 cycles | 1.00 |
| ML-KEM-768 encaps | 31864 cycles | 32046 cycles | 0.99 |
| ML-KEM-768 decaps | 41472 cycles | 41536 cycles | 1.00 |
| ML-KEM-1024 keypair | 43815 cycles | 43957 cycles | 1.00 |
| ML-KEM-1024 encaps | 45880 cycles | 45616 cycles | 1.01 |
| ML-KEM-1024 decaps | 58050 cycles | 58219 cycles | 1.00 |
oqs-bot left a comment

AMD EPYC 3rd gen (c6a) (no-opt)

| Benchmark suite | Current: ee59089 | Previous: 070028c | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 33786 cycles | 40258 cycles | 0.84 |
| ML-KEM-512 encaps | 41402 cycles | 48385 cycles | 0.86 |
| ML-KEM-512 decaps | 51313 cycles | 62592 cycles | 0.82 |
| ML-KEM-768 keypair | 54176 cycles | 63729 cycles | 0.85 |
| ML-KEM-768 encaps | 65367 cycles | 74928 cycles | 0.87 |
| ML-KEM-768 decaps | 78299 cycles | 93722 cycles | 0.84 |
| ML-KEM-1024 keypair | 84344 cycles | 95285 cycles | 0.89 |
| ML-KEM-1024 encaps | 98201 cycles | 109505 cycles | 0.90 |
| ML-KEM-1024 decaps | 114088 cycles | 132331 cycles | 0.86 |
oqs-bot left a comment

AMD EPYC 4th gen (c7a) (no-opt)

| Benchmark suite | Current: ee59089 | Previous: 070028c | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 30684 cycles | 36614 cycles | 0.84 |
| ML-KEM-512 encaps | 36951 cycles | 43076 cycles | 0.86 |
| ML-KEM-512 decaps | 45696 cycles | 55713 cycles | 0.82 |
| ML-KEM-768 keypair | 49371 cycles | 58664 cycles | 0.84 |
| ML-KEM-768 encaps | 58602 cycles | 67519 cycles | 0.87 |
| ML-KEM-768 decaps | 69939 cycles | 84462 cycles | 0.83 |
| ML-KEM-1024 keypair | 76257 cycles | 88980 cycles | 0.86 |
| ML-KEM-1024 encaps | 87602 cycles | 99212 cycles | 0.88 |
| ML-KEM-1024 decaps | 101779 cycles | 120642 cycles | 0.84 |
oqs-bot left a comment

Graviton4 (no-opt)

| Benchmark suite | Current: ee59089 | Previous: 070028c | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 34213 cycles | 35412 cycles | 0.97 |
| ML-KEM-512 encaps | 38932 cycles | 40111 cycles | 0.97 |
| ML-KEM-512 decaps | 48961 cycles | 51138 cycles | 0.96 |
| ML-KEM-768 keypair | 54993 cycles | 56668 cycles | 0.97 |
| ML-KEM-768 encaps | 63100 cycles | 65152 cycles | 0.97 |
| ML-KEM-768 decaps | 75917 cycles | 79299 cycles | 0.96 |
| ML-KEM-1024 keypair | 85413 cycles | 87866 cycles | 0.97 |
| ML-KEM-1024 encaps | 94656 cycles | 96875 cycles | 0.98 |
| ML-KEM-1024 decaps | 111743 cycles | 115827 cycles | 0.96 |
oqs-bot left a comment

Graviton3

| Benchmark suite | Current: ee59089 | Previous: 070028c | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 18675 cycles | 18637 cycles | 1.00 |
| ML-KEM-512 encaps | 21886 cycles | 21874 cycles | 1.00 |
| ML-KEM-512 decaps | 28890 cycles | 28863 cycles | 1.00 |
| ML-KEM-768 keypair | 31630 cycles | 31540 cycles | 1.00 |
| ML-KEM-768 encaps | 34788 cycles | 34773 cycles | 1.00 |
| ML-KEM-768 decaps | 44835 cycles | 44778 cycles | 1.00 |
| ML-KEM-1024 keypair | 46068 cycles | 46080 cycles | 1.00 |
| ML-KEM-1024 encaps | 51494 cycles | 51490 cycles | 1.00 |
| ML-KEM-1024 decaps | 65004 cycles | 65028 cycles | 1.00 |
oqs-bot left a comment

Intel Xeon 3rd gen (c6i) (no-opt)

| Benchmark suite | Current: ee59089 | Previous: 070028c | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 39358 cycles | 45700 cycles | 0.86 |
| ML-KEM-512 encaps | 46964 cycles | 54451 cycles | 0.86 |
| ML-KEM-512 decaps | 57447 cycles | 69774 cycles | 0.82 |
| ML-KEM-768 keypair | 62990 cycles | 74220 cycles | 0.85 |
| ML-KEM-768 encaps | 74954 cycles | 86044 cycles | 0.87 |
| ML-KEM-768 decaps | 88809 cycles | 106669 cycles | 0.83 |
| ML-KEM-1024 keypair | 100417 cycles | 112098 cycles | 0.90 |
| ML-KEM-1024 encaps | 110546 cycles | 124743 cycles | 0.89 |
| ML-KEM-1024 decaps | 128093 cycles | 150712 cycles | 0.85 |
oqs-bot left a comment

Arm Cortex-A55 (Snapdragon 888) benchmarks

| Benchmark suite | Current: ee59089 | Previous: 070028c | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 59837 cycles | 59732 cycles | 1.00 |
| ML-KEM-512 encaps | 67447 cycles | 67418 cycles | 1.00 |
| ML-KEM-512 decaps | 86186 cycles | 86116 cycles | 1.00 |
| ML-KEM-768 keypair | 97444 cycles | 97471 cycles | 1.00 |
| ML-KEM-768 encaps | 110872 cycles | 111029 cycles | 1.00 |
| ML-KEM-768 decaps | 137941 cycles | 137995 cycles | 1.00 |
| ML-KEM-1024 keypair | 154689 cycles | 154794 cycles | 1.00 |
| ML-KEM-1024 encaps | 171850 cycles | 171103 cycles | 1.00 |
| ML-KEM-1024 decaps | 209734 cycles | 208406 cycles | 1.01 |
oqs-bot left a comment

Arm Cortex-A76 (Raspberry Pi 5) benchmarks

| Benchmark suite | Current: ee59089 | Previous: 070028c | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 28264 cycles | 28220 cycles | 1.00 |
| ML-KEM-512 encaps | 34156 cycles | 34107 cycles | 1.00 |
| ML-KEM-512 decaps | 44374 cycles | 44335 cycles | 1.00 |
| ML-KEM-768 keypair | 47618 cycles | 47615 cycles | 1.00 |
| ML-KEM-768 encaps | 53933 cycles | 53937 cycles | 1.00 |
| ML-KEM-768 decaps | 68339 cycles | 68365 cycles | 1.00 |
| ML-KEM-1024 keypair | 70245 cycles | 70246 cycles | 1.00 |
| ML-KEM-1024 encaps | 78734 cycles | 78726 cycles | 1.00 |
| ML-KEM-1024 decaps | 98418 cycles | 98445 cycles | 1.00 |
oqs-bot left a comment

Graviton3 (no-opt)

| Benchmark suite | Current: ee59089 | Previous: 070028c | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 37514 cycles | 38880 cycles | 0.96 |
| ML-KEM-512 encaps | 42772 cycles | 44586 cycles | 0.96 |
| ML-KEM-512 decaps | 53687 cycles | 56659 cycles | 0.95 |
| ML-KEM-768 keypair | 60007 cycles | 62298 cycles | 0.96 |
| ML-KEM-768 encaps | 68948 cycles | 72317 cycles | 0.95 |
| ML-KEM-768 decaps | 82493 cycles | 87701 cycles | 0.94 |
| ML-KEM-1024 keypair | 93052 cycles | 96154 cycles | 0.97 |
| ML-KEM-1024 encaps | 103167 cycles | 106126 cycles | 0.97 |
| ML-KEM-1024 decaps | 121291 cycles | 126570 cycles | 0.96 |
oqs-bot left a comment

Graviton2

| Benchmark suite | Current: ee59089 | Previous: 070028c | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 28238 cycles | 28269 cycles | 1.00 |
| ML-KEM-512 encaps | 34162 cycles | 34122 cycles | 1.00 |
| ML-KEM-512 decaps | 44342 cycles | 44378 cycles | 1.00 |
| ML-KEM-768 keypair | 47642 cycles | 47674 cycles | 1.00 |
| ML-KEM-768 encaps | 53923 cycles | 53908 cycles | 1.00 |
| ML-KEM-768 decaps | 68400 cycles | 68363 cycles | 1.00 |
| ML-KEM-1024 keypair | 70382 cycles | 70273 cycles | 1.00 |
| ML-KEM-1024 encaps | 78782 cycles | 78768 cycles | 1.00 |
| ML-KEM-1024 decaps | 98584 cycles | 98473 cycles | 1.00 |
oqs-bot left a comment

Graviton2 (no-opt)

| Benchmark suite | Current: ee59089 | Previous: 070028c | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 55880 cycles | 59124 cycles | 0.95 |
| ML-KEM-512 encaps | 65578 cycles | 68626 cycles | 0.96 |
| ML-KEM-512 decaps | 82286 cycles | 87341 cycles | 0.94 |
| ML-KEM-768 keypair | 90360 cycles | 95326 cycles | 0.95 |
| ML-KEM-768 encaps | 104886 cycles | 109860 cycles | 0.95 |
| ML-KEM-768 decaps | 126772 cycles | 134332 cycles | 0.94 |
| ML-KEM-1024 keypair | 140425 cycles | 147915 cycles | 0.95 |
| ML-KEM-1024 encaps | 157358 cycles | 163791 cycles | 0.96 |
| ML-KEM-1024 decaps | 185033 cycles | 195404 cycles | 0.95 |
oqs-bot left a comment

Arm Cortex-A72 (Raspberry Pi 4) benchmarks

| Benchmark suite | Current: ee59089 | Previous: 070028c | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 51033 cycles | 50865 cycles | 1.00 |
| ML-KEM-512 encaps | 58913 cycles | 58841 cycles | 1.00 |
| ML-KEM-512 decaps | 74849 cycles | 74794 cycles | 1.00 |
| ML-KEM-768 keypair | 86918 cycles | 86024 cycles | 1.01 |
| ML-KEM-768 encaps | 95132 cycles | 94487 cycles | 1.01 |
| ML-KEM-768 decaps | 118067 cycles | 119530 cycles | 0.99 |
| ML-KEM-1024 keypair | 130121 cycles | 130071 cycles | 1.00 |
| ML-KEM-1024 encaps | 142355 cycles | 142892 cycles | 1.00 |
| ML-KEM-1024 decaps | 174938 cycles | 173373 cycles | 1.01 |
oqs-bot left a comment

SpacemiT K1 8 (Banana Pi F3) benchmarks

| Benchmark suite | Current: ee59089 | Previous: 070028c | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 155485 cycles | 155504 cycles | 1.00 |
| ML-KEM-512 encaps | 163394 cycles | 163399 cycles | 1.00 |
| ML-KEM-512 decaps | 206591 cycles | 206667 cycles | 1.00 |
| ML-KEM-768 keypair | 249903 cycles | 249893 cycles | 1.00 |
| ML-KEM-768 encaps | 270434 cycles | 270406 cycles | 1.00 |
| ML-KEM-768 decaps | 332188 cycles | 332823 cycles | 1.00 |
| ML-KEM-1024 keypair | 395688 cycles | 395922 cycles | 1.00 |
| ML-KEM-1024 encaps | 422596 cycles | 423034 cycles | 1.00 |
| ML-KEM-1024 decaps | 506524 cycles | 506558 cycles | 1.00 |