x86_64 + HOL-Light: Replace poly_use_hint AVX2 intrinsics with hand-written assembly and HOL-Light proofs#1189
x86_64 + HOL-Light: Replace poly_use_hint AVX2 intrinsics with hand-written assembly and HOL-Light proofs#1189jakemas wants to merge 1 commit into
Conversation
9981d8b to
5dfcb0c
Compare
CBMC Results (ML-DSA-65, REDUCE-RAM)Full Results (206 proofs)
|
CBMC Results (ML-DSA-44, REDUCE-RAM)Full Results (206 proofs)
|
CBMC Results (ML-DSA-87, REDUCE-RAM)Full Results (206 proofs)
|
CBMC Results (ML-DSA-65)Full Results (206 proofs)
|
CBMC Results (ML-DSA-44)Full Results (206 proofs)
|
CBMC Results (ML-DSA-87)Full Results (206 proofs)
|
5dfcb0c to
2b012e5
Compare
There was a problem hiding this comment.
Mac Mini (M1, 2020) benchmarks (opt)
Details
| Benchmark suite | Current: 2b012e5 | Previous: e5884bb | Ratio |
|---|---|---|---|
ML-DSA-44 keypair |
46543 cycles |
46505 cycles |
1.00 |
ML-DSA-44 sign |
131110 cycles |
131069 cycles |
1.00 |
ML-DSA-44 verify |
47346 cycles |
47311 cycles |
1.00 |
ML-DSA-65 keypair |
81683 cycles |
81686 cycles |
1.00 |
ML-DSA-65 sign |
215314 cycles |
215297 cycles |
1.00 |
ML-DSA-65 verify |
79298 cycles |
79302 cycles |
1.00 |
ML-DSA-87 keypair |
132406 cycles |
132401 cycles |
1.00 |
ML-DSA-87 sign |
277266 cycles |
277556 cycles |
1.00 |
ML-DSA-87 verify |
134048 cycles |
134051 cycles |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Mac Mini (M1, 2020) benchmarks (no-opt)
Details
| Benchmark suite | Current: 2b012e5 | Previous: e5884bb | Ratio |
|---|---|---|---|
ML-DSA-44 keypair |
112979 cycles |
112792 cycles |
1.00 |
ML-DSA-44 sign |
401852 cycles |
401073 cycles |
1.00 |
ML-DSA-44 verify |
119681 cycles |
119472 cycles |
1.00 |
ML-DSA-65 keypair |
193030 cycles |
192928 cycles |
1.00 |
ML-DSA-65 sign |
650200 cycles |
649950 cycles |
1.00 |
ML-DSA-65 verify |
192928 cycles |
192868 cycles |
1.00 |
ML-DSA-87 keypair |
318757 cycles |
318801 cycles |
1.00 |
ML-DSA-87 sign |
828689 cycles |
828900 cycles |
1.00 |
ML-DSA-87 verify |
326680 cycles |
326715 cycles |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Arm Cortex-A76 (Raspberry Pi 5) benchmarks (opt)
Details
| Benchmark suite | Current: 2b012e5 | Previous: e5884bb | Ratio |
|---|---|---|---|
ML-DSA-44 keypair |
112165 cycles |
112135 cycles |
1.00 |
ML-DSA-44 sign |
353502 cycles |
353883 cycles |
1.00 |
ML-DSA-44 verify |
117018 cycles |
117213 cycles |
1.00 |
ML-DSA-65 keypair |
194781 cycles |
194346 cycles |
1.00 |
ML-DSA-65 sign |
583902 cycles |
583663 cycles |
1.00 |
ML-DSA-65 verify |
192714 cycles |
193077 cycles |
1.00 |
ML-DSA-87 keypair |
320922 cycles |
320080 cycles |
1.00 |
ML-DSA-87 sign |
747323 cycles |
747211 cycles |
1.00 |
ML-DSA-87 verify |
318745 cycles |
317932 cycles |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Arm Cortex-A76 (Raspberry Pi 5) benchmarks (no-opt)
Details
| Benchmark suite | Current: 2b012e5 | Previous: e5884bb | Ratio |
|---|---|---|---|
ML-DSA-44 keypair |
211556 cycles |
211551 cycles |
1.00 |
ML-DSA-44 sign |
758823 cycles |
759909 cycles |
1.00 |
ML-DSA-44 verify |
229004 cycles |
229380 cycles |
1.00 |
ML-DSA-65 keypair |
376999 cycles |
378142 cycles |
1.00 |
ML-DSA-65 sign |
1247052 cycles |
1247244 cycles |
1.00 |
ML-DSA-65 verify |
371175 cycles |
371972 cycles |
1.00 |
ML-DSA-87 keypair |
600407 cycles |
601369 cycles |
1.00 |
ML-DSA-87 sign |
1582648 cycles |
1581866 cycles |
1.00 |
ML-DSA-87 verify |
616058 cycles |
617247 cycles |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
There was a problem hiding this comment.
Intel Xeon 4th gen (c7i)
Details
| Benchmark suite | Current: 2b012e5 | Previous: e5884bb | Ratio |
|---|---|---|---|
ML-DSA-44 keypair |
43539 cycles |
43355 cycles |
1.00 |
ML-DSA-44 sign |
130496 cycles |
130544 cycles |
1.00 |
ML-DSA-44 verify |
45200 cycles |
45221 cycles |
1.00 |
ML-DSA-65 keypair |
75740 cycles |
75384 cycles |
1.00 |
ML-DSA-65 sign |
214734 cycles |
214301 cycles |
1.00 |
ML-DSA-65 verify |
74318 cycles |
74395 cycles |
1.00 |
ML-DSA-87 keypair |
123030 cycles |
123229 cycles |
1.00 |
ML-DSA-87 sign |
271641 cycles |
270971 cycles |
1.00 |
ML-DSA-87 verify |
120752 cycles |
120534 cycles |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
There was a problem hiding this comment.
Intel Xeon 3rd gen (c6i)
Details
| Benchmark suite | Current: 2b012e5 | Previous: e5884bb | Ratio |
|---|---|---|---|
ML-DSA-44 keypair |
61570 cycles |
61347 cycles |
1.00 |
ML-DSA-44 sign |
188839 cycles |
189018 cycles |
1.00 |
ML-DSA-44 verify |
66442 cycles |
66490 cycles |
1.00 |
ML-DSA-65 keypair |
110091 cycles |
107823 cycles |
1.02 |
ML-DSA-65 sign |
314575 cycles |
311956 cycles |
1.01 |
ML-DSA-65 verify |
110040 cycles |
108642 cycles |
1.01 |
ML-DSA-87 keypair |
171189 cycles |
171470 cycles |
1.00 |
ML-DSA-87 sign |
379547 cycles |
378524 cycles |
1.00 |
ML-DSA-87 verify |
171219 cycles |
169439 cycles |
1.01 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
There was a problem hiding this comment.
Intel Xeon 4th gen (c7i) (no-opt)
Details
| Benchmark suite | Current: 2b012e5 | Previous: e5884bb | Ratio |
|---|---|---|---|
ML-DSA-44 keypair |
91607 cycles |
91483 cycles |
1.00 |
ML-DSA-44 sign |
351599 cycles |
351948 cycles |
1.00 |
ML-DSA-44 verify |
99770 cycles |
99772 cycles |
1.00 |
ML-DSA-65 keypair |
153995 cycles |
153840 cycles |
1.00 |
ML-DSA-65 sign |
572344 cycles |
570802 cycles |
1.00 |
ML-DSA-65 verify |
159837 cycles |
159638 cycles |
1.00 |
ML-DSA-87 keypair |
255277 cycles |
255330 cycles |
1.00 |
ML-DSA-87 sign |
726258 cycles |
726314 cycles |
1.00 |
ML-DSA-87 verify |
263867 cycles |
264290 cycles |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
There was a problem hiding this comment.
AMD EPYC 3rd gen (c6a)
Details
| Benchmark suite | Current: 2b012e5 | Previous: e5884bb | Ratio |
|---|---|---|---|
ML-DSA-44 keypair |
55426 cycles |
55817 cycles |
0.99 |
ML-DSA-44 sign |
159150 cycles |
160142 cycles |
0.99 |
ML-DSA-44 verify |
57595 cycles |
58345 cycles |
0.99 |
ML-DSA-65 keypair |
95738 cycles |
96668 cycles |
0.99 |
ML-DSA-65 sign |
260961 cycles |
264062 cycles |
0.99 |
ML-DSA-65 verify |
96292 cycles |
97394 cycles |
0.99 |
ML-DSA-87 keypair |
154498 cycles |
156370 cycles |
0.99 |
ML-DSA-87 sign |
323133 cycles |
326034 cycles |
0.99 |
ML-DSA-87 verify |
151292 cycles |
152600 cycles |
0.99 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
There was a problem hiding this comment.
Intel Xeon 3rd gen (c6i) (no-opt)
Details
| Benchmark suite | Current: 2b012e5 | Previous: e5884bb | Ratio |
|---|---|---|---|
ML-DSA-44 keypair |
154549 cycles |
154341 cycles |
1.00 |
ML-DSA-44 sign |
590341 cycles |
588680 cycles |
1.00 |
ML-DSA-44 verify |
169731 cycles |
169051 cycles |
1.00 |
ML-DSA-65 keypair |
262129 cycles |
262010 cycles |
1.00 |
ML-DSA-65 sign |
961211 cycles |
962118 cycles |
1.00 |
ML-DSA-65 verify |
271442 cycles |
271267 cycles |
1.00 |
ML-DSA-87 keypair |
431136 cycles |
431402 cycles |
1.00 |
ML-DSA-87 sign |
1212224 cycles |
1210399 cycles |
1.00 |
ML-DSA-87 verify |
447543 cycles |
447214 cycles |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
There was a problem hiding this comment.
Graviton4
Details
| Benchmark suite | Current: 2b012e5 | Previous: e5884bb | Ratio |
|---|---|---|---|
ML-DSA-44 keypair |
67229 cycles |
67424 cycles |
1.00 |
ML-DSA-44 sign |
198371 cycles |
198161 cycles |
1.00 |
ML-DSA-44 verify |
70305 cycles |
70119 cycles |
1.00 |
ML-DSA-65 keypair |
119276 cycles |
119402 cycles |
1.00 |
ML-DSA-65 sign |
326136 cycles |
326062 cycles |
1.00 |
ML-DSA-65 verify |
116872 cycles |
116765 cycles |
1.00 |
ML-DSA-87 keypair |
196540 cycles |
196679 cycles |
1.00 |
ML-DSA-87 sign |
421405 cycles |
421900 cycles |
1.00 |
ML-DSA-87 verify |
193296 cycles |
193403 cycles |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
There was a problem hiding this comment.
Graviton3
Details
| Benchmark suite | Current: 2b012e5 | Previous: e5884bb | Ratio |
|---|---|---|---|
ML-DSA-44 keypair |
71403 cycles |
71551 cycles |
1.00 |
ML-DSA-44 sign |
208982 cycles |
208983 cycles |
1.00 |
ML-DSA-44 verify |
74836 cycles |
74733 cycles |
1.00 |
ML-DSA-65 keypair |
125933 cycles |
125927 cycles |
1.00 |
ML-DSA-65 sign |
345627 cycles |
345490 cycles |
1.00 |
ML-DSA-65 verify |
124118 cycles |
124185 cycles |
1.00 |
ML-DSA-87 keypair |
207065 cycles |
206521 cycles |
1.00 |
ML-DSA-87 sign |
444070 cycles |
439805 cycles |
1.01 |
ML-DSA-87 verify |
204100 cycles |
204446 cycles |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
There was a problem hiding this comment.
AMD EPYC 3rd gen (c6a) (no-opt)
Details
| Benchmark suite | Current: 2b012e5 | Previous: e5884bb | Ratio |
|---|---|---|---|
ML-DSA-44 keypair |
133180 cycles |
134773 cycles |
0.99 |
ML-DSA-44 sign |
518749 cycles |
524514 cycles |
0.99 |
ML-DSA-44 verify |
146562 cycles |
147960 cycles |
0.99 |
ML-DSA-65 keypair |
225259 cycles |
228152 cycles |
0.99 |
ML-DSA-65 sign |
846103 cycles |
854970 cycles |
0.99 |
ML-DSA-65 verify |
234865 cycles |
238032 cycles |
0.99 |
ML-DSA-87 keypair |
367122 cycles |
371227 cycles |
0.99 |
ML-DSA-87 sign |
1060070 cycles |
1069626 cycles |
0.99 |
ML-DSA-87 verify |
381059 cycles |
384666 cycles |
0.99 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
There was a problem hiding this comment.
AMD EPYC 4th gen (c7a)
Details
| Benchmark suite | Current: 2b012e5 | Previous: e5884bb | Ratio |
|---|---|---|---|
ML-DSA-44 keypair |
46429 cycles |
46780 cycles |
0.99 |
ML-DSA-44 sign |
139202 cycles |
138851 cycles |
1.00 |
ML-DSA-44 verify |
49535 cycles |
49217 cycles |
1.01 |
ML-DSA-65 keypair |
82794 cycles |
82672 cycles |
1.00 |
ML-DSA-65 sign |
227727 cycles |
227571 cycles |
1.00 |
ML-DSA-65 verify |
82058 cycles |
82025 cycles |
1.00 |
ML-DSA-87 keypair |
129470 cycles |
130121 cycles |
0.99 |
ML-DSA-87 sign |
280908 cycles |
280059 cycles |
1.00 |
ML-DSA-87 verify |
128521 cycles |
128980 cycles |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
There was a problem hiding this comment.
Graviton4 (no-opt)
Details
| Benchmark suite | Current: 2b012e5 | Previous: e5884bb | Ratio |
|---|---|---|---|
ML-DSA-44 keypair |
127570 cycles |
127652 cycles |
1.00 |
ML-DSA-44 sign |
441188 cycles |
441196 cycles |
1.00 |
ML-DSA-44 verify |
136371 cycles |
136381 cycles |
1.00 |
ML-DSA-65 keypair |
220497 cycles |
220706 cycles |
1.00 |
ML-DSA-65 sign |
714325 cycles |
713750 cycles |
1.00 |
ML-DSA-65 verify |
221016 cycles |
220752 cycles |
1.00 |
ML-DSA-87 keypair |
364546 cycles |
365114 cycles |
1.00 |
ML-DSA-87 sign |
915587 cycles |
921254 cycles |
0.99 |
ML-DSA-87 verify |
370845 cycles |
370795 cycles |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
There was a problem hiding this comment.
Graviton3 (no-opt)
Details
| Benchmark suite | Current: 2b012e5 | Previous: e5884bb | Ratio |
|---|---|---|---|
ML-DSA-44 keypair |
137943 cycles |
138043 cycles |
1.00 |
ML-DSA-44 sign |
486093 cycles |
486132 cycles |
1.00 |
ML-DSA-44 verify |
149025 cycles |
149080 cycles |
1.00 |
ML-DSA-65 keypair |
241723 cycles |
241858 cycles |
1.00 |
ML-DSA-65 sign |
792077 cycles |
791595 cycles |
1.00 |
ML-DSA-65 verify |
242156 cycles |
241299 cycles |
1.00 |
ML-DSA-87 keypair |
395769 cycles |
396308 cycles |
1.00 |
ML-DSA-87 sign |
1013650 cycles |
1019320 cycles |
0.99 |
ML-DSA-87 verify |
403627 cycles |
403778 cycles |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)
Details
| Benchmark suite | Current: 2b012e5 | Previous: e5884bb | Ratio |
|---|---|---|---|
ML-DSA-44 keypair |
227585 cycles |
229314 cycles |
0.99 |
ML-DSA-44 sign |
617247 cycles |
633574 cycles |
0.97 |
ML-DSA-44 verify |
223087 cycles |
223313 cycles |
1.00 |
ML-DSA-65 keypair |
383083 cycles |
387745 cycles |
0.99 |
ML-DSA-65 sign |
991379 cycles |
1000126 cycles |
0.99 |
ML-DSA-65 verify |
365888 cycles |
368820 cycles |
0.99 |
ML-DSA-87 keypair |
631971 cycles |
640246 cycles |
0.99 |
ML-DSA-87 sign |
1311893 cycles |
1340002 cycles |
0.98 |
ML-DSA-87 verify |
618225 cycles |
625960 cycles |
0.99 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
There was a problem hiding this comment.
Graviton2
Details
| Benchmark suite | Current: 2b012e5 | Previous: e5884bb | Ratio |
|---|---|---|---|
ML-DSA-44 keypair |
112409 cycles |
112383 cycles |
1.00 |
ML-DSA-44 sign |
354351 cycles |
353857 cycles |
1.00 |
ML-DSA-44 verify |
117504 cycles |
117230 cycles |
1.00 |
ML-DSA-65 keypair |
194514 cycles |
194621 cycles |
1.00 |
ML-DSA-65 sign |
584390 cycles |
584356 cycles |
1.00 |
ML-DSA-65 verify |
193521 cycles |
193140 cycles |
1.00 |
ML-DSA-87 keypair |
320989 cycles |
320887 cycles |
1.00 |
ML-DSA-87 sign |
747858 cycles |
746586 cycles |
1.00 |
ML-DSA-87 verify |
318073 cycles |
318597 cycles |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
There was a problem hiding this comment.
AMD EPYC 4th gen (c7a) (no-opt)
Details
| Benchmark suite | Current: 2b012e5 | Previous: e5884bb | Ratio |
|---|---|---|---|
ML-DSA-44 keypair |
121906 cycles |
118559 cycles |
1.03 |
ML-DSA-44 sign |
466366 cycles |
458188 cycles |
1.02 |
ML-DSA-44 verify |
134415 cycles |
130934 cycles |
1.03 |
ML-DSA-65 keypair |
200533 cycles |
200695 cycles |
1.00 |
ML-DSA-65 sign |
745242 cycles |
742901 cycles |
1.00 |
ML-DSA-65 verify |
208898 cycles |
209938 cycles |
1.00 |
ML-DSA-87 keypair |
330451 cycles |
330620 cycles |
1.00 |
ML-DSA-87 sign |
937675 cycles |
935710 cycles |
1.00 |
ML-DSA-87 verify |
344116 cycles |
343441 cycles |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
There was a problem hiding this comment.
Graviton2 (no-opt)
Details
| Benchmark suite | Current: 2b012e5 | Previous: e5884bb | Ratio |
|---|---|---|---|
ML-DSA-44 keypair |
212331 cycles |
211560 cycles |
1.00 |
ML-DSA-44 sign |
761162 cycles |
759550 cycles |
1.00 |
ML-DSA-44 verify |
229987 cycles |
229101 cycles |
1.00 |
ML-DSA-65 keypair |
378737 cycles |
377267 cycles |
1.00 |
ML-DSA-65 sign |
1248155 cycles |
1247098 cycles |
1.00 |
ML-DSA-65 verify |
373444 cycles |
371496 cycles |
1.01 |
ML-DSA-87 keypair |
602568 cycles |
600625 cycles |
1.00 |
ML-DSA-87 sign |
1584683 cycles |
1584726 cycles |
1.00 |
ML-DSA-87 verify |
618500 cycles |
616267 cycles |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)
Details
| Benchmark suite | Current: 2b012e5 | Previous: e5884bb | Ratio |
|---|---|---|---|
ML-DSA-44 keypair |
314774 cycles |
316204 cycles |
1.00 |
ML-DSA-44 sign |
1195084 cycles |
1200034 cycles |
1.00 |
ML-DSA-44 verify |
344766 cycles |
346175 cycles |
1.00 |
ML-DSA-65 keypair |
586747 cycles |
573340 cycles |
1.02 |
ML-DSA-65 sign |
1934847 cycles |
1950745 cycles |
0.99 |
ML-DSA-65 verify |
562256 cycles |
540488 cycles |
1.04 |
ML-DSA-87 keypair |
872532 cycles |
851822 cycles |
1.02 |
ML-DSA-87 sign |
2477111 cycles |
2431803 cycles |
1.02 |
ML-DSA-87 verify |
898628 cycles |
891771 cycles |
1.01 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
| Benchmark suite | Current: 2b012e5 | Previous: e5884bb | Ratio |
|---|---|---|---|
ML-DSA-65 verify |
562256 cycles |
540488 cycles |
1.04 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Arm Cortex-A55 (Snapdragon 888) benchmarks (opt)
Details
| Benchmark suite | Current: 2b012e5 | Previous: e5884bb | Ratio |
|---|---|---|---|
ML-DSA-44 keypair |
271763 cycles |
271174 cycles |
1.00 |
ML-DSA-44 sign |
821008 cycles |
824814 cycles |
1.00 |
ML-DSA-44 verify |
276025 cycles |
275199 cycles |
1.00 |
ML-DSA-65 keypair |
465651 cycles |
464395 cycles |
1.00 |
ML-DSA-65 sign |
1327127 cycles |
1344742 cycles |
0.99 |
ML-DSA-65 verify |
451152 cycles |
453340 cycles |
1.00 |
ML-DSA-87 keypair |
803172 cycles |
800977 cycles |
1.00 |
ML-DSA-87 sign |
1864325 cycles |
1859208 cycles |
1.00 |
ML-DSA-87 verify |
781387 cycles |
774585 cycles |
1.01 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Arm Cortex-A55 (Snapdragon 888) benchmarks (no-opt)
Details
| Benchmark suite | Current: 2b012e5 | Previous: e5884bb | Ratio |
|---|---|---|---|
ML-DSA-44 keypair |
465923 cycles |
463353 cycles |
1.01 |
ML-DSA-44 sign |
2142098 cycles |
2132217 cycles |
1.00 |
ML-DSA-44 verify |
558746 cycles |
555186 cycles |
1.01 |
ML-DSA-65 keypair |
784684 cycles |
783632 cycles |
1.00 |
ML-DSA-65 sign |
3495012 cycles |
3480344 cycles |
1.00 |
ML-DSA-65 verify |
868294 cycles |
864386 cycles |
1.00 |
ML-DSA-87 keypair |
1269194 cycles |
1259586 cycles |
1.01 |
ML-DSA-87 sign |
4369517 cycles |
4283557 cycles |
1.02 |
ML-DSA-87 verify |
1391355 cycles |
1383524 cycles |
1.01 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
SpacemiT K1 8 (Banana Pi F3) benchmarks (no-opt)
Details
| Benchmark suite | Current: 2b012e5 | Previous: e5884bb | Ratio |
|---|---|---|---|
ML-DSA-44 keypair |
760366 cycles |
759827 cycles |
1.00 |
ML-DSA-44 sign |
3140093 cycles |
3139576 cycles |
1.00 |
ML-DSA-44 verify |
859520 cycles |
859029 cycles |
1.00 |
ML-DSA-65 keypair |
1284878 cycles |
1285683 cycles |
1.00 |
ML-DSA-65 sign |
5074016 cycles |
5080630 cycles |
1.00 |
ML-DSA-65 verify |
1363408 cycles |
1364316 cycles |
1.00 |
ML-DSA-87 keypair |
2112289 cycles |
2112201 cycles |
1.00 |
ML-DSA-87 sign |
6350994 cycles |
6353149 cycles |
1.00 |
ML-DSA-87 verify |
2228031 cycles |
2227965 cycles |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
03b65ee to
f8c8ba3
Compare
04bb175 to
2f73667
Compare
…ritten assembly and HOL-Light proofs Replaces the poly_use_hint_32 (ML-DSA-65/87) and poly_use_hint_88 (ML-DSA-44) AVX2 intrinsics with hand-written assembly, and adds HOL-Light functional correctness and constant-time/memory-safety proofs for both x86_64 routines. Signed-off-by: Jake Massimo <jakemas@amazon.com> Co-authored-by: willieyz <willieyz@users.noreply.github.com>
2f73667 to
9e615b9
Compare
mkannwischer
left a comment
There was a problem hiding this comment.
Thanks @jakemas for moving this forward!
| (* The decompose helper lemmas (A1_BOUND, A1_WRAP, BARRETT_INTERVAL_32) and *) | ||
| (* the DIV/MOD equivalence tactics are arch-independent and shared with the *) | ||
| (* AArch64 proof of the same routine. *) |
There was a problem hiding this comment.
If they are shared, can we move them to a shared file, please?
| // Compute a1 = round-(a / 523776) ≈ round(a * 1074791425 / | ||
| // 2^49), where round-() denotes "round half down". This is | ||
| // exact for 0 <= a < Q. Note that half is rounded down since | ||
| // 1074791425 / 2^49 ≲ 1 / 523776. |
There was a problem hiding this comment.
These comments seem to describe the AArch64 code, not the x86_64 code. Can you please adjust them to describe what this code is doing?
| /* Reference: The reference avx2 implementation checks a0 >= 0, which is | ||
| * different from the specification and the reference C implementation. We | ||
| * follow the specification and check a0 > 0. | ||
| */ |
There was a problem hiding this comment.
This is no longer true since 3 weeks ago: pq-crystals/dilithium@bba9534. We should reference that commit here
| let BARRETT_INTERVAL_32 = prove( | ||
| `!a lo hi k. | ||
| lo <= a /\ a <= hi /\ | ||
| k * 262144 <= (2 * lo * 1074791425) DIV 4294967296 + 131072 /\ | ||
| (2 * hi * 1074791425) DIV 4294967296 + 131072 < (k + 1) * 262144 /\ | ||
| k * 4194304 <= (lo + 127) DIV 128 * 1025 + 2097152 /\ | ||
| (hi + 127) DIV 128 * 1025 + 2097152 < (k + 1) * 4194304 | ||
| ==> ((2 * a * 1074791425) DIV 4294967296 + 131072) DIV 262144 = k /\ | ||
| ((a + 127) DIV 128 * 1025 + 2097152) DIV 4194304 = k`, |
There was a problem hiding this comment.
I find this rather confusing - this seems to mix the AArch64 branches (k * 262144 <= (2 * lo * 1074791425) DIV 4294967296 + 131072) and the x86 branches (k * 4194304 <= (lo + 127) DIV 128 * 1025 + 2097152). But the AArch64 branches are dead in this proof.
It seems easiest to remove the dead branches and not claim that this theorem is shared with AArch64.
Resolves #484
Resolves #485
This is the only remaining intrinsic to be picked up, and proved. I know @willieyz has a PR in flight, I've taken it, brought it up to main, fixed conflicts, adjusted asm, fixed #1074/#1076, added hol-light CBMC proofs and supporting infra.
Replaces the
poly_use_hint_32(ML-DSA-65/87) andpoly_use_hint_88(ML-DSA-44) AVX2 intrinsics with hand-written assembly, and adds HOL-Light functional-correctness and constant-time / memory-safety proofs for both x86_64 routines.Builds on @willieyz's original assembly (#940). Changes from that code:
(int32_t *a, const int32_t *h)instead of the 3-arg out-of-place form._asmconvention and wired the native backend to the assembly (not the.cintrinsic).poly_use_hint_32compare bound0x15BA900(87·GAMMA2) →0x7BE100(31·GAMMA2), per Potentially wrong value for q_bound in poly_use_hint_32_avx2.c #1074/x86_64: Fix wrong qbound 87*q -> 31*q in use_hint_32 #1076.Performance
The performance degradation we previously saw in willieyz's PR is fixed.
poly_use_hintcomponent benchmark, median cycles on Intel Xeon Platinum 8175M,OPT=1 CYCLES=PMU:main)use_hint_32(ML-DSA-65/87)use_hint_88(ML-DSA-44)