Drop (leading) windows in MSM where all coefficients zero #479
For scalar field elements modulo p, if all coefficients `a_i` are smaller than some value x < p, their binary representations will have leading zero bits. In this case, we can skip processing the windows corresponding to these leading zeros in the bucket calculation since they would not contribute to the final result. This follows the same idea as in the implementation: privacy-ethereum/halo2curves#168
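As a quick sketch of the idea (illustrative Python, not Constantine's Nim code; the field size and window width are example values): OR-ing all coefficients together exposes the highest set bit, and every Pippenger window above it holds only zero digits and can be dropped.

```python
# Sketch: how leading-zero coefficients shrink the Pippenger window count.
# Illustrative only -- names and parameters are not Constantine's.

def ceil_div(a: int, b: int) -> int:
    return (a + b - 1) // b

def window_count(coefs: list[int], bits: int, c: int) -> int:
    """Windows needed when digits above the highest set bit are skipped."""
    acc = 0
    for a in coefs:
        acc |= a                      # OR all coefficients together
    msb = acc.bit_length()            # bit length of the OR; 0 if all zero
    if msb == 0:
        return 0                      # all coefficients zero: nothing to do
    return min(ceil_div(msb, c), ceil_div(bits, c))

# Example: a 254-bit field with window size c = 14 needs 19 windows in
# general, but coefficients below 2**100 only need 8.
print(window_count([1 << 99, 3], bits=254, c=14))   # 8
print(window_count([1 << 253], bits=254, c=14))     # 19
```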
**Flowchart**

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["multiScalarMul_vartime / multiScalarMul_reference_vartime\n(public API — useZeroWindows: bool)"] --> B{useZeroWindows?}
    B -- "true\n(original behaviour)" --> C["Use full window count\nnumWindows = ceilDiv(bits, c)\ntop = bits - excess"]
    B -- "false\n(new optimisation)" --> D["determineBitsSet\nOR all coefs → find highest set bit (msb)"]
    D --> E{msb < 0?}
    E -- yes --> F["Return early\n(all coefs zero)"]
    E -- no --> G{msb < bits - excess?}
    G -- "no\n(no savings)" --> C
    G -- "yes\n(leading zeros exist)" --> H["Reduce window count\nnumWindows = ceilDiv(msb, c)\ntop = ceilDiv(msb,c) × c ⚠️\nexcess = 0"]
    C --> I["Run Pippenger bucket accumulation\nover all windows"]
    H --> I
    I --> J["Final Horner accumulation\nr = Σ miniMSM[w] × 2^(w·c)"]
```
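The control flow above can be checked end to end with a toy model (hedged: plain integers stand in for the elliptic-curve group, the bucket phase is collapsed into a direct digit-times-point product, and names like `msm_reference` and `use_zero_windows` are illustrative, not Constantine's):

```python
# Toy model of the flowchart: early return on all-zero coefficients,
# optional window-count reduction, then Horner accumulation of windows.

def msm_reference(coefs, points, bits, c, use_zero_windows=True):
    acc = 0
    for a in coefs:
        acc |= a                       # "determineBitsSet": OR all coefficients
    msb = acc.bit_length() - 1         # highest set bit, -1 if all zero
    if msb < 0:
        return 0                       # early return: all coefficients zero
    num_windows = (bits + c - 1) // c  # full window count: ceilDiv(bits, c)
    if not use_zero_windows:
        # drop windows above the highest set bit (kept windows cover 0..msb)
        num_windows = min(num_windows, (msb + c) // c)
    r = 0
    for w in reversed(range(num_windows)):
        r <<= c                        # Horner: r = r*2^c + miniMSM[w]
        for a, p in zip(coefs, points):
            digit = (a >> (w * c)) & ((1 << c) - 1)
            r += digit * p             # bucket phase collapsed for the toy
    return r

# Both paths agree with the direct sum-of-products.
coefs, points = [5, 1 << 90, 123456], [7, 11, 13]
expect = sum(a * p for a, p in zip(coefs, points))
assert msm_reference(coefs, points, 254, 14, True) == expect
assert msm_reference(coefs, points, 254, 14, False) == expect
```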
```diff
     points: ptr UncheckedArray[ECaff],
-    len: int) {.tags:[VarTime, Alloca, HeapAlloc], meter.} =
+    len: int,
+    useZeroWindows: bool) {.tags:[VarTime, Alloca, HeapAlloc], meter.} =
   ## Multiscalar multiplication:
   ## r <- [a₀]P₀ + [a₁]P₁ + ... + [aₙ₋₁]Pₙ₋₁
-  msm_dispatch_vartime(r, coefs, points, len)
+  msm_dispatch_vartime(r, coefs, points, len, useZeroWindows)

 func multiScalarMul_vartime*[bits: static int, EC, ECaff](
     r: var EC,
     coefs: openArray[BigInt[bits]],
-    points: openArray[ECaff]) {.tags:[VarTime, Alloca, HeapAlloc], meter.} =
+    points: openArray[ECaff],
+    useZeroWindows: bool) {.tags:[VarTime, Alloca, HeapAlloc], meter.} =
   ## Multiscalar multiplication:
   ## r <- [a₀]P₀ + [a₁]P₁ + ... + [aₙ₋₁]Pₙ₋₁
   debug: doAssert coefs.len == points.len
   let N = points.len
-  msm_dispatch_vartime(r, coefs.asUnchecked(), points.asUnchecked(), N)
+  msm_dispatch_vartime(r, coefs.asUnchecked(), points.asUnchecked(), N, useZeroWindows)

 func multiScalarMul_vartime*[F, EC, ECaff](
     r: var EC,
     coefs: ptr UncheckedArray[F],
     points: ptr UncheckedArray[ECaff],
-    len: int) {.tags:[VarTime, Alloca, HeapAlloc], meter.} =
+    len: int,
+    useZeroWindows: bool) {.tags:[VarTime, Alloca, HeapAlloc], meter.} =
   ## Multiscalar multiplication:
   ## r <- [a₀]P₀ + [a₁]P₁ + ... + [aₙ₋₁]Pₙ₋₁
   let n = cast[int](len)
   let coefs_big = allocHeapArrayAligned(F.getBigInt(), n, alignment = 64)
   coefs_big.batchFromField(coefs, n)
-  r.multiScalarMul_vartime(coefs_big, points, n)
+  r.multiScalarMul_vartime(coefs_big, points, n, useZeroWindows)
   freeHeapAligned(coefs_big)

 func multiScalarMul_vartime*[EC, ECaff](
     r: var EC,
     coefs: openArray[Fr],
-    points: openArray[ECaff]) {.tags:[VarTime, Alloca, HeapAlloc], inline.} =
+    points: openArray[ECaff],
+    useZeroWindows: bool) {.tags:[VarTime, Alloca, HeapAlloc], meter.} =
   ## Multiscalar multiplication:
   ## r <- [a₀]P₀ + [a₁]P₁ + ... + [aₙ₋₁]Pₙ₋₁
   debug: doAssert coefs.len == points.len
   let N = points.len
-  multiScalarMul_vartime(r, coefs.asUnchecked(), points.asUnchecked(), N)
+  multiScalarMul_vartime(r, coefs.asUnchecked(), points.asUnchecked(), N, useZeroWindows)
```
**Breaking public API change: `useZeroWindows` has no default**

Every public overload of `multiScalarMul_vartime` and `multiScalarMul_reference_vartime` now requires a mandatory `useZeroWindows: bool` argument. This breaks all existing call sites in the codebase (and in downstream consumers). The PR description acknowledges CI is broken for this reason.

The conventional way to introduce this without a breaking change is to give the parameter a sensible default, `true` to preserve the existing behaviour (process all windows):

```nim
func multiScalarMul_vartime*[bits: static int, EC, ECaff](
       r: var EC,
       coefs: openArray[BigInt[bits]],
       points: openArray[ECaff],
       useZeroWindows: bool = true) {.tags:[VarTime, Alloca, HeapAlloc], meter.}
```

Applying the same default to every overload would restore CI without sacrificing the new opt-in optimisation path.
```diff
@@ -363,13 +415,27 @@ func msmAffineImpl_vartime[bits: static int, EC, ECaff](
   # Algorithm
   # ---------
-  const excess = bits mod c
-  const top = bits - excess
+  var excess = bits mod c
```
**Off-by-one: one extra zero window is processed in the optimised Pippenger path**

When `useZeroWindows = false` and `msb < bits - excess`, the code computes:

```nim
msb = ceilDiv_vartime(msb, c) * c
excess = 0
top = msb   # e.g. with msb = 100, c = 14 → top = 112
```

With `top = 112` and `excess = 0`, the prologue is a no-op (because `if excess != 0` is false), so `w` stays at 112. The steady-state loop then processes window 112 (bits 112–125) first, which is entirely zero by definition: `determineBitsSet` returned 100, meaning no bit above 99 is set.

The reference implementation in `multiScalarMulImpl_reference_vartime` handles this correctly by computing `numWindows = ceilDiv(msb, c)` (a window count), which maps to window indices 0…7 covering bits 0–111. The equivalent for `msmImpl_vartime` is:

```nim
msb = (ceilDiv_vartime(msb, c) - 1) * c   # e.g. (8-1)*14 = 98
```

This makes `top = 98`, the prologue is still a no-op, and the steady state begins at the last useful window (bits 98–111). Functionally the current code is still correct (a zero window contributes nothing), but it wastes one full window's worth of bucket accumulation and reduction work.

The same issue exists in `msmAffineImpl_vartime` at the analogous section.
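The arithmetic in the comment above can be checked directly (a sketch mirroring the review comment's numbers, not the actual Nim code):

```python
# Verify the off-by-one: with the example values msb = 100, c = 14,
# the PR's rounding lands one full window above the suggested fix.

def ceil_div(a: int, b: int) -> int:
    return (a + b - 1) // b

msb, c = 100, 14                          # highest set bit, window size

top_current = ceil_div(msb, c) * c        # what the PR computes
top_fixed = (ceil_div(msb, c) - 1) * c    # suggested correction

print(top_current)   # 112: first window covers bits 112-125, all zero
print(top_fixed)     # 98: last useful window covers bits 98-111
assert top_current - top_fixed == c       # exactly one wasted window
```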
**NOTE**: CI is broken, because I temporarily added a parameter `useZeroWindows` to the MSM procs.

This implements the same optimization as done in privacy-ethereum/halo2curves#168 for the single-threaded MSM implementations.

Essentially, if all coefficients `a_i` are smaller than some value `x < p` (with `p` the prime field order), their binary representations will have leading zero bits. In this case, we can skip processing the windows corresponding to these leading zeros in the bucket calculation, since they would not contribute to the final result.

After writing a benchmark comparing the old implementation using all windows vs. the optimization (and hence adding the new argument that currently breaks the CI), I noticed that the performance improvements are pretty minor. In fact, even without this optimization the speedup for inputs with only small coefficients is already massive; skipping the leading windows on top of that provides only a marginal additional improvement.

Here is a plot showing MSM timings for different numbers of points and numbers of set bits. It compares the reference implementation (`b` prefix for baseline) with the optimized implementation, each split by using all windows vs. only the non-zero leading windows. What is actually quite interesting is that the improvements for the reference implementation are generally much larger, pushing it in line with the optimized implementation.

(I ran the bench on my laptop with an i7-8750H.)
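The window count gives an upper bound on how many bucket passes the optimisation can skip (a back-of-envelope sketch with illustrative parameters: 254-bit scalars and window size c = 14, as in the review example). The benchmark suggests the runtime gain is much smaller than this bound, presumably because windows whose digits are all zero are already cheap to process.

```python
# Fraction of Pippenger window passes skipped, per number of set bits.
# Upper bound on work saved, not a runtime prediction; parameters are
# illustrative (254-bit scalars, c = 14).

def ceil_div(a: int, b: int) -> int:
    return (a + b - 1) // b

bits, c = 254, 14
full = ceil_div(bits, c)                  # 19 windows in total
for set_bits in (64, 100, 128, 254):
    reduced = ceil_div(set_bits, c)       # windows actually needed
    saved = 100 * (1 - reduced / full)
    print(f"{set_bits:>3} set bits: {reduced}/{full} windows, "
          f"~{saved:.0f}% of passes skipped")
```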