Delayed Modular Reduction for Fast Polynomial Evaluations by wu-s-john · Pull Request #205 · NethermindEth/zinc-plus

wu-s-john · 2026-06-04T06:40:01Z

This PR speeds up binary polynomial evaluation under the scalarization challenge by using delayed modular reduction for the hot equality-weight accumulations.

Delayed modular reduction means we do not reduce modulo the field prime after every field addition. Instead, we accumulate several field elements in a wider integer representation that is large enough to hold the unreduced sum, then reduce the accumulator back into the field once at the end.
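As a scaled-down illustration of the idea (not the PR's implementation), the sketch below sums elements of a toy 61-bit prime field in a `u128` and reduces once at the end. The modulus `P` and both function names are assumptions chosen for this example; the PR itself accumulates 4-limb Montgomery values.

```rust
/// Toy prime modulus 2^61 - 1, a stand-in for the real field prime (assumption).
const P: u64 = (1u64 << 61) - 1;

/// Immediate reduction: reduce modulo P after every addition.
/// Inputs are assumed to be already reduced (each x < P).
fn sum_immediate(xs: &[u64]) -> u64 {
    let mut acc = 0u64;
    for &x in xs {
        // acc < P and x < P, so acc + x < 2^62 and cannot overflow u64.
        acc = (acc + x) % P;
    }
    acc
}

/// Delayed reduction: accumulate in a wider integer and reduce once.
/// A u128 can hold 2^67 summands below 2^61 before overflowing.
fn sum_delayed(xs: &[u64]) -> u64 {
    let mut wide = 0u128;
    for &x in xs {
        wide += x as u128; // no modular reduction in the hot loop
    }
    (wide % P as u128) as u64
}
```

Both functions return the same field element for reduced inputs; only the number of reductions differs (one per addition versus one per sum).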

For a binary column with row bits c_{b,i}, evaluation has the shape:

Σ_b eq_r(b) · (Σ_i c_{b,i} α^i)

Equivalently:

Σ_i α^i · (Σ_b eq_r(b) c_{b,i})
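The step between the two forms is an exchange of two finite sums. Written out, using the fact that each c_{b,i} is a bit (so the inner sum selects the rows where bit i is set):

```latex
\begin{aligned}
\sum_b \mathrm{eq}_r(b) \Big( \sum_i c_{b,i}\, \alpha^i \Big)
  &= \sum_b \sum_i \mathrm{eq}_r(b)\, c_{b,i}\, \alpha^i \\
  &= \sum_i \alpha^i \Big( \sum_b \mathrm{eq}_r(b)\, c_{b,i} \Big) \\
  &= \sum_i \alpha^i \sum_{b \,:\, c_{b,i} = 1} \mathrm{eq}_r(b)
\end{aligned}
```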

This PR accelerates the inner coefficient sums:

Σ_b eq_r(b) c_{b,i}

Those sums are just conditional additions of cached eq_r(b) values for set bits. Instead of performing a modular reduction after each selected addition, we accumulate the Montgomery limbs in a 5-limb Uint<5> delayed-reduction accumulator and perform one Barrett reduction at the end.
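The per-bit accumulation can be sketched on a toy field as follows. This is an illustrative model under stated assumptions, not the PR's code: rows are `u8` bitmasks, the field is a 61-bit prime field, a `u128` stands in for the `Uint<5>` accumulator, and a plain `%` stands in for the Barrett reduction. The names `eval_delayed`, `eval_reference`, and `mul_mod` are hypothetical.

```rust
/// Toy prime modulus 2^61 - 1 (assumption; the PR uses 4-limb Montgomery fields).
const P: u64 = (1u64 << 61) - 1;
/// Number of bit columns per row in this toy example.
const BITS: usize = 8;

/// Modular multiplication via a 128-bit intermediate product.
fn mul_mod(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) % P as u128) as u64
}

/// Computes sum_i alpha^i * (sum_b eq[b] * c_{b,i}), where c_{b,i} is bit i
/// of rows[b]. All eq values and alpha are assumed to be reduced (< P).
fn eval_delayed(rows: &[u8], eq: &[u64], alpha: u64) -> u64 {
    assert_eq!(rows.len(), eq.len());
    // One wide accumulator per bit position; no reduction while scanning rows.
    let mut acc = [0u128; BITS];
    for (&row, &w) in rows.iter().zip(eq) {
        for (i, slot) in acc.iter_mut().enumerate() {
            if (row >> i) & 1 == 1 {
                *slot += w as u128; // conditional add of the cached eq value
            }
        }
    }
    // Reduce once per bit coefficient, then combine with powers of alpha
    // using Horner's rule, starting from the highest bit.
    let mut result = 0u64;
    for slot in acc.iter().rev() {
        let coeff = (*slot % P as u128) as u64;
        result = (mul_mod(result, alpha) + coeff) % P;
    }
    result
}

/// Reference: sum_b eq[b] * (sum_i c_{b,i} alpha^i), reducing at every step.
fn eval_reference(rows: &[u8], eq: &[u64], alpha: u64) -> u64 {
    let mut total = 0u64;
    for (&row, &w) in rows.iter().zip(eq) {
        let mut poly = 0u64;
        let mut pow = 1u64;
        for i in 0..BITS {
            if (row >> i) & 1 == 1 {
                poly = (poly + pow) % P;
            }
            pow = mul_mod(pow, alpha);
        }
        total = (total + mul_mod(w, poly)) % P;
    }
    total
}
```

For rows `[0b101, 0b011]`, weights `[2, 3]`, and `alpha = 10`, the per-bit sums are 5, 3, and 2, so both functions return 5 + 3·10 + 2·100 = 235.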

Use borrowed/in-place field addition while accumulating rotated and shifted binary polynomial evaluations. These paths run once per relevant set bit across each virtual bit-op or shifted bit-slice column, so avoiding clones removes a large amount of temporary field-element copying without changing the immediate-reduction semantics.

Add a narrow delayed modular reduction path for 4-limb Montgomery fields and use it in the hot binary polynomial evaluation paths.

The new `zinc_utils::delayed_reduction` module introduces:

- `MontgomeryLimbs` for exposing reduced Montgomery-form field limbs.
- `DelayedModularReduction` for sum-only delayed accumulation.
- `BarrettReductionParams` with const `mu` computation.
- A `Uint<5>` accumulator implementation for summing 4-limb field elements.
- An optimized `barrett_reduce_5` path for reducing bounded 5-limb sums.
- Implementations for both `MontyField<4>` and `ConstMontyField<_, 4>`.

This lets binary polynomial evaluation accumulate many selected `eq(r, b)` values as raw Montgomery limbs, then perform one Barrett reduction per output coefficient instead of doing a field reduction after every conditional add.

Apply the accumulator to two hot paths:

- Lifted binary polynomial evaluation in the protocol layer.
- Streaming shifted bit-slice evaluation in the PIOP booleanity code.

The lifted binary evaluation now builds the `eq(point, *)` table once, scans the binary trace rows, conditionally adds `eq_b` into per-bit `Uint<5>` accumulators, and reduces once per bit coefficient. The shifted bit-slice streaming path uses the same delayed accumulation strategy while continuing to avoid materializing shifted bit-slice MLE buffers.

Use `crypto_bigint::Uint<5>` directly as the accumulator rather than a custom wide-limb wrapper, keeping the representation aligned with the rest of the integer code. The Barrett reducer is specialized to the actual accumulator width, avoiding the unused sixth limb from the earlier 6-limb reducer shape.

Also extend the relevant protocol prover/verifier bounds so the optimized paths can access Montgomery limbs, and generalize `ConstMontyField` projection support through `FromRef`.
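To make the "const `mu`" and "bounded sum" ideas concrete, here is a minimal Barrett reduction sketch at toy scale. It is not the PR's `barrett_reduce_5`: the 31-bit modulus `P`, the 40-bit input bound `K`, and the names `MU` and `barrett_reduce` are assumptions for illustration. The 31-bit/40-bit pair loosely mirrors the 4-limb/5-limb relationship; assuming 64-bit limbs, a 5-limb accumulator has room for 2^64 summands that are each below 2^256.

```rust
/// Toy prime modulus 2^31 - 1 (assumption, standing in for a 4-limb prime).
const P: u64 = (1u64 << 31) - 1;
/// Inputs to the reducer are assumed to satisfy x < 2^K.
const K: u32 = 40;
/// Barrett constant mu = floor(2^K / P), evaluated at compile time.
const MU: u64 = (1u64 << K) / P;

/// Reduces a bounded accumulator value x < 2^K modulo P without dividing by P.
fn barrett_reduce(x: u64) -> u64 {
    debug_assert!(x < (1u64 << K));
    // Quotient estimate q = floor(x * mu / 2^K); it never exceeds floor(x / P).
    let q = ((x as u128 * MU as u128) >> K) as u64;
    // q * P <= x, so this subtraction cannot underflow.
    let mut r = x - q * P;
    // For x < 2^K the estimate is low by at most one, so r < 2P here and
    // this loop runs at most once.
    while r >= P {
        r -= P;
    }
    r
}
```

With these toy parameters, up to 2^9 reduced elements can be summed before the 2^K bound is reached, after which a single `barrett_reduce` call brings the sum back into the field.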

frozenspider · 2026-06-11T10:13:49Z

Hey @wu-s-john, could you please update the branch by resolving conflicts?

wu-s-john added 3 commits June 3, 2026 17:02

perf: stream shifted bit-slice evals in 4x prover

7c38f5f

wu-s-john force-pushed the dmr branch from 13aeb06 to 7c38f5f Compare June 4, 2026 17:26

wu-s-john changed the title ~~Delayed Modular Reduction for~~ Delayed Modular Reduction for Fast Polynomial Evaluations Jun 4, 2026

wu-s-john marked this pull request as ready for review June 4, 2026 17:29

wu-s-john requested review from ElijahVlasov, albert-garreta, frozenspider and osdnk as code owners June 4, 2026 17:29

wu-s-john wants to merge 3 commits into NethermindEth:main-beta from wu-s-john:dmr