Delayed Modular Reduction for Fast Polynomial Evaluations#205

Open
wu-s-john wants to merge 3 commits into NethermindEth:main-beta from wu-s-john:dmr

@wu-s-john wu-s-john commented Jun 4, 2026

This PR speeds up binary polynomial evaluation under the scalarization challenge by using delayed modular reduction for the hot equality-weight accumulations.

Delayed modular reduction means we do not reduce modulo the field prime after every field addition. Instead, we accumulate several field elements in a wider integer representation that is large enough to hold the unreduced sum, then reduce the accumulator back into the field once at the end.
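
As a minimal sketch of the idea (not the code in this PR), the following uses a toy 61-bit prime with `u64` elements and a `u128` accumulator. The modulus `P` and the function names `sum_eager` and `sum_delayed` are illustrative assumptions; the PR itself works with 4-limb Montgomery fields.

```rust
/// Toy 61-bit prime modulus (2^61 - 1), chosen only for illustration.
const P: u64 = (1 << 61) - 1;

/// Baseline: reduce after every addition. Inputs must already be < P.
fn sum_eager(xs: &[u64]) -> u64 {
    let mut acc = 0u64;
    for &x in xs {
        // Both operands are < P < 2^61, so the sum cannot overflow a u64.
        acc += x;
        if acc >= P {
            acc -= P;
        }
    }
    acc
}

/// Delayed: accumulate in a wider integer and reduce once at the end.
fn sum_delayed(xs: &[u64]) -> u64 {
    // Each term is < 2^61, so a u128 absorbs up to 2^67 terms without overflow.
    let mut wide = 0u128;
    for &x in xs {
        wide += x as u128;
    }
    (wide % (P as u128)) as u64
}
```

Both functions return the same field element; the delayed version simply trades the per-term conditional subtraction for a single reduction.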

For a binary column with row bits c_{b,i}, evaluation has the shape:

Σ_b eq_r(b) · (Σ_i c_{b,i} α^i)

Equivalently:

Σ_i α^i · (Σ_b eq_r(b) c_{b,i})

This PR accelerates the inner coefficient sums:

Σ_b eq_r(b) c_{b,i}

Those sums are just conditional additions of cached eq_r(b) values for set bits. Instead of performing a modular reduction after each selected addition, we accumulate the Montgomery limbs in a 5-limb Uint<5> delayed-reduction accumulator and perform one Barrett reduction at the end.
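
The reordering and the conditional accumulation can be sketched on a toy field. Everything here is an assumption made for illustration: the 61-bit modulus, rows stored as `u32` bit masks, and the names `eval_rows` and `eval_delayed`. The PR uses `Uint<5>` accumulators over Montgomery limbs rather than `u128`.

```rust
/// Toy 61-bit prime modulus (2^61 - 1), chosen only for illustration.
const P: u64 = (1 << 61) - 1;

fn mul_mod(a: u64, b: u64) -> u64 {
    (((a as u128) * (b as u128)) % (P as u128)) as u64
}

fn add_mod(a: u64, b: u64) -> u64 {
    let s = a + b; // a, b < 2^61, so no u64 overflow
    if s >= P { s - P } else { s }
}

/// Row-major form: sum_b eq[b] * (sum_i c_{b,i} * alpha^i), reducing eagerly.
fn eval_rows(rows: &[u32], eq: &[u64], alpha: u64) -> u64 {
    let mut total: u64 = 0;
    for (&row, &e) in rows.iter().zip(eq) {
        let mut inner: u64 = 0;
        let mut pow: u64 = 1;
        for i in 0..32u32 {
            if (row >> i) & 1 == 1 {
                inner = add_mod(inner, pow);
            }
            pow = mul_mod(pow, alpha);
        }
        total = add_mod(total, mul_mod(e, inner));
    }
    total
}

/// Coefficient-major form: sum_i alpha^i * (sum_b eq[b] * c_{b,i}),
/// with the inner sums accumulated unreduced and reduced once per bit.
fn eval_delayed(rows: &[u32], eq: &[u64], alpha: u64) -> u64 {
    let mut acc = [0u128; 32];
    for (&row, &e) in rows.iter().zip(eq) {
        for i in 0..32usize {
            if (row >> i) & 1 == 1 {
                acc[i] += e as u128; // conditional add, no reduction
            }
        }
    }
    let mut total: u64 = 0;
    let mut pow: u64 = 1;
    for &a in acc.iter() {
        let coeff = (a % (P as u128)) as u64; // one reduction per coefficient
        total = add_mod(total, mul_mod(pow, coeff));
        pow = mul_mod(pow, alpha);
    }
    total
}
```

In this sketch the hot loop of `eval_delayed` contains only wide additions; all multiplications and reductions move to the 32-iteration tail.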

wu-s-john added 3 commits June 3, 2026 17:02
Use borrowed/in-place field addition while accumulating rotated and shifted binary polynomial evaluations. These paths run once per relevant set bit across each virtual bit-op or shifted bit-slice column, so avoiding clones removes a large amount of temporary field-element copying without changing the immediate-reduction semantics.
Add a narrow delayed modular reduction path for 4-limb Montgomery fields and
use it in the hot binary polynomial evaluation paths.

The new `zinc_utils::delayed_reduction` module introduces:
- `MontgomeryLimbs` for exposing reduced Montgomery-form field limbs.
- `DelayedModularReduction` for sum-only delayed accumulation.
- `BarrettReductionParams` with const `mu` computation.
- A `Uint<5>` accumulator implementation for summing 4-limb field elements.
- An optimized `barrett_reduce_5` path for reducing bounded 5-limb sums.
- Implementations for both `MontyField<4>` and `ConstMontyField<_, 4>`.
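
For readers unfamiliar with Barrett reduction and a compile-time `mu`, here is a toy sketch on a 61-bit modulus. It is not the PR's `barrett_reduce_5`, which operates on 5-limb sums; the constants `P`, `K`, `MU` and the function `barrett_reduce` are hypothetical names used only to show the technique.

```rust
/// Toy 61-bit prime modulus (2^61 - 1), chosen only for illustration.
const P: u64 = (1 << 61) - 1;
const P128: u128 = P as u128;
/// Bit length of P, i.e. 2^(K-1) <= P < 2^K.
const K: u32 = 61;
/// Barrett constant mu = floor(2^(2K) / P), evaluated at compile time.
const MU: u128 = (1u128 << (2 * K)) / P128;

/// Reduce x < 2^(2K) modulo P using multiplications and shifts
/// instead of a division by P.
fn barrett_reduce(x: u128) -> u64 {
    debug_assert!(x < (1u128 << (2 * K)));
    // Quotient estimate; it satisfies floor(x / P) - 2 <= q <= floor(x / P).
    // (x >> 60) < 2^62 and MU < 2^62, so the product fits in a u128.
    let q = ((x >> (K - 1)) * MU) >> (K + 1);
    let mut r = x - q * P128;
    // At most two corrective subtractions bring r into [0, P).
    while r >= P128 {
        r -= P128;
    }
    r as u64
}
```

Because `MU` depends only on the modulus, it can be a `const`, which is the same motivation the PR gives for computing `mu` in a const context.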

This lets binary polynomial evaluation accumulate many selected `eq(r, b)`
values as raw Montgomery limbs, then perform one Barrett reduction per output
coefficient instead of doing a field reduction after every conditional add.
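
Two facts make summing raw Montgomery limbs sound. They are stated here under the assumption of 64-bit limbs, so that a reduced 4-limb element is below 2^256 and a `Uint<5>` holds 320 bits; the PR text does not spell out the limb width. Write the Montgomery form of x as x̃ = xR mod p, and let N be the number of selected terms.

```latex
% Montgomery form is additive, so one reduction of the raw sum
% yields the Montgomery form of the field sum:
\sum_{k=1}^{N} \tilde{x}_k \;\equiv\; R \sum_{k=1}^{N} x_k \pmod{p}

% Capacity: each reduced summand is below p < 2^{256}, hence
0 \;\le\; \sum_{k=1}^{N} \tilde{x}_k \;\le\; N\,(p-1) \;<\; N \cdot 2^{256} \;\le\; 2^{320}
\qquad \text{for } N \le 2^{64}
```

Under that assumption the fifth limb gives headroom for up to 2^64 additions, far more than the number of rows in any practical trace.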

Apply the accumulator to two hot paths:
- Lifted binary polynomial evaluation in the protocol layer.
- Streaming shifted bit-slice evaluation in the PIOP booleanity code.

The lifted binary evaluation now builds the `eq(point, *)` table once, scans the
binary trace rows, conditionally adds `eq_b` into per-bit `Uint<5>`
accumulators, and reduces once per bit coefficient. The shifted bit-slice
streaming path uses the same delayed accumulation strategy while continuing to
avoid materializing shifted bit-slice MLE buffers.
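
The "build the table once" step can be sketched as follows, assuming the standard multilinear equality polynomial eq_r(b) = Π_j (r_j·b_j + (1 − r_j)(1 − b_j)) and a little-endian index convention (bit j of the index corresponds to r_j). The toy modulus, the bit ordering, and the name `build_eq_table` are assumptions for illustration and may differ from the PR's code.

```rust
/// Toy 61-bit prime modulus (2^61 - 1), chosen only for illustration.
const P: u64 = (1 << 61) - 1;

fn mul_mod(a: u64, b: u64) -> u64 {
    (((a as u128) * (b as u128)) % (P as u128)) as u64
}

fn sub_mod(a: u64, b: u64) -> u64 {
    if a >= b { a - b } else { a + P - b }
}

/// Returns a table of length 2^n with table[b] = eq(point, b),
/// where bit j of b selects r_j (bit set) or 1 - r_j (bit clear).
fn build_eq_table(point: &[u64]) -> Vec<u64> {
    let mut table = vec![1u64];
    for &r in point {
        let one_minus_r = sub_mod(1, r);
        let mut next = Vec::with_capacity(table.len() * 2);
        // Low half: the new variable's bit is 0.
        for &t in &table {
            next.push(mul_mod(t, one_minus_r));
        }
        // High half: the new variable's bit is 1.
        for &t in &table {
            next.push(mul_mod(t, r));
        }
        table = next;
    }
    table
}
```

The table costs roughly 2^n multiplications to build and is then reused for every bit position of every row, which is why caching it pays off before the conditional-add scan.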

Use `crypto_bigint::Uint<5>` directly as the accumulator rather than a custom
wide-limb wrapper, keeping the representation aligned with the rest of the
integer code. The Barrett reducer is specialized to the actual accumulator
width, avoiding the unused sixth limb from the earlier 6-limb reducer shape.

Also extend the relevant protocol prover/verifier bounds so the optimized paths
can access Montgomery limbs, and generalize `ConstMontyField` projection support
through `FromRef`.
@wu-s-john wu-s-john changed the title from "Delayed Modular Reduction for" to "Delayed Modular Reduction for Fast Polynomial Evaluations" Jun 4, 2026
@wu-s-john wu-s-john marked this pull request as ready for review June 4, 2026 17:29
@frozenspider

Collaborator

Hey @wu-s-john, could you please update the branch by resolving conflicts?


2 participants