perf(compiler): thread ValueRange through OR/XOR/SHR_U + narrow U64/U128 compare paths#499
Open
abmcar wants to merge 6 commits into
Open
perf(compiler): thread ValueRange through OR/XOR/SHR_U + narrow U64/U128 compare paths#499abmcar wants to merge 6 commits into
abmcar wants to merge 6 commits into
Conversation
Contributor
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Extends monotone-safe ValueRange propagation through additional MIR builder sites to preserve U64/U128 ranges longer, and uses those annotations to simplify certain U64 compare helpers.
Changes:
- Thread
ValueRangethroughOR/XORresults (u64-const fast path and general path) using a monotone join (max). - Thread
ValueRangethroughSHR_U(unsigned right shift) by carrying the shifted value’s range. - Optimize U64-constant compare helpers by skipping upper-limb OR-folds/selects when the wide operand is tagged
ValueRange::U64.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/compiler/evm_frontend/evm_mir_compiler.h | Adds a range-join helper and preserves ranges through OR/XOR and SHR_U producers. |
| src/compiler/evm_frontend/evm_mir_compiler.cpp | Adds U64-range fast paths for EQ/LT/GT helpers to avoid redundant upper-limb checks. |
| docs/changes/2026-05-12-g4-safe-valuerange/README.md | Documents the change, invariants, and trust-chain considerations. |
| docs/changes/2026-05-12-g4-safe-valuerange/reviews/* | Adds review artifacts for spec + implementation. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
⚡ Performance Regression Check Results✅ Performance Check Passed (interpreter)Performance Benchmark Results (threshold: 25%)
Summary: 194 benchmarks, 0 regressions ✅ Performance Check Passed (multipass)Performance Benchmark Results (threshold: 25%)
Summary: 192 benchmarks, 0 regressions |
abmcar
added a commit
to abmcar/DTVM
that referenced
this pull request
May 13, 2026
Cleanup pass on PR DTVMStack#499 from /simplify review: - Drop the Operand::maxRange one-line helper; use std::max at the single call site to match the existing std::min precedent for AND narrowing. - Add a "ordering is load-bearing" invariant note above the ValueRange enum to guard against silent breakage from future reorderings. - Drop the defensive `MInstruction *FinalResult = nullptr;` init in the three compare helpers; the if/else-if/else cascade already assigns on every path, and removing the nullptr lets the compiler warn loudly if a future branch addition skips assignment. - Trim Phase 5 OR/XOR comment from 3 lines to 1 (the AND-returns-earlier half is already encoded by the if constexpr guard). - Drop the WHAT-style trailing comments in the U256 fallback paths ("// Final: low matches AND upper is zero", "// result = hasUpper ? ...") that restate the next assignment. No behavior change. Local validation: format clean, dtvmapi builds, 223 / 223 multipass unittests pass. Statetest skipped (diff is structurally equivalent — only comment/init/helper-naming changes). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
abmcar
added a commit
to abmcar/DTVM
that referenced
this pull request
May 15, 2026
- Static-assert ValueRange enum ordering (U64 < U128 < U256) so the std::min/std::max contract in AND/OR/XOR can't be silently broken. - Add isLiteralZero debug helper and assert it on the upper limbs of the U64/U128 fast paths in handleCompareEqU64, handleCompareLtRhsU64, handleCompareGtRhsU64 — promotes the producer-side invariant from prose to a runtime check (ZEN_ASSERT, zero cost in release). - Fix README wording: 4-limb OR-fold -> 3-upper-limb OR-fold. Addresses Copilot review on DTVMStack#499. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
6 tasks
…ld in U64 compares Extend the monotone-safe subset of G4 ValueRange propagation: - handleBitwiseOp<BO_OR|BO_XOR> Phase 1 u64-const fast path now carries OtherOp.getRange() (limbs[1..3] pass through as identical MInstruction pointers, so the value-range claim is preserved bit-identically). - handleBitwiseOp<BO_OR|BO_XOR> Phase 5 general path now carries max(LHS.range, RHS.range) — OR/XOR are monotone joins on bit width. AND is excluded (its monotone direction is min, already handled by its own earlier returns). - handleShift<BO_SHR_U> now carries ValueOp.getRange() — unsigned right shift cannot widen. SHL widens; SHR_S sign-fills; both keep U256. - handleCompareEqU64 / LtRhsU64 / GtRhsU64 skip the 3-upper-limb OR-fold and the boolean select when the wide operand carries ValueRange::U64. Every existing U64 producer materializes literal MIR zero in limbs[1..3], so the new fast path's runtime safety is structural. Out of scope per docs/research/directions/u256-strength-reduction/ analysis/2026-05-12-verified-opportunities.md §1.G4 verdict: SUB (wraps), SHL (widens), SAR (sign-fills), ADDMOD/MULMOD (need richer model), EXP, signed compare. Validation (worktree g4-safe-valuerange): - tools/format.sh check: clean - evmone-unittests multipass run list: 223/223 pass - evmone-statetest -k fork_Cancun multipass: 2723/2723 pass Spec + review artifacts: docs/changes/2026-05-12-g4-safe-valuerange/. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
For LHS operands tagged ValueRange::U128, limbs[2..3] are provably zero (every existing U128 producer materializes literal MIR Zero in those positions — see invariant chain in 2026-05-12-g4-safe-valuerange/README.md). The previous U256 fallback in handleCompareEqU64 / LtRhsU64 / GtRhsU64 emitted a 3-limb OR-fold (LHS[1] | LHS[2] | LHS[3]); for U128 inputs that collapses to a single LHS[1] vs Zero check. Per-helper savings vs the U256 fallback path: - handleCompareEqU64: 2 OR instructions saved (single CMP + AND remains). - handleCompareLtRhsU64: 2 OR saved (single NE-CMP + Select remains). - handleCompareGtRhsU64: 2 OR saved (single NE-CMP + Select remains). Also initialize FinalResult to nullptr in all three helpers as defensive practice against future branch additions (no behavior change today). Local validation on g4-safe-valuerange: - tools/format.sh check clean - multipass unittests 223/223 - multipass statetest -k fork_Cancun 2723/2723 - 27-bench local ping-pong is too noisy on this machine (baseline drifts ~10pp between back-to-back runs); deferring perf judgment to the CI multipass regression check. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cleanup pass on PR DTVMStack#499 from /simplify review: - Drop the Operand::maxRange one-line helper; use std::max at the single call site to match the existing std::min precedent for AND narrowing. - Add a "ordering is load-bearing" invariant note above the ValueRange enum to guard against silent breakage from future reorderings. - Drop the defensive `MInstruction *FinalResult = nullptr;` init in the three compare helpers; the if/else-if/else cascade already assigns on every path, and removing the nullptr lets the compiler warn loudly if a future branch addition skips assignment. - Trim Phase 5 OR/XOR comment from 3 lines to 1 (the AND-returns-earlier half is already encoded by the if constexpr guard). - Drop the WHAT-style trailing comments in the U256 fallback paths ("// Final: low matches AND upper is zero", "// result = hasUpper ? ...") that restate the next assignment. No behavior change. Local validation: format clean, dtvmapi builds, 223 / 223 multipass unittests pass. Statetest skipped (diff is structurally equivalent — only comment/init/helper-naming changes). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Static-assert ValueRange enum ordering (U64 < U128 < U256) so the std::min/std::max contract in AND/OR/XOR can't be silently broken. - Add isLiteralZero debug helper and assert it on the upper limbs of the U64/U128 fast paths in handleCompareEqU64, handleCompareLtRhsU64, handleCompareGtRhsU64 — promotes the producer-side invariant from prose to a runtime check (ZEN_ASSERT, zero cost in release). - Fix README wording: 4-limb OR-fold -> 3-upper-limb OR-fold. Addresses Copilot review on DTVMStack#499. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
33db54c to
8e0cd91
Compare
The isLiteralZero ZEN_ASSERTs in the U64/U128 compare fast paths were
checking a producer-side structural invariant ("upper limbs are literal
MIR Zero constants"). PR DTVMStack#493 (EVMRangeAnalyzer cfg-join) widened the
Range contract from "literal-zero MIR" to "value-level zero" — the
analyzer now retrofits Range = U64 onto operands whose backing variables
hold any MIR that evaluates to a u64-fitting value. The old assertions
fired on analyzer-narrowed operands and crashed the JIT (CI hit SIGABRT
on stEIP1153_transientStorage.transStorageReset).
Remove the helper and the six ZEN_ASSERTs; the surrounding comments now
say "value-zero by Range contract" instead of "provably zero (literal
MIR)". The fast paths themselves are unchanged and still correct under
the new contract — they read Range and elide reading upper limbs, which
is sound for both producer-direct and analyzer-derived U64 tags.
Update the change-doc's "Invariant chain" section to reflect the two
producer paths (direct materialization + analyzer narrowing) now in
effect, instead of the old "today vs. with-cfg-join" framing.
Local validation: format clean, dtvmapi builds, 223/223 multipass
unittests pass, 2723/2723 statetest -k fork_Cancun pass.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Extend the monotone-safe subset of
ValueRangepropagation inEVMMirBuilderso narrower-range information survives across more producers and is exploited by the unsigned-RHS compare consumers.Producer side (commit 8b546b3)
handleBitwiseOp): both Phase-1 u64-const fast path and Phase-5 general path now carrymax(LHS.range, RHS.range). AND is unaffected — its monotone direction ismin, already handled by its existing earlier returns.handleShift<BO_SHR_U>): result carriesValueOp.getRange()— right shift cannot widen. SHL widens by construction; SAR sign-fills; both keep the U256 default.Consumer side (commits 8b546b3 + c0ae6aa)
handleCompareEqU64/handleCompareLtRhsU64/handleCompareGtRhsU64short-circuit the upper-limbs check based on the LHS range tag:ValueRange::U64(commit 8b546b3)LHS[0]compare. Saves 2 OR + 1 CMP + 1 AND/Select.ValueRange::U128(commit c0ae6aa)LHS[1] != 0check. The range tag proves limbs[2..3] are value-zero, so the generated consumer does not need to inspect them. Saves 2 OR vs U256 path.ValueRange::U256(unchanged)The fast paths rely on the value-level
ValueRangeinvariant:U64means limbs[1..3] are zero at runtime, andU128means limbs[2..3] are zero at runtime. Some producers, notably dynamicSHR_U, can establish that fact through selects/ORs rather than literal MIR zero instructions, so correctness is based on the range tag's soundness rather than structural literal-zero form.Out of scope (deferred)
SUB (wraps to full U256), SHL (widens), SAR (sign-fills upper limbs), ADDMOD/MULMOD (need a richer range model than U64/U128/U256), EXP, signed compare. Rationale + verification in
docs/changes/2026-05-12-g4-safe-valuerange/README.md.Validation
Local (worktree built with CI-faithful flags incl.
-DZEN_ENABLE_JIT_PRECOMPILE_FALLBACK=ON):tools/format.sh checkcmake --build build --target dtvmapievmone-unittestsmultipass (run list)evmone-statetest -k fork_CancunmultipassPerformance — three layers of evidence
1. CI multipass regression check
Current PR HEAD:
ad82eb5. CI is rerunning for this cleanup commit; do not treat the prior all-green status as current until the pending checks complete.Previous performance run on the pre-cleanup implementation (
c0ae6aa) covered 192 benches with a 25% threshold and passed. Wins were concentrated on macro / shift-heavy paths:total/main/snailtracer/benchmarktotal/main/blake2b_shifts/8415nullstotal/main/blake2b_huff/8415nullstotal/main/structarray_alloc/nfts_ranktotal/main/weierstrudel/15total/main/weierstrudel/1192-bench median: +0.20%. Mean: +0.19%. The largest reported regressions are on pure-PUSH micro-benches (
synth/PUSH26/PUSH28, ≤ +7.5%, sub-2us, never enter touched code paths).2. Controlled microbench (narrow vs wide path, same loop shape)
Two hand-crafted EVM bytecodes with identical loop structure — same PUSH/ADD/AND/DUP/PUSH/GT/JUMPI gas costs per iteration. Only difference: the wide control inserts
NOT NOTafterANDto force the LHS range tag back to U256 (value preserved bit-identically), so both baseline and branch take the U256 fallback path on that variant.narrow_compare_u64/loop(LHS range U64)wide_compare_u256_control/loop(LHS range U256, same loop body + 2 NOTs)Δ — Δ_control = −7.40 pp net signal, isolated to the narrow path. The control shows the bench-harness noise floor is ≤ 0.4%, so the −7.76% on the narrow bench cannot be explained by drift / cache / scheduling.
Microbench bytecodes (26 / 28 bytes):
3. U128 narrow path — currently muted, full unlock pending cfg-join
narrow_compare_u128/loopshows only −1.90% because each EVM loop back-edge re-loads the running value from the stack withRange = U256, so commit c0ae6aa's U128 fast path fires only on the first iteration. The full U128 win will be unlocked whenperf/value-range-cfg-join(which retrofits analyzer-derived ranges onto stack-popped operands at block entry) lands. The U128 narrow code is correct and dormant today — no regression risk, no behavior change for U256 fallback.References
docs/changes/2026-05-12-g4-safe-valuerange/README.mddocs/changes/2026-05-12-g4-safe-valuerange/reviews/docs/research/directions/u256-strength-reduction/analysis/2026-05-12-verified-opportunities.md§ 1.G4 (in DTVM-Papers, commite104583)Commits in this PR
8b546b3— producer threading (OR/XOR/SHR_U) + U64 narrow compare consumerc0ae6aa— U128 narrow compare consumer (extends U64 fast path to U128 tier; full win pending cfg-join)ad82eb5— refactor cleanup from /simplify review (drop maxRange helper for std::max; drop cargo-cult nullptr init; trim comments; document ValueRange enum ordering invariant)Test plan
Performance Regression Check (multipass)forad82eb5— pending; previousc0ae6aarun passed (median +0.20%, snailtracer −3.40%)Performance Regression Check (interpreter)forad82eb5— pendingTest DTVM-EVM multipass evmtestsuite with gas register in release modeforad82eb5— pendingTest DTVM-EVM multipass and interpreter using evmone statetests in release modeforad82eb5— pendingTest DTVM-EVM multipass and interpreter using evmone unit tests in release modeforad82eb5— pendingad82eb5— pendingevm_mir_compiler.{cpp,h}(manual confirm by reviewer)🤖 Generated with Claude Code