feat(evm): drop SPP all-or-nothing fallback for dynamic-jump contracts #446

Merged: zoowii merged 23 commits into DTVMStack:main from abmcar:feat/gas-check-placement on May 15, 2026

Conversation

@abmcar abmcar commented Apr 5, 2026

Summary

Before this PR, the EVM bytecode cache's SPP gas-metering pipeline used
an all-or-nothing bailout: any unresolved dynamic jump caused the
entire contract to skip SPP and fall back to per-block gas metering.
Real-world Solidity contracts always contain dynamic jumps (function
dispatch, return-via-stack), so in practice SPP delivered zero benefit
on the workloads that matter.

This PR removes the bailout by making the cache build a mixed-precision
CFG (precise edges for static PUSH→JUMP, over-approximated edges for
all dynamic jumps) and exporting a separate SPP-shifted gas-cost
array that the multipass JIT consumes while the interpreter continues
to read the unshifted costs guaranteed by PR #371. The expensive SPP
pipeline is also gated to JIT-consumer modules so interpreter-only paths
never pay for it.

What this PR delivers

  1. Capability: SPP gas metering now applies to every contract,
    including ones with unresolvable dynamic jumps. Previously it was
    effectively dead code on real Solidity workloads.
  2. Algorithmic complexity: O(D × J²) → O(N) on the cache CFG
    build, where D is the dynamic-jump count and J is the JUMPDEST
    count. Verified: loop_full_of_jumpdests cache-build wall-clock
    drops from 7.3 s to 3.3 s (local single-machine measurement, not
    CI-tracked).
  3. Architecture: the GasChunkCost / GasChunkCostSPP parallel arrays
    plus the CacheNeedsSPP lifecycle flag cleanly separate the
    interpreter and JIT gas paths, so future work on either side can
    move independently.

Runtime delta on the 27-bench paper subset is within evmone-bench's
inter-binary drift band — see Evaluation below.

Phase 1 — mixed-CFG gas block construction

  • Remove the all-or-nothing HasDynamicJump bailout in
    buildGasChunksSPP. Contracts with any unresolved dynamic jump used to
    skip the SPP pipeline completely; now they run the pipeline with
    over-approximated edges for the unresolved portion.
  • Factor out buildCFGEdges() with over-approximation for all unresolved
    dynamic jumps (sound for SPP metering). The CFG is intentionally kept
    over-approximate — using resolved targets to narrow edges would
    under-approximate the CFG when resolution is incomplete, causing
    lemma614Update to shift gas along non-existent edges.
  • Tighten the SPP shift guard inside lemma614Update so a shift never
    crosses an isGasChunkTerminator boundary, and set MinSucc = 0 when
    encountering excluded successors to prevent unsafe shifting.
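
The guard described in the last bullet can be pictured with a small, self-contained sketch. This is not the PR's lemma614Update; it is a simplified illustration that assumes a shift hoists the minimum successor cost into the current block. `ToySucc` and `safeShiftAmount` are hypothetical names introduced only for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical, simplified view of one CFG successor (not the PR's GasBlock).
struct ToySucc {
  uint64_t Cost;             // gas currently metered at the successor
  bool Excluded;             // successor lies outside the region SPP may touch
  size_t EffectivePredCount; // explicit preds + implicit dynamic preds
};

// Returns how much gas could be hoisted from the successors into the current
// block without under-charging any path. Any unsafe condition yields 0.
inline uint64_t safeShiftAmount(bool BlockIsTerminator,
                                const std::vector<ToySucc> &Succs) {
  if (BlockIsTerminator || Succs.empty())
    return 0; // never shift across a terminator boundary
  uint64_t MinSucc = std::numeric_limits<uint64_t>::max();
  for (const ToySucc &S : Succs) {
    // An excluded successor, or one that other (possibly dynamic)
    // predecessors can reach, forces the shift to zero (MinSucc = 0).
    if (S.Excluded || S.EffectivePredCount > 1)
      return 0;
    MinSucc = std::min(MinSucc, S.Cost);
  }
  return MinSucc;
}
```

In this sketch, two single-predecessor successors costing 10 and 4 allow a shift of 4, while marking either one as excluded drops the shift to 0. The real pipeline may apply further conditions that are not modelled here.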

Phase 2 — decodePushAsJumpDest decode helper

  • Factor decodePushAsJumpDest() out of resolveConstantJumpTarget() as
    a shared decode helper.

Phase 3 — wire SPP-shifted costs into the multipass JIT

  • Add a second parallel cost array EVMBytecodeCache::GasChunkCostSPP
    populated from Metering[] inside buildGasChunksSPP. The existing
    GasChunkCost continues to hold unshifted Blocks[Id].Cost per the
    interpreter-safety invariant established by PR #371 (fix(evm): disable SPP gas cost shifting and add opcode validity check in interpreter fallback path).
  • Plumb the pointer through EVMFrontendContext::setGasChunkInfo and the
    EVMMirBuilder constructor / copy path.
  • Swap reads at three JIT sites so they prefer GasChunkCostSPP when
    available, falling back to GasChunkCost:
    • EVMMirBuilder::meterOpcode — primary per-chunk-start charge
    • EVMMirBuilder::meterOpcodeRange — JUMPDEST-skip cumulative sum
    • buildEVMFunction JUMPDEST-run suffix-sum precompute
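
The "prefer GasChunkCostSPP, fall back to GasChunkCost" rule at the three read sites can be sketched as follows. The element type (`uint64_t`), the `CacheView` struct, and both helper functions are assumptions made for illustration; the actual EVMMirBuilder code may differ.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical view of the two parallel cost arrays named in this PR.
struct CacheView {
  const uint64_t *GasChunkCost = nullptr;    // unshifted, interpreter-safe
  const uint64_t *GasChunkCostSPP = nullptr; // SPP-shifted; null if SPP was skipped
};

// Cost the JIT would charge at a chunk start: SPP-shifted when available,
// otherwise the unshifted cost.
inline uint64_t jitChunkCost(const CacheView &C, size_t Pc) {
  const uint64_t *Costs = C.GasChunkCostSPP ? C.GasChunkCostSPP : C.GasChunkCost;
  return Costs[Pc];
}

// The interpreter never looks at the shifted array.
inline uint64_t interpreterChunkCost(const CacheView &C, size_t Pc) {
  return C.GasChunkCost[Pc];
}
```

Keeping the fallback inside one helper means an interpreter-only cache (no SPP array) still yields correct per-chunk costs if the JIT is ever handed it.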

Phase 4 — gate the SPP pipeline on JIT-consumer modules only

  • Add buildBytecodeCache(..., bool EnableSPP = false). When false,
    skip the expensive CFG / metering pipeline entirely.
  • Track EVMModule::CacheNeedsSPP. Flipped to true immediately before
    action::performEVMJITCompile runs — interpreter-only modules never
    pay the SPP pipeline cost.

Phase 5 — CFG soundness fix

  • Remove the two-pass CFG rebuild that used resolved call-site targets to
    replace over-approximate edges. This created an under-approximate CFG
    when call-site enumeration was incomplete, causing lemma614Update to
    shift gas along non-existent edges and produce unsafe metering.
  • buildCFGEdges() now always over-approximates dynamic jumps
    logically: unresolved dynamic jumps are represented by
    ImplicitDynamicPredCount stamped onto JUMPDEST blocks, rather than by
    materializing one Succs/Preds edge per dynamic-jump/JUMPDEST pair.
    Static jumps (PUSH → JUMP) still get precise single-target edges.
  • Remove dead call-site enumeration code (resolveCallSiteTargets and
    ResolvedJumpTargets export) — no downstream consumer exists yet.
    The algorithm can be restored from git history when a consumer
    (e.g. MIR direct-branch optimization) is implemented.

Phase 6 — review fixes

  • Remove dead GasBlock::Prev2Pc / Prev2Opcode fields and their
    writebacks. They were originally added to support a future
    3-instruction call-site window lookup, but the call-site enumeration
    that would have consumed them was removed in Phase 5. Whole-repo
    grep confirmed zero readers; struct shrinks ~9 bytes.
  • Extend the buildCFGEdges function comment to make the soundness
    pairing with lemma614Update explicit: the implicit dynamic
    predecessor count is folded into effectivePredCount, so dynamic
    targets are treated as multi-predecessor blocks without materializing
    D×J CFG edges.
  • Document the EVMModule::CacheNeedsSPP lifecycle invariant: the flag
    must be set before any getBytecodeCache() call, since the
    EnableSPP decision is fixed at lazy-build time.
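
The lifecycle invariant in the last bullet is easiest to see in a toy model. The sketch below assumes a lazily built cache that freezes the EnableSPP decision on first access; `ToyModule`, `ToyCache`, and `markCacheNeedsSPP` are hypothetical names, not the EVMModule API.

```cpp
#include <optional>

// Hypothetical stand-in for the bytecode cache; HasSPP records whether the
// SPP pipeline ran when the cache was built.
struct ToyCache {
  bool HasSPP = false;
};

// Hypothetical stand-in for EVMModule, reduced to the flag and the lazy build.
class ToyModule {
public:
  void markCacheNeedsSPP() { CacheNeedsSPP = true; }

  const ToyCache &getBytecodeCache() {
    if (!Cache) {
      ToyCache Built;
      Built.HasSPP = CacheNeedsSPP; // the decision is frozen at first build
      Cache = Built;
    }
    return *Cache;
  }

private:
  bool CacheNeedsSPP = false;
  std::optional<ToyCache> Cache;
};
```

In this model, setting the flag after the first `getBytecodeCache()` call has no effect on the already-built cache, which is the failure mode the documented invariant is meant to rule out.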

Phase 7 — drop O(D × J²) over-approximation cost

  • Replace the explicit add-then-split-critical-edge step that materialised
    one CFG edge per (dynamic-jump-block, JUMPDEST) pair with a per-JUMPDEST
    scalar ImplicitDynamicPredCount, folded into effectivePredCount so
    the lemma 6.14 update behaves identically without materialising the
    edges. On a contract with D dynamic jumps and J JUMPDESTs, the
    cache build drops from O(D × J²) to O(N).

  • To keep dyn-only JUMPDESTs (Solidity function returns, unreachable in
    the static-only CFG) visible to the dominator / loop analyses, seed
    the reachability search from every JUMPDEST after the static reachable
    set is built. Gated on ImplicitDynamicPredCount > 0 (round-2 review
    fix) so statically-dead JUMPDESTs in dynamic-jump-free contracts
    preserve pre-Phase-7 behavior.

  • Compile-time check: end-to-end evmone-unittests wall clock for
    loop_full_of_jumpdests (24556 JUMPDESTs) drops from 7.3 s to
    3.3 s (local single-machine measurement, not CI-tracked).

  • Intra-PR demo — same source built on the commit immediately
    before this Phase 7 commit (99f23a3) vs current HEAD, on a
    synthetic contract CALLDATALOAD JUMP <N × JUMPDEST> STOP:

    | N JUMPDESTs | Pre-Phase-7 (D × J explicit edges) | Phase 7 (O(N) implicit count) | Speedup |
    |---|---|---|---|
    | 100 | 0.07 ms | 0.05 ms | 1.4× |
    | 500 | 0.39 ms | 0.13 ms | 3.0× |
    | 1,000 | 1.01 ms | 0.29 ms | 3.4× |
    | 2,000 | 3.04 ms | 0.67 ms | 4.5× |
    | 5,000 | 19.66 ms | 2.71 ms | 7.2× |
    | 10,000 | 84.76 ms | 10.38 ms | 8.2× |
    | 20,000 | 345.94 ms | 43.68 ms | 7.9× |

    Pre-Phase-7 approaches ~4× per doubling of N at large N (quadratic, the
    expected O(D × J²) shape). Phase 7 grows about 2.2–2.3× per doubling
    between N = 500 and 2,000 and about 3.8–4.2× between N = 5,000 and
    20,000; the residual super-linearity comes from computeDominators /
    buildLoopsUsingDominance, which this PR does not touch. Reproduce
    with cmake --build build --target evmCacheComplexityDemo && bash docs/changes/2026-05-11-spp-cfg-implicit-dyn-pred/scaling_demo.sh.

  • New evmCacheTests unit test target with 4 smoke/regression cases
    covering the SPP gate, dynamic-target reachability path, interpreter-only
    no-SPP path, and multi-dyn-jump conservative metering.

  • Detail: docs/changes/2026-05-11-spp-cfg-implicit-dyn-pred/README.md
    and docs/changes/2026-05-11-spp-cfg-implicit-dyn-pred/review-fixes-r2.md.
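
The two Phase 7 ideas (a per-JUMPDEST implicit predecessor count instead of D × J edges, and reachability seeding gated on that count) can be sketched together. Everything except the name ImplicitDynamicPredCount is hypothetical, and stamping the count as "number of dynamic-jump blocks" is an assumption about how the scalar is derived.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy block; only ImplicitDynamicPredCount is a name from the PR.
struct ToyBlock {
  bool IsJumpDest = false;
  bool EndsWithDynamicJump = false;
  std::vector<size_t> Preds; // explicit (static) predecessors
  std::vector<size_t> Succs; // explicit (static) successors
  uint32_t ImplicitDynamicPredCount = 0;
};

// Two linear passes: count dynamic-jump blocks, then stamp that count onto
// every JUMPDEST block. No per-pair edge is materialised.
inline void stampImplicitDynPreds(std::vector<ToyBlock> &Blocks) {
  uint32_t D = 0;
  for (const ToyBlock &B : Blocks)
    if (B.EndsWithDynamicJump)
      ++D;
  for (ToyBlock &B : Blocks)
    B.ImplicitDynamicPredCount = B.IsJumpDest ? D : 0;
}

// Predecessor count as seen by the shift logic: explicit plus implicit.
inline size_t effectivePredCount(const ToyBlock &B) {
  return B.Preds.size() + B.ImplicitDynamicPredCount;
}

// Iterative depth-first marking over explicit successor edges.
inline void markFrom(const std::vector<ToyBlock> &Blocks, size_t Start,
                     std::vector<bool> &Seen) {
  std::vector<size_t> Work{Start};
  while (!Work.empty()) {
    size_t Id = Work.back();
    Work.pop_back();
    if (Seen[Id])
      continue;
    Seen[Id] = true;
    for (size_t S : Blocks[Id].Succs)
      if (!Seen[S])
        Work.push_back(S);
  }
}

// Static reachability from block 0, then extra seeds from JUMPDESTs that some
// dynamic jump could target. JUMPDESTs with a zero count are not seeded.
inline std::vector<bool> reachableBlocks(const std::vector<ToyBlock> &Blocks) {
  std::vector<bool> Seen(Blocks.size(), false);
  if (Blocks.empty())
    return Seen;
  markFrom(Blocks, 0, Seen);
  for (size_t Id = 0; Id < Blocks.size(); ++Id)
    if (Blocks[Id].IsJumpDest && Blocks[Id].ImplicitDynamicPredCount > 0)
      markFrom(Blocks, Id, Seen);
  return Seen;
}
```

With one dynamic-jump block and J JUMPDESTs, this stores J integers rather than J edges per dynamic jump. In a contract with no dynamic jumps every count is zero, so a statically dead JUMPDEST stays unreachable, matching the gate described above.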

Evaluation

Runtime

After rebasing onto current upstream/main (which includes the
intervening upstream perf work — #458 u256 arithmetic, #460
displacement-addressed bytes32, #482 depth-indexed pool, #483 inline
arithmetic delegate), and measuring the 27 paper benches
(^external/total/(main|micro)/):

  • 27-bench 10-rep geomean: +1.15% (treatment slower; +0.46% after
    correcting a single 20-rep-confirmed outlier on
    main/blake2b_shifts/8415nulls).

  • 0 benches above the ±25% CI gate.

  • Caveat: this 10-rep number is sequential
    (baseline-all-then-treatment-all), so it conflates real PR delta with
    inter-binary system drift. Focused 20-rep re-measurement on the three
    largest movers indicates the per-bench deltas are dominated by drift,
    not by PR effects:

    | Bench | 10-rep Δ | 20-rep Δ (focused) | Verdict |
    |---|---|---|---|
    | main/weierstrudel/1 | +3.51% | +0.55% (treat CV 2.19%) | drift |
    | main/blake2b_huff/8415nulls | -6.30% | +1.55% | drift (flipped direction) |
    | micro/loop_with_many_jumpdests/empty | -4.84% | -0.55% | drift |
    | main/blake2b_shifts/8415nulls | +20.34% (CV 21.93%) | +0.25% (CV 2.09%) | single-iteration outlier |

    Additionally, three of the four "regression" benches reported above
    the noise band at 10 reps (micro/memory_grow_mstore/{nogrow,by1} and
    micro/memory_grow_mload/nogrow) contain zero JUMP / JUMPI /
    JUMPDEST opcodes, so this PR's CFG changes cannot affect them by
    construction. Those deltas are pure drift artifacts.
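
For readers who want to recompute the summary statistics, here is a minimal sketch of a geometric-mean delta and a coefficient of variation. It assumes the geomean is taken over per-bench treatment/baseline time ratios and that CV is sample standard deviation over mean; the PR does not state the exact formulas, and both function names are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Geometric mean of per-benchmark time ratios (treatment / baseline),
// returned as a percentage delta. Positive means the treatment is slower.
inline double geomeanDeltaPercent(const std::vector<double> &Ratios) {
  if (Ratios.empty())
    return 0.0;
  double LogSum = 0.0;
  for (double R : Ratios)
    LogSum += std::log(R);
  double Geomean = std::exp(LogSum / static_cast<double>(Ratios.size()));
  return (Geomean - 1.0) * 100.0;
}

// Coefficient of variation in percent: sample standard deviation / mean.
inline double cvPercent(const std::vector<double> &Samples) {
  if (Samples.size() < 2)
    return 0.0;
  const double N = static_cast<double>(Samples.size());
  double Mean = 0.0;
  for (double S : Samples)
    Mean += S;
  Mean /= N;
  double SumSq = 0.0;
  for (double S : Samples)
    SumSq += (S - Mean) * (S - Mean);
  return std::sqrt(SumSq / (N - 1.0)) / Mean * 100.0;
}
```

A geometric mean is used here because it treats a 2× slowdown and a 2× speedup as cancelling, which an arithmetic mean of ratios does not. A high CV on a single bench (such as the 21.93% above) signals that its delta should be re-measured before being trusted.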

Reading: this PR's value is the capability change (Phase 1 — SPP
now applies to contracts with dynamic jumps) and the algorithmic
complexity guarantee (Phase 7 — O(D × J²) → O(N) cache build), not raw
runtime improvement on the existing 27-bench suite. The intervening
upstream perf work absorbed any prior absolute speedup; the remaining
per-bench deltas are at or below evmone-bench's single-machine
inter-binary drift band.

Interpreter

0 regressions on CI. Local timing confirms the SPP gating bypass:

| test suite | before Phase 4 | after Phase 4 |
|---|---|---|
| evmone-unittests interpreter | 3744 ms | 419 ms (−89%) |

Correctness

  • tools/format.sh check: clean
  • evmone-unittests multipass: 223/223 pass
  • evmone-unittests interpreter: 215/215 pass
  • evmone-statetest --fork Cancun multipass: 2723/2723 pass
  • evmone-statetest --fork Cancun interpreter: 2723/2723 pass
  • New evmCacheTests unit tests: 4/4 pass

Changed files

  • src/evm/evm_cache.h — add GasChunkCostSPP array;
    GasBlock::ImplicitDynamicPredCount field
  • src/evm/evm_cache.cpp — mixed-CFG, SPP export, EnableSPP gating,
    soundness fix (always over-approximate CFG); drop dead
    Prev2Pc/Prev2Opcode; clarify CFG over-approx invariant;
    implicit-dyn-pred count + reachability stitch with R2 gate (Phase 7)
  • src/compiler/evm_frontend/evm_mir_compiler.{h,cpp} — plumb SPP
    pointer; prefer SPP-shifted cost at three chunk-cost read sites
  • src/compiler/evm_compiler.cpp — pass SPP pointer via setGasChunkInfo
  • src/runtime/evm_module.{h,cpp} — add CacheNeedsSPP flag; flip
    before JIT compile; document lifecycle invariant
  • src/tests/evm_cache_tests.cpp — NEW unit test target
  • src/tests/CMakeLists.txt — register new test target
  • docs/changes/2026-04-05-gas-check-placement/ — change doc +
    review-fix plan
  • docs/changes/2026-05-11-spp-cfg-implicit-dyn-pred/ — Phase 7
    change doc + round-2 review-fix plan
  • docs/design/evm-gas-mechanism.md — design doc (interpreter + JIT
    gas mechanism with SPP)

Test plan

  • tools/format.sh check clean
  • evmone-unittests multipass and interpreter: all pass
  • evmone-statetest --fork Cancun multipass and interpreter: all pass
  • New evmCacheTests 4 cases pass
  • CI perf regression check passes within the ±25% gate
  • loop_full_of_jumpdests cache build under 4s (compile-time
    complexity verification, local single-machine)

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings April 5, 2026 13:24

Copilot AI left a comment

Pull request overview

This PR improves EVM SPP (strategic placement of gas checks) by always constructing a CFG with mixed-precision jump edges, adding a bytecode analysis pass to resolve common Solidity internal-return SWAPn → JUMP patterns via call-site enumeration, fixing an SPP output writeback bug, and plumbing resolved jump targets into the MIR compiler to enable more direct branching.

Changes:

  • Build CFG even with unresolved dynamic jumps, using resolved edges where available and over-approximated edges otherwise.
  • Add call-site enumeration to resolve SWAPn → JUMP return targets and export them through EVMBytecodeCache.
  • Fix SPP results being computed but not written to GasChunkCost, and use resolved targets for a direct-branch fast path in MIR for single-target jumps.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| src/evm/evm_cache.h | Add ResolvedJumpTargets to bytecode cache for cross-phase consumption. |
| src/evm/evm_cache.cpp | Implement call-site enumeration, CFG edge builder with mixed precision, and fix SPP cost writeback. |
| src/compiler/evm_frontend/evm_mir_compiler.h | Add frontend context plumbing for resolved targets and track CurrentInstrPC. |
| src/compiler/evm_frontend/evm_mir_compiler.cpp | Use resolved jump targets to emit direct branch for single-target dynamic JUMP. |
| src/compiler/evm_compiler.cpp | Pass resolved targets from module cache into frontend compilation context. |
| src/action/evm_bytecode_visitor.h | Set current instruction PC before invoking jump handlers. |

Comment thread src/compiler/evm_frontend/evm_mir_compiler.cpp Outdated
Comment thread src/evm/evm_cache.cpp Outdated
Comment thread src/evm/evm_cache.cpp
github-actions Bot commented Apr 5, 2026

⚡ Performance Regression Check Results

✅ Performance Check Passed (interpreter)

Performance Benchmark Results (threshold: 25%)

Benchmark Baseline (us) Current (us) Change Status
total/main/blake2b_huff/8415nulls 2.51 2.60 +3.7% PASS
total/main/blake2b_huff/empty 0.04 0.04 +3.1% PASS
total/main/blake2b_shifts/8415nulls 20.15 20.61 +2.3% PASS
total/main/sha1_divs/5311 8.64 8.59 -0.7% PASS
total/main/sha1_divs/empty 0.10 0.11 +5.6% PASS
total/main/sha1_shifts/5311 6.08 6.34 +4.4% PASS
total/main/sha1_shifts/empty 0.07 0.08 +4.1% PASS
total/main/snailtracer/benchmark 73.96 74.04 +0.1% PASS
total/main/structarray_alloc/nfts_rank 1.46 1.38 -5.5% PASS
total/main/swap_math/insufficient_liquidity 0.00 0.00 +0.9% PASS
total/main/swap_math/received 0.01 0.01 +0.4% PASS
total/main/swap_math/spent 0.01 0.01 +1.1% PASS
total/main/weierstrudel/1 0.30 0.29 -3.9% PASS
total/main/weierstrudel/15 3.41 3.18 -6.8% PASS
total/micro/JUMPDEST_n0/empty 3.01 2.63 -12.5% PASS
total/micro/jump_around/empty 0.08 0.10 +13.9% PASS
total/micro/loop_with_many_jumpdests/empty 45.88 40.23 -12.3% PASS
total/micro/memory_grow_mload/by1 0.13 0.13 -6.0% PASS
total/micro/memory_grow_mload/by16 0.14 0.13 -5.2% PASS
total/micro/memory_grow_mload/by32 0.16 0.15 -7.2% PASS
total/micro/memory_grow_mload/nogrow 0.13 0.12 -3.3% PASS
total/micro/memory_grow_mstore/by1 0.13 0.13 -4.8% PASS
total/micro/memory_grow_mstore/by16 0.15 0.14 -4.8% PASS
total/micro/memory_grow_mstore/by32 0.16 0.15 -5.0% PASS
total/micro/memory_grow_mstore/nogrow 0.13 0.12 -8.8% PASS
total/micro/signextend/one 0.27 0.28 +1.6% PASS
total/micro/signextend/zero 0.27 0.28 +2.0% PASS
total/synth/ADD/b0 3.22 3.22 -0.0% PASS
total/synth/ADD/b1 3.81 3.57 -6.4% PASS
total/synth/ADDRESS/a0 5.74 4.81 -16.2% PASS
total/synth/ADDRESS/a1 6.29 5.34 -15.1% PASS
total/synth/AND/b0 3.16 3.10 -1.9% PASS
total/synth/AND/b1 3.64 3.50 -4.0% PASS
total/synth/BYTE/b0 6.90 6.09 -11.8% PASS
total/synth/BYTE/b1 5.78 5.11 -11.7% PASS
total/synth/CALLDATASIZE/a0 3.44 3.51 +2.1% PASS
total/synth/CALLDATASIZE/a1 4.13 3.58 -13.4% PASS
total/synth/CALLER/a0 5.70 4.81 -15.7% PASS
total/synth/CALLER/a1 6.28 5.34 -15.0% PASS
total/synth/CALLVALUE/a0 4.33 3.51 -19.0% PASS
total/synth/CALLVALUE/a1 4.41 3.76 -14.6% PASS
total/synth/CODESIZE/a0 3.96 3.76 -5.1% PASS
total/synth/CODESIZE/a1 4.09 4.01 -1.8% PASS
total/synth/DUP1/d0 1.57 1.39 -11.5% PASS
total/synth/DUP1/d1 1.96 1.76 -10.3% PASS
total/synth/DUP10/d0 1.57 1.39 -11.5% PASS
total/synth/DUP10/d1 2.05 1.73 -15.3% PASS
total/synth/DUP11/d0 1.46 1.15 -21.1% PASS
total/synth/DUP11/d1 2.05 1.73 -15.3% PASS
total/synth/DUP12/d0 1.57 1.39 -11.5% PASS
total/synth/DUP12/d1 2.05 1.73 -15.4% PASS
total/synth/DUP13/d0 1.57 1.39 -11.5% PASS
total/synth/DUP13/d1 1.96 1.73 -11.7% PASS
total/synth/DUP14/d0 1.48 1.39 -6.2% PASS
total/synth/DUP14/d1 2.05 1.73 -15.3% PASS
total/synth/DUP15/d0 1.57 1.39 -11.3% PASS
total/synth/DUP15/d1 2.05 1.73 -15.4% PASS
total/synth/DUP16/d0 1.57 1.39 -11.3% PASS
total/synth/DUP16/d1 1.96 1.73 -11.6% PASS
total/synth/DUP2/d0 1.48 1.39 -6.3% PASS
total/synth/DUP2/d1 2.05 1.73 -15.4% PASS
total/synth/DUP3/d0 1.51 1.15 -23.6% PASS
total/synth/DUP3/d1 2.05 1.73 -15.4% PASS
total/synth/DUP4/d0 1.48 1.39 -6.2% PASS
total/synth/DUP4/d1 2.05 1.73 -15.4% PASS
total/synth/DUP5/d0 1.57 1.39 -11.4% PASS
total/synth/DUP5/d1 2.06 1.73 -15.7% PASS
total/synth/DUP6/d0 1.57 1.39 -11.5% PASS
total/synth/DUP6/d1 2.05 1.73 -15.4% PASS
total/synth/DUP7/d0 1.57 1.39 -11.6% PASS
total/synth/DUP7/d1 1.96 1.73 -11.6% PASS
total/synth/DUP8/d0 1.44 1.39 -3.3% PASS
total/synth/DUP8/d1 2.05 1.73 -15.4% PASS
total/synth/DUP9/d0 1.57 1.15 -26.7% PASS
total/synth/DUP9/d1 2.05 1.73 -15.3% PASS
total/synth/EQ/b0 6.10 5.32 -12.8% PASS
total/synth/EQ/b1 6.59 5.61 -14.8% PASS
total/synth/GAS/a0 4.21 3.83 -8.9% PASS
total/synth/GAS/a1 4.49 4.00 -11.1% PASS
total/synth/GT/b0 5.79 5.38 -7.1% PASS
total/synth/GT/b1 6.20 5.35 -13.7% PASS
total/synth/ISZERO/u0 9.63 8.33 -13.6% PASS
total/synth/JUMPDEST/n0 3.01 2.63 -12.5% PASS
total/synth/LT/b0 5.78 5.38 -6.9% PASS
total/synth/LT/b1 6.21 5.35 -13.8% PASS
total/synth/MSIZE/a0 4.98 4.33 -13.1% PASS
total/synth/MSIZE/a1 5.67 4.85 -14.4% PASS
total/synth/MUL/b0 6.33 5.49 -13.2% PASS
total/synth/MUL/b1 6.78 5.93 -12.5% PASS
total/synth/NOT/u0 5.53 5.06 -8.5% PASS
total/synth/OR/b0 3.04 3.02 -0.8% PASS
total/synth/OR/b1 3.54 3.41 -3.5% PASS
total/synth/PC/a0 3.68 3.59 -2.5% PASS
total/synth/PC/a1 4.12 3.61 -12.3% PASS
total/synth/PUSH1/p0 1.48 1.39 -6.1% PASS
total/synth/PUSH1/p1 2.16 1.82 -15.7% PASS
total/synth/PUSH10/p0 1.51 1.39 -7.6% PASS
total/synth/PUSH10/p1 2.07 1.83 -11.3% PASS
total/synth/PUSH11/p0 1.58 1.39 -11.6% PASS
total/synth/PUSH11/p1 2.15 1.82 -15.3% PASS
total/synth/PUSH12/p0 1.51 1.39 -8.0% PASS
total/synth/PUSH12/p1 2.07 1.82 -11.9% PASS
total/synth/PUSH13/p0 1.52 1.31 -13.5% PASS
total/synth/PUSH13/p1 2.08 1.83 -12.1% PASS
total/synth/PUSH14/p0 1.52 1.42 -6.5% PASS
total/synth/PUSH14/p1 1.89 1.83 -3.3% PASS
total/synth/PUSH15/p0 1.52 1.39 -8.4% PASS
total/synth/PUSH15/p1 2.10 1.92 -8.4% PASS
total/synth/PUSH16/p0 1.50 1.39 -7.0% PASS
total/synth/PUSH16/p1 2.10 1.83 -12.7% PASS
total/synth/PUSH17/p0 1.52 1.39 -8.2% PASS
total/synth/PUSH17/p1 2.07 1.83 -11.7% PASS
total/synth/PUSH18/p0 1.52 1.32 -13.1% PASS
total/synth/PUSH18/p1 2.09 1.85 -11.7% PASS
total/synth/PUSH19/p0 1.56 1.31 -16.1% PASS
total/synth/PUSH19/p1 2.09 1.85 -11.6% PASS
total/synth/PUSH2/p0 1.51 1.40 -7.2% PASS
total/synth/PUSH2/p1 2.16 1.82 -15.6% PASS
total/synth/PUSH20/p0 1.51 1.40 -7.2% PASS
total/synth/PUSH20/p1 2.11 1.85 -12.4% PASS
total/synth/PUSH21/p0 1.53 1.31 -13.9% PASS
total/synth/PUSH21/p1 2.11 1.83 -13.2% PASS
total/synth/PUSH22/p0 1.52 1.39 -8.1% PASS
total/synth/PUSH22/p1 2.17 1.86 -14.2% PASS
total/synth/PUSH23/p0 1.52 1.32 -13.0% PASS
total/synth/PUSH23/p1 2.16 1.88 -12.9% PASS
total/synth/PUSH24/p0 1.51 1.39 -7.7% PASS
total/synth/PUSH24/p1 1.90 1.83 -3.7% PASS
total/synth/PUSH25/p0 1.51 1.39 -7.9% PASS
total/synth/PUSH25/p1 2.08 1.83 -11.9% PASS
total/synth/PUSH26/p0 1.51 1.39 -7.7% PASS
total/synth/PUSH26/p1 2.08 1.83 -11.7% PASS
total/synth/PUSH27/p0 1.52 1.32 -13.0% PASS
total/synth/PUSH27/p1 2.08 1.84 -11.6% PASS
total/synth/PUSH28/p0 1.52 1.34 -12.1% PASS
total/synth/PUSH28/p1 2.08 1.85 -11.1% PASS
total/synth/PUSH29/p0 1.53 1.32 -13.5% PASS
total/synth/PUSH29/p1 2.17 1.86 -14.1% PASS
total/synth/PUSH3/p0 1.52 1.39 -8.2% PASS
total/synth/PUSH3/p1 2.07 1.64 -21.2% PASS
total/synth/PUSH30/p0 1.58 1.54 -3.0% PASS
total/synth/PUSH30/p1 2.10 1.65 -21.4% PASS
total/synth/PUSH31/p0 1.58 1.39 -11.5% PASS
total/synth/PUSH31/p1 2.21 1.80 -18.3% PASS
total/synth/PUSH32/p0 1.53 1.39 -9.3% PASS
total/synth/PUSH32/p1 2.17 1.83 -15.4% PASS
total/synth/PUSH4/p0 1.51 1.39 -7.7% PASS
total/synth/PUSH4/p1 2.16 1.60 -25.8% PASS
total/synth/PUSH5/p0 1.51 1.39 -7.7% PASS
total/synth/PUSH5/p1 2.07 1.83 -11.3% PASS
total/synth/PUSH6/p0 1.51 1.39 -8.0% PASS
total/synth/PUSH6/p1 1.81 1.82 +0.7% PASS
total/synth/PUSH7/p0 1.52 1.39 -8.0% PASS
total/synth/PUSH7/p1 2.08 1.85 -11.0% PASS
total/synth/PUSH8/p0 1.55 1.30 -16.2% PASS
total/synth/PUSH8/p1 2.16 1.82 -15.4% PASS
total/synth/PUSH9/p0 1.57 1.40 -11.0% PASS
total/synth/PUSH9/p1 1.95 1.83 -6.0% PASS
total/synth/RETURNDATASIZE/a0 3.88 3.91 +0.7% PASS
total/synth/RETURNDATASIZE/a1 4.17 3.93 -5.7% PASS
total/synth/SAR/b0 4.44 3.92 -11.7% PASS
total/synth/SAR/b1 5.23 4.71 -10.0% PASS
total/synth/SGT/b0 4.39 4.60 +4.7% PASS
total/synth/SGT/b1 5.06 4.11 -18.8% PASS
total/synth/SHL/b0 3.94 3.60 -8.8% PASS
total/synth/SHL/b1 3.86 3.65 -5.3% PASS
total/synth/SHR/b0 3.64 3.47 -4.6% PASS
total/synth/SHR/b1 3.86 3.68 -4.5% PASS
total/synth/SIGNEXTEND/b0 3.60 3.44 -4.3% PASS
total/synth/SIGNEXTEND/b1 4.10 3.82 -6.8% PASS
total/synth/SLT/b0 4.38 4.28 -2.3% PASS
total/synth/SLT/b1 4.88 4.08 -16.3% PASS
total/synth/SUB/b0 3.22 3.22 -0.1% PASS
total/synth/SUB/b1 3.73 3.48 -6.6% PASS
total/synth/SWAP1/s0 3.43 3.43 +0.0% PASS
total/synth/SWAP10/s0 3.45 3.46 +0.1% PASS
total/synth/SWAP11/s0 3.45 3.45 +0.1% PASS
total/synth/SWAP12/s0 3.46 3.46 +0.2% PASS
total/synth/SWAP13/s0 3.46 3.46 +0.0% PASS
total/synth/SWAP14/s0 3.46 3.47 +0.1% PASS
total/synth/SWAP15/s0 3.30 3.74 +13.4% PASS
total/synth/SWAP16/s0 3.39 3.49 +3.0% PASS
total/synth/SWAP2/s0 3.44 3.44 +0.1% PASS
total/synth/SWAP3/s0 3.44 3.44 +0.0% PASS
total/synth/SWAP4/s0 3.44 3.45 +0.2% PASS
total/synth/SWAP5/s0 3.44 3.45 +0.1% PASS
total/synth/SWAP6/s0 3.44 3.45 +0.3% PASS
total/synth/SWAP7/s0 3.45 3.45 -0.0% PASS
total/synth/SWAP8/s0 3.45 3.45 +0.2% PASS
total/synth/SWAP9/s0 3.45 3.46 +0.3% PASS
total/synth/XOR/b0 3.14 3.10 -1.4% PASS
total/synth/XOR/b1 3.72 3.49 -6.0% PASS
total/synth/loop_v1 7.05 6.69 -5.1% PASS
total/synth/loop_v2 7.07 6.74 -4.5% PASS

Summary: 194 benchmarks, 0 regressions


✅ Performance Check Passed (multipass)

Performance Benchmark Results (threshold: 25%)

Benchmark Baseline (us) Current (us) Change Status
total/main/blake2b_huff/8415nulls 0.83 0.83 -0.1% PASS
total/main/blake2b_huff/empty 0.02 0.02 -0.7% PASS
total/main/blake2b_shifts/8415nulls 4.43 4.40 -0.7% PASS
total/main/sha1_divs/5311 0.58 0.58 -0.3% PASS
total/main/sha1_divs/empty 0.01 0.01 -0.3% PASS
total/main/sha1_shifts/5311 0.54 0.54 +0.4% PASS
total/main/sha1_shifts/empty 0.01 0.01 +0.5% PASS
total/main/snailtracer/benchmark 31.20 30.96 -0.8% PASS
total/main/structarray_alloc/nfts_rank 0.27 0.27 -1.0% PASS
total/main/swap_math/insufficient_liquidity 0.00 0.00 -0.0% PASS
total/main/swap_math/received 0.00 0.00 -0.5% PASS
total/main/swap_math/spent 0.00 0.00 +0.6% PASS
total/main/weierstrudel/1 0.24 0.24 +0.1% PASS
total/main/weierstrudel/15 2.58 2.57 -0.4% PASS
total/micro/JUMPDEST_n0/empty 0.00 0.00 +0.3% PASS
total/micro/jump_around/empty 0.06 0.06 +3.0% PASS
total/micro/loop_with_many_jumpdests/empty 0.00 0.00 -0.2% PASS
total/micro/memory_grow_mload/by1 0.01 0.01 +0.6% PASS
total/micro/memory_grow_mload/by16 0.01 0.01 +0.9% PASS
total/micro/memory_grow_mload/by32 0.01 0.01 -1.3% PASS
total/micro/memory_grow_mload/nogrow 0.01 0.01 -0.5% PASS
total/micro/memory_grow_mstore/by1 0.01 0.01 +1.0% PASS
total/micro/memory_grow_mstore/by16 0.01 0.01 -0.2% PASS
total/micro/memory_grow_mstore/by32 0.01 0.01 -1.3% PASS
total/micro/memory_grow_mstore/nogrow 0.01 0.01 -0.6% PASS
total/micro/signextend/one 0.08 0.08 +1.3% PASS
total/micro/signextend/zero 0.08 0.08 +1.2% PASS
total/synth/ADD/b0 0.00 0.00 +0.1% PASS
total/synth/ADD/b1 0.00 0.00 -0.5% PASS
total/synth/ADDRESS/a0 0.15 0.15 -0.1% PASS
total/synth/ADDRESS/a1 0.15 0.15 +0.0% PASS
total/synth/AND/b0 0.00 0.00 +0.1% PASS
total/synth/AND/b1 0.00 0.00 +0.3% PASS
total/synth/BYTE/b0 0.00 0.00 +0.1% PASS
total/synth/BYTE/b1 0.00 0.00 +0.3% PASS
total/synth/CALLDATASIZE/a0 0.07 0.07 -0.0% PASS
total/synth/CALLDATASIZE/a1 0.07 0.07 -0.0% PASS
total/synth/CALLER/a0 0.18 0.18 -0.3% PASS
total/synth/CALLER/a1 0.18 0.18 -0.2% PASS
total/synth/CALLVALUE/a0 0.19 0.19 +1.3% PASS
total/synth/CALLVALUE/a1 0.19 0.19 +1.2% PASS
total/synth/CODESIZE/a0 0.07 0.07 -0.0% PASS
total/synth/CODESIZE/a1 0.07 0.07 +0.0% PASS
total/synth/DUP1/d0 0.00 0.00 -0.1% PASS
total/synth/DUP1/d1 0.00 0.00 -0.0% PASS
total/synth/DUP10/d0 0.00 0.00 +0.4% PASS
total/synth/DUP10/d1 0.00 0.00 +0.0% PASS
total/synth/DUP11/d0 0.00 0.00 +0.0% PASS
total/synth/DUP11/d1 0.00 0.00 +0.2% PASS
total/synth/DUP12/d0 0.00 0.00 +0.1% PASS
total/synth/DUP12/d1 0.00 0.00 +0.2% PASS
total/synth/DUP13/d0 0.00 0.00 +0.2% PASS
total/synth/DUP13/d1 0.00 0.00 -0.0% PASS
total/synth/DUP14/d0 0.00 0.00 -0.0% PASS
total/synth/DUP14/d1 0.00 0.00 +0.4% PASS
total/synth/DUP15/d0 0.00 0.00 +0.0% PASS
total/synth/DUP15/d1 0.00 0.00 +0.0% PASS
total/synth/DUP16/d0 0.00 0.00 -0.1% PASS
total/synth/DUP16/d1 0.00 0.00 +0.2% PASS
total/synth/DUP2/d0 0.00 0.00 +0.3% PASS
total/synth/DUP2/d1 0.00 0.00 +0.3% PASS
total/synth/DUP3/d0 0.00 0.00 +0.2% PASS
total/synth/DUP3/d1 0.00 0.00 +0.4% PASS
total/synth/DUP4/d0 0.00 0.00 +0.5% PASS
total/synth/DUP4/d1 0.00 0.00 +0.1% PASS
total/synth/DUP5/d0 0.00 0.00 +0.3% PASS
total/synth/DUP5/d1 0.00 0.00 -0.3% PASS
total/synth/DUP6/d0 0.00 0.00 +0.4% PASS
total/synth/DUP6/d1 0.00 0.00 +0.4% PASS
total/synth/DUP7/d0 0.00 0.00 -0.0% PASS
total/synth/DUP7/d1 0.00 0.00 +0.0% PASS
total/synth/DUP8/d0 0.00 0.00 -0.1% PASS
total/synth/DUP8/d1 0.00 0.00 -0.1% PASS
total/synth/DUP9/d0 0.00 0.00 -0.2% PASS
total/synth/DUP9/d1 0.00 0.00 -0.1% PASS
total/synth/EQ/b0 0.00 0.00 -0.1% PASS
total/synth/EQ/b1 0.00 0.00 +0.2% PASS
total/synth/GAS/a0 0.76 0.76 +0.0% PASS
total/synth/GAS/a1 0.76 0.76 +0.0% PASS
total/synth/GT/b0 0.00 0.00 -0.3% PASS
total/synth/GT/b1 0.00 0.00 +0.1% PASS
total/synth/ISZERO/u0 0.00 0.00 +0.1% PASS
total/synth/JUMPDEST/n0 0.00 0.00 +0.4% PASS
total/synth/LT/b0 0.00 0.00 +0.2% PASS
total/synth/LT/b1 0.00 0.00 +0.2% PASS
total/synth/MSIZE/a0 0.00 0.00 +0.1% PASS
total/synth/MSIZE/a1 0.00 0.00 +0.0% PASS
total/synth/MUL/b0 0.00 0.00 +0.1% PASS
total/synth/MUL/b1 0.00 0.00 +0.2% PASS
total/synth/NOT/u0 0.00 0.00 +0.3% PASS
total/synth/OR/b0 0.00 0.00 +0.3% PASS
total/synth/OR/b1 0.00 0.00 +0.1% PASS
total/synth/PC/a0 0.00 0.00 +0.1% PASS
total/synth/PC/a1 0.00 0.00 -0.2% PASS
total/synth/PUSH1/p0 0.00 0.00 +0.2% PASS
total/synth/PUSH1/p1 0.00 0.00 +0.1% PASS
total/synth/PUSH10/p0 0.00 0.00 +2.1% PASS
total/synth/PUSH10/p1 0.00 0.00 +2.5% PASS
total/synth/PUSH11/p0 0.00 0.00 -0.3% PASS
total/synth/PUSH11/p1 0.00 0.00 +1.7% PASS
total/synth/PUSH12/p0 0.00 0.00 +3.3% PASS
total/synth/PUSH12/p1 0.00 0.00 +0.5% PASS
total/synth/PUSH13/p0 0.00 0.00 +0.4% PASS
total/synth/PUSH13/p1 0.00 0.00 +0.4% PASS
total/synth/PUSH14/p0 0.00 0.00 +1.4% PASS
total/synth/PUSH14/p1 0.00 0.00 +0.3% PASS
total/synth/PUSH15/p0 0.00 0.00 -0.1% PASS
total/synth/PUSH15/p1 0.00 0.00 +0.4% PASS
total/synth/PUSH16/p0 0.00 0.00 -0.4% PASS
total/synth/PUSH16/p1 0.00 0.00 +1.5% PASS
total/synth/PUSH17/p0 0.00 0.00 +0.4% PASS
total/synth/PUSH17/p1 0.00 0.00 -0.3% PASS
total/synth/PUSH18/p0 0.00 0.00 +0.6% PASS
total/synth/PUSH18/p1 0.00 0.00 +1.0% PASS
total/synth/PUSH19/p0 0.00 0.00 +0.4% PASS
total/synth/PUSH19/p1 0.00 0.00 -0.1% PASS
total/synth/PUSH2/p0 0.00 0.00 -0.6% PASS
total/synth/PUSH2/p1 0.00 0.00 +0.2% PASS
total/synth/PUSH20/p0 0.00 0.00 +0.1% PASS
total/synth/PUSH20/p1 0.00 0.00 +0.1% PASS
total/synth/PUSH21/p0 0.00 0.00 -0.2% PASS
total/synth/PUSH21/p1 0.00 0.00 +0.4% PASS
total/synth/PUSH22/p0 1.40 1.40 -0.0% PASS
total/synth/PUSH22/p1 1.59 1.59 -0.3% PASS
total/synth/PUSH23/p0 1.39 1.39 -0.0% PASS
total/synth/PUSH23/p1 1.59 1.59 +0.0% PASS
total/synth/PUSH24/p0 1.40 1.40 +0.1% PASS
total/synth/PUSH24/p1 1.58 1.59 +0.1% PASS
total/synth/PUSH25/p0 1.40 1.40 +0.0% PASS
total/synth/PUSH25/p1 1.59 1.58 -0.1% PASS
total/synth/PUSH26/p0 1.31 1.32 +0.7% PASS
total/synth/PUSH26/p1 1.59 1.60 +1.0% PASS
total/synth/PUSH27/p0 1.40 1.40 +0.1% PASS
total/synth/PUSH27/p1 1.61 1.61 +0.1% PASS
total/synth/PUSH28/p0 1.40 1.40 -0.2% PASS
total/synth/PUSH28/p1 1.61 1.61 +0.3% PASS
total/synth/PUSH29/p0 1.40 1.39 -0.0% PASS
total/synth/PUSH29/p1 1.59 1.60 +0.6% PASS
total/synth/PUSH3/p0 0.00 0.00 +0.5% PASS
total/synth/PUSH3/p1 0.00 0.00 +0.1% PASS
total/synth/PUSH30/p0 1.50 1.48 -0.8% PASS
total/synth/PUSH30/p1 1.62 1.62 +0.3% PASS
total/synth/PUSH31/p0 1.40 1.40 -0.3% PASS
total/synth/PUSH31/p1 1.77 1.69 -5.0% PASS
total/synth/PUSH32/p0 1.40 1.40 +0.1% PASS
total/synth/PUSH32/p1 1.61 1.61 +0.0% PASS
total/synth/PUSH4/p0 0.00 0.00 -0.1% PASS
total/synth/PUSH4/p1 0.00 0.00 +0.9% PASS
total/synth/PUSH5/p0 0.00 0.00 +0.2% PASS
total/synth/PUSH5/p1 0.00 0.00 +0.1% PASS
total/synth/PUSH6/p0 0.00 0.00 -1.1% PASS
total/synth/PUSH6/p1 0.00 0.00 -1.4% PASS
total/synth/PUSH7/p0 0.00 0.00 +1.4% PASS
total/synth/PUSH7/p1 0.00 0.00 +1.6% PASS
total/synth/PUSH8/p0 0.00 0.00 +0.3% PASS
total/synth/PUSH8/p1 0.00 0.00 +2.2% PASS
total/synth/PUSH9/p0 0.00 0.00 +1.8% PASS
total/synth/PUSH9/p1 0.00 0.00 -0.2% PASS
total/synth/RETURNDATASIZE/a0 0.03 0.03 -0.1% PASS
total/synth/RETURNDATASIZE/a1 0.03 0.03 -0.3% PASS
total/synth/SAR/b0 0.00 0.00 -0.1% PASS
total/synth/SAR/b1 0.00 0.00 +0.3% PASS
total/synth/SGT/b0 0.00 0.00 -0.2% PASS
total/synth/SGT/b1 0.00 0.00 +0.0% PASS
total/synth/SHL/b0 0.00 0.00 +0.2% PASS
total/synth/SHL/b1 0.00 0.00 +0.2% PASS
total/synth/SHR/b0 0.00 0.00 +0.3% PASS
total/synth/SHR/b1 0.00 0.00 -0.1% PASS
total/synth/SIGNEXTEND/b0 0.00 0.00 -0.1% PASS
total/synth/SIGNEXTEND/b1 0.00 0.00 +0.0% PASS
total/synth/SLT/b0 0.00 0.00 +0.3% PASS
total/synth/SLT/b1 0.00 0.00 -0.4% PASS
total/synth/SUB/b0 0.00 0.00 +0.4% PASS
total/synth/SUB/b1 0.00 0.00 -0.1% PASS
total/synth/SWAP1/s0 0.00 0.00 +0.0% PASS
total/synth/SWAP10/s0 0.00 0.00 +0.3% PASS
total/synth/SWAP11/s0 0.00 0.00 +0.2% PASS
total/synth/SWAP12/s0 0.00 0.00 +0.4% PASS
total/synth/SWAP13/s0 0.00 0.00 +0.2% PASS
total/synth/SWAP14/s0 0.00 0.00 +0.1% PASS
total/synth/SWAP15/s0 0.00 0.00 -0.3% PASS
total/synth/SWAP16/s0 0.00 0.00 +0.4% PASS
total/synth/SWAP2/s0 0.00 0.00 +0.0% PASS
total/synth/SWAP3/s0 0.00 0.00 +0.3% PASS
total/synth/SWAP4/s0 0.00 0.00 +0.0% PASS
total/synth/SWAP5/s0 0.00 0.00 -0.2% PASS
total/synth/SWAP6/s0 0.00 0.00 -0.0% PASS
total/synth/SWAP7/s0 0.00 0.00 -0.1% PASS
total/synth/SWAP8/s0 0.00 0.00 +0.0% PASS
total/synth/SWAP9/s0 0.00 0.00 -0.1% PASS
total/synth/XOR/b0 0.00 0.00 +0.4% PASS
total/synth/XOR/b1 0.00 0.00 +0.3% PASS
total/synth/loop_v1 1.50 1.51 +0.6% PASS
total/synth/loop_v2 1.39 1.39 +0.3% PASS

Summary: 194 benchmarks, 0 regressions


@abmcar abmcar force-pushed the feat/gas-check-placement branch 2 times, most recently from 6103dfd to de98b08 Compare April 5, 2026 14:22
@abmcar abmcar marked this pull request as draft April 6, 2026 03:42
@abmcar abmcar force-pushed the feat/gas-check-placement branch from de98b08 to 3ae7153 Compare April 6, 2026 06:24
@abmcar abmcar changed the title feat(evm): gas check placement optimization with mixed CFG support WIP:feat(evm): gas check placement optimization with mixed CFG support Apr 8, 2026
@abmcar abmcar force-pushed the feat/gas-check-placement branch 4 times, most recently from 16b877a to efc242f Compare April 11, 2026 03:59
@abmcar abmcar changed the title WIP:feat(evm): gas check placement optimization with mixed CFG support feat(evm): gas check placement with mixed CFG, SPP JIT output, and interpreter-mode gating Apr 11, 2026
@abmcar abmcar force-pushed the feat/gas-check-placement branch from 1c3fe1c to 773fcdc Compare April 13, 2026 10:09
abmcar added a commit to abmcar/DTVM that referenced this pull request Apr 25, 2026
…al design

Address codex review on PR DTVMStack#446:
- Rewrite the gas-check-placement change doc to describe the final design
  only: mixed-precision CFG (over-approximate dynamic jumps), separate
  GasChunkCostSPP array for the JIT, and interpreter-mode gating. Drop
  the call-site / ResolvedJumpTargets narrative — that exploration was
  reverted by c26bf7c and lives in git history, not the change doc.
- Update src/evm/evm_cache.md so GasChunkCost is documented as the
  unshifted interpreter cost and the new GasChunkCostSPP field is
  documented as the SPP-shifted JIT cost. Match the field semantics in
  src/evm/evm_cache.cpp:1161-1165 and src/evm/evm_cache.h:22-27.
@abmcar abmcar force-pushed the feat/gas-check-placement branch from a15505f to a6e34cf Compare April 25, 2026 05:58
@abmcar abmcar marked this pull request as ready for review April 25, 2026 16:07
Comment thread src/compiler/evm_frontend/evm_mir_compiler.cpp
abmcar added a commit to abmcar/DTVM that referenced this pull request Apr 28, 2026
Add docs/design/evm-gas-mechanism.md with mermaid diagrams covering:
- shared EVMBytecodeCache layout and the GasChunkCost vs
  GasChunkCostSPP split,
- interpreter chunk fast path (pre-charge at chunk start) with
  per-opcode fallback,
- JIT meterOpcode/meterGas dMIR emission and shared OOG block,
- SPP cost shifting (Lemma 6.14), per-path total preservation,
  mixed-precision CFG with over-approximated dynamic jumps,
- pipeline gating via EVMModule::CacheNeedsSPP.

Addresses zoowii's review request on PR DTVMStack#446 to document the
latest interpreter and JIT gas mechanism.
abmcar added a commit to abmcar/DTVM that referenced this pull request May 7, 2026
Records the 6 fix items (F1-F6) identified by the 2026-05-07 self-review
of PR DTVMStack#446 with concrete file:line citations, sequencing, and quality
gates.

The plan was iterated through 3 review rounds (Opus subagent + concrete
GraphQL verification of GitHub thread state) before this final form.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
abmcar and others added 8 commits May 11, 2026 20:16
Remove the all-or-nothing HasDynamicJump bailout in buildGasChunksSPP so that
contracts with a mix of resolvable and unresolvable jumps still get CFG-based
SPP shifting on the resolved portion. Factor the edge construction into a
reusable buildCFGEdges() that can be driven either with over-approximation or
with call-site-resolved targets.

Add resolveCallSiteTargets() which detects the Solidity internal-function
return pattern (SWAPn -> JUMP) and walks predecessors to find the enclosing
function entry JUMPDEST, then collects valid return addresses from all
matching call sites (PUSH ret -> PUSH func -> JUMP). The reverse-reachability
walk is bounded by MAX_REVERSE_REACHABILITY_DEPTH to cap compile-time cost.

Introduce decodePushAsJumpDest() as a shared PUSH-as-JUMPDEST decode helper
and add Prev2Pc / Prev2Opcode tracking on GasBlock so the 3-instruction
call-site window can be inspected without rescanning bytecode.

Tighten the SPP shifting guard so that shifting bails out when a successor's
last opcode satisfies isGasChunkTerminator, preventing gas cost from being
moved across chunk boundaries.

GasChunkCost continues to write Blocks[Id].Cost (the original unshifted
per-block cost) exactly as PR DTVMStack#371 established: the interpreter gas chunk
fast path depends on unshifted costs, and exporting SPP-shifted metering to
the JIT is left as a follow-up on a separate JIT-only output path.

Test plan:
- format.sh check: clean
- evmone-unittests multipass: 223/223 pass
- evmone-unittests interpreter: 215/215 pass
- evmone-statetest --fork Cancun: 2723/2723 pass

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Measured via evmone-bench against upstream/main@a14a9de on the
external/total/(main|micro) benchmark set: geomean -10.13% across
27 benchmarks, with memory_grow_mload/mstore -19% to -24%, signextend
-19% to -20%, and snailtracer -7.53%.
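
For readers unfamiliar with the metric, a geomean delta such as the one quoted above is conventionally the geometric mean of the per-benchmark time ratios, minus one. The sketch below assumes that definition; `geomeanDeltaPercent` and its parameters are hypothetical names and are not taken from evmone-bench or the CI scripts.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: geometric-mean change of NewTimes vs BaseTimes, in
// percent. A negative result means the new build is faster on average.
double geomeanDeltaPercent(const std::vector<double> &BaseTimes,
                           const std::vector<double> &NewTimes) {
  assert(!BaseTimes.empty() && BaseTimes.size() == NewTimes.size());
  double LogSum = 0.0;
  for (std::size_t I = 0; I < BaseTimes.size(); ++I) {
    assert(BaseTimes[I] > 0.0 && NewTimes[I] > 0.0);
    // Summing logs of ratios avoids overflow and weights benches equally.
    LogSum += std::log(NewTimes[I] / BaseTimes[I]);
  }
  const double MeanLog = LogSum / static_cast<double>(BaseTimes.size());
  return (std::exp(MeanLog) - 1.0) * 100.0;
}
```

Because the mean is taken over ratios, one bench that doubles and one that halves cancel out exactly, which is why a single outlier can move the figure noticeably on a 27-bench set.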

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add EVMBytecodeCache::GasChunkCostSPP as a second parallel cost array that
holds the SPP metering-shifted per-chunk cost computed by buildGasChunksSPP.
The interpreter continues reading the unshifted GasChunkCost (preserving
PR DTVMStack#371's interpreter-safety invariant) while the multipass JIT prefers the
shifted values when emitting gas checks.

Plumb the pointer through EVMFrontendContext::setGasChunkInfo, snapshot it in
EVMMirBuilder, and swap the three chunk-cost read sites:

- meterOpcode — primary per-chunk-start charge
- meterOpcodeRange (slow path) — JUMPDEST-skip cumulative sum
- JUMPDEST-run suffix-sum precompute inside the jump table builder

Swapping all three sites is safe because SPP's lemma614 shift can only
transfer gas from a single-predecessor successor into its parent. Every
JUMPDEST is a jump target (multi-predecessor), so SPP can never shift gas
*into* a JUMPDEST-run member from outside the run; the only intra-run shift
it can perform is from the trailing body chunk up into the last JUMPDEST,
which is still charged along every entry path into the run. Total gas along
any realizable execution path is preserved.
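
The path-preservation argument above can be illustrated with a deliberately simplified hoisting rule. This is a sketch under stated assumptions, not the real `lemma614Update`: the `Block` record, `hoistIntoParent`, and `pathCost` are hypothetical, and the rule shown (hoist the cost common to all successors, only when each successor has exactly one predecessor) is a stand-in for the actual lemma.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical, simplified block record (not the real GasBlock layout).
struct Block {
  uint64_t Cost = 0;
  std::vector<std::size_t> Succs; // distinct successor block ids
  uint32_t PredCount = 0;         // number of incoming edges
};

// Hoists the cost common to all successors into Parent, but only when every
// successor is entered solely from Parent. Returns true if any gas moved.
bool hoistIntoParent(std::vector<Block> &Blocks, std::size_t Parent) {
  assert(Parent < Blocks.size());
  Block &P = Blocks[Parent];
  if (P.Succs.empty())
    return false;
  uint64_t Delta = std::numeric_limits<uint64_t>::max();
  for (std::size_t S : P.Succs) {
    // A multi-predecessor (or self-loop) successor is also charged on paths
    // that do not pass through Parent, so its cost must stay in place.
    if (S == Parent || Blocks[S].PredCount != 1)
      return false;
    Delta = std::min(Delta, Blocks[S].Cost);
  }
  if (Delta == 0)
    return false;
  P.Cost += Delta;
  for (std::size_t S : P.Succs)
    Blocks[S].Cost -= Delta;
  return true;
}

// Total gas charged along one concrete path of block ids.
uint64_t pathCost(const std::vector<Block> &Blocks,
                  const std::vector<std::size_t> &Path) {
  uint64_t Sum = 0;
  for (std::size_t Id : Path)
    Sum += Blocks[Id].Cost;
  return Sum;
}
```

With a parent of cost 5 and two single-predecessor successors of cost 10 and 7, the hoist moves 7 into the parent; both paths still total 15 and 12, but the larger check now happens one block earlier.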

Test plan:
- format.sh check: clean
- evmone-unittests multipass: 223/223 pass
- evmone-unittests interpreter: 215/215 pass
- evmone-statetest --fork Cancun: 2723/2723 pass

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Phase 3 wires SPP-shifted gas costs into the multipass JIT but leaves the
expensive CFG / call-site / metering pipeline running for every module —
including interpreter-only ones that never read the shifted values. On CI
that manifests as a ~7% interpreter-mode regression on snailtracer and up
to +14% on smaller benchmarks where compile time dominates the total run.

Gate the pipeline:

- Add `bool EnableSPP` to `buildBytecodeCache`. When false, the function
  walks the basic gas-block scan, writes unshifted `Blocks[Id].Cost` into
  `GasChunkEnd` / `GasChunkCost`, and leaves `GasChunkCostSPP` empty.

- Track `EVMModule::CacheNeedsSPP`. It is set to `true` immediately before
  `performEVMJITCompile` runs (the only current SPP consumer). Pure
  interpreter-mode modules and JIT-fallback modules leave it `false`, so
  the lazy `initBytecodeCache` picks the cheap path.

- `evm_compiler.cpp` passes `nullptr` when the cache's `GasChunkCostSPP`
  vector is empty, so any JIT compile without SPP (defensive path) falls
  back to the unshifted table via the existing `GasChunkCostSPP ? ... :
  GasChunkCost` pattern in `meterOpcode` / `meterOpcodeRange` / the
  JUMPDEST-run suffix-sum builder.
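
A minimal sketch of the fallback described in the last bullet, assuming 64-bit cost entries indexed by PC. `CacheView`, `sppCostsOrNull`, and `chunkCostAt` are hypothetical names; only the two array names and the `SPP ? ... : unshifted` shape come from the text above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the two parallel cost arrays in the cache.
// The element type is an assumption made for this sketch.
struct CacheView {
  std::vector<uint64_t> GasChunkCost;    // unshifted, always populated
  std::vector<uint64_t> GasChunkCostSPP; // SPP-shifted, empty when gated off
};

// An empty SPP vector is handed to the consumer as nullptr.
const uint64_t *sppCostsOrNull(const CacheView &Cache) {
  return Cache.GasChunkCostSPP.empty() ? nullptr
                                       : Cache.GasChunkCostSPP.data();
}

// Every read site prefers the shifted table and falls back to the
// unshifted one when no SPP data was built.
uint64_t chunkCostAt(const CacheView &Cache, const uint64_t *SPPCosts,
                     std::size_t Pc) {
  assert(Pc < Cache.GasChunkCost.size());
  return SPPCosts ? SPPCosts[Pc] : Cache.GasChunkCost[Pc];
}
```

Keeping the null check at each read site means a compile that runs without SPP data still meters correctly, just with the unshifted per-chunk costs.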

Test plan:
- format.sh check: clean
- evmone-unittests multipass: 223/223 pass (9.4s vs 13.7s previously)
- evmone-unittests interpreter: 215/215 pass (0.4s vs 3.7s previously)
- evmone-statetest --fork Cancun: 2723/2723 pass (67s vs 103s previously)
- Local bench vs upstream/main (CI flags): geomean -14.29% (n=27)

The test-suite runtime drop is the dominant signal that gating works —
interpreter mode no longer runs the SPP pipeline, so every module load in
the test suite gets back the ~90% of cache-build time the pipeline used
to consume.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The previous two-pass CFG build used call-site-resolved targets to
replace over-approximate edges for dynamic jumps. This created an
under-approximate CFG when resolution was incomplete, allowing
lemma614Update to shift gas along non-existent edges and produce
unsafe metering on missed paths.

Fix: always over-approximate dynamic jumps in buildCFGEdges (edges to
all JUMPDESTs). Remove the second-pass CFG rebuild. Export resolved
targets through ResolvedJumpTargets for downstream consumers (MIR
direct-branch optimization with runtime guard) instead of using them
for CFG refinement.
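
The failure mode can be reproduced on three blocks. This is an illustrative sketch only: `shiftIfSinglePred` is a hypothetical, simplified shift (it moves the child's whole cost), not the real `lemma614Update`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Moves Child's whole cost into Parent, but only when the CFG claims that
// Child has exactly one predecessor. Returns true if the shift happened.
bool shiftIfSinglePred(std::vector<uint64_t> &Cost, std::size_t Parent,
                       std::size_t Child, uint32_t ChildPredCount) {
  assert(Parent < Cost.size() && Child < Cost.size() && Parent != Child);
  if (ChildPredCount != 1)
    return false; // multi-predecessor: the cost must stay on the child
  Cost[Parent] += Cost[Child];
  Cost[Child] = 0;
  return true;
}
```

Take block 0 as a static caller (cost 1), block 1 as a dynamic-jump block (cost 2), and block 2 as a JUMPDEST (cost 10) that both can reach at runtime. If resolution misses the edge from block 1, the JUMPDEST appears to have one predecessor, its cost is shifted into block 0, and the real path 1 -> 2 is charged 2 instead of 12. Counting the dynamic jump as a predecessor (the over-approximation) blocks the shift and keeps that path at 12.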

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reflect Phase 5 (soundness fix): always over-approximate CFG for
dynamic jumps, export ResolvedJumpTargets for downstream MIR use,
document reverse BFS cross-function risk as benign.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
resolveCallSiteTargets() and its ResolvedJumpTargets export had no
downstream consumer — the resolved targets were computed but never
read. Remove the function, its helper isSwapOpcode(), the cache
field, and the output parameter to eliminate dead work (O(N×200) BFS
per JIT-compiled contract).

The call-site enumeration algorithm can be restored from git history
when a consumer (e.g. MIR direct-branch optimization) is implemented.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The previous run showed +70% SHL/SHR/SAR synth regressions caused by a
noisy neighbor on the shared GitHub Actions runner: with the same baseline
and the same code, two runs produced 11.8ms vs 20.2ms for SHL/b0.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
abmcar and others added 8 commits May 11, 2026 20:17
Two reviewers flagged factual issues in docs/design/evm-gas-mechanism.md:

- JIT meterOpcode flowchart was wrong: when the chunk cache is
  populated and PC is mid-chunk, the function returns without
  emitting any MIR (evm_mir_compiler.cpp:537), it does NOT fall
  through to the per-opcode metric. The per-opcode fallback only
  fires when the cache pointers are absent. Diagram now shows
  both branches separately.
- Chunk-terminator wording was inverted: SSTORE/CALL*/CREATE*/GAS
  end their own chunk (the terminator's static cost is included,
  evm_cache.cpp:329), they are not "before the boundary". Updated
  the chunk definition and the interpreter key-properties bullet.
- Memory expansion is not a chunk boundary; it is charged inside
  handlers via expandMemoryAndChargeGas. Removed the misleading
  "dynamic memory growth" entry.
- Gas-register sync sites are not limited to CALL/CREATE/return;
  syncGasToMemory is also called from balance/code/keccak/memory
  handlers. Listed concrete line numbers.
- Interpreter mermaid: chunk-start condition failure does not
  raise OOG directly; it falls into the slow per-opcode path.
- Failure-mode table updated to match.

Also drift-fixed line numbers for meterOpcode (524), meterOpcodeRange
(544), JumpDestRun precompute (1297-1335), and EVMModule
initBytecodeCache (133-136). Counted parallel arrays correctly (5).
Added precondition note to the SPP example. Replaced \\n with <br/>
and quoted diamond labels so GitHub renders the diagrams.
…approx invariant

GasBlock::Prev2Pc and Prev2Opcode were added to support a future 3-instruction
call-site window lookup, but the call-site enumeration that would have consumed
them was removed in commit c26bf7c (Phase 5 CFG soundness fix). Whole-repo grep
confirms zero readers; only the writers in buildGasBlocks remain. Removing both
fields shrinks GasBlock by ~9 bytes (one uint32_t + one uint8_t + alignment) and
removes dead bookkeeping from every cache build.

Also extends the buildCFGEdges function comment to make the soundness pairing
with lemma614Update explicit: the over-approximated dynamic-jump edges to all
JUMPDESTs work because lemma614Update's effectivePredCount > 1 guard refuses to
shift gas across multi-predecessor edges, so the over-approximation is absorbed
without breaking per-path totals.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The CacheNeedsSPP flag controls whether the bytecode cache builds with the
SPP metering pipeline. It must be set before any getBytecodeCache() call:
once the cache is lazily built, the EnableSPP decision is fixed for the
module's lifetime. Future lazy / on-demand JIT paths must flip this flag
before triggering the cache build, otherwise the JIT will silently fall
back to the unshifted GasChunkCost array.
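
The ordering hazard can be shown with a miniature lazy cache. This is a sketch, assuming the flag is sampled once at first use as described above; `Module`, `Cache`, and the literal cost values are hypothetical and do not mirror the real `EVMModule`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical miniature of the cache; only the field names follow the text.
struct Cache {
  std::vector<uint64_t> GasChunkCost;
  std::vector<uint64_t> GasChunkCostSPP; // left empty when SPP is disabled
};

class Module {
public:
  void setCacheNeedsSPP(bool Value) { CacheNeedsSPP = Value; }

  const Cache &getBytecodeCache() {
    if (!Built) {
      // The flag is read exactly once, when the cache is first built.
      TheCache.GasChunkCost = {3, 5};
      if (CacheNeedsSPP)
        TheCache.GasChunkCostSPP = {8, 0};
      Built = true;
    }
    // SPP costs can only exist if the flag was set at some point.
    assert(CacheNeedsSPP || TheCache.GasChunkCostSPP.empty());
    return TheCache;
  }

private:
  bool CacheNeedsSPP = false;
  bool Built = false;
  Cache TheCache;
};
```

Calling `setCacheNeedsSPP(true)` after the first `getBytecodeCache()` has no effect on the already-built cache, so a consumer would see an empty `GasChunkCostSPP` and quietly use the unshifted costs.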

Documentation only — no behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… and F6 dropped

The change-doc Metrics section cited a 27-bench local evmone-bench run
(3 reps) whose numbers had drifted significantly from the CI Performance
Regression Check baseline. The "≤ +6% small regressions" framing
mis-categorized the actual jump-heavy regressions on weierstrudel /
jump_around / snailtracer, which sit in the +10–23% range on the CI
multipass table. Rewrite the section using the CI bot's authoritative
numbers and drop the unverified geomean claim.

Also update the review-fix plan to record:
- F1, F4, F5 implemented in commits 81efba3 and 691069a;
- F2, F3 applied to the PR body;
- F6 (open an upstream issue for addEdge O(deg²)) dropped — the concern
  was theoretical, no commit on this branch touches addEdge, no measured
  evidence of compile-time pain. Concern kept inline as a future-tuning
  reference.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR DTVMStack#446 over-approximates dynamic jumps by inserting an explicit edge
from every dynamic-jump block to every JUMPDEST, then splitting each as
a critical edge. On contracts with D dynamic jumps and J JUMPDESTs that
costs O(D*J^2), so compile time grows quadratically with J and dominates
JIT-prep on jump-heavy bytecode.

This change carries a per-JUMPDEST scalar ImplicitDynamicPredCount that
counts how many dynamic-jump blocks could reach it at runtime, and folds
it into effectivePredCount so the lemma 6.14 update behaves identically
without materialising the edges. To keep dyn-only JUMPDESTs visible to
dominator and loop analyses (Solidity function returns that are
unreachable in the static-only CFG), we seed reachability from every
JUMPDEST after the static reachable set is built.
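
A rough sketch of the scalar-count idea, assuming that under over-approximation every dynamic-jump block may reach every JUMPDEST, so the per-JUMPDEST count equals the number of dynamic-jump blocks. `BlockInfo`, `assignImplicitDynPreds`, and `explicitEdgeCount` are hypothetical; only `ImplicitDynamicPredCount` and `effectivePredCount` are names from the text, and their real definitions may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-block summary used only for this illustration.
struct BlockInfo {
  bool IsJumpDest = false;
  bool EndsInDynamicJump = false;
  uint32_t StaticPredCount = 0;
  uint32_t ImplicitDynamicPredCount = 0;
};

// Two linear passes: count the dynamic-jump blocks, then stamp that count
// on every JUMPDEST. No edge objects are created.
void assignImplicitDynPreds(std::vector<BlockInfo> &Blocks) {
  uint32_t DynJumps = 0;
  for (const BlockInfo &B : Blocks)
    if (B.EndsInDynamicJump)
      ++DynJumps;
  for (BlockInfo &B : Blocks)
    B.ImplicitDynamicPredCount = B.IsJumpDest ? DynJumps : 0;
}

// Predecessor count as seen by the shifting guard.
uint32_t effectivePredCount(const BlockInfo &B) {
  return B.StaticPredCount + B.ImplicitDynamicPredCount;
}

// For comparison: edges the explicit scheme would materialise, one per
// (dynamic jump, JUMPDEST) pair.
std::size_t explicitEdgeCount(const std::vector<BlockInfo> &Blocks) {
  std::size_t D = 0, J = 0;
  for (const BlockInfo &B : Blocks) {
    D += B.EndsInDynamicJump ? 1 : 0;
    J += B.IsJumpDest ? 1 : 0;
  }
  return D * J;
}
```

The guard sees the same predecessor counts either way; the difference is that the scalar version never allocates or splits the D * J edges.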

Compile time on loop_full_of_jumpdests drops from 7.3s to 3.3s. Geomean
runtime is -6.15% across the 27 paper benches with zero benches above
the +/-25% CI gate.

See docs/changes/2026-05-11-spp-cfg-implicit-dyn-pred/README.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@abmcar abmcar force-pushed the feat/gas-check-placement branch from b943a7a to 972615a Compare May 11, 2026 12:19
After rebasing onto current upstream/main (which now includes DTVMStack#458 / DTVMStack#460
/ DTVMStack#482 / DTVMStack#483 perf work) and running a 10-rep evmone-bench on the 27 paper
benches, the cumulative PR delta has collapsed to noise (raw geomean
+1.15%, +0.46% after correcting a single-iteration outlier on
main/blake2b_shifts/8415nulls via a focused 20-rep re-measurement).
0 benches above the +/-25% CI gate.

The A-vs-PR-base -2.73% from this commit's own optimization is unchanged;
the framing shift is that the absolute runtime delta of the whole PR vs
unmodified main has been absorbed by the intervening upstream perf
optimizations.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@abmcar abmcar changed the title feat(evm): gas check placement with mixed CFG, SPP JIT output, and interpreter-mode gating feat(evm): drop SPP all-or-nothing fallback for dynamic-jump contracts May 12, 2026
abmcar and others added 6 commits May 12, 2026 15:02
Round-2 self-review surfaced that Phase 7's reachability stitch was
seeding every JUMPDEST as a BFS root regardless of whether the
contract contains any dynamic jumps. On contracts with no dynamic
jumps, a statically-dead JUMPDEST (no predecessor of any kind) would
become reachable post-Phase-7, expanding computeDominators and
computeLoops input and silently changing SPP decisions on that block
class. With this fix the stitch only seeds JUMPDESTs that carry a
nonzero ImplicitDynamicPredCount, preserving pre-Phase-7 behavior on
dead JUMPDESTs in dynamic-jump-free contracts.
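
The gated stitch can be sketched as a reachability walk with extra roots. This is an illustration, not the cache's actual code: `computeReachable` and its parameters are hypothetical, and it uses a plain worklist rather than a strict BFS (the resulting reachable set is the same).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Succs[b] lists the static successors of block b. ImplicitDynPred[b] is
// nonzero only for JUMPDESTs that some dynamic jump may target.
std::vector<bool>
computeReachable(const std::vector<std::vector<std::size_t>> &Succs,
                 const std::vector<uint32_t> &ImplicitDynPred,
                 std::size_t Entry) {
  assert(Succs.size() == ImplicitDynPred.size() && Entry < Succs.size());
  std::vector<bool> Reachable(Succs.size(), false);
  std::vector<std::size_t> Worklist;
  auto Push = [&](std::size_t B) {
    if (!Reachable[B]) {
      Reachable[B] = true;
      Worklist.push_back(B);
    }
  };
  Push(Entry);
  // Gate: only blocks with a nonzero implicit count become extra roots, so a
  // JUMPDEST with no predecessor of any kind stays unreachable.
  for (std::size_t B = 0; B < Succs.size(); ++B)
    if (ImplicitDynPred[B] != 0)
      Push(B);
  while (!Worklist.empty()) {
    const std::size_t B = Worklist.back();
    Worklist.pop_back();
    for (std::size_t S : Succs[B])
      Push(S);
  }
  return Reachable;
}
```

On a contract without dynamic jumps every count is zero, so the result is the plain static reachable set and dead JUMPDESTs never reach the dominator or loop analyses.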

Also:
- Replace the stale "dynamic jumps get edges to every JUMPDEST"
  comment block above buildCFGEdges (the implementation has actually
  skipped materialising those edges since Phase 7).
- Add evmCacheTests target with 4 targeted tests covering the gate,
  the dyn-target stitch path, the interpreter-only no-SPP path, and a
  multi-dyn-jump corner case.
- Soften the loop_full_of_jumpdests 7.3s -> 3.3s claim in the Phase 7
  change doc to note it is a local single-machine measurement, not
  CI-tracked.

Review plan: docs/changes/2026-05-11-spp-cfg-implicit-dyn-pred/review-fixes-r2.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The 10-rep "wins" of -6.30% (blake2b_huff/8415nulls) and -4.84%
(loop_with_many_jumpdests/empty) flip or collapse in a 20-rep focused
rerun (+1.55% and -0.55% respectively). The 10-rep "regression" of
+3.51% (weierstrudel/1) similarly collapses to +0.55%. Three other
10-rep "regression" benches (memory_grow_*) contain zero JUMP /
JUMPI / JUMPDEST opcodes, so PR DTVMStack#446's CFG changes cannot affect
them by construction.
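
A check like the "zero JUMP / JUMPI / JUMPDEST" claim has to skip PUSH immediates, since a data byte such as 0x5b is not an opcode. The sketch below is one way to do that; `containsJumpOpcodes` is a hypothetical helper, while the opcode values (JUMP 0x56, JUMPI 0x57, JUMPDEST 0x5b, PUSH1..PUSH32 0x60..0x7f) are the standard EVM encodings.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true if the bytecode contains a JUMP, JUMPI, or JUMPDEST opcode.
bool containsJumpOpcodes(const std::vector<uint8_t> &Code) {
  constexpr uint8_t OpJump = 0x56, OpJumpi = 0x57, OpJumpdest = 0x5b;
  constexpr uint8_t OpPush1 = 0x60, OpPush32 = 0x7f;
  std::size_t Pc = 0;
  while (Pc < Code.size()) {
    const uint8_t Op = Code[Pc];
    if (Op == OpJump || Op == OpJumpi || Op == OpJumpdest)
      return true;
    // PUSHn carries n immediate bytes that are data, not opcodes.
    const std::size_t Imm = (Op >= OpPush1 && Op <= OpPush32)
                                ? static_cast<std::size_t>(Op - OpPush1) + 1
                                : 0;
    Pc += 1 + Imm;
  }
  return false;
}
```

For example, the bytes `60 5b 00` (PUSH1 0x5b, STOP) contain no jump opcode even though 0x5b appears in the stream.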

Net picture: the 27-bench corpus is essentially flat within
evmone-bench's inter-binary drift band when measured at higher rep count.
Update the change doc to cite the 20-rep data, surface the
methodology caveat, and acknowledge the SPP-to-JIT cost-flow
mechanism as a theoretical effect below current measurement
precision.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add `evmCacheComplexityDemo` (build-only, not registered with ctest) and
a wrapper script,
`docs/changes/2026-05-11-spp-cfg-implicit-dyn-pred/scaling_demo.sh`,
that runs the demo at multiple JUMPDEST counts and prints a CSV table.
This lets reviewers reproduce the cache-build wall-clock claim on any
machine without rebuilding evmone or invoking the full unittest binary.

The demo measures the WHOLE `buildBytecodeCache` pipeline (not just
the Phase 7 CFG step), so it surfaces the residual super-linear cost
from `computeDominators` / `buildLoopsUsingDominance` that this PR does
not touch. Update the change doc to (a) reference the demo, (b) include
a sample table, and (c) explicitly scope the O(N) claim to the
over-approximation step itself — the 4-second saving on
`loop_full_of_jumpdests` (7.3 s -> 3.3 s) IS the Phase 7 contribution;
the remaining 3.3 s is dom/loop analysis untouched by this PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the single-snapshot table with a pre-Phase-7 vs Phase-7
comparison built from the same demo source on commit 99f23a3 (PR head
one commit before this change). At 20k JUMPDESTs the cache build drops
from 346 ms to 44 ms (~8x), confirming the asymptotic shape change.

Pre-Phase-7 build time grows ~4x per doubling of N (quadratic, matching
the expected O(D * J^2) shape of explicit-edge add + critical-edge
split). Phase 7 build time grows 2-4x per doubling (sub-quadratic, with
residual super-linearity from computeDominators /
buildLoopsUsingDominance that is independent of this PR).
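
The "growth per doubling" reading corresponds to an empirical exponent k in T ~ N^k, obtained from two measurements as k = log(T2/T1) / log(N2/N1). A minimal sketch follows; `scalingExponent` is a hypothetical helper and is not part of the demo or the wrapper script.

```cpp
#include <cassert>
#include <cmath>

// Empirical exponent k in T ~ N^k, estimated from two (N, T) measurements.
double scalingExponent(double N1, double T1, double N2, double T2) {
  assert(N1 > 0 && N2 > 0 && N1 != N2 && T1 > 0 && T2 > 0);
  return std::log(T2 / T1) / std::log(N2 / N1);
}
```

A 4x time increase per doubling of N gives k = 2 (quadratic), 2x gives k = 1 (linear), and anything between 2x and 4x gives an exponent between 1 and 2.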

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Compress the 30-line narrative around the dyn-target reachability stitch
and the over-approximation rationale to 10 lines that state only the
non-obvious WHY (soundness invariant + statically-dead JUMPDESTs left
unreachable on purpose). No behavior change.
* Build evmCacheComplexityDemo without -fsanitize=address even in ASan
  builds so the wall-clock measurement is not distorted by sanitizer
  overhead (the demo's whole purpose is timing buildBytecodeCache).
* Replace local OP_* magic-number constants in the tests and the demo
  with casts of evmc_opcode::* (single source of truth).
* Use zen::common::SteadyClock alias in the demo for consistency with
  the rest of the codebase.
* Strip historical / PR / commit-line-number narration from the test
  fixtures and the demo banner; concrete numbers and rationale already
  live in docs/changes/2026-05-11-spp-cfg-implicit-dyn-pred/README.md.

No behavior change: evmCacheTests 4/4 pass, demo scaling shape is
preserved at N in {100, 5000, 20000}.
@zoowii zoowii merged commit d44eb8e into DTVMStack:main May 15, 2026
16 checks passed
@abmcar abmcar deleted the feat/gas-check-placement branch May 20, 2026 07:48