feat(evm): drop SPP all-or-nothing fallback for dynamic-jump contracts#446
Pull request overview
This PR improves EVM SPP (strategic placement of gas checks) by always constructing a CFG with mixed-precision jump edges, adding a bytecode analysis pass to resolve common Solidity internal-return SWAPn → JUMP patterns via call-site enumeration, fixing an SPP output writeback bug, and plumbing resolved jump targets into the MIR compiler to enable more direct branching.
Changes:
- Build CFG even with unresolved dynamic jumps, using resolved edges where available and over-approximated edges otherwise.
- Add call-site enumeration to resolve `SWAPn → JUMP` return targets and export them through `EVMBytecodeCache`.
- Fix SPP results being computed but not written to `GasChunkCost`, and use resolved targets for a direct-branch fast path in MIR for single-target jumps.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `src/evm/evm_cache.h` | Add ResolvedJumpTargets to bytecode cache for cross-phase consumption. |
| `src/evm/evm_cache.cpp` | Implement call-site enumeration, CFG edge builder with mixed precision, and fix SPP cost writeback. |
| `src/compiler/evm_frontend/evm_mir_compiler.h` | Add frontend context plumbing for resolved targets and track CurrentInstrPC. |
| `src/compiler/evm_frontend/evm_mir_compiler.cpp` | Use resolved jump targets to emit direct branch for single-target dynamic JUMP. |
| `src/compiler/evm_compiler.cpp` | Pass resolved targets from module cache into frontend compilation context. |
| `src/action/evm_bytecode_visitor.h` | Set current instruction PC before invoking jump handlers. |
⚡ Performance Regression Check Results
✅ Performance Check Passed (interpreter)
Performance Benchmark Results (threshold: 25%)
Summary: 194 benchmarks, 0 regressions
✅ Performance Check Passed (multipass)
Performance Benchmark Results (threshold: 25%)
Summary: 194 benchmarks, 0 regressions
6103dfd to de98b08
de98b08 to 3ae7153
16b877a to efc242f
1c3fe1c to 773fcdc
…al design
Address codex review on PR DTVMStack#446:
- Rewrite the gas-check-placement change doc to describe the final design only: mixed-precision CFG (over-approximate dynamic jumps), separate GasChunkCostSPP array for the JIT, and interpreter-mode gating. Drop the call-site / ResolvedJumpTargets narrative; that exploration was reverted by c26bf7c and lives in git history, not the change doc.
- Update src/evm/evm_cache.md so GasChunkCost is documented as the unshifted interpreter cost and the new GasChunkCostSPP field is documented as the SPP-shifted JIT cost. Match the field semantics in src/evm/evm_cache.cpp:1161-1165 and src/evm/evm_cache.h:22-27.
a15505f to a6e34cf
Add docs/design/evm-gas-mechanism.md with mermaid diagrams covering:
- shared EVMBytecodeCache layout and the GasChunkCost vs GasChunkCostSPP split,
- interpreter chunk fast path (pre-charge at chunk start) with per-opcode fallback,
- JIT meterOpcode/meterGas dMIR emission and shared OOG block,
- SPP cost shifting (Lemma 6.14), per-path total preservation, mixed-precision CFG with over-approximated dynamic jumps,
- pipeline gating via EVMModule::CacheNeedsSPP.
Addresses zoowii's review request on PR DTVMStack#446 to document the latest interpreter and JIT gas mechanism.
Records the 6 fix items (F1-F6) identified by the 2026-05-07 self-review of PR DTVMStack#446 with concrete file:line citations, sequencing, and quality gates. The plan was iterated through 3 review rounds (Opus subagent + concrete GraphQL verification of GitHub thread state) before this final form. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Remove the all-or-nothing HasDynamicJump bailout in buildGasChunksSPP so that contracts with a mix of resolvable and unresolvable jumps still get CFG-based SPP shifting on the resolved portion. Factor the edge construction into a reusable buildCFGEdges() that can be driven either with over-approximation or with call-site-resolved targets.
Add resolveCallSiteTargets() which detects the Solidity internal-function return pattern (SWAPn -> JUMP) and walks predecessors to find the enclosing function entry JUMPDEST, then collects valid return addresses from all matching call sites (PUSH ret -> PUSH func -> JUMP). The reverse-reachability walk is bounded by MAX_REVERSE_REACHABILITY_DEPTH to cap compile-time cost.
Introduce decodePushAsJumpDest() as a shared PUSH-as-JUMPDEST decode helper and add Prev2Pc / Prev2Opcode tracking on GasBlock so the 3-instruction call-site window can be inspected without rescanning bytecode.
Tighten the SPP shifting guard so that a successor whose last opcode is an isGasChunkTerminator bails out of shifting, preventing gas cost from being moved across chunk boundaries.
GasChunkCost continues to write Blocks[Id].Cost (the original unshifted per-block cost) exactly as PR DTVMStack#371 established: the interpreter gas chunk fast path depends on unshifted costs, and exporting SPP-shifted metering to the JIT is left as a follow-up on a separate JIT-only output path.
Test plan:
- format.sh check: clean
- evmone-unittests multipass: 223/223 pass
- evmone-unittests interpreter: 215/215 pass
- evmone-statetest --fork Cancun: 2723/2723 pass
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Measured via evmone-bench against upstream/main@a14a9de on the external/total/(main|micro) benchmark set: geomean -10.13% across 27 benchmarks, with memory_grow_mload/mstore -19% to -24%, signextend -19% to -20%, and snailtracer -7.53%. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add EVMBytecodeCache::GasChunkCostSPP as a second parallel cost array that holds the SPP metering-shifted per-chunk cost computed by buildGasChunksSPP. The interpreter continues reading the unshifted GasChunkCost (preserving PR DTVMStack#371's interpreter-safety invariant) while the multipass JIT prefers the shifted values when emitting gas checks.
Plumb the pointer through EVMFrontendContext::setGasChunkInfo, snapshot it in EVMMirBuilder, and swap the three chunk-cost read sites:
- meterOpcode: primary per-chunk-start charge
- meterOpcodeRange (slow path): JUMPDEST-skip cumulative sum
- JUMPDEST-run suffix-sum precompute inside the jump table builder
Swapping all three sites is safe because SPP's lemma614 shift can only transfer gas from a single-predecessor successor into its parent. Every JUMPDEST is a jump target (multi-predecessor), so SPP can never shift gas *into* a JUMPDEST-run member from outside the run; the only intra-run shift it can perform is from the trailing body chunk up into the last JUMPDEST, which is still charged along every entry path into the run. Total gas along any realizable execution path is preserved.
Test plan:
- format.sh check: clean
- evmone-unittests multipass: 223/223 pass
- evmone-unittests interpreter: 215/215 pass
- evmone-statetest --fork Cancun: 2723/2723 pass
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
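The safety argument for the JUMPDEST-run suffix sum can be checked on a toy run. The sketch below is a Python model, not the C++ jump table builder: `entry_charges` and the cost values are hypothetical, and it assumes a run is a sequence of JUMPDEST chunks (1 gas each) followed by one body chunk, with entry at index `i` falling through every later chunk.

```python
def entry_charges(run_costs):
    """Suffix sums: gas charged when execution enters the run at index i
    and falls through every later chunk in the run."""
    charges = [0] * len(run_costs)
    running = 0
    for i in range(len(run_costs) - 1, -1, -1):
        running += run_costs[i]
        charges[i] = running
    return charges


# Hypothetical run: three JUMPDEST chunks (1 gas each) plus a trailing body chunk.
unshifted = [1, 1, 1, 17]
# The same run after a shift moved 5 gas from the body into the last JUMPDEST.
shifted = [1, 1, 6, 12]

unshifted_entry = entry_charges(unshifted)
shifted_entry = entry_charges(shifted)

# Every JUMPDEST entry point (indices 0..2) pays the same total either way.
print(unshifted_entry[:3], shifted_entry[:3])
```

Under these assumptions the intra-run shift changes only where the gas is charged, not how much an entry path pays.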
Phase 3 wires SPP-shifted gas costs into the multipass JIT but leaves the expensive CFG / call-site / metering pipeline running for every module, including interpreter-only ones that never read the shifted values. On CI that manifests as a ~7% interpreter-mode regression on snailtracer and up to +14% on smaller benchmarks where compile time dominates the total run.
Gate the pipeline:
- Add `bool EnableSPP` to `buildBytecodeCache`. When false, the function walks the basic gas-block scan, writes unshifted `Blocks[Id].Cost` into `GasChunkEnd` / `GasChunkCost`, and leaves `GasChunkCostSPP` empty.
- Track `EVMModule::CacheNeedsSPP`. It is set to `true` immediately before `performEVMJITCompile` runs (the only current SPP consumer). Pure interpreter-mode modules and JIT-fallback modules leave it `false`, so the lazy `initBytecodeCache` picks the cheap path.
- `evm_compiler.cpp` passes `nullptr` when the cache's `GasChunkCostSPP` vector is empty, so any JIT compile without SPP (defensive path) falls back to the unshifted table via the existing `GasChunkCostSPP ? ... : GasChunkCost` pattern in `meterOpcode` / `meterOpcodeRange` / the JUMPDEST-run suffix-sum builder.
Test plan:
- format.sh check: clean
- evmone-unittests multipass: 223/223 pass (9.4s vs 13.7s previously)
- evmone-unittests interpreter: 215/215 pass (0.4s vs 3.7s previously)
- evmone-statetest --fork Cancun: 2723/2723 pass (67s vs 103s previously)
- Local bench vs upstream/main (CI flags): geomean -14.29% (n=27)
The test-suite runtime drop is the dominant signal that gating works: interpreter mode no longer runs the SPP pipeline, so every module load in the test suite gets back the ~90% of cache-build time the pipeline used to consume.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The previous two-pass CFG build used call-site-resolved targets to replace over-approximate edges for dynamic jumps. This created an under-approximate CFG when resolution was incomplete, allowing lemma614Update to shift gas along non-existent edges and produce unsafe metering on missed paths. Fix: always over-approximate dynamic jumps in buildCFGEdges (edges to all JUMPDESTs). Remove the second-pass CFG rebuild. Export resolved targets through ResolvedJumpTargets for downstream consumers (MIR direct-branch optimization with runtime guard) instead of using them for CFG refinement. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reflect Phase 5 (soundness fix): always over-approximate CFG for dynamic jumps, export ResolvedJumpTargets for downstream MIR use, document reverse BFS cross-function risk as benign. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
resolveCallSiteTargets() and its ResolvedJumpTargets export had no downstream consumer — the resolved targets were computed but never read. Remove the function, its helper isSwapOpcode(), the cache field, and the output parameter to eliminate dead work (O(N×200) BFS per JIT-compiled contract). The call-site enumeration algorithm can be restored from git history when a consumer (e.g. MIR direct-branch optimization) is implemented. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Previous run had +70% SHL/SHR/SAR synth regressions due to noisy neighbor on shared GitHub Actions runner — same baseline, same code, different run produced 11.8ms vs 20.2ms for SHL/b0. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two reviewers flagged factual issues in docs/design/evm-gas-mechanism.md:
- JIT meterOpcode flowchart was wrong: when the chunk cache is populated and PC is mid-chunk, the function returns without emitting any MIR (evm_mir_compiler.cpp:537), it does NOT fall through to the per-opcode metric. The per-opcode fallback only fires when the cache pointers are absent. Diagram now shows both branches separately.
- Chunk-terminator wording was inverted: SSTORE/CALL*/CREATE*/GAS end their own chunk (the terminator's static cost is included, evm_cache.cpp:329), they are not "before the boundary". Updated the chunk definition and the interpreter key-properties bullet.
- Memory expansion is not a chunk boundary; it is charged inside handlers via expandMemoryAndChargeGas. Removed the misleading "dynamic memory growth" entry.
- Gas-register sync sites are not limited to CALL/CREATE/return; syncGasToMemory is also called from balance/code/keccak/memory handlers. Listed concrete line numbers.
- Interpreter mermaid: chunk-start condition failure does not raise OOG directly; it falls into the slow per-opcode path.
- Failure-mode table updated to match.
Also drift-fixed line numbers for meterOpcode (524), meterOpcodeRange (544), JumpDestRun precompute (1297-1335), and EVMModule initBytecodeCache (133-136). Counted parallel arrays correctly (5). Added precondition note to the SPP example. Replaced \\n with <br/> and quoted diamond labels so GitHub renders the diagrams.
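The corrected chunk-terminator wording can be illustrated with a small Python model. This is a sketch under stated assumptions, not the logic in evm_cache.cpp: `split_chunks` and `TERMINATORS` are hypothetical names, only terminator boundaries are modeled (jumps and JUMPDESTs are ignored), and the opcode costs are the usual static costs for PUSH1, ADD, GAS, POP and STOP.

```python
# Representative terminators named in the commit message (CALL*/CREATE* families abbreviated).
TERMINATORS = {"SSTORE", "CALL", "CREATE", "GAS"}


def split_chunks(ops):
    """Return (opcode_count, static_cost) per chunk. A terminator closes the
    chunk it belongs to, so its own static cost is included in that chunk."""
    chunks, size, cost = [], 0, 0
    for name, static_cost in ops:
        size += 1
        cost += static_cost
        if name in TERMINATORS:
            chunks.append((size, cost))
            size, cost = 0, 0
    if size:
        chunks.append((size, cost))
    return chunks


ops = [("PUSH1", 3), ("PUSH1", 3), ("ADD", 3), ("GAS", 2), ("POP", 2), ("STOP", 0)]
chunks = split_chunks(ops)
print(chunks)
```

In this model GAS is the last opcode of the first chunk, rather than sitting before the boundary.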
…approx invariant GasBlock::Prev2Pc and Prev2Opcode were added to support a future 3-instruction call-site window lookup, but the call-site enumeration that would have consumed them was removed in commit c26bf7c (Phase 5 CFG soundness fix). Whole-repo grep confirms zero readers; only the writers in buildGasBlocks remain. Removing both fields shrinks GasBlock by ~9 bytes (one uint32_t + one uint8_t + alignment) and removes dead bookkeeping from every cache build. Also extends the buildCFGEdges function comment to make the soundness pairing with lemma614Update explicit: the over-approximated dynamic-jump edges to all JUMPDESTs work because lemma614Update's effectivePredCount > 1 guard refuses to shift gas across multi-predecessor edges, so the over-approximation is absorbed without breaking per-path totals. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The CacheNeedsSPP flag controls whether the bytecode cache builds with the SPP metering pipeline. It must be set before any getBytecodeCache() call: once the cache is lazily built, the EnableSPP decision is fixed for the module's lifetime. Future lazy / on-demand JIT paths must flip this flag before triggering the cache build, otherwise the JIT will silently fall back to the unshifted GasChunkCost array. Documentation only — no behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… and F6 dropped
The change-doc Metrics section cited a 27-bench local evmone-bench run (3 reps) whose numbers had drifted significantly from the CI Performance Regression Check baseline. The "≤ +6% small regressions" framing mis-categorized the actual jump-heavy regressions on weierstrudel / jump_around / snailtracer, which sit in the +10–23% range on the CI multipass table. Rewrite the section using the CI bot's authoritative numbers and drop the unverified geomean claim.
Also update the review-fix plan to record:
- F1, F4, F5 implemented in commits 81efba3 and 691069a;
- F2, F3 applied to the PR body;
- F6 (open an upstream issue for addEdge O(deg²)) dropped: the concern was theoretical, no commit on this branch touches addEdge, no measured evidence of compile-time pain. Concern kept inline as a future-tuning reference.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR DTVMStack#446 over-approximates dynamic jumps by inserting an explicit edge from every dynamic-jump block to every JUMPDEST, then splitting each as a critical edge. On contracts with D dynamic jumps and J JUMPDESTs that costs O(D*J^2), so compile time grows quadratically with J and dominates JIT-prep on jump-heavy bytecode. This change carries a per-JUMPDEST scalar ImplicitDynamicPredCount that counts how many dynamic-jump blocks could reach it at runtime, and folds it into effectivePredCount so the lemma 6.14 update behaves identically without materialising the edges. To keep dyn-only JUMPDESTs visible to dominator and loop analyses (Solidity function returns that are unreachable in the static-only CFG), we seed reachability from every JUMPDEST after the static reachable set is built. Compile time on loop_full_of_jumpdests drops from 7.3s to 3.3s. Geomean runtime is -6.15% across the 27 paper benches with zero benches above the +/-25% CI gate. See docs/changes/2026-05-11-spp-cfg-implicit-dyn-pred/README.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
b943a7a to 972615a
After rebasing onto current upstream/main (which now includes DTVMStack#458 / DTVMStack#460 / DTVMStack#482 / DTVMStack#483 perf work) and running a 10-rep evmone-bench on the 27 paper benches, the cumulative PR delta has collapsed to noise (raw geomean +1.15%, +0.46% after correcting a single-iteration outlier on main/blake2b_shifts/8415nulls via a focused 20-rep re-measurement). 0 benches above the +/-25% CI gate. The A-vs-PR-base -2.73% from this commit's own optimization is unchanged; the framing shift is that the absolute runtime delta of the whole PR vs unmodified main has been absorbed by the intervening upstream perf optimizations. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Round-2 self-review surfaced that Phase 7's reachability stitch was seeding every JUMPDEST as a BFS root regardless of whether the contract contains any dynamic jumps. On contracts with no dynamic jumps, a statically-dead JUMPDEST (no predecessor of any kind) would become reachable post-Phase-7, expanding computeDominators and computeLoops input and silently changing SPP decisions on that block class. With this fix the stitch only seeds JUMPDESTs that carry a nonzero ImplicitDynamicPredCount, preserving pre-Phase-7 behavior on dead JUMPDESTs in dynamic-jump-free contracts.
Also:
- Replace the stale "dynamic jumps get edges to every JUMPDEST" comment block above buildCFGEdges (the implementation has actually skipped materialising those edges since Phase 7).
- Add evmCacheTests target with 4 targeted tests covering the gate, the dyn-target stitch path, the interpreter-only no-SPP path, and a multi-dyn-jump corner case.
- Soften the loop_full_of_jumpdests 7.3s -> 3.3s claim in the Phase 7 change doc to note it is a local single-machine measurement, not CI-tracked.
Review plan: docs/changes/2026-05-11-spp-cfg-implicit-dyn-pred/review-fixes-r2.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The 10-rep "wins" of -6.30% (blake2b_huff/8415nulls) and -4.84% (loop_with_many_jumpdests/empty) flip or collapse in a 20-rep focused rerun (+1.55% and -0.55% respectively). The 10-rep "regressions" of +3.51% (weierstrudel/1) similarly collapses to +0.55%. Three other 10-rep "regression" benches (memory_grow_*) contain zero JUMP / JUMPI / JUMPDEST opcodes, so PR DTVMStack#446's CFG changes cannot affect them by construction. Net picture: the 27-bench corpus is essentially flat within evmone- bench's inter-binary drift band when measured at higher rep count. Update the change doc to cite the 20-rep data, surface the methodology caveat, and acknowledge the SPP-to-JIT cost-flow mechanism as a theoretical effect below current measurement precision. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add `evmCacheComplexityDemo` (build-only, not registered with ctest) and a wrapper script `docs/changes/2026-05-11-spp-cfg-implicit-dyn-pred/ scaling_demo.sh` that runs the demo at multiple JUMPDEST counts and prints a CSV table. Lets reviewers reproduce the cache-build wall clock claim on any machine without rebuilding evmone or invoking the full unittest binary. The demo measures the WHOLE `buildBytecodeCache` pipeline (not just the Phase 7 CFG step), so it surfaces the residual super-linear cost from `computeDominators` / `buildLoopsUsingDominance` that this PR does not touch. Update the change doc to (a) reference the demo, (b) include a sample table, and (c) explicitly scope the O(N) claim to the over-approximation step itself — the 4-second saving on `loop_full_of_jumpdests` (7.3 s -> 3.3 s) IS the Phase 7 contribution; the remaining 3.3 s is dom/loop analysis untouched by this PR. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the single-snapshot table with a pre-Phase-7 vs Phase-7 comparison built from the same demo source on commit 99f23a3 (PR head one commit before this change). At 20k JUMPDESTs the cache build drops from 346 ms to 44 ms (~8x), confirming the asymptotic shape change. Pre-Phase-7 doubles in ~4x time per doubling of N (quadratic, matching the expected O(D * J^2) shape of explicit-edge add + critical-edge split). Phase 7 doubles in 2-4x (sub-quadratic, with residual super- linearity from computeDominators / buildLoopsUsingDominance independent of this PR). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Compress the 30-line narrative around the dyn-target reachability stitch and the over-approximation rationale to 10 lines that state only the non-obvious WHY (soundness invariant + statically-dead JUMPDESTs left unreachable on purpose). No behavior change.
* Build evmCacheComplexityDemo without -fsanitize=address even in ASan
builds so the wall-clock measurement is not distorted by sanitizer
overhead (the demo's whole purpose is timing buildBytecodeCache).
* Replace local OP_* magic-number constants in the tests and the demo
with casts of evmc_opcode::* (single source of truth).
* Use zen::common::SteadyClock alias in the demo for consistency with
the rest of the codebase.
* Strip historical / PR / commit-line-number narration from the test
fixtures and the demo banner; concrete numbers and rationale already
live in docs/changes/2026-05-11-spp-cfg-implicit-dyn-pred/README.md.
No behavior change: evmCacheTests 4/4 pass, demo scaling shape is
preserved at N in {100, 5000, 20000}.
Summary
Before this PR, the EVM bytecode cache's SPP gas-metering pipeline used
an all-or-nothing bailout: any unresolved dynamic jump caused the
entire contract to skip SPP and fall back to per-block gas metering.
Real-world Solidity contracts always contain dynamic jumps (function
dispatch, return-via-stack), so in practice SPP delivered zero benefit
on the workloads that matter.
This PR removes the bailout by making the cache build a mixed-precision
CFG (precise edges for static `PUSH→JUMP`, over-approximated edges for
all other dynamic jumps) and exporting a separate SPP-shifted gas-cost
array that the multipass JIT consumes while the interpreter continues
to read the unshifted costs guaranteed by PR #371. The expensive SPP
pipeline is also gated to JIT-consumer modules so interpreter-only paths
never pay for it.
What this PR delivers
- SPP now applies to JIT-compiled contracts, including ones with
  unresolvable dynamic jumps. Previously it was effectively dead code
  on real Solidity workloads.
- `O(D × J²)` → `O(N)` on the cache CFG build, where `D` is the
  dynamic-jump count and `J` is the JUMPDEST count. Verified:
  `loop_full_of_jumpdests` cache-build wall-clock drops from 7.3s to
  3.3s (local single-machine measurement, not CI-tracked).
- `GasChunkCost` / `GasChunkCostSPP` parallel arrays plus the
  `CacheNeedsSPP` lifecycle flag cleanly separate the interpreter and
  JIT gas paths, so future work on either side can move independently.
Runtime delta on the 27-bench paper subset is within evmone-bench's
inter-binary drift band — see Evaluation below.
Phase 1 — mixed-CFG gas block construction
- Remove the `HasDynamicJump` bailout in `buildGasChunksSPP`. Contracts
  with any unresolved dynamic jump used to skip the SPP pipeline
  completely; now they run the pipeline with over-approximated edges
  for the unresolved portion.
- Factor edge construction into `buildCFGEdges()` with
  over-approximation for all unresolved dynamic jumps (sound for SPP
  metering). The CFG is intentionally kept over-approximate: using
  resolved targets to narrow edges would under-approximate the CFG
  when resolution is incomplete, causing `lemma614Update` to shift gas
  along non-existent edges.
- Tighten `lemma614Update` so a shift never crosses an
  `isGasChunkTerminator` boundary, and set `MinSucc = 0` when
  encountering excluded successors to prevent unsafe shifting.
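A minimal Python model of the shift guard described above may help. It is not the C++ `lemma614Update`: the function name `shift_into_parent`, the `excluded` parameter and the block costs are assumptions for illustration. The model moves the minimum successor cost into the parent, and treats any excluded or multi-predecessor successor as contributing zero.

```python
def shift_into_parent(costs, succs, pred_count, parent, excluded=frozenset()):
    """Move the largest amount common to all successors into `parent`.
    An excluded or multi-predecessor successor forces the amount to 0."""
    children = succs.get(parent, [])
    if not children:
        return 0
    min_succ = min(
        0 if (c in excluded or pred_count[c] != 1) else costs[c]
        for c in children
    )
    if min_succ > 0:
        for c in children:
            costs[c] -= min_succ
        costs[parent] += min_succ
    return min_succ


def path_cost(costs, path):
    return sum(costs[b] for b in path)


# Hypothetical CFG: P branches to A or B, each with P as its only predecessor.
costs = {"P": 3, "A": 10, "B": 4}
succs = {"P": ["A", "B"]}
pred_count = {"P": 0, "A": 1, "B": 1}
paths = [("P", "A"), ("P", "B")]

before = {p: path_cost(costs, p) for p in paths}
moved = shift_into_parent(costs, succs, pred_count, "P")
after = {p: path_cost(costs, p) for p in paths}
print(moved, before == after)
```

In this model the gas charged along each path is unchanged; only the block that charges it moves earlier, which is what lets a JIT merge checks.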
Phase 2 — `decodePushAsJumpDest` decode helper
- Factor `decodePushAsJumpDest()` out of `resolveConstantJumpTarget()`
  as a shared decode helper.
Phase 3 — wire SPP-shifted costs into the multipass JIT
- Add `EVMBytecodeCache::GasChunkCostSPP`, populated from `Metering[]`
  inside `buildGasChunksSPP`. The existing `GasChunkCost` continues to
  hold unshifted `Blocks[Id].Cost` per the interpreter-safety invariant
  established by PR #371 (fix(evm): disable SPP gas cost shifting and
  add opcode validity check in interpreter fallback path).
- Plumb the pointer through `EVMFrontendContext::setGasChunkInfo` and
  the `EVMMirBuilder` constructor / copy path.
- Three chunk-cost read sites prefer `GasChunkCostSPP` when available,
  falling back to `GasChunkCost`:
  - `EVMMirBuilder::meterOpcode`: primary per-chunk-start charge
  - `EVMMirBuilder::meterOpcodeRange`: JUMPDEST-skip cumulative sum
  - `buildEVMFunction` JUMPDEST-run suffix-sum precompute
Phase 4 — gate the SPP pipeline on JIT-consumer modules only
- Add an `EnableSPP` parameter:
  `buildBytecodeCache(..., bool EnableSPP = false)`. When false, skip
  the expensive CFG / metering pipeline entirely.
- Track `EVMModule::CacheNeedsSPP`, flipped to `true` immediately
  before `action::performEVMJITCompile` runs, so interpreter-only
  modules never pay the SPP pipeline cost.
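The gating and the fallback read can be modeled together. The Python sketch below is illustrative only: the `Module` class, `get_bytecode_cache` and `jit_cost_table` are hypothetical stand-ins for `EVMModule`, its lazy cache build and the `GasChunkCostSPP ? ... : GasChunkCost` pattern, and the shifted costs are passed in rather than computed.

```python
class Module:
    def __init__(self, unshifted, shifted):
        self._unshifted = list(unshifted)
        self._shifted = list(shifted)  # stand-in for what the SPP pipeline would compute
        self.cache_needs_spp = False
        self._cache = None

    def get_bytecode_cache(self):
        # Lazy build: the SPP decision is taken once, on the first call.
        if self._cache is None:
            spp = self._shifted if self.cache_needs_spp else []
            self._cache = {"GasChunkCost": self._unshifted, "GasChunkCostSPP": spp}
        return self._cache


def jit_cost_table(cache):
    # Prefer the shifted table; an empty one means "fall back to unshifted".
    return cache["GasChunkCostSPP"] or cache["GasChunkCost"]


# Flag set before the first cache access: the JIT sees shifted costs.
jit_module = Module([3, 10, 4], [7, 6, 0])
jit_module.cache_needs_spp = True
jit_table = jit_cost_table(jit_module.get_bytecode_cache())

# Flag set too late: the cache was already built without SPP.
late_module = Module([3, 10, 4], [7, 6, 0])
late_module.get_bytecode_cache()
late_module.cache_needs_spp = True
late_table = jit_cost_table(late_module.get_bytecode_cache())

print(jit_table, late_table)
```

The second case shows why the ordering matters: nothing fails, the JIT just quietly reads the unshifted table.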
Phase 5 — CFG soundness fix
- The previous two-pass CFG build used call-site-resolved targets to
  replace over-approximate edges. This created an under-approximate CFG
  when call-site enumeration was incomplete, causing `lemma614Update`
  to shift gas along non-existent edges and produce unsafe metering.
- `buildCFGEdges()` now always over-approximates dynamic jumps
  logically: unresolved dynamic jumps are represented by
  `ImplicitDynamicPredCount` stamped onto JUMPDEST blocks, rather than
  by materializing one Succs/Preds edge per dynamic-jump/JUMPDEST pair.
  Static jumps (PUSH → JUMP) still get precise single-target edges.
- Remove the call-site enumeration (`resolveCallSiteTargets` and the
  `ResolvedJumpTargets` export): no downstream consumer exists yet.
  The algorithm can be restored from git history when a consumer
  (e.g. MIR direct-branch optimization) is implemented.
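The soundness problem is easiest to see on a four-block example. The Python sketch below is a simplified model with hypothetical block names and costs, not the C++ `lemma614Update`: block `Q` ends in a dynamic jump whose real target is `S1`, and the two runs differ only in whether the CFG knows about that extra predecessor.

```python
def shift_into_parent(costs, succs, pred_count, parent):
    """Return a shifted copy of `costs`; a successor with more than one
    predecessor forces the shifted amount to 0."""
    children = succs[parent]
    min_succ = min(costs[c] if pred_count[c] == 1 else 0 for c in children)
    shifted = dict(costs)
    if min_succ > 0:
        for c in children:
            shifted[c] -= min_succ
        shifted[parent] += min_succ
    return shifted


costs = {"P": 3, "S1": 10, "S2": 4, "Q": 2}
succs = {"P": ["S1", "S2"]}
real_path = ("Q", "S1")  # taken at runtime through Q's dynamic jump
true_charge = sum(costs[b] for b in real_path)

# Under-approximate CFG: the Q -> S1 edge was missed, so S1 looks single-predecessor.
under = shift_into_parent(costs, succs, {"S1": 1, "S2": 1}, "P")
under_charge = sum(under[b] for b in real_path)

# Over-approximate CFG: S1 counts Q as a possible predecessor, so no shift happens.
over = shift_into_parent(costs, succs, {"S1": 2, "S2": 1}, "P")
over_charge = sum(over[b] for b in real_path)

print(true_charge, under_charge, over_charge)
```

With the missed edge, the path through `Q` skips the gas that was moved into `P` and is undercharged; with the over-approximation the shift is refused and the charge is exact.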
Phase 6 — review fixes
- Drop the dead `GasBlock::Prev2Pc` / `Prev2Opcode` fields and their
  writebacks. They were originally added to support a future
  3-instruction call-site window lookup, but the call-site enumeration
  that would have consumed them was removed in Phase 5. Whole-repo
  grep confirmed zero readers; struct shrinks ~9 bytes.
- Extend the `buildCFGEdges` function comment to make the soundness
  pairing with `lemma614Update` explicit: the implicit dynamic
  predecessor count is folded into `effectivePredCount`, so dynamic
  targets are treated as multi-predecessor blocks without materializing
  D×J CFG edges.
- Document the `EVMModule::CacheNeedsSPP` lifecycle invariant: the flag
  must be set before any `getBytecodeCache()` call, since the
  `EnableSPP` decision is fixed at lazy-build time.
Phase 7 — drop `O(D × J²)` over-approximation cost
Replace the explicit add-then-split-critical-edge step that materialised
one CFG edge per (dynamic-jump-block, JUMPDEST) pair with a per-JUMPDEST
scalar `ImplicitDynamicPredCount`, folded into `effectivePredCount` so
the lemma 6.14 update behaves identically without materialising the
edges. On a contract with `D` dynamic jumps and `J` JUMPDESTs, the
cache build drops from `O(D × J²)` to `O(N)`.
To keep dyn-only JUMPDESTs (Solidity function returns, unreachable in
the static-only CFG) visible to the dominator / loop analyses, seed
the reachability search from every JUMPDEST after the static reachable
set is built. Gated on `ImplicitDynamicPredCount > 0` (round-2 review
fix) so statically-dead JUMPDESTs in dynamic-jump-free contracts
preserve pre-Phase-7 behavior.
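The two Phase 7 ideas, an implicit predecessor count instead of explicit edges and a gated reachability seed, can be sketched in Python. This is a model under assumptions, not the C++ code: the function names are hypothetical, and each JUMPDEST is assumed to receive a count equal to the number of dynamic-jump blocks.

```python
from collections import deque


def implicit_dynamic_pred_counts(blocks, dynamic_jump_blocks):
    """One scalar per block instead of D x J explicit edges."""
    d = len(dynamic_jump_blocks)
    return {b: (d if info["jumpdest"] else 0) for b, info in blocks.items()}


def effective_pred_count(block, static_preds, implicit):
    return len(static_preds.get(block, ())) + implicit[block]


def reachable_blocks(entry, succs, implicit):
    """BFS from the entry plus every JUMPDEST with a nonzero implicit count."""
    queue = deque([entry] + [b for b, n in implicit.items() if n > 0])
    seen = set()
    while queue:
        b = queue.popleft()
        if b in seen:
            continue
        seen.add(b)
        queue.extend(succs.get(b, ()))
    return seen


blocks = {0: {"jumpdest": False}, 1: {"jumpdest": True},
          2: {"jumpdest": False}, 3: {"jumpdest": True}}

# Contract with two dynamic-jump blocks (0 and 2); JUMPDEST 1 falls through to 2.
dyn_implicit = implicit_dynamic_pred_counts(blocks, [0, 2])
dyn_reachable = reachable_blocks(0, {1: [2]}, dyn_implicit)
dyn_preds_of_1 = effective_pred_count(1, {2: [1]}, dyn_implicit)

# Dynamic-jump-free contract: JUMPDESTs 1 and 3 have no predecessor and are not seeded.
static_implicit = implicit_dynamic_pred_counts(blocks, [])
static_reachable = reachable_blocks(0, {0: [2], 1: [2]}, static_implicit)

print(sorted(dyn_reachable), sorted(static_reachable))
```

The count costs one pass over the blocks, while the gate keeps dead JUMPDESTs out of the analysis when no dynamic jump exists.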
Compile-time check: end-to-end `evmone-unittests` wall clock for
`loop_full_of_jumpdests` (24556 JUMPDESTs) drops from 7.3 s to
3.3 s (local single-machine measurement, not CI-tracked).
Intra-PR demo: same source built on the commit immediately
before this Phase 7 commit (`99f23a3`) vs current HEAD, on a
synthetic contract `CALLDATALOAD JUMP <N × JUMPDEST> STOP`:
Pre-Phase-7 grows ~4× per doubling of N (quadratic, the expected
O(D × J²) shape). Phase 7 grows 2–4× per doubling (sub-quadratic;
residual super-linearity comes from `computeDominators` /
`buildLoopsUsingDominance`, which this PR does not touch). Reproduce
with `cmake --build build --target evmCacheComplexityDemo && bash docs/changes/2026-05-11-spp-cfg-implicit-dyn-pred/scaling_demo.sh`.
New `evmCacheTests` unit test target with 4 smoke/regression cases
covering the SPP gate, dynamic-target reachability path, interpreter-only
no-SPP path, and multi-dyn-jump conservative metering.
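For readers who want to see the shape of the synthetic contract and of the growth-per-doubling reading, here is a small Python sketch. It is not the `evmCacheComplexityDemo` source: `synthetic_contract` and `growth_per_doubling` are hypothetical helpers, and the timings fed in are made-up placeholders rather than measurements. The opcode bytes are the standard EVM encodings.

```python
# Standard EVM opcode bytes.
CALLDATALOAD, JUMP, JUMPDEST, STOP = 0x35, 0x56, 0x5B, 0x00


def synthetic_contract(n):
    """CALLDATALOAD JUMP <n x JUMPDEST> STOP: one dynamic jump, n JUMPDESTs."""
    return bytes([CALLDATALOAD, JUMP]) + bytes([JUMPDEST]) * n + bytes([STOP])


def growth_per_doubling(timings_ms):
    """Ratio t(2N) / t(N) for each consecutive pair of sizes that doubles."""
    sizes = sorted(timings_ms)
    return {
        big: timings_ms[big] / timings_ms[small]
        for small, big in zip(sizes, sizes[1:])
        if big == 2 * small
    }


code = synthetic_contract(4)

# Placeholder timings chosen to show a quadratic shape; not measured data.
made_up_timings = {5000: 10.0, 10000: 40.0, 20000: 160.0}
ratios = growth_per_doubling(made_up_timings)

print(code.hex(), ratios)
```

A ratio near 4 per doubling indicates quadratic growth and a ratio near 2 indicates linear growth, which is the reading applied to the two builds above.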
Detail: `docs/changes/2026-05-11-spp-cfg-implicit-dyn-pred/README.md`
and `docs/changes/2026-05-11-spp-cfg-implicit-dyn-pred/review-fixes-r2.md`.
Evaluation
Runtime
After rebasing onto current `upstream/main` (which includes the
intervening upstream perf work: #458 u256 arithmetic, #460
displacement-addressed bytes32, #482 depth-indexed pool, #483 inline
arithmetic delegate), and measuring the 27 paper benches
(`^external/total/(main|micro)/`):
- 27-bench 10-rep geomean: +1.15% (treatment slower; +0.46% after
  correcting a single 20-rep-confirmed outlier on
  `main/blake2b_shifts/8415nulls`).
- 0 benches above the ±25% CI gate.
Caveat: this 10-rep number is sequential
(baseline-all-then-treatment-all), so it conflates real PR delta with
inter-binary system drift. Focused 20-rep re-measurement on the three
largest movers indicates the per-bench deltas are dominated by drift,
not by PR effects:
- `main/weierstrudel/1`
- `main/blake2b_huff/8415nulls`
- `micro/loop_with_many_jumpdests/empty`
- `main/blake2b_shifts/8415nulls`
Additionally, three of the four "regression" benches reported above
the noise band at 10 reps (`micro/memory_grow_mstore/{nogrow,by1}`,
`micro/memory_grow_mload/nogrow`) contain zero JUMP / JUMPI /
JUMPDEST opcodes, so PR #446's CFG changes cannot affect them by
construction. Those deltas are pure drift artifacts.
Reading: this PR's value is the capability change (Phase 1 — SPP
now applies to contracts with dynamic jumps) and the algorithmic
complexity guarantee (Phase 7 — O(D × J²) → O(N) cache build), not raw
runtime improvement on the existing 27-bench suite. The intervening
upstream perf work absorbed any prior absolute speedup; the remaining
per-bench deltas are at or below evmone-bench's single-machine
inter-binary drift band.
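As a reference for reading the geomean figures, the sketch below shows one common way to compute a geometric-mean delta from per-bench timings. It is an assumption that the PR's numbers were derived this way; `geomean_delta`, the bench names and the timings are hypothetical.

```python
import math


def geomean_delta(baseline, treatment):
    """Geometric mean of per-bench treatment/baseline ratios, minus 1.
    Positive means the treatment is slower overall."""
    ratios = [treatment[name] / baseline[name] for name in baseline]
    log_mean = sum(math.log(r) for r in ratios) / len(ratios)
    return math.exp(log_mean) - 1.0


# Made-up timings: one bench 10% slower, one 5% faster.
baseline = {"bench_a": 100.0, "bench_b": 200.0}
treatment = {"bench_a": 110.0, "bench_b": 190.0}
delta = geomean_delta(baseline, treatment)

print(f"{delta:+.2%}")
```

Because the geomean multiplies ratios, one noisy bench shifts the aggregate by only a fraction of its own delta, which is why a single-outlier correction is small in absolute terms.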
Interpreter
0 regressions on CI. Local timing confirms the SPP gating bypass:
- `evmone-unittests` interpreter
Correctness
- `tools/format.sh check`: clean
- `evmone-unittests` multipass: 223/223 pass
- `evmone-unittests` interpreter: 215/215 pass
- `evmone-statetest --fork Cancun` multipass: 2723/2723 pass
- `evmone-statetest --fork Cancun` interpreter: 2723/2723 pass
- `evmCacheTests` unit tests: 4/4 pass
Changed files
- `src/evm/evm_cache.h`: add `GasChunkCostSPP` array;
  `GasBlock::ImplicitDynamicPredCount` field
- `src/evm/evm_cache.cpp`: mixed-CFG, SPP export, `EnableSPP` gating,
  soundness fix (always over-approximate CFG); drop dead
  `Prev2Pc`/`Prev2Opcode`; clarify CFG over-approx invariant;
  implicit-dyn-pred count + reachability stitch with R2 gate (Phase 7)
- `src/compiler/evm_frontend/evm_mir_compiler.{h,cpp}`: plumb SPP
  pointer; prefer SPP-shifted cost at three chunk-cost read sites
- `src/compiler/evm_compiler.cpp`: pass SPP pointer via `setGasChunkInfo`
- `src/runtime/evm_module.{h,cpp}`: add `CacheNeedsSPP` flag; flip
  before JIT compile; document lifecycle invariant
- `src/tests/evm_cache_tests.cpp`: NEW unit test target
- `src/tests/CMakeLists.txt`: register new test target
- `docs/changes/2026-04-05-gas-check-placement/`: change doc +
  review-fix plan
- `docs/changes/2026-05-11-spp-cfg-implicit-dyn-pred/`: Phase 7
  change doc + round-2 review-fix plan
- `docs/design/evm-gas-mechanism.md`: design doc (interpreter + JIT
  gas mechanism with SPP)
Test plan
- `tools/format.sh check` clean
- `evmone-unittests` multipass and interpreter: all pass
- `evmone-statetest --fork Cancun` multipass and interpreter: all pass
- `evmCacheTests` 4 cases pass
- `loop_full_of_jumpdests` cache build under 4s (compile-time
  complexity verification, local single-machine)
🤖 Generated with Claude Code