feat(evm): drop SPP all-or-nothing fallback for dynamic-jump contracts by abmcar · Pull Request #446 · DTVMStack/DTVM

abmcar · 2026-04-05T13:24:33Z

Summary

Before this PR, the EVM bytecode cache's SPP gas-metering pipeline used
an all-or-nothing bailout: any unresolved dynamic jump caused the
entire contract to skip SPP and fall back to per-block gas metering.
Real-world Solidity contracts always contain dynamic jumps (function
dispatch, return-via-stack), so in practice SPP delivered zero benefit
on the workloads that matter.

This PR removes the bailout. The cache now builds a mixed-precision
CFG (precise edges for static PUSH→JUMP pairs, over-approximated edges
for every other jump) and exports a separate SPP-shifted gas-cost
array. The multipass JIT consumes the shifted array, while the
interpreter continues to read the unshifted costs required by the
invariant from PR #371. The expensive SPP pipeline is also gated to
JIT-consumer modules, so interpreter-only paths never pay for it.

What this PR delivers

- Capability: SPP gas metering now applies to every contract,
  including ones with unresolvable dynamic jumps. Previously it was
  effectively dead code on real Solidity workloads.
- Algorithmic complexity: O(D × J²) → O(N) on the cache CFG
  build, where D is the dynamic-jump count and J is the JUMPDEST
  count. Verified: loop_full_of_jumpdests cache-build wall-clock
  drops from 7.3 s to 3.3 s (local single-machine measurement, not
  CI-tracked).
- Architecture: GasChunkCost / GasChunkCostSPP parallel arrays
  plus the CacheNeedsSPP lifecycle flag cleanly separate the
  interpreter and JIT gas paths, so future work on either side can
  move independently.

Runtime delta on the 27-bench paper subset is within evmone-bench's
inter-binary drift band — see Evaluation below.

Phase 1 — mixed-CFG gas block construction

- Remove the all-or-nothing HasDynamicJump bailout in
  buildGasChunksSPP. Contracts with any unresolved dynamic jump used to
  skip the SPP pipeline completely; now they run the pipeline with
  over-approximated edges for the unresolved portion.
- Factor out buildCFGEdges() with over-approximation for all unresolved
  dynamic jumps (sound for SPP metering). The CFG is intentionally kept
  over-approximate: using resolved targets to narrow edges would
  under-approximate the CFG when resolution is incomplete, causing
  lemma614Update to shift gas along non-existent edges.
- Tighten the SPP shift guard inside lemma614Update so a shift never
  crosses an isGasChunkTerminator boundary, and set MinSucc = 0 when
  encountering excluded successors to prevent unsafe shifting.
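The shift guard described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in evm_cache.cpp: the `Block` struct, its field names, and `shiftIntoParent` are hypothetical, and "excluded successor" is assumed to mean a successor that has more than one effective predecessor or that ends in a chunk terminator. Only the rule itself (move the minimum successor cost into the parent, and force that minimum to 0 as soon as one successor is excluded) is taken from the description.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified block record; names are illustrative only.
struct Block {
  uint64_t Metering = 0;             // gas charged when the block is entered
  std::vector<int> Succs;            // materialized successor block ids
  uint32_t StaticPreds = 0;          // materialized predecessor edges
  uint32_t ImplicitDynamicPreds = 0; // over-approximated dynamic-jump preds
  bool EndsWithTerminator = false;   // stand-in for an isGasChunkTerminator check
};

// Assumption: a successor may donate gas only if the parent is its sole
// (effective) predecessor and it does not end in a chunk terminator.
static bool isShiftable(const Block &Succ) {
  return Succ.StaticPreds + Succ.ImplicitDynamicPreds == 1 &&
         !Succ.EndsWithTerminator;
}

// Moves the common minimum of the successors' costs into Parent.
// Returns the shifted amount, which is 0 whenever shifting would be unsafe.
uint64_t shiftIntoParent(std::vector<Block> &Blocks, int Parent) {
  Block &P = Blocks[Parent];
  if (P.Succs.empty())
    return 0;
  uint64_t MinSucc = UINT64_MAX;
  for (int S : P.Succs) {
    if (!isShiftable(Blocks[S])) {
      MinSucc = 0; // one excluded successor disables the whole shift
      break;
    }
    MinSucc = std::min(MinSucc, Blocks[S].Metering);
  }
  if (MinSucc == 0)
    return 0;
  P.Metering += MinSucc;
  for (int S : P.Succs)
    Blocks[S].Metering -= MinSucc; // per-path totals stay unchanged
  return MinSucc;
}
```

Because the same amount is added to the parent and subtracted from every successor, the gas charged along any parent-to-successor path is the same before and after the shift, provided no other predecessor can enter a successor.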

Phase 2 — `decodePushAsJumpDest` decode helper

Factor decodePushAsJumpDest() out of resolveConstantJumpTarget() as
a shared decode helper.
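A helper of this kind might look like the sketch below. The signature, the `Sketch` suffix, and the separate JUMPDEST bitmap are assumptions made for illustration; the PR does not show the real decodePushAsJumpDest() body. The opcode values (PUSH1..PUSH32 = 0x60..0x7f, JUMPDEST = 0x5b) are the standard EVM encodings.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

constexpr uint8_t OP_JUMPDEST = 0x5b;
constexpr uint8_t OP_PUSH1 = 0x60;
constexpr uint8_t OP_PUSH32 = 0x7f;

// Marks every valid JUMPDEST, skipping bytes that are PUSH immediates.
std::vector<bool> buildJumpDestMap(const std::vector<uint8_t> &Code) {
  std::vector<bool> Map(Code.size(), false);
  for (size_t Pc = 0; Pc < Code.size(); ++Pc) {
    uint8_t Op = Code[Pc];
    if (Op == OP_JUMPDEST)
      Map[Pc] = true;
    else if (Op >= OP_PUSH1 && Op <= OP_PUSH32)
      Pc += Op - OP_PUSH1 + 1; // skip the immediate bytes
  }
  return Map;
}

// Hypothetical signature: returns the pushed constant only if it names a
// valid JUMPDEST inside the code, otherwise std::nullopt.
std::optional<size_t>
decodePushAsJumpDestSketch(const std::vector<uint8_t> &Code,
                           const std::vector<bool> &JumpDests, size_t PushPc) {
  if (PushPc >= Code.size())
    return std::nullopt;
  uint8_t Op = Code[PushPc];
  if (Op < OP_PUSH1 || Op > OP_PUSH32)
    return std::nullopt;
  size_t Len = Op - OP_PUSH1 + 1;
  if (PushPc + Len >= Code.size())
    return std::nullopt; // truncated immediate
  size_t Target = 0;
  for (size_t I = 1; I <= Len; ++I) {
    if (Target > (Code.size() >> 8))
      return std::nullopt; // next shift would already exceed the code size
    Target = (Target << 8) | Code[PushPc + I]; // big-endian immediate
  }
  if (Target >= Code.size() || !JumpDests[Target])
    return std::nullopt;
  return Target;
}
```

A static PUSH→JUMP edge would then exist only when the helper returns a value; any other jump falls into the over-approximated category.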

Phase 3 — wire SPP-shifted costs into the multipass JIT

- Add a second parallel cost array EVMBytecodeCache::GasChunkCostSPP
  populated from Metering[] inside buildGasChunksSPP. The existing
  GasChunkCost continues to hold unshifted Blocks[Id].Cost per the
  interpreter-safety invariant established by PR #371.
- Plumb the pointer through EVMFrontendContext::setGasChunkInfo and the
  EVMMirBuilder constructor / copy path.
- Swap reads at three JIT sites so they prefer GasChunkCostSPP when
  available, falling back to GasChunkCost:
  - EVMMirBuilder::meterOpcode: primary per-chunk-start charge
  - EVMMirBuilder::meterOpcodeRange: JUMPDEST-skip cumulative sum
  - buildEVMFunction: JUMPDEST-run suffix-sum precompute
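The "prefer SPP, fall back to unshifted" read can be pictured with the small sketch below. `ChunkCostView` and the three functions are hypothetical names, and the `uint64_t` element type and PC-indexed layout are assumptions; the real arrays live in EVMBytecodeCache and are read inside EVMMirBuilder.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical view of the two parallel arrays; the element type is assumed.
struct ChunkCostView {
  const uint64_t *GasChunkCost = nullptr;    // unshifted, always present
  const uint64_t *GasChunkCostSPP = nullptr; // SPP-shifted, null if SPP was skipped
};

// JIT read: prefer the shifted cost, fall back to the unshifted one.
uint64_t chunkCostForJIT(const ChunkCostView &V, size_t Pc) {
  const uint64_t *Src = V.GasChunkCostSPP ? V.GasChunkCostSPP : V.GasChunkCost;
  return Src[Pc];
}

// Interpreter read: always the unshifted cost.
uint64_t chunkCostForInterpreter(const ChunkCostView &V, size_t Pc) {
  return V.GasChunkCost[Pc];
}

// Cumulative cost over [Begin, End), as a JUMPDEST-skip sum would need.
uint64_t chunkCostRangeForJIT(const ChunkCostView &V, size_t Begin, size_t End) {
  uint64_t Sum = 0;
  for (size_t Pc = Begin; Pc < End; ++Pc)
    Sum += chunkCostForJIT(V, Pc);
  return Sum;
}
```

Keeping the selection in one place means the interpreter path cannot accidentally observe shifted costs, which is the property the PR #371 invariant depends on.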

Phase 4 — gate the SPP pipeline on JIT-consumer modules only

- Add buildBytecodeCache(..., bool EnableSPP = false). When false,
  skip the expensive CFG / metering pipeline entirely.
- Track EVMModule::CacheNeedsSPP. Flipped to true immediately before
  action::performEVMJITCompile runs, so interpreter-only modules never
  pay the SPP pipeline cost.
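The gating can be modelled with a lazily built cache whose SPP decision is captured at first build. Everything in this sketch (`CacheSketch`, `buildCacheSketch`, `ModuleSketch`, `requestSPP`) is a hypothetical stand-in; only the `EnableSPP = false` default and the CacheNeedsSPP flag name come from the PR text.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

struct CacheSketch {
  std::vector<uint64_t> GasChunkCost;
  std::vector<uint64_t> GasChunkCostSPP; // stays empty unless SPP ran
};

// Stand-in for buildBytecodeCache(..., bool EnableSPP = false).
CacheSketch buildCacheSketch(const std::vector<uint8_t> &Code,
                             bool EnableSPP = false) {
  CacheSketch C;
  C.GasChunkCost.assign(Code.size(), 0);
  if (EnableSPP) // the expensive CFG / metering pipeline would run here
    C.GasChunkCostSPP.assign(Code.size(), 0);
  return C;
}

class ModuleSketch {
public:
  explicit ModuleSketch(std::vector<uint8_t> Code) : Code(std::move(Code)) {}

  // Assumed to be called right before JIT compilation.
  void requestSPP() { CacheNeedsSPP = true; }

  // Lazy build: the flag value is read once, on the first call.
  const CacheSketch &getBytecodeCache() {
    if (!Cache)
      Cache = buildCacheSketch(Code, CacheNeedsSPP);
    return *Cache;
  }

private:
  std::vector<uint8_t> Code;
  bool CacheNeedsSPP = false;
  std::optional<CacheSketch> Cache;
};
```

In this model, calling `requestSPP()` after the first `getBytecodeCache()` has no effect, which is why the flag has to be set before any cache access.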

Phase 5 — CFG soundness fix

- Remove the two-pass CFG rebuild that used resolved call-site targets to
  replace over-approximate edges. This created an under-approximate CFG
  when call-site enumeration was incomplete, causing lemma614Update to
  shift gas along non-existent edges and produce unsafe metering.
- buildCFGEdges() now always over-approximates dynamic jumps
  logically: unresolved dynamic jumps are represented by
  ImplicitDynamicPredCount stamped onto JUMPDEST blocks, rather than by
  materializing one Succs/Preds edge per dynamic-jump/JUMPDEST pair.
  Static jumps (PUSH → JUMP) still get precise single-target edges.
- Remove dead call-site enumeration code (resolveCallSiteTargets and
  ResolvedJumpTargets export); no downstream consumer exists yet.
  The algorithm can be restored from git history when a consumer
  (e.g. MIR direct-branch optimization) is implemented.

Phase 6 — review fixes

- Remove dead GasBlock::Prev2Pc / Prev2Opcode fields and their
  writebacks. They were originally added to support a future
  3-instruction call-site window lookup, but the call-site enumeration
  that would have consumed them was removed in Phase 5. Whole-repo
  grep confirmed zero readers; struct shrinks ~9 bytes.
- Extend the buildCFGEdges function comment to make the soundness
  pairing with lemma614Update explicit: the implicit dynamic
  predecessor count is folded into effectivePredCount, so dynamic
  targets are treated as multi-predecessor blocks without materializing
  D×J CFG edges.
- Document the EVMModule::CacheNeedsSPP lifecycle invariant: the flag
  must be set before any getBytecodeCache() call, since the
  EnableSPP decision is fixed at lazy-build time.

Phase 7 — drop `O(D × J²)` over-approximation cost

- Replace the explicit add-then-split-critical-edge step that materialized
  one CFG edge per (dynamic-jump-block, JUMPDEST) pair with a per-JUMPDEST
  scalar ImplicitDynamicPredCount, folded into effectivePredCount so
  the lemma 6.14 update behaves identically without materializing the
  edges. On a contract with D dynamic jumps and J JUMPDESTs, the
  cache build drops from O(D × J²) to O(N).
- To keep dynamic-only JUMPDESTs (Solidity function returns, unreachable
  in the static-only CFG) visible to the dominator / loop analyses, seed
  the reachability search from every JUMPDEST after the static reachable
  set is built. Gated on ImplicitDynamicPredCount > 0 (round-2 review
  fix) so statically-dead JUMPDESTs in dynamic-jump-free contracts
  preserve pre-Phase-7 behavior.
- Compile-time check: end-to-end evmone-unittests wall clock for
  loop_full_of_jumpdests (24556 JUMPDESTs) drops from 7.3 s to
  3.3 s (local single-machine measurement, not CI-tracked).
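The implicit-count idea and the gated reachability seeding can be sketched as below. `BlockSketch` and the three functions are hypothetical, and stamping the total unresolved-jump count D onto every JUMPDEST block is an assumption about how the scalar is computed. The point of the sketch is that both passes are linear in the number of blocks and no D×J edge set is ever built.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical block record; only static (precise) edges are materialized.
struct BlockSketch {
  bool StartsWithJumpDest = false;
  bool EndsWithUnresolvedJump = false;
  std::vector<int> Succs;
  std::vector<int> Preds;
  uint32_t ImplicitDynamicPredCount = 0;
};

// Pass 1 counts unresolved dynamic jumps, pass 2 stamps every JUMPDEST block.
void stampImplicitDynamicPreds(std::vector<BlockSketch> &Blocks) {
  uint32_t D = 0;
  for (const BlockSketch &B : Blocks)
    if (B.EndsWithUnresolvedJump)
      ++D;
  for (BlockSketch &B : Blocks)
    B.ImplicitDynamicPredCount = B.StartsWithJumpDest ? D : 0;
}

// Materialized predecessors plus the implicit dynamic ones.
uint32_t effectivePredCount(const BlockSketch &B) {
  return static_cast<uint32_t>(B.Preds.size()) + B.ImplicitDynamicPredCount;
}

// Reachability over static edges from block 0, then re-seeded from every
// block that can be a dynamic-jump target (ImplicitDynamicPredCount > 0).
std::vector<bool> reachable(const std::vector<BlockSketch> &Blocks) {
  std::vector<bool> Seen(Blocks.size(), false);
  std::vector<int> Work;
  auto Push = [&](int Id) {
    if (!Seen[Id]) {
      Seen[Id] = true;
      Work.push_back(Id);
    }
  };
  auto Drain = [&] {
    while (!Work.empty()) {
      int Id = Work.back();
      Work.pop_back();
      for (int S : Blocks[Id].Succs)
        Push(S);
    }
  };
  if (!Blocks.empty())
    Push(0);
  Drain();
  for (size_t I = 0; I < Blocks.size(); ++I)
    if (Blocks[I].ImplicitDynamicPredCount > 0)
      Push(static_cast<int>(I));
  Drain();
  return Seen;
}
```

With no unresolved jump in the contract, every count is 0, nothing is re-seeded, and a statically dead JUMPDEST stays unreachable, which matches the pre-Phase-7 behavior the gate is meant to preserve.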

Intra-PR demo — same source built on the commit immediately
before this Phase 7 commit (99f23a3) vs current HEAD, on a
synthetic contract CALLDATALOAD JUMP <N × JUMPDEST> STOP:

N JUMPDESTs	Pre-Phase-7 (D × J explicit edges)	Phase 7 (O(N) implicit count)	Speedup
100	0.07 ms	0.05 ms	1.4×
500	0.39 ms	0.13 ms	3.0×
1,000	1.01 ms	0.29 ms	3.4×
2,000	3.04 ms	0.67 ms	4.5×
5,000	19.66 ms	2.71 ms	7.2×
10,000	84.76 ms	10.38 ms	8.2×
20,000	345.94 ms	43.68 ms	7.9×

Pre-Phase-7 grows about 4× per doubling of N at the larger sizes
(quadratic, the expected O(D × J²) shape with D = 1 here). Phase 7
grows from about 2.2× per doubling at N ≤ 2,000 to about 4× at
N ≥ 10,000, so the measured curve is sub-quadratic only at the smaller
sizes; the residual super-linearity comes from computeDominators /
buildLoopsUsingDominance, which this PR does not touch. Reproduce
with cmake --build build --target evmCacheComplexityDemo && bash docs/changes/2026-05-11-spp-cfg-implicit-dyn-pred/scaling_demo.sh.
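The growth rates can be checked from the table with a local scaling exponent k, defined so that T ∝ N^k between two rows: k = ln(T2/T1) / ln(N2/N1). The helper below is an illustrative calculation only (the function name is made up, and the timings in the table are rounded to two decimals, so the exponents are approximate).

```cpp
#include <cassert>
#include <cmath>

// Local scaling exponent k such that T ~ N^k between two measurements.
// k near 1 suggests linear growth, k near 2 suggests quadratic growth.
double scalingExponent(double N1, double T1, double N2, double T2) {
  assert(N1 > 0 && N2 > N1 && T1 > 0 && T2 > 0);
  return std::log(T2 / T1) / std::log(N2 / N1);
}
```

For example, `scalingExponent(10000, 84.76, 20000, 345.94)` covers the last Pre-Phase-7 doubling and `scalingExponent(500, 0.13, 1000, 0.29)` covers an early Phase 7 doubling; rows that are not exact doublings (100 → 500, 2,000 → 5,000) can be compared the same way because the formula normalizes by N2/N1.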

New evmCacheTests unit test target with 4 smoke/regression cases
covering the SPP gate, dynamic-target reachability path, interpreter-only
no-SPP path, and multi-dyn-jump conservative metering.
Detail: docs/changes/2026-05-11-spp-cfg-implicit-dyn-pred/README.md
and docs/changes/2026-05-11-spp-cfg-implicit-dyn-pred/review-fixes-r2.md.

Evaluation

Runtime

Measured after rebasing onto current upstream/main (which includes the
intervening upstream perf work: #458 u256 arithmetic, #460
displacement-addressed bytes32, #482 depth-indexed pool, #483 inline
arithmetic delegate), on the 27 paper benches
(^external/total/(main|micro)/):

- 27-bench 10-rep geomean: +1.15% (treatment slower; +0.46% after
  correcting a single 20-rep-confirmed outlier on
  main/blake2b_shifts/8415nulls).
- 0 benches above the ±25% CI gate.

Caveat: the 10-rep run is sequential
(baseline-all-then-treatment-all), so it conflates real PR delta with
inter-binary system drift. Focused 20-rep re-measurement of the three
largest movers, plus the blake2b_shifts outlier, indicates that the
per-bench deltas are dominated by drift, not by PR effects:

Bench	10-rep Δ	20-rep Δ (focused)	Verdict
`main/weierstrudel/1`	+3.51%	+0.55% (treat CV 2.19%)	drift
`main/blake2b_huff/8415nulls`	-6.30%	+1.55%	drift (flipped direction)
`micro/loop_with_many_jumpdests/empty`	-4.84%	-0.55%	drift
`main/blake2b_shifts/8415nulls`	+20.34% (CV 21.93%)	+0.25% (CV 2.09%)	single-iteration outlier

Additionally, three of the four "regression" benches reported above
the noise band at 10 reps (micro/memory_grow_mstore/{nogrow,by1} and
micro/memory_grow_mload/nogrow) contain zero JUMP / JUMPI /
JUMPDEST opcodes, so PR #446's CFG changes cannot affect them by
construction. Those deltas are pure drift artifacts.

Reading: this PR's value is the capability change (Phase 1 — SPP
now applies to contracts with dynamic jumps) and the algorithmic
complexity guarantee (Phase 7 — O(D × J²) → O(N) cache build), not raw
runtime improvement on the existing 27-bench suite. The intervening
upstream perf work absorbed any prior absolute speedup; the remaining
per-bench deltas are at or below evmone-bench's single-machine
inter-binary drift band.

Interpreter

0 regressions on CI. Local timing confirms the SPP gating bypass:

test suite	before Phase 4	after Phase 4
`evmone-unittests` interpreter	3744 ms	419 ms (−89%)

Correctness

tools/format.sh check: clean
evmone-unittests multipass: 223/223 pass
evmone-unittests interpreter: 215/215 pass
evmone-statetest --fork Cancun multipass: 2723/2723 pass
evmone-statetest --fork Cancun interpreter: 2723/2723 pass
New evmCacheTests unit tests: 4/4 pass

Changed files

- src/evm/evm_cache.h: add GasChunkCostSPP array;
  GasBlock::ImplicitDynamicPredCount field
- src/evm/evm_cache.cpp: mixed-CFG, SPP export, EnableSPP gating,
  soundness fix (always over-approximate CFG); drop dead
  Prev2Pc/Prev2Opcode; clarify CFG over-approx invariant;
  implicit-dyn-pred count + reachability stitch with R2 gate (Phase 7)
- src/compiler/evm_frontend/evm_mir_compiler.{h,cpp}: plumb SPP
  pointer; prefer SPP-shifted cost at three chunk-cost read sites
- src/compiler/evm_compiler.cpp: pass SPP pointer via setGasChunkInfo
- src/runtime/evm_module.{h,cpp}: add CacheNeedsSPP flag; flip
  before JIT compile; document lifecycle invariant
- src/tests/evm_cache_tests.cpp: new unit test target
- src/tests/CMakeLists.txt: register new test target
- docs/changes/2026-04-05-gas-check-placement/: change doc +
  review-fix plan
- docs/changes/2026-05-11-spp-cfg-implicit-dyn-pred/: Phase 7
  change doc + round-2 review-fix plan
- docs/design/evm-gas-mechanism.md: design doc (interpreter + JIT
  gas mechanism with SPP)

Test plan

tools/format.sh check clean
evmone-unittests multipass and interpreter: all pass
evmone-statetest --fork Cancun multipass and interpreter: all pass
New evmCacheTests 4 cases pass
CI perf regression check passes within the ±25% gate
loop_full_of_jumpdests cache build under 4s (compile-time
complexity verification, local single-machine)

🤖 Generated with Claude Code

Copilot

Pull request overview

This PR improves EVM SPP (strategic placement of gas checks) by always constructing a CFG with mixed-precision jump edges, adding a bytecode analysis pass to resolve common Solidity internal-return SWAPn → JUMP patterns via call-site enumeration, fixing an SPP output writeback bug, and plumbing resolved jump targets into the MIR compiler to enable more direct branching.

Changes:

Build CFG even with unresolved dynamic jumps, using resolved edges where available and over-approximated edges otherwise.
Add call-site enumeration to resolve SWAPn → JUMP return targets and export them through EVMBytecodeCache.
Fix SPP results being computed but not written to GasChunkCost, and use resolved targets for a direct-branch fast path in MIR for single-target jumps.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

File	Description
`src/evm/evm_cache.h`	Add `ResolvedJumpTargets` to bytecode cache for cross-phase consumption.
`src/evm/evm_cache.cpp`	Implement call-site enumeration, CFG edge builder with mixed precision, and fix SPP cost writeback.
`src/compiler/evm_frontend/evm_mir_compiler.h`	Add frontend context plumbing for resolved targets and track `CurrentInstrPC`.
`src/compiler/evm_frontend/evm_mir_compiler.cpp`	Use resolved jump targets to emit direct branch for single-target dynamic `JUMP`.
`src/compiler/evm_compiler.cpp`	Pass resolved targets from module cache into frontend compilation context.
`src/action/evm_bytecode_visitor.h`	Set current instruction PC before invoking jump handlers.

github-actions · 2026-04-05T14:01:29Z

⚡ Performance Regression Check Results

✅ Performance Check Passed (interpreter)

Performance Benchmark Results (threshold: 25%)

Benchmark	Baseline (us)	Current (us)	Change	Status
total/main/blake2b_huff/8415nulls	2.51	2.60	+3.7%	PASS
total/main/blake2b_huff/empty	0.04	0.04	+3.1%	PASS
total/main/blake2b_shifts/8415nulls	20.15	20.61	+2.3%	PASS
total/main/sha1_divs/5311	8.64	8.59	-0.7%	PASS
total/main/sha1_divs/empty	0.10	0.11	+5.6%	PASS
total/main/sha1_shifts/5311	6.08	6.34	+4.4%	PASS
total/main/sha1_shifts/empty	0.07	0.08	+4.1%	PASS
total/main/snailtracer/benchmark	73.96	74.04	+0.1%	PASS
total/main/structarray_alloc/nfts_rank	1.46	1.38	-5.5%	PASS
total/main/swap_math/insufficient_liquidity	0.00	0.00	+0.9%	PASS
total/main/swap_math/received	0.01	0.01	+0.4%	PASS
total/main/swap_math/spent	0.01	0.01	+1.1%	PASS
total/main/weierstrudel/1	0.30	0.29	-3.9%	PASS
total/main/weierstrudel/15	3.41	3.18	-6.8%	PASS
total/micro/JUMPDEST_n0/empty	3.01	2.63	-12.5%	PASS
total/micro/jump_around/empty	0.08	0.10	+13.9%	PASS
total/micro/loop_with_many_jumpdests/empty	45.88	40.23	-12.3%	PASS
total/micro/memory_grow_mload/by1	0.13	0.13	-6.0%	PASS
total/micro/memory_grow_mload/by16	0.14	0.13	-5.2%	PASS
total/micro/memory_grow_mload/by32	0.16	0.15	-7.2%	PASS
total/micro/memory_grow_mload/nogrow	0.13	0.12	-3.3%	PASS
total/micro/memory_grow_mstore/by1	0.13	0.13	-4.8%	PASS
total/micro/memory_grow_mstore/by16	0.15	0.14	-4.8%	PASS
total/micro/memory_grow_mstore/by32	0.16	0.15	-5.0%	PASS
total/micro/memory_grow_mstore/nogrow	0.13	0.12	-8.8%	PASS
total/micro/signextend/one	0.27	0.28	+1.6%	PASS
total/micro/signextend/zero	0.27	0.28	+2.0%	PASS
total/synth/ADD/b0	3.22	3.22	-0.0%	PASS
total/synth/ADD/b1	3.81	3.57	-6.4%	PASS
total/synth/ADDRESS/a0	5.74	4.81	-16.2%	PASS
total/synth/ADDRESS/a1	6.29	5.34	-15.1%	PASS
total/synth/AND/b0	3.16	3.10	-1.9%	PASS
total/synth/AND/b1	3.64	3.50	-4.0%	PASS
total/synth/BYTE/b0	6.90	6.09	-11.8%	PASS
total/synth/BYTE/b1	5.78	5.11	-11.7%	PASS
total/synth/CALLDATASIZE/a0	3.44	3.51	+2.1%	PASS
total/synth/CALLDATASIZE/a1	4.13	3.58	-13.4%	PASS
total/synth/CALLER/a0	5.70	4.81	-15.7%	PASS
total/synth/CALLER/a1	6.28	5.34	-15.0%	PASS
total/synth/CALLVALUE/a0	4.33	3.51	-19.0%	PASS
total/synth/CALLVALUE/a1	4.41	3.76	-14.6%	PASS
total/synth/CODESIZE/a0	3.96	3.76	-5.1%	PASS
total/synth/CODESIZE/a1	4.09	4.01	-1.8%	PASS
total/synth/DUP1/d0	1.57	1.39	-11.5%	PASS
total/synth/DUP1/d1	1.96	1.76	-10.3%	PASS
total/synth/DUP10/d0	1.57	1.39	-11.5%	PASS
total/synth/DUP10/d1	2.05	1.73	-15.3%	PASS
total/synth/DUP11/d0	1.46	1.15	-21.1%	PASS
total/synth/DUP11/d1	2.05	1.73	-15.3%	PASS
total/synth/DUP12/d0	1.57	1.39	-11.5%	PASS
total/synth/DUP12/d1	2.05	1.73	-15.4%	PASS
total/synth/DUP13/d0	1.57	1.39	-11.5%	PASS
total/synth/DUP13/d1	1.96	1.73	-11.7%	PASS
total/synth/DUP14/d0	1.48	1.39	-6.2%	PASS
total/synth/DUP14/d1	2.05	1.73	-15.3%	PASS
total/synth/DUP15/d0	1.57	1.39	-11.3%	PASS
total/synth/DUP15/d1	2.05	1.73	-15.4%	PASS
total/synth/DUP16/d0	1.57	1.39	-11.3%	PASS
total/synth/DUP16/d1	1.96	1.73	-11.6%	PASS
total/synth/DUP2/d0	1.48	1.39	-6.3%	PASS
total/synth/DUP2/d1	2.05	1.73	-15.4%	PASS
total/synth/DUP3/d0	1.51	1.15	-23.6%	PASS
total/synth/DUP3/d1	2.05	1.73	-15.4%	PASS
total/synth/DUP4/d0	1.48	1.39	-6.2%	PASS
total/synth/DUP4/d1	2.05	1.73	-15.4%	PASS
total/synth/DUP5/d0	1.57	1.39	-11.4%	PASS
total/synth/DUP5/d1	2.06	1.73	-15.7%	PASS
total/synth/DUP6/d0	1.57	1.39	-11.5%	PASS
total/synth/DUP6/d1	2.05	1.73	-15.4%	PASS
total/synth/DUP7/d0	1.57	1.39	-11.6%	PASS
total/synth/DUP7/d1	1.96	1.73	-11.6%	PASS
total/synth/DUP8/d0	1.44	1.39	-3.3%	PASS
total/synth/DUP8/d1	2.05	1.73	-15.4%	PASS
total/synth/DUP9/d0	1.57	1.15	-26.7%	PASS
total/synth/DUP9/d1	2.05	1.73	-15.3%	PASS
total/synth/EQ/b0	6.10	5.32	-12.8%	PASS
total/synth/EQ/b1	6.59	5.61	-14.8%	PASS
total/synth/GAS/a0	4.21	3.83	-8.9%	PASS
total/synth/GAS/a1	4.49	4.00	-11.1%	PASS
total/synth/GT/b0	5.79	5.38	-7.1%	PASS
total/synth/GT/b1	6.20	5.35	-13.7%	PASS
total/synth/ISZERO/u0	9.63	8.33	-13.6%	PASS
total/synth/JUMPDEST/n0	3.01	2.63	-12.5%	PASS
total/synth/LT/b0	5.78	5.38	-6.9%	PASS
total/synth/LT/b1	6.21	5.35	-13.8%	PASS
total/synth/MSIZE/a0	4.98	4.33	-13.1%	PASS
total/synth/MSIZE/a1	5.67	4.85	-14.4%	PASS
total/synth/MUL/b0	6.33	5.49	-13.2%	PASS
total/synth/MUL/b1	6.78	5.93	-12.5%	PASS
total/synth/NOT/u0	5.53	5.06	-8.5%	PASS
total/synth/OR/b0	3.04	3.02	-0.8%	PASS
total/synth/OR/b1	3.54	3.41	-3.5%	PASS
total/synth/PC/a0	3.68	3.59	-2.5%	PASS
total/synth/PC/a1	4.12	3.61	-12.3%	PASS
total/synth/PUSH1/p0	1.48	1.39	-6.1%	PASS
total/synth/PUSH1/p1	2.16	1.82	-15.7%	PASS
total/synth/PUSH10/p0	1.51	1.39	-7.6%	PASS
total/synth/PUSH10/p1	2.07	1.83	-11.3%	PASS
total/synth/PUSH11/p0	1.58	1.39	-11.6%	PASS
total/synth/PUSH11/p1	2.15	1.82	-15.3%	PASS
total/synth/PUSH12/p0	1.51	1.39	-8.0%	PASS
total/synth/PUSH12/p1	2.07	1.82	-11.9%	PASS
total/synth/PUSH13/p0	1.52	1.31	-13.5%	PASS
total/synth/PUSH13/p1	2.08	1.83	-12.1%	PASS
total/synth/PUSH14/p0	1.52	1.42	-6.5%	PASS
total/synth/PUSH14/p1	1.89	1.83	-3.3%	PASS
total/synth/PUSH15/p0	1.52	1.39	-8.4%	PASS
total/synth/PUSH15/p1	2.10	1.92	-8.4%	PASS
total/synth/PUSH16/p0	1.50	1.39	-7.0%	PASS
total/synth/PUSH16/p1	2.10	1.83	-12.7%	PASS
total/synth/PUSH17/p0	1.52	1.39	-8.2%	PASS
total/synth/PUSH17/p1	2.07	1.83	-11.7%	PASS
total/synth/PUSH18/p0	1.52	1.32	-13.1%	PASS
total/synth/PUSH18/p1	2.09	1.85	-11.7%	PASS
total/synth/PUSH19/p0	1.56	1.31	-16.1%	PASS
total/synth/PUSH19/p1	2.09	1.85	-11.6%	PASS
total/synth/PUSH2/p0	1.51	1.40	-7.2%	PASS
total/synth/PUSH2/p1	2.16	1.82	-15.6%	PASS
total/synth/PUSH20/p0	1.51	1.40	-7.2%	PASS
total/synth/PUSH20/p1	2.11	1.85	-12.4%	PASS
total/synth/PUSH21/p0	1.53	1.31	-13.9%	PASS
total/synth/PUSH21/p1	2.11	1.83	-13.2%	PASS
total/synth/PUSH22/p0	1.52	1.39	-8.1%	PASS
total/synth/PUSH22/p1	2.17	1.86	-14.2%	PASS
total/synth/PUSH23/p0	1.52	1.32	-13.0%	PASS
total/synth/PUSH23/p1	2.16	1.88	-12.9%	PASS
total/synth/PUSH24/p0	1.51	1.39	-7.7%	PASS
total/synth/PUSH24/p1	1.90	1.83	-3.7%	PASS
total/synth/PUSH25/p0	1.51	1.39	-7.9%	PASS
total/synth/PUSH25/p1	2.08	1.83	-11.9%	PASS
total/synth/PUSH26/p0	1.51	1.39	-7.7%	PASS
total/synth/PUSH26/p1	2.08	1.83	-11.7%	PASS
total/synth/PUSH27/p0	1.52	1.32	-13.0%	PASS
total/synth/PUSH27/p1	2.08	1.84	-11.6%	PASS
total/synth/PUSH28/p0	1.52	1.34	-12.1%	PASS
total/synth/PUSH28/p1	2.08	1.85	-11.1%	PASS
total/synth/PUSH29/p0	1.53	1.32	-13.5%	PASS
total/synth/PUSH29/p1	2.17	1.86	-14.1%	PASS
total/synth/PUSH3/p0	1.52	1.39	-8.2%	PASS
total/synth/PUSH3/p1	2.07	1.64	-21.2%	PASS
total/synth/PUSH30/p0	1.58	1.54	-3.0%	PASS
total/synth/PUSH30/p1	2.10	1.65	-21.4%	PASS
total/synth/PUSH31/p0	1.58	1.39	-11.5%	PASS
total/synth/PUSH31/p1	2.21	1.80	-18.3%	PASS
total/synth/PUSH32/p0	1.53	1.39	-9.3%	PASS
total/synth/PUSH32/p1	2.17	1.83	-15.4%	PASS
total/synth/PUSH4/p0	1.51	1.39	-7.7%	PASS
total/synth/PUSH4/p1	2.16	1.60	-25.8%	PASS
total/synth/PUSH5/p0	1.51	1.39	-7.7%	PASS
total/synth/PUSH5/p1	2.07	1.83	-11.3%	PASS
total/synth/PUSH6/p0	1.51	1.39	-8.0%	PASS
total/synth/PUSH6/p1	1.81	1.82	+0.7%	PASS
total/synth/PUSH7/p0	1.52	1.39	-8.0%	PASS
total/synth/PUSH7/p1	2.08	1.85	-11.0%	PASS
total/synth/PUSH8/p0	1.55	1.30	-16.2%	PASS
total/synth/PUSH8/p1	2.16	1.82	-15.4%	PASS
total/synth/PUSH9/p0	1.57	1.40	-11.0%	PASS
total/synth/PUSH9/p1	1.95	1.83	-6.0%	PASS
total/synth/RETURNDATASIZE/a0	3.88	3.91	+0.7%	PASS
total/synth/RETURNDATASIZE/a1	4.17	3.93	-5.7%	PASS
total/synth/SAR/b0	4.44	3.92	-11.7%	PASS
total/synth/SAR/b1	5.23	4.71	-10.0%	PASS
total/synth/SGT/b0	4.39	4.60	+4.7%	PASS
total/synth/SGT/b1	5.06	4.11	-18.8%	PASS
total/synth/SHL/b0	3.94	3.60	-8.8%	PASS
total/synth/SHL/b1	3.86	3.65	-5.3%	PASS
total/synth/SHR/b0	3.64	3.47	-4.6%	PASS
total/synth/SHR/b1	3.86	3.68	-4.5%	PASS
total/synth/SIGNEXTEND/b0	3.60	3.44	-4.3%	PASS
total/synth/SIGNEXTEND/b1	4.10	3.82	-6.8%	PASS
total/synth/SLT/b0	4.38	4.28	-2.3%	PASS
total/synth/SLT/b1	4.88	4.08	-16.3%	PASS
total/synth/SUB/b0	3.22	3.22	-0.1%	PASS
total/synth/SUB/b1	3.73	3.48	-6.6%	PASS
total/synth/SWAP1/s0	3.43	3.43	+0.0%	PASS
total/synth/SWAP10/s0	3.45	3.46	+0.1%	PASS
total/synth/SWAP11/s0	3.45	3.45	+0.1%	PASS
total/synth/SWAP12/s0	3.46	3.46	+0.2%	PASS
total/synth/SWAP13/s0	3.46	3.46	+0.0%	PASS
total/synth/SWAP14/s0	3.46	3.47	+0.1%	PASS
total/synth/SWAP15/s0	3.30	3.74	+13.4%	PASS
total/synth/SWAP16/s0	3.39	3.49	+3.0%	PASS
total/synth/SWAP2/s0	3.44	3.44	+0.1%	PASS
total/synth/SWAP3/s0	3.44	3.44	+0.0%	PASS
total/synth/SWAP4/s0	3.44	3.45	+0.2%	PASS
total/synth/SWAP5/s0	3.44	3.45	+0.1%	PASS
total/synth/SWAP6/s0	3.44	3.45	+0.3%	PASS
total/synth/SWAP7/s0	3.45	3.45	-0.0%	PASS
total/synth/SWAP8/s0	3.45	3.45	+0.2%	PASS
total/synth/SWAP9/s0	3.45	3.46	+0.3%	PASS
total/synth/XOR/b0	3.14	3.10	-1.4%	PASS
total/synth/XOR/b1	3.72	3.49	-6.0%	PASS
total/synth/loop_v1	7.05	6.69	-5.1%	PASS
total/synth/loop_v2	7.07	6.74	-4.5%	PASS

Summary: 194 benchmarks, 0 regressions

✅ Performance Check Passed (multipass)

Performance Benchmark Results (threshold: 25%)

Benchmark	Baseline (us)	Current (us)	Change	Status
total/main/blake2b_huff/8415nulls	0.83	0.83	-0.1%	PASS
total/main/blake2b_huff/empty	0.02	0.02	-0.7%	PASS
total/main/blake2b_shifts/8415nulls	4.43	4.40	-0.7%	PASS
total/main/sha1_divs/5311	0.58	0.58	-0.3%	PASS
total/main/sha1_divs/empty	0.01	0.01	-0.3%	PASS
total/main/sha1_shifts/5311	0.54	0.54	+0.4%	PASS
total/main/sha1_shifts/empty	0.01	0.01	+0.5%	PASS
total/main/snailtracer/benchmark	31.20	30.96	-0.8%	PASS
total/main/structarray_alloc/nfts_rank	0.27	0.27	-1.0%	PASS
total/main/swap_math/insufficient_liquidity	0.00	0.00	-0.0%	PASS
total/main/swap_math/received	0.00	0.00	-0.5%	PASS
total/main/swap_math/spent	0.00	0.00	+0.6%	PASS
total/main/weierstrudel/1	0.24	0.24	+0.1%	PASS
total/main/weierstrudel/15	2.58	2.57	-0.4%	PASS
total/micro/JUMPDEST_n0/empty	0.00	0.00	+0.3%	PASS
total/micro/jump_around/empty	0.06	0.06	+3.0%	PASS
total/micro/loop_with_many_jumpdests/empty	0.00	0.00	-0.2%	PASS
total/micro/memory_grow_mload/by1	0.01	0.01	+0.6%	PASS
total/micro/memory_grow_mload/by16	0.01	0.01	+0.9%	PASS
total/micro/memory_grow_mload/by32	0.01	0.01	-1.3%	PASS
total/micro/memory_grow_mload/nogrow	0.01	0.01	-0.5%	PASS
total/micro/memory_grow_mstore/by1	0.01	0.01	+1.0%	PASS
total/micro/memory_grow_mstore/by16	0.01	0.01	-0.2%	PASS
total/micro/memory_grow_mstore/by32	0.01	0.01	-1.3%	PASS
total/micro/memory_grow_mstore/nogrow	0.01	0.01	-0.6%	PASS
total/micro/signextend/one	0.08	0.08	+1.3%	PASS
total/micro/signextend/zero	0.08	0.08	+1.2%	PASS
total/synth/ADD/b0	0.00	0.00	+0.1%	PASS
total/synth/ADD/b1	0.00	0.00	-0.5%	PASS
total/synth/ADDRESS/a0	0.15	0.15	-0.1%	PASS
total/synth/ADDRESS/a1	0.15	0.15	+0.0%	PASS
total/synth/AND/b0	0.00	0.00	+0.1%	PASS
total/synth/AND/b1	0.00	0.00	+0.3%	PASS
total/synth/BYTE/b0	0.00	0.00	+0.1%	PASS
total/synth/BYTE/b1	0.00	0.00	+0.3%	PASS
total/synth/CALLDATASIZE/a0	0.07	0.07	-0.0%	PASS
total/synth/CALLDATASIZE/a1	0.07	0.07	-0.0%	PASS
total/synth/CALLER/a0	0.18	0.18	-0.3%	PASS
total/synth/CALLER/a1	0.18	0.18	-0.2%	PASS
total/synth/CALLVALUE/a0	0.19	0.19	+1.3%	PASS
total/synth/CALLVALUE/a1	0.19	0.19	+1.2%	PASS
total/synth/CODESIZE/a0	0.07	0.07	-0.0%	PASS
total/synth/CODESIZE/a1	0.07	0.07	+0.0%	PASS
total/synth/DUP1/d0	0.00	0.00	-0.1%	PASS
total/synth/DUP1/d1	0.00	0.00	-0.0%	PASS
total/synth/DUP10/d0	0.00	0.00	+0.4%	PASS
total/synth/DUP10/d1	0.00	0.00	+0.0%	PASS
total/synth/DUP11/d0	0.00	0.00	+0.0%	PASS
total/synth/DUP11/d1	0.00	0.00	+0.2%	PASS
total/synth/DUP12/d0	0.00	0.00	+0.1%	PASS
total/synth/DUP12/d1	0.00	0.00	+0.2%	PASS
total/synth/DUP13/d0	0.00	0.00	+0.2%	PASS
total/synth/DUP13/d1	0.00	0.00	-0.0%	PASS
total/synth/DUP14/d0	0.00	0.00	-0.0%	PASS
total/synth/DUP14/d1	0.00	0.00	+0.4%	PASS
total/synth/DUP15/d0	0.00	0.00	+0.0%	PASS
total/synth/DUP15/d1	0.00	0.00	+0.0%	PASS
total/synth/DUP16/d0	0.00	0.00	-0.1%	PASS
total/synth/DUP16/d1	0.00	0.00	+0.2%	PASS
total/synth/DUP2/d0	0.00	0.00	+0.3%	PASS
total/synth/DUP2/d1	0.00	0.00	+0.3%	PASS
total/synth/DUP3/d0	0.00	0.00	+0.2%	PASS
total/synth/DUP3/d1	0.00	0.00	+0.4%	PASS
total/synth/DUP4/d0	0.00	0.00	+0.5%	PASS
total/synth/DUP4/d1	0.00	0.00	+0.1%	PASS
total/synth/DUP5/d0	0.00	0.00	+0.3%	PASS
total/synth/DUP5/d1	0.00	0.00	-0.3%	PASS
total/synth/DUP6/d0	0.00	0.00	+0.4%	PASS
total/synth/DUP6/d1	0.00	0.00	+0.4%	PASS
total/synth/DUP7/d0	0.00	0.00	-0.0%	PASS
total/synth/DUP7/d1	0.00	0.00	+0.0%	PASS
total/synth/DUP8/d0	0.00	0.00	-0.1%	PASS
total/synth/DUP8/d1	0.00	0.00	-0.1%	PASS
total/synth/DUP9/d0	0.00	0.00	-0.2%	PASS
total/synth/DUP9/d1	0.00	0.00	-0.1%	PASS
total/synth/EQ/b0	0.00	0.00	-0.1%	PASS
total/synth/EQ/b1	0.00	0.00	+0.2%	PASS
total/synth/GAS/a0	0.76	0.76	+0.0%	PASS
total/synth/GAS/a1	0.76	0.76	+0.0%	PASS
total/synth/GT/b0	0.00	0.00	-0.3%	PASS
total/synth/GT/b1	0.00	0.00	+0.1%	PASS
total/synth/ISZERO/u0	0.00	0.00	+0.1%	PASS
total/synth/JUMPDEST/n0	0.00	0.00	+0.4%	PASS
total/synth/LT/b0	0.00	0.00	+0.2%	PASS
total/synth/LT/b1	0.00	0.00	+0.2%	PASS
total/synth/MSIZE/a0	0.00	0.00	+0.1%	PASS
total/synth/MSIZE/a1	0.00	0.00	+0.0%	PASS
total/synth/MUL/b0	0.00	0.00	+0.1%	PASS
total/synth/MUL/b1	0.00	0.00	+0.2%	PASS
total/synth/NOT/u0	0.00	0.00	+0.3%	PASS
total/synth/OR/b0	0.00	0.00	+0.3%	PASS
total/synth/OR/b1	0.00	0.00	+0.1%	PASS
total/synth/PC/a0	0.00	0.00	+0.1%	PASS
total/synth/PC/a1	0.00	0.00	-0.2%	PASS
total/synth/PUSH1/p0	0.00	0.00	+0.2%	PASS
total/synth/PUSH1/p1	0.00	0.00	+0.1%	PASS
total/synth/PUSH10/p0	0.00	0.00	+2.1%	PASS
total/synth/PUSH10/p1	0.00	0.00	+2.5%	PASS
total/synth/PUSH11/p0	0.00	0.00	-0.3%	PASS
total/synth/PUSH11/p1	0.00	0.00	+1.7%	PASS
total/synth/PUSH12/p0	0.00	0.00	+3.3%	PASS
total/synth/PUSH12/p1	0.00	0.00	+0.5%	PASS
total/synth/PUSH13/p0	0.00	0.00	+0.4%	PASS
total/synth/PUSH13/p1	0.00	0.00	+0.4%	PASS
total/synth/PUSH14/p0	0.00	0.00	+1.4%	PASS
total/synth/PUSH14/p1	0.00	0.00	+0.3%	PASS
total/synth/PUSH15/p0	0.00	0.00	-0.1%	PASS
total/synth/PUSH15/p1	0.00	0.00	+0.4%	PASS
total/synth/PUSH16/p0	0.00	0.00	-0.4%	PASS
total/synth/PUSH16/p1	0.00	0.00	+1.5%	PASS
total/synth/PUSH17/p0	0.00	0.00	+0.4%	PASS
total/synth/PUSH17/p1	0.00	0.00	-0.3%	PASS
total/synth/PUSH18/p0	0.00	0.00	+0.6%	PASS
total/synth/PUSH18/p1	0.00	0.00	+1.0%	PASS
total/synth/PUSH19/p0	0.00	0.00	+0.4%	PASS
total/synth/PUSH19/p1	0.00	0.00	-0.1%	PASS
total/synth/PUSH2/p0	0.00	0.00	-0.6%	PASS
total/synth/PUSH2/p1	0.00	0.00	+0.2%	PASS
total/synth/PUSH20/p0	0.00	0.00	+0.1%	PASS
total/synth/PUSH20/p1	0.00	0.00	+0.1%	PASS
total/synth/PUSH21/p0	0.00	0.00	-0.2%	PASS
total/synth/PUSH21/p1	0.00	0.00	+0.4%	PASS
total/synth/PUSH22/p0	1.40	1.40	-0.0%	PASS
total/synth/PUSH22/p1	1.59	1.59	-0.3%	PASS
total/synth/PUSH23/p0	1.39	1.39	-0.0%	PASS
total/synth/PUSH23/p1	1.59	1.59	+0.0%	PASS
total/synth/PUSH24/p0	1.40	1.40	+0.1%	PASS
total/synth/PUSH24/p1	1.58	1.59	+0.1%	PASS
total/synth/PUSH25/p0	1.40	1.40	+0.0%	PASS
total/synth/PUSH25/p1	1.59	1.58	-0.1%	PASS
total/synth/PUSH26/p0	1.31	1.32	+0.7%	PASS
total/synth/PUSH26/p1	1.59	1.60	+1.0%	PASS
total/synth/PUSH27/p0	1.40	1.40	+0.1%	PASS
total/synth/PUSH27/p1	1.61	1.61	+0.1%	PASS
total/synth/PUSH28/p0	1.40	1.40	-0.2%	PASS
total/synth/PUSH28/p1	1.61	1.61	+0.3%	PASS
total/synth/PUSH29/p0	1.40	1.39	-0.0%	PASS
total/synth/PUSH29/p1	1.59	1.60	+0.6%	PASS
total/synth/PUSH3/p0	0.00	0.00	+0.5%	PASS
total/synth/PUSH3/p1	0.00	0.00	+0.1%	PASS
total/synth/PUSH30/p0	1.50	1.48	-0.8%	PASS
total/synth/PUSH30/p1	1.62	1.62	+0.3%	PASS
total/synth/PUSH31/p0	1.40	1.40	-0.3%	PASS
total/synth/PUSH31/p1	1.77	1.69	-5.0%	PASS
total/synth/PUSH32/p0	1.40	1.40	+0.1%	PASS
total/synth/PUSH32/p1	1.61	1.61	+0.0%	PASS
total/synth/PUSH4/p0	0.00	0.00	-0.1%	PASS
total/synth/PUSH4/p1	0.00	0.00	+0.9%	PASS
total/synth/PUSH5/p0	0.00	0.00	+0.2%	PASS
total/synth/PUSH5/p1	0.00	0.00	+0.1%	PASS
total/synth/PUSH6/p0	0.00	0.00	-1.1%	PASS
total/synth/PUSH6/p1	0.00	0.00	-1.4%	PASS
total/synth/PUSH7/p0	0.00	0.00	+1.4%	PASS
total/synth/PUSH7/p1	0.00	0.00	+1.6%	PASS
total/synth/PUSH8/p0	0.00	0.00	+0.3%	PASS
total/synth/PUSH8/p1	0.00	0.00	+2.2%	PASS
total/synth/PUSH9/p0	0.00	0.00	+1.8%	PASS
total/synth/PUSH9/p1	0.00	0.00	-0.2%	PASS
total/synth/RETURNDATASIZE/a0	0.03	0.03	-0.1%	PASS
total/synth/RETURNDATASIZE/a1	0.03	0.03	-0.3%	PASS
total/synth/SAR/b0	0.00	0.00	-0.1%	PASS
total/synth/SAR/b1	0.00	0.00	+0.3%	PASS
total/synth/SGT/b0	0.00	0.00	-0.2%	PASS
total/synth/SGT/b1	0.00	0.00	+0.0%	PASS
total/synth/SHL/b0	0.00	0.00	+0.2%	PASS
total/synth/SHL/b1	0.00	0.00	+0.2%	PASS
total/synth/SHR/b0	0.00	0.00	+0.3%	PASS
total/synth/SHR/b1	0.00	0.00	-0.1%	PASS
total/synth/SIGNEXTEND/b0	0.00	0.00	-0.1%	PASS
total/synth/SIGNEXTEND/b1	0.00	0.00	+0.0%	PASS
total/synth/SLT/b0	0.00	0.00	+0.3%	PASS
total/synth/SLT/b1	0.00	0.00	-0.4%	PASS
total/synth/SUB/b0	0.00	0.00	+0.4%	PASS
total/synth/SUB/b1	0.00	0.00	-0.1%	PASS
total/synth/SWAP1/s0	0.00	0.00	+0.0%	PASS
total/synth/SWAP10/s0	0.00	0.00	+0.3%	PASS
total/synth/SWAP11/s0	0.00	0.00	+0.2%	PASS
total/synth/SWAP12/s0	0.00	0.00	+0.4%	PASS
total/synth/SWAP13/s0	0.00	0.00	+0.2%	PASS
total/synth/SWAP14/s0	0.00	0.00	+0.1%	PASS
total/synth/SWAP15/s0	0.00	0.00	-0.3%	PASS
total/synth/SWAP16/s0	0.00	0.00	+0.4%	PASS
total/synth/SWAP2/s0	0.00	0.00	+0.0%	PASS
total/synth/SWAP3/s0	0.00	0.00	+0.3%	PASS
total/synth/SWAP4/s0	0.00	0.00	+0.0%	PASS
total/synth/SWAP5/s0	0.00	0.00	-0.2%	PASS
total/synth/SWAP6/s0	0.00	0.00	-0.0%	PASS
total/synth/SWAP7/s0	0.00	0.00	-0.1%	PASS
total/synth/SWAP8/s0	0.00	0.00	+0.0%	PASS
total/synth/SWAP9/s0	0.00	0.00	-0.1%	PASS
total/synth/XOR/b0	0.00	0.00	+0.4%	PASS
total/synth/XOR/b1	0.00	0.00	+0.3%	PASS
total/synth/loop_v1	1.50	1.51	+0.6%	PASS
total/synth/loop_v2	1.39	1.39	+0.3%	PASS

Summary: 194 benchmarks, 0 regressions

…al design

Address codex review on PR DTVMStack#446:
- Rewrite the gas-check-placement change doc to describe the final design only: mixed-precision CFG (over-approximate dynamic jumps), separate GasChunkCostSPP array for the JIT, and interpreter-mode gating. Drop the call-site / ResolvedJumpTargets narrative; that exploration was reverted by c26bf7c and lives in git history, not the change doc.
- Update src/evm/evm_cache.md so GasChunkCost is documented as the unshifted interpreter cost and the new GasChunkCostSPP field is documented as the SPP-shifted JIT cost. Match the field semantics in src/evm/evm_cache.cpp:1161-1165 and src/evm/evm_cache.h:22-27.

Add docs/design/evm-gas-mechanism.md with mermaid diagrams covering:
- shared EVMBytecodeCache layout and the GasChunkCost vs GasChunkCostSPP split,
- interpreter chunk fast path (pre-charge at chunk start) with per-opcode fallback,
- JIT meterOpcode/meterGas dMIR emission and shared OOG block,
- SPP cost shifting (Lemma 6.14), per-path total preservation, mixed-precision CFG with over-approximated dynamic jumps,
- pipeline gating via EVMModule::CacheNeedsSPP.

Addresses zoowii's review request on PR DTVMStack#446 to document the latest interpreter and JIT gas mechanism.

Records the 6 fix items (F1-F6) identified by the 2026-05-07 self-review of PR DTVMStack#446 with concrete file:line citations, sequencing, and quality gates. The plan was iterated through 3 review rounds (Opus subagent + concrete GraphQL verification of GitHub thread state) before this final form. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Remove the all-or-nothing HasDynamicJump bailout in buildGasChunksSPP so that contracts with a mix of resolvable and unresolvable jumps still get CFG-based SPP shifting on the resolved portion. Factor the edge construction into a reusable buildCFGEdges() that can be driven either with over-approximation or with call-site-resolved targets.

Add resolveCallSiteTargets(), which detects the Solidity internal-function return pattern (SWAPn -> JUMP) and walks predecessors to find the enclosing function entry JUMPDEST, then collects valid return addresses from all matching call sites (PUSH ret -> PUSH func -> JUMP). The reverse-reachability walk is bounded by MAX_REVERSE_REACHABILITY_DEPTH to cap compile-time cost.

Introduce decodePushAsJumpDest() as a shared PUSH-as-JUMPDEST decode helper and add Prev2Pc / Prev2Opcode tracking on GasBlock so the 3-instruction call-site window can be inspected without rescanning bytecode.

Tighten the SPP shifting guard so that a successor whose last opcode is an isGasChunkTerminator bails out of shifting, preventing gas cost from being moved across chunk boundaries.

GasChunkCost continues to write Blocks[Id].Cost (the original unshifted per-block cost) exactly as PR DTVMStack#371 established: the interpreter gas chunk fast path depends on unshifted costs, and exporting SPP-shifted metering to the JIT is left as a follow-up on a separate JIT-only output path.

Test plan:

- format.sh check: clean
- evmone-unittests multipass: 223/223 pass
- evmone-unittests interpreter: 215/215 pass
- evmone-statetest --fork Cancun: 2723/2723 pass

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
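The PUSH-as-JUMPDEST decode step mentioned in this commit can be illustrated with a minimal sketch. The names `scanJumpDests` and `decodePushTarget` and their signatures are hypothetical and are not taken from the PR; the sketch only assumes the standard EVM encoding (PUSH1..PUSH32 are 0x60..0x7f, JUMPDEST is 0x5b) and treats a truncated immediate as unresolved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

constexpr uint8_t kJumpDest = 0x5b; // JUMPDEST
constexpr uint8_t kPush1 = 0x60;    // PUSH1
constexpr uint8_t kPush32 = 0x7f;   // PUSH32

// Marks every offset that holds a real JUMPDEST, skipping PUSH immediates.
std::vector<bool> scanJumpDests(const std::vector<uint8_t> &Code) {
  std::vector<bool> IsDest(Code.size(), false);
  for (size_t Pc = 0; Pc < Code.size(); ++Pc) {
    const uint8_t Op = Code[Pc];
    if (Op == kJumpDest)
      IsDest[Pc] = true;
    else if (Op >= kPush1 && Op <= kPush32)
      Pc += static_cast<size_t>(Op - kPush1) + 1; // skip immediate bytes
  }
  return IsDest;
}

// Hypothetical helper: returns the jump target encoded by the PUSH at PushPc,
// or nullopt if the opcode is not a PUSH or the value is not a valid JUMPDEST.
std::optional<size_t> decodePushTarget(const std::vector<uint8_t> &Code,
                                       const std::vector<bool> &IsDest,
                                       size_t PushPc) {
  assert(IsDest.size() == Code.size());
  if (PushPc >= Code.size())
    return std::nullopt;
  const uint8_t Op = Code[PushPc];
  if (Op < kPush1 || Op > kPush32)
    return std::nullopt;
  const size_t Len = static_cast<size_t>(Op - kPush1) + 1;
  if (Len > Code.size() - PushPc - 1)
    return std::nullopt; // truncated immediate: treat as unresolved
  size_t Value = 0;
  for (size_t I = 0; I < Len; ++I) {
    if (Value >= Code.size())
      return std::nullopt; // already out of range; shifting only grows it
    Value = (Value << 8) | Code[PushPc + 1 + I]; // big-endian immediate
  }
  if (Value >= Code.size() || !IsDest[Value])
    return std::nullopt;
  return Value;
}
```

A static `PUSH1 3; JUMP; JUMPDEST` sequence resolves to offset 3, while a 0x5b byte that sits inside PUSH data is rejected because the scan never marks it as a destination.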

Measured via evmone-bench against upstream/main@a14a9de on the external/total/(main|micro) benchmark set: geomean -10.13% across 27 benchmarks, with memory_grow_mload/mstore -19% to -24%, signextend -19% to -20%, and snailtracer -7.53%. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add EVMBytecodeCache::GasChunkCostSPP as a second parallel cost array that holds the SPP metering-shifted per-chunk cost computed by buildGasChunksSPP. The interpreter continues reading the unshifted GasChunkCost (preserving PR DTVMStack#371's interpreter-safety invariant) while the multipass JIT prefers the shifted values when emitting gas checks.

Plumb the pointer through EVMFrontendContext::setGasChunkInfo, snapshot it in EVMMirBuilder, and swap the three chunk-cost read sites:

- meterOpcode: primary per-chunk-start charge
- meterOpcodeRange (slow path): JUMPDEST-skip cumulative sum
- JUMPDEST-run suffix-sum precompute inside the jump table builder

Swapping all three sites is safe because SPP's lemma614 shift can only transfer gas from a single-predecessor successor into its parent. Every JUMPDEST is a jump target (multi-predecessor), so SPP can never shift gas *into* a JUMPDEST-run member from outside the run; the only intra-run shift it can perform is from the trailing body chunk up into the last JUMPDEST, which is still charged along every entry path into the run. Total gas along any realizable execution path is preserved.

Test plan:

- format.sh check: clean
- evmone-unittests multipass: 223/223 pass
- evmone-unittests interpreter: 215/215 pass
- evmone-statetest --fork Cancun: 2723/2723 pass

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
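The per-path preservation argument can be checked on a toy CFG. The sketch below is a simplified model and not the PR's lemma614Update: it assumes a shift is allowed only when every successor of a block has that block as its sole predecessor, and it moves the minimum successor cost up into the parent. `BlockSketch` and `shiftIntoParent` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct BlockSketch {
  uint64_t Cost;
  std::vector<size_t> Succs;
  uint32_t PredCount; // number of incoming edges
};

// Simplified shift: if every successor of P is single-predecessor, move the
// minimum successor cost into P. Each path through P then pays the same total.
void shiftIntoParent(std::vector<BlockSketch> &Blocks, size_t P) {
  assert(P < Blocks.size());
  if (Blocks[P].Succs.empty())
    return;
  uint64_t Min = UINT64_MAX;
  for (size_t S : Blocks[P].Succs) {
    if (S == P || Blocks[S].PredCount != 1)
      return; // self-loop or multi-predecessor successor: refuse to shift
    Min = std::min(Min, Blocks[S].Cost);
  }
  Blocks[P].Cost += Min;
  for (size_t S : Blocks[P].Succs)
    Blocks[S].Cost -= Min;
}
```

On a diamond A -> {B, C} -> D with costs 3, 5, 2, 4, shifting at A yields 5, 3, 0, 4: the path A-B-D still totals 12 and A-C-D still totals 9, and D (two predecessors) never gives up any cost.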

Phase 3 wires SPP-shifted gas costs into the multipass JIT but leaves the expensive CFG / call-site / metering pipeline running for every module, including interpreter-only ones that never read the shifted values. On CI that manifests as a ~7% interpreter-mode regression on snailtracer and up to +14% on smaller benchmarks where compile time dominates the total run.

Gate the pipeline:

- Add `bool EnableSPP` to `buildBytecodeCache`. When false, the function walks the basic gas-block scan, writes unshifted `Blocks[Id].Cost` into `GasChunkEnd` / `GasChunkCost`, and leaves `GasChunkCostSPP` empty.
- Track `EVMModule::CacheNeedsSPP`. It is set to `true` immediately before `performEVMJITCompile` runs (the only current SPP consumer). Pure interpreter-mode modules and JIT-fallback modules leave it `false`, so the lazy `initBytecodeCache` picks the cheap path.
- `evm_compiler.cpp` passes `nullptr` when the cache's `GasChunkCostSPP` vector is empty, so any JIT compile without SPP (defensive path) falls back to the unshifted table via the existing `GasChunkCostSPP ? ... : GasChunkCost` pattern in `meterOpcode` / `meterOpcodeRange` / the JUMPDEST-run suffix-sum builder.

Test plan:

- format.sh check: clean
- evmone-unittests multipass: 223/223 pass (9.4s vs 13.7s previously)
- evmone-unittests interpreter: 215/215 pass (0.4s vs 3.7s previously)
- evmone-statetest --fork Cancun: 2723/2723 pass (67s vs 103s previously)
- Local bench vs upstream/main (CI flags): geomean -14.29% (n=27)

The test-suite runtime drop is the dominant signal that gating works: interpreter mode no longer runs the SPP pipeline, so every module load in the test suite gets back the ~90% of cache-build time the pipeline used to consume.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
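A minimal sketch of the gating and fallback rule described above, under stated assumptions: `CacheSketch`, `buildCacheSketch`, `sppTableOrNull`, and `jitChunkCost` are hypothetical stand-ins, and the `Shifted` argument stands in for whatever the SPP pipeline would compute. Only the "empty vector means nullptr, nullptr means read the unshifted table" behavior is taken from the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the two parallel cost arrays in the cache.
struct CacheSketch {
  std::vector<uint64_t> GasChunkCost;    // unshifted, read by the interpreter
  std::vector<uint64_t> GasChunkCostSPP; // shifted, left empty when SPP is off
};

// Builds the cache; `Shifted` stands in for the output of the SPP pipeline.
CacheSketch buildCacheSketch(const std::vector<uint64_t> &Unshifted,
                             const std::vector<uint64_t> &Shifted,
                             bool EnableSPP) {
  assert(Unshifted.size() == Shifted.size());
  CacheSketch Cache;
  Cache.GasChunkCost = Unshifted;
  if (EnableSPP)
    Cache.GasChunkCostSPP = Shifted; // only JIT-consumer modules pay for this
  return Cache;
}

// Mirrors the "pass nullptr when the SPP vector is empty" rule.
const uint64_t *sppTableOrNull(const CacheSketch &Cache) {
  return Cache.GasChunkCostSPP.empty() ? nullptr : Cache.GasChunkCostSPP.data();
}

// Cost the JIT charges for one chunk: shifted if available, else unshifted.
uint64_t jitChunkCost(const CacheSketch &Cache, size_t Chunk) {
  const uint64_t *SPP = sppTableOrNull(Cache);
  return SPP ? SPP[Chunk] : Cache.GasChunkCost[Chunk];
}
```

With `EnableSPP` false the JIT-side lookup silently reads the unshifted value, which is why the fallback is safe but gives up the shifted metering.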

The previous two-pass CFG build used call-site-resolved targets to replace over-approximate edges for dynamic jumps. This created an under-approximate CFG when resolution was incomplete, allowing lemma614Update to shift gas along non-existent edges and produce unsafe metering on missed paths. Fix: always over-approximate dynamic jumps in buildCFGEdges (edges to all JUMPDESTs). Remove the second-pass CFG rebuild. Export resolved targets through ResolvedJumpTargets for downstream consumers (MIR direct-branch optimization with runtime guard) instead of using them for CFG refinement. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Reflect Phase 5 (soundness fix): always over-approximate CFG for dynamic jumps, export ResolvedJumpTargets for downstream MIR use, document reverse BFS cross-function risk as benign. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

resolveCallSiteTargets() and its ResolvedJumpTargets export had no downstream consumer — the resolved targets were computed but never read. Remove the function, its helper isSwapOpcode(), the cache field, and the output parameter to eliminate dead work (O(N×200) BFS per JIT-compiled contract). The call-site enumeration algorithm can be restored from git history when a consumer (e.g. MIR direct-branch optimization) is implemented. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Previous run had +70% SHL/SHR/SAR synth regressions due to noisy neighbor on shared GitHub Actions runner — same baseline, same code, different run produced 11.8ms vs 20.2ms for SHL/b0. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>


Two reviewers flagged factual issues in docs/design/evm-gas-mechanism.md:

- JIT meterOpcode flowchart was wrong: when the chunk cache is populated and PC is mid-chunk, the function returns without emitting any MIR (evm_mir_compiler.cpp:537), it does NOT fall through to the per-opcode metric. The per-opcode fallback only fires when the cache pointers are absent. Diagram now shows both branches separately.
- Chunk-terminator wording was inverted: SSTORE/CALL*/CREATE*/GAS end their own chunk (the terminator's static cost is included, evm_cache.cpp:329), they are not "before the boundary". Updated the chunk definition and the interpreter key-properties bullet.
- Memory expansion is not a chunk boundary; it is charged inside handlers via expandMemoryAndChargeGas. Removed the misleading "dynamic memory growth" entry.
- Gas-register sync sites are not limited to CALL/CREATE/return; syncGasToMemory is also called from balance/code/keccak/memory handlers. Listed concrete line numbers.
- Interpreter mermaid: chunk-start condition failure does not raise OOG directly; it falls into the slow per-opcode path.
- Failure-mode table updated to match.

Also drift-fixed line numbers for meterOpcode (524), meterOpcodeRange (544), JumpDestRun precompute (1297-1335), and EVMModule initBytecodeCache (133-136). Counted parallel arrays correctly (5). Added precondition note to the SPP example. Replaced \\n with <br/> and quoted diamond labels so GitHub renders the diagrams.
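The corrected meterOpcode behavior has three outcomes, which the following sketch spells out. It is an illustration only: `ChunkTablesSketch`, the `IsChunkStart` encoding, and `decideMetering` are assumptions made for the example, not the layout used in evm_mir_compiler.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

enum class MeterAction { ChargeChunk, EmitNothing, ChargePerOpcode };

struct MeterDecision {
  MeterAction Action;
  uint64_t Gas;
};

// Hypothetical view of the chunk tables; null pointers mean "cache absent".
struct ChunkTablesSketch {
  const uint8_t *IsChunkStart; // 1 at the first PC of a chunk (assumed encoding)
  const uint64_t *ChunkCost;   // cost charged at a chunk start
};

// Decides what metering to emit at Pc for an opcode with the given static cost.
MeterDecision decideMetering(const ChunkTablesSketch &T, size_t Pc,
                             uint64_t OpcodeCost) {
  if (!T.IsChunkStart || !T.ChunkCost)
    return {MeterAction::ChargePerOpcode, OpcodeCost}; // cache pointers absent
  if (T.IsChunkStart[Pc])
    return {MeterAction::ChargeChunk, T.ChunkCost[Pc]}; // pre-charge whole chunk
  return {MeterAction::EmitNothing, 0}; // mid-chunk: already paid at chunk start
}
```

The mid-chunk case returns without charging anything; it does not fall back to the per-opcode cost, which is the point the reviewers raised.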

…approx invariant

GasBlock::Prev2Pc and Prev2Opcode were added to support a future 3-instruction call-site window lookup, but the call-site enumeration that would have consumed them was removed in commit c26bf7c (Phase 5 CFG soundness fix). Whole-repo grep confirms zero readers; only the writers in buildGasBlocks remain. Removing both fields shrinks GasBlock by ~9 bytes (one uint32_t + one uint8_t + alignment) and removes dead bookkeeping from every cache build.

Also extends the buildCFGEdges function comment to make the soundness pairing with lemma614Update explicit: the over-approximated dynamic-jump edges to all JUMPDESTs work because lemma614Update's effectivePredCount > 1 guard refuses to shift gas across multi-predecessor edges, so the over-approximation is absorbed without breaking per-path totals.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The CacheNeedsSPP flag controls whether the bytecode cache builds with the SPP metering pipeline. It must be set before any getBytecodeCache() call: once the cache is lazily built, the EnableSPP decision is fixed for the module's lifetime. Future lazy / on-demand JIT paths must flip this flag before triggering the cache build, otherwise the JIT will silently fall back to the unshifted GasChunkCost array. Documentation only — no behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
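The ordering hazard described here can be reproduced with a small model of a lazily built cache. `ModuleSketch` and `LazyCacheSketch` are hypothetical types for illustration; the only behavior borrowed from the commit is that the flag is read once, at the first cache build.

```cpp
#include <cassert>
#include <optional>

// Hypothetical cache that remembers which mode it was built in.
struct LazyCacheSketch {
  bool BuiltWithSPP = false;
};

class ModuleSketch {
public:
  bool CacheNeedsSPP = false; // must be set before the first cache access

  // Builds the cache on first use; the SPP decision is frozen at that point.
  const LazyCacheSketch &getBytecodeCache() {
    if (!Cache) {
      Cache.emplace();
      Cache->BuiltWithSPP = CacheNeedsSPP;
    }
    return *Cache;
  }

private:
  std::optional<LazyCacheSketch> Cache;
};
```

Setting the flag after the first `getBytecodeCache()` call has no effect in this model, which matches the silent fallback to the unshifted array that the commit warns about.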

… and F6 dropped

The change-doc Metrics section cited a 27-bench local evmone-bench run (3 reps) whose numbers had drifted significantly from the CI Performance Regression Check baseline. The "≤ +6% small regressions" framing mis-categorized the actual jump-heavy regressions on weierstrudel / jump_around / snailtracer, which sit in the +10–23% range on the CI multipass table. Rewrite the section using the CI bot's authoritative numbers and drop the unverified geomean claim.

Also update the review-fix plan to record:

- F1, F4, F5 implemented in commits 81efba3 and 691069a;
- F2, F3 applied to the PR body;
- F6 (open an upstream issue for addEdge O(deg²)) dropped: the concern was theoretical, no commit on this branch touches addEdge, no measured evidence of compile-time pain. Concern kept inline as a future-tuning reference.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

PR DTVMStack#446 over-approximates dynamic jumps by inserting an explicit edge from every dynamic-jump block to every JUMPDEST, then splitting each as a critical edge. On contracts with D dynamic jumps and J JUMPDESTs that costs O(D*J^2), so compile time grows quadratically with J and dominates JIT-prep on jump-heavy bytecode. This change carries a per-JUMPDEST scalar ImplicitDynamicPredCount that counts how many dynamic-jump blocks could reach it at runtime, and folds it into effectivePredCount so the lemma 6.14 update behaves identically without materialising the edges. To keep dyn-only JUMPDESTs visible to dominator and loop analyses (Solidity function returns that are unreachable in the static-only CFG), we seed reachability from every JUMPDEST after the static reachable set is built. Compile time on loop_full_of_jumpdests drops from 7.3s to 3.3s. Geomean runtime is -6.15% across the 27 paper benches with zero benches above the +/-25% CI gate. See docs/changes/2026-05-11-spp-cfg-implicit-dyn-pred/README.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
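To see why a scalar count can replace the materialised edges, the sketch below computes predecessor counts both ways and lets them be compared. It assumes, as the over-approximation implies, that every dynamic jump may reach every JUMPDEST. `PredCfgSketch`, `explicitPredCounts`, and `assignImplicitCounts` are hypothetical names; `effectivePredCount` and `ImplicitDynamicPredCount` are borrowed from the commit message, but their real signatures are not shown in this PR page.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct PredCfgSketch {
  std::vector<uint32_t> StaticPredCount;          // predecessors via static edges
  std::vector<bool> IsJumpDest;                   // block starts with JUMPDEST
  std::vector<uint32_t> ImplicitDynamicPredCount; // filled by assignImplicitCounts
};

// Explicit variant: one edge per (dynamic jump, JUMPDEST) pair, O(D * J) work.
std::vector<uint32_t> explicitPredCounts(const PredCfgSketch &G,
                                         uint32_t NumDynJumps) {
  assert(G.StaticPredCount.size() == G.IsJumpDest.size());
  std::vector<uint32_t> Count = G.StaticPredCount;
  for (uint32_t D = 0; D < NumDynJumps; ++D)
    for (size_t B = 0; B < Count.size(); ++B)
      if (G.IsJumpDest[B])
        ++Count[B];
  return Count;
}

// Implicit variant: a single O(N) pass stores the count as a scalar per block.
void assignImplicitCounts(PredCfgSketch &G, uint32_t NumDynJumps) {
  G.ImplicitDynamicPredCount.assign(G.StaticPredCount.size(), 0);
  for (size_t B = 0; B < G.IsJumpDest.size(); ++B)
    if (G.IsJumpDest[B])
      G.ImplicitDynamicPredCount[B] = NumDynJumps;
}

// Count consulted by the shift guard: static edges plus implicit dynamic ones.
uint32_t effectivePredCount(const PredCfgSketch &G, size_t B) {
  return G.StaticPredCount[B] + G.ImplicitDynamicPredCount[B];
}
```

Both variants give the guard the same number for every block, so a shift decision based on `effectivePredCount > 1` is unchanged while the edge list never grows with D × J.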

After rebasing onto current upstream/main (which now includes DTVMStack#458 / DTVMStack#460 / DTVMStack#482 / DTVMStack#483 perf work) and running a 10-rep evmone-bench on the 27 paper benches, the cumulative PR delta has collapsed to noise (raw geomean +1.15%, +0.46% after correcting a single-iteration outlier on main/blake2b_shifts/8415nulls via a focused 20-rep re-measurement). 0 benches above the +/-25% CI gate. The A-vs-PR-base -2.73% from this commit's own optimization is unchanged; the framing shift is that the absolute runtime delta of the whole PR vs unmodified main has been absorbed by the intervening upstream perf optimizations. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Round-2 self-review surfaced that Phase 7's reachability stitch was seeding every JUMPDEST as a BFS root regardless of whether the contract contains any dynamic jumps. On contracts with no dynamic jumps, a statically-dead JUMPDEST (no predecessor of any kind) would become reachable post-Phase-7, expanding computeDominators and computeLoops input and silently changing SPP decisions on that block class. With this fix the stitch only seeds JUMPDESTs that carry a nonzero ImplicitDynamicPredCount, preserving pre-Phase-7 behavior on dead JUMPDESTs in dynamic-jump-free contracts.

Also:

- Replace the stale "dynamic jumps get edges to every JUMPDEST" comment block above buildCFGEdges (the implementation has actually skipped materialising those edges since Phase 7).
- Add evmCacheTests target with 4 targeted tests covering the gate, the dyn-target stitch path, the interpreter-only no-SPP path, and a multi-dyn-jump corner case.
- Soften the loop_full_of_jumpdests 7.3s -> 3.3s claim in the Phase 7 change doc to note it is a local single-machine measurement, not CI-tracked.

Review plan: docs/changes/2026-05-11-spp-cfg-implicit-dyn-pred/review-fixes-r2.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
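The fixed stitch can be sketched as a BFS from the entry block followed by extra BFS roots for JUMPDESTs whose implicit count is nonzero. `reachableWithStitch` and `bfsFrom` are hypothetical names, and the adjacency-list representation is an assumption made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <vector>

// Standard BFS over static successor edges, marking blocks in Seen.
static void bfsFrom(const std::vector<std::vector<size_t>> &Succs, size_t Root,
                    std::vector<bool> &Seen) {
  if (Seen[Root])
    return;
  std::queue<size_t> Work;
  Seen[Root] = true;
  Work.push(Root);
  while (!Work.empty()) {
    const size_t B = Work.front();
    Work.pop();
    for (size_t S : Succs[B])
      if (!Seen[S]) {
        Seen[S] = true;
        Work.push(S);
      }
  }
}

// Reachability with the stitch: seed only blocks a dynamic jump could reach.
std::vector<bool>
reachableWithStitch(const std::vector<std::vector<size_t>> &StaticSuccs,
                    const std::vector<uint32_t> &ImplicitDynamicPredCount,
                    size_t Entry) {
  assert(StaticSuccs.size() == ImplicitDynamicPredCount.size());
  std::vector<bool> Seen(StaticSuccs.size(), false);
  bfsFrom(StaticSuccs, Entry, Seen);
  for (size_t B = 0; B < StaticSuccs.size(); ++B)
    if (ImplicitDynamicPredCount[B] > 0)
      bfsFrom(StaticSuccs, B, Seen); // dyn-only JUMPDEST becomes an extra root
  return Seen;
}
```

With all counts at zero (a contract without dynamic jumps), a statically dead JUMPDEST and everything behind it stay unreachable, which is the pre-Phase-7 behavior the fix restores.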

The 10-rep "wins" of -6.30% (blake2b_huff/8415nulls) and -4.84% (loop_with_many_jumpdests/empty) flip or collapse in a 20-rep focused rerun (+1.55% and -0.55% respectively). The 10-rep "regressions" of +3.51% (weierstrudel/1) similarly collapses to +0.55%. Three other 10-rep "regression" benches (memory_grow_*) contain zero JUMP / JUMPI / JUMPDEST opcodes, so PR DTVMStack#446's CFG changes cannot affect them by construction. Net picture: the 27-bench corpus is essentially flat within evmone- bench's inter-binary drift band when measured at higher rep count. Update the change doc to cite the 20-rep data, surface the methodology caveat, and acknowledge the SPP-to-JIT cost-flow mechanism as a theoretical effect below current measurement precision. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Add `evmCacheComplexityDemo` (build-only, not registered with ctest) and a wrapper script `docs/changes/2026-05-11-spp-cfg-implicit-dyn-pred/ scaling_demo.sh` that runs the demo at multiple JUMPDEST counts and prints a CSV table. Lets reviewers reproduce the cache-build wall clock claim on any machine without rebuilding evmone or invoking the full unittest binary. The demo measures the WHOLE `buildBytecodeCache` pipeline (not just the Phase 7 CFG step), so it surfaces the residual super-linear cost from `computeDominators` / `buildLoopsUsingDominance` that this PR does not touch. Update the change doc to (a) reference the demo, (b) include a sample table, and (c) explicitly scope the O(N) claim to the over-approximation step itself — the 4-second saving on `loop_full_of_jumpdests` (7.3 s -> 3.3 s) IS the Phase 7 contribution; the remaining 3.3 s is dom/loop analysis untouched by this PR. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Replace the single-snapshot table with a pre-Phase-7 vs Phase-7 comparison built from the same demo source on commit 99f23a3 (PR head one commit before this change). At 20k JUMPDESTs the cache build drops from 346 ms to 44 ms (~8x), confirming the asymptotic shape change. Pre-Phase-7 doubles in ~4x time per doubling of N (quadratic, matching the expected O(D * J^2) shape of explicit-edge add + critical-edge split). Phase 7 doubles in 2-4x (sub-quadratic, with residual super- linearity from computeDominators / buildLoopsUsingDominance independent of this PR). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
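The "time per doubling" reading can be turned into an empirical exponent: if the build time grows as N^k, then k = log(T2/T1) / log(N2/N1). The helpers below are a hypothetical sketch and not part of the demo; only the 346 ms and 44 ms figures at 20k JUMPDESTs come from the commit message.

```cpp
#include <cassert>
#include <cmath>

// Estimates k in T ~ N^k from two (size, time) measurements.
double scalingExponent(double N1, double T1, double N2, double T2) {
  assert(N1 > 0 && N2 > 0 && T1 > 0 && T2 > 0 && N1 != N2);
  return std::log(T2 / T1) / std::log(N2 / N1);
}

// Ratio of two wall-clock times for the same input.
double speedup(double BeforeMs, double AfterMs) {
  assert(AfterMs > 0);
  return BeforeMs / AfterMs;
}
```

A 4x slowdown per doubling of N corresponds to k = 2 (quadratic) and a 2x slowdown to k = 1 (linear); the reported 346 ms to 44 ms drop is a ratio of about 7.9, consistent with the "~8x" in the text.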

Compress the 30-line narrative around the dyn-target reachability stitch and the over-approximation rationale to 10 lines that state only the non-obvious WHY (soundness invariant + statically-dead JUMPDESTs left unreachable on purpose). No behavior change.

* Build evmCacheComplexityDemo without -fsanitize=address even in ASan builds so the wall-clock measurement is not distorted by sanitizer overhead (the demo's whole purpose is timing buildBytecodeCache).
* Replace local OP_* magic-number constants in the tests and the demo with casts of evmc_opcode::* (single source of truth).
* Use zen::common::SteadyClock alias in the demo for consistency with the rest of the codebase.
* Strip historical / PR / commit-line-number narration from the test fixtures and the demo banner; concrete numbers and rationale already live in docs/changes/2026-05-11-spp-cfg-implicit-dyn-pred/README.md.

No behavior change: evmCacheTests 4/4 pass, demo scaling shape is preserved at N in {100, 5000, 20000}.

Copilot AI review requested due to automatic review settings April 5, 2026 13:24

Copilot started reviewing on behalf of abmcar April 5, 2026 13:25

Copilot AI reviewed Apr 5, 2026


Comment thread src/compiler/evm_frontend/evm_mir_compiler.cpp Outdated

Comment thread src/evm/evm_cache.cpp Outdated

Comment thread src/evm/evm_cache.cpp

abmcar force-pushed the feat/gas-check-placement branch 2 times, most recently from 6103dfd to de98b08 Compare April 5, 2026 14:22

abmcar marked this pull request as draft April 6, 2026 03:42

abmcar force-pushed the feat/gas-check-placement branch from de98b08 to 3ae7153 Compare April 6, 2026 06:24

abmcar changed the title ~~feat(evm): gas check placement optimization with mixed CFG support~~ WIP:feat(evm): gas check placement optimization with mixed CFG support Apr 8, 2026

abmcar force-pushed the feat/gas-check-placement branch 4 times, most recently from 16b877a to efc242f Compare April 11, 2026 03:59

abmcar changed the title ~~WIP:feat(evm): gas check placement optimization with mixed CFG support~~ feat(evm): gas check placement with mixed CFG, SPP JIT output, and interpreter-mode gating Apr 11, 2026

abmcar force-pushed the feat/gas-check-placement branch from 1c3fe1c to 773fcdc Compare April 13, 2026 10:09

abmcar force-pushed the feat/gas-check-placement branch from a15505f to a6e34cf Compare April 25, 2026 05:58

abmcar marked this pull request as ready for review April 25, 2026 16:07

zoowii reviewed Apr 28, 2026


Comment thread src/compiler/evm_frontend/evm_mir_compiler.cpp

abmcar mentioned this pull request May 11, 2026

perf(core): replace D*J dyn-jump edges with implicit predecessor count abmcar/DTVM#14

Closed

5 tasks

abmcar and others added 8 commits May 11, 2026 20:16

abmcar and others added 8 commits May 11, 2026 20:17

abmcar force-pushed the feat/gas-check-placement branch from b943a7a to 972615a Compare May 11, 2026 12:19

abmcar changed the title ~~feat(evm): gas check placement with mixed CFG, SPP JIT output, and interpreter-mode gating~~ feat(evm): drop SPP all-or-nothing fallback for dynamic-jump contracts May 12, 2026

abmcar and others added 6 commits May 12, 2026 15:02

zoowii merged commit d44eb8e into DTVMStack:main May 15, 2026
16 checks passed

abmcar deleted the feat/gas-check-placement branch May 20, 2026 07:48

feat(evm): drop SPP all-or-nothing fallback for dynamic-jump contracts #446
zoowii merged 23 commits into DTVMStack:main from abmcar:feat/gas-check-placement

3 participants

github-actions Bot commented Apr 5, 2026 (edited)

⚡ Performance Regression Check Results

✅ Performance Check Passed (interpreter)

✅ Performance Check Passed (multipass)
