diff --git a/docs/changes/2026-04-05-gas-check-placement/README.md b/docs/changes/2026-04-05-gas-check-placement/README.md
new file mode 100644
index 000000000..f172fd044
--- /dev/null
+++ b/docs/changes/2026-04-05-gas-check-placement/README.md
@@ -0,0 +1,183 @@
+# Change: Gas check placement optimization with mixed CFG support
+
+- **Status**: Implemented
+- **Date**: 2026-04-05
+- **Tier**: Full
+
+## Overview
+
+Remove the all-or-nothing dynamic-jump fallback in the EVM bytecode cache's
+SPP gas-metering pipeline. Previously, any unresolved dynamic jump caused the
+entire contract to fall back to per-block gas metering (zero SPP benefit).
+The cache now always builds a CFG with mixed-precision edges and runs the
+SPP shifting pass, while keeping the unshifted per-block cost available for
+the interpreter.
+
+The final design has three pieces:
+
+- **Mixed-precision CFG**: static jumps (`PUSH → JUMP`) get a single precise
+  edge to the resolved `JUMPDEST`; every unresolved dynamic jump gets
+  over-approximated edges to all `JUMPDEST` blocks. The over-approximation
+  is intentional — narrowing dynamic-jump edges with partially-resolved
+  call-site information would under-approximate the CFG and let the SPP
+  pass shift gas along edges that don't exist at runtime, producing unsafe
+  metering. See `buildCFGEdges` in `src/evm/evm_cache.cpp` (lines 386–429,
+  in particular the dynamic-jump branch at line 419).
+- **SPP-shifted gas cost on a separate array**: the interpreter's gas-chunk
+  fast path requires unshifted per-block costs (PR #371). To preserve those
+  semantics while enabling SPP for the JIT, the cache exposes two parallel
+  arrays. `EVMBytecodeCache::GasChunkCost` keeps the unshifted base cost
+  (written from `Blocks[Id].Cost` at `evm_cache.cpp:1161`), and a new
+  `EVMBytecodeCache::GasChunkCostSPP` carries the SPP-shifted value
+  (written from the metering function `Metering[Id]` at
+  `evm_cache.cpp:1165`). The interpreter (`src/evm/interpreter.cpp:382`)
+  reads only `GasChunkCost`; the multipass JIT prefers `GasChunkCostSPP`
+  when non-null and falls back to `GasChunkCost` otherwise
+  (`src/compiler/evm_frontend/evm_mir_compiler.cpp:534, 578`).
+- **Interpreter-mode gating**: the SPP pipeline (CFG construction + metering
+  pass) is expensive and only useful for the JIT consumer. `buildBytecodeCache`
+  takes an `EnableSPP` parameter; when false, it emits unshifted per-block
+  costs and skips the CFG/metering work entirely. `EVMModule::CacheNeedsSPP`
+  is set to `true` immediately before `performEVMJITCompile` runs, so
+  interpreter-only modules never pay the SPP pipeline cost. If a module is
+  JIT-compiled without the SPP array having been built, `evm_compiler.cpp`
+  passes `nullptr` for `GasChunkCostSPP` so the JIT falls back to the
+  unshifted array.
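+
+The following is a minimal C++ sketch of the consumer split described above. It
+assumes a `uint64_t` element type. The names `CostArrays`, `sppPointerOrNull`,
+`jitChunkCost`, and `interpChunkCost` are hypothetical and are not the
+actual DTVM declarations:
+
+```cpp
+#include <cassert>
+#include <cstddef>
+#include <cstdint>
+#include <vector>
+
+// Hypothetical stand-in for the two parallel arrays on EVMBytecodeCache.
+// The uint64_t element type is an assumption.
+struct CostArrays {
+  std::vector<uint64_t> GasChunkCost;    // unshifted per-block cost
+  std::vector<uint64_t> GasChunkCostSPP; // SPP-shifted cost; empty if SPP was skipped
+};
+
+// Hand the JIT nullptr when the SPP array was never built.
+inline const uint64_t *sppPointerOrNull(const CostArrays &C) {
+  return C.GasChunkCostSPP.empty() ? nullptr : C.GasChunkCostSPP.data();
+}
+
+// JIT read site: prefer the SPP-shifted cost, fall back to the unshifted one.
+inline uint64_t jitChunkCost(const uint64_t *Base, const uint64_t *SPP,
+                             std::size_t Id) {
+  return SPP != nullptr ? SPP[Id] : Base[Id];
+}
+
+// Interpreter read site: always the unshifted cost.
+inline uint64_t interpChunkCost(const uint64_t *Base, std::size_t Id) {
+  assert(Base != nullptr);
+  return Base[Id];
+}
+```
+
+Because the arrays stay separate, the interpreter never observes shifted
+values. The JIT only needs a null check at each chunk-cost read site.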
+
+## Motivation
+
+The existing all-or-nothing fallback meant any contract with unresolvable
+dynamic jumps got zero benefit from SPP. Real-world Solidity contracts mix
+static and dynamic jumps, so a mixed-edge CFG is needed to let the SPP pass
+do useful work on the resolved portion of the CFG while staying sound on
+the unresolved portion.
+
+## Scope
+
+This PR is scoped to the cache-side CFG and JIT-cost wiring:
+
+- Remove the `HasDynamicJump` early-exit bailout in `buildGasChunksSPP`.
+- Factor out `buildCFGEdges()` with over-approximation for all unresolved
+  dynamic jumps (sound for SPP metering).
+- Add `EVMBytecodeCache::GasChunkCostSPP` and write SPP-shifted costs into
+  it, leaving `GasChunkCost` unshifted for the interpreter.
+- Plumb the SPP pointer through `EVMFrontendContext::setGasChunkInfo` and
+  `EVMMirBuilder`; swap the JIT's chunk-cost reads (`meterOpcode`,
+  `meterOpcodeRange`, JUMPDEST-run suffix-sum precompute) to prefer
+  `GasChunkCostSPP` when non-null.
+- Add an `EnableSPP` parameter to `buildBytecodeCache` and gate the
+  pipeline on JIT-consumer modules only.
+- Tighten the SPP shifting guards to bail out of the shift when a successor
+  is a gas-chunk terminator (`isGasChunkTerminator`), which prevents masking
+  gas cost across chunk boundaries.
+
+No frontend/MIR changes beyond the cost-source swap are included.
+
+## Impact
+
+### Affected Modules
+
+- `docs/modules/evm/` — EVM bytecode cache, CFG construction, SPP metering
+
+### Compatibility
+
+No breaking changes. Interpreter semantics are preserved (`GasChunkCost`
+remains the unshifted per-block cost, matching PR #371). JIT semantics are
+preserved when SPP is enabled (the JIT now reads SPP-shifted costs from a
+separate array instead of overwriting the interpreter's table).
+
+### Metrics
+
+Numbers are from the CI Performance Regression Check (baseline
+`perf-baseline-*-a14a9de...`, 5 repetitions, 25% threshold) — the
+gate-of-record for this PR. The full 194-bench multipass table lives in
+the github-actions perf-check comment on the PR; this section
+summarizes the design-relevant subset.
+
+**Wins (jump-light / cost-shift opportunities):**
+
+- `micro/signextend/{one,zero}`: 0.13 → 0.07 μs (≈ −42.7%)
+- `micro/memory_grow_mstore/nogrow`: −6.8%
+- `main/structarray_alloc/nfts_rank`: −6.2%
+- `main/blake2b_huff/8415nulls`: −5.3%
+
+**Regressions (jump-heavy contracts — predicted cost of mixed-CFG
+over-approximation):**
+
+- `micro/jump_around/empty`: 0.04 → 0.05 μs (+22.8%)
+- `main/weierstrudel/1`: 0.20 → 0.24 μs (+19.5%)
+- `main/weierstrudel/15`: 2.22 → 2.60 μs (+17.5%)
+- `main/snailtracer/benchmark`: 28.49 → 31.58 μs (+10.9%)
+
+The +17–23% regressions on jump-heavy contracts are the design tradeoff
+of over-approximating dynamic-jump edges to all `JUMPDEST` blocks in
+order to keep the SPP shift sound (narrowing those edges with partial
+call-site resolution would under-approximate the CFG and break per-path
+total invariants — see Phase 5 / `buildCFGEdges` in
+`src/evm/evm_cache.cpp:389-429`). All 194 benches stay within the 25%
+gate, but `jump_around` has tight headroom.
+
+Earlier drafts of this section cited a 27-bench local `evmone-bench`
+run (3 reps) that drifted from the CI baseline; the CI bot table is the
+authoritative source.
+
+Correctness: 223/223 multipass evmone-unittests, 215/215 interpreter
+evmone-unittests, 2723/2723 evmone-statetests on `fork_Cancun` for both
+multipass and interpreter modes.
+
+## Implementation Plan
+
+### Mixed CFG construction
+
+- [x] Remove the all-or-nothing fallback that disabled SPP on any unresolved
+      dynamic jump
+- [x] Factor `buildCFGEdges()` so static jumps get precise single-target
+      edges and unresolved dynamic jumps get over-approximated edges to
+      every `JUMPDEST`
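+
+The two edge kinds can be sketched as follows. This is a simplified,
+hypothetical model: `Block`, `buildMixedEdges`, and a `StaticTarget` already
+mapped to a block index are illustrative, not the real `GasBlock` /
+`buildCFGEdges` API. `addEdge` uses a linear `std::find` dedup scan:
+
+```cpp
+#include <algorithm>
+#include <cassert>
+#include <cstdint>
+#include <optional>
+#include <vector>
+
+// Hypothetical, simplified block record (not the real GasBlock).
+struct Block {
+  bool IsJumpDest = false;              // block starts with JUMPDEST
+  bool EndsInJump = false;              // last opcode is JUMP or JUMPI
+  bool FallsThrough = false;            // control can continue to the next block
+  std::optional<uint32_t> StaticTarget; // target block when PUSH -> JUMP resolved
+  std::vector<uint32_t> Succs, Preds;
+};
+
+// Deduplicating edge insert using a linear std::find scan.
+void addEdge(std::vector<Block> &G, uint32_t From, uint32_t To) {
+  std::vector<uint32_t> &S = G[From].Succs;
+  if (std::find(S.begin(), S.end(), To) != S.end())
+    return;
+  S.push_back(To);
+  G[To].Preds.push_back(From);
+}
+
+// Precise edge for a resolved static jump; an edge to every JUMPDEST block
+// for an unresolved dynamic jump (sound over-approximation).
+void buildMixedEdges(std::vector<Block> &G) {
+  std::vector<uint32_t> JumpDests;
+  for (uint32_t I = 0; I < G.size(); ++I)
+    if (G[I].IsJumpDest)
+      JumpDests.push_back(I);
+
+  for (uint32_t I = 0; I < G.size(); ++I) {
+    if (G[I].FallsThrough && I + 1 < G.size())
+      addEdge(G, I, I + 1);
+    if (!G[I].EndsInJump)
+      continue;
+    if (G[I].StaticTarget)
+      addEdge(G, I, *G[I].StaticTarget);
+    else
+      for (uint32_t J : JumpDests)
+        addEdge(G, I, J);
+  }
+}
+```
+
+Every JUMPDEST becomes a successor of every unresolved dynamic jump. That is
+why JUMPDEST blocks end up with many predecessors in this representation.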
+
+### JIT cost wiring
+
+- [x] Add `EVMBytecodeCache::GasChunkCostSPP` parallel array, populated from
+      the SPP metering function in `buildGasChunksSPP`
+- [x] Plumb the SPP pointer through `EVMFrontendContext::setGasChunkInfo`
+      and `EVMMirBuilder`
+- [x] In `meterOpcode`, `meterOpcodeRange`, and the JUMPDEST-run suffix-sum
+      precompute, prefer `GasChunkCostSPP` when non-null
+- [x] Interpreter continues reading the unshifted `GasChunkCost` — no change
+
+### SPP pipeline gating
+
+- [x] Add `buildBytecodeCache(..., bool EnableSPP)` parameter
+- [x] When `EnableSPP == false`, skip the CFG / metering pipeline and emit
+      unshifted per-block costs only
+- [x] `EVMModule::CacheNeedsSPP` is flipped to `true` immediately before
+      `performEVMJITCompile` runs, so interpreter-only modules never pay
+      the SPP pipeline cost
+- [x] `evm_compiler.cpp` passes `nullptr` for `GasChunkCostSPP` when the
+      array is empty, so the JIT falls back to the unshifted array if a
+      module is JIT-compiled without SPP being built
+
+### Soundness guards
+
+- [x] Tighten `lemma614Update` to set `MinSucc = 0` when encountering
+      excluded successors or gas-chunk terminators
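+
+Below is one plausible shape for the guarded shift. It is a hypothetical
+sketch, not the actual `lemma614Update`. Block `P` hoists the minimum cost of
+its successors into itself. `MinSucc` is forced to 0 when any successor is
+excluded, is a gas-chunk terminator, or has more than one effective
+predecessor. `SppBlock`, `hoistIntoParent`, `Excluded`, and
+`IsChunkTerminator` are illustrative names. `effectivePredCount` counts the
+entry block's implicit predecessor:
+
+```cpp
+#include <algorithm>
+#include <cassert>
+#include <cstddef>
+#include <cstdint>
+#include <limits>
+#include <vector>
+
+// Hypothetical block record for illustrating the guard.
+struct SppBlock {
+  uint32_t Start = 0;             // bytecode offset; 0 marks the entry block
+  int64_t Cost = 0;               // metered cost currently assigned to the block
+  bool IsChunkTerminator = false; // stand-in for isGasChunkTerminator
+  bool Excluded = false;          // stand-in for an excluded successor
+  std::vector<uint32_t> Succs, Preds;
+};
+
+// The entry block counts one extra, implicit predecessor.
+std::size_t effectivePredCount(const SppBlock &B) {
+  std::size_t Count = B.Preds.size();
+  if (B.Start == 0)
+    ++Count;
+  return Count;
+}
+
+// Hoist the minimum successor cost into P. MinSucc is forced to 0 (no shift)
+// if any successor is excluded, a chunk terminator, or has several preds.
+int64_t hoistIntoParent(std::vector<SppBlock> &G, uint32_t P) {
+  if (G[P].Succs.empty())
+    return 0;
+  int64_t MinSucc = std::numeric_limits<int64_t>::max();
+  for (uint32_t S : G[P].Succs) {
+    const SppBlock &Succ = G[S];
+    if (Succ.Excluded || Succ.IsChunkTerminator ||
+        effectivePredCount(Succ) > 1) {
+      MinSucc = 0;
+      break;
+    }
+    MinSucc = std::min(MinSucc, Succ.Cost);
+  }
+  if (MinSucc <= 0)
+    return 0;
+  G[P].Cost += MinSucc;
+  for (uint32_t S : G[P].Succs)
+    G[S].Cost -= MinSucc;
+  return MinSucc;
+}
+```
+
+When the shift happens, `P` is the only predecessor of every successor. Each
+path through `P` therefore gains `MinSucc` at `P` and loses it at exactly one
+successor, so per-path totals are unchanged.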
+
+## Changed Files
+
+- `src/evm/evm_cache.h` — add `GasChunkCostSPP` array, document
+  interpreter vs JIT consumer split
+- `src/evm/evm_cache.cpp` — mixed-CFG `buildCFGEdges`, SPP-shifted cost
+  export, `EnableSPP` gating
+- `src/compiler/evm_frontend/evm_mir_compiler.h` — plumb SPP pointer
+  through context and builder
+- `src/compiler/evm_frontend/evm_mir_compiler.cpp` — prefer SPP-shifted
+  cost at the three chunk-cost read sites
+- `src/compiler/evm_compiler.cpp` — pass the new pointer via
+  `setGasChunkInfo`, with `nullptr` fallback when the SPP array is empty
+- `src/runtime/evm_module.h` — `CacheNeedsSPP` flag
+
+## Risks
+
+- Over-approximated edges for unresolved jumps may pessimize gas placement
+  for pathological contracts with many unresolved targets. Acceptable
+  because the alternative (narrowed edges from partial resolution) is
+  unsound for SPP.
diff --git a/docs/changes/2026-04-05-gas-check-placement/review-fixes.md b/docs/changes/2026-04-05-gas-check-placement/review-fixes.md
new file mode 100644
index 000000000..12f02d156
--- /dev/null
+++ b/docs/changes/2026-04-05-gas-check-placement/review-fixes.md
@@ -0,0 +1,237 @@
+# PR #446 Review Response Plan
+
+- **Status**: Implemented (F1, F4, F5); F2/F3 applied to PR body; F6 dropped
+- **Date**: 2026-05-07
+- **Parent change**: `README.md` (gas check placement w/ mixed CFG, SPP JIT output, interpreter-mode gating)
+- **Branch**: `feat/gas-check-placement`
+
+## Status update (2026-05-07)
+
+- **F1 implemented** in commit `81efba3` — `Prev2Pc/Prev2Opcode` removed,
+  whole-repo grep clean, `GasBlock` loses 5 bytes of field payload.
+- **F4 implemented** in commit `81efba3` (squashed with F1) — added the
+  soundness-pairing comment to `buildCFGEdges`.
+- **F5 implemented** in commit `691069a` — `CacheNeedsSPP` lifecycle
+  invariant comment added.
+- **F2 / F3 applied** to the PR body via `gh pr edit` — Copilot threads
+  noted as already-resolved + content-stale (live GraphQL confirmed
+  `isResolved: true` for all three before the edit), perf table
+  rewritten with honest +17 to +22.8% jump-heavy regressions from the
+  latest CI bot output.
+- **F6 dropped** — opening an upstream issue for an `addEdge` O(deg²)
+  concern that was theoretical, unmeasured, and not touched by any
+  commit on this branch would have been noise. The concern remains
+  documented below for future reference but no issue is filed.
+
+This plan addresses the findings of the 2026-05-07 self-review of PR #446.
+Items are grouped by whether they block merge.
+
+## Blocking before merge
+
+### F1 — Remove dead `Prev2Pc` / `Prev2Opcode` tracking
+
+**Symptom**: `src/evm/evm_cache.cpp:195, 198` add two `GasBlock` fields and
+`src/evm/evm_cache.cpp:323-324` write them in `buildGasBlocks`, but no
+reader exists in `src/` or `tests/`. The PR description justifies them as
+"future 3-instruction call-site window lookup", but Phase 5 (commit
+`c26bf7c`) removed call-site enumeration entirely, so the rationale no
+longer applies on this branch.
+
+**Why this blocks**: a fresh reviewer will raise the same question every review
+cycle until the dead fields are gone or have a concrete forward link. Leaving them in
+also adds a small per-block bookkeeping cost on every cache build.
+
+**Fix**: remove `GasBlock::Prev2Pc`, `GasBlock::Prev2Opcode`, and the two
+writes inside `buildGasBlocks`. Verify no header or test exposes them.
+
+**Verification**:
+- `grep -rn 'Prev2Pc\|Prev2Opcode'` (whole repo) returns nothing.
+- `tools/format.sh check` clean.
+- Local `evmone-unittests` multipass + interpreter both pass — confirm no
+  hidden dependency surfaces.
+
+**Side effect to note in commit body**: `GasBlock` loses 5 bytes of field
+payload (one `uint32_t` + one `uint8_t`); the actual `sizeof` reduction
+depends on field order and padding. Cache memory footprint drops
+marginally; not expected to perturb perf but worth flagging.
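+
+Removing fields does not shrink a struct byte-for-byte, because `sizeof` is
+always a multiple of `alignof`. The small C++ sketch below makes the padding
+effect checkable. It uses hypothetical layouts (`BlockWithPrev2`,
+`BlockWithoutPrev2`), not the real `GasBlock` field order:
+
+```cpp
+#include <cassert>
+#include <cstddef>
+#include <cstdint>
+#include <vector>
+
+// Hypothetical layouts; the real GasBlock's fields are not reproduced here.
+struct BlockWithPrev2 {
+  uint32_t Start;
+  uint32_t End;
+  uint64_t Cost;
+  std::vector<uint32_t> Succs, Preds;
+  uint32_t Prev2Pc;    // removed by F1
+  uint8_t Prev2Opcode; // removed by F1
+};
+
+struct BlockWithoutPrev2 {
+  uint32_t Start;
+  uint32_t End;
+  uint64_t Cost;
+  std::vector<uint32_t> Succs, Preds;
+};
+
+// Raw field payload removed: 4 + 1 bytes.
+constexpr std::size_t PayloadBytes = sizeof(uint32_t) + sizeof(uint8_t);
+// Observed sizeof change, which includes whatever padding disappears.
+constexpr std::size_t SizeDelta =
+    sizeof(BlockWithPrev2) - sizeof(BlockWithoutPrev2);
+```
+
+Both structs share the same alignment, because the vectors dominate it. The
+delta is therefore a whole number of alignment units rather than exactly the
+payload size.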
+
+**Out of scope**: re-introducing the tracking when a real consumer lands.
+That belongs in the consumer's own PR.
+
+### F2 — Annotate PR body re: stale Copilot AI threads
+
+**Symptom**: the three Copilot AI inline comments on PR #446 target an
+earlier iteration that included `ResolvedJumpTargets` and call-site
+enumeration. Phase 5 (`c26bf7c`) deleted that code, making the threads
+content-stale.
+
+**Round-2 update**: a live GraphQL query
+(`gh api graphql ... reviewThreads`) on 2026-05-07 confirmed that all
+three Copilot threads are **already** `isResolved: true` (Copilot author
+login: `copilot-pull-request-reviewer`). zoowii's design-doc thread is
+also resolved. So the previously-planned `resolveReviewThread` mutation
+is unnecessary.
+
+**Why this still matters (downgraded from blocking)**: even though the
+threads are visually collapsed, the resolution didn't cite the commit
+that made them obsolete. A future reviewer expanding the threads can
+still be confused. A short pointer in the PR body removes that
+confusion.
+
+**Fix**:
+1. Edit the PR description to add a short "Resolved review threads" line
+   noting that Phase 5 commit `c26bf7c` (call-site enumeration removal)
+   makes the three Copilot AI inline threads content-stale; threads are
+   already resolved on the GitHub side.
+2. Do **not** edit, reply to, re-resolve, or unresolve any thread — they
+   are already in the correct state, and zoowii's thread must be left
+   alone per the "no-auto-reply-to-zoowii" rule.
+
+**Verification**:
+- `gh api graphql -f query='query{repository(owner:"DTVMStack",name:"DTVM"){pullRequest(number:446){reviewThreads(first:50){nodes{id isResolved comments(first:1){nodes{author{login}}}}}}}}'`
+  still reports `isResolved: true` for all 4 threads (3 Copilot + 1
+  zoowii) after the PR body edit.
+- `gh pr view 446` shows the PR body now mentions `c26bf7c` as the
+  commit that obsoleted the call-site / `ResolvedJumpTargets`
+  discussion.
+
+### F3 — Make `weierstrudel` / `jump_around` regression visible in PR body
+
+**Symptom**: the multipass perf table shows `weierstrudel/15 +17.5%`,
+`weierstrudel/1 +19.5%`, `micro/jump_around/empty +22.8%` — within the
+25% gate but clustered near the ceiling. The current PR description
+groups them with "small regressions remain (≤ +6%)" which is wrong, and
+buries them in the per-bench list.
+
+**Why this blocks**: hides a known design-tradeoff cost from upstream
+reviewers; if a future contract trips +25%, reviewers will treat it as a
+new regression rather than the predicted cost of mixed-CFG over-approx.
+
+**Fix**: rewrite the "Risks" / "Evaluation" section of the PR body to:
+1. Correct the "≤ +6%" claim; explicitly list the ~+17 to +23% jump-heavy
+   regressions with the actual numbers.
+2. State that these are the predicted cost of CFG over-approximation on
+   jump-heavy contracts (consistent with the design-doc rationale) — not
+   noise.
+3. Note the 25% threshold buffer is intentional but tight; if a future
+   contract trips, the right move is to investigate that contract, not
+   to widen the threshold.
+
+**Verification**:
+- Read the rewritten PR body once before pushing, confirm each cited
+  number matches the CI bot's latest table (per the
+  "PR perf table integrity" rule, regenerate from the bot, do not paste
+  from memory).
+
+## Non-blocking follow-ups (file as TODO comments + GitHub issue)
+
+### F4 — Document `buildCFGEdges` over-approx invariant
+
+`buildCFGEdges` is at `src/evm/evm_cache.cpp:389-429`. Its function-level
+comment (lines 386-388) and inline branch comment (lines 419-422) already
+explain *why* over-approximation is intentional, but neither links forward
+to the soundness mechanism that absorbs the cost (`lemma614Update` at line
+920, which uses the `effectivePredCount > 1` guard at line 911 to refuse
+shifting along over-approx edges).
+
+Append one sentence to the function-level comment block at lines 386-388:
+
+> "After this pass, JUMPDEST blocks may have many predecessors; this is
+> the intentional partner to `lemma614Update`'s `effectivePredCount > 1`
+> guard, which refuses to shift gas across edges with multiple
+> predecessors and so absorbs the over-approximation soundly."
+
+Documentation only — no behavior change. ~3-line edit at the function
+header.
+
+### F5 — `CacheNeedsSPP` lifecycle invariant comment
+
+The `CacheNeedsSPP` field is at `src/runtime/evm_module.h:82` (already
+has a short comment about JIT consumption). The lifecycle constraint is
+visible at `src/runtime/evm_module.cpp:117` (set before
+`performEVMJITCompile`), `:125` (`getBytecodeCache` triggers build), and
+`:135` (`initBytecodeCache` reads `CacheNeedsSPP`).
+
+Append to the field's existing comment:
+
+> "Must be set before any `getBytecodeCache()` call — once the cache is
+> built, the `EnableSPP` decision is fixed for the lifetime of the
+> module. Future lazy / on-demand JIT paths must flip this flag before
+> triggering the lazy cache build."
+
+Documentation only.
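+
+The ordering constraint can be illustrated with a lazily built cache that
+reads the flag exactly once. `ModuleSketch` and `BytecodeCacheSketch` are
+hypothetical stand-ins for `EVMModule` and `EVMBytecodeCache`, not their
+real interfaces:
+
+```cpp
+#include <cassert>
+#include <memory>
+
+// Hypothetical stand-in for EVMBytecodeCache.
+struct BytecodeCacheSketch {
+  bool BuiltWithSPP = false; // records the EnableSPP decision taken at build
+};
+
+// Hypothetical stand-in for EVMModule with a lazily built cache.
+class ModuleSketch {
+public:
+  bool CacheNeedsSPP = false; // must be set before the first getBytecodeCache()
+
+  const BytecodeCacheSketch &getBytecodeCache() {
+    if (!Cache)
+      initBytecodeCache(); // the first access triggers the build
+    return *Cache;
+  }
+
+private:
+  void initBytecodeCache() {
+    Cache = std::make_unique<BytecodeCacheSketch>();
+    Cache->BuiltWithSPP = CacheNeedsSPP; // the flag is read exactly once, here
+  }
+
+  std::unique_ptr<BytecodeCacheSketch> Cache;
+};
+```
+
+The comment guards against this failure mode: once the cache exists, flipping
+`CacheNeedsSPP` has no effect on it.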
+
+### F6 — `addEdge` O(deg²) compile-time guardrail [DROPPED 2026-05-07]
+
+**Status**: dropped. Opening an upstream issue about a code path none
+of the F1/F4/F5 commits touch, with no measured evidence of compile-
+time pain on the existing CI matrix, would have been noise.
+
+**Original concern (kept for future reference)**: `addEdge`
+(`src/evm/evm_cache.cpp:204` area) uses `std::find` for dedup, giving
+O(current_deg) per insertion. Combined with over-approximated
+dynamic-jump edges (`|JUMPDEST| × |dynamic jumps|`), pathological
+contracts could inflate compile time. Phase 4 gating limits exposure
+to JIT-consumer modules.
+
+**If a future contract trips this**: capture the offending bytecode and a
+JIT compile-time profile first, then either (a) switch `Succs` /
+`Preds` to a `vector<uint32_t>` + `unordered_set<uint32_t>` hybrid for
+O(1) average-case dedup, or (b) add a `LOG_INFO` warning when
+`JumpDestBlocks.size() * dynamic_jump_count` exceeds a threshold so
+the next tuning cycle has telemetry. Don't act preemptively.
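+
+Option (a) could look roughly like this hypothetical `EdgeList` wrapper,
+which is not existing DTVM code. Insertion and membership tests become O(1)
+on average. Removal still scans the vector, so erase + insert sequences such
+as those in `splitCriticalEdges` would only be partly helped:
+
+```cpp
+#include <algorithm>
+#include <cassert>
+#include <cstdint>
+#include <unordered_set>
+#include <vector>
+
+// Ordered adjacency list plus a hash set for average-case O(1) dedup.
+class EdgeList {
+public:
+  // Returns true if To was newly inserted.
+  bool add(uint32_t To) {
+    if (!Seen.insert(To).second)
+      return false; // already present, nothing to do
+    Order.push_back(To);
+    return true;
+  }
+
+  // Returns true if To was present; the vector erase is still O(deg).
+  bool remove(uint32_t To) {
+    if (Seen.erase(To) == 0)
+      return false;
+    Order.erase(std::find(Order.begin(), Order.end(), To));
+    return true;
+  }
+
+  bool contains(uint32_t To) const { return Seen.count(To) != 0; }
+  const std::vector<uint32_t> &items() const { return Order; }
+
+private:
+  std::vector<uint32_t> Order;       // insertion order, used for iteration
+  std::unordered_set<uint32_t> Seen; // membership test
+};
+```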
+
+## Sequencing
+
+| Step | Action | Where |
+|------|--------|-------|
+| 1 | F1: remove `Prev2Pc/Prev2Opcode` (1 commit) | `src/evm/evm_cache.cpp` |
+| 2 | F4 + F5: documentation tweaks (1 commit, squashable) | `src/evm/evm_cache.cpp`, `src/runtime/evm_module.h` |
+| 3 | Build + format + local test gate (see below) | `tools/format.sh` + `evmone-unittests` + `evmone-statetest` + `ctest` |
+| 4 | Push to `feat/gas-check-placement`; await CI green (~35 min for the multipass perf job) | — |
+| 5 | F2: edit PR body to point at `c26bf7c` (no thread mutation — Round-2 live query confirmed all 4 threads already resolved) | GitHub web/CLI |
+| 6 | F3: rewrite Evaluation section in PR body using numbers from the latest CI bot table (per "PR perf table integrity" rule, never paste from memory) | GitHub web/CLI |
+| 7 | (F6 dropped) | — |
+
+## Out-of-scope
+
+- Re-introducing call-site resolution / `ResolvedJumpTargets`: belongs in
+  a future PR with a real consumer (e.g. MIR direct-branch optimization).
+- Tuning the 25% perf threshold or adjusting individual bench tolerances:
+  that is a CI-config concern, not a code change.
+- Switching `addEdge` data structure: see F6 — follow-up only.
+
+## Quality gates
+
+Before pushing the F1+F4+F5 commit, the build must use the CI-faithful
+flag set (`.claude/rules/dtvm-build-config.md` /
+`.claude/rules/match_ci_cmake_flags`): in particular
+`-DZEN_ENABLE_JIT_PRECOMPILE_FALLBACK=ON` and `-DZEN_ENABLE_LIBEVM=ON`,
+otherwise interpreter / fallback paths run a different code shape than
+CI.
+
+1. `tools/format.sh check` clean.
+2. `cmake --build build --target dtvmapi -j$(nproc)` succeeds, no new
+   warnings.
+3. `evmone-unittests` multipass: 223/223 pass.
+4. `evmone-unittests` interpreter: 215/215 pass.
+5. `evmone-statetest --fork Cancun` multipass: 2723/2723 pass (current
+   baseline; the count must match — any drop is a regression).
+6. `evmone-statetest --fork Cancun` interpreter: must match the pass
+   count reported by the most recent CI green run on
+   `feat/gas-check-placement` (binary equality — record it once before
+   making the F1+F4+F5 commit so the local re-run can be compared
+   exactly, not just "all green").
+7. `ctest` from `build/` (the project's built-in EVM spec tests, per
+   `.claude/rules/dtvm-local-test.md`).
+8. CI green on the new push, including the matrix jobs:
+   `Build and test DTVM multipass on x86-64`,
+   `Build and test DTVM interpreter on x86-64`,
+   `Test DTVM-EVM JIT fallback in release mode with ctest on x86-64`,
+   `Test DTVM-EVM multipass evmtestsuite with gas register in release
+   mode with ctest on x86-64`,
+   `Performance Regression Check (interpreter)` and
+   `Performance Regression Check (multipass)`.
+   (~35 min for the multipass perf job.)
+
+Skip F3 (PR-body edits) until F1+F4+F5 commits land and CI passes, since
+the PR description should match the final state of the branch.
diff --git a/docs/changes/2026-05-11-spp-cfg-implicit-dyn-pred/README.md b/docs/changes/2026-05-11-spp-cfg-implicit-dyn-pred/README.md
new file mode 100644
index 000000000..c1e787cd3
--- /dev/null
+++ b/docs/changes/2026-05-11-spp-cfg-implicit-dyn-pred/README.md
@@ -0,0 +1,280 @@
+# Change: SPP CFG over-approximation via implicit dyn-pred count
+
+- **Status**: Implemented
+- **Date**: 2026-05-11
+- **Tier**: Light
+- **Parent PR**: builds on `feat/gas-check-placement` (PR #446)
+
+## Overview
+
+Replace the `O(D * J)` explicit over-approximation edges in
+`buildCFGEdges` (where `D` = #unresolved dynamic jumps and `J` = #JUMPDEST
+blocks) with an `O(D + J)` implicit predecessor count. The SPP shifting
+pass `lemma614Update` makes its single-vs-multi-predecessor decision via
+the new `effectivePredCount`, so behavior is equivalent for every pair
+(parent, JUMPDEST-successor) that the explicit representation would have
+materialized — without ever building the dense edge set.
+
+The static-only reachability gap that this creates (dyn-only JUMPDESTs
+become unreachable from the entry block) is closed by an explicit
+reachability stitch that seeds every JUMPDEST as a root before the
+dominator and loop analyses run.
+
+## Motivation
+
+The current `feat/gas-check-placement` representation builds a dense
+over-approximate CFG: every unresolved dynamic `JUMP`/`JUMPI` adds one
+edge to every JUMPDEST in the contract. This is `O(D * J)` edges, and
+`addEdge`'s `std::find` dedup makes each insertion `O(deg)`, so the
+total cost is `O(D * J^2 + J * D^2) = O(D * J * (D + J))`. For
+pathological dyn-heavy contracts the asymptotic blow-up is third-order.
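+
+Here is the sum written out. It assumes every insertion does a linear dedup
+scan of the adjacency list it appends to. The `Succs` list of each
+dynamic-jump block receives `J` edges, and the `Preds` list of each JUMPDEST
+receives `D` edges:
+
+```math
+\underbrace{D \sum_{k=1}^{J} O(k)}_{\text{Succs of dyn-jump blocks}}
++ \underbrace{J \sum_{k=1}^{D} O(k)}_{\text{Preds of JUMPDESTs}}
+= O(D J^2) + O(J D^2) = O\bigl(D J (D + J)\bigr)
+```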
+
+Independently, `splitCriticalEdges` then processes those same edges
+and (because every JUMPDEST has many predecessors in the over-approx
+graph) splits each one with another `O(deg)` erase + insert pair —
+contributing the same asymptotic cost a second time. PR #446 already
+gates the SPP pipeline on JIT-consumer modules to bound the runtime
+impact, but the per-module cost is still material when a large
+contract is JIT-compiled.
+
+The dense edges contribute nothing to SPP's local shift decision:
+`lemma614Update` refuses to shift into any successor with
+`effectivePredCount > 1`, and every JUMPDEST that the dynamic jump
+could reach has many predecessors after over-approximation. The edges
+are pure compile-time tax.
+
+## Design
+
+### Implicit predecessor count (replaces `D * J` edges)
+
+`GasBlock` gains one field:
+
+```cpp
+uint32_t ImplicitDynamicPredCount = 0;
+```
+
+Set on every JUMPDEST when the contract has at least one unresolved
+dynamic jump. The count equals `D`, matching the number of
+predecessors the explicit over-approximation would have produced.
+
+`effectivePredCount` folds the count in:
+
+```cpp
+static size_t effectivePredCount(const GasBlock &Block) {
+  size_t Count = Block.Preds.size();
+  if (Block.Start == 0) ++Count;
+  Count += Block.ImplicitDynamicPredCount;
+  return Count;
+}
+```
+
+`lemma614Update` reads `effectivePredCount` for every shift decision,
+so it sees an identical "multi-pred?" answer to the explicit case.
+`buildCFGEdges` no longer adds any edge from a dynamic-jump block; the
+SPP graph carries only static fall-through and resolved static-jump
+edges.
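+
+Below is a small C++ check of the counting argument, using a hypothetical
+`PredInfo` record. With `D` dynamic-jump blocks that are distinct from the
+static predecessors, the implicit count matches a deduplicated explicit edge
+set. `explicitPredCount` assumes deduplication on insert. Under that
+assumption, consider a dynamic `JUMPI` block that also reaches the JUMPDEST by
+fall-through. The explicit set counts it once, while the implicit count adds
+it again. The implicit count is therefore never smaller, which is the
+conservative direction for the multi-predecessor guard:
+
+```cpp
+#include <algorithm>
+#include <cassert>
+#include <cstddef>
+#include <cstdint>
+#include <vector>
+
+// Hypothetical per-block predecessor data.
+struct PredInfo {
+  uint32_t Start = 1;                    // bytecode offset; 0 marks the entry block
+  std::vector<uint32_t> Preds;           // static predecessors only
+  uint32_t ImplicitDynamicPredCount = 0; // D on every JUMPDEST when D > 0
+};
+
+// Same folding as effectivePredCount in this change.
+std::size_t effectivePredCount(const PredInfo &B) {
+  std::size_t Count = B.Preds.size();
+  if (B.Start == 0)
+    ++Count;
+  Count += B.ImplicitDynamicPredCount;
+  return Count;
+}
+
+// What a deduplicating explicit representation would count: one Preds entry
+// per distinct dynamic-jump block, and no implicit count.
+std::size_t explicitPredCount(PredInfo B,
+                              const std::vector<uint32_t> &DynJumpBlocks) {
+  for (uint32_t P : DynJumpBlocks)
+    if (std::find(B.Preds.begin(), B.Preds.end(), P) == B.Preds.end())
+      B.Preds.push_back(P);
+  B.ImplicitDynamicPredCount = 0;
+  return effectivePredCount(B);
+}
+```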
+
+### Reachability stitch (closes the dom/loop gap)
+
+After `computeReachable` runs from the entry block, every JUMPDEST is
+seeded into the reachable set and forward-propagated via `Succs`.
+Without this step, dyn-only JUMPDESTs (e.g. Solidity function return
+addresses, reached at runtime only via `PUSH ret; ... JUMP`) would
+remain unreachable in the static-only CFG, and `computeDominators` /
+`buildLoopsUsingDominance` would skip them — letting their static
+successor chains miss SPP shifting opportunities.
+
+The stitch is purely additive (sets only `Reachable[x] = 1`) and
+maintains the dominator monotonicity property required by SPP.
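+
+Below is a minimal sketch of entry-rooted reachability followed by the
+JUMPDEST stitch. It runs on a hypothetical `RBlock` graph in which block 0 is
+the entry. This is an illustration, not the actual `computeReachable`:
+
+```cpp
+#include <cassert>
+#include <cstdint>
+#include <vector>
+
+// Hypothetical static-only CFG node.
+struct RBlock {
+  bool IsJumpDest = false;
+  std::vector<uint32_t> Succs; // static fall-through / resolved-jump edges only
+};
+
+// Worklist propagation over Succs; roots must already be marked reachable.
+void propagate(const std::vector<RBlock> &G, std::vector<uint8_t> &Reachable,
+               std::vector<uint32_t> Work) {
+  while (!Work.empty()) {
+    uint32_t B = Work.back();
+    Work.pop_back();
+    for (uint32_t S : G[B].Succs)
+      if (!Reachable[S]) {
+        Reachable[S] = 1;
+        Work.push_back(S);
+      }
+  }
+}
+
+std::vector<uint8_t> reachableWithStitch(const std::vector<RBlock> &G) {
+  std::vector<uint8_t> Reachable(G.size(), 0);
+  if (G.empty())
+    return Reachable;
+  // Phase 1: ordinary reachability from the entry block.
+  Reachable[0] = 1;
+  propagate(G, Reachable, {0});
+  // Phase 2 (stitch): any JUMPDEST may be a dynamic-jump target, so seed it.
+  std::vector<uint32_t> Roots;
+  for (uint32_t I = 0; I < G.size(); ++I)
+    if (G[I].IsJumpDest && !Reachable[I]) {
+      Reachable[I] = 1;
+      Roots.push_back(I);
+    }
+  propagate(G, Reachable, Roots);
+  return Reachable;
+}
+```
+
+A JUMPDEST reached only through `PUSH ret; ... JUMP` has no static
+predecessor. Only the second phase marks it and its static successors, and
+non-JUMPDEST dead code stays unreachable.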
+
+### Compile-time complexity
+
+| Pass                  | Before (over-approx)        | After (implicit count)       |
+|-----------------------|-----------------------------|------------------------------|
+| `buildCFGEdges`       | `O(D * J^2 + J * D^2)`      | `O(N)`                       |
+| `splitCriticalEdges`  | `O(D * J^2)` on dyn edges   | `O(N)` (no dyn edges to split) |
+| `computeReachable`    | `O(N + E_dense)`            | `O(N) + reachability stitch` |
+| `computeDominators`   | Bitset width up by `+1` per JUMPDEST extra Pred | Same width, sparser graph |
+
+## Alternatives considered
+
+### Super-node (DynDispatch hub) — rejected
+
+A virtual `DynDispatch` block routing all dynamic jumps into one hub,
+then fanning out to all JUMPDESTs. `O(D + J)` edges, preserves
+reachability without a stitch, every standard pass sees a "real" CFG.
+
+Implemented and benchmarked side-by-side. Wall times are local
+single-machine measurements (`evmone-unittests` for the
+`loop_full_of_jumpdests` test, single test, multipass mode). They are
+**not currently tracked in CI** — a dedicated compile-time-dense
+benchmark lane is out of scope for this PR.
+
+| Implementation | Wall time (local) |
+|----------------|-------------------|
+| `feat/gas-check-placement` (over-approx)  | 7.3 s |
+| **A** (implicit count, this PR)           | 3.3 s |
+| **B** (super-node)                        | 275 s |
+
+B's blow-up traces to `computeDominators` / `buildLoopsUsingDominance`
+on the dispatch hub: the hub creates a deeply irreducible CFG where
+the iterative dataflow takes super-linear passes to converge, and
+every back-edge into the hub triggers a `collectNaturalLoop` walk
+over every block transitively reachable from it. Patching the loop
+passes to special-case the hub re-introduces the structural
+asymmetry that motivated A in the first place. **B is unusable.**
+
+### Reproducing the scaling claim
+
+Build the manual demo and run the wrapper script:
+
+```bash
+cmake --build build --target evmCacheComplexityDemo
+bash docs/changes/2026-05-11-spp-cfg-implicit-dyn-pred/scaling_demo.sh
+```
+
+The demo generates a synthetic contract (`CALLDATALOAD JUMP <N x JUMPDEST>
+STOP`) and times the full `buildBytecodeCache` call.
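+
+The synthetic layout is easy to reproduce. The hypothetical generator below
+builds exactly `CALLDATALOAD JUMP <N x JUMPDEST> STOP`, which gives one
+unresolved dynamic jump (`D = 1`) and `N` JUMPDEST blocks. The demo's own
+source is not shown here. The bytecode would underflow the stack if
+executed, because `CALLDATALOAD` pops an offset that was never pushed. That
+does not matter when only `buildBytecodeCache` is timed:
+
+```cpp
+#include <cassert>
+#include <cstddef>
+#include <cstdint>
+#include <vector>
+
+// EVM opcode bytes.
+constexpr uint8_t kOpStop = 0x00;
+constexpr uint8_t kOpCalldataload = 0x35;
+constexpr uint8_t kOpJump = 0x56;
+constexpr uint8_t kOpJumpdest = 0x5b;
+
+// CALLDATALOAD JUMP <N x JUMPDEST> STOP
+std::vector<uint8_t> makeDenseJumpdestContract(std::size_t N) {
+  std::vector<uint8_t> Code;
+  Code.reserve(N + 3);
+  Code.push_back(kOpCalldataload);
+  Code.push_back(kOpJump); // target comes from calldata: unresolved
+  Code.insert(Code.end(), N, kOpJumpdest);
+  Code.push_back(kOpStop);
+  return Code;
+}
+```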
+
+**Intra-PR comparison** (the same demo cherry-picked onto commit
+`99f23a3`, which is the PR's head one commit BEFORE Phase 7 — both
+states run the SPP pipeline; the only difference is the
+over-approximation representation):
+
+| N JUMPDESTs | Pre-Phase-7 (D×J explicit edges) | Phase 7 (O(N) implicit count) | Speedup |
+|------------:|---------------------------------:|------------------------------:|--------:|
+|         100 |   0.07 ms |  0.05 ms | 1.4× |
+|         500 |   0.39 ms |  0.13 ms | 3.0× |
+|       1,000 |   1.01 ms |  0.29 ms | 3.4× |
+|       2,000 |   3.04 ms |  0.67 ms | 4.5× |
+|       5,000 |  19.66 ms |  2.71 ms | 7.2× |
+|      10,000 |  84.76 ms | 10.38 ms | 8.2× |
+|      20,000 | 345.94 ms | 43.68 ms | **7.9×** |
+
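+Pre-Phase-7 wall clock grows 2.6–4.3× per doubling of `N`, settling near
+4× at the largest sizes. That is quadratic: with a single dynamic jump
+(`D = 1`), the expected O(D × J²) cost of explicit-edge add + critical-edge
+split reduces to O(J²). Phase 7 grows about 2.2–2.3× per doubling at small
+`N` and 3.8–4.2× at the largest sizes, so the speedup levels off near 8×
+rather than growing with `N`. The residual super-linearity comes from
+`computeDominators` and `buildLoopsUsingDominance` running on the
+now-larger reachable set.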
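+
+To compare growth rates across rows, estimate the local exponent `k` in
+`t(N) ≈ c·N^k` from any two rows as `k = log(t2/t1) / log(N2/N1)`. A small
+C++ helper (hypothetical, not part of the demo):
+
+```cpp
+#include <cassert>
+#include <cmath>
+
+// Local exponent k in t(N) ~ c * N^k between two measurements.
+double localExponent(double N1, double T1, double N2, double T2) {
+  assert(N1 > 0 && T1 > 0 && N2 > 0 && T2 > 0 && N1 != N2);
+  return std::log(T2 / T1) / std::log(N2 / N1);
+}
+```
+
+Applied to the table, the 10,000 → 20,000 step gives k ≈ 2.03 for
+pre-Phase-7 and k ≈ 2.07 for Phase 7. Phase 7's 500 → 1,000 step gives
+k ≈ 1.16.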
+
+**Scope of the O(N) claim**: Phase 7 makes the CFG over-approximation
+step itself O(N) (one count stamp per JUMPDEST). The wall clock above
+includes the rest of the SPP pipeline — `computeDominators` and
+`buildLoopsUsingDominance` are iterative dataflow with super-linear
+worst-case behaviour and dominate the time at large N. The 4-second
+saving on `loop_full_of_jumpdests` (7.3 s → 3.3 s above) is the Phase 7
+contribution; the remaining 3.3 s is dom / loop analysis plus JIT
+compile, untouched by this PR. Cutting that further would require a
+separate dom-analysis change.
+
+### Edge-budget fallback — rejected
+
+Keep the explicit over-approx but skip SPP when
+`D * J > kBudget`. Trades a complexity ceiling for an SPP cliff; on
+contracts that sit just over the budget the gas-check density jumps
+discontinuously. Solves a symptom rather than the root cause.
+
+## Impact
+
+### Performance (27 paper benches, `--benchmark_min_time=3x`, 5 reps)
+
+vs `feat/gas-check-placement` (PR #446) baseline:
+
+- **Geomean: 0.9727× (-2.73%)**
+- Arithmetic mean: -1.48%
+
+**Wins** (regressions from PR #446 reversed):
+
+| Benchmark | PR #446 vs upstream | A v2 vs PR #446 |
+|---|---|---|
+| `micro/jump_around/empty`       | +22.8% | **-53.1%** |
+| `micro/signextend/zero`         | -42.7% | -24.6% (further) |
+| `main/blake2b_huff/8415nulls`   | -5.3%  | -14.7% (further) |
+| `main/structarray_alloc/nfts_rank` | -6.2% | -4.9% (further) |
+| `main/snailtracer/benchmark`    | +10.9% | -1.3% |
+| `main/weierstrudel/15`          | +17.5% | -2.5% |
+
+**Worst-case regressions** (vs PR #446):
+
+| Benchmark | A v2 vs PR #446 | Note |
+|---|---|---|
+| `main/sha1_shifts/empty`  | +27.0% (mean) | Single-outlier noise; median delta +2.7% |
+| `micro/memory_grow_mstore/by16` | +13.98% | Real |
+| `micro/memory_grow_mload/by32`  | +10.64% | Real |
+| `micro/loop_with_many_jumpdests/empty` | +6.81% | Real (was +48.5% in A v1 without reachability stitch) |
+
+All real regressions are well under the 25% CI gate. The
+`sha1_shifts/empty` mean is pulled up by a single outlier (one of the 5 reps
+hit 8.87 μs); the median delta is +2.7%.
+
+### Correctness
+
+- `evmone-unittests` multipass: **223/223 pass**, 8.4 s wall time
+  (vs 13 s baseline, 305 s for scheme B).
+- `tools/format.sh check`: clean.
+
+## Changed files
+
+- `src/evm/evm_cache.cpp` — `GasBlock::ImplicitDynamicPredCount` field;
+  `effectivePredCount` folds it in; `buildCFGEdges` stamps the count
+  on every JUMPDEST and skips the `D * J` edge-add loop; reachability
+  stitch in `buildGasChunksSPP` seeds every JUMPDEST as a root after
+  `computeReachable`.
+
+## Performance: full PR #446 (with this optimization) vs `upstream/main`
+
+After rebasing `feat/gas-check-placement` onto current `upstream/main`
+(which now includes #458/#460/#482/#483 upstream perf work), the
+end-to-end picture on the same 27-bench paper filter is essentially
+flat:
+
+- **27-bench 10-rep geomean: +1.15%** (treatment slower).
+- 0 benches above the ±25% CI gate.
+- **Caveat — single-session sequential 10-rep is noisy**: a focused 20-rep
+  re-measurement on the four largest 10-rep movers showed they collapse
+  to evmone-bench's inter-binary drift band:
+
+  | Bench | 10-rep Δ | 20-rep Δ (focused) |
+  |---|---|---|
+  | `main/weierstrudel/1` | +3.51% | +0.55% (treat CV 2.19%) |
+  | `main/blake2b_huff/8415nulls` | −6.30% | +1.55% (flipped) |
+  | `micro/loop_with_many_jumpdests/empty` | −4.84% | −0.55% |
+  | `main/blake2b_shifts/8415nulls` | +20.34% (CV 21.93%) | +0.25% (CV 2.09%) |
+
+- Three of the four 10-rep "regression" benches above the noise band —
+  `micro/memory_grow_mstore/{nogrow,by1}`, `micro/memory_grow_mload/nogrow`
+  — contain **zero JUMP / JUMPI / JUMPDEST opcodes**, so PR #446's CFG
+  changes cannot affect them by construction. Those deltas are pure
+  drift artifacts.
+
+The earlier −2.73% A-vs-PR-base geomean still holds — this change does
+improve over PR #446's pre-rebase head. But the cumulative PR #446
+benefit over current upstream/main has shrunk to within drift band on
+this 27-bench corpus: the intervening upstream perf commits absorbed
+the absolute speedup, and the residual per-bench deltas are not
+statistically distinguishable from inter-binary system drift.
+
+### A note on the SPP→JIT cost-flow mechanism
+
+PR #446 is the first time SPP-shifted gas costs reach the JIT in any
+version of DTVM. SPP redistributes cost between blocks but preserves
+the total gas along every path. For contracts with many JUMPDESTs
+targeted by dynamic jumps, the lemma 6.14 multi-pred guard prevents a
+JUMPDEST's cost from being hoisted back into its predecessors (shifts
+INTO it), but still lets the costs of its single-pred successors be
+hoisted into it (shifts OUT). That can mildly inflate the chunk-start
+metering immediate at each JUMPDEST. The effect is not visible on the
+runtime side of the 27-bench corpus at current measurement precision:
+the focused 20-rep run on `main/weierstrudel/1`, the most
+dyn-dispatch-heavy bench, shows a +0.55% delta, within CV. A future PR
+could gate `GasChunkCostSPP` to `nullptr` for JUMPDEST-dense contracts
+if a measurable regression surfaces; nothing in the current corpus
+justifies the added gating logic.
+
+## Out of scope
+
+- The peripheral diagnostics about `GasChunkCostSPP` in clangd are
+  pre-existing for the PR #446 branch and unrelated to this change.
+- Re-introducing super-node / DynDispatch later — would require
+  rewriting `computeDominators` and `buildLoopsUsingDominance` to
+  treat dispatch hubs structurally, which is invasive and gives no
+  measurable benefit over the implicit-count representation.
diff --git a/docs/changes/2026-05-11-spp-cfg-implicit-dyn-pred/review-fixes-r2.md b/docs/changes/2026-05-11-spp-cfg-implicit-dyn-pred/review-fixes-r2.md
new file mode 100644
index 000000000..8fdfe9644
--- /dev/null
+++ b/docs/changes/2026-05-11-spp-cfg-implicit-dyn-pred/review-fixes-r2.md
@@ -0,0 +1,262 @@
+# PR #446 Round-2 Review Response Plan
+
+- **Status**: Revised after round-1 review (Opus + Codex)
+- **Date**: 2026-05-12
+- **Parent change**: `README.md` (SPP CFG implicit-dyn-pred Phase 7)
+- **Branch**: `feat/gas-check-placement`
+
+This plan addresses the 2026-05-12 self-review of post-rebase PR #446,
+revised after round-1 dual-reviewer feedback. Round-1 surfaced four
+substantive corrections: R1.1 is not directly observable, R1.2 needs a
+stronger oracle, R2 is a real semantic change (not a perf guard), and
+R3's target should be the stale comment at evm_cache.cpp:1054-1059.
+
+## Blocking before merge
+
+### R1 — Targeted cache-builder unit tests for Phase 7 invariants
+
+**Symptom**: Phase 7 introduces two new mechanisms — `ImplicitDynamicPredCount`
+folded into `effectivePredCount`, and a reachability stitch that seeds every
+JUMPDEST as a BFS root after `computeReachable`. No test in `src/tests/`
+exercises either directly. The 223/215/2723-case test corpora pass
+empirically but would not isolate a regression in the stitch or
+implicit-pred logic.
+
+**Observability constraint** (from round-1 review): `struct GasBlock` and
+`ImplicitDynamicPredCount` are file-static in `src/evm/evm_cache.cpp` (line
+~197). Only `EVMBytecodeCache` arrays are exposed via `evm_cache.h`.
+**`GasChunkCostSPP[i] != 0` is not a valid oracle for "block was reached"**
+— `buildGasChunksSPP` writes every non-empty block's `Metering[Id]` into
+`GasChunkCostSPP` regardless of whether SPP analysis reached it
+(evm_cache.cpp:1207-1219). The only valid oracle is the **specific shifted
+value at PC**: when SPP analysis ran on a block, the shifted value differs
+from the unshifted base cost in a deterministic, hand-computable way.
+
+**Fix**: add a new test executable `evmCacheTests` to `src/tests/CMakeLists.txt`
+that includes `evm/evm_cache.h` directly and drives `buildBytecodeCache`.
+Use raw-hex bytecode fixtures.
+
+Three cases:
+
+1. **`Stitch_Reaches_DynOnly_JumpDest_Affects_SPP`**
+   Fixture: a contract where one JUMPDEST `A` has NO static predecessor
+   (only reachable via a dynamic JUMP elsewhere). `A` has a successor `S`
+   with `effectivePredCount(S) == 1` and a non-terminator cost that
+   lemma 6.14 would shift back into `A`.
+
+   **Oracle caveat (round-2 review)**: `computeReverseTopo`
+   (evm_cache.cpp:697-735) iterates every block without filtering by
+   `Reachable[]`, so the negative-control claim "without stitch, S not
+   in RevTopo" would be wrong. What actually changes when the stitch
+   fires is `computeDominators` input (Reachable-gated at
+   evm_cache.cpp:630-633): with stitch, A's dom-set is computed against
+   a live forward CFG; without stitch, A is self-dom.
+   `findBackEdgesUsingDominators` and the loop-aware shift path then
+   diverge.
+
+   Oracle: build the cache twice — once with the stitch live (current
+   code) and once with a test-local stitch-off path that no-ops the
+   seed loop. Assert `GasChunkCostSPP[A.Start]` differs between the two.
+   This is a "stitch toggles observable behavior" assertion; it does NOT
+   require hand-computing the exact shifted value, but does require a
+   stitch-off variant accessible to the test (a `#ifdef
+   DTVM_TEST_STITCH_OFF` block, or duplicate the build path in the test
+   TU with the seed loop disabled). If the toggle mechanic proves too
+   invasive, skip case 1 and rely on cases 2 and 3 below.
+
+2. **`No_Shift_Into_Implicit_MultiPred_JumpDest`**
+   Fixture: a JUMPDEST `B` with exactly 1 explicit static predecessor AND
+   ≥ 1 implicit dyn-jump source elsewhere in the contract.
+   - lemma 6.14 on the edge INTO `B`: `effectivePredCount(B) = 1 + DynamicJumpCount ≥ 2`,
+     so it must refuse to hoist any of `B`'s cost back into B's static
+     predecessor.
+   - Assertion: `GasChunkCostSPP[predOf_B.Start]` does not include any
+     contribution from `B`'s base cost, i.e. no shift across the edge
+     into `B` took place.
+
+3. **`Shift_OUT_From_MultiPred_JumpDest_Still_Works`** (added per round-1
+   reviewer note)
+   Fixture: a JUMPDEST `M` that has multiple implicit dyn-pred (so
+   `effectivePredCount(M) > 1`, no shift INTO M), but has at least one
+   successor `T` with `effectivePredCount(T) == 1`.
+   - lemma 6.14 looks at M's successors (evm_cache.cpp:960-972). The
+     check is on `effectivePredCount(Blocks[Succ])`, NOT on M itself. So
+     shifting cost from `T` back into `M` IS still allowed.
+   - Assertion: `GasChunkCostSPP[M.Start]` reflects the shift FROM T, i.e.
+     is greater than `GasChunkCost[M.Start]` (M's unshifted base cost).
+
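+The two guard-based oracles (cases 2 and 3) can be sanity-checked on a
+toy model before any bytecode is authored. The sketch below is
+hypothetical. `ToyBlock`, `effectivePreds` and `hoistMinSuccessor` stand
+in for the file-static `GasBlock`, `effectivePredCount` and
+`lemma614Update`. The terminator and back-edge guards are left out, and
+the sketch assumes that a single multi-pred successor forces the shift
+to 0.
+
+```cpp
+#include <algorithm>
+#include <cstdint>
+#include <vector>
+
+// Hypothetical toy block; the real GasBlock in evm_cache.cpp has more fields.
+struct ToyBlock {
+  uint64_t Cost = 0;                     // unshifted base cost
+  std::vector<int> Succs;                // explicit CFG successors
+  uint32_t ExplicitPreds = 0;            // explicit CFG predecessors
+  uint32_t ImplicitDynamicPredCount = 0; // dyn-jump sources, as a count
+};
+
+// Explicit and implicit predecessors are counted the same way.
+uint32_t effectivePreds(const ToyBlock &B) {
+  return B.ExplicitPreds + B.ImplicitDynamicPredCount;
+}
+
+// Hoist the minimum successor cost into block Id and return the amount moved.
+// A successor with more than one effective predecessor forces the shift to 0,
+// because reducing its cost would undercharge its other incoming paths.
+uint64_t hoistMinSuccessor(std::vector<ToyBlock> &Blocks, int Id) {
+  ToyBlock &B = Blocks[Id];
+  if (B.Succs.empty())
+    return 0;
+  uint64_t MinSucc = UINT64_MAX;
+  for (int S : B.Succs) {
+    if (effectivePreds(Blocks[S]) != 1)
+      return 0;
+    MinSucc = std::min(MinSucc, Blocks[S].Cost);
+  }
+  for (int S : B.Succs)
+    Blocks[S].Cost -= MinSucc;
+  B.Cost += MinSucc;
+  return MinSucc;
+}
+```
+
+In case 2, a predecessor feeding a `B` with one explicit and two implicit
+predecessors keeps its base cost. In case 3, a multi-pred `M` absorbs all
+of its single-pred successor's cost, so its shifted value exceeds its
+base cost. Those are the shapes the real assertions check.
+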
+**Verification**:
+- New test target builds and links cleanly.
+- All three cases pass; explicitly disabling the stitch (debug experiment)
+  must make case 1 fail (oracle is meaningful).
+- `tools/format.sh check` clean.
+- Existing 223/215/2723 corpus unaffected.
+
+**Out of scope**: bytecode fuzzing. Targeted hand-crafted fixtures only.
+
+### R2 — Restrict stitch BFS seeding to dyn-target JUMPDESTs only
+
+**Re-framed per round-1 review**: this is a **semantic change**, not a
+perf guard. The current stitch (evm_cache.cpp:1066-1092) seeds every
+JUMPDEST as a BFS root, including:
+
+1. JUMPDESTs in no-dyn-jump contracts that are statically dead (no pred).
+2. JUMPDESTs in mixed contracts (dyn + static) that have no static or
+   implicit-dyn predecessor — i.e. genuinely-dead JUMPDESTs that no jump
+   targets at all.
+
+Pre-Phase-7, both classes were unreachable in `Reachable[]` and therefore
+ignored by `computeDominators` / `lemma614Update`. Post-Phase-7, both
+classes are now in `Reachable[]`, their dom-tree positions get computed
+(evm_cache.cpp:630-657), they enter `RevTopo`, and `lemma614Update` is
+called on them (evm_cache.cpp:1127-1132 has no `Reachable[]` gate). So
+their loop / backedge / SPP decisions are now potentially different.
+
+**Why this blocks**: it is a silent semantic change on a class of
+contracts that the post-rebase 27-bench corpus does not isolate. The
+change is benign in most cases (dead JUMPDESTs carry no runtime flow, so
+no executed path's cost shifts through them), but it widens the dom/loop
+analysis input set in ways the review cannot fully predict.
+
+**Fix**: change the stitch seed set from "all JUMPDESTs" to "only JUMPDESTs
+with `ImplicitDynamicPredCount > 0`". Implementation: inside the stitch
+loop (currently evm_cache.cpp:1076-1080), gate the `if (Reachable[JdId] == 0)`
+seed with `if (Blocks[JdId].ImplicitDynamicPredCount > 0)`. This restores
+pre-Phase-7 behavior on truly-dead JUMPDESTs while still rescuing real
+dyn-targets.
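+
+The proposed gate can be illustrated on a toy CFG. In this sketch,
+`toyReachable` is hypothetical and not part of `evm_cache.cpp`. Block 0
+is the entry, the first BFS plays the role of `computeReachable`, and
+the seed loop plays the stitch. With `GateOnImplicit` set, only JUMPDESTs
+with `ImplicitDynamicPredCount > 0` become extra roots. Without it, every
+JUMPDEST does, as in the current Phase 7 code.
+
+```cpp
+#include <cstdint>
+#include <deque>
+#include <vector>
+
+// Hypothetical toy: Succs[i] lists the explicit successors of block i, block 0
+// is the entry, and Implicit[i] is block i's ImplicitDynamicPredCount.
+std::vector<bool> toyReachable(const std::vector<std::vector<int>> &Succs,
+                               const std::vector<uint32_t> &Implicit,
+                               const std::vector<int> &JumpDestBlocks,
+                               bool GateOnImplicit) {
+  std::vector<bool> Reachable(Succs.size(), false);
+  std::deque<int> Work;
+  auto Seed = [&](int Id) {
+    if (!Reachable[Id]) {
+      Reachable[Id] = true;
+      Work.push_back(Id);
+    }
+  };
+  auto Drain = [&] {
+    while (!Work.empty()) {
+      int Id = Work.front();
+      Work.pop_front();
+      for (int S : Succs[Id])
+        Seed(S);
+    }
+  };
+  Seed(0);
+  Drain(); // forward BFS from the entry only
+  for (int Jd : JumpDestBlocks)
+    if (!GateOnImplicit || Implicit[Jd] > 0)
+      Seed(Jd); // stitch: JUMPDESTs become extra BFS roots
+  Drain();
+  return Reachable;
+}
+```
+
+In a contract without dynamic jumps, a statically-dead JUMPDEST stays
+unreached only when the gate is on. In a contract whose entry block ends
+in a dynamic jump, the gated stitch still reaches the dyn-target
+JUMPDEST.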
+
+**Verification**:
+- `Reachable[]` is internal to `buildGasChunksSPP`; the public header
+  only exposes `GasChunkCost{,SPP}`, `JumpDestMap`, `PushValueMap`,
+  `GasChunkEnd` (evm_cache.h:18-36). So the test asserts on cache state
+  delta, not on `Reachable[]` directly.
+- Fixture: contract with no dyn-jumps + one statically-dead JUMPDEST
+  `D`. With R2's gate, `D.ImplicitDynamicPredCount == 0`, the stitch
+  skips it, and `D`'s `Metering[]` value remains its unshifted base
+  cost (no `lemma614Update` call considers shifting into `D` because
+  no block has `D` in its Succs). Assertion:
+  `GasChunkCostSPP[D.Start] == GasChunkCost[D.Start]` (no shift).
+  Without the gate (regression case), `D` is in `Reachable[]`,
+  `computeDominators` may treat its position differently, and a
+  shift may alter `GasChunkCostSPP[D.Start]`. The before/after is the
+  observable delta. Implement as a unit test in `evmCacheTests`.
+- Existing tests pass.
+
+**Out of scope**: revisiting whether the dom/loop analyses should run on
+unreachable nodes at all. The conservative move here is to preserve
+pre-Phase-7 behavior on the dead-island class.
+
+### R3 — Fix the stale CFG comment block at evm_cache.cpp:1054-1059
+
+**Symptom**: the comment block above the `buildCFGEdges` call site at
+`evm_cache.cpp:1054-1059` reads:
+
+```
+// Build CFG with over-approximation for all unresolved dynamic jumps.
+// Static jumps (PUSH → JUMP) get precise single-target edges; dynamic
+// jumps get edges to every JUMPDEST. This is intentionally conservative —
+// ...
+```
+
+The text "dynamic jumps get edges to every JUMPDEST" is **wrong**
+post-Phase-7. Inside `buildCFGEdges` (evm_cache.cpp:446-447) the new behavior
+is explicitly "No explicit Succs/Preds edges added" for dyn jumps. A
+future contributor reading the call-site comment will be misled.
+
+**Why this blocks**: stale documentation lures contributors into
+re-introducing the D × J explicit edges (undoing Phase 7) "to match the
+documented behavior".
+
+**Fix**: replace the call-site comment block (1054-1059) with one that
+matches the new implementation. Suggested text:
+
+```
+// Build CFG. Static jumps (PUSH → JUMP) get precise single-target edges.
+// For unresolved dynamic jumps the CFG is kept sound by stamping each
+// JUMPDEST with ImplicitDynamicPredCount instead of materialising the
+// D × |JUMPDEST| explicit edges — that count is folded into
+// `effectivePredCount`, so `lemma614Update`'s "shift only into
+// single-effective-pred successors" check behaves identically to the
+// old explicit-edge representation. The `splitCriticalEdges` pass below
+// operates on explicit Succs/Preds and therefore never sees dyn-jump →
+// JUMPDEST edges; that is intentional because the multi-predecessor
+// guard in `lemma614Update` (with implicit count folded INTO
+// effectivePredCount) blocks shifts whenever effective preds > 1.
+```
+
+**Wording rationale (round-2 review note)**: an earlier draft said "any
+`ImplicitDynamicPredCount > 0` rejects shifts INTO". That is wrong when
+`ImplicitDynamicPredCount == 1` and the JUMPDEST has no explicit static
+pred — `effectivePredCount` would be 1 and the guard would NOT fire. In
+practice that case is moot (no block has the JUMPDEST in its Succs when
+all entries are dyn, so no `lemma614Update` call considers shifting
+into it), but the comment must phrase the invariant in terms of
+`effectivePredCount > 1` to be technically correct.
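+
+The equivalence claimed in the suggested comment comes down to counting.
+This sketch assumes, as in R1 case 2, that each JUMPDEST is stamped with
+the number `D` of unresolved dynamic-jump sources. `explicitEdges` and
+`implicitCount` are hypothetical helpers, not functions in the codebase.
+The sketch also covers the corner case above: with no static predecessor
+and `D == 1`, the effective count is exactly 1.
+
+```cpp
+#include <cstdint>
+#include <vector>
+
+struct PredCounts {
+  std::vector<uint32_t> PerJumpDest; // effective predecessor count per JUMPDEST
+  uint64_t Writes;                   // edge insertions or counter stamps
+};
+
+// Old representation: one explicit edge from every dynamic-jump source to
+// every JUMPDEST, i.e. D * J Preds insertions.
+PredCounts explicitEdges(const std::vector<uint32_t> &StaticPreds, uint32_t D) {
+  PredCounts R{{}, 0};
+  for (uint32_t S : StaticPreds) {
+    uint32_t Count = S;
+    for (uint32_t I = 0; I < D; ++I) {
+      ++Count; // one materialised dyn-jump -> JUMPDEST edge
+      ++R.Writes;
+    }
+    R.PerJumpDest.push_back(Count);
+  }
+  return R;
+}
+
+// Phase 7 representation: a single ImplicitDynamicPredCount stamp per JUMPDEST.
+PredCounts implicitCount(const std::vector<uint32_t> &StaticPreds, uint32_t D) {
+  PredCounts R{{}, 0};
+  for (uint32_t S : StaticPreds) {
+    R.PerJumpDest.push_back(S + D); // effectivePredCount folds the stamp in
+    ++R.Writes;                     // one counter write
+  }
+  return R;
+}
+```
+
+The per-JUMPDEST counts agree for any `D`, while the bookkeeping drops
+from `D × J` edge insertions to `J` stamps. The old representation also
+added matching `Succs` entries, which this sketch does not count.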
+
+**Verification**: comment correct vs implementation. No code change.
+
+## Non-blocking nice-to-have
+
+### R5 — Soften the `loop_full_of_jumpdests` compile-time claim
+
+**Symptom**: `docs/changes/2026-05-11-spp-cfg-implicit-dyn-pred/README.md`
+claims "7.3s → 3.3s" without noting that this is a local single-machine
+measurement, not regression-protected by CI.
+
+**Fix**: update the README phrasing to "7.3s → 3.3s on a local
+single-machine run; not currently tracked in CI". Defer adding a
+compile-time bench lane to a separate PR.
+
+### R6 — Optional paranoid assert in implicit-pred stamp loop
+
+**Symptom**: `buildCFGEdges` stamps `ImplicitDynamicPredCount` on every
+block ID in `JumpDestBlocks` without verifying each ID is actually a
+JUMPDEST opcode.
+
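+**Fix (if cheap)**: before the stamp, assert that the opcode at the
+block's start PC is `evmc::OP_JUMPDEST`, for example
+`ZEN_ASSERT(Code[Blocks[JdId].Start] == evmc::OP_JUMPDEST)`, where `Code`
+stands for whatever bytecode buffer is in scope. A JUMPDEST opens its
+block, so a check on `LastOpcode` would hold only for a block consisting
+of a lone JUMPDEST. Skip if `ZEN_ASSERT` is not available in this TU
+without dragging in extra includes.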
+
+## Dropped
+
+### R4 — Document duplicated `isGasChunkTerminator` check — **dropped**
+
+Round-1 review: the comment block above `effectivePredCount` (~line
+930-937) already documents the multi-pred guarantee, and the
+`MinSucc = 0` rationale is already commented at evm_cache.cpp:963-967.
+Adding another comment is noise per `.claude/rules/cpp-code-style.md`
+("Only include essential comments").
+
+## Execution order
+
+1. **R3** (comment-only) — lowest risk, no code behavior change. Land
+   first so any subsequent diff stays small.
+2. **R2** (stitch-gate): code change. Verify via the R2 fixture that
+   statically-dead JUMPDESTs are no longer seeded, observed through the
+   `GasChunkCostSPP` delta because `Reachable[]` is internal.
+3. **R1** (3 unit tests). Build `evmCacheTests` and ensure all three
+   cases pass against the post-R2 implementation.
+4. **R5** (doc softening) — one-line phrase change.
+5. **R6** (assert) — optional, decide at execution time based on header
+   reach.
+
+After each step: `tools/format.sh check`, build target, run unit tests.
+
+## Verification gate before commit
+
+- New `evmCacheTests` target builds, and the three R1 cases plus the R2
+  dead-JUMPDEST fixture pass.
+- `tools/format.sh check` clean.
+- `cmake --build build --target dtvmapi -j$(nproc)` clean (no new warnings).
+- `evmone-unittests` multipass: 223/223 pass.
+- `evmone-statetest --fork Cancun` multipass: smoke run.
+
+## Risks
+
+- **R1 fixture authoring** is the largest unknown. Hand-computing the
+  expected shifted SPP value requires careful bytecode design. If
+  difficulty exceeds budget, fall back to a single "stitch toggles
+  observable behavior" assertion (case 1 only).
+- **R2 semantic change** may surface in the existing 2723 statetest
+  corpus. If so, this becomes a 3-way decision: revert R2, narrow the
+  guard further, or accept the semantic broadening. Run statetest after
+  R2 lands.
+- **R3, R5, R6** carry no runtime risk.
+
diff --git a/docs/changes/2026-05-11-spp-cfg-implicit-dyn-pred/scaling_demo.sh b/docs/changes/2026-05-11-spp-cfg-implicit-dyn-pred/scaling_demo.sh
new file mode 100755
index 000000000..64fe57022
--- /dev/null
+++ b/docs/changes/2026-05-11-spp-cfg-implicit-dyn-pred/scaling_demo.sh
@@ -0,0 +1,18 @@
+#!/usr/bin/env bash
+# Sweep buildBytecodeCache wall-clock across N JUMPDESTs. Background and
+# numbers live in README.md alongside this script.
+# Prereq: cmake --build build --target evmCacheComplexityDemo
+
+set -euo pipefail
+
+DEMO=${EVMCACHE_DEMO:-build/evmCacheComplexityDemo}
+if [[ ! -x "$DEMO" ]]; then
+  echo "demo binary not found at $DEMO" >&2
+  echo "build it with: cmake --build build --target evmCacheComplexityDemo" >&2
+  exit 1
+fi
+
+echo "n_jumpdests,build_ms"
+for N in 100 500 1000 2000 5000 10000 20000; do
+  "$DEMO" "$N"
+done
diff --git a/docs/design/evm-gas-mechanism.md b/docs/design/evm-gas-mechanism.md
new file mode 100644
index 000000000..9f4f23694
--- /dev/null
+++ b/docs/design/evm-gas-mechanism.md
@@ -0,0 +1,341 @@
+# EVM Gas Mechanism (Interpreter and JIT)
+
+This document describes how DTVM accounts for EVM gas in both the
+interpreter and the multipass JIT, and how the SPP (Structured
+Precharging Pass) shifts charges along the control-flow graph for the
+JIT consumer while keeping the interpreter's per-block totals
+unchanged.
+
+## Goals
+
+- Charge each EVM execution path the exact gas the spec requires.
+- Detect Out-Of-Gas (OOG) before any state change occurs.
+- Amortize the per-opcode "is there enough gas?" check across
+  straight-line code so the hot path reduces to one comparison per
+  basic block (interpreter) or per chunk start (JIT).
+
+## Shared data: the bytecode cache
+
+Both execution engines read from a single
+`zen::evm::EVMBytecodeCache` (`src/evm/evm_cache.h`). The cache is
+built lazily on first access — `EVMModule::initBytecodeCache` is
+defined at `src/runtime/evm_module.cpp:133-136`; the SPP-gating site
+that flips `CacheNeedsSPP` lives at `src/runtime/evm_module.cpp:117`.
+The cache exposes five parallel arrays indexed by program counter
+(PC):
+
+| Field            | Indexed by | Meaning                                                                                 |
+| ---------------- | ---------- | --------------------------------------------------------------------------------------- |
+| `JumpDestMap`    | PC         | 1 if a `JUMPDEST` opcode begins at this PC, else 0.                                     |
+| `PushValueMap`   | PC         | The 256-bit immediate decoded from a `PUSH*` at this PC (otherwise 0).                  |
+| `GasChunkEnd`    | chunk-start PC | Exclusive end PC of the gas chunk that starts here. Zero for non-chunk-start PCs.  |
+| `GasChunkCost`   | chunk-start PC | **Unshifted** sum of opcode gas costs in the chunk (interpreter consumer).         |
+| `GasChunkCostSPP`| chunk-start PC | **SPP-shifted** chunk cost (JIT consumer). Empty when SPP is disabled for this module. |
+
+A "gas chunk" is a maximal straight-line region whose static gas
+cost can be summed once at chunk construction. It ends at any
+**gas-chunk terminator** (`isGasChunkTerminator` at
+`src/evm/evm_cache.cpp:41-62`): the control-flow exits
+`STOP`/`RETURN`/`REVERT`/`SELFDESTRUCT`/`INVALID`/`JUMP`/`JUMPI` and
+the gas-sensitive opcodes
+`SSTORE`/`CALL`/`CALLCODE`/`DELEGATECALL`/`STATICCALL`/`CREATE`/
+`CREATE2`/`GAS`. The terminator is **inside** its chunk — its static
+cost is included in `GasChunkCost` (`evm_cache.cpp:329`) and a fresh
+chunk starts at the *next* PC (`evm_cache.cpp:291-296`). A chunk also
+ends just before a `JUMPDEST` (since `JUMPDEST` itself begins a new
+chunk) and at the end of the bytecode.
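+
+Here is a minimal sketch of the splitting rule on a tiny opcode subset.
+It assumes the standard static costs (`STOP` 0, `ADD` 3, `PUSH1` 3,
+`JUMP` 8, `JUMPI` 10, `GAS` 2, `JUMPDEST` 1). `toyBuildChunks` is
+hypothetical and much simpler than `buildBytecodeCache`: it models only
+four terminators and charges zero for opcodes outside the subset.
+
+```cpp
+#include <algorithm>
+#include <cstddef>
+#include <cstdint>
+#include <vector>
+
+namespace toyop {
+constexpr uint8_t STOP = 0x00, ADD = 0x01, JUMP = 0x56, JUMPI = 0x57,
+                  GAS = 0x5a, JUMPDEST = 0x5b, PUSH1 = 0x60, PUSH32 = 0x7f;
+} // namespace toyop
+
+// Subset of isGasChunkTerminator: control-flow exits plus GAS.
+bool toyIsTerminator(uint8_t Op) {
+  using namespace toyop;
+  return Op == STOP || Op == JUMP || Op == JUMPI || Op == GAS;
+}
+
+uint64_t toyStaticCost(uint8_t Op) {
+  using namespace toyop;
+  if (Op >= PUSH1 && Op <= PUSH32)
+    return 3;
+  switch (Op) {
+  case ADD: return 3;
+  case JUMP: return 8;
+  case JUMPI: return 10;
+  case GAS: return 2;
+  case JUMPDEST: return 1;
+  default: return 0; // STOP and anything outside the toy subset
+  }
+}
+
+struct ToyChunks {
+  std::vector<size_t> End;    // like GasChunkEnd: nonzero only at chunk starts
+  std::vector<uint64_t> Cost; // like GasChunkCost: unshifted chunk sum
+};
+
+ToyChunks toyBuildChunks(const std::vector<uint8_t> &Code) {
+  ToyChunks C{std::vector<size_t>(Code.size(), 0),
+              std::vector<uint64_t>(Code.size(), 0)};
+  size_t Start = 0;
+  uint64_t Sum = 0;
+  auto Close = [&](size_t EndPc) {
+    if (EndPc > Start) { // record only non-empty chunks
+      C.End[Start] = EndPc;
+      C.Cost[Start] = Sum;
+    }
+    Start = EndPc;
+    Sum = 0;
+  };
+  for (size_t Pc = 0; Pc < Code.size();) {
+    uint8_t Op = Code[Pc];
+    if (Op == toyop::JUMPDEST)
+      Close(Pc); // a JUMPDEST always begins a new chunk
+    Sum += toyStaticCost(Op);
+    size_t Next = Pc + 1;
+    if (Op >= toyop::PUSH1 && Op <= toyop::PUSH32)
+      Next += Op - toyop::PUSH1 + 1; // skip the immediate bytes
+    Next = std::min(Next, Code.size()); // tolerate a truncated PUSH
+    if (toyIsTerminator(Op))
+      Close(Next); // the terminator's cost stays inside its own chunk
+    Pc = Next;
+  }
+  Close(Code.size()); // end of bytecode closes the last chunk
+  return C;
+}
+```
+
+For example, on `PUSH1 4; JUMP; STOP; JUMPDEST; ADD; GAS; STOP` it
+produces chunks starting at PCs 0, 3, 4 and 7. The `JUMP` and `GAS`
+costs are counted in the chunks they terminate (11 and 6 gas
+respectively).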
+
+```mermaid
+flowchart LR
+    Bytecode["EVM bytecode"]
+    JD[JumpDestMap]
+    PV[PushValueMap]
+    CE[GasChunkEnd]
+    CC["GasChunkCost<br/>(unshifted)"]
+    SPP["GasChunkCostSPP<br/>(shifted, optional)"]
+
+    Bytecode --> Builder["buildBytecodeCache<br/>(src/evm/evm_cache.cpp)"]
+    Builder --> JD
+    Builder --> PV
+    Builder --> CE
+    Builder --> CC
+    Builder -. "EnableSPP=true" .-> SPP
+
+    JD --> Interpreter
+    PV --> Interpreter
+    CE --> Interpreter
+    CC --> Interpreter
+
+    JD --> JIT["Multipass JIT<br/>(EVMMirBuilder)"]
+    PV --> JIT
+    CE --> JIT
+    CC --> JIT
+    SPP --> JIT
+```
+
+The two consumers read disjoint chunk-cost arrays so neither
+perturbs the other. Concretely:
+
+- The interpreter reads only `GasChunkCost` (`src/evm/interpreter.cpp:382`).
+- The JIT prefers `GasChunkCostSPP` when non-null and falls back to
+  `GasChunkCost` otherwise (`src/compiler/evm_frontend/evm_mir_compiler.cpp:534, 578, 1315`).
+
+## Interpreter mode
+
+The interpreter runs the dispatch loop in
+`BaseInterpreter::interpret` (`src/evm/interpreter.cpp:362`). Each
+outer iteration starts at `Frame->Pc` and tries the **chunk fast
+path** first:
+
+```mermaid
+flowchart TD
+    Start(["outer iter<br/>Pc = ChunkStartPc"]) --> Cond{"GasChunkEnd[Pc] &gt; Pc<br/>AND<br/>gas &gt;= GasChunkCost[Pc]?"}
+    Cond -- "no (either side)" --> Slow["Per-opcode dispatch<br/>(switch/handler call,<br/>line 1610+)<br/>handler invokes chargeGas"]
+    Slow --> SlowOOG{"chargeGas:<br/>gas &lt; opcode cost?"}
+    SlowOOG -- "yes" --> OOG["setStatus(EVMC_OUT_OF_GAS)<br/>break"]
+    SlowOOG -- "no" --> Pcpp["Frame->Pc++"]
+    Pcpp --> Start
+
+    Cond -- "yes" --> Pre["Frame->Msg.gas -= GasChunkCost[Pc]<br/>(pre-charge entire chunk)"]
+    Pre --> CG["Computed-goto fast path<br/>until Pc &gt;= ChunkEnd"]
+    CG --> Restart{"control-flow<br/>opcode hit?"}
+    Restart -- "no" --> Start
+    Restart -- "yes (JUMP/JUMPI/...)<br/>update Pc, restart" --> Start
+```
+
+Key properties:
+
+- Inside a chunk, **no gas check happens per opcode** — the chunk's
+  total has already been deducted at the chunk start. The
+  computed-goto loop simply executes opcodes, advances `Pc`, and
+  checks `Pc >= ChunkEnd` to exit
+  (`DISPATCH_NEXT` macro at `src/evm/interpreter.cpp:525`).
+- Opcodes whose behaviour depends on `gas_left` at runtime
+  (`SSTORE`, `CALL*`, `CREATE*`, `GAS`) are gas-chunk terminators —
+  each is the **last** opcode of its chunk, so the chunk's static
+  pre-charge has been applied before the handler runs and any
+  dynamic delta the handler charges (via `chargeGas` at
+  `src/evm/interpreter.cpp:33-50`) is layered on top of an accurate
+  `gas_left` value.
+- Memory expansion is **not** a chunk boundary: opcodes that touch
+  memory (`MLOAD`, `MSTORE`, `MSTORE8`, `KECCAK256`, the various
+  `*COPY` opcodes, `RETURN`, `REVERT`, …) charge their dynamic
+  expansion delta inline by calling `expandMemoryAndChargeGas`
+  (`src/evm/opcode_handlers.cpp:261`) from within the handler.
+- The interpreter intentionally consumes the **unshifted** cost
+  (PR #371). The cache must keep an unshifted column available
+  regardless of whether SPP runs.
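+
+The fast-path decision can be sketched in isolation. `ToyFrame`,
+`toyChunkFastPath` and `toyChargeGas` are hypothetical stand-ins for the
+real frame, dispatch loop and `chargeGas`. The point is only that the
+fast path never raises OOG itself. When the whole chunk cannot be
+pre-charged, it declines. The slow path then charges opcode by opcode,
+so OOG surfaces at the exact opcode that runs out.
+
+```cpp
+#include <cstddef>
+#include <cstdint>
+#include <vector>
+
+struct ToyFrame {
+  size_t Pc = 0;
+  int64_t Gas = 0; // signed, like evmc_message::gas
+};
+
+// Fast path: pre-charge the whole chunk only if Pc starts a chunk and the full
+// unshifted cost fits; otherwise leave gas untouched and decline.
+bool toyChunkFastPath(ToyFrame &F, const std::vector<size_t> &ChunkEnd,
+                      const std::vector<uint64_t> &ChunkCost) {
+  if (F.Pc >= ChunkEnd.size() || ChunkEnd[F.Pc] <= F.Pc)
+    return false;
+  int64_t Cost = static_cast<int64_t>(ChunkCost[F.Pc]);
+  if (F.Gas < Cost)
+    return false;
+  F.Gas -= Cost;
+  return true;
+}
+
+// Slow path: charge a single opcode; false means out of gas.
+bool toyChargeGas(ToyFrame &F, uint64_t OpCost) {
+  int64_t Cost = static_cast<int64_t>(OpCost);
+  if (F.Gas < Cost)
+    return false;
+  F.Gas -= Cost;
+  return true;
+}
+```
+
+For example, with 10 gas at the start of an 11-gas chunk (`PUSH1` 3 +
+`JUMP` 8), `toyChunkFastPath` declines and leaves gas at 10. Then
+`toyChargeGas` succeeds for `PUSH1` and fails for `JUMP`.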
+
+## Multipass JIT mode
+
+The JIT lowers EVM bytecode to dMIR via `EVMMirBuilder`
+(`src/compiler/evm_frontend/evm_mir_compiler.{h,cpp}`). Gas accounting
+is woven into MIR generation by two helpers:
+
+- `meterOpcode(Opcode, PC)` — emit the gas check for one opcode at
+  `PC` (`src/compiler/evm_frontend/evm_mir_compiler.cpp:524`).
+- `meterOpcodeRange(StartPC, EndPCExclusive)` — emit the gas check
+  for a contiguous PC range, used by the JUMPDEST run optimization
+  (`src/compiler/evm_frontend/evm_mir_compiler.cpp:544`).
+
+Both ultimately call `meterGas(Cost)` to emit the actual dMIR
+sequence (`src/compiler/evm_frontend/evm_mir_compiler.cpp:607`,
+short-circuits when `GasCost == 0` at line 608):
+
+```mermaid
+flowchart TD
+    A["meterOpcode(Op, PC)"] --> B{"GasMeteringEnabled?"}
+    B -- "no" --> X(["return (no MIR)"])
+    B -- "yes" --> Cache{"Chunk cache populated?<br/>(GasChunkEnd &amp;&amp; GasChunkCost<br/>&amp;&amp; PC &lt; GasChunkSize)"}
+    Cache -- "no (cache absent)" --> PerOp["Cost = InstructionMetrics[Op].gas_cost<br/>meterGas(Cost)"]
+    Cache -- "yes" --> ChunkStart{"GasChunkEnd[PC] &gt; PC?<br/>(this PC is a chunk start)"}
+    ChunkStart -- "no (mid-chunk PC)" --> Skip(["return (no MIR;<br/>chunk start already paid)"])
+    ChunkStart -- "yes" --> Sel["Cost = GasChunkCostSPP[PC]<br/>  ?? GasChunkCost[PC]<br/>meterGas(Cost)"]
+    PerOp --> Emit
+    Sel --> Emit
+
+    Emit["meterGas(Cost) emits dMIR:<br/>  CurrentGas = load gas<br/>  IsOutOfGas = (CurrentGas &lt; Cost)<br/>  brif IsOutOfGas, OOGBlock, ContinueBlock<br/>  NewGas = CurrentGas - Cost<br/>  store NewGas"]
+    Emit --> Cont(["fall through to opcode lowering"])
+```
+
+Two consequences:
+
+1. The JIT emits **at most one gas check per chunk** — the call at
+   the chunk-start opcode covers every opcode up to (but not
+   including) the next chunk start. Calls at mid-chunk PCs see
+   `GasChunkEnd[PC] == 0` and return without emitting any MIR
+   (`evm_mir_compiler.cpp:529, 537`). The fast path at lines 553-572
+   in `meterOpcodeRange` consumes a precomputed
+   `JumpDestRunLastPC`/`JumpDestRunSkipCost` table; the table itself
+   is populated when the JUMPDEST run jump-table is materialized
+   (`evm_mir_compiler.cpp:1297-1335`), so dispatching across a run
+   of consecutive `JUMPDEST`s costs one `meterGas` call.
+2. The OOG branch is shared across all gas checks in the function
+   via `getOrCreateExceptionSetBB(ErrorCode::GasLimitExceeded)`,
+   keeping the cold path out of the hot block layout
+   (`evm_mir_compiler.cpp:626, 663`).
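+
+The `meterOpcode` flowchart can be condensed into a pure function. This
+is a sketch, not the real `meterOpcode`, which emits dMIR rather than
+returning a value. The hypothetical `toyMeterCost` returns the immediate
+that `meterGas` would be called with, or `std::nullopt` when no MIR is
+emitted. It also folds in `meterGas`'s zero-cost short-circuit.
+
+```cpp
+#include <cstdint>
+#include <optional>
+
+// Returns the cost a gas check would be emitted for at PC, or nullopt when no
+// check is emitted (mid-chunk PC, or a chunk start whose cost is zero).
+std::optional<uint64_t> toyMeterCost(uint64_t PC, const uint64_t *ChunkEnd,
+                                     const uint64_t *ChunkCost,
+                                     const uint64_t *ChunkCostSPP,
+                                     uint64_t ChunkSize, uint64_t OpcodeCost) {
+  uint64_t Cost;
+  if (!ChunkEnd || !ChunkCost || PC >= ChunkSize) {
+    Cost = OpcodeCost; // no chunk cache: per-opcode metering
+  } else if (ChunkEnd[PC] <= PC) {
+    return std::nullopt; // mid-chunk: the chunk start already paid
+  } else {
+    // Prefer the SPP-shifted cost; nullptr means SPP did not run.
+    Cost = ChunkCostSPP ? ChunkCostSPP[PC] : ChunkCost[PC];
+  }
+  if (Cost == 0)
+    return std::nullopt; // meterGas short-circuits and emits nothing
+  return Cost;
+}
+```
+
+For example, at a chunk start whose `GasChunkCostSPP` entry is 0 (all of
+its cost hoisted into a predecessor), `toyMeterCost` returns
+`std::nullopt`. With `ChunkCostSPP == nullptr`, the same PC returns the
+unshifted `GasChunkCost` entry.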
+
+When the build is configured with `ZEN_ENABLE_EVM_GAS_REGISTER`, the
+gas value lives in a virtual register (`GasRegVar`,
+`evm_mir_compiler.cpp:614-642`) instead of being reloaded from
+memory on every `meterGas`. Synchronization back to `EVMInstance`
+happens at any host-call boundary that may read or update gas —
+not just `CALL*`/`CREATE*`/return, but also runtime helpers such as
+the balance/code/keccak/memory-load handlers (`syncGasToMemory`
+calls at `evm_mir_compiler.cpp:3556, 3623, 3638, 3652, 3745, 3776,
+3857, 3976, 4054, 4136`; `syncGasToMemoryFull` is invoked at module
+return / `RETURN` / `REVERT` / `STOP` /
+`SELFDESTRUCT` paths around lines 1246, 4167-4259).
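+
+Here is a minimal sketch of the cached-register discipline, using
+hypothetical names (`ToyInstance`, `ToyGasReg`). It models only the
+ordering constraint: the cached value is written back before any helper
+that may read gas. As an assumption of this sketch, the value is also
+reloaded afterwards in case the helper charged gas itself. The sketch
+says nothing about how `GasRegVar` is actually represented in dMIR.
+
+```cpp
+#include <cstdint>
+#include <functional>
+
+struct ToyInstance {
+  int64_t Gas = 0; // the in-memory gas counter that host helpers see
+};
+
+class ToyGasReg {
+public:
+  explicit ToyGasReg(ToyInstance &I) : Inst(I), Reg(I.Gas) {}
+
+  // meterGas against the cached value, with no memory traffic.
+  bool meter(int64_t Cost) {
+    if (Reg < Cost)
+      return false;
+    Reg -= Cost;
+    return true;
+  }
+
+  // Host-call boundary: write back, run the helper, then reload.
+  void callHelper(const std::function<void(ToyInstance &)> &Helper) {
+    Inst.Gas = Reg; // plays the role of syncGasToMemory
+    Helper(Inst);
+    Reg = Inst.Gas; // pick up anything the helper charged
+  }
+
+  int64_t value() const { return Reg; }
+
+private:
+  ToyInstance &Inst;
+  int64_t Reg;
+};
+```
+
+For example, starting from 100 gas and metering 30, a helper that reads
+gas sees 70 only because of the write-back. If that helper charges 5,
+the reload leaves the register at 65.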
+
+## SPP cost shifting
+
+The Structured Precharging Pass — implemented as `lemma614Update` in
+`src/evm/evm_cache.cpp:919` — moves gas costs **backwards** along the
+CFG. For each non-cycle node, it charges the minimum successor cost
+upfront, so the consumer only pays the residual at runtime:
+
+```
+                 Block A (cost = 3)
+                /                \
+       Block B (5)            Block C (7)
+
+After SPP (min successor = 5 charged at A):
+
+                 Block A' (cost = 3 + 5 = 8)
+                /                \
+       Block B' (0)           Block C' (2)
+```
+
+(The diagram assumes B and C each have only A as predecessor and
+neither ends with a gas-chunk terminator — `lemma614Update` only
+shifts when those preconditions hold; see the
+`effectivePredCount == 1` and `isGasChunkTerminator` guards at
+`evm_cache.cpp:940, 944, 966`.)
+
+Per-path totals are preserved: A→B is `3+5 = 8` before and
+`8+0 = 8` after; A→C is `3+7 = 10` before and `8+2 = 10` after. The
+benefit is that B's chunk now starts with cost zero, which lets
+`meterGas` short-circuit and emit no dMIR at all
+(`evm_mir_compiler.cpp:608`), and C's chunk only needs to charge the
+residual `2`. The JIT therefore emits fewer non-trivial gas checks
+on the hot path and shrinks the OOG fan-out.
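+
+The arithmetic above can be checked mechanically. This sketch assumes
+the diagram's preconditions hold, so the single-predecessor and
+terminator guards are not modelled. `Fork`, `hoist` and
+`pathTotalsPreserved` are hypothetical names.
+
+```cpp
+#include <algorithm>
+#include <cstdint>
+
+struct Fork {
+  uint64_t A, B, C; // chunk costs of the predecessor and its two successors
+};
+
+// Hoist the cheaper successor's cost into A (the lemma 6.14 step, no guards).
+Fork hoist(Fork F) {
+  uint64_t Min = std::min(F.B, F.C);
+  return {F.A + Min, F.B - Min, F.C - Min};
+}
+
+// Per-path totals A->B and A->C must be unchanged by the shift.
+bool pathTotalsPreserved(const Fork &Before, const Fork &After) {
+  return Before.A + Before.B == After.A + After.B &&
+         Before.A + Before.C == After.A + After.C;
+}
+```
+
+For the diagram's costs (3, 5, 7), `hoist` returns (8, 0, 2), and both
+path totals (8 and 10) are preserved.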
+
+Soundness on cycles: the shift never crosses back-edges or
+gas-chunk terminators (`SSTORE`/`CALL*`/`CREATE*`/`GAS`), so dynamic
+gas is always charged at the correct point
+(`evm_cache.cpp:421-427, 919-960`).
+
+### Why a separate `GasChunkCostSPP` array
+
+The interpreter's chunk fast path was specified against the
+**unshifted** per-block cost in PR #371 and the cache must continue
+to honour that contract. To enable SPP for the JIT without
+disturbing the interpreter, the cache exposes two parallel arrays:
+
+- `GasChunkCost` — unshifted, written from `Blocks[Id].Cost`
+  (`evm_cache.cpp:1161`), consumed by the interpreter.
+- `GasChunkCostSPP` — shifted, written from the metering function
+  `Metering[Id]` (`evm_cache.cpp:1165`), consumed by the JIT.
+
+The shifted variant is sound for the JIT because SPP refuses to
+shift cost across **gas-sensitive terminators**: `GAS`, `CALL*`,
+`CREATE*` (`isGasSensitiveTerminator` and `isGasChunkTerminator`
+checks at `evm_cache.cpp:944, 966`). Each of these opcodes ends its
+own chunk, so by the time it executes the chunk's cost — shifted or
+not — has already been deducted at the chunk-start `meterGas`, and
+the value the opcode reads (e.g. `GAS`) reflects the spec-mandated
+remaining gas. Cost from the *successor* chunk never leaks back
+across the terminator.
+
+### Mixed-precision CFG
+
+The SPP pass needs a sound CFG to compute "minimum successor cost"
+correctly:
+
+```mermaid
+flowchart LR
+    subgraph Static[Static jump]
+      P1[PUSH dest_pc] --> J1[JUMP]
+      J1 -. resolved .-> D1[JUMPDEST at dest_pc]
+    end
+
+    subgraph Dynamic[Dynamic jump]
+      X[stack-derived target] --> J2[JUMP]
+      J2 -. over-approx .-> D2[every JUMPDEST]
+    end
+```
+
+- `PUSH n; JUMP` resolves to a single edge to `JUMPDEST` at PC `n`
+  (`resolveConstantJumpTarget` in `evm_cache.cpp`).
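+- Every other dynamic `JUMP` is treated as targeting **all** `JUMPDEST`
+  blocks (`buildCFGEdges`, `evm_cache.cpp:386-429`). Rather than
+  materialising those `D × J` edges, the cache stamps each `JUMPDEST`
+  with `ImplicitDynamicPredCount`, which `effectivePredCount` folds in
+  so the SPP guards see the same predecessor counts.
+
+Narrowing dynamic-jump edges using partial call-site information
+would under-approximate the CFG. A `JUMPDEST` could then look as if it
+had a single predecessor while other runtime paths also reach it, so
+SPP would hoist its cost into that one predecessor and the other paths
+would skip the charge, breaking the per-path total invariant.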
+The over-approximation is intentional and documented inline
+(`evm_cache.cpp:419-427`).
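+
+Here is a minimal sketch of the `PUSH n; JUMP` match. It assumes the
+scanner already knows the PC of the instruction just before the `JUMP`.
+`toyResolveStaticTarget` is hypothetical and does not share the
+signature of `resolveConstantJumpTarget`. It treats a pushed value that
+is not a `JUMPDEST` as unresolved, which may differ from the real code.
+
+```cpp
+#include <cstddef>
+#include <cstdint>
+#include <optional>
+#include <vector>
+
+// Returns the JUMPDEST PC if Code[PrevPc] is a PUSHn that immediately precedes
+// the JUMP at JumpPc and pushes a valid JUMPDEST PC; otherwise nullopt.
+std::optional<size_t>
+toyResolveStaticTarget(const std::vector<uint8_t> &Code,
+                       const std::vector<uint8_t> &JumpDestMap, size_t PrevPc,
+                       size_t JumpPc) {
+  constexpr uint8_t PUSH1 = 0x60, PUSH32 = 0x7f, JUMP = 0x56;
+  if (JumpPc >= Code.size() || Code[JumpPc] != JUMP || PrevPc >= JumpPc)
+    return std::nullopt;
+  uint8_t Op = Code[PrevPc];
+  if (Op < PUSH1 || Op > PUSH32)
+    return std::nullopt; // stack-derived target: over-approximated instead
+  size_t N = Op - PUSH1 + 1; // immediate width in bytes
+  if (PrevPc + 1 + N != JumpPc)
+    return std::nullopt; // the PUSH does not directly precede the JUMP
+  uint64_t Value = 0;
+  for (size_t I = 0; I < N; ++I) {
+    if (Value >> 56)
+      return std::nullopt; // would overflow 64 bits; cannot be a valid PC
+    Value = (Value << 8) | Code[PrevPc + 1 + I];
+  }
+  if (Value >= JumpDestMap.size() || !JumpDestMap[Value])
+    return std::nullopt;
+  return static_cast<size_t>(Value);
+}
+```
+
+For example, on `PUSH1 4; JUMP; STOP; JUMPDEST` it returns 4. Changing
+the immediate to 3, which points at the `STOP`, makes it return
+`std::nullopt`.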
+
+## Pipeline gating
+
+CFG construction and the SPP shifting pass are significant
+compile-time work that only benefits the JIT consumer. Interpreter-only
+modules skip both via `EVMModule::CacheNeedsSPP`:
+
+```mermaid
+sequenceDiagram
+    participant Loader as EVMModule::create
+    participant Mod as EVMModule
+    participant Cache as EVMBytecodeCache
+    participant JIT as performEVMJITCompile
+
+    Loader->>Mod: construct (CacheNeedsSPP=false)
+    alt RunMode != InterpMode
+        Loader->>Mod: EVMAnalyzer.analyze()
+        alt JIT-suitable
+            Loader->>Mod: CacheNeedsSPP = true
+            Loader->>JIT: performEVMJITCompile(Mod)
+            JIT->>Cache: getBytecodeCache()
+            Cache->>Cache: buildBytecodeCache(EnableSPP=true)
+            Note right of Cache: builds CFG, runs SPP,<br/>fills GasChunkCostSPP
+            Cache-->>JIT: cache (with SPP)
+        end
+    end
+
+    Note over Loader,Cache: First call to interpreter only:
+    Loader->>Cache: getBytecodeCache()
+    Cache->>Cache: buildBytecodeCache(EnableSPP=false)
+    Note right of Cache: skips CFG/SPP,<br/>GasChunkCostSPP stays empty
+```
+
+`evm_compiler.cpp` passes `nullptr` for the SPP pointer when the
+array is empty
+(`src/compiler/evm_compiler.cpp:70-74`), so a JIT compilation that
+runs without SPP (e.g. JIT bypass paths) cleanly falls back to the
+unshifted array and remains correct.
+
+## Failure mode summary
+
+| Trigger                                                       | Where                                                              | Result                                       |
+| ------------------------------------------------------------- | ------------------------------------------------------------------ | -------------------------------------------- |
+| Interpreter, gas insufficient for full chunk pre-charge       | Combined check at `interpreter.cpp:397-398`                        | Skip fast path; fall through to slow path    |
+| Interpreter slow path, gas < per-opcode cost                  | `chargeGas` at `interpreter.cpp:33-50`                             | `setStatus(EVMC_OUT_OF_GAS)`, exit outer loop |
+| JIT chunk-start `meterGas`, gas < `Cost`                      | `meterGas` `IsOutOfGas` branch (`evm_mir_compiler.cpp:622-631`)    | Branch to shared `OutOfGasBB`                 |
+| JIT per-opcode `meterGas` (no chunk cache), gas < `Cost`      | Same `meterGas` code path, with the single opcode's `Cost`         | Same shared `OutOfGasBB`                      |
+| Dynamic-cost opcode (`SSTORE`/`CALL*`/`CREATE*`) underpaid    | Forced chunk boundary; charged by handler call                     | Returns OOG status to dispatcher              |
+
+## References
+
+- `src/evm/evm_cache.{h,cpp}` — bytecode cache, CFG construction,
+  `buildGasChunksSPP`, `lemma614Update`.
+- `src/evm/interpreter.cpp` — chunk fast path (line 395), per-opcode
+  `chargeGas` (line 33).
+- `src/compiler/evm_frontend/evm_mir_compiler.{h,cpp}` —
+  `meterOpcode`, `meterOpcodeRange`, `meterGas`.
+- `src/runtime/evm_module.{h,cpp}` — `CacheNeedsSPP` gating before
+  `performEVMJITCompile`.
+- `src/compiler/evm_compiler.cpp:70-74` — JIT-side `nullptr` fallback
+  for empty `GasChunkCostSPP`.
+- `docs/changes/2026-04-05-gas-check-placement/README.md` — design
+  notes and benchmark results for the mixed-CFG / dual-array split.
+- `docs/modules/evm/spec.md` — module spec for the EVM bytecode cache.
diff --git a/src/compiler/evm_compiler.cpp b/src/compiler/evm_compiler.cpp
index f7b908c7a..28ee695e5 100644
--- a/src/compiler/evm_compiler.cpp
+++ b/src/compiler/evm_compiler.cpp
@@ -69,8 +69,13 @@ void EagerEVMJITCompiler::compile() {
   Ctx.setMemoryLinearStrideSkipLeadingZeroLimbStores(
       EVMMod->getMemoryLinearStrideSkipLeadingZeroLimbStores());
   const auto &Cache = EVMMod->getBytecodeCache();
+  // GasChunkCostSPP is only allocated when the SPP metering pipeline runs
+  // (i.e. this module will be JIT-compiled). Pass nullptr when the array is
+  // empty so the JIT falls back to the unshifted GasChunkCost automatically.
+  const uint64_t *CostSPPPtr =
+      Cache.GasChunkCostSPP.empty() ? nullptr : Cache.GasChunkCostSPP.data();
   Ctx.setGasChunkInfo(Cache.GasChunkEnd.data(), Cache.GasChunkCost.data(),
-                      EVMMod->CodeSize);
+                      CostSPPPtr, EVMMod->CodeSize);
 
   MModule Mod(Ctx);
   buildEVMFunction(Ctx, Mod, *EVMMod);
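`EagerEVMJITCompiler::compile()` passes `nullptr` instead of `GasChunkCostSPP.data()` when the array is empty. One reason the explicit check matters is that `std::vector::data()` on an empty vector is not guaranteed to return `nullptr`. Testing `empty()` gives consumers an unambiguous "absent" signal. The sketch models both sides of that contract. `optionalData` and `chunkCostAt` are hypothetical helpers, not functions from the codebase.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Producer side: map "not computed" (empty vector) to nullptr explicitly,
// because data() on an empty vector may or may not be null.
const uint64_t *optionalData(const std::vector<uint64_t> &V) {
  return V.empty() ? nullptr : V.data();
}

// Consumer side, in the style of meterOpcode/meterOpcodeRange: prefer the
// SPP-shifted cost when present, else fall back to the unshifted cost.
uint64_t chunkCostAt(const uint64_t *Cost, const uint64_t *CostSPP,
                     size_t PC) {
  return CostSPP ? CostSPP[PC] : Cost[PC];
}
```

With this split, an interpreter-only cache leaves the JIT reading the unshifted array. No extra flag needs to be threaded through `EVMFrontendContext`.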
diff --git a/src/compiler/evm_frontend/evm_mir_compiler.cpp b/src/compiler/evm_frontend/evm_mir_compiler.cpp
index 3b04a5784..bbaa4a247 100644
--- a/src/compiler/evm_frontend/evm_mir_compiler.cpp
+++ b/src/compiler/evm_frontend/evm_mir_compiler.cpp
@@ -78,6 +78,7 @@ EVMFrontendContext::EVMFrontendContext(const EVMFrontendContext &OtherCtx)
       BytecodeSize(OtherCtx.BytecodeSize),
       GasMeteringEnabled(OtherCtx.GasMeteringEnabled),
       GasChunkEnd(OtherCtx.GasChunkEnd), GasChunkCost(OtherCtx.GasChunkCost),
+      GasChunkCostSPP(OtherCtx.GasChunkCostSPP),
       GasChunkSize(OtherCtx.GasChunkSize), Revision(OtherCtx.Revision),
       MemoryLinearStrideSkipLeadingZeroLimbStores(
           OtherCtx.MemoryLinearStrideSkipLeadingZeroLimbStores)
@@ -393,6 +394,7 @@ void EVMMirBuilder::initEVM(CompilerContext *Context) {
 
   GasChunkEnd = EvmCtx->getGasChunkEnd();
   GasChunkCost = EvmCtx->getGasChunkCost();
+  GasChunkCostSPP = EvmCtx->getGasChunkCostSPP();
   GasChunkSize = EvmCtx->getGasChunkSize();
 
 #ifdef ZEN_ENABLE_EVM_GAS_REGISTER
@@ -527,7 +529,12 @@ void EVMMirBuilder::meterOpcode(evmc_opcode Opcode, uint64_t PC) {
   }
   if (GasChunkEnd && GasChunkCost && PC < GasChunkSize) {
     if (GasChunkEnd[PC] > PC) {
-      meterGas(GasChunkCost[PC]);
+      // Prefer SPP-shifted cost when available — it preserves per-path totals
+      // while reducing the number of non-zero entries the JIT must emit a
+      // gas check for.
+      const uint64_t Cost =
+          GasChunkCostSPP ? GasChunkCostSPP[PC] : GasChunkCost[PC];
+      meterGas(Cost);
     }
     return;
   }
@@ -570,7 +577,7 @@ void EVMMirBuilder::meterOpcodeRange(uint64_t StartPC,
     uint64_t Cost = 0;
     if (GasChunkEnd && GasChunkCost && PC < GasChunkSize &&
         GasChunkEnd[PC] > PC) {
-      Cost = GasChunkCost[PC];
+      Cost = GasChunkCostSPP ? GasChunkCostSPP[PC] : GasChunkCost[PC];
     } else {
       const uint8_t Opcode = static_cast<uint8_t>(Bytecode[PC]);
       Cost = static_cast<uint64_t>(InstructionMetrics[Opcode].gas_cost);
@@ -1307,7 +1314,7 @@ void EVMMirBuilder::createJumpTable() {
             uint64_t Cost = 0;
             if (GasChunkEnd && GasChunkCost && Pc < GasChunkSize &&
                 GasChunkEnd[Pc] > Pc) {
-              Cost = GasChunkCost[Pc];
+              Cost = GasChunkCostSPP ? GasChunkCostSPP[Pc] : GasChunkCost[Pc];
             } else {
               // All bytes in the run are JUMPDEST opcode bytes (PUSH payload is
               // skipped in the scan above), so the fallback is a constant.
diff --git a/src/compiler/evm_frontend/evm_mir_compiler.h b/src/compiler/evm_frontend/evm_mir_compiler.h
index 65f35c745..4eec866de 100644
--- a/src/compiler/evm_frontend/evm_mir_compiler.h
+++ b/src/compiler/evm_frontend/evm_mir_compiler.h
@@ -66,13 +66,15 @@ class EVMFrontendContext final : public CompileContext {
   bool isGasMeteringEnabled() const { return GasMeteringEnabled; }
 
   void setGasChunkInfo(const uint32_t *ChunkEnd, const uint64_t *ChunkCost,
-                       size_t Size) {
+                       const uint64_t *ChunkCostSPP, size_t Size) {
     GasChunkEnd = ChunkEnd;
     GasChunkCost = ChunkCost;
+    GasChunkCostSPP = ChunkCostSPP;
     GasChunkSize = Size;
   }
   const uint32_t *getGasChunkEnd() const { return GasChunkEnd; }
   const uint64_t *getGasChunkCost() const { return GasChunkCost; }
+  const uint64_t *getGasChunkCostSPP() const { return GasChunkCostSPP; }
   size_t getGasChunkSize() const { return GasChunkSize; }
   bool hasGasChunks() const {
     return GasChunkEnd && GasChunkCost && GasChunkSize > 0;
@@ -98,6 +100,7 @@ class EVMFrontendContext final : public CompileContext {
   bool GasMeteringEnabled = false;
   const uint32_t *GasChunkEnd = nullptr;
   const uint64_t *GasChunkCost = nullptr;
+  const uint64_t *GasChunkCostSPP = nullptr;
   size_t GasChunkSize = 0;
   evmc_revision Revision = zen::evm::DEFAULT_REVISION;
   uint8_t MemoryLinearStrideSkipLeadingZeroLimbStores = 0;
@@ -1275,6 +1278,7 @@ class EVMMirBuilder final {
   // Chunk gas metering
   const uint32_t *GasChunkEnd = nullptr;
   const uint64_t *GasChunkCost = nullptr;
+  const uint64_t *GasChunkCostSPP = nullptr;
   size_t GasChunkSize = 0;
 
 #ifdef ZEN_ENABLE_EVM_GAS_REGISTER
diff --git a/src/evm/evm_cache.cpp b/src/evm/evm_cache.cpp
index cc5a6208e..a2a4f4a19 100644
--- a/src/evm/evm_cache.cpp
+++ b/src/evm/evm_cache.cpp
@@ -197,6 +197,11 @@ struct GasBlock {
   uint64_t Cost = 0;
   std::vector<uint32_t> Succs;
   std::vector<uint32_t> Preds;
+  // Count of dynamic-jump blocks in this contract that could land here at
+  // runtime. Only nonzero for JUMPDEST blocks when the contract has at
+  // least one unresolved dynamic jump. Carried separately so we avoid
+  // materialising D*J explicit over-approximation edges (see buildCFGEdges).
+  uint32_t ImplicitDynamicPredCount = 0;
 };
 
 static void addEdge(std::vector<GasBlock> &Blocks, uint32_t From, uint32_t To) {
@@ -338,6 +343,27 @@ static void buildGasBlocks(const zen::common::Byte *Code, size_t CodeSize,
   }
 }
 
+// Decode a PUSH immediate at PushPc and validate it as a JUMPDEST address.
+// Returns true and sets DestPc on success.
+static bool decodePushAsJumpDest(const std::vector<intx::uint256> &PushValueMap,
+                                 const std::vector<uint8_t> &JumpDestMap,
+                                 size_t CodeSize, uint32_t PushPc,
+                                 uint32_t &DestPc) {
+  const intx::uint256 Value = PushValueMap[PushPc];
+  if ((Value >> 64) != 0) {
+    return false;
+  }
+  const uint64_t Dest = static_cast<uint64_t>(Value);
+  if (Dest >= CodeSize) {
+    return false;
+  }
+  if (JumpDestMap[Dest] == 0) {
+    return false;
+  }
+  DestPc = static_cast<uint32_t>(Dest);
+  return true;
+}
+
 static bool resolveConstantJumpTarget(const std::vector<uint8_t> &JumpDestMap,
                                       const std::vector<intx::uint256> &PushMap,
                                       size_t CodeSize, const GasBlock &Block,
@@ -354,22 +380,73 @@ static bool resolveConstantJumpTarget(const std::vector<uint8_t> &JumpDestMap,
     return false;
   }
 
-  const intx::uint256 Value = PushMap[Block.PrevPc];
-  if ((Value >> 64) != 0) {
-    return false;
-  }
+  return decodePushAsJumpDest(PushMap, JumpDestMap, CodeSize, Block.PrevPc,
+                              DestPc);
+}
 
-  const uint64_t Dest = static_cast<uint64_t>(Value);
-  if (Dest >= CodeSize) {
-    return false;
+// Build CFG edges for all blocks. Static jumps (PUSH → JUMP) get precise
+// single-target edges. For each unresolved dynamic jump we DO NOT add the
+// D*|JUMPDEST| explicit over-approximation edges (which previously made the
+// pass quadratic-to-cubic in pathological contracts). Instead we record on
+// every JUMPDEST how many dynamic-jump blocks could land there at runtime
+// via `ImplicitDynamicPredCount`, and `effectivePredCount` folds that count
+// into its multi-predecessor check. A JUMPDEST that also has a static
+// predecessor then sees `effectivePredCount > 1`, so `lemma614Update`
+// refuses to shift gas into it, as it would with explicit edges; the
+// dynamic-jump block itself has no explicit successors, which is more conservative.
+static void buildCFGEdges(std::vector<GasBlock> &Blocks,
+                          const std::vector<uint32_t> &BlockAtPc,
+                          const std::vector<uint8_t> &JumpDestMap,
+                          const std::vector<intx::uint256> &PushValueMap,
+                          const std::vector<uint32_t> &JumpDestBlocks,
+                          size_t CodeSize) {
+  // Count unresolved dynamic jumps once so we can stamp every JUMPDEST with
+  // the right implicit-predecessor count in O(N) instead of O(D*J).
+  uint32_t DynamicJumpCount = 0;
+  for (const auto &Block : Blocks) {
+    if (!isJumpOpcode(Block.LastOpcode)) {
+      continue;
+    }
+    uint32_t DestPc = 0;
+    if (!resolveConstantJumpTarget(JumpDestMap, PushValueMap, CodeSize, Block,
+                                   DestPc)) {
+      ++DynamicJumpCount;
+    }
   }
-
-  if (JumpDestMap[Dest] == 0) {
-    return false;
+  if (DynamicJumpCount > 0) {
+    for (uint32_t JdId : JumpDestBlocks) {
+      Blocks[JdId].ImplicitDynamicPredCount = DynamicJumpCount;
+    }
   }
 
-  DestPc = static_cast<uint32_t>(Dest);
-  return true;
+  for (size_t BlockId = 0; BlockId < Blocks.size(); ++BlockId) {
+    auto &Block = Blocks[BlockId];
+    const bool IsTerminator = isControlFlowTerminator(Block.LastOpcode);
+
+    // Add fallthrough edge for non-terminating opcodes (CALL/CREATE/GAS,
+    // JUMPI included via the generic !IsTerminator path).
+    if (!IsTerminator && Block.End < CodeSize) {
+      const uint32_t SuccId = BlockAtPc[Block.End];
+      if (SuccId != UINT32_MAX) {
+        addEdge(Blocks, static_cast<uint32_t>(BlockId), SuccId);
+      }
+    }
+
+    // Add jump target edge(s).
+    if (isJumpOpcode(Block.LastOpcode)) {
+      uint32_t DestPc = 0;
+      if (resolveConstantJumpTarget(JumpDestMap, PushValueMap, CodeSize, Block,
+                                    DestPc)) {
+        // Static (constant) jump: single known target.
+        const uint32_t SuccId = BlockAtPc[DestPc];
+        if (SuccId != UINT32_MAX) {
+          addEdge(Blocks, static_cast<uint32_t>(BlockId), SuccId);
+        }
+      }
+      // Dynamic jump: handled by the implicit-predecessor count stamped onto
+      // every JUMPDEST above. No explicit Succs/Preds edges added.
+    }
+  }
 }
 
 static size_t bitsetWordCount(size_t NumBits) { return (NumBits + 63) / 64; }
@@ -852,11 +929,18 @@ static bool buildLoopsUsingDominance(
 
 // Effective predecessor count: the entry block (Start == 0) is always reachable
 // from the program start, adding an implicit path not represented in the CFG.
+//
+// Blocks with `ImplicitDynamicPredCount > 0` (every JUMPDEST in a contract
+// that has at least one dynamic jump) carry the over-approximated dynamic
+// predecessors as a count instead of explicit edges; folding them in here
+// keeps `lemma614Update`'s "shift only into single-pred successors" check
+// equivalent to the explicit over-approximation.
 static size_t effectivePredCount(const GasBlock &Block) {
   size_t Count = Block.Preds.size();
   if (Block.Start == 0) {
     ++Count;
   }
+  Count += Block.ImplicitDynamicPredCount;
   return Count;
 }
 
@@ -876,10 +960,17 @@ static bool lemma614Update(uint32_t NodeId, const std::vector<GasBlock> &Blocks,
       continue;
     }
     if (AllowedMask && !bitsetTest(*AllowedMask, Succ)) {
+      // Non-back-edge successor excluded from shifting — its path would
+      // see the inflated parent cost without compensation.
+      MinSucc = 0;
       continue;
     }
-    // Only consider successors with exactly one effective predecessor.
     if (effectivePredCount(Blocks[Succ]) != 1) {
+      MinSucc = 0;
+      continue;
+    }
+    if (isGasChunkTerminator(Blocks[Succ].LastOpcode)) {
+      MinSucc = 0;
       continue;
     }
     MinSucc = std::min(MinSucc, Metering[Succ]);
@@ -900,6 +991,9 @@ static bool lemma614Update(uint32_t NodeId, const std::vector<GasBlock> &Blocks,
     if (effectivePredCount(Blocks[Succ]) != 1) {
       continue;
     }
+    if (isGasChunkTerminator(Blocks[Succ].LastOpcode)) {
+      continue;
+    }
     Metering[Succ] -= MinSucc;
   }
 
@@ -911,7 +1005,9 @@ static bool buildGasChunksSPP(const zen::common::Byte *Code, size_t CodeSize,
                               const std::vector<uint8_t> &JumpDestMap,
                               const std::vector<intx::uint256> &PushValueMap,
                               std::vector<uint32_t> &GasChunkEnd,
-                              std::vector<uint64_t> &GasChunkCost) {
+                              std::vector<uint64_t> &GasChunkCost,
+                              std::vector<uint64_t> &GasChunkCostSPP,
+                              bool EnableSPP) {
   std::vector<GasBlock> Blocks;
   std::vector<uint32_t> BlockAtPc;
   buildGasBlocks(Code, CodeSize, MetricsTable, Blocks, BlockAtPc);
@@ -920,19 +1016,11 @@ static bool buildGasChunksSPP(const zen::common::Byte *Code, size_t CodeSize,
     return true;
   }
 
-  bool HasDynamicJump = false;
-  for (const auto &Block : Blocks) {
-    if (!isJumpOpcode(Block.LastOpcode)) {
-      continue;
-    }
-    uint32_t DestPc = 0;
-    if (!resolveConstantJumpTarget(JumpDestMap, PushValueMap, CodeSize, Block,
-                                   DestPc)) {
-      HasDynamicJump = true;
-      break;
-    }
-  }
-  if (HasDynamicJump) {
+  if (!EnableSPP) {
+    // Interpreter-only fast path: emit unshifted per-block costs and skip
+    // the expensive CFG / call-site / metering pipeline. The JIT consumer
+    // path (which would read GasChunkCostSPP) is never wired up for this
+    // module, so no SPP-shifted values are needed.
     for (const auto &Block : Blocks) {
       if (Block.Start < CodeSize) {
         GasChunkEnd[Block.Start] = Block.End;
@@ -942,6 +1030,9 @@ static bool buildGasChunksSPP(const zen::common::Byte *Code, size_t CodeSize,
     return true;
   }
 
+  // Always build the CFG; there is no early exit for dynamic jumps.
+  // Unresolved jumps are over-approximated via ImplicitDynamicPredCount.
+
   std::vector<uint32_t> JumpDestBlocks;
   if (!JumpDestMap.empty()) {
     std::vector<uint8_t> SeenBlocks(Blocks.size(), 0);
@@ -960,42 +1051,44 @@ static bool buildGasChunksSPP(const zen::common::Byte *Code, size_t CodeSize,
     }
   }
 
-  // Build CFG
-  for (size_t BlockId = 0; BlockId < Blocks.size(); ++BlockId) {
-    auto &Block = Blocks[BlockId];
-    const bool IsTerminator = isControlFlowTerminator(Block.LastOpcode);
+  // Static jumps get precise single-target edges. For unresolved dynamic
+  // jumps, the CFG over-approximation is encoded as
+  // ImplicitDynamicPredCount on each JUMPDEST (folded into
+  // effectivePredCount). Narrowing to partial call-site resolution would
+  // under-approximate the CFG and let SPP shift gas along non-existent
+  // edges, producing unsafe metering.
+  buildCFGEdges(Blocks, BlockAtPc, JumpDestMap, PushValueMap, JumpDestBlocks,
+                CodeSize);
 
-    // Add fallthrough edge for non-terminating opcodes (CALL/CREATE/GAS
-    // included).
-    if (!IsTerminator && Block.End < CodeSize) {
-      const uint32_t SuccId = BlockAtPc[Block.End];
-      if (SuccId != UINT32_MAX) {
-        addEdge(Blocks, static_cast<uint32_t>(BlockId), SuccId);
+  splitCriticalEdges(Blocks, CodeSize);
+
+  std::vector<uint8_t> Reachable = computeReachable(Blocks, 0);
+  // Seed dyn-target JUMPDESTs as reachability roots so dom/loop analyses
+  // include them and their static successors. Statically-dead JUMPDESTs
+  // (no static pred, no dyn-jump in the contract) are intentionally left
+  // unreachable.
+  {
+    std::vector<uint32_t> Stack;
+    for (uint32_t JdId : JumpDestBlocks) {
+      if (Blocks[JdId].ImplicitDynamicPredCount == 0) {
+        continue;
+      }
+      if (Reachable[JdId] == 0) {
+        Reachable[JdId] = 1;
+        Stack.push_back(JdId);
       }
     }
-
-    // Add jump edge (if static jump)
-    if (isJumpOpcode(Block.LastOpcode)) {
-      uint32_t DestPc = 0;
-      if (resolveConstantJumpTarget(JumpDestMap, PushValueMap, CodeSize, Block,
-                                    DestPc)) {
-        const uint32_t SuccId = BlockAtPc[DestPc];
-        if (SuccId != UINT32_MAX) {
-          addEdge(Blocks, static_cast<uint32_t>(BlockId), SuccId);
-        }
-      } else {
-        // Dynamic jump: over-approx to all jump destinations.
-        for (uint32_t SuccId : JumpDestBlocks) {
-          addEdge(Blocks, static_cast<uint32_t>(BlockId), SuccId);
+    while (!Stack.empty()) {
+      const uint32_t Node = Stack.back();
+      Stack.pop_back();
+      for (uint32_t Succ : Blocks[Node].Succs) {
+        if (Reachable[Succ] == 0) {
+          Reachable[Succ] = 1;
+          Stack.push_back(Succ);
         }
       }
     }
   }
-
-  // Split critical edges (required for safe SPP optimization)
-  splitCriticalEdges(Blocks, CodeSize);
-
-  const std::vector<uint8_t> Reachable = computeReachable(Blocks, 0);
   const std::vector<std::vector<uint64_t>> Dom =
       computeDominators(Blocks, Reachable);
 
@@ -1119,6 +1212,10 @@ static bool buildGasChunksSPP(const zen::common::Byte *Code, size_t CodeSize,
     }
     GasChunkEnd[Blocks[Id].Start] = Blocks[Id].End;
     GasChunkCost[Blocks[Id].Start] = Blocks[Id].Cost;
+    // Export SPP-shifted cost on a separate output array so the JIT can read
+    // it without perturbing the interpreter fast path, which continues to see
+    // the unshifted per-block cost above.
+    GasChunkCostSPP[Blocks[Id].Start] = Metering[Id];
   }
 
   return true;
@@ -1127,11 +1224,16 @@ static bool buildGasChunksSPP(const zen::common::Byte *Code, size_t CodeSize,
 } // namespace
 
 void buildBytecodeCache(EVMBytecodeCache &Cache, const common::Byte *Code,
-                        size_t CodeSize, evmc_revision Rev) {
+                        size_t CodeSize, evmc_revision Rev, bool EnableSPP) {
   Cache.JumpDestMap.assign(CodeSize, 0);
   Cache.PushValueMap.resize(CodeSize);
   Cache.GasChunkEnd.assign(CodeSize, 0);
   Cache.GasChunkCost.assign(CodeSize, 0);
+  if (EnableSPP) {
+    Cache.GasChunkCostSPP.assign(CodeSize, 0);
+  } else {
+    Cache.GasChunkCostSPP.clear();
+  }
 
   buildJumpDestMapAndPushCache(Code, CodeSize, Cache.JumpDestMap,
                                Cache.PushValueMap);
@@ -1141,7 +1243,8 @@ void buildBytecodeCache(EVMBytecodeCache &Cache, const common::Byte *Code,
   }
 
   buildGasChunksSPP(Code, CodeSize, MetricsTable, Cache.JumpDestMap,
-                    Cache.PushValueMap, Cache.GasChunkEnd, Cache.GasChunkCost);
+                    Cache.PushValueMap, Cache.GasChunkEnd, Cache.GasChunkCost,
+                    Cache.GasChunkCostSPP, EnableSPP);
 }
 
 } // namespace zen::evm
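The CFG changes in this file combine two steps. First, `ImplicitDynamicPredCount` is stamped onto every `JUMPDEST` in O(N). Second, those dynamic-jump targets are seeded as extra reachability roots. A toy block list is enough to sketch both steps. `CfgBlock`, `stampDynamicPreds`, `markFrom`, and `reachableWithStitch` are hypothetical. The real code works on `GasBlock`, resolves static targets from `PushValueMap`, and splits critical edges before computing reachability.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for GasBlock: only the fields this sketch needs.
struct CfgBlock {
  bool IsJumpDest = false;        // block starts at a JUMPDEST
  bool EndsInDynamicJump = false; // JUMP/JUMPI whose target is unresolved
  std::vector<uint32_t> Succs;    // explicit (static) successors only
  uint32_t ImplicitDynamicPredCount = 0;
};

// Count unresolved dynamic jumps once, then stamp every JUMPDEST: O(N)
// work instead of D * J explicit edges.
void stampDynamicPreds(std::vector<CfgBlock> &Blocks) {
  uint32_t DynamicJumpCount = 0;
  for (const CfgBlock &B : Blocks) {
    if (B.EndsInDynamicJump) {
      ++DynamicJumpCount;
    }
  }
  if (DynamicJumpCount == 0) {
    return;
  }
  for (CfgBlock &B : Blocks) {
    if (B.IsJumpDest) {
      B.ImplicitDynamicPredCount = DynamicJumpCount;
    }
  }
}

// Iterative DFS over explicit successors, starting from Stack's entries.
void markFrom(const std::vector<CfgBlock> &Blocks, std::vector<uint8_t> &Reach,
              std::vector<uint32_t> Stack) {
  while (!Stack.empty()) {
    const uint32_t Node = Stack.back();
    Stack.pop_back();
    for (uint32_t Succ : Blocks[Node].Succs) {
      if (Reach[Succ] == 0) {
        Reach[Succ] = 1;
        Stack.push_back(Succ);
      }
    }
  }
}

// Reachability from block 0, then stitch in every dynamic-jump target and
// its static successors. Dead JUMPDESTs in contracts without dynamic jumps
// keep ImplicitDynamicPredCount == 0 and stay unreachable.
std::vector<uint8_t> reachableWithStitch(const std::vector<CfgBlock> &Blocks) {
  std::vector<uint8_t> Reach(Blocks.size(), 0);
  if (Blocks.empty()) {
    return Reach;
  }
  Reach[0] = 1;
  markFrom(Blocks, Reach, std::vector<uint32_t>{0});
  std::vector<uint32_t> Seeds;
  for (uint32_t Id = 0; Id < Blocks.size(); ++Id) {
    if (Blocks[Id].ImplicitDynamicPredCount > 0 && Reach[Id] == 0) {
      Reach[Id] = 1;
      Seeds.push_back(Id);
    }
  }
  markFrom(Blocks, Reach, Seeds);
  return Reach;
}
```

The two shapes worth checking loosely mirror the regression tests in `evm_cache_tests.cpp`. In the first, a `JUMPDEST` is reachable only through a dynamic jump, and the stitch makes it reachable. In the second, a dead `JUMPDEST` sits in a contract without dynamic jumps and stays unreachable.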
diff --git a/src/evm/evm_cache.h b/src/evm/evm_cache.h
index cd7dd69ec..d43a48739 100644
--- a/src/evm/evm_cache.h
+++ b/src/evm/evm_cache.h
@@ -19,11 +19,21 @@ struct EVMBytecodeCache {
   std::vector<uint8_t> JumpDestMap;
   std::vector<intx::uint256> PushValueMap;
   std::vector<uint32_t> GasChunkEnd;
+  // Per-chunk-start unshifted gas cost. Interpreter reads this — it must
+  // equal the original block base cost (see PR #371).
   std::vector<uint64_t> GasChunkCost;
+  // Per-chunk-start SPP-shifted gas cost for the multipass JIT. Produced by
+  // buildGasChunksSPP's metering pass; never read by the interpreter.
+  std::vector<uint64_t> GasChunkCostSPP;
 };
 
+// Build the bytecode cache. When EnableSPP is true, the expensive SPP
+// metering pipeline runs and GasChunkCostSPP is populated with shifted
+// per-chunk costs for the multipass JIT. When false (interpreter-only
+// modules), the pipeline is skipped and GasChunkCostSPP stays empty.
 void buildBytecodeCache(EVMBytecodeCache &Cache, const common::Byte *Code,
-                        size_t CodeSize, evmc_revision Rev);
+                        size_t CodeSize, evmc_revision Rev,
+                        bool EnableSPP = false);
 
 } // namespace zen::evm
 
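The comments in this header and the field descriptions in `evm_cache.md` pin down a layout that a debug check could enforce. Both cost arrays are indexed by PC, and costs are nonzero only at chunk starts. `GasChunkCostSPP` is either empty (interpreter-only build) or exactly `CodeSize` long. `ChunkArrays` and `chunkArraysWellFormed` are hypothetical and not part of the cache API. They encode only the documented invariants.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical mirror of the per-PC gas arrays in EVMBytecodeCache.
struct ChunkArrays {
  std::vector<uint32_t> GasChunkEnd;
  std::vector<uint64_t> GasChunkCost;
  std::vector<uint64_t> GasChunkCostSPP; // empty unless built with EnableSPP
};

// Checks the documented layout: arrays are PC-indexed, and costs may be
// nonzero only at chunk starts (GasChunkEnd[pc] > pc).
bool chunkArraysWellFormed(const ChunkArrays &A, size_t CodeSize) {
  if (A.GasChunkEnd.size() != CodeSize || A.GasChunkCost.size() != CodeSize) {
    return false;
  }
  const bool HasSPP = !A.GasChunkCostSPP.empty();
  if (HasSPP && A.GasChunkCostSPP.size() != CodeSize) {
    return false;
  }
  for (size_t PC = 0; PC < CodeSize; ++PC) {
    const bool IsStart = A.GasChunkEnd[PC] > PC;
    if (!IsStart && A.GasChunkCost[PC] != 0) {
      return false;
    }
    if (HasSPP && !IsStart && A.GasChunkCostSPP[PC] != 0) {
      return false;
    }
  }
  return true;
}
```

The sample arrays in the check below illustrate the layout for `CALLDATALOAD JUMP JUMPDEST ADD POP STOP`. They show chunk starts at PCs 0 and 2 with base costs 11 and 6. They are illustrative values, not captured cache output.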
diff --git a/src/evm/evm_cache.md b/src/evm/evm_cache.md
index f2f050179..ad9eed528 100644
--- a/src/evm/evm_cache.md
+++ b/src/evm/evm_cache.md
@@ -7,7 +7,8 @@ This document describes the bytecode cache built by `buildBytecodeCache()` in `s
 - `JumpDestMap[pc]` (`uint8_t`): `1` if `Code[pc]` is `OP_JUMPDEST` and this byte is an opcode byte (not inside PUSH data).
 - `PushValueMap[pc]` (`intx::uint256`): decoded immediate for `PUSH1..PUSH32` at `pc`. Unused entries are `0`.
 - `GasChunkEnd[pc]` (`uint32_t`): for a chunk start `pc`, the exclusive end PC of the chunk; otherwise `0`.
-- `GasChunkCost[pc]` (`uint64_t`): metering cost charged at block start `pc` (SPP-shifted in optimized mode); otherwise `0`.
+- `GasChunkCost[pc]` (`uint64_t`): unshifted base gas cost of the block starting at `pc` (sum of EVMC base costs of opcodes in the block); otherwise `0`. Read by the interpreter.
+- `GasChunkCostSPP[pc]` (`uint64_t`): SPP-shifted gas cost of the block starting at `pc`. Populated only when the SPP metering pipeline runs (JIT-consumer modules); otherwise the array is empty. Read by the multipass JIT.
 
 ## Build Algorithm
 
@@ -62,7 +63,12 @@ using a linear-time SPP pass:
     to the loop nodes in local reverse-topological order.
 
 This moves common costs earlier, reducing the number of non-zero charge points.
-The resulting `m` is stored in `GasChunkCost` at each block start.
+The resulting shifted value `m(s)` is stored in `GasChunkCostSPP[s]` at each
+block start; `GasChunkCost[s]` continues to hold the unshifted base cost so
+the interpreter fast path is unaffected. The SPP pipeline only runs for
+modules that will be JIT-compiled (gated by `EnableSPP` in
+`buildBytecodeCache`); for interpreter-only modules `GasChunkCostSPP` is
+left empty and the CFG / metering work is skipped.
 
 If the CFG is not suitable for linear SPP (e.g., dominance-based loop analysis
 fails), we still run SPP updates once per node in reverse topological order
@@ -96,14 +102,16 @@ zero bytes on the right, matching the EVM encoding.
 
 ### Correctness of chunk gas charging
 
-In SPP mode, `GasChunkCost[s]` is the shifted metering value `m(s)`. Lemma 6.14
-updates move cost along CFG edges while preserving total base cost on every
-path. Over-approximating dynamic jumps keeps the optimization safe (it may
-reduce shifts but never undercharges). Splitting critical edges ensures that
-cost is only moved along edges where the local update is valid. When loop
-analysis fails, the reverse-topological updates still preserve correctness
-without fast-forward.
-
-The fast path is still used only when `gas_left >= GasChunkCost[s]`, so base-cost
-out-of-gas cannot occur inside a block. Dynamic/extra gas is charged inside
-opcode handlers as before (memory expansion, cold access, keccak word cost, etc).
+`GasChunkCost[s]` is always the unshifted base cost of block `s`, so the
+interpreter's fast path enters a chunk only when `gas_left >= GasChunkCost[s]`
+and base-cost out-of-gas cannot occur inside a block. The multipass JIT reads
+the shifted value `m(s)` from `GasChunkCostSPP[s]`. Lemma 6.14 updates move
+cost along CFG edges while preserving total base cost on every path.
+Over-approximating dynamic jumps to all `JUMPDEST`s keeps the optimization
+safe — narrowing those edges with partial call-site resolution would
+under-approximate the CFG and let the SPP pass shift gas along edges that
+don't exist at runtime, producing unsafe metering. Splitting critical edges
+ensures that cost is only moved along edges where the local update is valid.
+When loop analysis fails, the reverse-topological updates still preserve
+correctness without fast-forward. Dynamic/extra gas is charged inside opcode
+handlers as before (memory expansion, cold access, keccak word cost, etc).
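A single Lemma 6.14 update, as described in this section, can be illustrated on a three-block branch. This is a simplified sketch under stated assumptions:

- `SketchBlock`, `addEdgeSketch`, `effectivePredCountSketch`, and `shiftIntoNode` are hypothetical names.
- Back-edge handling, the `AllowedMask` filter, and the `isGasChunkTerminator` exclusion in the real `lemma614Update` are omitted.
- The hoisted amount is added to the parent, which is how this sketch reads "move cost along CFG edges".

The sketch demonstrates the two properties the correctness argument relies on. Every path keeps its total, and a successor with more than one effective predecessor blocks the shift. That count includes implicit dynamic-jump predecessors.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical stand-in for GasBlock with the fields used by the update.
struct SketchBlock {
  uint32_t Start = 0;
  std::vector<uint32_t> Succs;
  std::vector<uint32_t> Preds;
  uint32_t ImplicitDynamicPredCount = 0;
};

void addEdgeSketch(std::vector<SketchBlock> &Blocks, uint32_t From,
                   uint32_t To) {
  Blocks[From].Succs.push_back(To);
  Blocks[To].Preds.push_back(From);
}

// Explicit preds, plus the implicit program-entry edge for Start == 0, plus
// the dynamic-jump predecessors carried as a count.
size_t effectivePredCountSketch(const SketchBlock &B) {
  size_t Count = B.Preds.size() + B.ImplicitDynamicPredCount;
  if (B.Start == 0) {
    ++Count;
  }
  return Count;
}

// One update at Node: hoist Mu = min over successors of M[succ] into
// M[Node] and subtract Mu from each successor. Any successor with more
// than one effective predecessor forces Mu = 0 (no shift).
bool shiftIntoNode(uint32_t Node, const std::vector<SketchBlock> &Blocks,
                   std::vector<uint64_t> &M) {
  const std::vector<uint32_t> &Succs = Blocks[Node].Succs;
  if (Succs.empty()) {
    return false;
  }
  uint64_t Mu = std::numeric_limits<uint64_t>::max();
  for (uint32_t S : Succs) {
    if (effectivePredCountSketch(Blocks[S]) != 1) {
      Mu = 0;
      break;
    }
    Mu = std::min(Mu, M[S]);
  }
  if (Mu == 0) {
    return false;
  }
  M[Node] += Mu;
  for (uint32_t S : Succs) {
    M[S] -= Mu;
  }
  return true;
}
```

Take a branch block with m = 10 and two successors with m = 4 and m = 6. The update gives 14, 0, and 2, so the path totals 14 and 16 are unchanged. The first successor also no longer needs a non-zero charge point. If the second successor is a `JUMPDEST` in a contract with a dynamic jump, its effective predecessor count becomes 2 and no shift happens.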
diff --git a/src/runtime/evm_module.cpp b/src/runtime/evm_module.cpp
index a3e3177f3..d551ca9bc 100644
--- a/src/runtime/evm_module.cpp
+++ b/src/runtime/evm_module.cpp
@@ -114,6 +114,9 @@ EVMModule::newEVMModule(Runtime &RT, CodeHolderUniquePtr CodeHolder,
     if (!Mod->ShouldFallbackToInterp)
 #endif // ZEN_ENABLE_JIT_PRECOMPILE_FALLBACK
     {
+      // JIT is about to compile this module — mark the bytecode cache so the
+      // SPP metering pipeline runs on first access.
+      Mod->CacheNeedsSPP = true;
       action::performEVMJITCompile(*Mod);
     }
   }
@@ -130,7 +133,8 @@ const evm::EVMBytecodeCache &EVMModule::getBytecodeCache() const {
 }
 
 void EVMModule::initBytecodeCache() const {
-  evm::buildBytecodeCache(BytecodeCache, Code, CodeSize, Revision);
+  evm::buildBytecodeCache(BytecodeCache, Code, CodeSize, Revision,
+                          CacheNeedsSPP);
 }
 
 } // namespace zen::runtime
diff --git a/src/runtime/evm_module.h b/src/runtime/evm_module.h
index 60ea5b62d..de09d9b67 100644
--- a/src/runtime/evm_module.h
+++ b/src/runtime/evm_module.h
@@ -101,6 +101,16 @@ class EVMModule final : public BaseModule<EVMModule> {
   void initBytecodeCache() const;
   mutable bool BytecodeCacheInitialized = false;
   mutable evm::EVMBytecodeCache BytecodeCache;
+  // Whether this module will be consumed by the multipass JIT. When true,
+  // buildBytecodeCache runs the expensive SPP metering pipeline so the JIT
+  // can read shifted gas costs from GasChunkCostSPP. When false, only the
+  // cheap per-block pass runs — interpreter-only modules pay nothing extra.
+  //
+  // Must be set before any getBytecodeCache() call: once the cache is
+  // built, the EnableSPP decision is fixed for the lifetime of the
+  // module. Future lazy / on-demand JIT paths must flip this flag before
+  // triggering the lazy cache build.
+  bool CacheNeedsSPP = false;
   evmc_revision Revision = zen::evm::DEFAULT_REVISION;
   EVMMemorySpecializationProfile MemoryProfile = {};
 
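The ordering rule on `CacheNeedsSPP` is to set it before the first `getBytecodeCache()` call. A late flip is not unsafe, because the JIT then receives `nullptr` and falls back to `GasChunkCost`, but it silently forfeits the SPP benefit. The sketch models that rule. `LazyModuleSketch` and `requestSPP` are hypothetical, since the real `EVMModule` exposes no such setter. The sketch only illustrates how a late request could be made detectable.

```cpp
#include <cassert>

// Hypothetical model of EVMModule's lazily built cache; only the EnableSPP
// decision is tracked.
class LazyModuleSketch {
public:
  // Mirrors getBytecodeCache(): builds on first access, then reuses.
  bool cacheHasSPP() {
    if (!Built) {
      BuiltWithSPP = CacheNeedsSPP; // EnableSPP is fixed at this point
      Built = true;
    }
    return BuiltWithSPP;
  }

  // Hypothetical guarded setter: returns false when the request arrives
  // after the cache was already built without SPP.
  bool requestSPP() {
    if (Built) {
      return BuiltWithSPP;
    }
    CacheNeedsSPP = true;
    return true;
  }

private:
  bool CacheNeedsSPP = false;
  bool Built = false;
  bool BuiltWithSPP = false;
};
```

A future lazy JIT path could call a guard like this and log or rebuild when it returns `false`, instead of compiling against the unshifted costs without notice.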
diff --git a/src/tests/CMakeLists.txt b/src/tests/CMakeLists.txt
index 973ef3a2f..526528bab 100644
--- a/src/tests/CMakeLists.txt
+++ b/src/tests/CMakeLists.txt
@@ -60,6 +60,11 @@ if(ZEN_ENABLE_SPEC_TEST)
   if(ZEN_ENABLE_EVM)
     add_subdirectory(mpt)
     add_executable(evmInterpTests evm_interp_tests.cpp)
+    add_executable(evmCacheTests evm_cache_tests.cpp)
+    # Build-only target (no add_test): never compiled with ASan, even in ASan
+    # builds, so its wall-clock timing is not distorted by sanitizer overhead.
+    add_executable(evmCacheComplexityDemo evm_cache_complexity_demo.cpp)
+    target_link_libraries(evmCacheComplexityDemo PRIVATE dtvmcore)
     if(ZEN_ENABLE_MULTIPASS_JIT)
       add_executable(evmJitFrontendTests evm_jit_frontend_tests.cpp)
     endif()
@@ -99,6 +104,7 @@ if(ZEN_ENABLE_SPEC_TEST)
 
     if(ZEN_ENABLE_EVM)
       target_compile_options(evmInterpTests PRIVATE -fsanitize=address)
+      target_compile_options(evmCacheTests PRIVATE -fsanitize=address)
       if(ZEN_ENABLE_MULTIPASS_JIT)
         target_compile_options(evmJitFrontendTests PRIVATE -fsanitize=address)
       endif()
@@ -124,6 +130,11 @@ if(ZEN_ENABLE_SPEC_TEST)
           PRIVATE dtvmcore rapidjson yaml-cpp gtest_main -fsanitize=address
           PUBLIC ${GTEST_BOTH_LIBRARIES}
         )
+        target_link_libraries(
+          evmCacheTests
+          PRIVATE dtvmcore gtest_main -fsanitize=address
+          PUBLIC ${GTEST_BOTH_LIBRARIES}
+        )
         if(ZEN_ENABLE_MULTIPASS_JIT)
           target_link_libraries(
             evmJitFrontendTests
@@ -180,6 +191,11 @@ if(ZEN_ENABLE_SPEC_TEST)
                   -static-libasan
           PUBLIC ${GTEST_BOTH_LIBRARIES}
         )
+        target_link_libraries(
+          evmCacheTests
+          PRIVATE dtvmcore gtest_main -fsanitize=address -static-libasan
+          PUBLIC ${GTEST_BOTH_LIBRARIES}
+        )
         if(ZEN_ENABLE_MULTIPASS_JIT)
           target_link_libraries(
             evmJitFrontendTests
@@ -245,6 +261,11 @@ if(ZEN_ENABLE_SPEC_TEST)
         PRIVATE dtvmcore rapidjson yaml-cpp gtest_main
         PUBLIC ${GTEST_BOTH_LIBRARIES}
       )
+      target_link_libraries(
+        evmCacheTests
+        PRIVATE dtvmcore gtest_main
+        PUBLIC ${GTEST_BOTH_LIBRARIES}
+      )
       if(ZEN_ENABLE_MULTIPASS_JIT)
         target_link_libraries(
           evmJitFrontendTests
@@ -292,6 +313,7 @@ if(ZEN_ENABLE_SPEC_TEST)
 
   if(ZEN_ENABLE_EVM)
     add_test(NAME evmInterpTests COMMAND evmInterpTests)
+    add_test(NAME evmCacheTests COMMAND evmCacheTests)
     if(ZEN_ENABLE_MULTIPASS_JIT)
       add_test(NAME evmJitFrontendTests COMMAND evmJitFrontendTests)
     endif()
diff --git a/src/tests/evm_cache_complexity_demo.cpp b/src/tests/evm_cache_complexity_demo.cpp
new file mode 100644
index 000000000..26dcf2d09
--- /dev/null
+++ b/src/tests/evm_cache_complexity_demo.cpp
@@ -0,0 +1,65 @@
+// Copyright (C) 2025 the DTVM authors. All Rights Reserved.
+// SPDX-License-Identifier: Apache-2.0
+
+// Time buildBytecodeCache on a CALLDATALOAD JUMP <N x JUMPDEST> STOP
+// contract. Usage: evmCacheComplexityDemo <n_jumpdests>
+// Output: "<n_jumpdests>,<build_ms>" on stdout.
+
+#include "evm/evm_cache.h"
+#include "platform/platform.h"
+
+#include <evmc/evmc.h>
+#include <evmc/instructions.h>
+
+#include <chrono>
+#include <cstddef>
+#include <cstdint>
+#include <cstdio>
+#include <cstdlib>
+#include <string>
+#include <vector>
+
+namespace {
+
+constexpr uint8_t OP_STOP = static_cast<uint8_t>(evmc_opcode::OP_STOP);
+constexpr uint8_t OP_CALLDATALOAD =
+    static_cast<uint8_t>(evmc_opcode::OP_CALLDATALOAD);
+constexpr uint8_t OP_JUMP = static_cast<uint8_t>(evmc_opcode::OP_JUMP);
+constexpr uint8_t OP_JUMPDEST = static_cast<uint8_t>(evmc_opcode::OP_JUMPDEST);
+
+std::vector<uint8_t> makeDynDispatchContract(size_t NumJumpDests) {
+  std::vector<uint8_t> Code;
+  Code.reserve(NumJumpDests + 3);
+  Code.push_back(OP_CALLDATALOAD);
+  Code.push_back(OP_JUMP);
+  for (size_t I = 0; I < NumJumpDests; ++I) {
+    Code.push_back(OP_JUMPDEST);
+  }
+  Code.push_back(OP_STOP);
+  return Code;
+}
+
+double timeCacheBuildMs(const std::vector<uint8_t> &Code) {
+  using Clock = zen::common::SteadyClock;
+  const auto Start = Clock::now();
+  zen::evm::EVMBytecodeCache Cache;
+  zen::evm::buildBytecodeCache(Cache,
+                               reinterpret_cast<const std::byte *>(Code.data()),
+                               Code.size(), EVMC_CANCUN, /*EnableSPP=*/true);
+  const auto End = Clock::now();
+  return std::chrono::duration<double, std::milli>(End - Start).count();
+}
+
+} // namespace
+
+int main(int Argc, char **Argv) {
+  if (Argc != 2) {
+    std::fprintf(stderr, "usage: %s <n_jumpdests>\n", Argv[0]);
+    return 2;
+  }
+  const size_t N = static_cast<size_t>(std::stoull(Argv[1]));
+  const auto Code = makeDynDispatchContract(N);
+  const double Ms = timeCacheBuildMs(Code);
+  std::printf("%zu,%.3f\n", N, Ms);
+  return 0;
+}
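The demo contract has a single dynamic jump. Assuming each `JUMPDEST` starts its own gas block, the D*J edge count mentioned in `buildCFGEdges` therefore reduces to J for it. Exercising the D*J term needs many dynamic jumps. The sketch is a possible variant generator plus the edge-count arithmetic for the two encodings. `makeManyDynJumpsContract`, `explicitDynEdges`, and `implicitStamps` are hypothetical, and the opcode bytes are the standard EVM values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Standard EVM opcode bytes.
constexpr uint8_t kStop = 0x00;
constexpr uint8_t kCalldataload = 0x35;
constexpr uint8_t kJump = 0x56;
constexpr uint8_t kJumpdest = 0x5b;

// D copies of "CALLDATALOAD JUMP" (each an unresolved dynamic jump, since
// no PUSH precedes the JUMP), then J JUMPDESTs, then STOP. Like the demo,
// the bytes are only analysed by the cache builder, never executed.
std::vector<uint8_t> makeManyDynJumpsContract(size_t NumDynJumps,
                                              size_t NumJumpDests) {
  std::vector<uint8_t> Code;
  Code.reserve(2 * NumDynJumps + NumJumpDests + 1);
  for (size_t I = 0; I < NumDynJumps; ++I) {
    Code.push_back(kCalldataload);
    Code.push_back(kJump);
  }
  for (size_t I = 0; I < NumJumpDests; ++I) {
    Code.push_back(kJumpdest);
  }
  Code.push_back(kStop);
  return Code;
}

// Dynamic-jump edges under the old explicit encoding vs. JUMPDEST stamps
// under the implicit count, assuming one gas block per JUMPDEST.
uint64_t explicitDynEdges(uint64_t D, uint64_t J) { return D * J; }
uint64_t implicitStamps(uint64_t D, uint64_t J) { return D == 0 ? 0 : J; }
```

Timing `buildBytecodeCache(..., /*EnableSPP=*/true)` on such contracts for growing D and J, as `timeCacheBuildMs` does, would show whether build time stays close to linear in D + J.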
diff --git a/src/tests/evm_cache_tests.cpp b/src/tests/evm_cache_tests.cpp
new file mode 100644
index 000000000..ac34320e3
--- /dev/null
+++ b/src/tests/evm_cache_tests.cpp
@@ -0,0 +1,107 @@
+// Copyright (C) 2025 the DTVM authors. All Rights Reserved.
+// SPDX-License-Identifier: Apache-2.0
+
+// Regression tests for buildBytecodeCache's SPP pipeline: implicit
+// dyn-pred count + reachability stitch on dyn-target JUMPDESTs.
+
+#include "evm/evm_cache.h"
+
+#include <evmc/evmc.h>
+#include <evmc/instructions.h>
+#include <gtest/gtest.h>
+
+#include <cstddef>
+#include <cstdint>
+#include <vector>
+
+namespace {
+
+using zen::evm::buildBytecodeCache;
+using zen::evm::EVMBytecodeCache;
+
+constexpr uint8_t OP_STOP = static_cast<uint8_t>(evmc_opcode::OP_STOP);
+constexpr uint8_t OP_ADD = static_cast<uint8_t>(evmc_opcode::OP_ADD);
+constexpr uint8_t OP_CALLDATALOAD =
+    static_cast<uint8_t>(evmc_opcode::OP_CALLDATALOAD);
+constexpr uint8_t OP_POP = static_cast<uint8_t>(evmc_opcode::OP_POP);
+constexpr uint8_t OP_JUMP = static_cast<uint8_t>(evmc_opcode::OP_JUMP);
+constexpr uint8_t OP_JUMPDEST = static_cast<uint8_t>(evmc_opcode::OP_JUMPDEST);
+constexpr uint8_t OP_PUSH1 = static_cast<uint8_t>(evmc_opcode::OP_PUSH1);
+
+EVMBytecodeCache buildSPPCache(const std::vector<uint8_t> &Code) {
+  EVMBytecodeCache Cache;
+  buildBytecodeCache(Cache, reinterpret_cast<const std::byte *>(Code.data()),
+                     Code.size(), EVMC_CANCUN, /*EnableSPP=*/true);
+  return Cache;
+}
+
+EVMBytecodeCache buildNoSPPCache(const std::vector<uint8_t> &Code) {
+  EVMBytecodeCache Cache;
+  buildBytecodeCache(Cache, reinterpret_cast<const std::byte *>(Code.data()),
+                     Code.size(), EVMC_CANCUN, /*EnableSPP=*/false);
+  return Cache;
+}
+
+// Smoke test: a contract with no dynamic jumps and a statically dead
+// JUMPDEST must build without crashing, and SPP must leave the dead block's
+// cost unchanged (it has no successors, so there is nothing to shift out).
+TEST(EVMCacheImplicitDynPred, BuildsCleanly_NoDynJumpWithDeadJumpDest) {
+  const std::vector<uint8_t> Code = {OP_STOP, OP_JUMPDEST, OP_ADD, OP_STOP};
+  const EVMBytecodeCache Cache = buildSPPCache(Code);
+
+  ASSERT_EQ(Cache.GasChunkCost.size(), Code.size());
+  ASSERT_EQ(Cache.GasChunkCostSPP.size(), Code.size());
+  // JUMPDEST(1) + ADD(3) + STOP(0) = 4 gas.
+  EXPECT_EQ(Cache.GasChunkCost[1], 4u);
+  EXPECT_EQ(Cache.GasChunkCostSPP[1], Cache.GasChunkCost[1]);
+}
+
+// A JUMPDEST reachable only through an unresolved dynamic jump must still
+// be included in the dominator-analysis input via the reachability stitch,
+// so that its SPP entry is populated.
+TEST(EVMCacheImplicitDynPred, DynTargetJumpDest_StitchedIntoSPP) {
+  const std::vector<uint8_t> Code = {
+      OP_CALLDATALOAD, OP_JUMP, OP_JUMPDEST, OP_ADD, OP_POP, OP_STOP,
+  };
+  const EVMBytecodeCache Cache = buildSPPCache(Code);
+
+  ASSERT_EQ(Cache.GasChunkCost.size(), Code.size());
+  ASSERT_EQ(Cache.GasChunkCostSPP.size(), Code.size());
+  // JUMPDEST(1) + ADD(3) + POP(2) + STOP(0) = 6 gas.
+  EXPECT_EQ(Cache.GasChunkCost[2], 6u);
+  EXPECT_EQ(Cache.GasChunkCostSPP[2], Cache.GasChunkCost[2]);
+  // CALLDATALOAD(3) + JUMP(8) = 11 gas.
+  EXPECT_EQ(Cache.GasChunkCost[0], 11u);
+}
+
+// With EnableSPP=false, GasChunkCostSPP must stay empty so that JIT
+// consumers fall back to the unshifted GasChunkCost array.
+TEST(EVMCacheImplicitDynPred, InterpreterOnly_LeavesSPPArrayEmpty) {
+  const std::vector<uint8_t> Code = {OP_PUSH1, 0x05,        OP_JUMP, OP_PUSH1,
+                                     0x00,     OP_JUMPDEST, OP_STOP};
+  const EVMBytecodeCache Cache = buildNoSPPCache(Code);
+
+  ASSERT_EQ(Cache.GasChunkCost.size(), Code.size());
+  EXPECT_TRUE(Cache.GasChunkCostSPP.empty());
+}
+
+// Two dynamic JUMPs give each JUMPDEST an ImplicitDynamicPredCount of 2,
+// so effectivePredCount must block any lemma614 shift into either JUMPDEST.
+TEST(EVMCacheImplicitDynPred, MultipleDynJumps_BothTargetsCounted) {
+  const std::vector<uint8_t> Code = {
+      OP_CALLDATALOAD, OP_JUMP,     OP_JUMPDEST, OP_CALLDATALOAD,
+      OP_JUMP,         OP_JUMPDEST, OP_POP,      OP_STOP,
+  };
+  const EVMBytecodeCache Cache = buildSPPCache(Code);
+
+  ASSERT_EQ(Cache.GasChunkCost.size(), Code.size());
+  ASSERT_EQ(Cache.GasChunkCostSPP.size(), Code.size());
+  EXPECT_EQ(Cache.JumpDestMap[2], 1u);
+  EXPECT_EQ(Cache.JumpDestMap[5], 1u);
+  EXPECT_GT(Cache.GasChunkCost[2], 0u);
+  EXPECT_GT(Cache.GasChunkCost[5], 0u);
+  EXPECT_EQ(Cache.GasChunkCostSPP[2], Cache.GasChunkCost[2]);
+  EXPECT_EQ(Cache.GasChunkCostSPP[5], Cache.GasChunkCost[5]);
+}
+
+} // namespace
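The hard-coded expectations in these tests come from summing static gas over each basic block. Examples are 4 for `JUMPDEST ADD STOP` and 11 for `CALLDATALOAD JUMP`. The sketch below reproduces that arithmetic under these assumptions:

- A block starts at PC 0, at every `JUMPDEST`, and right after `JUMP`, `JUMPI` or `STOP`.
- The block's cost is stored at its first PC.
- Only the opcodes these tests use are supported.

`blockCosts`, `staticGas` and the `op::` constants are hypothetical helpers, not `EVMBytecodeCache` internals. The real cache may split blocks at further terminators, such as `RETURN` or `REVERT`, that this sketch omits.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Opcode bytes used by evm_cache_tests.cpp (hypothetical local constants).
namespace op {
constexpr uint8_t STOP = 0x00, ADD = 0x01, CALLDATALOAD = 0x35, POP = 0x50,
                  JUMP = 0x56, JUMPI = 0x57, JUMPDEST = 0x5b, PUSH1 = 0x60,
                  PUSH32 = 0x7f;
} // namespace op

// Static gas for the supported subset; anything else is rejected.
uint64_t staticGas(uint8_t Op) {
  switch (Op) {
  case op::STOP:
    return 0;
  case op::JUMPDEST:
    return 1;
  case op::POP:
    return 2;
  case op::ADD:
  case op::CALLDATALOAD:
    return 3;
  case op::JUMP:
    return 8;
  case op::JUMPI:
    return 10;
  default:
    if (Op >= op::PUSH1 && Op <= op::PUSH32)
      return 3;
    throw std::invalid_argument("opcode not covered by this sketch");
  }
}

// Per-PC array: summed static gas of the block starting at that PC,
// and 0 for PCs that do not start a block.
std::vector<uint64_t> blockCosts(const std::vector<uint8_t> &Code) {
  std::vector<uint64_t> Cost(Code.size(), 0);
  size_t Start = 0;
  bool Open = false;
  for (size_t Pc = 0; Pc < Code.size();) {
    const uint8_t Op = Code[Pc];
    if (!Open || Op == op::JUMPDEST) { // a new block begins here
      Start = Pc;
      Open = true;
    }
    Cost[Start] += staticGas(Op);
    if (Op == op::JUMP || Op == op::JUMPI || Op == op::STOP)
      Open = false; // terminator: the next instruction starts a block
    // Skip PUSH immediates so data bytes are never decoded as opcodes.
    const bool IsPush = Op >= op::PUSH1 && Op <= op::PUSH32;
    Pc += 1 + (IsPush ? static_cast<size_t>(Op - op::PUSH1 + 1) : 0);
  }
  return Cost;
}
```

Applied to the `MultipleDynJumps_BothTargetsCounted` bytecode, the sketch gives 12 at PC 2 (JUMPDEST + CALLDATALOAD + JUMP) and 3 at PC 5 (JUMPDEST + POP + STOP). Both values satisfy that test's `EXPECT_GT(..., 0u)` checks.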