diff --git a/docs/changes/2026-04-05-gas-check-placement/README.md b/docs/changes/2026-04-05-gas-check-placement/README.md
new file mode 100644
index 000000000..f172fd044
--- /dev/null
+++ b/docs/changes/2026-04-05-gas-check-placement/README.md
@@ -0,0 +1,183 @@
+# Change: Gas check placement optimization with mixed CFG support
+
+- **Status**: Implemented
+- **Date**: 2026-04-05
+- **Tier**: Full
+
+## Overview
+
+Remove the all-or-nothing dynamic-jump fallback in the EVM bytecode cache's
+SPP gas-metering pipeline. Previously, any unresolved dynamic jump caused the
+entire contract to fall back to per-block gas metering (zero SPP benefit).
+The cache now always builds a CFG with mixed-precision edges and runs the
+SPP shifting pass, while keeping the unshifted per-block cost available for
+the interpreter.
+
+The final design has three pieces:
+
+- **Mixed-precision CFG**: static jumps (`PUSH → JUMP`) get a single precise
+  edge to the resolved `JUMPDEST`; every unresolved dynamic jump gets
+  over-approximated edges to all `JUMPDEST` blocks. The over-approximation
+  is intentional — narrowing dynamic-jump edges with partially-resolved
+  call-site information would under-approximate the CFG: the SPP pass
+  could then shift gas while ignoring edges that do exist at runtime,
+  producing unsafe
+  metering. See `buildCFGEdges` in `src/evm/evm_cache.cpp` (lines 386–429,
+  in particular the dynamic-jump branch at line 419).
+- **SPP-shifted gas cost on a separate array**: the interpreter's gas-chunk
+  fast path requires unshifted per-block costs (PR #371). To preserve those
+  semantics while enabling SPP for the JIT, the cache exposes two parallel
+  arrays. `EVMBytecodeCache::GasChunkCost` keeps the unshifted base cost
+  (written from `Blocks[Id].Cost` at `evm_cache.cpp:1161`), and a new
+  `EVMBytecodeCache::GasChunkCostSPP` carries the SPP-shifted value
+  (written from the metering function `Metering[Id]` at
+  `evm_cache.cpp:1165`).
+  The interpreter (`src/evm/interpreter.cpp:382`)
+  reads only `GasChunkCost`; the multipass JIT prefers `GasChunkCostSPP`
+  when non-null and falls back to `GasChunkCost` otherwise
+  (`src/compiler/evm_frontend/evm_mir_compiler.cpp:534, 578`).
+- **Interpreter-mode gating**: the SPP pipeline (CFG construction + metering
+  pass) is expensive and only useful for the JIT consumer. `buildBytecodeCache`
+  takes an `EnableSPP` parameter; when false, it emits unshifted per-block
+  costs and skips the CFG/metering work entirely. `EVMModule::CacheNeedsSPP`
+  is set to `true` immediately before `performEVMJITCompile` runs, so
+  interpreter-only modules never pay the SPP pipeline cost. When the JIT
+  somehow runs without SPP being built, `evm_compiler.cpp` passes `nullptr`
+  for `GasChunkCostSPP` so the JIT falls back to the unshifted array.
+
+## Motivation
+
+The existing all-or-nothing fallback meant any contract with unresolvable
+dynamic jumps got zero benefit from SPP. Real-world Solidity contracts mix
+static and dynamic jumps, so a mixed-edge CFG is needed to let the SPP pass
+do useful work on the resolved portion of the CFG while staying sound on
+the unresolved portion.
+
+## Scope
+
+This PR is scoped to the cache-side CFG and JIT-cost wiring:
+
+- Remove the `HasDynamicJump` early-exit bailout in `buildGasChunksSPP`.
+- Factor out `buildCFGEdges()` with over-approximation for all unresolved
+  dynamic jumps (sound for SPP metering).
+- Add `EVMBytecodeCache::GasChunkCostSPP` and write SPP-shifted costs into
+  it, leaving `GasChunkCost` unshifted for the interpreter.
+- Plumb the SPP pointer through `EVMFrontendContext::setGasChunkInfo` and
+  `EVMMirBuilder`; swap the JIT's chunk-cost reads (`meterOpcode`,
+  `meterOpcodeRange`, JUMPDEST-run suffix-sum precompute) to prefer
+  `GasChunkCostSPP` when non-null.
+- Add an `EnableSPP` parameter to `buildBytecodeCache` and gate the
+  pipeline on JIT-consumer modules only.
+- Tighten the SPP shifting guards to bail out of the shift when a successor
+  is a gas-chunk terminator (`isGasChunkTerminator`) — prevents masking gas
+  cost across chunk boundaries.
+
+No frontend/MIR changes beyond the cost-source swap are included.
+
+## Impact
+
+### Affected Modules
+
+- `docs/modules/evm/` — EVM bytecode cache, CFG construction, SPP metering
+
+### Compatibility
+
+No breaking changes. Interpreter semantics are preserved (`GasChunkCost`
+remains the unshifted per-block cost, matching PR #371). JIT semantics are
+preserved when SPP is enabled (the JIT now reads SPP-shifted costs from a
+separate array instead of overwriting the interpreter's table).
+
+### Metrics
+
+Numbers are from the CI Performance Regression Check (baseline
+`perf-baseline-*-a14a9de...`, 5 repetitions, 25% threshold) — the
+gate-of-record for this PR. The full 194-bench multipass table lives in
+the github-actions perf-check comment on the PR; this section
+summarizes the design-relevant subset.
+
+**Wins (jump-light / cost-shift opportunities):**
+
+- `micro/signextend/{one,zero}`: 0.13 → 0.07 μs (≈ −42.7%)
+- `micro/memory_grow_mstore/nogrow`: −6.8%
+- `main/structarray_alloc/nfts_rank`: −6.2%
+- `main/blake2b_huff/8415nulls`: −5.3%
+
+**Regressions (jump-heavy contracts — predicted cost of mixed-CFG
+over-approximation):**
+
+- `micro/jump_around/empty`: 0.04 → 0.05 μs (+22.8%)
+- `main/weierstrudel/1`: 0.20 → 0.24 μs (+19.5%)
+- `main/weierstrudel/15`: 2.22 → 2.60 μs (+17.5%)
+- `main/snailtracer/benchmark`: 28.49 → 31.58 μs (+10.9%)
+
+The +17–23% regressions on `jump_around` and `weierstrudel` are the design
+tradeoff of over-approximating dynamic-jump edges to all `JUMPDEST` blocks in
+order to keep the SPP shift sound (narrowing those edges with partial
+call-site resolution would under-approximate the CFG and break per-path
+total invariants — see Phase 5 / `buildCFGEdges` in
+`src/evm/evm_cache.cpp:389-429`). All 194 benches stay within the 25%
+gate, but `jump_around` has tight headroom.
+
+Earlier drafts of this section cited a 27-bench local `evmone-bench`
+run (3 reps) that drifted from the CI baseline; the CI bot table is the
+authoritative source.
+
+Correctness: 223/223 multipass evmone-unittests, 215/215 interpreter
+evmone-unittests, 2723/2723 evmone-statetests on `fork_Cancun` for both
+multipass and interpreter modes.
+
+## Implementation Plan
+
+### Mixed CFG construction
+
+- [x] Remove the all-or-nothing fallback that disabled SPP on any unresolved
+      dynamic jump
+- [x] Factor `buildCFGEdges()` so static jumps get precise single-target
+      edges and unresolved dynamic jumps get over-approximated edges to
+      every `JUMPDEST`
+
+### JIT cost wiring
+
+- [x] Add `EVMBytecodeCache::GasChunkCostSPP` parallel array, populated from
+      the SPP metering function in `buildGasChunksSPP`
+- [x] Plumb the SPP pointer through `EVMFrontendContext::setGasChunkInfo`
+      and `EVMMirBuilder`
+- [x] In `meterOpcode`, `meterOpcodeRange`, and the JUMPDEST-run suffix-sum
+      precompute, prefer `GasChunkCostSPP` when non-null
+- [x] Interpreter continues reading the unshifted `GasChunkCost` — no change
+
+### SPP pipeline gating
+
+- [x] Add `buildBytecodeCache(..., bool EnableSPP)` parameter
+- [x] When `EnableSPP == false`, skip the CFG / metering pipeline and emit
+      unshifted per-block costs only
+- [x] `EVMModule::CacheNeedsSPP` is flipped to `true` immediately before
+      `performEVMJITCompile` runs, so interpreter-only modules never pay
+      the SPP pipeline cost
+- [x] `evm_compiler.cpp` passes `nullptr` for `GasChunkCostSPP` when the
+      array is empty, so the JIT falls back to the unshifted array if a
+      module is JIT-compiled without SPP being built
+
+### Soundness guards
+
+- [x] Tighten `lemma614Update` to set `MinSucc = 0` when encountering
+      excluded successors or gas-chunk terminators
+
+## Changed Files
+
+- `src/evm/evm_cache.h` — add `GasChunkCostSPP` array, document
+  interpreter vs JIT consumer split
+- `src/evm/evm_cache.cpp` — mixed-CFG `buildCFGEdges`,
+  SPP-shifted cost
+  export, `EnableSPP` gating
+- `src/compiler/evm_frontend/evm_mir_compiler.h` — plumb SPP pointer
+  through context and builder
+- `src/compiler/evm_frontend/evm_mir_compiler.cpp` — prefer SPP-shifted
+  cost at the three chunk-cost read sites
+- `src/compiler/evm_compiler.cpp` — pass the new pointer via
+  `setGasChunkInfo`, with `nullptr` fallback when the SPP array is empty
+- `src/runtime/evm_module.h` — `CacheNeedsSPP` flag
+
+## Risks
+
+- Over-approximated edges for unresolved jumps may pessimize gas placement
+  for pathological contracts with many unresolved targets. Acceptable
+  because the alternative (narrowed edges from partial resolution) is
+  unsound for SPP.
diff --git a/docs/changes/2026-04-05-gas-check-placement/review-fixes.md b/docs/changes/2026-04-05-gas-check-placement/review-fixes.md
new file mode 100644
index 000000000..12f02d156
--- /dev/null
+++ b/docs/changes/2026-04-05-gas-check-placement/review-fixes.md
@@ -0,0 +1,237 @@
+# PR #446 Review Response Plan
+
+- **Status**: Implemented (F1, F4, F5); F2/F3 applied to PR body; F6 dropped
+- **Date**: 2026-05-07
+- **Parent change**: `README.md` (gas check placement w/ mixed CFG, SPP JIT output, interpreter-mode gating)
+- **Branch**: `feat/gas-check-placement`
+
+## Status update (2026-05-07)
+
+- **F1 implemented** in commit `81efba3` — `Prev2Pc/Prev2Opcode` removed,
+  whole-repo grep clean, `GasBlock` shrinks ~9 bytes.
+- **F4 implemented** in commit `81efba3` (squashed with F1) — added the
+  soundness-pairing comment to `buildCFGEdges`.
+- **F5 implemented** in commit `691069a` — `CacheNeedsSPP` lifecycle
+  invariant comment added.
+- **F2 / F3 applied** to the PR body via `gh pr edit` — Copilot threads
+  noted as already-resolved + content-stale (live GraphQL confirmed
+  `isResolved: true` for all three before the edit), perf table
+  rewritten with honest +17 to +22.8% jump-heavy regressions from the
+  latest CI bot output.
+- **F6 dropped** — opening an upstream issue for an `addEdge` O(deg²)
+  concern that was theoretical, unmeasured, and not touched by any
+  commit on this branch would have been noise. The concern remains
+  documented below for future reference but no issue is filed.
+
+This plan addresses the findings of the 2026-05-07 self-review of PR #446.
+Items are grouped by whether they block merge.
+
+## Blocking before merge
+
+### F1 — Remove dead `Prev2Pc` / `Prev2Opcode` tracking
+
+**Symptom**: `src/evm/evm_cache.cpp:195, 198` add two `GasBlock` fields and
+`src/evm/evm_cache.cpp:323-324` write them in `buildGasBlocks`, but no
+reader exists in `src/` or `tests/`. The PR description justifies them as
+"future 3-instruction call-site window lookup", but Phase 5 (commit
+`c26bf7c`) removed call-site enumeration entirely, so the rationale no
+longer applies on this branch.
+
+**Why this blocks**: a fresh reviewer will re-question every PR cycle until
+the dead fields are gone or have a concrete forward link. Leaving them in
+also adds a small per-block bookkeeping cost on every cache build.
+
+**Fix**: remove `GasBlock::Prev2Pc`, `GasBlock::Prev2Opcode`, and the two
+writes inside `buildGasBlocks`. Verify no header or test exposes them.
+
+**Verification**:
+- `grep -rn 'Prev2Pc\|Prev2Opcode'` (whole repo) returns nothing.
+- `tools/format.sh check` clean.
+- Local `evmone-unittests` multipass + interpreter both pass — confirm no
+  hidden dependency surfaces.
+
+**Side effect to note in commit body**: `GasBlock` shrinks by ~9 bytes
+(one `uint32_t` + one `uint8_t` + alignment). Cache memory footprint
+drops marginally; not expected to perturb perf but worth flagging.
+
+**Out of scope**: re-introducing the tracking when a real consumer lands.
+That belongs in the consumer's own PR.
+
+### F2 — Annotate PR body re: stale Copilot AI threads
+
+**Symptom**: the three Copilot AI inline comments on PR #446 target an
+earlier iteration that included `ResolvedJumpTargets` and call-site
+enumeration. Phase 5 (`c26bf7c`) deleted that code, making the threads
+content-stale.
+
+**Round-2 update**: a live GraphQL query
+(`gh api graphql ... reviewThreads`) on 2026-05-07 confirmed that all
+three Copilot threads are **already** `isResolved: true` (Copilot author
+login: `copilot-pull-request-reviewer`). zoowii's design-doc thread is
+also resolved. So the previously-planned `resolveReviewThread` mutation
+is unnecessary.
+
+**Why this still matters (downgraded from blocking)**: even though the
+threads are visually collapsed, the resolution didn't cite the commit
+that made them obsolete. A future reviewer expanding the threads can
+still be confused. A short pointer in the PR body removes that
+confusion.
+
+**Fix**:
+1. Edit the PR description to add a short "Resolved review threads" line
+   noting that Phase 5 commit `c26bf7c` (call-site enumeration removal)
+   makes the three Copilot AI inline threads content-stale; threads are
+   already resolved on the GitHub side.
+2. Do **not** edit, reply to, re-resolve, or unresolve any thread — they
+   are already in the correct state, and zoowii's thread must be left
+   alone per the "no-auto-reply-to-zoowii" rule.
+
+**Verification**:
+- `gh api graphql -f query='query{repository(owner:"DTVMStack",name:"DTVM"){pullRequest(number:446){reviewThreads(first:50){nodes{id isResolved comments(first:1){nodes{author{login}}}}}}}}'`
+  still reports `isResolved: true` for all 4 threads (3 Copilot + 1
+  zoowii) after the PR body edit.
+- `gh pr view 446` shows the PR body now mentions `c26bf7c` as the
+  commit that obsoleted the call-site / `ResolvedJumpTargets`
+  discussion.
+
+### F3 — Make `weierstrudel` / `jump_around` regression visible in PR body
+
+**Symptom**: the multipass perf table shows `weierstrudel/15 +17.5%`,
+`weierstrudel/1 +19.5%`, `micro/jump_around/empty +22.8%` — within the
+25% gate but clustered near the ceiling. The current PR description
+groups them with "small regressions remain (≤ +6%)" which is wrong, and
+buries them in the per-bench list.
+
+**Why this blocks**: hides a known design-tradeoff cost from upstream
+reviewers; if a future contract trips +25%, reviewers will treat it as a
+new regression rather than the predicted cost of mixed-CFG over-approx.
+
+**Fix**: rewrite the "Risks" / "Evaluation" section of the PR body to:
+1. Correct the "≤ +6%" claim; explicitly list the ~+17 to +23% jump-heavy
+   regressions with the actual numbers.
+2. State that these are the predicted cost of CFG over-approximation on
+   jump-heavy contracts (consistent with the design-doc rationale) — not
+   noise.
+3. Note the 25% threshold buffer is intentional but tight; if a future
+   contract trips, the right move is to investigate that contract, not
+   to widen the threshold.
+
+**Verification**:
+- Read the rewritten PR body once before pushing, confirm each cited
+  number matches the CI bot's latest table (per the
+  "PR perf table integrity" rule, regenerate from the bot, do not paste
+  from memory).
+
+## Non-blocking follow-ups (file as TODO comments + GitHub issue)
+
+### F4 — Document `buildCFGEdges` over-approx invariant
+
+`buildCFGEdges` is at `src/evm/evm_cache.cpp:389-429`. Its function-level
+comment (lines 386-388) and inline branch comment (lines 419-422) already
+explain *why* over-approximation is intentional, but neither links forward
+to the soundness mechanism that absorbs the cost (`lemma614Update` at line
+920, which uses the `effectivePredCount > 1` guard at line 911 to refuse
+shifting along over-approx edges).
+
+Append one sentence to the function-level comment block at lines 386-388:
+
+> "After this pass, JUMPDEST blocks may have many predecessors; this is
+> the intentional partner to `lemma614Update`'s `effectivePredCount > 1`
+> guard, which refuses to shift gas across edges with multiple
+> predecessors and so absorbs the over-approximation soundly."
+
+Documentation only — no behavior change. ~3-line edit at the function
+header.
+
+### F5 — `CacheNeedsSPP` lifecycle invariant comment
+
+The `CacheNeedsSPP` field is at `src/runtime/evm_module.h:82` (already
+has a short comment about JIT consumption). The lifecycle constraint is
+visible at `src/runtime/evm_module.cpp:117` (set before
+`performEVMJITCompile`), `:125` (`getBytecodeCache` triggers build), and
+`:135` (`initBytecodeCache` reads `CacheNeedsSPP`).
+
+Append to the field's existing comment:
+
+> "Must be set before any `getBytecodeCache()` call — once the cache is
+> built, the `EnableSPP` decision is fixed for the lifetime of the
+> module. Future lazy / on-demand JIT paths must flip this flag before
+> triggering the lazy cache build."
+
+Documentation only.
+
+### F6 — `addEdge` O(deg²) compile-time guardrail [DROPPED 2026-05-07]
+
+**Status**: dropped. Opening an upstream issue about a code path none
+of the F1/F4/F5 commits touch, with no measured evidence of compile-
+time pain on the existing CI matrix, would have been noise.
+
+**Original concern (kept for future reference)**: `addEdge`
+(`src/evm/evm_cache.cpp:204` area) uses `std::find` for dedup, giving
+O(current_deg) per insertion. Combined with over-approximated
+dynamic-jump edges (`|JUMPDEST| × |dynamic jumps|`), pathological
+contracts could inflate compile time. Phase 4 gating limits exposure
+to JIT-consumer modules.
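
As a rough illustration of the dedup concern, the sketch below mimics a `std::find`-deduplicated edge insert and counts how many list elements get scanned. `Node`, `insertUnique`, `addEdgeLinear`, and `buildDenseEdges` are hypothetical stand-ins written for this note, not the real `addEdge` or `GasBlock` from `src/evm/evm_cache.cpp`; the only point is that each insert walks the current successor or predecessor list, so fanning D jump blocks out to J JUMPDEST blocks costs on the order of `D * J^2 + J * D^2` comparisons.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a CFG block; not the real GasBlock.
struct Node {
  std::vector<uint32_t> Succs;
  std::vector<uint32_t> Preds;
};

// Appends Value unless it is already present. Returns the number of
// elements std::find walked past before it stopped.
static size_t insertUnique(std::vector<uint32_t> &List, uint32_t Value) {
  auto It = std::find(List.begin(), List.end(), Value);
  size_t Scanned = static_cast<size_t>(It - List.begin());
  if (It == List.end())
    List.push_back(Value);
  return Scanned;
}

// Linear-dedup edge insert: cost grows with the current degree.
static size_t addEdgeLinear(std::vector<Node> &Nodes, uint32_t From,
                            uint32_t To) {
  return insertUnique(Nodes[From].Succs, To) +
         insertUnique(Nodes[To].Preds, From);
}

// Connects D jump blocks [0, D) to J JUMPDEST blocks [D, D + J) and
// returns the total number of elements scanned by the dedup.
static size_t buildDenseEdges(uint32_t D, uint32_t J) {
  std::vector<Node> Nodes(D + J);
  size_t Scanned = 0;
  for (uint32_t Jump = 0; Jump < D; ++Jump)
    for (uint32_t Dest = 0; Dest < J; ++Dest)
      Scanned += addEdgeLinear(Nodes, Jump, D + Dest);
  return Scanned;
}
```

Calling `buildDenseEdges(D, J)` for growing `D` and `J` gives a feel for how quickly the scan count rises, under the assumption that the real insert behaves like this simplified one.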
+
+**If a future contract trips this**: capture the offending bytecode
+and a JIT compile-time profile first, then either (a) switch `Succs` /
+`Preds` to a `vector` + `unordered_set` hybrid for
+O(1) dedup, or (b) add a `LOG_INFO` warning when
+`JumpDestBlocks.size() * dynamic_jump_count` exceeds a threshold so
+the next tuning cycle has telemetry. Don't act preemptively.
+
+## Sequencing
+
+| Step | Action | Where |
+|------|--------|-------|
+| 1 | F1: remove `Prev2Pc/Prev2Opcode` (1 commit) | `src/evm/evm_cache.cpp` |
+| 2 | F4 + F5: documentation tweaks (1 commit, squashable) | `src/evm/evm_cache.cpp`, `src/runtime/evm_module.h` |
+| 3 | Build + format + local test gate (see below) | `tools/format.sh` + `evmone-unittests` + `evmone-statetest` + `ctest` |
+| 4 | Push to `feat/gas-check-placement`; await CI green (~35 min for the multipass perf job) | — |
+| 5 | F2: edit PR body to point at `c26bf7c` (no thread mutation — Round-2 live query confirmed all 4 threads already resolved) | GitHub web/CLI |
+| 6 | F3: rewrite Evaluation section in PR body using numbers from the latest CI bot table (per "PR perf table integrity" rule, never paste from memory) | GitHub web/CLI |
+| 7 | (F6 dropped) | — |
+
+## Out-of-scope
+
+- Re-introducing call-site resolution / `ResolvedJumpTargets`: belongs in
+  a future PR with a real consumer (e.g. MIR direct-branch optimization).
+- Tuning the 25% perf threshold or adjusting individual bench tolerances:
+  that is a CI-config concern, not a code change.
+- Switching `addEdge` data structure: see F6 — follow-up only.
+
+## Quality gates
+
+Before pushing the F1+F4+F5 commit, the build must use the CI-faithful
+flag set (`.claude/rules/dtvm-build-config.md` /
+`.claude/rules/match_ci_cmake_flags`): in particular
+`-DZEN_ENABLE_JIT_PRECOMPILE_FALLBACK=ON` and `-DZEN_ENABLE_LIBEVM=ON`,
+otherwise interpreter / fallback paths run a different code shape than
+CI.
+
+1. `tools/format.sh check` clean.
+2. 
+   `cmake --build build --target dtvmapi -j$(nproc)` succeeds, no new
+   warnings.
+3. `evmone-unittests` multipass: 223/223 pass.
+4. `evmone-unittests` interpreter: 215/215 pass.
+5. `evmone-statetest --fork Cancun` multipass: 2723/2723 pass (current
+   baseline; the count must match — any drop is a regression).
+6. `evmone-statetest --fork Cancun` interpreter: must match the pass
+   count reported by the most recent CI green run on
+   `feat/gas-check-placement` (binary equality — record it once before
+   making the F1+F4+F5 commit so the local re-run can be compared
+   exactly, not just "all green").
+7. `ctest` from `build/` (the project's built-in EVM spec tests, per
+   `.claude/rules/dtvm-local-test.md`).
+8. CI green on the new push, including the matrix jobs:
+   `Build and test DTVM multipass on x86-64`,
+   `Build and test DTVM interpreter on x86-64`,
+   `Test DTVM-EVM JIT fallback in release mode with ctest on x86-64`,
+   `Test DTVM-EVM multipass evmtestsuite with gas register in release
+   mode with ctest on x86-64`,
+   `Performance Regression Check (interpreter)` and
+   `Performance Regression Check (multipass)`.
+   (~35 min for the multipass perf job.)
+
+Skip F3 (PR-body edits) until F1+F4+F5 commits land and CI passes, since
+the PR description should match the final state of the branch.
diff --git a/docs/changes/2026-05-11-spp-cfg-implicit-dyn-pred/README.md b/docs/changes/2026-05-11-spp-cfg-implicit-dyn-pred/README.md
new file mode 100644
index 000000000..c1e787cd3
--- /dev/null
+++ b/docs/changes/2026-05-11-spp-cfg-implicit-dyn-pred/README.md
@@ -0,0 +1,280 @@
+# Change: SPP CFG over-approximation via implicit dyn-pred count
+
+- **Status**: Implemented
+- **Date**: 2026-05-11
+- **Tier**: Light
+- **Parent PR**: builds on `feat/gas-check-placement` (PR #446)
+
+## Overview
+
+Replace the `O(D * J)` explicit over-approximation edges in
+`buildCFGEdges` (where `D` = #unresolved dynamic jumps and `J` = #JUMPDEST
+blocks) with an `O(D + J)` implicit predecessor count.
+The SPP shifting
+pass `lemma614Update` makes its single-vs-multi-predecessor decision via
+the new `effectivePredCount`, so behavior is equivalent for every pair
+(parent, JUMPDEST-successor) that the explicit representation would have
+materialized — without ever building the dense edge set.
+
+The static-only reachability gap that this creates (dyn-only JUMPDESTs
+become unreachable from the entry block) is closed by an explicit
+reachability stitch that seeds every JUMPDEST as a root before the
+dominator and loop analyses run.
+
+## Motivation
+
+The current `feat/gas-check-placement` representation builds a dense
+over-approximate CFG: every unresolved dynamic `JUMP`/`JUMPI` adds one
+edge to every JUMPDEST in the contract. This is `O(D * J)` edges, and
+`addEdge`'s `std::find` dedup makes each insertion `O(deg)`, so the
+total cost is `O(D * J^2 + J * D^2) = O(D * J * (D + J))`. For
+pathological dyn-heavy contracts the asymptotic blow-up is third-order.
+
+Independently, `splitCriticalEdges` then processes those same edges
+and (because every JUMPDEST has many predecessors in the over-approx
+graph) splits each one with another `O(deg)` erase + insert pair —
+contributing the same asymptotic cost a second time. PR #446 already
+gates the SPP pipeline on JIT-consumer modules to bound the runtime
+impact, but the per-module cost is still material when a large
+contract is JIT-compiled.
+
+The dense edges contribute nothing to SPP's local shift decision:
+`lemma614Update` refuses to shift into any successor with
+`effectivePredCount > 1`, and every JUMPDEST that the dynamic jump
+could reach has many predecessors after over-approximation. The edges
+are pure compile-time tax.
+
+## Design
+
+### Implicit predecessor count (replaces `D * J` edges)
+
+`GasBlock` gains one field:
+
+```cpp
+uint32_t ImplicitDynamicPredCount = 0;
+```
+
+Set on every JUMPDEST when the contract has at least one unresolved
+dynamic jump.
+The count equals `D`, matching the number of
+predecessors the explicit over-approximation would have produced.
+
+`effectivePredCount` folds the count in:
+
+```cpp
+static size_t effectivePredCount(const GasBlock &Block) {
+  size_t Count = Block.Preds.size();
+  if (Block.Start == 0) ++Count;
+  Count += Block.ImplicitDynamicPredCount;
+  return Count;
+}
+```
+
+`lemma614Update` reads `effectivePredCount` for every shift decision,
+so it sees an identical "multi-pred?" answer to the explicit case.
+`buildCFGEdges` no longer adds any edge from a dynamic-jump block; the
+SPP graph carries only static fall-through and resolved static-jump
+edges.
+
+### Reachability stitch (closes the dom/loop gap)
+
+After `computeReachable` runs from the entry block, every JUMPDEST is
+seeded into the reachable set and forward-propagated via `Succs`.
+Without this step, dyn-only JUMPDESTs (e.g. Solidity function return
+addresses, reached at runtime only via `PUSH ret; ... JUMP`) would
+remain unreachable in the static-only CFG, and `computeDominators` /
+`buildLoopsUsingDominance` would skip them — letting their static
+successor chains miss SPP shifting opportunities.
+
+The stitch is purely additive (sets only `Reachable[x] = 1`) and
+maintains the dominator monotonicity property required by SPP.
+
+### Compile-time complexity
+
+| Pass                  | Before (over-approx)        | After (implicit count)       |
+|-----------------------|-----------------------------|------------------------------|
+| `buildCFGEdges`       | `O(D * J^2 + J * D^2)`      | `O(N)`                       |
+| `splitCriticalEdges`  | `O(D * J^2)` on dyn edges   | `O(N)` (no dyn edges to split) |
+| `computeReachable`    | `O(N + E_dense)`            | `O(N) + reachability stitch` |
+| `computeDominators`   | Bitset width up by `+1` per JUMPDEST extra Pred | Same width, sparser graph |
+
+## Alternatives considered
+
+### Super-node (DynDispatch hub) — rejected
+
+A virtual `DynDispatch` block routing all dynamic jumps into one hub,
+then fanning out to all JUMPDESTs.
+`O(D + J)` edges, preserves
+reachability without a stitch, every standard pass sees a "real" CFG.
+
+Implemented and benchmarked side-by-side. Wall times are local
+single-machine measurements (`evmone-unittests` for the
+`loop_full_of_jumpdests` test, single test, multipass mode). They are
+**not currently tracked in CI** — a dedicated compile-time-dense
+benchmark lane is out of scope for this PR.
+
+| Implementation | Wall time (local) |
+|----------------|-------------------|
+| `feat/gas-check-placement` (over-approx) | 7.3 s |
+| **A** (implicit count, this PR) | 3.3 s |
+| **B** (super-node) | 275 s |
+
+B's blow-up traces to `computeDominators` / `buildLoopsUsingDominance`
+on the dispatch hub: the hub creates a deeply irreducible CFG where
+the iterative dataflow takes super-linear passes to converge, and
+every back-edge into the hub triggers a `collectNaturalLoop` walk
+over every block transitively reachable from it. Patching the loop
+passes to special-case the hub re-introduces the structural
+asymmetry that motivated A in the first place. **B is unusable.**
+
+### Reproducing the scaling claim
+
+Build the manual demo and run the wrapper script:
+
+```bash
+cmake --build build --target evmCacheComplexityDemo
+bash docs/changes/2026-05-11-spp-cfg-implicit-dyn-pred/scaling_demo.sh
+```
+
+The demo generates a synthetic contract (`CALLDATALOAD JUMP
+STOP`) and times the full `buildBytecodeCache` call.
+
+**Intra-PR comparison** (the same demo cherry-picked onto commit
+`99f23a3`, which is the PR's head one commit BEFORE Phase 7 — both
+states run the SPP pipeline; the only difference is the
+over-approximation representation):
+
+| N JUMPDESTs | Pre-Phase-7 (D×J explicit edges) | Phase 7 (O(N) implicit count) | Speedup |
+|------------:|---------------------------------:|------------------------------:|--------:|
+|         100 |                          0.07 ms |                       0.05 ms |    1.4× |
+|         500 |                          0.39 ms |                       0.13 ms |    3.0× |
+|       1,000 |                          1.01 ms |                       0.29 ms |    3.4× |
+|       2,000 |                          3.04 ms |                       0.67 ms |    4.5× |
+|       5,000 |                         19.66 ms |                       2.71 ms |    7.2× |
+|      10,000 |                         84.76 ms |                      10.38 ms |    8.2× |
+|      20,000 |                        345.94 ms |                      43.68 ms | **7.9×** |
+
+Pre-Phase-7 wall clock grows ~4× per doubling of `N` (quadratic — the
+expected O(D × J²) shape of explicit-edge add + critical-edge split).
+Phase 7 grows 2–4× per doubling — sub-quadratic, with the residual
+super-linearity sourced from `computeDominators` and
+`buildLoopsUsingDominance` running on the now-larger reachable set.
+
+**Scope of the O(N) claim**: Phase 7 makes the CFG over-approximation
+step itself O(N) (one count stamp per JUMPDEST). The wall clock above
+includes the rest of the SPP pipeline — `computeDominators` and
+`buildLoopsUsingDominance` are iterative dataflow with super-linear
+worst-case behaviour and dominate the time at large N. The 4-second
+saving on `loop_full_of_jumpdests` (7.3 s → 3.3 s above) is the Phase 7
+contribution; the remaining 3.3 s is dom / loop analysis plus JIT
+compile, untouched by this PR. Cutting that further would require a
+separate dom-analysis change.
+
+### Edge-budget fallback — rejected
+
+Keep the explicit over-approx but skip SPP when
+`D * J > kBudget`. Trades a complexity ceiling for an SPP cliff; on
+contracts that sit just over the budget the gas-check density jumps
+discontinuously. Solves a symptom rather than the root cause.
+
+## Impact
+
+### Performance (27 paper benches, `--benchmark_min_time=3x`, 5 reps)
+
+vs `feat/gas-check-placement` (PR #446) baseline:
+
+- **Geomean: 0.9727× (-2.73%)**
+- Arithmetic mean: -1.48%
+
+**Wins** (regressions from PR #446 reversed):
+
+| Benchmark | PR #446 vs upstream | A v2 vs PR #446 |
+|---|---|---|
+| `micro/jump_around/empty` | +22.8% | **-53.1%** |
+| `micro/signextend/zero` | -42.7% | -24.6% (further) |
+| `main/blake2b_huff/8415nulls` | -5.3% | -14.7% (further) |
+| `main/structarray_alloc/nfts_rank` | -6.2% | -4.9% (further) |
+| `main/snailtracer/benchmark` | - | -1.3% |
+| `main/weierstrudel/15` | +17.5% | -2.5% |
+
+**Worst-case regressions** (vs PR #446):
+
+| Benchmark | A v2 vs PR #446 | Note |
+|---|---|---|
+| `main/sha1_shifts/empty` | +27.0% (mean) | Single-outlier noise; median delta +2.7% |
+| `micro/memory_grow_mstore/by16` | +13.98% | Real |
+| `micro/memory_grow_mload/by32` | +10.64% | Real |
+| `micro/loop_with_many_jumpdests/empty` | +6.81% | Real (was +48.5% in A v1 without reachability stitch) |
+
+All real regressions are well under the 25% CI gate. The
+`sha1_shifts/empty` mean is pulled up by a single rep (one of the 5)
+that hit 8.87 µs; the median delta is +2.7%.
+
+### Correctness
+
+- `evmone-unittests` multipass: **223/223 pass**, 8.4 s wall time
+  (vs 13 s baseline, 305 s for scheme B).
+- `tools/format.sh check`: clean.
+
+## Changed files
+
+- `src/evm/evm_cache.cpp` — `GasBlock::ImplicitDynamicPredCount` field;
+  `effectivePredCount` folds it in; `buildCFGEdges` stamps the count
+  on every JUMPDEST and skips the `D * J` edge-add loop; reachability
+  stitch in `buildGasChunksSPP` seeds every JUMPDEST as a root after
+  `computeReachable`.
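
The two mechanisms listed under "Changed files" can be pictured with a small, self-contained sketch. Everything here is an assumption made for illustration: `SketchBlock`, `stampImplicitPreds`, `effectivePredCountSketch`, `markReachableFrom`, and `reachableWithStitch` are hypothetical names, block 0 is assumed to be the entry block, and the real code in `src/evm/evm_cache.cpp` may be organized differently. The sketch only shows the shape of the idea: stamp one count per JUMPDEST instead of adding `D * J` edges, then treat every JUMPDEST as an extra reachability root.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified block; the real GasBlock has more fields.
struct SketchBlock {
  uint32_t Start = 0;
  bool IsJumpDest = false;
  std::vector<uint32_t> Succs;
  std::vector<uint32_t> Preds;
  uint32_t ImplicitDynamicPredCount = 0;
};

// One stamp per JUMPDEST replaces D explicit incoming edges.
static void stampImplicitPreds(std::vector<SketchBlock> &Blocks,
                               uint32_t DynamicJumpCount) {
  for (SketchBlock &B : Blocks)
    if (B.IsJumpDest)
      B.ImplicitDynamicPredCount = DynamicJumpCount;
}

// Mirrors the effectivePredCount shown in the Design section.
static size_t effectivePredCountSketch(const SketchBlock &B) {
  size_t Count = B.Preds.size();
  if (B.Start == 0)
    ++Count;
  Count += B.ImplicitDynamicPredCount;
  return Count;
}

// Forward-propagates reachability from Root over Succs (iterative DFS).
static void markReachableFrom(const std::vector<SketchBlock> &Blocks,
                              uint32_t Root,
                              std::vector<uint8_t> &Reachable) {
  if (Reachable[Root])
    return;
  std::vector<uint32_t> Work{Root};
  Reachable[Root] = 1;
  while (!Work.empty()) {
    uint32_t Id = Work.back();
    Work.pop_back();
    for (uint32_t S : Blocks[Id].Succs)
      if (!Reachable[S]) {
        Reachable[S] = 1;
        Work.push_back(S);
      }
  }
}

// Entry-rooted reachability, then the stitch: each JUMPDEST is a root.
static std::vector<uint8_t>
reachableWithStitch(const std::vector<SketchBlock> &Blocks) {
  std::vector<uint8_t> Reachable(Blocks.size(), 0);
  if (Blocks.empty())
    return Reachable;
  markReachableFrom(Blocks, 0, Reachable); // assumed entry block
  for (uint32_t Id = 0; Id < Blocks.size(); ++Id)
    if (Blocks[Id].IsJumpDest)
      markReachableFrom(Blocks, Id, Reachable);
  return Reachable;
}
```

In this sketch a JUMPDEST with no static predecessor still ends up reachable, together with its static successor chain, which is the gap the stitch is described as closing.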
+
+### Performance — full PR #446 (with this optimization) vs `upstream/main`
+
+After rebasing `feat/gas-check-placement` onto current `upstream/main`
+(which now includes #458/#460/#482/#483 upstream perf work), the
+end-to-end picture on the same 27-bench paper filter is essentially
+flat:
+
+- **27-bench 10-rep geomean: +1.15%** (treatment slower).
+- 0 benches above the ±25% CI gate.
+- **Caveat — single-session sequential 10-rep is noisy**: a focused 20-rep
+  re-measurement on the four largest 10-rep movers showed they collapse
+  to evmone-bench's inter-binary drift band:
+
+  | Bench | 10-rep Δ | 20-rep Δ (focused) |
+  |---|---|---|
+  | `main/weierstrudel/1` | +3.51% | +0.55% (treat CV 2.19%) |
+  | `main/blake2b_huff/8415nulls` | −6.30% | +1.55% (flipped) |
+  | `micro/loop_with_many_jumpdests/empty` | −4.84% | −0.55% |
+  | `main/blake2b_shifts/8415nulls` | +20.34% (CV 21.93%) | +0.25% (CV 2.09%) |
+
+- Three of the four 10-rep "regression" benches above the noise band —
+  `micro/memory_grow_mstore/{nogrow,by1}`, `micro/memory_grow_mload/nogrow`
+  — contain **zero JUMP / JUMPI / JUMPDEST opcodes**, so PR #446's CFG
+  changes cannot affect them by construction. Those deltas are pure
+  drift artifacts.
+
+The earlier −2.73% A-vs-PR-base geomean still holds — this change does
+improve over PR #446's pre-rebase head. But the cumulative PR #446
+benefit over current upstream/main has shrunk to within drift band on
+this 27-bench corpus: the intervening upstream perf commits absorbed
+the absolute speedup, and the residual per-bench deltas are not
+statistically distinguishable from inter-binary system drift.
+
+### A note on the SPP→JIT cost-flow mechanism
+
+PR #446 is the first time SPP-shifted gas costs reach the JIT in any
+version of DTVM. SPP redistributes cost between blocks but preserves
+total gas across any path.
+For contracts with many JUMPDESTs targeted
+by dynamic jumps, the lemma 6.14 multi-pred guard prevents shifts
+INTO those JUMPDESTs but allows shifts OUT, which can mildly inflate
+the chunk-start metering immediate at each JUMPDEST. This theoretical
+effect is not visible on the runtime side of the 27-bench
+corpus at current measurement precision (the focused 20-rep run on
+`main/weierstrudel/1` — the most dyn-dispatch-heavy bench — shows a
++0.55% delta, within CV). A future PR could gate `GasChunkCostSPP`
+to `nullptr` for JUMPDEST-density-heavy contracts if a measurable
+regression surfaces; nothing in the current corpus justifies the
+added gating logic.
+
+## Out of scope
+
+- The peripheral diagnostics about `GasChunkCostSPP` in clangd are
+  pre-existing for the PR #446 branch and unrelated to this change.
+- Re-introducing super-node / DynDispatch later — would require
+  rewriting `computeDominators` and `buildLoopsUsingDominance` to
+  treat dispatch hubs structurally, which is invasive and gives no
+  measurable benefit over the implicit-count representation.
diff --git a/docs/changes/2026-05-11-spp-cfg-implicit-dyn-pred/review-fixes-r2.md b/docs/changes/2026-05-11-spp-cfg-implicit-dyn-pred/review-fixes-r2.md
new file mode 100644
index 000000000..8fdfe9644
--- /dev/null
+++ b/docs/changes/2026-05-11-spp-cfg-implicit-dyn-pred/review-fixes-r2.md
@@ -0,0 +1,262 @@
+# PR #446 Round-2 Review Response Plan
+
+- **Status**: Revised after round-1 review (Opus + Codex)
+- **Date**: 2026-05-12
+- **Parent change**: `README.md` (SPP CFG implicit-dyn-pred Phase 7)
+- **Branch**: `feat/gas-check-placement`
+
+This plan addresses the 2026-05-12 self-review of post-rebase PR #446,
+revised after round-1 dual-reviewer feedback. Round-1 surfaced four
+substantive corrections: R1.1 is not directly observable, R1.2 needs a
+stronger oracle, R2 is a real semantic change (not a perf guard), and
+R3's target should be the stale comment at evm_cache.cpp:1054-1059.
+ +## Blocking before merge + +### R1 — Targeted cache-builder unit tests for Phase 7 invariants + +**Symptom**: Phase 7 introduces two new mechanisms — `ImplicitDynamicPredCount` +folded into `effectivePredCount`, and a reachability stitch that seeds every +JUMPDEST as a BFS root after `computeReachable`. No test in `src/tests/` +exercises either directly. The 223+215+2723 corpus pass empirically but +won't isolate a regression in the stitch or implicit-pred logic. + +**Observability constraint** (from round-1 review): `struct GasBlock` and +`ImplicitDynamicPredCount` are file-static in `src/evm/evm_cache.cpp` (line +~197). Only `EVMBytecodeCache` arrays are exposed via `evm_cache.h`. +**`GasChunkCostSPP[i] != 0` is not a valid oracle for "block was reached"** +— `buildGasChunksSPP` writes every non-empty block's `Metering[Id]` into +`GasChunkCostSPP` regardless of whether SPP analysis reached it +(evm_cache.cpp:1207-1219). The only valid oracle is the **specific shifted +value at PC**: when SPP analysis ran on a block, the shifted value differs +from the unshifted base cost in a deterministic, hand-computable way. + +**Fix**: add a new test executable `evmCacheTests` to `src/tests/CMakeLists.txt` +that includes `evm/evm_cache.h` directly and drives `buildBytecodeCache`. +Use raw-hex bytecode fixtures. + +Three cases: + +1. **`Stitch_Reaches_DynOnly_JumpDest_Affects_SPP`** + Fixture: a contract where one JUMPDEST `A` has NO static predecessor + (only reachable via a dynamic JUMP elsewhere). `A` has a successor `S` + with `effectivePredCount(S) == 1` and a non-terminator cost that + lemma 6.14 would shift back into `A`. + + **Oracle caveat (round-2 review)**: `computeReverseTopo` + (evm_cache.cpp:697-735) iterates every block without filtering by + `Reachable[]`, so the negative-control claim "without stitch, S not + in RevTopo" would be wrong. 
What actually changes when the stitch + fires is `computeDominators` input (Reachable-gated at + evm_cache.cpp:630-633): with stitch, A's dom-set is computed against + a live forward CFG; without stitch, A is self-dom. + `findBackEdgesUsingDominators` and the loop-aware shift path then + diverge. + + Oracle: build the cache twice — once with the stitch live (current + code) and once with a test-local stitch-off path that no-ops the + seed loop. Assert `GasChunkCostSPP[A.Start]` differs between the two. + This is a "stitch toggles observable behavior" assertion; it does NOT + require hand-computing the exact shifted value, but does require a + stitch-off variant accessible to the test (a `#ifdef + DTVM_TEST_STITCH_OFF` block, or duplicate the build path in the test + TU with the seed loop disabled). If the toggle mechanic proves too + invasive, skip case 1 and rely on cases 2 and 3 below. + +2. **`No_Shift_Into_Implicit_MultiPred_JumpDest`** + Fixture: a JUMPDEST `B` with exactly 1 explicit static predecessor AND + ≥ 1 implicit dyn-jump source elsewhere in the contract. + - lemma 6.14 INTO `B`: `effectivePredCount(B) = 1 + DynamicJumpCount ≥ 2`, + should refuse to shift cost from B's predecessor INTO B. + - Assertion: `GasChunkCostSPP[predOf_B.Start]` is NOT modified by a + shift that would have moved cost into `B`. Concretely, the shifted + value at the predecessor should not include any contribution from + `B`'s base cost. + +3. **`Shift_OUT_From_MultiPred_JumpDest_Still_Works`** (added per round-1 + reviewer note) + Fixture: a JUMPDEST `M` that has multiple implicit dyn-pred (so + `effectivePredCount(M) > 1`, no shift INTO M), but has at least one + successor `T` with `effectivePredCount(T) == 1`. + - lemma 6.14 looks at M's successors (evm_cache.cpp:960-972). The + check is on `effectivePredCount(Blocks[Succ])`, NOT on M itself. So + shifting cost from `T` back into `M` IS still allowed. + - Assertion: `GasChunkCostSPP[M.Start]` reflects the shift FROM T, i.e. 
+ is greater than `GasChunkCost[M.Start]` (M's unshifted base cost). + +**Verification**: +- New test target builds and links cleanly. +- All three cases pass; explicitly disabling the stitch (debug experiment) + must make case 1 fail (oracle is meaningful). +- `tools/format.sh check` clean. +- Existing 223/215/2723 corpus unaffected. + +**Out of scope**: bytecode fuzzing. Targeted hand-crafted fixtures only. + +### R2 — Restrict stitch BFS seeding to dyn-target JUMPDESTs only + +**Re-framed per round-1 review**: this is a **semantic change**, not a +perf guard. The current stitch (evm_cache.cpp:1066-1092) seeds every +JUMPDEST as a BFS root, including: + +1. JUMPDESTs in no-dyn-jump contracts that are statically dead (no pred). +2. JUMPDESTs in mixed contracts (dyn + static) that have no static or + implicit-dyn predecessor — i.e. genuinely-dead JUMPDESTs that no jump + targets at all. + +Pre-Phase-7, both classes were unreachable in `Reachable[]` and therefore +ignored by `computeDominators` / `lemma614Update`. Post-Phase-7, both +classes are now in `Reachable[]`, their dom-tree positions get computed +(evm_cache.cpp:630-657), they enter `RevTopo`, and `lemma614Update` is +called on them (evm_cache.cpp:1127-1132 has no `Reachable[]` gate). So +their loop / backedge / SPP decisions are now potentially different. + +**Why this blocks**: silent semantic change on a class of contracts the +post-rebase 27-bench corpus doesn't isolate. The behavior change is +benign in most cases (dead JUMPDESTs have no out-flow, so no cost shifts +through them), but it widens the dom/loop analysis input set in ways the +review can't fully predict. + +**Fix**: change the stitch seed set from "all JUMPDESTs" to "only JUMPDESTs +with `ImplicitDynamicPredCount > 0`". Implementation: inside the stitch +loop (currently evm_cache.cpp:1076-1080), gate the `if (Reachable[JdId] == 0)` +seed with `if (Blocks[JdId].ImplicitDynamicPredCount > 0)`. 
This restores +pre-Phase-7 behavior on truly-dead JUMPDESTs while still rescuing real +dyn-targets. + +**Verification**: +- `Reachable[]` is internal to `buildGasChunksSPP`; the public header + only exposes `GasChunkCost{,SPP}`, `JumpDestMap`, `PushValueMap`, + `GasChunkEnd` (evm_cache.h:18-36). So the test asserts on cache state + delta, not on `Reachable[]` directly. +- Fixture: contract with no dyn-jumps + one statically-dead JUMPDEST + `D`. With R2's gate, `D.ImplicitDynamicPredCount == 0`, the stitch + skips it, and `D`'s `Metering[]` value remains its unshifted base + cost (no `lemma614Update` call considers shifting into `D` because + no block has `D` in its Succs). Assertion: + `GasChunkCostSPP[D.Start] == GasChunkCost[D.Start]` (no shift). + Without the gate (regression case), `D` is in `Reachable[]`, + `computeDominators` may treat its position differently, and a + shift may alter `GasChunkCostSPP[D.Start]`. The before/after is the + observable delta. Implement as a unit test in `evmCacheTests`. +- Existing tests pass. + +**Out of scope**: revisiting whether the dom/loop analyses should run on +unreachable nodes at all. The conservative move here is to preserve +pre-Phase-7 behavior on the dead-island class. + +### R3 — Fix the stale CFG comment block at evm_cache.cpp:1054-1059 + +**Symptom**: the comment block above the `buildCFGEdges` call site at +`evm_cache.cpp:1054-1059` reads: + +``` +// Build CFG with over-approximation for all unresolved dynamic jumps. +// Static jumps (PUSH → JUMP) get precise single-target edges; dynamic +// jumps get edges to every JUMPDEST. This is intentionally conservative — +// ... +``` + +The text "dynamic jumps get edges to every JUMPDEST" is **wrong** post- +Phase-7. Inside `buildCFGEdges` (evm_cache.cpp:446-447) the new behavior +is explicitly "No explicit Succs/Preds edges added" for dyn jumps. A +future contributor reading the call-site comment will be misled. 
+ +**Why this blocks**: stale documentation lures contributors into +re-introducing the D × J explicit edges (undoing Phase 7) "to match the +documented behavior". + +**Fix**: replace the call-site comment block (1054-1059) with one that +matches the new implementation. Suggested text: + +``` +// Build CFG. Static jumps (PUSH → JUMP) get precise single-target edges. +// For unresolved dynamic jumps the CFG is kept sound by stamping each +// JUMPDEST with ImplicitDynamicPredCount instead of materialising the +// D × |JUMPDEST| explicit edges — that count is folded into +// `effectivePredCount`, so `lemma614Update`'s "shift only into +// single-effective-pred successors" check behaves identically to the +// old explicit-edge representation. The `splitCriticalEdges` pass below +// operates on explicit Succs/Preds and therefore never sees dyn-jump → +// JUMPDEST edges; that is intentional because the multi-predecessor +// guard in `lemma614Update` (with implicit count folded INTO +// effectivePredCount) blocks shifts whenever effective preds > 1. +``` + +**Wording rationale (round-2 review note)**: an earlier draft said "any +`ImplicitDynamicPredCount > 0` rejects shifts INTO". That is wrong when +`ImplicitDynamicPredCount == 1` and the JUMPDEST has no explicit static +pred — `effectivePredCount` would be 1 and the guard would NOT fire. In +practice that case is moot (no block has the JUMPDEST in its Succs when +all entries are dyn, so no `lemma614Update` call considers shifting +into it), but the comment must phrase the invariant in terms of +`effectivePredCount > 1` to be technically correct. + +**Verification**: comment correct vs implementation. No code change. + +## Non-blocking nice-to-have + +### R5 — Soften the `loop_full_of_jumpdests` compile-time claim + +**Symptom**: `docs/changes/2026-05-11-spp-cfg-implicit-dyn-pred/README.md` +claims "7.3s → 3.3s" without noting that this is a local single-machine +measurement, not regression-protected by CI. 
+ +**Fix**: update the README phrasing to "7.3s → 3.3s on a local +single-machine run; not currently tracked in CI". Defer adding a +compile-time bench lane to a separate PR. + +### R6 — Optional paranoid assert in implicit-pred stamp loop + +**Symptom**: `buildCFGEdges` stamps `ImplicitDynamicPredCount` on every +block ID in `JumpDestBlocks` without verifying each ID is actually a +JUMPDEST opcode. + +**Fix (if cheap)**: add `ZEN_ASSERT(Blocks[JdId].LastOpcode == evmc::OP_JUMPDEST)` +before the stamp. Skip if `ZEN_ASSERT` is not available in this TU +without dragging in extra includes. + +## Dropped + +### R4 — Document duplicated `isGasChunkTerminator` check — **dropped** + +Round-1 review: the comment block above `effectivePredCount` (~line +930-937) already documents the multi-pred guarantee, and the +`MinSucc = 0` rationale is already commented at evm_cache.cpp:963-967. +Adding another comment is noise per `.claude/rules/cpp-code-style.md` +("Only include essential comments"). + +## Execution order + +1. **R3** (comment-only) — lowest risk, no code behavior change. Land + first so any subsequent diff stays small. +2. **R2** (stitch-gate) — code change. Verify via the R2 fixture that + statically-dead JUMPDESTs are no longer seeded. `Reachable[]` is + internal, so the observable is + `GasChunkCostSPP[D.Start] == GasChunkCost[D.Start]`. +3. **R1** (3 unit tests). Build `evmCacheTests` and ensure all three + cases pass against the post-R2 implementation. +4. **R5** (doc softening) — one-line phrase change. +5. **R6** (assert) — optional, decide at execution time based on header + reach. + +After each step: `tools/format.sh check`, build target, run unit tests. + +## Verification gate before commit + +- New `evmCacheTests` target builds and all 3 cases pass. +- `tools/format.sh check` clean. +- `cmake --build build --target dtvmapi -j$(nproc)` clean (no new warnings). +- `evmone-unittests` multipass: 223/223 pass. +- `evmone-statetest --fork Cancun` multipass: smoke run. + +## Risks + +- **R1 fixture authoring** is the largest unknown.
Hand-computing the + expected shifted SPP value requires careful bytecode design. If + difficulty exceeds budget, fall back to a single "stitch toggles + observable behavior" assertion (case 1 only). +- **R2 semantic change** may surface in the existing 2723 statetest + corpus. If so, this becomes a 3-way decision: revert R2, narrow the + guard further, or accept the semantic broadening. Run statetest after + R2 lands. +- **R3, R5, R6** carry no runtime risk. + diff --git a/docs/changes/2026-05-11-spp-cfg-implicit-dyn-pred/scaling_demo.sh b/docs/changes/2026-05-11-spp-cfg-implicit-dyn-pred/scaling_demo.sh new file mode 100755 index 000000000..64fe57022 --- /dev/null +++ b/docs/changes/2026-05-11-spp-cfg-implicit-dyn-pred/scaling_demo.sh @@ -0,0 +1,18 @@ +#!/usr/bin/env bash +# Sweep buildBytecodeCache wall-clock across N JUMPDESTs. Background and +# numbers live in README.md alongside this script. +# Prereq: cmake --build build --target evmCacheComplexityDemo + +set -euo pipefail + +DEMO=${EVMCACHE_DEMO:-build/evmCacheComplexityDemo} +if [[ ! -x "$DEMO" ]]; then + echo "demo binary not found at $DEMO" >&2 + echo "build it with: cmake --build build --target evmCacheComplexityDemo" >&2 + exit 1 +fi + +echo "n_jumpdests,build_ms" +for N in 100 500 1000 2000 5000 10000 20000; do + "$DEMO" "$N" +done diff --git a/docs/design/evm-gas-mechanism.md b/docs/design/evm-gas-mechanism.md new file mode 100644 index 000000000..9f4f23694 --- /dev/null +++ b/docs/design/evm-gas-mechanism.md @@ -0,0 +1,341 @@ +# EVM Gas Mechanism (Interpreter and JIT) + +This document describes how DTVM accounts for EVM gas in both the +interpreter and the multipass JIT, and how the SPP (Structured +Precharging Pass) shifts charges along the control-flow graph for the +JIT consumer while keeping the interpreter's per-block totals +unchanged. + +## Goals + +- Charge each EVM execution path the exact gas the spec requires. +- Detect Out-Of-Gas (OOG) before any state change occurs. 
+- Amortize the per-opcode "is there enough gas?" check across + straight-line code so the hot path reduces to one comparison per + basic block (interpreter) or per chunk start (JIT). + +## Shared data: the bytecode cache + +Both execution engines read from a single +`zen::evm::EVMBytecodeCache` (`src/evm/evm_cache.h`). The cache is +built lazily on first access — `EVMModule::initBytecodeCache` is +defined at `src/runtime/evm_module.cpp:133-136`; the SPP-gating site +that flips `CacheNeedsSPP` lives at `src/runtime/evm_module.cpp:117`. +The cache exposes five parallel arrays indexed by program counter +(PC): + +| Field | Indexed by | Meaning | +| ---------------- | ---------- | --------------------------------------------------------------------------------------- | +| `JumpDestMap` | PC | 1 if a `JUMPDEST` opcode begins at this PC, else 0. | +| `PushValueMap` | PC | The 256-bit immediate decoded from a `PUSH*` at this PC (otherwise 0). | +| `GasChunkEnd` | chunk-start PC | Exclusive end PC of the gas chunk that starts here. Zero for non-chunk-start PCs. | +| `GasChunkCost` | chunk-start PC | **Unshifted** sum of opcode gas costs in the chunk (interpreter consumer). | +| `GasChunkCostSPP`| chunk-start PC | **SPP-shifted** chunk cost (JIT consumer). Empty when SPP is disabled for this module. | + +A "gas chunk" is a maximal straight-line region whose static gas +cost can be summed once at chunk construction. It ends at any +**gas-chunk terminator** (`isGasChunkTerminator` at +`src/evm/evm_cache.cpp:41-62`): the control-flow exits +`STOP`/`RETURN`/`REVERT`/`SELFDESTRUCT`/`INVALID`/`JUMP`/`JUMPI` and +the gas-sensitive opcodes +`SSTORE`/`CALL`/`CALLCODE`/`DELEGATECALL`/`STATICCALL`/`CREATE`/ +`CREATE2`/`GAS`. The terminator is **inside** its chunk — its static +cost is included in `GasChunkCost` (`evm_cache.cpp:329`) and a fresh +chunk starts at the *next* PC (`evm_cache.cpp:291-296`). 
A chunk also +ends just before a `JUMPDEST` (since `JUMPDEST` itself begins a new +chunk) and at the end of the bytecode. + +```mermaid +flowchart LR + Bytecode["EVM bytecode"] + JD[JumpDestMap] + PV[PushValueMap] + CE[GasChunkEnd] + CC["GasChunkCost
(unshifted)"] + SPP["GasChunkCostSPP
(shifted, optional)"] + + Bytecode --> Builder["buildBytecodeCache
(src/evm/evm_cache.cpp)"] + Builder --> JD + Builder --> PV + Builder --> CE + Builder --> CC + Builder -. "EnableSPP=true" .-> SPP + + JD --> Interpreter + PV --> Interpreter + CE --> Interpreter + CC --> Interpreter + + JD --> JIT["Multipass JIT
(EVMMirBuilder)"] + PV --> JIT + CE --> JIT + CC --> JIT + SPP --> JIT +``` + +The two consumers read disjoint chunk-cost arrays so neither +perturbs the other. Concretely: + +- The interpreter reads only `GasChunkCost` (`src/evm/interpreter.cpp:382`). +- The JIT prefers `GasChunkCostSPP` when non-null and falls back to + `GasChunkCost` otherwise (`src/compiler/evm_frontend/evm_mir_compiler.cpp:534, 578, 1315`). + +## Interpreter mode + +The interpreter runs the dispatch loop in +`BaseInterpreter::interpret` (`src/evm/interpreter.cpp:362`). Each +outer iteration starts at `Frame->Pc` and tries the **chunk fast +path** first: + +```mermaid +flowchart TD + Start(["outer iter
Pc = ChunkStartPc"]) --> Cond{"GasChunkEnd[Pc] > Pc
AND
gas >= GasChunkCost[Pc]?"} + Cond -- "no (either side)" --> Slow["Per-opcode dispatch
(switch/handler call,
line 1610+)
handler invokes chargeGas"] + Slow --> SlowOOG{"chargeGas:
gas < opcode cost?"} + SlowOOG -- "yes" --> OOG["setStatus(EVMC_OUT_OF_GAS)
break"] + SlowOOG -- "no" --> Pcpp["Frame->Pc++"] + Pcpp --> Start + + Cond -- "yes" --> Pre["Frame->Msg.gas -= GasChunkCost[Pc]
(pre-charge entire chunk)"] + Pre --> CG["Computed-goto fast path
until Pc >= ChunkEnd"] + CG --> Restart{"control-flow
opcode hit?"} + Restart -- "no" --> Start + Restart -- "yes (JUMP/JUMPI/...)
update Pc, restart" --> Start +``` + +Key properties: + +- Inside a chunk, **no gas check happens per opcode** — the chunk's + total has already been deducted at the chunk start. The + computed-goto loop simply executes opcodes, advances `Pc`, and + checks `Pc >= ChunkEnd` to exit + (`DISPATCH_NEXT` macro at `src/evm/interpreter.cpp:525`). +- Opcodes whose behaviour depends on `gas_left` at runtime + (`SSTORE`, `CALL*`, `CREATE*`, `GAS`) are gas-chunk terminators — + each is the **last** opcode of its chunk, so the chunk's static + pre-charge has been applied before the handler runs and any + dynamic delta the handler charges (via `chargeGas` at + `src/evm/interpreter.cpp:33-50`) is layered on top of an accurate + `gas_left` value. +- Memory expansion is **not** a chunk boundary: opcodes that touch + memory (`MLOAD`, `MSTORE`, `MSTORE8`, `KECCAK256`, the various + `*COPY` opcodes, `RETURN`, `REVERT`, …) charge their dynamic + expansion delta inline by calling `expandMemoryAndChargeGas` + (`src/evm/opcode_handlers.cpp:261`) from within the handler. +- The interpreter intentionally consumes the **unshifted** cost + (PR #371). The cache must keep an unshifted column available + regardless of whether SPP runs. + +## Multipass JIT mode + +The JIT lowers EVM bytecode to dMIR via `EVMMirBuilder` +(`src/compiler/evm_frontend/evm_mir_compiler.{h,cpp}`). Gas accounting +is woven into MIR generation by two helpers: + +- `meterOpcode(Opcode, PC)` — emit the gas check for one opcode at + `PC` (`src/compiler/evm_frontend/evm_mir_compiler.cpp:524`). +- `meterOpcodeRange(StartPC, EndPCExclusive)` — emit the gas check + for a contiguous PC range, used by the JUMPDEST run optimization + (`src/compiler/evm_frontend/evm_mir_compiler.cpp:544`). 
+ +Both ultimately call `meterGas(Cost)` to emit the actual dMIR +sequence (`src/compiler/evm_frontend/evm_mir_compiler.cpp:607`, +short-circuits when `GasCost == 0` at line 608): + +```mermaid +flowchart TD + A["meterOpcode(Op, PC)"] --> B{"GasMeteringEnabled?"} + B -- "no" --> X(["return (no MIR)"]) + B -- "yes" --> Cache{"Chunk cache populated?
(GasChunkEnd && GasChunkCost
&& PC < GasChunkSize)"} + Cache -- "no (cache absent)" --> PerOp["Cost = InstructionMetrics[Op].gas_cost
meterGas(Cost)"] + Cache -- "yes" --> ChunkStart{"GasChunkEnd[PC] > PC?
(this PC is a chunk start)"} + ChunkStart -- "no (mid-chunk PC)" --> Skip(["return (no MIR;
chunk start already paid)"]) + ChunkStart -- "yes" --> Sel["Cost = GasChunkCostSPP[PC]
?? GasChunkCost[PC]
meterGas(Cost)"] + PerOp --> Emit + Sel --> Emit + + Emit["meterGas(Cost) emits dMIR:
CurrentGas = load gas
IsOutOfGas = (CurrentGas < Cost)
brif IsOutOfGas, OOGBlock, ContinueBlock
NewGas = CurrentGas - Cost
store NewGas"] + Emit --> Cont(["fall through to opcode lowering"]) +``` + +Two consequences: + +1. The JIT emits **at most one gas check per chunk** — the call at + the chunk-start opcode covers every opcode up to (but not + including) the next chunk start. Calls at mid-chunk PCs see + `GasChunkEnd[PC] == 0` and return without emitting any MIR + (`evm_mir_compiler.cpp:529, 537`). The fast path at line 553-572 + in `meterOpcodeRange` consumes a precomputed + `JumpDestRunLastPC`/`JumpDestRunSkipCost` table; the table itself + is populated when the JUMPDEST run jump-table is materialized + (`evm_mir_compiler.cpp:1297-1335`), so dispatching across a run + of consecutive `JUMPDEST`s costs one `meterGas` call. +2. The OOG branch is shared across all gas checks in the function + via `getOrCreateExceptionSetBB(ErrorCode::GasLimitExceeded)`, + keeping the cold path out of the hot block layout + (`evm_mir_compiler.cpp:626, 663`). + +When the build is configured with `ZEN_ENABLE_EVM_GAS_REGISTER`, the +gas value lives in a virtual register (`GasRegVar`, +`evm_mir_compiler.cpp:614-642`) instead of being reloaded from +memory on every `meterGas`. Synchronization back to `EVMInstance` +happens at any host-call boundary that may read or update gas — +not just `CALL*`/`CREATE*`/return, but also runtime helpers such as +the balance/code/keccak/memory-load handlers (`syncGasToMemory` +calls at `evm_mir_compiler.cpp:3556, 3623, 3638, 3652, 3745, 3776, +3857, 3976, 4054, 4136`; `syncGasToMemoryFull` is invoked at module +return / `RETURN` / `REVERT` / `STOP` / +`SELFDESTRUCT` paths around lines 1246, 4167-4259). + +## SPP cost shifting + +The Structured Precharging Pass — implemented as `lemma614Update` in +`src/evm/evm_cache.cpp:919` — moves gas costs **backwards** along the +CFG. 
For each non-cycle node, it charges the minimum successor cost +upfront, so the consumer only pays the residual at runtime: + +``` + Block A (cost = 3) + / \ + Block B (5) Block C (7) + +After SPP (min successor = 5 charged at A): + + Block A' (cost = 3 + 5 = 8) + / \ + Block B' (0) Block C' (2) +``` + +(The diagram assumes B and C each have only A as predecessor and +neither ends with a gas-chunk terminator — `lemma614Update` only +shifts when those preconditions hold; see the +`effectivePredCount == 1` and `isGasChunkTerminator` guards at +`evm_cache.cpp:940, 944, 966`.) + +Per-path totals are preserved: A→B is `3+5 = 8` before and +`8+0 = 8` after; A→C is `3+7 = 10` before and `8+2 = 10` after. The +benefit is that B's chunk now starts with cost zero, which lets +`meterGas` short-circuit and emit no dMIR at all +(`evm_mir_compiler.cpp:608`), and C's chunk only needs to charge the +residual `2`. The JIT therefore emits fewer non-trivial gas checks +on the hot path and shrinks the OOG fan-out. + +Soundness on cycles: the shift never crosses back-edges or +gas-chunk terminators (`SSTORE`/`CALL*`/`CREATE*`/`GAS`), so dynamic +gas is always charged at the correct point +(`evm_cache.cpp:421-427, 919-960`). + +### Why a separate `GasChunkCostSPP` array + +The interpreter's chunk fast path was specified against the +**unshifted** per-block cost in PR #371 and the cache must continue +to honour that contract. To enable SPP for the JIT without +disturbing the interpreter, the cache exposes two parallel arrays: + +- `GasChunkCost` — unshifted, written from `Blocks[Id].Cost` + (`evm_cache.cpp:1161`), consumed by the interpreter. +- `GasChunkCostSPP` — shifted, written from the metering function + `Metering[Id]` (`evm_cache.cpp:1165`), consumed by the JIT. 
+ +The shifted variant is sound for the JIT because SPP refuses to +shift cost across **gas-sensitive terminators**: `SSTORE`, `GAS`, +`CALL*`, `CREATE*` (`isGasSensitiveTerminator` and `isGasChunkTerminator` +checks at `evm_cache.cpp:944, 966`). Each of these opcodes ends its +own chunk, so by the time it executes the chunk's cost — shifted or +not — has already been deducted at the chunk-start `meterGas`, and +the value the opcode reads (e.g. `GAS`) reflects the spec-mandated +remaining gas. Cost from the *successor* chunk never leaks back +across the terminator. + +### Mixed-precision CFG + +The SPP pass needs a sound CFG to compute "minimum successor cost" +correctly: + +```mermaid +flowchart LR + subgraph Static[Static jump] + P1[PUSH dest_pc] --> J1[JUMP] + J1 -. resolved .-> D1[JUMPDEST at dest_pc] + end + + subgraph Dynamic[Dynamic jump] + X[stack-derived target] --> J2[JUMP] + J2 -. over-approx .-> D2[every JUMPDEST] + end +``` + +- `PUSH n; JUMP` resolves to a single edge to `JUMPDEST` at PC `n` + (`resolveConstantJumpTarget` in `evm_cache.cpp`). +- Every other (unresolved) `JUMP` is treated as a potential predecessor + of **all** `JUMPDEST` blocks. `buildCFGEdges` records this as a + per-`JUMPDEST` `ImplicitDynamicPredCount` that `effectivePredCount` + folds in, rather than materialising explicit edges + (`evm_cache.cpp:386-429`). + +Narrowing the dynamic-jump targets using partial call-site information +would under-approximate the CFG: SPP could then move a block's charge +into a predecessor that a runtime jump bypasses, which breaks the +per-path total invariant. +The over-approximation is intentional and documented inline +(`evm_cache.cpp:419-427`). + +## Pipeline gating + +The SPP CFG construction and shifting pass is significant +compile-time work and is only useful for the JIT consumer.
Interpreter-only +modules skip it via `EVMModule::CacheNeedsSPP`: + +```mermaid +sequenceDiagram + participant Loader as EVMModule::create + participant Mod as EVMModule + participant Cache as EVMBytecodeCache + participant JIT as performEVMJITCompile + + Loader->>Mod: construct (CacheNeedsSPP=false) + alt RunMode != InterpMode + Loader->>Mod: EVMAnalyzer.analyze() + alt JIT-suitable + Loader->>Mod: CacheNeedsSPP = true + Loader->>JIT: performEVMJITCompile(Mod) + JIT->>Cache: getBytecodeCache() + Cache->>Cache: buildBytecodeCache(EnableSPP=true) + Note right of Cache: builds CFG, runs SPP,
fills GasChunkCostSPP + Cache-->>JIT: cache (with SPP) + end + end + + Note over Loader,Cache: First call to interpreter only: + Loader->>Cache: getBytecodeCache() + Cache->>Cache: buildBytecodeCache(EnableSPP=false) + Note right of Cache: skips CFG/SPP,
GasChunkCostSPP stays empty +``` + +`evm_compiler.cpp` passes `nullptr` for the SPP pointer when the +array is empty +(`src/compiler/evm_compiler.cpp:70-74`), so a JIT compilation that +runs without SPP (e.g. JIT bypass paths) cleanly falls back to the +unshifted array and remains correct. + +## Failure mode summary + +| Trigger | Where | Result | +| ------------------------------------------------------------- | ------------------------------------------------------------------ | -------------------------------------------- | +| Interpreter, gas insufficient for full chunk pre-charge | Combined check at `interpreter.cpp:397-398` | Skip fast path; fall through to slow path | +| Interpreter slow path, gas < per-opcode cost | `chargeGas` at `interpreter.cpp:33-50` | `setStatus(EVMC_OUT_OF_GAS)`, exit outer loop | +| JIT chunk-start `meterGas`, gas < `Cost` | `meterGas` `IsOutOfGas` branch (`evm_mir_compiler.cpp:622-631`) | Branch to shared `OutOfGasBB` | +| JIT per-opcode `meterGas` (chunk cache absent), gas < `Cost` | Same code path, just smaller `Cost` | Same shared `OutOfGasBB` | +| Dynamic-cost opcode (`SSTORE`/`CALL*`/`CREATE*`) underpaid | Forced chunk boundary; charged by handler call | Returns OOG status to dispatcher | + +## References + +- `src/evm/evm_cache.{h,cpp}` — bytecode cache, CFG construction, + `buildGasChunksSPP`, `lemma614Update`. +- `src/evm/interpreter.cpp` — chunk fast path (line 395), per-opcode + `chargeGas` (line 33). +- `src/compiler/evm_frontend/evm_mir_compiler.{h,cpp}` — + `meterOpcode`, `meterOpcodeRange`, `meterGas`. +- `src/runtime/evm_module.{h,cpp}` — `CacheNeedsSPP` gating before + `performEVMJITCompile`. +- `src/compiler/evm_compiler.cpp:70-74` — JIT-side `nullptr` fallback + for empty `GasChunkCostSPP`. +- `docs/changes/2026-04-05-gas-check-placement/README.md` — design + notes and benchmark results for the mixed-CFG / dual-array split. +- `docs/modules/evm/spec.md` — module spec for the EVM bytecode cache.
diff --git a/src/compiler/evm_compiler.cpp b/src/compiler/evm_compiler.cpp index f7b908c7a..28ee695e5 100644 --- a/src/compiler/evm_compiler.cpp +++ b/src/compiler/evm_compiler.cpp @@ -69,8 +69,13 @@ void EagerEVMJITCompiler::compile() { Ctx.setMemoryLinearStrideSkipLeadingZeroLimbStores( EVMMod->getMemoryLinearStrideSkipLeadingZeroLimbStores()); const auto &Cache = EVMMod->getBytecodeCache(); + // GasChunkCostSPP is only allocated when the SPP metering pipeline runs + // (i.e. this module will be JIT-compiled). Pass nullptr when the array is + // empty so the JIT falls back to the unshifted GasChunkCost automatically. + const uint64_t *CostSPPPtr = + Cache.GasChunkCostSPP.empty() ? nullptr : Cache.GasChunkCostSPP.data(); Ctx.setGasChunkInfo(Cache.GasChunkEnd.data(), Cache.GasChunkCost.data(), - EVMMod->CodeSize); + CostSPPPtr, EVMMod->CodeSize); MModule Mod(Ctx); buildEVMFunction(Ctx, Mod, *EVMMod); diff --git a/src/compiler/evm_frontend/evm_mir_compiler.cpp b/src/compiler/evm_frontend/evm_mir_compiler.cpp index 3b04a5784..bbaa4a247 100644 --- a/src/compiler/evm_frontend/evm_mir_compiler.cpp +++ b/src/compiler/evm_frontend/evm_mir_compiler.cpp @@ -78,6 +78,7 @@ EVMFrontendContext::EVMFrontendContext(const EVMFrontendContext &OtherCtx) BytecodeSize(OtherCtx.BytecodeSize), GasMeteringEnabled(OtherCtx.GasMeteringEnabled), GasChunkEnd(OtherCtx.GasChunkEnd), GasChunkCost(OtherCtx.GasChunkCost), + GasChunkCostSPP(OtherCtx.GasChunkCostSPP), GasChunkSize(OtherCtx.GasChunkSize), Revision(OtherCtx.Revision), MemoryLinearStrideSkipLeadingZeroLimbStores( OtherCtx.MemoryLinearStrideSkipLeadingZeroLimbStores) @@ -393,6 +394,7 @@ void EVMMirBuilder::initEVM(CompilerContext *Context) { GasChunkEnd = EvmCtx->getGasChunkEnd(); GasChunkCost = EvmCtx->getGasChunkCost(); + GasChunkCostSPP = EvmCtx->getGasChunkCostSPP(); GasChunkSize = EvmCtx->getGasChunkSize(); #ifdef ZEN_ENABLE_EVM_GAS_REGISTER @@ -527,7 +529,12 @@ void EVMMirBuilder::meterOpcode(evmc_opcode Opcode, uint64_t PC) { } 
if (GasChunkEnd && GasChunkCost && PC < GasChunkSize) { if (GasChunkEnd[PC] > PC) { - meterGas(GasChunkCost[PC]); + // Prefer SPP-shifted cost when available — it preserves per-path totals + // while reducing the number of non-zero entries the JIT must emit a + // gas check for. + const uint64_t Cost = + GasChunkCostSPP ? GasChunkCostSPP[PC] : GasChunkCost[PC]; + meterGas(Cost); } return; } @@ -570,7 +577,7 @@ void EVMMirBuilder::meterOpcodeRange(uint64_t StartPC, uint64_t Cost = 0; if (GasChunkEnd && GasChunkCost && PC < GasChunkSize && GasChunkEnd[PC] > PC) { - Cost = GasChunkCost[PC]; + Cost = GasChunkCostSPP ? GasChunkCostSPP[PC] : GasChunkCost[PC]; } else { const uint8_t Opcode = static_cast<uint8_t>(Bytecode[PC]); Cost = static_cast<uint64_t>(InstructionMetrics[Opcode].gas_cost); @@ -1307,7 +1314,7 @@ void EVMMirBuilder::createJumpTable() { uint64_t Cost = 0; if (GasChunkEnd && GasChunkCost && Pc < GasChunkSize && GasChunkEnd[Pc] > Pc) { - Cost = GasChunkCost[Pc]; + Cost = GasChunkCostSPP ? GasChunkCostSPP[Pc] : GasChunkCost[Pc]; } else { // All bytes in the run are JUMPDEST opcode bytes (PUSH payload is // skipped in the scan above), so the fallback is a constant.
diff --git a/src/compiler/evm_frontend/evm_mir_compiler.h b/src/compiler/evm_frontend/evm_mir_compiler.h index 65f35c745..4eec866de 100644 --- a/src/compiler/evm_frontend/evm_mir_compiler.h +++ b/src/compiler/evm_frontend/evm_mir_compiler.h @@ -66,13 +66,15 @@ class EVMFrontendContext final : public CompileContext { bool isGasMeteringEnabled() const { return GasMeteringEnabled; } void setGasChunkInfo(const uint32_t *ChunkEnd, const uint64_t *ChunkCost, - size_t Size) { + const uint64_t *ChunkCostSPP, size_t Size) { GasChunkEnd = ChunkEnd; GasChunkCost = ChunkCost; + GasChunkCostSPP = ChunkCostSPP; GasChunkSize = Size; } const uint32_t *getGasChunkEnd() const { return GasChunkEnd; } const uint64_t *getGasChunkCost() const { return GasChunkCost; } + const uint64_t *getGasChunkCostSPP() const { return GasChunkCostSPP; } size_t getGasChunkSize() const { return GasChunkSize; } bool hasGasChunks() const { return GasChunkEnd && GasChunkCost && GasChunkSize > 0; @@ -98,6 +100,7 @@ class EVMFrontendContext final : public CompileContext { bool GasMeteringEnabled = false; const uint32_t *GasChunkEnd = nullptr; const uint64_t *GasChunkCost = nullptr; + const uint64_t *GasChunkCostSPP = nullptr; size_t GasChunkSize = 0; evmc_revision Revision = zen::evm::DEFAULT_REVISION; uint8_t MemoryLinearStrideSkipLeadingZeroLimbStores = 0; @@ -1275,6 +1278,7 @@ class EVMMirBuilder final { // Chunk gas metering const uint32_t *GasChunkEnd = nullptr; const uint64_t *GasChunkCost = nullptr; + const uint64_t *GasChunkCostSPP = nullptr; size_t GasChunkSize = 0; #ifdef ZEN_ENABLE_EVM_GAS_REGISTER diff --git a/src/evm/evm_cache.cpp b/src/evm/evm_cache.cpp index cc5a6208e..a2a4f4a19 100644 --- a/src/evm/evm_cache.cpp +++ b/src/evm/evm_cache.cpp @@ -197,6 +197,11 @@ struct GasBlock { uint64_t Cost = 0; std::vector Succs; std::vector Preds; + // Count of dynamic-jump blocks in this contract that could land here at + // runtime. 
Only nonzero for JUMPDEST blocks when the contract has at + // least one unresolved dynamic jump. Carried separately so we avoid + // materialising D*J explicit over-approximation edges (see buildCFGEdges). + uint32_t ImplicitDynamicPredCount = 0; }; static void addEdge(std::vector &Blocks, uint32_t From, uint32_t To) { @@ -338,6 +343,27 @@ static void buildGasBlocks(const zen::common::Byte *Code, size_t CodeSize, } } +// Decode a PUSH immediate at PushPc and validate it as a JUMPDEST address. +// Returns true and sets DestPc on success. +static bool decodePushAsJumpDest(const std::vector &PushValueMap, + const std::vector &JumpDestMap, + size_t CodeSize, uint32_t PushPc, + uint32_t &DestPc) { + const intx::uint256 Value = PushValueMap[PushPc]; + if ((Value >> 64) != 0) { + return false; + } + const uint64_t Dest = static_cast(Value); + if (Dest >= CodeSize) { + return false; + } + if (JumpDestMap[Dest] == 0) { + return false; + } + DestPc = static_cast(Dest); + return true; +} + static bool resolveConstantJumpTarget(const std::vector &JumpDestMap, const std::vector &PushMap, size_t CodeSize, const GasBlock &Block, @@ -354,22 +380,73 @@ static bool resolveConstantJumpTarget(const std::vector &JumpDestMap, return false; } - const intx::uint256 Value = PushMap[Block.PrevPc]; - if ((Value >> 64) != 0) { - return false; - } + return decodePushAsJumpDest(PushMap, JumpDestMap, CodeSize, Block.PrevPc, + DestPc); +} - const uint64_t Dest = static_cast(Value); - if (Dest >= CodeSize) { - return false; +// Build CFG edges for all blocks. Static jumps (PUSH → JUMP) get precise +// single-target edges. For each unresolved dynamic jump we DO NOT add the +// D*|JUMPDEST| explicit over-approximation edges (which previously made the +// pass quadratic-to-cubic in pathological contracts). 
Instead we record on +// every JUMPDEST how many dynamic-jump blocks could land there at runtime +// via `ImplicitDynamicPredCount`, and `effectivePredCount` folds that count +// into its multi-predecessor check. SPP decisions are identical: a JUMPDEST +// that is a potential dynamic-jump target sees `effectivePredCount > 1` and +// `lemma614Update` refuses to shift gas across that edge, exactly as it +// would have done against an explicit over-approximated `Preds` set. +static void buildCFGEdges(std::vector &Blocks, + const std::vector &BlockAtPc, + const std::vector &JumpDestMap, + const std::vector &PushValueMap, + const std::vector &JumpDestBlocks, + size_t CodeSize) { + // Count unresolved dynamic jumps once so we can stamp every JUMPDEST with + // the right implicit-predecessor count in O(N) instead of O(D*J). + uint32_t DynamicJumpCount = 0; + for (const auto &Block : Blocks) { + if (!isJumpOpcode(Block.LastOpcode)) { + continue; + } + uint32_t DestPc = 0; + if (!resolveConstantJumpTarget(JumpDestMap, PushValueMap, CodeSize, Block, + DestPc)) { + ++DynamicJumpCount; + } } - - if (JumpDestMap[Dest] == 0) { - return false; + if (DynamicJumpCount > 0) { + for (uint32_t JdId : JumpDestBlocks) { + Blocks[JdId].ImplicitDynamicPredCount = DynamicJumpCount; + } } - DestPc = static_cast(Dest); - return true; + for (size_t BlockId = 0; BlockId < Blocks.size(); ++BlockId) { + auto &Block = Blocks[BlockId]; + const bool IsTerminator = isControlFlowTerminator(Block.LastOpcode); + + // Add fallthrough edge for non-terminating opcodes (CALL/CREATE/GAS, + // JUMPI included via the generic !IsTerminator path). + if (!IsTerminator && Block.End < CodeSize) { + const uint32_t SuccId = BlockAtPc[Block.End]; + if (SuccId != UINT32_MAX) { + addEdge(Blocks, static_cast(BlockId), SuccId); + } + } + + // Add jump target edge(s). 
+ if (isJumpOpcode(Block.LastOpcode)) { + uint32_t DestPc = 0; + if (resolveConstantJumpTarget(JumpDestMap, PushValueMap, CodeSize, Block, + DestPc)) { + // Static (constant) jump: single known target. + const uint32_t SuccId = BlockAtPc[DestPc]; + if (SuccId != UINT32_MAX) { + addEdge(Blocks, static_cast(BlockId), SuccId); + } + } + // Dynamic jump: handled by the implicit-predecessor count stamped onto + // every JUMPDEST above. No explicit Succs/Preds edges added. + } + } } static size_t bitsetWordCount(size_t NumBits) { return (NumBits + 63) / 64; } @@ -852,11 +929,18 @@ static bool buildLoopsUsingDominance( // Effective predecessor count: the entry block (Start == 0) is always reachable // from the program start, adding an implicit path not represented in the CFG. +// +// Blocks with `ImplicitDynamicPredCount > 0` (every JUMPDEST in a contract +// that has at least one dynamic jump) carry the over-approximated dynamic +// predecessors as a count instead of explicit edges; folding them in here +// keeps `lemma614Update`'s "shift only into single-pred successors" check +// equivalent to the explicit over-approximation. static size_t effectivePredCount(const GasBlock &Block) { size_t Count = Block.Preds.size(); if (Block.Start == 0) { ++Count; } + Count += Block.ImplicitDynamicPredCount; return Count; } @@ -876,10 +960,17 @@ static bool lemma614Update(uint32_t NodeId, const std::vector &Blocks, continue; } if (AllowedMask && !bitsetTest(*AllowedMask, Succ)) { + // Non-back-edge successor excluded from shifting — its path would + // see the inflated parent cost without compensation. + MinSucc = 0; continue; } - // Only consider successors with exactly one effective predecessor. 
if (effectivePredCount(Blocks[Succ]) != 1) { + MinSucc = 0; + continue; + } + if (isGasChunkTerminator(Blocks[Succ].LastOpcode)) { + MinSucc = 0; continue; } MinSucc = std::min(MinSucc, Metering[Succ]); @@ -900,6 +991,9 @@ static bool lemma614Update(uint32_t NodeId, const std::vector &Blocks, if (effectivePredCount(Blocks[Succ]) != 1) { continue; } + if (isGasChunkTerminator(Blocks[Succ].LastOpcode)) { + continue; + } Metering[Succ] -= MinSucc; } @@ -911,7 +1005,9 @@ static bool buildGasChunksSPP(const zen::common::Byte *Code, size_t CodeSize, const std::vector &JumpDestMap, const std::vector &PushValueMap, std::vector &GasChunkEnd, - std::vector &GasChunkCost) { + std::vector &GasChunkCost, + std::vector &GasChunkCostSPP, + bool EnableSPP) { std::vector Blocks; std::vector BlockAtPc; buildGasBlocks(Code, CodeSize, MetricsTable, Blocks, BlockAtPc); @@ -920,19 +1016,11 @@ static bool buildGasChunksSPP(const zen::common::Byte *Code, size_t CodeSize, return true; } - bool HasDynamicJump = false; - for (const auto &Block : Blocks) { - if (!isJumpOpcode(Block.LastOpcode)) { - continue; - } - uint32_t DestPc = 0; - if (!resolveConstantJumpTarget(JumpDestMap, PushValueMap, CodeSize, Block, - DestPc)) { - HasDynamicJump = true; - break; - } - } - if (HasDynamicJump) { + if (!EnableSPP) { + // Interpreter-only fast path: emit unshifted per-block costs and skip + // the expensive CFG / call-site / metering pipeline. The JIT consumer + // path (which would read GasChunkCostSPP) is never wired up for this + // module, so no SPP-shifted values are needed. for (const auto &Block : Blocks) { if (Block.Start < CodeSize) { GasChunkEnd[Block.Start] = Block.End; @@ -942,6 +1030,9 @@ static bool buildGasChunksSPP(const zen::common::Byte *Code, size_t CodeSize, return true; } + // Always build CFG — no early exit for dynamic jumps. + // Unresolved jumps get over-approximated edges to all JUMPDESTs. 
+ std::vector JumpDestBlocks; if (!JumpDestMap.empty()) { std::vector SeenBlocks(Blocks.size(), 0); @@ -960,42 +1051,44 @@ static bool buildGasChunksSPP(const zen::common::Byte *Code, size_t CodeSize, } } - // Build CFG - for (size_t BlockId = 0; BlockId < Blocks.size(); ++BlockId) { - auto &Block = Blocks[BlockId]; - const bool IsTerminator = isControlFlowTerminator(Block.LastOpcode); + // Static jumps get precise single-target edges. For unresolved dynamic + // jumps, the CFG over-approximation is encoded as + // ImplicitDynamicPredCount on each JUMPDEST (folded into + // effectivePredCount). Narrowing to partial call-site resolution would + // under-approximate the CFG and let SPP shift gas along non-existent + // edges, producing unsafe metering. + buildCFGEdges(Blocks, BlockAtPc, JumpDestMap, PushValueMap, JumpDestBlocks, + CodeSize); - // Add fallthrough edge for non-terminating opcodes (CALL/CREATE/GAS - // included). - if (!IsTerminator && Block.End < CodeSize) { - const uint32_t SuccId = BlockAtPc[Block.End]; - if (SuccId != UINT32_MAX) { - addEdge(Blocks, static_cast(BlockId), SuccId); + splitCriticalEdges(Blocks, CodeSize); + + std::vector Reachable = computeReachable(Blocks, 0); + // Seed dyn-target JUMPDESTs as reachability roots so dom/loop analyses + // include them and their static successors. Statically-dead JUMPDESTs + // (no static pred, no dyn-jump in the contract) are intentionally left + // unreachable. 
+ { + std::vector Stack; + for (uint32_t JdId : JumpDestBlocks) { + if (Blocks[JdId].ImplicitDynamicPredCount == 0) { + continue; + } + if (Reachable[JdId] == 0) { + Reachable[JdId] = 1; + Stack.push_back(JdId); } } - - // Add jump edge (if static jump) - if (isJumpOpcode(Block.LastOpcode)) { - uint32_t DestPc = 0; - if (resolveConstantJumpTarget(JumpDestMap, PushValueMap, CodeSize, Block, - DestPc)) { - const uint32_t SuccId = BlockAtPc[DestPc]; - if (SuccId != UINT32_MAX) { - addEdge(Blocks, static_cast(BlockId), SuccId); - } - } else { - // Dynamic jump: over-approx to all jump destinations. - for (uint32_t SuccId : JumpDestBlocks) { - addEdge(Blocks, static_cast(BlockId), SuccId); + while (!Stack.empty()) { + const uint32_t Node = Stack.back(); + Stack.pop_back(); + for (uint32_t Succ : Blocks[Node].Succs) { + if (Reachable[Succ] == 0) { + Reachable[Succ] = 1; + Stack.push_back(Succ); } } } } - - // Split critical edges (required for safe SPP optimization) - splitCriticalEdges(Blocks, CodeSize); - - const std::vector Reachable = computeReachable(Blocks, 0); const std::vector> Dom = computeDominators(Blocks, Reachable); @@ -1119,6 +1212,10 @@ static bool buildGasChunksSPP(const zen::common::Byte *Code, size_t CodeSize, } GasChunkEnd[Blocks[Id].Start] = Blocks[Id].End; GasChunkCost[Blocks[Id].Start] = Blocks[Id].Cost; + // Export SPP-shifted cost on a separate output array so the JIT can read + // it without perturbing the interpreter fast path, which continues to see + // the unshifted per-block cost above. 
+ GasChunkCostSPP[Blocks[Id].Start] = Metering[Id]; } return true; @@ -1127,11 +1224,16 @@ static bool buildGasChunksSPP(const zen::common::Byte *Code, size_t CodeSize, } // namespace void buildBytecodeCache(EVMBytecodeCache &Cache, const common::Byte *Code, - size_t CodeSize, evmc_revision Rev) { + size_t CodeSize, evmc_revision Rev, bool EnableSPP) { Cache.JumpDestMap.assign(CodeSize, 0); Cache.PushValueMap.resize(CodeSize); Cache.GasChunkEnd.assign(CodeSize, 0); Cache.GasChunkCost.assign(CodeSize, 0); + if (EnableSPP) { + Cache.GasChunkCostSPP.assign(CodeSize, 0); + } else { + Cache.GasChunkCostSPP.clear(); + } buildJumpDestMapAndPushCache(Code, CodeSize, Cache.JumpDestMap, Cache.PushValueMap); @@ -1141,7 +1243,8 @@ void buildBytecodeCache(EVMBytecodeCache &Cache, const common::Byte *Code, } buildGasChunksSPP(Code, CodeSize, MetricsTable, Cache.JumpDestMap, - Cache.PushValueMap, Cache.GasChunkEnd, Cache.GasChunkCost); + Cache.PushValueMap, Cache.GasChunkEnd, Cache.GasChunkCost, + Cache.GasChunkCostSPP, EnableSPP); } } // namespace zen::evm diff --git a/src/evm/evm_cache.h b/src/evm/evm_cache.h index cd7dd69ec..d43a48739 100644 --- a/src/evm/evm_cache.h +++ b/src/evm/evm_cache.h @@ -19,11 +19,21 @@ struct EVMBytecodeCache { std::vector JumpDestMap; std::vector PushValueMap; std::vector GasChunkEnd; + // Per-chunk-start unshifted gas cost. Interpreter reads this — it must + // equal the original block base cost (see PR #371). std::vector GasChunkCost; + // Per-chunk-start SPP-shifted gas cost for the multipass JIT. Produced by + // buildGasChunksSPP's metering pass; never read by the interpreter. + std::vector GasChunkCostSPP; }; +// Build the bytecode cache. When EnableSPP is true, the expensive SPP +// metering pipeline runs and GasChunkCostSPP is populated with shifted +// per-chunk costs for the multipass JIT. When false (interpreter-only +// modules), the pipeline is skipped and GasChunkCostSPP stays empty. 
 void buildBytecodeCache(EVMBytecodeCache &Cache, const common::Byte *Code,
-                        size_t CodeSize, evmc_revision Rev);
+                        size_t CodeSize, evmc_revision Rev,
+                        bool EnableSPP = false);
 
 } // namespace zen::evm
diff --git a/src/evm/evm_cache.md b/src/evm/evm_cache.md
index f2f050179..ad9eed528 100644
--- a/src/evm/evm_cache.md
+++ b/src/evm/evm_cache.md
@@ -7,7 +7,8 @@ This document describes the bytecode cache built by `buildBytecodeCache()` in `s
 - `JumpDestMap[pc]` (`uint8_t`): `1` if `Code[pc]` is `OP_JUMPDEST` and this byte is an opcode byte (not inside PUSH data).
 - `PushValueMap[pc]` (`intx::uint256`): decoded immediate for `PUSH1..PUSH32` at `pc`. Unused entries are `0`.
 - `GasChunkEnd[pc]` (`uint32_t`): for a chunk start `pc`, the exclusive end PC of the chunk; otherwise `0`.
-- `GasChunkCost[pc]` (`uint64_t`): metering cost charged at block start `pc` (SPP-shifted in optimized mode); otherwise `0`.
+- `GasChunkCost[pc]` (`uint64_t`): unshifted base gas cost of the block starting at `pc` (sum of EVMC base costs of opcodes in the block); otherwise `0`. Read by the interpreter.
+- `GasChunkCostSPP[pc]` (`uint64_t`): SPP-shifted gas cost of the block starting at `pc`. Populated only when the SPP metering pipeline runs (JIT-consumer modules); otherwise the array is empty. Read by the multipass JIT.
 
 ## Build Algorithm
 
@@ -62,7 +63,12 @@ using a linear-time SPP pass:
    to the loop nodes in local reverse-topological order. This moves common costs
    earlier, reducing the number of non-zero charge points.
 
-The resulting `m` is stored in `GasChunkCost` at each block start.
+The resulting shifted value `m(s)` is stored in `GasChunkCostSPP[s]` at each
+block start; `GasChunkCost[s]` continues to hold the unshifted base cost so
+the interpreter fast path is unaffected.
The SPP pipeline only runs for +modules that will be JIT-compiled (gated by `EnableSPP` in +`buildBytecodeCache`); for interpreter-only modules `GasChunkCostSPP` is +left empty and the CFG / metering work is skipped. If the CFG is not suitable for linear SPP (e.g., dominance-based loop analysis fails), we still run SPP updates once per node in reverse topological order @@ -96,14 +102,16 @@ zero bytes on the right, matching the EVM encoding. ### Correctness of chunk gas charging -In SPP mode, `GasChunkCost[s]` is the shifted metering value `m(s)`. Lemma 6.14 -updates move cost along CFG edges while preserving total base cost on every -path. Over-approximating dynamic jumps keeps the optimization safe (it may -reduce shifts but never undercharges). Splitting critical edges ensures that -cost is only moved along edges where the local update is valid. When loop -analysis fails, the reverse-topological updates still preserve correctness -without fast-forward. - -The fast path is still used only when `gas_left >= GasChunkCost[s]`, so base-cost -out-of-gas cannot occur inside a block. Dynamic/extra gas is charged inside -opcode handlers as before (memory expansion, cold access, keccak word cost, etc). +`GasChunkCost[s]` is always the unshifted base cost of block `s`, so the +interpreter's fast path enters a chunk only when `gas_left >= GasChunkCost[s]` +and base-cost out-of-gas cannot occur inside a block. The multipass JIT reads +the shifted value `m(s)` from `GasChunkCostSPP[s]`. Lemma 6.14 updates move +cost along CFG edges while preserving total base cost on every path. +Over-approximating dynamic jumps to all `JUMPDEST`s keeps the optimization +safe — narrowing those edges with partial call-site resolution would +under-approximate the CFG and let the SPP pass shift gas along edges that +don't exist at runtime, producing unsafe metering. Splitting critical edges +ensures that cost is only moved along edges where the local update is valid. 
+When loop analysis fails, the reverse-topological updates still preserve +correctness without fast-forward. Dynamic/extra gas is charged inside opcode +handlers as before (memory expansion, cold access, keccak word cost, etc). diff --git a/src/runtime/evm_module.cpp b/src/runtime/evm_module.cpp index a3e3177f3..d551ca9bc 100644 --- a/src/runtime/evm_module.cpp +++ b/src/runtime/evm_module.cpp @@ -114,6 +114,9 @@ EVMModule::newEVMModule(Runtime &RT, CodeHolderUniquePtr CodeHolder, if (!Mod->ShouldFallbackToInterp) #endif // ZEN_ENABLE_JIT_PRECOMPILE_FALLBACK { + // JIT is about to compile this module — mark the bytecode cache so the + // SPP metering pipeline runs on first access. + Mod->CacheNeedsSPP = true; action::performEVMJITCompile(*Mod); } } @@ -130,7 +133,8 @@ const evm::EVMBytecodeCache &EVMModule::getBytecodeCache() const { } void EVMModule::initBytecodeCache() const { - evm::buildBytecodeCache(BytecodeCache, Code, CodeSize, Revision); + evm::buildBytecodeCache(BytecodeCache, Code, CodeSize, Revision, + CacheNeedsSPP); } } // namespace zen::runtime diff --git a/src/runtime/evm_module.h b/src/runtime/evm_module.h index 60ea5b62d..de09d9b67 100644 --- a/src/runtime/evm_module.h +++ b/src/runtime/evm_module.h @@ -101,6 +101,16 @@ class EVMModule final : public BaseModule { void initBytecodeCache() const; mutable bool BytecodeCacheInitialized = false; mutable evm::EVMBytecodeCache BytecodeCache; + // Whether this module will be consumed by the multipass JIT. When true, + // buildBytecodeCache runs the expensive SPP metering pipeline so the JIT + // can read shifted gas costs from GasChunkCostSPP. When false, only the + // cheap per-block pass runs — interpreter-only modules pay nothing extra. + // + // Must be set before any getBytecodeCache() call: once the cache is + // built, the EnableSPP decision is fixed for the lifetime of the + // module. Future lazy / on-demand JIT paths must flip this flag before + // triggering the lazy cache build. 
+ bool CacheNeedsSPP = false; evmc_revision Revision = zen::evm::DEFAULT_REVISION; EVMMemorySpecializationProfile MemoryProfile = {}; diff --git a/src/tests/CMakeLists.txt b/src/tests/CMakeLists.txt index 973ef3a2f..526528bab 100644 --- a/src/tests/CMakeLists.txt +++ b/src/tests/CMakeLists.txt @@ -60,6 +60,11 @@ if(ZEN_ENABLE_SPEC_TEST) if(ZEN_ENABLE_EVM) add_subdirectory(mpt) add_executable(evmInterpTests evm_interp_tests.cpp) + add_executable(evmCacheTests evm_cache_tests.cpp) + # Build-only target: never receives ASan even in ASan builds so its + # wall-clock measurement is not distorted by sanitizer overhead. + add_executable(evmCacheComplexityDemo evm_cache_complexity_demo.cpp) + target_link_libraries(evmCacheComplexityDemo PRIVATE dtvmcore) if(ZEN_ENABLE_MULTIPASS_JIT) add_executable(evmJitFrontendTests evm_jit_frontend_tests.cpp) endif() @@ -99,6 +104,7 @@ if(ZEN_ENABLE_SPEC_TEST) if(ZEN_ENABLE_EVM) target_compile_options(evmInterpTests PRIVATE -fsanitize=address) + target_compile_options(evmCacheTests PRIVATE -fsanitize=address) if(ZEN_ENABLE_MULTIPASS_JIT) target_compile_options(evmJitFrontendTests PRIVATE -fsanitize=address) endif() @@ -124,6 +130,11 @@ if(ZEN_ENABLE_SPEC_TEST) PRIVATE dtvmcore rapidjson yaml-cpp gtest_main -fsanitize=address PUBLIC ${GTEST_BOTH_LIBRARIES} ) + target_link_libraries( + evmCacheTests + PRIVATE dtvmcore gtest_main -fsanitize=address + PUBLIC ${GTEST_BOTH_LIBRARIES} + ) if(ZEN_ENABLE_MULTIPASS_JIT) target_link_libraries( evmJitFrontendTests @@ -180,6 +191,11 @@ if(ZEN_ENABLE_SPEC_TEST) -static-libasan PUBLIC ${GTEST_BOTH_LIBRARIES} ) + target_link_libraries( + evmCacheTests + PRIVATE dtvmcore gtest_main -fsanitize=address -static-libasan + PUBLIC ${GTEST_BOTH_LIBRARIES} + ) if(ZEN_ENABLE_MULTIPASS_JIT) target_link_libraries( evmJitFrontendTests @@ -245,6 +261,11 @@ if(ZEN_ENABLE_SPEC_TEST) PRIVATE dtvmcore rapidjson yaml-cpp gtest_main PUBLIC ${GTEST_BOTH_LIBRARIES} ) + target_link_libraries( + evmCacheTests + PRIVATE 
dtvmcore gtest_main + PUBLIC ${GTEST_BOTH_LIBRARIES} + ) if(ZEN_ENABLE_MULTIPASS_JIT) target_link_libraries( evmJitFrontendTests @@ -292,6 +313,7 @@ if(ZEN_ENABLE_SPEC_TEST) if(ZEN_ENABLE_EVM) add_test(NAME evmInterpTests COMMAND evmInterpTests) + add_test(NAME evmCacheTests COMMAND evmCacheTests) if(ZEN_ENABLE_MULTIPASS_JIT) add_test(NAME evmJitFrontendTests COMMAND evmJitFrontendTests) endif() diff --git a/src/tests/evm_cache_complexity_demo.cpp b/src/tests/evm_cache_complexity_demo.cpp new file mode 100644 index 000000000..26dcf2d09 --- /dev/null +++ b/src/tests/evm_cache_complexity_demo.cpp @@ -0,0 +1,65 @@ +// Copyright (C) 2025 the DTVM authors. All Rights Reserved. +// SPDX-License-Identifier: Apache-2.0 + +// Time buildBytecodeCache on a CALLDATALOAD JUMP STOP +// contract. Usage: evmCacheComplexityDemo +// Output: "," on stdout. + +#include "evm/evm_cache.h" +#include "platform/platform.h" + +#include +#include + +#include +#include +#include +#include +#include +#include +#include + +namespace { + +constexpr uint8_t OP_STOP = static_cast(evmc_opcode::OP_STOP); +constexpr uint8_t OP_CALLDATALOAD = + static_cast(evmc_opcode::OP_CALLDATALOAD); +constexpr uint8_t OP_JUMP = static_cast(evmc_opcode::OP_JUMP); +constexpr uint8_t OP_JUMPDEST = static_cast(evmc_opcode::OP_JUMPDEST); + +std::vector makeDynDispatchContract(size_t NumJumpDests) { + std::vector Code; + Code.reserve(NumJumpDests + 3); + Code.push_back(OP_CALLDATALOAD); + Code.push_back(OP_JUMP); + for (size_t I = 0; I < NumJumpDests; ++I) { + Code.push_back(OP_JUMPDEST); + } + Code.push_back(OP_STOP); + return Code; +} + +double timeCacheBuildMs(const std::vector &Code) { + using Clock = zen::common::SteadyClock; + const auto Start = Clock::now(); + zen::evm::EVMBytecodeCache Cache; + zen::evm::buildBytecodeCache(Cache, + reinterpret_cast(Code.data()), + Code.size(), EVMC_CANCUN, /*EnableSPP=*/true); + const auto End = Clock::now(); + return std::chrono::duration(End - Start).count(); +} + +} // 
namespace + +int main(int Argc, char **Argv) { + if (Argc != 2) { + std::fprintf(stderr, "usage: %s \n", Argv[0]); + return 2; + } + const size_t N = static_cast(std::stoull(Argv[1])); + const auto Code = makeDynDispatchContract(N); + const double Ms = timeCacheBuildMs(Code); + std::printf("%zu,%.3f\n", N, Ms); + return 0; +} diff --git a/src/tests/evm_cache_tests.cpp b/src/tests/evm_cache_tests.cpp new file mode 100644 index 000000000..ac34320e3 --- /dev/null +++ b/src/tests/evm_cache_tests.cpp @@ -0,0 +1,107 @@ +// Copyright (C) 2025 the DTVM authors. All Rights Reserved. +// SPDX-License-Identifier: Apache-2.0 + +// Regression tests for buildBytecodeCache's SPP pipeline: implicit +// dyn-pred count + reachability stitch on dyn-target JUMPDESTs. + +#include "evm/evm_cache.h" + +#include +#include +#include + +#include +#include +#include + +namespace { + +using zen::evm::buildBytecodeCache; +using zen::evm::EVMBytecodeCache; + +constexpr uint8_t OP_STOP = static_cast(evmc_opcode::OP_STOP); +constexpr uint8_t OP_ADD = static_cast(evmc_opcode::OP_ADD); +constexpr uint8_t OP_CALLDATALOAD = + static_cast(evmc_opcode::OP_CALLDATALOAD); +constexpr uint8_t OP_POP = static_cast(evmc_opcode::OP_POP); +constexpr uint8_t OP_JUMP = static_cast(evmc_opcode::OP_JUMP); +constexpr uint8_t OP_JUMPDEST = static_cast(evmc_opcode::OP_JUMPDEST); +constexpr uint8_t OP_PUSH1 = static_cast(evmc_opcode::OP_PUSH1); + +EVMBytecodeCache buildSPPCache(const std::vector &Code) { + EVMBytecodeCache Cache; + buildBytecodeCache(Cache, reinterpret_cast(Code.data()), + Code.size(), EVMC_CANCUN, /*EnableSPP=*/true); + return Cache; +} + +EVMBytecodeCache buildNoSPPCache(const std::vector &Code) { + EVMBytecodeCache Cache; + buildBytecodeCache(Cache, reinterpret_cast(Code.data()), + Code.size(), EVMC_CANCUN, /*EnableSPP=*/false); + return Cache; +} + +// Smoke: no dynamic jumps + a statically-dead JUMPDEST must not crash; +// SPP must leave the dead block's cost unchanged (empty Succs, nothing +// 
to shift out).
+TEST(EVMCacheImplicitDynPred, BuildsCleanly_NoDynJumpWithDeadJumpDest) {
+  const std::vector<uint8_t> Code = {OP_STOP, OP_JUMPDEST, OP_ADD, OP_STOP};
+  const EVMBytecodeCache Cache = buildSPPCache(Code);
+
+  ASSERT_EQ(Cache.GasChunkCost.size(), Code.size());
+  ASSERT_EQ(Cache.GasChunkCostSPP.size(), Code.size());
+  // JUMPDEST(1) + ADD(3) = 4 gas.
+  EXPECT_EQ(Cache.GasChunkCost[1], 4u);
+  EXPECT_EQ(Cache.GasChunkCostSPP[1], Cache.GasChunkCost[1]);
+}
+
+// A JUMPDEST reachable only via an unresolved dynamic jump must still
+// land in dom-analysis input via the reachability stitch, so its SPP
+// entry is populated.
+TEST(EVMCacheImplicitDynPred, DynTargetJumpDest_StitchedIntoSPP) {
+  const std::vector<uint8_t> Code = {
+      OP_CALLDATALOAD, OP_JUMP, OP_JUMPDEST, OP_ADD, OP_POP, OP_STOP,
+  };
+  const EVMBytecodeCache Cache = buildSPPCache(Code);
+
+  ASSERT_EQ(Cache.GasChunkCost.size(), Code.size());
+  ASSERT_EQ(Cache.GasChunkCostSPP.size(), Code.size());
+  // JUMPDEST(1) + ADD(3) + POP(2) + STOP(0) = 6 gas.
+  EXPECT_EQ(Cache.GasChunkCost[2], 6u);
+  EXPECT_EQ(Cache.GasChunkCostSPP[2], Cache.GasChunkCost[2]);
+  // CALLDATALOAD(3) + JUMP(8) = 11 gas.
+  EXPECT_EQ(Cache.GasChunkCost[0], 11u);
+}
+
+// EnableSPP=false must leave GasChunkCostSPP empty so the JIT-consumer
+// fall-through hands the unshifted cost array to downstream code.
+TEST(EVMCacheImplicitDynPred, InterpreterOnly_LeavesSPPArrayEmpty) {
+  const std::vector<uint8_t> Code = {OP_PUSH1, 0x05, OP_JUMP, OP_PUSH1,
+                                     0x00, OP_JUMPDEST, OP_STOP};
+  const EVMBytecodeCache Cache = buildNoSPPCache(Code);
+
+  ASSERT_EQ(Cache.GasChunkCost.size(), Code.size());
+  EXPECT_TRUE(Cache.GasChunkCostSPP.empty());
+}
+
+// Two dynamic JUMPs => ImplicitDynamicPredCount == 2 on each JUMPDEST.
+// effectivePredCount must block any lemma614 shift INTO either JUMPDEST.
+TEST(EVMCacheImplicitDynPred, MultipleDynJumps_BothTargetsCounted) {
+  const std::vector<uint8_t> Code = {
+      OP_CALLDATALOAD, OP_JUMP, OP_JUMPDEST, OP_CALLDATALOAD,
+      OP_JUMP, OP_JUMPDEST, OP_POP, OP_STOP,
+  };
+  const EVMBytecodeCache Cache = buildSPPCache(Code);
+
+  ASSERT_EQ(Cache.GasChunkCost.size(), Code.size());
+  ASSERT_EQ(Cache.GasChunkCostSPP.size(), Code.size());
+  EXPECT_EQ(Cache.JumpDestMap[2], 1u);
+  EXPECT_EQ(Cache.JumpDestMap[5], 1u);
+  EXPECT_GT(Cache.GasChunkCost[2], 0u);
+  EXPECT_GT(Cache.GasChunkCost[5], 0u);
+  EXPECT_EQ(Cache.GasChunkCostSPP[2], Cache.GasChunkCost[2]);
+  EXPECT_EQ(Cache.GasChunkCostSPP[5], Cache.GasChunkCost[5]);
+}
+
+} // namespace
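
Review note: the implicit-predecessor idea in `buildCFGEdges` and `effectivePredCount` can be modelled in isolation. The sketch below is not code from the PR. `SketchBlock`, `stampImplicitPreds`, `effectivePreds`, and `canShiftInto` are hypothetical names. It covers only the predecessor-count condition and leaves out the `AllowedMask` and `isGasChunkTerminator` checks that `lemma614Update` also applies. The three-block layout in the usage function assumes the block builder splits the `MultipleDynJumps_BothTargetsCounted` contract at each `JUMP` and each `JUMPDEST`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative block record: only the fields the predecessor check needs.
struct SketchBlock {
  bool IsJumpDest;
  bool EndsInDynamicJump;
  std::vector<uint32_t> Preds;       // explicit (static) predecessors
  uint32_t ImplicitDynamicPredCount; // stamped by stampImplicitPreds
};

// One pass counts unresolved dynamic jumps, a second pass stamps every
// JUMPDEST with that count: O(N) work instead of D * J explicit edges.
inline void stampImplicitPreds(std::vector<SketchBlock> &Blocks) {
  uint32_t DynamicJumpCount = 0;
  for (const SketchBlock &B : Blocks) {
    if (B.EndsInDynamicJump) {
      ++DynamicJumpCount;
    }
  }
  for (SketchBlock &B : Blocks) {
    B.ImplicitDynamicPredCount = B.IsJumpDest ? DynamicJumpCount : 0;
  }
}

// Explicit edges, plus the implicit program-entry edge, plus the implicit
// dynamic-jump predecessors.
inline size_t effectivePreds(const SketchBlock &B, bool IsEntry) {
  return B.Preds.size() + (IsEntry ? 1u : 0u) + B.ImplicitDynamicPredCount;
}

// In this sketch a successor may receive shifted gas only when exactly one
// path reaches it.
inline bool canShiftInto(const SketchBlock &B, bool IsEntry) {
  return effectivePreds(B, IsEntry) == 1;
}

// Usage: assumed block layout of the two-dynamic-jump test contract.
inline void implicitPredExample() {
  std::vector<SketchBlock> Blocks = {
      {false, true, {}, 0}, // CALLDATALOAD JUMP
      {true, true, {}, 0},  // JUMPDEST CALLDATALOAD JUMP
      {true, false, {}, 0}, // JUMPDEST POP STOP
  };
  stampImplicitPreds(Blocks);
  assert(effectivePreds(Blocks[1], false) == 2);
  assert(!canShiftInto(Blocks[1], false));
  assert(!canShiftInto(Blocks[2], false));
}
```

In this model both `JUMPDEST` blocks see two effective predecessors, so no gas can be shifted into either. That matches the tests' expectation that `GasChunkCostSPP` equals `GasChunkCost` at PCs 2 and 5.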