Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
31 commits
Select commit Hold shift + click to select a range
48fada6
perf(core): replace iterative-bitset dominator with Cooper-Harvey-Ken…
abmcar May 12, 2026
1be3f39
perf(core): inline dom-tree children adjacency to avoid N small allocs
abmcar May 16, 2026
62ef503
perf(core): add optional ZEN_EVM_CACHE_PROFILE per-phase instrumentation
abmcar May 16, 2026
3c659f6
test(core): add bytecode-replay demo mode + structural dominator tests
abmcar May 16, 2026
9df8ee8
chore(tools): add evm-cache bench harness and paired-ratio BCa analyzer
abmcar May 16, 2026
a75ab11
test(core): add corpus fetchers for evm-cache bench (Sourcify + top-RPC)
abmcar May 16, 2026
92c6c04
docs(core): document dom-CHK algorithm + per-phase profile flag
abmcar May 16, 2026
04d0a55
docs(docs): promote evm-spp-overhaul spec into the repo
abmcar May 16, 2026
b00efa1
docs(docs): apply Phase 4 R1 Codex review fixes (gate honesty + scope)
abmcar May 16, 2026
8a95175
docs(docs): apply Phase 4 R1 Opus review fixes (IrreducibleSCC + nits)
abmcar May 16, 2026
592fd35
docs(docs): apply Phase 4 R2 review fixes (test rename + variance band)
abmcar May 16, 2026
e06d291
perf(core): fuse buildGasBlocks 2-pass into single bytecode walk
abmcar May 17, 2026
3bba649
perf(core): fold collectJumpDests into buildGasBlocks single walk
abmcar May 17, 2026
0dd5bb9
perf(core): flatten Preds/Succs into CSR for cache-locality on hot pa…
abmcar May 17, 2026
4d74033
perf(core): add chkFixpointRounds counter to diagnose CHK convergence
abmcar May 17, 2026
6e1bc6b
perf(core): derive InCycle from natural loops on reducible CFGs
abmcar May 17, 2026
de934a8
perf(core): fuse buildCFGEdges two passes into a single sweep
abmcar May 17, 2026
118c993
perf(core): share computeDomInfo RPO with computeReverseTopo
abmcar May 17, 2026
77e0454
style(core): apply tools/format.sh to evm_cache.cpp after PR C work
abmcar May 17, 2026
55a250b
perf(core): reserve Blocks + emplace_back to drop GasBlock move/reall…
abmcar May 17, 2026
689e5d5
perf(core): split per-block Succs/Preds out of GasBlock into EdgeTables
abmcar May 17, 2026
f7630d8
perf(core): pack GasBlock to exact 32 bytes via field reorder
abmcar May 17, 2026
4f9f5be
docs(docs): add change doc for evm-cache-build-fusion PR
abmcar May 17, 2026
c5db655
docs(docs): apply round-1 red-team review fixes (Opus + Codex)
abmcar May 17, 2026
de507df
docs(docs): add round-2 reviews (both PASS) + asymmetric gap note
abmcar May 17, 2026
ab74da5
docs(docs): add cache-build spec, B-lite pilot, motivation reviews
abmcar May 17, 2026
99a666c
docs(docs): clarify phase-pair count in cache-build chrono overhead
abmcar May 17, 2026
911f8c1
docs(docs): translate perf-summary.md to English
abmcar May 17, 2026
606709f
fix(core): address review feedback on evm_cache.cpp
abmcar May 18, 2026
332606d
docs(core): correct CHK dominator complexity claim in evm_cache.md
abmcar May 18, 2026
4c585b1
fix(test): validate PUSH target against JUMPDEST set in static_jump_s…
abmcar May 18, 2026
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -69,6 +69,11 @@ option(ZEN_ENABLE_DEBUG_GREEDY_RA "Enable debug greedy ra" OFF)
# Profiling options
option(ZEN_ENABLE_PROFILER "Enable profiler" OFF)
option(ZEN_ENABLE_LINUX_PERF "Enable linux perf" OFF)
option(
ZEN_EVM_CACHE_PROFILE
"Enable per-phase wall-clock instrumentation in EVM bytecode-cache build (stderr CSV)"
OFF
)

# Test options
option(ZEN_ENABLE_SPEC_TEST "Enable spec test" OFF)
Expand Down
394 changes: 394 additions & 0 deletions docs/changes/2026-05-12-evm-dom-chk/README.md

Large diffs are not rendered by default.

172 changes: 172 additions & 0 deletions docs/changes/2026-05-12-evm-dom-chk/reviews/impl-round-1-codex.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,172 @@
# R1 implementation review - Codex skeptic

Worktree: `/home/abmcar/DTVM/.worktrees/perf-dom-lengauer-tarjan`

Reviewed current worktree contents, not only `/home/abmcar/.claude/jobs/3d8995d3/dom-chk-impl.diff`. The worktree contains uncommitted edits to `src/evm/evm_cache.cpp`, `src/tests/evm_cache_tests.cpp`, and untracked docs/header files.

## Findings

1. ✗ **Spec consistency: cited line numbers and grep claims are stale.**

The change doc still cites old implementation lines. Examples:

- `docs/changes/2026-05-12-evm-dom-chk/README.md:11` says `computeDominators` is at line 619, but `rg -n "computeDominators|computeDomInfo" src/evm/evm_cache.cpp` outputs:
- `src/evm/evm_cache.cpp:627:static DomInfo computeDomInfo(...)`
- no `computeDominators` hit.
- `README.md:87-89` cites `evm_cache.cpp:631` / `660-664` for old init/class-C behavior. Current code has `Info.IDom.assign` at `src/evm/evm_cache.cpp:631`, while class A/B/C init is at `src/evm/evm_cache.cpp:647-660`.
- `README.md:92` cites the Phase-7 stitch at `evm_cache.cpp:1087-1108`; current stitch is `src/evm/evm_cache.cpp:1227-1260`.
- `README.md:174-176` lists old query lines 684/793/838. Current query sites are `src/evm/evm_cache.cpp:834`, `:943`, and `:990`.
- `README.md:178-179` says `grep -n "bitsetTest(Dom" src/evm/evm_cache.cpp` returned three hits. Fresh command `rg -n "bitsetTest\\(Dom" src/evm/evm_cache.cpp` returned no output (exit 1).

2. ✗ **Result numbers are not reproducible exactly.**

Current doc table at `README.md:256-259` claims `3.38 / 5.90 / 14.48 / 38.95 ms` for `10k/20k/50k/100k`. The user-provided older claim `2.85 / 5.52 / 14.66 / 40.07 ms` is no longer what the current doc says.

Fresh command:

```sh
for N in 10000 20000 50000 100000; do ./build/evmCacheComplexityDemo $N; done
```

Output:

```text
10000,2.878
20000,5.978
50000,16.355
100000,38.719
```

Thresholds still look satisfied for 20k and 100k, but the published exact table does not match the fresh run, especially 50k.

3. ✗ **Gate counts are partly wrong or unsupported.**

- ✓ Multipass unit slice is verified. Command:

```sh
EVMONE_EXTERNAL_OPTIONS=.../build/lib/libdtvmapi.so.0.1.0,mode=multipass \
/home/abmcar/evmone/build/bin/evmone-unittests --gtest_filter="$(paste -sd: tests/evmone_unittests/EVMOneMultipassUnitTestsRunList.txt)"
```

Output ended with:

```text
[==========] 223 tests from 1 test suite ran. (8512 ms total)
[ PASSED ] 223 tests.
```

- ✗ Interpreter count in the doc is wrong. `README.md:245` says `215/215`, but:

```text
wc -l tests/evmone_unittests/EVMOneMultipassUnitTestsRunList.txt tests/evmone_unittests/EVMOneInterpreterUnitTestsRunList.txt
223 tests/evmone_unittests/EVMOneMultipassUnitTestsRunList.txt
226 tests/evmone_unittests/EVMOneInterpreterUnitTestsRunList.txt
449 total
```

- ✗ Statetest `2723/2723` is unsupported in this worktree. Command `find . -type f \( -iname '*run*list*.txt' -o -iname '*runlist*.txt' \)` found only the two evmone unit-test run lists, not a statetest run list. Counting current Cancun JSON post entries:

```sh
find tests/evm_spec_test/state_tests -type f -name '*.json' -print0 |
xargs -0 jq '[.[] | select(.post.Cancun != null) | .post.Cancun | length] | add // 0' |
awk '{s+=$1} END {print s}'
```

Output:

```text
1798
```

I did not run the slow statetest suite.

- ✗ The user asked to verify `evmCacheTests 8/8`; current implementation has 9 tests. Current doc `README.md:247` says 9/9, and fresh `./build/evmCacheTests --gtest_color=no` output ends with:

```text
[==========] 9 tests from 2 test suites ran. (0 ms total)
[ PASSED ] 9 tests.
```

4. ✗ **Format gate is not clean as written.**

Command:

```sh
tools/format.sh check
```

Exit code: 123. Output starts with unrelated existing files such as:

```text
src/singlepass/x64/assembler.h:34:3: error: code should be clang-formatted [-Wclang-format-violations]
src/singlepass/x64/asm/assembler.h:340:50: error: code should be clang-formatted [-Wclang-format-violations]
src/platform/sgx/zen_sgx_file.h:65:31: error: code should be clang-formatted [-Wclang-format-violations]
```

Narrow check for changed files did pass:

```sh
clang-format --dry-run -style=file -Werror src/evm/evm_cache.cpp src/evm/evm_cache_for_testing.h src/tests/evm_cache_tests.cpp
```

Output: none; exit code 0.

5. ✗ **The exact compiler-warning grep is non-empty after a clean rebuild.**

I first had to set `CCACHE_DIR=/tmp/codex-ccache CCACHE_TEMPDIR=/tmp/codex-ccache/tmp`, because default ccache tried to write `/home/abmcar/.cache/ccache/tmp` and failed under the sandbox.

Commands:

```sh
CCACHE_DIR=/tmp/codex-ccache CCACHE_TEMPDIR=/tmp/codex-ccache/tmp cmake --build build --target clean
CCACHE_DIR=/tmp/codex-ccache CCACHE_TEMPDIR=/tmp/codex-ccache/tmp \
cmake --build build --target dtvmapi -- -j$(nproc) 2>&1 |
tee /tmp/dtvmapi-build-r1.log |
grep -E "warning|error"
```

Output includes 9 matches. One match is a false positive on `errors.cpp.o`; the rest are warnings in unrelated files, e.g.:

```text
src/utils/others.cpp:86:10: warning: ignoring return value of 'size_t fread(...)' [-Wunused-result]
src/common/traphandler.cpp:117:18: warning: cast between incompatible function types ... [-Wcast-function-type]
src/common/evm_traphandler.cpp:133:18: warning: cast between incompatible function types ... [-Wcast-function-type]
src/compiler/cgir/pass/cg_inline_spiller.cpp:1405:6: warning: ... defined but not used [-Wunused-function]
```

Build itself succeeded (`/tmp/dtvmapi-build-r1.log` ends with `[100%] Built target dtvmapi`), and `grep -E "warning|error" /tmp/dtvmapi-build-r1.log | rg "evm_cache|evm_cache_tests|evm_cache_for_testing"` produced no output. Still, `README.md:243` claims the gate is clean, and the requested grep is not clean.

6. ✓ **`DomInfo::dominates` interval correctness is structurally sound for valid node IDs.**

Current formula is at `src/evm/evm_cache.cpp:616-623`:

```cpp
return Enter[A] <= Enter[B] && Exit[B] <= Exit[A];
```

The DFS builds the dom tree by inverting `IDom` at `src/evm/evm_cache.cpp:788-791`, uses a single global `uint32_t Time = 0` at `src/evm/evm_cache.cpp:800`, assigns pre-order enter times at `:805` and `:812`, and assigns post-order exit times at `:815`. That is the standard ancestor interval invariant. The `A == B` fast path returns before bounds checking; all three production callers pass valid block IDs, so this is not a current blocker.

7. ✓ **Caller argument order preserves `(dominator, dominated)` semantics.**

Fresh `rg -n "Dom\\.dominates" src/evm/evm_cache.cpp` output:

```text
834: if (Dom.dominates(To, static_cast<uint32_t>(From))) {
943: if (!Dom.dominates(To, static_cast<uint32_t>(From))) {
990: if (!Dom.dominates(Loop.Header, Node)) {
```

These preserve the original semantic documented at `README.md:168-170`: the first argument is the dominator candidate, second is the dominated node.

8. ✗ **Doc internal consistency still needs revision.**

- `README.md:188` says loop collection after the PR is `O(Σ |loop|)` interval-containment tests. The current code still performs bitset work: `Words = bitsetWordCount(NumBlocks)` at `src/evm/evm_cache.cpp:928`, ORs every word at `:954-956`, scans all nodes at `:965-979`, and uses bitset intersection/subset checks at `:1000-1004` and later parent selection. Dominance queries are O(1), but loop collection is not accurately described by that table row.
- `README.md:317-318` says class C is handled "via the post-fixpoint sweep"; current design text says init seeding at `README.md:97-99`, and code implements init seeding at `src/evm/evm_cache.cpp:647-660`. The Risk section is stale.
- The Risks section does **not** claim O(depth) query worst-case; `README.md:311-312` correctly says the query worst case is unchanged because Enter/Exit DFS is always `O(N + E)`. The remaining `O(depth)` mentions are outside Risks (`README.md:71`, `:73`, `:345`) and refer to `intersect` or a discarded first pass, not current queries.
- `README.md:274` says "The four new GTests" but lists five, and `README.md:340-341` / `:361-362` still say four tests.

## Verdict

Verdict: REVISE — concrete blockers listed above.

Core implementation checks for interval dominance and the three caller argument orders pass. The blockers are review/documentation/gate integrity issues: stale spec line citations, non-reproducible exact timing table, wrong/unsupported gate counts, global format failure, non-empty warning grep, and inaccurate complexity/risk wording.
103 changes: 103 additions & 0 deletions docs/changes/2026-05-12-evm-dom-chk/reviews/impl-round-1-opus.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,103 @@
Verdict: REVISE — the implementation tracks the §Design spec closely and the three caller rewrites use the correct (dominator, dominated) order, but (a) a class-C *descendant* corner case produces a strictly weaker dominance relation than the old bitset pass, and (b) the GTest set never exercises the post-fixpoint sweep (class C) it is designed to handle. Other findings are NITs.

## 1. Spec compliance — PASS

- CHK intersect with postorder fingers — `src/evm/evm_cache.cpp:708-726`. Walks the lower-postorder finger up the partially-built IDom tree, matches Cooper-Harvey-Kennedy 2001 Fig. 3.
- Multi-root divergence sentinel — `src/evm/evm_cache.cpp:712-715` and `:717-722` return `UINT32_MAX` iff a finger reaches its own root (`P == B1 || P == UINT32_MAX`). Caller at `:748-751` flags `Diverged=true` and falls back to self-root at `:754-755`. Matches §Design step 5 (option (a) from R2: divergence-only at this site).
- Post-fixpoint sweep — `src/evm/evm_cache.cpp:767-771`. Promotes residual `UINT32_MAX` (class C and any orphan reachable component) to self-root. Matches §Design step 7.
- Enter/Exit DFS on a single global `Time` counter — `src/evm/evm_cache.cpp:789-808`. Iterates roots in node-id order, recurses through `Children[]` (inverted IDom), increments `Time` on both enter and exit. Each root receives a disjoint `[Enter, Exit]` interval, so cross-root pairs correctly answer `dominates == false`.

## 2. Output semantics — PASS (argument orders correct)

`bitsetTest(Dom[X], Y)` reads "Y dominates X", i.e. Y is the dominator candidate. The rewrites:

| Old site | New site | Call |
|----------|----------|------|
| `evm_cache.cpp:684` (`bitsetTest(Dom[From], To)`) | `evm_cache.cpp:823` | `Dom.dominates(To, From)` — back-edge: To dominates From. ✓ |
| `evm_cache.cpp:793` | `evm_cache.cpp:932` | `Dom.dominates(To, From)` — header discovery: same orientation. ✓ |
| `evm_cache.cpp:838` | `evm_cache.cpp:979` | `Dom.dominates(Loop.Header, Node)` — loop-body sanity: header dominates body. ✓ |

`grep -n "Dom.dominates" src/evm/evm_cache.cpp` returns exactly these 3 hits — no stragglers.

## 3. DomInfo::dominates correctness — PASS

`evm_cache.cpp:623-631`. The interval-containment invariant `Enter[A] <= Enter[B] && Exit[B] <= Exit[A]` is correct iff the Enter/Exit DFS assigns each subtree a contiguous interval. The DFS at `:789-808` does exactly this:
- Pre-tick on push (`Info.Enter[Root] = Time++` at `:794`; `Info.Enter[C] = Time++` at `:801`).
- Post-tick on pop (`Info.Exit[Top.Node] = Time++` at `:804`).

For any A ancestor of B in the dom tree, A's subtree DFS strictly encloses B's, so `Enter[A] < Enter[B] && Exit[B] < Exit[A]`. For cross-root pairs, the global counter ticks monotonically across roots, so two roots get strictly disjoint intervals; non-containment holds.

## 4. BLOCKER — Class-C *descendant* drift from old bitset semantics

Location: `src/evm/evm_cache.cpp:737-762` (fixpoint inner loop) and `:767-771` (sweep).

Scenario: node N is class C (`Reachable[N]==1`, `Preds` non-empty, all preds `Reachable==0`). Node M has `Reachable[M]==1` and its only Reachable pred is N.

- **Old bitset pass** (`computeDominators`, removed): N's `HasPred=false` branch produced `Dom[N] = {N}`. For M, `Dom[M] = (All & Dom[N]) ∪ {M} = {N, M}` — **N dominates M**.
- **New CHK pass**: At `:738-740` we skip Reachable==0 preds; at `:741-743` we skip preds whose `IDom == UINT32_MAX`. For N: all preds skipped → `NewIDom = UINT32_MAX` → no update. For M (visited later in RPO): its only Reachable pred N still has `IDom = UINT32_MAX` at this point, skipped → `NewIDom = UINT32_MAX` → no update. After the fixpoint converges, both N and M are still `UINT32_MAX`; the sweep at `:767-771` makes both self-roots. **N does NOT dominate M.**

This is a strictly weaker dominance relation than the old pass. The three query sites all read "does X dominate Y"; a false answer can:
- Suppress a back-edge `findBackEdgesUsingDominators` would otherwise detect (`:823`).
- Drop a loop header (`:932`).
- Or, conversely, fail the loop-body sanity check (`:979`) — `buildLoopsUsingDominance` returns false and SPP falls back to non-linear processing.

The change doc (README §Risks bullet 2 at `docs/changes/2026-05-12-evm-dom-chk/README.md:300-304`) only addresses class C *itself*, not its descendants. The doc claims class C "is expected absent post-stitch", but that only protects the node-itself case; a class-C descendant chain (M, M', M'' all only reachable through N) is the broader corner.

Fixes (pick one):

- (a) **Preserve old semantics**: after the post-fixpoint sweep promotes class-C nodes to self, run **one more RPO pass** so descendants pick up the now-promoted class-C node as their IDom seed. (Cheap; a single pass for the rare case.)
- (b) **Treat any reachable orphan as a fresh root at init**: extend the init at `:646-650` to seed `IDom[N] = N` for any node whose Reachable-true pred set is empty (the third class A∪B∪C up front), then the fixpoint and sweep are unchanged.
- (c) **Accept the divergence and prove it cannot affect SPP**: requires a proof that no class-C *chain* survives Phase-7 stitch (the stitch only adds forward edges through Succs, but it can leave class-C descendants if a JUMPDEST chain is entered only via a stitched node whose own preds were stale). I do not see this proof in the change doc.

Recommend (b) — it is one extra line in the init loop and removes the entire foot-gun.

## 5. MED — GTests do not exercise the post-fixpoint sweep

Location: `src/tests/evm_cache_tests.cpp:154-241` (the four new dominator tests).

`LinearChain_Correct`, `DiamondCFG_Correct`, `NestedLoop_Correct` only have nodes that are either Reachable==1 with at least one Reachable pred, or Reachable==1 with empty preds (entry). `DisjointRoots_SelfIdom` exercises true multi-root divergence (preds in distinct forests) — that hits the `intersect → UINT32_MAX → Diverged=true` path at `:748-755`, **not** the post-fixpoint sweep at `:767-771`.

The sweep at `:767-771` is unexercised. The class-C corner from finding 4 is also unexercised. A targeted test would build a 3-node CFG: node 0 with `Reachable==0`, node 1 with `Reachable==1, Preds={0}`, node 2 with `Reachable==1, Preds={1}`. Expected old semantics: `IDom[1]=1, IDom[2]=1`. Current implementation: `IDom[1]=1, IDom[2]=2` (drift). The test makes the drift testable.

Fix: add `ClassC_DescendantsRouteToNodeRoot` (or equivalent name reflecting the chosen resolution from finding 4).

## 6. NIT — Defensive DFS reachability over Succs only

Location: `src/evm/evm_cache.cpp:698-702`.

The defensive DFS visits any unvisited node, but it only follows Succs (`Blocks[Top.Node].Succs` at `:674`). A reachable orphan whose entry has empty Succs (a single-node island with no outgoing edges) would be visited as a 1-node DFS — fine. But if the orphan island is entered only through Preds (reachable from elsewhere via Succs of an unreachable node), the defensive sweep at `:698-702` would visit them in node-id order; not a correctness issue, just noting that "DFS over Succs from every unvisited node" is a strict superset of "DFS over Succs from roots" and the postorder numbering for class-C and orphan nodes is well-defined.

## 7. NIT — Header comment slightly verbose vs project style

Location: `src/evm/evm_cache.cpp:606-617`.

`.claude/rules/cpp-code-style.md` says "Only include essential comments — avoid excessive documentation". The 12-line header is defensible (algorithm + 3 root classes + sentinel rule), but trims well: the §Design table is in the change doc, and the inline comment could be ~4 lines (algorithm name, root semantics, query-helper pointer). Not a blocker.

## 8. NIT — Frame reference is invalidated on push, relies on increment-before-push

Location: `src/evm/evm_cache.cpp:673-686` (DfsFrame) and `:797-806` (EtFrame).

`DfsFrame &Top = Stack.back()` at `:673` is a live reference, and `Stack.push_back(...)` at `:680` may invalidate it. The code increments `Top.SuccIdx` *before* the push and never re-reads `Top` after the push in the same iteration — so this is safe, but it is fragile to future edits. Both stacks are `reserve(N)`'d (`:663`, `:788`) and max depth is ≤ N, so reallocation should not occur even if the push did happen after re-read. Recommend a one-line comment: `// Top may be invalidated by push_back below; do not reuse.`

## 9. NIT — `for_testing::computeIDomForTesting` Preds reconstruction order

Location: `src/evm/evm_cache.cpp:1409-1422`.

Preds are reconstructed in node-id order, which is not necessarily the order Preds appear in the production pipeline (where `buildCFGEdges` may emit them in a different order). The fixpoint inner loop is order-insensitive for correctness (final `NewIDom` is the intersection), but if a future regression test depends on a specific Pred order to trigger a class-C path, the testing shim may mask it. Document the order-insensitive invariant or pin the Preds order to match production.

## 10. Commit message conformance — PASS

Planned title: `perf(core): replace iterative-bitset dominator with Cooper-Harvey-Kennedy algorithm`.

Per `.claude/rules/commit-conventions.md`:
- Type `perf` ✓ (perf change).
- Scope `core` ✓ (touches `src/evm/`, which `repo-architecture.md` groups under core runtime).
- Subject lowercase, imperative ("replace"), no trailing period ✓.
- Length 91 chars < 120 ✓.

## Suggested test additions

1. `ClassC_DescendantsRouteToNodeRoot` — see finding 4/5.
2. `CrossRootDominatesFalse` — direct `DomInfo::dominates(A, B)` check across roots (currently only indirect via IDom equality). Build a 2-disjoint-tree fixture and assert `dominates` returns false for cross-root pairs. (Optional but easy.)

Reviewed by: opus (impl R1)
Loading
Loading