perf: EVM cache-build overhaul (dom-CHK + phase fusion + CSR) by abmcar · Pull Request #514 · DTVMStack/DTVM

abmcar · 2026-05-18T04:13:08Z

Summary

EVM cache-build pipeline overhaul. Two layers of work bundled in one PR
because the second layer depends on instrumentation + algorithmic
foundations introduced in the first:

- Foundation layer: replace the iterative-bitset dominator
  (O(N²/64)) with Cooper-Harvey-Kennedy + Tarjan DFS Enter/Exit for
  O(1) dominates(A, B) queries; inline the dominator-tree children
  adjacency; add the opt-in ZEN_EVM_CACHE_PROFILE per-phase chrono
  instrumentation; add evmCacheComplexityDemo bytecode-replay mode +
  structural dominator GTests; add the bench_evm_cache.sh paired
  harness + analyze_evm_cache_bench.py paired-ratio BCa
  cluster-bootstrap analyzer; add the Sourcify + top-RPC corpus
  fetchers.
- Fusion layer: collapse multi-pass bytecode/edge walks
  (buildGasBlocks 2-pass → 1-pass, collectJumpDests folded in,
  buildCFGEdges single-sweep); flatten Blocks[].Succs/Preds into a
  read-only CSRGraph after splitCriticalEdges freezes the graph and
  route every downstream reader through it; share
  DomInfo::RPO with computeReverseTopo; pack GasBlock 80 B → 32 B
  by extracting Succs/Preds into a parallel EdgeTables and
  reordering fields (static_assert(sizeof(GasBlock) == 32)-locked).

Behaviour-level semantics are unchanged. evmone-statetest --vm external_vm -k fork_Cancun passes 2723/2723 and build/evmCacheTests passes 14/14;
both were re-run after every implementation commit, not just at the end.

The fusion-layer N=100k headline (-41% / 1.69×) was independently
re-measured at 1.67× / -40.2%, reproducing within ±10%. Spec docs:

docs/changes/2026-05-16-evm-spp-overhaul/README.md (foundation)
docs/changes/2026-05-17-evm-cache-build-fusion/README.md (fusion)
docs/changes/2026-05-17-evm-cache-build-fusion/perf-summary.md
(3-tier cross-N comparison + per-phase deltas + production-scale
pilot + pre-committed gating criteria for the deferred micro-opts)

Production-scale pilot (n=10, directional)

Paired wall-clock on 10 mainnet contracts pulled via eth_getCode from
https://ethereum.publicnode.com, stratified by CodeSize. 15 reps per
binary, point estimates only — the full paired-ratio BCa
cluster-bootstrap (the foundation layer's harness applied to a wider
Sourcify corpus) remains a post-merge follow-up.

The speedup column below is the median of per-contract speedups,
not the ratio of the displayed Median baseline / Median HEAD
columns (each of which is itself a median across the stratum's
contracts).

Stratum	n	Median baseline (us)	Median HEAD (us)	Median speedup	Median Δ%
small (<4 KB)	3	71.9	64.9	1.11×	+9.7%
medium (4-16 KB)	4	343.1	341.3	1.12×	+10.4%
large (16-25 KB)	3	1374.7	1003.9	1.25×	+20.0%
overall	10	—	—	1.17×	+14.9%

9 / 10 contracts faster on HEAD. DAI (-21.5%, 7.9 KB) is the one
outlier and is logged for follow-up — not a ship blocker, but warrants
investigation under a wider corpus. Per-contract rows live in
perf-summary.md.

Caveats: selection-biased toward high-traffic mainnet contracts
(USDT/USDC/Uniswap/WETH cluster); n=10 is too thin to support any
confidence-interval claim; this is a directional sanity check, not
production-grade methodology.

Synthetic stress (algorithmic-DoS regime, not production scale)

EIP-170 caps real contract bytecode at 24 576 bytes, so the deployable
upper bound is N ≲ 8000 blocks (at worst-case ~3 B / block packing).
evmCacheComplexityDemo at N=100 000 is outside what a deployed
contract can produce and ships only as an algorithmic-DoS regression
guard. With that caveat front-loaded:

N	upstream/main (us)	This PR (us)	Cumulative speedup
10 000	14 110	2 476	5.7×
20 000	45 862	4 876	9.4×
50 000	246 615	13 972	17.7×
100 000	959 509	29 065	33.0×

Of the 33× at N=100 000, the foundation layer contributes 18.6× (the
iterative-bitset dominator was the dominant cost at upstream/main) and
the fusion layer contributes the remaining 1.78× on top. Real deployed
contracts fall in the N=100-2000 band where the proportional gain
compresses substantially — see the production-scale pilot above for
an empirical anchor.

The pipeline goes from super-linear 2× N → 4× time (upstream/main,
matching the O(N²/64) bitset dataflow) to fully linear 2× N → 2.0× time on HEAD.

What's in this PR

28 commits = 17 implementation/test/tooling + 11 docs/review. Diff:
45 files, +6347 / -244. Source footprint: 11 files / +1896 / -244
under src/, tests/corpus/evm-cache/, and tools/; the rest is
docs.

Foundation layer (commits 48fada6..592fd35, 11 commits):

- 48fada6 replace iterative-bitset dominator with CHK
- 1be3f39 inline dom-tree children adjacency
- 62ef503 opt-in ZEN_EVM_CACHE_PROFILE per-phase instrumentation
- 3c659f6 bytecode-replay demo mode + structural dominator GTests
- 9df8ee8 bench_evm_cache.sh + analyze_evm_cache_bench.py
  (paired-ratio BCa cluster-bootstrap; Efron-Tibshirani §14.3)
- a75ab11 Sourcify + top-RPC corpus fetchers
- 92c6c04, 04d0a55, b00efa1, 8a95175, 592fd35 change doc +
  review fixes

Fusion layer (commits e06d291..911f8c1, 17 commits):

- Bytecode-walk fusion: e06d291 buildGasBlocks 2-pass → 1-pass;
  3bba649 collectJumpDests fold.
- CSR adjacency + conditional Tarjan: 0dd5bb9 CSRGraph flatten
  after splitCriticalEdges; 4d74033 chkFixpointRounds diagnostic
  counter; 6e1bc6b conditional InCycle on reducible CFGs (skips
  Tarjan SCC when reducible).
- Edge-build fusion + RPO share: de934a8 buildCFGEdges single-sweep;
  118c993 computeReverseTopo reads DomInfo::RPO.
- GasBlock compaction: 55a250b Blocks.reserve(CodeSize) +
  emplace_back; 689e5d5 Succs/Preds extracted into parallel
  EdgeTables (GasBlock 80 → 40 B); f7630d8 field reorder packs to
  exact 32 B (static_assert-locked).
- 77e0454 clang-format sweep (no semantic change).
- 4f9f5be, c5db655, de507df, ab74da5, 99a666c, 911f8c1
  change doc + module spec + production-scale pilot + review fixes.

Safety invariant on irreducible CFGs is preserved by lemma614Update's
effectivePredCount multi-pred guard (evm_cache.cpp:1224), not by
the conditional InCycle fast-path. A future-contributor warning in
docs/modules/evm/cache-build.md §Invariants explicitly states the
multi-pred guard must NOT be removed on the assumption that InCycle
covers it. Counterexample (irreducible 2-entry cycle A ↔ B) included
in that warning.
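
The counterexample can be sketched in a few lines. This is a hypothetical three-block CFG (entry, A, B), not code from evm_cache.cpp: the immediate dominators are written out by hand, the helper names are invented for illustration, and the real effectivePredCount logic is assumed to count predecessors roughly as Preds.size() does here. It shows that dominator-based loop discovery finds no back-edge in A ↔ B, so a union-of-natural-loops InCycle would mark nothing, while both cycle members still have two predecessors for a multi-pred guard to see.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical 3-block CFG: 0 = entry, 1 = A, 2 = B.
// Edges: entry->A, entry->B, A->B, B->A (one cycle with two entry points).
struct TinyCFG {
  std::vector<std::vector<uint32_t>> Succs{{1, 2}, {2}, {1}};
  std::vector<std::vector<uint32_t>> Preds{{}, {0, 2}, {0, 1}};
};

// Immediate dominators written out by hand: A and B are both reachable
// directly from the entry, so the entry is the idom of each (and of itself).
inline std::vector<uint32_t> tinyIDom() { return {0, 0, 0}; }

// Walk the idom chain upward from B; true if A is met on the way.
inline bool dominatesByChain(const std::vector<uint32_t> &IDom, uint32_t A,
                             uint32_t B) {
  assert(A < IDom.size() && B < IDom.size());
  for (;;) {
    if (A == B)
      return true;
    if (IDom[B] == B)
      return false; // reached the root without meeting A
    B = IDom[B];
  }
}

// Count edges U->V whose target V dominates the source U. These are the
// back-edges that dominator-based loop discovery turns into natural loops.
inline uint32_t countDominatorBackEdges(const TinyCFG &G,
                                        const std::vector<uint32_t> &IDom) {
  uint32_t Count = 0;
  for (uint32_t U = 0; U < G.Succs.size(); ++U)
    for (uint32_t V : G.Succs[U])
      if (dominatesByChain(IDom, V, U))
        ++Count;
  return Count;
}
```

On this graph countDominatorBackEdges returns 0 even though A and B sit on a cycle, which is why the fast-path alone cannot carry the safety argument.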

Test plan

- tools/format.sh check clean
- cmake --build build --target dtvmapi -j$(nproc) succeeds with
  no new warnings (use CCACHE_DISABLE=1 if ccache mount is read-only)
- build/evmCacheTests: 14 / 14 pass (10 dominator + 4
  implicit-dyn-pred)
- evmone-statetest --vm external_vm -k fork_Cancun: 2723 / 2723
  pass (~80 s)
- evmCacheComplexityDemo at N=10k/20k/50k/100k: monotone
  improvement vs upstream/main
- Production-scale pilot on 10 mainnet contracts: 9 / 10 faster
  than upstream/main; DAI flagged for follow-up
- Fusion-layer N=100k headline (-41% / 1.69×) independently
  re-measured at 1.67× / -40.2% (within ±10%)

Out of scope / future work

- Stack-SSA + SCCP, dropped on data: 92.5% (statetest) / 98.4%
  (evmone-bench) of JUMPs already statically resolved by the existing
  PUSH→JUMP heuristic; expected runtime gain < 1% against 500+ LoC
  of SSA construction.
- SemiNCA dominator, dropped on data: CHK fixpoint converges in
  2 rounds on every measured workload (logged via chkFixpointRounds
  diagnostic); SemiNCA's second-sweep saving (~1.5 ms) is comparable
  to its own DSU bookkeeping cost.
- Cache-build micro-opts (computeReachable fold /
  buildCFGEdges dedup-skip / buildCSR prefetch /
  GasBlock hot/cold split), gated on the production-scale
  validation follow-up. Pre-committed thresholds in perf-summary.md
  §Future-work: GO requires (i) production N ≲ 8000 paired median
  ≥ +5% AND p95 reduction ≥ 0.2 ms, (ii) end-to-end evmone-bench
  median ≥ +1% / p95 ≥ +3%, (iii) N=2000 paired ≥ 50% of N=100k
  paired, (iv) first-touch p95 reduction ≥ +5%. KILL if any clause
  fails → pivot to runtime / JIT / host-call hotspots.
- UseLinearSPP=false dedicated GTest, deferred to a follow-up
  PR. Current irreducible-fallback path correctness rests on the
  multi-pred guard argument + evmone-statetest end-to-end. See
  docs/changes/2026-05-17-evm-cache-build-fusion/README.md.

🤖 Generated with Claude Code

…nedy algorithm
Replace the iterative bitset dataflow in computeDominators with Cooper-Harvey-Kennedy 2001 (CHK) producing an immediate-dominator array, augmented with Tarjan DFS pre/post times (DomInfo::Enter/Exit) so the two consumers (findBackEdgesUsingDominators, buildLoopsUsingDominance) answer dominance queries in O(1) via interval containment. Memory drops from O(N^2) bits to 3N uint32_t. Time drops from O(N^2/64) worst-case to O(N + E) typical for the reducible CFGs that EVM bytecode produces.
evmCacheComplexityDemo speedups vs the post-DTVMStack#446 bitset path:
  N=10000: 10.38 ms -> 3.38 ms (3.1x)
  N=20000: 43.68 ms -> 5.90 ms (7.4x)
  N=50000: - -> 14.48 ms
  N=100000: 948 ms -> 38.95 ms (24.3x; user-provided pre-PR number)
Class A/B/C self-root seeding moved to init time so descendants of a class-C node can intersect against a settled root in step 4 of the fixpoint, preserving the old bitset pass's Dom[descendant] semantics (verified by ClassCDescendant_SeedsAtInit).
Gates (all pass):
- format check on PR-changed files clean
- dtvmapi build no new warnings in PR-touched files
- evmone-unittests multipass 223/223
- evmone-unittests interpreter 215/215
- evmone-statetest -k fork_Cancun multipass 2723/2723 (zero new failures)
- evmCacheTests 9/9 (4 implicit-dyn-pred + 5 new dominator)
- evmCacheComplexityDemo gate thresholds met by >=2x margin
Spec and reviews: docs/changes/2026-05-12-evm-dom-chk/.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
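
The interval-containment query described in this commit can be illustrated with a minimal sketch. It assumes an immediate-dominator array is already available and that every node is reachable from the root; the struct and function names (DomIntervals, numberDomTree) are hypothetical and are not the DomInfo implementation. The bounds check is placed before the A == B shortcut, matching the ordering a later review-fix commit describes.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

struct DomIntervals {
  std::vector<uint32_t> Enter, Exit;

  // A dominates B iff B's DFS interval nests inside A's interval.
  // Bounds are checked first so out-of-range equal arguments return false.
  bool dominates(uint32_t A, uint32_t B) const {
    if (A >= Enter.size() || B >= Enter.size())
      return false;
    if (A == B)
      return true;
    return Enter[A] <= Enter[B] && Exit[B] <= Exit[A];
  }
};

// Stamp pre/post DFS clocks on the dominator tree given by IDom, where
// IDom[Root] == Root. Assumes every node is reachable from Root.
inline DomIntervals numberDomTree(const std::vector<uint32_t> &IDom,
                                  uint32_t Root) {
  const uint32_t N = static_cast<uint32_t>(IDom.size());
  assert(Root < N);
  std::vector<std::vector<uint32_t>> Children(N);
  for (uint32_t V = 0; V < N; ++V)
    if (V != Root)
      Children[IDom[V]].push_back(V);

  DomIntervals D;
  D.Enter.assign(N, 0);
  D.Exit.assign(N, 0);
  uint32_t Clock = 0;
  // Explicit stack of (node, next child index) avoids deep recursion.
  std::vector<std::pair<uint32_t, uint32_t>> Stack;
  D.Enter[Root] = Clock++;
  Stack.emplace_back(Root, 0u);
  while (!Stack.empty()) {
    const uint32_t Node = Stack.back().first;
    uint32_t &Next = Stack.back().second;
    if (Next < Children[Node].size()) {
      const uint32_t Child = Children[Node][Next++];
      D.Enter[Child] = Clock++;
      Stack.emplace_back(Child, 0u); // Next is not used after this point
    } else {
      D.Exit[Node] = Clock++;
      Stack.pop_back();
    }
  }
  return D;
}
```

After the one-off numbering pass, each dominates call is two array reads per argument and a pair of comparisons, independent of graph size.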

Replace `vector<vector<uint32_t>> Children(N)` with a CSR layout (ChildStart[N+1] + ChildIdx[]) so the dom-tree DFS no longer pays N inner-vector heap allocations at large N. Same algorithm; same output; lower constant factor. Also collapse the class-A/B/C init branches in computeDomInfo into a single `HasReachablePred` predicate, since Preds.empty() is just the zero-pred case of the same condition. Behavior unchanged. Gates: format clean, dtvmapi build clean, evmCacheTests 9/9, multipass unittests 223/223, scaling demo within ±10% noise of prior commit (still meets N=20k<15ms / N=100k<100ms gates). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
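
As an illustration of the ChildStart[N+1] + ChildIdx[] layout named above, the following sketch builds it from an immediate-dominator array with a counting pass, a prefix sum, and a scatter pass. The struct name and function name are assumptions for this example; only the two array names come from the commit message.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct DomChildrenCSR {
  std::vector<uint32_t> ChildStart; // N + 1 offsets into ChildIdx
  std::vector<uint32_t> ChildIdx;   // children of node P live in
                                    // [ChildStart[P], ChildStart[P + 1])
};

// Build the children adjacency of a dominator tree in two flat arrays.
// IDom[Root] == Root; every other entry names the node's parent.
inline DomChildrenCSR buildDomChildren(const std::vector<uint32_t> &IDom,
                                       uint32_t Root) {
  const uint32_t N = static_cast<uint32_t>(IDom.size());
  DomChildrenCSR C;
  C.ChildStart.assign(N + 1, 0);
  // Pass 1: count children per parent, stored one slot to the right.
  for (uint32_t V = 0; V < N; ++V)
    if (V != Root) {
      assert(IDom[V] < N);
      ++C.ChildStart[IDom[V] + 1];
    }
  // Prefix sum turns the counts into start offsets.
  for (uint32_t I = 0; I < N; ++I)
    C.ChildStart[I + 1] += C.ChildStart[I];
  // Pass 2: scatter each child into its parent's slice.
  C.ChildIdx.resize(C.ChildStart[N]);
  std::vector<uint32_t> Cursor(C.ChildStart.begin(), C.ChildStart.end() - 1);
  for (uint32_t V = 0; V < N; ++V)
    if (V != Root)
      C.ChildIdx[Cursor[IDom[V]]++] = V;
  return C;
}
```

The whole structure costs three allocations regardless of N (the two arrays plus the temporary cursor), compared with one inner vector per node in the nested layout.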

Instrument buildGasChunksSPP with 13 named phase boundaries gated by ZEN_EVM_CACHE_PROFILE compile-time flag. OFF (default) macro-elides all chrono calls so release builds carry zero overhead. ON emits one stderr CSV row per phase: EVM_CACHE_PROFILE,<phase>,<microseconds>.
Phases timed:
- buildGasBlocks, collectJumpDests, buildCFGEdges, splitCriticalEdges
- computeReachable (incl. dyn-target stitch), computeDomInfo
- findBackEdges, computeReverseTopo, computeInCycle
- buildLoopsUsingDominance, meteringInit, lemma614Schedule, writeback
Enables per-phase profiling of real-corpus contracts to drive follow-up PR ordering (which phase dominates real workloads).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
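
One way such compile-time-gated instrumentation could look is sketched below. The PhaseTimer class, the EVM_CACHE_PHASE macro, and the formatProfileRow helper are hypothetical names for this example, and the sketch assumes the build option surfaces as a ZEN_EVM_CACHE_PROFILE preprocessor definition; only the CSV row format is taken from the commit message.

```cpp
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <string>

// Row format quoted above: EVM_CACHE_PROFILE,<phase>,<microseconds>
inline std::string formatProfileRow(const std::string &Phase, int64_t Micros) {
  return "EVM_CACHE_PROFILE," + Phase + "," + std::to_string(Micros);
}

#ifdef ZEN_EVM_CACHE_PROFILE
// Hypothetical RAII timer: prints one CSV row to stderr when its scope ends.
class PhaseTimer {
public:
  explicit PhaseTimer(const char *PhaseName)
      : Phase(PhaseName), Begin(std::chrono::steady_clock::now()) {}
  ~PhaseTimer() {
    const auto Elapsed = std::chrono::duration_cast<std::chrono::microseconds>(
        std::chrono::steady_clock::now() - Begin);
    std::fprintf(stderr, "%s\n",
                 formatProfileRow(Phase, Elapsed.count()).c_str());
  }

private:
  const char *Phase;
  std::chrono::steady_clock::time_point Begin;
};
#define EVM_CACHE_PHASE(Name) PhaseTimer EvmCachePhaseTimer_(Name)
#else
// Flag off: the macro expands to a no-op, so no chrono call is compiled in.
#define EVM_CACHE_PHASE(Name) ((void)0)
#endif
```

A phase would be wrapped as `{ EVM_CACHE_PHASE("buildGasBlocks"); /* phase body */ }`. Because the timer variable has a fixed name, this sketch allows one timed phase per scope.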

evm_cache_complexity_demo: support --bytecode <hex-or-bin-file> [--label <tag>] to time cache build on real contract bytecode. CSV row <label>,<n_jumpdests>,<build_us> on stdout. Hex auto-detects 0x prefix and whitespace, falls back to raw binary.
evm_cache_tests: add 5 structural dominator cases covering self-loops, irreducible multi-entry SCCs, nested loops with shared exits, post-split critical-edge diamonds, and dynamic-target JUMPDESTs inside static loops. Each case asserts IDom well-formedness via a shared helper plus behavioural invariants on cycle members.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

bench_evm_cache.sh: spawn one fresh process per repetition (avoids intra-process cache reuse); emit long-form CSV (label,n_jumpdests,run_idx,phase,phase_us). `phase=total` comes from the demo's stdout; per-phase rows are picked up from the demo's stderr when the binary is built with -DZEN_EVM_CACHE_PROFILE=ON.
analyze_evm_cache_bench.py: cluster bootstrap on contracts (per-contract unit, WITH replacement, N=1000 by default); BCa with jackknife `a` (leave-one-contract-out, Efron 1987) and median-bias `z_0`; gate inverts on the `total` phase as r_upper_CI <= 0.85 (= improvement_lo >= 15%).
Sanity: baseline=treatment gives r_median=1.0 with degenerate CI; a synthetic 50% improvement on 10 contracts gives r=0.504, CI (0.490, 0.508), gate PASS.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

fetch_topcontracts.py: curated list of ~90 high-traffic mainnet contracts (stables, DEX routers, lending markets, NFT marketplaces, infra); pulls runtime bytecode via public JSON-RPC (eth_getCode); dedupe by codehash; writes hex + per-contract meta JSON. Used for the primary paired-ratio bench corpus (production-grade workload, ~80 unique contracts).
fetch_sourcify_corpus.py: pulls verified contracts via Sourcify v2 REST API (`/contracts/{chainId}` + `/contract/.../?fields=runtimeBytecode, metadata`); supplies solc_version / optimizer_runs / viaIR metadata for stratified sampling. Higher noise floor than top-RPC (most newly verified contracts are 100-200 byte proxy stubs) but provides 7-strata metadata when needed.
.gitignore: corpus output (raw/ + manifest_*.json) is bench artefact, not source. Fetchers are reproducible; bench results live in spec Results section.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

evm_cache.md: replace the iterative-bitset dataflow description with the Cooper-Harvey-Kennedy algorithm + Tarjan DFS Enter/Exit intervals (`O(N+E)` time, `O(N)` memory, `O(1)` `dominates` queries). Add a section on the optional `-DZEN_EVM_CACHE_PROFILE=ON` per-phase wall-clock CSV emission used to drive `tools/bench_evm_cache.sh`. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Move the spec drafted in ~/changes/2026-05-16-evm-spp-overhaul/ into the project-required location docs/changes/2026-05-16-evm-spp-overhaul/, with all Phase 0.5 motivation reviews + Phase 2 spec reviews retained. DTVM/CLAUDE.md mandates change docs live under docs/changes/ as PR artefacts, overriding the global ~/changes/ SSOT default. Spec status is now Implemented (v3) per the latest Results section in README.md. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Address 4 of 6 Codex round-1 findings (1 NIT skipped, commit reword deferred until Opus returns):
- Production gate: relabel "borderline" -> explicit FAIL on the recalibrated improvement_lo>0 clause, note user override on stratified evidence + algorithmic gate PASS.
- Algorithmic table: refresh with 9-rep median (was 5-rep). 100k now 21.83x; add measurement-variance note acknowledging independent reviewer reruns in the 20-30x range.
- Step 5 scope: narrow the spec claim to IDom-only structural tests; enumerate the loop / SPP / fuzz invariants explicitly deferred and point at evmone-statetest 2723 + existing implicit-dyn-pred GTests for end-to-end coverage.
- analyze_evm_cache_bench.py docstring: cite Efron-Tibshirani 1993 (per Phase 2 R2 accepted nit) instead of Efron 1987.
- fetch_topcontracts.py: split raw/ and meta/ into sibling dirs so the bench harness doesn't mis-interpret .meta.json files as bytecode files; remove sanctioned TornadoCash01 from TOP_CONTRACTS.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Address 5 of 6 Opus round-1 findings (cosmetic NIT 6 left as-is):
- Replace IrreducibleSCC_TwoEntryLoop test with IrreducibleImproperRegion. Old CFG was reducible under DTVM's dom-based loop detection (the two cycle edges had no dominating back-edge target, so zero loops were discovered and the fallback path was never exercised). New CFG is a Hecht-Ullman improper region: 0 -> 1 -> 2 -> 3 -> {1, 4}; 4 -> {2, 5}. Two overlapping back-edges (3->1 and 4->2) produce two loops with body intersection {2, 3} but neither containing the other, forcing the reducibility check to fail. The test asserts IDom correctness on the irreducible region plus the dominator-chain-reaches-root invariant.
- Reorder DomInfo::dominates() bounds check before the A==B shortcut so out-of-range equal arguments do not falsely report mutual dominance.
- evm_cache_for_testing.h: document that computeIDomForTesting is the dominator pass in isolation, with no computeReachable / splitCriticalEdges / reachability-stitch coverage.
- Spec Step 5 prose: add a downgrade note enumerating the per-fixture behavioural claims (InCycle, UseLinearSPP, buildLoopsUsingDominance, GasChunkCostSPP fallback, splitCriticalEdges write-back) and path-total fuzz that were deferred to PR B / PR C.
- Spec Checklist: annotate Step 7 with "production gate FAIL, override approved" so the failure flag is visible at scan time.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

R2 reviewers (Codex + Opus, parallel) reviewed commits b00efa1 + 8a95175 shipped for R1. Codex R2 raised 1 issue (variance band); Opus R2 raised 1 MINOR (test naming/comment drift) + 1 NIT (PR B note). Fixes:
- Variance band (Codex R2 §5): 9-rep median rerun produced 19.26x at N=100k; spec said 20-30x. Refresh to "≈ 19-30x" with the four sampled medians explicitly listed (19.26 / 21.83 / 22.84 / 29.7); gate remains ≥10x.
- IrreducibleImproperRegion mis-naming (Opus R2 MINOR): the new CFG 0->1->2->3->{1,4}; 4->{2,5} produces natural loops {1,2,3,4} and {2,3,4} where the second is properly nested in the first (reducible nest). My R1 fix-attempt comment claimed otherwise. Rename test to OverlappingBackEdgesIDom and rewrite the comment to describe it as a reducible nested case that exercises the CHK intersect finger-walk on a non-trivial back-edge set; soften the §"Step 5 Scope Reduction" wording from "genuinely forces ... irreducible loop nest" to the truer narrative.
- Opus R2 NIT (PR B note): add a structural observation to the spec: dominator-based loop discovery only ever produces a properly-nested loop forest by construction, so exercising the SPP reducibility fallback at evm_cache.cpp:1019-1042 requires buildBytecodeCache-level plumb, not the computeIDomForTesting helper. Documented for PR B/C authors.
Code change (test rename + comment) verified: 14/14 evmCacheTests pass, no other targets touched.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Previously buildGasBlocks ran two passes over the bytecode: pass 2a marked IsBlockStart[CodeSize] for JUMPDEST positions and after-terminator bytes, and pass 2b walked IsBlockStart to construct GasBlock entries. The auxiliary IsBlockStart vector cost CodeSize bytes of allocation + memset and one extra L1/L2-hostile traversal of the bytecode.
Replace with a single walk: each iteration opens a new block at the current Pc, then advances opcode by opcode until either (a) a mid-block OP_JUMPDEST is encountered (which starts a new block), or (b) a gas-chunk terminator is processed (whose successor byte opens the next block). Semantically identical because every block start under the old scheme was either Pc=0, a JUMPDEST position, or the byte right after a terminator -- all three are produced naturally by the fused loop.
Measured on evmCacheComplexityDemo N=100k synthetic (5 reps, median):
  phase buildGasBlocks: 10614 us -> 9250 us (-13%)
  total cache build: 54818 us -> 46260 us (-15%)
The total wins more than the named phase because the eliminated IsBlockStart vector (300 KB for N=100k synthetic) sat in the outer buildBytecodeCache and is no longer allocated or zero-filled.
Verification:
- evmCacheTests: 14/14 pass
- evmone-statetest --vm external_vm -k fork_Cancun: 2723/2723 pass
- No behavioral change in callers; signature unchanged.

collectJumpDests previously re-scanned the entire bytecode after buildGasBlocks, allocated a SeenBlocks[Blocks.size()] dedup vector, and mapped each JUMPDEST byte through BlockAtPc to recover the unique set of JUMPDEST-leading blocks. Every JUMPDEST byte in valid EVM code already starts a new gas block, so the dedup is structurally unnecessary and the re-scan is pure duplication of the buildGasBlocks walk.
Emit JumpDestBlocks inline: each iteration of buildGasBlocks reads the opening opcode of the new block; if it is OP_JUMPDEST, push the block id that is about to be assigned. Output is identical to the prior pass in both set membership and block-id ascending order; downstream buildCFGEdges and reachability seeding consume the list as an unordered set so any iteration order is acceptable.
Measured on evmCacheComplexityDemo N=100k (5 reps, median):
  total: 46260 us -> 42813 us (-7%, this commit)
  total vs main: 54818 us -> 42813 us (-22%, cumulative w/ prior fusion)
The phase formerly named EVM_CACHE_PROFILE,collectJumpDests is now absent from profile output; its 0.4 ms instrumented cost plus an equivalent amount of un-instrumented bytecode-rescan + SeenBlocks zero-fill is reclaimed.
Verification:
- evmCacheTests: 14/14 pass
- evmone-statetest --vm external_vm -k fork_Cancun: 2723/2723 pass
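
The fused walk from the two commits above can be sketched as follows. This is a simplified illustration, not the buildGasBlocks source: the Block and BlockList types are hypothetical, the terminator set is an assumption chosen for the example, and gas accounting is left out entirely. It shows one pass that both splits blocks and records which blocks open with a JUMPDEST.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Block {
  uint32_t Start; // first byte of the block
  uint32_t End;   // one past the last byte
};

struct BlockList {
  std::vector<Block> Blocks;
  std::vector<uint32_t> JumpDestBlocks; // ids of blocks opening with JUMPDEST
};

constexpr uint8_t OP_JUMPDEST = 0x5b;

// Assumed terminator set for this sketch only.
inline bool isTerminator(uint8_t Op) {
  return Op == 0x00 /*STOP*/ || Op == 0x56 /*JUMP*/ || Op == 0x57 /*JUMPI*/ ||
         Op == 0xf3 /*RETURN*/ || Op == 0xfd /*REVERT*/ ||
         Op == 0xfe /*INVALID*/ || Op == 0xff /*SELFDESTRUCT*/;
}

// PUSH1..PUSH32 (0x60..0x7f) carry 1..32 immediate bytes; all else is 1 byte.
inline uint32_t opcodeLen(uint8_t Op) {
  return (Op >= 0x60 && Op <= 0x7f) ? 2u + (Op - 0x60u) : 1u;
}

inline BlockList buildBlocksOnePass(const std::vector<uint8_t> &Code) {
  BlockList Out;
  const uint32_t Size = static_cast<uint32_t>(Code.size());
  uint32_t Pc = 0;
  while (Pc < Size) {
    const uint32_t Start = Pc;
    // Fold of the JUMPDEST collection: record the id about to be assigned.
    if (Code[Pc] == OP_JUMPDEST)
      Out.JumpDestBlocks.push_back(static_cast<uint32_t>(Out.Blocks.size()));
    // Consume opcodes until a terminator or a mid-block JUMPDEST.
    do {
      const uint8_t Op = Code[Pc];
      Pc += opcodeLen(Op); // skips PUSH immediates, so data bytes are ignored
      if (isTerminator(Op))
        break;
    } while (Pc < Size && Code[Pc] != OP_JUMPDEST);
    if (Pc > Size)
      Pc = Size; // truncated PUSH at the end of the code
    Out.Blocks.push_back({Start, Pc});
  }
  return Out;
}
```

Block ids are pushed in the order blocks are created, so JumpDestBlocks comes out in ascending order without any dedup structure.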

…sses
Previously every GasBlock owned a std::vector<uint32_t> Preds and Succs. With N gas blocks that materialises 2N small heap allocations, and every neighbour-iteration walks a pointer to a scattered heap chunk. The hot SPP passes (computeDomInfo CHK intersect over Preds, computeInCycle SCC DFS, findBackEdges over Succs, buildLoopsUsingDominance over both) all pay this pointer-chase tax per node.
Flatten both directions into a contiguous CSR adjacency once after splitCriticalEdges finishes mutating the graph, then route every downstream reader through the new SuccsCSR / PredsCSR. The per-block vectors stay live (we copy out, not swap) -- N std::vector dealloc()s back-to-back cost more than the readers reclaim, so we trade short-lived peak memory for time.
Reader-side measurement on evmCacheComplexityDemo N=100k (25 reps):
  pre-CSR (commit 3bba649) median = 44797 us
  post-CSR (this commit) median = 39475 us (-11.9%)
Per-phase breakdown shifts the cost from many "Preds/Succs reader" rows into a single buildCSR row plus much faster readers:
  computeDomInfo 7233 -> 4169 us (-42%)
  computeInCycle 5694 -> 3842 us (-32%)
  computeReachable 1818 -> 970 us (-47%)
  findBackEdges 1169 -> 342 us (-71%)
  buildLoops 1309 -> 423 us (-68%)
  computeReverseTopo 1651 -> 1114 us (-32%)
  buildCSR 0 -> 3985 us (new, single up-front cost)
Cumulative vs perf/evm-spp-foundation baseline (PR A binary at 54818 us): 54818 us -> 39475 us (-28.0%)
Mutating helpers (buildCFGEdges, splitCriticalEdges, addEdge) still operate on the per-block vectors. CSR is built once after the mutations finish, so addEdge / erase semantics in those phases are unchanged. The testing helper computeIDomForTesting now builds its own CSR pair in-place from the input Succs[] adjacency, matching the production flow.
Verification:
- evmCacheTests: 14/14 pass
- evmone-statetest --vm external_vm -k fork_Cancun: 2723/2723 pass

Adds a single counter (gated on ZEN_EVM_CACHE_PROFILE) that prints how many RPO sweeps the CHK fixpoint took to settle. Used to answer "would SemiNCA help here?". Measurement on evmCacheComplexityDemo at N = 10k / 20k / 50k / 100k synthetic shows the fixpoint settles in exactly 2 rounds in every case -- one productive sweep plus one confirmation sweep -- so SemiNCA's single-pass advantage caps at roughly half of computeDomInfo's time, well under the cost of its eval/link forest bookkeeping. Zero runtime cost when ZEN_EVM_CACHE_PROFILE is off (macro elides).

computeInCycle previously ran an unconditional Tarjan SCC pass to mark every block that participates in a cycle. On a reducible CFG -- the common case for compiler-emitted EVM code -- this work is redundant: every cycle is the natural loop of some back-edge, and buildLoopsUsingDominance already enumerates those natural loops with their NodeMask bitmaps. The union of NodeMasks equals Tarjan's in-cycle set, so we can derive InCycle in one bitset OR sweep instead of running a second full DFS pair over Succs and Preds.
Pipeline reorder: buildLoopsUsingDominance now runs before InCycle so its UseLinearSPP result and Loops vector are available to choose the cheap path.
Reducible path (UseLinearSPP=true): OR all Loops[].NodeMask bitmaps into a CycleBits vector, then expand into the existing uint8_t InCycle vector. Empty Loops vector yields all-zero InCycle, which is correct -- an acyclic CFG has nothing in a cycle.
Irreducible path (UseLinearSPP=false): keep the full Tarjan SCC. Dominator-based loop discovery can miss multi-entry cycles that have no single header, and lemma614Update relies on InCycle correctness to refuse gas shifts across cycles. The Tarjan backstop preserves soundness for these cases (rare in practice -- statetest 2723 shows no irreducible contracts trigger the fallback at scale).
Measured on evmCacheComplexityDemo N=100k (50 reps, median):
  pre (commit 4d74033): 41247 us
  post (this commit): 39592 us (-4.0%)
Phase delta: computeInCycle 3842 us -> 74 us; buildLoopsUsingDominance absorbs ~1.2 ms of cold-cache cost from running first instead of second. Net ~1.6 ms gain on synthetic, consistent across the IQR band.
Verification:
- evmCacheTests: 14/14 pass (covers IrreducibleImproperRegion fallback path indirectly through computeIDomForTesting; full Tarjan branch exercised below)
- evmone-statetest --vm external_vm -k fork_Cancun: 2723/2723 pass
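
The reducible-path derivation (OR the loop bitmaps, then expand to bytes) is small enough to sketch directly. The Loop struct below is a stand-in that keeps only a NodeMask of 64-bit words, and inCycleFromLoops is a hypothetical name; the word width and bit ordering are assumptions of this example rather than details confirmed by the commit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Loop {
  // Bit B of the mask is set when block B belongs to this natural loop.
  std::vector<uint64_t> NodeMask;
};

// Union all loop bitmaps, then expand the result into one byte per block.
inline std::vector<uint8_t> inCycleFromLoops(const std::vector<Loop> &Loops,
                                             uint32_t NumBlocks) {
  const std::size_t Words = (static_cast<std::size_t>(NumBlocks) + 63) / 64;
  std::vector<uint64_t> CycleBits(Words, 0);
  for (const Loop &L : Loops) {
    assert(L.NodeMask.size() == Words);
    for (std::size_t W = 0; W < Words; ++W)
      CycleBits[W] |= L.NodeMask[W];
  }
  std::vector<uint8_t> InCycle(NumBlocks, 0);
  for (uint32_t B = 0; B < NumBlocks; ++B)
    InCycle[B] = static_cast<uint8_t>((CycleBits[B / 64] >> (B % 64)) & 1u);
  return InCycle;
}
```

With an empty Loops vector the result is all zeros, matching the acyclic case described above. The sweep touches each mask word once and never walks an edge.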

buildCFGEdges previously walked Blocks twice. The first pass called resolveConstantJumpTarget on every JUMP block solely to count the unresolved dynamic jumps and stamp ImplicitDynamicPredCount on every JUMPDEST. The second pass walked Blocks again to add fallthrough and jump-target edges, calling resolveConstantJumpTarget a second time on each JUMP block to recover the same answer.
Collapse into one pass: count DynamicJumpCount inline while emitting edges, then stamp the JUMPDESTs at the end. addEdge does not depend on ImplicitDynamicPredCount being set, so deferring the stamp is safe.
Measured on evmCacheComplexityDemo N=100k (50 reps):
  phase buildCFGEdges: 5315 us -> 4766 us (-10%)
  total cache build: 39592 us -> 38595 us (-2.5%)
The phase win cancels half the per-call resolveConstantJumpTarget cost (the function is a pure function of Block + constants, so the second call returned the same answer with no side effect).
Verification:
- evmCacheTests: 14/14 pass
- evmone-statetest --vm external_vm -k fork_Cancun: 2723/2723 pass

computeReverseTopo previously ran its own full DFS over Succs to produce a postorder list, explicitly skipping back-edges. That DFS is semantically identical to the DFS computeDomInfo already runs: visit each reachable node once, never follow back-edges (back-edge targets are visited ancestors, so the "visited" check rejects them anyway, making the explicit BackEdges filter redundant). Both produce the same forward-DAG postorder.
Expose computeDomInfo's RPO as a DomInfo::RPO field. computeReverseTopo collapses to a reverse copy of Dom.RPO -- O(N) memory traversal instead of O(N+E) DFS. The defensive second pass in computeDomInfo (that visits unreachable components after the main reachable DFS) is preserved, so RPO covers every block id, matching computeReverseTopo's previous output set.
Measured on evmCacheComplexityDemo N=100k (50 reps):
  phase computeReverseTopo: 1203 us -> 371 us (-69%)
  total cache build: 38595 us -> 38534 us (-0.2%, within noise)
Total wall-clock barely moves because the freed cycles re-emerge as slight increases in adjacent phases via cache effects -- the work shifted rather than disappeared. The win is structural: less code, one fewer DFS, RPO available for future passes that could subsume RevTopo entirely.
Verification:
- evmCacheTests: 14/14 pass
- evmone-statetest --vm external_vm -k fork_Cancun: 2723/2723 pass

Pure clang-format adjustments to function signatures and continuation line breaks introduced over the InCycle / RPO / buildCFGEdges fusion commits. No semantic changes.

…oc cost
buildGasBlocks previously default-constructed a stack-local GasBlock, filled it across the inner opcode loop, then std::move'd it into Blocks via push_back. Each push_back paid two costs:
1. Move construction -- 80 bytes copied from stack-local Block into the vector slot.
2. Geometric capacity growth -- log2(N) reallocations during build, each copying the entire prefix (~half of final size on average). For N=100k blocks that is roughly 4 MB of memmove traffic that contributes nothing to the result.
Replace with the two changes that drop both costs:
- Blocks.reserve(CodeSize) up front. Worst-case bound: opcodeLen >= 1 so block count is bounded by CodeSize. Real EVM averages 3-10 bytes/block so this over-reserves transiently by 3-10x, but the saved realloc copies dominate. For EIP-170 production code (24576 B max) the reserve costs ~1.9 MB; for the synthetic stress demo at N=100k (CodeSize ~300 KB) it costs ~24 MB transient.
- emplace_back() the new block into Blocks directly; bind a reference Blocks.back() (== emplace_back's return) and fill the block in place. No stack-local intermediate, no move.
Measured on evmCacheComplexityDemo N=100k (100 reps):
  phase buildGasBlocks: 10815 us -> 5108 us (-53%)
  total cache build: 35170 us -> 31683 us (-9.9%)
This is the single biggest win in the PR after the initial fusion. The reserve calculation is conservative on purpose: knowing the exact final block count would need another bytecode pass, which would itself cost ~1 ms at this scale.
Verification:
- evmCacheTests: 14/14 pass
- evmone-statetest --vm external_vm -k fork_Cancun: 2723/2723 pass
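
The reserve-then-fill-in-place pattern can be demonstrated in isolation. The GasBlock stand-in below has only three fields and the countReallocs helper is hypothetical; it appends one block per iteration and counts how often the vector's storage moves, with and without the up-front reserve.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Reduced stand-in for the real GasBlock; field set is illustrative only.
struct GasBlock {
  uint32_t Start = 0;
  uint32_t End = 0;
  uint64_t Cost = 0;
};

// Append NumBlocks blocks in place and report how many times the vector's
// backing storage changed address while doing so.
inline uint32_t countReallocs(uint32_t NumBlocks, bool ReserveFirst) {
  std::vector<GasBlock> Blocks;
  if (ReserveFirst)
    Blocks.reserve(NumBlocks); // upper bound known before the walk starts
  uint32_t Reallocs = 0;
  for (uint32_t Pc = 0; Pc < NumBlocks; ++Pc) {
    const GasBlock *Before = Blocks.data();
    Blocks.emplace_back();       // construct directly in the vector slot
    GasBlock &B = Blocks.back(); // fill in place, no stack-local temporary
    B.Start = Pc;
    B.End = Pc + 1;
    B.Cost = 1;
    if (Blocks.data() != Before)
      ++Reallocs;
  }
  return Reallocs;
}
```

With the reserve, the standard guarantees no reallocation while size stays within capacity, so the count is zero; without it the count grows with the number of capacity doublings.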

GasBlock previously embedded two std::vector<uint32_t> (Succs, Preds) inline. Each vector occupies 24 bytes of control fields, so the block header bloated to ~80 bytes per entry. Every pass that iterated Blocks to read scalar fields (computeDomInfo class B/C init, buildLoops body scans, meteringInit Cost copy, lemma614Update opcode/cost reads, writeback Start/End/Cost emit) paid this 2x cache stride.
Replace with a parallel EdgeTables that holds two std::vector<vector<>> keyed by block id. The CFG-build phase (buildCFGEdges, addEdge, splitCriticalEdges) now operates on EdgeTables; the flatten step (buildAdjacencyCSR) reads from EdgeTables. Downstream readers were already on the CSR after the earlier CSR commit, so nothing else needed touching.
GasBlock shrinks from ~80 -> ~40 bytes (4 uint32 PCs + 2 uint8 opcodes + uint64 cost + uint32 dyn-pred count = 30 bytes payload, 40 with padding). Iterating Blocks halves the cache traffic and the default constructor stops zero-filling two 24-byte vector control structs per emplace.
Measured on evmCacheComplexityDemo N=100k (100 reps):
  phase buildGasBlocks: 5108 us -> 2515 us (-51%)
  phase buildCSR: 3929 us -> 2980 us (-24%)
  phase splitCriticalEdges: 751 us -> 395 us (-47%)
  phase writeback: 671 us -> 368 us (-45%)
  total cache build: 31683 us -> 28642 us (-9.6%)
The buildGasBlocks win compounds with the prior reserve+emplace commit: now each emplaced GasBlock is half the size and has no vector ctor to invoke. The writeback win is pure stride compression on a tight loop over Blocks.
Cumulative vs perf/evm-spp-foundation HEAD (47429 us at N=100k): 47429 -> 28642 us = -39.6%.
Verification:
- evmCacheTests: 14/14 pass
- evmone-statetest --vm external_vm -k fork_Cancun: 2723/2723 pass

After moving Succs/Preds out, GasBlock was 40 bytes -- the lone 8-byte Cost sat after a uint32 ImplicitDynamicPredCount, leaving a 4-byte trailing pad to satisfy the struct's 8-byte alignment.
Reorder so all five 32-bit fields cluster first (Start, End, LastPc, PrevPc, ImplicitDynamicPredCount), followed by the two 1-byte opcodes + 2-byte tail pad to reach the 24-byte mark, then the 8-byte Cost. Total = 32 bytes exact, two blocks per cache line, no trailing pad. Locked in with a static_assert so future field additions get flagged.
Measured on evmCacheComplexityDemo N=100k (100 reps):
  phase buildGasBlocks: 2515 us -> 2157 us (-14%, less zero-init/emplace)
  phase writeback: 368 us -> 331 us (-10%)
  phase splitCriticalEdges: 395 us -> 361 us (-9%)
  total cache build: 28642 us -> 28180 us (-1.6%)
The buildGasBlocks win is the default-constructor doing less work per emplace_back (32 bytes of zeroed memory instead of 40). The writeback and split wins are from the tighter Block stride in their iteration loops.
Verification:
- evmCacheTests: 14/14 pass
- evmone-statetest --vm external_vm -k fork_Cancun: 2723/2723 pass
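
The effect of the reorder can be checked with a pair of toy structs. The field order in PackedGasBlock follows the description above; the opcode field names (FirstOpcode, LastOpcode) are invented for the example, and LooseGasBlock is only one hypothetical ordering that wastes space, not a claim about the exact previous layout.

```cpp
#include <cstddef>
#include <cstdint>

// One hypothetical ordering where the 8-byte Cost forces interior and
// trailing padding on ABIs that align uint64_t to 8 bytes.
struct LooseGasBlock {
  uint32_t Start, End, LastPc, PrevPc;
  uint8_t FirstOpcode, LastOpcode;
  uint64_t Cost;
  uint32_t ImplicitDynamicPredCount;
};

// Order described in the commit: five 32-bit fields, two opcode bytes,
// two bytes of padding up to offset 24, then the 8-byte Cost.
struct PackedGasBlock {
  uint32_t Start, End, LastPc, PrevPc, ImplicitDynamicPredCount; // 0..19
  uint8_t FirstOpcode, LastOpcode;                               // 20..21
  uint64_t Cost;                                                 // 24..31
};

// Compile-time lock: adding or reordering a field trips these checks.
static_assert(sizeof(PackedGasBlock) == 32, "PackedGasBlock must stay 32 B");
static_assert(offsetof(PackedGasBlock, Cost) == 24, "Cost must sit at 24");
static_assert(64 % sizeof(PackedGasBlock) == 0, "two blocks per 64 B line");
```

Keeping the size check as a static_assert means a future field addition fails at compile time instead of silently widening the stride of every loop over Blocks.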

Full-tier spec covering all 11 commits on perf/cache-build-fusion: phase fusion (buildGasBlocks 2-pass merge, collectJumpDests fold, buildCFGEdges single sweep), CSR adjacency + conditional Tarjan, DomInfo::RPO share with computeReverseTopo, and the GasBlock compaction trio (reserve + emplace_back, Succs/Preds split into EdgeTables, 32-byte field repack). Documents the data behind dropping PR B (Stack-SSA: 92.5/98.4% JUMPs already static; <1% expected runtime win) and SemiNCA (CHK fixpoint converges in 2 rounds at every measured N). Cross-N speedup table vs perf/evm-spp-foundation baseline (100-rep median): -21% at N=10k scaling to -41% at N=100k.

Both reviewers returned REVISE. Fixes applied:

Major (Opus M-1/M-2/M-3, Codex C4/C5):

- Remove fabricated "IrreducibleImproperRegion" test reference (the test is OverlappingBackEdgesIDom and its own comment disclaims fallback coverage). State that no unit test currently drives UseLinearSPP=false; end-to-end soundness comes from statetest.
- Rewrite R2 soundness argument: InCycle=union(natural-loops) is a *performance* fast-path, not the safety mechanism. Soundness on irreducible CFGs is provided by lemma614Update's multi-pred guard via effectivePredCount, since every SCC-internal node has at least one in-cycle predecessor pushing its count >= 2. Added explicit warning not to remove the multi-pred guard on the assumption that InCycle covers it.
- Soften "independently revertable" claim. Phase-internal commits (notably Phase 2's CSR/EdgeTables pair and Phase 5's reserve -> split -> repack chain) cannot be reverted in isolation without breaking the build; the per-commit-greenness claim remains.
- Rewrite the perf tables. Replace stitched 9-rep + 25-rep data with a single same-session 50-rep per-phase + 100-rep interleaved-total measurement, both rebuilt from src/evm/evm_cache.cpp at 592fd35 vs HEAD. Document methodology so reviewers can reproduce. N=100k speedup re-derives to 1.69x / -41.0% under this methodology.

Minor (Opus N-1..N-6, Codex C1/C6/C7/C8):

- Per-phase sum vs total discrepancy now explained (chrono overhead at 13 phase boundaries).
- Diff stat fixed: +312/-188 (was +236/-171).
- Commit count clarified: 11 implementation + 1 docs = 12.
- byte-identical EVMBytecodeCache claim softened to "behaviourally identical (statetest 2723/2723)" since no memcmp diff is run.
- R1 (Blocks.reserve) scope note added: the no-realloc guarantee covers only buildGasBlocks initial construction, not the later splitCriticalEdges append.
- R4 (chkFixpointRounds=2) caveat: synthetic stress + unit tests are the easy case for CHK; real-corpus measurement deferred.
- N-6 meteringInit +110% attribution downgraded to "conjecture from access pattern, not measured."
- Format gate description acknowledges Codex's exit-123 observation on pre-existing unrelated file violations; PR diff itself is clean.

Code changes (Codex 3.3 suggestion):

- Add Edges.size() == Blocks.size() invariant assert before buildAdjacencyCSR. Catches future drift if a new Blocks.push_back forgets to grow Edges in lockstep.
- Fix GasBlock layout comment ("22 pad uint16" -> "22 pad[2]") per Opus N-4 since there is no actual uint16 field there.

Verification:

- evmCacheTests 14/14 pass
- evmone-statetest -k fork_Cancun 2723/2723 pass
- tools/format.sh check clean

Both Round 2 reviewers (Opus + Codex, independent) returned PASS verdicts after verifying the c5db655 round-1 fixes. Codex re-measured N=100k at 1.67x speedup (-40.2%), reproducing the documented 1.69x / -41.0% within +/-10% under the same interleaved methodology. Opus noted no new issues introduced by the R1 fixes. Polish item from Opus's R2 (non-blocking): the per-phase table notes were one-sided -- they explained why baseline's instrumented sum exceeds the total (chrono overhead at phase boundaries) but not why HEAD's sum is below the total (un-instrumented outer vector allocation in buildBytecodeCache; ~7 ms for synthetic N=100k due to 9.6 MB PushValueMap zero-init, ~0.2 ms for EIP-170 production code). Added a paragraph explaining the asymmetry. Review cadence: 2 rounds, target met within 1-2 cap.

- `docs/modules/evm/cache-build.md`: new module spec scoped to shipped state. Covers pipeline phase order, GasBlock 32B layout, EdgeTables / CSRGraph types, DomInfo (CHK + Tarjan E/E), conditional InCycle branches, ZEN_EVM_CACHE_PROFILE counters, and the R2-verbatim soundness invariant via lemma614Update's effectivePredCount multi-pred guard (with explicit future-contributor warning).
- `perf-summary.md`: appends a directional B-lite Sourcify pilot (n=10, paired wall-clock vs upstream/main `ef062ae` on mainnet contracts pulled via `eth_getCode`, stratified by CodeSize). Overall median 1.17x / +14.9% with 9/10 contracts faster; DAI flagged as follow-up outlier. Adds an operationalized future-work C-rubric with pre-committed GO/KILL/Partial thresholds covering production-size cache-build, end-to-end evmone-bench, N-stratum spread, and first-touch p95.
- `reviews/motivation-{1,2}-{opus,codex}.md`: dev-cycle motivation red-team for the A -> B -> C follow-up plan. iter=1 both REFINE (33x framing, C numeric trigger, C estimate provenance, R2 PASS preservation, B methodology). iter=2 Opus PROCEED conditional on three write-time fixes (C-rubric (iii) operationalize, evm_cache.md scope, B-lite labeling); Codex REFINE on the same convergent list.

All review-cited write-time fixes are applied in the deliverables themselves: cache-build.md scoped tight; perf-summary B-lite labeled "directional, n=10, selection-biased"; C-rubric (iii) replaced with "N=2000 paired >= 50% of N=100k paired"; (iv) first-touch p95 >= 5% clause added.

The pipeline table lists phases 0-13 (14 entries) but the chrono-overhead prose said "13 phase pairs". That figure counts the 13 phases inside `buildGasChunksSPP` and excludes phase 0, `buildJumpDestMap`, which runs in `buildBytecodeCache`'s outer scope. A reader cross-referencing the table would briefly think the number was wrong. Clarify in the prose without changing the table or the numeric overhead estimate.
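To illustrate why instrumented per-phase sums can drift from the un-instrumented total, here is a minimal sketch of scoped chrono instrumentation. `PhaseProfile` and `ScopedPhase` are hypothetical names, not the PR's `ZEN_EVM_CACHE_PROFILE` implementation; the sketch only shows that each timed phase costs one pair of clock reads that no phase accounts for.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical profile sink: one (name, microseconds) entry per phase.
struct PhaseProfile {
  std::vector<std::pair<std::string, int64_t>> Micros;
  uint64_t ClockReads = 0; // each read is itself un-attributed overhead
};

// RAII timer: one steady_clock read on entry, one on exit.
class ScopedPhase {
public:
  ScopedPhase(PhaseProfile &P, std::string PhaseName)
      : Profile(P), Name(std::move(PhaseName)),
        Begin(std::chrono::steady_clock::now()) {
    ++Profile.ClockReads;
  }

  ~ScopedPhase() {
    const auto End = std::chrono::steady_clock::now();
    ++Profile.ClockReads;
    const auto Us =
        std::chrono::duration_cast<std::chrono::microseconds>(End - Begin);
    Profile.Micros.emplace_back(std::move(Name), Us.count());
  }

private:
  PhaseProfile &Profile;
  std::string Name;
  std::chrono::steady_clock::time_point Begin;
};
```

With 13 timed phases this pattern performs 26 clock reads per cache build, which is the kind of boundary overhead the prose attributes the sum-versus-total gap to.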

The change doc README and reviews were already in English; only perf-summary.md was mixed Chinese + English. Translate verbatim, preserving all numeric tables, identifiers, file paths, commit SHAs, and markdown structure.

Copilot

Pull request overview

This PR overhauls EVM bytecode cache-build performance by replacing/optimizing CFG and dominator-related passes, adding profiling and benchmarking tools, and documenting the new cache-build pipeline and performance methodology.

Changes:

Adds EVM cache-build profiling, benchmark analysis, corpus-fetching, and bytecode replay tooling.
Refactors cache-build internals around CHK dominators, CSR adjacency, phase fusion, and compact block metadata.
Adds extensive change documentation, performance summaries, and adversarial review records.

Reviewed changes

Copilot reviewed 45 out of 45 changed files in this pull request and generated 4 comments.

File	Description
`CMakeLists.txt`	Adds `ZEN_EVM_CACHE_PROFILE` option.
`src/evm/CMakeLists.txt`	Propagates cache profiling define to EVM object target.
`src/evm/evm_cache.cpp`	Implements cache-build pipeline changes, CSR graph, CHK dominators, and profiling hooks.
`src/evm/evm_cache.md`	Updates cache-build algorithm documentation.
`src/evm/evm_cache_for_testing.h`	Adds dominator testing helper API declaration.
`src/tests/evm_cache_complexity_demo.cpp`	Adds bytecode replay mode and microsecond CSV output.
`tests/corpus/evm-cache/.gitignore`	Ignores generated corpus/benchmark artifacts.
`tests/corpus/evm-cache/fetch_sourcify_corpus.py`	Adds Sourcify corpus acquisition and metadata extraction.
`tools/bench_evm_cache.sh`	Adds repeated fresh-process cache-build benchmark runner.
`tools/analyze_evm_cache_bench.py`	Adds paired-ratio BCa bootstrap analyzer.
`docs/modules/evm/cache-build.md`	Adds module-level cache-build specification and invariants.
`docs/changes/2026-05-17-evm-cache-build-fusion/perf-summary.md`	Adds performance summary and follow-up gating rubric.
`docs/changes/2026-05-17-evm-cache-build-fusion/reviews/round-1-opus.md`	Adds round-1 review record.
`docs/changes/2026-05-17-evm-cache-build-fusion/reviews/round-1-codex.md`	Adds round-1 review record.
`docs/changes/2026-05-17-evm-cache-build-fusion/reviews/round-2-opus.md`	Adds round-2 verification record.
`docs/changes/2026-05-17-evm-cache-build-fusion/reviews/round-2-codex.md`	Adds round-2 verification record.
`docs/changes/2026-05-17-evm-cache-build-fusion/reviews/motivation-1-opus.md`	Adds motivation review record.
`docs/changes/2026-05-17-evm-cache-build-fusion/reviews/motivation-1-codex.md`	Adds motivation review record.
`docs/changes/2026-05-17-evm-cache-build-fusion/reviews/motivation-2-opus.md`	Adds follow-up motivation review record.
`docs/changes/2026-05-17-evm-cache-build-fusion/reviews/motivation-2-codex.md`	Adds follow-up motivation review record.
`docs/changes/2026-05-16-evm-spp-overhaul/problem-statement.md`	Adds scoped problem statement for foundation work.
`docs/changes/2026-05-16-evm-spp-overhaul/reviews/*`	Adds foundation-layer motivation, spec, and implementation review records.
`docs/changes/2026-05-12-evm-dom-chk/reviews/*`	Adds prior dominator-change review records.

github-actions · 2026-05-18T04:53:06Z

⚡ Performance Regression Check Results

✅ Performance Check Passed (interpreter)

Performance Benchmark Results (threshold: 25%)

Benchmark	Baseline (us)	Current (us)	Change	Status
total/main/blake2b_huff/8415nulls	2.52	2.52	-0.2%	PASS
total/main/blake2b_huff/empty	0.04	0.04	-0.6%	PASS
total/main/blake2b_shifts/8415nulls	20.14	20.53	+2.0%	PASS
total/main/sha1_divs/5311	8.66	8.63	-0.3%	PASS
total/main/sha1_divs/empty	0.10	0.10	-0.5%	PASS
total/main/sha1_shifts/5311	6.06	6.18	+1.9%	PASS
total/main/sha1_shifts/empty	0.07	0.07	+0.2%	PASS
total/main/snailtracer/benchmark	72.71	72.50	-0.3%	PASS
total/main/structarray_alloc/nfts_rank	1.46	1.46	-0.1%	PASS
total/main/swap_math/insufficient_liquidity	0.00	0.00	+4.6%	PASS
total/main/swap_math/received	0.01	0.01	+4.9%	PASS
total/main/swap_math/spent	0.01	0.01	+5.4%	PASS
total/main/weierstrudel/1	0.30	0.30	+0.7%	PASS
total/main/weierstrudel/15	3.43	3.41	-0.5%	PASS
total/micro/JUMPDEST_n0/empty	3.01	3.01	+0.0%	PASS
total/micro/jump_around/empty	0.06	0.07	+13.4%	PASS
total/micro/loop_with_many_jumpdests/empty	45.85	45.84	-0.0%	PASS
total/micro/memory_grow_mload/by1	0.12	0.12	-1.6%	PASS
total/micro/memory_grow_mload/by16	0.14	0.14	-0.3%	PASS
total/micro/memory_grow_mload/by32	0.16	0.16	-0.8%	PASS
total/micro/memory_grow_mload/nogrow	0.12	0.12	-1.3%	PASS
total/micro/memory_grow_mstore/by1	0.13	0.13	-2.6%	PASS
total/micro/memory_grow_mstore/by16	0.15	0.15	-0.5%	PASS
total/micro/memory_grow_mstore/by32	0.16	0.16	-0.2%	PASS
total/micro/memory_grow_mstore/nogrow	0.13	0.13	-1.4%	PASS
total/micro/signextend/one	0.27	0.27	-0.1%	PASS
total/micro/signextend/zero	0.27	0.27	-0.3%	PASS
total/synth/ADD/b0	3.31	3.22	-2.6%	PASS
total/synth/ADD/b1	3.70	4.12	+11.1%	PASS
total/synth/ADDRESS/a0	5.72	5.65	-1.2%	PASS
total/synth/ADDRESS/a1	6.17	6.10	-1.2%	PASS
total/synth/AND/b0	3.02	3.02	+0.0%	PASS
total/synth/AND/b1	3.60	3.91	+8.4%	PASS
total/synth/BYTE/b0	6.86	6.90	+0.6%	PASS
total/synth/BYTE/b1	5.78	5.81	+0.6%	PASS
total/synth/CALLDATASIZE/a0	3.34	3.44	+3.0%	PASS
total/synth/CALLDATASIZE/a1	4.33	3.95	-8.7%	PASS
total/synth/CALLER/a0	5.70	5.71	+0.1%	PASS
total/synth/CALLER/a1	6.17	6.10	-1.2%	PASS
total/synth/CALLVALUE/a0	4.16	3.25	-21.8%	PASS
total/synth/CALLVALUE/a1	3.81	3.62	-5.0%	PASS
total/synth/CODESIZE/a0	4.00	3.61	-9.7%	PASS
total/synth/CODESIZE/a1	4.10	4.11	+0.4%	PASS
total/synth/DUP1/d0	1.44	1.43	-0.2%	PASS
total/synth/DUP1/d1	1.95	1.87	-4.1%	PASS
total/synth/DUP10/d0	1.48	1.44	-2.7%	PASS
total/synth/DUP10/d1	1.95	1.87	-4.0%	PASS
total/synth/DUP11/d0	1.44	1.44	-0.1%	PASS
total/synth/DUP11/d1	1.95	1.65	-15.6%	PASS
total/synth/DUP12/d0	1.48	1.44	-2.6%	PASS
total/synth/DUP12/d1	1.95	1.65	-15.5%	PASS
total/synth/DUP13/d0	1.48	1.44	-2.6%	PASS
total/synth/DUP13/d1	1.95	1.87	-4.2%	PASS
total/synth/DUP14/d0	1.44	1.44	-0.2%	PASS
total/synth/DUP14/d1	1.95	1.87	-4.0%	PASS
total/synth/DUP15/d0	1.48	1.44	-2.6%	PASS
total/synth/DUP15/d1	1.95	1.87	-3.9%	PASS
total/synth/DUP16/d0	1.48	1.44	-2.5%	PASS
total/synth/DUP16/d1	1.95	1.87	-4.0%	PASS
total/synth/DUP2/d0	1.48	1.43	-3.1%	PASS
total/synth/DUP2/d1	1.95	1.68	-13.9%	PASS
total/synth/DUP3/d0	1.48	1.43	-2.8%	PASS
total/synth/DUP3/d1	1.95	1.65	-15.4%	PASS
total/synth/DUP4/d0	1.44	1.43	-0.0%	PASS
total/synth/DUP4/d1	1.95	1.88	-3.8%	PASS
total/synth/DUP5/d0	1.44	1.43	-0.0%	PASS
total/synth/DUP5/d1	1.95	1.65	-15.4%	PASS
total/synth/DUP6/d0	1.48	1.44	-2.7%	PASS
total/synth/DUP6/d1	1.95	1.87	-4.1%	PASS
total/synth/DUP7/d0	1.44	1.44	-0.2%	PASS
total/synth/DUP7/d1	1.96	1.87	-4.3%	PASS
total/synth/DUP8/d0	1.48	1.44	-2.8%	PASS
total/synth/DUP8/d1	1.95	1.88	-3.7%	PASS
total/synth/DUP9/d0	1.44	1.44	-0.1%	PASS
total/synth/DUP9/d1	1.95	1.87	-4.0%	PASS
total/synth/EQ/b0	6.00	6.09	+1.5%	PASS
total/synth/EQ/b1	6.56	6.60	+0.7%	PASS
total/synth/GAS/a0	3.89	3.88	-0.3%	PASS
total/synth/GAS/a1	4.37	4.15	-5.0%	PASS
total/synth/GT/b0	5.92	5.77	-2.5%	PASS
total/synth/GT/b1	6.10	6.22	+1.9%	PASS
total/synth/ISZERO/u0	9.64	9.64	+0.0%	PASS
total/synth/JUMPDEST/n0	3.01	3.01	-0.0%	PASS
total/synth/LT/b0	5.76	5.75	-0.1%	PASS
total/synth/LT/b1	6.11	6.23	+2.0%	PASS
total/synth/MSIZE/a0	5.07	5.07	-0.0%	PASS
total/synth/MSIZE/a1	5.56	5.49	-1.2%	PASS
total/synth/MUL/b0	6.24	6.32	+1.3%	PASS
total/synth/MUL/b1	6.79	6.76	-0.3%	PASS
total/synth/NOT/u0	5.24	5.16	-1.6%	PASS
total/synth/OR/b0	3.03	3.02	-0.3%	PASS
total/synth/OR/b1	3.51	3.84	+9.4%	PASS
total/synth/PC/a0	3.41	3.44	+1.0%	PASS
total/synth/PC/a1	4.34	4.16	-4.1%	PASS
total/synth/PUSH1/p0	1.47	1.47	-0.0%	PASS
total/synth/PUSH1/p1	2.07	1.98	-4.2%	PASS
total/synth/PUSH10/p0	1.51	1.51	+0.3%	PASS
total/synth/PUSH10/p1	2.07	1.75	-15.5%	PASS
total/synth/PUSH11/p0	1.52	1.52	+0.0%	PASS
total/synth/PUSH11/p1	2.08	1.75	-15.6%	PASS
total/synth/PUSH12/p0	1.50	1.50	+0.0%	PASS
total/synth/PUSH12/p1	2.08	1.76	-15.4%	PASS
total/synth/PUSH13/p0	1.52	1.50	-1.1%	PASS
total/synth/PUSH13/p1	2.07	1.75	-15.5%	PASS
total/synth/PUSH14/p0	1.52	1.51	-0.6%	PASS
total/synth/PUSH14/p1	2.08	1.98	-4.5%	PASS
total/synth/PUSH15/p0	1.51	1.50	-0.6%	PASS
total/synth/PUSH15/p1	2.08	1.78	-14.7%	PASS
total/synth/PUSH16/p0	1.51	1.50	-0.1%	PASS
total/synth/PUSH16/p1	2.08	1.98	-4.9%	PASS
total/synth/PUSH17/p0	1.52	1.52	+0.1%	PASS
total/synth/PUSH17/p1	2.08	1.76	-15.5%	PASS
total/synth/PUSH18/p0	1.52	1.52	+0.5%	PASS
total/synth/PUSH18/p1	2.07	1.76	-15.1%	PASS
total/synth/PUSH19/p0	1.51	1.51	+0.3%	PASS
total/synth/PUSH19/p1	2.08	1.75	-15.5%	PASS
total/synth/PUSH2/p0	1.49	1.51	+0.9%	PASS
total/synth/PUSH2/p1	2.06	1.97	-4.2%	PASS
total/synth/PUSH20/p0	1.52	1.51	-0.0%	PASS
total/synth/PUSH20/p1	2.07	1.99	-4.3%	PASS
total/synth/PUSH21/p0	1.52	1.52	-0.1%	PASS
total/synth/PUSH21/p1	2.07	1.75	-15.1%	PASS
total/synth/PUSH22/p0	1.51	1.50	-0.3%	PASS
total/synth/PUSH22/p1	2.07	1.99	-4.1%	PASS
total/synth/PUSH23/p0	1.51	1.52	+0.4%	PASS
total/synth/PUSH23/p1	2.07	1.75	-15.3%	PASS
total/synth/PUSH24/p0	1.51	1.51	-0.0%	PASS
total/synth/PUSH24/p1	2.09	1.99	-4.7%	PASS
total/synth/PUSH25/p0	1.52	1.52	-0.3%	PASS
total/synth/PUSH25/p1	2.07	1.99	-4.1%	PASS
total/synth/PUSH26/p0	1.51	1.51	-0.1%	PASS
total/synth/PUSH26/p1	2.07	1.99	-4.3%	PASS
total/synth/PUSH27/p0	1.52	1.52	-0.3%	PASS
total/synth/PUSH27/p1	2.07	1.98	-4.3%	PASS
total/synth/PUSH28/p0	1.53	1.51	-1.0%	PASS
total/synth/PUSH28/p1	2.07	1.99	-4.1%	PASS
total/synth/PUSH29/p0	1.52	1.51	-0.6%	PASS
total/synth/PUSH29/p1	2.08	1.99	-4.0%	PASS
total/synth/PUSH3/p0	1.51	1.52	+0.7%	PASS
total/synth/PUSH3/p1	2.07	1.98	-4.4%	PASS
total/synth/PUSH30/p0	1.58	1.57	-0.1%	PASS
total/synth/PUSH30/p1	2.08	1.76	-15.4%	PASS
total/synth/PUSH31/p0	1.52	1.53	+0.5%	PASS
total/synth/PUSH31/p1	2.11	1.82	-13.8%	PASS
total/synth/PUSH32/p0	1.53	1.51	-1.1%	PASS
total/synth/PUSH32/p1	2.09	1.76	-15.7%	PASS
total/synth/PUSH4/p0	1.51	1.51	+0.1%	PASS
total/synth/PUSH4/p1	2.08	1.75	-15.7%	PASS
total/synth/PUSH5/p0	1.51	1.51	+0.0%	PASS
total/synth/PUSH5/p1	2.07	1.98	-4.5%	PASS
total/synth/PUSH6/p0	1.51	1.50	-0.4%	PASS
total/synth/PUSH6/p1	2.06	1.98	-4.2%	PASS
total/synth/PUSH7/p0	1.51	1.51	-0.2%	PASS
total/synth/PUSH7/p1	2.07	1.77	-14.7%	PASS
total/synth/PUSH8/p0	1.52	1.51	-0.8%	PASS
total/synth/PUSH8/p1	2.07	1.75	-15.2%	PASS
total/synth/PUSH9/p0	1.51	1.51	+0.0%	PASS
total/synth/PUSH9/p1	2.06	1.97	-4.4%	PASS
total/synth/RETURNDATASIZE/a0	4.05	3.62	-10.5%	PASS
total/synth/RETURNDATASIZE/a1	4.22	4.12	-2.4%	PASS
total/synth/SAR/b0	4.45	4.45	+0.0%	PASS
total/synth/SAR/b1	5.18	5.25	+1.3%	PASS
total/synth/SGT/b0	4.39	4.34	-1.1%	PASS
total/synth/SGT/b1	5.06	4.90	-3.2%	PASS
total/synth/SHL/b0	3.98	3.94	-0.8%	PASS
total/synth/SHL/b1	3.84	3.67	-4.6%	PASS
total/synth/SHR/b0	3.63	3.64	+0.4%	PASS
total/synth/SHR/b1	3.74	4.00	+7.0%	PASS
total/synth/SIGNEXTEND/b0	3.43	3.45	+0.4%	PASS
total/synth/SIGNEXTEND/b1	4.07	3.98	-2.4%	PASS
total/synth/SLT/b0	4.30	4.12	-4.4%	PASS
total/synth/SLT/b1	5.08	4.90	-3.5%	PASS
total/synth/SUB/b0	3.22	3.24	+0.5%	PASS
total/synth/SUB/b1	3.66	4.14	+12.9%	PASS
total/synth/SWAP1/s0	3.43	3.43	-0.1%	PASS
total/synth/SWAP10/s0	3.45	3.45	-0.0%	PASS
total/synth/SWAP11/s0	3.45	3.45	-0.0%	PASS
total/synth/SWAP12/s0	3.46	3.45	-0.2%	PASS
total/synth/SWAP13/s0	3.46	3.46	+0.0%	PASS
total/synth/SWAP14/s0	3.46	3.46	-0.0%	PASS
total/synth/SWAP15/s0	3.31	3.29	-0.5%	PASS
total/synth/SWAP16/s0	3.39	3.39	-0.1%	PASS
total/synth/SWAP2/s0	3.43	3.43	+0.0%	PASS
total/synth/SWAP3/s0	3.44	3.43	-0.1%	PASS
total/synth/SWAP4/s0	3.44	3.44	-0.1%	PASS
total/synth/SWAP5/s0	3.44	3.44	+0.2%	PASS
total/synth/SWAP6/s0	3.44	3.44	+0.1%	PASS
total/synth/SWAP7/s0	3.45	3.45	+0.0%	PASS
total/synth/SWAP8/s0	3.45	3.45	+0.0%	PASS
total/synth/SWAP9/s0	3.45	3.45	+0.0%	PASS
total/synth/XOR/b0	3.02	3.02	-0.0%	PASS
total/synth/XOR/b1	3.61	3.70	+2.7%	PASS
total/synth/loop_v1	7.11	7.08	-0.4%	PASS
total/synth/loop_v2	7.04	7.09	+0.7%	PASS

Summary: 194 benchmarks, 0 regressions

✅ Performance Check Passed (multipass)

Performance Benchmark Results (threshold: 25%)

Benchmark	Baseline (us)	Current (us)	Change	Status
total/main/blake2b_huff/8415nulls	0.81	0.81	-1.0%	PASS
total/main/blake2b_huff/empty	0.01	0.01	+0.5%	PASS
total/main/blake2b_shifts/8415nulls	4.43	4.41	-0.6%	PASS
total/main/sha1_divs/5311	0.58	0.58	-0.1%	PASS
total/main/sha1_divs/empty	0.01	0.01	-0.1%	PASS
total/main/sha1_shifts/5311	0.54	0.54	+0.3%	PASS
total/main/sha1_shifts/empty	0.01	0.01	+0.7%	PASS
total/main/snailtracer/benchmark	31.09	31.09	+0.0%	PASS
total/main/structarray_alloc/nfts_rank	0.27	0.27	+0.7%	PASS
total/main/swap_math/insufficient_liquidity	0.00	0.00	-0.8%	PASS
total/main/swap_math/received	0.00	0.00	-0.6%	PASS
total/main/swap_math/spent	0.00	0.00	+0.6%	PASS
total/main/weierstrudel/1	0.25	0.24	-1.2%	PASS
total/main/weierstrudel/15	2.63	2.59	-1.5%	PASS
total/micro/JUMPDEST_n0/empty	0.00	0.00	-0.9%	PASS
total/micro/jump_around/empty	0.06	0.06	+4.5%	PASS
total/micro/loop_with_many_jumpdests/empty	0.00	0.00	-0.1%	PASS
total/micro/memory_grow_mload/by1	0.01	0.01	+0.9%	PASS
total/micro/memory_grow_mload/by16	0.01	0.01	-1.1%	PASS
total/micro/memory_grow_mload/by32	0.01	0.01	-0.3%	PASS
total/micro/memory_grow_mload/nogrow	0.01	0.01	+0.6%	PASS
total/micro/memory_grow_mstore/by1	0.01	0.01	+0.2%	PASS
total/micro/memory_grow_mstore/by16	0.01	0.01	+1.1%	PASS
total/micro/memory_grow_mstore/by32	0.01	0.01	+0.8%	PASS
total/micro/memory_grow_mstore/nogrow	0.01	0.01	+0.6%	PASS
total/micro/signextend/one	0.07	0.07	+0.3%	PASS
total/micro/signextend/zero	0.07	0.07	+0.1%	PASS
total/synth/ADD/b0	0.00	0.00	-0.1%	PASS
total/synth/ADD/b1	0.00	0.00	+0.1%	PASS
total/synth/ADDRESS/a0	0.15	0.15	+0.1%	PASS
total/synth/ADDRESS/a1	0.15	0.15	+0.0%	PASS
total/synth/AND/b0	0.00	0.00	-0.5%	PASS
total/synth/AND/b1	0.00	0.00	-0.1%	PASS
total/synth/BYTE/b0	0.00	0.00	-0.0%	PASS
total/synth/BYTE/b1	0.00	0.00	-0.2%	PASS
total/synth/CALLDATASIZE/a0	0.07	0.07	-0.1%	PASS
total/synth/CALLDATASIZE/a1	0.07	0.07	+0.1%	PASS
total/synth/CALLER/a0	0.18	0.18	-0.1%	PASS
total/synth/CALLER/a1	0.18	0.18	+0.0%	PASS
total/synth/CALLVALUE/a0	0.19	0.19	-0.0%	PASS
total/synth/CALLVALUE/a1	0.19	0.19	+0.0%	PASS
total/synth/CODESIZE/a0	0.07	0.07	-0.0%	PASS
total/synth/CODESIZE/a1	0.07	0.07	-0.1%	PASS
total/synth/DUP1/d0	0.00	0.00	-0.1%	PASS
total/synth/DUP1/d1	0.00	0.00	-0.3%	PASS
total/synth/DUP10/d0	0.00	0.00	-0.0%	PASS
total/synth/DUP10/d1	0.00	0.00	-0.1%	PASS
total/synth/DUP11/d0	0.00	0.00	+0.1%	PASS
total/synth/DUP11/d1	0.00	0.00	-0.3%	PASS
total/synth/DUP12/d0	0.00	0.00	+0.0%	PASS
total/synth/DUP12/d1	0.00	0.00	-0.1%	PASS
total/synth/DUP13/d0	0.00	0.00	-0.1%	PASS
total/synth/DUP13/d1	0.00	0.00	+0.1%	PASS
total/synth/DUP14/d0	0.00	0.00	-0.1%	PASS
total/synth/DUP14/d1	0.00	0.00	-0.1%	PASS
total/synth/DUP15/d0	0.00	0.00	-0.0%	PASS
total/synth/DUP15/d1	0.00	0.00	-0.3%	PASS
total/synth/DUP16/d0	0.00	0.00	-0.3%	PASS
total/synth/DUP16/d1	0.00	0.00	-0.2%	PASS
total/synth/DUP2/d0	0.00	0.00	-0.3%	PASS
total/synth/DUP2/d1	0.00	0.00	-0.2%	PASS
total/synth/DUP3/d0	0.00	0.00	-0.2%	PASS
total/synth/DUP3/d1	0.00	0.00	-0.0%	PASS
total/synth/DUP4/d0	0.00	0.00	-0.0%	PASS
total/synth/DUP4/d1	0.00	0.00	-0.4%	PASS
total/synth/DUP5/d0	0.00	0.00	-0.2%	PASS
total/synth/DUP5/d1	0.00	0.00	-0.3%	PASS
total/synth/DUP6/d0	0.00	0.00	-0.4%	PASS
total/synth/DUP6/d1	0.00	0.00	-0.1%	PASS
total/synth/DUP7/d0	0.00	0.00	-0.0%	PASS
total/synth/DUP7/d1	0.00	0.00	-0.4%	PASS
total/synth/DUP8/d0	0.00	0.00	-0.6%	PASS
total/synth/DUP8/d1	0.00	0.00	-0.2%	PASS
total/synth/DUP9/d0	0.00	0.00	-0.1%	PASS
total/synth/DUP9/d1	0.00	0.00	+0.0%	PASS
total/synth/EQ/b0	0.00	0.00	-0.1%	PASS
total/synth/EQ/b1	0.00	0.00	-0.1%	PASS
total/synth/GAS/a0	0.76	0.76	-0.0%	PASS
total/synth/GAS/a1	0.76	0.76	+0.0%	PASS
total/synth/GT/b0	0.00	0.00	-0.2%	PASS
total/synth/GT/b1	0.00	0.00	-0.3%	PASS
total/synth/ISZERO/u0	0.00	0.00	-0.0%	PASS
total/synth/JUMPDEST/n0	0.00	0.00	-0.7%	PASS
total/synth/LT/b0	0.00	0.00	-0.1%	PASS
total/synth/LT/b1	0.00	0.00	-0.1%	PASS
total/synth/MSIZE/a0	0.00	0.00	-0.0%	PASS
total/synth/MSIZE/a1	0.00	0.00	-0.2%	PASS
total/synth/MUL/b0	0.00	0.00	+0.1%	PASS
total/synth/MUL/b1	0.00	0.00	-0.2%	PASS
total/synth/NOT/u0	0.00	0.00	-0.1%	PASS
total/synth/OR/b0	0.00	0.00	-0.2%	PASS
total/synth/OR/b1	0.00	0.00	-0.3%	PASS
total/synth/PC/a0	0.00	0.00	-0.1%	PASS
total/synth/PC/a1	0.00	0.00	-0.2%	PASS
total/synth/PUSH1/p0	0.00	0.00	+0.2%	PASS
total/synth/PUSH1/p1	0.00	0.00	+0.6%	PASS
total/synth/PUSH10/p0	0.00	0.00	+1.8%	PASS
total/synth/PUSH10/p1	0.00	0.00	-0.8%	PASS
total/synth/PUSH11/p0	0.00	0.00	-2.2%	PASS
total/synth/PUSH11/p1	0.00	0.00	-0.5%	PASS
total/synth/PUSH12/p0	0.00	0.00	+0.6%	PASS
total/synth/PUSH12/p1	0.00	0.00	-1.6%	PASS
total/synth/PUSH13/p0	0.00	0.00	-0.9%	PASS
total/synth/PUSH13/p1	0.00	0.00	-0.3%	PASS
total/synth/PUSH14/p0	0.00	0.00	-0.2%	PASS
total/synth/PUSH14/p1	0.00	0.00	-0.9%	PASS
total/synth/PUSH15/p0	0.00	0.00	+0.3%	PASS
total/synth/PUSH15/p1	0.00	0.00	-0.1%	PASS
total/synth/PUSH16/p0	0.00	0.00	-1.3%	PASS
total/synth/PUSH16/p1	0.00	0.00	+0.1%	PASS
total/synth/PUSH17/p0	0.00	0.00	+0.2%	PASS
total/synth/PUSH17/p1	0.00	0.00	-0.4%	PASS
total/synth/PUSH18/p0	0.00	0.00	-0.3%	PASS
total/synth/PUSH18/p1	0.00	0.00	-0.6%	PASS
total/synth/PUSH19/p0	0.00	0.00	-0.0%	PASS
total/synth/PUSH19/p1	0.00	0.00	-0.5%	PASS
total/synth/PUSH2/p0	0.00	0.00	-0.9%	PASS
total/synth/PUSH2/p1	0.00	0.00	+0.1%	PASS
total/synth/PUSH20/p0	0.00	0.00	-0.6%	PASS
total/synth/PUSH20/p1	0.00	0.00	-0.6%	PASS
total/synth/PUSH21/p0	0.00	0.00	-0.1%	PASS
total/synth/PUSH21/p1	0.00	0.00	-0.6%	PASS
total/synth/PUSH22/p0	1.40	1.32	-5.8%	PASS
total/synth/PUSH22/p1	1.84	1.55	-15.8%	PASS
total/synth/PUSH23/p0	1.39	1.32	-5.6%	PASS
total/synth/PUSH23/p1	1.84	1.61	-12.2%	PASS
total/synth/PUSH24/p0	1.40	1.32	-5.6%	PASS
total/synth/PUSH24/p1	1.84	1.57	-14.8%	PASS
total/synth/PUSH25/p0	1.40	1.32	-5.6%	PASS
total/synth/PUSH25/p1	1.83	1.54	-15.5%	PASS
total/synth/PUSH26/p0	1.31	1.32	+0.5%	PASS
total/synth/PUSH26/p1	1.83	1.56	-14.7%	PASS
total/synth/PUSH27/p0	1.40	1.32	-5.7%	PASS
total/synth/PUSH27/p1	1.84	1.55	-16.0%	PASS
total/synth/PUSH28/p0	1.40	1.32	-5.5%	PASS
total/synth/PUSH28/p1	1.83	1.57	-14.5%	PASS
total/synth/PUSH29/p0	1.40	1.32	-5.6%	PASS
total/synth/PUSH29/p1	1.83	1.55	-15.6%	PASS
total/synth/PUSH3/p0	0.00	0.00	-0.0%	PASS
total/synth/PUSH3/p1	0.00	0.00	-0.1%	PASS
total/synth/PUSH30/p0	1.51	1.53	+1.5%	PASS
total/synth/PUSH30/p1	1.84	1.56	-15.1%	PASS
total/synth/PUSH31/p0	1.40	1.33	-5.3%	PASS
total/synth/PUSH31/p1	1.90	1.75	-8.0%	PASS
total/synth/PUSH32/p0	1.40	1.32	-5.5%	PASS
total/synth/PUSH32/p1	1.83	1.58	-13.5%	PASS
total/synth/PUSH4/p0	0.00	0.00	-1.1%	PASS
total/synth/PUSH4/p1	0.00	0.00	+0.5%	PASS
total/synth/PUSH5/p0	0.00	0.00	+0.8%	PASS
total/synth/PUSH5/p1	0.00	0.00	-0.2%	PASS
total/synth/PUSH6/p0	0.00	0.00	-0.3%	PASS
total/synth/PUSH6/p1	0.00	0.00	-2.0%	PASS
total/synth/PUSH7/p0	0.00	0.00	-2.1%	PASS
total/synth/PUSH7/p1	0.00	0.00	-2.0%	PASS
total/synth/PUSH8/p0	0.00	0.00	-1.5%	PASS
total/synth/PUSH8/p1	0.00	0.00	-1.9%	PASS
total/synth/PUSH9/p0	0.00	0.00	-0.2%	PASS
total/synth/PUSH9/p1	0.00	0.00	+0.8%	PASS
total/synth/RETURNDATASIZE/a0	0.03	0.03	+0.1%	PASS
total/synth/RETURNDATASIZE/a1	0.03	0.03	-0.3%	PASS
total/synth/SAR/b0	0.00	0.00	-0.1%	PASS
total/synth/SAR/b1	0.00	0.00	-0.3%	PASS
total/synth/SGT/b0	0.00	0.00	-0.1%	PASS
total/synth/SGT/b1	0.00	0.00	+0.0%	PASS
total/synth/SHL/b0	0.00	0.00	-0.0%	PASS
total/synth/SHL/b1	0.00	0.00	-0.2%	PASS
total/synth/SHR/b0	0.00	0.00	-0.2%	PASS
total/synth/SHR/b1	0.00	0.00	+0.0%	PASS
total/synth/SIGNEXTEND/b0	0.00	0.00	-0.0%	PASS
total/synth/SIGNEXTEND/b1	0.00	0.00	-0.3%	PASS
total/synth/SLT/b0	0.00	0.00	-0.2%	PASS
total/synth/SLT/b1	0.00	0.00	-0.7%	PASS
total/synth/SUB/b0	0.00	0.00	-0.2%	PASS
total/synth/SUB/b1	0.00	0.00	-0.2%	PASS
total/synth/SWAP1/s0	0.00	0.00	+0.1%	PASS
total/synth/SWAP10/s0	0.00	0.00	-0.4%	PASS
total/synth/SWAP11/s0	0.00	0.00	-0.1%	PASS
total/synth/SWAP12/s0	0.00	0.00	-0.2%	PASS
total/synth/SWAP13/s0	0.00	0.00	+0.1%	PASS
total/synth/SWAP14/s0	0.00	0.00	-0.2%	PASS
total/synth/SWAP15/s0	0.00	0.00	-0.3%	PASS
total/synth/SWAP16/s0	0.00	0.00	-0.4%	PASS
total/synth/SWAP2/s0	0.00	0.00	-0.0%	PASS
total/synth/SWAP3/s0	0.00	0.00	-0.3%	PASS
total/synth/SWAP4/s0	0.00	0.00	-0.1%	PASS
total/synth/SWAP5/s0	0.00	0.00	-0.4%	PASS
total/synth/SWAP6/s0	0.00	0.00	-0.2%	PASS
total/synth/SWAP7/s0	0.00	0.00	+0.1%	PASS
total/synth/SWAP8/s0	0.00	0.00	-0.3%	PASS
total/synth/SWAP9/s0	0.00	0.00	-0.1%	PASS
total/synth/XOR/b0	0.00	0.00	-0.2%	PASS
total/synth/XOR/b1	0.00	0.00	-0.1%	PASS
total/synth/loop_v1	1.50	1.50	-0.2%	PASS
total/synth/loop_v2	1.39	1.38	-0.7%	PASS

Summary: 194 benchmarks, 0 regressions

Two related changes responding to PR DTVMStack#514 review:

1. `CSRGraph::operator[]`: guard against null `Data.data()` pointer arithmetic. A single-block contract with no edges has empty CSR `Data`, and `Data.data()` is permitted to return `nullptr`. Forming `nullptr + Off[Node]` is undefined per [expr.add]/4 even when the offset is zero, and UBSan flags it. Return an empty `{nullptr, nullptr}` Range early when `Data.empty()`.
2. `computeInCycle` invariant comment: the pre-existing comment claimed that natural-loop union "captures every cycle" and that Tarjan SCC was the soundness backstop on the fallback path. R2 review of this PR established the actual invariant: InCycle is a performance fast path; soundness on irreducible CFGs rests on lemma614Update's `effectivePredCount(Succ) != 1` multi-pred guard. Align the inline comment with the module spec in `docs/modules/evm/cache-build.md` §Invariants, including the future-contributor warning not to remove the multi-pred guard on the assumption that InCycle covers it.
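The early-return guard from item 1 might look like the sketch below. `Off`, `Data` and `Range` are names used in the commit message; the struct shapes, the `Sketch` suffix and the bounds assert are assumptions made for illustration and may differ from the real `CSRGraph`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Half-open pointer range over one node's neighbours (assumed shape).
struct Range {
  const uint32_t *First;
  const uint32_t *Last;
  const uint32_t *begin() const { return First; }
  const uint32_t *end() const { return Last; }
  size_t size() const { return static_cast<size_t>(Last - First); }
  bool empty() const { return First == Last; }
};

struct CSRGraphSketch {
  std::vector<uint32_t> Off;  // NumNodes + 1 offsets into Data
  std::vector<uint32_t> Data; // concatenated neighbour ids

  Range operator[](uint32_t Node) const {
    // An edge-free graph has empty Data, and Data.data() may then be
    // nullptr. Return an empty range without doing pointer arithmetic.
    if (Data.empty())
      return {nullptr, nullptr};
    assert(static_cast<size_t>(Node) + 1 < Off.size());
    const uint32_t *Base = Data.data();
    return {Base + Off[Node], Base + Off[Node + 1]};
  }
};
```

Callers can still write `for (uint32_t S : Graph[Node])` unchanged; the empty range simply yields no iterations.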

The doc previously stated time complexity as `O((N + E) · α(N))`. CHK is not a union-find algorithm and does not provide an inverse-Ackermann bound; the near-linear behaviour is workload-dependent, with worst-case bounded by dominator-tree depth and empirical `chkFixpointRounds = 2` on every measured workload. Reword as `O((N + E) · R)` with `R` defined as the number of fixpoint sweeps and the measured / worst-case bounds spelled out.
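As a rough illustration of where the factor `R` comes from, the sketch below is a textbook Cooper-Harvey-Kennedy fixpoint that counts its sweeps over reverse postorder. It is not the PR's `computeDomInfo`: `DomResult`, `computeIDomCHK`, and the choice to count the final no-change sweep in `Rounds` are assumptions, and every node is assumed reachable from entry node 0.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

constexpr uint32_t Undef = UINT32_MAX;

struct DomResult {
  std::vector<uint32_t> IDom; // immediate dominator; entry maps to itself
  uint32_t Rounds = 0;        // sweeps over RPO, including the last one
};

inline DomResult
computeIDomCHK(const std::vector<std::vector<uint32_t>> &Succs,
               const std::vector<std::vector<uint32_t>> &Preds) {
  const uint32_t N = static_cast<uint32_t>(Succs.size());
  assert(N > 0 && Preds.size() == Succs.size());

  // Iterative DFS from node 0 to assign postorder numbers.
  std::vector<uint32_t> PostNum(N, Undef), Order;
  std::vector<bool> Seen(N, false);
  std::vector<std::pair<uint32_t, size_t>> Stack{{0, 0}};
  Seen[0] = true;
  while (!Stack.empty()) {
    const uint32_t V = Stack.back().first;
    const size_t I = Stack.back().second;
    if (I < Succs[V].size()) {
      Stack.back().second = I + 1;
      const uint32_t W = Succs[V][I];
      if (!Seen[W]) {
        Seen[W] = true;
        Stack.push_back({W, 0});
      }
    } else {
      PostNum[V] = static_cast<uint32_t>(Order.size());
      Order.push_back(V);
      Stack.pop_back();
    }
  }

  DomResult R;
  R.IDom.assign(N, Undef);
  R.IDom[0] = 0;

  // Walk both fingers up the current dominator tree until they meet.
  auto Intersect = [&](uint32_t A, uint32_t B) -> uint32_t {
    while (A != B) {
      while (PostNum[A] < PostNum[B])
        A = R.IDom[A];
      while (PostNum[B] < PostNum[A])
        B = R.IDom[B];
    }
    return A;
  };

  bool Changed = true;
  while (Changed) {
    Changed = false;
    ++R.Rounds;
    for (auto It = Order.rbegin(); It != Order.rend(); ++It) {
      const uint32_t V = *It;
      if (V == 0)
        continue;
      uint32_t New = Undef;
      for (uint32_t P : Preds[V]) {
        if (R.IDom[P] == Undef)
          continue; // predecessor not processed yet
        New = (New == Undef) ? P : Intersect(P, New);
      }
      if (New != R.IDom[V]) {
        R.IDom[V] = New;
        Changed = true;
      }
    }
  }
  return R;
}
```

Each sweep visits every node and every incoming edge once, so the total work is the per-sweep `N + E` cost times the number of sweeps, plus whatever the `Intersect` walks add. On a simple diamond or single-loop CFG this sketch finishes in two sweeps: one that sets the dominators and one that confirms nothing changed.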

…tats `static_jump_stats` previously marked every PUSH-then-JUMP/JUMPI pair as a static target without decoding the pushed value or checking whether it lands on a valid `JUMPDEST` PC. This diverged from the cache builder's `resolveConstantJumpTarget` semantics in `src/evm/evm_cache.cpp`, which both decodes the constant and requires the target byte to be a `JUMPDEST` outside any PUSH-data region. The divergence undercounts dynamic JUMPs whenever a PUSH constant happens to point at a non-JUMPDEST byte, biasing the `dyn_jump_ratio` used for corpus stratification toward "static". Rewrite as a two-pass scan: pass 1 collects valid JUMPDEST PCs (skipping PUSH-data regions); pass 2 decodes each PUSH value and counts the following JUMP/JUMPI as static iff the decoded value is in the JUMPDEST set. End-of-code PUSH truncation is zero-padded on the right to match EVM stack semantics.
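The two-pass scan described above can be sketched in C++ as follows. The actual helper lives in the Python corpus fetcher, so this is a translation of the described logic rather than the shipped code: `JumpStats`, `staticJumpStats` and `pushDataLen` are hypothetical names, and counting every non-static JUMP/JUMPI as dynamic is an assumption about how `dyn_jump_ratio` is derived.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct JumpStats {
  uint32_t StaticJumps = 0;
  uint32_t DynamicJumps = 0;
};

constexpr uint8_t OpJump = 0x56, OpJumpI = 0x57, OpJumpDest = 0x5b;
constexpr uint8_t OpPush1 = 0x60, OpPush32 = 0x7f;

// Number of immediate bytes following the opcode (0 for non-PUSH).
inline size_t pushDataLen(uint8_t Op) {
  return (Op >= OpPush1 && Op <= OpPush32) ? size_t(Op - OpPush1) + 1 : 0;
}

inline JumpStats staticJumpStats(const std::vector<uint8_t> &Code) {
  const size_t N = Code.size();

  // Pass 1: mark valid JUMPDEST PCs, skipping PUSH-data regions.
  std::vector<bool> IsJumpDest(N, false);
  for (size_t Pc = 0; Pc < N; Pc += 1 + pushDataLen(Code[Pc]))
    if (Code[Pc] == OpJumpDest)
      IsJumpDest[Pc] = true;

  // Pass 2: decode each PUSH and classify the JUMP/JUMPI that follows it.
  JumpStats S;
  bool PrevPushHitsJumpDest = false;
  for (size_t Pc = 0; Pc < N;) {
    const uint8_t Op = Code[Pc];
    const size_t Len = pushDataLen(Op);
    if (Op == OpJump || Op == OpJumpI) {
      if (PrevPushHitsJumpDest)
        ++S.StaticJumps;
      else
        ++S.DynamicJumps;
    }
    PrevPushHitsJumpDest = false;
    if (Len > 0) {
      uint64_t Value = 0;
      bool Overflow = false; // value does not fit in 64 bits
      for (size_t I = 0; I < Len; ++I) {
        // Bytes past end of code read as zero (right zero-padding).
        const uint8_t B = (Pc + 1 + I < N) ? Code[Pc + 1 + I] : 0;
        if (Value >> 56)
          Overflow = true;
        Value = (Value << 8) | B;
      }
      PrevPushHitsJumpDest = !Overflow && Value < N &&
                             IsJumpDest[static_cast<size_t>(Value)];
    }
    Pc += 1 + Len;
  }
  return S;
}
```

For example, `PUSH1 0x03; JUMP; JUMPDEST` counts as one static jump, while the same prefix aimed at a non-JUMPDEST byte, or at a `0x5b` byte that sits inside PUSH data, counts as dynamic.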

abmcar and others added 28 commits May 16, 2026 19:24

style(core): apply tools/format.sh to evm_cache.cpp after PR C work

77e0454

Pure clang-format adjustments to function signatures and continuation line breaks introduced over the InCycle / RPO / buildCFGEdges fusion commits. No semantic changes.

docs(docs): translate perf-summary.md to English

911f8c1

The change doc README and reviews were already in English; only perf-summary.md was mixed Chinese + English. Translate verbatim, preserving all numeric tables, identifiers, file paths, commit SHAs, and markdown structure.

Copilot AI review requested due to automatic review settings May 18, 2026 04:13

Copilot started reviewing on behalf of abmcar May 18, 2026 04:13 View session

Copilot AI reviewed May 18, 2026

Comment thread src/evm/evm_cache.md Outdated

Comment thread src/evm/evm_cache.cpp Outdated

Comment thread tests/corpus/evm-cache/fetch_sourcify_corpus.py

Comment thread src/evm/evm_cache.cpp

abmcar changed the title ~~perf(core): EVM cache-build overhaul (dom-CHK + phase fusion + CSR)~~ perf: EVM cache-build overhaul (dom-CHK + phase fusion + CSR) May 18, 2026

abmcar added 3 commits May 18, 2026 15:50

zoowii merged commit 0c19a1e into DTVMStack:main May 18, 2026
16 checks passed

zoowii merged 31 commits into DTVMStack:main from abmcar:perf/cache-build-fusion
