
perf: EVM cache-build overhaul (dom-CHK + phase fusion + CSR)#514

Merged
zoowii merged 31 commits into DTVMStack:main from abmcar:perf/cache-build-fusion
May 18, 2026

@abmcar abmcar commented May 18, 2026

Summary

EVM cache-build pipeline overhaul. Two layers of work bundled in one PR
because the second layer depends on instrumentation + algorithmic
foundations introduced in the first:

  • Foundation layer: replace the iterative-bitset dominator
    (O(N²/64)) with Cooper-Harvey-Kennedy + Tarjan DFS Enter/Exit for
    O(1) dominates(A, B) queries; inline the dominator-tree children
    adjacency; add the opt-in ZEN_EVM_CACHE_PROFILE per-phase chrono
    instrumentation; add evmCacheComplexityDemo bytecode-replay mode +
    structural dominator GTests; add the bench_evm_cache.sh paired
    harness + analyze_evm_cache_bench.py paired-ratio BCa cluster-
    bootstrap analyzer; add the Sourcify + top-RPC corpus fetchers.
  • Fusion layer: collapse multi-pass bytecode/edge walks
    (buildGasBlocks 2-pass → 1-pass, collectJumpDests folded in,
    buildCFGEdges single-sweep); flatten Blocks[].Succs/Preds into a
    read-only CSRGraph after splitCriticalEdges freezes the graph and
    route every downstream reader through it; share
    DomInfo::RPO with computeReverseTopo; pack GasBlock 80 B → 32 B
    by extracting Succs/Preds into a parallel EdgeTables and
    reordering fields (static_assert(sizeof(GasBlock) == 32)-locked).

Behaviour-level semantics are unchanged. evmone-statetest --vm external_vm -k fork_Cancun passes 2723/2723 and build/evmCacheTests passes 14/14;
both were re-run after every implementation commit, not just at the end.

The fusion-layer N=100k headline (-41% / 1.69×) was independently
re-measured at 1.67× / -40.2%, reproducing within ±10%. Spec docs:

  • docs/changes/2026-05-16-evm-spp-overhaul/README.md (foundation)
  • docs/changes/2026-05-17-evm-cache-build-fusion/README.md (fusion)
  • docs/changes/2026-05-17-evm-cache-build-fusion/perf-summary.md
    (3-tier cross-N comparison + per-phase deltas + production-scale
    pilot + pre-committed gating criteria for the deferred micro-opts)

Production-scale pilot (n=10, directional)

Paired wall-clock on 10 mainnet contracts pulled via eth_getCode from
https://ethereum.publicnode.com, stratified by CodeSize. 15 reps per
binary, point estimates only — the full paired-ratio BCa
cluster-bootstrap (the foundation layer's harness applied to a wider
Sourcify corpus) remains a post-merge follow-up.

The speedup column below is the median of per-contract speedups,
not the ratio of the displayed Median baseline / Median HEAD
columns (each of which is itself a median across the stratum's
contracts).
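The distinction between the two statistics can be illustrated with a minimal sketch. The helper names (`median`, `medianOfSpeedups`, `ratioOfMedians`) are hypothetical and are not taken from the PR's analyzer, which is a Python script; this only shows why the speedup column need not equal the quotient of the two displayed medians.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Median of a non-empty sample (mean of the two middle values for even sizes).
double median(std::vector<double> V) {
  assert(!V.empty());
  std::sort(V.begin(), V.end());
  const std::size_t Mid = V.size() / 2;
  return V.size() % 2 ? V[Mid] : 0.5 * (V[Mid - 1] + V[Mid]);
}

// Pair each contract with itself first, then take the median of the ratios.
double medianOfSpeedups(const std::vector<double> &BaselineUs,
                        const std::vector<double> &HeadUs) {
  assert(BaselineUs.size() == HeadUs.size());
  std::vector<double> Speedups;
  Speedups.reserve(BaselineUs.size());
  for (std::size_t I = 0; I < BaselineUs.size(); ++I)
    Speedups.push_back(BaselineUs[I] / HeadUs[I]);
  return median(Speedups);
}

// Divide the two column medians: this is what the displayed columns would give.
double ratioOfMedians(const std::vector<double> &BaselineUs,
                      const std::vector<double> &HeadUs) {
  return median(BaselineUs) / median(HeadUs);
}
```

With made-up timings of {100, 200, 400} us on baseline and {50, 190, 200} us on HEAD, the median per-contract speedup is 2.0 while the ratio of the column medians is 200 / 190, so the two numbers can diverge substantially on a small stratum.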

| Stratum | n | Median baseline (us) | Median HEAD (us) | Median speedup | Median Δ% |
|---|---|---|---|---|---|
| small (<4 KB) | 3 | 71.9 | 64.9 | 1.11× | +9.7% |
| medium (4-16 KB) | 4 | 343.1 | 341.3 | 1.12× | +10.4% |
| large (16-25 KB) | 3 | 1374.7 | 1003.9 | 1.25× | +20.0% |
| overall | 10 | | | 1.17× | +14.9% |

9 / 10 contracts are faster on HEAD. DAI (-21.5%, i.e. slower on HEAD;
7.9 KB) is the one outlier and is logged for follow-up: not a ship
blocker, but it warrants investigation under a wider corpus.
Per-contract rows live in perf-summary.md.

Caveats: selection-biased toward high-traffic mainnet contracts
(USDT/USDC/Uniswap/WETH cluster); n=10 is too thin to support any
confidence-interval claim; this is a directional sanity check, not
production-grade methodology.

Synthetic stress (algorithmic-DoS regime, not production scale)

EIP-170 caps real contract bytecode at 24 576 bytes, so the deployable
upper bound is N ≲ 8000 blocks (24 576 B at worst-case ~3 B / block
packing gives 8192). evmCacheComplexityDemo at N=100 000 is outside what
a deployed contract can produce and ships only as an algorithmic-DoS
regression guard. With that caveat front-loaded:

| N | upstream/main (us) | This PR (us) | Cumulative speedup |
|---|---|---|---|
| 10 000 | 14 110 | 2 476 | 5.7× |
| 20 000 | 45 862 | 4 876 | 9.4× |
| 50 000 | 246 615 | 13 972 | 17.7× |
| 100 000 | 959 509 | 29 065 | 33.0× |

Of the 33× at N=100 000, the foundation layer contributes 18.6× (the
iterative-bitset dominator was the dominant cost at upstream/main) and
the fusion layer contributes the remaining 1.78× on top. Real deployed
contracts fall in the N=100-2000 band where the proportional gain
compresses substantially — see the production-scale pilot above for
an empirical anchor.

The pipeline goes from super-linear scaling on upstream/main (doubling N
costs 3.3× to 3.9× in the table above, approaching the 4× expected from
the O(N²/64) bitset dataflow) to near-linear scaling on HEAD (doubling N
costs 2.0× to 2.1×).
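One way to read the scaling claim off the synthetic table is a local scaling exponent k, assuming T is roughly proportional to N^k between two adjacent rows. The function name `scalingExponent` is hypothetical; the inputs in the usage note are the table values above.

```cpp
#include <cassert>
#include <cmath>

// Local scaling exponent k in T ~ N^k, estimated from two (N, T) samples.
// k near 1 indicates linear scaling; k near 2 indicates quadratic scaling.
double scalingExponent(double N1, double T1, double N2, double T2) {
  assert(N1 > 0 && N2 > 0 && T1 > 0 && T2 > 0 && N1 != N2);
  return std::log(T2 / T1) / std::log(N2 / N1);
}
```

Fed the 50 000 and 100 000 rows, this gives about 1.96 for upstream/main (246 615 us to 959 509 us) and about 1.06 for this PR (13 972 us to 29 065 us), which matches the quadratic-to-linear reading of the table.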

What's in this PR

28 commits = 17 implementation/test/tooling + 11 docs/review. Diff:
45 files, +6347 / -244. Source footprint: 11 files / +1896 / -244
under src/, tests/corpus/evm-cache/, and tools/; the rest is
docs.

Foundation layer (commits 48fada6..592fd35, 11 commits):

  • 48fada6 replace iterative-bitset dominator with CHK
  • 1be3f39 inline dom-tree children adjacency
  • 62ef503 opt-in ZEN_EVM_CACHE_PROFILE per-phase instrumentation
  • 3c659f6 bytecode-replay demo mode + structural dominator GTests
  • 9df8ee8 bench_evm_cache.sh + analyze_evm_cache_bench.py
    (paired-ratio BCa cluster-bootstrap; Efron-Tibshirani §14.3)
  • a75ab11 Sourcify + top-RPC corpus fetchers
  • 92c6c04, 04d0a55, b00efa1, 8a95175, 592fd35 change doc +
    review fixes

Fusion layer (commits e06d291..911f8c1, 17 commits):

  • Bytecode-walk fusion: e06d291 buildGasBlocks 2-pass → 1-pass;
    3bba649 collectJumpDests fold.
  • CSR adjacency + conditional Tarjan: 0dd5bb9 CSRGraph flatten
    after splitCriticalEdges; 4d74033 chkFixpointRounds diagnostic
    counter; 6e1bc6b conditional InCycle on reducible CFGs (skips
    Tarjan SCC when reducible).
  • Edge-build fusion + RPO share: de934a8 buildCFGEdges single-sweep;
    118c993 computeReverseTopo reads DomInfo::RPO.
  • GasBlock compaction: 55a250b Blocks.reserve(CodeSize) +
    emplace_back; 689e5d5 Succs/Preds extracted into parallel
    EdgeTables (GasBlock 80 → 40 B); f7630d8 field reorder packs to
    exact 32 B (static_assert-locked).
  • 77e0454 clang-format sweep (no semantic change).
  • 4f9f5be, c5db655, de507df, ab74da5, 99a666c, 911f8c1
    change doc + module spec + production-scale pilot + review fixes.

Safety invariant on irreducible CFGs is preserved by lemma614Update's
effectivePredCount multi-pred guard (evm_cache.cpp:1224), not by
the conditional InCycle fast-path. A future-contributor warning in
docs/modules/evm/cache-build.md §Invariants explicitly states the
multi-pred guard must NOT be removed on the assumption that InCycle
covers it. Counterexample (irreducible 2-entry cycle A ↔ B) included
in that warning.
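The irreducible 2-entry cycle can be made concrete with a small sketch: entry 0 branches to both A (1) and B (2), and A and B jump to each other. Neither A nor B dominates the other, so a dominator-based back-edge test finds nothing even though a cycle exists. The helpers below (`dominatesByWalk`, `countDominatorBackEdges`) are hypothetical and use a simple walk up the immediate-dominator chain rather than the PR's O(1) interval query.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using Edge = std::pair<uint32_t, uint32_t>; // (source, target)

// True if A dominates B, found by walking B up the dominator tree.
// Assumes IDom[Root] == Root and every node's chain reaches the root.
bool dominatesByWalk(const std::vector<uint32_t> &IDom, uint32_t A,
                     uint32_t B) {
  assert(A < IDom.size() && B < IDom.size());
  while (B != A) {
    if (IDom[B] == B)
      return false; // reached the root without meeting A
    B = IDom[B];
  }
  return true;
}

// Counts edges whose target dominates their source (the classic back-edge test).
std::size_t countDominatorBackEdges(const std::vector<uint32_t> &IDom,
                                    const std::vector<Edge> &Edges) {
  std::size_t Count = 0;
  for (const Edge &E : Edges)
    if (dominatesByWalk(IDom, E.second, E.first))
      ++Count;
  return Count;
}
```

For edges {0→1, 0→2, 1→2, 2→1} with IDom = {0, 0, 0} the count is zero, whereas the single-entry loop {0→1, 1→2, 2→1} with IDom = {0, 0, 1} yields one back-edge. Any cycle information derived purely from dominator-discovered loops is therefore empty on the first graph.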

Test plan

  • tools/format.sh check clean
  • cmake --build build --target dtvmapi -j$(nproc) succeeds with
    no new warnings (use CCACHE_DISABLE=1 if ccache mount is read-only)
  • build/evmCacheTests — 14 / 14 pass (10 dominator + 4
    implicit-dyn-pred)
  • evmone-statetest --vm external_vm -k fork_Cancun — 2723 / 2723
    pass (~80 s)
  • evmCacheComplexityDemo at N=10k/20k/50k/100k — monotone
    improvement vs upstream/main
  • Production-scale pilot on 10 mainnet contracts — 9 / 10 faster
    than upstream/main; DAI flagged for follow-up
  • Fusion-layer N=100k headline (-41% / 1.69×) independently
    re-measured at 1.67× / -40.2% (within ±10%)

Out of scope / future work

  • Stack-SSA + SCCP — dropped on data: 92.5% (statetest) / 98.4%
    (evmone-bench) of JUMPs already statically resolved by the existing
    PUSH→JUMP heuristic; expected runtime gain < 1% against 500+ LoC
    of SSA construction.
  • SemiNCA dominator — dropped on data: CHK fixpoint converges in
    2 rounds on every measured workload (logged via chkFixpointRounds
    diagnostic); SemiNCA's second-sweep saving (~1.5 ms) is comparable
    to its own DSU bookkeeping cost.
  • Cache-build micro-opts (computeReachable fold /
    buildCFGEdges dedup-skip / buildCSR prefetch /
    GasBlock hot/cold split) — gated on the production-scale
    validation follow-up. Pre-committed thresholds in perf-summary.md
    §Future-work: GO requires (i) production N ≲ 8000 paired median
    ≥ +5% AND p95 reduction ≥ 0.2 ms, (ii) end-to-end evmone-bench
    median ≥ +1% / p95 ≥ +3%, (iii) N=2000 paired ≥ 50% of N=100k
    paired, (iv) first-touch p95 reduction ≥ +5%. KILL if any clause
    fails → pivot to runtime / JIT / host-call hotspots.
  • UseLinearSPP=false dedicated GTest — deferred to a follow-up
    PR. Current irreducible-fallback path correctness rests on the
    multi-pred guard argument + evmone-statetest end-to-end. See
    docs/changes/2026-05-17-evm-cache-build-fusion/README.md.

🤖 Generated with Claude Code

abmcar and others added 28 commits May 16, 2026 19:24
…nedy algorithm

Replace the iterative bitset dataflow in computeDominators with
Cooper-Harvey-Kennedy 2001 (CHK) producing an immediate-dominator array,
augmented with Tarjan DFS pre/post times (DomInfo::Enter/Exit) so the
two consumers (findBackEdgesUsingDominators, buildLoopsUsingDominance)
answer dominance queries in O(1) via interval containment.
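A minimal sketch of the interval-containment query, assuming an immediate-dominator array in which the root is its own parent and every node is reachable. Only the `Enter`/`Exit` names and `dominates` come from the commit message; `DomIntervals`, `buildIntervals`, and the nested-vector children list are illustrative and not the actual DomInfo layout.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

struct DomIntervals {
  std::vector<uint32_t> Enter; // DFS entry time on the dominator tree
  std::vector<uint32_t> Exit;  // DFS exit time on the dominator tree

  // A dominates B iff B's [Enter, Exit] interval nests inside A's.
  // Out-of-range ids are rejected before any comparison.
  bool dominates(uint32_t A, uint32_t B) const {
    if (A >= Enter.size() || B >= Enter.size())
      return false;
    return Enter[A] <= Enter[B] && Exit[B] <= Exit[A];
  }
};

// Stamps Enter/Exit with one iterative DFS over the dominator tree.
DomIntervals buildIntervals(const std::vector<uint32_t> &IDom, uint32_t Root) {
  const uint32_t N = static_cast<uint32_t>(IDom.size());
  assert(Root < N && IDom[Root] == Root);
  std::vector<std::vector<uint32_t>> Children(N);
  for (uint32_t V = 0; V < N; ++V)
    if (V != Root)
      Children[IDom[V]].push_back(V);

  DomIntervals D;
  D.Enter.assign(N, 0);
  D.Exit.assign(N, 0);
  uint32_t Clock = 0;
  std::vector<std::pair<uint32_t, uint32_t>> Stack; // (node, next child index)
  D.Enter[Root] = Clock++;
  Stack.emplace_back(Root, 0u);
  while (!Stack.empty()) {
    const uint32_t Node = Stack.back().first;
    const uint32_t Next = Stack.back().second;
    if (Next < Children[Node].size()) {
      Stack.back().second = Next + 1;
      const uint32_t Child = Children[Node][Next];
      D.Enter[Child] = Clock++;
      Stack.emplace_back(Child, 0u);
    } else {
      D.Exit[Node] = Clock++;
      Stack.pop_back();
    }
  }
  return D;
}
```

For the tree IDom = {0, 0, 0, 1}, node 1 dominates node 3 but node 2 does not, and each query costs two comparisons regardless of tree depth.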

Memory drops from O(N^2) bits to 3N uint32_t. Time drops from O(N^2/64)
worst-case to O(N + E) typical for the reducible CFGs that EVM bytecode
produces. evmCacheComplexityDemo speedups vs the post-DTVMStack#446 bitset path:

  N=10000:  10.38 ms ->  3.38 ms   (3.1x)
  N=20000:  43.68 ms ->  5.90 ms   (7.4x)
  N=50000:        -  -> 14.48 ms
  N=100000:    948 ms -> 38.95 ms  (24.3x; user-provided pre-PR number)

Class A/B/C self-root seeding moved to init time so descendants of a
class-C node can intersect against a settled root in step 4 of the
fixpoint, preserving the old bitset pass's Dom[descendant] semantics
(verified by ClassCDescendant_SeedsAtInit).

Gates (all pass):
- format check on PR-changed files clean
- dtvmapi build no new warnings in PR-touched files
- evmone-unittests multipass 223/223
- evmone-unittests interpreter 215/215
- evmone-statetest -k fork_Cancun multipass 2723/2723 (zero new failures)
- evmCacheTests 9/9 (4 implicit-dyn-pred + 5 new dominator)
- evmCacheComplexityDemo gate thresholds met by >=2x margin

Spec and reviews: docs/changes/2026-05-12-evm-dom-chk/.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace `vector<vector<uint32_t>> Children(N)` with a CSR layout
(ChildStart[N+1] + ChildIdx[]) so the dom-tree DFS no longer pays
N inner-vector heap allocations at large N. Same algorithm; same
output; lower constant factor.
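The CSR children layout can be sketched with a two-pass counting sort over the immediate-dominator array. `ChildStart` and `ChildIdx` are the names used in the commit message; the `ChildCSR` struct and `buildChildCSR` function are hypothetical, and the sketch assumes every non-root node has a valid parent.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct ChildCSR {
  std::vector<uint32_t> ChildStart; // N + 1 offsets into ChildIdx
  std::vector<uint32_t> ChildIdx;   // children of node V live in
                                    // [ChildStart[V], ChildStart[V + 1])
};

ChildCSR buildChildCSR(const std::vector<uint32_t> &IDom, uint32_t Root) {
  const uint32_t N = static_cast<uint32_t>(IDom.size());
  assert(Root < N);
  ChildCSR C;
  C.ChildStart.assign(N + 1, 0);
  // Pass 1: count children per parent, shifted one slot to the right.
  for (uint32_t V = 0; V < N; ++V)
    if (V != Root)
      ++C.ChildStart[IDom[V] + 1];
  // Prefix sum turns counts into start offsets.
  for (uint32_t I = 0; I < N; ++I)
    C.ChildStart[I + 1] += C.ChildStart[I];
  // Pass 2: scatter each node into its parent's slice using a moving cursor.
  std::vector<uint32_t> Cursor(C.ChildStart.begin(), C.ChildStart.end() - 1);
  C.ChildIdx.resize(C.ChildStart[N]);
  for (uint32_t V = 0; V < N; ++V)
    if (V != Root)
      C.ChildIdx[Cursor[IDom[V]]++] = V;
  return C;
}
```

The whole structure needs a fixed number of allocations (two result arrays plus one temporary cursor) regardless of N, which is the property the commit message is after.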

Also collapse the class-A/B/C init branches in computeDomInfo into a
single `HasReachablePred` predicate, since Preds.empty() is just the
zero-pred case of the same condition. Behavior unchanged.

Gates: format clean, dtvmapi build clean, evmCacheTests 9/9,
multipass unittests 223/223, scaling demo within ±10% noise of
prior commit (still meets N=20k<15ms / N=100k<100ms gates).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Instrument buildGasChunksSPP with 13 named phase boundaries gated by
ZEN_EVM_CACHE_PROFILE compile-time flag. OFF (default) macro-elides
all chrono calls so release builds carry zero overhead. ON emits one
stderr CSV row per phase: EVM_CACHE_PROFILE,<phase>,<microseconds>.
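A hedged sketch of how such macro-elided instrumentation can look. The flag name and the CSV row format follow the commit message; the `PhaseTimer` class, the `EVM_CACHE_PHASE` macro, and the RAII scoping are assumptions, since the actual code may mark phase boundaries differently.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>

#ifdef ZEN_EVM_CACHE_PROFILE
// Times the enclosing scope and prints one CSV row to stderr on exit.
class PhaseTimer {
public:
  explicit PhaseTimer(const char *Name)
      : Name(Name), Start(std::chrono::steady_clock::now()) {}
  ~PhaseTimer() {
    const auto Us = std::chrono::duration_cast<std::chrono::microseconds>(
                        std::chrono::steady_clock::now() - Start)
                        .count();
    std::fprintf(stderr, "EVM_CACHE_PROFILE,%s,%lld\n", Name,
                 static_cast<long long>(Us));
  }

private:
  const char *Name;
  std::chrono::steady_clock::time_point Start;
};
#define EVM_CACHE_PHASE(NAME) PhaseTimer PhaseTimerGuard(NAME)
#else
// Flag off: the macro expands to a no-op, so no chrono call is emitted.
#define EVM_CACHE_PHASE(NAME) ((void)0)
#endif

// Stand-in phase used only to show the macro in place.
unsigned sumBytes(const unsigned char *Code, unsigned Size) {
  assert(Code != nullptr || Size == 0);
  EVM_CACHE_PHASE("sumBytes");
  unsigned Sum = 0;
  for (unsigned I = 0; I < Size; ++I)
    Sum += Code[I];
  return Sum;
}
```

Compiling with the flag defined prints a row such as `EVM_CACHE_PROFILE,sumBytes,<microseconds>`; without it the function body contains only the loop.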

Phases timed:
- buildGasBlocks, collectJumpDests, buildCFGEdges, splitCriticalEdges
- computeReachable (incl. dyn-target stitch), computeDomInfo
- findBackEdges, computeReverseTopo, computeInCycle
- buildLoopsUsingDominance, meteringInit, lemma614Schedule, writeback

Enables per-phase profiling of real-corpus contracts to drive
follow-up PR ordering (which phase dominates real workloads).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
evm_cache_complexity_demo: support --bytecode <hex-or-bin-file> [--label
<tag>] to time cache build on real contract bytecode. CSV row
<label>,<n_jumpdests>,<build_us> on stdout. Hex auto-detects 0x prefix
and whitespace, falls back to raw binary.

evm_cache_tests: add 5 structural dominator cases covering self-loops,
irreducible multi-entry SCCs, nested loops with shared exits, post-split
critical-edge diamonds, and dynamic-target JUMPDESTs inside static
loops. Each case asserts IDom well-formedness via a shared helper plus
behavioural invariants on cycle members.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
bench_evm_cache.sh: spawn one fresh process per repetition (avoids
intra-process cache reuse); emit long-form CSV
(label,n_jumpdests,run_idx,phase,phase_us). `phase=total` comes from the
demo's stdout; per-phase rows are picked up from the demo's stderr when
the binary is built with -DZEN_EVM_CACHE_PROFILE=ON.

analyze_evm_cache_bench.py: cluster bootstrap on contracts (per-contract
unit, WITH replacement, N=1000 by default); BCa with jackknife `a`
(leave-one-contract-out, Efron 1987) and median-bias `z_0`; gate inverts
on the `total` phase as r_upper_CI <= 0.85 (= improvement_lo >= 15%).

Sanity: baseline=treatment gives r_median=1.0 with degenerate CI; a
synthetic 50% improvement on 10 contracts gives r=0.504, CI (0.490,
0.508), gate PASS.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
fetch_topcontracts.py: curated list of ~90 high-traffic mainnet contracts
(stables, DEX routers, lending markets, NFT marketplaces, infra); pulls
runtime bytecode via public JSON-RPC (eth_getCode); dedupe by codehash;
writes hex + per-contract meta JSON. Used for the primary paired-ratio
bench corpus (production-grade workload, ~80 unique contracts).

fetch_sourcify_corpus.py: pulls verified contracts via Sourcify v2 REST
API (`/contracts/{chainId}` + `/contract/.../?fields=runtimeBytecode,
metadata`); supplies solc_version / optimizer_runs / viaIR metadata for
stratified sampling. Higher noise floor than top-RPC (most newly
verified contracts are 100-200 byte proxy stubs) but provides 7-strata
metadata when needed.

.gitignore: corpus output (raw/ + manifest_*.json) is bench artefact,
not source. Fetchers are reproducible; bench results live in spec
Results section.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
evm_cache.md: replace the iterative-bitset dataflow description with the
Cooper-Harvey-Kennedy algorithm + Tarjan DFS Enter/Exit intervals (`O(N+E)`
time, `O(N)` memory, `O(1)` `dominates` queries). Add a section on the
optional `-DZEN_EVM_CACHE_PROFILE=ON` per-phase wall-clock CSV emission
used to drive `tools/bench_evm_cache.sh`.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Move the spec drafted in ~/changes/2026-05-16-evm-spp-overhaul/ into the
project-required location docs/changes/2026-05-16-evm-spp-overhaul/, with
all Phase 0.5 motivation reviews + Phase 2 spec reviews retained.
DTVM/CLAUDE.md mandates change docs live under docs/changes/ as PR
artefacts, overriding the global ~/changes/ SSOT default.

Spec status is now Implemented (v3) per the latest Results section in
README.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Address 4 of 6 Codex round-1 findings (1 NIT skipped, commit reword
deferred until Opus returns):

- Production gate: relabel "borderline" -> explicit FAIL on the
  recalibrated improvement_lo>0 clause, note user override on stratified
  evidence + algorithmic gate PASS.
- Algorithmic table: refresh with 9-rep median (was 5-rep). 100k now
  21.83x; add measurement-variance note acknowledging independent
  reviewer reruns in the 20-30x range.
- Step 5 scope: narrow the spec claim to IDom-only structural tests;
  enumerate the loop / SPP / fuzz invariants explicitly deferred and
  point at evmone-statetest 2723 + existing implicit-dyn-pred GTests
  for end-to-end coverage.
- analyze_evm_cache_bench.py docstring: cite Efron-Tibshirani 1993
  (per Phase 2 R2 accepted nit) instead of Efron 1987.
- fetch_topcontracts.py: split raw/ and meta/ into sibling dirs so the
  bench harness doesn't mis-interpret .meta.json files as bytecode
  files; remove sanctioned TornadoCash01 from TOP_CONTRACTS.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Address 5 of 6 Opus round-1 findings (cosmetic NIT 6 left as-is):

- Replace IrreducibleSCC_TwoEntryLoop test with IrreducibleImproperRegion.
  Old CFG was reducible under DTVM's dom-based loop detection (the two
  cycle edges had no dominating back-edge target, so zero loops were
  discovered and the fallback path was never exercised). New CFG is a
  Hecht-Ullman improper region: 0 -> 1 -> 2 -> 3 -> {1, 4}; 4 -> {2, 5}.
  Two overlapping back-edges (3->1 and 4->2) produce two loops with body
  intersection {2, 3} but neither containing the other, forcing the
  reducibility check to fail. The test asserts IDom correctness on the
  irreducible region plus the dominator-chain-reaches-root invariant.
- Reorder DomInfo::dominates() bounds check before the A==B shortcut so
  out-of-range equal arguments do not falsely report mutual dominance.
- evm_cache_for_testing.h: document that computeIDomForTesting is the
  dominator pass in isolation, with no computeReachable /
  splitCriticalEdges / reachability-stitch coverage.
- Spec Step 5 prose: add a downgrade note enumerating the per-fixture
  behavioural claims (InCycle, UseLinearSPP, buildLoopsUsingDominance,
  GasChunkCostSPP fallback, splitCriticalEdges write-back) and path-total
  fuzz that were deferred to PR B / PR C.
- Spec Checklist: annotate Step 7 with "production gate FAIL, override
  approved" so the failure flag is visible at scan time.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
R2 reviewers (Codex + Opus, parallel) reviewed commits b00efa1 + 8a95175
shipped for R1. Codex R2 raised 1 issue (variance band); Opus R2 raised 1
MINOR (test naming/comment drift) + 1 NIT (PR B note).

Fixes:

- Variance band (Codex R2 §5): 9-rep median rerun produced 19.26x at
  N=100k; spec said 20-30x. Refresh to "≈ 19-30x" with the four
  sampled medians explicitly listed (19.26 / 21.83 / 22.84 / 29.7);
  gate remains ≥10x.

- IrreducibleImproperRegion mis-naming (Opus R2 MINOR): the new CFG
  0->1->2->3->{1,4}; 4->{2,5} produces natural loops {1,2,3,4} and
  {2,3,4} where the second is properly nested in the first
  (reducible nest). My R1 fix-attempt comment claimed otherwise. Rename
  test to OverlappingBackEdgesIDom and rewrite the comment to describe
  it as a reducible nested case that exercises the CHK intersect
  finger-walk on a non-trivial back-edge set; soften the §"Step 5
  Scope Reduction" wording from "genuinely forces ... irreducible loop
  nest" to the truer narrative.

- Opus R2 NIT (PR B note): add a structural observation to the spec:
  dominator-based loop discovery only ever produces a properly-nested
  loop forest by construction, so exercising the SPP reducibility
  fallback at evm_cache.cpp:1019-1042 requires buildBytecodeCache-level
  plumb, not the computeIDomForTesting helper. Documented for PR B/C
  authors.

Code change (test rename + comment) verified: 14/14 evmCacheTests pass,
no other targets touched.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Previously buildGasBlocks ran two passes over the bytecode: pass 2a marked
IsBlockStart[CodeSize] for JUMPDEST positions and after-terminator bytes,
and pass 2b walked IsBlockStart to construct GasBlock entries. The
auxiliary IsBlockStart vector cost CodeSize bytes of allocation + memset
and one extra L1/L2-hostile traversal of the bytecode.

Replace with a single walk: each iteration opens a new block at the
current Pc, then advances opcode by opcode until either (a) a mid-block
OP_JUMPDEST is encountered (which starts a new block), or (b) a
gas-chunk terminator is processed (whose successor byte opens the next
block). Semantically identical because every block start under the old
scheme was either Pc=0, a JUMPDEST position, or the byte right after a
terminator -- all three are produced naturally by the fused loop.

Measured on evmCacheComplexityDemo N=100k synthetic (5 reps, median):

  phase buildGasBlocks: 10614 us -> 9250 us  (-13%)
  total cache build:    54818 us -> 46260 us (-15%)

The total wins more than the named phase because the eliminated
IsBlockStart vector (300 KB for N=100k synthetic) sat in the outer
buildBytecodeCache and is no longer allocated or zero-filled.

Verification:
- evmCacheTests: 14/14 pass
- evmone-statetest --vm external_vm -k fork_Cancun: 2723/2723 pass
- No behavioral change in callers; signature unchanged.
collectJumpDests previously re-scanned the entire bytecode after
buildGasBlocks, allocated a SeenBlocks[Blocks.size()] dedup vector, and
mapped each JUMPDEST byte through BlockAtPc to recover the unique set of
JUMPDEST-leading blocks. Every JUMPDEST byte in valid EVM code already
starts a new gas block, so the dedup is structurally unnecessary and the
re-scan is pure duplication of the buildGasBlocks walk.

Emit JumpDestBlocks inline: each iteration of buildGasBlocks reads the
opening opcode of the new block; if it is OP_JUMPDEST, push the block id
that is about to be assigned. Output is identical to the prior pass in
both set membership and block-id ascending order; downstream
buildCFGEdges and reachability seeding consume the list as an unordered
set so any iteration order is acceptable.
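The fused walk described in this commit message and the previous one can be sketched as a single loop that opens a block at the current Pc, records it if it starts with JUMPDEST, and runs until a terminator or the next JUMPDEST. The terminator set, the `Block` and `BlockScan` types, and the function names here are illustrative assumptions, not the real GasBlock definitions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct Block {
  uint32_t Start; // first byte of the block
  uint32_t End;   // one past the last byte
};

struct BlockScan {
  std::vector<Block> Blocks;
  std::vector<uint32_t> JumpDestBlocks; // ids of blocks that open with JUMPDEST
};

constexpr uint8_t OP_JUMPDEST = 0x5b;

// Assumed terminator set: STOP, JUMP, JUMPI, RETURN, REVERT, INVALID,
// SELFDESTRUCT. The real gas-chunk terminator list may differ.
bool isTerminator(uint8_t Op) {
  return Op == 0x00 || Op == 0x56 || Op == 0x57 || Op == 0xf3 || Op == 0xfd ||
         Op == 0xfe || Op == 0xff;
}

// PUSH1..PUSH32 (0x60..0x7f) carry 1..32 immediate bytes; all else is 1 byte.
uint32_t opcodeLen(uint8_t Op) {
  return (Op >= 0x60 && Op <= 0x7f) ? 2u + static_cast<uint32_t>(Op - 0x60)
                                    : 1u;
}

BlockScan scanBlocks(const std::vector<uint8_t> &Code) {
  assert(Code.size() <= UINT32_MAX);
  const uint32_t Size = static_cast<uint32_t>(Code.size());
  BlockScan Out;
  uint32_t Pc = 0;
  while (Pc < Size) {
    const uint32_t Start = Pc;
    if (Code[Pc] == OP_JUMPDEST) // emit the id about to be assigned
      Out.JumpDestBlocks.push_back(static_cast<uint32_t>(Out.Blocks.size()));
    bool Terminated = false;
    do {
      const uint8_t Op = Code[Pc];
      Pc = std::min(Size, Pc + opcodeLen(Op)); // clamp a truncated PUSH
      Terminated = isTerminator(Op);
    } while (!Terminated && Pc < Size && Code[Pc] != OP_JUMPDEST);
    Out.Blocks.push_back(Block{Start, Pc});
  }
  return Out;
}
```

Because the walk skips PUSH immediates, a 0x5b byte inside push data never opens a block, and JumpDestBlocks comes out in ascending block-id order without any dedup structure.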

Measured on evmCacheComplexityDemo N=100k (5 reps, median):

  total:               46260 us -> 42813 us  (-7%, this commit)
  total vs main:       54818 us -> 42813 us  (-22%, cumulative w/ prior fusion)

The phase formerly emitted as EVM_CACHE_PROFILE,collectJumpDests is now absent from
profile output; its 0.4 ms instrumented cost plus an equivalent amount of
un-instrumented bytecode-rescan + SeenBlocks zero-fill is reclaimed.

Verification:
- evmCacheTests: 14/14 pass
- evmone-statetest --vm external_vm -k fork_Cancun: 2723/2723 pass
…sses

Previously every GasBlock owned a std::vector<uint32_t> Preds and Succs.
With N gas blocks that materialises 2N small heap allocations, and every
neighbour-iteration walks a pointer to a scattered heap chunk. The hot
SPP passes (computeDomInfo CHK intersect over Preds, computeInCycle SCC
DFS, findBackEdges over Succs, buildLoopsUsingDominance over both) all
pay this pointer-chase tax per node.

Flatten both directions into a contiguous CSR adjacency once after
splitCriticalEdges finishes mutating the graph, then route every
downstream reader through the new SuccsCSR / PredsCSR. The per-block
vectors stay live (we copy out, not swap) -- N std::vector dealloc()s
back-to-back cost more than the readers reclaim, so we trade short-lived
peak memory for time.
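A minimal sketch of the copy-out flatten, assuming the per-block adjacency is available as a vector of vectors. `CSRGraph` is the name used in this PR; the `Offset`/`Target` field names and the `flatten` helper are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct CSRGraph {
  std::vector<uint32_t> Offset; // N + 1 entries
  std::vector<uint32_t> Target; // all neighbour lists, back to back

  // Neighbours of V occupy [begin(V), end(V)) in one contiguous array.
  const uint32_t *begin(uint32_t V) const {
    assert(static_cast<std::size_t>(V) + 1 < Offset.size());
    return Target.data() + Offset[V];
  }
  const uint32_t *end(uint32_t V) const {
    assert(static_cast<std::size_t>(V) + 1 < Offset.size());
    return Target.data() + Offset[V + 1];
  }
};

// Copies (does not move) the per-node lists, so the source stays valid.
CSRGraph flatten(const std::vector<std::vector<uint32_t>> &Adj) {
  std::size_t Total = 0;
  for (const auto &Row : Adj)
    Total += Row.size();

  CSRGraph G;
  G.Offset.reserve(Adj.size() + 1);
  G.Target.reserve(Total);
  G.Offset.push_back(0);
  for (const auto &Row : Adj) {
    G.Target.insert(G.Target.end(), Row.begin(), Row.end());
    G.Offset.push_back(static_cast<uint32_t>(G.Target.size()));
  }
  return G;
}
```

Calling `flatten` once on the successor lists and once on the predecessor lists would give the two read-only views; readers then iterate `for (auto *P = G.begin(V); P != G.end(V); ++P)` over contiguous memory instead of chasing one heap pointer per node.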

Reader-side measurement on evmCacheComplexityDemo N=100k (25 reps):

  pre-CSR  (commit 3bba649)  median = 44797 us
  post-CSR (this commit)      median = 39475 us  (-11.9%)

Per-phase breakdown shifts the cost from many "Preds/Succs reader" rows
into a single buildCSR row plus much faster readers:

  computeDomInfo        7233 -> 4169 us  (-42%)
  computeInCycle        5694 -> 3842 us  (-32%)
  computeReachable      1818 -> 970  us  (-47%)
  findBackEdges         1169 -> 342  us  (-71%)
  buildLoops            1309 -> 423  us  (-68%)
  computeReverseTopo    1651 -> 1114 us  (-32%)
  buildCSR              0    -> 3985 us  (new, single up-front cost)

Cumulative vs perf/evm-spp-foundation baseline (PR A binary at 54818 us):

  54818 us -> 39475 us  (-28.0%)

Mutating helpers (buildCFGEdges, splitCriticalEdges, addEdge) still
operate on the per-block vectors. CSR is built once after the mutations
finish, so addEdge / erase semantics in those phases are unchanged.

The testing helper computeIDomForTesting now builds its own CSR pair
in-place from the input Succs[] adjacency, matching the production
flow.

Verification:
- evmCacheTests: 14/14 pass
- evmone-statetest --vm external_vm -k fork_Cancun: 2723/2723 pass
Adds a single counter (gated on ZEN_EVM_CACHE_PROFILE) that prints how
many RPO sweeps the CHK fixpoint took to settle. Used to answer
"would SemiNCA help here?". Measurement on evmCacheComplexityDemo at
N = 10k / 20k / 50k / 100k synthetic shows the fixpoint settles in
exactly 2 rounds in every case -- one productive sweep plus one
confirmation sweep -- so SemiNCA's single-pass advantage caps at
roughly half of computeDomInfo's time, well under the cost of its
eval/link forest bookkeeping.
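The "one productive sweep plus one confirmation sweep" pattern can be reproduced with a textbook Cooper-Harvey-Kennedy sketch that counts its own rounds. This is not the PR's computeDomInfo: it omits the class A/B/C seeding, takes the reverse postorder as an input, and the `ChkResult` and `computeIDomCHK` names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uint32_t Undefined = UINT32_MAX;

struct ChkResult {
  std::vector<uint32_t> IDom;
  uint32_t Rounds; // number of RPO sweeps until nothing changed
};

// Preds[V] lists predecessors of V. RPO holds the reachable nodes in reverse
// postorder with RPO[0] as the entry node.
ChkResult computeIDomCHK(const std::vector<std::vector<uint32_t>> &Preds,
                         const std::vector<uint32_t> &RPO) {
  assert(!RPO.empty());
  const uint32_t N = static_cast<uint32_t>(Preds.size());
  std::vector<uint32_t> Order(N, Undefined); // position of each node in RPO
  for (uint32_t I = 0; I < RPO.size(); ++I)
    Order[RPO[I]] = I;

  ChkResult R{std::vector<uint32_t>(N, Undefined), 0};
  R.IDom[RPO[0]] = RPO[0];

  // Finger walk: climb whichever finger sits later in RPO until they meet.
  auto Intersect = [&](uint32_t A, uint32_t B) {
    while (A != B) {
      while (Order[A] > Order[B])
        A = R.IDom[A];
      while (Order[B] > Order[A])
        B = R.IDom[B];
    }
    return A;
  };

  bool Changed = true;
  while (Changed) {
    Changed = false;
    ++R.Rounds;
    for (uint32_t I = 1; I < RPO.size(); ++I) {
      const uint32_t V = RPO[I];
      uint32_t NewIDom = Undefined;
      for (uint32_t P : Preds[V]) {
        if (Order[P] == Undefined || R.IDom[P] == Undefined)
          continue; // unreachable, or not processed yet in this sweep
        NewIDom = (NewIDom == Undefined) ? P : Intersect(P, NewIDom);
      }
      if (NewIDom != Undefined && R.IDom[V] != NewIDom) {
        R.IDom[V] = NewIDom;
        Changed = true;
      }
    }
  }
  return R;
}
```

On a diamond with a back-edge (0→1, 1→2, 1→3, 2→4, 3→4, 4→1; RPO 0, 1, 3, 2, 4) the first sweep assigns every immediate dominator and the second sweep changes nothing, so `Rounds` is 2.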

Zero runtime cost when ZEN_EVM_CACHE_PROFILE is off (macro elides).
computeInCycle previously ran an unconditional Tarjan SCC pass to mark
every block that participates in a cycle. On a reducible CFG -- the
common case for compiler-emitted EVM code -- this work is redundant:
every cycle is the natural loop of some back-edge, and
buildLoopsUsingDominance already enumerates those natural loops with
their NodeMask bitmaps. The union of NodeMasks equals Tarjan's
in-cycle set, so we can derive InCycle in one bitset OR sweep instead
of running a second full DFS pair over Succs and Preds.

Pipeline reorder:
  buildLoopsUsingDominance now runs before InCycle so its UseLinearSPP
  result and Loops vector are available to choose the cheap path.

Reducible path (UseLinearSPP=true): OR all Loops[].NodeMask bitmaps
into a CycleBits vector, then expand into the existing uint8_t
InCycle vector. Empty Loops vector yields all-zero InCycle, which is
correct -- an acyclic CFG has nothing in a cycle.
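The reducible-path derivation can be sketched as a word-wise OR followed by a bit expansion. `NodeMask`, `CycleBits`, and `InCycle` are named in the commit message; the `Loop` struct shape (one 64-bit word per 64 blocks) and the `inCycleFromLoops` function are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Loop {
  std::vector<uint64_t> NodeMask; // bit B set iff block B is in the loop body
};

std::vector<uint8_t> inCycleFromLoops(const std::vector<Loop> &Loops,
                                      uint32_t NumBlocks) {
  const std::size_t Words = (static_cast<std::size_t>(NumBlocks) + 63) / 64;
  std::vector<uint64_t> CycleBits(Words, 0);
  // Union of all natural-loop bodies, 64 blocks per OR.
  for (const Loop &L : Loops) {
    assert(L.NodeMask.size() >= Words);
    for (std::size_t W = 0; W < Words; ++W)
      CycleBits[W] |= L.NodeMask[W];
  }
  // Expand the bitset into the byte-per-block form.
  std::vector<uint8_t> InCycle(NumBlocks, 0);
  for (uint32_t B = 0; B < NumBlocks; ++B)
    InCycle[B] = static_cast<uint8_t>((CycleBits[B / 64] >> (B % 64)) & 1u);
  return InCycle;
}
```

With two nested loops over blocks {1, 2, 3, 4} and {2, 3, 4} in a five-block CFG, the result is {0, 1, 1, 1, 1}; with no loops at all it is all zeros.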

Irreducible path (UseLinearSPP=false): keep the full Tarjan SCC.
Dominator-based loop discovery can miss multi-entry cycles that have
no single header, and lemma614Update relies on InCycle correctness to
refuse gas shifts across cycles. The Tarjan backstop preserves
soundness for these cases (rare in practice -- statetest 2723 shows
no irreducible contracts trigger the fallback at scale).

Measured on evmCacheComplexityDemo N=100k (50 reps, median):

  pre  (commit 4d74033): 41247 us
  post (this commit):    39592 us  (-4.0%)

Phase delta: computeInCycle 3842 us -> 74 us; buildLoopsUsingDominance
absorbs ~1.2 ms of cold-cache cost from running first instead of
second. Net ~1.6 ms gain on synthetic, consistent across the IQR
band.

Verification:
- evmCacheTests: 14/14 pass (covers IrreducibleImproperRegion fallback
  path indirectly through computeIDomForTesting; full Tarjan branch
  exercised below)
- evmone-statetest --vm external_vm -k fork_Cancun: 2723/2723 pass
buildCFGEdges previously walked Blocks twice. The first pass called
resolveConstantJumpTarget on every JUMP block solely to count the
unresolved dynamic jumps and stamp ImplicitDynamicPredCount on every
JUMPDEST. The second pass walked Blocks again to add fallthrough and
jump-target edges, calling resolveConstantJumpTarget a second time on
each JUMP block to recover the same answer.

Collapse into one pass: count DynamicJumpCount inline while emitting
edges, then stamp the JUMPDESTs at the end. addEdge does not depend
on ImplicitDynamicPredCount being set, so deferring the stamp is
safe.

Measured on evmCacheComplexityDemo N=100k (50 reps):

  phase buildCFGEdges: 5315 us -> 4766 us  (-10%)
  total cache build:   39592 us -> 38595 us (-2.5%)

The phase win comes from dropping the second resolveConstantJumpTarget
call per JUMP block, which removes about half of that function's cost
(it is a pure function of the block and its constants, so the second
call returned the same answer with no side effect).

Verification:
- evmCacheTests: 14/14 pass
- evmone-statetest --vm external_vm -k fork_Cancun: 2723/2723 pass
computeReverseTopo previously ran its own full DFS over Succs to
produce a postorder list, explicitly skipping back-edges. That DFS
is semantically identical to the DFS computeDomInfo already runs:
visit each reachable node once, never follow back-edges (back-edge
targets are visited ancestors, so the "visited" check rejects them
anyway, making the explicit BackEdges filter redundant). Both
produce the same forward-DAG postorder.

Expose computeDomInfo's RPO as a DomInfo::RPO field. computeReverseTopo
collapses to a reverse copy of Dom.RPO -- O(N) memory traversal
instead of O(N+E) DFS.

The defensive second pass in computeDomInfo (that visits unreachable
components after the main reachable DFS) is preserved, so RPO covers
every block id, matching computeReverseTopo's previous output set.

Measured on evmCacheComplexityDemo N=100k (50 reps):

  phase computeReverseTopo: 1203 us -> 371 us  (-69%)
  total cache build:       38595 us -> 38534 us (-0.2%, within noise)

Total wall-clock barely moves because the freed cycles re-emerge as
slight increases in adjacent phases via cache effects: the work
shifted rather than disappeared. The win is
structural: less code, one fewer DFS, RPO available for future passes
that could subsume RevTopo entirely.

Verification:
- evmCacheTests: 14/14 pass
- evmone-statetest --vm external_vm -k fork_Cancun: 2723/2723 pass
Pure clang-format adjustments to function signatures and continuation
line breaks introduced over the InCycle / RPO / buildCFGEdges fusion
commits. No semantic changes.
…oc cost

buildGasBlocks previously default-constructed a stack-local GasBlock,
filled it across the inner opcode loop, then std::move'd it into
Blocks via push_back. Each push_back paid two costs:
  1. Move construction -- 80 bytes copied from stack-local Block into
     the vector slot.
  2. Geometric capacity growth -- log2(N) reallocations during build,
     each copying the entire prefix (~half of final size on average).
     For N=100k blocks that is roughly 4 MB of memmove traffic that
     contributes nothing to the result.

Replace with the two changes that drop both costs:
  - Blocks.reserve(CodeSize) up front. Worst-case bound: opcodeLen >= 1
    so block count is bounded by CodeSize. Real EVM averages 3-10
    bytes/block so this over-reserves transiently by 3-10x, but the
    saved realloc copies dominate. For EIP-170 production code
    (24576 B max) the reserve costs ~1.9 MB; for the synthetic stress
    demo at N=100k (CodeSize ~300 KB) it costs ~24 MB transient.
  - emplace_back() the new block into Blocks directly; bind a
    reference Blocks.back() (== emplace_back's return) and fill the
    block in place. No stack-local intermediate, no move.
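The reserve-then-fill-in-place pattern can be sketched as follows. `GasBlockSketch` and `buildFixedStride` are hypothetical stand-ins: the real builder walks opcodes, whereas this sketch cuts the code into fixed-size blocks purely to show the allocation pattern.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct GasBlockSketch {
  uint32_t Start = 0;
  uint32_t End = 0;
  uint64_t Cost = 0;
};

std::vector<GasBlockSketch> buildFixedStride(uint32_t CodeSize,
                                             uint32_t Stride) {
  assert(Stride > 0);
  std::vector<GasBlockSketch> Blocks;
  // Every opcode is at least one byte, so CodeSize bounds the block count
  // and no reallocation can happen during the loop.
  Blocks.reserve(CodeSize);
  for (uint32_t Pc = 0; Pc < CodeSize; Pc += Stride) {
    Blocks.emplace_back();                 // default-construct in the vector
    GasBlockSketch &B = Blocks.back();     // fill the slot in place
    B.Start = Pc;
    B.End = std::min(CodeSize, Pc + Stride);
    B.Cost = B.End - B.Start;
  }
  return Blocks;
}
```

The reference obtained from `Blocks.back()` stays valid for the rest of the iteration precisely because the up-front reserve rules out growth.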

Measured on evmCacheComplexityDemo N=100k (100 reps):

  phase buildGasBlocks: 10815 us ->  5108 us  (-53%)
  total cache build:    35170 us -> 31683 us  (-9.9%)

This is the single biggest win in the PR after the initial fusion. The
reserve calculation is conservative on purpose: knowing the exact final
block count would need another bytecode pass, which would itself cost
~1 ms at this scale.

Verification:
- evmCacheTests: 14/14 pass
- evmone-statetest --vm external_vm -k fork_Cancun: 2723/2723 pass
GasBlock previously embedded two std::vector<uint32_t> (Succs, Preds)
inline. Each vector occupies 24 bytes of control fields, so the block
header bloated to ~80 bytes per entry. Every pass that iterated Blocks
to read scalar fields (computeDomInfo class B/C init, buildLoops body
scans, meteringInit Cost copy, lemma614Update opcode/cost reads,
writeback Start/End/Cost emit) paid this 2x cache stride.

Replace with a parallel EdgeTables that holds two
std::vector<std::vector<uint32_t>> keyed by block id. The CFG-build
phase (buildCFGEdges, addEdge,
splitCriticalEdges) now operates on EdgeTables; the flatten step
(buildAdjacencyCSR) reads from EdgeTables. Downstream readers were
already on the CSR after the earlier CSR commit, so nothing else
needed touching.

GasBlock shrinks from ~80 -> ~40 bytes (4 uint32 PCs + 2 uint8
opcodes + uint64 cost + uint32 dyn-pred count = 30 bytes payload,
40 with padding). Iterating Blocks halves the cache traffic and the
default constructor stops zero-filling two 24-byte vector control
structs per emplace.

Measured on evmCacheComplexityDemo N=100k (100 reps):

  phase buildGasBlocks:     5108 us -> 2515 us  (-51%)
  phase buildCSR:           3929 us -> 2980 us  (-24%)
  phase splitCriticalEdges:  751 us ->  395 us  (-47%)
  phase writeback:           671 us ->  368 us  (-45%)
  total cache build:       31683 us -> 28642 us (-9.6%)

The buildGasBlocks win compounds with the prior reserve+emplace
commit: now each emplaced GasBlock is half the size and has no
vector ctor to invoke. The writeback win is pure stride compression
on a tight loop over Blocks.

Cumulative vs perf/evm-spp-foundation HEAD (47429 us at N=100k):
47429 -> 28642 us = -39.6%.

Verification:
- evmCacheTests: 14/14 pass
- evmone-statetest --vm external_vm -k fork_Cancun: 2723/2723 pass

After moving Succs/Preds out, GasBlock was 40 bytes -- the lone 8-byte
Cost sat after a uint32 ImplicitDynamicPredCount, leaving a 4-byte
trailing pad to satisfy the struct's 8-byte alignment.

Reorder so all five 32-bit fields cluster first (Start, End, LastPc,
PrevPc, ImplicitDynamicPredCount), followed by the two 1-byte opcodes
and 2 bytes of padding to reach the 24-byte mark, then the 8-byte Cost.
Total = 32 bytes exact, two blocks per 64-byte cache line, no trailing
pad.

Locked in with a static_assert so future field additions get flagged.
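The repacked layout can be sketched as follows. This is an illustration under assumptions rather than the definition in src/evm/evm_cache.cpp: the struct name, the two opcode field names, and the explicit `Pad` member are hypothetical; only the five 32-bit field names, `Cost`, and the byte offsets come from the description above.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of the 32-byte layout. Opcode field names are
// assumed for illustration.
struct GasBlockSketch {
  uint32_t Start;                    // offset 0
  uint32_t End;                      // offset 4
  uint32_t LastPc;                   // offset 8
  uint32_t PrevPc;                   // offset 12
  uint32_t ImplicitDynamicPredCount; // offset 16
  uint8_t LastOpcode;                // offset 20 (name assumed)
  uint8_t PrevOpcode;                // offset 21 (name assumed)
  uint8_t Pad[2];                    // offset 22, explicit padding
  uint64_t Cost;                     // offset 24
};

// Compile-time locks: adding or reordering a field trips these.
static_assert(sizeof(GasBlockSketch) == 32, "GasBlock must stay 32 bytes");
static_assert(offsetof(GasBlockSketch, Cost) == 24,
              "Cost must start at the 24-byte mark");
```

Asserting the offset of `Cost` as well as the total size catches a reordering that happens to preserve the size but reintroduces interior padding elsewhere.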

Measured on evmCacheComplexityDemo N=100k (100 reps):

  phase buildGasBlocks: 2515 us -> 2157 us  (-14%, less zero-init/emplace)
  phase writeback:       368 us ->  331 us  (-10%)
  phase splitCriticalEdges: 395 us -> 361 us (-9%)
  total cache build:   28642 us -> 28180 us (-1.6%)

The buildGasBlocks win is the default-constructor doing less work per
emplace_back (32 bytes of zeroed memory instead of 40). The writeback
and split wins are from the tighter Block stride in their iteration
loops.

Verification:
- evmCacheTests: 14/14 pass
- evmone-statetest --vm external_vm -k fork_Cancun: 2723/2723 pass

Full-tier spec covering all 11 commits on perf/cache-build-fusion:
phase fusion (buildGasBlocks 2-pass merge, collectJumpDests fold,
buildCFGEdges single sweep), CSR adjacency + conditional Tarjan,
DomInfo::RPO share with computeReverseTopo, and the GasBlock
compaction trio (reserve + emplace_back, Succs/Preds split into
EdgeTables, 32-byte field repack).

Documents the data behind dropping PR B (Stack-SSA: 92.5/98.4%
JUMPs already static; <1% expected runtime win) and SemiNCA
(CHK fixpoint converges in 2 rounds at every measured N).

Cross-N speedup table vs perf/evm-spp-foundation baseline
(100-rep median): -21% at N=10k scaling to -41% at N=100k.

Both reviewers returned REVISE. Fixes applied:

Major (Opus M-1/M-2/M-3, Codex C4/C5):
- Remove fabricated "IrreducibleImproperRegion" test reference (the
  test is OverlappingBackEdgesIDom and its own comment disclaims
  fallback coverage). State that no unit test currently drives
  UseLinearSPP=false; end-to-end soundness comes from statetest.
- Rewrite R2 soundness argument: InCycle=union(natural-loops) is a
  *performance* fast-path, not the safety mechanism. Soundness on
  irreducible CFGs is provided by lemma614Update's multi-pred guard
  via effectivePredCount, since every SCC-internal node has at least
  one in-cycle predecessor pushing its count >= 2. Added explicit
  warning not to remove the multi-pred guard on the assumption that
  InCycle covers it.
- Soften "independently revertable" claim. Phase-internal commits
  (notably Phase 2's CSR/EdgeTables pair and Phase 5's reserve ->
  split -> repack chain) cannot be reverted in isolation without
  breaking the build; the per-commit-greenness claim remains.
- Rewrite the perf tables. Replace stitched 9-rep + 25-rep data with
  a single same-session 50-rep per-phase + 100-rep interleaved-total
  measurement, both rebuilt from src/evm/evm_cache.cpp at 592fd35 vs
  HEAD. Document methodology so reviewers can reproduce. N=100k
  speedup re-derives to 1.69x / -41.0% under this methodology.

Minor (Opus N-1..N-6, Codex C1/C6/C7/C8):
- Per-phase sum vs total discrepancy now explained (chrono overhead
  at 13 phase boundaries).
- Diff stat fixed: +312/-188 (was +236/-171).
- Commit count clarified: 11 implementation + 1 docs = 12.
- byte-identical EVMBytecodeCache claim softened to "behaviourally
  identical (statetest 2723/2723)" since no memcmp diff is run.
- R1 (Blocks.reserve) scope note added: the no-realloc guarantee
  covers only buildGasBlocks initial construction, not the later
  splitCriticalEdges append.
- R4 (chkFixpointRounds=2) caveat: synthetic stress + unit tests are
  the easy case for CHK; real-corpus measurement deferred.
- N-6 meteringInit +110% attribution downgraded to "conjecture from
  access pattern, not measured."
- Format gate description acknowledges Codex's exit-123 observation
  on pre-existing unrelated file violations; PR diff itself is clean.

Code changes (Codex 3.3 suggestion):
- Add Edges.size() == Blocks.size() invariant assert before
  buildAdjacencyCSR. Catches future drift if a new Blocks.push_back
  forgets to grow Edges in lockstep.
- Fix GasBlock layout comment ("22 pad uint16" -> "22 pad[2]") per
  Opus N-4 since there is no actual uint16 field there.

Verification:
- evmCacheTests 14/14 pass
- evmone-statetest -k fork_Cancun 2723/2723 pass
- tools/format.sh check clean

Both Round 2 reviewers (Opus + Codex, independent) returned PASS
verdicts after verifying the c5db655 round-1 fixes. Codex
re-measured N=100k at 1.67x speedup (-40.2%), reproducing the
documented 1.69x / -41.0% within +/-10% under the same interleaved
methodology. Opus noted no new issues introduced by the R1 fixes.

Polish item from Opus's R2 (non-blocking): the per-phase table
notes were one-sided -- they explained why baseline's instrumented
sum exceeds the total (chrono overhead at phase boundaries) but
not why HEAD's sum is below the total (un-instrumented outer
vector allocation in buildBytecodeCache; ~7 ms for synthetic
N=100k due to 9.6 MB PushValueMap zero-init, ~0.2 ms for EIP-170
production code). Added a paragraph explaining the asymmetry.

Review cadence: 2 rounds, target met within 1-2 cap.

- `docs/modules/evm/cache-build.md`: new module spec scoped to shipped
  state. Covers pipeline phase order, GasBlock 32B layout, EdgeTables /
  CSRGraph types, DomInfo (CHK + Tarjan E/E), conditional InCycle
  branches, ZEN_EVM_CACHE_PROFILE counters, and the R2-verbatim
  soundness invariant via lemma614Update's effectivePredCount multi-pred
  guard (with explicit future-contributor warning).
- `perf-summary.md`: appends a directional B-lite Sourcify pilot
  (n=10, paired wall-clock vs upstream/main `ef062ae` on mainnet
  contracts pulled via `eth_getCode`, stratified by CodeSize). Overall
  median 1.17x / +14.9% with 9/10 contracts faster; DAI flagged as
  follow-up outlier. Adds an operationalized future-work C-rubric with
  pre-committed GO/KILL/Partial thresholds covering production-size
  cache-build, end-to-end evmone-bench, N-stratum spread, and
  first-touch p95.
- `reviews/motivation-{1,2}-{opus,codex}.md`: dev-cycle motivation
  red-team for the A -> B -> C follow-up plan. iter=1 both REFINE
  (33x framing, C numeric trigger, C estimate provenance, R2 PASS
  preservation, B methodology). iter=2 Opus PROCEED conditional on
  three write-time fixes (C-rubric (iii) operationalize, evm_cache.md
  scope, B-lite labeling); Codex REFINE on the same convergent list.

All review-cited write-time fixes are applied in the deliverables
themselves: cache-build.md scoped tight; perf-summary B-lite labeled
"directional, n=10, selection-biased"; C-rubric (iii) replaced with
"N=2000 paired >= 50% of N=100k paired"; (iv) first-touch p95 >= 5%
clause added.

The pipeline table lists phases 0-13 (14 entries) but the chrono-overhead
prose said "13 phase pairs", which refers to the 13 phases inside
`buildGasChunksSPP` and excludes phase 0 `buildJumpDestMap`, which runs
in `buildBytecodeCache`'s outer scope. A reader cross-referencing the
table could briefly think the number was wrong. Clarify in the prose
without changing the table or the numeric overhead estimate.

The change doc README and reviews were already in English; only
perf-summary.md was mixed Chinese + English. Translate verbatim,
preserving all numeric tables, identifiers, file paths, commit
SHAs, and markdown structure.

Copilot AI review requested due to automatic review settings May 18, 2026 04:13

Copilot AI left a comment

Pull request overview

This PR overhauls EVM bytecode cache-build performance by replacing/optimizing CFG and dominator-related passes, adding profiling and benchmarking tools, and documenting the new cache-build pipeline and performance methodology.

Changes:

  • Adds EVM cache-build profiling, benchmark analysis, corpus-fetching, and bytecode replay tooling.
  • Refactors cache-build internals around CHK dominators, CSR adjacency, phase fusion, and compact block metadata.
  • Adds extensive change documentation, performance summaries, and adversarial review records.

Reviewed changes

Copilot reviewed 45 out of 45 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| CMakeLists.txt | Adds ZEN_EVM_CACHE_PROFILE option. |
| src/evm/CMakeLists.txt | Propagates cache profiling define to EVM object target. |
| src/evm/evm_cache.cpp | Implements cache-build pipeline changes, CSR graph, CHK dominators, and profiling hooks. |
| src/evm/evm_cache.md | Updates cache-build algorithm documentation. |
| src/evm/evm_cache_for_testing.h | Adds dominator testing helper API declaration. |
| src/tests/evm_cache_complexity_demo.cpp | Adds bytecode replay mode and microsecond CSV output. |
| tests/corpus/evm-cache/.gitignore | Ignores generated corpus/benchmark artifacts. |
| tests/corpus/evm-cache/fetch_sourcify_corpus.py | Adds Sourcify corpus acquisition and metadata extraction. |
| tools/bench_evm_cache.sh | Adds repeated fresh-process cache-build benchmark runner. |
| tools/analyze_evm_cache_bench.py | Adds paired-ratio BCa bootstrap analyzer. |
| docs/modules/evm/cache-build.md | Adds module-level cache-build specification and invariants. |
| docs/changes/2026-05-17-evm-cache-build-fusion/perf-summary.md | Adds performance summary and follow-up gating rubric. |
| docs/changes/2026-05-17-evm-cache-build-fusion/reviews/round-1-opus.md | Adds round-1 review record. |
| docs/changes/2026-05-17-evm-cache-build-fusion/reviews/round-1-codex.md | Adds round-1 review record. |
| docs/changes/2026-05-17-evm-cache-build-fusion/reviews/round-2-opus.md | Adds round-2 verification record. |
| docs/changes/2026-05-17-evm-cache-build-fusion/reviews/round-2-codex.md | Adds round-2 verification record. |
| docs/changes/2026-05-17-evm-cache-build-fusion/reviews/motivation-1-opus.md | Adds motivation review record. |
| docs/changes/2026-05-17-evm-cache-build-fusion/reviews/motivation-1-codex.md | Adds motivation review record. |
| docs/changes/2026-05-17-evm-cache-build-fusion/reviews/motivation-2-opus.md | Adds follow-up motivation review record. |
| docs/changes/2026-05-17-evm-cache-build-fusion/reviews/motivation-2-codex.md | Adds follow-up motivation review record. |
| docs/changes/2026-05-16-evm-spp-overhaul/problem-statement.md | Adds scoped problem statement for foundation work. |
| docs/changes/2026-05-16-evm-spp-overhaul/reviews/* | Adds foundation-layer motivation, spec, and implementation review records. |
| docs/changes/2026-05-12-evm-dom-chk/reviews/* | Adds prior dominator-change review records. |

Comment thread src/evm/evm_cache.md Outdated
Comment thread src/evm/evm_cache.cpp Outdated
Comment thread tests/corpus/evm-cache/fetch_sourcify_corpus.py
Comment thread src/evm/evm_cache.cpp

github-actions Bot commented May 18, 2026

⚡ Performance Regression Check Results

✅ Performance Check Passed (interpreter)

Performance Benchmark Results (threshold: 25%)

Benchmark Baseline (us) Current (us) Change Status
total/main/blake2b_huff/8415nulls 2.52 2.52 -0.2% PASS
total/main/blake2b_huff/empty 0.04 0.04 -0.6% PASS
total/main/blake2b_shifts/8415nulls 20.14 20.53 +2.0% PASS
total/main/sha1_divs/5311 8.66 8.63 -0.3% PASS
total/main/sha1_divs/empty 0.10 0.10 -0.5% PASS
total/main/sha1_shifts/5311 6.06 6.18 +1.9% PASS
total/main/sha1_shifts/empty 0.07 0.07 +0.2% PASS
total/main/snailtracer/benchmark 72.71 72.50 -0.3% PASS
total/main/structarray_alloc/nfts_rank 1.46 1.46 -0.1% PASS
total/main/swap_math/insufficient_liquidity 0.00 0.00 +4.6% PASS
total/main/swap_math/received 0.01 0.01 +4.9% PASS
total/main/swap_math/spent 0.01 0.01 +5.4% PASS
total/main/weierstrudel/1 0.30 0.30 +0.7% PASS
total/main/weierstrudel/15 3.43 3.41 -0.5% PASS
total/micro/JUMPDEST_n0/empty 3.01 3.01 +0.0% PASS
total/micro/jump_around/empty 0.06 0.07 +13.4% PASS
total/micro/loop_with_many_jumpdests/empty 45.85 45.84 -0.0% PASS
total/micro/memory_grow_mload/by1 0.12 0.12 -1.6% PASS
total/micro/memory_grow_mload/by16 0.14 0.14 -0.3% PASS
total/micro/memory_grow_mload/by32 0.16 0.16 -0.8% PASS
total/micro/memory_grow_mload/nogrow 0.12 0.12 -1.3% PASS
total/micro/memory_grow_mstore/by1 0.13 0.13 -2.6% PASS
total/micro/memory_grow_mstore/by16 0.15 0.15 -0.5% PASS
total/micro/memory_grow_mstore/by32 0.16 0.16 -0.2% PASS
total/micro/memory_grow_mstore/nogrow 0.13 0.13 -1.4% PASS
total/micro/signextend/one 0.27 0.27 -0.1% PASS
total/micro/signextend/zero 0.27 0.27 -0.3% PASS
total/synth/ADD/b0 3.31 3.22 -2.6% PASS
total/synth/ADD/b1 3.70 4.12 +11.1% PASS
total/synth/ADDRESS/a0 5.72 5.65 -1.2% PASS
total/synth/ADDRESS/a1 6.17 6.10 -1.2% PASS
total/synth/AND/b0 3.02 3.02 +0.0% PASS
total/synth/AND/b1 3.60 3.91 +8.4% PASS
total/synth/BYTE/b0 6.86 6.90 +0.6% PASS
total/synth/BYTE/b1 5.78 5.81 +0.6% PASS
total/synth/CALLDATASIZE/a0 3.34 3.44 +3.0% PASS
total/synth/CALLDATASIZE/a1 4.33 3.95 -8.7% PASS
total/synth/CALLER/a0 5.70 5.71 +0.1% PASS
total/synth/CALLER/a1 6.17 6.10 -1.2% PASS
total/synth/CALLVALUE/a0 4.16 3.25 -21.8% PASS
total/synth/CALLVALUE/a1 3.81 3.62 -5.0% PASS
total/synth/CODESIZE/a0 4.00 3.61 -9.7% PASS
total/synth/CODESIZE/a1 4.10 4.11 +0.4% PASS
total/synth/DUP1/d0 1.44 1.43 -0.2% PASS
total/synth/DUP1/d1 1.95 1.87 -4.1% PASS
total/synth/DUP10/d0 1.48 1.44 -2.7% PASS
total/synth/DUP10/d1 1.95 1.87 -4.0% PASS
total/synth/DUP11/d0 1.44 1.44 -0.1% PASS
total/synth/DUP11/d1 1.95 1.65 -15.6% PASS
total/synth/DUP12/d0 1.48 1.44 -2.6% PASS
total/synth/DUP12/d1 1.95 1.65 -15.5% PASS
total/synth/DUP13/d0 1.48 1.44 -2.6% PASS
total/synth/DUP13/d1 1.95 1.87 -4.2% PASS
total/synth/DUP14/d0 1.44 1.44 -0.2% PASS
total/synth/DUP14/d1 1.95 1.87 -4.0% PASS
total/synth/DUP15/d0 1.48 1.44 -2.6% PASS
total/synth/DUP15/d1 1.95 1.87 -3.9% PASS
total/synth/DUP16/d0 1.48 1.44 -2.5% PASS
total/synth/DUP16/d1 1.95 1.87 -4.0% PASS
total/synth/DUP2/d0 1.48 1.43 -3.1% PASS
total/synth/DUP2/d1 1.95 1.68 -13.9% PASS
total/synth/DUP3/d0 1.48 1.43 -2.8% PASS
total/synth/DUP3/d1 1.95 1.65 -15.4% PASS
total/synth/DUP4/d0 1.44 1.43 -0.0% PASS
total/synth/DUP4/d1 1.95 1.88 -3.8% PASS
total/synth/DUP5/d0 1.44 1.43 -0.0% PASS
total/synth/DUP5/d1 1.95 1.65 -15.4% PASS
total/synth/DUP6/d0 1.48 1.44 -2.7% PASS
total/synth/DUP6/d1 1.95 1.87 -4.1% PASS
total/synth/DUP7/d0 1.44 1.44 -0.2% PASS
total/synth/DUP7/d1 1.96 1.87 -4.3% PASS
total/synth/DUP8/d0 1.48 1.44 -2.8% PASS
total/synth/DUP8/d1 1.95 1.88 -3.7% PASS
total/synth/DUP9/d0 1.44 1.44 -0.1% PASS
total/synth/DUP9/d1 1.95 1.87 -4.0% PASS
total/synth/EQ/b0 6.00 6.09 +1.5% PASS
total/synth/EQ/b1 6.56 6.60 +0.7% PASS
total/synth/GAS/a0 3.89 3.88 -0.3% PASS
total/synth/GAS/a1 4.37 4.15 -5.0% PASS
total/synth/GT/b0 5.92 5.77 -2.5% PASS
total/synth/GT/b1 6.10 6.22 +1.9% PASS
total/synth/ISZERO/u0 9.64 9.64 +0.0% PASS
total/synth/JUMPDEST/n0 3.01 3.01 -0.0% PASS
total/synth/LT/b0 5.76 5.75 -0.1% PASS
total/synth/LT/b1 6.11 6.23 +2.0% PASS
total/synth/MSIZE/a0 5.07 5.07 -0.0% PASS
total/synth/MSIZE/a1 5.56 5.49 -1.2% PASS
total/synth/MUL/b0 6.24 6.32 +1.3% PASS
total/synth/MUL/b1 6.79 6.76 -0.3% PASS
total/synth/NOT/u0 5.24 5.16 -1.6% PASS
total/synth/OR/b0 3.03 3.02 -0.3% PASS
total/synth/OR/b1 3.51 3.84 +9.4% PASS
total/synth/PC/a0 3.41 3.44 +1.0% PASS
total/synth/PC/a1 4.34 4.16 -4.1% PASS
total/synth/PUSH1/p0 1.47 1.47 -0.0% PASS
total/synth/PUSH1/p1 2.07 1.98 -4.2% PASS
total/synth/PUSH10/p0 1.51 1.51 +0.3% PASS
total/synth/PUSH10/p1 2.07 1.75 -15.5% PASS
total/synth/PUSH11/p0 1.52 1.52 +0.0% PASS
total/synth/PUSH11/p1 2.08 1.75 -15.6% PASS
total/synth/PUSH12/p0 1.50 1.50 +0.0% PASS
total/synth/PUSH12/p1 2.08 1.76 -15.4% PASS
total/synth/PUSH13/p0 1.52 1.50 -1.1% PASS
total/synth/PUSH13/p1 2.07 1.75 -15.5% PASS
total/synth/PUSH14/p0 1.52 1.51 -0.6% PASS
total/synth/PUSH14/p1 2.08 1.98 -4.5% PASS
total/synth/PUSH15/p0 1.51 1.50 -0.6% PASS
total/synth/PUSH15/p1 2.08 1.78 -14.7% PASS
total/synth/PUSH16/p0 1.51 1.50 -0.1% PASS
total/synth/PUSH16/p1 2.08 1.98 -4.9% PASS
total/synth/PUSH17/p0 1.52 1.52 +0.1% PASS
total/synth/PUSH17/p1 2.08 1.76 -15.5% PASS
total/synth/PUSH18/p0 1.52 1.52 +0.5% PASS
total/synth/PUSH18/p1 2.07 1.76 -15.1% PASS
total/synth/PUSH19/p0 1.51 1.51 +0.3% PASS
total/synth/PUSH19/p1 2.08 1.75 -15.5% PASS
total/synth/PUSH2/p0 1.49 1.51 +0.9% PASS
total/synth/PUSH2/p1 2.06 1.97 -4.2% PASS
total/synth/PUSH20/p0 1.52 1.51 -0.0% PASS
total/synth/PUSH20/p1 2.07 1.99 -4.3% PASS
total/synth/PUSH21/p0 1.52 1.52 -0.1% PASS
total/synth/PUSH21/p1 2.07 1.75 -15.1% PASS
total/synth/PUSH22/p0 1.51 1.50 -0.3% PASS
total/synth/PUSH22/p1 2.07 1.99 -4.1% PASS
total/synth/PUSH23/p0 1.51 1.52 +0.4% PASS
total/synth/PUSH23/p1 2.07 1.75 -15.3% PASS
total/synth/PUSH24/p0 1.51 1.51 -0.0% PASS
total/synth/PUSH24/p1 2.09 1.99 -4.7% PASS
total/synth/PUSH25/p0 1.52 1.52 -0.3% PASS
total/synth/PUSH25/p1 2.07 1.99 -4.1% PASS
total/synth/PUSH26/p0 1.51 1.51 -0.1% PASS
total/synth/PUSH26/p1 2.07 1.99 -4.3% PASS
total/synth/PUSH27/p0 1.52 1.52 -0.3% PASS
total/synth/PUSH27/p1 2.07 1.98 -4.3% PASS
total/synth/PUSH28/p0 1.53 1.51 -1.0% PASS
total/synth/PUSH28/p1 2.07 1.99 -4.1% PASS
total/synth/PUSH29/p0 1.52 1.51 -0.6% PASS
total/synth/PUSH29/p1 2.08 1.99 -4.0% PASS
total/synth/PUSH3/p0 1.51 1.52 +0.7% PASS
total/synth/PUSH3/p1 2.07 1.98 -4.4% PASS
total/synth/PUSH30/p0 1.58 1.57 -0.1% PASS
total/synth/PUSH30/p1 2.08 1.76 -15.4% PASS
total/synth/PUSH31/p0 1.52 1.53 +0.5% PASS
total/synth/PUSH31/p1 2.11 1.82 -13.8% PASS
total/synth/PUSH32/p0 1.53 1.51 -1.1% PASS
total/synth/PUSH32/p1 2.09 1.76 -15.7% PASS
total/synth/PUSH4/p0 1.51 1.51 +0.1% PASS
total/synth/PUSH4/p1 2.08 1.75 -15.7% PASS
total/synth/PUSH5/p0 1.51 1.51 +0.0% PASS
total/synth/PUSH5/p1 2.07 1.98 -4.5% PASS
total/synth/PUSH6/p0 1.51 1.50 -0.4% PASS
total/synth/PUSH6/p1 2.06 1.98 -4.2% PASS
total/synth/PUSH7/p0 1.51 1.51 -0.2% PASS
total/synth/PUSH7/p1 2.07 1.77 -14.7% PASS
total/synth/PUSH8/p0 1.52 1.51 -0.8% PASS
total/synth/PUSH8/p1 2.07 1.75 -15.2% PASS
total/synth/PUSH9/p0 1.51 1.51 +0.0% PASS
total/synth/PUSH9/p1 2.06 1.97 -4.4% PASS
total/synth/RETURNDATASIZE/a0 4.05 3.62 -10.5% PASS
total/synth/RETURNDATASIZE/a1 4.22 4.12 -2.4% PASS
total/synth/SAR/b0 4.45 4.45 +0.0% PASS
total/synth/SAR/b1 5.18 5.25 +1.3% PASS
total/synth/SGT/b0 4.39 4.34 -1.1% PASS
total/synth/SGT/b1 5.06 4.90 -3.2% PASS
total/synth/SHL/b0 3.98 3.94 -0.8% PASS
total/synth/SHL/b1 3.84 3.67 -4.6% PASS
total/synth/SHR/b0 3.63 3.64 +0.4% PASS
total/synth/SHR/b1 3.74 4.00 +7.0% PASS
total/synth/SIGNEXTEND/b0 3.43 3.45 +0.4% PASS
total/synth/SIGNEXTEND/b1 4.07 3.98 -2.4% PASS
total/synth/SLT/b0 4.30 4.12 -4.4% PASS
total/synth/SLT/b1 5.08 4.90 -3.5% PASS
total/synth/SUB/b0 3.22 3.24 +0.5% PASS
total/synth/SUB/b1 3.66 4.14 +12.9% PASS
total/synth/SWAP1/s0 3.43 3.43 -0.1% PASS
total/synth/SWAP10/s0 3.45 3.45 -0.0% PASS
total/synth/SWAP11/s0 3.45 3.45 -0.0% PASS
total/synth/SWAP12/s0 3.46 3.45 -0.2% PASS
total/synth/SWAP13/s0 3.46 3.46 +0.0% PASS
total/synth/SWAP14/s0 3.46 3.46 -0.0% PASS
total/synth/SWAP15/s0 3.31 3.29 -0.5% PASS
total/synth/SWAP16/s0 3.39 3.39 -0.1% PASS
total/synth/SWAP2/s0 3.43 3.43 +0.0% PASS
total/synth/SWAP3/s0 3.44 3.43 -0.1% PASS
total/synth/SWAP4/s0 3.44 3.44 -0.1% PASS
total/synth/SWAP5/s0 3.44 3.44 +0.2% PASS
total/synth/SWAP6/s0 3.44 3.44 +0.1% PASS
total/synth/SWAP7/s0 3.45 3.45 +0.0% PASS
total/synth/SWAP8/s0 3.45 3.45 +0.0% PASS
total/synth/SWAP9/s0 3.45 3.45 +0.0% PASS
total/synth/XOR/b0 3.02 3.02 -0.0% PASS
total/synth/XOR/b1 3.61 3.70 +2.7% PASS
total/synth/loop_v1 7.11 7.08 -0.4% PASS
total/synth/loop_v2 7.04 7.09 +0.7% PASS

Summary: 194 benchmarks, 0 regressions


✅ Performance Check Passed (multipass)

Performance Benchmark Results (threshold: 25%)

Benchmark Baseline (us) Current (us) Change Status
total/main/blake2b_huff/8415nulls 0.81 0.81 -1.0% PASS
total/main/blake2b_huff/empty 0.01 0.01 +0.5% PASS
total/main/blake2b_shifts/8415nulls 4.43 4.41 -0.6% PASS
total/main/sha1_divs/5311 0.58 0.58 -0.1% PASS
total/main/sha1_divs/empty 0.01 0.01 -0.1% PASS
total/main/sha1_shifts/5311 0.54 0.54 +0.3% PASS
total/main/sha1_shifts/empty 0.01 0.01 +0.7% PASS
total/main/snailtracer/benchmark 31.09 31.09 +0.0% PASS
total/main/structarray_alloc/nfts_rank 0.27 0.27 +0.7% PASS
total/main/swap_math/insufficient_liquidity 0.00 0.00 -0.8% PASS
total/main/swap_math/received 0.00 0.00 -0.6% PASS
total/main/swap_math/spent 0.00 0.00 +0.6% PASS
total/main/weierstrudel/1 0.25 0.24 -1.2% PASS
total/main/weierstrudel/15 2.63 2.59 -1.5% PASS
total/micro/JUMPDEST_n0/empty 0.00 0.00 -0.9% PASS
total/micro/jump_around/empty 0.06 0.06 +4.5% PASS
total/micro/loop_with_many_jumpdests/empty 0.00 0.00 -0.1% PASS
total/micro/memory_grow_mload/by1 0.01 0.01 +0.9% PASS
total/micro/memory_grow_mload/by16 0.01 0.01 -1.1% PASS
total/micro/memory_grow_mload/by32 0.01 0.01 -0.3% PASS
total/micro/memory_grow_mload/nogrow 0.01 0.01 +0.6% PASS
total/micro/memory_grow_mstore/by1 0.01 0.01 +0.2% PASS
total/micro/memory_grow_mstore/by16 0.01 0.01 +1.1% PASS
total/micro/memory_grow_mstore/by32 0.01 0.01 +0.8% PASS
total/micro/memory_grow_mstore/nogrow 0.01 0.01 +0.6% PASS
total/micro/signextend/one 0.07 0.07 +0.3% PASS
total/micro/signextend/zero 0.07 0.07 +0.1% PASS
total/synth/ADD/b0 0.00 0.00 -0.1% PASS
total/synth/ADD/b1 0.00 0.00 +0.1% PASS
total/synth/ADDRESS/a0 0.15 0.15 +0.1% PASS
total/synth/ADDRESS/a1 0.15 0.15 +0.0% PASS
total/synth/AND/b0 0.00 0.00 -0.5% PASS
total/synth/AND/b1 0.00 0.00 -0.1% PASS
total/synth/BYTE/b0 0.00 0.00 -0.0% PASS
total/synth/BYTE/b1 0.00 0.00 -0.2% PASS
total/synth/CALLDATASIZE/a0 0.07 0.07 -0.1% PASS
total/synth/CALLDATASIZE/a1 0.07 0.07 +0.1% PASS
total/synth/CALLER/a0 0.18 0.18 -0.1% PASS
total/synth/CALLER/a1 0.18 0.18 +0.0% PASS
total/synth/CALLVALUE/a0 0.19 0.19 -0.0% PASS
total/synth/CALLVALUE/a1 0.19 0.19 +0.0% PASS
total/synth/CODESIZE/a0 0.07 0.07 -0.0% PASS
total/synth/CODESIZE/a1 0.07 0.07 -0.1% PASS
total/synth/DUP1/d0 0.00 0.00 -0.1% PASS
total/synth/DUP1/d1 0.00 0.00 -0.3% PASS
total/synth/DUP10/d0 0.00 0.00 -0.0% PASS
total/synth/DUP10/d1 0.00 0.00 -0.1% PASS
total/synth/DUP11/d0 0.00 0.00 +0.1% PASS
total/synth/DUP11/d1 0.00 0.00 -0.3% PASS
total/synth/DUP12/d0 0.00 0.00 +0.0% PASS
total/synth/DUP12/d1 0.00 0.00 -0.1% PASS
total/synth/DUP13/d0 0.00 0.00 -0.1% PASS
total/synth/DUP13/d1 0.00 0.00 +0.1% PASS
total/synth/DUP14/d0 0.00 0.00 -0.1% PASS
total/synth/DUP14/d1 0.00 0.00 -0.1% PASS
total/synth/DUP15/d0 0.00 0.00 -0.0% PASS
total/synth/DUP15/d1 0.00 0.00 -0.3% PASS
total/synth/DUP16/d0 0.00 0.00 -0.3% PASS
total/synth/DUP16/d1 0.00 0.00 -0.2% PASS
total/synth/DUP2/d0 0.00 0.00 -0.3% PASS
total/synth/DUP2/d1 0.00 0.00 -0.2% PASS
total/synth/DUP3/d0 0.00 0.00 -0.2% PASS
total/synth/DUP3/d1 0.00 0.00 -0.0% PASS
total/synth/DUP4/d0 0.00 0.00 -0.0% PASS
total/synth/DUP4/d1 0.00 0.00 -0.4% PASS
total/synth/DUP5/d0 0.00 0.00 -0.2% PASS
total/synth/DUP5/d1 0.00 0.00 -0.3% PASS
total/synth/DUP6/d0 0.00 0.00 -0.4% PASS
total/synth/DUP6/d1 0.00 0.00 -0.1% PASS
total/synth/DUP7/d0 0.00 0.00 -0.0% PASS
total/synth/DUP7/d1 0.00 0.00 -0.4% PASS
total/synth/DUP8/d0 0.00 0.00 -0.6% PASS
total/synth/DUP8/d1 0.00 0.00 -0.2% PASS
total/synth/DUP9/d0 0.00 0.00 -0.1% PASS
total/synth/DUP9/d1 0.00 0.00 +0.0% PASS
total/synth/EQ/b0 0.00 0.00 -0.1% PASS
total/synth/EQ/b1 0.00 0.00 -0.1% PASS
total/synth/GAS/a0 0.76 0.76 -0.0% PASS
total/synth/GAS/a1 0.76 0.76 +0.0% PASS
total/synth/GT/b0 0.00 0.00 -0.2% PASS
total/synth/GT/b1 0.00 0.00 -0.3% PASS
total/synth/ISZERO/u0 0.00 0.00 -0.0% PASS
total/synth/JUMPDEST/n0 0.00 0.00 -0.7% PASS
total/synth/LT/b0 0.00 0.00 -0.1% PASS
total/synth/LT/b1 0.00 0.00 -0.1% PASS
total/synth/MSIZE/a0 0.00 0.00 -0.0% PASS
total/synth/MSIZE/a1 0.00 0.00 -0.2% PASS
total/synth/MUL/b0 0.00 0.00 +0.1% PASS
total/synth/MUL/b1 0.00 0.00 -0.2% PASS
total/synth/NOT/u0 0.00 0.00 -0.1% PASS
total/synth/OR/b0 0.00 0.00 -0.2% PASS
total/synth/OR/b1 0.00 0.00 -0.3% PASS
total/synth/PC/a0 0.00 0.00 -0.1% PASS
total/synth/PC/a1 0.00 0.00 -0.2% PASS
total/synth/PUSH1/p0 0.00 0.00 +0.2% PASS
total/synth/PUSH1/p1 0.00 0.00 +0.6% PASS
total/synth/PUSH10/p0 0.00 0.00 +1.8% PASS
total/synth/PUSH10/p1 0.00 0.00 -0.8% PASS
total/synth/PUSH11/p0 0.00 0.00 -2.2% PASS
total/synth/PUSH11/p1 0.00 0.00 -0.5% PASS
total/synth/PUSH12/p0 0.00 0.00 +0.6% PASS
total/synth/PUSH12/p1 0.00 0.00 -1.6% PASS
total/synth/PUSH13/p0 0.00 0.00 -0.9% PASS
total/synth/PUSH13/p1 0.00 0.00 -0.3% PASS
total/synth/PUSH14/p0 0.00 0.00 -0.2% PASS
total/synth/PUSH14/p1 0.00 0.00 -0.9% PASS
total/synth/PUSH15/p0 0.00 0.00 +0.3% PASS
total/synth/PUSH15/p1 0.00 0.00 -0.1% PASS
total/synth/PUSH16/p0 0.00 0.00 -1.3% PASS
total/synth/PUSH16/p1 0.00 0.00 +0.1% PASS
total/synth/PUSH17/p0 0.00 0.00 +0.2% PASS
total/synth/PUSH17/p1 0.00 0.00 -0.4% PASS
total/synth/PUSH18/p0 0.00 0.00 -0.3% PASS
total/synth/PUSH18/p1 0.00 0.00 -0.6% PASS
total/synth/PUSH19/p0 0.00 0.00 -0.0% PASS
total/synth/PUSH19/p1 0.00 0.00 -0.5% PASS
total/synth/PUSH2/p0 0.00 0.00 -0.9% PASS
total/synth/PUSH2/p1 0.00 0.00 +0.1% PASS
total/synth/PUSH20/p0 0.00 0.00 -0.6% PASS
total/synth/PUSH20/p1 0.00 0.00 -0.6% PASS
total/synth/PUSH21/p0 0.00 0.00 -0.1% PASS
total/synth/PUSH21/p1 0.00 0.00 -0.6% PASS
total/synth/PUSH22/p0 1.40 1.32 -5.8% PASS
total/synth/PUSH22/p1 1.84 1.55 -15.8% PASS
total/synth/PUSH23/p0 1.39 1.32 -5.6% PASS
total/synth/PUSH23/p1 1.84 1.61 -12.2% PASS
total/synth/PUSH24/p0 1.40 1.32 -5.6% PASS
total/synth/PUSH24/p1 1.84 1.57 -14.8% PASS
total/synth/PUSH25/p0 1.40 1.32 -5.6% PASS
total/synth/PUSH25/p1 1.83 1.54 -15.5% PASS
total/synth/PUSH26/p0 1.31 1.32 +0.5% PASS
total/synth/PUSH26/p1 1.83 1.56 -14.7% PASS
total/synth/PUSH27/p0 1.40 1.32 -5.7% PASS
total/synth/PUSH27/p1 1.84 1.55 -16.0% PASS
total/synth/PUSH28/p0 1.40 1.32 -5.5% PASS
total/synth/PUSH28/p1 1.83 1.57 -14.5% PASS
total/synth/PUSH29/p0 1.40 1.32 -5.6% PASS
total/synth/PUSH29/p1 1.83 1.55 -15.6% PASS
total/synth/PUSH3/p0 0.00 0.00 -0.0% PASS
total/synth/PUSH3/p1 0.00 0.00 -0.1% PASS
total/synth/PUSH30/p0 1.51 1.53 +1.5% PASS
total/synth/PUSH30/p1 1.84 1.56 -15.1% PASS
total/synth/PUSH31/p0 1.40 1.33 -5.3% PASS
total/synth/PUSH31/p1 1.90 1.75 -8.0% PASS
total/synth/PUSH32/p0 1.40 1.32 -5.5% PASS
total/synth/PUSH32/p1 1.83 1.58 -13.5% PASS
total/synth/PUSH4/p0 0.00 0.00 -1.1% PASS
total/synth/PUSH4/p1 0.00 0.00 +0.5% PASS
total/synth/PUSH5/p0 0.00 0.00 +0.8% PASS
total/synth/PUSH5/p1 0.00 0.00 -0.2% PASS
total/synth/PUSH6/p0 0.00 0.00 -0.3% PASS
total/synth/PUSH6/p1 0.00 0.00 -2.0% PASS
total/synth/PUSH7/p0 0.00 0.00 -2.1% PASS
total/synth/PUSH7/p1 0.00 0.00 -2.0% PASS
total/synth/PUSH8/p0 0.00 0.00 -1.5% PASS
total/synth/PUSH8/p1 0.00 0.00 -1.9% PASS
total/synth/PUSH9/p0 0.00 0.00 -0.2% PASS
total/synth/PUSH9/p1 0.00 0.00 +0.8% PASS
total/synth/RETURNDATASIZE/a0 0.03 0.03 +0.1% PASS
total/synth/RETURNDATASIZE/a1 0.03 0.03 -0.3% PASS
total/synth/SAR/b0 0.00 0.00 -0.1% PASS
total/synth/SAR/b1 0.00 0.00 -0.3% PASS
total/synth/SGT/b0 0.00 0.00 -0.1% PASS
total/synth/SGT/b1 0.00 0.00 +0.0% PASS
total/synth/SHL/b0 0.00 0.00 -0.0% PASS
total/synth/SHL/b1 0.00 0.00 -0.2% PASS
total/synth/SHR/b0 0.00 0.00 -0.2% PASS
total/synth/SHR/b1 0.00 0.00 +0.0% PASS
total/synth/SIGNEXTEND/b0 0.00 0.00 -0.0% PASS
total/synth/SIGNEXTEND/b1 0.00 0.00 -0.3% PASS
total/synth/SLT/b0 0.00 0.00 -0.2% PASS
total/synth/SLT/b1 0.00 0.00 -0.7% PASS
total/synth/SUB/b0 0.00 0.00 -0.2% PASS
total/synth/SUB/b1 0.00 0.00 -0.2% PASS
total/synth/SWAP1/s0 0.00 0.00 +0.1% PASS
total/synth/SWAP10/s0 0.00 0.00 -0.4% PASS
total/synth/SWAP11/s0 0.00 0.00 -0.1% PASS
total/synth/SWAP12/s0 0.00 0.00 -0.2% PASS
total/synth/SWAP13/s0 0.00 0.00 +0.1% PASS
total/synth/SWAP14/s0 0.00 0.00 -0.2% PASS
total/synth/SWAP15/s0 0.00 0.00 -0.3% PASS
total/synth/SWAP16/s0 0.00 0.00 -0.4% PASS
total/synth/SWAP2/s0 0.00 0.00 -0.0% PASS
total/synth/SWAP3/s0 0.00 0.00 -0.3% PASS
total/synth/SWAP4/s0 0.00 0.00 -0.1% PASS
total/synth/SWAP5/s0 0.00 0.00 -0.4% PASS
total/synth/SWAP6/s0 0.00 0.00 -0.2% PASS
total/synth/SWAP7/s0 0.00 0.00 +0.1% PASS
total/synth/SWAP8/s0 0.00 0.00 -0.3% PASS
total/synth/SWAP9/s0 0.00 0.00 -0.1% PASS
total/synth/XOR/b0 0.00 0.00 -0.2% PASS
total/synth/XOR/b1 0.00 0.00 -0.1% PASS
total/synth/loop_v1 1.50 1.50 -0.2% PASS
total/synth/loop_v2 1.39 1.38 -0.7% PASS

Summary: 194 benchmarks, 0 regressions


@abmcar abmcar changed the title perf(core): EVM cache-build overhaul (dom-CHK + phase fusion + CSR) perf: EVM cache-build overhaul (dom-CHK + phase fusion + CSR) May 18, 2026
abmcar added 3 commits May 18, 2026 15:50
Two related changes responding to PR DTVMStack#514 review:

1. `CSRGraph::operator[]`: guard against pointer arithmetic on a null
   `Data.data()`. A single-block contract with no edges has empty CSR
   `Data`, and `Data.data()` is permitted to return `nullptr`. In C++
   `nullptr + 0` is well-defined per [expr.add]/4, but any nonzero
   offset from a null pointer is undefined and UBSan flags it, so the
   accessor should not depend on every `Off[Node]` being zero. Return an
   empty `{nullptr, nullptr}` Range early when `Data.empty()`.

2. `computeInCycle` invariant comment: the pre-existing comment claimed
   that natural-loop union "captures every cycle" and that Tarjan SCC
   was the soundness backstop on the fallback path. R2 review of this
   PR established the actual invariant: InCycle is a performance fast
   path; soundness on irreducible CFGs rests on lemma614Update's
   `effectivePredCount(Succ) != 1` multi-pred guard. Align the inline
   comment with the module spec in `docs/modules/evm/cache-build.md`
   §Invariants, including the future-contributor warning not to remove
   the multi-pred guard on the assumption that InCycle covers it.
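The early-return guard from item 1 can be sketched as below. This is a minimal illustration under assumptions: `CSRGraphSketch` and `RangeSketch` are hypothetical names, and the member layout (an `Off` offset array of size node count + 1 and a flat `Data` array) is inferred from the commit text rather than copied from src/evm/evm_cache.cpp.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical half-open pointer range over a node's neighbours.
struct RangeSketch {
  const uint32_t *First;
  const uint32_t *Last;
  const uint32_t *begin() const { return First; }
  const uint32_t *end() const { return Last; }
  bool empty() const { return First == Last; }
};

// Hypothetical CSR graph: neighbours of Node are Data[Off[Node] ..
// Off[Node + 1]).
struct CSRGraphSketch {
  std::vector<uint32_t> Off;  // size = node count + 1
  std::vector<uint32_t> Data; // concatenated neighbour lists

  RangeSketch operator[](uint32_t Node) const {
    // With no edges, Data.data() may be nullptr; return an empty range
    // without doing any arithmetic on it.
    if (Data.empty())
      return {nullptr, nullptr};
    const uint32_t *Base = Data.data();
    return {Base + Off[Node], Base + Off[Node + 1]};
  }
};
```

A range-based `for` over the returned value simply runs zero iterations for the empty case, so callers need no special handling.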
The doc previously stated time complexity as `O((N + E) · α(N))`. CHK
is not a union-find algorithm and does not provide an inverse-Ackermann
bound; the near-linear behaviour is workload-dependent, with worst-case
bounded by dominator-tree depth and empirical `chkFixpointRounds = 2`
on every measured workload. Reword as `O((N + E) · R)` with `R` defined
as the number of fixpoint sweeps and the measured / worst-case bounds
spelled out.
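To make the `R` in `O((N + E) · R)` concrete, here is a compact sketch of the Cooper-Harvey-Kennedy iteration that also counts its fixpoint sweeps. It is written for illustration under assumptions: the names `computeIDomCHK`, `DomResultSketch`, and `kUndef` are hypothetical, the graph is passed as plain adjacency vectors rather than the PR's CSR types, and the counter only mirrors the idea behind `chkFixpointRounds`.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

constexpr uint32_t kUndef = UINT32_MAX;

struct DomResultSketch {
  std::vector<uint32_t> IDom; // immediate dominator, kUndef if unreachable
  uint32_t Rounds = 0;        // R: sweeps until no IDom entry changes
};

inline DomResultSketch
computeIDomCHK(const std::vector<std::vector<uint32_t>> &Succs,
               const std::vector<std::vector<uint32_t>> &Preds,
               uint32_t Entry) {
  const size_t N = Succs.size();
  // Iterative DFS producing a postorder of the reachable nodes.
  std::vector<uint32_t> Post;
  std::vector<uint8_t> Seen(N, 0);
  std::vector<std::pair<uint32_t, size_t>> Stack;
  Stack.emplace_back(Entry, size_t{0});
  Seen[Entry] = 1;
  while (!Stack.empty()) {
    uint32_t Node = Stack.back().first;
    size_t &Next = Stack.back().second;
    if (Next < Succs[Node].size()) {
      uint32_t S = Succs[Node][Next++];
      if (!Seen[S]) {
        Seen[S] = 1;
        Stack.emplace_back(S, size_t{0});
      }
    } else {
      Post.push_back(Node);
      Stack.pop_back();
    }
  }
  std::vector<uint32_t> RPO(Post.rbegin(), Post.rend());
  std::vector<uint32_t> RPONum(N, kUndef);
  for (size_t I = 0; I < RPO.size(); ++I)
    RPONum[RPO[I]] = static_cast<uint32_t>(I);

  DomResultSketch R;
  R.IDom.assign(N, kUndef);
  R.IDom[Entry] = Entry;
  // Walk both fingers up the current dominator tree until they meet.
  auto Intersect = [&](uint32_t A, uint32_t B) {
    while (A != B) {
      while (RPONum[A] > RPONum[B]) A = R.IDom[A];
      while (RPONum[B] > RPONum[A]) B = R.IDom[B];
    }
    return A;
  };
  bool Changed = true;
  while (Changed) {
    Changed = false;
    ++R.Rounds;
    for (uint32_t B : RPO) {
      if (B == Entry) continue;
      uint32_t New = kUndef;
      for (uint32_t P : Preds[B]) {
        if (R.IDom[P] == kUndef) continue; // not processed yet
        New = (New == kUndef) ? P : Intersect(P, New);
      }
      if (New != R.IDom[B]) {
        R.IDom[B] = New;
        Changed = true;
      }
    }
  }
  return R;
}
```

Each sweep visits every reachable node and every incoming edge once, so the cost is proportional to `N + E` per sweep; the final sweep only confirms that nothing changed, which is why a reducible graph typically reports two rounds.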
…tats

`static_jump_stats` previously marked every PUSH-then-JUMP/JUMPI pair as
a static target without decoding the pushed value or checking whether
it lands on a valid `JUMPDEST` PC. This diverged from the cache
builder's `resolveConstantJumpTarget` semantics in
`src/evm/evm_cache.cpp`, which both decodes the constant and requires
the target byte to be a `JUMPDEST` outside any PUSH-data region. The
divergence undercounts dynamic JUMPs whenever a PUSH constant happens
to point at a non-JUMPDEST byte, biasing the `dyn_jump_ratio` used for
corpus stratification toward "static".

Rewrite as a two-pass scan: pass 1 collects valid JUMPDEST PCs
(skipping PUSH-data regions); pass 2 decodes each PUSH value and counts
the following JUMP/JUMPI as static iff the decoded value is in the
JUMPDEST set. End-of-code PUSH truncation is zero-padded on the right
to match EVM stack semantics.
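The two-pass scan described above can be sketched as follows. The real helper lives in the Python corpus fetcher; this C++ version is an illustration under assumptions, with hypothetical names (`staticJumpStats`, `JumpStatsSketch`), no PUSH0 handling, and any pushed value that does not fit in 64 bits treated as an invalid target.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct JumpStatsSketch {
  uint32_t StaticJumps = 0;
  uint32_t DynamicJumps = 0;
};

inline JumpStatsSketch staticJumpStats(const std::vector<uint8_t> &Code) {
  constexpr uint8_t OpJump = 0x56, OpJumpI = 0x57, OpJumpDest = 0x5b;
  constexpr uint8_t OpPush1 = 0x60, OpPush32 = 0x7f;
  const size_t N = Code.size();

  // Pass 1: mark JUMPDEST bytes that are real opcodes, skipping PUSH data.
  std::vector<uint8_t> IsJumpDest(N, 0);
  for (size_t Pc = 0; Pc < N; ++Pc) {
    uint8_t Op = Code[Pc];
    if (Op == OpJumpDest)
      IsJumpDest[Pc] = 1;
    else if (Op >= OpPush1 && Op <= OpPush32)
      Pc += static_cast<size_t>(Op - OpPush1) + 1;
  }

  // Pass 2: decode each PUSH and classify the JUMP/JUMPI right after it.
  JumpStatsSketch Stats;
  bool PrevWasPush = false;
  bool TargetValid = false;
  for (size_t Pc = 0; Pc < N; ++Pc) {
    uint8_t Op = Code[Pc];
    if (Op >= OpPush1 && Op <= OpPush32) {
      size_t Len = static_cast<size_t>(Op - OpPush1) + 1;
      uint64_t Value = 0;
      bool Overflow = false;
      for (size_t I = 0; I < Len; ++I) {
        size_t At = Pc + 1 + I;
        // Bytes past the end of code read as zero (right padding).
        uint8_t Byte = At < N ? Code[At] : 0;
        if (Value >> 56) Overflow = true; // top byte would be shifted out
        Value = (Value << 8) | Byte;
      }
      TargetValid = !Overflow && Value < N && IsJumpDest[Value] != 0;
      PrevWasPush = true;
      Pc += Len;
      continue;
    }
    if (Op == OpJump || Op == OpJumpI) {
      if (PrevWasPush && TargetValid)
        ++Stats.StaticJumps;
      else
        ++Stats.DynamicJumps;
    }
    PrevWasPush = false;
  }
  return Stats;
}
```

In this sketch a PUSH whose constant lands on a `0x5b` byte inside PUSH data counts as dynamic, because pass 1 never marks that byte as a valid destination.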
@zoowii zoowii merged commit 0c19a1e into DTVMStack:main May 18, 2026
16 checks passed