Draft
23 commits
d393465
feat(compiler): add peephole optimization system for dMIR and x86 CgIR
abmcar Mar 30, 2026
a9ac06f
test(compiler): add missing test scripts and expected files for peeph…
abmcar Mar 30, 2026
a9bcffd
style(tools): clean up peephole CI test wrappers
abmcar Mar 30, 2026
f16f167
fix(ci): pass --format evm --mode multipass to dtvm in timing collection
abmcar Mar 30, 2026
7327a22
fix(ci): make timing collection non-blocking in peephole validation job
abmcar Mar 30, 2026
eda2e18
ci: trigger rebuild
abmcar Mar 30, 2026
0353593
feat(compiler): add carry-dead analysis and synthesized rewrite rules
abmcar Mar 31, 2026
d4e23ef
test(compiler): update adc/sbb boundary tests for carry-dead analysis
abmcar Mar 31, 2026
2a913c3
test(compiler): add fuzz tests and coverage for synthesized rewrite r…
abmcar Mar 31, 2026
8e2b828
style(test): fix clang-format violation in dmir validation tests
abmcar Mar 31, 2026
827cd70
feat(compiler): add select folding, mul-pow2 strength reduction, and …
abmcar Mar 31, 2026
312a82c
fix(compiler): fix add(x,0) fold, RewriteCache memoization, and naming
abmcar Mar 31, 2026
8d9460e
ci(compiler): update dmir_rewrite timing budget for 70-rule pass
abmcar Mar 31, 2026
721c4e6
perf(compiler): fold MOVZX32rr8+SUBREG_TO_REG into MOVZX64rr8 in x86 …
abmcar Mar 31, 2026
998d9c6
ci(compiler): widen dmir_rewrite p95 share budget to 1.25%
abmcar Mar 31, 2026
5738f6f
feat(compiler): add MultiWordAdd/Sub atomic instructions to eliminate…
abmcar Apr 1, 2026
5917b0a
fix(compiler): fix exponential MVerifier traversal and dead ValueDep …
abmcar Apr 2, 2026
ad62c28
docs(compiler): add change document for peephole optimization system
abmcar Apr 8, 2026
52257c0
fix(compiler): address review findings in peephole optimization system
abmcar Apr 9, 2026
b8eaf65
fix(compiler): address codex review feedback for peephole system
abmcar Apr 9, 2026
42ae125
fix(compiler): add hasOneNonDBGUse guard to fold-setcc-test-jne rule
abmcar Apr 13, 2026
e635814
ci: retrigger after upstream boost CDN 502 (flake)
abmcar Apr 26, 2026
de871a0
ci: retrigger after upstream boost CDN sourceforge timeout (flake)
abmcar Apr 28, 2026
83 changes: 83 additions & 0 deletions .github/workflows/dtvm_evm_test_x86.yml
@@ -487,3 +487,86 @@ jobs:
        run: |
          echo "::error::Performance regression detected in ${{ matrix.mode }} mode. See logs for details."
          exit 1

  peephole_validation_and_timing_budget:
    name: Peephole Validation and Timing Budget Check
    runs-on: ubuntu-latest
    container:
      image: dtvmdev1/dtvm-dev-x64:main
    steps:
      - name: Check out code
        uses: actions/checkout@v3
        with:
          submodules: "true"

      - name: Build dtvm and x86CgPeepholeTests
        run: |
          export LLVM_SYS_150_PREFIX=/opt/llvm15
          export LLVM_DIR=$LLVM_SYS_150_PREFIX/lib/cmake/llvm
          export PATH=$LLVM_SYS_150_PREFIX/bin:$PATH
          cmake -S . -B build \
            -DCMAKE_BUILD_TYPE=Release \
            -DZEN_ENABLE_SINGLEPASS_JIT=OFF \
            -DZEN_ENABLE_MULTIPASS_JIT=ON \
            -DZEN_ENABLE_EVM=ON \
            -DZEN_ENABLE_SPEC_TEST=ON \
            -DZEN_ENABLE_CPU_EXCEPTION=ON \
            -DZEN_ENABLE_VIRTUAL_STACK=ON
          cmake --build build --target dtvm --target x86CgPeepholeTests --target dmirValidationTests -j$(nproc)
          bash tools/easm2bytecode.sh tests/evm_asm tests/evm_asm

      - name: Verify .inc generator output is up-to-date
        run: |
          python tools/generate_x86_cg_peephole.py \
            --rules src/compiler/target/x86/x86_cg_peephole_rules.json \
            --out-inc /tmp/x86_cg_peephole_generated_check.inc \
            --out-report /tmp/x86_cg_peephole_report_check.txt
          diff /tmp/x86_cg_peephole_generated_check.inc \
            build/src/compiler/generated/target/x86/x86_cg_peephole_generated.inc

      - name: Run peephole rule validation check
        run: |
          python tools/check_x86_cg_peephole_validation.py \
            --rules src/compiler/target/x86/x86_cg_peephole_rules.json \
            --gtest-binary build/x86CgPeepholeTests

      - name: Run dmir rewrite validation tests
        run: ./build/dmirValidationTests

      - name: Collect compiler pass timings
        run: |
          python tools/collect_compiler_pass_timings.py \
            --dtvm build/dtvm \
            --manifest tests/evm_asm/compiler_pass_timing_manifest.json \
            --runs 5 \
            --output /tmp/ci_timing_report.json \
            -- --format evm --mode multipass --compile-only

      - name: Refresh timing budgets from CI data
        run: |
          python tools/update_compiler_pass_timing_budget.py \
            --report /tmp/ci_timing_report.json \
            --out /tmp/ci_budget_x86_cg_peephole.json \
            --budget-in tests/evm_asm/compiler_pass_timing_budget_x86_cg_peephole.json \
            --target-pass x86_cg_peephole \
            --manifest tests/evm_asm/compiler_pass_timing_manifest.json \
            --runs 5
          python tools/update_compiler_pass_timing_budget.py \
            --report /tmp/ci_timing_report.json \
            --out /tmp/ci_budget_dmir_rewrite.json \
            --budget-in tests/evm_asm/compiler_pass_timing_budget_dmir_rewrite.json \
            --target-pass dmir_rewrite \
            --manifest tests/evm_asm/compiler_pass_timing_manifest.json \
            --runs 5

      - name: Check timing budget (x86_cg_peephole)
        run: |
          python tools/check_compiler_pass_timing_budget.py \
            --budget /tmp/ci_budget_x86_cg_peephole.json \
            --report /tmp/ci_timing_report.json

      - name: Check timing budget (dmir_rewrite)
        run: |
          python tools/check_compiler_pass_timing_budget.py \
            --budget /tmp/ci_budget_dmir_rewrite.json \
            --report /tmp/ci_timing_report.json
70 changes: 70 additions & 0 deletions docs/changes/2026-03-30-peephole-optimization-system/README.md
@@ -0,0 +1,70 @@
# Change: Peephole Optimization System for dMIR and x86 CgIR

- **Status**: Implemented
- **Date**: 2026-03-30
- **Tier**: Full

## Overview

This change adds a two-level peephole optimization system that targets both dMIR (mid-level IR) and x86 CgIR (code generation IR). The dMIR level has 65 accepted rewrite rules (plus 5 seed rules) covering identity elimination, boolean algebra, shift-zero, and carry-dead rewrites. The x86 CgIR level has 13 declarative JSON rules for self-moves, zero-shifts, redundant CMP/TEST, fallthrough branches, and setcc+test+jne chain folding. The change also includes Z3-verified synthesized rules and a CI validation gate.

## Motivation

The JIT compiler generated redundant instructions from mechanical U256 decomposition and lowering. Peephole optimization is a standard compiler technique for cleaning up such patterns without restructuring the pipeline. The two-level approach catches patterns at both the IR and machine-code levels.

## Impact

### Affected Modules

- `docs/modules/compiler/` — new dMIR rewrite pass, carry-dead analysis, rule table infrastructure
- `docs/modules/singlepass/` — x86 CgIR peephole pass
- CI pipeline — new `peephole_validation_and_timing_budget` job

### Affected Contracts

No API or interface changes.

### Compatibility

- No breaking changes
- +4.6% geomean improvement on evmone-bench (27 benchmarks)
- Notable wins: snailtracer +3.9%, structarray_alloc +4.1%, swap_math +5.0-5.8%, memory_grow_mstore +11-13%
- ~0.005ms p95 compile overhead from dMIR rewrite pass

## Implementation Plan

### Phase 1: dMIR Rewrite Infrastructure

- [x] Pattern matching framework
- [x] Rule table
- [x] Validation tests

### Phase 2: Carry-Dead Analysis

- [x] `isCarryDead()` for adc→add and sbb→sub rewrites on dead-carry limbs

### Phase 3: Z3-Synthesized Rules

- [x] `add(x,x)→shl(x,1)`, negation folding, boolean identities
- [x] Verified via `tools/synthesize_dmir_rules.py`
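
The kind of equivalence that Phase 3 establishes can be illustrated with a small brute-force check. This is only a sketch: it substitutes exhaustive enumeration at an assumed 8-bit width for the Z3 proof, it does not reproduce `tools/synthesize_dmir_rules.py`, and the helper name `equivalent` is hypothetical.

```python
from itertools import product

WIDTH = 8                      # assumed small width so enumeration stays cheap
MASK = (1 << WIDTH) - 1


def equivalent(lhs, rhs, arity):
    """Return True if lhs and rhs agree on every WIDTH-bit input tuple."""
    return all(
        (lhs(*args) & MASK) == (rhs(*args) & MASK)
        for args in product(range(1 << WIDTH), repeat=arity)
    )


# add(x, x) => shl(x, 1)
add_self_ok = equivalent(lambda x: x + x, lambda x: x << 1, arity=1)

# xor(and(x, y), xor(x, y)) => or(x, y)
bool_factor_ok = equivalent(
    lambda x, y: (x & y) ^ (x ^ y), lambda x, y: x | y, arity=2
)

# A deliberately wrong candidate, sub(x, x) => x, must be rejected.
bad_rule_ok = equivalent(lambda x: x - x, lambda x: x, arity=1)
```

Enumeration only covers the chosen width, so it is a sanity check rather than a proof; an SMT solver is what allows the same claim to be made at 64 bits.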

### Phase 4: x86 CgIR Peephole

- [x] 13 declarative JSON rules
- [x] Pattern matching on machine instructions
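
As a rough illustration of pattern matching on machine instructions, the sketch below removes two of the no-op shapes named in the overview (self-moves and zero-shifts). The tuple encoding, the in-place two-operand shift form, and the function name `peephole` are assumptions made for this example; the actual JSON rule schema and the generated matcher are not shown in this change document.

```python
def peephole(instrs):
    """Drop instructions that match a no-op pattern; keep the rest in order.

    Each instruction is modelled as (opcode, operand, ...), a hypothetical
    encoding used only for this sketch.
    """
    kept = []
    for opcode, *operands in instrs:
        # Self-move: a 64-bit register copied onto itself changes nothing.
        if opcode == "MOV64rr" and operands[0] == operands[1]:
            continue
        # Zero-shift: shifting by an immediate 0 leaves the register unchanged.
        if opcode in ("SHL64ri", "SHR64ri", "SAR64ri") and operands[1] == 0:
            continue
        kept.append((opcode, *operands))
    return kept


example = [
    ("MOV64rr", "rax", "rax"),   # removed
    ("SHL64ri", "rbx", 0),       # removed
    ("MOV64rr", "rcx", "rax"),   # kept: a real copy
    ("ADD64rr", "rax", "rbx"),   # kept
]
optimized = peephole(example)
```

The sketch restricts the self-move rule to the 64-bit form on purpose: a 32-bit register-to-register move on x86-64 clears the upper half of the destination, so it is not always a no-op.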

### Phase 5: CI Gate

- [x] `.inc` freshness check
- [x] Structural/execution/semantics validation
- [x] Compile-time budget enforcement
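
The budget enforcement step can be pictured as a percentile comparison. The sketch below is hypothetical: the report and budget file formats used by `tools/check_compiler_pass_timing_budget.py` are not shown here, so it works on plain lists, and the names `p95` and `within_budget` are invented for illustration. The 1.25% share value is taken from the commit log for `dmir_rewrite`.

```python
def p95(samples):
    """Nearest-rank 95th percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    # ceil(0.95 * n) computed with integers to avoid float rounding issues.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def within_budget(pass_ms, total_ms, max_share):
    """Check that the p95 share of compile time spent in one pass fits the budget."""
    shares = [p / t for p, t in zip(pass_ms, total_ms)]
    return p95(shares) <= max_share


# Five runs, matching the --runs 5 setting in the CI job.
ok = within_budget([0.005] * 5, [1.0] * 5, max_share=0.0125)
too_slow = within_budget([0.02] * 5, [1.0] * 5, max_share=0.0125)
```

With only five runs the nearest-rank p95 is simply the slowest run, so a single noisy sample can trip the gate.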

## Compatibility Notes

No backwards-incompatible changes. The optimization passes are additive and do not alter any external APIs or module interfaces.

## Risks

- Rewrite rules must preserve U256 semantics exactly. All rules are Z3-verified, but the carry-chain analysis could still miss an edge case
- Compile-time budget (0.005ms p95) may need adjustment as more rules are added
- JSON rule format for x86 CgIR is a new abstraction layer that adds maintenance surface
86 changes: 86 additions & 0 deletions docs/compiler/dmir_to_x86_mapping.md
@@ -0,0 +1,86 @@
<!--
Copyright (C) 2025 the DTVM authors. All Rights Reserved.
SPDX-License-Identifier: Apache-2.0
-->

# dMIR To CgIR/x86 Mapping

## Scope

This note records the lowering bridge for the dMIR arithmetic subset that the
offline rewrite pipeline currently touches, plus the safe subset already wired
into the production dMIR rewrite pass:

- integer `add/sub`
- `cmp`
- `select`
- `adc/sbb`
- EVM 64x64->128 multiplication helpers
- EVM 128/64 division helpers

Phase 1 keeps the production DSL at `CgIR/x86`, so every dMIR-side candidate
rule eventually has to be translated into the instruction families emitted by
`X86CgLowering`.

## Current Production Status

`JITCompilerBase::compileMIRToCgIR()` now runs a tree-local `DMirRewritePass`
after `dead_mbb_elim` and before x86 lowering. The pass currently applies only
a conservative in-code subset of accepted rules whose replacements are either
existing subtrees, typed integer constants, or small synthesized boolean
expressions, for example:

- `add/sub/or/xor/shift` identities with zero
- `and` identities with zero or all-ones
- `not(not x) => x`
- `select(cond, x, x) => x`
- complement folds such as `or((not x), x) => allones`
- boolean factoring such as `xor((and x y), (xor x y)) => (or x y)`

`adc` and `sbb` candidates remain validation-only: the explicit third operand
is visible in dMIR, but rewriting them safely still requires carry/borrow-chain
proof beyond the current structural pass.
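
The tree-local folds listed above can be sketched as a bottom-up rewrite over expression tuples. This is a minimal illustration, not the `DMirRewritePass` implementation: the tuple encoding, the untyped 64-bit constants, and the function name `rewrite` are assumptions.

```python
ZERO = ("const", 0)
ALL_ONES = ("const", (1 << 64) - 1)   # assumes 64-bit operands


def rewrite(expr):
    """Rewrite children first, then try each local fold on the current node."""
    op, *args = expr
    if op in ("const", "var"):
        return expr
    args = [rewrite(a) for a in args]

    # Zero identities: a right-hand zero works for all of these ops,
    # a left-hand zero only for the commutative ones.
    if op in ("add", "sub", "or", "xor", "shl", "shr") and args[1] == ZERO:
        return args[0]
    if op in ("add", "or", "xor") and args[0] == ZERO:
        return args[1]
    if op == "and":
        if ZERO in args:
            return ZERO
        if args[0] == ALL_ONES:
            return args[1]
        if args[1] == ALL_ONES:
            return args[0]
    # not(not x) => x
    if op == "not" and args[0][0] == "not":
        return args[0][1]
    # select(cond, x, x) => x
    if op == "select" and args[1] == args[2]:
        return args[1]
    # or(not x, x) => allones, in either operand order
    if op == "or" and (args[0] == ("not", args[1]) or args[1] == ("not", args[0])):
        return ALL_ONES
    return (op, *args)


x = ("var", "x")
folded = rewrite(("add", ("not", ("not", x)), ZERO))
```

Every replacement here is an existing subtree or a constant, which matches the conservative restriction described above. `adc` and `sbb` nodes fall through unchanged.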

## Mapping Table

| dMIR expression family | Lowering entrypoint | CgIR/x86 family | Bridge notes |
| --- | --- | --- | --- |
| `add`, `sub` | generic FastISel path in `CgLowering<X86CgLowering>` plus `X86GenFastISel.inc` (see `src/compiler/target/x86/x86lowering.h`) | `ADD*rr/ri`, `SUB*rr/ri` | This path is table-driven, not hand-written in `x86lowering.cpp`. The exact register/immediate form depends on operand materialization. |
| `cmp` | `X86CgLowering::lowerCmpExpr()` in `src/compiler/target/x86/x86lowering.cpp` | compare op (`CMP*` or `TEST*`) + `SETCCr` + optional `MOVZX32rr8` | Integer compare results become 8-bit condition materialization first, then widen to i32/i64. This is the source-side pattern behind the existing `SETCCr/TEST8rr/JCC_1` peephole fold. |
| `select` | `X86CgLowering::lowerSelectExpr()` in `src/compiler/target/x86/x86lowering.cpp` | integer: `CMOV*`; floating-point: conditional branch + `COPY` | Integer `select` survives as a recognizable dataflow choice. Floating-point `select` is lowered into control flow and loses the direct value-select shape. |
| `adc` | `X86CgLowering::lowerAdcExpr()` in `src/compiler/target/x86/x86lowering.cpp` | `ADC8rr`, `ADC16rr`, `ADC32rr`, `ADC64rr` | The carry operand is not reified in x86 CgIR. Lowering asserts that operand 2 is the constant zero and then consumes the hardware `CF` chain directly. Any dMIR-side analysis that depends on the explicit third operand being zero must therefore happen before lowering. That alone does not justify rewriting `adc(lhs, rhs, 0)` into `add(lhs, rhs)` inside an EVM carry chain. |
| `sbb` | `X86CgLowering::lowerSbbExpr()` in `src/compiler/target/x86/x86lowering.cpp` | `SBB8rr`, `SBB16rr`, `SBB32rr`, `SBB64rr` | Same information-loss caveat as `adc`: x86 CgIR only preserves the borrow-consuming instruction, not the explicit third operand from dMIR. The zero-borrow precondition can be checked only before lowering, but borrow-chain safety still has to be established separately. |
| `evm_umul128_lo`, `evm_umul128_hi` | `X86CgLowering::lowerEvmUmul128Expr()` and `lowerEvmUmul128HiExpr()` in `src/compiler/target/x86/x86lowering.cpp` | `COPY -> RAX`, `MUL64r`, `COPY RAX`, optional `COPY RDX` | The low half is always materialized from `RAX`. The high half exists only when an `evm_umul128_hi` user is present; lowering pre-scans the function and allocates the extra copy lazily. |
| `evm_udiv128_by64`, `evm_urem128_by64` | `X86CgLowering::lowerEvmUdiv128By64Expr()` and `lowerEvmUrem128By64Expr()` in `src/compiler/target/x86/x86lowering.cpp` | `COPY -> RDX`, `COPY -> RAX`, `DIV64r`, `COPY RAX`, `COPY RDX` | Quotient and remainder are split across `RAX` and `RDX`. As with `umul128`, the helper pair lowers to one x86 instruction plus explicit register copies. |
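
The register split in the last two rows can be checked with plain integer arithmetic. The sketch below models the architectural behaviour of the x86 `MUL` and `DIV` instructions only; the function names are hypothetical and nothing here reflects how `X86CgLowering` is written.

```python
MASK64 = (1 << 64) - 1


def mul64(rax, src):
    """Model MUL64r: RDX:RAX = RAX * src (unsigned). Returns (RAX, RDX)."""
    product = rax * src
    return product & MASK64, product >> 64


def div64(rdx, rax, src):
    """Model DIV64r: divide RDX:RAX by src. Returns (RAX=quotient, RDX=remainder)."""
    dividend = (rdx << 64) | rax
    quotient, remainder = divmod(dividend, src)   # src == 0 raises, like #DE
    if quotient > MASK64:
        # Hardware raises a divide error when the quotient does not fit in RAX.
        raise OverflowError("quotient does not fit in 64 bits")
    return quotient, remainder


lo, hi = mul64(MASK64, MASK64)       # low half from RAX, high half from RDX
quot, rem = div64(0, 7, 2)
```

Both results come out of a single instruction, which is why the low/high and quotient/remainder helper pairs each lower to one instruction plus register copies.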

## Translation Rules For The Current Seed Set

The current seed dMIR candidate file lives at
`src/compiler/mir/dmir_rewrite_rules.json`. For Phase 1 option A, these rules
translate into x86-facing families as follows:

| dMIR candidate | x86-facing shape after lowering | Recommended landing layer |
| --- | --- | --- |
| `(add x 0:i64) => x` | `ADD*rr/ri` with a zero operand | x86 DSL can represent this, but only after matching the exact zero-immediate form. |
| `(not (not x)) => x` | `NOT*` pair | Either layer works; x86 DSL keeps it target-specific. |
| `(select cond x x) => x` | integer `CMOV*` or FP branch diamond | Prefer dMIR for the generic rule. Lowering splits the integer and FP cases. |
| `(adc x y 0:i64) => (add x y)` | `ADC*rr` consuming implicit `CF` | Only a dMIR-side candidate today. The explicit third operand disappears after lowering, so this precondition cannot be recovered at the x86 layer. A future promotion still needs carry-chain-specific safety proof. |
| `(sbb x y 0:i64) => (sub x y)` | `SBB*rr` consuming implicit `CF` | Same reasoning as `adc`: the precondition is only visible in dMIR, but promotion still needs borrow-chain-specific safety proof. |
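
The `adc` row is easier to see with a two-limb example. The sketch below models `ADD` and `ADC` with an explicit carry flag and shows what goes wrong when the high limb of a 128-bit addition is rewritten from `ADC` to `ADD`. The function names are hypothetical and the model is limited to the carry flag.

```python
MASK64 = (1 << 64) - 1


def add64(a, b):
    """Model x86 ADD: returns (result, carry_out)."""
    total = a + b
    return total & MASK64, total >> 64


def adc64(a, b, cf):
    """Model x86 ADC: also adds the incoming carry flag."""
    total = a + b + cf
    return total & MASK64, total >> 64


def add128_with_adc(x, y):
    """Correct chain: the high limb consumes the carry produced by the low limb."""
    lo, cf = add64(x & MASK64, y & MASK64)
    hi, _ = adc64(x >> 64, y >> 64, cf)
    return (hi << 64) | lo


def add128_with_add_only(x, y):
    """Unsafe rewrite: the high limb uses ADD and silently drops the carry."""
    lo, _ = add64(x & MASK64, y & MASK64)
    hi, _ = add64(x >> 64, y >> 64)
    return (hi << 64) | lo


correct = add128_with_adc(MASK64, 1)       # carry propagates into the high limb
broken = add128_with_add_only(MASK64, 1)   # carry is lost
```

The two versions agree whenever the low limb does not overflow, so a test that avoids carry-producing inputs would not catch the bad rewrite.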

## Why This Mapping Matters

Two pieces of information are lost across lowering:

- The explicit third operand of `adc/sbb`
- The high-level `select(cmp(...), lhs, rhs)` shape once it turns into x86
condition codes plus `SETCCr`, `CMOV*`, or explicit branches

That split is the main reason the current implementation keeps three parallel
tracks:

- a conservative production `DMirRewritePass` for tree-local structural folds
- production peepholes at `CgIR/x86`
- offline dMIR candidate rules plus interpreter-backed validation

The mapping above is the minimum subset needed to move rules between those
tracks without rediscovering the source locations each time.