perf(evm): depth-indexed InterpreterExecContext pool for nested calls #482
Pull request overview
This PR significantly changes the EVM execution hot path to reduce interpreter overhead and introduce a profile-guided, background JIT compilation flow, while also adding a depth-indexed InterpreterExecContext pool to avoid per-nested-call allocations.
Changes:
- Reuse InterpreterExecContext per call depth to eliminate repeated large frame-stack allocations in deeply nested calls.
- Add profile-guided JIT triggering with a sliding-window profiler and a background compilation thread pool, plus new config/CLI options.
- Inline several “pure”/hot EVM opcodes in the interpreter dispatch loop to reduce handler overhead.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| src/vm/dt_evmc_vm.cpp | Adds depth-indexed exec-context reuse and implements profile-guided JIT + background compilation plumbing in execute() |
| src/tests/spec_unit_tests.cpp | Adds --enable-profile-guided-jit CLI flag for the spec test runner |
| src/tests/solidity_contract_tests.cpp | Adds --enable-profile-guided-jit flag but removes --enable-multipass-lazy |
| src/runtime/runtime.cpp | Switches between interpreter/JIT based on whether JIT code is actually available |
| src/runtime/evm_module.h | Makes JIT code pointer atomic and adds a std::future for background compilation |
| src/runtime/evm_module.cpp | Waits for any in-flight background JIT compilation in EVMModule destructor; skips eager compile when PGJ is enabled |
| src/runtime/config.h | Adds EnableProfileGuidedJIT and NumJITCompileThreads runtime config fields |
| src/evm/interpreter.cpp | Inlines several opcode implementations inside the dispatch loop (calldata/txcontext/memory/misc/transient storage) |
| src/compiler/evm_frontend/evm_mir_compiler.cpp | Minor comment tweak in MIR builder init |
| src/compiler/evm_compiler.cpp | Wraps eager EVM JIT compile in try/catch and changes how JIT code pointer is published |
| src/cli/dtvm.cpp | Adds --enable-profile-guided-jit to the CLI |
| src/action/compiler.cpp | Removes the lazy-compilation warning branch and always runs eager EVM JIT compile in multipass mode |
⚡ Performance Regression Check Results
✅ Performance Check Passed (interpreter)
Performance Benchmark Results (threshold: 25%)
Summary: 194 benchmarks, 0 regressions
✅ Performance Check Passed (multipass)
Performance Benchmark Results (threshold: 25%)
Summary: 194 benchmarks, 0 regressions
47a7f32 to 99f4061
Pull request overview
This PR targets interpreter-mode EVM performance by reducing per-call allocations for nested calls and inlining several hot opcodes directly into the computed-goto dispatch loop.
Changes:
- Add a depth-indexed InterpreterExecContext pool to reuse execution contexts across nested EVMC calls.
- Inline multiple “pure read” opcodes and several memory/misc/transient-storage opcodes in BaseInterpreter::interpret() to avoid handler overhead and reduce hot-path work.
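The depth-indexed reuse described above can be illustrated with a minimal sketch. This is not the code from src/vm/dt_evmc_vm.cpp: `ExecContextSketch`, `DepthIndexedPool`, `acquire`, and `allocated` are hypothetical names, the 32 KiB buffer size is arbitrary, and the sketch assumes one pool per thread so that a given depth slot is never in use twice at the same time. The only external fact relied on is the EVM call depth limit of 1024.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for InterpreterExecContext. The real type owns
// large frame stacks, which is what makes a fresh allocation per call costly.
struct ExecContextSketch {
  std::vector<uint8_t> FrameStack;
  uint64_t UseCount = 0;

  ExecContextSketch() : FrameStack(32 * 1024) {} // arbitrary size for the sketch

  // Clear per-call state but keep the underlying allocation.
  void reset() { ++UseCount; }
};

// One context per call depth, created on first use and reused afterwards.
class DepthIndexedPool {
public:
  static constexpr size_t MaxDepth = 1024; // EVM call depth limit

  ExecContextSketch &acquire(size_t Depth) {
    assert(Depth <= MaxDepth);
    if (Depth >= Slots.size())
      Slots.resize(Depth + 1);
    if (!Slots[Depth])
      Slots[Depth] = std::make_unique<ExecContextSketch>();
    Slots[Depth]->reset();
    return *Slots[Depth];
  }

  // Number of contexts that have actually been allocated so far.
  size_t allocated() const {
    size_t Count = 0;
    for (const auto &Slot : Slots)
      if (Slot)
        ++Count;
    return Count;
  }

private:
  // unique_ptr keeps each context at a stable address when the vector grows,
  // so a parent frame's reference stays valid while a deeper call adds a slot.
  std::vector<std::unique_ptr<ExecContextSketch>> Slots;
};
```

In this sketch a nested call at depth d would call `acquire(d)` on entry. The second and later calls at the same depth reuse the existing buffer instead of allocating a new one.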
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/vm/dt_evmc_vm.cpp | Adds a per-depth InterpreterExecContext reuse pool for interpreter fast-path execution. |
| src/evm/interpreter.cpp | Inlines multiple opcode implementations inside the computed-goto interpreter loop for performance. |
// ---- Memory ops (inlined: MLOAD/MSTORE/MSTORE8) ----
// Each opcode performs:
//   1. stack underflow check
//   2. memory expansion + gas charge (mirror of
//      checkMemoryExpandAndChargeGas in opcode_handlers.cpp)
//   3. memcpy load/store
// The gas formula MUST match the canonical
// calculateMemoryExpansionCost EXACTLY:
//   MemoryCost(W) = (W*W)/512 + 3*W   (computed in __int128)
//   delta = MemoryCost(NewWords) - MemoryCost(CurrentWords)
// NewWords/CurrentWords = ceil(size/32) where size is byte length.
// Any deviation (e.g. inlining the subtraction before the divide)
// can desynchronize gas accounting and break consensus.
Agree the duplication is a maintenance concern. I'll extract a static inline helper for the expansion+charge sequence in a follow-up.
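As a rough illustration of the formula in the quoted comment, the sketch below computes the expansion cost and shows why folding the subtraction into the divide is unsafe. It is not the project's calculateMemoryExpansionCost: `wordsFor`, `memoryCost`, `expansionCost`, and `wrongExpansionCost` are hypothetical names, and `unsigned __int128` is a GCC/Clang extension, so this assumes one of those compilers.

```cpp
#include <cstdint>

// Words needed to cover Size bytes: ceil(Size / 32), written to avoid overflow.
inline uint64_t wordsFor(uint64_t Size) {
  return Size / 32 + (Size % 32 != 0 ? 1 : 0);
}

// MemoryCost(W) = (W*W)/512 + 3*W, evaluated in 128 bits so W*W cannot wrap.
inline unsigned __int128 memoryCost(uint64_t Words) {
  unsigned __int128 W = Words;
  return (W * W) / 512 + 3 * W;
}

// delta = MemoryCost(NewWords) - MemoryCost(CurrentWords); zero if no growth.
inline unsigned __int128 expansionCost(uint64_t CurrentSize, uint64_t NewSize) {
  uint64_t Cur = wordsFor(CurrentSize);
  uint64_t New = wordsFor(NewSize);
  if (New <= Cur)
    return 0;
  return memoryCost(New) - memoryCost(Cur);
}

// Deliberately WRONG variant: subtracting before the integer divide changes
// the rounding, so it can disagree with the canonical formula by one gas.
inline unsigned __int128 wrongExpansionCost(uint64_t CurWords, uint64_t NewWords) {
  unsigned __int128 C = CurWords;
  unsigned __int128 N = NewWords;
  return (N * N - C * C) / 512 + 3 * (N - C);
}
```

For example, growing from 16 to 32 words costs (1024/512 + 96) - (256/512 + 48) = 50 gas with the canonical formula, while the folded variant gives 768/512 + 48 = 49.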
CI failed. @ys8888john
512ee4e to 66d72f1
if (OffsetVal > intx::uint256(InputSize)) {
  Frame->Stack[sp - 1] = intx::uint256(0);
} else {
  // OffsetVal <= InputSize fits safely in size_t.
Only inline small functions, so the code stays easy for humans to read.
The code in the else branch is too large to inline.
This inlining policy is inspired by the evmone baseline interpreter. For readability, larger ops such as RETURN and RETURNDATACOPY are still dispatched through *Handler::doExecute().
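For context on the size of the branch under discussion, here is a minimal sketch of the zero-padded 32-byte load that a CALLDATALOAD-style opcode performs. It is not the PR's code: `callDataLoadSketch` is a hypothetical name, the offset is simplified to `uint64_t` instead of `intx::uint256`, and the result is returned as big-endian bytes rather than pushed onto the stack. The sketch uses `>=` for the out-of-range test, which gives the same all-zero word as the quoted `>` check when the offset equals the input size.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Returns the 32-byte word read at Offset, zero-padded past the end of input.
inline std::array<uint8_t, 32> callDataLoadSketch(const uint8_t *Input,
                                                  size_t InputSize,
                                                  uint64_t Offset) {
  std::array<uint8_t, 32> Word{}; // zero-initialized, doubles as the padding
  if (Offset >= InputSize)
    return Word; // nothing to copy: the result is all zeros
  // Offset < InputSize here, so the cast to size_t cannot truncate.
  size_t Start = static_cast<size_t>(Offset);
  size_t Count = std::min<size_t>(32, InputSize - Start);
  std::memcpy(Word.data(), Input + Start, Count);
  return Word;
}
```

Even in this reduced form the in-range branch needs a bounds computation and a copy, which is the kind of size trade-off the reviewer raises.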
66d72f1 to cad7286
cad7286 to 1a62057
After rebasing onto current upstream/main (which now includes DTVMStack#458 / DTVMStack#460 / DTVMStack#482 / DTVMStack#483 perf work) and running a 10-rep evmone-bench on the 27 paper benches, the cumulative PR delta has collapsed to noise (raw geomean +1.15%, +0.46% after correcting a single-iteration outlier on main/blake2b_shifts/8415nulls via a focused 20-rep re-measurement). 0 benches above the +/-25% CI gate.
The A-vs-PR-base -2.73% from this commit's own optimization is unchanged; the framing shift is that the absolute runtime delta of the whole PR vs unmodified main has been absorbed by the intervening upstream perf optimizations.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1. Does this PR affect any open issues?(Y/N) and add issue references (e.g. "fix #123", "re #123".):
2. What is the scope of this PR (e.g. component or file name):
3. Provide a description of the PR(e.g. more details, effects, motivations or doc link):
4. Are there any breaking changes?(Y/N) and describe the breaking changes(e.g. more details, motivations or doc link):
5. Are there test cases for these changes?(Y/N) select and add more details, references or doc links:
6. Release note