ci: speed up evmone perf regression checks #513
Merged
Pull request overview
Speeds up the CI perf regression job by caching the evmone-bench checkout/build by resolved for_test commit SHA, running benchmarks in parallel shards, and adding PR-run concurrency cancellation to reduce redundant CI work.
Changes:
- Add commit-SHA-based caching for the evmone benchmark checkout/build in the x86 EVM workflow.
- Add benchmark sharding/parallel execution support to `tools/check_performance_regression.py` via `--benchmark-jobs`.
- Update `.ci/run_test_suite.sh` to resolve and check out a specific evmone commit (and pass through benchmark parallelism settings).
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tools/check_performance_regression.py | Adds --benchmark-jobs and parallel shard execution for evmone-bench runs. |
| .github/workflows/dtvm_evm_test_x86.yml | Adds concurrency cancellation and caches evmone-bench by resolved commit; runs perf benchmarks with 3 shards. |
| .ci/run_test_suite.sh | Resolves and checks out evmone by commit SHA; passes --benchmark-jobs; builds only evmone-bench. |
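The sharded execution summarized in the table can be pictured with a minimal sketch. This is not the code in `tools/check_performance_regression.py`; the function names `shard_benchmarks`, `run_shards`, and `fake_run` are hypothetical, and the round-robin split is an assumed strategy chosen only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def shard_benchmarks(names, jobs):
    """Split benchmark names into up to `jobs` round-robin shards."""
    if jobs < 1:
        raise ValueError("jobs must be >= 1")
    shards = [names[i::jobs] for i in range(jobs)]
    # Drop empty shards when there are fewer benchmarks than jobs.
    return [shard for shard in shards if shard]


def run_shards(shards, run_one):
    """Run every shard concurrently and merge the per-benchmark results."""
    merged = {}
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        for result in pool.map(run_one, shards):
            merged.update(result)
    return merged


def fake_run(shard):
    # Hypothetical stand-in for invoking evmone-bench on one shard;
    # returns a made-up timing for each benchmark name.
    return {name: 1.0 for name in shard}


if __name__ == "__main__":
    shards = shard_benchmarks(["a", "b", "c", "d", "e"], 3)
    print(shards)
    print(run_shards(shards, fake_run))
```

Shards that run at the same time on one runner compete for CPU, so a higher shard count trades wall time for measurement noise.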
⚡ Performance Regression Check Results
✅ Performance Check Passed (interpreter)
Performance Benchmark Results (threshold: 25%)
Summary: 194 benchmarks, 0 regressions
✅ Performance Check Passed (multipass)
Performance Benchmark Results (threshold: 25%)
Summary: 194 benchmarks, 0 regressions
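As a rough illustration of what a 25% threshold means here, the sketch below flags any benchmark whose current time exceeds its baseline by more than the threshold. The function name `find_regressions` and the dictionary input shape are assumptions for this example, not the actual interface of the regression script.

```python
def find_regressions(baseline, current, threshold_pct=25.0):
    """Return {name: percent_slowdown} for benchmarks over the threshold.

    `baseline` and `current` map benchmark names to times in the same unit.
    Benchmarks missing from `current` or with a non-positive baseline are
    skipped in this sketch.
    """
    regressions = {}
    for name, base_time in baseline.items():
        if name not in current or base_time <= 0:
            continue
        change_pct = (current[name] - base_time) * 100.0 / base_time
        if change_pct > threshold_pct:
            regressions[name] = change_pct
    return regressions


if __name__ == "__main__":
    baseline = {"a": 100.0, "b": 100.0}
    current = {"a": 120.0, "b": 130.0}
    print(find_regressions(baseline, current))
```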
2e396c1 to 1f7965b
What
- [...] `DTVMStack/evmone` `for_test` HEAD during the perf job and cache the resulting `evmone-bench` checkout/build by resolved commit SHA.
- [...] `for_test` commit; cache invalidates automatically when that branch moves.
- [...] `BENCHMARK_REPETITIONS=5`.
Why
The perf regression job is dominated by building/running evmone benchmark work. This keeps the same same-runner baseline/current comparison semantics while reducing repeated evmone setup/build work and benchmark wall time.
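Keying a cache on a resolved commit SHA can be sketched as follows. The PR does this in the shell script and workflow, so this Python version is only an illustration; `parse_ls_remote`, `resolve_branch_sha`, `cache_key`, and the `evmone-bench-<sha>` key format are hypothetical. The only external command assumed is `git ls-remote <repo> <ref>`, which prints `<sha>` and `<ref>` separated by a tab.

```python
import subprocess


def parse_ls_remote(output, ref):
    """Return the SHA that `git ls-remote` output lists for `ref`, or None."""
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1] == ref:
            return parts[0]
    return None


def resolve_branch_sha(repo_url, branch):
    """Ask the remote for the branch head without cloning the repository."""
    ref = f"refs/heads/{branch}"
    out = subprocess.run(
        ["git", "ls-remote", repo_url, ref],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    sha = parse_ls_remote(out, ref)
    if sha is None:
        raise RuntimeError(f"{ref} not found on {repo_url}")
    return sha


def cache_key(prefix, sha):
    # Hypothetical key format: a new branch head yields a new key,
    # so a stale cached build is never reused.
    return f"{prefix}-{sha}"


if __name__ == "__main__":
    sample = "abc123\trefs/heads/for_test\n"
    sha = parse_ls_remote(sample, "refs/heads/for_test")
    print(cache_key("evmone-bench", sha))
```

Because the key embeds the SHA, no explicit invalidation step is needed: when the branch moves, the lookup misses and the build runs again.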
Relationship to #512
This branch is based directly on current `main` and intentionally excludes the Ninja/sccache rollout from #512. If #512 merges first, this PR will need a small rebase over `.ci/run_test_suite.sh` and `.github/workflows/dtvm_evm_test_x86.yml` because both PRs touch the perf job setup.
Validation
- `bash -n .ci/run_test_suite.sh`
- `python3 -m py_compile tools/check_performance_regression.py`
- `yaml.safe_load`
- `git diff --check`
Timing
This PR's first upstream CI run, including initial cache population:
- [...] 17m13s
- [...] 20m54s
Reference warm-cache fork validation for the same evmone cache + 3-shard strategy:
- [...] 28m14s, multipass 33m50s
- [...] 12m11s, multipass 15m50s
A 4-shard run was faster, but produced counted interpreter threshold exceedances, so this uses 3 shards as the safer speed/noise tradeoff.