feat(ci): overhaul fixture releases#2888
Conversation
abb005f to
8bd6b77
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## forks/amsterdam #2888 +/- ##
===================================================
+ Coverage 87.16% 90.50% +3.34%
===================================================
Files 586 535 -51
Lines 35791 32407 -3384
Branches 3364 3011 -353
===================================================
- Hits 31198 29331 -1867
+ Misses 3943 2559 -1384
+ Partials 650 517 -133
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
7eac7b3 to
d558bc7
Compare
|
This PR needs to be merged to fully test out the new workflow, but I did some smaller smoke tests on my fork:
To keep the workflow runnable in minutes on a fork (no Smoke test releases can be found here: https://github.com/spencer-tb/execution-specs/releases |
d558bc7 to
b2ac847
Compare
|
|
|
This is great, thanks. Just one question about benchmark releases. Does it make sense for those to also have benchmark-consensus and benchmark-devnet variants? For example, recently ive been hitting a slight complication which ive had to work around - the benchmark fixtures dont have BALs in them so I couldnt use them locally for optimisation work on top of the devnet changes. I had to synthesise my own fixtures essentially. If we had benchmark-devnet variant for those I wouldve not had to do this workaround |
|
I support @taratorio 's idea if this does not complicate the workflow too much |
danceratopz
left a comment
There was a problem hiding this comment.
Thanks so much for this effort! Looking forward to getting all releases in execution-specs!
I feel a little bit weird about hijacking the MAJOR version for fork/devnet version (fork version is MINOR in EELS releases, so another slight inconsistency). We do have to hope PandaOps never launches feat-devnet-alphaone or similar to avoid invalid versions 😆
On face value, it seems to fit well. And i think one of your major motivations is to avoid hard-coding releases in the hive-tests runner config repos...
https://github.com/ethpandaops/hive-tests/blob/52cf2fcfa6bb698c11344994d8e223e69dd3a969/.github/workflows/hive-devnet-7.yaml#L161-L162
But it might be worth thinking through a little more. If we have two devnet versions (bal-devnet-3,bal-devnet-7) running at the same time, does it still simplify artifact download? I guess we never have hive runner configs for two devnets simultaneously?
As-is we could already do:
consume cache --input=bal-devnet-7@latest
and if we aim to deprecate consume cache then both versions are equally convenient with gh release download, I think?
Versioning scheme in this PR:
#!/usr/bin/env bash
TAG=$(gh release list --repo ethereum/execution-specs --limit 100 --json tagName \
--jq 'map(select(.tagName | startswith("tests-bal-devnet@v7."))) | .[0].tagName')
gh release download "$TAG" --repo ethereum/execution-specs --pattern '*.tar.gz'Versioning scheme currently:
#!/usr/bin/env bash
TAG=$(gh release list --repo ethereum/execution-specs --limit 100 --json tagName \
--jq 'map(select(.tagName | startswith("tests-bal-devnet-7@v"))) | .[0].tagName')
gh release download "$TAG" --repo ethereum/execution-specs --pattern '*.tar.gz'If this does not greatly help downstream convenience I would opt for explicit hard-coded release names here bal-devnet-7) and not add our own custom convention to versioning.
If we keep it, I wonder, at the risk of complicating the workflows, we should automate/hard-code the major version to avoid user error (see the docs corrections below) as suggested here. This would avoid duplicating this in the case of the devnet branch and put it close to the fill --until=<fork> config in the case of a fork branch. The comments on the invalid tag/version highlight that this is error prone.
Could you restructure the docs to keep all (or most of) the "Formats and Release Layout" section, but move it below the "Release Tracks" (or "Test Release Types" if we rename)?
I think removing blockchain_test_engine from the benchmark spec deserves its own PR as it gets a bit lost here.
| consensus: | ||
| evm-type: eels | ||
| fill-params: --until=BPO2 --generate-all-formats | ||
| fill-params: --until=BPO4 |
There was a problem hiding this comment.
Can we keep --generate-all-formats please? :)
| evm-type: benchmark | ||
| fill-params: --fork=Osaka --generate-all-formats --gas-benchmark-values 100 ./tests/benchmark/compute | ||
| feature_only: true | ||
| fill-params: --fork=Amsterdam --gas-benchmark-values 1,10,30,60,100,150 ./tests/benchmark |
There was a problem hiding this comment.
Should we append 200M here to reflect the minimum glammie target value from the interop? Or other values @LouisTsai-Csie
Can also be a follow-up!
There was a problem hiding this comment.
Let's make it a follow up, I'm pulling the benchmark changes out of this PR
There was a problem hiding this comment.
The fact that we configure a fill flag here is initially surprizing, a boolean supports-xdist instead of x-dist would show clearer intent. But probably not worth the overhead.
There was a problem hiding this comment.
Keeping as-is for now.
There was a problem hiding this comment.
Nit: Can we rename x-dist to xdist? I've never seen it hyphenated anywhere else )
| # Client aliases | ||
| benchmark: | ||
| impl: geth | ||
| repo: ethereum/go-ethereum | ||
| ref: master | ||
| evm-bin: evm | ||
| x-dist: auto | ||
| consensus: | ||
| impl: eels | ||
| repo: null | ||
| ref: null | ||
| evm-bin: null | ||
| x-dist: auto | ||
|
|
There was a problem hiding this comment.
For consensus this seems like an unnecessary redirct? I'd skip this obfuscation and just use eels, respectively geth in .github/configs/feature.yaml.
And, same for benchmark, assuming we only have one type of benchmark feature (as is the case in this PR - it removes benchmark_fast).
There was a problem hiding this comment.
No longer using consensus.
There was a problem hiding this comment.
Benchmark is kept, as without it we have to update the evm version is 2 places.
Keeping the alias here means only one place needs updated, and shows clearly what t8n we are currently using for benchmarking which is a moving target atm
| | Example dispatch | Git tag | Release title | Artifact | | ||
| | ---------------- | ------- | ------------- | -------- | | ||
| | `feature=consensus version=v1.2.3` | `tests-consensus@v1.2.3` | `consensus@v1.2.3` | `fixtures_consensus.tar.gz` | | ||
| | `feature=bal-devnet version=v1.0.0 branch=bal-devnet-7` | `tests-bal-devnet@v1.0.0` | `bal-devnet@v1.0.0` | `fixtures_bal-devnet.tar.gz` | |
There was a problem hiding this comment.
Invalid version/tag I believe? It must be compatible with the target devnet version.
| | `feature=bal-devnet version=v1.0.0 branch=bal-devnet-7` | `tests-bal-devnet@v1.0.0` | `bal-devnet@v1.0.0` | `fixtures_bal-devnet.tar.gz` | | |
| | `feature=bal-devnet version=v7.0.0 branch=bal-devnet-7` | `tests-bal-devnet@v7.0.0` | `bal-devnet@v7.0.0` | `fixtures_bal-devnet.tar.gz` | |
| To cut a new release: | ||
|
|
||
| These releases are tagged using the format `<pre_release_name>@vX.Y.Z`. | ||
| 1. **Pick the next version** per the [Versioning Scheme](#versioning-scheme) for the track you're releasing on (e.g. the next consensus release after `consensus@v20.3.0` is `consensus@v20.3.1` for a tests-only bump, or `consensus@v20.4.0` for a behaviour change). |
There was a problem hiding this comment.
I think this could be less ambiguous:
- tests-only = new tests?
- behavior change = fixed tests/specs?
There was a problem hiding this comment.
Updated this, replaced the ambiguous wording with the scheme from Marios last comment: Y = consensus breaking spec change targeting fork/devnet X, Z = non breaking change (refactor) / new / modified tests.
| ```bash | ||
| gh workflow run release_fixtures.yaml -f feature=consensus -f version=v20.3.1 | ||
| # devnet releases additionally require the branch to release from: | ||
| gh workflow run release_fixtures.yaml -f feature=bal-devnet -f version=v1.0.0 -f branch=bal-devnet-7 |
There was a problem hiding this comment.
Perhaps this should also be tested by the workflow that the branch "suffix" matches the target release major version?
| gh workflow run release_fixtures.yaml -f feature=bal-devnet -f version=v1.0.0 -f branch=bal-devnet-7 | |
| gh workflow run release_fixtures.yaml -f feature=bal-devnet -f version=v7.0.0 -f branch=bal-devnet-7 |
There was a problem hiding this comment.
Done, the check is now in validate_inputs() via the python script.
| version: | ||
| description: "Release version, e.g. v20.0.0" | ||
| required: true | ||
| type: string |
There was a problem hiding this comment.
How about making this Y.Z and:
- Devnet release type: Extracting
Xfromdevnets/bal/X(or in potentially in the futuredevnets-bal-X. - Consensus release type: Hard-coding in
.github/configs/feature.yaml(close to where--until=<fork>is defined). - Benchmark release type: Hard-code in
.github/configs/feature.yaml.
There was a problem hiding this comment.
Went a different route here as per Mario's last comment. We now validate X for devnet releases only. We can add further validation to a follow up imo.
| env: | ||
| INPUT_FEATURE: ${{ inputs.feature }} | ||
| INPUT_VERSION: ${{ inputs.version }} | ||
| INPUT_BRANCH: ${{ inputs.branch }} |
There was a problem hiding this comment.
This is somewhat of an edge case, but it's valid and there's an easy fix that's probably worth adding.
From codex:
There's a potential race condition when INPUT_BRANCH is set, e.g. devnets/bal/7):
- setup, build (matrix), combine, release each call actions/checkout with:
ref: ${{ inputs.branch }}→ resolves the current branch tip independently at each checkout (lines 42, 97, 118, 174). - Release step:
TARGET="${INPUT_BRANCH:-$GITHUB_SHA}"→INPUT_BRANCH(a branch name string) →gh release create --target bal-devnet-7→ resolves the current branch tip again at tag creation time (line 221). - That's 5+ independent snapshots of the branch tip, any of which can disagree if the branch advances during the (potentially 24h =job timeout) build.
GITHUB_SHAdoesn't enter into this path at all — it's whatever the dispatch ref's tip was at dispatch time, which is typically a different ref (e.g.forks/amsterdam) and never gets consulted in this path.
No problem when INPUT_BRANCH is empty:
- Every actions/checkout is called with
ref: ${{ inputs.branch }}→ empty → checkout falls back toGITHUB_SHA. - Release step:
TARGET="${INPUT_BRANCH:-$GITHUB_SHA}"→GITHUB_SHA. GITHUB_SHAis immutable for the duration of the run.- Race-free. All jobs see the same SHA, tag lands on the same SHA. Done.
Suggested fix:
# in `setup` job, after checkout:
- name: Resolve target SHA
id: target_sha
run: echo "sha=$(git rev-parse HEAD)" >> "$GITHUB_OUTPUT"
Add target_sha to setup.outputs, then replace every ref: ${{ inputs.branch }} with ref: ${{ needs.setup.outputs.target_sha }} (in build, combine, release), and --target "$TARGET" with --target "${{ needs.setup.outputs.target_sha }}". Now there's exactly one tip resolution; everything downstream is SHA-pinned.
There was a problem hiding this comment.
Added the exact fix here thanks!
Hey @taratorio, thanks for asking and the feedback! Yes, this def makes sense in general and we can add these as required.
Were you filling these fixtures yourself? The latest release https://github.com/ethereum/execution-specs/releases/tag/tests-benchmark%40v0.0.9 is only filled for Osaka. There might have been a bit of flux here due to incompatible/incomplete t8n tools for EELS and geth (benchmark releases have been using geth), but as-is today 😆 on |
yes, I filled a few that I was interested in with erigon but not via t8n but my own quick and hacky way |
|
Just discussed with @spencer-tb and @LouisTsai-Csie, this is our suggestion going forward:
|
I mostly agree with this comment except for points 3 and 4, since I think the MAJOR should refer to the devnet/fork: My suggestion is as follows: For
For
I.e. if a client is targeting to join fork/devnet X, it should target MAJOR equal to X, must take the latest MINOR, and should ideally take the latest PATCH. With fork/devnet as MAJOR, the version number alone tells a client whether a release is relevant to them. Under the alternative (MAJOR tracks any spec change), you'd have to combine the version and the -N suffix in the release name to figure out whether a spec change targets your devnet or a different one. Devnet compatibility shouldn't require parsing two fields IMO. Concretely, the mindset just from looking at the version (ignoring the name) should be: I'm a client dev targeting devnet 7, and I was passing tests contained in On the topic of parallel maintenance of two different devnets, this scheme handles this naturally IMO. E.g. if we are maintaining devnets 3 & 7 for the same feature, releasing v3.0.1 after or alongside v7.0.0 is not a problem (think Python 2.x vs 3.x). It is a well-known Semver pattern, and Semver is good at this. We should simply make this rule clear and follow it so we are predictable. On On benchmarking, major and minor should mirror the feature they are targeting, while the patch moves freely at its own pace. |
|
🚢 |
ac49382 to
93a501c
Compare
e15b920 to
9ffaf6a
Compare
9ffaf6a to
bfafa90
Compare
🗒️ Description
This PR outlines a detailed description and updates our test fixture release process for non-benchmark tests.
Key Changes
evm-impl.yamlis merged intoevm.yamland each entry now carries itsimpl,repo,ref,evm-binandxdist.benchmarkclient alias is kept sofeature.yamlworkflows can reference a client by name, andbuild-evm-baseresolves which builder to run via theimplfield.tests,devnetandbenchmarkfeatures (inforks/amsterdam).devnetis specifically chosen so we don't ever need to updatefeature.yamlfor future devnet features. Thedevnetkeyword is used as a substring match on release tagging, and<feat>-devnetkeeps its friendly name.release_fixture_feature.yamlis replaced withrelease_fixtures.yaml:workflow_dispatchwith explicitfeature+versioninputs. The git tag istests-<feature>@<version>(thetestsfeature tags astests@<version>) and the release title is<feature>@<version>.versionmust matchvX.Y.Z,featuremust be non-empty,*-devnetrequires abranch, and anevmoverride must be a key inevm.yaml.evm/evm_repo/evm_refinputs override the client impl and the t8n tool repo/ref for a one off release.osakarange (no standalonebposplit).Tagging A Release
To tag releases we must now use the Github CLI. This has the benefit of only creating tags in EELS if the fixture building process is successful. No more tag deletion and recreation, only workflow triggers.
The following can be ran locally or optionally triggered with the github actions website UI.
Downloading A Release
Fixtures can be downloaded by 2 seperate methods.
gh release downloadfor the raw tarball, orconsume cacheif you want tag resolution (@latest), local caching, and--inputintegration with consume subcommands.Fixed Release Types
tests@vX.Y.ZIn the past (pre-Weld) we had the following release features:
stable&develop, wherestablewas a subset fordevelop. To converge on these 2 features we now define thetestsfeature. This is the invariant to thebenchmarkfeature.The
testsfeature acts as our mainnet set of tests. That is for nowfill --until BPO4, all tests for all forks until last mainnet fork. This will be released weekly on any change or addition to the tests always from the latest development branch:forks/amsterdamcurrently. Clients will use this release on their main/master branches in CI, and eventually this release will contain all tests fromethereum/testsðereum/legacy-tests(allowing us to archive both of these repos). TLDR; one tag type to verify you will not break mainnet. Additionally this will be ran in our Hive CI (under what is currently labelled generic).Here we define a new semantic versioning type for our
testsreleases:The first release of this type in EELS will be
tests@v20.0.0, to catch up from the last fork. For the fork under development (Amsterdam) the firsttestsrelease will be tagged once all CFI'd EIPs are deemed successful in a devnet (purposely ambiguous here, things change in ethereum). Typically this will resolve to the last devnet before the first testnet; this first release can be viewed as the testnet release. For Amsterdam we will tagtests@v24.0.0likely afterglamsterdam-devnet-6is deemed successful. Spec changes can still occur and that is why we have Y in the new fixture versioning scheme.tests-<feat>-devnet@vX.Y.ZThis release type will follow on from the current devnet release process but more explicitly. Today
<feat>isbaland soon it will beglamsterdam.<feat>can essentially be any keyword but typically is the fork name or the headliner feature.The
devnetfeature is entirely for test releases during the fork development process for the upcoming fork. Here we still fill for all forks so clients can make sure they do not introduce any regressions, currentlyfill --until Amsterdam, all tests for all forks until the development fork. Clients will use this release on theirdevnetbranches in CI; they must pass all of these tests before being included in the devnet that the release is tagged for. This release will be ran in Hive CI under the same naming scheme as we do currently.For the
baldevnets today we use the tagtests-bal@vX.Y.Z. This PR will change the process totests-bal-devnet@vX.Y.Z. Asbal-devnet-7is the last of thebal's we will start the new tagging scheme forglamsterdam-devnet-5. The devnet releases will additionally follow a new versioning scheme:glamsterdam-devnet-5this would be 5,testsreleases.Following the latter, the first
glamsterdam-devnet-5release will be tagged astests-glamsterdam-devnet@v5.0.0. All devnet releases must be tagged from a devnet branch in EELS, in this caseglamsterdam-devnet-5. Here we are specifically choosing to diverge from the EELS/branch naming scheme to align every repo under the same devnet name.🔗 Related Issues or PRs
✅ Checklist
just statictype(scope):.Cute Animal Picture