ci: split release pipeline into per-OS native builds#69
Merged
Conversation
The previous release.yml ran goreleaser inside a single Linux container and cross-compiled for linux/amd64, linux/arm64, windows/amd64, and darwin/arm64. That approach left the cross-compiled binaries with empty go:embed sections — the Hyper-V kernel/initrd/rootfs, the macOS Vz aarch64 VM assets, and the arm64 GHA runner tarball all live in platform-specific paths fed by mage targets gated to the matching host. Cross-compiling from Linux saw only the x64 Linux assets and filled the rest with EnsurePlaceholders() zero-byte files — the resulting Windows and Darwin binaries would compile fine but be unable to boot a Linux VM. Replace with four parallel build jobs, one per platform, each running on its native self-hosted runner via the appropriate mage target: - linux/amd64: mage build:build on [self-hosted, linux, x64] - linux/arm64: mage build:build on [self-hosted, linux, arm64] - windows/amd64: mage build:windows on [self-hosted, windows, x64] — the full two-stage build (Linuxembed, Rootfs, Kernelx86, Initrdx86) - darwin/arm64: mage build:macos on [self-hosted, macos, arm64] — the Darwin build with aarch64 Linux VM assets + codesign Each job packages its binary (tar.gz on unix, zip on Windows) and uploads it as a workflow artifact. A final release job downloads all four artifacts, computes sha256 checksums.txt, and creates a draft GitHub release via `gh release create --draft --prerelease`. The draft gate is intentional — release notes auto-generated by --generate-notes, publishing is manual. Drop .goreleaser.yml; the workflow uses `gh release create` directly and mage handles cross-compile via its existing per-OS build:* targets.
Two related things, both surfaced by the PR #69 CI run failing on the same TestPushHandlerEndToEnd flake from PR #68 that I "fixed" with a post-stage Info() diagnostic. 1. AGENTS.md — short hard-rules file for any agent (Claude, Cursor, etc.) working in this repo. Centerpiece is "run mage lint AND mage test before every push, no exceptions". Local cgo failures on Windows are not a free pass — that's exactly the path that produced two recent red CI runs (errcheck on debugexec_linux.go in PR #68, flake-mask regression in PR #69). Also documents the no-flake-masking rule: never paper over a flaky test with a diagnostic call, sleep, or label that "might help". 2. The real TestPushHandlerEndToEnd fix — hold a 5-minute lease across the entire staging→push lifecycle via leases.Create + WithLease. Without an active lease, content.WriteBlob's addContentLease is a no-op (leases.FromContext returns false), and the staged blobs are namespace-bucket-registered but un-leased and un-labeled. That combination flakes in CI in ways that look like the layer digest "doesn't exist" mid-push. Replaces the post-stage Info() diagnostic from PR #68, which was flake-masking: it made one CI run pass but the underlying race was never fixed. Verified: 5 sequential `go test -run=TestPushHandlerEndToEnd` runs pass on Windows (CGO_ENABLED=0) in 0.65–1.0s each, vs. the previous half-second-flake behavior. Note on this commit's lint coverage: golangci-lint on this Windows box fails on the miekg/pkcs11 cgo cross-import (a known local-env issue documented in AGENTS.md). golangci-lint reports "0 issues" before exiting on that typecheck. `go build ./pkg/dind/...` and `go test ./pkg/dind/...` both pass.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Replaces the single-Linux-container goreleaser pipeline with four parallel native builds, so v0.0.1 and future tags actually ship working binaries on every advertised platform.
Why
The old
release.ymlran goreleaser inside a single[self-hosted, linux, x64]container and cross-compiled all four platforms. That left three of the four binaries broken:Root cause:
mage download:allon Linux fetches only Linux-host assets (Linux ephemerd has no VM dependency), andEnsurePlaceholders()then creates zero-byte files for the other platforms' embeds so the build compiles regardless. Compiles fine, runs broken.How
Four parallel build jobs, each on its native self-hosted runner running the appropriate full
magetarget:build-linux-amd64:mage build:buildon[self-hosted, linux, x64]build-linux-arm64:mage build:buildon[self-hosted, linux, arm64]build-windows-amd64:mage build:windowson[self-hosted, windows, x64]— full two-stage build with Linuxembed + Rootfs + Kernelx86 + Initrdx86build-darwin-arm64:mage build:macoson[self-hosted, macos, arm64]— aarch64 Linux VM assets + codesignEach job packages its binary (
tar.gzon unix,zipon Windows) and uploads it as an artifact. A finalreleasejob downloads all four, computessha256sum > checksums.txt, and creates a draft GitHub release viagh release create --draft --prerelease --generate-notes. Publish step stays manual..goreleaser.ymlis deleted —gh release createhandles the release, mage handles per-OS cross-compile.Test plan
v0.0.1-rc1and watch all four build jobs run on their native runners.checksums.txt.ephemerd --versionreturns the tag,ephemerd startboots without errors.Caveats
releasejob depends on the Windows self-hosted runner and macOS self-hosted runner being online. If either is offline, the release won't complete until it's available — same situation as PR feat: end-to-end KIND-on-dind support — kube-proxy networking, dind hardening, debug tooling #68'sBuild (Linux arm64).Arch doc?
This is the first cross-platform release pipeline for the project. Happy to add
docs/arch/release-pipeline.mdif you'd like — it'd document the per-OS native build invariant + why goreleaser cross-compile from Linux is wrong here, so the next person doesn't try to consolidate it back.