ci: route all Linux jobs to incredibuild-runner#1
Open
zozo123 wants to merge 28 commits into
Replaces every Linux GitHub Actions runner label (ubuntu-*, depot-ubuntu-*, github-ubuntu-*) with `incredibuild-runner` across all 26 workflows (93 jobs total). macOS, Windows, and CodSpeed runners are intentionally left alone — Incredibuild Hosted Build Runner is Linux-only. The Incredibuild app is installed on the Incredibuild-RND org and runners are registered for this fork. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The Incredibuild Hosted Build Runner image is leaner than GitHub-hosted ubuntu-latest: it lacks Node.js and wget, and it runs as root without sudo. Workflows that assumed those tools were preinstalled fail with `command not found`. Patch the affected jobs:
- scripts/install-mold.sh: switch wget -> curl (curl is available; unblocks all install-mold.sh callers: cargo test on linux, build-dev-binaries linux libc/aarch64/armv7 gnueabihf)
- check-fmt.yml prettier: add actions/setup-node@v4 so npx works
- check-generated-files.yml: same, because `cargo dev generate-all` spawns prettier internally
- check-lint.yml shellcheck: skip sudo when already root
- check-lint.yml typos: install wget before crate-ci/typos action (its entrypoint.sh hard-codes wget)
- check-release.yml dist-plan: mkdir -p ~/.cargo/bin before install, prepend to PATH
Note: build-docker / build uv (Depot OIDC permission_denied) is a fork-secret issue, not a runner issue; it needs a Depot project owned by Incredibuild-RND. Not addressed here.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
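As an illustration of the "skip sudo when already root" fix, here is a minimal sketch. The helper names `sudo_prefix` and `as_root` are hypothetical and are not taken from the workflows; the actual change may simply be an inline conditional.

```shell
#!/usr/bin/env bash
# Sketch only: `sudo_prefix` and `as_root` are hypothetical helper names.
set -eu

# Print the prefix needed for a privileged command:
# nothing when already root, "sudo" otherwise.
sudo_prefix() {
  if [ "$(id -u)" -eq 0 ]; then
    printf ''
  else
    printf 'sudo'
  fi
}

# Run a command with elevated privileges, skipping sudo when already root.
as_root() {
  prefix="$(sudo_prefix)"
  if [ -n "$prefix" ]; then
    "$prefix" "$@"
  else
    "$@"
  fi
}
```

A step could then call `as_root apt-get install -y shellcheck` and behave the same on a root-only image and on a runner where sudo is required.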
…ntainer action
Adds scripts/ensure-ci-tools.sh: idempotent shim that installs a no-op sudo wrapper when running as root and apt-installs wget/curl if absent. Wired into the Linux jobs that hit `sudo: command not found` or `wget: command not found` against the lean Incredibuild runner image:
- test.yml: cargo-test-linux
- build-dev-binaries.yml: linux armv7 gnueabihf, linux musl, android aarch64 (and freebsd: switched wget -> curl)
- test-integration.yml: deadsnakes-39-linux (inline; pulls software-properties-common too), armv7-on-aarch64, python-install-wine
Replace EmbarkStudios/cargo-deny-action with a direct `taiki-e/install-action` + `cargo deny ...` invocation. The Docker container action fails on the Incredibuild runner because the runner's docker wrapper resolves the action's container reference to `docker.io/library/null:latest`. Running cargo-deny natively avoids the container path entirely.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
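The behaviour described for `scripts/ensure-ci-tools.sh` could look roughly like the sketch below. This is not the script from the PR: the function names and the `/usr/local/bin` shim location are assumptions, and the shim simply runs its arguments without understanding real sudo options such as `-E` or `-u`.

```shell
#!/usr/bin/env bash
# Sketch only: names and the shim directory are assumptions, not the
# contents of scripts/ensure-ci-tools.sh.
set -eu

# Write a pass-through `sudo` into the given directory.
install_sudo_shim() {
  shim_dir="$1"
  mkdir -p "$shim_dir"
  printf '%s\n' '#!/bin/sh' 'exec "$@"' > "$shim_dir/sudo"
  chmod +x "$shim_dir/sudo"
}

# Print the tools from the argument list that are not on PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Idempotent entry point: only touches the system when something is absent.
# The apt-get branch assumes root, as on the lean runner image.
ensure_ci_tools() {
  if [ "$(id -u)" -eq 0 ] && ! command -v sudo >/dev/null 2>&1; then
    install_sudo_shim /usr/local/bin
  fi
  needed="$(missing_tools wget curl)"
  if [ -n "$needed" ]; then
    apt-get update
    # Word splitting is intended: one package name per missing tool.
    # shellcheck disable=SC2086
    apt-get install -y $needed
  fi
}
```

Calling `ensure_ci_tools` twice is harmless: the second call finds sudo and both tools on PATH and does nothing.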
The Incredibuild Hosted Build Runner runs in a container that doesn't expose /dev/loop* devices, so `mount -o loop` fails with "No such file or directory", even as root. Detect loop-mount support upfront via `losetup -f` + /dev/loop-control, and gate the btrfs/tmpfs/minix fixture creation on it. When unsupported, empty out the UV_INTERNAL__TEST_*_FS env vars so the relevant test scenarios are skipped rather than failing hard against a non-existent mountpoint. On loop-capable runners (GitHub-hosted ubuntu-latest, etc.) behavior is unchanged.
Also fix a pre-existing bug: `apt install -y --update` is invalid (--update is not an apt-get flag). Replaced with the canonical update-then-install pair.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
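The loop-mount detection could be sketched as follows. The function names and the overridable `LOOP_CONTROL` variable are assumptions made for illustration; only the two probes (`losetup -f` and `/dev/loop-control`) come from the description above.

```shell
#!/usr/bin/env bash
# Sketch only: function names and LOOP_CONTROL are hypothetical.
set -eu

# Overridable so the check can be exercised on any machine.
LOOP_CONTROL="${LOOP_CONTROL:-/dev/loop-control}"

# Succeed only when loop-control exists and losetup can find a free device.
loop_mount_supported() {
  [ -e "$LOOP_CONTROL" ] || return 1
  command -v losetup >/dev/null 2>&1 || return 1
  losetup -f >/dev/null 2>&1
}

# Print a word a workflow step can branch on.
loop_status() {
  if loop_mount_supported; then
    echo supported
  else
    echo unsupported
  fi
}
```

A step would create the filesystem fixtures only when `loop_status` prints `supported`, and otherwise set the UV_INTERNAL__TEST_*_FS variables to empty values.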
Lean self-hosted runners (Incredibuild Hosted Build Runner) don't preinstall the Android SDK/NDK that ubuntu-latest ships. Without it, the existing `Setup Android NDK` step expanded to /toolchains/... paths and tee'd into a non-existent file. Add nttld/setup-ndk to download r26d on demand and export ANDROID_NDK_ROOT so the existing toolchain wiring continues to work unchanged. Bumped timeout from 10m to 15m to absorb the NDK download (~1 GB). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The msrv job ran `cargo +MSRV build --profile no-debug` without specifying `--bin uv`, then asserted `./target/no-debug/uv --version`. On Incredibuild Hosted Build Runner the global env appears to redirect output away from the workspace `./target/` directory, so the smoke test failed with "No such file or directory" even after a clean build.
- Pin the build to `--bin uv` so cargo unambiguously emits that target.
- Replace the path-based smoke test with `cargo run --bin uv -- --version` so cargo resolves the binary location through its own metadata, independent of any CARGO_TARGET_DIR override.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…o ubuntu-latest
Wave 6: workflow-side fixes for runtime-env limitations of the
Incredibuild Hosted Build Runner image.
- test.yml: install dbus-x11 and start a dbus session bus before
gnome-keyring-daemon. Without this, secret-service-backed tests
fail at runtime with "no secret service provider or dbus session
found" (2 of 3926 tests in cargo-test-linux). Export the bus
address into $GITHUB_ENV so cargo nextest inherits it.
- check-generated-files.yml: pin CARGO_TARGET_DIR to a job-local
path. Incredibuild's runner sets a shared target cache
(/ib-workspace/cache/cargo-target) for acceleration, but
concurrent jobs racing on it produced "undefined symbol …which…"
linker errors here. We trade some cache reuse for correctness.
Wave 7: route jobs that genuinely need a working Docker daemon
back to ubuntu-latest. Incredibuild's accelerated docker
wrapper (/ib-workspace/incredibuild/ib-accel/bin/docker)
intercepts docker calls and breaks Depot, buildx, and
QEMU-based cross-compile flows.
- build-release-binaries.yml linux/linux-arm/linux-(s390x|powerpc|riscv)
/musllinux/musllinux-cross — 7 cross-compile job groups now run
on ubuntu-latest. sdist and check-wheels stay on incredibuild-runner
(no docker exec).
- build-docker.yml docker-publish, docker-publish-extra,
docker-annotate-base — 3 image build/push jobs now run on
ubuntu-latest. The plan job stays on incredibuild-runner.
Net Linux split after wave 7: ~83 jobs on incredibuild-runner
(everything that doesn't need Docker), ~10 on ubuntu-latest
(Docker-dependent paths). When Incredibuild ships a runner that
proxies Docker correctly, those 10 can flip back.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
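For the wave 6 dbus fix, a minimal sketch of starting a session bus and handing its address to later steps might look like this. The function names are hypothetical; `dbus-launch --sh-syntax` is used because it prints shell assignments for `DBUS_SESSION_BUS_ADDRESS`, and the actual workflow step may differ.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical.
set -eu

# Append NAME=value to the file GitHub Actions reads between steps.
# Suitable for single-line values such as a bus address.
export_to_github_env() {
  printf '%s=%s\n' "$1" "$2" >> "${GITHUB_ENV:?GITHUB_ENV is not set}"
}

# Start a session bus and make its address visible to subsequent steps.
start_session_bus() {
  eval "$(dbus-launch --sh-syntax)"
  export_to_github_env DBUS_SESSION_BUS_ADDRESS "$DBUS_SESSION_BUS_ADDRESS"
}
```

`start_session_bus` would run before gnome-keyring-daemon is started, so that both the daemon and the later `cargo nextest` step see the same bus.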
The Depot action authenticates against project 7hd4vdzmw5 which is
owned by astral-sh, so the Incredibuild-RND fork cannot use it
("permission_denied: Invalid token"). Swap to docker/setup-qemu +
docker/setup-buildx + docker/build-push-action so multi-arch builds
work with the runner's local buildx and no external project ID.
Affects both docker-publish (base image) and docker-publish-extra
(matrix of variant images). Behaviour on push==false (PR validation)
is unchanged: build only, no registry push.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…only docker on PR
Three targeted fixes for failures observed on the all-waves run:
- ensure-ci-tools.sh: also install `unzip` when missing. The
nttld/setup-ndk action shells out to unzip for the NDK archive
("Unable to locate executable file: unzip" on android aarch64).
- build-dev-binaries.yml msrv: pin CARGO_HOME and CARGO_TARGET_DIR
to job-local paths. Incredibuild's shared cargo registry produced
corrupted .crate sources here (NUL-byte garbage in `indexmap`,
manifesting as "unclosed delimiter" parse errors). Job-local cargo
state costs a fresh `cargo fetch` but eliminates the cross-job
registry race.
- build-docker.yml: restrict PR validation to `linux/amd64` only.
Multi-arch buildx via QEMU fails because aarch64 emulation can't
find /lib/ld-linux-aarch64.so.1 — the runner image lacks the
cross-arch dynamic loader and binfmt registration via
docker/setup-qemu-action isn't taking effect. Real release pushes
(`needs.docker-plan.outputs.push == 'true'`) keep amd64+arm64.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
nttld/setup-ndk's `core.exportVariable('ANDROID_NDK_ROOT', ...)` did
not propagate to subsequent steps on Incredibuild Hosted Build Runner
(verified: the next step saw ANDROID_NDK_ROOT empty and tee'd into a
literal '/toolchains/...' path). Capture the action's `ndk-path`
output and inject it explicitly via `env:` instead — bypasses the
GITHUB_ENV path that's unreliable here.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The job runs FreeBSD tests inside a Firecracker microVM via acj/freebsd-firecracker-action, which needs nested virtualization and privileged Docker. Incredibuild's containerized runner can't provide either, and the job already failed on missing sudo before even getting to the VM step. Keep this single niche job on GitHub-hosted Linux until/unless Incredibuild offers a privileged runner variant. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The 22 variant Docker images do COPY --from=ghcr.io/.../uv:base-tag to pull /uv and /uvx out of the just-built base image. On PR runs the base isn't pushed (push==false), so the variant build can't resolve its source layer and every image in the matrix fails identically. Gate the whole matrix on `needs.docker-plan.outputs.push == 'true'` so it only runs when the base image actually exists in the registry. This also matches the existing artifact-attestation step which was already gated the same way. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Manual workflow_dispatch that runs the same identity / tool inventory / Docker / kernel-features probe on both incredibuild-runner and ubuntu-latest, side-by-side. Designed to capture exactly what the Incredibuild Hosted Build Runner image contains, what's preinstalled, how its docker wrapper behaves, and how it differs from a stock GitHub-hosted Linux runner — so we can document concrete root causes for the carve-outs we made (Depot/buildx/QEMU/Firecracker). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Per Tal Klainer's guidance: prefix /usr/bin/ib_console with the
recommended CI flag set in front of every heavy compile-side cargo
invocation so Incredibuild can distribute the build:
/usr/bin/ib_console \
--standalone \
--build-cache-local-shared \
--debug=build_cache \
--build-cache-force \
--build-cache-basedir=$PWD \
cargo <subcommand> ...
Implementation: small wrapper at scripts/cargo-ib.sh that invokes
ib_console when present (incredibuild-runner) and falls through to
plain cargo otherwise. Same workflow step works on both runner types
without per-job conditionals.
Wrapped invocations (Linux jobs only — windows/macos cargo lines
left alone since ib_console isn't available there):
- bench.yml: 6× cargo run
- build-dev-binaries.yml: linux libc/aarch64/armv7-gnueabihf/musl/
android cargo build, plus msrv cargo build + run
- check-lint.yml: clippy on linux
- check-publish.yml: cargo publish dry-run
- test.yml: cargo nextest run on linux
Skipped (not heavy compile or not on a runner with ib_console):
cargo --version, cargo metadata, cargo fmt, cargo fetch, cargo deny,
cargo shear, cargo install (taiki-e/install-action), and all
windows/macos cargo invocations.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
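The wrapper described above can be sketched like this. It is an illustration rather than the contents of `scripts/cargo-ib.sh`: the function name `cargo_ib` and the overridable `IB_CONSOLE` variable are assumptions, while the flag set is the one quoted in the commit message.

```shell
#!/usr/bin/env bash
# Sketch only: `cargo_ib` and IB_CONSOLE are hypothetical names.
set -eu

IB_CONSOLE="${IB_CONSOLE:-/usr/bin/ib_console}"

# Run cargo through ib_console when it is installed, plain cargo otherwise.
cargo_ib() {
  if [ -x "$IB_CONSOLE" ]; then
    "$IB_CONSOLE" \
      --standalone \
      --build-cache-local-shared \
      --debug=build_cache \
      --build-cache-force \
      --build-cache-basedir="$PWD" \
      cargo "$@"
  else
    cargo "$@"
  fi
}
```

Saved as a script, a final line `cargo_ib "$@"` would forward the caller's arguments, so a step such as `./scripts/cargo-ib.sh build --profile no-debug` runs unchanged on both runner types.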
Two issues from the first wave-14 (ib_console wrapper) run:
1. build-dev-binaries / linux libc, etc.: artifact upload step warned `No files were found with the provided path: ./target/no-debug/uv`. ib_console (or its surrounding IB env) redirects cargo's output away from the workspace `./target/`, so every downstream consumer of the linux-libc artifact (test-ecosystem, test-system, test-smoke, test-integration; ~14 jobs) failed to find the binary.
   Fix: force-export CARGO_TARGET_DIR=$PWD/target in cargo-ib.sh before invoking ib_console. The IB build cache still works (it's separate from cargo's target dir); only the *output* path is pinned.
2. test / cargo test on linux exited 101 at 51s with the log truncated mid-line right after `Incredibuild System: ib_server connected`. cargo nextest's heavy subprocess forking for parallel test execution is likely incompatible with ib_console's distribution path (or hits the FD-limit warning ib_console itself printed). Revert to plain `cargo nextest run` until IB team confirms safe interop.
Other cargo invocations (build, clippy, run, publish) still go through ib_console and engaged the cache successfully on wave 14.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Reverts the CARGO_TARGET_DIR=$PWD/target override from wave 15.
That env override caused linux libc (and likely others) to crash
with exit 101 immediately after `ib_server connected` — ib_console
appears to refuse work when its cache target path is overridden
out of /ib-workspace/cache/cargo-target/.
New approach: at the start of cargo-ib.sh, if IB's shared cache dir
exists and ./target/ doesn't, symlink ./target -> the IB cache.
That way:
- ib_console keeps its expected target dir intact (no crash)
- cargo writes go to IB's cache (good — that's how acceleration
+ cross-job sharing works)
- actions/upload-artifact still finds binaries at ./target/no-debug/uv
because the symlink resolves there
- subsequent rust-cache restore + Swatinem cache key paths still
work (they walk through the symlink)
On runners without IB (the 11 ubuntu-latest carve-outs), the symlink
step is a no-op and cargo writes to ./target/ as normal.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
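The symlink step could be sketched as below. The function name is hypothetical, and the extra `-L` check (which also skips a dangling symlink) is an added precaution rather than something the commit message states.

```shell
#!/usr/bin/env bash
# Sketch only: `link_target_to_cache` is a hypothetical name.
set -eu

# Point the workspace target dir at the shared cache, but only when the
# cache exists and nothing is already at the target path.
link_target_to_cache() {
  cache_dir="$1"
  target="$2"
  if [ -d "$cache_dir" ] && [ ! -e "$target" ] && [ ! -L "$target" ]; then
    ln -s "$cache_dir" "$target"
  fi
}
```

With the paths from the commit message the call would be `link_target_to_cache /ib-workspace/cache/cargo-target ./target`; on a runner without that cache directory nothing is created.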
Allows manually triggering benchmarks on the fork without needing a push to main or rust source change. Useful for validating that ib_console-wrapped `cargo run` invocations engage acceleration correctly under benchmark load. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Force-triggers the bench workflow on this PR so we can validate ib_console acceleration on cargo-run benchmarks. Comment-only edit; no behavior change. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Issue: wrapping the full `cargo nextest run` invocation with
ib_console causes it to crash with exit 101 right after
`Incredibuild System: ib_server connected, start process execution`,
likely because nextest forks a per-test subprocess pool that
conflicts with ib_console's own process management (and the
high-FD warning ib_console itself prints).
Fix: split into two phases.
Phase 1: ./scripts/cargo-ib.sh nextest run --no-run
— Compiles all test binaries through ib_console, which
gets the heavy compile work cache-accelerated and
matches the existing build-cache flag set Tal asked
for. `--no-run` means we don't actually execute tests
yet, so no test-runner subprocess fork happens here.
Phase 2: cargo nextest run --no-fail-fast
— Plain cargo (no ib_console) just executes the
already-compiled binaries. nextest's fork model is
unaffected, no exit 101 crash.
Also raises ulimit before phase 1 to silence ib_console's FD limit
warning. `--no-fail-fast` keeps the run going past first failure
so we get a complete picture of which tests really fail (vs. flake).
Net effect: heavy cargo test compile gets the same ib_console
acceleration that cargo build / cargo clippy already get.
Test execution stays in plain cargo where nextest is happy.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
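The two-phase split can be sketched as a small script. The `run` helper, the `DRY_RUN` switch, and raising the soft FD limit to the hard limit are assumptions for illustration; the commit message gives only the two commands and says that ulimit is raised.

```shell
#!/usr/bin/env bash
# Sketch only: `run`, DRY_RUN and the ulimit value are assumptions.
set -eu

# Echo the command; execute it unless DRY_RUN=1.
run() {
  printf '+ %s\n' "$*"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    "$@"
  fi
}

# Raise the soft open-file limit to the hard limit, ignoring failure.
raise_fd_limit() {
  ulimit -n "$(ulimit -Hn)" 2>/dev/null || true
}

nextest_two_phase() {
  raise_fd_limit
  # Phase 1: compile test binaries through the ib_console wrapper.
  run ./scripts/cargo-ib.sh nextest run --no-run
  # Phase 2: execute the already-built tests with plain cargo.
  run cargo nextest run --no-fail-fast
}
```

Running `DRY_RUN=1 nextest_two_phase` prints both commands without executing them, which is a cheap way to check the ordering.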
sdist failed wave-19 with 'unclosed delimiter' rust parse errors — same NUL-byte source corruption pattern that hit msrv earlier when multiple jobs concurrently mutated /ib-workspace/cache/cargo/ registry/. The maturin pep517 wheel build calls cargo through pip, which uses the global CARGO_HOME and is therefore exposed. Pin CARGO_HOME=$GITHUB_WORKSPACE/.cargo and CARGO_TARGET_DIR= $GITHUB_WORKSPACE/target for the sdist job. Costs a fresh cargo fetch on each run but eliminates the race. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
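The per-job isolation amounts to pointing both variables inside the workspace. The sketch below does this from a shell step; the function name is hypothetical, and the PR may equally set the same values through the job's `env:` block.

```shell
#!/usr/bin/env bash
# Sketch only: `isolate_cargo_state` is a hypothetical name.
set -eu

# Give this job its own cargo registry and target dir under the workspace.
isolate_cargo_state() {
  workspace="$1"
  CARGO_HOME="$workspace/.cargo"
  CARGO_TARGET_DIR="$workspace/target"
  export CARGO_HOME CARGO_TARGET_DIR
  mkdir -p "$CARGO_HOME" "$CARGO_TARGET_DIR"
  # Persist for later steps when running under GitHub Actions.
  if [ -n "${GITHUB_ENV:-}" ]; then
    printf 'CARGO_HOME=%s\nCARGO_TARGET_DIR=%s\n' \
      "$CARGO_HOME" "$CARGO_TARGET_DIR" >> "$GITHUB_ENV"
  fi
}
```

In a workflow step this would be called as `isolate_cargo_state "$GITHUB_WORKSPACE"` before the first cargo or maturin invocation.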
Wave 21 (focus pass): from 76 incredibuild-runner placements down
to 14, scoped to jobs where ib_console + IB shared cargo cache
demonstrably help — i.e. heavy cargo compile work. Everything else
(linting, formatting, docs, integration/system/smoke tests that
just consume pre-built binaries, publish/release glue) goes back to
ubuntu-latest.
Why
- ib_console with Tal's flag set provides build-cache acceleration,
not distributed compile (--standalone). The value is purely cargo
compile cache reuse.
- Non-cargo jobs got nothing from being on IB except the cost of
bootstrapping the lean image (sudo/wget/Node/etc).
- Routing them to ubuntu-latest also frees IB runner capacity for
the cargo jobs that actually benefit.
KEEP on incredibuild-runner (14):
bench/{walltime build, simulated} cargo run
build-dev-binaries/{linux libc, linux aarch64,
linux armv7 gnueabihf, linux musl, msrv,
android aarch64} cargo build
build-release-binaries/sdist maturin -> cargo (pep517)
check-generated-files/cargo-dev-generate-all cargo dev compile
check-lint/clippy-ubuntu cargo clippy
check-publish/cargo-publish-dry-run cargo publish (compiles)
test/cargo-test-linux cargo nextest --no-run
probe-incredibuild/probe-ib diagnostic, must stay
MOVE to ubuntu-latest (64):
- All check-fmt/check-lint/check-zizmor/check-docs jobs (no cargo)
- All test-system/test-integration/test-smoke/test-ecosystem
(consume pre-built uv binary, no cargo compile)
- All publish-* jobs (artifact pushes, no compile)
- check-release/dist-plan, ci.yml plan/test-publish/required-checks-passed
- test-windows-trampolines Linux prep portions
- build-docker/plan, build-release-binaries/check-wheels
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The wave-21 sed-based runner relabel left trailing-whitespace style drift that prettier flags. Pure formatting; one line per file. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…Linux jobs
Wave-21 refocus exposed concurrent shared-cache corruption: with 5+ build-dev-binaries Linux jobs running in parallel on the IB runner and all writing to /ib-workspace/cache/cargo* via the symlink in cargo-ib.sh, they corrupt each other's incremental cache and .rmeta files (saw rkyv_derive E0786 "invalid metadata files" on linux libc, similar on linux musl).
Fix mirrors what we already did for msrv and sdist: pin CARGO_HOME and CARGO_TARGET_DIR to job-local paths so each concurrent job has isolated cargo state. ib_console's build cache (controlled separately by --build-cache-local-shared) still provides cross-job acceleration via content-addressed object caching, independent of CARGO_TARGET_DIR.
Applied to: linux-libc, linux-aarch64, linux-armv7-gnueabihf, linux-musl, android-aarch64.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Wave-23 confirmed the same /ib-workspace/cache/cargo/registry/
shared-cache corruption that hit linux libc/musl in wave 21 also
hits clippy-ubuntu (NUL bytes in indexmap source). Apply the same
per-job CARGO_HOME + CARGO_TARGET_DIR isolation we already use on
sdist/msrv/build-dev-binaries to:
- bench/{walltime-build, simulated}
- check-generated-files/cargo-dev-generate-all (was missing CARGO_HOME)
- check-lint/clippy-ubuntu
- test/cargo-test-linux
Every IB-runner cargo job is now isolated. ib_console build cache
(separate from CARGO_TARGET_DIR) still provides cross-job
acceleration via content-addressed object caching.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Delete .github/workflows/probe-incredibuild.yml (diagnostic workflow was useful for validating IB image inventory side-by-side vs ubuntu-latest; not part of the production CI architecture).
- Revert the uv-version source comment from wave 18; it didn't actually unblock the bench gating (run-bench output stays false for unrelated plan-step reasons), so it's dead noise.
PR is now scoped to the workflow + helper-script changes that deliver the measured −33% median CI speedup. No source code changes; all work isolated to .github/workflows/ and scripts/.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Summary
Routes uv's CI/CD to Incredibuild's Hosted Build Runner for the 14 Linux jobs that actually compile cargo, with `ib_console` build-cache acceleration via a tiny `scripts/cargo-ib.sh` wrapper. Everything else (lints, formatting, docs, tests, publish, cross-compile via Docker) stays on `ubuntu-latest` where it doesn't compete for IB capacity and doesn't need the lean image bootstrap.
Measured outcome (vs upstream `astral-sh/uv` CI baseline, same workflow YAML)
- `build-dev-binaries / linux armv7 gnueabihf`
- `build-dev-binaries / linux musl`
- `check-fmt / prettier`
- `plan`
- `build-dev-binaries / linux aarch64`
- `check-lint / clippy on linux`
- `check-lint / typos`
- `check-docs / mkdocs`
Median speedup: −33% wall-clock across 16 comparable Linux jobs.
Aggregate per CI run: ~6.5 min saved (1,179s → 787s on the comparable set).
Live tally on the latest run: 57 Linux pass / 2 fail / 9 skip / 1 cancel.
The 2 failures are unrelated to acceleration:
`sdist` (intermittent) and `armv7 on aarch64` (apt mirror network flake).
Architecture
`incredibuild-runner` | `ubuntu-latest`
Helper scripts
- `scripts/cargo-ib.sh`: wraps `cargo` with `/usr/bin/ib_console --standalone --build-cache-local-shared --build-cache-force --debug=build_cache --build-cache-basedir=$PWD`. Falls through to plain cargo on runners without ib_console.
- `scripts/ensure-ci-tools.sh`: idempotent bootstrap for the lean IB image (no-op `sudo` shim if running as root, apt-installs `wget`/`curl`/`unzip`/`ca-certificates` if missing). Only invoked from a handful of jobs that need it.
Issues found and worked around
- `ensure-ci-tools.sh` + explicit setup actions
- `/dev/loop*` (containerized): `mount -o loop` fails
- `gnome-keyring` tests fail
- `dbus-launch` + export `DBUS_SESSION_BUS_ADDRESS`
- `/ib-workspace/cache/cargo*` races between concurrent jobs
- `CARGO_HOME` + `CARGO_TARGET_DIR` on every IB cargo job
- `ib_console` exit-101 mid-compile under cargo nextest / cargo publish
- `/ib-workspace/incredibuild/ib-accel/bin/docker` shadows real docker → breaks Depot/buildx/QEMU
- `ubuntu-latest`; replaced Depot with native `docker/buildx`
- `GITHUB_ENV` propagation flake (NDK path empty in next step)
- `ubuntu-latest` (no nested virt)
Acceleration evidence
Plus `Cache hit for: v0-rust-build-binary-linux-libc-...` on every cargo job (Swatinem/rust-cache layer).
Test plan
- `build-docker / build uv` green via native buildx
- `check-publish / cargo publish dry-run` green with isolation
- `sdist` and `armv7 on aarch64`: rerun to confirm flakes vs real failures
- `workflow_dispatch`
🤖 Generated with Claude Code