ci: pilot-migrate clippy job to smithy self-hosted runners #201
Open
Switches just the clippy job from ubuntu-latest to [self-hosted, linux, x64, rust-cpu], one of the three rust-cpu runners on pulseengine-ci-01 (hetzner-private group). Other jobs (fmt, test) stay on ubuntu-latest for now; once we have a few green clippy runs and timing data, the rest can follow.

Why clippy first:
- meaningful compile work (good sccache test)
- bounded scope: failure doesn't block fmt or test
- no sudo, apt, or container needed
- spar already tracks nightly via dtolnay/rust-toolchain so the toolchain matches between hosted and self-hosted

If this PR's clippy job goes red on the self-hosted runner but passes locally / on hosted, that's a smithy bug, not a code bug.
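As a rough sketch of the change described above, assuming the job is keyed `clippy` in ci.yml: only the `runs-on` line changes. The step list and clippy flags below are illustrative and not copied from the repository.

```yaml
jobs:
  clippy:
    # was: runs-on: ubuntu-latest
    runs-on: [self-hosted, linux, x64, rust-cpu]
    steps:
      - uses: actions/checkout@v4
      # Same toolchain action on hosted and self-hosted, so versions match.
      - uses: dtolnay/rust-toolchain@nightly
        with:
          components: clippy
      # Flags are an assumption for illustration.
      - run: cargo clippy --workspace --all-targets -- -D warnings
```

Rolling back is the reverse edit: restore `runs-on: ubuntu-latest`.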
The previous clippy run on the self-hosted runner failed at highs-sys build because cmake wasn't on the host. smithy main now ships the common Rust build-dep set (cmake, clang, lld, perl, m4, protobuf-compiler, libclang-dev, zlib1g-dev). Pushing an empty commit to re-trigger CI; clippy should now finish on rust-cpu.
Builds on the proven clippy migration (PR description, original
commit on this branch). Two separate concerns:
1) ci.yml — broaden the migration
Migrate every gating job that doesn't need infra we don't have on
the smithy host. Two stay on ubuntu-latest with explicit comments
explaining why; everything else now targets the matching smithy
runner class:
  rust-cpu  (12G MemoryHigh)   clippy, test, bench-smoke,
                               coverage, proptest, fuzz-smoke,
                               rivet-validate
  lean-mem  (24G MemoryHigh)   miri, mutants
  light     (4G MemoryHigh)    fmt, audit, deny, supply-chain
  ubuntu-latest (kept)         bazel-test (no Bazel on host),
                               kani (kani-verifier bundles CBMC,
                               ~100 MB install; not worth
                               pre-provisioning until kani sees
                               more use)
The lean-mem class for miri / mutants is deliberate: both are
RAM-aggressive (Miri's borrow tracker, mutants' parallel cargo
invocations). The 24G MemoryHigh ceiling on smithy lean-mem
runners is comfortably above the 12G rust-cpu cap.
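A minimal sketch of how the class mapping might look in ci.yml, showing one job per class. The label sets for lean-mem and light are assumed to mirror the rust-cpu set from the pilot, and the `run:` commands are placeholders rather than the repository's real steps.

```yaml
jobs:
  test:
    runs-on: [self-hosted, linux, x64, rust-cpu]   # 12G MemoryHigh
    steps:
      - uses: actions/checkout@v4
      - run: cargo test --workspace
  miri:
    runs-on: [self-hosted, linux, x64, lean-mem]   # 24G MemoryHigh (labels assumed)
    steps:
      - uses: actions/checkout@v4
      - run: cargo miri test
  fmt:
    runs-on: [self-hosted, linux, x64, light]      # 4G MemoryHigh (labels assumed)
    steps:
      - uses: actions/checkout@v4
      - run: cargo fmt --all -- --check
  bazel-test:
    # Kept on GitHub-hosted compute: no Bazel on the smithy host.
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: bazel test //...
```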
2) mutants-weekly.yml — new heavy-quality workflow
Counterpart to the gating `mutants:` job in ci.yml. Different
operational pattern (smithy DD-pattern for "heavy quality"):
- schedule: 02:00 UTC every Sunday + workflow_dispatch on demand
- runs-on: lean-mem (24G), timeout-minutes: 720
- concurrency.cancel-in-progress: false (never cancel a quality run)
- workflow_dispatch inputs: `shard` (default 0/8 as a quick sanity
check, "all" for the full multi-hour pass) + `packages`
(space-separated -p list)
- results land in GITHUB_STEP_SUMMARY (markdown table of
missed/caught/timeout/unviable) plus an uploaded artefact with
90-day retention
- no PR red lights; no auto-Issue filing yet (that's a follow-up
once the report shape stabilises)
This is the second-pattern pilot the smithy fleet was sized for —
the lean-mem runners have been idle since registration; this puts
them on the work they were labelled for.
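The points above could be sketched roughly as the workflow below. This is an illustration under stated assumptions, not the file in this PR: it assumes cargo-mutants is already installed on the runner, that the lean-mem label set mirrors the rust-cpu one, and that scheduled runs (which carry no inputs) do the full pass. The concurrency group and artefact names are made up for the example.

```yaml
name: mutants-weekly

on:
  schedule:
    - cron: "0 2 * * 0"          # 02:00 UTC every Sunday
  workflow_dispatch:
    inputs:
      shard:
        description: 'Shard as k/n, or "all" for the full pass'
        default: "0/8"
      packages:
        description: "Space-separated package list (empty = whole workspace)"
        default: ""

concurrency:
  group: mutants-weekly          # group name is an assumption
  cancel-in-progress: false      # never cancel a quality run

jobs:
  mutants:
    runs-on: [self-hosted, linux, x64, lean-mem]   # label set assumed
    timeout-minutes: 720
    steps:
      - uses: actions/checkout@v4
      - name: Run cargo-mutants
        env:
          # Scheduled runs have no inputs; falling back to "all" is an assumption.
          SHARD: ${{ inputs.shard || 'all' }}
          PACKAGES: ${{ inputs.packages }}
        run: |
          args=()
          if [ "$SHARD" != "all" ]; then args+=(--shard "$SHARD"); fi
          for p in $PACKAGES; do args+=(-p "$p"); done
          # cargo-mutants exits non-zero when mutants are missed. This job
          # reports rather than gates, so the exit status is ignored.
          cargo mutants "${args[@]}" || true
      - name: Summarise outcomes
        if: always()
        run: |
          {
            echo "| outcome | count |"
            echo "| --- | --- |"
            for f in missed caught timeout unviable; do
              if [ -f "mutants.out/$f.txt" ]; then
                n=$(wc -l < "mutants.out/$f.txt")
              else
                n=0
              fi
              echo "| $f | $n |"
            done
          } >> "$GITHUB_STEP_SUMMARY"
      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: mutants-report   # artefact name is an assumption
          path: mutants.out
          retention-days: 90
```

Passing inputs through `env:` rather than interpolating them into the script keeps user-supplied text out of the shell source.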
GitHub limits workflow_dispatch and schedule triggers to workflows that already exist on the default branch. Adding a path-filtered push trigger lets us exercise the workflow on this PR before merge. The push: block carries a TEMPORARY marker; remove it before merge.
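A sketch of what the temporary trigger might look like. The path filter shown here (the workflow file itself) is an assumption; the commit does not list the actual paths.

```yaml
on:
  schedule:
    - cron: "0 2 * * 0"
  workflow_dispatch:
  # TEMPORARY: lets this PR exercise the workflow before it exists on the
  # default branch. Remove before merge.
  push:
    paths:
      - ".github/workflows/mutants-weekly.yml"
```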
Prior run hit 'Permission denied (os error 13)' on .d files in target/. Direct file-write tests as the runner user succeed; the files are owned correctly with mode 640. Suspect: stale state left by a cancelled run interacting badly with concurrent jobs landing on the same runner via cache restoration. Clearing all runner _work and the shared sccache to bisect: if a clean run also fails, it's not stale state.
Disabled RUSTC_WRAPPER in runner env (smithy commit 65e57a2); runners restarted to pick up the new environment. bpftrace running on host capturing every openat returning EACCES with PID/UID/comm/filename. Pushing this empty commit to fire CI.
Codecov Report: ✅ All modified and coverable lines are covered by tests.
The rustsec/audit-check action bundles an older cargo-audit that can't parse CVSS 4.0 advisories like RUSTSEC-2026-0037 and exits non-zero on the parse error before evaluating spar's Cargo.lock. cargo-audit is pre-installed on smithy at v0.21.2 (toolchains role), which handles CVSS 4.0 fine. Same effect (audit blocks PRs on advisory hits) without the wrapper.
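Invoking the tool directly might look like the fragment below. The job key and the light label set are assumptions; `cargo audit` exits non-zero when it finds a vulnerability, which is what keeps the job gating.

```yaml
jobs:
  audit:
    runs-on: [self-hosted, linux, x64, light]   # label set assumed
    steps:
      - uses: actions/checkout@v4
      # Uses the cargo-audit already on the runner instead of the
      # wrapper action and its bundled copy.
      - run: cargo audit
```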
Smithy main now ships:
- subuid/subgid for runner1..8 (Cargo Deny rootless container fix)
- CARGO_HOME/bin on the runner env PATH (Rivet validate fix)
- always-on bpftrace EACCES tracing (smithy-trace-eacces.service)

Plus this branch carries:
- cargo audit invoked directly (replaces broken rustsec/audit-check)

All runners restarted with new env. This commit fires fresh CI.
Two adjustments after the smithy subuid + PATH fixes landed:
1. cargo-deny: drop EmbarkStudios/cargo-deny-action@v2 (which runs
in a rootless container) in favour of direct `cargo deny check`.
Smithy has cargo-deny installed (toolchains role v0.16.4). The
container action fails on our hardened runner systemd unit:
newuidmap is setuid but NoNewPrivileges=true blocks the
escalation, so the rootless namespace can't be set up. Going
direct sidesteps the entire interaction; we'd otherwise need to
weaken the runner hardening for this single workflow.
2. audit: back to ubuntu-latest temporarily. Smithy ships cargo-audit
v0.21.2 which still rejects RUSTSEC-2026-0037 ('unsupported CVSS
version: 4.0') even though upstream rustsec 0.30+ supports CVSS
4.0. v0.22.1 would fix it but that build trips on our
sccache-on-cc setup (aws-lc-sys C compile through sccache fails).
Move back once smithy ships an upgraded cargo-audit.
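For the first adjustment, the swap from the container action to a direct invocation could be sketched as follows. The job key and the light label set are assumptions.

```yaml
jobs:
  deny:
    runs-on: [self-hosted, linux, x64, light]   # label set assumed
    steps:
      - uses: actions/checkout@v4
      # was: uses: EmbarkStudios/cargo-deny-action@v2 (rootless container,
      # blocked by NoNewPrivileges=true on the runner unit)
      - run: cargo deny check
```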
Surfaced when running `cargo deny check` directly with the toolchains-role-installed cargo-deny v0.16.4 on smithy:

  error[deprecated]: this key has been removed, see EmbarkStudios/cargo-deny#611

The yanked + licenses + bans + sources sections still gate normally. Unmaintained-crate detection moved out of the static config in newer cargo-deny; revisit if/when we want to re-enable that signal.
cargo-deny and cargo-audit share the same rustsec advisory parser. Both fail at the same point on RUSTSEC-2026-0037 because the embedded rustsec rejects CVSS 4.0 strings. The audit job (on hosted) still covers vulnerability matching; cargo-deny here keeps gating bans, licenses, and sources, which is what it actually adds beyond audit. Drop the workaround once smithy ships an upgraded rustsec parser (tracked alongside the cargo-audit upgrade).
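The commit message does not show the workaround itself. One plausible shape, offered as an assumption only, is to name the checks explicitly so the advisories check (and with it the rustsec parser) is skipped:

```yaml
jobs:
  deny:
    runs-on: [self-hosted, linux, x64, light]   # label set assumed
    steps:
      - uses: actions/checkout@v4
      # Advisories are left to the audit job on hosted compute until the
      # runner's rustsec parser accepts CVSS 4.0.
      - run: cargo deny check bans licenses sources
```

Reverting to a bare `cargo deny check` restores the advisories check once the parser is upgraded.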
This was referenced May 3, 2026
Summary
First pilot migration of a CI job from GitHub-hosted to the pulseengine self-hosted fleet (`hetzner-private` runner group on `pulseengine-ci-01`). Scope deliberately small: just the `clippy` job, switched to `[self-hosted, linux, x64, rust-cpu]`. Other jobs (`fmt`, `test`, `proofs`) stay on `ubuntu-latest`.
Rationale
- […] which is GitHub-hosted runner queueing on the org-free tier (20-concurrent cap).
- […] but bounded: failure doesn't block format checks or tests.
- […] `sudo`, `apt`, or container needed → no friction with our rootless runner setup.
- […] `dtolnay/rust-toolchain`, so the toolchain version matches between hosted and self-hosted.
Test plan
- […] `rust-cpu` runner (1 of 5/6/7) within seconds (no GitHub queue)
- […] `ubuntu-latest` as before
Rollback
Revert this commit. `runs-on:` flips back to `ubuntu-latest` and the next run uses GitHub-hosted compute.
Follow-ups (if green)
- […] `fmt` and `test` next (separate PRs).
- […] `mutants-weekly.yml`) that targets `lean-mem` runners, separate from gating CI.