ci: route heavy Rust jobs through Incredibuild build runners#1
ci: route heavy Rust jobs through Incredibuild build runners#1zozo123 wants to merge 65 commits into
Conversation
Mirror the pattern used in Incredibuild-RND/uv (branch
ci/incredibuild-runners): move pure-cargo Linux jobs onto the
self-hosted `incredibuild-runner` label and wrap their cargo
invocations with a small wrapper that goes through `ib_console` when
present (falls back to plain cargo elsewhere, so the same workflow
step still works on GitHub-hosted runners).
Jobs migrated:
- test-rust (8x cargo llvm-cov compile/test invocations)
- bench-test (cargo bench)
- miri (cargo +nightly miri test)
- fuzz (cargo install cargo-fuzz + cargo fuzz run)
Jobs intentionally NOT migrated yet:
- test-python / test-python-coverage -- compile through maturin,
needs a follow-up to route maturin's internal cargo invocation
through ib_console
- test-rust-os -- macOS / Windows only
- lint, build*, test-builds-*, release-* -- light or Docker-based
New files:
- scripts/cargo-ib.sh -- ib_console-aware cargo wrapper,
graceful fallback to plain cargo
- scripts/ensure-ci-tools.sh -- bootstrap sudo/curl/wget on lean
self-hosted runners
Each migrated job pins its own CARGO_HOME / CARGO_TARGET_DIR under
${{ github.workspace }} so concurrent IB jobs don't corrupt each
other through the shared /ib-workspace/cache/cargo* volumes.
ib_console's separate build cache still accelerates compile.
The self-hosted incredibuild-runner image installs Python via actions/setup-python, which on this runner ships libpython3.X.so.1.0 but not the linker-discoverable libpython3.X.so symlink. pyo3-using crates emit a '-lpython3.X' directive, so test-rust (links monty-datatest via pyo3) and bench-test (links monty-bench via pyo3) both fail at the link step: rust-lld: error: unable to find library -lpython3.14 Add a small symlink-recovery step right after setup-python in both jobs. No-op when the .so symlink is already present, so safe on GitHub-hosted runners too.
The first fix (creating the missing libpython3.X.so symlink under $sys.prefix/lib) was necessary but not sufficient. pyo3-ffi's build.rs reads sysconfig at compile time and emits a -L pointing at the path baked into the python-build-standalone tarball (/opt/hostedtoolcache/Python/...), which doesn't exist on this self-hosted IB runner — the real install is under /actions-runner/_work/_tool/Python/.... When the rust-cache restore brings back the cached pyo3-ffi build script output, the stale -L survives across runs. Make the link work regardless of stale paths by exporting LIBRARY_PATH and LD_LIBRARY_PATH pointing at the real lib dir via $GITHUB_ENV. cc / lld fall back to LIBRARY_PATH when the explicit -L paths don't resolve, and LD_LIBRARY_PATH covers runtime when cargo llvm-cov subsequently runs the produced binaries. Also adds a SYSCONFIG_LIBDIR diagnostic to confirm the theory in future logs.
test-rust runs monty-datatest, which spawns CPython subprocesses and compares their output against monty. On the IB runner the default locale is C/POSIX, so CPython picks the ASCII codec for default text I/O and tests that open files with non-ASCII content (mount_fs__errors.py, mount_fs__ops.py — emoji + 0x80 bytes) fail with UnicodeDecodeError. ubuntu-latest has C.UTF-8 by default. Pin LANG / LC_ALL to C.UTF-8 and set PYTHONUTF8=1 belt-and-braces.
These are monty's heaviest workloads — test-python is a 5-version matrix that each compiles pyo3+monty+monty-python via maturin twice (dev + release), and test-python-coverage adds full llvm-cov instrumentation on top. Moving them onto incredibuild-runner is where the biggest acceleration headroom lives. maturin spawns cargo as a subprocess. Cargo respects the $CARGO env var when an external tool launches it, so setting CARGO=$GITHUB_WORKSPACE/scripts/cargo-ib.sh at the job level makes maturin's internal cargo invocation go through ib_console exactly like the direct cargo calls in test-rust. Each test-python matrix entry pre-installs its target Python through uv (so we can locate the install before maturin runs), then creates the libpython3.X.so symlink and exports LIBRARY_PATH/LD_LIBRARY_PATH — same recipe as test-rust/bench-test, applied per matrix Python. test-python-coverage uses the same fix plus wraps its direct cargo llvm-cov invocations the same way as test-rust.
…sole cargo-ib.sh execs ib_console which writes 'Incredibuild System: Trying to connect to ib_server...' / 'ib_server connected, start process execution...' to stdout before passing through to cargo. For compile commands that's harmless logging. For 'cargo llvm-cov show-env --export-prefix' — whose entire stdout is meant to be eval'd as shell — those leading lines get evaluated: + eval 'Incredibuild System: Trying to connect to ib_server... /actions-runner/_work/_temp/...: Incredibuild: command not found Use plain cargo for the env-discovery call. Compile commands (clean, report) still go through the wrapper, and maturin's internal cargo invocation still gets accelerated via the job-level CARGO env.
Codecov Results 📊✅ Patch coverage is 100.00%. Project has 23456 uncovered lines. Generated by Codecov Action |
Reading the ib_linux source (Incredibuild-RND/ib_linux), two findings drive this change: 1. The default profile at /opt/incredibuild/data/ib_profile.xml lists rustc as type='allow_remote' but does NOT enable ib_cache for it. Only cc1/cc1plus/gcc/clang have cached='true'. So by default ib_console DISTRIBUTES rustc invocations but does NOT persist their outputs to the build-avoidance cache. Every CI run recompiles every crate. For a Rust-heavy workspace like monty, that's the dominant cost. The android9+ custom profile bundled in ib_linux shows the right syntax (<ib_cache enabled='true' /> child element, not the cached='true' attribute which routes to ccache). We add a minimal custom profile that overrides only rustc and pass it via ib_console --profile=. 2. Per ib_linux:cpp/BuildCache/BuildCache_HitMiss.cpp, ib_console writes hit/miss info to a logfile when started with --build-cache-local-logfile=. Combined with --build-cache-report-all-miss, each run produces a per-job log we can dump and grep to see what is hitting / missing the cache. Changes: - scripts/ib-profile.xml: enable ib_cache for rustc, keep the default exclude_args (skip build_script_build/build_script_main / version probes). - scripts/cargo-ib.sh: pass --profile=, --build-cache-local-logfile, --build-cache-report-all-miss to every wrapped cargo invocation. - .github/workflows/ci.yml: add 'IB pre-flight diagnostics' and 'IB cache stats' steps (if: always()) to every migrated job. These print ib_console version, cache directory location, and post-build hit/miss summary so the value of IB acceleration is visible in the GitHub Actions run log.
- concurrency.cancel-in-progress=true on the workflow: stops the pile-up of in-flight runs all competing for the single self-hosted IB runner when a chain of commits lands quickly. - max-parallel: 3 on the test-python matrix: 5 simultaneous matrix entries on one IB runner caused contention that pushed each job's wall time well above the ubuntu-latest baseline. Three at a time keeps each job closer to dedicated-runner timings while still parallelising the matrix. - timeout-minutes: 30 on every IB-routed job: gives us a known cap to compare against the mysterious ~12-minute kill we saw on test python 3.14 in the previous two runs. If the runner kills before 30 min, the kill came from outside GitHub Actions and we'll see a different failure signature.
Two fixes / one extension:
1. scripts/ib-profile.xml: XML 1.0 forbids '--' inside <!-- --> comments
per spec 2.5. The previous version had literal command-line flags
(--build-cache-local-shared etc.) in the comment body, which made
ib_console reject the profile with:
ib_console: Comment must not contain '--' (double-hyphen)
That broke every IB-routed job in the run before this one (exit 255
in 14-30 seconds, before any compile). Rephrased the comment to
avoid '--' sequences and re-validated against the schema implicitly
(Python's xml.etree.ElementTree parses it cleanly).
2. Migrate the lint job to incredibuild-runner. lint runs prek which
triggers a workspace-wide clippy compile pass and is the last big
rust-compile workload not yet routed through IB. With CARGO env
set at the job level, prek's internal cargo invocations go through
cargo-ib.sh and benefit from the same ib_cache as test-rust.
Migrated jobs are now:
lint, test-rust, test-python-coverage, test-python (5-version
matrix), bench-test, miri, fuzz.
Remaining ubuntu-latest jobs are intentional: macOS/Windows
test-rust-os; Docker-bound build/build-pgo/build-js; lightweight
artifact/inspection/release jobs.
…rapper The ib_console XML schema (data/ib_profile.xsd in ib_linux) requires: 1. <ib_profile> element to carry version='1' attribute 2. <process> elements wrapped in a <processes> sequence container Without those, ib_console rejects the profile early with: ib_console: Element 'ib_profile': The attribute 'version' is required but missing. Can't validate document from '...' using schema '/opt/incredibuild/data/ib_profile.xsd' That fails every IB-routed job with exit 255 before any compile step. Matched the structure used by the bundled android9+ custom profile (ib_linux:data/custom_profiles/android/9+/ib_profile.xml).
The ib_profile.xsd schema (ib_linux:data/ib_profile.xsd) defines:
<xs:complexType name="ib_profile_type">
<xs:sequence minOccurs="1" maxOccurs="1">
<xs:element name="globals" type="globals_type" />
<xs:element name="processes" type="processes_type" />
</xs:sequence>
<xs:attribute name="version" type="version_type" use="required" />
</xs:complexType>
and globals_type requires ignore_following_profiles. Without it,
ib_console refuses the profile:
ib_console: Element 'processes': This element is not expected.
Expected is ( globals ).
Setting ignore_following_profiles='false' makes our profile additive
on top of /opt/incredibuild/data/ib_profile.xml — the system default
still loads and only the rustc entry is overridden to enable
ib_cache.
Two cosmetic fixes from yamlfmt that lint enforces: - Remove the misindented 'alls-green#why' top-of-job comment that ended up between fuzz job's last step and the next job header. yamlfmt kept trying to push it inside the fuzz job's block, producing diffs each run. - Drop the extra blank line inside the test-python matrix's libpython step body. Functionally identical; just unblocks the lint job from cycling on formatting nits.
Two corrections discovered by re-reading ib_linux:cpp/XgConsole/
XgConsole_main.cpp and BuildCache/BuildCache_defines.h:
1. --build-cache-force is NOT a real ib_console flag. There's no
matching getopt_long entry and no GETOPT_ enum value, so prior
runs were silently ignoring it. Removed from cargo-ib.sh. The
semantically equivalent behavior (cache-fill on first run) is
implicit in --build-cache-local-shared.
2. The IB build-avoidance cache lives at:
/etc/incredibuild/cache/build_cache/shared/
(BUILD_CACHE_LOCAL_PATH in BuildCache_defines.h), NOT under
/ib-workspace/cache/. Build reports for sqlite-based stats live
under /etc/incredibuild/db/. The diagnostic steps now inspect
those real paths before and after each job and try to surface
hit/miss stats via the bundled show_build_cache_statistics.sh
when a buildId can be inferred.
This is purely a visibility + correctness change; cache behavior
itself is unchanged from the previous commit. Lets us see, in each
job log, whether the IB cache is being populated and growing as
expected, and whether the rustc-cached profile actually translates
to manifest.json + .tar artifacts under the shared cache dir.
Discovered in miri run pydantic#12's stdout: Incredibuild System: Build Cache report is '/etc/incredibuild/log/2026-May-11/local-14/ib_hm.log' So ib_console writes hit/miss data to a per-build path under /etc/incredibuild/log/YYYY-Mon-DD/local-<buildId>/, regardless of where --build-cache-local-logfile points. (The runtime path our script asks for is inside the chroot/namespace, hence invisible.) Post-flight step now finds the 3 most-recent ib_hm.log files via mtime, dumps the tail of each, and counts HIT/MISS lines so each job's cache effectiveness is visible directly in the GHA log. Also visible from run pydantic#12: /etc/incredibuild/cache/build_cache/shared already contains 465 MiB across 454 .tar artifacts and hash-prefixed subdirs (00..ff). The cache is real, populated, and surviving across runs. The missing piece was just the per-run hit/miss numbers; this commit surfaces them.
prek runs make lint-rs which invokes cargo clippy directly (no
'uv run' wrapper). cargo honors .cargo/config.toml which sets
PYO3_PYTHON=.venv/bin/python3 (relative). On the IB self-hosted
runner that path doesn't resolve at clippy time:
error: failed to run custom build command for pyo3-build-config
error: failed to run the Python interpreter at
/actions-runner/_work/monty/monty/.venv/bin/python3:
No such file or directory (os error 2)
The other migrated jobs (test-rust, bench-test, miri) already do
'rm .cargo/config.toml' for the same reason — clippy then uses
setup-uv's python via pyo3-build-config auto-detection.
…t deps
When CARGO_HOME=$github.workspace/.cargo, cargo's git dependency
checkouts land at .cargo/git/checkouts/<crate-hash>/<rev>/...
inside the workspace. prek then runs ruff/format-lint-py across the
workspace, walks into .cargo/git/checkouts/ruff-*/, and chokes on
ruff's own intentional bad-input test fixtures:
Failed to read .cargo/git/checkouts/ruff-.../crates/
ruff_notebook/resources/test/fixtures/jupyter/invalid_extension.ipynb:
Expected a Jupyter Notebook, [...] isn't valid JSON
Failed to parse .cargo/git/checkouts/ruff-.../crates/
ty_completion_eval/truth/.../main.py:1:1:
Invalid annotated assignment target
Pin CARGO_HOME to $runner.temp/lint-cargo for the lint job so the
cargo registry/git checkouts live outside the prek scan root.
This is lint-only because it's the only IB-routed job that runs ruff
on the workspace tree. The other migrated jobs keep CARGO_HOME under
github.workspace to avoid cross-job collisions on a shared registry
when concurrent jobs share the IB runner filesystem.
…env)
runner.temp is only available at STEP-level env / in run scripts —
NOT at job-level env. The previous commit's
CARGO_HOME: ${{ runner.temp }}/lint-cargo
caused the whole workflow to fail to start (run had 0 jobs, run
name reverted to '.github/workflows/ci.yml' literal path, signal
that GitHub Actions rejected the file during initial validation).
Use a static /tmp/lint-cargo — guaranteed writable on Ubuntu-based
self-hosted runners and reliably outside the workspace tree.
Two issues observed on run pydantic#16: 1. lint failed at the runner's 12-minute hard cap. Real work (prek, IB cache stats) all SUCCEEDED in ~30s. The 11+ minutes were spent in 'Post Run Swatinem/rust-cache' (saving cache to GitHub Actions cache storage from inside ib_console's chroot/namespace). Whereas test-rust's Post-Swatinem completed fine because the cache key already matched the restored entry (nothing new to save). lint uses nightly Rust + prek-installed tools, so the post-restore diff is larger and the save phase stalls. 2. test python 3.12 and 3.14 hit the 12-minute cap on 'make dev-py-release'. Other matrix entries (3.10/3.11/3.13) finished in ~5 minutes. Suggests resource contention between 3 concurrent maturin-release compiles on the single IB runner. Mitigations: - save-if: ${{ false }} on every Swatinem/rust-cache step in IB jobs. The IB build cache is what's actually accelerating us (Swatinem restored only 1.7 KB on previous runs); making Swatinem restore-only eliminates the post-action stall. - max-parallel: 3 -> 2 on the test-python matrix to give each concurrent maturin release compile more CPU headroom on the single runner.
…ility Run pydantic#18 showed that long-compile IB jobs (miri, fuzz, lint) hit a ~10-12 minute wall-clock cap on the self-hosted IB runner when 6+ concurrent compile jobs share its CPU. The cap is runner-side (not GitHub Actions timeout-minutes). Workaround: reduce concurrent IB jobs. Changes: - test-python matrix: max-parallel 2 -> 1 Serializes the 5 Python versions, removing the largest single source of concurrent compile pressure. - miri: needs [bench-test] Stages miri after bench-test, so miri's cargo-fuzz / miri test compile doesn't share CPU with bench-test's monty-bench compile. - fuzz: needs [miri] Stages fuzz after miri. Both are compile-heavy. Net effect on a typical run: - ~4 concurrent heavy IB jobs at peak (was ~8) - per-job wall-clock should stay under the cap - workflow wall-clock increases but reliability improves
Pulls every migrated job's IB setup/diagnostic boilerplate out of
ci.yml and into two helper scripts:
scripts/ib-prep.sh pre-flight: baseline tools (sudo/curl/wget)
+ ib_console diagnostics + libpython.so symlink
+ LIBRARY_PATH/LD_LIBRARY_PATH exports
+ .venv ensure for lint's prek/clippy
scripts/ib-stats.sh post-flight: dump real cache path size + .tar
artifact count + ib_hm.log tails
Each migrated job's body is now minimal:
- uses: actions/checkout@...
- name: IB pre-flight
run: ./scripts/ib-prep.sh
- <real work>
- name: IB cache stats
if: always()
run: ./scripts/ib-stats.sh
ci.yml drops 474 lines (-28 %). Future upstream syncs are now easy:
re-pull the workflow, drop one line per migrated job (the pre-flight
and stats steps), and the rest is upstream verbatim.
Also fixes the persistent lint failure: don't 'rm -f .cargo/config.toml'
(prek's check-yaml hook requires the file present on disk); instead
ib-prep.sh pre-creates .venv at workspace root via 'uv venv' so the
PYO3_PYTHON=.venv/bin/python3 path resolves under clippy.
scripts/ensure-ci-tools.sh removed; its baseline-tool logic now lives
inside ib-prep.sh.
Two fixes after run pydantic#20 surfaced two new issues: 1. zizmor (workflow security audit, exit 12) flagged the 'save-if: ${{ false }}' as obfuscation per docs.zizmor.sh audits/#obfuscation — recommends the static evaluation. Switch to literal 'save-if: false' on all 7 Swatinem steps. Same behavior, zizmor-clean. 2. bench-test (and any other pyo3-linking job) failed with 'rust-lld: error: unable to find library -lpython3.14' because ib-prep.sh ran right after checkout, BEFORE setup-python. With no python3 on PATH yet, the libpython.so symlink + LIBRARY_PATH exports were skipped, and by the time cargo bench ran, pyo3-ffi had no library search path. Move 'IB pre-flight' to sit just before the first cargo / make / maturin / prek invocation in each migrated job. ib-prep.sh now runs after setup-python and setup-uv, so it has the right python on PATH for its libpython + .venv work.
test-rust hit the IB runner's 12-min wall-clock cap on run pydantic#21 while mid-way through its 7-pass cargo llvm-cov sequence (step 14 of 22). The cap is shared-CPU-driven: when 4+ heavy compile jobs share the single self-hosted IB runner, test-rust's wall-clock blows past the cap. Stage test-rust to wait for bench-test (~50s), lint (~150s), and test-python-coverage (~115s) before it starts. Once those clear, the only concurrent compile load is the already-serialised test-python matrix (max-parallel:1). With less competition, test-rust's 7×llvm-cov fits under the cap (was 250s wall-clock on run pydantic#16 in similar conditions).
Run pydantic#22 had 10/11 jobs green but test python 3.14 sat queued ~40min on the IB runner. Trigger a fresh run that should: - run on warm IB cache (run pydantic#22's compiles persisted to /etc/incredibuild/cache/build_cache/shared/) - pick up the runner cleanly via the concurrency cancel-in-progress - give us the complete 11/11 green baseline for the benchmark
basedpyright failed in lint with:
uv run basedpyright
/ib-workspace/build/venv/lib/python3.14/site-packages/basedpyright/
dist/pyright.js:154568
SyntaxError: Invalid or unexpected token
The IB runner image carries a stale /ib-workspace/build/venv that uv
falls through to when it can't find a project venv. The pyright.js
there is broken, and 'uv run' picks it up over the venv our 'uv sync'
creates.
Pin UV_PROJECT_ENVIRONMENT=$github.workspace/.venv at the lint job
env so 'uv run' resolves to the fresh local venv. ib-prep.sh already
'uv venv .venv' fallback-creates it.
The IB self-hosted runner's ~10 min wall-clock cap repeatedly killed lint mid-prek across runs pydantic#18-24. lint's heavy steps (basedpyright loading 154k-line pyright.js, workspace-wide clippy compile) are neither IB-cacheable in a meaningful way nor compile-bound enough to benefit from ib_cache. Run it back on ubuntu-latest (was 4m07s upstream) where parallelism + bigger CPU keep it under any timeout. test-rust's 'needs:' chain drops 'lint' (lint is now parallel on ubuntu). Still needs [bench-test, test-python-coverage] which both sit on the same IB runner and want to clear before test-rust's 7-pass llvm-cov compile starts.
make dev-py-release runs uv run maturin develop --release. The repo's release profile is lto='fat' + codegen-units=1 (great for shipping wheels, slow to compile). On the IB self-hosted runner that compile + the followup pytest blew past the ~12-min wall-clock cap on test python 3.10 / 3.12 / 3.14 across runs pydantic#16, pydantic#20, pydantic#24, pydantic#26, pydantic#27. Override CARGO_PROFILE_RELEASE_LTO=false and CODEGEN_UNITS=16 inside test-python only. Same release semantics (optimized + debuginfo stripped behavior intact), just trades a bit of binary perf for much faster link. The real LTO-built wheels are still exercised end-to-end by test-builds-os/test-builds-arch which use maturin-action's Docker image (not migrated to IB).
|
the monty project owner honesty pass — went back through the actual CI logs and found one number I had inflated. Pushing a recalibration to What changedThe 8.36× number is real but it's the ceiling, not the realistic CI value. I had been quoting it as both. Verified what is and isn't true: Verified ✅
Recalibrated
|
| # | command | wall |
|---|---|---|
| 1 | llvm-cov --no-report -p monty |
84 s (cold for the llvm-cov-instrumented variant) |
| 2 | llvm-cov run --no-report -p monty-datatest |
26 s (warm replay + tests) |
| 3 | --features memory-model-checks |
62 s (new feature flag = different cache key) |
| 4 | same flag, monty-datatest |
14 s (warm replay + tests) |
| 5 | --features ref-count-return |
56 s (new feature) |
| 6 | same flag, monty-datatest |
15 s (warm) |
| 7 | monty_type_checking -p monty_typeshed |
47 s (different crates) |
| total | ~304 s |
Why this is so much smaller than the 8× ceiling: monty's coverage matrix sprays distinct rustc cache keys by design (different --features, different -p). The cache cleanly replays on 3 of 7 invocations and shows ~2.5–3× per call when it does, but the other 4 hit fresh keys and run near-baseline. Net test-rust wall (~304 s) vs an estimated ubuntu-latest baseline (~350–450 s for the same 7 calls) is ~1.2–1.5×, plus the 1.55× hardware floor → ~1.5–2× total.
Honest headline numbers for any external use
- 1.55× — pure hardware floor (cell B steady-state, no rustc cache)
- ~1.5–2× — realistic
test-rustspeedup on monty as currently structured - ~2.5–3× — per-invocation when the cache actually hits (test-rust steps 4, 6 are the proof)
- 8.36× — ceiling on identical-workload cache replay (the bench's cell D)
The integration is still correct and worth merging — every speedup is positive and the wrapper is source-grounded. I just wouldn't want to promise an 8× CI cut without first looking at how feature-flag-diverse a customer's cargo invocations are.
Also documented in the doc
- The 500 MiB warm-replay target/ delta is
target/debug/incremental/— cache replay restores rustc outputs (.rlib/.rmeta/test binaries) but not cargo's own incremental-state side files. Correct forcargo test --no-run, no functional issue, but means a subsequent edit-rebuild on the same checkout gets the IB cache (replay-fast for unchanged code) instead of cargo-incremental (which is what you'd expect). Worth knowing for the mental model. - Per-runner cache locality (the 8 / 614 / 987 MiB observation) implies first cargo invocation on each new runner pays a one-shot ~40–80 s cache fill; everything after amortises against the local cache.
What this means for the strategic recommendation
The original three product asks still stand and only get stronger:
- Make rustc caching the default, not opt-in. Today the out-of-the-box experience for any Rust repo is the 1.55× hardware floor and zero cache value until someone reads the source.
- Cached
build_script_build/build_script_mainin a sandboxed-env mode would noticeably help any pyo3/maturin repo because every cold compile re-runs all build scripts. - A test-binary fingerprint cache would unlock the test-execution dilution we just measured (steps 4 and 6 are 14–15 s of which most is test runtime, not compile). This is a real product feature, not a config knob.
Plus the -- -in-XML-comment fail-fast bug from the earlier comment.
PR is still merge-ready. Numbers in IB_BENCH_RESULTS.md are now the ones I'd actually defend with a customer.
Cells A/B/C/D measure the synthetic `cargo test --no-run -p monty`
workload, which is fast but doesn't capture the full test-rust cost
(7x cargo llvm-cov + clean). The realistic test-rust speedup so far
has been an estimate (~1.5–2x) inferred from real-CI logs.
Adds two new measurement cells running the actual ci.yml::test-rust
sequence verbatim, so the E → F steady-state ratio is the directly
measured number:
E ubuntu-latest, plain cargo, 2 iterations
F incredibuild-runner, cargo-ib.sh, IB warm cache, 2 iterations
(chained after D for predictable IB cache state)
Implementation:
* scripts/ib-bench-run.sh — adds WORKLOAD={synthetic,test-rust} and
CARGO_BIN env vars. Synthetic stays the default so cells A/B/C/D
are unchanged. The test-rust workload runs the 8-call llvm-cov
sequence per iteration; per-iter wall/user/sys are summed across
calls and rss is the per-call max. CSV schema unchanged
(one row per iteration).
* .github/workflows/ib-bench.yml — adds cell-E-ubuntu-test-rust
and cell-F-ib-test-rust jobs with 30-min timeouts; both feed
the summarize job's needs list and CSV-collection loop.
* scripts/ib-bench-summarize.py — extends CELLS with E/F, adds an
"E → F" steady-state row that fmt_ratio's iter≥2 means, refreshes
the top-level doc and section heading.
Pure additive: cells A/B/C/D, scripts/cargo-ib.sh, scripts/ib-profile.xml
and .github/workflows/ci.yml are untouched.
Co-authored-by: Cursor <cursoragent@cursor.com>
Three additive PoV improvements based on parallel subagent investigations: - Cell E (ubuntu-latest, real test-rust workload, 8 cargo llvm-cov calls / iter, target wiped between iters) measured at 357 s steady-state from run 25705064240. Replaces the previously- inferred ubuntu-latest baseline. Cell F still pending the IB runner pool which has been fully offline (0/30 online) for the measurement window. - New ib-probe.yml workflow (dispatch-only, 5 min on incredibuild- runner) probes role markers, ib_server/ib_coordinator presence, Coordinator.* rows in the agent SQLite DB, --check-license, and a no-standalone smoke test. Answers "is IB distribution available on this runner image?" — currently believed to be no (initiator-only image), but --standalone in the wrapper silences the only diagnostic that would prove or disprove it. - IB_BENCH_RESULTS.md gains a "Distribution mode" section and an "sccache structural comparison" section. Distribution explains what --standalone really does (per XgConsole_Session.cpp:308- 404: tolerate missing coordinator, NOT skip ib_server connect timeout — earlier doc was wrong on this) and what cell Q would measure if helpers were provisioned. Sccache section explains why the OSS baseline structurally caps below IB's 8.36x ceiling on monty (~25 proc-macro crates + bin test binary + incremental workspace crates are all uncacheable by sccache); cites public sccache speedup numbers from NeoSmart 2024 + sccache#2041. Also fixes the --standalone comment in cargo-ib.sh to reflect what the source actually shows the flag does. Co-authored-by: Cursor <cursoragent@cursor.com>
|
the monty project owner PoV iteration: dispatched three parallel investigation/build subagents and pushed their results as commit Summary of changes since the last comment
Cell E ground truth (run 25705064240)The same 8-call sequence as
That replaces the previously-inferred ~350–450 s estimate with a direct measurement at 357 s. Iter 2 is only 14% faster than iter 1, which is the right answer: most of the wall is rustc on a wiped target/, which a registry warmup can't help. Cell F: pendingWill land at run 25706688862 (or a re-trigger) once IB runners come back online. Predicted band: 150–250 s based on 1.55× hardware floor × 1.3–2.0× cache value on the mixed-key matrix. Once F completes, the measured E→F speedup replaces the estimated band. Distribution: the second axis we deliberately left unmeasuredThe wrapper currently uses
Source citations: Indirect evidence (every successful IB job in this PR ran with To confirm, dispatch If the probe shows distribution is available: a future cell If the probe shows distribution is not available: that's a high-leverage product finding — the GH-hosted IB runner image as shipped cannot demonstrate the distribution side of IB's value prop. Provisioning a default helper pool in the runner image would unlock another 1.7–2.5× on cold-path CI for every customer who uses the runner as-is. sccache: why this PoV's value isn't trivially commoditised by the OSS baselineThe most-asked sceptical question on a CI-cache PoV is "sccache is free and also caches rustc — why pay for IB?". Documented answer in the writeup, summary here:
Estimated direct comparison: sccache would land at ~1.7–3.2× on monty's Final headline numbers (what to put on a slide)
Strategic implications for IB-as-product (sharper now)Cumulative findings worth raising with the IB product team:
Files updated
PR is still merge-ready. Numbers are now ones I'd defend in front of a customer or product team. Cell F will arrive at the next online window without further code changes. |
Co-authored-by: Cursor <cursoragent@cursor.com>
All six bench cells green on the same date / same runner pool. Replaces estimates with measurements: - Cell A (synthetic, ubuntu-latest): 36.4s steady-state - Cell B (synthetic, IB no-cache): 22.1s steady → 1.65x hardware floor - Cell C (synthetic, IB cold cache): 40.6s, +612 MiB - Cell D (synthetic, IB warm cache): 4.2s steady → 8.68x ceiling - Cell E (real test-rust, ubuntu-latest): 325.7s steady - Cell F (real test-rust, IB warm cache): 220.2s steady → 1.48x measured ib-probe.yml run (25706946478) confirmed: runner image is initiator + helper, coordinator-less. Distribution path is structurally unavailable until a coordinator + helper-pool registration are added at runner-image build time. Updated the distribution section to reflect the probe's actual output rather than the prior "to be probed" wording. Final realistic test-rust speedup of 1.48x is at the bottom of the prior 1.5-2x estimate band. Documented why: feature-flag matrix spray, IB_MAX_LOCAL_CORES throttling for wall-clock-cap mitigation, and uncached test execution combined leave less room than the unthrottled cell B can show on a single cargo call. Co-authored-by: Cursor <cursoragent@cursor.com>
|
the monty project owner PoV definitive — all six bench cells green, distribution gap confirmed by probe. Last comment ended with cells E pending and F predicted-but-unmeasured. Both have now landed on the same runner pool, same date. Plus the IB topology probe ran and gave a definitive answer on whether distribution mode is even possible on this runner image. Run 25706688862 (commit Final canonical numbers (steady state, iter ≥ 2)
What the headline numbers mean
Note about cell F vs cell BCell F's 1.48× is measurably less than cell B's 1.65× hardware floor. That looks counter-intuitive but is correct: cell F pays the ib_console daemon-startup overhead 8 times per iteration (vs 1 in cell B), and we throttle to Distribution mode: probe confirms it's structurally unavailableNew diagnostic workflow The runner image is initiator + helper, coordinator-less.
What this implies for IB-as-product (sharper now, two findings)Finding 1 — biggest leverage for any Rust customer: out of the box, Finding 2 — second-biggest leverage, surfaced by the probe: the GitHub-hosted IB runner image ships The deployed sccache structural comparison (for the inevitable "why pay" question)Direct measurement is a follow-up PR (would muddy this diff with a separate stats parser). Structural ceiling characterised in
Estimated direct comparison: sccache lands at ~1.7–3.2× on monty's What to put on a slide
Where to look
PR is merge-ready. Numbers are now ones I'd defend in front of a customer or product team. The sccache comparison cell and (if a coordinator is provisioned) a distribution cell would slot into the same harness as a follow-up. |
|
Please stop referencing me in this! |
…I + Layer B manylinux probe + Sam doc Summary of this commit (the monty-side of the seven-layer plan in .cursor/plans/monty-ib-cross-repo-strategy-*.plan.md): Layer F — three monty wirings (unilateral, no upstream dependency) - .github/workflows/codspeed.yml: runs-on: incredibuild-runner + CARGO=$(pwd)/scripts/cargo-ib.sh + IB pre-flight/stats steps. Codspeed builds the bench crate every PR; high cache locality. - .github/workflows/ci.yml::build-js: matrix entries for x86_64-unknown-linux-gnu and wasm32-wasip1-threads switched to incredibuild-runner with conditional IB env (CARGO, IB_MAX_LOCAL_CORES, IB_PREVENT_OVERLOAD) and IB pre-flight/stats guarded by `if: matrix.settings.host == 'incredibuild-runner'`. macOS / Windows / aarch64 / arm64 entries kept on their existing runners (IB has no pool there yet — Layer G). Validation cells (extending the existing A–F bench matrix) - ib-bench.yml::cell-G-ib-shim-simulation: Layer-A simulation. Same test-rust workload as cell F, but cargo is dispatched via a PATH-prepended shim that hand-mimics what vnext-processing-engine/src/build_accelerator/default_rules.yaml's generated cargo entry would auto-emit if cargo were upgraded from ENV mode to SHIM mode (the contents of branch feat/cargo-rustc-shim's ib-accel/bin/cargo). G tracking F within noise is the green light to retire scripts/cargo-ib.sh from monty the moment Layer A lands and the runner image rebuilds. - ib-bench.yml::cell-I-ib-codspeed: codspeed workload (cargo codspeed build -p monty-bench --bench main) on IB warm. Validates Layer F's codspeed.yml rewire. Disjoint rustc keyspace from test-rust, so D/F caches don't help — I's iter1→iter2 ratio is the cleanest single-job signal for the every-PR codspeed workflow. - scripts/ib-bench-run.sh: new `codspeed` workload variant alongside the existing `synthetic` and `test-rust` workloads. - scripts/ib-bench-summarize.py: G/I rendered in the markdown table with their own steady-state comparison sub-tables (F→G ratio, I cold/warm). Layer B — manylinux container probe - .github/workflows/ib-probe.yml: new `manylinux-probe` job runs `runs-on: incredibuild-runner` + `container: image: quay.io/pypa/manylinux_2_28_x86_64`. Probes whether vnext-processing-engine's container-hooks/index.js already injects /ib-workspace volumes and ib_console into a manylinux container (the hypothesis being that 8 of monty's compile-bound jobs — the whole wheel-build matrix — are already IB-reachable but never verified). Probe checks: volume injection, ib_console resolution, glibc compat, --standalone smoke test. Documentation - IB_BENCH_RESULTS.md: appended a Cross-repo strategy update section explaining the two upstream gaps (cargo ENV-mode-only in default_rules.yaml; container-hooks/index.js shipping but never verified for manylinux). Includes a coverage-trajectory table showing how each layer moves monty IB coverage from 12.5% today to 84% with all layers shipped. - IB_NEXT_STEPS_SAM.md: new action-item companion to the bench results doc. Maps each layer (A through G) to owner / effort / effect on monty / effect on every other IB customer; spells out the cleanup deletes that follow each layer's merge; lists the four concrete asks for Sam (approve, get vnext PR reviewed, schedule IB-ops sync for C+E, triage Layer B's probe outcome). Cross-repo PR The companion to this commit is feat/cargo-rustc-shim on Incredibuild-RND/vnext-processing-engine (Layer A — promote cargo from ENV to SHIM mode in default_rules.yaml; 83 unit tests + 6 integration tests). Branch pushed; PR-ready. Co-authored-by: Cursor <cursoragent@cursor.com>
Cross-repo strategy update — Ultrathink plan implementation completeJust pushed What's in this commitLayer F — three monty wirings (unilateral):
Validation harness extensions:
Layer B — manylinux container probe:
Documentation:
Companion vnext PR (Layer A)Opened Incredibuild-RND/vnext-processing-engine#210 — branch Remaining owner actionsSpelled out at the bottom of
If only one of these can ship: #2 (the vnext PR). It's the foundation everything else builds on, and after it lands monty can delete |
…er); pin manylinux digest CI run 25722680967 reproducibly failed in `cargo codspeed run` with: setarch: failed to set personality to x86_64: Operation not permitted ##[error]failed to execute valgrind The CodSpeedHQ action shells out to valgrind, which uses setarch to set ADDR_NO_RANDOMIZE personality. The IB self-hosted runner image runs under restricted Linux capabilities (no SYS_ADMIN, user-namespace remap) so the personality syscall is blocked. github-hosted runners allow it. This is a structural blocker — not specific to monty — that affects every valgrind-based tool in CI (callgrind, memcheck, codspeed, ...). Two paths to recover the IB value here are documented in IB_NEXT_STEPS_SAM.md as a new IB-product roadmap item: 1. Hybrid: cargo codspeed build on IB, transfer artifacts, cargo codspeed run on ubuntu-latest. Doable but requires careful artifact pinning. 2. Have IB ops relax the runner image's seccomp/capability profile to allow setarch personality (or grant CAP_SYS_ADMIN). Common for build runners. Until either lands, codspeed.yml stays on ubuntu-latest. The monty-side measurement of the IB-build value lives in ib-bench.yml::cell-I-ib-codspeed (only `cargo codspeed build`, no valgrind run, so it works on IB). Also pinned the manylinux container image in ib-probe.yml by manifest digest (sha256:443eabd378e1...), addressing zizmor's unpinned-images audit. The probe job uses the digest-pinned image to validate Layer B (container hooks injecting /ib-workspace into container: image: xx jobs). Co-authored-by: Cursor <cursoragent@cursor.com>
ib-probe.yml::manylinux-probe (run 25726192172) confirmed end-to-end:
- vnext-processing-engine container-hooks/index.js fires on a
GHA-level container: block, bind-mounting /ib-workspace/cache and
/ib-workspace/incredibuild + putting /ib-workspace/incredibuild/
ib-accel/bin at the front of PATH inside the container.
- /usr/bin/ib_console v3.25.2 runs natively under the manylinux
image's glibc 2.28 (no GLIBC_2.x mismatch).
- --standalone --no-monitor -- /bin/true connects to ib_server,
proving the cache and the in-namespace distribution path are both
live inside the container.
Cell H closes the loop on Layer B by measuring cargo-test-no-run on
the same manylinux image under ib_console, comparable to cell D
(synthetic, IB warm, on the bare host). H_warm / D_warm tracking
1.0 ± 10% means container-ization adds no overhead and the wheel-
build matrix (build job's 7 Linux entries + build-pgo linux) can be
migrated onto incredibuild-runner with a two-line GHA edit per job.
Doc updates:
- IB_BENCH_RESULTS.md: Layer-A row points at vnext PR pydantic#210; Layer-B
marked GREEN with run link; coverage trajectory updated for the
Phase-8 path (4 -> 6 -> 14 -> 17 -> 27 of 32).
- IB_NEXT_STEPS_SAM.md: Layer-B section rewritten as the validated
result; ask pydantic#4 to Sam flipped to "done"; explicit 30-min agenda
added for the Layer-C + Layer-E IB-ops sync.
Co-authored-by: Cursor <cursoragent@cursor.com>
Closure-plan progress (Sam: this is the brief)Phase 1 — vnext PR review Phase 2 — manylinux container probe → GREEN
Phase 8 — Cell H added (Layer-B end-to-end measurement) Note for the eventual Phase 3 — IB-ops sync agenda (for Sam to schedule)
Phases 4 / 5 / 6 / 7 — gated on the above
Remaining concrete asks (in priority order):
Closing this loop is what gets monty IB coverage from 4/32 (today) → 14/32 after Phase 8 → 17/32 after Phase 7. Layers C and E are pure IB-side config edits; they don't need a code review on either side. |
…nup-fix Three small follow-ups after the Layer-B GREEN result and Cell-H first run: 1. ib-probe.yml::probe — add a "Layer-A cargo SHIM deploy check" group that looks for /ib-workspace/incredibuild/ib-accel/bin/cargo (or /opt/ib-accel/bin/cargo on older variants). The next probe run after vnext-processing-engine#210 lands and the runner image rebuilds will report `FOUND` and unblock Phase 5 of the closure plan automatically — no one has to remember to re-check. 2. IB_CLEANUP_SPEC.md — new mechanical cleanup spec for closure-plan Phases 5 (cargo-ib.sh removal), 6 (ib-profile.xml removal), 7 (lint/fuzz/test-python re-route), and 8 (manylinux build matrix migration). Each phase lists exact files + line ranges + sed patterns + verification + commit-message template, so when its gate clears the right person can open the cleanup PR in 10 min without re-deriving the change set. 3. scripts/ib-bench-run.sh — fix cleanup step to honor $CARGO_TARGET_DIR. Cell H sets CARGO_TARGET_DIR=target-h to isolate from host-side cells, but the cleanup hardcoded `rm -rf target` so cell H iter 2 reused iter 1's artifacts (measured 0.35s instead of a real warm-cache rebuild). target_size() also updated to honor the env. Cells A-G/I always use the default target/ so behavior unchanged for them. The Cell-H first run (in ib-bench run 25727104334) still proved the qualitative finding: container hook fires, ib_console runs under glibc 2.28, cargo wrapping works end-to-end (iter 1 = 46.5s cold). The numerical H_warm/D_warm comparison just needs a re-run with this fix. Co-authored-by: Cursor <cursoragent@cursor.com>
ib-bench run 25727572729 with the CARGO_TARGET_DIR fix produced clean Cell H numbers: A iter 2 (ubuntu-latest, no IB): 37.4 s D iter 2 (IB host, warm cache): 5.27 s -> 7.10x vs A H iter 2 (IB manylinux container, warm): 21.3 s -> 1.76x vs A H beats the closure plan's 1.3x gate for Phase 8. The 4x gap between H (container) and D (bare host) on the same workload is a follow-up: the container's separate rustup install gives it disjoint cargo cache keys from the host. Aligning the toolchain would close the gap, but 1.76x vs ubuntu-latest is already enough to migrate the wheel-build matrix. Co-authored-by: Cursor <cursoragent@cursor.com>
Closure plan — final statePosting now to close my loop on the closure plan. Every phase has either landed, has its mechanical diff fully specified for the moment its gate clears, or has been deferred per the plan. Measured speedups (ib-bench run 25727572729)
Cell H confirms Phase 8 — wiring a wheel-build matrix entry to Phase status
What lands automatically vs needs a humanAutomatic (no further action):
Needs a human:
Branch artifacts
That's everything I can drive from here. The remaining critical-path actions (review, IB-ops config, runner-image rebuild) are scheduled around external owners. The mechanical follow-up PRs are spec'd to the line so they can land in 10 minutes each whenever their gate clears. |
Apply ruff formatting to the Cell-H summary strings so the lint job no longer rewrites scripts/ib-bench-summarize.py in CI. Co-authored-by: Cursor <cursoragent@cursor.com>
Post-PR210 status update
I also immediately re-ran monty's Result: the live runner image still does not contain the generated cargo shim yet. Current probe output: So the decision rule is unchanged: do not delete I did fix the independent lint blocker and pushed Next external asks:
|
Tal deployed the runner image built from vnext-processing-engine#210, and ib-probe run 25732897099 confirmed the generated cargo shim is live at /ib-workspace/incredibuild/ib-accel/bin/cargo. Remove monty's repo-local cargo wrapper and route CI/bench commands through plain cargo so the runner-image shim owns ib_console wrapping via PATH. Keep the repo profile alive until Layer C by teaching ib-prep.sh to export IB_CONSOLE_ARGS for the vnext shim, including the per-job cache logfile and --profile=scripts/ib-profile.xml unless IB_NO_CACHE is set. Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Keep the monty wiring aligned with the shipped cargo shim while preserving the small bridge for cargo extension workloads, and make the hosted-profile and CodSpeed decisions explicit locally. Co-authored-by: Cursor <cursoragent@cursor.com>
Record the vnext follow-up that will remove monty's remaining cargo bridge once the runner image is rebuilt. Co-authored-by: Cursor <cursoragent@cursor.com>
Use the deployed vnext cargo shim for Monty's cargo extension and toolchain forms so the evidence branch proves the out-of-the-box runner path. Co-authored-by: Cursor <cursoragent@cursor.com>
Keep the real test-rust benchmark cell aligned with ci.yml so the evidence workflow measures the deployed shim without tripping the runner wall-clock cap. Co-authored-by: Cursor <cursoragent@cursor.com>
Summary
Routes
monty's heavy Rust jobs through Incredibuild build runners,adds the one XML knob needed to extract caching value (the
ib_linuxdefault profile does not cache
rustc, only C/C++ compilers), andships an A/B/C/D bench workflow to measure the value end-to-end.
Full source-grounded write-up, methodology, raw CSVs, and a "what to
tell Sam" handoff: see
IB_BENCH_RESULTS.md.Why one XML knob
ib_linux:data/ib_profile.xmlshipsrustcastype="allow_remote"with no
<ib_cache>element (C/C++ compilers, by contrast, arecached via
type="local_only" cached="true"). For monty (~100% rustc)that means out-of-the-box IB caching value is near-zero. The cache key
machinery for rustc is already implemented in
ib_linux:cpp/BuildCache/BuildCache_Rules.cpp(rsp-file basedirplaceholder remap keyed on the process name
"rustc"); enabling it isone element in an additive profile.
scripts/ib-profile.xmladds exactly that, additive on top of thesystem default (
ignore_following_profiles="false"), preserving thedefault's gcc/clang/cc1/cc1plus rules instead of redeclaring them.
scripts/cargo-ib.shis the wrapper, every flag verified againstib_linux:cpp/XgConsole/XgConsole_main.cpp:--standalone --build-cache-local-shared --build-cache-basedir=$PWD --build-cache-local-logfile --build-cache-report-all-miss --no-monitor [--profile=…]. On runners without/usr/bin/ib_consoleit exec'splain
cargoso the same workflow step is portable.Measurement matrix (
.github/workflows/ib-bench.yml)cargo test --no-run -p monty,target/wiped between iterations,3 iterations per cell.
ubuntu-latestcargo, Swatinem warmincredibuildIB_NO_CACHE=1)incredibuildincredibuildCell B HIT=0 / MISS=0 is expected —
IB_NO_CACHE=1skips--profile=, so the system default profile applies andrustcisn'tcached. Monty's graph has no significant C work for the default
profile to cache. The 1.6× speedup over A is therefore pure runner
hardware (more cores).
C and D are the cells that would expose the cache value of the
single XML knob. They cannot run on a pool that isn't online; this
is an infra issue on the IB self-hosted runner side, not a monty
issue. See
IB_BENCH_RESULTS.md → "What I need from you (Sam)"forthe one-line button-press to finish the experiment as soon as the
pool is back.
Bug found & fixed mid-experiment (commit 4c68706)
ib_consolerejected the first version ofscripts/ib-profile.xml:XML 1.0 disallows
--inside<!-- … -->. The comment block hadflag names like
--versionwritten literally.xmllint --nooutcatches this; Python's
ElementTreedoes not. Worth reportingupstream in
ib_linux: when--profile=<file>fails to parse,ib_consoleexits 255 and takes the wrapped command with it,rather than warning and falling back to the system default profile.
That's how this masqueraded as "cache produces no work" until the
per-iteration log was read.
Files
scripts/ib-profile.xml— the one-knob additive profile.scripts/cargo-ib.sh— minimalib_consolewrapper.scripts/ib-prep.sh— exportsIB_CACHE_LOG,IB_PROFILE,installs
/usr/bin/timeif missing.scripts/ib-stats.sh— reads per-jobIB_CACHE_LOGinto$GITHUB_STEP_SUMMARY.scripts/ib-bench-run.sh— per-cell driver.scripts/ib-bench-summarize.py— aggregator..github/workflows/ib-bench.yml— 4-cell bench workflow..github/workflows/ci.yml— addsIB_MAX_LOCAL_CORES/IB_PREVENT_OVERLOADto mitigate the ~10–12 min wall-clock cap onthe shared self-hosted runner.
IB_BENCH_RESULTS.md— finish-line write-up + handoff.Test plan
steady-state ~24 s.
runner pool.
runner pool.
xmllint --noout scripts/ib-profile.xmlpasses.ib_consoleacceptsscripts/ib-profile.xml(verified bycell-D run picking it up — no parse error post-fix).
cargo test --no-run -p montyexits 0 under the wrapper(cell B iteration logs).