Finalize Incredibuild runner integration #3
Open
zozo123 wants to merge 66 commits into
Mirror the pattern used in Incredibuild-RND/uv (branch
ci/incredibuild-runners): move pure-cargo Linux jobs onto the
self-hosted `incredibuild-runner` label and wrap their cargo
invocations with a small wrapper that goes through `ib_console` when
present (falls back to plain cargo elsewhere, so the same workflow
step still works on GitHub-hosted runners).
Jobs migrated:
- test-rust (8x cargo llvm-cov compile/test invocations)
- bench-test (cargo bench)
- miri (cargo +nightly miri test)
- fuzz (cargo install cargo-fuzz + cargo fuzz run)
Jobs intentionally NOT migrated yet:
- test-python / test-python-coverage -- compile through maturin,
needs a follow-up to route maturin's internal cargo invocation
through ib_console
- test-rust-os -- macOS / Windows only
- lint, build*, test-builds-*, release-* -- light or Docker-based
New files:
- scripts/cargo-ib.sh -- ib_console-aware cargo wrapper,
graceful fallback to plain cargo
- scripts/ensure-ci-tools.sh -- bootstrap sudo/curl/wget on lean
self-hosted runners
Each migrated job pins its own CARGO_HOME / CARGO_TARGET_DIR under
${{ github.workspace }} so concurrent IB jobs don't corrupt each
other through the shared /ib-workspace/cache/cargo* volumes.
ib_console's separate build cache still accelerates compile.
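The wrapper's fallback shape can be sketched as follows. This is a hypothetical reconstruction — the real scripts/cargo-ib.sh is not included in this PR text, and ib_console's exact argument syntax is assumed here:

```shell
#!/bin/sh
set -eu

# Write the sketch of the wrapper to a temp file so we can demo it.
wrapper=$(mktemp)
cat > "$wrapper" <<'EOF'
#!/bin/sh
# Route cargo through ib_console when the runner has it; otherwise fall
# back to plain cargo so the same workflow step also works on
# GitHub-hosted runners. (ib_console invocation syntax is an assumption.)
if command -v ib_console >/dev/null 2>&1; then
  exec ib_console cargo "$@"
else
  exec cargo "$@"
fi
EOF
chmod +x "$wrapper"

# Demo the fallback path: a stub "cargo" on PATH, no ib_console anywhere.
stubdir=$(mktemp -d)
printf '#!/bin/sh\necho "plain cargo: $*"\n' > "$stubdir/cargo"
chmod +x "$stubdir/cargo"
out=$(PATH="$stubdir:/usr/bin:/bin" "$wrapper" build --release)
echo "$out"
```

On a runner without ib_console this prints `plain cargo: build --release`, i.e. the same workflow step degrades to plain cargo.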
The self-hosted incredibuild-runner image installs Python via actions/setup-python, which on this runner ships libpython3.X.so.1.0 but not the linker-discoverable libpython3.X.so symlink. pyo3-using crates emit a '-lpython3.X' directive, so test-rust (links monty-datatest via pyo3) and bench-test (links monty-bench via pyo3) both fail at the link step:

    rust-lld: error: unable to find library -lpython3.14

Add a small symlink-recovery step right after setup-python in both jobs. It is a no-op when the .so symlink is already present, so it is safe on GitHub-hosted runners too.
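A minimal sketch of such a step, assuming the prefix and X.Y version are discovered from the python3 on PATH (the real step's wording is not shown in this PR text):

```shell
#!/bin/sh
set -eu

# Discover the interpreter's prefix lib dir and its X.Y version.
libdir="$(python3 -c 'import sys; print(sys.prefix)')/lib"
ver="$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')"
echo "libdir=$libdir ver=$ver"

# Create libpython3.X.so only when the .so.1.0 exists and the symlink
# doesn't; a no-op everywhere else (e.g. GitHub-hosted runners).
if [ -e "$libdir/libpython${ver}.so.1.0" ] && [ ! -e "$libdir/libpython${ver}.so" ]; then
  ln -s "libpython${ver}.so.1.0" "$libdir/libpython${ver}.so" \
    || echo "could not create symlink (read-only prefix?)"
fi
```

The relative symlink target keeps the link valid even if the Python install is relocated within the same directory.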
The first fix (creating the missing libpython3.X.so symlink under $sys.prefix/lib) was necessary but not sufficient. pyo3-ffi's build.rs reads sysconfig at compile time and emits a -L pointing at the path baked into the python-build-standalone tarball (/opt/hostedtoolcache/Python/...), which doesn't exist on this self-hosted IB runner — the real install is under /actions-runner/_work/_tool/Python/.... When the rust-cache restore brings back the cached pyo3-ffi build script output, the stale -L survives across runs.

Make the link work regardless of stale paths by exporting LIBRARY_PATH and LD_LIBRARY_PATH pointing at the real lib dir via $GITHUB_ENV. cc / lld fall back to LIBRARY_PATH when the explicit -L paths don't resolve, and LD_LIBRARY_PATH covers runtime when cargo llvm-cov subsequently runs the produced binaries. Also adds a SYSCONFIG_LIBDIR diagnostic to confirm the theory in future logs.
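A sketch of the export step, assuming sysconfig's LIBDIR points at the real install; GITHUB_ENV is provided by GitHub Actions in a real job and is stubbed here for the demo:

```shell
#!/bin/sh
set -eu

# In a real job GITHUB_ENV is set by the runner; stub it for the demo.
GITHUB_ENV="${GITHUB_ENV:-$(mktemp)}"

libdir="$(python3 -c 'import sysconfig; print(sysconfig.get_config_var("LIBDIR"))')"
echo "SYSCONFIG_LIBDIR=$libdir"   # diagnostic, mirroring the commit's check

# Later steps inherit these: cc/lld fall back to LIBRARY_PATH when the
# explicit -L paths don't resolve, and LD_LIBRARY_PATH covers running
# the produced binaries.
{
  echo "LIBRARY_PATH=${libdir}${LIBRARY_PATH:+:$LIBRARY_PATH}"
  echo "LD_LIBRARY_PATH=${libdir}${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
} >> "$GITHUB_ENV"
cat "$GITHUB_ENV"
```

Appending `KEY=value` lines to $GITHUB_ENV is the documented GitHub Actions mechanism for making env vars visible to all subsequent steps of the job.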
test-rust runs monty-datatest, which spawns CPython subprocesses and compares their output against monty. On the IB runner the default locale is C/POSIX, so CPython picks the ASCII codec for default text I/O, and tests that open files with non-ASCII content (mount_fs__errors.py, mount_fs__ops.py — emoji + 0x80 bytes) fail with UnicodeDecodeError. ubuntu-latest has C.UTF-8 by default.

Pin LANG / LC_ALL to C.UTF-8 and set PYTHONUTF8=1 as belt and braces.
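The belt-and-braces part is easy to verify: UTF-8 mode overrides whatever codec the locale would otherwise pick for default text I/O (locale coercion disabled here so the C locale is actually exercised):

```shell
#!/bin/sh
set -eu

# Without the pin, a C/POSIX locale can leave CPython on an ASCII-ish codec.
enc_c=$(LC_ALL=C LANG=C PYTHONCOERCECLOCALE=0 \
        python3 -c 'import sys; print(sys.stdout.encoding)')

# With PYTHONUTF8=1 the codec is utf-8 regardless of locale.
enc_utf8=$(LC_ALL=C LANG=C PYTHONCOERCECLOCALE=0 PYTHONUTF8=1 \
           python3 -c 'import sys; print(sys.stdout.encoding)')

echo "C locale: $enc_c / with PYTHONUTF8=1: $enc_utf8"
```

The C-locale value varies by platform, which is exactly why pinning both the locale and PYTHONUTF8 is the robust combination.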
These are monty's heaviest workloads — test-python is a 5-version matrix whose entries each compile pyo3+monty+monty-python via maturin twice (dev + release), and test-python-coverage adds full llvm-cov instrumentation on top. Moving them onto incredibuild-runner is where the biggest acceleration headroom lives.

maturin spawns cargo as a subprocess. Cargo respects the $CARGO env var when an external tool launches it, so setting CARGO=$GITHUB_WORKSPACE/scripts/cargo-ib.sh at the job level makes maturin's internal cargo invocation go through ib_console exactly like the direct cargo calls in test-rust.

Each test-python matrix entry pre-installs its target Python through uv (so we can locate the install before maturin runs), then creates the libpython3.X.so symlink and exports LIBRARY_PATH/LD_LIBRARY_PATH — the same recipe as test-rust/bench-test, applied per matrix Python. test-python-coverage uses the same fix and additionally wraps its direct cargo llvm-cov invocations the same way as test-rust.
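The $CARGO contract can be demonstrated with a stub: a tool that honors it (as the commit says maturin does) spawns whatever $CARGO points at instead of a cargo found on PATH. All names below are invented for the demo:

```shell
#!/bin/sh
set -eu

# Stand-in for scripts/cargo-ib.sh: just tags its argv so we can see it ran.
stub=$(mktemp -d)
printf '#!/bin/sh\necho "via wrapper: $*"\n' > "$stub/cargo-ib.sh"
chmod +x "$stub/cargo-ib.sh"

# Job-level env, as in the workflow.
export CARGO="$stub/cargo-ib.sh"

# A $CARGO-honoring tool does the equivalent of:
out=$("${CARGO:-cargo}" rustc --release)
echo "$out"
```

Because the env var is set at the job level, every subprocess that follows the contract — including the cargo that maturin shells out to — is redirected through the wrapper without any per-call plumbing.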
…sole

cargo-ib.sh execs ib_console, which writes 'Incredibuild System: Trying to connect to ib_server...' / 'ib_server connected, start process execution...' to stdout before passing through to cargo. For compile commands that's harmless logging. For 'cargo llvm-cov show-env --export-prefix' — whose entire stdout is meant to be eval'd as shell — those leading lines get evaluated:

    + eval 'Incredibuild System: Trying to connect to ib_server...
    /actions-runner/_work/_temp/...: Incredibuild: command not found

Use plain cargo for the env-discovery call. Compile commands (clean, report) still go through the wrapper, and maturin's internal cargo invocation still gets accelerated via the job-level CARGO env.
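The failure mode is reproducible in miniature (the banner text comes from the log above; everything else is invented for the demo):

```shell
#!/bin/sh
set -eu

# stdout as produced through the wrapper: a banner line, then the
# shell-evalable payload that show-env-style commands emit.
payload='Incredibuild System: Trying to connect to ib_server...
export DEMO_VAR=1'

# eval treats the banner as a command named "Incredibuild"; capture the
# resulting shell error while discarding normal stdout.
err=$(eval "$payload" 2>&1 >/dev/null || true)
echo "$err"
```

The payload's second line still runs, but the banner produces a "command not found" error, and under `set -e` (as CI scripts typically run) that aborts the step — hence routing env discovery through plain cargo.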
Reading the ib_linux source (Incredibuild-RND/ib_linux), two findings drive this change:

1. The default profile at /opt/incredibuild/data/ib_profile.xml lists rustc as type='allow_remote' but does NOT enable ib_cache for it. Only cc1/cc1plus/gcc/clang have cached='true'. So by default ib_console DISTRIBUTES rustc invocations but does NOT persist their outputs to the build-avoidance cache: every CI run recompiles every crate. For a Rust-heavy workspace like monty, that's the dominant cost. The android9+ custom profile bundled in ib_linux shows the right syntax (an <ib_cache enabled='true' /> child element, not the cached='true' attribute, which routes to ccache). We add a minimal custom profile that overrides only rustc and pass it via ib_console --profile=.

2. Per ib_linux:cpp/BuildCache/BuildCache_HitMiss.cpp, ib_console writes hit/miss info to a logfile when started with --build-cache-local-logfile=. Combined with --build-cache-report-all-miss, each run produces a per-job log we can dump and grep to see what is hitting / missing the cache.

Changes:
- scripts/ib-profile.xml: enable ib_cache for rustc, keep the default exclude_args (skip build_script_build/build_script_main / version probes).
- scripts/cargo-ib.sh: pass --profile=, --build-cache-local-logfile, --build-cache-report-all-miss to every wrapped cargo invocation.
- .github/workflows/ci.yml: add 'IB pre-flight diagnostics' and 'IB cache stats' steps (if: always()) to every migrated job. These print the ib_console version, the cache directory location, and a post-build hit/miss summary so the value of IB acceleration is visible in the GitHub Actions run log.
- concurrency.cancel-in-progress=true on the workflow: stops the pile-up of in-flight runs all competing for the single self-hosted IB runner when a chain of commits lands quickly.
- max-parallel: 3 on the test-python matrix: 5 simultaneous matrix entries on one IB runner caused contention that pushed each job's wall time well above the ubuntu-latest baseline. Three at a time keeps each job closer to dedicated-runner timings while still parallelising the matrix.
- timeout-minutes: 30 on every IB-routed job: gives us a known cap to compare against the mysterious ~12-minute kill we saw on test python 3.14 in the previous two runs. If the runner kills a job before 30 min, the kill came from outside GitHub Actions and we'll see a different failure signature.
Two fixes / one extension:
1. scripts/ib-profile.xml: XML 1.0 forbids '--' inside <!-- --> comments
per spec 2.5. The previous version had literal command-line flags
(--build-cache-local-shared etc.) in the comment body, which made
ib_console reject the profile with:
ib_console: Comment must not contain '--' (double-hyphen)
That broke every IB-routed job in the run before this one (exit 255
in 14-30 seconds, before any compile). Rephrased the comment to
avoid '--' sequences and re-validated against the schema implicitly
(Python's xml.etree.ElementTree parses it cleanly).
2. Migrate the lint job to incredibuild-runner. lint runs prek which
triggers a workspace-wide clippy compile pass and is the last big
rust-compile workload not yet routed through IB. With CARGO env
set at the job level, prek's internal cargo invocations go through
cargo-ib.sh and benefit from the same ib_cache as test-rust.
Migrated jobs are now:
lint, test-rust, test-python-coverage, test-python (5-version
matrix), bench-test, miri, fuzz.
Remaining ubuntu-latest jobs are intentional: macOS/Windows
test-rust-os; Docker-bound build/build-pgo/build-js; lightweight
artifact/inspection/release jobs.
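The double-hyphen comment rule from fix 1 (XML 1.0, section 2.5) is easy to confirm with the same ElementTree check the commit mentions; the example strings are invented:

```shell
#!/bin/sh
set -eu

out=$(python3 - <<'PY'
import xml.etree.ElementTree as ET

# XML 1.0 section 2.5: the string "--" must not occur within comments.
bad = "<p><!-- pass --flags like this --></p>"
good = "<p><!-- pass flags written without the double hyphen --></p>"

try:
    ET.fromstring(bad)
    print("bad: accepted")
except ET.ParseError:
    print("bad: rejected")

ET.fromstring(good)
print("good: accepted")
PY
)
echo "$out"
```

Any conforming parser (expat here, libxml in ib_console) must reject the first document, which is why rephrasing the comment — rather than patching the parser invocation — is the right fix.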
…rapper

The ib_console XML schema (data/ib_profile.xsd in ib_linux) requires:
1. the <ib_profile> element to carry a version='1' attribute
2. <process> elements wrapped in a <processes> sequence container

Without those, ib_console rejects the profile early with:

    ib_console: Element 'ib_profile': The attribute 'version' is required but missing.
    Can't validate document from '...' using schema '/opt/incredibuild/data/ib_profile.xsd'

That fails every IB-routed job with exit 255 before any compile step. Matched the structure used by the bundled android9+ custom profile (ib_linux:data/custom_profiles/android/9+/ib_profile.xml).
The ib_profile.xsd schema (ib_linux:data/ib_profile.xsd) defines:
<xs:complexType name="ib_profile_type">
<xs:sequence minOccurs="1" maxOccurs="1">
<xs:element name="globals" type="globals_type" />
<xs:element name="processes" type="processes_type" />
</xs:sequence>
<xs:attribute name="version" type="version_type" use="required" />
</xs:complexType>
and globals_type requires ignore_following_profiles. Without it,
ib_console refuses the profile:
ib_console: Element 'processes': This element is not expected.
Expected is ( globals ).
Setting ignore_following_profiles='false' makes our profile additive
on top of /opt/incredibuild/data/ib_profile.xml — the system default
still loads and only the rustc entry is overridden to enable
ib_cache.
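Putting the schema requirements together, a minimal profile shaped like the above parses cleanly. The element and attribute spellings below are assembled from this PR's text and the quoted XSD, not from a validated run against the real schema:

```shell
#!/bin/sh
set -eu

profile=$(mktemp)
cat > "$profile" <<'EOF'
<?xml version="1.0"?>
<ib_profile version="1">
  <globals ignore_following_profiles="false" />
  <processes>
    <!-- Override only rustc: enable the build-avoidance cache. -->
    <process name="rustc" type="allow_remote">
      <ib_cache enabled="true" />
    </process>
  </processes>
</ib_profile>
EOF

# Same sanity check the earlier commit used: ElementTree parses it cleanly.
out=$(python3 -c "
import xml.etree.ElementTree as ET
root = ET.parse('$profile').getroot()
print(root.tag, root.get('version'))
")
echo "$out"
```

With ignore_following_profiles='false' this stays additive on top of the system default profile, as described above.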
Two cosmetic fixes from yamlfmt that lint enforces:
- Remove the misindented 'alls-green#why' top-of-job comment that ended up between the fuzz job's last step and the next job header. yamlfmt kept trying to push it inside the fuzz job's block, producing diffs each run.
- Drop the extra blank line inside the test-python matrix's libpython step body.

Functionally identical; just unblocks the lint job from cycling on formatting nits.
Two corrections discovered by re-reading ib_linux:cpp/XgConsole/
XgConsole_main.cpp and BuildCache/BuildCache_defines.h:
1. --build-cache-force is NOT a real ib_console flag. There's no
matching getopt_long entry and no GETOPT_ enum value, so prior
runs were silently ignoring it. Removed from cargo-ib.sh. The
semantically equivalent behavior (cache-fill on first run) is
implicit in --build-cache-local-shared.
2. The IB build-avoidance cache lives at:
/etc/incredibuild/cache/build_cache/shared/
(BUILD_CACHE_LOCAL_PATH in BuildCache_defines.h), NOT under
/ib-workspace/cache/. Build reports for sqlite-based stats live
under /etc/incredibuild/db/. The diagnostic steps now inspect
those real paths before and after each job and try to surface
hit/miss stats via the bundled show_build_cache_statistics.sh
when a buildId can be inferred.
This is purely a visibility + correctness change; cache behavior
itself is unchanged from the previous commit. Lets us see, in each
job log, whether the IB cache is being populated and growing as
expected, and whether the rustc-cached profile actually translates
to manifest.json + .tar artifacts under the shared cache dir.
Discovered in miri run pydantic#12's stdout:

    Incredibuild System: Build Cache report is '/etc/incredibuild/log/2026-May-11/local-14/ib_hm.log'

So ib_console writes hit/miss data to a per-build path under /etc/incredibuild/log/YYYY-Mon-DD/local-<buildId>/, regardless of where --build-cache-local-logfile points. (The runtime path our script asks for is inside the chroot/namespace, hence invisible.)

The post-flight step now finds the 3 most-recent ib_hm.log files via mtime, dumps the tail of each, and counts HIT/MISS lines so each job's cache effectiveness is visible directly in the GHA log.

Also visible from run pydantic#12: /etc/incredibuild/cache/build_cache/shared already contains 465 MiB across 454 .tar artifacts and hash-prefixed subdirs (00..ff). The cache is real, populated, and surviving across runs. The missing piece was just the per-run hit/miss numbers; this commit surfaces them.
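A sketch of the post-flight hunt. The directory layout follows the log line quoted above; the HIT/MISS line format inside ib_hm.log is invented for the demo:

```shell
#!/bin/sh
set -eu

# Fake a log tree shaped like /etc/incredibuild/log/YYYY-Mon-DD/local-<buildId>/.
logroot=$(mktemp -d)
mkdir -p "$logroot/2026-May-11/local-14"
printf 'HIT  rustc a.rs\nMISS rustc b.rs\nHIT  rustc c.rs\n' \
  > "$logroot/2026-May-11/local-14/ib_hm.log"

# Find the 3 most-recent ib_hm.log files by mtime and count hits/misses.
summary=""
for f in $(find "$logroot" -name ib_hm.log -printf '%T@ %p\n' \
             | sort -rn | head -3 | cut -d' ' -f2-); do
  hits=$(grep -c '^HIT' "$f" || true)
  misses=$(grep -c '^MISS' "$f" || true)
  summary="$f: $hits hit / $misses miss"
  echo "$summary"
done
```

Sorting by `%T@` (GNU find's epoch mtime) rather than path keeps the selection correct even when builds span a date-directory boundary.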
prek runs make lint-rs which invokes cargo clippy directly (no
'uv run' wrapper). cargo honors .cargo/config.toml which sets
PYO3_PYTHON=.venv/bin/python3 (relative). On the IB self-hosted
runner that path doesn't resolve at clippy time:
error: failed to run custom build command for pyo3-build-config
error: failed to run the Python interpreter at
/actions-runner/_work/monty/monty/.venv/bin/python3:
No such file or directory (os error 2)
The other migrated jobs (test-rust, bench-test, miri) already do
'rm .cargo/config.toml' for the same reason — clippy then uses
setup-uv's python via pyo3-build-config auto-detection.
…t deps
When CARGO_HOME=${{ github.workspace }}/.cargo, cargo's git dependency
checkouts land at .cargo/git/checkouts/<crate-hash>/<rev>/...
inside the workspace. prek then runs ruff/format-lint-py across the
workspace, walks into .cargo/git/checkouts/ruff-*/, and chokes on
ruff's own intentional bad-input test fixtures:
Failed to read .cargo/git/checkouts/ruff-.../crates/
ruff_notebook/resources/test/fixtures/jupyter/invalid_extension.ipynb:
Expected a Jupyter Notebook, [...] isn't valid JSON
Failed to parse .cargo/git/checkouts/ruff-.../crates/
ty_completion_eval/truth/.../main.py:1:1:
Invalid annotated assignment target
Pin CARGO_HOME to ${{ runner.temp }}/lint-cargo for the lint job so the
cargo registry/git checkouts live outside the prek scan root.
This is lint-only because it's the only IB-routed job that runs ruff
on the workspace tree. The other migrated jobs keep CARGO_HOME under
github.workspace to avoid cross-job collisions on a shared registry
when concurrent jobs share the IB runner filesystem.
…env)
runner.temp is only available at STEP-level env / in run scripts —
NOT at job-level env. The previous commit's
CARGO_HOME: ${{ runner.temp }}/lint-cargo
caused the whole workflow to fail to start (the run had 0 jobs and the
run name reverted to the literal path '.github/workflows/ci.yml', a
signal that GitHub Actions rejected the file during initial validation).
Use a static /tmp/lint-cargo — guaranteed writable on Ubuntu-based
self-hosted runners and reliably outside the workspace tree.
Two issues observed on run pydantic#16:

1. lint failed at the runner's 12-minute hard cap. The real work (prek, IB cache stats) all SUCCEEDED in ~30s; the 11+ minutes were spent in 'Post Run Swatinem/rust-cache' (saving cache to GitHub Actions cache storage from inside ib_console's chroot/namespace). test-rust's Post-Swatinem completed fine because its cache key already matched the restored entry (nothing new to save). lint uses nightly Rust + prek-installed tools, so the post-restore diff is larger and the save phase stalls.

2. test python 3.12 and 3.14 hit the 12-minute cap on 'make dev-py-release'. The other matrix entries (3.10/3.11/3.13) finished in ~5 minutes, suggesting resource contention between 3 concurrent maturin release compiles on the single IB runner.

Mitigations:
- save-if: ${{ false }} on every Swatinem/rust-cache step in IB jobs. The IB build cache is what's actually accelerating us (Swatinem restored only 1.7 KB on previous runs); making Swatinem restore-only eliminates the post-action stall.
- max-parallel: 3 -> 2 on the test-python matrix to give each concurrent maturin release compile more CPU headroom on the single runner.
…ility

Run pydantic#18 showed that long-compile IB jobs (miri, fuzz, lint) hit a ~10-12 minute wall-clock cap on the self-hosted IB runner when 6+ concurrent compile jobs share its CPU. The cap is runner-side (not GitHub Actions timeout-minutes). Workaround: reduce concurrent IB jobs.

Changes:
- test-python matrix: max-parallel 2 -> 1. Serializes the 5 Python versions, removing the largest single source of concurrent compile pressure.
- miri: needs [bench-test]. Stages miri after bench-test, so miri's nightly miri-test compile doesn't share CPU with bench-test's monty-bench compile.
- fuzz: needs [miri]. Stages fuzz after miri; both are compile-heavy.

Net effect on a typical run:
- ~4 concurrent heavy IB jobs at peak (was ~8)
- per-job wall-clock should stay under the cap
- workflow wall-clock increases but reliability improves
Pulls every migrated job's IB setup/diagnostic boilerplate out of
ci.yml and into two helper scripts:
scripts/ib-prep.sh pre-flight: baseline tools (sudo/curl/wget)
+ ib_console diagnostics + libpython.so symlink
+ LIBRARY_PATH/LD_LIBRARY_PATH exports
+ .venv ensure for lint's prek/clippy
scripts/ib-stats.sh post-flight: dump real cache path size + .tar
artifact count + ib_hm.log tails
Each migrated job's body is now minimal:
- uses: actions/checkout@...
- name: IB pre-flight
run: ./scripts/ib-prep.sh
- <real work>
- name: IB cache stats
if: always()
run: ./scripts/ib-stats.sh
ci.yml drops 474 lines (-28%). Future upstream syncs are now easy:
re-pull the workflow, drop the pre-flight and stats steps back into
each migrated job, and the rest is upstream verbatim.
Also fixes the persistent lint failure: don't 'rm -f .cargo/config.toml'
(prek's check-yaml hook requires the file present on disk); instead
ib-prep.sh pre-creates .venv at workspace root via 'uv venv' so the
PYO3_PYTHON=.venv/bin/python3 path resolves under clippy.
scripts/ensure-ci-tools.sh removed; its baseline-tool logic now lives
inside ib-prep.sh.
Run pydantic#20 surfaced two new issues; two fixes:

1. zizmor (workflow security audit, exit 12) flagged 'save-if: ${{ false }}' as obfuscation per docs.zizmor.sh audits/#obfuscation, which recommends the statically evaluated form. Switch to a literal 'save-if: false' on all 7 Swatinem steps. Same behavior, zizmor-clean.

2. bench-test (and any other pyo3-linking job) failed with 'rust-lld: error: unable to find library -lpython3.14' because ib-prep.sh ran right after checkout, BEFORE setup-python. With no python3 on PATH yet, the libpython.so symlink + LIBRARY_PATH exports were skipped, and by the time cargo bench ran, pyo3-ffi had no library search path. Move 'IB pre-flight' to sit just before the first cargo / make / maturin / prek invocation in each migrated job. ib-prep.sh now runs after setup-python and setup-uv, so it has the right python on PATH for its libpython + .venv work.
test-rust hit the IB runner's 12-min wall-clock cap on run pydantic#21 while mid-way through its 7-pass cargo llvm-cov sequence (step 14 of 22). The cap is shared-CPU-driven: when 4+ heavy compile jobs share the single self-hosted IB runner, test-rust's wall-clock blows past the cap. Stage test-rust to wait for bench-test (~50s), lint (~150s), and test-python-coverage (~115s) before it starts. Once those clear, the only concurrent compile load is the already-serialised test-python matrix (max-parallel:1). With less competition, test-rust's 7×llvm-cov fits under the cap (was 250s wall-clock on run pydantic#16 in similar conditions).
Run pydantic#22 had 10/11 jobs green, but test python 3.14 sat queued ~40 min on the IB runner. Trigger a fresh run that should:
- run on a warm IB cache (run pydantic#22's compiles persisted to /etc/incredibuild/cache/build_cache/shared/)
- pick up the runner cleanly via concurrency cancel-in-progress
- give us the complete 11/11 green baseline for the benchmark
basedpyright failed in lint with:
uv run basedpyright
/ib-workspace/build/venv/lib/python3.14/site-packages/basedpyright/
dist/pyright.js:154568
SyntaxError: Invalid or unexpected token
The IB runner image carries a stale /ib-workspace/build/venv that uv
falls through to when it can't find a project venv. The pyright.js
there is broken, and 'uv run' picks it up over the venv our 'uv sync'
creates.
Pin UV_PROJECT_ENVIRONMENT=${{ github.workspace }}/.venv in the lint
job env so 'uv run' resolves to the fresh local venv. ib-prep.sh
already fallback-creates it via 'uv venv .venv'.
The IB self-hosted runner's ~10 min wall-clock cap repeatedly killed lint mid-prek across runs pydantic#18-24. lint's heavy steps (basedpyright loading the 154k-line pyright.js, a workspace-wide clippy compile) are neither IB-cacheable in a meaningful way nor compile-bound enough to benefit from ib_cache. Run it back on ubuntu-latest (4m07s upstream), where parallelism + a bigger CPU keep it under any timeout.

test-rust's 'needs:' chain drops 'lint' (lint is now parallel on ubuntu). It still needs [bench-test, test-python-coverage], which both sit on the same IB runner and want to clear before test-rust's 7-pass llvm-cov compile starts.
make dev-py-release runs 'uv run maturin develop --release'. The repo's release profile is lto='fat' + codegen-units=1 (great for shipping wheels, slow to compile). On the IB self-hosted runner that compile plus the follow-up pytest blew past the ~12-min wall-clock cap on test python 3.10 / 3.12 / 3.14 across runs pydantic#16, pydantic#20, pydantic#24, pydantic#26, pydantic#27.

Override CARGO_PROFILE_RELEASE_LTO=false and CODEGEN_UNITS=16 inside test-python only. Same release semantics (optimized + debuginfo-stripped behavior intact), just trading a bit of binary perf for a much faster link. The real LTO-built wheels are still exercised end-to-end by test-builds-os/test-builds-arch, which use maturin-action's Docker image (not migrated to IB).
…in this PR)

After 5 IB runs hit the ~12-min wall-clock cap on test-python's make dev-py-release step (runs pydantic#16, pydantic#20, pydantic#24, pydantic#26, pydantic#27), and the CARGO_PROFILE_RELEASE_LTO=false override (run pydantic#28) didn't dispatch within a reasonable time, take the same pragmatic path we took for lint: keep the matrix on ubuntu-latest.

Final shape of IB-routed jobs:
- test-rust (heavy: 7x cargo llvm-cov on the workspace)
- bench-test (monty-bench compile)
- miri (cargo +nightly miri test)
- fuzz (cargo install cargo-fuzz + fuzz run)
- test-python-coverage (single maturin compile + pytest + llvm-cov)

These 5 jobs reliably succeed on IB and demonstrate the cache effect (run pydantic#10 cold → runs pydantic#16/22/26 warm shows a 1.5-2.5x speedup on the same workload). lint and the 5-version test-python matrix stay on ubuntu-latest, where parallelism + a bigger CPU keep them within timeouts; this is the same tradeoff every distributed-build setup makes when a single shared runner can't host every parallel workload.
Run #25692017142 cell-D logs showed:

    ib_console: Double hyphen within comment: <!--
    ib_console: Failed to parse '/.../scripts/ib-profile.xml'
    Can't validate document from '/.../scripts/ib-profile.xml' using schema '/opt/incredibuild/data/ib_profile.xsd'

XML 1.0 disallows '--' inside <!-- ... --> comments, and ib_console's libxml-based parser enforces it strictly. The comment block in this file referenced '--version' literally, which tripped the parser; ib_console then exited 255, making cells C and D in the bench complete in 20ms with rustc never cached. Cell B (IB_NO_CACHE=1) was unaffected because it doesn't pass --profile.

Replace literal flag prefixes inside the comment with neutral phrasing; the XML data on the rustc <process> element keeps its actual '--version:-vV:...' attribute (which is allowed because attribute values, unlike comments, may contain double hyphens).

Co-authored-by: Cursor <cursoragent@cursor.com>
Captures the state of PR #1 at the finish line:

* Cell A (ubuntu-latest, plain cargo) and cell B (IB runner, no rustc cache) measured cleanly across 3 iterations each. Steady-state wall is 38.5s vs ~24s: the IB runner hardware alone is ~1.6x faster than ubuntu-latest on monty's compile workload.
* Cells C (cold rustc cache) and D (warm rustc cache) are blocked on the Incredibuild-RND/monty self-hosted runner pool sitting at 42 total / 0 online during the most recent experiment window (50+ minutes continuous). This is an infra issue on the IB pool, not a monty change.
* Documents the profile-XML double-hyphen-in-comment bug found and fixed mid-experiment (commit 4c68706): ib_console rejects the profile, exits 255, and takes the wrapped cargo invocation down with it, which masquerades as 'cache produces no work'. Worth flagging upstream in ib_linux as a usability bug.
* Spells out exactly what Sam (project owner) needs to do to close the loop: a stable runner pool + one workflow_dispatch button. The bench infra (workflow, scripts, profile, summarizer) is already green and will populate cells C and D as soon as runners are reachable.

Co-authored-by: Cursor <cursoragent@cursor.com>
…est)

Documents and pins the existing design: ib_console wraps cargo invocations only. pytest, uv, top-level maturin, prek/ruff/mypy are deliberately NOT wrapped. The cargo subprocess that maturin shells out to IS routed through cargo-ib.sh via cargo's CARGO=<path> env-var contract (already wired in test-python-coverage), so the rustc cache still pays off for the heavy compile.

Why nothing else is worth wrapping (reasoning grounded in ib_linux:cpp/BuildCache/BuildCache_Rules.cpp and BuildCache_BuildCache.cpp):

* ib_console's cache key is process name + argv + an env subset + content hashes of files referenced literally on argv (or in the rustc rsp file). There is no tracking of dlopen / Python imports / runtime fs reads. That's the right shape for compilers and the wrong shape for an interpreter.
* pytest / uv run / python: dynamic import graph, runtime side effects. The cache key would either trivially miss or be wrong.
* maturin's top-level driver: a Python orchestrator that calls cargo and copies a .so. The orchestration is fast and side-effecty; the cargo subprocess is the part worth caching, and that's already routed via CARGO=/scripts/cargo-ib.sh at the job level.
* ruff/mypy/basedpyright/prek: linters with their own incremental caches; ib_console daemon-startup cost would dwarf the work, and the lint job already runs on ubuntu-latest anyway.

Changes:
1. scripts/cargo-ib.sh: added a SCOPE section to the header spelling out the rule so future contributors don't 'helpfully' pipe pytest through the wrapper.
2. .github/workflows/ci.yml::test-python-coverage: expanded the one-line CARGO env comment into the full why-not-pytest rationale at the call site.
3. IB_BENCH_RESULTS.md: added a 'Python and ib_console - when does it make sense?' section walking through every Python touch-point in the workflow with a keep/skip verdict and a one-line reason each, plus a TL;DR bullet at the top for Sam.

Also notes two concrete things ib_linux could add (cached build_script_*, a test-binary fingerprint cache) that would extend the value to Rust+Python repos generally.

Co-authored-by: Cursor <cursoragent@cursor.com>
…le-fixer) Co-authored-by: Cursor <cursoragent@cursor.com>
Cells A/B/C/D all green on ib-bench run #25696652366. The summarizer now splits the all-iteration aggregate (which mixes cold-cache iter 1 with warm iters 2/3) from steady-state (iter >= 2 only) so the value claim is unambiguous. Also formats per ruff format and replaces the ambiguous 'l' loop variable so the lint hook on ci.yml's lint job stops complaining (format-lint-py).

Final numbers (cargo test --no-run -p monty, target/ wiped between iterations, 3 iters per cell), steady-state (iter >= 2) wall / speedup:

    A: ubuntu-latest, plain cargo                 38.3 +/- 0.5 s   1.00x
    B: IB runner, default IB profile (no rustc)   24.6 +/- 0.3 s   1.55x
    D: IB runner, custom profile, warm cache       4.6 +/- 0.0 s   8.36x

Cell C proves the cache populates: one cold compile grew the shared build cache by 612 MiB. Cell D iter 1 was 39.5 s (cold cache fill on a different ephemeral runner than C); iters 2 and 3 were 4.59 s and 4.56 s (cache replay).

Co-authored-by: Cursor <cursoragent@cursor.com>
The previous "all_shas = set().union(*shas.values())" triggered basedpyright's reportUnknownVariableType because a bare set() is set[Unknown]. A type annotation alone wasn't enough (basedpyright still inferred set[Unknown | str] | set[str] for the union expression). Replaced with an explicitly annotated empty set + loop union, which produces a clean set[str]. Co-authored-by: Cursor <cursoragent@cursor.com>
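The replacement pattern, as described (dictionary contents invented for the demo):

```shell
#!/bin/sh
set -eu

out=$(python3 - <<'PY'
# Explicitly annotated empty set + loop union: the checker sees a clean
# set[str], unlike the inferred type of set().union(*shas.values()).
shas: dict[str, set[str]] = {"a": {"x", "y"}, "b": {"y", "z"}}

all_shas: set[str] = set()
for s in shas.values():
    all_shas |= s

print(sorted(all_shas))
PY
)
echo "$out"
```

The loop form is also friendlier to readers than the splat-union, at the cost of two extra lines.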
fuzz tokens_input_panic finished at 12:01 wall on the IB runner across multiple PR runs (75463693214 at 11:00, 75465317455 at 12:01, etc.): exactly the well-known ~10-12-min wall-clock cap on this self-hosted runner. The job pays cargo-fuzz install + fuzz-target compile + a 60s fuzz run + ib_console daemon startup x 2; even with IB_MAX_LOCAL_CORES and IB_PREVENT_OVERLOAD throttling, the cap is unreachable for this shape of workload.

Reverting fuzz to ubuntu-latest doesn't reduce IB coverage, because the rustc-cache value claim is established by .github/workflows/ib-bench.yml on the same shape of compile (cells A/B/C/D, 8.36x warm-cache speedup documented in IB_BENCH_RESULTS.md). The same revert rationale was already applied to 'lint' and the 'test-python' matrix earlier in this PR.

The IB jobs that now remain on incredibuild-runner are the ones that fit the cap and benefit from the rustc cache:
- test-rust (7x cargo llvm-cov, IB_MAX_LOCAL_CORES=4)
- test-python-coverage (maturin develop + pytest, with maturin's cargo routed via CARGO=)
- bench-test (cargo bench compile)
- miri (nightly cargo miri test, slow but bounded)

Co-authored-by: Cursor <cursoragent@cursor.com>
Verified the bench claims against the green CI run (25703024761) and found one important honesty correction:

- 8.36x is the bench ceiling (identical workload, target wiped, warm rustc cache replay). Verified: cargo really compiled, 22 test binaries with byte-identical hashes to iter 1, exit 0.
- The real test-rust speedup is ~1.5-2x, not 8x. The 7 cargo llvm-cov invocations spray distinct rustc cache keys via mixed feature flags, so the cache only fully replays on steps 2/4/6. Steps 1/3/5/7 hit fresh keys and run at near-baseline. Net job wall ~304s vs an estimated ~350-450s on ubuntu-latest.

Also documented per-runner cache locality (614/987/8 MiB observed across three jobs in the same CI run) and the warm-replay target/ size delta (the cache restores rustc outputs but not target/debug/incremental/, which is a non-issue for cargo test --no-run but worth flagging for the mental model).

Co-authored-by: Cursor <cursoragent@cursor.com>
Cells A/B/C/D measure the synthetic `cargo test --no-run -p monty`
workload, which is fast but doesn't capture the full test-rust cost
(7x cargo llvm-cov + clean). The realistic test-rust speedup so far
has been an estimate (~1.5–2x) inferred from real-CI logs.
Adds two new measurement cells running the actual ci.yml::test-rust
sequence verbatim, so the E → F steady-state ratio is the directly
measured number:
E ubuntu-latest, plain cargo, 2 iterations
F incredibuild-runner, cargo-ib.sh, IB warm cache, 2 iterations
(chained after D for predictable IB cache state)
Implementation:
* scripts/ib-bench-run.sh — adds WORKLOAD={synthetic,test-rust} and
CARGO_BIN env vars. Synthetic stays the default so cells A/B/C/D
are unchanged. The test-rust workload runs the 8-call llvm-cov
sequence per iteration; per-iter wall/user/sys are summed across
calls and rss is the per-call max. CSV schema unchanged
(one row per iteration).
* .github/workflows/ib-bench.yml — adds cell-E-ubuntu-test-rust
and cell-F-ib-test-rust jobs with 30-min timeouts; both feed
the summarize job's needs list and CSV-collection loop.
* scripts/ib-bench-summarize.py — extends CELLS with E/F, adds an
"E → F" steady-state row that applies fmt_ratio to the iter≥2 means,
and refreshes the top-level doc and section heading.
Pure additive: cells A/B/C/D, scripts/cargo-ib.sh, scripts/ib-profile.xml
and .github/workflows/ci.yml are untouched.
Co-authored-by: Cursor <cursoragent@cursor.com>
Three additive PoV improvements based on parallel subagent investigations:

- Cell E (ubuntu-latest, real test-rust workload, 8 cargo llvm-cov calls per iter, target wiped between iters) measured at 357 s steady-state from run 25705064240. Replaces the previously inferred ubuntu-latest baseline. Cell F is still pending the IB runner pool, which has been fully offline (0/30 online) for the measurement window.
- New ib-probe.yml workflow (dispatch-only, 5 min on incredibuild-runner) probes role markers, ib_server/ib_coordinator presence, Coordinator.* rows in the agent SQLite DB, --check-license, and a no-standalone smoke test. It answers "is IB distribution available on this runner image?" — currently believed to be no (initiator-only image), but --standalone in the wrapper silences the only diagnostic that would prove or disprove it.
- IB_BENCH_RESULTS.md gains a "Distribution mode" section and an "sccache structural comparison" section. Distribution explains what --standalone really does (per XgConsole_Session.cpp:308-404: tolerate a missing coordinator, NOT skip the ib_server connect timeout — the earlier doc was wrong on this) and what cell Q would measure if helpers were provisioned. The sccache section explains why the OSS baseline structurally caps below IB's 8.36x ceiling on monty (~25 proc-macro crates + the bin test binary + incremental workspace crates are all uncacheable by sccache); it cites public sccache speedup numbers from NeoSmart 2024 + sccache#2041.

Also fixes the --standalone comment in cargo-ib.sh to reflect what the source actually shows the flag does.

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
All six bench cells green on the same date / same runner pool. Replaces estimates with measurements:
- Cell A (synthetic, ubuntu-latest): 36.4s steady-state
- Cell B (synthetic, IB no-cache): 22.1s steady → 1.65x hardware floor
- Cell C (synthetic, IB cold cache): 40.6s, +612 MiB
- Cell D (synthetic, IB warm cache): 4.2s steady → 8.68x ceiling
- Cell E (real test-rust, ubuntu-latest): 325.7s steady
- Cell F (real test-rust, IB warm cache): 220.2s steady → 1.48x measured
The ib-probe.yml run (25706946478) confirmed: the runner image is initiator + helper, coordinator-less. The distribution path is structurally unavailable until a coordinator plus helper-pool registration are added at runner-image build time. Updated the distribution section to reflect the probe's actual output rather than the prior "to be probed" wording.
The final realistic test-rust speedup of 1.48x sits at the bottom of the prior 1.5-2x estimate band. Documented why: the feature-flag matrix spray, IB_MAX_LOCAL_CORES throttling for wall-clock-cap mitigation, and uncached test execution combined leave less room than the unthrottled cell B can show on a single cargo call.
Co-authored-by: Cursor <cursoragent@cursor.com>
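As a quick sanity check, the headline ratios can be recomputed from the quoted steady-state timings; the 8.67 below vs the 8.68 in the log is just 2-decimal rounding of the raw per-iteration timings the summarize script uses.

```shell
# Recompute the bench ratios from the rounded cell timings above.
# awk does the floating-point division and rounds to 2 decimals.
ratio() { awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f", a / b }'; }

echo "hardware floor (A/B):     $(ratio 36.4 22.1)x"   # 1.65x
echo "warm-cache ceiling (A/D): $(ratio 36.4 4.2)x"    # 8.67x
echo "real test-rust (E/F):     $(ratio 325.7 220.2)x" # 1.48x
```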
…I + Layer B manylinux probe + Sam doc
Summary of this commit (the monty side of the seven-layer plan in .cursor/plans/monty-ib-cross-repo-strategy-*.plan.md):
Layer F — three monty wirings (unilateral, no upstream dependency)
- .github/workflows/codspeed.yml: runs-on: incredibuild-runner + CARGO=$(pwd)/scripts/cargo-ib.sh + IB pre-flight/stats steps. Codspeed builds the bench crate every PR; high cache locality.
- .github/workflows/ci.yml::build-js: matrix entries for x86_64-unknown-linux-gnu and wasm32-wasip1-threads switched to incredibuild-runner with conditional IB env (CARGO, IB_MAX_LOCAL_CORES, IB_PREVENT_OVERLOAD) and IB pre-flight/stats guarded by `if: matrix.settings.host == 'incredibuild-runner'`. macOS / Windows / aarch64 / arm64 entries stay on their existing runners (IB has no pool there yet; that is Layer G).
Validation cells (extending the existing A–F bench matrix)
- ib-bench.yml::cell-G-ib-shim-simulation: Layer-A simulation. Same test-rust workload as cell F, but cargo is dispatched via a PATH-prepended shim that hand-mimics what vnext-processing-engine/src/build_accelerator/default_rules.yaml's generated cargo entry would auto-emit if cargo were upgraded from ENV mode to SHIM mode (the contents of branch feat/cargo-rustc-shim's ib-accel/bin/cargo). G tracking F within noise is the green light to retire scripts/cargo-ib.sh from monty the moment Layer A lands and the runner image rebuilds.
- ib-bench.yml::cell-I-ib-codspeed: codspeed workload (cargo codspeed build -p monty-bench --bench main) on IB warm. Validates Layer F's codspeed.yml rewire. Disjoint rustc keyspace from test-rust, so the D/F caches don't help; I's iter1→iter2 ratio is the cleanest single-job signal for the every-PR codspeed workflow.
- scripts/ib-bench-run.sh: new `codspeed` workload variant alongside the existing `synthetic` and `test-rust` workloads.
- scripts/ib-bench-summarize.py: G/I rendered in the markdown table with their own steady-state comparison sub-tables (F→G ratio, I cold/warm).
Layer B — manylinux container probe
- .github/workflows/ib-probe.yml: new `manylinux-probe` job runs `runs-on: incredibuild-runner` + `container: image: quay.io/pypa/manylinux_2_28_x86_64`. Probes whether vnext-processing-engine's container-hooks/index.js already injects /ib-workspace volumes and ib_console into a manylinux container (the hypothesis: 8 of monty's compile-bound jobs, the whole wheel-build matrix, are already IB-reachable but never verified). Probe checks: volume injection, ib_console resolution, glibc compat, --standalone smoke test.
Documentation
- IB_BENCH_RESULTS.md: appended a "Cross-repo strategy update" section explaining the two upstream gaps (cargo is ENV-mode-only in default_rules.yaml; container-hooks/index.js ships but was never verified for manylinux). Includes a coverage-trajectory table showing how each layer moves monty IB coverage from 12.5% today to 84% with all layers shipped.
- IB_NEXT_STEPS_SAM.md: new action-item companion to the bench results doc. Maps each layer (A through G) to owner / effort / effect on monty / effect on every other IB customer; spells out the cleanup deletes that follow each layer's merge; lists the four concrete asks for Sam (approve, get the vnext PR reviewed, schedule the IB-ops sync for C+E, triage Layer B's probe outcome).
Cross-repo PR
The companion to this commit is feat/cargo-rustc-shim on Incredibuild-RND/vnext-processing-engine (Layer A: promote cargo from ENV to SHIM mode in default_rules.yaml; 83 unit tests + 6 integration tests). Branch pushed; PR-ready.
Co-authored-by: Cursor <cursoragent@cursor.com>
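The SHIM-mode mechanism that cell G simulates can be shown in miniature: a cargo stand-in placed at the front of PATH wins command resolution, so workflow steps keep invoking plain `cargo` while the shim decides how to dispatch it. The paths and echoed marker below are hypothetical; a real shim would wrap the true cargo with ib_console instead of echoing.

```shell
# Demonstrate PATH-front interception: once its directory leads PATH,
# the shim (not any real cargo) is what `cargo ...` resolves to.
shim_dir=$(mktemp -d)
cat > "${shim_dir}/cargo" <<'EOF'
#!/bin/sh
# A real shim would exec ib_console around the true cargo here.
echo "shim intercepted: cargo $*"
EOF
chmod +x "${shim_dir}/cargo"

PATH="${shim_dir}:${PATH}"
command -v cargo   # resolves inside $shim_dir
cargo build        # handled by the shim, no real build runs
```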
…er); pin manylinux digest
CI run 25722680967 reproducibly failed in `cargo codspeed run` with:
setarch: failed to set personality to x86_64: Operation not permitted
##[error]failed to execute valgrind
The CodSpeedHQ action shells out to valgrind, which uses setarch to set the ADDR_NO_RANDOMIZE personality. The IB self-hosted runner image runs under restricted Linux capabilities (no SYS_ADMIN, user-namespace remap), so the personality syscall is blocked; github-hosted runners allow it. This is a structural blocker, not specific to monty, that affects every valgrind-based tool in CI (callgrind, memcheck, codspeed, ...).
Two paths to recover the IB value here are documented in IB_NEXT_STEPS_SAM.md as a new IB-product roadmap item:
1. Hybrid: cargo codspeed build on IB, transfer artifacts, cargo codspeed run on ubuntu-latest. Doable but requires careful artifact pinning.
2. Have IB ops relax the runner image's seccomp/capability profile to allow the setarch personality call (or grant CAP_SYS_ADMIN). Common for build runners.
Until either lands, codspeed.yml stays on ubuntu-latest. The monty-side measurement of the IB-build value lives in ib-bench.yml::cell-I-ib-codspeed (only `cargo codspeed build`, no valgrind run, so it works on IB).
Also pinned the manylinux container image in ib-probe.yml by manifest digest (sha256:443eabd378e1...), addressing zizmor's unpinned-images audit. The probe job uses the digest-pinned image to validate Layer B (container hooks injecting /ib-workspace into `container: image:` jobs).
Co-authored-by: Cursor <cursoragent@cursor.com>
ib-probe.yml::manylinux-probe (run 25726192172) confirmed end-to-end:
- vnext-processing-engine container-hooks/index.js fires on a
GHA-level container: block, bind-mounting /ib-workspace/cache and
/ib-workspace/incredibuild + putting /ib-workspace/incredibuild/
ib-accel/bin at the front of PATH inside the container.
- /usr/bin/ib_console v3.25.2 runs natively under the manylinux
image's glibc 2.28 (no GLIBC_2.x mismatch).
- --standalone --no-monitor -- /bin/true connects to ib_server,
proving the cache and the in-namespace distribution path are both
live inside the container.
Cell H closes the loop on Layer B by measuring cargo-test-no-run on
the same manylinux image under ib_console, comparable to cell D
(synthetic, IB warm, on the bare host). H_warm / D_warm tracking
1.0 ± 10% means containerization adds no overhead and that the
wheel-build matrix (the build job's 7 Linux entries + build-pgo linux)
can be migrated onto incredibuild-runner with a two-line GHA edit per job.
Doc updates:
- IB_BENCH_RESULTS.md: Layer-A row points at vnext PR vnext-processing-engine#210; Layer-B
marked GREEN with run link; coverage trajectory updated for the
Phase-8 path (4 -> 6 -> 14 -> 17 -> 27 of 32).
- IB_NEXT_STEPS_SAM.md: Layer-B section rewritten as the validated
result; ask #4 to Sam flipped to "done"; explicit 30-min agenda
added for the Layer-C + Layer-E IB-ops sync.
Co-authored-by: Cursor <cursoragent@cursor.com>
…nup-fix
Three small follow-ups after the Layer-B GREEN result and the Cell-H first run:
1. ib-probe.yml::probe — add a "Layer-A cargo SHIM deploy check" group that looks for /ib-workspace/incredibuild/ib-accel/bin/cargo (or /opt/ib-accel/bin/cargo on older variants). The next probe run after vnext-processing-engine#210 lands and the runner image rebuilds will report `FOUND` and unblock Phase 5 of the closure plan automatically; no one has to remember to re-check.
2. IB_CLEANUP_SPEC.md — new mechanical cleanup spec for closure-plan Phases 5 (cargo-ib.sh removal), 6 (ib-profile.xml removal), 7 (lint/fuzz/test-python re-route), and 8 (manylinux build matrix migration). Each phase lists exact files + line ranges + sed patterns + verification + a commit-message template, so when its gate clears the right person can open the cleanup PR in 10 minutes without re-deriving the change set.
3. scripts/ib-bench-run.sh — fix the cleanup step to honor $CARGO_TARGET_DIR. Cell H sets CARGO_TARGET_DIR=target-h to isolate from the host-side cells, but the cleanup hardcoded `rm -rf target`, so cell H iter 2 reused iter 1's artifacts (measured 0.35s instead of a real warm-cache rebuild). target_size() is also updated to honor the env var. Cells A-G/I always use the default target/, so behavior is unchanged for them.
The Cell-H first run (ib-bench run 25727104334) still proved the qualitative finding: the container hook fires, ib_console runs under glibc 2.28, and cargo wrapping works end-to-end (iter 1 = 46.5s cold). The numerical H_warm/D_warm comparison just needs a re-run with this fix.
Co-authored-by: Cursor <cursoragent@cursor.com>
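The item-3 fix reduces to honoring the env var instead of a hardcoded path. A minimal sketch of the behavior, with illustrative directory names and a throwaway workspace standing in for the bench checkout:

```shell
# Cleanup that honors $CARGO_TARGET_DIR; the bug was a hardcoded
# `rm -rf target`, which left cell H's target-h directory untouched.
clean_target() {
  rm -rf "${CARGO_TARGET_DIR:-target}"
}

work=$(mktemp -d)
cd "$work"
mkdir -p target target-h

CARGO_TARGET_DIR=target-h clean_target   # removes target-h only
ls "$work"                               # the default target/ survives
```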
ib-bench run 25727572729 with the CARGO_TARGET_DIR fix produced clean Cell H numbers:
- A iter 2 (ubuntu-latest, no IB): 37.4 s
- D iter 2 (IB host, warm cache): 5.27 s -> 7.10x vs A
- H iter 2 (IB manylinux container, warm): 21.3 s -> 1.76x vs A
H beats the closure plan's 1.3x gate for Phase 8. The 4x gap between H (container) and D (bare host) on the same workload is a follow-up: the container's separate rustup install gives it disjoint cargo cache keys from the host. Aligning the toolchains would close the gap, but 1.76x vs ubuntu-latest is already enough to migrate the wheel-build matrix.
Co-authored-by: Cursor <cursoragent@cursor.com>
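Cross-checking the run's arithmetic from the quoted iter-2 timings (2-decimal rounding):

```shell
# Verify the quoted ratios for run 25727572729.
ratio() { awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f", a / b }'; }

echo "D vs A:       $(ratio 37.4 5.27)x"   # 7.10x
echo "H vs A:       $(ratio 37.4 21.3)x"   # 1.76x
echo "H vs D (gap): $(ratio 21.3 5.27)x"   # ~4x container/host gap
```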
Apply ruff formatting to the Cell-H summary strings so the lint job no longer rewrites scripts/ib-bench-summarize.py in CI. Co-authored-by: Cursor <cursoragent@cursor.com>
Tal deployed the runner image built from vnext-processing-engine#210, and ib-probe run 25732897099 confirmed the generated cargo shim is live at /ib-workspace/incredibuild/ib-accel/bin/cargo. Remove monty's repo-local cargo wrapper and route CI/bench commands through plain cargo so the runner-image shim owns ib_console wrapping via PATH. Keep the repo profile alive until Layer C by teaching ib-prep.sh to export IB_CONSOLE_ARGS for the vnext shim, including the per-job cache logfile and --profile=scripts/ib-profile.xml unless IB_NO_CACHE is set. Co-authored-by: Cursor <cursoragent@cursor.com>
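The ib-prep.sh bridge described above could look roughly like this. Only IB_CONSOLE_ARGS, IB_NO_CACHE, --profile=scripts/ib-profile.xml, and GITHUB_ENV come from the commit message; the function name and everything else are assumptions, and the per-job cache logfile part is omitted here.

```shell
# Hypothetical sketch: compute the IB_CONSOLE_ARGS value the vnext cargo
# shim will pick up, including the repo cache profile unless IB_NO_CACHE
# is set (mirroring the behavior described in the commit).
build_ib_console_args() {
  if [ -z "${IB_NO_CACHE:-}" ]; then
    printf '%s' "--profile=scripts/ib-profile.xml"
  fi
}

# In CI this appends to the real $GITHUB_ENV; a temp file stands in here.
GITHUB_ENV="${GITHUB_ENV:-$(mktemp)}"
echo "IB_CONSOLE_ARGS=$(build_ib_console_args)" >> "$GITHUB_ENV"
cat "$GITHUB_ENV"
```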
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Keep the monty wiring aligned with the shipped cargo shim while preserving the small bridge for cargo extension workloads, and make the hosted-profile and CodSpeed decisions explicit locally. Co-authored-by: Cursor <cursoragent@cursor.com>
Record the vnext follow-up that will remove monty's remaining cargo bridge once the runner image is rebuilt. Co-authored-by: Cursor <cursoragent@cursor.com>
Use the deployed vnext cargo shim for Monty's cargo extension and toolchain forms so the evidence branch proves the out-of-the-box runner path. Co-authored-by: Cursor <cursoragent@cursor.com>
Keep the real test-rust benchmark cell aligned with ci.yml so the evidence workflow measures the deployed shim without tripping the runner wall-clock cap. Co-authored-by: Cursor <cursoragent@cursor.com>
Keep CodSpeed from blocking the Incredibuild fork where project authorization is unavailable, and remove direct project-owner naming from IB handoff docs. Co-authored-by: Cursor <cursoragent@cursor.com>
Codecov results: patch coverage is 100.00%; the project has 23456 uncovered lines. (Generated by the Codecov Action.)
Summary
- Moves CI jobs onto `incredibuild-runner` using the deployed runner-image cargo shim and Monty IB profile.
- Adds the `ib-bench` and `ib-probe` workflows plus docs for the measured value and remaining runner/product follow-ups.
- Keeps CodSpeed on `ubuntu-latest` and skips it on the Incredibuild fork, where project authorization is unavailable.
Validation
Bench headline
- Synthetic hardware floor: 1.65x (36.4s -> 22.1s)
- Synthetic warm-cache ceiling: 8.68x (36.4s -> 4.2s)
- Real `test-rust` speedup: 1.48x (325.7s -> 220.2s)