
Finalize Incredibuild runner integration #3

Open
zozo123 wants to merge 66 commits into main from ci/incredibuild-final-working

Conversation

Collaborator

@zozo123 zozo123 commented May 14, 2026

Summary

  • Routes the validated heavy Rust CI paths through incredibuild-runner using the deployed runner-image cargo shim and Monty IB profile.
  • Adds reproducible ib-bench and ib-probe workflows plus docs for the measured value and remaining runner/product follow-ups.
  • Keeps CodSpeed on ubuntu-latest and skips it on the Incredibuild fork, where project authorization is unavailable.

Validation

Bench headline

  • Hardware floor: 1.65x (36.4s -> 22.1s)
  • Warm identical-workload cache ceiling: 8.68x (36.4s -> 4.2s)
  • Real Monty test-rust speedup: 1.48x (325.7s -> 220.2s)

Made with Cursor

zozo123 added 30 commits May 11, 2026 11:47
Mirror the pattern used in Incredibuild-RND/uv (branch
ci/incredibuild-runners): move pure-cargo Linux jobs onto the
self-hosted `incredibuild-runner` label and wrap their cargo
invocations with a small wrapper that goes through `ib_console` when
present (falls back to plain cargo elsewhere, so the same workflow
step still works on GitHub-hosted runners).

Jobs migrated:
- test-rust         (8x cargo llvm-cov compile/test invocations)
- bench-test        (cargo bench)
- miri              (cargo +nightly miri test)
- fuzz              (cargo install cargo-fuzz + cargo fuzz run)

Jobs intentionally NOT migrated yet:
- test-python / test-python-coverage  -- compile through maturin,
  needs a follow-up to route maturin's internal cargo invocation
  through ib_console
- test-rust-os                        -- macOS / Windows only
- lint, build*, test-builds-*, release-*  -- light or Docker-based

New files:
- scripts/cargo-ib.sh        -- ib_console-aware cargo wrapper,
                                graceful fallback to plain cargo
- scripts/ensure-ci-tools.sh -- bootstrap sudo/curl/wget on lean
                                self-hosted runners
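
The fallback behaviour described above can be sketched as a minimal shell wrapper. This is a simplified sketch, not the actual scripts/cargo-ib.sh (the real script also passes profile and cache flags added in later commits):

```shell
#!/usr/bin/env bash
# Minimal sketch of an ib_console-aware cargo wrapper: route cargo
# through ib_console when it is available, otherwise fall back to
# plain cargo so the same workflow step also works on GitHub-hosted
# runners.
cargo_ib() {
  if command -v ib_console >/dev/null 2>&1; then
    ib_console cargo "$@"
  else
    cargo "$@"
  fi
}
```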

Each migrated job pins its own CARGO_HOME / CARGO_TARGET_DIR under
${{ github.workspace }} so concurrent IB jobs don't corrupt each
other through the shared /ib-workspace/cache/cargo* volumes.
ib_console's separate build cache still accelerates compile.
The self-hosted incredibuild-runner image installs Python via
actions/setup-python, which on this runner ships libpython3.X.so.1.0
but not the linker-discoverable libpython3.X.so symlink. pyo3-using
crates emit a '-lpython3.X' directive, so test-rust (links
monty-datatest via pyo3) and bench-test (links monty-bench via pyo3)
both fail at the link step:

  rust-lld: error: unable to find library -lpython3.14

Add a small symlink-recovery step right after setup-python in both
jobs. No-op when the .so symlink is already present, so safe on
GitHub-hosted runners too.
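
A sketch of what such a recovery step can look like (the actual step's shell is not shown in this PR text; the helper name and directory handling are assumptions):

```shell
# Sketch: create the linker-discoverable libpython3.X.so symlink next
# to libpython3.X.so.1.0 when only the versioned name is present.
# Safe to re-run: a no-op when the symlink already exists.
fix_libpython_symlink() {
  local libdir="$1" so link
  for so in "$libdir"/libpython3.*.so.1.0; do
    [ -e "$so" ] || continue           # glob matched nothing
    link="${so%.1.0}"                  # libpython3.X.so.1.0 -> libpython3.X.so
    [ -e "$link" ] || ln -s "$(basename "$so")" "$link"
  done
}

# In CI the lib dir would come from the interpreter itself, e.g.:
#   fix_libpython_symlink "$(python3 -c 'import sysconfig; print(sysconfig.get_config_var("LIBDIR"))')"
```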
The first fix (creating the missing libpython3.X.so symlink under
$sys.prefix/lib) was necessary but not sufficient. pyo3-ffi's
build.rs reads sysconfig at compile time and emits a -L pointing at
the path baked into the python-build-standalone tarball
(/opt/hostedtoolcache/Python/...), which doesn't exist on this
self-hosted IB runner — the real install is under
/actions-runner/_work/_tool/Python/.... When the rust-cache restore
brings back the cached pyo3-ffi build script output, the stale
-L survives across runs.

Make the link work regardless of stale paths by exporting
LIBRARY_PATH and LD_LIBRARY_PATH pointing at the real lib dir via
$GITHUB_ENV. cc / lld fall back to LIBRARY_PATH when the explicit
-L paths don't resolve, and LD_LIBRARY_PATH covers runtime when
cargo llvm-cov subsequently runs the produced binaries.

Also adds a SYSCONFIG_LIBDIR diagnostic to confirm the theory in
future logs.
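
A sketch of the $GITHUB_ENV export described above (the helper name is hypothetical):

```shell
# Sketch: export LIBRARY_PATH / LD_LIBRARY_PATH pointing at the real
# Python lib dir via $GITHUB_ENV, so the linker and runtime recover
# even when cached pyo3-ffi build output carries a stale -L path.
export_python_libdir() {
  local libdir
  libdir="$(python3 -c 'import sysconfig; print(sysconfig.get_config_var("LIBDIR"))')"
  {
    echo "LIBRARY_PATH=${libdir}${LIBRARY_PATH:+:$LIBRARY_PATH}"
    echo "LD_LIBRARY_PATH=${libdir}${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
  } >> "$GITHUB_ENV"
}
```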
test-rust runs monty-datatest, which spawns CPython subprocesses and
compares their output against monty. On the IB runner the default
locale is C/POSIX, so CPython picks the ASCII codec for default
text I/O and tests that open files with non-ASCII content
(mount_fs__errors.py, mount_fs__ops.py — emoji + 0x80 bytes) fail
with UnicodeDecodeError. ubuntu-latest has C.UTF-8 by default.

Pin LANG / LC_ALL to C.UTF-8 and set PYTHONUTF8=1 belt-and-braces.
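
As a config sketch, the locale pin could sit at the job level like this (exact placement in ci.yml is an assumption):

```yaml
env:
  LANG: C.UTF-8
  LC_ALL: C.UTF-8
  PYTHONUTF8: "1"   # belt-and-braces: force UTF-8 mode regardless of locale
```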
These are monty's heaviest workloads — test-python is a 5-version
matrix that each compiles pyo3+monty+monty-python via maturin twice
(dev + release), and test-python-coverage adds full llvm-cov
instrumentation on top. Moving them onto incredibuild-runner is
where the biggest acceleration headroom lives.

maturin spawns cargo as a subprocess. Cargo respects the $CARGO env
var when an external tool launches it, so setting
CARGO=$GITHUB_WORKSPACE/scripts/cargo-ib.sh at the job level makes
maturin's internal cargo invocation go through ib_console exactly
like the direct cargo calls in test-rust.
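
The $CARGO contract can be wired as a one-line job-level env. A sketch (job name and runner label shown for illustration only):

```yaml
test-python:
  runs-on: [self-hosted, incredibuild-runner]
  env:
    # maturin's internal `cargo` subprocess honors $CARGO, so this
    # routes it through the ib_console wrapper too.
    CARGO: ${{ github.workspace }}/scripts/cargo-ib.sh
```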

Each test-python matrix entry pre-installs its target Python through
uv (so we can locate the install before maturin runs), then creates
the libpython3.X.so symlink and exports LIBRARY_PATH/LD_LIBRARY_PATH
— same recipe as test-rust/bench-test, applied per matrix Python.

test-python-coverage uses the same fix plus wraps its direct cargo
llvm-cov invocations the same way as test-rust.
…sole

cargo-ib.sh execs ib_console which writes 'Incredibuild System:
Trying to connect to ib_server...' / 'ib_server connected, start
process execution...' to stdout before passing through to cargo. For
compile commands that's harmless logging. For 'cargo llvm-cov
show-env --export-prefix' — whose entire stdout is meant to be
eval'd as shell — those leading lines get evaluated:

  + eval 'Incredibuild System: Trying to connect to ib_server...
  /actions-runner/_work/_temp/...: Incredibuild: command not found

Use plain cargo for the env-discovery call. Compile commands (clean,
report) still go through the wrapper, and maturin's internal cargo
invocation still gets accelerated via the job-level CARGO env.
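
A contrived sketch of the failure mode and the fix, with fake functions standing in for the wrapped and plain cargo calls:

```shell
# Stand-ins: the wrapped command prefixes stdout with ib_console
# banner lines; the plain command emits pure shell.
wrapped_show_env() {
  echo 'Incredibuild System: Trying to connect to ib_server...'
  echo 'export RUSTFLAGS="-C instrument-coverage"'
}
plain_show_env() {
  echo 'export RUSTFLAGS="-C instrument-coverage"'
}

# eval "$(wrapped_show_env)" would try to execute 'Incredibuild' as a
# command and fail; only the plain variant is safe to eval:
eval "$(plain_show_env)"
```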
Reading the ib_linux source (Incredibuild-RND/ib_linux), two findings
drive this change:

1. The default profile at /opt/incredibuild/data/ib_profile.xml lists
   rustc as type='allow_remote' but does NOT enable ib_cache for it.
   Only cc1/cc1plus/gcc/clang have cached='true'. So by default
   ib_console DISTRIBUTES rustc invocations but does NOT persist their
   outputs to the build-avoidance cache. Every CI run recompiles every
   crate. For a Rust-heavy workspace like monty, that's the dominant
   cost.

   The android9+ custom profile bundled in ib_linux shows the right
   syntax (<ib_cache enabled='true' /> child element, not the
   cached='true' attribute which routes to ccache). We add a minimal
   custom profile that overrides only rustc and pass it via
   ib_console --profile=.

2. Per ib_linux:cpp/BuildCache/BuildCache_HitMiss.cpp, ib_console
   writes hit/miss info to a logfile when started with
   --build-cache-local-logfile=. Combined with
   --build-cache-report-all-miss, each run produces a per-job log we
   can dump and grep to see what is hitting / missing the cache.

Changes:
- scripts/ib-profile.xml: enable ib_cache for rustc, keep the default
  exclude_args (skip build_script_build/build_script_main / version
  probes).
- scripts/cargo-ib.sh: pass --profile=, --build-cache-local-logfile,
  --build-cache-report-all-miss to every wrapped cargo invocation.
- .github/workflows/ci.yml: add 'IB pre-flight diagnostics' and
  'IB cache stats' steps (if: always()) to every migrated job. These
  print ib_console version, cache directory location, and post-build
  hit/miss summary so the value of IB acceleration is visible in the
  GitHub Actions run log.
- concurrency.cancel-in-progress=true on the workflow: stops the
  pile-up of in-flight runs all competing for the single self-hosted
  IB runner when a chain of commits lands quickly.
- max-parallel: 3 on the test-python matrix: 5 simultaneous matrix
  entries on one IB runner caused contention that pushed each job's
  wall time well above the ubuntu-latest baseline. Three at a time
  keeps each job closer to dedicated-runner timings while still
  parallelising the matrix.
- timeout-minutes: 30 on every IB-routed job: gives us a known cap
  to compare against the mysterious ~12-minute kill we saw on
  test python 3.14 in the previous two runs. If the runner kills
  before 30 min, the kill came from outside GitHub Actions and we'll
  see a different failure signature.
Two fixes / one extension:

1. scripts/ib-profile.xml: XML 1.0 forbids '--' inside <!-- --> comments
   per spec 2.5. The previous version had literal command-line flags
   (--build-cache-local-shared etc.) in the comment body, which made
   ib_console reject the profile with:
     ib_console: Comment must not contain '--' (double-hyphen)
   That broke every IB-routed job in the run before this one (exit 255
   in 14-30 seconds, before any compile). Rephrased the comment to
   avoid '--' sequences and re-validated against the schema implicitly
   (Python's xml.etree.ElementTree parses it cleanly).

2. Migrate the lint job to incredibuild-runner. lint runs prek which
   triggers a workspace-wide clippy compile pass and is the last big
   rust-compile workload not yet routed through IB. With CARGO env
   set at the job level, prek's internal cargo invocations go through
   cargo-ib.sh and benefit from the same ib_cache as test-rust.

Migrated jobs are now:
  lint, test-rust, test-python-coverage, test-python (5-version
  matrix), bench-test, miri, fuzz.

Remaining ubuntu-latest jobs are intentional: macOS/Windows
test-rust-os; Docker-bound build/build-pgo/build-js; lightweight
artifact/inspection/release jobs.
…rapper

The ib_console XML schema (data/ib_profile.xsd in ib_linux) requires:
  1. <ib_profile> element to carry version='1' attribute
  2. <process> elements wrapped in a <processes> sequence container

Without those, ib_console rejects the profile early with:
  ib_console: Element 'ib_profile': The attribute 'version' is required but missing.
  Can't validate document from '...' using schema '/opt/incredibuild/data/ib_profile.xsd'

That fails every IB-routed job with exit 255 before any compile step.
Matched the structure used by the bundled android9+ custom profile
(ib_linux:data/custom_profiles/android/9+/ib_profile.xml).
The ib_profile.xsd schema (ib_linux:data/ib_profile.xsd) defines:

  <xs:complexType name="ib_profile_type">
    <xs:sequence minOccurs="1" maxOccurs="1">
      <xs:element name="globals"   type="globals_type"   />
      <xs:element name="processes" type="processes_type" />
    </xs:sequence>
    <xs:attribute name="version" type="version_type" use="required" />
  </xs:complexType>

and globals_type requires ignore_following_profiles. Without it,
ib_console refuses the profile:

  ib_console: Element 'processes': This element is not expected.
              Expected is ( globals ).

Setting ignore_following_profiles='false' makes our profile additive
on top of /opt/incredibuild/data/ib_profile.xml — the system default
still loads and only the rustc entry is overridden to enable
ib_cache.
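
Putting the schema requirements from the commits above together, a minimal profile could look like the sketch below. Element and attribute names here follow the commit messages, not a verified copy of ib_profile.xsd:

```xml
<!-- Hypothetical minimal profile assembled from the schema notes
     above; treat names as assumptions, not a verified schema
     excerpt. -->
<ib_profile version='1'>
  <globals ignore_following_profiles='false' />
  <processes>
    <process name='rustc' type='allow_remote'>
      <ib_cache enabled='true' />
    </process>
  </processes>
</ib_profile>
```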
Two cosmetic fixes from yamlfmt that lint enforces:
- Remove the misindented 'alls-green#why' top-of-job comment that
  ended up between fuzz job's last step and the next job header.
  yamlfmt kept trying to push it inside the fuzz job's block,
  producing diffs each run.
- Drop the extra blank line inside the test-python matrix's
  libpython step body.

Functionally identical; just unblocks the lint job from cycling on
formatting nits.
Two corrections discovered by re-reading ib_linux:cpp/XgConsole/
XgConsole_main.cpp and BuildCache/BuildCache_defines.h:

1. --build-cache-force is NOT a real ib_console flag. There's no
   matching getopt_long entry and no GETOPT_ enum value, so prior
   runs were silently ignoring it. Removed from cargo-ib.sh. The
   semantically equivalent behavior (cache-fill on first run) is
   implicit in --build-cache-local-shared.

2. The IB build-avoidance cache lives at:
     /etc/incredibuild/cache/build_cache/shared/
   (BUILD_CACHE_LOCAL_PATH in BuildCache_defines.h), NOT under
   /ib-workspace/cache/. Build reports for sqlite-based stats live
   under /etc/incredibuild/db/. The diagnostic steps now inspect
   those real paths before and after each job and try to surface
   hit/miss stats via the bundled show_build_cache_statistics.sh
   when a buildId can be inferred.

This is purely a visibility + correctness change; cache behavior
itself is unchanged from the previous commit. Lets us see, in each
job log, whether the IB cache is being populated and growing as
expected, and whether the rustc-cached profile actually translates
to manifest.json + .tar artifacts under the shared cache dir.
Discovered in miri run #12's stdout:

  Incredibuild System: Build Cache report is
    '/etc/incredibuild/log/2026-May-11/local-14/ib_hm.log'

So ib_console writes hit/miss data to a per-build path under
/etc/incredibuild/log/YYYY-Mon-DD/local-<buildId>/, regardless of
where --build-cache-local-logfile points. (The runtime path our
script asks for is inside the chroot/namespace, hence invisible.)

Post-flight step now finds the 3 most-recent ib_hm.log files via
mtime, dumps the tail of each, and counts HIT/MISS lines so each
job's cache effectiveness is visible directly in the GHA log.
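
A sketch of the post-flight harvest (log root from the quoted report line; the helper name is hypothetical):

```shell
# Count cache HIT / MISS lines in one ib_hm.log.
count_hit_miss() {
  local log="$1"
  printf '%s: HIT=%d MISS=%d\n' "$log" \
    "$(grep -c 'HIT' "$log")" \
    "$(grep -c 'MISS' "$log")"
}

# Dump the 3 most recently written ib_hm.log files under the real
# per-build log root and summarize each.
find /etc/incredibuild/log -name ib_hm.log -printf '%T@ %p\n' 2>/dev/null \
  | sort -rn | head -3 | cut -d' ' -f2- \
  | while IFS= read -r log; do
      tail -n 20 "$log"
      count_hit_miss "$log"
    done
```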

Also visible from run #12: /etc/incredibuild/cache/build_cache/shared
already contains 465 MiB across 454 .tar artifacts and hash-prefixed
subdirs (00..ff). The cache is real, populated, and surviving across
runs. The missing piece was just the per-run hit/miss numbers; this
commit surfaces them.
prek runs make lint-rs which invokes cargo clippy directly (no
'uv run' wrapper). cargo honors .cargo/config.toml which sets
PYO3_PYTHON=.venv/bin/python3 (relative). On the IB self-hosted
runner that path doesn't resolve at clippy time:

  error: failed to run custom build command for pyo3-build-config
    error: failed to run the Python interpreter at
    /actions-runner/_work/monty/monty/.venv/bin/python3:
    No such file or directory (os error 2)

The other migrated jobs (test-rust, bench-test, miri) already do
'rm .cargo/config.toml' for the same reason — clippy then uses
setup-uv's python via pyo3-build-config auto-detection.
…t deps

When CARGO_HOME=$github.workspace/.cargo, cargo's git dependency
checkouts land at .cargo/git/checkouts/<crate-hash>/<rev>/...
inside the workspace. prek then runs ruff/format-lint-py across the
workspace, walks into .cargo/git/checkouts/ruff-*/, and chokes on
ruff's own intentional bad-input test fixtures:

  Failed to read .cargo/git/checkouts/ruff-.../crates/
    ruff_notebook/resources/test/fixtures/jupyter/invalid_extension.ipynb:
    Expected a Jupyter Notebook, [...] isn't valid JSON
  Failed to parse .cargo/git/checkouts/ruff-.../crates/
    ty_completion_eval/truth/.../main.py:1:1:
    Invalid annotated assignment target

Pin CARGO_HOME to $runner.temp/lint-cargo for the lint job so the
cargo registry/git checkouts live outside the prek scan root.

This is lint-only because it's the only IB-routed job that runs ruff
on the workspace tree. The other migrated jobs keep CARGO_HOME under
github.workspace to avoid cross-job collisions on a shared registry
when concurrent jobs share the IB runner filesystem.
…env)

runner.temp is only available at STEP-level env / in run scripts —
NOT at job-level env. The previous commit's
  CARGO_HOME: ${{ runner.temp }}/lint-cargo
caused the whole workflow to fail to start (run had 0 jobs, run
name reverted to '.github/workflows/ci.yml' literal path, signal
that GitHub Actions rejected the file during initial validation).

Use a static /tmp/lint-cargo — guaranteed writable on Ubuntu-based
self-hosted runners and reliably outside the workspace tree.
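
A config sketch of the lint job env after this fix (surrounding keys elided):

```yaml
lint:
  env:
    # runner.temp is not available at job-level env, so use a static
    # path that is writable and outside the prek scan root.
    CARGO_HOME: /tmp/lint-cargo
```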
Two issues observed on run #16:

1. lint failed at the runner's 12-minute hard cap. Real work (prek,
   IB cache stats) all SUCCEEDED in ~30s. The 11+ minutes were spent
   in 'Post Run Swatinem/rust-cache' (saving cache to GitHub Actions
   cache storage from inside ib_console's chroot/namespace). Whereas
   test-rust's Post-Swatinem completed fine because the cache key
   already matched the restored entry (nothing new to save). lint
   uses nightly Rust + prek-installed tools, so the post-restore
   diff is larger and the save phase stalls.

2. test python 3.12 and 3.14 hit the 12-minute cap on 'make
   dev-py-release'. Other matrix entries (3.10/3.11/3.13) finished
   in ~5 minutes. Suggests resource contention between 3 concurrent
   maturin-release compiles on the single IB runner.

Mitigations:

- save-if: ${{ false }} on every Swatinem/rust-cache step in IB
  jobs. The IB build cache is what's actually accelerating us
  (Swatinem restored only 1.7 KB on previous runs); making Swatinem
  restore-only eliminates the post-action stall.
- max-parallel: 3 -> 2 on the test-python matrix to give each
  concurrent maturin release compile more CPU headroom on the
  single runner.
…ility

Run #18 showed that long-compile IB jobs (miri, fuzz, lint) hit a
~10-12 minute wall-clock cap on the self-hosted IB runner when 6+
concurrent compile jobs share its CPU. The cap is runner-side
(not GitHub Actions timeout-minutes). Workaround: reduce concurrent
IB jobs.

Changes:
- test-python matrix: max-parallel 2 -> 1
  Serializes the 5 Python versions, removing the largest single
  source of concurrent compile pressure.
- miri: needs [bench-test]
  Stages miri after bench-test, so miri's cargo-fuzz / miri test
  compile doesn't share CPU with bench-test's monty-bench compile.
- fuzz: needs [miri]
  Stages fuzz after miri. Both are compile-heavy.

Net effect on a typical run:
- ~4 concurrent heavy IB jobs at peak (was ~8)
- per-job wall-clock should stay under the cap
- workflow wall-clock increases but reliability improves
Pulls every migrated job's IB setup/diagnostic boilerplate out of
ci.yml and into two helper scripts:

  scripts/ib-prep.sh   pre-flight: baseline tools (sudo/curl/wget)
                       + ib_console diagnostics + libpython.so symlink
                       + LIBRARY_PATH/LD_LIBRARY_PATH exports
                       + .venv ensure for lint's prek/clippy
  scripts/ib-stats.sh  post-flight: dump real cache path size + .tar
                       artifact count + ib_hm.log tails

Each migrated job's body is now minimal:

  - uses: actions/checkout@...
  - name: IB pre-flight
    run: ./scripts/ib-prep.sh
  - <real work>
  - name: IB cache stats
    if: always()
    run: ./scripts/ib-stats.sh

ci.yml drops 474 lines (-28 %). Future upstream syncs are now easy:
re-pull the workflow, drop one line per migrated job (the pre-flight
and stats steps), and the rest is upstream verbatim.

Also fixes the persistent lint failure: don't 'rm -f .cargo/config.toml'
(prek's check-yaml hook requires the file present on disk); instead
ib-prep.sh pre-creates .venv at workspace root via 'uv venv' so the
PYO3_PYTHON=.venv/bin/python3 path resolves under clippy.

scripts/ensure-ci-tools.sh removed; its baseline-tool logic now lives
inside ib-prep.sh.
Two fixes for issues surfaced by run #20:

1. zizmor (workflow security audit, exit 12) flagged the
   'save-if: ${{ false }}' as obfuscation per docs.zizmor.sh
   audits/#obfuscation — recommends the static evaluation. Switch
   to literal 'save-if: false' on all 7 Swatinem steps. Same
   behavior, zizmor-clean.

2. bench-test (and any other pyo3-linking job) failed with
   'rust-lld: error: unable to find library -lpython3.14' because
   ib-prep.sh ran right after checkout, BEFORE setup-python. With no
   python3 on PATH yet, the libpython.so symlink + LIBRARY_PATH
   exports were skipped, and by the time cargo bench ran, pyo3-ffi
   had no library search path.

   Move 'IB pre-flight' to sit just before the first cargo / make /
   maturin / prek invocation in each migrated job. ib-prep.sh now
   runs after setup-python and setup-uv, so it has the right python
   on PATH for its libpython + .venv work.
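
The save-if change in point 1 can be sketched as (action version illustrative):

```yaml
- uses: Swatinem/rust-cache@v2
  with:
    # restore-only: the IB build cache does the real acceleration,
    # and a literal `false` (not an expression) keeps zizmor happy.
    save-if: false
```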
test-rust hit the IB runner's 12-min wall-clock cap on run #21 while
mid-way through its 7-pass cargo llvm-cov sequence (step 14 of 22).
The cap is shared-CPU-driven: when 4+ heavy compile jobs share the
single self-hosted IB runner, test-rust's wall-clock blows past the
cap.

Stage test-rust to wait for bench-test (~50s), lint (~150s), and
test-python-coverage (~115s) before it starts. Once those clear, the
only concurrent compile load is the already-serialised test-python
matrix (max-parallel:1). With less competition, test-rust's
7×llvm-cov fits under the cap (was 250s wall-clock on run #16 in
similar conditions).
Run #22 had 10/11 jobs green but test python 3.14 sat queued ~40min
on the IB runner. Trigger a fresh run that should:
- run on warm IB cache (run #22's compiles persisted to
  /etc/incredibuild/cache/build_cache/shared/)
- pick up the runner cleanly via the concurrency cancel-in-progress
- give us the complete 11/11 green baseline for the benchmark
basedpyright failed in lint with:

  uv run basedpyright
  /ib-workspace/build/venv/lib/python3.14/site-packages/basedpyright/
    dist/pyright.js:154568
  SyntaxError: Invalid or unexpected token

The IB runner image carries a stale /ib-workspace/build/venv that uv
falls through to when it can't find a project venv. The pyright.js
there is broken, and 'uv run' picks it up over the venv our 'uv sync'
creates.

Pin UV_PROJECT_ENVIRONMENT=$github.workspace/.venv at the lint job
env so 'uv run' resolves to the fresh local venv. ib-prep.sh already
'uv venv .venv' fallback-creates it.
The IB self-hosted runner's ~10 min wall-clock cap repeatedly killed
lint mid-prek across runs #18-24. lint's heavy steps (basedpyright
loading 154k-line pyright.js, workspace-wide clippy compile) are
neither IB-cacheable in a meaningful way nor compile-bound enough to
benefit from ib_cache. Run it back on ubuntu-latest (was 4m07s
upstream) where parallelism + bigger CPU keep it under any timeout.

test-rust's 'needs:' chain drops 'lint' (lint is now parallel on
ubuntu). Still needs [bench-test, test-python-coverage] which both
sit on the same IB runner and want to clear before test-rust's
7-pass llvm-cov compile starts.
make dev-py-release runs uv run maturin develop --release. The repo's
release profile is lto='fat' + codegen-units=1 (great for shipping
wheels, slow to compile). On the IB self-hosted runner that
compile + the followup pytest blew past the ~12-min wall-clock cap
on test python 3.10 / 3.12 / 3.14 across runs #16, #20, #24, #26, #27.

Override CARGO_PROFILE_RELEASE_LTO=false and CODEGEN_UNITS=16 inside
test-python only. Same release semantics (optimized + debuginfo
stripped behavior intact), just trades a bit of binary perf for
much faster link. The real LTO-built wheels are still exercised
end-to-end by test-builds-os/test-builds-arch which use
maturin-action's Docker image (not migrated to IB).
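
The override can be expressed with cargo's profile environment variables. A sketch (note the full variable name for codegen units):

```yaml
test-python:
  env:
    # Disable fat LTO and raise codegen units for a faster link in CI;
    # shipped wheels still use the repo's lto='fat' release profile.
    CARGO_PROFILE_RELEASE_LTO: "false"
    CARGO_PROFILE_RELEASE_CODEGEN_UNITS: "16"
```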
…in this PR)

After 5 IB runs hit the ~12-min wall-clock cap on test-python's
make dev-py-release step (runs #16, #20, #24, #26, #27), and the
CARGO_PROFILE_RELEASE_LTO=false override (run #28) didn't dispatch
within a reasonable time, take the same pragmatic path we took for
lint: keep the matrix on ubuntu-latest.

Final shape of IB-routed jobs:
  test-rust          (heavy: 7×cargo llvm-cov on workspace)
  bench-test         (monty-bench compile)
  miri               (cargo +nightly miri test)
  fuzz               (cargo install cargo-fuzz + fuzz run)
  test-python-coverage (single maturin compile + pytest + llvm-cov)

These 5 jobs reliably succeed on IB and demonstrate the cache
effect (run #10 cold → run #16/22/26 warm shows 1.5-2.5x speedup
on the same workload). Lint and the 5-version test-python matrix
stay on ubuntu-latest where parallelism + bigger CPU keep them
within timeouts; this is the same tradeoff every distributed-build
setup makes when a single shared runner can't host every parallel
workflow.
zozo123 and others added 29 commits May 11, 2026 22:27
Run #25692017142 cell-D logs showed:
  ib_console: Double hyphen within comment: <!--
  ib_console: Failed to parse '/.../scripts/ib-profile.xml'
  Can't validate document from '/.../scripts/ib-profile.xml' using
  schema '/opt/incredibuild/data/ib_profile.xsd'

XML 1.0 disallows '--' inside <!-- ... --> comments and ib_console's
libxml-based parser enforces it strictly. The comment block in this
file referenced '--version' literally, which tripped the parser, and
ib_console then exited 255 — making cells C and D in the bench complete
in 20ms with rustc never cached. Cell B (IB_NO_CACHE=1) was unaffected
because it doesn't pass --profile.

Replace literal flag prefixes inside the comment with neutral phrasing;
the XML data on the rustc <process> element keeps its actual
'--version:-vV:...' attribute (which is allowed because attribute
values, unlike comments, may contain double hyphens).

Co-authored-by: Cursor <cursoragent@cursor.com>
Captures the state of PR #1 at finish-line:

* Cell A (ubuntu-latest, plain cargo) and cell B (IB runner, no rustc
  cache) measured cleanly across 3 iterations each. Steady-state wall
  is 38.5s vs ~24s — IB runner hardware alone is ~1.6x faster than
  ubuntu-latest on monty's compile workload.

* Cells C (cold rustc cache) and D (warm rustc cache) blocked on the
  Incredibuild-RND/monty self-hosted runner pool sitting at 42 total /
  0 online during the most recent experiment window (50+ minutes
  continuous). This is an infra issue on the IB pool, not a monty
  change.

* Documents the profile-XML double-hyphen-in-comment bug found and
  fixed mid-experiment (commit 4c68706): ib_console rejects the
  profile, exits 255, and takes the wrapped cargo invocation with it,
  which masquerades as 'cache produces no work'. Worth flagging
  upstream in ib_linux as a usability bug.

* Spells out exactly what Sam (project owner) needs to do to close
  the loop: stable runner pool + one workflow_dispatch button. The
  bench infra (workflow, scripts, profile, summarizer) is already
  green and will populate cells C and D as soon as runners are
  reachable.

Co-authored-by: Cursor <cursoragent@cursor.com>
…est)

Documents and pins the existing design: ib_console wraps cargo
invocations only. pytest, uv, top-level maturin, prek/ruff/mypy are
deliberately NOT wrapped. The cargo subprocess that maturin shells
out to IS routed through cargo-ib.sh via the cargo CARGO=<path> env-
var contract (already wired in test-python-coverage), so the rustc
cache still pays off for the heavy compile.

Why nothing else is worth wrapping (reasoning grounded in
ib_linux:cpp/BuildCache/BuildCache_Rules.cpp and BuildCache_BuildCache.cpp):

* ib_console's cache key is process-name + argv + env subset +
  content hashes of files referenced literally on argv (or in the
  rustc rsp file). No tracking of dlopen / Python imports / runtime
  fs reads. That's the right shape for compilers, the wrong shape
  for an interpreter.

* pytest / uv run / python: dynamic import graph, runtime side
  effects. Cache key would either trivially miss or be wrong.

* maturin's top-level driver: Python orchestrator that calls cargo
  and copies a .so. The orchestration is fast and side-effecty; the
  cargo subprocess is the part worth caching, and that's already
  routed via CARGO=/scripts/cargo-ib.sh at the job level.

* ruff/mypy/basedpyright/prek: linters with their own incremental
  caches; ib_console daemon-startup cost would dwarf the work, and
  the lint job already runs on ubuntu-latest anyway.

Changes:

1. scripts/cargo-ib.sh - added a SCOPE section to the header
   spelling out the rule so future contributors don't 'helpfully'
   pipe pytest through the wrapper.

2. .github/workflows/ci.yml::test-python-coverage - expanded the
   one-line CARGO env comment into the full why-not-pytest rationale
   at the call site.

3. IB_BENCH_RESULTS.md - added a 'Python and ib_console - when does
   it make sense?' section walking through every Python touch-point
   in the workflow with a keep/skip verdict and a one-line reason
   each, plus a TL;DR bullet at the top for Sam. Also notes two
   concrete things ib_linux could add (cached build_script_*, test-
   binary fingerprint cache) that would extend value to Rust+Python
   repos generally.

Co-authored-by: Cursor <cursoragent@cursor.com>
…le-fixer)

Co-authored-by: Cursor <cursoragent@cursor.com>
Cells A/B/C/D all green on ib-bench run #25696652366. Summarizer now
splits all-iter aggregate (which mixes cold-cache iter 1 with warm
iter 2/3) from steady-state (iter >= 2 only) so the value claim is
unambiguous. Also formats per ruff format and replaces the ambiguous
'l' loop variable so the lint hook on ci.yml's lint job stops
complaining (format-lint-py).

Final numbers (cargo test --no-run -p monty, target/ wiped between
iterations, 3 iters per cell):

  steady state (iter>=2)                          wall            speedup
  A: ubuntu-latest, plain cargo                   38.3+/-0.5 s    1.00x
  B: IB runner, default IB profile (no rustc)     24.6+/-0.3 s    1.55x
  D: IB runner, custom profile, warm cache         4.6+/-0.0 s    8.36x

Cell C proves the cache populates: one cold compile grew the shared
build cache by 612 MiB. Cell D iter 1 was 39.5 s (cold cache fill on
a different ephemeral runner than C); iters 2 and 3 were 4.59 s and
4.56 s (cache replay).

Co-authored-by: Cursor <cursoragent@cursor.com>
The previous 'all_shas = set().union(*shas.values())' triggered
basedpyright reportUnknownVariableType because bare set() is set[Unknown].
A type annotation alone wasn't enough (basedpyright still inferred
set[Unknown | str] | set[str] for the union expression). Replaced with
an explicit-type-annotated empty set + loop union, which produces a
clean set[str].
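
The rewrite can be illustrated like this (data values and variable names are hypothetical):

```python
# Hypothetical per-cell commit SHAs, shaped like the summarizer's data.
shas: dict[str, set[str]] = {
    "cell-A": {"abc123"},
    "cell-B": {"abc123", "def456"},
}

# Before: all_shas = set().union(*shas.values())
#   -> bare set() is set[Unknown] under basedpyright, and an
#      annotation on the assignment alone doesn't fix the inference.
# After: explicitly annotated empty set, unioned in a loop.
all_shas: set[str] = set()
for cell_shas in shas.values():
    all_shas |= cell_shas

print(sorted(all_shas))
```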

Co-authored-by: Cursor <cursoragent@cursor.com>
fuzz tokens_input_panic finished at 12:01 wall on the IB runner across
multiple PR runs (75463693214 at 11:00, 75465317455 at 12:01, etc.) —
exactly the well-known ~10-12-min wall-clock cap on this self-hosted
runner. The job pays cargo-fuzz install + fuzz-target compile + 60s
fuzz run + ib_console daemon-startup × 2; even with IB_MAX_LOCAL_CORES
and IB_PREVENT_OVERLOAD throttling, the cap is unreachable in this
shape of workload.

Reverting fuzz to ubuntu-latest doesn't reduce IB coverage because the
rustc-cache value claim is established by .github/workflows/ib-bench.yml
on the same shape of compile (cells A/B/C/D, 8.36x warm-cache speedup
documented in IB_BENCH_RESULTS.md). Same revert rationale already
applied to 'lint' and the 'test-python' matrix earlier in this PR.

The IB jobs that now remain on incredibuild-runner are the ones that
fit the cap and benefit from rustc cache:
- test-rust (7x cargo llvm-cov, IB_MAX_LOCAL_CORES=4)
- test-python-coverage (maturin develop + pytest, with maturin's cargo
  routed via CARGO=)
- bench-test (cargo bench compile)
- miri (nightly cargo miri test, slow but bounded)

Co-authored-by: Cursor <cursoragent@cursor.com>
Verified bench claims against the green CI run (25703024761) and
found one important honesty correction:

- 8.36x is the bench ceiling (identical workload, target wiped,
  warm rustc cache replay). Verified: cargo really compiled, 22
  test binaries with byte-identical hashes to iter 1, exit 0.

- Real test-rust speedup is ~1.5-2x, not 8x. The 7 cargo llvm-cov
  invocations spray distinct rustc cache keys via mixed feature
  flags, so the cache only fully replays on steps 2/4/6. Steps
  1/3/5/7 hit fresh keys and run at near-baseline. Net job wall
  ~304s vs an estimated ~350-450s on ubuntu-latest.

Also documented per-runner cache locality (614/987/8 MiB observed
across three jobs in the same CI run) and the warm-replay
target/ size delta (cache restores rustc outputs but not
target/debug/incremental/, which is a non-issue for cargo test
--no-run but worth flagging for the mental model).

Co-authored-by: Cursor <cursoragent@cursor.com>
Cells A/B/C/D measure the synthetic `cargo test --no-run -p monty`
workload, which is fast but doesn't capture the full test-rust cost
(7x cargo llvm-cov + clean). The realistic test-rust speedup so far
has been an estimate (~1.5–2x) inferred from real-CI logs.

Adds two new measurement cells running the actual ci.yml::test-rust
sequence verbatim, so the E → F steady-state ratio is the directly
measured number:

  E  ubuntu-latest, plain cargo, 2 iterations
  F  incredibuild-runner, cargo-ib.sh, IB warm cache, 2 iterations
     (chained after D for predictable IB cache state)

Implementation:

* scripts/ib-bench-run.sh — adds WORKLOAD={synthetic,test-rust} and
  CARGO_BIN env vars. Synthetic stays the default so cells A/B/C/D
  are unchanged. The test-rust workload runs the 8-call llvm-cov
  sequence per iteration; per-iter wall/user/sys are summed across
  calls and rss is the per-call max. CSV schema unchanged
  (one row per iteration).
* .github/workflows/ib-bench.yml — adds cell-E-ubuntu-test-rust
  and cell-F-ib-test-rust jobs with 30-min timeouts; both feed
  the summarize job's needs list and CSV-collection loop.
* scripts/ib-bench-summarize.py — extends CELLS with E/F, adds an
  "E → F" steady-state row that applies fmt_ratio to the iter≥2 means,
  and refreshes the top-level doc and section heading.
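The per-iteration aggregation described for the test-rust workload above can be sketched like this (field names are hypothetical; one timing sample per llvm-cov call):

```python
# Minimal sketch of the per-iteration aggregation: each llvm-cov call
# yields one timing sample; the iteration's CSV row sums wall/user/sys
# across calls and takes the per-call max for rss.
def aggregate_iteration(samples: list[dict]) -> dict:
    return {
        "wall": sum(s["wall"] for s in samples),
        "user": sum(s["user"] for s in samples),
        "sys": sum(s["sys"] for s in samples),
        "rss": max(s["rss"] for s in samples),  # peak across calls, not a sum
    }

calls = [
    {"wall": 10.0, "user": 30.0, "sys": 2.0, "rss": 900},
    {"wall": 12.0, "user": 35.0, "sys": 3.0, "rss": 1400},
]
row = aggregate_iteration(calls)
print(row)  # {'wall': 22.0, 'user': 65.0, 'sys': 5.0, 'rss': 1400}
```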

Pure additive: cells A/B/C/D, scripts/cargo-ib.sh, scripts/ib-profile.xml
and .github/workflows/ci.yml are untouched.

Co-authored-by: Cursor <cursoragent@cursor.com>
Three additive PoV improvements based on parallel subagent
investigations:

- Cell E (ubuntu-latest, real test-rust workload, 8 cargo llvm-cov
  calls / iter, target wiped between iters) measured at 357 s
  steady-state from run 25705064240. Replaces the previously-
  inferred ubuntu-latest baseline. Cell F still pending the IB
  runner pool which has been fully offline (0/30 online) for the
  measurement window.

- New ib-probe.yml workflow (dispatch-only, 5 min on incredibuild-
  runner) probes role markers, ib_server/ib_coordinator presence,
  Coordinator.* rows in the agent SQLite DB, --check-license, and
  a no-standalone smoke test. Answers "is IB distribution
  available on this runner image?" — currently believed to be no
  (initiator-only image), but --standalone in the wrapper
  silences the only diagnostic that would prove or disprove it.

- IB_BENCH_RESULTS.md gains a "Distribution mode" section and an
  "sccache structural comparison" section. Distribution explains
  what --standalone really does (per XgConsole_Session.cpp:308-
  404: tolerate missing coordinator, NOT skip ib_server connect
  timeout — earlier doc was wrong on this) and what cell Q would
  measure if helpers were provisioned. Sccache section explains
  why the OSS baseline structurally caps below IB's 8.36x ceiling
  on monty (~25 proc-macro crates + bin test binary + incremental
  workspace crates are all uncacheable by sccache); cites public
  sccache speedup numbers from NeoSmart 2024 + sccache#2041.

Also fixes the --standalone comment in cargo-ib.sh to reflect
what the source actually shows the flag does.

Co-authored-by: Cursor <cursoragent@cursor.com>
All six bench cells green on the same date / same runner pool.
Replaces estimates with measurements:

- Cell A (synthetic, ubuntu-latest): 36.4s steady-state
- Cell B (synthetic, IB no-cache): 22.1s steady → 1.65x hardware floor
- Cell C (synthetic, IB cold cache): 40.6s, +612 MiB
- Cell D (synthetic, IB warm cache): 4.2s steady → 8.68x ceiling
- Cell E (real test-rust, ubuntu-latest): 325.7s steady
- Cell F (real test-rust, IB warm cache): 220.2s steady → 1.48x measured
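The three headline ratios follow directly from the quoted steady-state walls; a quick recomputation (note the walls are rounded to 0.1 s, so the last digit of a recomputed ratio can differ slightly from one derived from the raw timings):

```python
# Recompute the headline speedups from the quoted steady-state walls.
pairs = {
    "hardware floor (A -> B)": (36.4, 22.1),
    "warm-cache ceiling (A -> D)": (36.4, 4.2),
    "real test-rust (E -> F)": (325.7, 220.2),
}
for name, (baseline, ib) in pairs.items():
    print(f"{name}: {baseline / ib:.2f}x")
```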

ib-probe.yml run (25706946478) confirmed: runner image is
initiator + helper, coordinator-less. Distribution path is
structurally unavailable until a coordinator + helper-pool
registration are added at runner-image build time. Updated the
distribution section to reflect the probe's actual output rather
than the prior "to be probed" wording.

Final realistic test-rust speedup of 1.48x is at the bottom of the
prior 1.5-2x estimate band. Documented why: feature-flag matrix
spray, IB_MAX_LOCAL_CORES throttling for wall-clock-cap
mitigation, and uncached test execution combined leave less room
than the unthrottled cell B can show on a single cargo call.

Co-authored-by: Cursor <cursoragent@cursor.com>
…I + Layer B manylinux probe + Sam doc

Summary of this commit (the monty-side of the seven-layer plan in
.cursor/plans/monty-ib-cross-repo-strategy-*.plan.md):

Layer F — three monty wirings (unilateral, no upstream dependency)
- .github/workflows/codspeed.yml: runs-on: incredibuild-runner +
  CARGO=$(pwd)/scripts/cargo-ib.sh + IB pre-flight/stats steps.
  Codspeed builds the bench crate every PR; high cache locality.
- .github/workflows/ci.yml::build-js: matrix entries for
  x86_64-unknown-linux-gnu and wasm32-wasip1-threads switched to
  incredibuild-runner with conditional IB env (CARGO,
  IB_MAX_LOCAL_CORES, IB_PREVENT_OVERLOAD) and IB pre-flight/stats
  guarded by `if: matrix.settings.host == 'incredibuild-runner'`.
  macOS / Windows / aarch64 / arm64 entries kept on their existing
  runners (IB has no pool there yet — Layer G).

Validation cells (extending the existing A–F bench matrix)
- ib-bench.yml::cell-G-ib-shim-simulation: Layer-A simulation. Same
  test-rust workload as cell F, but cargo is dispatched via a
  PATH-prepended shim that hand-mimics what
  vnext-processing-engine/src/build_accelerator/default_rules.yaml's
  generated cargo entry would auto-emit if cargo were upgraded from
  ENV mode to SHIM mode (the contents of branch
  feat/cargo-rustc-shim's ib-accel/bin/cargo). G tracking F within
  noise is the green light to retire scripts/cargo-ib.sh from monty
  the moment Layer A lands and the runner image rebuilds.
- ib-bench.yml::cell-I-ib-codspeed: codspeed workload (cargo
  codspeed build -p monty-bench --bench main) on IB warm. Validates
  Layer F's codspeed.yml rewire. Disjoint rustc keyspace from
  test-rust, so D/F caches don't help — I's iter1→iter2 ratio is
  the cleanest single-job signal for the every-PR codspeed
  workflow.
- scripts/ib-bench-run.sh: new `codspeed` workload variant alongside
  the existing `synthetic` and `test-rust` workloads.
- scripts/ib-bench-summarize.py: G/I rendered in the markdown table
  with their own steady-state comparison sub-tables (F→G ratio,
  I cold/warm).

Layer B — manylinux container probe
- .github/workflows/ib-probe.yml: new `manylinux-probe` job runs
  `runs-on: incredibuild-runner` + `container: image:
  quay.io/pypa/manylinux_2_28_x86_64`. Probes whether
  vnext-processing-engine's container-hooks/index.js already injects
  /ib-workspace volumes and ib_console into a manylinux container
  (the hypothesis being that 8 of monty's compile-bound jobs — the
  whole wheel-build matrix — are already IB-reachable but never
  verified). Probe checks: volume injection, ib_console resolution,
  glibc compat, --standalone smoke test.

Documentation
- IB_BENCH_RESULTS.md: appended a Cross-repo strategy update section
  explaining the two upstream gaps (cargo ENV-mode-only in
  default_rules.yaml; container-hooks/index.js shipping but never
  verified for manylinux). Includes a coverage-trajectory table
  showing how each layer moves monty IB coverage from 12.5% today
  to 84% with all layers shipped.
- IB_NEXT_STEPS_SAM.md: new action-item companion to the bench
  results doc. Maps each layer (A through G) to owner / effort /
  effect on monty / effect on every other IB customer; spells out
  the cleanup deletes that follow each layer's merge; lists the
  four concrete asks for Sam (approve, get vnext PR reviewed,
  schedule IB-ops sync for C+E, triage Layer B's probe outcome).

Cross-repo PR
The companion to this commit is feat/cargo-rustc-shim on
Incredibuild-RND/vnext-processing-engine (Layer A — promote cargo
from ENV to SHIM mode in default_rules.yaml; 83 unit tests +
6 integration tests). Branch pushed; PR-ready.

Co-authored-by: Cursor <cursoragent@cursor.com>
…er); pin manylinux digest

CI run 25722680967 reproducibly failed in `cargo codspeed run` with:
  setarch: failed to set personality to x86_64: Operation not permitted
  ##[error]failed to execute valgrind

The CodSpeedHQ action shells out to valgrind, which uses setarch to
set ADDR_NO_RANDOMIZE personality. The IB self-hosted runner image
runs under restricted Linux capabilities (no SYS_ADMIN, user-namespace
remap) so the personality syscall is blocked. github-hosted runners
allow it. This is a structural blocker — not specific to monty —
that affects every valgrind-based tool in CI (callgrind, memcheck,
codspeed, ...).

Two paths to recover the IB value here are documented in
IB_NEXT_STEPS_SAM.md as a new IB-product roadmap item:
1. Hybrid: cargo codspeed build on IB, transfer artifacts, cargo
   codspeed run on ubuntu-latest. Doable but requires careful
   artifact pinning.
2. Have IB ops relax the runner image's seccomp/capability profile
   to allow setarch personality (or grant CAP_SYS_ADMIN). Common
   for build runners.

Until either lands, codspeed.yml stays on ubuntu-latest. The
monty-side measurement of the IB-build value lives in
ib-bench.yml::cell-I-ib-codspeed (only `cargo codspeed build`,
no valgrind run, so it works on IB).

Also pinned the manylinux container image in ib-probe.yml by
manifest digest (sha256:443eabd378e1...), addressing zizmor's
unpinned-images audit. The probe job uses the digest-pinned image
to validate Layer B (container hooks injecting /ib-workspace into
container: image: xx jobs).

Co-authored-by: Cursor <cursoragent@cursor.com>
ib-probe.yml::manylinux-probe (run 25726192172) confirmed end-to-end:
  - vnext-processing-engine container-hooks/index.js fires on a
    GHA-level container: block, bind-mounting /ib-workspace/cache and
    /ib-workspace/incredibuild + putting /ib-workspace/incredibuild/
    ib-accel/bin at the front of PATH inside the container.
  - /usr/bin/ib_console v3.25.2 runs natively under the manylinux
    image's glibc 2.28 (no GLIBC_2.x mismatch).
  - --standalone --no-monitor -- /bin/true connects to ib_server,
    proving the cache and the in-namespace distribution path are both
    live inside the container.

Cell H closes the loop on Layer B by measuring the synthetic `cargo
test --no-run` workload on the same manylinux image under ib_console,
comparable to cell D (synthetic, IB warm, on the bare host). H_warm /
D_warm tracking 1.0 ± 10% means containerization adds no overhead and
the wheel-build matrix (build job's 7 Linux entries + build-pgo linux)
can be migrated onto incredibuild-runner with a two-line GHA edit per
job.

Doc updates:
- IB_BENCH_RESULTS.md: Layer-A row points at vnext-processing-engine#210; Layer-B
  marked GREEN with run link; coverage trajectory updated for the
  Phase-8 path (4 -> 6 -> 14 -> 17 -> 27 of 32).
- IB_NEXT_STEPS_SAM.md: Layer-B section rewritten as the validated
  result; ask #4 to Sam flipped to "done"; explicit 30-min agenda
  added for the Layer-C + Layer-E IB-ops sync.

Co-authored-by: Cursor <cursoragent@cursor.com>
…nup-fix

Three small follow-ups after the Layer-B GREEN result and Cell-H
first run:

1. ib-probe.yml::probe — add a "Layer-A cargo SHIM deploy check"
   group that looks for /ib-workspace/incredibuild/ib-accel/bin/cargo
   (or /opt/ib-accel/bin/cargo on older variants). The next probe
   run after vnext-processing-engine#210 lands and the runner image
   rebuilds will report `FOUND` and unblock Phase 5 of the closure
   plan automatically — no one has to remember to re-check.

2. IB_CLEANUP_SPEC.md — new mechanical cleanup spec for closure-plan
   Phases 5 (cargo-ib.sh removal), 6 (ib-profile.xml removal), 7
   (lint/fuzz/test-python re-route), and 8 (manylinux build matrix
   migration). Each phase lists exact files + line ranges + sed
   patterns + verification + commit-message template, so when its
   gate clears the right person can open the cleanup PR in 10 min
   without re-deriving the change set.

3. scripts/ib-bench-run.sh — fix cleanup step to honor
   $CARGO_TARGET_DIR. Cell H sets CARGO_TARGET_DIR=target-h to
   isolate from host-side cells, but the cleanup hardcoded `rm -rf
   target` so cell H iter 2 reused iter 1's artifacts (measured
   0.35s instead of a real warm-cache rebuild). target_size() also
   updated to honor the env. Cells A-G/I always use the default
   target/ so behavior unchanged for them.
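The fix in item 3 boils down to an env-with-default lookup; a minimal sketch (in Python, for illustration — the actual fix is in the shell script and uses `${CARGO_TARGET_DIR:-target}`):

```python
# Honor CARGO_TARGET_DIR in cleanup instead of hardcoding "target",
# so cells with an isolated target dir (like Cell H's target-h)
# really rebuild each iteration.
import os
import shutil

def target_dir() -> str:
    # Fall back to cargo's default "target" when the env var is unset.
    return os.environ.get("CARGO_TARGET_DIR", "target")

def clean_target() -> None:
    shutil.rmtree(target_dir(), ignore_errors=True)

os.environ.pop("CARGO_TARGET_DIR", None)
print(target_dir())  # target

os.environ["CARGO_TARGET_DIR"] = "target-h"  # what Cell H sets
print(target_dir())  # target-h
```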

The Cell-H first run (in ib-bench run 25727104334) still proved
the qualitative finding: container hook fires, ib_console runs
under glibc 2.28, cargo wrapping works end-to-end (iter 1 = 46.5s
cold). The numerical H_warm/D_warm comparison just needs a re-run
with this fix.

Co-authored-by: Cursor <cursoragent@cursor.com>
ib-bench run 25727572729 with the CARGO_TARGET_DIR fix produced
clean Cell H numbers:

  A iter 2 (ubuntu-latest, no IB):        37.4 s
  D iter 2 (IB host, warm cache):          5.27 s   ->  7.10x vs A
  H iter 2 (IB manylinux container, warm): 21.3 s   ->  1.76x vs A

H beats the closure plan's 1.3x gate for Phase 8. The 4x gap between
H (container) and D (bare host) on the same workload is a follow-up:
the container's separate rustup install gives it disjoint cargo cache
keys from the host. Aligning the toolchain would close the gap, but
1.76x vs ubuntu-latest is already enough to migrate the wheel-build
matrix.

Co-authored-by: Cursor <cursoragent@cursor.com>
Apply ruff formatting to the Cell-H summary strings so the lint job no longer rewrites scripts/ib-bench-summarize.py in CI.

Co-authored-by: Cursor <cursoragent@cursor.com>
Tal deployed the runner image built from vnext-processing-engine#210, and ib-probe run 25732897099 confirmed the generated cargo shim is live at /ib-workspace/incredibuild/ib-accel/bin/cargo.

Remove monty's repo-local cargo wrapper and route CI/bench commands through plain cargo so the runner-image shim owns ib_console wrapping via PATH. Keep the repo profile alive until Layer C by teaching ib-prep.sh to export IB_CONSOLE_ARGS for the vnext shim, including the per-job cache logfile and --profile=scripts/ib-profile.xml unless IB_NO_CACHE is set.

Co-authored-by: Cursor <cursoragent@cursor.com>
Keep the monty wiring aligned with the shipped cargo shim while preserving the small bridge for cargo extension workloads, and make the hosted-profile and CodSpeed decisions explicit locally.

Co-authored-by: Cursor <cursoragent@cursor.com>
Record the vnext follow-up that will remove monty's remaining cargo bridge once the runner image is rebuilt.

Co-authored-by: Cursor <cursoragent@cursor.com>
Use the deployed vnext cargo shim for Monty's cargo extension and toolchain forms so the evidence branch proves the out-of-the-box runner path.

Co-authored-by: Cursor <cursoragent@cursor.com>
Keep the real test-rust benchmark cell aligned with ci.yml so the evidence workflow measures the deployed shim without tripping the runner wall-clock cap.

Co-authored-by: Cursor <cursoragent@cursor.com>
Keep CodSpeed from blocking the Incredibuild fork where project authorization is unavailable, and remove direct project-owner naming from IB handoff docs.

Co-authored-by: Cursor <cursoragent@cursor.com>
@github-actions

Codecov Results 📊

✅ Patch coverage is 100.00%. Project has 23456 uncovered lines.


Generated by Codecov Action
