Follow-up to #507 (item 3). PGO was deferred there because the full release matrix cross-compiles five targets (x64/arm64 linux-musl via cross, x64/arm64 darwin, x64 windows), and PGO needs to run an instrumented binary to collect profiles — impractical for arm64/musl on x86 CI runners. This issue narrows the scope to the one target where PGO is clean to do and validate: x64-linux only.
Scope
- Apply PGO only to the x86_64-unknown-linux-* release binary. All other targets keep the current LTO-only build, unchanged.
- Do not reshape the rest of the release pipeline.
Proposed CI flow (x64-linux job only)
- Build an instrumented binary (-Cprofile-generate), e.g. via cargo-pgo.
- Run it against the existing fixtures (cargo run -- check over tests/fixtures/, plus mono-large) to collect profiles.
- Build the final binary with -Cprofile-use=… + existing LTO.
- Ship that binary in the x64-linux release tarball.
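The flow above can be sketched as a small driver script. This is only a sketch under stated assumptions, not the project's actual CI: it uses raw rustc flags plus `llvm-profdata` instead of cargo-pgo, and the helper names (`pgo_plan`, `run_plan`), the concrete target triple, the profile directory, the `merged.profdata` filename, and the exact `check <path>` argument shape are all hypothetical. It builds the command plan separately from executing it, so the plan can be inspected or logged before anything runs.

```python
import os
import subprocess
from typing import NamedTuple


class Step(NamedTuple):
    """One command in the PGO pipeline plus the extra environment it needs."""
    env: dict
    argv: list


def pgo_plan(target, profile_dir, fixtures):
    """Return the PGO phases as an ordered list of steps.

    Hypothetical helper: target, profile_dir (ideally an absolute path) and
    fixtures are supplied by the caller; nothing here is project-specific.
    """
    merged = f"{profile_dir}/merged.profdata"  # assumed filename
    gen = {"RUSTFLAGS": f"-Cprofile-generate={profile_dir}"}
    use = {"RUSTFLAGS": f"-Cprofile-use={merged}"}
    build = ["cargo", "build", "--release", "--target", target]

    # Phase 1: instrumented build.
    plan = [Step(gen, build)]
    # Phase 2: run the instrumented binary over each fixture to emit .profraw files.
    # The same RUSTFLAGS are kept so cargo reuses the instrumented artifacts.
    for path in fixtures:
        run = ["cargo", "run", "--release", "--target", target, "--", "check", path]
        plan.append(Step(gen, run))
    # Merge the raw profiles into a single file that rustc can consume.
    plan.append(Step({}, ["llvm-profdata", "merge", "-o", merged, profile_dir]))
    # Phase 3: final optimized build; LTO is assumed to come from the release profile.
    plan.append(Step(use, build))
    return plan


def run_plan(plan):
    """Execute each step in order, stopping at the first failure."""
    for step in plan:
        subprocess.run(step.argv, env={**os.environ, **step.env}, check=True)
```

For example, `run_plan(pgo_plan("x86_64-unknown-linux-gnu", "/tmp/pgo-data", ["tests/fixtures/"]))` would run the four steps in order; the triple and paths here are placeholders. `llvm-profdata` must be on PATH, and its version needs to match the LLVM used by the rustc toolchain.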
Bench gating (hard requirement before merge)
- Compare micro-bench deltas vs baseline on the PR (CI already does this).
- Validate the wall-time delta in the competitive bench (vs changesets / release-please) on x64-linux.
- Refuse merge if any *_full_check_flow benchmark regresses. PGO ships only if the measured win is real on this target.
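The merge gate above could be expressed as a small check over two sets of benchmark timings. This is a minimal sketch, assuming results are already available as name-to-time mappings; the `gate` function, the `tolerance` noise margin, and the benchmark names and numbers in the usage example are hypothetical and not taken from the project's bench harness.

```python
import fnmatch


def gate(baseline, candidate, pattern="*_full_check_flow", tolerance=0.0):
    """Return the gated benchmarks on which the candidate build regresses.

    baseline and candidate map benchmark name -> time in any consistent unit.
    tolerance is the allowed relative slowdown (a noise margin); it is a
    hypothetical knob, the issue itself only says "refuse merge if any regresses".
    """
    regressions = []
    for name, base in sorted(baseline.items()):
        # Only benchmarks matching the gate pattern can block the merge.
        if not fnmatch.fnmatchcase(name, pattern):
            continue
        new = candidate.get(name)
        # A missing result is treated as a failure rather than silently passing.
        if new is None or new > base * (1.0 + tolerance):
            regressions.append(name)
    return regressions


# Illustrative numbers only; not measured results.
baseline = {"mono_full_check_flow": 100.0, "small_full_check_flow": 10.0, "tag_index_build": 40.0}
candidate = {"mono_full_check_flow": 90.0, "small_full_check_flow": 10.5, "tag_index_build": 45.0}

failed = gate(baseline, candidate)
print(failed)  # ['small_full_check_flow']
```

A CI step would exit non-zero when the returned list is non-empty. In this example `tag_index_build` is slower but does not block the merge, because only names matching `*_full_check_flow` are gated.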
Out of scope
- arm64 / musl / darwin / windows PGO (no practical instrumented-run path on current runners).
- Local-dev PGO builds.
Expected upside: a 5–15% wall-time reduction on top of LTO for the deterministic hot paths (full_check_flow, tag_index_build) on x64-linux. If the bench delta doesn't materialize, close without shipping.