perf(core): reduce allocations and memory in ProfileAnalyzer #18

Merged: m2papierz merged 2 commits into master from feat/core-perf on Jun 14, 2026

@m2papierz

Contributor

What

Replace two per-bucket counter Vec<u32>s with a single Vec<u8> bitfield in ProfileAnalyzer::analyze, and add
unit tests for the cumulative helper methods on ExecutionProfile.

Why

Bottleneck classification only checks presence (> 0) and never reads the actual stall or failure counts. Tracking two
Vec<u32>s (8 bytes per bucket) for a boolean signal wastes memory and adds an unnecessary .zip() in the
post-loop pass. Per-stall detail is already preserved in stall_events, and per-failure detail in the raw Trace.
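
The presence-only argument can be illustrated with a minimal sketch. It assumes a simplified classifier: the `BucketSignal` enum and the `classify_from_counts` function are hypothetical and not taken from analyzer.rs. The point it shows is that a counter-based pass only compares each count against zero and never uses the magnitude:

```rust
use std::mem::size_of;

/// Hypothetical per-bucket outcome; the real classification in analyzer.rs may differ.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum BucketSignal {
    Clean,
    Stalled,
    Failed,
    StalledAndFailed,
}

/// Counter-based pass: zips two parallel count slices but only tests `> 0`.
pub fn classify_from_counts(stalls: &[u32], failures: &[u32]) -> Vec<BucketSignal> {
    stalls
        .iter()
        .zip(failures.iter())
        .map(|(&s, &f)| match (s > 0, f > 0) {
            (false, false) => BucketSignal::Clean,
            (true, false) => BucketSignal::Stalled,
            (false, true) => BucketSignal::Failed,
            (true, true) => BucketSignal::StalledAndFailed,
        })
        .collect()
}

/// Bytes spent per bucket by two parallel u32 counters (one element in each Vec).
pub fn counter_bytes_per_bucket() -> usize {
    2 * size_of::<u32>()
}
```

Under these assumptions, `classify_from_counts(&[3, 0], &[0, 7])` and `classify_from_counts(&[1, 0], &[0, 1])` return the same result, which is why one boolean per signal carries all the information the classifier needs.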

How

  • Bitfield replacement (analyzer.rs): stalls_in_bucket: Vec<u32> and factory_failure_counts: Vec<u32> are replaced
    by bucket_flags: Vec<u8> with HAD_STALL = 1 and HAD_FAILURE = 2 flag bits. Per-bucket memory drops from
    8 bytes to 1 byte (an 87.5% reduction). Bottleneck classification iterates one Vec instead of zipping two.
  • GateServed arm reorder (analyzer.rs): magic-state counting moved before the wait > 0 stall branch, so
    unconditional work comes first and conditional work second.
  • Cumulative helper tests (profile.rs): 5 unit tests covering prefix-sum correctness, empty input, p_logical
    scaling, zero probability, and last-element-equals-total invariant.
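
The bitfield approach described in the list above can be sketched roughly as follows. The flag values come from the description; the `Event` enum and the `collect_flags`, `had_stall`, and `had_failure` functions are hypothetical stand-ins for illustration, not the actual code in analyzer.rs:

```rust
/// Flag bits, using the values given in the PR description.
pub const HAD_STALL: u8 = 1;
pub const HAD_FAILURE: u8 = 2;

/// Hypothetical event type; the real trace events in the crate are richer.
pub enum Event {
    Stall { bucket: usize },
    FactoryFailure { bucket: usize },
}

/// Build one flag byte per bucket. Repeated events in the same bucket
/// set the same bit again, so no count is kept.
pub fn collect_flags(num_buckets: usize, events: &[Event]) -> Vec<u8> {
    let mut bucket_flags = vec![0u8; num_buckets];
    for event in events {
        let (bucket, bit) = match *event {
            Event::Stall { bucket } => (bucket, HAD_STALL),
            Event::FactoryFailure { bucket } => (bucket, HAD_FAILURE),
        };
        // Out-of-range buckets are ignored here rather than panicking.
        if let Some(flags) = bucket_flags.get_mut(bucket) {
            *flags |= bit;
        }
    }
    bucket_flags
}

/// Presence checks over a single flag byte.
pub fn had_stall(flags: u8) -> bool {
    (flags & HAD_STALL) != 0
}

pub fn had_failure(flags: u8) -> bool {
    (flags & HAD_FAILURE) != 0
}
```

A post-loop pass over this layout reads one `u8` per bucket and tests bits, with no second slice to zip against.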

Testing

  • make ci passes locally (fmt + clippy + test + audit)
  • New behavior has tests
  • Hot-path changes have criterion benchmarks

Checklist

  • PR description explains why, not just what
  • No new unwrap()/expect() in production code
  • No new allocations in the simulation hot loop
  • Crate boundaries respected (pirx-core never imports from pirx-adapters)

…rofileAnalyzer

Replace two pre-allocated Vec<u32> (stalls_in_bucket, factory_failure_counts)
with a single Vec<u8> using HAD_STALL/HAD_FAILURE flag bits. Bottleneck
classification only checks presence, never actual counts, so the bitfield
is semantically identical with 87.5% less memory per bucket (8 bytes down to 1). Merge GateServed
match arm to count magic states before stall check. Add unit tests for
cumulative_magic_states() and cumulative_infidelity() helper methods.
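
The properties those helper tests cover (prefix-sum correctness, empty input, p_logical scaling, zero probability, last element equals total) can be sketched with free functions. The names `prefix_sum` and `prefix_sum_scaled` and their signatures are assumptions for illustration; the real cumulative_magic_states() and cumulative_infidelity() methods on ExecutionProfile may look different:

```rust
/// Running total of per-bucket counts (hypothetical stand-in for a
/// cumulative helper; not the crate's actual signature).
pub fn prefix_sum(per_bucket: &[u64]) -> Vec<u64> {
    per_bucket
        .iter()
        .scan(0u64, |acc, &x| {
            *acc += x;
            Some(*acc)
        })
        .collect()
}

/// Running total scaled by a per-event probability `p_logical`
/// (assumed here to be a plain multiplicative factor).
pub fn prefix_sum_scaled(per_bucket: &[u64], p_logical: f64) -> Vec<f64> {
    prefix_sum(per_bucket)
        .into_iter()
        .map(|n| n as f64 * p_logical)
        .collect()
}
```

With these definitions, an empty input yields an empty output, a zero probability yields all zeros, and the last element of `prefix_sum` equals the sum of the input.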

codspeed-hq Bot commented Jun 14, 2026

Merging this PR will improve performance by 12.68%

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.

⚡ 3 improved benchmarks
✅ 12 untouched benchmarks

Performance Changes

Benchmark      BASE       HEAD       Efficiency
analyze[10]    13.8 µs    12 µs      +14.78%
analyze[500]   314.1 µs   280.9 µs   +11.82%
analyze[100]   71.6 µs    64.3 µs    +11.48%

Comparing feat/core-perf (766e3c7) with master (f3cd97b)

Lift HAD_STALL/HAD_FAILURE from function body to module-level constants,
following standard Rust idiom for named flags.
m2papierz merged commit 9ec9291 into master on Jun 14, 2026
11 checks passed
m2papierz deleted the feat/core-perf branch on June 14, 2026 07:12