fix(core): resolve analyzer perf regression and harden pirx-core visibility by m2papierz · Pull Request #17 · pirxware/pirx

m2papierz · 2026-06-13T21:22:30Z

What

Fix ~20-25% performance regression in ProfileAnalyzer::analyze introduced by error budget tracking, apply the same fix to monte_carlo::extract_summary, and tighten module/type visibility across pirx-core.

Why

CodSpeed flagged analyze[500] at -25.49%, analyze[100] at -22.85%, analyze[10] at -19.7% comparing master
(71f67ed) against the error budget baseline (8bc4d1a). The regression was caused by commit 92cc496 restructuring the GateServed match arm from a guarded pattern to an unguarded one, making every zero-wait event pay for a
saturating_add + bounds-checked array write it previously skipped. The same redundant-counter pattern existed in monte_carlo::extract_summary. Several internal-only types and modules were unnecessarily pub, widening the API surface beyond what downstream crates consume.

How

Analyzer (analyzer.rs):

Restored the guarded match arm (GateServed { gate, wait } if *wait > 0) for the stall+counting path; added a
separate GateServed { .. } arm for zero-wait bucket counting only.
Replaced the redundant total_magic_states loop counter with trace.magic_states_consumed (already computed by the engine).
Fused the post-loop cumulative_magic_states prefix-sum and cumulative_infidelity computation into a single
pass, eliminating a separate .map().collect() allocation.

Monte Carlo (monte_carlo.rs):

Removed the redundant magic_states_consumed counter; reads trace.magic_states_consumed directly.
Restored the GateServed guard so zero-wait events fall through to the no-op arm.

Visibility (lib.rs, buffer.rs, events.rs, routing.rs):

buffer, events, routing modules: pub → pub(crate).
ScalarRouting, ManhattanRouting, RoutingKind, build_position_index, from_config: pub → pub(crate).
MagicStateBuffer, EngineEvent, TimedEvent, EventQueue: pub → pub(crate).
MagicStateBuffer::is_full, is_empty, EventQueue::is_empty: gated behind #[cfg(test)] (test-only usage).

Testing

make ci passes locally (fmt + clippy + test + audit)
New behavior has tests (existing 206 tests cover all changed paths)
Hot-path changes have criterion benchmarks (trace_analysis/analyze/{10,100,500})

Checklist

PR description explains why, not just what
No new unwrap()/expect() in production code
No new allocations in the simulation hot loop
Crate boundaries respected (pirx-core never imports from pirx-adapters)
New dependencies justified (not "it's popular" — what does it replace?)

Error budget tracking (92cc496) changed GateServed from a guarded match arm (wait > 0, falling to no-op catch-all) to an unguarded arm processing every event. In warm-start benchmarks most gates are served with zero wait, so each paid a redundant saturating_add + array write.

…er in monte_carlo

codspeed-hq · 2026-06-13T21:24:05Z

Merging this PR will improve performance by 19.31%

⚠️

Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.

Open the report in CodSpeed to investigate

⚡ 3 improved benchmarks
✅ 12 untouched benchmarks

Performance Changes

	Benchmark	`BASE`	`HEAD`	Efficiency
⚡	`analyze[500]`	376.4 µs	314.1 µs	+19.84%
⚡	`analyze[100]`	85.8 µs	71.6 µs	+19.77%
⚡	`analyze[10]`	16.3 µs	13.8 µs	+18.32%

Tip

Curious why this is faster? Comment @codspeedbot explain why this is faster on this PR, or directly use the CodSpeed MCP with your agent.

_{Comparing fix/analyzer-perf (3711630) with master (71f67ed)}

…nalyzer

Revert the separate post-loop pass over trace events — benchmarking showed it was slower than inline counting. Replace pre-allocated vec![0; n] + fill loop for cumulative_magic_states and cumulative_infidelity with iterator collect, avoiding two allocator calls and ~60 KB of unnecessary zero-initialization.

cumulative_magic_states and cumulative_infidelity were pre-computed Vec fields on ExecutionProfile, adding 2 allocations + 2 O(n) passes to every analyze() call. Both are trivially derivable from magic_states_per_bucket and p_logical. Move them to computed methods so the cost is paid only by callers that need the timeline curves.

m2papierz added 2 commits June 13, 2026 23:10

harden(core): tighten visibility, remove redundant magic_states count…

9e8d15e

…er in monte_carlo

m2papierz added 3 commits June 13, 2026 23:38

fix(core): move magic_states_per_bucket to post-loop pass in ProfileA…

57c3f7a

…nalyzer

m2papierz merged commit f3cd97b into master Jun 13, 2026
11 checks passed

m2papierz deleted the fix/analyzer-perf branch June 13, 2026 22:42

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(core): resolve analyzer perf regression and harden pirx-core visibility#17

fix(core): resolve analyzer perf regression and harden pirx-core visibility#17
m2papierz merged 5 commits into
masterfrom
fix/analyzer-perf

m2papierz commented Jun 13, 2026

Uh oh!

codspeed-hq Bot commented Jun 13, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

m2papierz commented Jun 13, 2026

What

Why

How

Testing

Checklist

Uh oh!

codspeed-hq Bot commented Jun 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Merging this PR will improve performance by 19.31%

Performance Changes

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

codspeed-hq Bot commented Jun 13, 2026 •

edited

Loading