fix(core): resolve analyzer perf regression and harden pirx-core visibility#17
Merged
Conversation
Error budget tracking (92cc496) changed GateServed from a guarded match arm (wait > 0, falling to no-op catch-all) to an unguarded arm processing every event. In warm-start benchmarks most gates are served with zero wait, so each paid a redundant saturating_add + array write.
…er in monte_carlo
Merging this PR will improve performance by 19.31%
|
| Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|
| ⚡ | analyze[500] |
376.4 µs | 314.1 µs | +19.84% |
| ⚡ | analyze[100] |
85.8 µs | 71.6 µs | +19.77% |
| ⚡ | analyze[10] |
16.3 µs | 13.8 µs | +18.32% |
Tip
Curious why this is faster? Comment @codspeedbot explain why this is faster on this PR, or directly use the CodSpeed MCP with your agent.
Comparing fix/analyzer-perf (3711630) with master (71f67ed)
Revert the separate post-loop pass over trace events — benchmarking showed it was slower than inline counting. Replace pre-allocated vec![0; n] + fill loop for cumulative_magic_states and cumulative_infidelity with iterator collect, avoiding two allocator calls and ~60 KB of unnecessary zero-initialization.
cumulative_magic_states and cumulative_infidelity were pre-computed Vec fields on ExecutionProfile, adding 2 allocations + 2 O(n) passes to every analyze() call. Both are trivially derivable from magic_states_per_bucket and p_logical. Move them to computed methods so the cost is paid only by callers that need the timeline curves.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
Fix ~20-25% performance regression in
ProfileAnalyzer::analyzeintroduced by error budget tracking, apply the same fix tomonte_carlo::extract_summary, and tighten module/type visibility across pirx-core.Why
CodSpeed flagged
analyze[500]at -25.49%,analyze[100]at -22.85%,analyze[10]at -19.7% comparing master(71f67ed) against the error budget baseline (8bc4d1a). The regression was caused by commit 92cc496 restructuring the
GateServedmatch arm from a guarded pattern to an unguarded one, making every zero-wait event pay for asaturating_add+ bounds-checked array write it previously skipped. The same redundant-counter pattern existed inmonte_carlo::extract_summary. Several internal-only types and modules were unnecessarilypub, widening the API surface beyond what downstream crates consume.How
Analyzer (analyzer.rs):
GateServed { gate, wait } if *wait > 0) for the stall+counting path; added aseparate
GateServed { .. }arm for zero-wait bucket counting only.total_magic_statesloop counter withtrace.magic_states_consumed(already computed by the engine).cumulative_magic_statesprefix-sum andcumulative_infidelitycomputation into a singlepass, eliminating a separate
.map().collect()allocation.Monte Carlo (monte_carlo.rs):
magic_states_consumedcounter; readstrace.magic_states_consumeddirectly.GateServedguard so zero-wait events fall through to the no-op arm.Visibility (lib.rs, buffer.rs, events.rs, routing.rs):
buffer,events,routingmodules:pub→pub(crate).ScalarRouting,ManhattanRouting,RoutingKind,build_position_index,from_config:pub→pub(crate).MagicStateBuffer,EngineEvent,TimedEvent,EventQueue:pub→pub(crate).MagicStateBuffer::is_full,is_empty,EventQueue::is_empty: gated behind#[cfg(test)](test-only usage).Testing
make cipasses locally (fmt + clippy + test + audit)trace_analysis/analyze/{10,100,500})Checklist
unwrap()/expect()in production code