Skip to content

fix(core): remove routing magic number, add BottleneckType::RoutingContention, harden analyzer and public API#19

Merged
m2papierz merged 4 commits into
masterfrom
feat/core-perf
Jun 14, 2026
Merged

fix(core): remove routing magic number, add BottleneckType::RoutingContention, harden analyzer and public API#19
m2papierz merged 4 commits into
masterfrom
feat/core-perf

Conversation

@m2papierz

Copy link
Copy Markdown
Contributor

What

Five targeted fixes across pirx-hw, pirx-core, and pirx-ir: remove the scalar routing magic number, add
#[must_use] to key public functions, document the TraceEvent size budget, add defensive clearing in the
analyzer, and align BottleneckType with the architecture spec.

Why

  • overhead_fraction: The * 10.0 multiplier in scalar routing was an undocumented magic number. Users setting overhead_fraction = 0.5 in TOML had no way to know it meant 5 cycles without reading the source.
  • #[must_use]: Engine::run(), run_monte_carlo(), ProfileAnalyzer::analyze(), validate(), and load() all return expensive results that are silently discarded if not captured — a common source of bugs in integration code.
  • TraceEvent size: The bare assert!(size == 32) produced a cryptic error with no guidance on how to fix it when adding new variants.
  • factory_starts: The analyzer relied on an implicit engine invariant (FactoryStarted always follows FactoryProduced/Failed). Clearing consumed starts makes it self-contained — correct regardless of trace ordering guarantees, future-proofing for external trace ingestion.
  • BottleneckType: The spec (§4.5) defines four types including RoutingContention. The code had only three, and Balanced was semantically wrong (stall + factory failure instead of stall + routing contention). Factory failures are part of factory throughput, not a separate bottleneck dimension.

How

  • overhead_fraction → overhead_cycles: RoutingConfig::Scalar now holds overhead_cycles: u32 (default 5, preserving the previous effective default). The from_config() conversion in pirx-core becomes a direct field copy. InvalidOverheadFraction error variant removed — u32 needs no range validation.
  • #[must_use]: Annotations with descriptive messages on 5 public functions across 4 crates.
  • TraceEvent: Added a block comment explaining memory budget (320 GB vs 480 GB at scale), cache efficiency (32 B = half a cache line), and three mitigation strategies. Separate assertion on TraceEventKind (24 bytes) pinpoints which part grew.
  • factory_starts: FactoryProduced and FactoryFailed handlers set factory_starts[id] = None after consuming the start cycle. Naive reference in difference_array_matches_naive_reference updated to match.
  • BottleneckType: Added RoutingContention variant. Replaced HAD_FAILURE with HAD_ROUTING tracking RoutingCompleted { latency > 0 }. Classification uses (has_stall, has_routing) match. Two new tests: routing_contention_detected (manhattan routing), balanced_when_stall_and_routing (synthetic trace).

Testing

  • make ci passes locally (fmt + clippy + test + audit)
  • New behavior has tests

Checklist

  • PR description explains why, not just what
  • No new unwrap()/expect() in production code
  • No new allocations in the simulation hot loop
  • Crate boundaries respected (pirx-core never imports from pirx-adapters)

The scalar routing model used an opaque `overhead_fraction: f64` field
that was silently converted via `(fraction * 10.0).ceil() as u32`. This
made the TOML config non-self-documenting: setting `overhead_fraction =
0.5` actually meant 5 cycles, which is not obvious to users.

Replace with a direct `overhead_cycles: u32` field (default 5, preserving
the previous effective default). Remove `InvalidOverheadFraction` error
variant since u32 needs no range validation. Update all TOML models,
test fixtures, and tests.
…alidate()

Prevents silent discard of simulation results. Engine::run() returns a
Trace, run_monte_carlo() returns a MonteCarloResult, and validate()
returns a ValidatedCircuit — all expensive to compute and useless if
not captured.
…uide

Replace the bare `assert!(size_of::<TraceEvent>() == 32)` with a
documented block explaining why the budget exists (memory + cache
efficiency), which variant is currently the largest, and three
mitigation strategies for future contributors. Add a separate
assertion on TraceEventKind (24 bytes) to pinpoint which part grew
if the budget is exceeded.
…ng contention

Two related fixes in the profile analyzer:

1. Clear factory_starts after consumption in FactoryProduced/FactoryFailed
   handlers. The analyzer previously relied on an implicit invariant that
   FactoryStarted always follows FactoryProduced/Failed. Clearing consumed
   starts makes the analyzer self-contained and correct regardless of trace
   ordering guarantees. Naive reference in test updated to match.

2. Add RoutingContention variant to BottleneckType to match the architecture
   spec (§4.5). Replace HAD_FAILURE with HAD_ROUTING — factory failures are
   part of factory throughput, not a separate bottleneck dimension. Track
   RoutingCompleted events with latency > 0. Classify buckets via
   (has_stall, has_routing) → Balanced/FactoryThroughput/RoutingContention/None.

Add #[must_use] to ProfileAnalyzer::analyze().
@codspeed-hq

codspeed-hq Bot commented Jun 14, 2026

Copy link
Copy Markdown

Merging this PR will not alter performance

✅ 15 untouched benchmarks


Comparing feat/core-perf (de487af) with master (9ec9291)

Open in CodSpeed

@m2papierz m2papierz merged commit d674fd6 into master Jun 14, 2026
11 checks passed
@m2papierz m2papierz deleted the feat/core-perf branch June 14, 2026 08:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant