perf(concat_source): skip Mutex acquire in add() under &mut self#232
perf(concat_source): skip Mutex acquire in add() under &mut self#232
Conversation
`ConcatSource::add` takes `&mut self`, which already guarantees exclusive access to the inner children Vec. Locking the Mutex was a redundant atomic acquire/release on a hot construction path — rspack's JS code generation calls `add` hundreds of times per chunk (e.g. concatenated module rendering, runtime module assembly, parallel rayon folds). Switch to `Mutex::get_mut()` which is the standard zero-overhead alternative when exclusive access is statically known. All other read-only accessors (`Clone`, `Debug`, `optimized_children`) still acquire the lock.
There was a problem hiding this comment.
Pull request overview
This PR optimizes ConcatSource::add by avoiding an unnecessary Mutex lock when the method already has exclusive access via &mut self, reducing synchronization overhead in a hot path.
Changes:
- Replaced
self.children.lock().unwrap()withself.children.get_mut().expect(...)insideConcatSource::add. - Added an explanatory comment documenting why the lock can be skipped safely in this method.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Merging this PR will not alter performance
Performance Changes
Comparing Footnotes
|
a35cad4 to
82da852
Compare
What
Replace
lock().unwrap()withget_mut().unwrap()insideConcatSource::add, removing one redundant atomic acquire/release per call. Other&selfaccess points (Clone,Debug::fmt,optimized_children, and the downcast paths insideaddthat touch otherConcatSourceinstances) still uselock().Why
addalready takes&mut self, so the borrow checker statically guarantees exclusive access to the innerVec<BoxSource>— the runtime synchronization aMutexprovides is unnecessary here. Beyond the perf consideration, this also makes the intent more accurate:&mut selfshould not need to synchronize with anyone.The existing benchmarks build a
ConcatSourceonce during setup and then loop over.map().to_json(), so they exercise the cached read path rather thanadd— which is why CodSpeed shows no change here. Real Rspack hot paths useaddvery differently:rspack_plugin_javascript/src/runtime.rsdoes apar_iter().fold(ConcatSource::default, |mut acc, src| { acc.add(src); acc }), andrspack_core's concatenated module rendering chains 300+addcalls per module.I ran a targeted local microbenchmark (Apple Silicon, 50 samples × 8s, criterion
--baseline, repeated runs to confirm reproducibility): aConcatSourcebuilt by 500 sequentialaddcalls dropped from 8.29 µs to 5.05 µs (−39%, p < 0.001, ~1% CI width); the 16-add variant dropped from 421 ns to 278 ns (−32%). That works out to ~6–7 ns saved peradd, matching the cost of aMutexacquire/release on Apple Silicon. The microbenchmarks themselves are landing in a separate PR so this one stays a minimal, focused change.How
src/concat_source.rs::add:&mut *self.children.lock().unwrap()→self.children.get_mut().expect(...).get_mutborrows the wholeMutexmutably instead of going through a guard, equivalent to the original mutable-borrow semantics; the subsequentself.is_optimized.take()still works because it's a disjoint field borrow.cargo clippy --releaseandcargo fmt --checkboth clean.