perf: gate MyVarCleanupStack emission on per-sub AST analysis#533
Merged
fglock merged 1 commit intofeature/refcount-perf-combinedfrom Apr 21, 2026
Merged
Conversation
A new CleanupNeededVisitor (following the established visitor pattern,
like TempLocalCountVisitor) analyses each sub's body and decides
whether the per-scope-exit MyVarCleanupStack.unregister loop can be
skipped at emit time.
The visitor flags the sub as "needs cleanup" when its body contains
any of:
- bless / weaken / isweak
- local
- defer blocks
- user sub calls (BinaryOperatorNode "(") — callee may return
blessed refs
- method calls (BinaryOperatorNode "->" with IdentifierNode or "("
on the RHS) — array/hash derefs are recognised and NOT flagged
- nested SubroutineNode (closure captures)
Builtins (print, push, chr, length, etc.) are parsed as OperatorNode
and don't trigger the sub-call branch, so they're correctly classified
as safe. Overridden builtins imported via `use subs` resolve to user
sub calls at parse time and are flagged automatically.
When the visitor says no-cleanup-needed, EmitStatement skips ONLY the
Phase-E MyVarCleanupStack.unregister loop. Phases 1, 1b (scopeExitCleanup
on blessed scalars/containers) and Phase 3 (MortalList.flush) are
always emitted — skipping those would break blessed-parameter DESTROY,
tie_scalar untie, DBIC txn_scope_guard, and the destroy_eval_die
tests. The correctness-to-perf tradeoff for these is NOT worth taking.
Perf impact:
life_bitpacked A/B controlled: 8.25 (force-cleanup) -> 8.46 (-2.5%)
5 consecutive runs each, same process, env-toggled
Other benches: within ±2% of baseline noise
Correctness:
- `make` passes except pre-existing destroy_eval_die.t#4
- tie_scalar.t 12/12 ok
- DBIC t/storage/txn_scope_guard.t 18/18 ok
Env-var escape hatch: JPERL_FORCE_CLEANUP=1 disables the analysis
globally, forcing the old (slow) emission path. Use when investigating
suspected correctness regressions attributable to this optimization.
This is a modest but real win and follows a clean architectural
pattern (visitor + flag + emit branch). Future work could extend the
visitor to also detect when Phase 1 / 1b can be skipped, but that
requires static type analysis of scalar contents — deferred to a
separate effort.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
First concrete step on §0 of PR #526's parity plan. New
CleanupNeededVisitor(AST pass, ~200 lines, same pattern asTempLocalCountVisitor) decides per-sub whether theMyVarCleanupStack.unregisteremission can be skipped at scope exit.Measurement protocol
Rigorous A/B within a single process to eliminate JVM-startup and system-load confounds:
Result: 8.46 Mcells/s (optimized) vs 8.25 Mcells/s (forced-cleanup), ~2.5% consistent across 5 runs each. Modest but reproducible.
Other benchmarks stay within ±2% noise.
Why it's a modest win, not a big one
The analysis was initially aggressive (also skipped Phase 1
scopeExitCleanup) and broke DBICtxn_scope_guard.t+tie_scalar.t(blessed params need cleanup regardless of whether the sub itself callsbless). Narrowed scope to Phase E only.Phase E (
MyVarCleanupStack.unregister) is the emission master does NOT have — it's specifically our walker-hardening overhead. Skipping it for simple subs is safe and gives the measured gain. The deeperscopeExitCleanupcosts are shared with master and aren't addressed here.See
dev/design/life_bitpacked_regression_analysis.mdfor the full architectural context.Correctness
makeunit tests: pass except the pre-existingdestroy_eval_die.t#4tie_scalar.t: 12/12 okt/storage/txn_scope_guard.t: 18/18 okJPERL_FORCE_CLEANUP=1env-var escape hatch for correctness debuggingMerge plan
Targets
feature/refcount-perf-combined(not master) so it stacks cleanly into PR #526 before that lands.What's next
Generated with Devin