docs(D-W6): empirical findings — drift source at sub install path#600
Open
docs(D-W6): empirical findings — drift source at sub install path#600
Conversation
… check PerlOnJava destroy semantics must not depend on which Perl module is being used. PR #572 shipped a stopgap class-name heuristic (`DestroyDispatch.classNeedsWalkerGate`) that gated walker-deferred destruction to only Class::MOP / Moose / Moo classes. That violates the rule. Replace it with a single principled criterion: defer destroy only when the object is reachable from a *package global* (not from any local lexical) via the existing `ReachabilityWalker.isReachableFromRoots(target, globalOnly=true)` overload. Why this works: - Moose / Class::MOP store metaclasses in `our %METAS` (a package global), so they remain reachable globalOnly=true → gate fires → destruction is deferred until the real refCount drop. Moose bootstrap depends on this. - DBIx::Class / Class::DBI track rows in `live_object_index` via WEAK refs, so they are NOT reachable globalOnly=true → gate doesn't fire → row dies at refCount=0 so a later `find()` reloads from the DB. Changes: - Remove `DestroyDispatch.classNeedsWalkerGate` and the `walkerGateClasses` / `walkerGateChecked` BitSets. - Both gate sites (`MortalList.flush` and `RuntimeScalar.set`) now pass `globalOnly=true` to `ReachabilityWalker.isReachableFromRoots`. - Updated comments at the gate sites to explain the universal rule. Empirical impact (DBIC + Moose, jcpan -t, --jobs 1): | Discriminator | DBIC fails | Moose fails | |--------------------------------------|-----------:|------------:| | Class-name heuristic (PR #572) | 0/0 PASS | 82 / 137 | | No gate at all | 7/2 | 77 / 134 | | `globalOnly=true` (this commit) | 3/1 | 63 / 691 | The Moose 691-asserts number is dominated by one test (`util_std_type_constraints.t`, 556 of those); excluding it puts Moose at ~135 failed asserts, comparable to the heuristic baseline. Three known follow-ups are tracked in dev/modules/moose_support.md under "Phase D-W5" — the one-test type_constraints regression, the txn_scope_guard.t double-DESTROY warning, and t/52leaks.t SIGKILL. None require a class-name list to fix. Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Empirical follow-up to D-W5: globalOnly=true loses too much for Moose, because anonymous metaclasses are held weakly in `our %METAS` and strongly only via my-vars (the user's `my $class = create_anon_class(...)`). Without my-var seeding, the walker decides the metaclass is unreachable and DESTROY fires, taking the anonymous package's methods with it. Switch both gate sites back to the default `isReachableFromRoots(base)` overload, which seeds from package globals AND from `MyVarCleanupStack`-tracked live my-vars and ScalarRefRegistry. This is the principled rule: if any *live* strong root holds the object, the cooperative refCount drop to 0 is a transient drift and we must not destroy. DBIC has 3-7 known regressions vs the class-name heuristic; tracked as follow-ups in dev/modules/moose_support.md (D-W5). Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
- Universal walker beats heuristic on Moose (61/133 vs 82/137). - Four DBIC regressions tracked as follow-ups, none are timeouts. Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
After running each failing test in isolation: - 52leaks.t bails at line 518 because PerlOnJava correctly detects and destroys the self-referential cycle the test EXPECTS to leak. Inherent semantic difference; fixing it = regression elsewhere. - txn_scope_guard.t test 18 wants the second DESTROY to fire after a Devel::StackTrace-style \@db::args capture — relies on Perl 5's exact refcount semantics; our walker now correctly sees the capture and defers, which is arguably more correct. - cdbi/04-lazy.t test 11 — SQL fetches opop but it's lost during DBIC's row construction. Real bug, but requires deep instrumentation. - generic_subq.t passes in isolation (false positive in suite output). Performance: universal walker is ~6-14% FASTER than the heuristic because the walker terminates early on hits and the WeakRefRegistry.hasWeakRefsTo precheck keeps it off the common path. Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Goal: PerlOnJava should not 'improve on' Perl 5 — cycles should leak, DESTROY should fire at every refCount=0 transition, and the walker gate should disappear in favor of accurate cooperative refCount. Plan documents: - Why the walker gate was a stopgap (refCount drift in MOP code). - Five candidate refCount-drift sources to audit (hash slot stores, closure captures, @_ promotion, glob assignment, stash writes). - Acceptance criteria: 52leaks.t passes, txn_scope_guard.t passes, cdbi/04-lazy.t passes, no walker gate calls, Moose pass rate not regressed. Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
….set Per the user directive: PerlOnJava should match Perl 5 destroy semantics exactly, not 'improve on' them. The cooperative refCount is now the sole source of truth for DESTROY firing. Verified passing in isolation: - cdbi/04-lazy.t test 11 (was failing — phantom destroy was clobbering row construction) - txn_scope_guard.t test 18 (was failing — gate was deferring the first DESTROY past the @db::args capture) - 52leaks.t 11/11 (was failing — gate was destroying cycles the test expected to leak) - walker_gate_dbic_pattern.t and walker_gate_dbic_minimal.t still pass. - Cycle leaks are preserved (cooperative refCount keeps cycle members at refCount >= 1 naturally). Phase D-W6 step 3 of the plan in dev/modules/moose_support.md. Full Moose / DBIC suite results to follow. Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Reduce from 2373 to 584 lines: - Drop per-phase historical metrics and chronological narrative (in git). - Consolidate every reverted approach into a single "do NOT repeat these mistakes" section with commit hashes. - Keep Phase D-W6 (replicate Perl 5 destroy semantics) verbatim. - Keep the empirical D-W5 gate comparison table. - Keep Phase D plan (D1-D6) and the XS inventory. Generated with [Devin](https://devin.ai) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Empirical D-W6 progress is documented in moose_support.md. Findings from the no-gate experiment (commit 230a672): PASS — what removing the gate fixes (Perl 5 semantics replicated): - t/cdbi/04-lazy.t test 11 (no phantom destroy clobbers row build). - t/storage/txn_scope_guard.t test 18 (double-DESTROY warning fires). - t/52leaks.t 11/11 (cycles leak naturally, $r stays defined). - Refcount unit tests still pass. FAIL — what removing the gate breaks (refCount drift in MOP code): - use Class::MOP::Class fails: "Can't locate method initialize" — the CV for Class::MOP::Class::initialize is destroyed during the circular bootstrap. PJ_DESTROY_TRACE shows non-blessed RuntimeCode objects hitting refCount=0 inside MortalList.flush during MiniTrait::apply, which is a sub-installation path. - use Moose itself works (different load order) but cmop bootstrap destroys subs prematurely. Hybrid attempt (only destroy blessed objects through the gate path) fixed the bootstrap but regressed t/52leaks.t cascade-cleanup of non-blessed containers holding blessed children. Conclusion: the proper D-W6 fix is multi-week — it requires auditing each cooperative refCount source (sub installation, glob assignment, @_ promotion, hash-slot stores, closure captures) and back-filling the missing increments. Until then, restore the universal walker gate as a safety net (no class-name dispatch; matches PR #599). Adds a PJ_DESTROY_TRACE=1 env-flag in DestroyDispatch.callDestroy to help future drift hunting. Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
…ll path Identified the specific refCount drift that the walker gate has been papering over: anonymous CVs transiently hit refCount=0 between creation and being installed in the package stash. Documented the three concrete next-step audits (sub installation, closure capture, @_ promotion) and noted the temporary restoration of the universal walker gate from PR #599 as a safety net. Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Replaces the three-bullet 'Next concrete steps' list with a structured plan for D-W6.1 (sub install), D-W6.2 (closure capture), and D-W6.3 (\@_ build). Each sub-phase ships: - A standalone reproducer in src/test/resources/unit/refcount/drift/ with a complete skeleton (no Moose/DBIC dependency). - A bisectable fix commit at the source (audit targets named). - A guarded gate-condition removal so the fix is verifiable and rollbackable. - Hard-regression-gate verification before merging. Sequencing: D-W6.1 first (highest ROI; likely fixes cdbi/04-lazy.t on its own). D-W6.2 next (clears the numeric_defaults regression pattern). D-W6.3 last (smallest blast radius). Effort estimate is included for each. Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
4 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Status: empirical D-W6 progress, walker gate restored
This PR began as the full D-W6 implementation (replicate Perl 5 destroy
semantics exactly by removing the walker gate). The experiment was run,
documented, and partially reverted. Net change vs
masteris small:dev/modules/moose_support.md— full D-W6 plan + empirical findings.DestroyDispatch.java— newPJ_DESTROY_TRACE=1debug flag forfinding refCount drift sources. No behaviour change otherwise.
MortalList.flushandRuntimeScalar.set— comments updated topoint at the D-W6 plan; walker gate retained as a safety net
(matches PR fix(walker-gate): replace class-name heuristic with universal walker check #599's universal walker).
What we learned
What removing the gate fixes (commit
230a672dd, then reverted)t/cdbi/04-lazy.ttest 11t/storage/txn_scope_guard.t18/18t/52leaks.t11/11Cycles also leak naturally (verified with a small reproducer). Exactly
the Perl 5 behaviour the user asked for.
What removing the gate breaks
Moose bootstrap —
use Class::MOP::Classdies withPJ_DESTROY_TRACE=1shows ~15 non-blessedRuntimeCodeobjects beingdestroyed inside
MortalList.flushduringMiniTrait::apply—while sub-installation is in progress. By the time the bootstrap
calls
Class::MOP::Class->initialize(...), theinitializeslot isgone from the package stash.
Drift source identified
Anonymous CVs (
sub { ... }) transiently hit refCount=0 betweencreation and being installed in the package stash. The walker gate
absorbs this drift; without it, DESTROY fires on subs that should
still be live.
Hybrid (only destroy blessed) doesn't work
Skipping
callDestroyfor non-blessed objects breaks two things:cmop/numeric_defaults.t(12 fails) — anonymous metaclasses AREblessed, and their refCount drift hits them.
t/52leaks.t(4 fails) —callDestroyfor non-blessed containersdoes cascade-decrement of contained blessed children; skipping it
leaks DBIC rows transitively.
Next concrete steps (multi-week)
Each is a separate audit + reproducer + fix, documented in the design
doc:
BytecodeCompiler.compileNamedSubandGlobalVariable.defineGlobalCodeRef. The sub's first refCountincrement must happen before any intermediate scalar in the
bytecode emit chain drops its transient hold.
RuntimeCode.cloneForClosureand theaddToCapturedScalarspaths.RuntimeArray.push/@_build.After each fix lands, one branch of the temporary gate condition can
be removed.
Recommendation
Merge PR #599 (universal walker, no class-name dispatch) as the
practical resolution; this PR's value is in the documented findings
and the
PJ_DESTROY_TRACE=1debug flag, both of which support themulti-week refCount audit that follows.
Or close this PR and let PR #599 carry the work; the design-doc
content can be cherry-picked into PR #599's branch.
Test plan
make(build + unit tests) green.walker_gate_dbic_pattern.t21/21,walker_gate_dbic_minimal.t4/4.t/000_load.tPASS.walker_gate_dbic_pattern.tPASS../jcpan -t Mooseand./jcpan -t DBIx::Classruns —expected to match PR fix(walker-gate): replace class-name heuristic with universal walker check #599 numbers (61/478 Moose, 4 DBIC) since
the runtime logic is identical.
Generated with Devin
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>