fix: replace class-name walker-gate heuristic with property-based gate#618
Merged
fix: replace class-name walker-gate heuristic with property-based gate#618
Conversation
3 tasks
57d8e2a to
ce94475
Compare
7 tasks
824524f to
f453083
Compare
Removes the hard-coded class-name list (`Class::MOP/Moose/Moo`) that
the cooperative-refCount walker gate used to decide whether to
suppress DESTROY at refCount→0. Replaces it with a direct property
check that captures the underlying lifetime semantics.
When PerlOnJava's cooperative reference count for a blessed
RuntimeBase transitions to 0, `MortalList.flush()` fires DESTROY.
But cooperative refCount can transient-zero mid-statement during
deferred-decrement processing — even when the object is genuinely
held by a long-lived owner.
Master worked around this by gating DESTROY on a class-name
heuristic: only fire if the class is NOT in
`Class::MOP/Moose/Moo`, otherwise consult the reachability walker
and rescue if reachable from roots. The heuristic was correct for
those modules (their metaclasses live in `our %METAS` package
hashes) but didn't extend to other module-global metadata, and
was unsatisfying as it baked specific class names into the runtime.
Capture the underlying property: an object is "module-global
metadata" if it is stored as a value in a package-global hash
(e.g., `$METAS{Foo} = $meta`). Such objects:
- Have program-lifetime persistence held by the package hash.
- Cooperative refCount may transient-zero during method calls.
- Should NOT fire DESTROY when refCount hits 0 if the walker
confirms reachability.
Implementation:
1. New `RuntimeHash.isGlobalPackageHash` — set when the hash is
registered in `GlobalVariable.globalHashes`.
2. New `RuntimeBase.storedInPackageGlobal` — set when a value is
stored as an element of a package-global hash via
`RuntimeBaseProxy.set` for `RuntimeHashProxyEntry`.
3. Walker gate at `MortalList.flush` and
`RuntimeScalar.setLargeRefCounted` overwrite path checks
`base.storedInPackageGlobal` instead of
`classNeedsWalkerGate(base.blessId)`.
- `ReachabilityWalker.isScalarReachable(target)`: walks roots
looking for a specific scalar identity (existing
`isReachableFromRoots` only matched RuntimeBase referents).
Skips weak refs and follows closure captures.
- `RuntimeBase.activeOwners` set + `reachableOwnerCount()` —
ready for production use as a more precise rescue criterion;
not yet activated in production (the property-based gate is
sufficient for current modules).
Audited all direct `--base.refCount` sites and added paired
`releaseActiveOwner` calls so the per-base owner set stays in
sync:
- `RuntimeArray.shift/pop` — `MortalList.deferDecrement` path
- `RuntimeList.java` — 4 "undo materialized copy" paths
- `Storable.releaseApplyArgs`
- `DestroyDispatch.doCallDestroy` — args.push balance
- `MortalList.deferDecrementIfTracked`
- `MortalList.deferDecrementRecursive`
- `MortalList.deferDestroyForContainerClear`
- `WeakRefRegistry.weaken`
- `RuntimeScalar.setLargeRefCounted` (overwrite + undefine)
- `RuntimeStash.dynamicRestoreState`
- `PJ_REFCOUNT_TRACE`: per-RuntimeBase refCount transition trace
with stack snippets, owner record/release events, and JVM
shutdown-hook owner dump. Activated for objects matching the
classNeedsWalkerGate heuristic at bless time. Used for the
diagnostic work that led to this fix.
- `PJ_WEAKCLEAR_TRACE`: weaken/removeWeakRef/clearWeakRefsTo
trace. Used to identify the original refCount underflow
symptom in `Class::MOP::Attribute.pm:475`.
- `PJ_DESTROY_TRACE`: existing — `DestroyDispatch.callDestroy`
trace.
New regression tests under `src/test/resources/unit/refcount/drift/`:
- `our_metas_underflow.t` (20 assertions): storage in package
hash, method-chain temps with weakened back-refs, multi-meta
cross-linking.
- `cmop_pattern.t` (11 assertions): exercises the CMOP shape
with non-CMOP class names.
Both pass on master and serve as regression guards.
All green WITHOUT the class-name heuristic:
- ✅ Unit tests (`make`): PASS
- ✅ `use Class::MOP; use Moose;`: PASS
- ✅ `examples/moose.pl` (full Moose demo, 7 test groups): PASS
- ✅ DBIC `t/52leaks.t`: 11/11 PASS
- Per-store cost: one boolean check + one flag set if the
destination scalar's parent hash is package-global. O(1) per
store; zero-cost for stores not into package globals.
- Walker cost: same gating as the old heuristic — only fires
when the object has weak refs AND `storedInPackageGlobal`
AND refCount→0. Same performance envelope.
- Diagnostic instrumentation: zero overhead when env-flags are
off (single boolean check per call site).
`dev/modules/moose_support.md` D-W6.7 through D-W6.18 documents
the full investigation: root-cause analysis, walker-gate
heuristic origins, alternative approaches considered (universal
walker, FREETMPS-style flush, Java-GC-based ground truth), and
why the property-based gate is the correct minimal fix.
The `classNeedsWalkerGate` method remains in
`DestroyDispatch.java` (used only by the diagnostic tracer at
bless time). Production code paths no longer reference the
class-name list. Follow-up cleanup can delete it entirely once
diagnostic activation is reworked.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Previously, `scalar(values %h)` worked when called directly, but `return values %h` from a sub collapsed to the last value instead of the count when the caller was in scalar context. Same for arrays. Root cause: RuntimeHash.keys() set scalarContextSize on the returned RuntimeArray so list-to-scalar conversion later yields the count. RuntimeHash.values() and RuntimeArray.values() did not, so any list-collapse path (sub return, comma operator) used last-element semantics. Fix: - RuntimeHash.values() sets scalarContextSize on the returned array for both regular and TIED_HASH paths. - RuntimeArray.values() returns a fresh array aliasing the elements (so `for (values @A) { $_++ }` still mutates @A) with scalarContextSize set, so the original @A's scalar-context behaviour is unaffected. Surfaced by Moose's `Class::MOP::Class::get_all_attributes` ending in `return values %attrs`, which made every `scalar $meta->get_all_attributes` return a stringified Attribute instead of a count, breaking 3 subtests in MooseX::Role::Parameterized's t/001-parameters.t. Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
The property-based walker gate (commit ce94475) replaced a tiny class-name allowlist (Class::MOP/Moose/Moo) with a per-object flag storedInPackageGlobal. This semantically correct change widened the gate to fire on far more blessed objects -- DBIx::Class stores many blessed instances (Schema, ResultSource, Storage::DBI) into package- global hashes and uses weaken() liberally for back-refs, so most of its objects now satisfy the gate at MortalList.flush(). Each gate fire ran ReachabilityWalker.isReachableFromRoots(target), an O(globals) BFS. With N targets per flush, total cost was O(N x globals) -- quadratic. Surfaced as a 100% CPU spin in DBI- generated SQL dispatch code under HARNESS_OPTIONS=j4 parallel runs (t/100populate.t, t/101populate_rs.t, t/100extra_source.t etc. never completed; standalone runs stayed fast because their flush queues are smaller). Fix: compute the reachable set once per flush via the existing ReachabilityWalker.walk() and reuse for O(1) membership lookups. Cleared in the finally block so subsequent flushes recompute against fresh global state. Verified: - DBIx::Class parallel-harness batch that previously hung now passes in 47s (16 tests, 327 subtests). - DBIx::Class t/52leaks.t still passes (the leak-detection test the walker gate exists to protect). - Moo 841/841, Moose 5/5, Template Toolkit 2660/2660 unchanged. - make unit tests pass. Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
User reported a `Unable to perform storage-dependent operations with a detached result source` failure at t/52leaks.t:208 on this branch. Could not reproduce locally (5/5 standalone runs pass, prove and Test::Harness runs pass). Document the symptom, leading hypothesis (per-flush cache staleness across DESTROY of related objects in same pending list), candidate fixes, and next steps so the issue is tracked even though no code change is being made yet. Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
The reported t/52leaks.t failure was actually exercising PerlOnJava4's WIP storable-native-format branch via a stale Makefile in ~/.perlonjava/cpan/build/DBIx-Class-0.082844-81/ (generated by an earlier `cpan` install when PerlOnJava4 was the active checkout). Re-running the same test_harness invocation with PerlOnJava2's jperl shows t/52leaks.t passes cleanly. Update the section to record the misdiagnosis and the lesson (check Makefile's hardcoded jperl path before assuming `make test` exercises the current branch). Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
The JVM auto-defaults MaxHeapSize to 1/4 of system RAM. On a 48 GB machine that's 12 GB per jperl. Test::Harness/IPC::Open3 spawns one child JVM per .t file (jcpan -t Module = parent + child); with the auto-default each parent+child pair reserves 24 GB of address space and quickly drives the OS into swap. The visible symptom is a 100% CPU "hang" inside Sub::Defer-generated DBIC accessors — actually G1 GC thrashing on a heap that can never settle because the memory pressure causes constant pause/scan cycles. DBIx::Class is the canonical reproducer: `time ./jcpan -t DBIx::Class` hung at t/96_is_deteministic_value.t with the JVM child showing elapsed=86s cpu=84s and the main thread perpetually inside Sub::Defer.pm:2382 (the deferred-sub re-entry point), even though the same test runs standalone in 8s on a cold JVM. Fix: cap heap at -Xmx=4G with -XX:SoftMaxHeapSize=2G as a soft target the G1 GC tries to stay under during normal operation. 4G hard cap accommodates the occasional large test (re/pat.t with big regex datasets, op/pack.t with 300KB strings) without OOM. 2G soft target keeps the typical case feasible when many JVMs run concurrently — N parents+children fit on a 48 GB machine without swap thrashing. Both -Xmx and -XX:SoftMaxHeapSize remain overridable through JPERL_OPTS in the environment for users who want different limits. Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
The previous heap defaults (-Xmx4g + -XX:SoftMaxHeapSize=2g) caused DBIx::Class t/96_is_deteministic_value.t to hang at 100% CPU when run via Test::Harness, even though heap usage stays under 200 MB. Hypothesis: SoftMaxHeapSize triggers G1 GC to run more aggressive mixed collections, and the resulting young-GC cadence interacts badly with PerlOnJava's cooperative-refcount + weak-ref machinery in Sub::Defer-style deferred-sub re-entry — the deferred sub's $undeferred closure slot loses its strong ref to the just-undeferred code under the extra GC pressure, so the next call re-undefers, producing the observed busy-loop in eval N:983 / eval N:1046. Verified mapping (DBIx-Class 0.082844 t/96_is_deteministic_value.t via Test::Harness on a 48 GB machine): -Xmx auto (=12g) -> PASS in 7s -Xmx4g (no SoftMax) -> PASS in 14s -Xmx2g -> PASS in 13s -Xmx1g -> PASS in 12s -Xmx4g -XX:SoftMaxHeapSize=4g (==Xmx) -> PASS in 7s -Xmx4g -XX:SoftMaxHeapSize=3g -> PASS in 8s -Xmx4g -XX:SoftMaxHeapSize=2g -> HANG (100% CPU) -Xmx2g -XX:SoftMaxHeapSize=1g -> HANG (100% CPU) The bug fires whenever SoftMaxHeapSize is set BELOW Xmx. Removing all heap settings (letting the JVM auto-default to 1/4 of system RAM) is the cleanest fix — it honors per-machine appropriate defaults instead of imposing an arbitrary cap, and avoids the SoftMaxHeapSize trap entirely. Memory usage actually stays under 200 MB regardless of Xmx, so the auto-default of 12 GB on a 48 GB machine doesn't translate into 12 GB of resident memory; it's just virtual address space reservation, which is essentially free. Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
5bd54bd to
b646714
Compare
5 tasks
fglock
added a commit
that referenced
this pull request
Apr 30, 2026
Adds an Investigation Plan section to dev/modules/dbix_class.md for the NEW failure mode observed today under `./jcpan -t DBIx::Class`: DBIx::Class::ResultSource::schema(): Unable to perform storage- dependent operations with a detached result source (source 'Artist' is not associated with a schema). at t/52leaks.t line 430 This is distinct from the existing "tests 12-18 leak detection at line 526" entry — that's a leak (objects not getting destroyed), this is the opposite (a schema getting destroyed too eagerly while a child resultset still expects it). Test passes standalone (11/11 in 46s); only fails when ~20+ prior DBIC tests have run through the same harness JVM. Suspected cause: the walker-gate property fix in PR #618 (commit ce8186e) widened DESTROY gating to every storedInPackageGlobal object — under cumulative state pressure, the gate fails to rescue a Schema/ResultSource pair, causing the weak ref from RS → Schema to read as undef. The plan section includes: - exact symptom + reproducer - code path that triggers it - hypothesis - 4-step diagnostic plan (bisect prefix, instrument Java side, reachability check, c4db69e-baseline verification) - what's NOT the cause (parent harness JVM is 99.7% idle in select polling) - "why we can't ship" — DBIx::Class is published as PASS in the CPAN compatibility report Generated with [Devin](https://devin.ai) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
fglock
added a commit
that referenced
this pull request
Apr 30, 2026
Adds an Investigation Plan section to dev/modules/dbix_class.md for the NEW failure mode observed today under `./jcpan -t DBIx::Class`: DBIx::Class::ResultSource::schema(): Unable to perform storage- dependent operations with a detached result source (source 'Artist' is not associated with a schema). at t/52leaks.t line 430 This is distinct from the existing "tests 12-18 leak detection at line 526" entry — that's a leak (objects not getting destroyed), this is the opposite (a schema getting destroyed too eagerly while a child resultset still expects it). Test passes standalone (11/11 in 46s); only fails when ~20+ prior DBIC tests have run through the same harness JVM. Suspected cause: the walker-gate property fix in PR #618 (commit ce8186e) widened DESTROY gating to every storedInPackageGlobal object — under cumulative state pressure, the gate fails to rescue a Schema/ResultSource pair, causing the weak ref from RS → Schema to read as undef. The plan section includes: - exact symptom + reproducer - code path that triggers it - hypothesis - 4-step diagnostic plan (bisect prefix, instrument Java side, reachability check, c4db69e-baseline verification) - what's NOT the cause (parent harness JVM is 99.7% idle in select polling) - "why we can't ship" — DBIx::Class is published as PASS in the CPAN compatibility report Generated with [Devin](https://devin.ai) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Removes the hard-coded class-name walker-gate heuristic
(
Class::MOP/Moose/Moo) and replaces it with a property-basedgate that captures the underlying lifetime semantics directly.
The Problem
When PerlOnJava's cooperative reference count for a blessed object
transitions to 0,
MortalList.flush()fires DESTROY. But cooperativerefCount can transient-zero mid-statement during deferred-decrement
processing — even when the object is genuinely held by a long-lived
owner (like
our %METAS).Master's workaround was a class-name heuristic gate: if the class is
in
Class::MOP/Moose/Moo, consult the reachability walker beforefiring DESTROY; otherwise, fire normally. This worked but was
unsatisfying — it baked specific class names into the runtime and
didn't extend to other module-global metadata.
The Solution
Capture the underlying property: an object is "module-global metadata"
if it's stored as a value in a package-global hash (e.g.,
$METAS{Foo} = $meta).Walker gate at
MortalList.flushand the mirror site inRuntimeScalar.setLargeRefCountednow checksbase.storedInPackageGlobalinstead of
classNeedsWalkerGate(base.blessId).For DBIC rows: never stored in a package-global hash → flag never set
→ standard refCount → DESTROY on
undef $rowcorrectly fires.For CMOP metaclass: stored in
$METAS{...}→ flag set → walkerchecks reachability → rescue across transient zeros.
Verification
All green WITHOUT the class-name heuristic:
make)use Class::MOP; use Moose;examples/moose.pl(full Moose demo, 7 test groups)t/52leaks.tPerformance
a package-global hash. O(1); zero-cost for stores not into package
globals.
object has weak refs AND
storedInPackageGlobalAND refCount→0.Same performance envelope.
check per call site).
What's Included
This unified PR includes:
class-name heuristic.
releaseActiveOwnercalls at all known direct--refCountsites(
RuntimeList,Storable,RuntimeArray.pop/shift,RuntimeStash,DestroyDispatch,WeakRefRegistry.weaken,etc.).
ReachabilityWalker.isScalarReachablewalks roots looking forscalar identity (existing walker only matched RuntimeBase
referents); skips weak refs; follows closure captures.
activeOwnersset +
reachableOwnerCount()infrastructure ready for futureuse.
PJ_REFCOUNT_TRACE,PJ_WEAKCLEAR_TRACE— gated; zero-cost off; used for thediagnostic work that led to this fix.
src/test/resources/unit/refcount/drift/(
our_metas_underflow.t+cmop_pattern.t).dev/modules/moose_support.mdD-W6.7 throughD-W6.18 — full investigation, alternatives considered, and
rationale for the chosen approach.
Why This Is Correct
The heuristic worked because it captured a real property —
"objects whose lifetime is module-global metadata". This PR moves
from "approximate by class name" to "exact via flag set at store
time".
package-global hashes (extensibility).
modules.
ad-hoc class-name list.
Alternatives Considered (and Rejected)
underlying refCount imbalance — flush schedule changes WHEN
decrements fire, not WHETHER they're paired (D-W6.17).
detection by 4-9 tests due to phantom owners (D-W5/W6.16).
in practice; even fixed, the heuristic remains (orthogonal).
current iteration.
Follow-Up
The
classNeedsWalkerGatemethod remains inDestroyDispatch.java(used only by the D-W6.10 diagnostic tracer at bless time).
Production code paths no longer reference the class-name list.
A follow-up cleanup commit can delete it entirely once diagnostic
activation is reworked.
Test plan
makepasses (full unit test suite)use Class::MOP; use Moose;works WITHOUT heuristicexamples/moose.plpasses (full Moose demo)t/52leaks.t11/11 passesGenerated with Devin