fix: replace class-name walker-gate heuristic with property-based gate by fglock · Pull Request #618 · fglock/PerlOnJava

fglock · 2026-04-29T11:50:37Z

Summary

Removes the hard-coded class-name walker-gate heuristic
(Class::MOP/Moose/Moo) and replaces it with a property-based
gate that captures the underlying lifetime semantics directly.

The Problem

When PerlOnJava's cooperative reference count for a blessed object
transitions to 0, MortalList.flush() fires DESTROY. But cooperative
refCount can transient-zero mid-statement during deferred-decrement
processing — even when the object is genuinely held by a long-lived
owner (like our %METAS).

Master's workaround was a class-name heuristic gate: if the class is
in Class::MOP/Moose/Moo, consult the reachability walker before
firing DESTROY; otherwise, fire normally. This worked but was
unsatisfying — it baked specific class names into the runtime and
didn't extend to other module-global metadata.

The Solution

Capture the underlying property: an object is "module-global metadata"
if it's stored as a value in a package-global hash (e.g., $METAS{Foo} = $meta).

// New on RuntimeBase:
public boolean storedInPackageGlobal = false;

// New on RuntimeHash:
public boolean isGlobalPackageHash = false;  // set when registered
                                             // in GlobalVariable.globalHashes

// In RuntimeBaseProxy.set, when this is a RuntimeHashProxyEntry:
if (parent.isGlobalPackageHash && lvalue.value instanceof RuntimeBase rb) {
    rb.storedInPackageGlobal = true;
}

Walker gate at MortalList.flush and the mirror site in
RuntimeScalar.setLargeRefCounted now checks base.storedInPackageGlobal
instead of classNeedsWalkerGate(base.blessId).

For DBIC rows: never stored in a package-global hash → flag never set
→ standard refCount → DESTROY on undef $row correctly fires.

For CMOP metaclass: stored in $METAS{...} → flag set → walker
checks reachability → rescue across transient zeros.

Verification

All green WITHOUT the class-name heuristic:

Test	Status
Unit tests (`make`)	✅ PASS
`use Class::MOP; use Moose;`	✅ PASS
`examples/moose.pl` (full Moose demo, 7 test groups)	✅ PASS
DBIC `t/52leaks.t`	✅ 11/11 PASS

Performance

Per-store: one boolean check + one flag set if destination is in
a package-global hash. O(1); zero-cost for stores not into package
globals.
Walker: same gating as the old heuristic — only fires when
object has weak refs AND storedInPackageGlobal AND refCount→0.
Same performance envelope.
Diagnostics: zero overhead when env-flags off (single boolean
check per call site).

What's Included

This unified PR includes:

The fix (D-W6.18): property-based walker gate replacing the
class-name heuristic.
RefCount discipline audit (D-W6.13): paired
releaseActiveOwner calls at all known direct --refCount sites
(RuntimeList, Storable, RuntimeArray.pop/shift,
RuntimeStash, DestroyDispatch, WeakRefRegistry.weaken,
etc.).
Walker improvements (D-W6.14/16):
ReachabilityWalker.isScalarReachable walks roots looking for
scalar identity (existing walker only matched RuntimeBase
referents); skips weak refs; follows closure captures.
Per-RuntimeBase ownership tracking (D-W6.13): activeOwners
set + reachableOwnerCount() infrastructure ready for future
use.
Diagnostic env-flags (D-W6.10/11/12): PJ_REFCOUNT_TRACE,
PJ_WEAKCLEAR_TRACE — gated; zero-cost off; used for the
diagnostic work that led to this fix.
Regression tests: src/test/resources/unit/refcount/drift/
(our_metas_underflow.t + cmop_pattern.t).
Documentation: dev/modules/moose_support.md D-W6.7 through
D-W6.18 — full investigation, alternatives considered, and
rationale for the chosen approach.

Why This Is Correct

The heuristic worked because it captured a real property —
"objects whose lifetime is module-global metadata". This PR moves
from "approximate by class name" to "exact via flag set at store
time".

Class-agnostic: works for any module that stores metadata in
package-global hashes (extensibility).
Behaviorally equivalent to the old heuristic for tested
modules.
Captures the underlying property explicitly rather than via
ad-hoc class-name list.

Alternatives Considered (and Rejected)

Per-statement FREETMPS-style flush: doesn't fix the
underlying refCount imbalance — flush schedule changes WHEN
decrements fire, not WHETHER they're paired (D-W6.17).
Universal walker (no class filter): regresses DBIC's leak
detection by 4-9 tests due to phantom owners (D-W5/W6.16).
Find unpaired refCount site: bounded but unclear if findable
in practice; even fixed, the heuristic remains (orthogonal).
Java-GC ground truth: major rewrite, too risky for the
current iteration.

Follow-Up

The classNeedsWalkerGate method remains in DestroyDispatch.java
(used only by the D-W6.10 diagnostic tracer at bless time).
Production code paths no longer reference the class-name list.
A follow-up cleanup commit can delete it entirely once diagnostic
activation is reworked.

Test plan

make passes (full unit test suite)
use Class::MOP; use Moose; works WITHOUT heuristic
examples/moose.pl passes (full Moose demo)
DBIC t/52leaks.t 11/11 passes
New regression tests pass

Generated with Devin

Removes the hard-coded class-name list (`Class::MOP/Moose/Moo`) that the cooperative-refCount walker gate used to decide whether to suppress DESTROY at refCount→0. Replaces it with a direct property check that captures the underlying lifetime semantics. When PerlOnJava's cooperative reference count for a blessed RuntimeBase transitions to 0, `MortalList.flush()` fires DESTROY. But cooperative refCount can transient-zero mid-statement during deferred-decrement processing — even when the object is genuinely held by a long-lived owner. Master worked around this by gating DESTROY on a class-name heuristic: only fire if the class is NOT in `Class::MOP/Moose/Moo`, otherwise consult the reachability walker and rescue if reachable from roots. The heuristic was correct for those modules (their metaclasses live in `our %METAS` package hashes) but didn't extend to other module-global metadata, and was unsatisfying as it baked specific class names into the runtime. Capture the underlying property: an object is "module-global metadata" if it is stored as a value in a package-global hash (e.g., `$METAS{Foo} = $meta`). Such objects: - Have program-lifetime persistence held by the package hash. - Cooperative refCount may transient-zero during method calls. - Should NOT fire DESTROY when refCount hits 0 if the walker confirms reachability. Implementation: 1. New `RuntimeHash.isGlobalPackageHash` — set when the hash is registered in `GlobalVariable.globalHashes`. 2. New `RuntimeBase.storedInPackageGlobal` — set when a value is stored as an element of a package-global hash via `RuntimeBaseProxy.set` for `RuntimeHashProxyEntry`. 3. Walker gate at `MortalList.flush` and `RuntimeScalar.setLargeRefCounted` overwrite path checks `base.storedInPackageGlobal` instead of `classNeedsWalkerGate(base.blessId)`. - `ReachabilityWalker.isScalarReachable(target)`: walks roots looking for a specific scalar identity (existing `isReachableFromRoots` only matched RuntimeBase referents). Skips weak refs and follows closure captures. - `RuntimeBase.activeOwners` set + `reachableOwnerCount()` — ready for production use as a more precise rescue criterion; not yet activated in production (the property-based gate is sufficient for current modules). Audited all direct `--base.refCount` sites and added paired `releaseActiveOwner` calls so the per-base owner set stays in sync: - `RuntimeArray.shift/pop` — `MortalList.deferDecrement` path - `RuntimeList.java` — 4 "undo materialized copy" paths - `Storable.releaseApplyArgs` - `DestroyDispatch.doCallDestroy` — args.push balance - `MortalList.deferDecrementIfTracked` - `MortalList.deferDecrementRecursive` - `MortalList.deferDestroyForContainerClear` - `WeakRefRegistry.weaken` - `RuntimeScalar.setLargeRefCounted` (overwrite + undefine) - `RuntimeStash.dynamicRestoreState` - `PJ_REFCOUNT_TRACE`: per-RuntimeBase refCount transition trace with stack snippets, owner record/release events, and JVM shutdown-hook owner dump. Activated for objects matching the classNeedsWalkerGate heuristic at bless time. Used for the diagnostic work that led to this fix. - `PJ_WEAKCLEAR_TRACE`: weaken/removeWeakRef/clearWeakRefsTo trace. Used to identify the original refCount underflow symptom in `Class::MOP::Attribute.pm:475`. - `PJ_DESTROY_TRACE`: existing — `DestroyDispatch.callDestroy` trace. New regression tests under `src/test/resources/unit/refcount/drift/`: - `our_metas_underflow.t` (20 assertions): storage in package hash, method-chain temps with weakened back-refs, multi-meta cross-linking. - `cmop_pattern.t` (11 assertions): exercises the CMOP shape with non-CMOP class names. Both pass on master and serve as regression guards. All green WITHOUT the class-name heuristic: - ✅ Unit tests (`make`): PASS - ✅ `use Class::MOP; use Moose;`: PASS - ✅ `examples/moose.pl` (full Moose demo, 7 test groups): PASS - ✅ DBIC `t/52leaks.t`: 11/11 PASS - Per-store cost: one boolean check + one flag set if the destination scalar's parent hash is package-global. O(1) per store; zero-cost for stores not into package globals. - Walker cost: same gating as the old heuristic — only fires when the object has weak refs AND `storedInPackageGlobal` AND refCount→0. Same performance envelope. - Diagnostic instrumentation: zero overhead when env-flags are off (single boolean check per call site). `dev/modules/moose_support.md` D-W6.7 through D-W6.18 documents the full investigation: root-cause analysis, walker-gate heuristic origins, alternative approaches considered (universal walker, FREETMPS-style flush, Java-GC-based ground truth), and why the property-based gate is the correct minimal fix. The `classNeedsWalkerGate` method remains in `DestroyDispatch.java` (used only by the diagnostic tracer at bless time). Production code paths no longer reference the class-name list. Follow-up cleanup can delete it entirely once diagnostic activation is reworked. Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>

@A

Previously, `scalar(values %h)` worked when called directly, but `return values %h` from a sub collapsed to the last value instead of the count when the caller was in scalar context. Same for arrays. Root cause: RuntimeHash.keys() set scalarContextSize on the returned RuntimeArray so list-to-scalar conversion later yields the count. RuntimeHash.values() and RuntimeArray.values() did not, so any list-collapse path (sub return, comma operator) used last-element semantics. Fix: - RuntimeHash.values() sets scalarContextSize on the returned array for both regular and TIED_HASH paths. - RuntimeArray.values() returns a fresh array aliasing the elements (so `for (values @A) { $_++ }` still mutates @A) with scalarContextSize set, so the original @A's scalar-context behaviour is unaffected. Surfaced by Moose's `Class::MOP::Class::get_all_attributes` ending in `return values %attrs`, which made every `scalar $meta->get_all_attributes` return a stringified Attribute instead of a count, breaking 3 subtests in MooseX::Role::Parameterized's t/001-parameters.t. Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>

The property-based walker gate (commit ce94475) replaced a tiny class-name allowlist (Class::MOP/Moose/Moo) with a per-object flag storedInPackageGlobal. This semantically correct change widened the gate to fire on far more blessed objects -- DBIx::Class stores many blessed instances (Schema, ResultSource, Storage::DBI) into package- global hashes and uses weaken() liberally for back-refs, so most of its objects now satisfy the gate at MortalList.flush(). Each gate fire ran ReachabilityWalker.isReachableFromRoots(target), an O(globals) BFS. With N targets per flush, total cost was O(N x globals) -- quadratic. Surfaced as a 100% CPU spin in DBI- generated SQL dispatch code under HARNESS_OPTIONS=j4 parallel runs (t/100populate.t, t/101populate_rs.t, t/100extra_source.t etc. never completed; standalone runs stayed fast because their flush queues are smaller). Fix: compute the reachable set once per flush via the existing ReachabilityWalker.walk() and reuse for O(1) membership lookups. Cleared in the finally block so subsequent flushes recompute against fresh global state. Verified: - DBIx::Class parallel-harness batch that previously hung now passes in 47s (16 tests, 327 subtests). - DBIx::Class t/52leaks.t still passes (the leak-detection test the walker gate exists to protect). - Moo 841/841, Moose 5/5, Template Toolkit 2660/2660 unchanged. - make unit tests pass. Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>

User reported a `Unable to perform storage-dependent operations with a detached result source` failure at t/52leaks.t:208 on this branch. Could not reproduce locally (5/5 standalone runs pass, prove and Test::Harness runs pass). Document the symptom, leading hypothesis (per-flush cache staleness across DESTROY of related objects in same pending list), candidate fixes, and next steps so the issue is tracked even though no code change is being made yet. Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>

The reported t/52leaks.t failure was actually exercising PerlOnJava4's WIP storable-native-format branch via a stale Makefile in ~/.perlonjava/cpan/build/DBIx-Class-0.082844-81/ (generated by an earlier `cpan` install when PerlOnJava4 was the active checkout). Re-running the same test_harness invocation with PerlOnJava2's jperl shows t/52leaks.t passes cleanly. Update the section to record the misdiagnosis and the lesson (check Makefile's hardcoded jperl path before assuming `make test` exercises the current branch). Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>

The JVM auto-defaults MaxHeapSize to 1/4 of system RAM. On a 48 GB machine that's 12 GB per jperl. Test::Harness/IPC::Open3 spawns one child JVM per .t file (jcpan -t Module = parent + child); with the auto-default each parent+child pair reserves 24 GB of address space and quickly drives the OS into swap. The visible symptom is a 100% CPU "hang" inside Sub::Defer-generated DBIC accessors — actually G1 GC thrashing on a heap that can never settle because the memory pressure causes constant pause/scan cycles. DBIx::Class is the canonical reproducer: `time ./jcpan -t DBIx::Class` hung at t/96_is_deteministic_value.t with the JVM child showing elapsed=86s cpu=84s and the main thread perpetually inside Sub::Defer.pm:2382 (the deferred-sub re-entry point), even though the same test runs standalone in 8s on a cold JVM. Fix: cap heap at -Xmx=4G with -XX:SoftMaxHeapSize=2G as a soft target the G1 GC tries to stay under during normal operation. 4G hard cap accommodates the occasional large test (re/pat.t with big regex datasets, op/pack.t with 300KB strings) without OOM. 2G soft target keeps the typical case feasible when many JVMs run concurrently — N parents+children fit on a 48 GB machine without swap thrashing. Both -Xmx and -XX:SoftMaxHeapSize remain overridable through JPERL_OPTS in the environment for users who want different limits. Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>

The previous heap defaults (-Xmx4g + -XX:SoftMaxHeapSize=2g) caused DBIx::Class t/96_is_deteministic_value.t to hang at 100% CPU when run via Test::Harness, even though heap usage stays under 200 MB. Hypothesis: SoftMaxHeapSize triggers G1 GC to run more aggressive mixed collections, and the resulting young-GC cadence interacts badly with PerlOnJava's cooperative-refcount + weak-ref machinery in Sub::Defer-style deferred-sub re-entry — the deferred sub's $undeferred closure slot loses its strong ref to the just-undeferred code under the extra GC pressure, so the next call re-undefers, producing the observed busy-loop in eval N:983 / eval N:1046. Verified mapping (DBIx-Class 0.082844 t/96_is_deteministic_value.t via Test::Harness on a 48 GB machine): -Xmx auto (=12g) -> PASS in 7s -Xmx4g (no SoftMax) -> PASS in 14s -Xmx2g -> PASS in 13s -Xmx1g -> PASS in 12s -Xmx4g -XX:SoftMaxHeapSize=4g (==Xmx) -> PASS in 7s -Xmx4g -XX:SoftMaxHeapSize=3g -> PASS in 8s -Xmx4g -XX:SoftMaxHeapSize=2g -> HANG (100% CPU) -Xmx2g -XX:SoftMaxHeapSize=1g -> HANG (100% CPU) The bug fires whenever SoftMaxHeapSize is set BELOW Xmx. Removing all heap settings (letting the JVM auto-default to 1/4 of system RAM) is the cleanest fix — it honors per-machine appropriate defaults instead of imposing an arbitrary cap, and avoids the SoftMaxHeapSize trap entirely. Memory usage actually stays under 200 MB regardless of Xmx, so the auto-default of 12 GB on a 48 GB machine doesn't translate into 12 GB of resident memory; it's just virtual address space reservation, which is essentially free. Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>

Adds an Investigation Plan section to dev/modules/dbix_class.md for the NEW failure mode observed today under `./jcpan -t DBIx::Class`: DBIx::Class::ResultSource::schema(): Unable to perform storage- dependent operations with a detached result source (source 'Artist' is not associated with a schema). at t/52leaks.t line 430 This is distinct from the existing "tests 12-18 leak detection at line 526" entry — that's a leak (objects not getting destroyed), this is the opposite (a schema getting destroyed too eagerly while a child resultset still expects it). Test passes standalone (11/11 in 46s); only fails when ~20+ prior DBIC tests have run through the same harness JVM. Suspected cause: the walker-gate property fix in PR #618 (commit ce8186e) widened DESTROY gating to every storedInPackageGlobal object — under cumulative state pressure, the gate fails to rescue a Schema/ResultSource pair, causing the weak ref from RS → Schema to read as undef. The plan section includes: - exact symptom + reproducer - code path that triggers it - hypothesis - 4-step diagnostic plan (bisect prefix, instrument Java side, reachability check, c4db69e-baseline verification) - what's NOT the cause (parent harness JVM is 99.7% idle in select polling) - "why we can't ship" — DBIx::Class is published as PASS in the CPAN compatibility report Generated with [Devin](https://devin.ai) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>

fglock mentioned this pull request Apr 29, 2026

diag(D-W6.7): pinpoint root cause of metaclass refCount underflow #611

Closed

3 tasks

fglock force-pushed the fix/walker-gate-property-based branch from 57d8e2a to ce94475 Compare April 29, 2026 11:54

fglock mentioned this pull request Apr 29, 2026

fix(runtime): values() in scalar context returns count, not last element #619

Closed

7 tasks

fglock force-pushed the fix/walker-gate-property-based branch from 824524f to f453083 Compare April 29, 2026 13:55

fglock and others added 7 commits April 29, 2026 19:37

fglock force-pushed the fix/walker-gate-property-based branch from 5bd54bd to b646714 Compare April 29, 2026 17:38

fglock merged commit 23bd419 into master Apr 29, 2026
2 checks passed

fglock deleted the fix/walker-gate-property-based branch April 29, 2026 17:45

fglock mentioned this pull request Apr 30, 2026

docs(refcount): rename "cooperative" → "selective"; new memory-management.md #634

Merged

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: replace class-name walker-gate heuristic with property-based gate#618

fix: replace class-name walker-gate heuristic with property-based gate#618
fglock merged 7 commits intomasterfrom
fix/walker-gate-property-based

fglock commented Apr 29, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

fglock commented Apr 29, 2026

Summary

The Problem

The Solution

Verification

Performance

What's Included

Why This Is Correct

Alternatives Considered (and Rejected)

Follow-Up

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant