fglock · Apr 29, 2026
diff --git a/dev/modules/moose_support.md b/dev/modules/moose_support.md
@@ -1991,6 +1991,189 @@ Tests fixed:
   - A handful of cmop/method introspection edge cases (constants,
     forward declarations, eval-defined subs).
 
+## Phase D-W6.2: refcount drift investigation log (2026-04-29)
+
+This investigation builds on PR #599's "no class-name dispatch" rule
+and PR #600's "remove the gate, find the drift sources" plan.
+
+### Reproducers shipped
+
+`src/test/resources/unit/refcount/drift/`:
+
+- **`sub_install.t`** — five sub-installation patterns (glob assign,
+  named sub, loop install, temp drop, nested install). All pass on
+  master AND with the walker gate disabled.
+- **`closure_capture.t`** — five closure-capture patterns (single,
+  two-, three-, five-layer wrap, plus a 20-closure chain). All pass
+  on master AND with the walker gate disabled.
+- **`hash_slot.t`** — four hash-slot patterns (direct, package
+  global, 50-entry registry, slot overwrite). All pass on master
+  AND with the walker gate disabled.
+
+`PJ_DESTROY_TRACE=1` (or `-Dperlonjava.destroyTrace=1`) is wired into
+`DestroyDispatch.callDestroy` to print every destroy, plus a stack trace, with
+`Pkg::subname` for `RuntimeCode` and class name for blessed objects. Off by default; no cost when off.
+
+### What the simple patterns prove
+
+The basic shapes of sub-install, closure-capture, and hash-slot all
+have correct cooperative refCount semantics in PerlOnJava — strong
+holds from package stashes, hash slots, and closure captures all
+keep their referents alive without the walker gate.
+
+### Where the drift actually is
+
+`PJ_DESTROY_TRACE=1 ./jperl -e 'use Class::MOP'` (gate disabled)
+fails with:
+
+```
+Can't call method "get_method" on an undefined value
+  at jar:PERL5LIB/Class/MOP/Attribute.pm line 475.
+```
+
+`Class::MOP::Attribute::_remove_accessor` calls
+`$class->get_method($accessor)` where `$class = $self->associated_class()`.
+`associated_class` is a *weakened* ref. It reads as undef here, which means
+the runtime destroyed the metaclass it pointed at while it was still in use.
+
+The trace shows `Class::MOP::Class@1424108509` destroyed **twice**:
+
+```
+[DESTROY] Class::MOP::Class@1424108509 refCount=-2147483648
+  at MortalList.flush(line 585)
+  at anon1205.apply(.../Class/MOP/Class.pm:260)
+  ...
+[DESTROY] Class::MOP::Class@1424108509 refCount=-2147483648
+  at MortalList.drainPendingSince(line 659)
+  at DestroyDispatch.doCallDestroy(line 373)
+  at DestroyDispatch.callDestroy(line 266)
+  at MortalList.flush(line 585)
+  at anon1205.apply(.../Class/MOP/Class.pm:260)
+```
+
+Same identity hash, two destroys, both with refCount at `Integer.MIN_VALUE`.
+Either the metaclass instance was added to the deferred-decrement queue
+twice, or a single entry was processed twice during the cascading flush.
+
+### Conclusion
+
+The drift is **not** in the simple closure-capture path that D-W6.2
+hypothesised. It is specifically in the **metaclass-instance lifecycle during
+`Class::MOP` load**, where a `Class::MOP::Class` instance built up by
+`_construct_class_instance` appears to be double-decremented when:
+
+1. its `attach_to_class` path weakens an attribute's `associated_class`
+   ref to itself, AND
+2. some intermediate scope-exit cleanup queues the instance for
+   deferred decrement that the cascading flush from another
+   destroy already drained.
+
+### Next concrete leads
+
+1. **Audit `MortalList.deferDecrementIfTracked`** for double-add: a
+   single `RuntimeBase` should never appear twice in the `pending`
+   list. Add an `IdentityHashMap`-based dedup at the deferred-add
+   point, or detect the second add and drop it.
+2. **Audit `MortalList.drainPendingSince`** — the second destroy of
+   `Class::MOP::Class@1424108509` came through this path. If the
+   pending list contains an entry whose refCount has already been
+   zeroed (or marked MIN_VALUE), `drainPendingSince` should skip it.
+3. **Audit `Class/MOP/Class.pm:260`** (the line emitting the
+   first destroy). That is likely
+   `_construct_class_instance`'s last statement; figure out which
+   scope-exit puts the metaclass on the deferred queue.
+
+### What's deferred
+
+- D-W6.1 (sub-install drift): closed — the simple patterns work; the
+  observed Sub::Install destroys during bootstrap are *symptoms*,
+  not the root cause.
+- D-W6.2 (closure-capture drift): closed — the simple patterns work
+  here too.
+- **D-W6.4 (NEW) — pending-list double-add / metaclass lifecycle**:
+  the actual drift identified by the investigation. This is what
+  needs the next round of debugging.
+- D-W6.3 (`@_` argument promotion): still pending; reproducer not
+  yet written.
+
+## Phase D-W6.4 (continued, 2026-04-29): weak-metaclass drift hunt
+
+### Reproducer landed
+
+`src/test/resources/unit/refcount/drift/weak_metaclass.t` —
+14 tests covering:
+
+- A: weakened single hash slot, my-var holds strong ref
+- B: weakened hash slot + outer `@keepalive` array
+- C: 20 weakened slots in a loop with all preserved by `@keepalive`
+- D: weak-ref → strong-ref "rescue" pattern (the Schema::DESTROY
+  shape)
+
+All 14 pass on master AND with the walker gate disabled. So the
+simple **`store strong → weaken in place → other strong holder`**
+pattern is also not the drift source.
+
+### What this means
+
+The `Class::MOP` bootstrap drift is something *more specific* than:
+- sub installation (D-W6.1 — works without the gate)
+- closure capture (D-W6.2 — works without the gate)
+- hash-slot tracking (D-W6.2 — works without the gate)
+- weakened-hash + multi-holder (D-W6.4 — works without the gate)
+
+It must involve a combination of these patterns *plus* one of:
+
+1. **Cascade clear during destroyFired branch.** When a metaclass's
+   refCount drops to 0 and DESTROY runs, weak refs are cleared at
+   the end. If the SAME object's refCount drops to 0 again later
+   (via a duplicate pending entry, or via cooperative refcount
+   drift in some unrelated path), the second `callDestroy` enters
+   the `destroyFired` branch (DestroyDispatch.java:212-229) which
+   *also* does `WeakRefRegistry.clearWeakRefsTo`. This is harmless
+   if weak refs were already cleared, but DOES cascade into hash
+   contents via `scopeExitCleanupHash`. That cascade can decrement
+   refCount on the metaclass's own attribute references, which
+   transitively …
+
+2. **`destroyFired` cascade re-running after weak-ref-target rescue.**
+   The first DESTROY may have set up a Schema-style rescue (added
+   to `rescuedObjects`). If a second `callDestroy` enters and the
+   object is *not* in `rescuedObjects` (maybe the rescue list was
+   cleaned up by `processRescuedObjects` already), the second call
+   hits `WeakRefRegistry.clearWeakRefsTo` and the weak refs go.
+
+3. **Resurrection-in-flight from `args.push(self)` in doCallDestroy.**
+   `doCallDestroy` does `args.push(self)` (refCount: MIN_VALUE → 1
+   via setLargeRefCounted special case), then runs DESTROY. The
+   rebalance walk decrements refCount: 1 → 0. But if anything in
+   DESTROY queued a deferred decrement on `self`, the
+   `drainPendingSince` after the rebalance walk could decrement
+   again, going 0 → -1. The next callDestroy on the same object
+   sees refCount as MIN_VALUE-ish and rejects.
+
+The trace shows the second destroy comes through
+`drainPendingSince` after the first destroy's body — so #3 is the
+strongest hypothesis.
+
+### Concrete next steps
+
+1. **Audit `args.push(self)` and the rebalance walk in
+   `doCallDestroy`.** Specifically: does the Perl DESTROY body's
+   first-line `my $self = shift` (which is what most DESTROY methods
+   do, including Class::MOP::Class) decrement refCount via
+   deferDecrementIfTracked? If yes, the rebalance walk's "still in
+   args.elements" check would falsely fire.
+
+2. **Add a guard to `drainPendingSince` that also checks
+   `referent.destroyFired`.** If destroyFired is true, skip the
+   entry — the cascading destroy already handled cleanup.
+
+3. **Instrument `pending.add` to log identity-hash + caller** when
+   the same `RuntimeBase` is added a second time. This surfaces
+   the duplicate-add path directly.
+
+
+
 ## Related Documents
 
 - [xs_fallback.md](xs_fallback.md) — XS fallback mechanism

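Reviewer sketch for the "pending-list double-add" lead in the investigation log above (dedup at the deferred-add point, and logging the identity hash plus caller on a second add). This is a minimal, self-contained illustration and not the real `MortalList`: the class `DedupPendingSketch`, its methods, and the `[PENDING-DUP]` log tag are all hypothetical names chosen for the example. It assumes membership should be identity-based, as the log suggests with `IdentityHashMap`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for a deferred-decrement queue; not the real MortalList. */
final class DedupPendingSketch {
    private final List<Object> pending = new ArrayList<>();
    // Identity-based membership: equal-but-distinct objects are tracked separately.
    private final Set<Object> members =
            Collections.newSetFromMap(new IdentityHashMap<>());
    private int duplicateAdds = 0;

    /** Queues referent; returns false (and logs) if this exact instance is already pending. */
    boolean add(Object referent) {
        if (!members.add(referent)) {
            duplicateAdds++;
            // Index 0 is this method, index 1 is whoever called add().
            StackTraceElement[] stack = new Throwable().getStackTrace();
            String caller = stack.length > 1 ? stack[1].toString() : "(unknown)";
            System.err.println("[PENDING-DUP] " + referent.getClass().getSimpleName()
                    + "@" + System.identityHashCode(referent) + " from " + caller);
            return false;
        }
        pending.add(referent);
        return true;
    }

    /** Returns the pending entries in insertion order and empties the queue. */
    List<Object> drain() {
        List<Object> drained = new ArrayList<>(pending);
        pending.clear();
        members.clear();
        return drained;
    }

    int size() { return pending.size(); }

    int duplicateAdds() { return duplicateAdds; }

    public static void main(String[] args) {
        DedupPendingSketch queue = new DedupPendingSketch();
        Object metaclass = new Object();
        queue.add(metaclass);
        queue.add(metaclass); // second add of the same instance is logged and dropped
        System.out.println("pending=" + queue.size()
                + " duplicates=" + queue.duplicateAdds());
    }
}
```

Dropping the second add hides the symptom, while the log line points at the code path that produced it. During the investigation the logging is probably the more useful half.
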
diff --git a/src/main/java/org/perlonjava/runtime/runtimetypes/DestroyDispatch.java b/src/main/java/org/perlonjava/runtime/runtimetypes/DestroyDispatch.java
@@ -17,6 +17,12 @@
  */
 public class DestroyDispatch {
 
+    /** Phase D-W6 debug: enable destroy tracing via -Dperlonjava.destroyTrace=1
+     *  or env PJ_DESTROY_TRACE=1. */
+    private static final boolean DESTROY_TRACE =
+            "1".equals(System.getProperty("perlonjava.destroyTrace"))
+            || "1".equals(System.getenv("PJ_DESTROY_TRACE"));
+
     // BitSet indexed by |blessId| — set if the class defines DESTROY (or AUTOLOAD)
     private static final BitSet destroyClasses = new BitSet();
 
@@ -146,6 +152,24 @@ public static void invalidateCache() {
     public static void callDestroy(RuntimeBase referent) {
         // refCount is already MIN_VALUE (set by caller)
 
+        // Phase D-W6 debug: optional trace of every destroy call.
+        // Enable with -Dperlonjava.destroyTrace=1 (or env PJ_DESTROY_TRACE=1)
+        // to find refCount-drift sources.
+        if (DESTROY_TRACE) {
+            String klass = referent.blessId != 0
+                    ? NameNormalizer.getBlessStr(referent.blessId)
+                    : referent.getClass().getSimpleName();
+            String extra = "";
+            if (referent instanceof RuntimeCode rc) {
+                extra = " name=" + (rc.packageName != null ? rc.packageName : "?")
+                        + "::" + (rc.subName != null ? rc.subName : "(anon)");
+            }
+            System.err.println("[DESTROY] " + klass + "@"
+                    + System.identityHashCode(referent)
+                    + " refCount=" + referent.refCount + extra);
+            new RuntimeException("destroy trace").printStackTrace(System.err);
+        }
+
         // Phase 3 (refcount_alignment_plan.md): Re-entry guard.
         // If this object is already inside its own DESTROY body, a transient
         // decrement-to-0 (local temp release, deferred MortalList flush,

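Reviewer sketch: one way to post-process the trace this hunk emits and list the objects that were destroyed more than once. It assumes only the header format built above (`[DESTROY] <class>@<identityHash> refCount=<n>`, optionally followed by ` name=...`); the class `DestroyTraceScan` and its method names are hypothetical and not part of PerlOnJava. Identity hash codes are not guaranteed unique, so a repeated key is a lead to check against the stack traces rather than proof of a double destroy.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for reading captured trace output; not part of PerlOnJava. */
final class DestroyTraceScan {
    // Matches the header line only; indented stack-trace lines do not match.
    private static final Pattern HEADER =
            Pattern.compile("^\\[DESTROY\\] (\\S+)@(-?\\d+) refCount=(-?\\d+)");

    /** Returns "<class>@<hash>" keys seen on two or more header lines, with their counts. */
    static Map<String, Integer> repeatedDestroys(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = HEADER.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1) + "@" + m.group(2), 1, Integer::sum);
            }
        }
        // Keep only the keys that appeared more than once.
        counts.values().removeIf(n -> n < 2);
        return counts;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java DestroyTraceScan.java <trace-file>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        repeatedDestroys(lines).forEach((key, n) -> System.out.println(n + "x " + key));
    }
}
```

To use it, capture stderr from a traced run into a file and pass that file as the only argument.
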
diff --git a/src/test/resources/unit/refcount/drift/closure_capture.t b/src/test/resources/unit/refcount/drift/closure_capture.t
@@ -0,0 +1,132 @@
+# D-W6.2 — Closure-capture drift reproducer.
+#
+# Tracing `PJ_DESTROY_TRACE=1 ./jperl -e 'use Class::MOP::Class'` showed
+# anonymous CVs from Sub::Install being destroyed prematurely. The
+# pattern is Sub::Install's nested closure wrappers:
+#
+#     *install_sub = _build_public_installer(_ignore_warnings(_installer));
+#
+# Each layer is `sub { ... my $code = shift; sub { $code->(@_) } }` —
+# a closure that captures a CODE-ref my-var and returns a new closure
+# using it. Three layers stack three levels of capture.
+#
+# The hypothesis (D-W6.2): when a closure captures a my-var holding a
+# CODE ref, and the my-var's outer scope exits, PerlOnJava decrements
+# the CODE ref's cooperative refCount even though the closure still
+# references it. The walker gate masks this; without the gate the
+# CODE ref's refCount goes negative and DESTROY fires.
+use strict;
+use warnings;
+use Test::More;
+
+# ---- Pattern A: single-layer wrap (baseline) -----------------------------
+sub wrap_one {
+    my $code = shift;
+    sub { $code->(@_) };
+}
+
+{
+    my $cv = sub { 'A-result' };
+    my $wrapped = wrap_one($cv);
+    $cv = undef;            # drop outer reference
+
+    is $wrapped->(), 'A-result',
+        'A: single-layer wrapped closure callable after outer ref dropped';
+}
+
+# ---- Pattern B: two-layer wrap -------------------------------------------
+sub wrap_two_a {
+    my $code = shift;
+    sub { $code->(@_) };
+}
+sub wrap_two_b {
+    my $code = shift;
+    sub { my $r = $code->(@_); $r };
+}
+
+{
+    my $cv = sub { 'B-result' };
+    my $wrapped = wrap_two_b(wrap_two_a($cv));
+    $cv = undef;
+
+    is $wrapped->(), 'B-result',
+        'B: two-layer wrapped closure callable';
+}
+
+# ---- Pattern C: three-layer wrap (Sub::Install shape) --------------------
+# This is the precise install_sub pattern.
+sub _installer {
+    sub {
+        my ($pkg, $name, $code) = @_;
+        no strict 'refs';
+        *{"${pkg}::${name}"} = $code;
+        return $code;
+    }
+}
+
+sub _ignore_warnings {
+    my $code = shift;
+    sub {
+        local $SIG{__WARN__} = sub {};
+        $code->(@_);
+    };
+}
+
+sub _build_public_installer {
+    my $installer = shift;
+    sub {
+        my $arg = shift;
+        $installer->(@{$arg}{qw(into as code)});
+    };
+}
+
+# Build the install function the way Sub::Install does it.
+my $install_sub = _build_public_installer(_ignore_warnings(_installer()));
+
+# The build helpers' temp lexicals (`$code`, `$installer`) are now out of
+# scope — the only ref to each layer's CV is the next outer closure's
+# capture.
+
+$install_sub->({
+    into => 'D_W6_2_C',
+    as   => 'method',
+    code => sub { 'C-result' },
+});
+
+ok exists &D_W6_2_C::method, 'C: three-layer install put method in stash';
+is D_W6_2_C->method, 'C-result',
+    'C: three-layer-installed method callable';
+
+# ---- Pattern D: deep capture chain (5 levels) ----------------------------
+sub make_layer {
+    my $depth = shift;
+    return sub { @_ } if $depth == 0;
+    my $inner = make_layer($depth - 1);
+    return sub { $inner->(@_) };
+}
+
+{
+    my $top = make_layer(5);
+    is_deeply [$top->('deep-1', 'deep-2')], ['deep-1', 'deep-2'],
+        'D: 5-layer deep capture chain returns args';
+}
+
+# ---- Pattern E: 20 closures, each capturing an inner CV ------------------
+# The outer closure captures `$inner`, which in turn captures `$tag`;
+# the refCount on each captured CV must not decay.
+sub make_chain {
+    my $tag = shift;
+    my $inner = sub { "$tag-result" };
+    return sub {
+        my $extra = shift;
+        return $inner->() . " ($extra)";
+    };
+}
+
+my @chained = map { make_chain("E$_") } 1 .. 20;
+my @results = map { $chained[$_]->("call$_") } 0 .. 19;
+is scalar @results, 20, 'E: 20 chained closures all callable';
+is $results[0], 'E1-result (call0)', 'E: first closure result';
+is $results[19], 'E20-result (call19)', 'E: last closure result';
+
+done_testing;