183 changes: 183 additions & 0 deletions dev/modules/moose_support.md
@@ -1991,6 +1991,189 @@ Tests fixed:
- A handful of cmop/method introspection edge cases (constants,
forward declarations, eval-defined subs).

## Phase D-W6.2: refcount drift investigation log (2026-04-29)

This investigation builds on PR #599's "no class-name dispatch" rule
and PR #600's "remove the gate, find the drift sources" plan.

### Reproducers shipped

`src/test/resources/unit/refcount/drift/`:

- **`sub_install.t`** — five sub-installation patterns (glob assign,
named sub, loop install, temp drop, nested install). All pass on
master AND with the walker gate disabled.
- **`closure_capture.t`** — five closure-capture patterns (single,
two-, three-, five-layer wrap, plus a 20-closure chain). All pass
on master AND with the walker gate disabled.
- **`hash_slot.t`** — four hash-slot patterns (direct, package
global, 50-entry registry, slot overwrite). All pass on master
AND with the walker gate disabled.

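Destroy tracing is also wired into `DestroyDispatch.callDestroy`.
Setting `PJ_DESTROY_TRACE=1` (or `-Dperlonjava.destroyTrace=1`) prints
every destroy to stderr: the class name for blessed objects,
`Pkg::subname` for `RuntimeCode`, the current refCount, and a stack
trace. Off by default; when off, the only cost is a check of a
`static final` boolean.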

### What the simple patterns prove

The basic shapes of sub-install, closure-capture, and hash-slot all
have correct cooperative refCount semantics in PerlOnJava — strong
holds from package stashes, hash slots, and closure captures all
keep their referents alive without the walker gate.

### Where the drift actually is

`PJ_DESTROY_TRACE=1 ./jperl -e 'use Class::MOP'` (gate disabled)
fails with:

```
Can't call method "get_method" on an undefined value
at jar:PERL5LIB/Class/MOP/Attribute.pm line 475.
```

`Class::MOP::Attribute._remove_accessor` calls
`$class->get_method($accessor)` where `$class = $self->associated_class()`.
`associated_class` is a *weakened* ref. The weak ref reads as undef,
which means the metaclass it pointed at was destroyed.
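
The failure mode can be illustrated with a toy cooperative-refcount model. This is a minimal sketch, not PerlOnJava code: `WeakSlotSketch`, `Target`, `Slot`, `storeStrong`, `weaken`, and `release` are hypothetical names made up for the illustration. It shows that a weak slot reads as undef exactly when the target's count reached zero, and that one spurious decrement is enough to get there while a strong holder still exists.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only; none of these names exist in PerlOnJava.
final class WeakSlotSketch {
    static final class Target {
        int refCount;
        boolean destroyed;
        final List<Slot> weakSlots = new ArrayList<>();
    }

    static final class Slot {
        Target value;   // null models Perl's undef
        boolean weak;
        Target read() { return value; }
    }

    // Store a strong reference: counts toward refCount.
    static Slot storeStrong(Target t) {
        Slot s = new Slot();
        s.value = t;
        t.refCount++;
        return s;
    }

    // Weaken in place: the slot stops counting but is registered for clearing.
    static void weaken(Slot s) {
        if (s.weak || s.value == null) return;
        s.weak = true;
        Target t = s.value;
        t.weakSlots.add(s);
        release(t);
    }

    // Drop one strong count; at zero, "destroy" and clear all weak slots.
    static void release(Target t) {
        if (--t.refCount == 0) {
            t.destroyed = true;
            for (Slot w : t.weakSlots) w.value = null;
            t.weakSlots.clear();
        }
    }

    public static void main(String[] args) {
        Target meta = new Target();
        Slot holder = storeStrong(meta);   // e.g. a my-var or registry entry
        Slot assoc = storeStrong(meta);    // e.g. associated_class
        weaken(assoc);
        System.out.println("weak slot sees target: " + (assoc.read() == meta));

        release(meta);                     // one spurious decrement (the drift)
        System.out.println("weak slot is undef: " + (assoc.read() == null));
        System.out.println("strong holder still set: " + (holder.read() == meta));
    }
}
```

In this model the strong holder never released its reference, yet the weak slot went undef. That is the shape of the `get_method` failure: the count was wrong, not the weakening.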

The trace shows `Class::MOP::Class@1424108509` destroyed **twice**:

```
[DESTROY] Class::MOP::Class@1424108509 refCount=-2147483648
    at MortalList.flush(line 585)
    at anon1205.apply(.../Class/MOP/Class.pm:260)
    ...
[DESTROY] Class::MOP::Class@1424108509 refCount=-2147483648
    at MortalList.drainPendingSince(line 659)
    at DestroyDispatch.doCallDestroy(line 373)
    at DestroyDispatch.callDestroy(line 266)
    at MortalList.flush(line 585)
    at anon1205.apply(.../Class/MOP/Class.pm:260)

Same identity-hash, two destroys. The metaclass instance was added
to the deferred-decrement queue twice (or processed twice during
cascading flush).
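
Spotting repeated destroys by eye gets tedious on a full bootstrap trace. The following is a minimal sketch of a scanner for the `[DESTROY] <class>@<identityHash> refCount=...` lines that the trace prints; `DestroyTraceScanSketch` and `repeatedDestroys` are hypothetical names, and the line format is assumed to match the output shown above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for reading trace output; not part of PerlOnJava.
final class DestroyTraceScanSketch {
    // Captures "<class>@<identityHash>" from a "[DESTROY] ..." line.
    private static final Pattern DESTROY_LINE =
            Pattern.compile("^\\[DESTROY\\] (\\S+@-?\\d+) refCount=");

    // Returns every "<class>@<hash>" key that was destroyed more than once.
    static Map<String, Integer> repeatedDestroys(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = DESTROY_LINE.matcher(line.trim());
            if (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        counts.values().removeIf(n -> n < 2);   // keep only the repeats
        return counts;
    }

    public static void main(String[] args) {
        List<String> trace = List.of(
                "[DESTROY] Class::MOP::Class@1424108509 refCount=-2147483648",
                "at MortalList.flush(line 585)",
                "[DESTROY] Class::MOP::Class@1424108509 refCount=-2147483648",
                "at MortalList.drainPendingSince(line 659)");
        System.out.println(repeatedDestroys(trace));
    }
}
```

Identity hash codes are not guaranteed unique, so a repeated key is a lead to check against the stack traces rather than proof of a double destroy.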

### Conclusion

D-W6.2 is **not** in the simple closure-capture path. The drift is
specifically in the **metaclass-instance lifecycle during `Class::MOP`
load**, where a Class::MOP::Class instance built up by
`_construct_class_instance` ends up double-decremented when:

1. its `attach_to_class` path weakens an attribute's `associated_class`
ref to itself, AND
2. some intermediate scope-exit cleanup queues the instance for
deferred decrement that the cascading flush from another
destroy already drained.

### Next concrete leads

1. **Audit `MortalList.deferDecrementIfTracked`** for double-add: a
single `RuntimeBase` should never appear twice in the `pending`
list. Add an `IdentityHashMap`-based dedup at the deferred-add
point, or detect the second add and drop it.
2. **Audit `MortalList.drainPendingSince`** — the second destroy of
`Class::MOP::Class@1424108509` came through this path. If the
pending list contains an entry whose refCount has already been
zeroed (or marked MIN_VALUE), `drainPendingSince` should skip it.
3. **Audit `Class::MOP::Class.pm:260`** (the line emitting the
first destroy) — that's likely
`_construct_class_instance`'s last statement; figure out which
scope-exit puts the metaclass on the deferred queue.
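
Leads 1 and 2 can be sketched together as follows. This is an illustration under stated assumptions, not the real `MortalList`: `PendingDedupSketch`, `Tracked`, `deferDecrement`, and `drain` are hypothetical names, and the model assumes a destroyed object is marked with `Integer.MIN_VALUE` as the log describes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Toy pending queue; the real MortalList may be structured differently.
final class PendingDedupSketch {
    static final class Tracked {
        final String label;
        int refCount;
        Tracked(String label, int refCount) {
            this.label = label;
            this.refCount = refCount;
        }
    }

    private final List<Tracked> pending = new ArrayList<>();
    // Identity-based membership, so equals()/hashCode() overrides cannot interfere.
    private final Set<Tracked> queued =
            Collections.newSetFromMap(new IdentityHashMap<>());
    final List<String> destroyed = new ArrayList<>();

    // Lead 1: drop a second add of the same object. Returns false if dropped.
    boolean deferDecrement(Tracked obj) {
        if (!queued.add(obj)) return false;
        pending.add(obj);
        return true;
    }

    // Lead 2: skip entries that were already destroyed by someone else.
    void drain() {
        List<Tracked> batch = new ArrayList<>(pending);
        pending.clear();
        queued.clear();
        for (Tracked obj : batch) {
            if (obj.refCount == Integer.MIN_VALUE) continue;
            if (--obj.refCount == 0) {
                obj.refCount = Integer.MIN_VALUE;
                destroyed.add(obj.label);
            }
        }
    }

    public static void main(String[] args) {
        PendingDedupSketch q = new PendingDedupSketch();
        Tracked meta = new Tracked("metaclass", 1);
        System.out.println("first add accepted: " + q.deferDecrement(meta));
        System.out.println("second add accepted: " + q.deferDecrement(meta));
        q.drain();
        System.out.println("destroyed: " + q.destroyed);
    }
}
```

The dedup is only safe if the invariant stated in lead 1 really holds. If two entries for one object can legitimately stand for two separate dropped references, dropping the second add would leak a count, so logging the duplicate first is the more cautious step.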

### Status of the D-W6 sub-phases

- D-W6.1 (sub-install drift): closed — the simple patterns work; the
observed Sub::Install destroys during bootstrap are *symptoms*,
not the root cause.
- D-W6.2 (closure-capture drift): closed — the simple patterns work
here too.
- **D-W6.4 (NEW) — pending-list double-add / metaclass lifecycle**:
the actual drift identified by the investigation. This is what
needs the next round of debugging.
- D-W6.3 (`@_` argument promotion): still pending; reproducer not
yet written.

## Phase D-W6.4 (continued, 2026-04-29): weak-metaclass drift hunt

### Reproducer landed

`src/test/resources/unit/refcount/drift/weak_metaclass.t` —
14 tests covering:

- A: weakened single hash slot, my-var holds strong ref
- B: weakened hash slot + outer `@keepalive` array
- C: 20 weakened slots in a loop with all preserved by `@keepalive`
- D: weak-ref → strong-ref "rescue" pattern (the Schema::DESTROY
shape)

All 14 pass on master AND with the walker gate disabled. So the
simple **`store strong → weaken in place → other strong holder`**
pattern is also not the drift source.

### What this means

The `Class::MOP` bootstrap drift is something *more specific* than:
- sub installation (D-W6.1 — works without the gate)
- closure capture (D-W6.2 — works without the gate)
- hash-slot tracking (D-W6.2 — works without the gate)
- weakened-hash + multi-holder (D-W6.4 — works without the gate)

It must involve a combination of these patterns *plus* one of:

1. **Cascade clear during destroyFired branch.** When a metaclass's
refCount drops to 0 and DESTROY runs, weak refs are cleared at
the end. If the SAME object's refCount drops to 0 again later
(via a duplicate pending entry, or via cooperative refcount
drift in some unrelated path), the second `callDestroy` enters
the `destroyFired` branch (DestroyDispatch.java:212-229) which
*also* does `WeakRefRegistry.clearWeakRefsTo`. This is harmless
if weak refs were already cleared, but DOES cascade into hash
contents via `scopeExitCleanupHash`. That cascade can decrement
refCount on the metaclass's own attribute references, which
transitively …

2. **`destroyFired` cascade re-running after weak-ref-target rescue.**
The first DESTROY may have set up a Schema-style rescue (added
to `rescuedObjects`). If a second `callDestroy` enters and the
object is *not* in `rescuedObjects` (maybe the rescue list was
cleaned up by `processRescuedObjects` already), the second call
hits `WeakRefRegistry.clearWeakRefsTo` and the weak refs go.

3. **Resurrection-in-flight from `args.push(self)` in doCallDestroy.**
`doCallDestroy` does `args.push(self)` (refCount: MIN_VALUE → 1
via setLargeRefCounted special case), then runs DESTROY. The
rebalance walk decrements refCount: 1 → 0. But if anything in
DESTROY queued a deferred decrement on `self`, the
`drainPendingSince` after the rebalance walk could decrement
again, going 0 → -1. The next callDestroy on the same object
sees refCount as MIN_VALUE-ish and rejects.

The trace shows the second destroy comes through
`drainPendingSince` after the first destroy's body — so #3 is the
strongest hypothesis.
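
The arithmetic behind hypothesis #3 can be replayed in a toy model. This is a sketch of the hypothesis only, not of the real `doCallDestroy`: `ResurrectionDriftSketch` and `replay` are hypothetical names, and the step order (push, DESTROY body, rebalance, drain) is taken from the description above rather than from the source code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Replays the refCount sequence described in hypothesis #3.
final class ResurrectionDriftSketch {
    static final class Obj {
        int refCount = Integer.MIN_VALUE;   // state on entry to callDestroy
    }

    // Returns the refCount observed after each step.
    static List<Integer> replay(boolean destroyBodyQueuesDecrement) {
        Obj self = new Obj();
        Deque<Obj> pending = new ArrayDeque<>();
        List<Integer> history = new ArrayList<>();
        history.add(self.refCount);

        self.refCount = 1;                  // models args.push(self)
        history.add(self.refCount);

        if (destroyBodyQueuesDecrement) {
            pending.add(self);              // models a deferred decrement queued inside DESTROY
        }

        self.refCount--;                    // models the rebalance walk: 1 -> 0
        history.add(self.refCount);

        while (!pending.isEmpty()) {        // models the drain after the rebalance walk
            pending.poll().refCount--;
        }
        history.add(self.refCount);
        return history;
    }

    public static void main(String[] args) {
        System.out.println(replay(false));  // [-2147483648, 1, 0, 0]
        System.out.println(replay(true));   // [-2147483648, 1, 0, -1]
    }
}
```

If the real code follows this order, the only difference between a clean destroy and the drift is whether the DESTROY body queued a decrement on `self`, which is what next step 1 below sets out to check.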

### Concrete next steps

1. **Audit `args.push(self)` and the rebalance walk in
`doCallDestroy`.** Specifically: does the Perl DESTROY body's
first-line `my $self = shift` (which is what most DESTROY methods
do, including Class::MOP::Class) decrement refCount via
deferDecrementIfTracked? If yes, the rebalance walk's "still in
args.elements" check would falsely fire.

2. **Add a guard to `drainPendingSince` that also checks
`referent.destroyFired`.** If destroyFired is true, skip the
entry — the cascading destroy already handled cleanup.

3. **Instrument `pending.add` to log identity-hash + caller** when
the same `RuntimeBase` is added a second time. This surfaces
the duplicate-add path directly.
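
Steps 2 and 3 can be sketched in a toy queue. The names here (`PendingAddTraceSketch`, `Referent`, `add`, `drain`, the `[PENDING-DUP]` log tag) are hypothetical, and the `destroyFired` field only mirrors the flag named above; the real `pending` list and its callers may look different.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Instrumentation sketch; not the real MortalList.
final class PendingAddTraceSketch {
    static final class Referent {
        int refCount = 1;
        boolean destroyFired;
    }

    private final List<Referent> pending = new ArrayList<>();
    private final Map<Referent, String> firstAdder = new IdentityHashMap<>();
    final List<String> duplicateReports = new ArrayList<>();
    int destroyCalls;

    // Step 3: log identity hash and caller when the same object is added again.
    void add(Referent r) {
        String caller = callerOfAdd();
        String first = firstAdder.putIfAbsent(r, caller);
        if (first != null) {
            String report = "[PENDING-DUP] @" + System.identityHashCode(r)
                    + " first=" + first + " again=" + caller;
            duplicateReports.add(report);
            System.err.println(report);
        }
        pending.add(r);   // instrumentation only: the duplicate is still queued
    }

    // Frame 0 is this method, frame 1 is add(), frame 2 is add()'s caller.
    private static String callerOfAdd() {
        StackTraceElement[] st = new Throwable().getStackTrace();
        if (st.length <= 2) return "(unknown)";
        return st[2].getClassName() + "." + st[2].getMethodName()
                + ":" + st[2].getLineNumber();
    }

    // Step 2: skip entries whose destroy has already fired.
    void drain() {
        List<Referent> batch = new ArrayList<>(pending);
        pending.clear();
        firstAdder.clear();
        for (Referent r : batch) {
            if (r.destroyFired) continue;
            if (--r.refCount <= 0) {
                r.destroyFired = true;
                destroyCalls++;
            }
        }
    }

    public static void main(String[] args) {
        PendingAddTraceSketch q = new PendingAddTraceSketch();
        Referent meta = new Referent();
        q.add(meta);
        q.add(meta);      // reported on stderr as a duplicate
        q.drain();
        System.out.println("destroy calls: " + q.destroyCalls);
    }
}
```

Keeping the duplicate in the queue and guarding only at drain time means the log still shows every duplicate-add site, while the guard prevents the second destroy during the investigation.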



## Related Documents

- [xs_fallback.md](xs_fallback.md) — XS fallback mechanism
@@ -17,6 +17,12 @@
*/
public class DestroyDispatch {

    /** Phase D-W6 debug: enable destroy tracing via -Dperlonjava.destroyTrace=1
     *  or env PJ_DESTROY_TRACE=1. */
    private static final boolean DESTROY_TRACE =
            "1".equals(System.getProperty("perlonjava.destroyTrace"))
            || "1".equals(System.getenv("PJ_DESTROY_TRACE"));

    // BitSet indexed by |blessId|; set if the class defines DESTROY (or AUTOLOAD)
    private static final BitSet destroyClasses = new BitSet();

@@ -146,6 +152,24 @@ public static void invalidateCache() {
    public static void callDestroy(RuntimeBase referent) {
        // refCount is already MIN_VALUE (set by caller)

        // Phase D-W6 debug: optional trace of every destroy call.
        // Enable with -Dperlonjava.destroyTrace=1 (or env PJ_DESTROY_TRACE=1)
        // to find refCount-drift sources.
        if (DESTROY_TRACE) {
            String klass = referent.blessId != 0
                    ? NameNormalizer.getBlessStr(referent.blessId)
                    : referent.getClass().getSimpleName();
            String extra = "";
            if (referent instanceof RuntimeCode rc) {
                extra = " name=" + (rc.packageName != null ? rc.packageName : "?")
                        + "::" + (rc.subName != null ? rc.subName : "(anon)");
            }
            System.err.println("[DESTROY] " + klass + "@"
                    + System.identityHashCode(referent)
                    + " refCount=" + referent.refCount + extra);
            new RuntimeException("destroy trace").printStackTrace(System.err);
        }

        // Phase 3 (refcount_alignment_plan.md): Re-entry guard.
        // If this object is already inside its own DESTROY body, a transient
        // decrement-to-0 (local temp release, deferred MortalList flush,
132 changes: 132 additions & 0 deletions src/test/resources/unit/refcount/drift/closure_capture.t
@@ -0,0 +1,132 @@
# D-W6.2 — Closure-capture drift reproducer.
#
# Tracing `PJ_DESTROY_TRACE=1 ./jperl -e 'use Class::MOP::Class'` showed
# anonymous CVs from Sub::Install being destroyed prematurely. The
# pattern is Sub::Install's nested closure wrappers:
#
# *install_sub = _build_public_installer(_ignore_warnings(_installer));
#
# Each layer is `sub { ... my $code = shift; sub { $code->(@_) } }` —
# a closure that captures a CODE-ref my-var and returns a new closure
# using it. Three layers stack three levels of capture.
#
# The hypothesis (D-W6.2): when a closure captures a my-var holding a
# CODE ref, and the my-var's outer scope exits, PerlOnJava decrements
# the CODE ref's cooperative refCount even though the closure still
# references it. The walker gate masks this; without the gate the
# CODE ref's refCount goes negative and DESTROY fires.
use strict;
use warnings;
use Test::More;

# ---- Pattern A: single-layer wrap (baseline) -----------------------------
sub wrap_one {
    my $code = shift;
    sub { $code->(@_) };
}

{
    my $cv = sub { 'A-result' };
    my $wrapped = wrap_one($cv);
    $cv = undef;    # drop outer reference

    is $wrapped->(), 'A-result',
        'A: single-layer wrapped closure callable after outer ref dropped';
}

# ---- Pattern B: two-layer wrap -------------------------------------------
sub wrap_two_a {
    my $code = shift;
    sub { $code->(@_) };
}
sub wrap_two_b {
    my $code = shift;
    sub { my $r = $code->(@_); $r };
}

{
    my $cv = sub { 'B-result' };
    my $wrapped = wrap_two_b(wrap_two_a($cv));
    $cv = undef;

    is $wrapped->(), 'B-result',
        'B: two-layer wrapped closure callable';
}

# ---- Pattern C: three-layer wrap (Sub::Install shape) --------------------
# This is the precise install_sub pattern.
sub _installer {
    sub {
        my ($pkg, $name, $code) = @_;
        no strict 'refs';
        *{"${pkg}::${name}"} = $code;
        return $code;
    }
}

sub _ignore_warnings {
    my $code = shift;
    sub {
        local $SIG{__WARN__} = sub {};
        $code->(@_);
    };
}

sub _build_public_installer {
    my $installer = shift;
    sub {
        my $arg = shift;
        $installer->(@{$arg}{qw(into as code)});
    };
}

# Build the install function the way Sub::Install does it.
my $install_sub = _build_public_installer(_ignore_warnings(_installer()));

# The build helpers' temp lexicals (`$code`, `$installer`) are now out of
# scope — the only ref to each layer's CV is the next outer closure's
# capture.

$install_sub->({
    into => 'D_W6_2_C',
    as   => 'method',
    code => sub { 'C-result' },
});

ok exists &D_W6_2_C::method, 'C: three-layer install put method in stash';
is D_W6_2_C->method, 'C-result',
    'C: three-layer-installed method callable';

# ---- Pattern D: deep capture chain (5 levels) ----------------------------
sub make_layer {
    my $depth = shift;
    return sub { @_ } if $depth == 0;
    my $inner = make_layer($depth - 1);
    return sub { $inner->(@_) };
}

{
    my $top = make_layer(5);
    is_deeply [$top->('deep-1', 'deep-2')], ['deep-1', 'deep-2'],
        'D: 5-layer deep capture chain returns args';
}

# ---- Pattern E: closure captures a CV that captures a lexical ------------
# The outer closure captures $inner (a CV), which in turn captures $tag.
# 20 independent pairs are built; refCount on each captured CV must not
# decay.
sub make_chain {
    my $tag = shift;
    my $inner = sub { "$tag-result" };
    return sub {
        my $extra = shift;
        return $inner->() . " ($extra)";
    };
}

my @chained = map { make_chain("E$_") } 1 .. 20;
my @results = map { $chained[$_]->("call$_") } 0 .. 19;
is scalar @results, 20, 'E: 20 chained closures all callable';
is $results[0], 'E1-result (call0)', 'E: first closure result';
is $results[19], 'E20-result (call19)', 'E: last closure result';

done_testing;