
docs(D-W6.4): weak-metaclass reproducer + concrete hypotheses for the actual drift#606

Open
fglock wants to merge 2 commits into master from
fix/d-w6-4-pending-double-add

@fglock fglock commented Apr 29, 2026

Summary

D-W6.4 investigation continued. The simple "store strong → weaken in
place → outer keepalive" pattern works correctly without the walker
gate, so it is yet another shape ruled out as the drift source.

What landed

  • src/test/resources/unit/refcount/drift/weak_metaclass.t (14
    tests) — direct weakened slot, weakened slot + outer @Keepalive,
    20-entry loop variant, weak-ref → strong-ref "rescue"
    (Schema::DESTROY shape). All 14 pass on master AND with the
    walker gate disabled.

What we now know is not the drift source

After four reproducer files (sub_install, closure_capture,
hash_slot, weak_metaclass) and 53 total bare-Perl test cases:

  • ✅ Sub installation (glob assign, named sub, loop install,
    temp drop, nested install) — works without the gate
  • ✅ Closure capture (single-, two-, three-, five-layer wrap, plus
    20-closure chain) — works without the gate
  • ✅ Hash-slot tracking (direct slot, package global, 50-entry
    registry, slot overwrite) — works without the gate
  • ✅ Weakened-hash + multi-holder — works without the gate

The Class::MOP drift is something more specific than any of
these shapes alone.

Three concrete next-step hypotheses

Documented in dev/modules/moose_support.md D-W6.4 section:

  1. Audit args.push(self) and the rebalance walk in doCallDestroy.
    When the DESTROY body's first line is my $self = shift, the
    shift queues a deferred decrement. The rebalance walk after
    DESTROY may double-count this, taking the count through
    refCount=1 (push) → 0 (rebalance) → -1 (drainPendingSince
    processing the queued shift decrement).

  2. Guard drainPendingSince against entries with destroyFired=true.
    Such entries have already been handled by the cascading destroy,
    and re-processing them in drainPendingSince clears weak refs that
    downstream code (the weakened associated_class ref in
    Class::MOP::Attribute) relies on.

  3. Instrument pending.add to log identity-hash + caller when
    the same RuntimeBase is added twice. This surfaces the
    duplicate-add path directly.
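
The arithmetic behind hypotheses 1 and 2 can be replayed in isolation. The sketch below is a toy model only: TrackedObject, DoubleDecrementModel, and the plain queue are hypothetical stand-ins and do not reproduce the real RuntimeBase, MortalList, or doCallDestroy code. It assumes the sequence described above (push, queued shift decrement, rebalance walk, drain) and shows how an optional destroyFired check would change the final count.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for a refcounted runtime object (not the project's RuntimeBase).
final class TrackedObject {
    int refCount;
    boolean destroyFired;
}

final class DoubleDecrementModel {
    // Replays the suspected sequence and returns the final refCount.
    static int replay(boolean guardDestroyed) {
        TrackedObject self = new TrackedObject();
        Deque<TrackedObject> pending = new ArrayDeque<>();

        self.refCount = 1;         // args.push(self): @_ holds the invocant
        self.destroyFired = true;  // DESTROY has been dispatched for this object
        pending.add(self);         // `my $self = shift` queues a deferred decrement
        self.refCount--;           // rebalance walk after DESTROY: 1 -> 0

        // Drain step: processes whatever the DESTROY body queued.
        while (!pending.isEmpty()) {
            TrackedObject o = pending.poll();
            if (guardDestroyed && o.destroyFired) {
                continue;          // hypothesis 2: skip entries already destroyed
            }
            o.refCount--;          // without the guard: 0 -> -1
        }
        return self.refCount;
    }

    public static void main(String[] args) {
        System.out.println("unguarded: " + replay(false));
        System.out.println("guarded:   " + replay(true));
    }
}
```

In this model replay(false) ends at -1 and replay(true) ends at 0. Whether the real runtime follows the same order of operations is exactly what the audit in hypothesis 1 would need to confirm.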

#3 is the cheapest experiment to try first — it's a one-line
IdentityHashMap wrapped around pending.add.
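
A minimal sketch of that instrumentation, assuming the pending queue can be fronted by a small wrapper. PendingQueueProbe, its method names, and the log format are illustrative and not taken from the codebase; only java.util.IdentityHashMap and System.identityHashCode are real JDK APIs here.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical wrapper around a pending list; not the project's MortalList.
final class PendingQueueProbe {
    private final List<Object> pending = new ArrayList<>();
    // Identity-keyed, so equal-but-distinct objects are never conflated.
    private final Map<Object, StackTraceElement[]> firstAdd = new IdentityHashMap<>();
    private int duplicateAdds = 0;

    void add(Object o) {
        StackTraceElement[] here = new Throwable().getStackTrace();
        StackTraceElement[] prev = firstAdd.putIfAbsent(o, here);
        if (prev != null) {
            // Same object queued twice before a drain: log identity hash and both callers.
            duplicateAdds++;
            System.err.printf("duplicate pending.add id=%08x first=%s second=%s%n",
                    System.identityHashCode(o), caller(prev), caller(here));
        }
        pending.add(o);
    }

    // Forget recorded adds when the queue is drained, so a later re-add is not flagged.
    void drain() {
        pending.clear();
        firstAdd.clear();
    }

    int duplicateAdds() {
        return duplicateAdds;
    }

    // Frame 0 is add() itself; frame 1 is the code that called it.
    private static String caller(StackTraceElement[] trace) {
        return trace.length > 1 ? trace[1].toString() : "<unknown>";
    }
}
```

Capturing a stack trace on every add is expensive, so a probe like this would presumably live only in a diagnostic build or behind a debug flag.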

Open D-W6 PR backlog

Test plan

  • make (build + unit tests) green.
  • weak_metaclass.t 14/14 on master (gate active).
  • weak_metaclass.t 14/14 with the gate disabled (probe build).
  • All other drift reproducers still pass.

Generated with Devin

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>

fglock and others added 2 commits April 29, 2026 09:40
…drift

D-W6.2 investigation outcome:

The simple closure-capture and hash-slot patterns all work correctly
without the walker gate. Reproducers landing in
src/test/resources/unit/refcount/drift/:

- closure_capture.t (8 tests) — single, two-, three-, five-layer
  wrap, plus a 20-closure chain.
- hash_slot.t (14 tests) — direct slot, package global, 50-entry
  registry, slot overwrite.
- sub_install.t (12 tests, copied from earlier branch) — five
  sub-install patterns.

All pass on master AND with the walker gate disabled. Therefore
the simple shapes of these three code paths have correct cooperative
refCount semantics; they are NOT the source of the drift.

PJ_DESTROY_TRACE=1 instrumentation added to DestroyDispatch.callDestroy
(zero-cost when off; prints Pkg::subname for RuntimeCode and the
class name for blessed objects).
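
For readers unfamiliar with the pattern, an environment-gated trace of this kind is commonly written as below. This is a sketch under assumptions: DestroyTrace and its methods are hypothetical, and only the PJ_DESTROY_TRACE variable name and the label shapes (Pkg::subname, class name) come from the commit message. Reading the variable once into a static final boolean usually lets the JIT remove the guarded branch when tracing is off.

```java
// Hypothetical helper; not the actual code in DestroyDispatch.callDestroy.
final class DestroyTrace {
    // Read once at class load; constant for the lifetime of the JVM.
    static final boolean ENABLED = isOn(System.getenv("PJ_DESTROY_TRACE"));

    // Tracing is on only when the variable is exactly "1".
    static boolean isOn(String value) {
        return "1".equals(value);
    }

    // Pkg::subname for code refs, bare class name for blessed objects.
    static String label(String pkg, String subName) {
        return subName == null ? pkg : pkg + "::" + subName;
    }

    static void trace(Object target, String pkg, String subName) {
        if (!ENABLED) {
            return; // tracing disabled: nothing is formatted or printed
        }
        System.err.printf("DESTROY %s id=%08x%n",
                label(pkg, subName), System.identityHashCode(target));
    }
}
```

Printing the identity hash alongside the label is what makes a double destroy of the same instance visible in the output.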

The actual drift, surfaced by `PJ_DESTROY_TRACE=1 ./jperl -e 'use
Class::MOP'` (gate disabled), is in the metaclass-instance lifecycle:
the same Class::MOP::Class instance is destroyed TWICE (same
identity hash) — once via MortalList.flush, once via
MortalList.drainPendingSince in a cascading flush. Investigation
notes in dev/modules/moose_support.md (Phase D-W6.2) describe three
concrete next leads:

1. Audit MortalList.deferDecrementIfTracked for double-add.
2. Audit MortalList.drainPendingSince for entries that have already
   been zeroed.
3. Trace which scope-exit on Class/MOP/Class.pm:260 puts the
   metaclass on the deferred queue.

D-W6.4 (a new sub-phase) is added to track this work; D-W6.1 and
D-W6.2 are closed as "the simple patterns work, the actual drift is
elsewhere".

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
…potheses

D-W6.4 investigation continued. Added one more reproducer:

- weak_metaclass.t (14 tests) — store strong → weaken in place →
  outer keepalive holds strong; 20-entry loop variant; and the
  weak-ref → strong-ref "rescue" (Schema::DESTROY) pattern.

All 14 pass on master AND with the walker gate disabled. So the
simple "store strong, weaken in place" pattern is also not the
drift source.

Combined with D-W6.1 / D-W6.2 findings, the drift is something
MORE specific than: sub installation, closure capture, hash-slot
tracking, weakened-hash + multi-holder. Each of those simple
shapes works correctly without the walker gate.

The trace data points to the destroyFired branch in
DestroyDispatch.callDestroy as the cleanup path that actually
clears the weak refs that break Class::MOP's bootstrap. The
plausible path that re-enters callDestroy after the first
destroy is `drainPendingSince` post-DESTROY — when the
`my $self = shift` inside Class::MOP::Class's DESTROY body
queues a deferred decrement on a RuntimeBase that the rebalance
walk thought it had already handled.

Three concrete next-step hypotheses recorded in moose_support.md:

1. Audit args.push(self) and the rebalance walk in doCallDestroy
   for the case where the DESTROY body's `shift @_` queues a
   decrement that drainPendingSince re-processes.
2. Guard drainPendingSince against entries with destroyFired=true.
3. Instrument pending.add to log identity-hash + caller when the
   same RuntimeBase is added twice.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>