fix(D-W6): three diagnostic env-flags + sharper Class::MOP drift diagnosis#607

Open
fglock wants to merge 1 commit into master from fix/d-w6-pending-instrumentation

@fglock fglock commented Apr 29, 2026

Summary

Adds three off-by-default debug env-flags to instrument the cooperative
refcount system. Combined trace output on use Class::MOP (with the
gate temporarily disabled to surface the bug) sharpens the D-W6
diagnosis significantly.

Diagnostic env-flags

| Flag | What it does |
| --- | --- |
| PJ_DESTROY_TRACE=1 | Log every DestroyDispatch.callDestroy entry: class, identity, refCount, destroyFired, stack |
| PJ_PENDING_TRACE=1 | At every MortalList.flush() start, log duplicate identities in pending |
| PJ_WEAKCLEAR_TRACE=1 | Log every WeakRefRegistry.clearWeakRefsTo for blessed objects with caller stack |

All zero-cost when disabled. The walker gate is restored — these are pure
diagnostic additions, no production behaviour change.
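As an illustration of the "zero-cost when disabled" pattern, here is a minimal Java sketch. It assumes the flags are read once from the environment into `static final` fields; the `TraceFlags` class, its method names, and the log format are hypothetical and not taken from this PR's diff.

```java
// Hypothetical helper: class, method, and log-format names are illustrative only.
final class TraceFlags {
    // Read once when the class is initialised. A guard on a static final
    // boolean is cheap, and the JIT can typically drop the guarded branch
    // when the flag is off.
    static final boolean DESTROY_TRACE = isEnabled(System.getenv("PJ_DESTROY_TRACE"));
    static final boolean PENDING_TRACE = isEnabled(System.getenv("PJ_PENDING_TRACE"));
    static final boolean WEAKCLEAR_TRACE = isEnabled(System.getenv("PJ_WEAKCLEAR_TRACE"));

    // Assumption: only the exact value "1" switches a flag on.
    static boolean isEnabled(String value) {
        return "1".equals(value);
    }

    // Builds one trace line with the fields the PR lists for PJ_DESTROY_TRACE.
    static String describeDestroy(String className, Object target,
                                  int refCount, boolean destroyFired) {
        return "[DESTROY] class=" + className
                + " id=" + System.identityHashCode(target)
                + " refCount=" + refCount
                + " destroyFired=" + destroyFired;
    }

    // Example call site: nothing is formatted or printed unless the flag is set.
    static void onCallDestroy(String className, Object target,
                              int refCount, boolean destroyFired) {
        if (DESTROY_TRACE) {
            System.err.println(describeDestroy(className, target, refCount, destroyFired));
            new Throwable("callDestroy stack").printStackTrace();
        }
    }

    public static void main(String[] args) {
        // Prints only when run with PJ_DESTROY_TRACE=1 in the environment.
        onCallDestroy("Class::MOP::Class", new Object(), 0, false);
    }
}
```

Keeping the string building inside the guarded branch is what keeps the disabled path free of allocation.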

What the traces revealed

Running PJ_PENDING_TRACE=1 PJ_DESTROY_TRACE=1 PJ_WEAKCLEAR_TRACE=1 ./jperl -e 'use Class::MOP':

  1. A Class::MOP::Class instance reaches refCount=0 inside a
    legitimate MortalList.flush() call from inside an anonymous CV at
    Class/MOP/Class.pm:260. The trace says destroyFired=false, so
    this is the first destroy of the metaclass.

  2. pending contains the same metaclass identity multiple times
    (counted via [PENDING-DUP]), but each duplicate corresponds to a
    real strong reference (refCount=N matches count). So the drift is
    NOT a duplicate-add bug.

  3. The destroy clears Class::MOP::Class's weak refs, which wipes
    Class::MOP::Attribute::associated_class — the proximate cause of
    the bootstrap failure.
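The check behind finding 2 can be sketched as follows: count how often each object appears in pending by identity, then compare that count with the object's refCount. This is a rough Java illustration under assumed names (`PendingDupCheck`, `countByIdentity`, `matchesRefCount` are hypothetical), not the code added by this PR.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a duplicate-identity check over a pending list.
final class PendingDupCheck {
    // Count occurrences by object identity (==), not by equals().
    static Map<Object, Integer> countByIdentity(List<Object> pending) {
        Map<Object, Integer> counts = new IdentityHashMap<>();
        for (Object o : pending) {
            counts.merge(o, 1, Integer::sum);
        }
        return counts;
    }

    // True when the number of queued entries for target equals its refCount,
    // i.e. every queued decrement is backed by a counted reference.
    static boolean matchesRefCount(List<Object> pending, Object target, int refCount) {
        int queued = countByIdentity(pending).getOrDefault(target, 0);
        return queued == refCount;
    }

    public static void main(String[] args) {
        Object meta = new Object();
        List<Object> pending = new ArrayList<>();
        pending.add(meta);
        pending.add(meta);
        pending.add(new Object());
        for (Map.Entry<Object, Integer> e : countByIdentity(pending).entrySet()) {
            if (e.getValue() > 1) {
                System.err.println("[PENDING-DUP] id="
                        + System.identityHashCode(e.getKey()) + " count=" + e.getValue());
            }
        }
    }
}
```

An identity-keyed map matters here because two distinct objects that compare equal must not be merged into one entry.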

What this rules in / out

  • Earlier "double-destroy" framing was imprecise. The first
    destroy alone is sufficient to clear weak refs.
  • ❌ Auto-sweep — correctly guarded by ModuleInitGuard.
  • The first destroy IS legitimate from cooperative refcount's
    perspective. Some N pending decrements brought refCount from N
    to 0. The decrements are real: the cooperative count thinks no
    one strongly holds the metaclass.

So the real bug is upstream: someone is queueing a decrement on
the Class::MOP::Class metaclass that they shouldn't be, OR
our %METAS is not registering as a strong holder.
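To make the second hypothesis concrete, here is a toy Java model of a cooperative refcount in which a hash store may or may not count as a strong holder. Everything in it (`ToyMeta`, `ToyWeakRef`, `ToyRuntime`, the `countTheStore` switch) is invented for illustration and greatly simplified; it does not mirror the real runtime classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Toy stand-in for a blessed metaclass object.
final class ToyMeta {
    int refCount;
    boolean destroyed;
}

// Toy weakened reference: reads as null once its target is destroyed.
final class ToyWeakRef {
    ToyMeta target;
    ToyWeakRef(ToyMeta target) { this.target = target; }
}

final class ToyRuntime {
    final Map<String, ToyMeta> metas = new HashMap<>();   // plays the role of %METAS
    final List<ToyMeta> pending = new ArrayList<>();      // deferred decrements
    final Map<ToyMeta, List<ToyWeakRef>> weakRefs = new IdentityHashMap<>();

    // A fresh object held by one lexical variable starts at refCount 1.
    ToyMeta newInLexical() {
        ToyMeta m = new ToyMeta();
        m.refCount = 1;
        return m;
    }

    ToyWeakRef weaken(ToyMeta m) {
        ToyWeakRef w = new ToyWeakRef(m);
        weakRefs.computeIfAbsent(m, k -> new ArrayList<>()).add(w);
        return w;
    }

    // countTheStore=false models the suspected bug: the hash holds the
    // object, but the cooperative count never learns about it.
    void store(String pkg, ToyMeta m, boolean countTheStore) {
        if (countTheStore) m.refCount++;
        metas.put(pkg, m);
    }

    // Scope exit queues a decrement instead of applying it immediately.
    void scopeExit(ToyMeta m) { pending.add(m); }

    void flush() {
        List<ToyMeta> batch = new ArrayList<>(pending);
        pending.clear();
        for (ToyMeta m : batch) {
            if (--m.refCount == 0) destroy(m);
        }
    }

    private void destroy(ToyMeta m) {
        m.destroyed = true;
        for (ToyWeakRef w : weakRefs.getOrDefault(m, new ArrayList<>())) {
            w.target = null; // weak refs are cleared on first destroy
        }
    }

    // Runs store + scope exit + flush and returns the weak ref for inspection.
    static ToyWeakRef run(boolean countTheStore) {
        ToyRuntime rt = new ToyRuntime();
        ToyMeta meta = rt.newInLexical();
        ToyWeakRef associatedClass = rt.weaken(meta);
        rt.store("Foo", meta, countTheStore);
        rt.scopeExit(meta);
        rt.flush();
        return associatedClass;
    }

    public static void main(String[] args) {
        System.out.println("counted store keeps weak ref:   " + (run(true).target != null));
        System.out.println("uncounted store keeps weak ref: " + (run(false).target != null));
    }
}
```

In the uncounted case the hash still holds the object, yet a single legitimate scope-exit decrement takes the count to 0 and clears the weak ref, which is the shape of failure described above.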

Concrete next leads (sharper than D-W6.4 round 1)

Documented in dev/modules/moose_support.md:

  1. Audit RuntimeHash.put and the package-global hash store
    path. Class::MOP::store_metaclass_by_name does
    $METAS{$pkg} = $meta from inside a function; that path may
    skip the refCount increment that hash_slot.t exercises from
    the caller scope.

  2. Trace every refCount decrement of Class::MOP::Class
    instances with a stack trace. The decrement that takes
    refCount 1 → 0 is the smoking gun.

  3. Resolve Class/MOP/Class.pm:260's closure to its Perl
    source position to identify which scope-exit is leaking the
    decrement.
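Lead 2 could be prototyped along these lines: a probe that records every decrement for one watched class together with its immediate caller, and marks the transition to 0. This is only a sketch; `DecrementProbe` and its log format are hypothetical, and the real hook point in the runtime is not specified in this PR.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical probe: records decrements for a single watched class name.
final class DecrementProbe {
    final String watchedClass;
    final List<String> log = new ArrayList<>();

    DecrementProbe(String watchedClass) {
        this.watchedClass = watchedClass;
    }

    // Returns the new refCount; logs only for the watched class.
    int decrement(String className, int refCount) {
        int next = refCount - 1;
        if (watchedClass.equals(className)) {
            // Frame 0 is this method; frame 1 is whoever requested the decrement.
            StackTraceElement[] stack = new Throwable().getStackTrace();
            String caller = stack.length > 1 ? stack[1].toString() : "<unknown>";
            log.add("[DECREMENT] " + className
                    + " refCount " + refCount + " -> " + next
                    + (next == 0 ? " [ZERO]" : "")
                    + " at " + caller);
        }
        return next;
    }

    public static void main(String[] args) {
        DecrementProbe probe = new DecrementProbe("Class::MOP::Class");
        int rc = probe.decrement("Class::MOP::Class", 2);
        rc = probe.decrement("Class::MOP::Class", rc);
        probe.log.forEach(System.err::println);
    }
}
```

Filtering the output for the `[ZERO]` marker would then point at the single decrement that takes the metaclass from 1 to 0.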

Test plan

  • make (build + unit tests) green.
  • master baseline behaviour unchanged (gate restored, no
    production paths altered).
  • All existing drift reproducers pass.

Open D-W6 PR backlog (unchanged)

Generated with Devin

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>

Adds three off-by-default debug env-flags to instrument the cooperative
refcount system:

- PJ_DESTROY_TRACE=1 — log every DestroyDispatch.callDestroy entry
  with class, identity hash, refCount, destroyFired, and stack.
- PJ_PENDING_TRACE=1 — at every MortalList.flush() start, log
  duplicate identities in `pending`.
- PJ_WEAKCLEAR_TRACE=1 — log every WeakRefRegistry.clearWeakRefsTo
  call for blessed objects with caller stack.

All zero-cost when disabled.

Combined trace output on `use Class::MOP` (with the gate disabled
to surface the bug) shows:

- A Class::MOP::Class instance reaches refCount=0 inside a
  MortalList.flush() call from inside an anon CV at
  Class/MOP/Class.pm:260.
- The first destroy (NOT a double-destroy — the second is a
  destroyFired cleanup pass) clears weak refs.
- That weak-ref clear wipes the Class::MOP::Attribute's
  `associated_class` weakened ref, making it read as undef in
  `_remove_accessor`, which is the proximate cause of the
  "Can't call method get_method" failure.

So the real bug is that some scope-exit is queueing a decrement on
the metaclass that brings refCount legitimately to 0 — i.e. the
cooperative refcount thinks `our %METAS` is NOT a strong holder.
The next session should add a "log every refCount decrement of
Class::MOP::Class" probe to find which scope's exit takes refCount
from 1 to 0.

The walker gate is restored (matches master) — these are pure
diagnostic additions.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>