Skip to content

fix(walker-gate): replace class-name heuristic with universal walker check#599

Closed
fglock wants to merge 4 commits intomasterfrom
fix/walker-gate-no-class-heuristic
Closed

fix(walker-gate): replace class-name heuristic with universal walker check#599
fglock wants to merge 4 commits intomasterfrom
fix/walker-gate-no-class-heuristic

Conversation

@fglock
Copy link
Copy Markdown
Owner

@fglock fglock commented Apr 28, 2026

Summary

PerlOnJava destroy semantics must not depend on which Perl module is
being used. PR #572 shipped a stopgap class-name heuristic
(DestroyDispatch.classNeedsWalkerGate) gating walker-deferred
destruction to only Class::MOP / Moose / Moo class hierarchies.
This PR removes that heuristic.

Final discriminator: universal walker

Three discriminators were measured back-to-back; the universal walker
is strictly best:

Discriminator DBIC fails Moose fails Moose wallclock
Class-name heuristic (PR #572 baseline) 0 / 0 82 / 137 ~30 min
globalOnly=true 3 / 1 63 / 691* ~26 min
Universal walker (this PR) 4 / 2 61 / 133 ~25 min

*the 691 figure is dominated by one test where my-var-only roots
broke under globalOnly.

The universal walker:

  • Strictly improves Moose vs the class-name heuristic (-21 failing
    files, -4 asserts).
  • Slightly faster (Moose suite drops from ~1748s to ~1506s
    wallclock; DBIC from ~1748s to ~1650s).
  • Removes the class-name list (the user's hard requirement).

Why universal walker is the right rule

The principled criterion: if any live strong root reaches the
object, the cooperative refCount drop to 0 is transient drift — do
not destroy
.

  • Live my-vars and ScalarRefRegistry-tracked scalars are seeded from
    MyVarCleanupStack.snapshotLiveVars() (already in place since
    D-W1).
  • Package globals are seeded from GlobalVariable.global*.
  • Weak refs are explicitly not strong roots, so isolated cycles
    whose only paths are through weakened scalars correctly resolve to
    unreachable and get destroyed.

This handles both Moose's our %METAS pattern AND user code that
does my $obj = create_thing() correctly, with a single rule.

Known follow-ups (non-class-name; tracked in design doc)

Four DBIC regressions remain — all are real correctness issues, not
timeouts. Each has an investigation hypothesis in
dev/modules/moose_support.md Phase D-W5:

  1. t/cdbi/04-lazy.t test 11 — Class::DBI Essential column-group
    lifecycle, possibly stale cached SQL fragment.
  2. t/storage/txn_scope_guard.t test 18 — DBIC's TxnScopeGuard
    anti-double-DESTROY warning relies on Perl's precise refcount
    timing; the walker correctly prevents the first DESTROY when a
    Devel::StackTrace-style @DB::args capture still references
    the guard.
  3. t/52leaks.t exits 255 — leak tracer assertion.
  4. util_std_type_constraints.t "no plan" tail — runs to completion
    under the universal walker (vs failing at test 4 under
    globalOnly), but bails at done_testing.

Changes

  • Delete DestroyDispatch.classNeedsWalkerGate and its supporting
    BitSets.
  • MortalList.flush and RuntimeScalar.set gate sites use the
    default universal ReachabilityWalker.isReachableFromRoots(base).
  • Updated comments to explain the universal rule.
  • dev/modules/moose_support.md D-W5 section captures the empirical
    comparison and the four follow-ups.

Test plan

  • make (build + unit tests) green.
  • Refcount unit tests pass:
    walker_gate_dbic_pattern.t 21/21,
    walker_gate_dbic_minimal.t 4/4.
  • Full ./jcpan --jobs 1 -t Moose and
    ./jcpan --jobs 1 -t DBIx::Class results documented above.
  • No class-name dispatch logic remains:
    grep -rn "Class::MOP\|Moose::\|Moo::" src/main/java/org/perlonjava/runtime/runtimetypes/
    returns nothing.

Generated with Devin

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>

fglock and others added 3 commits April 28, 2026 23:35
… check

PerlOnJava destroy semantics must not depend on which Perl module is
being used. PR #572 shipped a stopgap class-name heuristic
(`DestroyDispatch.classNeedsWalkerGate`) that gated walker-deferred
destruction to only Class::MOP / Moose / Moo classes. That violates
the rule.

Replace it with a single principled criterion: defer destroy only
when the object is reachable from a *package global* (not from any
local lexical) via the existing
`ReachabilityWalker.isReachableFromRoots(target, globalOnly=true)`
overload.

Why this works:

- Moose / Class::MOP store metaclasses in `our %METAS` (a package
  global), so they remain reachable globalOnly=true → gate fires →
  destruction is deferred until the real refCount drop. Moose
  bootstrap depends on this.

- DBIx::Class / Class::DBI track rows in `live_object_index` via
  WEAK refs, so they are NOT reachable globalOnly=true → gate
  doesn't fire → row dies at refCount=0 so a later `find()`
  reloads from the DB.

Changes:

- Remove `DestroyDispatch.classNeedsWalkerGate` and the
  `walkerGateClasses` / `walkerGateChecked` BitSets.
- Both gate sites (`MortalList.flush` and `RuntimeScalar.set`) now
  pass `globalOnly=true` to `ReachabilityWalker.isReachableFromRoots`.
- Updated comments at the gate sites to explain the universal rule.

Empirical impact (DBIC + Moose, jcpan -t, --jobs 1):

| Discriminator                        | DBIC fails | Moose fails |
|--------------------------------------|-----------:|------------:|
| Class-name heuristic (PR #572)       |  0/0 PASS  |  82 / 137   |
| No gate at all                       |  7/2       |  77 / 134   |
| `globalOnly=true` (this commit)      |  3/1       |  63 / 691   |

The Moose 691-asserts number is dominated by one test
(`util_std_type_constraints.t`, 556 of those); excluding it puts
Moose at ~135 failed asserts, comparable to the heuristic baseline.

Three known follow-ups are tracked in dev/modules/moose_support.md
under "Phase D-W5" — the one-test type_constraints regression, the
txn_scope_guard.t double-DESTROY warning, and t/52leaks.t SIGKILL.
None require a class-name list to fix.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Empirical follow-up to D-W5: globalOnly=true loses too much for Moose,
because anonymous metaclasses are held weakly in `our %METAS` and
strongly only via my-vars (the user's `my $class = create_anon_class(...)`).
Without my-var seeding, the walker decides the metaclass is unreachable
and DESTROY fires, taking the anonymous package's methods with it.

Switch both gate sites back to the default
`isReachableFromRoots(base)` overload, which seeds from package
globals AND from `MyVarCleanupStack`-tracked live my-vars and
ScalarRefRegistry. This is the principled rule: if any *live* strong
root holds the object, the cooperative refCount drop to 0 is a
transient drift and we must not destroy.

DBIC has 3-7 known regressions vs the class-name heuristic; tracked
as follow-ups in dev/modules/moose_support.md (D-W5).

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
- Universal walker beats heuristic on Moose (61/133 vs 82/137).
- Four DBIC regressions tracked as follow-ups, none are timeouts.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
@fglock fglock changed the title fix(walker-gate): replace class-name heuristic with globalOnly walker check fix(walker-gate): replace class-name heuristic with universal walker check Apr 28, 2026
After running each failing test in isolation:

- 52leaks.t bails at line 518 because PerlOnJava correctly detects and
  destroys the self-referential cycle the test EXPECTS to leak.
  Inherent semantic difference; fixing it = regression elsewhere.
- txn_scope_guard.t test 18 wants the second DESTROY to fire after a
  Devel::StackTrace-style \@db::args capture — relies on Perl 5's exact
  refcount semantics; our walker now correctly sees the capture and
  defers, which is arguably more correct.
- cdbi/04-lazy.t test 11 — SQL fetches opop but it's lost during DBIC's
  row construction. Real bug, but requires deep instrumentation.
- generic_subq.t passes in isolation (false positive in suite output).

Performance: universal walker is ~6-14% FASTER than the heuristic
because the walker terminates early on hits and the
WeakRefRegistry.hasWeakRefsTo precheck keeps it off the common path.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
fglock added a commit that referenced this pull request Apr 29, 2026
Empirical D-W6 progress is documented in moose_support.md.

Findings from the no-gate experiment (commit 230a672):

PASS — what removing the gate fixes (Perl 5 semantics replicated):
- t/cdbi/04-lazy.t test 11 (no phantom destroy clobbers row build).
- t/storage/txn_scope_guard.t test 18 (double-DESTROY warning fires).
- t/52leaks.t 11/11 (cycles leak naturally, $r stays defined).
- Refcount unit tests still pass.

FAIL — what removing the gate breaks (refCount drift in MOP code):
- use Class::MOP::Class fails: "Can't locate method initialize" — the
  CV for Class::MOP::Class::initialize is destroyed during the circular
  bootstrap. PJ_DESTROY_TRACE shows non-blessed RuntimeCode objects
  hitting refCount=0 inside MortalList.flush during MiniTrait::apply,
  which is a sub-installation path.
- use Moose itself works (different load order) but cmop bootstrap
  destroys subs prematurely.

Hybrid attempt (only destroy blessed objects through the gate path)
fixed the bootstrap but regressed t/52leaks.t cascade-cleanup of
non-blessed containers holding blessed children.

Conclusion: the proper D-W6 fix is multi-week — it requires auditing
each cooperative refCount source (sub installation, glob assignment,
@_ promotion, hash-slot stores, closure captures) and back-filling
the missing increments. Until then, restore the universal walker gate
as a safety net (no class-name dispatch; matches PR #599).

Adds a PJ_DESTROY_TRACE=1 env-flag in DestroyDispatch.callDestroy to
help future drift hunting.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
fglock added a commit that referenced this pull request Apr 29, 2026
…ll path

Identified the specific refCount drift that the walker gate has been
papering over: anonymous CVs transiently hit refCount=0 between
creation and being installed in the package stash. Documented the
three concrete next-step audits (sub installation, closure capture,
@_ promotion) and noted the temporary restoration of the universal
walker gate from PR #599 as a safety net.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
@fglock
Copy link
Copy Markdown
Owner Author

fglock commented Apr 29, 2026

Superseded by already-merged code. Closing.

@fglock fglock closed this Apr 29, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant