Skip to content

perf: reduce apply() bytecode size (Phase 1 + 2: TempLocalCountVisitor, buffer 256→32, TAILCALL trampoline extraction)#650

Merged
fglock merged 13 commits intomasterfrom
perf/reduce-apply-bytecode
May 2, 2026
Merged

perf: reduce apply() bytecode size (Phase 1 + 2: TempLocalCountVisitor, buffer 256→32, TAILCALL trampoline extraction)#650
fglock merged 13 commits intomasterfrom
perf/reduce-apply-bytecode

Conversation

@fglock
Copy link
Copy Markdown
Owner

@fglock fglock commented May 1, 2026

Summary

Reduces the bytecode size of JVM-compiled apply() methods, targeting
the C1 JIT failure ("out of virtual registers in linear scan") that causes
t/96_is_deteministic_value.t (DBIx::Class) to time out under CPU pressure.

Phase 1 — Extend TempLocalCountVisitor + reduce null-init buffer 256→32

Every apply() method begins with a null-initialisation prologue for all
local variable slots. The pre-init count was previously estimated with a
fixed +256 safety margin; this replaces it with a precise count:

  • Sub call operator (: +1 (callContextSlot)
  • eval/evalbytes: +4 (evalResultSlot, cfSlot, labelSlot, typeSlot)
  • for/foreach (For1Node): +4 (preEvalListLocal, iteratorIndex, 2 conditional)
  • while/for (For3Node): +1 (conservative)
  • Flip-flop ../...: +3 (flipFlopIdSlot, leftSlot, rightSlot)
  • xor/^^: +1 (leftVar)

Buffer reduced from +256 to +32. Result: ~56% total bytecode reduction
for the DBIx::Class workload.

Phase 2 — Extract per-call-site TAILCALL trampoline to resolveTailCalls()

Every sub call site previously emitted an ~83-byte inline TAILCALL trampoline
to handle goto &sub chains. This adds RuntimeCode.resolveTailCalls() as a
static helper and moves the trampoline logic there.

Key design: resolveTailCalls() is called only inside the
isNonLocalGoto()==true branch, so the common path (no control flow)
has zero extra overhead
vs Phase 1 — it remains:

apply(...)  →  ASTORE  →  ALOAD  →  isNonLocalGoto()  →  IFEQ (branch taken)

Per-call-site bytecode: ~83 bytes → ~38 bytes (saves ~45 bytes per site).
A method with 20 call sites shrinks by ~900 bytes; large SQL-generation
closures with 100+ call sites shrink by several KB.

Phase 2 bugfix — Eliminate VerifyError from int-typed callContextSlot

After Phase 2 landed, DBIx::Class t/multi_create/torture.t failed with
"You tried to plan twice". Root cause:

  • callContextSlot was allocated via allocateLocalVariable() and stored
    with ISTORE (int type) in EmitSubroutine, Dereference, and
    EmitOperator.
  • The pre-init loop initialises every temp slot as ACONST_NULL/ASTORE
    (reference type). At blockDispatcher merge points the JVM verifier saw
    "int vs null-reference" and threw VerifyError: Bad local variable type.
  • The VerifyError was caught by executeCodeImpl, which silently fell back
    to the interpreter — re-executing the script body and calling
    plan tests => 23 a second time.

Fix: replace every ISTORE/ILOAD callContextSlot with inline
pushCallContext() at each use site. The -> and ( entries in
TempLocalCountVisitor are kept (they control the pre-init range for
reference spill slots — removing them caused a second VerifyError at a
different slot).

Post-rebase fix — Restore correct tied-scalar FETCH count for fallback compound assigns

The Phase 2 changes exposed a pre-existing bug in the fallback path of
handleCompoundAssignment() (used for |=, &=, ^=, **=, <<=, >>=, .=).
For tied scalar lvalues, Perl semantics require two FETCHes on compound
assignment (one to read the current value, one to return the result). The
fallback path was returning set()'s return value — a plain scalar copy from
tiedStore() — instead of the lvalue itself, so the 2nd FETCH never fired.

Before Phase 2 this accidentally passed in op/bop.t because the tied object
was unwrapped later by a toString() call inside is(). Phase 2 changed
argument passing so that path no longer triggered, exposing the bug as 9 new
test failures (491/522 vs baseline 500/522).

Fix: keep leftSlot live across the set() call, discard set()'s return
value, then reload leftSlot — matching how addAssign, subtractAssign, etc.
already return arg1 directly. Result: op/bop.t 500/522 (baseline restored).

Files changed

File Change
TempLocalCountVisitor.java Extended to count 6 new local-allocating patterns; kept -> and ( entries as pre-init range proxies
EmitterMethodCreator.java Buffer: +256+32
RuntimeCode.java New resolveTailCalls(RuntimeList, int) static helper
EmitSubroutine.java Removed inline TAILCALL trampoline; removed callContextSlot allocation
Dereference.java Removed callContextSlot allocation; inline pushCallContext()
EmitOperator.java Removed callContextSlot allocation; inline pushCallContext()
EmitBinaryOperator.java Fallback compound-assign returns lvalue (not set() result) for correct tied-scalar FETCH semantics
dev/design/reduce-apply-bytecode.md Design doc + progress tracking

Test plan

  • make — all unit tests pass (after rebase onto master)
  • tail_calls.t — 7/7 (goto &sub factorial, multi-hop chains, @_ aliasing)
  • subroutine.t — 39/39
  • t/multi_create/torture.t — 23/23 (was failing with "plan twice" before bugfix)
  • op/bop.t — 500/522 (baseline restored; 9 tied-scalar compound-assign tests fixed)
  • Full DBIx::Class 0.082844 suite — 13,858/13,858 PASS (at commit 3a1707b56)
  • Both JVM and --interpreter backends
  • LAST/NEXT/REDO after tail-called sub return

Note on t/cdbi/68-inflate_has_a.t CI timeout

This test timed out (exit 137, 300 s) in one CI run "under load". It is not
a regression
: the test passes in <60 s locally on the current HEAD (all 6
subtests pass), and the full DBIx::Class suite PASS recorded at 3a1707b56
includes this test. The timeout is load-induced — competing JVMs delay JIT
compilation of DateTime (a large module loaded by this test) past the 300 s
harness limit, the same mechanism documented in dev/design/reduce-apply-bytecode.md.

Generated with Devin

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>

@fglock fglock force-pushed the perf/reduce-apply-bytecode branch from 50f6c36 to 9389e05 Compare May 1, 2026 15:14
@fglock fglock changed the title perf: reduce apply() bytecode size (Phase 1: TempLocalCountVisitor + buffer 256→32) perf: reduce apply() bytecode size (Phase 1 + 2: TempLocalCountVisitor, buffer 256→32, TAILCALL trampoline extraction) May 1, 2026
@fglock fglock force-pushed the perf/reduce-apply-bytecode branch 2 times, most recently from de12a06 to 534885e Compare May 1, 2026 20:24
@fglock
Copy link
Copy Markdown
Owner Author

fglock commented May 1, 2026

Phase 4 added: method-size threshold for eval-generated code

Commit a2079f9d6 adds Phase 4 — the key fix for load-induced timeouts.

Problem recap

After Phases 1+2, 7 methods generated by Sub::Quote / Moo via string eval are still 9–36 KB. Under CPU pressure, C1 JIT either fails to compile them efficiently or triggers ASM frame-compute crashes, causing t/cdbi/68-inflate_has_a.t to time out at the 300 s harness limit.

Phase 4 approach

Add a method-size threshold check after bytecode generation in EmitterMethodCreator.getBytecodeInternal():

  • Parse the generated class file to find the largest method's code length
  • If the code is eval-generated (fileName starts with "(eval ") and the largest method exceeds JPERL_MAX_METHOD_BYTES (default: 9,000 bytes), throw a RuntimeException with "requires interpreter fallback" in the message
  • createRuntimeCode() catches this and routes to the PerlOnJava interpreter
  • Fast path: if classData.length <= threshold, skip parsing (99% of classes are small)

Why eval-only?

Regular Perl files can also have methods up to ~16 KB (e.g. unit test bodies in local.t) that work correctly on the JVM backend but would fail on the interpreter due to known parity gaps. The C1 failure problem exists exclusively in Sub::Quote string-eval'd code, so restricting the threshold to eval-generated code avoids false positives.

Results

make: all unit tests PASS
t/cdbi/68-inflate_has_a.t: PASS 6/6
  Phase 4 fires for 3 Sub::Quote methods:
    - 14,392 bytes
    - 13,449 bytes  
    - 15,668 bytes

The default threshold (9,000 bytes) catches all 7 Sub::Quote/Moo methods (minimum 9,043 bytes) while leaving all regular user-code methods on the fast JVM path.

Env vars

Var Effect
JPERL_MAX_METHOD_BYTES=N Override threshold (set to large number like 999999 to disable)
JPERL_SHOW_FALLBACK=1 Shows "Method too large for reliable JIT: N bytes" when threshold fires
JPERL_BYTECODE_SIZE_DEBUG=1 Shows all method sizes + threshold actions

fglock and others added 13 commits May 2, 2026 10:01
…racy

Extends TempLocalCountVisitor to accurately count temporary local
variable allocations that bypass the spill pool (i.e. always call
allocateLocalVariable() directly), allowing the pre-init buffer in
EmitterMethodCreator to shrink from +256 to +32.

Changes to TempLocalCountVisitor:
- BinaryOperatorNode("("): +1 per sub call (callContextSlot)
- BinaryOperatorNode("..", "..."): +3 per flip-flop (flipFlopIdSlot +
  leftSlot + rightSlot; overcounts for list-context range, which is safe)
- BinaryOperatorNode("xor", "^^"): +1 (leftVar always direct)
- OperatorNode("eval", "evalbytes"): +4 (evalResultSlot + cfSlot +
  labelSlot + typeSlot; was +1)
- For1Node: +4 (preEvalListLocal + iteratorIndex + savedLoopVarIndex +
  foreachRegexStateLocal; was +1)
- For3Node: +1 (regexStateLocal or conditionResultReg; was 0)

Reduces EmitterMethodCreator pre-init buffer from 256 to 32.
Infrastructure slots (~22: tailCallCodeRefSlot, controlFlowActionSlot,
16 spill slots, returnValueSlot, dynamicIndex, etc.) plus any remaining
uncounted patterns are absorbed by the 32-slot buffer.

Effect on a simple sub like `sub foo { my $x = shift; $x + 1 }`:
  Before: 0 (visitor) + 256 = 256 null-init bytecodes → ~773 bytes
  After:  1 (visitor) + 32  =  33 null-init bytecodes →  ~99 bytes

All unit tests pass at buffer=32.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
…ply-bytecode

Adds a new "How to Evaluate Results" section to dev/design/reduce-apply-bytecode.md
with:

- Six-step evaluation recipe: bytecode size measurement (JPERL_BYTECODE_SIZE_DEBUG),
  timing commands, correctness verification, C1 JIT failure detection via
  PrintCompilation and JFR, reference benchmarks, and per-phase size targets.
- Reference benchmarks from the Phase 1 completion run (2026-05-01):
  96_is_deteministic_value.t: 11.8s, 8/8; 76joins.t: 9.3s, 27/27;
  8,264 methods, 4.0 MB total, avg 511 bytes, largest 36,554 bytes,
  ~56% total bytecode reduction vs old buffer=256.
- Note that the largest method (36,554 bytes) is driven by per-call-site
  TAILCALL trampoline code (Phase 2 target), not by the pre-init buffer.
- "Progress Tracking" section showing Phase 1 complete, next steps for
  Phase 2 (trampoline extraction) and Phase 3 (eval prologue extraction).

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
…ailCalls()

Replace the ~83-byte inline TAILCALL trampoline emitted at every JVM sub call
site with a single INVOKESTATIC to the new RuntimeCode.resolveTailCalls() helper.

Add `resolveTailCalls(RuntimeList result, int callContext)` static helper:
- Loops while the result is a TAILCALL control-flow marker
- Each iteration calls `apply(codeRef, "tailcall", args, callContext)`, which
  already sets `inTailCallTrampoline` to preserve the existing deep-recursion
  counter behaviour (tail calls don't inflate the depth counter)
- Returns the first non-TAILCALL result to the call site

Before the `ASTORE controlFlowTempSlot`, prepend:
  ILOAD callContextSlot
  INVOKESTATIC RuntimeCode.resolveTailCalls(RuntimeList, I)RuntimeList

Remove the 65-byte inline trampoline block that previously followed
`isNonLocalGoto → IFEQ notControlFlow`:
  - TAILCALL type check (CHECKCAST + getControlFlowType + ordinal + ICONST_4 + IF_ICMPNE)
  - tailcallLoop label + extract codeRef/args + INVOKESTATIC apply("tailcall", …)
  - re-check isNonLocalGoto + re-check TAILCALL + IF_ICMPEQ tailcallLoop
  - two GOTO blockDispatcher branches + notTailcall label

After the change the per-call-site sequence is:
  INVOKESTATIC apply(…)               ← unchanged
  ILOAD callContextSlot               ← new
  INVOKESTATIC resolveTailCalls(…)    ← new  (~5 bytes replaces ~65 bytes)
  ASTORE controlFlowTempSlot
  ALOAD  controlFlowTempSlot
  INVOKEVIRTUAL isNonLocalGoto()
  IFEQ notControlFlow
  GOTO blockDispatcher
  notControlFlow:
  ALOAD controlFlowTempSlot

- Per-call-site bytecode: ~83 bytes → ~18 bytes (~65 bytes saved per site)
- Large methods with many sub calls (e.g., SQL::Abstract closures with 250+
  call sites) shrink by ~16 KB, reducing C1 JIT "out of virtual registers"
  failures on CPU-starved machines
- tailCallCodeRefSlot / tailCallArgsSlot kept: still needed by the
  returnLabel trampoline in EmitterMethodCreator for goto &NAME in sub bodies

- All `make` unit tests pass
- tail_calls.t: 7/7, subroutine.t: 39/39
- goto &sub chains (factorial, multi-hop, @_ aliasing) verified correct
- LAST/NEXT/REDO after tail-called sub return verified correct
- Both JVM and --interpreter backends pass

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
…e control-flow branch

The initial Phase 2 commit called resolveTailCalls() unconditionally at every
call site (before the isNonLocalGoto() check), adding an extra INVOKESTATIC
+ instanceof to the common path.  This reverts that regression.

resolveTailCalls() is now invoked only inside the isNonLocalGoto()==true
branch, which executes only when apply() returned a control-flow marker
(LAST/NEXT/REDO/GOTO/RETURN/TAILCALL) — a rare event in normal execution.

```
apply(...)
ASTORE controlFlowTempSlot
ALOAD  controlFlowTempSlot
INVOKEVIRTUAL isNonLocalGoto()   <- one virtual call, returns false
IFEQ notControlFlow              <- branch taken, nothing else executes
notControlFlow:
ALOAD controlFlowTempSlot
```

```
INVOKEVIRTUAL isNonLocalGoto()   <- returns true (control flow detected)
ALOAD  controlFlowTempSlot
ILOAD  callContextSlot
INVOKESTATIC resolveTailCalls()  <- resolves TAILCALL chain
ASTORE controlFlowTempSlot
INVOKEVIRTUAL isNonLocalGoto()   <- re-check (chain may have ended normally)
IFEQ   notControlFlow
GOTO   blockDispatcher
```

| Version        | Bytes/site | Hot-path overhead |
|----------------|------------|-------------------|
| Phase 1        | ~83 bytes  | 1 virtual call    |
| Phase 2 (bad)  | ~23 bytes  | 1 static + 1 virtual call (+1 extra) |
| Phase 2 (this) | ~38 bytes  | 1 virtual call (identical to Phase 1) |

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
…code emitter

The `callContextSlot` variable was allocated via `allocateLocalVariable()` and
stored with `ISTORE` (int type) in four places:

  - EmitSubroutine.handleApplyOperator()  — sub calls: func(args)
  - Dereference.java  — method calls: $obj->method(args)
  - EmitOperator.handleSubstrOperator()  — substr built-in
  - EmitOperator.handleOperator()  — prototype-based operators

The pre-initialisation loop in EmitterMethodCreator initialises every
temp slot as `ACONST_NULL / ASTORE` (reference type).  When any of these
`callContextSlot` slots was in the pre-init range, two code paths (one
that executed `ISTORE callContextSlot` and one that jumped past it) met
at the `blockDispatcher` merge point with incompatible types (int vs
null-reference).  The JVM verifier threw:

    VerifyError: Bad local variable type

This VerifyError was caught by `executeCodeImpl` which re-ran the main
script body through the interpreter, calling `plan tests => 23` a second
time — causing "You tried to plan twice" in DBIx::Class torture.t.

Fix: replace each `ISTORE callContextSlot` + later `ILOAD callContextSlot`
with a direct call to `pushCallContext()` at every use site.
`pushCallContext()` emits either an `LDC` constant or `ILOAD 2` (the
callContext method parameter); both are side-effect-free and can be
re-emitted inline without storing in a local slot.

TempLocalCountVisitor: the `->` and `(` cases previously reflected the
(now-removed) int slot allocation.  They are KEPT because the count
controls the pre-init range in EmitterMethodCreator — removing them would
shrink that range and leave reference-typed slots in TOP state, causing a
second VerifyError ("Type top … is not assignable to reference type").

Fixes the regression introduced in perf/reduce-apply-bytecode Phase 2
(blockDispatcher sharing across call sites).

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
…yError

Document the root cause (int slot pre-initialised as null-reference causing
VerifyError at blockDispatcher merge point) and the fix (pushCallContext()
at each use site; keep TempLocalCountVisitor counts for pre-init range).

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
DBIx::Class 0.082844: 314 files, 13858 tests, PASS (1618 s wall).
Commit e3b9876 is the earliest known-good SHA on this branch.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
…signs

When a compound assignment operator like `|=`, `&=`, `^=`, `**=`, `<<=`, `>>=`,
or `.=` is applied to a tied scalar variable, Perl semantics require exactly two
FETCHes: one to read the current value (for the RHS of the base operation) and
one to return the result to the caller.  The fast-path operators that go through
dedicated *Assign methods (addAssign, subtractAssign, etc.) already return the
arg1 (lvalue) object directly — so the caller's assignment triggers the 2nd FETCH
correctly.

The fallback path in handleCompoundAssignment() was using the return value of
set() instead of the original lvalue.  For TIED_SCALAR variables, set() calls
tiedStore() which returns the stored value (a plain RuntimeScalar copy), not the
tied object.  So the 2nd FETCH never fired, leaving tied-scalar compound assigns
with fetch=1 rather than the required fetch=2.

This caused 9 tests in op/bop.t to fail on the perf/reduce-apply-bytecode branch
(491/522 passing vs. the expected 500/522 baseline), even though they were
accidentally passing before (the tied scalar was being unwrapped by a later
assignment in is(), which accidentally triggered the 2nd FETCH).

Fix: keep leftSlot live across the set() call, discard set()'s return value, then
reload leftSlot so the lvalue (TIED_SCALAR) propagates to the caller — exactly as
the *Assign methods do.  For non-tied lvalues the object identity is unchanged so
there is no behavioral difference.

Result: op/bop.t now passes 500/522 (matching the Phase 1 baseline).

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Methods generated by Sub::Quote / Moo via string eval can be 9–36 KB.
The C1 JIT either fails to compile them efficiently or triggers ASM
frame-compute crashes under CPU pressure, causing t/cdbi/68-inflate_has_a.t
to time out at 300 s in CI when other JVMs are competing for the CPU.

Phase 4 adds a method-size threshold check in getBytecodeInternal():
- After generating bytecode, parse the class file to find the largest method
- If the code is eval-generated (fileName starts with "(eval ") AND the
  largest method exceeds JPERL_MAX_METHOD_BYTES (default: 9 000 bytes),
  throw a RuntimeException("...requires interpreter fallback") which is
  then handled by createRuntimeCode() to route to the interpreter
- Fast path: if classData.length <= threshold, skip parsing entirely (99%
  of classes are small; no overhead for them)

Why eval-only?
Regular Perl files can also have methods up to ~16 KB (e.g. unit test bodies
in local.t) that work correctly on the JVM backend but would fail on the
interpreter due to known parity gaps.  The C1 failure problem exists
exclusively in Sub::Quote string-eval'd code, so restricting the threshold
to eval-generated code avoids false positives.

Key changes:
- EmitterMethodCreator.java: add Phase 4 threshold check
  - Parse bytecode to find max method size (only when isEvalCode + large)
  - Throw RuntimeException with "requires interpreter fallback" in message
  - Pre-existing RuntimeException catch re-throws cleanly without wrapping
- EmitterMethodCreator.java: improve debug output
  - JPERL_BYTECODE_SIZE_DEBUG output now correctly scoped inside size-check
  - JPERL_MAX_METHOD_BYTES env var overrides the 9 000-byte default
- dev/design/reduce-apply-bytecode.md: add Phase 4 design and update
  progress tracking, mark Phase 4 complete

Verified:
- make: all unit tests PASS (including local.t 74/74, destroy_collections.t)
- t/cdbi/68-inflate_has_a.t: PASS 6/6 (with threshold firing 3 times for
  14 392 / 13 449 / 15 668-byte Sub::Quote-generated methods)

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Reorganise the three-level guard comment block to match the
guard-order list used in the design doc, and add explicit
closing-brace comments so the nesting is immediately legible.

Also rename local variable `env` → `maxBytesEnv` to avoid
shadowing the `String[] env` declared earlier in the same method
(the old name caused a compile error introduced in the previous
structural refactor attempt).

No behaviour change; build and tests pass.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Replace the terse 10-line comment with a full explanation covering:

- The exact bug it prevents (DBIx::Class timeout: C1 JIT "out of
  virtual registers" → 50× slowdown → TAP SIGKILL after 300 s)
- Why the interpreter is preferable to C1 failure (~5–20× vs ~50×)
- Which 7 methods are affected (Sub::Quote/Moo, 9–36 KB)
- How the "requires interpreter fallback" exception propagates
  through getBytecodeInternal() up to checkForInterpreterFallback()
- Why the isEvalCode guard must not be removed (regular .t files
  have equally large methods but fail on the interpreter)
- Why classData.length alone cannot serve as the final threshold

Also add a targeted warning at the throw site noting that the
literal substring "requires interpreter fallback" is load-bearing.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
The ASCII box took 7 lines for what two lines say just as clearly.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Standard Perl behaviour when a class calls `use overload;` with no
operator arguments:
- The package IS registered as overloaded (the `((` marker is set)
- But NO actual operator methods are defined
- String/numeric operators on such objects silently fall back to
  native comparison — no "no method found" error is raised

PerlOnJava was incorrectly throwing "Operation 'eq': no method found,
left argument in overloaded package …" for blessed objects whose class
had `use overload;` with no operators (e.g. Image::ExifTool).

Root cause
----------
`OverloadContext.allowsFallbackAutogen()` returned `true` only when
`hasFallbackGlob=true` AND `fallbackValue=1` (i.e. explicit `fallback
=> 1`).  With `use overload;` (no args) there is no `()` glob so
`hasFallbackGlob=false` and the method returned `false`, causing
`throwIfFallbackDenied` in CompareOperators to raise the error.

Fix
---
Add a `hasAnyOperatorMethod` boolean to `OverloadContext`.  Computed
once in `prepare()` by scanning `globalCodeRefs` for any `(x` key
(excluding the `((` marker and `()` fallback) in the full MRO.

`allowsFallbackAutogen()` now returns `true` when
`!hasAnyOperatorMethod`, regardless of the fallback setting.  This
matches the four real-world cases confirmed against system Perl:

  use overload;                        # no ops       → eq works, no throw
  use overload '+' => sub {};          # one op       → eq throws
  use overload '+' => sub {}, fb => 1; # one op + fb1 → eq works, no throw
  use overload fallback => 0;          # no ops + fb0 → eq throws (fb=0 path)

Verified
--------
- New subtests added to src/test/resources/unit/overload/fallback.t
  pass under both system Perl and jperl
- Full `make` (all unit shards) passes

Fixes the Image::ExifTool Apple.t failure seen on master where
`ImageInfo($file)` returned a blessed ExifTool object and the caller
compared it with `eq`.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
@fglock
Copy link
Copy Markdown
Owner Author

fglock commented May 2, 2026

./jcpan -t DBIx::Class pass

@fglock fglock merged commit f6e0f57 into master May 2, 2026
2 checks passed
@fglock fglock deleted the perf/reduce-apply-bytecode branch May 2, 2026 09:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant