Conversation
50f6c36 to
9389e05
Compare
de12a06 to
534885e
Compare
Phase 4 added: method-size threshold for eval-generated codeCommit Problem recapAfter Phases 1+2, 7 methods generated by Sub::Quote / Moo via string Phase 4 approachAdd a method-size threshold check after bytecode generation in
Why eval-only?Regular Perl files can also have methods up to ~16 KB (e.g. unit test bodies in ResultsThe default threshold (9,000 bytes) catches all 7 Sub::Quote/Moo methods (minimum 9,043 bytes) while leaving all regular user-code methods on the fast JVM path. Env vars
|
…racy
Extends TempLocalCountVisitor to accurately count temporary local
variable allocations that bypass the spill pool (i.e. always call
allocateLocalVariable() directly), allowing the pre-init buffer in
EmitterMethodCreator to shrink from +256 to +32.
Changes to TempLocalCountVisitor:
- BinaryOperatorNode("("): +1 per sub call (callContextSlot)
- BinaryOperatorNode("..", "..."): +3 per flip-flop (flipFlopIdSlot +
leftSlot + rightSlot; overcounts for list-context range, which is safe)
- BinaryOperatorNode("xor", "^^"): +1 (leftVar always direct)
- OperatorNode("eval", "evalbytes"): +4 (evalResultSlot + cfSlot +
labelSlot + typeSlot; was +1)
- For1Node: +4 (preEvalListLocal + iteratorIndex + savedLoopVarIndex +
foreachRegexStateLocal; was +1)
- For3Node: +1 (regexStateLocal or conditionResultReg; was 0)
Reduces EmitterMethodCreator pre-init buffer from 256 to 32.
Infrastructure slots (~22: tailCallCodeRefSlot, controlFlowActionSlot,
16 spill slots, returnValueSlot, dynamicIndex, etc.) plus any remaining
uncounted patterns are absorbed by the 32-slot buffer.
Effect on a simple sub like `sub foo { my $x = shift; $x + 1 }`:
Before: 0 (visitor) + 256 = 256 null-init bytecodes → ~773 bytes
After: 1 (visitor) + 32 = 33 null-init bytecodes → ~99 bytes
All unit tests pass at buffer=32.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
…ply-bytecode Adds a new "How to Evaluate Results" section to dev/design/reduce-apply-bytecode.md with: - Six-step evaluation recipe: bytecode size measurement (JPERL_BYTECODE_SIZE_DEBUG), timing commands, correctness verification, C1 JIT failure detection via PrintCompilation and JFR, reference benchmarks, and per-phase size targets. - Reference benchmarks from the Phase 1 completion run (2026-05-01): 96_is_deteministic_value.t: 11.8s, 8/8; 76joins.t: 9.3s, 27/27; 8,264 methods, 4.0 MB total, avg 511 bytes, largest 36,554 bytes, ~56% total bytecode reduction vs old buffer=256. - Note that the largest method (36,554 bytes) is driven by per-call-site TAILCALL trampoline code (Phase 2 target), not by the pre-init buffer. - "Progress Tracking" section showing Phase 1 complete, next steps for Phase 2 (trampoline extraction) and Phase 3 (eval prologue extraction). Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
…ailCalls()
Replace the ~83-byte inline TAILCALL trampoline emitted at every JVM sub call
site with a single INVOKESTATIC to the new RuntimeCode.resolveTailCalls() helper.
Add `resolveTailCalls(RuntimeList result, int callContext)` static helper:
- Loops while the result is a TAILCALL control-flow marker
- Each iteration calls `apply(codeRef, "tailcall", args, callContext)`, which
already sets `inTailCallTrampoline` to preserve the existing deep-recursion
counter behaviour (tail calls don't inflate the depth counter)
- Returns the first non-TAILCALL result to the call site
Before the `ASTORE controlFlowTempSlot`, prepend:
ILOAD callContextSlot
INVOKESTATIC RuntimeCode.resolveTailCalls(RuntimeList, I)RuntimeList
Remove the 65-byte inline trampoline block that previously followed
`isNonLocalGoto → IFEQ notControlFlow`:
- TAILCALL type check (CHECKCAST + getControlFlowType + ordinal + ICONST_4 + IF_ICMPNE)
- tailcallLoop label + extract codeRef/args + INVOKESTATIC apply("tailcall", …)
- re-check isNonLocalGoto + re-check TAILCALL + IF_ICMPEQ tailcallLoop
- two GOTO blockDispatcher branches + notTailcall label
After the change the per-call-site sequence is:
INVOKESTATIC apply(…) ← unchanged
ILOAD callContextSlot ← new
INVOKESTATIC resolveTailCalls(…) ← new (~5 bytes replaces ~65 bytes)
ASTORE controlFlowTempSlot
ALOAD controlFlowTempSlot
INVOKEVIRTUAL isNonLocalGoto()
IFEQ notControlFlow
GOTO blockDispatcher
notControlFlow:
ALOAD controlFlowTempSlot
- Per-call-site bytecode: ~83 bytes → ~18 bytes (~65 bytes saved per site)
- Large methods with many sub calls (e.g., SQL::Abstract closures with 250+
call sites) shrink by ~16 KB, reducing C1 JIT "out of virtual registers"
failures on CPU-starved machines
- tailCallCodeRefSlot / tailCallArgsSlot kept: still needed by the
returnLabel trampoline in EmitterMethodCreator for goto &NAME in sub bodies
- All `make` unit tests pass
- tail_calls.t: 7/7, subroutine.t: 39/39
- goto &sub chains (factorial, multi-hop, @_ aliasing) verified correct
- LAST/NEXT/REDO after tail-called sub return verified correct
- Both JVM and --interpreter backends pass
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
…e control-flow branch The initial Phase 2 commit called resolveTailCalls() unconditionally at every call site (before the isNonLocalGoto() check), adding an extra INVOKESTATIC + instanceof to the common path. This reverts that regression. resolveTailCalls() is now invoked only inside the isNonLocalGoto()==true branch, which executes only when apply() returned a control-flow marker (LAST/NEXT/REDO/GOTO/RETURN/TAILCALL) — a rare event in normal execution. ``` apply(...) ASTORE controlFlowTempSlot ALOAD controlFlowTempSlot INVOKEVIRTUAL isNonLocalGoto() <- one virtual call, returns false IFEQ notControlFlow <- branch taken, nothing else executes notControlFlow: ALOAD controlFlowTempSlot ``` ``` INVOKEVIRTUAL isNonLocalGoto() <- returns true (control flow detected) ALOAD controlFlowTempSlot ILOAD callContextSlot INVOKESTATIC resolveTailCalls() <- resolves TAILCALL chain ASTORE controlFlowTempSlot INVOKEVIRTUAL isNonLocalGoto() <- re-check (chain may have ended normally) IFEQ notControlFlow GOTO blockDispatcher ``` | Version | Bytes/site | Hot-path overhead | |----------------|------------|-------------------| | Phase 1 | ~83 bytes | 1 virtual call | | Phase 2 (bad) | ~23 bytes | 1 static + 1 virtual call (+1 extra) | | Phase 2 (this) | ~38 bytes | 1 virtual call (identical to Phase 1) | Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
…code emitter
The `callContextSlot` variable was allocated via `allocateLocalVariable()` and
stored with `ISTORE` (int type) in four places:
- EmitSubroutine.handleApplyOperator() — sub calls: func(args)
- Dereference.java — method calls: $obj->method(args)
- EmitOperator.handleSubstrOperator() — substr built-in
- EmitOperator.handleOperator() — prototype-based operators
The pre-initialisation loop in EmitterMethodCreator initialises every
temp slot as `ACONST_NULL / ASTORE` (reference type). When any of these
`callContextSlot` slots was in the pre-init range, two code paths (one
that executed `ISTORE callContextSlot` and one that jumped past it) met
at the `blockDispatcher` merge point with incompatible types (int vs
null-reference). The JVM verifier threw:
VerifyError: Bad local variable type
This VerifyError was caught by `executeCodeImpl` which re-ran the main
script body through the interpreter, calling `plan tests => 23` a second
time — causing "You tried to plan twice" in DBIx::Class torture.t.
Fix: replace each `ISTORE callContextSlot` + later `ILOAD callContextSlot`
with a direct call to `pushCallContext()` at every use site.
`pushCallContext()` emits either an `LDC` constant or `ILOAD 2` (the
callContext method parameter); both are side-effect-free and can be
re-emitted inline without storing in a local slot.
TempLocalCountVisitor: the `->` and `(` cases previously reflected the
(now-removed) int slot allocation. They are KEPT because the count
controls the pre-init range in EmitterMethodCreator — removing them would
shrink that range and leave reference-typed slots in TOP state, causing a
second VerifyError ("Type top … is not assignable to reference type").
Fixes the regression introduced in perf/reduce-apply-bytecode Phase 2
(blockDispatcher sharing across call sites).
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
…yError Document the root cause (int slot pre-initialised as null-reference causing VerifyError at blockDispatcher merge point) and the fix (pushCallContext() at each use site; keep TempLocalCountVisitor counts for pre-init range). Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
DBIx::Class 0.082844: 314 files, 13858 tests, PASS (1618 s wall). Commit e3b9876 is the earliest known-good SHA on this branch. Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
…signs When a compound assignment operator like `|=`, `&=`, `^=`, `**=`, `<<=`, `>>=`, or `.=` is applied to a tied scalar variable, Perl semantics require exactly two FETCHes: one to read the current value (for the RHS of the base operation) and one to return the result to the caller. The fast-path operators that go through dedicated *Assign methods (addAssign, subtractAssign, etc.) already return the arg1 (lvalue) object directly — so the caller's assignment triggers the 2nd FETCH correctly. The fallback path in handleCompoundAssignment() was using the return value of set() instead of the original lvalue. For TIED_SCALAR variables, set() calls tiedStore() which returns the stored value (a plain RuntimeScalar copy), not the tied object. So the 2nd FETCH never fired, leaving tied-scalar compound assigns with fetch=1 rather than the required fetch=2. This caused 9 tests in op/bop.t to fail on the perf/reduce-apply-bytecode branch (491/522 passing vs. the expected 500/522 baseline), even though they were accidentally passing before (the tied scalar was being unwrapped by a later assignment in is(), which accidentally triggered the 2nd FETCH). Fix: keep leftSlot live across the set() call, discard set()'s return value, then reload leftSlot so the lvalue (TIED_SCALAR) propagates to the caller — exactly as the *Assign methods do. For non-tied lvalues the object identity is unchanged so there is no behavioral difference. Result: op/bop.t now passes 500/522 (matching the Phase 1 baseline). Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Methods generated by Sub::Quote / Moo via string eval can be 9–36 KB.
The C1 JIT either fails to compile them efficiently or triggers ASM
frame-compute crashes under CPU pressure, causing t/cdbi/68-inflate_has_a.t
to time out at 300 s in CI when other JVMs are competing for the CPU.
Phase 4 adds a method-size threshold check in getBytecodeInternal():
- After generating bytecode, parse the class file to find the largest method
- If the code is eval-generated (fileName starts with "(eval ") AND the
largest method exceeds JPERL_MAX_METHOD_BYTES (default: 9 000 bytes),
throw a RuntimeException("...requires interpreter fallback") which is
then handled by createRuntimeCode() to route to the interpreter
- Fast path: if classData.length <= threshold, skip parsing entirely (99%
of classes are small; no overhead for them)
Why eval-only?
Regular Perl files can also have methods up to ~16 KB (e.g. unit test bodies
in local.t) that work correctly on the JVM backend but would fail on the
interpreter due to known parity gaps. The C1 failure problem exists
exclusively in Sub::Quote string-eval'd code, so restricting the threshold
to eval-generated code avoids false positives.
Key changes:
- EmitterMethodCreator.java: add Phase 4 threshold check
- Parse bytecode to find max method size (only when isEvalCode + large)
- Throw RuntimeException with "requires interpreter fallback" in message
- Pre-existing RuntimeException catch re-throws cleanly without wrapping
- EmitterMethodCreator.java: improve debug output
- JPERL_BYTECODE_SIZE_DEBUG output now correctly scoped inside size-check
- JPERL_MAX_METHOD_BYTES env var overrides the 9 000-byte default
- dev/design/reduce-apply-bytecode.md: add Phase 4 design and update
progress tracking, mark Phase 4 complete
Verified:
- make: all unit tests PASS (including local.t 74/74, destroy_collections.t)
- t/cdbi/68-inflate_has_a.t: PASS 6/6 (with threshold firing 3 times for
14 392 / 13 449 / 15 668-byte Sub::Quote-generated methods)
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Reorganise the three-level guard comment block to match the guard-order list used in the design doc, and add explicit closing-brace comments so the nesting is immediately legible. Also rename local variable `env` → `maxBytesEnv` to avoid shadowing the `String[] env` declared earlier in the same method (the old name caused a compile error introduced in the previous structural refactor attempt). No behaviour change; build and tests pass. Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Replace the terse 10-line comment with a full explanation covering: - The exact bug it prevents (DBIx::Class timeout: C1 JIT "out of virtual registers" → 50× slowdown → TAP SIGKILL after 300 s) - Why the interpreter is preferable to C1 failure (~5–20× vs ~50×) - Which 7 methods are affected (Sub::Quote/Moo, 9–36 KB) - How the "requires interpreter fallback" exception propagates through getBytecodeInternal() up to checkForInterpreterFallback() - Why the isEvalCode guard must not be removed (regular .t files have equally large methods but fail on the interpreter) - Why classData.length alone cannot serve as the final threshold Also add a targeted warning at the throw site noting that the literal substring "requires interpreter fallback" is load-bearing. Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
The ASCII box took 7 lines for what two lines say just as clearly. Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Standard Perl behaviour when a class calls `use overload;` with no
operator arguments:
- The package IS registered as overloaded (the `((` marker is set)
- But NO actual operator methods are defined
- String/numeric operators on such objects silently fall back to
native comparison — no "no method found" error is raised
PerlOnJava was incorrectly throwing "Operation 'eq': no method found,
left argument in overloaded package …" for blessed objects whose class
had `use overload;` with no operators (e.g. Image::ExifTool).
Root cause
----------
`OverloadContext.allowsFallbackAutogen()` returned `true` only when
`hasFallbackGlob=true` AND `fallbackValue=1` (i.e. explicit `fallback
=> 1`). With `use overload;` (no args) there is no `()` glob so
`hasFallbackGlob=false` and the method returned `false`, causing
`throwIfFallbackDenied` in CompareOperators to raise the error.
Fix
---
Add a `hasAnyOperatorMethod` boolean to `OverloadContext`. Computed
once in `prepare()` by scanning `globalCodeRefs` for any `(x` key
(excluding the `((` marker and `()` fallback) in the full MRO.
`allowsFallbackAutogen()` now returns `true` when
`!hasAnyOperatorMethod`, regardless of the fallback setting. This
matches the four real-world cases confirmed against system Perl:
use overload; # no ops → eq works, no throw
use overload '+' => sub {}; # one op → eq throws
use overload '+' => sub {}, fb => 1; # one op + fb1 → eq works, no throw
use overload fallback => 0; # no ops + fb0 → eq throws (fb=0 path)
Verified
--------
- New subtests added to src/test/resources/unit/overload/fallback.t
pass under both system Perl and jperl
- Full `make` (all unit shards) passes
Fixes the Image::ExifTool Apple.t failure seen on master where
`ImageInfo($file)` returned a blessed ExifTool object and the caller
compared it with `eq`.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
4dc236e to
10b504c
Compare
|
./jcpan -t DBIx::Class pass |
Summary
Reduces the bytecode size of JVM-compiled
apply()methods, targetingthe C1 JIT failure ("out of virtual registers in linear scan") that causes
t/96_is_deteministic_value.t(DBIx::Class) to time out under CPU pressure.Phase 1 — Extend
TempLocalCountVisitor+ reduce null-init buffer 256→32Every
apply()method begins with a null-initialisation prologue for alllocal variable slots. The pre-init count was previously estimated with a
fixed
+256safety margin; this replaces it with a precise count:(: +1 (callContextSlot)eval/evalbytes: +4 (evalResultSlot,cfSlot,labelSlot,typeSlot)for/foreach(For1Node): +4 (preEvalListLocal,iteratorIndex, 2 conditional)while/for(For3Node): +1 (conservative)../...: +3 (flipFlopIdSlot,leftSlot,rightSlot)xor/^^: +1 (leftVar)Buffer reduced from
+256to+32. Result: ~56% total bytecode reductionfor the DBIx::Class workload.
Phase 2 — Extract per-call-site TAILCALL trampoline to
resolveTailCalls()Every sub call site previously emitted an ~83-byte inline TAILCALL trampoline
to handle
goto &subchains. This addsRuntimeCode.resolveTailCalls()as astatic helper and moves the trampoline logic there.
Key design:
resolveTailCalls()is called only inside theisNonLocalGoto()==truebranch, so the common path (no control flow)has zero extra overhead vs Phase 1 — it remains:
Per-call-site bytecode: ~83 bytes → ~38 bytes (saves ~45 bytes per site).
A method with 20 call sites shrinks by ~900 bytes; large SQL-generation
closures with 100+ call sites shrink by several KB.
Phase 2 bugfix — Eliminate
VerifyErrorfrom int-typedcallContextSlotAfter Phase 2 landed, DBIx::Class
t/multi_create/torture.tfailed with"You tried to plan twice". Root cause:
callContextSlotwas allocated viaallocateLocalVariable()and storedwith
ISTORE(int type) inEmitSubroutine,Dereference, andEmitOperator.ACONST_NULL/ASTORE(reference type). At
blockDispatchermerge points the JVM verifier saw"int vs null-reference" and threw
VerifyError: Bad local variable type.executeCodeImpl, which silently fell backto the interpreter — re-executing the script body and calling
plan tests => 23a second time.Fix: replace every
ISTORE/ILOAD callContextSlotwith inlinepushCallContext()at each use site. The->and(entries inTempLocalCountVisitorare kept (they control the pre-init range forreference spill slots — removing them caused a second VerifyError at a
different slot).
Post-rebase fix — Restore correct tied-scalar FETCH count for fallback compound assigns
The Phase 2 changes exposed a pre-existing bug in the fallback path of
handleCompoundAssignment()(used for|=,&=,^=,**=,<<=,>>=,.=).For tied scalar lvalues, Perl semantics require two FETCHes on compound
assignment (one to read the current value, one to return the result). The
fallback path was returning
set()'s return value — a plain scalar copy fromtiedStore()— instead of the lvalue itself, so the 2nd FETCH never fired.Before Phase 2 this accidentally passed in
op/bop.tbecause the tied objectwas unwrapped later by a
toString()call insideis(). Phase 2 changedargument passing so that path no longer triggered, exposing the bug as 9 new
test failures (491/522 vs baseline 500/522).
Fix: keep
leftSlotlive across theset()call, discardset()'s returnvalue, then reload
leftSlot— matching howaddAssign,subtractAssign, etc.already return
arg1directly. Result:op/bop.t500/522 (baseline restored).Files changed
TempLocalCountVisitor.java->and(entries as pre-init range proxiesEmitterMethodCreator.java+256→+32RuntimeCode.javaresolveTailCalls(RuntimeList, int)static helperEmitSubroutine.javacallContextSlotallocationDereference.javacallContextSlotallocation; inlinepushCallContext()EmitOperator.javacallContextSlotallocation; inlinepushCallContext()EmitBinaryOperator.javaset()result) for correct tied-scalar FETCH semanticsdev/design/reduce-apply-bytecode.mdTest plan
make— all unit tests pass (after rebase onto master)tail_calls.t— 7/7 (goto &subfactorial, multi-hop chains,@_aliasing)subroutine.t— 39/39t/multi_create/torture.t— 23/23 (was failing with "plan twice" before bugfix)op/bop.t— 500/522 (baseline restored; 9 tied-scalar compound-assign tests fixed)3a1707b56)--interpreterbackendsNote on
t/cdbi/68-inflate_has_a.tCI timeoutThis test timed out (exit 137, 300 s) in one CI run "under load". It is not
a regression: the test passes in <60 s locally on the current HEAD (all 6
subtests pass), and the full DBIx::Class suite PASS recorded at
3a1707b56includes this test. The timeout is load-induced — competing JVMs delay JIT
compilation of
DateTime(a large module loaded by this test) past the 300 sharness limit, the same mechanism documented in
dev/design/reduce-apply-bytecode.md.Generated with Devin
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>