perf: reduce apply() bytecode size (Phase 1 + 2: TempLocalCountVisitor, buffer 256→32, TAILCALL trampoline extraction) by fglock · Pull Request #650 · fglock/PerlOnJava

fglock · 2026-05-01T14:16:02Z

Summary

Reduces the bytecode size of JVM-compiled apply() methods, targeting
the C1 JIT failure ("out of virtual registers in linear scan") that causes
t/96_is_deteministic_value.t (DBIx::Class) to time out under CPU pressure.

Phase 1 — Extend `TempLocalCountVisitor` + reduce null-init buffer 256→32

Every apply() method begins with a null-initialisation prologue for all
local variable slots. The pre-init count was previously estimated with a
fixed +256 safety margin; this replaces it with a precise count:

Sub call operator (: +1 (callContextSlot)
eval/evalbytes: +4 (evalResultSlot, cfSlot, labelSlot, typeSlot)
for/foreach (For1Node): +4 (preEvalListLocal, iteratorIndex, 2 conditional)
while/for (For3Node): +1 (conservative)
Flip-flop ../...: +3 (flipFlopIdSlot, leftSlot, rightSlot)
xor/^^: +1 (leftVar)

Buffer reduced from +256 to +32. Result: ~56% total bytecode reduction
for the DBIx::Class workload.

Phase 2 — Extract per-call-site TAILCALL trampoline to `resolveTailCalls()`

Every sub call site previously emitted an ~83-byte inline TAILCALL trampoline
to handle goto &sub chains. This adds RuntimeCode.resolveTailCalls() as a
static helper and moves the trampoline logic there.

Key design: resolveTailCalls() is called only inside the
isNonLocalGoto()==true branch, so the common path (no control flow)
has zero extra overhead vs Phase 1 — it remains:

apply(...)  →  ASTORE  →  ALOAD  →  isNonLocalGoto()  →  IFEQ (branch taken)

Per-call-site bytecode: ~83 bytes → ~38 bytes (saves ~45 bytes per site).
A method with 20 call sites shrinks by ~900 bytes; large SQL-generation
closures with 100+ call sites shrink by several KB.

Phase 2 bugfix — Eliminate `VerifyError` from int-typed `callContextSlot`

After Phase 2 landed, DBIx::Class t/multi_create/torture.t failed with
"You tried to plan twice". Root cause:

callContextSlot was allocated via allocateLocalVariable() and stored
with ISTORE (int type) in EmitSubroutine, Dereference, and
EmitOperator.
The pre-init loop initialises every temp slot as ACONST_NULL/ASTORE
(reference type). At blockDispatcher merge points the JVM verifier saw
"int vs null-reference" and threw VerifyError: Bad local variable type.
The VerifyError was caught by executeCodeImpl, which silently fell back
to the interpreter — re-executing the script body and calling
plan tests => 23 a second time.

Fix: replace every ISTORE/ILOAD callContextSlot with inline
pushCallContext() at each use site. The -> and ( entries in
TempLocalCountVisitor are kept (they control the pre-init range for
reference spill slots — removing them caused a second VerifyError at a
different slot).

Post-rebase fix — Restore correct tied-scalar FETCH count for fallback compound assigns

The Phase 2 changes exposed a pre-existing bug in the fallback path of
handleCompoundAssignment() (used for |=, &=, ^=, **=, <<=, >>=, .=).
For tied scalar lvalues, Perl semantics require two FETCHes on compound
assignment (one to read the current value, one to return the result). The
fallback path was returning set()'s return value — a plain scalar copy from
tiedStore() — instead of the lvalue itself, so the 2nd FETCH never fired.

Before Phase 2 this accidentally passed in op/bop.t because the tied object
was unwrapped later by a toString() call inside is(). Phase 2 changed
argument passing so that path no longer triggered, exposing the bug as 9 new
test failures (491/522 vs baseline 500/522).

Fix: keep leftSlot live across the set() call, discard set()'s return
value, then reload leftSlot — matching how addAssign, subtractAssign, etc.
already return arg1 directly. Result: op/bop.t 500/522 (baseline restored).

Files changed

File	Change
`TempLocalCountVisitor.java`	Extended to count 6 new local-allocating patterns; kept `->` and `(` entries as pre-init range proxies
`EmitterMethodCreator.java`	Buffer: `+256` → `+32`
`RuntimeCode.java`	New `resolveTailCalls(RuntimeList, int)` static helper
`EmitSubroutine.java`	Removed inline TAILCALL trampoline; removed `callContextSlot` allocation
`Dereference.java`	Removed `callContextSlot` allocation; inline `pushCallContext()`
`EmitOperator.java`	Removed `callContextSlot` allocation; inline `pushCallContext()`
`EmitBinaryOperator.java`	Fallback compound-assign returns lvalue (not `set()` result) for correct tied-scalar FETCH semantics
`dev/design/reduce-apply-bytecode.md`	Design doc + progress tracking

Test plan

make — all unit tests pass (after rebase onto master)
tail_calls.t — 7/7 (goto &sub factorial, multi-hop chains, @_ aliasing)
subroutine.t — 39/39
t/multi_create/torture.t — 23/23 (was failing with "plan twice" before bugfix)
op/bop.t — 500/522 (baseline restored; 9 tied-scalar compound-assign tests fixed)
Full DBIx::Class 0.082844 suite — 13,858/13,858 PASS (at commit 3a1707b56)
Both JVM and --interpreter backends
LAST/NEXT/REDO after tail-called sub return

Note on `t/cdbi/68-inflate_has_a.t` CI timeout

This test timed out (exit 137, 300 s) in one CI run "under load". It is not
a regression: the test passes in <60 s locally on the current HEAD (all 6
subtests pass), and the full DBIx::Class suite PASS recorded at 3a1707b56
includes this test. The timeout is load-induced — competing JVMs delay JIT
compilation of DateTime (a large module loaded by this test) past the 300 s
harness limit, the same mechanism documented in dev/design/reduce-apply-bytecode.md.

Generated with Devin

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>

fglock · 2026-05-01T21:33:55Z

Phase 4 added: method-size threshold for eval-generated code

Commit a2079f9d6 adds Phase 4 — the key fix for load-induced timeouts.

Problem recap

After Phases 1+2, 7 methods generated by Sub::Quote / Moo via string eval are still 9–36 KB. Under CPU pressure, C1 JIT either fails to compile them efficiently or triggers ASM frame-compute crashes, causing t/cdbi/68-inflate_has_a.t to time out at the 300 s harness limit.

Phase 4 approach

Add a method-size threshold check after bytecode generation in EmitterMethodCreator.getBytecodeInternal():

Parse the generated class file to find the largest method's code length
If the code is eval-generated (fileName starts with "(eval ") and the largest method exceeds JPERL_MAX_METHOD_BYTES (default: 9,000 bytes), throw a RuntimeException with "requires interpreter fallback" in the message
createRuntimeCode() catches this and routes to the PerlOnJava interpreter
Fast path: if classData.length <= threshold, skip parsing (99% of classes are small)

Why eval-only?

Regular Perl files can also have methods up to ~16 KB (e.g. unit test bodies in local.t) that work correctly on the JVM backend but would fail on the interpreter due to known parity gaps. The C1 failure problem exists exclusively in Sub::Quote string-eval'd code, so restricting the threshold to eval-generated code avoids false positives.

Results

make: all unit tests PASS
t/cdbi/68-inflate_has_a.t: PASS 6/6
  Phase 4 fires for 3 Sub::Quote methods:
    - 14,392 bytes
    - 13,449 bytes  
    - 15,668 bytes

The default threshold (9,000 bytes) catches all 7 Sub::Quote/Moo methods (minimum 9,043 bytes) while leaving all regular user-code methods on the fast JVM path.

Env vars

Var	Effect
`JPERL_MAX_METHOD_BYTES=N`	Override threshold (set to large number like `999999` to disable)
`JPERL_SHOW_FALLBACK=1`	Shows "Method too large for reliable JIT: N bytes" when threshold fires
`JPERL_BYTECODE_SIZE_DEBUG=1`	Shows all method sizes + threshold actions

…racy Extends TempLocalCountVisitor to accurately count temporary local variable allocations that bypass the spill pool (i.e. always call allocateLocalVariable() directly), allowing the pre-init buffer in EmitterMethodCreator to shrink from +256 to +32. Changes to TempLocalCountVisitor: - BinaryOperatorNode("("): +1 per sub call (callContextSlot) - BinaryOperatorNode("..", "..."): +3 per flip-flop (flipFlopIdSlot + leftSlot + rightSlot; overcounts for list-context range, which is safe) - BinaryOperatorNode("xor", "^^"): +1 (leftVar always direct) - OperatorNode("eval", "evalbytes"): +4 (evalResultSlot + cfSlot + labelSlot + typeSlot; was +1) - For1Node: +4 (preEvalListLocal + iteratorIndex + savedLoopVarIndex + foreachRegexStateLocal; was +1) - For3Node: +1 (regexStateLocal or conditionResultReg; was 0) Reduces EmitterMethodCreator pre-init buffer from 256 to 32. Infrastructure slots (~22: tailCallCodeRefSlot, controlFlowActionSlot, 16 spill slots, returnValueSlot, dynamicIndex, etc.) plus any remaining uncounted patterns are absorbed by the 32-slot buffer. Effect on a simple sub like `sub foo { my $x = shift; $x + 1 }`: Before: 0 (visitor) + 256 = 256 null-init bytecodes → ~773 bytes After: 1 (visitor) + 32 = 33 null-init bytecodes → ~99 bytes All unit tests pass at buffer=32. Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>

…ply-bytecode Adds a new "How to Evaluate Results" section to dev/design/reduce-apply-bytecode.md with: - Six-step evaluation recipe: bytecode size measurement (JPERL_BYTECODE_SIZE_DEBUG), timing commands, correctness verification, C1 JIT failure detection via PrintCompilation and JFR, reference benchmarks, and per-phase size targets. - Reference benchmarks from the Phase 1 completion run (2026-05-01): 96_is_deteministic_value.t: 11.8s, 8/8; 76joins.t: 9.3s, 27/27; 8,264 methods, 4.0 MB total, avg 511 bytes, largest 36,554 bytes, ~56% total bytecode reduction vs old buffer=256. - Note that the largest method (36,554 bytes) is driven by per-call-site TAILCALL trampoline code (Phase 2 target), not by the pre-init buffer. - "Progress Tracking" section showing Phase 1 complete, next steps for Phase 2 (trampoline extraction) and Phase 3 (eval prologue extraction). Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>

…ailCalls() Replace the ~83-byte inline TAILCALL trampoline emitted at every JVM sub call site with a single INVOKESTATIC to the new RuntimeCode.resolveTailCalls() helper. Add `resolveTailCalls(RuntimeList result, int callContext)` static helper: - Loops while the result is a TAILCALL control-flow marker - Each iteration calls `apply(codeRef, "tailcall", args, callContext)`, which already sets `inTailCallTrampoline` to preserve the existing deep-recursion counter behaviour (tail calls don't inflate the depth counter) - Returns the first non-TAILCALL result to the call site Before the `ASTORE controlFlowTempSlot`, prepend: ILOAD callContextSlot INVOKESTATIC RuntimeCode.resolveTailCalls(RuntimeList, I)RuntimeList Remove the 65-byte inline trampoline block that previously followed `isNonLocalGoto → IFEQ notControlFlow`: - TAILCALL type check (CHECKCAST + getControlFlowType + ordinal + ICONST_4 + IF_ICMPNE) - tailcallLoop label + extract codeRef/args + INVOKESTATIC apply("tailcall", …) - re-check isNonLocalGoto + re-check TAILCALL + IF_ICMPEQ tailcallLoop - two GOTO blockDispatcher branches + notTailcall label After the change the per-call-site sequence is: INVOKESTATIC apply(…) ← unchanged ILOAD callContextSlot ← new INVOKESTATIC resolveTailCalls(…) ← new (~5 bytes replaces ~65 bytes) ASTORE controlFlowTempSlot ALOAD controlFlowTempSlot INVOKEVIRTUAL isNonLocalGoto() IFEQ notControlFlow GOTO blockDispatcher notControlFlow: ALOAD controlFlowTempSlot - Per-call-site bytecode: ~83 bytes → ~18 bytes (~65 bytes saved per site) - Large methods with many sub calls (e.g., SQL::Abstract closures with 250+ call sites) shrink by ~16 KB, reducing C1 JIT "out of virtual registers" failures on CPU-starved machines - tailCallCodeRefSlot / tailCallArgsSlot kept: still needed by the returnLabel trampoline in EmitterMethodCreator for goto &NAME in sub bodies - All `make` unit tests pass - tail_calls.t: 7/7, subroutine.t: 39/39 - goto &sub chains (factorial, multi-hop, @_ aliasing) verified correct - LAST/NEXT/REDO after tail-called sub return verified correct - Both JVM and --interpreter backends pass Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>

…e control-flow branch The initial Phase 2 commit called resolveTailCalls() unconditionally at every call site (before the isNonLocalGoto() check), adding an extra INVOKESTATIC + instanceof to the common path. This reverts that regression. resolveTailCalls() is now invoked only inside the isNonLocalGoto()==true branch, which executes only when apply() returned a control-flow marker (LAST/NEXT/REDO/GOTO/RETURN/TAILCALL) — a rare event in normal execution. ``` apply(...) ASTORE controlFlowTempSlot ALOAD controlFlowTempSlot INVOKEVIRTUAL isNonLocalGoto() <- one virtual call, returns false IFEQ notControlFlow <- branch taken, nothing else executes notControlFlow: ALOAD controlFlowTempSlot ``` ``` INVOKEVIRTUAL isNonLocalGoto() <- returns true (control flow detected) ALOAD controlFlowTempSlot ILOAD callContextSlot INVOKESTATIC resolveTailCalls() <- resolves TAILCALL chain ASTORE controlFlowTempSlot INVOKEVIRTUAL isNonLocalGoto() <- re-check (chain may have ended normally) IFEQ notControlFlow GOTO blockDispatcher ``` | Version | Bytes/site | Hot-path overhead | |----------------|------------|-------------------| | Phase 1 | ~83 bytes | 1 virtual call | | Phase 2 (bad) | ~23 bytes | 1 static + 1 virtual call (+1 extra) | | Phase 2 (this) | ~38 bytes | 1 virtual call (identical to Phase 1) | Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>

…code emitter The `callContextSlot` variable was allocated via `allocateLocalVariable()` and stored with `ISTORE` (int type) in four places: - EmitSubroutine.handleApplyOperator() — sub calls: func(args) - Dereference.java — method calls: $obj->method(args) - EmitOperator.handleSubstrOperator() — substr built-in - EmitOperator.handleOperator() — prototype-based operators The pre-initialisation loop in EmitterMethodCreator initialises every temp slot as `ACONST_NULL / ASTORE` (reference type). When any of these `callContextSlot` slots was in the pre-init range, two code paths (one that executed `ISTORE callContextSlot` and one that jumped past it) met at the `blockDispatcher` merge point with incompatible types (int vs null-reference). The JVM verifier threw: VerifyError: Bad local variable type This VerifyError was caught by `executeCodeImpl` which re-ran the main script body through the interpreter, calling `plan tests => 23` a second time — causing "You tried to plan twice" in DBIx::Class torture.t. Fix: replace each `ISTORE callContextSlot` + later `ILOAD callContextSlot` with a direct call to `pushCallContext()` at every use site. `pushCallContext()` emits either an `LDC` constant or `ILOAD 2` (the callContext method parameter); both are side-effect-free and can be re-emitted inline without storing in a local slot. TempLocalCountVisitor: the `->` and `(` cases previously reflected the (now-removed) int slot allocation. They are KEPT because the count controls the pre-init range in EmitterMethodCreator — removing them would shrink that range and leave reference-typed slots in TOP state, causing a second VerifyError ("Type top … is not assignable to reference type"). Fixes the regression introduced in perf/reduce-apply-bytecode Phase 2 (blockDispatcher sharing across call sites). Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>

…yError Document the root cause (int slot pre-initialised as null-reference causing VerifyError at blockDispatcher merge point) and the fix (pushCallContext() at each use site; keep TempLocalCountVisitor counts for pre-init range). Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>

DBIx::Class 0.082844: 314 files, 13858 tests, PASS (1618 s wall). Commit e3b9876 is the earliest known-good SHA on this branch. Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>

…signs When a compound assignment operator like `|=`, `&=`, `^=`, `**=`, `<<=`, `>>=`, or `.=` is applied to a tied scalar variable, Perl semantics require exactly two FETCHes: one to read the current value (for the RHS of the base operation) and one to return the result to the caller. The fast-path operators that go through dedicated *Assign methods (addAssign, subtractAssign, etc.) already return the arg1 (lvalue) object directly — so the caller's assignment triggers the 2nd FETCH correctly. The fallback path in handleCompoundAssignment() was using the return value of set() instead of the original lvalue. For TIED_SCALAR variables, set() calls tiedStore() which returns the stored value (a plain RuntimeScalar copy), not the tied object. So the 2nd FETCH never fired, leaving tied-scalar compound assigns with fetch=1 rather than the required fetch=2. This caused 9 tests in op/bop.t to fail on the perf/reduce-apply-bytecode branch (491/522 passing vs. the expected 500/522 baseline), even though they were accidentally passing before (the tied scalar was being unwrapped by a later assignment in is(), which accidentally triggered the 2nd FETCH). Fix: keep leftSlot live across the set() call, discard set()'s return value, then reload leftSlot so the lvalue (TIED_SCALAR) propagates to the caller — exactly as the *Assign methods do. For non-tied lvalues the object identity is unchanged so there is no behavioral difference. Result: op/bop.t now passes 500/522 (matching the Phase 1 baseline). Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>

Methods generated by Sub::Quote / Moo via string eval can be 9–36 KB. The C1 JIT either fails to compile them efficiently or triggers ASM frame-compute crashes under CPU pressure, causing t/cdbi/68-inflate_has_a.t to time out at 300 s in CI when other JVMs are competing for the CPU. Phase 4 adds a method-size threshold check in getBytecodeInternal(): - After generating bytecode, parse the class file to find the largest method - If the code is eval-generated (fileName starts with "(eval ") AND the largest method exceeds JPERL_MAX_METHOD_BYTES (default: 9 000 bytes), throw a RuntimeException("...requires interpreter fallback") which is then handled by createRuntimeCode() to route to the interpreter - Fast path: if classData.length <= threshold, skip parsing entirely (99% of classes are small; no overhead for them) Why eval-only? Regular Perl files can also have methods up to ~16 KB (e.g. unit test bodies in local.t) that work correctly on the JVM backend but would fail on the interpreter due to known parity gaps. The C1 failure problem exists exclusively in Sub::Quote string-eval'd code, so restricting the threshold to eval-generated code avoids false positives. Key changes: - EmitterMethodCreator.java: add Phase 4 threshold check - Parse bytecode to find max method size (only when isEvalCode + large) - Throw RuntimeException with "requires interpreter fallback" in message - Pre-existing RuntimeException catch re-throws cleanly without wrapping - EmitterMethodCreator.java: improve debug output - JPERL_BYTECODE_SIZE_DEBUG output now correctly scoped inside size-check - JPERL_MAX_METHOD_BYTES env var overrides the 9 000-byte default - dev/design/reduce-apply-bytecode.md: add Phase 4 design and update progress tracking, mark Phase 4 complete Verified: - make: all unit tests PASS (including local.t 74/74, destroy_collections.t) - t/cdbi/68-inflate_has_a.t: PASS 6/6 (with threshold firing 3 times for 14 392 / 13 449 / 15 668-byte Sub::Quote-generated methods) Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>

Reorganise the three-level guard comment block to match the guard-order list used in the design doc, and add explicit closing-brace comments so the nesting is immediately legible. Also rename local variable `env` → `maxBytesEnv` to avoid shadowing the `String[] env` declared earlier in the same method (the old name caused a compile error introduced in the previous structural refactor attempt). No behaviour change; build and tests pass. Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>

Replace the terse 10-line comment with a full explanation covering: - The exact bug it prevents (DBIx::Class timeout: C1 JIT "out of virtual registers" → 50× slowdown → TAP SIGKILL after 300 s) - Why the interpreter is preferable to C1 failure (~5–20× vs ~50×) - Which 7 methods are affected (Sub::Quote/Moo, 9–36 KB) - How the "requires interpreter fallback" exception propagates through getBytecodeInternal() up to checkForInterpreterFallback() - Why the isEvalCode guard must not be removed (regular .t files have equally large methods but fail on the interpreter) - Why classData.length alone cannot serve as the final threshold Also add a targeted warning at the throw site noting that the literal substring "requires interpreter fallback" is load-bearing. Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>

The ASCII box took 7 lines for what two lines say just as clearly. Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>

Standard Perl behaviour when a class calls `use overload;` with no operator arguments: - The package IS registered as overloaded (the `((` marker is set) - But NO actual operator methods are defined - String/numeric operators on such objects silently fall back to native comparison — no "no method found" error is raised PerlOnJava was incorrectly throwing "Operation 'eq': no method found, left argument in overloaded package …" for blessed objects whose class had `use overload;` with no operators (e.g. Image::ExifTool). Root cause ---------- `OverloadContext.allowsFallbackAutogen()` returned `true` only when `hasFallbackGlob=true` AND `fallbackValue=1` (i.e. explicit `fallback => 1`). With `use overload;` (no args) there is no `()` glob so `hasFallbackGlob=false` and the method returned `false`, causing `throwIfFallbackDenied` in CompareOperators to raise the error. Fix --- Add a `hasAnyOperatorMethod` boolean to `OverloadContext`. Computed once in `prepare()` by scanning `globalCodeRefs` for any `(x` key (excluding the `((` marker and `()` fallback) in the full MRO. `allowsFallbackAutogen()` now returns `true` when `!hasAnyOperatorMethod`, regardless of the fallback setting. This matches the four real-world cases confirmed against system Perl: use overload; # no ops → eq works, no throw use overload '+' => sub {}; # one op → eq throws use overload '+' => sub {}, fb => 1; # one op + fb1 → eq works, no throw use overload fallback => 0; # no ops + fb0 → eq throws (fb=0 path) Verified -------- - New subtests added to src/test/resources/unit/overload/fallback.t pass under both system Perl and jperl - Full `make` (all unit shards) passes Fixes the Image::ExifTool Apple.t failure seen on master where `ImageInfo($file)` returned a blessed ExifTool object and the caller compared it with `eq`. Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>

fglock · 2026-05-02T09:23:32Z

./jcpan -t DBIx::Class pass

fglock force-pushed the perf/reduce-apply-bytecode branch from 50f6c36 to 9389e05 Compare May 1, 2026 15:14

fglock changed the title ~~perf: reduce apply() bytecode size (Phase 1: TempLocalCountVisitor + buffer 256→32)~~ perf: reduce apply() bytecode size (Phase 1 + 2: TempLocalCountVisitor, buffer 256→32, TAILCALL trampoline extraction) May 1, 2026

fglock force-pushed the perf/reduce-apply-bytecode branch 2 times, most recently from de12a06 to 534885e Compare May 1, 2026 20:24

fglock and others added 13 commits May 2, 2026 10:01

docs: replace box header in Phase 4 comment with plain text

567a650

The ASCII box took 7 lines for what two lines say just as clearly. Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>

fglock force-pushed the perf/reduce-apply-bytecode branch from 4dc236e to 10b504c Compare May 2, 2026 08:01

fglock mentioned this pull request May 2, 2026

DBIx::Class t/52leaks.t intermittent: detached result source from premature weak-ref clear by ReachabilityWalker auto-sweep #646

Closed

4 tasks

fglock merged commit f6e0f57 into master May 2, 2026
2 checks passed

fglock deleted the perf/reduce-apply-bytecode branch May 2, 2026 09:24

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: reduce apply() bytecode size (Phase 1 + 2: TempLocalCountVisitor, buffer 256→32, TAILCALL trampoline extraction)#650

perf: reduce apply() bytecode size (Phase 1 + 2: TempLocalCountVisitor, buffer 256→32, TAILCALL trampoline extraction)#650
fglock merged 13 commits intomasterfrom
perf/reduce-apply-bytecode

fglock commented May 1, 2026 •

edited

Loading

Uh oh!

fglock commented May 1, 2026

Uh oh!

fglock commented May 2, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

fglock commented May 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Phase 1 — Extend TempLocalCountVisitor + reduce null-init buffer 256→32

Phase 2 — Extract per-call-site TAILCALL trampoline to resolveTailCalls()

Phase 2 bugfix — Eliminate VerifyError from int-typed callContextSlot

Post-rebase fix — Restore correct tied-scalar FETCH count for fallback compound assigns

Files changed

Test plan

Note on t/cdbi/68-inflate_has_a.t CI timeout

Uh oh!

fglock commented May 1, 2026

Phase 4 added: method-size threshold for eval-generated code

Problem recap

Phase 4 approach

Why eval-only?

Results

Env vars

Uh oh!

fglock commented May 2, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

fglock commented May 1, 2026 •

edited

Loading

Phase 1 — Extend `TempLocalCountVisitor` + reduce null-init buffer 256→32

Phase 2 — Extract per-call-site TAILCALL trampoline to `resolveTailCalls()`

Phase 2 bugfix — Eliminate `VerifyError` from int-typed `callContextSlot`

Note on `t/cdbi/68-inflate_has_a.t` CI timeout