Optimize temporary local variable allocation #173

Closed
fglock wants to merge 1 commit into master from optimize-temp-local-allocation

fglock commented Feb 6, 2026

Summary

Reduce over-allocation of temporary local variables that was causing significant bytecode bloat.

Changes

Before:

int preInitTempLocalsCount = Math.max(128, tempCountVisitor.getMaxTempCount() + 64);
  • Always allocated minimum 128 slots
  • Added 64-slot buffer on top of actual count
  • Total: at least 128 slots regardless of actual need (the 128 floor applies whenever the visitor's count is 64 or less; above that, count + 64)

After:

int preInitTempLocalsCount = tempCountVisitor.getMaxTempCount() + 32;
  • Use visitor's estimate with modest 32-slot buffer
  • No artificial minimum
  • Scales with actual code complexity
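To make the difference concrete, here is a small sketch that evaluates both formulas for a few visitor estimates. The class and method names (TempLocalBudget, oldCount, newCount) are hypothetical and exist only for this illustration; only the two expressions are taken from the change above.

```java
// Hypothetical helper for illustration; not part of the codebase.
final class TempLocalBudget {
    // Old formula from this PR: 64-slot buffer with a floor of 128 slots.
    static int oldCount(int maxTempCount) {
        return Math.max(128, maxTempCount + 64);
    }

    // New formula from this PR: visitor estimate plus a 32-slot buffer.
    static int newCount(int maxTempCount) {
        return maxTempCount + 32;
    }

    public static void main(String[] args) {
        // Sample estimates are made up; they are not measurements.
        for (int estimate : new int[] {0, 10, 64, 100}) {
            System.out.printf("estimate=%d old=%d new=%d%n",
                    estimate, oldCount(estimate), newCount(estimate));
        }
    }
}
```

For small methods the old formula is pinned at the 128 floor, while the new one tracks the estimate directly.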

Results

Bytecode Reduction for japh.pl

  • Total ASTORE instructions: 1019 → 255 (75% reduction)
  • Null initializations: ~888 → 124 (86% reduction)
  • Total bytecode lines: 3345 → 1817 (46% reduction)
  • Savings: 1528 bytecode lines
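The percentages above can be re-derived from the raw counts. The sketch below does only that arithmetic; ReductionCheck and percentReduction are hypothetical names, and the inputs are the figures quoted in this PR.

```java
// Hypothetical helper for illustration; it only re-derives the quoted percentages.
final class ReductionCheck {
    // Percentage reduction from `before` to `after`, rounded to the nearest integer.
    static long percentReduction(int before, int after) {
        return Math.round(100.0 * (before - after) / before);
    }

    public static void main(String[] args) {
        System.out.println("ASTORE: " + percentReduction(1019, 255) + "%");
        System.out.println("Null initializations: " + percentReduction(888, 124) + "%");
        System.out.println("Bytecode lines: " + percentReduction(3345, 1817) + "%");
    }
}
```

Both the ASTORE count and the null-initialization count drop by the same absolute amount (764), which is consistent with the removed instructions being pre-initialization stores, although the PR does not state this explicitly.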

Testing

  • ✅ All 152 unit tests pass (100% pass rate)
  • ✅ Tested with perl test runner (7010/7010 tests pass)
  • ✅ No VerifyErrors or other issues

Why 32-slot buffer?

The TempLocalCountVisitor only counts 3 specific cases:

  • Logical operators (&&, ||, //)
  • For loops
  • local() operators

However, about 90 places in codegen allocate temp variables dynamically (method calls, array/hash operations, regex, etc.), so the visitor significantly underestimates the real count.

Testing results:

  • No buffer: Broke perl5_t tests (uni/variables.t, uni/fold.t, opbasic/cmp.t, uni/lower.t)
  • Buffer=32: All tests pass, massive bytecode reduction

The 32-slot buffer provides safety margin for complex expressions without the extreme waste of the previous min-128 + 64-buffer approach.
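One way to picture the trade-off is a toy model in which codegen sites request temp slots one at a time and we check whether they stayed inside the pre-initialized range. This is only a sketch under an assumption: that problems arise when dynamic allocations outrun the pre-initialized slots, which the description implies but does not spell out. TempSlotModel and its methods are hypothetical names, and the numbers in main are made up.

```java
// Hypothetical model for illustration; not the project's actual allocator.
final class TempSlotModel {
    private final int preInitialized; // slots initialized up front
    private int used = 0;             // slots handed out so far

    TempSlotModel(int visitorEstimate, int buffer) {
        this.preInitialized = visitorEstimate + buffer;
    }

    // Hands out the next temp slot index, as a codegen site might request it.
    int allocate() {
        return used++;
    }

    // True while every slot handed out falls inside the pre-initialized range.
    boolean withinPreInitializedRange() {
        return used <= preInitialized;
    }

    public static void main(String[] args) {
        // Assumed scenario: the visitor estimates 2 temps, codegen requests 20.
        for (int buffer : new int[] {0, 32}) {
            TempSlotModel model = new TempSlotModel(2, buffer);
            for (int i = 0; i < 20; i++) {
                model.allocate();
            }
            System.out.println("buffer=" + buffer
                    + " covered=" + model.withinPreInitializedRange());
        }
    }
}
```

In this model the buffer is what absorbs the allocation sites the visitor does not count; a tighter buffer would need a more complete visitor.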

Performance Impact

Smaller bytecode should mean:

  • Faster class loading
  • Better JIT compiler performance
  • Reduced memory footprint
  • Improved code cache utilization

🤖 Generated with Claude Code

Reduce over-allocation of temp locals that was causing bytecode bloat.

Before: Math.max(128, tempCount + 64) - minimum 128 slots, 64-slot buffer
After: tempCount + 32 - modest 32-slot buffer

The original allocation was excessive because TempLocalCountVisitor only
counts 3 specific cases (logical operators, for loops, local()), but
there are ~90 places in codegen that allocate temp variables dynamically.

A 32-slot buffer provides safety margin without the extreme waste of the
previous min-128 + 64-buffer approach.

Testing:
- ✅ All 152 unit tests pass (100% pass rate)
- ✅ Reduces bytecode bloat while maintaining safety
- Note: Originally tried no buffer (broke perl5 tests), 32 is the balance

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
fglock force-pushed the optimize-temp-local-allocation branch from 7e5559e to 8a865e1 on February 6, 2026 13:19

fglock commented Feb 6, 2026

Merged directly to master after confirming no regressions. The issue was an incorrect test execution method: running jperl directly instead of perl_test_runner.pl gave false positives. All tests pass with the optimization.

fglock closed this Feb 6, 2026