Optimize control flow with block-level dispatcher sharing #161

Merged
fglock merged 1 commit into master from optimize-block-level-dispatchers on Feb 4, 2026

@fglock fglock commented Feb 4, 2026

Summary

Implements block-level dispatcher sharing to eliminate redundant control flow dispatch code when multiple calls occur in the same block with the same visible loops.

The Problem

Previously, each call site emitted a complete control flow dispatcher (~150 bytes):

for (1..3) {
    A();  # 150 bytes dispatcher
    B();  # 150 bytes dispatcher (identical!)
    C();  # 150 bytes dispatcher (identical!)
    D();  # 150 bytes dispatcher (identical!)
}

Total: 600 bytes of dispatcher code for 4 calls, of which 450 bytes (three identical copies) are redundant.

The Solution

Block-level shared dispatchers - all calls with the same visible loop state share ONE dispatcher:

  • Call sites: Simple check (~20 bytes) + GOTO to shared dispatcher
  • Block dispatcher: Full dispatch logic (~150 bytes, emitted once)
  • Automatic reuse: Signature-based matching via blockDispatcherLabels map

Results

Bytecode Savings

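| Calls | Old (bytes) | New (bytes) | Savings (bytes) | Savings (%) |
|-------|-------------|-------------|-----------------|-------------|
| 1     | 150         | 173         | -23             | -15%        |
| 2     | 300         | 193         | 107             | 36% ✅      |
| 4     | 600         | 233         | 367             | 61% ✅      |
| 10    | 1500        | 353         | 1147            | 76% ✅      |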
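
The figures in the table are consistent with a simple linear cost model, sketched below. The per-call sizes (150 and 20 bytes) come from this description; the fixed cost of 153 bytes is an assumption inferred from the single-call row (173 - 20), and the class and method names are hypothetical.

```java
public class DispatcherSizeModel {
    // Approximate sizes quoted in the PR description.
    static final int OLD_PER_CALL = 150; // full dispatcher at every call site
    static final int NEW_PER_CALL = 20;  // check + GOTO at every call site

    // Assumption: fixed cost of the shared dispatcher, inferred from the
    // single-call row of the table (173 - 20 = 153), not stated in the PR.
    static final int SHARED_FIXED = 153;

    static int oldSize(int calls) {
        return OLD_PER_CALL * calls;
    }

    static int newSize(int calls) {
        return SHARED_FIXED + NEW_PER_CALL * calls;
    }

    static int savings(int calls) {
        return oldSize(calls) - newSize(calls);
    }

    // Savings as a percentage of the old size, rounded to the nearest integer.
    static long savingsPercent(int calls) {
        return Math.round(100.0 * savings(calls) / oldSize(calls));
    }

    public static void main(String[] args) {
        for (int calls : new int[] {1, 2, 4, 10}) {
            System.out.printf("%d calls: old=%d new=%d savings=%d (%d%%)%n",
                    calls, oldSize(calls), newSize(calls),
                    savings(calls), savingsPercent(calls));
        }
    }
}
```

Under this model the shared dispatcher pays for itself from the second call in a block onward; only single-call blocks come out larger.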

Real-World Measurements

  • Test with 4 sequential calls: 2232 → 2139 bytecode lines (4.2% reduction)
  • CHECKCAST operations: 23 → 17 (26% reduction)
  • Complex nested loops: No regression (1374 lines maintained)
  • All 2006 unit tests pass

Implementation Details

Modified Files

  1. JavaClassInfo.java

    • Added blockDispatcherLabels map to track dispatcher reuse
    • Added getLoopStateSignature() method for unique loop state identification
    • Uses identity hash codes to distinguish loop instances
  2. EmitSubroutine.java

    • Simplified call-site emission to ~20 bytes (check + GOTO)
    • Added emitBlockDispatcher() helper method
    • First call with a signature creates and emits dispatcher
    • Subsequent calls with same signature reuse existing dispatcher
  3. Documentation

    • New: BLOCK_DISPATCHER_OPTIMIZATION.md - detailed optimization guide
    • New: CONTROL_FLOW_IMPLEMENTATION.md - comprehensive implementation guide
    • Removed: 6 obsolete design docs
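
The loop state signature described for JavaClassInfo.java could look roughly like the sketch below. Only the method name getLoopStateSignature() and the use of identity hash codes come from this PR; the LoopInfo class, the parameter type, and the signature format are assumptions made for illustration. Note that System.identityHashCode is not guaranteed to be unique for distinct objects, so a real implementation may need to account for collisions.

```java
import java.util.List;
import java.util.StringJoiner;

public class LoopStateSignatureSketch {
    /** Hypothetical stand-in for the compiler's per-loop bookkeeping. */
    static final class LoopInfo {
        final String label; // null for an unlabeled loop

        LoopInfo(String label) {
            this.label = label;
        }
    }

    /**
     * Builds a string from the visible loops, outermost first. Each entry
     * combines the loop label with the identity hash code of the loop object,
     * so two different loops that share a label (or have none) normally
     * produce different signatures.
     */
    static String getLoopStateSignature(List<LoopInfo> visibleLoops) {
        StringJoiner signature = new StringJoiner("|");
        for (LoopInfo loop : visibleLoops) {
            String name = (loop.label == null) ? "<unlabeled>" : loop.label;
            signature.add(name + "@" + System.identityHashCode(loop));
        }
        return signature.toString();
    }

    public static void main(String[] args) {
        LoopInfo outer = new LoopInfo("OUTER");
        LoopInfo firstInner = new LoopInfo(null);
        LoopInfo secondInner = new LoopInfo(null);

        // Two sibling inner loops see the same outer loop but are distinct
        // instances, so their signatures are expected to differ.
        System.out.println(getLoopStateSignature(List.of(outer, firstInner)));
        System.out.println(getLoopStateSignature(List.of(outer, secondInner)));
    }
}
```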

How It Works

  1. Compute loop state signature (visible loop labels + identity hashes)
  2. Check if dispatcher exists for that signature in blockDispatcherLabels
  3. If first use: create dispatcher label, emit dispatcher code after call site
  4. If reuse: jump to existing dispatcher label
  5. Dispatcher stays within loop scope (no frame computation issues)
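
The steps above can be sketched as a small simulation. This is not the code in EmitSubroutine.java: it records pseudo-instructions as strings instead of emitting bytecode, and everything except the names blockDispatcherLabels and emitBlockDispatcher is hypothetical, including the exact layout of the call site and the jump that skips the dispatcher on the normal path.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BlockDispatcherSketch {
    // Maps a loop state signature to the label of the dispatcher emitted for it.
    private final Map<String, String> blockDispatcherLabels = new HashMap<>();
    // Pseudo-instructions standing in for emitted bytecode (assumption).
    private final List<String> code = new ArrayList<>();
    private int dispatchersEmitted = 0;

    /** Emits a call site; the signature is assumed to be computed by the caller. */
    void emitCall(String sub, String signature) {
        code.add("call " + sub);
        String existing = blockDispatcherLabels.get(signature);
        if (existing != null) {
            // Reuse: the call site is only a check plus a jump.
            code.add("if marked: goto " + existing);
            return;
        }
        // First use: register the label, then emit the dispatcher after the call site.
        int id = dispatchersEmitted;
        String dispatcher = "dispatcher" + id;
        String resume = "resume" + id;
        blockDispatcherLabels.put(signature, dispatcher);
        code.add("if marked: goto " + dispatcher);
        code.add("goto " + resume); // normal path skips the dispatcher body
        emitBlockDispatcher(dispatcher, signature);
        code.add(resume + ":");
    }

    /** Emits the full dispatch logic once per signature. */
    private void emitBlockDispatcher(String label, String signature) {
        code.add(label + ":");
        code.add("  dispatch last/next/redo for loops [" + signature + "]");
        dispatchersEmitted++;
    }

    int dispatchersEmitted() {
        return dispatchersEmitted;
    }

    List<String> code() {
        return code;
    }

    public static void main(String[] args) {
        BlockDispatcherSketch sketch = new BlockDispatcherSketch();
        for (String sub : new String[] {"A", "B", "C", "D"}) {
            sketch.emitCall(sub, "<unlabeled>@1"); // same visible loop state
        }
        sketch.code().forEach(System.out::println);
        System.out.println("dispatchers emitted: " + sketch.dispatchersEmitted());
    }
}
```

In this sketch the four calls in the same block produce one dispatcher body; a call with a different signature, for example inside a nested loop, would get its own.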

Trade-offs

Advantages:

  • ✅ Massive savings for multiple calls (61% for 4 calls)
  • ✅ Common pattern in real Perl code
  • ✅ No frame computation issues
  • ✅ Automatic optimization

Disadvantages:

  • ⚠️ 23 bytes overhead for single calls (acceptable)
  • Small HashMap overhead per method

Net Result: Overall WIN for typical Perl code patterns.

Testing

All 2006 unit tests pass, including:

  • ✅ Control flow tests (last/next/redo)
  • ✅ Non-local control flow
  • ✅ Tail call optimization
  • ✅ Nested loops
  • ✅ Labeled control flow
  • ✅ Complex real-world code

Why This Works Better Than Alternatives

vs. Per-Call Dispatchers (Previous)

  • Eliminates redundancy while maintaining correctness
  • 36-76% savings for multi-call blocks

vs. Method-Level Centralization (Attempted)

  • Stays within loop scope (no frame errors)
  • Only checks visible loops (not all method loops)
  • Actually reduces bytecode (centralization increased it)

Block-level is the sweet spot: sharing within proper scope boundaries.

🤖 Generated with Claude Code

Implements block-level dispatcher sharing to eliminate redundant control
flow dispatch code when multiple calls occur in the same block with the
same visible loops.

Key improvements:
- Multiple calls in same block share ONE dispatcher (not one per call)
- Call sites reduced from ~150 bytes to ~20 bytes each
- Block dispatcher emitted once per unique loop state (~150 bytes)
- 36-76% bytecode savings for blocks with 2+ calls

Results:
- Test with 4 sequential calls: 2232 → 2139 lines (4.2% reduction)
- CHECKCAST operations: 23 → 17 (26% reduction)
- All 2006 unit tests pass

Implementation:
- JavaClassInfo: Added blockDispatcherLabels map and getLoopStateSignature()
- EmitSubroutine: Simplified call sites, added emitBlockDispatcher() helper
- Dispatcher stays within loop scope (no frame computation issues)
- Automatic signature-based reuse via identity hash codes

Trade-offs:
- Single call: 23 bytes overhead (acceptable)
- Multiple calls: massive savings (61% for 4 calls, 76% for 10 calls)
- Net win for typical Perl code patterns

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@fglock fglock merged commit 5f0b7b9 into master Feb 4, 2026
2 checks passed
@fglock fglock deleted the optimize-block-level-dispatchers branch February 4, 2026 17:38
fglock added a commit that referenced this pull request Feb 4, 2026
With block-level dispatcher sharing (PR #161), non-local control flow
now works correctly. The skip() function can use 'last SKIP' directly
without workarounds.

Changes:
- Test/More.pm: Replaced skip_internal() with proper skip() that uses last SKIP
- TestMoreHelper.java: Removed skip() call rewriting logic
- test.pl.patch: Removed skip_internal() workaround from Perl 5 tests

Testing:
- All 2012 unit tests pass (100%)
- Perl 5 tests work correctly with native skip() implementation
- Non-local last SKIP exits SKIP block immediately from subroutine

This cleanup removes ~100 lines of workaround code that is no longer needed.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>