Merged
dev/architecture/large-code-refactoring.md: 37 changes (15 additions, 22 deletions)
@@ -4,7 +4,7 @@

PerlOnJava uses a **two-tier strategy** to handle Perl code that exceeds the JVM's 65,535-byte method size limit:

-1. **Proactive**: During codegen, large blocks are detected and wrapped in closure calls to split them across multiple JVM methods
+1. **Proactive**: During codegen, large blocks are detected and wrapped in a closure call to push them into a separate JVM method
2. **Reactive fallback**: If ASM still produces a method that's too large, the code is compiled using the bytecode interpreter backend instead

## The Problem
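The two-tier strategy summarized in the first hunk amounts to a try/catch around JVM code generation. The sketch below is a minimal illustration, not PerlOnJava's code: `TwoTierDriver`, `Backend`, and `MethodTooLargeError` (a stand-in for ASM's `MethodTooLargeException`, so the sketch needs no ASM dependency) are hypothetical names, and Tier 1 is assumed to have already rewritten the AST before `jvmCodegen` runs.

```java
import java.util.function.Supplier;

final class TwoTierDriver {
    /** Which backend ended up producing runnable code. */
    enum Backend { JVM, INTERPRETER }

    /**
     * jvmCodegen runs JVM code generation on an AST that Tier 1 has already
     * rewritten; it throws MethodTooLargeError if a method still exceeds 65,535 bytes.
     */
    static Backend compile(Supplier<Backend> jvmCodegen, Supplier<Backend> interpreter) {
        try {
            return jvmCodegen.get(); // normal path: JVM bytecode
        } catch (MethodTooLargeError e) {
            if (System.getenv("JPERL_SHOW_FALLBACK") != null) {
                System.err.println("Note: Method too large, using interpreter backend.");
            }
            return interpreter.get(); // Tier 2: reactive fallback
        }
    }
}

// Hypothetical stand-in for ASM's MethodTooLargeException (keeps the sketch dependency-free).
class MethodTooLargeError extends RuntimeException {
    MethodTooLargeError(String message) {
        super(message);
    }
}
```

The fallback is deliberately reactive: the driver does not try to predict whether codegen will overflow, it simply lets the emitter fail and recovers.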
@@ -13,7 +13,7 @@ The JVM limits each method to 65,535 bytes of bytecode. PerlOnJava compiles each

### Closure Scoping Complication

-The natural fix is to split large blocks into chunks wrapped in anonymous subs: `sub { ...chunk... }->(@_)`. However, this changes lexical scoping. When `use` or `require` statements are wrapped in closures, their imports happen in the closure's scope instead of the package scope:
+The natural fix is to wrap large blocks in anonymous subs: `sub { ...block... }->(@_)`. However, this changes lexical scoping. When `use` or `require` statements are wrapped in closures, their imports happen in the closure's scope instead of the package scope:

```perl
# Original code
@@ -31,7 +31,7 @@ my $x = $Config{foo}; # ERROR: %Config not in scope

This is why proactive refactoring skips subroutines, special blocks (BEGIN/END/INIT/CHECK/UNITCHECK), and blocks with unsafe control flow.

-## Tier 1: Proactive Block Refactoring
+## Tier 1: Proactive Block Wrapping

### Entry Point

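The skip rules in the hunk above (subroutines, BEGIN/END/INIT/CHECK/UNITCHECK blocks, and unlabeled `next`/`last`/`redo`/`goto` that would escape to a loop outside the block) could be checked roughly as follows. This is a sketch over a hypothetical, flat AST: the `Stmt` and `Block` records, the `owner` field, and `WrapSafety` are illustrative names, and `LargeBlockRefactorer`'s actual checks are not part of this diff.

```java
import java.util.List;
import java.util.Set;

// Hypothetical, simplified AST: a statement's keyword, its loop label (null if none),
// and whether the loop it targets sits inside the block being considered.
record Stmt(String keyword, String label, boolean loopInsideBlock) { }

// owner: the construct that owns the block, e.g. "sub", "BEGIN", "if", "file" (assumed encoding).
record Block(String owner, List<Stmt> statements) { }

final class WrapSafety {
    private static final Set<String> SPECIAL_BLOCKS =
            Set.of("BEGIN", "END", "INIT", "CHECK", "UNITCHECK");
    private static final Set<String> CONTROL_FLOW = Set.of("next", "last", "redo", "goto");

    /** True if wrapping the block in sub { ... }->(@_) should not change its meaning. */
    static boolean safeToWrap(Block block) {
        if (block.owner().equals("sub") || SPECIAL_BLOCKS.contains(block.owner())) {
            return false; // subroutine bodies and special blocks are never wrapped
        }
        for (Stmt s : block.statements()) {
            boolean unlabeled = s.label() == null;
            if (CONTROL_FLOW.contains(s.keyword()) && unlabeled && !s.loopInsideBlock()) {
                return false; // would target a loop outside the new closure
            }
        }
        return true;
    }
}
```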
@@ -52,7 +52,13 @@ EmitBlock.emitBlock(visitor, blockNode)
└── Return false → normal block emission continues
```

-Wrapping pushes the block's code into a separate JVM method (the anonymous sub body), giving it its own 64KB budget.
+Wrapping pushes the block's code into a separate JVM method (the anonymous sub body), giving it its own 64KB budget. This effectively doubles the available space for that block.
+
+### Limitations
+
+The wrapping is a **single-level** operation — it wraps the entire block in one closure. It does not recursively split the block into smaller chunks. This means:
+- For blocks up to ~2x the 64KB limit, wrapping succeeds (the block fits in the new method)
+- For blocks larger than ~2x the limit, wrapping is insufficient and the `MethodTooLargeException` still occurs, triggering Tier 2

### Thresholds

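One way to read the ~2x figure: a single wrap turns one method into two, and each gets its own 65,535-byte budget. The model below makes that explicit under stated assumptions; `WrapBudget`, `restBytes`, and the `callSiteBytes` overhead for the `->(@_)` call are hypothetical, and real sizes would come from codegen rather than from these parameters.

```java
final class WrapBudget {
    static final int LIMIT = 65_535; // JVM per-method bytecode limit

    /**
     * Single-level wrapping turns one method into two: the block moves into the
     * anon-sub body, and the enclosing method keeps everything else plus a call site.
     * callSiteBytes is an assumed small overhead, not a measured PerlOnJava figure.
     */
    static boolean wrappingSuffices(int restBytes, int blockBytes, int callSiteBytes) {
        boolean innerFits = blockBytes <= LIMIT;                // the new anon-sub method
        boolean outerFits = restBytes + callSiteBytes <= LIMIT; // the original method
        return innerFits && outerFits;
    }
}
```

With `restBytes = 40_000` and `blockBytes = 60_000` (100,000 bytes, too large as one method), both halves fit. Under this model the most one wrap can rescue is roughly 2 × 65,535 = 131,070 bytes of combined code; a 140,000-byte block still fails on its own and ends up in Tier 2.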
@@ -63,12 +69,12 @@ Wrapping pushes the block's code into a separate JVM method (the anonymous sub b

### Key Classes

-- **`BlockRefactor`** (`backend/jvm/astrefactor/BlockRefactor.java`) — Utility methods: `createAnonSubCall()` creates `sub { ... }->(@_)` AST nodes, `buildNestedStructure()` builds nested tail-closure chains, `createBlockNode()` with anti-recursion guard
+- **`BlockRefactor`** (`backend/jvm/astrefactor/BlockRefactor.java`) — Constants and `createAnonSubCall()` utility that creates `sub { ... }->(@_)` AST nodes
 - **`LargeBlockRefactorer`** (`backend/jvm/astrefactor/LargeBlockRefactorer.java`) — Orchestrates block-level refactoring: size estimation, control flow safety checks, whole-block wrapping

## Tier 2: Interpreter Fallback

-When the proactive refactoring is insufficient (or skipped due to unsafe control flow), ASM may still throw `MethodTooLargeException`. The fallback catches this and compiles the code using the bytecode interpreter instead.
+When the proactive wrapping is insufficient (or skipped due to unsafe control flow), ASM throws `MethodTooLargeException`. The fallback catches this and compiles the code using the bytecode interpreter instead.

### Flow

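`createAnonSubCall()` builds the AST for `sub { ... }->(@_)`. A toy version over a hypothetical mini-AST (`PNode`, `PRaw`, `PBlock`, `PAnonSub`, `PBinOp`, and `AnonSubCall` are illustrative names, not PerlOnJava's node classes) shows the shape of the node it produces. Per the signature visible in the last hunk header, the real method also takes a `tokenIndex` and returns a `BinaryOperatorNode`.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical mini-AST; each node can print itself back as Perl source.
interface PNode { String toPerl(); }

record PRaw(String code) implements PNode {
    public String toPerl() { return code; }
}

record PBlock(List<PNode> statements) implements PNode {
    public String toPerl() {
        return statements.stream().map(PNode::toPerl).collect(Collectors.joining("; "));
    }
}

record PAnonSub(PBlock body) implements PNode {
    public String toPerl() { return "sub { " + body.toPerl() + " }"; }
}

record PBinOp(String op, PNode left, PNode right) implements PNode {
    public String toPerl() { return left.toPerl() + op + right.toPerl(); }
}

final class AnonSubCall {
    /** Builds sub { <block> }->(@_): an anon sub called immediately with the caller's @_. */
    static PBinOp create(PBlock block) {
        return new PBinOp("->", new PAnonSub(block), new PRaw("(@_)"));
    }
}
```

Forwarding `@_` is what makes the wrapper transparent to code inside the block that reads the enclosing sub's arguments.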
@@ -94,7 +100,7 @@ The fallback also handles other compilation failures (`VerifyError`, `ClassForma

When fallback is triggered with `JPERL_SHOW_FALLBACK=1`:
```
-Note: Method too large after AST splitting, using interpreter backend.
+Note: Method too large, using interpreter backend.
```

## Technical Details
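The `compileToExecutable` hunk further down calls a `needsInterpreterFallback(e)` helper whose body is not part of this diff. A guess at its general shape: walk the exception's cause chain and report whether any link is a failure the interpreter can sidestep. Only `VerifyError` and a hypothetical `TooLargeStandIn` (standing in for ASM's `MethodTooLargeException`) are checked here; the full list of handled failures is cut off in the hunk header above and is not reproduced. `FallbackCheck` is likewise an illustrative name.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

final class FallbackCheck {
    /**
     * True if t, or any exception in its cause chain, is a failure the interpreter
     * backend can sidestep. Only two types are checked in this sketch.
     */
    static boolean needsInterpreterFallback(Throwable t) {
        // Identity set guards against (unusual) cyclic cause chains.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Throwable cur = t; cur != null && seen.add(cur); cur = cur.getCause()) {
            if (cur instanceof TooLargeStandIn || cur instanceof VerifyError) {
                return true;
            }
        }
        return false;
    }
}

// Hypothetical stand-in for ASM's MethodTooLargeException.
class TooLargeStandIn extends RuntimeException {
    TooLargeStandIn(String message) {
        super(message);
    }
}
```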
@@ -106,27 +112,14 @@ Note: Method too large after AST splitting, using interpreter backend.
### Refactoring Strategy
1. **Whole-block wrapping**: The entire block becomes `sub { <block> }->(@_)`
2. **`@_` passthrough**: Arguments are forwarded so the wrapper is transparent
-3. **Anti-recursion guard**: `BlockRefactor.createBlockNode()` sets a thread-local `skipRefactoring` flag to prevent infinite recursion when the wrapper's BlockNode is constructed
+3. **Anti-recursion guard**: `blockAlreadyRefactored` annotation prevents infinite recursion when the wrapper's BlockNode is processed
4. **Safe boundaries**: Blocks with unlabeled control flow (`next`/`last`/`redo`/`goto` outside loops) are not refactored, since these would break when wrapped in a closure

-### Dead Code
-
-The codebase contains remnants of a former retry-based approach that was replaced by the interpreter fallback:
-
-| Dead Code | Purpose (unused) |
-|-----------|-----------------|
-| `LargeBlockRefactorer.forceRefactorForCodegen()` | Was meant for reactive retry after MethodTooLargeException |
-| `LargeBlockRefactorer.trySmartChunking()` | Sophisticated chunking algorithm (only called by dead code above) |
-| `DepthFirstLiteralRefactorVisitor` (entire class) | Depth-first literal refactoring (marked OBSOLETE in design docs) |
-| `LargeNodeRefactorer` (entire class) | Element list chunking (only called by dead code above) |
-
-These are candidates for removal.
-
## Implementation Files

| File | Role |
|------|------|
-| `backend/jvm/astrefactor/BlockRefactor.java` | Constants, closure-wrapping utilities |
+| `backend/jvm/astrefactor/BlockRefactor.java` | Constants, closure-wrapping utility |
| `backend/jvm/astrefactor/LargeBlockRefactorer.java` | Block-level proactive refactoring |
| `backend/jvm/EmitBlock.java` | Calls `processBlock()` during block emission |
| `backend/jvm/EmitterMethodCreator.java` | Catches `MethodTooLargeException`, triggers interpreter fallback |
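The `blockAlreadyRefactored` annotation in the Refactoring Strategy list is needed because emitting the wrapper's anonymous sub body visits a block holding the same statements as the original, so its size estimate is unchanged and an unguarded refactorer would wrap it again indefinitely. A minimal model of the guard, assuming a string-keyed annotation map; `GuardedBlock`, `GuardedRefactorer`, and the `threshold` parameter are hypothetical, and the real thresholds are the ones documented under Thresholds.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class GuardedRefactorer {
    static final String GUARD = "blockAlreadyRefactored";

    /** Returns how many times the block was wrapped (0 or 1 with the guard in place). */
    static int processBlock(GuardedBlock block, int estimatedBytes, int threshold) {
        if (block.isAnnotated(GUARD) || estimatedBytes < threshold) {
            return 0; // already the body of a wrapper, or small enough to emit directly
        }
        // The wrapper's body holds the same statements, so its size estimate is unchanged.
        GuardedBlock body = new GuardedBlock(block.statements);
        body.setAnnotation(GUARD, true);
        // Emitting sub { body }->(@_) visits body again; without the annotation
        // this call would wrap again and recurse until the stack overflows.
        return 1 + processBlock(body, estimatedBytes, threshold);
    }
}

// Hypothetical block node with a string-keyed annotation map.
final class GuardedBlock {
    final List<String> statements;
    private final Map<String, Object> annotations = new HashMap<>();

    GuardedBlock(List<String> statements) {
        this.statements = statements;
    }

    void setAnnotation(String key, Object value) {
        annotations.put(key, value);
    }

    boolean isAnnotated(String key) {
        return Boolean.TRUE.equals(annotations.get(key));
    }
}
```

Marking the node, rather than setting a thread-local flag around `BlockNode` construction as the removed `createBlockNode()` did, ties the guard to the specific block instead of to whatever happens to run on the current thread.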
dev/custom_bytecode/STATUS.md: 2 changes (1 addition, 1 deletion)
@@ -36,7 +36,7 @@ Show diagnostic messages when compilation paths are taken:
export JPERL_SHOW_FALLBACK=1
./jperl script.pl
# Output: "Note: JVM compilation succeeded."
-# Or: "Note: Method too large after AST splitting, using interpreter backend."
+# Or: "Note: Method too large, using interpreter backend."
```

### JPERL_EVAL_USE_INTERPRETER
@@ -510,7 +510,7 @@ private static RuntimeCode compileToExecutable(Node ast, EmitterContext ctx) thr
if (needsInterpreterFallback(e)) {
boolean showFallback = System.getenv("JPERL_SHOW_FALLBACK") != null;
if (showFallback) {
-System.err.println("Note: Method too large after AST splitting, using interpreter backend.");
+System.err.println("Note: Method too large, using interpreter backend.");
}

if (CompilerOptions.DEBUG_ENABLED) ctx.logDebug("Falling back to bytecode interpreter due to method size");
@@ -1529,7 +1529,7 @@ public static RuntimeCode createRuntimeCode(
} catch (MethodTooLargeException e) {
if (USE_INTERPRETER_FALLBACK) {
if (SHOW_FALLBACK) {
-System.err.println("Note: Method too large after AST splitting, using interpreter backend.");
+System.err.println("Note: Method too large, using interpreter backend.");
}
return compileToInterpreter(ast, ctx, useTryCatch);
}
@@ -28,113 +28,4 @@ public static BinaryOperatorNode createAnonSubCall(int tokenIndex, BlockNode nes
tokenIndex
);
}
-
-/**
-* Builds nested closure structure from segments.
-* Structure: direct1, direct2, sub{ chunk1, sub{ chunk2, chunk3 }->(@_) }->(@_)
-* Closures are always placed at tail position to preserve variable scoping.
-*
-* @param segments List of segments (either Node for direct elements or List<Node> for chunks)
-* @param tokenIndex token index for new nodes
-* @param minChunkSize minimum size for a chunk to be wrapped in a closure
-* @param returnTypeIsList if true, wrap elements in ListNode to return list; if false, execute statements
-* @param skipRefactoring thread-local flag to prevent recursion during BlockNode construction
-* @return List of processed elements with nested structure
-*/
-@SuppressWarnings("unchecked")
-public static List<Node> buildNestedStructure(
-List<Object> segments,
-int tokenIndex,
-int minChunkSize,
-boolean returnTypeIsList,
-ThreadLocal<Boolean> skipRefactoring) {
-if (segments.isEmpty()) {
-return new ArrayList<>();
-}
-
-int firstBigIndex = -1;
-int endExclusive = segments.size();
-Node tailClosure = null;
-
-for (int i = segments.size() - 1; i >= 0; i--) {
-Object segment = segments.get(i);
-if (!(segment instanceof List)) {
-continue;
-}
-List<Node> chunk = (List<Node>) segment;
-if (chunk.size() < minChunkSize) {
-continue;
-}
-
-firstBigIndex = i;
-
-List<Node> blockElements = new ArrayList<>();
-blockElements.addAll(chunk);
-for (int s = i + 1; s < endExclusive; s++) {
-Object seg = segments.get(s);
-if (seg instanceof Node directNode) {
-blockElements.add(directNode);
-} else {
-blockElements.addAll((List<Node>) seg);
-}
-}
-if (tailClosure != null) {
-blockElements.add(tailClosure);
-}
-
-List<Node> wrapped = returnTypeIsList ? wrapInListNode(blockElements, tokenIndex) : blockElements;
-BlockNode block = createBlockNode(wrapped, tokenIndex, skipRefactoring);
-tailClosure = createAnonSubCall(tokenIndex, block);
-
-endExclusive = i;
-}
-
-if (tailClosure == null) {
-List<Node> result = new ArrayList<>();
-for (Object segment : segments) {
-if (segment instanceof Node directNode) {
-result.add(directNode);
-} else {
-result.addAll((List<Node>) segment);
-}
-}
-return result;
-}
-
-List<Node> result = new ArrayList<>();
-for (int s = 0; s < firstBigIndex; s++) {
-Object seg = segments.get(s);
-if (seg instanceof Node directNode) {
-result.add(directNode);
-} else {
-result.addAll((List<Node>) seg);
-}
-}
-result.add(tailClosure);
-return result;
-}
-
-/**
-* Wraps elements in a ListNode to ensure the closure returns a list of elements.
-*/
-private static List<Node> wrapInListNode(List<Node> elements, int tokenIndex) {
-ListNode listNode = new ListNode(elements, tokenIndex);
-listNode.setAnnotation("chunkAlreadyRefactored", true);
-return List.of(listNode);
-}
-
-/**
-* Creates a BlockNode using thread-local flag to prevent recursion.
-*/
-private static BlockNode createBlockNode(List<Node> elements, int tokenIndex, ThreadLocal<Boolean> skipRefactoring) {
-BlockNode block;
-skipRefactoring.set(true);
-try {
-block = new BlockNode(elements, tokenIndex);
-} finally {
-skipRefactoring.set(false);
-}
-return block;
-}
-
}