Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
183 changes: 183 additions & 0 deletions dev/prompts/bytecode_compiler_size_analysis.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,183 @@
# Analysis: BytecodeCompiler Size and Recursion Usage

## Context

The user asked whether BytecodeCompiler's large size is due to **lack of recursion** in handling complex expressions. Specifically, they mentioned that `$$a` should generate `$a` first, then dereference it.

## Key Finding: BytecodeCompiler ALREADY Uses Recursion

**Answer: NO** - The size is NOT due to lack of recursion. BytecodeCompiler properly uses recursion for complex expressions.

### Evidence: Dereference Handling is Recursive

For `$$a`, the implementation in `BytecodeCompiler.compileVariableReference()` (lines 2494-2506):

```java
} else if (node.operand instanceof OperatorNode) {
// Operator dereference: $$x, $${expr}, etc.
node.operand.accept(this); // ← RECURSIVE: compiles $x first
int refReg = lastResultReg;

// Dereference the result
int rd = allocateRegister();
emitWithToken(Opcodes.DEREF, node.getIndex()); // ← Then emits DEREF
emitReg(rd);
emitReg(refReg);
lastResultReg = rd;
}
```

**Compilation flow for `$$$a`:**
1. Outer `$` → recursively compiles `$$a`
2. Middle `$` → recursively compiles `$a`
3. Inner `$` → loads variable `a`
4. Unwind: emit DEREF for middle level
5. Unwind: emit DEREF for outer level
6. Result: 3 opcodes generated efficiently

✅ **This is proper recursive compilation** - each dereference level calls `accept()` on its operand.

## Actual Size Breakdown

### Total BytecodeCompiler Ecosystem: ~9,000+ lines

| Component | Lines | Purpose |
|-----------|-------|---------|
| **BytecodeCompiler.java** | 3,904 | Main compilation entry, AST visitors |
| **CompileOperator.java** | 2,604 | Giant method handling 50+ operators |
| **CompileAssignment.java** | 1,481 | Assignment operations |
| **CompileBinaryOperator.java** | 594 | Binary operators |
| **Helper classes** | ~1,200+ | BinaryHelper, etc. |

### Largest Methods

| Method | Lines | Reason for Size |
|--------|-------|-----------------|
| `CompileOperator.visitOperator()` | **2,604** | One giant switch with 50+ operator cases inlined |
| `compileVariableDeclaration()` | 929 | Handles my/our/local/state × $/@/% × closures × special cases |
| `compileVariableReference()` | 279 | Handles $/@/%/&/\/* × identifiers/blocks/operators/derefs |
| `visit(ListNode)` | 216 | List compilation with context handling |
| `visit(For3Node)` | 150 | C-style for loops |

## Why BytecodeCompiler is Large

The size comes from **horizontal breadth** (many features), not vertical depth (lack of abstraction):

### 1. Many Operators (~50+)
CompileOperator handles: say, print, not, !, ~, binary~, ~., defined, ref, prototype, bless, scalar, package, class, die, warn, exit, return, rand, sleep, study, pos, lock, eval, do, require, use, no, goto, dump, sprintf, keys, values, each, delete, exists, push, pop, shift, unshift, reverse, sort, grep, map, join, split, substr, index, rindex, quotemeta, lc, lcfirst, uc, ucfirst, chr, ord, hex, oct, length, chop, chomp, matchRegex, substituteRegex, transliterate, and more.

### 2. Multiple Code Paths Per Feature
Example: Variable declarations have paths for:
- 4 keywords: my/our/local/state
- 3 sigils: $/@/%
- Captured vs regular variables
- Declared references (my \$x)
- Persistent storage
- List vs single declarations

### 3. Special Case Handling
- `@_` is always register 1
- Closure variables use RETRIEVE_BEGIN_* opcodes
- State variables need persistence
- Global vs lexical variable resolution
- Context-dependent behavior (scalar vs list)

### 4. Inlined Logic
Rather than extracting helper methods, much logic is inlined in switch cases for performance/clarity.

## Potential Improvements (If Desired)

If the goal is to reduce BytecodeCompiler size, here are options:

### Option A: Extract Operator Handlers (Similar to BytecodeInterpreter Pattern)

Like how BytecodeInterpreter extracted opcodes to `OpcodeHandlerExtended` and `OpcodeHandlerFileTest`, we could extract operator groups:

```
CompileOperator.java (2,604 lines)
↓ Extract to:
CompileOperatorString.java // substr, index, rindex, quotemeta, lc, uc, etc.
CompileOperatorArray.java // push, pop, shift, unshift, keys, values, etc.
CompileOperatorIO.java // say, print, die, warn, etc.
CompileOperatorRegex.java // matchRegex, substituteRegex, transliterate
...etc
```

**Pros:**
- Reduces individual file size
- Groups related operators
- Easier to find specific operator logic

**Cons:**
- More files to navigate
- Potentially harder to see all operators at once
- May not actually reduce total code (just distributes it)

### Option B: Extract Common Patterns into Helper Methods

Many operators share patterns:
- Evaluate operand in scalar context
- Allocate result register
- Emit opcode
- Return result register

Could extract reusable patterns:

```java
private int compileUnaryScalarOp(Node operand, short opcode) {
int savedContext = currentCallContext;
currentCallContext = RuntimeContextType.SCALAR;
operand.accept(this);
int rs = lastResultReg;
currentCallContext = savedContext;

int rd = allocateRegister();
emit(opcode);
emitReg(rd);
emitReg(rs);
return rd;
}
```

**Pros:**
- Reduces duplication
- Clearer operator patterns
- Easier to ensure consistency

**Cons:**
- Abstracts away details (may be harder to debug)
- Some operators have unique behavior that doesn't fit patterns

### Option C: Leave As-Is

The current structure is working:
- ✅ Recursive where it should be
- ✅ Reasonable delegation (CompileOperator, CompileBinaryOperator, CompileAssignment)
- ✅ Code is generally readable despite size
- ✅ Performance is good

**Pros:**
- No risk of introducing bugs
- Current structure is proven to work
- Size is manageable with good IDE navigation

## Recommendation

**The current BytecodeCompiler design is sound.** The recursion is implemented correctly for complex expressions like `$$a`. The size is due to the inherent complexity of compiling 50+ Perl operators with their various edge cases.

If size reduction is desired for maintainability, **Option A** (extract operator groups) would be most impactful, reducing CompileOperator.java from 2,604 lines to several smaller, focused files.

However, there's no urgent need to refactor unless:
1. The size is causing JVM compilation issues (it's not - it's compilation time, not runtime)
2. Maintainability is suffering (navigation with IDE works fine)
3. Team prefers smaller files for review purposes

## Question for User

Would you like to:
1. **Leave as-is** - The current design is working well
2. **Extract operator groups** - Split CompileOperator into themed handler classes
3. **Extract helper methods** - Reduce duplication with pattern extraction
4. **Something else** - Different approach to improve the codebase

The recursion is already correct, so any changes would be for code organization purposes only.
45 changes: 45 additions & 0 deletions src/main/java/org/perlonjava/interpreter/BytecodeInterpreter.java
Original file line number Diff line number Diff line change
Expand Up @@ -1887,6 +1887,51 @@ public static RuntimeList execute(InterpretedCode code, RuntimeArray args, int c
pc = OpcodeHandlerExtended.executeSubstrVar(bytecode, pc, registers);
break;

case Opcodes.TIE: {
// tie($var, $classname, @args): rd = TieOperators.tie(ctx, argsListReg)
// Format: TIE rd argsListReg context
int rd = bytecode[pc++];
int argsReg = bytecode[pc++];
int ctx = bytecode[pc++];
RuntimeList tieArgs = (RuntimeList) registers[argsReg];
RuntimeScalar result = org.perlonjava.operators.TieOperators.tie(
ctx,
tieArgs.elements.toArray(new RuntimeBase[0])
);
registers[rd] = result;
break;
}

case Opcodes.UNTIE: {
// untie($var): rd = TieOperators.untie(ctx, argsListReg)
// Format: UNTIE rd argsListReg context
int rd = bytecode[pc++];
int argsReg = bytecode[pc++];
int ctx = bytecode[pc++];
RuntimeList untieArgs = (RuntimeList) registers[argsReg];
RuntimeScalar result = org.perlonjava.operators.TieOperators.untie(
ctx,
untieArgs.elements.toArray(new RuntimeBase[0])
);
registers[rd] = result;
break;
}

case Opcodes.TIED: {
// tied($var): rd = TieOperators.tied(ctx, argsListReg)
// Format: TIED rd argsListReg context
int rd = bytecode[pc++];
int argsReg = bytecode[pc++];
int ctx = bytecode[pc++];
RuntimeList tiedArgs = (RuntimeList) registers[argsReg];
RuntimeScalar result = org.perlonjava.operators.TieOperators.tied(
ctx,
tiedArgs.elements.toArray(new RuntimeBase[0])
);
registers[rd] = result;
break;
}

// GENERATED_HANDLERS_END

default:
Expand Down
23 changes: 23 additions & 0 deletions src/main/java/org/perlonjava/interpreter/CompileOperator.java
Original file line number Diff line number Diff line change
Expand Up @@ -2597,6 +2597,29 @@ public static void visitOperator(BytecodeCompiler bytecodeCompiler, OperatorNode
bytecodeCompiler.emitInt(bytecodeCompiler.currentCallContext);

bytecodeCompiler.lastResultReg = rd;
} else if (op.equals("tie") || op.equals("untie") || op.equals("tied")) {
// tie($var, $classname, @args) or untie($var) or tied($var)
// Compile all arguments as a list
if (node.operand != null) {
node.operand.accept(bytecodeCompiler);
int argsReg = bytecodeCompiler.lastResultReg;

int rd = bytecodeCompiler.allocateRegister();
short opcode = switch (op) {
case "tie" -> Opcodes.TIE;
case "untie" -> Opcodes.UNTIE;
case "tied" -> Opcodes.TIED;
default -> throw new IllegalStateException("Unexpected operator: " + op);
};
bytecodeCompiler.emitWithToken(opcode, node.getIndex());
bytecodeCompiler.emitReg(rd);
bytecodeCompiler.emitReg(argsReg);
bytecodeCompiler.emit(bytecodeCompiler.currentCallContext);

bytecodeCompiler.lastResultReg = rd;
} else {
bytecodeCompiler.throwCompilerException(op + " requires arguments");
}
} else {
bytecodeCompiler.throwCompilerException("Unsupported operator: " + op);
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -256,7 +256,7 @@ public static int executeStringConcatAssign(short[] bytecode, int pc, RuntimeBas
public static int executeBitwiseAndAssign(short[] bytecode, int pc, RuntimeBase[] registers) {
int rd = bytecode[pc++];
int rs = bytecode[pc++];
RuntimeScalar result = BitwiseOperators.bitwiseAndBinary(
RuntimeScalar result = BitwiseOperators.bitwiseAnd(
(RuntimeScalar) registers[rd],
(RuntimeScalar) registers[rs]
);
Expand Down
14 changes: 13 additions & 1 deletion src/main/java/org/perlonjava/interpreter/Opcodes.java
Original file line number Diff line number Diff line change
Expand Up @@ -894,10 +894,22 @@ public class Opcodes {
* Format: SUBSTR_VAR rd argsListReg ctx */
public static final short SUBSTR_VAR = 237;

/** tie($var, $classname, @args): rd = TieOperators.tie(ctx, argsListReg)
* Format: TIE rd argsListReg context */
public static final short TIE = 238;

/** untie($var): rd = TieOperators.untie(ctx, argsListReg)
* Format: UNTIE rd argsListReg context */
public static final short UNTIE = 239;

/** tied($var): rd = TieOperators.tied(ctx, argsListReg)
* Format: TIED rd argsListReg context */
public static final short TIED = 240;

// =================================================================
// BUILT-IN FUNCTION OPCODES - after LASTOP
// Last manually-assigned opcode (for tool reference)
private static final short LASTOP = 237;
private static final short LASTOP = 240;

// =================================================================
// Generated by dev/tools/generate_opcode_handlers.pl
Expand Down