feat: Automated opcode generation tool and 43 new interpreter operators #209
Merged
…dling
This commit consolidates runtime and JVM compiler enhancements:
Features:
- Add escapeInvalidQuantifierBraces function for Perl regex compatibility
(currently disabled due to test regressions - needs more work)
- Add DEBUG_REGEX environment variable support for regex debugging
Fixes:
- Preserve RUNTIME context for RHS of logical operators in JVM compiler
- Evaluate LHS of logical operators in SCALAR context (for boolean test)
- Add debug logging to RuntimeRegex.compile() and matchRegexDirect()
Implementation Details:
- EmitLogicalOperator: Changed context handling for logical operators
- LHS evaluated in SCALAR context for boolean test
- RHS preserves RUNTIME context when in RUNTIME mode
- Prevents context loss at subroutine exits
- RegexPreprocessor: Added escapeInvalidQuantifierBraces()
- Handles Perl-style quantifier braces like {1}, {,3}, {2,5}
- Escapes invalid braces that would cause Java Pattern.compile() errors
- Currently disabled (lines 82-84) due to edge case regressions
- Function ready for future refinement and re-enabling
- RuntimeRegex: Added DEBUG_REGEX support
- Set DEBUG_REGEX=1 environment variable to enable regex debug output
- Logs pattern compilation, cache hits/misses, and matching operations
- Helps diagnose regex preprocessing and matching issues
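The brace-escaping idea described above can be illustrated with a minimal sketch. This is not the code in RegexPreprocessor.java: the class name `BraceEscapeSketch`, the method name `escapeInvalidBraces`, and the choice to rewrite `{,m}` as `{0,m}` are all assumptions made for illustration. It leaves the forms that `java.util.regex` accepts (`{n}`, `{n,}`, `{n,m}`) untouched and escapes any other opening brace so that `Pattern.compile()` does not reject it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not the project's escapeInvalidQuantifierBraces().
final class BraceEscapeSketch {
    // Quantifier forms that java.util.regex accepts: {n}, {n,}, {n,m}
    private static final Pattern JAVA_QUANTIFIER = Pattern.compile("\\{\\d+(,\\d*)?\\}");
    // Form with an omitted lower bound: {,m} (assumed here to mean {0,m})
    private static final Pattern OPEN_LOWER = Pattern.compile("\\{,(\\d+)\\}");

    static String escapeInvalidBraces(String regex) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < regex.length()) {
            char c = regex.charAt(i);
            if (c == '\\' && i + 1 < regex.length()) {
                // Copy an existing escape sequence unchanged
                out.append(c).append(regex.charAt(i + 1));
                i += 2;
            } else if (c == '{') {
                Matcher valid = JAVA_QUANTIFIER.matcher(regex).region(i, regex.length());
                Matcher open = OPEN_LOWER.matcher(regex).region(i, regex.length());
                if (valid.lookingAt()) {
                    out.append(valid.group());          // already valid for Java
                    i = valid.end();
                } else if (open.lookingAt()) {
                    out.append("{0,").append(open.group(1)).append('}');
                    i = open.end();
                } else {
                    out.append("\\{");                  // treat as a literal brace
                    i++;
                }
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }
}
```

The sketch ignores character classes and does not check that something quantifiable precedes the brace, which are the kind of edge cases that could explain the regressions mentioned above.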
Files Modified:
- EmitLogicalOperator.java: +17/-12 lines
- RegexPreprocessor.java: +212/-0 lines
- RegexPreprocessorHelper.java: +123/-71 lines (refactored)
- RuntimeRegex.java: +41/-13 lines
Test Results (vs master):
- re/regexp.t: 1788/2210 (+2)
- re/pat.t: 896/1296 (+1)
- re/pat_rt_report.t: 2384/2514 (+3)
- re/reg_mesg.t: 1642/2479 (no change)
- Net: +6 improvements, 0 regressions
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The BytecodeCompiler was emitting STRING_BITWISE_* opcodes for the default
bitwise operators (&, |, ^) when it should emit BITWISE_*_BINARY opcodes.
In Perl, the default bitwise operators perform numeric operations, not
string operations.
This bug caused eval STRING expressions like 'eval "83 | 120"' to return
930 (string bitwise OR result) instead of 123 (numeric bitwise OR result).
Fixed:
- & now emits BITWISE_AND_BINARY (was STRING_BITWISE_AND)
- | now emits BITWISE_OR_BINARY (was STRING_BITWISE_OR)
- ^ now emits BITWISE_XOR_BINARY (was STRING_BITWISE_XOR)
The string bitwise operators (&., |., ^.) continue to emit
STRING_BITWISE_* opcodes correctly.
Impact: Fixes interpreter parity for bitwise operations in eval STRING
context.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
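The two results quoted in this commit message (123 and 930) can be reproduced with a small sketch. The class and method names below are hypothetical and are not part of the interpreter. The sketch assumes that a string bitwise OR combines the operands character by character and pads the shorter operand with zero bytes.

```java
// Hypothetical illustration of numeric vs. string bitwise OR.
final class BitwiseSemanticsSketch {
    // Numeric OR: operands are treated as integers
    static long numericOr(long a, long b) {
        return a | b;
    }

    // String OR: operands are combined character by character;
    // the shorter operand is padded with zero bytes (assumption stated above)
    static String stringOr(String a, String b) {
        int n = Math.max(a.length(), b.length());
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            int x = i < a.length() ? a.charAt(i) : 0;
            int y = i < b.length() ? b.charAt(i) : 0;
            sb.append((char) (x | y));
        }
        return sb.toString();
    }
}
```

Here `numericOr(83, 120)` gives 123, while `stringOr("83", "120")` gives "930": '8' | '1' is '9', '3' | '2' is '3', and the trailing '0' is kept as-is.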
Created dev/tools/generate_opcode_handlers.pl to automatically generate
opcode handlers for built-in functions from OperatorHandler.java.
Key Features:
- Automatically reads LASTOP from Opcodes.java to determine next opcode
- Skips existing opcodes to avoid duplicates
- Generates handler classes with efficient zero-overhead dispatch pattern
- Automatically updates Opcodes.java, BytecodeInterpreter.java, and
InterpretedCode.java at marker locations
- Uses -> syntax for clean, modern Java code
Generated Handlers:
- ScalarUnaryOpcodeHandler: 31 operators (chr, ord, abs, sin, cos, lc, uc, etc.)
- ScalarBinaryOpcodeHandler: 12 operators (atan2, eq, ne, lt, le, gt, ge, cmp,
binary&, binary|, binary^, x)
Opcodes Generated:
- Reserved range: 221-263 (43 opcodes)
- Next available: 264
Markers Added:
- // GENERATED_OPCODES_START/END in Opcodes.java
- // GENERATED_HANDLERS_START/END in BytecodeInterpreter.java
- // GENERATED_DISASM_START/END in InterpretedCode.java
Implementation:
- Added LASTOP constant to track manually-assigned opcodes
- Tool excludes generated sections when reading existing opcodes
- Skips operators with complex signatures (varargs, etc.)
- Skips operators that already have opcodes (rand, length, rindex, index,
require, isa, bless, ref, join, prototype, getc)
Future Work:
- Add BytecodeCompiler.java generation for emit cases
- Add more operator types (list, array, hash operations)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
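The marker-based update described in this commit message can be sketched as follows. The actual tool is a Perl script whose internals are not shown here; the Java below is only an illustration, and `MarkerUpdateSketch` and `replaceBetweenMarkers` are hypothetical names. The idea is to replace whatever sits between the start and end marker lines while leaving the markers and the rest of the file untouched, so that rerunning the generator gives the same result.

```java
// Hypothetical sketch of replacing a generated section between two marker lines.
final class MarkerUpdateSketch {
    static String replaceBetweenMarkers(String source, String startMarker,
                                        String endMarker, String generated) {
        int start = source.indexOf(startMarker);
        if (start < 0) {
            throw new IllegalArgumentException("missing marker: " + startMarker);
        }
        int end = source.indexOf(endMarker, start + startMarker.length());
        if (end < 0) {
            throw new IllegalArgumentException("missing marker: " + endMarker);
        }
        // First character after the line that holds the start marker
        int bodyStart = source.indexOf('\n', start) + 1;
        // First character of the line that holds the end marker (keeps its indentation)
        int bodyEnd = source.lastIndexOf('\n', end) + 1;
        if (bodyStart == 0 || bodyEnd < bodyStart) {
            throw new IllegalArgumentException("markers must be on separate lines");
        }
        return source.substring(0, bodyStart) + generated + source.substring(bodyEnd);
    }
}
```

Called with `// GENERATED_OPCODES_START` and `// GENERATED_OPCODES_END`, this swaps only the lines between the two markers, which is also what lets a tool skip the generated section when it scans for existing, manually assigned opcodes.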
Added manual emit cases for key unary operators (chr, ord, hex, oct, abs,
int, uc, lc) in BytecodeCompiler.java to enable interpreter execution.
Updated SKILL.md with comprehensive code generator documentation:
- Quick start guide
- Eligibility criteria for operators
- LASTOP management critical for opcode numbering
- Common gotchas and solutions
- Testing procedures
- Manual implementation guidance
All 17 test cases now pass with interpreter:
✓ chr, ord, abs, int, uc, lc, hex, oct (unary)
✓ eq, ne, cmp, lt, gt, x (binary)
✓ Bitwise OR, AND, XOR
Next: Enhance tool to auto-generate BytecodeCompiler emit cases to reduce
code repetition.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Enhanced dev/tools/generate_opcode_handlers.pl to automatically generate
emit cases in BytecodeCompiler.java, eliminating 150+ lines of repetitive
code.
Changes:
- Tool now updates 4 files automatically (was 3):
* Opcodes.java - opcode constants
* BytecodeInterpreter.java - dispatch cases
* InterpretedCode.java - disassembly cases
* BytecodeCompiler.java - emit cases (NEW!)
- Removed 150+ lines of repetitive manual emit code
- All 31 unary operators now generated automatically
- Binary/ternary operators can be added similarly
Verification:
- LASTOP tracking works correctly (starts at 221 = LASTOP + 1)
- All 17 test cases pass ✓
- Build successful, no compilation errors
Benefits:
- Eliminates manual code repetition
- Consistent pattern across all operators
- Easy to add new operators (just run tool)
- Reduces maintenance burden
Tool Usage:
```
perl dev/tools/generate_opcode_handlers.pl
make
```
Next: Add binary/ternary emit case generation for complete automation.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Changed generated opcodes to use LASTOP + offset notation instead of
hardcoded numbers, making manual opcode additions much easier.
Before:
public static final short ATAN2 = 228;
public static final short INT = 221;
After:
public static final short ATAN2 = LASTOP + 8;
public static final short INT = LASTOP + 1;
Benefits:
- Add manual opcode: just update LASTOP, run tool
- All 43 generated opcodes auto-adjust
- No manual renumbering needed
- Clear relationship to LASTOP visible in code
Example workflow:
1. Add manual opcode at 221
2. Update LASTOP = 221
3. Run perl dev/tools/generate_opcode_handlers.pl
4. Generated opcodes shift from 221-263 to 222-264 automatically
Verification:
- All 17 tests pass ✓
- INT = LASTOP + 1 = 221 (correct)
- Build successful
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
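The effect of LASTOP-relative numbering can be shown with a short sketch. `OpcodesSketch` and `generatedRange` are hypothetical names used only for illustration. The value `LASTOP = 220` is inferred from "INT = LASTOP + 1 = 221" in this commit message; the real Opcodes.java is not reproduced here.

```java
// Hypothetical sketch of LASTOP-relative opcode constants.
final class OpcodesSketch {
    // Last manually assigned opcode (inferred from INT = LASTOP + 1 = 221)
    static final short LASTOP = 220;

    // Generated constants are expressed as offsets from LASTOP,
    // so they move automatically when LASTOP changes
    static final short INT = LASTOP + 1;    // 221
    static final short ATAN2 = LASTOP + 8;  // 228

    // First and last opcode of a generated block of `count` opcodes
    static int[] generatedRange(int lastop, int count) {
        return new int[] { lastop + 1, lastop + count };
    }
}
```

With 43 generated opcodes, `generatedRange(220, 43)` covers 221 to 263, and after one manual opcode is added and LASTOP becomes 221, `generatedRange(221, 43)` covers 222 to 264, matching the example workflow.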
…ptimization
Fixed critical opcode ordering issue where opcodes were assigned in
OperatorHandler.java appearance order, creating gaps that prevented JVM
tableswitch optimization. Now assigns opcodes contiguously grouped by
signature type:
- Binary operators: LASTOP+1 through LASTOP+12 (12 contiguous)
- Unary operators: LASTOP+13 through LASTOP+43 (31 contiguous)
This ensures JVM uses tableswitch (O(1)) instead of lookupswitch (O(log n))
for optimal interpreter performance. Verified with javap showing
"tableswitch { // 0 to 263" covering all opcodes.
All 17 test cases pass (chr, ord, abs, int, uc, lc, hex, oct, eq, ne,
cmp, lt, gt, x, bitwise |, &, ^).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
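The difference between dense and sparse case labels can be checked with a small experiment. The class below is a hypothetical illustration and is unrelated to BytecodeInterpreter.java; the case values are arbitrary. javac typically compiles a switch over contiguous values to `tableswitch` and a switch over widely scattered values to `lookupswitch`, though the exact choice is up to the compiler.

```java
// Hypothetical illustration: dense vs. sparse switch case labels.
final class SwitchDensitySketch {
    // Contiguous case values: a candidate for tableswitch
    static int dense(int opcode) {
        switch (opcode) {
            case 221: return 1;
            case 222: return 2;
            case 223: return 3;
            case 224: return 4;
            default:  return -1;
        }
    }

    // Widely scattered case values: a candidate for lookupswitch
    static int sparse(int opcode) {
        switch (opcode) {
            case 221:   return 1;
            case 300:   return 2;
            case 4000:  return 3;
            case 50000: return 4;
            default:    return -1;
        }
    }
}
```

After compiling, running `javap -c SwitchDensitySketch` shows which instruction each method uses, which is the same kind of check this commit message reports for the interpreter's dispatch switch.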
Summary
Adds automated code generation tool for interpreter opcodes, enabling bulk addition of 43 built-in operators (chr, ord, abs, int, uc, lc, hex, oct, eq, ne, cmp, lt, gt, etc.) with zero-overhead dispatch patterns and optimal JVM performance.
Key Features
Automated Code Generation Tool (dev/tools/generate_opcode_handlers.pl)
- Reads OperatorHandler.java for eligible operators (scalar unary/binary/ternary)
43 New Interpreter Operators
LASTOP-Relative Numbering
- Generated opcodes use LASTOP + offset notation
Contiguous Opcode Assignment
- JVM uses tableswitch (O(1)) instead of lookupswitch (O(log n))
- Verified with javap: "tableswitch { // 0 to 263"
Comprehensive Documentation
- Updated dev/interpreter/SKILL.md with code generator guide
Performance
Files Changed
Generated Files:
- ScalarUnaryOpcodeHandler.java (115 lines)
- ScalarBinaryOpcodeHandler.java (74 lines)
Auto-Updated Files:
- Opcodes.java - 43 new opcode constants with LASTOP-relative numbering
- BytecodeInterpreter.java - dispatch cases for all handlers
- InterpretedCode.java - disassembly cases
- BytecodeCompiler.java - 576 lines of auto-generated emit cases
Tool & Documentation:
- dev/tools/generate_opcode_handlers.pl (589 lines)
- dev/interpreter/SKILL.md (175 lines added)
Test plan
- Build (make)
- Unit tests (make test-unit)
- --interpreter flag
- tableswitch { // 0 to 263 (optimal dispatch)
- perl dev/tools/generate_opcode_handlers.pl works correctly
🤖 Generated with Claude Code