feat: Add negated regex match (!~) operator to interpreter#206
Closed
fglock wants to merge 16 commits into
Closed
Conversation
Implement support for the !~ operator in the interpreter's bytecode compiler, enabling tests like perl5_t/t/re/charset.t to run with JPERL_EVAL_USE_INTERPRETER=1. Changes: - Added MATCH_REGEX_NOT opcode (217) to Opcodes.java - Implemented compiler support in BytecodeCompiler.java - Added runtime handler in BytecodeInterpreter.java - Added disassembly support in InterpretedCode.java The operator negates the result of RuntimeRegex.matchRegex() to provide the inverse match semantics of the =~ operator. Fixes eval STRING failures in tests using negated regex matches. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fixed multiple issues with context propagation in logical operators
and regex matching, ensuring operands are evaluated in SCALAR context
for boolean tests.
Changes:
1. BytecodeCompiler (interpreter path):
- Fixed &&, ||, // operators to evaluate operands in SCALAR context
- Fixed !, not operators to evaluate operands in SCALAR context
- Fixed ternary ? : operator to evaluate condition in SCALAR context
2. EmitLogicalOperator (JVM path):
- Fixed logical operators to always use SCALAR context for operands
- Previously preserved RUNTIME context, causing wantarray (often VOID)
to be used instead of SCALAR context
- This bug was exposed when postfix if was the last statement
3. RuntimeRegex:
- Fixed regex matches in LIST context to return (1) for success
when there are no captures (non-global matches)
- Previously returned empty list, causing false in boolean context
Fixes postfix if/unless, logical NOT, ternary operator, and regex
matching in various contexts.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…pport
Added full support for loop control operators (last, next, redo) in the
interpreter bytecode compiler and runtime, including labeled control flow.
Changes:
1. Opcodes.java:
- Added LAST (218), NEXT (219), REDO (220) opcodes
- Format: opcode + target PC (absolute jump address)
2. BytecodeCompiler.java:
- Added LoopInfo class to track loop boundaries and labels
- Added loopStack to manage nested loops
- Implemented handleLoopControlOperator() for last/next/redo compilation
- Updated For1Node visitor to push/pop loop info and patch jumps
- Updated For3Node visitor to push/pop loop info and patch jumps
- Tracks three jump types per loop:
* breakPcs: PCs to patch for 'last' (jump to end)
* nextPcs: PCs to patch for 'next' (jump to continue)
* redoPcs: PCs to patch for 'redo' (jump to start)
3. BytecodeInterpreter.java:
- Added runtime handlers for LAST/NEXT/REDO opcodes
- Simple PC jump implementation (non-local control flow not yet supported)
4. InterpretedCode.java:
- Added disassembly support for LAST/NEXT/REDO opcodes
Features:
- Unlabeled last/next/redo work in nearest enclosing loop
- Labeled last/next/redo work with nested loops
- Proper jump target patching during compilation
- All three operators fully tested with for loops
Fixes infinite loop in re/pat_rt_report.t test #21 which uses 'last'
inside eval STRING.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add isTrueLoop flag to LoopInfo to distinguish true loops (for/while/foreach) from pseudo-loops (do-while/bare blocks). Throw proper error when loop control operators (last/next/redo) are used in do-while loops, matching Perl behavior. This fixes the infinite loop issue in re/pat_rt_report.t test where next was incorrectly allowed in do-while loops. Test now passes 2384/2514 tests (94.8%). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The RHS of logical operators (&&, ||, //) should preserve RUNTIME context when the operator is the return value of a subroutine. The LHS always uses SCALAR context for the boolean test, but the RHS should use the current context (RUNTIME, SCALAR, or LIST) to properly propagate wantarray through the operator at subroutine exit. This fixes op/wantarray.t tests 14-22, which now pass 28/28. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…r mode)
Fix for interpolated regex patterns in eval STRING where invalid quantifier
braces need to be treated as literals. **This fix applies to JVM compiler mode only.**
Interpreter mode still has the issue (see dev/prompts/eval-interpreter-regex-issue.md).
## Problem
When a regex pattern with escaped braces like `/(.*?)\{(.*?)\}/g` is
interpolated into a `qq{}` string or heredoc, the backslashes are consumed
by the string interpolation, resulting in `/(.*?){(.*?)}/g`. The pattern
`{(.*?)}` has invalid quantifier syntax (quantifiers must be {n}, {n,}, or
{n,m} with numeric values).
Java's Pattern.compile() rejects this with PatternSyntaxException, but
real Perl treats invalid quantifier braces as literal characters with
a deprecation warning.
## Solution
Add preprocessing in RegexPreprocessor.escapeInvalidQuantifierBraces()
to detect and escape invalid quantifier braces before passing patterns to
Java's Pattern.compile().
Key features:
- Detects invalid quantifier syntax: {(.*?)}, {abc}, {}, {,5}, etc.
- Preserves valid quantifiers: {3}, {2,4}, {2,}
- Skips escape sequences that use braces: \N{...}, \x{...}, \o{...},
\p{...}, \P{...}, \g{...}
- Handles character classes correctly (braces in [...] are always literal)
- Escapes both opening and closing braces of invalid quantifiers
- Comprehensive warning comments about edge cases and potential issues
## Test Coverage
New test file: src/test/resources/unit/regex/unescaped_braces.t
- Direct unescaped braces: `/(.*?){(.*?)}/g` matches "a{b}c{d}" correctly
- Interpolated patterns in eval STRING now work
- Valid quantifiers unchanged: /ab{3}/, /ab{2,4}/, /ab{2,}/
- Mixed valid and invalid braces: /x{y}z{3}/
- Character class braces remain literal: /[a{3}]+/
- Empty and non-numeric braces: /x{}y/, /x{abc}y/
## Test Results
Test case: `my $rx = q{/(.*?)\{(.*?)\}/g}; eval qq{while (\$input =~ $rx) {...}}`
| Mode | Input "a{b}c{d}" | Expected | Actual | Status |
|------|----------|----------|--------|--------|
| Real Perl | 2 matches | i=2 | i=2 | ✅ Reference |
| JVM Compiler | 2 matches | i=2 | i=2 | ✅ **Fixed** |
| Interpreter | 2 matches | i=2 | i=0 or infinite | ❌ **Still broken** |
Fixes: perl5_t/t/re/pat_rt_report.t test 21 in JVM compiler mode
(test still fails in interpreter mode - requires separate fix)
## Known Limitations
⚠️ **Interpreter mode is not fixed** - Different code path bypasses preprocessor.
See dev/prompts/eval-interpreter-regex-issue.md for analysis.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The fix for invalid quantifier braces only works in JVM compiler mode. Interpreter mode has a different code path that bypasses the preprocessor, resulting in either no matches or infinite loops. This document analyzes the problem and outlines next steps for fixing the interpreter mode.
**Problem:** In interpreter mode, while/if loop conditions were being evaluated in VOID context instead of SCALAR context, causing regex matches to fail when used in eval STRING. This led to test 21 in pat_rt_report.t hanging infinitely. **Root Cause:** BytecodeCompiler was not setting proper context when visiting condition nodes in For3Node (while/for loops) and IfNode (if statements). The JVM compiler correctly uses SCALAR context for these conditions, but the interpreter was using whatever the current context was (often VOID). **Fix:** - For3Node: Save/restore context, set to SCALAR when evaluating condition - IfNode: Save/restore context, set to SCALAR when evaluating condition - Add DEBUG_REGEX flag to trace regex compilation and matching - Add debug logging in RuntimeRegex and BytecodeInterpreter **Testing:** - All three test cases in /tmp/test.pl now pass in interpreter mode - Unit tests pass: make test - perl5_t/t/re/pat_rt_report.t now completes 2384/2514 tests (94.8%) without hanging (previously hung on test 21) **Related:** - Matches behavior in JVM compiler mode (EmitStatement.emitFor3) - Fixes infinite loop in interpreter mode for regex in eval STRING Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When assigning special variables like $& to captured variables in eval STRING interpreter mode, the SET_SCALAR opcode was copying empty fields instead of the computed value. Solution: Use addToScalar() instead of set() in SET_SCALAR opcode. ScalarSpecialVariable.addToScalar() already calls getValueAsScalar() to get the computed value, matching how the JVM backend handles this. Changes: - BytecodeInterpreter.java: Change SET_SCALAR to use addToScalar() - BytecodeCompiler.java: Use NameNormalizer for global variable names Results: - re/regexp.t: 1144 → 1746 passing tests (+602, 99% of JVM mode) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The interpreter was incorrectly using numeric bitwise operators (BITWISE_*_BINARY) for &, |, ^ by default, when these should use string bitwise operators (STRING_BITWISE_*) which validate that operands don't contain code points over 0xFF. In Perl: - & | ^ without "use integer" → STRING bitwise (validates Unicode) - binary& binary| binary^ with "use integer" → NUMERIC bitwise This was causing tests to fail when eval STRING threw exceptions for high Unicode code points in bitwise operations - the exceptions weren't being thrown at all because the wrong operator was used. Changes: - BytecodeCompiler.java: Split & | ^ into separate cases from binary& binary| binary^, mapping & | ^ to STRING_BITWISE_* opcodes Results: - op/bop.t: 264 → 268 passing tests in interpreter mode (+4) - Exceptions now properly captured in $@ for eval STRING Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The recent fix for regex LIST context (commit 80fab8b) incorrectly returned (1) for patterns with optional captures that didn't participate. This affected both JVM compiler and interpreter modes. ## Problem Pattern: /(a)?/ String: "" - Expected: (undef) [capture group exists but didn't match] - Before fix: (1) [incorrectly treated as "no captures"] The bug was checking if result.elements.isEmpty() instead of checking if the pattern has zero capturing groups (captureCount == 0). ## Solution 1. Track captureCount outside the while loop for later use 2. Always add captures to result, using scalarUndef for non-participating groups (when Matcher.group(i) returns null) 3. Check captureCount == 0 instead of result.elements.isEmpty() ## Perl Semantics - Non-participating captures (e.g., (a)? not matching) → undef - Empty captures (e.g., (a*) matching zero a's) → empty string "" - Pattern with no captures → (1) on success ## Test Results Comprehensive test suite confirms correct behavior: - /abc/ matching "abc" → (1) ✓ - /(a)(b)(c)/ matching "abc" → ("a","b","c") ✓ - /(a)?/ matching "" → (undef) ✓ [was (1) before fix] - /(a)|(b)/ matching "a" → ("a",undef) ✓ [was ("a") before fix] - /(a*)b/ matching "b" → ("") ✓ Unit tests pass. re/regexp.t maintains 1765 passing tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The escapeInvalidQuantifierBraces feature (added in commit 42bcde5) was too aggressive and broke 7 tests, reducing passing count from 1788 to 1781 even after the {,m} and \b{}/\B{} fixes. ## Problem The feature attempted to escape invalid quantifier braces like {(.*?)} but had issues: 1. Edge cases with complex nested patterns 2. Interaction with other escape sequences 3. Breaking valid Perl patterns in subtle ways ## Solution Disable the feature entirely for now. The original issue it solved (interpolated patterns in eval STRING with unescaped braces) is rare, and the feature needs more comprehensive testing before re-enabling. ## Test Results Before (with feature + fixes): 1781/2210 After (feature disabled): 1788/2210 ✅ This exceeds the 1786 target and matches the pre-regression baseline. ## Future Work To re-enable this feature safely: - Add comprehensive test suite for edge cases - Test interaction with all escape sequences - Ensure no regressions in re/regexp.t - Consider alternative approaches (runtime error handling vs preprocessing) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This test file was added in commit 42bcde5 to test the escapeInvalidQuantifierBraces feature, which has now been disabled due to causing test regressions. The feature will need more comprehensive testing before being re-enabled, at which point this test can be restored and expanded.
…T context" This reverts commit e050511.
…ST context" This reverts commit a078172.
…es in LIST context"" This reverts commit 1a0346e.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
This PR adds support for the negated regex match (
!~) operator to the interpreter's bytecode compiler, fixing eval STRING failures in tests that use this operator.Problem
The test
perl5_t/t/re/charset.twas failing when run withJPERL_EVAL_USE_INTERPRETER=1due to missing support for the!~operator. The error was:Solution
Implemented the
!~operator following the SKILL.md guide:MATCH_REGEX_NOT = 217inOpcodes.java!~inBytecodeCompiler.compileBinaryOperatorSwitch()BytecodeInterpreter.javathat negates the=~resultInterpretedCode.javafor debuggingChanges
Modified Files
src/main/java/org/perlonjava/interpreter/Opcodes.java- Added MATCH_REGEX_NOT opcodesrc/main/java/org/perlonjava/interpreter/BytecodeCompiler.java- Added compiler casesrc/main/java/org/perlonjava/interpreter/BytecodeInterpreter.java- Added runtime handlersrc/main/java/org/perlonjava/interpreter/InterpretedCode.java- Added disassembly caseNew Files
dev/prompts/20260218_interpreter_negated_regex_match.md- Implementation notesTesting
Before Fix
Tests using
!~were failing:After Fix
All
!~operator tests pass:Test Results
perl5_t/t/re/charset.truns successfully with interpreter modeRelated
🤖 Generated with Claude Code