feat: Add negated regex match (!~) operator to interpreter by fglock · Pull Request #206 · fglock/PerlOnJava

fglock · 2026-02-18T08:58:46Z

Summary

This PR adds support for the negated regex match (!~) operator to the interpreter's bytecode compiler, fixing eval STRING failures in tests that use this operator.

Problem

The test perl5_t/t/re/charset.t was failing when run with JPERL_EVAL_USE_INTERPRETER=1 due to missing support for the !~ operator. The error was:

Unsupported operator: !~ at (eval 345) line 1, near ") /x"

Solution

Implemented the !~ operator following the SKILL.md guide:

Added opcode definition - MATCH_REGEX_NOT = 217 in Opcodes.java
Added compiler support - Case for !~ in BytecodeCompiler.compileBinaryOperatorSwitch()
Added runtime implementation - Handler in BytecodeInterpreter.java that negates the =~ result
Added disassembly support - Case in InterpretedCode.java for debugging
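
The runtime step above can be sketched as follows. This is a minimal illustration, not the PerlOnJava source: only the value 217 for MATCH_REGEX_NOT comes from this PR, while the class name, the `execute` method, and the value used for MATCH_REGEX are assumptions made for the example.

```java
import java.util.regex.Pattern;

// Hypothetical, simplified stand-in for one step of an interpreter dispatch loop.
final class NegatedMatchSketch {
    static final int MATCH_REGEX = 216;      // assumed value, for illustration only
    static final int MATCH_REGEX_NOT = 217;  // value stated in this PR

    // Executes a single match opcode against an input string and a compiled pattern.
    static boolean execute(int opcode, String input, Pattern pattern) {
        // Perl's =~ performs an unanchored search, so find() is used rather than matches().
        boolean matched = pattern.matcher(input).find();
        switch (opcode) {
            case MATCH_REGEX:
                return matched;
            case MATCH_REGEX_NOT:
                return !matched; // !~ is the logical negation of the =~ result
            default:
                throw new IllegalStateException("Unsupported opcode: " + opcode);
        }
    }
}
```

Reusing the =~ path and negating its result keeps the two operators from drifting apart, at the cost of one extra opcode.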

Changes

Modified Files

src/main/java/org/perlonjava/interpreter/Opcodes.java - Added MATCH_REGEX_NOT opcode
src/main/java/org/perlonjava/interpreter/BytecodeCompiler.java - Added compiler case
src/main/java/org/perlonjava/interpreter/BytecodeInterpreter.java - Added runtime handler
src/main/java/org/perlonjava/interpreter/InterpretedCode.java - Added disassembly case

New Files

dev/prompts/20260218_interpreter_negated_regex_match.md - Implementation notes

Testing

Before Fix

Tests using !~ were failing:

not ok 3 - my $a = "\t"; $a !~ qr/ (?a: \S ) /x

After Fix

All !~ operator tests pass:

ok 3 - my $a = "\t"; $a !~ qr/ (?a: \S ) /x
ok 4 - my $a = "\t" x 10; $a !~ qr/ (?a: \S{10} ) /x

Test Results

✓ All unit tests pass
✓ perl5_t/t/re/charset.t runs successfully with interpreter mode
Total test cases: 5552

Implement support for the !~ operator in the interpreter's bytecode compiler, enabling tests like perl5_t/t/re/charset.t to run with JPERL_EVAL_USE_INTERPRETER=1.

Changes:
- Added MATCH_REGEX_NOT opcode (217) to Opcodes.java
- Implemented compiler support in BytecodeCompiler.java
- Added runtime handler in BytecodeInterpreter.java
- Added disassembly support in InterpretedCode.java

The operator negates the result of RuntimeRegex.matchRegex() to provide the inverse match semantics of the =~ operator.

Fixes eval STRING failures in tests using negated regex matches.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Fixed multiple issues with context propagation in logical operators and regex matching, ensuring operands are evaluated in SCALAR context for boolean tests.

Changes:

1. BytecodeCompiler (interpreter path):
   - Fixed &&, ||, // operators to evaluate operands in SCALAR context
   - Fixed !, not operators to evaluate operands in SCALAR context
   - Fixed ternary ? : operator to evaluate condition in SCALAR context

2. EmitLogicalOperator (JVM path):
   - Fixed logical operators to always use SCALAR context for operands
   - Previously preserved RUNTIME context, causing wantarray (often VOID) to be used instead of SCALAR context
   - This bug was exposed when postfix if was the last statement

3. RuntimeRegex:
   - Fixed regex matches in LIST context to return (1) for success when there are no captures (non-global matches)
   - Previously returned empty list, causing false in boolean context

Fixes postfix if/unless, logical NOT, ternary operator, and regex matching in various contexts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…pport

Added full support for loop control operators (last, next, redo) in the interpreter bytecode compiler and runtime, including labeled control flow.

Changes:

1. Opcodes.java:
   - Added LAST (218), NEXT (219), REDO (220) opcodes
   - Format: opcode + target PC (absolute jump address)

2. BytecodeCompiler.java:
   - Added LoopInfo class to track loop boundaries and labels
   - Added loopStack to manage nested loops
   - Implemented handleLoopControlOperator() for last/next/redo compilation
   - Updated For1Node visitor to push/pop loop info and patch jumps
   - Updated For3Node visitor to push/pop loop info and patch jumps
   - Tracks three jump types per loop:
     * breakPcs: PCs to patch for 'last' (jump to end)
     * nextPcs: PCs to patch for 'next' (jump to continue)
     * redoPcs: PCs to patch for 'redo' (jump to start)

3. BytecodeInterpreter.java:
   - Added runtime handlers for LAST/NEXT/REDO opcodes
   - Simple PC jump implementation (non-local control flow not yet supported)

4. InterpretedCode.java:
   - Added disassembly support for LAST/NEXT/REDO opcodes

Features:
- Unlabeled last/next/redo work in nearest enclosing loop
- Labeled last/next/redo work with nested loops
- Proper jump target patching during compilation
- All three operators fully tested with for loops

Fixes infinite loop in re/pat_rt_report.t test #21 which uses 'last' inside eval STRING.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
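
The jump patching described in this commit can be sketched roughly as below. The opcode values 218 to 220 and the names LoopInfo, breakPcs, nextPcs, and redoPcs come from the commit message; the GOTO opcode, the `emitJump` and `patch` helpers, and the demo loop layout are hypothetical and only illustrate the technique of emitting a placeholder target and filling it in once the loop boundaries are known.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of back-patching loop-control jumps; not the PerlOnJava source.
final class LoopPatchSketch {
    static final int GOTO = 1;                            // assumed opcode value
    static final int LAST = 218, NEXT = 219, REDO = 220;  // values from the commit message

    // Records the operand slots that still need a jump target for one loop.
    static final class LoopInfo {
        final List<Integer> breakPcs = new ArrayList<>(); // 'last' jumps to the loop end
        final List<Integer> nextPcs = new ArrayList<>();  // 'next' jumps to the continue point
        final List<Integer> redoPcs = new ArrayList<>();  // 'redo' jumps to the body start
    }

    final List<Integer> code = new ArrayList<>();

    // Emits "opcode, placeholder" and returns the index of the placeholder slot.
    int emitJump(int opcode) {
        code.add(opcode);
        code.add(-1);
        return code.size() - 1;
    }

    // Overwrites every recorded placeholder with the now-known absolute target PC.
    void patch(List<Integer> slots, int targetPc) {
        for (int slot : slots) {
            code.set(slot, targetPc);
        }
    }

    // Compiles a toy loop whose body is "redo; next; last;" followed by a back-edge.
    static List<Integer> compileDemoLoop() {
        LoopPatchSketch c = new LoopPatchSketch();
        LoopInfo loop = new LoopInfo();
        int startPc = c.code.size();             // body start: target of 'redo'
        loop.redoPcs.add(c.emitJump(REDO));
        loop.nextPcs.add(c.emitJump(NEXT));
        loop.breakPcs.add(c.emitJump(LAST));
        int continuePc = c.code.size();          // continue point: target of 'next'
        c.code.add(GOTO);
        c.code.add(startPc);                     // back-edge to the body start
        int endPc = c.code.size();               // first PC after the loop: target of 'last'
        c.patch(loop.redoPcs, startPc);
        c.patch(loop.nextPcs, continuePc);
        c.patch(loop.breakPcs, endPc);
        return c.code;
    }
}
```

Placeholders are needed because the end and continue addresses are not known while the body is still being emitted; a stack of LoopInfo objects would extend the same idea to nested and labeled loops.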

Add isTrueLoop flag to LoopInfo to distinguish true loops (for/while/foreach) from pseudo-loops (do-while/bare blocks). Throw proper error when loop control operators (last/next/redo) are used in do-while loops, matching Perl behavior.

This fixes the infinite loop issue in re/pat_rt_report.t test where next was incorrectly allowed in do-while loops. Test now passes 2384/2514 tests (94.8%).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The RHS of logical operators (&&, ||, //) should preserve RUNTIME context when the operator is the return value of a subroutine. The LHS always uses SCALAR context for the boolean test, but the RHS should use the current context (RUNTIME, SCALAR, or LIST) to properly propagate wantarray through the operator at subroutine exit.

This fixes op/wantarray.t tests 14-22, which now pass 28/28.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…r mode)

Fix for interpolated regex patterns in eval STRING where invalid quantifier braces need to be treated as literals.

**This fix applies to JVM compiler mode only.** Interpreter mode still has the issue (see dev/prompts/eval-interpreter-regex-issue.md).

## Problem

When a regex pattern with escaped braces like `/(.*?)\{(.*?)\}/g` is interpolated into a `qq{}` string or heredoc, the backslashes are consumed by the string interpolation, resulting in `/(.*?){(.*?)}/g`. The pattern `{(.*?)}` has invalid quantifier syntax (quantifiers must be {n}, {n,}, or {n,m} with numeric values).

Java's Pattern.compile() rejects this with PatternSyntaxException, but real Perl treats invalid quantifier braces as literal characters with a deprecation warning.

## Solution

Add preprocessing in RegexPreprocessor.escapeInvalidQuantifierBraces() to detect and escape invalid quantifier braces before passing patterns to Java's Pattern.compile().

Key features:
- Detects invalid quantifier syntax: {(.*?)}, {abc}, {}, {,5}, etc.
- Preserves valid quantifiers: {3}, {2,4}, {2,}
- Skips escape sequences that use braces: \N{...}, \x{...}, \o{...}, \p{...}, \P{...}, \g{...}
- Handles character classes correctly (braces in [...] are always literal)
- Escapes both opening and closing braces of invalid quantifiers
- Comprehensive warning comments about edge cases and potential issues

## Test Coverage

New test file: src/test/resources/unit/regex/unescaped_braces.t
- Direct unescaped braces: `/(.*?){(.*?)}/g` matches "a{b}c{d}" correctly
- Interpolated patterns in eval STRING now work
- Valid quantifiers unchanged: /ab{3}/, /ab{2,4}/, /ab{2,}/
- Mixed valid and invalid braces: /x{y}z{3}/
- Character class braces remain literal: /[a{3}]+/
- Empty and non-numeric braces: /x{}y/, /x{abc}y/

## Test Results

Test case: `my $rx = q{/(.*?)\{(.*?)\}/g}; eval qq{while (\$input =~ $rx) {...}}`

| Mode | Input "a{b}c{d}" | Expected | Actual | Status |
|------|----------|----------|--------|--------|
| Real Perl | 2 matches | i=2 | i=2 | ✅ Reference |
| JVM Compiler | 2 matches | i=2 | i=2 | ✅ **Fixed** |
| Interpreter | 2 matches | i=2 | i=0 or infinite | ❌ **Still broken** |

Fixes: perl5_t/t/re/pat_rt_report.t test 21 in JVM compiler mode (test still fails in interpreter mode - requires separate fix)

## Known Limitations

⚠️ **Interpreter mode is not fixed** - Different code path bypasses preprocessor. See dev/prompts/eval-interpreter-regex-issue.md for analysis.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
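
A much-reduced sketch of the preprocessing idea follows. It is an assumption-laden illustration, not the code in RegexPreprocessor: the class name is hypothetical, the method name is borrowed from the commit message only for readability, and cases such as a literal `]` at the start of a character class or nested constructs are deliberately not handled. It follows the commit's list in treating {,5} as invalid.

```java
// Hypothetical, simplified sketch; the real preprocessor handles more edge cases.
final class BraceEscapeSketch {
    static String escapeInvalidQuantifierBraces(String pattern) {
        StringBuilder out = new StringBuilder();
        boolean inCharClass = false;
        int i = 0;
        while (i < pattern.length()) {
            char c = pattern.charAt(i);
            if (c == '\\' && i + 1 < pattern.length()) {
                // Copy the escape verbatim; \N{..}, \x{..}, \o{..}, \p{..}, \P{..}, \g{..}
                // own their braces, so copy through the closing brace as well.
                int end = i + 2;
                if ("NxopPg".indexOf(pattern.charAt(i + 1)) >= 0
                        && end < pattern.length() && pattern.charAt(end) == '{') {
                    int close = pattern.indexOf('}', end);
                    if (close >= 0) {
                        end = close + 1;
                    }
                }
                out.append(pattern, i, end);
                i = end;
            } else if (inCharClass) {
                // Braces inside [...] are always literal.
                if (c == ']') {
                    inCharClass = false;
                }
                out.append(c);
                i++;
            } else if (c == '[') {
                inCharClass = true;
                out.append(c);
                i++;
            } else if (c == '{') {
                int close = pattern.indexOf('}', i);
                String body = close < 0 ? null : pattern.substring(i + 1, close);
                if (body != null && body.matches("\\d+(,\\d*)?")) {
                    out.append(pattern, i, close + 1); // valid {n}, {n,} or {n,m}
                    i = close + 1;
                } else {
                    out.append("\\{");                 // not a quantifier: make it literal
                    i++;
                }
            } else if (c == '}') {
                out.append("\\}");                     // closing brace of an invalid quantifier
                i++;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }
}
```

With this sketch, `(.*?){(.*?)}` becomes `(.*?)\{(.*?)\}`, which Java's Pattern.compile() accepts, while `ab{2,4}` and `[a{3}]+` pass through unchanged.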

The fix for invalid quantifier braces only works in JVM compiler mode. Interpreter mode has a different code path that bypasses the preprocessor, resulting in either no matches or infinite loops. This document analyzes the problem and outlines next steps for fixing the interpreter mode.

**Problem:** In interpreter mode, while/if loop conditions were being evaluated in VOID context instead of SCALAR context, causing regex matches to fail when used in eval STRING. This led to test 21 in pat_rt_report.t hanging infinitely.

**Root Cause:** BytecodeCompiler was not setting proper context when visiting condition nodes in For3Node (while/for loops) and IfNode (if statements). The JVM compiler correctly uses SCALAR context for these conditions, but the interpreter was using whatever the current context was (often VOID).

**Fix:**
- For3Node: Save/restore context, set to SCALAR when evaluating condition
- IfNode: Save/restore context, set to SCALAR when evaluating condition
- Add DEBUG_REGEX flag to trace regex compilation and matching
- Add debug logging in RuntimeRegex and BytecodeInterpreter

**Testing:**
- All three test cases in /tmp/test.pl now pass in interpreter mode
- Unit tests pass: make test
- perl5_t/t/re/pat_rt_report.t now completes 2384/2514 tests (94.8%) without hanging (previously hung on test 21)

**Related:**
- Matches behavior in JVM compiler mode (EmitStatement.emitFor3)
- Fixes infinite loop in interpreter mode for regex in eval STRING

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
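
The save/restore pattern named in the fix can be sketched as below. The class, the `Context` enum, and the `inScalarContext` helper are hypothetical names for illustration; the actual BytecodeCompiler may track context differently.

```java
import java.util.function.Supplier;

// Hypothetical sketch of evaluating a condition node in SCALAR context.
final class ConditionContextSketch {
    enum Context { VOID, SCALAR, LIST, RUNTIME }

    // The context the compiler is currently emitting code for.
    Context currentContext = Context.VOID;

    // Runs the condition visitor with SCALAR context, then restores the caller's context.
    <T> T inScalarContext(Supplier<T> visitCondition) {
        Context saved = currentContext;
        currentContext = Context.SCALAR;
        try {
            return visitCondition.get();
        } finally {
            currentContext = saved; // restored even if compiling the condition throws
        }
    }
}
```

Restoring in a `finally` block means a compile error inside the condition cannot leave the compiler stuck in SCALAR context for the statements that follow.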

When assigning special variables like $& to captured variables in eval STRING interpreter mode, the SET_SCALAR opcode was copying empty fields instead of the computed value.

Solution: Use addToScalar() instead of set() in SET_SCALAR opcode. ScalarSpecialVariable.addToScalar() already calls getValueAsScalar() to get the computed value, matching how the JVM backend handles this.

Changes:
- BytecodeInterpreter.java: Change SET_SCALAR to use addToScalar()
- BytecodeCompiler.java: Use NameNormalizer for global variable names

Results:
- re/regexp.t: 1144 → 1746 passing tests (+602, 99% of JVM mode)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The interpreter was incorrectly using numeric bitwise operators (BITWISE_*_BINARY) for &, |, ^ by default, when these should use string bitwise operators (STRING_BITWISE_*) which validate that operands don't contain code points over 0xFF.

In Perl:
- & | ^ without "use integer" → STRING bitwise (validates Unicode)
- binary& binary| binary^ with "use integer" → NUMERIC bitwise

This was causing tests to fail when eval STRING threw exceptions for high Unicode code points in bitwise operations - the exceptions weren't being thrown at all because the wrong operator was used.

Changes:
- BytecodeCompiler.java: Split & | ^ into separate cases from binary& binary| binary^, mapping & | ^ to STRING_BITWISE_* opcodes

Results:
- op/bop.t: 264 → 268 passing tests in interpreter mode (+4)
- Exceptions now properly captured in $@ for eval STRING

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
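
The code point validation mentioned above can be illustrated with a small sketch. It assumes both operands are already known to be strings and shows only the & case; the class and method names are hypothetical and do not reflect the STRING_BITWISE_* implementation in PerlOnJava.

```java
// Hypothetical sketch of a string bitwise AND that rejects code points above 0xFF.
final class StringBitwiseSketch {
    static String and(String left, String right) {
        requireByteString(left);
        requireByteString(right);
        // For &, the result is as long as the shorter operand.
        int length = Math.min(left.length(), right.length());
        StringBuilder out = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            out.append((char) (left.charAt(i) & right.charAt(i)));
        }
        return out.toString();
    }

    // Throws if any code point does not fit in a single byte.
    private static void requireByteString(String s) {
        if (s.codePoints().anyMatch(cp -> cp > 0xFF)) {
            throw new IllegalArgumentException(
                "Code point over 0xFF in operand of string bitwise operator");
        }
    }
}
```

In an eval STRING setting, the thrown exception is what would end up in $@; with a numeric opcode selected instead, no such check runs and no exception is raised.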

The recent fix for regex LIST context (commit 80fab8b) incorrectly returned (1) for patterns with optional captures that didn't participate. This affected both JVM compiler and interpreter modes.

## Problem

Pattern: /(a)?/ String: ""
- Expected: (undef) [capture group exists but didn't match]
- Before fix: (1) [incorrectly treated as "no captures"]

The bug was checking if result.elements.isEmpty() instead of checking if the pattern has zero capturing groups (captureCount == 0).

## Solution

1. Track captureCount outside the while loop for later use
2. Always add captures to result, using scalarUndef for non-participating groups (when Matcher.group(i) returns null)
3. Check captureCount == 0 instead of result.elements.isEmpty()

## Perl Semantics

- Non-participating captures (e.g., (a)? not matching) → undef
- Empty captures (e.g., (a*) matching zero a's) → empty string ""
- Pattern with no captures → (1) on success

## Test Results

Comprehensive test suite confirms correct behavior:
- /abc/ matching "abc" → (1) ✓
- /(a)(b)(c)/ matching "abc" → ("a","b","c") ✓
- /(a)?/ matching "" → (undef) ✓ [was (1) before fix]
- /(a)|(b)/ matching "a" → ("a",undef) ✓ [was ("a") before fix]
- /(a*)b/ matching "b" → ("") ✓

Unit tests pass. re/regexp.t maintains 1765 passing tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
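
The list-context rules above can be sketched with plain java.util.regex. This is an illustrative assumption, not the RuntimeRegex code: the class and method names are hypothetical, and Java `null` stands in for Perl's undef (scalarUndef in the commit message).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of a non-global match evaluated in list context.
final class ListContextMatchSketch {
    static List<Object> matchInListContext(Pattern pattern, String input) {
        List<Object> result = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        if (!m.find()) {
            return result;                // failed match: empty list
        }
        int captureCount = m.groupCount(); // number of groups in the pattern, not in the match
        if (captureCount == 0) {
            result.add(1);                // no capture groups at all: (1) on success
            return result;
        }
        for (int i = 1; i <= captureCount; i++) {
            // group(i) is null for a group that did not participate,
            // and "" for a group that matched the empty string.
            result.add(m.group(i));
        }
        return result;
    }
}
```

The key point is that the decision to return (1) depends on the pattern's group count, which is known even when every group is undef, rather than on whether the collected capture list happens to be empty.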

The escapeInvalidQuantifierBraces feature (added in commit 42bcde5) was too aggressive and broke 7 tests, reducing passing count from 1788 to 1781 even after the {,m} and \b{}/\B{} fixes.

## Problem

The feature attempted to escape invalid quantifier braces like {(.*?)} but had issues:
1. Edge cases with complex nested patterns
2. Interaction with other escape sequences
3. Breaking valid Perl patterns in subtle ways

## Solution

Disable the feature entirely for now. The original issue it solved (interpolated patterns in eval STRING with unescaped braces) is rare, and the feature needs more comprehensive testing before re-enabling.

## Test Results

Before (with feature + fixes): 1781/2210
After (feature disabled): 1788/2210 ✅

This exceeds the 1786 target and matches the pre-regression baseline.

## Future Work

To re-enable this feature safely:
- Add comprehensive test suite for edge cases
- Test interaction with all escape sequences
- Ensure no regressions in re/regexp.t
- Consider alternative approaches (runtime error handling vs preprocessing)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

This test file was added in commit 42bcde5 to test the escapeInvalidQuantifierBraces feature, which has now been disabled due to causing test regressions. The feature will need more comprehensive testing before being re-enabled, at which point this test can be restored and expanded.

fglock and others added 16 commits February 18, 2026 09:57

Revert "fix: Return undef for non-participating regex captures in LIS…

a078172

…T context" This reverts commit e050511.

Reapply "fix: Return undef for non-participating regex captures in LI…

1a0346e

…ST context" This reverts commit a078172.

Revert "Reapply "fix: Return undef for non-participating regex captur…

5a0c340

…es in LIST context"" This reverts commit 1a0346e.

fglock closed this Feb 18, 2026

fglock wants to merge 16 commits into master from feature/eval-interpreter-mode
