
feat: Add negated regex match (!~) operator to interpreter #206

Closed
fglock wants to merge 16 commits into master from feature/eval-interpreter-mode

fglock commented Feb 18, 2026

Summary

This PR adds support for the negated regex match (!~) operator to the interpreter's bytecode compiler, fixing eval STRING failures in tests that use this operator.

Problem

The test perl5_t/t/re/charset.t was failing when run with JPERL_EVAL_USE_INTERPRETER=1 due to missing support for the !~ operator. The error was:

Unsupported operator: !~ at (eval 345) line 1, near ") /x"

Solution

Implemented the !~ operator following the SKILL.md guide:

  1. Added opcode definition - MATCH_REGEX_NOT = 217 in Opcodes.java
  2. Added compiler support - Case for !~ in BytecodeCompiler.compileBinaryOperatorSwitch()
  3. Added runtime implementation - Handler in BytecodeInterpreter.java that negates the =~ result
  4. Added disassembly support - Case in InterpretedCode.java for debugging
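
The steps above can be pictured with a toy dispatch routine. This is a minimal sketch and not the PerlOnJava source: only the name `MATCH_REGEX_NOT` and its value 217 come from this description, while the class name, the `MATCH_REGEX` value, and both method signatures are assumptions made for illustration.

```java
import java.util.regex.Pattern;

final class NegatedMatchSketch {
    // Hypothetical opcode number for =~; the real value is not stated in this PR.
    static final int MATCH_REGEX = 1;
    // Opcode number taken from the PR description.
    static final int MATCH_REGEX_NOT = 217;

    /** Runtime handler: run the match once, then negate the result for !~. */
    static boolean execute(int opcode, String subject, Pattern regex) {
        boolean matched = regex.matcher(subject).find();
        switch (opcode) {
            case MATCH_REGEX:     return matched;
            case MATCH_REGEX_NOT: return !matched;
            default:
                throw new IllegalArgumentException("Unsupported opcode: " + opcode);
        }
    }

    /** Disassembly case: map the opcode back to a readable mnemonic. */
    static String disassemble(int opcode) {
        switch (opcode) {
            case MATCH_REGEX:     return "MATCH_REGEX";
            case MATCH_REGEX_NOT: return "MATCH_REGEX_NOT";
            default:              return "UNKNOWN(" + opcode + ")";
        }
    }

    public static void main(String[] args) {
        Pattern nonSpace = Pattern.compile("\\S");
        // "\t" contains no non-space character, so =~ fails and !~ succeeds.
        System.out.println(disassemble(MATCH_REGEX_NOT) + " -> "
                + execute(MATCH_REGEX_NOT, "\t", nonSpace));
    }
}
```

Sharing one match call between both opcodes keeps `!~` from drifting away from `=~` if the match logic changes later.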

Changes

Modified Files

  • src/main/java/org/perlonjava/interpreter/Opcodes.java - Added MATCH_REGEX_NOT opcode
  • src/main/java/org/perlonjava/interpreter/BytecodeCompiler.java - Added compiler case
  • src/main/java/org/perlonjava/interpreter/BytecodeInterpreter.java - Added runtime handler
  • src/main/java/org/perlonjava/interpreter/InterpretedCode.java - Added disassembly case

New Files

  • dev/prompts/20260218_interpreter_negated_regex_match.md - Implementation notes

Testing

Before Fix

Tests using !~ were failing:

not ok 3 - my $a = "\t"; $a !~ qr/ (?a: \S ) /x

After Fix

All !~ operator tests pass:

ok 3 - my $a = "\t"; $a !~ qr/ (?a: \S ) /x
ok 4 - my $a = "\t" x 10; $a !~ qr/ (?a: \S{10} ) /x

Test Results

  • ✓ All unit tests pass
  • perl5_t/t/re/charset.t runs successfully with interpreter mode
  • Total test cases: 5552

Related

  • Fixes eval STRING failures with negated regex matches
  • Part of the broader interpreter mode feature development
  • Maintains opcode contiguity (217 is next sequential number after 216)

🤖 Generated with Claude Code

fglock and others added 16 commits February 18, 2026 09:57
Implement support for the !~ operator in the interpreter's bytecode
compiler, enabling tests like perl5_t/t/re/charset.t to run with
JPERL_EVAL_USE_INTERPRETER=1.

Changes:
- Added MATCH_REGEX_NOT opcode (217) to Opcodes.java
- Implemented compiler support in BytecodeCompiler.java
- Added runtime handler in BytecodeInterpreter.java
- Added disassembly support in InterpretedCode.java

The operator negates the result of RuntimeRegex.matchRegex() to provide
the inverse match semantics of the =~ operator.

Fixes eval STRING failures in tests using negated regex matches.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fixed multiple issues with context propagation in logical operators
and regex matching, ensuring operands are evaluated in SCALAR context
for boolean tests.

Changes:

1. BytecodeCompiler (interpreter path):
   - Fixed &&, ||, // operators to evaluate operands in SCALAR context
   - Fixed !, not operators to evaluate operands in SCALAR context
   - Fixed ternary ? : operator to evaluate condition in SCALAR context

2. EmitLogicalOperator (JVM path):
   - Fixed logical operators to always use SCALAR context for operands
   - Previously preserved RUNTIME context, causing wantarray (often VOID)
     to be used instead of SCALAR context
   - This bug was exposed when postfix if was the last statement

3. RuntimeRegex:
   - Fixed regex matches in LIST context to return (1) for success
     when there are no captures (non-global matches)
   - Previously returned empty list, causing false in boolean context

Fixes postfix if/unless, logical NOT, ternary operator, and regex
matching in various contexts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…pport

Added full support for loop control operators (last, next, redo) in the
interpreter bytecode compiler and runtime, including labeled control flow.

Changes:

1. Opcodes.java:
   - Added LAST (218), NEXT (219), REDO (220) opcodes
   - Format: opcode + target PC (absolute jump address)

2. BytecodeCompiler.java:
   - Added LoopInfo class to track loop boundaries and labels
   - Added loopStack to manage nested loops
   - Implemented handleLoopControlOperator() for last/next/redo compilation
   - Updated For1Node visitor to push/pop loop info and patch jumps
   - Updated For3Node visitor to push/pop loop info and patch jumps
   - Tracks three jump types per loop:
     * breakPcs: PCs to patch for 'last' (jump to end)
     * nextPcs: PCs to patch for 'next' (jump to continue)
     * redoPcs: PCs to patch for 'redo' (jump to start)

3. BytecodeInterpreter.java:
   - Added runtime handlers for LAST/NEXT/REDO opcodes
   - Simple PC jump implementation (non-local control flow not yet supported)

4. InterpretedCode.java:
   - Added disassembly support for LAST/NEXT/REDO opcodes

Features:
- Unlabeled last/next/redo work in nearest enclosing loop
- Labeled last/next/redo work with nested loops
- Proper jump target patching during compilation
- All three operators fully tested with for loops
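
The jump patching described above can be sketched as a small backpatching compiler. This is an illustration under assumptions, not the actual BytecodeCompiler: the opcode numbers, `LoopInfo`, `loopStack`, `breakPcs`, `nextPcs` and `redoPcs` are named in the commit message, while `enterLoop`, `emitLoopControl`, `exitLoop`, `demo`, the `NOP` opcode and the `-1` placeholder are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class LoopPatchSketch {
    static final int NOP = 0, LAST = 218, NEXT = 219, REDO = 220;
    static final int UNPATCHED = -1; // placeholder operand until the target PC is known

    static final class LoopInfo {
        final String label;   // null for an unlabeled loop
        final int startPc;    // target for 'redo'
        final List<Integer> breakPcs = new ArrayList<>();
        final List<Integer> nextPcs = new ArrayList<>();
        final List<Integer> redoPcs = new ArrayList<>();
        LoopInfo(String label, int startPc) { this.label = label; this.startPc = startPc; }
    }

    final List<Integer> code = new ArrayList<>();
    final Deque<LoopInfo> loopStack = new ArrayDeque<>();

    void enterLoop(String label) { loopStack.push(new LoopInfo(label, code.size())); }

    /** Emits opcode + placeholder operand and records where to patch it. */
    void emitLoopControl(int opcode, String label) {
        LoopInfo target = null;
        for (LoopInfo loop : loopStack) { // iterates innermost loop first
            if (label == null || label.equals(loop.label)) { target = loop; break; }
        }
        if (target == null) throw new IllegalStateException("No enclosing loop for " + label);
        code.add(opcode);
        List<Integer> pending = opcode == LAST ? target.breakPcs
                : opcode == NEXT ? target.nextPcs : target.redoPcs;
        pending.add(code.size());
        code.add(UNPATCHED);
    }

    /** Pops the loop and fills in every recorded operand with its absolute PC. */
    void exitLoop(int continuePc) {
        LoopInfo loop = loopStack.pop();
        int endPc = code.size();
        for (int pc : loop.breakPcs) code.set(pc, endPc);
        for (int pc : loop.nextPcs) code.set(pc, continuePc);
        for (int pc : loop.redoPcs) code.set(pc, loop.startPc);
    }

    /** Builds OUTER: { ...; inner: { last OUTER; next; redo } }. */
    static List<Integer> demo() {
        LoopPatchSketch c = new LoopPatchSketch();
        c.enterLoop("OUTER");
        c.code.add(NOP);
        c.enterLoop(null);
        c.emitLoopControl(LAST, "OUTER");
        c.emitLoopControl(NEXT, null);
        c.emitLoopControl(REDO, null);
        int innerContinue = c.code.size();
        c.code.add(NOP);
        c.exitLoop(innerContinue);
        int outerContinue = c.code.size();
        c.code.add(NOP);
        c.exitLoop(outerContinue);
        return c.code;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this sketch the labeled `last OUTER` is recorded on the outer loop's list even though it is emitted inside the inner loop, so it is patched only when the outer loop closes.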

Fixes infinite loop in re/pat_rt_report.t test #21 which uses 'last'
inside eval STRING.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add isTrueLoop flag to LoopInfo to distinguish true loops (for/while/foreach)
from pseudo-loops (do-while/bare blocks). Throw proper error when loop control
operators (last/next/redo) are used in do-while loops, matching Perl behavior.

This fixes the infinite loop issue in re/pat_rt_report.t test where next was
incorrectly allowed in do-while loops. Test now passes 2384/2514 tests (94.8%).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The RHS of logical operators (&&, ||, //) should preserve RUNTIME context
when the operator is the return value of a subroutine. The LHS always uses
SCALAR context for the boolean test, but the RHS should use the current
context (RUNTIME, SCALAR, or LIST) to properly propagate wantarray through
the operator at subroutine exit.

This fixes op/wantarray.t tests 14-22; the file now passes 28/28.
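
The rule described above (LHS always in SCALAR context, RHS in the caller's context) can be illustrated with a toy evaluator. This is a sketch under assumptions: the `Context` enum values mirror the names used in the commit message, but `Node`, `constant`, `contextProbe`, `and` and the simplified truthiness test are hypothetical and unrelated to the real compiler classes.

```java
final class LogicalContextSketch {
    enum Context { VOID, SCALAR, LIST, RUNTIME }

    interface Node { Object eval(Context ctx); }

    static Node constant(Object value) { return ctx -> value; }

    /** Reports the context it was evaluated in, loosely like wantarray. */
    static Node contextProbe() { return ctx -> ctx.name(); }

    /** lhs && rhs: the test uses SCALAR, the result operand keeps the caller's context. */
    static Node and(Node lhs, Node rhs) {
        return ctx -> {
            Object left = lhs.eval(Context.SCALAR);
            return isTrue(left) ? rhs.eval(ctx) : left;
        };
    }

    /** Simplified Perl truthiness: null, "", "0" and numeric zero are false. */
    static boolean isTrue(Object v) {
        if (v == null) return false;
        if (v instanceof Number) return ((Number) v).doubleValue() != 0;
        String s = v.toString();
        return !s.isEmpty() && !s.equals("0");
    }

    public static void main(String[] args) {
        Node expr = and(contextProbe(), contextProbe());
        // The LHS probe sees SCALAR; the RHS probe sees whatever the caller passed.
        System.out.println(expr.eval(Context.LIST));
    }
}
```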

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…r mode)

Fix for interpolated regex patterns in eval STRING where invalid quantifier
braces need to be treated as literals. **This fix applies to JVM compiler mode only.**
Interpreter mode still has the issue (see dev/prompts/eval-interpreter-regex-issue.md).

## Problem

When a regex pattern with escaped braces like `/(.*?)\{(.*?)\}/g` is
interpolated into a `qq{}` string or heredoc, the backslashes are consumed
by the string interpolation, resulting in `/(.*?){(.*?)}/g`. The pattern
`{(.*?)}` has invalid quantifier syntax (quantifiers must be {n}, {n,}, or
{n,m} with numeric values).

Java's Pattern.compile() rejects this with PatternSyntaxException, but
real Perl treats invalid quantifier braces as literal characters with
a deprecation warning.

## Solution

Add preprocessing in RegexPreprocessor.escapeInvalidQuantifierBraces()
to detect and escape invalid quantifier braces before passing patterns to
Java's Pattern.compile().

Key features:
- Detects invalid quantifier syntax: {(.*?)}, {abc}, {}, {,5}, etc.
- Preserves valid quantifiers: {3}, {2,4}, {2,}
- Skips escape sequences that use braces: \N{...}, \x{...}, \o{...},
  \p{...}, \P{...}, \g{...}
- Handles character classes correctly (braces in [...] are always literal)
- Escapes both opening and closing braces of invalid quantifiers
- Comprehensive warning comments about edge cases and potential issues
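
A heavily simplified version of this kind of preprocessing might look like the sketch below. It is not the code of `RegexPreprocessor.escapeInvalidQuantifierBraces()`: the class and method names are hypothetical, character-class handling ignores edge cases such as a leading `]`, and it does not check that a quantifier follows something quantifiable. Note that a later commit in this PR disables the real feature because of regressions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BraceEscapeSketch {
    /** Escapes braces that do not form {n}, {n,} or {n,m}; leaves valid quantifiers alone. */
    static String escapeInvalidBraces(String p) {
        StringBuilder out = new StringBuilder();
        boolean inClass = false;
        int i = 0;
        while (i < p.length()) {
            char c = p.charAt(i);
            if (c == '\\' && i + 1 < p.length()) {
                char next = p.charAt(i + 1);
                out.append(c).append(next);
                i += 2;
                // Escape sequences that take a braced argument: copy the argument verbatim.
                if ("NxopPg".indexOf(next) >= 0 && i < p.length() && p.charAt(i) == '{') {
                    int close = p.indexOf('}', i);
                    if (close >= 0) {
                        out.append(p, i, close + 1);
                        i = close + 1;
                    }
                }
            } else if (inClass) {
                // Braces inside [...] are always literal.
                if (c == ']') inClass = false;
                out.append(c);
                i++;
            } else if (c == '[') {
                inClass = true;
                out.append(c);
                i++;
            } else if (c == '{') {
                int close = p.indexOf('}', i);
                if (close >= 0 && p.substring(i + 1, close).matches("\\d+(,\\d*)?")) {
                    out.append(p, i, close + 1); // valid quantifier, keep as-is
                    i = close + 1;
                } else {
                    out.append("\\{");           // invalid quantifier, make it literal
                    i++;
                }
            } else if (c == '}') {
                out.append("\\}");               // closing brace of an invalid quantifier
                i++;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    /** Counts global matches, similar to a while (... =~ /.../g) loop. */
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) count++;
        return count;
    }

    public static void main(String[] args) {
        String escaped = escapeInvalidBraces("(.*?){(.*?)}");
        System.out.println(escaped + " matches " + countMatches(escaped, "a{b}c{d}") + " times");
    }
}
```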

## Test Coverage

New test file: src/test/resources/unit/regex/unescaped_braces.t
- Direct unescaped braces: `/(.*?){(.*?)}/g` matches "a{b}c{d}" correctly
- Interpolated patterns in eval STRING now work
- Valid quantifiers unchanged: /ab{3}/, /ab{2,4}/, /ab{2,}/
- Mixed valid and invalid braces: /x{y}z{3}/
- Character class braces remain literal: /[a{3}]+/
- Empty and non-numeric braces: /x{}y/, /x{abc}y/

## Test Results

Test case: `my $rx = q{/(.*?)\{(.*?)\}/g}; eval qq{while (\$input =~ $rx) {...}}`

| Mode | Matches in input "a{b}c{d}" | Expected | Actual | Status |
|------|-----------------------------|----------|--------|--------|
| Real Perl | 2 matches | i=2 | i=2 | ✅ Reference |
| JVM Compiler | 2 matches | i=2 | i=2 | ✅ **Fixed** |
| Interpreter | 2 matches | i=2 | i=0 or infinite | ❌ **Still broken** |

Fixes: perl5_t/t/re/pat_rt_report.t test 21 in JVM compiler mode
(test still fails in interpreter mode - requires separate fix)

## Known Limitations

⚠️ **Interpreter mode is not fixed** - Different code path bypasses preprocessor.
See dev/prompts/eval-interpreter-regex-issue.md for analysis.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The fix for invalid quantifier braces only works in JVM compiler mode.
Interpreter mode has a different code path that bypasses the preprocessor,
resulting in either no matches or infinite loops.

This document analyzes the problem and outlines next steps for fixing
the interpreter mode.
**Problem:**
In interpreter mode, while-loop and if-statement conditions were being evaluated in
VOID context instead of SCALAR context, causing regex matches to fail
when used in eval STRING. This led to test 21 in pat_rt_report.t
hanging infinitely.

**Root Cause:**
BytecodeCompiler was not setting proper context when visiting condition
nodes in For3Node (while/for loops) and IfNode (if statements). The JVM
compiler correctly uses SCALAR context for these conditions, but the
interpreter was using whatever the current context was (often VOID).

**Fix:**
- For3Node: Save/restore context, set to SCALAR when evaluating condition
- IfNode: Save/restore context, set to SCALAR when evaluating condition
- Add DEBUG_REGEX flag to trace regex compilation and matching
- Add debug logging in RuntimeRegex and BytecodeInterpreter
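
The save/restore pattern from the fix can be sketched as follows. This is an illustration only: the `currentContext` field, `visitIf`, `visitExpression` and the log are hypothetical stand-ins, and the real BytecodeCompiler visitor methods take AST nodes rather than strings.

```java
import java.util.ArrayList;
import java.util.List;

final class ConditionContextSketch {
    enum Context { VOID, SCALAR, LIST }

    // Context that applies to whatever expression is visited next.
    Context currentContext = Context.VOID;
    final List<String> log = new ArrayList<>();

    /** Stand-in for compiling an expression: records the context it saw. */
    void visitExpression(String source) {
        log.add(source + " in " + currentContext);
    }

    /** The condition is forced to SCALAR; the body keeps the surrounding context. */
    void visitIf(String condition, String body) {
        Context saved = currentContext;
        currentContext = Context.SCALAR;
        try {
            visitExpression(condition);
        } finally {
            currentContext = saved; // restore even if compiling the condition throws
        }
        visitExpression(body);
    }

    public static void main(String[] args) {
        ConditionContextSketch compiler = new ConditionContextSketch();
        compiler.visitIf("$input =~ /x/", "print 'ok'");
        compiler.log.forEach(System.out::println);
    }
}
```

Restoring in a `finally` block is a design choice of this sketch; it keeps a compile error in the condition from leaking SCALAR context into later statements.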

**Testing:**
- All three test cases in /tmp/test.pl now pass in interpreter mode
- Unit tests pass: make test
- perl5_t/t/re/pat_rt_report.t now completes 2384/2514 tests (94.8%)
  without hanging (previously hung on test 21)

**Related:**
- Matches behavior in JVM compiler mode (EmitStatement.emitFor3)
- Fixes infinite loop in interpreter mode for regex in eval STRING

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When assigning special variables like $& to captured variables in eval
STRING interpreter mode, the SET_SCALAR opcode was copying empty fields
instead of the computed value.

Solution: Use addToScalar() instead of set() in SET_SCALAR opcode.
ScalarSpecialVariable.addToScalar() already calls getValueAsScalar()
to get the computed value, matching how the JVM backend handles this.

Changes:
- BytecodeInterpreter.java: Change SET_SCALAR to use addToScalar()
- BytecodeCompiler.java: Use NameNormalizer for global variable names
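
Why `set()` copies an empty value while `addToScalar()` does not can be shown with a toy model. Only the method names `set`, `addToScalar` and `getValueAsScalar` come from the commit message; the classes `Scalar` and `ComputedScalar`, their fields, and the supplier are assumptions made for this sketch and do not reflect the real runtime types.

```java
import java.util.function.Supplier;

final class SpecialVarSketch {
    /** Plain scalar: its value lives in a field. */
    static class Scalar {
        String value = "";

        /** Copies the raw field of the source, whatever it holds. */
        void set(Scalar other) { this.value = other.value; }

        Scalar getValueAsScalar() { return this; }

        /** Resolves this scalar's value first, then stores it in the target. */
        void addToScalar(Scalar target) { target.set(getValueAsScalar()); }
    }

    /** Stand-in for a special variable such as $&: the field stays empty, the value is computed. */
    static class ComputedScalar extends Scalar {
        private final Supplier<String> compute;

        ComputedScalar(Supplier<String> compute) { this.compute = compute; }

        @Override
        Scalar getValueAsScalar() {
            Scalar resolved = new Scalar();
            resolved.value = compute.get();
            return resolved;
        }
    }

    public static void main(String[] args) {
        Scalar lastMatch = new ComputedScalar(() -> "abc");

        Scalar viaSet = new Scalar();
        viaSet.set(lastMatch);            // copies the empty field
        Scalar viaAdd = new Scalar();
        lastMatch.addToScalar(viaAdd);    // copies the computed value

        System.out.println("set: '" + viaSet.value + "', addToScalar: '" + viaAdd.value + "'");
    }
}
```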

Results:
- re/regexp.t: 1144 → 1746 passing tests (+602, 99% of JVM mode)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The interpreter was incorrectly using numeric bitwise operators
(BITWISE_*_BINARY) for &, |, ^ by default, when these should use
string bitwise operators (STRING_BITWISE_*) which validate that
operands don't contain code points over 0xFF.

In Perl:
- & | ^ without "use integer" → STRING bitwise (validates Unicode)
- binary& binary| binary^ with "use integer" → NUMERIC bitwise

This was causing tests to fail when eval STRING threw exceptions for
high Unicode code points in bitwise operations - the exceptions weren't
being thrown at all because the wrong operator was used.
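
The validation that the numeric path skipped can be sketched like this. It is a minimal illustration, not the runtime's STRING_BITWISE_* implementation: the class, method names and exception type are hypothetical, and only the `&` case is shown, where the result is as long as the shorter operand.

```java
final class StringBitwiseSketch {
    /** String-wise AND, character by character, after validating both operands. */
    static String stringBitwiseAnd(String a, String b) {
        requireByteRange(a);
        requireByteRange(b);
        int length = Math.min(a.length(), b.length());
        StringBuilder out = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            out.append((char) (a.charAt(i) & b.charAt(i)));
        }
        return out.toString();
    }

    /** Rejects operands containing code points above 0xFF. */
    static void requireByteRange(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                throw new IllegalArgumentException(
                        "Code point over 0xFF is not allowed in a string bitwise operation");
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(stringBitwiseAnd("AB", "a"));
        try {
            stringBitwiseAnd("\u0100", "a");
        } catch (IllegalArgumentException e) {
            // In eval STRING this is the kind of error that would end up in $@.
            System.out.println("error: " + e.getMessage());
        }
    }
}
```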

Changes:
- BytecodeCompiler.java: Split & | ^ into separate cases from binary&
  binary| binary^, mapping & | ^ to STRING_BITWISE_* opcodes

Results:
- op/bop.t: 264 → 268 passing tests in interpreter mode (+4)
- Exceptions now properly captured in $@ for eval STRING

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The recent fix for regex LIST context (commit 80fab8b) incorrectly
returned (1) for patterns with optional captures that didn't participate.
This affected both JVM compiler and interpreter modes.

## Problem

Pattern: /(a)?/  String: ""
- Expected: (undef)  [capture group exists but didn't match]
- Before fix: (1)  [incorrectly treated as "no captures"]

The bug was that the code checked result.elements.isEmpty() instead of checking
whether the pattern has zero capturing groups (captureCount == 0).

## Solution

1. Track captureCount outside the while loop for later use
2. Always add captures to result, using scalarUndef for non-participating
   groups (when Matcher.group(i) returns null)
3. Check captureCount == 0 instead of result.elements.isEmpty()

## Perl Semantics

- Non-participating captures (e.g., (a)? not matching) → undef
- Empty captures (e.g., (a*) matching zero a's) → empty string ""
- Pattern with no captures → (1) on success
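
These three rules can be sketched directly on top of `java.util.regex`. This is an illustration rather than the RuntimeRegex code: the class and method names are hypothetical, `null` stands in for undef, the string `"1"` stands in for the true value, and an empty list means the match failed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ListContextMatchSketch {
    /** Result of a non-global match in list context. */
    static List<String> matchInListContext(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> result = new ArrayList<>();
        if (!m.find()) {
            return result; // no match: empty list
        }
        // Number of capturing groups in the pattern, independent of what participated.
        int captureCount = m.groupCount();
        if (captureCount == 0) {
            result.add("1"); // no capture groups: return (1) on success
            return result;
        }
        for (int i = 1; i <= captureCount; i++) {
            // group(i) is null for a group that did not participate, "" for an empty capture.
            result.add(m.group(i));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(matchInListContext("abc", "abc"));
        System.out.println(matchInListContext("(a)?", ""));
        System.out.println(matchInListContext("(a)|(b)", "a"));
        System.out.println(matchInListContext("(a*)b", "b"));
    }
}
```

Deciding on `groupCount()` rather than on whether the result list is empty is what separates "pattern has no captures" from "captures exist but did not participate".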

## Test Results

Comprehensive test suite confirms correct behavior:
- /abc/ matching "abc" → (1) ✓
- /(a)(b)(c)/ matching "abc" → ("a","b","c") ✓
- /(a)?/ matching "" → (undef) ✓ [was (1) before fix]
- /(a)|(b)/ matching "a" → ("a",undef) ✓ [was ("a") before fix]
- /(a*)b/ matching "b" → ("") ✓

Unit tests pass. re/regexp.t maintains 1765 passing tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The escapeInvalidQuantifierBraces feature (added in commit 42bcde5)
was too aggressive and broke 7 tests, reducing passing count from
1788 to 1781 even after the {,m} and \b{}/\B{} fixes.

## Problem

The feature attempted to escape invalid quantifier braces like {(.*?)}
but had issues:

1. Edge cases with complex nested patterns
2. Interaction with other escape sequences
3. Breaking valid Perl patterns in subtle ways

## Solution

Disable the feature entirely for now. The original issue it solved
(interpolated patterns in eval STRING with unescaped braces) is rare,
and the feature needs more comprehensive testing before re-enabling.

## Test Results

Before (with feature + fixes): 1781/2210
After (feature disabled): 1788/2210 ✅

This exceeds the 1786 target and matches the pre-regression baseline.

## Future Work

To re-enable this feature safely:
- Add comprehensive test suite for edge cases
- Test interaction with all escape sequences
- Ensure no regressions in re/regexp.t
- Consider alternative approaches (runtime error handling vs preprocessing)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This test file was added in commit 42bcde5 to test the
escapeInvalidQuantifierBraces feature, which has now been disabled
due to causing test regressions.

The feature will need more comprehensive testing before being
re-enabled, at which point this test can be restored and expanded.
fglock closed this Feb 18, 2026