
Fix ExifTool compatibility and package refactoring #252

Merged
fglock merged 5 commits into master from fix-exiftool-and-package-refactor
Feb 28, 2026

@fglock fglock commented Feb 28, 2026

Summary

This branch collects fixes that improve ExifTool compatibility and address bugs found during testing:

Regex & Capture Groups

  • Store capture groups in an explicit lastCaptureGroups array instead of relying on globalMatcher.group(), which could throw IllegalStateException after certain match sequences. Captures are now explicitly managed: updated on successful capturing matches, cleared on successful non-capturing matches and on failed capturing matches, and preserved across failed non-capturing matches.
  • Dynamic scoping for regex match variables via RegexState save/restore of lastCaptureGroups
  • Two-tier regex state scoping: subroutine-level (unconditional save/restore at method boundaries) and block-level (conditional, gated by RegexUsageDetector for blocks containing regex ops)
  • Fix blockIsSubroutine annotation for named subs so block-level regex save/restore is correctly skipped for subroutine body blocks (set in createRuntimeCode())
  • Safe matcher access: use saved lastMatchedString/lastMatchStart/lastMatchEnd fields instead of globalMatcher.group(0)/start(0)/end(0) to avoid IllegalStateException after s///g
  • Materialize special vars before regex state restore: $1, $&, etc. in return values are resolved to concrete scalars before the subroutine-level restore overwrites global regex state
  • Fix regex octal escapes inside character classes
  • Fix \G regex anchor and grep scalar context in bytecode interpreter
  • Fix non-/g regex matches incorrectly using pos()
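
The "safe matcher access" item rests on a documented property of java.util.regex.Matcher: group(), start() and end() throw IllegalStateException when the previous match operation failed, which is the state a matcher is left in after a global substitution loop runs out of matches. The sketch below illustrates that property and the idea of saving the last match in plain fields. It is not PerlOnJava's code: the class SafeMatcherSketch and its methods are hypothetical, and only the field names lastMatchedString, lastMatchStart and lastMatchEnd come from the description above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: only the three field names are taken from the PR text.
class SafeMatcherSketch {
    static String lastMatchedString;
    static int lastMatchStart = -1;
    static int lastMatchEnd = -1;

    // Emulates s/pattern/replacement/g: call find() until it fails,
    // copying the details of each hit into plain fields.
    static String substituteAll(Pattern pattern, String input, String replacement) {
        Matcher m = pattern.matcher(input);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            lastMatchedString = m.group(0);
            lastMatchStart = m.start(0);
            lastMatchEnd = m.end(0);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // After the loop, the matcher's last find() has failed, so group(0) throws.
    static boolean matcherThrowsAfterLoop(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            // exhaust all matches
        }
        try {
            m.group(0);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }
}
```

Reading the saved fields never touches the matcher, so the values stay available regardless of which match operation ran last.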

eval & Package Scoping

  • Fix eval STRING corrupting outer-scope my variables during recursive calls: clean up aliases from GlobalVariable in a finally block
  • Fix eval package scoping - restore currentPackage after DynamicVariableManager push
  • Fix eval BLOCK and eval STRING in bytecode interpreter
  • Scope eval currentPackage via DynamicVariableManager to prevent leaking
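
The package-scoping items amount to a save/restore discipline: remember the current package before the eval body runs and put it back afterwards, even if the body throws. A minimal sketch of that pattern follows. It does not use the real DynamicVariableManager API; PackageScopeSketch, evalScoped and the savedPackages stack are hypothetical stand-ins chosen for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Hypothetical sketch of scoping a "current package" value around an eval body.
class PackageScopeSketch {
    static String currentPackage = "main";

    // Stack of saved values; stands in for a dynamic-variable manager.
    private static final Deque<String> savedPackages = new ArrayDeque<>();

    static <T> T evalScoped(Supplier<T> body) {
        savedPackages.push(currentPackage);
        try {
            return body.get();
        } finally {
            // Runs on normal return and on exceptions, so the package never leaks.
            currentPackage = savedPackages.pop();
        }
    }
}
```

With this shape, a body that switches to package Foo sees Foo while it runs, and the caller sees main again once evalScoped returns.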

File Operations & stat

  • Fix stat _ / lstat _ to use cached stat buffer instead of re-statting
  • Fix file test operators on filehandles and stat _ caching/parsing
  • Fix stat _ parsing edge cases
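
In Perl, stat _ reuses the result of the most recent stat or file test rather than querying the filesystem again. The sketch below shows one way such a cache could look, assuming a single shared buffer. StatCacheSketch, statUnderscore and filesystemCalls are hypothetical names; the only real API used is java.nio.file.Files.readAttributes.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical sketch of a cached stat buffer behind "stat _".
class StatCacheSketch {
    private static BasicFileAttributes lastStat;

    // Counts real filesystem queries, to make the caching visible.
    static int filesystemCalls = 0;

    // stat FILE: query the filesystem and remember the result.
    static BasicFileAttributes stat(Path path) {
        try {
            filesystemCalls++;
            lastStat = Files.readAttributes(path, BasicFileAttributes.class);
            return lastStat;
        } catch (IOException e) {
            lastStat = null; // a failed stat leaves nothing to reuse
            throw new UncheckedIOException(e);
        }
    }

    // stat _: return the remembered result without touching the filesystem.
    static BasicFileAttributes statUnderscore() {
        if (lastStat == null) {
            throw new IllegalStateException("no previous stat to reuse");
        }
        return lastStat;
    }
}
```

Calling stat once and statUnderscore any number of times afterwards leaves filesystemCalls at 1.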

Bytecode Interpreter

  • Add goto LABEL support in bytecode interpreter
  • Add labeled block support and fix split args casting
  • Fix foreach with pre-declared lexical variable
  • Fix HASH_SET and NOT opcodes for non-scalar registers
  • Fix undef $scalar, HASH_SET read-only, print return value, MY_SCALAR opcode
  • Disable tryWholeBlockRefactoring to fix return inside loops

Other Fixes

  • Fix AUTOLOAD dispatch order: search full MRO before AUTOLOAD
  • Fix prototype @ argument parsing with parenthesized expressions
  • Fix unpack float/double for non-UTF8 binary data
  • Fix Encode::is_utf8 returning inverted result
  • Fix BYTE_STRING propagation, sort SUBNAME LIST, foreach lexical restore, substr lvalue
  • Fix ScalarSpecialVariable not resolving in copy/set operations
  • Fix sysread/read type corruption on tied scalars
  • Fix scalar localtime/gmtime to use Perl ctime format
  • Fix list slice with range indices: (expr)[0..2] now works correctly
  • Fix signatures regression and error messages
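
The AUTOLOAD item describes a two-pass lookup: the requested method is searched across the whole method resolution order first, and AUTOLOAD is considered only if that search finds nothing. A minimal sketch, assuming the MRO is already linearized into a list and each class's methods are a set of names. AutoloadDispatchSketch and resolve are hypothetical and do not reflect PerlOnJava's internal data structures.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of "search full MRO before AUTOLOAD".
class AutoloadDispatchSketch {
    // Returns "Class::name" for the resolved method, or null if nothing applies.
    static String resolve(List<String> mro, Map<String, Set<String>> methods, String name) {
        // Pass 1: look for the real method in every class of the MRO.
        for (String cls : mro) {
            if (methods.getOrDefault(cls, Set.of()).contains(name)) {
                return cls + "::" + name;
            }
        }
        // Pass 2: only now fall back to the first AUTOLOAD in the MRO.
        for (String cls : mro) {
            if (methods.getOrDefault(cls, Set.of()).contains("AUTOLOAD")) {
                return cls + "::AUTOLOAD";
            }
        }
        return null;
    }
}
```

If Child defines only AUTOLOAD and its parent defines greet, resolving greet on Child yields Parent::greet, and only an unknown method name reaches Child::AUTOLOAD.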

Test plan

  • All 154 unit tests pass
  • re/subst.t: 183/281 (matches master) in both JVM and interpreter modes
  • ExifTool test suite compatibility (in progress)

Generated with Devin

fglock and others added 5 commits February 28, 2026 20:37
… globalMatcher.group()

The old code read $1/$2/etc from globalMatcher.group() which could throw
IllegalStateException after certain match sequences (caught silently in
ScalarSpecialVariable). This was fragile and made capture persistence
behavior implicit.

Now captures are explicitly stored in a String[] array on each successful
capturing match, cleared on successful non-capturing matches and failed
capturing matches, and preserved across failed non-capturing matches.
RegexState saves/restores the array for dynamic scoping.
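
The four cases in this commit message can be sketched as a small state machine over a String[] field. The code follows the rules exactly as stated above and is not taken from PerlOnJava: CaptureStateSketch, match, capture, save and restore are hypothetical names, and only lastCaptureGroups comes from the PR text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the capture-array rules described in the commit message.
class CaptureStateSketch {
    // null means $1, $2, ... are currently undefined.
    static String[] lastCaptureGroups;

    static boolean match(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        boolean capturing = m.groupCount() > 0;
        if (m.find()) {
            if (capturing) {
                // Successful capturing match: store fresh captures.
                String[] groups = new String[m.groupCount()];
                for (int i = 1; i <= m.groupCount(); i++) {
                    groups[i - 1] = m.group(i);
                }
                lastCaptureGroups = groups;
            } else {
                // Successful non-capturing match: clear.
                lastCaptureGroups = null;
            }
            return true;
        }
        if (capturing) {
            // Failed capturing match: clear.
            lastCaptureGroups = null;
        }
        // Failed non-capturing match: leave the captures untouched.
        return false;
    }

    // Reads $n without ever consulting a Matcher.
    static String capture(int n) {
        String[] groups = lastCaptureGroups;
        return (groups != null && n >= 1 && n <= groups.length) ? groups[n - 1] : null;
    }

    // Save/restore pair of the kind a RegexState-like object could use.
    static String[] save() {
        return lastCaptureGroups;
    }

    static void restore(String[] saved) {
        lastCaptureGroups = saved;
    }
}
```

Because match always assigns a new array rather than mutating the old one, save can hand out the reference without copying.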

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <noreply@cognition.ai>
… calls

In Perl 5, $1/$2/etc are dynamically scoped - regex matches inside called
functions don't affect the caller's capture variables. PerlOnJava stored
these globally, so any function doing a regex match would clobber the
caller's $1.

Fix: save/restore RegexState around every subroutine call in both JVM
compiled (RuntimeCode.apply) and interpreted (InterpretedCode.apply)
code paths. Before restoring, materialize any ScalarSpecialVariable
values in the return list so that "return $1" works correctly.

This fixes ExifTool test 2 where GetTagTable's internal regex clobbered
$1 before HandleTag could read it.
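
The ordering matters here: a returned $1 has to be turned into a concrete value while the callee's regex state is still in place, and only then may the caller's state be restored. The sketch below shows that ordering with a try/finally wrapper. It is a simplified stand-in, not the code in RuntimeCode.apply: DynamicRegexScopeSketch, SpecialVar and callSub are hypothetical.

```java
import java.util.function.Supplier;

// Hypothetical sketch of save, materialize, restore around a subroutine call.
class DynamicRegexScopeSketch {
    static String[] lastCaptureGroups;

    // Lazy stand-in for $1, $2, ...: reads the global state only when resolved.
    static final class SpecialVar {
        final int group;

        SpecialVar(int group) {
            this.group = group;
        }

        String resolve() {
            String[] groups = lastCaptureGroups;
            return (groups != null && group >= 1 && group <= groups.length)
                    ? groups[group - 1]
                    : null;
        }
    }

    static Object callSub(Supplier<Object> body) {
        String[] saved = lastCaptureGroups; // save the caller's captures
        try {
            Object result = body.get();
            if (result instanceof SpecialVar) {
                // Materialize while the callee's regex state is still current.
                result = ((SpecialVar) result).resolve();
            }
            return result;
        } finally {
            lastCaptureGroups = saved; // restore the caller's captures
        }
    }
}
```

The return expression is evaluated before the finally block runs, so the caller receives the callee's capture as a plain string and still sees its own $1 afterwards.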

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <noreply@cognition.ai>
In Perl 5, all blocks ({ }, if, while, for, do) scope $1/$2/etc.
Previously PerlOnJava only scoped regex state at subroutine call
boundaries. This adds save/restore at block level using a
RegexUsageDetector visitor to only emit the overhead for blocks
that actually contain regex operations.

Changes:
- New RegexUsageDetector: iterative AST walker that detects
  matchRegex, replaceRegex, =~, !~, split (stops at sub boundaries)
- New SAVE_REGEX_STATE/RESTORE_REGEX_STATE bytecode opcodes
- BytecodeCompiler.visit(BlockNode): emit save/restore when needed
- EmitBlock.emitBlock(): JVM path save/restore for block bodies
- EmitForeach: save/restore for inlined foreach loop bodies
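
An iterative walker of the kind described for RegexUsageDetector can be sketched with an explicit stack instead of recursion. The Node type below is a hypothetical stand-in for the real AST classes, and the node kind "sub" is an assumed marker for a subroutine boundary. The list of regex-related kinds is the one given in the commit message.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of an iterative regex-usage detector over a toy AST.
class RegexUsageDetectorSketch {
    static final class Node {
        final String kind;
        final List<Node> children = new ArrayList<>();

        Node(String kind, Node... kids) {
            this.kind = kind;
            for (Node kid : kids) {
                children.add(kid);
            }
        }
    }

    // Node kinds treated as regex operations, per the commit message.
    static final Set<String> REGEX_KINDS =
            Set.of("matchRegex", "replaceRegex", "=~", "!~", "split");

    static boolean containsRegex(Node root) {
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node node = stack.pop();
            if (REGEX_KINDS.contains(node.kind)) {
                return true;
            }
            if (node.kind.equals("sub")) {
                continue; // stop at sub boundaries: subs scope regex state themselves
            }
            for (Node child : node.children) {
                stack.push(child);
            }
        }
        return false;
    }
}
```

A block whose only regex sits inside a nested sub reports false, so no block-level save/restore would be emitted for it.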

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <noreply@cognition.ai>
Instead of saving/restoring RegexState in RuntimeCode.apply() and
InterpretedCode.apply() (runtime wrapper), move it into the generated
subroutine code itself, matching how local variable teardown works:

JVM path: save at method entry, materialize+restore at returnLabel
  (same join point where Local.localTeardown runs)
Bytecode path: save at execute() entry, restore in finally block;
  materialize at RETURN opcode before the return

This removes the try/finally overhead from apply() and unifies
regex scoping with the existing local-variable cleanup mechanism.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <noreply@cognition.ai>
…tcher access

- Set blockIsSubroutine annotation in createRuntimeCode() so named
  subroutine blocks skip redundant block-level regex save/restore
- Use lastMatchedString in captureString(0) instead of globalMatcher.group(0)
  to avoid IllegalStateException after s///g
- Use saved lastMatchStart/lastMatchEnd for matcherStart/matcherEnd group 0
- Add try-catch for matcherStart/matcherEnd on non-zero groups
- Remove all debug logging (RegexState id/label, EmitBlock/EmitterMethodCreator/
  RuntimeCode System.err.println)

re/subst.t: 183/281 (matches master) in both JVM and interpreter modes.
All 154 unit tests pass.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <noreply@cognition.ai>
@fglock fglock merged commit baa797d into master Feb 28, 2026
2 checks passed
@fglock fglock deleted the fix-exiftool-and-package-refactor branch February 28, 2026 21:52