Fix ExifTool compatibility and package refactoring #252
Merged
… globalMatcher.group()
The old code read $1/$2/etc. from globalMatcher.group(), which could throw IllegalStateException after certain match sequences (caught silently in ScalarSpecialVariable). This was fragile and made capture persistence behavior implicit.
Now captures are explicitly stored in a String[] array on each successful capturing match, cleared on successful non-capturing matches and failed capturing matches, and preserved across failed non-capturing matches. RegexState saves/restores the array for dynamic scoping.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <noreply@cognition.ai>
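The capture-persistence rules in this commit message can be sketched in plain Java. This is an illustration only, not the PerlOnJava code: the class `CaptureStoreSketch` and its `match`/`capture` methods are hypothetical names, and only the `String[]` storage and the four update rules are taken from the commit message.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; names are not taken from the PerlOnJava sources.
final class CaptureStoreSketch {
    // Stand-in for the explicit capture array; null means "no captures available".
    private String[] lastCaptureGroups;

    boolean match(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        boolean found = m.find();
        boolean capturing = m.groupCount() > 0; // groupCount() depends only on the pattern

        if (found && capturing) {
            // Successful capturing match: copy the groups out of the Matcher.
            String[] groups = new String[m.groupCount()];
            for (int i = 0; i < groups.length; i++) {
                groups[i] = m.group(i + 1);
            }
            lastCaptureGroups = groups;
        } else if (found || capturing) {
            // Successful non-capturing match, or failed capturing match: clear.
            lastCaptureGroups = null;
        }
        // Failed non-capturing match: leave the previous captures untouched.
        return found;
    }

    // Returns capture n (1-based) or null; never touches a Matcher, so it cannot throw.
    String capture(int n) {
        if (lastCaptureGroups == null || n < 1 || n > lastCaptureGroups.length) {
            return null;
        }
        return lastCaptureGroups[n - 1];
    }

    public static void main(String[] args) {
        CaptureStoreSketch store = new CaptureStoreSketch();
        store.match("(\\w+)@(\\w+)", "user@host");
        System.out.println(store.capture(1) + " / " + store.capture(2));
    }
}
```

Because reads go through the copied array, they no longer depend on the state of the last `java.util.regex.Matcher`, whose `group()` throws `IllegalStateException` when the previous match attempt failed.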
… calls
In Perl 5, $1/$2/etc. are dynamically scoped - regex matches inside called functions don't affect the caller's capture variables. PerlOnJava stored these globally, so any function doing a regex match would clobber the caller's $1.
Fix: save/restore RegexState around every subroutine call in both JVM compiled (RuntimeCode.apply) and interpreted (InterpretedCode.apply) code paths. Before restoring, materialize any ScalarSpecialVariable values in the return list so that "return $1" works correctly.
This fixes ExifTool test 2 where GetTagTable's internal regex clobbered $1 before HandleTag could read it.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <noreply@cognition.ai>
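A minimal Java sketch of the save/materialize/restore ordering described in this commit follows. It is not the PerlOnJava implementation: `DynamicScopeSketch`, `captures`, `captureRef`, and `callSub` are hypothetical names, and a `Supplier<String>` stands in for a lazily resolved special variable such as `$1`.

```java
import java.util.function.Supplier;

// Hypothetical sketch; names are not taken from the PerlOnJava sources.
final class DynamicScopeSketch {
    // Stand-in for the global regex capture state. Arrays are replaced, never mutated.
    static String[] captures = new String[0];

    // Lazy view of $n: resolves against the global state at read time.
    static Supplier<String> captureRef(int n) {
        return () -> (n >= 1 && n <= captures.length) ? captures[n - 1] : null;
    }

    // Runs a "subroutine" body with the caller's capture state protected.
    static String callSub(Supplier<Supplier<String>> body) {
        String[] saved = captures;             // save the caller's state
        try {
            Supplier<String> result = body.get();
            return result.get();               // materialize BEFORE the restore below
        } finally {
            captures = saved;                  // restore the caller's state
        }
    }

    public static void main(String[] args) {
        captures = new String[] {"outer"};
        String returned = callSub(() -> {
            captures = new String[] {"inner"}; // a match inside the callee
            return captureRef(1);              // like "return $1"
        });
        System.out.println(returned + " / " + captures[0]);
    }
}
```

The return expression is evaluated before the `finally` block runs, so the callee's `$1` is read while the callee's state is still in place; resolving the lazy value after the restore would yield the caller's capture instead.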
In Perl 5, all blocks ({ }, if, while, for, do) scope $1/$2/etc.
Previously PerlOnJava only scoped regex state at subroutine call
boundaries. This adds save/restore at block level using a
RegexUsageDetector visitor to only emit the overhead for blocks
that actually contain regex operations.
Changes:
- New RegexUsageDetector: iterative AST walker that detects
matchRegex, replaceRegex, =~, !~, split (stops at sub boundaries)
- New SAVE_REGEX_STATE/RESTORE_REGEX_STATE bytecode opcodes
- BytecodeCompiler.visit(BlockNode): emit save/restore when needed
- EmitBlock.emitBlock(): JVM path save/restore for block bodies
- EmitForeach: save/restore for inlined foreach loop bodies
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <noreply@cognition.ai>
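The detector idea in the commit message above can be sketched as an iterative walk with an explicit stack. This is a simplified illustration, assuming a hypothetical `Node` type with a string `kind` and a child list; the real AST classes and the real `RegexUsageDetector` are not shown in this PR text and may differ.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; Node and containsRegex are illustrative names only.
final class RegexUsageSketch {
    static final class Node {
        final String kind;
        final List<Node> children;

        Node(String kind, Node... children) {
            this.kind = kind;
            this.children = Arrays.asList(children);
        }
    }

    // Operator names as listed in the commit message.
    private static final Set<String> REGEX_OPS = new HashSet<>(
            Arrays.asList("matchRegex", "replaceRegex", "=~", "!~", "split"));

    // Iterative (non-recursive) walk: returns true if the block contains a regex op.
    static boolean containsRegex(Node root) {
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node node = stack.pop();
            if (REGEX_OPS.contains(node.kind)) {
                return true;
            }
            if (node.kind.equals("sub")) {
                continue; // stop at sub boundaries: do not descend into the body
            }
            for (Node child : node.children) {
                stack.push(child);
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Node block = new Node("block", new Node("if", new Node("=~")));
        System.out.println(containsRegex(block));
    }
}
```

A compiler pass could call such a check once per block and emit the save/restore pair only when it returns true, which keeps regex-free blocks free of the extra work.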
Instead of saving/restoring RegexState in RuntimeCode.apply() and InterpretedCode.apply() (runtime wrapper), move it into the generated subroutine code itself, matching how local variable teardown works:
- JVM path: save at method entry, materialize+restore at returnLabel (same join point where Local.localTeardown runs)
- Bytecode path: save at execute() entry, restore in finally block; materialize at RETURN opcode before the return
This removes the try/finally overhead from apply() and unifies regex scoping with the existing local-variable cleanup mechanism.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <noreply@cognition.ai>
…tcher access
- Set blockIsSubroutine annotation in createRuntimeCode() so named subroutine blocks skip redundant block-level regex save/restore
- Use lastMatchedString in captureString(0) instead of globalMatcher.group(0) to avoid IllegalStateException after s///g
- Use saved lastMatchStart/lastMatchEnd for matcherStart/matcherEnd group 0
- Add try-catch for matcherStart/matcherEnd on non-zero groups
- Remove all debug logging (RegexState id/label, EmitBlock/EmitterMethodCreator/RuntimeCode System.err.println)
re/subst.t: 183/281 (matches master) in both JVM and interpreter modes. All 154 unit tests pass.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <noreply@cognition.ai>
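The `IllegalStateException` mentioned in this commit follows from how `java.util.regex.Matcher` behaves: once a `find()` loop is exhausted, `group(0)`, `start()`, and `end()` throw. The sketch below shows the general technique of copying the last successful match into fields during the loop. The class and method names are hypothetical; only the field names mirror the ones in the commit message.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not the PerlOnJava substitution code.
final class LastMatchSketch {
    String lastMatchedString;
    int lastMatchStart = -1;
    int lastMatchEnd = -1;

    // Global literal replacement that records the last successful match as it goes.
    String replaceAllTracking(String regex, String input, String replacement) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (m.find()) {
            // Save group 0 now; the Matcher cannot be queried after the loop ends.
            lastMatchedString = m.group(0);
            lastMatchStart = m.start();
            lastMatchEnd = m.end();
            out.append(input, pos, m.start()).append(replacement);
            pos = m.end();
        }
        out.append(input, pos, input.length());
        return out.toString();
    }

    // Demonstrates the failure mode: group(0) after an exhausted find() loop.
    static boolean groupThrowsAfterExhaustion(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            // consume all matches
        }
        try {
            m.group(0);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        LastMatchSketch sketch = new LastMatchSketch();
        System.out.println(sketch.replaceAllTracking("a", "banana", "o"));
        System.out.println(sketch.lastMatchedString + " @ " + sketch.lastMatchStart);
    }
}
```

Reading the saved fields after the loop is safe, whereas a value derived from the live `Matcher` at that point would need a try-catch.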
Summary
This branch contains a collection of fixes aimed at improving ExifTool compatibility and fixing various bugs discovered during testing:
Regex & Capture Groups
- … `lastCaptureGroups` array instead of relying on `globalMatcher.group()`, which could throw `IllegalStateException` after certain match sequences. Captures are now explicitly managed: updated on successful capturing matches, cleared on non-capturing or failed capturing matches, preserved across failed non-capturing matches.
- … `RegexState` save/restore of `lastCaptureGroups`
- … `RegexUsageDetector` for blocks containing regex ops)
- … `blockIsSubroutine` annotation for named subs so block-level regex save/restore is correctly skipped for subroutine body blocks (set in `createRuntimeCode()`)
- … `lastMatchedString`/`lastMatchStart`/`lastMatchEnd` fields instead of `globalMatcher.group(0)`/`start(0)`/`end(0)` to avoid `IllegalStateException` after `s///g`
- … `$1`, `$&`, etc. in return values are resolved to concrete scalars before the subroutine-level restore overwrites global regex state
- … `\G` regex anchor and grep scalar context in bytecode interpreter
- … `/g` regex matches incorrectly using `pos()`
eval & Package Scoping
- … `my` variables during recursive calls - cleanup aliases from `GlobalVariable` in `finally` block
- … `currentPackage` after `DynamicVariableManager` push
- … `DynamicVariableManager` to prevent leaking
File Operations & stat
- … `stat _`/`lstat _` to use cached stat buffer instead of re-statting
- … `_` caching/parsing
- … `stat _` parsing edge cases
Bytecode Interpreter
- … `goto LABEL` support in bytecode interpreter
- … `foreach` with pre-declared lexical variable
- … `HASH_SET` and `NOT` opcodes for non-scalar registers
- … `undef $scalar`, `HASH_SET` read-only, print return value, `MY_SCALAR` opcode
- … `tryWholeBlockRefactoring` to fix return inside loops
Other Fixes
- … `@` argument parsing with parenthesized expressions
- … `Encode::is_utf8` returning inverted result
- … `BYTE_STRING` propagation, sort SUBNAME LIST, foreach lexical restore, substr lvalue
- … `ScalarSpecialVariable` not resolving in copy/set operations
- … `sysread`/`read` type corruption on tied scalars
- … `localtime`/`gmtime` to use Perl ctime format
- … `(expr)[0..2]` now works correctly
Test plan
- `re/subst.t`: 183/281 (matches master) in both JVM and interpreter modes
Generated with Devin