fix: Hash::MultiValue + blib/arch - splice spill, deref slice, dclone hooks #478
Merged
…ne hooks

Three bugs fixed:

1. JVM backend: splice with constant sub causes ASM frame crash
   handleSpliceBuiltin left the first arg on the JVM operand stack while evaluating remaining args. When those contained a function call (constant sub), the blockDispatcher's GOTOs created inconsistent stack depths at merge points. Fixed with register spilling, matching handlePushOperator's existing pattern.

2. Interpreter backend: @$ref[@idx] = ... unsupported
   The array slice assignment handler only supported @array[@idx] with plain IdentifierNode. Added a branch to handle dereferenced array refs using DEREF_ARRAY/DEREF_ARRAY_NONSTRICT opcodes.

3. Storable::dclone: shared refs from STORABLE_freeze hooks
   dclone passed extra refs from STORABLE_freeze directly to STORABLE_thaw without cloning them. Inside-out objects like Hash::MultiValue ended up sharing internal arrays between original and clone. Fixed by deep-cloning extra refs (indices 1+).

Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
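The dclone fix in item 3 can be illustrated with a minimal Java sketch. It is not the code in Storable.java: `FreezeResultCloner`, `deepCopy`, and `prepareThawArgs` are hypothetical names, plain `List`/`Map` values stand in for Perl array and hash refs, and cyclic structures are not handled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names and data model are assumptions, not the project's API.
final class FreezeResultCloner {
    // Recursively copy nested lists and maps; any other value is treated as an
    // immutable scalar and returned as-is. Cycles are not handled here.
    static Object deepCopy(Object value) {
        if (value instanceof List<?>) {
            List<Object> copy = new ArrayList<>();
            for (Object item : (List<?>) value) {
                copy.add(deepCopy(item));
            }
            return copy;
        }
        if (value instanceof Map<?, ?>) {
            Map<Object, Object> copy = new LinkedHashMap<>();
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
                copy.put(entry.getKey(), deepCopy(entry.getValue()));
            }
            return copy;
        }
        return value;
    }

    // Index 0 holds the serialized string returned by the freeze hook;
    // indices 1+ are the extra refs, which are deep-copied before thawing.
    static List<Object> prepareThawArgs(List<Object> frozen) {
        List<Object> args = new ArrayList<>(frozen.size());
        for (int i = 0; i < frozen.size(); i++) {
            args.add(i == 0 ? frozen.get(i) : deepCopy(frozen.get(i)));
        }
        return args;
    }
}
```

With this shape, mutating a list handed to the thaw hook leaves the original object's list untouched, which is the property the shared-array bug violated.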
The generated Makefile pure_all target only created blib/lib/ but not blib/arch/. The blib.pm pragma requires both directories to exist (-d blib && -d blib/lib && -d blib/arch), so use blib and -Mblib would die with Cannot find blib even in ... This caused CPAN module test suites (e.g. HTTP::Thin) to fail when they use open3 with -Mblib to verify modules load cleanly. Add mkdir -p blib/arch to the pure_all target so the directory always exists alongside blib/lib. Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
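The directory test quoted above (-d blib && -d blib/lib && -d blib/arch) can be mirrored in Java as a rough sketch. `BlibCheck` and `hasBlib` are hypothetical names used only for illustration; the real check lives in the Perl blib.pm pragma.

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper mirroring the three -d tests quoted from blib.pm.
final class BlibCheck {
    // True only when blib/, blib/lib/ and blib/arch/ all exist as directories
    // under the given base directory.
    static boolean hasBlib(Path base) {
        Path blib = base.resolve("blib");
        return Files.isDirectory(blib)
            && Files.isDirectory(blib.resolve("lib"))
            && Files.isDirectory(blib.resolve("arch"));
    }
}
```

A build tree that contains only blib/lib fails this test, which is why creating blib/arch in the pure_all target is enough to make -Mblib succeed.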
…tures
The regex error catch block was downgrading ALL exceptions to warnings
when JPERL_UNIMPLEMENTED=warn was set, including real compilation errors
like "Invalid Unicode character name". Now only PerlJavaUnimplementedException
is downgraded to a warning; other regex errors remain fatal.
This fixes a hang in lib/croak.t where `qr/(?{})\N{}/;while(my($0)=0){}`
would continue past the \N{} error into an infinite while loop when
JPERL_UNIMPLEMENTED=warn was set by the test runner.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
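The catch-block behavior described in this commit can be sketched as follows. This is an illustration under assumptions, not the code in RuntimeRegex.java: `CompilerError` and `UnimplementedError` are hypothetical stand-ins for the project's exception types, and `RegexCompileGuard.compile` is an invented helper.

```java
// Hypothetical stand-in for a general compile-time error.
class CompilerError extends RuntimeException {
    CompilerError(String message) {
        super(message);
    }
}

// Hypothetical stand-in for "feature not implemented", a subclass of the above.
class UnimplementedError extends CompilerError {
    UnimplementedError(String message) {
        super(message);
    }
}

final class RegexCompileGuard {
    // Runs the compile step. Returns null on success, or a warning string when
    // an unimplemented-feature error was downgraded. Any other CompilerError
    // is not caught here and therefore stays fatal.
    static String compile(Runnable compileStep, boolean warnOnUnimplemented) {
        try {
            compileStep.run();
            return null;
        } catch (UnimplementedError e) {
            if (warnOnUnimplemented) {
                return "warning: " + e.getMessage();
            }
            throw e;
        }
    }
}
```

Catching only the subclass is the design point: a broader catch would also swallow genuine syntax errors, which is the behavior this commit removes.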
Perl requires that forced-global variables ($_, @_, %_, $0, $1, $!, $/, $^W, etc.) cannot be lexicalized with 'my' or 'state'. Previously jperl silently allowed this, which could cause infinite loops when my($0) returned truthy in a while condition. Now emits: Can't use global $X in "my" (matching Perl's error message).

The check covers:
- Underscore variables: $_, @_, %_
- Digit-only names: $0, $1, $2, ...
- Single punctuation names: $!, $/, $;, $,, $., $|, etc.
- Caret/control variables: $^W, $^H, $^O, etc.

'our' and 'local' are unaffected (they correctly alias/localize globals).

Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
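A name classifier along the lines of this check might look like the sketch below. It is a guess at the shape of the logic, not the actual isGlobalOnlyVariable() in OperatorParser.java: `GlobalOnlyNames.isGlobalOnly` is a hypothetical name, it takes the variable name without its sigil, and limiting the punctuation rule to ASCII is a choice made for this sketch.

```java
// Hypothetical sketch; the input is the name without its sigil,
// e.g. "_" for $_, "0" for $0, "^W" for $^W.
final class GlobalOnlyNames {
    static boolean isGlobalOnly(String name) {
        if (name.isEmpty()) {
            return false;
        }
        // Underscore variables: $_, @_, %_
        if (name.equals("_")) {
            return true;
        }
        // Digit-only names: $0, $1, $2, ...
        if (name.chars().allMatch(c -> c >= '0' && c <= '9')) {
            return true;
        }
        // Caret/control variables: $^W, $^H, $^O, ...
        if (name.length() > 1 && name.charAt(0) == '^') {
            return true;
        }
        // Single ASCII punctuation names: $!, $/, $;, ...
        if (name.length() == 1) {
            char c = name.charAt(0);
            return c < 128 && !Character.isLetterOrDigit(c);
        }
        return false;
    }
}
```

Ordinary names such as `a`, `ENV`, or `_foo` fall through to `false`, so `my $a` and `my %ENV` stay legal.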
- RegexPreprocessor: use regexUnimplemented() (not regexError()) for
control verbs, (?@...), and lookbehind >255 so JPERL_UNIMPLEMENTED=warn
can downgrade them (fixes pat_rt_report.t: 196 -> 2397)
- RuntimeRegex: restructure catch block to distinguish
PerlJavaUnimplementedException (extends PerlCompilerException) from
real PerlCompilerException syntax errors. Wrap Java
PatternSyntaxException as unimplemented so it can be downgraded.
(fixes pat_advanced.t: 54 -> 63)
- RegexPreprocessor.handleCodeBlock: don't consume closing paren - let
handleParentheses consume it, matching the protocol used by all other
group handlers. Fixes code blocks causing "Unmatched (" errors.
(fixes pat.t: 239 -> 428)
- OperatorParser.isGlobalOnlyVariable: restrict single-char punctuation
check to ASCII (< 128) so Unicode characters that Java doesn't
recognize as letters aren't rejected.
(fixes uni/variables.t: 66803 -> 66880)
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
- Fix handleQuantifier consuming regex metacharacters inside invalid
brace expressions (e.g., {(?>...)*} was consumed as a single literal
brace expression). Now only escapes the opening { and lets the main
loop process subsequent characters.
- Fix \x{...} hex escapes with non-hex characters to extract valid
hex prefix like Perl does (e.g., \x{9bq} -> 0x9B). Fixes fatal
crash in pat_advanced.t at line 321.
- Handle bare \xNN with non-hex chars (e.g., \xk -> \x00 + literal k)
instead of passing through to Java Pattern which rejects it.
- Fix NullPointerException when regex with (?{...}) code blocks fails
with JPERL_UNIMPLEMENTED=warn: set regex.patternString in catch
block and add null guard in preProcessRegex.
Test improvements:
pat_advanced.t: 63 -> 731 passing (+668)
pat.t: 428 -> 533 passing (+105)
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
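The hex-prefix handling described in this commit can be sketched in a few lines. `HexEscapes`, `bracedValue`, and `bareValue` are hypothetical names, not functions from RegexPreprocessor.java, and the sketch ignores overflow for very long digit runs.

```java
// Hypothetical sketch of lenient hex parsing: take the longest valid hex
// prefix instead of failing on the first non-hex character.
final class HexEscapes {
    // Returns 0-15 for an ASCII hex digit, or -1 for anything else.
    private static int hexDigit(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    // Value of the leading run of hex digits, reading at most maxDigits of them.
    // Yields 0 when the text does not start with a hex digit.
    private static long prefixValue(String text, int maxDigits) {
        long value = 0;
        int limit = Math.min(text.length(), maxDigits);
        for (int i = 0; i < limit; i++) {
            int digit = hexDigit(text.charAt(i));
            if (digit < 0) {
                break;
            }
            value = value * 16 + digit;
        }
        return value;
    }

    // For \x{...}: the argument is the text between the braces.
    static long bracedValue(String insideBraces) {
        return prefixValue(insideBraces, insideBraces.length());
    }

    // For bare \xNN: the argument is the text after "\x"; at most 2 digits count.
    static long bareValue(String afterEscape) {
        return prefixValue(afterEscape, 2);
    }
}
```

Under these assumptions `bracedValue("9bq")` evaluates to 0x9B and `bareValue("k")` to 0, matching the two examples given in the commit message.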
Analyzed pat.t (99 failures + 666 blocked), pat_advanced.t (107 failures),
and pat_rt_report.t (77 failures) into 16 categories (A-P) with difficulty
ratings and priority recommendations.
Key findings:
- \p{isAlpha} alias crash blocks 666 pat.t tests (quick fix)
- Bug 41010 conditional+$ anchor accounts for 48 pat_rt_report.t failures
- $^N not updated = 20 pat_advanced.t failures
- \N{name} charnames = 25 pat_advanced.t failures
- (?{...}) code blocks = 46 failures (very hard, engine limitation)
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
…p names
Three regex fixes that unblock 543 additional pat.t tests:
1. \p{isAlpha} POSIX-style aliases: make Is/is prefix stripping
case-insensitive, add Space/Alnum/Punct/White_Space aliases
to the switch statement
2. \p{Property=Value} syntax: split on '=' and pass property name
and value separately to ICU4J. Handle True/False/Yes/No values.
3. Named capture groups with underscores: Java regex only allows
[a-zA-Z][a-zA-Z0-9]* for group names but Perl allows \w+.
Encode underscores as "U95" in Java regex names, decode back
when accessing %+/%- hashes. Also handle \k<name> backrefs.
Test results:
- pat.t: 533/632 -> 1076/1298 (all 1298 now run, +543 passing)
- pat_advanced.t: 731/838 (unchanged)
- pat_rt_report.t: 2431/2508 (unchanged)
- uni/variables.t: 66880/66880 (unchanged)
- make: all unit tests pass
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
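The underscore encoding from item 3 can be sketched as a pair of string transforms. `GroupNameCodec` and its methods are hypothetical names, not the project's implementation. One limitation of this naive version is worth stating: a Perl group name that already contains the literal text `U95` would not round-trip.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of the "_" <-> "U95" mapping for Java group names.
final class GroupNameCodec {
    // Perl-side name -> name that java.util.regex accepts.
    static String encode(String perlName) {
        return perlName.replace("_", "U95");
    }

    // Java-side name -> Perl-side name, e.g. when filling %+ and %-.
    static String decode(String javaName) {
        return javaName.replace("U95", "_");
    }

    // Builds a named group using the encoded name; body is a regex fragment.
    static Pattern namedGroup(String perlName, String body) {
        return Pattern.compile("(?<" + encode(perlName) + ">" + body + ")");
    }
}
```

A caller would compile with `namedGroup("foo_bar", "abc")`, read the match with `matcher.group(GroupNameCodec.encode("foo_bar"))`, and expose it to Perl code under the decoded name.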
Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
…sequence handling

Major fixes:
- Refactor user-defined property resolution to use UnicodeSet directly instead of Java regex patterns, fixing properties that use +utf8:: references (e.g., +utf8::Uppercase, &utf8::ASCII)
- Cache user-defined property sub results (matching Perl behavior of calling each property sub only once)
- Fix regex cache preventing deferred property recompilation by evicting stale entries in ensureCompiledForRuntime()
- Add Titlecase/TitlecaseLetter/Lt property aliases
- Make (?&name) named group recursion downgradable with JPERL_UNIMPLEMENTED=warn
- Make (?digit) numbered recursion downgradable (regexError -> regexUnimplemented)

Test improvements:
- pat_advanced.t: 731/838 -> 1308/1625 (+577 passes, +787 more tests run)
- regexp_unicode_prop.t: 1000/1110 -> 1017/1096 (+17 passes, above baseline)
- pat.t: 1076/1298 -> 1077/1298 (stable, +1 pass)

Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
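The "call each property sub only once" behavior can be sketched with a small memoizing wrapper. `PropertyCache` is a hypothetical class, the property sub is modeled as a plain `Function<String, String>`, and the real code may key and store results differently.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: memoize the result of a user-defined property sub.
final class PropertyCache {
    private final Map<String, String> results = new HashMap<>();
    private final Function<String, String> propertySub;

    PropertyCache(Function<String, String> propertySub) {
        this.propertySub = propertySub;
    }

    // Invokes the sub on first use of a name and reuses the stored result
    // afterwards. A null result is not stored, so it would be recomputed.
    String resolve(String propertyName) {
        return results.computeIfAbsent(propertyName, propertySub);
    }
}
```

Resolving the same property name twice then runs the underlying sub once.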
- Add categories Q (package-scoped user properties), R (invalid \pX), S (/i caseless flag for user property subs)
- Update test pass rates: pat_advanced 1308/1625, regexp_unicode_prop 1017/1096
- Update early termination table with new crash points
- Document fixes 8-13 in progress tracking
- Update priority recommendations

Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
3d2b08e to
c6ee040
- Relax user-defined property regex patterns to accept any character after Is/In prefix (e.g., Is_q, Is_foo), matching Perl behavior where ANY Is/In-prefixed name triggers user-defined property lookup
- Clamp code points > U+10FFFF to JVM limit instead of throwing fatal errors (Perl supports 31-bit code points, JVM does not)
- Fixes pat_advanced.t crash at test 1625 (Is_q) and 1639 (Is_31_Bit_Super)
- pat_advanced.t: 1324/1678 (was 1308/1625, +16 passed, all tests reached)

Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
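The clamping step amounts to capping the value at `Character.MAX_CODE_POINT` (0x10FFFF). A minimal sketch follows; `CodePoints.clamp` is a hypothetical helper and it assumes non-negative input.

```java
// Hypothetical sketch: cap Perl-style code points at the JVM maximum.
final class CodePoints {
    // Values above U+10FFFF are reduced to U+10FFFF; others pass through.
    // Negative input is not expected and is not handled specially.
    static int clamp(long codePoint) {
        return (int) Math.min(codePoint, (long) Character.MAX_CODE_POINT);
    }
}
```

A 31-bit value such as 0x7FFFFFFF comes back as 0x10FFFF, which `Character.isValidCodePoint` accepts, so later string-building code does not throw.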
Summary

Fixes all ./jcpan -t Hash::MultiValue test failures (was 7/10 programs passing, now 10/10), the ./jcpan -t HTTP::Thin blib/arch issue, regex error-handling bugs under JPERL_UNIMPLEMENTED=warn, regex preprocessing issues, and a missing compile-time check for global-only variables in my/state declarations.

Bug 1: JVM backend - splice with constant sub causes ASM frame crash
- handleSpliceBuiltin left the first arg on the JVM operand stack while evaluating remaining args
- When those contained a function call (e.g., the constant sub C from use constant), the blockDispatcher GOTOs created inconsistent stack depths at merge points, crashing ASM Frame.merge
- Fixed with register spilling in EmitOperator.java, matching the existing handlePushOperator pattern

Bug 2: Interpreter backend - @$ref[@idx] = ... unsupported
- The array slice assignment handler in CompileAssignment.java only supported @array[@idx] with plain IdentifierNode
- Added a branch to handle dereferenced array refs using DEREF_ARRAY/DEREF_ARRAY_NONSTRICT opcodes

Bug 3: Storable::dclone - shared refs from STORABLE_freeze hooks
- dclone passed extra refs from STORABLE_freeze directly to STORABLE_thaw without cloning them
- Inside-out objects like Hash::MultiValue ended up sharing internal arrays between original and clone
- Fixed by deep-cloning extra refs (indices 1+) before they reach STORABLE_thaw in Storable.java

Bug 4: blib/arch missing in generated Makefile
- ExtUtils::MakeMaker generated a Makefile whose pure_all target never created blib/arch/
- blib.pm requires it and dies if missing
- Added @mkdir -p blib/arch to the pure_all target in ExtUtils/MakeMaker.pm

Bug 5: JPERL_UNIMPLEMENTED=warn downgrading real regex errors
- The regex catch block caused all exceptions, including real compilation errors (e.g., an invalid \N{}), to become warnings, allowing execution to continue into infinite loops
- Now only downgrades PerlJavaUnimplementedException, not all exceptions, in RuntimeRegex.java
- Restructured the catch block to distinguish PerlJavaUnimplementedException (extends PerlCompilerException) from real syntax errors, and wrap Java PatternSyntaxException as downgradable

Bug 6: my $0/my @_/my %_ not rejected at compile time
- Perl requires that forced-global variables cannot be lexicalized with my or state
- Added isGlobalOnlyVariable() check in OperatorParser.addVariableToScope() that emits Can't use global $X in "my" (matching Perl's error message)

Bug 7: Regex preprocessor - unimplemented features using wrong exception type
- regexError() (throws PerlCompilerException) was used for unimplemented features like control verbs, (?@...), and lookbehind >255
- These should use regexUnimplemented() (throws PerlJavaUnimplementedException) so they can be downgraded with JPERL_UNIMPLEMENTED=warn
- Switched to regexUnimplemented() in RegexPreprocessor.java

Bug 8: handleCodeBlock consuming closing paren meant for handleParentheses
- handleCodeBlock consumed both } and ) of (?{...}), but handleParentheses expects to consume the closing ) itself
- This caused "Unmatched (" errors for (?{...}) code blocks in single-quoted regexes
- handleCodeBlock now returns offset pointing TO ), letting handleParentheses consume it

Bug 9: handleQuantifier consuming regex metacharacters in brace expressions
- handleQuantifier() used indexOf('}') to find the closing brace, crossing character class and group boundaries
- Patterns like { (?> [^{}]+ | (??{...}) )* } had the (?>... consumed as literal text
- Now only escapes the opening { and returns immediately

Bug 10: \x{...} hex escape with non-hex characters crashes
- Integer.parseInt(hexStr, 16) threw fatal NumberFormatException for \x{9bq}
- Now extracts the valid hex prefix like Perl does (e.g., \x{9bq} -> 0x9B), and handles bare \xNN with non-hex chars by parsing up to 2 hex digits

Bug 11: NullPointerException when regex fails with JPERL_UNIMPLEMENTED=warn
- regex.patternString was never set in the catch block, causing NPE downstream
- Set regex.patternString in catch block, add null guard in preProcessRegex()

Bug 12: \p{isAlpha} POSIX-style Unicode property aliases not recognized
- \p{isAlpha}, \p{isSpace} etc. crashed because the Is prefix stripping was case-sensitive
- Case-insensitive Is/is prefix handling, add Space, Alnum, Punct, White_Space aliases
- This was the #1 crash blocker for pat.t; the fix unblocked 666 previously unreachable tests

Bug 13: \p{ASCII_Hex_Digit=True} Property=Value syntax not supported
- \p{ASCII_Hex_Digit=True} was passed as a single string to ICU4J
- Split on = and pass property/value separately. Handle True/False/Yes/No values.

Bug 14: Named capture groups with underscores crash Java regex
- Java regex only allows [a-zA-Z][a-zA-Z0-9]* for group names, but Perl allows \w+
- (?<_>abc) and (?<foo_bar>abc) caused fatal PatternSyntaxException
- Encode underscores as U95 (_ → U95, foo_bar → fooU95bar), decode back when accessing %+/%- hashes, handle \k<name> backrefs

Test plan
- make clean ; make passes (all unit tests)
- ./jcpan -t Hash::MultiValue: 10/10 test programs, 55/55 subtests PASS
- ./jcpan -t HTTP::Thin: blib/arch created correctly
- op/local.t: no regression vs master (142/170 ok on both)
- my $0, my @_, my %_, my $!, my $/, my $^W, state $_ all correctly rejected
- our $_, local $_, my $a, my %ENV all correctly allowed
- uni/variables.t: 66880/66880 (matches master)
- re/pat_rt_report.t: 2431/2515 (improved from 2397)
- re/pat.t: 1076/1298, all 1298 now run (was 533/632, crashed mid-run)
- re/pat_advanced.t: 731/838 (improved from 63)
- re/reg_eval_scope.t: 7/49 (improved from 0)

Generated with Devin