
Commit 3a3bb3f

fix: Hash::MultiValue + blib/arch — splice spill, deref slice, dclone hooks (#478)
* fix: Hash::MultiValue test failures - splice spill, deref slice, dclone hooks

Three bugs fixed:

1. JVM backend: splice with constant sub causes ASM frame crash
   handleSpliceBuiltin left the first arg on the JVM operand stack while evaluating remaining args. When those contained a function call (constant sub), the blockDispatcher's GOTOs created inconsistent stack depths at merge points. Fixed with register spilling, matching handlePushOperator's existing pattern.

2. Interpreter backend: @$ref[@idx] = ... unsupported
   The array slice assignment handler only supported @array[@idx] with plain IdentifierNode. Added a branch to handle dereferenced array refs using DEREF_ARRAY/DEREF_ARRAY_NONSTRICT opcodes.

3. Storable::dclone: shared refs from STORABLE_freeze hooks
   dclone passed extra refs from STORABLE_freeze directly to STORABLE_thaw without cloning them. Inside-out objects like Hash::MultiValue ended up sharing internal arrays between original and clone. Fixed by deep-cloning extra refs (indices 1+).

Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>

* fix: create blib/arch in generated Makefile for -Mblib compatibility

The generated Makefile pure_all target only created blib/lib/ but not blib/arch/. The blib.pm pragma requires both directories to exist (-d blib && -d blib/lib && -d blib/arch), so use blib and -Mblib would die with Cannot find blib even in ...

This caused CPAN module test suites (e.g. HTTP::Thin) to fail when they use open3 with -Mblib to verify modules load cleanly.

Add mkdir -p blib/arch to the pure_all target so the directory always exists alongside blib/lib.

* fix: JPERL_UNIMPLEMENTED=warn should only downgrade unimplemented features

The regex error catch block was downgrading ALL exceptions to warnings when JPERL_UNIMPLEMENTED=warn was set, including real compilation errors like "Invalid Unicode character name". Now only PerlJavaUnimplementedException is downgraded to a warning; other regex errors remain fatal.

This fixes a hang in lib/croak.t where `qr/(?{})\N{}/;while(my($0)=0){}` would continue past the \N{} error into an infinite while loop when JPERL_UNIMPLEMENTED=warn was set by the test runner.

* fix: reject my/state of global-only variables at compile time

Perl requires that forced-global variables ($_, @_, %_, $0, $1, $!, $/, $^W, etc.) cannot be lexicalized with 'my' or 'state'. Previously jperl silently allowed this, which could cause infinite loops when my($0) returned truthy in a while condition.

Now emits: Can't use global $X in "my" (matching Perl's error message).

The check covers:
- Underscore variables: $_, @_, %_
- Digit-only names: $0, $1, $2, ...
- Single punctuation names: $!, $/, $@, $;, $,, $., $|, etc.
- Caret/control variables: $^W, $^H, $^O, etc.

'our' and 'local' are unaffected (they correctly alias/localize globals).

* fix: regex error handling regressions and Unicode var check

- RegexPreprocessor: use regexUnimplemented() (not regexError()) for control verbs, (?@...), and lookbehind >255 so JPERL_UNIMPLEMENTED=warn can downgrade them (fixes pat_rt_report.t: 196 -> 2397)
- RuntimeRegex: restructure catch block to distinguish PerlJavaUnimplementedException (extends PerlCompilerException) from real PerlCompilerException syntax errors. Wrap Java PatternSyntaxException as unimplemented so it can be downgraded. (fixes pat_advanced.t: 54 -> 63)
- RegexPreprocessor.handleCodeBlock: don't consume closing paren; let handleParentheses consume it, matching the protocol used by all other group handlers. Fixes code blocks causing "Unmatched (" errors. (fixes pat.t: 239 -> 428)
- OperatorParser.isGlobalOnlyVariable: restrict single-char punctuation check to ASCII (< 128) so Unicode characters that Java doesn't recognize as letters aren't rejected. (fixes uni/variables.t: 66803 -> 66880)

* fix: regex preprocessing -- brace quantifier, hex escapes, NPE

- Fix handleQuantifier consuming regex metacharacters inside invalid brace expressions (e.g., {(?>...)*} was consumed as a single literal brace expression). Now only escapes the opening { and lets the main loop process subsequent characters.
- Fix \x{...} hex escapes with non-hex characters to extract valid hex prefix like Perl does (e.g., \x{9bq} -> 0x9B). Fixes fatal crash in pat_advanced.t at line 321.
- Handle bare \xNN with non-hex chars (e.g., \xk -> \x00 + literal k) instead of passing through to Java Pattern which rejects it.
- Fix NullPointerException when regex with (?{...}) code blocks fails with JPERL_UNIMPLEMENTED=warn: set regex.patternString in catch block and add null guard in preProcessRegex.

Test improvements:
- pat_advanced.t: 63 -> 731 passing (+668)
- pat.t: 428 -> 533 passing (+105)

* docs: categorize all remaining regex test failures with priorities

Analyzed pat.t (99 failures + 666 blocked), pat_advanced.t (107 failures), and pat_rt_report.t (77 failures) into 16 categories (A-P) with difficulty ratings and priority recommendations.

Key findings:
- \p{isAlpha} alias crash blocks 666 pat.t tests (quick fix)
- Bug 41010 conditional+$ anchor accounts for 48 pat_rt_report.t failures
- $^N not updated = 20 pat_advanced.t failures
- \N{name} charnames = 25 pat_advanced.t failures
- (?{...}) code blocks = 46 failures (very hard, engine limitation)

* fix: Unicode property aliases, Property=Value syntax, underscore group names

Three regex fixes that unblock 543 additional pat.t tests:

1. \p{isAlpha} POSIX-style aliases: make Is/is prefix stripping case-insensitive, add Space/Alnum/Punct/White_Space aliases to the switch statement
2. \p{Property=Value} syntax: split on '=' and pass property name and value separately to ICU4J. Handle True/False/Yes/No values.
3. Named capture groups with underscores: Java regex only allows [a-zA-Z][a-zA-Z0-9]* for group names but Perl allows \w+. Encode underscores as "U95" in Java regex names, decode back when accessing %+/%- hashes. Also handle \k<name> backrefs.

Test results:
- pat.t: 533/632 -> 1076/1298 (all 1298 now run, +543 passing)
- pat_advanced.t: 731/838 (unchanged)
- pat_rt_report.t: 2431/2508 (unchanged)
- uni/variables.t: 66880/66880 (unchanged)
- make: all unit tests pass

* docs: update design doc -- pat.t fully unblocked, 1076/1298 passing

* fix: user-defined Unicode properties, regex cache, and unimplemented sequence handling

Major fixes:
- Refactor user-defined property resolution to use UnicodeSet directly instead of Java regex patterns, fixing properties that use +utf8:: references (e.g., +utf8::Uppercase, &utf8::ASCII)
- Cache user-defined property sub results (matching Perl behavior of calling each property sub only once)
- Fix regex cache preventing deferred property recompilation by evicting stale entries in ensureCompiledForRuntime()
- Add Titlecase/TitlecaseLetter/Lt property aliases
- Make (?&name) named group recursion downgradable with JPERL_UNIMPLEMENTED=warn
- Make (?digit) numbered recursion downgradable (regexError -> regexUnimplemented)

Test improvements:
- pat_advanced.t: 731/838 -> 1308/1625 (+577 passes, +787 more tests run)
- regexp_unicode_prop.t: 1000/1110 -> 1017/1096 (+17 passes, above baseline)
- pat.t: 1076/1298 -> 1077/1298 (stable, +1 pass)

* docs: update design doc with new problems Q/R/S and progress tracking

- Add categories Q (package-scoped user properties), R (invalid \pX), S (/i caseless flag for user property subs)
- Update test pass rates: pat_advanced 1308/1625, regexp_unicode_prop 1017/1096
- Update early termination table with new crash points
- Document fixes 8-13 in progress tracking
- Update priority recommendations

* fix: handle Is/In-prefixed Unicode properties with non-uppercase chars

- Relax user-defined property regex patterns to accept any character after Is/In prefix (e.g., Is_q, Is_foo), matching Perl behavior where ANY Is/In-prefixed name triggers user-defined property lookup
- Clamp code points > U+10FFFF to JVM limit instead of throwing fatal errors (Perl supports 31-bit code points, JVM does not)
- Fixes pat_advanced.t crash at test 1625 (Is_q) and 1639 (Is_31_Bit_Super)
- pat_advanced.t: 1324/1678 (was 1308/1625, +16 passed, all tests reached)

---------

Co-authored-by: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
1 parent e6849c6 commit 3a3bb3f

13 files changed

Lines changed: 903 additions & 124 deletions


dev/design/regex_preprocessing_fixes.md

Lines changed: 282 additions & 0 deletions
Large diffs are not rendered by default.

src/main/java/org/perlonjava/backend/bytecode/CompileAssignment.java

Lines changed: 45 additions & 22 deletions
@@ -1189,31 +1189,54 @@ public static void compileAssignmentOperator(BytecodeCompiler bytecodeCompiler,
     // Handle array slice assignment: @array[1, 3, 5] = (20, 30, 40)
     if (leftBin.operator.equals("[") && leftBin.left instanceof OperatorNode arrayOp) {
 
-        // Must be @array (not $array)
-        if (arrayOp.operator.equals("@") && arrayOp.operand instanceof IdentifierNode) {
-            String varName = "@" + ((IdentifierNode) arrayOp.operand).name;
-
+        // Must be @array or @$ref (not $array)
+        if (arrayOp.operator.equals("@")) {
             int arrayReg;
-            if (bytecodeCompiler.currentSubroutineBeginId != 0 && bytecodeCompiler.currentSubroutineClosureVars != null
-                    && bytecodeCompiler.currentSubroutineClosureVars.contains(varName)) {
+
+            if (arrayOp.operand instanceof IdentifierNode) {
+                String varName = "@" + ((IdentifierNode) arrayOp.operand).name;
+
+                if (bytecodeCompiler.currentSubroutineBeginId != 0 && bytecodeCompiler.currentSubroutineClosureVars != null
+                        && bytecodeCompiler.currentSubroutineClosureVars.contains(varName)) {
+                    arrayReg = bytecodeCompiler.allocateRegister();
+                    int nameIdx = bytecodeCompiler.addToStringPool(varName);
+                    bytecodeCompiler.emitWithToken(Opcodes.RETRIEVE_BEGIN_ARRAY, node.getIndex());
+                    bytecodeCompiler.emitReg(arrayReg);
+                    bytecodeCompiler.emit(nameIdx);
+                    bytecodeCompiler.emit(bytecodeCompiler.currentSubroutineBeginId);
+                } else if (bytecodeCompiler.hasVariable(varName)) {
+                    arrayReg = bytecodeCompiler.getVariableRegister(varName);
+                } else {
+                    arrayReg = bytecodeCompiler.allocateRegister();
+                    String globalArrayName = NameNormalizer.normalizeVariableName(
+                            ((IdentifierNode) arrayOp.operand).name,
+                            bytecodeCompiler.getCurrentPackage()
+                    );
+                    int nameIdx = bytecodeCompiler.addToStringPool(globalArrayName);
+                    bytecodeCompiler.emit(Opcodes.LOAD_GLOBAL_ARRAY);
+                    bytecodeCompiler.emitReg(arrayReg);
+                    bytecodeCompiler.emit(nameIdx);
+                }
+            } else if (arrayOp.operand instanceof OperatorNode || arrayOp.operand instanceof BlockNode) {
+                // @$ref[@idx] = ... or @{expr}[@idx] = ...
+                // Compile the scalar reference expression and dereference to array
+                bytecodeCompiler.compileNode(arrayOp.operand, -1, RuntimeContextType.SCALAR);
+                int scalarReg = bytecodeCompiler.lastResultReg;
                 arrayReg = bytecodeCompiler.allocateRegister();
-                int nameIdx = bytecodeCompiler.addToStringPool(varName);
-                bytecodeCompiler.emitWithToken(Opcodes.RETRIEVE_BEGIN_ARRAY, node.getIndex());
-                bytecodeCompiler.emitReg(arrayReg);
-                bytecodeCompiler.emit(nameIdx);
-                bytecodeCompiler.emit(bytecodeCompiler.currentSubroutineBeginId);
-            } else if (bytecodeCompiler.hasVariable(varName)) {
-                arrayReg = bytecodeCompiler.getVariableRegister(varName);
+                if (bytecodeCompiler.isStrictRefsEnabled()) {
+                    bytecodeCompiler.emitWithToken(Opcodes.DEREF_ARRAY, node.getIndex());
+                    bytecodeCompiler.emitReg(arrayReg);
+                    bytecodeCompiler.emitReg(scalarReg);
+                } else {
+                    int pkgIdx = bytecodeCompiler.addToStringPool(bytecodeCompiler.getCurrentPackage());
+                    bytecodeCompiler.emitWithToken(Opcodes.DEREF_ARRAY_NONSTRICT, node.getIndex());
+                    bytecodeCompiler.emitReg(arrayReg);
+                    bytecodeCompiler.emitReg(scalarReg);
+                    bytecodeCompiler.emit(pkgIdx);
+                }
             } else {
-                arrayReg = bytecodeCompiler.allocateRegister();
-                String globalArrayName = NameNormalizer.normalizeVariableName(
-                        ((IdentifierNode) arrayOp.operand).name,
-                        bytecodeCompiler.getCurrentPackage()
-                );
-                int nameIdx = bytecodeCompiler.addToStringPool(globalArrayName);
-                bytecodeCompiler.emit(Opcodes.LOAD_GLOBAL_ARRAY);
-                bytecodeCompiler.emitReg(arrayReg);
-                bytecodeCompiler.emit(nameIdx);
+                bytecodeCompiler.throwCompilerException("Array slice assignment requires identifier or reference");
+                return;
             }
 
             // Compile indices (right side of [])

src/main/java/org/perlonjava/backend/jvm/EmitOperator.java

Lines changed: 19 additions & 0 deletions
@@ -534,9 +534,28 @@ static void handleSpliceBuiltin(EmitterVisitor emitterVisitor, OperatorNode node
 
     if (first != null) {
         try {
+            MethodVisitor mv = emitterVisitor.ctx.mv;
             first.accept(emitterVisitor.with(RuntimeContextType.LIST));
+
+            // Spill the first operand before evaluating remaining args so
+            // non-local control flow can't jump to returnLabel with an
+            // extra value on the JVM operand stack.
+            int firstSlot = emitterVisitor.ctx.javaClassInfo.acquireSpillSlot();
+            boolean pooled = firstSlot >= 0;
+            if (!pooled) {
+                firstSlot = emitterVisitor.ctx.symbolTable.allocateLocalVariable();
+            }
+            mv.visitVarInsn(Opcodes.ASTORE, firstSlot);
+
             // Accept the remaining arguments in LIST context.
             args.accept(emitterVisitor.with(RuntimeContextType.LIST));
+
+            mv.visitVarInsn(Opcodes.ALOAD, firstSlot);
+            mv.visitInsn(Opcodes.SWAP);
+
+            if (pooled) {
+                emitterVisitor.ctx.javaClassInfo.releaseSpillSlot();
+            }
         } finally {
             listArgs.elements.addFirst(first);
         }
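
The ASTORE / ALOAD / SWAP sequence in this hunk can be checked with a toy model of the operand stack. The sketch below is only a simulation under stated assumptions: it uses a plain `Deque` instead of ASM, and `SpillOrderSketch`, `depthWhileEvaluatingArgs`, and the string operands are hypothetical names for illustration. It shows that the stack is empty while the remaining args are evaluated and that the final operand order is still first, then args.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class SpillOrderSketch {
    /** Stack depth observed while the remaining args would be evaluated. */
    static int depthWhileEvaluatingArgs;

    /** Simulates the emitted sequence; returns the final operand stack (head = top). */
    static Deque<String> emitWithSpill() {
        Deque<String> stack = new ArrayDeque<>();
        String[] locals = new String[1];

        stack.push("first");             // first.accept(...)
        locals[0] = stack.pop();         // ASTORE firstSlot

        depthWhileEvaluatingArgs = stack.size(); // nothing left behind for a jump to see
        stack.push("args");              // args.accept(...)

        stack.push(locals[0]);           // ALOAD firstSlot
        String top = stack.pop();        // SWAP: exchange the two topmost values
        String below = stack.pop();
        stack.push(top);
        stack.push(below);
        return stack;
    }

    public static void main(String[] args) {
        Deque<String> result = emitWithSpill();
        System.out.println("depth during args: " + depthWhileEvaluatingArgs);
        System.out.println("top to bottom: " + result);
    }
}
```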

src/main/java/org/perlonjava/core/Configuration.java

Lines changed: 2 additions & 2 deletions
@@ -33,7 +33,7 @@ public final class Configuration {
      * Automatically populated by Gradle/Maven during build.
      * DO NOT EDIT MANUALLY - this value is replaced at build time.
      */
-    public static final String gitCommitId = "b8043f312";
+    public static final String gitCommitId = "c6ee04074";
 
     /**
      * Git commit date of the build (ISO format: YYYY-MM-DD).
@@ -48,7 +48,7 @@ public final class Configuration {
      * Parsed by App::perlbrew and other tools via: perl -V | grep "Compiled at"
      * DO NOT EDIT MANUALLY - this value is replaced at build time.
      */
-    public static final String buildTimestamp = "Apr 10 2026 11:59:23";
+    public static final String buildTimestamp = "Apr 10 2026 13:40:40";
 
     // Prevent instantiation
     private Configuration() {

src/main/java/org/perlonjava/frontend/parser/OperatorParser.java

Lines changed: 68 additions & 1 deletion
@@ -252,6 +252,61 @@ static BinaryOperatorNode parsePrint(Parser parser, LexerToken token, int curren
         return new BinaryOperatorNode(token.text, handle, operand, currentIndex);
     }
 
+    /**
+     * Check if a variable name refers to a forced-global variable that cannot
+     * be lexicalized with 'my' or 'state'.
+     *
+     * Perl rule: the following are always global:
+     * - $_, @_, %_ (the underscore variables, since Perl 5.30)
+     * - $0, $1, $2, ... (digit-only names)
+     * - $!, $/, $@, $;, $,, $., $|, etc. (single punctuation character names)
+     * - $^W, $^H, etc. (control character / caret variable names)
+     */
+    private static boolean isGlobalOnlyVariable(String name) {
+        if (name == null || name.isEmpty()) return false;
+
+        // Underscore: $_, @_, %_ are all forced global (since Perl 5.30)
+        if (name.equals("_")) return true;
+
+        // Digit-only names: $0, $1, $2, ...
+        boolean allDigits = true;
+        for (int i = 0; i < name.length(); i++) {
+            if (!Character.isDigit(name.charAt(i))) {
+                allDigits = false;
+                break;
+            }
+        }
+        if (allDigits) return true;
+
+        // Single ASCII non-alphanumeric, non-underscore character: $!, $/, $@, $;, etc.
+        // Only check ASCII range — Unicode characters (>= 128) may be valid identifiers
+        // even if Java's Character.isLetterOrDigit() doesn't recognize them.
+        if (name.length() == 1) {
+            char c = name.charAt(0);
+            if (c < 128 && !Character.isLetterOrDigit(c) && c != '_') return true;
+        }
+
+        // Control character prefix (caret variables like $^W stored as chr(23))
+        if (name.charAt(0) < 32) return true;
+
+        return false;
+    }
+
+    /**
+     * Format a variable name for display in error messages.
+     * Converts internal control character representation back to ^X form.
+     * E.g., chr(23) + "" becomes "^W", chr(8) + "MATCH" becomes "^HMATCH".
+     */
+    private static String formatVarNameForDisplay(String name) {
+        if (name == null || name.isEmpty()) return name;
+        char first = name.charAt(0);
+        if (first < 32) {
+            // Control character: convert to ^X notation
+            return "^" + (char) (first + 'A' - 1) + name.substring(1);
+        }
+        return name;
+    }
+
     private static void addVariableToScope(EmitterContext ctx, String operator, OperatorNode node) {
         String sigil = node.operator;
         if ("$@%".contains(sigil)) {
@@ -260,7 +315,19 @@ private static void addVariableToScope(EmitterContext ctx, String operator, Oper
             if (identifierNode instanceof IdentifierNode) { // my $a
                 String name = ((IdentifierNode) identifierNode).name;
                 String var = sigil + name;
-
+
+                // Check for global-only variables in my/state declarations
+                // Perl: "Can't use global $0 in "my""
+                if ((operator.equals("my") || operator.equals("state"))
+                        && isGlobalOnlyVariable(name)) {
+                    throw new PerlCompilerException(
+                            node.getIndex(),
+                            "Can't use global " + sigil + formatVarNameForDisplay(name)
+                                    + " in \"" + operator + "\"",
+                            ctx.errorUtil
+                    );
+                }
+
                 // Check for redeclaration warnings
                 if (operator.equals("our")) {
                     // For 'our', only warn if redeclared in the same package (matching Perl behavior)
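
To see how the two helpers combine into the error text, here is a standalone sketch that mirrors their logic outside the parser. The class name `GlobalOnlyNames` and the `message` helper are hypothetical; the real code throws PerlCompilerException with position information, which this sketch leaves out.

```java
final class GlobalOnlyNames {
    /** Mirrors the diff's check: name is the identifier without its sigil. */
    static boolean isGlobalOnly(String name) {
        if (name == null || name.isEmpty()) return false;
        if (name.equals("_")) return true;
        if (name.chars().allMatch(Character::isDigit)) return true;
        if (name.length() == 1) {
            char c = name.charAt(0);
            if (c < 128 && !Character.isLetterOrDigit(c) && c != '_') return true;
        }
        return name.charAt(0) < 32; // caret variables are stored with a control-char prefix
    }

    /** Converts a leading control character back to ^X notation for display. */
    static String display(String name) {
        if (name == null || name.isEmpty()) return name;
        char first = name.charAt(0);
        if (first < 32) return "^" + (char) (first + 'A' - 1) + name.substring(1);
        return name;
    }

    /** Hypothetical helper: builds the message text, or null when the declaration is allowed. */
    static String message(String operator, String sigil, String name) {
        boolean lexical = operator.equals("my") || operator.equals("state");
        if (!lexical || !isGlobalOnly(name)) return null;
        return "Can't use global " + sigil + display(name) + " in \"" + operator + "\"";
    }

    public static void main(String[] args) {
        String caretW = String.valueOf((char) 23); // internal form of ^W
        System.out.println(message("my", "$", "0"));
        System.out.println(message("state", "$", caretW));
        System.out.println(message("our", "$", "0"));
        System.out.println(message("my", "$", "foo"));
    }
}
```

The `our` case returns null here because only `my` and `state` are checked, in line with the commit message's note that `our` and `local` are unaffected.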

src/main/java/org/perlonjava/runtime/perlmodule/Storable.java

Lines changed: 6 additions & 3 deletions
@@ -548,9 +548,12 @@ private static RuntimeScalar deepClone(RuntimeScalar scalar, IdentityHashMap<Obj
                 RuntimeArray thawArgs = new RuntimeArray();
                 RuntimeArray.push(thawArgs, newObj);
                 RuntimeArray.push(thawArgs, new RuntimeScalar(1)); // cloning = true
-                // Pass serialized data and any extra refs from freeze
-                for (int i = 0; i < freezeArray.size(); i++) {
-                    RuntimeArray.push(thawArgs, freezeArray.get(i));
+                // First element is the serialized string — pass as-is
+                RuntimeArray.push(thawArgs, freezeArray.get(0));
+                // Remaining elements are extra refs — deep-clone them
+                // so the thawed object gets independent copies
+                for (int i = 1; i < freezeArray.size(); i++) {
+                    RuntimeArray.push(thawArgs, deepClone(freezeArray.get(i), cloned));
                 }
                 RuntimeCode.apply(thawMethod, thawArgs, RuntimeContextType.VOID);
             }
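
The effect of this hunk can be illustrated without the PerlOnJava runtime types. The following is a rough sketch that assumes plain `List` values stand in for Perl references and uses hypothetical names (`ThawArgsSketch`, `buildThawArgs`). It passes element 0 through unchanged and deep-clones the rest through an identity map, so the thawed side no longer shares the original's internal list.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

final class ThawArgsSketch {
    /** Deep-copies nested lists; seen keeps shared structure and cycles consistent in the copy. */
    static Object deepClone(Object value, Map<Object, Object> seen) {
        if (!(value instanceof List)) {
            return value; // non-list values are treated as immutable leaves in this sketch
        }
        Object prior = seen.get(value);
        if (prior != null) {
            return prior;
        }
        List<Object> copy = new ArrayList<>();
        seen.put(value, copy); // register before recursing so cycles terminate
        for (Object item : (List<?>) value) {
            copy.add(deepClone(item, seen));
        }
        return copy;
    }

    /** Element 0 (the serialized string) is passed as-is; extra refs are cloned. */
    static List<Object> buildThawArgs(List<Object> freezeResult, Map<Object, Object> seen) {
        List<Object> args = new ArrayList<>();
        args.add(freezeResult.get(0));
        for (int i = 1; i < freezeResult.size(); i++) {
            args.add(deepClone(freezeResult.get(i), seen));
        }
        return args;
    }

    public static void main(String[] args) {
        List<Object> internal = new ArrayList<>(List.of("a", "b"));
        List<Object> frozen = new ArrayList<>(List.of("serialized", internal));
        List<Object> thawArgs = buildThawArgs(frozen, new IdentityHashMap<>());
        System.out.println("shared: " + (thawArgs.get(1) == internal));
        System.out.println("equal:  " + thawArgs.get(1).equals(internal));
    }
}
```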

src/main/java/org/perlonjava/runtime/regex/CaptureNameEncoder.java

Lines changed: 58 additions & 9 deletions
@@ -161,15 +161,64 @@ public static boolean isCodeBlockCapture(String captureName) {
         return captureName != null && captureName.startsWith("cb") && captureName.length() > 5;
     }
 
-    // FUTURE ENHANCEMENTS:
-    //
-    // For underscore support: (?<my_name>)
-    // Use the same hex encoding pattern: (?<ncHEX>) where HEX encodes "my_name"
-    // Then %CAPTURE decodes back to show original name to user
+    // UNDERSCORE ENCODING:
     //
-    // For duplicate names: (?<name>a)|(?<name>b)
-    // Encode with disambiguation: (?<ncHEX1>a)|(?<ncHEX2>b) where HEX encodes "name"
-    // Track mapping for proper capture group retrieval
+    // Java regex doesn't allow underscores in group names (only [a-zA-Z][a-zA-Z0-9]*).
+    // Perl allows \w+ (letters, digits, underscores) for group names.
     //
-    // The generic hex encoding pattern is reusable for all Java regex limitations!
+    // Encoding: Replace each underscore with "U95" (ASCII code 95 for '_')
+    //   (?<my_name>) → (?<myU95name>)
+    //   (?<_>) → (?<U95>)
+    //   (?<_foo>) → (?<U95foo>)
+    //
+    // Names starting with underscore need a letter prefix for Java, so U95 works
+    // since it starts with 'U'. To avoid ambiguity, literal "U95" sequences in
+    // names are escaped as "UU95" (the 'U' itself is escaped).
+
+    /**
+     * Encodes a Perl capture group name for use in Java regex.
+     * Replaces underscores with "U95" and escapes literal "U95" sequences.
+     *
+     * @param perlName The original Perl capture group name
+     * @return The encoded name safe for Java regex, or the original if no encoding needed
+     */
+    public static String encodeGroupName(String perlName) {
+        if (perlName == null || (!perlName.contains("_") && !perlName.contains("U95"))) {
+            return perlName;
+        }
+        // First escape any existing "U95" as "UU95" to avoid ambiguity
+        String encoded = perlName.replace("U95", "UU95");
+        // Then replace underscores with "U95"
+        encoded = encoded.replace("_", "U95");
+        return encoded;
+    }
+
+    /**
+     * Decodes a Java regex capture group name back to the original Perl name.
+     * Reverses the encoding done by encodeGroupName.
+     *
+     * @param javaName The encoded Java group name
+     * @return The original Perl capture group name
+     */
+    public static String decodeGroupName(String javaName) {
+        if (javaName == null || !javaName.contains("U95")) {
+            return javaName;
+        }
+        // First restore underscores from "U95"
+        String decoded = javaName.replace("U95", "_");
+        // Then restore literal "U95" from "U_95" (which was "UU95" before first step)
+        decoded = decoded.replace("U_95", "U95");
+        return decoded;
+    }
+
+    /**
+     * Checks if a capture group name is an internal name that should be hidden
+     * from user-visible variables like %+ and %-.
+     *
+     * @param captureName The capture group name to check
+     * @return true if this is an internal capture (code block or \K marker)
+     */
+    public static boolean isInternalCapture(String captureName) {
+        return isCodeBlockCapture(captureName) || "perlK".equals(captureName);
+    }
 }
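
A quick round-trip check of the U95 scheme, with the two methods transcribed from the hunk above into a standalone class (`GroupNameRoundTrip` is a hypothetical name, and the transcription is assumed to match the committed code):

```java
final class GroupNameRoundTrip {
    /** Transcribed from encodeGroupName in the diff. */
    static String encode(String perlName) {
        if (perlName == null || (!perlName.contains("_") && !perlName.contains("U95"))) {
            return perlName;
        }
        String encoded = perlName.replace("U95", "UU95");
        return encoded.replace("_", "U95");
    }

    /** Transcribed from decodeGroupName in the diff. */
    static String decode(String javaName) {
        if (javaName == null || !javaName.contains("U95")) {
            return javaName;
        }
        String decoded = javaName.replace("U95", "_");
        return decoded.replace("U_95", "U95");
    }

    public static void main(String[] args) {
        for (String name : new String[] {"my_name", "_", "_foo", "U95"}) {
            String enc = encode(name);
            System.out.println(name + " -> " + enc + " -> " + decode(enc));
        }
    }
}
```

In this transcription the underscore cases round-trip (`my_name -> myU95name -> my_name`), but a name that already contains the literal text `U95` comes back as `U_`: it is encoded as `UU95`, the first replace turns that into `U_`, and the `U_95` replacement never matches. That edge case may be worth a unit test against the real CaptureNameEncoder.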
