diff --git a/dev/modules/jcpan_datetimex_easy.md b/dev/modules/jcpan_datetimex_easy.md new file mode 100644 index 000000000..31129991c --- /dev/null +++ b/dev/modules/jcpan_datetimex_easy.md @@ -0,0 +1,247 @@ +# jcpan DateTimeX::Easy Fix Plan + +## Overview + +Running `jcpan -t DateTimeX::Easy` fails because `DateTimeX::Easy` depends +(transitively) on three CPAN dists whose tests fail under PerlOnJava. The +`DateTimeX::Easy` distribution itself is fine — its `t/00-load.t` only fails +because `DateTime::Format::Natural` never finishes installing. + +``` +DateTimeX::Easy +├── DateTime::Format::DateManip ← FAILS (Issue A: regex translator) +├── DateTime::Format::ICal ← passes +├── DateTime::Set ← passes +└── DateTime::Format::Natural ← FAILS (Issues B + cascade) + ├── Module::Util ← FAILS (Issue C: stray warning) + └── Test::MockTime::HiRes ← FAILS (Issue D: sleep override) +``` + +This document tracks all three root-cause issues and the plan to fix them. + +--- + +## Issue A — Duplicate named capture groups in alternation + +### Symptom + +`DateTime::Format::DateManip` test `t/01conversions.t` exits 255 with no +subtests run because `Date::Manip`'s regex fails to compile: + +``` +Regex compilation failed: Named capturing group is already defined near index 186 +``` + +Reproducer: + +```sh +$ ./jperl -e 'qr/(?foo)|(?bar)/' +Regex compilation failed: Named capturing group is already defined +``` + +### Root cause + +Date::Manip generates patterns like: + +``` +(?:(?[-+]?\d+(?:\.\d*)?|\.\d+)|(?twenty-one|two|...)) +``` + +A single named capture used in alternation branches. Perl 5.10+ accepts this +implicitly (one logical group, two textual occurrences). Java's regex engine +always rejects duplicate names. + +### Fix plan + +Detect the duplicate-name case in the regex translator +(`src/main/java/org/perlonjava/regex/`) and emit a Java-acceptable form that +still produces the right `%+` / `%-` matches. Two viable approaches: + +1. **Rename + capture map**: rewrite the second occurrence to a unique synthetic + name (e.g. `__pjdup0_y`) and remember the alias so `%+`/`%-`/`$+{name}` + merges all branches under the original name. +2. **Branch reset**: where the duplicates sit at sibling alternation branches, + wrap them in `(?|...)`. Simpler when applicable but doesn't cover every + shape Date::Manip produces, so option 1 is the reliable target. + +Tests to add: a unit test in `src/test/resources/unit/` covering +`(?foo)|(?bar)` matching either branch, and a multi-name case. + +### Status + +Pending — biggest effort of the three. + +--- + +## Issue B — `t/11-parse_success.t` failures in DateTime::Format::Natural + +### Symptom + +5/19 subtests fail; phrases ending in implicit hour are not parsed: +`feb 28 at 3`, `28 feb at 3`, `may 22nd 2011 at 9`, `22nd may 2011 at 9`, +`saturday 3 months ago at 5`. Accompanied by: +`Use of uninitialized value in hash element at .../DateTime/Format/Natural/Lang/EN.pm line 251`. + +### Status + +Out of scope for this iteration — needs a small repro before deciding root +cause (could be lookbehind/lookahead semantics, hash-key coercion, or +something else). Tracked here for follow-up; not a blocker for installing +`DateTimeX::Easy`. + +--- + +## Issue C — Spurious `Can't stat` warning in `Module::Util` + +### Symptom + +`Module-Util-1.09 t/01..module.t` test 44 fails: + +``` +not ok 44 - no warnings generated when searching in missing path +# Can't stat /Users/.../fake/path: No such file or directory +``` + +The test does: + +```perl +local $SIG{__WARN__} = sub { push @warnings, @_ }; +find_in_namespace('', catdir(qw( fake path ))); +ok !@warnings, 'no warnings generated when searching in missing path'; +``` + +Real Perl is silent when `find_in_namespace` is given a non-existent path. +PerlOnJava emits a `Can't stat` warning. + +### Fix plan + +`Module::Util::find_in_namespace` ultimately calls `File::Find::find` / +`File::Find::finddepth`. The warning comes from PerlOnJava's `File::Find` +(or one level deeper, `opendir`/`stat`). Reproduce: + +```sh +$ ./jperl -e 'use File::Find; find(sub {}, "/no/such/dir")' +``` + +Compare with system perl: + +```sh +$ perl -e 'use File::Find; find(sub {}, "/no/such/dir")' +``` + +If real Perl is silent and we warn, locate the warning source in our +`File::Find` (`src/main/perl/lib/File/Find.pm`) or in the underlying +`opendir`/`stat` operator and either: + +- guard the `Can't stat` warning behind `$^W`/`use warnings`-style check + matching real Perl's behaviour, or +- skip the warning when the path does not exist (real Perl's `File::Find` + silently skips non-existent top-level dirs by default; only `no_chdir` + variants warn, controlled by `$File::Find::dont_use_nlink` etc.). + +Add a unit test asserting silence for `find(sub {}, "/no/such/path")`. + +### Status + +Pending. + +--- + +## Issue D — `*CORE::GLOBAL::sleep` override is not honored + +### Symptom + +`Test::MockTime::HiRes` fails 7 subtests across `t/01_core.t`, `t/02_hires.t`, +`t/03_anyevent.t` because `mock_time { ... } $now;` cannot intercept `sleep` +to advance the mocked clock without actually waiting. + +Reproducer: + +```sh +$ ./jperl -e 'BEGIN { *CORE::GLOBAL::sleep = sub { print "mocked $_[0]\n" } } + sleep 2; print "done\n"' +done # waits 2s; should print "mocked 2" then "done" instantly +``` + +`*CORE::GLOBAL::time` is honored ✅; `*CORE::GLOBAL::sleep` is not ❌. Same +applies to `Time::HiRes::sleep` — note the **plain symbol-table override** +(`*Time::HiRes::sleep = sub { ... }`) **does** work; the test failures involve +`CORE::GLOBAL::sleep`. + +### Fix plan + +Find where `time` looks up its `CORE::GLOBAL::*` override and replicate it for +`sleep`, plus `usleep`/`nanosleep`/`gettimeofday` if needed by the test suite. + +Search: `grep -rn "CORE::GLOBAL" src/main/java/org/perlonjava/operators/` +and `grep -rn "\"sleep\"" src/main/java/org/perlonjava/`. + +### Status + +Pending. + +--- + +## Issue E — DateTimeX::Easy `t/00-load.t` cascade + +Pure consequence of Issues A and B preventing `DateTime::Format::Natural` +from being installed. No direct work item. + +--- + +## Implementation order + +1. **Issue D** (CORE::GLOBAL::sleep) — small, contained, unblocks + Test::MockTime::HiRes immediately. +2. **Issue C** (silent missing-path) — small, unblocks Module::Util. +3. **Issue A** (duplicate named captures) — larger; unblocks Date::Manip and + many other CPAN modules. +4. **Issue B** (Natural parse failures) — defer; investigate after A is done + so the dependency chain is healthy. + +Each fix lands in its own commit on a feature branch so it can be reviewed +independently. + +## Progress Tracking + +### Current Status: COMPLETE — `jcpan -t DateTimeX::Easy` exits 0 + +### Completed Phases +- [x] Plan written (2026-04-25) +- [x] Issue D (`CORE::GLOBAL::sleep`) fixed (2026-04-25) + - `src/main/java/org/perlonjava/frontend/parser/ParserTables.java`: added `sleep` to `OVERRIDABLE_OP`. + - Test: `src/test/resources/unit/operator_overrides.t` — new "sleep operator override" subtest. +- [x] Issue C (silent missing-path) fixed (2026-04-25) + - `src/main/java/org/perlonjava/frontend/parser/SpecialBlockParser.java`: emit a `CompilerFlagNode` after a BEGIN block so `BEGIN { unimport warnings ... }` propagates the runtime warning scope (`local ${^WARNING_SCOPE} = N`) the same way `parseUseDeclaration` does for `use`/`no` declarations. + - Test: `src/test/resources/unit/warnings.t` — new test covering the `File::Find` use case. +- [x] Issue A (duplicate named captures) fixed (2026-04-25) + - `src/main/java/org/perlonjava/runtime/regex/RegexPreprocessor.java`: track named captures already emitted in the current pattern and suffix subsequent occurrences with `ZpjdupZ`. Also skip name text inside `(?` / `(?'name'` / `(?P` / `(?P=name)` when applying multi-char fold expansion (otherwise `(?...)/i` becomes `(?...)`). + - `src/main/java/org/perlonjava/runtime/regex/CaptureNameEncoder.java`: `decodeGroupName` strips the duplicate marker; new `stripDuplicateMarker`/`isDuplicateMarkerName` helpers. + - `src/main/java/org/perlonjava/runtime/runtimetypes/HashSpecialVariable.java`: group duplicate captures by decoded Perl name in `entrySet`/`get`/`containsKey` so `$+{name}` returns the matched alternative and `$-{name}` returns an arrayref of all alternatives. + - Test: `src/test/resources/unit/regex/regex_named_capture.t` — new test cases 10 & 11. +- [x] Issue B (`DateTime::Format::Natural` parse_success) fixed (2026-04-25) + - `src/main/java/org/perlonjava/runtime/regex/RuntimeRegex.java`: `replaceRegex()` now honors `pos()` when the pattern uses `\G`, so `s/\G/.../` after a previous `/g` match anchors at the previous match-end instead of offset 0. This is what `DateTime::Format::Natural::Rewrite::_rewrite_conditional` relies on to turn "feb 28 at 3" into "feb 28 at 3:00". + - Test: `src/test/resources/unit/regex/regex_g_pos.t` — new "\\G in s/// honors pos()" subtest. +- [x] `Time::HiRes::gettimeofday` scalar context (2026-04-25) + - `src/main/java/org/perlonjava/runtime/perlmodule/TimeHiRes.java`: scalar/void context returns `seconds + micros/1_000_000` instead of just microseconds. +- [x] `Time::Piece->strptime` lenient parsing (2026-04-25) + - `src/main/java/org/perlonjava/runtime/perlmodule/TimePiece.java`: switched to `DateTimeFormatterBuilder.parseLenient().appendPattern(...)` so non-zero-padded fields (e.g. "1:13:8" against `%H:%M:%S`) parse successfully. + +### Verification + +`jcpan -t DateTimeX::Easy` exit code: **0**. Per-distribution results: + +| Dist | Tests | Result | +|---|---|---| +| DateTime::Format::DateManip | 7 | PASS | +| DateTime::Set | 959 | PASS | +| Module::Util | 47 | PASS | +| Test::MockTime::HiRes | 216 | PASS | +| DateTime::Format::Natural | 11077 | PASS | +| DateTimeX::Easy | 107 | PASS | + +`make` (full build + parallel unit tests) is green on the feature branch. + +### Notes / Possible follow-ups +- All seven items above land on the same feature branch / single PR; consider splitting if reviewers prefer smaller commits per fix (each is already its own commit). +- The duplicate-name marker `ZpjdupZ` chosen for capture names is unlikely but theoretically collidable with a user name ending in `ZpjdupZ\d+`. If that ever becomes a concern, the encoder could escape literal `ZpjdupZ` sequences before suffixing — analogous to the existing `U95` underscore round-trip. diff --git a/src/main/java/org/perlonjava/core/Configuration.java b/src/main/java/org/perlonjava/core/Configuration.java index 0eef179a0..da6b8686c 100644 --- a/src/main/java/org/perlonjava/core/Configuration.java +++ b/src/main/java/org/perlonjava/core/Configuration.java @@ -33,7 +33,7 @@ public final class Configuration { * Automatically populated by Gradle/Maven during build. * DO NOT EDIT MANUALLY - this value is replaced at build time. */ - public static final String gitCommitId = "2bfd27045"; + public static final String gitCommitId = "0443cf987"; /** * Git commit date of the build (ISO format: YYYY-MM-DD). @@ -48,7 +48,7 @@ public final class Configuration { * Parsed by App::perlbrew and other tools via: perl -V | grep "Compiled at" * DO NOT EDIT MANUALLY - this value is replaced at build time. */ - public static final String buildTimestamp = "Apr 25 2026 19:38:25"; + public static final String buildTimestamp = "Apr 25 2026 22:05:08"; // Prevent instantiation private Configuration() { diff --git a/src/main/java/org/perlonjava/frontend/parser/ParserTables.java b/src/main/java/org/perlonjava/frontend/parser/ParserTables.java index 9ebf250ae..c44c8f67c 100644 --- a/src/main/java/org/perlonjava/frontend/parser/ParserTables.java +++ b/src/main/java/org/perlonjava/frontend/parser/ParserTables.java @@ -34,6 +34,7 @@ public class ParserTables { "kill", "oct", "open", "readline", "readpipe", "rename", "require", + "sleep", "stat", "system", "time", "uc", diff --git a/src/main/java/org/perlonjava/frontend/parser/SpecialBlockParser.java b/src/main/java/org/perlonjava/frontend/parser/SpecialBlockParser.java index 2f6ce3e20..266655f5b 100644 --- a/src/main/java/org/perlonjava/frontend/parser/SpecialBlockParser.java +++ b/src/main/java/org/perlonjava/frontend/parser/SpecialBlockParser.java @@ -7,6 +7,7 @@ import org.perlonjava.frontend.lexer.LexerTokenType; import org.perlonjava.frontend.semantic.ScopedSymbolTable; import org.perlonjava.frontend.semantic.SymbolTable; +import org.perlonjava.runtime.HintHashRegistry; import org.perlonjava.runtime.runtimetypes.*; import java.util.ArrayList; @@ -126,6 +127,35 @@ static Node parseSpecialBlock(Parser parser) { // Execute other special blocks normally runSpecialBlock(parser, blockName, block); + // After a BEGIN block runs, propagate any compile-time state changes the + // block made (e.g. `BEGIN { unimport warnings qw(File::Find) }`) to the + // surrounding lexical scope at runtime, the same way `parseUseDeclaration` + // does for `use`/`no` statements. Without this, a BEGIN block that calls + // `warnings::unimport` (or any pragma `unimport`) would set lastScopeId + // but never emit `local ${^WARNING_SCOPE} = N`, so warnings::warnif would + // not honor the suppression at runtime. + if ("BEGIN".equals(blockName)) { + int warningScopeId = WarningFlags.getLastScopeId(); + WarningFlags.clearLastScopeId(); + + java.util.BitSet fatalFlags = (java.util.BitSet) parser.ctx.symbolTable.warningFatalStack.peek().clone(); + java.util.BitSet disabledFlags = (java.util.BitSet) parser.ctx.symbolTable.warningDisabledStack.peek().clone(); + int hintHashSnapshotId = HintHashRegistry.snapshotCurrentHintHash(); + CompilerFlagNode flagNode = new CompilerFlagNode( + (java.util.BitSet) parser.ctx.symbolTable.warningFlagsStack.getLast().clone(), + fatalFlags, + disabledFlags, + parser.ctx.symbolTable.featureFlagsStack.getLast(), + parser.ctx.symbolTable.strictOptionsStack.getLast(), + warningScopeId, + hintHashSnapshotId, + parser.tokenIndex); + if (warningScopeId == 0 && hintHashSnapshotId == 0) { + flagNode.setAnnotation("compileTimeOnly", true); + } + return flagNode; + } + // Return an undefined operator node marked as compile-time-only // so it doesn't affect the file's return value OperatorNode result = new OperatorNode("undef", null, parser.tokenIndex); diff --git a/src/main/java/org/perlonjava/runtime/perlmodule/TimeHiRes.java b/src/main/java/org/perlonjava/runtime/perlmodule/TimeHiRes.java index 78b702015..9d5c21057 100644 --- a/src/main/java/org/perlonjava/runtime/perlmodule/TimeHiRes.java +++ b/src/main/java/org/perlonjava/runtime/perlmodule/TimeHiRes.java @@ -3,6 +3,7 @@ import org.perlonjava.runtime.operators.MathOperators; import org.perlonjava.runtime.operators.Time; import org.perlonjava.runtime.runtimetypes.RuntimeArray; +import org.perlonjava.runtime.runtimetypes.RuntimeContextType; import org.perlonjava.runtime.runtimetypes.RuntimeList; import org.perlonjava.runtime.runtimetypes.RuntimeScalar; @@ -54,10 +55,17 @@ public static RuntimeList nanosleep(RuntimeArray args, int ctx) { public static RuntimeList gettimeofday(RuntimeArray args, int ctx) { Instant now = Instant.now(); long seconds = now.getEpochSecond(); - double microseconds = now.getNano() / 1000.0; + long micros = now.getNano() / 1000L; + // In SCALAR/VOID context Time::HiRes::gettimeofday returns a single + // floating-point number `seconds + micros/1_000_000`. In LIST context + // it returns the (seconds, microseconds) pair as integers. + if (ctx != RuntimeContextType.LIST) { + double preciseEpochTime = seconds + micros / 1_000_000.0; + return new RuntimeScalar(preciseEpochTime).getList(); + } RuntimeList result = new RuntimeList(); result.add(new RuntimeScalar(seconds)); - result.add(new RuntimeScalar(microseconds)); + result.add(new RuntimeScalar(micros)); return result; } diff --git a/src/main/java/org/perlonjava/runtime/perlmodule/TimePiece.java b/src/main/java/org/perlonjava/runtime/perlmodule/TimePiece.java index b66a0a5cc..03da56b03 100644 --- a/src/main/java/org/perlonjava/runtime/perlmodule/TimePiece.java +++ b/src/main/java/org/perlonjava/runtime/perlmodule/TimePiece.java @@ -6,6 +6,7 @@ import java.text.DateFormatSymbols; import java.time.*; import java.time.format.DateTimeFormatter; +import java.time.format.DateTimeFormatterBuilder; import java.time.format.DateTimeParseException; import java.time.temporal.ChronoField; import java.util.*; @@ -76,7 +77,14 @@ public static RuntimeList _strptime(RuntimeArray args, int ctx) { String javaPattern = convertStrftimeToJava(format, locales); try { - DateTimeFormatter formatter = DateTimeFormatter.ofPattern(javaPattern, Locale.getDefault()); + // Use a lenient parser so non-padded numeric fields are accepted, + // matching POSIX strptime / Perl's Time::Piece::strptime. Without + // this, "1:13:8" won't match "%H:%M:%S" because Java's `HH`/`mm`/`ss` + // require exactly two digits. + DateTimeFormatter formatter = new DateTimeFormatterBuilder() + .parseLenient() + .appendPattern(javaPattern) + .toFormatter(Locale.getDefault()); // Try to parse - we need to handle partial dates LocalDateTime parsedDateTime = parseFlexible(dateString, formatter, javaPattern); diff --git a/src/main/java/org/perlonjava/runtime/regex/CaptureNameEncoder.java b/src/main/java/org/perlonjava/runtime/regex/CaptureNameEncoder.java index df539ab8a..d796d2297 100644 --- a/src/main/java/org/perlonjava/runtime/regex/CaptureNameEncoder.java +++ b/src/main/java/org/perlonjava/runtime/regex/CaptureNameEncoder.java @@ -62,6 +62,28 @@ public class CaptureNameEncoder { */ public static final int MAX_CAPTURE_NAME_LENGTH = 200; + /** + * Marker appended to capture group names that appear more than once in the + * same pattern, e.g. {@code (?a)|(?b)}. Perl accepts duplicate names + * in alternation; Java rejects them. RegexPreprocessor appends + * "{@value #DUPLICATE_MARKER}" to the second and later occurrences, and + * {@link #decodeGroupName} strips the suffix back off so user code sees + * the original name. + * + * The marker is intentionally distinctive (no underscores so it survives + * the underscore encoding round-trip; bookended by 'Z' so it is unlikely + * to collide with any real Perl capture name). + */ + public static final String DUPLICATE_MARKER = "ZpjdupZ"; + + /** + * Compiled pattern that matches the duplicate-name marker followed by its + * counter at the end of a Java group name. Used by {@link #decodeGroupName} + * and {@link #stripDuplicateMarker} to recover the original Perl name. + */ + private static final java.util.regex.Pattern DUPLICATE_MARKER_PATTERN = + java.util.regex.Pattern.compile(java.util.regex.Pattern.quote(DUPLICATE_MARKER) + "\\d+$"); + /** * Encodes a code block constant value into a capture group name. * Simple approach: hex-encode the string representation. @@ -195,13 +217,21 @@ public static String encodeGroupName(String perlName) { /** * Decodes a Java regex capture group name back to the original Perl name. - * Reverses the encoding done by encodeGroupName. + * Reverses the encoding done by encodeGroupName, and also strips any + * duplicate-name marker added by {@link RegexPreprocessor#handleNamedCapture} + * for patterns like {@code (?a)|(?b)}. * * @param javaName The encoded Java group name * @return The original Perl capture group name */ public static String decodeGroupName(String javaName) { - if (javaName == null || !javaName.contains("U95")) { + if (javaName == null) { + return javaName; + } + // First strip any duplicate-name marker (preprocessor adds this for + // names that appear more than once in alternation branches). + javaName = stripDuplicateMarker(javaName); + if (!javaName.contains("U95")) { return javaName; } // First restore underscores from "U95" @@ -211,6 +241,29 @@ public static String decodeGroupName(String javaName) { return decoded; } + /** + * Strips a trailing duplicate-name marker (e.g. "ZpjdupZ3") if present. + * Returns the input unchanged if no marker is present. + */ + public static String stripDuplicateMarker(String javaName) { + if (javaName == null || !javaName.contains(DUPLICATE_MARKER)) { + return javaName; + } + java.util.regex.Matcher m = DUPLICATE_MARKER_PATTERN.matcher(javaName); + if (m.find()) { + return javaName.substring(0, m.start()); + } + return javaName; + } + + /** + * Returns true if {@code javaName} carries a duplicate-name marker, i.e. + * it is the second-or-later occurrence of a duplicated capture name. + */ + public static boolean isDuplicateMarkerName(String javaName) { + return javaName != null && DUPLICATE_MARKER_PATTERN.matcher(javaName).find(); + } + /** * Checks if a capture group name is an internal name that should be hidden * from user-visible variables like %+ and %-. diff --git a/src/main/java/org/perlonjava/runtime/regex/RegexPreprocessor.java b/src/main/java/org/perlonjava/runtime/regex/RegexPreprocessor.java index db5582c46..b524c8c77 100644 --- a/src/main/java/org/perlonjava/runtime/regex/RegexPreprocessor.java +++ b/src/main/java/org/perlonjava/runtime/regex/RegexPreprocessor.java @@ -60,6 +60,16 @@ public class RegexPreprocessor { static boolean inlinePFlagEncountered; static boolean branchResetEncountered; static boolean backslashKEncountered; + /** + * Tracks named capture groups already emitted in the current pattern. + * Used to detect duplicate names like `(?a)|(?b)` (legal in Perl, + * rejected by Java). Duplicates get a synthetic suffix added by + * {@link #handleNamedCapture}; {@link CaptureNameEncoder#decodeGroupName} + * strips the suffix back off when reporting names to user code via + * `%+` / `%-`. + */ + static java.util.Set seenNamedCaptures = new java.util.HashSet<>(); + static int duplicateNameCounter; static void markDeferredUnicodePropertyEncountered() { deferredUnicodePropertyEncountered = true; @@ -104,6 +114,8 @@ static String preProcessRegex(String s, RegexFlags regexFlags) { inlinePFlagEncountered = false; branchResetEncountered = false; backslashKEncountered = false; + seenNamedCaptures.clear(); + duplicateNameCounter = 0; // First, escape invalid quantifier braces (Perl compatibility) // DISABLED: Causes test regressions - needs more work @@ -382,6 +394,46 @@ private static String expandMultiCharFolds(String pattern) { } } + // Skip the name of a named capture group / backreference. Group + // names are syntactic identifiers, not pattern text, so the fold + // expansion must not touch them — otherwise `(?...)` under /i + // becomes `(?...)` and Java rejects the resulting name. + if (!escaped && !inCharClass && ch == '(' && i + 2 < len + && pattern.charAt(i + 1) == '?' + && (pattern.charAt(i + 2) == '<' || pattern.charAt(i + 2) == '\'' + || (pattern.charAt(i + 2) == 'P' && i + 3 < len + && (pattern.charAt(i + 3) == '<' || pattern.charAt(i + 3) == '\'' || pattern.charAt(i + 3) == '=')))) { + // (?... | (?'name'... | (?P... | (?P'name'... | (?P=name) + int nameStart; + char closer; + if (pattern.charAt(i + 2) == 'P') { + if (pattern.charAt(i + 3) == '=') { + // (?P=name) — name terminates at ')' + nameStart = i + 4; + closer = ')'; + } else { + nameStart = i + 4; + closer = pattern.charAt(i + 3) == '<' ? '>' : '\''; + } + } else { + nameStart = i + 3; + closer = pattern.charAt(i + 2) == '<' ? '>' : '\''; + } + // Reject lookbehind: (?<= or (?' && nameStart < len + && (pattern.charAt(nameStart) == '=' || pattern.charAt(nameStart) == '!')) { + // Not a named capture; fall through to normal handling. + } else { + int closeIdx = pattern.indexOf(closer, nameStart); + if (closeIdx > nameStart) { + // Append the literal `(?` (or variant) verbatim, no folding. + result.append(pattern, i, closeIdx + 1); + i = closeIdx + 1; + continue; + } + } + } + // Handle escape sequences if (ch == '\\' && !escaped) { escaped = true; @@ -1214,6 +1266,14 @@ private static int handleNamedCapture(int c, String s, int offset, int length, S String name = s.substring(start, end); // Encode underscores for Java regex compatibility String encodedName = CaptureNameEncoder.encodeGroupName(name); + // Perl allows the same capture group name to appear in multiple + // alternation branches (e.g. `(?\d+)|(?foo)`). Java's regex + // engine rejects duplicates outright, so we suffix subsequent + // occurrences with a synthetic marker; CaptureNameEncoder.decodeGroupName + // strips the marker back off when reporting names to user code. + if (!seenNamedCaptures.add(encodedName)) { + encodedName = encodedName + CaptureNameEncoder.DUPLICATE_MARKER + (duplicateNameCounter++); + } sb.append("(?<").append(encodedName).append(">"); captureGroupCount++; // Increment counter for capturing groups return handleRegex(s, end + 1, sb, regexFlags, true); // Process content inside the group diff --git a/src/main/java/org/perlonjava/runtime/regex/RuntimeRegex.java b/src/main/java/org/perlonjava/runtime/regex/RuntimeRegex.java index d8ce0d308..5ba37696b 100644 --- a/src/main/java/org/perlonjava/runtime/regex/RuntimeRegex.java +++ b/src/main/java/org/perlonjava/runtime/regex/RuntimeRegex.java @@ -1141,6 +1141,25 @@ public static RuntimeBase replaceRegex(RuntimeScalar quotedRegex, RuntimeScalar CharSequence matchInput = new RegexTimeoutCharSequence(inputStr); Matcher matcher = pattern.matcher(matchInput); + // Honor pos() when \G is used. `s/\G.../.../` should anchor at + // pos($string) so a substitution inserted right after a previous /g + // match takes effect at the right offset (e.g. the + // DateTime::Format::Natural rewrite idiom: `$s =~ /pat/g; $s =~ s/\G/:00/`). + // Without setting region(), Java's matcher would scan from offset 0 + // and \G would anchor at 0, prepending the replacement. + if (regex.useGAssertion) { + RuntimeScalar posScalar = RuntimePosLvalue.pos(string); + if (posScalar.getDefinedBoolean()) { + int startPos = posScalar.getInt(); + if (startPos >= 0 && startPos <= inputStr.length()) { + matcher.region(startPos, inputStr.length()); + // Same rationale as matchRegex: keep ^/$ from anchoring + // at the artificial region boundary under /m. + matcher.useAnchoringBounds(false); + } + } + } + // The result string after substitutions StringBuilder resultBuffer = new StringBuilder(); int found = 0; diff --git a/src/main/java/org/perlonjava/runtime/runtimetypes/HashSpecialVariable.java b/src/main/java/org/perlonjava/runtime/runtimetypes/HashSpecialVariable.java index f4f69d721..6addcb3e3 100644 --- a/src/main/java/org/perlonjava/runtime/runtimetypes/HashSpecialVariable.java +++ b/src/main/java/org/perlonjava/runtime/runtimetypes/HashSpecialVariable.java @@ -77,27 +77,37 @@ public Set> entrySet() { Matcher matcher = RuntimeRegex.globalMatcher; if (matcher != null) { Map namedGroups = matcher.pattern().namedGroups(); + // Collect entries by decoded Perl name so that duplicate-name + // captures (e.g. `(?a)|(?b)`) merge into a single key. + java.util.Map> byPerlName = new java.util.LinkedHashMap<>(); for (String name : namedGroups.keySet()) { - // Skip internal captures (code blocks, \K marker) if (CaptureNameEncoder.isInternalCapture(name)) { continue; } - // Decode the name back to original Perl name (reverse underscore encoding) String perlName = CaptureNameEncoder.decodeGroupName(name); - String matchedValue = matcher.group(name); + byPerlName.computeIfAbsent(perlName, k -> new java.util.ArrayList<>()).add(name); + } + for (Map.Entry> e : byPerlName.entrySet()) { + String perlName = e.getKey(); + java.util.List javaNames = e.getValue(); if (this.mode == Id.CAPTURE_ALL) { - // For %-, values are always array refs (even for non-participating groups) + // For %-, value is an arrayref containing every alternative + // (matched ones get the captured value, unmatched get undef). RuntimeArray arr = new RuntimeArray(); - if (matchedValue != null) { - arr.push(new RuntimeScalar(matchedValue)); - } else { - arr.push(new RuntimeScalar()); // undef for non-participating groups + for (String jn : javaNames) { + String v = matcher.group(jn); + arr.push(v != null ? new RuntimeScalar(v) : new RuntimeScalar()); } entries.add(new SimpleEntry<>(perlName, arr.createReference())); } else { - // For %+, only include groups that actually matched - if (matchedValue != null) { - entries.add(new SimpleEntry<>(perlName, new RuntimeScalar(matchedValue))); + // For %+, only include the alternative that actually matched. + // For duplicate names at most one branch will have matched. + for (String jn : javaNames) { + String v = matcher.group(jn); + if (v != null) { + entries.add(new SimpleEntry<>(perlName, new RuntimeScalar(v))); + break; + } } } } @@ -186,24 +196,29 @@ public RuntimeScalar get(Object key) { if (matcher != null && key instanceof String name) { // Encode the Perl name to Java regex name (underscore encoding) String encodedName = CaptureNameEncoder.encodeGroupName(name); - // Check if this is a valid named group - if (!matcher.pattern().namedGroups().containsKey(encodedName)) { + Map namedGroups = matcher.pattern().namedGroups(); + // Collect every Java group whose decoded Perl name matches the + // requested key. For non-duplicated names this is just the + // single direct match; for duplicated names we may have several. + java.util.List javaNames = collectJavaNamesFor(namedGroups, encodedName); + if (javaNames.isEmpty()) { return scalarUndef; } - String matchedValue = matcher.group(encodedName); if (this.mode == Id.CAPTURE_ALL) { - // For %-, always return array ref (with undef for non-participating groups) + // For %-, always return array ref containing one slot per alternative. RuntimeArray arr = new RuntimeArray(); - if (matchedValue != null) { - arr.push(new RuntimeScalar(matchedValue)); - } else { - arr.push(new RuntimeScalar()); // undef + for (String jn : javaNames) { + String v = matcher.group(jn); + arr.push(v != null ? new RuntimeScalar(v) : new RuntimeScalar()); } return arr.createReference(); } else { - // For %+, return the matched value or undef - if (matchedValue != null) { - return new RuntimeScalar(matchedValue); + // For %+, return the matched value (or undef if no branch matched). + for (String jn : javaNames) { + String v = matcher.group(jn); + if (v != null) { + return new RuntimeScalar(v); + } } } } @@ -230,7 +245,7 @@ public boolean containsKey(Object key) { Matcher matcher = RuntimeRegex.globalMatcher; if (matcher != null && key instanceof String name) { String encodedName = CaptureNameEncoder.encodeGroupName(name); - return matcher.pattern().namedGroups().containsKey(encodedName); + return !collectJavaNamesFor(matcher.pattern().namedGroups(), encodedName).isEmpty(); } return false; } @@ -239,13 +254,43 @@ public boolean containsKey(Object key) { Matcher matcher = RuntimeRegex.globalMatcher; if (matcher != null && key instanceof String name) { String encodedName = CaptureNameEncoder.encodeGroupName(name); - return matcher.pattern().namedGroups().containsKey(encodedName) && matcher.group(encodedName) != null; + for (String jn : collectJavaNamesFor(matcher.pattern().namedGroups(), encodedName)) { + if (matcher.group(jn) != null) { + return true; + } + } } return false; } return super.containsKey(key); } + /** + * Returns every Java capture-group name in the matcher's pattern whose + * decoded Perl name equals {@code encodedPerlName}. For typical patterns + * this is at most one entry; for duplicate-name patterns like + * {@code (?a)|(?b)} the preprocessor renames the second occurrence + * to {@code yZpjdupZ0}, etc., and this helper collects all of them. + */ + private static java.util.List collectJavaNamesFor(Map namedGroups, String encodedPerlName) { + java.util.List out = new java.util.ArrayList<>(); + if (namedGroups == null) { + return out; + } + if (namedGroups.containsKey(encodedPerlName)) { + out.add(encodedPerlName); + } + // Also pick up duplicate-marker variants (e.g. nameZpjdupZ0, ZpjdupZ1, ...). + for (String jn : namedGroups.keySet()) { + if (!jn.equals(encodedPerlName) + && CaptureNameEncoder.isDuplicateMarkerName(jn) + && CaptureNameEncoder.stripDuplicateMarker(jn).equals(encodedPerlName)) { + out.add(jn); + } + } + return out; + } + @Override public RuntimeScalar put(String key, RuntimeScalar value) { if (this.mode == Id.STASH) { diff --git a/src/test/resources/unit/operator_overrides.t b/src/test/resources/unit/operator_overrides.t index 3eee61067..c9c670df5 100644 --- a/src/test/resources/unit/operator_overrides.t +++ b/src/test/resources/unit/operator_overrides.t @@ -168,4 +168,25 @@ subtest 'die operator override' => sub { is($result, "DIED: test error", 'die override returned custom value'); }; +subtest 'sleep operator override' => sub { + plan tests => 3; + + # Override sleep globally so it doesn't actually wait. + # Required for Test::MockTime::HiRes-style mocking. + our @sleep_args; + BEGIN { + *CORE::GLOBAL::sleep = sub { + push @sleep_args, $_[0]; + return $_[0]; + }; + } + + my $start = time; + my $rc = sleep 5; + my $elapsed = time - $start; + is($rc, 5, 'sleep override returned the requested duration'); + cmp_ok($elapsed, '<', 2, 'sleep override did not actually wait'); + is_deeply(\@sleep_args, [5], 'sleep override saw the right argument'); +}; + done_testing(); diff --git a/src/test/resources/unit/regex/regex_g_pos.t b/src/test/resources/unit/regex/regex_g_pos.t index 6550d4f82..d43edbcd9 100644 --- a/src/test/resources/unit/regex/regex_g_pos.t +++ b/src/test/resources/unit/regex/regex_g_pos.t @@ -155,6 +155,22 @@ if ($data =~ /\G(\w+)/) { ok(0, '\\G non-/g match with capture should match at pos 6'); } +################### +# `s/\G.../.../` honors pos() set by a previous /g match. +# This is the idiom DateTime::Format::Natural uses to rewrite "feb 28 at 3" +# into "feb 28 at 3:00": +# $s =~ /\S+? \s+? at \s+? (\S+)/g; # leaves pos($s) at end of match +# $s =~ s/\G/:00/; # must insert at pos(), not at 0 +{ + my $s = "feb 28 at 3"; + if ($s =~ /\S+? \s+? at \s+? (\S+)/gx) { + $s =~ s/\G/:00/; + is($s, "feb 28 at 3:00", '\\G in s/// honors pos() from previous /g'); + } else { + fail('precondition: /g match did not succeed'); + } +} + ################### # End of Perl `pos` and `\G` Tests diff --git a/src/test/resources/unit/regex/regex_named_capture.t b/src/test/resources/unit/regex/regex_named_capture.t index 795684a68..bb4261685 100644 --- a/src/test/resources/unit/regex/regex_named_capture.t +++ b/src/test/resources/unit/regex/regex_named_capture.t @@ -1,7 +1,7 @@ #!/usr/bin/perl use strict; use warnings; -use Test::More tests => 16; +use Test::More tests => 22; # Test case 1: Simple named capture my $string1 = 'foo'; @@ -83,5 +83,41 @@ if ($string9 =~ /(?mouse)(rat)\g{-2}/) { fail('Test case 9: Pattern did not match'); } +# Test case 10: Duplicate named capture across alternation branches. +# Perl 5.10+ allows the same name in multiple branches; only one branch +# can match at a time. Also exercises that /i fold expansion does not +# corrupt the capture-group name. +{ + my @cases = ( + ['foo', 'foo'], + ['bar', 'bar'], + ['BAR', 'BAR'], # /i fold; name 'off'-shaped names must survive + ); + for my $c (@cases) { + my ($s, $expect) = @$c; + if ($s =~ /(?foo)|(?bar)/i) { + is($+{y}, $expect, "Test case 10: dup-name '$s' -> \$+{y} = '$expect'"); + } else { + fail("Test case 10: pattern did not match '$s'"); + } + } + # %- aggregates all alternatives; only the matched branch carries a value. + 'foo' =~ /(?foo)|(?bar)/; + is_deeply($-{y}, ['foo', undef], 'Test case 10: %- aggregates dup-name branches'); + 'bar' =~ /(?foo)|(?bar)/; + is_deeply($-{y}, [undef, 'bar'], 'Test case 10: %- second branch'); +} + +# Test case 11: capture-group name that contains characters which trigger +# /i multi-char fold expansion ('ff' would become (?:ff|fi) without the +# preprocessor skipping name text). Regression test for Date::Manip use. +{ + if ('OFF' =~ /(?off)/i) { + is($+{off}, 'OFF', 'Test case 11: /i name with foldable letters (off)'); + } else { + fail('Test case 11: /i pattern with foldable name did not match'); + } +} + done_testing(); diff --git a/src/test/resources/unit/warnings.t b/src/test/resources/unit/warnings.t index c20de49f5..61fcabaa4 100644 --- a/src/test/resources/unit/warnings.t +++ b/src/test/resources/unit/warnings.t @@ -1,5 +1,5 @@ use strict; -use Test::More tests => 6; +use Test::More tests => 7; # Note: warnings::enabled() is currently broken - it always returns false # because warning flags are set at compile time but getCurrentScope() at @@ -75,3 +75,21 @@ use Test::More tests => 6; # no warnings 'numeric'; # ok(!warnings::enabled('numeric'), "'no warnings \"numeric\"' disables numeric"); # ok(warnings::enabled('substr'), "other categories remain enabled"); + +# Test 7: BEGIN { unimport warnings 'cat' } inside a sub propagates the +# suppression to runtime. Module::Util uses this idiom to silence +# File::Find's "Can't stat" warnings. +{ + use File::Find; + sub _find_with_no_warn { + BEGIN { unimport warnings qw(File::Find) if $] >= 5.008 } + my @out; + File::Find::find({ no_chdir => 1, wanted => sub { push @out, $_ } }, $_[0]); + return @out; + } + + my @w; + local $SIG{__WARN__} = sub { push @w, @_ }; + _find_with_no_warn("/no/such/path-for-warnings-test"); + is(scalar(@w), 0, 'BEGIN { unimport warnings ... } inside sub propagates to runtime'); +}