Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
130 changes: 130 additions & 0 deletions dev/modules/xml_simple.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,130 @@
# XML::Simple Fix Plan

## Overview

**Module**: XML::Simple 2.25 (depends on XML::SAX 1.02, XML::SAX::PurePerl)
**Test command**: `./jcpan -j 8 -t XML::Simple`
**Status**: **14/14 test files pass, 487/487 subtests (100%)** — ALL FIXED

## Dependency Tree

```
XML::Simple 2.25
├── XML::SAX 1.02 (ParserDetails.ini parsing)
│ └── XML::SAX::PurePerl (default parser)
│ └── XML::SAX::PurePerl::Productions (Unicode regex char classes)
│ └── XML::SAX::Base (indirect method call: throw)
│ └── XML::SAX::Exception (throw method)
├── XML::Parser (optional, not installed)
├── XML::SAX::Expat (optional, not installed)
└── Storable (lock_nstore missing)
```

## Test Results Summary

### Final Status: 14/14 test files pass, 487/487 subtests (100%)

| Test File | Plan | Passed | Status |
|-----------|------|--------|--------|
| t/0_Config.t | 1 | 1 | PASS |
| t/1_XMLin.t | 132 | 132 | PASS |
| t/2_XMLout.t | 201 | 201 | PASS |
| t/3_Storable.t | 4 | 4 | PASS |
| t/4_MemShare.t | 8 | 8 | PASS |
| t/5_MemCopy.t | 7 | 7 | PASS |
| t/6_ObjIntf.t | 37 | 37 | PASS |
| t/7_SaxStuff.t | 8 | 8 | PASS |
| t/8_Namespaces.t | 8 | 8 | PASS |
| t/9_Strict.t | 44 | 44 | PASS |
| t/A_XMLParser.t | 0 | 0 | SKIP (no XML::Parser) |
| t/B_Hooks.t | 12 | 12 | PASS |
| t/C_External_Entities.t | 0 | 0 | SKIP (no XML::Parser) |
| t/author-pod-syntax.t | 0 | 0 | SKIP (author tests) |

## Bugs Fixed

### Bug 1: `/^$/m` doesn't match empty strings — Java MULTILINE quirk

**Fix**: Strip MULTILINE flag for empty input strings in `matchRegexDirect` and `replaceRegex`.
**Files**: `RuntimeRegex.java`

### Bug 2: `\x{NNNN}` broken in regex character classes `[...]`

**Fix**: Added `\x{...}` handler in `handleRegexCharacterClassEscape`, updated range validator.
**Files**: `RegexPreprocessorHelper.java`

### Bug 3: Indirect method call fails with unknown method + qualified class

**Fix**: Allow indirect method call when packageName contains `::`.
**Files**: `SubroutineParser.java`

### Bug 4: Storable `lock_nstore`/`lock_store`/`lock_retrieve` not implemented

**Fix**: Added stubs delegating to non-locking variants.
**Files**: `Storable.pm`

### Bug 5: `use vars` / Exporter-imported single-letter variables rejected under strict

**Root cause**: Single-letter globals like `$A`-`$Z` auto-vivified under `no strict` would
incorrectly bypass `use strict 'vars'` later (since `existsGlobalVariable` returned true).
A blunt fix that forced `existsGlobally = false` for all single-letter vars also blocked
legitimately imported ones (e.g., `$S` from XML::SAX::PurePerl::Productions via Exporter).

**Fix**: Added `declaredGlobalVariables/Arrays/Hashes` tracking sets in `GlobalVariable.java`
that distinguish explicitly declared globals (`use vars`, Exporter glob assignment) from
auto-vivified ones. `Vars.importVars()` and `RuntimeGlob.set()` (REFERENCE/ARRAY/HASH cases)
now call `declareGlobal*()`. The strict check for single-letter vars consults
`isDeclaredGlobalVariable()` instead of unconditionally blocking.

**Files**: `GlobalVariable.java`, `Vars.java`, `RuntimeGlob.java`, `Variable.java`, `EmitVariable.java`, `BytecodeCompiler.java`

### Bug 6: `Encode::define_alias` missing

**Fix**: Added stub delegating to `Encode::Alias::define_alias`.
**Files**: `Encode.pm`

### Bug 7: `warnings::enabled()` broken for custom categories from `warnings::register`

**Root cause**: Two issues:
1. `warnings::enabled()` (no args) used `getCallerPackageAtLevel(0)` which added a +1 offset,
skipping the direct caller and returning the wrong package name. Fixed to use `caller(0)`
directly to get the correct calling package.
2. `isEnabledInBits()` / `isFatalInBits()` failed for custom warning categories because the
category's bit position (offset >= 128) either exceeded the standard 21-byte warning bits
string or wasn't set in the bits string (since the category was registered via
`warnings::register` after `use warnings` compiled). Fixed to fall back to checking if the
"all" category is enabled — in Perl 5, `use warnings` (which enables "all") implicitly
enables all custom categories, including ones registered later.

**Files**: `Warnings.java`, `WarningFlags.java`

## Progress Tracking

### Current Status: COMPLETE — 487/487 subtests pass (100%)

### Completed Phases
- [x] Investigation (2026-04-03)
- Identified 7 PerlOnJava bugs
- Files analyzed: RegexPreprocessorHelper.java, RuntimeRegex.java, SubroutineParser.java, Storable.pm, Warnings.java, WarningFlags.java
- [x] Phase 1: Fix `\x{NNNN}` in regex character classes (2026-04-03)
- Files: RegexPreprocessorHelper.java
- [x] Phase 2: Fix `/^$/m` empty string matching (2026-04-03)
- Files: RuntimeRegex.java
- [x] Phase 3: Fix indirect method call with qualified class (2026-04-03)
- Files: SubroutineParser.java
- [x] Phase 4: Add Storable lock stubs (2026-04-03)
- Files: Storable.pm
- [x] Phase 5: Fix strict vars for single-letter globals (declared vs auto-vivified) (2026-04-03)
- Added declaredGlobals tracking to GlobalVariable.java
- Hooked Vars.importVars() and RuntimeGlob.set() to declare globals
- Files: GlobalVariable.java, Vars.java, RuntimeGlob.java, Variable.java, EmitVariable.java, BytecodeCompiler.java
- [x] Phase 6: Add `Encode::define_alias` stub (2026-04-03)
- Files: Encode.pm
- [x] Phase 7: Fix `warnings::enabled()` for custom categories (2026-04-03)
- Fixed caller(0) offset in enabled()/fatalEnabled()/warnIf() methods
- Fixed isEnabledInBits/isFatalInBits fallback for custom categories
- Files: Warnings.java, WarningFlags.java

## Related Documents
- `dev/modules/README.md` - Module porting index
- `dev/modules/cpan_patch_plan.md` - CPAN patching strategy
Original file line number Diff line number Diff line change
Expand Up @@ -462,9 +462,8 @@ boolean shouldBlockGlobalUnderStrictVars(String varName) {
allowIfAlreadyExists = true;
}

// Perl's strict 'vars' requires declaration for unqualified single-letter globals
// even if they were previously created under 'no strict'.
// This mirrors EmitVariable.java lines 349-359.
// Single-letter scalars ($A-$Z) bypass strict only if explicitly declared
// (via use vars or Exporter import), not if merely auto-vivified under 'no strict'.
boolean isSpecialSortVar = sigil.equals("$")
&& (bareVarName.equals("a") || bareVarName.equals("b"));
if (sigil.equals("$")
Expand All @@ -473,7 +472,9 @@ boolean shouldBlockGlobalUnderStrictVars(String varName) {
&& Character.isLetter(bareVarName.charAt(0))
&& !bareVarName.contains("::")
&& !isSpecialSortVar) {
allowIfAlreadyExists = false;
if (!GlobalVariable.isDeclaredGlobalVariable(normalizedName)) {
allowIfAlreadyExists = false;
}
}

return !allowIfAlreadyExists;
Expand Down
9 changes: 5 additions & 4 deletions src/main/java/org/perlonjava/backend/jvm/EmitVariable.java
Original file line number Diff line number Diff line change
Expand Up @@ -438,16 +438,17 @@ static void handleVariableOperator(EmitterVisitor emitterVisitor, OperatorNode n
allowIfAlreadyExists = GlobalVariable.existsGlobalHash(normalizedName);
}

// Perl's strict 'vars' requires declaration for unqualified globals like $A
// even if they were previously created under 'no strict'.
// Keep this narrow to avoid changing behavior for other globals.
// Single-letter scalars ($A-$Z) bypass strict only if explicitly declared
// (via use vars or Exporter import), not if merely auto-vivified under 'no strict'.
if (sigil.equals("$")
&& name != null
&& name.length() == 1
&& Character.isLetter(name.charAt(0))
&& !name.contains("::")
&& !isSpecialSortVar) {
allowIfAlreadyExists = false;
if (!GlobalVariable.isDeclaredGlobalVariable(normalizedName)) {
allowIfAlreadyExists = false;
}
}
}

Expand Down
2 changes: 1 addition & 1 deletion src/main/java/org/perlonjava/core/Configuration.java
Original file line number Diff line number Diff line change
Expand Up @@ -33,7 +33,7 @@ public final class Configuration {
* Automatically populated by Gradle/Maven during build.
* DO NOT EDIT MANUALLY - this value is replaced at build time.
*/
public static final String gitCommitId = "20340661d";
public static final String gitCommitId = "1b7317697";

/**
* Git commit date of the build (ISO format: YYYY-MM-DD).
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -229,11 +229,12 @@ static Node parseSubroutineCall(Parser parser, boolean isMethod) {
// Reject if:
// 1. Explicitly marked as non-package (false in cache), OR
// 2. Unknown package (null) AND unknown subroutine (!isKnownSub) AND followed by '('
// - this is a function call like mycan(...)
// AND name is not package-qualified (no ::) - this is a function call like mycan(...)
// Allow if:
// - Marked as package (true), OR
// - Unknown (null) but NOT followed by '(' - like 'new NonExistentClass'
if ((isPackage != null && !isPackage) || (isPackage == null && !isKnownSub && token.text.equals("("))) {
// - Name contains '::' (qualified names are always treated as packages in indirect syntax)
if ((isPackage != null && !isPackage) || (isPackage == null && !isKnownSub && token.text.equals("(") && !packageName.contains("::"))) {
parser.tokenIndex = currentIndex2;
} else {
// Not a known subroutine, check if it's valid indirect object syntax
Expand Down
7 changes: 5 additions & 2 deletions src/main/java/org/perlonjava/frontend/parser/Variable.java
Original file line number Diff line number Diff line change
Expand Up @@ -356,11 +356,14 @@ private static void checkStrictVarsAtParseTime(Parser parser, String sigil, Stri
} else if (sigil.equals("%") && !normalizedName.endsWith("::"))
existsGlobally = GlobalVariable.existsGlobalHash(normalizedName);

// Single-letter scalars require declaration even if they exist globally
// Single-letter scalars ($A-$Z) bypass strict only if explicitly declared
// (via use vars or Exporter import), not if merely auto-vivified under 'no strict'.
if (sigil.equals("$") && varName.length() == 1
&& Character.isLetter(varName.charAt(0))
&& !varName.equals("a") && !varName.equals("b")) {
existsGlobally = false;
if (!GlobalVariable.isDeclaredGlobalVariable(normalizedName)) {
existsGlobally = false;
}
}

if (existsGlobally) return;
Expand Down
3 changes: 3 additions & 0 deletions src/main/java/org/perlonjava/runtime/perlmodule/Vars.java
Original file line number Diff line number Diff line change
Expand Up @@ -49,12 +49,15 @@ public static RuntimeList importVars(RuntimeArray args, int ctx) {
if (variableString.startsWith("$")) {
// Create a scalar variable
GlobalVariable.getGlobalVariable(fullName);
GlobalVariable.declareGlobalVariable(fullName);
} else if (variableString.startsWith("@")) {
// Create an array variable
GlobalVariable.getGlobalArray(fullName);
GlobalVariable.declareGlobalArray(fullName);
} else if (variableString.startsWith("%")) {
// Create a hash variable
GlobalVariable.getGlobalHash(fullName);
GlobalVariable.declareGlobalHash(fullName);
} else if (variableString.startsWith("&")) {
// Create a code variable
GlobalVariable.getGlobalCodeRef(fullName);
Expand Down
29 changes: 23 additions & 6 deletions src/main/java/org/perlonjava/runtime/perlmodule/Warnings.java
Original file line number Diff line number Diff line change
Expand Up @@ -360,8 +360,15 @@ public static RuntimeList enabled(RuntimeArray args, int ctx) {
if (args.size() > 0) {
category = args.get(0).toString();
} else {
// No args: use calling package as category (Perl 5 behavior)
String pkg = getCallerPackageAtLevel(0);
// No args: use calling package as category (Perl 5 behavior).
// Use caller(0) directly (without the +1 offset from getCallerPackageAtLevel)
// because this registered Java method adds a frame to the caller stack,
// and we need the direct caller's package (caller(0)).
RuntimeList callerInfo = RuntimeCode.caller(
new RuntimeList(RuntimeScalarCache.getScalarInt(0)),
RuntimeContextType.LIST
);
String pkg = (callerInfo.size() > 0) ? callerInfo.elements.get(0).toString() : null;
category = (pkg != null) ? pkg : "all";
}

Expand Down Expand Up @@ -413,8 +420,13 @@ public static RuntimeList fatalEnabled(RuntimeArray args, int ctx) {
if (args.size() > 0) {
category = args.get(0).toString();
} else {
// No args: use calling package as category (Perl 5 behavior)
String pkg = getCallerPackageAtLevel(0);
// No args: use calling package as category (Perl 5 behavior).
// Use caller(0) directly to get the direct caller's package.
RuntimeList callerInfo = RuntimeCode.caller(
new RuntimeList(RuntimeScalarCache.getScalarInt(0)),
RuntimeContextType.LIST
);
String pkg = (callerInfo.size() > 0) ? callerInfo.elements.get(0).toString() : null;
category = (pkg != null) ? pkg : "all";
}

Expand Down Expand Up @@ -491,9 +503,14 @@ public static RuntimeList warnIf(RuntimeArray args, int ctx) {
category = args.get(0).toString();
message = args.get(1);
} else {
// warnif(message) - use calling package as category
// warnif(message) - use calling package as category.
// Use caller(0) directly to get the direct caller's package.
message = args.get(0);
String pkg = getCallerPackageAtLevel(0);
RuntimeList callerInfo = RuntimeCode.caller(
new RuntimeList(RuntimeScalarCache.getScalarInt(0)),
RuntimeContextType.LIST
);
String pkg = (callerInfo.size() > 0) ? callerInfo.elements.get(0).toString() : null;
category = (pkg != null) ? pkg : "main";
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -561,6 +561,30 @@ static int handleRegexCharacterClassEscape(int offset, String s, StringBuilder s
if (nextChar == 'b' || nextChar == 'N' || nextChar == 'p' || nextChar == 'P') {
// These are special escapes, can't be in range
nextChar = -1;
} else if (nextChar == 'x' && nextPos + 2 < length && s.charAt(nextPos + 2) == '{') {
// Parse \x{NNNN} as range endpoint
int endBrace = s.indexOf('}', nextPos + 3);
if (endBrace != -1) {
String hex = s.substring(nextPos + 3, endBrace).trim().replace("_", "");
try {
nextChar = Integer.parseInt(hex, 16);
rangeEndCharCount = endBrace - nextPos + 1;
} catch (NumberFormatException e) {
nextChar = -1;
}
}
} else if (nextChar == 'o' && nextPos + 2 < length && s.charAt(nextPos + 2) == '{') {
// Parse \o{NNNN} as range endpoint
int endBrace = s.indexOf('}', nextPos + 3);
if (endBrace != -1) {
String oct = s.substring(nextPos + 3, endBrace).trim().replace("_", "");
try {
nextChar = Integer.parseInt(oct, 8);
rangeEndCharCount = endBrace - nextPos + 1;
} catch (NumberFormatException e) {
nextChar = -1;
}
}
}
}

Expand Down Expand Up @@ -661,6 +685,25 @@ static int handleRegexCharacterClassEscape(int offset, String s, StringBuilder s
} else {
RegexPreprocessor.regexError(s, offset, "Missing right brace on \\o{}");
}
} else if (offset + 1 < length && s.charAt(offset) == 'x' && s.charAt(offset + 1) == '{') {
// Handle \x{...} hex escape construct in character class
offset += 2; // Skip past x{
int endBrace = s.indexOf('}', offset);
if (endBrace != -1) {
String hexStr = s.substring(offset, endBrace).trim();
// Remove underscores (Perl allows them in number literals)
hexStr = hexStr.replace("_", "");
try {
int value = Integer.parseInt(hexStr, 16);
sb.append(String.format("x{%X}", value));
offset = endBrace;
lastChar = value;
} catch (NumberFormatException e) {
RegexPreprocessor.regexError(s, offset, "Invalid hex number in \\x{...}");
}
} else {
RegexPreprocessor.regexError(s, offset, "Missing right brace on \\x{}");
}
} else if (s.codePointAt(offset) == 'b') {
// \b inside character class = backspace in Perl
// Java doesn't support \b in [...], so convert to \x08
Expand Down
13 changes: 13 additions & 0 deletions src/main/java/org/perlonjava/runtime/regex/RuntimeRegex.java
Original file line number Diff line number Diff line change
Expand Up @@ -574,6 +574,14 @@ private static RuntimeBase matchRegexDirect(RuntimeScalar quotedRegex, RuntimeSc
// else: BYTE_STRING or Latin-1 only content - keep ASCII pattern (default)
}

// Workaround for Java MULTILINE quirk: Java's Pattern.MULTILINE changes ^ to only
// match after line terminators, so "^" fails on empty strings. In Perl, /m makes ^
// and $ match at line boundaries AND at start/end of string. Since empty strings have
// no line breaks, MULTILINE is irrelevant and we can safely strip it.
if (inputStr.isEmpty() && (pattern.flags() & Pattern.MULTILINE) != 0) {
pattern = Pattern.compile(pattern.pattern(), pattern.flags() & ~Pattern.MULTILINE);
}

CharSequence matchInput = new RegexTimeoutCharSequence(inputStr);
Matcher matcher = pattern.matcher(matchInput);

Expand Down Expand Up @@ -946,6 +954,11 @@ public static RuntimeBase replaceRegex(RuntimeScalar quotedRegex, RuntimeScalar
// else: BYTE_STRING or Latin-1 only content - keep ASCII pattern (default)
}

// Workaround for Java MULTILINE quirk (same as matchRegexDirect)
if (inputStr.isEmpty() && (pattern.flags() & Pattern.MULTILINE) != 0) {
pattern = Pattern.compile(pattern.pattern(), pattern.flags() & ~Pattern.MULTILINE);
}

CharSequence matchInput = new RegexTimeoutCharSequence(inputStr);
Matcher matcher = pattern.matcher(matchInput);

Expand Down
Loading
Loading