From 7eea297b000ef8b156208c964250394e61c48c36 Mon Sep 17 00:00:00 2001 From: "Flavio S. Glock" Date: Mon, 27 Apr 2026 10:18:00 +0200 Subject: [PATCH 1/5] docs(active-resource): plan for `jcpan -t ActiveResource` failures MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Document the dependency-chain failures uncovered while running `jcpan -t ActiveResource`: ActiveResource ├── Class::Accessor::Lvalue (XS dep "Want" — no Java port) └── XML::Hash └── Test::XML └── XML::SemanticDiff (2 fails in t/16zero_to_empty_str_cmp.t) Identifies three independent issues with priority order: 1. Encode %EXPORT_TAGS missing :all / :default (cheap, broad impact) 2. SAX empty-element text reported as '' instead of undef (medium) 3. Want XS module needed by Class::Accessor::Lvalue (deferred) This PR will land #1 and (if scoped small) #2; #3 is tracked as a follow-up because it requires either a Pure-Perl Want shim or a full Java port. Generated with [Devin](https://devin.ai) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> --- dev/modules/active_resource.md | 174 +++++++++++++++++++++++++++++++++ 1 file changed, 174 insertions(+) create mode 100644 dev/modules/active_resource.md diff --git a/dev/modules/active_resource.md b/dev/modules/active_resource.md new file mode 100644 index 000000000..8f71fd77e --- /dev/null +++ b/dev/modules/active_resource.md @@ -0,0 +1,174 @@ +# jcpan ActiveResource Fix Plan + +## Overview + +Tracks the issues uncovered while running `jcpan -t ActiveResource` and the +plan to address them. `ActiveResource` itself never even reaches its own test +files — it fails because of a chain of dependencies. Each link in the chain +fails for a different reason, and several of the failures are independently +useful to fix because they affect many other CPAN modules. 
+ +## Dependency Chain + +``` +ActiveResource +├── Class::Accessor::Lvalue (XS dep "Want" — no Java port) +└── XML::Hash + └── Test::XML + └── XML::SemanticDiff (2 subtests fail in t/16zero_to_empty_str_cmp.t) +``` + +`ActiveResource::Base` `use`s `Class::Accessor::Lvalue::Fast`, so even with +`--force` the module is unreachable until the `Want` problem is solved. + +## Issues + +### 1. Encode `%EXPORT_TAGS` missing `:all` and `:default` (LOW EFFORT, HIGH IMPACT) + +Real Perl's `Encode.pm` exposes: + +``` +keys %Encode::EXPORT_TAGS = (all, default, fallbacks, fallback_all) +``` + +PerlOnJava's `src/main/perl/lib/Encode.pm` only sets `fallbacks` and +`fallback_all` (these come from the XS half). Any module that does +`use Encode qw(:all)` or `qw(:default)` dies during import: + +``` +"all" is not defined in %Encode::EXPORT_TAGS at (eval N) line 1. +``` + +Observed in `Test::XML`'s `t/sax.t`, `t/basic.t`, and elsewhere. This is a +self-contained 3-line fix. + +**Plan**: extend `src/main/perl/lib/Encode.pm` to populate `%EXPORT_TAGS`: + +```perl +our %EXPORT_TAGS = ( + all => [ @EXPORT, @EXPORT_OK ], + default => [ @EXPORT ], +); +``` + +(The XS half already merges its own `fallbacks` / `fallback_all` keys in.) + +**Verification**: +- `./jperl -e 'use Encode qw(:all); print "ok\n"'` +- `./jperl -e 'use Encode qw(:default); print "ok\n"'` +- Compare `keys %Encode::EXPORT_TAGS` with system `perl`. + +**Priority**: HIGH (cheap, unblocks Encode-using modules). + +--- + +### 2. SAX empty-element text reported as `''` instead of `undef` (MEDIUM EFFORT) + +`XML::SemanticDiff/t/16zero_to_empty_str_cmp.t` has 2 failing subtests: + +``` +# Failed test 'check new value undef' +# got: '' +# expected: undef +``` + +The test compares `<el2>0</el2>` against `<el2></el2>` and `<el2/>`. Real Perl +yields `undef` for the new empty/self-closing element's text content; +PerlOnJava yields `''`. 
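Whether a parser reports any text at all for an empty element can be probed directly. The following is a minimal sketch using the JDK's built-in SAX parser, not PerlOnJava's own bridge; the class name `EmptyElementProbe` and the helper `reportedText` are hypothetical, and a Java `null` stands in only loosely for Perl's `undef`.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class EmptyElementProbe {
    /**
     * Returns the text delivered through characters() events, or null
     * when the parser fired no characters() event at all.
     * (Hypothetical helper, for illustration only.)
     */
    static String reportedText(String xml) throws Exception {
        // Single-slot holder so the anonymous handler can assign it;
        // the slot stays null until the first characters() event.
        StringBuilder[] text = new StringBuilder[1];
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                if (text[0] == null) {
                    text[0] = new StringBuilder();
                }
                text[0].append(ch, start, length);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return text[0] == null ? null : text[0].toString();
    }

    public static void main(String[] args) throws Exception {
        String[] samples = {"<el2>0</el2>", "<el2></el2>", "<el2/>"};
        for (String xml : samples) {
            String text = reportedText(xml);
            System.out.println(xml + " -> "
                    + (text == null ? "no characters() event" : "'" + text + "'"));
        }
    }
}
```

Distinguishing "no event" from "empty text" is the point of the `null` check: a layer that initialises its accumulator to an empty string would hide exactly the difference the failing subtests are about.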
+ +**Suspected root cause**: PerlOnJava's XML::SAX (likely the bundled +`XML::SAX::PurePerl`, or a Java-backed parser) emits a zero-length +`characters` event for empty elements, or stores `''` where real Perl leaves +the field unset, so `XML::SemanticDiff`'s `keepdata` walk sees `''` instead +of `undef`. + +**Plan**: +1. Build a 5-line repro: parse `` and `0` with + `XML::SAX::ParserFactory`, dump the events. +2. Diff the event stream against system `perl`. +3. Fix the divergence in whichever of XML::SAX::PurePerl or the Java SAX + bridge is responsible. Prefer fixing the parser, not XML::SemanticDiff. +4. Re-run `t/16zero_to_empty_str_cmp.t` (and the rest of XML-SemanticDiff to + make sure no regressions). + +**Priority**: MEDIUM (unblocks XML::SemanticDiff → Test::XML → XML::Hash). + +--- + +### 3. `Class::Accessor::Lvalue` blocked by missing `Want` XS module (HIGH EFFORT) + +``` +Error: Can't load loadable object for module Want: no Java XS implementation available +``` + +`Want` is pure XS — it walks Perl's op tree to determine the calling +context (lvalue / rvalue / wantarray / assign). PerlOnJava has no port. +Without `Want`, both `Class::Accessor::Lvalue` and `Class::Accessor::Lvalue::Fast` +die at `require`, which in turn blocks `ActiveResource::Base`. + +Test failures in `Class-Accessor-Lvalue-0.11`: +- `t/lval.t`, `t/lval-fast.t`: subtests 1 (require fails) and 5–7 (the + croak diagnostics that Want would normally produce never fire). + +**Options** (in order of preference): + +A. **Pure-Perl `Want` shim** — provide just the subset `Class::Accessor::Lvalue` + actually uses: `want('LVALUE')`, `want('RVALUE')`, `want('ASSIGN')`, + `rreturn`, `lnoreturn`. Implement using `caller`/`(caller(N))[5]` for + wantarray-ish information; the LVALUE/ASSIGN paths are the hard part and + may need PerlOnJava-specific hooks (see below). + +B. **Java port of `Want`** — full implementation that introspects the + PerlOnJava op tree / call frames. 
Largest effort, but unlocks every + downstream module that uses Want (DBIx::Class::Schema::Loader, several + accessor frameworks, etc.). + +C. **Defer ActiveResource** — accept that ActiveResource is unreachable for + now and only deliver fixes #1 and #2 in this PR; track Want as a + follow-up issue. + +**Plan for this PR**: Option C. Document the `Want` blocker, link to a +follow-up ticket, and ship the two cheap wins. Want is too large to +combine with these fixes. + +**Priority**: deferred (own design doc / PR). + +--- + +## Out-of-Scope (this PR) + +- Implementing `Want`. +- The `Class::Accessor::Lvalue` test failures beyond require — they are + symptoms of #3, not independent bugs. +- `XML::Hash` t/01-apitest.t — purely a cascade from #2. +- ActiveResource's own test files — purely a cascade from #3. + +## Deliverables (this PR) + +1. `dev/modules/active_resource.md` — this document. +2. Fix #1: populate `%Encode::EXPORT_TAGS` with `all` and `default`. +3. Fix #2: SAX empty-element `undef` parity (if root cause is small; + otherwise split out to its own PR after the repro is written). +4. A regression test for fix #1 (and fix #2 if landed). + +## Progress Tracking + +### Current Status: starting + +### Completed Steps +- [ ] Plan written +- [ ] Encode `%EXPORT_TAGS` fix +- [ ] SAX empty-element repro & fix +- [ ] Regression test(s) +- [ ] PR opened + +### Open Questions +- For #2: is the empty-element discrepancy in `XML::SAX::PurePerl` (pure + Perl, easy to patch) or in the Java-backed SAX driver? +- For #3 (future PR): is option A (Perl shim) sufficient for the modules we + care about, or do we need a real Want port? + +### Next Steps +1. Create feature branch `feature/active-resource-deps`. +2. Land fix #1 with regression test. +3. Build SAX repro to scope fix #2. From dfef90dc0bef3386215c31d1e3cb9a55ea9a382a Mon Sep 17 00:00:00 2001 From: "Flavio S. 
Glock" Date: Mon, 27 Apr 2026 10:21:43 +0200 Subject: [PATCH 2/5] fix(encode): add :all and :default EXPORT_TAGS for parity with core Core Encode.pm exposes four export tags: all, default, fallbacks, fallback_all PerlOnJava only registered :fallbacks and :fallback_all, so any module doing `use Encode qw(:all)` or `qw(:default)` died at import with: "all" is not defined in %Encode::EXPORT_TAGS at (eval N) line 1. This bit at least Test::XML's t/sax.t and t/basic.t (and is the kind of thing many CPAN modules trip over). Add a small reusable helper `defineDefaultAndAllTags()` in PerlModuleBase that builds :default = @EXPORT and :all = @EXPORT + @EXPORT_OK from the EXPORT/EXPORT_OK arrays already populated. Call it from Encode.initialize() after all defineExport(...) calls so the tag arrays capture the final lists. Adds a regression test that asserts both the tag presence and that `use Encode qw(:all)` / qw(:default) succeed. Verified parity with system perl: keys %Encode::EXPORT_TAGS = (all, default, fallback_all, fallbacks) Refs dev/modules/active_resource.md (issue #1). Generated with [Devin](https://devin.ai) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> --- .../org/perlonjava/core/Configuration.java | 4 +-- .../perlonjava/runtime/perlmodule/Encode.java | 4 +++ .../runtime/perlmodule/PerlModuleBase.java | 31 +++++++++++++++++++ src/test/resources/unit/encode_export_tags.t | 30 ++++++++++++++++++ 4 files changed, 67 insertions(+), 2 deletions(-) create mode 100644 src/test/resources/unit/encode_export_tags.t diff --git a/src/main/java/org/perlonjava/core/Configuration.java b/src/main/java/org/perlonjava/core/Configuration.java index 608762a7b..184dbd309 100644 --- a/src/main/java/org/perlonjava/core/Configuration.java +++ b/src/main/java/org/perlonjava/core/Configuration.java @@ -33,7 +33,7 @@ public final class Configuration { * Automatically populated by Gradle/Maven during build. 
* DO NOT EDIT MANUALLY - this value is replaced at build time. */ - public static final String gitCommitId = "235249fad"; + public static final String gitCommitId = "a4fe7d4ca"; /** * Git commit date of the build (ISO format: YYYY-MM-DD). @@ -48,7 +48,7 @@ public final class Configuration { * Parsed by App::perlbrew and other tools via: perl -V | grep "Compiled at" * DO NOT EDIT MANUALLY - this value is replaced at build time. */ - public static final String buildTimestamp = "Apr 27 2026 10:34:30"; + public static final String buildTimestamp = "Apr 27 2026 10:20:59"; // Prevent instantiation private Configuration() { diff --git a/src/main/java/org/perlonjava/runtime/perlmodule/Encode.java b/src/main/java/org/perlonjava/runtime/perlmodule/Encode.java index 07657d9c6..6100fdbd1 100644 --- a/src/main/java/org/perlonjava/runtime/perlmodule/Encode.java +++ b/src/main/java/org/perlonjava/runtime/perlmodule/Encode.java @@ -174,6 +174,10 @@ public static void initialize() { "LEAVE_SRC", "DIE_ON_ERR", "WARN_ON_ERR", "RETURN_ON_ERR", "PERLQQ", "HTMLCREF", "XMLCREF", "STOP_AT_PARTIAL", "ONLY_PRAGMA_WARNINGS"); + // :default and :all — parity with core Encode.pm. + // Built from the @EXPORT / @EXPORT_OK lists already pushed above so + // any module doing `use Encode qw(:all)` or qw(:default) works. + encode.defineDefaultAndAllTags(); try { encode.registerMethod("encode", null); encode.registerMethod("decode", null); diff --git a/src/main/java/org/perlonjava/runtime/perlmodule/PerlModuleBase.java b/src/main/java/org/perlonjava/runtime/perlmodule/PerlModuleBase.java index 88887872d..ddafbc8d3 100644 --- a/src/main/java/org/perlonjava/runtime/perlmodule/PerlModuleBase.java +++ b/src/main/java/org/perlonjava/runtime/perlmodule/PerlModuleBase.java @@ -160,6 +160,37 @@ protected void defineExportTag(String tagName, String... symbols) { exportTags.put(tagName, tagArray.createReference()); } + /** + * Define the conventional :default and :all export tags. 
+ * + * :default mirrors @EXPORT, :all mirrors @EXPORT + @EXPORT_OK + * (parity with core modules like Encode.pm). Call this AFTER all + * defineExport(EXPORT,...) / defineExport(EXPORT_OK,...) calls so the + * tag arrays capture the final lists. + */ + protected void defineDefaultAndAllTags() { + RuntimeHash exportTags = GlobalVariable.getGlobalHash(moduleName + "::EXPORT_TAGS"); + RuntimeArray exportArray = GlobalVariable.getGlobalArray(moduleName + "::EXPORT"); + RuntimeArray exportOkArray = GlobalVariable.getGlobalArray(moduleName + "::EXPORT_OK"); + + // :default = @EXPORT + RuntimeArray defaultTag = new RuntimeArray(); + for (int i = 0; i < exportArray.size(); i++) { + RuntimeArray.push(defaultTag, new RuntimeScalar(exportArray.get(i).toString())); + } + exportTags.put("default", defaultTag.createReference()); + + // :all = @EXPORT + @EXPORT_OK + RuntimeArray allTag = new RuntimeArray(); + for (int i = 0; i < exportArray.size(); i++) { + RuntimeArray.push(allTag, new RuntimeScalar(exportArray.get(i).toString())); + } + for (int i = 0; i < exportOkArray.size(); i++) { + RuntimeArray.push(allTag, new RuntimeScalar(exportOkArray.get(i).toString())); + } + exportTags.put("all", allTag.createReference()); + } + /** * Requires a Perl module and adds it to this module's @ISA. * This allows the current module to inherit methods from the parent module. diff --git a/src/test/resources/unit/encode_export_tags.t b/src/test/resources/unit/encode_export_tags.t new file mode 100644 index 000000000..b302fbc25 --- /dev/null +++ b/src/test/resources/unit/encode_export_tags.t @@ -0,0 +1,30 @@ +#!/usr/bin/perl +use strict; +use warnings; +use Test::More tests => 8; + +# Regression test for Encode %EXPORT_TAGS parity with core Encode.pm. +# Previously PerlOnJava only registered :fallbacks and :fallback_all, +# so any module doing `use Encode qw(:all)` or qw(:default) died with +# `"all" is not defined in %Encode::EXPORT_TAGS`. 
+ +use Encode (); + +ok(exists $Encode::EXPORT_TAGS{all}, 'Encode :all tag exists'); +ok(exists $Encode::EXPORT_TAGS{default}, 'Encode :default tag exists'); +ok(exists $Encode::EXPORT_TAGS{fallbacks}, 'Encode :fallbacks tag exists'); +ok(exists $Encode::EXPORT_TAGS{fallback_all}, 'Encode :fallback_all tag exists'); + +# :default should mirror @EXPORT +my %default = map { $_ => 1 } @{ $Encode::EXPORT_TAGS{default} }; +ok($default{encode} && $default{decode}, ':default contains encode and decode'); + +# :all should be a superset of :default +my %all = map { $_ => 1 } @{ $Encode::EXPORT_TAGS{all} }; +ok($all{encode} && $all{decode}, ':all contains :default symbols'); +ok($all{FB_CROAK}, ':all contains EXPORT_OK symbols'); + +# Importing :all and :default must not die +eval "use Encode qw(:all); 1" or die $@; +eval "use Encode qw(:default); 1" or die $@; +ok(1, 'use Encode qw(:all) and qw(:default) succeed'); From fc31136b851cf40a6e1e87557c1749e6f0dff4b2 Mon Sep 17 00:00:00 2001 From: "Flavio S. Glock" Date: Mon, 27 Apr 2026 10:32:20 +0200 Subject: [PATCH 3/5] fix(xml-parser): correct Context push/pop timing for current_element MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit XML::Parser::Expat's current_element() returns $self->{Context}->[-1]. Real libexpat updates Context AFTER the user Start handler returns and BEFORE the user End handler runs, so: - inside StartTag: current_element returns the *parent* element (or undef at the root) - inside EndTag: current_element returns the parent element (or undef at the root) PerlOnJava had the opposite timing — pushed before Start and popped after End — so current_element returned the just-started/closing element instead. This broke XML::SemanticDiff: Style::Stream's doText fires Text from inside Start/End, and XML::SemanticDiff::Text uses current_element to attribute accumulated text. 
With the wrong parent, empty-element text was attributed to the new child element, turning its CData from undef into ''. Fix: in XMLParserExpat.java, move the Context push to the end of startElement() (after the user start handler) and the pop to before the user end handler. Also factor the push/pop into small helpers (pushContext / popContext) for the skip-path balancing. Verified empirically against system perl with Style => 'Stream': [Start root] cur=undef depth=0 [Text] cur=root [Start el2] cur=root depth=1 [End el2] cur=root depth=1 [Text] cur=root [End root] cur=undef depth=0 Test results: - XML::Parser bundled suite: 45 files / 434 tests still pass - XML::SemanticDiff: 2 previously-failing subtests in t/16zero_to_empty_str_cmp.t now pass; full suite green (47/47) - Adds src/test/resources/unit/xml_parser_current_element.t with 12 subtests covering Start/Text/End attribution at root and nested Refs dev/modules/active_resource.md (issue #2). Generated with [Devin](https://devin.ai) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> --- dev/modules/active_resource.md | 73 +++++++++++++---- .../org/perlonjava/core/Configuration.java | 4 +- .../runtime/perlmodule/XMLParserExpat.java | 72 +++++++++++------ .../unit/xml_parser_current_element.t | 79 +++++++++++++++++++ 4 files changed, 186 insertions(+), 42 deletions(-) create mode 100644 src/test/resources/unit/xml_parser_current_element.t diff --git a/dev/modules/active_resource.md b/dev/modules/active_resource.md index 8f71fd77e..d049db12e 100644 --- a/dev/modules/active_resource.md +++ b/dev/modules/active_resource.md @@ -62,7 +62,7 @@ our %EXPORT_TAGS = ( --- -### 2. SAX empty-element text reported as `''` instead of `undef` (MEDIUM EFFORT) +### 2. 
`XML::Parser::Expat::current_element` push/pop timing (MEDIUM EFFORT) `XML::SemanticDiff/t/16zero_to_empty_str_cmp.t` has 2 failing subtests: @@ -72,24 +72,65 @@ our %EXPORT_TAGS = ( # expected: undef ``` -The test compares `<el2>0</el2>` against `<el2></el2>` and `<el2/>`. Real Perl -yields `undef` for the new empty/self-closing element's text content; -PerlOnJava yields `''`. +The test compares `<el2>0</el2>` against `<el2></el2>` and `<el2/>`. Real +Perl yields `undef` for the new (empty/self-closing) element's accumulated +text; PerlOnJava yields `''`. -**Suspected root cause**: PerlOnJava's XML::SAX (likely the bundled -`XML::SAX::PurePerl`, or a Java-backed parser) emits a zero-length -`characters` event for empty elements, or stores `''` where real Perl leaves -the field unset, so `XML::SemanticDiff`'s `keepdata` walk sees `''` instead -of `undef`. +**Root cause** (verified): PerlOnJava's SAX bridge updates +`@{ $expat->{Context} }` at the wrong time, so `current_element` returns +the element being started/ended instead of its parent. Trace from a small +repro using `Style => 'Stream'`: + +``` +=== system perl === === jperl === +[StartTag root] current= depth=0 [StartTag root] current=root depth=1 +[Text] current=root [Text] current=el2 +[StartTag el2] current=root depth=1 [StartTag el2] current=el2 depth=2 +[EndTag el2] current=root depth=1 [EndTag el2] current=el2 depth=2 +[Text] current=root [Text] current=root +[EndTag root] current=undef [EndTag root] current=root +``` + +Effect on `XML::SemanticDiff`: + +- `XML::Parser::Style::Stream::Start` calls `doText`, which fires the + user `Text` handler with `$_ = $expat->{Text}`. In real Perl this Text + is attributed to the parent element (`current_element = root`); in + PerlOnJava it's attributed to the just-started element (`el2`). +- `XML::SemanticDiff::Text` does + `$char_accumulator->{$current_element} .= $char` (after stripping + whitespace). 
For the inter-tag `\n`, `$char` becomes `''`, so on jperl + `char_accumulator->{el2}` becomes `''`; on real Perl it stays `undef`. +- At `</el2>`, `EndElement` reads `$text = char_accumulator->{el2}` → + `''` vs `undef`, and stores it in `CData`, which surfaces as + `new_value`. + +**Fix**: in `src/main/java/org/perlonjava/runtime/perlmodule/XMLParserExpat.java`, +match libexpat's actual behaviour (which differs from the current code's +comment claim): + +- `startElement`: push to `Context` AFTER the user `startHandler` returns + (currently happens BEFORE, around line 1237–1243). +- `endElement`: pop from `Context` BEFORE the user `endHandler` runs + (currently happens AFTER, around line 1457–1465). + +**Risk**: this changes a semantic that any handler reading +`current_element` from inside Start/End would notice. Existing PerlOnJava +test files using `current_element` are: + +- `src/test/resources/module/XML-Parser/t/parament.t` — only reads + `current_element` from the Char handler (unaffected by Start/End + timing). +- `src/test/resources/module/XML-Parser/t/partial.t` — same, Char only. +- `src/test/resources/module/XML-Parser/t/astress.t` — uses `depth`/ + `element_index` from Char/End handlers; will need re-running. **Plan**: -1. Build a 5-line repro: parse `` and `0` with - `XML::SAX::ParserFactory`, dump the events. -2. Diff the event stream against system `perl`. -3. Fix the divergence in whichever of XML::SAX::PurePerl or the Java SAX - bridge is responsible. Prefer fixing the parser, not XML::SemanticDiff. -4. Re-run `t/16zero_to_empty_str_cmp.t` (and the rest of XML-SemanticDiff to - make sure no regressions). +1. Move push/pop in `XMLParserExpat.java`. +2. Run all `src/test/resources/module/XML-Parser/t/*.t` tests under jperl. +3. Run `t/16zero_to_empty_str_cmp.t` from XML-SemanticDiff to confirm fix. +4. Run `make` for full unit coverage. +5. If regressions appear, narrow further (e.g. adjust only the pop timing). 
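The push-after-Start / pop-before-End ordering described above can be sketched outside PerlOnJava. This is an illustrative example built on the JDK's SAX parser, not the code in `XMLParserExpat.java`; `ContextTimingSketch` and `trace` are hypothetical names, and the recorded strings only mimic the `cur=` / `depth=` fields used in the trace.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ContextTimingSketch {
    /** Records what a user handler would observe as "current element". */
    static List<String> trace(String xml) throws Exception {
        List<String> events = new ArrayList<>();
        Deque<String> context = new ArrayDeque<>(); // stand-in for @{ $expat->{Context} }

        DefaultHandler handler = new DefaultHandler() {
            private String current() {
                return context.isEmpty() ? "undef" : context.peek();
            }

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                // The "user Start handler" runs first and sees the parent ...
                events.add("Start:" + qName + ":cur=" + current()
                        + ":depth=" + context.size());
                // ... and only afterwards is the new element pushed.
                context.push(qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                // Pop first, so the "user End handler" sees the parent again.
                context.pop();
                events.add("End:" + qName + ":cur=" + current()
                        + ":depth=" + context.size());
            }
        };

        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return events;
    }

    public static void main(String[] args) throws Exception {
        for (String event : trace("<root><el2/></root>")) {
            System.out.println(event);
        }
    }
}
```

For `<root><el2/></root>` the recorded events are `Start:root:cur=undef:depth=0`, `Start:el2:cur=root:depth=1`, `End:el2:cur=root:depth=1`, and `End:root:cur=undef:depth=0`. Swapping the two statements in each callback gives the ordering the old code had, where the handler sees the element itself rather than its parent.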
**Priority**: MEDIUM (unblocks XML::SemanticDiff → Test::XML → XML::Hash). diff --git a/src/main/java/org/perlonjava/core/Configuration.java b/src/main/java/org/perlonjava/core/Configuration.java index 184dbd309..3d9de8aa1 100644 --- a/src/main/java/org/perlonjava/core/Configuration.java +++ b/src/main/java/org/perlonjava/core/Configuration.java @@ -33,7 +33,7 @@ public final class Configuration { * Automatically populated by Gradle/Maven during build. * DO NOT EDIT MANUALLY - this value is replaced at build time. */ - public static final String gitCommitId = "a4fe7d4ca"; + public static final String gitCommitId = "76eeee938"; /** * Git commit date of the build (ISO format: YYYY-MM-DD). @@ -48,7 +48,7 @@ public final class Configuration { * Parsed by App::perlbrew and other tools via: perl -V | grep "Compiled at" * DO NOT EDIT MANUALLY - this value is replaced at build time. */ - public static final String buildTimestamp = "Apr 27 2026 10:20:59"; + public static final String buildTimestamp = "Apr 27 2026 10:31:30"; // Prevent instantiation private Configuration() { diff --git a/src/main/java/org/perlonjava/runtime/perlmodule/XMLParserExpat.java b/src/main/java/org/perlonjava/runtime/perlmodule/XMLParserExpat.java index 5a4ae3e5d..1a3aa4bd2 100644 --- a/src/main/java/org/perlonjava/runtime/perlmodule/XMLParserExpat.java +++ b/src/main/java/org/perlonjava/runtime/perlmodule/XMLParserExpat.java @@ -1234,13 +1234,18 @@ public void startElement(String uri, String localName, String qName, elementNameScalar = new RuntimeScalar(qName); } - // Update Perl's Context array: push @{$self->{Context}}, $elementName + // NOTE: Per real libexpat behaviour, Context is updated AFTER the + // user's Start handler returns (and BEFORE the End handler runs in + // endElement, see below). This means current_element() inside the + // Start handler returns the *parent* element, matching Perl's + // XML::Parser::Expat. 
Verified against system perl with + // Style => 'Stream' (XML::SemanticDiff relies on this — its Text + // accumulator gets attributed to the parent for inter-element + // whitespace). + // + // The push is performed at the end of this method (and on the + // skip path below) so it always balances the pop in endElement. RuntimeHash selfHash = state.selfRef.hashDeref(); - RuntimeScalar contextRef = selfHash.get("Context"); - if (contextRef != null && contextRef.type != RuntimeScalarType.UNDEF) { - RuntimeArray context = contextRef.arrayDeref(); - RuntimeArray.push(context, elementNameScalar); - } // Separate specified from defaulted attributes for specifiedAttributeCount List specifiedIndices = new ArrayList<>(); @@ -1314,6 +1319,7 @@ public void startElement(String uri, String localName, String qName, // Skip if skip_until is active if (state.skipUntilIndex >= 0 && state.elementIndex < state.skipUntilIndex) { + pushContext(selfHash, elementNameScalar); return; } @@ -1353,6 +1359,33 @@ public void startElement(String uri, String localName, String qName, } } } + + // Push Context AFTER user start handler — see top-of-method note. + pushContext(selfHash, elementNameScalar); + } + + /** + * push @{$self->{Context}}, $name (no-op if Context is undef/missing). + */ + private static void pushContext(RuntimeHash selfHash, RuntimeScalar name) { + RuntimeScalar contextRef = selfHash.get("Context"); + if (contextRef != null && contextRef.type != RuntimeScalarType.UNDEF) { + RuntimeArray context = contextRef.arrayDeref(); + RuntimeArray.push(context, name); + } + } + + /** + * pop @{$self->{Context}} (no-op if Context is undef/missing/empty). 
+ */ + private static void popContext(RuntimeHash selfHash) { + RuntimeScalar contextRef = selfHash.get("Context"); + if (contextRef != null && contextRef.type != RuntimeScalarType.UNDEF) { + RuntimeArray context = contextRef.arrayDeref(); + if (context.size() > 0) { + RuntimeArray.pop(context); + } + } } /** @@ -1425,14 +1458,7 @@ public void endElement(String uri, String localName, String qName) throws SAXExc if (state.skipUntilIndex >= 0 && state.elementIndex < state.skipUntilIndex) { // Pop Context even when skipping - RuntimeHash selfHash = state.selfRef.hashDeref(); - RuntimeScalar contextRef = selfHash.get("Context"); - if (contextRef != null && contextRef.type != RuntimeScalarType.UNDEF) { - RuntimeArray context = contextRef.arrayDeref(); - if (context.size() > 0) { - RuntimeArray.pop(context); - } - } + popContext(state.selfRef.hashDeref()); return; } @@ -1441,6 +1467,14 @@ public void endElement(String uri, String localName, String qName) throws SAXExc state.skipUntilIndex = -1; } + // Pop Perl's Context array BEFORE the end handler — real libexpat + // calls the end handler with the closing element no longer in + // Context (so current_element() returns its parent). Verified + // against system perl with Style => 'Stream'. Without this, + // XML::SemanticDiff misattributes Text accumulation and reports + // empty-element CData as '' instead of undef. 
+ popContext(state.selfRef.hashDeref()); + if (state.endHandler != null) { RuntimeArray callArgs = new RuntimeArray(); RuntimeArray.push(callArgs, state.selfRef); @@ -1453,16 +1487,6 @@ public void endElement(String uri, String localName, String qName) throws SAXExc } else if (state.defaultHandler != null) { fireDefault(state, state.recognizedString); } - - // Pop Perl's Context array AFTER the end handler (matches libexpat behavior) - RuntimeHash selfHash = state.selfRef.hashDeref(); - RuntimeScalar contextRef = selfHash.get("Context"); - if (contextRef != null && contextRef.type != RuntimeScalarType.UNDEF) { - RuntimeArray context = contextRef.arrayDeref(); - if (context.size() > 0) { - RuntimeArray.pop(context); - } - } } @Override diff --git a/src/test/resources/unit/xml_parser_current_element.t b/src/test/resources/unit/xml_parser_current_element.t new file mode 100644 index 000000000..3df32433c --- /dev/null +++ b/src/test/resources/unit/xml_parser_current_element.t @@ -0,0 +1,79 @@ +#!/usr/bin/perl +use strict; +use warnings; +use Test::More tests => 12; +use XML::Parser; + +# Regression test for XML::Parser::Expat current_element / Context push-pop +# timing. Real libexpat updates Context AFTER the Start handler returns +# and BEFORE the End handler runs, so: +# - inside StartTag: current_element returns the *parent* element +# (or undef at the root) +# - inside EndTag: current_element returns the parent (or undef at root) +# +# Previously PerlOnJava pushed before Start and popped after End, which broke +# XML::SemanticDiff's empty-element CData handling (returned '' instead of +# undef). See dev/modules/active_resource.md. + +my $xml = qq{<?xml version="1.0"?>\n<root>\n<el2/>\n</root>\n}; + +my @events; +package Recorder; +sub StartDocument { } +sub StartTag { + my ($e, $name) = @_; + push @events, "Start:$name:cur=" . ($e->current_element // 'undef') + . ":depth=" . $e->depth; +} +sub EndTag { + my ($e, $name) = @_; + push @events, "End:$name:cur=" . ($e->current_element // 'undef') + . 
":depth=" . $e->depth; +} +sub Text { + my ($e) = @_; + my $text = $_; + $text =~ s/\n/\\n/g; + push @events, "Text:cur=" . ($e->current_element // 'undef') + . ":text='$text'"; +} +package main; + +XML::Parser->new(Style => 'Stream', Pkg => 'Recorder')->parse($xml); + +# Inside StartTag root: Context is still empty. +is($events[0], 'Start:root:cur=undef:depth=0', + 'StartTag of root sees empty Context (current=undef, depth=0)'); + +# After Start root returned, Context = [root]; the inter-element \n is +# attributed to root. +is($events[1], q{Text:cur=root:text='\n'}, + 'inter-element text attributed to parent (root) not the next sibling'); + +# StartTag el2 sees root as current_element (el2 not yet pushed). +is($events[2], 'Start:el2:cur=root:depth=1', + 'StartTag of el2 sees parent in Context (current=root, depth=1)'); + +# EndTag el2 sees Context already popped back to root. +is($events[3], 'End:el2:cur=root:depth=1', + 'EndTag of el2 sees parent in Context (current=root, depth=1)'); + +# Trailing \n attributed to root. +is($events[4], q{Text:cur=root:text='\n'}, + 'trailing text attributed to root'); + +# EndTag root: Context already popped to empty. +is($events[5], 'End:root:cur=undef:depth=0', + 'EndTag of root sees empty Context (current=undef, depth=0)'); + +is(scalar @events, 6, 'exactly 6 events recorded'); + +# Nested case: <a><b>x</b></a> +@events = (); +XML::Parser->new(Style => 'Stream', Pkg => 'Recorder')->parse('<a><b>x</b></a>'); + +is($events[0], 'Start:a:cur=undef:depth=0', 'nested: Start a sees empty'); +is($events[1], 'Start:b:cur=a:depth=1', 'nested: Start b sees a as parent'); +is($events[2], q{Text:cur=b:text='x'}, 'nested: Text inside b sees b'); +is($events[3], 'End:b:cur=a:depth=1', 'nested: End b sees a as parent'); +is($events[4], 'End:a:cur=undef:depth=0', 'nested: End a sees empty'); From d4b211c95d032ef359a89c6c567491cfa5f7cb59 Mon Sep 17 00:00:00 2001 From: "Flavio S. 
Glock" Date: Mon, 27 Apr 2026 10:32:39 +0200 Subject: [PATCH 4/5] =?UTF-8?q?docs(active-resource):=20update=20progress?= =?UTF-8?q?=20=E2=80=94=20fixes=20#1=20and=20#2=20landed?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- dev/modules/active_resource.md | 38 ++++++++++++++++++++++------------ 1 file changed, 25 insertions(+), 13 deletions(-) diff --git a/dev/modules/active_resource.md b/dev/modules/active_resource.md index d049db12e..1384fc71c 100644 --- a/dev/modules/active_resource.md +++ b/dev/modules/active_resource.md @@ -194,22 +194,34 @@ combine with these fixes. ## Progress Tracking -### Current Status: starting +### Current Status: fixes #1 and #2 landed; #3 (Want) deferred ### Completed Steps -- [ ] Plan written -- [ ] Encode `%EXPORT_TAGS` fix -- [ ] SAX empty-element repro & fix -- [ ] Regression test(s) -- [ ] PR opened +- [x] Plan written +- [x] Encode `%EXPORT_TAGS` fix (commit 76eeee938) +- [x] SAX `current_element` push/pop timing fix (commit 5c28802d4) +- [x] Regression tests added: + - `src/test/resources/unit/encode_export_tags.t` (8 subtests) + - `src/test/resources/unit/xml_parser_current_element.t` (12 subtests) +- [x] PR opened (#568) +- [ ] Re-run `jcpan -t XML::SemanticDiff` end-to-end +- [ ] PR review and merge +- [ ] Follow-up: design doc + ticket for `Want` shim/port + +### Verification Results +- `make` passes (all unit tests). +- Bundled `XML::Parser` test suite: 45 files / 434 tests, all pass + (no regression from the Context timing change). +- `XML::SemanticDiff` standalone: 18/18 files, 47/47 tests now pass + (2 previously-failing subtests in t/16zero_to_empty_str_cmp.t fixed). ### Open Questions -- For #2: is the empty-element discrepancy in `XML::SAX::PurePerl` (pure - Perl, easy to patch) or in the Java-backed SAX driver? -- For #3 (future PR): is option A (Perl shim) sufficient for the modules we - care about, or do we need a real Want port? 
+- For #3 (future PR): is option A (Pure-Perl Want shim) sufficient for + the modules we care about, or do we need a real Want port? ### Next Steps -1. Create feature branch `feature/active-resource-deps`. -2. Land fix #1 with regression test. -3. Build SAX repro to scope fix #2. +1. Re-run `jcpan -t XML::SemanticDiff`, then `Test::XML`, then `XML::Hash` + end-to-end to confirm the dependency chain (sans Want) is now clear. +2. Land this PR. +3. Open a follow-up issue/design doc for `Want` (Class::Accessor::Lvalue + blocker) so ActiveResource itself can eventually be reached. From 218a431f4a1e6b77daeb66350b125c6063d162f6 Mon Sep 17 00:00:00 2001 From: "Flavio S. Glock" Date: Mon, 27 Apr 2026 10:38:54 +0200 Subject: [PATCH 5/5] docs(want): add detailed port plan; link from active_resource.md Scopes a port of CPAN's Want module (the XS dep that blocks Class::Accessor::Lvalue and therefore ActiveResource). Covers: - What Want's API actually does and why wantarray isn't enough. - Why it's hard on PerlOnJava (no op tree at runtime). - Three options (Pure-Perl shim, Java port of subset, full parity) and a recommendation to start with Option A1: a runtime lvalue-context flag plus a Perl-side shim covering LVALUE/RVALUE/ASSIGN/LIST/SCALAR/VOID + rreturn/lnoreturn. - Acceptance tests, implementation checklist, and risks (re-entrancy, interpreter parity, perf). Refs dev/modules/active_resource.md (issue #3). --- dev/modules/active_resource.md | 10 +- dev/modules/want.md | 250 ++++++++++++++++++ .../org/perlonjava/core/Configuration.java | 4 +- 3 files changed, 258 insertions(+), 6 deletions(-) create mode 100644 dev/modules/want.md diff --git a/dev/modules/active_resource.md b/dev/modules/active_resource.md index 1384fc71c..6d8cd7e36 100644 --- a/dev/modules/active_resource.md +++ b/dev/modules/active_resource.md @@ -170,9 +170,10 @@ C. **Defer ActiveResource** — accept that ActiveResource is unreachable for **Plan for this PR**: Option C. 
Document the `Want` blocker, link to a follow-up ticket, and ship the two cheap wins. Want is too large to -combine with these fixes. +combine with these fixes. The detailed Want port plan lives in +[`dev/modules/want.md`](want.md). -**Priority**: deferred (own design doc / PR). +**Priority**: deferred (own design doc / PR — see `dev/modules/want.md`). --- @@ -223,5 +224,6 @@ combine with these fixes. 1. Re-run `jcpan -t XML::SemanticDiff`, then `Test::XML`, then `XML::Hash` end-to-end to confirm the dependency chain (sans Want) is now clear. 2. Land this PR. -3. Open a follow-up issue/design doc for `Want` (Class::Accessor::Lvalue - blocker) so ActiveResource itself can eventually be reached. +3. Begin work on `Want` per [`dev/modules/want.md`](want.md) so + `Class::Accessor::Lvalue` and therefore `ActiveResource` itself can + eventually be reached. diff --git a/dev/modules/want.md b/dev/modules/want.md new file mode 100644 index 000000000..637b25d2c --- /dev/null +++ b/dev/modules/want.md @@ -0,0 +1,250 @@ +# Want.pm Port Plan + +## Overview + +`Want` (Robin Houston, CPAN) is a Perl module that exposes much richer +calling-context introspection than the built-in `wantarray`. It is a hard +blocker for several CPAN modules in PerlOnJava — most visibly +`Class::Accessor::Lvalue` and therefore everything downstream of it +(e.g. `ActiveResource`). + +This document scopes a port and proposes an incremental path. 
+ +## Why we need it + +`Class::Accessor::Lvalue` (and its `::Fast` sibling) are accessor +generators that produce subroutines usable on either side of `=`: + +```perl +$obj->name # read +$obj->name = "Frank"; # assign +``` + +To do that they ask `Want` whether the call site is an lvalue, an +rvalue, or a readonly slot, and emit a clean `croak` for misuse: + +``` +'main' cannot alter the value of 'baz' on objects of class 'Foo' +``` + +Without `Want`, the whole chain breaks at `require` time: + +``` +Can't load loadable object for module Want: no Java XS implementation available +``` + +Other CPAN modules that depend on Want (incomplete list): + +- `Class::Accessor::Lvalue`, `Class::Accessor::Lvalue::Fast` +- `Sub::Curry` +- Various accessor frameworks and DSLs that overload chained method + calls (`->foo->bar` patterns) +- A long tail of small modules that use `want('LIST')` / + `want('SCALAR')` for polymorphic returns + +For the immediate goal of unblocking `ActiveResource`, only the +LVALUE / RVALUE / ASSIGN paths plus `rreturn` / `lnoreturn` matter. + +## What Want actually does + +Want is implemented as XS that walks Perl's op tree at the call site +to figure out exactly how the caller is using the return value. + +### API surface (full) + +| Call | Returns true when caller is doing | +|------------------------------|----------------------------------------------| +| `want('VOID')` | `foo();` (return value discarded) | +| `want('SCALAR')` | `$x = foo();` | +| `want('LIST')` | `@a = foo();` `(...) 
= foo();` | +| `want('BOOL')` | `if (foo())` / `while (foo())` / `!foo()` | +| `want('COUNT')` | `scalar(@a = foo())` count context | +| `want('HASH')` | `%h = foo();` / `%{ foo() }` | +| `want('ARRAY')` | `@a = foo();` / `@{ foo() }` | +| `want('CODE')` | `&{ foo() }->(...)` | +| `want('GLOB')` | `*{ foo() }` | +| `want('REFSCALAR')` | `${ foo() }` | +| `want('OBJECT')` | `foo()->bar(...)` | +| `want('OBJECT', 'IO::File')` | …and `bar` belongs to IO::File | +| `want('CHAIN', N)` | there are at least N chained method calls | +| `want('LVALUE')` | `foo() = ...` (call is on the LHS of `=`) | +| `want('RVALUE')` | call is being read, not assigned to | +| `want('ASSIGN')` | specifically the LHS of an assignment | +| `want('COUNT', N)` | repeated-context variant | + +Helpers that use the introspection to control the return: + +| Call | Effect | +|------------------------|-------------------------------------------------------| +| `rreturn(@v)` | return `@v` as an rvalue regardless of call site | +| `lnoreturn` | bail out of an lvalue call without storing anything | +| `want_ref()` | return the reftype the caller wants (HASH/ARRAY/...) | +| `wantref()` | similar, returns "HASH"/"ARRAY"/... or empty | + +### Why it's hard on PerlOnJava + +Want's implementation pokes directly at C-level Perl internals: + +- Walks the op tree from `PL_op` upward to find the nearest enclosing + `OP_ENTERSUB`, `OP_AASSIGN`, `OP_RV2HV`, etc. +- Reads context flags (`G_VOID`, `G_SCALAR`, `G_ARRAY`) from the + caller's stack frame. +- For `LVALUE`/`ASSIGN`, looks at whether the parent op is a left-hand + side of `=`/`+=`/`||=`/etc. and whether the function call is in + `OPf_MOD` modify-context. + +PerlOnJava has no op tree at runtime — Perl source is compiled to JVM +bytecode, so there is nothing to walk. The information Want needs has +to be reconstructed from the JVM call frame and from compile-time +information that PerlOnJava chooses to thread through. 
+ +## Proposed approach + +Three options, in increasing cost: + +### Option A — Pure-Perl `Want` shim (MVP) + +Ship a hand-written `lib/Want.pm` that implements only the subset +real users hit. Specifically: + +- `want('LVALUE')` / `want('RVALUE')` / `want('ASSIGN')` +- `want('LIST')` / `want('SCALAR')` / `want('VOID')` (these can be + built on top of `wantarray`) +- `rreturn` / `lnoreturn` +- `want('BOOL')` (if cheap) + +The hard part is LVALUE/ASSIGN detection. Two sub-options: + +**A1. Hook through PerlOnJava core.** Add a small bit of state in the +runtime that tracks "the current sub call is on the LHS of an `=`" +and expose it to Perl-land via an `Internals::Want::*` helper. The +bytecode emitter for assignments already knows whether the LHS is a +sub call; we add a thread-local or call-frame-local "lvalue context" +flag set by the assignment op and read by `Want.pm`. + +**A2. Compile-time pragma.** Use a source filter / AST rewriter so +that calls to known `Want`-using subs get tagged with a context hint. +More invasive, less general. + +A1 is the cleaner direction. + +**Coverage**: enough for `Class::Accessor::Lvalue::Fast`, +`Class::Accessor::Lvalue`, and most accessor-style users. Not enough +for Want's more exotic chain-walking or `OBJECT('Pkg')` queries. + +**Estimated effort**: medium. Maybe ~300 lines split between Java +(the lvalue-context flag) and Perl (the Want shim itself). + +### Option B — Java port of a curated subset + +Same surface as Option A but implemented natively in Java for +performance and tighter integration. Adds an `XSLoader::load('Want', …)` +target that fronts a Java module. + +Better long-term home; a bit more upfront work because we need to +plumb the lvalue/wantarray-extended info through `RuntimeContextType` +and friends. + +### Option C — Full Want parity + +Implement the entire Want API including OBJECT/CHAIN/REFSCALAR +variants.
Requires PerlOnJava to either reconstruct an op-tree-like +structure at compile time or thread enough info through the runtime +to answer all of Want's questions. + +Largest effort and the boundary is genuinely fuzzy — some of Want's +behaviour leaks Perl-internals semantics that don't have a clean +translation in a non-op-tree runtime. + +## Recommendation + +**Land Option A1** as the first step. It is the cheapest +"unblock-real-users" path: + +1. Adds a small lvalue-context flag to PerlOnJava's call mechanism. +2. Ships a Pure-Perl `lib/Want.pm` covering LVALUE/RVALUE/ASSIGN/ + LIST/SCALAR/VOID/BOOL and `rreturn`/`lnoreturn`. +3. Targets `Class::Accessor::Lvalue` and `ActiveResource` as the + acceptance tests. + +If subsequent users need `OBJECT`/`CHAIN`/`HASH` introspection, treat +each one as a follow-up that grows the shim incrementally. + +## Acceptance tests + +The port is "done enough" when: + +1. `Class::Accessor::Lvalue` test suite passes (or all failures are + confined to features Want's full API would support but our shim + doesn't, and these are documented). +2. `ActiveResource`'s own `t/base.t`, `t/connection.t`, `t/simple.t` + load without dying at `require Class::Accessor::Lvalue::Fast`. +3. A new `src/test/resources/unit/want_basics.t` exercises: + - `$x = foo()` — `want('LVALUE')` false, `want('RVALUE')` true + - `foo() = 42` — `want('LVALUE')` true, `want('ASSIGN')` true + - `@a = foo()` / `$x = foo()` / `foo()` — list/scalar/void + - `rreturn(...)` short-circuits and returns scalar even from an + `@a = foo()` call site + - `lnoreturn` exits a sub used as `foo() = 42` cleanly + +## Implementation checklist + +### A1 (proposed) + +- [ ] Add a per-call-frame "lvalue target" flag to PerlOnJava's + call mechanism. Source of truth is the bytecode emitter for + `OP_AASSIGN` / `OP_SASSIGN` when the LHS resolves to a sub call. +- [ ] Expose `Internals::Want::is_lvalue_call()` (and a couple of + cousin helpers) from Java. 
+- [ ] Write `src/main/perl/lib/Want.pm` implementing the public + API on top of `wantarray` + the new internals helper. +- [ ] Add `src/test/resources/unit/want_basics.t` (regression). +- [ ] Run `Class-Accessor-Lvalue-0.11` test suite under `jperl`, + iterate until clean. +- [ ] Update `dev/modules/active_resource.md` to mark issue #3 + resolved and re-run the full `jcpan -t ActiveResource` chain. + +### Out of scope (for now) + +- `want('OBJECT', 'Pkg')` / `want('CHAIN', N)` — call-stack and + method-resolution introspection; defer until a real user asks. +- `want_ref` / `wantref` — easy to add later but no current consumer. + +## Risks / Open Questions + +- **LVALUE detection precision**: PerlOnJava already has limited + lvalue-sub support. We need to make sure the new context flag is + set for *all* lvalue call sites the bytecode emitter generates, + not just the obvious ones (e.g. assignment-via-modify ops like + `+=`, `||=`, list-assign-into-sub). +- **Re-entrancy**: the flag must be associated with the specific + call frame, not global state — recursion and nested calls must + not see each other's lvalue context. +- **Interpreter parity**: PerlOnJava has both JVM-bytecode and + interpreter backends. The lvalue-context flag must work + identically on both. Good test target for the + `interpreter-parity` skill. +- **Performance**: setting a flag on every sub call has a cost. + Worth measuring on the existing benchmarks before/after. + +## Progress Tracking + +### Current Status: scoping / design + +### Completed Steps +- [ ] Design doc reviewed +- [ ] A1 proof-of-concept on a feature branch +- [ ] Class::Accessor::Lvalue passes +- [ ] ActiveResource passes (sans network I/O) +- [ ] Want.pm shim merged + +### Open Questions +- A1 vs A2 vs B: confirm A1 (runtime hook + Perl shim) is the right + starting point. +- Naming: `Internals::Want::*` vs a private `B::*`-style module? 
+ +### Related Docs +- `dev/modules/active_resource.md` — the user-visible blocker that + prompted this plan. +- `dev/architecture/` — PerlOnJava call-frame / lvalue documentation + (TODO: link specific files once located). diff --git a/src/main/java/org/perlonjava/core/Configuration.java b/src/main/java/org/perlonjava/core/Configuration.java index 3d9de8aa1..42b748df8 100644 --- a/src/main/java/org/perlonjava/core/Configuration.java +++ b/src/main/java/org/perlonjava/core/Configuration.java @@ -33,7 +33,7 @@ public final class Configuration { * Automatically populated by Gradle/Maven during build. * DO NOT EDIT MANUALLY - this value is replaced at build time. */ - public static final String gitCommitId = "76eeee938"; + public static final String gitCommitId = "596232878"; /** * Git commit date of the build (ISO format: YYYY-MM-DD). @@ -48,7 +48,7 @@ public final class Configuration { * Parsed by App::perlbrew and other tools via: perl -V | grep "Compiled at" * DO NOT EDIT MANUALLY - this value is replaced at build time. */ - public static final String buildTimestamp = "Apr 27 2026 10:31:30"; + public static final String buildTimestamp = "Apr 27 2026 10:42:43"; // Prevent instantiation private Configuration() {