Add XML::Parser Java XS implementation (JDK SAX backend) #457
Merged
Implements XML::Parser::Expat as a Java XS module using JDK's built-in SAX parser instead of the native expat C library.
Key features:
- Full SAX-based parsing with Start/End/Char/PI/Comment handlers
- Namespace support using dualvar scalars (string=localname, int=ns_index) matching expat's gen_ns_name() dual PV/IV behavior
- XMLDecl, element/attlist declaration handlers
- Namespace prefix tracking (new_ns_prefixes, expand_ns_prefix, current_ns_prefixes)
- Error string mapping, ExpatVersion, security API stubs
- Byte position tracking via accumulated token lengths
- CPAN::Distribution helpers for XS module fallback installation
Test results: 24 of 47 XML::Parser tests pass, including all 15 namespace tests.
No unit test regressions.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
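As a rough illustration of the SAX-based handler dispatch described above, the following sketch feeds a document to the JDK's built-in SAXParser and records Start/End/Char/PI events in order. The class name SaxEventSketch and the event-string format are hypothetical and are not taken from XMLParserExpat.java; Comment events would additionally need a LexicalHandler, which is omitted here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: collects SAX callbacks as strings, in document order.
final class SaxEventSketch {
    static List<String> events(String xml) {
        List<String> out = new ArrayList<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // so localName is populated
            SAXParser parser = factory.newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void startElement(String uri, String local, String qName, Attributes atts) {
                    out.add("Start:" + local);
                }
                @Override
                public void endElement(String uri, String local, String qName) {
                    out.add("End:" + local);
                }
                @Override
                public void characters(char[] ch, int start, int length) {
                    // SAX may split one text run across several calls.
                    out.add("Char:" + new String(ch, start, length));
                }
                @Override
                public void processingInstruction(String target, String data) {
                    out.add("PI:" + target);
                }
            });
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(events("<a><?pi x?><b/></a>"));
    }
}
```

A real backend would invoke the registered Perl handlers instead of appending to a list; the list only makes the callback order visible.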
Plan to implement XML::Parser::Expat as a Java XS class using the JDK built-in javax.xml.parsers.SAXParser as the XML engine. No new Maven dependencies required.
Covers: 47 test files, 55 XS functions, 20 handler types, 5 phases.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Major improvements to XML::Parser Java XS implementation:
- Fix UTF-8 double-encoding: use ISO_8859_1 for BYTE_STRING input
to avoid re-encoding raw UTF-8 bytes (fixes utf8_handling.t,
debug_multibyte.t - 32 tests)
- Fix specified vs defaulted attributes: use Attributes2.isSpecified()
to separate and reorder attributes, matching expat convention
(fixes defaulted.t - 4 tests)
- Fix error messages: format SAX errors as not well-formed (invalid
token) with escaping hints, matching libexpat output format
(fixes error_hint.t - 5 tests)
- Fix systemId resolution: un-resolve SAX-resolved absolute URIs back
to relative paths by tracking parseBaseUri on InputSource
(fixes decl.t tests 5/35 - 44 tests now pass)
- Fix string interpolation: support ${ref}{key} subscript access
after braced variable expressions in double-quoted strings
(fixes styles.t Objects style - 11 tests)
- Fix IO handle class detection: treat GLOB ref class as IO::Handle
for input_record_separator calls (fixes stream.t partial)
- Fix MakeMaker BASEEXT scanning: recursively find .pm files in the
module base directory for Style submodule installation
- Fix extern_ent_lexical_glob.t: handle file:/path compact URI form
XML::Parser test results: 35/47 files pass (74%), 365/385 subtests (95%)
Previously: 29/47 files, ~262/308 subtests
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
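The UTF-8 double-encoding fix in the list above can be illustrated with a minimal sketch. It assumes a byte string is held as a Java String in which every char carries one byte value (0..255); the class and method names are hypothetical. Encoding such a string with ISO_8859_1 returns the original bytes unchanged, whereas encoding it with UTF_8 re-encodes each byte as a character.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of byte-preserving conversion for byte strings.
final class ByteStringSketch {
    // Each char maps back to exactly one byte, so raw UTF-8 input survives intact.
    static byte[] rawBytes(String byteString) {
        return byteString.getBytes(StandardCharsets.ISO_8859_1);
    }

    // The problematic path: bytes >= 0x80 each expand to two bytes.
    static byte[] doubleEncoded(String byteString) {
        return byteString.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "é" in UTF-8 is the two bytes C3 A9, stored here one byte per char.
        String byteString = "\u00C3\u00A9";
        System.out.println(rawBytes(byteString).length);      // 2
        System.out.println(doubleEncoded(byteString).length); // 4
        System.out.println(new String(rawBytes(byteString), StandardCharsets.UTF_8));
    }
}
```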
Documents architecture, test status (35/47 pass, 95% subtests), known limitations, and TODO items including self-closing tag column recognition fix.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Fixes:
- Stream delimiter parsing: read line-by-line via readline() respecting $/ set by Expat.pm, enabling resumable delimited stream parsing
- Self-closing tag detection: scan inputBytes to detect <foo/> vs <foo></foo> for correct column tracking in both start and end handlers
- Entity expansion tracking: use startEntity/endEntity from LexicalHandler to set original_string to unexpanded entity ref (e.g. "&draft.day;")
- ExternEntFin handler: now called for both filehandle and string returns from ExternEnt handler
- Element index stack: maintain per-element index via push/pop so element_index returns same value in start and end handlers
- ProtocolEncoding: store and apply encoding from ParserCreate to InputSource, fixing ISO-8859-1 encoded documents
- PositionContext: implement position_in_context() returning surrounding lines and correct linepos for pointer insertion
- ParseParamEnt: conditionally enable external-parameter-entities and load-external-dtd SAX features based on ParseParamEnt option
- Entity resolver: preserve systemId on returned InputSource so SAX can resolve relative references within external DTDs
- Context pop order: pop Context array AFTER end handler callback, matching libexpat behavior for depth() consistency
Test results: 41/47 files pass (377/397 subtests, 95.0%)
Newly passing: astress.t, g_void.t, partial.t, stream.t, position_overflow.t, parament_internal.t
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
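The element index stack and the pop-after-callback ordering from the list above might look roughly like the sketch below. The class ElementIndexSketch, its log format, and the choice to start numbering at 1 are assumptions made for illustration; the actual bookkeeping in XMLParserExpat.java is not shown in this PR text.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: each element gets an index at start, and the same
// index is still on top of the stack while its end handler runs.
final class ElementIndexSketch {
    private int nextIndex = 0;                        // assumed to count from 1
    private final Deque<Integer> open = new ArrayDeque<>();
    final List<String> log = new ArrayList<>();

    void start(String name) {
        open.push(++nextIndex);                       // push before the start callback
        log.add("start " + name + " #" + open.peek());
    }

    void end(String name) {
        log.add("end " + name + " #" + open.peek());  // callback sees the element's own index
        open.pop();                                   // pop only AFTER the end callback
    }

    static List<String> demo() {
        ElementIndexSketch s = new ElementIndexSketch();
        s.start("a");
        s.start("b");
        s.end("b");
        s.start("c");
        s.end("c");
        s.end("a");
        return s.log;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

Popping before the callback would make the end handler of b observe the index of a, which is the kind of mismatch the push/pop fix avoids.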
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Documents approach for handling expat-specific encoding names (x-sjis-unicode -> Shift_JIS) that JDK SAX does not support natively. Covers encoding.t, parament.t, and decl.t test improvements.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
xml_parser.md now points to dev/design/xml_parser_xs.md as the single source of truth for progress tracking and TODOs. Updated stale status from 'Not yet started' to '41/47 tests pass (95%)'.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Phase 4 encoding conversion:
- Map expat-specific encoding names (x-sjis-unicode, x-euc-jp-unicode) to JDK charsets (Shift_JIS, EUC-JP)
- Pre-parse encoding detection and byte re-encoding to UTF-8
- Applied in ParseString, ParseStream, ParseDone, resolveEntity, doParse
Tail call trampoline in RuntimeCode.apply():
- Handle goto &func returning TAILCALL control flow from static callers
- Needed for XML::Parser initial_ext_ent_handler which uses goto &func
Test results: 43/47 files pass (97.7% subtests), up from 41/47 (95%)
- encoding.t: 0 -> 43/43
- parament.t: 4/13 -> 13/13
Generated with Devin (https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
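A minimal sketch of the encoding-name mapping and re-encoding step, under stated assumptions: only the two name pairs listed above are mapped, the class and method names are hypothetical, and rewriting the encoding declaration inside the document (which a real pre-parse step would presumably also need) is left out.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: translate expat-specific encoding names to JDK
// charset names, then re-encode the input bytes as UTF-8 before parsing.
final class EncodingMapSketch {
    private static final Map<String, String> EXPAT_TO_JDK = Map.of(
            "x-sjis-unicode", "Shift_JIS",
            "x-euc-jp-unicode", "EUC-JP");

    // Unknown names pass through unchanged.
    static String jdkName(String declared) {
        return EXPAT_TO_JDK.getOrDefault(declared.toLowerCase(Locale.ROOT), declared);
    }

    // Decode with the mapped charset, then encode as UTF-8.
    static byte[] toUtf8(byte[] input, String declared) {
        Charset source = Charset.forName(jdkName(declared));
        return new String(input, source).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(jdkName("x-sjis-unicode"));
        System.out.println(toUtf8(new byte[] {(byte) 0xE9}, "ISO-8859-1").length);
    }
}
```

Charset.forName throws if the runtime does not ship the requested charset, so a production version would need a fallback or a clear error for that case.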
Document root causes, exact line numbers, and suggested fixes for:
- decl.t: NOTATION type off-by-one bug (substring(8) should be 9) and missing XMLDecl for external entity text declarations
- foreign_dtd.t: UseForeignDTD not implemented, with 3 approaches
- checklib_findcc.t: stub inc/Devel/CheckLib.pm lacks source patterns
- checklib_tmpdir.t: same stub, missing tempfile/mktemp calls
Generated with Devin (https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Phase 5 final fixes:
- NOTATION type format: fix off-by-one (substring(8) -> substring(9))
- XMLDecl text declarations: fire handler from resolveEntity() for external entity text declarations (version=undef, original encoding)
- UseForeignDTD: synthesize ExternEnt handler call with undef sysid/pubid, inject DOCTYPE with synthetic system ID, resolve in resolveEntity()
- Error messages: map SAX 'was referenced, but not declared' to expat 'undefined entity' format
- Devel::CheckLib: replaced stub with real upstream source (not tracked)
Generated with Devin (https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
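The NOTATION off-by-one can be reproduced in isolation. SAX's DeclHandler.attributeDecl reports such a type as the word NOTATION, a space, and a parenthesized token group, so the prefix to skip is nine characters, not eight. The helper below is a hypothetical sketch; how the PR reformats the remaining token group for the Perl handler is not shown here.

```java
// Hypothetical sketch of stripping the "NOTATION " prefix from a SAX attribute type.
final class NotationTypeSketch {
    private static final String PREFIX = "NOTATION "; // 8 letters plus one space = 9 chars

    static String tokenGroup(String saxType) {
        // Using the prefix length avoids hard-coding 8 or 9.
        return saxType.startsWith(PREFIX) ? saxType.substring(PREFIX.length()) : saxType;
    }

    public static void main(String[] args) {
        String type = "NOTATION (gif|png)";
        System.out.println("[" + type.substring(8) + "]"); // keeps the stray leading space
        System.out.println("[" + tokenGroup(type) + "]");
    }
}
```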
Add XML::Parser 2.56 PM files to bundled lib:
- XML/Parser.pm (upstream, unmodified)
- XML/Parser/Style/{Debug,Objects,Stream,Subs,Tree}.pm
- XML/Parser/LWPExternEnt.pl (optional LWP entity handler)
XML::Parser::Expat.pm (Java SAX-backed shim) was already bundled.
All dependencies (Carp, XSLoader, File::Spec, IO::Handle, etc.)
were already bundled. No new dependencies needed.
Update docs:
- README.md: add XML::Parser to module list
- changelog.md: add to v5.42.3 module list
- feature-matrix.md: add to non-core modules section
Generated with Devin (https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Add a JUnit 5 test runner (ModuleTestExecutionTest.java) for bundled
CPAN module tests. Tests live under src/test/resources/module/{Name}/
and are executed with chdir to the module directory so relative paths
resolve correctly.
- 45 XML::Parser tests stored in module/XML-Parser/{t/,samples/}
(2 Devel::CheckLib C compiler tests excluded as irrelevant)
- Gradle task 'testModule' with @tag("module") filter
- Makefile target 'make test-bundled-modules'
- .gitignore exception for src/test/resources/module/*/t/
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
…script
In Perl, explicit braces in ${var} terminate the variable name, so
subsequent [...] should be a character class (regex) or literal text
(string), not an array subscript. Only deref expressions like
${$ref}[0] should parse subscripts after braces.
Before: qr/${var}[$idx]/ → treated [$idx] as $var[$idx] (wrong)
After: qr/${var}[$idx]/ → scalar $var + char class [$idx] (correct)
This also fixes "${arr}[0]" in strings, which was incorrectly
producing $arr[0] instead of $arr followed by literal "[0]".
Fixes unit/regex/array_element_strict_vs_nonstrict.t CI failure.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Summary
- XML::Parser::Expat as a Java XS class (XMLParserExpat.java) using JDK's built-in SAX parser
- Parser.pm (pure Perl): only replace the XS backend
- Expat.pm shim in jar:PERL5LIB that delegates to Java via XSLoader
Key features implemented
Remaining (6 test files)
Test plan
- make passes (all unit tests)
Replaces #455 (branch consolidation).