feat(pod): bundle Pod::Html + regex parity fix; module porting plans#557

Merged
fglock merged 3 commits into master from plan/math-int64-port on Apr 25, 2026


fglock commented Apr 25, 2026

Summary

Module-port plans + first implementation: bundles Pod::Html, fixes the regex bug it depends on, and lands accompanying %Config improvements. Math::Int64 remains plan-only for a future PR.

What's implemented in this PR

Phase 0 — Regex ^/m/g parity fix (general infrastructure)

In RuntimeRegex.matchRegexDirect, the LIST-context global-match loop was calling matcher.region(startPos, ...) after every non-zero-length match. A java.util.regex.Matcher uses anchoring bounds by default (useAnchoringBounds(true)), so after each region(...) call ^ matched at the artificial region start even when that offset wasn't actually preceded by \n. Result:

"ab\ncd\n" =~ /^(.*)/mg
# perl  : 2 matches  ("ab", "cd")
# jperl : 4 matches  ("ab", "", "cd", "")   <-- before this PR

This silently corrupted any line-walking idiom under /^...$/mg, including the dedent in Pod::Html::Util::trim_leading_whitespace, which is why verbatim blocks rendered incorrectly.
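A minimal standalone Java sketch (not the actual RuntimeRegex code) reproduces the mechanism: it re-seeks with region() before every find(), as the old loop did, and varies only the anchoring-bounds setting. The class name CaretRegionDemo and the simplified "step one character past a zero-length match" rule are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical repro class; not part of PerlOnJava.
final class CaretRegionDemo {

    // Collects group(1) of every match of /^(.*)/m, calling region() before
    // each find() the way the pre-fix loop did. Only anchoring bounds vary.
    static List<String> collect(String input, boolean anchoringBounds) {
        Matcher m = Pattern.compile("^(.*)", Pattern.MULTILINE).matcher(input);
        // region() resets the matcher but leaves this setting unchanged.
        m.useAnchoringBounds(anchoringBounds);
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos <= input.length()) {
            m.region(pos, input.length());
            if (!m.find()) {
                break;
            }
            out.add(m.group(1));
            // Simplified advance: step one char past a zero-length match.
            pos = (m.end() == m.start()) ? m.end() + 1 : m.end();
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "ab\ncd\n";
        System.out.println(collect(s, true));   // region start also satisfies ^
        System.out.println(collect(s, false));  // ^ only after a real \n
    }
}
```

With anchoring bounds on, the region start left behind after "ab" and after "cd" satisfies ^, producing the two extra empty matches; with them off, Java's MULTILINE ^ looks at the character before the offset in the full input, which is the Perl behaviour.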

Fix:

  • Tighten the predicate so matcher.region(...) only runs when the engine forcibly advances past a zero-length match (the matchEnd = matchStart + 1 path); in every other case Java's find() already continues from end() naturally.
  • Add matcher.useAnchoringBounds(false) at the remaining region() call sites so ^/$ only anchor at real \n line boundaries in the input string.
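The loop shape the two bullets describe can be sketched in standalone Java. This is a simplified illustration under stated assumptions, not the real matchRegexDirect: the class and method names are hypothetical, find() is left to resume from end() on its own, region() is used only for the forced one-character advance, and anchoring bounds are turned off so a region edge never satisfies ^ or $.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the post-fix global-match loop shape.
final class GlobalMatchLoop {

    static List<String> matchAll(Pattern p, String input) {
        Matcher m = p.matcher(input);
        // ^ and $ should only see real line boundaries, never region edges.
        m.useAnchoringBounds(false);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.groupCount() > 0 ? m.group(1) : m.group());
            if (m.end() == m.start()) {
                // Zero-length match: force an advance of one character.
                int next = m.end() + 1;
                if (next > input.length()) {
                    break;
                }
                m.region(next, input.length());
            }
            // Otherwise find() resumes at end() without touching region().
        }
        return out;
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("^(.*)", Pattern.MULTILINE);
        System.out.println(matchAll(p, "ab\ncd\n"));
        System.out.println(matchAll(p, "a\n\nb"));
    }
}
```

The second input exercises the zero-length path: the empty line yields "", the region jumps past it, and because anchoring bounds are off, ^ at the new region start still holds only because it really follows a \n.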

New unit test (src/test/resources/unit/regex/regex_caret_multiline_global.t, 15 subtests) covers the canonical idioms.

Phase 1 — Bundle Pod::Html

Pod::Html is dual-life: CPAN ships it only inside the full perl source tarball, so jcpan -t Pod::Html is structurally a dead end (it tries to run perl's Configure shell script). It is bundled via dev/import-perl5/sync.pl instead:

  • Added perl5/ext/Pod-Html/lib/Pod entry to dev/import-perl5/config.yaml → imports Pod/Html.pm (1.36) and Pod/Html/Util.pm.
  • Copied upstream t/ and corpus/ into src/test/resources/module/Pod-Html/.
  • All 18 upstream tests pass under make test-bundled-modules.

Config cosmetic fix (folded in)

$Config{perladmin}, $Config{cf_email}, $Config{cf_by}, and $Config{myhostname} are now populated from the running JVM's user.name and Sys::Hostname (real perl gets these from Configure-time probing). They show up in pod2html's <link rev="made" href="mailto:user@host"> tag and in test fixtures that interpolate $Config{perladmin}. This was originally tracked as Phase 3 of the Pod::Html plan; it is needed in this PR so that feature2.t passes without spurious "Use of uninitialized value" warnings.
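For illustration, the JVM-side inputs can be read with standard Java APIs. This sketch assumes java.net.InetAddress as a stand-in for Sys::Hostname and builds the user@host string used in the mailto: link; the class name and the "localhost" fallback are hypothetical and not necessarily what PerlOnJava does.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical sketch: derive a perladmin-style "user@host" from the JVM.
final class ConfigAdminSketch {

    static String hostname() {
        try {
            return InetAddress.getLocalHost().getHostName();
        } catch (UnknownHostException e) {
            return "localhost"; // assumed fallback when the lookup fails
        }
    }

    static String perladmin() {
        String user = System.getProperty("user.name", "unknown");
        return user + "@" + hostname();
    }

    public static void main(String[] args) {
        // The value pod2html would place after "mailto:" in its link tag.
        System.out.println("mailto:" + perladmin());
    }
}
```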

Plan documents

| Doc | Status |
|---|---|
| dev/modules/math_int64.md | plan only (next PR) |
| dev/modules/pod_html.md | implemented in this PR; see "Progress Tracking" section |

Test plan

  • make (full unit suite) — green.
  • make test-bundled-modules — green (all bundled modules including new Pod-Html subtree).
  • JPERL_TEST_FILTER=Pod-Html make test-bundled-modules — 18/18 tests pass.
  • ./jperl -e 'use Pod::Html; print "v=$Pod::Html::VERSION ok\n"' prints v=1.36 ok.
  • New regex parity unit test passes on both ./jperl and system perl (locks in cross-engine behaviour).

Generated with Devin

fglock and others added 2 commits April 25, 2026 09:04
Plan-only design doc in dev/modules/math_int64.md covering:

- Why no Maven dep is needed (java.lang.Long, ByteBuffer, SecureRandom,
  Math.*Exact cover the entire Int64.xs surface, signed and unsigned).
- Three XS MODULE blocks (miu64_, mi64, mu64) mapped to a single
  MathInt64.java with an Int64Holder pattern matching dev/modules/bit_vector.md.
- Reuse of upstream lib/Math/Int64.pm and the two pure-Perl pragmas
  (die_on_overflow, native_if_available) unchanged.
- Six implementation phases tied to the upstream .t files.
- Phase 0 prerequisite (separate PR): make ExtUtils::CBuilder fail
  loudly when Config{cc}=javac and fix the relative archlibexp /
  empty obj_ext issues uncovered while investigating
  `jcpan -t Math::Int64`.
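The first bullet's claim that plain JDK classes cover signed and unsigned 64-bit arithmetic can be spot-checked with a short Java sketch. It only exercises java.lang.Long, Math.addExact, ByteBuffer and SecureRandom; how a MathInt64.java would wire these to the miu64_/mi64/mu64 XS entry points is not shown, and the class and helper names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.security.SecureRandom;

// Hypothetical sketch: JDK building blocks for a Math::Int64/UInt64 port.
final class Int64Sketch {

    // Unsigned decimal view of the same 64 bits (Math::UInt64 semantics).
    static String toUnsigned(long v) {
        return Long.toUnsignedString(v);
    }

    static long udiv(long a, long b) {
        return Long.divideUnsigned(a, b);
    }

    // Signed overflow detection; one possible basis for die_on_overflow.
    static boolean addOverflows(long a, long b) {
        try {
            Math.addExact(a, b);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    // Fixed-width 8-byte big-endian encoding and decoding.
    static byte[] packBigEndian(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    static long unpackBigEndian(byte[] b) {
        return ByteBuffer.wrap(b).getLong();
    }

    public static void main(String[] args) {
        System.out.println(toUnsigned(-1L));
        System.out.println(udiv(-1L, 2L));
        System.out.println(addOverflows(Long.MAX_VALUE, 1L));
        System.out.println(unpackBigEndian(packBigEndian(-42L)));
        // Random 64-bit value from a cryptographic source.
        System.out.println(Long.toHexString(new SecureRandom().nextLong()));
    }
}
```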

No implementation yet.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Plan-only design doc in dev/modules/pod_html.md covering two phases:

Phase 0 — Regex `^/m/g` fix (general infrastructure):
- Diagnoses a bug in RuntimeRegex.matchRegexDirect where, in LIST
  context global matches, matcher.region(startPos, ...) is called
  after every non-zero-length match (the `startPos > matchStart`
  predicate is always true, despite a comment claiming otherwise).
- Java's Matcher uses anchoring bounds by default (useAnchoringBounds(true)),
  so after region() ^ matches at the artificial region boundary even
  when that offset is not actually preceded by \n. Result:
    "ab\ncd\n" =~ /^(.*)/mg yields 4 matches in jperl vs 2 in perl.
- Verified the fix with a direct Java repro:
  matcher.useAnchoringBounds(false) after each region() call restores
  Perl-compatible ^/$ semantics.
- Includes a reduced unit-test outline.

Phase 1 — Bundle Pod::Html:
- Pod::Html is dual-life and only shipped on CPAN inside the full
  perl source tarball, so `jcpan -t Pod::Html` is structurally a
  dead end. Plan to add it via dev/import-perl5/sync.pl.
- All dependencies (Pod::Simple{,::XHTML,::SimpleTree,::Search},
  Text::Tabs, etc.) already work in PerlOnJava.
- 13 of 16 substantive upstream tests already pass against the
  in-tree code; the 3 failures all trace back to the Phase 0 regex
  bug via Pod::Html::Util::trim_leading_whitespace.

No implementation yet.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
fglock changed the title from "docs(modules): plan for porting Math::Int64 / Math::UInt64" to "docs(modules): porting plans (Math::Int64, Pod::Html, ...)" on Apr 25, 2026
Implements dev/modules/pod_html.md.

Phase 0 — Regex ^ in /m mode under /g.

In LIST-context global matches, matcher.region(startPos, ...) was
called after every non-zero-length match. Java's Matcher uses
anchoring bounds by default (useAnchoringBounds(true)), so after
region() ^ matched at the artificial region boundary even when that
offset is not actually preceded by \n. Result:

    "ab\ncd\n" =~ /^(.*)/mg yielded 4 matches in jperl, 2 in perl.

This silently corrupted any line-walking idiom that combines ^/$
under /m with /g — including Pod::Html::Util::trim_leading_whitespace,
which is why Pod::Html's verbatim-block dedenting was broken.

Fix in RuntimeRegex.matchRegexDirect:
- Tighten the predicate so matcher.region(...) is only called when
  the engine forcibly advanced past a zero-length match (the
  matchEnd = matchStart + 1 path); in every other case Java's
  find() already continues from end() naturally.
- Add matcher.useAnchoringBounds(false) at the remaining
  region() call sites (the initial pos()-based seek and the
  zero-length-advance redirect), restoring Perl's ^/$ semantics.

New unit test src/test/resources/unit/regex/regex_caret_multiline_global.t
covers the canonical line-walking forms (15 subtests).

Phase 1 — Bundle Pod::Html.

Pod::Html is dual-life and CPAN ships it only inside the full perl
source tarball, so jcpan -t Pod::Html is structurally a dead end.
Bundle it via dev/import-perl5/sync.pl instead:

- Add perl5/ext/Pod-Html/lib/Pod entry to dev/import-perl5/config.yaml
  (imports Pod/Html.pm 1.36 and Pod/Html/Util.pm).
- Copy upstream t/ and corpus/ into src/test/resources/module/Pod-Html/.
- All 18 upstream tests pass under `make test-bundled-modules`.

Cosmetic Config fix folded in (needed for feature2.t):

- Config{perladmin}, Config{cf_email}, Config{cf_by}, Config{myhostname}
  are now populated from the running JVM's user.name + Sys::Hostname
  (real perl gets these from Configure-time autoconf probing). They
  show up in pod2html's <link rev="made" href="mailto:..."> tag and
  in test fixtures that interpolate $Config{perladmin}.

All unit tests pass. All bundled module tests pass.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
fglock changed the title from "docs(modules): porting plans (Math::Int64, Pod::Html, ...)" to "feat(pod): bundle Pod::Html + regex parity fix; module porting plans" on Apr 25, 2026
fglock merged commit 2c57f04 into master on Apr 25, 2026
2 checks passed
fglock deleted the plan/math-int64-port branch on April 25, 2026 08:10