Skip to content

Fix utf8::valid() for CPAN::Meta parsing#346

Merged
fglock merged 16 commits into
masterfrom
fix/utf8-valid-cpan-meta
Mar 21, 2026
Merged

Fix utf8::valid() for CPAN::Meta parsing#346
fglock merged 16 commits into
masterfrom
fix/utf8-valid-cpan-meta

Conversation

@fglock
Copy link
Copy Markdown
Owner

@fglock fglock commented Mar 20, 2026

Summary

Fixes utf8::valid() to properly check string validity, resolving CPAN::Meta::YAML parsing errors.

Problem

When installing CPAN modules (like DateTime), CPAN::Meta::YAML parsing would fail with:

Read an invalid UTF-8 string (maybe mixed UTF-8 and 8-bit character set).
Did you decode with lax :utf8 instead of strict :encoding(UTF-8)?

Root Cause

The utf8::valid() implementation was using ICU CharsetDetector which was fundamentally wrong.

Solution

Rewrote utf8::valid() to correctly check string validity:

  • For character strings: Validates that surrogate pairs are properly formed
  • For byte strings: Attempts to decode bytes as UTF-8

Changes

  • Utf8.java: Fixed valid() method
  • cpan_client.md: Added Phase 16 documentation

Test plan

  • make passes
  • utf8::valid() now returns correct values
  • CPAN::Meta can parse MYMETA.yml files

Generated with Devin

fglock and others added 16 commits March 20, 2026 21:37
The previous implementation used CharsetDetector which was incorrect:
- It converted the string to bytes using default charset
- Then tried to detect if those bytes were UTF-8
- This always failed for properly decoded strings

The new implementation correctly checks:
- For character strings (UTF-8 flag on): validates surrogate pairs
- For byte strings (UTF-8 flag off): decodes bytes as UTF-8

This fixes CPAN::Meta::YAML parsing which was failing with:
"Read an invalid UTF-8 string (maybe mixed UTF-8 and 8-bit character set)"

The error occurred because CPAN::Meta::YAML checks:
  if (utf8::is_utf8($string) && !utf8::valid($string))
and utf8::valid() was always returning false.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Added documentation for the fix that resolved CPAN::Meta::YAML parsing
errors that were preventing proper test dependency detection.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
1. ExtUtils/MakeMaker.pm: Generate meta-spec v2 format MYMETA.yml
   - Test dependencies now properly detected by CPAN.pm
   - Uses nested prereqs structure instead of flat v1.4 format

2. dev/design/JCPAN_DATETIME_FIXES.md: Comprehensive fix plan
   - Documents all errors from clean cache DateTime install
   - Prioritized implementation plan
   - Critical: File::stat.pm needed for DateTime::Locale

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
…fyError

Phase 17 DateTime fixes:
- Import Class::Struct.pm from perl5/lib (required by File::stat)
- Import File::stat.pm from perl5/lib (required by File::ShareDir::Install)
- Document JVM VerifyError with minimal reproducer

File::stat triggers a JVM bytecode verification error due to a bug when
compiling: no strict 'refs' + for loop + defined eval { &{symbolic_ref} }

Minimal reproducer:
  no strict 'refs';
  for (qw(X Y Z)) { defined eval { &{"Fcntl::S_IF$_"} } }

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
The minimal reproducer doesn't require a for loop:
  no strict 'refs';
  my $result = defined eval { &{"Fcntl::S_IFX"} };

Root cause: The block dispatcher stores ordinal in controlFlowActionSlot,
but different code paths merge at a label with inconsistent types for
that slot (integer vs TOP).

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
When a subroutine call inside an eval block returns a control flow marker
(LAST/NEXT/REDO), the control flow dispatch code for unlabeled LAST/NEXT
was jumping to target labels with an empty operand stack, but the merge
point expected a RuntimeList on the stack.

This caused a JVM VerifyError ("Inconsistent stackmap frames") when
loading File::stat.pm or other modules that use the pattern:
  eval { &{"Fcntl::S_IFX"} }

The fix pushes the control flow marker onto the stack before jumping to
the merge point, ensuring consistent stack state at all paths.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Document the root cause analysis and fix for Issue #11 (VerifyError
when loading File::stat.pm.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
EOF
)
File::stat.pm requires S_IRUSR, S_IWUSR, S_IXUSR and other mode constants
from Fcntl. These were listed in @EXPORT_OK but never actually defined.

Added:
- File type masks: S_IFMT, S_IFREG, S_IFDIR, S_IFLNK, etc.
- Special mode bits: S_ISUID, S_ISGID, S_ISVTX
- User permissions: S_IRUSR, S_IWUSR, S_IXUSR, S_IRWXU
- Group permissions: S_IRGRP, S_IWGRP, S_IXGRP, S_IRWXG
- Other permissions: S_IROTH, S_IWOTH, S_IXOTH, S_IRWXO
- File type test functions: S_ISREG, S_ISDIR, S_ISLNK, etc.
- Permission extraction: S_IMODE

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Both the JVM VerifyError and the missing Fcntl constants have been fixed.
File::stat.pm now loads and works correctly.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
When CORE::GLOBAL::require is overridden and a module uses 'require Bareword;'
under strict subs, the bareword was incorrectly flagged as a strict subs violation.

Root cause: When the parser detected a CORE::GLOBAL::require override, it rewrote
the require call to a subroutine call, but the bareword argument (e.g., 'Exporter')
was parsed as an expression instead of using require's special bareword-to-filename
conversion.

Fix: Added special handling in ParsePrimary.java for 'require' when
CORE::GLOBAL::require is overridden:
1. Parse the argument using standard require handling (converts bareword to filename)
2. Build a subroutine call node with the &CORE::GLOBAL::require code ref

Also added Exporter::require_version() method which delegates to UNIVERSAL::VERSION
for historical compatibility with older Exporter usage.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
- Marked Exporter::require_version as FIXED
- Marked CORE::GLOBAL::require bareword handling as FIXED
- Updated progress tracking section

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
When calling Module->import('0.03', 'symbol'), Perl's Exporter treats
arguments starting with a digit as version checks, not symbols to export.
Added this logic to the Java Exporter:

1. If symbol starts with digit, call $pkg->VERSION($version)
2. If version was only argument, import from @export
3. If version + empty string ('use Foo 1.23, ""'), import nothing
4. Otherwise skip version and continue with other imports

This fixes: 'Symbol 0.03 not allowed for export in package File::ShareDir::Install'

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
When Makefile.PL scripts provide an explicit PM hash with Make-style
variables like $(INST_LIB)/Module.pm, expand them to actual paths.
Without this fix, modules would be installed to a literal '$(INST_LIB)'
directory instead of the actual install base.

This fixes Class::Inspector and similar modules using explicit PM hashes.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
All blocking issues have been fixed:
- Exporter version check in import arguments
- MakeMaker $(INST_LIB) variable expansion

jcpan install DateTime now works successfully.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
When Makefile.PL uses File::ShareDir::Install to register share directories
(via install_share()), our MakeMaker now processes @file::ShareDir::Install::DIRS
and copies those files to the proper location under auto/share/dist/<DistName>/.

This enables DateTime::Locale to work correctly, as it uses share files for
locale data (1070 .pl files for different locales).

Test: jcpan install DateTime::Locale now installs all locale data files.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
When open3() is called with redirection directives like '>&STDERR', they are
read-only strings that cannot be modified. This fix:

1. In IPCOpen3.java:
   - Added isOutputRedirection() and isInputRedirection() to detect >&/&< directives
   - Added handleOutputRedirection() to pipe process output to named handles
   - Added isUsableHandle() to properly detect undef handles (not just reference-to-undef)
   - Added getStringValue() to properly dereference scalar references

2. In Open3.pm:
   - Check for redirection directives before trying to update caller's variables
   - Skip assignment to $_[N] when it's a redirection directive (read-only)

This fixes: 'open3: Modification of a read-only value attempted' errors in
tests like t/00-compile.t that use open3($stdin, '>&STDERR', $stderr, ...).

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
@fglock fglock merged commit 1523041 into master Mar 21, 2026
2 checks passed
@fglock fglock deleted the fix/utf8-valid-cpan-meta branch March 21, 2026 09:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant