fix: List::MoreUtils PP-mode test failures (RC1-RC5) #515
Merged
Under `use strict 'refs'`, dereferencing a scalar whose value is a plain
integer or double as an ARRAY/HASH ref now dies with the perl-compatible
message `Can't use string ("N") as an ARRAY ref while "strict refs" in use`
(previously `RuntimeScalar.arrayDeref()` silently returned an empty array
and `hashDeref()` threw "Not a HASH reference").
The same rule applies to read-only scalars that happen to hold a number
(e.g. a `foreach` loop variable aliased to a caller's literal argument),
matching perl: `for my $x (1) { @$x }` dies. Compile-time literal arrow
dereferences (`print 1->[0]`, `print 1->{a}`) stay silent via new
`arrayDerefGet` / `hashDerefGet` overrides on `RuntimeScalarReadOnly`,
also matching perl.
This unblocks four List::MoreUtils PP-mode tests whose `is_dying` checks
rely on this diagnostic: binsert.t, bremove.t, mesh.t, zip6.t.
Part of dev/modules/list_moreutils.md (RC1).
Generated with [Devin](https://devin.ai)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
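The two code paths described in this commit message can be modelled with a small Java sketch. The class names (`StrictRefsSketch`, `Scalar`, `ReadOnlyScalar`) and the method signatures are hypothetical stand-ins, not PerlOnJava's actual `RuntimeScalar` code; only the error text and the method names `arrayDeref` / `arrayDerefGet` are taken from the message above. A `null` return stands in for Perl's undef.

```java
import java.util.List;

final class StrictRefsSketch {

    /** Hypothetical stand-in for a scalar holding either a number or an array reference. */
    static class Scalar {
        final Object value;

        Scalar(Object value) {
            this.value = value;
        }

        /** Models @$x: a number is not a reference, so this dies under strict refs. */
        @SuppressWarnings("unchecked")
        List<Object> arrayDeref() {
            if (value instanceof List) {
                return (List<Object>) value;
            }
            throw new IllegalStateException(
                    "Can't use string (\"" + value + "\") as an ARRAY ref while \"strict refs\" in use");
        }

        /** Models $x->[i] on an ordinary scalar: goes through arrayDeref(), so it dies the same way. */
        Object arrayDerefGet(int index) {
            List<Object> list = arrayDeref();
            return index >= 0 && index < list.size() ? list.get(index) : null;
        }
    }

    /** Hypothetical stand-in for a read-only scalar such as the literal 1 in 1->[0]. */
    static final class ReadOnlyScalar extends Scalar {
        ReadOnlyScalar(Object value) {
            super(value);
        }

        /** Literal arrow dereference stays silent and yields undef (null here). */
        @Override
        Object arrayDerefGet(int index) {
            if (value instanceof Number) {
                return null;
            }
            return super.arrayDerefGet(index);
        }
    }

    public static void main(String[] args) {
        try {
            new Scalar(1).arrayDeref();
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        // The read-only override does not throw for a numeric literal.
        System.out.println(new ReadOnlyScalar(1).arrayDerefGet(0));
    }
}
```

Note that `ReadOnlyScalar` inherits the throwing `arrayDeref()`, which corresponds to the `for my $x (1) { @$x }` case: only the element-access path is overridden to stay silent.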
POSIX.pm already exported `setlocale`, `localeconv`, and the LC_* category
constants, but none of them had an implementation: any caller got
`Undefined subroutine &POSIX::setlocale`.
PerlOnJava cannot really switch the JVM/C locale, but adding minimal stubs
is enough for modules that just call `setlocale(LC_COLLATE, "C")` for its
return value or probe `localeconv()` for basic numeric formatting. The
stubs return the requested locale name (`C` by default) and a reasonable
default locale table.
Unblocks List::MoreUtils PP-mode minmaxstr.t (RC3 in
dev/modules/list_moreutils.md).
Generated with [Devin](https://devin.ai)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Perl treats the `my` declaration in
`my @long_list = EXPR for LIST;` / `my $x = EXPR while COND;` as
declared in the enclosing scope — the variable name is visible for the
rest of the block — while each loop iteration still creates a fresh
instance. Without this, the enclosing scope sees an undeclared variable
and `use strict` bails at compile time, e.g.:
    my @long_list = int rand(1000) for 0 .. 1E7;
    my @part = part { ... } @long_list; # strict error pre-fix
The parser now detects this pattern in the `for`/`foreach` and
`while`/`until` statement-modifier branches, emits a bare `my DECL;` in
the enclosing scope, and leaves the inner `my DECL = RHS` in the loop
body (wrapped in a BlockNode for `while`/`until` so the inner `my`
properly shadows the outer one). The end-of-loop value of the outer
variable matches perl: empty/undef, because the body's `my` introduces
a fresh per-iteration instance.
Unblocks List::MoreUtils PP-mode part.t (RC2 in
dev/modules/list_moreutils.md).
Generated with [Devin](https://devin.ai)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Perl's split re-runs its regex at the end of each zero-width match with
REG_NOTEMPTY_ATSTART. When a consuming alternative of the pattern also
matches at that position, the consumed characters become an additional
separator and an empty field appears between the two separators, e.g.
    split /(?:\b|\s)/, "Lorem ipsum,"
    # ("Lorem", "", "ipsum", ",")
Java's Matcher tries alternations left-to-right and stops at the first
match, so `\b` always wins and the `\s` alternative is never attempted
at the same offset. Without compensation, the space leaks into the next
field and the empty separator disappears: jperl was returning
`("Lorem", " ", "ipsum", ",")`.
After each zero-width match, we now probe with `Matcher.matches()` on
regions of increasing length starting at matchEnd. The shortest region
the full pattern matches gives the length of the consuming alternative;
if found, we emit an empty field and advance past the consumed
characters.
Unblocks List::MoreUtils PP-mode mode.t (RC4 in
dev/modules/list_moreutils.md).
Generated with [Devin](https://devin.ai)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
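The probing step described in this commit message can be sketched in Java as follows. This is a minimal illustration, not the code in `Operator.java`: the class `SplitProbeSketch` and the helpers `consumedLengthAt` and `split` are hypothetical names, and the loop ignores `split`'s limit argument and capture groups. The choice of transparent, non-anchoring region bounds is also an assumption, made so that `\b` can still see the characters on either side of the probed region.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SplitProbeSketch {

    /**
     * Returns the length of the shortest non-empty region starting at pos that
     * the whole pattern matches, or 0 if no such region exists.
     */
    static int consumedLengthAt(Pattern pattern, CharSequence input, int pos) {
        Matcher probe = pattern.matcher(input);
        probe.useTransparentBounds(true);  // let \b and lookarounds see text outside the region
        probe.useAnchoringBounds(false);   // region edges should not count as ^ or $
        for (int end = pos + 1; end <= input.length(); end++) {
            probe.region(pos, end);
            if (probe.matches()) {
                return end - pos;
            }
        }
        return 0;
    }

    /** Simplified split: no limit, no capture groups, trailing empty fields removed. */
    static List<String> split(Pattern pattern, String input) {
        List<String> fields = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        int fieldStart = 0;
        int from = 0;
        while (from <= input.length() && m.find(from)) {
            int start = m.start();
            int end = m.end();
            if (start == end && start == fieldStart) {
                // A zero-width match at the start of a field makes no progress; retry one char later.
                from = start + 1;
                continue;
            }
            fields.add(input.substring(fieldStart, start));
            fieldStart = end;
            if (start == end) {
                // After a zero-width match, look for a consuming alternative at the same offset.
                int consumed = consumedLengthAt(pattern, input, end);
                if (consumed > 0) {
                    fields.add("");
                    fieldStart = end + consumed;
                }
            }
            from = fieldStart;
        }
        fields.add(input.substring(fieldStart));
        while (!fields.isEmpty() && fields.get(fields.size() - 1).isEmpty()) {
            fields.remove(fields.size() - 1);
        }
        return fields;
    }

    public static void main(String[] args) {
        // Yields four fields: "Lorem", "", "ipsum", ","
        System.out.println(split(Pattern.compile("(?:\\b|\\s)"), "Lorem ipsum,"));
    }
}
```

The probe is quadratic in the worst case because it re-runs `matches()` for every candidate region length; the sketch accepts that cost for clarity.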
Perl emits
    Use of uninitialized value in array element
    Use of uninitialized value in hash element
whenever an undef is used as a subscript under `use warnings`. PerlOnJava
wasn't producing this diagnostic, which made it impossible to test
patterns like `$parts[$code->($_)]` where `$code` returns undef.
The warning is emitted from the RuntimeArray.get / RuntimeHash.get
entry points, covering both the rvalue read path and the lvalue / autoviv
path (both go through get()).
Unblocks List::MoreUtils PP-mode part.t warnings checks (RC5 in
dev/modules/list_moreutils.md).
Generated with [Devin](https://devin.ai)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
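A minimal Java sketch of the check described in this commit message is shown below. The names `UndefSubscriptSketch`, `arrayGet`, `hashGet`, and the `warnings` list are hypothetical and do not reflect the real `RuntimeArray.get` / `RuntimeHash.get` signatures. A `null` subscript stands in for Perl's undef, and the sketch always warns, whereas the real diagnostic only applies under `use warnings`. After warning, the undef subscript is treated as `0` for arrays and as the empty string for hashes, which is how undef converts in Perl.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class UndefSubscriptSketch {

    /** Collected warnings; a real runtime would route these through its warning machinery. */
    static final List<String> warnings = new ArrayList<>();

    /** Hypothetical array lookup: warns on an undef (null) index, then uses index 0. */
    static Object arrayGet(List<Object> array, Integer index) {
        if (index == null) {
            warnings.add("Use of uninitialized value in array element");
            index = 0;
        }
        // Negative (from-the-end) indexes are not modelled in this sketch.
        return index >= 0 && index < array.size() ? array.get(index) : null;
    }

    /** Hypothetical hash lookup: warns on an undef (null) key, then uses the empty string. */
    static Object hashGet(Map<String, Object> hash, String key) {
        if (key == null) {
            warnings.add("Use of uninitialized value in hash element");
            key = "";
        }
        return hash.get(key);
    }

    public static void main(String[] args) {
        List<Object> parts = new ArrayList<>(List.of("a", "b"));
        System.out.println(arrayGet(parts, null)); // element 0, after recording a warning
        System.out.println(warnings);
    }
}
```

Putting the check at the single lookup entry point mirrors the point made above: both the read path and the lvalue / autoviv path pass through the same method, so one check covers both.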
Summary
Fixes 6 of 7 failing subtests in `./jcpan -t List::MoreUtils` (v0.430). The full plan and per-RC rationale live in `dev/modules/list_moreutils.md`.
The one remaining failure is `indexes.t` test 18, a `Scalar::Util::weaken`-on-temporary test (RC6 in the design doc). That sits on top of PerlOnJava's cooperative-refcount overlay and is being addressed on a separate `weaken` branch; this PR does not touch it.
Commits
- db94a5ae1: `binsert.t`, `bremove.t`, `mesh.t`, `zip6.t`
- a161fa284: `minmaxstr.t`
- 3bfaffda3: `my` in `EXPR for LIST;` / `EXPR while COND;` statement-modifier bodies to outer scope; unblocks `part.t`
- c9b8e05dd: `split` emits empty field between zero-width and consuming match at the same offset; `mode.t`
- 96c4f92d5: `Use of uninitialized value in array|hash element` on undef subscript; `part.t` leak-free tests
- f67c5860c: `dev/modules/list_moreutils.md`
What changed
RC1: strict-refs for numeric deref (`RuntimeScalar.java`, `RuntimeScalarReadOnly.java`)
`arrayDeref()` / `hashDeref()` used to silently return empty or throw `Not a HASH reference` for `INTEGER`/`DOUBLE`. They now throw the perl-compatible `Can't use string ("N") as an ARRAY|HASH ref while "strict refs" in use`. `RuntimeScalarReadOnly` picks up the same rule for loop aliases to literals (`for my $x (1) { @$x }`), but new `arrayDerefGet` / `hashDerefGet` overrides keep `1->[0]` / `1->{a}` silent to match perl's literal-arrow compile-time optimization.
RC3: POSIX stubs (`src/main/perl/lib/POSIX.pm`)
`POSIX.pm` already exported `setlocale`, `localeconv`, and the `LC_*` constants, but none of them were actually defined. Added Perl stubs: `setlocale` returns its locale argument, `localeconv` returns a default "C"-locale table, `LC_*` are distinct small integers.
RC2: `my` hoisting in statement-modifier loops (`StatementResolver.java`)
Perl treats `my @x = EXPR for LIST;` as declaring `@x` in the enclosing scope (so the rest of the block can refer to it) while each iteration creates a fresh instance. The parser now detects this pattern for `for`/`foreach` and `while`/`until` modifiers, emits a bare `my DECL;` before the loop, and wraps the `while`/`until` body in a BlockNode so the inner `my` shadows the outer on each iteration. Outer-scope value matches perl: empty / undef.
RC4: split with zero-width ∥ consuming alternation (`Operator.java`)
Java's `Matcher` always tries alternations left-to-right, so in `(?:\b|\s)` the `\b` branch always wins and `\s` is never attempted at the same offset. Perl's split, in contrast, re-runs the regex with `REG_NOTEMPTY_ATSTART` after each zero-width match: a consuming alternative at the same position becomes an additional separator with an empty field between the two. After each zero-width match we now probe via `Matcher.matches()` on progressively larger regions starting at `matchEnd`; the shortest region the pattern matches gives the length of the consuming alternative.
RC5: undef-as-subscript warning (`RuntimeArray.java`, `RuntimeHash.java`)
`RuntimeArray.get` / `RuntimeHash.get` now emit `Use of uninitialized value in array|hash element` (category `uninitialized`) when called with an `UNDEF` index. Both the read and lvalue/autoviv paths go through `get()`.
Test plan
- `make` (build + unit tests) green after every commit
- `./jcpan -t List::MoreUtils`: 60/61 files green, 1/61 deferred to weaken branch (was 53/61 on master)
- `use strict; my $x=1; @$x` now dies (was silent)
- `print 1->[0]` still silent (unchanged)
- `my @x = (1,2) for 1..3; print scalar @x` now prints `0`, matching perl
- `split /(?:\b|\s)/, "Lorem ipsum,"` matches perl
Generated with Devin
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>