diff --git a/dev/design/source_filters.md b/dev/design/source_filters.md
index bc94f8129..7d69b8b56 100644
--- a/dev/design/source_filters.md
+++ b/dev/design/source_filters.md
@@ -475,6 +475,37 @@ print "X"; # Should print Y
 ### Remaining Work
 - [ ] **Phase 4**: Add method filter support (currently returns original source for method filters)
 - [ ] Add debug environment variable documentation (JPERL_FILTER_DEBUG=1)
+- [ ] **Phase 5**: Fix FILTER_ONLY @transforms issue in Java instead of patching Filter::Simple (see below)
+
+### Known Issues
+
+#### FILTER_ONLY @transforms Scope Issue (2026-03-27)
+
+**Problem**: When multiple filter modules using `FILTER_ONLY` are loaded in sequence, the second filter's `$multitransform` closure incorrectly includes transforms from the first module.
+
+**Root Cause**: In Filter::Simple, `@transforms` is a package variable. In native Perl, this works because filters process source incrementally - each filter completes before the next filter module is loaded. In PerlOnJava, we tokenize upfront then apply filters, so multiple filter modules may be loaded before any filter runs, causing `@transforms` to accumulate transforms from different modules.
+
+**Current Fix**: Patched `Filter::Simple.pm` to make `@transforms` lexical in `FILTER_ONLY`:
+```perl
+sub FILTER_ONLY {
+    my $caller = caller;
+    my @transforms; # Made lexical instead of package-scoped
+    ...
+}
+```
+
+**TODO - Proper Java-side Fix**: The ideal solution would be to fix this in PerlOnJava's module loading code:
+1. Before loading a module that may use `FILTER_ONLY`, save `@Filter::Simple::transforms`
+2. Clear `@Filter::Simple::transforms`
+3. After module loading completes, restore the saved value
+
+This would allow using unmodified upstream Filter::Simple. The challenge is detecting which modules will use `FILTER_ONLY` before loading them. Possible approaches:
+- Clear `@Filter::Simple::transforms` before every `require` (may have side effects)
+- Track filter module loading depth and isolate transforms per level
+- Hook into Filter::Simple's FILTER_ONLY to auto-reset before each call
+
+**Files affected by current fix**:
+- `src/main/perl/lib/Filter/Simple.pm` (marked as `protected: true` in config.yaml)
 
 ### Files Modified
 - `src/main/java/org/perlonjava/runtime/perlmodule/FilterUtilCall.java`
diff --git a/dev/import-perl5/config.yaml b/dev/import-perl5/config.yaml
index 1bae669ae..3eb963bf3 100644
--- a/dev/import-perl5/config.yaml
+++ b/dev/import-perl5/config.yaml
@@ -640,6 +640,17 @@ imports:
   - source: perl5/cpan/Term-ANSIColor/lib/Term/ANSIColor.pm
     target: src/main/perl/lib/Term/ANSIColor.pm
 
+  # Filter::Simple - Simplified source filtering (used by Log::Log4perl :resurrect, etc.)
+  # Protected: has PerlOnJava-specific fix for @transforms scope in FILTER_ONLY
+  - source: perl5/dist/Filter-Simple/lib/Filter/Simple.pm
+    target: src/main/perl/lib/Filter/Simple.pm
+    protected: true
+
+  # Tests for Filter::Simple
+  - source: perl5/dist/Filter-Simple/t
+    target: perl5_t/Filter-Simple
+    type: directory
+
   # Class::Struct - Declare struct-like datatypes as Perl classes
   # Required by File::stat.pm
   - source: perl5/lib/Class/Struct.pm
diff --git a/docs/about/changelog.md b/docs/about/changelog.md
index c4ec38a09..ed30a84cd 100644
--- a/docs/about/changelog.md
+++ b/docs/about/changelog.md
@@ -31,6 +31,8 @@ Release history of PerlOnJava. See [Roadmap](roadmap.md) for future plans.
 - The interpreter mode excels at dynamic eval STRING operations (46x faster than compilation for unique strings, matching Perl 5 performance). For general code, it runs only 15% slower than Perl 5. It is also useful for implementing debugging, handling "Method too large" errors, and enabling Android and GraalVM compatibility.
 - Planned release date: 2026-04-10.
+- Add modules: `Filter::Simple` with `FILTER` and `FILTER_ONLY` support.
+
 - Work in Progress
   - PerlIO
     - `get_layers`
@@ -45,7 +47,6 @@ Release history of PerlOnJava. See [Roadmap](roadmap.md) for future plans.
     - `ungetc`
   - Auto-bless filehandle into IO::Handle subclass
   - IO::Seekable
-  - Filter::Simple
   - Math::BigInt
   - Text::ParseWords
   - Text::Tabs
diff --git a/docs/reference/feature-matrix.md b/docs/reference/feature-matrix.md
index d0dc1a213..2a0090893 100644
--- a/docs/reference/feature-matrix.md
+++ b/docs/reference/feature-matrix.md
@@ -692,6 +692,7 @@ The `:encoding()` layer supports all encodings provided by Java's `Charset.forNa
 - ✅ **ExtUtils::MakeMaker** module: PerlOnJava version installs pure Perl modules directly.
 - ✅ **Fcntl** module
 - ✅ **FileHandle** module
+- ✅ **Filter::Simple** module: `FILTER` and `FILTER_ONLY` for source code filtering.
 - ✅ **File::Basename** use the same version as Perl.
 - ✅ **File::Find** use the same version as Perl.
 - ✅ **File::Spec::Functions** module.
diff --git a/src/main/java/org/perlonjava/core/Configuration.java b/src/main/java/org/perlonjava/core/Configuration.java
index 5e9af45fa..06592f092 100644
--- a/src/main/java/org/perlonjava/core/Configuration.java
+++ b/src/main/java/org/perlonjava/core/Configuration.java
@@ -33,7 +33,7 @@ public final class Configuration {
      * Automatically populated by Gradle/Maven during build.
      * DO NOT EDIT MANUALLY - this value is replaced at build time.
      */
-    public static final String gitCommitId = "8bb1eff41";
+    public static final String gitCommitId = "9153c9ba8";
 
     /**
      * Git commit date of the build (ISO format: YYYY-MM-DD).
diff --git a/src/main/java/org/perlonjava/frontend/parser/Whitespace.java b/src/main/java/org/perlonjava/frontend/parser/Whitespace.java
index 12b8bbb9d..e8959f77f 100644
--- a/src/main/java/org/perlonjava/frontend/parser/Whitespace.java
+++ b/src/main/java/org/perlonjava/frontend/parser/Whitespace.java
@@ -51,12 +51,12 @@ public static int skipWhitespace(Parser parser, int tokenIndex, List
                 if (tokenIndex + 1 < tokens.size() && tokens.get(tokenIndex + 1).type == LexerTokenType.IDENTIFIER) {
                     boolean inPod = true;
-                    // Skip through pod section until 'cut' or 'end' is found
+                    // Skip through pod section until '=cut' is found
+                    // Note: '=end formatname' only ends a =begin block, not the entire POD section
                     while (tokenIndex < tokens.size() && inPod) {
                         String podEqual = tokens.get(tokenIndex).text;
                         String podToken = tokens.get(tokenIndex + 1).text;
-                        if (podEqual.equals("=")
-                                && (podToken.equals("cut") || podToken.equals("end"))) {
+                        if (podEqual.equals("=") && podToken.equals("cut")) {
                             inPod = false; // End of pod
                         }
diff --git a/src/main/java/org/perlonjava/runtime/perlmodule/FilterUtilCall.java b/src/main/java/org/perlonjava/runtime/perlmodule/FilterUtilCall.java
index c7ffde8df..4db2f5fb5 100644
--- a/src/main/java/org/perlonjava/runtime/perlmodule/FilterUtilCall.java
+++ b/src/main/java/org/perlonjava/runtime/perlmodule/FilterUtilCall.java
@@ -293,6 +293,29 @@ public static String applyFilters(String sourceCode) {
                     if (debug) {
                         System.err.println("[FILTER] Got chunk: " + chunk);
                     }
+
+                    // Check if the chunk ends with __DATA__, __END__, or "no Module;" terminator
+                    // If so, stop filtering and append remaining source unchanged
+                    // This is important for Filter::Simple which stops at these terminators
+                    // Pattern matches:
+                    // - __DATA__ or __END__ at end of line
+                    // - "no ModuleName;" at start of line (with optional comment)
+                    if (chunk.matches("(?sm).*^__(?:DATA|END)__\\s*$") ||
+                        chunk.matches("(?sm).*^\\s*no\\s+[\\w:]+\\s*;.*$")) {
+                        // Append remaining source unchanged
+                        if (debug) {
+                            System.err.println("[FILTER] Hit terminator, currentLine=" + context.currentLine +
+                                ", totalLines=" + context.sourceLines.length);
+                        }
+                        while (context.currentLine < context.sourceLines.length) {
+                            filteredCode.append(context.sourceLines[context.currentLine]);
+                            context.currentLine++;
+                        }
+                        continueFiltering = false;
+                        if (debug) {
+                            System.err.println("[FILTER] Hit __DATA__/__END__ terminator, appending remaining source unchanged");
+                        }
+                    }
                 }
 
                 // Check status - convert to scalar if it's a list
@@ -318,7 +341,8 @@ public static String applyFilters(String sourceCode) {
         }
 
         if (debug) {
-            System.err.println("[FILTER] Final filtered code: " + filteredCode.toString().substring(0, Math.min(200, filteredCode.length())));
+            System.err.println("[FILTER] Final filtered code length: " + filteredCode.length());
+            System.err.println("[FILTER] Final filtered code: " + filteredCode.toString());
         }
 
         return filteredCode.toString();
diff --git a/src/main/perl/lib/Filter/Simple.pm b/src/main/perl/lib/Filter/Simple.pm
index 924c2aecb..fafc5c07b 100644
--- a/src/main/perl/lib/Filter/Simple.pm
+++ b/src/main/perl/lib/Filter/Simple.pm
@@ -144,6 +144,13 @@ sub FILTER (&;$) {
 
 sub FILTER_ONLY {
     my $caller = caller;
+    # PerlOnJava fix: @transforms must be lexical, not package-scoped.
+    # In native Perl, filters process source incrementally during parsing,
+    # so each filter completes before the next filter module is loaded.
+    # In PerlOnJava, we tokenize upfront then apply filters, so multiple
+    # filter modules may be loaded before any filter runs. Using a package
+    # variable causes transforms from different modules to accumulate.
+    my @transforms;
     while (@_ > 1) {
         my ($what, $how) = splice(@_, 0, 2);
         fail "Unknown selector: $what"
diff --git a/src/test/resources/unit/pod.t b/src/test/resources/unit/pod.t
new file mode 100644
index 000000000..268ca03a7
--- /dev/null
+++ b/src/test/resources/unit/pod.t
@@ -0,0 +1,77 @@
+# Test POD (Plain Old Documentation) parsing
+
+use strict;
+use warnings;
+
+print "1..6\n";
+
+my $test = 1;
+
+=pod
+
+Basic POD block
+
+=cut
+
+print "ok $test - basic POD block\n";
+$test++;
+
+=head1 NAME
+
+Test documentation
+
+=cut
+
+print "ok $test - =head1 POD block\n";
+$test++;
+
+=begin scrumbly
+
+This is inside a scrumbly format block.
+
+=end scrumbly
+
+This text is between =end scrumbly and =cut.
+Per perlpod, this should be treated as POD, not code.
+foo bar baz
+
+=cut
+
+print "ok $test - =begin/=end block with trailing content\n";
+$test++;
+
+=begin comment
+
+A comment block
+
+=end comment
+
+More POD content after =end but before =cut
+
+=cut
+
+print "ok $test - =begin comment block\n";
+$test++;
+
+=pod
+
+=end
+
+standalone =end stays in POD
+
+=cut
+
+print "ok $test - standalone =end inside POD\n";
+$test++;
+
+=begin cut
+
+this format is named 'cut'
+
+=end cut
+
+still in POD after =end cut
+
+=cut
+
+print "ok $test - =begin cut / =end cut format\n";