Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
117 changes: 117 additions & 0 deletions dev/modules/text_csv_bundled_tests_plan.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,117 @@
# Text::CSV Bundled Module Tests Plan

## Goal

Add the original CPAN Text::CSV 2.06 test suite to `src/test/resources/module/Text-CSV/`
and make all tests pass against a bundled Text::CSV stack that uses Java as the XS backend.

## Architecture (Revised)

Instead of reimplementing hundreds of methods in our simplified CSV.pm, we now use the
**original CPAN modules** with Java replacing XS:

```
Text::CSV (CPAN 2.06 wrapper)
└─ tries Text::CSV_XS first, falls back to Text::CSV_PP
├─ Text::CSV_XS (our Java-backed module)
│ └─ inherits from Text::CSV_PP
│ └─ $VERSION = "1.61" (satisfies >= 1.60 check)
│ └─ XSLoader::load('TextCsv') for Java parse/combine
│ └─ overrides parse()/combine() with Java acceleration
└─ Text::CSV_PP (CPAN 2.06, 6454 lines, pure Perl)
└─ complete implementation: all accessors, meta_info,
callbacks, types, formula, etc.
```

**Files:**
- `src/main/perl/lib/Text/CSV.pm` — CPAN 2.06 wrapper (146 lines of code)
- `src/main/perl/lib/Text/CSV_PP.pm` — CPAN 2.06 pure-Perl backend (6454 lines)
- `src/main/perl/lib/Text/CSV_XS.pm` — our Java-backed XS replacement
- `src/main/java/org/perlonjava/runtime/perlmodule/TextCsv.java` — Java parse/combine

**Why this approach:**
- The CPAN `Text::CSV_PP` already passes 39/40 tests (52,356/52,360 subtests) on PerlOnJava
(documented in `dev/modules/text_csv_fix_plan.md`)
- All complex logic (accessors, meta_info, callbacks, types, formula, etc.) is handled
by the battle-tested CPAN code
- Java only needs to implement the performance-critical parse/combine operations
- No need to reimplement hundreds of methods

## Test Files

40 test files + `t/util.pl` helper + 2 CSV data files copied from Text-CSV-2.06 CPAN
distribution to `src/test/resources/module/Text-CSV/`.

## Implementation Plan

### Phase 1: Bundle CPAN Modules

1. Replace `src/main/perl/lib/Text/CSV.pm` with CPAN 2.06 wrapper
2. Copy `Text/CSV_PP.pm` from CPAN to `src/main/perl/lib/Text/`
3. Create `Text/CSV_XS.pm` that inherits from `Text::CSV_PP`:
- `$VERSION = "1.61"` (passes Text::CSV's `>= 1.60` check)
- Inherits all methods from CSV_PP via `@ISA`
- Exports same constants/functions
- Later: override parse/combine with Java acceleration

### Phase 2: Fix TextCsv.java Registration

The existing `TextCsv.java` registers methods on `Text::CSV` package. After the refactor:
- Either update it to register on `Text::CSV_XS` package
- Or disable it if Text::CSV_PP handles everything
- The XSLoader::load('TextCsv') call in old CSV.pm needs to be removed/updated

### Phase 3: Run Tests and Debug

With CPAN modules bundled, most tests should pass immediately (39/40 based on prior work).
Known remaining issue from `text_csv_fix_plan.md`:
- t/70_rt.t: 4 subtest failures (raw non-UTF-8 bytes, IO::Handle edge cases)

### Phase 4: Java Acceleration (Optional, Future)

Override `parse()` and `combine()` in `Text::CSV_XS` to delegate to Java:
- `Parse($str, $fields, $fflags)` — Java via XSLoader
- `Combine(\$str, \@fields, $useIO)` — Java via XSLoader
- `SetDiag($code, $msg)` — Java error management
- `_cache_set` / `_cache_get_eolt` — Java cache

This is optional since CSV_PP already works. Add it for performance.

## XS API Contract (for future Java implementation)

The XS module must provide these C-level functions (via XSLoader):

| XS Method | Called From | Purpose |
|-----------|------------|---------|
| `Parse($str, $fields, $fflags)` | `parse()` | Core CSV parsing |
| `Combine(\$str, \@fields, $useIO)` | `combine()`, `print()` | Core CSV combining |
| `SetDiag($code, $msg?)` | everywhere | Error diagnostics |
| `_cache_set($idx, $val)` | accessor setters | XS state cache |
| `_cache_get_eolt()` | `eol_type()` | EOL type detection |
| `_cache_diag()` | debugging | Cache dump |
| `print($io, $fields)` | `print()` | Direct IO print |
| `getline($io)` | `getline()` | Direct IO getline |
| `getline_all($io, ...)` | `getline_all()` | Batch IO getline |
| `error_input()` | `error_input()` | Last error input |

## Progress Tracking

### Current Status: Phase 1 in progress

### Completed
- [x] Copy CPAN test files to module/Text-CSV/ (2026-04-08)
- [x] Architecture decision: use CPAN Text::CSV + CSV_PP + Java-backed CSV_XS
- [x] Analyzed XS API contract from Text-CSV_XS-1.61 source

### In Progress
- [ ] Bundle CPAN Text::CSV.pm (replace our custom one)
- [ ] Bundle CPAN Text::CSV_PP.pm
- [ ] Create Text::CSV_XS.pm (inherits from CSV_PP)

### Next Steps
1. Handle TextCsv.java registration (update or disable)
2. Build and run tests
3. Debug any remaining failures
4. (Future) Add Java acceleration for parse/combine
Original file line number Diff line number Diff line change
Expand Up @@ -57,6 +57,7 @@ public class PerlLanguageProvider {
public static void resetAll() {
globalInitialized = false;
resetAllGlobals();
DataSection.reset();
}

/**
Expand Down
4 changes: 2 additions & 2 deletions src/main/java/org/perlonjava/core/Configuration.java
Original file line number Diff line number Diff line change
Expand Up @@ -33,7 +33,7 @@ public final class Configuration {
* Automatically populated by Gradle/Maven during build.
* DO NOT EDIT MANUALLY - this value is replaced at build time.
*/
public static final String gitCommitId = "cc2df66a9";
public static final String gitCommitId = "5eecf59d6";

/**
* Git commit date of the build (ISO format: YYYY-MM-DD).
Expand All @@ -48,7 +48,7 @@ public final class Configuration {
* Parsed by App::perlbrew and other tools via: perl -V | grep "Compiled at"
* DO NOT EDIT MANUALLY - this value is replaced at build time.
*/
public static final String buildTimestamp = "Apr 8 2026 09:46:56";
public static final String buildTimestamp = "Apr 8 2026 10:45:34";

// Prevent instantiation
private Configuration() {
Expand Down
9 changes: 9 additions & 0 deletions src/main/java/org/perlonjava/frontend/parser/DataSection.java
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,15 @@ public class DataSection {
*/
private static final Set<String> placeholderCreated = new HashSet<>();

/**
* Resets all static state for DataSection.
* Called between test runs to prevent stale state from interfering.
*/
public static void reset() {
processedPackages.clear();
placeholderCreated.clear();
}

/**
* Creates a placeholder DATA filehandle for a package early in parsing.
* This ensures the DATA filehandle exists during BEGIN block execution.
Expand Down
15 changes: 7 additions & 8 deletions src/main/java/org/perlonjava/runtime/perlmodule/TextCsv.java
Original file line number Diff line number Diff line change
Expand Up @@ -33,16 +33,15 @@ public TextCsv() {

/**
* Initializes and registers all Text::CSV methods.
*
* NOTE: Registration is intentionally disabled because Text::CSV now
* delegates to Text::CSV_XS (which inherits from Text::CSV_PP).
* Registering Java-backed parse/combine on "Text::CSV" would override
* the pure-Perl implementations inherited through the CPAN wrapper.
*/
public static void initialize() {
TextCsv csv = new TextCsv();
try {
// Register core CSV methods (high-level methods now in Perl)
csv.registerMethod("parse", null);
csv.registerMethod("combine", null);
} catch (NoSuchMethodException e) {
System.err.println("Warning: Missing Text::CSV method: " + e.getMessage());
}
// No-op: Java-backed CSV methods are no longer used.
// The CPAN Text::CSV wrapper + Text::CSV_PP handle everything.
}

/**
Expand Down
Loading
Loading