Skip to content

Commit 2fc5115

Browse files
authored
Merge pull request #269 from fglock/fix-exiftool-cli
Fix exiftool CLI and achieve 100% interpreter test parity
2 parents cf4c600 + 398ec76 commit 2fc5115

26 files changed

Lines changed: 891 additions & 415 deletions
Lines changed: 178 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,178 @@
1+
---
2+
name: fix-pat-sprintf
3+
description: Fix re/pat.t and op/sprintf2.t test regressions on fix-exiftool-cli branch
4+
argument-hint: "[test-name or specific failure]"
5+
triggers:
6+
- user
7+
- model
8+
---
9+
10+
# Fix pat.t and sprintf2.t Regressions
11+
12+
You are fixing test regressions in `re/pat.t` (-17 tests) and `op/sprintf2.t` (-3 tests) on the `fix-exiftool-cli` branch of PerlOnJava.
13+
14+
## Hard Constraints
15+
16+
1. **No AST refactoring fallback.** The `LargeBlockRefactorer` / AST splitter must NOT be restored. This is non-negotiable.
17+
2. **Fix the interpreter.** The bytecode interpreter must achieve feature parity with the JVM compiler. Both backends must produce identical results for all Perl constructs.
18+
3. **Match the baseline exactly.** Target is the master baseline scores — no more, no less:
19+
- `re/pat.t`: 1056/1296
20+
- `op/sprintf2.t`: 1652/1655
21+
4. **Do NOT modify shared runtime** (`RuntimeRegex.java`, `RegexFlags.java`, `RegexPreprocessor.java`, etc.). The runtime is shared between both backends. Fixes must be in the interpreter code.
22+
23+
## Why the Interpreter Is Involved
24+
25+
Large subroutines that exceed the JVM 64KB method limit fall back to the bytecode interpreter via `EmitterMethodCreator.createRuntimeCode()`.
26+
27+
- **pat.t**: The `run_tests` subroutine (lines 38-2652, ~2614 lines) falls back to interpreter. All 1296 tests run through it. Confirmed with `JPERL_SHOW_FALLBACK=1`.
28+
- **sprintf2.t**: Same mechanism — large test body falls back to interpreter.
29+
30+
## Baseline vs Branch
31+
32+
| Test | Master baseline (397ba45d) | Branch HEAD | Delta |
33+
|------|---------------------------|-------------|-------|
34+
| re/pat.t | 1056/1296 | 1039/1296 | -17 |
35+
| op/sprintf2.t | 1652/1655 | 1649/1655 | -3 |
36+
37+
## Methodology
38+
39+
For each failing test:
40+
41+
1. **Extract** the specific Perl code from the test file
42+
2. **Compare** JVM vs interpreter output:
43+
```bash
44+
./jperl -E 'extracted code' # JVM backend (correct behavior)
45+
./jperl --interpreter -E 'extracted code' # Interpreter (may differ)
46+
```
47+
3. **When they differ**: identify the root cause in the interpreter code (BytecodeCompiler, BytecodeInterpreter, etc.) and fix it
48+
4. **When they don't differ standalone**: the failure depends on context from earlier tests in the same large function. Investigate what prior state affects the result — look at regex state, variable scoping, match variables, pos(), etc.
49+
5. **Verify** the fix doesn't break other tests
50+
51+
## Running the Tests
52+
53+
```bash
54+
# Build
55+
make build
56+
57+
# Run individual tests via test runner (sets correct ENV vars)
58+
perl dev/tools/perl_test_runner.pl perl5_t/t/re/pat.t
59+
perl dev/tools/perl_test_runner.pl perl5_t/t/op/sprintf2.t
60+
61+
# Run manually with correct ENV
62+
cd perl5_t/t
63+
PERL_SKIP_BIG_MEM_TESTS=1 JPERL_UNIMPLEMENTED=warn JPERL_OPTS="-Xss256m" ../../jperl re/pat.t
64+
PERL_SKIP_BIG_MEM_TESTS=1 JPERL_UNIMPLEMENTED=warn ../../jperl op/sprintf2.t
65+
66+
# Compare JVM vs interpreter for a specific construct
67+
./jperl -E 'code'
68+
./jperl --interpreter -E 'code'
69+
70+
# Check if a test file uses interpreter fallback
71+
cd perl5_t/t && JPERL_SHOW_FALLBACK=1 ../../jperl re/pat.t 2>&1 | grep 'interpreter backend'
72+
73+
# Get interpreter bytecodes for a construct
74+
./jperl --interpreter --disassemble -E 'code' 2>&1
75+
```
76+
77+
## pat.t: Exact Regressions (18 PASS->FAIL, 1 FAIL->PASS, net -17)
78+
79+
### Tests that went from PASS to FAIL
80+
81+
| # | Test Description | pat.t Line | Category |
82+
|---|-----------------|------------|----------|
83+
| 1 | Stack may be bad | 508 | regex match |
84+
| 2 | $^N, @- and @+ are read-only | 845-851 | eval STRING special vars |
85+
| 3-4 | \G testing (x2) | 858, 866 | \G anchor |
86+
| 5 | \b is not special | 1089 | word boundary |
87+
| 6-8 | \s, [[:space:]] and [[:blank:]] (x3) | 1223-1225 | POSIX classes |
88+
| 9 | got a latin string - rt75680 | 1252 | latin/unicode |
89+
| 10-11 | RT #3516 A, B | 1329, 1335 | \G loop |
90+
| 12 | Qr3 bare | ~1490 | qr// overload |
91+
| 13 | Qr3 bare - with use re eval | ~1498 | qr// eval |
92+
| 14 | Eval-group not allowed at runtime | 524 | regex eval |
93+
| 15-18 | Branch reset pattern 1-4 | 2392-2409 | branch reset |
94+
95+
### Test that went from FAIL to PASS
96+
97+
| Test Description | Category |
98+
|-----------------|----------|
99+
| 1 '', '1', '12' (Eval-group) | regex eval |
100+
101+
## Interpreter Architecture
102+
103+
```
104+
Source -> Lexer -> Parser -> AST --+--> JVM Compiler (EmitterMethodCreator) -> JVM bytecode
105+
\--> BytecodeCompiler -> InterpretedCode -> BytecodeInterpreter
106+
```
107+
108+
Both backends share the same runtime (RuntimeRegex, RuntimeScalar, etc.). The difference is ONLY in how the AST is lowered to executable form. The interpreter must handle every construct identically to the JVM compiler.
109+
110+
### Key interpreter files
111+
112+
| File | Role |
113+
|------|------|
114+
| `backend/bytecode/BytecodeCompiler.java` | AST -> interpreter bytecodes |
115+
| `backend/bytecode/BytecodeInterpreter.java` | Main dispatch loop |
116+
| `backend/bytecode/InterpretedCode.java` | Code object + disassembler |
117+
| `backend/bytecode/Opcodes.java` | Opcode constants |
118+
| `backend/bytecode/CompileAssignment.java` | Assignment compilation |
119+
| `backend/bytecode/CompileBinaryOperator.java` | Binary ops compilation |
120+
| `backend/bytecode/CompileOperator.java` | Unary/misc ops compilation |
121+
| `backend/bytecode/SlowOpcodeHandler.java` | Rarely-used op handlers |
122+
| `backend/bytecode/OpcodeHandlerExtended.java` | CREATE_CLOSURE, STORE_GLOB, etc. |
123+
| `backend/bytecode/MiscOpcodeHandler.java` | Misc operations |
124+
| `backend/bytecode/EvalStringHandler.java` | eval STRING compilation for interpreter |
125+
126+
All paths relative to `src/main/java/org/perlonjava/`.
127+
128+
### Key source files (do NOT modify)
129+
130+
| Area | File | Notes |
131+
|------|------|-------|
132+
| Regex runtime | `runtime/regex/RuntimeRegex.java` | DO NOT MODIFY |
133+
| Regex flags | `runtime/regex/RegexFlags.java` | DO NOT MODIFY |
134+
| Regex preprocessor | `runtime/regex/RegexPreprocessor.java` | DO NOT MODIFY |
135+
136+
All paths relative to `src/main/java/org/perlonjava/`.
137+
138+
## Verification Steps
139+
140+
After any fix:
141+
142+
```bash
143+
# 1. Build must pass
144+
make build
145+
146+
# 2. Unit tests must pass
147+
make test-unit
148+
149+
# 3. Check pat.t — must match baseline (1056/1296)
150+
perl dev/tools/perl_test_runner.pl perl5_t/t/re/pat.t
151+
152+
# 4. Check sprintf2.t — must match baseline (1652/1655)
153+
perl dev/tools/perl_test_runner.pl perl5_t/t/op/sprintf2.t
154+
155+
# 5. No regressions in other key tests
156+
perl dev/tools/perl_test_runner.pl perl5_t/t/op/pack.t
157+
perl dev/tools/perl_test_runner.pl perl5_t/t/re/pat_rt_report.t
158+
```
159+
160+
## Debugging Tips
161+
162+
### Compare raw output between baseline and branch
163+
```bash
164+
# Save branch output
165+
cd perl5_t/t && PERL_SKIP_BIG_MEM_TESTS=1 JPERL_UNIMPLEMENTED=warn JPERL_OPTS="-Xss256m" ../../jperl re/pat.t > /tmp/pat_branch.txt 2>&1
166+
167+
# Compare by test name against saved baseline
168+
LC_ALL=C diff \
169+
<(LC_ALL=C grep -E '^(ok|not ok)' /tmp/pat_base_raw.txt | LC_ALL=C sed 's/^ok [0-9]* - /PASS: /;s/^not ok [0-9]* - /FAIL: /' | LC_ALL=C sort) \
170+
<(LC_ALL=C grep -E '^(ok|not ok)' /tmp/pat_branch.txt | LC_ALL=C sed 's/^ok [0-9]* - /PASS: /;s/^not ok [0-9]* - /FAIL: /' | LC_ALL=C sort) \
171+
| grep '^[<>]'
172+
```
173+
174+
### Test specific construct through both backends
175+
```bash
176+
./jperl -E 'my $s="abcde"; pos $s=2; say $s =~ /^\G/ ? "match" : "no"'
177+
./jperl --interpreter -E 'my $s="abcde"; pos $s=2; say $s =~ /^\G/ ? "match" : "no"'
178+
```

.cognition/skills/interpreter-parity/SKILL.md

Lines changed: 42 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -200,6 +200,48 @@ Test::More → Test::Builder → Test::Builder::Formatter → Test2::Formatter::
200200
```
201201
The failure is a ClassCastException in `Test/Builder/Formatter.pm` BEGIN block where `*OUT_STD = Test2::Formatter::TAP->can('OUT_STD')` — method call result (RuntimeList) is stored to glob (expects RuntimeScalar).
202202

203+
## Design Decision: JVM Emitter Must Not Mutate the AST
204+
205+
When the JVM backend fails with `MethodTooLargeException` (or `VerifyError`, etc.), `createRuntimeCode()` in `EmitterMethodCreator.java` falls back to the interpreter via `compileToInterpreter(ast, ...)`. The same fallback exists in `PerlLanguageProvider.compileToExecutable()`.
206+
207+
**Problem**: The JVM emitter (EmitterVisitor and helpers) mutates the AST during code generation. If JVM compilation fails partway through, the interpreter receives a corrupted AST, producing wrong results. This is the root cause of mixed-mode failures (e.g., pack.t gets 45 extra failures when the main script falls back to interpreter after partial JVM emission).
208+
209+
**Rule**: The JVM emitter must NEVER permanently mutate AST nodes. All mutations must either:
210+
1. Be avoided entirely (work on local copies), OR
211+
2. Use save/restore in try/finally (already done in `EmitLogicalOperator.java`)
212+
213+
### Known AST mutation sites
214+
215+
| File | Line(s) | What it mutates | Status |
216+
|------|---------|-----------------|--------|
217+
| `EmitOperator.java` | ~373 | `operand.elements.addFirst(operand.handle)` in `handleSystemBuiltin` — adds handle to elements list, never removed | **DANGEROUS** |
218+
| `Dereference.java` | ~347,442,511,579,911 | `nodeRight.elements.set(0, new StringNode(...))` — converts IdentifierNode to StringNode for hash autoquoting. `nodeRight` comes from `asListNode()` which creates a new ListNode but shares the same `elements` list | **DANGEROUS** — mutates shared elements list |
219+
| `EmitLogicalOperator.java` | ~188,300,340 | Temporarily rewrites `declaration.operator`/`.operand` | **SAFE** — uses save/restore in try/finally |
220+
| `EmitControlFlow.java` | ~280 | `argsNode.elements.add(atUnderscore)` | **SAFE**`argsNode` is a freshly created ListNode |
221+
| `EmitOperator.java` | ~398,410 | `handleSpliceBuiltin` removes/restores first element | **SAFE** — uses try/finally restore |
222+
| Annotations (`setAnnotation`) | various | Sets `blockIsSubroutine`, `skipRegexSaveRestore`, `isDeclaredReference` | **Likely safe** — annotations are additive hints, but verify interpreter handles them |
223+
224+
### How to fix dangerous sites
225+
226+
**`handleSystemBuiltin` (EmitOperator.java:373)**: Wrap in try/finally to remove the added element after accept():
227+
```java
228+
if (operand.handle != null) {
229+
hasHandle = true;
230+
operand.elements.addFirst(operand.handle);
231+
}
232+
try {
233+
operand.accept(emitterVisitor.with(RuntimeContextType.LIST));
234+
} finally {
235+
if (hasHandle) {
236+
operand.elements.removeFirst();
237+
}
238+
}
239+
```
240+
241+
**Dereference.java autoquoting**: `asListNode()` creates a new ListNode but passes the SAME `elements` list reference. The `elements.set(0, ...)` call mutates the original HashLiteralNode's elements. Fix by either:
242+
- Making `asListNode()` copy the elements list: `new ListNode(new ArrayList<>(elements), tokenIndex)`
243+
- Or saving/restoring the original element in try/finally
244+
203245
## Lessons Learned
204246

205247
### InterpretedCode constructor drops metadata
Lines changed: 122 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,122 @@
1+
---
2+
name: migrate-jna
3+
description: Migrate from JNA to a modern native access library (eliminate sun.misc.Unsafe warnings)
4+
argument-hint: "[library choice or file to migrate]"
5+
triggers:
6+
- user
7+
---
8+
9+
# Migrate JNA to Modern Native Access Library
10+
11+
## Problem
12+
13+
JNA 5.18.1 uses `sun.misc.Unsafe::staticFieldBase` internally, which produces deprecation warnings on Java 21+ and will break in future JDK releases. The project needs to migrate to a library that uses supported APIs.
14+
15+
## Candidate Replacement Libraries
16+
17+
The choice of replacement library is TBD. Evaluate these options:
18+
19+
### Option A: jnr-posix
20+
- **Maven**: `com.github.jnr:jnr-posix`
21+
- **Pros**: Purpose-built for POSIX ops, used by JRuby (production-proven), clean high-level API (`FileStat`, `kill()`, `waitpid()`, `umask()`, `utime()`), built on jnr-ffi (no `sun.misc.Unsafe`)
22+
- **Cons**: Third-party dependency, may not cover Windows-specific calls
23+
24+
### Option B: Java Foreign Function & Memory API (FFM)
25+
- **Module**: `java.lang.foreign` (JDK built-in)
26+
- **Pros**: No third-party dependency, official JDK solution, no deprecated APIs
27+
- **Cons**: Stable only since Java 22 (preview in 21), verbose low-level API, requires manual struct layout definitions
28+
- **Note**: If the project bumps minimum to Java 22, this becomes viable without preview flags
29+
30+
### Option C: jnr-ffi (without jnr-posix)
31+
- **Maven**: `com.github.jnr:jnr-ffi`
32+
- **Pros**: Modern JNA alternative, no `sun.misc.Unsafe`, flexible
33+
- **Cons**: Lower-level than jnr-posix, requires manual bindings (similar effort to FFM)
34+
35+
## Current JNA Usage
36+
37+
10 files use JNA. All paths relative to `src/main/java/org/perlonjava/`.
38+
39+
### Native interface definitions
40+
41+
| File | JNA Usage |
42+
|------|-----------|
43+
| `runtime/nativ/PosixLibrary.java` | POSIX C library bindings: `stat`, `lstat`, `chmod`, `chown`, `getpid`, `getppid`, `setpgid`, `getpgid`, `setsid`, `tcsetpgrp`, `tcgetpgrp`, `getpgrp`, `setpgrp` |
44+
| `runtime/nativ/WindowsLibrary.java` | Windows kernel32 bindings: `GetCurrentProcessId`, `_getpid` |
45+
| `runtime/nativ/NativeUtils.java` | JNA Platform utilities: `getpid()`, `getuid()`, `geteuid()`, `getgid()`, `getegid()`, plus `CLibrary` for `getpriority`/`setpriority`/`alarm`/`getlogin` |
46+
| `runtime/nativ/ExtendedNativeUtils.java` | Additional POSIX: `getpwuid`, `getpwnam`, `getgrnam`, `getgrgid` (passwd/group lookups) |
47+
48+
### Consumers (files that call native operations)
49+
50+
| File | Operations Used |
51+
|------|----------------|
52+
| `runtime/operators/Stat.java` | `PosixLibrary.stat()`, `PosixLibrary.lstat()` — all 13 stat fields (dev, ino, mode, nlink, uid, gid, rdev, size, atime, mtime, ctime, blksize, blocks) |
53+
| `runtime/operators/Operator.java` | `PosixLibrary.chmod()`, `PosixLibrary.chown()`, `NativeUtils` for pid/uid/gid |
54+
| `runtime/operators/KillOperator.java` | `PosixLibrary.kill()` for sending signals, `NativeUtils.getpid()` |
55+
| `runtime/operators/WaitpidOperator.java` | JNA `CLibrary.waitpid()` with `WNOHANG`/`WUNTRACED` flags, macros `WIFEXITED`/`WEXITSTATUS`/`WIFSIGNALED`/`WTERMSIG`/`WIFSTOPPED`/`WSTOPSIG` |
56+
| `runtime/operators/UmaskOperator.java` | JNA `CLibrary.umask()` |
57+
| `runtime/operators/UtimeOperator.java` | JNA `CLibrary.utimes()` with `timeval` struct |
58+
59+
## Migration Strategy
60+
61+
### Phase 1: Replace native interface definitions
62+
1. Create new interface files using the chosen library
63+
2. Keep the same method signatures where possible
64+
3. Ensure struct mappings (stat, timeval, passwd, group) are complete
65+
66+
### Phase 2: Update consumers one by one
67+
Migrate in this order (least to most complex):
68+
1. `UmaskOperator.java` — single `umask()` call
69+
2. `KillOperator.java``kill()` + `getpid()`
70+
3. `UtimeOperator.java``utimes()` with struct
71+
4. `Operator.java``chmod()`, `chown()`, pid/uid/gid
72+
5. `WaitpidOperator.java``waitpid()` with flag macros
73+
6. `Stat.java``stat()`/`lstat()` with 13-field struct
74+
7. `NativeUtils.java` / `ExtendedNativeUtils.java` — passwd/group lookups
75+
76+
### Phase 3: Remove JNA dependency
77+
1. Remove JNA imports from all files
78+
2. Remove JNA from `build.gradle` and `pom.xml`
79+
3. Remove `--enable-native-access=ALL-UNNAMED` from `jperl` launcher (if no longer needed)
80+
4. Verify the `sun.misc.Unsafe` warning is gone
81+
82+
## Testing
83+
84+
After each file migration:
85+
```bash
86+
make # Must pass
87+
make test-all # Check for regressions
88+
```
89+
90+
Key tests that exercise native operations:
91+
- `perl5_t/t/op/stat.t` — stat/lstat fields
92+
- `perl5_t/t/io/fs.t` — chmod, chown, utime
93+
- `perl5_t/t/op/fork.t` — kill, waitpid
94+
- `src/test/resources/unit/glob.t` — readdir (uses stat internally)
95+
96+
## Build Configuration
97+
98+
### Current JNA in gradle
99+
```
100+
# gradle/libs.versions.toml
101+
jna = "5.18.1"
102+
jna = { module = "net.java.dev.jna:jna", version.ref = "jna" }
103+
jna-platform = { module = "net.java.dev.jna:jna-platform", version.ref = "jna" }
104+
```
105+
106+
### Current JNA in pom.xml
107+
```xml
108+
<dependency>
109+
<groupId>net.java.dev.jna</groupId>
110+
<artifactId>jna</artifactId>
111+
</dependency>
112+
<dependency>
113+
<groupId>net.java.dev.jna</groupId>
114+
<artifactId>jna-platform</artifactId>
115+
</dependency>
116+
```
117+
118+
## Platform Considerations
119+
120+
- **macOS/Linux**: Full POSIX support required (stat, lstat, kill, waitpid, chmod, chown, umask, utime, passwd/group lookups)
121+
- **Windows**: Limited support via `kernel32` (`GetCurrentProcessId`), `msvcrt` (`_getpid`, stat)
122+
- The replacement must handle both platforms, or gracefully degrade on Windows (as JNA currently does)

jperl

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -24,5 +24,5 @@ else
2424
fi
2525

2626
# Launch Java
27-
java ${JPERL_OPTS} -cp "$CLASSPATH:$JAR_PATH" org.perlonjava.app.cli.Main "$@"
27+
java --enable-native-access=ALL-UNNAMED ${JPERL_OPTS} -cp "$CLASSPATH:$JAR_PATH" org.perlonjava.app.cli.Main "$@"
2828

0 commit comments

Comments
 (0)