Skip to content

fix(interpreter): raise register cap so big eval STRINGs compile#519

Merged
fglock merged 3 commits intomasterfrom
fix/interpreter-register-limit
Apr 21, 2026
Merged

fix(interpreter): raise register cap so big eval STRINGs compile#519
fglock merged 3 commits intomasterfrom
fix/interpreter-register-limit

Conversation

@fglock
Copy link
Copy Markdown
Owner

@fglock fglock commented Apr 21, 2026

Summary

Running ./jcpan -t DateTime::Format::Alami (or any distribution under PERLANCAR) failed before even fetching the tarball because CPAN does Safe->new->reval($CHECKSUMS) on the author's CHECKSUMS file, and that file is one giant hash literal (56K+ lines, ~8000 entries × ~6 subfields). Our interpreter bytecode compiler refused to allocate more than 65535 registers.

Investigation showed the 65535 cap was entirely artificial:

  • bytecode is ArrayList<Integer> during compile (dynamic).
  • The per-call register array is allocated exactly sized at new RuntimeBase[maxRegisterEverUsed + 1] (dynamic).
  • Nothing stores register indices as short or char — the "registers are 16-bit" comment was stale.

Registers are only recycled between statements, so one huge hash literal legitimately needs ~200K registers.

Changes

  • Raise the interpreter register cap from 65535 to 16M (128MB worst-case frame allocation — still a reasonable sanity bound against runaway allocations, but big enough for every realistic workload we've seen).
  • Drop the stale "16-bit" comment on emitReg.

Test plan

  • make passes (unit tests).
  • Minimal repro that previously died now succeeds:
    use Safe; my $s = Safe->new;
    my $code = "{"; $code .= qq["k$_" => { a=>1,b=>2,c=>3,d=>4,e=>5,f=>6 },\n] for 1..5000;
    $code .= "}"; $s->reval($code);  # was "Too many registers...", now OK
  • ./jcpan -t DateTime::Format::Alami now progresses past the CHECKSUMS eval and continues through the normal install/test flow (next failures are unrelated regex-recursion / (?{...}) limitations tracked separately).

Notes

DateTime::Format::Alami itself still doesn't pass its tests because its patterns use Perl-specific regex recursion (?&NAME) inside (?(DEFINE)...) plus many (?{ code }) action blocks — features that java.util.regex doesn't support. Those are pre-existing limitations, not regressions from this PR. Filing as a separate investigation.

Generated with Devin

The interpreter bytecode compiler hard-capped registers at 65535 with a
stale comment claiming registers were 16-bit. In reality the bytecode
stream is int[] and the per-call register array is allocated exactly
sized to maxRegisterEverUsed+1, so nothing structural limits registers
to 65535 — it was a safety check left over from an older encoding.

Large eval STRINGs (notably CPAN CHECKSUMS files read via
Safe->reval(), which can be 50K+ lines and one giant hash literal)
legitimately need ~200K registers because registers are only recycled
between statements, not within a single expression.

Raise the cap to 16M (keeps a sanity bound against runaway allocations:
the register array would still fit in ~128MB in the worst case) and
fix the stale "16-bit" comment on emitReg.

Repro:
    use Safe; my $s = Safe->new;
    my $code = "{" . (map { qq["k$_" => { a=>1,b=>2,c=>3,d=>4,e=>5,f=>6 },] } 1..5000) . "}";
    $s->reval($code);
Before: "Too many registers: exceeded 65535 register limit"
After:  ok

This unblocks `./jcpan -t <module>` for any distribution whose author
has a large CHECKSUMS file (PERLANCAR, etc.).

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
@fglock fglock force-pushed the fix/interpreter-register-limit branch from 3e141e8 to ecabff6 Compare April 21, 2026 08:45
Non-constant (?{ CODE }) blocks are replaced by the string-interpolation
parser with a placeholder that RegexPreprocessor detects and reports as
"not implemented" (or warns under JPERL_UNIMPLEMENTED=warn).

The old placeholder spelled "UNIMPLEMENTED_CODE_BLOCK" — that trailing
'K' participates in the preprocessor's /i single-char fold expansion
(K ↔ k ↔ Kelvin sign U+212A), which rewrites the marker into
(?:\Qk\E|\QK\E|\Qℹ K\E) *before* the marker-detection check runs. Under
/i the detection therefore silently failed and the pattern containing
the garbled placeholder was compiled as-is, producing no diagnostic and
either failing to match or matching unrelated input.

This also introduces a shared RegexMarkers class so StringSegmentParser
and RegexPreprocessor cannot drift out of sync on the exact spelling of
the placeholder.

The (??{...}) recursive-pattern placeholder is renamed identically for
consistency, but its handling is deliberately unchanged: the existing
(??{...}) code path reduces non-constant bodies to an empty non-capturing
group, a soft fallback that existing tests and CPAN modules rely on.

Before (with /i): qr{(?{ $x })wk}ix silently compiled a broken pattern.
After: "(?{...}) code blocks in regex not implemented" regardless of
flags, matching the pre-existing non-/i behavior.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
@fglock fglock force-pushed the fix/interpreter-register-limit branch from ecabff6 to 57b3c29 Compare April 21, 2026 08:58
…y hide

Follow-up to the (?{...}) fold-safe marker fix. (??{ CODE })
recursive/dynamic patterns are not supported by the underlying Java
regex engine. Previously the preprocessor silently substituted an
empty non-capturing group for non-constant bodies, producing wrong
match semantics with no diagnostic whatsoever.

That silent fallback was masking real problems: tests and CPAN
modules that used (??{...}) compiled "successfully" and sometimes
matched by coincidence (because an empty pattern happens to match at
position 0 or because the rest of the regex was what actually mattered),
giving the false impression that the feature worked.

JPERL_UNIMPLEMENTED=warn exists so test runs can keep going past
unsupported features — it is NOT meant to hide problems. Bring
(??{...}) in line with that principle:

- Under default die mode: throw a clean "(??{...}) recursive/dynamic
  regex patterns not implemented" error, same as (?{...}).
- Under JPERL_UNIMPLEMENTED=warn: emit the warning *and* fall through
  to the existing non-constant (??{...}) code path (which appends
  "(?:" for a soft empty-group fallback). The user sees a diagnostic
  but the surrounding pattern still compiles and test harnesses don't
  abort mid-run.

Implementation: new regexUnimplementedSoft() helper that picks the
die-vs-warn behavior based on $ENV{JPERL_UNIMPLEMENTED}. Unlike the
RuntimeRegex-level catch (which replaces the whole pattern with a
never-matching sentinel), this helper warns directly and lets the
caller emit its fallback, so only the unsupported construct is
degraded — the rest of the pattern keeps its semantics.

Tests in perl5_t/t/re/reg_eval.t and re/regexp.t that previously
"passed" only because the silent fallback happened to give the same
match result will now correctly report as failing, surfacing the real
state of (??{...}) support. That is the intended outcome.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
@fglock fglock merged commit f8e0a8f into master Apr 21, 2026
2 checks passed
@fglock fglock deleted the fix/interpreter-register-limit branch April 21, 2026 09:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant