fix(interpreter): raise register cap so big eval STRINGs compile#519
Merged
fix(interpreter): raise register cap so big eval STRINGs compile#519
Conversation
The interpreter bytecode compiler hard-capped registers at 65535 with a
stale comment claiming registers were 16-bit. In reality the bytecode
stream is int[] and the per-call register array is allocated exactly
sized to maxRegisterEverUsed+1, so nothing structural limits registers
to 65535 — it was a safety check left over from an older encoding.
Large eval STRINGs (notably CPAN CHECKSUMS files read via
Safe->reval(), which can be 50K+ lines and one giant hash literal)
legitimately need ~200K registers because registers are only recycled
between statements, not within a single expression.
Raise the cap to 16M (keeps a sanity bound against runaway allocations:
the register array would still fit in ~128MB in the worst case) and
fix the stale "16-bit" comment on emitReg.
Repro:
use Safe; my $s = Safe->new;
my $code = "{" . (map { qq["k$_" => { a=>1,b=>2,c=>3,d=>4,e=>5,f=>6 },] } 1..5000) . "}";
$s->reval($code);
Before: "Too many registers: exceeded 65535 register limit"
After: ok
This unblocks `./jcpan -t <module>` for any distribution whose author
has a large CHECKSUMS file (PERLANCAR, etc.).
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
3e141e8 to
ecabff6
Compare
Non-constant (?{ CODE }) blocks are replaced by the string-interpolation
parser with a placeholder that RegexPreprocessor detects and reports as
"not implemented" (or warns under JPERL_UNIMPLEMENTED=warn).
The old placeholder spelled "UNIMPLEMENTED_CODE_BLOCK" — that trailing
'K' participates in the preprocessor's /i single-char fold expansion
(K ↔ k ↔ Kelvin sign U+212A), which rewrites the marker into
(?:\Qk\E|\QK\E|\Qℹ K\E) *before* the marker-detection check runs. Under
/i the detection therefore silently failed and the pattern containing
the garbled placeholder was compiled as-is, producing no diagnostic and
either failing to match or matching unrelated input.
This also introduces a shared RegexMarkers class so StringSegmentParser
and RegexPreprocessor cannot drift out of sync on the exact spelling of
the placeholder.
The (??{...}) recursive-pattern placeholder is renamed identically for
consistency, but its handling is deliberately unchanged: the existing
(??{...}) code path reduces non-constant bodies to an empty non-capturing
group, a soft fallback that existing tests and CPAN modules rely on.
Before (with /i): qr{(?{ $x })wk}ix silently compiled a broken pattern.
After: "(?{...}) code blocks in regex not implemented" regardless of
flags, matching the pre-existing non-/i behavior.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
ecabff6 to
57b3c29
Compare
…y hide
Follow-up to the (?{...}) fold-safe marker fix. (??{ CODE })
recursive/dynamic patterns are not supported by the underlying Java
regex engine. Previously the preprocessor silently substituted an
empty non-capturing group for non-constant bodies, producing wrong
match semantics with no diagnostic whatsoever.
That silent fallback was masking real problems: tests and CPAN
modules that used (??{...}) compiled "successfully" and sometimes
matched by coincidence (because an empty pattern happens to match at
position 0 or because the rest of the regex was what actually mattered),
giving the false impression that the feature worked.
JPERL_UNIMPLEMENTED=warn exists so test runs can keep going past
unsupported features — it is NOT meant to hide problems. Bring
(??{...}) in line with that principle:
- Under default die mode: throw a clean "(??{...}) recursive/dynamic
regex patterns not implemented" error, same as (?{...}).
- Under JPERL_UNIMPLEMENTED=warn: emit the warning *and* fall through
to the existing non-constant (??{...}) code path (which appends
"(?:" for a soft empty-group fallback). The user sees a diagnostic
but the surrounding pattern still compiles and test harnesses don't
abort mid-run.
Implementation: new regexUnimplementedSoft() helper that picks the
die-vs-warn behavior based on $ENV{JPERL_UNIMPLEMENTED}. Unlike the
RuntimeRegex-level catch (which replaces the whole pattern with a
never-matching sentinel), this helper warns directly and lets the
caller emit its fallback, so only the unsupported construct is
degraded — the rest of the pattern keeps its semantics.
Tests in perl5_t/t/re/reg_eval.t and re/regexp.t that previously
"passed" only because the silent fallback happened to give the same
match result will now correctly report as failing, surfacing the real
state of (??{...}) support. That is the intended outcome.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Running
./jcpan -t DateTime::Format::Alami(or any distribution underPERLANCAR) failed before even fetching the tarball because CPAN doesSafe->new->reval($CHECKSUMS)on the author'sCHECKSUMSfile, and that file is one giant hash literal (56K+ lines, ~8000 entries × ~6 subfields). Our interpreter bytecode compiler refused to allocate more than 65535 registers.Investigation showed the 65535 cap was entirely artificial:
bytecodeisArrayList<Integer>during compile (dynamic).new RuntimeBase[maxRegisterEverUsed + 1](dynamic).shortorchar— the "registers are 16-bit" comment was stale.Registers are only recycled between statements, so one huge hash literal legitimately needs ~200K registers.
Changes
emitReg.Test plan
makepasses (unit tests)../jcpan -t DateTime::Format::Alaminow progresses past the CHECKSUMS eval and continues through the normal install/test flow (next failures are unrelated regex-recursion /(?{...})limitations tracked separately).Notes
DateTime::Format::Alamiitself still doesn't pass its tests because its patterns use Perl-specific regex recursion(?&NAME)inside(?(DEFINE)...)plus many(?{ code })action blocks — features thatjava.util.regexdoesn't support. Those are pre-existing limitations, not regressions from this PR. Filing as a separate investigation.Generated with Devin