Bound prefer_inner_replay_corrections by depth_diff to substitute siblings only #682
stefanobaghino wants to merge 37 commits into
push_meta_ops emitted the compound Pop before the Restore when the leaving context carried `clear_scopes:` and `set_pop_count > 1`. The intermediate popped frame's meta_content_scope is split across the live scope stack and clear_stack; Pop — counting the full pre-Clear total — then ate atoms from frames below the popped range. Observed on Batch File `cmd-set-quoted-value-inner-end` (`clear_scopes: 1`) firing `pop: 2, set: ignored-tail-outer`, which dropped `meta.command.set.dosbatch` from the trailing content of every quoted `set "var"=...` line. Emit Restore before Pop when `set_pop_count > 1`; plain `set:` keeps the existing Pop-then-Restore order (gated by the Lisp `defun` test `v2_set_to_target_with_clear_scopes_clears_parent_meta_content_scope`). syntax_test_batch_file.bat: 74 → 0 on both backends; no other baseline entry changes. New regression `pop_n_set_with_cur_clear_scopes_restores_before_popping_deeper_frames`. Refs: trishume#631
Sublime Text applies `captures: N:` to the overlap between group N's span and the rule's consumed match range. syntect dropped lookaround-internal captures in `parse_captures` and would have emitted Pop past match_end in `build_capture_ops` had they reached it. Keep every non-negative `captures:` key at load time; clip `(cap_start, cap_end)` to `regions.pos(0)` at apply time. Removes the now-unused `get_consuming_capture_indexes` walker and its tests. Baseline (both backends): clears ASP/syntax_test_asp.asp (53), C#/tests/syntax_test_Generics.cs (3), Rails/tests/syntax_test_rails.html.erb (23). Refs: trishume#631
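The clipping step described above can be sketched in isolation. This is a minimal illustration, not syntect's `build_capture_ops`: the function name `clip_capture` and the choice to drop empty overlaps are assumptions made for the example.

```rust
/// Clip a capture group's span to the rule's consumed match range.
/// Hypothetical helper: returns `None` when nothing of the capture
/// overlaps the consumed range (for example a group that sits wholly
/// inside a lookahead past the match end).
fn clip_capture(cap: (usize, usize), consumed: (usize, usize)) -> Option<(usize, usize)> {
    let start = cap.0.max(consumed.0);
    let end = cap.1.min(consumed.1);
    // Assumption of this sketch: an empty overlap emits no scope ops.
    if start < end {
        Some((start, end))
    } else {
        None
    }
}
```

A capture that straddles the match end is shortened rather than dropped, so no Pop can land past `match_end`.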
ST drops the popped context's `meta_scope` and `meta_content_scope`
from the trigger match's text for `pop: N + embed:`, unlike
`pop: N + set:` which preserves both. Rules in the wild re-add the
meta_scope atom explicitly in their match `scope:` so it still
appears exactly once on the trigger — HTML (JSP).sublime-syntax's
`tag-jsp-{declaration,expression,scriptlet}-attributes` all do.
syntect's Embed → synthetic Set routing in `push_meta_ops`
inherited plain-Set semantics, so cur.meta_scope stayed on the
stack and the match's explicit scope duplicated it on top.
Fix: when `pop_count > 0`, emit initial-phase Pops for cur's mcs
and ms, then pass a scope-stripped clone of cur through to the
recursive Set call so its non-initial `num_to_pop` doesn't
double-account atoms that are already off the stack. Probe and
ordering invariant in `v2_pop_embed_suppresses_cur_meta_scope_on_match`.
Net syntest: Java/jsp 44 → 39 on both baselines (5
`- meta.tag.jsp meta.tag.jsp` assertions cleared). The other 39
are three unrelated root causes, not addressed here.
Refs: trishume#631
A cross-line `fail` replay commits a Push(meta_scope) to
`flushed_ops` for a speculative context; a later same-line `fail`
for a branch_point created *during* the replay then truncates the
owning context out of `self.stack` without emitting a balancing
Pop (the Push is in `flushed_ops`, beyond `ops.truncate`'s reach).
`exec_escape` pops based on the truncated stack, leaving an
orphan atom at the top.
Track a `shadow: ScopeStack` mirror of the consumer view, synced
at `parse_line` boundaries the same way `syntest` applies
`replayed` + `ops`. `exec_escape` now emits a corrective Pop for
any atoms exceeding the sum of `self.stack`'s `meta_scope` /
gated `meta_content_scope` contributions.
Drops `syntax_test_latex.tex: 76` from both
`testdata/known_syntest_failures{,_fancy}.txt`.
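The corrective-Pop arithmetic can be sketched as a count of shadow atoms beyond what the live stack accounts for. The `Frame` struct, its fields, and `corrective_pop_count` are hypothetical names for illustration only; the real bookkeeping in `exec_escape` may differ.

```rust
/// Hypothetical per-context summary: how many atoms this frame
/// contributes to the consumer-visible scope stack.
struct Frame {
    meta_scope_len: usize,
    meta_content_scope_len: usize,
    /// Whether the frame's meta_content_scope was actually pushed
    /// (the "gated" contribution mentioned above).
    mcs_pushed: bool,
}

/// Number of orphan atoms to pop: whatever the shadow stack holds
/// beyond the sum of the live frames' contributions.
fn corrective_pop_count(shadow_len: usize, stack: &[Frame]) -> usize {
    let expected: usize = stack
        .iter()
        .map(|f| {
            let mcs = if f.mcs_pushed { f.meta_content_scope_len } else { 0 };
            f.meta_scope_len + mcs
        })
        .sum();
    shadow_len.saturating_sub(expected)
}
```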
`IncludeWithPrototype` in `MatchIter` pushed the included target on top of the prototype, and `MatchIter::next` reads the stack top (`ctx_stack[len-1]`) — so the target's patterns were iterated first and the external prototype's second. The parser's tie-break on match_start is strict `<`, so whichever rule is enumerated first wins a same-position match. ST's `apply_prototype` semantics and `ParseState::find_best_match`'s own `context.prototype` chaining (`chain(cur_prototype).chain(cur_context)`) both put prototype patterns ahead of the target. Swap the two pushes so the prototype lands on top of the stack and is iterated first. Concretely: in HAML `tag-attributes-content`, `ruby-code` does `include: scope:source.ruby.rails.embedded.haml apply_prototype: true`. Ruby-for-HAML's prototype injects HAML's `pipe-continuations` (match `|\s*$`). Before this fix, Ruby's bitwise-or rule (`[~|^]`) at the same position was iterated first and won the tie, so `|` at EOL got `keyword.operator.bitwise.ruby` instead of `punctuation.separator.continuation.haml`; the attribute braces popped at the newline, and every continuation assertion below cascaded. After the swap, the prototype's pipe-continuation wins the tie. Refs: trishume#631
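Why push order decides the winner can be shown with a toy model. `Candidate`, `best_match`, and `enumerate_top_first` are illustrative names, not syntect types; the sketch only assumes what the message states: iteration reads the stack top first and the tie-break is a strict `<` on `match_start`.

```rust
/// Hypothetical stand-in for a rule that matched at some byte offset.
#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    name: &'static str,
    match_start: usize,
}

/// Enumerate patterns the way a stack-top-first iterator would:
/// the context pushed last is visited first.
fn enumerate_top_first(stack: &[Vec<Candidate>]) -> Vec<Candidate> {
    stack.iter().rev().flatten().cloned().collect()
}

/// Strict `<` tie-break: a later candidate only wins if it starts
/// earlier, so the first-enumerated rule wins a same-position tie.
fn best_match(candidates: &[Candidate]) -> Option<&Candidate> {
    let mut best: Option<&Candidate> = None;
    for c in candidates {
        let better = match best {
            Some(b) => c.match_start < b.match_start,
            None => true,
        };
        if better {
            best = Some(c);
        }
    }
    best
}
```

With the prototype pushed last (on top), its pipe-continuation rule is enumerated first and wins the tie against the target's bitwise-or rule at the same offset.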
Strengthens the existing `apply_prototype_includes_external_prototype` from build-only to parse-and-assert. Adds precedence, opt-out, and HAML-Rails end-to-end guards alongside it in `src/parsing/syntax_set.rs`. Refs: trishume#631
All 65 assertions in `syntax_test_rails.haml` pass after the apply_prototype ordering fix. Delta applies to both baselines.
ST's `text_point(row, col)` overflows past-EOL columns into the next row, so its syntax-test framework evaluates past-EOL assertions against the corresponding column on the next line. syntect's harness was instead testing against the consumed `\n`'s scope — silent divergence whenever the `\n` carried parent meta_scopes that the EOL pop chain dropped. Reorder the loop to parse-before-assert; thread the first post-target line's scopes into `process_assertions` (`examples/syntest.rs`); fall back to the previous behaviour when next-line scopes aren't available (EOF, replay path). Closes 17 `syntax_test_git_config` and 1 `syntax_test_clojure` stale baselines. Refs: trishume#631
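The past-EOL overflow can be sketched as plain index arithmetic. `resolve_assertion_point` is a hypothetical helper, not the harness's API; it assumes each buffered line includes its trailing `\n` and that a column wraps at most one line forward. Returning `None` stands in for "fall back to the previous behaviour".

```rust
/// Resolve an assertion's (row, col) against buffered lines, letting
/// past-EOL columns overflow onto the following line.
/// Returns the (line index, column) to test against, or `None` when
/// no next-line position is available (e.g. at EOF).
fn resolve_assertion_point(lines: &[&str], row: usize, col: usize) -> Option<(usize, usize)> {
    let line = lines.get(row)?;
    if col < line.len() {
        // Column is on the target line itself.
        return Some((row, col));
    }
    // Column is past EOL: carry the remainder onto the next line.
    let overflow = col - line.len();
    let next = lines.get(row + 1)?;
    if overflow < next.len() {
        Some((row + 1, overflow))
    } else {
        None
    }
}
```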
Two cross-line branches failing on the same parse_line grew
`flushed_ops` by append, so `ParseLineOutput::replayed` doubled and
consumers that pair `replayed[i]` with the i-th pending line slid
ops from one buffered line onto another's text. Observed as the
byte-77 panic at `syntax_test_java.java` line 624.
Track `flushed_ops_start` alongside `flushed_ops` and merge
subsequent fails against the prior snapshot's range. See
`ParseState::merge_flushed` docs for the composition rule.
`known_syntest_failures{,_fancy}` absorb the unmasking: Python /
TypeScript / Bash / Zsh files previously panicking now report their
real path-1 counts. Java stays at `1` — next panic site is a
pre-existing stale-`line_number` on branches created during replay,
tracked as follow-up. Refs: trishume#631
Branches created while `handle_fail` re-parses a buffered past line snapshotted `self.line_number` / `self.pending_lines.len()`, which still reflect the *outer* `parse_line`'s current line. A later fail on the outer line would then see `bp.line_number == cur_line`, classify the branch as same-line, and apply the branch's replay-line-relative `match_start` to a shorter outer line — shipped as `byte index 20 out of bounds of " foo = BAR,\n"` on `syntax_test_java.java:10263` inside `@MultiLineAnnotation(...)`, and the matching byte-2 panic in `syntax_test_markdown.md`'s multi-line math blocks. Introduce a `replay_ctx: Option<ReplayCtx>` set around each inner `parse_line_inner*` call in both replay loops. Branch creation and `handle_fail`'s `cur_line` read through it, so branches born in the re-parse of line `L+i` record `line_number = L+i` and `pending_lines_snapshot_len = <slot for L+i>`. Baselines absorb the unmasking: TypeScript drops 230 to 12 (cascading replay-branch misclassifications fixed), Markdown moves 1 to 897 (the `1` was the byte-2 panic artefact; real count surfaces). `syntax_test_java.java` stays at `1`: a distinct pre-existing `NoClearedScopesToRestore` surfaces further into the same file, tracked as a follow-up. Refs: trishume#631
A `branch_point` born inside `handle_fail`'s cross-line replay recorded only the inner re-parse's local `res` Vec as its `prefix_ops`. When that nested branch later failed cross-line, its own replay reconstructed the line from an empty prefix and the captures emitted before the *outer* branch trigger vanished. Shipped as `[foo]: /url` losing its `meta.link.reference.def.markdown` and capture scopes whenever the outer `link-def-title-continuation` branch's `immediately-pop2` alt-1 spawned a nested `link-def-attr-continuation` whose own fail then replayed line 3 without the original LRD opener captures. Compose the first-line prefix (outer `prefix_ops` + new-alt meta/pat/capture/meta_content) up front in both cross-line replay paths, surface it via `ParseState::replay_prefix_ops`, and prepend it to inner branch creations' `prefix_ops`. Baselines: Markdown 897 → 565, TypeScript 12 → 0 (file disappears — `syntax_test_typescript.ts` exercised the same nested-replay shape). Refs: trishume#631
`parse_line` captured the buffered shadow snapshot BEFORE the line ran, and the syntest consumer captured `stack_before` similarly. A replay applied during that line's parse may have corrected ops for prior buffered lines, leaving the captured snapshot reflecting the uncorrected baseline. A LATER replay covering the same line then resets to that stale snapshot, re-applies the corrected ops on top, and resurrects any meta_scope the prior replay had unwound. Manifested as `meta.link.reference.def.markdown` leaking past back-to-back Markdown link reference definitions and polluting all subsequent paragraphs, code blocks, blockquotes, autolinks, footnotes, etc. for the rest of the file (~408 chars / 88 assertions in `syntax_test_markdown.md`).
After applying replays in `parse_line`, overwrite each buffered `pending_line_start_shadows[start_idx + i + 1]` with the post-i shadow, and use the post-replay shadow as the snapshot for the current line being pushed. Mirror the same correction in `syntest`'s consumer loop on `parsed_line_buffer[..].stack_before`.
Baselines:
- Markdown 565 → 158 (the LRD-leak family)
- Java 1 (panic) → 18953 (real failures unmasked — the `NoClearedScopesToRestore` panic that the same drift was triggering is gone)
Refs: trishume#631
A `pop: N + branch_point` snapshots `stack_depth` pre-pop; the synthetic Set's post-Set retain (`bp.stack_depth <= final_len`) and `handle_fail`'s validity check (`stack.len() < bp.stack_depth`) both ignored that `pop_count`, dropping the freshly-created bp at creation. Same-line re-emit also missed the popped contexts' meta_scope clearance Pop — route it through `push_meta_ops` like the original push. Symptom: `meta.annotation.identifier.java meta.path.java` leaking past nested-annotation extends paths in `syntax_test_java.java`. Drops Java baseline 18953 -> 9956.
Mirrors the trishume#660 same-line fix into the cross-line branch — the bespoke re-emit of the new alternative's meta_scope/meta_content_scope was missing the popped contexts' Pop, leaking the popped meta_scope (annotation-qualified-identifier's meta_scope in Java) plus the surrounding declaration's meta_scope when an annotation crosses a line into a class/enum/interface declaration.
…ctions
When an outer cross-line `fail`'s replay re-parses buffered lines, an
inner cross-line `fail` firing during the loop writes its correction
into `self.flushed_ops`. Previously, the outer's locally-computed
`replayed_ops[i]` overwrote that correction via `merge_flushed`, freezing
a stale interpretation for indices the inner had already corrected.
Fixes the leak in `src/parsing/parser.rs::handle_fail` for both the
alt-N and exhaustion cross-line paths. Repro: Java
`@A.B\n(par=1)\nenum E {}\n` — the outer `declarations` fail's line-1
reparse froze the dotted annotation as `path` alt before the inner
`annotation-qualified-identifier` fail's `name`-alt resolution landed.
Drops Java syntest baseline 9935 → 9774; no regressions in other
languages or in `Markdown` (still 158).
When a same-line branch_point exhausts at a zero-width lookahead, rewind the cursor to the BP's original position and skip the same-name Branch pattern on retry — letting the parent context's next rule fire instead of advancing past the lookahead, which let stale keyword rules match inside identifiers (`package` in `$package`, `class` in `Foo.class;`). Drops Java syntest 9774 → 1987 (-7787, -80%); jsp 39 → 0; Zsh 604 → 410. Markdown unchanged at 158. No regressions elsewhere. See parser.rs::handle_fail same-line exhaustion handler and the new `skipped_branches` field; new test `exhausted_branch_point_falls_through_to_parent_next_rule`.
`push_meta_ops`'s non-initial phase emitted the deep-context meta_scope/mcs Pops before restoring `cur_context.clear_scopes`. When the cleared atom belonged to one of the deeper contexts being popped, the Pops landed on the wrong (still-visible) scope — observed on Java's `case DayType when -> "incomplete"`, where `case-label-expression`'s `clear_scopes: 1` hid `case-label`'s `meta.case.java` and `case-label-end`'s `pop: 2` then popped the surrounding switch block off the consumer's stack. Move the cur_context Restore to before the depth loop so the previously-cleared atom is visible again when the deeper-context Pop lands on it. Drops Java syntest 1987 → 949 (-1038, additional -50%); fixes C#'s `syntax_test_GeneralStructure.cs` (was 2 → 0) and Haskell -1. Markdown unchanged at 158, no other regressions. See `parser.rs::push_meta_ops` Pop arm and the new test `pop_n_restores_clear_before_unwinding_deeper_meta_scopes`.
The YAML loader checked `set:`, `branch:`, and `embed:` after a `pop:` key but never `push:`. Combined `pop: N + push: X` rules degraded to a plain `Pop(N)` and silently dropped the push, leaving the parser on the outer context instead of the intended target. Affected rules in vendored syntaxes: Java's `pop: 2 + push: annotation-parameters-body` (lambda3 line 10069 and many others) and `pop: 1 + push: case-label-expression`; Python's `pop: 2 + push: function-parameter-list-body` and `type-parameter-list-body`. Java syntest 641 → 245 (-396); Python 66 → 45 (-21). Other language baselines unchanged.
The Set initial-phase Pop at parser.rs:1992 unconditionally popped `cur_context.meta_content_scope.len()` even when cur_context's mcs was never pushed because the context immediately below has `embed_scope_replaces=true`. This dropped the topmost wrapper-pushed embed_scope token. Mirrors the skip already in the Pop branch at parser.rs:1912. Markdown 158 -> 31; Python 45 -> 32 (free benefit).
Plain `set:` (no `pop_count`) into a target with `clear_scopes` emitted that Clear in `push_meta_ops`'s initial phase even when the leaving context carried its own `meta_scope` / `meta_content_scope`. Cur's ms sits on top of the visible stack at that point; Clear hid it instead of the parent atom the optimization was meant to strip. The non-initial Pop then ate atoms below cur's hidden ms, and the trailing Restore resurrected cur's ms — leaving cur's meta_scope where the parent's atom used to be. Bash repro `: ~/`: `~` set: `tilde-modifier` (clear+ms); `''` zero-width set: `tilde-modifier-username` (clear+mcs); `/` lookahead pops. ST scopes `/` as `meta.string.glob.shell string.unquoted.shell`; syntect emitted `meta.interpolation.tilde.shell string.unquoted.shell`. Fix: when cur has `meta_scope` or `meta_content_scope`, defer the single-context-set target Clear to the non-initial phase, after Pop+Restore (so Pop finds cur's ms visible and Restore brings the parent atoms back) and before pushing target's ms/mcs. The cur-empty case (Lisp `(defun fn (...)`, pinned by `v2_set_to_target_with_clear_scopes_clears_parent_meta_content_scope`) is unchanged. Net syntest: bash 249 → 30, zsh 410 → 25, java 245 → 221 on both regex backends; no other baseline lines change. New regression `cur_meta_scope_set_to_target_with_clear_scopes` mirrors the bash shape. Refs: trishume#631
Multi-context `set:` whose target body has both `clear_scopes: N` and
a non-empty `meta_scope`, fired from a cur with no ms/mcs/clear,
needs an extra atom dropped on the trigger token beyond Clear's
reach. ST drops `N + 1` atoms on the trigger and `N` on the body
content, anchoring the extra drop on the target's `meta_scope`.
`push_meta_ops` previously kept both atoms on the trigger, leaking
nested `meta.function.php` / `meta.function.return-type.php` into
the `:` of PHP `function bye(): never {`. The fix emits a combined
`Clear(N + 1)` in the initial phase and a paired `Restore` in the
non-initial phase, leaving the body content's existing per-context
Clear+Push to land it at the same place as before.
Gated on the clear-bearing target carrying a non-empty `meta_scope`
so syntaxes whose target has only `meta_content_scope` are
unaffected — Zsh's `zsh-redirection-glob-range-end` (clear+mcs, no
ms) on the `<` redirection trigger otherwise loses
`source.shell.zsh` and `meta.function-call.arguments.shell`.
PHP 1 -> 0.
push_meta_ops's `MatchOperation::Set` arm with `set_pop_count > 1` lumped target.ms + cur.ms + every popped deeper frame's mcs+ms into a single Pop. Per-frame clear_scopes were never restored — their cleared atoms stayed in clear_stack out of reach, and the new target's clear_scopes then bit one atom too deep. Observed on Python `r'''(?ix:some text(?-i:hello))(?iLmsux)(?a)foo'''`: the `(?ix:` rule's `pop: 3 + set:[group-body-extended, maybe-unexpected-quantifiers]` left `group-body-extended_outer`'s cleared `meta.mode.extended.regexp` in clear_stack; `group-body-extended_target`'s `clear_scopes: 1` then cleared `source.regexp.python` (the embed wrapper's mcs) instead of `mode_outer`. ST keeps `source.regexp.python` visible from col 22 through col 47+; syntect previously dropped it from col 27 onward. Split the lumped Pop into a head Pop (target.ms + cur.ms) and a per-depth Pop+Restore loop mirroring `MatchOperation::Pop` arm at parser.rs:1954-1971. New regression tests `pop_n_set_restores_deeper_frame_clear_scopes` (positive) and `pop_n_set_without_deeper_clear_scopes_unaffected` (negative gate against regressing Java's `pop:2 + push:annotation-parameters-body` shape). Refs: trishume#631
Resolved by per-depth clear_scopes Restore on pop:N + set:. Refs: trishume#631
`yaml_load`'s `parse_embed_op` was setting `embed_scope_replaces=true` on the wrapper unconditionally. That flag tells the per-target loop in `parser.rs` to suppress the next context's `meta_content_scope` push, to avoid duplicating the embedded syntax's top-level scope (auto-inserted into `main`'s mcs at `yaml_load.rs:706-713`) with the wrapper's last `embed_scope` atom. That dedup is only needed when the embed enters via `main`. Fragment embeds (e.g. `embed: scope:source.toml.embedded.python#toml`) bypass `main`, so the fragment context's mcs is independent of the syntax's top-level scope. Suppressing it strips a real grammar atom (TOML's `meta.mapping.toml`) and the next `clear_scopes:` then bites the wrapper instead of the intended grammar atom — leaking the wrapper out of every nested scope inside the embed.
Mark the wrapper as `embed_scope_replaces=true` only when the embed target has no `#fragment`.
Two regression tests:
- `fragment_embed_preserves_target_meta_content_scope` (positive)
- `non_fragment_embed_still_suppresses_main_mcs` (negative gate)
The b31b727 test `embed_scope_replaces_preserves_wrapper_mcs_across_inner_set` is unaffected — Markdown's bash code-fence embed has no fragment. Python 32 -> 0 on both regex backends; no other baseline moves.
When a child syntax has multiple parents in `extends:` and the parents disagree on a shared context or variable, a parent's directly-defined entry now outranks another parent's inherited entry. Same-provenance ties still resolve last-wins. Fixes the indented zsh shebang in Markdown fenced blocks: `Zsh (for Markdown)` extends `[Bash (for Markdown), Zsh]`. Bash (for Markdown) owns a lenient `main` (`^(?=\s*#!)`); Zsh inherits Bash's strict column-0 main. The previous last-wins merge let Zsh's inherited main override, so the indented ` #!/usr/bin/env zsh` fell into the regular comments rule.
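The precedence rule can be sketched as a small merge over parents in `extends:` order. `Provenance`, `Entries`, and `merge_parents` are hypothetical names for this illustration; the real loader tracks contexts and variables with richer types.

```rust
use std::collections::HashMap;

/// How a parent came to hold an entry. Variant order matters:
/// `Inherited < Direct`, so a direct definition outranks an inherited one.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Provenance {
    Inherited,
    Direct,
}

/// Entry name mapped to (provenance, body); hypothetical model of one parent.
type Entries<'a> = HashMap<&'a str, (Provenance, &'a str)>;

/// Merge parents in `extends:` order. A higher-ranked provenance wins;
/// same-provenance ties resolve last-wins.
fn merge_parents<'a>(parents: &[Entries<'a>]) -> Entries<'a> {
    let mut merged: Entries<'a> = HashMap::new();
    for parent in parents {
        for (&name, &(prov, body)) in parent {
            let replace = match merged.get(name) {
                Some(&(existing, _)) => prov >= existing,
                None => true,
            };
            if replace {
                merged.insert(name, (prov, body));
            }
        }
    }
    merged
}
```

In the zsh shebang case, Bash (for Markdown) holds `main` directly while Zsh only inherits it, so the lenient `main` survives even though Zsh comes later in the list.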
`get_line_assertion_details` recognised any line where the testtoken
appeared mid-text and where valid assertion markers followed. ST's
syntax-test format only allows assertions on dedicated comment-only
lines, so source code preceding the testtoken means the markers are
coincidental. The harness was processing such lines as assertions
anyway, producing spurious failures and pinning
`test_against_line_number` away from the source line so the *next*
genuine assertion tested against stale scopes.
Fix: early-return `None` when source code precedes the testtoken or
non-whitespace follows the closing testtoken_end. The two bash repros
are `: ${#^pattern}` (the `#` is the parameter-length operator) and
`[ <<doc ] # <- ]` (a trailing comment whose body starts with `<-`).
Doing this also exposed a latent bug in
`only_whitespace_after_token_end`: `after_token_end` was the substring
*from* the end-token, so the end-token glyphs themselves always
counted as non-whitespace, and `/* ^ scope */` lines were silently
classified as non-pure. Under the old gate this was harmless (the
flag only fed the `parse_test_lines` path), but the early-return
turned every C-style block-comment assertion into a non-assertion
source line. Skip the end-token before checking the trailing content.
Three new harness unit tests cover the corrected predicate and both
shell repros. Two existing tests already exercise the pure-assertion
path; their `is_pure_assertion_line` field assertions are now
invariant-true at the constructor, but kept as documentation.
Net syntest deltas (both regex backends):
- Bash 30 -> 4 (residual: backtick `for...done` interaction)
- Zsh 25 -> 10 (residual: zsh glob-range scoping)
- Haskell 49 -> 43
Stacks on trishume#673.
The per-line search cache stored full-line `regex.search` results keyed
by MatchPattern pointer, then reused them on every later search regardless
of `search_end`. Inside an embed where `search_end` is clipped to the
escape position, that reuse can flip rule outcomes whose lookaheads sit
exactly at the boundary — the cached "no match" was computed against the
escape glyph, but a fresh truncated search would see end-of-input there.
Concretely: in `` `for i in $(seq 100); do echo $i; done` `` the
`done{{cmd_break}}` rule (`done(?!cmd_char)`) was searched at the outer
level with full-line text, where the lookahead saw the closing backtick
(itself a cmd_char) and failed. That `None` was cached. Inside the
backtick embed, with search_end clipped to the close, the cache
short-circuited the lookup before the regex could re-run with
end-of-input semantics, so `done` fell through to `cmd-name-body` and
got `variable.function.shell` instead of `keyword.control.loop.end.shell`.
Skip the cache lookup whenever `search_end < line.len()`. Insertion was
already gated on `search_end == line.len()`, so the cache stays
populated by full-line answers; truncated searches just re-run.
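A minimal model of the gated cache, assuming a cache keyed by a pattern id and a toy stand-in for the `done(?!cmd_char)` rule. `SearchCache`, `done_rule`, and the closure-based `search` signature are illustrative, not syntect's API, and the toy rule treats only the backtick as a `cmd_char`.

```rust
use std::collections::HashMap;

type Span = (usize, usize);

/// Per-line cache of full-line search results, keyed by a pattern id.
#[derive(Default)]
struct SearchCache {
    entries: HashMap<usize, Option<Span>>,
}

impl SearchCache {
    /// Consult and populate the cache only for full-line searches;
    /// truncated searches (`search_end < line.len()`) always re-run.
    /// `search_end` must fall on a char boundary.
    fn search<F>(&mut self, pattern_id: usize, line: &str, search_end: usize, run: F) -> Option<Span>
    where
        F: FnOnce(&str) -> Option<Span>,
    {
        let full_line = search_end == line.len();
        if full_line {
            if let Some(cached) = self.entries.get(&pattern_id) {
                return *cached;
            }
        }
        let result = run(&line[..search_end]);
        if full_line {
            self.entries.insert(pattern_id, result);
        }
        result
    }
}

/// Toy version of `done(?!cmd_char)`: matches `done` unless the next
/// character is a backtick. End-of-input satisfies the lookahead.
fn done_rule(text: &str) -> Option<Span> {
    let start = text.find("done")?;
    let end = start + "done".len();
    match text[end..].chars().next() {
        Some('`') => None,
        _ => Some((start, end)),
    }
}
```

The full-line search caches `None` because the lookahead sees the closing backtick; the truncated search bypasses that entry and matches at end-of-input.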
In a multi-context `set:` whose non-topmost target declares `clear_scopes: N` plus a `meta_content_scope`-only body (empty `meta_scope`), Sublime applies the Clear to atoms that earlier targets pushed via their `meta_scope` and the strip is visible to the trigger match's own scope/captures. Syntect was deferring the Clear to the non-initial phase, so the trigger token leaked the cleared atom even though body content saw it removed. Surfaces on Zsh glob-range openings inside `[ <1-2> ]` etc.: the `zsh-redirection-glob-range-begin` `set:` lists `string-path-pattern-body` (meta_scope `meta.string.glob.shell string.unquoted.shell`) before `zsh-redirection-glob-range-end` (`clear_scopes: 1` + `meta_content_scope: meta.range.shell.zsh`), and the `<` carries a capture scope asserted with `- string`. Drops the residual 10-char Zsh syntest failure on both backends.
Two-part guard against `branch_point` exhaustion collapsing a parent
`meta_scope` one line boundary too early on empty lines:
1. In `parse_next_token`, a non-consuming `Branch` match that lands at
or past the replay line's end is skipped when inside `replay_ctx`.
Without this, the outer fail-replay would chain another `branch_point`
at end-of-replay-line whose own cross-line exhaustion later attaches
pops to the wrong line.
2. In `handle_fail`'s same-line path, when the rewind position is 0 of
a purely empty line (length ≤ 1, just `\n`), advance the cursor to
`line.len()`. The next-iteration `match: ''` of an `immediately-pop`-
style alt then emits its scope pops past-EOL, which
`ScopeRegionIterator` wraps onto the next line's baseline.
Together they make Markdown's non-terminated link reference definition
keep `meta.link.reference.def.markdown` on the empty line between
`blah` and the closing `text` paragraph, matching ST.
Baseline: Markdown 1 → 0 (the `syntax_test_markdown.md` line drops
from `known_syntest_failures{,_fancy}.txt`). No other rows change.
The harness's `SYNTAX_TEST_HEADER_PATTERN` restricted `testtoken_end`
to punctuation glyphs (`*/`, `-->`, …), assuming alphabetic tails like
`dmd`, `clojure`, or `dotnet run` were shebang-style instructions to
ignore. ST disagrees: those tails *are* the closing testtoken, and ST
clips each assertion line's selector at the first substring match.
The D shebang test's ` #! <- keyword.operator.logical.d dmd` and the
Clojure shebang's `<- comment.line.shebang.clojure …` both relied on
that clipping; under the old regex `dmd` / `clojure` leaked into the
selector and the assertions failed against scopes the parser had
correct.
Two-part fix in `examples/syntest.rs`:
- Broaden the `testtoken_end` capture to the entire whitespace-stripped
trailing tail (`\S(?:.*\S)?`), so multi-word tails like `dotnet run`
also round-trip cleanly.
- Drop the `only_whitespace_after_token_end` gate. The Clojure case
has `clojure` inside `comment.line.shebang.clojure`, so clipping
succeeds but content follows the closing token; ST still treats the
line as a pure assertion (with the clipped selector) rather than as
source code, and so should we. The before-`testtoken_start`
whitespace check alone is enough to reject the bash `: ${#^pat}` and
`[ <<doc ] # <- ]` repros that motivated the gate.
Baseline drops both `syntax_test_shebang.d` and
`syntax_test_shebang.clj` rows from
`testdata/known_syntest_failures{,_fancy}.txt`. No other rows change.
Stacked on trishume#677.
Pre-fix `recursively_mark_no_prototype` followed every `Push` / `Set` /
`Branch` / `Embed` AND every nested `include` from the prototype's
include chain unconditionally, marking every reachable context as
"don't include the prototype". For Haskell that meant marking
`function-name`, `variable-name`, and `variable-name-end` because of
the chain
    prototype → preprocessor-pragmas
              → push: preprocessor-pragma-body
              → embed: preprocessor-pragma-signature-value
              → include: functions
              → branch: variable-name, function-name
              → push: variable-name-end
With the prototype's `line-comments` rule no longer applied inside
`variable-name-end`, the `(?=\S)` pop:2 rule fired on every `--` of
the assertion-comment lines that sit between an infix operator
declaration and its `:: a -> Bool` continuation. That popped the
branch alternative off the stack mid-air, orphaned the `functions`
branch_point, and prevented the `(?=::)` `fail: functions` rule from
ever installing `meta.function.identifier.haskell` via cross-line
replay. ST verified via `scope_at_test`: every position the harness
flagged as wrong is `source.haskell meta.function.identifier.haskell …`
in ST.
The fix tracks a `via_push` flag through the recursion: includes are
followed only while still in the prototype's include chain
(`via_push: false`); once we've crossed a Push/Set/Branch/Embed we
keep following further match-op targets but stop following the body's
own `include:`s. That preserves the YAML and Lua cases (where
prototype-pushed bodies chain via `set:` to other prototype-pushed
bodies that DO need the no_prototype mark to break the loop —
`property → property-body`,
`line-doc-comment-body → maybe-line-doc-comment → line-doc-comment-body`)
while keeping prototype attached to general code-parsing contexts that
are merely included from a body for its local rule access.
Baseline: `syntax_test_haskell.hs` 43 → 1 (just the orthogonal
`variable.other..haskell` double-dot selector failure remains, fixed
in the next commit). `syntax_test_java.java` 221 → 212 incidentally —
same underlying mechanism unmasked nine column-failures that the
over-marking had been hiding.
Stacked on trishume#678.
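The `via_push` rule can be sketched on a toy context graph. `Ctx`, `Graph`, `ctx`, and `mark_no_prototype` are hypothetical; the sketch assumes every visited context is marked and models Push/Set/Branch/Embed uniformly as "targets".

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical context: contexts it includes, and contexts reached
/// through a match operation (Push / Set / Branch / Embed).
#[derive(Default)]
struct Ctx {
    includes: Vec<&'static str>,
    targets: Vec<&'static str>,
}

type Graph = HashMap<&'static str, Ctx>;

/// Small constructor to keep graph setup readable.
fn ctx(includes: &[&'static str], targets: &[&'static str]) -> Ctx {
    Ctx { includes: includes.to_vec(), targets: targets.to_vec() }
}

/// Mark contexts that must not receive the prototype. Match-op targets
/// are always followed; `include:`s are followed only while still in
/// the prototype's own include chain (`via_push == false`).
fn mark_no_prototype(
    graph: &Graph,
    name: &'static str,
    via_push: bool,
    seen: &mut HashSet<(&'static str, bool)>,
    marked: &mut HashSet<&'static str>,
) {
    if !seen.insert((name, via_push)) {
        return; // already visited in this mode; also breaks set: loops
    }
    marked.insert(name);
    if let Some(c) = graph.get(name) {
        for &t in &c.targets {
            mark_no_prototype(graph, t, true, seen, marked);
        }
        if !via_push {
            for &i in &c.includes {
                mark_no_prototype(graph, i, false, seen, marked);
            }
        }
    }
}
```

Run on the Haskell chain above, the walk stops at `preprocessor-pragma-signature-value`, so `functions` and everything behind it keep the prototype.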
`Scope::new("variable.other..haskell")` (double dot from a typo or a
test author writing `variable.other..haskell` to bypass ST's symbol-
test heuristics) used to pack `""` as a real atom, producing a 4-atom
scope `[variable, other, "", haskell]` that no longer prefix-matched
the 3-atom `variable.other.haskell` it was meant to equal.
ST's selector engine collapses runs of dots — `score_selector(
'variable.other..haskell', 'source.haskell variable.other.haskell')`
returns 48, the same as the single-dot form. Mirror that by filtering
empty segments in `ScopeRepository::build`. Symmetric: applies to
both selector parsing in syntest assertions and to scope construction
where a syntax accidentally has `scope: foo..bar`.
Surfaces as the last `syntax_test_haskell.hs` failure
(`syntax_test_haskell.hs:2348` line `:: a -> Bool`,
`-- ^ variable.other..haskell` against scope
`source.haskell variable.other.haskell`).
Baseline: `syntax_test_haskell.hs` drops out of both
`testdata/known_syntest_failures{,_fancy}.txt`. Java incidentally went
from 221 to 212 with the prior commit's prototype-attachment fix; the
new line is recorded here.
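The empty-segment filter amounts to one line of string handling. `scope_atoms` and `is_prefix_of` are illustrative helpers that work on string slices; they are not `ScopeRepository::build`, which packs atoms into a compact representation.

```rust
/// Split a dotted scope into atoms, dropping empty segments so that
/// runs of dots collapse (`a..b` yields the same atoms as `a.b`).
fn scope_atoms(scope: &str) -> Vec<&str> {
    scope.split('.').filter(|atom| !atom.is_empty()).collect()
}

/// Atom-wise prefix match between a selector and a scope.
fn is_prefix_of(selector: &str, scope: &str) -> bool {
    scope_atoms(scope).starts_with(&scope_atoms(selector))
}
```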
Submodule moves from `1ba99a47` (`v4201-119-g1ba99a47`) to the
shipped `v4202` tag (`91ad8085`, "[D, Makefile, Rust] Standardize
build output scopes"). v4202 is the most recent stable release tag
before the C# v2 migration `8621831d` and the regex embed grammar
`c735169b`; pinning here keeps `regex_string` on the legacy
`embed: scope:source.regexp; embed_scope: meta.string.cs meta.regexp.cs`
form, sidestepping the wrapper-mcs divergence between syntect and
ST DEV's renderer that produced the `syntax_test_C#11.cs: 35`
baseline entry.
Compared with v4200 and v4204/v4205:
- v4200 requires regenerating `testdata/test4.html` against the
older `Cargo.sublime-syntax` (pre-`91ad8085` scope rename); v4202
matches the existing fixture as-is.
- v4204/v4205 reintroduce the C#11 row (35) plus a
`parser.rs::can_parse_preprocessor_rules` divergence from the C
directive-scope refactor `44871676`.
Baseline movement: `make syntest` and `make syntest-fancy` both end
clean ("No new failures!"). C#11 row drops (-35); Java row at 212
unchanged. Net -35 failures.
Companion fixes for v4202's older fixtures:
- `parsing::syntax_set::tests::can_load`: Rails `main`'s
`context_iter` count drops from 185 to 184 (one context added
upstream post-v4202).
- `parser.rs::push_meta_ops`: keep the auto-injected top-level scope
across v2 set's cur.mcs Pop. The initial-phase Pop was popping
`cur_context.meta_content_scope.len()` atoms at `match_start` so
the matched text wouldn't see cur's `meta_content_scope`. That
overcounts when cur is `main`: `add_initial_contexts` injects the
syntax's top-level scope at `main.meta_content_scope[0]`, which
ST keeps on the visible stack across the trigger (verified against
ST 4200 stable on TOML's `[section]` rule, where the `[` trigger
emits `source.toml` alongside `meta.section.toml`). Without this,
the v4202-era `Rust/tests/syntax_test_frontmatter.{rs,md}` would
fail at the `[section]` trigger position — the upstream fix
`20212766` for the same divergence is post-v4202 and not in
scope. Regression coverage in
`v2_set_does_not_apply_parent_meta_content_scope_to_matched_text`
still pins user-declared cur.mcs as popped.
Cross-line all-exhaustion in `handle_fail` advanced one char past the branch_point's lookahead, leaving the rest of the matched identifier to be reparsed without the branch_point in scope. The same-line arm already does the rewind+skipped_branches dance from f3e497a; extend it to the cross-line arm so the parent context's NEXT rule fires at the BP's match position. Drops Java syntest 212 → 119 (-93 char-assertions). The three unique-line wins are `package apple dot` line 572, and the ` variable` after `import no.terminator` / `import static no.terminator` on lines 656 and 671 — top-level-`java` cases where `declarations` exhausts and ST falls through to `else-expressions → expressions → constant-expressions → variables`. Drops `outer_cross_line_replay_prefers_inner_correction`. The test was added in trishume#663 to guard the inner-correction-preference machinery under the path "outer `declarations` 0 → 1, inner `annotation-qualified-identifier` 0 → 1". Intervening parser fixes between trishume#663's baseline (9774) and current HEAD (212) shifted control flow so that the test's 3-line input now hits the cross-line all-exhaust path instead, with the outer cycling all 5 alts. The test's coverage of `prefer_inner_replay_corrections` was already lost before this change; deleting it reflects that. The current Java baseline failures still exercise the alt-N path through other inputs.
d7fde05 to
1a893f2
Cluster-2 / multigen16 investigation log

A separate investigation pass on a stacked branch tried several architectural directions to also fix cluster 2 (and unblock the

What the probe captured

Parsing
Cluster 2 (substitute wanted) and the multigen16 doubling (skip wanted) have identical lineage signatures; only

Directions tried, all reverted

Findings summary

Remaining viable directions (deferred)

Diagnostic infrastructure

Added but not included in this PR (kept on a stacked branch for future pickup, since it doesn't drive syntest down):
If picking this back up, that infrastructure is the right starting point.
Bound prefer_inner_replay_corrections by depth_diff to substitute siblings only
The previous gate skipped substitution iff
`inner.stack_depth > outer.stack_depth`, which collapsed two
structurally distinct cases — sibling refinement (substitute
needed, e.g. `outer=declarations(3)`,
`inner=annotation-identifier(4)` on the cluster-1 input
`@Anno\n.\nAnno\n(par=1)\nenum E {}`) and child-of-resolved-alt
nesting (substitute must skip, the multigen16 doubling guarded by
`deeper_inner_bp_correction_does_not_double_outer_meta_scope`).
Tighten to `depth_diff in {0, 1}`. Java syntest 119 → 117. Adds
`cross_line_all_exhaust_with_pop_count_emits_popped_meta_scope_pops`
as a passing regression test for the cluster-1 input. Doubling
guard stays green.
1a893f2 to
ae78419
Review just this PR's changes: 631-java-cross-line-bp-fall-through...631-java-cross-line-allexhaust-pops

Stacked on #681.
The previous gate in `prefer_inner_replay_corrections` (commit 0a2139a) skipped substitution iff `inner.stack_depth > outer.stack_depth`. That predicate collapsed two structurally distinct cases:

- Sibling refinement (substitute needed): inner is deeper than outer on the same line, e.g. cluster 1's `@Anno\n.\nAnno\n(par=1)\nenum E {}` with `outer=declarations(3)`, `inner=annotation-identifier(4)`. Inner's qualified-identifier alt brings `meta.path.java` that outer's locally-computed parse drops.
- Child-of-resolved-alt nesting (substitute must skip): inner is deeper, nested inside contexts outer's resolved alt pushed, e.g. multigen16's `outer=class-members(4)`, `inner=object-type(11)`. Inner's reparse adds atoms outer's alt already provides (the `deeper_inner_bp_correction_does_not_double_outer_meta_scope` guard).

This PR tightens the gate to `depth_diff in {0, 1}`, separating the two by the smallest viable structural signal. Java syntest baseline drops 119 → 117. The cluster-1 ignored repro becomes a passing regression test (`cross_line_all_exhaust_with_pop_count_emits_popped_meta_scope_pops`). The doubling guard stays green.

Cluster 2's multi-line qualified field type (depth_diff=5) and cluster 3's `pop: 2` miscount remain unresolved; per the investigation captured in scratch on the stacked branch, neither has a local-signal discriminator distinguishing it from the doubling cascade. Those are deferred to a separate pass.
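
The gate change described above can be sketched as a pair of predicates. This is an illustration under assumptions rather than the code in the PR: `ReplayPoint` and both function names are hypothetical, `depth_diff` is taken to mean `inner.stack_depth - outer.stack_depth`, and an inner that is shallower than outer is assumed not to be substituted under the new gate (the description does not say how that case is handled).

```rust
/// Hypothetical stand-in for a branch point's replay record; only the
/// stack depth matters for this sketch.
#[derive(Debug, Clone, Copy)]
pub struct ReplayPoint {
    pub stack_depth: usize,
}

/// Previous gate: skip substitution iff inner is strictly deeper than
/// outer, i.e. substitute only when inner is at most as deep as outer.
pub fn old_gate_substitutes(outer: ReplayPoint, inner: ReplayPoint) -> bool {
    inner.stack_depth <= outer.stack_depth
}

/// Tightened gate: substitute only when depth_diff is 0 or 1.
/// Assumption: a shallower inner (no non-negative diff) is not substituted.
pub fn new_gate_substitutes(outer: ReplayPoint, inner: ReplayPoint) -> bool {
    matches!(
        inner.stack_depth.checked_sub(outer.stack_depth),
        Some(0) | Some(1)
    )
}
```

With the depths quoted in the description, cluster 1 (`outer` at 3, `inner` at 4) was skipped by the old predicate and is substituted by the new one, while multigen16 (`outer` at 4, `inner` at 11) is skipped by both. A depth_diff of 5, as in cluster 2, also falls outside the new gate, which matches the description's note that cluster 2 remains unresolved.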