Skip to content

spec: generic syntax-highlight definition mechanism (#9955)#10129

Open
lonexreb wants to merge 1 commit intowarpdotdev:masterfrom
lonexreb:spec/9955-syntax-highlight-mechanism
Open

spec: generic syntax-highlight definition mechanism (#9955)#10129
lonexreb wants to merge 1 commit intowarpdotdev:masterfrom
lonexreb:spec/9955-syntax-highlight-mechanism

Conversation

@lonexreb
Copy link
Copy Markdown
Contributor

@lonexreb lonexreb commented May 5, 2026

Summary

Spec for issue #9955 — adding a new language to Warp's syntax highlighting today requires changes in 5+ places in crates/languages/src/lib.rs (SUPPORTED_LANGUAGES array, language_by_filename, to_arborium_name, get_arborium_highlight_query, plus a grammars/<lang>/ folder), all of which require modifying compiled Rust code and shipping a Warp release. This blocks the most-requested kind of community contribution: "I use $LANG and would happily contribute the highlighting definition."

Investigation

  • The closed registry: SUPPORTED_LANGUAGES: [&str; 32] (lib.rs:23).
  • Hand-written match statements at language_by_filename (lib.rs:115), to_arborium_name (lib.rs:226), get_arborium_highlight_query (lib.rs:239).
  • Grammar substrate: arborium::tree_sitter::{Language, Query} (lib.rs:7) — tree-sitter is the right substrate; this spec preserves it.
  • Per-language directories at crates/languages/grammars/<lang>/ already exist with config.yaml, identifiers.scm, indents.scm — embedded via RustEmbed.
  • Adding a language Warp doesn't yet support requires either upstream arborium support OR vendoring a tree-sitter grammar.

What's in the spec

product.md — 8 testable behavior invariants (B1–B8), 7 acceptance criteria (A1–A7), explicit non-goals (no TextMate-style regex grammars, no native dynamic-library loading, no sub-language injection in V1, no hot-reload), and 6 risks with concrete TECH decisions.

tech.md — picks the three-source discovery architecture, the language.toml schema, the loader, the file-association index, and the staged migration strategy. 5+4 unit tests plus 3 integration tests with a real tree-sitter fixture.

Architectural choices

  • Three-source discovery in priority order: Hardcoded > Bundled > User-local. V1 adds discovery beside the existing hardcoded path; the 32 existing languages keep working unchanged. No flag day — each language migrates in its own independently-revertable PR.
  • Schema-driven language.toml — one contract a contributor learns; everything else is standard tree-sitter files (highlights.scm, indents.scm, optional identifiers.scm).
  • WASM-only for runtime-loaded user grammars. Native dynamic libraries (.so/.dylib/.dll) explicitly rejected with a clear error and no dlopen call exists on the loader path. Bundled grammars can be either WASM or a Cargo dependency on a Rust grammar crate.
  • Validation with clear failure modes: a malformed grammar does NOT break Warp startup. Surfaces via log::error! + Settings → Editor → Languages notification. Other grammars load normally.
  • Tree-sitter substrate preserved. The issue's references to TextMate / Sublime / Midnight Commander are treated as community-distribution exemplars (the goal is contributor velocity), not as recommended technology.
  • Per-language migration template for the 32 existing languages, each migration as a single mechanical PR. The bundled_parsers.rs compile-time map is the only hand-edited file for adding bundled grammars (with an open question about whether to use inventory-style auto-registration to eliminate even that).

Test plan

  • Schema unit T1–T5 (parse, validate, reject native_lib, reject ABI mismatch, reject schema_version mismatch)
  • Loader unit T6–T9 (failed grammar surfaces as LoadFailure not panic, collision dedup, ABI mismatch reporting)
  • Integration IT1–IT3 with a real tree-sitter-toml fixture (load via env var, malformed grammar isolation, registry merge)
  • Regression: existing crates/languages/src/lib_tests.rs and crates/syntax_tree/src/queries/*_tests.rs pass unchanged
  • Manual: drop a ~/.warp/grammars/zig/ directory with language.toml, highlights.scm, grammar.wasm; restart; .zig files render
  • Manual: attempt to load grammar.so; verify clear error, no dlopen

Open questions for maintainer review

  1. WASM grammars require a tree-sitter version supporting WasmStore. Verify against current Cargo.lock.
  2. User-local grammar dir: $XDG_CONFIG_HOME/warp/grammars/ (when set) vs ~/.warp/grammars/ (fallback). Confirm precedence.
  3. Should the Languages settings page allow disabling individual loaded languages? Spec defers this to a follow-up.
  4. Should bundled_parsers.rs use an inventory-style auto-registration pattern to eliminate the only remaining hand-edit? Adds a build-time crate dep but removes the last manual step.

Closes (spec-only) #9955 — implementation PR will follow once spec direction is confirmed. The 32 existing languages will migrate via independent follow-up PRs, one per language.

Adds product.md + tech.md for issue warpdotdev#9955: a contributor-friendly
mechanism for adding new languages to Warp's syntax highlighting
without modifying compiled Rust code and without releasing Warp.

Investigation: today, adding a language requires changes in 5+
places in crates/languages/src/lib.rs (SUPPORTED_LANGUAGES array,
language_by_filename match, to_arborium_name match,
get_arborium_highlight_query match, plus a grammars/<lang>/ folder).
The closed registry blocks the most-requested kind of community
contribution: "I use $LANG and would happily contribute the
highlighting definition." That contribution today requires touching
the internal arborium crate dependency and shipping a Warp release.

Spec proposes:
- Three-source discovery (compile-time hardcoded, bundled directory,
  user-local directory) with explicit priority order. Hardcoded >
  bundled > user-local. Staged migration: V1 adds the discovery
  layer beside the hardcoded path; existing 32 languages keep
  working unchanged. No flag day.
- Schema-driven language.toml (display_name, internal_name,
  comment_prefix, indent_unit, file_associations [extensions,
  filenames, shebangs, aliases], brackets, parser, ts_abi). One
  contract a contributor learns; everything else is standard
  tree-sitter files.
- WASM-only for runtime-loaded user grammars; native dynamic
  libraries (.so/.dylib/.dll) explicitly rejected with a clear
  error message and no dlopen call exists on the loader path.
- Validation with clear failure modes: a malformed grammar does
  NOT break Warp startup; surfaces via log + Settings page
  notification. Other grammars load normally.
- Settings > Editor > Languages page lists all loaded grammars,
  their source, and any failures.
- Tree-sitter substrate preserved (rejects switching to
  TextMate/Sublime regex grammars referenced in the issue as
  community-distribution exemplars only, not as recommended tech).
- Per-language migration template for the 32 existing languages,
  each as an independently revertable PR.

Test plan covers five schema-validation unit tests, four loader
unit tests (including ABI mismatch and collision dedup), and
three integration tests with a real tree-sitter grammar fixture.

Six risks called out (WASM perf cost, schema versioning, capture-
name standard set, ABI mismatch detection, sub-language injection,
theme integration) with concrete TECH decisions or recommended
follow-ups for each.

Four open questions for maintainer review on tree-sitter version
prerequisites, XDG path fallback, per-language disable in V1, and
inventory-style auto-registration for bundled parsers.
@cla-bot cla-bot Bot added the cla-signed label May 5, 2026
@oz-for-oss
Copy link
Copy Markdown
Contributor

oz-for-oss Bot commented May 5, 2026

@lonexreb

I'm starting a first review of this spec-only pull request.

You can view the conversation on Warp.

I completed the review and no human review was requested for this pull request.

Comment /oz-review on this pull request to retrigger a review (up to 3 times on the same pull request).

Powered by Oz

@github-actions github-actions Bot added the external-contributor Indicates that a PR has been opened by someone outside the Warp team. label May 5, 2026
Copy link
Copy Markdown
Contributor

@oz-for-oss oz-for-oss Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overview

The spec defines a tree-sitter-based registry for bundled and user-local syntax highlighting definitions, with a product contract, loader architecture, migration plan, and tests. The direction addresses the linked issue, but several requirements contradict each other or leave security-critical implementation details unresolved.

Concerns

  • The goal promises no compiled Rust changes/no release for both bundled and user-local paths, while the bundled path still ships in Warp and the tech spec requires Cargo and parser-map changes.
  • The migration plan alternates between a single mechanical migration PR and independently revertable per-language migrations.
  • The loader/API model does not cleanly represent failed grammar loads, yet the product requires failed grammars to appear in diagnostics.
  • The product allows missing highlights.scm, but the tech loader rejects highlight-query load failures.

Security

  • User-local WASM is treated as sufficient sandboxing without specifying resource limits or host capabilities for untrusted parsers loaded at startup.
  • Failure diagnostics log and display full grammar directory paths even though the telemetry section identifies paths as PII.

Verdict

Found: 0 critical, 6 important, 1 suggestions

Request changes

Comment /oz-review on this pull request to retrigger a review (up to 3 times on the same pull request).

Powered by Oz

Comment thread specs/GH9955/product.md
## Goal

A contributor can add a new language to Warp's syntax highlighting
**without modifying compiled Rust code and without releasing Warp**,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ [IMPORTANT] This goal conflates the bundled and user-local paths: a source-tree bundled grammar still requires a Warp release, and the tech spec later requires Cargo/parser-map changes. Split the goal so only user-local grammars satisfy the no-release/no-compiled-code path.

Comment thread specs/GH9955/product.md
match statements are replaced with a registry-driven lookup. The
existing 32 languages get their associations migrated from the
match statements to per-language `language.toml` files in a
single mechanical PR (this spec calls out that PR as a follow-up,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ [IMPORTANT] This says the 32 languages migrate in a single mechanical PR, but B4 and the tech spec require independently revertable one-language migrations. Pick one migration strategy so implementers do not plan incompatible rollout paths.

Comment thread specs/GH9955/product.md
libraries (`.so`, `.dylib`, `.dll`) are explicitly rejected and
never loaded. The WASM is loaded via tree-sitter's existing WASM
runtime. WASM provides the sandboxing that makes user-local
grammars safe.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ [IMPORTANT] [SECURITY] WASM sandboxing alone does not define the safety contract for untrusted parsers loaded at startup. Specify host capabilities plus CPU/memory/input-size limits or timeout behavior so a malformed grammar cannot hang or exhaust Warp.

Comment thread specs/GH9955/product.md
A grammar directory that fails to load (malformed `language.toml`,
WASM that fails to instantiate, `highlights.scm` that fails to
parse against the grammar) does NOT break Warp startup. Instead:
- A `log::error!` fires with the directory path and the failure
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ [IMPORTANT] [SECURITY] Logging the full grammar directory path conflicts with the tech spec's path-is-PII telemetry rule and can expose usernames or private project paths in shared logs. Require redaction or basename-only diagnostics for logs, with full paths only in local UI if needed.

Comment thread specs/GH9955/tech.md

# User-local grammars: WASM file path relative to the grammar dir.
# Mutually exclusive with [parser.rust_crate].
wasm = "grammar.wasm"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 [SUGGESTION] Split this schema block into separate bundled and user-local examples; as written, the canonical example sets both rust_crate and wasm even though the comments say they are mutually exclusive.

Comment thread specs/GH9955/tech.md
pub struct LoadedLanguage {
pub language: Arc<Language>,
pub source: LanguageSource,
pub failure: Option<LoadFailure>,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ [IMPORTANT] LoadedLanguage cannot represent a grammar that fails before a Language is constructed because language is mandatory, yet failed grammars must be returned and listed in Settings. Define a separate LoadResult/FailedGrammar variant before implementation.

Comment thread specs/GH9955/tech.md
Reject if `WasmStore` reports an ABI mismatch with the host's
`tree_sitter::TREE_SITTER_LANGUAGE_VERSION`.
3. Compile `highlights.scm` against the resolved grammar via
`Query::new`. On failure: record `LoadFailure`, return.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ [IMPORTANT] The product spec says a valid grammar with missing highlights.scm still loads without coloring, but this loader step treats highlight-query failure as LoadFailure and returns. Specify missing-file handling separately from invalid-query handling.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla-signed external-contributor Indicates that a PR has been opened by someone outside the Warp team.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant