docs: cookbook for adding a new language extractor #97
Open
andreinknv wants to merge 2 commits into
Adds docs/ADDING-A-LANGUAGE.md walking through every step a contributor
needs to add a new language extractor:
1. Source a tree-sitter wasm grammar — covers the three real-world
paths (already in tree-sitter-wasms, pre-built release artifact,
build from source via tree-sitter-cli's bundled wasi-sdk).
2. Probe the AST with a small scratch script before writing code.
3. Register in src/types.ts + src/extraction/grammars.ts.
4. Type-check before adding extraction logic.
5. Pick a pattern: LanguageExtractor config (procedural / OO) or a
self-contained extractor class (declarative / template / non-OO).
6. Map onto existing NodeKind / EdgeKind values.
7. Tests + end-to-end CLI smoke.
8. PR description checklist.
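Step 2 (probing the AST before writing extraction code) can be sketched roughly as below. This is not the scratch script from the cookbook: `ProbeNode`, `outline`, and `tally` are hypothetical names, and the sketch assumes only that the parser binding exposes `type` and `children` on each syntax node. A hand-built tree stands in for a real parse so the block runs on its own.

```typescript
// Minimal structural view of a syntax node. The field names `type` and
// `children` are an assumption about the parser binding in use.
interface ProbeNode {
  type: string;
  children: ProbeNode[];
}

// Render the tree as an indented outline, one node type per line,
// stopping at maxDepth so large files stay readable.
function outline(node: ProbeNode, depth = 0, maxDepth = 6): string[] {
  const lines = [`${"  ".repeat(depth)}${node.type}`];
  if (depth < maxDepth) {
    for (const child of node.children) {
      lines.push(...outline(child, depth + 1, maxDepth));
    }
  }
  return lines;
}

// Count how often each node type occurs, to spot the declarations worth extracting.
function tally(
  node: ProbeNode,
  counts: Map<string, number> = new Map(),
): Map<string, number> {
  counts.set(node.type, (counts.get(node.type) ?? 0) + 1);
  for (const child of node.children) {
    tally(child, counts);
  }
  return counts;
}

// Hand-built stand-in for a parsed file (node type names are made up).
const sample: ProbeNode = {
  type: "program",
  children: [
    {
      type: "function_definition",
      children: [
        { type: "identifier", children: [] },
        { type: "block", children: [] },
      ],
    },
    {
      type: "function_definition",
      children: [{ type: "identifier", children: [] }],
    },
  ],
};

console.log(outline(sample).join("\n"));
console.log(Array.from(tally(sample).entries()));
```

With a real grammar, the root node of the parsed tree would be passed in place of `sample`; the node types that dominate the tally are usually the ones to map first.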
Each section points at the existing extractors as worked examples
(R for the OO path, HCL/SQL/Liquid for the custom path, Pascal+DFM
for the cross-format case). README.md and CLAUDE.md gain a one-line
pointer to the cookbook.
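To illustrate step 6 (mapping onto existing NodeKind values rather than adding new ones), here is a minimal sketch. The `NodeKind` union below is a hypothetical subset, not the repo's actual type, and the grammar node type names on the left are invented for illustration.

```typescript
// Hypothetical subset of kinds; the real NodeKind union lives in the repo
// and may use different names.
type NodeKind = "function" | "class" | "variable" | "module";

// Grammar node type -> existing kind. Keys are made-up node types of the
// sort a probe of the AST would surface.
const KIND_BY_NODE_TYPE: ReadonlyMap<string, NodeKind> = new Map<string, NodeKind>([
  ["function_definition", "function"],
  ["class_declaration", "class"],
  ["resource_block", "class"], // a non-OO construct reusing an existing kind
  ["assignment", "variable"],
]);

// Unmapped node types return undefined so the caller can skip them
// instead of inventing a new kind.
function toNodeKind(nodeType: string): NodeKind | undefined {
  return KIND_BY_NODE_TYPE.get(nodeType);
}

console.log(toNodeKind("resource_block"));
console.log(toNodeKind("comment"));
```

Keeping the mapping in one table makes it easy to review which constructs of the new language were deliberately left out.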
Closes colbymchenry#55.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced Apr 26, 2026
Sections 3, 5a, 5b previously taught the monolithic-file pattern that PR colbymchenry#116 obsoletes. After colbymchenry#116, adding a language is one new file in src/extraction/languages/ + 2 lines in registry.ts (and an optional 1-line addition to the Language union for TypeScript narrowing).
Updated:
- Section 3: full rewrite. Was 3-file mutation (types.ts, grammars.ts, CLAUDE.md). Now: 1 LanguageDef file + registry import + Language union entry. Includes a "why per-file" sidebar pointing at the cross-PR conflict bottleneck the registry resolves.
- Section 5a: drops the EXTRACTORS-map registration step. The extractor is referenced from the LanguageDef directly.
- Section 5b: drops the tree-sitter.ts dispatch wiring. customExtractor on the LanguageDef takes the dispatch, so there are no per-language if-branches.
Section 1 (sourcing wasm), Section 2 (probing AST), Sections 6/7/8 (NodeKind mapping, tests, PR checklist), and the existing-extractors reference table are unchanged; those parts of the workflow didn't change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
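The per-file registry idea described above might look something like the sketch below. Every name and field here is an assumption made for illustration (`LanguageDef`'s real shape, the registry's real API, and the extractor signatures are defined in the repo and may differ); the point is only that dispatch reads from the definition, so no per-language if-branch is needed.

```typescript
// All shapes below are hypothetical; the repo's LanguageDef may differ.
interface ExtractedSymbol {
  kind: string;
  name: string;
}

interface LanguageExtractorConfig {
  functionTypes: string[]; // node types treated as functions (assumed field)
}

interface LanguageDef {
  name: string;
  extensions: string[];
  extractor?: LanguageExtractorConfig; // config-driven path
  customExtractor?: (source: string) => ExtractedSymbol[]; // self-contained path
}

// Would live in its own file, e.g. one file per language.
const rLanguage: LanguageDef = {
  name: "r",
  extensions: [".r"],
  extractor: { functionTypes: ["function_definition"] },
};

// Toy custom extractor: finds `resource "type" "name"` headers with a regex.
const hclLanguage: LanguageDef = {
  name: "hcl",
  extensions: [".tf", ".hcl"],
  customExtractor: (source) => {
    const found: ExtractedSymbol[] = [];
    const header = /^resource\s+"([^"]+)"\s+"([^"]+)"/gm;
    let match: RegExpExecArray | null;
    while ((match = header.exec(source)) !== null) {
      found.push({ kind: "class", name: `${match[1]}.${match[2]}` });
    }
    return found;
  },
};

// Registry: adding a language is an import plus one entry here.
const REGISTRY: LanguageDef[] = [rLanguage, hclLanguage];

function languageForFile(path: string): LanguageDef | undefined {
  const lower = path.toLowerCase();
  return REGISTRY.find((def) => def.extensions.some((ext) => lower.endsWith(ext)));
}

// Dispatch is driven by the definition itself, not by language name.
function extract(path: string, source: string): ExtractedSymbol[] {
  const def = languageForFile(path);
  if (!def) return [];
  if (def.customExtractor) return def.customExtractor(source);
  // A config-driven tree walk using def.extractor would run here; omitted in this sketch.
  return [];
}

console.log(extract("main.tf", 'resource "aws_s3_bucket" "logs" {\n}\n'));
```

Because each definition is a separate value, two contributors adding different languages touch different files and only meet at the registry list.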
Summary
Adds docs/ADDING-A-LANGUAGE.md, a cookbook for contributors adding a new language extractor. Closes #55.
The doc was prompted by @cfournel's question on #55 (they published tree-sitter-mql5 and asked how to plug it in), but applies equally to anyone adding HCL/Terraform, R, SQL, dbt, Scala, Vue, or any of the other language requests in the issue tracker. Now there's a single self-serve walkthrough.
What it covers
- Sourcing a wasm grammar: already in tree-sitter-wasms, pre-built in a GitHub release / npm tarball, or built from source via tree-sitter-cli's bundled wasi-sdk (no Docker / local emcc needed).
- Registering the language: one new file (src/extraction/languages/<name>.ts exporting a LanguageDef) plus a 2-line registry update; reflects the per-file registry pattern from #116 (refactor: per-language registry, eliminating the cross-PR conflict surface for language additions).
- Picking a pattern on the LanguageDef: a LanguageExtractor config (procedural / OO: Python, Ruby, R) or a custom extractor class (HCL / SQL / Liquid).
- Tests with extractFromSource + the end-to-end CLI smoke recipe.
Each section points at concrete extractors in the repo as worked examples: R for the OO path, HCL / SQL / Liquid for the custom-class path, Pascal + DFM for the cross-format case.
Files changed
- docs/ADDING-A-LANGUAGE.md
- README.md
- CLAUDE.md
Test plan
- tree-sitter-cli build --wasm workflow on the SQL grammar (the wasi-sdk path the doc recommends)
🤖 Generated with Claude Code