docs: cookbook for adding a new language extractor #39
Open
mschreib28 wants to merge 2 commits into
Adds docs/ADDING-A-LANGUAGE.md walking through every step a contributor
needs to add a new language extractor:
1. Source a tree-sitter wasm grammar: covers the three real-world paths (already in tree-sitter-wasms, pre-built release artifact, build from source via tree-sitter-cli's bundled wasi-sdk).
2. Probe the AST with a small scratch script before writing code.
3. Register in src/types.ts + src/extraction/grammars.ts.
4. Type-check before adding extraction logic.
5. Pick a pattern: LanguageExtractor config (procedural / OO) or a self-contained extractor class (declarative / template / non-OO).
6. Map onto existing NodeKind / EdgeKind values.
7. Tests + end-to-end CLI smoke.
8. PR description checklist.
Each section points at the existing extractors as worked examples
(R for the OO path, HCL/SQL/Liquid for the custom path, Pascal+DFM
for the cross-format case). README.md and CLAUDE.md gain a one-line
pointer to the cookbook.
Closes colbymchenry#55.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sections 3, 5a, 5b previously taught the monolithic-file pattern that PR colbymchenry#116 obsoletes. After colbymchenry#116, adding a language is one new file in src/extraction/languages/ + 2 lines in registry.ts (and an optional 1-line addition to the Language union for TypeScript narrowing).

Updated:
- Section 3: full rewrite. Was 3-file mutation (types.ts, grammars.ts, CLAUDE.md). Now: 1 LanguageDef file + registry import + Language union entry. Includes a "why per-file" sidebar pointing at the cross-PR conflict bottleneck the registry resolves.
- Section 5a: drops the EXTRACTORS-map registration step. The extractor is referenced from the LanguageDef directly.
- Section 5b: drops the tree-sitter.ts dispatch wiring. customExtractor on the LanguageDef takes the dispatch; no per-language if-branches.

Section 1 (sourcing wasm), Section 2 (probing AST), Sections 6/7/8 (NodeKind mapping, tests, PR checklist), and the existing-extractors reference table are unchanged, since those parts of the workflow didn't change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
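
The registry-based flow this commit message describes could look roughly like the sketch below. Every shape here is an assumption for illustration: the fields of `LanguageDef`, the `register` and `extract` helpers, and the toy `.toy` language are hypothetical, and the real definitions in src/extraction/languages/ and registry.ts may differ. The point is only that a `customExtractor` attached to the definition lets the dispatcher stay free of per-language branches.

```typescript
// Hypothetical result type; the project's real node shape is not shown here.
interface ExtractedNode {
  kind: string;
  name: string;
}

// Hypothetical LanguageDef: one object per language, ideally one file each.
interface LanguageDef {
  id: string;
  extensions: string[];
  // Optional self-contained extractor that bypasses the shared path.
  customExtractor?: (source: string) => ExtractedNode[];
}

// Registry keyed by file extension.
const registry = new Map<string, LanguageDef>();

function register(def: LanguageDef): void {
  for (const ext of def.extensions) {
    registry.set(ext, def);
  }
}

// Dispatch by lookup only: no per-language if-branches.
function extract(fileName: string, source: string): ExtractedNode[] {
  const dot = fileName.lastIndexOf(".");
  const ext = dot >= 0 ? fileName.slice(dot) : "";
  const def = registry.get(ext);
  if (!def) return [];
  if (def.customExtractor) return def.customExtractor(source);
  return []; // the shared tree-sitter path is omitted from this sketch
}

// A made-up language whose only construct is "def <name>" on its own line.
const toyDef: LanguageDef = {
  id: "toy",
  extensions: [".toy"],
  customExtractor: (source) =>
    source
      .split("\n")
      .filter((line) => line.startsWith("def "))
      .map((line) => ({ kind: "function", name: line.slice(4).trim() })),
};

register(toyDef);
console.log(extract("example.toy", "def foo\nx = 1\ndef bar"));
```

In this arrangement, adding a language touches only its own definition plus the registration call, which is the property the commit message credits with removing the cross-PR conflict bottleneck.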
Copied from colbymchenry/codegraph#97