feat(graph): symbol-level issue history mined from Fixes #N commits #23
Open
mschreib28 wants to merge 10 commits into
Adding a new language used to require coordinated edits to 6
shared lists across 4 files (Language union in types.ts;
DEFAULT_CONFIG.include; WASM_GRAMMAR_FILES, EXTENSION_MAP, and
getLanguageDisplayName in grammars.ts; EXTRACTORS map in
languages/index.ts). Two PRs adding different languages typically
conflicted on every one of those.
After this refactor, adding a new language is:
1. Drop a file at src/extraction/languages/<name>.ts exporting an
<NAME>_DEF: LanguageDef constant.
2. Add ONE import line and ONE array entry to
src/extraction/languages/registry.ts (alphabetical position —
adjacent additions are still possible but rare).
That's it. grammars.ts, types.ts, tree-sitter.ts dispatch, and the
default include globs are all derived from the registry.
## What's in a LanguageDef
```ts
interface LanguageDef {
  name: string;                                          // canonical id
  displayName: string;                                   // "Pascal / Delphi"
  extensions: readonly string[];                         // ['.pas', '.dpr', ...]
  includeGlobs: readonly string[];
  grammar?: { wasmFile, vendored?, extractor };          // tree-sitter
  customExtractor?: (fp, src) => ExtractionResult;       // Liquid, Svelte
  extensionOverrides?: { '.dfm': { customExtractor } };  // Pascal forms
}
```
Each existing language file now exports both its `xxxExtractor`
(unchanged) AND a new `XXX_DEF`. New files were added for tsx, jsx,
svelte, liquid (the latter two wrap their existing custom extractor
classes via the customExtractor field).
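As a rough illustration of the "one file + two registry lines" flow, the sketch below shows how the shared lists could be derived from a single array of defs. It is a simplified stand-in, not the PR's code: the trimmed `LanguageDef`, the toy `INI_DEF` / `TOML_DEF` languages, and the helper names `buildExtensionMap`, `buildIncludeGlobs`, and `detectLanguageSketch` are all assumptions made for the example.

```typescript
// Trimmed-down LanguageDef (assumption: the real interface has more fields).
interface LanguageDef {
  name: string;
  displayName: string;
  extensions: readonly string[];
  includeGlobs: readonly string[];
}

// What two per-language files might export (toy languages, not in the PR).
const INI_DEF: LanguageDef = {
  name: 'ini',
  displayName: 'INI',
  extensions: ['.ini', '.cfg'],
  includeGlobs: ['**/*.ini', '**/*.cfg'],
};
const TOML_DEF: LanguageDef = {
  name: 'toml',
  displayName: 'TOML',
  extensions: ['.toml'],
  includeGlobs: ['**/*.toml'],
};

// registry.ts stand-in: the only shared list a new language has to touch.
const LANGUAGE_DEFS: readonly LanguageDef[] = [INI_DEF, TOML_DEF];

// Derived view 1: extension -> language name.
function buildExtensionMap(defs: readonly LanguageDef[]): Map<string, string> {
  const map = new Map<string, string>();
  for (const def of defs) {
    for (const ext of def.extensions) map.set(ext, def.name);
  }
  return map;
}

// Derived view 2: the default include globs.
function buildIncludeGlobs(defs: readonly LanguageDef[]): string[] {
  const globs: string[] = [];
  for (const def of defs) {
    for (const glob of def.includeGlobs) globs.push(glob);
  }
  return globs;
}

// Derived view 3: language detection by file extension.
function detectLanguageSketch(filePath: string): string {
  const dot = filePath.lastIndexOf('.');
  const ext = dot === -1 ? '' : filePath.slice(dot).toLowerCase();
  return buildExtensionMap(LANGUAGE_DEFS).get(ext) ?? 'unknown';
}
```

Because every consumer reads from `LANGUAGE_DEFS`, two PRs adding different languages would only meet in that one array.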
## Refactored consumers
- src/extraction/grammars.ts: WASM_GRAMMAR_FILES removed (was
internal-only); EXTENSION_MAP now a Proxy that lazy-builds from
the registry on first access (avoids TDZ in cyclic load paths).
loadGrammarsForLanguages, isLanguageSupported, isGrammarLoaded,
getSupportedLanguages, getLanguageDisplayName, detectLanguage —
all read from registry.
- src/extraction/tree-sitter.ts: extractFromSource's if-chain
(svelte / liquid / pascal+.dfm/.fmx) replaced with one lookup:
def.extensionOverrides[ext]?.customExtractor || def.customExtractor.
Drops direct imports of LiquidExtractor, SvelteExtractor,
DfmExtractor.
- src/types.ts: DEFAULT_CONFIG moved to src/default-config.ts (cycle
break). types.ts re-exports for backward compat. The `include`
array is now built lazily from each LanguageDef's includeGlobs.
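The lazy `Proxy` idea behind EXTENSION_MAP can be sketched as follows. This is a minimal illustration under assumptions, not the code in grammars.ts: `lazyRecord`, `REGISTRY`, and `buildCount` are hypothetical names, and the real map may expose more traps or a different value type.

```typescript
type ExtensionMap = Record<string, string>;

// Wraps a builder so the underlying record is only constructed on first read.
function lazyRecord(build: () => ExtensionMap): ExtensionMap {
  let cache: ExtensionMap | undefined;
  const resolve = (): ExtensionMap => {
    if (cache === undefined) cache = build();
    return cache;
  };
  const has = (key: string | symbol): key is string =>
    typeof key === 'string' &&
    Object.prototype.hasOwnProperty.call(resolve(), key);

  return new Proxy<ExtensionMap>({}, {
    get: (_target, key) => (has(key) ? resolve()[key] : undefined),
    has: (_target, key) => has(key),
    ownKeys: () => Object.keys(resolve()),
    // Needed so Object.keys() / for...in see the derived entries.
    getOwnPropertyDescriptor: (_target, key) =>
      has(key)
        ? { value: resolve()[key], enumerable: true, configurable: true, writable: false }
        : undefined,
  });
}

// Stand-in registry (assumption: the real one holds full LanguageDef objects).
const REGISTRY = [
  { name: 'typescript', extensions: ['.ts', '.mts'] },
  { name: 'pascal', extensions: ['.pas', '.dpr'] },
];

let buildCount = 0; // only here to make the laziness observable

const EXTENSION_MAP = lazyRecord(() => {
  buildCount += 1;
  const map: ExtensionMap = {};
  for (const def of REGISTRY) {
    for (const ext of def.extensions) map[ext] = def.name;
  }
  return map;
});
```

Since nothing touches `REGISTRY` until the first property read, a module that imports the map during a cyclic load does not hit the registry before it has been initialized.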
## What still requires a one-line edit
The Language string union in types.ts still hard-codes the known
languages (typescript | javascript | … | unknown). New languages
added to the registry work at runtime as strings, but adding the
literal here is required IF the resolver wants to do exhaustive
narrowing on the new language (resolution/index.ts and
resolution/import-resolver.ts have a few `language === 'X'`
branches). Most new languages don't need such branches.
This trade-off keeps strict narrowing for the existing handful of
language-specific code paths while making everything else
registry-driven.
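To make the trade-off concrete, here is a small sketch of why a literal union still matters for narrowing. The three-member `Language` union, `isKnownLanguage`, and `importStyle` are illustrative assumptions; the real union and resolver branches are larger.

```typescript
// Shortened union for the example (the real one lists every known language).
type Language = 'typescript' | 'javascript' | 'unknown';

const KNOWN_LANGUAGES: readonly Language[] = ['typescript', 'javascript', 'unknown'];

// Registry-driven languages arrive as plain strings; this guard narrows them.
function isKnownLanguage(value: string): value is Language {
  return (KNOWN_LANGUAGES as readonly string[]).indexOf(value) !== -1;
}

// A language-specific branch that relies on exhaustive narrowing.
function importStyle(language: Language): string {
  switch (language) {
    case 'typescript':
    case 'javascript':
      return 'esm-or-commonjs';
    case 'unknown':
      return 'none';
    default: {
      // If a literal is added to Language but not handled above,
      // this assignment stops compiling.
      const unreachable: never = language;
      throw new Error(`unhandled language: ${String(unreachable)}`);
    }
  }
}

// A language known only to the registry still works as a string at runtime,
// but it cannot reach importStyle() without being added to the union.
function describe(languageFromRegistry: string): string {
  return isKnownLanguage(languageFromRegistry)
    ? importStyle(languageFromRegistry)
    : 'no language-specific resolution';
}
```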
## Tests
380/380 pass. No new tests; behavior is identical. Existing
extraction.test.ts and pr19-improvements.test.ts heavily exercise
detectLanguage, isLanguageSupported, getSupportedLanguages, and
loadAllGrammars — all green.
## Follow-ups (out of scope)
- Auto-discovery in registry.ts via fs.readdirSync — works in
built dist/ but vite-node doesn't support extensionless require()
of TS source. A small build-time generator could remove the
static import list entirely.
- Splitting __tests__/extraction.test.ts into per-language test
files — eliminates the test-end-of-file conflict surface that
every language PR currently hits.
- Similar registry refactors for:
  - MCP tool definitions (each tool self-registers; no shared
    tools[] array or case-switch in execute())
  - Migration files (each migration in src/db/migrations/NNN-*.ts;
    auto-discovered by version)
  - Index/sync hooks (centrality, churn, issue-history,
    config-refs, sql-refs, cochange all currently mutate
    CodeGraph.indexAll/sync; an IndexHook interface would make
    each pass self-contained)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tractor
Reviewer caught a real bug: the original commit kept the EXTRACTORS
map in src/extraction/languages/index.ts as a separate hand-curated
registry that TreeSitterExtractor read from. Adding a new
grammar-backed language would have required editing EXTRACTORS too,
undermining the refactor's stated single-source-of-truth claim. A
future contributor missing the EXTRACTORS update would silently
produce empty extraction results.
Fix:
- TreeSitterExtractor now reads its extractor straight off the
language def: getLanguageDefByName(this.language)?.grammar?.extractor
- EXTRACTORS in languages/index.ts becomes a Proxy that derives
lazily from the registry (kept for backward compat; readers
unchanged).
- Add 16 structural-invariant tests in
__tests__/language-registry.test.ts that fail loudly if any derived
consumer drifts from the registry: EXTRACTORS / EXTENSION_MAP /
detectLanguage / isLanguageSupported / getSupportedLanguages /
getLanguageDisplayName are all asserted to exactly mirror the
registry contents.
Adding a new grammar-backed language is now genuinely "one new file +
two lines in registry.ts"; no other files to touch.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…flicts
Today every PR adding an MCP tool conflicts on the same two
shared lists in src/mcp/tools.ts: the tools[] array (the
list_tools surface) and the case switch in execute(). After this
refactor:
Adding a new MCP tool:
1. Drop a file at src/mcp/tools/<name>.ts exporting a
<NAME>_TOOL: ToolModule (definition + handlerKey).
2. Add one import line and one array entry to
src/mcp/tools/registry.ts.
3. Implement handle<Name>(args) on ToolHandler in tools.ts and
add the new key to HandlerKey in tools/types.ts.
Step 3 is the only remaining "shared method on a single class"
conflict surface. Extracting handler bodies into per-tool files
(making step 3 also a single-file addition) is left as a
follow-up — the cost/benefit favors landing this incremental win
now and finishing the body extraction once language and migration
refactors land.
## What's new
- **src/mcp/tool-types.ts** — extracted ToolDefinition, ToolResult,
PropertySchema, projectPathProperty into a shared module so
per-tool files can import without circular dependency.
- **src/mcp/tools/types.ts** — ToolModule interface, HandlerKey
string union, and ToolHandlerLike (a structural type that
ToolHandler now `implements`, providing compile-time guarantee
that every HandlerKey maps to a real method).
- **src/mcp/tools/<name>.ts × 9** — one file per existing tool
(callees, callers, context, explore, files, impact, node, search,
status). Each ~25-30 lines: import + definition literal +
handlerKey reference.
- **src/mcp/tools/registry.ts** — static-import barrel, sorted
alphabetically. Exports getToolModules(), getToolModule(name),
and the derived `tools[]` array.
- **src/mcp/tools.ts** — ~200 lines deleted from the top
(inline types + tools[] array + projectPathProperty).
execute()'s case-switch replaced with a registry lookup +
type-safe `this[mod.handlerKey](args)` dispatch (now compile-
time-checked thanks to `implements ToolHandlerLike`).
All `private async handle*` methods now public to match the
interface. errorResult/textResult also public for the same reason.
- **src/mcp/index.ts** — MCPServer's tool-existence check switched
from a linear `tools.find()` scan to the O(1) `getToolModule()`
Map lookup, eliminating two parallel lookup paths.
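The registry lookup plus `this[mod.handlerKey](args)` dispatch can be sketched like this. It is a simplified illustration, not the code in tools.ts: only two tools are modelled, `ToolResult` and the argument type are reduced to assumed shapes, and the handlers are synchronous for brevity whereas the real `handle*` methods are async.

```typescript
// Assumed, simplified shapes for the example.
interface ToolResult { isError: boolean; text: string }
type ToolArgs = Record<string, unknown>;

type HandlerKey = 'handleStatus' | 'handleSearch';

interface ToolModule {
  definition: { name: string; description: string };
  handlerKey: HandlerKey;
}

// Structural type: every HandlerKey must map to a real method.
type ToolHandlerLike = Record<HandlerKey, (args: ToolArgs) => ToolResult>;

// Per-tool files would each export one of these.
const SEARCH_TOOL: ToolModule = {
  definition: { name: 'codegraph_search', description: 'Search symbols by name' },
  handlerKey: 'handleSearch',
};
const STATUS_TOOL: ToolModule = {
  definition: { name: 'codegraph_status', description: 'Report index status' },
  handlerKey: 'handleStatus',
};

// registry.ts stand-in: static list plus a name -> module Map.
const TOOL_MODULES: readonly ToolModule[] = [SEARCH_TOOL, STATUS_TOOL];
const TOOLS_BY_NAME = new Map<string, ToolModule>();
for (const mod of TOOL_MODULES) TOOLS_BY_NAME.set(mod.definition.name, mod);

function getToolModule(name: string): ToolModule | undefined {
  return TOOLS_BY_NAME.get(name);
}

// `implements` makes the compiler reject a HandlerKey with no matching method.
class ToolHandler implements ToolHandlerLike {
  private readonly indexedFiles = 3; // stand-in state to prove `this` is bound

  handleStatus(_args: ToolArgs): ToolResult {
    return { isError: false, text: `indexed files: ${this.indexedFiles}` };
  }

  handleSearch(args: ToolArgs): ToolResult {
    return { isError: false, text: `results for ${String(args['query'] ?? '')}` };
  }

  execute(name: string, args: ToolArgs): ToolResult {
    const mod = getToolModule(name);
    if (!mod) return { isError: true, text: `Unknown tool: ${name}` };
    // Method-call syntax keeps `this` bound to the handler instance.
    return this[mod.handlerKey](args);
  }
}
```

Adding a key to `HandlerKey` without a matching method on `ToolHandler` would then fail at compile time rather than at dispatch time.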
## Tests
387/387 pass. **7 new tests** in __tests__/mcp-tool-registry.test.ts:
- Definitions are well-formed (name shape, description length).
- handlerKey shape (`handle<UpperCase>`).
- Every registered handlerKey resolves to a real method on
ToolHandler.
- Exported `tools[]` exactly mirrors the registry.
- Canonical 9 main-line tools regression guard.
- execute() unknown-tool error path.
- **End-to-end dispatch smoke test**: execute('codegraph_status', {})
reaches the real handler body (no broken `this` binding) — would
fail loudly if the dynamic dispatch chain ever breaks.
## Reviewer pass
Independent reviewer ran once. 2 REQUEST_CHANGES + 2 INFO addressed:
1. ToolHandlerLike was defined but never enforced —
ToolHandler now `implements ToolHandlerLike`. Eliminates the
`(this as unknown as Record<...>)` cast in execute(); dispatch
is fully compile-time-checked.
2. No end-to-end dispatch test — added one (see Tests above).
3. MCPServer.handleToolsCall used a linear `tools.find()` scan
while execute() used Map lookup — switched to getToolModule()
for parity.
4. Removed redundant .slice() in registry.ts (map() already
returns a fresh array).
## Backward compat
src/mcp/tools.ts still re-exports ToolDefinition, ToolResult, the
mutable `tools[]` array, ToolHandler, and getExploreBudget. Every
existing consumer (`import { ToolDefinition, ToolResult, tools,
ToolHandler } from './tools'`) keeps working unchanged.
## Affected open PRs
- colbymchenry#110 (review-context): rebases to 1 new file in tools/ + 2
lines in registry.ts + 1 method on ToolHandler + 1 line in
HandlerKey.
- colbymchenry#112 (centrality+churn): same shape for the codegraph_hotspots
tool.
- colbymchenry#114 (config-refs): same shape for codegraph_config.
- colbymchenry#115 (sql-refs): same shape for codegraph_sql.
Each goes from a 4-way conflict (tools[] + case + handler + helpers)
down to a single remaining conflict surface: the HandlerKey union in
tools/types.ts plus the handler method on ToolHandler in tools.ts.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Today every PR adding a schema migration claims
`CURRENT_SCHEMA_VERSION = next` AND adds an array entry to
`migrations: Migration[]` in src/db/migrations.ts. Two PRs both
claiming the same version resolve as: "second PR's v4 silently
no-ops on existing DBs", a real silent-data-loss bug class (PR
colbymchenry#113's reviewer caught one).
After this refactor:
Adding a new schema migration:
1. Pick the next free 3-digit prefix
(`git ls-files 'src/db/migrations/[0-9]*.ts'` shows what's taken).
2. Create `src/db/migrations/<NNN>-<short-name>.ts` exporting a
`MIGRATION: MigrationModule` (description + up).
3. Add one import line and one entry to
`src/db/migrations/index.ts`'s REGISTERED_MODULES array.
Two PRs both creating `004-foo.ts` collide on the FILESYSTEM, so the
maintainer sees it instantly. No more silent skipped migrations.
## What's new
- `src/db/migrations/types.ts`: `MigrationModule { description, up }`
and `Migration extends MigrationModule { version }`.
- `src/db/migrations/002-project-metadata.ts`: extracted v2 body
verbatim.
- `src/db/migrations/003-lower-name-index.ts`: extracted v3 body
verbatim.
- `src/db/migrations/index.ts`: central registry. Static-imports each
migration, parses the version FROM THE FILENAME (no hand-typed
version field that can drift), enforces strict `NNN-kebab-name.ts`
shape, validates uniqueness/sort at module load (throws loudly on
collision), exposes ALL_MIGRATIONS and CURRENT_SCHEMA_VERSION.
- `src/db/migrations.ts`: refactored to a thin runner. Same exported
surface (CURRENT_SCHEMA_VERSION, getCurrentVersion, runMigrations,
needsMigration, getPendingMigrations, getMigrationHistory, Migration
type), so every existing import keeps working unchanged.
- `__tests__/migrations-registry.test.ts`: 8 invariant tests:
registry non-empty, versions unique + strictly ascending,
CURRENT_SCHEMA_VERSION matches max, every file matches the strict
NNN-kebab-name pattern, no orphan files, no phantom registrations.
## Reviewer pass
Independent reviewer ran once. 3 REQUEST_CHANGES + 1 INFO addressed:
1. Hand-typed `version` field in REGISTERED_MODULES could drift from
filename. **Fixed**: removed the version field; registry now parses
version from filename via FILENAME_PATTERN regex inside
validateRegistered.
2. Filename-pattern test was lenient (allowed 4-digit or 1-digit
prefixes). **Fixed**: new "every migration file matches the strict
NNN-kebab-name.ts pattern" test catches malformed filenames as
orphan-detection-bypassing offenders.
3. `getPendingMigrations` returned `readonly Migration[]`, breaking
callers that typed the result as `Migration[]`. **Fixed**: returns a
fresh mutable array via `.slice()`.
4. No throw-on-duplicate test for validateRegistered (module
evaluation timing). Acknowledged; not added.
## Backward compat
Every existing import works unchanged:
- `import { CURRENT_SCHEMA_VERSION } from './migrations'` ✓
- `import { runMigrations } from './migrations'` ✓
- `import { needsMigration } from './migrations'` ✓
- `import { getMigrationHistory } from './migrations'` ✓
- `import { getPendingMigrations } from './migrations'`: returns
mutable Migration[] (preserved)
- `Migration` type: re-exported
## Affected open PRs
Every migration-touching PR (colbymchenry#102 UNIQUE edges,
colbymchenry#105 cochange, colbymchenry#108 perf db, colbymchenry#111
LLM features, my colbymchenry#112 centrality+churn, colbymchenry#113
issue-history, colbymchenry#114 config-refs, colbymchenry#115
sql-refs) currently claims migration v4 and conflicts with each other
on `migrations.ts`. After this lands they each become:
- 1 new file: `src/db/migrations/<NNN>-<name>.ts`
- 2 lines in `src/db/migrations/index.ts` (import + array entry)
Conflict shape changes from "next free version + array entry +
CURRENT_SCHEMA_VERSION bump in one file" (4-way conflict) to "1 new
file" + 2-line registry edit. If two PRs target the same NNN, the
filesystem collision surfaces immediately, with no silent skipped
migrations.
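A minimal sketch of filename-derived versioning follows, assuming a registry entry is a `{ filename, module }` pair. The entry shape, the exact regex, the string-array stand-in for the database, and the choice to sort rather than reject out-of-order entries are all assumptions for illustration; only the names `FILENAME_PATTERN`, `validateRegistered`, `MigrationModule`, and `Migration` come from the description above.

```typescript
type FakeDb = string[]; // stand-in for the real database handle

interface MigrationModule { description: string; up: (db: FakeDb) => void }
interface Migration extends MigrationModule { version: number }

interface RegisteredEntry { filename: string; module: MigrationModule }

// Strict NNN-kebab-name.ts shape: exactly three digits, then kebab-case.
const FILENAME_PATTERN = /^(\d{3})-[a-z0-9]+(?:-[a-z0-9]+)*\.ts$/;

function validateRegistered(entries: readonly RegisteredEntry[]): Migration[] {
  const seen = new Map<number, string>();
  const migrations: Migration[] = [];
  for (const entry of entries) {
    const match = FILENAME_PATTERN.exec(entry.filename);
    if (!match) {
      throw new Error(`Malformed migration filename: ${entry.filename}`);
    }
    // The version comes from the filename, never from a hand-typed field.
    const version = Number(match[1]);
    const previous = seen.get(version);
    if (previous !== undefined) {
      throw new Error(
        `Migration version ${version} claimed by both ${previous} and ${entry.filename}`,
      );
    }
    seen.set(version, entry.filename);
    migrations.push({ version, description: entry.module.description, up: entry.module.up });
  }
  return migrations.sort((a, b) => a.version - b.version);
}

// Example registry mirroring the two extracted migrations.
const ALL_MIGRATIONS_SKETCH = validateRegistered([
  { filename: '002-project-metadata.ts', module: { description: 'project metadata', up: (db) => { db.push('v2'); } } },
  { filename: '003-lower-name-index.ts', module: { description: 'lower(name) index', up: (db) => { db.push('v3'); } } },
]);

// The highest registered version becomes the current schema version.
const CURRENT_SCHEMA_VERSION_SKETCH =
  ALL_MIGRATIONS_SKETCH.length === 0
    ? 0
    : ALL_MIGRATIONS_SKETCH[ALL_MIGRATIONS_SKETCH.length - 1]!.version;
```

With this shape, two entries that resolve to the same `NNN` fail at module load instead of one of them silently never running.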
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Today every PR adding a derived-signal pass (centrality, churn,
issue-history, config-refs, sql-refs, cochange) edits the same
4 spots in src/index.ts:
1. New imports at the top
2. New private method on `CodeGraph` (e.g. runDerivedSignals,
runIssueHistoryPass, runConfigRefsPass, runSqlRefsPass)
3. New call site in `indexAll` AFTER resolution
4. New call site in `sync` AFTER resolution
5 PRs collide on every one of those.
After this refactor:
Adding a new derived-signal pass:
1. Create `src/index-hooks/<name>.ts` exporting a
`HOOK: IndexHook` constant with `afterIndexAll` and/or
`afterSync` methods.
2. Add one import + one entry to
`src/index-hooks/registry.ts`.
`CodeGraph.indexAll` and `sync` invoke the hook runner once;
adding a new pass touches only the hook file + the registry.
Zero changes to CodeGraph itself.
## What's new
- **src/index-hooks/types.ts** — `IndexHook` interface
(`afterIndexAll`, `afterSync`, both optional), `IndexHookContext`
(projectRoot + config + queries + db), and
`IndexHookOutcome` for diagnostic reporting.
- **src/index-hooks/registry.ts** — static-import list of every
registered hook (empty on main today; PRs adding hooks fill it
in), plus the `runAfterIndexAll` / `runAfterSync` runners that
iterate hooks and catch errors so one broken hook never fails
indexing.
- **src/index.ts** — `indexAll` calls `runAfterIndexAll(ctx)`
after resolution. `sync` calls `runAfterSync(ctx, result)`
after resolution. New private `buildHookContext()` helper
exposes a stable read-only context.
- **__tests__/index-hooks.test.ts** — 6 tests covering empty
registry, runner shape, and the `afterIndexAll` / `afterSync`
contracts.
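The error-isolating runner could look roughly like the sketch below. It is an illustration under assumptions: the context is reduced to `projectRoot`, hooks are synchronous, the `name` field and the `IndexHookOutcome` fields are guesses, and the hook list is passed as a parameter here (the real runner reads it from registry.ts).

```typescript
// Reduced context (assumption: the real one also carries config, queries, db).
interface IndexHookContext { projectRoot: string }

// Assumed outcome shape for diagnostic reporting.
interface IndexHookOutcome { hook: string; ok: boolean; error?: string }

interface IndexHook {
  name: string; // hypothetical field, used only for reporting
  afterIndexAll?: (ctx: IndexHookContext) => void;
  afterSync?: (ctx: IndexHookContext, changedFiles: readonly string[]) => void;
}

// Runs every hook that implements afterIndexAll; a throwing hook is recorded
// and skipped so it cannot fail indexing or block later hooks.
function runAfterIndexAll(
  ctx: IndexHookContext,
  hooks: readonly IndexHook[],
): IndexHookOutcome[] {
  const outcomes: IndexHookOutcome[] = [];
  for (const hook of hooks) {
    if (!hook.afterIndexAll) continue;
    try {
      hook.afterIndexAll(ctx);
      outcomes.push({ hook: hook.name, ok: true });
    } catch (err) {
      outcomes.push({
        hook: hook.name,
        ok: false,
        error: err instanceof Error ? err.message : String(err),
      });
    }
  }
  return outcomes;
}

// Example hooks: one broken, one healthy, one that only cares about sync.
const hookLog: string[] = [];
const EXAMPLE_HOOKS: readonly IndexHook[] = [
  { name: 'broken', afterIndexAll: () => { throw new Error('boom'); } },
  { name: 'centrality', afterIndexAll: (ctx) => { hookLog.push(`ranked ${ctx.projectRoot}`); } },
  { name: 'sync-only', afterSync: () => { hookLog.push('synced'); } },
];
```

An empty hook list simply yields an empty outcome array, which matches the "no registered hooks = no-op runner" behavior described for main.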
## Why ship the framework on main with zero registered hooks?
The only consumers of this framework today are 5 unmerged PRs
(colbymchenry#105 cochange + my colbymchenry#112-colbymchenry#115). Landing the framework now lets
each of those PRs rebase to a 2-line change instead of 8-10
lines mutating CodeGraph adjacent-line. Without this, all 5 PRs
collide on the same indexAll/sync call sites.
The framework adds zero behavior on main (no registered hooks =
no-op runner). 380→386 tests confirm no regression.
## Affected open PRs
| PR | Today | After this lands |
|---|---|---|
| colbymchenry#105 cochange | runDerivedSignals helper + 2 call sites | 1 hook file in src/index-hooks/ + 2 lines in registry.ts |
| colbymchenry#112 centrality+churn | same shape | same shape |
| colbymchenry#113 issue-history | same shape | same shape |
| colbymchenry#114 config-refs | same shape | same shape |
| colbymchenry#115 sql-refs | same shape | same shape |
Each goes from "edit CodeGraph in 4 spots" to "drop a hook file."
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…efactors
Lands centrality (PageRank) and churn (git history) as registered
IndexHooks (`afterIndexAll` + `afterSync`) instead of CodeGraph
private methods. Adds:
- Migration 004: nodes.centrality + files.{commit_count,loc,
first_seen_ts,last_touched_ts} + indexes
- src/centrality/ + src/churn/ (pure modules)
- src/index-hooks/centrality.ts + churn.ts (registered hooks)
- CodeGraph public methods: getCentrality, getTopCentralNodes,
getCentralityRank, getFileChurn, getHotspots
- codegraph_hotspots MCP tool wired through ToolModule registry
+ handleHotspots on ToolHandler
- Updated regression-guard tests (index-hooks, mcp-tool-registry)
to reflect newly registered hooks/tools
Tests: 440/440 pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mines Fixes/Closes/Resolves #N commits and attributes them to the
symbols touched by each commit's hunks. Lands as a registered
IndexHook (issue-history).
- Migration 005: symbol_issues table
- src/issue-history/ (pure module): mineIssueHistory + parse-diff
- src/index-hooks/issue-history.ts (registered hook)
- CodeGraph public method: getIssuesForNode
- codegraph_node MCP tool now surfaces an issue-history line
- enableIssueHistory flag (default true) wired through config merge
- Removed defensive ensureSymbolIssuesTable guard and its test: the
v4-collision bug class is impossible under file-based migrations
(PR colbymchenry#118 refactor); filenames collide on the filesystem
instead.
Tests: 470/471 pass (1 watcher test flakes under load but passes in
isolation).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
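The two steps named in this commit (finding issue references in a commit message, then mapping diff hunks onto symbols) can be sketched as below. This is not the `mineIssueHistory` implementation: the regex, the helper names `extractIssueNumbers` and `symbolsTouched`, and the line-range shapes are assumptions, and only the three keywords named above are matched.

```typescript
// Returns the distinct issue numbers referenced as "Fixes #N", "Closes #N",
// or "Resolves #N" (case-insensitive), in order of first appearance.
function extractIssueNumbers(message: string): number[] {
  const pattern = /\b(?:fixes|closes|resolves)\s+#(\d+)/gi;
  const found: number[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(message)) !== null) {
    const issue = Number(match[1]);
    if (found.indexOf(issue) === -1) found.push(issue);
  }
  return found;
}

// Inclusive 1-based line ranges (assumed representation).
interface LineRange { start: number; end: number }
interface SymbolSpan extends LineRange { name: string }

// A symbol counts as touched when any hunk overlaps its line range.
function symbolsTouched(
  hunks: readonly LineRange[],
  symbols: readonly SymbolSpan[],
): string[] {
  return symbols
    .filter((sym) => hunks.some((h) => h.start <= sym.end && sym.start <= h.end))
    .map((sym) => sym.name);
}

// Example: one commit, two hunks, three symbols in the changed file.
const exampleIssues = extractIssueNumbers('Fixes #12 and closes #7; see also #99');
const exampleSymbols = symbolsTouched(
  [{ start: 10, end: 14 }, { start: 40, end: 41 }],
  [
    { name: 'parseDiff', start: 1, end: 12 },
    { name: 'helper', start: 20, end: 30 },
    { name: 'mine', start: 35, end: 60 },
  ],
);
```

In this example the commit would be attributed to `parseDiff` and `mine` for issues 12 and 7, while the bare `#99` mention is ignored because it carries no closing keyword.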
Copied from colbymchenry/codegraph#113