Skip to content

feat(graph): PageRank centrality + per-file churn metrics#112

Open
andreinknv wants to merge 9 commits into
colbymchenry:mainfrom
andreinknv:feat/centrality-and-churn
Open

feat(graph): PageRank centrality + per-file churn metrics#112
andreinknv wants to merge 9 commits into
colbymchenry:mainfrom
andreinknv:feat/centrality-and-churn

Conversation

@andreinknv
Copy link
Copy Markdown
Contributor

@andreinknv andreinknv commented Apr 27, 2026

🔄 Rebased onto post-refactor patterns (Centrality + churn + hotspots)
This PR has been rebased to use the per-thing file layout established by refactor PRs #116/#117/#118/#119. Behavior shipped is identical to the original; only the internal code shape changed.

What this PR now ships:

  • src/index-hooks/centrality.ts + src/index-hooks/churn.ts — two registered hooks
  • src/db/migrations/004-centrality-churn.ts — adds nodes.centrality + files.{commit_count,loc,first_seen_ts,last_touched_ts}
  • src/mcp/tools/hotspots.tsHOTSPOTS_TOOL: ToolModule for codegraph_hotspots
  • src/centrality/index.ts + src/churn/index.ts — pure compute modules
  • src/index-hooks/registry.ts, src/db/migrations/index.ts, src/mcp/tools/registry.ts, src/mcp/tools/types.ts — registry entries (~2 lines each)
  • src/db/queries.ts, src/db/schema.sql, src/types.ts, src/default-config.ts, src/config.ts, src/index.ts, src/mcp/tools.ts — query methods, config flags, public API, handler
  • __tests__/centrality.test.ts, __tests__/churn.test.ts

What you'll see in the diff: until #116-#119 land, the diff against main includes the refactor commits as part of the branch history. After they merge, GitHub auto-shrinks this PR's diff to just the files above. Full context: #120.


Summary

Adds two graph-derived signals — PageRank centrality on nodes and per-file churn mined from git — and combines them into a codegraph_hotspots MCP tool that ranks files by risk = centrality × churn. Empirically validated before implementation: top hotspot on this repo's own history is src/extraction/tree-sitter.ts (47 commits × 2347 LOC), exactly the file PR #77 split.

Why both signals together: centrality alone surfaces architecturally-important code; churn alone surfaces volatile code. The intersection — central code that changes often — is where bugs accumulate, and that's the agent-facing query this PR ships.

What's added

  • Centrality (src/centrality/index.ts): PageRank over calls + references edges (excluding contains so lexical hierarchy doesn't dominate). Pure compute, dangling-mass redistribution, Float64Array. Sub-100 ms on this repo (1,688 nodes / 4,319 edges).
  • Churn (src/churn/index.ts): per-file commit count, first/last touched timestamps, LOC. Mined from git log -z. Incremental via last_mined_churn_head with needsFullRescan recovery for force-push / gc.
  • Migration v4 with PRAGMA-table_info guards — coexists cleanly with feat(cochange): file-level co-change graph mined from git history #105 (cochange) which also targets v4.
  • codegraph_hotspots MCP tool: ranked file list, sortBy = risk | centrality | churn, with minCommits / minCentrality filters.
  • codegraph_node annotation: centrality + 1-based rank when available.
  • Config flags: enableCentrality / enableChurn (default true) for opt-out on shallow clones / non-git checkouts.

Bench

Step Time
Centrality recompute (1.7k nodes / 4.3k edges) ~50 ms
Full churn mine (98 commits) ~200 ms
Total derived-signals pass < 300 ms

Both passes are best-effort: any failure is logged + swallowed so indexing/sync still returns success. The data is advisory.

Sample output (from node dist/bin/codegraph.js index . && cg.getHotspots())

1.  src/extraction/tree-sitter.ts  pr=0.0349 commits=47 risk=1.6399
2.  src/db/queries.ts              pr=0.0539 commits=21 risk=1.1317
3.  src/index.ts                   pr=0.0738 commits=15 risk=1.1068
4.  src/types.ts                   pr=0.0699 commits=15 risk=1.0479
5.  src/mcp/tools.ts               pr=0.0311 commits=23 risk=0.7156

Side-finding from PageRank: ranks 15 / 23 in the raw output (trim, run) are resolver false-positives — overly-generic short names being conflated across files. Useful diagnostic byproduct, not a goal of this PR.

Tests

408 pass total, 28 new:

  • centrality.test.ts (10) — empty / isolated / sinks / star / cycle / mass-conservation / disconnected components / unknown-node edges / damping sentinel
  • churn.test.ts (13) — non-git / commit counts / timestamps / MAX_FILES_PER_COMMIT skip / incremental mining / unreachable sha / no-op / unicode + spaces in paths / LOC newline edge cases
  • derived-signals.test.ts (5) — end-to-end pipeline hub/leaf / no-git fallback / enableCentrality:false / sync incremental / force-push recovery

Reviewer pass

Independent reviewer agent ran once. Two REQUEST_CHANGES + three INFO items addressed:

  • Added minCentrality to codegraph_hotspots schema (capability gap vs. underlying API).
  • Added integration test for needsFullRescan recovery path (only untested branch).
  • Documented why upsertFile deliberately omits churn columns from ON CONFLICT DO UPDATE.
  • Annotated the ?? '' belt-and-braces guard at the recovery path.

Test plan

  • npx tsc --noEmit clean
  • npm test — 408/408 passing
  • End-to-end smoke on this repo: codegraph index . → centrality + churn populated, hotspots match spike-validated output
  • Migration v3 → v4 round-trip verified in foundation/pr19 tests
  • Reviewer agent: REQUEST_CHANGES → APPROVE after fixes
  • Reviewer's final pass after fixes (optional — happy to re-run)

Coexistence note

#105 (cochange) targets the same migration v4 with its own commit_count column add. Both PRs use idempotent PRAGMA table_info guards on the column add, so whichever lands first does not block the other.

🤖 Generated with Claude Code

andreinknv and others added 3 commits April 27, 2026 16:27
Adding a new language used to require coordinated edits to 6
shared lists across 4 files (Language union in types.ts;
DEFAULT_CONFIG.include; WASM_GRAMMAR_FILES, EXTENSION_MAP, and
getLanguageDisplayName in grammars.ts; EXTRACTORS map in
languages/index.ts). Two PRs adding different languages typically
conflicted on every one of those.

After this refactor, adding a new language is:

  1. Drop a file at src/extraction/languages/<name>.ts exporting an
     <NAME>_DEF: LanguageDef constant.
  2. Add ONE import line and ONE array entry to
     src/extraction/languages/registry.ts (alphabetical position —
     adjacent additions are still possible but rare).

That's it. grammars.ts, types.ts, tree-sitter.ts dispatch, and the
default include globs are all derived from the registry.

## What's in a LanguageDef

```ts
interface LanguageDef {
  name: string;                  // canonical id
  displayName: string;           // "Pascal / Delphi"
  extensions: readonly string[]; // ['.pas', '.dpr', ...]
  includeGlobs: readonly string[];
  grammar?: { wasmFile, vendored?, extractor };  // tree-sitter
  customExtractor?: (fp, src) => ExtractionResult;  // Liquid, Svelte
  extensionOverrides?: { '.dfm': { customExtractor } };  // Pascal forms
}
```

Each existing language file now exports both its `xxxExtractor`
(unchanged) AND a new `XXX_DEF`. New files were added for tsx, jsx,
svelte, liquid (the latter two wrap their existing custom extractor
classes via the customExtractor field).

## Refactored consumers

- src/extraction/grammars.ts: WASM_GRAMMAR_FILES removed (was
  internal-only); EXTENSION_MAP now a Proxy that lazy-builds from
  the registry on first access (avoids TDZ in cyclic load paths).
  loadGrammarsForLanguages, isLanguageSupported, isGrammarLoaded,
  getSupportedLanguages, getLanguageDisplayName, detectLanguage —
  all read from registry.
- src/extraction/tree-sitter.ts: extractFromSource's if-chain
  (svelte / liquid / pascal+.dfm/.fmx) replaced with one lookup:
  def.extensionOverrides[ext]?.customExtractor || def.customExtractor.
  Drops direct imports of LiquidExtractor, SvelteExtractor,
  DfmExtractor.
- src/types.ts: DEFAULT_CONFIG moved to src/default-config.ts (cycle
  break). types.ts re-exports for backward compat. The `include`
  array is now built lazily from each LanguageDef's includeGlobs.

## What still requires a one-line edit

The Language string union in types.ts still hard-codes the known
languages (typescript | javascript | … | unknown). New languages
added to the registry work at runtime as strings, but adding the
literal here is required IF the resolver wants to do exhaustive
narrowing on the new language (resolution/index.ts and
resolution/import-resolver.ts have a few `language === 'X'`
branches). Most new languages don't need such branches.

This trade-off keeps strict narrowing for the existing handful of
language-specific code paths while making everything else
registry-driven.

## Tests

380/380 pass. No new tests; behavior is identical. Existing
extraction.test.ts and pr19-improvements.test.ts heavily exercise
detectLanguage, isLanguageSupported, getSupportedLanguages, and
loadAllGrammars — all green.

## Follow-ups (out of scope)

- Auto-discovery in registry.ts via fs.readdirSync — works in
  built dist/ but vite-node doesn't support extensionless require()
  of TS source. A small build-time generator could remove the
  static import list entirely.
- Splitting __tests__/extraction.test.ts into per-language test
  files — eliminates the test-end-of-file conflict surface that
  every language PR currently hits.
- Similar registry refactors for:
  - MCP tool definitions (each tool self-registers; no shared
    tools[] array or case-switch in execute())
  - Migration files (each migration in src/db/migrations/NNN-*.ts;
    auto-discovered by version)
  - Index/sync hooks (centrality, churn, issue-history,
    config-refs, sql-refs, cochange all currently mutate
    CodeGraph.indexAll/sync; an IndexHook interface would make
    each pass self-contained)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tractor

Reviewer caught a real bug: the original commit kept the
EXTRACTORS map in src/extraction/languages/index.ts as a separate
hand-curated registry that TreeSitterExtractor read from. Adding
a new grammar-backed language would have required editing
EXTRACTORS too, undermining the refactor's stated single-source-of-
truth claim. A future contributor missing the EXTRACTORS update
would silently produce empty extraction results.

Fix:
- TreeSitterExtractor now reads its extractor straight off the
  language def: getLanguageDefByName(this.language)?.grammar?.extractor
- EXTRACTORS in languages/index.ts becomes a Proxy that derives
  lazily from the registry (kept for backward compat — readers
  unchanged).
- Add 16 structural-invariant tests in __tests__/language-registry.test.ts
  that fail loudly if any derived consumer drifts from the registry:
  EXTRACTORS / EXTENSION_MAP / detectLanguage / isLanguageSupported /
  getSupportedLanguages / getLanguageDisplayName all asserted to
  exactly mirror the registry contents.

Adding a new grammar-backed language is now genuinely "one new file
+ two lines in registry.ts" — no other files to touch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…flicts

Today every PR adding an MCP tool conflicts on the same two
shared lists in src/mcp/tools.ts: the tools[] array (the
list_tools surface) and the case switch in execute(). After this
refactor:

  Adding a new MCP tool:
  1. Drop a file at src/mcp/tools/<name>.ts exporting a
     <NAME>_TOOL: ToolModule (definition + handlerKey).
  2. Add one import line and one array entry to
     src/mcp/tools/registry.ts.
  3. Implement handle<Name>(args) on ToolHandler in tools.ts and
     add the new key to HandlerKey in tools/types.ts.

Step 3 is the only remaining "shared method on a single class"
conflict surface. Extracting handler bodies into per-tool files
(making step 3 also a single-file addition) is left as a
follow-up — the cost/benefit favors landing this incremental win
now and finishing the body extraction once language and migration
refactors land.

## What's new

- **src/mcp/tool-types.ts** — extracted ToolDefinition, ToolResult,
  PropertySchema, projectPathProperty into a shared module so
  per-tool files can import without circular dependency.
- **src/mcp/tools/types.ts** — ToolModule interface, HandlerKey
  string union, and ToolHandlerLike (a structural type that
  ToolHandler now `implements`, providing compile-time guarantee
  that every HandlerKey maps to a real method).
- **src/mcp/tools/<name>.ts × 9** — one file per existing tool
  (callees, callers, context, explore, files, impact, node, search,
  status). Each ~25-30 lines: import + definition literal +
  handlerKey reference.
- **src/mcp/tools/registry.ts** — static-import barrel, sorted
  alphabetically. Exports getToolModules(), getToolModule(name),
  and the derived `tools[]` array.
- **src/mcp/tools.ts** — ~200 lines deleted from the top
  (inline types + tools[] array + projectPathProperty).
  execute()'s case-switch replaced with a registry lookup +
  type-safe `this[mod.handlerKey](args)` dispatch (now compile-
  time-checked thanks to `implements ToolHandlerLike`).
  All `private async handle*` methods now public to match the
  interface. errorResult/textResult also public for the same reason.
- **src/mcp/index.ts** — MCPServer's tool-existence check switched
  from a linear `tools.find()` scan to the O(1) `getToolModule()`
  Map lookup, eliminating two parallel lookup paths.

## Tests

387/387 pass. **7 new tests** in __tests__/mcp-tool-registry.test.ts:
- Definitions are well-formed (name shape, description length).
- handlerKey shape (`handle<UpperCase>`).
- Every registered handlerKey resolves to a real method on
  ToolHandler.
- Exported `tools[]` exactly mirrors the registry.
- Canonical 9 main-line tools regression guard.
- execute() unknown-tool error path.
- **End-to-end dispatch smoke test**: execute('codegraph_status', {})
  reaches the real handler body (no broken `this` binding) — would
  fail loudly if the dynamic dispatch chain ever breaks.

## Reviewer pass

Independent reviewer ran once. 2 REQUEST_CHANGES + 2 INFO addressed:

1. ToolHandlerLike was defined but never enforced —
   ToolHandler now `implements ToolHandlerLike`. Eliminates the
   `(this as unknown as Record<...>)` cast in execute(); dispatch
   is fully compile-time-checked.
2. No end-to-end dispatch test — added one (see Tests above).
3. MCPServer.handleToolsCall used a linear `tools.find()` scan
   while execute() used Map lookup — switched to getToolModule()
   for parity.
4. Removed redundant .slice() in registry.ts (map() already
   returns a fresh array).

## Backward compat

src/mcp/tools.ts still re-exports ToolDefinition, ToolResult, the
mutable `tools[]` array, ToolHandler, and getExploreBudget. Every
existing consumer (`import { ToolDefinition, ToolResult, tools,
ToolHandler } from './tools'`) keeps working unchanged.

## Affected open PRs

- colbymchenry#110 (review-context): rebases to 1 new file in tools/ + 2
  lines in registry.ts + 1 method on ToolHandler + 1 line in
  HandlerKey.
- colbymchenry#112 (centrality+churn): same shape for the codegraph_hotspots
  tool.
- colbymchenry#114 (config-refs): same shape for codegraph_config.
- colbymchenry#115 (sql-refs): same shape for codegraph_sql.

Each goes from 4-way conflict (tools[] + case + handler + helpers)
down to 1-way conflict (HandlerKey + handler method on ToolHandler,
both in tools.ts).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Today every PR adding a schema migration claims
`CURRENT_SCHEMA_VERSION = next` AND adds an array entry to
`migrations: Migration[]` in src/db/migrations.ts. Two PRs both
claiming the same version resolve as: "second PR's v4 silently
no-ops on existing DBs" — a real silent-data-loss bug class
(PR colbymchenry#113's reviewer caught one).

After this refactor:

  Adding a new schema migration:
  1. Pick the next free 3-digit prefix (`git ls-files
     'src/db/migrations/[0-9]*.ts'` shows what's taken).
  2. Create `src/db/migrations/<NNN>-<short-name>.ts` exporting a
     `MIGRATION: MigrationModule` (description + up).
  3. Add one import line and one entry to
     `src/db/migrations/index.ts`'s REGISTERED_MODULES array.

Two PRs both creating `004-foo.ts` collide on the FILESYSTEM —
the maintainer sees it instantly. No more silent skipped
migrations.

## What's new

- `src/db/migrations/types.ts` — `MigrationModule { description,
  up }` and `Migration extends MigrationModule { version }`.
- `src/db/migrations/002-project-metadata.ts` — extracted v2
  body verbatim.
- `src/db/migrations/003-lower-name-index.ts` — extracted v3
  body verbatim.
- `src/db/migrations/index.ts` — central registry. Static-imports
  each migration, parses the version FROM THE FILENAME (no
  hand-typed version field that can drift), enforces strict
  `NNN-kebab-name.ts` shape, validates uniqueness/sort at module
  load (throws loudly on collision), exposes ALL_MIGRATIONS and
  CURRENT_SCHEMA_VERSION.
- `src/db/migrations.ts` — refactored to a thin runner. Same
  exported surface (CURRENT_SCHEMA_VERSION, getCurrentVersion,
  runMigrations, needsMigration, getPendingMigrations,
  getMigrationHistory, Migration type) — every existing import
  keeps working unchanged.
- `__tests__/migrations-registry.test.ts` — 8 invariant tests:
  registry non-empty, versions unique + strictly ascending,
  CURRENT_SCHEMA_VERSION matches max, every file matches the
  strict NNN-kebab-name pattern, no orphan files, no phantom
  registrations.

## Reviewer pass

Independent reviewer ran once. 3 REQUEST_CHANGES + 1 INFO addressed:

1. Hand-typed `version` field in REGISTERED_MODULES could drift
   from filename. **Fixed**: removed the version field; registry
   now parses version from filename via FILENAME_PATTERN regex
   inside validateRegistered.
2. Filename-pattern test was lenient (allowed 4-digit or 1-digit
   prefixes). **Fixed**: new "every migration file matches the
   strict NNN-kebab-name.ts pattern" test catches malformed
   filenames as orphan-detection-bypassing offenders.
3. `getPendingMigrations` returned `readonly Migration[]`,
   breaking callers that typed the result as `Migration[]`.
   **Fixed**: returns a fresh mutable array via `.slice()`.
4. No throw-on-duplicate test for validateRegistered (module
   evaluation timing). Acknowledged; not added.

## Backward compat

Every existing import works unchanged:
- `import { CURRENT_SCHEMA_VERSION } from './migrations'` ✓
- `import { runMigrations } from './migrations'` ✓
- `import { needsMigration } from './migrations'` ✓
- `import { getMigrationHistory } from './migrations'` ✓
- `import { getPendingMigrations } from './migrations'` — returns
   mutable Migration[] (preserved)
- `Migration` type — re-exported

## Affected open PRs

Every migration-touching PR (colbymchenry#102 UNIQUE edges, colbymchenry#105 cochange,
colbymchenry#108 perf db, colbymchenry#111 LLM features, my colbymchenry#112 centrality+churn, colbymchenry#113
issue-history, colbymchenry#114 config-refs, colbymchenry#115 sql-refs) currently
claims migration v4 and conflicts with each other on
`migrations.ts`. After this lands they each become:
- 1 new file: `src/db/migrations/<NNN>-<name>.ts`
- 2 lines in registry.ts (import + array entry)

Conflict shape changes from "next free version + array entry +
CURRENT_SCHEMA_VERSION bump in one file" (4-way conflict) to "1
new file" + 2-line registry edit. If two PRs target the same
NNN, the filesystem collision surfaces immediately — no silent
skipped migrations.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Today every PR adding a derived-signal pass (centrality, churn,
issue-history, config-refs, sql-refs, cochange) edits the same
3 spots in src/index.ts:

  1. New imports at the top
  2. New private method on `CodeGraph` (e.g. runDerivedSignals,
     runIssueHistoryPass, runConfigRefsPass, runSqlRefsPass)
  3. New call site in `indexAll` AFTER resolution
  4. New call site in `sync` AFTER resolution

5 PRs collide on every one of those.

After this refactor:

  Adding a new derived-signal pass:
  1. Create `src/index-hooks/<name>.ts` exporting a
     `HOOK: IndexHook` constant with `afterIndexAll` and/or
     `afterSync` methods.
  2. Add one import + one entry to
     `src/index-hooks/registry.ts`.

`CodeGraph.indexAll` and `sync` invoke the hook runner once;
adding a new pass touches only the hook file + the registry.
Zero changes to CodeGraph itself.

## What's new

- **src/index-hooks/types.ts** — `IndexHook` interface
  (`afterIndexAll`, `afterSync`, both optional), `IndexHookContext`
  (projectRoot + config + queries + db), and
  `IndexHookOutcome` for diagnostic reporting.
- **src/index-hooks/registry.ts** — static-import list of every
  registered hook (empty on main today; PRs adding hooks fill it
  in), plus the `runAfterIndexAll` / `runAfterSync` runners that
  iterate hooks and catch errors so one broken hook never fails
  indexing.
- **src/index.ts** — `indexAll` calls `runAfterIndexAll(ctx)`
  after resolution. `sync` calls `runAfterSync(ctx, result)`
  after resolution. New private `buildHookContext()` helper
  exposes a stable read-only context.
- **__tests__/index-hooks.test.ts** — 6 tests covering empty
  registry, runner shape, and the `afterIndexAll` / `afterSync`
  contracts.

## Why ship the framework on main with zero registered hooks?

The only consumers of this framework today are 5 unmerged PRs
(colbymchenry#105 cochange + my colbymchenry#112-colbymchenry#115). Landing the framework now lets
each of those PRs rebase to a 2-line change instead of 8-10
lines mutating CodeGraph adjacent-line. Without this, all 5 PRs
collide on the same indexAll/sync call sites.

The framework adds zero behavior on main (no registered hooks =
no-op runner). 380→386 tests confirm no regression.

## Affected open PRs

| PR | Today | After this lands |
|---|---|---|
| colbymchenry#105 cochange | runDerivedSignals helper + 2 call sites | 1 hook file in src/index-hooks/ + 2 lines in registry.ts |
| colbymchenry#112 centrality+churn | same shape | same shape |
| colbymchenry#113 issue-history | same shape | same shape |
| colbymchenry#114 config-refs | same shape | same shape |
| colbymchenry#115 sql-refs | same shape | same shape |

Each goes from "edit CodeGraph in 4 spots" to "drop a hook file."

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv and others added 4 commits April 27, 2026 17:38
…efactors

Lands centrality (PageRank) and churn (git history) as registered
IndexHooks (`afterIndexAll` + `afterSync`) instead of CodeGraph
private methods. Adds:

- Migration 004: nodes.centrality + files.{commit_count,loc,
  first_seen_ts,last_touched_ts} + indexes
- src/centrality/ + src/churn/ (pure modules)
- src/index-hooks/centrality.ts + churn.ts (registered hooks)
- CodeGraph public methods: getCentrality, getTopCentralNodes,
  getCentralityRank, getFileChurn, getHotspots
- codegraph_hotspots MCP tool wired through ToolModule registry
  + handleHotspots on ToolHandler
- Updated regression-guard tests (index-hooks, mcp-tool-registry)
  to reflect newly registered hooks/tools

Tests: 440/440 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@andreinknv andreinknv force-pushed the feat/centrality-and-churn branch from 65a7687 to 8b16d8e Compare April 27, 2026 22:53
andreinknv added a commit to andreinknv/codegraph that referenced this pull request Apr 27, 2026
…efactors

Lands centrality (PageRank) and churn (git history) as registered
IndexHooks (`afterIndexAll` + `afterSync`) instead of CodeGraph
private methods. Adds:

- Migration 004: nodes.centrality + files.{commit_count,loc,
  first_seen_ts,last_touched_ts} + indexes
- src/centrality/ + src/churn/ (pure modules)
- src/index-hooks/centrality.ts + churn.ts (registered hooks)
- CodeGraph public methods: getCentrality, getTopCentralNodes,
  getCentralityRank, getFileChurn, getHotspots
- codegraph_hotspots MCP tool wired through ToolModule registry
  + handleHotspots on ToolHandler
- Updated regression-guard tests (index-hooks, mcp-tool-registry)
  to reflect newly registered hooks/tools

Tests: 440/440 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@andreinknv
Copy link
Copy Markdown
Contributor Author

Heads-up: this branch has been force-pushed. This PR's contents have been rebased to use the per-thing file layout established by refactor PRs #116-#119. Functionality is identical to the original; the internal file shape now matches the post-refactor architecture so this PR can be merged after the refactors with zero conflicts.

The current diff against main looks larger than the feature alone because it transitively includes the 4 refactor commits + any prerequisite PRs. That diff auto-shrinks as the prerequisites merge in the order documented in #120. See #120 (comment) for the full rebase summary.

andreinknv added a commit to andreinknv/codegraph that referenced this pull request Apr 28, 2026
The MCP `initialize` response can include an `instructions` field that
clients (Claude Code, Cursor, opencode, LangChain, OpenAI Agent SDK,
etc.) surface in the agent's system prompt automatically. Today
codegraph emits an empty initialize response — agents only see
individual tool descriptions, no overall guidance on how to compose
them.

This adds the missing playbook:

- **Tool selection by intent** — quick map from "what is X" / "how does
  X work" / "what would changing X break" to the right tool.
- **Common chains** — onboarding (context first), PR review
  (review_context), refactor planning (search → callers → impact),
  debugging a regression.
- **Tier discipline** — start at the cheap deterministic tier (search,
  context, callers, callees, impact, node, explore, files, status),
  escalate to conditional tools only when their data exists, and only
  reach for LLM-mediated tools when the cheap path doesn't suffice.
- **Agent-bridge tier** — explicit recipe for projects without a local
  LLM where the agent itself summarizes via codegraph_pending_summaries
  + codegraph_save_summaries.
- **Anti-patterns** — don't grep when search exists, don't chain
  search+node when context covers it, don't query the index immediately
  after a write.

Lives in src/mcp/server-instructions.ts so it's easy to update without
touching the JSON-RPC dispatch in src/mcp/index.ts. Single-file, no
schema changes, no migrations, no test changes needed.

References tools that exist on `main` today; doesn't presume any of the
in-flight feature PRs (colbymchenry#110, colbymchenry#112-115, colbymchenry#111) have landed. After those
merge, the relevant sections of this guidance start applying without
needing a follow-up edit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit to andreinknv/codegraph that referenced this pull request Apr 28, 2026
…om a diff

Adds a new MCP tool that takes a unified diff and returns structured
review context for an LLM-driven PR reviewer. Codegraph becomes a
substrate for Greptile/CodeRabbit-style products without itself doing
the synthesis.

## What it returns

Per changed file:
  - status (added / modified / deleted / renamed)
  - hunks (line ranges)
  - affected symbols (line-range overlap with hunks)
  - tests covering the file (via tests-edges; graceful no-op if absent)

Per affected symbol:
  - signature, docstring
  - top-N callers, top-N callees
  - impact-radius node count

## Components

- src/review/diff-parser.ts: pure unified-diff parser
- src/review/index.ts: buildReviewContext + co-change warnings
- src/index.ts: CodeGraph.buildReviewContext public API
- src/mcp/tools/review-context.ts: ToolModule (post-colbymchenry#117 form)
- src/mcp/tools.ts: handleReviewContext + serializeReviewContextWithinCap
  (progressive-trim JSON serializer that keeps output <= MAX_OUTPUT_LENGTH
  while preserving JSON parseability)
- __tests__/review-context.test.ts: 25 tests

Output is JSON; the LLM consumer does the synthesis.

This is the post-colbymchenry#117 (per-tool MCP registry) form — the original
PR's case-switch dispatch is replaced by a ToolModule entry plus
a 'handleReviewContext' HandlerKey.

Tests that exercise co-change warnings include a graceful inline
ALTER TABLE for commit_count, since that column lands with colbymchenry#112's
centrality+churn migration. After colbymchenry#112 lands the ALTER becomes a
no-op (column already exists).
andreinknv added a commit to andreinknv/codegraph that referenced this pull request Apr 28, 2026
…om a diff

Adds a new MCP tool that takes a unified diff and returns structured
review context for an LLM-driven PR reviewer. Codegraph becomes a
substrate for Greptile/CodeRabbit-style products without itself doing
the synthesis.

Per changed file:
  - status (added / modified / deleted / renamed)
  - hunks (line ranges)
  - affected symbols (line-range overlap with hunks)
  - tests covering the file (via tests-edges; graceful no-op if absent)

Per affected symbol:
  - signature, docstring
  - top-N callers, top-N callees
  - impact-radius node count

- src/review/diff-parser.ts: pure unified-diff parser
- src/review/index.ts: buildReviewContext + co-change warnings
- src/index.ts: CodeGraph.buildReviewContext public API
- src/mcp/tools/review-context.ts: ToolModule (post-colbymchenry#117 form)
- src/mcp/tools.ts: handleReviewContext + serializeReviewContextWithinCap
  (progressive-trim JSON serializer that keeps output <= MAX_OUTPUT_LENGTH
  while preserving JSON parseability)
- __tests__/review-context.test.ts: 25 tests

Output is JSON; the LLM consumer does the synthesis.

This is the post-colbymchenry#117 (per-tool MCP registry) form — the original
PR's case-switch dispatch is replaced by a ToolModule entry plus
a 'handleReviewContext' HandlerKey.

Tests that exercise co-change warnings include a graceful inline
ALTER TABLE for commit_count, since that column lands with colbymchenry#112's
centrality+churn migration. After colbymchenry#112 lands the ALTER becomes a
no-op (column already exists).
andreinknv added a commit to andreinknv/codegraph that referenced this pull request Apr 29, 2026
Adds Steps K-O to walk the new PRs in dependency order:
  K: bug-fix wave (clean):    colbymchenry#128, colbymchenry#129
  L: resolution + search:     colbymchenry#130 (resolve), colbymchenry#131 (resolve)
  M: extraction edges:        colbymchenry#134 (resolve)
  N: biomarker stack:         colbymchenry#132, colbymchenry#133 (both resolve, on top of colbymchenry#125)
  O: search advanced:         colbymchenry#135 (resolve, on top of colbymchenry#131)

Also flips colbymchenry#125 from merge_clean to merge_resolve - it now hits a
queries.ts conflict after the Phase-4 stack lands (colbymchenry#111/colbymchenry#112/colbymchenry#123/colbymchenry#124
all extend the same QueryBuilder surface, so colbymchenry#125's biomarker columns
no longer apply cleanly without a resolution).

Validated end-to-end against colbymchenry/main HEAD: script ran
clean through all 43 PRs, npm run build succeeded, full test
suite reports 877/877 passing (was 829 before this wave: +48 from
new tests added by the new PRs plus the reviewer-driven follow-ups).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant