feat(graph): PageRank centrality + per-file churn metrics#112
Open
andreinknv wants to merge 9 commits into
Open
feat(graph): PageRank centrality + per-file churn metrics#112andreinknv wants to merge 9 commits into
andreinknv wants to merge 9 commits into
Conversation
Adding a new language used to require coordinated edits to 6
shared lists across 4 files (Language union in types.ts;
DEFAULT_CONFIG.include; WASM_GRAMMAR_FILES, EXTENSION_MAP, and
getLanguageDisplayName in grammars.ts; EXTRACTORS map in
languages/index.ts). Two PRs adding different languages typically
conflicted on every one of those.
After this refactor, adding a new language is:
1. Drop a file at src/extraction/languages/<name>.ts exporting an
<NAME>_DEF: LanguageDef constant.
2. Add ONE import line and ONE array entry to
src/extraction/languages/registry.ts (alphabetical position —
adjacent additions are still possible but rare).
That's it. grammars.ts, types.ts, tree-sitter.ts dispatch, and the
default include globs are all derived from the registry.
## What's in a LanguageDef
```ts
interface LanguageDef {
name: string; // canonical id
displayName: string; // "Pascal / Delphi"
extensions: readonly string[]; // ['.pas', '.dpr', ...]
includeGlobs: readonly string[];
grammar?: { wasmFile, vendored?, extractor }; // tree-sitter
customExtractor?: (fp, src) => ExtractionResult; // Liquid, Svelte
extensionOverrides?: { '.dfm': { customExtractor } }; // Pascal forms
}
```
Each existing language file now exports both its `xxxExtractor`
(unchanged) AND a new `XXX_DEF`. New files were added for tsx, jsx,
svelte, liquid (the latter two wrap their existing custom extractor
classes via the customExtractor field).
## Refactored consumers
- src/extraction/grammars.ts: WASM_GRAMMAR_FILES removed (was
internal-only); EXTENSION_MAP now a Proxy that lazy-builds from
the registry on first access (avoids TDZ in cyclic load paths).
loadGrammarsForLanguages, isLanguageSupported, isGrammarLoaded,
getSupportedLanguages, getLanguageDisplayName, detectLanguage —
all read from registry.
- src/extraction/tree-sitter.ts: extractFromSource's if-chain
(svelte / liquid / pascal+.dfm/.fmx) replaced with one lookup:
def.extensionOverrides[ext]?.customExtractor || def.customExtractor.
Drops direct imports of LiquidExtractor, SvelteExtractor,
DfmExtractor.
- src/types.ts: DEFAULT_CONFIG moved to src/default-config.ts (cycle
break). types.ts re-exports for backward compat. The `include`
array is now built lazily from each LanguageDef's includeGlobs.
## What still requires a one-line edit
The Language string union in types.ts still hard-codes the known
languages (typescript | javascript | … | unknown). New languages
added to the registry work at runtime as strings, but adding the
literal here is required IF the resolver wants to do exhaustive
narrowing on the new language (resolution/index.ts and
resolution/import-resolver.ts have a few `language === 'X'`
branches). Most new languages don't need such branches.
This trade-off keeps strict narrowing for the existing handful of
language-specific code paths while making everything else
registry-driven.
## Tests
380/380 pass. No new tests; behavior is identical. Existing
extraction.test.ts and pr19-improvements.test.ts heavily exercise
detectLanguage, isLanguageSupported, getSupportedLanguages, and
loadAllGrammars — all green.
## Follow-ups (out of scope)
- Auto-discovery in registry.ts via fs.readdirSync — works in
built dist/ but vite-node doesn't support extensionless require()
of TS source. A small build-time generator could remove the
static import list entirely.
- Splitting __tests__/extraction.test.ts into per-language test
files — eliminates the test-end-of-file conflict surface that
every language PR currently hits.
- Similar registry refactors for:
- MCP tool definitions (each tool self-registers; no shared
tools[] array or case-switch in execute())
- Migration files (each migration in src/db/migrations/NNN-*.ts;
auto-discovered by version)
- Index/sync hooks (centrality, churn, issue-history,
config-refs, sql-refs, cochange all currently mutate
CodeGraph.indexAll/sync; an IndexHook interface would make
each pass self-contained)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tractor Reviewer caught a real bug: the original commit kept the EXTRACTORS map in src/extraction/languages/index.ts as a separate hand-curated registry that TreeSitterExtractor read from. Adding a new grammar-backed language would have required editing EXTRACTORS too, undermining the refactor's stated single-source-of- truth claim. A future contributor missing the EXTRACTORS update would silently produce empty extraction results. Fix: - TreeSitterExtractor now reads its extractor straight off the language def: getLanguageDefByName(this.language)?.grammar?.extractor - EXTRACTORS in languages/index.ts becomes a Proxy that derives lazily from the registry (kept for backward compat — readers unchanged). - Add 16 structural-invariant tests in __tests__/language-registry.test.ts that fail loudly if any derived consumer drifts from the registry: EXTRACTORS / EXTENSION_MAP / detectLanguage / isLanguageSupported / getSupportedLanguages / getLanguageDisplayName all asserted to exactly mirror the registry contents. Adding a new grammar-backed language is now genuinely "one new file + two lines in registry.ts" — no other files to touch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…flicts
Today every PR adding an MCP tool conflicts on the same two
shared lists in src/mcp/tools.ts: the tools[] array (the
list_tools surface) and the case switch in execute(). After this
refactor:
Adding a new MCP tool:
1. Drop a file at src/mcp/tools/<name>.ts exporting a
<NAME>_TOOL: ToolModule (definition + handlerKey).
2. Add one import line and one array entry to
src/mcp/tools/registry.ts.
3. Implement handle<Name>(args) on ToolHandler in tools.ts and
add the new key to HandlerKey in tools/types.ts.
Step 3 is the only remaining "shared method on a single class"
conflict surface. Extracting handler bodies into per-tool files
(making step 3 also a single-file addition) is left as a
follow-up — the cost/benefit favors landing this incremental win
now and finishing the body extraction once language and migration
refactors land.
## What's new
- **src/mcp/tool-types.ts** — extracted ToolDefinition, ToolResult,
PropertySchema, projectPathProperty into a shared module so
per-tool files can import without circular dependency.
- **src/mcp/tools/types.ts** — ToolModule interface, HandlerKey
string union, and ToolHandlerLike (a structural type that
ToolHandler now `implements`, providing compile-time guarantee
that every HandlerKey maps to a real method).
- **src/mcp/tools/<name>.ts × 9** — one file per existing tool
(callees, callers, context, explore, files, impact, node, search,
status). Each ~25-30 lines: import + definition literal +
handlerKey reference.
- **src/mcp/tools/registry.ts** — static-import barrel, sorted
alphabetically. Exports getToolModules(), getToolModule(name),
and the derived `tools[]` array.
- **src/mcp/tools.ts** — ~200 lines deleted from the top
(inline types + tools[] array + projectPathProperty).
execute()'s case-switch replaced with a registry lookup +
type-safe `this[mod.handlerKey](args)` dispatch (now compile-
time-checked thanks to `implements ToolHandlerLike`).
All `private async handle*` methods now public to match the
interface. errorResult/textResult also public for the same reason.
- **src/mcp/index.ts** — MCPServer's tool-existence check switched
from a linear `tools.find()` scan to the O(1) `getToolModule()`
Map lookup, eliminating two parallel lookup paths.
## Tests
387/387 pass. **7 new tests** in __tests__/mcp-tool-registry.test.ts:
- Definitions are well-formed (name shape, description length).
- handlerKey shape (`handle<UpperCase>`).
- Every registered handlerKey resolves to a real method on
ToolHandler.
- Exported `tools[]` exactly mirrors the registry.
- Canonical 9 main-line tools regression guard.
- execute() unknown-tool error path.
- **End-to-end dispatch smoke test**: execute('codegraph_status', {})
reaches the real handler body (no broken `this` binding) — would
fail loudly if the dynamic dispatch chain ever breaks.
## Reviewer pass
Independent reviewer ran once. 2 REQUEST_CHANGES + 2 INFO addressed:
1. ToolHandlerLike was defined but never enforced —
ToolHandler now `implements ToolHandlerLike`. Eliminates the
`(this as unknown as Record<...>)` cast in execute(); dispatch
is fully compile-time-checked.
2. No end-to-end dispatch test — added one (see Tests above).
3. MCPServer.handleToolsCall used a linear `tools.find()` scan
while execute() used Map lookup — switched to getToolModule()
for parity.
4. Removed redundant .slice() in registry.ts (map() already
returns a fresh array).
## Backward compat
src/mcp/tools.ts still re-exports ToolDefinition, ToolResult, the
mutable `tools[]` array, ToolHandler, and getExploreBudget. Every
existing consumer (`import { ToolDefinition, ToolResult, tools,
ToolHandler } from './tools'`) keeps working unchanged.
## Affected open PRs
- colbymchenry#110 (review-context): rebases to 1 new file in tools/ + 2
lines in registry.ts + 1 method on ToolHandler + 1 line in
HandlerKey.
- colbymchenry#112 (centrality+churn): same shape for the codegraph_hotspots
tool.
- colbymchenry#114 (config-refs): same shape for codegraph_config.
- colbymchenry#115 (sql-refs): same shape for codegraph_sql.
Each goes from 4-way conflict (tools[] + case + handler + helpers)
down to 1-way conflict (HandlerKey + handler method on ToolHandler,
both in tools.ts).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
5 tasks
Today every PR adding a schema migration claims `CURRENT_SCHEMA_VERSION = next` AND adds an array entry to `migrations: Migration[]` in src/db/migrations.ts. Two PRs both claiming the same version resolve as: "second PR's v4 silently no-ops on existing DBs" — a real silent-data-loss bug class (PR colbymchenry#113's reviewer caught one). After this refactor: Adding a new schema migration: 1. Pick the next free 3-digit prefix (`git ls-files 'src/db/migrations/[0-9]*.ts'` shows what's taken). 2. Create `src/db/migrations/<NNN>-<short-name>.ts` exporting a `MIGRATION: MigrationModule` (description + up). 3. Add one import line and one entry to `src/db/migrations/index.ts`'s REGISTERED_MODULES array. Two PRs both creating `004-foo.ts` collide on the FILESYSTEM — the maintainer sees it instantly. No more silent skipped migrations. ## What's new - `src/db/migrations/types.ts` — `MigrationModule { description, up }` and `Migration extends MigrationModule { version }`. - `src/db/migrations/002-project-metadata.ts` — extracted v2 body verbatim. - `src/db/migrations/003-lower-name-index.ts` — extracted v3 body verbatim. - `src/db/migrations/index.ts` — central registry. Static-imports each migration, parses the version FROM THE FILENAME (no hand-typed version field that can drift), enforces strict `NNN-kebab-name.ts` shape, validates uniqueness/sort at module load (throws loudly on collision), exposes ALL_MIGRATIONS and CURRENT_SCHEMA_VERSION. - `src/db/migrations.ts` — refactored to a thin runner. Same exported surface (CURRENT_SCHEMA_VERSION, getCurrentVersion, runMigrations, needsMigration, getPendingMigrations, getMigrationHistory, Migration type) — every existing import keeps working unchanged. - `__tests__/migrations-registry.test.ts` — 8 invariant tests: registry non-empty, versions unique + strictly ascending, CURRENT_SCHEMA_VERSION matches max, every file matches the strict NNN-kebab-name pattern, no orphan files, no phantom registrations. ## Reviewer pass Independent reviewer ran once. 3 REQUEST_CHANGES + 1 INFO addressed: 1. Hand-typed `version` field in REGISTERED_MODULES could drift from filename. **Fixed**: removed the version field; registry now parses version from filename via FILENAME_PATTERN regex inside validateRegistered. 2. Filename-pattern test was lenient (allowed 4-digit or 1-digit prefixes). **Fixed**: new "every migration file matches the strict NNN-kebab-name.ts pattern" test catches malformed filenames as orphan-detection-bypassing offenders. 3. `getPendingMigrations` returned `readonly Migration[]`, breaking callers that typed the result as `Migration[]`. **Fixed**: returns a fresh mutable array via `.slice()`. 4. No throw-on-duplicate test for validateRegistered (module evaluation timing). Acknowledged; not added. ## Backward compat Every existing import works unchanged: - `import { CURRENT_SCHEMA_VERSION } from './migrations'` ✓ - `import { runMigrations } from './migrations'` ✓ - `import { needsMigration } from './migrations'` ✓ - `import { getMigrationHistory } from './migrations'` ✓ - `import { getPendingMigrations } from './migrations'` — returns mutable Migration[] (preserved) - `Migration` type — re-exported ## Affected open PRs Every migration-touching PR (colbymchenry#102 UNIQUE edges, colbymchenry#105 cochange, colbymchenry#108 perf db, colbymchenry#111 LLM features, my colbymchenry#112 centrality+churn, colbymchenry#113 issue-history, colbymchenry#114 config-refs, colbymchenry#115 sql-refs) currently claims migration v4 and conflicts with each other on `migrations.ts`. After this lands they each become: - 1 new file: `src/db/migrations/<NNN>-<name>.ts` - 2 lines in registry.ts (import + array entry) Conflict shape changes from "next free version + array entry + CURRENT_SCHEMA_VERSION bump in one file" (4-way conflict) to "1 new file" + 2-line registry edit. If two PRs target the same NNN, the filesystem collision surfaces immediately — no silent skipped migrations. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Today every PR adding a derived-signal pass (centrality, churn,
issue-history, config-refs, sql-refs, cochange) edits the same
3 spots in src/index.ts:
1. New imports at the top
2. New private method on `CodeGraph` (e.g. runDerivedSignals,
runIssueHistoryPass, runConfigRefsPass, runSqlRefsPass)
3. New call site in `indexAll` AFTER resolution
4. New call site in `sync` AFTER resolution
5 PRs collide on every one of those.
After this refactor:
Adding a new derived-signal pass:
1. Create `src/index-hooks/<name>.ts` exporting a
`HOOK: IndexHook` constant with `afterIndexAll` and/or
`afterSync` methods.
2. Add one import + one entry to
`src/index-hooks/registry.ts`.
`CodeGraph.indexAll` and `sync` invoke the hook runner once;
adding a new pass touches only the hook file + the registry.
Zero changes to CodeGraph itself.
## What's new
- **src/index-hooks/types.ts** — `IndexHook` interface
(`afterIndexAll`, `afterSync`, both optional), `IndexHookContext`
(projectRoot + config + queries + db), and
`IndexHookOutcome` for diagnostic reporting.
- **src/index-hooks/registry.ts** — static-import list of every
registered hook (empty on main today; PRs adding hooks fill it
in), plus the `runAfterIndexAll` / `runAfterSync` runners that
iterate hooks and catch errors so one broken hook never fails
indexing.
- **src/index.ts** — `indexAll` calls `runAfterIndexAll(ctx)`
after resolution. `sync` calls `runAfterSync(ctx, result)`
after resolution. New private `buildHookContext()` helper
exposes a stable read-only context.
- **__tests__/index-hooks.test.ts** — 6 tests covering empty
registry, runner shape, and the `afterIndexAll` / `afterSync`
contracts.
## Why ship the framework on main with zero registered hooks?
The only consumers of this framework today are 5 unmerged PRs
(colbymchenry#105 cochange + my colbymchenry#112-colbymchenry#115). Landing the framework now lets
each of those PRs rebase to a 2-line change instead of 8-10
lines mutating CodeGraph adjacent-line. Without this, all 5 PRs
collide on the same indexAll/sync call sites.
The framework adds zero behavior on main (no registered hooks =
no-op runner). 380→386 tests confirm no regression.
## Affected open PRs
| PR | Today | After this lands |
|---|---|---|
| colbymchenry#105 cochange | runDerivedSignals helper + 2 call sites | 1 hook file in src/index-hooks/ + 2 lines in registry.ts |
| colbymchenry#112 centrality+churn | same shape | same shape |
| colbymchenry#113 issue-history | same shape | same shape |
| colbymchenry#114 config-refs | same shape | same shape |
| colbymchenry#115 sql-refs | same shape | same shape |
Each goes from "edit CodeGraph in 4 spots" to "drop a hook file."
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced Apr 27, 2026
…efactors
Lands centrality (PageRank) and churn (git history) as registered
IndexHooks (`afterIndexAll` + `afterSync`) instead of CodeGraph
private methods. Adds:
- Migration 004: nodes.centrality + files.{commit_count,loc,
first_seen_ts,last_touched_ts} + indexes
- src/centrality/ + src/churn/ (pure modules)
- src/index-hooks/centrality.ts + churn.ts (registered hooks)
- CodeGraph public methods: getCentrality, getTopCentralNodes,
getCentralityRank, getFileChurn, getHotspots
- codegraph_hotspots MCP tool wired through ToolModule registry
+ handleHotspots on ToolHandler
- Updated regression-guard tests (index-hooks, mcp-tool-registry)
to reflect newly registered hooks/tools
Tests: 440/440 pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
65a7687 to
8b16d8e
Compare
andreinknv
added a commit
to andreinknv/codegraph
that referenced
this pull request
Apr 27, 2026
…efactors
Lands centrality (PageRank) and churn (git history) as registered
IndexHooks (`afterIndexAll` + `afterSync`) instead of CodeGraph
private methods. Adds:
- Migration 004: nodes.centrality + files.{commit_count,loc,
first_seen_ts,last_touched_ts} + indexes
- src/centrality/ + src/churn/ (pure modules)
- src/index-hooks/centrality.ts + churn.ts (registered hooks)
- CodeGraph public methods: getCentrality, getTopCentralNodes,
getCentralityRank, getFileChurn, getHotspots
- codegraph_hotspots MCP tool wired through ToolModule registry
+ handleHotspots on ToolHandler
- Updated regression-guard tests (index-hooks, mcp-tool-registry)
to reflect newly registered hooks/tools
Tests: 440/440 pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor
Author
|
Heads-up: this branch has been force-pushed. This PR's contents have been rebased to use the per-thing file layout established by refactor PRs #116-#119. Functionality is identical to the original; the internal file shape now matches the post-refactor architecture so this PR can be merged after the refactors with zero conflicts. The current diff against main looks larger than the feature alone because it transitively includes the 4 refactor commits + any prerequisite PRs. That diff auto-shrinks as the prerequisites merge in the order documented in #120. See #120 (comment) for the full rebase summary. |
This was referenced Apr 27, 2026
andreinknv
added a commit
to andreinknv/codegraph
that referenced
this pull request
Apr 28, 2026
The MCP `initialize` response can include an `instructions` field that clients (Claude Code, Cursor, opencode, LangChain, OpenAI Agent SDK, etc.) surface in the agent's system prompt automatically. Today codegraph emits an empty initialize response — agents only see individual tool descriptions, no overall guidance on how to compose them. This adds the missing playbook: - **Tool selection by intent** — quick map from "what is X" / "how does X work" / "what would changing X break" to the right tool. - **Common chains** — onboarding (context first), PR review (review_context), refactor planning (search → callers → impact), debugging a regression. - **Tier discipline** — start at the cheap deterministic tier (search, context, callers, callees, impact, node, explore, files, status), escalate to conditional tools only when their data exists, and only reach for LLM-mediated tools when the cheap path doesn't suffice. - **Agent-bridge tier** — explicit recipe for projects without a local LLM where the agent itself summarizes via codegraph_pending_summaries + codegraph_save_summaries. - **Anti-patterns** — don't grep when search exists, don't chain search+node when context covers it, don't query the index immediately after a write. Lives in src/mcp/server-instructions.ts so it's easy to update without touching the JSON-RPC dispatch in src/mcp/index.ts. Single-file, no schema changes, no migrations, no test changes needed. References tools that exist on `main` today; doesn't presume any of the in-flight feature PRs (colbymchenry#110, colbymchenry#112-115, colbymchenry#111) have landed. After those merge, the relevant sections of this guidance start applying without needing a follow-up edit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv
added a commit
to andreinknv/codegraph
that referenced
this pull request
Apr 28, 2026
…om a diff Adds a new MCP tool that takes a unified diff and returns structured review context for an LLM-driven PR reviewer. Codegraph becomes a substrate for Greptile/CodeRabbit-style products without itself doing the synthesis. ## What it returns Per changed file: - status (added / modified / deleted / renamed) - hunks (line ranges) - affected symbols (line-range overlap with hunks) - tests covering the file (via tests-edges; graceful no-op if absent) Per affected symbol: - signature, docstring - top-N callers, top-N callees - impact-radius node count ## Components - src/review/diff-parser.ts: pure unified-diff parser - src/review/index.ts: buildReviewContext + co-change warnings - src/index.ts: CodeGraph.buildReviewContext public API - src/mcp/tools/review-context.ts: ToolModule (post-colbymchenry#117 form) - src/mcp/tools.ts: handleReviewContext + serializeReviewContextWithinCap (progressive-trim JSON serializer that keeps output <= MAX_OUTPUT_LENGTH while preserving JSON parseability) - __tests__/review-context.test.ts: 25 tests Output is JSON; the LLM consumer does the synthesis. This is the post-colbymchenry#117 (per-tool MCP registry) form — the original PR's case-switch dispatch is replaced by a ToolModule entry plus a 'handleReviewContext' HandlerKey. Tests that exercise co-change warnings include a graceful inline ALTER TABLE for commit_count, since that column lands with colbymchenry#112's centrality+churn migration. After colbymchenry#112 lands the ALTER becomes a no-op (column already exists).
andreinknv
added a commit
to andreinknv/codegraph
that referenced
this pull request
Apr 28, 2026
…om a diff Adds a new MCP tool that takes a unified diff and returns structured review context for an LLM-driven PR reviewer. Codegraph becomes a substrate for Greptile/CodeRabbit-style products without itself doing the synthesis. Per changed file: - status (added / modified / deleted / renamed) - hunks (line ranges) - affected symbols (line-range overlap with hunks) - tests covering the file (via tests-edges; graceful no-op if absent) Per affected symbol: - signature, docstring - top-N callers, top-N callees - impact-radius node count - src/review/diff-parser.ts: pure unified-diff parser - src/review/index.ts: buildReviewContext + co-change warnings - src/index.ts: CodeGraph.buildReviewContext public API - src/mcp/tools/review-context.ts: ToolModule (post-colbymchenry#117 form) - src/mcp/tools.ts: handleReviewContext + serializeReviewContextWithinCap (progressive-trim JSON serializer that keeps output <= MAX_OUTPUT_LENGTH while preserving JSON parseability) - __tests__/review-context.test.ts: 25 tests Output is JSON; the LLM consumer does the synthesis. This is the post-colbymchenry#117 (per-tool MCP registry) form — the original PR's case-switch dispatch is replaced by a ToolModule entry plus a 'handleReviewContext' HandlerKey. Tests that exercise co-change warnings include a graceful inline ALTER TABLE for commit_count, since that column lands with colbymchenry#112's centrality+churn migration. After colbymchenry#112 lands the ALTER becomes a no-op (column already exists).
andreinknv
added a commit
to andreinknv/codegraph
that referenced
this pull request
Apr 29, 2026
Adds Steps K-O to walk the new PRs in dependency order: K: bug-fix wave (clean): colbymchenry#128, colbymchenry#129 L: resolution + search: colbymchenry#130 (resolve), colbymchenry#131 (resolve) M: extraction edges: colbymchenry#134 (resolve) N: biomarker stack: colbymchenry#132, colbymchenry#133 (both resolve, on top of colbymchenry#125) O: search advanced: colbymchenry#135 (resolve, on top of colbymchenry#131) Also flips colbymchenry#125 from merge_clean to merge_resolve - it now hits a queries.ts conflict after the Phase-4 stack lands (colbymchenry#111/colbymchenry#112/colbymchenry#123/colbymchenry#124 all extend the same QueryBuilder surface, so colbymchenry#125's biomarker columns no longer apply cleanly without a resolution). Validated end-to-end against colbymchenry/main HEAD: script ran clean through all 43 PRs, npm run build succeeded, full test suite reports 877/877 passing (was 829 before this wave: +48 from new tests added by the new PRs plus the reviewer-driven follow-ups).
This was referenced May 6, 2026
Open
Open
Open
This was referenced May 6, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds two graph-derived signals — PageRank centrality on nodes and per-file churn mined from git — and combines them into a
codegraph_hotspotsMCP tool that ranks files byrisk = centrality × churn. Empirically validated before implementation: top hotspot on this repo's own history issrc/extraction/tree-sitter.ts(47 commits × 2347 LOC), exactly the file PR #77 split.Why both signals together: centrality alone surfaces architecturally-important code; churn alone surfaces volatile code. The intersection — central code that changes often — is where bugs accumulate, and that's the agent-facing query this PR ships.
What's added
src/centrality/index.ts): PageRank overcalls + referencesedges (excludingcontainsso lexical hierarchy doesn't dominate). Pure compute, dangling-mass redistribution, Float64Array. Sub-100 ms on this repo (1,688 nodes / 4,319 edges).src/churn/index.ts): per-file commit count, first/last touched timestamps, LOC. Mined fromgit log -z. Incremental vialast_mined_churn_headwithneedsFullRescanrecovery for force-push / gc.codegraph_hotspotsMCP tool: ranked file list,sortBy= risk | centrality | churn, withminCommits/minCentralityfilters.codegraph_nodeannotation: centrality + 1-based rank when available.enableCentrality/enableChurn(default true) for opt-out on shallow clones / non-git checkouts.Bench
Both passes are best-effort: any failure is logged + swallowed so indexing/sync still returns success. The data is advisory.
Sample output (from
node dist/bin/codegraph.js index . && cg.getHotspots())Side-finding from PageRank: ranks 15 / 23 in the raw output (
trim,run) are resolver false-positives — overly-generic short names being conflated across files. Useful diagnostic byproduct, not a goal of this PR.Tests
408 pass total, 28 new:
centrality.test.ts(10) — empty / isolated / sinks / star / cycle / mass-conservation / disconnected components / unknown-node edges / damping sentinelchurn.test.ts(13) — non-git / commit counts / timestamps / MAX_FILES_PER_COMMIT skip / incremental mining / unreachable sha / no-op / unicode + spaces in paths / LOC newline edge casesderived-signals.test.ts(5) — end-to-end pipeline hub/leaf / no-git fallback /enableCentrality:false/ sync incremental / force-push recoveryReviewer pass
Independent reviewer agent ran once. Two REQUEST_CHANGES + three INFO items addressed:
minCentralitytocodegraph_hotspotsschema (capability gap vs. underlying API).needsFullRescanrecovery path (only untested branch).upsertFiledeliberately omits churn columns fromON CONFLICT DO UPDATE.?? ''belt-and-braces guard at the recovery path.Test plan
npx tsc --noEmitcleannpm test— 408/408 passingcodegraph index .→ centrality + churn populated, hotspots match spike-validated outputCoexistence note
#105 (cochange) targets the same migration v4 with its own
commit_countcolumn add. Both PRs use idempotentPRAGMA table_infoguards on the column add, so whichever lands first does not block the other.🤖 Generated with Claude Code