Skip to content

feat(mcp): codegraph_review_context — structured PR-review context from a diff#110

Open
andreinknv wants to merge 1 commit into
colbymchenry:mainfrom
andreinknv:feat/review-context
Open

feat(mcp): codegraph_review_context — structured PR-review context from a diff#110
andreinknv wants to merge 1 commit into
colbymchenry:mainfrom
andreinknv:feat/review-context

Conversation

@andreinknv
Copy link
Copy Markdown
Contributor

@andreinknv andreinknv commented Apr 26, 2026

🔄 Rebased onto post-refactor patterns (review-context MCP tool)
This PR has been rebased to use the per-thing file layout established by refactor PRs #116/#117/#118/#119. Behavior shipped is identical to the original; only the internal code shape changed.

What this PR now ships:

  • src/mcp/tools/review-context.tsREVIEW_CONTEXT_TOOL: ToolModule
  • src/review/index.ts + src/review/diff-parser.ts — pure diff→context builder
  • src/mcp/tools/registry.ts — 2 lines
  • src/mcp/tools/types.ts — 1 HandlerKey entry
  • src/mcp/tools.tshandleReviewContext method on ToolHandler + JSON-cap serializer
  • src/index.tsbuildReviewContext public method
  • __tests__/review-context.test.ts

What you'll see in the diff: until #116-#119 land, the diff against main includes the refactor commits as part of the branch history. After they merge, GitHub auto-shrinks this PR's diff to just the files above. Full context: #120.


Summary

Adds a new MCP tool codegraph_review_context that takes a unified diff and returns structured review context for an LLM-driven PR reviewer. Codegraph becomes a substrate for Greptile/CodeRabbit-style products without itself doing the synthesis — the LLM consumer (Claude Code, ultrareview, etc.) writes the review comments, decides severity, and posts to GitHub.

What it returns (JSON)

Per changed file:

Per affected symbol:

For deleted files:

  • brokenIncomingRefs — every distinct caller of a vanishing symbol (deduped by source name + file + line)

Co-change warnings (the genuinely novel signal):

Components

  • src/review/diff-parser.ts — pure unified-diff parser. Handles modified, added (/dev/null in ---), deleted (/dev/null in +++), renamed (rename from/to), multi-hunk, single-line hunks, C-style-quoted paths with spaces, and hunk-less rename / mode-change files mid-diff.

  • src/review/index.tsbuildReviewContext(diff, queries, traverser, options). Iterates files, maps hunks to symbols, attaches per-symbol context, surfaces co-change warnings.

  • CodeGraph.buildReviewContext(diff, options) — public API method.

  • MCP tool in src/mcp/tools.ts — JSON output runs through serializeReviewContextWithinCap which progressively trims (docstrings → signatures → caller/callee caps → drop callers/callees → drop oldest files → summary-only) so the result stays valid JSON even when it would exceed MAX_OUTPUT_LENGTH. (Naive mid-string truncation would corrupt JSON parsing.)

Files changed

File Change
src/review/diff-parser.ts (NEW) Pure unified-diff parser
src/review/index.ts (NEW) buildReviewContext + co-change warnings
src/index.ts Add CodeGraph.buildReviewContext public API
src/mcp/tools.ts New tool + handler + JSON-safe truncation
__tests__/review-context.test.ts (NEW) 25 regression tests

Example flow

# Get the diff for a PR you want to review
gh pr diff 42 > /tmp/pr-42.diff

# Then the LLM (via MCP) calls:
codegraph_review_context({ diff: <diff text> })
# → returns structured JSON; LLM writes review comments,
#   posts via gh pr comment / gh api

Test plan

  • npm test: 405/405 pass on macOS
  • npx tsc --noEmit clean
  • Independent reviewer pass before pushing — addressed four findings:
    • Mid-diff hunk-less rename / mode-change was silently dropped (request_changes → fixed; flushHunkless on every header transition, flags reset after consumption to prevent phantom emits)
    • truncateOutput sliced JSON mid-string producing invalid JSON (request_changes → tier-and-trim helper, regression test asserts output stays parseable even with 200-symbol over-cap context)
    • brokenIncomingRefs could double-list the same caller with multiple edge types (info → dedup by name|filePath|line)
    • C-style-quoted paths in diff --git "a/x" "b/y" headers were not stripped (info → regex now matches both shapes; unquote helper handles the value)

Forward-compat with sibling PRs

This PR works on a clean upstream/main install today. When PRs #105 (co-change graph) and #106 (tests-as-edges) land, the output gets richer automatically — no further code changes required.

🤖 Generated with Claude Code

…om a diff

Adds a new MCP tool that takes a unified diff and returns structured
review context for an LLM-driven PR reviewer. Codegraph becomes a
substrate for Greptile/CodeRabbit-style products without itself doing
the synthesis.

## What it returns

Per changed file:
  - status (added / modified / deleted / renamed)
  - hunks (line ranges)
  - affected symbols (line-range overlap with hunks)
  - tests covering the file (via PR colbymchenry#106 `tests` edges; graceful no-op
    if those edges aren't present)

Per affected symbol:
  - signature, docstring
  - top-N callers (PR colbymchenry#108's batch lookups make this fast)
  - top-N callees
  - impact-radius node count

For deleted files:
  - brokenIncomingRefs — every distinct caller of a vanishing symbol
    (deduped by source name + file + line)

Co-change warnings (the genuinely novel signal):
  - For each changed file, surface up to N historical co-changers
    that were NOT touched in this PR (catches "you changed schema.sql
    but didn't update migrations.ts" coupling violations)
  - Graceful degrade if PR colbymchenry#105's co_changes table isn't present

Output is JSON; the LLM consumer does the synthesis (writes review
comments, decides severity, posts to GitHub).

## Components

- src/review/diff-parser.ts — pure unified-diff parser. Handles
  modified, added (/dev/null in ---), deleted (/dev/null in +++),
  renamed (rename from/to), multi-hunk files, single-line hunks
  (no comma in @@), C-style-quoted paths with spaces, and hunk-less
  rename / mode-change files mid-diff.

- src/review/index.ts — buildReviewContext(diff, queries, traverser).
  Iterates files, maps hunks to symbols, attaches per-symbol context,
  surfaces co-change warnings.

- src/index.ts — CodeGraph.buildReviewContext(diff, options) public API.

- src/mcp/tools.ts — codegraph_review_context tool definition + handler.
  JSON output runs through serializeReviewContextWithinCap which
  progressively trims (docstrings → signatures → caller/callee caps →
  drop callers/callees → drop oldest files → summary-only) so the
  result stays valid JSON even when it would exceed MAX_OUTPUT_LENGTH.

## Files changed

| File | Change |
|---|---|
| src/review/diff-parser.ts (NEW) | Pure unified-diff parser |
| src/review/index.ts (NEW) | buildReviewContext + co-change warnings |
| src/index.ts | Add CodeGraph.buildReviewContext public API |
| src/mcp/tools.ts | New tool + handler + JSON-safe truncation |
| __tests__/review-context.test.ts (NEW) | 25 regression tests |

## Test plan

- [x] npm test: 405/405 pass on macOS
- [x] npx tsc --noEmit clean
- [x] Independent reviewer pass before pushing — addressed four findings:
  - Mid-diff hunk-less rename / mode-change was silently dropped
    (request_changes → fixed; flushHunkless on every header transition,
    flags reset after consumption to prevent phantom emits)
  - truncateOutput sliced JSON mid-string producing invalid JSON
    (request_changes → tier-and-trim helper, regression test asserts
    output stays parseable)
  - brokenIncomingRefs could double-list the same caller with multiple
    edge types (info → dedup by name|filePath|line, regression test added)
  - C-style-quoted paths in `diff --git "a/x" "b/y"` headers were not
    stripped (info → regex now matches both shapes, regression test added)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@andreinknv
Copy link
Copy Markdown
Contributor Author

Heads-up: this branch has been force-pushed. This PR's contents have been rebased to use the per-thing file layout established by refactor PRs #116-#119. Functionality is identical to the original; the internal file shape now matches the post-refactor architecture so this PR can be merged after the refactors with zero conflicts.

The current diff against main looks larger than the feature alone because it transitively includes the 4 refactor commits + any prerequisite PRs. That diff auto-shrinks as the prerequisites merge in the order documented in #120. See #120 (comment) for the full rebase summary.

andreinknv added a commit to andreinknv/codegraph that referenced this pull request Apr 28, 2026
…kers

Two new tools landed in colbymchenry#124 and colbymchenry#125 that this playbook should
route the agent to instead of falling back to "read the source":

  - codegraph_biomarkers (PR colbymchenry#125): structured static-analysis
    signals (Code Health, cyclomatic, nesting, length) so an
    agent can ask "is this function risky to change?" without
    reading the source.
  - codegraph_coverage (PR colbymchenry#124): per-symbol coverage from lcov
    so an agent can ask "is this function tested?" with a
    structured answer.

Updates:
  - "When to use which tool" map gains two entries.
  - Refactor-planning chain expanded to call both tools before
    callers/impact -- and points at the killer cross-tool query
    (high-centrality + warning-severity findings).
  - Tier table places biomarkers in tier 1 (always available
    after colbymchenry#125 lands) and coverage in tier 2 (conditional on a
    prior `codegraph coverage <lcov>` ingestion).

Both references are forward-compatible: agents that try to call
a not-yet-merged tool get a graceful "unknown tool" error, same
pattern the existing playbook already uses for colbymchenry#110, colbymchenry#111, etc.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit to andreinknv/codegraph that referenced this pull request Apr 28, 2026
The MCP `initialize` response can include an `instructions` field that
clients (Claude Code, Cursor, opencode, LangChain, OpenAI Agent SDK,
etc.) surface in the agent's system prompt automatically. Today
codegraph emits an empty initialize response — agents only see
individual tool descriptions, no overall guidance on how to compose
them.

This adds the missing playbook:

- **Tool selection by intent** — quick map from "what is X" / "how does
  X work" / "what would changing X break" to the right tool.
- **Common chains** — onboarding (context first), PR review
  (review_context), refactor planning (search → callers → impact),
  debugging a regression.
- **Tier discipline** — start at the cheap deterministic tier (search,
  context, callers, callees, impact, node, explore, files, status),
  escalate to conditional tools only when their data exists, and only
  reach for LLM-mediated tools when the cheap path doesn't suffice.
- **Agent-bridge tier** — explicit recipe for projects without a local
  LLM where the agent itself summarizes via codegraph_pending_summaries
  + codegraph_save_summaries.
- **Anti-patterns** — don't grep when search exists, don't chain
  search+node when context covers it, don't query the index immediately
  after a write.

Lives in src/mcp/server-instructions.ts so it's easy to update without
touching the JSON-RPC dispatch in src/mcp/index.ts. Single-file, no
schema changes, no migrations, no test changes needed.

References tools that exist on `main` today; doesn't presume any of the
in-flight feature PRs (colbymchenry#110, colbymchenry#112-115, colbymchenry#111) have landed. After those
merge, the relevant sections of this guidance start applying without
needing a follow-up edit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit to andreinknv/codegraph that referenced this pull request Apr 28, 2026
…kers

Two new tools landed in colbymchenry#124 and colbymchenry#125 that this playbook should
route the agent to instead of falling back to "read the source":

  - codegraph_biomarkers (PR colbymchenry#125): structured static-analysis
    signals (Code Health, cyclomatic, nesting, length) so an
    agent can ask "is this function risky to change?" without
    reading the source.
  - codegraph_coverage (PR colbymchenry#124): per-symbol coverage from lcov
    so an agent can ask "is this function tested?" with a
    structured answer.

Updates:
  - "When to use which tool" map gains two entries.
  - Refactor-planning chain expanded to call both tools before
    callers/impact -- and points at the killer cross-tool query
    (high-centrality + warning-severity findings).
  - Tier table places biomarkers in tier 1 (always available
    after colbymchenry#125 lands) and coverage in tier 2 (conditional on a
    prior `codegraph coverage <lcov>` ingestion).

Both references are forward-compatible: agents that try to call
a not-yet-merged tool get a graceful "unknown tool" error, same
pattern the existing playbook already uses for colbymchenry#110, colbymchenry#111, etc.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@andreinknv andreinknv force-pushed the feat/review-context branch 3 times, most recently from b481d10 to c7ea248 Compare April 28, 2026 17:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant