
feat(mcp): emit server-level instructions in initialize response #16

Open

mschreib28 wants to merge 3 commits into main from upstream/feat/mcp-server-instructions


@mschreib28
Owner

## Summary

The MCP `initialize` response can include an `instructions` field that clients (Claude Code, Cursor, opencode, LangChain, OpenAI Agent SDK, …) surface in the agent's system prompt automatically. Today codegraph emits an empty initialize response, so agents only see individual tool descriptions, with no overall guidance on how to compose them.

This adds the missing playbook in a new `src/mcp/server-instructions.ts` module, wired into the initialize handler.

## Empirical validation (A/B test)

Tested in-session by running the same task two ways and counting tool calls.

Task: "Predict the blast radius of changing extractFromSource in the codegraph codebase."

| Approach | Tool sequence | Calls and output | Completeness |
|---|---|---|---|
| Path A: naive (no playbook) | `codegraph_search` → `codegraph_callers` → ~5 more recursive walks | ≈7 calls, fragmented output | Partial |
| Path B: playbook-guided | `codegraph_impact("extractFromSource")` | 1 call, 152 transitive symbols across 14 files | Complete at depth 2 |

The playbook's mapping "What would changing this break?" → `codegraph_impact` saved ~6 redundant tool calls and produced a more complete answer. The benefit isn't theoretical: without the meta-guidance, the natural agent instinct is to start with `codegraph_search` (the most general-sounding tool) and walk the call graph manually. Tool descriptions alone don't redirect that instinct.

## What the instructions teach the agent

- **Tool selection by intent**: quick map from "what is X" / "how does X work" / "what would changing X break" to the right tool.
- **Common chains**: onboarding (context first), PR review (review_context), refactor planning (search → callers → impact), debugging a regression.
- **Tier discipline**: start at the cheap deterministic tier (search, context, callers, callees, impact, node, explore, files, status), escalate to conditional tools only when their data exists, and reach for LLM-mediated tools only when the cheap path doesn't suffice.
- **Agent-bridge tier**: explicit recipe for projects without a local LLM, where the agent itself summarizes via `codegraph_pending_summaries` + `codegraph_save_summaries`.
- **Anti-patterns**: don't grep when search exists, don't chain search+node when context covers it, don't query the index immediately after a write.

## Why MCP-level vs CLAUDE.md

CLAUDE.md is a Claude-Code-only convention. The MCP `instructions` protocol field reaches every client. Both can coexist: the existing CLAUDE.md template still covers the Claude-Code-specific Explore-agent pattern, and this PR adds the universal playbook on top.

## Why a separate PR

Originally considered as part of colbymchenry#111 (LLM tools), but pulled out because:

- Most users won't run a local LLM, but everyone benefits from tool-selection guidance for the deterministic tools.
- The instructions reference tools that exist on `main` today; they don't presume colbymchenry#110/colbymchenry#112-colbymchenry#115/colbymchenry#111 have landed. After those merge, the relevant sections of the guidance simply start applying.

## Per-language guidance (intentionally not included)

Considered and explicitly rejected: per-language sections like "in Python, callers includes decorators." Reasons:

1. Token cost on every session for content irrelevant ~80% of the time (codegraph supports 19+ languages but typical sessions touch 1-3).
2. The tools themselves are language-agnostic; result shape differs per language but tool usage doesn't.
3. Where language matters (`codegraph_sql`, `codegraph_config`), the tool description already self-documents.
4. Project-specific patterns belong in project-local CLAUDE.md, not the universal MCP instructions.

If we ever want this, the principled implementation is dynamic per-project tailoring at initialize time (only emit the SQL section if the project has SQL nodes, etc.). Out of scope here.

## Test plan

- [x] `npx tsc --noEmit` clean
- [x] `npx vitest run` clean (no test changes; the JSON-RPC initialize response is structurally compatible)
- [x] A/B test (above): validated that the playbook reduces tool calls on a representative task

## Files changed

| File | Change |
|---|---|
| `src/mcp/server-instructions.ts` | New module (~75 lines, mostly the instructions string) |
| `src/mcp/index.ts` | 1 import + 1 line in the initialize result |

🤖 Generated with Claude Code
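
The wiring described under "Files changed" can be pictured roughly as follows. This is a minimal sketch rather than the PR's code: the `InitializeResult` shape follows the MCP initialize result (where `instructions` is an optional string), while `SERVER_INSTRUCTIONS`, `buildInitializeResult`, the version strings, and the abbreviated playbook text are placeholders chosen for illustration.

```typescript
// Abbreviated placeholder for the string exported by
// src/mcp/server-instructions.ts; the real playbook is longer.
const SERVER_INSTRUCTIONS: string = [
  "Choose a tool by intent:",
  '- "What would changing X break?" -> codegraph_impact',
  "Avoid:",
  "- grep when codegraph_search exists",
  "- chaining search + node when codegraph_context covers it",
  "- querying the index immediately after a write",
].join("\n");

// Shape of the MCP initialize result; `instructions` is optional.
interface InitializeResult {
  protocolVersion: string;
  capabilities: { tools?: Record<string, unknown> };
  serverInfo: { name: string; version: string };
  instructions?: string;
}

// Hypothetical helper standing in for the initialize handler in
// src/mcp/index.ts.
function buildInitializeResult(protocolVersion: string): InitializeResult {
  return {
    protocolVersion,
    capabilities: { tools: {} },
    serverInfo: { name: "codegraph", version: "0.0.0" }, // placeholder version
    instructions: SERVER_INSTRUCTIONS, // the single added line
  };
}

const initResult = buildInitializeResult("2024-11-05");
```

Keeping the string in its own module means the playbook text can change without touching the JSON-RPC dispatch.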


Copied from colbymchenry/codegraph#121

andreinknv and others added 3 commits April 27, 2026 17:01
…flicts

Today every PR adding an MCP tool conflicts on the same two
shared lists in src/mcp/tools.ts: the tools[] array (the
list_tools surface) and the case switch in execute(). After this
refactor:

  Adding a new MCP tool:
  1. Drop a file at src/mcp/tools/<name>.ts exporting a
     <NAME>_TOOL: ToolModule (definition + handlerKey).
  2. Add one import line and one array entry to
     src/mcp/tools/registry.ts.
  3. Implement handle<Name>(args) on ToolHandler in tools.ts and
     add the new key to HandlerKey in tools/types.ts.

Step 3 is the only remaining "shared method on a single class"
conflict surface. Extracting handler bodies into per-tool files
(making step 3 also a single-file addition) is left as a
follow-up — the cost/benefit favors landing this incremental win
now and finishing the body extraction once language and migration
refactors land.
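
The three steps above might look roughly like this when collapsed into one file. It is a sketch under assumptions: the type shapes are guesses at what `tool-types.ts` and `tools/types.ts` contain, and `codegraph_example` / `handleExample` are made-up names used only for illustration.

```typescript
// Assumed shapes; the real types live in src/mcp/tool-types.ts and
// src/mcp/tools/types.ts and may differ in detail.
interface ToolDefinition {
  name: string;
  description: string;
  inputSchema: { type: "object"; properties: Record<string, unknown> };
}
interface ToolResult {
  content: { type: "text"; text: string }[];
}

// Step 3a (tools/types.ts): add the new key to the HandlerKey union.
type HandlerKey = "handleStatus" | "handleExample";

interface ToolModule {
  definition: ToolDefinition;
  handlerKey: HandlerKey;
}

// Step 1 (tools/example.ts): definition literal plus handlerKey reference.
// `codegraph_example` is a hypothetical tool, not part of codegraph.
const EXAMPLE_TOOL: ToolModule = {
  definition: {
    name: "codegraph_example",
    description: "Illustrative tool that echoes its `symbol` argument.",
    inputSchema: { type: "object", properties: { symbol: { type: "string" } } },
  },
  handlerKey: "handleExample",
};

// Step 2 (tools/registry.ts): one import line and one array entry.
const TOOL_MODULES: ToolModule[] = [EXAMPLE_TOOL];

// Step 3b (tools.ts): implement the handler method on ToolHandler.
class ToolHandler {
  async handleExample(args: Record<string, unknown>): Promise<ToolResult> {
    return { content: [{ type: "text", text: `symbol=${String(args["symbol"])}` }] };
  }
}
```

Only step 3 touches a file that other tool PRs also edit, which is why it remains the one shared conflict surface.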

## What's new

- **src/mcp/tool-types.ts** — extracted ToolDefinition, ToolResult,
  PropertySchema, projectPathProperty into a shared module so
  per-tool files can import without circular dependency.
- **src/mcp/tools/types.ts** — ToolModule interface, HandlerKey
  string union, and ToolHandlerLike (a structural type that
  ToolHandler now `implements`, providing compile-time guarantee
  that every HandlerKey maps to a real method).
- **src/mcp/tools/<name>.ts × 9** — one file per existing tool
  (callees, callers, context, explore, files, impact, node, search,
  status). Each ~25-30 lines: import + definition literal +
  handlerKey reference.
- **src/mcp/tools/registry.ts** — static-import barrel, sorted
  alphabetically. Exports getToolModules(), getToolModule(name),
  and the derived `tools[]` array.
- **src/mcp/tools.ts** — ~200 lines deleted from the top
  (inline types + tools[] array + projectPathProperty).
  execute()'s case-switch replaced with a registry lookup +
  type-safe `this[mod.handlerKey](args)` dispatch (now compile-
  time-checked thanks to `implements ToolHandlerLike`).
  All `private async handle*` methods now public to match the
  interface. errorResult/textResult also public for the same reason.
- **src/mcp/index.ts** — MCPServer's tool-existence check switched
  from a linear `tools.find()` scan to the O(1) `getToolModule()`
  Map lookup, eliminating two parallel lookup paths.
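
As an illustration of the registry lookup and the `implements ToolHandlerLike` dispatch described above, here is a reduced sketch with two tools. The type shapes, the `MODULES` and `BY_NAME` names, and the handler bodies are assumptions made for the example; only the overall pattern comes from this commit.

```typescript
// Assumed shapes, reduced to two tools; real definitions carry more fields.
interface ToolResult {
  content: { type: "text"; text: string }[];
  isError?: boolean;
}
type Args = Record<string, unknown>;
type HandlerKey = "handleSearch" | "handleStatus";

// Mapped type: every HandlerKey must exist as a method with this signature.
type ToolHandlerLike = { [K in HandlerKey]: (args: Args) => Promise<ToolResult> };

interface ToolModule {
  definition: { name: string; description: string };
  handlerKey: HandlerKey;
}

const MODULES: ToolModule[] = [
  { definition: { name: "codegraph_search", description: "Search symbols." }, handlerKey: "handleSearch" },
  { definition: { name: "codegraph_status", description: "Index status." }, handlerKey: "handleStatus" },
];

// Built once, so name lookups are O(1) instead of a linear find().
const BY_NAME = new Map<string, ToolModule>(
  MODULES.map((m): [string, ToolModule] => [m.definition.name, m]),
);
function getToolModule(name: string): ToolModule | undefined {
  return BY_NAME.get(name);
}

class ToolHandler implements ToolHandlerLike {
  async execute(name: string, args: Args): Promise<ToolResult> {
    const mod = getToolModule(name);
    if (!mod) return this.errorResult(`Unknown tool: ${name}`);
    // Type-checked: dropping a handler method breaks `implements` at compile time.
    return this[mod.handlerKey](args);
  }
  async handleSearch(args: Args): Promise<ToolResult> {
    return this.textResult(`search:${String(args["query"])}`);
  }
  async handleStatus(_args: Args): Promise<ToolResult> {
    return this.textResult("status:ok");
  }
  textResult(text: string): ToolResult {
    return { content: [{ type: "text", text }] };
  }
  errorResult(text: string): ToolResult {
    return { content: [{ type: "text", text }], isError: true };
  }
}
```

Calling the method through `this[...]` keeps the `this` binding intact, which is the property the end-to-end dispatch test guards.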

## Tests

387/387 pass. **7 new tests** in __tests__/mcp-tool-registry.test.ts:
- Definitions are well-formed (name shape, description length).
- handlerKey shape (`handle<UpperCase>`).
- Every registered handlerKey resolves to a real method on
  ToolHandler.
- Exported `tools[]` exactly mirrors the registry.
- Canonical 9 main-line tools regression guard.
- execute() unknown-tool error path.
- **End-to-end dispatch smoke test**: execute('codegraph_status', {})
  reaches the real handler body (no broken `this` binding) — would
  fail loudly if the dynamic dispatch chain ever breaks.
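
The name-shape and handlerKey checks listed above can be approximated without a test runner. The sketch below is not the contents of __tests__/mcp-tool-registry.test.ts; `findRegistryProblems` is a hypothetical helper, and the `codegraph_` name pattern is an assumption based on the tool names that appear in this PR.

```typescript
interface ToolModule {
  definition: { name: string; description: string };
  handlerKey: string;
}

// Returns a human-readable problem list; an empty list means the registry
// and the handler object agree.
function findRegistryProblems(modules: ToolModule[], handler: object): string[] {
  const problems: string[] = [];
  const seen = new Set<string>();
  const methods = handler as Record<string, unknown>;
  for (const mod of modules) {
    const name = mod.definition.name;
    // Assumed naming convention: codegraph_<lowercase_with_underscores>.
    if (!/^codegraph_[a-z_]+$/.test(name)) problems.push(`${name}: unexpected name shape`);
    if (seen.has(name)) problems.push(`${name}: registered twice`);
    seen.add(name);
    // handlerKey shape: handle<UpperCase...>.
    if (!/^handle[A-Z]/.test(mod.handlerKey)) problems.push(`${name}: bad handlerKey shape`);
    // The key must resolve to a real method (own or inherited).
    if (typeof methods[mod.handlerKey] !== "function") {
      problems.push(`${name}: no method ${mod.handlerKey}`);
    }
  }
  return problems;
}

// Stand-in handler with a single method.
class FakeHandler {
  handleStatus(): string {
    return "ok";
  }
}

const cleanRun = findRegistryProblems(
  [{ definition: { name: "codegraph_status", description: "Index status." }, handlerKey: "handleStatus" }],
  new FakeHandler(),
);
const brokenRun = findRegistryProblems(
  [{ definition: { name: "codegraph_search", description: "Search." }, handlerKey: "handleSearch" }],
  new FakeHandler(),
);
```

In the real suite the compile-time `implements` check already covers the missing-method case; a runtime check like this one mainly catches registry entries whose key was typed loosely.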

## Reviewer pass

Independent reviewer ran once. 2 REQUEST_CHANGES + 2 INFO addressed:

1. ToolHandlerLike was defined but never enforced —
   ToolHandler now `implements ToolHandlerLike`. Eliminates the
   `(this as unknown as Record<...>)` cast in execute(); dispatch
   is fully compile-time-checked.
2. No end-to-end dispatch test — added one (see Tests above).
3. MCPServer.handleToolsCall used a linear `tools.find()` scan
   while execute() used Map lookup — switched to getToolModule()
   for parity.
4. Removed redundant .slice() in registry.ts (map() already
   returns a fresh array).

## Backward compat

src/mcp/tools.ts still re-exports ToolDefinition, ToolResult, the
mutable `tools[]` array, ToolHandler, and getExploreBudget. Every
existing consumer (`import { ToolDefinition, ToolResult, tools,
ToolHandler } from './tools'`) keeps working unchanged.

## Affected open PRs

- colbymchenry#110 (review-context): rebases to 1 new file in tools/ + 2
  lines in registry.ts + 1 method on ToolHandler + 1 line in
  HandlerKey.
- colbymchenry#112 (centrality+churn): same shape for the codegraph_hotspots
  tool.
- colbymchenry#114 (config-refs): same shape for codegraph_config.
- colbymchenry#115 (sql-refs): same shape for codegraph_sql.

Each goes from a 4-way conflict (tools[] + case + handler + helpers)
down to a single conflict surface (the handler method on ToolHandler
in tools.ts, plus its key in the HandlerKey union in tools/types.ts).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The MCP `initialize` response can include an `instructions` field that
clients (Claude Code, Cursor, opencode, LangChain, OpenAI Agent SDK,
etc.) surface in the agent's system prompt automatically. Today
codegraph emits an empty initialize response — agents only see
individual tool descriptions, no overall guidance on how to compose
them.

This adds the missing playbook:

- **Tool selection by intent** — quick map from "what is X" / "how does
  X work" / "what would changing X break" to the right tool.
- **Common chains** — onboarding (context first), PR review
  (review_context), refactor planning (search → callers → impact),
  debugging a regression.
- **Tier discipline** — start at the cheap deterministic tier (search,
  context, callers, callees, impact, node, explore, files, status),
  escalate to conditional tools only when their data exists, and only
  reach for LLM-mediated tools when the cheap path doesn't suffice.
- **Agent-bridge tier** — explicit recipe for projects without a local
  LLM where the agent itself summarizes via codegraph_pending_summaries
  + codegraph_save_summaries.
- **Anti-patterns** — don't grep when search exists, don't chain
  search+node when context covers it, don't query the index immediately
  after a write.

Lives in src/mcp/server-instructions.ts so it's easy to update without
touching the JSON-RPC dispatch in src/mcp/index.ts. One new file plus a
two-line wiring change in src/mcp/index.ts; no schema changes, no
migrations, no test changes needed.

References tools that exist on `main` today; doesn't presume any of the
in-flight feature PRs (colbymchenry#110, colbymchenry#112-115, colbymchenry#111) have landed. After those
merge, the relevant sections of this guidance start applying without
needing a follow-up edit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…kers

Two new tools landed in colbymchenry#124 and colbymchenry#125 that this playbook should
route the agent to instead of falling back to "read the source":

  - codegraph_biomarkers (PR colbymchenry#125): structured static-analysis
    signals (Code Health, cyclomatic, nesting, length) so an
    agent can ask "is this function risky to change?" without
    reading the source.
  - codegraph_coverage (PR colbymchenry#124): per-symbol coverage from lcov
    so an agent can ask "is this function tested?" with a
    structured answer.

Updates:
  - "When to use which tool" map gains two entries.
  - Refactor-planning chain expanded to call both tools before
    callers/impact -- and points at the killer cross-tool query
    (high-centrality + warning-severity findings).
  - Tier table places biomarkers in tier 1 (always available
    after colbymchenry#125 lands) and coverage in tier 2 (conditional on a
    prior `codegraph coverage <lcov>` ingestion).

Both references are forward-compatible: agents that try to call
a not-yet-merged tool get a graceful "unknown tool" error, same
pattern the existing playbook already uses for colbymchenry#110, colbymchenry#111, etc.
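
To make the forward-compatibility claim concrete, here is a sketch of how a client-side caller could treat the "unknown tool" error as a signal to fall back. Everything in it is illustrative: `callWithFallback`, the fake server, the exact error wording, the `symbol` argument name, and the choice of codegraph_node as the fallback are assumptions, and the call is synchronous only for brevity.

```typescript
interface ToolResult {
  content: { type: "text"; text: string }[];
  isError?: boolean;
}
type Args = Record<string, unknown>;
type CallTool = (name: string, args: Args) => ToolResult;

// Assumes the server reports a missing tool as an error result whose text
// contains "unknown tool"; the real wording may differ.
function isUnknownTool(result: ToolResult): boolean {
  return result.isError === true && result.content.some((c) => /unknown tool/i.test(c.text));
}

// Try the preferred tool; if the server does not know it, use the fallback.
function callWithFallback(
  callTool: CallTool,
  preferred: string,
  fallback: string,
  args: Args,
): { used: string; result: ToolResult } {
  const first = callTool(preferred, args);
  if (!isUnknownTool(first)) return { used: preferred, result: first };
  return { used: fallback, result: callTool(fallback, args) };
}

// Fake server on which only codegraph_node is registered, standing in for a
// build where the biomarkers tool has not merged yet.
const fakeServer: CallTool = (name, args) =>
  name === "codegraph_node"
    ? { content: [{ type: "text", text: `node:${String(args["symbol"])}` }] }
    : { content: [{ type: "text", text: `Unknown tool: ${name}` }], isError: true };

const outcome = callWithFallback(fakeServer, "codegraph_biomarkers", "codegraph_node", {
  symbol: "extractFromSource",
});
```

In practice the agent does this reasoning itself from the error text; the code only spells out the decision it is expected to make.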

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>