Skip to content

Improve Aurora MCP tool discovery & steering#466

Open
isiddharthsingh wants to merge 10 commits into
mainfrom
sms10221/dev-1208-mcp-interaction-is-bad
Open

Improve Aurora MCP tool discovery & steering#466
isiddharthsingh wants to merge 10 commits into
mainfrom
sms10221/dev-1208-mcp-interaction-is-bad

Conversation

@isiddharthsingh
Copy link
Copy Markdown
Contributor

@isiddharthsingh isiddharthsingh commented Jun 2, 2026

Why

External agents (Cursor, Claude Desktop/Code, Codex, Windsurf) connecting to Aurora's self-hosted MCP server consistently over-used chat_with_aurora / ask_incident for questions a cheap direct read could answer, and failed to discover obvious read-only tools (infrastructure context, service graph, RCA findings). The steering text was chat-maximalist in three layers, the high-value reads were buried in search_tools-only Tier 3, and the per-sub-agent RCA findings weren't on the MCP surface at all. Result: wrong tool choices and ~40s latency where a ~1s direct read would do.

The tiered architecture is correct (keeps the upfront surface under client tool-count caps). This keeps it and fixes the problem with (a) re-steering the prompt text toward direct reads, (b) promoting a curated set of reads to the always-visible surface, and (c) fixing the search_tools matcher so the long tail is findable.

What changed

1. Re-steered the steering text (chat-first → direct-read-first)

  • mcp_server.py instructions — routing guide naming direct-read categories; critical sentence front-loaded; phantom github_rca scrubbed.
  • chat_with_aurora docstring — dropped "Default tool for any question", added a "NOT a structured-data lookup" scope note, trimmed under the 2KB cap (was being truncated). ask_incident points at the direct readers first.
  • Tier-2 query_logs/query_metrics descriptions softened; search_tools docstring + hint made loud about the long tail.
  • prompts.py reordered to direct reads first; blast_radius_analysis repointed to the new service_impact tool.

2. Promoted 6 curated reads to first-class Tier-1 tools
get_infrastructure_context, list_services, service_impact, incident_findings, incident_finding_detail, incident_list_alerts. The 3 graph entries were removed from DISPATCH_ALLOWLIST (no double-exposure); findings are new. service_impact URL-encodes the service name itself.

3. Fixed the search_tools matcher (two bugs)

  • Tokenized the query so multi-word searches match (the old whole-string matcher returned nothing for e.g. "rca tools steps" and forced a chat fallback); ranks by match count.
  • Added a query-only fallback when a category/connector filter zeroes out a real token match (an LLM guessing category="metrics" for an LLM-usage question whose tools live under category="usage" previously got zero results and gave up).

4. Expanded connector coverage — added skill-gated read entries for Jenkins/CloudBees/Spinnaker (CI/CD), Sentry, and Grafana. Removed the phantom github_list_repos entry (/github/repos 404s).

Docs — updated website/docs/integrations/mcp.md (tier tables, routing note, per-client setup caveats) and added docs/design/mcp-agent-steering.md.

Testing

  • MCP unit tests: 54 pass (docker exec aurora-server python -m pytest tests/mcp/ -q).
  • Live MCP end-to-end against the running server: all 6 promoted tools present + returning real data; search_tools/call_tool discovery, connector-gating, and the allowlist boundary all behave correctly; the wrong-category fallback resolves the reported empty-result case.
  • Tool-selection hit-rate eval: factual prompts route to direct tools (0 chat-funnels); only open-ended prompts escalate to chat. Holds both with instructions and descriptions-only.
  • Latency: a factual lookup that took ~31s via chat is ~0.01-0.7s via the direct tool.

Note for reviewers

The CLAUDE.md New Connector Checklist and the architectural RBAC test do not apply here — these are MCP tools wrapping already-RBAC-decorated REST routes, not new connectors/routes.

Summary by CodeRabbit

  • New Features

    • Promoted several infrastructure and incident analysis tools to always-available tier.
    • Added CI/CD and monitoring integrations (Jenkins, CloudBees, Spinnaker, Sentry, Grafana).
    • Improved tool discovery ranking and graceful fallback for broader queries.
  • Documentation

    • Clarified when to use direct tools vs. assistant-driven investigations; expanded tool usage examples and setup instructions for Cursor/Claude/Cline/Codex.
    • Expanded inline guidance and tool hints to show more example tool families and callable requirements.
  • Tests

    • Expanded coverage for registry, search behavior, and tier-1 registration.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jun 2, 2026

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: b6aab60b-978d-47d5-b47e-30be0193dcde

📥 Commits

Reviewing files that changed from the base of the PR and between 264a2b2 and 92e5f76.

📒 Files selected for processing (2)
  • server/aurora_mcp/dispatch.py
  • server/tests/mcp/test_mcp_dispatch.py

Walkthrough

Promotes several read-only tools to Tier-1, replaces substring dispatch search with tokenized ranking plus fallback, removes promoted entries from the Tier-3 allowlist, updates prompts/docs to prefer direct tools over chat_with_aurora for factual lookups, and adds tests covering search and registration behavior.

Changes

MCP Discovery and Steering Refactor

Layer / File(s) Summary
Tier-1 tool implementation and helpers
server/aurora_mcp/tools_always_on.py
Adds new Tier-1 helpers and @mcp.tool() endpoints (get_infrastructure_context, list_services, service_impact, incident_findings, incident_finding_detail, incident_list_alerts) with slim/truncate helpers and URL-encoding for service_impact.
Token-based search matching and registry expansion
server/aurora_mcp/registry.py
Replaces substring matching with tokenized query matching and token-count ranking + fallback retry; removes promoted graph tools and github_list_repos from Tier-3 allowlist; extends Tier-3 with CI/CD and observability entries; updates Tier-2 descriptions.
Dispatch hints and search_tools behavior
server/aurora_mcp/dispatch.py
_do_search_tools now requests the full ranked match set (sized from the allowlist), annotates callable_now, reorders visible-first, and caps results; expands search_tools hint and docstring examples.
Prompt and instruction reframing
server/mcp_server.py, server/aurora_mcp/prompts.py, server/aurora_mcp/tools_always_on.py
Updates FastMCP instructions and incident/blast-radius prompts to prefer direct read tools for factual lookups; updates chat_with_aurora and ask_incident docstrings to clarify escalation boundaries.
User-facing docs and client setup guidance
website/docs/integrations/mcp.md
Updates MCP docs to list new Tier-1 incident/infrastructure tools, narrows Tier-2 connector guidance, clarifies Tier-3 allowlist, expands client setup (Cursor/Claude Code/Codex/Cline), and revises example workflows.
Registry and dispatch test coverage
server/tests/mcp/test_mcp_registry.py, server/tests/mcp/test_mcp_dispatch.py
Adds tests for tokenized search/ranking, empty-query fallback, allowlist invariants (promotions/removals), connector gating and enabling_skills, search-time skill gating, and Tier-1 registration behavior (URL-encoding and query-param propagation).

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • Arvo-AI/aurora#432: Promotes get_infrastructure_context from Tier-3 dispatch to Tier-1 always-on MCP tools and related wiring.
  • Arvo-AI/aurora#395: Earlier changes to search/dispatch behavior related to tokenized matching and dispatch ranking.
  • Arvo-AI/aurora#425: Related adjustments to callable_now gating and connector-status handling during dispatch/search.

Suggested reviewers

  • beng360
  • damianloch
  • OlivierTrudeau
  • Harrio-6

"🐰 I nibble through code with a curious nose,
Tools promoted high, where clear guidance grows.
Tokens aligned, searches swift and bright—
Direct reads first, then synthesize at night.
Hooray for lean answers and fewer long chats!"

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 65.96% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Improve Aurora MCP tool discovery & steering' is directly related to the main objectives of the PR, which focus on improving tool discovery (via fixed search matching and new Tier-1 promotions) and steering (reoriented to prefer direct tools over chat fallback).
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch sms10221/dev-1208-mcp-interaction-is-bad

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@isiddharthsingh isiddharthsingh marked this pull request as ready for review June 3, 2026 01:14
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@server/aurora_mcp/dispatch.py`:
- Around line 129-136: The hint text in server/aurora_mcp/dispatch.py currently
promises "traces" support even though no trace tool or dispatch exists; update
the "hint" string(s) (the dict entries with the "hint" key in dispatch.py) to
remove any mention of "traces" (and related phrasing about searching for trace
tools) so the MCP does not advertise an unsupported capability; repeat the same
removal for the other similar hint block in the file (the second "hint" entry
referenced in the review) to keep messaging consistent with registry.py's actual
tools.

In `@server/aurora_mcp/registry.py`:
- Around line 975-981: _tokenize_query currently splits only on whitespace so
punctuation-attached tokens like "deployments?", "mttr/dora", or "rca,tools"
remain joined; update _tokenize_query to use a regex-based tokenization (e.g.
re.findall for alphanumeric runs) instead of q.split(), filter out tokens
shorter than 3, and add the required import (re) so tokens delimited by
punctuation are separated correctly.

In `@server/tests/mcp/test_mcp_registry.py`:
- Line 143: The monkeypatch of registry._check_skill_connected currently uses a
lambda with an unused parameter named u which triggers Ruff ARG005; update the
lambda used in the monkeypatch (the registry._check_skill_connected replacement)
to rename the unused parameter to a leading-underscore placeholder (e.g., _u or
simply _) so the function signature remains the same but the linter no longer
flags the unused argument.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 46f4f880-1218-436b-85d8-ac3b239c39e1

📥 Commits

Reviewing files that changed from the base of the PR and between 6cbb080 and aa3cec7.

📒 Files selected for processing (8)
  • docs/design/mcp-agent-steering.md
  • server/aurora_mcp/dispatch.py
  • server/aurora_mcp/prompts.py
  • server/aurora_mcp/registry.py
  • server/aurora_mcp/tools_always_on.py
  • server/mcp_server.py
  • server/tests/mcp/test_mcp_registry.py
  • website/docs/integrations/mcp.md

Comment thread server/aurora_mcp/dispatch.py
Comment thread server/aurora_mcp/registry.py Outdated
Comment thread server/tests/mcp/test_mcp_registry.py Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
server/aurora_mcp/dispatch.py (1)

115-123: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fetch the full ranked match set before reordering by callable_now.

This truncates before visibility sorting. For broad queries, disconnected entries can fill the first limit * 2 slots, so connected tools that rank slightly lower never get surfaced even though this function is trying to put callable tools first. Pull the full ranked set, then split visible/non-visible, and only apply limit at the end.

Proposed fix
 from .registry import (
+    DISPATCH_ALLOWLIST,
     _get_cached_connector_status,
     dispatch_entry_visible,
     find_dispatch_entry,
     parse_and_cache_connector_status,
     search_dispatch_entries,
@@
     all_matches = search_dispatch_entries(
         query=query, category=category, connector=connector,
-        user_id=None, limit=limit * 2,
+        user_id=None, limit=len(DISPATCH_ALLOWLIST),
     )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/aurora_mcp/dispatch.py` around lines 115 - 123, The code currently
truncates matches before reordering by visibility (all_matches =
search_dispatch_entries(..., limit=limit * 2)), which can exclude callable
entries; change the flow in dispatch.py so you fetch the full ranked set (call
search_dispatch_entries without the premature limit or with a sufficiently
large/unbounded limit), then build annotated = [(e, dispatch_entry_visible(e,
user_id)) for e in all_matches], then form ordered by visible-first ([(e,
True)...] + [(e, False)...]) and only apply the final [:limit] slice to ordered;
adjust variable usage around all_matches, annotated, and ordered accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@server/aurora_mcp/dispatch.py`:
- Around line 115-123: The code currently truncates matches before reordering by
visibility (all_matches = search_dispatch_entries(..., limit=limit * 2)), which
can exclude callable entries; change the flow in dispatch.py so you fetch the
full ranked set (call search_dispatch_entries without the premature limit or
with a sufficiently large/unbounded limit), then build annotated = [(e,
dispatch_entry_visible(e, user_id)) for e in all_matches], then form ordered by
visible-first ([(e, True)...] + [(e, False)...]) and only apply the final
[:limit] slice to ordered; adjust variable usage around all_matches, annotated,
and ordered accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: a5efd3f1-6192-4196-a967-db40205d35a8

📥 Commits

Reviewing files that changed from the base of the PR and between aa3cec7 and 264a2b2.

📒 Files selected for processing (3)
  • server/aurora_mcp/dispatch.py
  • server/aurora_mcp/registry.py
  • server/tests/mcp/test_mcp_registry.py

@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud Bot commented Jun 3, 2026

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant