Improve Aurora MCP tool discovery & steering by isiddharthsingh · Pull Request #466 · Arvo-AI/aurora

isiddharthsingh · 2026-06-02T19:29:55Z

Why

External agents (Cursor, Claude Desktop/Code, Codex, Windsurf) connecting to Aurora's self-hosted MCP server consistently over-used chat_with_aurora / ask_incident for questions a cheap direct read could answer, and failed to discover obvious read-only tools (infrastructure context, service graph, RCA findings). The steering text was chat-maximalist in three layers, the high-value reads were buried in search_tools-only Tier 3, and the per-sub-agent RCA findings weren't on the MCP surface at all. Result: wrong tool choices and ~40s latency where a ~1s direct read would do.

The tiered architecture is correct (keeps the upfront surface under client tool-count caps). This keeps it and fixes the problem with (a) re-steering the prompt text toward direct reads, (b) promoting a curated set of reads to the always-visible surface, and (c) fixing the search_tools matcher so the long tail is findable.

What changed

1. Re-steered the steering text (chat-first → direct-read-first)

mcp_server.py instructions — routing guide naming direct-read categories; critical sentence front-loaded; phantom github_rca scrubbed.
chat_with_aurora docstring — dropped "Default tool for any question", added a "NOT a structured-data lookup" scope note, trimmed under the 2KB cap (was being truncated). ask_incident points at the direct readers first.
Tier-2 query_logs/query_metrics descriptions softened; search_tools docstring + hint made loud about the long tail.
prompts.py reordered to direct reads first; blast_radius_analysis repointed to the new service_impact tool.

2. Promoted 6 curated reads to first-class Tier-1 tools
get_infrastructure_context, list_services, service_impact, incident_findings, incident_finding_detail, incident_list_alerts. The 3 graph entries were removed from DISPATCH_ALLOWLIST (no double-exposure); findings are new. service_impact URL-encodes the service name itself.

3. Fixed the search_tools matcher (two bugs)

Tokenized the query so multi-word searches match (the old whole-string matcher returned nothing for e.g. "rca tools steps" and forced a chat fallback); ranks by match count.
Added a query-only fallback when a category/connector filter zeroes out a real token match (an LLM guessing category="metrics" for an LLM-usage question whose tools live under category="usage" previously got zero results and gave up).

4. Expanded connector coverage — added skill-gated read entries for Jenkins/CloudBees/Spinnaker (CI/CD), Sentry, and Grafana. Removed the phantom github_list_repos entry (/github/repos 404s).

Docs — updated website/docs/integrations/mcp.md (tier tables, routing note, per-client setup caveats) and added docs/design/mcp-agent-steering.md.

Testing

MCP unit tests: 54 pass (docker exec aurora-server python -m pytest tests/mcp/ -q).
Live MCP end-to-end against the running server: all 6 promoted tools present + returning real data; search_tools/call_tool discovery, connector-gating, and the allowlist boundary all behave correctly; the wrong-category fallback resolves the reported empty-result case.
Tool-selection hit-rate eval: factual prompts route to direct tools (0 chat-funnels); only open-ended prompts escalate to chat. Holds both with instructions and descriptions-only.
Latency: a factual lookup that took ~31s via chat is ~0.01-0.7s via the direct tool.

Note for reviewers

The CLAUDE.md New Connector Checklist and the architectural RBAC test do not apply here — these are MCP tools wrapping already-RBAC-decorated REST routes, not new connectors/routes.

Summary by CodeRabbit

New Features
- Promoted several infrastructure and incident analysis tools to always-available tier.
- Added CI/CD and monitoring integrations (Jenkins, CloudBees, Spinnaker, Sentry, Grafana).
- Improved tool discovery ranking and graceful fallback for broader queries.
Documentation
- Clarified when to use direct tools vs. assistant-driven investigations; expanded tool usage examples and setup instructions for Cursor/Claude/Cline/Codex.
- Expanded inline guidance and tool hints to show more example tool families and callable requirements.
Tests
- Expanded coverage for registry, search behavior, and tier-1 registration.

…1, drop phantom github route, add CI/CD+Sentry+Grafana entries

…urora toward direct lookups

…mpts toward direct reads

…nnector entries

…overage, and per-client setup

…owlist coupling

… zeroes out real token matches

coderabbitai · 2026-06-02T19:30:03Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: b6aab60b-978d-47d5-b47e-30be0193dcde

📥 Commits

Reviewing files that changed from the base of the PR and between 264a2b2 and 92e5f76.

📒 Files selected for processing (2)

server/aurora_mcp/dispatch.py
server/tests/mcp/test_mcp_dispatch.py

Walkthrough

Promotes several read-only tools to Tier-1, replaces substring dispatch search with tokenized ranking plus fallback, removes promoted entries from the Tier-3 allowlist, updates prompts/docs to prefer direct tools over chat_with_aurora for factual lookups, and adds tests covering search and registration behavior.

Changes

MCP Discovery and Steering Refactor

Layer / File(s)	Summary
Tier-1 tool implementation and helpers `server/aurora_mcp/tools_always_on.py`	Adds new Tier-1 helpers and `@mcp.tool()` endpoints (`get_infrastructure_context`, `list_services`, `service_impact`, `incident_findings`, `incident_finding_detail`, `incident_list_alerts`) with slim/truncate helpers and URL-encoding for `service_impact`.
Token-based search matching and registry expansion `server/aurora_mcp/registry.py`	Replaces substring matching with tokenized query matching and token-count ranking + fallback retry; removes promoted graph tools and `github_list_repos` from Tier-3 allowlist; extends Tier-3 with CI/CD and observability entries; updates Tier-2 descriptions.
Dispatch hints and search_tools behavior `server/aurora_mcp/dispatch.py`	`_do_search_tools` now requests the full ranked match set (sized from the allowlist), annotates `callable_now`, reorders visible-first, and caps results; expands `search_tools` hint and docstring examples.
Prompt and instruction reframing `server/mcp_server.py`, `server/aurora_mcp/prompts.py`, `server/aurora_mcp/tools_always_on.py`	Updates FastMCP instructions and incident/blast-radius prompts to prefer direct read tools for factual lookups; updates `chat_with_aurora` and `ask_incident` docstrings to clarify escalation boundaries.
User-facing docs and client setup guidance `website/docs/integrations/mcp.md`	Updates MCP docs to list new Tier-1 incident/infrastructure tools, narrows Tier-2 connector guidance, clarifies Tier-3 allowlist, expands client setup (Cursor/Claude Code/Codex/Cline), and revises example workflows.
Registry and dispatch test coverage `server/tests/mcp/test_mcp_registry.py`, `server/tests/mcp/test_mcp_dispatch.py`	Adds tests for tokenized search/ranking, empty-query fallback, allowlist invariants (promotions/removals), connector gating and `enabling_skills`, search-time skill gating, and Tier-1 registration behavior (URL-encoding and query-param propagation).

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Arvo-AI/aurora#432: Promotes get_infrastructure_context from Tier-3 dispatch to Tier-1 always-on MCP tools and related wiring.
Arvo-AI/aurora#395: Earlier changes to search/dispatch behavior related to tokenized matching and dispatch ranking.
Arvo-AI/aurora#425: Related adjustments to callable_now gating and connector-status handling during dispatch/search.

Suggested reviewers

beng360
damianloch
OlivierTrudeau
Harrio-6

"🐰 I nibble through code with a curious nose,
Tools promoted high, where clear guidance grows.
Tokens aligned, searches swift and bright—
Direct reads first, then synthesize at night.
Hooray for lean answers and fewer long chats!"

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 65.96% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'Improve Aurora MCP tool discovery & steering' is directly related to the main objectives of the PR, which focus on improving tool discovery (via fixed search matching and new Tier-1 promotions) and steering (reoriented to prefer direct tools over chat fallback).
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch sms10221/dev-1208-mcp-interaction-is-bad

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@server/aurora_mcp/dispatch.py`:
- Around line 129-136: The hint text in server/aurora_mcp/dispatch.py currently
promises "traces" support even though no trace tool or dispatch exists; update
the "hint" string(s) (the dict entries with the "hint" key in dispatch.py) to
remove any mention of "traces" (and related phrasing about searching for trace
tools) so the MCP does not advertise an unsupported capability; repeat the same
removal for the other similar hint block in the file (the second "hint" entry
referenced in the review) to keep messaging consistent with registry.py's actual
tools.

In `@server/aurora_mcp/registry.py`:
- Around line 975-981: _tokenize_query currently splits only on whitespace so
punctuation-attached tokens like "deployments?", "mttr/dora", or "rca,tools"
remain joined; update _tokenize_query to use a regex-based tokenization (e.g.
re.findall for alphanumeric runs) instead of q.split(), filter out tokens
shorter than 3, and add the required import (re) so tokens delimited by
punctuation are separated correctly.

In `@server/tests/mcp/test_mcp_registry.py`:
- Line 143: The monkeypatch of registry._check_skill_connected currently uses a
lambda with an unused parameter named u which triggers Ruff ARG005; update the
lambda used in the monkeypatch (the registry._check_skill_connected replacement)
to rename the unused parameter to a leading-underscore placeholder (e.g., _u or
simply _) so the function signature remains the same but the linter no longer
flags the unused argument.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 46f4f880-1218-436b-85d8-ac3b239c39e1

📥 Commits

Reviewing files that changed from the base of the PR and between 6cbb080 and aa3cec7.

📒 Files selected for processing (8)

docs/design/mcp-agent-steering.md
server/aurora_mcp/dispatch.py
server/aurora_mcp/prompts.py
server/aurora_mcp/registry.py
server/aurora_mcp/tools_always_on.py
server/mcp_server.py
server/tests/mcp/test_mcp_registry.py
website/docs/integrations/mcp.md

…tuation, fix test lint

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

server/aurora_mcp/dispatch.py (1)

115-123: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fetch the full ranked match set before reordering by callable_now.

This truncates before visibility sorting. For broad queries, disconnected entries can fill the first limit * 2 slots, so connected tools that rank slightly lower never get surfaced even though this function is trying to put callable tools first. Pull the full ranked set, then split visible/non-visible, and only apply limit at the end.

Proposed fix

 from .registry import (
+    DISPATCH_ALLOWLIST,
     _get_cached_connector_status,
     dispatch_entry_visible,
     find_dispatch_entry,
     parse_and_cache_connector_status,
     search_dispatch_entries,
@@
     all_matches = search_dispatch_entries(
         query=query, category=category, connector=connector,
-        user_id=None, limit=limit * 2,
+        user_id=None, limit=len(DISPATCH_ALLOWLIST),
     )

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/aurora_mcp/dispatch.py` around lines 115 - 123, The code currently
truncates matches before reordering by visibility (all_matches =
search_dispatch_entries(..., limit=limit * 2)), which can exclude callable
entries; change the flow in dispatch.py so you fetch the full ranked set (call
search_dispatch_entries without the premature limit or with a sufficiently
large/unbounded limit), then build annotated = [(e, dispatch_entry_visible(e,
user_id)) for e in all_matches], then form ordered by visible-first ([(e,
True)...] + [(e, False)...]) and only apply the final [:limit] slice to ordered;
adjust variable usage around all_matches, annotated, and ordered accordingly.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@server/aurora_mcp/dispatch.py`:
- Around line 115-123: The code currently truncates matches before reordering by
visibility (all_matches = search_dispatch_entries(..., limit=limit * 2)), which
can exclude callable entries; change the flow in dispatch.py so you fetch the
full ranked set (call search_dispatch_entries without the premature limit or
with a sufficiently large/unbounded limit), then build annotated = [(e,
dispatch_entry_visible(e, user_id)) for e in all_matches], then form ordered by
visible-first ([(e, True)...] + [(e, False)...]) and only apply the final
[:limit] slice to ordered; adjust variable usage around all_matches, annotated,
and ordered accordingly.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: a5efd3f1-6192-4196-a967-db40205d35a8

📥 Commits

Reviewing files that changed from the base of the PR and between aa3cec7 and 264a2b2.

📒 Files selected for processing (3)

server/aurora_mcp/dispatch.py
server/aurora_mcp/registry.py
server/tests/mcp/test_mcp_registry.py

…ools aren't dropped

sonarqubecloud · 2026-06-03T03:55:12Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

isiddharthsingh added 7 commits June 1, 2026 18:10

fix(mcp): tokenize search_tools matcher; promote graph reads to Tier-…

87218a8

…1, drop phantom github route, add CI/CD+Sentry+Grafana entries

feat(mcp): promote 6 curated reads to Tier-1 and re-steer chat_with_a…

d05d35f

…urora toward direct lookups

refactor(mcp): re-steer instructions, search_tools discovery, and pro…

5186d92

…mpts toward direct reads

test(mcp): cover promoted Tier-1 tools, tokenized matcher, and new co…

c7aa3ed

…nnector entries

docs(mcp): document direct-read steering, promoted tools, connector c…

8f399f4

…overage, and per-client setup

refactor(mcp): simplify match haystack and document promoted-tool/all…

a7184f2

…owlist coupling

fix(mcp): fall back to query-only search when a wrong category filter…

aa3cec7

… zeroes out real token matches

isiddharthsingh marked this pull request as ready for review June 3, 2026 01:14

chore(mcp): remove internal design doc

dd614b8

coderabbitai Bot reviewed Jun 3, 2026

View reviewed changes

Comment thread server/aurora_mcp/dispatch.py

Comment thread server/aurora_mcp/registry.py Outdated

Comment thread server/tests/mcp/test_mcp_registry.py Outdated

fix(mcp): drop unsupported traces from hints, tokenize search on punc…

264a2b2

…tuation, fix test lint

coderabbitai Bot reviewed Jun 3, 2026

View reviewed changes

fix(mcp): rank full match set before visibility reorder so callable t…

92e5f76

…ools aren't dropped

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve Aurora MCP tool discovery & steering#466

Improve Aurora MCP tool discovery & steering#466
isiddharthsingh wants to merge 10 commits into
mainfrom
sms10221/dev-1208-mcp-interaction-is-bad

isiddharthsingh commented Jun 2, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 2, 2026 •

edited

Loading

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

sonarqubecloud Bot commented Jun 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

isiddharthsingh commented Jun 2, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Why

What changed

Testing

Note for reviewers

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

sonarqubecloud Bot commented Jun 3, 2026

Quality Gate passed

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

isiddharthsingh commented Jun 2, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 2, 2026 •

edited

Loading