Skip to content

refactor(mcp): relocate user-facts logic out of EvolveClient#252

Merged
gaodan-fang merged 3 commits into
mainfrom
#241
May 11, 2026
Merged

refactor(mcp): relocate user-facts logic out of EvolveClient#252
gaodan-fang merged 3 commits into
mainfrom
#241

Conversation

@gaodan-fang
Copy link
Copy Markdown
Collaborator

@gaodan-fang gaodan-fang commented May 8, 2026

Summary

Relocates user-facts logic out of EvolveClient so the client stays a pure CRUD layer and the LLM module stays pure LLM, mirroring the existing save_trajectory / get_guidelines pattern already used in mcp_server.py.

After this change, all three write-tools (create_entity, store_user_facts, save_trajectory) converge on client.update_entities, and all three read-tools (get_entities, get_guidelines, retrieve_user_facts) converge on client.search_entities. This sets up the shared MCP-tool-wrapping helper from the original issue #241 ask as a trivial follow-up PR.

Changes

  • altk_evolve/llm/fact_extraction/fact_extraction.py — add categorize_facts() pure helper that groups list[RecordedEntity] by metadata.category.
  • altk_evolve/frontend/client/evolve_client.py — remove store_user_facts and retrieve_user_facts methods; drop now-unused imports (Any, ExtractedFact, extract_facts_from_messages).
  • altk_evolve/frontend/mcp/mcp_server.py — rewrite both MCP tools to orchestrate directly:
    • store_user_facts: _parse_metadata → trim-and-empty-skip → _resolve_namespaceextract_facts_from_messages → build Entity(type="fact", ...)client.update_entities (with NamespaceNotFoundException retry) → serialize.
    • retrieve_user_facts: limit <= 0 / namespace-missing early return → client.search_entities via _search_facts_with_fallback (preserves the query-less retry and default-user fallback from the old client) → categorize_facts → serialize.
  • tests/unit/test_client.py — remove the fact tests (logic no longer lives in EvolveClient).
  • tests/unit/test_mcp_server.py — update mocks to target client.update_entities / search_entities / namespace_exists; add parametrized coverage for None / "" / whitespace messages, trimming, namespace-missing, and non-positive-limit early returns.
  • .gitignore — ignore .omc/ workspace artifacts.

Scope notes

  • Deliberately deferred: the original issue refactor: extract shared persistence helper for entity writes #241 ask for a shared MCP-tool-wrapping helper (metadata parsing + NamespaceNotFoundException retry + EntityUpdate→JSON serialization across the three write-tools), and the latent retry-gap fix in store_user_facts. Both become trivial follow-ups after this refactor.
  • MCP tool signatures and JSON envelope shapes are unchanged.
  • EvolveClient.store_user_facts / retrieve_user_facts are fully removed with no back-compat shims — only in-tree callers (MCP tools and tests) imported them.

Test plan

  • uv run ruff check altk_evolve/ tests/unit/ — clean
  • uv run ruff format --check — clean
  • uv run python -m mypy on changed files — success, no issues
  • uv run pytest tests/unit/ — 271 passed, 0 failed (excluding optional psycopg / milvus backends)
  • test_store_and_retrieve_user_facts e2e test is unchanged in intent (it exercises the public MCP tool surface, which is preserved)

Refs #241

Summary by CodeRabbit

  • New Features

    • Added categorize_facts() utility function for organizing facts by category.
  • Refactor

    • Removed store_user_facts() and retrieve_user_facts() methods from EvolveClient.
    • MCP server now directly handles fact extraction and retrieval operations.

Follow the save_trajectory / get_guidelines pattern so EvolveClient
stays a pure CRUD layer and the LLM module stays pure LLM. After the
refactor, all write-tools converge on client.update_entities and all
read-tools on client.search_entities, which sets up the shared
MCP-tool-wrapping helper from the original issue as a trivial
follow-up.

- Add categorize_facts() pure helper in altk_evolve/llm/fact_extraction
- Remove store_user_facts / retrieve_user_facts from EvolveClient
- Rewrite the matching MCP tools to orchestrate directly: store via
  extract_facts_from_messages + client.update_entities, retrieve via
  client.search_entities + the existing fallback chain +
  categorize_facts. JSON envelopes and tool signatures unchanged.
- Move the fact tests from test_client.py into test_mcp_server.py,
  update mocks to target the CRUD methods, add parametrized coverage
  for None/""/whitespace messages and non-positive-limit early returns.

Refs #241
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 8, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

This PR relocates fact extraction and retrieval logic from EvolveClient to the MCP server layer, adding server-side fact categorization, namespace resolution with retry handling, and fallback search chains for user-scoped and default-user-scoped facts.

Changes

User Facts Logic Migration

Layer / File(s) Summary
Fact Categorization Contract
altk_evolve/llm/fact_extraction/fact_extraction.py
New public categorize_facts function groups RecordedEntity records by metadata category (defaulting to "misc") and returns structured result with id, content, key, and value.
Client API Removal
altk_evolve/frontend/client/evolve_client.py
Removes store_user_facts and retrieve_user_facts methods; cleans up imports of typing.Any and fact-extraction utilities.
MCP Server Helper Functions
altk_evolve/frontend/mcp/mcp_server.py
Adds fact extraction imports, introduces _empty_store_user_facts_response helper, and implements _search_facts_with_fallback for multi-step fact queries across user and default-user scopes.
MCP Server Tool Implementation
altk_evolve/frontend/mcp/mcp_server.py
Rewrites store_user_facts to extract facts server-side, build Entity(type="fact") records, resolve namespace with NamespaceNotFound retry; rewrites retrieve_user_facts to validate inputs, fetch via fallback search, and categorize results.
Test Updates
tests/unit/test_client.py, tests/unit/test_mcp_server.py
Removes two EvolveClient.store_user_facts unit tests; expands MCP server tests to verify payload structure, fact extraction behavior, blank-message short-circuiting, whitespace trimming, fallback search mechanics, and edge cases (missing namespace, non-positive limit).
Build Configuration
.gitignore
Adds .omc to ignore patterns.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • AgentToolkit/altk-evolve#238: Adds the original store_user_facts and retrieve_user_facts MCP tools that call EvolveClient methods; this PR refactors those tools to move extraction and retrieval server-side with new categorization and fallback search logic.

Suggested reviewers

  • visahak
  • illeatmyhat

Poem

🐰 The rabbit hops with glee, extracting facts with care,
From client-side to server they now float through the air,
With fallback chains and namespace retries clean,
The finest fact-organizing refactor I've seen! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 42.86% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: relocating user-facts logic from EvolveClient to mcp_server.py, which is the primary objective of this refactor.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch #241

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
altk_evolve/frontend/mcp/mcp_server.py (1)

411-427: 💤 Low value

Consider adding NamespaceNotFoundException handling for consistency.

Unlike get_entities_logic (lines 179-194) which handles NamespaceNotFoundException with a retry pattern, retrieve_user_facts relies solely on the namespace_exists pre-check. There's a potential TOCTOU race if the namespace is deleted between the check (line 416) and the search call (line 426).

Given that namespace deletion is rare, this is low-risk, but adding exception handling would align with the established pattern in this file.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@altk_evolve/frontend/mcp/mcp_server.py` around lines 411 - 427,
retrieve_user_facts currently does a namespace_exists pre-check but can still
suffer a TOCTOU race if the namespace is deleted before
_search_facts_with_fallback is called; add a try/except around the call to
_search_facts_with_fallback to catch NamespaceNotFoundException and implement
the same retry/handle logic used in get_entities_logic (i.e., on
NamespaceNotFoundException re-check/create the namespace or retry the operation
once, then return the empty JSON result if it still fails). Reference
retrieve_user_facts, _search_facts_with_fallback, namespace_exists, and
NamespaceNotFoundException when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@altk_evolve/frontend/mcp/mcp_server.py`:
- Around line 411-427: retrieve_user_facts currently does a namespace_exists
pre-check but can still suffer a TOCTOU race if the namespace is deleted before
_search_facts_with_fallback is called; add a try/except around the call to
_search_facts_with_fallback to catch NamespaceNotFoundException and implement
the same retry/handle logic used in get_entities_logic (i.e., on
NamespaceNotFoundException re-check/create the namespace or retry the operation
once, then return the empty JSON result if it still fails). Reference
retrieve_user_facts, _search_facts_with_fallback, namespace_exists, and
NamespaceNotFoundException when making the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: fd1d1b89-af79-4db3-9d26-16ebbce5a082

📥 Commits

Reviewing files that changed from the base of the PR and between d7a3f49 and 12d3785.

📒 Files selected for processing (6)
  • .gitignore
  • altk_evolve/frontend/client/evolve_client.py
  • altk_evolve/frontend/mcp/mcp_server.py
  • altk_evolve/llm/fact_extraction/fact_extraction.py
  • tests/unit/test_client.py
  • tests/unit/test_mcp_server.py
💤 Files with no reviewable changes (2)
  • tests/unit/test_client.py
  • altk_evolve/frontend/client/evolve_client.py

@visahak
Copy link
Copy Markdown
Collaborator

visahak commented May 11, 2026

Summary

This PR moves user-facts storage/retrieval logic from EvolveClient into the MCP server, adds provenance/influence audit support to the evolve-lite plugin, and expands
platform integration coverage including Codex sandbox flows.

Findings

  1. Removed public Python client user-facts API without a compatibility path (confidence: 85/100)
    • Why it matters: EvolveClient is the documented programmatic interface, and this PR deletes store_user_facts and retrieve_user_facts from that class while only
      preserving equivalent behavior as MCP tools. Existing Python consumers calling those methods will now fail with AttributeError.
    • Evidence: altk_evolve/frontend/client/evolve_client.py removes both methods, and a local probe confirms both attributes are absent from EvolveClient.
    • Example: hasattr(EvolveClient, "store_user_facts") and hasattr(EvolveClient, "retrieve_user_facts") both return False.

Testing

  • set -a; source .env; set +a; uv run pytest -m e2e -v: 197 passed, 2 skipped, 574 deselected, 1 warning
  • uv run pytest tests/unit/test_client.py tests/unit/test_mcp_server.py -v: 32 passed
  • uv run python plugin-source/build_plugins.py check: passed
  • git diff --check upstream/main...HEAD: passed
  • Failure details: none after allowing uv cache access

Copy link
Copy Markdown
Collaborator

@visahak visahak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this feature needs backwards compatibility.

@gaodan-fang
Copy link
Copy Markdown
Collaborator Author

@visahak Good call to raise it. My read: no back-compat needed here.

  • EvolveClient.store_user_facts / retrieve_user_facts were introduced in feat(mcp): user facts tools, SSE transport hardening, and warmup #238 and shipped in 1.1.0 — they've existed for a single release cycle, so downstream Python adoption is likely near-zero.
  • The only in-tree callers were mcp_server.py and two test files (all updated in this PR) — I grepped before removing.
  • The public MCP tool surface (tool names, parameters, JSON envelopes) is unchanged, which is the primary consumer pattern.
  • Every shim option has a real design cost: inline duplication in evolve_client.py, a layer-crossing import from client → frontend.mcp.mcp_server, or re-coupling fact_extraction.py to the client (the exact inversion this PR was trying to remove).

If an external caller surfaces later, re-adding a thin shim is a 10-line change — easier to design against a concrete caller than a hypothetical one.

@gaodan-fang gaodan-fang merged commit 85cf98d into main May 11, 2026
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants