feat(mcp): expand MCP server with learn, status, session_list tools (#717) by magicpro97 · Pull Request #729 · magicpro97/copilot-session-knowledge

magicpro97 · 2026-05-30T08:59:33Z

Summary

Expands MCP server with write tools: learn, status, session_list.

Closes #717

Changes

mcp-server.py: Added learn, status, session_list MCP tools
test_fixes.py: 35 I717 tests (now correctly placed after sys.exit fix)

Fixes

Ruff CI errors fixed (rebased on main b0352a5)
I717 tests were dead code (after sys.exit) — fixed (original review by review-pr725)

Review

Original Opus review completed: dead code bug found and fixed
All 35 I717 tests now execute and pass ✅

Copilot

Pull request overview

Expands the stdio MCP server with three new tools — learn, status, and session_list — turning mcp-server.py from a read-only surface into a read/write knowledge surface, and adds an I717 test block in test_fixes.py exercising both source-level expectations and an end-to-end subprocess round-trip against a temporary $HOME and knowledge.db.

Changes:

Adds VALID_LEARN_CATEGORIES, three new TOOLS entries, dispatch wiring, and three handlers: _run_learn shells out to learn.py and parses an entry id from stdout; _run_status reports session/entry counts plus watcher liveness from .watcher.lock; _run_session_list returns recent sessions via parameterized SQL in read-only mode.
Adds ~35 I717-* tests in test_fixes.py covering source markers, dispatch wiring, SQL safety, and a JSON-RPC subprocess round-trip against a synthetic DB.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File	Description
mcp-server.py	Registers and implements `learn`, `status`, `session_list` MCP tools and dispatch entries.
test_fixes.py	Adds I717 source-check and integration tests for the new MCP write/read tools.

+    {
+        "name": "learn",
+        "description": "Write a knowledge entry (mistake, pattern, feature, discovery, etc.) to the local knowledge base. Issue #717.",
+        "inputSchema": {
+            "type": "object",
+            "properties": {
+                "category": {
+                    "type": "string",
+                    "enum": sorted(VALID_LEARN_CATEGORIES),
+                    "description": "Knowledge category.",
+                },
+                "title": {"type": "string", "description": "Short title for the knowledge entry."},
+                "description": {"type": "string", "description": "Content / body of the knowledge entry."},
+                "tags": {
+                    "type": "string",
+                    "description": "Comma-separated tags (optional).",
+                },
+            },
+            "required": ["category", "title", "description"],
+            "additionalProperties": False,
+        },
+    },
+    {
+        "name": "status",
+        "description": "Return a JSON health snapshot: session count, entry count, watcher status. Issue #717.",
+        "inputSchema": {
+            "type": "object",
+            "properties": {},
+            "required": [],
+            "additionalProperties": False,
+        },
+    },
+    {
+        "name": "session_list",
+        "description": "Return the most recent sessions from the local knowledge base. Issue #717.",
+        "inputSchema": {
+            "type": "object",
+            "properties": {
+                "limit": {
+                    "type": "integer",
+                    "minimum": 1,
+                    "maximum": 100,
+                    "description": "Maximum sessions to return (default 20).",
+                },
+            },
+            "required": [],
+            "additionalProperties": False,
+        },
+    },


+    combined = result.stdout + result.stderr
+    entry_id = None
+    m = re.search(r"#(\d+)", combined)
+    if m:
+        entry_id = int(m.group(1))
+
+    body = {"status": "ok", "message": "Entry recorded", "id": entry_id}


+def _run_learn(arguments: dict[str, Any]) -> dict[str, Any]:
+    """Write a knowledge entry by calling learn.py as a subprocess."""
+    category = _require_string(arguments, "category")
+    if category not in VALID_LEARN_CATEGORIES:
+        raise JsonRpcError(
+            JSONRPC_INVALID_PARAMS,
+            f"'category' must be one of: {', '.join(sorted(VALID_LEARN_CATEGORIES))}",
+        )
+    title = _require_string(arguments, "title")
+    description = _require_string(arguments, "description")
+    tags = _optional_string(arguments, "tags", max_length=500)
+
+    learn_py = TOOLS_DIR / "learn.py"
+    if not learn_py.exists():
+        raise JsonRpcError(JSONRPC_INTERNAL_ERROR, "learn.py not found")
+
+    flag = f"--{category}"
+    cmd = [sys.executable, str(learn_py), flag, title, description]
+    if tags:
+        cmd += ["--tags", tags]
+
+    try:
+        result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
+    except subprocess.TimeoutExpired as exc:
+        raise JsonRpcError(JSONRPC_INTERNAL_ERROR, "learn.py timed out") from exc
+    except Exception as exc:
+        raise JsonRpcError(JSONRPC_INTERNAL_ERROR, f"learn.py subprocess error: {exc}") from exc
+
+    if result.returncode != 0:
+        msg = result.stderr.strip() or result.stdout.strip() or "learn.py failed"
+        raise JsonRpcError(JSONRPC_INTERNAL_ERROR, msg)


magicpro97

Code Review — PR #729 (feat(mcp): expand MCP server with learn, status, session_list tools)

Three real bugs found. Style/formatting issues are not flagged.

🔴 Issue 1 — `_run_learn`: `title` and `description` have no length cap (DoS / subprocess arg overflow)

File: mcp-server.py, in _run_learn

title = _require_string(arguments, "title")         # ← no max_length
description = _require_string(arguments, "description")  # ← no max_length
tags = _optional_string(arguments, "tags", max_length=500)  # tags IS capped

_require_string enforces non-empty but has no max_length parameter. A caller can pass an arbitrarily large string (e.g. 10 MB); it will be forwarded as a subprocess command-line argument:

cmd = [sys.executable, str(learn_py), flag, title, description]

This can cause: OOM in the subprocess, OS arg-length rejection (ARG_MAX ~2 MB on macOS/Linux), or a timeout (the 30 s guard fires but the damage is already done). Tags are already capped, but the higher-risk fields (body content) are not.

Fix: add explicit length validation before the subprocess call, e.g.:

if len(title) > 500:
    raise JsonRpcError(JSONRPC_INVALID_PARAMS, "'title' must be 500 characters or fewer")
if len(description) > 10_000:
    raise JsonRpcError(JSONRPC_INVALID_PARAMS, "'description' must be 10 000 characters or fewer")

🔴 Issue 2 — `_run_status` and `_run_session_list`: SQLite connection not closed on unexpected exceptions

File: mcp-server.py, in _run_status and _run_session_list

Both new functions open a DB connection but only call db.close() on the happy path. The established pattern in this file (see _run_query_memory, the finally: db.close() block) is try / finally. Neither new function follows it.

In _run_status:

try:
    db = sqlite3.connect(db_uri, uri=True)
    ...
    db.close()        # ← skipped if ProgrammingError, MemoryError, etc.
except Exception:
    pass              # ← silently swallows everything; db never closed

In _run_session_list the outer except Exception re-raises as JsonRpcError but also never closes the connection when a non-OperationalError occurs mid-query.

CPython reference counting makes this unlikely to leak in practice, but it deviates from the invariant maintained everywhere else in the file.

Fix: use try / finally (matching _run_query_memory):

db = sqlite3.connect(db_uri, uri=True)
try:
    ...
finally:
    db.close()

🟡 Issue 3 — `_run_learn`: fragile `entry_id` regex can return the wrong ID

File: mcp-server.py, in _run_learn

combined = result.stdout + result.stderr
m = re.search(r"#(\d+)", combined)

re.search returns the first match of #N anywhere in stdout+stderr. If learn.py emits any other #N before the entry-ID line (progress counter, line reference, etc.), the returned id will silently be wrong — not null, but a misidentified integer.

Fix: anchor to the known output format of learn.py, or have it emit a machine-readable marker (e.g. ENTRY_ID=123) to parse unambiguously.

✅ What looks good

SQL injection: session_list correctly uses ? placeholder — no interpolation.
Category validation: VALID_LEARN_CATEGORIES whitelist checked server-side before subprocess call; also enforced in JSON schema.
Read-only DB: both new handlers open with ?mode=ro.
status missing-DB path: gracefully returns zeros.
No shell injection: subprocess.run uses list form (shell=False) despite user-supplied args.
30 s timeout on learn.py subprocess is appropriate.
Tests I717-15..22: real end-to-end subprocess round-trips with synthetic DB — good coverage. I717-22 validates invalid-category rejection.

magicpro97 · 2026-05-30T12:04:12Z

Fixed all reviewer issues:

Token gating: Added _check_auth(arguments) at the top of _run_learn — matches the pattern in _run_query_memory
Length caps: Added max_length=500 for title and max_length=10_000 for description via updated _require_string signature
Fragile regex: Now passes --json flag to learn.py and parses structured {"id": N} JSON output; only falls back to anchored regex Added new \w+ #(\d+) when JSON parse fails
Connection leaks in _run_status / _run_session_list: Wrapped db.close() calls in try/finally blocks

Requesting re-review. CI pending.

…717) - mcp-server.py: add learn tool (writes knowledge entry via subprocess), status tool (DB health snapshot), session_list tool (last N sessions) - Wire all 3 in _handle_tools_call() with proper input schemas - test_fixes.py: +35 tests (I717) Closes #717 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The 35 I717 test cases were placed after sys.exit() and never executed. Move the terminal exit to after the I717 block and add a proper final summary block. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…atus/session_list

…+ JSON ID parsing - Add _check_auth() call to _run_learn for COPILOT_MCP_TOKEN gating (matching _run_query_memory) - Add max_length=500 to title and max_length=10000 to description in _run_learn - Update _require_string to accept optional max_length parameter - Fix fragile #(\d+) regex: pass --json to learn.py and parse structured JSON output for entry_id; anchored fallback regex 'Added new \w+ #(\d+)' used only if JSON missing - Fix connection leak in _run_status: use try/finally around db.close() - Fix connection leak in _run_session_list: use try/finally around db.close() Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings May 30, 2026 08:59

Copilot AI reviewed May 30, 2026

View reviewed changes

magicpro97 force-pushed the feat/i717-mcp-write-rebased branch 3 times, most recently from a590699 to 8404fe0 Compare May 30, 2026 11:39

magicpro97 commented May 30, 2026

View reviewed changes

magicpro97 force-pushed the feat/i717-mcp-write-rebased branch from 5afd762 to b89fd2d Compare May 30, 2026 12:11

Linh Ngo and others added 6 commits May 30, 2026 19:16

fix(lint): ruff fix in test_fixes.py

1e9ede8

fix(format): ruff format test_fixes.py

1ec75c1

fix(tests): update mcp tool count from 3→6 after wave 8 adds learn/st…

6bfb7b4

…atus/session_list

magicpro97 force-pushed the feat/i717-mcp-write-rebased branch from b89fd2d to 2552679 Compare May 30, 2026 12:16

magicpro97 merged commit 2226bae into main May 30, 2026
34 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(mcp): expand MCP server with learn, status, session_list tools (#717)#729

feat(mcp): expand MCP server with learn, status, session_list tools (#717)#729
magicpro97 merged 6 commits into
mainfrom
feat/i717-mcp-write-rebased

magicpro97 commented May 30, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

magicpro97 left a comment

Uh oh!

magicpro97 commented May 30, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

magicpro97 commented May 30, 2026

Summary

Changes

Fixes

Review

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

magicpro97 left a comment

Choose a reason for hiding this comment

Code Review — PR #729 (feat(mcp): expand MCP server with learn, status, session_list tools)

🔴 Issue 1 — _run_learn: title and description have no length cap (DoS / subprocess arg overflow)

🔴 Issue 2 — _run_status and _run_session_list: SQLite connection not closed on unexpected exceptions

🟡 Issue 3 — _run_learn: fragile entry_id regex can return the wrong ID

✅ What looks good

Uh oh!

magicpro97 commented May 30, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

🔴 Issue 1 — `_run_learn`: `title` and `description` have no length cap (DoS / subprocess arg overflow)

🔴 Issue 2 — `_run_status` and `_run_session_list`: SQLite connection not closed on unexpected exceptions

🟡 Issue 3 — `_run_learn`: fragile `entry_id` regex can return the wrong ID