From 47355d876018aa91ab6846af5a8c7ec517a5c96d Mon Sep 17 00:00:00 2001 From: Dan Powers Date: Wed, 3 Jun 2026 12:54:11 -0500 Subject: [PATCH 1/6] feat(profile): #201 PR 2 of 5. AgentProfile.mcp_servers_resolved sibling field + doctor check + bug fixes Stream 2 of PR 2 (parallel with Stream 1 which owns agent.py wiring and spec/36). Core changes: - profile/types.py: add `mcp_servers_resolved: list[MCPServerSpec] = field(default_factory=list)` as LAST field in AgentProfile (Python dataclass default placement rule). Adds `field` to dataclass import. to_dict() always emits `mcp_servers_resolved: []` (locked Q2 -- keeps resolved MCP env secrets out of snapshot JSON on disk). from_dict() reconstructs via _mcp_spec_from_dict for both mcp_servers and mcp_servers_resolved. Adds _mcp_spec_from_dict helper as inverse of existing _mcp_spec_to_dict. - Latent bug fix (free side-effect): the pre-existing mcp_servers fallback path in AgentProfile.from_dict() returned raw dicts instead of MCPServerSpec instances. Now fixed by routing through _mcp_spec_from_dict. This silently broke attribute access on profile.mcp_servers in database-round-trip scenarios. - profile/filesystem.py: sort mcp_servers lexicographically by name after parse (locked Q1). Aligns pre-#201 path with FilesystemMCPServerRegistryBackend.list_mcp_servers() which already sorts. Adds mcp_servers_resolved=[] to AgentProfile direct construction. - profile/sqlite.py: adds comment confirming mcp_servers_resolved rides through profile_json blob via to_dict/from_dict with no schema change. Schema stays at v2. - mcp_registry/filesystem.py PR 1 bug fix: list_mcp_servers() previously caught all OSError and returned [] -- this masked transient failures (PermissionError, IsADirectoryError) from the fail-closed path PR 2 wires. Fixed to mirror load_mcp_server's ENOENT discrimination: FileNotFoundError (race after exists() check) -> return []; other OSError -> raise MCPRegistryUnavailable. Also overrides load_all_mcp_servers() with a single read-parse (replacing _default_load_all delegation) so MCPRegistryDescriptorInvalid and MCPRegistryUnavailable propagate correctly from bulk load. - doctor.py: adds check_mcp_server_registry_backend(agent_root) with PASS/WARN/FAIL ladder. Uses _redact_for_error_message from mcp_registry/__init__.py (handles URL + DSN heuristics, distinct from the inline truncation in check_tool_registry_backend). Reads backend.capabilities as a property. Detail dict includes all 5 capability fields + mcp_server_count from list_mcp_servers(). Added to SKIP enumeration and dispatch after check_corpus_backend. - docs/spec/24-agent-profile-backend.md: D1 addendum (#201 PR 2) documenting the new field, its populate-via-agent.py-only rule, snapshot serialization security rationale, and SQLite schema verification. - tests/test_profile_mcp_servers_resolved.py (NEW, 9 tests): covers field existence + last placement, to_dict security clamping, from_dict round-trips, _mcp_spec_from_dict helper, the mcp_servers fallback bug fix, snapshot round-trip, and SQLite round-trip. - tests/test_mcp_server_registry_filesystem_backend.py: updated test_load_all_mcp_servers_delegates_to_default_load_all -> now tests direct single-read- parse behavior (the delegation test was correct for PR 1 behavior; it is wrong for PR 2's locked design). Removed unused _default_load_all import and os import. CHANGELOG note: mcp_servers sort order change in pre-#201 filesystem profiles is a one-time reorder in introspection output; no functional impact (declaration order was never a contract). The PR 1 ENOENT discrimination fix should be called out in the CHANGELOG entry. Co-Authored-By: Claude Opus 4.7 --- atomic_agents/doctor.py | 119 ++++++ atomic_agents/mcp_registry/filesystem.py | 80 +++- atomic_agents/profile/filesystem.py | 7 + atomic_agents/profile/sqlite.py | 4 + atomic_agents/profile/types.py | 77 +++- docs/spec/24-agent-profile-backend.md | 10 + ..._mcp_server_registry_filesystem_backend.py | 34 +- tests/test_profile_mcp_servers_resolved.py | 343 ++++++++++++++++++ 8 files changed, 635 insertions(+), 39 deletions(-) create mode 100644 tests/test_profile_mcp_servers_resolved.py diff --git a/atomic_agents/doctor.py b/atomic_agents/doctor.py index ed569b3..32ec849 100644 --- a/atomic_agents/doctor.py +++ b/atomic_agents/doctor.py @@ -166,6 +166,7 @@ def run_doctor( "mandate-backend", "policy-backend", "corpus-backend", + "mcp-server-registry-backend", "memory-backend", "write-paths", ): @@ -226,6 +227,7 @@ def run_doctor( results.append(check_policy_backend(resolved_root, cascade=cascade)) results.append(check_persona_backend(resolved_root)) results.append(check_corpus_backend(agent_root)) + results.append(check_mcp_server_registry_backend(agent_root)) results.append(check_memory_backend(agent_root)) results.append(check_write_paths(tools_data, agent_root=agent_root)) @@ -2684,6 +2686,123 @@ def check_corpus_backend(agent_root: Path) -> CheckResult: ) +def check_mcp_server_registry_backend(agent_root: Path) -> CheckResult: + """Operator-config coherence check for the MCP server registry backend (#201 PR 2). + + Validates that ``ATOMIC_AGENTS_MCP_SERVER_REGISTRY_BACKEND`` (plus + ``ATOMIC_AGENTS_MCP_SERVER_REGISTRY_BACKEND_URL`` when non-filesystem) + is correctly configured: + + * unset / empty / ``filesystem`` → PASS (today's filesystem-default + deployment shape -- no extras needed, no URL needed). + * unknown backend_id (typo or pasted credential) → FAIL with credential- + redacted echo + list of registered ids. Uses ``_redact_for_error_message`` + from ``mcp_registry/__init__.py`` (handles ``://`` URL heuristic AND + ``user:pass@host`` DSN heuristic AND length truncation -- distinct from + the inline truncation in ``check_tool_registry_backend`` which misses + DSN-style values). + * transient probe failure → WARN (matches ``check_provider_keys`` pattern; + doctor does not crash on optional infrastructure). + + Reads ``backend.capabilities`` (property, not method). Detail dict + includes all 5 capability fields plus ``mcp_server_count`` from + ``list_mcp_servers()`` (NOT ``load_all_mcp_servers``, which materializes + resolved env values). + + Mirrors the operator-coherence layer pattern of ``check_lock_backend`` + and ``check_tool_registry_backend``. ``run_doctor`` then exercises the + backend through the agent's actual construction path. + """ + import os + + from .mcp_registry import ( + MCPRegistryError, + MCPRegistryUnavailable, + _redact_for_error_message, + get_default_mcp_server_registry_backend, + list_mcp_server_registry_backends, + ) + + raw = ( + os.environ.get("ATOMIC_AGENTS_MCP_SERVER_REGISTRY_BACKEND", "").strip().lower() + ) + backend_id = raw if raw else "filesystem" + available = list_mcp_server_registry_backends() + + if backend_id not in available: + safe_id = _redact_for_error_message(raw) + return CheckResult( + name="mcp-server-registry-backend", + status=FAIL, + message=( + f"ATOMIC_AGENTS_MCP_SERVER_REGISTRY_BACKEND={safe_id!r} is " + f"not a known backend. Available: {available}" + ), + fix_hint=( + f"Set ATOMIC_AGENTS_MCP_SERVER_REGISTRY_BACKEND to one of " + f"{available}, or unset it to use the filesystem default." + ), + detail={"safe_backend_id": safe_id, "available_backends": available}, + ) + + try: + backend = get_default_mcp_server_registry_backend(agent_root, []) + except MCPRegistryError as exc: + safe = _redact_for_error_message(str(exc)) + return CheckResult( + name="mcp-server-registry-backend", + status=FAIL, + message=f"failed to construct {backend_id!r} backend", + fix_hint=f"check the env vars: {safe}", + detail={"backend_id": backend_id}, + ) + + try: + refs = backend.list_mcp_servers() + except MCPRegistryUnavailable: + return CheckResult( + name="mcp-server-registry-backend", + status=WARN, + message=( + f"operator-pinned backend {backend_id!r} configured but " + f"list_mcp_servers() probe failed" + ), + fix_hint=( + "Verify the catalog is reachable. Doctor warns instead of " + "failing; the framework runtime will fail at first list or " + "load if the backend is truly down." + ), + detail={"backend_id": backend_id}, + ) + except Exception as exc: + return CheckResult( + name="mcp-server-registry-backend", + status=FAIL, + message=(f"backend {backend_id!r} probe raised {type(exc).__name__}"), + fix_hint="See logs for the exception details.", + detail={"backend_id": backend_id}, + ) + + caps = backend.capabilities + return CheckResult( + name="mcp-server-registry-backend", + status=PASS, + message=( + f"{backend_id} backend ok " + f"({len(refs)} MCP server{'s' if len(refs) != 1 else ''} mounted)" + ), + detail={ + "backend_id": backend.backend_id, + "supports_install": caps.supports_install, + "supports_uninstall": caps.supports_uninstall, + "supports_capability_handshake": caps.supports_capability_handshake, + "supports_audit": caps.supports_audit, + "durable": caps.durable, + "mcp_server_count": len(refs), + }, + ) + + def check_memory_backend(agent_root: Path) -> CheckResult: """FilesystemBackend resolves and stats() returns successfully.""" memory_dir = agent_root / "memory" diff --git a/atomic_agents/mcp_registry/filesystem.py b/atomic_agents/mcp_registry/filesystem.py index 4ade288..e34a324 100644 --- a/atomic_agents/mcp_registry/filesystem.py +++ b/atomic_agents/mcp_registry/filesystem.py @@ -34,7 +34,6 @@ MCPRegistryDescriptorInvalid, MCPRegistryUnavailable, MCPServerNotInRegistry, - _default_load_all, ) from .types import MCPServerRef, MCPServerRegistryCapabilities, ValidationResult @@ -170,13 +169,15 @@ def list_mcp_servers(self) -> list[MCPServerRef]: try: content = mcp_md.read_text(encoding="utf-8") - except OSError as exc: - _logger.warning( - "FilesystemMCPServerRegistryBackend: cannot read %s: %s", - mcp_md, - exc, - ) + except FileNotFoundError: + # ENOENT: race between exists() check above and read; treat as absent. return [] + except OSError as exc: + # Non-ENOENT (PermissionError, IsADirectoryError, etc.): transient + # configuration error. Mirror load_mcp_server's MCPRegistryUnavailable + # path (filesystem.py lines for load_mcp_server) for symmetry and to + # surface the failure to PR 2's fail-closed wiring in agent.py:__init__. + raise MCPRegistryUnavailable(f"cannot read {mcp_md}: {exc}") from exc try: specs = parse_mcp_md_text( @@ -315,14 +316,65 @@ def load_mcp_server(self, name: str) -> MCPServerSpec: def load_all_mcp_servers(self) -> list[MCPServerSpec]: """Return all mounted ``MCPServerSpec`` instances in bulk. - Delegates to ``_default_load_all`` per prep-pass Theme 4. This - preserves MUST 10 consistency automatically: the output is - semantically equivalent to ``[load_mcp_server(ref.name) for ref in - list_mcp_servers()]`` by construction. - - HTTP backend overrides this with a single bulk GET at PR 4. + Reads mcp.md once and parses, then resolves env vars per spec + (Decision 7). Distinct from the default ``_default_load_all`` (which + iterates ``list_mcp_servers`` then calls ``load_mcp_server`` per ref) + because that pattern masks transient and parse failures: + ``list_mcp_servers`` used to catch all OSError before the PR 2 fix. + The single read-parse here maps: + - ENOENT: empty list (correct -- "no mcp.md is not a probe failure") + - Other OSError: MCPRegistryUnavailable (transient) + - Parse error: MCPRegistryDescriptorInvalid (permanent descriptor problem) + - Env-var unresolvable: MCPServerConnectFailed per spec/19 + + PR 2 framework-level fail-closed semantic (spec/36) depends on these + distinct exceptions surfacing correctly. """ - return _default_load_all(self) + mcp_md = self._agent_root / "mcp.md" + if not mcp_md.exists(): + return [] + + try: + content = mcp_md.read_text(encoding="utf-8") + except FileNotFoundError: + return [] + except OSError as exc: + raise MCPRegistryUnavailable(f"cannot read {mcp_md}: {exc}") from exc + + try: + specs = parse_mcp_md_text( + content, + mcp_md_path=mcp_md, + read_paths=None, + resolve_env=False, + ) + except Exception as exc: + raise MCPRegistryDescriptorInvalid( + f"mcp.md at {mcp_md} could not be parsed: {exc}" + ) from exc + + materialized = [] + for spec in specs: + try: + _validate_server_name(spec.name) + except ValueError: + _logger.warning( + "FilesystemMCPServerRegistryBackend: skipping malformed " + "section name %r in mcp.md (failed charset validation)", + spec.name, + ) + continue + resolved_env = _resolve_env_vars(spec.env, spec.name) + from dataclasses import replace as _dc_replace + + materialized_spec = _dc_replace(spec, env=resolved_env) + if self._read_paths: + validate_mcp_server_args(materialized_spec, self._read_paths) + materialized.append(materialized_spec) + + # Sort lexicographically per MUST 5 (consistent with list_mcp_servers). + materialized.sort(key=lambda s: s.name) + return materialized def validate(self, name: str) -> ValidationResult: """Static check of the named server descriptor. diff --git a/atomic_agents/profile/filesystem.py b/atomic_agents/profile/filesystem.py index 9e6596d..f209aab 100644 --- a/atomic_agents/profile/filesystem.py +++ b/atomic_agents/profile/filesystem.py @@ -384,6 +384,12 @@ def load_profile(self, agent_id: str) -> AgentProfile: mcp_servers = [] else: mcp_servers = [] + # Sort lexicographically by name (locked decision Q1 from PR 2 prep). + # Aligns this path with FilesystemMCPServerRegistryBackend.list_mcp_servers() + # which already sorts (mcp_registry/filesystem.py). spec/36 MUST 5 + # applies to all backends consistently; the pre-#201 declaration order + # was an implementation detail of parse_mcp_md_text, not a contract. + mcp_servers = sorted(mcp_servers, key=lambda s: s.name) # ── persona/IDENTITY.md, SOUL.md, USER.md — raw text ── # When persona.link.md is present (link_present is True), the @@ -444,6 +450,7 @@ def load_profile(self, agent_id: str) -> AgentProfile: judges_md_raw=judges_md_raw, roster_md_raw=roster_md_raw, mcp_md_raw=mcp_md_raw, + mcp_servers_resolved=[], # populated by framework integration layer in agent.py ) # ──────────────────────────────────────────────────────────── diff --git a/atomic_agents/profile/sqlite.py b/atomic_agents/profile/sqlite.py index 5819339..fcd77f0 100644 --- a/atomic_agents/profile/sqlite.py +++ b/atomic_agents/profile/sqlite.py @@ -322,6 +322,10 @@ def load_profile(self, agent_id: str) -> AgentProfile: raise AgentProfileNotFound( f"agent {agent_id!r} profile_json is corrupt: {exc}" ) from exc + # mcp_servers_resolved rides through the profile_json blob via + # to_dict()/from_dict(). No schema change needed per locked design + # (Decision 1 of #63 PR 3 + #201 PR 2 prep pass verification). + # Schema version stays at v2 (the persona migration version from #62). return AgentProfile.from_dict(profile_dict) # ──────────────────────────────────────────────────────────── diff --git a/atomic_agents/profile/types.py b/atomic_agents/profile/types.py index 40b4b41..8ef1054 100644 --- a/atomic_agents/profile/types.py +++ b/atomic_agents/profile/types.py @@ -39,7 +39,7 @@ from __future__ import annotations -from dataclasses import dataclass, replace +from dataclasses import dataclass, field, replace from typing import Any from ..exceptions import MCPServerConnectFailed @@ -145,6 +145,21 @@ class AgentProfile: preserves ``$VAR_NAME`` env-var references verbatim; saving from this string never bakes resolved secrets into the on-disk file. + mcp_servers_resolved: List of ``MCPServerSpec`` instances + populated by the framework integration layer in + ``agent.py:__init__`` via ``dataclasses.replace()`` AFTER + ``load_profile()`` returns and BEFORE ``_load_config()`` + builds the AgentConfig. The source is + ``MCPServerRegistryBackend.load_all_mcp_servers()``, making + this field substrate-agnostic by construction. Backends MUST + NOT populate this field; they own ``mcp_servers`` only. + Default is ``[]``. + + This field is always serialized as ``[]`` in ``to_dict()`` + to keep resolved MCP env secrets out of snapshot JSON files + on disk. It re-populates from the registry backend at next + agent construction. See spec/36 Decision 9 and spec/24 D1 + addendum (#201 PR 2 of 5). """ # Required identity @@ -173,6 +188,16 @@ class AgentProfile: roster_md_raw: str mcp_md_raw: str + # Resolved MCP server specs from the registry backend (#201 PR 2 of 5). + # Populated by the framework integration layer in agent.py:__init__ via + # dataclasses.replace() AFTER load_profile() returns. Backends do NOT + # populate this field (layer separation per spec/24 Decision 7). + # The field is always serialized as [] in to_dict() (locked decision Q2 + # from prep pass) to keep resolved MCP secrets out of snapshot files on + # disk. It re-populates from the registry backend at next agent + # construction. spec/36 Decision 9; spec/24 D1 addendum. + mcp_servers_resolved: list[MCPServerSpec] = field(default_factory=list) + def to_dict(self) -> dict[str, Any]: """Serialize to a plain dict suitable for ``json.dumps`` / database column storage. @@ -199,6 +224,13 @@ def to_dict(self) -> dict[str, Any]: "judges_config": _judges_config_to_dict(self.judges_config), "roster": list(self.roster), "mcp_servers": [_mcp_spec_to_dict(s) for s in self.mcp_servers], + # Always serialize as [] (locked decision Q2 from PR 2 prep + # pass). The field is a framework-populated runtime transient; + # serializing real values would write resolved MCP env secrets + # into snapshot JSON files on disk, which contradicts spec/24 + # Decision 1's intent. The field re-populates from the + # registry backend at next agent construction. + "mcp_servers_resolved": [], "persona_identity": self.persona_identity, "persona_soul": self.persona_soul, "persona_user": self.persona_user, @@ -342,9 +374,26 @@ def from_dict(cls, d: dict[str, Any]) -> "AgentProfile": # Env-var unresolvable in this process — fall back to # the dict's structured form (best-effort). Raw text # is preserved for write-back regardless. - mcp_servers = list(d.get("mcp_servers") or []) + # Use _mcp_spec_from_dict to reconstruct MCPServerSpec + # instances (fixes the pre-existing latent bug where + # the fallback path returned raw dicts instead of + # MCPServerSpec instances). + mcp_servers = [ + _mcp_spec_from_dict(s) for s in (d.get("mcp_servers") or []) + ] else: - mcp_servers = list(d.get("mcp_servers") or []) + mcp_servers = [_mcp_spec_from_dict(s) for s in (d.get("mcp_servers") or [])] + + # mcp_servers_resolved is a runtime transient populated by + # agent.py:__init__. The to_dict() path always emits [] for + # security (locked Q2). When deserializing a dict that DOES + # contain the field (e.g. a future un-clamped snapshot or a + # direct test dict), reconstruct MCPServerSpec instances via + # _mcp_spec_from_dict for correctness. In normal operation + # this will always be an empty list. + mcp_servers_resolved = [ + _mcp_spec_from_dict(s) for s in (d.get("mcp_servers_resolved") or []) + ] return cls( name=name, @@ -364,6 +413,7 @@ def from_dict(cls, d: dict[str, Any]) -> "AgentProfile": judges_md_raw=judges_md_raw, roster_md_raw=roster_md_raw, mcp_md_raw=mcp_md_raw, + mcp_servers_resolved=mcp_servers_resolved, ) def replace(self, **changes: Any) -> "AgentProfile": @@ -564,3 +614,24 @@ def _mcp_spec_to_dict(spec: MCPServerSpec) -> dict[str, Any]: "transport": spec.transport, "description": spec.description, } + + +def _mcp_spec_from_dict(d: dict) -> MCPServerSpec: + """Reconstruct ``MCPServerSpec`` from a dict produced by ``_mcp_spec_to_dict``. + + Used by ``AgentProfile.from_dict`` for both ``mcp_servers`` and + ``mcp_servers_resolved``. Closes a pre-existing latent bug where the + ``mcp_servers`` fallback path returned raw dicts instead of + ``MCPServerSpec`` instances. + + Extra keys in ``d`` are silently ignored for forward-compatibility. + Missing optional keys fall back to ``MCPServerSpec`` field defaults. + """ + return MCPServerSpec( + name=d["name"], + command=d["command"], + args=list(d.get("args", [])), + env=dict(d.get("env", {})), + transport=d.get("transport", "stdio"), + description=d.get("description", ""), + ) diff --git a/docs/spec/24-agent-profile-backend.md b/docs/spec/24-agent-profile-backend.md index 64e5d3a..95e518d 100644 --- a/docs/spec/24-agent-profile-backend.md +++ b/docs/spec/24-agent-profile-backend.md @@ -58,6 +58,16 @@ The `AgentProfile` dataclass carries **both** the structured form (`model_config **There is no `renderers.py`.** The filesystem backend writes raw text via `_io.atomic_write`. A future canonical render layer (if a database backend needs to export back to markdown for migration) is the right place for renderers, with eyes-open about the loss surface. +**D1 addendum (#201 PR 2 of 5):** In addition to `mcp_servers` (the declared server set parsed from mcp.md, or the equivalent for the active profile backend), `AgentProfile` carries a sibling field `mcp_servers_resolved: list[MCPServerSpec]` with a default of `[]`. + +This field is populated by the framework integration layer in `agent.py:__init__` via `dataclasses.replace()` AFTER `load_profile()` returns and BEFORE `_load_config()` builds the AgentConfig. The source is `MCPServerRegistryBackend.load_all_mcp_servers()` (spec/36 Decision 9 framework invariant), making the field substrate-agnostic by construction. Backends MUST NOT populate this field themselves; they own `mcp_servers` (their substrate's parse output) only. + +For backward compatibility, `mcp_md_raw` and `mcp_servers` remain the canonical filesystem-backend write-back fields. The Decision 1 invariant (no resolved secrets in `mcp_md_raw`) stays intact. + +**Snapshot serialization (PR 2 locked):** `AgentProfile.to_dict()` always emits `"mcp_servers_resolved": []` regardless of the field's runtime value. The field is a framework-populated runtime transient that re-populates from the registry backend at next agent construction. Serializing real values would write resolved MCP `env` secrets into snapshot JSON files on disk, contradicting the security intent of Decision 1's raw-text shadow design. Audit of "what specs ran at time T" flows through agent.call() logs (which can redact appropriately), not through snapshots. + +**SQLite schema (PR 2 verified):** The new field rides through the existing `profile_json` blob column via `to_dict()`/`from_dict()`. Zero schema migration required. Schema version stays at v2 (the persona migration version from #62). Backward compat: old profiles loaded with `from_dict()` get `mcp_servers_resolved=[]` via the field's `default_factory=list` since the JSON dict lacks the key. + ### Decision 2: Skills are separate Protocol methods, not a profile field `AgentProfileBackend.list_skills(agent_id)` and `load_skill_body(agent_id, skill_name)` are separate Protocol methods. `AgentProfile` does NOT carry a `skills: list[SkillManifest]` field. diff --git a/tests/test_mcp_server_registry_filesystem_backend.py b/tests/test_mcp_server_registry_filesystem_backend.py index 3015813..8670389 100644 --- a/tests/test_mcp_server_registry_filesystem_backend.py +++ b/tests/test_mcp_server_registry_filesystem_backend.py @@ -8,7 +8,7 @@ - mcp.md parse semantics - MCPServerRef.source field population - Default LockBackend construction -- load_all_mcp_servers delegation to _default_load_all +- load_all_mcp_servers single-read-parse behavior (#201 PR 2 ENOENT fix) Per spec/36, prep finding B-F6, and the filesystem-specific test shape in test_corpus_filesystem_backend.py. @@ -16,7 +16,6 @@ from __future__ import annotations -import os from pathlib import Path from textwrap import dedent from unittest.mock import MagicMock, patch @@ -29,7 +28,6 @@ MCPRegistryUnavailable, MCPServerNotInRegistry, ) -from atomic_agents.mcp_registry.backend import _default_load_all from atomic_agents.mcp_registry.types import ValidationResult from atomic_agents.mcp import MCPServerSpec, parse_mcp_md_text from atomic_agents.exceptions import MCPServerConnectFailed @@ -368,11 +366,16 @@ def test_malformed_section_no_command_skipped_silently(tmp_path: Path) -> None: assert "no-command-server" not in names -def test_load_all_mcp_servers_delegates_to_default_load_all(tmp_path: Path) -> None: - """load_all_mcp_servers delegates to _default_load_all. +def test_load_all_mcp_servers_returns_specs_via_single_read_parse( + tmp_path: Path, +) -> None: + """load_all_mcp_servers returns fully materialized specs via single read-parse. - spec/36 MUST 10 / prep finding Theme 4. Verified by patching _default_load_all - in the backend module and asserting it was called. + PR 2 replaces the _default_load_all delegation with a direct single + read-parse so parse failures (MCPRegistryDescriptorInvalid) and transient + OSError (MCPRegistryUnavailable) propagate correctly to the fail-closed + wiring in agent.py:__init__. Verified by asserting output correctness + (behavior contract) rather than patching internals. """ agent_root = tmp_path / "delegation-agent" agent_root.mkdir() @@ -380,24 +383,11 @@ def test_load_all_mcp_servers_delegates_to_default_load_all(tmp_path: Path) -> N _write_mcp_md_from_specs(agent_root, [spec]) backend = FilesystemMCPServerRegistryBackend(agent_root, []) - call_count = {"n": 0} - original = _default_load_all - - def tracking_default_load_all(b): - call_count["n"] += 1 - return original(b) + result = backend.load_all_mcp_servers() - with patch( - "atomic_agents.mcp_registry.filesystem._default_load_all", - side_effect=tracking_default_load_all, - ): - result = backend.load_all_mcp_servers() - - assert call_count["n"] == 1, ( - "load_all_mcp_servers must delegate to _default_load_all" - ) assert len(result) == 1 assert result[0].name == "delegate-server" + assert isinstance(result[0], MCPServerSpec) # ────────────────────────────────────────────────────────────────────────────── diff --git a/tests/test_profile_mcp_servers_resolved.py b/tests/test_profile_mcp_servers_resolved.py new file mode 100644 index 0000000..b8948be --- /dev/null +++ b/tests/test_profile_mcp_servers_resolved.py @@ -0,0 +1,343 @@ +"""Tests for AgentProfile.mcp_servers_resolved sibling field (#201 PR 2 of 5). + +Covers: + 1. Field existence and default value. + 2. Field is LAST in the dataclass (pins the design constraint). + 3. to_dict() always emits [] even when field is populated (locked Q2). + 4. from_dict() round-trip when field is absent in the dict. + 5. from_dict() correctly reconstructs MCPServerSpec instances when + the dict DOES contain mcp_servers_resolved (future-compat). + 6. _mcp_spec_from_dict / _mcp_spec_to_dict round-trip equality. + 7. Pre-existing latent bug fix: mcp_servers fallback path now returns + MCPServerSpec instances, not raw dicts. + 8. Snapshot round-trip resets mcp_servers_resolved to [] (security shape). + 9. SQLite save-and-load round-trip keeps mcp_servers_resolved as []. +""" + +from __future__ import annotations + +import dataclasses +from pathlib import Path + +from atomic_agents.mcp import MCPServerSpec +from atomic_agents.profile.types import ( + AgentProfile, + _mcp_spec_from_dict, + _mcp_spec_to_dict, +) + +# --------------------------------------------------------------------------- +# Minimal AgentProfile construction helper +# --------------------------------------------------------------------------- + +_IDENTITY = "# Dan\n## Operating mode\nreactive\n" + + +def _minimal_profile(**overrides) -> AgentProfile: + """Return the smallest valid AgentProfile, with optional field overrides.""" + base = { + "name": "test-agent", + "agent_mode": "reactive", + "model_config": {}, + "tool_config": {}, + "tool_classifications": {}, + "judges_config": None, + "roster": [], + "mcp_servers": [], + "persona_identity": _IDENTITY, + "persona_soul": "", + "persona_user": "", + "goal_text": "", + "model_md_raw": "", + "tools_md_raw": "", + "judges_md_raw": None, + "roster_md_raw": "", + "mcp_md_raw": "", + } + base.update(overrides) + return AgentProfile.from_dict(base) + + +def _sample_spec() -> MCPServerSpec: + return MCPServerSpec( + name="filesystem-tools", + command="npx", + args=["-y", "@mcp/server-fs", "/data"], + env={"MCP_TOKEN": "tok123"}, + transport="stdio", + description="filesystem tools server", + ) + + +# --------------------------------------------------------------------------- +# Test 1: field exists and defaults to [] +# --------------------------------------------------------------------------- + + +def test_mcp_servers_resolved_field_exists(): + """mcp_servers_resolved is a field on AgentProfile; default is [].""" + field_names = {f.name for f in dataclasses.fields(AgentProfile)} + assert "mcp_servers_resolved" in field_names + + profile = _minimal_profile() + assert profile.mcp_servers_resolved == [] + + +# --------------------------------------------------------------------------- +# Test 2: field is LAST in the dataclass +# --------------------------------------------------------------------------- + + +def test_mcp_servers_resolved_field_is_last_in_dataclass(): + """mcp_servers_resolved MUST be the last field in AgentProfile. + + Python dataclass rule: fields with defaults cannot precede required + fields. Pinning this ensures future field additions don't silently + break the constraint or trigger TypeError at import. + """ + fields = dataclasses.fields(AgentProfile) + assert fields[-1].name == "mcp_servers_resolved" + + +# --------------------------------------------------------------------------- +# Test 3: to_dict() always emits [] even when field is populated +# --------------------------------------------------------------------------- + + +def test_to_dict_always_emits_empty_list_even_when_field_populated(): + """to_dict() emits 'mcp_servers_resolved': [] regardless of runtime value. + + Locked decision Q2 from PR 2 prep pass: the field is a runtime transient. + Serializing real values would write resolved MCP env secrets into snapshot + JSON files on disk, which contradicts spec/24 Decision 1's security intent. + """ + profile = _minimal_profile() + # Inject a non-empty value via dataclasses.replace (the framework's pattern). + spec = _sample_spec() + populated = dataclasses.replace(profile, mcp_servers_resolved=[spec]) + assert len(populated.mcp_servers_resolved) == 1 + + serialized = populated.to_dict() + assert serialized["mcp_servers_resolved"] == [] + + +# --------------------------------------------------------------------------- +# Test 4: from_dict() round-trip when key is absent +# --------------------------------------------------------------------------- + + +def test_from_dict_round_trip_with_empty_field(): + """from_dict() on a dict without mcp_servers_resolved gives [] (default).""" + d = { + "name": "agent-x", + "agent_mode": "reactive", + "model_config": {}, + "tool_config": {}, + "tool_classifications": {}, + "judges_config": None, + "roster": [], + "mcp_servers": [], + "persona_identity": _IDENTITY, + "persona_soul": "", + "persona_user": "", + "goal_text": "", + "model_md_raw": "", + "tools_md_raw": "", + "judges_md_raw": None, + "roster_md_raw": "", + "mcp_md_raw": "", + # mcp_servers_resolved deliberately absent + } + profile = AgentProfile.from_dict(d) + assert profile.mcp_servers_resolved == [] + + +# --------------------------------------------------------------------------- +# Test 5: from_dict() reconstructs MCPServerSpec when field is present in dict +# --------------------------------------------------------------------------- + + +def test_from_dict_round_trip_with_populated_field_in_dict(): + """from_dict() reconstructs MCPServerSpec instances from mcp_servers_resolved. + + In normal usage to_dict() emits [] so snapshots never carry resolved specs. + But if a dict DOES contain the field (direct test construction, or a future + un-clamped serializer path), from_dict should reconstruct correctly via + _mcp_spec_from_dict. + """ + spec = _sample_spec() + spec_dict = _mcp_spec_to_dict(spec) + + d = { + "name": "agent-y", + "agent_mode": "reactive", + "model_config": {}, + "tool_config": {}, + "tool_classifications": {}, + "judges_config": None, + "roster": [], + "mcp_servers": [], + "persona_identity": _IDENTITY, + "persona_soul": "", + "persona_user": "", + "goal_text": "", + "model_md_raw": "", + "tools_md_raw": "", + "judges_md_raw": None, + "roster_md_raw": "", + "mcp_md_raw": "", + "mcp_servers_resolved": [spec_dict], + } + profile = AgentProfile.from_dict(d) + assert len(profile.mcp_servers_resolved) == 1 + resolved = profile.mcp_servers_resolved[0] + assert isinstance(resolved, MCPServerSpec) + assert resolved.name == spec.name + assert resolved.command == spec.command + assert resolved.args == spec.args + assert resolved.env == spec.env + assert resolved.transport == spec.transport + assert resolved.description == spec.description + + +# --------------------------------------------------------------------------- +# Test 6: _mcp_spec_from_dict / _mcp_spec_to_dict round-trip +# --------------------------------------------------------------------------- + + +def test_mcp_spec_from_dict_helper_round_trip(): + """_mcp_spec_to_dict then _mcp_spec_from_dict reconstructs a equal spec.""" + original = _sample_spec() + as_dict = _mcp_spec_to_dict(original) + reconstructed = _mcp_spec_from_dict(as_dict) + + assert isinstance(reconstructed, MCPServerSpec) + assert reconstructed.name == original.name + assert reconstructed.command == original.command + assert reconstructed.args == original.args + assert reconstructed.env == original.env + assert reconstructed.transport == original.transport + assert reconstructed.description == original.description + + +# --------------------------------------------------------------------------- +# Test 7: mcp_servers fallback path returns MCPServerSpec instances, not dicts +# --------------------------------------------------------------------------- + + +def test_mcp_servers_fallback_now_returns_specs_not_dicts(): + """Pre-existing latent bug fix: mcp_servers fallback path returns MCPServerSpec. + + Before _mcp_spec_from_dict was introduced, the fallback path in + AgentProfile.from_dict (used when mcp_md_raw is absent) returned raw dicts + from d['mcp_servers']. This caused isinstance checks and attribute access + on profile.mcp_servers to fail at runtime. + """ + spec = _sample_spec() + spec_dict = _mcp_spec_to_dict(spec) + + # mcp_md_raw is empty -- forces the else branch (fallback path) + d = { + "name": "agent-z", + "agent_mode": "reactive", + "model_config": {}, + "tool_config": {}, + "tool_classifications": {}, + "judges_config": None, + "roster": [], + "mcp_servers": [spec_dict], # dict form -- must be reconstructed + "persona_identity": _IDENTITY, + "persona_soul": "", + "persona_user": "", + "goal_text": "", + "model_md_raw": "", # empty forces fallback + "tools_md_raw": "", + "judges_md_raw": None, + "roster_md_raw": "", + "mcp_md_raw": "", # empty forces fallback + } + profile = AgentProfile.from_dict(d) + assert len(profile.mcp_servers) == 1 + assert isinstance(profile.mcp_servers[0], MCPServerSpec), ( + f"Expected MCPServerSpec, got {type(profile.mcp_servers[0])!r}. " + "This is the latent bug: the fallback path returned raw dicts before " + "_mcp_spec_from_dict was introduced." + ) + assert profile.mcp_servers[0].name == spec.name + + +# --------------------------------------------------------------------------- +# Test 8: snapshot round-trip resets mcp_servers_resolved to [] +# --------------------------------------------------------------------------- + + +def test_snapshot_roundtrip_resets_mcp_servers_resolved_to_empty(tmp_path: Path): + """Filesystem backend: save/snapshot/restore resets mcp_servers_resolved to []. + + Verifies the snapshot security shape: resolved MCP secrets never persist + to disk via the to_dict() path. + """ + from atomic_agents.profile.filesystem import FilesystemAgentProfileBackend + + scope_root = tmp_path / "agents" + scope_root.mkdir() + backend = FilesystemAgentProfileBackend(scope_root) + + # Build an agent directory so load_profile works. + agent_root = scope_root / "snap-agent" + persona_dir = agent_root / "persona" + persona_dir.mkdir(parents=True) + (persona_dir / "IDENTITY.md").write_text(_IDENTITY, encoding="utf-8") + (agent_root / "model.md").write_text("", encoding="utf-8") + (agent_root / "tools.md").write_text("", encoding="utf-8") + + profile = backend.load_profile("snap-agent") + assert profile.mcp_servers_resolved == [] + + # Inject a non-empty value then save. + spec = _sample_spec() + populated = dataclasses.replace(profile, mcp_servers_resolved=[spec]) + backend.save_profile("snap-agent", populated) + + # Snapshot and restore. + snap_id = backend.snapshot("snap-agent", label="test-snap") + backend.restore("snap-agent", snap_id) + + # Reload and confirm mcp_servers_resolved is back to []. + restored = backend.load_profile("snap-agent") + assert restored.mcp_servers_resolved == [], ( + "mcp_servers_resolved must be [] after snapshot/restore -- " + "resolved secrets must not persist to disk." + ) + + +# --------------------------------------------------------------------------- +# Test 9: SQLite save-and-load keeps mcp_servers_resolved as [] +# --------------------------------------------------------------------------- + + +def test_sqlite_save_and_load_profile_includes_mcp_servers_resolved_as_empty( + tmp_path: Path, +): + """SQLite backend: save then load yields mcp_servers_resolved == []. + + Verifies that the new field rides through the profile_json blob column + correctly and that no schema change was accidentally required. + """ + from atomic_agents.profile.sqlite import SQLiteAgentProfileBackend + + db_path = tmp_path / ".profile.db" + backend = SQLiteAgentProfileBackend(db_path) + + # Construct and save a profile with a populated mcp_servers_resolved. + spec = _sample_spec() + profile = _minimal_profile() + populated = dataclasses.replace(profile, mcp_servers_resolved=[spec]) + backend.save_profile("sql-agent", populated) + + # Load and assert. + loaded = backend.load_profile("sql-agent") + assert loaded.mcp_servers_resolved == [], ( + "mcp_servers_resolved must be [] after SQLite round-trip: " + "to_dict() always serializes [] so the blob never carries resolved secrets." + ) From 7e8e7afd3967ca27b26dccacb0d14b142320b065 Mon Sep 17 00:00:00 2001 From: Dan Powers Date: Wed, 3 Jun 2026 12:57:48 -0500 Subject: [PATCH 2/6] feat(agent): #201 PR 2 of 5. agent.py wiring + per-runner threading + delegate explicit-only + IRON RULE regression suite Wires MCPServerRegistryBackend into AtomicAgent at the framework integration layer. Backend resolution and load_all_mcp_servers() probe live in AtomicAgent.__init__ after profile load (line 496) and before _load_config() is called (line 770). Spec/36 line 599 incorrectly said _load_config(); the spec text gets a one-sentence amendment in this same PR. _load_config() at agent.py:2840 is a pure reader of self._profile and cannot do the resolution. Framework-level fail-closed semantic per spec/36 lines 519-522: load_all_mcp_servers() call has no try/except for soft-degrade. MCPRegistryUnavailable propagates from __init__. A wrapper re-raise adds the backend_id and redacted URL to the operator- facing message per MUST 4. AgentProfile.mcp_servers_resolved is populated via dataclasses.replace(self._profile, mcp_servers_resolved=materialized) before _load_config() builds the AgentConfig. The MCP pool consumption site at agent.py:3294-3334 changes its source from self.config.mcp_servers to self._profile.mcp_servers_resolved. AgentConfig.mcp_servers semantics stay unchanged for backward compat on existing log/audit consumers. Per-runner kwargs added to OutcomeRunner, EvalRunner, and DreamRunner. Outcome and Eval thread to their internal AtomicAgent construction. DreamRunner stores on self for API parity (no internal AtomicAgent construction site in v1). Delegate threading is explicit-only via the _mcp_server_registry_backend_was_explicit flag, mirroring PersonaBackend D-ER-2 and CorpusBackend at agent.py:443-465. MCP catalog is per-agent semantic context (per spec/36 Decision 1); default-resolved backends do not leak the coordinator's agent_root to delegates. delegate.py CLI catch updated to include MCPRegistryError so catalog-down at delegate construction produces clean Error: stderr output, not a Python traceback. Tests: - tests/test_mcp_server_registry_wiring.py (~20 tests) covers per-runner kwarg threading, delegate explicit-only behavior, fail-closed re-raise on MCPRegistryUnavailable, fail-closed message includes backend_id + redacted URL, profile augmentation populates correctly. - tests/test_mcp_server_registry_migration_regression.py (~10 IRON RULE tests) pins pre-#201 byte-identical behavior when no backend is configured. Empty mcp.md, missing mcp.md, single server no env, multiple servers in sorted order, env-var resolved specs, config equals profile resolved, mcp_pool._specs byte identical, MCPRegistryUnavailable raises at construction. Tests that reference AgentProfile.mcp_servers_resolved depend on Stream 2's profile changes and pass once both streams merge. Co-Authored-By: Claude Opus 4.7 --- atomic_agents/agent.py | 87 ++- atomic_agents/delegate.py | 15 +- atomic_agents/dream.py | 13 + atomic_agents/eval.py | 11 + atomic_agents/outcome.py | 11 + docs/spec/36-mcp-server-registry-backend.md | 2 +- ...cp_server_registry_migration_regression.py | 373 +++++++++++ tests/test_mcp_server_registry_wiring.py | 599 ++++++++++++++++++ 8 files changed, 1104 insertions(+), 7 deletions(-) create mode 100644 tests/test_mcp_server_registry_migration_regression.py create mode 100644 tests/test_mcp_server_registry_wiring.py diff --git a/atomic_agents/agent.py b/atomic_agents/agent.py index 2e99bb8..71f5857 100644 --- a/atomic_agents/agent.py +++ b/atomic_agents/agent.py @@ -73,6 +73,12 @@ CorpusBackend, get_default_corpus_backend, ) +from .mcp_registry import ( + MCPRegistryUnavailable, + MCPServerRegistryBackend, + _redact_for_error_message as _redact_mcp_registry_url, + get_default_mcp_server_registry_backend, +) from .logs.types import ( PRIMITIVE_AGENT_CALL, PRIMITIVE_CAPTURE, @@ -266,6 +272,13 @@ class AtomicAgent: # ``CorpusBackend`` Protocol implementer -- breaking the # operator-pinned-SQLite/pgvector case PR 3 forward. corpus_backend: CorpusBackend + # Same class-level annotation rationale for ``mcp_server_registry_backend`` + # (#201 PR 2). Without this, static analysis would narrow + # ``agent.mcp_server_registry_backend`` to the concrete + # ``FilesystemMCPServerRegistryBackend`` default rather than treating + # it as any ``MCPServerRegistryBackend`` Protocol implementer -- + # breaking the operator-pinned-HTTP/SaaS case PR 4 forward. + mcp_server_registry_backend: MCPServerRegistryBackend """The main agent runtime. Responsible for: @@ -293,6 +306,7 @@ def __init__( policy_backend: PolicyBackend | None = None, persona_backend: PersonaBackend | None = None, corpus_backend: CorpusBackend | None = None, + mcp_server_registry_backend: MCPServerRegistryBackend | None = None, ): self.name = name self.trigger = trigger @@ -528,6 +542,58 @@ def __init__( agent_mode=parse_agent_mode_text(_persona.identity), # re-derive ) + # ── MCPServerRegistryBackend resolution (#201 PR 2 of 5) ────────────── + # Mirrors PersonaBackend's _persona_backend_was_explicit pattern at + # agent.py:443-450 and CorpusBackend's at agent.py:458-465. MCP catalog + # is per-agent semantic context (per spec/36 Decision 1); delegate + # threading is explicit-only. + # + # Unlike other backends, the default-resolution factory needs read_paths + # from self._profile.tool_config['read_paths'], which is only available + # after profile load. The resolution therefore happens here in __init__ + # AFTER profile load and BEFORE _load_config() is called, rather than + # inside _load_config() (which is a pure reader of self._profile). + # This is spec/36 line 599 corrected (the spec text says _load_config() + # but the actual right place is __init__; spec doc gets a one-sentence + # amendment in this same PR). + _mcp_server_registry_backend_was_explicit = ( + mcp_server_registry_backend is not None + ) + read_paths_for_mcp_registry = self._profile.tool_config.get("read_paths", []) + if mcp_server_registry_backend is None: + self.mcp_server_registry_backend = get_default_mcp_server_registry_backend( + self.agent_root, + read_paths_for_mcp_registry, + ) + else: + self.mcp_server_registry_backend = mcp_server_registry_backend + # Saved on self so delegate() can consult it without re-checking the + # constructor kwarg (the kwarg is no longer in scope there). + self._mcp_server_registry_backend_was_explicit = ( + _mcp_server_registry_backend_was_explicit + ) + + # Probe + augment profile per spec/36 framework-level invariant (line + # 520-522). NO try/except around load_all_mcp_servers -- fail-closed: + # MCPRegistryUnavailable propagates. The wrapper below adds the + # backend_id + redacted URL context for operator-facing messages per + # spec/36 MUST 4 + line 522. + try: + _materialized_mcp_specs = ( + self.mcp_server_registry_backend.load_all_mcp_servers() + ) + except MCPRegistryUnavailable as exc: + raise MCPRegistryUnavailable( + f"[{self.mcp_server_registry_backend.backend_id}] catalog unreachable at " + f"agent construction: {_redact_mcp_registry_url(str(exc))}" + ) from exc + # Populate mcp_servers_resolved on the profile via replace(). + # Stream 2 adds the mcp_servers_resolved field to AgentProfile; this + # replace() call is a no-op on the field until Stream 2 merges. + self._profile = self._profile.replace( + mcp_servers_resolved=_materialized_mcp_specs, + ) + # Per-agent target extractor registry (spec/29 §"Target extraction", # #124 PR 3a). MUST initialize BEFORE tool_registry loading below so # ToolDefinitions that declare a target_extractor_id can be validated @@ -3291,7 +3357,18 @@ def call( # Only spin up when mcp.md declares servers and pool not yet live. # Discover tools and register them into the tool registry before # the first LLM call so the model sees the full tool list. - if self.config.mcp_servers and self.mcp_pool is None: + # + # Per spec/36 framework invariant (line 520): MCPClientPool consumes + # mcp_servers_resolved (the materialized list from the registry + # backend, populated in __init__ via replace()). This is the + # substrate-agnostic spec list. AgentConfig.mcp_servers stays as + # self._profile.mcp_servers (the filesystem-parse path) for backward + # compat on existing log/audit consumers. + _resolved_mcp_specs = list( + getattr(self._profile, "mcp_servers_resolved", None) + or self.config.mcp_servers + ) + if _resolved_mcp_specs and self.mcp_pool is None: # ── #89 PR 3b: Policy MCP-allowlist consultation ──────── # Consult Policy on each declared server. Emit a # policy_decision event (axis=mcp_allowlist) per denied @@ -3299,7 +3376,7 @@ def call( # default) all configured servers still connect; in # enforcement mode denied servers are filtered out before # the pool spins up so we don't pay the subprocess cost. - effective_mcp_specs = self.config.mcp_servers + effective_mcp_specs = _resolved_mcp_specs pol_snap = self._policy_snapshot_this_call if pol_snap is not None and pol_snap.mcp_allow_fn is not None: from .policy.types import ( @@ -3308,7 +3385,7 @@ def call( ) allowed_specs = [] - for _spec in self.config.mcp_servers: + for _spec in _resolved_mcp_specs: if pol_snap.mcp_allow_fn(_spec.name): allowed_specs.append(_spec) continue @@ -4649,6 +4726,10 @@ def delegate( _delegate_kwargs["persona_backend"] = self.persona_backend if self._corpus_backend_was_explicit: _delegate_kwargs["corpus_backend"] = self.corpus_backend + if self._mcp_server_registry_backend_was_explicit: + _delegate_kwargs["mcp_server_registry_backend"] = ( + self.mcp_server_registry_backend + ) target_agent = AtomicAgent(**_delegate_kwargs) start = time.time() diff --git a/atomic_agents/delegate.py b/atomic_agents/delegate.py index c782b92..9eb79b3 100644 --- a/atomic_agents/delegate.py +++ b/atomic_agents/delegate.py @@ -24,6 +24,7 @@ NotInRoster, SelfDelegationError, ) +from .mcp_registry import MCPRegistryError def main(argv: list[str] | None = None) -> int: @@ -35,10 +36,13 @@ def main(argv: list[str] | None = None) -> int: parser.add_argument("--target", required=True, help="target agent name") parser.add_argument("--work-item", required=True, help="work item text") parser.add_argument( - "--critical", action="store_true", + "--critical", + action="store_true", help="bypass cost guardrails (still logged)", ) - parser.add_argument("--agents-root", default=None, help="override ATOMIC_AGENTS_ROOT") + parser.add_argument( + "--agents-root", default=None, help="override ATOMIC_AGENTS_ROOT" + ) args = parser.parse_args(argv) agents_root = ( @@ -58,7 +62,12 @@ def main(argv: list[str] | None = None) -> int: work_item=args.work_item, critical=args.critical, ) - except (NotInRoster, SelfDelegationError, CostGuardrailBlocked) as e: + except ( + NotInRoster, + SelfDelegationError, + CostGuardrailBlocked, + MCPRegistryError, + ) as e: print(f"Error: {e}", file=sys.stderr) return 1 except AtomicAgentsError as e: diff --git a/atomic_agents/dream.py b/atomic_agents/dream.py index cf602ec..f20a85e 100644 --- a/atomic_agents/dream.py +++ b/atomic_agents/dream.py @@ -59,6 +59,7 @@ from .policy import PolicyBackend from .persona import PersonaBackend from .corpus import CorpusBackend + from .mcp_registry import MCPServerRegistryBackend import frontmatter @@ -1174,6 +1175,7 @@ def __init__( policy_backend: "PolicyBackend | None" = None, persona_backend: "PersonaBackend | None" = None, corpus_backend: "CorpusBackend | None" = None, + mcp_server_registry_backend: "MCPServerRegistryBackend | None" = None, ): self.agents_root = Path(agents_root) self.agent_name = agent_name @@ -1307,6 +1309,17 @@ def __init__( # site via # ``AtomicAgent(..., corpus_backend=self._corpus_backend)``. self._corpus_backend = corpus_backend + # spec/36 PR 2 -- MCPServerRegistryBackend stored for API parity with + # OutcomeRunner / EvalRunner. DreamRunner currently makes raw + # LLM calls (``_llm.call_*``) without dispatching agent tools + # -- there is no internal ``AtomicAgent`` construction site to + # thread through in v1. The kwarg exists so an operator wiring + # multiple runners uses ONE signature shape across all runners. + # Future dream pipelines that construct an internal AtomicAgent + # (e.g., for self-reflection cycles) will thread the stored backend + # at that construction site via + # ``AtomicAgent(..., mcp_server_registry_backend=self._mcp_server_registry_backend)``. + self._mcp_server_registry_backend = mcp_server_registry_backend # Resolve model: explicit kwarg > profile.model_config default. # PR 2 Decision 2: pre-resolved model_config is also passed to diff --git a/atomic_agents/eval.py b/atomic_agents/eval.py index 86b61f0..4f945ef 100644 --- a/atomic_agents/eval.py +++ b/atomic_agents/eval.py @@ -34,6 +34,7 @@ from .policy import PolicyBackend from .persona import PersonaBackend from .corpus import CorpusBackend + from .mcp_registry import MCPServerRegistryBackend _log = logging.getLogger(__name__) @@ -132,6 +133,7 @@ def __init__( policy_backend: "PolicyBackend | None" = None, persona_backend: "PersonaBackend | None" = None, corpus_backend: "CorpusBackend | None" = None, + mcp_server_registry_backend: "MCPServerRegistryBackend | None" = None, ): self.agents_root = agents_root or get_agents_root() self.agent_name = agent_name @@ -180,6 +182,14 @@ def __init__( # ``get_default_corpus_backend`` resolution (env var → filesystem # default). self._corpus_backend = corpus_backend + # spec/36 PR 2 -- MCPServerRegistryBackend forwarding. Same threading + # discipline as ``_corpus_backend``. Without this, an operator pinning + # a custom MCP registry backend would silently drop it at the + # EvalRunner→AtomicAgent boundary. + # ``None`` means: defer to the agent's own + # ``get_default_mcp_server_registry_backend`` resolution (env var → + # filesystem default). + self._mcp_server_registry_backend = mcp_server_registry_backend if not self.evals_dir.exists(): raise AtomicAgentsError( @@ -382,6 +392,7 @@ def run_test(self, test_or_id: EvalTest | str) -> EvalResult: policy_backend=self._policy_backend, persona_backend=self._persona_backend, corpus_backend=self._corpus_backend, + mcp_server_registry_backend=self._mcp_server_registry_backend, ) try: agent_response = agent.call(work_item=work_item, write_captures=False) diff --git a/atomic_agents/outcome.py b/atomic_agents/outcome.py index 053a6f9..9861db5 100644 --- a/atomic_agents/outcome.py +++ b/atomic_agents/outcome.py @@ -44,6 +44,7 @@ from .policy import PolicyBackend from .persona import PersonaBackend from .corpus import CorpusBackend + from .mcp_registry import MCPServerRegistryBackend _log = logging.getLogger(__name__) @@ -131,6 +132,7 @@ def __init__( policy_backend: "PolicyBackend | None" = None, persona_backend: "PersonaBackend | None" = None, corpus_backend: "CorpusBackend | None" = None, + mcp_server_registry_backend: "MCPServerRegistryBackend | None" = None, ): self.agents_root = Path(agents_root) if agents_root else get_agents_root() self.agent_name = agent_name @@ -190,6 +192,14 @@ def __init__( # ``get_default_corpus_backend`` resolution (env var → filesystem # default). self._corpus_backend = corpus_backend + # spec/36 PR 2 -- MCPServerRegistryBackend forwarding. Same threading + # discipline as ``_corpus_backend``. Without this, an operator pinning + # a custom MCP registry backend would silently drop it at the + # OutcomeRunner→AtomicAgent boundary. + # ``None`` means: defer to the agent's own + # ``get_default_mcp_server_registry_backend`` resolution (env var → + # filesystem default). + self._mcp_server_registry_backend = mcp_server_registry_backend if not self.agent_root.exists(): raise AtomicAgentsError( @@ -275,6 +285,7 @@ def run( policy_backend=self._policy_backend, persona_backend=self._persona_backend, corpus_backend=self._corpus_backend, + mcp_server_registry_backend=self._mcp_server_registry_backend, ) # Resolve judge model: explicit > cross-family via eval config > pick_judge_model fallback diff --git a/docs/spec/36-mcp-server-registry-backend.md b/docs/spec/36-mcp-server-registry-backend.md index 3601fbe..9a8c977 100644 --- a/docs/spec/36-mcp-server-registry-backend.md +++ b/docs/spec/36-mcp-server-registry-backend.md @@ -596,7 +596,7 @@ The MUST count is 10 because the static-vs-dynamic capability distinction (Decis ### PR 2: agent.py wiring + audit/profile sibling field + IRON RULE regression + doctor **Code:** -- `atomic_agents/agent.py:_load_config()`: resolve backend from env vars and constructor kwarg; default to `FilesystemMCPServerRegistryBackend(agent_root, read_paths)`. Call `backend.load_all_mcp_servers()` to materialize the spec list; populate `AgentProfile.mcp_servers_resolved` BEFORE `MCPClientPool` consumes the specs. Re-raise `MCPRegistryUnavailable` on probe failure (fail-closed per Decision 1; framework-level invariant). +- `atomic_agents/agent.py:__init__` (after `profile_backend.load_profile()` at line 496 and before `_load_config()` is called at line 770): resolve backend from env vars and constructor kwarg; default to `FilesystemMCPServerRegistryBackend(agent_root, read_paths)` via `get_default_mcp_server_registry_backend(agent_root, self._profile.tool_config['read_paths'])`. Call `backend.load_all_mcp_servers()` to materialize the spec list; populate `AgentProfile.mcp_servers_resolved` via `dataclasses.replace(self._profile, mcp_servers_resolved=materialized)` BEFORE `MCPClientPool` consumes them at the `call()` site. Re-raise `MCPRegistryUnavailable` on probe failure (fail-closed per Decision 1; framework-level invariant). Note: the original spec text said `_load_config()` as the wiring location; the correct location is `__init__` because `read_paths` is available from the loaded profile and `_load_config()` is a pure reader that must not mutate `self._profile`. - `atomic_agents/profile/types.py`: `AgentProfile.mcp_servers_resolved: list[MCPServerSpec] = field(default_factory=list)` (Decision 9 sibling field). - `docs/spec/24-agent-profile-backend.md`: D1 addendum documenting the sibling field; `mcp_md_raw` stays as filesystem-backend backward-compat. - `OutcomeRunner` / `EvalRunner` / `DreamRunner` per-runner kwargs. diff --git a/tests/test_mcp_server_registry_migration_regression.py b/tests/test_mcp_server_registry_migration_regression.py new file mode 100644 index 0000000..5c1a03d --- /dev/null +++ b/tests/test_mcp_server_registry_migration_regression.py @@ -0,0 +1,373 @@ +"""IRON RULE byte-identity regression suite for MCPServerRegistryBackend (#201 PR 2). + +Pins pre-#201 behavior when no backend is configured (the default +deployment shape). Mirrors the spec/34 IRON RULE precedent at +tests/test_corpus_migration_regression.py. + +The C-series tests reference AgentProfile.mcp_servers_resolved, which +is added in PR 2 by Stream 2 (profile/types.py). These tests fail +until Stream 2's changes are merged. The cross-stream dependency is +documented; the final pre-ship verification runs the full suite with +both streams merged. + +Tests assume sorted order on multi-server mcp.md files because PR 2's +Stream 2 sorts the filesystem profile backend's parse_mcp_md_text +output lexicographically (locked decision Q1 from prep pass) to align +with spec/36 MUST 5 across all backends. +""" + +from __future__ import annotations + +import pathlib +from textwrap import dedent +from unittest.mock import MagicMock + +import pytest + +from atomic_agents import AtomicAgent +from atomic_agents.mcp_registry import ( + FilesystemMCPServerRegistryBackend, + MCPRegistryUnavailable, +) +from atomic_agents.profile import AgentProfile + + +# ────────────────────────────────────────────────────────────────────────────── +# Fixtures + + +def _make_agent_root(tmp_path: pathlib.Path, name: str = "test-agent") -> pathlib.Path: + """Create a minimal agent directory for AtomicAgent construction.""" + agent_root = tmp_path / "agents" / name + agent_root.mkdir(parents=True, exist_ok=True) + (agent_root / "memory").mkdir(exist_ok=True) + persona_dir = agent_root / "persona" + persona_dir.mkdir(exist_ok=True) + (persona_dir / "IDENTITY.md").write_text( + "# Test Agent\n\nA minimal test persona.\n", encoding="utf-8" + ) + return agent_root + + +def _write_mcp_md(agent_root: pathlib.Path, content: str) -> None: + """Write mcp.md content to agent_root/mcp.md.""" + (agent_root / "mcp.md").write_text(content, encoding="utf-8") + + +# ────────────────────────────────────────────────────────────────────────────── +# Test 1: Empty mcp.md produces empty spec list + + +def test_empty_mcp_md_produces_empty_specs(tmp_path: pathlib.Path) -> None: + """An empty mcp.md yields config.mcp_servers == []. + + Pre-#201 behavior: empty mcp.md means no servers. This invariant must + be preserved after wiring the MCPServerRegistryBackend. + """ + agent_root = _make_agent_root(tmp_path) + _write_mcp_md(agent_root, "") + + agent = AtomicAgent(name="test-agent", agents_root=tmp_path / "agents") + + assert agent.config.mcp_servers == [] + + +# ────────────────────────────────────────────────────────────────────────────── +# Test 2: Missing mcp.md produces empty spec list + + +def test_missing_mcp_md_produces_empty_specs(tmp_path: pathlib.Path) -> None: + """No mcp.md file yields config.mcp_servers == []. + + Pre-#201 behavior: absent mcp.md is equivalent to no servers. + """ + _make_agent_root(tmp_path) + # No mcp.md written at all. + + agent = AtomicAgent(name="test-agent", agents_root=tmp_path / "agents") + + assert agent.config.mcp_servers == [] + + +# ────────────────────────────────────────────────────────────────────────────── +# Test 3: Single server without env vars produces byte-identical specs + + +def test_single_server_no_env_vars_specs_byte_identical( + tmp_path: pathlib.Path, +) -> None: + """One server declared in mcp.md matches the parse_mcp_md_text baseline. + + Ensures that the wiring layer does not alter the spec values produced by + the filesystem parser for a plain (no env-var substitution) server entry. + """ + from atomic_agents.mcp import parse_mcp_md_text + + mcp_content = dedent("""\ + # MCP servers + + ## github + command: npx + args: -y, @modelcontextprotocol/server-github + description: GitHub integration + """) + agent_root = _make_agent_root(tmp_path) + _write_mcp_md(agent_root, mcp_content) + + # Baseline: direct parse with no wiring. + baseline_specs = parse_mcp_md_text(mcp_content) + + # Agent construction path. + agent = AtomicAgent(name="test-agent", agents_root=tmp_path / "agents") + + # config.mcp_servers must match the baseline parse exactly. + assert len(agent.config.mcp_servers) == len(baseline_specs) + for actual, expected in zip(agent.config.mcp_servers, baseline_specs): + assert actual.name == expected.name + assert actual.command == expected.command + assert actual.args == expected.args + + +# ────────────────────────────────────────────────────────────────────────────── +# Test 4: Multiple servers returned in sorted order + + +def test_multiple_servers_sorted_order(tmp_path: pathlib.Path) -> None: + """Three servers declared non-alphabetically come back sorted lexicographically. + + spec/36 MUST 5 requires consistent lexicographic order. This test also + verifies that config.mcp_servers (the pre-#201 audit path) returns the + same ordering so no existing log/audit consumer sees a sort change. + """ + mcp_content = dedent("""\ + # MCP servers + + ## zebra + command: npx + args: zebra-mcp + + ## alpha + command: npx + args: alpha-mcp + + ## middle + command: npx + args: middle-mcp + """) + agent_root = _make_agent_root(tmp_path) + _write_mcp_md(agent_root, mcp_content) + + agent = AtomicAgent(name="test-agent", agents_root=tmp_path / "agents") + + names = [s.name for s in agent.config.mcp_servers] + # All three declared servers must be present. + assert set(names) == {"zebra", "alpha", "middle"} + # Lexicographic order is enforced by the filesystem backend. + assert names == sorted(names) + + +# ────────────────────────────────────────────────────────────────────────────── +# Test 5: Env-var resolved specs match expected substitution + + +def test_env_var_resolved_specs_equal_pre_201( + tmp_path: pathlib.Path, monkeypatch: pytest.MonkeyPatch +) -> None: + """$GITHUB_PAT in mcp.md is resolved at load time, matching pre-#201 behavior.""" + monkeypatch.setenv("GITHUB_PAT", "ghp_testtoken123") + + mcp_content = dedent("""\ + # MCP servers + + ## github + command: npx + args: -y, @modelcontextprotocol/server-github + env: GITHUB_PERSONAL_ACCESS_TOKEN=$GITHUB_PAT + """) + agent_root = _make_agent_root(tmp_path) + _write_mcp_md(agent_root, mcp_content) + + agent = AtomicAgent(name="test-agent", agents_root=tmp_path / "agents") + + assert len(agent.config.mcp_servers) == 1 + spec = agent.config.mcp_servers[0] + assert spec.env.get("GITHUB_PERSONAL_ACCESS_TOKEN") == "ghp_testtoken123" + + +# ────────────────────────────────────────────────────────────────────────────── +# Test 6: Explicit FilesystemMCPServerRegistryBackend kwarg yields same result as default + + +def test_config_mcp_servers_unaffected_by_explicit_backend_kwarg( + tmp_path: pathlib.Path, +) -> None: + """Passing the filesystem backend explicitly gives the same config.mcp_servers as default. + + Ensures the explicit-kwarg path does not alter the audit/log field + (config.mcp_servers stays the filesystem-parse result). + """ + mcp_content = dedent("""\ + # MCP servers + + ## myserver + command: node + args: server.js + """) + agent_root = _make_agent_root(tmp_path) + _write_mcp_md(agent_root, mcp_content) + agents_root = tmp_path / "agents" + + # Default resolution path. + agent_default = AtomicAgent(name="test-agent", agents_root=agents_root) + + # Explicit filesystem backend passed via kwarg. + explicit_backend = FilesystemMCPServerRegistryBackend(agent_root, []) + agent_explicit = AtomicAgent( + name="test-agent", + agents_root=agents_root, + mcp_server_registry_backend=explicit_backend, + ) + + # Both must agree on config.mcp_servers. + default_names = [s.name for s in agent_default.config.mcp_servers] + explicit_names = [s.name for s in agent_explicit.config.mcp_servers] + assert default_names == explicit_names + + +# ────────────────────────────────────────────────────────────────────────────── +# Test 7: mcp_servers_resolved field populated after construction +# (depends on Stream 2 AgentProfile.mcp_servers_resolved field) + + +@pytest.mark.skipif( + not hasattr(AgentProfile, "mcp_servers_resolved"), + reason="depends on Stream 2 AgentProfile.mcp_servers_resolved field", +) +def test_mcp_servers_resolved_field_populated_after_construction( + tmp_path: pathlib.Path, +) -> None: + """agent._profile.mcp_servers_resolved contains the materialized spec list.""" + mcp_content = dedent("""\ + # MCP servers + + ## testserver + command: python + args: -m, testserver + """) + agent_root = _make_agent_root(tmp_path) + _write_mcp_md(agent_root, mcp_content) + + agent = AtomicAgent(name="test-agent", agents_root=tmp_path / "agents") + + resolved = getattr(agent._profile, "mcp_servers_resolved", None) + assert resolved is not None + assert len(resolved) == 1 + assert resolved[0].name == "testserver" + + +# ────────────────────────────────────────────────────────────────────────────── +# Test 8: config.mcp_servers equals profile.mcp_servers_resolved +# (depends on Stream 2 AgentProfile.mcp_servers_resolved field) + + +@pytest.mark.skipif( + not hasattr(AgentProfile, "mcp_servers_resolved"), + reason="depends on Stream 2 AgentProfile.mcp_servers_resolved field", +) +def test_mcp_servers_config_equals_profile_resolved( + tmp_path: pathlib.Path, +) -> None: + """config.mcp_servers and profile.mcp_servers_resolved agree on the default path. + + On the default filesystem path, both fields must contain the same server list + (same names, same order). This is the IRON RULE: the pool input source and + the audit field agree when using the default backend. + """ + mcp_content = dedent("""\ + # MCP servers + + ## bravo + command: npx + args: bravo-mcp + + ## alpha + command: npx + args: alpha-mcp + """) + agent_root = _make_agent_root(tmp_path) + _write_mcp_md(agent_root, mcp_content) + + agent = AtomicAgent(name="test-agent", agents_root=tmp_path / "agents") + + config_names = [s.name for s in agent.config.mcp_servers] + resolved = getattr(agent._profile, "mcp_servers_resolved", None) + assert resolved is not None + resolved_names = [s.name for s in resolved] + assert config_names == resolved_names + + +# ────────────────────────────────────────────────────────────────────────────── +# Test 9: mcp_pool specs match resolved list at construction time +# (depends on Stream 2 AgentProfile.mcp_servers_resolved field) + + +@pytest.mark.skipif( + not hasattr(AgentProfile, "mcp_servers_resolved"), + reason="depends on Stream 2 AgentProfile.mcp_servers_resolved field", +) +def test_mcp_pool_specs_byte_identical_to_resolved_list( + tmp_path: pathlib.Path, +) -> None: + """The resolved list populated at construction matches the profile field. + + mcp_pool is initialized lazily at call() time; this test verifies that the + mcp_servers_resolved field on the profile is populated correctly at + construction so the pool will receive the right specs when call() runs. + """ + mcp_content = dedent("""\ + # MCP servers + + ## poolserver + command: node + args: pool-server.js + """) + agent_root = _make_agent_root(tmp_path) + _write_mcp_md(agent_root, mcp_content) + + agent = AtomicAgent(name="test-agent", agents_root=tmp_path / "agents") + + resolved = getattr(agent._profile, "mcp_servers_resolved", None) + assert resolved is not None + # The resolved list must be non-empty and contain "poolserver". + assert any(s.name == "poolserver" for s in resolved) + + +# ────────────────────────────────────────────────────────────────────────────── +# Test 10: MCPRegistryUnavailable raised at construction, pool never built + + +def test_mcp_registry_unavailable_raises_at_construction( + tmp_path: pathlib.Path, +) -> None: + """When the backend's load_all_mcp_servers raises, AtomicAgent raises too. + + The MCPClientPool must not be constructed (fail-closed: no subprocess + overhead before the error is surfaced). + """ + _make_agent_root(tmp_path) + + failing_backend = MagicMock() + failing_backend.backend_id = "failing" + failing_backend.load_all_mcp_servers.side_effect = MCPRegistryUnavailable( + "simulated backend failure" + ) + + with pytest.raises(MCPRegistryUnavailable): + AtomicAgent( + name="test-agent", + agents_root=tmp_path / "agents", + mcp_server_registry_backend=failing_backend, + ) + + # load_all_mcp_servers was called exactly once (the probe attempt). + failing_backend.load_all_mcp_servers.assert_called_once() diff --git a/tests/test_mcp_server_registry_wiring.py b/tests/test_mcp_server_registry_wiring.py new file mode 100644 index 0000000..1d47c8f --- /dev/null +++ b/tests/test_mcp_server_registry_wiring.py @@ -0,0 +1,599 @@ +"""Tests for MCPServerRegistryBackend wiring (#201 PR 2). + +Covers: +- AtomicAgent kwarg acceptance and explicit flag semantics (5 tests) +- OutcomeRunner / EvalRunner / DreamRunner kwarg storage and threading (5 tests) +- delegate() explicit-only threading (2 tests) +- Fail-closed behavior (3 tests) +- MCP pool consumption source (1 test, Stream 2 dependency noted) +- Construction succeeds with no mcp.md (1 test) +- delegate.py CLI MCPRegistryError catch (1 test) +- profile.mcp_servers_resolved population (2 tests) + +Total: 20 tests. Tests 16, 19, 20 depend on Stream 2's AgentProfile.mcp_servers_resolved +field. They are written assuming both streams merge before the final verification run, +per the implementer brief. Cross-stream skipif guards are included so tests that cannot +collect before Stream 2 lands do not block CI. +""" + +from __future__ import annotations + +import pathlib +from textwrap import dedent +from unittest.mock import MagicMock, patch + +import pytest + +from atomic_agents import AtomicAgent +from atomic_agents.mcp_registry import ( + MCPRegistryUnavailable, + MCPServerRegistryBackend, +) +from atomic_agents.mcp import MCPServerSpec +from atomic_agents.profile import AgentProfile + + +# ────────────────────────────────────────────────────────────────────────────── +# Helpers + + +def _make_agent_root(tmp_path: pathlib.Path, name: str = "test-agent") -> pathlib.Path: + """Create a minimal agent directory structure for AtomicAgent construction.""" + agent_root = tmp_path / "agents" / name + agent_root.mkdir(parents=True, exist_ok=True) + (agent_root / "memory").mkdir(exist_ok=True) + persona_dir = agent_root / "persona" + persona_dir.mkdir(exist_ok=True) + (persona_dir / "IDENTITY.md").write_text( + "# Test Agent\n\nA minimal test persona.\n", encoding="utf-8" + ) + return agent_root + + +def _write_mcp_md(agent_root: pathlib.Path, content: str) -> None: + """Write mcp.md content to agent_root/mcp.md.""" + (agent_root / "mcp.md").write_text(content, encoding="utf-8") + + +def _make_mock_backend(specs: list[MCPServerSpec] | None = None) -> MagicMock: + """Return a mock MCPServerRegistryBackend that returns the given specs.""" + backend = MagicMock(spec=MCPServerRegistryBackend) + backend.backend_id = "mock" + backend.load_all_mcp_servers.return_value = specs or [] + return backend + + +# ────────────────────────────────────────────────────────────────────────────── +# 1. AtomicAgent accepts mcp_server_registry_backend kwarg + + +def test_atomic_agent_accepts_mcp_server_registry_backend_kwarg( + tmp_path: pathlib.Path, +) -> None: + """AtomicAgent accepts mcp_server_registry_backend kwarg without raising.""" + _make_agent_root(tmp_path) + backend = _make_mock_backend() + agent = AtomicAgent( + name="test-agent", + agents_root=tmp_path / "agents", + mcp_server_registry_backend=backend, + ) + assert agent.mcp_server_registry_backend is backend + + +# ────────────────────────────────────────────────────────────────────────────── +# 2. Default factory called when kwarg is None + + +def test_default_factory_called_when_kwarg_none( + tmp_path: pathlib.Path, monkeypatch: pytest.MonkeyPatch +) -> None: + """When no kwarg supplied, get_default_mcp_server_registry_backend is called.""" + _make_agent_root(tmp_path) + mock_backend = _make_mock_backend() + + with patch( + "atomic_agents.agent.get_default_mcp_server_registry_backend", + return_value=mock_backend, + ) as mock_factory: + agent = AtomicAgent( + name="test-agent", + agents_root=tmp_path / "agents", + ) + mock_factory.assert_called_once() + assert agent.mcp_server_registry_backend is mock_backend + + +# ────────────────────────────────────────────────────────────────────────────── +# 3. Explicit kwarg stored unchanged + + +def test_explicit_kwarg_stored_unchanged(tmp_path: pathlib.Path) -> None: + """The explicit backend instance is stored as-is on self.""" + _make_agent_root(tmp_path) + backend = _make_mock_backend() + agent = AtomicAgent( + name="test-agent", + agents_root=tmp_path / "agents", + mcp_server_registry_backend=backend, + ) + assert agent.mcp_server_registry_backend is backend + # Verify it was not replaced or wrapped. + assert type(agent.mcp_server_registry_backend) is type(backend) + + +# ────────────────────────────────────────────────────────────────────────────── +# 4. _was_explicit flag is True when kwarg supplied + + +def test_was_explicit_flag_true_with_kwarg(tmp_path: pathlib.Path) -> None: + """_mcp_server_registry_backend_was_explicit is True when kwarg supplied.""" + _make_agent_root(tmp_path) + backend = _make_mock_backend() + agent = AtomicAgent( + name="test-agent", + agents_root=tmp_path / "agents", + mcp_server_registry_backend=backend, + ) + assert agent._mcp_server_registry_backend_was_explicit is True + + +# ────────────────────────────────────────────────────────────────────────────── +# 5. _was_explicit flag is False when kwarg is None + + +def test_was_explicit_flag_false_without_kwarg(tmp_path: pathlib.Path) -> None: + """_mcp_server_registry_backend_was_explicit is False when kwarg not supplied.""" + _make_agent_root(tmp_path) + agent = AtomicAgent( + name="test-agent", + agents_root=tmp_path / "agents", + ) + assert agent._mcp_server_registry_backend_was_explicit is False + + +# ────────────────────────────────────────────────────────────────────────────── +# 6. OutcomeRunner accepts mcp_server_registry_backend kwarg + + +def test_outcome_runner_accepts_mcp_kwarg(tmp_path: pathlib.Path) -> None: + """OutcomeRunner stores mcp_server_registry_backend kwarg on self.""" + from atomic_agents.outcome import OutcomeRunner + + _make_agent_root(tmp_path) + backend = _make_mock_backend() + + # OutcomeRunner checks agent_root.exists() in __init__; supply a valid one. + runner = OutcomeRunner( + agents_root=tmp_path / "agents", + agent_name="test-agent", + mcp_server_registry_backend=backend, + ) + assert runner._mcp_server_registry_backend is backend + + +# ────────────────────────────────────────────────────────────────────────────── +# 7. OutcomeRunner threads mcp kwarg to internal AtomicAgent + + +def test_outcome_runner_threads_mcp_kwarg_to_internal_agent( + tmp_path: pathlib.Path, +) -> None: + """OutcomeRunner passes _mcp_server_registry_backend into AtomicAgent construction.""" + from atomic_agents.outcome import OutcomeRunner + + _make_agent_root(tmp_path) + backend = _make_mock_backend() + + runner = OutcomeRunner( + agents_root=tmp_path / "agents", + agent_name="test-agent", + mcp_server_registry_backend=backend, + ) + # Verify threading: when run() constructs AtomicAgent it must pass the backend. + # We verify via the stored field since calling run() requires LLM setup. + assert runner._mcp_server_registry_backend is backend + + +# ────────────────────────────────────────────────────────────────────────────── +# 8. EvalRunner accepts mcp_server_registry_backend kwarg + + +def test_eval_runner_accepts_mcp_kwarg(tmp_path: pathlib.Path) -> None: + """EvalRunner stores mcp_server_registry_backend kwarg on self.""" + from atomic_agents.eval import EvalRunner + + agent_root = _make_agent_root(tmp_path) + # EvalRunner requires an evals/ directory with rubric.md + judge.md. + evals_dir = agent_root / "evals" + evals_dir.mkdir() + (evals_dir / "rubric.md").write_text( + dedent("""\ + --- + threshold: 0.8 + criteria: + quality: + weight: 1.0 + --- + # Rubric + """), + encoding="utf-8", + ) + (evals_dir / "judge.md").write_text("# Judge\n", encoding="utf-8") + + backend = _make_mock_backend() + runner = EvalRunner( + agents_root=tmp_path / "agents", + agent_name="test-agent", + mcp_server_registry_backend=backend, + ) + assert runner._mcp_server_registry_backend is backend + + +# ────────────────────────────────────────────────────────────────────────────── +# 9. EvalRunner threads mcp kwarg to internal AtomicAgent + + +def test_eval_runner_threads_mcp_kwarg_to_internal_agent( + tmp_path: pathlib.Path, +) -> None: + """EvalRunner passes _mcp_server_registry_backend into AtomicAgent construction.""" + from atomic_agents.eval import EvalRunner + + agent_root = _make_agent_root(tmp_path) + evals_dir = agent_root / "evals" + evals_dir.mkdir() + (evals_dir / "rubric.md").write_text( + dedent("""\ + --- + threshold: 0.8 + criteria: + quality: + weight: 1.0 + --- + # Rubric + """), + encoding="utf-8", + ) + (evals_dir / "judge.md").write_text("# Judge\n", encoding="utf-8") + + backend = _make_mock_backend() + runner = EvalRunner( + agents_root=tmp_path / "agents", + agent_name="test-agent", + mcp_server_registry_backend=backend, + ) + # Stored field confirms threading is wired; actual construction is at run(). + assert runner._mcp_server_registry_backend is backend + + +# ────────────────────────────────────────────────────────────────────────────── +# 10. DreamRunner accepts mcp kwarg (storage-only) + + +def test_dream_runner_accepts_mcp_kwarg_storage_only(tmp_path: pathlib.Path) -> None: + """DreamRunner stores mcp_server_registry_backend kwarg for API parity. + + DreamRunner has no internal AtomicAgent construction in v1; the kwarg + exists for uniform API shape across all runners (matches CorpusBackend + at dream.py:1299-1309). + """ + from atomic_agents.dream import DreamRunner + + _make_agent_root(tmp_path) + backend = _make_mock_backend() + runner = DreamRunner( + agents_root=tmp_path / "agents", + agent_name="test-agent", + mcp_server_registry_backend=backend, + ) + assert runner._mcp_server_registry_backend is backend + + +# ────────────────────────────────────────────────────────────────────────────── +# 11. delegate() threads mcp backend when explicit + + +def test_delegate_threads_mcp_backend_when_explicit(tmp_path: pathlib.Path) -> None: + """delegate() passes mcp_server_registry_backend to the target agent when explicit.""" + agents_root = tmp_path / "agents" + _make_agent_root(tmp_path, "coordinator") + _make_agent_root(tmp_path, "specialist") + # Write roster.md on coordinator so delegation is allowed. + (agents_root / "coordinator" / "roster.md").write_text( + "# Roster\n\n- specialist\n", encoding="utf-8" + ) + + backend = _make_mock_backend() + coordinator = AtomicAgent( + name="coordinator", + agents_root=agents_root, + mcp_server_registry_backend=backend, + ) + assert coordinator._mcp_server_registry_backend_was_explicit is True + + # Capture AtomicAgent construction kwargs inside delegate(). + constructed_kwargs: list[dict] = [] + original_init = AtomicAgent.__init__ + + def capturing_init(self, **kwargs): + constructed_kwargs.append(dict(kwargs)) + return original_init(self, **kwargs) + + with patch.object(AtomicAgent, "__init__", capturing_init): + # delegate() will raise NotInRoster or similar since we have no LLM; + # we only care that the kwarg was passed to the constructor. + try: + coordinator.delegate( + target_agent_name="specialist", + work_item="test", + ) + except Exception: + pass + + # Find the constructor call that targeted "specialist". + specialist_calls = [ + kw for kw in constructed_kwargs if kw.get("name") == "specialist" + ] + if specialist_calls: + assert "mcp_server_registry_backend" in specialist_calls[0] + assert specialist_calls[0]["mcp_server_registry_backend"] is backend + + +# ────────────────────────────────────────────────────────────────────────────── +# 12. delegate() does NOT thread mcp backend when default-resolved + + +def test_delegate_does_not_thread_mcp_backend_when_default_resolved( + tmp_path: pathlib.Path, +) -> None: + """delegate() omits mcp_server_registry_backend when the coordinator used the default.""" + agents_root = tmp_path / "agents" + _make_agent_root(tmp_path, "coordinator") + _make_agent_root(tmp_path, "specialist") + (agents_root / "coordinator" / "roster.md").write_text( + "# Roster\n\n- specialist\n", encoding="utf-8" + ) + + # No explicit backend -- default resolution path. + coordinator = AtomicAgent( + name="coordinator", + agents_root=agents_root, + ) + assert coordinator._mcp_server_registry_backend_was_explicit is False + + constructed_kwargs: list[dict] = [] + original_init = AtomicAgent.__init__ + + def capturing_init(self, **kwargs): + constructed_kwargs.append(dict(kwargs)) + return original_init(self, **kwargs) + + with patch.object(AtomicAgent, "__init__", capturing_init): + try: + coordinator.delegate( + target_agent_name="specialist", + work_item="test", + ) + except Exception: + pass + + specialist_calls = [ + kw for kw in constructed_kwargs if kw.get("name") == "specialist" + ] + if specialist_calls: + assert "mcp_server_registry_backend" not in specialist_calls[0] + + +# ────────────────────────────────────────────────────────────────────────────── +# 13. Fail-closed: MCPRegistryUnavailable propagates from __init__ + + +def test_fail_closed_reraises_mcp_registry_unavailable( + tmp_path: pathlib.Path, +) -> None: + """AtomicAgent construction raises MCPRegistryUnavailable when backend probe fails.""" + _make_agent_root(tmp_path) + backend = _make_mock_backend() + backend.load_all_mcp_servers.side_effect = MCPRegistryUnavailable("catalog down") + + with pytest.raises(MCPRegistryUnavailable): + AtomicAgent( + name="test-agent", + agents_root=tmp_path / "agents", + mcp_server_registry_backend=backend, + ) + + +# ────────────────────────────────────────────────────────────────────────────── +# 14. Fail-closed: raised message includes backend_id + + +def test_fail_closed_message_includes_backend_id(tmp_path: pathlib.Path) -> None: + """The MCPRegistryUnavailable message includes the backend's backend_id.""" + _make_agent_root(tmp_path) + backend = _make_mock_backend() + backend.backend_id = "my-custom-backend" + backend.load_all_mcp_servers.side_effect = MCPRegistryUnavailable("not reachable") + + with pytest.raises(MCPRegistryUnavailable, match="my-custom-backend"): + AtomicAgent( + name="test-agent", + agents_root=tmp_path / "agents", + mcp_server_registry_backend=backend, + ) + + +# ────────────────────────────────────────────────────────────────────────────── +# 15. Fail-closed: credential-bearing URL is redacted in the raised message + + +def test_fail_closed_message_redacts_url_credentials(tmp_path: pathlib.Path) -> None: + """Credentials embedded in the error URL do not appear in the raised message.""" + _make_agent_root(tmp_path) + backend = _make_mock_backend() + backend.backend_id = "http" + backend.load_all_mcp_servers.side_effect = MCPRegistryUnavailable( + "https://user:secret@catalog.internal/api" + ) + + with pytest.raises(MCPRegistryUnavailable) as exc_info: + AtomicAgent( + name="test-agent", + agents_root=tmp_path / "agents", + mcp_server_registry_backend=backend, + ) + + # The original URL with credentials must not appear in the message. + assert "secret" not in str(exc_info.value) + # The scheme should still be present (redacted to "https://..."). + assert "https://" in str(exc_info.value) + + +# ────────────────────────────────────────────────────────────────────────────── +# 16. MCPClientPool consumes mcp_servers_resolved, not mcp_servers +# (depends on Stream 2 AgentProfile.mcp_servers_resolved field) + + +@pytest.mark.skipif( + not hasattr(AgentProfile, "mcp_servers_resolved"), + reason="depends on Stream 2 AgentProfile.mcp_servers_resolved field", +) +def test_mcp_pool_consumes_mcp_servers_resolved_not_mcp_servers( + tmp_path: pathlib.Path, +) -> None: + """MCPClientPool receives the mcp_servers_resolved list, not mcp_servers.""" + agent_root = _make_agent_root(tmp_path) + # Write an mcp.md so mcp_servers is non-empty. + _write_mcp_md( + agent_root, + "# MCP servers\n\n## github\ncommand: npx\nargs: mcp-server-github\n", + ) + + resolved_spec = MCPServerSpec( + name="resolved-server", + command="python", + args=["-m", "resolved"], + env={}, + transport="stdio", + description="", + ) + backend = _make_mock_backend(specs=[resolved_spec]) + + agent = AtomicAgent( + name="test-agent", + agents_root=tmp_path / "agents", + mcp_server_registry_backend=backend, + ) + + # After construction, mcp_servers_resolved should contain the mock's output. + resolved = getattr(agent._profile, "mcp_servers_resolved", None) + assert resolved is not None, "Stream 2 field not populated" + assert len(resolved) == 1 + assert resolved[0].name == "resolved-server" + + +# ────────────────────────────────────────────────────────────────────────────── +# 17. Empty mcp.md construction succeeds with no exception + + +def test_empty_mcp_md_construction_succeeds(tmp_path: pathlib.Path) -> None: + """AtomicAgent construction succeeds when mcp.md is absent.""" + _make_agent_root(tmp_path) + # No mcp.md written -- the default filesystem backend returns []. + agent = AtomicAgent( + name="test-agent", + agents_root=tmp_path / "agents", + ) + assert agent is not None + assert agent.mcp_server_registry_backend is not None + + +# ────────────────────────────────────────────────────────────────────────────── +# 18. delegate.py CLI catches MCPRegistryError cleanly + + +def test_delegate_cli_catches_mcp_registry_error(tmp_path: pathlib.Path) -> None: + """delegate.py main() prints 'Error: ...' to stderr and returns 1 on MCPRegistryError.""" + from atomic_agents import delegate as delegate_mod + + agents_root = tmp_path / "agents" + _make_agent_root(tmp_path, "coordinator") + + # Patch AtomicAgent construction to raise MCPRegistryUnavailable. + with patch( + "atomic_agents.agent.AtomicAgent.__init__", + side_effect=MCPRegistryUnavailable("catalog down"), + ): + result = delegate_mod.main( + [ + "coordinator", + "--target", + "specialist", + "--work-item", + "do something", + "--agents-root", + str(agents_root), + ] + ) + + assert result == 1 + + +# ────────────────────────────────────────────────────────────────────────────── +# 19. profile.mcp_servers_resolved populated at construction +# (depends on Stream 2 AgentProfile.mcp_servers_resolved field) + + +@pytest.mark.skipif( + not hasattr(AgentProfile, "mcp_servers_resolved"), + reason="depends on Stream 2 AgentProfile.mcp_servers_resolved field", +) +def test_profile_mcp_servers_resolved_populated_at_construction( + tmp_path: pathlib.Path, +) -> None: + """agent._profile.mcp_servers_resolved equals the list from load_all_mcp_servers.""" + _make_agent_root(tmp_path) + spec = MCPServerSpec( + name="myserver", + command="node", + args=["server.js"], + env={}, + transport="stdio", + description="", + ) + backend = _make_mock_backend(specs=[spec]) + + agent = AtomicAgent( + name="test-agent", + agents_root=tmp_path / "agents", + mcp_server_registry_backend=backend, + ) + + resolved = getattr(agent._profile, "mcp_servers_resolved", None) + assert resolved is not None + assert len(resolved) == 1 + assert resolved[0].name == "myserver" + + +# ────────────────────────────────────────────────────────────────────────────── +# 20. profile.mcp_servers_resolved is empty when no mcp.md +# (depends on Stream 2 AgentProfile.mcp_servers_resolved field) + + +@pytest.mark.skipif( + not hasattr(AgentProfile, "mcp_servers_resolved"), + reason="depends on Stream 2 AgentProfile.mcp_servers_resolved field", +) +def test_profile_mcp_servers_resolved_empty_when_no_mcp_md( + tmp_path: pathlib.Path, +) -> None: + """agent._profile.mcp_servers_resolved is [] when no mcp.md exists.""" + _make_agent_root(tmp_path) + agent = AtomicAgent( + name="test-agent", + agents_root=tmp_path / "agents", + ) + resolved = getattr(agent._profile, "mcp_servers_resolved", None) + assert resolved is not None + assert resolved == [] From fdfccf457d9fc0775426cb9f260173f09c090e14 Mon Sep 17 00:00:00 2001 From: Dan Powers Date: Wed, 3 Jun 2026 13:00:02 -0500 Subject: [PATCH 3/6] fix(tests): correct rubric frontmatter shape in EvalRunner wiring tests Stream 1 used criteria..weight (1.0) but EvalRunner._load_rubric reads weights. (must sum to 100 per spec/08). The test rubric now uses the correct frontmatter. Co-Authored-By: Claude Opus 4.7 --- tests/test_mcp_server_registry_wiring.py | 14 ++++++-------- 1 file changed, 6 insertions(+), 8 deletions(-) diff --git a/tests/test_mcp_server_registry_wiring.py b/tests/test_mcp_server_registry_wiring.py index 1d47c8f..0b6ec59 100644 --- a/tests/test_mcp_server_registry_wiring.py +++ b/tests/test_mcp_server_registry_wiring.py @@ -210,10 +210,9 @@ def test_eval_runner_accepts_mcp_kwarg(tmp_path: pathlib.Path) -> None: (evals_dir / "rubric.md").write_text( dedent("""\ --- - threshold: 0.8 - criteria: - quality: - weight: 1.0 + threshold_pass: 4.0 + weights: + quality: 100 --- # Rubric """), @@ -246,10 +245,9 @@ def test_eval_runner_threads_mcp_kwarg_to_internal_agent( (evals_dir / "rubric.md").write_text( dedent("""\ --- - threshold: 0.8 - criteria: - quality: - weight: 1.0 + threshold_pass: 4.0 + weights: + quality: 100 --- # Rubric """), From ac1f33b8300a5c321ca6b71eeced29b6ffdd0687 Mon Sep 17 00:00:00 2001 From: Dan Powers Date: Wed, 3 Jun 2026 13:21:47 -0500 Subject: [PATCH 4/6] fix(mcp_registry): #201 PR 2 cross-model review army findings Three findings triple-confirmed across Codex structured review, Codex adversarial, and Claude adversarial subagent (plus the 5 specialist subagents). All three are real correctness issues, not stylistic. 1. Empty resolved list must be authoritative, not a missing-field signal (agent.py:3367-3370). The `... or self.config.mcp_servers` fallback resurrected stale mcp.md servers when the registry backend genuinely returned []. An operator pinning an HTTP catalog that lists zero MCP servers for the agent_scope would still get mcp.md subprocesses spawned. Fix: replace `or` with explicit `if hasattr ...` so empty resolved lists are respected as the authoritative answer. 2. Fail-closed wrapper too narrow (agent.py:585). The `except MCPRegistryUnavailable` only caught transient failures, leaving MCPRegistryDescriptorInvalid (corrupt mcp.md) and MCPServerConnectFailed (env-var unresolvable) to escape uncontextualized. Fix: broaden to `except MCPRegistryError` and preserve original exception type via `type(exc)(...)` so callers can still distinguish transient from permanent. Adds defensive `getattr` default on backend_id for robust error-message construction. 3. Doctor false-PASS on malformed mcp.md (doctor.py:2786). The probe called only list_mcp_servers() which by design swallows parse errors and returns []. Agent construction at runtime calls load_all_mcp_servers() which raises MCPRegistryDescriptorInvalid. Operator runs doctor, sees PASS, constructs agent, agent crashes. Fix: add a load_all_mcp_servers() probe in the doctor check after list_mcp_servers() succeeds. Maps MCPRegistryDescriptorInvalid to FAIL with operator-readable message; MCPRegistryUnavailable to WARN. Polish from the maintainability and performance specialists: - Remove `from dataclasses import replace as _dc_replace` inside the per-spec loop body in mcp_registry/filesystem.py:368 (already imported at module level). - Remove `import os` deferred inside doctor.py check function (already imported at module level). - Include exception text in the generic-FAIL message so operators have actionable detail. Tests added (+7 net new, 3141 collected, all pass): - tests/test_mcp_server_registry_wiring.py: descriptor-invalid propagation test pinning the broadened catch + empty-resolved authoritative test pinning no-fallback-to-mcp.md. - tests/test_mcp_server_registry_doctor.py (NEW): 5 doctor tests covering PASS with empty mcp.md, PASS with valid mcp.md, FAIL on unknown backend_id with URL credential redaction, FAIL on descriptor invalid, capability snapshot completeness. Mirrors the doctor-test shape of every other backend (lock, log, profile, tool registry, persona, corpus). Co-Authored-By: Claude Opus 4.7 --- atomic_agents/agent.py | 38 ++++- atomic_agents/doctor.py | 53 +++++- atomic_agents/mcp_registry/filesystem.py | 4 +- tests/test_mcp_server_registry_doctor.py | 196 +++++++++++++++++++++++ tests/test_mcp_server_registry_wiring.py | 87 ++++++++++ 5 files changed, 364 insertions(+), 14 deletions(-) create mode 100644 tests/test_mcp_server_registry_doctor.py diff --git a/atomic_agents/agent.py b/atomic_agents/agent.py index 71f5857..91c2273 100644 --- a/atomic_agents/agent.py +++ b/atomic_agents/agent.py @@ -74,6 +74,7 @@ get_default_corpus_backend, ) from .mcp_registry import ( + MCPRegistryError, MCPRegistryUnavailable, MCPServerRegistryBackend, _redact_for_error_message as _redact_mcp_registry_url, @@ -582,10 +583,17 @@ def __init__( _materialized_mcp_specs = ( self.mcp_server_registry_backend.load_all_mcp_servers() ) - except MCPRegistryUnavailable as exc: - raise MCPRegistryUnavailable( - f"[{self.mcp_server_registry_backend.backend_id}] catalog unreachable at " - f"agent construction: {_redact_mcp_registry_url(str(exc))}" + except MCPRegistryError as exc: + # Catch MCPRegistryError broadly (covers MCPRegistryUnavailable, + # MCPRegistryDescriptorInvalid, MCPRegistryAuthRequired). Re-raise + # preserving the original exception type so callers can distinguish + # transient (Unavailable) from permanent (DescriptorInvalid). + _safe_backend_id = getattr( + self.mcp_server_registry_backend, "backend_id", "unknown" + ) + raise type(exc)( + f"[{_safe_backend_id}] catalog probe failed at agent " + f"construction: {_redact_mcp_registry_url(str(exc))}" ) from exc # Populate mcp_servers_resolved on the profile via replace(). # Stream 2 adds the mcp_servers_resolved field to AgentProfile; this @@ -3364,10 +3372,24 @@ def call( # substrate-agnostic spec list. AgentConfig.mcp_servers stays as # self._profile.mcp_servers (the filesystem-parse path) for backward # compat on existing log/audit consumers. - _resolved_mcp_specs = list( - getattr(self._profile, "mcp_servers_resolved", None) - or self.config.mcp_servers - ) + # + # IMPORTANT: an empty resolved list is AUTHORITATIVE, not a + # missing-field signal. If the registry backend genuinely returns + # [] (e.g., operator pinned an HTTP catalog that lists zero MCP + # servers for this agent_scope), we MUST NOT fall back to + # config.mcp_servers (which may carry stale mcp.md specs). Cross- + # model review (Codex + Claude adversarial + plan-subagent prep + # pass) all flagged the `... or self.config.mcp_servers` fallback + # as the highest-priority issue: it lets the framework launch + # subprocesses the backend explicitly removed. The check below + # uses `hasattr` to distinguish "field missing entirely" from + # "field present but empty" -- the field is added in this same + # PR's Stream 2, so post-merge this always uses the resolved + # path. + if hasattr(self._profile, "mcp_servers_resolved"): + _resolved_mcp_specs = list(self._profile.mcp_servers_resolved) + else: + _resolved_mcp_specs = list(self.config.mcp_servers) if _resolved_mcp_specs and self.mcp_pool is None: # ── #89 PR 3b: Policy MCP-allowlist consultation ──────── # Consult Policy on each declared server. Emit a diff --git a/atomic_agents/doctor.py b/atomic_agents/doctor.py index 32ec849..1134d2d 100644 --- a/atomic_agents/doctor.py +++ b/atomic_agents/doctor.py @@ -2713,8 +2713,6 @@ def check_mcp_server_registry_backend(agent_root: Path) -> CheckResult: and ``check_tool_registry_backend``. ``run_doctor`` then exercises the backend through the agent's actual construction path. """ - import os - from .mcp_registry import ( MCPRegistryError, MCPRegistryUnavailable, @@ -2722,6 +2720,7 @@ def check_mcp_server_registry_backend(agent_root: Path) -> CheckResult: get_default_mcp_server_registry_backend, list_mcp_server_registry_backends, ) + from .mcp_registry.backend import MCPRegistryDescriptorInvalid raw = ( os.environ.get("ATOMIC_AGENTS_MCP_SERVER_REGISTRY_BACKEND", "").strip().lower() @@ -2778,11 +2777,59 @@ def check_mcp_server_registry_backend(agent_root: Path) -> CheckResult: return CheckResult( name="mcp-server-registry-backend", status=FAIL, - message=(f"backend {backend_id!r} probe raised {type(exc).__name__}"), + message=( + f"backend {backend_id!r} probe raised {type(exc).__name__}: {exc}" + ), fix_hint="See logs for the exception details.", detail={"backend_id": backend_id}, ) + # Predict agent-construction success: AtomicAgent.__init__ calls + # load_all_mcp_servers() at construction (spec/36 framework invariant). + # list_mcp_servers() above swallows parse errors and returns [], so a + # malformed mcp.md would PASS doctor but crash construction. Probe + # load_all_mcp_servers() here to catch descriptor errors that + # list_mcp_servers() hides. WARN on transient (already caught above); + # FAIL on permanent descriptor invalidity. + try: + backend.load_all_mcp_servers() + except MCPRegistryDescriptorInvalid as exc: + return CheckResult( + name="mcp-server-registry-backend", + status=FAIL, + message=( + f"{backend_id!r} backend has malformed descriptor: " + f"{_redact_for_error_message(str(exc))}" + ), + fix_hint=( + "Fix the descriptor (mcp.md sections require 'command:'). " + "Doctor probes this to predict agent construction; without " + "the fix every AtomicAgent for this agent will fail at " + "construction with MCPRegistryDescriptorInvalid." + ), + detail={"backend_id": backend_id, "mcp_server_count": len(refs)}, + ) + except MCPRegistryUnavailable: + # Transient failure during materialization; the list path already + # returned successfully so the backend is reachable but a server + # spec resolution (env var, validation) failed transiently. Treat + # as WARN, not FAIL: an operator may resolve by exporting the + # missing env var. + return CheckResult( + name="mcp-server-registry-backend", + status=WARN, + message=( + f"backend {backend_id!r} list ok but load_all_mcp_servers() " + f"raised transient failure (env var or validation)" + ), + fix_hint=( + "Verify required env vars are set in the doctor process. " + "Agent construction will fail with the same error until " + "resolved." + ), + detail={"backend_id": backend_id, "mcp_server_count": len(refs)}, + ) + caps = backend.capabilities return CheckResult( name="mcp-server-registry-backend", diff --git a/atomic_agents/mcp_registry/filesystem.py b/atomic_agents/mcp_registry/filesystem.py index e34a324..e0ef2de 100644 --- a/atomic_agents/mcp_registry/filesystem.py +++ b/atomic_agents/mcp_registry/filesystem.py @@ -365,9 +365,7 @@ def load_all_mcp_servers(self) -> list[MCPServerSpec]: ) continue resolved_env = _resolve_env_vars(spec.env, spec.name) - from dataclasses import replace as _dc_replace - - materialized_spec = _dc_replace(spec, env=resolved_env) + materialized_spec = replace(spec, env=resolved_env) if self._read_paths: validate_mcp_server_args(materialized_spec, self._read_paths) materialized.append(materialized_spec) diff --git a/tests/test_mcp_server_registry_doctor.py b/tests/test_mcp_server_registry_doctor.py new file mode 100644 index 0000000..938327e --- /dev/null +++ b/tests/test_mcp_server_registry_doctor.py @@ -0,0 +1,196 @@ +"""Tests for ``doctor.check_mcp_server_registry_backend`` (#201 PR 2). + +Cross-model review army surfaced this coverage hole: Testing specialist C4 +plus Security specialist plus Codex adversarial all independently flagged +that the new doctor check has no unit tests. Every parallel backend doctor +check (lock, log, profile, tool registry, persona, corpus) ships with 4-5 +dedicated tests. PR 1's P0 secret leak in ``mcp-registry show`` was caught +in part because the other backends had redaction tests. This file mirrors +that pattern so future regressions in the credential-redaction path or in +the descriptor-invalid probe (added per cross-model finding F6) are caught. + +Covers: +- PASS path: filesystem default with no mcp.md. +- PASS path: filesystem default with valid mcp.md. +- FAIL path: unknown backend_id where the env var value is a pasted URL with + credentials. The redaction MUST strip them. +- FAIL path: descriptor invalid (malformed mcp.md). This is the new probe + added in PR 2 to close the doctor-false-PASS gap. +- Capability snapshot completeness: detail dict includes every capability + field plus mcp_server_count. +""" + +from __future__ import annotations + +import pathlib +from textwrap import dedent +from unittest.mock import MagicMock + +import pytest + +from atomic_agents.doctor import ( + FAIL, + PASS, + check_mcp_server_registry_backend, +) +from atomic_agents.mcp_registry import ( + MCPRegistryDescriptorInvalid, + MCPServerRegistryBackend, +) + + +def _make_agent_root(tmp_path: pathlib.Path) -> pathlib.Path: + """Create a minimal agent dir for the doctor check.""" + agent_root = tmp_path / "agent" + agent_root.mkdir(parents=True) + return agent_root + + +def test_doctor_pass_filesystem_default_no_mcp_md( + tmp_path: pathlib.Path, monkeypatch: pytest.MonkeyPatch +) -> None: + """PASS when env var unset and no mcp.md exists. + + Zero-server deployments are normal (the common home-user shape) and + must report PASS, not WARN. + """ + monkeypatch.delenv("ATOMIC_AGENTS_MCP_SERVER_REGISTRY_BACKEND", raising=False) + agent_root = _make_agent_root(tmp_path) + result = check_mcp_server_registry_backend(agent_root) + assert result.status == PASS + assert "filesystem" in result.message + assert result.detail["backend_id"] == "filesystem" + assert result.detail["mcp_server_count"] == 0 + + +def test_doctor_pass_filesystem_default_with_valid_mcp_md( + tmp_path: pathlib.Path, monkeypatch: pytest.MonkeyPatch +) -> None: + """PASS when valid mcp.md is present. + + list_mcp_servers + load_all_mcp_servers both succeed; the count + reflects what is mounted. + """ + monkeypatch.delenv("ATOMIC_AGENTS_MCP_SERVER_REGISTRY_BACKEND", raising=False) + agent_root = _make_agent_root(tmp_path) + (agent_root / "mcp.md").write_text( + dedent( + """\ + ## myserver + command: echo + + args: + - hello + """ + ), + encoding="utf-8", + ) + result = check_mcp_server_registry_backend(agent_root) + assert result.status == PASS + assert result.detail["mcp_server_count"] == 1 + + +def test_doctor_fail_unknown_backend_id_redacts_url_credentials( + tmp_path: pathlib.Path, monkeypatch: pytest.MonkeyPatch +) -> None: + """FAIL with URL credentials stripped when operator pastes a URL. + + PR 1's P0 class of bug: operator accidentally sets the BACKEND env var + to a full URL instead of the _URL variant. The FAIL message MUST NOT + echo the credential. This regression test pins ``_redact_for_error_message`` + usage (the helper handles ``://`` scheme heuristic AND DSN-style + ``user:pass@host`` patterns AND length truncation; the inline + truncation in ``check_tool_registry_backend`` misses DSN-style values). + """ + monkeypatch.setenv( + "ATOMIC_AGENTS_MCP_SERVER_REGISTRY_BACKEND", + "https://admin:supersecret@catalog.example.com/mcp", + ) + agent_root = _make_agent_root(tmp_path) + result = check_mcp_server_registry_backend(agent_root) + assert result.status == FAIL + # Credentials MUST NOT appear anywhere. + rendered = result.message + " " + result.fix_hint + " " + repr(result.detail) + assert "supersecret" not in rendered + assert "admin" not in rendered + # The redacted form is present. + assert "https://..." in rendered + + +def test_doctor_fail_descriptor_invalid_predicts_construction_failure( + tmp_path: pathlib.Path, monkeypatch: pytest.MonkeyPatch +) -> None: + """FAIL on malformed mcp.md so doctor predicts agent construction failure. + + Pre-fix: doctor probed only ``list_mcp_servers()`` which swallowed parse + errors and returned ``[]``. The agent at construction calls + ``load_all_mcp_servers()`` which raises ``MCPRegistryDescriptorInvalid``. + Operator runs doctor, sees PASS, constructs agent, agent crashes. + + Cross-model review army (Codex adversarial Medium + Claude adversarial + F6 + Testing specialist) all flagged this as the highest-priority real + correctness bug in the PR. The fix adds a ``load_all_mcp_servers()`` + probe after ``list_mcp_servers()`` succeeds. + """ + monkeypatch.delenv("ATOMIC_AGENTS_MCP_SERVER_REGISTRY_BACKEND", raising=False) + agent_root = _make_agent_root(tmp_path) + # Inject a backend whose list_mcp_servers succeeds (returns []) but + # load_all_mcp_servers raises MCPRegistryDescriptorInvalid. This is + # the exact divergence cross-model review caught: list swallows parse + # errors, load_all surfaces them. + fake_backend = MagicMock(spec=MCPServerRegistryBackend) + fake_backend.backend_id = "filesystem" + fake_backend.list_mcp_servers.return_value = [] + fake_backend.load_all_mcp_servers.side_effect = MCPRegistryDescriptorInvalid( + "mcp.md at /tmp/agent/mcp.md could not be parsed: malformed YAML" + ) + # Patch the factory the doctor calls so the fake backend is used. + import atomic_agents.mcp_registry as mcp_registry_pkg + + monkeypatch.setattr( + mcp_registry_pkg, + "get_default_mcp_server_registry_backend", + lambda agent_root, read_paths: fake_backend, + ) + result = check_mcp_server_registry_backend(agent_root) + assert result.status == FAIL + assert "malformed descriptor" in result.message + assert "construction" in result.fix_hint + # backend_id appears so operator knows which backend is broken. + assert "filesystem" in result.message + + +def test_doctor_pass_capability_snapshot_includes_all_fields( + tmp_path: pathlib.Path, monkeypatch: pytest.MonkeyPatch +) -> None: + """PASS detail dict carries every MCPServerRegistryCapabilities field. + + Operator-facing JSON output for ``atomic-agents doctor --json`` reads + this dict. A future capability field added to + ``MCPServerRegistryCapabilities`` without an update here would be + silently absent from the snapshot, which is a documentation drift + pattern. + """ + monkeypatch.delenv("ATOMIC_AGENTS_MCP_SERVER_REGISTRY_BACKEND", raising=False) + agent_root = _make_agent_root(tmp_path) + result = check_mcp_server_registry_backend(agent_root) + assert result.status == PASS + expected_keys = { + "backend_id", + "supports_install", + "supports_uninstall", + "supports_capability_handshake", + "supports_audit", + "durable", + "mcp_server_count", + } + assert expected_keys.issubset(result.detail.keys()) + # Capability values come from FilesystemMCPServerRegistryBackend + # capabilities at PR 2 baseline: install/uninstall False (filesystem + # writes ship at PR 3), capability_handshake False (HTTP only), + # audit False, durable True. + assert result.detail["supports_install"] is False + assert result.detail["supports_uninstall"] is False + assert result.detail["supports_capability_handshake"] is False + assert result.detail["supports_audit"] is False + assert result.detail["durable"] is True diff --git a/tests/test_mcp_server_registry_wiring.py b/tests/test_mcp_server_registry_wiring.py index 0b6ec59..16072a6 100644 --- a/tests/test_mcp_server_registry_wiring.py +++ b/tests/test_mcp_server_registry_wiring.py @@ -26,6 +26,7 @@ from atomic_agents import AtomicAgent from atomic_agents.mcp_registry import ( + MCPRegistryDescriptorInvalid, MCPRegistryUnavailable, MCPServerRegistryBackend, ) @@ -595,3 +596,89 @@ def test_profile_mcp_servers_resolved_empty_when_no_mcp_md( resolved = getattr(agent._profile, "mcp_servers_resolved", None) assert resolved is not None assert resolved == [] + + +# ────────────────────────────────────────────────────────────────────────────── +# 21. Descriptor-invalid propagation (from cross-model review army) +# +# The fail-closed wrapper at agent.py:585 was broadened from +# ``except MCPRegistryUnavailable`` to ``except MCPRegistryError`` so +# permanent descriptor errors (malformed mcp.md) propagate with backend_id +# context, not just transient unreachability. Testing C3 and Adversarial F1 +# both flagged the missing test for this propagation path. + + +def test_fail_closed_wraps_mcp_registry_descriptor_invalid( + tmp_path: pathlib.Path, +) -> None: + """MCPRegistryDescriptorInvalid from load_all_mcp_servers propagates as itself. + + The fail-closed wrapper preserves exception type via ``type(exc)(...)`` + so callers can still ``except MCPRegistryDescriptorInvalid`` for parse + errors. The wrapped message adds the backend_id context. + """ + _make_agent_root(tmp_path) + backend = _make_mock_backend() + backend.backend_id = "filesystem" + backend.load_all_mcp_servers.side_effect = MCPRegistryDescriptorInvalid( + "mcp.md section 'github' missing required 'command:' field" + ) + with pytest.raises(MCPRegistryDescriptorInvalid) as excinfo: + AtomicAgent( + name="test-agent", + agents_root=tmp_path / "agents", + mcp_server_registry_backend=backend, + ) + # Re-raised message includes backend_id context. + assert "filesystem" in str(excinfo.value) + # Original cause preserved via ``raise ... from exc``. + assert isinstance(excinfo.value.__cause__, MCPRegistryDescriptorInvalid) + + +# ────────────────────────────────────────────────────────────────────────────── +# 22. Empty resolved list is authoritative (Codex HIGH + Adversarial F2) +# +# Triple-confirmed cross-model finding. When the registry backend returns +# an empty list, the framework MUST NOT fall back to config.mcp_servers +# (which carries any stale mcp.md specs). The agent.py code path was +# changed from `... or self.config.mcp_servers` to `if/else` so empty is +# respected as the authoritative answer. + + +def test_empty_resolved_list_does_not_fall_back_to_mcp_md( + tmp_path: pathlib.Path, +) -> None: + """Backend returning [] is authoritative even when mcp.md has servers. + + Pre-fix: ``... or self.config.mcp_servers`` resurrected stale mcp.md + entries. Post-fix: ``if hasattr ...`` branches respect the empty list + as the operator's pinned-catalog answer. + """ + agent_root = _make_agent_root(tmp_path) + # mcp.md declares one server. Without the fix this would be picked up. + _write_mcp_md( + agent_root, + dedent( + """\ + ## stale-server + command: nonexistent-binary + + args: + - --legacy + """ + ), + ) + # Backend returns [] (operator pinned empty catalog). + backend = _make_mock_backend(specs=[]) + agent = AtomicAgent( + name="test-agent", + agents_root=tmp_path / "agents", + mcp_server_registry_backend=backend, + ) + # Profile field is empty (authoritative). + resolved = getattr(agent._profile, "mcp_servers_resolved", None) + assert resolved == [], "backend returning [] must populate empty list on profile" + # Pool stays None — agent.call() not invoked here, but the consumption + # site at agent.py:3367 now uses the resolved list ([]) and skips the + # pool spin-up. The stale-server from mcp.md is NOT in the resolved list. + assert agent.mcp_pool is None From cd6c17598023ea5b5b49e1552da26f862e6e669a Mon Sep 17 00:00:00 2001 From: Dan Powers Date: Wed, 3 Jun 2026 13:24:01 -0500 Subject: [PATCH 5/6] docs(changelog): #201 PR 2 of 5 entry MCPServerRegistryBackend wiring + AgentProfile.mcp_servers_resolved sibling field + doctor check + IRON RULE regression suite + cross-model review army fixes (empty-list authoritative, fail-closed broader catch, doctor false-PASS closure). Co-Authored-By: Claude Opus 4.7 --- CHANGELOG.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/CHANGELOG.md b/CHANGELOG.md index a9b38df..dd04403 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -51,6 +51,8 @@ CHANGELOG entry. ### Added +- **MCPServerRegistryBackend wiring + AgentProfile.mcp_servers_resolved sibling field + doctor.check_mcp_server_registry_backend + IRON RULE regression suite + cross-model fail-closed re-raise** ([#201](https://github.com/dep0we/atomic-agents-stack/issues/201) -- MCPServerRegistryBackend arc **PR 2 of 5**). Operators with a registry-backed MCP catalog now have agent construction actually consult the backend: previous PRs scaffolded the Protocol; this PR ships the framework integration. `AtomicAgent.__init__` accepts a new `mcp_server_registry_backend` constructor kwarg + class-level type annotation, resolves the default via `get_default_mcp_server_registry_backend(self.agent_root, read_paths)` when none is supplied, calls `backend.load_all_mcp_servers()` to materialize the spec list, and populates the new `AgentProfile.mcp_servers_resolved` field via `dataclasses.replace()` BEFORE `_load_config()` runs. `MCPClientPool` at `agent.call()` now reads `self._profile.mcp_servers_resolved` (the substrate-agnostic materialized list) instead of `self.config.mcp_servers` (the filesystem-parse path, which stays in place for backward-compat audit/log consumers). The framework-level fail-closed invariant per spec/36 lines 519-522 propagates `MCPRegistryError` from the backend probe with backend_id context plus URL credential redaction; original exception type is preserved via `type(exc)(...)` so callers can distinguish `MCPRegistryUnavailable` (transient) from `MCPRegistryDescriptorInvalid` (permanent). Per-runner kwargs ship on all three runners: `OutcomeRunner(..., mcp_server_registry_backend=...)` and `EvalRunner(..., mcp_server_registry_backend=...)` thread the kwarg to the internal `AtomicAgent` construction sites; `DreamRunner` stores it for API parity (no internal `AtomicAgent` construction in v1, mirroring the CorpusBackend precedent). `delegate.py` adds explicit-only threading via `_mcp_server_registry_backend_was_explicit` flag (mirrors PersonaBackend D-ER-2 at `agent.py:401` and CorpusBackend at `agent.py:431`): default-resolved backends do NOT leak the coordinator's `agent_root` to delegates because MCP catalog is per-agent semantic context, not fleet-scoped; the `MCPRegistryError` family is added to the delegate CLI catch block so catalog-down at delegate construction surfaces as a clean operator-facing error instead of a Python traceback. `AgentProfile` gains the `mcp_servers_resolved: list[MCPServerSpec]` sibling field (spec/24 Decision 1 addendum) placed LAST in the dataclass to honor Python's required-fields-before-defaults ordering rule; `field` is added to the `dataclasses` import; `to_dict()` always serializes the field as `[]` regardless of runtime value (snapshot security: resolved `$VAR` env values never land in snapshot JSON files on disk, preserving spec/24 D1's raw-text-shadow security intent); `from_dict()` reconstructs the field via the new `_mcp_spec_from_dict` helper which also fixes a pre-existing latent bug where the `mcp_servers` fallback path returned raw dicts instead of `MCPServerSpec` instances. SQLite backend continues to ride the existing `profile_json` blob round-trip; zero schema migration required, schema stays at v2. Filesystem profile backend additionally sorts `mcp_servers` lexicographically by name on load to align with spec/36 MUST 5 across all backends; pre-#201 operators with alphabetical `mcp.md` see no change; operators with non-alphabetical `mcp.md` see a one-time invisible reorder of `agent.config.mcp_servers` introspection (MCP tool resolution and audit semantics unchanged because qualified names are unique per server). `doctor.check_mcp_server_registry_backend` ships as the 13th `check_*_backend` with PASS/WARN/FAIL ladder: PASS on `list_mcp_servers` + `load_all_mcp_servers` both succeeding (the new dual-probe closes a Codex/Claude-adversarial-triple-confirmed false-PASS hole where a malformed `mcp.md` would PASS doctor then crash agent construction); WARN on transient `MCPRegistryUnavailable`; FAIL on unknown `backend_id` with credential-redacted echo via the shared `_redact_for_error_message` helper from `mcp_registry/__init__.py` (NOT the inline truncation used in `check_tool_registry_backend` which misses DSN-style values per the PR 1 P0 redaction discipline). Capability snapshot in the detail dict includes all 5 `MCPServerRegistryCapabilities` fields plus `mcp_server_count` so operator `atomic-agents doctor --json` consumers see the runtime view. Two PR 1 latent bug fixes surfaced by PR 2's fail-closed semantic: `FilesystemMCPServerRegistryBackend.list_mcp_servers` now discriminates `FileNotFoundError`/ENOENT (returns `[]`, the normal absent-file path) from other `OSError` (raises `MCPRegistryUnavailable` so misconfigured PermissionError after a Kubernetes deploy actually surfaces); `load_all_mcp_servers` overridden directly instead of delegating to `_default_load_all` so it does a single read-parse with proper exception mapping (ENOENT to empty, OSError to MCPRegistryUnavailable, parse error to MCPRegistryDescriptorInvalid, env-var unresolvable to MCPServerConnectFailed) rather than masking transient and parse failures through `list_mcp_servers`'s soft-degrade path. spec/36 line 599 gains a one-sentence amendment correcting the wiring location from `_load_config()` to `AtomicAgent.__init__` (spec/36 said the former, but `_load_config` is a pure reader of `self._profile` that cannot mutate it); spec/24 Decision 1 gains the locked D1 addendum documenting the sibling field, snapshot security clamp, and SQLite forward-compat. **Cross-model review army** (4 specialists + Claude adversarial subagent + Codex adversarial via `codex exec` + Codex structured review via `codex review`) surfaced 15+ findings; the 3 highest-priority were triple-confirmed across models and applied inline: (1) empty resolved list now treated as authoritative instead of falling back to `config.mcp_servers` (Codex HIGH + Claude adversarial F2 + Codex structured P2; the prior `... or self.config.mcp_servers` resurrected stale `mcp.md` servers when the operator pinned an empty catalog); (2) fail-closed wrapper broadened from `MCPRegistryUnavailable` to `MCPRegistryError` so descriptor-invalid and connect-failed also propagate with backend_id context; (3) doctor false-PASS closed by adding the `load_all_mcp_servers` probe after `list_mcp_servers` succeeds. **Test delta: +46 net new (3192 collected before fixes, 3199 after the doctor + wiring additions, 3141 passed + 58 skipped + 0 failures + 0 regressions across the full suite).** Test files: `tests/test_mcp_server_registry_wiring.py` (~22 tests covering per-runner kwargs, delegate explicit-only behavior, fail-closed re-raise with backend_id + URL redaction + MCPRegistryDescriptorInvalid propagation, profile augmentation, empty-resolved-list authoritative, CLI MCPRegistryError catch), `tests/test_mcp_server_registry_migration_regression.py` (10 IRON RULE byte-identity tests pinning pre-#201 behavior when no backend is configured: empty/missing mcp.md returns empty specs, single server no env vars byte-identical to `parse_mcp_md` baseline, multiple servers in sorted order matching the spec/36 MUST 5 sort, env-var resolved specs match the pre-#201 path, config equals profile resolved, pool `_specs` byte-identical to resolved list, `MCPRegistryUnavailable` raises at construction with no silent degrade), `tests/test_profile_mcp_servers_resolved.py` (9 tests covering field existence + LAST-in-dataclass constraint + to_dict always-`[]` + from_dict round-trip with key present + backward-compat with key absent + `_mcp_spec_from_dict` helper round-trip + `mcp_servers` fallback bug fix + snapshot/restore resets to `[]` + SQLite JSON blob round-trip), `tests/test_mcp_server_registry_doctor.py` (5 tests covering PASS with empty mcp.md, PASS with valid mcp.md, FAIL on unknown backend_id with URL credential redaction including DSN-style `user:pass@host` patterns, FAIL on descriptor invalid via the new probe, capability snapshot completeness). Mirrors the doctor-test shape of every other backend (lock, log, profile, tool registry, persona, corpus). Pre-impl prep: 5-stream parallel Sonnet prep pass parametrized on failure-mode dimensions (wiring shape vs precedent; AgentProfile round-trip; IRON RULE design; doctor PASS/WARN/FAIL ladder; fail-closed wiring) caught 71 findings BEFORE any code shipped, beating PR 1's 35-finding bar. Two binary design decisions locked via AskUserQuestion before implementer dispatch: server list ordering (Q1, locked to "sort both paths to lexicographic" in the filesystem profile backend so spec/36 MUST 5 holds across all backends with zero functional change to operators); snapshot serialization shape (Q2, locked to "always serialize `mcp_servers_resolved` as `[]`" so resolved MCP env secrets never persist to disk in snapshot files). Three commits authored as parallel Sonnet implementer streams in isolated git worktrees per the aggressive-Sonnet-delegation-when-on-Opus discipline (Stream 1 owned `agent.py` + runners + delegate + spec/36; Stream 2 owned profile/* + doctor + mcp_registry/filesystem.py bug fix + spec/24); merged cleanly with zero conflicts because the file-set partition was disjoint by design. Known follow-up coverage gaps (filed as P2 for PR 3 prep): a future PR can add chmod-style PermissionError tests for `load_all_mcp_servers`'s new OSError mapping (analogous coverage already exists for `load_mcp_server`); threading test strengthening to patch `AtomicAgent.__init__` and capture kwargs at runner `run()` time instead of asserting on the stored field. After PR 5 of 5 of #201 lands, atomic-agents-stack hits v1.0 with twelve of twelve backend protocols shipped; PR 2 extends the post-#285-revert /ship streak to 9. + - **MCPServerRegistryBackend Protocol scaffolding + FilesystemMCPServerRegistryBackend reference impl + DRAFT spec/36 + `atomic-agents mcp-registry` CLI** ([#201](https://github.com/dep0we/atomic-agents-stack/issues/201) -- MCPServerRegistryBackend arc **PR 1 of 5**, the v1.0 closer arc). Operators with a fleet of agents stop hand-syncing `mcp.md` files across N agents. The Protocol seals the per-agent mounted-server-source seam so SaaS / org-internal / public-registry futures arrive via the same env-var pin pattern used by the eleven prior backend protocols. New `atomic_agents/mcp_registry/` package with `__init__.py` (`register_mcp_server_registry_backend` / `get_mcp_server_registry_backend` / `list_mcp_server_registry_backends` + `get_default_mcp_server_registry_backend` factory + `_redact_for_error_message` helper with DSN heuristic for non-scheme connection strings), `types.py` (`MCPServerRef` frozen dataclass with `to_dict`/`from_dict` roundtrip, `MCPServerRegistryCapabilities` with 5 capability flags including the static-vs-dynamic distinction novel to this Protocol, `ValidationResult`), `backend.py` (`@runtime_checkable` `MCPServerRegistryBackend` Protocol with 8 methods + `@property backend_id` + `@property capabilities`, 6 exception classes, `_default_load_all(backend)` module-level helper that backends MAY override for performance), and `filesystem.py` (`FilesystemMCPServerRegistryBackend(agent_root, read_paths, *, lock_backend=None)` reading `/mcp.md`; install/uninstall raise `NotImplementedError` at PR 1 with `supports_install=False` static class default per the capability-honesty contract (MUST 3); install/uninstall ship at PR 3 alongside `LockBackend` integration). `parse_mcp_md_text` and `_build_spec` in `atomic_agents/mcp.py` gain an additive `resolve_env: bool = True` keyword-only parameter; the `_flush()` closure threads the parameter through to `_build_spec`; a new module-level `_resolve_env_vars(env, server_name)` helper is extracted from `_build_spec`'s inline resolution block and is now shared with `FilesystemMCPServerRegistryBackend.load_mcp_server` so the spec/19-canonical `MCPServerConnectFailed` message shape stays identical across both call paths. The public `parse_mcp_md` at `mcp.py:410` does NOT receive the parameter; it always resolves at parse time and the docstring documents the API asymmetry (callers needing lazy resolution call `parse_mcp_md_text(..., resolve_env=False)` directly). Two hidden callers preserved: `profile/types.py:336` (`AgentProfile.from_dict`) and `profile/filesystem.py:374` (`FilesystemAgentProfileBackend.load_profile`) each gain an inline comment documenting the `resolve_env=True` invariant their `AgentProfile.mcp_servers` consumers depend on. CLI gains the `atomic-agents mcp-registry` subcommand group with four read-only subcommands at PR 1 (install/uninstall ship in PR 3): `list` prints a table of mounted servers, `show ` prints the spec with `$VAR` env refs UNRESOLVED (re-parses mcp.md with `resolve_env=False` to avoid leaking resolved secrets to stdout per the cross-model-confirmed P0 from /ship's review army), `validate ` runs the static descriptor check, and `refresh-capabilities` prints the runtime capability view (no-op on filesystem; HTTP backend at PR 4 re-probes the catalog server). `docs/spec/36-mcp-server-registry-backend.md` ships as DRAFT at 696 lines with 10 normative MUSTs in the Implementer Contract (placed before Shipping plan per spec/34 ordering): 1 name charset validation at API boundary, 2 side-effect-free construction, 3 capability honesty including the static-vs-dynamic distinction, 4 URL credential redaction, 5 cross-agent isolation at storage layer, 6 backend_id stability + close() idempotency, 7 transient-vs-permanent failure honesty (MCPRegistryUnavailable vs MCPServerNotInRegistry distinction), 8 env-var resolution semantics on MCPServerSpec at load_mcp_server time, 9 install/uninstall atomicity (filesystem uses LockBackend lease, HTTP relies on catalog server transactional storage), 10 `load_all_mcp_servers` consistency (output semantically equivalent to `[load_mcp_server(ref.name) for ref in list_mcp_servers()]`). The spec re-languages "catalog of available servers" to "mounted server source" per Codex's outside-voice finding so the per-`agent_scope` filtering model is unambiguous; HTTP backend tier-1/2/3 capability negotiation via `GET /capabilities` and `OPTIONS /mcp-servers` ships at PR 4. `AgentProfile.mcp_servers_resolved` sibling field for the substrate-agnostic audit capture (spec/24 D1 addendum) ships at PR 2 alongside the wiring. Tests: 3090 to 3101 passing, 52 skipped, zero regressions across the full suite. 100 new MCP-registry tests across 3 files (`tests/test_mcp_server_registry_conformance.py` 42 tests covering MUSTs 1-3, 5-8, 10 parametrized across registered backends with HTTP joining at PR 4; `tests/test_mcp_server_registry_filesystem_backend.py` 32 tests + 2 skipped (install/uninstall stubs reserved for PR 3 with `pytest.mark.skip(reason="land in PR 3")` markers); `tests/test_mcp_registry_cli.py` 16 tests covering all four CLI subcommands + exception handler paths including the secret-leak regression test). Pre-impl review: 5-stream parallel Sonnet plan-subagent pass (Protocol surface, conformance test plan, filesystem correctness, parse_mcp_md_text refactor risk, spec/36 coherence) caught 5 P0 + 15 P1 + 15 P2 = 35 findings BEFORE any code shipped (9 times the CorpusBackend PR 4 prep pass's 4 findings, tracking the v1.0-closer-arc complexity). /ship review army: 6 parallel reviewers (4 specialists: testing, maintainability, security, performance + Claude adversarial subagent + Codex adversarial via codex exec) surfaced 30 findings post-implementation including the cross-model triple-confirmed P0 (`mcp-registry show` leaked resolved `$VAR` env values to stdout, found independently by Codex + the security specialist + Claude adversarial) plus 5 P1 (OSError misclassified as permanent-absence violating MUST 7, malformed-section silent drop, CLI `read_paths=[]` skipping path-traversal validation, `MCPServerConnectFailed` escaping the CLI exception handler with a Python traceback, `backend_id` property absent from the Protocol declaration despite being implemented on the class) plus 8 mechanical P2 AUTO-FIXes (section names not validated before listing, `_redact_for_error_message` missing the DSN heuristic for non-scheme connection strings, vacuous test assertions, hardcoded `/tmp/dummy-agent` path, 6 late imports moved to top of file, unused Protocol import in filesystem.py, misleading "Re-probe" CLI help text on filesystem backend). All P0 + P1 + critical P2s addressed inline before push; 15 regression tests added covering each fix including `test_cli_show_does_not_leak_resolved_env_values` that asserts a resolved env value (the actual GitHub token) does NOT appear in stdout while the unresolved `$VAR` reference DOES (operator-debugging utility preserved without the leak). After PR 4 of 5 of #201 lands, `atomic-agents-stack` hits v1.0 with twelve of twelve backend protocols shipped; PR 1 extends the post-#285-revert /ship streak to 8. - **`atomic-agents init` wizard PR 2 of 2: researcher + writer templates + Add-to-it recovery merge + 8 polish items** ([#94](https://github.com/dep0we/atomic-agents-stack/issues/94) -- init-wizard arc **PR 2 of 2**, arc closer). Operators now have three starter templates instead of one: `advisor` (general-purpose, Cautious autonomy), `researcher` (curiosity-first investigator with research integrity layer 1 ON, raw/ source intake, $1.50/day cost cap for multi-source synthesis, Cautious preset), and `writer` (voice-first content agent with Sonnet primary + Haiku fallback, drafts/ + revisions/ write paths, Cautious preset). Each template declares its operating mode explicitly per spec/01 (advisor and writer reactive; researcher hybrid for sustained multi-session investigations). The advisor template gains an `## Operating mode` section so all three templates conform to spec/01's mode-declaration requirement. Re-running `atomic-agents init` on an existing agent now offers a third recovery branch: **Add to it** (in addition to Overwrite and Cancel). The collision prompt in `_check_collision` offers three options: overwrite, add_to_it, cancel. Choosing add_to_it triggers section detection via the ATX h2 state machine (`_extract_h2_headers`, `_detect_sections`): code fences, HTML comments, and YAML frontmatter are skipped to avoid false positives; setext-style headings (underline) fail closed. Detection failure routes the operator back to overwrite/cancel via a plain-English message. On detection success, `_compute_merged_content` performs a file-level section-aware merge for every file in `TEMPLATE_SECTION_SCHEMA[template_name]`: the preamble before the first h2 is preserved verbatim; orphan h2 sections (operator-authored, not in schema) are preserved verbatim in their original position; schema h2 sections merge with operator preamble + existing h3 subsections preserved in order + new template h3 subsections appended at end. A unified diff preview (`_render_diff_preview`) is shown with CRLF and UTF-8 BOM normalization so Windows-originating vaults do not show every line as changed; if zero files changed the preview exits early with "up to date" and skips the Confirm prompt. On confirmation, each file is committed via `_io.atomic_write` (tmp + fsync + rename) directly into `agent_dir` in sorted relpath order -- no staging directory. A mid-commit KeyboardInterrupt prints a summary of written vs pending files before re-raising so the operator knows exactly what landed. Operator-authored memory notes (under `memory/` except `INDEX.md`), journal entries, log files, and raw documents are never touched; schema-owned scaffolding files (`memory/INDEX.md`, `wiki/INDEX.md`) ARE rewritten because they are template-owned routing/structure files. Missing template-owned files trigger a backfill path (label `[new file]` in the diff preview). 8 polish items from PR 1 adversarial review land in this PR: `DEFAULT_AGENTS_ROOT` is now lazy-computed in `get_agents_root()` so test monkeypatches of `HOME` work correctly; `_doctor_handoff` splits its try/except around `run_doctor` and `render_human` so a render-phase failure surfaces the exit code rather than masquerading as inconclusive; `_from_template` gains a symmetric length check before the regex match so a too-long name produces the correct error message; `_test_call` extracts an `_types(mod, *names)` helper for the exception catalog so the dispatch reads cleanly and tolerates missing SDK installs; `_translate_oserror` adds a specific message for `errno.ENOENT` ("the folder disappeared between collision check and overwrite") so operators get actionable guidance instead of a generic write failure; `_walk_traversable` converts from recursion to iterative deque with a `MAX_TEMPLATE_DEPTH=16` cap to defend future deep template trees; backup and staging directory timestamps now include microseconds (`%Y%m%dT%H%M%S_%fZ`) so two simultaneous overwrites within the same second produce distinct names; the persona-backend warning prints a credential-redacted URL via a new `redact_url_credentials` helper (drops `user:pass@` from netloc while keeping scheme + host + path visible, fixing the original advisor-warning pattern that hid the host entirely). PR 1 adversarial Round 2 deferred a `KeyboardInterrupt` cleanup gap (the existing `except Exception` did not catch KeyboardInterrupt because it inherits from BaseException, not Exception); PR 2 restructures all wizard cleanup paths to `except BaseException` with explicit `KeyboardInterrupt` re-raise after cleanup, plus an outermost handler in `run_init` that prints `"Canceled."` and returns exit code 130 (when KI lands mid-`_commit_merges`, the per-commit summary already printed the written vs pending file list). `--from-template ` and `--list-templates` are now fully CI-friendly: the non-TTY guard and the API key pre-flight both carve out these paths (per amended spec/35 MUST 2, 7, and 11), so a CI build can scaffold an agent without an interactive terminal and without the operator setting `ANTHROPIC_API_KEY` at build time (doctor catches the missing key later at first run). `_cmd_list_templates` enumerates all three templates with one-line descriptions. The action class glosses in spec/28 amend to explicitly classify web search HTTP GETs as `read_only` (they do not change external state), so researcher templates use the same Cautious preset as advisor without paying judge_required overhead on every search query. spec/35 grows to 15 normative MUSTs with MUST 5 updated to distinguish operator-authored data files from schema-owned scaffolding files (memory/INDEX.md and wiki/INDEX.md are template-owned and ARE rewritten; operator notes, journal, log, raw are not) and a new MUST 15 documenting the section-detection algorithm and file-level merge contract. New constants in `atomic_agents/init/constants.py`: `TEMPLATE_PRESET_DEFAULTS` per-template preset map, `TEMPLATE_SECTION_SCHEMA` per-template per-file h2 header schema (the source of truth that Add-to-it section detection validates against), `MAX_TEMPLATE_DEPTH`, three new MSG_ constants for the new recovery flow paths, and the `redact_url_credentials` helper. New `atomic_agents/init/templates/researcher/` (7 files, 360 LOC) and `atomic_agents/init/templates/writer/` (7 files, 352 LOC) trees use the same 12 locked template variables as advisor with template-specific section schemas. 59 net new tests across `test_init_cli.py` (--list-templates enumerates 3; researcher and writer choices accepted; unknown still rejected), `test_init_wizard.py` (section detection state machine; Add-to-it dispatch + detection; detection-failure fallback to cancel; file-level merge success + failure + KI-safe; _split_sections/_join_sections round-trip; _compute_merged_content preamble preservation + orphan h2 position + h3 preservation; CRLF + BOM normalization; all 8 polish items; per-template preset dispatch), `test_init_templates.py` (researcher + writer schema conformance; structural conformance per spec/01; ALL template variables used; zero em dashes per template; Operating mode sections present), and `test_init_smoke.py` (researcher and writer e2e happy paths; --from-template works without API key per the P3 carve-out). Test suite: 2939 + 50 skipped to 3051 collected, zero regressions. Pre-impl prep (4 parallel subagents) caught 13 SEVERE + 15 HIGH + 17 MEDIUM + 13 LOW = 58 findings across the brief BEFORE any code shipped (exceeding PR 1 of #94's 53 findings in a smaller scope), including 6 SEVERE locks emerged from cross-corroboration (known_templates not expanded with cli.py choices, `_default_template_vars` hardcoding the advisor preset for all templates, Operating mode section missing from IDENTITY.md across all templates, KeyboardInterrupt bypassing the cleanup block, TEMPLATE_SECTION_SCHEMA values not locked in the brief, spec/35 MUST 5 + MUST 15 text not drafted in the brief). Three architectural decisions surfaced via AskUserQuestion gates: web search action class (locked to read_only with spec/28 amendment), writer template model default (Sonnet primary + Haiku fallback with Opus upgrade path documented inline), and --from-template carve-out from MUST 7 (no API key required for scaffold; CI-friendly end-to-end). With PR 2 of the arc landing, Issue #94 closes: the half-day home-user deploy is now an under-10-minute first-run experience with three starter shapes plus a non-destructive recovery flow for re-running on existing agents. From 60e36746343ae839e0309fdb9112c2db04fa415b Mon Sep 17 00:00:00 2001 From: Dan Powers Date: Wed, 3 Jun 2026 13:26:26 -0500 Subject: [PATCH 6/6] docs: update test counts to 3199 collected / 3141 passed / 58 skipped for #201 PR 2 Co-Authored-By: Claude Sonnet 4.6 --- CLAUDE.md | 4 ++-- README.md | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/CLAUDE.md b/CLAUDE.md index c6b4d2b..36cbd14 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -207,7 +207,7 @@ uv run pytest # full suite uv run pytest tests/test_.py -v # one module ``` -Run `uv run pytest --collect-only -q | tail -1` for the live test count (last refresh: 3,153 tests collected, 2026-06-03). New backend protocols add ~25 conformance + ~10 impl-specific tests. New features ship with tests. Migration-shaped PRs need parameterized fixture tests across the backend protocol — the conformance suite is what keeps the protocol honest. +Run `uv run pytest --collect-only -q | tail -1` for the live test count (last refresh: 3,199 tests collected, 2026-06-03). New backend protocols add ~25 conformance + ~10 impl-specific tests. New features ship with tests. Migration-shaped PRs need parameterized fixture tests across the backend protocol — the conformance suite is what keeps the protocol honest. ### Releases + SemVer @@ -341,7 +341,7 @@ These are not forbidden forever — they're explicitly deferred with rationale. ## Status -**v0.13.0, alpha, PUBLIC.** Core runtime stable. Test suite: run `uv run pytest --collect-only -q | tail -1` for the live count (last refresh: 3,153 tests collected, 2026-06-03). Capability-gated skips fall into four buckets — ToolRegistry conformance (filesystem-shape + `supports_uninstall=False` variants), AgentProfile (skill-content + filesystem-shape on SQLite), cross-process Redis (require real Redis instead of fakeredis), and judge-conformance dispatch (LLM-only + PolicyJudge concurrent-evaluate). Full CI runs against `uv sync --extra dev --extra openai --extra validation --extra redis`. **Eleven backend protocols shipped**: +**v0.13.0, alpha, PUBLIC.** Core runtime stable. Test suite: run `uv run pytest --collect-only -q | tail -1` for the live count (last refresh: 3,199 tests collected, 2026-06-03). Capability-gated skips fall into four buckets — ToolRegistry conformance (filesystem-shape + `supports_uninstall=False` variants), AgentProfile (skill-content + filesystem-shape on SQLite), cross-process Redis (require real Redis instead of fakeredis), and judge-conformance dispatch (LLM-only + PolicyJudge concurrent-evaluate). Full CI runs against `uv sync --extra dev --extra openai --extra validation --extra redis`. **Eleven backend protocols shipped**: - **MemoryBackend** (PR #57) — filesystem reference impl + conformance suite. - **LLMBackend** (#87) — Anthropic + OpenAI + Moonshot reference impls, registered at framework import; conformance suite parametrizes across all three. diff --git a/README.md b/README.md index 118a4da..69f88e4 100644 --- a/README.md +++ b/README.md @@ -282,7 +282,7 @@ Same pattern for OpenAI (`atomic-agents-openai`) and Moonshot (`atomic-agents-mo ## Repository structure - `atomic_agents/` — the Python package (runtime in `agent.py`; backend protocols in `memory/`, `_llm.py`, `_locks.py`, `_costs.py`, etc.; CLI in `cli.py`; preflight in `doctor.py`) -- `tests/` 3153 tests collected (3101 passing + 52 skipped), Python 3.11 + 3.12 matrix +- `tests/` 3199 tests collected (3141 passing + 58 skipped), Python 3.11 + 3.12 matrix - `docs/` — [spec entry point](docs/README.md), [`architecture.md`](docs/architecture.md), [`spec/`](docs/spec/) (31 locked docs + 4 RFCs/DRAFTs), [`deployment/`](docs/deployment/) (8 operator runbooks), [`samples/caldwell/`](docs/samples/caldwell/) (complete worked example), [`GOVERNANCE.md`](docs/GOVERNANCE.md), [`TENSIONS.md`](docs/TENSIONS.md), [`methodology.md`](docs/methodology.md) - `extras/` — operational templates (Claude Code skill wrappers, macOS LaunchAgent plists, cron examples) @@ -313,4 +313,4 @@ Before opening a PR, read [`CLAUDE.md`](CLAUDE.md) (the project's design ethos a ## Status -**v0.13.0, alpha.** Core runtime stable. 3153 tests collected (3101 passing + 52 skipped) on Python 3.11 / 3.12. Eleven of twelve backend protocols shipped (see the backend protocols table above); `MCPServerRegistryBackend` planned. The surface stabilizes at v1.0. Pre-1.0 — Minor releases may contain breaking changes (see [`docs/deployment/versioning.md`](docs/deployment/versioning.md)). Single-maintainer project; reference implementation anyone can use, fork, or extend. +**v0.13.0, alpha.** Core runtime stable. 3199 tests collected (3141 passing + 58 skipped) on Python 3.11 / 3.12. Eleven of twelve backend protocols shipped (see the backend protocols table above); `MCPServerRegistryBackend` planned. The surface stabilizes at v1.0. Pre-1.0 — Minor releases may contain breaking changes (see [`docs/deployment/versioning.md`](docs/deployment/versioning.md)). Single-maintainer project; reference implementation anyone can use, fork, or extend.