feat(agent): #201 PR 2 of 5. MCPServerRegistryBackend wiring + AgentProfile sibling field + doctor + IRON RULE regression#331
Merged
Conversation
…ing field + doctor check + bug fixes Stream 2 of PR 2 (parallel with Stream 1 which owns agent.py wiring and spec/36). Core changes: - profile/types.py: add `mcp_servers_resolved: list[MCPServerSpec] = field(default_factory=list)` as LAST field in AgentProfile (Python dataclass default placement rule). Adds `field` to dataclass import. to_dict() always emits `mcp_servers_resolved: []` (locked Q2 -- keeps resolved MCP env secrets out of snapshot JSON on disk). from_dict() reconstructs via _mcp_spec_from_dict for both mcp_servers and mcp_servers_resolved. Adds _mcp_spec_from_dict helper as inverse of existing _mcp_spec_to_dict. - Latent bug fix (free side-effect): the pre-existing mcp_servers fallback path in AgentProfile.from_dict() returned raw dicts instead of MCPServerSpec instances. Now fixed by routing through _mcp_spec_from_dict. This silently broke attribute access on profile.mcp_servers in database-round-trip scenarios. - profile/filesystem.py: sort mcp_servers lexicographically by name after parse (locked Q1). Aligns pre-#201 path with FilesystemMCPServerRegistryBackend.list_mcp_servers() which already sorts. Adds mcp_servers_resolved=[] to AgentProfile direct construction. - profile/sqlite.py: adds comment confirming mcp_servers_resolved rides through profile_json blob via to_dict/from_dict with no schema change. Schema stays at v2. - mcp_registry/filesystem.py PR 1 bug fix: list_mcp_servers() previously caught all OSError and returned [] -- this masked transient failures (PermissionError, IsADirectoryError) from the fail-closed path PR 2 wires. Fixed to mirror load_mcp_server's ENOENT discrimination: FileNotFoundError (race after exists() check) -> return []; other OSError -> raise MCPRegistryUnavailable. Also overrides load_all_mcp_servers() with a single read-parse (replacing _default_load_all delegation) so MCPRegistryDescriptorInvalid and MCPRegistryUnavailable propagate correctly from bulk load. - doctor.py: adds check_mcp_server_registry_backend(agent_root) with PASS/WARN/FAIL ladder. Uses _redact_for_error_message from mcp_registry/__init__.py (handles URL + DSN heuristics, distinct from the inline truncation in check_tool_registry_backend). Reads backend.capabilities as a property. Detail dict includes all 5 capability fields + mcp_server_count from list_mcp_servers(). Added to SKIP enumeration and dispatch after check_corpus_backend. - docs/spec/24-agent-profile-backend.md: D1 addendum (#201 PR 2) documenting the new field, its populate-via-agent.py-only rule, snapshot serialization security rationale, and SQLite schema verification. - tests/test_profile_mcp_servers_resolved.py (NEW, 9 tests): covers field existence + last placement, to_dict security clamping, from_dict round-trips, _mcp_spec_from_dict helper, the mcp_servers fallback bug fix, snapshot round-trip, and SQLite round-trip. - tests/test_mcp_server_registry_filesystem_backend.py: updated test_load_all_mcp_servers_delegates_to_default_load_all -> now tests direct single-read- parse behavior (the delegation test was correct for PR 1 behavior; it is wrong for PR 2's locked design). Removed unused _default_load_all import and os import. CHANGELOG note: mcp_servers sort order change in pre-#201 filesystem profiles is a one-time reorder in introspection output; no functional impact (declaration order was never a contract). The PR 1 ENOENT discrimination fix should be called out in the CHANGELOG entry. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… delegate explicit-only + IRON RULE regression suite Wires MCPServerRegistryBackend into AtomicAgent at the framework integration layer. Backend resolution and load_all_mcp_servers() probe live in AtomicAgent.__init__ after profile load (line 496) and before _load_config() is called (line 770). Spec/36 line 599 incorrectly said _load_config(); the spec text gets a one-sentence amendment in this same PR. _load_config() at agent.py:2840 is a pure reader of self._profile and cannot do the resolution. Framework-level fail-closed semantic per spec/36 lines 519-522: load_all_mcp_servers() call has no try/except for soft-degrade. MCPRegistryUnavailable propagates from __init__. A wrapper re-raise adds the backend_id and redacted URL to the operator- facing message per MUST 4. AgentProfile.mcp_servers_resolved is populated via dataclasses.replace(self._profile, mcp_servers_resolved=materialized) before _load_config() builds the AgentConfig. The MCP pool consumption site at agent.py:3294-3334 changes its source from self.config.mcp_servers to self._profile.mcp_servers_resolved. AgentConfig.mcp_servers semantics stay unchanged for backward compat on existing log/audit consumers. Per-runner kwargs added to OutcomeRunner, EvalRunner, and DreamRunner. Outcome and Eval thread to their internal AtomicAgent construction. DreamRunner stores on self for API parity (no internal AtomicAgent construction site in v1). Delegate threading is explicit-only via the _mcp_server_registry_backend_was_explicit flag, mirroring PersonaBackend D-ER-2 and CorpusBackend at agent.py:443-465. MCP catalog is per-agent semantic context (per spec/36 Decision 1); default-resolved backends do not leak the coordinator's agent_root to delegates. delegate.py CLI catch updated to include MCPRegistryError so catalog-down at delegate construction produces clean Error: stderr output, not a Python traceback. Tests: - tests/test_mcp_server_registry_wiring.py (~20 tests) covers per-runner kwarg threading, delegate explicit-only behavior, fail-closed re-raise on MCPRegistryUnavailable, fail-closed message includes backend_id + redacted URL, profile augmentation populates correctly. - tests/test_mcp_server_registry_migration_regression.py (~10 IRON RULE tests) pins pre-#201 byte-identical behavior when no backend is configured. Empty mcp.md, missing mcp.md, single server no env, multiple servers in sorted order, env-var resolved specs, config equals profile resolved, mcp_pool._specs byte identical, MCPRegistryUnavailable raises at construction. Tests that reference AgentProfile.mcp_servers_resolved depend on Stream 2's profile changes and pass once both streams merge. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Stream 1 used criteria.<dim>.weight (1.0) but EvalRunner._load_rubric reads weights.<dim> (must sum to 100 per spec/08). The test rubric now uses the correct frontmatter. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Three findings triple-confirmed across Codex structured review, Codex adversarial, and Claude adversarial subagent (plus the 5 specialist subagents). All three are real correctness issues, not stylistic. 1. Empty resolved list must be authoritative, not a missing-field signal (agent.py:3367-3370). The `... or self.config.mcp_servers` fallback resurrected stale mcp.md servers when the registry backend genuinely returned []. An operator pinning an HTTP catalog that lists zero MCP servers for the agent_scope would still get mcp.md subprocesses spawned. Fix: replace `or` with explicit `if hasattr ...` so empty resolved lists are respected as the authoritative answer. 2. Fail-closed wrapper too narrow (agent.py:585). The `except MCPRegistryUnavailable` only caught transient failures, leaving MCPRegistryDescriptorInvalid (corrupt mcp.md) and MCPServerConnectFailed (env-var unresolvable) to escape uncontextualized. Fix: broaden to `except MCPRegistryError` and preserve original exception type via `type(exc)(...)` so callers can still distinguish transient from permanent. Adds defensive `getattr` default on backend_id for robust error-message construction. 3. Doctor false-PASS on malformed mcp.md (doctor.py:2786). The probe called only list_mcp_servers() which by design swallows parse errors and returns []. Agent construction at runtime calls load_all_mcp_servers() which raises MCPRegistryDescriptorInvalid. Operator runs doctor, sees PASS, constructs agent, agent crashes. Fix: add a load_all_mcp_servers() probe in the doctor check after list_mcp_servers() succeeds. Maps MCPRegistryDescriptorInvalid to FAIL with operator-readable message; MCPRegistryUnavailable to WARN. Polish from the maintainability and performance specialists: - Remove `from dataclasses import replace as _dc_replace` inside the per-spec loop body in mcp_registry/filesystem.py:368 (already imported at module level). - Remove `import os` deferred inside doctor.py check function (already imported at module level). - Include exception text in the generic-FAIL message so operators have actionable detail. Tests added (+7 net new, 3141 collected, all pass): - tests/test_mcp_server_registry_wiring.py: descriptor-invalid propagation test pinning the broadened catch + empty-resolved authoritative test pinning no-fallback-to-mcp.md. - tests/test_mcp_server_registry_doctor.py (NEW): 5 doctor tests covering PASS with empty mcp.md, PASS with valid mcp.md, FAIL on unknown backend_id with URL credential redaction, FAIL on descriptor invalid, capability snapshot completeness. Mirrors the doctor-test shape of every other backend (lock, log, profile, tool registry, persona, corpus). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
MCPServerRegistryBackend wiring + AgentProfile.mcp_servers_resolved sibling field + doctor check + IRON RULE regression suite + cross-model review army fixes (empty-list authoritative, fail-closed broader catch, doctor false-PASS closure). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… for #201 PR 2 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
PR 2 of 5 of the MCPServerRegistryBackend arc (#201, the v1.0 closer). Wires PR 1's Protocol scaffolding into
AtomicAgent, ships theAgentProfile.mcp_servers_resolvedsibling field for substrate-agnostic audit (spec/24 D1 addendum), addsdoctor.check_mcp_server_registry_backendas the 13thcheck_*_backend, and pins pre-#201 byte-identical behavior via the IRON RULE regression suite. Three correctness fixes landed inline from cross-model review army (Codex + Claude adversarial + 4 specialists).Framework integration:
AtomicAgent.__init__acceptsmcp_server_registry_backendconstructor kwarg + class-level type annotation; defaults viaget_default_mcp_server_registry_backend(self.agent_root, read_paths); callsbackend.load_all_mcp_servers()to materialize the spec list; populatesAgentProfile.mcp_servers_resolvedviadataclasses.replace()BEFORE_load_config()runs.MCPClientPoolconsumption site atagent.call()now readsself._profile.mcp_servers_resolved(substrate-agnostic).self.config.mcp_serverssemantics preserved for backward-compat on log/audit consumers.MCPRegistryErrorfrom backend probe with backend_id context + URL credential redaction; original exception type preserved viatype(exc)(...).delegate.pyexplicit-only threading via_mcp_server_registry_backend_was_explicitflag (mirrors PersonaBackend D-ER-2 + CorpusBackend precedents).MCPRegistryErroradded todelegate.pyCLI catch block for clean operator UX on catalog-down at delegate construction.Profile changes:
AgentProfile.mcp_servers_resolved: list[MCPServerSpec]field added LAST in dataclass (Python required-fields-before-defaults rule).fieldimport added.to_dict()always serializes the field as[](snapshot security: resolved$VARenv values never persist to disk).from_dict()uses new_mcp_spec_from_dicthelper which also fixes a pre-existing latent bug where themcp_serversfallback returned raw dicts.profile_jsonJSON blob; zero schema migration, schema stays at v2.mcp_serverslexicographically on load (Q1 decision, aligns with spec/36 MUST 5).PR 1 bug fixes (surfaced by PR 2's fail-closed semantic):
FilesystemMCPServerRegistryBackend.list_mcp_serversnow discriminatesFileNotFoundError/ENOENT (returns[]) from otherOSError(raisesMCPRegistryUnavailable); previously swallowedPermissionErrorfrom misconfigured deploys.load_all_mcp_serversoverridden directly with single read-parse mapping ENOENT/OSError/parse/env-var to the proper exception types.Cross-model review army findings (3 triple-confirmed, applied inline):
... or self.config.mcp_serverspreviously resurrected stalemcp.mdservers when registry returned[](Codex HIGH + Claude adversarial F2 + Codex structured P2). Fixed via explicitif hasattr ...branch.MCPRegistryUnavailabletoMCPRegistryErrorsoMCPRegistryDescriptorInvalidpropagates with backend_id context.load_all_mcp_servers()probe afterlist_mcp_servers()succeeds (Codex MEDIUM + Claude adversarial F6 + Testing specialist).spec doc edits:
_load_config()toAtomicAgent.__init__.Test Coverage
Test count: +46 net new tests, 3199 collected, 3141 passed + 58 skipped, 0 failures, 0 regressions across the full suite.
Coverage gate: PASS (~93% AI-assessed after the doctor + descriptor-invalid tests landed). Documented follow-up coverage notes (filed for PR 3 prep, non-blocking): chmod-style PermissionError tests for
load_all_mcp_servers's new OSError mapping; threading test strengthening to patchAtomicAgent.__init__and capture kwargs at runnerrun()time.Tests: before 3153 -> after 3199 (+46 net new), broken down:
tests/test_mcp_server_registry_wiring.py(~22 wiring tests),tests/test_mcp_server_registry_migration_regression.py(10 IRON RULE byte-identity tests),tests/test_profile_mcp_servers_resolved.py(9 sibling field round-trip tests),tests/test_mcp_server_registry_doctor.py(5 doctor PASS/WARN/FAIL tests including DSN-style credential redaction and the descriptor-invalid FAIL path).Pre-Landing Review
Review army dispatched 4 specialists (testing, maintainability, security, performance) + Claude adversarial subagent + Codex adversarial (
codex exec) + Codex structured review (codex review). 15+ findings surfaced. The 3 highest-priority were triple-confirmed across models and applied inline (see Summary above). Polish findings applied: removed_dc_replaceimport inside loop body inmcp_registry/filesystem.py:368, removed deferredimport osinside doctor check function. Pre-existingmcp_serversdeserialization bug (latent since #63) fixed as a free side effect of adding_mcp_spec_from_dict.Codex structured review: PASS (no
[P1]markers, only the empty-list fallback P2 which was applied inline).Plan Completion
25/25 plan items DONE per the Step 8 plan-completion audit subagent. Zero deferred. Zero unverifiable. Plan source: design doc at
~/.gstack/projects/dep0we-atomic-agents-stack/dep0we-main-design-20260601-165020.md+ spec/36 PR 2 scope section + 5-stream prep pass synthesis.Documentation
README.md: test count updated from 3153 collected / 3101 passed / 52 skipped to 3199 collected / 3141 passed / 58 skipped (2 sites: repository structure section + status footer).
CLAUDE.md: test count updated from 3,153 to 3,199 collected (2 sites: Tests convention section + Status section).
All other docs current: CHANGELOG entry already in commit cd6c175; spec/24 D1 addendum and spec/36 line 599 amendment shipped in implementer commits; ROADMAP.md "Eleven of twelve" count and MCPServerRegistry "Planned" status remain accurate until PR 5 of 5.
Test plan
🤖 Generated with Claude Code