
Systemic MCP registry 401 failures block all agentic workflow safe outputs #26069

@jsquire


Summary

Agentic workflows that rely on MCP servers (github, safeoutputs) are failing at a high rate due to transient HTTP 401 responses from the Copilot MCP registry endpoint. When this occurs, the Copilot CLI blocks all non-default MCP servers as a safety measure, leaving the agent unable to produce structured safe outputs. The agent completes its analysis correctly via shell fallback but produces {"items":[]}, which gh-aw interprets as a failure.

Environment

  • Repository: Azure/azure-sdk-for-net
  • Workflows affected: issue-triage.md (issue triage) and update-samples-and-docs.md (documentation gap detection)
  • Engine: Copilot (default)
  • gh-aw version: Latest as of April 2026
  • Runner: ubuntu-latest

Reproduction

This cannot be reproduced on demand because it is a transient infrastructure failure. Failures arrive in clusters (e.g., 8 failures in about 36 hours) and then stop.

Failure Pattern

Agent process log signature

GET api.github.com/copilot/mcp_registry → 401 Unauthorized

Followed by:

filtered [...github, safeoutputs] (blocked by policy)

What happens

  1. Copilot CLI fetches api.github.com/copilot/mcp_registry during startup to validate MCP servers against org policy
  2. The endpoint returns HTTP 401
  3. CLI blocks ALL non-default MCP servers as a safety measure — both github and safeoutputs are filtered out
  4. Agent falls back to shell commands (curl, grep, gh CLI) and performs correct analysis
  5. Agent cannot call any safe-output MCP tools (add-labels, add-comment, assign-to-user, etc.)
  6. Agent output artifact is {"items":[]}
  7. gh-aw conclusion job detects empty outputs and files a failure report issue
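
The sequence above can be modelled in a few lines. This is a minimal sketch of the behaviour observed in the logs, not the Copilot CLI's actual implementation: the function names, the empty set of default servers, and the item shape passed to `agent_output` are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Assumption: neither "github" nor "safeoutputs" counts as a default server.
DEFAULT_SERVERS = frozenset()


@dataclass
class StartupResult:
    allowed: list
    blocked: list


def filter_servers(configured, registry_status, defaults=DEFAULT_SERVERS):
    """Model steps 1-3: a non-200 registry response blocks non-default servers."""
    if registry_status == 200:
        # Simplification: a successful check is treated as "everything allowed".
        return StartupResult(allowed=list(configured), blocked=[])
    allowed = [s for s in configured if s in defaults]
    blocked = [s for s in configured if s not in defaults]
    return StartupResult(allowed=allowed, blocked=blocked)


def agent_output(result, intended_items):
    """Model steps 5-6: without safeoutputs, nothing can be written."""
    if "safeoutputs" in result.allowed:
        return {"items": list(intended_items)}
    return {"items": []}


startup = filter_servers(["github", "safeoutputs"], registry_status=401)
print(startup.blocked)                                   # both servers blocked
print(agent_output(startup, [{"type": "add-labels"}]))   # empty items
```

The point of the model is that the empty artifact is decided at startup, before the agent does any analysis.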

Key observation

The agent's reasoning and analysis are correct. The failure is purely in the MCP server initialization path — the agent has no mechanism to write structured outputs when MCP servers are blocked.

Evidence: 10 confirmed failures (Apr 8–13, 2026)

All failures confirmed by downloading agent artifacts (gh run download <run_id> --name agent) and inspecting sandbox/agent/logs/process-*.log for the mcp_registry 401 pattern.
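
The log inspection described above can be scripted. The following is a minimal sketch that assumes the artifact has already been downloaded and that the directory passed in contains `sandbox/agent/logs/process-*.log`; the function name and the two substring markers are illustrative choices based on the log excerpt in this issue, not a documented log format.

```python
from pathlib import Path

# Markers taken from the log excerpt in this issue; the exact log
# format may differ, so these are assumptions.
REGISTRY_MARKER = "copilot/mcp_registry"
STATUS_MARKER = "401"


def find_registry_401(artifact_dir):
    """Return (file name, line number, line) for each registry 401 log line."""
    hits = []
    log_dir = Path(artifact_dir) / "sandbox" / "agent" / "logs"
    for log_path in sorted(log_dir.glob("process-*.log")):
        text = log_path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if REGISTRY_MARKER in line and STATUS_MARKER in line:
                hits.append((log_path.name, lineno, line.strip()))
    return hits
```

A run would be classified as matching this issue when `find_registry_401` returns at least one hit for its downloaded `agent` artifact.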

| Issue | Workflow | Date (UTC) | Run ID (from issue body) |
|---|---|---|---|
| #58113 | Triage | 2026-04-13 09:13 | See issue |
| #58107 | Triage | 2026-04-11 18:00 | See issue |
| #58075 | Triage | 2026-04-10 16:14 | See issue |
| #58072 | Docs | 2026-04-10 15:28 | See issue |
| #58059 | Triage | 2026-04-10 06:34 | See issue |
| #58055 | Docs | 2026-04-10 00:59 | See issue |
| #58054 | Docs | 2026-04-10 00:17 | See issue |
| #58053 | Triage | 2026-04-10 00:16 | See issue |
| #58048 | Triage | 2026-04-09 22:14 | See issue |
| #58044 | Docs | 2026-04-09 21:09 | See issue |

Temporal pattern: Heaviest cluster Apr 9–10 (8 failures in ~36 hours), continuing intermittently through Apr 13.
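
The cluster figure can be checked directly from the table. This is a small sketch using only the timestamps listed above; the helper name `max_in_window` is illustrative.

```python
from collections import Counter
from datetime import datetime, timedelta

# (issue, workflow, UTC timestamp) copied from the evidence table.
FAILURES = [
    (58113, "Triage", "2026-04-13 09:13"),
    (58107, "Triage", "2026-04-11 18:00"),
    (58075, "Triage", "2026-04-10 16:14"),
    (58072, "Docs", "2026-04-10 15:28"),
    (58059, "Triage", "2026-04-10 06:34"),
    (58055, "Docs", "2026-04-10 00:59"),
    (58054, "Docs", "2026-04-10 00:17"),
    (58053, "Triage", "2026-04-10 00:16"),
    (58048, "Triage", "2026-04-09 22:14"),
    (58044, "Docs", "2026-04-09 21:09"),
]


def max_in_window(timestamps, window):
    """Largest number of timestamps that fit inside any span of length `window`."""
    times = sorted(timestamps)
    best = 0
    start = 0
    for end, t in enumerate(times):
        # Shrink the window from the left until it spans at most `window`.
        while t - times[start] > window:
            start += 1
        best = max(best, end - start + 1)
    return best


times = [datetime.strptime(ts, "%Y-%m-%d %H:%M") for _, _, ts in FAILURES]
print(max_in_window(times, timedelta(hours=36)))  # 8
print(Counter(workflow for _, workflow, _ in FAILURES))  # Triage: 6, Docs: 4
```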

Impact

  • 10 out of 13 recent agentic workflow failures (77%) share this exact root cause
  • Each failure auto-files a GitHub issue with the agentic-workflows label, creating noise in the issue tracker
  • Issues that should have been triaged or had docs gaps filed remain unprocessed until a human notices and re-triggers
  • No workaround exists from the workflow author side — the MCP registry check is internal to the Copilot CLI

Mitigation applied (workflow-side)

We have set report-failure-as-issue: false on both workflows to suppress the noisy auto-filed failure issues. This is a stopgap — it also suppresses reports for real workflow bugs.

Suggested improvements

  1. Fix the transient 401s — the root cause in the MCP registry auth path
  2. Graceful degradation — if the registry check fails transiently, consider retrying before blocking all MCP servers, or allowing a configurable fallback policy
  3. Distinguish infrastructure failures from agent failures — the current {"items":[]} output doesn't distinguish "agent chose not to act" from "agent couldn't access its tools"
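
Suggestions 2 and 3 could look roughly like the following. This is a hypothetical sketch only: the retry policy, the set of status codes treated as transient (including 401, because of the pattern seen here), and the extra `status` field are assumptions for discussion, not existing Copilot CLI or gh-aw behaviour.

```python
import time

# Assumption: 401 is retried because it has been observed to be transient here.
TRANSIENT_STATUSES = {401, 429, 500, 502, 503, 504}


def fetch_with_retry(fetch, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call `fetch` (returns an HTTP status) with exponential backoff on transient codes."""
    status = fetch()
    for attempt in range(1, attempts):
        if status not in TRANSIENT_STATUSES:
            break
        sleep(base_delay * 2 ** (attempt - 1))
        status = fetch()
    return status


def build_output(items, registry_status):
    """Attach a hypothetical status so empty items are not ambiguous."""
    tools_available = registry_status == 200
    return {
        "items": list(items) if tools_available else [],
        "status": "ok" if tools_available else "tools_unavailable",
    }
```

With a field like `status`, the conclusion job could tell "agent chose not to act" (`ok` with empty items) apart from "agent couldn't access its tools" (`tools_unavailable`).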
