feat: convert MCP servers to engine-agnostic skills for token efficiency and Copilot MCP allowlist compatibility #1938

## Problem

Three related issues motivate converting MCP servers (starting with safeoutputs) into engine-agnostic skills:

### 1. Copilot MCP Allowlist Blocks Non-Default MCP Servers

Copilot CLI v1.0.22+ enforces an org-level MCP allowlist by calling `GET https://api.github.com/copilot/mcp_registry`. In GitHub Actions, the one-shot token used by agents gets a **401** from this endpoint, triggering a **fail-closed** policy that blocks ALL non-default MCP servers — including `safeoutputs` and `github`.

**Forensic evidence** (from `security-reviews` run with Copilot CLI v1.0.23):
```
Request to MCP registry policy at https://api.github.com/copilot/mcp_registry failed with status 401
Failed to fetch MCP registry policy: 401. Non-default MCP servers will be blocked until the policy can be fetched.
MCP server "github" filtered: Could not verify server against any configured registry
MCP server "safeoutputs" filtered: Could not verify server against any configured registry
```

We currently pin Copilot CLI to v1.0.21 to avoid this (see github/gh-aw#25550), but a permanent pin cuts workflows off from later CLI releases, so it is not a sustainable fix.
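To check whether a given run hit this failure, the log lines quoted above can be scanned mechanically. The following is a minimal sketch that assumes the log wording stays exactly as in the excerpt; the function name `diagnose` and the shape of its result are illustrative, not part of any existing tool.

```python
import re

# Log excerpt copied from the forensic evidence above.
LOG = """\
Request to MCP registry policy at https://api.github.com/copilot/mcp_registry failed with status 401
Failed to fetch MCP registry policy: 401. Non-default MCP servers will be blocked until the policy can be fetched.
MCP server "github" filtered: Could not verify server against any configured registry
MCP server "safeoutputs" filtered: Could not verify server against any configured registry
"""

REGISTRY_FAILURE = re.compile(r"Failed to fetch MCP registry policy: (\d+)")
FILTERED_SERVER = re.compile(r'MCP server "([^"]+)" filtered:')


def diagnose(log: str) -> dict:
    """Report the registry fetch status and the servers that were filtered out."""
    failure = REGISTRY_FAILURE.search(log)
    return {
        "registry_status": int(failure.group(1)) if failure else None,
        "blocked_servers": FILTERED_SERVER.findall(log),
    }


report = diagnose(LOG)
print(report)
```

A non-empty `blocked_servers` list together with a `registry_status` of 401 would indicate the fail-closed path described in this section.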

### 2. MCP Server Protocol Overhead is Token-Expensive

The MCP server protocol requires:
- A full MCP config block in the agent's `mcpServers` configuration
- Tool discovery handshake (list tools, describe schemas)
- JSON-RPC framing for every tool call
- Authentication headers per-request

For safeoutputs, which provides ~10 tools with well-known schemas, this protocol overhead consumes significant tokens on every workflow run. The tool schemas are static and known at compile time — they don't need dynamic discovery.
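As a rough illustration of the framing cost alone, the sketch below compares one MCP `tools/call` request (a JSON-RPC 2.0 message) with the equivalent single JSONL line in the `{"tool": ..., "args": ...}` shape this proposal uses. The argument values are made up, and the comparison counts characters rather than tokens; it also leaves out the `tools/list` discovery exchange, so it should be read as a lower bound on the difference, not a measurement.

```python
import json

# Hypothetical arguments for an add_comment call.
args = {"body": "Looks good to me.", "item_number": 42}

# MCP tool invocation: a JSON-RPC 2.0 request using the tools/call method.
mcp_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "add_comment", "arguments": args},
}

# Skills mode: one JSONL line carrying the same tool name and arguments.
jsonl_entry = {"tool": "add_comment", "args": args}

# Compact encoding for both, so only the framing differs.
mcp_chars = len(json.dumps(mcp_request, separators=(",", ":")))
jsonl_chars = len(json.dumps(jsonl_entry, separators=(",", ":")))

print(f"JSON-RPC request: {mcp_chars} chars")
print(f"JSONL entry:      {jsonl_chars} chars")
```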

### 3. Cross-Engine Inconsistency

Different engines handle MCP servers differently:
- **Copilot**: Has the allowlist enforcement problem described above
- **Claude**: Supports MCP natively but with a different config format
- **Codex**: MCP support varies by version

Skills that compile down to engine-native tool definitions would provide a consistent interface across all engines.

## Proposed Solution: MCP-to-Skills Conversion Framework

### Design Goals

1. **Engine-agnostic**: Skills compile to native tool definitions for each engine (Copilot, Claude, Codex)
2. **Security-equivalent**: Same isolation guarantees as MCP servers — skills execute in a separate container, not in the agent's context
3. **Token-efficient**: Tool schemas injected at compile time, no runtime discovery overhead
4. **Selective exposure**: Only tools specified in `safe-outputs:` config are exposed to the agent
5. **Generalizable**: Framework supports converting other MCP servers (mcp-scripts, agentic-workflows) to skills
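Goal 4 (selective exposure) could be implemented as a simple filter at compile time. The sketch below assumes a hypothetical in-memory tool catalog (`ALL_TOOLS`) and assumes that config keys use dashes (`add-comment`) while tool names use underscores (`add_comment`), as the examples in this issue suggest; none of these names come from the gh-aw codebase.

```python
# Hypothetical catalog of every tool the safeoutputs server could offer.
ALL_TOOLS = {
    "add_comment": {"description": "Add a comment to an issue or pull request"},
    "add_labels": {"description": "Add labels to an issue or pull request"},
    "create_issue": {"description": "Create a new issue"},
}


def select_tools(safe_outputs_config: dict) -> dict:
    """Return only the catalog tools named in the workflow's safe-outputs config."""
    exposed = {}
    for key in safe_outputs_config:
        name = key.replace("-", "_")  # add-comment -> add_comment
        if name not in ALL_TOOLS:
            raise KeyError(f"unknown safe-output: {key}")
        exposed[name] = ALL_TOOLS[name]
    return exposed


# Hypothetical safe-outputs config with two of the three tools enabled.
config = {"add-comment": {"max": 1}, "add-labels": {"allowed": ["smoke-copilot"]}}
exposed = select_tools(config)
print(sorted(exposed))
```

Tools that are not listed in the config never reach the generated skill file, so the agent has no description of them to act on.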

### Architecture

```
┌─────────────────────────────────────────────────────┐
│ Compile Time (gh aw compile)                        │
│                                                     │
│ workflow.md ──► skill definitions ──► lock.yml      │
│   safe-outputs:     (per engine)      - tool schemas│
│     add-comment       ┌─ Copilot      - skill files │
│     add-labels        ├─ Claude       - JSONL config│
│                       └─ Codex                      │
└─────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────┐
│ Runtime                                             │
│                                                     │
│  ┌──────────┐   write JSONL    ┌──────────────────┐ │
│  │  Agent   │ ──────────────►  │ Shared Volume    │ │
│  │ (engine) │                  │ outputs.jsonl    │ │
│  └──────────┘                  └────────┬─────────┘ │
│       │                                 │           │
│       │ (no MCP calls)                  │ read      │
│       │                                 ▼           │
│       │                        ┌──────────────────┐ │
│       │                        │ Skills Sidecar   │ │
│       │                        │ (container)      │ │
│       │                        │ - validates      │ │
│       │                        │ - sanitizes      │ │
│       │                        │ - rate-limits    │ │
│       │                        └──────────────────┘ │
└─────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────┐
│ Post-Processing (safe_outputs job)                  │
│                                                     │
│  Reads validated outputs.jsonl                      │
│  Executes GitHub API calls with write permissions   │
└─────────────────────────────────────────────────────┘
```

### How It Works

#### 1. Compile-Time Skill Generation

The `gh aw compile` step reads the `safe-outputs:` config and generates:
- **Tool definitions**: Native tool schemas per engine (no MCP discovery needed)
- **Skill files**: Markdown instructions teaching the agent how to use each tool
- **Validation rules**: Injected as compile-time constants (max counts, allowed labels, field constraints)

Example: for Copilot, instead of registering `safeoutputs` as an MCP server, the compiler would generate a skill file in the engine's skill directory (`.claude/skills/` or the equivalent location) describing how to write JSONL entries:

```markdown
---
name: safe-outputs
description: Write structured outputs to trigger GitHub API operations
---

# Safe Outputs

Write JSON entries to `$GH_AW_SAFE_OUTPUTS` (one per line) to trigger GitHub operations.

## Available Tools

### add_comment
Write: `{"tool": "add_comment", "args": {"body": "...", "item_number": 42}}`
Constraints: Maximum 1 comment. Body max 65000 chars.

### add_labels
Write: `{"tool": "add_labels", "args": {"labels": ["smoke-copilot"], "item_number": 42}}`
Constraints: Only these labels allowed: ["smoke-copilot"].
```
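A generator for a skill file like the one above could look roughly like this. It is a sketch only: the `TOOLS` input shape (`example_args`, `constraints`) is an assumption made for illustration, and the real compiler would derive these values from the `safe-outputs:` config and validation rules.

```python
import json

# Hypothetical compiler input: one entry per enabled tool.
TOOLS = {
    "add_comment": {
        "example_args": {"body": "...", "item_number": 42},
        "constraints": "Maximum 1 comment. Body max 65000 chars.",
    },
    "add_labels": {
        "example_args": {"labels": ["smoke-copilot"], "item_number": 42},
        "constraints": 'Only these labels allowed: ["smoke-copilot"].',
    },
}

# Static front matter and intro, matching the example skill file.
HEADER = """\
---
name: safe-outputs
description: Write structured outputs to trigger GitHub API operations
---

# Safe Outputs

Write JSON entries to `$GH_AW_SAFE_OUTPUTS` (one per line) to trigger GitHub operations.

## Available Tools
"""


def render_skill(tools: dict) -> str:
    """Render one markdown section per tool, in config order."""
    sections = []
    for name, spec in tools.items():
        example = json.dumps({"tool": name, "args": spec["example_args"]})
        sections.append(
            f"### {name}\nWrite: `{example}`\nConstraints: {spec['constraints']}\n"
        )
    return HEADER + "\n" + "\n".join(sections)


skill = render_skill(TOOLS)
print(skill)
```

Because the output is a pure function of the config, the generated file can be committed alongside `lock.yml` and reviewed like any other compile artifact.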

#### 2. Runtime: Volume-Based Communication

Instead of MCP JSON-RPC calls, the agent writes JSONL directly to the shared volume:
- Agent writes to `$GH_AW_SAFE_OUTPUTS` (already the current mechanism!)
- No network calls, no MCP server process, no authentication headers
- The safeoutputs MCP server is already fundamentally a JSONL writer — the MCP layer is just indirection
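From the agent's side, a tool call in this model reduces to appending one line to a file. A minimal sketch, assuming the `{"tool": ..., "args": ...}` entry shape from the skill file example; the helper name `emit` and its `path` override are hypothetical and exist only so the example can run outside a workflow.

```python
import json
import os
import tempfile
from typing import Optional


def emit(tool: str, args: dict, path: Optional[str] = None) -> None:
    """Append one tool request as a single JSON line."""
    # In a workflow the target comes from $GH_AW_SAFE_OUTPUTS.
    target = path or os.environ["GH_AW_SAFE_OUTPUTS"]
    with open(target, "a", encoding="utf-8") as f:
        f.write(json.dumps({"tool": tool, "args": args}) + "\n")


# Demo against a temporary file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    out = os.path.join(tmp, "outputs.jsonl")
    emit("add_comment", {"body": "Build passed.", "item_number": 42}, path=out)
    emit("add_labels", {"labels": ["smoke-copilot"], "item_number": 42}, path=out)
    with open(out, encoding="utf-8") as f:
        entries = [json.loads(line) for line in f]

print([e["tool"] for e in entries])
```

In practice the agent would more likely write the line with a shell command or its file-editing tool; the point is that no server process or network call is involved.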

#### 3. Skills Sidecar Container (Security Boundary)

A lightweight sidecar container (similar to `cli-proxy`) provides:
- **Volume protection**: Mounts the safeoutputs volume read-only from the agent, write from the sidecar
- **Validation**: Applies the same validation rules (field types, max lengths, sanitization) currently in `GH_AW_VALIDATION_JSON`
- **Rate limiting**: Enforces `defaultMax` per tool
- **Audit logging**: Logs all tool invocations

The sidecar watches the JSONL file and validates entries in real time, rejecting invalid ones before the post-processing job runs. The intent is to tighten the current write-then-check model, in which entries the agent writes to the JSONL file are checked only after the fact.
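The validation and rate-limiting behaviour described above can be sketched as a pure function over JSONL lines. The rule keys used here (`max`, `max_body`, `allowed`) are hypothetical stand-ins; the actual rules are whatever `GH_AW_VALIDATION_JSON` defines, and file watching and audit logging are left out.

```python
import json
from collections import Counter

# Hypothetical rule shape; the real rules live in GH_AW_VALIDATION_JSON.
RULES = {
    "add_comment": {"max": 1, "max_body": 65000},
    "add_labels": {"max": 3, "allowed": ["smoke-copilot"]},
}


def check_fields(args: dict, rule: dict):
    """Return a rejection reason, or None if the arguments satisfy the rule."""
    if "max_body" in rule and len(str(args.get("body", ""))) > rule["max_body"]:
        return "body too long"
    if "allowed" in rule:
        labels = args.get("labels", [])
        if not isinstance(labels, list) or any(l not in rule["allowed"] for l in labels):
            return "label not allowed"
    return None


def validate(lines, rules):
    """Split raw JSONL lines into accepted entries and (line, reason) rejections."""
    accepted, rejected = [], []
    counts = Counter()  # accepted entries per tool, for rate limiting
    for raw in lines:
        try:
            entry = json.loads(raw)
            tool, args = entry["tool"], entry["args"]
        except (ValueError, KeyError, TypeError):
            rejected.append((raw, "malformed entry"))
            continue
        if not isinstance(tool, str) or not isinstance(args, dict):
            rejected.append((raw, "malformed entry"))
            continue
        rule = rules.get(tool)
        if rule is None:
            reason = "tool not enabled"
        elif counts[tool] >= rule["max"]:
            reason = "rate limit exceeded"
        else:
            reason = check_fields(args, rule)
        if reason:
            rejected.append((raw, reason))
        else:
            counts[tool] += 1
            accepted.append(entry)
    return accepted, rejected


lines = [
    '{"tool": "add_comment", "args": {"body": "First", "item_number": 42}}',
    '{"tool": "add_comment", "args": {"body": "Second", "item_number": 42}}',
    '{"tool": "add_labels", "args": {"labels": ["bug"], "item_number": 42}}',
    '{"tool": "close_issue", "args": {}}',
    "not json",
    '{"tool": "add_labels", "args": {"labels": ["smoke-copilot"], "item_number": 42}}',
]
accepted, rejected = validate(lines, RULES)
reasons = [reason for _, reason in rejected]
print(len(accepted), reasons)
```

Only the accepted entries would be passed on to post-processing; the rejections are what the audit log would record.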

#### 4. Post-Processing (Unchanged)

The existing `safe_outputs` job continues to work exactly as it does today — it reads `outputs.jsonl` and executes GitHub API calls with elevated permissions. No changes needed here.

### Generalizing to Other MCP Servers

The framework should support converting other MCP servers:

| MCP Server | Conversion Strategy |
|---|---|
| **safeoutputs** | JSONL file writes (already the core mechanism) + validation sidecar |
| **mcp-scripts** | Skill files with bash examples + script execution via cli-proxy |
| **agentic-workflows** | Skill files with gh-aw CLI usage patterns + cli-proxy exec |
| **github** | Skill files teaching `gh api` usage via cli-proxy (already works this way with cli-proxy enabled) |

The common pattern:
1. **Compile-time**: Generate engine-native tool definitions from the MCP server's tool schemas
2. **Runtime**: Expose tools via skills (markdown instructions) + a secure execution channel (volume, cli-proxy, or sidecar)
3. **No MCP protocol**: Eliminates discovery handshake, JSON-RPC framing, and authentication overhead

### Key Design Decisions

1. **Backward compatibility**: MCP server mode should remain as a fallback. Skills are an optimization, not a forced migration.
2. **Opt-in per workflow**: A workflow-level flag (e.g., `features: { skills-mode: true }`) could toggle between MCP and skills mode.
3. **Engine detection**: The compiler already knows the engine from the workflow config. It should generate the appropriate skill format automatically.
4. **Validation parity**: The sidecar must enforce the exact same validation rules as the current `GH_AW_VALIDATION_JSON` schema. This should be the same code, just running in a different container.
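Decisions 1 to 3 combine into a small piece of compiler logic: read the opt-in flag, look up the engine, and fall back to MCP when either is missing. The sketch below is illustrative only. The `skills-mode` flag name comes from the example in decision 2; the per-engine directories (other than `.claude/skills`, which appears earlier in this issue) and the `SKILL.md` file name are placeholders, not confirmed paths.

```python
# Hypothetical skill directories per engine; placeholders, not confirmed paths.
SKILL_DIRS = {
    "claude": ".claude/skills",
    "copilot": ".github/skills",
    "codex": ".codex/skills",
}


def plan_safe_outputs(workflow: dict) -> dict:
    """Decide between MCP mode and skills mode for one workflow."""
    engine = workflow["engine"]
    skills_mode = workflow.get("features", {}).get("skills-mode", False)
    skill_dir = SKILL_DIRS.get(engine)
    if not skills_mode or skill_dir is None:
        # Decision 1: MCP remains the fallback when skills mode is off
        # or the engine has no known skill format.
        return {"mode": "mcp", "engine": engine}
    return {
        "mode": "skills",
        "engine": engine,
        "skill_file": f"{skill_dir}/safe-outputs/SKILL.md",
    }


opted_in = plan_safe_outputs({"engine": "claude", "features": {"skills-mode": True}})
default = plan_safe_outputs({"engine": "copilot"})
unknown = plan_safe_outputs({"engine": "other", "features": {"skills-mode": True}})
print(opted_in, default, unknown, sep="\n")
```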

## Benefits

- **No MCP allowlist dependency**: Skills are repo-embedded, not registered via API
- **~30-50% fewer tokens** per workflow run (no tool discovery, no JSON-RPC framing)
- **Faster startup**: No MCP server process to launch, no healthcheck to wait for
- **Works across all engines**: Copilot, Claude, Codex, future engines
- **Tighter security**: Sidecar validates writes in real time, versus the current write-then-check model
- **Simpler debugging**: Skill files are readable markdown, not opaque MCP server logs

## Related

- github/gh-aw#25550 — MCP servers blocked by Copilot allowlist policy
- Copilot CLI v1.0.22+ MCP registry enforcement (`GET api.github.com/copilot/mcp_registry`)
- Current workaround: pin Copilot to v1.0.21 (unsustainable)

