[POC] Add MCP tool schema regression testing with mcp-recorder #222

Open
caballeto wants to merge 1 commit into mondaycom:master from caballeto:add-snapshot-based-integration-tests
@caballeto
[POC] Adds regression test coverage for MCP tool schemas using mcp-recorder -- like VCR.py, but for MCP servers. It records the full protocol exchange into a JSON cassette and verifies on every PR that the exchange hasn't changed.

Why this matters

The MCP server exposes 20+ tools to AI agents. The tool names, descriptions, input schemas, and annotations are the contract -- when they drift, downstream agents break silently. Today there's no test that catches "someone changed a .describe() string" or "a tool got accidentally filtered out."

What this covers

A single cassette (list_tools.json, ~3200 lines) captures 3 interactions, including:

  • Protocol handshake -- initialize response with protocol version, capabilities, server info
  • Tool schemas -- tools/list with complete input schemas, descriptions, and annotations for all registered tools

If a tool is renamed, a parameter is removed, a description changes, or an annotation flips, the CI diff shows exactly what broke.
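
To illustrate the kind of drift this catches, here is a minimal sketch that compares two tool lists as returned by tools/list. It is not how mcp-recorder does its comparison (that is not shown here); the function names and the sample tools are hypothetical, and only the name / description / inputSchema / annotations fields come from the MCP tool shape.

```python
from typing import Any

Tool = dict[str, Any]


def index_by_name(tools: list[Tool]) -> dict[str, Tool]:
    """Index tools/list entries by their unique tool name."""
    return {tool["name"]: tool for tool in tools}


def diff_tools(old: list[Tool], new: list[Tool]) -> dict[str, Any]:
    """Report added, removed, and changed tools between two snapshots."""
    old_by, new_by = index_by_name(old), index_by_name(new)
    report: dict[str, Any] = {
        "added": sorted(new_by.keys() - old_by.keys()),
        "removed": sorted(old_by.keys() - new_by.keys()),
        "changed": {},
    }
    for name in sorted(old_by.keys() & new_by.keys()):
        before, after = old_by[name], new_by[name]
        # List every top-level field whose value differs between snapshots.
        fields = sorted(
            key for key in before.keys() | after.keys()
            if before.get(key) != after.get(key)
        )
        if fields:
            report["changed"][name] = fields
    return report


# Hypothetical tools, for illustration only.
OLD = [
    {"name": "get_items", "description": "Fetch items",
     "inputSchema": {"type": "object", "properties": {"boardId": {"type": "number"}}},
     "annotations": {"readOnlyHint": True}},
    {"name": "delete_item", "description": "Delete an item",
     "inputSchema": {"type": "object", "properties": {}}},
]
NEW = [
    {"name": "get_items", "description": "Fetch board items",
     "inputSchema": {"type": "object", "properties": {"boardId": {"type": "number"}}},
     "annotations": {"readOnlyHint": False}},
    {"name": "create_item", "description": "Create an item",
     "inputSchema": {"type": "object", "properties": {}}},
]

REPORT = diff_tools(OLD, NEW)
```

In this sketch a rename surfaces as one removed tool plus one added tool, while a changed description or flipped annotation appears under "changed" for the tool that kept its name.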

How it works

The server is spawned via stdio with MONDAY_TOKEN=test-token. The token is never validated for initialize + tools/list -- these only enumerate the in-memory tool registry. No network calls, no secrets, no real API access needed.
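
For reference, the exchange described above can be sketched by hand. This is a rough illustration, not mcp-recorder's implementation: it assumes the server speaks newline-delimited JSON-RPC on stdio, writes exactly one line per reply, and prints nothing else to stdout. The protocol version string, the client name, and the helper names are assumptions.

```python
import json
import os
import subprocess
from typing import Any


def handshake_messages() -> list[dict[str, Any]]:
    """The JSON-RPC messages needed to list tools: initialize, initialized, tools/list."""
    return [
        {"jsonrpc": "2.0", "id": 1, "method": "initialize",
         "params": {
             "protocolVersion": "2025-03-26",  # assumed version, adjust as needed
             "capabilities": {},
             "clientInfo": {"name": "schema-probe", "version": "0.0.0"},  # hypothetical client
         }},
        # Notifications carry no id and receive no reply.
        {"jsonrpc": "2.0", "method": "notifications/initialized"},
        {"jsonrpc": "2.0", "id": 2, "method": "tools/list"},
    ]


def run_handshake(command: list[str], extra_env: dict[str, str]) -> list[dict[str, Any]]:
    """Spawn the server over stdio and collect the replies to the requests above."""
    env = {**os.environ, **extra_env}
    replies = []
    with subprocess.Popen(command, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                          env=env, text=True) as proc:
        for message in handshake_messages():
            proc.stdin.write(json.dumps(message) + "\n")
            proc.stdin.flush()
            if "id" in message:  # only requests get a response line
                line = proc.stdout.readline()
                if not line:
                    raise RuntimeError("server exited before replying")
                replies.append(json.loads(line))
    return replies
```

Under those assumptions, `run_handshake(["node", "packages/monday-api-mcp/dist/index.js"], {"MONDAY_TOKEN": "test-token"})` would return the initialize and tools/list responses that the cassette pins down.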

Why this is a POC

This PR adds a Python dependency (mcp-recorder) to a TypeScript repo, so it may not fit the project's tooling preferences. However, it demonstrates the benefits of snapshot-based testing:

  • Schema drift detection is cheap -- one YAML file, one cassette, zero secrets
  • The cassette diff is the review -- any tool schema change shows up as a JSON diff in the PR, making it explicit and reviewable
  • The full tool surface is captured -- the committed cassette serves as living documentation of every tool's public interface
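
As a small illustration of the "living documentation" point, a tool list like the one returned by tools/list could be condensed into one signature line per tool. This is only a sketch: the helper names and sample tools are hypothetical, and it reads only the standard name and inputSchema fields.

```python
from typing import Any

Tool = dict[str, Any]


def summarize_tool(tool: Tool) -> str:
    """Render a tool as name(param, required_param*) from its input schema."""
    schema = tool.get("inputSchema", {})
    required = set(schema.get("required", []))
    params = [
        f"{name}*" if name in required else name
        for name in sorted(schema.get("properties", {}))
    ]
    return f"{tool['name']}({', '.join(params)})"


def summarize_surface(tools: list[Tool]) -> list[str]:
    """One line per tool, sorted by name so the output is stable across runs."""
    return [summarize_tool(tool) for tool in sorted(tools, key=lambda t: t["name"])]


# Hypothetical tools, for illustration only.
SAMPLE = [
    {"name": "create_item",
     "inputSchema": {"type": "object",
                     "properties": {"name": {"type": "string"}, "boardId": {"type": "number"}},
                     "required": ["boardId"]}},
    {"name": "all_boards"},
]

SUMMARY = summarize_surface(SAMPLE)
```

A trailing asterisk marks a required parameter; sorting keeps the summary deterministic so it diffs cleanly.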

Changes

All additive -- only .github/workflows/pull_request.yml and package.json are modified; everything under integration/ is new.

integration/
  scenarios.yml                      # list_tools scenario in ~14 lines of YAML
  cassettes/list_tools.json          # golden cassette (3 interactions, ~3200 lines)
  requirements.txt                   # mcp-recorder>=0.4.1
.github/workflows/pull_request.yml   # CI: verify step after build
package.json                         # mcp:record and mcp:verify convenience scripts

Run locally

pip install -r integration/requirements.txt
yarn build

mcp-recorder verify \
  --cassette integration/cassettes/list_tools.json \
  --target-stdio "node packages/monday-api-mcp/dist/index.js" \
  --target-env MONDAY_TOKEN=test-token

Update after intentional changes

yarn build
mcp-recorder verify \
  --cassette integration/cassettes/list_tools.json \
  --target-stdio "node packages/monday-api-mcp/dist/index.js" \
  --target-env MONDAY_TOKEN=test-token \
  --update

The cassette diff in the PR review shows exactly what changed in the tool surface.
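
Since the verify and update invocations differ only by the --update flag, they could be generated from one place. The sketch below is a hypothetical helper, not part of this PR; it uses only the flags shown in the commands above and assumes mcp-recorder is on PATH when the command is actually run.

```python
import shlex

CASSETTE = "integration/cassettes/list_tools.json"
SERVER = "node packages/monday-api-mcp/dist/index.js"


def verify_command(update: bool = False) -> list[str]:
    """Build the mcp-recorder argv for verifying (or re-recording) the cassette."""
    argv = [
        "mcp-recorder", "verify",
        "--cassette", CASSETTE,
        "--target-stdio", SERVER,
        "--target-env", "MONDAY_TOKEN=test-token",
    ]
    if update:
        argv.append("--update")  # rewrite the cassette after an intentional change
    return argv


# Shell-quoted form, handy for logging what will be executed.
PRINTABLE = shlex.join(verify_command())
```

The returned list can be passed to `subprocess.run(verify_command(), check=True)` so that a mismatch fails the calling script.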