[POC] Add MCP tool schema regression testing with mcp-recorder #222
Open
caballeto wants to merge 1 commit into mondaycom:master from
[POC] Adds regression test coverage for MCP tool schemas using mcp-recorder -- like VCR.py but for MCP servers. Records the full protocol exchange into a JSON cassette and verifies it hasn't changed on every PR.
Why this matters
The MCP server exposes 20+ tools to AI agents. The tool names, descriptions, input schemas, and annotations are the contract -- when they drift, downstream agents break silently. Today there's no test that catches "someone changed a .describe() string" or "a tool got accidentally filtered out."
What this covers
A single cassette (list_tools.json, ~3200 lines) captures 3 interactions:
- initialize response with protocol version, capabilities, server info
- tools/list with complete input schemas, descriptions, and annotations for all registered tools

If a tool is renamed, a parameter is removed, a description changes, or an annotation flips, the CI diff shows exactly what broke.
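For orientation, here is a minimal sketch of what an initialize + tools/list exchange looks like from the client side, using MCP's JSON-RPC 2.0 framing over stdio (one JSON object per line). It is not taken from mcp-recorder or from the cassette: the protocol revision string, the client name, and the helper names `jsonrpc_request`, `handshake_messages`, and `encode_stdio` are assumptions made for illustration, and the source does not say whether these messages map one-to-one onto the cassette's recorded interactions.

```python
import json

# Assumption: a placeholder protocol revision; the real value is whatever
# the client and server negotiate.
PROTOCOL_VERSION = "2024-11-05"


def jsonrpc_request(request_id, method, params=None):
    """Build a JSON-RPC 2.0 request; params is omitted when not given."""
    message = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        message["params"] = params
    return message


def handshake_messages():
    """Client messages for an initialize handshake followed by tools/list."""
    return [
        jsonrpc_request(1, "initialize", {
            "protocolVersion": PROTOCOL_VERSION,
            "capabilities": {},
            # Hypothetical client identity, for illustration only.
            "clientInfo": {"name": "example-client", "version": "0.0.0"},
        }),
        # Notifications carry no "id" and expect no response.
        {"jsonrpc": "2.0", "method": "notifications/initialized"},
        jsonrpc_request(2, "tools/list"),
    ]


def encode_stdio(messages):
    """Serialize messages as newline-delimited JSON for a stdio transport."""
    return "".join(json.dumps(m, separators=(",", ":")) + "\n" for m in messages)


if __name__ == "__main__":
    print(encode_stdio(handshake_messages()), end="")
```

None of these messages needs a valid API token to answer, which is consistent with the PR's point that the exchange only enumerates what the server has registered.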
How it works
The server is spawned via stdio with MONDAY_TOKEN=test-token. The token is never validated for initialize + tools/list -- these only enumerate the in-memory tool registry. No network calls, no secrets, no real API access needed.
Why this is a POC
This PR adds a Python dependency (mcp-recorder) to a TypeScript repo, so it may not fit the project's tooling preferences. However, it demonstrates the benefits of snapshot-based testing.
Changes
All additive -- only .github/workflows/pull_request.yml and package.json are modified.
Run locally
    pip install -r integration/requirements.txt
    yarn build
    mcp-recorder verify \
      --cassette integration/cassettes/list_tools.json \
      --target-stdio "node packages/monday-api-mcp/dist/index.js" \
      --target-env MONDAY_TOKEN=test-token
Update after intentional changes
    yarn build
    mcp-recorder verify \
      --cassette integration/cassettes/list_tools.json \
      --target-stdio "node packages/monday-api-mcp/dist/index.js" \
      --target-env MONDAY_TOKEN=test-token \
      --update
The cassette diff in the PR review shows exactly what changed in the tool surface.
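To make the idea of a "tool surface" change concrete, the sketch below compares two tools/list results and reports which tools were added, removed, or altered. It is an illustration of the snapshot idea only, not how mcp-recorder computes or presents its diff. The field names (name, description, inputSchema, annotations) follow the MCP tool definition; the function names and the sample tools are hypothetical.

```python
def index_by_name(tools):
    """Map tool name -> tool definition for quick lookup."""
    return {tool["name"]: tool for tool in tools}


def diff_tool_surface(old_tools, new_tools):
    """Return human-readable differences between two tool lists."""
    old, new = index_by_name(old_tools), index_by_name(new_tools)
    changes = []
    for name in sorted(old.keys() - new.keys()):
        changes.append(f"removed tool: {name}")
    for name in sorted(new.keys() - old.keys()):
        changes.append(f"added tool: {name}")
    # For tools present in both snapshots, compare the contract fields.
    for name in sorted(old.keys() & new.keys()):
        for field in ("description", "inputSchema", "annotations"):
            if old[name].get(field) != new[name].get(field):
                changes.append(f"changed {field}: {name}")
    return changes


if __name__ == "__main__":
    # Hypothetical tool definitions, not taken from the monday.com server.
    board_schema = {
        "type": "object",
        "properties": {"boardId": {"type": "number"}},
        "required": ["boardId"],
    }
    recorded = [
        {"name": "get_board", "description": "Fetch a board",
         "inputSchema": board_schema},
        {"name": "delete_item", "description": "Delete an item",
         "inputSchema": {"type": "object"},
         "annotations": {"destructiveHint": True}},
    ]
    current = [
        {"name": "get_board", "description": "Fetch a board by ID",
         "inputSchema": board_schema},
        {"name": "create_item", "description": "Create an item",
         "inputSchema": {"type": "object"}},
    ]
    for line in diff_tool_surface(recorded, current):
        print(line)
```

In this sketch a renamed tool shows up as one removal plus one addition, since the name is the only key used to match tools across snapshots.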