test(mcp): contract test for outputSchema ↔ structuredContent — previously declared done; second-pass caught the blind spot#44
Merged
… tools

Adversarial second-pass audit caught: src/index.ts (~110 lines rewritten in PR-5) and src/mcp-schemas.ts (151 lines added in PR-5) had ZERO committed tests. The 'npm test → 85/85 pass' that PR-5 reported was true but did not exercise any of the new code: the same AirMCP feedback_test_blind_spot pattern.

The PR-5 smoke I ran from /tmp caught a real bug (stack: z.string() when it should be z.array(z.string())) but was throwaway, not committed. So if audit_release's AuditReport shape grows tomorrow, the SDK's runtime outputSchema validator throws 'Output validation error' on the first call in production.

tests/mcp-server.test.ts (new, ~220 lines):
- Spawns dist/index.js as a stdio child.
- Drives MCP initialize + tools/list + tools/call for all 5 tools.
- Asserts: structuredContent present, no 'Output validation error' text block, key fields exist and match expected enum values.
- audit_security on THIS repo pins the README claim of 8/8 HARDENED: if this repo regresses (e.g. someone removes the CodeQL workflow), the test fails before merge.
- Invalid input test: confirms the server doesn't crash on a bad template ID and surfaces isError correctly.

Test count: 85 → 93. Wall time: 1.8s → 18s (binary spawn dominates).

Not fixed in this PR (flagged):
- M1: src/index.ts:23 PKG_VERSION='0.4.0' hardcoded while src/cli.ts:110 reads from package.json. Separate PR per user direction.
- M2: tests/download.test.ts:162 describe.skip for zip-slip e2e. Pre-existing, not from this session's scope.
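The per-call assertions listed in this commit message (structuredContent present, no 'Output validation error' text block, key fields within their enums) could be factored into a helper along these lines. This is a minimal sketch, not the code in tests/mcp-server.test.ts: the `ToolResult` interface only models the fields named above, and `contractProblems`, `getPath`, and the dotted-path convention are hypothetical names introduced for illustration.

```typescript
// Hypothetical result shape: only the fields named in the commit message.
interface ContentBlock {
  type: string;
  text?: string;
}

interface ToolResult {
  content?: ContentBlock[];
  structuredContent?: Record<string, unknown>;
  isError?: boolean;
}

// Walk a dotted path such as "shipReady.verdict" through nested objects.
function getPath(obj: unknown, path: string): unknown {
  let current: unknown = obj;
  for (const key of path.split(".")) {
    if (typeof current !== "object" || current === null) return undefined;
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

// Returns a list of contract violations; an empty list means the result passed.
function contractProblems(
  result: ToolResult,
  enumFields: Record<string, readonly string[]> = {},
): string[] {
  const problems: string[] = [];

  // The SDK surfaces a schema mismatch as a text block, so look for it there.
  const leaked = (result.content ?? []).some(
    (block) =>
      block.type === "text" &&
      (block.text ?? "").includes("Output validation error"),
  );
  if (leaked) problems.push("SDK reported an output validation error");

  const structured = result.structuredContent;
  if (structured === undefined) {
    problems.push("structuredContent is missing");
    return problems;
  }

  // Every listed field must hold one of its allowed enum values.
  for (const [path, allowed] of Object.entries(enumFields)) {
    const value = getPath(structured, path);
    if (typeof value !== "string" || !allowed.includes(value)) {
      problems.push(`${path} is not in the declared enum`);
    }
  }
  return problems;
}
```

A test would then call something like `contractProblems(result, { "shipReady.verdict": allowedVerdicts })` for each tool and expect an empty array, with `allowedVerdicts` taken from the schema in src/mcp-schemas.ts rather than hardcoded.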
…pendent)

PR-44 CI failed on 'audit_security: this repo regressed below HARDENED (verdict=needs-attention)'.

Root cause: in CI, the GITHUB_TOKEN scope declared in ci.yml is 'contents: read', which doesn't grant visibility into 'security_and_analysis' on the repo metadata. The audit_security 'secret-scanning' detector calls 'gh api repos/<repo>' for that field and falls back to partial/missing when it can't see it. That's an environment limit, not a regression. Locally my personal gh creds have admin scope on the org, so the detector returns 'enabled' → present → 8/8 HARDENED. Different env, different signal.

Fix:
- Don't pin the verdict to 'hardened'; assert it's a valid enum value.
- Pin the 7 non-environment-dependent checks (gitleaks, codeql, dep-audit, license-check, ignore-scripts, dependabot, claude-code-security-review) to status='present'. Those are read purely from .github/workflows/*.yml + .github/dependabot.yml, so CI and local agree.
- Soft-assert that the secret-scanning check at least *ran* (appears in the report), so a removed detector doesn't silently pass.
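The three-part fix above might look roughly like the following. This is a sketch under stated assumptions: the report shape (`verdict` plus a `checks` array of `{ id, status }`) and the name `securityProblems` are hypothetical, the check IDs are copied from the commit message, and the verdict list contains only the two values that appear in the text, so the real enum may have more members.

```typescript
type CheckStatus = "present" | "partial" | "missing";

// Hypothetical report shape; the real audit_security output may differ.
interface SecurityCheck {
  id: string;
  status: CheckStatus;
}

interface SecurityReport {
  verdict: string;
  checks: SecurityCheck[];
}

// Assumption: only these two verdicts are named in the text above.
const VERDICTS: readonly string[] = ["hardened", "needs-attention"];

// Read purely from files in the repo, so CI and local runs agree.
const PINNED: readonly string[] = [
  "gitleaks",
  "codeql",
  "dep-audit",
  "license-check",
  "ignore-scripts",
  "dependabot",
  "claude-code-security-review",
];

// Depends on token scope, so only require that the detector ran.
const ENV_DEPENDENT: readonly string[] = ["secret-scanning"];

// Returns a list of violations; an empty list means the report passed.
function securityProblems(report: SecurityReport): string[] {
  const problems: string[] = [];

  if (!VERDICTS.includes(report.verdict)) {
    problems.push(`unknown verdict: ${report.verdict}`);
  }

  const statusById = new Map<string, CheckStatus>();
  for (const check of report.checks) statusById.set(check.id, check.status);

  for (const id of PINNED) {
    const status = statusById.get(id);
    if (status !== "present") {
      problems.push(`${id}: expected present, got ${status ?? "no entry"}`);
    }
  }

  for (const id of ENV_DEPENDENT) {
    if (!statusById.has(id)) problems.push(`${id}: detector did not run`);
  }
  return problems;
}
```

With this split, a CI run that reports secret-scanning as missing and the verdict as needs-attention still passes, while removing the CodeQL workflow or deleting the secret-scanning detector fails in both environments.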
Why this PR exists
The 2026-05-21 adversarial second-pass audit caught a Critical blind spot in this session's work:
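- PR-5 rewrote `src/index.ts` (~110 lines) and added `src/mcp-schemas.ts` (151 lines), with zero committed tests for either.
- The `npm test` → 85/85 pass that PR-5 reported was true but did not exercise any of the new code.
- The only coverage was a smoke test run from `/tmp` that I deleted at the end of the PR. It actually caught one real bug (`stack: z.string()` vs `string[]`), proving the audit shape divergence is realistic, but the smoke was throwaway, not committed.

This is the AirMCP `feedback_test_blind_spot` pattern verbatim: registration metadata correct, runtime contract unverified.

What this PR adds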
`tests/mcp-server.test.ts`: committed contract test that spawns the built binary, drives MCP stdio, and asserts every tool's `structuredContent` matches its declared `outputSchema`.

Coverage:
- `initialize`
- `tools/list`: `outputSchema`
- `list_templates`: `structuredContent.templates` is an array of 11
- `audit_release`: `shipReady.verdict` is in the declared enum
- `audit_cd`: `destinations` is an array, `overall.verdict` is in enum
- `audit_security` on this repo
- `create_project` with invalid template

The `audit_security` test in particular pins a narrative claim (README says 8/8 HARDENED) to a code gate. If a future PR removes CodeQL or breaks gitleaks pinning, this test fails before merge, not after release.

Test plan
- `npm run build` clean
- `npm test`: 93/93 pass (was 85/85; +8 contract tests)

Not fixed in this PR (will follow up)
- `src/index.ts:23`: `PKG_VERSION = "0.4.0"` is hardcoded while `src/cli.ts:110` correctly reads from `package.json`. Drifts silently on next minor bump. → separate PR per user direction.
- `tests/download.test.ts:162`: `describe.skip("extractTarball — zip-slip end-to-end")` is pre-existing, not introduced by this session. The security-critical zip-slip path is only unit-tested on `isSafeTarEntry`, with no e2e fixture. Flagged for retrospective; outside this PR's scope.

Out-of-scope (verified, intentional)
- `src/scaffold.ts:288`: silent `.catch(() => {})`; cleanup is best-effort, with an explicit comment.
- `src/download.ts:106`: `while (true)`, bounded by `reader.done` + `maxSizeBytes`.
- `src/index.ts`: `as unknown as Record<string, unknown>` ×3; unavoidable SDK + AuditReport bridge (covered in `/simplify`).
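For reviewers wondering why the `while (true)` in `src/download.ts` counts as bounded, the general pattern is a read loop with two exits: the reader reporting done, and a byte cap. The sketch below illustrates that pattern only. It is not the code at `src/download.ts:106`, and `ChunkReader` and `readBounded` are hypothetical names; only `maxSizeBytes` comes from the list above.

```typescript
// Minimal reader interface, shaped like a stream reader's read() result.
interface ChunkReader {
  read(): Promise<{ done: boolean; value?: Uint8Array }>;
}

// Reads until the reader is done, or throws once the byte cap is exceeded.
async function readBounded(
  reader: ChunkReader,
  maxSizeBytes: number,
): Promise<Uint8Array> {
  const chunks: Uint8Array[] = [];
  let total = 0;

  while (true) {
    const { done, value } = await reader.read();
    if (done) break; // exit 1: the stream is exhausted
    if (value === undefined) continue;

    total += value.byteLength;
    if (total > maxSizeBytes) {
      // exit 2: the size cap stops an oversized or endless stream
      throw new Error(`download exceeds ${maxSizeBytes} bytes`);
    }
    chunks.push(value);
  }

  // Concatenate the collected chunks into one buffer.
  const out = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset);
    offset += chunk.byteLength;
  }
  return out;
}
```

The cap is checked before the chunk is stored, so memory use stays at or below `maxSizeBytes` plus one chunk even when the source never signals completion.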