Skip to content

Commit b2ad5e9

Browse files
authored
improvement(mcp): post-merge hardening — protocol negotiation + distributed OAuth lock + typed errors (#4722)
* improvement(mcp): post-merge hardening — protocol negotiation + distributed OAuth lock + typed error dispatch - Inbound MCP server now negotiates protocolVersion per MCP 2025-06-18: echoes the client's requested version when supported, falls back to our latest. Previously hardcoded to the oldest spec version (2024-11-05). - withMcpOauthRefreshLock now takes a Postgres advisory transaction lock in addition to the in-process Promise chain, so concurrent processes (multi-task ECS) serialize on a per-OAuth-row basis. Previously a refresh race across processes could rotate a token under another process's feet and force re-auth. - categorizeError dispatches on McpOauthAuthorizationRequiredError / UnauthorizedError / McpConnectionError first, only falling back to substring matching for SDK / third-party errors. Adds 502 for connection failures and 503 for cooldown. Tests cover all four typed cases. - discoverTools no longer pretends to handle deferred-side-effect rejections via a dead allSettled().catch() — each side-effect already self-logs; we just swallow per-promise to silence unhandled-rejection warnings. * fix(mcp): bugbot review on hardening PR - Replace `as never` cast in negotiateProtocolVersion with `as readonly string[]` on the array — preserves TypeScript narrowing on the comparison while satisfying the readonly-tuple `.includes()` constraint properly. - Document the pg_advisory_xact_lock tradeoff: session-level locks (`pg_advisory_lock`) would release the connection earlier, but PgBouncer transaction-pooling mode breaks them. xact_lock is the correct choice for Sim's deployment; if pool pressure becomes real, the comment notes the Redlock escape hatch. * chore(mcp): trim verbose comments + reuse SDK Tool type in McpTool - McpTool now extends `Pick<Tool, 'name' | 'description'>` from @modelcontextprotocol/sdk so name/description fields stay in sync with the SDK; serverId/serverName remain Sim-specific additions. - Drop file-header restatements ("MCP Types - for connecting to external MCP servers"), one-line wrapper docstrings ("Get connection status"), and narrative comment blocks that just restate visible code. - Keep only comments that document non-obvious "why" — OAuth refresh-lock tradeoff, in-flight dedup key composition, SDK Tool.inputSchema typing, preregistered-client semantics, postMessage handshake contract. * improvement(mcp): swap PG advisory lock for Redis mutex on OAuth refresh withMcpOauthRefreshLock now uses `coalesceLocally` + Redis acquireLock/ releaseLock with polling — the same primitives backing regular OAuth refresh (`app/api/auth/oauth/utils.ts`). No more pinning a Postgres connection for the duration of the SDK's OAuth HTTP refresh. - In-process dedup: shared promise via `coalesceLocally`. - Cross-process: Redis SET NX EX mutex; followers poll until the leader releases (30s max wait, 100ms poll), then acquire and run fn(). - Each MCP caller still constructs its own client (semantics preserved). - Falls open when Redis is unavailable — same behavior as the regular OAuth refresh code path. * improvement(mcp): use SDK protocol versions + pool pinned undici agents + cover OAuth lock - McpClient.SUPPORTED_VERSIONS removed; getVersionInfo() and the inbound serve route both import LATEST_PROTOCOL_VERSION / SUPPORTED_PROTOCOL_VERSIONS directly from @modelcontextprotocol/sdk/types.js. New protocol revisions ship automatically with SDK upgrades. - pinned-fetch now caches undici Agents in a module-level LRU keyed by resolvedIP (max 64). Back-to-back MCP calls to the same server reuse the keep-alive connection pool instead of opening fresh TCP + TLS each time. - New integration tests for withMcpOauthRefreshLock covering: in-process dedup via coalesceLocally, cross-process serialization via Redis mutex, fall-open on Redis unavailable, lock release on throw, release-failure swallow, per-row key isolation. * fix(mcp): serialize OAuth refresh callers; do not share McpClient withMcpOauthRefreshLock previously wrapped fn() in coalesceLocally, which returns the SAME promise (and the same resolved value) to all in-process callers. fn() returns a stateful McpClient — sharing it meant whichever caller finished first would disconnect the client while another was still mid-call, leaving in-flight RPC on a closed connection. Swap coalesceLocally for a per-row Promise chain: each caller waits for the previous to settle, then runs its OWN fn() (gets its own client). Cross-process Redis mutex semantics unchanged. The "shareable scalar" assumption that makes coalesceLocally correct for regular OAuth refresh (returns an access token string) does not hold for MCP, where each caller needs an independent connection. * fix(mcp): bugbot — TTL watchdog on OAuth lock + don't close evicted pinned agents - Redis refresh lock now uses a 15s TTL with a watchdog that extends every 5s while fn() runs. Long-running OAuth refreshes no longer lose the lock mid-flight and let another process race the same refresh. - Pinned-agent LRU eviction no longer calls `agent.close()`. Existing `createMcpPinnedFetch` closures hold the dispatcher reference and were using a closed Agent after eviction. We drop from the cache and let GC release the dispatcher once the last closure dies; undici closes idle keep-alive connections via its own internal timeout. - New tests: watchdog extends while fn() runs and stops once it settles; evicted agents are not closed and captured closures still work. * fix(mcp): throw instead of falling open when refresh lock wait exceeds deadline When the Redis refresh lock can't be acquired within REFRESH_MAX_WAIT_MS the previous code ran fn() uncoordinated — but another process can still be holding the lock (watchdog-extended) and refreshing the same OAuth row, recreating the exact race the lock prevents. Throw on deadline. The caller can retry; the Redis-down branch remains the only path that runs fn() uncoordinated (no coordination is possible there). * docs(mcp): restore TSDoc that documented intent on exported types/methods Earlier comment-trim pass went too far on a few exports — restored the TSDoc that explained non-obvious "why" decisions: - SimMcpOauthProviderInit.preregistered: when set, the SDK skips DCR. - McpServerConfig.userId: required for OAuth; selects which user's stored tokens to use. - McpOauthAuthorizationRequiredError: benign pending state vs failure. - McpToolsChangedCallback / McpClientOptions: notification semantics, DNS-rebinding pinning rationale, OAuth provider contract. - StoredMcpToolReference / StoredMcpTool: minimal vs extended use. - McpClient.connect: documents listChanged handler registration. - McpService.executeTool: documents session-error retry behavior. Pure-restatement comments ("Disconnect from MCP server") stay trimmed.
1 parent 19b5099 commit b2ad5e9

12 files changed

Lines changed: 459 additions & 192 deletions

File tree

apps/sim/app/api/mcp/serve/[serverId]/route.test.ts

Lines changed: 47 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -197,4 +197,51 @@ describe('MCP Serve Route', () => {
197197
expect(headers['X-API-Key']).toBeUndefined()
198198
expect(mockGenerateInternalToken).toHaveBeenCalledWith('user-1')
199199
})
200+
201+
describe('initialize protocol version negotiation', () => {
202+
async function callInitialize(protocolVersion?: string) {
203+
dbChainMockFns.limit.mockResolvedValueOnce([
204+
{
205+
id: 'server-1',
206+
name: 'Public Server',
207+
workspaceId: 'ws-1',
208+
isPublic: true,
209+
createdBy: 'owner-1',
210+
},
211+
])
212+
const params: Record<string, unknown> = {
213+
capabilities: {},
214+
clientInfo: { name: 'test', version: '1.0.0' },
215+
}
216+
if (protocolVersion !== undefined) params.protocolVersion = protocolVersion
217+
const req = new NextRequest('http://localhost:3000/api/mcp/serve/server-1', {
218+
method: 'POST',
219+
body: JSON.stringify({ jsonrpc: '2.0', id: 1, method: 'initialize', params }),
220+
})
221+
const res = await POST(req, { params: Promise.resolve({ serverId: 'server-1' }) })
222+
return res.json() as Promise<{ result: { protocolVersion: string } }>
223+
}
224+
225+
it('echoes a supported client protocolVersion (2025-06-18)', async () => {
226+
const body = await callInitialize('2025-06-18')
227+
expect(body.result.protocolVersion).toBe('2025-06-18')
228+
})
229+
230+
it('echoes a supported client protocolVersion (2024-11-05)', async () => {
231+
const body = await callInitialize('2024-11-05')
232+
expect(body.result.protocolVersion).toBe('2024-11-05')
233+
})
234+
235+
it('falls back to SDK latest when client requests unknown version', async () => {
236+
const { LATEST_PROTOCOL_VERSION } = await import('@modelcontextprotocol/sdk/types.js')
237+
const body = await callInitialize('2099-01-01')
238+
expect(body.result.protocolVersion).toBe(LATEST_PROTOCOL_VERSION)
239+
})
240+
241+
it('falls back to SDK latest when client omits protocolVersion', async () => {
242+
const { LATEST_PROTOCOL_VERSION } = await import('@modelcontextprotocol/sdk/types.js')
243+
const body = await callInitialize(undefined)
244+
expect(body.result.protocolVersion).toBe(LATEST_PROTOCOL_VERSION)
245+
})
246+
})
200247
})

apps/sim/app/api/mcp/serve/[serverId]/route.ts

Lines changed: 14 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -11,8 +11,10 @@ import {
1111
type JSONRPCError,
1212
type JSONRPCMessage,
1313
type JSONRPCResultResponse,
14+
LATEST_PROTOCOL_VERSION,
1415
type ListToolsResult,
1516
type RequestId,
17+
SUPPORTED_PROTOCOL_VERSIONS,
1618
type Tool,
1719
} from '@modelcontextprotocol/sdk/types.js'
1820
import { db } from '@sim/db'
@@ -36,6 +38,17 @@ import { getUserEntityPermissions } from '@/lib/workspaces/permissions/utils'
3638

3739
const logger = createLogger('WorkflowMcpServeAPI')
3840

41+
function negotiateProtocolVersion(rpcParams: unknown): string {
42+
const requested =
43+
rpcParams && typeof rpcParams === 'object' && 'protocolVersion' in rpcParams
44+
? (rpcParams as { protocolVersion?: unknown }).protocolVersion
45+
: undefined
46+
if (typeof requested === 'string' && SUPPORTED_PROTOCOL_VERSIONS.includes(requested)) {
47+
return requested
48+
}
49+
return LATEST_PROTOCOL_VERSION
50+
}
51+
3952
export const dynamic = 'force-dynamic'
4053

4154
interface RouteParams {
@@ -214,7 +227,7 @@ export const POST = withRouteHandler(
214227
switch (method) {
215228
case 'initialize': {
216229
const result: InitializeResult = {
217-
protocolVersion: '2024-11-05',
230+
protocolVersion: negotiateProtocolVersion(rpcParams),
218231
capabilities: { tools: {} },
219232
serverInfo: { name: server.name, version: '1.0.0' },
220233
}

apps/sim/hooks/queries/mcp.ts

Lines changed: 3 additions & 21 deletions
Original file line numberDiff line numberDiff line change
@@ -57,9 +57,7 @@ export const mcpKeys = {
5757

5858
export type { McpServer }
5959

60-
/**
61-
* Input for creating/updating an MCP server (distinct from McpServerConfig in types.ts)
62-
*/
60+
/** Wire shape for create/update; distinct from runtime McpServerConfig. */
6361
export interface McpServerInput {
6462
name: string
6563
transport: McpTransport
@@ -265,11 +263,7 @@ export function useCreateMcpServer() {
265263
})
266264
}
267265

268-
/**
269-
* Result of `useStartMcpOauth`. When `popup` is set, the caller should wait
270-
* for it to close (or for the `mcp-oauth` postMessage) before clearing any
271-
* "connecting" UI state.
272-
*/
266+
/** On `redirect`, the caller must wait for `popup.closed` or the `mcp-oauth` postMessage. */
273267
export type StartMcpOauthMutationResult =
274268
| { status: 'redirect'; popup: Window }
275269
| { status: 'already_authorized' }
@@ -464,13 +458,7 @@ const sseConnections: Map<string, SseEntry> =
464458
((globalThis as Record<string, unknown>)[SSE_KEY] as Map<string, SseEntry>) ??
465459
((globalThis as Record<string, unknown>)[SSE_KEY] = new Map<string, SseEntry>())
466460

467-
/**
468-
* Subscribe to MCP tool-change SSE events for a workspace.
469-
* On each `tools_changed` event, invalidates the relevant React Query caches
470-
* so the UI refreshes automatically.
471-
*
472-
* Invalidates both external MCP server keys and workflow MCP server keys.
473-
*/
461+
/** Subscribes to `tools_changed` SSE events and invalidates the affected query keys. */
474462
export function useMcpToolsEvents(workspaceId: string) {
475463
const queryClient = useQueryClient()
476464

@@ -598,17 +586,11 @@ export function useMcpServerTest() {
598586
}
599587
}
600588

601-
/**
602-
* Fetch allowed MCP domains (admin-configured allowlist)
603-
*/
604589
async function fetchAllowedMcpDomains(signal?: AbortSignal): Promise<string[] | null> {
605590
const data = await requestJson(getAllowedMcpDomainsContract, { signal })
606591
return data.allowedMcpDomains ?? null
607592
}
608593

609-
/**
610-
* Hook to fetch allowed MCP domains
611-
*/
612594
export function useAllowedMcpDomains() {
613595
return useQuery<string[] | null>({
614596
queryKey: mcpKeys.allowedDomains(),

apps/sim/lib/mcp/client.ts

Lines changed: 5 additions & 54 deletions
Original file line numberDiff line numberDiff line change
@@ -1,18 +1,10 @@
1-
/**
2-
* MCP (Model Context Protocol) Client
3-
*
4-
* Implements the client side of MCP protocol with support for:
5-
* - Streamable HTTP transport (MCP 2025-06-18)
6-
* - Tool execution and discovery
7-
* - Session management and protocol version negotiation
8-
* - Custom security/consent layer
9-
*/
10-
111
import { UnauthorizedError } from '@modelcontextprotocol/sdk/client/auth.js'
122
import { Client } from '@modelcontextprotocol/sdk/client/index.js'
133
import { StreamableHTTPClientTransport } from '@modelcontextprotocol/sdk/client/streamableHttp.js'
144
import {
5+
LATEST_PROTOCOL_VERSION,
156
type ListToolsResult,
7+
SUPPORTED_PROTOCOL_VERSIONS,
168
type Tool,
179
ToolListChangedNotificationSchema,
1810
} from '@modelcontextprotocol/sdk/types.js'
@@ -50,12 +42,6 @@ export class McpClient {
5042
private authProvider?: McpClientOptions['authProvider']
5143
private isConnected = false
5244

53-
private static readonly SUPPORTED_VERSIONS = [
54-
'2025-06-18', // Latest stable with elicitation and OAuth 2.1
55-
'2025-03-26', // Streamable HTTP support
56-
'2024-11-05', // Initial stable release
57-
]
58-
5945
constructor(options: McpClientOptions) {
6046
this.config = options.config
6147
this.securityPolicy = options.securityPolicy ?? {
@@ -135,9 +121,6 @@ export class McpClient {
135121
}
136122
}
137123

138-
/**
139-
* Disconnect from MCP server
140-
*/
141124
async disconnect(): Promise<void> {
142125
logger.info(`Disconnecting from MCP server: ${this.config.name}`)
143126

@@ -152,16 +135,10 @@ export class McpClient {
152135
logger.info(`Disconnected from MCP server: ${this.config.name}`)
153136
}
154137

155-
/**
156-
* Get current connection status
157-
*/
158138
getStatus(): McpConnectionStatus {
159139
return { ...this.connectionStatus }
160140
}
161141

162-
/**
163-
* List all available tools from the server
164-
*/
165142
async listTools(): Promise<McpTool[]> {
166143
if (!this.isConnected) {
167144
throw new McpConnectionError('Not connected to server', this.config.name)
@@ -190,9 +167,6 @@ export class McpClient {
190167
}
191168
}
192169

193-
/**
194-
* Execute a tool on the MCP server
195-
*/
196170
async callTool(toolCall: McpToolCall): Promise<McpToolResult> {
197171
if (!this.isConnected) {
198172
throw new McpConnectionError('Not connected to server', this.config.name)
@@ -237,10 +211,6 @@ export class McpClient {
237211
}
238212
}
239213

240-
/**
241-
* Ping the server to check if it's still alive and responsive
242-
* Per MCP spec: servers should respond to ping requests
243-
*/
244214
async ping(): Promise<{ _meta?: Record<string, any> }> {
245215
if (!this.isConnected) {
246216
throw new McpConnectionError('Not connected to server', this.config.name)
@@ -257,18 +227,11 @@ export class McpClient {
257227
}
258228
}
259229

260-
/**
261-
* Check if the server declared `capabilities.tools.listChanged: true` during initialization.
262-
*/
263230
hasListChangedCapability(): boolean {
264231
return !!this.client.getServerCapabilities()?.tools?.listChanged
265232
}
266233

267-
/**
268-
* Register a callback to be invoked when the underlying transport closes.
269-
* Used by the connection manager for reconnection logic.
270-
* Chains with the SDK's internal onclose handler so it still performs its cleanup.
271-
*/
234+
/** Chains with the SDK's internal onclose handler so its cleanup still runs. */
272235
onClose(callback: () => void): void {
273236
const existingHandler = this.transport.onclose
274237
this.transport.onclose = () => {
@@ -277,26 +240,17 @@ export class McpClient {
277240
}
278241
}
279242

280-
/**
281-
* Get server configuration
282-
*/
283243
getConfig(): McpServerConfig {
284244
return { ...this.config }
285245
}
286246

287-
/**
288-
* Get version information for this client
289-
*/
290247
static getVersionInfo(): McpVersionInfo {
291248
return {
292-
supported: [...McpClient.SUPPORTED_VERSIONS],
293-
preferred: McpClient.SUPPORTED_VERSIONS[0],
249+
supported: [...SUPPORTED_PROTOCOL_VERSIONS],
250+
preferred: LATEST_PROTOCOL_VERSION,
294251
}
295252
}
296253

297-
/**
298-
* Get the negotiated protocol version for this connection
299-
*/
300254
getNegotiatedVersion(): string | undefined {
301255
const serverVersion = this.client.getServerVersion()
302256
return typeof serverVersion === 'string' ? serverVersion : undefined
@@ -306,9 +260,6 @@ export class McpClient {
306260
return this.transport.sessionId
307261
}
308262

309-
/**
310-
* Request user consent for tool execution
311-
*/
312263
async requestConsent(consentRequest: McpConsentRequest): Promise<McpConsentResponse> {
313264
if (!this.securityPolicy.requireConsent) {
314265
return { granted: true, auditId: `audit-${Date.now()}` }

0 commit comments

Comments
 (0)