Commit b2ad5e9
authored
improvement(mcp): post-merge hardening — protocol negotiation + distributed OAuth lock + typed errors (#4722)
* improvement(mcp): post-merge hardening — protocol negotiation + distributed OAuth lock + typed error dispatch
- Inbound MCP server now negotiates protocolVersion per MCP 2025-06-18: echoes
the client's requested version when supported, falls back to our latest.
Previously hardcoded to the oldest spec version (2024-11-05).
- withMcpOauthRefreshLock now takes a Postgres advisory transaction lock in
addition to the in-process Promise chain, so concurrent processes (multi-task
ECS) serialize on a per-OAuth-row basis. Previously a refresh race across
processes could rotate a token under another process's feet and force re-auth.
- categorizeError dispatches on McpOauthAuthorizationRequiredError /
UnauthorizedError / McpConnectionError first, only falling back to substring
matching for SDK / third-party errors. Adds 502 for connection failures and
503 for cooldown. Tests cover all four typed cases.
- discoverTools no longer pretends to handle deferred-side-effect rejections
via a dead allSettled().catch() — each side-effect already self-logs; we just
swallow per-promise to silence unhandled-rejection warnings.
* fix(mcp): bugbot review on hardening PR
- Replace `as never` cast in negotiateProtocolVersion with `as readonly string[]`
on the array — preserves TypeScript narrowing on the comparison while
satisfying the readonly-tuple `.includes()` constraint properly.
- Document the pg_advisory_xact_lock tradeoff: session-level locks
(`pg_advisory_lock`) would release the connection earlier, but PgBouncer
transaction-pooling mode breaks them. xact_lock is the correct choice for
Sim's deployment; if pool pressure becomes real, the comment notes the
Redlock escape hatch.
* chore(mcp): trim verbose comments + reuse SDK Tool type in McpTool
- McpTool now extends `Pick<Tool, 'name' | 'description'>` from
@modelcontextprotocol/sdk so name/description fields stay in sync with
the SDK; serverId/serverName remain Sim-specific additions.
- Drop file-header restatements ("MCP Types - for connecting to external
MCP servers"), one-line wrapper docstrings ("Get connection status"),
and narrative comment blocks that just restate visible code.
- Keep only comments that document non-obvious "why" — OAuth refresh-lock
tradeoff, in-flight dedup key composition, SDK Tool.inputSchema typing,
preregistered-client semantics, postMessage handshake contract.
* improvement(mcp): swap PG advisory lock for Redis mutex on OAuth refresh
withMcpOauthRefreshLock now uses `coalesceLocally` + Redis acquireLock/
releaseLock with polling — the same primitives backing regular OAuth
refresh (`app/api/auth/oauth/utils.ts`). No more pinning a Postgres
connection for the duration of the SDK's OAuth HTTP refresh.
- In-process dedup: shared promise via `coalesceLocally`.
- Cross-process: Redis SET NX EX mutex; followers poll until the leader
releases (30s max wait, 100ms poll), then acquire and run fn().
- Each MCP caller still constructs its own client (semantics preserved).
- Falls open when Redis is unavailable — same behavior as the regular
OAuth refresh code path.
* improvement(mcp): use SDK protocol versions + pool pinned undici agents + cover OAuth lock
- McpClient.SUPPORTED_VERSIONS removed; getVersionInfo() and the inbound
serve route both import LATEST_PROTOCOL_VERSION / SUPPORTED_PROTOCOL_VERSIONS
directly from @modelcontextprotocol/sdk/types.js. New protocol revisions
ship automatically with SDK upgrades.
- pinned-fetch now caches undici Agents in a module-level LRU keyed by
resolvedIP (max 64). Back-to-back MCP calls to the same server reuse the
keep-alive connection pool instead of opening fresh TCP + TLS each time.
- New integration tests for withMcpOauthRefreshLock covering: in-process
dedup via coalesceLocally, cross-process serialization via Redis mutex,
fall-open on Redis unavailable, lock release on throw, release-failure
swallow, per-row key isolation.
* fix(mcp): serialize OAuth refresh callers; do not share McpClient
withMcpOauthRefreshLock previously wrapped fn() in coalesceLocally, which
returns the SAME promise (and the same resolved value) to all in-process
callers. fn() returns a stateful McpClient — sharing it meant whichever
caller finished first would disconnect the client while another was still
mid-call, leaving in-flight RPC on a closed connection.
Swap coalesceLocally for a per-row Promise chain: each caller waits for
the previous to settle, then runs its OWN fn() (gets its own client).
Cross-process Redis mutex semantics unchanged.
The "shareable scalar" assumption that makes coalesceLocally correct for
regular OAuth refresh (returns an access token string) does not hold for
MCP, where each caller needs an independent connection.
* fix(mcp): bugbot — TTL watchdog on OAuth lock + don't close evicted pinned agents
- Redis refresh lock now uses a 15s TTL with a watchdog that extends every
5s while fn() runs. Long-running OAuth refreshes no longer lose the lock
mid-flight and let another process race the same refresh.
- Pinned-agent LRU eviction no longer calls `agent.close()`. Existing
`createMcpPinnedFetch` closures hold the dispatcher reference and were
using a closed Agent after eviction. We drop from the cache and let GC
release the dispatcher once the last closure dies; undici closes idle
keep-alive connections via its own internal timeout.
- New tests: watchdog extends while fn() runs and stops once it settles;
evicted agents are not closed and captured closures still work.
* fix(mcp): throw instead of falling open when refresh lock wait exceeds deadline
When the Redis refresh lock can't be acquired within REFRESH_MAX_WAIT_MS
the previous code ran fn() uncoordinated — but another process can still
be holding the lock (watchdog-extended) and refreshing the same OAuth
row, recreating the exact race the lock prevents.
Throw on deadline. The caller can retry; the Redis-down branch remains
the only path that runs fn() uncoordinated (no coordination is possible
there).
* docs(mcp): restore TSDoc that documented intent on exported types/methods
Earlier comment-trim pass went too far on a few exports — restored
the TSDoc that explained non-obvious "why" decisions:
- SimMcpOauthProviderInit.preregistered: when set, the SDK skips DCR.
- McpServerConfig.userId: required for OAuth; selects which user's
stored tokens to use.
- McpOauthAuthorizationRequiredError: benign pending state vs failure.
- McpToolsChangedCallback / McpClientOptions: notification semantics,
DNS-rebinding pinning rationale, OAuth provider contract.
- StoredMcpToolReference / StoredMcpTool: minimal vs extended use.
- McpClient.connect: documents listChanged handler registration.
- McpService.executeTool: documents session-error retry behavior.
Pure-restatement comments ("Disconnect from MCP server") stay trimmed.1 parent 19b5099 commit b2ad5e9
12 files changed
Lines changed: 459 additions & 192 deletions
File tree
- apps/sim
- app/api/mcp/serve/[serverId]
- hooks/queries
- lib/mcp
- oauth
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
197 | 197 | | |
198 | 198 | | |
199 | 199 | | |
| 200 | + | |
| 201 | + | |
| 202 | + | |
| 203 | + | |
| 204 | + | |
| 205 | + | |
| 206 | + | |
| 207 | + | |
| 208 | + | |
| 209 | + | |
| 210 | + | |
| 211 | + | |
| 212 | + | |
| 213 | + | |
| 214 | + | |
| 215 | + | |
| 216 | + | |
| 217 | + | |
| 218 | + | |
| 219 | + | |
| 220 | + | |
| 221 | + | |
| 222 | + | |
| 223 | + | |
| 224 | + | |
| 225 | + | |
| 226 | + | |
| 227 | + | |
| 228 | + | |
| 229 | + | |
| 230 | + | |
| 231 | + | |
| 232 | + | |
| 233 | + | |
| 234 | + | |
| 235 | + | |
| 236 | + | |
| 237 | + | |
| 238 | + | |
| 239 | + | |
| 240 | + | |
| 241 | + | |
| 242 | + | |
| 243 | + | |
| 244 | + | |
| 245 | + | |
| 246 | + | |
200 | 247 | | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
11 | 11 | | |
12 | 12 | | |
13 | 13 | | |
| 14 | + | |
14 | 15 | | |
15 | 16 | | |
| 17 | + | |
16 | 18 | | |
17 | 19 | | |
18 | 20 | | |
| |||
36 | 38 | | |
37 | 39 | | |
38 | 40 | | |
| 41 | + | |
| 42 | + | |
| 43 | + | |
| 44 | + | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
| 48 | + | |
| 49 | + | |
| 50 | + | |
| 51 | + | |
39 | 52 | | |
40 | 53 | | |
41 | 54 | | |
| |||
214 | 227 | | |
215 | 228 | | |
216 | 229 | | |
217 | | - | |
| 230 | + | |
218 | 231 | | |
219 | 232 | | |
220 | 233 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
57 | 57 | | |
58 | 58 | | |
59 | 59 | | |
60 | | - | |
61 | | - | |
62 | | - | |
| 60 | + | |
63 | 61 | | |
64 | 62 | | |
65 | 63 | | |
| |||
265 | 263 | | |
266 | 264 | | |
267 | 265 | | |
268 | | - | |
269 | | - | |
270 | | - | |
271 | | - | |
272 | | - | |
| 266 | + | |
273 | 267 | | |
274 | 268 | | |
275 | 269 | | |
| |||
464 | 458 | | |
465 | 459 | | |
466 | 460 | | |
467 | | - | |
468 | | - | |
469 | | - | |
470 | | - | |
471 | | - | |
472 | | - | |
473 | | - | |
| 461 | + | |
474 | 462 | | |
475 | 463 | | |
476 | 464 | | |
| |||
598 | 586 | | |
599 | 587 | | |
600 | 588 | | |
601 | | - | |
602 | | - | |
603 | | - | |
604 | 589 | | |
605 | 590 | | |
606 | 591 | | |
607 | 592 | | |
608 | 593 | | |
609 | | - | |
610 | | - | |
611 | | - | |
612 | 594 | | |
613 | 595 | | |
614 | 596 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1 | | - | |
2 | | - | |
3 | | - | |
4 | | - | |
5 | | - | |
6 | | - | |
7 | | - | |
8 | | - | |
9 | | - | |
10 | | - | |
11 | 1 | | |
12 | 2 | | |
13 | 3 | | |
14 | 4 | | |
| 5 | + | |
15 | 6 | | |
| 7 | + | |
16 | 8 | | |
17 | 9 | | |
18 | 10 | | |
| |||
50 | 42 | | |
51 | 43 | | |
52 | 44 | | |
53 | | - | |
54 | | - | |
55 | | - | |
56 | | - | |
57 | | - | |
58 | | - | |
59 | 45 | | |
60 | 46 | | |
61 | 47 | | |
| |||
135 | 121 | | |
136 | 122 | | |
137 | 123 | | |
138 | | - | |
139 | | - | |
140 | | - | |
141 | 124 | | |
142 | 125 | | |
143 | 126 | | |
| |||
152 | 135 | | |
153 | 136 | | |
154 | 137 | | |
155 | | - | |
156 | | - | |
157 | | - | |
158 | 138 | | |
159 | 139 | | |
160 | 140 | | |
161 | 141 | | |
162 | | - | |
163 | | - | |
164 | | - | |
165 | 142 | | |
166 | 143 | | |
167 | 144 | | |
| |||
190 | 167 | | |
191 | 168 | | |
192 | 169 | | |
193 | | - | |
194 | | - | |
195 | | - | |
196 | 170 | | |
197 | 171 | | |
198 | 172 | | |
| |||
237 | 211 | | |
238 | 212 | | |
239 | 213 | | |
240 | | - | |
241 | | - | |
242 | | - | |
243 | | - | |
244 | 214 | | |
245 | 215 | | |
246 | 216 | | |
| |||
257 | 227 | | |
258 | 228 | | |
259 | 229 | | |
260 | | - | |
261 | | - | |
262 | | - | |
263 | 230 | | |
264 | 231 | | |
265 | 232 | | |
266 | 233 | | |
267 | | - | |
268 | | - | |
269 | | - | |
270 | | - | |
271 | | - | |
| 234 | + | |
272 | 235 | | |
273 | 236 | | |
274 | 237 | | |
| |||
277 | 240 | | |
278 | 241 | | |
279 | 242 | | |
280 | | - | |
281 | | - | |
282 | | - | |
283 | 243 | | |
284 | 244 | | |
285 | 245 | | |
286 | 246 | | |
287 | | - | |
288 | | - | |
289 | | - | |
290 | 247 | | |
291 | 248 | | |
292 | | - | |
293 | | - | |
| 249 | + | |
| 250 | + | |
294 | 251 | | |
295 | 252 | | |
296 | 253 | | |
297 | | - | |
298 | | - | |
299 | | - | |
300 | 254 | | |
301 | 255 | | |
302 | 256 | | |
| |||
306 | 260 | | |
307 | 261 | | |
308 | 262 | | |
309 | | - | |
310 | | - | |
311 | | - | |
312 | 263 | | |
313 | 264 | | |
314 | 265 | | |
| |||
0 commit comments