fix(mcp): add use_isolated_event_loop to McpToolset for Vertex AI Agent Engine compatibility #5509
Open
vipin-v-nair wants to merge 6 commits into
…compatibility

On Vertex AI Agent Engine, McpToolset with StreamableHTTPConnectionParams fails with:

Attempted to exit cancel scope in a different task than it was entered in

Root cause: anyio's CancelScope binds to the asyncio.Task that enters it. Agent Engine's scheduler can context-switch tasks between entering and exiting the scope inside streamablehttp_client's anyio.create_task_group(), causing the assertion to fire.

Fix: add use_isolated_event_loop=True to McpToolset (and the underlying McpTool). When set, each MCP operation (tool discovery and tool calls) runs via asyncio.to_thread() in a dedicated thread with asyncio.new_event_loop(). The anyio cancel scope is created and destroyed entirely within that isolated loop, so it never crosses task boundaries in the caller's scheduler.

The new mcp_thread_utils module contains the thread-safe helpers (list_tools_in_thread, call_tool_in_thread). auth_scheme, auth_credential, and header_provider are fully supported in this mode. progress_callback and MCP sampling are not invoked (documented limitation).

The flag is opt-in and defaults to False, preserving all existing behaviour. It is restricted to StreamableHTTPConnectionParams; other transports raise ValueError.

Verified against Vertex AI Agent Engine with three Cloud Run MCP servers.

Co-Authored-By: vipin-v-nair <vipinvnair@google.com>
Collaborator
Hi @vipin-v-nair, thank you for your contribution! We appreciate you taking the time to submit this pull request. Please fix the failing mypy-diff and precommit tests so that we can proceed with a review.
Author
Hope this addresses the failures. Please let me know if I need to do anything else.
LiuYuWei added a commit to costaff-ai/costaff-agent-business-analysis that referenced this pull request May 15, 2026
…s the cancel-scope race (BA pilot)
The anyio CancelScope cross-task race ("unhandled errors in a TaskGroup")
strips BA of its MCP tools and makes PDF export fail silently. Confirmed
this is upstream ADK issue #4454 (open, no fix — the maintainer's
"pin anyio 3.x" workaround is impossible on ADK 1.33 which hard-requires
anyio>=4.9). Tested anyio downgrade and A2A Executor V2 — both failed.
PR #5509 ("fix(mcp): add use_isolated_event_loop to McpToolset") is the
actual fix: each MCP op runs in a dedicated thread with its own asyncio
event loop, so the anyio cancel scope is always entered and exited in
the same task. PR has E2E evidence (Vertex AI Agent Engine + 3 Cloud Run
MCP servers, zero cancel-scope errors) but is not yet merged.
This vendors the PR as a patch overlay until it lands upstream:
- agent/adk_patches/5509-isolated-event-loop.patch: the squashed PR diff
with mcp_toolset.py hunk 1 hand-adjusted for ADK 1.33's __init__
signature (1.33 added a `credential_key` kwarg the PR base lacked) and
the cosmetic docstring hunk dropped. Dry-run applies cleanly (fuzz 2).
- agent/Dockerfile: apt-get patch; after pip install, apply the patch
to the installed google-adk via `patch -p2` from site-packages.
- agent/mcp_toolsets/__init__.py: pass use_isolated_event_loop=True on
both McpToolset constructions (own MCP + dashboard extra MCPs).
Verified StreamableHTTPServerParams is an alias of
StreamableHTTPConnectionParams so the PR's isinstance guard passes.
BA-only pilot. If the race disappears here, roll out to coding / twinkle /
template the same way. Trade-off (acceptable): new HTTP connection per
tool call, no MCP progress_callback/sampling — we use neither.
LiuYuWei added a commit to costaff-ai/costaff-agent-business-analysis that referenced this pull request May 15, 2026
Reverts bc9e74e + 610b4f3. The vendored google/adk-python#5509 patch could not be made to apply cleanly+reliably against ADK 1.33 inside the Docker build (hunk-structure fragility after hand-editing for the 1.33 signature drift, plus patch's stdin-prompt hazard). Returning BA to the clean baseline. The MCP cancel-scope race will be addressed differently: moving critical tools off MCP and onto native ADK function tools.
LiuYuWei added a commit to costaff-ai/costaff-agent-business-analysis that referenced this pull request May 22, 2026
…s the cancel-scope race (BA pilot)
The anyio CancelScope cross-task race ("unhandled errors in a TaskGroup")
strips BA of its MCP tools and makes PDF export fail silently. Confirmed
this is upstream ADK issue #4454 (open, no fix — the maintainer's
"pin anyio 3.x" workaround is impossible on ADK 1.33 which hard-requires
anyio>=4.9). Tested anyio downgrade and A2A Executor V2 — both failed.
PR #5509 ("fix(mcp): add use_isolated_event_loop to McpToolset") is the
actual fix: each MCP op runs in a dedicated thread with its own asyncio
event loop, so the anyio cancel scope is always entered and exited in
the same task. PR has E2E evidence (Vertex AI Agent Engine + 3 Cloud Run
MCP servers, zero cancel-scope errors) but is not yet merged.
This vendors the PR as a patch overlay until it lands upstream:
- agent/adk_patches/5509-isolated-event-loop.patch: the squashed PR diff
with mcp_toolset.py hunk 1 hand-adjusted for ADK 1.33's __init__
signature (1.33 added a `credential_key` kwarg the PR base lacked) and
the cosmetic docstring hunk dropped. Dry-run applies cleanly (fuzz 2).
- agent/Dockerfile: apt-get patch; after pip install, apply the patch
to the installed google-adk via `patch -p2` from site-packages.
- agent/mcp_toolsets/__init__.py: pass use_isolated_event_loop=True on
both McpToolset constructions (own MCP + dashboard extra MCPs).
Verified StreamableHTTPServerParams is an alias of
StreamableHTTPConnectionParams so the PR's isinstance guard passes.
BA-only pilot. If the race disappears here, roll out to coding / twinkle /
template the same way. Trade-off (acceptable): new HTTP connection per
tool call, no MCP progress_callback/sampling — we use neither.
LiuYuWei added a commit to costaff-ai/costaff-agent-business-analysis that referenced this pull request May 22, 2026
Reverts 7a54e6c + caf13b7. The vendored google/adk-python#5509 patch could not be made to apply cleanly+reliably against ADK 1.33 inside the Docker build (hunk-structure fragility after hand-editing for the 1.33 signature drift, plus patch's stdin-prompt hazard). Returning BA to the clean baseline. The MCP cancel-scope race will be addressed differently: moving critical tools off MCP and onto native ADK function tools.
Problem
McpToolset with StreamableHTTPConnectionParams fails on Vertex AI Agent Engine with:
Attempted to exit cancel scope in a different task than it was entered in
Root cause: anyio's CancelScope binds to the asyncio.Task that enters it. Agent Engine's scheduler context-switches tasks between entering and exiting the scope inside streamablehttp_client's anyio.create_task_group().
Fix
Add use_isolated_event_loop=True to McpToolset (and the underlying McpTool). When set, each MCP operation runs via asyncio.to_thread() in a dedicated thread with asyncio.new_event_loop(). The cancel scope is created and destroyed entirely within that isolated loop.
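The isolation pattern described in the fix can be sketched with the standard library alone. This is a minimal illustration, not the PR's actual code: `run_isolated`, `_run_in_new_loop`, and `fake_mcp_call` are hypothetical names, and the real helpers (list_tools_in_thread, call_tool_in_thread) may be structured differently.

```python
import asyncio
import threading


def _run_in_new_loop(coro_factory):
    """Run a coroutine to completion on a fresh event loop in this thread."""
    loop = asyncio.new_event_loop()
    try:
        # Everything the coroutine enters and exits (including any cancel
        # scopes) lives inside tasks owned by this private loop.
        return loop.run_until_complete(coro_factory())
    finally:
        loop.close()


async def run_isolated(coro_factory):
    """Await coro_factory() on a worker thread with its own event loop."""
    return await asyncio.to_thread(_run_in_new_loop, coro_factory)


async def fake_mcp_call():
    # Hypothetical stand-in for an MCP operation; reports where it ran.
    await asyncio.sleep(0)
    return threading.get_ident(), id(asyncio.get_running_loop())


async def main():
    caller = (threading.get_ident(), id(asyncio.get_running_loop()))
    worker = await run_isolated(fake_mcp_call)
    return caller, worker


caller, worker = asyncio.run(main())
```

Comparing `caller` and `worker` shows that the stand-in operation ran on a different thread and a different loop from the code that awaited it, which is the property the fix relies on.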
Changes
- mcp_thread_utils.py (new): list_tools_in_thread and call_tool_in_thread helpers that open a fresh connection per call inside an isolated loop.
- mcp_tool.py: use_isolated_event_loop param; branches to thread path in _run_async_impl.
- mcp_toolset.py: use_isolated_event_loop param; branches to thread path in get_tools; passes the flag through to each McpTool.
Usage
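A rough sketch of how the flag might be passed, based only on the behaviour described in this PR. The classes below are local stand-ins so the snippet runs without ADK installed; they are not the real ADK classes, and the `url` and `command` fields, the example URL, and the error message text are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class StreamableHTTPConnectionParams:  # stand-in, not the ADK class
    url: str


@dataclass
class StdioConnectionParams:  # stand-in for any other transport
    command: str


class McpToolset:  # stand-in mirroring only the described flag handling
    def __init__(self, connection_params, use_isolated_event_loop=False):
        # Per the PR description: opt-in, defaults to False, and only
        # StreamableHTTPConnectionParams is accepted in isolated mode.
        if use_isolated_event_loop and not isinstance(
            connection_params, StreamableHTTPConnectionParams
        ):
            raise ValueError(
                "use_isolated_event_loop requires StreamableHTTPConnectionParams"
            )
        self.connection_params = connection_params
        self.use_isolated_event_loop = use_isolated_event_loop


# Opt in for a streamable HTTP MCP server.
toolset = McpToolset(
    connection_params=StreamableHTTPConnectionParams(url="https://example.com/mcp"),
    use_isolated_event_loop=True,
)

# Omitting the flag keeps the existing behaviour.
default_toolset = McpToolset(
    connection_params=StreamableHTTPConnectionParams(url="https://example.com/mcp"),
)

# Any other transport combined with the flag is rejected.
try:
    McpToolset(StdioConnectionParams(command="server"), use_isolated_event_loop=True)
    rejected = False
except ValueError:
    rejected = True
```

With the real library, the same keyword argument would presumably be passed to the actual McpToolset constructor once the PR is available in the installed ADK version.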