
fix(mcp): add use_isolated_event_loop to McpToolset for Vertex AI Agent Engine compatibility #5509

Open
vipin-v-nair wants to merge 6 commits into google:main from vipin-v-nair:fix/mcp-toolset-isolated-event-loop-agent-engine

Conversation

@vipin-v-nair

Problem

McpToolset with StreamableHTTPConnectionParams fails on Vertex AI Agent
Engine with:

  Attempted to exit cancel scope in a different task than it was entered in                                                                                                     

Root cause: anyio's CancelScope binds to the asyncio.Task that enters
it. Agent Engine's scheduler context-switches tasks between entering and
exiting the scope inside streamablehttp_client's anyio.create_task_group().
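
The task-affinity rule described above can be illustrated with a toy stand-in. This is a minimal sketch that does not use anyio: `TaskBoundScope` is a hypothetical class that only mimics the "exit in the task that entered" check, and it reuses the error message quoted above for readability.

```python
import asyncio


class TaskBoundScope:
    """Toy stand-in for a task-bound scope (hypothetical, not anyio's class)."""

    def __init__(self) -> None:
        self._host_task = None

    def __enter__(self) -> "TaskBoundScope":
        # Remember which asyncio.Task entered the scope.
        self._host_task = asyncio.current_task()
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        # Refuse to exit from any task other than the one that entered.
        if asyncio.current_task() is not self._host_task:
            raise RuntimeError(
                "Attempted to exit cancel scope in a different task "
                "than it was entered in"
            )
        return False


async def _enter(scope: TaskBoundScope) -> None:
    scope.__enter__()


async def _exit(scope: TaskBoundScope) -> None:
    scope.__exit__(None, None, None)


async def exit_in_other_task() -> str:
    """Enter the scope in one task and exit it in another."""
    scope = TaskBoundScope()
    await asyncio.create_task(_enter(scope))
    try:
        await asyncio.create_task(_exit(scope))
    except RuntimeError as err:
        return str(err)
    return "no error"


async def exit_in_same_task() -> str:
    """Enter and exit the scope inside a single task."""
    with TaskBoundScope():
        await asyncio.sleep(0)
    return "no error"
```

Running `exit_in_other_task()` surfaces the error, while `exit_in_same_task()` completes normally. That is the property the fix relies on.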

Fix

Add use_isolated_event_loop=True to McpToolset (and the underlying
McpTool). When set, each MCP operation runs via asyncio.to_thread() in a
dedicated thread with asyncio.new_event_loop(). The cancel scope is created
and destroyed entirely within that isolated loop.
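
The isolation pattern described above can be sketched with the standard library alone. The names `run_isolated`, `_run_in_new_loop`, and `fake_mcp_call` are illustrative assumptions, not the helpers added by this PR; the sketch only shows that the coroutine runs on a different thread and a different event loop from the caller.

```python
import asyncio
import threading
from typing import Any, Awaitable, Callable


def _run_in_new_loop(coro_factory: Callable[[], Awaitable[Any]]) -> Any:
    """Run the coroutine to completion on a fresh loop owned by this thread."""
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(coro_factory())
    finally:
        loop.close()


async def run_isolated(coro_factory: Callable[[], Awaitable[Any]]) -> Any:
    """Offload the whole operation to a worker thread with its own loop."""
    return await asyncio.to_thread(_run_in_new_loop, coro_factory)


async def fake_mcp_call() -> dict:
    """Placeholder for an MCP operation; reports where it actually ran."""
    await asyncio.sleep(0)
    return {
        "thread": threading.current_thread().name,
        "loop_id": id(asyncio.get_running_loop()),
    }


async def compare_contexts() -> tuple:
    """Return the caller's thread/loop alongside the isolated call's."""
    caller = {
        "thread": threading.current_thread().name,
        "loop_id": id(asyncio.get_running_loop()),
    }
    isolated = await run_isolated(fake_mcp_call)
    return caller, isolated
```

A coroutine factory is passed instead of a coroutine object so that the coroutine is created inside the worker thread and is never tied to the caller's loop.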

Changes

  • mcp_thread_utils.py (new): list_tools_in_thread and call_tool_in_thread
    helpers that open a fresh connection per call inside an isolated loop.
  • mcp_tool.py: use_isolated_event_loop param; branches to thread path in
    _run_async_impl.
  • mcp_toolset.py: use_isolated_event_loop param; branches to thread path
    in get_tools; passes the flag through to each McpTool.
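
As a rough illustration of the branching and the connection-per-call behaviour listed above, the sketch below uses a hypothetical `FakeConnection` in place of a real MCP session and a hypothetical `ToyTool` in place of McpTool. The default path is simplified here and does not reflect how the existing session handling works.

```python
import asyncio
import threading


class FakeConnection:
    """Hypothetical stand-in for an MCP client session."""

    opened = 0
    _lock = threading.Lock()

    async def __aenter__(self) -> "FakeConnection":
        # Count every connection that gets opened.
        with FakeConnection._lock:
            FakeConnection.opened += 1
        return self

    async def __aexit__(self, exc_type, exc, tb) -> bool:
        return False

    async def call_tool(self, name: str, args: dict) -> dict:
        await asyncio.sleep(0)
        return {"tool": name, "args": args}


async def _call_once(name: str, args: dict) -> dict:
    # Open a fresh connection, make one call, then close it.
    async with FakeConnection() as conn:
        return await conn.call_tool(name, args)


def _call_in_fresh_loop(name: str, args: dict) -> dict:
    # Runs in a worker thread: new loop, one call, loop closed.
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(_call_once(name, args))
    finally:
        loop.close()


class ToyTool:
    """Hypothetical tool that branches on the flag."""

    def __init__(self, name: str, use_isolated_event_loop: bool = False):
        self.name = name
        self.use_isolated_event_loop = use_isolated_event_loop

    async def run(self, args: dict) -> dict:
        if self.use_isolated_event_loop:
            # Thread path: the connection lives and dies in the worker loop.
            return await asyncio.to_thread(
                _call_in_fresh_loop, self.name, args
            )
        # Default path (simplified in this sketch).
        return await _call_once(self.name, args)
```

The cost of this design is one new connection per operation, which trades some latency for never sharing a session across tasks.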

Usage

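from google.adk.tools.mcp_tool.mcp_session_manager import StreamableHTTPConnectionParams
from google.adk.tools.mcp_tool.mcp_toolset import McpToolset

McpToolset(
    connection_params=StreamableHTTPConnectionParams(
        url="https://my-mcp-server.run.app/mcp"
    ),
    use_isolated_event_loop=True,  # required on Vertex AI Agent Engine
)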
                                                                                                                                                                                  
Notes                                                     
                                                                                                                                                                                  
- Opt-in, defaults to False; no behaviour change for existing users.
- Only supported with StreamableHTTPConnectionParams; raises ValueError otherwise.
- auth_scheme, auth_credential, and header_provider are fully supported.                                                                                                          
- progress_callback and MCP sampling are not invoked in this mode (documented).                                                                                                   
- Verified against Vertex AI Agent Engine with three Cloud Run MCP servers.                                                                                                       
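
The transport restriction in the notes above could be enforced with a guard along these lines. This is only a sketch: `HttpParams`, `StdioParams`, and `validate_isolated_mode` are hypothetical stand-ins, and the actual check and error message in the PR may differ.

```python
from dataclasses import dataclass


@dataclass
class HttpParams:
    """Hypothetical stand-in for StreamableHTTPConnectionParams."""

    url: str


@dataclass
class StdioParams:
    """Hypothetical stand-in for a non-HTTP transport."""

    command: str


def validate_isolated_mode(
    connection_params: object, use_isolated_event_loop: bool
) -> None:
    """Reject the flag for transports other than streamable HTTP."""
    if use_isolated_event_loop and not isinstance(
        connection_params, HttpParams
    ):
        raise ValueError(
            "use_isolated_event_loop is only supported with "
            "streamable HTTP connection params, got "
            f"{type(connection_params).__name__}"
        )
```

Failing at construction time keeps the misconfiguration visible early, rather than at the first tool call.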

---                                                                                                                                                                                                                    
Unit Tests:
                                                                                                                                                                                                                       
- I have added or updated unit tests for my change.
- All unit tests pass locally.                                                                                                                                                                                         
 
Unit tests for mcp_thread_utils.py (list_tools_in_thread, call_tool_in_thread) were not added in this PR: the helpers wrap streamablehttp_client, which requires a live MCP server. Integration coverage is provided by the E2E test below.
                                                                                                                                                                                                                       
---             
Manual End-to-End (E2E) Tests:
                                                                                                                                                                                                                       
Setup:
                                                                                                                                                                                                                       
1. Deploy a Streamable HTTP MCP server (any Cloud Run MCP endpoint works).                                                                                                                                             
2. Deploy an ADK agent to Vertex AI Agent Engine (Reasoning Engine) that connects to that MCP server using McpToolset with use_isolated_event_loop=True:
                                                                                                                                                                                                                       
McpToolset(                                                                                                                                                                                                            
    connection_params=StreamableHTTPConnectionParams(url="https://<mcp-server-url>/mcp"),                                                                                                                              
    use_isolated_event_loop=True,                                                                                                                                                                                      
)               
                                                                                                                                                                                                                       
Test 1: Tools are discovered correctly:
- Invoke get_tools() on the deployed agent engine.
- Verify the tool list from the MCP server is returned without error.                                                                                                                                                  
                
Test 2: Tool call succeeds end-to-end:
- Send a prompt to the deployed agent that triggers an MCP tool call.
- Verify the agent receives the tool result and returns a coherent response.                                                                                                                                           
                                                                            
Test 3: Regression: default path is unchanged:
- Deploy the same agent with use_isolated_event_loop=False (default).                                                                                                                                                  
- Confirm existing behavior is unaffected.                                                                                                                                                                             
                                                                                                                                                                                                                       
Observed failure without the fix (baseline):                                                                                                                                                                           
anyio._backends._asyncio.ExceptionGroup: ...                                                                                                                                                                           
  RuntimeError: Attempted to exit cancel scope in a different task than it was entered in
                                                                                                                                                                                                                       
Observed result with the fix:
- Agent Engine successfully calls MCP tools across multiple turns with no cancel scope errors.                                                                                                                         
- Tested on Vertex AI Agent Engine (us-central1) with three Cloud Run MCP servers.                                                                                  
                                                                                                                                     
       

…compatibility

On Vertex AI Agent Engine, McpToolset with StreamableHTTPConnectionParams
fails with:

  Attempted to exit cancel scope in a different task than it was entered in

Root cause: anyio's CancelScope binds to the asyncio.Task that enters it.
Agent Engine's scheduler can context-switch tasks between entering and
exiting the scope inside streamablehttp_client's anyio.create_task_group(),
causing the assertion to fire.

Fix: add use_isolated_event_loop=True to McpToolset (and the underlying
McpTool). When set, each MCP operation (tool discovery and tool calls) runs
via asyncio.to_thread() in a dedicated thread with asyncio.new_event_loop().
The anyio cancel scope is created and destroyed entirely within that isolated
loop, so it never crosses task boundaries in the caller's scheduler.

The new mcp_thread_utils module contains the thread-safe helpers
(list_tools_in_thread, call_tool_in_thread). auth_scheme, auth_credential,
and header_provider are fully supported in this mode. progress_callback and
MCP sampling are not invoked (documented limitation).

The flag is opt-in and defaults to False, preserving all existing behaviour.
It is restricted to StreamableHTTPConnectionParams; other transports raise
ValueError.

Verified against Vertex AI Agent Engine with three Cloud Run MCP servers.

Co-Authored-By: vipin-v-nair <vipinvnair@google.com>
@adk-bot adk-bot added the mcp [Component] Issues about MCP support label Apr 27, 2026
@rohityan rohityan self-assigned this Apr 28, 2026
@rohityan

Collaborator

Hi @vipin-v-nair, thank you for your contribution! We appreciate you taking the time to submit this pull request. Please fix the failing mypy-diff and precommit tests so that we can proceed with a review.

@rohityan rohityan added the request clarification [Status] The maintainer need clarification or more information from the author label Apr 28, 2026
@vipin-v-nair

Author

I hope this addresses the failures. Please let me know if I need to do anything else.

LiuYuWei added a commit to costaff-ai/costaff-agent-business-analysis that referenced this pull request May 15, 2026
…s the cancel-scope race (BA pilot)

The anyio CancelScope cross-task race ("unhandled errors in a TaskGroup")
strips BA of its MCP tools and makes PDF export fail silently. Confirmed
this is upstream ADK issue #4454 (open, no fix — the maintainer's
"pin anyio 3.x" workaround is impossible on ADK 1.33 which hard-requires
anyio>=4.9). Tested anyio downgrade and A2A Executor V2 — both failed.

PR #5509 ("fix(mcp): add use_isolated_event_loop to McpToolset") is the
actual fix: each MCP op runs in a dedicated thread with its own asyncio
event loop, so the anyio cancel scope is always entered and exited in
the same task. PR has E2E evidence (Vertex AI Agent Engine + 3 Cloud Run
MCP servers, zero cancel-scope errors) but is not yet merged.

This vendors the PR as a patch overlay until it lands upstream:
  - agent/adk_patches/5509-isolated-event-loop.patch: the squashed PR diff
    with mcp_toolset.py hunk 1 hand-adjusted for ADK 1.33's __init__
    signature (1.33 added a `credential_key` kwarg the PR base lacked) and
    the cosmetic docstring hunk dropped. Dry-run applies cleanly (fuzz 2).
  - agent/Dockerfile: apt-get patch; after pip install, apply the patch
    to the installed google-adk via `patch -p2` from site-packages.
  - agent/mcp_toolsets/__init__.py: pass use_isolated_event_loop=True on
    both McpToolset constructions (own MCP + dashboard extra MCPs).
    Verified StreamableHTTPServerParams is an alias of
    StreamableHTTPConnectionParams so the PR's isinstance guard passes.

BA-only pilot. If the race disappears here, roll out to coding / twinkle /
template the same way. Trade-off (acceptable): new HTTP connection per
tool call, no MCP progress_callback/sampling — we use neither.
LiuYuWei added a commit to costaff-ai/costaff-agent-business-analysis that referenced this pull request May 15, 2026
Reverts bc9e74e + 610b4f3. The vendored google/adk-python#5509 patch
could not be made to apply cleanly+reliably against ADK 1.33 inside the
Docker build (hunk-structure fragility after hand-editing for the 1.33
signature drift, plus patch's stdin-prompt hazard). Returning BA to the
clean baseline. The MCP cancel-scope race will be addressed differently:
moving critical tools off MCP and onto native ADK function tools.
LiuYuWei added a commit to costaff-ai/costaff-agent-business-analysis that referenced this pull request May 22, 2026
…s the cancel-scope race (BA pilot)

The anyio CancelScope cross-task race ("unhandled errors in a TaskGroup")
strips BA of its MCP tools and makes PDF export fail silently. Confirmed
this is upstream ADK issue #4454 (open, no fix — the maintainer's
"pin anyio 3.x" workaround is impossible on ADK 1.33 which hard-requires
anyio>=4.9). Tested anyio downgrade and A2A Executor V2 — both failed.

PR #5509 ("fix(mcp): add use_isolated_event_loop to McpToolset") is the
actual fix: each MCP op runs in a dedicated thread with its own asyncio
event loop, so the anyio cancel scope is always entered and exited in
the same task. PR has E2E evidence (Vertex AI Agent Engine + 3 Cloud Run
MCP servers, zero cancel-scope errors) but is not yet merged.

This vendors the PR as a patch overlay until it lands upstream:
  - agent/adk_patches/5509-isolated-event-loop.patch: the squashed PR diff
    with mcp_toolset.py hunk 1 hand-adjusted for ADK 1.33's __init__
    signature (1.33 added a `credential_key` kwarg the PR base lacked) and
    the cosmetic docstring hunk dropped. Dry-run applies cleanly (fuzz 2).
  - agent/Dockerfile: apt-get patch; after pip install, apply the patch
    to the installed google-adk via `patch -p2` from site-packages.
  - agent/mcp_toolsets/__init__.py: pass use_isolated_event_loop=True on
    both McpToolset constructions (own MCP + dashboard extra MCPs).
    Verified StreamableHTTPServerParams is an alias of
    StreamableHTTPConnectionParams so the PR's isinstance guard passes.

BA-only pilot. If the race disappears here, roll out to coding / twinkle /
template the same way. Trade-off (acceptable): new HTTP connection per
tool call, no MCP progress_callback/sampling — we use neither.
LiuYuWei added a commit to costaff-ai/costaff-agent-business-analysis that referenced this pull request May 22, 2026
Reverts 7a54e6c + caf13b7. The vendored google/adk-python#5509 patch
could not be made to apply cleanly+reliably against ADK 1.33 inside the
Docker build (hunk-structure fragility after hand-editing for the 1.33
signature drift, plus patch's stdin-prompt hazard). Returning BA to the
clean baseline. The MCP cancel-scope race will be addressed differently:
moving critical tools off MCP and onto native ADK function tools.