Fix: Prevent hangs on transport crashes by racing tool calls against session task by JamesDuncanNz · Pull Request #4904 · google/adk-python

JamesDuncanNz · 2026-03-19T20:48:53Z

Race tool call coroutines against the background session task so that transport crashes (e.g. non-2xx HTTP responses) surface immediately instead of hanging until sse_read_timeout expires.

Add run_guarded() method to SessionContext
Add is_task_alive property for clean cross-class API
Let ConnectionError propagate to retry_on_errors for session recovery
Enhance _is_session_disconnected to detect dead background tasks
Add 6 new test cases

Please ensure you have read the contribution guide before creating a pull request.

Link to Issue or Description of Change

1. Link to an existing issue (if applicable):

4901
4162

2. Or, if no issue exists, describe the change:

Problem:

When an MCP transport crashes (e.g. the remote server returns a non-2xx HTTP response), the session.call_tool() coroutine hangs indefinitely until sse_read_timeout expires. The background session task has already died with the transport error, but nothing notices until the timeout fires. This creates a poor user experience with long, unexplained hangs.

Solution:

To provide immediate feedback to the user upon transport failures, this change races the tool call coroutine against the background session task using asyncio.wait(return_when=FIRST_COMPLETED).

If the background task dies first (indicating a transport crash), run_guarded() cancels the tool call coroutine and raises a ConnectionError wrapping the original transport exception. This error propagates to retry_on_errors, which triggers session recovery by creating a fresh session. Exceptions originating from the tool call coroutine itself are propagated unwrapped.

Key changes:

SessionContext.run_guarded() — races a coroutine against the background task; if the task dies first, cancels the coroutine and raises ConnectionError wrapping the original exception.
SessionContext.is_task_alive — property that returns True only when the background task is running, used by _is_session_disconnected to detect dead sessions.
MCPSessionManager._is_session_disconnected() — now accepts an optional SessionContext and checks is_task_alive in addition to stream state.
MCPSessionManager.get_session_context() — new method to retrieve the SessionContext for a given session, used by MCPTool to call run_guarded().
MCPTool._call_tool_async() — wraps the call_tool coroutine with run_guarded() when a SessionContext is available.

Testing Plan

Unit Tests:

I have added or updated unit tests for my change.
All unit tests pass locally.

6 new test cases added:

Test	File	Description
`test_run_guarded_normal_completion`	`test_session_context.py`	Coroutine completes normally and returns its result
`test_run_guarded_background_task_crash`	`test_session_context.py`	Background task crash raises `ConnectionError` with original exception as cause
`test_run_guarded_task_already_dead`	`test_session_context.py`	Raises immediately when task is already done before `run_guarded` is called
`test_run_guarded_cancels_coroutine_on_crash`	`test_session_context.py`	Verifies the hung coroutine is cancelled when the background task crashes
`test_run_guarded_coroutine_raises`	`test_session_context.py`	Coroutine exceptions propagate unwrapped (not wrapped in `ConnectionError`)
`test_run_guarded_no_task`	`test_session_context.py`	Raises `ConnectionError` when task has not been started

Additional tests updated in test_mcp_session_manager.py:

TestGetSessionContext — 2 tests for the new get_session_context() method
TestIsSessionDisconnectedWithContext — 3 tests for the enhanced _is_session_disconnected() with SessionContext
Existing tests updated to include SessionContext in session pool tuples

pytest tests/unittests/tools/mcp_tool/ -v

Manual End-to-End (E2E) Tests:

Tested against a local MCP server configured to return HTTP 403 after initial connection. Before this change, call_tool hung for the full sse_read_timeout duration. After this change, the ConnectionError surfaces within milliseconds and retry_on_errors triggers a fresh session.

Checklist

I have read the CONTRIBUTING.md document.
I have performed a self-review of my own code.
I have commented my code, particularly in hard-to-understand areas.
I have added tests that prove my fix is effective or that my feature works.
New and existing unit tests pass locally with my changes.
I have manually tested my changes end-to-end.
Any dependent changes have been merged and published in downstream modules.

Additional context

The SessionContext is now stored as the 4th element in the session pool tuple (_sessions dict), changing the tuple from 3 elements to 4. All existing tests have been updated to reflect this.

Race tool call coroutines against the background session task so that transport crashes (e.g. non-2xx HTTP responses) surface immediately instead of hanging until sse_read_timeout expires. - Add run_guarded() method to SessionContext - Add is_task_alive property for clean cross-class API - Let ConnectionError propagate to retry_on_errors for session recovery - Enhance _is_session_disconnected to detect dead background tasks - Add 6 new test cases

JamesDuncanNz and others added 2 commits March 19, 2026 20:45

Merge branch 'google:main' into main

ff8a97b

adk-bot added the mcp [Component] Issues about MCP support label Mar 19, 2026

JamesDuncanNz closed this Mar 20, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix: Prevent hangs on transport crashes by racing tool calls against session task#4904

Fix: Prevent hangs on transport crashes by racing tool calls against session task#4904
JamesDuncanNz wants to merge 2 commits intogoogle:mainfrom
JamesDuncanNz:main

JamesDuncanNz commented Mar 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

JamesDuncanNz commented Mar 19, 2026

Link to Issue or Description of Change

Testing Plan

Checklist

Additional context

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants