Resolve MCP session deadlock with thread-safe per-transport locking#114
Resolve MCP session deadlock with thread-safe per-transport locking#114dhilloulinoracle merged 3 commits intooracle:mainfrom
Conversation
|
Thank you @gotsysdba for the PR! @paul-cayet will review it |
There was a problem hiding this comment.
Could you add regression coverage for the behavior this PR is changing?
For instance:
-
Concurrent cold session creation for the same transport/conversation: A good test would call
AsyncRuntime.get_or_create_session()from two threads at the same time and assert that only one session is created and both callers get the same session back. -
Async cold session creation should not block the event loop: Since this PR moves
get_or_create_session()behindto_thread.run_sync(...)inMCPTool.run_async()andMCPToolBox.get_tools_async(), it would be good to add tests that schedule a small heartbeat task alongside those calls and assert the heartbeat still runs while session creation is in progress.
You could add them to wayflowcore/tests/mcptools/test_mcp_tools.py near the existing session-persistence tests around line 260.
|
@sonleoracle, added tests:
|
|
Internal regression failed: Build ID #748 |
|
Internal regression succeeded 🍏: Build ID #749 |
|
Merge Gate INPROGRESS |
|
Internal regression succeeded 🍏: Build ID #1540 |
|
Merge Gate succeeded 🍏: Build ID #2280 |
Fixes #112
Fixes a deadlock where get_or_create_session() called from async methods (MCPTool.run_async, MCPToolBox._get_tools_inner_async) blocked on the portal's own event loop, and a TOCTOU race condition where concurrent callers could create duplicate sessions for the same transport.
Changes:
tools.py: Offload get_or_create_session() to a worker thread via anyio.to_thread.run_sync in both MCPTool.run_async and MCPToolBox._get_tools_inner_async, so the event loop is never blocked_session_persistence.py: Replace the direct _create_long_lived_session call with a per-transport double-checked locking pattern: