Open
Description
Problem
Trivial bash tool calls (e.g., sed reading 10 lines of a file) dispatched by the headless server hang indefinitely and never complete. The server eventually ends the session with session.shutdown, about 22 minutes later.
Observed in: MAUI PR Reviewer worker-2 session (6b7ce97f). Two sed commands dispatched at 19:41:46 never completed. Server killed session at 20:03:25.
Evidence from events.jsonl
19:41:46.971 | tool.execution_start bash: sed -n '645,680p' Window.cs
19:41:46.972 | tool.execution_start bash: sed -n '540,550p' Window.cs
... no tool.execution_complete events ...
20:03:25.795 | session.shutdown
The session was running a sub-agent (agent-38, an Opus model reviewing a PR) that was still actively working. A read_agent call with a 120s timeout had just completed. The two sed commands were dispatched alongside the sub-agent work.
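The gap in the excerpt above can be checked mechanically. The following is a minimal sketch that parses the rendered log lines shown in this issue (not the raw events.jsonl schema, which is not reproduced here) and reports which tool calls were still pending at shutdown and for how long. The function names are hypothetical, and matching a start to its completion by the command text is an assumption; real events most likely carry a tool call ID that should be used instead.

```python
from datetime import datetime

# Rendered excerpt copied from this issue (the elided line is omitted).
LOG = """\
19:41:46.971 | tool.execution_start bash: sed -n '645,680p' Window.cs
19:41:46.972 | tool.execution_start bash: sed -n '540,550p' Window.cs
20:03:25.795 | session.shutdown
"""


def parse(line):
    """Split 'HH:MM:SS.mmm | event detail' into (timestamp, event, detail)."""
    stamp, _, rest = line.partition(" | ")
    event, _, detail = rest.partition(" ")
    return datetime.strptime(stamp.strip(), "%H:%M:%S.%f"), event, detail


def pending_at_shutdown(log):
    """Return {command: seconds pending} for starts with no completion
    before session.shutdown. Assumption: starts and completions are
    matched by command text, which is only a stand-in for a real call ID."""
    pending = {}
    for line in log.splitlines():
        if " | " not in line:
            continue
        ts, event, detail = parse(line)
        if event == "tool.execution_start":
            pending[detail] = ts
        elif event == "tool.execution_complete":
            pending.pop(detail, None)
        elif event == "session.shutdown":
            return {cmd: (ts - t).total_seconds() for cmd, t in pending.items()}
    return {}


if __name__ == "__main__":
    for cmd, seconds in pending_at_shutdown(LOG).items():
        print(f"{cmd}: pending {seconds:.3f}s at shutdown")
```

For the excerpt above, both sed calls are reported as pending for roughly 1299 seconds, which matches the ~22 minute figure in the problem statement.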
Possible Causes
- Concurrency limit: The sub-agent may consume all execution slots, leaving trivial bash commands queued behind it
- Dead event stream interaction: The client event stream was already broken at this point. If the server needs any client round-trip for tool execution (permission requests?), it would hang
- Server-side tool execution bug: Internal deadlock or resource exhaustion
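To make the first hypothesis concrete, here is a small simulation of slot starvation. It is a sketch only: the headless server's actual scheduling is unknown, and the semaphore, the slot counts, and the names `run_tool` and `simulate` are all assumptions made for illustration. It shows that when a long-running task holds the only execution slot, a trivial task cannot even start until the slot is released.

```python
import asyncio


async def run_tool(name, duration, slots, log):
    """Run a pretend tool call that needs one execution slot."""
    async with slots:
        log.append(f"start {name}")
        await asyncio.sleep(duration)
        log.append(f"done {name}")


async def simulate(max_slots):
    """Dispatch a long 'sub-agent' task, then a trivial 'sed' task.

    Returns (sed_started_early, log), where sed_started_early says whether
    sed began while the sub-agent was still running."""
    slots = asyncio.Semaphore(max_slots)  # hypothetical per-session limit
    log = []
    agent = asyncio.create_task(run_tool("sub-agent", 0.2, slots, log))
    await asyncio.sleep(0)  # let the sub-agent take its slot first
    sed = asyncio.create_task(run_tool("sed", 0.0, slots, log))
    await asyncio.sleep(0.05)  # short window while the sub-agent still runs
    sed_started_early = "start sed" in log
    await asyncio.gather(agent, sed)
    return sed_started_early, log


if __name__ == "__main__":
    for limit in (1, 2):
        early, log = asyncio.run(simulate(limit))
        print(f"slots={limit} sed started early: {early} log: {log}")
```

With one slot, sed starts only after the sub-agent finishes; with two slots it starts immediately. If the slot holder never finishes, the queued command never starts, which would look like the hang reported here. Note that this pattern would normally delay the tool.execution_start event as well, so whether it fits the observed log depends on when the server emits that event.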
Impact
- Multi-agent worker sessions get killed mid-work, losing all progress
- The watchdog eventually detects this (commit ccee665 improves detection), but the work is already lost
- Workaround: None from the client side. The tools simply never complete.
Investigation Needed
- Is there a concurrency limit on tool execution per session in the headless server?
- Does tool execution require any client-side round-trip (permission, etc.)?
- Is this a known issue with subagent + parallel tool dispatch?
- May need an upstream Copilot CLI bug report
Related
- PR #391: "Fix multi-agent worker failures, session persistence, server health, and history recovery" (watchdog session.shutdown detection)
- Issue #392: "Upstream: Copilot CLI headless server breaks when global CLI cleans shared native module directory" (posix_spawn upstream bug)
- Issue #396: "Send keep-alive pings to prevent server idle timeout killing sessions" (keep-alive pings)