Context
The Copilot SDK/CLI emits premature session.idle events during long tool-executing turns, causing multi-agent workers to be collected with truncated responses. PR #375 added an elaborate workaround (RecoverFromPrematureIdleIfNeededAsync) using:
ManualResetEventSlim signal set by re-arm events after premature idle
events.jsonl mtime freshness detection (< 15s = CLI still writing)
- Multi-round recovery loop with
bestResponse accumulation
- 120s recovery timeout
Why this is a hack
- Relies on filesystem side-channel (file mtime) to determine SDK state
- Normal completions can stall ~15s while freshness detection runs
- Multi-round loop adds complexity and edge cases (OCE handling, bestResponse scoping)
- The root cause is the SDK emitting
session.idle before the turn is actually complete
Proposed investigation
- File upstream issue with Copilot SDK team documenting the premature idle behavior
- Request turn-scoped terminal events or turn IDs so consumers can distinguish real vs premature idle
- If upstream fix is not forthcoming, evaluate whether the workaround can be simplified (e.g., just use turn ID matching)
Priority
Medium — the workaround works but adds resilience debt in the most critical orchestration path.