Problem
When a multi-agent worker runs relaunch.sh (or the app is rebuilt/relaunched for any reason), the orchestration loop loses track of the worker. The worker gets a new SDK session after the relaunch, but the orchestrator's SendPromptAndWaitAsync is still awaiting the old TCS (TaskCompletionSource), which is canceled or orphaned and therefore never completes with the worker's result.
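The failure mode can be illustrated with a minimal, language-agnostic sketch in Python. It is not PolyPilot's code: the `pending` table, `send_prompt_and_wait`, `complete`, and the session ids are hypothetical stand-ins, and a short timeout replaces the indefinite hang so the example terminates.

```python
import asyncio

# Hypothetical stand-in for the orchestrator's table of pending waits,
# keyed by SDK session id (names are illustrative, not PolyPilot's).
pending: dict[str, asyncio.Future] = {}

async def send_prompt_and_wait(session_id: str, timeout: float) -> str:
    """Register a future for the session and wait for its completion."""
    fut = asyncio.get_running_loop().create_future()
    pending[session_id] = fut
    try:
        return await asyncio.wait_for(fut, timeout)
    except asyncio.TimeoutError:
        # The real orchestrator has no such timeout and would hang here.
        return "timed out"

def complete(session_id: str, result: str) -> bool:
    """Deliver a result; returns False if nobody is waiting on this id."""
    fut = pending.get(session_id)
    if fut is None or fut.done():
        return False
    fut.set_result(result)
    return True

async def demo() -> tuple[str, bool]:
    waiter = asyncio.create_task(send_prompt_and_wait("old-session", 0.05))
    await asyncio.sleep(0)  # let the waiter register its future
    # After the relaunch the worker reports under a new session id,
    # so the completion never reaches the future the orchestrator holds.
    delivered = complete("new-session", "worker done")
    return await waiter, delivered
```

Running `asyncio.run(demo())` shows the completion being dropped while the original waiter only ends because of the artificial timeout.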
Observed behavior
In the PP- IC Things orchestration:
- Worker-1 was dispatched and started doing work
- Worker called relaunch.sh to rebuild PolyPilot with code changes
- After relaunch, the worker's session was restored but the orchestrator never received the completion signal
- The orchestrator's reflection loop hung waiting for a worker result that would never come
Expected behavior
When the app relaunches mid-orchestration:
- The pending orchestration should be detected and resumed via PendingOrchestration
- Worker results from before the relaunch should be recoverable
- OR the orchestrator should detect the relaunch and re-dispatch the worker
Notes
This is related to, but distinct from, the server idle timeout issue (#396). In this case the app itself is restarted, rather than the server killing an idle session.