
Workers stop when relaunch.sh rebuilds the app mid-orchestration #400

@PureWeen

Problem

When a multi-agent worker runs relaunch.sh (or the app is rebuilt or relaunched for any other reason), the orchestration loop loses track of the worker. After the relaunch the worker gets a new SDK session, but the orchestrator's SendPromptAndWaitAsync is still awaiting the old TCS (TaskCompletionSource), which ends up canceled or orphaned.
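
To make the failure mode concrete, here is a minimal sketch in Python, where an asyncio.Future stands in for the .NET TaskCompletionSource. Everything in it (WorkerSession, send_prompt_and_wait, the session ids, the timeout) is a hypothetical illustration, not PolyPilot's actual code. It only shows why a waiter bound to the pre-relaunch session never sees a result that is delivered to the post-relaunch session.

```python
import asyncio


class WorkerSession:
    """Hypothetical stand-in for a worker's SDK session."""

    def __init__(self, session_id, loop):
        self.session_id = session_id
        # Plays the role of the TCS that the orchestrator awaits.
        self.completion = loop.create_future()


async def send_prompt_and_wait(session, timeout):
    """Wait for the session's completion signal; return None on timeout."""
    try:
        # shield() keeps the timeout from cancelling the underlying future.
        return await asyncio.wait_for(asyncio.shield(session.completion), timeout)
    except asyncio.TimeoutError:
        return None


async def demo():
    loop = asyncio.get_running_loop()
    old = WorkerSession("sdk-1", loop)
    # The orchestrator starts waiting on the pre-relaunch session.
    waiter = asyncio.create_task(send_prompt_and_wait(old, timeout=0.05))

    # "Relaunch": the worker is restored under a new session, and its
    # completion signal is delivered there instead of to the old future.
    new = WorkerSession("sdk-2", loop)
    new.completion.set_result("worker done")

    # The old future is never resolved, so only the timeout ends the wait.
    return await waiter, new.completion.result()
```

Without the timeout the await on the old future would never return, which matches the hang described in this issue.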

Observed behavior

In the PP- IC Things orchestration:

  • Worker-1 was dispatched and started doing work
  • Worker called relaunch.sh to rebuild PolyPilot with code changes
  • After relaunch, the worker's session was restored but the orchestrator never received the completion signal
  • The orchestrator's reflection loop hung waiting for a worker result that would never come

Expected behavior

When the app relaunches mid-orchestration:

  1. The pending orchestration should be detected and resumed via PendingOrchestration
  2. Worker results from before the relaunch should be recoverable
  3. Alternatively, the orchestrator should detect the relaunch and re-dispatch the worker
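
A rough sketch of how these options could fit together, again in Python and under stated assumptions: the issue names PendingOrchestration but not its shape, so the fields below (group, prompt, results, outstanding), the JSON file, and the redispatch callback are all invented for illustration. The idea is to persist orchestration state before dispatch, then on startup keep any results already recorded and re-dispatch only the workers that were still running.

```python
import json
from dataclasses import dataclass, field, asdict
from pathlib import Path


@dataclass
class PendingOrchestration:
    # Field names are illustrative assumptions, not PolyPilot's real schema.
    group: str
    prompt: str
    results: dict = field(default_factory=dict)      # worker name -> result
    outstanding: list = field(default_factory=list)  # workers still running


def save(state, path):
    """Persist the orchestration state so it survives an app relaunch."""
    path.write_text(json.dumps(asdict(state)))


def load(path):
    """Return the persisted state, or None if nothing was pending."""
    if not path.exists():
        return None
    return PendingOrchestration(**json.loads(path.read_text()))


def resume(path, redispatch):
    """On startup: keep recorded results, re-dispatch unfinished workers.

    `redispatch(worker, prompt)` is a hypothetical callback that sends the
    prompt to the worker's new session and returns its result.
    """
    state = load(path)
    if state is None:
        return None
    for worker in list(state.outstanding):
        state.results[worker] = redispatch(worker, state.prompt)
        state.outstanding.remove(worker)
        save(state, path)  # checkpoint after each recovered worker
    return state
```

Checkpointing after each worker means a second relaunch during recovery would only repeat the workers that are still outstanding.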

Notes

This is related to, but distinct from, the server idle timeout issue (#396). Here the app itself is restarted, rather than the server killing an idle session.
