Sub-agent runtime sub-issue: checkpoint and continue child work across turns

## Parent

Sub-issue of #1806.

## Problem

#1806 correctly identifies the hard 120s sub-agent API-response ceiling. The deeper product problem is that useful child work is treated as one final response instead of a resumable, checkpointed activity.

When a child needs more runtime or more than one thinking/execution pass, the parent has no clean way to:

- See partial progress.
- Continue the child for another turn.
- Ask the child to checkpoint intermediate findings.
- Recover useful work when final synthesis times out.
- Route large results through files/handles instead of the final response body.

Evidence:

- Public report #1806 describes five parallel sub-agents that ran for about 12 minutes, then failed with a 120s API timeout and no result.
- A redacted scan of maintainer-private CodeWhale logs found 19 `spawn_agent` calls, 13 `wait_agent` calls, and 11 `close_agent` calls.
- 5 of the observed `wait_agent` calls used a timeout argument above 120s, but the API-response ceiling still matters because the child result path is final-response shaped.
- 4 observed wait outputs returned an empty status before later completion, which is consistent with parent polling needing an explicit progress/continuation contract.

No prompts, raw child summaries, secrets, local paths, or transcript text are copied here.

## Desired Behavior

Sub-agents should become resumable child work items rather than one-shot response generators.

Required shape:

- Each child has a durable task id, role, budget, model/effort route, and status.
- Child progress can be polled without requiring completion.
- Child can emit checkpoint receipts: summary, files touched/read, evidence handles, blockers, next suggested action.
- Parent can ask a child to continue for another turn with bounded context and explicit budget.
- Child can write or hand off large output through approved handles/artifacts instead of the final response body.
- Timeout should preserve last checkpoint and mark the child `needs_continuation` or `timed_out_with_checkpoint`, not simply `failed` with null result.

## Acceptance Criteria

- Sub-agent timeout, max runtime, and max turns are configurable under `[subagents]`.
- A child that exceeds one response window can resume from its last checkpoint.
- Parent UI shows `running`, `checkpointed`, `needs_continuation`, `failed`, and `complete` distinctly.
- `agent_eval` or equivalent continuation can send a follow-up assignment to the same child/session.
- If final response times out, prior checkpoint data remains visible and inspectable.
- Tests cover a child that checkpoints, times out, resumes, and completes without losing the intermediate summary.
- Approval/sandbox/provider policies are enforced on every continuation turn.

## Related

- #1806 sub-agent 120s timeout.
- #1981 agent team UX and parent-owned review.
- #2009 background task waiting/yielding.
- #2024 delegation opportunity detection.
- #2025 GitHub triage scout.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Sub-agent runtime sub-issue: checkpoint and continue child work across turns #2029

Parent

Problem

Desired Behavior

Acceptance Criteria

Related

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Sub-agent runtime sub-issue: checkpoint and continue child work across turns #2029

Description

Parent

Problem

Desired Behavior

Acceptance Criteria

Related

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions