You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Long-running CodeWhale maintenance sessions often stay in the parent thread even when the work shape is clearly delegable: broad discovery, independent file reads, repeated GitHub object inspection, verification, and synthesis. That makes the parent transcript large and slow, and it can make the model look lost when the missing piece is orchestration.
Evidence from maintainer-private local CodeWhale session logs, scanned 2026-05-24:
33 CodeWhale JSONL session files were inspected.
146 turns were identified, with 123 completed in the inspected logs.
63 turns had 20 or more tool calls.
58 of those heavy tool turns had no observed agent or RLM-style delegation calls.
17 turns ran longer than 10 minutes in the earlier timing pass.
Sub-agent calls did exist in the logs, but only in a small minority of turns; RLM-style calls were not observed.
No prompts, raw tool outputs, secrets, absolute local paths, or user text are copied here. These are aggregate counters from local logs.
Desired Behavior
CodeWhale should notice when the current task has crossed from "normal parent work" into "delegate or structure this" territory.
Examples of trigger patterns:
Many independent read/search/GitHub inspection calls in one turn.
A parent turn with a long same-kind streak, such as issue views, PR views, or file reads.
A task plan that has scout, builder, verifier, and summarizer phases.
Repeated polling of long-running work where the parent has no immediate decision to make.
Context pressure caused by continued parent-thread reconnaissance.
When triggered, CodeWhale should offer a small, explicit routing choice:
Continue in parent.
Open a scout sub-agent.
Open a verifier sub-agent.
Use RLM/structured analysis for large logs or structured payloads.
Convert this into a coordinated run if the work is broad enough.
Acceptance Criteria
A delegation-opportunity detector can classify at least: GitHub triage, broad file search, large-log analysis, verification-only work, and long-running background waits.
The detector surfaces a concise suggestion before the parent thread has already accumulated excessive tool output.
Suggestions are advisory by default; they must not bypass user approval, sandbox, provider, or budget settings.
If the user accepts, CodeWhale opens an appropriately scoped child with a clear role, budget, and expected summary shape.
The parent receives summary-first results with source handles and enough evidence to verify, not raw child transcripts.
Tests cover trigger thresholds, false-positive suppression for small tasks, and budget/cap enforcement.
Problem
Long-running CodeWhale maintenance sessions often stay in the parent thread even when the work shape is clearly delegable: broad discovery, independent file reads, repeated GitHub object inspection, verification, and synthesis. That makes the parent transcript large and slow, and it can make the model look lost when the missing piece is orchestration.
Evidence from maintainer-private local CodeWhale session logs, scanned 2026-05-24:
No prompts, raw tool outputs, secrets, absolute local paths, or user text are copied here. These are aggregate counters from local logs.
Desired Behavior
CodeWhale should notice when the current task has crossed from "normal parent work" into "delegate or structure this" territory.
Examples of trigger patterns:
When triggered, CodeWhale should offer a small, explicit routing choice:
Acceptance Criteria
Related