Problem
Some work shapes are high-throughput and heavy on evidence gathering but light on deep reasoning: GitHub triage, read-only repo scouting, repeated file search, issue clustering, PR/check status collection, and first-pass log categorization. These are good candidates for Flash/scout routing, with Pro reserved for planning, risk judgment, final synthesis, and high-stakes code changes.
Evidence from redacted maintainer-private CodeWhale log scans:
- 24 turns contained 8 or more GitHub-oriented tool calls.
- 22 of those GitHub-heavy turns had no observed delegation/RLM routing.
- 56 turns contained 8 or more read/search calls and no observed RLM routing.
- The largest GitHub-heavy parent turn made 114 GitHub-oriented calls.
- These are exactly the shapes where Flash/helper workers should collect structured evidence while the stronger model synthesizes.
No prompts, raw outputs, secrets, paths, or transcript text are copied here.
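As a rough illustration of how counts like the ones above could be derived from redacted per-turn summaries, here is a minimal sketch. The `TurnSummary` record and its field names are assumptions made for this example and do not reflect CodeWhale's actual log schema; only the "8 or more" threshold comes from the scan described above.

```python
from dataclasses import dataclass
from typing import Iterable, List

TOOL_HEAVY_THRESHOLD = 8  # the "8 or more" cut-off used in the scan above


@dataclass(frozen=True)
class TurnSummary:
    """Hypothetical redacted per-turn counts; not CodeWhale's real log schema."""
    github_calls: int
    read_search_calls: int
    delegated: bool  # True if any delegation/RLM routing was observed


def missed_scout_candidates(turns: Iterable[TurnSummary]) -> List[TurnSummary]:
    """Return tool-heavy turns that ran without any observed delegation."""
    return [
        t for t in turns
        if not t.delegated
        and max(t.github_calls, t.read_search_calls) >= TOOL_HEAVY_THRESHOLD
    ]
```

Because the record holds only counts and a boolean, a scan of this kind never needs prompts, outputs, or paths.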
Desired Behavior
CodeWhale should classify low-risk, tool-heavy scout work and route it to cheaper/faster helper models by default.
Suggested policy:
- Pro handles top-level planning, architecture, safety-sensitive judgment, final merge/release decisions, and synthesis.
- Flash handles read-only scouting, issue/PR inventory, search batches, duplicate detection, status collection, and first-pass summaries.
- Pro can override a Flash scout when the task becomes high-risk, ambiguous, destructive, or policy-sensitive.
- The route is visible in the UI: parent model, child model, reasoning effort, why the route was chosen, and whether the route is cost-saving or quality-preserving.
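The policy above could be sketched roughly as follows. This is an illustration under stated assumptions, not CodeWhale's implementation: the task labels, the escalation flag names, the effort labels ("low", "high"), and the `RouteReceipt` and `route` names are all hypothetical. Only the two model names and the receipt fields (parent model, child model, reasoning effort, reason, cost-saving vs. quality-preserving) come from this issue.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

PRO = "deepseek-v4-pro"
FLASH = "deepseek-v4-flash"

# Hypothetical task labels and flags, chosen for illustration only.
SCOUT_TASKS = frozenset({
    "read_only_scouting", "issue_pr_inventory", "search_batch",
    "duplicate_detection", "status_collection", "first_pass_summary",
})
ESCALATION_FLAGS = frozenset({"high_risk", "ambiguous", "destructive", "policy_sensitive"})


@dataclass(frozen=True)
class RouteReceipt:
    parent_model: str
    child_model: Optional[str]  # None when the parent keeps the work
    reasoning_effort: str       # assumed labels: "low" or "high"
    reason: str
    intent: str                 # "cost-saving" or "quality-preserving"


def route(task: str, flags: FrozenSet[str] = frozenset()) -> RouteReceipt:
    """Pick a route; any escalation flag overrides a Flash scout."""
    raised = sorted(flags & ESCALATION_FLAGS)
    if raised:
        reason = "escalation flags: " + ", ".join(raised)
        return RouteReceipt(PRO, None, "high", reason, "quality-preserving")
    if task in SCOUT_TASKS:
        reason = f"{task} is low-risk scout work"
        return RouteReceipt(PRO, FLASH, "low", reason, "cost-saving")
    reason = f"{task} is not a known scout task"
    return RouteReceipt(PRO, None, "high", reason, "quality-preserving")
```

For example, `route("search_batch")` would yield a Flash child with a cost-saving receipt, while `route("search_batch", frozenset({"destructive"}))` keeps the work on the parent. Defaulting unknown tasks to Pro keeps the sketch conservative.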
Acceptance Criteria
- Add a routing classifier for scout/tool-heavy work: GitHub triage, read-only search, status polling, large-log first pass, and verification-only jobs.
- Child/scout creation can request deepseek-v4-flash with an appropriate effort level while the parent remains on deepseek-v4-pro.
- A route receipt explains why Flash or Pro was selected.
- Users can configure conservative/aggressive routing behavior.
- High-risk actions such as merges, pushes, release publishing, destructive shell, credentials, provider policy, and legal/branding stay parent/Pro-gated unless explicitly approved.
- Tests cover route selection for GitHub triage, read-only scouting, code modification, release decision, and destructive operations.
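One possible shape for the classifier and its tests is sketched below. Everything here is hypothetical: the job labels, the `select_tier` name, the two mode names, and especially the per-mode tool-call thresholds are placeholders, not proposed values. Explicit-approval handling for high-risk actions is left out of the sketch, so those actions always stay on Pro.

```python
# Hypothetical job labels derived from the criteria above.
HIGH_RISK = frozenset({
    "merge", "push", "release_publish", "destructive_shell",
    "credentials", "provider_policy", "legal_branding",
})
SCOUT = frozenset({
    "github_triage", "read_only_search", "status_polling",
    "large_log_first_pass", "verification_only",
})

# Assumed knob: minimum expected tool calls before a scout job is delegated.
MIN_TOOL_CALLS = {"conservative": 8, "aggressive": 1}


def select_tier(kind: str, expected_tool_calls: int, mode: str = "conservative") -> str:
    """Return "flash" or "pro" for a job; high-risk kinds never leave Pro."""
    if mode not in MIN_TOOL_CALLS:
        raise ValueError(f"unknown routing mode: {mode!r}")
    if kind in HIGH_RISK:
        return "pro"
    if kind in SCOUT and expected_tool_calls >= MIN_TOOL_CALLS[mode]:
        return "flash"
    return "pro"  # unknown or small jobs stay with the parent


# Table-driven cases: (kind, expected_tool_calls, mode, expected_tier)
CASES = [
    ("github_triage", 12, "conservative", "flash"),
    ("read_only_search", 3, "conservative", "pro"),
    ("read_only_search", 3, "aggressive", "flash"),
    ("code_modification", 20, "aggressive", "pro"),
    ("release_publish", 1, "aggressive", "pro"),
    ("destructive_shell", 50, "aggressive", "pro"),
]


def failing_cases():
    """Return the cases whose selected tier differs from the expected one."""
    return [c for c in CASES if select_tier(c[0], c[1], c[2]) != c[3]]
```

A table of cases like this keeps the route-selection tests easy to extend as new job kinds are added.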
Related