
Model routing: use Flash for scout/tool-heavy work and Pro for synthesis #2027

@Hmbown


Problem

Some work shapes are high-throughput and heavy on evidence gathering, but light on deep reasoning: GitHub triage, read-only repo scouting, repeated file search, issue clustering, PR/check status collection, and first-pass log categorization. These are good candidates for Flash/scout routing, with Pro reserved for planning, risk judgment, final synthesis, and high-stakes code changes.

Evidence from redacted scans of maintainer-private CodeWhale logs:

  • 24 turns contained 8 or more GitHub-oriented tool calls.
  • 22 of those GitHub-heavy turns had no observed delegation/RLM routing.
  • 56 turns contained 8 or more read/search calls and no observed RLM routing.
  • The largest GitHub-heavy parent turn made 114 GitHub-oriented calls.
  • These are exactly the shapes where Flash/helper workers should collect structured evidence while the stronger model synthesizes.
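To make the "8 or more calls, no delegation" shape concrete, here is a minimal sketch of how such a scan could count turns. It assumes a hypothetical redacted turn record (`Turn`) holding only tool names and a delegation flag, and hypothetical tool-name conventions (`GITHUB_PREFIXES`, `READ_SEARCH_TOOLS`); CodeWhale's real log schema and tool identifiers are not shown in this issue.

```python
from dataclasses import dataclass

THRESHOLD = 8  # the issue's cut-off: "8 or more" calls in a single turn

# Hypothetical tool names; CodeWhale's real tool identifiers may differ.
GITHUB_PREFIXES = ("github.", "gh_")
READ_SEARCH_TOOLS = {"read_file", "grep", "glob", "search"}


@dataclass
class Turn:
    """Hypothetical redacted turn record: tool names only, no prompts or outputs."""
    turn_id: str
    tool_calls: list[str]
    delegated: bool = False  # True if any delegation/RLM routing was observed


def github_calls(turn: Turn) -> int:
    # Count calls whose tool name matches an assumed GitHub prefix.
    return sum(name.startswith(GITHUB_PREFIXES) for name in turn.tool_calls)


def read_search_calls(turn: Turn) -> int:
    # Count calls to the assumed read/search tools.
    return sum(name in READ_SEARCH_TOOLS for name in turn.tool_calls)


def scan(turns: list[Turn]) -> dict[str, int]:
    """Summarize turns that match the heavy, undelegated shapes described above."""
    github_heavy = [t for t in turns if github_calls(t) >= THRESHOLD]
    return {
        "github_heavy": len(github_heavy),
        "github_heavy_undelegated": sum(not t.delegated for t in github_heavy),
        "read_search_heavy_undelegated": sum(
            read_search_calls(t) >= THRESHOLD and not t.delegated for t in turns
        ),
        "max_github_calls_in_turn": max((github_calls(t) for t in turns), default=0),
    }


turns = [
    Turn("t1", ["github.list_issues"] * 9),
    Turn("t2", ["github.get_pull_request"] * 10, delegated=True),
    Turn("t3", ["grep"] * 5 + ["read_file"] * 4),
]
print(scan(turns))
# {'github_heavy': 2, 'github_heavy_undelegated': 1, 'read_search_heavy_undelegated': 1, 'max_github_calls_in_turn': 10}
```

Only tool names and counts are inspected, which keeps the scan compatible with the redaction constraint stated below.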

No prompts, raw outputs, secrets, paths, or transcript text are copied here.

Desired Behavior

CodeWhale should classify low-risk, tool-heavy scout work and route it to cheaper/faster helper models by default.

Suggested policy:

  • Pro handles top-level planning, architecture, safety-sensitive judgment, final merge/release decisions, and synthesis.
  • Flash handles read-only scouting, issue/PR inventory, search batches, duplicate detection, status collection, and first-pass summaries.
  • Pro can override a Flash scout when the task becomes high-risk, ambiguous, destructive, or policy-sensitive.
  • The route is visible in the UI: parent model, child model, reasoning effort, why the route was chosen, and whether the route is cost-saving or quality-preserving.
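A route receipt carrying the UI fields listed above, plus the Pro override of a Flash scout, could look roughly like the sketch below. The model names come from this issue; the escalation flag names, the effort labels (`"low"`, `"high"`), and the `RouteReceipt`/`maybe_escalate` names are hypothetical, not an existing CodeWhale API.

```python
from dataclasses import dataclass, replace
from enum import Enum

PRO = "deepseek-v4-pro"
FLASH = "deepseek-v4-flash"

# Escalation triggers from the "Pro can override a Flash scout" bullet (names are hypothetical).
ESCALATION_FLAGS = {"high_risk", "ambiguous", "destructive", "policy_sensitive"}


class RouteGoal(Enum):
    COST_SAVING = "cost-saving"
    QUALITY_PRESERVING = "quality-preserving"


@dataclass(frozen=True)
class RouteReceipt:
    parent_model: str
    child_model: str
    reasoning_effort: str  # assumed labels such as "low" / "high"
    reason: str
    goal: RouteGoal

    def render(self) -> str:
        """One-line summary for a UI status row."""
        return (f"{self.parent_model} -> {self.child_model} "
                f"(effort={self.reasoning_effort}, {self.goal.value}): {self.reason}")


def maybe_escalate(receipt: RouteReceipt, flags: set[str]) -> RouteReceipt:
    """Reroute a Flash scout to Pro when any escalation flag is raised."""
    hits = sorted(flags & ESCALATION_FLAGS)
    if receipt.child_model != FLASH or not hits:
        return receipt  # already on Pro, or nothing risky was observed
    return replace(
        receipt,
        child_model=PRO,
        reasoning_effort="high",
        reason="escalated from Flash scout: " + ", ".join(hits),
        goal=RouteGoal.QUALITY_PRESERVING,
    )


scout = RouteReceipt(PRO, FLASH, "low", "read-only issue inventory", RouteGoal.COST_SAVING)
print(scout.render())
# deepseek-v4-pro -> deepseek-v4-flash (effort=low, cost-saving): read-only issue inventory
print(maybe_escalate(scout, {"destructive"}).render())
# deepseek-v4-pro -> deepseek-v4-pro (effort=high, quality-preserving): escalated from Flash scout: destructive
```

Keeping the receipt immutable and producing a new one on escalation means the UI can show both the original cost-saving route and the quality-preserving override.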

Acceptance Criteria

  • Add a routing classifier for scout/tool-heavy work: GitHub triage, read-only search, status polling, large-log first pass, and verification-only jobs.
  • Child/scout creation can request deepseek-v4-flash with an appropriate effort level while the parent remains on deepseek-v4-pro.
  • A route receipt explains why Flash or Pro was selected.
  • Users can configure conservative/aggressive routing behavior.
  • High-risk actions such as merges, pushes, release publishing, destructive shell, credentials, provider policy, and legal/branding stay parent/Pro-gated unless explicitly approved.
  • Tests cover route selection for GitHub triage, read-only scouting, code modification, release decision, and destructive operations.
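One possible shape for the classifier and its tests is sketched below. Everything except the two model names is an assumption: the task-kind labels, the effort labels, and the meaning of the modes (here, conservative routes only known scout kinds to Flash, while aggressive also routes unclassified read-only work to Flash). The issue leaves open what explicit approval unlocks, so this sketch keeps high-risk actions on Pro and only records whether approval is still needed.

```python
from dataclasses import dataclass
from enum import Enum

PRO = "deepseek-v4-pro"
FLASH = "deepseek-v4-flash"


class Mode(Enum):
    CONSERVATIVE = "conservative"  # Flash only for known scout kinds
    AGGRESSIVE = "aggressive"      # Flash for any read-only kind not reserved for Pro


# Hypothetical task-kind labels mirroring the acceptance criteria.
SCOUT_KINDS = {"github_triage", "read_only_search", "status_polling", "log_first_pass",
               "verification", "duplicate_detection"}
PRO_KINDS = {"planning", "architecture", "synthesis", "release_decision"}
HIGH_RISK_KINDS = {"merge", "push", "release_publish", "destructive_shell",
                   "credentials", "provider_policy", "legal_branding"}


@dataclass(frozen=True)
class Task:
    kind: str
    writes: bool = False    # modifies files or remote state
    approved: bool = False  # user explicitly approved a high-risk action


@dataclass(frozen=True)
class Route:
    model: str
    effort: str             # assumed labels: "low" / "medium" / "high"
    reason: str
    needs_approval: bool = False


def classify(task: Task, mode: Mode = Mode.CONSERVATIVE) -> Route:
    # 1. High-risk actions never leave the parent/Pro gate.
    if task.kind in HIGH_RISK_KINDS:
        return Route(PRO, "high", f"high-risk action: {task.kind}",
                     needs_approval=not task.approved)
    # 2. Anything that writes stays on Pro.
    if task.writes:
        return Route(PRO, "medium", "modifies code or state")
    # 3. Planning, synthesis, and release decisions are reserved for Pro.
    if task.kind in PRO_KINDS:
        return Route(PRO, "high", f"reserved for Pro: {task.kind}")
    # 4. Known read-only scout work goes to Flash in either mode.
    if task.kind in SCOUT_KINDS:
        return Route(FLASH, "low", f"read-only scout work: {task.kind}")
    # 5. Unclassified read-only work: the configured mode decides.
    if mode is Mode.AGGRESSIVE:
        return Route(FLASH, "medium", f"aggressive mode, unclassified read-only: {task.kind}")
    return Route(PRO, "medium", f"conservative mode, unclassified: {task.kind}")


def test_route_selection() -> None:
    assert classify(Task("github_triage")).model == FLASH
    assert classify(Task("read_only_search")).model == FLASH
    assert classify(Task("code_change", writes=True)).model == PRO
    assert classify(Task("release_decision")).model == PRO
    gated = classify(Task("destructive_shell"), Mode.AGGRESSIVE)
    assert gated.model == PRO and gated.needs_approval


test_route_selection()
```

The checks run in a fixed order so that risk gates are evaluated before any cost-saving rule; a scout kind that turns out to write state, for example, still lands on Pro.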


Metadata

Assignees: No one assigned

Labels: context (Context management / context), enhancement (New feature or request)

Projects: Status: Backlog

Relationships: None yet

Development: No branches or pull requests
