feat(worker-model): prefer non-reasoning models and disable thinking on worker calls #122

Open
BYK wants to merge 2 commits into main from feat/non-reasoning-worker-models

BYK commented May 5, 2026

Summary

  • Part A: ModelInfo.capabilities gains reasoning?: boolean. selectWorkerCandidates() sorts non-reasoning models before reasoning models at equal cost, so within a price tier the cheapest non-reasoning variant is validated before any reasoning-capable model at the same price. OpenCode's getProviderModels() now captures model.reasoning from provider.list().

  • Part B: LLMClient.prompt() opts gain thinking?: boolean. All 7 core worker call sites pass thinking: false: distillation (2), meta-distill (1), curation (2), worker-model validation Phase 1+2 (2), query expansion (1). The Pi adapter forwards it as thinkingEnabled: false to complete(). Gateway never triggers thinking in the first place; the OpenCode SDK has no thinking toggle and relies on Part A instead.
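
The ordering rule in Part A can be sketched as a two-key comparator. This is an illustration under assumptions, not the code in this PR: only `capabilities.reasoning` is taken from the description above, while the `id` and `cost` fields and the `sortWorkerCandidates` helper name are hypothetical.

```typescript
// Hypothetical shape: only `capabilities.reasoning` comes from the PR text;
// `id` and `cost` are assumed here for illustration.
interface ModelInfo {
  id: string;
  cost: number;
  capabilities?: { reasoning?: boolean };
}

// Hypothetical helper showing the ordering rule only (not the real
// selectWorkerCandidates(), which also filters and validates).
function sortWorkerCandidates(models: ModelInfo[]): ModelInfo[] {
  // Copy first so the caller's array is not mutated.
  return [...models].sort((a, b) => {
    // Primary key: cheaper models come first.
    if (a.cost !== b.cost) return a.cost - b.cost;
    // Tie-breaker: non-reasoning (false or undefined) before reasoning.
    const ra = a.capabilities?.reasoning === true ? 1 : 0;
    const rb = b.capabilities?.reasoning === true ? 1 : 0;
    return ra - rb;
  });
}
```

Because `Array.prototype.sort` is stable in ES2019 and later, models that tie on both keys keep their input order, and a missing `reasoning` flag is treated the same as `false`. A cheaper reasoning model still sorts ahead of a more expensive non-reasoning one, since the flag only breaks ties.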

Why

Background workers (distillation, curation, query expansion) are single-turn text-in/text-out. Any reasoning/thinking tokens a model emits on these calls are produced, billed, and silently discarded. Two independent levers eliminate that waste:

  • Part A ensures the model selection itself routes away from reasoning-capable models when a cheaper non-reasoning variant exists and passes quality checks.
  • Part B is a defensive safety net for Pi, where thinking already defaults to off but the explicit flag makes the intent clear, and it documents the limitation on OpenCode.
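
A minimal sketch of the Part B forwarding on the Pi side, assuming the adapter maps prompt options onto the options passed to complete(). The `PromptOpts` and `CompleteOpts` types and the `toCompleteOpts` helper are hypothetical names for illustration; only `thinking` and `thinkingEnabled` come from the description above.

```typescript
// Hypothetical option shapes; only the `thinking` and `thinkingEnabled`
// field names are taken from the PR description.
interface PromptOpts {
  thinking?: boolean;
}

interface CompleteOpts {
  thinkingEnabled?: boolean;
}

// Hypothetical mapping helper: forwards the hint only when the caller
// set it, so an absent `thinking` leaves the adapter default untouched.
function toCompleteOpts(opts: PromptOpts = {}): CompleteOpts {
  const out: CompleteOpts = {};
  if (opts.thinking !== undefined) {
    out.thinkingEnabled = opts.thinking;
  }
  return out;
}

// A worker call site would pass thinking: false explicitly.
const workerOpts = toCompleteOpts({ thinking: false });
```

Forwarding only an explicitly set value keeps non-worker callers on whatever default the adapter already has, which matches the "makes intent explicit" framing rather than changing behavior globally.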

Testing

  • 4 new selectWorkerCandidates tests: non-reasoning preferred at equal cost, ordering preserved, fallback when only a reasoning model is available, undefined treated as non-reasoning (backwards compatibility)
  • Full suite: 677 pass, 0 fail

BYK added 2 commits May 2, 2026 23:22
Replace path-only file dedup with range-aware coverage checks that
consider offset/limit parameters. Earlier reads are only collapsed
when a later read covers the same or wider line range, preventing
over-deduplication where a narrow later read would incorrectly
collapse an earlier full-file read.

Closes #117
…on worker calls

Part A: extend ModelInfo.capabilities with reasoning?: boolean and update
selectWorkerCandidates() to sort non-reasoning models before reasoning models
at equal cost. OpenCode's getProviderModels() now captures model.reasoning
from provider.list() so the flag is populated at runtime.

Part B: add thinking?: boolean to LLMClient.prompt() opts. All 7 core worker
call sites (distillation, meta-distill, curation, consolidation, validation
Phase 1+2, query expansion) pass thinking: false. Pi adapter forwards it as
thinkingEnabled: false to complete(). Gateway and OpenCode adapters documented
— Gateway never triggers thinking; OpenCode cannot honor the hint and relies
on Part A routing to non-reasoning models instead.

Adds 4 new tests for reasoning-preference ordering in selectWorkerCandidates.
BYK enabled auto-merge (squash) May 5, 2026 09:33