feat: reduce planner noise for small models + structured artifact tool #23
Merged
m62624 (Owner) commented on Jun 15, 2026
- TUI resize fixes
Strip internal/noise fields from buildPlannerStatusText that do not drive
the model's next action:
- Drop runtime dump (creationMethod, compatibilityMode, nextStep,
questions*, compactBoundaries, idle/stuck/debug ids, requires*, broken,
blockedReason) — state is already reflected in Lifecycle Decision and
Next Required Action.
- Replace "## Effective Settings" with a focused "## Languages" block
(metadata.* only); drop idle/skills/contracts tuning knobs.
- Trim "## Git And Worktree" to git essentials (drop activeBranches,
mergeTargets, actualHEAD).
- Show "## Debug Mode" only while a debug session is active.
- Keep only enabled/guidance lines from the contracts section.
Add an explicit worktree-location line ("You are in worktree X — work
here") and a bold pointer to re-read the current stage instruction, so
status answers "where am I / what's the goal / what next" up front.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
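A minimal sketch of the trimmed status layout, assuming a simplified state object. StatusStateSketch and buildTrimmedStatusSketch are hypothetical stand-ins for buildPlannerStatusText and its input, and the worktree line's wording is approximate; only the section names and the "show Debug Mode only while active" rule come from the commit message above.

```typescript
// Hypothetical state shape for illustration only; the real planner state differs.
interface StatusStateSketch {
  worktree: string;
  stage: string;
  step: string;
  metadata: Record<string, string>; // stands in for the metadata.* settings
  git: { branch: string; clean: boolean };
  debugSessionActive: boolean;
}

// Emits only the sections that drive the model's next action.
function buildTrimmedStatusSketch(s: StatusStateSketch): string {
  const lines: string[] = [
    `You are in worktree ${s.worktree}. Work here.`,
    `**Re-read the current stage instruction (${s.stage} / ${s.step}) before acting.**`,
  ];

  // "## Languages" replaces "## Effective Settings" and shows metadata.* only.
  const languages = Object.entries(s.metadata);
  if (languages.length > 0) {
    lines.push("", "## Languages");
    for (const [key, value] of languages) lines.push(`- metadata.${key}: ${value}`);
  }

  // Git essentials only: activeBranches, mergeTargets and actualHEAD are not emitted.
  lines.push("", "## Git And Worktree", `- branch: ${s.git.branch}`, `- clean: ${s.git.clean}`);

  // "## Debug Mode" appears only while a debug session is active.
  if (s.debugSessionActive) lines.push("", "## Debug Mode", "- debug session active");

  return lines.join("\n");
}
```

The point of the design is that every emitted line answers "where am I / what's the goal / what next"; anything that does not change the model's next action is left out rather than summarized.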

After a transition, the workflow tool returned a generic "Call planner_status before choosing the next planner action", forcing the small model to fetch the heavy status after every step. It now emits a compact hint built from the post-transition state.
- Add buildNextStepHint(state) (src/runtime/next-step-hint.ts): worktree location, current step, goal, first required action, exit condition, and the next move, all derived from the state machine.
- Branch points list every allowed planner_finish_step target and label loop-back targets (e.g. run_final_tests -> implement_task), so the model sees both the forward path and the fix/loop path.
- Compact steps point to planner_request_compact only when the boundary is enabled; otherwise they fall through to the linear next step.
- Wire it into formatWorkflowToolResult using result.state (the step we moved INTO), replacing the old pre-transition lifecycle footer.
The hint is deliberately lighter than planner_status (no Stage Behavior, allowed-tool dumps, or full instruction body); status stays the heavier source of truth.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
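A rough sketch of how a post-transition hint could be derived from a step table. StepDef, HintState, STEP_ORDER, isLoopBack and buildNextStepHintSketch are hypothetical; the real buildNextStepHint(state) reads the project's state machine, whose shape is not shown here. As an assumption, loop-backs are detected by comparing positions in a linear step order, and the write_summary/finish steps exist only for the example.

```typescript
// Hypothetical step metadata; the project's state machine is richer than this.
interface StepDef {
  goal: string;
  firstAction: string;
  exitCondition: string;
  next: string[]; // allowed planner_finish_step targets
  compactBoundary?: boolean; // true if this step is a compact point
}

interface HintState {
  worktree: string;
  step: string; // the step we moved INTO
  compactEnabled: boolean;
}

// Assumed linear order, used only to tell forward targets from loop-backs.
const STEP_ORDER = ["implement_task", "run_final_tests", "write_summary", "finish"];

function isLoopBack(from: string, to: string): boolean {
  return STEP_ORDER.indexOf(to) < STEP_ORDER.indexOf(from);
}

function buildNextStepHintSketch(state: HintState, steps: Record<string, StepDef>): string {
  const def = steps[state.step];
  if (!def) throw new Error(`unknown step: ${state.step}`);
  const lines = [
    `Worktree: ${state.worktree} (work here)`,
    `Current step: ${state.step}`,
    `Goal: ${def.goal}`,
    `First required action: ${def.firstAction}`,
    `Exit condition: ${def.exitCondition}`,
  ];
  if (def.compactBoundary && state.compactEnabled) {
    lines.push("Next: call planner_request_compact");
  } else if (def.next.length > 1) {
    // Branch point: list every allowed target and label loop-backs.
    const targets = def.next.map((t) => (isLoopBack(state.step, t) ? `${t} (loop back)` : t));
    lines.push(`Next: planner_finish_step with one of: ${targets.join(", ")}`);
  } else if (def.next.length === 1) {
    // Linear step, or a compact step whose boundary is disabled.
    lines.push(`Next: planner_finish_step -> ${def.next[0]}`);
  }
  return lines.join("\n");
}
```

Building the hint from the step we moved into (rather than the one we left) is what lets it replace the old pre-transition footer without a follow-up planner_status call.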

Add four planner wrapper tools so the small model fills artifacts from arguments instead of hand-formatting markdown, and funnel the structured ones through validation:
- planner_tdd_submit: structured per-section fields (Pre-Implementation Proof Contract, Post-Implementation Counterexample Review, Task Merge Scope Audit). The wrapper assembles tdd.md, validates required fields immediately, and merges incrementally across steps (src/runtime/tdd-form.ts). Field/section definitions are now shared with the tdd-evidence validators.
- planner_discovery_submit: body + verificationProtocol[] -> an always well-formed "## Verification Protocol" section.
- planner_plan_submit / planner_summary_submit: content-argument writers for the open-ended plan.md / final_summary.md.
Enforcement (src/guard/project-mutation.ts): built-in edit/write are now blocked on the structured artifacts that have a wrapper (goal.md, questions.md, and the active task's tdd.md), with a message naming the right tool. Open-ended, append-heavy artifacts (plan.md, discovery.md, final_summary.md) stay editable so the model can append across the lifecycle; their submit tools are a convenience.
Wire-up: register the tools in the tool-policy step allowlists and index.ts, and widen the built-in guard state to carry questionsMd/tasksDir/activeTaskId.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
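The validate-then-merge flow behind planner_tdd_submit, plus the built-in write guard, might look roughly like this. The section names come from the commit; the field names (claim, failingTest, counterexamples, filesTouched), the helper names, and the example task path are hypothetical and are not the contents of src/runtime/tdd-form.ts or src/guard/project-mutation.ts.

```typescript
// Section -> required fields. Section names are from the commit; field names are made up.
const TDD_SECTIONS: Record<string, string[]> = {
  "Pre-Implementation Proof Contract": ["claim", "failingTest"],
  "Post-Implementation Counterexample Review": ["counterexamples"],
  "Task Merge Scope Audit": ["filesTouched"],
};

type TddForm = Record<string, Record<string, string>>;

// Returns a list of problems; an empty list means the section is complete.
function validateSection(section: string, fields: Record<string, string>): string[] {
  const required = TDD_SECTIONS[section];
  if (!required) return [`unknown section: ${section}`];
  return required.filter((f) => !fields[f]?.trim()).map((f) => `${section}: missing ${f}`);
}

// Merges one submission into the stored form, rejecting it immediately if invalid.
function mergeTddSubmission(prev: TddForm, section: string, fields: Record<string, string>): TddForm {
  const merged = { ...prev[section], ...fields };
  const problems = validateSection(section, merged);
  if (problems.length > 0) throw new Error(problems.join("; "));
  return { ...prev, [section]: merged };
}

// Assembles tdd.md in a fixed section order, skipping sections not yet submitted.
function renderTddMd(form: TddForm): string {
  const out: string[] = [];
  for (const section of Object.keys(TDD_SECTIONS)) {
    const fields = form[section];
    if (!fields) continue;
    out.push(`## ${section}`);
    for (const [key, value] of Object.entries(fields)) out.push(`- ${key}: ${value}`);
    out.push("");
  }
  return out.join("\n");
}

// Guard: built-in edit/write on a wrapper-owned artifact returns a message naming the tool.
function checkBuiltinWrite(target: string, wrapperOwned: Record<string, string>): string | null {
  const tool = wrapperOwned[target];
  return tool ? `Built-in edit/write cannot modify ${target}; use ${tool} instead.` : null;
}
```

Because the markdown is rendered from validated fields, the model never hand-formats tdd.md, and a missing field surfaces at submit time instead of at a later evidence check.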

Rewrite all 12 bundled stage instructions (instructions/defaults/*.md) to a consistent structure and align drifted duplicate blocks, while preserving every distinct rule. Reflect the changes from the rest of this branch:
- Route artifact writing through the new fill-tools: tdd.md via planner_tdd_submit (and state that built-in edit/write cannot modify it), discovery.md via planner_discovery_submit (body + verificationProtocol), plan.md via planner_plan_submit, final_summary.md via planner_summary_submit.
- Lean on the richer planner_finish_step hint: the recurring footer now says the finish_step result already names the next step, goal, and worktree, and to call planner_status only when the full step rule or instruction is needed.
- Consolidate discovery's two Fundamental Rules blocks (1-6) into one section; keep the planning Integration-vs-New-Entity and execution Uncertainty->Question rules as named rules.
- Trim verbose diagnostics into compact bullets (~310 fewer lines).
Parser-critical "## auto-compact"/"## manual-compact" headings are kept in exactly the files that had them.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The workspace overlay only repainted when its render signature changed (clock, content, focus/input). Terminal size was not part of the signature, so after a resize the idle tick computed the same signature and never called requestRender(); the overlay stayed frozen at the old dimensions until the next content change.
Include the terminal columns/rows in computeSignature() so a resize is detected on the next tick and triggers a redraw. render() already recomputes on width/height change; it just was not being asked to.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
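A minimal sketch of the fix, assuming the signature is a joined string of its inputs. Only the names computeSignature() and requestRender() come from the commit; SignatureInput, OverlaySketch, and the render counter are hypothetical.

```typescript
// Hypothetical inputs; the real overlay's signature fields may differ.
interface SignatureInput {
  clock: string;
  content: string;
  focused: boolean;
  input: string;
  columns: number; // terminal width, now part of the signature
  rows: number; // terminal height, now part of the signature
}

function computeSignature(s: SignatureInput): string {
  return [s.clock, s.content, s.focused, s.input, s.columns, s.rows].join("|");
}

class OverlaySketch {
  private lastSignature: string | null = null;
  renderRequests = 0;

  // Called on each idle tick: request a redraw only if the signature changed.
  tick(s: SignatureInput): void {
    const signature = computeSignature(s);
    if (signature === this.lastSignature) return;
    this.lastSignature = signature;
    this.requestRender();
  }

  private requestRender(): void {
    this.renderRequests += 1; // stands in for scheduling a real repaint
  }
}
```

Without columns/rows in the join, two ticks before and after a resize produce identical strings, which is exactly the frozen-overlay symptom described above.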

State plainly that the extension is built for local models and, at runtime, is driven by a small local model through Pi, while cloud LLMs assist development. Add the experiment warning verbatim so it is clear this is not a guarantee of better output and can make results worse.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The previous resize fix made the workspace request a redraw on resize, but the overlay still could not grow past its startup size. overlayOptions() is evaluated once at show time, and it returned an absolute width/maxHeight taken from the initial terminal. resolveOverlayLayout re-runs on each render, but an absolute width only clamps down to a smaller terminal and never expands, so shrink-and-restore worked while enlarging beyond the startup size (or going fullscreen) stayed pinned.
Return relative geometry instead: width "100%" and maxHeight "100%" track the live terminal in both directions, and margin.bottom reserves the footer rows. The component already derives its render height from the live terminal, so it fills the larger area without double-reserving the footer rows.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
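The difference between the two geometries can be shown with a small resolver. This is a sketch under assumptions: SizeSpec, resolveSize and resolveHeight are hypothetical, and the real resolveOverlayLayout may treat percentages and margins differently. It only illustrates why an absolute width captured at startup can shrink with the terminal but never grow, while "100%" follows the live size both ways.

```typescript
// A size is either absolute cells or a percentage of the live terminal.
type SizeSpec = number | `${number}%`;

function resolveSize(spec: SizeSpec, available: number): number {
  if (typeof spec === "number") {
    // Absolute: clamped down to a smaller terminal, but never expanded past spec.
    return Math.min(spec, available);
  }
  // Relative: recomputed from the live terminal on every render.
  return Math.floor((available * Number.parseFloat(spec)) / 100);
}

// margin.bottom reserves footer rows before the height is resolved.
function resolveHeight(maxHeight: SizeSpec, terminalRows: number, marginBottom: number): number {
  return resolveSize(maxHeight, Math.max(0, terminalRows - marginBottom));
}
```

Reserving the footer once, through the margin, is what keeps the component from subtracting those rows a second time when it sizes itself from the live terminal.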

buildNextStepHint described the next move but not which wrapper is permitted at the current step, so a small model could still reach for a built-in write/edit. Add a "Tools allowed now: ..." line sourced from getAllowedPlannerWrapperTools(state) (e.g. at discovery/scan_project_structure it names planner_discovery_submit and the contract tools), so the model picks the right planner tool in the moment. The hint is reused by both planner_finish_step and start_step.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
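A sketch of how such a line could be assembled from a per-step allowlist. WRAPPER_ALLOWLIST, allowedWrapperToolsSketch and toolsAllowedLine are hypothetical stand-ins for getAllowedPlannerWrapperTools(state) and the tool-policy tables; only the discovery/scan_project_structure -> planner_discovery_submit pairing comes from the commit. The contract tools are left out because their names are not given, and the execution/implement_task entry is an assumed placement.

```typescript
// Hypothetical "stage/step" -> wrapper tools table.
const WRAPPER_ALLOWLIST: Record<string, string[]> = {
  "discovery/scan_project_structure": ["planner_discovery_submit"],
  "execution/implement_task": ["planner_tdd_submit"], // assumed placement
};

function allowedWrapperToolsSketch(stage: string, step: string): string[] {
  return WRAPPER_ALLOWLIST[`${stage}/${step}`] ?? [];
}

// Builds the "Tools allowed now: ..." line appended to the next-step hint.
function toolsAllowedLine(stage: string, step: string): string {
  const tools = allowedWrapperToolsSketch(stage, step);
  return `Tools allowed now: ${tools.length > 0 ? tools.join(", ") : "(no wrapper tools at this step)"}`;
}
```

Sourcing the line from the same allowlist the tool policy enforces keeps the hint from advertising a tool the guard would then reject.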