fix: stop discovery fill-tool from poisoning the doubt_review protocol parser #27
Merged
A decoded planner session showed the model stuck in an 11-call loop on
planner_doubt_review. Root cause: planner_discovery_submit took both a free-form
`body` and a structured `verificationProtocol` argument, and the model wrote its
own `## Verification Protocol` section into body while the wrapper also appended
one from the argument. discovery.md then carried two protocol sections;
extractVerificationProtocol collected the body's prose lead-in
("The existing project has these commands available:") as a phantom required
command and never reached the argument's section — so doubt_review could not be
satisfied.
- artifact-tools: the verificationProtocol argument is now the single source of
truth — buildDiscoveryMarkdown strips any protocol section the model wrote into
body before appending the canonical one (no brittle reject on an exact header).
- doubt-review parser hardened: extractVerificationProtocol collects only bullet
list items, and extractProtocolCommand rejects prose lines ending in ':'.
- Every fill-tool now echoes the canonical schema next to what was saved so the
model can self-correct by comparing expected vs received shape.
- next-step hint spells out submit ordering (a later step's planner_*_submit is
blocked until planner_finish_step advances there) to remove the early-call
friction seen in the session.
- Round-trip tests reproduce the deadlock and assert each artifact survives its
consuming parser/validator with no phantom or duplicate sections.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
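The two fixes in the commit above (strip any model-written protocol section before appending the canonical one, then parse only bullet items) can be sketched as below. This is a minimal illustration, not the project's code: the function names come from the commit message, but their signatures, the regexes, the backtick-wrapped bullet format, and the helper `stripProtocolSection` are assumptions.

```typescript
// Hypothetical sketch: names follow the commit message, bodies are assumed.
const PROTOCOL_HEADER = /^##\s+Verification Protocol\s*$/i;
const TOP_LEVEL_HEADING = /^#{1,2}\s+/; // an h1/h2 ends the protocol section

/** Drop any "## Verification Protocol" section found in the free-form body. */
function stripProtocolSection(body: string): string {
  const kept: string[] = [];
  let skipping = false;
  for (const line of body.split("\n")) {
    if (PROTOCOL_HEADER.test(line.trim())) {
      skipping = true;
      continue;
    }
    if (skipping && TOP_LEVEL_HEADING.test(line)) skipping = false;
    if (!skipping) kept.push(line);
  }
  return kept.join("\n").trimEnd();
}

/** The structured argument is the only source of the protocol section. */
function buildDiscoveryMarkdown(body: string, commands: string[]): string {
  const bullets = commands.map((c) => `- \`${c}\``).join("\n");
  return `${stripProtocolSection(body)}\n\n## Verification Protocol\n\n${bullets}\n`;
}

/** Accept only bullet items; reject empty items and prose ending in ':'. */
function extractProtocolCommand(line: string): string | null {
  const match = /^\s*[-*]\s+(.*)$/.exec(line);
  if (!match) return null; // not a bullet list item
  const text = (match[1] ?? "").trim().replace(/^`(.*)`$/, "$1");
  if (text === "" || text.endsWith(":")) return null;
  return text;
}

/** Collect commands from the protocol section of the final markdown. */
function extractVerificationProtocol(markdown: string): string[] {
  const commands: string[] = [];
  let inSection = false;
  for (const line of markdown.split("\n")) {
    if (PROTOCOL_HEADER.test(line.trim())) {
      inSection = true;
      continue;
    }
    if (inSection && TOP_LEVEL_HEADING.test(line)) {
      inSection = false;
      continue;
    }
    if (!inSection) continue;
    const command = extractProtocolCommand(line);
    if (command !== null) commands.push(command);
  }
  return commands;
}
```

With this shape, a body that already contains its own protocol section and a prose lead-in round-trips to exactly one section holding only the commands passed in the structured argument.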
The invariant test only checked one flag combination (normal flags, debug on), leaving recovery/compact/user-decision states uncovered. Expand it to encode the real two-gate composition from orchestrator-gate.ts:

- Normal allow_stage_machine path (debug on AND off): every guard-allowed wrapper must pass the stage-behavior gate. This is the deadlock-prevention invariant.
- Broken / user-decision states bypass the behavior gate, so the guard set must be fixed (identical across every stage/step): a step-scoped tool must never leak into a state where nothing checks it against stage behavior.
- Compact boundary: the guard set must be exactly [planner_status].

Document the two-halves contract in src/runtime/AGENTS.md so future tool changes update both allowlists and stay covered. Any drift now fails before publish.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
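The three rules in this commit can be read as one check over the flag matrix. The sketch below is an illustration under stated assumptions: `Guard`, `Behavior`, `StepRef`, `GateState`, and `checkGateInvariants` are hypothetical stand-ins, not the types or functions in orchestrator-gate.ts.

```typescript
// Hypothetical stand-ins for the two allowlists; the real signatures are unknown.
type GateState = "normal" | "broken" | "user_decision" | "compact";
interface StepRef {
  stage: string;
  step: string;
}
type Guard = (state: GateState, debug: boolean, at: StepRef) => string[];
type Behavior = (at: StepRef) => string[];

/** Returns a human-readable list of invariant violations (empty when all hold). */
function checkGateInvariants(steps: StepRef[], guard: Guard, behavior: Behavior): string[] {
  const violations: string[] = [];
  const key = (tools: string[]) => [...tools].sort().join(",");

  for (const debug of [true, false]) {
    for (const at of steps) {
      const where = `${at.stage}/${at.step} (debug=${debug})`;

      // Normal path: everything the guard allows must also pass the behavior gate.
      const behaviorAllowed = new Set(behavior(at));
      for (const tool of guard("normal", debug, at)) {
        if (!behaviorAllowed.has(tool)) {
          violations.push(`${where}: ${tool} passes the guard but not the behavior gate`);
        }
      }

      // Compact boundary: the guard set must be exactly [planner_status].
      if (key(guard("compact", debug, at)) !== "planner_status") {
        violations.push(`${where}: compact guard set is not [planner_status]`);
      }
    }

    // Broken / user-decision: the guard set must be identical at every step.
    for (const state of ["broken", "user_decision"] as const) {
      const distinct = new Set(steps.map((at) => key(guard(state, debug, at))));
      if (distinct.size > 1) {
        violations.push(`${state} (debug=${debug}): guard set differs between steps`);
      }
    }
  }
  return violations;
}
```

Returning the violations as strings, rather than throwing on the first one, lets a test report every drift at once.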
Release notes accumulated old feat/fix entries onto each new release. Root cause: the draft release used generate_release_notes:true with no baseline. GitHub anchors auto-generated notes to the last *published* (non-draft) release, but every release here is created as a draft, so the baseline stayed stale and each release re-listed every PR since the last published one.

Generate the notes explicitly via `gh api releases/generate-notes` with previous_tag_name set to the previous v* tag (selected with `git tag --list 'v*' --sort=-v:refname`, excluding the just-created tag), and feed the result into the release body instead of generate_release_notes. The existing .github/release.yml categories/exclusions still apply.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
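The tag-selection step can be sketched as a pure function over the output of `git tag --list 'v*' --sort=-v:refname`. The helper names `pickPreviousTag` and `buildGenerateNotesPayload` are hypothetical; the workflow itself may do this in shell. The payload fields `tag_name` and `previous_tag_name` are the ones the GitHub generate-notes endpoint accepts.

```typescript
/**
 * Pick the tag that precedes `currentTag` in a newest-first tag listing.
 * Assumption: if the current tag is missing from the list, the newest
 * listed tag is treated as the baseline.
 */
function pickPreviousTag(gitTagOutput: string, currentTag: string): string | undefined {
  const tags = gitTagOutput
    .split("\n")
    .map((t) => t.trim())
    .filter((t) => t !== "");
  const index = tags.indexOf(currentTag);
  return index >= 0 ? tags[index + 1] : tags[0];
}

/** Build the request body for the generate-notes call. */
function buildGenerateNotesPayload(
  currentTag: string,
  previousTag: string | undefined,
): { tag_name: string; previous_tag_name?: string } {
  // On the very first release there is no baseline, so the field is omitted.
  return previousTag === undefined
    ? { tag_name: currentTag }
    : { tag_name: currentTag, previous_tag_name: previousTag };
}
```

Anchoring on the position of the current tag, rather than taking the first tag that differs from it, keeps the baseline correct even when the new tag is not the highest version in the list.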
Deleting a plan whose task worktree had been removed by hand failed with "git -C <repo> branch -d <branch> failed". The thrown GitCommandError aborted the whole deletion, so the plan, along with its stale branches, stayed in the list.

deletePlan now deletes managed branches and the worktree through tolerant helpers: a branch that is already gone (branchExists=false) is skipped, and any delete/remove failure is downgraded to a warning instead of throwing. The plan is pruned regardless, and the warnings are surfaced in the command result so the branch stops appearing among selectable plans.

Tests cover both the already-removed branch and the failed-delete paths.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
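A minimal sketch of the tolerant-deletion pattern described above. The `GitOps` interface, `DeleteResult`, `tolerate`, and `deletePlanTolerantly` are hypothetical names introduced for illustration; only `branchExists` and the warn-instead-of-throw behavior come from the commit message.

```typescript
// Hypothetical interface standing in for the project's git helpers.
interface GitOps {
  branchExists(branch: string): boolean;
  deleteBranch(branch: string): void; // may throw
  removeWorktree(path: string): void; // may throw
}

interface DeleteResult {
  pruned: boolean;
  warnings: string[];
}

/** Run an action and downgrade any failure to a warning. */
function tolerate(label: string, action: () => void, warnings: string[]): void {
  try {
    action();
  } catch (err) {
    warnings.push(`${label}: ${err instanceof Error ? err.message : String(err)}`);
  }
}

function deletePlanTolerantly(
  git: GitOps,
  branches: string[],
  worktreePath: string,
  prunePlan: () => void,
): DeleteResult {
  const warnings: string[] = [];

  // Assumed order: worktree first, since git refuses to delete a branch
  // that is still checked out in a worktree.
  tolerate(`remove worktree ${worktreePath}`, () => git.removeWorktree(worktreePath), warnings);

  for (const branch of branches) {
    if (!git.branchExists(branch)) continue; // already gone: nothing to do
    tolerate(`delete branch ${branch}`, () => git.deleteBranch(branch), warnings);
  }

  // The plan is pruned regardless of how cleanup went.
  prunePlan();
  return { pruned: true, warnings };
}
```

Collecting warnings in the result, instead of logging them, leaves it to the caller to decide how to surface them in the command output.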
…tools

The forward invariant (guard ⊆ behavior) caught deadlocks where the guard lets a tool through but the behavior gate blocks it. The reverse direction was unguarded: a wrapper tool advertised in expectedTools (so it surfaces in status' "Stage Behavior" and the model reaches for it) while the guard blocks it leaves the model told to call a tool it cannot.

Add the reverse invariant: every wrapper tool in expectedTools, except the always-allowed specials planner_status/planner_git_inspect, must be guard-allowed at that step. It surfaced three over-listings, all at runtime-driven steps where the guard intentionally allows (almost) nothing:

- init/check_project advertised planner_report_stuck (execution-only per stuck-tools.ts).
- init/prepare_storage advertised the contract wrappers + planner_git_commit.
- init/create_plan_worktree advertised the contract read/route wrappers.

Trim those expectedTools to match the guard. Not a live deadlock today (the model does not drive those steps), but exactly the drift that becomes one. Both directions are now pinned across the full flag matrix.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
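The reverse invariant reduces to a set difference per step. A minimal sketch follows, with two assumptions: wrapper tools are recognized by a `planner_` prefix, and `findOverListedTools` is a hypothetical helper name, not the test's actual function.

```typescript
// The always-allowed specials named in the commit message.
const ALWAYS_ALLOWED = new Set(["planner_status", "planner_git_inspect"]);

// Assumption: wrapper tools are the ones with a "planner_" prefix.
const isWrapperTool = (tool: string): boolean => tool.startsWith("planner_");

/**
 * Return every wrapper tool that a step advertises in expectedTools
 * but that the guard would block at that step.
 */
function findOverListedTools(expectedTools: string[], guardAllowed: string[]): string[] {
  const allowed = new Set(guardAllowed);
  return expectedTools.filter(
    (tool) => isWrapperTool(tool) && !ALWAYS_ALLOWED.has(tool) && !allowed.has(tool),
  );
}
```

For a step like init/check_project, where the guard allows nothing beyond the specials, advertising planner_report_stuck would be returned as an over-listing, while planner_status would not.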
The expected/received echo previously covered only the four artifact fill-tools. Extend it to every tool that writes a structured markdown artifact so the model gets the same self-correction signal everywhere.

- New artifact-echo.ts holds the shared formatArtifactEcho (expected shape + what was saved), formatCanonicalSchemaHint (schema only, for tools that already echo their content), and ARTIFACT_CANONICAL_SCHEMA keyed by tool.
- artifact-tools.ts now routes through the shared helper/map (no local copy).
- goal_submit / questions_submit append the canonical-schema hint (they already echo the saved content for user review).
- refactor_review / doubt_review / task_upsert / contract_upsert / skill_create / skill_update append the full expected-vs-received echo.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
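The shared helper could look roughly like the sketch below. The three exported names come from the commit message; their signatures, the output wording, and the schema text for planner_discovery_submit are placeholders assumed for illustration, not the contents of artifact-echo.ts.

```typescript
// Placeholder schema text; the project's real canonical schemas are not shown in the PR.
const ARTIFACT_CANONICAL_SCHEMA: Record<string, string> = {
  planner_discovery_submit: "<free-form body>\n\n## Verification Protocol\n- <command>",
};

/** Schema only: for tools that already echo the saved content themselves. */
function formatCanonicalSchemaHint(tool: string): string {
  const schema = ARTIFACT_CANONICAL_SCHEMA[tool];
  return schema === undefined ? "" : `Expected shape for ${tool}:\n${schema}`;
}

/** Expected shape followed by what was actually saved. */
function formatArtifactEcho(tool: string, saved: string): string {
  const hint = formatCanonicalSchemaHint(tool);
  const received = `Received (as saved):\n${saved}`;
  return hint === "" ? received : `${hint}\n\n${received}`;
}
```

Keeping the schema map keyed by tool name means a new artifact-writing tool only needs one map entry to get the same echo.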