fix: stop discovery fill-tool from poisoning the doubt_review protocol parser #27

Merged: m62624 merged 6 commits into main from fix/fill-tool-parser-gating-ci-worktree on Jun 15, 2026

m62624 (Owner) commented Jun 15, 2026

No description provided.

m62624 and others added 6 commits June 15, 2026 22:40
…l parser

A decoded planner session showed the model stuck in an 11-call loop on
planner_doubt_review. Root cause: planner_discovery_submit took both a free-form
`body` and a structured `verificationProtocol` argument, and the model wrote its
own `## Verification Protocol` section into body while the wrapper also appended
one from the argument. discovery.md then carried two protocol sections;
extractVerificationProtocol collected the body's prose lead-in
("The existing project has these commands available:") as a phantom required
command and never reached the argument's section — so doubt_review could not be
satisfied.

- artifact-tools: the verificationProtocol argument is now the single source of
  truth — buildDiscoveryMarkdown strips any protocol section the model wrote into
  body before appending the canonical one (no brittle reject on an exact header).
- doubt-review parser hardened: extractVerificationProtocol collects only bullet
  list items, and extractProtocolCommand rejects prose lines ending in ':'.
- Every fill-tool now echoes the canonical schema next to what was saved so the
  model can self-correct by comparing expected vs received shape.
- next-step hint spells out submit ordering (a later step's planner_*_submit is
  blocked until planner_finish_step advances there) to remove the early-call
  friction seen in the session.
- Round-trip tests reproduce the deadlock and assert each artifact survives its
  consuming parser/validator with no phantom or duplicate sections.
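
The first two bullets can be illustrated with a minimal sketch. It assumes the protocol section is introduced by a `## Verification Protocol` heading and ends at the next level-1 or level-2 heading. The function names are borrowed from the commit message, but the signatures and bodies here are hypothetical and are not the repository's code.

```typescript
// Hypothetical sketch: signatures and heading rules are assumptions, not the real implementation.
const PROTOCOL_HEADING = /^##\s+Verification Protocol\s*$/i;
const SECTION_BOUNDARY = /^#{1,2}\s+/; // a level-1 or level-2 heading ends the section

// Remove every protocol section the model wrote into the free-form body.
function stripProtocolSections(body: string): string {
  const kept: string[] = [];
  let skipping = false;
  for (const line of body.split("\n")) {
    if (PROTOCOL_HEADING.test(line)) {
      skipping = true;
      continue;
    }
    if (skipping && SECTION_BOUNDARY.test(line)) skipping = false;
    if (!skipping) kept.push(line);
  }
  return kept.join("\n").trimEnd();
}

// The structured argument is the single source of truth for the protocol section.
function buildDiscoveryMarkdown(body: string, commands: string[]): string {
  const bullets = commands.map((c) => `- \`${c}\``).join("\n");
  return `${stripProtocolSections(body)}\n\n## Verification Protocol\n\n${bullets}\n`;
}

// Collect only bullet items; prose lines and lead-ins ending in ':' are never commands.
function extractProtocolCommands(markdown: string): string[] {
  const commands: string[] = [];
  let inSection = false;
  for (const line of markdown.split("\n")) {
    if (PROTOCOL_HEADING.test(line)) {
      inSection = true;
      continue;
    }
    if (inSection && SECTION_BOUNDARY.test(line)) inSection = false;
    if (!inSection) continue;
    const bullet = /^\s*[-*]\s+(.*)$/.exec(line);
    if (bullet === null) continue;
    const text = (bullet[1] ?? "").replace(/`/g, "").trim();
    if (text === "" || text.endsWith(":")) continue;
    commands.push(text);
  }
  return commands;
}
```

With this shape, a body that carries its own protocol section round-trips to exactly one section, and the prose lead-in never reaches the list of required commands.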

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The invariant test only checked one flag combination (normal flags, debug on),
leaving recovery/compact/user-decision states uncovered. Expand it to encode the
real two-gate composition from orchestrator-gate.ts:

- Normal allow_stage_machine path (debug on AND off): every guard-allowed wrapper
  must pass the stage-behavior gate — the deadlock-prevention invariant.
- Broken / user-decision states bypass the behavior gate, so the guard set must
  be fixed (identical across every stage/step) — a step-scoped tool must never
  leak into a state where nothing checks it against stage behavior.
- Compact boundary: the guard set must be exactly [planner_status].

Document the two-halves contract in src/runtime/AGENTS.md so future tool changes
update both allowlists and stay covered. Any drift now fails before publish.
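
A rough sketch of the three checks described above, written against two hypothetical callbacks that stand in for the guard allowlist and the stage-behavior allowlist. The `Flags` shape, the callback signatures, and the violation messages are assumptions made for illustration; the real composition lives in orchestrator-gate.ts and may differ.

```typescript
// Hypothetical types: the real flag and step shapes are not shown in this PR.
type Step = { stage: string; step: string };
type Flags = { debug: boolean; broken: boolean; userDecision: boolean; compactBoundary: boolean };
type Guard = (step: Step, flags: Flags) => string[];
type Behavior = (step: Step) => string[];

const sameSet = (a: string[], b: string[]): boolean =>
  a.length === b.length && [...a].sort().join("\n") === [...b].sort().join("\n");

function checkGateInvariants(steps: Step[], guard: Guard, behavior: Behavior): string[] {
  const violations: string[] = [];
  const label = (s: Step): string => `${s.stage}/${s.step}`;

  for (const debug of [true, false]) {
    const base: Flags = { debug, broken: false, userDecision: false, compactBoundary: false };

    // Normal path: every guard-allowed tool must also pass the behavior gate.
    for (const s of steps) {
      const allowedByBehavior = new Set(behavior(s));
      for (const tool of guard(s, base)) {
        if (!allowedByBehavior.has(tool)) violations.push(`${label(s)}: guard allows ${tool}, behavior blocks it`);
      }
    }

    // Bypass states: the guard set must be identical at every stage/step.
    const bypassStates: Flags[] = [
      { ...base, broken: true },
      { ...base, userDecision: true },
    ];
    for (const flags of bypassStates) {
      const [first, ...rest] = steps;
      if (first === undefined) continue;
      const reference = guard(first, flags);
      for (const s of rest) {
        if (!sameSet(guard(s, flags), reference)) violations.push(`${label(s)}: guard set varies in a bypass state`);
      }
    }

    // Compact boundary: exactly [planner_status].
    for (const s of steps) {
      if (!sameSet(guard(s, { ...base, compactBoundary: true }), ["planner_status"])) {
        violations.push(`${label(s)}: compact boundary must allow only planner_status`);
      }
    }
  }
  return violations;
}
```

An empty result means all three invariants hold for the given step list; any drift in either allowlist shows up as a violation string.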

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Release notes accumulated old feat/fix entries onto each new release. Root
cause: the draft release used generate_release_notes:true with no baseline.
GitHub anchors auto-generated notes to the last *published* (non-draft) release,
but every release here is created as a draft, so the baseline stayed stale and
each release re-listed every PR since the last published one.

Generate the notes explicitly via `gh api releases/generate-notes` with
previous_tag_name set to the previous v* tag (selected with
`git tag --list 'v*' --sort=-v:refname`, excluding the just-created tag), and
feed the result into the release body instead of generate_release_notes. The
existing .github/release.yml categories/exclusions still apply.
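
The baseline selection can be sketched as a small pure function. It assumes its input is the output of `git tag --list 'v*' --sort=-v:refname` split into lines (newest first) and that the just-created tag is the newest entry. `pickPreviousTag` and `buildGenerateNotesRequest` are hypothetical names; `tag_name` and `previous_tag_name` are the request fields of GitHub's generate-notes endpoint.

```typescript
// Hypothetical helpers; the workflow itself does this in CI, not in TypeScript.

// tagsNewestFirst: lines printed by `git tag --list 'v*' --sort=-v:refname`.
function pickPreviousTag(tagsNewestFirst: string[], currentTag: string): string | undefined {
  return tagsNewestFirst
    .map((tag) => tag.trim())
    .find((tag) => tag !== "" && tag !== currentTag);
}

interface GenerateNotesRequest {
  tag_name: string;
  previous_tag_name?: string;
}

// Body for POST /repos/{owner}/{repo}/releases/generate-notes.
// Without a previous tag (first release) the field is omitted and GitHub picks its own baseline.
function buildGenerateNotesRequest(currentTag: string, previousTag: string | undefined): GenerateNotesRequest {
  return previousTag === undefined
    ? { tag_name: currentTag }
    : { tag_name: currentTag, previous_tag_name: previousTag };
}
```

Pinning `previous_tag_name` explicitly is what keeps the notes independent of which releases happen to be published rather than draft.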

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Deleting a plan whose task worktree had been removed by hand failed with
"git -C <repo> branch -d <branch> failed". The thrown GitCommandError aborted
the whole deletion, so the plan — and its stale branches — stayed in the list.

deletePlan now deletes managed branches and the worktree through tolerant
helpers: a branch that is already gone (branchExists=false) is skipped, and any
delete/remove failure is downgraded to a warning instead of throwing. The plan
is pruned regardless, and the warnings are surfaced in the command result so the
branch stops appearing among selectable plans.

Tests cover both the already-removed branch and the failed-delete paths.
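
A minimal sketch of the tolerant deletion flow, assuming the git operations are injected behind a small interface. `GitOps`, `Plan`, `DeleteResult`, and the `tolerate` helper are hypothetical names, and removing the worktree before the branches is an assumption of this sketch; only `deletePlan` and `branchExists` are named in the commit message.

```typescript
// Hypothetical interfaces: the real deletePlan signature is not shown in this PR.
interface GitOps {
  branchExists(branch: string): boolean;
  deleteBranch(branch: string): void; // may throw, e.g. a GitCommandError
  removeWorktree(path: string): void; // may throw if the directory is already gone
}
interface Plan { id: string; branches: string[]; worktreePath: string }
interface DeleteResult { pruned: boolean; warnings: string[] }

// Run an action and downgrade any failure to a warning instead of throwing.
function tolerate(label: string, action: () => void, warnings: string[]): void {
  try {
    action();
  } catch (error) {
    warnings.push(`${label}: ${error instanceof Error ? error.message : String(error)}`);
  }
}

function deletePlan(plan: Plan, git: GitOps, plans: Map<string, Plan>): DeleteResult {
  const warnings: string[] = [];

  // Assumption: remove the worktree first, since git refuses to delete a branch
  // that is still checked out in a worktree.
  tolerate(`worktree ${plan.worktreePath}`, () => git.removeWorktree(plan.worktreePath), warnings);

  for (const branch of plan.branches) {
    if (!git.branchExists(branch)) continue; // already gone: skip silently
    tolerate(`branch ${branch}`, () => git.deleteBranch(branch), warnings);
  }

  // The plan is pruned regardless of cleanup failures.
  const pruned = plans.delete(plan.id);
  return { pruned, warnings };
}
```

The caller can then surface `warnings` in the command result while the plan itself no longer appears in the list.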

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…tools

The forward invariant (guard ⊆ behavior) caught deadlocks where the guard lets a
tool through but the behavior gate blocks it. The reverse direction was
unguarded: when a wrapper tool is advertised in expectedTools (so it surfaces in
the status output's "Stage Behavior" section and the model reaches for it) but
the guard blocks it, the model is told to call a tool it cannot call.

Add the reverse invariant — every wrapper tool in expectedTools, except the
always-allowed specials planner_status/planner_git_inspect, must be guard-allowed
at that step. It surfaced three over-listings, all at runtime-driven steps where
the guard intentionally allows (almost) nothing:

- init/check_project advertised planner_report_stuck (execution-only per
  stuck-tools.ts).
- init/prepare_storage advertised the contract wrappers + planner_git_commit.
- init/create_plan_worktree advertised the contract read/route wrappers.

Trim those expectedTools to match the guard. This is not a live deadlock today
(the model does not drive those steps), but it is exactly the kind of drift that
becomes one. Both directions are now pinned across the full flag matrix.
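
The reverse check can be sketched as follows. The `StepSpec` shape and the `findOverListedTools` name are hypothetical, and treating every name that starts with `planner_` as a wrapper tool is an assumption of this sketch rather than the repository's rule.

```typescript
// Hypothetical sketch of the reverse invariant: advertised wrapper tools must be guard-allowed.
const ALWAYS_ALLOWED = new Set(["planner_status", "planner_git_inspect"]);

interface StepSpec {
  stage: string;
  step: string;
  expectedTools: string[]; // what the status output advertises for this step
  guardAllowed: string[];  // what the guard actually lets through
}

// Assumption: wrapper tools are recognized by their "planner_" prefix.
const isWrapperTool = (tool: string): boolean => tool.startsWith("planner_");

function findOverListedTools(specs: StepSpec[]): string[] {
  const overListed: string[] = [];
  for (const spec of specs) {
    const allowed = new Set(spec.guardAllowed);
    for (const tool of spec.expectedTools) {
      if (!isWrapperTool(tool) || ALWAYS_ALLOWED.has(tool)) continue;
      if (!allowed.has(tool)) overListed.push(`${spec.stage}/${spec.step}: ${tool}`);
    }
  }
  return overListed;
}
```

Run against a step table, each returned entry names a step and a tool the model is told to call but cannot.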

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The expected/received echo previously covered only the four artifact fill-tools.
Extend it to every tool that writes a structured markdown artifact so the model
gets the same self-correction signal everywhere.

- New artifact-echo.ts holds the shared formatArtifactEcho (expected shape +
  what was saved), formatCanonicalSchemaHint (schema only, for tools that already
  echo their content), and ARTIFACT_CANONICAL_SCHEMA keyed by tool.
- artifact-tools.ts now routes through the shared helper/map (no local copy).
- goal_submit / questions_submit append the canonical-schema hint (they already
  echo the saved content for user review).
- refactor_review / doubt_review / task_upsert / contract_upsert /
  skill_create / skill_update append the full expected-vs-received echo.
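
One possible shape for the shared helpers, as a sketch only. The three exported names come from the commit message, but the signatures, the output wording, and the schema text stored in the map are invented here for illustration.

```typescript
// Hypothetical sketch: signatures and schema contents are assumptions, not the real artifact-echo.ts.
const ARTIFACT_CANONICAL_SCHEMA = new Map<string, string>([
  // Illustrative schema text only; the real canonical schemas are not shown in this PR.
  ["planner_discovery_submit", "## Findings\n\n## Verification Protocol\n\n- `<command>`"],
]);

// Schema only: for tools that already echo the saved content themselves.
function formatCanonicalSchemaHint(tool: string): string {
  const schema = ARTIFACT_CANONICAL_SCHEMA.get(tool);
  return schema === undefined ? "" : `Expected shape for ${tool}:\n\n${schema}`;
}

// Expected shape next to what was saved, so the two can be compared directly.
function formatArtifactEcho(tool: string, saved: string): string {
  const hint = formatCanonicalSchemaHint(tool);
  const received = `Saved content:\n\n${saved.trimEnd()}`;
  return hint === "" ? received : `${hint}\n\n${received}`;
}
```

Keeping the schema map keyed by tool name means a new fill-tool only needs one map entry to get the same echo.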

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
github-actions bot added the fix label Jun 15, 2026
m62624 merged commit a5177ac into main Jun 15, 2026
2 checks passed
m62624 deleted the fix/fill-tool-parser-gating-ci-worktree branch June 15, 2026 18:08