diff --git a/README.md b/README.md index 2330164..4e2f0f4 100644 --- a/README.md +++ b/README.md @@ -1,4 +1,4 @@ -> ⚠️ This repository is an experiment built with Pi and Qwen3.6-35B-A3B-UD-Q4_K_XL.gguf for local coding work. It is maintained with local AI assistance and may contain non-professional design choices, rough edges, broken behavior, or mistakes. Use it at your own risk. +> ⚠️ Experimental. pi-code-planner is built **for local coding models** and, at runtime, is driven by one — a small local model (tested with Qwen3.6-35B-A3B-UD-Q4_K_XL.gguf) running through Pi. Cloud LLMs (such as Claude) take part in its development. It is maintained with AI assistance and may contain non-professional design choices, rough edges, broken behavior, or mistakes. Use it at your own risk. # pi-code-planner @@ -8,6 +8,10 @@ An experimental [Pi](https://github.com/badlogic/pi-mono) extension for local coding models. Adds a persisted state machine so long tasks survive context compaction, Git branching, and approval steps without you babysitting the session. +However you run it, the unit that matters is **Pi + this extension driving a local model**. With that in mind, read this plainly: + +> This is not a guarantee of better output. The extension can make results worse by adding overhead or constraining the model at the wrong time. It is an experiment, not a stable product. + Tested with [Qwen3.6-35B-A3B-UD-Q4_K_XL.gguf](https://huggingface.co/unsloth/Qwen3.6-35B-A3B-GGUF). The model still makes mistakes — sometimes it spirals on a wrong hypothesis, sometimes it misreads the persisted state. But the failure mode changes: instead of silently drifting, it tends to get stuck in a visible way and either self-corrects or calls `planner_report_stuck`. In practice, a session implementing a nontrivial feature went about 3 hours without me touching it. That is the goal. --- diff --git a/instructions/defaults/discovery.md b/instructions/defaults/discovery.md index 44c7eba..faf1aee 100644 --- a/instructions/defaults/discovery.md +++ b/instructions/defaults/discovery.md @@ -8,39 +8,30 @@ Become familiar with the project before planning. Keep this stage cheap for a lo 1. `scan_project_structure` - Read `goal.md` for normal `/planner-create` plans. - - If `planner_status` reports `creationMethod: improve`, this is a discovery-first plan: do not treat empty `goal.md` as a blocker and do not read `request.md` as the source goal. Use repository evidence to discover a bounded self-improvement goal, then return to intake/draft_goal after discovery. + - If `planner_status` reports `creationMethod: improve`, this is a discovery-first plan: do not treat empty `goal.md` as a blocker and do not read `request.md` as the source goal. Use repository evidence to discover a bounded self-improvement goal, then return to `intake/draft_goal` after discovery. - **AGENTS.md First Gate — do this before reading any project file or running any shell command:** + **AGENTS.md First Gate — before reading any project file or running any shell command:** 1. Call `planner_contract_scan` to discover all AGENTS.md/context files. - 2. If scan returns 0 discovered paths: skip routing entirely. Proceed directly to project tree inspection below. Do not call `planner_contract_route` or `planner_contract_read`. + 2. If scan returns 0 paths: skip routing entirely and proceed to project tree inspection. Do not call `planner_contract_route` or `planner_contract_read`. 3. If scan returns 1+ paths: call `planner_contract_route` with the goal's target paths to select the relevant chain. - 4. Call `planner_contract_read` for every contract in the selected chain from root to nearest. - 5. **Depth stop rule:** Stop reading deeper when any of these is true: - - The nearest contract's Child Index is `(none)` — it is a leaf; no children exist to read. - - No child domain in the Child Index matches the goal's primary changed area. - - The nearest contract's Purpose already fully covers the goal scope. - 6. Only read a child contract when the child domain explicitly covers the goal's primary implementation area. Do not read children "just to be safe." - 7. If after reading a contract you realize it is the wrong domain, call `planner_contract_route` again with a more specific target or a sibling path. + 4. Call `planner_contract_read` for every contract in the selected chain, root to nearest. + 5. **Depth stop rule:** stop reading deeper when any is true — the nearest contract's Child Index is `(none)` (it is a leaf); no child domain in the Child Index matches the goal's primary changed area; or the nearest contract's Purpose already fully covers the goal scope. + 6. Read a child contract only when the child domain explicitly covers the goal's primary implementation area. Do not read children "just to be safe." + 7. If a contract turns out to be the wrong domain, call `planner_contract_route` again with a more specific target or a sibling path. 8. After reading the relevant chain, use the `Read First` files listed in the nearest contract before broad source reads. - - Treat AGENTS.md as the only writable/canonical planner memory format. Treat non-AGENTS context files as read-only guidance; if they contain durable planner knowledge, copy the distilled rule into the nearest AGENTS.md via `planner_contract_upsert`. - - Inspect the project tree with read-only shell commands after the contract map is processed. - - Read only the manifests, entrypoints, tests, configuration, and source files needed to understand the requested work after contract guidance is considered. - - Write a concise `discovery.md`: architecture, relevant paths, commands, conventions, risks, and uncertainty. - - `discovery.md` must include a `## Verification Protocol` section before this step can finish. - - In `## Verification Protocol`, record exact test, lint, build, and format commands when they exist. Include required working directory and important flags. - - If commands are not discoverable, record `unknown` entries in `## Verification Protocol` and ask the missing setup questions in `discovery/write_questions`. - - If no useful AGENTS.md exists and discovery evidence proves meaningful architectural zones, create initial root/domain contracts through `planner_contract_upsert`. Do not create one in every folder. - - If `planner_contract_upsert` changes AGENTS.md files during discovery, commit those changes through `planner_git_commit` before finishing `scan_project_structure`. + - Treat AGENTS.md as the only writable/canonical planner memory format. Treat non-AGENTS context files as read-only guidance; copy any durable rule into the nearest AGENTS.md via `planner_contract_upsert`. + - Inspect the project tree with read-only shell commands after the contract map is processed. Read only the manifests, entrypoints, tests, configuration, and source files needed for the requested work. + - Write `discovery.md` through `planner_discovery_submit`: pass the `body` (architecture, relevant paths, conventions, risks, uncertainty) plus a `verificationProtocol` list. The wrapper renders a well-formed `## Verification Protocol` section, which is required before this step can finish. + - In `verificationProtocol`, record exact test, lint, build, and format commands with required working directory and important flags. If a command is not discoverable, record an `unknown` entry and ask the missing setup question in `discovery/write_questions`. + - If no useful AGENTS.md exists and discovery proves meaningful architectural zones, create initial root/domain contracts through `planner_contract_upsert`. Do not create one in every folder. + - If `planner_contract_upsert` changes AGENTS.md files during discovery, commit them through `planner_git_commit` before finishing `scan_project_structure`. 2. `write_questions` - Call `planner_questions_submit` with evidence-based unresolved questions and explicit assumptions. - - If the project is empty or has no existing test/lint/build conventions, ask how to set up testing: framework, test command, lint command, formatter, and any required flags. - - If the project has existing conventions but discovery could not prove exact commands or flags, ask only for the missing commands/flags. - - If questions exist, show them to the user verbatim and wait for answers. - - Call `planner_questions_resolve` with the user's explicit answers. + - If the project is empty or lacks test/lint/build conventions, ask how to set up testing: framework, test command, lint command, formatter, and required flags. If conventions exist but discovery could not prove exact commands or flags, ask only for the missing ones. + - If questions exist, show them to the user verbatim, wait for answers, then call `planner_questions_resolve`. - If no questions remain, call `planner_questions_submit` with `hasOpenQuestions: false` and state that explicitly. -3. `compact_discovery` - - Request planner-controlled compact only after `discovery.md` is useful and questions are resolved. +3. `compact_discovery` — request planner-controlled compact only after `discovery.md` is useful and questions are resolved. 4. `enter_planning` - For normal `/planner-create` plans, advance to `planning/read_context`. - For `/planner-improve` plans (`creationMethod: improve`), finish with target `intake/draft_goal` so the model can write `goal.md` from discovery findings and ask for approval. @@ -50,11 +41,10 @@ Become familiar with the project before planning. Keep this stage cheap for a lo - Do not implement production code or tests. - Do not read the whole repository by default. - Do not skip AGENTS.md routing when contract files exist. They are local architecture memory, not optional docs. -- Do not read child contracts unless the child domain directly covers the goal's primary changed area. Stop at the depth where the contract purpose matches the goal. -- Do not panic when AGENTS.md files are absent. Run the scan, confirm 0 paths, then proceed with normal tree inspection. Absence is not an error. -- Do not create AGENTS.md for every directory. A contract belongs only where it prevents future agents from reading irrelevant code or breaking a durable rule. -- Do not use the absence of AGENTS.md as evidence that no contract update is needed. For project file changes in a project with no writable AGENTS.md, create the initial meaningful contract. -- Do not build or maintain a file-by-file JSONL symbol index. +- Do not read child contracts unless the child domain directly covers the goal's primary changed area. +- Do not panic when AGENTS.md files are absent. Run the scan, confirm 0 paths, then proceed with normal tree inspection. Absence is not an error — but for project file changes in a project with no writable AGENTS.md, create the initial meaningful contract. +- Do not create AGENTS.md for every directory; a contract belongs only where it prevents future agents from reading irrelevant code or breaking a durable rule. +- Do not build or maintain a file-by-file symbol index. - Run project-scoped shell commands from the worktree path reported by `planner_status`. - Do not use raw git. @@ -75,7 +65,7 @@ Treat every discovery conclusion as suspect until it has a path, command, contra Before finishing discovery, doubt the context: -- Did you record exact test, lint, build, and format commands with working directory and important flags? +- Did you record exact test, lint, build, and format commands with working directory and flags? - If the project is empty or conventions are missing, did you ask how testing and checks should be set up? - Are source findings backed by paths and evidence, not filenames or comments alone? - Are open questions truly resolved, or only postponed? @@ -84,66 +74,42 @@ If doubt remains, update `discovery.md`/`questions.md` or ask focused questions. ## Fundamental Rules +These rules are shared planner philosophy; discovery is where they are first applied. + ### Rule 1: System Boundaries -Before reading any file, determine two things: +Before reading any file, separate **internal** (files inside the project you can read and edit) from **external** (host mechanisms, external APIs, runtime environments, servers, models, browsers, file systems outside the project root). -**Internal** — files inside the project that you can read and edit. Code, configs, tests, project documentation. +You do not write external code; you use or call external mechanisms. If a task requires an action performed by an external mechanism, the solution is in HOW to call it, not in rewriting its code. If the task says "make X happen" and X is performed externally, find the integration point where the project can ASK the external mechanism to do X. -**External** — everything outside the project. Host mechanisms, external APIs, runtime environments, servers, models, browsers, file systems outside the project root. +### Rule 2: Mechanism vs Outcome -**Rule:** You do not write external code. You use or call external mechanisms. If a task requires an action performed by an external mechanism — the solution is in HOW to call the external mechanism, not in rewriting its code. +Every requirement has an **outcome** (what should happen) and a **mechanism** (how it happens). Investigate, do not guess. When a task describes an outcome, do not jump to "I need to write code." First ask: is there already a mechanism in the project (hooks, events, handlers, scheduler)? Is there an external mechanism (host API, ready integration)? Do you need new code, or just to connect to an existing mechanism? Code is the last option, not the first. -**Deduction:** If the task says "make X happen" and X is performed by an external mechanism — you do not write X. You find the integration point where the project can ASK the external mechanism to do X. +### Rule 3: Doubt and Logical Deduction -### Rule 2: Mechanism vs Outcome +Never assume documentation, comments, or naming are accurate. Trust only what you can prove via real code, runtime execution, and tests. Assume you are missing critical details until you verify them, and state uncertainties explicitly. If you see `A` calling `B`, find `B`'s definition and verify its behavior — do not guess from the name. -Every requirement has two layers: +### Rule 4: Map Input/Output Boundaries -- **Outcome** — what should happen. The end state. -- **Mechanism** — HOW it happens. The means of achievement. +To understand any module, map its boundaries first: **inputs** (what triggers it; what data/events/config it receives, and from where) and **outputs** (what it produces; side effects, state changes, return values, file writes, events). Once the input/output protocol is clear, the internal logic becomes predictable. Never analyze a module in isolation. -**Rule:** You investigate, you do not guess. When a task describes an outcome, you must NOT automatically think "I need to write code for this." Your first step is to investigate: +### Rule 5: Pivot When Stuck -1. Is there already a mechanism in the project that can do this? Hooks, events, handlers, scheduler? -2. Is there a mechanism in the external world that can do this? Host API, ready-made integration? -3. Do you need new code, or just CONNECT to an existing mechanism? +If a task stays blocked, do not retry the same approach. Being stuck signals a wrong assumption. Map what was done, identify where reality diverged from expectations, then pivot: simplify, change direction, or backtrack to a known working state. -Code is the last option. Not the first. +### Rule 6: Extreme KISS -**Deduction:** Before writing a single line of code, you determine the mechanism. If the mechanism can be external — you find how to interact with it. If the mechanism already exists in the project — you connect to it. New code is written only when nothing suitable is found. +Write only the minimum code required for the goal. Reuse existing utilities, helpers, and classes found during discovery instead of rewriting them. Avoid speculative abstractions, factories, and patterns unless explicitly requested. ## manual-compact -Preserve the approved goal, `discovery.md`, relevant paths, commands, open questions, and exact current planner step. After compaction, call `planner_status`. +Preserve the approved goal, `discovery.md`, relevant paths, commands, open questions, and the exact current planner step. After compaction, call `planner_status`. ## auto-compact Call `planner_status` immediately. Read `discovery.md` and continue the persisted step. Read additional source files only when the current context is insufficient. -## Advanced Discovery Logic & Cognitive Strategies - -### Rule 3: Extreme Doubt and Logical Deduction -- **Doubt Everything Always**: Never assume any existing documentation, comments, or naming conventions are 100% accurate. Trust only what you can prove via real code, runtime execution, and tests. -- **No Unfounded Confidence**: You must assume you are missing critical details until you verify them. State your uncertainties explicitly. Never state that you are fully confident without proof. -- **Logically Deduce, Do Not Guess**: Every conclusion must be backed by a clear path of evidence. If you see function `A` calling `B`, do not guess what `B` does based on its name; find `B`'s definition and verify its behavior. - -### Rule 4: Systemic Mapping of Input/Output Boundaries -To properly study and comprehend any system, component, or feature, you must exhaustively map its boundaries: -- **Income (Inputs)**: What triggers this module? What data, events, parameters, or configurations does it receive? Trace the source of all inputs to their origins. -- **Outcome (Outputs)**: What does this module produce? What side-effects, state changes, return values, file writes, or events does it trigger? -- **Detailed Boundary Study**: Never analyze a module in isolation. Study all incoming and outgoing connections first. Once you understand the input/output protocol, the internal logic becomes simple and predictable. - -### Rule 5: Pivoting and Diagnostic Recovery -- **If You Get Stuck**: If a task remains blocked, do not repeatedly retry the same approach. Getting stuck is a signal that your initial assumptions were wrong, or you are moving in the wrong direction. -- **Analyze the Divergence**: Stop and map out what has been done so far. Identify where reality diverged from your expectations. -- **Pivot**: Move in another direction, simplify the problem, or backtrack to a known working state. Do not push forward blindly. - -### Rule 6: Extreme KISS (Keep It Simple, Stupid) -- **Zero Bloat Principle**: Never create redundant functions, files, or utilities. Write only the absolute minimum amount of code required to satisfy the goal. -- **Reuse Existing Code First**: In your discovery phase, look for existing utility functions, methods, helper packages, and classes. Do not rewrite functionality that already exists in the project. -- **Avoid Over-Engineering**: Do not design complex abstract layers, massive factories, or unnecessary patterns unless explicitly requested. A simple procedural or straightforward object-oriented function is always preferred. - ## If You Do Not Know What To Do Next -If you don't know what to do next, call `planner_status`. +The `planner_finish_step` result names your next step, its goal, and the worktree to work in — follow it. Call `planner_status` only when you need the full step rule or stage instruction, or when you are unsure. diff --git a/instructions/defaults/done.md b/instructions/defaults/done.md index 3e638aa..db2ea1c 100644 --- a/instructions/defaults/done.md +++ b/instructions/defaults/done.md @@ -8,44 +8,34 @@ Present the verified plan result, wait for an explicit user decision, then eithe 1. `present_result` - Read `final_summary.md` and present scope, checks, risks, plan branch, worktree path, and output options. - - If a verified reusable lesson is still missing from the planner skill library, call `planner_skill_create` before asking for acceptance. Do not create skills for ordinary summaries. - - After presenting the result, call `planner_finish_step` immediately to enter `await_user_acceptance`. + - If a verified reusable lesson is still missing from the skill library, call `planner_skill_create` before asking for acceptance. Do not create skills for ordinary summaries. + - After presenting, call `planner_finish_step` immediately to enter `await_user_acceptance`. 2. `await_user_acceptance` - - Ask the user to accept the result or request changes. - - Never decide on behalf of the user. - - If the user accepts, ask the user to run `/planner-finish`. - - If the user writes what is wrong or requests more work instead of running `/planner-finish`, treat that as a change request. - - For a change request, call `planner_finish_step` with target `done/handle_change_request`. + - Ask the user to accept the result or request changes. Never decide on their behalf. + - If the user accepts, ask them to run `/planner-finish`. + - If the user writes what is wrong or requests more work instead of running `/planner-finish`, treat that as a change request: call `planner_finish_step` with target `done/handle_change_request`. - `/planner-finish` atomically performs the remaining export, cleanup, and Pi session handoff. Do not try to reproduce that cleanup through model tools. 3. `handle_change_request` - - Record user feedback in durable artifacts. - - Append a `## Change Request` section to `decisions.md` with the user's exact requested corrections. - - Append a short `## Change Request Replan` note near the start of `plan.md`: the previous implementation is complete, but the user requested follow-up changes. Include `### Completed Work` and `### Remaining Work` subsections. Do not rewrite the old plan wholesale or delete the previous plan history. - - Append a `## Post-Implementation Snapshot` section to `discovery.md`: summarize what was implemented, current relevant files/branches, known gaps, and why the user requested another pass. Include `### Completed Work` and `### Remaining Work` subsections. - - Treat existing task artifacts as completed history. The follow-up planning pass may create new revision tasks for the remaining work, but must not reopen completed task IDs. - - Return to `planning/read_context` in the same plan worktree and branch. -4. `prepare_output_branch` - - Internal `/planner-finish` phase: prepare the output branch in the original repository. -5. `merge_or_export_result` - - Internal `/planner-finish` phase: export the plan branch result. -6. `cleanup_worktree` - - Internal `/planner-finish` phase: remove the temporary worktree and safe-to-delete managed child branches. -7. `mark_done` - - Internal `/planner-finish` phase: clear active plan state and mark the result complete. -8. `cleanup_plan_files` - - Internal `/planner-finish` phase: remove completed plan artifacts only after `mark_done`. + - Record user feedback in durable artifacts. + - Append a `## Change Request` section to `decisions.md` with the user's exact requested corrections. + - Update `plan.md` (via `planner_plan_submit` or edit): add a short `## Change Request Replan` note near the start — the previous implementation is complete, but the user requested follow-up changes — with `### Completed Work` and `### Remaining Work` subsections. Do not rewrite the old plan wholesale or delete prior history. + - Update `discovery.md` (via `planner_discovery_submit` or edit): add a `## Post-Implementation Snapshot` summarizing what was implemented, current relevant files/branches, known gaps, and why another pass was requested, with `### Completed Work` and `### Remaining Work` subsections. + - Treat existing task artifacts as completed history. The follow-up planning pass may create new revision tasks for remaining work, but must not reopen completed task IDs. + - Return to `planning/read_context` in the same plan worktree and branch. +4. `prepare_output_branch` — internal `/planner-finish` phase: prepare the output branch in the original repository. +5. `merge_or_export_result` — internal `/planner-finish` phase: export the plan branch result. +6. `cleanup_worktree` — internal `/planner-finish` phase: remove the temporary worktree and safe-to-delete managed child branches. +7. `mark_done` — internal `/planner-finish` phase: clear active plan state and mark the result complete. +8. `cleanup_plan_files` — internal `/planner-finish` phase: remove completed plan artifacts only after `mark_done`. ## Acceptance Rules -- `/planner-finish` is an explicit user acceptance command. It may safely finalize directly after `present_result` or during `await_user_acceptance` when all runtime gates are clean. +- `/planner-finish` is an explicit user acceptance command. It may finalize directly after `present_result` or during `await_user_acceptance` when all runtime gates are clean. - No production edits are allowed in done. -- Change requests preserve the worktree and return to planning. -- Cleanup requires explicit acceptance. -- During normal work the protected plan branch is never deleted by managed child cleanup. -- After successful `/planner-finish`, the temporary plan branch is removed because its result is already exported. +- Change requests preserve the worktree and return to planning. Cleanup requires explicit acceptance. +- During normal work the protected plan branch is never deleted by managed child cleanup. After successful `/planner-finish`, the temporary plan branch is removed because its result is already exported. - The user keeps exactly one output branch in the original repository and decides whether to merge, rebase, or delete it. -- If the original Pi JSONL session exists, `/planner-finish` returns to it and removes the completed worktree chat. -- If the original Pi JSONL session is missing, `/planner-finish` warns the user, creates a replacement project-root session, and asks whether to remove the completed worktree chat. +- If the original Pi JSONL session exists, `/planner-finish` returns to it and removes the completed worktree chat. If it is missing, `/planner-finish` warns the user, creates a replacement project-root session, and asks whether to remove the completed worktree chat. - Raw git is forbidden. ## Evidence Discipline @@ -59,11 +49,11 @@ Treat done as a user-decision gate, not proof that the implementation is correct ## Change Request Reload -When returning to `planning/read_context`, reread full `plan.md`, `decisions.md`, user feedback, and `discovery.md`. Treat the previous implementation as current project context, not as a blank project. Preserve completed work, revise the plan only where the change request requires it, then continue toward execution. Existing completed task artifacts remain as audit history. Rebuild tasks only as needed for the requested change. Do not repeat tasks listed under `Completed Work`; create new revision task IDs only for work listed under `Remaining Work`. +When returning to `planning/read_context`, reread full `plan.md`, `decisions.md`, user feedback, and `discovery.md`. Treat the previous implementation as current project context, not a blank project. Preserve completed work, revise the plan only where the change request requires it, then continue toward execution. Existing completed task artifacts remain audit history. Do not repeat tasks under `Completed Work`; create new revision task IDs only for work under `Remaining Work`. ## Planner Skill Memory -`planner_skill_create` is allowed in `present_result` only as a final catch-up for a reusable lesson that was already verified during execution or finalize. Skills are loaded in future planner sessions through Pi `resources_discover`; they do not replace the current done-state acceptance flow. +`planner_skill_create` is allowed in `present_result` only as a final catch-up for a reusable lesson already verified during execution or finalize. Skills are loaded in future planner sessions via Pi `resources_discover`; they do not replace the done-state acceptance flow. ## auto-compact @@ -71,4 +61,4 @@ Call `planner_status` immediately. Reread `final_summary.md` and the exact persi ## If You Do Not Know What To Do Next -If you don't know what to do next, call `planner_status`. +The `planner_finish_step` result names your next step, its goal, and the worktree to work in — follow it. Call `planner_status` only when you need the full step rule or stage instruction, or when you are unsure. diff --git a/instructions/defaults/execution.md b/instructions/defaults/execution.md index 1d3c268..00d8e00 100644 --- a/instructions/defaults/execution.md +++ b/instructions/defaults/execution.md @@ -8,66 +8,37 @@ Execute exactly one active task at a time through tests-first development, imple - At `prepare_task`, call `planner_status`, reread the full `plan.md`, read answered `questions.md` and `decisions.md`, read the selected `task.md`, then inspect `discovery.md` and use focused project search only if needed. - If `task.md` lists a Local Contract Context, call `planner_contract_route/read` before source reads. AGENTS.md files are repository-owned routing memory; higher levels route, nearest levels explain. -- During one task, reread `task.md`, `tdd.md`, `refactor.md`, and focused source files only when the current action needs details that are not already recorded. +- During one task, reread `task.md`, `tdd.md`, `refactor.md`, and focused source files only when the current action needs details not already recorded. - After `compact_task`, do not carry live reasoning into the next task. Call `planner_status`, reread the full `plan.md`, inspect task status, then load the next `task.md`. - After recovery or auto-compact, call `planner_status` before any edit or check. ## Strict Task Lifecycle -1. `prepare_task` - - Select exactly one pending task. - - Create or switch its task branch with planner git wrappers. -2. `write_tdd_plan` - - Read task context and write `tdd.md`. - - Define test strategy, mocks, fixtures, checks, edge cases, and expected failing signal. -3. `write_tests` - - Write failing, mock, or contract tests before production implementation. - - Append changed test files and intent to `tdd.md`. - - If project files changed, commit through planner git before continuing. -4. `run_failing_tests` - - Run focused checks and prove that tests detect the missing behavior. - - Record exact command, cwd, and failing signal in `tdd.md`. -5. `implement_task` - - Implement only the behavior required by `task.md` and `tdd.md`. - - Run focused checks, update `tdd.md` with results, and commit through planner git if files changed. -6. `contract_check` - - For every directory where you edited or created files: if AGENTS.md exists there or above → default is `upsert_existing`; if none exists → default is `create_new`. `no_update` requires concrete evidence, not vague confidence. - - Call `planner_contract_check`, then `planner_contract_upsert` for every upsert or create decision. - - Add durable domain guidance: connections, call chains, blast radius, non-obvious invariants. Do not store task trivia. -7. `refactor_task` - - Challenge the implementation without changing behavior. - - Write `refactor.md` with a concrete KISS review, changes applied, and decisions to keep code unchanged. - - Commit through planner git if files changed. -8. `run_final_tests` - - Run final focused and integration checks from the planner worktree. - - Record final check results and scope review in `refactor.md`. +1. `prepare_task` — select exactly one pending task; create or switch its task branch with planner git wrappers. +2. `write_tdd_plan` — read task context and write `tdd.md` through `planner_tdd_submit` (`preImplementation` fields). Define test strategy, mocks, fixtures, checks, edge cases, and the expected failing signal. Built-in edit/write cannot modify `tdd.md`. +3. `write_tests` — write failing, mock, or contract tests before production implementation. Update `tdd.md` via `planner_tdd_submit`. If project files changed, commit through planner git before continuing. +4. `run_failing_tests` — run focused checks and prove tests detect the missing behavior. Record exact command, cwd, and failing signal via `planner_tdd_submit`. +5. `implement_task` — implement only the behavior required by `task.md` and `tdd.md`. Run focused checks, update `tdd.md` (`postImplementation` fields) via `planner_tdd_submit`, and commit through planner git if files changed. +6. `contract_check` — for every directory where you edited or created files: if AGENTS.md exists there or above → default `upsert_existing`; if none exists → default `create_new`. `no_update` requires concrete evidence, not vague confidence. Call `planner_contract_check`, then `planner_contract_upsert` for every upsert/create decision. Add durable domain guidance: connections, call chains, blast radius, non-obvious invariants. Do not store task trivia. +7. `refactor_task` — challenge the implementation without changing behavior. Write `refactor.md` via `planner_refactor_review` with a concrete KISS review and decisions. Commit through planner git if files changed. +8. `run_final_tests` — run final focused and integration checks from the planner worktree. Record final results and the merge scope audit in `tdd.md` via `planner_tdd_submit` (`mergeScopeAudit` fields). 9. `capture_skill` - - Review skills already loaded in this session (visible as resources). - - For each loaded skill relevant to this task: confirm it is still accurate. If outdated or wrong, call `planner_skill_update` with the corrected content. + - Review skills already loaded in this session (visible as resources). For each loaded skill relevant to this task, confirm it is still accurate; if outdated or wrong, call `planner_skill_update`. - If this task produced a reusable lesson not covered by any existing skill, call `planner_skill_create`. - - If you updated or created a skill whose body describes a runnable probe, run it now. After running, call `planner_git_inspect` to check worktree status. If dirty, call `planner_git_discard_changes` immediately — this is mandatory. - - If no skill action is taken, write an explicit "no skill" note in `decisions.md` with the reason. - - Skill capture is not optional. A decision must be recorded either way. -10. `merge_task_to_plan` - - Merge the task branch into the plan branch through the planner wrapper. - - Call `planner_status` after merge. -11. `compact_task` - - Compact the completed task result if enabled. -12. `select_next_task` - - Choose `execution/prepare_task` for the next task or `finalize/verify_plan_branch` when execution is complete. + - If you updated or created a skill whose body describes a runnable probe, run it now, then call `planner_git_inspect`. If dirty, call `planner_git_discard_changes` immediately — this is mandatory. + - If no skill action is taken, write an explicit "no skill" note in `decisions.md` with the reason. Skill capture is not optional; a decision must be recorded either way. +10. `merge_task_to_plan` — merge the task branch into the plan branch through the planner wrapper. +11. `compact_task` — compact the completed task result if enabled. +12. `select_next_task` — choose `execution/prepare_task` for the next task or `finalize/verify_plan_branch` when execution is complete. ## Atomic Unit Rules -- A commit alone does not finish an atomic unit. -- After every planner-controlled commit or merge, call `planner_status` and continue the persisted state-machine step. -- Dirty worktree is allowed while implementing a running step, but must be resolved before merge boundaries. -- Built-in project write/edit calls are enabled only during `write_tests`, `implement_task`, and `refactor_task`. During `contract_check`, AGENTS.md changes must go through `planner_contract_upsert`, not raw write/edit. +- A commit alone does not finish an atomic unit. After a planner-controlled commit or merge, continue the persisted state-machine step named in the `planner_finish_step` result. +- A dirty worktree is allowed while implementing a running step, but must be resolved before merge boundaries. +- Built-in project write/edit calls are enabled only during `write_tests`, `implement_task`, and `refactor_task`. During `contract_check`, AGENTS.md changes go through `planner_contract_upsert`, not raw write/edit. Planner-managed structured artifacts (`tdd.md`, `goal.md`, `questions.md`) are written only through their submit wrappers. - Never edit the original checkout while a planner worktree is active. Continue inside the worktree session reported by `planner_status`. -- Run every project command from the worktree path reported by `planner_status`. This includes focused tests, full tests, builds, type checks, linters, formatters, generators, package scripts, compilers, and project-specific verification commands, regardless of language or tooling. -- Before recording a successful check, confirm that its shell cwd was the planner worktree, not the original checkout. -- Use `planner_status` after every wrapper result. -- Raw git is forbidden. -- The model chooses task ids only. It never invents merge source or target branches. +- Run every project command from the worktree path reported by `planner_status` — focused tests, full tests, builds, type checks, linters, formatters, generators, package scripts, compilers, and project-specific verification — regardless of language or tooling. Before recording a successful check, confirm its shell cwd was the planner worktree, not the original checkout. +- Raw git is forbidden. The model chooses task ids only; it never invents merge source or target branches. ## Scope Rules @@ -83,7 +54,7 @@ Treat every execution step as reversible until artifacts, diff, and checks agree - Do not continue from memory after compact, recovery, or a failed wrapper; reload the exact state. - Do not claim a task is done until `tdd.md`, refactor review, final checks, and task acceptance criteria all agree. - If a tool call fails, classify the failure before retrying. Repeating a failed action without a new hypothesis is not progress. -- If the implementation starts drifting beyond task scope, stop and return to planning instead of broadening the task. +- If the implementation drifts beyond task scope, stop and return to planning instead of broadening the task. ## Doubt Checkpoint @@ -100,65 +71,32 @@ If doubt remains, run one focused probe or record the risk. Do not add broad tes ## Planner Skill Memory -Every task has a dedicated `capture_skill` step (step 9 in the lifecycle). At that step you must decide: create a new skill, update an existing one, or explicitly record in `decisions.md` why neither is needed. +Every task has a dedicated `capture_skill` step. There you must decide: create a new skill, update an existing one, or explicitly record in `decisions.md` why neither is needed. -Good candidates for a new skill: repeated failure patterns, non-obvious debug probes, state-machine mistakes, stale context issues, exact workflow rules that prevented a real bug. +Good candidates for a new skill: repeated failure patterns, non-obvious debug probes, state-machine mistakes, stale-context issues, exact workflow rules that prevented a real bug. Good candidates for an update: the skill was partially wrong or missing a case proven by this task; the trigger changed; the probe command has a newer form. -Good candidates for updating an existing skill: the skill was partially wrong or missing a case proven by this task; the trigger condition has changed; the probe command has a newer form. +A skill may also be created/updated earlier (at any execution step) once the lesson is verified; still confirm at `capture_skill` that the existing skill is accurate. Do not create skills for ordinary implementation notes, task summaries, broad advice, or unproven suspicions. Write the body in `metadata.skillLanguage`; the wrapper writes frontmatter and updates the skill index. -A skill may also be created or updated earlier (at any execution step) if the lesson is already verified and proven before `capture_skill`. In that case, still confirm at `capture_skill` that the existing skill is accurate. +## Fundamental Rule: Uncertainty -> Question -Do not create a skill for ordinary implementation notes, task summaries, broad advice, or unproven suspicions. The skill body should be written in `metadata.skillLanguage`; the wrapper writes frontmatter and updates the planner skill index. +If a task allows more than one interpretation of mechanism or integration approach, or you are uncertain about system boundaries, ask a question — do not guess or code from assumptions. -## Fundamental Rules +**Ask when:** it is unclear which mechanism the task uses; unclear which files to touch; unclear what is immutable; or there is risk of breaking the existing architecture. **Do not ask when:** the task is unambiguous, all boundaries are clear, and the mechanism is explicitly defined. -### Rule 4: Uncertainty -> Question +## Diagnostics -**Rule:** If a task allows more than one interpretation of mechanism, integration approach, or if you are uncertain about system boundaries, ask a question. Do not guess. Do not write code based on assumptions. - -**When to ask a question:** -- Unclear which mechanism the task uses. -- Unclear which files to touch and which not. -- Unclear what to consider immutable. -- Risk that the solution will break the existing architecture. - -**When not to ask a question:** -- The task is unambiguous. -- All boundaries are clear. -- The mechanism is explicitly defined. +- **Locate the error:** on a test failure, extract the exact file path and line; do not rely on summary output. Check exact arguments/outputs at the failing boundary. Confirm commands ran from the planner worktree. +- **Stuck-loop detection:** if 3 attempts to fix a bug hit the same failure, stop. Reread `tdd.md` boundary conditions and verify mocks are not hiding the real bug or returning stale data. +- **Minimal footprint:** modify only files the active task requires; do not polish adjacent code or add speculative helpers/abstractions. Implement only behavior covered by a TDD test. ## manual-compact -Preserve the plan id, active task id, exact branch, current step, task artifact paths, TDD evidence, refactor findings, final checks, open risks, and exact next action. After compaction, call `planner_status`. - -For `compact_task`, reload full `plan.md` before choosing the next task. +Preserve the plan id, active task id, exact branch, current step, task artifact paths, TDD evidence, refactor findings, final checks, open risks, and exact next action. After compaction, call `planner_status`. For `compact_task`, reload full `plan.md` before choosing the next task. ## auto-compact Call `planner_status` immediately. Do not continue editing from chat memory. Restore the exact task from persisted state, inspect the git gate, then reread the artifacts required by the current step. Read source files only when the exact action needs details not present in the artifacts. If scope may have changed, reread full `plan.md`. -## KISS Execution & Footprint Discipline - -### 1. The Principle of Minimal Footprint -- **One File, One Goal**: Modify only files required by the active task. Do not polish adjacent code or run arbitrary refactorings during implementation. -- **Limit Function Proliferation**: Avoid new files or helpers unless the existing codebase layout strictly demands it. -- **KISS Code**: Simple, readable, direct code beats speculative abstraction. - -### 2. Implementation Boundaries -- **No Speculative Implementations**: Do not implement handling logic for future tasks or edge cases that have not been requested by the active task or verified by a TDD test. - -## Execution & Runtime Diagnostics - -### 1. Test Failures & Stack Traces -- **Locate Error Source**: When a test fails, extract the exact file path and line number. Do not rely on summary output only. -- **Isolate Module Interfaces**: Check exact arguments and outputs at the failing boundary. -- **Verify Execution Cwd**: Ensure all commands run from the specific worktree directory reported by `planner_status`. - -### 2. Algorithmic Pivot Protocol -- **Stuck Loop detection**: If you make 3 attempts to fix a bug and the same test failure persists, stop. -- **Trace Backwards**: Open `tdd.md` and re-read boundary conditions. -- **Verify Mock Integrity**: Ensure mocks are not hiding the real bug or returning stale data. - ## If You Do Not Know What To Do Next -If you don't know what to do next, call `planner_status`. +The `planner_finish_step` result names your next step, its goal, and the worktree to work in — follow it. Call `planner_status` only when you need the full step rule or stage instruction, or when you are unsure. diff --git a/instructions/defaults/finalize.md b/instructions/defaults/finalize.md index 08a842b..b7265ab 100644 --- a/instructions/defaults/finalize.md +++ b/instructions/defaults/finalize.md @@ -7,57 +7,44 @@ Verify the complete plan branch as one integrated result, write a durable user-f ## Strict Step Order 1. `verify_plan_branch` - - Inspect planner git state and confirm that all required tasks were merged. - - Run project-level checks defined by project instructions and task evidence from the worktree path reported by `planner_status`. - - Record failures, residual risks, and any checks that cannot run locally. + - Inspect planner git state and confirm all required tasks were merged. + - Run project-level checks (from project instructions and task evidence) from the worktree path reported by `planner_status`. + - Record failures, residual risks, and any checks that cannot run locally. 2. `compact_before_doubt` - - Request planner-controlled compact after integrated checks and before doubt review. - - This deliberately clears live confidence from the previous implementation loop. - - After compaction, call `planner_complete_compact`, then `planner_status`, then continue from persisted artifacts only. + - Request planner-controlled compact after integrated checks and before doubt review. This deliberately clears live confidence from the previous loop. + - After compaction, call `planner_complete_compact`, then `planner_status`, then continue from persisted artifacts only. 3. `doubt_review` - - Before asking for user acceptance, deliberately doubt the completed result. - - Reread `goal.md`, `plan.md`, task artifacts, `verify.md`, and the final worktree diff. - - Reread `discovery.md` `## Verification Protocol`. Every listed command/check is mandatory evidence for `planner_doubt_review`. - - Treat chat memory from before `compact_before_doubt` as untrusted. Reconstruct the result from artifacts, git state, and focused file reads. - - Start with a `Possible Errors` list written in `metadata.doubtReviewLanguage`. These are suspicions, not bugs yet. - - Assign every possible error to one risk category: `requirement_mismatch`, `missing_test`, `boundary_case`, `integration_break`, `state_machine_error`, `persistence_error`, `recovery_error`, `wrong_file_scope`, `user_flow_regression`, or `cleanup_or_debug_leftover`. - - Fill `verificationEvidence` with every command/check from `discovery.md` `## Verification Protocol`. Missing evidence means the result is not verified. - - If any required command/check failed, was not run, or is unknown, create a `proven_bug` or `needs_probe` finding that names that command. Do not continue to `write_final_summary`. - - For each possible error, prove it, disprove it, or mark it `needs_probe`. Use `planner_doubt_review`; do not hand-write `verify.md`. - - A suspected issue may be called `proven_bug` only after a failing test/command, exact code-path proof, or exact spec contradiction. - - `needs_probe` findings cannot finish the step. Run the probe or downgrade with proof. - - If `proven_bug` findings exist, write them to `decisions.md`, then return to `planning/read_context` for revision tasks. Do not patch ad hoc in finalize. - - After returning to `planning/read_context`, continue through `draft_plan`, `split_tasks`, `write_task_files`, execution, and verification again. Do not skip planning because the fix looks small. - - If a finding mentions placeholder, stub, TODO-only, hardcoded behavior, superficial implementation, missing tests, or unresolved work, it cannot be closed as `not_a_bug` or `disproven`. It must be `proven_bug` or `needs_probe`. - - If a proven, disproven, or probed finding teaches a reusable workflow lesson, call `planner_skill_create` with `sourceKind=doubt_review` before leaving this step. - - If no proven bug or probe remains, continue to `write_final_summary`. + - Before asking for acceptance, deliberately doubt the completed result. Reread `goal.md`, `plan.md`, task artifacts, `verify.md`, and the final worktree diff. + - Reread `discovery.md` `## Verification Protocol`. Every listed command/check is mandatory evidence for `planner_doubt_review`. + - Treat chat memory from before `compact_before_doubt` as untrusted. Reconstruct the result from artifacts, git state, and focused file reads. + - Start with a `Possible Errors` list in `metadata.doubtReviewLanguage`. These are suspicions, not bugs yet. Assign every possible error to one risk category: `requirement_mismatch`, `missing_test`, `boundary_case`, `integration_break`, `state_machine_error`, `persistence_error`, `recovery_error`, `wrong_file_scope`, `user_flow_regression`, or `cleanup_or_debug_leftover`. + - Fill `verificationEvidence` with every command/check from `## Verification Protocol`. Missing evidence means the result is not verified. If any required command failed, was not run, or is unknown, create a `proven_bug` or `needs_probe` finding naming that command, and do not continue to `write_final_summary`. + - For each possible error, prove it, disprove it, or mark it `needs_probe`. Use `planner_doubt_review`; do not hand-write `verify.md`. A suspected issue is `proven_bug` only after a failing test/command, exact code-path proof, or exact spec contradiction. `needs_probe` findings cannot finish the step — run the probe or downgrade with proof. + - If `proven_bug` findings exist, write them to `decisions.md`, then return to `planning/read_context` for revision tasks. Do not patch ad hoc in finalize. After returning, continue through `draft_plan`, `split_tasks`, `write_task_files`, execution, and verification again. Do not skip planning because the fix looks small. + - A finding mentioning placeholder, stub, TODO-only, hardcoded behavior, superficial implementation, missing tests, or unresolved work cannot be closed as `not_a_bug`/`disproven`; it must be `proven_bug` or `needs_probe`. + - If a proven, disproven, or probed finding teaches a reusable workflow lesson, call `planner_skill_create` with `sourceKind=doubt_review` before leaving this step. + - If no proven bug or probe remains, continue to `write_final_summary`. 4. `write_final_summary` - - Write `final_summary.md`. - - Use `metadata.humanLanguage` from `planner_status` unless the user explicitly requested another language. - - Include completed scope, changed files, checks, risks, output branch expectations, and unresolved limitations. - - If the whole plan produced a reusable verified lesson not already captured, call `planner_skill_create` with `sourceKind=final_summary`. -5. `compact_finalize` - - Request planner-controlled compact preserving summary, verification, branch state, and risks. -6. `enter_done` - - Advance to `done/present_result`. + - Write `final_summary.md` through `planner_summary_submit`. Use `metadata.humanLanguage` unless the user explicitly requested another language. + - Include completed scope, changed files, checks, risks, output-branch expectations, and unresolved limitations. + - If the whole plan produced a reusable verified lesson not already captured, call `planner_skill_create` with `sourceKind=final_summary`. +5. `compact_finalize` — request planner-controlled compact preserving summary, verification, branch state, and risks. +6. `enter_done` — advance to `done/present_result`. ## Restrictions - Do not introduce new production behavior during finalize. -- Do not run tests, builds, linters, formatters, or project-specific checks from the original checkout. Use the planner worktree as shell cwd. -- Do not cleanup the worktree or plan files. -- Do not export the plan result before explicit user acceptance. +- Do not run tests, builds, linters, formatters, or project checks from the original checkout. Use the planner worktree as shell cwd. +- Do not cleanup the worktree or plan files, and do not export the plan result before explicit user acceptance. - Do not use raw git. - If checks reveal missing implementation, record the issue and return through the controlled planning flow instead of patching ad hoc. -- During `doubt_review`, assume there may still be bugs even if tests pass. Passing checks are evidence, not acceptance. -- Do not call a finding a bug from suspicion alone. Suspicions without proof are `needs_probe`, not revision tasks. -- Do not normalize away placeholders or shallow implementations. If a placeholder/surface-level implementation remains, return to planning with a proven finding or run a probe. +- During `doubt_review`, assume bugs may remain even if tests pass. Passing checks are evidence, not acceptance. Do not call a finding a bug from suspicion alone (that is `needs_probe`), and do not normalize away placeholders or shallow implementations. ## Evidence Discipline Treat finalize as an adversarial audit of the whole result. -- Do not trust task-level green checks until the integrated branch is checked against `discovery.md` `## Verification Protocol`. +- Do not trust task-level green checks until the integrated branch is checked against `## Verification Protocol`. - Do not summarize "all tests passed" unless each required command/check is listed in `verificationEvidence`. - If the audit finds missing behavior, placeholder logic, stale contracts, or failed/unknown checks, return to planning through the state machine. - Do not patch from finalize because the fix looks small. Controlled revision work must get plan/tasks/TDD again. @@ -75,12 +62,12 @@ Tests are preferred for runtime behavior. Code-path proof is allowed only when t ## Verification Evidence Rules -`planner_doubt_review` must include `verificationEvidence` for every command/check in `discovery.md` `## Verification Protocol`. +`planner_doubt_review` must include `verificationEvidence` for every command/check in `## Verification Protocol`: -- `passed`: allowed only when the command/check was actually run or the exact non-shell check was completed with concrete evidence. -- `failed`: must be paired with a `proven_bug` or `needs_probe` finding naming that command/check. +- `passed`: only when the command was actually run, or the exact non-shell check was completed with concrete evidence. +- `failed`: must be paired with a `proven_bug` or `needs_probe` finding naming that command. - `not_run`: must be paired with a `needs_probe` finding unless the user explicitly accepts the missing check later. -- `unknown`: must be paired with a `needs_probe` finding and usually means discovery did not capture enough verification detail. +- `unknown`: must be paired with a `needs_probe` finding; usually means discovery did not capture enough verification detail. Do not summarize checks as "all tests passed" unless the protocol commands are listed individually with evidence. @@ -88,79 +75,47 @@ Do not summarize checks as "all tests passed" unless the protocol commands are l This step is a verification stage, not a writing exercise. Treat it like TDD for suspected problems: -1. Reconstruct the promise. - - Read the approved goal, the current plan, completed task files, final summary if present, and the final diff. - - Write down what the result must do, what it explicitly must not do, and which project checks already passed. - - Do not trust memory from earlier chat turns. Durable artifacts and the current worktree are the source of truth. -2. Generate possible errors before deciding. - - List concrete possible errors in `metadata.doubtReviewLanguage` under `Possible Errors`. - - A possible error must point to a requirement, a code path, a changed file, a missing test, a migration risk, or an integration boundary. - - Choose the narrowest risk category before deciding whether the issue is real. - - Do not include vague anxiety such as "maybe something is wrong" or style preferences without product impact. -3. Prove or disprove each possible error. - - For behavior, prefer a focused failing test or a focused command that reproduces the issue. - - For static correctness, trace the exact code path and name the files/symbols that force the conclusion. - - For spec mismatch, quote the exact approved requirement and the exact implemented behavior that contradicts it. - - If the evidence is not enough, mark `needs_probe` and run one targeted probe before finishing. Do not convert uncertainty into a bug. -4. Decide the route. - - If any finding is `proven_bug`, record it in `decisions.md` and finish this step with target `planning/read_context`. - - Planning must then create revision tasks. Do not patch production files inside finalize. - - If all findings are `disproven` or `not_a_bug`, finish this step with target `finalize/write_final_summary`. -5. Keep the artifact strict. - - Use only `planner_doubt_review` to write the final doubt artifact. - - Every finding must include `claim`, `specReference`, `codePath`, `verification`, evidence, and `nextAction`. - - `needs_probe` is not a terminal state. The runtime will block leaving this step until every probe is resolved. - -Do not reward yourself for finding many bugs. Reward exactness. False positives waste revision cycles; false negatives ship broken work. The correct outcome may be "no proven bugs remain" if every suspicion was checked and dismissed with evidence. +1. Reconstruct the promise — read the approved goal, current plan, completed task files, final summary if present, and the final diff. Write what the result must do, must not do, and which checks already passed. Trust artifacts and the current worktree, not chat memory. +2. Generate possible errors before deciding — list concrete possible errors under `Possible Errors`, each pointing to a requirement, code path, changed file, missing test, migration risk, or integration boundary. Choose the narrowest category. Exclude vague anxiety and style preferences without product impact. +3. Prove or disprove each — prefer a focused failing test/command for behavior; trace the exact code path for static correctness; quote the exact requirement and contradicting behavior for spec mismatch. If evidence is insufficient, mark `needs_probe` and run one targeted probe before finishing. +4. Decide the route — any `proven_bug` → record in `decisions.md` and finish with target `planning/read_context` (planning then creates revision tasks; do not patch in finalize). If all findings are `disproven`/`not_a_bug`, finish with target `finalize/write_final_summary`. +5. Keep the artifact strict — use only `planner_doubt_review`. Every finding includes `claim`, `specReference`, `codePath`, `verification`, evidence, and `nextAction`. The runtime blocks leaving this step until every `needs_probe` is resolved. -## Planner Skill Memory - -Use `planner_skill_create` only for verified lessons that should improve future planner sessions. A skill is not a final summary. It must describe a reusable trigger and workflow, for example a Pi extension stale-ctx pattern, a recovery proof method, or a specific class of state-machine mistake. - -In doubt review, skill creation is expected when the audit exposed a reusable bug-finding method, a repeated false assumption, a missing-test pattern, a stale-context/compact hazard, or a planner harness mistake. Do not create skills for ordinary project facts or unverified suspicions. - -The skill body should be written in `metadata.skillLanguage`. The wrapper writes YAML frontmatter and stores the skill under the planner extension library. The new skill is loaded through Pi `resources_discover` on future planner session start, resume, or reload. +Reward exactness, not bug count. False positives waste revision cycles; false negatives ship broken work. "No proven bugs remain" is a correct outcome when every suspicion was checked and dismissed with evidence. ## Overnight Loop — Autonomous Bug Return -If `doubt_review` produces any `proven_bug` finding, do **not** wait for the user. +If `doubt_review` produces any `proven_bug` finding, do **not** wait for the user: 1. Write the bugs to `decisions.md`. -2. Call `planner_doubt_review` to close the step with `nextStep: planning/read_context`. +2. Call `planner_doubt_review` to close the step with target `planning/read_context`. 3. Immediately advance: planning → execution → finalize again. 4. Notify the user only after the next full finalize cycle completes or when you hit a blocker that requires their input. -This loop is your responsibility. An overnight run that pauses to ask "should I go back to planning?" has failed the contract. The state machine already permits the transition; your job is to take it without hesitation. +The only exceptions are a finding that requires user clarification on genuinely ambiguous requirements, or a `needs_probe` that cannot run locally (e.g., needs production credentials). In those cases, surface the blocker clearly and wait. For everything else: loop. An overnight run that pauses to ask "should I go back to planning?" has failed the contract — the state machine already permits the transition. -The only exceptions are: -- A finding that requires user clarification on requirements (e.g., spec is genuinely ambiguous — not just code uncertainty). -- A `needs_probe` finding that cannot be run locally (e.g., needs production credentials the plan does not have). +## Planner Skill Memory -In those cases, surface the blocker clearly and wait. For everything else: loop. +Use `planner_skill_create` only for verified lessons that should improve future planner sessions — a reusable trigger and workflow such as a stale-context pattern, a recovery proof method, or a class of state-machine mistake. In doubt review, skill creation is expected when the audit exposed a reusable bug-finding method, a repeated false assumption, a missing-test pattern, or a compact hazard. Do not create skills for ordinary project facts or unverified suspicions. Write the body in `metadata.skillLanguage`; the wrapper writes frontmatter and stores the skill, loaded on future planner sessions via Pi `resources_discover`. ## Exit Condition Finalize is complete only when the integrated plan branch is checked, `final_summary.md` exists, final compact finishes, and state enters `done/present_result`. +## Diagnostics + +- **Integration regressions:** if final tests fail on the plan branch, check for a merge-conflict regression; rollback the merge, fix in the task branch, and re-merge. +- **Clean diff:** inspect for temporary debug lines, print statements, or scratch files before finalizing. Run lint/format first. +- **Branch sync:** verify the plan branch is up to date with the base branch. + ## manual-compact Preserve `final_summary.md`, project-level verification results, changed-file summary, branch state, known risks, and unresolved limitations. After compaction, call `planner_status`, read the final summary and verify artifacts, then enter done flow. ## auto-compact -Call `planner_status` immediately. Restore the exact finalize step. If the step is `compact_before_doubt`, complete the compact before audit work. If the step is `doubt_review`, reread `goal.md`, `plan.md`, `discovery.md`, task artifacts, `verify.md`, AGENTS.md contracts, and focused source files before deciding. Do not export or cleanup until explicit user acceptance is recorded. - -## Finalization & Verification Diagnostics - -### 1. Pre-Merge Verification Failures -- **Integration Test Regressions**: If final tests fail on the plan branch, identify if the issue is a merge conflict regression. -- **Clean Diff Verification**: Run code inspection to ensure no temporary debug lines, print statements, or scratch files are committed. -- **Branch Synchronization**: Verify that the plan branch is fully up-to-date with the main base branch. - -### 2. Resolution Flow -1. Run lint and format checks before finalizing. -2. If integration tests fail, rollback the merge, fix the bug in the task branch, and try merging again. +Call `planner_status` immediately. Restore the exact finalize step. If it is `compact_before_doubt`, complete the compact before audit work. If it is `doubt_review`, reread `goal.md`, `plan.md`, `discovery.md`, task artifacts, `verify.md`, AGENTS.md contracts, and focused source files before deciding. Do not export or cleanup until explicit user acceptance is recorded. ## If You Do Not Know What To Do Next -If you don't know what to do next, call `planner_status`. +The `planner_finish_step` result names your next step, its goal, and the worktree to work in — follow it. Call `planner_status` only when you need the full step rule or stage instruction, or when you are unsure. diff --git a/instructions/defaults/git-commit.md b/instructions/defaults/git-commit.md index 29800e4..d0bec13 100644 --- a/instructions/defaults/git-commit.md +++ b/instructions/defaults/git-commit.md @@ -6,14 +6,13 @@ Generate concise planner-controlled commit and merge messages. This file control ## Commit Message Rules -- Use imperative mood. -- Keep the subject concise and specific. +- Use imperative mood and keep the subject concise and specific. - Describe the completed atomic checkpoint, not the implementation process. -- Prefer one clear subject line. -- Mention the task or behavior when it improves clarity. +- Prefer one clear subject line; mention the task or behavior when it improves clarity. - Do not include raw model reasoning, temporary uncertainty, or verbose test logs. - Do not use vague subjects such as `update files`, `fix stuff`, `changes`, or `wip`. - Do not claim tests passed unless checks were actually run. +- Use `metadata.commitLanguage` from `planner_status` for human-readable text unless repository conventions or explicit user instructions override it. Conventional type prefixes (`feat:`, `fix:`, `test:`) are technical tokens, not translatable text. ## Suggested Subjects @@ -25,14 +24,13 @@ refactor: simplify docs: record ``` -Use the repository's existing convention when it is discoverable. Project append instructions may override language, prefix style, scope style, merge subject style, and team conventions. -Use `metadata.commitLanguage` from `planner_status` for human-readable commit text unless repository conventions or explicit user instructions override it. Conventional commit type prefixes such as `feat:`, `fix:`, and `test:` remain technical tokens. +Use the repository's existing convention when discoverable. Project append instructions may override language, prefix style, scope style, merge subject style, and team conventions. ## Merge Messages -- Refactor -> task: identify behavior-preserving cleanup. -- Task -> plan: identify completed atomic task. -- Plan -> output: use a conventional commit subject and a short body that explains the accepted result in `metadata.commitLanguage`. Include the plan title/id, output branch, and a concise summary of completed behavior. Do not use a vague one-line "merge result" message. +- Refactor -> task: identify the behavior-preserving cleanup. +- Task -> plan: identify the completed atomic task. +- Plan -> output: use a conventional commit subject and a short body in `metadata.commitLanguage`. Include the plan title/id, output branch, and a concise summary of completed behavior. Do not use a vague one-line "merge result" message. ## Restrictions @@ -49,20 +47,17 @@ Treat commit text as an audit artifact. - Merge/export commits must summarize completed behavior from `plan.md`, task artifacts, and final verification, not a generic acceptance phrase. - If the diff and task record disagree, resolve that mismatch before writing the commit. -## auto-compact - -After auto-compact, call `planner_status`. Use this style only when the current stage explicitly allows a planner commit or merge wrapper. +## Diagnostics -## Version Control & Workspace Diagnostics +- **Empty commits:** if a commit fails, confirm tracked files actually changed. +- **Locked index:** if the git index is locked, identify the holding process and advise the user or wait. +- **Wrong branch:** if you committed to the wrong branch, use planner git wrappers to inspect HEAD; do not attempt force pushes. +- Use the `planner_git_inspect` wrapper to confirm staged/modified/untracked state. -### 1. Commit Failures -- **Empty Commits**: If a commit fails, verify if you actually modified any tracked files. -- **Lock Files**: If git index is locked, identify the process holding the lock and advise the user, or wait. -- **Wrong Branch Commits**: If you committed to the wrong branch, use planner git commands to identify the head commit and do not attempt complex force pushes. +## auto-compact -### 2. Workspace Diagnostics -- Always run `git status` via the wrapper to confirm which files are staged, modified, or untracked. +After auto-compact, call `planner_status`. Use this style only when the current stage explicitly allows a planner commit or merge wrapper. ## If You Do Not Know What To Do Next -If you don't know what to do next, call `planner_status`. +The `planner_finish_step` result names your next step, its goal, and the worktree to work in — follow it. Call `planner_status` only when you need the full step rule or stage instruction, or when you are unsure. diff --git a/instructions/defaults/git.md b/instructions/defaults/git.md index 9b4847e..a861aa2 100644 --- a/instructions/defaults/git.md +++ b/instructions/defaults/git.md @@ -2,7 +2,7 @@ ## Purpose -Use git as the planner consistency layer without allowing the model to manipulate branches or commits directly. While a planner plan is active, raw git through shell is forbidden, including read-only raw git commands. +Use git as the planner consistency layer without letting the model manipulate branches or commits directly. While a planner plan is active, raw git through shell is forbidden, including read-only raw git commands. ## Public Planner Git Wrappers @@ -11,8 +11,8 @@ Use git as the planner consistency layer without allowing the model to manipulat - `planner_git_commit` stages and commits the current atomic checkpoint. - `planner_git_create_task_branch` creates or switches the active task branch. - `planner_git_create_refactor_branch` creates the refactor branch when required. -- `planner_git_merge_refactor_to_task` merges refactor result into current task. -- `planner_git_merge_task_to_plan` merges current task into protected plan branch. +- `planner_git_merge_refactor_to_task` merges refactor result into the current task. +- `planner_git_merge_task_to_plan` merges the current task into the protected plan branch. Final export, worktree removal, temporary branch cleanup, planner artifact removal, and Pi JSONL session handoff are intentionally not model tools. After explicit user acceptance, ask the user to run `/planner-finish`. @@ -26,23 +26,20 @@ base branch -> output/ ``` -The extension stores branch registry and merge targets in `state.json`. The model chooses ids only. It never chooses merge source or target branch names manually. +The extension stores the branch registry and merge targets in `state.json`. The model chooses ids only; it never chooses merge source or target branch names manually. ## Worktree Command Invariant -While a planner plan is active, the persisted worktree path reported by `planner_status` is the only project working directory. +While a planner plan is active, the persisted worktree path reported by `planner_status` (and echoed in each `planner_finish_step` result) is the only project working directory. -- Run every project-scoped shell command from the reported worktree path, regardless of language, package manager, build system, or script runner. -- This includes tests, builds, type checks, linters, formatters, code generators, dependency inspection, package scripts, compiler commands, and project-specific verification commands. -- Examples such as `npm test`, `cargo test`, `go test ./...`, `mvn test`, `make`, or custom scripts are only examples. The invariant applies to every project command. +- Run every project-scoped shell command from that worktree path, regardless of language, package manager, build system, or script runner. This includes tests, builds, type checks, linters, formatters, code generators, dependency inspection, package scripts, compiler commands, and project-specific verification commands. - Never run project checks from the original checkout while a planner plan is active. -- If the current shell cwd is unclear, call `planner_status`, read the exact worktree path, and execute the command with that path as cwd. +- If the current shell cwd is unclear, read the exact worktree path from the latest planner result or `planner_status`, and run with that path as cwd. - Planner artifact reads and writes still use the artifact paths reported by `planner_status`. ## Checkpoint Rules - Commit only through `planner_git_commit`. -- After commit or merge, call `planner_status` and continue the persisted state-machine step. - A dirty worktree is allowed during implementation but must be resolved before merge boundaries. - Conflicts, unexpected branch changes, missing worktrees, and inconsistent history require recovery inspection. - External commits trigger recovery inspection, not automatic reset. @@ -56,11 +53,11 @@ While a planner plan is active, the persisted worktree path reported by `planner ## Restrictions -- Do not run `git` through shell. -- Do not use shell aliases, scripts, or indirect commands to bypass planner git wrappers. -- Built-in project write/edit calls are enabled only for the exact execution steps reported by `planner_status`: `write_tests`, `implement_task`, and `refactor_task`. The planner does not infer file roles from names. +- Do not run `git` through shell, and do not use aliases, scripts, or indirect commands to bypass the wrappers. +- Built-in project write/edit calls are enabled only for the exact execution steps reported by `planner_status` (`write_tests`, `implement_task`, `refactor_task`). The planner does not infer file roles from names. +- Built-in write/edit cannot modify planner-managed structured artifacts (`goal.md`, `questions.md`, the active task's `tdd.md`); use their submit wrappers instead. - Never edit the original checkout while a planner worktree is active. All project changes belong in the persisted worktree path. -- Do not reset, force checkout, abort, delete, or discard changes without explicit user approval through recovery flow. +- Do not reset, force checkout, abort, delete, or discard changes without explicit user approval through the recovery flow. ## Evidence Discipline @@ -69,23 +66,21 @@ Treat git state as observed data, not memory. - Do not assume the current branch, worktree, dirty state, or merge target. Read it from planner wrappers/status. - If branch state differs from the expected planner state, stop and enter recovery instead of fixing with raw git. - Do not claim a commit or merge contains specific work unless the diff/task artifacts support it. -- After every git wrapper, call `planner_status` before deciding the next action. + +## Diagnostics + +- **Never use raw git:** raw git via bash bypasses the state machine and corrupts planner state. Use only the wrappers above. +- **Branch name conflicts:** branch names follow the lifecycle structure; the model supplies ids only. +- **Worktree incoherence:** if git worktree state disagrees with the planner database, run recovery tools immediately. ## manual-compact -Preserve current branch, HEAD, dirty/conflict status, managed branch registry, merge targets, cleanup obligations, and exact next wrapper. After compaction, call `planner_status`. +Preserve current branch, HEAD, dirty/conflict status, managed branch registry, merge targets, cleanup obligations, and the exact next wrapper. After compaction, call `planner_status`. ## auto-compact Call `planner_status` immediately. Inspect git through planner wrappers before resuming. Do not infer current branch or commit from compacted chat history. -## Git Integration Diagnostics - -### 1. Git Wrappers vs. Raw Git -- **Never use Raw Git**: Raw git commands run via bash bypass the state machine and corrupt the planner state. Only use planner git wrapper tools. -- **Branch Name Conflicts**: Ensure branch names follow the expected project structure. -- **Worktree State Incoherence**: If the git worktree state disagrees with the planner database, run recovery tools immediately. - ## If You Do Not Know What To Do Next -If you don't know what to do next, call `planner_status`. +The `planner_finish_step` result names your next step, its goal, and the worktree to work in — follow it. Call `planner_status` only when you need the full step rule or stage instruction, or when you are unsure. diff --git a/instructions/defaults/init.md b/instructions/defaults/init.md index e3fc926..7e24c6e 100644 --- a/instructions/defaults/init.md +++ b/instructions/defaults/init.md @@ -2,9 +2,9 @@ ## Purpose -Initialize planner control before any project discovery or implementation work begins. +Initialize planner control before any project discovery or implementation begins. -The normal entry point is `planner_create_plan` or `/planner-create`. The extension performs init as an internal atomic bootstrap and should leave the persisted position at `intake/draft_goal`. If `planner_status` exposes an init step, follow the exact step rule and do not skip ahead. +The normal entry point is `planner_create_plan` or `/planner-create`. The extension runs init as an internal atomic bootstrap and should leave the persisted position at `intake/draft_goal`. If `planner_status` exposes an init step, follow the exact step rule and do not skip ahead. ## Required Discipline @@ -14,8 +14,7 @@ The normal entry point is `planner_create_plan` or `/planner-create`. The extens 4. Prepare project storage, settings, instruction files, and plan artifacts. 5. Resolve the worktree location from effective settings. Do not invent a path. 6. Create exactly one dedicated worktree for the whole plan. - - For a project-local worktree, the extension writes a repository-local exclude rule for the original checkout. - - If the plan branch `.gitignore` rule is created or appended, the extension commits it immediately on the plan branch before normal planner work begins. + - For a project-local worktree, the extension writes a repository-local exclude rule for the original checkout and commits it on the plan branch before normal work begins. 7. Enter `intake/draft_goal`. ## Restrictions @@ -40,22 +39,16 @@ Treat init as untrusted bootstrap until persisted state proves otherwise. - If bootstrap facts conflict, stop and enter recovery instead of guessing the next stage. - Do not continue from optimistic assumptions after auto-compact. -## auto-compact - -An auto-compact during init does not authorize progress. Call `planner_status`, reload the exact persisted init step, and continue only with the wrapper reported by status. Do not inspect source until intake is approved and state explicitly says `discovery/scan_project_structure`. +## Diagnostics -## Initialization & Bootstrapping Diagnostics +- **Worktree conflicts:** if worktree creation fails, check for an existing directory of the same name, a dirty repository, or a locked index (`.git/index.lock`). +- **Workspace resolution:** the cwd must be inside the workspace root. Never initialize a plan in `/tmp` or a system directory. +- **State inconsistency:** if plan files exist but no active plan is detected, use recovery tools — do not manually recreate files. Consider `planner_git_init` only when git is uninitialized in the project root. -### 1. Environment & Setup Failures -- **Worktree Conflicts**: If the worktree creation fails, check if a directory with the same name already exists. Ensure that your current git repository is clean and has no locked index files (`.git/index.lock`). -- **Workspace Resolution**: Verify that the Cwd is within the workspace root. Never initialize a plan in `/tmp` or system directories. -- **State Inconsistency**: If the plan files exist but the active plan status is not detected, use recovery tools immediately instead of manually recreating files. +## auto-compact -### 2. Troubleshooting Steps -1. Call `planner_status` to see if the project storage paths are correctly resolved. -2. Check for locked files or missing file permissions. -3. If git is uninitialized in the project root, verify if you should call `planner_git_init`. +An auto-compact during init does not authorize progress. Call `planner_status`, reload the exact persisted init step, and continue only with the wrapper reported by status. Do not inspect source until intake is approved and state explicitly says `discovery/scan_project_structure`. ## If You Do Not Know What To Do Next -If you don't know what to do next, call `planner_status`. +The `planner_finish_step` result names your next step, its goal, and the worktree to work in — follow it. Call `planner_status` only when you need the full step rule or stage instruction, or when you are unsure. diff --git a/instructions/defaults/intake.md b/instructions/defaults/intake.md index 707d9a0..a4ef49d 100644 --- a/instructions/defaults/intake.md +++ b/instructions/defaults/intake.md @@ -7,20 +7,16 @@ Turn the user's raw request into an explicit approved goal before reading projec ## Strict Step Order 1. `draft_goal` - - Read `request.md`. - - Draft the `goal.md` content in your own words. - - Include the requested outcome, current assumptions, non-goals, and constraints. + - Read `request.md` and draft the `goal.md` content in your own words: requested outcome, current assumptions, non-goals, and constraints. - Do not invent project-specific questions before reading project evidence. - - Call `planner_goal_submit` with the full goal markdown, proposed title, and short planner-list description. The wrapper writes `goal.md`. - - Do not use built-in write/edit tools for `goal.md`. - - Use `metadata.titleLanguage` from `planner_status` for the title unless the user explicitly requests another language. - - The title is user-facing and may contain Unicode. It is not the stable branch-safe `planId`. + - Call `planner_goal_submit` with the full goal markdown, proposed title, and short planner-list description. The wrapper writes `goal.md`; built-in write/edit cannot. + - Use `metadata.titleLanguage` from `planner_status` for the title unless the user explicitly requests another language. The title is user-facing and may contain Unicode; it is not the stable branch-safe `planId`. - Do not inspect project source. 2. `await_goal_approval` - - Show the user the full generated `goal.md` content. The `planner_goal_submit` result includes it for review. - - Explain that `plan.md` is intentionally written later, during `planning/draft_plan`, after discovery and evidence-based questions are complete. + - Show the user the full generated `goal.md` content (the `planner_goal_submit` result includes it for review). + - Explain that `plan.md` is written later, during `planning/draft_plan`, after discovery and evidence-based questions. - Ask whether the goal and proposed title are approved or need revision. - - If revision is requested, call `planner_goal_submit` with the revised goal markdown and title, then ask again. + - If revision is requested, call `planner_goal_submit` again with the revised goal markdown and title, then ask again. - Enter discovery only after explicit approval. ## Restrictions @@ -56,21 +52,17 @@ Before finishing an intake step, doubt the normalized goal: If doubt remains, revise `goal.md` or ask one concrete intake question. Do not enter discovery on an inferred goal. -## auto-compact - -Call `planner_status` immediately. Read `request.md` and `goal.md`, then resume the exact intake step. Do not begin discovery without explicit approval. +## Diagnostics -## Intake & Goal Diagnostics +- **Underspecified outcomes:** if the request lacks concrete metrics ("make it faster", "fix bugs"), do not guess. Draft `goal.md` with explicit, testable criteria and confirm with the user. +- **Assumptions vs facts:** list technical assumptions under a dedicated header in `goal.md`; treat any unconfirmed assumption as a risk. +- **Scope creep:** define Non-Goals clearly so the model does not wander into unrelated code. +- **Rejected goal:** do not argue. Ask for specific feedback, rewrite, re-submit. Never move to discovery without a signed-off `goal.md`. -### 1. Goal Ambiguity & Verification -- **Identify Underspecified Outcomes**: If the user's initial request lacks concrete metrics (e.g., "make it faster" or "fix bugs"), do not guess. Draft `goal.md` with explicit, testable criteria and ask the user for confirmation. -- **Assumptions vs. Facts**: List all technical assumptions explicitly under a dedicated header in `goal.md`. Treat any unconfirmed assumption as a risk. -- **Scope Creep Prevention**: Clearly define "Non-Goals" to prevent the model from wandering into unrelated parts of the codebase. +## auto-compact -### 2. Failure Diagnostics -- **If the user rejects the goal**: Do not argue. Ask for specific feedback, rewrite the goal, and re-submit. -- **If the user is unresponsive**: Keep the goal simple and await explicit approval. Never move to discovery without a signed-off `goal.md`. +Call `planner_status` immediately. Read `request.md` and `goal.md`, then resume the exact intake step. Do not begin discovery without explicit approval. ## If You Do Not Know What To Do Next -If you don't know what to do next, call `planner_status`. +The `planner_finish_step` result names your next step, its goal, and the worktree to work in — follow it. Call `planner_status` only when you need the full step rule or stage instruction, or when you are unsure. diff --git a/instructions/defaults/planning.md b/instructions/defaults/planning.md index e130dc4..7bc1ffe 100644 --- a/instructions/defaults/planning.md +++ b/instructions/defaults/planning.md @@ -11,7 +11,7 @@ At `planning/read_context`, load context in this order: 1. Call `planner_status`. 2. Read `discovery.md`, `questions.md`, and `decisions.md`. 3. Use `planner_contract_route/read` for applicable AGENTS.md chains before extra source reads. -4. Read specific source files only when the recorded discovery context and local contracts are insufficient. +4. Read specific source files only when recorded discovery context and local contracts are insufficient. ## Strict Step Order @@ -19,27 +19,19 @@ At `planning/read_context`, load context in this order: - Reconstruct project context from compacted artifacts. - If `decisions.md` contains a Change Request, treat this as a follow-up planning pass. Reread the Post-Implementation Snapshot in `discovery.md`, especially `Completed Work` and `Remaining Work`. 2. `draft_plan` - - Write the full implementation strategy to `plan.md`. - - Include goal, non-goals, constraints, risks, integration boundaries, required checks, and unresolved decisions. - - In a follow-up planning pass, preserve the existing completed plan history. Append or revise only the sections needed for the change request; do not replace `plan.md` wholesale and do not repeat work already listed under `Completed Work`. - - For follow-up work, add a clearly labeled revision section with what remains and why the previous implementation was rejected. + - Write the full implementation strategy to `plan.md` through `planner_plan_submit` (pass the full content; it saves the file atomically). + - Include goal, non-goals, constraints, risks, integration boundaries, required checks, and unresolved decisions. + - In a follow-up planning pass, preserve the existing completed plan history. Append or revise only the sections needed for the change request; do not replace `plan.md` wholesale and do not repeat work already listed under `Completed Work`. Add a clearly labeled revision section with what remains and why the previous implementation was rejected. 3. `split_tasks` - - Split the plan into small ordered tasks. - - Each task must be independently understandable and small enough for one TDD loop. - - In a follow-up planning pass, existing completed task artifacts are history. Create new revision task IDs for new work; do not reuse a completed task ID. + - Split the plan into small ordered tasks, each independently understandable and small enough for one TDD loop. + - In a follow-up planning pass, existing completed task artifacts are history. Create new revision task IDs for new work; do not reuse a completed task ID. 4. `write_task_files` - - Call `planner_task_upsert` once per behavioral task. - - Provide semantic fields only: task id, title, objective, scope, acceptance criteria, and optional Local Contract Context fields. - - The wrapper creates `task.json`, `task.md`, and empty TDD lifecycle artifacts. Do not write task JSON manually. - - Each `task.md` must state scope, acceptance criteria, expected files or symbols, dependency context, checks, and relevant AGENTS.md chain when known. - - In a follow-up planning pass, call `planner_task_upsert` only for new or still-pending revision tasks. Completed task IDs are immutable audit history. -5. `verify_plan` - - Verify that tasks are ordered, bounded, testable, and free of hidden broad work. - - Record decisions and remaining risks. -6. `compact_planning` - - Compact the finished plan and task list. -7. `enter_execution` - - Advance to `execution/prepare_task`. + - Call `planner_task_upsert` once per behavioral task with semantic fields only: task id, title, objective, scope, acceptance criteria, and optional Local Contract Context fields. The wrapper creates `task.json`, `task.md`, and empty TDD lifecycle artifacts; do not write task JSON manually. + - Each `task.md` must state scope, acceptance criteria, expected files or symbols, dependency context, checks, and the relevant AGENTS.md chain when known. + - In a follow-up planning pass, call `planner_task_upsert` only for new or still-pending revision tasks. Completed task IDs are immutable audit history. +5. `verify_plan` — verify tasks are ordered, bounded, testable, and free of hidden broad work. Record decisions and remaining risks. +6. `compact_planning` — compact the finished plan and task list. +7. `enter_execution` — advance to `execution/prepare_task`. ## Task Design Rules @@ -47,20 +39,18 @@ At `planning/read_context`, load context in this order: - Prefer dependency order: foundations before composition. - Do not batch unrelated functions, files, or refactors into one task. - Every task must define how TDD proves the requested behavior. -- Never create standalone plan tasks named like "write tests", "add mocks", "test the implementation", or "verify the code". -- Tests, mocks, fixtures, and checks belong inside the individual behavioral task that needs them. Each task runs its own tests-first TDD loop before production edits. -- A separate testing task is allowed only when test infrastructure itself is the requested product behavior or an explicit shared prerequisite, not merely because implementation needs tests. +- Never create standalone plan tasks named like "write tests", "add mocks", "test the implementation", or "verify the code". Tests, mocks, fixtures, and checks belong inside the behavioral task that needs them; each task runs its own tests-first TDD loop before production edits. +- A separate testing task is allowed only when test infrastructure itself is the requested product behavior or an explicit shared prerequisite. - If a task reveals additional required work, add or revise a task artifact during planning instead of silently expanding implementation scope. - For a change request after a completed pass, use new revision task IDs such as `fix-storage-root-revision` instead of reopening completed task IDs. ## Restrictions -- Do not edit production files. -- Do not write tests yet. +- Do not edit production files or write tests yet. - Built-in project write/edit calls remain blocked. Shell remains available, but raw git is forbidden. - Do not create task branches. - Do not reread the whole project unless recorded discovery context is insufficient. -- Do not ignore local contracts. If AGENTS.md files exist, task scope should preserve the relevant contract chain or explicitly explain why none applies. +- Do not ignore local contracts. If AGENTS.md files exist, task scope should preserve the relevant contract chain or explain why none applies. - Do not rely on chat memory; write durable facts to artifacts. ## Exit Condition @@ -85,56 +75,35 @@ Before finishing planning, doubt the plan shape: - Are completed tasks preserved as history during follow-up planning? - Does `plan.md` explain remaining work without repeating work already completed? -If doubt remains, revise `plan.md` or task artifacts before entering execution. Do not rely on memory from chat. +If doubt remains, revise `plan.md` or task artifacts before entering execution. Do not rely on chat memory. -## Fundamental Rules +## Fundamental Rule: Integration vs New Entity -### Rule 3: Integration vs New Entity +**Prerequisite:** this applies ONLY when the user did not explicitly ask to modify file X. If the user said "change X", their word is final. -**Prerequisite:** This rule applies ONLY if the user did not explicitly ask to modify file X. If the user said "change X" — their word is final. +When adding new functionality to an existing project, decide: integrate into existing code, or create a new entity/module/class. -**Core principle:** When adding new functionality to an existing project, you must decide: integrate into existing code or create a new entity/module/class. +**Integrate into existing code when:** the new functionality is a natural continuation of the file/module's logic; changes are minimal and do not restructure existing code; the file already contains similar mechanisms that fit the same pattern; no refactoring of the existing structure is required. -**Criteria for integrating into existing code (when you may touch existing files):** +**Create a new entity (do NOT touch existing files, even "related" ones) when:** the existing code is already a complete, logically closed entity; integration would change its public interface; the new functionality has a distinct responsibility; the existing code follows a pattern that does not support internal extension. -- The new functionality is a natural continuation of the existing logic of this file/module -- Changes are minimal and do not restructure the existing code -- The existing file already contains similar mechanisms, and the new functionality fits the same pattern -- Adding new code does not require refactoring or rebuilding the existing structure +**How to decide:** compare responsibilities. If the new responsibility matches or is a subset of the existing one → integrate. If responsibilities differ or it is a parallel entity → create a new entity. You do not touch existing files when the new functionality is not a natural continuation of their responsibility. -**Criteria for creating a new entity (when you must NOT touch existing files, even if they seem "related"):** +## Diagnostics -- The existing code is already a complete, logically closed entity (module, class, service) -- Integration would require changing the public interface of the existing entity -- The new functionality has a distinct responsibility from the existing one -- The existing code follows a pattern that does not support internal extension (e.g., a module that must remain unchanged) - -**How to decide:** -1. Look at the existing code. What does it do? What is its responsibility? -2. Look at the new functionality. What is its responsibility? -3. If responsibilities match or the new one is a subset of the existing → integrate. -4. If responsibilities differ or the new one is a parallel entity → create a new entity. - -**Deduction:** You do not touch existing files if the new functionality is not a natural continuation of their responsibility. You create a new entity if the existing one is already complete. +- **Too-large scope:** if a task description contains multiple unrelated behaviors, split it. +- **Missing dependencies:** order tasks correctly (e.g., schema changes before API handlers). +- **Incoherent task id:** use lowercase, clean alphanumeric ids. +- If verification fails during planning, re-evaluate the architecture and confirm files in task scope exist or are planned. ## manual-compact -Preserve the full plan goal, constraints, ordered task list, task artifact paths, task dependencies, acceptance criteria, open decisions, and `discovery.md`. After compaction, call `planner_status`. Before the first task, reread the full `plan.md`, then read only the selected `task.md` and use focused project search when context is insufficient. +Preserve the full plan goal, constraints, ordered task list, task artifact paths, dependencies, acceptance criteria, open decisions, and `discovery.md`. After compaction, call `planner_status`. Before the first task, reread the full `plan.md`, then read only the selected `task.md` and use focused project search when needed. ## auto-compact Call `planner_status` immediately and restore the exact planning step. Reread `plan.md` if it has already been written. Do not regenerate tasks from chat history and do not begin execution until the persisted plan is verified. -## Planning & Task Diagnostics - -### 1. Decomposition Failures -- **Too Large Scope**: If a task description contains multiple unrelated behaviors, split it into smaller sub-tasks. -- **Missing Dependencies**: Ensure task dependencies are ordered correctly (e.g., database schema changes must be executed before API handlers). -- **Incoherent Task ID**: Use lowercase, clean alphanumeric IDs. - -### 2. Troubleshooting Planning Errors -- If verification fails during planning, re-evaluate the architecture. Ensure files listed in the task scope actually exist or are planned to be created. - ## If You Do Not Know What To Do Next -If you don't know what to do next, call `planner_status`. +The `planner_finish_step` result names your next step, its goal, and the worktree to work in — follow it. Call `planner_status` only when you need the full step rule or stage instruction, or when you are unsure. diff --git a/instructions/defaults/recovery.md b/instructions/defaults/recovery.md index c19ebe5..29db26d 100644 --- a/instructions/defaults/recovery.md +++ b/instructions/defaults/recovery.md @@ -2,37 +2,30 @@ ## Purpose -Recover safely after crash, manual git changes, missing worktree, wrong branch, conflicts, or inconsistent planner storage. +Recover safely after a crash, manual git changes, a missing worktree, a wrong branch, conflicts, or inconsistent planner storage. Recovery is inspection-first. It must never perform destructive repair without explicit user approval. ## Strict Step Order -1. `read_state` - - Read `project.json`, active `plan.json`, and active `state.json`. -2. `inspect_git` - - Use planner inspection wrappers to read worktree path, branch, HEAD, dirty state, conflicts, and managed branch existence. -3. `compare_expected_actual` - - Compare actual git reality with persisted branch, worktree, and merge targets. -4. `classify_recovery` - - Classify missing worktree, wrong branch, dirty worktree, external commit, manual checkout, history rewrite, conflict, or missing files. -5. `ask_user_if_destructive` - - Ask the user before reset, delete, force checkout, abort, discard, or other destructive repair. -6. `repair_or_resume` - - Apply only approved repair or resume into an explicit valid non-recovery stage and step. +1. `read_state` — read `project.json`, the active `plan.json`, and the active `state.json`. +2. `inspect_git` — use planner inspection wrappers to read worktree path, branch, HEAD, dirty state, conflicts, and managed-branch existence. +3. `compare_expected_actual` — compare actual git reality with persisted branch, worktree, and merge targets. +4. `classify_recovery` — classify missing worktree, wrong branch, dirty worktree, external commit, manual checkout, history rewrite, conflict, or missing files. +5. `ask_user_if_destructive` — ask the user before reset, delete, force checkout, abort, discard, or any destructive repair. +6. `repair_or_resume` — apply only approved repair, or resume into an explicit valid non-recovery stage and step. ## Recovery Rules - Use `planner_recovery_inspect` before proposing action. - Until recovery confirms the persisted worktree path, do not run project tests, builds, generators, or verification commands. After resume, run them only from the worktree path reported by `planner_status`. -- Do not run raw git. -- Do not hide external changes. +- Do not run raw git, and do not hide external changes. - Missing project context is not a reason to reset git. Rebuild a bounded overview when needed. - A clean external commit is not automatically an error. Inspect the actual branch and resume only when persisted state is coherent. - Conflicts, missing worktrees, and missing state block normal flow. - If the original project directory is missing, tell the user clearly and use only documented best-effort cleanup paths. - Persisted state and planner git wrappers remain the source of branch and merge targets. -- If recovery follows a stuck compact, treat the stuck report as evidence, not as a negative state. Reset tone: one hypothesis, one smallest falsifying probe, one observed fact, then continue or ask the user. +- If recovery follows a stuck compact, treat the stuck report as evidence, not a negative state. Reset tone: one hypothesis, one smallest falsifying probe, one observed fact, then continue or ask the user. ## Resume Reload @@ -41,7 +34,7 @@ After recovery resume: 1. Call `planner_status`. 2. Read the exact stage instruction bundle. 3. Reread full `plan.md` when scope, task ordering, branch history, or user feedback may have changed. -4. Reload active `task.md`, `tdd.md`, summaries, and focused source files only when needed after resuming execution. +4. Reload the active `task.md`, `tdd.md`, summaries, and focused source files only when needed after resuming execution. 5. Continue only after the git recovery gate is clear. ## Evidence Discipline @@ -51,24 +44,18 @@ Treat recovery as state reconstruction, not continuation of a remembered plan. - Do not trust chat memory, branch names, or previous summaries until persisted state and git state agree. - Do not mark recovery complete while conflicts, dirty state, or cleanup obligations remain unexplained. - If the next action is unclear, prefer one smallest state probe over narrative reasoning. -- If the probe contradicts the expected state, stop and use recovery guidance instead of forcing the old path. +- If the probe contradicts the expected state, use recovery guidance instead of forcing the old path. -## auto-compact - -Call `planner_status` immediately. Do not assume recovery was completed. Reload persisted state, rerun recovery inspection, and wait for explicit user approval before destructive repair. +## Diagnostics -## Recovery & Git-State Diagnostics +- **No progress during recovery:** normal planner wrappers are blocked until the git discrepancy is resolved. +- **Root cause:** read the recovery inspection report and decide whether the issue is a missing worktree, a manual checkout, or corrupted JSON. +- **Destructive acts:** never run reset or delete without asking the user. Resolution flow: inspect with `planner_recovery_inspect`, follow the classification suggestions, then resume to a safe stage/step via `planner_recovery_resume`. -### 1. Recovery Gating -- **No Progress during Recovery**: You are blocked from running normal planner commands while in recovery. You must resolve the git discrepancy first. -- **Identify Root Cause**: Read the recovery inspection report. Determine if the issue is a missing worktree, a manually checked out branch, or corrupted JSON files. -- **User Approval for Destructive Acts**: Never run reset or delete actions without asking the user. +## auto-compact -### 2. Resolution Flow -1. Inspect git state with `planner_recovery_inspect`. -2. Follow the recovery classification suggestions. -3. Resume to a safe stage/step via `planner_recovery_resume`. +Call `planner_status` immediately. Do not assume recovery was completed. Reload persisted state, rerun recovery inspection, and wait for explicit user approval before destructive repair. ## If You Do Not Know What To Do Next -If you don't know what to do next, call `planner_status`. +The `planner_finish_step` result names your next step, its goal, and the worktree to work in — follow it. Call `planner_status` only when you need the full step rule or stage instruction, or when you are unsure. diff --git a/instructions/defaults/refactor.md b/instructions/defaults/refactor.md index ff66a3e..3ab1b04 100644 --- a/instructions/defaults/refactor.md +++ b/instructions/defaults/refactor.md @@ -4,34 +4,31 @@ Challenge and improve the current task implementation after production code is committed. Refactor changes structure, clarity, naming, duplication, or integration quality without changing requested behavior. -KISS does not mean avoiding advanced language features. Traits, interfaces, generics, macros, or other abstractions are valid when the current task needs them. KISS means every abstraction, branch, type, helper, and extension point must justify its existence through the current behavior or existing project design. Do not add flexibility for imagined future work. +KISS does not mean avoiding advanced language features. Traits, interfaces, generics, macros, and other abstractions are valid when the current task needs them. KISS means every abstraction, branch, type, helper, and extension point must justify its existence through the current behavior or existing project design. Do not add flexibility for imagined future work. ## Required Process 1. Read `task.md`, `tdd.md`, existing `refactor.md`, and focused source files only when needed. 2. Inspect the current task-branch diff through planner wrappers. 3. Question the implementation actively: - - Can any helper, abstraction, branch, conversion, or validation path be removed or made clearer? - - Is any code duplicated, speculative, over-generalized, or implemented for future use rather than the current task? - - Does the implementation match existing project conventions and confirmed user decisions? - - Are signatures and effects still as small and explicit as possible? - - If you think no refactor is needed, what concrete diff fact proves that changing it would make the code worse? + - Can any helper, abstraction, branch, conversion, or validation path be removed or made clearer? + - Is any code duplicated, speculative, over-generalized, or implemented for future use rather than the current task? + - Does the implementation match existing project conventions and confirmed user decisions? + - Are signatures and effects still as small and explicit as possible? + - If you think no refactor is needed, what concrete diff fact proves that changing it would make the code worse? 4. Call `planner_refactor_review` with concrete review fields and every required category review. The wrapper writes `refactor.md` in the required format. A passing test, linter, formatter, or build is not a refactor review. 5. Apply only behavior-preserving changes. 6. Run focused tests from the worktree path reported by `planner_status` after each meaningful refactor group. -7. Commit through planner wrappers if files changed. -8. Update task artifacts when the refactor changes relevant implementation details. -9. If the review proves a reusable refactor/debugging lesson, repeated mistake, stale-context pattern, or category-specific audit method, call `planner_skill_create` with `sourceKind=refactor` before leaving this step. -10. Commit the refactor if project files changed. +7. Commit through planner wrappers if files changed, and update task artifacts when the refactor changes relevant implementation details. +8. If the review proves a reusable refactor/debugging lesson, a repeated mistake, a stale-context pattern, or a category-specific audit method, call `planner_skill_create` with `sourceKind=refactor` before leaving this step. ## Restrictions -- Do not add new scope. -- Do not weaken tests to make refactor pass. +- Do not add new scope or weaken tests to make refactor pass. - Do not change public API unless the active task explicitly requires it. - Do not perform speculative cleanup outside the active task. -- Do not claim that refactor is unnecessary merely because tests, Clippy, a linter, or a formatter pass. Tool output is evidence, not design review. -- Do not run project tests, builds, formatters, or other verification commands from the original checkout. Use the planner worktree as shell cwd. +- Do not claim refactor is unnecessary merely because tests, a linter, or a formatter pass. Tool output is evidence, not design review. +- Do not run project tests, builds, formatters, or other verification from the original checkout. Use the planner worktree as shell cwd. - If a behavior change is required, stop and return to planning or create a new task. - Do not use raw git. @@ -48,9 +45,9 @@ Treat refactor as hostile review of the actual diff, not a style ritual. - If a cleanup would change behavior or expand scope, record it as deferred instead of smuggling it into refactor. - If the diff reveals missing behavior or missing tests, return to TDD/planning rather than hiding it under refactor. -## Required refactor.md Format +## refactor.md Format (written by planner_refactor_review) -Do not hand-write this file when `planner_refactor_review` is available. Pass semantic fields to the tool and let the wrapper write these exact level-two headings. Fill every field with concrete observations from the active task diff. +Do not hand-write this file. Pass semantic fields to the tool and let the wrapper write these exact level-two headings, each filled with concrete observations from the active task diff. ```md # Refactor Review @@ -136,36 +133,27 @@ Decision: changed | kept - ... ``` -If `Decision: changed`, `## Changes Applied` must describe the behavior-preserving edits. If `Decision: kept`, `## Why Kept` must explain why changing the actual diff would make the code worse or add unnecessary complexity. Do not write generic claims such as "tests pass" or "code is already good" as the reason. +If `Decision: changed`, `## Changes Applied` must describe the behavior-preserving edits. If `Decision: kept`, `## Why Kept` must explain why changing the actual diff would make the code worse or add unnecessary complexity. Do not write generic claims such as "tests pass" or "code is already good". -Every category must be reviewed exactly once: - -- `duplication` -- `naming` -- `control_flow` -- `abstraction_level` -- `hidden_coupling` -- `error_handling` -- `test_clarity` -- `debug_leftovers` -- `scope_creep` - -Use `status: issue` when a behavior-preserving improvement is needed. Use `status: not_applicable` only with concrete evidence explaining why that category does not apply to the active diff. +Every category must be reviewed exactly once. Use `status: issue` when a behavior-preserving improvement is needed; use `status: not_applicable` only with concrete evidence explaining why that category does not apply to the active diff. ## Doubt Checkpoint Refactor doubt is mandatory but bounded: - Find one concrete simplification opportunity, or one concrete reason each tempting change should be rejected. -- Do not refactor unrelated code to satisfy doubt. -- Do not invent abstractions to look thoughtful. +- Do not refactor unrelated code to satisfy doubt, and do not invent abstractions to look thoughtful. - Do not treat tool success as design proof. ## Planner Skill Memory -Create a planner skill only when refactor review finds a transferable rule that future tasks should reuse, such as a recurring boundary mistake, a hidden coupling pattern, or a reliable simplification method. Do not create a skill for "no refactor needed" or for a one-off project detail. +Create a planner skill only when refactor review finds a transferable rule that future tasks should reuse, such as a recurring boundary mistake, a hidden coupling pattern, or a reliable simplification method. Do not create a skill for "no refactor needed" or a one-off project detail. Use `metadata.skillLanguage` for the body; keep the Pi skill `description` trigger-specific. + +## Diagnostics -Use `metadata.skillLanguage` for the body. Keep the Pi skill `description` trigger-specific: name the exact situation that should activate the skill and what it prevents. +- **Behavior changes:** refactoring must not change external behavior. A test failing after refactor means this rule was violated. +- **Dead code & lint:** remove old/dead paths cleanly; run formatting/linting after refactor to catch unused imports. +- **Recovery:** if a refactor attempt breaks tests and cannot be easily fixed, revert to the clean task-branch HEAD. Refactor in small steps, committing after each successful one. ## manual-compact @@ -175,17 +163,6 @@ Preserve active task id, refactor intent, changed files, checks, commit, and any Call `planner_status` immediately. Reload `task.md`, `tdd.md`, and `refactor.md`. Confirm whether refactor changes were committed before resuming. -## Refactoring & Code-Safety Diagnostics - -### 1. Refactoring Regressions -- **Behavior Changes**: Refactoring must not change external behavior. If a test fails after refactoring, you have violated this rule. -- **Unused Code**: Ensure refactored paths remove old, dead code cleanly. -- **Lint Violations**: Refactoring often introduces unused imports or formatting issues. Always run formatting/linting tools immediately. - -### 2. Diagnostic Steps -1. Revert to the clean task branch HEAD if a refactoring attempt breaks tests and cannot be easily fixed. -2. Refactor in small, incremental steps, committing after each successful step. - ## If You Do Not Know What To Do Next -If you don't know what to do next, call `planner_status`. +The `planner_finish_step` result names your next step, its goal, and the worktree to work in — follow it. Call `planner_status` only when you need the full step rule or stage instruction, or when you are unsure. diff --git a/instructions/defaults/tdd.md b/instructions/defaults/tdd.md index 1333239..73a4bee 100644 --- a/instructions/defaults/tdd.md +++ b/instructions/defaults/tdd.md @@ -4,85 +4,73 @@ Use strict tests-first development for every execution task. Production implementation is forbidden until the active task has a written TDD plan and a demonstrated failing, mock, or contract signal. +`tdd.md` is written only through `planner_tdd_submit`. Built-in write/edit cannot modify it. The wrapper assembles the required `##` sections from structured fields and preserves sections you are not updating, so you can fill it incrementally across steps. + ## Required Sequence -1. During `write_tdd_plan`, read answered `questions.md`, `decisions.md`, `task.md`, existing tests, and project test conventions. -2. Write `tdd.md` with: - - behavior under test - - arguments, return value, errors, and integration points - - edge cases - - fixtures and mocks - - focused test commands - - expected failure or contract signal before implementation -3. During `write_tests`, write tests and required test harness wiring only. Record changed files, intent, and expected signal in `tdd.md`. If project files changed, commit through `planner_git_commit` before continuing. -4. During `run_failing_tests`, execute focused checks and record the exact failing signal in `tdd.md`. - - Before finishing `run_failing_tests`, add `## Pre-Implementation Proof Contract`. - - Record the exact missing-behavior signal, intended production path, success signal, and files that must stay out of scope. - - **Test Signal Doubt — mandatory before advancing:** - Answer each question in `tdd.md` before moving to `implement_task`: - 1. Would a trivial implementation (`return null`, empty function, hardcoded value) pass this test? If yes — the test does not prove the missing behavior. Rewrite it. - 2. Does the failure message name the missing behavior, or only a missing file/import/module? If only harness bootstrap — add a real behavior assertion first. - 3. Is there any way to make this test green without implementing the actual requested behavior? If yes — the test is underspecified. Add the constraint that blocks the shortcut. - If any answer is "yes" or "I'm not sure" — fix the test before touching production code. -5. Begin production edits only during `implement_task`. -6. Before finishing `implement_task`, add `## Post-Implementation Counterexample Review`. - - Record the smallest counterexample, boundary value, opposite case, regression risk, scope check, and action. - - If the counterexample is real, add a test or explicitly record why it is out of scope before continuing. - - If resolving the task produced a reusable verified lesson, call `planner_skill_create` to save it for future planner sessions. Do not create a skill for ordinary task summaries. -7. During `contract_check`, call `planner_contract_check`. - - **Watcher rule — presumption is always UPDATE or CREATE:** - For every directory where you edited or created files this task, ask two questions: - - - **Does an AGENTS.md exist at or above that directory level?** - - YES → default action is `upsert_existing`. Only downgrade to `no_update` if you can state concretely: "the diff introduced zero new domain rules, connection changes, or non-obvious invariants." Vague confidence ("nothing important changed") is not evidence. - - NO → default action is `create_new`. Only downgrade to `no_update` if the change is a single isolated fix with no new module boundary, no new integration point, and no rule that future sessions need. Absence of AGENTS.md is not a reason to skip — it is a reason to create one. - - - **Connection audit:** For each changed component ask: what calls it? What does it import from? What state does it write that other components read? Non-obvious dependencies or invariants found during implementation belong in AGENTS.md Domain Details — that section survives memory wipes; tdd.md does not. - - - **Level check:** If the nearest AGENTS.md is in a parent directory but the changed area is a distinct subdomain, prefer `create_new` at the specific directory over updating an overly abstract parent. - - - After `planner_contract_check`, call `planner_contract_upsert` for every `upsert_existing` or `create_new` decision. - - **AGENTS.md Self-Doubt — after every upsert or create:** - Re-read what you just wrote and answer: - 1. Does Domain Details name at least one concrete caller or dependency (→ flow, **Who writes:**, **Who calls:**)? If it's generic description with no connections — it's a template, not a contract. Rewrite Domain Details. - 2. If you removed this AGENTS.md entirely, would a fresh model reading the code know less about blast radius and routing? If the answer is "not really" — the content is too shallow. Add what surprised you during implementation. - 3. Does the Child Index list every subdirectory that has its own domain logic? If you created a new module but didn't add it to the parent's Child Index — fix the parent too. - - AGENTS.md files created or updated here are tracked by the planner and offered as keep/remove at `/planner-finish`. Keep them — they are the primary memory for future sessions in this codebase. -8. During `run_final_tests`, rerun focused tests and required broader integration checks. -9. Before finishing `merge_task_to_plan`, add `## Task Merge Scope Audit`. - - Confirm acceptance criteria coverage, changed-file scope, commands run, debug cleanup, commit message fit, and branch drift check. +1. During `write_tdd_plan`, read answered `questions.md`, `decisions.md`, `task.md`, existing tests, and project test conventions. Call `planner_tdd_submit` with the `preImplementation` fields: the missing-behavior signal, intended production path, success signal, and files that must stay out of scope. Also note behavior under test, arguments/returns/errors/integration points, edge cases, fixtures/mocks, and focused test commands. +2. During `write_tests`, write tests and required harness wiring only. Re-submit `tdd.md` with changed files, intent, and expected signal. If project files changed, commit through `planner_git_commit`. +3. During `run_failing_tests`, execute focused checks and record the exact failing signal via `planner_tdd_submit`. + - **Test Signal Doubt — mandatory before advancing to `implement_task`:** + 1. Would a trivial implementation (`return null`, empty function, hardcoded value) pass this test? If yes, the test does not prove the missing behavior — rewrite it. + 2. Does the failure message name the missing behavior, or only a missing file/import/module? If only harness bootstrap, add a real behavior assertion first. + 3. Is there any way to make this test green without implementing the requested behavior? If yes, the test is underspecified — add the constraint that blocks the shortcut. + If any answer is "yes" or "I'm not sure", fix the test before touching production code. +4. Begin production edits only during `implement_task`. Before finishing it, submit the `postImplementation` fields: smallest counterexample, boundary value, opposite case, regression risk, scope check, and action. If the counterexample is real, add a test or explicitly record why it is out of scope. If implementing produced a reusable verified lesson, call `planner_skill_create`. +5. During `contract_check`, call `planner_contract_check`, then `planner_contract_upsert` for every decision (see the watcher rule below). Contract consistency is recorded in AGENTS.md, not in `tdd.md`. +6. During `run_final_tests`, rerun focused tests and required broader integration checks. +7. Before finishing `merge_task_to_plan`, submit the `mergeScopeAudit` fields: acceptance-criteria coverage, changed-file scope, commands run, debug cleanup, commit-message fit, and branch-drift check. + +## tdd.md Sections (written by planner_tdd_submit) + +```md +## Pre-Implementation Proof Contract +- failingSignal: exact failing test/command output, mock/contract failure, or documented reason no local failing signal is possible +- productionPath: files/functions expected to change +- successSignal: exact command or assertion expected to pass after implementation +- outOfScopeFiles: files or areas that must not be changed for this task + +## Post-Implementation Counterexample Review +- counterexample: smallest input/user flow/state that could break the fix +- boundaryValue: boundary checked, or explicit reason it is not relevant +- oppositeCase: opposite behavior checked, or explicit reason it is not relevant +- regressionRisk: old behavior that could have been broken +- scopeCheck: whether the implementation stayed inside the task scope +- action: added test, recorded non-goal, or no action with evidence + +## Task Merge Scope Audit +- acceptanceCriteriaCovered: task acceptance criteria and evidence +- changedFilesMatchScope: changed files compared with task scope +- testsRun: exact focused and broader commands run +- debugRemoved: temporary logs/probes/scratch files removed +- commitMessageMatchesBehavior: latest planner commit describes behavior, not process +- branchDriftCheck: planner status/git wrapper state showed expected task/plan branch state +``` + +## Contract Watcher Rule (during contract_check) + +For every directory where you edited or created files, ask two questions: + +- **Does an AGENTS.md exist at or above that level?** YES → default `upsert_existing`; downgrade to `no_update` only if you can state concretely that the diff introduced zero new domain rules, connection changes, or non-obvious invariants. NO → default `create_new`; downgrade only for a single isolated fix with no new module boundary, integration point, or future-relevant rule. Absence of AGENTS.md is a reason to create one, not to skip. +- **Connection audit:** for each changed component — what calls it, what it imports from, what state it writes that others read. Non-obvious dependencies/invariants belong in AGENTS.md Domain Details (that survives memory wipes; `tdd.md` does not). +- **Level check:** if the nearest AGENTS.md is an overly abstract parent but the changed area is a distinct subdomain, prefer `create_new` at the specific directory. + +After each upsert/create, re-read what you wrote: does Domain Details name a concrete caller or dependency? Would a fresh model know less about blast radius without it? Does the Child Index list every subdirectory with its own domain logic (fix the parent if you added a module)? AGENTS.md files touched here are tracked and offered as keep/remove at `/planner-finish`. ## Test Signal Rules -- Run every test, build, and verification command from the worktree path reported by `planner_status`. Never run project checks from the original checkout while a planner plan is active. -- The rule is toolchain-independent: use the project's actual commands, whether they are package scripts, compiler commands, task runners, or custom scripts. -- Prefer a real failing test when the missing behavior can execute locally. -- Use a mock test when the external dependency is unavailable or unsafe to invoke. -- Use a contract test when the critical behavior is an interface, schema, command construction, or integration boundary. -- If a test cannot run locally, document why and still add the strongest deterministic mock or contract test available. +- Run every test, build, and verification command from the worktree path reported by `planner_status`. Never run checks from the original checkout while a plan is active. The rule is toolchain-independent. +- Prefer a real failing test when the missing behavior can execute locally. Use a mock test when an external dependency is unavailable/unsafe; use a contract test for interface, schema, command construction, or integration boundaries. If a test cannot run locally, document why and add the strongest deterministic mock or contract test available. - A test that passes before implementation without proving the missing behavior is not sufficient. -- A module-not-found, import error, or file-does-not-exist failure is only a harness/bootstrap signal. It is not a complete behavior proof unless the test already contains assertions for the task behavior and fails again or passes for that behavior after implementation. -- Placeholder, stub, fake, TODO-only, or hardcoded implementations do not satisfy green TDD. If a minimal implementation is intentionally narrow, the tests must prove the accepted behavior and the counterexample review must name what remains out of scope. +- A module-not-found / import / file-does-not-exist failure is only a harness/bootstrap signal, not a behavior proof, unless the test already asserts the task behavior. +- Placeholder, stub, fake, TODO-only, or hardcoded implementations do not satisfy green TDD. A deliberately narrow implementation must have tests proving the accepted behavior and a counterexample review naming what remains out of scope. ## Editing Rules -- Test steps may edit any files required to create tests, fixtures, mocks, and test harness integration. -- Do not change production behavior during `write_tests`. -- Do not perform unrelated cleanup while writing tests. +- Test steps may edit any files required for tests, fixtures, mocks, and harness integration. +- Do not change production behavior during `write_tests`, and do not do unrelated cleanup. - Before finishing the task, inspect the planner-controlled diff and confirm every changed file belongs to the task. -## Verification Record - -Record in `tdd.md`: -- exact commands run -- worktree cwd used for each command -- expected and actual result -- failing signal before implementation -- passing signal after implementation -- skipped checks and reasons -- edge cases covered - ## Evidence Discipline Treat TDD as the proof engine. Do not trust an implementation until the test signal changes for the intended reason. @@ -92,53 +80,21 @@ Treat TDD as the proof engine. Do not trust an implementation until the test sig - If a counterexample is plausible, make it a test, record it as a non-goal with evidence, or mark the task blocked. - Do not add broad tests to feel safer; add the smallest test that can falsify the current claim. -## Planner Skill Memory - -`planner_skill_create` is future memory. Use it aggressively after a lesson is proven by a failing signal, debug probe, counterexample review, repeated mistake, stale-context recovery, or state-machine/tooling mistake that future planner sessions should avoid. - -Create the skill before leaving the step that proved the lesson when the lesson is reusable. Do not postpone it to final summary unless the lesson only becomes clear at plan level. - -Do not create skills for: -- ordinary task summaries -- project-specific paths that will not generalize -- unverified opinions -- broad advice like "write tests" or "debug carefully" - -When creating a skill, write the skill body in `metadata.skillLanguage`. The wrapper writes `name` and `description` frontmatter and stores the skill under the planner extension library. +## Doubt / Boundary Coverage -Required proof/audit sections: +For every behavioral task, choose only the cases that falsify a real acceptance risk before writing production code: the happy path that the task changes; minimum bounds (empty/zero) only when boundaries are part of the behavior; maximum bounds only when the task can plausibly break them; error/danger cases only when the task owns validation/error behavior. No reassurance tests: add a test only when it would fail before the fix or protect a named requirement. Write only the minimum production code to make the tests pass; if you handle a case no test covers, add the test first. -```md -## Pre-Implementation Proof Contract -- failingSignal: exact failing test, command output, mock/contract failure, or documented reason no local failing signal is possible -- productionPath: files/functions expected to change -- successSignal: exact command or assertion expected to pass after implementation -- outOfScopeFiles: files or areas that must not be changed for this task +## Planner Skill Memory -## Post-Implementation Counterexample Review -- counterexample: smallest input/user flow/state that could break the fix -- boundaryValue: boundary checked or explicit reason it is not relevant -- oppositeCase: opposite behavior checked or explicit reason it is not relevant -- regressionRisk: old behavior that could have been broken -- scopeCheck: whether the implementation stayed inside the task scope -- action: added test, recorded non-goal, or no action with evidence +`planner_skill_create` is future memory. Use it after a lesson is proven by a failing signal, debug probe, counterexample review, repeated mistake, stale-context recovery, or state-machine/tooling mistake. Create the skill before leaving the step that proved the lesson when it is reusable. Do not create skills for ordinary task summaries, non-generalizing project paths, unverified opinions, or broad advice. Write the body in `metadata.skillLanguage`; the wrapper writes `name`/`description` frontmatter and stores the skill. -## Contract Consistency Check -- action: no_update, upsert_existing, or create_new from planner_contract_check -- outcome: what changed and why it does or does not affect durable AGENTS.md memory -- domain impact: affected domain rule, or none with evidence -- recommended path: nearest meaningful AGENTS.md path or none -- changed files: files checked -- evidence: diff/test/artifact facts supporting the action +## Diagnostics -## Task Merge Scope Audit -- acceptanceCriteriaCovered: task acceptance criteria and evidence -- changedFilesMatchScope: changed files compared with task scope -- testsRun: exact focused and broader commands run -- debugRemoved: temporary logs/probes/scratch files removed -- commitMessageMatchesBehavior: latest planner commit describes behavior, not process -- branchDriftCheck: planner_status/git wrapper state showed expected task/plan branch state -``` +- **Compilation failures:** fix compiler/lint issues before focusing on the behavior test. +- **No failing signal:** a test that passes before the production code is invalid or testing the wrong path. +- **Bootstrap-only failure:** if the only red signal is missing module/import/file, create the module, then rerun and verify behavior assertions — do not treat "file now exists" as completion. +- **Broken mocks:** if a test hangs, check for unmocked network/database calls. +- **Pivot:** if the implementation looks correct but the test still fails, verify expectations match the method signature; verify files and import paths before blaming the runner. ## manual-compact @@ -148,33 +104,6 @@ Preserve the active task id, `tdd.md` path, failing signal, commands, fixtures, Call `planner_status` immediately. Reload `task.md`, `tdd.md`, and focused source files only when needed. Do not skip the failing-test requirement because earlier chat context was compacted. -## Smart TDD & Boundary Coverage (KISS-based TDD) - -### 1. Mandatory Core Test Cases -For every behavioral task, choose only the cases that falsify a real acceptance risk before writing production code: -- **Baseline (Happy Path)**: The exact expected outcome for typical valid input when the task changes that path. -- **Minimum Limit**: Empty inputs, zero, empty arrays, or minimum bounds only when boundaries are part of the behavior. -- **Maximum Limit**: Maximum bounds, long strings, large arrays, or overflow limits only when the task can plausibly break them. -- **Danger Zone (Edge Cases & Errors)**: Null, undefined, invalid formats, or error paths only when the task owns validation/error behavior. -- **No Reassurance Tests**: Do not add tests merely to feel safer. Add a test only when it would fail before the fix or protect a named requirement. - -### 2. Implementation Footprint Restriction -- **No Speculative Coding**: Only write the absolute minimum amount of production code required to make your TDD tests pass. -- **Do Not Guess Behaviors**: If you write code to handle a case that is not covered by a TDD test, you are violating this rule. Add a test for that case first, then implement it. -- **Doubt the Test Itself**: A passing test can prove the wrong behavior. Before implementation, name the exact failure signal that proves the missing behavior. - -## TDD & Test-Harness Diagnostics - -### 1. Test Harness Errors -- **Compilation Failures**: If the test code fails to compile, fix compiler/lint issues before focusing on the behavior test. -- **No Failing Signal**: If the test passes before you write the production code, the test is invalid or testing the wrong code path. -- **Bootstrap-Only Failure**: If the only red signal is missing module/import/file, create the module, then rerun the focused test and verify behavior assertions fail or pass for the real task. Do not treat “file now exists” as task completion. -- **Broken Mocks**: If the test hangs, check if your mocks are waiting for network/database calls that aren't mocked. - -### 2. Algorithmic Pivot Flow -- **Doubt the Test**: If the implementation is correct but the test still fails, analyze the test assertions. Verify that expectations match the method signature. -- **Synchronous Checks**: Verify files and import paths manually before running the test runner to avoid generic module-not-found errors. - ## If You Do Not Know What To Do Next -If you don't know what to do next, call `planner_status`. +The `planner_finish_step` result names your next step, its goal, and the worktree to work in — follow it. Call `planner_status` only when you need the full step rule or stage instruction, or when you are unsure. diff --git a/src/guard/project-mutation.test.ts b/src/guard/project-mutation.test.ts index 61a5b6c..7c66c55 100644 --- a/src/guard/project-mutation.test.ts +++ b/src/guard/project-mutation.test.ts @@ -70,10 +70,40 @@ describe("planner built-in Pi tool guard", () => { for (const path of [state.planPaths.goalMd, state.planPaths.requestMd]) { const result = decision({ toolName: "write", path, state }); expect(result.allow, path).toBe(false); - expect(result.reason).toContain("planner wrapper"); + expect(result.reason).toContain("planner-managed file"); } }); + it("blocks direct writes to questions.md and the active task tdd.md", () => { + const planDir = + "/agent/extensions/pi-code-planner/projects/app/plans/plan-a"; + const state = activeExecutionState("implement_task"); + state.planPaths = { + planDir, + requestMd: `${planDir}/request.md`, + goalMd: `${planDir}/goal.md`, + questionsMd: `${planDir}/questions.md`, + tasksDir: `${planDir}/tasks`, + }; + state.planState = { ...state.planState, activeTaskId: "parse-config" }; + + const questions = decision({ + toolName: "edit", + path: `${planDir}/questions.md`, + state, + }); + expect(questions.allow).toBe(false); + expect(questions.reason).toContain("planner_questions_submit"); + + const tdd = decision({ + toolName: "edit", + path: `${planDir}/tasks/parse-config/tdd.md`, + state, + }); + expect(tdd.allow).toBe(false); + expect(tdd.reason).toContain("planner_tdd_submit"); + }); + it("blocks write and edit when an active planner state cannot be loaded", () => { expect( decision({ diff --git a/src/guard/project-mutation.ts b/src/guard/project-mutation.ts index 532b443..69543b6 100644 --- a/src/guard/project-mutation.ts +++ b/src/guard/project-mutation.ts @@ -1,4 +1,4 @@ -import { isAbsolute, relative, resolve, sep } from "node:path"; +import { isAbsolute, join, relative, resolve, sep } from "node:path"; import { getPlannerStageStepBehavior } from "../runtime/stage-behavior"; import type { PlanStoragePaths, ProjectStoragePaths } from "../storage/paths"; import type { PlanStateRecord } from "../storage/schema"; @@ -15,8 +15,14 @@ export type PlannerBuiltinToolCall = export interface PlannerBuiltinGuardState extends GitWatcherState { projectPaths: Pick | null; - planPaths?: Pick | null; - planState: Pick | null; + planPaths?: Pick< + PlanStoragePaths, + "planDir" | "requestMd" | "goalMd" | "questionsMd" | "tasksDir" + > | null; + planState: Pick< + PlanStateRecord, + "stage" | "step" | "worktreePath" | "activeTaskId" + > | null; } export interface PlannerBuiltinGuardInput { @@ -48,13 +54,18 @@ export function checkPlannerBuiltinToolAllowed( return blockProjectWrite(input.state, input.tool.toolName); } - if (isProtectedPlannerArtifact(input.cwd, input.tool.path, input.state)) { + const protectedArtifact = matchProtectedPlannerArtifact( + input.cwd, + input.tool.path, + input.state, + ); + if (protectedArtifact) { return { allow: false, reason: [ - `Built-in Pi ${input.tool.toolName} cannot modify planner-managed intake files directly.`, - "Use the exact planner wrapper reported by planner_status.", - `Call ${PLANNER_STATUS_TOOL_NAME} before continuing.`, + `Built-in Pi ${input.tool.toolName} cannot modify the planner-managed file ${protectedArtifact.label} directly.`, + `Use ${protectedArtifact.tool} instead — it fills the artifact from structured arguments and validates the required sections.`, + `Call ${PLANNER_STATUS_TOOL_NAME} if you need the current wrapper.`, ].join("\n"), }; } @@ -104,17 +115,47 @@ function allowsProjectWrite( return projectAccess === "test_edits" || projectAccess === "production_edits"; } -function isProtectedPlannerArtifact( +interface ProtectedArtifact { + label: string; + tool: string; +} + +/** + * Planner artifacts that must only be written through a dedicated wrapper tool. + * request.md/goal.md/questions.md and the active task's tdd.md have structured + * wrappers; built-in edit/write would bypass their validation. Open-ended, + * append-heavy artifacts (plan.md, discovery.md, final_summary.md) intentionally + * stay editable so the model can append across the lifecycle. + */ +function matchProtectedPlannerArtifact( cwd: string, path: string, state: PlannerBuiltinGuardState, -): boolean { - if (!state.planPaths) return false; +): ProtectedArtifact | null { + if (!state.planPaths) return null; const target = isAbsolute(path) ? resolve(path) : resolve(cwd, path); - return ( - target === resolve(state.planPaths.requestMd) || - target === resolve(state.planPaths.goalMd) - ); + if (target === resolve(state.planPaths.requestMd)) { + return { label: "request.md", tool: "the planner intake flow" }; + } + if (target === resolve(state.planPaths.goalMd)) { + return { label: "goal.md", tool: "planner_goal_submit" }; + } + if ( + state.planPaths.questionsMd && + target === resolve(state.planPaths.questionsMd) + ) { + return { label: "questions.md", tool: "planner_questions_submit" }; + } + const activeTaskId = state.planState?.activeTaskId; + if (activeTaskId && state.planPaths.tasksDir) { + const tddPath = resolve( + join(state.planPaths.tasksDir, activeTaskId, "tdd.md"), + ); + if (target === tddPath) { + return { label: "tdd.md", tool: "planner_tdd_submit" }; + } + } + return null; } function isProjectPath( diff --git a/src/guard/tool-policy.ts b/src/guard/tool-policy.ts index 0e03e89..c4d2491 100644 --- a/src/guard/tool-policy.ts +++ b/src/guard/tool-policy.ts @@ -11,6 +11,10 @@ export const PLANNER_WRAPPER_TOOLS = [ "planner_goal_decide", "planner_questions_submit", "planner_questions_resolve", + "planner_plan_submit", + "planner_discovery_submit", + "planner_tdd_submit", + "planner_summary_submit", "planner_task_upsert", "planner_refactor_review", "planner_doubt_review", @@ -95,6 +99,7 @@ const STEP_ALLOWED_TOOLS = { discovery: { scan_project_structure: [ "planner_git_inspect", + "planner_discovery_submit", "planner_contract_scan", "planner_contract_route", "planner_contract_read", @@ -113,7 +118,11 @@ const STEP_ALLOWED_TOOLS = { }, planning: { read_context: ["planner_contract_route", "planner_contract_read"], - draft_plan: ["planner_contract_route", "planner_contract_read"], + draft_plan: [ + "planner_plan_submit", + "planner_contract_route", + "planner_contract_read", + ], split_tasks: ["planner_contract_route", "planner_contract_read"], write_task_files: [ "planner_task_upsert", @@ -128,6 +137,7 @@ const STEP_ALLOWED_TOOLS = { prepare_task: ["planner_git_inspect", "planner_git_create_task_branch"], write_tdd_plan: [ "planner_git_inspect", + "planner_tdd_submit", "planner_report_stuck", "planner_skill_create", "planner_skill_update", @@ -138,6 +148,7 @@ const STEP_ALLOWED_TOOLS = { ], write_tests: [ "planner_git_inspect", + "planner_tdd_submit", "planner_git_commit", "planner_report_stuck", "planner_skill_create", @@ -149,6 +160,7 @@ const STEP_ALLOWED_TOOLS = { ], run_failing_tests: [ "planner_git_inspect", + "planner_tdd_submit", "planner_report_stuck", "planner_skill_create", "planner_skill_update", @@ -159,6 +171,7 @@ const STEP_ALLOWED_TOOLS = { ], implement_task: [ "planner_git_inspect", + "planner_tdd_submit", "planner_git_commit", "planner_contract_route", "planner_contract_read", @@ -172,6 +185,7 @@ const STEP_ALLOWED_TOOLS = { ], contract_check: [ "planner_git_inspect", + "planner_tdd_submit", "planner_git_commit", "planner_contract_route", "planner_contract_read", @@ -183,6 +197,7 @@ const STEP_ALLOWED_TOOLS = { ], refactor_task: [ "planner_git_inspect", + "planner_tdd_submit", "planner_refactor_review", "planner_git_commit", "planner_contract_route", @@ -201,6 +216,7 @@ const STEP_ALLOWED_TOOLS = { ], run_final_tests: [ "planner_git_inspect", + "planner_tdd_submit", "planner_git_commit", "planner_report_stuck", "planner_skill_create", @@ -218,6 +234,7 @@ const STEP_ALLOWED_TOOLS = { ], merge_task_to_plan: [ "planner_git_inspect", + "planner_tdd_submit", "planner_git_merge_task_to_plan", ], compact_task: [], @@ -240,6 +257,7 @@ const STEP_ALLOWED_TOOLS = { "planner_contract_upsert", ], write_final_summary: [ + "planner_summary_submit", "planner_skill_create", "planner_skill_update", "planner_contract_route", @@ -257,7 +275,7 @@ const STEP_ALLOWED_TOOLS = { "planner_contract_decide", ], await_user_acceptance: ["planner_contract_decide"], - handle_change_request: [], + handle_change_request: ["planner_plan_submit", "planner_discovery_submit"], prepare_output_branch: [], merge_or_export_result: [], cleanup_worktree: [], diff --git a/src/index.tool-visibility.test.ts b/src/index.tool-visibility.test.ts index ab840fa..2392c62 100644 --- a/src/index.tool-visibility.test.ts +++ b/src/index.tool-visibility.test.ts @@ -133,14 +133,18 @@ describe("filterPlannerTools", () => { } }); - it("ALL_PLANNER_TOOL_NAMES has exactly 42 tools", () => { - expect(ALL_PLANNER_TOOL_NAMES).toHaveLength(42); + it("ALL_PLANNER_TOOL_NAMES has exactly 46 tools", () => { + expect(ALL_PLANNER_TOOL_NAMES).toHaveLength(46); expect(ALL_PLANNER_TOOL_NAMES).toContain("planner_report_stuck"); expect(ALL_PLANNER_TOOL_NAMES).toContain("planner_refactor_review"); expect(ALL_PLANNER_TOOL_NAMES).toContain("planner_doubt_review"); expect(ALL_PLANNER_TOOL_NAMES).toContain("planner_skill_create"); expect(ALL_PLANNER_TOOL_NAMES).toContain("planner_contract_check"); expect(ALL_PLANNER_TOOL_NAMES).toContain("planner_debug_strategy"); + expect(ALL_PLANNER_TOOL_NAMES).toContain("planner_plan_submit"); + expect(ALL_PLANNER_TOOL_NAMES).toContain("planner_discovery_submit"); + expect(ALL_PLANNER_TOOL_NAMES).toContain("planner_tdd_submit"); + expect(ALL_PLANNER_TOOL_NAMES).toContain("planner_summary_submit"); }); }); diff --git a/src/index.ts b/src/index.ts index 1009def..9b01cf2 100644 --- a/src/index.ts +++ b/src/index.ts @@ -34,6 +34,11 @@ import { inspectAcceptedPlan, } from "./runtime/accepted-plan"; import { readActivePlanContext } from "./runtime/active-plan"; +import { + executePlannerArtifactTool, + PLANNER_ARTIFACT_TOOL_NAMES, + type PlannerArtifactToolName, +} from "./runtime/artifact-tools"; import { buildPlannerCompactInstructionBundle, buildPlannerPostCompactMessage, @@ -666,6 +671,150 @@ const REFACTOR_REVIEW_TOOL_PARAMETERS = { additionalProperties: false, } as const; +const PLAN_SUBMIT_TOOL_PARAMETERS = { + type: "object", + properties: { + content: { + type: "string", + description: + "Full plan.md markdown: scope, constraints, risks, checks, and the intended task sequence. The wrapper writes the file atomically.", + }, + }, + required: ["content"], + additionalProperties: false, +} as const; + +const DISCOVERY_SUBMIT_TOOL_PARAMETERS = { + type: "object", + properties: { + body: { + type: "string", + description: + "Full discovery.md markdown WITHOUT the Verification Protocol section: system boundaries, findings, fundamental rules, and (for change requests) Post-Implementation Snapshot / Completed Work / Remaining Work.", + }, + verificationProtocol: { + type: "array", + description: + "Exact test/lint/build/format commands (with working directory and flags) that doubt_review must later evidence. Rendered as the ## Verification Protocol section.", + items: { type: "string" }, + }, + }, + required: ["body", "verificationProtocol"], + additionalProperties: false, +} as const; + +const SUMMARY_SUBMIT_TOOL_PARAMETERS = { + type: "object", + properties: { + content: { + type: "string", + description: + "Full final_summary.md markdown: what changed, verification evidence, and any follow-ups.", + }, + }, + required: ["content"], + additionalProperties: false, +} as const; + +const TDD_SECTION_FIELD = (description: string) => + ({ type: "string", description }) as const; + +const TDD_SUBMIT_TOOL_PARAMETERS = { + type: "object", + description: + "Fill or update tdd.md for the active task. Provide each section as the lifecycle reaches it; the wrapper assembles the markdown and preserves sections you do not pass. Built-in edit/write cannot modify tdd.md.", + properties: { + preImplementation: { + type: "object", + description: + "Pre-Implementation Proof Contract (write before run_failing_tests).", + properties: { + failingSignal: TDD_SECTION_FIELD( + "The exact failing test/assertion signal proving the behavior is absent.", + ), + productionPath: TDD_SECTION_FIELD( + "The production file/symbol that will make the test pass.", + ), + successSignal: TDD_SECTION_FIELD( + "The exact signal that will prove success after implementation.", + ), + outOfScopeFiles: TDD_SECTION_FIELD( + "Files/areas explicitly out of scope for this task.", + ), + }, + required: [ + "failingSignal", + "productionPath", + "successSignal", + "outOfScopeFiles", + ], + additionalProperties: false, + }, + postImplementation: { + type: "object", + description: + "Post-Implementation Counterexample Review (write before finishing implement_task).", + properties: { + counterexample: TDD_SECTION_FIELD( + "A concrete counterexample considered against the implementation.", + ), + boundaryValue: TDD_SECTION_FIELD("Boundary/edge value checked."), + oppositeCase: TDD_SECTION_FIELD("The opposite/negative case checked."), + regressionRisk: TDD_SECTION_FIELD( + "Regression risk and how it was mitigated or verified.", + ), + scopeCheck: TDD_SECTION_FIELD( + "Confirmation that changes stayed within the task scope.", + ), + action: TDD_SECTION_FIELD( + "Action taken from this review, or why none.", + ), + }, + required: [ + "counterexample", + "boundaryValue", + "oppositeCase", + "regressionRisk", + "scopeCheck", + "action", + ], + additionalProperties: false, + }, + mergeScopeAudit: { + type: "object", + description: "Task Merge Scope Audit (write before merge_task_to_plan).", + properties: { + acceptanceCriteriaCovered: TDD_SECTION_FIELD( + "Evidence each acceptance criterion is covered.", + ), + changedFilesMatchScope: TDD_SECTION_FIELD( + "Confirmation changed files match the task scope.", + ), + testsRun: TDD_SECTION_FIELD("The tests run and their result."), + debugRemoved: TDD_SECTION_FIELD( + "Confirmation temporary debug/instrumentation was removed.", + ), + commitMessageMatchesBehavior: TDD_SECTION_FIELD( + "Confirmation the commit message matches the behavior change.", + ), + branchDriftCheck: TDD_SECTION_FIELD( + "Confirmation the task branch did not drift from the plan branch.", + ), + }, + required: [ + "acceptanceCriteriaCovered", + "changedFilesMatchScope", + "testsRun", + "debugRemoved", + "commitMessageMatchesBehavior", + "branchDriftCheck", + ], + additionalProperties: false, + }, + }, + additionalProperties: false, +} as const; + const DOUBT_REVIEW_TOOL_PARAMETERS = { type: "object", properties: { @@ -2477,6 +2626,36 @@ function registerPlannerTools( }); } + for (const toolName of PLANNER_ARTIFACT_TOOL_NAMES) { + pi.registerTool({ + name: toolName, + label: artifactToolLabel(toolName), + description: artifactToolDescription(toolName), + promptSnippet: artifactToolPromptSnippet(toolName), + parameters: artifactToolParameters(toolName) as never, + async execute(_toolCallId, params, _signal, _onUpdate, ctx) { + const fs = createNodeFs(); + const projectPaths = await createRuntimeProjectPaths(ctx.cwd); + await recordPlannerToolActivityForProject({ + fs, + projectPaths, + now: Date.now(), + }); + const result = await executePlannerArtifactTool({ + fs, + git: new NodeGitRunner(), + projectPaths, + toolName, + params, + }); + return { + content: [{ type: "text", text: result.text }], + details: result, + }; + }, + }); + } + pi.registerTool({ name: "planner_skill_create", label: "Planner Skill Create", @@ -3119,6 +3298,58 @@ function doubtToolParameters(toolName: PlannerDoubtReviewToolName) { } } +function artifactToolLabel(toolName: PlannerArtifactToolName): string { + switch (toolName) { + case "planner_plan_submit": + return "Planner Plan Submit"; + case "planner_discovery_submit": + return "Planner Discovery Submit"; + case "planner_tdd_submit": + return "Planner TDD Submit"; + case "planner_summary_submit": + return "Planner Summary Submit"; + } +} + +function artifactToolDescription(toolName: PlannerArtifactToolName): string { + switch (toolName) { + case "planner_plan_submit": + return "Write plan.md from a single content argument during planning/draft_plan (and change-request replans)."; + case "planner_discovery_submit": + return "Write discovery.md from a body argument plus a verificationProtocol list that becomes the ## Verification Protocol section."; + case "planner_tdd_submit": + return "Fill or update tdd.md for the active task from structured section fields. Built-in edit/write cannot modify tdd.md."; + case "planner_summary_submit": + return "Write final_summary.md from a single content argument during finalize/write_final_summary."; + } +} + +function artifactToolPromptSnippet(toolName: PlannerArtifactToolName): string { + switch (toolName) { + case "planner_plan_submit": + return "Use planner_plan_submit to save plan.md instead of hand-formatting the file."; + case "planner_discovery_submit": + return "Use planner_discovery_submit so the ## Verification Protocol section is always well-formed."; + case "planner_tdd_submit": + return "Use planner_tdd_submit to fill tdd.md section by section; it validates the required fields immediately."; + case "planner_summary_submit": + return "Use planner_summary_submit to save final_summary.md."; + } +} + +function artifactToolParameters(toolName: PlannerArtifactToolName) { + switch (toolName) { + case "planner_plan_submit": + return PLAN_SUBMIT_TOOL_PARAMETERS; + case "planner_discovery_submit": + return DISCOVERY_SUBMIT_TOOL_PARAMETERS; + case "planner_tdd_submit": + return TDD_SUBMIT_TOOL_PARAMETERS; + case "planner_summary_submit": + return SUMMARY_SUBMIT_TOOL_PARAMETERS; + } +} + function debugToolLabel(toolName: PlannerDebugToolName): string { switch (toolName) { case "planner_debug_strategy": diff --git a/src/runtime/artifact-tools.ts b/src/runtime/artifact-tools.ts new file mode 100644 index 0000000..d1cd9bf --- /dev/null +++ b/src/runtime/artifact-tools.ts @@ -0,0 +1,195 @@ +import { join } from "node:path"; +import type { GitRunner } from "../git/runner"; +import type { PlannerFs } from "../storage/fs"; +import type { ProjectStoragePaths } from "../storage/paths"; +import { + checkPlannerOrchestratorToolAllowed, + runPlannerOrchestrator, +} from "./orchestrator"; +import { + mergeTddMarkdown, + renderTddSection, + TDD_SECTIONS, + type TddSectionKey, +} from "./tdd-form"; + +export const PLANNER_ARTIFACT_TOOL_NAMES = [ + "planner_plan_submit", + "planner_discovery_submit", + "planner_tdd_submit", + "planner_summary_submit", +] as const; + +export type PlannerArtifactToolName = + (typeof PLANNER_ARTIFACT_TOOL_NAMES)[number]; + +export interface PlannerArtifactToolExecutionResult { + status: "applied" | "blocked"; + toolName: PlannerArtifactToolName; + text: string; + details: { path: string } | null; +} + +export async function executePlannerArtifactTool(input: { + fs: PlannerFs; + git: GitRunner; + projectPaths: ProjectStoragePaths; + toolName: PlannerArtifactToolName; + params: unknown; +}): Promise { + const orchestrator = await runPlannerOrchestrator(input); + if (orchestrator.preflight.context.status !== "ready") { + return blocked(input.toolName, orchestrator.preflight.context.reason); + } + const policy = checkPlannerOrchestratorToolAllowed({ + orchestrator, + toolName: input.toolName, + }); + if (!policy.allow) { + return blocked( + input.toolName, + policy.reason ?? `Planner artifact tool ${input.toolName} is blocked.`, + ); + } + + const { state, planPaths } = orchestrator.preflight.context; + try { + const params = asObject(input.params); + switch (input.toolName) { + case "planner_plan_submit": { + const content = requiredString(params, "content"); + await input.fs.writeTextAtomic(planPaths.planMd, `${content}\n`); + return applied(input.toolName, planPaths.planMd, "Planner plan saved."); + } + case "planner_discovery_submit": { + const body = requiredString(params, "body"); + const protocol = requiredStringArray(params, "verificationProtocol"); + const content = [ + body, + "", + "## Verification Protocol", + ...protocol.map((command) => `- ${command}`), + ].join("\n"); + await input.fs.writeTextAtomic(planPaths.discoveryMd, `${content}\n`); + return applied( + input.toolName, + planPaths.discoveryMd, + "Planner discovery saved with a ## Verification Protocol section.", + ); + } + case "planner_summary_submit": { + const content = requiredString(params, "content"); + const summaryPath = join(planPaths.planDir, "final_summary.md"); + await input.fs.writeTextAtomic(summaryPath, `${content}\n`); + return applied( + input.toolName, + summaryPath, + "Planner final summary saved.", + ); + } + case "planner_tdd_submit": { + if (!state.activeTaskId) { + return blocked( + input.toolName, + "planner_tdd_submit requires an active task. Prepare exactly one task branch first.", + ); + } + const updates = renderTddUpdates(params); + if (Object.keys(updates).length === 0) { + return blocked( + input.toolName, + `Provide at least one section: ${TDD_SECTIONS.map((s) => s.key).join(", ")}.`, + ); + } + const tddPath = join(planPaths.tasksDir, state.activeTaskId, "tdd.md"); + const existing = (await input.fs.exists(tddPath)) + ? await input.fs.readText(tddPath) + : ""; + const content = mergeTddMarkdown(existing, updates); + await input.fs.writeTextAtomic(tddPath, content); + return applied( + input.toolName, + tddPath, + `Planner tdd.md updated (${Object.keys(updates).join(", ")}).`, + ); + } + } + } catch (error) { + return blocked(input.toolName, errorMessage(error)); + } +} + +function renderTddUpdates( + params: Record, +): Partial> { + const updates: Partial> = {}; + for (const def of TDD_SECTIONS) { + const raw = params[def.key]; + if (raw === undefined || raw === null) { + continue; + } + updates[def.key] = renderTddSection( + def, + asObject(raw) as Record, + ); + } + return updates; +} + +function applied( + toolName: PlannerArtifactToolName, + path: string, + headline: string, +): PlannerArtifactToolExecutionResult { + return { + status: "applied", + toolName, + text: [ + headline, + `Artifact: ${path}`, + "Continue the current step; the next-step hint follows after planner_finish_step.", + ].join("\n"), + details: { path }, + }; +} + +function blocked( + toolName: PlannerArtifactToolName, + text: string, +): PlannerArtifactToolExecutionResult { + return { status: "blocked", toolName, text, details: null }; +} + +function asObject(value: unknown): Record { + return value && typeof value === "object" && !Array.isArray(value) + ? (value as Record) + : {}; +} + +function requiredString(params: Record, key: string): string { + const value = params[key]; + if (typeof value !== "string" || value.trim().length === 0) { + throw new TypeError(`${key} must be a non-empty string.`); + } + return value.trim(); +} + +function requiredStringArray( + params: Record, + key: string, +): string[] { + const value = params[key]; + if (!Array.isArray(value) || value.length === 0) { + throw new TypeError(`${key} must be a non-empty array of strings.`); + } + return value.map((entry, index) => { + if (typeof entry !== "string" || entry.trim().length === 0) { + throw new TypeError(`${key}[${index}] must be a non-empty string.`); + } + return entry.trim(); + }); +} + +function errorMessage(error: unknown): string { + return error instanceof Error ? error.message : String(error); +} diff --git a/src/runtime/dashboard.ts b/src/runtime/dashboard.ts index 2cf8164..158135a 100644 --- a/src/runtime/dashboard.ts +++ b/src/runtime/dashboard.ts @@ -190,17 +190,21 @@ export async function openPlannerWorkspace( // Render as a fixed top overlay so the workspace does not live in the // chat scrollback (mouse-wheel no longer drags it off-screen) and the // native footer stays visible in the reserved rows below it. + // + // overlayOptions() is evaluated once at show time, so these must be + // RELATIVE values: the layout is re-resolved against the live terminal + // every render, but absolute numbers would pin the overlay to the + // startup size and never grow when the terminal is enlarged. "100%" + // width/height track the terminal both ways; margin.bottom reserves the + // footer rows. overlay: true, - overlayOptions: () => { - const rows = process.stdout.rows ?? 40; - const cols = process.stdout.columns ?? 100; - return { - width: cols, - maxHeight: Math.max(16, rows - footerReserve), - anchor: "top-left", - row: 0, - col: 0, - }; + overlayOptions: { + width: "100%", + maxHeight: "100%", + anchor: "top-left", + row: 0, + col: 0, + margin: { bottom: footerReserve }, }, }, ); @@ -437,7 +441,11 @@ class PlannerWorkspaceComponent implements Component { ? `${this.model.stage}/${this.model.step}/${this.model.stepStatus}/${this.model.tasksDone}/${this.model.tasksTotal}` : "unavailable"; const uiSig = `${this.focus}|${this.input}|${this.cursor}|${this.ui.selectedIndex}|${this.atBottom}:${this.topLine}|${this.expandAll}|${this.hideThinking}`; - return `${clock}#${rowsSig}#${modelSig}#${uiSig}`; + // Terminal size is part of the signature so a resize triggers a redraw. + // Without it the idle tick keeps the same signature after a resize and + // never calls requestRender(), leaving the overlay frozen at the old size. + const termSig = `${process.stdout.columns ?? 0}x${process.stdout.rows ?? 0}`; + return `${clock}#${rowsSig}#${modelSig}#${uiSig}#${termSig}`; } private scheduleRender(signature = this.computeSignature()): void { diff --git a/src/runtime/next-step-hint.test.ts b/src/runtime/next-step-hint.test.ts new file mode 100644 index 0000000..e974b89 --- /dev/null +++ b/src/runtime/next-step-hint.test.ts @@ -0,0 +1,96 @@ +import { describe, expect, it } from "vitest"; +import { + createInitialPlanState, + type PlannerStage, + type PlannerStep, + type PlanStateRecord, +} from "../storage/schema"; +import { buildNextStepHint } from "./next-step-hint"; + +describe("buildNextStepHint", () => { + it("states the worktree location, current step, and goal", () => { + const hint = buildNextStepHint( + state({ stage: "planning", step: "draft_plan", stepStatus: "running" }), + ); + expect(hint).toContain( + "worktree `/repo/app/.pi/pi-code-planner/worktrees/plan-a`", + ); + expect(hint).toContain("Now: planning/draft_plan (running)."); + expect(hint).toContain("Goal: Draft an executable plan."); + expect(hint).toContain("If unsure, re-read `planning.md`"); + }); + + it("names the planner wrappers allowed at the current step", () => { + const hint = buildNextStepHint( + state({ + stage: "discovery", + step: "scan_project_structure", + stepStatus: "running", + }), + ); + expect(hint).toContain("Tools allowed now:"); + expect(hint).toContain("planner_discovery_submit"); + }); + + it("uses the linear nextInstruction when there is a single next position", () => { + const hint = buildNextStepHint( + state({ stage: "planning", step: "draft_plan", stepStatus: "running" }), + ); + expect(hint).toContain( + "Next: Call planner_finish_step to open split_tasks.", + ); + }); + + it("lists both targets and marks the fix loop at run_final_tests", () => { + const hint = buildNextStepHint( + state({ + stage: "execution", + step: "run_final_tests", + stepStatus: "running", + }), + ); + expect(hint).toContain("call planner_finish_step and choose ONE target"); + expect(hint).toContain("{stage: 'execution', step: 'capture_skill'}"); + expect(hint).toContain( + "{stage: 'execution', step: 'implement_task'} (loops back)", + ); + }); + + it("guides toward request_compact on an enabled compact step", () => { + const hint = buildNextStepHint( + state({ + stage: "planning", + step: "compact_planning", + stepStatus: "running", + }), + ); + expect(hint).toContain("planner_request_compact"); + }); + + it("skips compact guidance when the boundary is disabled", () => { + // compact_task uses the task boundary, which defaults to disabled. + const hint = buildNextStepHint( + state({ + stage: "execution", + step: "compact_task", + stepStatus: "running", + }), + ); + expect(hint).not.toContain("planner_request_compact"); + expect(hint).toContain("select_next_task"); + }); +}); + +function state(input: Partial = {}): PlanStateRecord { + return { + ...createInitialPlanState({ + baseBranch: "main", + planBranch: "plan/plan-a", + worktreePath: "/repo/app/.pi/pi-code-planner/worktrees/plan-a", + }), + ...input, + } satisfies PlanStateRecord & { + stage: PlannerStage; + step: PlannerStep; + }; +} diff --git a/src/runtime/next-step-hint.ts b/src/runtime/next-step-hint.ts new file mode 100644 index 0000000..f925e19 --- /dev/null +++ b/src/runtime/next-step-hint.ts @@ -0,0 +1,116 @@ +import { getAllowedPlannerWrapperTools } from "../guard/tool-policy"; +import { + PLANNER_STAGE_STEPS, + type PlannerStage, + type PlanStateRecord, +} from "../storage/schema"; +import { + getAllowedNextPlannerPositions, + getPlannerStepStage, + isPlannerCompactEnabled, + type PlannerPosition, +} from "./state-machine"; +import { getPlannerStepRule, type PlannerStepRule } from "./status"; + +// Stage order used only to label a branch target as a backward/fix move. +const STAGE_ORDER: PlannerStage[] = [ + "init", + "intake", + "discovery", + "planning", + "execution", + "finalize", + "done", + "recovery", +]; + +/** + * Compact, model-facing guidance for the step the planner has just moved into. + * + * It answers the same questions as planner_status — where am I, what is the + * goal, what do I do next — but is deliberately lighter: no Stage Behavior, + * no allowed-tool dumps, no full instruction body. The full status remains the + * heavier source of truth; this keeps the model moving without calling it after + * every transition. + */ +export function buildNextStepHint(state: PlanStateRecord): string { + const rule = safeRule(state); + const lines: string[] = [ + `You are in worktree \`${state.worktreePath ?? "(none)"}\` (branch \`${state.currentBranch ?? "(unknown)"}\`) — work here.`, + `Now: ${state.stage}/${state.step} (${state.stepStatus}).`, + ]; + if (rule) { + lines.push(`Goal: ${rule.objective}`); + const firstAction = rule.requiredActions[0]; + if (firstAction) { + lines.push(`Do now: ${firstAction}`); + } + lines.push(`Exit when: ${rule.exitCondition}`); + } + lines.push(`Next: ${describeNextMove(state, rule?.nextInstruction)}`); + const tools = getAllowedPlannerWrapperTools(state); + if (tools.length > 0) { + // Name the wrappers permitted at this exact step so the model reaches for + // the right planner tool instead of a built-in write/edit. + lines.push(`Tools allowed now: ${tools.join(", ")}.`); + } + lines.push( + `If unsure, re-read \`${getPlannerStepStage(state.step)}.md\` or call planner_status.`, + ); + return lines.join("\n"); +} + +function describeNextMove( + state: PlanStateRecord, + fallback: string | undefined, +): string { + if (state.step.startsWith("compact_") && isPlannerCompactEnabled(state)) { + return "Prepare the compact boundary summary, then call planner_request_compact."; + } + const allowed = safeAllowedNext(state); + if (allowed.length > 1) { + const targets = allowed + .map( + (position) => + `{stage: '${position.stage}', step: '${position.step}'}${ + isBackwardMove(state, position) ? " (loops back)" : "" + }`, + ) + .join(" or "); + return `When the exit condition holds, call planner_finish_step and choose ONE target: ${targets}.`; + } + return fallback ?? "When the exit condition holds, call planner_finish_step."; +} + +function isBackwardMove( + state: PlanStateRecord, + target: PlannerPosition, +): boolean { + const currentStageIndex = STAGE_ORDER.indexOf(state.stage); + const targetStageIndex = STAGE_ORDER.indexOf(target.stage); + if (targetStageIndex !== currentStageIndex) { + return targetStageIndex < currentStageIndex; + } + const steps = PLANNER_STAGE_STEPS[state.stage] as readonly string[]; + return steps.indexOf(target.step) < steps.indexOf(state.step); +} + +function safeRule(state: PlanStateRecord): PlannerStepRule | null { + try { + return getPlannerStepRule({ stage: state.stage, step: state.step }); + } catch { + return null; + } +} + +function safeAllowedNext(state: PlanStateRecord): PlannerPosition[] { + try { + return getAllowedNextPlannerPositions({ + stage: state.stage, + step: state.step, + creationMethod: state.creationMethod, + }); + } catch { + return []; + } +} diff --git a/src/runtime/simplified-flow.test.ts b/src/runtime/simplified-flow.test.ts index 0305771..547a431 100644 --- a/src/runtime/simplified-flow.test.ts +++ b/src/runtime/simplified-flow.test.ts @@ -64,6 +64,7 @@ describe("simplified local-model workflow", () => { ).toEqual([ "planner_status", "planner_git_inspect", + "planner_discovery_submit", "planner_contract_scan", "planner_contract_route", "planner_contract_read", diff --git a/src/runtime/status.ts b/src/runtime/status.ts index 4b88050..1a124ea 100644 --- a/src/runtime/status.ts +++ b/src/runtime/status.ts @@ -874,66 +874,47 @@ export async function buildPlannerStatusText( }) : []; lines.push( - `- plan: ${preflight.context.activePlanId}`, - `- plan title: ${preflight.context.plan.title}`, - `- creationMethod: ${state.creationMethod ?? "create"}`, - `- compatibilityMode: ${state.compatibilityMode ?? "additive"}`, + "", + `You are in worktree \`${state.worktreePath ?? "(none)"}\` (branch \`${preflight.gitReality?.branch ?? state.currentBranch ?? "(unknown)"}\`) — work here.`, + "Re-read the Current Stage Instruction below before acting.", + "", + "## Position", + `- plan: ${preflight.context.activePlanId} — ${preflight.context.plan.title}`, `- stage: ${state.stage}`, `- step: ${state.step}`, `- stepStatus: ${state.stepStatus}`, - `- nextStep: ${state.nextStep ?? "(none)"}`, `- activeTaskId: ${state.activeTaskId ?? "(none)"}`, - `- questionsSubmitted: ${String(state.questionsSubmitted)}`, - `- questionsResolved: ${String(state.questionsResolved)}`, - `- compactBoundaries: ${JSON.stringify(state.compactBoundaries)}`, - `- lastPlannerToolCallAt: ${state.lastPlannerToolCallAt ?? "(none)"}`, - `- lastIdleWakeAt: ${state.lastIdleWakeAt ?? "(none)"}`, - `- idleWakeInFlight: ${String(state.idleWakeInFlight)}`, - `- lastStuckAttemptId: ${state.lastStuckAttemptId ?? "(none)"}`, - `- lastStuckReportPath: ${state.lastStuckReportPath ?? "(none)"}`, - `- debugSessionId: ${state.debugSessionId ?? "(none)"}`, - `- debugArtifactsDir: ${state.debugArtifactsDir ?? "(none)"}`, - `- debugCleanupRequired: ${String(state.debugCleanupRequired)}`, - `- requiresCompact: ${String(state.requiresCompact)}`, - `- requiresUserDecision: ${String(state.requiresUserDecision)}`, - `- broken: ${String(state.broken)}`, - `- blockedReason: ${state.blockedReason ?? "(none)"}`, "", - "## Effective Settings", - `- metadata.humanLanguage: ${settings.effective.metadata.humanLanguage}`, - `- metadata.titleLanguage: ${settings.effective.metadata.titleLanguage}`, - `- metadata.descriptionLanguage: ${settings.effective.metadata.descriptionLanguage}`, - `- metadata.commitLanguage: ${settings.effective.metadata.commitLanguage}`, - `- metadata.doubtReviewLanguage: ${settings.effective.metadata.doubtReviewLanguage}`, - `- metadata.skillLanguage: ${settings.effective.metadata.skillLanguage}`, - `- idle.enabled: ${String(settings.effective.idle.enabled)}`, - `- idle.timeoutMinutes: ${settings.effective.idle.timeoutMinutes}`, - `- skills.enabled: ${String(settings.effective.skills.enabled)}`, - `- skills.maxActive: ${settings.effective.skills.maxActive}`, - `- contracts.enabled: ${String(settings.effective.contracts.enabled)}`, - `- contracts.finalPolicy: ${settings.effective.contracts.finalPolicy}`, - `- contracts.scanBatchSize: ${settings.effective.contracts.scanBatchSize}`, - `- contracts.readChunkChars: ${settings.effective.contracts.readChunkChars}`, + "## Languages", + `- human: ${settings.effective.metadata.humanLanguage}`, + `- title: ${settings.effective.metadata.titleLanguage}`, + `- description: ${settings.effective.metadata.descriptionLanguage}`, + `- commit: ${settings.effective.metadata.commitLanguage}`, + `- doubtReview: ${settings.effective.metadata.doubtReviewLanguage}`, + `- skill: ${settings.effective.metadata.skillLanguage}`, "", - "## Git And Worktree", + "## Git", `- worktree: ${state.worktreePath ?? "(none)"}`, `- worktreeExists: ${String(preflight.worktreeExists)}`, - `- expectedCurrentBranch: ${state.currentBranch ?? "(none)"}`, + `- expectedBranch: ${state.currentBranch ?? "(none)"}`, `- actualBranch: ${preflight.gitReality?.branch ?? "(unavailable)"}`, - `- actualHEAD: ${preflight.gitReality?.headCommit ?? "(unavailable)"}`, `- dirty: ${preflight.gitReality ? String(preflight.gitReality.isDirty) : "(unavailable)"}`, `- conflicts: ${preflight.gitReality ? String(preflight.gitReality.hasConflicts) : "(unavailable)"}`, - `- activeBranches: ${JSON.stringify(state.activeBranches)}`, - `- mergeTargets: ${JSON.stringify(state.mergeTargets)}`, - "", - "## Debug Mode", - ...formatDebugStatusLines(state), "", + // Debug section is only relevant while a debug session is active; hide it + // otherwise so the small model is not distracted by "inactive" noise. + ...(state.debugArtifactsDir + ? ["## Debug Mode", ...formatDebugStatusLines(state), ""] + : []), "## Planner Local Contracts", + // Keep only the actionable lines (enabled flag + guidance); the verbose + // traversal/pending/summary dump is internal state the model does not need. ...formatPlannerContractsStatus({ state, settings: settings.effective.contracts, - }), + }).filter( + (line) => line.startsWith("- enabled:") || line.startsWith("- guidance:"), + ), "", ...(state.stage === "finalize" && preflight.planPaths ? await formatDoubtReviewProtocolSection( diff --git a/src/runtime/tdd-evidence.ts b/src/runtime/tdd-evidence.ts index e0c8783..9f7423b 100644 --- a/src/runtime/tdd-evidence.ts +++ b/src/runtime/tdd-evidence.ts @@ -1,13 +1,18 @@ import type { PlannerFs } from "../storage/fs"; -const PRE_IMPLEMENTATION_FIELDS = [ +export const PRE_IMPLEMENTATION_SECTION = "Pre-Implementation Proof Contract"; +export const POST_IMPLEMENTATION_SECTION = + "Post-Implementation Counterexample Review"; +export const MERGE_SCOPE_AUDIT_SECTION = "Task Merge Scope Audit"; + +export const PRE_IMPLEMENTATION_FIELDS = [ "failingSignal", "productionPath", "successSignal", "outOfScopeFiles", ] as const; -const POST_IMPLEMENTATION_FIELDS = [ +export const POST_IMPLEMENTATION_FIELDS = [ "counterexample", "boundaryValue", "oppositeCase", @@ -16,7 +21,7 @@ const POST_IMPLEMENTATION_FIELDS = [ "action", ] as const; -const MERGE_SCOPE_AUDIT_FIELDS = [ +export const MERGE_SCOPE_AUDIT_FIELDS = [ "acceptanceCriteriaCovered", "changedFilesMatchScope", "testsRun", @@ -25,6 +30,11 @@ const MERGE_SCOPE_AUDIT_FIELDS = [ "branchDriftCheck", ] as const; +/** True when the value is empty or a known placeholder token. */ +export function isTddPlaceholder(value: string): boolean { + return isPlaceholder(value); +} + export async function validatePreImplementationProofContract( fs: PlannerFs, path: string, @@ -32,7 +42,7 @@ export async function validatePreImplementationProofContract( return await validateTddSection({ fs, path, - section: "Pre-Implementation Proof Contract", + section: PRE_IMPLEMENTATION_SECTION, fields: PRE_IMPLEMENTATION_FIELDS, }); } @@ -44,7 +54,7 @@ export async function validatePostImplementationCounterexampleReview( return await validateTddSection({ fs, path, - section: "Post-Implementation Counterexample Review", + section: POST_IMPLEMENTATION_SECTION, fields: POST_IMPLEMENTATION_FIELDS, }); } @@ -56,7 +66,7 @@ export async function validateTaskMergeScopeAudit( return await validateTddSection({ fs, path, - section: "Task Merge Scope Audit", + section: MERGE_SCOPE_AUDIT_SECTION, fields: MERGE_SCOPE_AUDIT_FIELDS, }); } diff --git a/src/runtime/tdd-form.test.ts b/src/runtime/tdd-form.test.ts new file mode 100644 index 0000000..81d1aeb --- /dev/null +++ b/src/runtime/tdd-form.test.ts @@ -0,0 +1,83 @@ +import { describe, expect, it } from "vitest"; +import { mergeTddMarkdown, renderTddSection, TDD_SECTIONS } from "./tdd-form"; + +const PRE = TDD_SECTIONS[0]; +const POST = TDD_SECTIONS[1]; + +const preValues = { + failingSignal: "expect(parse()).toThrow", + productionPath: "src/parse.ts", + successSignal: "test passes", + outOfScopeFiles: "src/ui/*", +}; + +const postValues = { + counterexample: "empty input", + boundaryValue: "0-length array", + oppositeCase: "valid input still works", + regressionRisk: "covered by existing suite", + scopeCheck: "only src/parse.ts changed", + action: "added guard clause", +}; + +describe("renderTddSection", () => { + it("renders a heading and one bullet per field", () => { + const block = renderTddSection(PRE, preValues); + expect(block).toContain("## Pre-Implementation Proof Contract"); + expect(block).toContain("- failingSignal: expect(parse()).toThrow"); + expect(block).toContain("- outOfScopeFiles: src/ui/*"); + }); + + it("rejects a missing field", () => { + expect(() => + renderTddSection(PRE, { ...preValues, productionPath: "" }), + ).toThrow(/productionPath/); + }); + + it("rejects a placeholder value", () => { + expect(() => + renderTddSection(PRE, { ...preValues, successSignal: "TODO" }), + ).toThrow(/placeholder/); + }); +}); + +describe("mergeTddMarkdown", () => { + it("accumulates sections across incremental submits in canonical order", () => { + const afterPre = mergeTddMarkdown("", { + preImplementation: renderTddSection(PRE, preValues), + }); + const afterPost = mergeTddMarkdown(afterPre, { + postImplementation: renderTddSection(POST, postValues), + }); + expect(afterPost).toContain("## Pre-Implementation Proof Contract"); + expect(afterPost).toContain("## Post-Implementation Counterexample Review"); + expect(afterPost.indexOf("Pre-Implementation")).toBeLessThan( + afterPost.indexOf("Post-Implementation"), + ); + }); + + it("replaces an existing section instead of duplicating it", () => { + const first = mergeTddMarkdown("", { + preImplementation: renderTddSection(PRE, preValues), + }); + const second = mergeTddMarkdown(first, { + preImplementation: renderTddSection(PRE, { + ...preValues, + failingSignal: "new signal", + }), + }); + expect(second.match(/## Pre-Implementation Proof Contract/g)).toHaveLength( + 1, + ); + expect(second).toContain("- failingSignal: new signal"); + }); + + it("preserves unknown sections from an external edit", () => { + const external = "## Notes\n- hand-written note\n"; + const merged = mergeTddMarkdown(external, { + preImplementation: renderTddSection(PRE, preValues), + }); + expect(merged).toContain("## Notes"); + expect(merged).toContain("- hand-written note"); + }); +}); diff --git a/src/runtime/tdd-form.ts b/src/runtime/tdd-form.ts new file mode 100644 index 0000000..75a0764 --- /dev/null +++ b/src/runtime/tdd-form.ts @@ -0,0 +1,135 @@ +import { + isTddPlaceholder, + MERGE_SCOPE_AUDIT_FIELDS, + MERGE_SCOPE_AUDIT_SECTION, + POST_IMPLEMENTATION_FIELDS, + POST_IMPLEMENTATION_SECTION, + PRE_IMPLEMENTATION_FIELDS, + PRE_IMPLEMENTATION_SECTION, +} from "./tdd-evidence"; + +export type TddSectionKey = + | "preImplementation" + | "postImplementation" + | "mergeScopeAudit"; + +interface TddSectionDef { + key: TddSectionKey; + title: string; + fields: readonly string[]; +} + +// Canonical order matches the lifecycle: proof contract first, counterexample +// review after implementation, merge audit before the task lands. +export const TDD_SECTIONS: readonly TddSectionDef[] = [ + { + key: "preImplementation", + title: PRE_IMPLEMENTATION_SECTION, + fields: PRE_IMPLEMENTATION_FIELDS, + }, + { + key: "postImplementation", + title: POST_IMPLEMENTATION_SECTION, + fields: POST_IMPLEMENTATION_FIELDS, + }, + { + key: "mergeScopeAudit", + title: MERGE_SCOPE_AUDIT_SECTION, + fields: MERGE_SCOPE_AUDIT_FIELDS, + }, +]; + +export type TddSectionValues = Record; + +/** + * Validate and render one tdd.md section from structured field values. Throws + * TypeError with an actionable message when a required field is missing or a + * placeholder, so the model gets immediate feedback at submit time. + */ +export function renderTddSection( + def: TddSectionDef, + values: TddSectionValues, +): string { + const lines = [`## ${def.title}`]; + for (const field of def.fields) { + const raw = values[field]; + if (typeof raw !== "string" || raw.trim().length === 0) { + throw new TypeError( + `${def.title}: field "${field}" must be a non-empty string. Required fields: ${def.fields.join(", ")}.`, + ); + } + const value = raw.trim(); + if (isTddPlaceholder(value)) { + throw new TypeError( + `${def.title}: field "${field}" must contain concrete evidence, not a placeholder ("${value}").`, + ); + } + lines.push(`- ${field}: ${value}`); + } + return lines.join("\n"); +} + +/** + * Merge the provided sections into existing tdd.md content, preserving any + * sections not being updated. Known sections are emitted in canonical order; + * unknown sections (e.g. from an external edit) are kept after them. + */ +export function mergeTddMarkdown( + existing: string, + updates: Partial>, +): string { + const parsed = parseLevelTwoSections(existing); + for (const def of TDD_SECTIONS) { + const rendered = updates[def.key]; + if (rendered !== undefined) { + parsed.set(def.title, rendered); + } + } + + const knownTitles = new Set(TDD_SECTIONS.map((def) => def.title)); + const blocks: string[] = []; + for (const def of TDD_SECTIONS) { + const block = parsed.get(def.title); + if (block) { + blocks.push(block.trimEnd()); + } + } + for (const [title, body] of parsed) { + if (!knownTitles.has(title)) { + blocks.push(`## ${title}\n${body}`.trimEnd()); + } + } + return `${blocks.join("\n\n")}\n`; +} + +/** + * Parse `## Heading` blocks, returning each section's full rendered text + * (heading line included) keyed by heading. Mirrors the parser used by the + * tdd-evidence validators so the round-trip stays consistent. + */ +function parseLevelTwoSections(text: string): Map { + const sections = new Map(); + let currentTitle: string | null = null; + let buffer: string[] = []; + + const flush = () => { + if (currentTitle !== null) { + sections.set(currentTitle, [`## ${currentTitle}`, ...buffer].join("\n")); + } + }; + + for (const line of text.split(/\r?\n/)) { + const heading = /^##\s+(.+?)\s*$/.exec(line); + if (heading) { + flush(); + currentTitle = heading[1].trim(); + buffer = []; + continue; + } + if (currentTitle !== null) { + buffer.push(line); + } + } + flush(); + return sections; +} diff --git a/src/runtime/workflow-tools.ts b/src/runtime/workflow-tools.ts index 700069b..88c5fe4 100644 --- a/src/runtime/workflow-tools.ts +++ b/src/runtime/workflow-tools.ts @@ -25,6 +25,7 @@ import { decidePlannerLifecycleNext, type PlannerLifecycleDecision, } from "./lifecycle"; +import { buildNextStepHint } from "./next-step-hint"; import { checkPlannerOrchestratorToolAllowed, runPlannerOrchestrator, @@ -90,7 +91,7 @@ export async function executePlannerWorkflowTool( state: null, }; return { - text: formatWorkflowToolResult(result, null), + text: formatWorkflowToolResult(result), transition, result, lifecycle: null, @@ -116,7 +117,7 @@ export async function executePlannerWorkflowTool( state, }; return { - text: formatWorkflowToolResult(result, null), + text: formatWorkflowToolResult(result), transition, result, lifecycle: null, @@ -140,7 +141,7 @@ export async function executePlannerWorkflowTool( state, }; return { - text: formatWorkflowToolResult(result, null), + text: formatWorkflowToolResult(result), transition, result, lifecycle: null, @@ -165,7 +166,7 @@ export async function executePlannerWorkflowTool( } return { - text: formatWorkflowToolResult(result, lifecycle), + text: formatWorkflowToolResult(result), transition, result, lifecycle, @@ -672,7 +673,6 @@ function isPlannerStepInStage( function formatWorkflowToolResult( result: PlannerStateTransitionResult, - lifecycle: PlannerLifecycleDecision | null, ): string { if (result.status === "blocked") { return [ @@ -688,24 +688,14 @@ function formatWorkflowToolResult( .join("\n"); } - const lines: string[] = [ - `Planner transition applied: ${result.transition.type}`, - `Previous: ${result.previousState.stage}/${result.previousState.step} (${result.previousState.stepStatus})`, - `Current: ${result.state.stage}/${result.state.step} (${result.state.stepStatus})`, - ...(result.state.nextStep ? [`Next step: ${result.state.nextStep}`] : []), - ]; - - if ( - result.transition.type === "start_step" && - lifecycle && - lifecycle.requiredTool - ) { - lines.push(`Recommended finish tool: ${lifecycle.requiredTool}`); - } - - lines.push("Call planner_status before choosing the next planner action."); - - return lines.filter(Boolean).join("\n"); + // The applied result reflects the step we just moved INTO, so build the + // guidance from the post-transition state. This replaces the old generic + // "Call planner_status" footer with a compact, actionable hint. + return [ + `Planner transition applied: ${result.transition.type} (previous: ${result.previousState.stage}/${result.previousState.step}).`, + "", + buildNextStepHint(result.state), + ].join("\n"); } function asObject(value: unknown): Record {