Merged
242 changes: 221 additions & 21 deletions README.md
@@ -6,7 +6,7 @@
<img src="assets/icon.webp" alt="pi-code-planner icon" width="120">
</p>

-An experimental [Pi](https://github.com/badlogic/pi-mono) extension for local coding models. Adds a persisted state machine so long tasks survive context compaction, Git branching, and approval steps without you babysitting the session.
An experimental [Pi](https://github.com/badlogic/pi-mono) extension for local coding models. Adds a persisted state machine so long tasks survive context compaction, Git branching, and user approval steps without you babysitting the session.

However you run it, the unit that matters is **Pi + this extension driving a local model**. With that in mind, read this plainly:

@@ -34,22 +34,230 @@ Open Pi inside a Git project and run `/planner-create`.

---

-## Workflow
## State Machine Overview

The extension drives the model through a fixed sequence of stages. Each stage contains a set of steps. The model cannot skip stages, call out-of-order tools, or advance past a gate without satisfying its exit conditions.

```mermaid
flowchart TD
S([user runs /planner-create]) --> INIT

INIT["**init** — 7 steps\nbootstrap worktree and plan record"]
INTAKE["**intake** — 2 steps\nwrite and approve goal"]
DISCOVERY["**discovery** — 4 steps\nscan project, write verification protocol"]
PLANNING["**planning** — 7 steps\nwrite plan.md, split into tasks"]
EXECUTION["**execution** — 12 steps per task\nTDD → implement → contracts → refactor → merge"]
FINALIZE["**finalize** — 6 steps\nintegration check, doubt review, summary"]
DONE["**done** — 8 steps\npresent result, await user acceptance"]
RECOVERY["**recovery** — 6 steps\ndiagnose and repair broken state"]
OUT([output/&lt;plan-id&gt; branch])

INIT --> INTAKE
INTAKE --> DISCOVERY
DISCOVERY --> PLANNING
PLANNING --> EXECUTION
EXECUTION -->|"select next task"| EXECUTION
EXECUTION -->|"all tasks done"| FINALIZE
FINALIZE --> DONE
DONE -->|"/planner-finish"| OUT
DONE -->|"change request"| PLANNING

INIT & INTAKE & DISCOVERY & PLANNING & EXECUTION & FINALIZE -.->|"broken / stuck"| RECOVERY
RECOVERY -.->|"resume"| INIT
```

The model never touches the diagram directly — it only ever calls the tool for the current step, and the gate advances the pointer.
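
The edges in the diagram can also be read as a small lookup table. The sketch below is illustrative only: the stage names come from the diagram, while `NEXT_STAGES`, `RECOVERABLE_STAGES`, and `canTransition` are hypothetical names chosen for this example, not the extension's actual code.

```typescript
// Illustrative sketch. Stage names are from the diagram; the table shape and
// function name are assumptions made for this example.
type Stage =
  | "init" | "intake" | "discovery" | "planning"
  | "execution" | "finalize" | "done" | "recovery";

// Solid edges in the diagram. /planner-finish leaves the machine entirely,
// so it is not modeled as a stage-to-stage transition here.
const NEXT_STAGES: Record<Stage, readonly Stage[]> = {
  init: ["intake"],
  intake: ["discovery"],
  discovery: ["planning"],
  planning: ["execution"],
  execution: ["execution", "finalize"], // next task, or all tasks done
  finalize: ["done"],
  done: ["planning"], // change request
  recovery: ["init"], // dotted "resume" edge as drawn
};

// Stages with a dotted "broken / stuck" edge into recovery.
const RECOVERABLE_STAGES: readonly Stage[] = [
  "init", "intake", "discovery", "planning", "execution", "finalize",
];

function canTransition(from: Stage, to: Stage): boolean {
  if (to === "recovery") return RECOVERABLE_STAGES.indexOf(from) !== -1;
  return NEXT_STAGES[from].indexOf(to) !== -1;
}
```

With this table, `canTransition("intake", "planning")` is `false` because discovery sits between the two stages.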

---

## Stages and Steps

52 steps total across 8 stages.
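
As a quick arithmetic check, the per-stage counts from the section headings sum to the stated total. The constant names are made up for this example.

```typescript
// Per-stage step counts as listed in this README's section headings.
const STEP_COUNTS: ReadonlyArray<readonly [string, number]> = [
  ["init", 7],
  ["intake", 2],
  ["discovery", 4],
  ["planning", 7],
  ["execution", 12],
  ["finalize", 6],
  ["done", 8],
  ["recovery", 6],
];

// 7 + 2 + 4 + 7 + 12 + 6 + 8 + 6 = 52
const totalSteps = STEP_COUNTS.reduce((sum, entry) => sum + entry[1], 0);
```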

### init — 7 steps

Runs once when `/planner-create` is called. Fully automated — the model does not drive these steps.

| Step | What happens |
| --- | --- |
| `check_project` | Verify a Git repo exists |
| `check_git` | Ensure Git is usable; init if needed |
| `prepare_storage` | Create `.pi/pi-code-planner/` storage directories |
| `choose_worktree_location` | Select worktree path |
| `create_plan_record` | Write `plan.json` and `project.json` |
| `create_plan_worktree` | `git worktree add` for the plan branch |
| `enter_intake` | Transition to intake stage |

### intake — 2 steps

The model reads the user's request and writes a normalized goal. The user must approve the goal before discovery begins.

| Step | What happens |
| --- | --- |
| `draft_goal` | Model writes `goal.md` via `planner_goal_submit` |
| `await_goal_approval` | Model presents goal; user approves or revises via `planner_goal_decide` |

### discovery — 4 steps

The model scans the project, reads contracts, and produces a `discovery.md` artifact that persists across all future compactions.

| Step | What happens |
| --- | --- |
| `scan_project_structure` | Read files, AGENTS.md contracts, run checks; write `discovery.md` with a **Verification Protocol** (exact commands to prove work is correct) |
| `write_questions` | Write and resolve open questions in `questions.md` |
| `compact_discovery` | Compact checkpoint — Pi compacts context here |
| `enter_planning` | Transition to planning stage |

The **Verification Protocol** is critical: it locks down the exact commands (`cargo test`, `npm run ci`, etc.) that every `doubt_review` submission must show as passed. The parser enforces this: if a command is missing from the evidence, the submission is blocked.

### planning — 7 steps

The model writes a plan and splits it into atomic tasks. Each task gets its own `task.md` artifact.

| Step | What happens |
| --- | --- |
| `read_context` | Route AGENTS.md contracts relevant to the goal |
| `draft_plan` | Write `plan.md` via `planner_plan_submit` |
| `split_tasks` | Identify atomic tasks from the plan |
| `write_task_files` | Write each `task.md` via `planner_task_upsert` |
| `verify_plan` | Self-check: all tasks have acceptance criteria and scope |
| `compact_planning` | Compact checkpoint |
| `enter_execution` | Transition to execution stage |

### execution — 12 steps (repeated per task)

The main loop. Each task cycles through all 12 steps before the next task begins.

| Step | What happens |
| --- | --- |
| `prepare_task` | Create `task/<plan-id>/<task-id>` branch via `planner_git_create_task_branch` |
| `write_tdd_plan` | Write pre-implementation TDD plan: failing signal, production path, success signal |
| `write_tests` | Write or locate tests; commit via `planner_git_commit` |
| `run_failing_tests` | Verify tests fail (or exist) before implementation |
| `implement_task` | Implement; commit incrementally |
| `contract_check` | Assess AGENTS.md impact; upsert contracts if needed |
| `refactor_task` | Review changed surface for complexity and naming |
| `run_final_tests` | Run full verification protocol; tests must pass |
| `capture_skill` | Record reusable technique in skill library if warranted |
| `merge_task_to_plan` | Merge task branch into plan branch via `planner_git_merge_task_to_plan` |
| `compact_task` | Compact checkpoint |
| `select_next_task` | Pick next pending task or exit to finalize |

### finalize — 6 steps

Integration check and adversarial review of the complete plan branch before presenting to the user.

| Step | What happens |
| --- | --- |
| `verify_plan_branch` | Run full test suite on the merged plan branch |
| `compact_before_doubt` | Compact checkpoint — doubt review starts with a clean context |
| `doubt_review` | Adversarial audit via `planner_doubt_review`: every item in the Verification Protocol must appear in evidence; every possible bug must be classified as proven, needs_probe, or dismissed with proof |
| `write_final_summary` | Write `final_summary.md` |
| `compact_finalize` | Compact checkpoint |
| `enter_done` | Transition to done stage |

### done — 8 steps

The model presents the result and waits. It cannot advance to the internal export/cleanup steps on its own — those are driven exclusively by `/planner-finish`.

| Step | What happens |
| --- | --- |
| `present_result` | Model shows summary, commits, and output options |
| `await_user_acceptance` | Model waits; only the user can proceed (via `/planner-finish`) or request changes |
| `handle_change_request` | Model records corrections and returns to `planning/read_context` |
| `prepare_output_branch` | *(internal — /planner-finish only)* Create `output/<plan-id>` |
| `merge_or_export_result` | *(internal — /planner-finish only)* Merge plan branch to output |
| `cleanup_worktree` | *(internal — /planner-finish only)* Remove the plan worktree |
| `mark_done` | *(internal — /planner-finish only)* Clear active plan record |
| `cleanup_plan_files` | *(internal — /planner-finish only)* Remove plan artifacts |

> The model is explicitly blocked from calling `planner_finish_step` to enter any internal step. Attempting to simulate `/planner-finish` via tools results in a gate error.

### recovery — 6 steps

Entered automatically when the plan sets `broken=true` or `requiresUserDecision=true`. The model inspects state, classifies the problem, and either repairs or asks the user before resuming.

| Step | What happens |
| --- | --- |
| `read_state` | Read `state.json` and surface current position |
| `inspect_git` | Check branch, worktree, and diff state |
| `compare_expected_actual` | Diff expected vs actual file and branch state |
| `classify_recovery` | Determine if recovery is safe or requires user decision |
| `ask_user_if_destructive` | Present risk to user and wait for explicit approval |
| `repair_or_resume` | Apply repairs; call `planner_recovery_resume` to return |

---

## How Local Models Are Kept on Track

Running a local model on a multi-hour task without supervision requires more than good prompting. Here is what the extension actually enforces.

### 1. Persisted state — compaction survival

All plan state lives in JSON and Markdown artifacts on disk, not in chat. After every context compaction, the model calls `planner_status`, which reloads the current stage, step, and active task from `state.json`. The conversation is not the source of truth; the artifacts are. A compaction is a checkpoint, not a reset.
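
A minimal sketch of the reload step, assuming `state.json` holds at least `stage` and `step`. The `activeTaskId` field, the `loadPlanState` name, and the in-memory `files` map standing in for the disk are all hypothetical.

```typescript
// Minimal sketch, assuming state.json holds at least `stage` and `step`.
// `activeTaskId` and `loadPlanState` are hypothetical names for illustration.
interface PlanState {
  stage: string;
  step: string;
  activeTaskId: string | null;
}

// `files` stands in for the on-disk artifacts (path -> file contents).
function loadPlanState(
  files: { [path: string]: string | undefined },
  path: string,
): PlanState {
  const raw = files[path];
  if (raw === undefined) throw new Error("missing state file: " + path);
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("state file must contain a JSON object");
  }
  const record = parsed as { [key: string]: unknown };
  const stage = record["stage"];
  const step = record["step"];
  const task = record["activeTaskId"];
  if (typeof stage !== "string" || typeof step !== "string") {
    throw new Error("state file is missing stage or step");
  }
  // Nothing is read from the conversation: the position comes from the file.
  return { stage, step, activeTaskId: typeof task === "string" ? task : null };
}
```

The function takes no chat history as input, which is the property this section describes.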

### 2. Dual-gate tool allowlist

Every planner tool call passes through two independent gates composed in sequence:

- **Policy gate** (`tool-policy.ts`): For the current `{stage, step}`, returns the exact set of allowed wrapper tools. Anything not on the list is blocked before any logic runs.
- **Behavior gate** (`stage-behavior.ts`): For each step, declares `expectedTools[]`. The model cannot call a tool that the step's contract does not expect, even if the policy gate would allow it.

Both gates must pass. The model cannot call `planner_task_upsert` during `doubt_review`, `planner_doubt_review` during `planning`, or anything outside the current step's declared scope.
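
The composition of the two gates can be sketched as follows. The table entries are made-up examples for illustration, not the contents of `tool-policy.ts` or `stage-behavior.ts`, and `checkGates` is a hypothetical name.

```typescript
// Illustrative sketch of two gates composed in sequence. Table entries are
// invented examples, not the real policy or behavior tables.
type ToolTable = { [stepKey: string]: readonly string[] | undefined };

// Policy gate: wrapper tools allowed for a given "stage/step".
const POLICY_ALLOWED: ToolTable = {
  "planning/write_task_files": [
    "planner_task_upsert", "planner_contract_route", "planner_finish_step",
  ],
  "finalize/doubt_review": ["planner_doubt_review", "planner_finish_step"],
};

// Behavior gate: tools the step's contract expects (expectedTools[]).
const EXPECTED_TOOLS: ToolTable = {
  "planning/write_task_files": ["planner_task_upsert", "planner_finish_step"],
  "finalize/doubt_review": ["planner_doubt_review", "planner_finish_step"],
};

interface GateResult {
  allowed: boolean;
  blockedBy: "policy" | "behavior" | null;
}

function checkGates(stage: string, step: string, tool: string): GateResult {
  const key = stage + "/" + step;
  // Gate 1: unknown steps get an empty allowlist, so everything is blocked.
  const policy = POLICY_ALLOWED[key] ?? [];
  if (policy.indexOf(tool) === -1) return { allowed: false, blockedBy: "policy" };
  // Gate 2: runs only if the policy gate passed.
  const expected = EXPECTED_TOOLS[key] ?? [];
  if (expected.indexOf(tool) === -1) return { allowed: false, blockedBy: "behavior" };
  return { allowed: true, blockedBy: null };
}
```

In this sketch a tool that the policy table lists but the step does not expect is still rejected, which mirrors the "even if the policy gate would allow it" rule.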

### 3. Exit-condition validation

`planner_finish_step` is gated by `validateWorkflowExit`. The model cannot leave a step until its exit conditions are satisfied:

- `discovery/scan_project_structure`: `discovery.md` must contain `## Verification Protocol` with at least one command.
- `discovery/write_questions`: questions must be explicitly resolved.
- `finalize/doubt_review`: every Verification Protocol command must appear in `verificationEvidence`; no finding may have `status: possible` without a `proofLevel`.
- `done/handle_change_request`: `decisions.md`, `plan.md`, and `discovery.md` must each contain required sections.
- `done/await_user_acceptance`: `planner_finish_step` is blocked entirely unless targeting `handle_change_request` — the model must instruct the user to run `/planner-finish`.
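
The last rule above, together with the block on internal `done` steps, can be sketched as a small check. This is a hypothetical stand-in: the real logic lives in `validateWorkflowExit`, whose signature is not shown here, and `checkDoneExit` and its result shape are assumptions.

```typescript
// Hypothetical sketch of the done-stage exit rule. Names and result shape
// are assumptions; only the step names come from this README.
interface FinishParams {
  nextStage?: string;
  nextStep?: string;
}

interface ExitResult {
  status: "applied" | "blocked";
  text: string;
}

const INTERNAL_DONE_STEPS: readonly string[] = [
  "prepare_output_branch",
  "merge_or_export_result",
  "cleanup_worktree",
  "mark_done",
  "cleanup_plan_files",
];

function checkDoneExit(step: string, params: FinishParams): ExitResult {
  const blocked: ExitResult = {
    status: "blocked",
    text: "Ask the user to run /planner-finish.",
  };
  // Internal steps are driven only by /planner-finish.
  if (INTERNAL_DONE_STEPS.indexOf(step) !== -1) return blocked;
  if (step === "await_user_acceptance") {
    const toChangeRequest =
      params.nextStage === "done" && params.nextStep === "handle_change_request";
    return toChangeRequest
      ? { status: "applied", text: "Moved to done/handle_change_request." }
      : blocked;
  }
  // Other done steps are outside the scope of this sketch.
  return { status: "applied", text: "No done-stage restriction in this sketch." };
}
```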

### 4. Echo expected vs received

Every planner tool that writes structured Markdown returns two blocks in its result:

```
-user request
-→ normalize and approve goal
-→ scan AGENTS.md contracts, inspect relevant files
-→ persist discovery.md
-→ write plan.md, split into tasks
-→ for each task: write tests → implement → update AGENTS.md contracts → refactor → verify → merge
-→ verify integrated plan branch
-→ doubt_review: prove or disprove possible errors
-→ ask user to accept
-→ export output/<plan-id> branch with full task history
## Expected shape (canonical schema)
<the required section headers and field structure>

## What you submitted (saved to disk)
<verbatim content that was written>
```

-After compaction, the model calls `planner_status`, reloads from persisted JSON/Markdown artifacts, and continues. Chat is not the source of truth; artifacts are.
This applies to all 12 strict-structure tools: `planner_goal_submit`, `planner_questions_submit`, `planner_plan_submit`, `planner_discovery_submit`, `planner_tdd_submit`, `planner_summary_submit`, `planner_task_upsert`, `planner_refactor_review`, `planner_doubt_review`, `planner_contract_upsert`, `planner_skill_create`, `planner_skill_update`.

The model can self-correct by comparing what it wrote against the expected schema without reading the file again.
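
A sketch of how such an echo could be assembled and used for self-correction. The two section titles match the result format described in this section; `buildEcho`, `missingHeaders`, and the sample headers in the usage note are hypothetical.

```typescript
// Sketch only: the two section titles follow this README; function names
// and the header-matching rule are assumptions.
function buildEcho(requiredHeaders: readonly string[], submitted: string): string {
  return [
    "## Expected shape (canonical schema)",
    requiredHeaders.join("\n"),
    "",
    "## What you submitted (saved to disk)",
    submitted,
  ].join("\n");
}

// Lists required headers that do not appear as a full line in the submission.
function missingHeaders(requiredHeaders: readonly string[], submitted: string): string[] {
  const lines = submitted.split("\n").map((line) => line.trim());
  return requiredHeaders.filter((header) => lines.indexOf(header) === -1);
}
```

For example, with required headers `## Goal` and `## Acceptance Criteria` (invented for this note), a submission containing only `## Goal` would report `## Acceptance Criteria` as missing.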

### 5. Verification Protocol enforcement

During `discovery/scan_project_structure`, the model writes the project's exact verification commands (test, lint, build, format) into `## Verification Protocol` in `discovery.md`. During `finalize/doubt_review`, the parser extracts those commands and requires each one to appear in `verificationEvidence`. If the model skips a command or adds a phantom one, the submission is blocked.

`planner_discovery_submit` is the single writer of this section — any protocol content in the `body` argument is stripped and replaced by the canonical section built from the `verificationProtocol[]` argument. Parser and writer share the same invariant.
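
The comparison between protocol and evidence can be sketched like this. It assumes evidence entries look like `{ command, passed }`; the real `verificationEvidence` shape is not documented here, and `checkEvidence` is a hypothetical name.

```typescript
// Sketch under assumptions: evidence entries are modeled as { command, passed }.
interface EvidenceEntry {
  command: string;
  passed: boolean;
}

interface EvidenceCheck {
  status: "ok" | "blocked";
  missing: string[]; // protocol commands with no passing evidence
  phantom: string[]; // evidence commands that are not in the protocol
}

function checkEvidence(
  protocol: readonly string[],
  evidence: readonly EvidenceEntry[],
): EvidenceCheck {
  const passedCommands = evidence.filter((e) => e.passed).map((e) => e.command);
  const missing = protocol.filter((cmd) => passedCommands.indexOf(cmd) === -1);
  const phantom = evidence
    .map((e) => e.command)
    .filter((cmd) => protocol.indexOf(cmd) === -1);
  return {
    status: missing.length === 0 && phantom.length === 0 ? "ok" : "blocked",
    missing,
    phantom,
  };
}
```

Commands are matched by exact string here; whether the real parser normalizes whitespace or quoting is not stated in this README.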

### 6. Compact checkpoints

Compact boundaries (`compact_discovery`, `compact_planning`, `compact_task`, `compact_before_doubt`, `compact_finalize`) are baked into the state machine. The model calls `planner_request_compact`, Pi compacts the context, and the model calls `planner_complete_compact` to resume. The gate blocks all other tools at compact boundaries until the compact/resume cycle completes.
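
One way to picture the compact/resume cycle is a three-phase check. The step and tool names are from this README; the `CompactPhase` type and `isToolAllowedAtCompact` are assumptions made for this sketch.

```typescript
// Hypothetical sketch. Step and tool names come from this README; the phase
// model and function name are assumptions.
const COMPACT_STEPS: readonly string[] = [
  "compact_discovery",
  "compact_planning",
  "compact_task",
  "compact_before_doubt",
  "compact_finalize",
];

type CompactPhase = "pending" | "requested" | "completed";

function isToolAllowedAtCompact(
  step: string,
  phase: CompactPhase,
  tool: string,
): boolean {
  if (COMPACT_STEPS.indexOf(step) === -1) return true; // not a compact boundary
  if (phase === "pending") return tool === "planner_request_compact";
  if (phase === "requested") return tool === "planner_complete_compact";
  return true; // cycle finished; the normal gates apply again
}
```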

### 7. AGENTS.md contracts

The planner treats `AGENTS.md` files as local architecture contracts — model-facing memory routed by topic rather than file path. Inspired by [DOX](https://github.com/agent0ai/dox). Before planning, the model calls `planner_contract_route` to fetch only the contracts relevant to the current goal. After each task, `planner_contract_check` determines whether the implementation changed any architectural surface and updates contracts if needed.

Contracts are written only through `planner_contract_upsert`. The planner tracks touched files in `state.json` and keeps baselines so `/planner-finish` can offer to remove or restore them.

### 8. Skill library

When a task reveals a non-obvious technique (a workaround, a tricky API pattern, a testing approach), the model can write it to the skill library via `planner_skill_create`. That tool is allowed only at `capture_skill`, the step explicitly reserved for this. Skills are searchable via `/planner-skills`.

### 9. Git isolation

Each plan gets a dedicated worktree on a `plan/<plan-id>` branch. Task work happens on short-lived `task/<plan-id>/<task-id>` branches that are removed after merge. Raw `git` is blocked while a plan is active — the model uses `planner_git_*` wrappers. This keeps the plan's history clean and prevents the model from accidentally touching the base branch.
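
The branch layout can be expressed as a few helpers. The name patterns are from this README; the helper names and the rule used to detect raw `git` are assumptions, since the actual detection logic is not described.

```typescript
// Branch name patterns follow this README. Helper names and the raw-git
// detection rule are assumptions for illustration.
function planBranch(planId: string): string {
  return "plan/" + planId;
}

function taskBranch(planId: string, taskId: string): string {
  return "task/" + planId + "/" + taskId;
}

function outputBranch(planId: string): string {
  return "output/" + planId;
}

// Treats a shell command as raw git when its first word is exactly `git`.
function isRawGitBlocked(command: string, planActive: boolean): boolean {
  return planActive && /^\s*git(\s|$)/.test(command);
}
```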

### 10. Recovery stage

If the model calls `planner_report_stuck`, or if an internal invariant fails, the plan sets `broken=true` and the tool allowlist collapses to the recovery set. The model then walks through `recovery/*` steps to diagnose and repair before resuming. The user is always consulted before any destructive recovery action.
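
The allowlist collapse can be sketched as a simple switch on the `broken` flag. The contents of `RECOVERY_TOOLS` are an illustrative guess (the real recovery set is not listed in this README), and `reportStuck` and `effectiveAllowlist` are hypothetical names.

```typescript
// Illustrative sketch. The recovery set below is a guess for demonstration;
// only planner_status and planner_recovery_resume are named in this README.
interface PlanFlags {
  broken: boolean;
}

const RECOVERY_TOOLS: readonly string[] = [
  "planner_status",
  "planner_recovery_resume",
];

// Models planner_report_stuck: returns a copy of the flags marked as broken.
function reportStuck(flags: PlanFlags): PlanFlags {
  return { ...flags, broken: true };
}

// While broken, the step's normal tools are replaced by the recovery set.
function effectiveAllowlist(
  flags: PlanFlags,
  normalTools: readonly string[],
): readonly string[] {
  return flags.broken ? RECOVERY_TOOLS : normalTools;
}
```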

---

@@ -88,14 +296,6 @@ See [SETTINGS.md](SETTINGS.md) for the full reference: worktree, compact, idl

---

-## AGENTS.md Contracts
-
-The planner treats `AGENTS.md` files as local architecture contracts: durable model-facing memory that routes the model through the project without reading irrelevant code first. Inspired by [DOX](https://github.com/agent0ai/dox).
-
-Contracts are written only through `planner_contract_upsert`. The planner tracks touched files in `state.json` and keeps baselines so `/planner-finish` can remove or restore them.
-
----
-
## Development

```bash
113 changes: 113 additions & 0 deletions src/runtime/workflow-tools.test.ts
@@ -662,4 +662,117 @@ describe("workflowToolTransition", () => {
    expect(blocked.result.status).toBe("blocked");
    expect(blocked.text).toContain("must be proven_bug or needs_probe");
  });

  it("blocks planner_finish_step at done/await_user_acceptance unless targeting handle_change_request", async () => {
    const fs = new MockPlannerFs();
    const git = new MockGitRunner();
    const projectPaths = createProjectStoragePaths({
      agentDir: "/agent",
      projectRoot: "/repo/app",
    });
    const planPaths = createPlanStoragePaths(projectPaths, "plan-a");
    const worktreePath = "/repo/app/.pi/pi-code-planner/worktrees/plan-a";
    await ensureProjectRecord(fs, projectPaths);
    await initializePlanFiles(
      fs,
      planPaths,
      createPlanRecord({ planId: "plan-a", title: "Plan A" }),
    );
    await fs.mkdirp(worktreePath);
    await initializePlanState(fs, planPaths, {
      ...createInitialPlanState({
        baseBranch: "main",
        planBranch: "plan/plan-a",
        worktreePath,
      }),
      stage: "done",
      step: "await_user_acceptance",
      stepStatus: "running",
      currentBranch: "plan/plan-a",
    });
    await setActivePlan(fs, projectPaths, "plan-a");

    const blockedNoTarget = await executePlannerWorkflowTool({
      fs,
      git,
      projectPaths,
      toolName: "planner_finish_step",
      params: {},
    });
    expect(blockedNoTarget.result.status).toBe("blocked");
    expect(blockedNoTarget.text).toContain("/planner-finish");

    const blockedWrongTarget = await executePlannerWorkflowTool({
      fs,
      git,
      projectPaths,
      toolName: "planner_finish_step",
      params: { nextStage: "done", nextStep: "prepare_output_branch" },
    });
    expect(blockedWrongTarget.result.status).toBe("blocked");
    expect(blockedWrongTarget.text).toContain("/planner-finish");

    const allowed = await executePlannerWorkflowTool({
      fs,
      git,
      projectPaths,
      toolName: "planner_finish_step",
      params: { nextStage: "done", nextStep: "handle_change_request" },
    });
    expect(allowed.result.status).toBe("applied");
    expect(allowed.result.state).toMatchObject({
      stage: "done",
      step: "handle_change_request",
    });
  });

  it("blocks planner_finish_step at internal done steps (prepare_output_branch, etc.)", async () => {
    const internalSteps = [
      "prepare_output_branch",
      "merge_or_export_result",
      "cleanup_worktree",
      "mark_done",
      "cleanup_plan_files",
    ] as const;

    for (const step of internalSteps) {
      const fs = new MockPlannerFs();
      const git = new MockGitRunner();
      const projectPaths = createProjectStoragePaths({
        agentDir: "/agent",
        projectRoot: "/repo/app",
      });
      const planPaths = createPlanStoragePaths(projectPaths, "plan-a");
      const worktreePath = "/repo/app/.pi/pi-code-planner/worktrees/plan-a";
      await ensureProjectRecord(fs, projectPaths);
      await initializePlanFiles(
        fs,
        planPaths,
        createPlanRecord({ planId: "plan-a", title: "Plan A" }),
      );
      await fs.mkdirp(worktreePath);
      await initializePlanState(fs, planPaths, {
        ...createInitialPlanState({
          baseBranch: "main",
          planBranch: "plan/plan-a",
          worktreePath,
        }),
        stage: "done",
        step,
        stepStatus: "running",
        currentBranch: "plan/plan-a",
      });
      await setActivePlan(fs, projectPaths, "plan-a");

      const result = await executePlannerWorkflowTool({
        fs,
        git,
        projectPaths,
        toolName: "planner_finish_step",
        params: {},
      });
      expect(result.result.status, `step=${step}`).toBe("blocked");
      expect(result.text, `step=${step}`).toContain("/planner-finish");
    }
  });
});