diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 7cb21b0..50aab99 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -14,17 +14,17 @@ jobs: gate: runs-on: ubuntu-latest steps: - - uses: actions/checkout@v4 + - uses: actions/checkout@v6 - name: Install Rust toolchain uses: dtolnay/rust-toolchain@stable with: components: clippy - name: Install Node (odw selftest runs under cargo test) - uses: actions/setup-node@v4 + uses: actions/setup-node@v6 with: node-version: 20 - name: Cache cargo - uses: actions/cache@v4 + uses: actions/cache@v5 with: path: | ~/.cargo/registry diff --git a/Cargo.lock b/Cargo.lock index bce6f0c..e133a63 100644 --- a/Cargo.lock +++ b/Cargo.lock @@ -621,7 +621,7 @@ checksum = "04744f49eae99ab78e0d5c0b603ab218f515ea8cfe5a456d7629ad883a3b6e7d" [[package]] name = "pandacode" -version = "0.3.1" +version = "0.4.0" dependencies = [ "anyhow", "clap", diff --git a/odw/README.md b/odw/README.md index e31e4bc..4ea5553 100644 --- a/odw/README.md +++ b/odw/README.md @@ -8,6 +8,7 @@ for the full self-contained usage guide, then write a workflow script and run it ```bash odw guide # how to author + run (self-contained) +odw starter parallel-review-apply > wf.js # reusable large-project workflow odw exec --script wf.js --input-file input.json --backend pandacode odw exec --resume latest odw runs show latest @@ -32,10 +33,10 @@ Agent or CLI caller -> pandacode claude|codex|bamboo exec ``` -Claude Code remains an optional caller and compatibility surface through `/odw` -and `/workflows`. It is not required for the core path. Worker failures are also -part of the contract: failed workers return structured error feedback instead of -unclassified prose. +Claude Code, Codex, shell scripts, CI, or another agent can call the same CLI and +workflow files. ODW itself does not install slash commands or project files. 
+Worker failures are also part of the contract: failed workers return structured +error feedback instead of unclassified prose. ODW is a pure orchestration runtime: the only executor backend is `pandacode` (plus `mock` for token-free smoke tests). All Codex/Claude/Bamboo execution — @@ -64,6 +65,7 @@ The binary is self-documenting, so any agent can use it straight from the CLI: ```bash odw guide # the full self-contained authoring + run guide (read this first) +odw starter --list # built-in workflow templates odw doctor # check node + the pandacode executor are wired up odw spec | odw contract # machine-readable API types + the authoring contract ``` @@ -167,18 +169,30 @@ If blocked or failed, return .odw/schemas/error-feedback.schema.json. )); ``` -## Claude Usage +## External Agent Usage -Inside Claude Code: +Any agent can use ODW through the CLI. The reliable bootstrap is: -```text -/odw-audit src/routes for missing auth checks -/odw-ship implement the agreed billing permission fix -/odw-flow decompose this feature into parallel Codex tasks +```bash +odw guide +odw starter parallel-review-apply > wf.js +odw exec --script wf.js --input-file input.json --backend pandacode ``` -Claude should load `.odw/framework/workflow-api.d.ts` and write or adapt a -workflow with this shape: +The built-in starter is the large-project path: parallel Codex worktrees, +candidate-worktree review, bounded repair/re-review, approve-only atomic +landing, and read-only final verification. It repairs failed implementation +nodes or cross-owned file edits before review; declare each task with `task.file` +or `task.files` when you want maximum control. For lower decision cost, pass a +high-level `request` or `spec` without `tasks`; the starter first asks a +structured planner to produce owned task files, then runs the same preflight, +review, apply, and verification gates. Set `strictTaskFileBoundaries:false` only +with explicit owner intent. 
It also refuses to run when declared task files are +already dirty, because isolated worktrees branch from `HEAD`; commit/stash those +files first or pass `allowDirtyTaskFiles:true` only when the owner accepts that +workers will not see the dirty changes. + +Then write or adapt a workflow with this shape: ```js phase("Research", "read files"); @@ -246,7 +260,7 @@ agent(...))`. bamboo are unreliable at structured output). If `schema` is omitted, ODW applies no default schema. - `error feedback`: `.odw/schemas/error-feedback.schema.json` is the standard - result when a worker, command, schema, or CodexCTL step fails. + result when a worker, command, schema, or Codex step fails. `odw-orchestrator` plans and routes. The workflow script owns executable branching, fan-out, loops, intermediate state, and final aggregation. @@ -254,14 +268,14 @@ branching, fan-out, loops, intermediate state, and final aggregation. ## Lifecycle - run: `odw exec --script --input --backend ` -- optional Claude run: `/odw`, `/odw-audit`, `/odw-ship`, `/odw-flow` -- watch: `odw runs show latest`; optional Claude watch: `/workflows` +- watch: `odw runs show latest` - pause/resume: `odw exec --resume latest` -- stop: stop the invoking process, or use `/workflows` for Claude-launched runs +- stop: stop the invoking process - restart node: direct exec resumes by the stable `prompt + options` cache key; completed nodes are skipped from state (editing a node's prompt re-runs it) - live logs: `odw exec` streams node progress -- local journals: `odw runs list` and `odw runs show latest` +- local journals: `odw runs list` and `odw runs show latest`; use + `odw runs list --json` for the raw machine-readable list ## CLI @@ -275,7 +289,8 @@ odw exec --script wf.js --input '{"goal":"x"}' --backend mock # token-free dry odw exec --script wf.js --backend pandacode # real run odw exec --resume latest odw report --script wf.js --open # HTML execution-graph preview -odw runs list +odw runs list # compact run 
list +odw runs list --json # raw run records odw runs show latest ``` @@ -350,7 +365,11 @@ runtime behaviors: - **`isolation: "worktree"`.** Set it on an `agent(...)` node to run its executor in a throwaway git worktree branched from `cwd`, so file-mutating agents in a `parallel(...)` group do not conflict. The worktree is removed on success, - error, or timeout. Requires `cwd` to be a git repo. + error, or timeout. Requires `cwd` to be a git repo. Captured diffs can be + reviewed with `reviewWorktreeDiffs(results)` before landing; reviewers run in a + temporary candidate worktree where the combined diff is already applied. Land + only an `approve` gate with `applyWorktreeDiffs(results)`, which is atomic by + default. - **Real `budget`.** Seed `args.budget.total` (tokens). `budget.spent()` sums each node's **total** token usage (input + output + cache + reasoning) from PandaCode reports. **This differs from the built-in tool, whose `spent()` counts @@ -432,18 +451,19 @@ Honest tradeoffs — reach for the built-in Workflow when these matter: Implemented now: - Rust CLI named `odw` -- Open Dynamic Workflow project pack installer - direct workflow script contract - direct JavaScript runner through `odw exec` - prompt-slot injection for node prompts - complex flow starter with dynamic fan-out, join, parallel review, quality gate, and bounded rework loop +- reusable large-project example: parallel Codex worktrees → candidate-workspace + review gate → approve-only atomic landing → read-only verification guard - framework `.d.ts` and runtime contract docs -- project-level Claude Code agent types -- slash commands and starter workflow scripts +- built-in `odw starter parallel-review-apply` - worker output schemas -- saved workflow artifact evidence reader - live run journals under `.odw/runs` +- compact `odw runs show` summaries with report paths +- self-contained HTML execution reports - checkpointed direct resume with `odw exec --resume` - single-shot Codex execution 
through PandaCode (`runtime: "codex"`) - Bamboo provider dispatch through PandaCode (`runtime: "bamboo", provider`) diff --git a/odw/docs/ARCHITECTURE.md b/odw/docs/ARCHITECTURE.md index 3a1d695..e0110c5 100644 --- a/odw/docs/ARCHITECTURE.md +++ b/odw/docs/ARCHITECTURE.md @@ -83,7 +83,7 @@ Open Dynamic Workflow maps the requested workflow controls as follows: | dynamic task fan-out | `fanout(items, mapper)` maps structured upstream output into downstream workflow nodes | | pipeline phases | `pipeline(items, ...stages)` plus normal script variables in starter workflows | | live logs | `odw exec` streams workflow/phase/node/checkpoint events and writes `.odw/runs//events.jsonl` | -| local run journal | `odw runs list` / `odw runs show latest` | +| local run journal | `odw runs list` (or `--json`) / `odw runs show latest` | | direct-run resume | `odw exec --resume ` skips completed stable node ids | | stop | stop the invoking process | | save script | workflow scripts are normal files | diff --git a/odw/docs/FIRST_USER.md b/odw/docs/FIRST_USER.md index 5b6be26..d8e7f31 100644 --- a/odw/docs/FIRST_USER.md +++ b/odw/docs/FIRST_USER.md @@ -124,7 +124,7 @@ Direct runner: - run: `odw exec --script --input --backend ` - observe: `odw runs show latest` - resume a direct run: `odw exec --resume latest` -- list runs: `odw runs list --path .` +- list runs: `odw runs list --path .` (`--json` keeps the raw records) Use `odw workflows remove ` for filesystem cleanup. diff --git a/odw/examples/07-parallel-review-apply.js b/odw/examples/07-parallel-review-apply.js new file mode 100644 index 0000000..4faee35 --- /dev/null +++ b/odw/examples/07-parallel-review-apply.js @@ -0,0 +1,1059 @@ +// Example 07 — parallel Codex worktrees, structured review gate, atomic landing. +// +// This is the reusable large-project shape: +// +// optional high-level request planner +// -> parallel implementation worktrees +// -> reviewWorktreeDiffs(...) 
in a temporary candidate worktree +// -> applyWorktreeDiffs(...) atomically only after approve +// -> final verification in the main working directory +// +// It intentionally lands changes into the cwd when the review gate approves. +// Run it from a disposable git repo or the project you actually want to change: +// +// mkdir /tmp/odw-example && cd /tmp/odw-example && git init && git commit --allow-empty -m init +// odw exec --script /path/to/odw/examples/07-parallel-review-apply.js --backend mock --json +// +// Real run: +// +// odw exec --script /path/to/odw/examples/07-parallel-review-apply.js \ +// --backend pandacode \ +// --input '{"test":"npm test","tasks":[{"id":"docs","file":"docs/agent-loop.md","prompt":"Create docs/agent-loop.md explaining the agent loop."}]}' +// +// Lower decision-cost run: pass `request` or `spec` instead of `tasks`; the +// starter plans owned task files first, then reuses the same review/apply gate. + +export const meta = { + name: "parallel-review-apply", + description: "Implement independent tasks in worktrees, review the combined candidate, then land atomically.", + phases: [ + { title: "Plan" }, + { title: "Implement" }, + { title: "Review Gate" }, + { title: "Repair" }, + { title: "Land" }, + { title: "Verify" }, + ], +}; + +export default async function workflow() { + const DEFAULT_TASKS = [ + { + id: "owner-loop", + file: "docs/owner-loop.md", + prompt: "Create docs/owner-loop.md explaining how owner comments become AI implementation tasks.", + }, + { + id: "review-policy", + file: "docs/review-policy.md", + prompt: "Create docs/review-policy.md explaining approve/reject/needs_owner review outcomes.", + }, + ]; + + const TEST = args?.test || "echo 'no test command configured'"; + const REQUEST = String(args?.request || args?.spec || args?.goal || "").trim(); + const TASK_PLAN_SCHEMA = { + title: "task-plan.schema.json", + type: "object", + required: ["status", "summary", "tasks"], + properties: { + status: { enum: 
["planned"] }, + summary: { type: "string" }, + tasks: { + type: "array", + items: { + type: "object", + required: ["id", "prompt"], + properties: { + id: { type: "string" }, + title: { type: "string" }, + file: { type: "string" }, + files: { type: "array", items: { type: "string" } }, + prompt: { type: "string" }, + verify: { type: "string" }, + runtime: { enum: ["codex", "claude", "bamboo"] }, + permission: { enum: ["limited", "max"] }, + }, + }, + }, + risks: { type: "array", items: { type: "string" } }, + }, + }; + const normalizePlannedTask = (task, index) => { + const normalized = { + ...task, + id: String(task?.id || `task-${index + 1}`).trim(), + prompt: String(task?.prompt || "").trim(), + }; + if (Array.isArray(task?.files)) { + normalized.files = task.files; + } + if (Object.prototype.hasOwnProperty.call(task || {}, "file")) { + normalized.file = task.file; + } + if (!["codex", "claude", "bamboo"].includes(normalized.runtime)) { + delete normalized.runtime; + } + if (!["limited", "max"].includes(normalized.permission)) { + delete normalized.permission; + } + return normalized; + }; + let planned = null; + let TASKS = Array.isArray(args?.tasks) && args.tasks.length ? args.tasks : null; + if (!TASKS && REQUEST) { + phase("Plan", "Decompose the high-level request into owned parallel tasks."); + planned = await agent( + `Decompose this owner request into independently owned implementation tasks for the ODW parallel-review-apply starter. + +Owner request: +${REQUEST} + +Run context: +${args?.context || REQUEST} + +Verification command: +${TEST} + +Return 2-6 tasks when practical. 
Each task must: +- have a stable short kebab-case id; +- declare repo-relative owned files with file or files; +- omit runtime/permission unless the owner explicitly asks; if present, runtime must be codex, claude, or bamboo, and permission must be limited or max; +- avoid duplicate file ownership across parallel tasks; +- avoid .git, .odw, .pandacode, node_modules, absolute paths, and .. paths; +- include a concrete prompt that states public API/data contracts when relevant; +- include tests/docs as owned files when the request needs them; +- keep dependent public entrypoints, tests, and docs explicit rather than implied. + +If the request is broad, choose a small coherent first slice that can be reviewed and verified safely.`, + { + id: "plan-tasks", + label: "plan-tasks", + runtime: args?.plannerRuntime || "codex", + permission: args?.plannerPermission || "limited", + schema: TASK_PLAN_SCHEMA, + schemaDescription: "Final response is a structured implementation task plan for the parallel-review-apply starter.", + retry: { maxAttempts: Math.max(1, Math.min(4, Number(args?.plannerMaxAttempts || 3))) }, + } + ); + if (!planned || planned?.ok === false || !Array.isArray(planned.tasks) || planned.tasks.length === 0) { + return { + ok: false, + error: { + category: "planning_failed", + message: "The high-level request planner did not return a usable task list.", + }, + planned, + hint: + "Pass explicit args.tasks, provide a narrower args.request/spec, or retry with a plannerRuntime that handles structured JSON reliably.", + }; + } + TASKS = planned.tasks.map(normalizePlannedTask); + } + TASKS = TASKS || DEFAULT_TASKS; + const defaultReviewRounds = TASKS.length >= 4 ? 4 : TASKS.length >= 3 ? 
3 : 2; + const maxReviewRounds = Math.max(1, Math.min(4, Number(args?.maxReviewRounds || defaultReviewRounds))); + const strictTaskFileBoundaries = args?.strictTaskFileBoundaries !== false; + const allowDirtyTaskFiles = args?.allowDirtyTaskFiles === true; + const allowDuplicateTaskFiles = args?.allowDuplicateTaskFiles === true; + const allowUndeclaredTaskFiles = args?.allowUndeclaredTaskFiles === true; + + const taskIdEntries = TASKS.map((task, index) => ({ + index, + id: String(task?.id ?? "").trim(), + file: task?.file || null, + })); + const missingTaskIds = taskIdEntries + .filter((entry) => !entry.id) + .map((entry) => ({ + index: entry.index, + file: entry.file, + })); + const taskIdOwners = new Map(); + for (const entry of taskIdEntries) { + if (!entry.id) { + continue; + } + const owners = taskIdOwners.get(entry.id) || []; + owners.push({ index: entry.index, file: entry.file }); + taskIdOwners.set(entry.id, owners); + } + const duplicateTaskIds = [...taskIdOwners.entries()] + .filter(([, owners]) => owners.length > 1) + .map(([id, owners]) => ({ id, owners })); + if (missingTaskIds.length > 0 || duplicateTaskIds.length > 0) { + return { + ok: false, + error: { + category: "invalid_task_ids", + message: + "Every parallel task must declare a stable unique id before worktrees can be created.", + }, + missingTaskIds, + duplicateTaskIds, + hint: + "Assign each task a short unique id. ODW uses task ids for node keys, sessions, repair history, and reports.", + }; + } + + const declaredTaskFileValues = (task) => { + const values = []; + if (Object.prototype.hasOwnProperty.call(task || {}, "file")) { + values.push(task.file); + } + if (Array.isArray(task?.files)) { + values.push(...task.files); + } + return values; + }; + + const normalizeTaskFile = (value) => { + const raw = String(value ?? 
"").trim(); + if (!raw) { + return { raw, path: null, error: "empty_path" }; + } + if (raw.includes("\0")) { + return { raw, path: null, error: "nul_byte" }; + } + if (raw.startsWith("/") || raw.startsWith("\\") || /^[A-Za-z]:[\\/]/.test(raw)) { + return { raw, path: null, error: "absolute_path" }; + } + if (raw.includes("\\")) { + return { raw, path: null, error: "backslash_path" }; + } + const parts = raw.split("/").filter((part) => part && part !== "."); + if (parts.length === 0) { + return { raw, path: null, error: "empty_path" }; + } + if (parts.some((part) => part === "..")) { + return { raw, path: null, error: "path_escape" }; + } + const blocked = parts.find((part) => [".git", ".odw", ".pandacode", "node_modules"].includes(part)); + if (blocked) { + return { raw, path: null, error: "reserved_path", segment: blocked }; + } + return { raw, path: parts.join("/"), error: null }; + }; + + const declaredTaskFileEntries = (task, index) => + declaredTaskFileValues(task).map((value) => ({ + ...normalizeTaskFile(value), + task: task.id, + index, + })); + + const declaredFilesByTask = new Map( + TASKS.map((task, index) => [task.id, declaredTaskFileEntries(task, index)]) + ); + + const taskFiles = (task) => + [...new Set((declaredFilesByTask.get(task.id) || []) + .filter((entry) => !entry.error && entry.path) + .map((entry) => entry.path))]; + + const invalidTaskFiles = [...declaredFilesByTask.values()] + .flat() + .filter((entry) => entry.error) + .map((entry) => ({ + task: entry.task, + index: entry.index, + file: entry.raw, + error: entry.error, + segment: entry.segment || null, + })); + if (invalidTaskFiles.length > 0) { + return { + ok: false, + error: { + category: "invalid_task_files", + message: + "Declared task files must be normalized repo-relative paths outside ODW/PandaCode/internal generated directories.", + }, + invalidTaskFiles, + hint: + "Use POSIX-style repo-relative paths like src/api.ts. 
Do not use absolute paths, '..', backslashes, .git, .odw, .pandacode, or node_modules.", + }; + } + + const invalidTaskPrompts = TASKS + .map((task, index) => ({ + index, + id: task.id, + files: taskFiles(task), + type: typeof task?.prompt, + prompt: task?.prompt, + })) + .filter((entry) => typeof entry.prompt !== "string" || entry.prompt.trim().length === 0) + .map((entry) => ({ + index: entry.index, + id: entry.id, + files: entry.files, + type: entry.type, + })); + if (invalidTaskPrompts.length > 0) { + return { + ok: false, + error: { + category: "invalid_task_prompts", + message: + "Every parallel task must declare a non-empty prompt before worktrees can be created.", + }, + invalidTaskPrompts, + hint: + "Write a concrete task prompt for each task. Empty or non-string prompts make implementation nodes ambiguous and unsafe.", + }; + } + + const taskBrief = TASKS.map( + (task) => `- ${task.id}: ${taskFiles(task).join(", ") || "(files from prompt)"} — ${task.prompt}` + ).join("\n"); + const runContext = + args?.context || + "Large-project default: land low-risk, internally consistent changes with verification evidence."; + const reviewContext = `Caller-provided context and task prompts are the owner-provided product intent for this run. 
+ +Run context: +${runContext} + +Planned tasks: +${taskBrief}`; + const reviewCriteria = args?.criteria || [ + "Treat the run context and task prompts as the acceptance intent for this batch.", + "Approve when the candidate satisfies that stated intent, applies cleanly, and has adequate verification evidence.", + "Use needs_owner only when the candidate makes a consequential product choice not present in the run context or task prompts, or when the stated intent conflicts with repository evidence.", + "Reject when there are blockers, failed verification, semantic conflicts, or unsafe/unrelated edits.", + "When rejecting, include concrete repo file paths for root-cause code when known; do not name only tests, function names, or symptoms.", + ]; + + const implementationPrompt = (task, repairFeedback) => `Batch context: +${runContext} + +Planned task contracts: +${taskBrief} + +Current task (${task.id}): +${task.prompt} + +${repairFeedback ? `Review feedback to address before returning:\n${repairFeedback} + +When feedback references files owned by other tasks, treat those references as evidence only. Repair only this task's declared file list and preserve the original task intent. +` : ""} +Constraints: +- Only edit the files needed for this task${taskFiles(task).length ? `: ${taskFiles(task).join(", ")}` : ""}. +- Keep the change independently reviewable. +- Align this task with the run context and sibling task contracts above; do not invent a different public API, data shape, file name, or acceptance contract. +- Tests and docs must target the declared task files and exports from the planned tasks. Do not invent package entrypoints, public modules, or skip paths unless that file is declared in a task's ownership list. +- If verification cannot pass in this isolated worktree because dependent task files are absent, still write the real intended tests/docs and report that dependency honestly; do not skip tests to make verification pass. 
+- Do not claim defaults or generated files that are not directly true from the task context or project evidence. +- Run this verification if relevant: ${task.verify || TEST} +- Final response: one concise sentence with changed files and verification result.`; + + const undeclaredTaskFiles = TASKS + .map((task, index) => ({ + index, + id: task.id, + files: taskFiles(task), + })) + .filter((entry) => entry.files.length === 0) + .map((entry) => ({ + index: entry.index, + id: entry.id, + })); + if (!allowUndeclaredTaskFiles && undeclaredTaskFiles.length > 0) { + return { + ok: false, + error: { + category: "undeclared_task_files", + message: + "Every parallel task must declare task.file or task.files so ODW can enforce ownership and target repairs.", + }, + undeclaredTaskFiles, + hint: + "Declare each task's owned files, split exploratory work into a planning step, or pass allowUndeclaredTaskFiles:true only with explicit owner intent.", + }; + } + + const fileOwner = new Map(); + const fileOwners = new Map(); + for (const task of TASKS) { + for (const file of taskFiles(task)) { + if (!fileOwner.has(file)) { + fileOwner.set(file, task); + } + const owners = fileOwners.get(file) || []; + owners.push(task); + fileOwners.set(file, owners); + } + } + + const duplicateTaskFiles = [...fileOwners.entries()] + .filter(([, owners]) => owners.length > 1) + .map(([file, owners]) => ({ + file, + tasks: owners.map((task) => task.id), + })); + if (!allowDuplicateTaskFiles && duplicateTaskFiles.length > 0) { + return { + ok: false, + error: { + category: "duplicate_task_files", + message: + "Multiple parallel tasks declare the same file. 
This starter expects independently owned task files.", + }, + duplicateTaskFiles, + hint: + "Merge those tasks, run them serially, or pass allowDuplicateTaskFiles:true only when overlapping patches are intentional and reviewable.", + }; + } + + const startSnapshot = captureMainWorktreeSnapshot({ label: "starter-preflight" }); + const dirtyTaskFiles = allowDirtyTaskFiles + ? [] + : startSnapshot.files.filter((file) => fileOwner.has(file)); + if (dirtyTaskFiles.length > 0) { + return { + ok: false, + error: { + category: "dirty_task_files", + message: + "Task files already have uncommitted changes. Isolated worktrees branch from HEAD and would not see those changes.", + }, + dirtyTaskFiles, + hint: + "Commit or stash the listed task files before running this starter again, or pass allowDirtyTaskFiles:true with explicit owner intent.", + }; + } + + const runImplementationRound = async (round, repairFeedback = "", roundTasks = TASKS) => { + const isRepair = round > 1; + const activeTasks = Array.isArray(roundTasks) && roundTasks.length ? roundTasks : TASKS; + phase( + isRepair ? "Repair" : "Implement", + isRepair + ? `Redo ${activeTasks.length} rejected task(s) from clean worktrees using review feedback (round ${round}/${maxReviewRounds}).` + : "Fan out independent Codex tasks into isolated worktrees." + ); + const results = await parallel( + activeTasks.map((task) => () => + agent(implementationPrompt(task, repairFeedback), { + id: isRepair ? `${task.id}-repair-${round - 1}` : task.id, + label: isRepair ? `repair:${task.id}` : `impl:${task.id}`, + runtime: task.runtime || "codex", + isolation: "worktree", + permission: task.permission || "max", + // Mock backend only: makes the dry run produce a real captured diff. + mockWriteFile: task.mockFile || task.file || task.files?.[0], + mockFail: Boolean(task.mockFail), + }) + ), + { label: isRepair ? 
`repair-${round - 1}` : "implement" } + ); + const annotated = activeTasks.map((task, index) => ({ task, result: results[index] })); + const candidates = annotated + .filter(({ result }) => result?.worktree?.changed) + .map(({ task, result }) => ({ + ...result, + taskId: task.id, + taskFile: task.file || task.files?.[0] || null, + taskFiles: taskFiles(task), + })); + const failedTasks = annotated + .filter(({ result }) => !result || result?.ok === false) + .map(({ task, result }) => ({ + task, + message: + result?.error?.message || + result?.feedback?.user_message || + result?.text || + "implementation node failed or returned no result", + })); + const scopeIssues = strictTaskFileBoundaries + ? candidates.flatMap((candidate) => { + const task = activeTasks.find((item) => item.id === candidate.taskId); + const allowed = new Set(taskFiles(task)); + if (allowed.size === 0) { + return []; + } + return (candidate.worktree?.files || []) + .filter((file) => !allowed.has(file)) + .map((file) => ({ + task, + file, + ownerTask: fileOwner.get(file) || null, + })); + }) + : []; + log( + (isRepair ? "repair " : "") + + "candidate files=" + + candidates.flatMap((result) => result.worktree.files).join("|") + ); + return { activeTasks, annotated, candidates, failedTasks, scopeIssues, results }; + }; + + const implementationFeedback = (issues) => { + const failed = (issues?.failedTasks || []) + .map((item) => `failed_task: ${item.task.id} (${item.task.file || "no file"}) — ${item.message}`) + .join("\n"); + const scope = (issues?.scopeIssues || []) + .map((item) => { + const owner = item.ownerTask ? `; owned_by=${item.ownerTask.id}` : ""; + return `scope_violation: task ${item.task.id} touched ${item.file}${owner}`; + }) + .join("\n"); + return `Pre-review implementation gate blocked this batch before reviewer agents ran. +${failed} +${scope} + +Repair only the listed task files. 
If multiple tasks need coordinated changes, keep each task inside its declared file list or set args.strictTaskFileBoundaries=false with explicit owner intent.`.slice(0, args?.maxRepairFeedbackChars || 12000); + }; + + const implementationIssues = (implementation) => { + const tasks = new Map(); + for (const item of implementation?.failedTasks || []) { + tasks.set(item.task.id, item.task); + } + for (const issue of implementation?.scopeIssues || []) { + tasks.set(issue.task.id, issue.task); + if (issue.ownerTask) { + tasks.set(issue.ownerTask.id, issue.ownerTask); + } + } + return { + failedTasks: implementation?.failedTasks || [], + scopeIssues: implementation?.scopeIssues || [], + tasks: [...tasks.values()], + }; + }; + + const reviewFeedback = (gate) => { + const reviewLines = (gate?.reviews || []) + .map((review) => { + const parts = [ + `Reviewer ${review.reviewer || "review"} decision=${review.decision || "unknown"}`, + review.summary ? `summary: ${review.summary}` : "", + ...(review.blockers || []).map((item) => `blocker: ${item}`), + ...(review.risks || []).map((item) => `risk: ${item}`), + ...(review.verification || []).map((item) => `verification: ${item}`), + ].filter(Boolean); + return parts.join("\n"); + }) + .join("\n\n"); + const ownerQuestions = (gate?.owner_questions || []).map((item) => `owner_question: ${item}`).join("\n"); + return `Previous review decision: ${gate?.decision || "unknown"} +${(gate?.blockers || []).map((item) => `Blocking issue: ${item}`).join("\n")} +${ownerQuestions} + +Reviewer evidence: +${reviewLines}`.slice(0, args?.maxRepairFeedbackChars || 12000); + }; + + const reviewBlockers = (gate) => [ + ...(gate?.blockers || []), + ...(gate?.reviews || []).flatMap((review) => review.blockers || []), + ].filter(Boolean); + + const uniqueTasks = (items) => { + const seen = new Map(); + for (const task of items || []) { + if (task?.id && !seen.has(task.id)) { + seen.set(task.id, task); + } + } + return [...seen.values()]; + }; + + 
const isTestFile = (file) => { + const name = String(file || "").split("/").pop() || ""; + return /(^test\b|\.test\.|\.spec\.|test-|tests?)/i.test(name); + }; + + const rootCausePhrases = + /\b(does not|doesn't|fails to|returns?|drops?|loses?|omits?|filters?|duplicates?|double-counts?|deduplicates?|preserv(?:e|es)|infers?|violates?|contradicts?|documents?|exposes?|missing|required|should|must|cannot|can't|wrong)\b/; + + const blockerMatchScore = (text, file, index) => { + const start = Math.max(0, index - 100); + const end = Math.min(text.length, index + file.length + 220); + const around = text.slice(start, end).toLowerCase(); + const after = text.slice(index + file.length, end).toLowerCase(); + let score = 0; + if (rootCausePhrases.test(after)) { + score += 100; + } + if (/\b(root cause|because|blocker|violates?|contradicts?)\b/.test(around)) { + score += 20; + } + if ( + isTestFile(file) && + /\b(node\s+\S*test|tests?\s+fail|fails?\s+at|exits?\s+with\s+code|assertionerror|expected\b.*\bactual|actual\b.*\bexpected)\b/.test( + around + ) + ) { + score -= 70; + } + return score; + }; + + const escapeRegExp = (value) => String(value).replace(/[.*+?^${}()|[\]\\]/g, "\\$&"); + + const extractDefinedSymbols = (diff) => { + const symbols = []; + for (const line of String(diff || "").split(/\r?\n/)) { + if (line.startsWith("-")) { + continue; + } + const text = line.startsWith("+") || line.startsWith(" ") ? 
line.slice(1) : line; + for (const pattern of [ + /\b(?:export\s+)?(?:async\s+)?function\s+([A-Za-z_$][\w$]*)\b/g, + /\b(?:export\s+)?(?:const|let|var)\s+([A-Za-z_$][\w$]*)\s*=/g, + ]) { + let match = null; + while ((match = pattern.exec(text))) { + symbols.push(match[1]); + } + } + } + return [...new Set(symbols)]; + }; + + const extractPromptOwnedSymbols = (task) => { + const prompt = String(task?.prompt || ""); + const symbols = []; + for (const file of taskFiles(task)) { + const fileIndex = prompt.indexOf(file); + if (fileIndex < 0) { + continue; + } + const nearby = prompt.slice(Math.max(0, fileIndex - 260), fileIndex + file.length + 80); + const pattern = /\b([A-Za-z_$][\w$]*)\s*\(/g; + let match = null; + while ((match = pattern.exec(nearby))) { + symbols.push(match[1]); + } + } + return [...new Set(symbols)]; + }; + + const candidateTouchesTask = (candidate, task) => + taskFiles(task).some((file) => candidate?.worktree?.files?.includes(file)); + + const candidateForTask = (task) => + (candidates || []).find((candidate) => candidateTouchesTask(candidate, task)); + + const taskDefinedSymbols = (task) => { + const candidate = candidateForTask(task); + return extractDefinedSymbols(candidate?.worktree?.diff || ""); + }; + + const symbolMatchScore = (text, symbol, index) => { + const start = Math.max(0, index - 100); + const end = Math.min(text.length, index + symbol.length + 220); + const around = text.slice(start, end).toLowerCase(); + const after = text.slice(index + symbol.length, end).toLowerCase(); + let score = 80; + if (rootCausePhrases.test(after)) { + score += 100; + } + if (/\b(root cause|because|blocker|violates?|contradicts?)\b/.test(around)) { + score += 20; + } + return score; + }; + + const symbolMatchesForBlocker = (text, tasksWithFiles, symbolsForTask) => { + const matches = []; + for (const task of tasksWithFiles) { + for (const symbol of symbolsForTask(task)) { + const pattern = new RegExp(`\\b${escapeRegExp(symbol)}\\b`, "g"); + let match = 
null; + while ((match = pattern.exec(text))) { + matches.push({ + task, + index: match.index, + score: symbolMatchScore(text, symbol, match.index), + }); + } + } + } + return matches; + }; + + const symbolTasksForBlocker = (blocker, tasksWithFiles, options = {}) => { + const text = String(blocker || ""); + const matches = symbolMatchesForBlocker(text, tasksWithFiles, taskDefinedSymbols); + if (matches.length === 0 && options.includePromptFallback !== false) { + matches.push(...symbolMatchesForBlocker(text, tasksWithFiles, extractPromptOwnedSymbols)); + } + if (matches.length === 0) { + return []; + } + const bestScore = Math.max(...matches.map((match) => match.score)); + return uniqueTasks( + matches + .filter((match) => match.score === bestScore) + .map((match) => match.task) + ); + }; + + const primaryTasksForBlocker = (blocker, tasksWithFiles) => { + const text = String(blocker || ""); + const matches = []; + for (const task of tasksWithFiles) { + for (const file of taskFiles(task)) { + let searchFrom = 0; + while (searchFrom < text.length) { + const index = text.indexOf(file, searchFrom); + if (index < 0) { + break; + } + matches.push({ + task, + index, + score: blockerMatchScore(text, file, index), + }); + searchFrom = index + file.length; + } + } + } + const positiveFileTasks = () => { + const positiveFileMatches = matches.filter((match) => match.score > 0); + if (positiveFileMatches.length === 0) { + return []; + } + const bestScore = Math.max(...positiveFileMatches.map((match) => match.score)); + const bestIndex = Math.min( + ...positiveFileMatches + .filter((match) => match.score === bestScore) + .map((match) => match.index) + ); + return uniqueTasks( + positiveFileMatches + .filter((match) => match.score === bestScore && match.index === bestIndex) + .map((match) => match.task) + ); + }; + const definedSymbolMatches = symbolTasksForBlocker(blocker, tasksWithFiles, { + includePromptFallback: false, + }); + if (matches.length > 0 && 
definedSymbolMatches.length > 0) { + return uniqueTasks([...positiveFileTasks(), ...definedSymbolMatches]); + } + const allowPromptSymbolFallback = + matches.length === 0 || /\b(symptom|caller|callsite|call-site|stack trace)\b/i.test(text); + const symbolMatches = symbolTasksForBlocker(blocker, tasksWithFiles, { + includePromptFallback: allowPromptSymbolFallback, + }); + if (symbolMatches.length > 0) { + return uniqueTasks([...positiveFileTasks(), ...symbolMatches]); + } + if (matches.length === 0) { + return []; + } + const positiveTasks = positiveFileTasks(); + if (positiveTasks.length > 0) { + return positiveTasks; + } + const bestScore = Math.max(...matches.map((match) => match.score)); + if (bestScore > 0) { + const bestIndex = Math.min( + ...matches + .filter((match) => match.score === bestScore) + .map((match) => match.index) + ); + return uniqueTasks( + matches + .filter((match) => match.score === bestScore && match.index === bestIndex) + .map((match) => match.task) + ); + } + const earliestIndex = Math.min(...matches.map((match) => match.index)); + return uniqueTasks( + matches + .filter((match) => match.index === earliestIndex) + .map((match) => match.task) + ); + }; + + const tasksForReviewRepair = (gate) => { + const blockers = reviewBlockers(gate); + const tasksWithFiles = TASKS.filter((task) => taskFiles(task).length > 0); + if (blockers.length === 0 || tasksWithFiles.length === 0) { + return TASKS; + } + const primaryMatched = uniqueTasks( + blockers.flatMap((blocker) => primaryTasksForBlocker(blocker, tasksWithFiles)) + ); + if (primaryMatched.length > 0) { + return primaryMatched; + } + const fallbackMatched = uniqueTasks( + tasksWithFiles.filter((task) => + blockers.some((blocker) => taskFiles(task).some((file) => String(blocker).includes(file))) + ) + ); + return fallbackMatched.length > 0 ? 
fallbackMatched : TASKS; + }; + + const candidateFiles = (items) => + [...new Set((items || []).flatMap((candidate) => candidate?.worktree?.files || []))]; + const summarizeGate = (round, label, value) => ({ + round, + label, + decision: value?.decision, + applyReady: value?.applyReady === true, + files: value?.files || [], + reviewers: (value?.reviews || []).map((review) => ({ + reviewer: review.reviewer, + decision: review.decision, + summary: review.summary, + blockers: review.blockers || [], + risks: review.risks || [], + owner_questions: review.owner_questions || [], + verification: review.verification || [], + })), + blockers: value?.blockers || [], + risks: value?.risks || [], + owner_questions: value?.owner_questions || [], + verification: value?.verification || [], + }); + + const history = []; + if (planned) { + history.push({ + step: "plan", + status: planned.status, + summary: planned.summary, + tasks: TASKS.map((task) => ({ + id: task.id, + files: taskFiles(task), + })), + risks: planned.risks || [], + }); + } + let implementation = await runImplementationRound(1); + let candidates = implementation.candidates; + history.push({ + step: "implement", + round: 1, + tasks: TASKS.map((task) => task.id), + files: candidateFiles(candidates), + }); + + if (candidates.length === 0) { + return { ok: false, error: "no captured worktree changes", history, results: implementation.results }; + } + + let gate = null; + for (let round = 1; round <= maxReviewRounds; round += 1) { + const preReviewIssues = implementationIssues(implementation); + if (preReviewIssues.tasks.length > 0) { + history.push({ + step: "pre_review_block", + round, + failed_tasks: preReviewIssues.failedTasks.map((item) => ({ + id: item.task.id, + file: item.task.file || null, + files: taskFiles(item.task), + message: item.message, + })), + scope_issues: preReviewIssues.scopeIssues.map((item) => ({ + task: item.task.id, + file: item.file, + owner_task: item.ownerTask?.id || null, + })), + }); + if 
(round >= maxReviewRounds) { + return { + ok: false, + error: { + category: "implementation_pre_review_blocked", + message: "Implementation tasks failed or crossed task file boundaries before review.", + }, + history, + results: implementation.results, + }; + } + const repairTasks = preReviewIssues.tasks; + log("pre-review repairing tasks=" + repairTasks.map((task) => task.id).join("|")); + const retainedCandidates = + repairTasks.length === TASKS.length + ? [] + : candidates.filter((candidate) => !repairTasks.some((task) => candidateTouchesTask(candidate, task))); + history.push({ + step: "repair_plan", + reason: "pre_review_block", + round: round + 1, + tasks: repairTasks.map((task) => task.id), + retained_files: candidateFiles(retainedCandidates), + }); + implementation = await runImplementationRound(round + 1, implementationFeedback(preReviewIssues), repairTasks); + candidates = [...retainedCandidates, ...implementation.candidates]; + history.push({ + step: "repair", + reason: "pre_review_block", + round: round + 1, + tasks: repairTasks.map((task) => task.id), + files: candidateFiles(implementation.candidates), + candidate_files: candidateFiles(candidates), + }); + if (candidates.length === 0) { + return { ok: false, error: "no captured worktree changes after pre-review repair", history, results: implementation.results }; + } + continue; + } + + const gateLabel = round === 1 ? "batch-review" : `batch-review-r${round}`; + phase( + "Review Gate", + round === 1 + ? "Review the combined candidate before landing." 
+ : `Re-review the repaired candidate before landing (round ${round}/${maxReviewRounds}).` + ); + gate = await reviewWorktreeDiffs(candidates, { + label: gateLabel, + context: reviewContext, + criteria: reviewCriteria, + reviewers: + args?.reviewers || [ + { + label: "correctness", + runtime: "codex", + perspective: "Correctness, regression risk, and test evidence.", + }, + { + label: "owner-risk", + runtime: "codex", + perspective: + "Owner decision risk after treating run context and task prompts as owner-provided intent; do not ask the owner to reconfirm already stated intent.", + }, + ], + maxDiffChars: args?.maxDiffChars || 50000, + }); + history.push({ step: "review", ...summarizeGate(round, gateLabel, gate) }); + + if (gate.applyReady) { + break; + } + + if (gate.decision === "needs_owner" || round >= maxReviewRounds) { + return { ok: false, gate, history, results: implementation.results }; + } + + const repairTasks = tasksForReviewRepair(gate); + log("repairing tasks=" + repairTasks.map((task) => task.id).join("|")); + const retainedCandidates = + repairTasks.length === TASKS.length + ? 
[] + : candidates.filter((candidate) => !repairTasks.some((task) => candidateTouchesTask(candidate, task))); + history.push({ + step: "repair_plan", + round: round + 1, + tasks: repairTasks.map((task) => task.id), + retained_files: candidateFiles(retainedCandidates), + }); + implementation = await runImplementationRound(round + 1, reviewFeedback(gate), repairTasks); + candidates = [...retainedCandidates, ...implementation.candidates]; + history.push({ + step: "repair", + round: round + 1, + tasks: repairTasks.map((task) => task.id), + files: candidateFiles(implementation.candidates), + candidate_files: candidateFiles(candidates), + }); + if (implementation.candidates.length === 0) { + return { ok: false, error: "no captured worktree changes after repair", gate, history, results: implementation.results }; + } + } + + if (!gate?.applyReady) { + return { ok: false, gate, history, results: implementation.results }; + } + + phase("Land", "Apply approved captured patches atomically."); + const landed = applyWorktreeDiffs(candidates, { label: "approved-batch" }); + if (!landed.ok) { + return { ok: false, gate, history, landed }; + } + + phase("Verify", "Verify the landed main working directory."); + const verifySnapshot = captureMainWorktreeSnapshot({ label: "before-final-verify" }); + const verification = await agent( + `Read-only verification for the approved batch now landed in the main working directory. + +Tasks: +${TASKS.map((task) => `- ${task.id}: ${taskFiles(task).join(", ") || task.file || "(files from prompt)"}`).join("\n")} + +Run this command and report the exact result: +${TEST} + +Do not modify files, install dependencies, format code, or apply fixes. 
If verification fails, report the failure and evidence; do not repair it in this step.`, + { + id: "verify-landed", + label: "verify-landed", + runtime: "codex", + permission: "limited", + mockWriteFile: args?.verifyMockWriteFile, + mockFail: Boolean(args?.verifyMockFail), + } + ); + const verifyGuard = assertMainWorktreeUnchanged(verifySnapshot, { label: "final-verify-readonly" }); + const verificationOk = verification?.ok !== false; + history.push({ + step: "verify", + ok: verificationOk && verifyGuard.ok, + guard: { + ok: verifyGuard.ok, + files: verifyGuard.files, + added: verifyGuard.added, + removed: verifyGuard.removed, + modified: verifyGuard.modified, + }, + }); + + if (!verifyGuard.ok) { + const verifyRestore = restoreMainWorktreeSnapshot(verifySnapshot, verifyGuard, { label: "final-verify-restore" }); + history[history.length - 1].restore = { + ok: verifyRestore.ok, + restored: verifyRestore.restored, + removed: verifyRestore.removed, + errors: verifyRestore.errors, + }; + return { + ok: false, + error: { + category: "verification_mutated_worktree", + message: "Final verification changed the main worktree after approve-only landing.", + }, + gate, + history, + landed, + verification, + verifyGuard, + verifyRestore, + }; + } + + if (!verificationOk) { + return { + ok: false, + error: { + category: "verification_failed", + message: "Final verification returned ok:false after approve-only landing.", + }, + gate, + history, + landed, + verification, + verifyGuard, + }; + } + + return { + ok: true, + gate, + history, + landed, + verification, + verifyGuard, + }; +} diff --git a/odw/examples/README.md b/odw/examples/README.md index 094711c..c681648 100644 --- a/odw/examples/README.md +++ b/odw/examples/README.md @@ -11,6 +11,9 @@ odw exec --script examples/01-single-node.js --backend mock --json # Preview the execution graph from a mock dry run. 
odw report --script examples/01-single-node.js --open +# Print the large-project starter from an installed `odw` binary. +odw starter parallel-review-apply > wf.js + # Real run (needs `pandacode` on PATH, or ODW_PANDACODE_BIN set): odw exec --script examples/01-single-node.js --backend pandacode --json ``` @@ -23,9 +26,46 @@ odw exec --script examples/01-single-node.js --backend pandacode --json | `04-bamboo-provider.js` | Bamboo domestic-provider dispatch with `runtime:"bamboo"` and `pandacode.bamboo(...)` | | `05-heterogeneous-models.js` | fan one question across several different models in parallel (deepseek/qwen/kimi), reconcile with claude — ODW's heterogeneous-executor edge | | `06-build-project.js` | build a real project end-to-end: codex implements → claude reviews → codex fixes + runs the test command until green (the dogfood KV-store / roman-numeral shape) | +| `07-parallel-review-apply.js` | large-project parallel landing: optional request/spec planner → Codex worktrees → `reviewWorktreeDiffs` candidate workspace → bounded repair/re-review on reject → approve-only atomic `applyWorktreeDiffs` → read-only verification guard | Real `worktree` runs require `cwd` to be a git repository, and any spec/fixture the agent must read should be committed first (the worktree branches from HEAD). +Example 07 intentionally lands approved changes into `cwd`; run it from the +target project or a disposable git repo. It treats caller-supplied context and +task prompts as owner intent. Pass explicit `args.tasks` when you want full +control; pass a high-level `args.request` or `args.spec` without `tasks` when +you want the starter to plan owned task files first. 
It repairs +blocker-matched tasks up to +`args.maxReviewRounds` (default 2 for small batches and 3 for 3+ tasks), falls back to full-batch repair when blockers +cannot be mapped to task files, stops for `needs_owner`, and treats final +verification as read-only: any post-approval file mutation fails the run and is +restored from the pre-verification snapshot. +Each task must declare a stable unique `id`; ODW uses task ids for node keys, +sessions, repair history, and reports. Each task must also declare a non-empty +string `prompt`; empty or non-string prompts are rejected before worktrees are +created. +Before review, it also blocks failed implementation nodes and cross-owned file +edits. Declare one `task.file` or multiple `task.files` for each task. Use a +built-in request/spec planner for exploratory decomposition, or set +`allowUndeclaredTaskFiles:true` only when the owner accepts weaker ownership +checks. Declared files must be normalized repo-relative paths outside `.git`, +`.odw`, `.pandacode`, and `node_modules`; absolute paths, backslashes, and `..` +escapes are rejected before worktrees are created. Set +`strictTaskFileBoundaries:false` only with explicit owner intent. +Test and documentation tasks should target the declared files and exports from +the planned task set. If a required public entrypoint is missing from task +ownership, treat it as a planning blocker or add it to a task; do not invent +undeclared entrypoints or skip tests to make isolated verification pass. +The starter injects the run context and full planned task list into every +implementation/repair prompt, so tests, docs, entrypoints, and implementation +modules can align on one shared contract even though they run in isolated +worktrees. 
+Because isolated worktrees branch from `HEAD`, it also blocks dirty declared +task files before implementation; commit/stash them first, or set +`allowDirtyTaskFiles:true` only when the owner accepts that workers will not see +those uncommitted changes. +It also blocks duplicate declared ownership of the same file; merge those tasks, +run them serially, or set `allowDuplicateTaskFiles:true` only when overlapping +patches are intentional and reviewable. -See `skills/odw/SKILL.md` for the full authoring guide and -`.odw/framework/workflow-api.d.ts` (after `odw init`) for the typed API. +See `odw guide` for the full authoring guide and `odw spec` for the typed API. diff --git a/odw/scripts/selftest.mjs b/odw/scripts/selftest.mjs index 341bbb8..7ee16a3 100644 --- a/odw/scripts/selftest.mjs +++ b/odw/scripts/selftest.mjs @@ -5,7 +5,7 @@ // accounting, nested workflow, per-phase model / whenToUse). // // Usage: -// node scripts/selftest.mjs # uses ./target/debug/odw +// node scripts/selftest.mjs # uses target/debug/odw (crate or workspace) // ODW=/path/to/odw node scripts/selftest.mjs // // Exits 0 only if every assertion passes. Deterministic and token-free @@ -18,8 +18,18 @@ import { import { tmpdir, cpus } from "node:os"; import { join, resolve } from "node:path"; +function defaultOdwBin() { + for (const candidate of ["./target/debug/odw", "../target/debug/odw"]) { + const resolved = resolve(candidate); + if (existsSync(resolved)) { + return resolved; + } + } + return resolve("./target/debug/odw"); +} + // Absolute so it still resolves when a test runs odw from another cwd. -const ODW = resolve(process.env.ODW || "./target/debug/odw"); +const ODW = resolve(process.env.ODW || defaultOdwBin()); const REPO = process.cwd(); const EXPECTED_MAX = Math.max(1, Math.min(16, cpus().length - 2)); @@ -80,6 +90,22 @@ function runOdw(args, { cwd = REPO, env = {}, pandacodeBin = null } = {}) { return { code: r.status ?? 
1, out: (r.stdout || "") + (r.stderr || "") }; } +function makeGitRepo(prefix = "odw-selftest-git-") { + const dir = mkdtempSync(join(tmpdir(), prefix)); + const git = (args) => { + const r = spawnSync("git", args, { cwd: dir, encoding: "utf8" }); + assert(r.status === 0, `git ${args.join(" ")} failed: ${(r.stdout || "")}${(r.stderr || "")}`); + return r; + }; + git(["init", "-q"]); + git(["config", "user.email", "odw-selftest@example.invalid"]); + git(["config", "user.name", "ODW Selftest"]); + writeFileSync(join(dir, "README.md"), "# odw selftest\n"); + git(["add", "."]); + git(["commit", "-q", "-m", "init"]); + return { dir, git }; +} + const ev = (events, type) => events.filter((e) => e && e.type === type); const logLine = (out, re) => (out.match(re) || [])[1]; @@ -204,6 +230,292 @@ return {ok:true};`); assert(odwWorktreeLeftovers().length === 0, `dir not removed after capture:\n${odwWorktreeLeftovers().join("\n")}`); }); +test("worktree: captured parallel diffs can be applied back to cwd", () => { + const { dir } = makeGitRepo("odw-apply-ok-"); + try { + const r = run(`export const meta={name:"wapply"}; +phase("P",""); +const results = await parallel([ + () => agent("write a", { id:"a", label:"a", isolation:"worktree", mockWriteFile:"a.txt" }), + () => agent("write b", { id:"b", label:"b", isolation:"worktree", mockWriteFile:"b.txt" }) +]); +const landed = applyWorktreeDiffs(results); +log("APPLY ok="+landed.ok+" applied="+landed.applied+" failed="+landed.failed+" files="+landed.results.flatMap((r)=>r.files).join("|")); +return { ok: landed.ok && landed.applied === 2 && landed.failed === 0 };`, { cwd: dir }); + assert(r.code === 0, `apply run failed: ${r.out.slice(-500)}`); + assert(/APPLY ok=true applied=2 failed=0 files=a\.txt\|b\.txt/.test(r.out), `apply summary wrong: ${r.out.slice(-500)}`); + assert(readFileSync(join(dir, "a.txt"), "utf8").includes("mock change by a"), "a.txt was not applied to cwd"); + assert(readFileSync(join(dir, "b.txt"), 
"utf8").includes("mock change by b"), "b.txt was not applied to cwd"); + assert(ev(r.events, "worktree_patch_apply").filter((e) => e.ok && e.applied).length === 2, "missing successful apply events"); + } finally { + rmSync(dir, { recursive: true, force: true }); + } +}); + +test("worktree: combined patches preserve trailing blank context lines", () => { + const { dir, git } = makeGitRepo("odw-combined-blank-context-"); + try { + writeFileSync(join(dir, "a.md"), "alpha\n\n"); + git(["add", "a.md"]); + git(["commit", "-q", "-m", "blank-context-base"]); + const blankContextDiff = `diff --git a/a.md b/a.md +--- a/a.md ++++ b/a.md +@@ -1,2 +1,2 @@ +-alpha ++alpha updated +${" "} +`; + const newFileDiff = `diff --git a/b.md b/b.md +new file mode 100644 +--- /dev/null ++++ b/b.md +@@ -0,0 +1 @@ ++bravo +`; + const r = run(`export const meta={name:"wblankctx"}; +phase("P",""); +const candidates = [ + { changed:true, files:["a.md"], diff:${JSON.stringify(blankContextDiff)} }, + { changed:true, files:["b.md"], diff:${JSON.stringify(newFileDiff)} } +]; +const gate = await reviewWorktreeDiffs(candidates, { label:"blank-context", reviewerCount:1 }); +const landed = gate.applyReady ? 
applyWorktreeDiffs(candidates, { label:"blank-context" }) : { ok:false, applied:0 }; +log("BLANKCTX gate="+gate.decision+" applyReady="+gate.applyReady+" landed="+landed.ok+" applied="+landed.applied); +return { ok: gate.decision === "approve" && gate.applyReady === true && landed.ok === true && landed.applied === 2 };`, { cwd: dir }); + assert(r.code === 0, `blank context workflow failed: ${r.out.slice(-700)}`); + assert(/BLANKCTX gate=approve applyReady=true landed=true applied=2/.test(r.out), `blank context summary wrong: ${r.out.slice(-700)}`); + assert(readFileSync(join(dir, "a.md"), "utf8") === "alpha updated\n\n", "blank-context patch did not update a.md"); + assert(readFileSync(join(dir, "b.md"), "utf8") === "bravo\n", "blank-context patch did not create b.md"); + } finally { + rmSync(dir, { recursive: true, force: true }); + } +}); + +test("worktree: patch conflict is structured and leaves cwd untouched", () => { + const { dir, git } = makeGitRepo("odw-apply-conflict-"); + try { + writeFileSync(join(dir, "same.txt"), "base\n"); + git(["add", "same.txt"]); + git(["commit", "-q", "-m", "same-base"]); + writeFileSync(join(dir, "same.txt"), "main\n"); + const diff = `diff --git a/same.txt b/same.txt +--- a/same.txt ++++ b/same.txt +@@ -1 +1 @@ +-base ++branch +`; + const r = run(`export const meta={name:"wconflict"}; +phase("P",""); +const res = applyWorktreeDiff({ changed:true, files:["same.txt"], diff:${JSON.stringify(diff)} }, { label:"conflict" }); +log("CONFLICT ok="+res.ok+" applied="+res.applied+" cat="+res.error?.category); +return { ok: res.ok === false && res.applied === false && res.error?.category === "patch_conflict" };`, { cwd: dir }); + assert(r.code === 0, `conflict workflow failed: ${r.out.slice(-500)}`); + assert(/CONFLICT ok=false applied=false cat=patch_conflict/.test(r.out), `conflict was not structured: ${r.out.slice(-500)}`); + assert(readFileSync(join(dir, "same.txt"), "utf8") === "main\n", "conflict apply mutated cwd"); + } finally { + 
rmSync(dir, { recursive: true, force: true }); + } +}); + +test("worktree: batch apply is atomic by default when one patch conflicts", () => { + const { dir, git } = makeGitRepo("odw-apply-atomic-"); + try { + writeFileSync(join(dir, "same.txt"), "base\n"); + git(["add", "same.txt"]); + git(["commit", "-q", "-m", "same-base"]); + writeFileSync(join(dir, "same.txt"), "main\n"); + const diff = `diff --git a/same.txt b/same.txt +--- a/same.txt ++++ b/same.txt +@@ -1 +1 @@ +-base ++branch +`; + const r = run(`export const meta={name:"watomic"}; +phase("P",""); +const add = await agent("write add", { id:"add", label:"add", isolation:"worktree", mockWriteFile:"atomic-add.txt" }); +const landed = applyWorktreeDiffs([add, { changed:true, files:["same.txt"], diff:${JSON.stringify(diff)} }], { label:"atomic" }); +log("ATOMIC ok="+landed.ok+" applied="+landed.applied+" failed="+landed.failed+" partial="+landed.partial); +return { ok: landed.ok === false && landed.applied === 0 && landed.partial === false && landed.failed >= 1 };`, { cwd: dir }); + assert(r.code === 0, `atomic workflow failed: ${r.out.slice(-500)}`); + assert(/ATOMIC ok=false applied=0 failed=\d+ partial=false/.test(r.out), `atomic result wrong: ${r.out.slice(-500)}`); + assert(!existsSync(join(dir, "atomic-add.txt")), "atomic batch created the first file before failing"); + assert(readFileSync(join(dir, "same.txt"), "utf8") === "main\n", "atomic batch mutated conflicting file"); + } finally { + rmSync(dir, { recursive: true, force: true }); + } +}); + +test("worktree: continueOnError explicitly allows partial batch apply", () => { + const { dir, git } = makeGitRepo("odw-apply-partial-"); + try { + writeFileSync(join(dir, "same.txt"), "base\n"); + git(["add", "same.txt"]); + git(["commit", "-q", "-m", "same-base"]); + writeFileSync(join(dir, "same.txt"), "main\n"); + const diff = `diff --git a/same.txt b/same.txt +--- a/same.txt ++++ b/same.txt +@@ -1 +1 @@ +-base ++branch +`; + const r = run(`export const 
meta={name:"wpartial"}; +phase("P",""); +const add = await agent("write add", { id:"add", label:"add", isolation:"worktree", mockWriteFile:"partial-add.txt" }); +const landed = applyWorktreeDiffs([add, { changed:true, files:["same.txt"], diff:${JSON.stringify(diff)} }], { label:"partial", continueOnError:true }); +log("PARTIAL ok="+landed.ok+" applied="+landed.applied+" failed="+landed.failed+" partial="+landed.partial); +return { ok: landed.ok === false && landed.applied === 1 && landed.failed === 1 && landed.partial === true };`, { cwd: dir }); + assert(r.code === 0, `partial workflow failed: ${r.out.slice(-500)}`); + assert(/PARTIAL ok=false applied=1 failed=1 partial=true/.test(r.out), `partial result wrong: ${r.out.slice(-500)}`); + assert(readFileSync(join(dir, "partial-add.txt"), "utf8").includes("mock change by add"), "continueOnError did not apply the first file"); + assert(readFileSync(join(dir, "same.txt"), "utf8") === "main\n", "partial batch mutated conflicting file"); + } finally { + rmSync(dir, { recursive: true, force: true }); + } +}); + +test("worktree: main snapshot guard detects post-apply mutations", () => { + const { dir } = makeGitRepo("odw-main-snapshot-"); + try { + const r = run(`export const meta={name:"wsnapshot"}; +phase("P",""); +const add = await agent("write approved", { id:"add", label:"add", isolation:"worktree", mockWriteFile:"approved.txt" }); +const landed = applyWorktreeDiffs([add], { label:"approved" }); +const snap = captureMainWorktreeSnapshot({ label:"after-apply" }); +await agent("verify mutates", { id:"verify", label:"verify", mockWriteFile:"verify-leak.txt" }); +const guard = assertMainWorktreeUnchanged(snap, { label:"verify-readonly" }); +const restore = restoreMainWorktreeSnapshot(snap, guard, { label:"verify-restore" }); +log("SNAPSHOT_GUARD ok="+guard.ok+" added="+guard.added.join("|")+" modified="+guard.modified.join("|")); +log("SNAPSHOT_RESTORE ok="+restore.ok+" removed="+restore.removed.join("|")); +return { ok: 
landed.ok && guard.ok === false && guard.added.includes("verify-leak.txt") && restore.ok && restore.removed.includes("verify-leak.txt") };`, { cwd: dir }); + assert(r.code === 0, `snapshot guard workflow failed: ${r.out.slice(-700)}`); + assert(/SNAPSHOT_GUARD ok=false added=verify-leak\.txt/.test(r.out), `snapshot guard did not detect mutation: ${r.out.slice(-700)}`); + assert(/SNAPSHOT_RESTORE ok=true removed=verify-leak\.txt/.test(r.out), `snapshot restore did not remove mutation: ${r.out.slice(-700)}`); + assert(!existsSync(join(dir, "verify-leak.txt")), "snapshot restore left leaked file in cwd"); + assert(ev(r.events, "worktree_snapshot_check").some((e) => e.ok === false && e.added === 1), "missing failed snapshot check event"); + assert(ev(r.events, "worktree_snapshot_restore").some((e) => e.ok === true && e.removed === 1), "missing snapshot restore event"); + } finally { + rmSync(dir, { recursive: true, force: true }); + } +}); + +test("worktree: review gate approves a preflight-clean batch", () => { + const { dir } = makeGitRepo("odw-review-approve-"); + try { + const r = run(`export const meta={name:"wreviewok"}; +phase("P",""); +const change = await agent("write review ok", { id:"add", label:"add", isolation:"worktree", mockWriteFile:"review-ok.txt" }); +const gate = await reviewWorktreeDiffs([change], { label:"gate-ok", reviewerCount:2 }); +log("GATE decision="+gate.decision+" ok="+gate.ok+" applyReady="+gate.applyReady+" reviewers="+gate.reviews.length); +return { ok: gate.ok === true && gate.applyReady === true && gate.decision === "approve" && gate.reviews.length === 2 };`, { cwd: dir }); + assert(r.code === 0, `review approve workflow failed: ${r.out.slice(-500)}`); + assert(/GATE decision=approve ok=true applyReady=true reviewers=2/.test(r.out), `approve gate wrong: ${r.out.slice(-500)}`); + assert(!existsSync(join(dir, "review-ok.txt")), "review gate should not apply the patch"); + const gateEvents = ev(r.events, "worktree_review_gate"); + 
assert(gateEvents.some((e) => e.decision === "approve" && e.reviewers === 2), `missing approve gate event: ${JSON.stringify(gateEvents)}`); + } finally { + rmSync(dir, { recursive: true, force: true }); + } +}); + +test("worktree: review gate rejects when a reviewer finds a blocker", () => { + const { dir } = makeGitRepo("odw-review-reject-"); + try { + const r = run(`export const meta={name:"wreviewreject"}; +phase("P",""); +const change = await agent("write review reject", { id:"add", label:"add", isolation:"worktree", mockWriteFile:"review-reject.txt" }); +const gate = await reviewWorktreeDiffs([change], { label:"gate-reject", context:"MOCK_REJECT" }); +log("GATE_REJECT decision="+gate.decision+" ok="+gate.ok+" blockers="+gate.blockers.length); +return { ok: gate.ok === false && gate.applyReady === false && gate.decision === "reject" && gate.blockers.length > 0 };`, { cwd: dir }); + assert(r.code === 0, `review reject workflow failed: ${r.out.slice(-500)}`); + assert(/GATE_REJECT decision=reject ok=false blockers=[1-9]/.test(r.out), `reject gate wrong: ${r.out.slice(-500)}`); + assert(!existsSync(join(dir, "review-reject.txt")), "reject gate should not apply the patch"); + } finally { + rmSync(dir, { recursive: true, force: true }); + } +}); + +test("worktree: review gate can require owner decision", () => { + const { dir } = makeGitRepo("odw-review-owner-"); + try { + const r = run(`export const meta={name:"wreviewowner"}; +phase("P",""); +const change = await agent("write review owner", { id:"add", label:"add", isolation:"worktree", mockWriteFile:"review-owner.txt" }); +const gate = await reviewWorktreeDiffs([change], { label:"gate-owner", context:"MOCK_NEEDS_OWNER" }); +log("GATE_OWNER decision="+gate.decision+" ok="+gate.ok+" questions="+gate.owner_questions.length); +return { ok: gate.ok === false && gate.applyReady === false && gate.decision === "needs_owner" && gate.owner_questions.length > 0 };`, { cwd: dir }); + assert(r.code === 0, `review owner 
workflow failed: ${r.out.slice(-500)}`); + assert(/GATE_OWNER decision=needs_owner ok=false questions=[1-9]/.test(r.out), `owner gate wrong: ${r.out.slice(-500)}`); + assert(!existsSync(join(dir, "review-owner.txt")), "owner gate should not apply the patch"); + } finally { + rmSync(dir, { recursive: true, force: true }); + } +}); + +test("worktree: review gate rejects patch conflicts before reviewer agents", () => { + const { dir, git } = makeGitRepo("odw-review-conflict-"); + try { + writeFileSync(join(dir, "same.txt"), "base\n"); + git(["add", "same.txt"]); + git(["commit", "-q", "-m", "same-base"]); + writeFileSync(join(dir, "same.txt"), "main\n"); + const diff = `diff --git a/same.txt b/same.txt +--- a/same.txt ++++ b/same.txt +@@ -1 +1 @@ +-base ++branch +`; + const r = run(`export const meta={name:"wreviewconflict"}; +phase("P",""); +const gate = await reviewWorktreeDiffs([{ changed:true, files:["same.txt"], diff:${JSON.stringify(diff)} }], { label:"gate-conflict", reviewerCount:2 }); +log("GATE_CONFLICT decision="+gate.decision+" ok="+gate.ok+" preflight="+gate.preflight.category+" blockers="+gate.blockers.length+" reviewers="+gate.reviews.length); +return { ok: gate.ok === false && gate.decision === "reject" && gate.preflight.category === "patch_conflict" && gate.blockers.length > 0 && gate.reviews.length === 0 };`, { cwd: dir }); + assert(r.code === 0, `review conflict workflow failed: ${r.out.slice(-500)}`); + assert(/GATE_CONFLICT decision=reject ok=false preflight=patch_conflict blockers=[1-9] reviewers=0/.test(r.out), `conflict gate wrong: ${r.out.slice(-500)}`); + assert(readFileSync(join(dir, "same.txt"), "utf8") === "main\n", "review conflict preflight mutated cwd"); + const gateEvents = ev(r.events, "worktree_review_gate"); + assert(gateEvents.some((e) => e.decision === "reject" && e.blockers > 0 && e.preflight_category === "patch_conflict" && /patch does not apply/.test(e.preflight_message || "")), `missing conflict preflight event detail: 
${JSON.stringify(gateEvents)}`); + } finally { + rmSync(dir, { recursive: true, force: true }); + } +}); + +test("worktree: review gate reviewers run inside a candidate worktree", () => { + const { dir } = makeGitRepo("odw-review-workspace-"); + try { + const diff = `diff --git a/review-candidate.txt b/review-candidate.txt +new file mode 100644 +--- /dev/null ++++ b/review-candidate.txt +@@ -0,0 +1 @@ ++candidate +`; + const r = run(`export const meta={name:"wreviewworkspace"}; +phase("P",""); +const gate = await reviewWorktreeDiffs([{ changed:true, files:["review-candidate.txt"], diff:${JSON.stringify(diff)} }], { label:"gate-workspace" }); +log("GATE_WORKSPACE decision="+gate.decision+" ok="+gate.ok+" verification="+gate.verification.join("|")); +return { ok: gate.ok === true && gate.decision === "approve" && gate.verification.join("|").includes("candidate file") };`, { + cwd: dir, + backend: "pandacode", + pandacodeBin: fakePanda, + env: { FAKE_PANDA: "review_workspace_probe" } + }); + assert(r.code === 0, `review workspace workflow failed: ${r.out.slice(-700)}`); + assert(/GATE_WORKSPACE decision=approve ok=true verification=.*candidate file/.test(r.out), `review workspace gate wrong: ${r.out.slice(-700)}`); + assert(!existsSync(join(dir, "review-candidate.txt")), "review gate should not apply candidate file to main cwd"); + const wl = spawnSync("git", ["worktree", "list"], { cwd: dir, encoding: "utf8" }).stdout || ""; + assert(!/[/\\]worktrees[/\\]/.test(wl), `review workspace left a git worktree:\n${wl}`); + const workspaceEvents = ev(r.events, "worktree_review_workspace"); + assert(workspaceEvents.some((e) => e.status === "start") && workspaceEvents.some((e) => e.status === "done"), `missing review workspace lifecycle events: ${JSON.stringify(workspaceEvents)}`); + } finally { + rmSync(dir, { recursive: true, force: true }); + } +}); + test("worktree: unchanged EXECUTOR node keeps {text, worktree:{changed:false}} (consistent shape)", () => { // Real executor 
reports collapse through leanAgentResult; an unchanged worktree // node must keep its worktree object instead of decaying to a bare string, so @@ -347,6 +659,38 @@ return {ok:true};`); assert(/SM ok=false cat=schema_mismatch issues=[1-9]/.test(r.out), `final schema_mismatch result wrong: ${r.out.slice(-300)}`); }); +test("schema: mock codex-result satisfies its packaged schema", () => { + const r = run(`export const meta={name:"mockcodexschema"}; +phase("P",""); +const out = await agent("mock codex result", { + label:"codex schema", + runtime:"codex", + schema:{ + title:"codex-result.schema.json", + type:"object", + required:["run_id","status","changed_files","verification","risks","adapter"], + properties:{ + run_id:{type:"string"}, + status:{enum:["completed","failed","needs_input","stopped"]}, + changed_files:{type:"array",items:{type:"string"}}, + verification:{type:"array"}, + risks:{type:"array",items:{type:"string"}}, + adapter:{ + type:"object", + required:["backend"], + properties:{backend:{enum:["codex","claude","pandacode"]},runtime:{type:"string"}} + }, + error:{type:["object","null"]} + } + } +}); +log("MCR backend="+out.adapter.backend+" status="+out.status); +return {ok:true};`); + assert(r.code === 0, `mock codex-result schema run failed: ${r.out.slice(-300)}`); + assert(ev(r.events, "agent_schema_invalid").length === 0, `unexpected schema mismatch: ${JSON.stringify(ev(r.events, "agent_schema_invalid"))}`); + assert(/MCR backend=pandacode status=completed/.test(r.out), `mock codex-result shape wrong: ${r.out.slice(-300)}`); +}); + test("schema: an unloadable schema fails fast and non-retryably", () => { // A typo'd/missing schema path is a config error, not a transient mismatch: // it must fail with a clear category and NOT burn retries (the file won't @@ -433,6 +777,8 @@ phase("P",""); return { answer: 42, tag: "selftest-result" };`); assert(r.code === 0, `run failed: ${r.out.slice(-200)}`); assert(/\[result\] 
\{.*"answer":42.*"tag":"selftest-result".*\}/.test(r.out), `no [result] line: ${r.out.slice(-200)}`); + assert(/logs: odw runs show /.test(r.out), `zero-install logs command missing: ${r.out.slice(-300)}`); + assert(!/\.odw\/bin\/odw runs show/.test(r.out), `stale project-local logs command leaked: ${r.out.slice(-300)}`); }); test("result: a non-serializable return is a clean failure, not an opaque crash", () => { @@ -557,6 +903,26 @@ if (s === "bamboo_usage") { }) + "\\n"); process.exit(0); } +if (s === "needs_input_no_session") { + if (args[1] === "answer") { + process.stdout.write(JSON.stringify({ + ok: true, + state: "completed", + runtime: args[0] || "", + last_agent_message: "ANSWERED_WITHOUT_SESSION", + summary: { last_agent_message: "ANSWERED_WITHOUT_SESSION" } + }) + "\\n"); + process.exit(0); + } + process.stdout.write(JSON.stringify({ + ok: false, + state: "waiting_for_user", + runtime: args[0] || "", + error: { category: "needs_input", message: "pick an option", retryable: true }, + summary: { last_agent_message: "NEEDS_INPUT_WITHOUT_SESSION" } + }) + "\\n"); + process.exit(0); +} if (s === "jsonl_final_report") { process.stdout.write(JSON.stringify({ type: "start", message: "EARLY_EVENT_MESSAGE" }) + "\\n"); process.stdout.write(JSON.stringify({ type: "delta", last_agent_message: "EARLY_EVENT_MESSAGE" }) + "\\n"); @@ -569,10 +935,76 @@ if (s === "jsonl_final_report") { }) + "\\n"); process.exit(0); } +if (s === "structured_summary_json") { + const plan = { + status: "planned", + summary: "x".repeat(4200), + tasks: [ + { + id: "planned-task", + files: ["planned.txt"], + prompt: "Create planned.txt from the structured plan." 
+ } + ] + }; + process.stdout.write(JSON.stringify({ + ok: true, + state: "completed", + runtime: args[0] || "", + summary: { last_agent_message: JSON.stringify(plan) } + }) + "\\n"); + process.exit(0); +} +if (s === "status_noise") { + const { mkdirSync, writeFileSync } = await import("node:fs"); + const { spawnSync } = await import("node:child_process"); + mkdirSync(".pandacode", { recursive: true }); + writeFileSync(".pandacode/noise.txt", "executor scratch\\n"); + writeFileSync("intentional.txt", "intentional change\\n"); + const status = spawnSync("git", ["status", "--short"], { encoding: "utf8" }).stdout || ""; + process.stdout.write(JSON.stringify({ + ok: true, + state: "completed", + runtime: args[0] || "", + last_agent_message: "STATUS=" + status.replace(/\\n/g, "|"), + summary: { last_agent_message: "STATUS=" + status.replace(/\\n/g, "|") } + }) + "\\n"); + process.exit(0); +} +if (s === "review_workspace_probe") { + const { existsSync, readFileSync } = await import("node:fs"); + const sawCandidate = existsSync("review-candidate.txt"); + const taskFileIndex = args.indexOf("--task-file"); + const promptFile = taskFileIndex >= 0 ? args[taskFileIndex + 1] : ""; + const prompt = promptFile ? readFileSync(promptFile, "utf8") : ""; + const sawLandingNote = + prompt.includes("New files from the captured diff may appear as untracked") && + prompt.includes("applyWorktreeDiffs"); + const ok = sawCandidate && sawLandingNote; + process.stdout.write(JSON.stringify({ + ok: true, + state: "completed", + runtime: args[0] || "", + decision: ok ? "approve" : "reject", + summary: ok ? "candidate file exists and new-file landing note is present" : "review workspace probe failed", + blockers: [ + ...(sawCandidate ? [] : ["candidate file missing"]), + ...(sawLandingNote ? [] : ["new-file landing note missing"]) + ], + risks: [], + owner_questions: [], + verification: [ + sawCandidate ? 
"review workspace contained candidate file" : "review workspace did not contain candidate file", + sawLandingNote ? "review prompt explained untracked new-file landing" : "review prompt did not explain untracked new-file landing" + ], + files_reviewed: ["review-candidate.txt"] + }) + "\\n"); + process.exit(0); +} const R = { exit1_oktrue: ['{"ok":true,"state":"completed","summary":{"ok":true},"last_agent_message":"all good"}', 1], exit1_nook: ['{"state":"completed","summary":{},"last_agent_message":"done-ish"}', 1], - okfalse: ['{"ok":false,"state":"failed","error":{"category":"codexctl_rate_limit","message":"rate limited"}}', 0], + okfalse: ['{"ok":false,"state":"failed","error":{"category":"codex_rate_limit","message":"rate limited"}}', 0], bamboo_reply: ['{"ok":true,"state":"completed","runtime":"bamboo","summary":{"status":"completed","summary":"BAMBOO-REPLY-TEXT"}}', 0], }; const [out, code] = R[s] || R.exit1_oktrue; @@ -595,7 +1027,7 @@ test("pandacode: non-zero exit with ok:true/absent report is surfaced as failure test("pandacode: structured ok:false report preserves error category + fails", () => { const r = run(FAILWF, { backend: "pandacode", pandacodeBin: fakePanda, env: { FAKE_PANDA: "okfalse" } }); assert(r.code !== 0, `ok:false must fail: ${r.out.slice(-200)}`); - assert(/"category":"codexctl_rate_limit"/.test(r.out), `error category lost: ${r.out.slice(-200)}`); + assert(/"category":"codex_rate_limit"/.test(r.out), `error category lost: ${r.out.slice(-200)}`); }); test("pandacode: JSONL stdout selects final report instead of earlier events", () => { @@ -609,6 +1041,30 @@ return { ok:true };`; assert(!/JSONL_RESULT=EARLY_EVENT_MESSAGE/.test(r.out), `early event was selected as report: ${r.out.slice(-500)}`); }); +test("pandacode: schema nodes extract long structured final JSON before truncation", () => { + const wf = `export const meta={name:"longjson"}; +const plan = await agent("x", { + runtime:"codex", + label:"plan", + schema:{ + 
title:"task-plan.schema.json", + type:"object", + required:["status","summary","tasks"], + properties:{ + status:{enum:["planned"]}, + summary:{type:"string"}, + tasks:{type:"array",items:{type:"object",required:["id","prompt"],properties:{id:{type:"string"},files:{type:"array",items:{type:"string"}},prompt:{type:"string"}}}} + } + } +}); +log("PLAN="+plan.status+":"+plan.tasks.length+":"+plan.summary.length); +return { ok:true };`; + const r = run(wf, { backend: "pandacode", pandacodeBin: fakePanda, env: { FAKE_PANDA: "structured_summary_json" } }); + assert(r.code === 0, `long structured JSON run failed: ${r.out.slice(-500)}`); + assert(/PLAN=planned:1:4200/.test(r.out), `structured JSON was not extracted before truncation: ${r.out.slice(-500)}`); + assert(ev(r.events, "agent_schema_invalid").length === 0, `unexpected schema mismatch: ${JSON.stringify(ev(r.events, "agent_schema_invalid"))}`); +}); + test("pandacode: Bamboo provider dispatch argv and helper are passed through", () => { const wf = `export const meta={name:"pb"}; const args = await agent("x",{runtime:"bamboo",provider:"deepseek",model:"deepseek-v4-pro",effort:"high",label:"bamboo-node",id:"bamboo-node"}); @@ -616,7 +1072,7 @@ const helper = await pandacode.bamboo("y",{provider:"deepseek",label:"bamboo-hel log("PARGS="+args); log("HARGS="+helper); return { ok:true };`; - const r = run(wf, { backend: "pandacode", pandacodeBin: fakePanda, env: { FAKE_PANDA: "argv" } }); + const r = run(wf, { backend: "pandacode", pandacodeBin: fakePanda, env: { FAKE_PANDA: "argv", PANDACODE_BAMBOO_API_KEY: "fake-key" } }); assert(r.code === 0, `bamboo provider run failed: ${r.out.slice(-300)}`); assert(/PARGS=bamboo exec --provider deepseek\b/.test(r.out), `bamboo provider argv wrong: ${r.out.slice(-500)}`); assert(/HARGS=bamboo exec --provider deepseek\b/.test(r.out), `pandacode.bamboo helper argv wrong: ${r.out.slice(-500)}`); @@ -634,6 +1090,78 @@ return { ok: r?.ok !== false };`; assert(/provider is only supported 
for PandaCode Bamboo nodes; got runtime=codex/.test(r.out), `provider error unclear: ${r.out.slice(-500)}`); }); +test("pandacode: Bamboo missing API key is blocked before prompt/executor dispatch", () => { + const configDir = mkdtempSync(join(tmpRoot, "empty-bamboo-config-")); + const wf = `export const meta={name:"bpre"}; +const r = await agent("x",{runtime:"bamboo",provider:"deepseek",label:"missing-bamboo",id:"missing-bamboo"}); +return { ok: r?.ok === false && r?.state === "blocked" && r?.error?.category === "bamboo_missing_api_key" };`; + const r = run(wf, { + backend: "pandacode", + pandacodeBin: fakePanda, + env: { + FAKE_PANDA: "argv", + PANDACODE_BAMBOO_API_KEY: "", + BAMBOO_API_KEY: "", + DEEPSEEK_API_KEY: "", + PANDACODE_BAMBOO_PROVIDER: "", + PANDACODE_BAMBOO_CONFIG_DIR: configDir + } + }); + assert(r.code === 0, `missing-key preflight workflow should handle blocked result: ${r.out.slice(-500)}`); + const blocked = r.state.failedAgents?.["missing-bamboo"]?.result; + assert(blocked?.state === "blocked", `state did not record blocked result: ${JSON.stringify(blocked)}`); + assert(blocked?.error?.category === "bamboo_missing_api_key", `wrong preflight category: ${JSON.stringify(blocked)}`); + assert(ev(r.events, "panda_preflight_blocked").length === 1, `missing preflight event: ${JSON.stringify(r.events.slice(-5))}`); + const files = readdirSync(r.runDir || tmpRoot); + assert(!files.some((file) => file.endsWith(".prompt.md")), `preflight wrote prompt files: ${files.join(",")}`); + assert(!files.some((file) => file.startsWith("pandacode-bamboo-")), `preflight wrote raw reports: ${files.join(",")}`); +}); + +test("pandacode: Bamboo default provider accepts Deepseek key without explicit provider", () => { + const configDir = mkdtempSync(join(tmpRoot, "empty-bamboo-config-")); + const wf = `export const meta={name:"bdef"}; +const r = await agent("x",{runtime:"bamboo",label:"default-bamboo",id:"default-bamboo"}); +log("DEFAULT_ARGS="+r); +return { ok: typeof r 
=== "string" && r.startsWith("bamboo exec ") };`; + const r = run(wf, { + backend: "pandacode", + pandacodeBin: fakePanda, + env: { + FAKE_PANDA: "argv", + PANDACODE_BAMBOO_API_KEY: "", + BAMBOO_API_KEY: "", + DEEPSEEK_API_KEY: "fake-deepseek-key", + PANDACODE_BAMBOO_CONFIG_DIR: configDir + } + }); + assert(r.code === 0, `default Deepseek-key Bamboo run should dispatch: ${r.out.slice(-500)}`); + assert(/DEFAULT_ARGS=bamboo exec\b/.test(r.out), `default Bamboo argv missing: ${r.out.slice(-500)}`); +}); + +test("pandacode: unknown Bamboo provider is blocked before executor dispatch", () => { + const configDir = mkdtempSync(join(tmpRoot, "empty-bamboo-config-")); + const wf = `export const meta={name:"bunk"}; +const r = await agent("x",{runtime:"bamboo",provider:"not-a-provider",label:"unknown-bamboo",id:"unknown-bamboo"}); +return { ok: r?.ok === false && r?.state === "blocked" && r?.error?.category === "bamboo_unknown_provider" };`; + const r = run(wf, { + backend: "pandacode", + pandacodeBin: fakePanda, + env: { + FAKE_PANDA: "argv", + PANDACODE_BAMBOO_API_KEY: "", + BAMBOO_API_KEY: "", + DEEPSEEK_API_KEY: "", + PANDACODE_BAMBOO_CONFIG_DIR: configDir + } + }); + assert(r.code === 0, `unknown-provider preflight workflow should handle blocked result: ${r.out.slice(-500)}`); + const blocked = r.state.failedAgents?.["unknown-bamboo"]?.result; + assert(blocked?.error?.category === "bamboo_unknown_provider", `wrong unknown-provider category: ${JSON.stringify(blocked)}`); + const files = readdirSync(r.runDir || tmpRoot); + assert(!files.some((file) => file.endsWith(".prompt.md")), `unknown-provider preflight wrote prompt files: ${files.join(",")}`); + assert(!files.some((file) => file.startsWith("pandacode-bamboo-")), `unknown-provider preflight wrote raw reports: ${files.join(",")}`); +}); + test("mock: Bamboo provider agent returns normally", () => { const wf = `export const meta={name:"mb"}; const r = await 
agent("x",{runtime:"bamboo",provider:"deepseek",label:"mock-bamboo",id:"mock-bamboo"}); @@ -649,13 +1177,30 @@ test("budget: Bamboo usage total_tokens accrues when reported", () => { await agent("x",{runtime:"bamboo",provider:"deepseek",label:"usage-bamboo",id:"usage-bamboo"}); log("BUDGET spent="+budget.spent()+" approx="+Boolean(budget.approx)); return { ok:true };`; - const r = run(wf, { backend: "pandacode", pandacodeBin: fakePanda, env: { FAKE_PANDA: "bamboo_usage" }, input: { budget: { total: 1000 } } }); + const r = run(wf, { backend: "pandacode", pandacodeBin: fakePanda, env: { FAKE_PANDA: "bamboo_usage", PANDACODE_BAMBOO_API_KEY: "fake-key" }, input: { budget: { total: 1000 } } }); assert(r.code === 0, `bamboo usage run failed: ${r.out.slice(-300)}`); assert(/BUDGET spent=123/.test(r.out), `bamboo usage not accrued: ${r.out.slice(-500)}`); assert(r.state.budget?.spent === 123, `state budget spent wrong: ${JSON.stringify(r.state.budget)}`); assert(r.state.budget?.approx !== true, `usage-backed bamboo node should not mark approx: ${JSON.stringify(r.state.budget)}`); }); +test("pandacode: raw reports for exec and answer do not overwrite without downstream session", () => { + const wf = `export const meta={name:"brpt"}; +const r = await agent("x",{runtime:"bamboo",provider:"deepseek",label:"needs-input-bamboo",id:"needs-input-bamboo"}); +log("RAW_TEXT="+r); +return { ok: r === "ANSWERED_WITHOUT_SESSION" };`; + const r = run(wf, { + backend: "pandacode", + pandacodeBin: fakePanda, + env: { FAKE_PANDA: "needs_input_no_session", PANDACODE_BAMBOO_API_KEY: "fake-key" } + }); + assert(r.code === 0, `needs-input raw report workflow failed: ${r.out.slice(-500)}`); + const reports = readdirSync(r.runDir || tmpRoot).filter((file) => file.startsWith("pandacode-bamboo-") && file.endsWith(".report.json")); + assert(reports.some((file) => file.endsWith("-exec.report.json")), `missing exec raw report: ${reports.join(",")}`); + assert(reports.some((file) => 
file.endsWith("-answer.report.json")), `missing answer raw report: ${reports.join(",")}`); + assert(reports.length >= 2, `raw reports were overwritten: ${reports.join(",")}`); +}); + test("doctor: Bamboo is reported but missing api_key does not fail ODW top-level health", () => { const root = mkdtempSync(join(tmpRoot, "doctor-")); // odw is zero-install — doctor needs no scaffolded pack, just runs against a dir. @@ -687,7 +1232,7 @@ test("pandacode: bamboo summary.summary becomes the node's final text", () => { const r = run(`export const meta={name:"bs"}; const t = await agent("hi", { runtime:"bamboo", provider:"deepseek", label:"b" }); log("BAMBOO text="+JSON.stringify(typeof t === "string" ? t : (t && t.text))); -return {ok:true};`, { backend: "pandacode", pandacodeBin: fakePanda, env: { FAKE_PANDA: "bamboo_reply" } }); +return {ok:true};`, { backend: "pandacode", pandacodeBin: fakePanda, env: { FAKE_PANDA: "bamboo_reply", PANDACODE_BAMBOO_API_KEY: "fake-key" } }); assert(r.code === 0, `run failed: ${r.out.slice(-200)}`); assert(/BAMBOO text="BAMBOO-REPLY-TEXT"/.test(r.out), `bamboo reply (summary.summary) not extracted: ${r.out.slice(-300)}`); }); @@ -701,6 +1246,20 @@ return { ok: r?.ok !== false };`; assert(odwWorktreeLeftovers().length === 0, `orphan worktree after failure:\n${odwWorktreeLeftovers().join("\n")}`); }); +test("pandacode: worktree git status hides executor scratch directories", () => { + const wf = `export const meta={name:"wstatus"}; +const r = await agent("x",{runtime:"codex",isolation:"worktree",label:"status-noise"}); +log("STATUS_RESULT="+r.text); +log("WT_FILES="+r.worktree.files.join("|")); +return { ok: r?.ok !== false };`; + const r = run(wf, { backend: "pandacode", pandacodeBin: fakePanda, env: { FAKE_PANDA: "status_noise" } }); + assert(r.code === 0, `worktree status run failed: ${r.out.slice(-300)}`); + assert(/STATUS_RESULT=STATUS=\?\? 
intentional\.txt\|/.test(r.out), `intentional file missing from status: ${r.out.slice(-500)}`); + assert(!/STATUS_RESULT=.*\.pandacode/.test(r.out), `executor scratch leaked into git status: ${r.out.slice(-500)}`); + assert(/WT_FILES=intentional\.txt/.test(r.out), `captured files wrong: ${r.out.slice(-500)}`); + assert(odwWorktreeLeftovers().length === 0, `orphan worktree after status run:\n${odwWorktreeLeftovers().join("\n")}`); +}); + // 11. .d.ts contract matches the real sandbox globals (no drift) ------------- test("contract: workflow-api.d.ts globals exactly match the runtime sandbox", () => { const dts = readFileSync(join(REPO, "src/pack/templates/workflow-api.d.ts"), "utf8"); @@ -767,7 +1326,11 @@ test("report: odw report --run renders an HTML execution graph", () => { const r = run(`export const meta={name:"rp"}; phase("P",""); await parallel([ () => agent("alpha task",{label:"a",runtime:"codex",model:"gpt-5-codex"}), () => agent("beta task",{label:"b",runtime:"claude"}) ]); -return {ok:true};`); +return {ok:true,history:[ + {step:"plan",summary:"mock report history",tasks:[{id:"alpha"},{id:"beta"}]}, + {step:"review",round:1,decision:"approve",applyReady:true,blockers:[],files:["alpha.md","beta.md"]}, + {step:"verify",ok:true,guard:{ok:true}} +]};`); assert(r.code === 0 && r.runId, `run failed: ${r.out.slice(-200)}`); const rep = spawnSync(ODW, ["report", "--path", REPO, "--run", r.runId], { cwd: REPO, encoding: "utf8" }); assert((rep.status ?? 
1) === 0, `report failed: ${((rep.stdout || "") + (rep.stderr || "")).slice(-300)}`); @@ -777,6 +1340,709 @@ return {ok:true};`); assert(/"runtime":"codex"/.test(html) && /"runtime":"claude"/.test(html), "node runtimes missing in report"); assert(/"model":"gpt-5-codex"/.test(html), "node model missing in report"); assert(/config \(from code\)/.test(html) && /"prompt":"alpha task"/.test(html), "report missing config/prompt UI parsed from code"); + assert(/workflow history/.test(html), "overview workflow history missing in report"); + assert(/plan: 2 task\(s\) alpha,beta/.test(html), "plan history missing in report overview"); + assert(/review r1: approve applyReady=true blockers=0 files=2/.test(html), "review history missing in report overview"); +}); + +test("report: review gate and apply events are visible in the execution graph", () => { + const { dir } = makeGitRepo("odw-report-events-"); + try { + const r = run(`export const meta={name:"rpevents"}; +phase("P",""); +const change = await agent("write report event", { id:"change", label:"change", isolation:"worktree", mockWriteFile:"report-event.txt" }); +const gate = await reviewWorktreeDiffs([change], { label:"report-gate" }); +if (!gate.applyReady) return { ok:false, gate }; +const landed = applyWorktreeDiffs([change], { label:"report-apply" }); +return { ok: landed.ok, gate, landed };`, { cwd: dir }); + assert(r.code === 0 && r.runId, `run failed: ${r.out.slice(-500)}`); + const rep = spawnSync(ODW, ["report", "--path", dir, "--run", r.runId], { cwd: dir, encoding: "utf8" }); + assert((rep.status ?? 
1) === 0, `report failed: ${((rep.stdout || "") + (rep.stderr || "")).slice(-300)}`); + const htmlPath = (rep.stdout || "").trim().split(/\r?\n/).pop(); + const html = readFileSync(htmlPath, "utf8"); + assert(/gate: approve/.test(html), "review gate node missing from report"); + assert(/worktree_review_gate/.test(html), "review gate event detail missing from report"); + assert(/worktree_review_workspace/.test(html), "review workspace event detail missing from report"); + assert(/worktree_patch_apply/.test(html), "apply event detail missing from report"); + assert(/review gates/.test(html) && /apply events/.test(html), "overview event counts missing from report"); + } finally { + rmSync(dir, { recursive: true, force: true }); + } +}); + +test("report: rejected review gates expose blocker evidence", () => { + const { dir } = makeGitRepo("odw-report-reject-evidence-"); + try { + const r = run(`export const meta={name:"rpreject"}; +phase("P",""); +const change = await agent("write rejected report event", { id:"change", label:"change", isolation:"worktree", mockWriteFile:"report-reject.txt" }); +const gate = await reviewWorktreeDiffs([change], { label:"report-reject", context:"MOCK_REJECT" }); +return { ok: true, gate, history:[{step:"review",round:1,decision:gate.decision,applyReady:gate.applyReady,blockers:gate.blockers,files:gate.files}] };`, { cwd: dir }); + assert(r.code === 0 && r.runId, `run failed: ${r.out.slice(-500)}`); + const rep = spawnSync(ODW, ["report", "--path", dir, "--run", r.runId], { cwd: dir, encoding: "utf8" }); + assert((rep.status ?? 
1) === 0, `report failed: ${((rep.stdout || "") + (rep.stderr || "")).slice(-300)}`); + const htmlPath = (rep.stdout || "").trim().split(/\r?\n/).pop(); + const html = readFileSync(htmlPath, "utf8"); + assert(/gate: reject/.test(html), "reject gate node missing from report"); + assert(/blocker_samples/.test(html) && /mock blocker/.test(html), "reject gate blocker evidence missing from report"); + assert(/review r1: reject applyReady=false blockers=1 files=1[^<]*mock blocker/.test(html), "review history blocker sample missing from report overview"); + assert(/review_decisions/.test(html) && /review:reject/.test(html), "reject gate reviewer decision evidence missing from report"); + assert(/"failed":false/.test(html), "a repaired/non-terminal reject gate should not mark the whole report failed"); + } finally { + rmSync(dir, { recursive: true, force: true }); + } +}); + +test("report: workflow log events are visible", () => { + const r = run(`export const meta={name:"rplog"}; +phase("P",""); +log("REPORT_LOG evidence visible"); +return {ok:true};`); + assert(r.code === 0 && r.runId, `run failed: ${r.out.slice(-300)}`); + const rep = spawnSync(ODW, ["report", "--path", REPO, "--run", r.runId], { cwd: REPO, encoding: "utf8" }); + assert((rep.status ?? 
1) === 0, `report failed: ${((rep.stdout || "") + (rep.stderr || "")).slice(-300)}`); + const htmlPath = (rep.stdout || "").trim().split(/\r?\n/).pop(); + const html = readFileSync(htmlPath, "utf8"); + assert(/log: REPORT_LOG evidence visible/.test(html), "workflow log node missing from report"); + assert(/"message":"REPORT_LOG evidence visible"/.test(html), "workflow log detail missing from report"); +}); + +test("examples: parallel-review-apply starter dry-runs and lands approved diffs", () => { + const { dir } = makeGitRepo("odw-example-07-"); + try { + const scriptPath = join(REPO, "examples/07-parallel-review-apply.js"); + const input = { + test: "node -e \"console.log('example verify ok')\"", + tasks: [ + { id: "alpha", file: "docs/alpha.md", prompt: "Create docs/alpha.md." }, + { id: "beta", file: "docs/beta.md", prompt: "Create docs/beta.md." } + ] + }; + const r = run(null, { cwd: dir, scriptPath, input }); + assert(r.code === 0, `example 07 run failed: ${r.out.slice(-700)}`); + assert(existsSync(join(dir, "docs/alpha.md")), "example 07 did not land alpha file"); + assert(existsSync(join(dir, "docs/beta.md")), "example 07 did not land beta file"); + assert(ev(r.events, "worktree_review_gate").some((e) => e.decision === "approve"), "example 07 missing approve gate"); + assert(ev(r.events, "worktree_patch_apply").filter((e) => e.applied).length === 2, "example 07 missing two apply events"); + const rep = spawnSync(ODW, ["report", "--path", dir, "--run", r.runId], { cwd: dir, encoding: "utf8" }); + assert((rep.status ?? 
1) === 0, `example 07 report failed: ${((rep.stdout || "") + (rep.stderr || "")).slice(-300)}`); + const html = readFileSync((rep.stdout || "").trim().split(/\r?\n/).pop(), "utf8"); + assert(/gate: approve/.test(html) && /apply applied/.test(html), "example 07 report missing gate/apply nodes"); + } finally { + rmSync(dir, { recursive: true, force: true }); + } +}); + +test("examples: parallel-review-apply repairs only blocker-matched tasks before landing", () => { + const { dir } = makeGitRepo("odw-example-07-repair-"); + try { + const scriptPath = join(REPO, "examples/07-parallel-review-apply.js"); + const input = { + test: "node -e \"console.log('repair verify ok')\"", + maxReviewRounds: 2, + tasks: [ + { id: "alpha", file: "docs/repair-alpha.md", prompt: "Create docs/repair-alpha.md." }, + { id: "beta", file: "docs/repair-beta.md", prompt: "Create docs/repair-beta.md." } + ], + reviewers: [ + { label: "flaky-review", runtime: "codex", perspective: "MOCK_REJECT_ONCE_FILE:docs/repair-beta.md" } + ] + }; + const r = run(null, { cwd: dir, scriptPath, input }); + assert(r.code === 0, `example 07 repair run failed: ${r.out.slice(-900)}`); + assert(/repairing tasks=beta/.test(r.out), `repair did not target beta only: ${r.out.slice(-900)}`); + const result = r.state.result; + assert(Array.isArray(result?.history), "starter result missing history"); + assert(result.history.some((item) => item.step === "review" && item.decision === "reject"), "history missing rejected review"); + assert(result.history.some((item) => item.step === "repair_plan" && item.tasks?.join("|") === "beta"), "history missing targeted repair plan"); + assert(result.history.some((item) => item.step === "repair" && item.files?.includes("docs/repair-beta.md")), "history missing repair files"); + assert(result.history.some((item) => item.step === "review" && item.decision === "approve"), "history missing approved review"); + const gates = ev(r.events, "worktree_review_gate"); + assert(gates.some((e) => 
e.decision === "reject"), `repair test missing reject gate: ${JSON.stringify(gates)}`); + assert(gates.some((e) => e.decision === "approve"), `repair test missing approve gate: ${JSON.stringify(gates)}`); + const alpha = readFileSync(join(dir, "docs/repair-alpha.md"), "utf8"); + const beta = readFileSync(join(dir, "docs/repair-beta.md"), "utf8"); + assert(/mock change by impl:alpha/.test(alpha), `unchanged task should retain initial candidate: ${alpha}`); + assert(/mock change by repair:beta/.test(beta), `blocker-matched task should land repair diff: ${beta}`); + } finally { + rmSync(dir, { recursive: true, force: true }); + } +}); + +test("examples: parallel-review-apply treats secondary file mentions as repair evidence", () => { + const { dir } = makeGitRepo("odw-example-07-repair-primary-"); + try { + const scriptPath = join(REPO, "examples/07-parallel-review-apply.js"); + const input = { + test: "node -e \"console.log('primary repair verify ok')\"", + maxReviewRounds: 2, + tasks: [ + { id: "code", file: "src/api.js", prompt: "Create src/api.js." }, + { id: "tests", file: "test.mjs", prompt: "Create test.mjs." }, + { id: "docs", file: "docs/api.md", prompt: "Create docs/api.md." } + ], + reviewers: [ + { + label: "contract-review", + runtime: "codex", + perspective: "MOCK_REJECT_ONCE_BLOCKER:docs/api.md documents itemCount, but src/api.js and test.mjs expose count." 
+ } + ] + }; + const r = run(null, { cwd: dir, scriptPath, input }); + assert(r.code === 0, `example 07 primary repair run failed: ${r.out.slice(-900)}`); + assert(/repairing tasks=docs/.test(r.out), `repair did not target primary blocker file only: ${r.out.slice(-900)}`); + const result = r.state.result; + const repairPlan = result.history.find((item) => item.step === "repair_plan" && item.round === 2); + assert(repairPlan?.tasks?.join("|") === "docs", `wrong repair tasks: ${JSON.stringify(repairPlan)}`); + assert(repairPlan.retained_files?.includes("src/api.js"), `code candidate was not retained: ${JSON.stringify(repairPlan)}`); + assert(repairPlan.retained_files?.includes("test.mjs"), `test candidate was not retained: ${JSON.stringify(repairPlan)}`); + assert(/mock change by impl:code/.test(readFileSync(join(dir, "src/api.js"), "utf8")), "code task should retain initial candidate"); + assert(/mock change by impl:tests/.test(readFileSync(join(dir, "test.mjs"), "utf8")), "tests task should retain initial candidate"); + assert(/mock change by repair:docs/.test(readFileSync(join(dir, "docs/api.md"), "utf8")), "docs task should land repair candidate"); + } finally { + rmSync(dir, { recursive: true, force: true }); + } +}); + +test("examples: parallel-review-apply repairs root-cause file when blocker starts with test failure", () => { + const { dir } = makeGitRepo("odw-example-07-repair-root-cause-"); + try { + const scriptPath = join(REPO, "examples/07-parallel-review-apply.js"); + const input = { + test: "node -e \"console.log('root cause repair verify ok')\"", + maxReviewRounds: 2, + tasks: [ + { + id: "annotations", + file: "src/annotations.js", + prompt: "Create src/annotations.js." + }, + { id: "tests", file: "test.mjs", prompt: "Create test.mjs." 
} + ], + reviewers: [ + { + label: "root-cause-review", + runtime: "codex", + perspective: + "MOCK_REJECT_ONCE_BLOCKER:`node test.mjs` exits with code 1 at `test.mjs:80`: src/annotations.js does not infer sourceType from colon-delimited source strings." + } + ] + }; + const r = run(null, { cwd: dir, scriptPath, input }); + assert(r.code === 0, `example 07 root-cause repair run failed: ${r.out.slice(-900)}`); + assert( + /repairing tasks=annotations/.test(r.out), + `repair did not target implementation root cause: ${r.out.slice(-900)}` + ); + const result = r.state.result; + const repairPlan = result.history.find((item) => item.step === "repair_plan" && item.round === 2); + assert(repairPlan?.tasks?.join("|") === "annotations", `wrong repair tasks: ${JSON.stringify(repairPlan)}`); + assert( + repairPlan.retained_files?.includes("test.mjs"), + `test candidate was not retained: ${JSON.stringify(repairPlan)}` + ); + assert( + /mock change by repair:annotations/.test(readFileSync(join(dir, "src/annotations.js"), "utf8")), + "implementation task should land repair candidate" + ); + assert(/mock change by impl:tests/.test(readFileSync(join(dir, "test.mjs"), "utf8")), "test task should retain initial candidate"); + } finally { + rmSync(dir, { recursive: true, force: true }); + } +}); + +test("examples: parallel-review-apply includes symbol-named root cause with symptom file", () => { + const { dir } = makeGitRepo("odw-example-07-repair-symbol-root-cause-"); + try { + const scriptPath = join(REPO, "examples/07-parallel-review-apply.js"); + const input = { + test: "node -e \"console.log('symbol root cause repair verify ok')\"", + maxReviewRounds: 2, + tasks: [ + { + id: "core", + file: "src/decision-digest.js", + prompt: "Create createDecisionDigest(input) in src/decision-digest.js." + }, + { + id: "integration", + file: "index.js", + prompt: "Update buildProductArtifacts(input) in index.js to call the digest helper." 
+ } + ], + reviewers: [ + { + label: "symbol-root-cause-review", + runtime: "codex", + perspective: + "MOCK_REJECT_ONCE_BLOCKER:index.js shows the symptom, but createDecisionDigest omits represented batch comments from the added bucket." + } + ] + }; + const r = run(null, { cwd: dir, scriptPath, input }); + assert(r.code === 0, `example 07 symbol root-cause repair run failed: ${r.out.slice(-900)}`); + assert(/repairing tasks=.*core/.test(r.out), `repair did not include symbol root cause: ${r.out.slice(-900)}`); + const result = r.state.result; + const repairPlan = result.history.find((item) => item.step === "repair_plan" && item.round === 2); + const repairedTasks = new Set(repairPlan?.tasks || []); + assert(repairedTasks.has("core"), `repair plan missed symbol root cause: ${JSON.stringify(repairPlan)}`); + assert(repairedTasks.has("integration"), `repair plan missed symptom caller: ${JSON.stringify(repairPlan)}`); + assert( + /mock change by repair:core/.test(readFileSync(join(dir, "src/decision-digest.js"), "utf8")), + "symbol root-cause task should land repair candidate" + ); + assert(/mock change by repair:integration/.test(readFileSync(join(dir, "index.js"), "utf8")), "symptom caller should land repair candidate"); + } finally { + rmSync(dir, { recursive: true, force: true }); + } +}); + +test("examples: parallel-review-apply does not let prompt API mentions broaden file-path repair", () => { + const { dir } = makeGitRepo("odw-example-07-repair-file-path-over-prompt-"); + try { + const scriptPath = join(REPO, "examples/07-parallel-review-apply.js"); + const input = { + test: "node -e \"console.log('file path repair verify ok')\"", + maxReviewRounds: 2, + tasks: [ + { + id: "owner-inbox", + file: "src/owner-inbox.js", + prompt: "Create buildOwnerInbox(input) and detectDecisionConflicts(input) in src/owner-inbox.js." + }, + { + id: "markdown", + file: "src/markdown.js", + prompt: "Create renderMarkdownHandoff(input) in src/markdown.js." 
+ }, + { + id: "progress", + file: "src/progress.js", + prompt: "Create summarizeBatchProgress(input) in src/progress.js." + }, + { + id: "public-api", + file: "src/index.js", + prompt: + "Re-export buildOwnerInbox, detectDecisionConflicts, renderMarkdownHandoff, and summarizeBatchProgress from src/index.js." + } + ], + reviewers: [ + { + label: "integration-review", + runtime: "codex", + perspective: + 'MOCK_REJECT_ONCE_BLOCKER:src/markdown.js filters progress.nextAgentActions by item.status === "blocked", but src/progress.js emits nextAgentActions with type: "blocker"; progress-only handoffs omit the Blockers section.' + } + ] + }; + const r = run(null, { cwd: dir, scriptPath, input }); + assert(r.code === 0, `example 07 file-path repair run failed: ${r.out.slice(-900)}`); + assert(/repairing tasks=markdown/.test(r.out), `repair broadened beyond root-cause file path: ${r.out.slice(-900)}`); + const result = r.state.result; + const repairPlan = result.history.find((item) => item.step === "repair_plan" && item.round === 2); + assert(repairPlan?.tasks?.join("|") === "markdown", `wrong repair tasks: ${JSON.stringify(repairPlan)}`); + assert(repairPlan.retained_files?.includes("src/owner-inbox.js"), `owner inbox candidate was not retained: ${JSON.stringify(repairPlan)}`); + assert(repairPlan.retained_files?.includes("src/progress.js"), `progress candidate was not retained: ${JSON.stringify(repairPlan)}`); + assert(repairPlan.retained_files?.includes("src/index.js"), `public API candidate was not retained: ${JSON.stringify(repairPlan)}`); + assert(/mock change by repair:markdown/.test(readFileSync(join(dir, "src/markdown.js"), "utf8")), "markdown task should land repair candidate"); + assert(/mock change by impl:progress/.test(readFileSync(join(dir, "src/progress.js"), "utf8")), "progress task should retain initial candidate"); + assert(/mock change by impl:public-api/.test(readFileSync(join(dir, "src/index.js"), "utf8")), "public API task should retain initial 
candidate"); + } finally { + rmSync(dir, { recursive: true, force: true }); + } +}); + +test("examples: parallel-review-apply defaults to three review rounds for 3 tasks", () => { + const { dir } = makeGitRepo("odw-example-07-default-rounds-"); + try { + const scriptPath = join(REPO, "examples/07-parallel-review-apply.js"); + const input = { + test: "node -e \"console.log('default rounds verify ok')\"", + tasks: [ + { id: "code", file: "src/default-rounds.js", prompt: "Create src/default-rounds.js." }, + { id: "tests", file: "test-default-rounds.mjs", prompt: "Create test-default-rounds.mjs." }, + { id: "docs", file: "docs/default-rounds.md", prompt: "Create docs/default-rounds.md." } + ], + reviewers: [ + { + label: "default-rounds-review", + runtime: "codex", + perspective: "MOCK_REJECT_TWICE_FILE:docs/default-rounds.md" + } + ] + }; + const r = run(null, { cwd: dir, scriptPath, input }); + assert(r.code === 0, `example 07 default rounds run failed: ${r.out.slice(-1200)}`); + assert(/round 2\/3/.test(r.out) && /round 3\/3/.test(r.out), `run did not use three default rounds: ${r.out.slice(-1200)}`); + const result = r.state.result; + const reviews = result.history.filter((item) => item.step === "review"); + assert(reviews.map((item) => item.decision).join("|") === "reject|reject|approve", `wrong review sequence: ${JSON.stringify(reviews)}`); + const repairPlans = result.history.filter((item) => item.step === "repair_plan"); + assert(repairPlans.length === 2, `expected two repair plans: ${JSON.stringify(repairPlans)}`); + assert(repairPlans.every((item) => item.tasks?.join("|") === "docs"), `wrong default-round repair tasks: ${JSON.stringify(repairPlans)}`); + } finally { + rmSync(dir, { recursive: true, force: true }); + } +}); + +test("examples: parallel-review-apply defaults to four review rounds for 4+ tasks", () => { + const { dir } = makeGitRepo("odw-example-07-default-four-rounds-"); + try { + const scriptPath = join(REPO, 
"examples/07-parallel-review-apply.js"); + const input = { + test: "node -e \"console.log('default four rounds verify ok')\"", + tasks: [ + { id: "core", file: "src/default-four-core.js", prompt: "Create src/default-four-core.js." }, + { id: "api", file: "src/default-four-api.js", prompt: "Create src/default-four-api.js." }, + { id: "tests", file: "test-default-four-rounds.mjs", prompt: "Create test-default-four-rounds.mjs." }, + { id: "docs", file: "docs/default-four-rounds.md", prompt: "Create docs/default-four-rounds.md." } + ], + reviewers: [ + { + label: "default-four-rounds-review", + runtime: "codex", + perspective: "MOCK_REJECT_THRICE_FILE:docs/default-four-rounds.md" + } + ] + }; + const r = run(null, { cwd: dir, scriptPath, input }); + assert(r.code === 0, `example 07 default four rounds run failed: ${r.out.slice(-1400)}`); + const result = r.state.result; + const reviews = result.history.filter((item) => item.step === "review"); + assert( + reviews.map((item) => item.decision).join("|") === "reject|reject|reject|approve", + `wrong review sequence: ${JSON.stringify(reviews)}` + ); + const repairPlans = result.history.filter((item) => item.step === "repair_plan"); + assert(repairPlans.length === 3, `expected three repair plans: ${JSON.stringify(repairPlans)}`); + assert(repairPlans.map((item) => item.round).join("|") === "2|3|4", `wrong repair rounds: ${JSON.stringify(repairPlans)}`); + assert(repairPlans.every((item) => item.tasks?.join("|") === "docs"), `wrong default-four repair tasks: ${JSON.stringify(repairPlans)}`); + } finally { + rmSync(dir, { recursive: true, force: true }); + } +}); + +test("examples: parallel-review-apply blocks failed or cross-owned implementation before review", () => { + const { dir } = makeGitRepo("odw-example-07-pre-review-block-"); + try { + const scriptPath = join(REPO, "examples/07-parallel-review-apply.js"); + const input = { + test: "node -e \"console.log('pre-review block')\"", + maxReviewRounds: 2, + tasks: [ + { + 
id: "alpha", + file: "docs/pre-alpha.md", + mockFile: "docs/pre-beta.md", + prompt: "Create docs/pre-alpha.md but the mock intentionally writes beta's file." + }, + { + id: "beta", + file: "docs/pre-beta.md", + mockFail: true, + prompt: "Create docs/pre-beta.md but the mock intentionally fails." + } + ] + }; + const r = run(null, { cwd: dir, scriptPath, input }); + assert(r.code !== 0, "starter should not land a partial batch with failed/cross-owned implementation"); + const result = r.state.result; + assert(result?.error?.category === "implementation_pre_review_blocked", `wrong pre-review error: ${JSON.stringify(result?.error)}`); + assert(result.history?.some((item) => item.step === "pre_review_block"), "history missing pre_review_block"); + assert(result.history?.some((item) => item.step === "repair_plan" && item.reason === "pre_review_block"), "history missing pre-review repair plan"); + assert(!existsSync(join(dir, "docs/pre-beta.md")), "cross-owned partial candidate was landed"); + assert(ev(r.events, "worktree_review_gate").length === 0, "review gate should not run before implementation issues are fixed"); + assert(ev(r.events, "worktree_patch_apply").length === 0, "apply should not run for pre-review blocked batch"); + } finally { + rmSync(dir, { recursive: true, force: true }); + } +}); + +test("examples: parallel-review-apply blocks dirty task files before isolated worktrees", () => { + const { dir, git } = makeGitRepo("odw-example-07-dirty-task-"); + try { + writeFileSync(join(dir, "tracked.txt"), "committed\n"); + git(["add", "tracked.txt"]); + git(["commit", "-q", "-m", "tracked"]); + writeFileSync(join(dir, "tracked.txt"), "dirty\n"); + + const scriptPath = join(REPO, "examples/07-parallel-review-apply.js"); + const input = { + test: "node -e \"console.log('dirty task guard')\"", + tasks: [ + { id: "tracked", file: "tracked.txt", prompt: "Update tracked.txt." 
} + ] + }; + const r = run(null, { cwd: dir, scriptPath, input }); + assert(r.code !== 0, "starter should fail before worktrees when task files are dirty"); + const result = r.state.result; + assert(result?.error?.category === "dirty_task_files", `wrong dirty-file error: ${JSON.stringify(result?.error)}`); + assert(result?.dirtyTaskFiles?.includes("tracked.txt"), `dirty file not reported: ${JSON.stringify(result)}`); + assert(readFileSync(join(dir, "tracked.txt"), "utf8") === "dirty\n", "dirty guard should not rewrite the user's file"); + assert(ev(r.events, "worktree_start").length === 0, "dirty guard should not create implementation worktrees"); + assert(ev(r.events, "worktree_review_gate").length === 0, "dirty guard should not run review gate"); + assert(ev(r.events, "worktree_patch_apply").length === 0, "dirty guard should not apply patches"); + } finally { + rmSync(dir, { recursive: true, force: true }); + } +}); + +test("examples: parallel-review-apply blocks duplicate task file ownership before worktrees", () => { + const { dir } = makeGitRepo("odw-example-07-duplicate-file-"); + try { + const scriptPath = join(REPO, "examples/07-parallel-review-apply.js"); + const input = { + test: "node -e \"console.log('duplicate task guard')\"", + tasks: [ + { id: "api-impl", file: "src/./api.js", prompt: "Create src/api.js implementation." }, + { id: "api-tests", file: "src/api.js", prompt: "Add src/api.js inline tests." 
} + ] + }; + const r = run(null, { cwd: dir, scriptPath, input }); + assert(r.code !== 0, "starter should fail before worktrees when task file ownership is duplicated"); + const result = r.state.result; + assert(result?.error?.category === "duplicate_task_files", `wrong duplicate-file error: ${JSON.stringify(result?.error)}`); + assert(result?.duplicateTaskFiles?.[0]?.file === "src/api.js", `duplicate file not reported: ${JSON.stringify(result)}`); + assert(result?.duplicateTaskFiles?.[0]?.tasks?.join("|") === "api-impl|api-tests", `duplicate owners not reported: ${JSON.stringify(result)}`); + assert(ev(r.events, "worktree_start").length === 0, "duplicate guard should not create implementation worktrees"); + assert(ev(r.events, "worktree_review_gate").length === 0, "duplicate guard should not run review gate"); + assert(ev(r.events, "worktree_patch_apply").length === 0, "duplicate guard should not apply patches"); + } finally { + rmSync(dir, { recursive: true, force: true }); + } +}); + +test("examples: parallel-review-apply normalizes declared task file paths", () => { + const { dir } = makeGitRepo("odw-example-07-normalized-file-"); + try { + const scriptPath = join(REPO, "examples/07-parallel-review-apply.js"); + const input = { + test: "node -e \"console.log('normalized task file verify ok')\"", + tasks: [ + { id: "normalized", file: "docs/./normalized.md", prompt: "Create docs/normalized.md." 
} + ] + }; + const r = run(null, { cwd: dir, scriptPath, input }); + assert(r.code === 0, `starter should normalize safe relative task file paths: ${r.out.slice(-900)}`); + assert(existsSync(join(dir, "docs/normalized.md")), "normalized declared file did not land at normalized path"); + const result = r.state.result; + assert(result?.landed?.applied === 1, `normalized file was not applied: ${JSON.stringify(result?.landed)}`); + assert(!result.history?.some((item) => item.step === "pre_review_block"), `normalized file caused scope block: ${JSON.stringify(result.history)}`); + } finally { + rmSync(dir, { recursive: true, force: true }); + } +}); + +test("examples: parallel-review-apply blocks unsafe task file paths before worktrees", () => { + const { dir } = makeGitRepo("odw-example-07-invalid-file-paths-"); + try { + const scriptPath = join(REPO, "examples/07-parallel-review-apply.js"); + const input = { + test: "node -e \"console.log('invalid path guard')\"", + tasks: [ + { id: "escape", file: "../outside.md", prompt: "Write outside the repo." }, + { id: "internal", files: [".odw/internal.md", "src\\windows.js"], prompt: "Write internal/generated paths." 
} + ] + }; + const r = run(null, { cwd: dir, scriptPath, input }); + assert(r.code !== 0, "starter should fail before worktrees when task file paths are unsafe"); + const result = r.state.result; + assert(result?.error?.category === "invalid_task_files", `wrong invalid-file error: ${JSON.stringify(result?.error)}`); + const errors = Object.fromEntries((result?.invalidTaskFiles || []).map((item) => [item.file, item.error])); + assert(errors["../outside.md"] === "path_escape", `path escape not reported: ${JSON.stringify(result)}`); + assert(errors[".odw/internal.md"] === "reserved_path", `reserved path not reported: ${JSON.stringify(result)}`); + assert(errors["src\\windows.js"] === "backslash_path", `backslash path not reported: ${JSON.stringify(result)}`); + assert(ev(r.events, "worktree_start").length === 0, "invalid-file guard should not create implementation worktrees"); + assert(ev(r.events, "worktree_review_gate").length === 0, "invalid-file guard should not run review gate"); + assert(ev(r.events, "worktree_patch_apply").length === 0, "invalid-file guard should not apply patches"); + } finally { + rmSync(dir, { recursive: true, force: true }); + } +}); + +test("examples: parallel-review-apply blocks missing or duplicate task ids before worktrees", () => { + const { dir } = makeGitRepo("odw-example-07-invalid-task-ids-"); + try { + const scriptPath = join(REPO, "examples/07-parallel-review-apply.js"); + const duplicateInput = { + test: "node -e \"console.log('duplicate id guard')\"", + tasks: [ + { id: "api", file: "src/api.js", prompt: "Create src/api.js." }, + { id: "api", file: "test/api.test.js", prompt: "Create test/api.test.js." 
} + ] + }; + const duplicate = run(null, { cwd: dir, scriptPath, input: duplicateInput }); + assert(duplicate.code !== 0, "starter should fail before worktrees when task ids are duplicated"); + const duplicateResult = duplicate.state.result; + assert(duplicateResult?.error?.category === "invalid_task_ids", `wrong duplicate-id error: ${JSON.stringify(duplicateResult?.error)}`); + assert(duplicateResult?.duplicateTaskIds?.[0]?.id === "api", `duplicate id not reported: ${JSON.stringify(duplicateResult)}`); + assert(duplicateResult?.duplicateTaskIds?.[0]?.owners?.map((item) => item.file).join("|") === "src/api.js|test/api.test.js", `duplicate id owners not reported: ${JSON.stringify(duplicateResult)}`); + assert(ev(duplicate.events, "worktree_start").length === 0, "duplicate id guard should not create implementation worktrees"); + + const missingInput = { + test: "node -e \"console.log('missing id guard')\"", + tasks: [ + { id: "ok", file: "src/ok.js", prompt: "Create src/ok.js." }, + { file: "docs/missing-id.md", prompt: "Create docs/missing-id.md." 
} + ] + }; + const missing = run(null, { cwd: dir, scriptPath, input: missingInput }); + assert(missing.code !== 0, "starter should fail before worktrees when a task id is missing"); + const missingResult = missing.state.result; + assert(missingResult?.error?.category === "invalid_task_ids", `wrong missing-id error: ${JSON.stringify(missingResult?.error)}`); + assert(missingResult?.missingTaskIds?.[0]?.index === 1, `missing id index not reported: ${JSON.stringify(missingResult)}`); + assert(missingResult?.missingTaskIds?.[0]?.file === "docs/missing-id.md", `missing id file not reported: ${JSON.stringify(missingResult)}`); + assert(ev(missing.events, "worktree_start").length === 0, "missing id guard should not create implementation worktrees"); + } finally { + rmSync(dir, { recursive: true, force: true }); + } +}); + +test("examples: parallel-review-apply blocks undeclared task file ownership before worktrees", () => { + const { dir } = makeGitRepo("odw-example-07-undeclared-files-"); + try { + const scriptPath = join(REPO, "examples/07-parallel-review-apply.js"); + const input = { + test: "node -e \"console.log('undeclared file guard')\"", + tasks: [ + { id: "explore", prompt: "Find and edit whatever files are needed." }, + { id: "docs", file: "docs/owned.md", prompt: "Create docs/owned.md." 
} + ] + }; + const r = run(null, { cwd: dir, scriptPath, input }); + assert(r.code !== 0, "starter should fail before worktrees when task file ownership is undeclared"); + const result = r.state.result; + assert(result?.error?.category === "undeclared_task_files", `wrong undeclared-file error: ${JSON.stringify(result?.error)}`); + assert(result?.undeclaredTaskFiles?.[0]?.id === "explore", `undeclared task id not reported: ${JSON.stringify(result)}`); + assert(result?.undeclaredTaskFiles?.[0]?.index === 0, `undeclared task index not reported: ${JSON.stringify(result)}`); + assert(ev(r.events, "worktree_start").length === 0, "undeclared-file guard should not create implementation worktrees"); + assert(ev(r.events, "worktree_review_gate").length === 0, "undeclared-file guard should not run review gate"); + assert(ev(r.events, "worktree_patch_apply").length === 0, "undeclared-file guard should not apply patches"); + } finally { + rmSync(dir, { recursive: true, force: true }); + } +}); + +test("examples: parallel-review-apply blocks empty or non-string task prompts before worktrees", () => { + const { dir } = makeGitRepo("odw-example-07-invalid-prompts-"); + try { + const scriptPath = join(REPO, "examples/07-parallel-review-apply.js"); + const input = { + test: "node -e \"console.log('invalid prompt guard')\"", + tasks: [ + { id: "empty", file: "docs/empty.md", prompt: " " }, + { id: "object", file: "docs/object.md", prompt: { text: "Create docs/object.md." 
} } + ] + }; + const r = run(null, { cwd: dir, scriptPath, input }); + assert(r.code !== 0, "starter should fail before worktrees when task prompts are invalid"); + const result = r.state.result; + assert(result?.error?.category === "invalid_task_prompts", `wrong invalid-prompt error: ${JSON.stringify(result?.error)}`); + const prompts = Object.fromEntries((result?.invalidTaskPrompts || []).map((item) => [item.id, item.type])); + assert(prompts.empty === "string", `empty string prompt not reported: ${JSON.stringify(result)}`); + assert(prompts.object === "object", `object prompt not reported: ${JSON.stringify(result)}`); + assert(ev(r.events, "worktree_start").length === 0, "invalid prompt guard should not create implementation worktrees"); + assert(ev(r.events, "worktree_review_gate").length === 0, "invalid prompt guard should not run review gate"); + assert(ev(r.events, "worktree_patch_apply").length === 0, "invalid prompt guard should not apply patches"); + } finally { + rmSync(dir, { recursive: true, force: true }); + } +}); + +test("examples: parallel-review-apply fails if final verification mutates cwd", () => { + const { dir } = makeGitRepo("odw-example-07-verify-guard-"); + try { + const scriptPath = join(REPO, "examples/07-parallel-review-apply.js"); + const input = { + test: "node -e \"console.log('verify guard')\"", + verifyMockWriteFile: "docs/verify-leak.md", + tasks: [ + { id: "alpha", file: "docs/guard-alpha.md", prompt: "Create docs/guard-alpha.md." 
} + ] + }; + const r = run(null, { cwd: dir, scriptPath, input }); + assert(r.code !== 0, "starter should fail when final verification mutates cwd"); + const result = r.state.result; + assert(result?.error?.category === "verification_mutated_worktree", `wrong verification guard error: ${JSON.stringify(result?.error)}`); + assert(result?.verifyGuard?.added?.includes("docs/verify-leak.md"), `verify guard missing added file: ${JSON.stringify(result?.verifyGuard)}`); + assert(result?.verifyRestore?.ok === true && result.verifyRestore.removed?.includes("docs/verify-leak.md"), `verify restore missing removed file: ${JSON.stringify(result?.verifyRestore)}`); + assert(!existsSync(join(dir, "docs/verify-leak.md")), "starter verify guard left leaked file in cwd"); + assert(ev(r.events, "worktree_snapshot_check").some((e) => e.ok === false && e.files === 1), "missing failed snapshot check event"); + assert(ev(r.events, "worktree_snapshot_restore").some((e) => e.ok === true && e.removed === 1), "missing snapshot restore event"); + } finally { + rmSync(dir, { recursive: true, force: true }); + } +}); + +test("starter: built-in parallel-review-apply prints a runnable workflow", () => { + const list = runOdw(["starter", "--list"]); + assert(list.code === 0, `starter --list failed: ${list.out.slice(-300)}`); + assert(/parallel-review-apply/.test(list.out), `starter list missing parallel-review-apply: ${list.out}`); + const starter = runOdw(["starter", "parallel-review-apply"]); + assert(starter.code === 0, `starter print failed: ${starter.out.slice(-300)}`); + assert(/reviewWorktreeDiffs/.test(starter.out) && /applyWorktreeDiffs/.test(starter.out), "starter output missing review/apply APIs"); + assert(/owner-provided product intent/.test(starter.out), "starter output missing owner-intent review policy"); + assert(/history/.test(starter.out) && /repair_plan/.test(starter.out), "starter output missing review/repair history"); + assert(/pre_review_block/.test(starter.out) && 
/strictTaskFileBoundaries/.test(starter.out), "starter output missing pre-review implementation gate"); + assert(/invalid_task_ids/.test(starter.out) && /stable unique id/.test(starter.out), "starter output missing task id guard"); + assert(/invalid_task_files/.test(starter.out) && /repo-relative paths/.test(starter.out), "starter output missing invalid task-file path guard"); + assert(/invalid_task_prompts/.test(starter.out) && /non-empty prompt/.test(starter.out), "starter output missing task prompt guard"); + assert(/undeclared_task_files/.test(starter.out) && /allowUndeclaredTaskFiles/.test(starter.out), "starter output missing undeclared task-file guard"); + assert(/dirty_task_files/.test(starter.out) && /allowDirtyTaskFiles/.test(starter.out), "starter output missing dirty task-file guard"); + assert(/duplicate_task_files/.test(starter.out) && /allowDuplicateTaskFiles/.test(starter.out), "starter output missing duplicate task-file guard"); + assert(/TASK_PLAN_SCHEMA/.test(starter.out) && /planning_failed/.test(starter.out), "starter output missing high-level request planner"); + assert(/runtime: \{ enum: \["codex", "claude", "bamboo"\] \}/.test(starter.out), "starter task planner must not allow arbitrary runtimes"); + assert(/Planned task contracts/.test(starter.out) && /Current task/.test(starter.out), "starter output missing shared task context injection"); + assert(/Do not invent package entrypoints/.test(starter.out) && /do not skip tests/.test(starter.out), "starter output missing tests/docs ownership guard"); + assert(/captureMainWorktreeSnapshot/.test(starter.out) && /permission: "limited"/.test(starter.out), "starter output missing read-only final verification guard"); + const { dir } = makeGitRepo("odw-starter-cli-"); + try { + const scriptPath = join(dir, "starter.js"); + writeFileSync(join(dir, "package.json"), "{\"type\":\"module\"}\n"); + writeFileSync(scriptPath, starter.out); + const syntax = spawnSync("node", ["--check", scriptPath], { cwd: 
dir, encoding: "utf8" }); + assert((syntax.status ?? 1) === 0, `starter output is not valid ESM: ${((syntax.stdout || "") + (syntax.stderr || "")).slice(-300)}`); + const input = { + test: "node -e \"console.log('starter verify ok')\"", + tasks: [ + { id: "one", file: "docs/one.md", prompt: "Create docs/one.md." }, + { id: "two", file: "docs/two.md", prompt: "Create docs/two.md." } + ] + }; + const r = run(null, { cwd: dir, scriptPath, input }); + assert(r.code === 0, `starter output workflow failed: ${r.out.slice(-700)}`); + assert(existsSync(join(dir, "docs/one.md")) && existsSync(join(dir, "docs/two.md")), "starter output did not land docs files"); + assert(ev(r.events, "worktree_review_gate").some((e) => e.decision === "approve"), "starter output missing approve gate"); + } finally { + rmSync(dir, { recursive: true, force: true }); + } +}); + +test("examples: parallel-review-apply can plan tasks from a high-level request", () => { + const { dir } = makeGitRepo("odw-starter-plan-"); + const scriptPath = join(REPO, "examples/07-parallel-review-apply.js"); + try { + const input = { + request: "Create a small three-part docs slice from this high-level request.", + test: "node -e \"console.log('planned verify ok')\"" + }; + const r = run(null, { cwd: dir, scriptPath, input }); + assert(r.code === 0, `planned starter workflow failed: ${r.out.slice(-900)}`); + const result = r.state.result; + assert(result?.history?.[0]?.step === "plan", `history missing plan step: ${JSON.stringify(result?.history)}`); + assert( + result.history[0].tasks.map((task) => task.id).join("|") === "task-a|task-b|task-c", + `planner tasks not preserved: ${JSON.stringify(result.history[0].tasks)}` + ); + assert(existsSync(join(dir, "mock-a.txt")), "planned task-a file was not landed"); + assert(existsSync(join(dir, "mock-b.txt")), "planned task-b file was not landed"); + assert(existsSync(join(dir, "mock-c.txt")), "planned task-c file was not landed"); + assert(ev(r.events, 
"agent_done").some((e) => e.label === "plan-tasks"), "missing planner agent_done event"); + } finally { + rmSync(dir, { recursive: true, force: true }); + } }); test("observability: a model the script left implicit is backfilled from the executor", () => { diff --git a/odw/src/guide.md b/odw/src/guide.md index 3fc5e66..a4388f6 100644 --- a/odw/src/guide.md +++ b/odw/src/guide.md @@ -79,6 +79,18 @@ return { ok: verdict.passed === true, verdict }; no total; once `spent() >= total` the next `agent()` throws. (Counts TOTAL tokens, not output-only — size budgets accordingly.) - `workflow(nameOrRef, args)` — run a saved/sibling workflow inline (1 level deep). +- `reviewWorktreeDiffs(results, opts?)` — review captured worktree patches before + landing. It first preflights the combined patch without mutating cwd, then runs + one or more reviewer agents inside a temporary candidate worktree where the + combined diff is already applied. It returns + `decision:"approve"|"reject"|"needs_owner"`. Only `approve` has + `applyReady:true`; `needs_owner` is where product/owner comments and decision + gates belong. +- `applyWorktreeDiff(result)` / `applyWorktreeDiffs(results)` — apply captured + `result.worktree` patches back to the main cwd. A batch is atomic by default: + ODW checks the combined patch first, then applies it as one patch. Conflicts + return `{ ok:false, error:{ category:"patch_conflict" } }` without mutating + files. Use `continueOnError:true` only when partial landing is intentional. - `phase(title)`, `log(msg)`, `checkpoint(name, value?)`, `promptSlot(...)`. 
- `args` / `input` (the `--input` payload), `odw` (run metadata: `{ backend, runId, runDir, statePath, resumeFrom }`), `pandacode` @@ -91,6 +103,14 @@ return { ok: verdict.passed === true, verdict }; // Fan out independent edits, each isolated in its own worktree, collect diffs: const results = await parallel(TASKS.map((t) => () => agent(t.prompt, { runtime: "codex", isolation: "worktree", label: t.id }))); +const gate = await reviewWorktreeDiffs(results, { + label: "batch-gate", + reviewerCount: 2, + context: "Owner accepts only low-decision-cost changes with evidence." +}); +if (!gate.applyReady) return { ok: false, gate }; +const landed = applyWorktreeDiffs(results); // atomic by default; use after review +if (!landed.ok) return { ok: false, landed }; // Pipeline: implement -> verify, per item, no barrier between stages: const out = await pipeline(items, @@ -120,11 +140,57 @@ odw exec --script wf.js --backend mock --json # One-command execution graph preview from a mock dry run: odw report --script wf.js --open +# Print the reusable large-project starter: +odw starter parallel-review-apply > wf.js + # Real run through PandaCode: odw exec --script wf.js --backend pandacode --json # (--json prints only the workflow's return value; drop it to watch live progress) ``` +- `parallel-review-apply` is the default large-project shape: independent Codex + worktrees, a candidate-worktree review gate, bounded repair/re-review + (`args.maxReviewRounds`, default 2 for small batches, 3 for 3 tasks, and 4 for 4+ tasks), + approve-only atomic landing, then final + verification. Repair targets blocker-matched task files when possible and + falls back to full-batch repair when blockers are ambiguous. It stops instead + of landing on `needs_owner`. Final verification is guarded by a main-worktree + snapshot; if the verifier modifies files after approval, the run restores + those unapproved changes and fails instead of silently bypassing review.
+ Pass explicit `args.tasks` when you want full control over decomposition. For + lower decision cost, pass `args.request` or `args.spec` without `tasks`; the + starter first runs a structured planning node that returns owned task files, + then sends that plan through the same preflight, review, apply, and + verification gates. + Every task must declare a stable unique `id`; ODW uses it for node keys, + sessions, repair history, and reports. Every task must also declare a + non-empty string `prompt`; empty or non-string prompts are rejected before + worktrees are created. + By default each task must declare ownership with `task.file` / `task.files` + and stay inside that declared file list; failed implementation nodes or + cross-owned file edits are repaired before any review/apply gate runs. Use + the built-in request/spec planner for exploratory decomposition, or pass + `allowUndeclaredTaskFiles:true` only when the owner explicitly accepts weaker + ownership checks. Declared files must be normalized repo-relative paths outside + `.git`, `.odw`, `.pandacode`, and `node_modules`; ODW rejects absolute paths, + backslashes, and `..` escapes before creating worktrees. Set + `strictTaskFileBoundaries:false` only when the owner explicitly wants + cross-file task overlap. + Test and documentation tasks should target the declared files and exports from + the planned task set. If a required public entrypoint is missing from task + ownership, treat it as a planning blocker or add it to a task; do not invent + undeclared entrypoints or skip tests to make isolated verification pass. + The starter injects the run context and full planned task list into every + implementation/repair prompt, so tests, docs, entrypoints, and implementation + modules can align on one shared contract even though they run in isolated + worktrees. 
+ Because isolated worktrees branch from `HEAD`, the starter also refuses to run + when declared task files already have uncommitted changes; commit/stash them + first, or pass `allowDirtyTaskFiles:true` only when the owner accepts that + workers will not see those dirty changes. + It also blocks duplicate declared ownership of the same file; merge those + tasks, run them serially, or pass `allowDuplicateTaskFiles:true` only when + overlapping patches are intentional and reviewable. - The workflow's `return` value is printed as `[result] ` (or the sole output under `--json`). Returning `{ ok:false, ... }` makes `odw exec` exit non-zero — usable as a CI/script gate. @@ -139,20 +205,22 @@ odw exec --script wf.js --backend pandacode --json `new Date()` THROW (they break resume). Deterministic forms (`new Date(ts)`, other `Math.*`) work. Pass any timestamp/seed via `args`. - **Worktree needs committed files:** `isolation:"worktree"` branches from HEAD, - so **commit or stage** any spec/fixture the agent must read first. The captured + so **commit** any spec/fixture the agent must read first. The captured diff comes back in `result.worktree` (always present on a worktree node). - **Schema vs no schema:** no schema → final **text string**; schema → validated **object**. Schema validation retries only if you set `retry`/`maxAttempts` (default is one attempt — unlike the built-in tool, which auto-retries). An unloadable schema path fails fast with `schema_load_error`. - **Mock dry runs differ from real:** `--backend mock` has no executor, so a - no-schema node returns a small *status object* (NOT final text), and a `schema` - node always "fails" (mock can't synthesize your JSON, so the run exits non-zero - by design). So in a dry run, coerce results defensively - (`typeof x === "string" ? x : x.text ?? JSON.stringify(x)`) and don't gate on a - schema node passing. A real `--backend pandacode` run returns the final text / - validated object as described above. 
Use mock to prove the *graph shape* - (parallel/pipeline/phases), not node outputs. + no-schema node returns a small *status object* (NOT final text). Nodes using + ODW's packaged schemas return schema-valid synthetic objects so built-in + starter flows can be dry-run and graphed without fake schema failures. For + custom schemas, design the workflow to tolerate synthetic/mock values or run a + real `--backend pandacode` pass before trusting the content. In a dry run, + coerce no-schema results defensively + (`typeof x === "string" ? x : x.text ?? JSON.stringify(x)`). Use mock to prove + the *graph shape* (parallel/pipeline/phases) and packaged-schema wiring, not + the semantic quality of node outputs. - **Failure is data:** a node that exhausts retries returns `{ ok:false, error:{ category, ... } }` — it does **not** throw, so it stays truthy and `.filter(Boolean)` keeps it. Drop failed nodes with diff --git a/odw/src/main.rs b/odw/src/main.rs index 7745c3c..9640fca 100644 --- a/odw/src/main.rs +++ b/odw/src/main.rs @@ -27,6 +27,7 @@ odw doctor # check pandacode + runtimes are wired up\n \ odw exec --script wf.js --backend mock --json # token-free dry run\n \ odw exec --script wf.js --backend pandacode # real run\n \ odw report --script wf.js --open # HTML execution-graph preview\n \ +odw starter parallel-review-apply > wf.js # built-in large-project starter\n \ odw runs show latest # inspect a run's journal\n \ odw spec | odw contract | odw capabilities # machine-readable API + contract\n\n\ Start with `odw guide`. Everything an agent needs is in the CLI — nothing to scaffold." 
@@ -44,8 +45,6 @@ enum Commands { path: PathBuf, #[arg(long, default_value = "claude")] claude_bin: String, - #[arg(long, default_value = "codexctl")] - codexctl_bin: String, #[arg(long, env = "ODW_PANDACODE_BIN", default_value = "pandacode")] pandacode_bin: String, #[arg(long, help = "Print the full machine-readable doctor report")] @@ -57,8 +56,12 @@ enum Commands { Capabilities, #[command(about = "Print the Open Dynamic Workflow framework spec")] Spec, - #[command(about = "Print the self-contained agent usage guide (what odw is, how to author + run)")] + #[command( + about = "Print the self-contained agent usage guide (what odw is, how to author + run)" + )] Guide, + #[command(about = "Print a built-in starter workflow script")] + Starter(StarterArgs), #[command(subcommand, about = "Inspect ODW run journals and live logs")] Runs(RunsCommand), #[command(about = "Execute an ODW JavaScript workflow script directly")] @@ -98,8 +101,6 @@ struct ExecArgs { /// floor this at 600s so real coding isn't truncated. 
#[arg(long, default_value = "120")] timeout: String, - #[arg(long, env = "ODW_CODEXCTL_BIN", default_value = "codexctl")] - codexctl_bin: String, #[arg(long, env = "ODW_PANDACODE_BIN", default_value = "pandacode")] pandacode_bin: String, #[arg(long, help = "Print only the final workflow result as one JSON object")] @@ -112,11 +113,21 @@ struct ExecArgs { dry_run: bool, } +#[derive(Debug, clap::Args)] +struct StarterArgs { + #[arg(default_value = "parallel-review-apply")] + name: String, + #[arg(long, help = "List available built-in starter workflow names")] + list: bool, +} + #[derive(Debug, Subcommand)] enum RunsCommand { List { #[arg(long, default_value = ".")] path: PathBuf, + #[arg(long, help = "Print the raw JSON run list")] + json: bool, }, Show { #[arg(default_value = "latest")] @@ -128,17 +139,15 @@ enum RunsCommand { }, } - fn main() -> Result<()> { let cli = Cli::parse(); match cli.command { Commands::Doctor { path, claude_bin, - codexctl_bin, pandacode_bin, json, - } => doctor(&path, &claude_bin, &codexctl_bin, &pandacode_bin, json), + } => doctor(&path, &claude_bin, &pandacode_bin, json), Commands::Contract => { println!("{}", contract_text()); Ok(()) @@ -155,8 +164,9 @@ fn main() -> Result<()> { print!("{}", include_str!("guide.md")); Ok(()) } + Commands::Starter(args) => starter(args), Commands::Runs(command) => match command { - RunsCommand::List { path } => runs_list(&path), + RunsCommand::List { path, json } => runs_list(&path, json), RunsCommand::Show { run_id, path, tail } => runs_show(&path, &run_id, tail), }, Commands::Exec(args) => exec_script(*args), @@ -167,12 +177,11 @@ fn main() -> Result<()> { fn doctor( root: &Path, claude_bin: &str, - codexctl_bin: &str, pandacode_bin: &str, json_output: bool, ) -> Result<()> { let pandacode_bin = &resolved_pandacode_bin(pandacode_bin); - let report = doctor_report(root, claude_bin, codexctl_bin, pandacode_bin)?; + let report = doctor_report(root, claude_bin, pandacode_bin)?; let ok = report .get("ok") 
.and_then(|value| value.as_bool()) @@ -188,14 +197,46 @@ fn doctor( Ok(()) } +const PARALLEL_REVIEW_APPLY_STARTER: &str = include_str!("../examples/07-parallel-review-apply.js"); + +fn starter(args: StarterArgs) -> Result<()> { + let starters = [( + "parallel-review-apply", + "parallel Codex worktrees -> candidate review gate -> targeted repair/re-review -> approve-only atomic landing -> read-only verification guard", + PARALLEL_REVIEW_APPLY_STARTER, + )]; + if args.list { + for (name, description, _) in starters { + println!("{name}\t{description}"); + } + return Ok(()); + } + let Some((_, _, template)) = starters + .iter() + .find(|(name, _, _)| *name == args.name.as_str()) + else { + let names = starters + .iter() + .map(|(name, _, _)| *name) + .collect::>() + .join(", "); + bail!( + "unknown starter workflow: {}. Available starters: {}", + args.name, + names + ); + }; + print!("{template}"); + Ok(()) +} + fn doctor_report( root: &Path, claude_bin: &str, - codexctl_bin: &str, pandacode_bin: &str, ) -> Result { let root = normalize_root(root)?; - // `pandacode` is the one executor odw actually requires. claude/codexctl are + // `pandacode` is the one executor odw actually requires. claude/codex are // PandaCode's concern (it owns the runtimes + their mechanics), so they are // reported for information but do not gate odw's own health. // odw's script runtime runs on node; without it no workflow can execute, so @@ -203,10 +244,13 @@ fn doctor_report( let node = run_version("node", &["--version"]); let pandacode = run_version(pandacode_bin, &["--version"]); let claude = run_version(claude_bin, &["--version"]); - let codexctl = run_version(codexctl_bin, &["--help"]); - let codex = run_codex_status(codexctl_bin); let bamboo_keys = bamboo_key_report(); let runtimes = run_pandacode_doctor_report(pandacode_bin, &root); + // codex health comes from pandacode's own codex app-server doctor. 
+ let codex = runtimes + .get("codex") + .cloned() + .unwrap_or_else(|| json!({ "ok": false, "summary": "pandacode codex doctor unavailable" })); Ok(json!({ "ok": node.ok && pandacode.ok, "odw_version": ODW_VERSION, @@ -215,7 +259,6 @@ fn doctor_report( "pandacode": pandacode, "runtimes": runtimes, "claude": claude, - "codexctl": codexctl, "codex": codex, "bamboo_keys": bamboo_keys, "decision": "odw is zero-install: no project files to scaffold. It dispatches each node to `pandacode exec`, so it requires only Node.js + the pandacode binary. PandaCode owns the codex/claude/bamboo runtimes." @@ -277,16 +320,11 @@ fn render_doctor_human(report: &serde_json::Value) -> String { "{} codex: {}", icon(codex_ok), if codex_ok { - "logged in / quota check passed".to_string() - } else if value_ok(&report["codexctl"]) { - format!( - "codexctl exists, but login/quota check failed ({}) - run `codexctl status`, sign in, or refresh quota", - value_summary(&report["codex"]) - ) + "available via pandacode (codex app-server)".to_string() } else { format!( - "codexctl not runnable ({}) - install codexctl or set --codexctl-bin", - value_summary(&report["codexctl"]) + "not ready ({}) - check `pandacode codex doctor` (codex login/quota)", + value_summary(&report["codex"]) ) } )); @@ -389,60 +427,6 @@ fn bamboo_ready_count(value: &serde_json::Value) -> usize { map.values().filter(|item| value_ok(item)).count() } -fn run_codex_status(codexctl_bin: &str) -> serde_json::Value { - let checks: &[&[&str]] = &[&["status"], &["account"], &["quota"]]; - let mut failures = Vec::new(); - for args in checks { - let status = run_command_status(codexctl_bin, args); - if status.ok { - return json!({ - "ok": true, - "command": command_display(codexctl_bin, args), - "summary": status.summary - }); - } - failures.push(format!( - "{}: {}", - command_display(codexctl_bin, args), - status.summary - )); - } - json!({ - "ok": false, - "command": codexctl_bin, - "summary": failures.join("; ") - }) -} - -fn 
run_command_status(command: &str, args: &[&str]) -> ToolStatus { - match Command::new(command).args(args).output() { - Ok(output) => { - let text = if output.stdout.is_empty() { - String::from_utf8_lossy(&output.stderr).to_string() - } else { - String::from_utf8_lossy(&output.stdout).to_string() - }; - ToolStatus { - ok: output.status.success(), - command: command_display(command, args), - summary: text.lines().next().unwrap_or("").to_string(), - } - } - Err(error) => ToolStatus { - ok: false, - command: command_display(command, args), - summary: error.to_string(), - }, - } -} - -fn command_display(command: &str, args: &[&str]) -> String { - std::iter::once(command) - .chain(args.iter().copied()) - .collect::>() - .join(" ") -} - const BAMBOO_PROVIDERS: &[(&str, &str)] = &[ ("deepseek", "DEEPSEEK_API_KEY"), ("kimi", "KIMI_API_KEY"), @@ -525,11 +509,13 @@ fn run_pandacode_doctor_report(pandacode_bin: &str, root: &Path) -> serde_json:: } } -fn runs_list(root: &Path) -> Result<()> { - println!( - "{}", - serde_json::to_string_pretty(&runs_list_report(root)?)? 
- ); +fn runs_list(root: &Path, json_output: bool) -> Result<()> { + let report = runs_list_report(root)?; + if json_output { + println!("{}", serde_json::to_string_pretty(&report)?); + } else { + println!("{}", format_runs_list_view(&report)); + } Ok(()) } @@ -547,12 +533,93 @@ fn runs_list_report(root: &Path) -> Result { } } } + runs.sort_by_key(|value| std::cmp::Reverse(run_list_sort_key(value))); Ok(json!({ "runs_dir": runs_dir, "runs": runs })) } +fn run_list_sort_key(value: &serde_json::Value) -> (u64, String) { + let run_id = value + .get("run_id") + .and_then(|item| item.as_str()) + .unwrap_or("") + .to_string(); + let started_ms = value + .get("started_ms") + .and_then(|item| item.as_u64()) + .or_else(|| run_id_started_ms(&run_id)) + .unwrap_or(0); + (started_ms, run_id) +} + +fn format_runs_list_view(report: &serde_json::Value) -> String { + let runs_dir = json_string(report, "runs_dir").unwrap_or_else(|| ".odw/runs".to_string()); + let runs = report + .get("runs") + .and_then(|value| value.as_array()) + .cloned() + .unwrap_or_default(); + let mut lines = vec![format!("Runs in {runs_dir}")]; + if runs.is_empty() { + lines.push(" (none)".to_string()); + lines.push("Start one: odw exec --script --input ".to_string()); + return lines.join("\n"); + } + for run in runs.iter().take(20) { + let run_id = json_string(run, "run_id").unwrap_or_else(|| "unknown".to_string()); + let status = json_string(run, "status").unwrap_or_else(|| "unknown".to_string()); + let workflow = json_string(run, "workflow") + .as_deref() + .map(short_workflow_label) + .unwrap_or_else(|| "-".to_string()); + let duration = format_run_duration(run); + lines.push(format!( + " - {run_id} [{status}] duration={duration} workflow={workflow}" + )); + } + if runs.len() > 20 { + lines.push(format!(" ... 
{} more run(s)", runs.len() - 20)); + } + lines.push("Show: odw runs show ".to_string()); + lines.push("JSON: odw runs list --json".to_string()); + lines.join("\n") +} + +fn short_workflow_label(path: &str) -> String { + let candidate = Path::new(path) + .file_name() + .and_then(|name| name.to_str()) + .unwrap_or(path); + if candidate.is_empty() { + "-".to_string() + } else { + candidate.to_string() + } +} + +fn format_run_duration(run: &serde_json::Value) -> String { + let started = run.get("started_ms").and_then(|value| value.as_u64()); + let finished = run.get("finished_ms").and_then(|value| value.as_u64()); + match (started, finished) { + (Some(started), Some(finished)) if finished >= started => { + format!("{}ms", finished - started) + } + (Some(_), _) => "running".to_string(), + _ => "-".to_string(), + } +} + +fn run_id_started_ms(run_id: &str) -> Option { + run_id + .strip_prefix("odw-exec-")? + .split('-') + .next()? + .parse::() + .ok() +} + fn runs_show(root: &Path, run_id: &str, tail: usize) -> Result<()> { let report = runs_show_report(root, run_id, tail)?; println!("{}", format_runs_show_view(&report)); @@ -568,9 +635,11 @@ fn runs_show_report(root: &Path, run_id: &str, tail: usize) -> Result String { let run_id = json_string(run, "run_id").unwrap_or_else(|| "unknown".to_string()); let status = json_string(run, "status").unwrap_or_else(|| "unknown".to_string()); let workflow = json_string(run, "workflow").unwrap_or_else(|| "-".to_string()); - let started = run.get("started_ms").and_then(|value| value.as_u64()); - let finished = run.get("finished_ms").and_then(|value| value.as_u64()); - let duration = match (started, finished) { - (Some(started), Some(finished)) if finished >= started => { - format!("{}ms", finished - started) - } - _ => "running".to_string(), - }; + let duration = format_run_duration(run); let active = progress .get("active_agents") .and_then(|value| value.as_array()) @@ -726,6 +788,9 @@ fn format_runs_show_view(report: 
&serde_json::Value) -> String { ), format!("Resume: odw exec --resume {run_id}"), ]; + if let Some(report_path) = json_string(report, "report_path") { + lines.push(format!("Report: {report_path}")); + } let events = report .get("events") @@ -743,6 +808,7 @@ fn format_runs_show_view(report: &serde_json::Value) -> String { .rev() .find(|event| json_string(event, "type").as_deref() == Some("workflow_error")) .and_then(|event| json_string(event, "message")) + .or_else(|| progress.get("result").and_then(format_result_failure_cause)) .or_else(|| json_string(&run["error"], "message")) .or_else(|| json_string(run, "error")); if let Some(cause) = cause { @@ -784,6 +850,20 @@ fn format_runs_show_view(report: &serde_json::Value) -> String { } } + let history_lines = progress + .get("result") + .and_then(|result| result.get("history")) + .and_then(|history| history.as_array()) + .map(|history| format_workflow_history_for_runs_show(history)) + .unwrap_or_default(); + if !history_lines.is_empty() { + lines.push("".to_string()); + lines.push("Workflow history:".to_string()); + for line in history_lines { + lines.push(format!(" - {line}")); + } + } + let completed_agents = progress .get("completed_agent_details") .and_then(|value| value.as_array()) @@ -801,9 +881,7 @@ fn format_runs_show_view(report: &serde_json::Value) -> String { lines.push("".to_string()); lines.push("Recent events:".to_string()); for event in events.iter().rev().take(16).rev() { - let summary = json_string(event, "summary") - .or_else(|| json_string(event, "type")) - .unwrap_or_else(|| truncate(&event.to_string(), 160)); + let summary = format_recent_event_for_runs_show(event); lines.push(format!(" - {summary}")); } } @@ -811,6 +889,236 @@ fn format_runs_show_view(report: &serde_json::Value) -> String { lines.join("\n") } +fn format_result_failure_cause(result: &serde_json::Value) -> Option { + if result.get("ok").and_then(|value| value.as_bool()) != Some(false) { + return None; + } + let error = 
result.get("error")?; + if let Some(message) = error + .as_str() + .map(str::trim) + .filter(|value| !value.is_empty()) + { + return Some(message.to_string()); + } + let category = json_string(error, "category"); + let message = json_string(error, "message"); + match (category, message) { + (Some(category), Some(message)) if !message.trim().is_empty() => { + Some(format!("{category}: {}", message.trim())) + } + (Some(category), _) if !category.trim().is_empty() => Some(category.trim().to_string()), + (_, Some(message)) if !message.trim().is_empty() => Some(message.trim().to_string()), + _ => None, + } +} + +fn format_workflow_history_for_runs_show(history: &[serde_json::Value]) -> Vec { + history + .iter() + .filter_map(format_workflow_history_item_for_runs_show) + .take(12) + .collect() +} + +fn format_workflow_history_item_for_runs_show(item: &serde_json::Value) -> Option { + let step = json_string(item, "step")?; + let round = item.get("round").and_then(|value| value.as_u64()); + let tasks = item + .get("tasks") + .and_then(|value| value.as_array()) + .map(|items| { + items + .iter() + .filter_map(|task| { + json_string(task, "id").or_else(|| task.as_str().map(str::to_string)) + }) + .collect::>() + }) + .unwrap_or_default(); + let files = item + .get("files") + .and_then(|value| value.as_array()) + .map(Vec::len) + .unwrap_or(0); + let blockers = item + .get("blockers") + .and_then(|value| value.as_array()) + .map(Vec::len) + .unwrap_or(0); + let blocker_sample = item + .get("blockers") + .and_then(|value| value.as_array()) + .and_then(|items| items.first()) + .and_then(|value| value.as_str()) + .map(|value| format!(" — {}", truncate(value, 180))) + .unwrap_or_default(); + + match step.as_str() { + "plan" => { + let summary = json_string(item, "summary") + .map(|value| format!(" — {}", truncate(&value, 120))) + .unwrap_or_default(); + Some(format!( + "plan: {} task(s) {}{}", + tasks.len(), + truncate(&tasks.join(","), 120), + summary + )) + } + "implement" => 
Some(format!( + "implement r{}: {} task(s), {} file(s)", + round.unwrap_or(1), + tasks.len(), + files + )), + "pre_review_block" => Some(format!( + "pre-review block r{}: failed={} scope_issues={}", + round.unwrap_or(1), + item.get("failed_tasks") + .and_then(|value| value.as_array()) + .map(Vec::len) + .unwrap_or(0), + item.get("scope_issues") + .and_then(|value| value.as_array()) + .map(Vec::len) + .unwrap_or(0) + )), + "review" => { + let decision = json_string(item, "decision").unwrap_or_else(|| "unknown".to_string()); + let apply_ready = item + .get("applyReady") + .and_then(|value| value.as_bool()) + .map(|value| value.to_string()) + .unwrap_or_else(|| "false".to_string()); + Some(format!( + "review r{}: {decision} applyReady={apply_ready} blockers={} files={files}{blocker_sample}", + round.unwrap_or(1), + blockers + )) + } + "repair_plan" => { + let reason = json_string(item, "reason") + .map(|value| format!(" reason={value}")) + .unwrap_or_default(); + Some(format!( + "repair plan r{}: tasks={} retained_files={}{}", + round.unwrap_or(1), + truncate(&tasks.join(","), 140), + item.get("retained_files") + .and_then(|value| value.as_array()) + .map(Vec::len) + .unwrap_or(0), + reason + )) + } + "repair" => Some(format!( + "repair r{}: tasks={} files={} candidate_files={}", + round.unwrap_or(1), + truncate(&tasks.join(","), 140), + files, + item.get("candidate_files") + .and_then(|value| value.as_array()) + .map(Vec::len) + .unwrap_or(0) + )), + "verify" => { + let ok = item + .get("ok") + .and_then(|value| value.as_bool()) + .map(|value| value.to_string()) + .unwrap_or_else(|| "unknown".to_string()); + let guard_ok = item + .get("guard") + .and_then(|guard| guard.get("ok")) + .and_then(|value| value.as_bool()) + .map(|value| value.to_string()) + .unwrap_or_else(|| "unknown".to_string()); + Some(format!("verify: ok={ok} guard={guard_ok}")) + } + _ => Some(truncate(&item.to_string(), 220)), + } +} + +fn format_recent_event_for_runs_show(event: &serde_json::Value) 
-> String { + let raw = event.get("raw").unwrap_or(event); + if json_string(raw, "type").as_deref() == Some("workflow_done") { + let name = json_string(raw, "name").unwrap_or_else(|| "workflow".to_string()); + let result = raw + .get("result") + .filter(|value| !value.is_null()) + .map(summarize_workflow_result_for_runs_show) + .filter(|summary| !summary.is_empty()) + .map(|summary| format!(" result {summary}")) + .unwrap_or_default(); + return format!("[workflow] done {name}{result}"); + } + + if raw.is_object() { + return truncate(&summarize_script_event(raw), 260); + } + + json_string(event, "summary") + .map(|summary| truncate(&summary, 260)) + .or_else(|| json_string(event, "type")) + .unwrap_or_else(|| truncate(&event.to_string(), 160)) +} + +fn summarize_workflow_result_for_runs_show(result: &serde_json::Value) -> String { + let mut parts = Vec::new(); + if let Some(ok) = result.get("ok").and_then(|value| value.as_bool()) { + parts.push(format!("ok={ok}")); + } + if let Some(category) = result + .get("error") + .and_then(|error| json_string(error, "category")) + { + parts.push(format!("error={category}")); + } + if let Some(decision) = result + .get("gate") + .and_then(|gate| json_string(gate, "decision")) + { + parts.push(format!("decision={decision}")); + } + if let Some(apply_ready) = result + .get("gate") + .and_then(|gate| gate.get("applyReady")) + .and_then(|value| value.as_bool()) + { + parts.push(format!("applyReady={apply_ready}")); + } + if let Some(applied) = result + .get("landed") + .and_then(|landed| landed.get("applied")) + .and_then(|value| value.as_u64()) + { + parts.push(format!("applied={applied}")); + } + if let Some(failed) = result + .get("landed") + .and_then(|landed| landed.get("failed")) + .and_then(|value| value.as_u64()) + { + parts.push(format!("failed={failed}")); + } + if let Some(ok) = result + .get("verifyGuard") + .and_then(|guard| guard.get("ok")) + .and_then(|value| value.as_bool()) + { + 
parts.push(format!("verifyGuard={ok}")); + } + + if parts.is_empty() { + serde_json::to_string(result) + .map(|value| truncate(&value, 240)) + .unwrap_or_default() + } else { + parts.join(" ") + } +} + fn latest_agent_message_for_key(events: &[serde_json::Value], key: &str) -> Option { for event in events.iter().rev() { let raw = event.get("raw").unwrap_or(event); @@ -929,6 +1237,14 @@ fn exec_script(args: ExecArgs) -> Result<()> { .map(|run_dir| run_dir.join("state.json")) .filter(|path| path.exists()); fs::create_dir_all(&run_dir).with_context(|| format!("create {}", run_dir.display()))?; + // Retention: .odw/runs is otherwise unbounded (dogfooding accumulated 4882 + // run dirs). Keep the most recent ODW_RUNS_KEEP runs (default 50; 0 disables). + // Best-effort; never prunes this run's own dir or the one we are resuming. + let mut protected_runs: Vec<&Path> = vec![run_dir.as_path()]; + if let Some(resume_dir) = resume_run_dir.as_deref() { + protected_runs.push(resume_dir); + } + prune_old_runs(&root.join(".odw/runs"), &protected_runs); fs::write(run_dir.join("input.raw"), &input) .with_context(|| format!("write {}", run_dir.join("input.raw").display()))?; fs::write(&runner, ODW_JS_RUNNER).with_context(|| format!("write {}", runner.display()))?; @@ -958,7 +1274,6 @@ fn exec_script(args: ExecArgs) -> Result<()> { resume_state_path, backend: args.backend, odw_bin: current_exe, - codexctl_bin: args.codexctl_bin, pandacode_bin: resolved_pandacode_bin(&args.pandacode_bin), provider: args.provider, model: args.model, @@ -1083,7 +1398,11 @@ fn resolved_pandacode_bin(configured: &str) -> String { return configured.to_string(); } if let Ok(exe) = std::env::current_exe() { - let bin_name = if cfg!(windows) { "pandacode.exe" } else { "pandacode" }; + let bin_name = if cfg!(windows) { + "pandacode.exe" + } else { + "pandacode" + }; if let Some(sibling) = exe.parent().map(|dir| dir.join(bin_name)) && sibling.is_file() { @@ -1145,7 +1464,6 @@ struct ScriptRunConfig { 
resume_state_path: Option, backend: String, odw_bin: String, - codexctl_bin: String, pandacode_bin: String, provider: Option, model: Option, @@ -1230,7 +1548,6 @@ fn run_observable_script(root: &Path, command: Vec, config: ScriptRunCon config.resume_from.as_deref().unwrap_or_default(), ) .env("ODW_BIN", &config.odw_bin) - .env("ODW_CODEXCTL_BIN", &config.codexctl_bin) .env("ODW_PANDACODE_BIN", &config.pandacode_bin) .env("ODW_PROVIDER", config.provider.as_deref().unwrap_or("")) .env("ODW_MODEL", config.model.as_deref().unwrap_or("")) @@ -1306,7 +1623,7 @@ fn run_observable_script(root: &Path, command: Vec, config: ScriptRunCon } if !config.json_only { println!("[odw] completed run_id={}", config.run_id); - println!("[odw] logs: ./.odw/bin/odw runs show {}", config.run_id); + println!("[odw] logs: odw runs show {}", config.run_id); } return Ok(()); } @@ -1404,6 +1721,10 @@ fn summarize_script_event(value: &serde_json::Value) -> String { let message = json_string(value, "message").unwrap_or_else(|| "unknown".to_string()); format!("[workflow] error {message}") } + Some("exit") => { + let status = json_string(value, "status").unwrap_or_else(|| "-".to_string()); + format!("[exit] status={status}") + } Some("phase") => { let title = json_string(value, "title").unwrap_or_else(|| "phase".to_string()); let detail = json_string(value, "detail") @@ -1510,6 +1831,75 @@ fn summarize_script_event(value: &serde_json::Value) -> String { let files = value.get("files").and_then(|f| f.as_u64()).unwrap_or(0); format!("[worktree] done {label} changed={changed} files={files}") } + Some("worktree_patch_apply") => { + let label = ev_str(value, "label"); + let ok = value + .get("ok") + .and_then(|field| field.as_bool()) + .unwrap_or(false); + let applied = value + .get("applied") + .and_then(|field| field.as_bool()) + .unwrap_or(false); + let files = value.get("files").and_then(|f| f.as_u64()).unwrap_or(0); + format!("[worktree] apply {label} ok={ok} applied={applied} files={files}") + } + 
Some("worktree_review_gate") => { + let label = ev_str(value, "label"); + let decision = json_string(value, "decision").unwrap_or_else(|| "-".to_string()); + let ok = value + .get("ok") + .and_then(|field| field.as_bool()) + .unwrap_or(false); + let files = value.get("files").and_then(|f| f.as_u64()).unwrap_or(0); + let reviewers = value.get("reviewers").and_then(|f| f.as_u64()).unwrap_or(0); + let preflight = json_string(value, "preflight_category") + .or_else(|| json_string(value, "category")) + .map(|category| { + let message = json_string(value, "preflight_message") + .or_else(|| json_string(value, "message")) + .map(|message| format!(" message={}", truncate(&message, 160))) + .unwrap_or_default(); + format!(" category={category}{message}") + }) + .unwrap_or_default(); + format!( + "[worktree] review {label} decision={decision} ok={ok} files={files} reviewers={reviewers}{preflight}" + ) + } + Some("worktree_review_workspace") => { + let label = ev_str(value, "label"); + let status = json_string(value, "status").unwrap_or_else(|| "-".to_string()); + let files = value.get("files").and_then(|f| f.as_u64()).unwrap_or(0); + format!("[worktree] review-workspace {label} {status} files={files}") + } + Some("worktree_snapshot_check") => { + let label = ev_str(value, "label"); + let ok = value + .get("ok") + .and_then(|field| field.as_bool()) + .unwrap_or(false); + let files = ev_u64(value, "files", 0); + let added = ev_u64(value, "added", 0); + let removed = ev_u64(value, "removed", 0); + let modified = ev_u64(value, "modified", 0); + format!( + "[worktree] snapshot {label} ok={ok} files={files} added={added} removed={removed} modified={modified}" + ) + } + Some("worktree_snapshot_restore") => { + let label = ev_str(value, "label"); + let ok = value + .get("ok") + .and_then(|field| field.as_bool()) + .unwrap_or(false); + let restored = ev_u64(value, "restored", 0); + let removed = ev_u64(value, "removed", 0); + let errors = ev_u64(value, "errors", 0); + format!( + 
"[worktree] restore {label} ok={ok} restored={restored} removed={removed} errors={errors}" + ) + } Some("panda_auto_answer") => { let runtime = json_string(value, "runtime").unwrap_or_else(|| "-".to_string()); let round = value.get("round").and_then(|r| r.as_u64()).unwrap_or(0); @@ -1701,7 +2091,7 @@ fn capabilities_json() -> serde_json::Value { json!({ "primary_user": "Agent or CLI caller", "runtime": "ODW direct JavaScript runner", - "optional_integration": "Claude Code Dynamic Workflows", + "optional_integration": "External callers can invoke the same CLI/scripts; ODW does not install Claude slash commands or project files.", "agent_bridge": "PandaCode dispatches each node to its codex/claude/bamboo runtime via single-shot `pandacode exec`", "lifecycle": { "exec": { @@ -1711,39 +2101,44 @@ fn capabilities_json() -> serde_json::Value { }, "watch": { "cli": "odw runs show ", - "claude_code": "/workflows for Claude-launched runs", - "note": "Direct runs are watched through ODW journals. Claude-launched runs can also use Claude Code's workflow UI." + "list": "odw runs list", + "note": "Direct runs are watched through ODW journals and compact run summaries; HTML reports are linked from runs show when present." }, "pause_resume": { - "claude_code": "/workflows then p", "cli": "odw exec --resume ", "note": "Direct exec resumes from .odw/runs//state.json and skips completed node ids." }, "stop": { - "claude_code": "/workflows then x" + "cli": "stop the invoking process" }, "restart_agent": { - "claude_code": "/workflows then r" - }, - "save": { - "claude_code": "/workflows then s" - }, - "remove": { - "cli": "odw workflows remove " - }, - "evidence": { - "cli": "odw evidence --path ", - "note": "Reads saved Claude Code workflow artifacts under ~/.claude/projects." 
+ "cli": "edit the node prompt or options and run `odw exec --resume `; unchanged completed nodes stay cached" }, "observability": { "cli": "odw exec streams direct node progress; use odw runs list/show for journals.", - "files": ".odw/runs//events.jsonl", - "note": "ODW records workflow_start, phase, node start/done/skip, checkpoint, error, and exit events for direct runs." + "files": ".odw/runs//events.jsonl, state.json, run.json, report.html when generated", + "note": "ODW records workflow_start, phase, node start/done/skip, review gate, apply, snapshot, checkpoint, error, and exit events for direct runs." }, "spec": { "cli": "odw spec", "note": "Documents the direct workflow script contract, Codex helpers, and compatibility surfaces." }, + "contract": { + "cli": "odw contract", + "note": "Prints the full authoring contract for agents." + }, + "doctor": { + "cli": "odw doctor", + "note": "Checks node and pandacode wiring for direct runs." + }, + "report": { + "cli": "odw report --run --open", + "note": "Renders a self-contained HTML execution graph for an existing run; `--script` mock-runs first." + }, + "starter": { + "cli": "odw starter parallel-review-apply > wf.js", + "note": "Prints a built-in large-project starter workflow: optional request/spec planner, parallel worktrees, candidate-worktree review gate, targeted repair/re-review, approve-only atomic landing, and read-only final verification guarded/restored by a main-worktree snapshot." + }, "error_feedback": { "schema": ".odw/schemas/error-feedback.schema.json", "note": "Worker failures must be returned as classified, retry-aware feedback." 
@@ -1753,8 +2148,10 @@ fn capabilities_json() -> serde_json::Value { "parallel": "Dynamic Workflow-compatible parallel([() => agent(...), ...]) fan-out/join with max 16 concurrent thunks", "fanout": "fanout(items, mapper) dynamically maps structured upstream output into parallel downstream nodes", "pipeline": "Dynamic Workflow-compatible pipeline(items, ...stages) streams each item through sequential stages while items fan out", + "worktree_review": "reviewWorktreeDiffs(results, opts) preflights captured worktree patches, applies them to a temporary candidate worktree, and runs structured reviewer agents there before landing", + "worktree_apply": "applyWorktreeDiffs(results, opts) atomically applies captured worktree patches to the main cwd by default; continueOnError opts into partial landing", "schemas": "Optional .odw/schemas/*.schema.json contracts. No node receives a default schema; workflow code opts in with schema (schemaDescription optional) for runtime validation, schema_mismatch feedback, and same-node retry context injection", - "observability": ".odw/runs/*.json and .odw/runs/*/events.jsonl", + "observability": ".odw/runs/*.json, .odw/runs/*/events.jsonl, and HTML reports that include agent nodes, review gates, candidate workspaces, and apply events", "agent_types": built_in_agents().iter().map(|agent| agent.name).collect::>() } }) @@ -1768,12 +2165,9 @@ fn framework_spec_json() -> serde_json::Value { "types_dts": pack::WORKFLOW_API_DTS, "compatibility_target": { "runtime": "ODW direct JavaScript runner", - "optional_runtime": "Claude Code Dynamic Workflows", - "minimum_observed_claude_code_version": "2.1.154", - "project_subagents": ".claude/agents/*.md", - "project_commands": ".claude/commands/*.md", - "project_workflows": ".claude/workflows/*.js", - "management_surface": "odw runs list/show; Claude Code /workflows for Claude-launched runs" + "optional_external_callers": "Claude Code, Codex, shell scripts, CI, or any agent can invoke the CLI; ODW 
itself does not install slash commands or project templates.", + "project_workflows": "normal JavaScript files; workflow(nameOrRef, args) can resolve .claude/workflows/.js, odw-.js, or a path when those files already exist", + "management_surface": "odw exec, odw runs list/show, odw report, odw starter, odw guide/spec/contract/capabilities" }, "script_contract": { "language": "JavaScript module", @@ -1791,6 +2185,8 @@ fn framework_spec_json() -> serde_json::Value { "parallel": "parallel([() => agent(...), ...]) is the primary node fan-out/join API; keep concurrency <= 16.", "fanout": "fanout(items, mapper) maps structured upstream output into dynamic parallel child nodes.", "pipeline": "pipeline(items, ...stages) runs each item through sequential stages while items fan out.", + "review_worktree_diffs": "reviewWorktreeDiffs(candidates, opts?) reviews captured worktree diffs before landing: preflight combined patch, create a temporary candidate worktree with the patch applied, run structured reviewers there, and return approve/reject/needs_owner.", + "apply_worktree_diffs": "applyWorktreeDiffs(candidates, opts?) applies captured worktree diffs to the main cwd atomically by default; use continueOnError:true only for intentional partial landing.", "schema_retry": "No schema is applied by default. When options.schema is set, schemaDescription is optional; direct exec appends the full JSON Schema as a final-response-only contract, validates that final response, emits agent_schema_invalid on mismatch, injects validation context into the same node prompt, and retries up to retry.maxAttempts.", "prompt_style": "Workflow scripts declare prompt slots. Real runs inject input.prompts.; mock runs may use suggested template literals for smoke tests.", "error_feedback": "Every node prompt includes a failure contract. 
A worker that cannot complete returns .odw/schemas/error-feedback.schema.json instead of unstructured prose.", @@ -1801,15 +2197,15 @@ fn framework_spec_json() -> serde_json::Value { } }, "lifecycle": { - "run": "odw exec --script --input --backend ; optional /odw, /odw-audit, /odw-ship, /odw-flow for Claude Code", + "run": "odw exec --script --input --backend ", "watch": "odw runs show ", "observe": "odw exec live stream + odw runs list/show journals", - "pause_resume": "odw exec --resume ; optional /workflows then p for Claude Code", - "stop": "stop the invoking process; Claude Code /workflows then x for Claude-launched runs", - "restart_agent": "direct exec resumes by node id; Claude Code /workflows then r for Claude-launched runs", - "save": "workflow scripts are normal files; Claude Code /workflows then s for Claude-launched scripts", - "remove_template": "odw workflows remove ", - "artifact_evidence": "odw evidence", + "pause_resume": "odw exec --resume ", + "stop": "stop the invoking process", + "restart_agent": "edit the node prompt/options and resume; direct exec skips unchanged completed nodes by fingerprint and re-runs changed nodes", + "save": "workflow scripts are normal files; save or delete them with ordinary filesystem operations", + "starter": "odw starter parallel-review-apply > wf.js", + "report": "odw report --run --open", "run_journal": ".odw/runs//events.jsonl" }, "agent_types": built_in_agents().iter().map(|agent| json!({ @@ -1817,9 +2213,9 @@ fn framework_spec_json() -> serde_json::Value { "description": agent.description })).collect::>(), "extension_points": { - "new_agent": "Add .claude/agents/.md with Claude Code subagent frontmatter.", - "new_command": "Add .claude/commands/.md with a prompt and $ARGUMENTS.", - "new_workflow": "Add .claude/workflows/.js following .odw/framework/workflow-api.d.ts.", + "new_agent": "Add another agent(...) 
call or helper function in the workflow script; agentType is optional metadata and runtime/provider selects execution.", + "new_command": "Wrap an `odw exec` invocation in your own shell, CI, or external agent command if desired.", + "new_workflow": "Create a JavaScript workflow file following `odw spec` / workflow-api.d.ts; `odw starter` can print the built-in large-project template.", "new_schema": "Schemas are opt-in. Add .odw/schemas/.schema.json, then reference it from workflow code with agent(..., { schema, schemaDescription })." } }) @@ -1844,6 +2240,89 @@ fn sorted_dirs(dir: &Path) -> Result> { Ok(dirs) } +const STALE_ACTIVE_RUN_MS: u64 = 24 * 60 * 60 * 1000; +const RUN_RETENTION_GRACE_MS: u64 = 60 * 60 * 1000; + +/// Keep only the most recent `ODW_RUNS_KEEP` run dirs under `.odw/runs` (default +/// 50; 0 disables). Run dirs are named `odw-exec--`, so the lexical +/// order from `sorted_dirs` is chronological — the oldest are deleted first. +/// Best-effort: any failure is ignored so retention never breaks a run. +fn prune_old_runs(runs_dir: &Path, protect: &[&Path]) { + let keep = std::env::var("ODW_RUNS_KEEP") + .ok() + .and_then(|value| value.trim().parse::().ok()) + .unwrap_or(50); + prune_runs_keeping(runs_dir, keep, protect); +} + +fn prune_runs_keeping(runs_dir: &Path, keep: usize, protect: &[&Path]) { + if keep == 0 { + return; + } + let dirs = match sorted_dirs(runs_dir) { + Ok(dirs) => dirs, + Err(_) => return, + }; + if dirs.len() <= keep { + return; + } + // Protected dirs (the run we just created + the run being resumed) are never + // pruned. Other active/fresh run dirs are also skipped so concurrent + // `odw exec` processes in the same repo cannot delete each other's + // in-progress dir out from under them. 
+ let protected: Vec<_> = protect.iter().filter_map(|path| path.file_name()).collect(); + let remove_count = dirs.len() - keep; + let now_ms = now_millis() as u64; + for dir in dirs.into_iter().take(remove_count) { + if dir + .file_name() + .is_some_and(|name| protected.contains(&name)) + { + continue; + } + if !run_dir_prunable(&dir, now_ms) { + continue; + } + let _ = fs::remove_dir_all(&dir); + } +} + +fn run_dir_prunable(dir: &Path, now_ms: u64) -> bool { + let record_path = dir.join("run.json"); + if let Ok(content) = fs::read_to_string(&record_path) + && let Ok(value) = serde_json::from_str::(&content) + { + let status = value + .get("status") + .and_then(|item| item.as_str()) + .unwrap_or(""); + let finished_or_started_ms = value + .get("finished_ms") + .and_then(|item| item.as_u64()) + .or_else(|| value.get("started_ms").and_then(|item| item.as_u64())) + .or_else(|| { + value + .get("run_id") + .and_then(|item| item.as_str()) + .and_then(run_id_started_ms) + }) + .unwrap_or(now_ms); + if matches!(status, "completed" | "failed" | "error" | "stopped") { + return now_ms.saturating_sub(finished_or_started_ms) > RUN_RETENTION_GRACE_MS; + } + return now_ms.saturating_sub(finished_or_started_ms) > STALE_ACTIVE_RUN_MS; + } + + let run_id_started_ms = dir + .file_name() + .and_then(|name| name.to_str()) + .and_then(run_id_started_ms); + let Some(started_ms) = run_id_started_ms else { + return true; + }; + now_ms.saturating_sub(started_ms) > RUN_RETENTION_GRACE_MS +} + fn run_version(command: &str, args: &[&str]) -> ToolStatus { match Command::new(command).args(args).output() { Ok(output) => { @@ -1929,10 +2408,10 @@ mod tests { use super::*; use std::time::{SystemTime, UNIX_EPOCH}; - #[test] fn capabilities_expose_lifecycle_boundaries() { let value = capabilities_json(); + let rendered = serde_json::to_string(&value).unwrap(); assert_eq!(value["primary_user"], "Agent or CLI caller"); assert_eq!(value["runtime"], "ODW direct JavaScript runner"); // Assert the stable 
invariant, not exact prose: the bridge names pandacode @@ -1954,15 +2433,50 @@ mod tests { value["lifecycle"]["pause_resume"]["note"], "Direct exec resumes from .odw/runs//state.json and skips completed node ids." ); + assert_eq!(value["lifecycle"]["watch"]["list"], "odw runs list"); + assert_eq!( + value["lifecycle"]["report"]["cli"], + "odw report --run --open" + ); + assert_eq!( + value["lifecycle"]["starter"]["cli"], + "odw starter parallel-review-apply > wf.js" + ); assert_eq!( value["composition"]["parallel"], "Dynamic Workflow-compatible parallel([() => agent(...), ...]) fan-out/join with max 16 concurrent thunks" ); + assert!( + value["composition"]["worktree_review"] + .as_str() + .unwrap() + .contains("reviewWorktreeDiffs") + ); + assert!( + value["composition"]["worktree_apply"] + .as_str() + .unwrap() + .contains("applyWorktreeDiffs") + ); + for forbidden in [ + "odw evidence", + "odw workflows", + "/odw", + "/workflows then", + "claude_code", + "Claude-launched", + ] { + assert!( + !rendered.contains(forbidden), + "capabilities should not advertise unsupported direct surface {forbidden}: {rendered}" + ); + } } #[test] fn spec_exposes_direct_script_contract() { let value = framework_spec_json(); + let rendered = serde_json::to_string(&value).unwrap(); assert_eq!(value["name"], "Open Dynamic Workflow"); assert_eq!( value["compatibility_target"]["runtime"], @@ -1976,10 +2490,45 @@ mod tests { value["script_contract"]["codex"], "Route Codex through ordinary agent(prompt, { runtime: 'codex' }); agentType is optional metadata." 
); + assert!( + value["script_contract"]["review_worktree_diffs"] + .as_str() + .unwrap() + .contains("approve/reject/needs_owner") + ); + assert!( + value["script_contract"]["apply_worktree_diffs"] + .as_str() + .unwrap() + .contains("atomically") + ); + assert_eq!( + value["lifecycle"]["starter"], + "odw starter parallel-review-apply > wf.js" + ); + assert_eq!( + value["lifecycle"]["report"], + "odw report --run --open" + ); assert_eq!( value["script_contract"]["limits"]["max_concurrent_agents"], 16 ); + for forbidden in [ + "odw evidence", + "odw workflows", + "/odw", + "/workflows then", + "Claude-launched", + ".odw/framework", + "project_subagents", + "project_commands", + ] { + assert!( + !rendered.contains(forbidden), + "spec should not advertise unsupported direct surface {forbidden}: {rendered}" + ); + } } #[test] @@ -1990,16 +2539,29 @@ mod tests { assert!(ODW_JS_RUNNER.contains("globalThis.fanout")); assert!(ODW_JS_RUNNER.contains("globalThis.pipeline")); assert!(ODW_JS_RUNNER.contains("globalThis.log")); + assert!(ODW_JS_RUNNER.contains("const index = agentIndex")); + assert!(!ODW_JS_RUNNER.contains("index: agentIndex")); assert!(ODW_JS_RUNNER.contains("globalThis.promptSlot")); assert!(ODW_JS_RUNNER.contains("globalThis.args")); + assert!(ODW_JS_RUNNER.contains("bambooApiKeyPreflight")); + assert!(ODW_JS_RUNNER.contains("bamboo_missing_api_key")); + assert!(ODW_JS_RUNNER.contains("panda_preflight_blocked")); assert!(ODW_JS_RUNNER.contains("appendSchemaContract")); assert!(ODW_JS_RUNNER.contains("ODW final response contract")); + assert!(ODW_JS_RUNNER.contains("The final response must start with { and end with }")); + assert!(ODW_JS_RUNNER.contains("Required final response shape")); assert!(ODW_JS_RUNNER.contains("resolveSchemaDescription")); // Built-in Workflow parity surface (guards against runtime regressions). 
assert!(ODW_JS_RUNNER.contains("getMaxConcurrency")); assert!(ODW_JS_RUNNER.contains("scriptDeterminismGuards")); assert!(ODW_JS_RUNNER.contains("createWorktree")); assert!(ODW_JS_RUNNER.contains("captureWorktreeChanges")); + assert!(ODW_JS_RUNNER.contains("globalThis.applyWorktreeDiff")); + assert!(ODW_JS_RUNNER.contains("globalThis.applyWorktreeDiffs")); + assert!(ODW_JS_RUNNER.contains("globalThis.reviewWorktreeDiffs")); + assert!(ODW_JS_RUNNER.contains("worktree_patch_apply")); + assert!(ODW_JS_RUNNER.contains("worktree_review_gate")); + assert!(ODW_JS_RUNNER.contains("worktree_review_workspace")); assert!(ODW_JS_RUNNER.contains("globalThis.workflow")); assert!(ODW_JS_RUNNER.contains("extractJsonObjectStrings")); assert!(ODW_JS_RUNNER.contains("agentCacheKey")); @@ -2046,6 +2608,8 @@ mod tests { r#"{"type":"launch"}"#, r#"{"raw":{"type":"codex_poll","key":"active-node","last_agent_message":"Older status"},"summary":"[event] codex_poll"}"#, r#"{"raw":{"type":"codex_poll","key":"active-node","last_agent_message":"Latest active status"},"summary":"[event] codex_poll"}"#, + r#"{"raw":{"type":"worktree_review_gate","label":"batch-review-r1","ok":false,"decision":"reject","files":2,"reviewers":0,"preflight_category":"patch_conflict","preflight_message":"error: patch failed: same.txt:1\nerror: same.txt: patch does not apply","blockers":1},"summary":"[event] worktree_review_gate"}"#, + r#"{"type":"workflow_done","name":"test-flow","result":{"ok":true,"gate":{"decision":"approve","applyReady":true},"landed":{"applied":4,"failed":0},"verifyGuard":{"ok":true},"verification":["long evidence that should stay out of the compact runs show view"]}}"#, r#"{"type":"exit","status":"completed"}"#, ] .join("\n"), @@ -2055,6 +2619,52 @@ mod tests { root.join(".odw/runs/odw-run-test/state.json"), serde_json::to_string_pretty(&json!({ "workflow": "test-flow", + "result": { + "ok": true, + "history": [ + { + "step": "plan", + "summary": "mock planned summary", + "tasks": [ + {"id": 
"alpha", "files": ["a.js"]}, + {"id": "beta", "files": ["b.js"]} + ] + }, + { + "step": "implement", + "round": 1, + "tasks": ["alpha", "beta"], + "files": ["a.js", "b.js"] + }, + { + "step": "review", + "round": 1, + "decision": "reject", + "applyReady": false, + "blockers": ["test failed"], + "files": ["a.js", "b.js"] + }, + { + "step": "repair_plan", + "round": 2, + "tasks": ["beta"], + "retained_files": ["a.js"] + }, + { + "step": "review", + "round": 2, + "decision": "approve", + "applyReady": true, + "blockers": [], + "files": ["a.js", "b.js"] + }, + { + "step": "verify", + "ok": true, + "guard": {"ok": true, "files": 0} + } + ] + }, "activeAgents": { "active-node": { "key": "active-node", @@ -2073,17 +2683,167 @@ mod tests { .unwrap(), ) .unwrap(); + fs::write( + root.join(".odw/runs/odw-run-test/report.html"), + "", + ) + .unwrap(); let list = runs_list_report(&root).unwrap(); assert_eq!(list["runs"].as_array().unwrap().len(), 1); - let shown = runs_show_report(&root, "latest", 4).unwrap(); - assert_eq!(shown["events"].as_array().unwrap().len(), 4); + let list_view = format_runs_list_view(&list); + assert!(list_view.contains("Runs in ")); + assert!(list_view.contains("odw-run-test [completed]")); + assert!(list_view.contains("duration=")); + assert!(list_view.contains("workflow=-")); + assert!(list_view.contains("Show: odw runs show ")); + assert!(list_view.contains("JSON: odw runs list --json")); + let shown = runs_show_report(&root, "latest", 5).unwrap(); + assert_eq!(shown["events"].as_array().unwrap().len(), 5); + assert!( + shown["report_path"] + .as_str() + .unwrap() + .ends_with("report.html") + ); assert_eq!(shown["progress"]["completed_agents"], 0); + assert_eq!( + shown["progress"]["result"]["history"] + .as_array() + .unwrap() + .len(), + 6 + ); assert_eq!( shown["progress"]["active_agents"].as_array().unwrap().len(), 1 ); - assert!(format_runs_show_view(&shown).contains("last: Latest active status")); + let view = format_runs_show_view(&shown); 
+ assert!(view.contains("last: Latest active status")); + assert!(view.contains("Report: ")); + assert!(view.contains("report.html")); + assert!(view.contains("Workflow history:")); + assert!(view.contains("plan: 2 task(s) alpha,beta")); + assert!( + view.contains("review r1: reject applyReady=false blockers=1 files=2 — test failed") + ); + assert!(view.contains("repair plan r2: tasks=beta retained_files=1")); + assert!(view.contains("review r2: approve applyReady=true blockers=0 files=2")); + assert!(view.contains("verify: ok=true guard=true")); + assert!(view.contains("[worktree] review batch-review-r1 decision=reject ok=false files=2 reviewers=0 category=patch_conflict message=error: patch failed: same.txt:1\\nerror: same.txt: patch does not apply")); + assert!( + view.contains( + "[workflow] done test-flow result ok=true decision=approve applyReady=true applied=4 failed=0 verifyGuard=true" + ) + ); + assert!(view.contains("[exit] status=completed")); + assert!(!view.contains("long evidence that should stay out")); + + fs::remove_dir_all(root).unwrap(); + } + + #[test] + fn run_journals_show_failed_result_cause() { + let root = temp_root("run-journal-failure-cause"); + let run_dir = root.join(".odw/runs/odw-run-failed"); + fs::create_dir_all(&run_dir).unwrap(); + let record = json!({ + "run_id": "odw-run-failed", + "status": "failed", + "run_dir": run_dir, + "started_ms": 1000_u64, + "finished_ms": 1500_u64 + }); + fs::write( + root.join(".odw/runs/latest.json"), + serde_json::to_string_pretty(&record).unwrap(), + ) + .unwrap(); + fs::write( + run_dir.join("run.json"), + serde_json::to_string_pretty(&record).unwrap(), + ) + .unwrap(); + fs::write( + run_dir.join("events.jsonl"), + [ + r#"{"type":"workflow_done","name":"test-flow","result":{"ok":false,"error":{"category":"planning_failed","message":"Planner did not return tasks"}}}"#, + r#"{"type":"exit","status":"failed"}"#, + ] + .join("\n"), + ) + .unwrap(); + fs::write( + run_dir.join("state.json"), + 
serde_json::to_string_pretty(&json!({ + "result": { + "ok": false, + "error": { + "category": "planning_failed", + "message": "Planner did not return tasks" + } + }, + "activeAgents": {}, + "agents": {}, + "failedAgents": {}, + "checkpoints": {} + })) + .unwrap(), + ) + .unwrap(); + + let shown = runs_show_report(&root, "latest", 5).unwrap(); + let view = format_runs_show_view(&shown); + assert!(view.contains("Failure: planning_failed: Planner did not return tasks")); + assert_eq!( + format_result_failure_cause( + &json!({"ok": false, "error": "no captured worktree changes"}) + ), + Some("no captured worktree changes".to_string()) + ); + + fs::remove_dir_all(root).unwrap(); + } + + #[test] + fn run_journals_list_newest_first() { + let root = temp_root("run-journal-order"); + let runs_dir = root.join(".odw/runs"); + fs::create_dir_all(&runs_dir).unwrap(); + + for (run_id, started_ms) in [ + ("odw-exec-1000-1", None), + ("odw-exec-3000-1", None), + ("custom-run", Some(2000_u64)), + ] { + let run_dir = runs_dir.join(run_id); + fs::create_dir_all(&run_dir).unwrap(); + let mut record = json!({ + "run_id": run_id, + "status": "completed", + "run_dir": run_dir + }); + if let Some(started_ms) = started_ms { + record["started_ms"] = json!(started_ms); + } + fs::write( + run_dir.join("run.json"), + serde_json::to_string_pretty(&record).unwrap(), + ) + .unwrap(); + } + + let list = runs_list_report(&root).unwrap(); + let run_ids = list["runs"] + .as_array() + .unwrap() + .iter() + .map(|run| run["run_id"].as_str().unwrap()) + .collect::>(); + assert_eq!( + run_ids, + vec!["odw-exec-3000-1", "custom-run", "odw-exec-1000-1"] + ); fs::remove_dir_all(root).unwrap(); } @@ -2157,7 +2917,6 @@ return result; model: None, effort: "low".to_string(), timeout: "120".to_string(), - codexctl_bin: "codexctl".to_string(), pandacode_bin: "pandacode".to_string(), json: false, dry_run: false, @@ -2222,7 +2981,6 @@ return { ok: true }; model: None, effort: "low".to_string(), timeout: 
"120".to_string(), - codexctl_bin: "codexctl".to_string(), pandacode_bin: "pandacode".to_string(), json: false, dry_run: false, @@ -2234,6 +2992,110 @@ return { ok: true }; fs::remove_dir_all(root).unwrap(); } + #[test] + fn prune_runs_keeps_most_recent_and_protected() { + let runs = temp_root("prune-runs"); + fs::create_dir_all(&runs).unwrap(); + // Lexical order of these names is chronological (fixed-width stamp). + for i in 0..5 { + fs::create_dir_all(runs.join(format!("odw-exec-1000000000{i}-0"))).unwrap(); + } + // Keep the 2 newest, but protect one that would otherwise be pruned. + let protect = runs.join("odw-exec-10000000002-0"); + prune_runs_keeping(&runs, 2, &[protect.as_path()]); + let remaining: Vec = sorted_dirs(&runs) + .unwrap() + .into_iter() + .map(|p| p.file_name().unwrap().to_string_lossy().to_string()) + .collect(); + assert_eq!(remaining.len(), 3, "expected 2 newest + 1 protected"); + assert!(remaining.contains(&"odw-exec-10000000004-0".to_string())); + assert!(remaining.contains(&"odw-exec-10000000003-0".to_string())); + assert!(remaining.contains(&"odw-exec-10000000002-0".to_string())); // protected + assert!(!remaining.contains(&"odw-exec-10000000000-0".to_string())); + fs::remove_dir_all(&runs).unwrap(); + } + + #[test] + fn prune_runs_skips_active_and_fresh_incomplete_runs() { + let runs = temp_root("prune-active-runs"); + fs::create_dir_all(&runs).unwrap(); + let now = now_millis() as u64; + for i in 0..5 { + let run_id = format!("odw-exec-1000000000{i}-0"); + let run_dir = runs.join(&run_id); + fs::create_dir_all(&run_dir).unwrap(); + let status = if i == 2 { "running" } else { "completed" }; + let started_ms = if i == 2 { now } else { 1000000000 + i }; + fs::write( + run_dir.join("run.json"), + serde_json::to_string_pretty(&json!({ + "run_id": run_id, + "status": status, + "started_ms": started_ms + })) + .unwrap(), + ) + .unwrap(); + } + + prune_runs_keeping(&runs, 2, &[]); + let remaining: Vec = sorted_dirs(&runs) + .unwrap() + 
.into_iter() + .map(|p| p.file_name().unwrap().to_string_lossy().to_string()) + .collect(); + assert_eq!(remaining.len(), 3, "expected 2 newest + 1 active"); + assert!(remaining.contains(&"odw-exec-10000000004-0".to_string())); + assert!(remaining.contains(&"odw-exec-10000000003-0".to_string())); + assert!(remaining.contains(&"odw-exec-10000000002-0".to_string())); // active + fs::remove_dir_all(&runs).unwrap(); + } + + #[test] + fn run_dir_prunable_protects_fresh_incomplete_dirs() { + let runs = temp_root("prune-fresh-incomplete"); + fs::create_dir_all(&runs).unwrap(); + let now = now_millis() as u64; + let fresh = runs.join(format!("odw-exec-{now}-0")); + let old = runs.join("odw-exec-10000000000-0"); + fs::create_dir_all(&fresh).unwrap(); + fs::create_dir_all(&old).unwrap(); + + assert!(!run_dir_prunable(&fresh, now)); + assert!(run_dir_prunable(&old, now)); + fs::remove_dir_all(&runs).unwrap(); + } + + #[test] + fn run_dir_prunable_protects_fresh_terminal_dirs() { + let runs = temp_root("prune-fresh-terminal"); + fs::create_dir_all(&runs).unwrap(); + let now = now_millis() as u64; + let fresh = runs.join(format!("odw-exec-{now}-0")); + let old = runs.join("odw-exec-10000000000-0"); + fs::create_dir_all(&fresh).unwrap(); + fs::create_dir_all(&old).unwrap(); + let cases = [(&fresh, now, now), (&old, 10000000000, 10000000001)]; + for (dir, started_ms, finished_ms) in cases { + fs::write( + dir.join("run.json"), + serde_json::to_string_pretty(&json!({ + "run_id": dir.file_name().unwrap().to_string_lossy(), + "status": "completed", + "started_ms": started_ms, + "finished_ms": finished_ms + })) + .unwrap(), + ) + .unwrap(); + } + + assert!(!run_dir_prunable(&fresh, now)); + assert!(run_dir_prunable(&old, now)); + fs::remove_dir_all(&runs).unwrap(); + } + fn temp_root(name: &str) -> PathBuf { let stamp = SystemTime::now() .duration_since(UNIX_EPOCH) diff --git a/odw/src/pack/templates/contract.md b/odw/src/pack/templates/contract.md index e4fc16e..7e38fe2 100644 --- 
a/odw/src/pack/templates/contract.md +++ b/odw/src/pack/templates/contract.md @@ -2,9 +2,55 @@ This file is a contract for agents that write Open Dynamic Workflow scripts. -The primary runtime is `odw exec`. Claude Code can still call the same scripts -through `/odw`, but slash commands are compatibility entrypoints, not the -required trigger. +The primary runtime is `odw exec`. External agents such as Claude Code or Codex +can call the same CLI and scripts, but ODW itself does not install slash commands +or project files. + +For a built-in large-project workflow template, run: + +```bash +odw starter parallel-review-apply > wf.js +``` + +That starter fans out isolated Codex worktrees, reviews the combined candidate +in a temporary worktree, repairs reviewer-rejected batches up to +`args.maxReviewRounds` (default 2 for small batches and 3 for 3+ tasks), targets blocker-matched task files when +possible, lands only `approve` gates atomically, and then verifies the main +working directory under a read-only snapshot guard. If final verification +modifies files after approval, the run restores those unapproved changes and +fails instead of bypassing review. +Pass explicit `args.tasks` when decomposition and file ownership are already +known. For lower decision cost, pass `args.request` or `args.spec` without +`tasks`; the starter first runs a structured planning node that returns owned +task files, then sends that plan through the same preflight, review, apply, and +verification gates. +Each task must declare a stable unique `id`; ODW uses task ids for node keys, +sessions, repair history, and reports. Each task must also declare a non-empty +string `prompt`; empty or non-string prompts are rejected before worktrees are +created. +Before review, it blocks failed implementation nodes and cross-owned file edits. +Use `task.file` or `task.files` to declare each task's ownership. 
Use the +built-in request/spec planner for exploratory decomposition, or set +`allowUndeclaredTaskFiles:true` only when the owner accepts weaker ownership +checks. Declared files must be normalized +repo-relative paths outside `.git`, `.odw`, `.pandacode`, and `node_modules`; +absolute paths, backslashes, and `..` escapes are rejected before worktrees are +created. Set `strictTaskFileBoundaries:false` only with explicit owner intent. +Test and documentation tasks should target the declared files and exports from +the planned task set. If a required public entrypoint is missing from task +ownership, treat it as a planning blocker or add it to a task; do not invent +undeclared entrypoints or skip tests to make isolated verification pass. +The starter injects the run context and full planned task list into every +implementation/repair prompt, so tests, docs, entrypoints, and implementation +modules can align on one shared contract even though they run in isolated +worktrees. +Because isolated worktrees branch from `HEAD`, it also blocks dirty declared +task files before implementation; commit/stash them first, or set +`allowDirtyTaskFiles:true` only when the owner accepts that workers will not see +those uncommitted changes. +It also blocks duplicate declared ownership of the same file; merge those tasks, +run them serially, or set `allowDuplicateTaskFiles:true` only when overlapping +patches are intentional and reviewable. ## Required concepts @@ -19,6 +65,11 @@ required trigger. 
- `checkpoint`: persist a resume boundary for `odw exec --resume` - `parallel`: fan out independent agents - `pipeline`: pass verified outputs from one phase to the next +- `reviewWorktreeDiffs`: review captured worktree patches before landing; ODW + preflights the combined patch, applies it inside a temporary candidate + worktree, and runs structured reviewer agents there +- `applyWorktreeDiffs`: atomically apply captured worktree patches to the main + cwd after review; partial landing requires explicit `continueOnError:true` - `verify`: adversarial review before synthesis - `synthesize`: final answer returned to the caller @@ -52,6 +103,13 @@ stable `id` derived from the task. For staged item streams, use `pipeline(items, ...stages)`. +For parallel file edits, run mutating nodes with `isolation: "worktree"`, then +call `reviewWorktreeDiffs(results, opts)` before `applyWorktreeDiffs(results)`. +Only a review gate with `decision: "approve"` / `applyReady: true` should be +auto-landed. `reject` means rework first, preferably by running a fresh isolated +worktree round with reviewer feedback; `needs_owner` means ask the product/code +owner before applying. + ODW does not create default nodes or apply a default schema. The workflow author decides the flow with ordinary JavaScript plus `agent(...)`, `parallel(...)`, `fanout(...)`, and `checkpoint(...)`. @@ -68,7 +126,9 @@ ODW validates only the final response against the schema. On mismatch it emits same node prompt, and retries until `maxAttempts` is exhausted. The final failure is structured as `.odw/schemas/error-feedback.schema.json` with `schema_mismatch` and a node reference so downstream feedback nodes can route -it. +it. Schema nodes must return final JSON only; review verdicts should put reject +evidence in `blockers`, `risks`, `owner_questions`, and `verification` rather +than prose outside the JSON object. 
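
The review gate rule above (auto-land only on `approve` with `applyReady: true`, rework on `reject`, ask the owner on `needs_owner`) can be sketched as a small decision helper. This is a minimal sketch, not ODW's implementation: `nextActionForGate` is a hypothetical name, and the gate shape `{ decision, applyReady }` is assumed from the wording of this contract rather than taken from the runtime source.

```javascript
// Sketch only: `nextActionForGate` is a hypothetical helper, not an ODW API.
// The gate shape ({ decision, applyReady }) is assumed from the contract text.
function nextActionForGate(gate) {
  const decision = gate && gate.decision;
  // Auto-land only on an explicit approve that is also marked apply-ready.
  if (decision === "approve" && gate.applyReady === true) return "apply";
  // needs_owner: stop and ask the product/code owner before applying.
  if (decision === "needs_owner") return "ask_owner";
  // reject, or any missing/unknown decision: rework in a fresh worktree round.
  return "rework";
}
```

In a workflow script, the result of `reviewWorktreeDiffs(results, opts)` would be passed to a helper like this, and `applyWorktreeDiffs(results)` called only when it returns `"apply"`. Treating unknown or missing decisions as `"rework"` is a deliberately conservative choice so that nothing lands without an explicit approval.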
## Optional Starter Labels @@ -163,13 +223,10 @@ When launched through `odw exec`, ODW records: - full event journal at `.odw/runs//events.jsonl` - direct resume state at `.odw/runs//state.json` - raw script stderr at `.odw/runs//script-debug.log` +- run metadata at `.odw/runs//run.json` +- HTML report at `.odw/runs//report.html` when report generation is enabled - `odw exec --resume latest` -Claude Code's `/workflows` surface remains available for Claude-launched runs. - -Use `odw workflows remove ` to remove saved Open Dynamic Workflow templates from -the filesystem. - ## PandaCode backend decision Open Dynamic Workflow is a pure orchestration runtime. It dispatches executor diff --git a/odw/src/pack/templates/report/odw-report.mjs b/odw/src/pack/templates/report/odw-report.mjs index 53a5337..9728ce8 100644 --- a/odw/src/pack/templates/report/odw-report.mjs +++ b/odw/src/pack/templates/report/odw-report.mjs @@ -43,6 +43,8 @@ html,body{margin:0;height:100%;background:var(--bg);color:var(--ink);font-family .kv .v{color:var(--ink);font-family:ui-monospace,Menlo,monospace;font-size:12.5px;word-break:break-word} .kv .v.ok{color:var(--ok)}.kv .v.fail{color:var(--fail)} .prompt{background:#0a0b0f;border:1px solid var(--line);border-radius:10px;padding:14px 15px;font-size:12.5px;line-height:1.7;color:var(--ink2);white-space:pre-wrap;word-break:break-word;font-family:ui-monospace,Menlo,monospace;max-height:46vh;overflow:auto} +.history{margin:0;padding:0;list-style:none;display:grid;gap:8px} +.history li{border:1px solid var(--line);border-radius:8px;background:#0b0d13;padding:9px 11px;color:var(--ink2);font-size:12.5px;line-height:1.45} .empty{color:var(--dim);font-size:13px}
__TITLE____SUBTITLE__
@@ -64,7 +66,15 @@ function row(k,v,cls){return '
'+k+'
‹ overview'; h+='
'+esc(n.label)+'
'; - h+='
'+(n.kind==='ai'?'agent() node':'code')+(n.stage?' · '+esc(n.stage):'')+'
'; + h+='
'+(n.kind==='ai'?'agent() node':(n.kind==='event'?'workflow event':'code'))+(n.stage?' · '+esc(n.stage):'')+'
'; + if(n.kind==='event'){ + const ev=n.event||{}; + let rows=''; + for(const [k,v] of Object.entries(ev)){ rows+=row(k,typeof v==='object'?JSON.stringify(v):v,(k==='ok'&&v===false)?'fail':((k==='ok'||k==='applied'||k==='applyReady')&&v===true?'ok':'')); } + h+='
event
'+(rows||'
(empty)
')+'
'; + if(n.raw) h+='
raw
'+esc(JSON.stringify(n.raw,null,2))+'
'; + return h; + } if(n.kind!=='ai'){h+='
Orchestration step (parallel / pipeline fan-out / join / start / end) — emitted by the workflow code, not an AI call.
';return h;} // config straight from the code const cfg=n.config||{}; @@ -90,11 +100,15 @@ function showOverview(){ let h='
'+esc(o.name)+'
'+esc(o.subtitle)+'
'; h+='
run
'+ row('backend',o.backend)+row('status',o.status,o.failed?'fail':'ok')+ - row('agent nodes',num(o.ai))+row('total tokens',num(o.tokens)+(o.approx?' (≥)':''))+'
'; + row('agent nodes',num(o.ai))+row('review gates',num(o.gates))+row('apply events',num(o.applies))+ + row('total tokens',num(o.tokens)+(o.approx?' (≥)':''))+'
'; if(o.modelTokens&&o.modelTokens.length){ h+='
tokens by model
'+ o.modelTokens.map(([m,t])=>row(m,num(t))).join('')+'
';} - h+='
tip
Click any node to see its config and prompt as written in the workflow code.
'; + if(o.history&&o.history.length){ + h+='
workflow history
    '+ + o.history.map((line)=>'
  1. '+esc(line)+'
  2. ').join('')+'
';} + h+='
tip
Click any agent, review gate, workspace, or apply node to inspect the exact runtime evidence.
'; document.getElementById('detail').innerHTML=h;} function fitGraph(){const svg=document.querySelector('.graph svg'),g=document.querySelector('.graph'); if(!svg||!g)return;const vb=svg.viewBox&&svg.viewBox.baseVal;if(!vb||!vb.width)return; @@ -133,6 +147,55 @@ const nodes = {}; const order = []; let codeSeq = 0; function addCode(id, label, term = false) { nodes[id] = { id, kind: "code", label, term }; order.push(id); return id; } +function eventFields(ev) { + const fields = { type: ev.type }; + for (const key of [ + "label", + "status", + "decision", + "ok", + "applyReady", + "applied", + "files", + "file_samples", + "reviewers", + "review_decisions", + "blockers", + "blocker_samples", + "risks", + "risk_samples", + "owner_questions", + "owner_question_samples", + "verification_samples", + "added", + "removed", + "modified", + "restored", + "category", + "message" + ]) { + if (ev[key] !== undefined) fields[key] = ev[key]; + } + return fields; +} +function addEvent(label, ev) { + codeSeq += 1; + const id = `event${codeSeq}`; + nodes[id] = { + id, + kind: "event", + label, + eventType: ev.type, + event: eventFields(ev), + raw: ev, + ok: ev.ok, + status: ev.ok === false ? 
"failed" : "ok" + }; + order.push(id); + link(tail, id); + tail = id; + return id; +} const startId = addCode("start", "start", true); const edges = []; const link = (a, b) => { if (a && b) edges.push([a, b]); }; @@ -174,6 +237,20 @@ for (const ev of events) { } else if (t === "agent_skip") { if (!nodes[ev.key]) { nodes[ev.key] = { id: ev.key, kind: "ai", label: ev.label || ev.key, stage: ev.phase || "", config: ev.config || {}, prompt: "", status: "skip", tokens: null, durationMs: null }; order.push(ev.key); const g = groups[groups.length - 1]; if (g) { link(g.forkId, ev.key); g.children.push(ev.key); } else { link(tail, ev.key); tail = ev.key; } } else nodes[ev.key].status = "skip"; + } else if (t === "worktree_review_workspace") { + addEvent(`review workspace ${ev.status || ""}`.trim(), ev); + } else if (t === "worktree_review_gate") { + addEvent(`gate: ${ev.decision || "review"}`, ev); + } else if (t === "worktree_patch_apply") { + const suffix = ev.applied ? "applied" : (ev.ok === false ? "failed" : "checked"); + addEvent(`apply ${suffix}`, ev); + } else if (t === "worktree_snapshot_check") { + addEvent(`snapshot: ${ev.ok === false ? "changed" : "clean"}`, ev); + } else if (t === "worktree_snapshot_restore") { + addEvent(`snapshot restore: ${ev.ok === false ? "failed" : "ok"}`, ev); + } else if (t === "log") { + const message = String(ev.message || "").trim(); + addEvent(`log: ${message.slice(0, 32) || "message"}`, ev); } } const endId = addCode("end", "end", true); @@ -204,7 +281,61 @@ const modelTokens = Object.entries(byModel).filter(([, t]) => t > 0).sort((a, b) // always adds up to "total tokens" instead of silently falling short. 
const attributed = modelTokens.reduce((s, [, t]) => s + t, 0); if (totalTokens > attributed) modelTokens.push(["other (retries/overhead)", totalTokens - attributed]); -const overview = { name, subtitle: `${backend} · ${aiNodes.length} nodes`, backend, status, failed: Boolean(wfErr) || aiNodes.some((n) => n.status === "failed"), ai: aiNodes.length, tokens: totalTokens, approx: Boolean(state.budget && state.budget.approx), modelTokens }; +const eventNodes = order.map((id) => nodes[id]).filter((n) => n.kind === "event"); +const reviewGateCount = eventNodes.filter((n) => n.eventType === "worktree_review_gate").length; +const applyEventCount = eventNodes.filter((n) => n.eventType === "worktree_patch_apply").length; +const workflowHistory = Array.isArray(state.result?.history) + ? state.result.history.map(formatHistoryItem).filter(Boolean).slice(0, 12) + : []; +const workflowFailed = Boolean(wfErr) + || Boolean(wfDone && wfDone.result && wfDone.result.ok === false) + || Boolean(!wfDone && (aiNodes.some((n) => n.status === "failed") || eventNodes.some((n) => n.ok === false))); +const overview = { name, subtitle: `${backend} · ${aiNodes.length} nodes`, backend, status, failed: workflowFailed, ai: aiNodes.length, gates: reviewGateCount, applies: applyEventCount, tokens: totalTokens, approx: Boolean(state.budget && state.budget.approx), modelTokens, history: workflowHistory }; + +function truncateText(value, max) { + const text = String(value ?? "").replace(/\s+/g, " ").trim(); + return text.length > max ? text.slice(0, Math.max(0, max - 1)) + "…" : text; +} +function historyTasks(item) { + const tasks = Array.isArray(item?.tasks) ? item.tasks : []; + return tasks.map((task) => typeof task === "string" ? task : task?.id).filter(Boolean); +} +function historyArrayLen(item, key) { + return Array.isArray(item?.[key]) ? item[key].length : 0; +} +function historyFirstText(item, key, maxChars = 180) { + const values = Array.isArray(item?.[key]) ? 
item[key] : []; + const first = values.find((value) => typeof value === "string" && value.trim()); + return first ? ` — ${truncateText(first, maxChars)}` : ""; +} +function formatHistoryItem(item) { + const step = item?.step; + const round = item?.round || 1; + const tasks = historyTasks(item); + const files = historyArrayLen(item, "files"); + const blockers = historyArrayLen(item, "blockers"); + if (step === "plan") { + const summary = item.summary ? ` — ${truncateText(item.summary, 120)}` : ""; + return `plan: ${tasks.length} task(s) ${truncateText(tasks.join(","), 120)}${summary}`; + } + if (step === "implement") return `implement r${round}: ${tasks.length} task(s), ${files} file(s)`; + if (step === "pre_review_block") { + return `pre-review block r${round}: failed=${historyArrayLen(item, "failed_tasks")} scope_issues=${historyArrayLen(item, "scope_issues")}`; + } + if (step === "review") { + return `review r${round}: ${item.decision || "unknown"} applyReady=${Boolean(item.applyReady)} blockers=${blockers} files=${files}${historyFirstText(item, "blockers")}`; + } + if (step === "repair_plan") { + return `repair plan r${round}: tasks=${truncateText(tasks.join(","), 120)} retained_files=${historyArrayLen(item, "retained_files")}`; + } + if (step === "repair") { + return `repair r${round}: tasks=${truncateText(tasks.join(","), 120)} files=${files} candidate_files=${historyArrayLen(item, "candidate_files")}`; + } + if (step === "verify") { + return `verify: ok=${Boolean(item.ok)} guard=${Boolean(item.guard?.ok)}`; + } + return step ? `${step}${item.round ? ` r${item.round}` : ""}` : null; +} // ---- mermaid (uncoloured) ------------------------------------------------- const safe = (id) => "n_" + String(id).replace(/[^a-zA-Z0-9_]/g, "_"); @@ -214,12 +345,19 @@ function mermaid() { const n = nodes[id]; const lbl = String(n.label).replace(/"/g, "”").slice(0, 24); const shape = n.term ? `(["${lbl}"])` : (n.kind === "code" ? 
`{"${lbl}"}` : `("${lbl}")`); - const cls = n.kind === "ai" ? (n.status === "failed" ? "fail" : "node") : "code"; + const cls = n.kind === "ai" + ? (n.status === "failed" ? "fail" : "node") + : (n.kind === "event" + ? (n.ok === false ? "fail" : (n.eventType === "worktree_review_gate" ? "gate" : (n.eventType === "worktree_patch_apply" ? "apply" : "event"))) + : "code"); L.push(` ${safe(id)}${shape}:::${cls}`); } for (const [a, b] of edges) L.push(` ${safe(a)} --> ${safe(b)}`); L.push(" classDef node fill:#161922,stroke:#586074,color:#e3e6ef,stroke-width:1.3px;"); L.push(" classDef fail fill:#241317,stroke:#e0574b,color:#f3d2cf,stroke-width:1.3px;"); + L.push(" classDef gate fill:#13201c,stroke:#35c79a,color:#d7fff2,stroke-width:1.35px;"); + L.push(" classDef apply fill:#151d2d,stroke:#72a7ff,color:#e3efff,stroke-width:1.25px;"); + L.push(" classDef event fill:#101825,stroke:#4a5366,color:#dbe2f0,stroke-width:1.15px;"); L.push(" classDef code fill:#101218,stroke:#363a45,color:#9aa0b0,stroke-width:1.1px;"); return L.join("\n"); } @@ -227,7 +365,7 @@ function mermaid() { const njson = {}; for (const id of order) { const n = nodes[id]; - njson[safe(id)] = { kind: n.kind, label: n.label, stage: n.stage, config: n.config || {}, prompt: n.prompt, status: n.status, tokens: n.tokens, durationMs: n.durationMs }; + njson[safe(id)] = { kind: n.kind, label: n.label, stage: n.stage, config: n.config || {}, prompt: n.prompt, status: n.status, tokens: n.tokens, durationMs: n.durationMs, event: n.event, raw: n.raw }; } const vendorRel = (file) => { diff --git a/odw/src/pack/templates/runtime/odw-js-runner.mjs b/odw/src/pack/templates/runtime/odw-js-runner.mjs index b739d6a..d454a26 100644 --- a/odw/src/pack/templates/runtime/odw-js-runner.mjs +++ b/odw/src/pack/templates/runtime/odw-js-runner.mjs @@ -1,4 +1,5 @@ import { spawn, execFileSync } from "node:child_process"; +import { createHash } from "node:crypto"; import { existsSync, mkdirSync, readFileSync, renameSync, rmSync, 
writeFileSync } from "node:fs"; import { basename, dirname } from "node:path"; import os from "node:os"; @@ -11,7 +12,6 @@ const runDir = process.env.ODW_RUN_DIR || cwd; const statePath = process.env.ODW_STATE_PATH || ""; const resumeStatePath = process.env.ODW_RESUME_STATE_PATH || ""; const resumeFrom = process.env.ODW_RESUME_FROM || ""; -const codexctlBin = process.env.ODW_CODEXCTL_BIN || "codexctl"; const pandacodeBin = process.env.ODW_PANDACODE_BIN || "pandacode"; const runId = process.env.ODW_RUN_ID || basename(runDir); const provider = process.env.ODW_PROVIDER || ""; @@ -140,7 +140,8 @@ globalThis.checkpoint = (name, value = null) => { globalThis.agent = async (prompt, options = {}) => { agentIndex += 1; - const label = options.label || `agent-${agentIndex}`; + const index = agentIndex; + const label = options.label || `agent-${index}`; const phase = options.phase || currentPhase || ""; const agentType = firstText(options.agentType, options.nodeType, options.role) || undefined; const normalizedOptions = { ...options, label, phase }; @@ -175,8 +176,8 @@ globalThis.agent = async (prompt, options = {}) => { retryable: false } }; - markAgentFailed({ key, label, phase, agentType, attempt: 1, maxAttempts, result }); - emit({ type: "agent_done", index: agentIndex, key, label, phase, agentType, attempt: 1, maxAttempts, ok: false, result }); + markAgentFailed({ index, key, label, phase, agentType, attempt: 1, maxAttempts, result }); + emit({ type: "agent_done", index, key, label, phase, agentType, attempt: 1, maxAttempts, ok: false, result }); return result; } const cached = state.agents[key]; @@ -185,7 +186,7 @@ globalThis.agent = async (prompt, options = {}) => { && cached?.result !== undefined && (cached.fingerprint === undefined || cached.fingerprint === fingerprint) ) { - emit({ type: "agent_skip", index: agentIndex, key, label, phase, agentType, reason: "cached", result: cached.result }); + emit({ type: "agent_skip", index, key, label, phase, agentType, reason: 
"cached", result: cached.result }); return cached.result; } @@ -218,8 +219,8 @@ globalThis.agent = async (prompt, options = {}) => { maxAttempts: maxAttempts > 1 ? maxAttempts : undefined, }; for (let attempt = 1; attempt <= maxAttempts; attempt += 1) { - markAgentActive({ key, label, phase, agentType, attempt, maxAttempts }); - emit({ type: "agent_start", index: agentIndex, key, label, phase, agentType, runtime: displayRuntime, model: displayModel, promptPreview, config: nodeConfig, attempt, maxAttempts }); + markAgentActive({ index, key, label, phase, agentType, attempt, maxAttempts }); + emit({ type: "agent_start", index, key, label, phase, agentType, runtime: displayRuntime, model: displayModel, promptPreview, config: nodeConfig, attempt, maxAttempts }); try { const rawResult = await runAgent(attemptPrompt, { ...normalizedOptions, attempt, previousFailure }); // Count every dispatched attempt's tokens. A node that retries or ultimately @@ -249,7 +250,7 @@ globalThis.agent = async (prompt, options = {}) => { : displayModel; state.agents[key] = { ok, - index: agentIndex, + index, key, fingerprint, label, @@ -266,7 +267,7 @@ globalThis.agent = async (prompt, options = {}) => { delete state.activeAgents[key]; delete state.failedAgents[key]; saveState(); - emit({ type: "agent_done", index: agentIndex, key, label, phase, agentType, runtime: displayRuntime, model: resolvedModel, attempt, maxAttempts, ok, tokens: nodeTotalTokens(rawResult), result: finalResult }); + emit({ type: "agent_done", index, key, label, phase, agentType, runtime: displayRuntime, model: resolvedModel, attempt, maxAttempts, ok, tokens: nodeTotalTokens(rawResult), result: finalResult }); return finalResult; } @@ -286,7 +287,7 @@ globalThis.agent = async (prompt, options = {}) => { const retryable = attempt < maxAttempts; emit({ type: "agent_schema_invalid", - index: agentIndex, + index, key, label, phase, @@ -308,11 +309,11 @@ globalThis.agent = async (prompt, options = {}) => { updatedAt: new 
Date().toISOString() }; saveState(); - emit({ type: "agent_retry", index: agentIndex, key, label, phase, agentType, attempt, nextAttempt: attempt + 1, maxAttempts, reason: "schema_mismatch" }); + emit({ type: "agent_retry", index, key, label, phase, agentType, attempt, nextAttempt: attempt + 1, maxAttempts, reason: "schema_mismatch" }); continue; } - markAgentFailed({ key, label, phase, agentType, attempt, maxAttempts, result: previousFailure }); - emit({ type: "agent_done", index: agentIndex, key, label, phase, agentType, attempt, maxAttempts, ok: false, result: previousFailure }); + markAgentFailed({ index, key, label, phase, agentType, attempt, maxAttempts, result: previousFailure }); + emit({ type: "agent_done", index, key, label, phase, agentType, attempt, maxAttempts, ok: false, result: previousFailure }); return previousFailure; } @@ -336,12 +337,12 @@ globalThis.agent = async (prompt, options = {}) => { updatedAt: new Date().toISOString() }; saveState(); - emit({ type: "agent_retry", index: agentIndex, key, label, phase, agentType, attempt, nextAttempt: attempt + 1, maxAttempts, reason: result?.error?.category || "worker_failed" }); + emit({ type: "agent_retry", index, key, label, phase, agentType, attempt, nextAttempt: attempt + 1, maxAttempts, reason: result?.error?.category || "worker_failed" }); continue; } - markAgentFailed({ key, label, phase, agentType, attempt, maxAttempts, result }); - emit({ type: "agent_done", index: agentIndex, key, label, phase, agentType, attempt, maxAttempts, ok: false, result }); + markAgentFailed({ index, key, label, phase, agentType, attempt, maxAttempts, result }); + emit({ type: "agent_done", index, key, label, phase, agentType, attempt, maxAttempts, ok: false, result }); return result; } catch (error) { previousResult = { @@ -371,11 +372,11 @@ globalThis.agent = async (prompt, options = {}) => { updatedAt: new Date().toISOString() }; saveState(); - emit({ type: "agent_retry", index: agentIndex, key, label, phase, 
agentType, attempt, nextAttempt: attempt + 1, maxAttempts, reason: "workflow_agent_failed" }); + emit({ type: "agent_retry", index, key, label, phase, agentType, attempt, nextAttempt: attempt + 1, maxAttempts, reason: "workflow_agent_failed" }); continue; } - markAgentFailed({ key, label, phase, agentType, attempt, maxAttempts, result: previousFailure }); - emit({ type: "agent_done", index: agentIndex, key, label, phase, agentType, attempt, maxAttempts, ok: false, result: previousFailure }); + markAgentFailed({ index, key, label, phase, agentType, attempt, maxAttempts, result: previousFailure }); + emit({ type: "agent_done", index, key, label, phase, agentType, attempt, maxAttempts, ok: false, result: previousFailure }); return previousFailure; } } @@ -386,10 +387,10 @@ globalThis.agent = async (prompt, options = {}) => { }; }; -function markAgentActive({ key, label, phase, agentType, attempt, maxAttempts }) { +function markAgentActive({ index, key, label, phase, agentType, attempt, maxAttempts }) { state.activeAgents[key] = { key, - index: agentIndex, + index, label, phase, agentType, @@ -402,11 +403,11 @@ function markAgentActive({ key, label, phase, agentType, attempt, maxAttempts }) saveState(); } -function markAgentFailed({ key, label, phase, agentType, attempt, maxAttempts, result }) { +function markAgentFailed({ index, key, label, phase, agentType, attempt, maxAttempts, result }) { delete state.activeAgents[key]; state.failedAgents[key] = { key, - index: agentIndex, + index, label, phase, agentType, @@ -507,6 +508,8 @@ function appendSchemaContract(prompt, descriptor, schemaDescription = "") { } lines.push( `When the task is complete, make your final assistant response exactly one JSON object that satisfies ${descriptor.name}.`, + "The final response must start with { and end with }. 
Put every decision, failure, blocker, and verification detail inside that JSON object.", + "Never answer a schema node with plain prose, markdown bullets, a bare decision word, or a fenced code block.", "Do not wrap the final JSON in markdown fences. Do not add prose before or after the final JSON object.", "If you attempted the task but cannot complete it, make the final assistant response an object matching .odw/schemas/error-feedback.schema.json instead of prose.", "Required JSON Schema:", @@ -629,6 +632,15 @@ function normalizeNodeResult(result, options, schemaDescriptor = null) { function extractStructuredCodexOutput(report, schemaDescriptor = null) { const codex = report?.codex && typeof report.codex === "object" ? report.codex : null; const summary = report?.summary && typeof report.summary === "object" ? report.summary : null; + const validStructuredCandidate = (candidate) => { + if (!candidate || typeof candidate !== "object" || Array.isArray(candidate)) { + return null; + } + if (schemaDescriptor?.schema && !validateNodeResult(candidate, schemaDescriptor).valid) { + return null; + } + return candidate; + }; for (const candidate of [ report?.structured_output, report?.structuredOutput, @@ -646,11 +658,12 @@ function extractStructuredCodexOutput(report, schemaDescriptor = null) { codex?.output, codex?.json ]) { - if (candidate && typeof candidate === "object" && !Array.isArray(candidate)) { - return candidate; + const structured = validStructuredCandidate(candidate); + if (structured) { + return structured; } } - if (codex && !looksLikeCodexEnvelope(codex)) { + if (codex && !looksLikeCodexEnvelope(codex) && validStructuredCandidate(codex)) { return codex; } for (const message of report?.agent_messages ?? []) { @@ -1054,7 +1067,9 @@ function retryPrompt(originalPrompt, failure, schema = null, schemaDescription = truncateJson(failure?.context?.previous_result, 6000), "", "Retry instruction:", - "Do the same node task again. 
Preserve the original intent, fix only the failed contract, and return output that satisfies the requested schema." + "Do the same node task again. Preserve the original intent, fix only the failed contract, and return output that satisfies the requested schema.", + "Your final response must be JSON only: start with {, end with }, no markdown, no prose, no bare approve/reject/needs_owner word.", + "If the node is a review and the decision is reject or needs_owner, put the evidence into blockers, risks, owner_questions, and verification fields instead of explaining outside JSON." ].filter(Boolean).join("\n"), schema, schemaDescription); } @@ -1264,6 +1279,52 @@ globalThis.pandacode = { } }; +globalThis.applyWorktreeDiff = (candidate, options = {}) => applyCapturedWorktreeDiff(candidate, options); +globalThis.applyWorktreeDiffs = (candidates, options = {}) => { + if (!options.continueOnError) { + return applyCapturedWorktreeDiffsAtomic(candidates, options); + } + const list = Array.isArray(candidates) ? candidates : [candidates]; + const results = []; + for (let index = 0; index < list.length; index += 1) { + const result = applyCapturedWorktreeDiff(list[index], { + ...options, + label: patchApplyLabel(options, index) + }); + results.push(result); + } + return worktreePatchBatchResult(results); +}; +globalThis.reviewWorktreeDiffs = (candidates, options = {}) => reviewCapturedWorktreeDiffs(candidates, options); +globalThis.captureMainWorktreeSnapshot = (options = {}) => captureMainWorktreeSnapshot(options); +globalThis.assertMainWorktreeUnchanged = (snapshot, options = {}) => assertMainWorktreeUnchanged(snapshot, options); +globalThis.restoreMainWorktreeSnapshot = (snapshot, check, options = {}) => restoreMainWorktreeSnapshot(snapshot, check, options); + +function combinePatchTexts(entries) { + return entries + .map((entry) => String(entry.diff || "")) + .filter((diff) => diff.trim()) + .map((diff) => diff.endsWith("\n") ? 
diff : `${diff}\n`) + .join(""); +} + +const mockReviewRejectOnce = new Map(); + +const WORKTREE_REVIEW_SCHEMA = { + title: "odw-worktree-review.schema.json", + type: "object", + required: ["decision", "summary", "blockers", "risks", "owner_questions", "verification"], + properties: { + decision: { enum: ["approve", "reject", "needs_owner"] }, + summary: { type: "string" }, + blockers: { type: "array", items: { type: "string" } }, + risks: { type: "array", items: { type: "string" } }, + owner_questions: { type: "array", items: { type: "string" } }, + verification: { type: "array", items: { type: "string" } }, + files_reviewed: { type: "array", items: { type: "string" } } + } +}; + function createWorktree(baseCwd, options) { let gitOk = false; try { @@ -1292,6 +1353,7 @@ function createWorktree(baseCwd, options) { const dir = `${parent}/${label}-${worktreeSeq}`; rmSync(dir, { recursive: true, force: true }); execFileSync("git", ["-C", baseCwd, "worktree", "add", "--detach", "--quiet", dir], { stdio: "ignore" }); + configureWorktreeExcludes(dir); const cleanup = () => { try { execFileSync("git", ["-C", baseCwd, "worktree", "remove", "--force", dir], { stdio: "ignore" }); @@ -1303,6 +1365,25 @@ function createWorktree(baseCwd, options) { return { dir, cleanup }; } +function configureWorktreeExcludes(dir) { + try { + const excludePath = execFileSync("git", ["-C", dir, "rev-parse", "--git-path", "info/exclude"], { encoding: "utf8" }).trim(); + if (!excludePath) { + return; + } + mkdirSync(dirname(excludePath), { recursive: true }); + const existing = existsSync(excludePath) ? readFileSync(excludePath, "utf8") : ""; + const lines = [".pandacode/", ".odw/", "node_modules/"]; + const additions = lines.filter((line) => !existing.split(/\r?\n/).includes(line)); + if (additions.length > 0) { + const prefix = existing && !existing.endsWith("\n") ? 
"\n" : ""; + writeFileSync(excludePath, `${existing}${prefix}${additions.join("\n")}\n`); + } + } catch { + // Best-effort only: diff capture still excludes executor scratch paths. + } +} + // After a worktree node runs, capture the agent's file changes as a portable // patch. Built-in keeps a changed worktree on disk; ODW instead returns the diff // as data and removes the dir — no orphan worktrees, changes never silently lost. @@ -1311,22 +1392,697 @@ function createWorktree(baseCwd, options) { const WORKTREE_DIFF_EXCLUDES = [".", ":(exclude).pandacode", ":(exclude).odw", ":(exclude)node_modules"]; function captureWorktreeChanges(dir) { try { + const base = execFileSync("git", ["-C", dir, "rev-parse", "HEAD"], { encoding: "utf8" }).trim(); // Plain `git add -A` silently respects .gitignore (no error on ignored files); // the executor-scratch exclusion is applied to status/diff so it works even in // a repo that does NOT gitignore .pandacode/.odw. execFileSync("git", ["-C", dir, "add", "-A"], { stdio: "ignore" }); const status = execFileSync("git", ["-C", dir, "status", "--porcelain", "--", ...WORKTREE_DIFF_EXCLUDES], { encoding: "utf8" }); if (!status.trim()) { - return { changed: false, files: [], diff: "" }; + return { changed: false, files: [], diff: "", base }; } const files = status.trim().split(/\r?\n/).map((line) => line.slice(3).trim()).filter(Boolean); const diff = execFileSync("git", ["-C", dir, "diff", "--cached", "HEAD", "--", ...WORKTREE_DIFF_EXCLUDES], { encoding: "utf8", maxBuffer: 64 * 1024 * 1024 }); - return { changed: true, files, diff }; + return { changed: true, files, diff, base }; } catch (error) { return { changed: false, files: [], diff: "", error: String(error?.message ?? 
error) }; } } +function worktreePatchOf(candidate) { + if (!candidate || typeof candidate !== "object") { + return null; + } + if (candidate.worktree && typeof candidate.worktree === "object") { + return candidate.worktree; + } + if ("diff" in candidate || "changed" in candidate || "files" in candidate) { + return candidate; + } + return null; +} + +function applyCapturedWorktreeDiff(candidate, options = {}) { + const worktree = worktreePatchOf(candidate); + const label = options.label || "worktree diff"; + if (!worktree) { + const result = { + ok: false, + applied: false, + files: [], + error: { category: "invalid_worktree_diff", message: "Expected an agent result with .worktree or a worktree diff object." } + }; + emitWorktreePatchApply(label, result); + return result; + } + + const files = Array.isArray(worktree.files) ? worktree.files : []; + if (!worktree.changed || !String(worktree.diff || "").trim()) { + const result = { ok: true, applied: false, changed: false, files, base: worktree.base || null }; + emitWorktreePatchApply(label, result); + return result; + } + + const diff = String(worktree.diff); + const check = runGitApply(["apply", "--check", "--whitespace=nowarn"], diff); + if (!check.ok) { + const result = { + ok: false, + applied: false, + files, + base: worktree.base || null, + error: { category: "patch_conflict", message: check.message } + }; + emitWorktreePatchApply(label, result); + return result; + } + + const applied = runGitApply(["apply", "--whitespace=nowarn"], diff); + if (!applied.ok) { + const result = { + ok: false, + applied: false, + files, + base: worktree.base || null, + error: { category: "patch_apply_failed", message: applied.message } + }; + emitWorktreePatchApply(label, result); + return result; + } + + const result = { ok: true, applied: true, files, base: worktree.base || null }; + emitWorktreePatchApply(label, result); + return result; +} + +function applyCapturedWorktreeDiffsAtomic(candidates, options = {}) { + const list = 
Array.isArray(candidates) ? candidates : [candidates]; + const prepared = list.map((candidate, index) => prepareWorktreePatch(candidate, patchApplyLabel(options, index))); + const invalid = prepared.filter((entry) => entry.result?.ok === false); + if (invalid.length > 0) { + const results = prepared.map((entry) => entry.result || { + ok: false, + applied: false, + files: entry.files, + base: entry.base, + error: { + category: "batch_preflight_failed", + message: "Batch contains an invalid worktree diff; no patch was applied." + } + }); + emitWorktreePatchApplyResults(prepared, results); + return worktreePatchBatchResult(results); + } + + const changed = prepared.filter((entry) => entry.diff); + if (changed.length === 0) { + const results = prepared.map((entry) => entry.result); + emitWorktreePatchApplyResults(prepared, results); + return worktreePatchBatchResult(results); + } + + const combinedDiff = combinePatchTexts(changed); + const check = runGitApply(["apply", "--check", "--whitespace=nowarn"], combinedDiff); + if (!check.ok) { + const results = prepared.map((entry) => entry.result || { + ok: false, + applied: false, + files: entry.files, + base: entry.base, + error: { category: "patch_conflict", message: check.message } + }); + emitWorktreePatchApplyResults(prepared, results); + return worktreePatchBatchResult(results); + } + + const applied = runGitApply(["apply", "--whitespace=nowarn"], combinedDiff); + if (!applied.ok) { + const results = prepared.map((entry) => entry.result || { + ok: false, + applied: false, + files: entry.files, + base: entry.base, + error: { category: "patch_apply_failed", message: applied.message } + }); + emitWorktreePatchApplyResults(prepared, results); + return worktreePatchBatchResult(results); + } + + const results = prepared.map((entry) => entry.result || { + ok: true, + applied: true, + files: entry.files, + base: entry.base + }); + emitWorktreePatchApplyResults(prepared, results); + return worktreePatchBatchResult(results); +} 
+ +function prepareWorktreePatch(candidate, label) { + const worktree = worktreePatchOf(candidate); + if (!worktree) { + return { + label, + files: [], + base: null, + result: { + ok: false, + applied: false, + files: [], + error: { category: "invalid_worktree_diff", message: "Expected an agent result with .worktree or a worktree diff object." } + } + }; + } + const files = Array.isArray(worktree.files) ? worktree.files : []; + const base = worktree.base || null; + const diff = String(worktree.diff || ""); + if (!worktree.changed || !diff.trim()) { + return { + label, + files, + base, + result: { ok: true, applied: false, changed: false, files, base } + }; + } + return { label, files, base, diff }; +} + +function patchApplyLabel(options, index) { + return options.label ? `${options.label}-${index + 1}` : `patch-${index + 1}`; +} + +function emitWorktreePatchApplyResults(entries, results) { + for (let index = 0; index < results.length; index += 1) { + emitWorktreePatchApply(entries[index]?.label || `patch-${index + 1}`, results[index]); + } +} + +function emitWorktreePatchApply(label, result) { + const event = { + type: "worktree_patch_apply", + label, + ok: result.ok === true, + applied: result.applied === true, + files: Array.isArray(result.files) ? result.files.length : 0 + }; + if (result.error) { + event.category = result.error.category; + event.message = truncateText(result.error.message, 240); + } + emit(event); +} + +function worktreePatchBatchResult(results) { + const failed = results.filter((result) => result.ok === false); + const applied = results.filter((result) => result.applied === true); + return { + ok: failed.length === 0, + applied: applied.length, + failed: failed.length, + partial: failed.length > 0 && applied.length > 0, + results + }; +} + +function preflightRejectGate({ files, category, message }) { + const preflightCategory = firstText(typeof category === "string" ? 
category : "", "preflight_failed"); + const preflightMessage = firstText( + typeof message === "string" ? message : String(message ?? ""), + preflightCategory + ); + const blocker = truncateText(`${preflightCategory}: ${preflightMessage}`, 1200); + return { + ok: false, + decision: "reject", + applyReady: false, + files, + preflight: { + ok: false, + category: preflightCategory, + message: preflightMessage + }, + reviews: [], + blockers: [blocker], + risks: [], + owner_questions: [], + verification: [blocker] + }; +} + +async function reviewCapturedWorktreeDiffs(candidates, options = {}) { + const list = Array.isArray(candidates) ? candidates : [candidates]; + const label = options.label || "worktree-review"; + const prepared = list.map((candidate, index) => prepareWorktreePatch(candidate, patchApplyLabel(options, index))); + const invalid = prepared.filter((entry) => entry.result?.ok === false); + const files = uniqueStrings(prepared.flatMap((entry) => entry.files)); + if (invalid.length > 0) { + const gate = preflightRejectGate({ + files, + category: "invalid_worktree_diff", + message: "One or more candidates are not captured worktree diffs." 
+ }); + emitWorktreeReviewGate(label, gate); + return gate; + } + + const changed = prepared.filter((entry) => entry.diff); + if (changed.length === 0) { + const gate = { + ok: true, + decision: "approve", + applyReady: false, + files, + preflight: { ok: true, changed: false }, + reviews: [] + }; + emitWorktreeReviewGate(label, gate); + return gate; + } + + const combinedDiff = combinedWorktreeDiff(changed); + const check = runGitApply(["apply", "--check", "--whitespace=nowarn"], combinedDiff); + if (!check.ok) { + const gate = preflightRejectGate({ + files, + category: "patch_conflict", + message: check.message + }); + emitWorktreeReviewGate(label, gate); + return gate; + } + + let reviewWorktree = null; + let reviewWorkspaceReady = false; + try { + try { + reviewWorktree = createWorktree(cwd, { id: `${label}-candidate`, label: `${label}-candidate` }); + } catch (error) { + const gate = preflightRejectGate({ + files, + category: "review_workspace_failed", + message: String(error?.message ?? 
error) + }); + emitWorktreeReviewGate(label, gate); + return gate; + } + const reviewApply = runGitApplyIn(reviewWorktree.dir, ["apply", "--whitespace=nowarn"], combinedDiff); + if (!reviewApply.ok) { + const gate = preflightRejectGate({ + files, + category: "review_workspace_failed", + message: reviewApply.message + }); + emitWorktreeReviewGate(label, gate); + return gate; + } + reviewWorkspaceReady = true; + emit({ type: "worktree_review_workspace", label, status: "start", dir: reviewWorktree.dir, files: files.length }); + + const reviewers = normalizeWorktreeReviewers(options); + const basePrompt = buildWorktreeReviewPrompt({ prepared: changed, combinedDiff, files, options }); + const reviews = await globalThis.parallel( + reviewers.map((reviewer, index) => () => + globalThis.agent(buildReviewerPrompt(basePrompt, reviewer), { + id: reviewer.id || `${label}-review-${index + 1}`, + label: reviewer.label || `review-${index + 1}`, + phase: options.phase, + runtime: reviewer.runtime || options.runtime || "codex", + provider: reviewer.provider || options.provider, + permission: reviewer.permission || options.permission || "limited", + model: reviewer.model || options.model, + effort: reviewer.effort || options.effort, + timeout: reviewer.timeout || options.timeout, + execCwd: reviewWorktree.dir, + schema: WORKTREE_REVIEW_SCHEMA, + schemaDescription: "Final response is the structured ODW worktree diff review gate verdict.", + retry: reviewer.retry || options.retry || { maxAttempts: 2 } + }) + ), + { label: `${label}-review`, max: options.maxReviewers || reviewers.length } + ); + const normalizedReviews = reviews.map((review, index) => normalizeWorktreeReview(review, reviewers[index])); + const gate = aggregateWorktreeReviewGate({ label, files, reviews: normalizedReviews }); + emitWorktreeReviewGate(label, gate); + return gate; + } finally { + if (reviewWorktree) { + reviewWorktree.cleanup(); + if (reviewWorkspaceReady) { + emit({ type: "worktree_review_workspace", 
label, status: "done", files: files.length }); + } + } + } +} + +function combinedWorktreeDiff(entries) { + return combinePatchTexts(entries); +} + +function normalizeWorktreeReviewers(options = {}) { + if (Array.isArray(options.reviewers) && options.reviewers.length > 0) { + return options.reviewers.map((reviewer, index) => + typeof reviewer === "string" + ? { label: reviewer, perspective: reviewer } + : { label: `review-${index + 1}`, ...(reviewer || {}) } + ); + } + const count = Math.max(1, Math.min(4, Number(options.reviewerCount || 1))); + const defaultPerspectives = [ + "correctness and regression risk", + "adversarial edge-case review", + "product intent and owner decision risk", + "verification evidence review" + ]; + return Array.from({ length: count }, (_, index) => ({ + label: count === 1 ? "review" : `review-${index + 1}`, + perspective: defaultPerspectives[index] || "general review", + runtime: options.runtime || "codex", + permission: options.permission || "limited" + })); +} + +function buildWorktreeReviewPrompt({ prepared, combinedDiff, files, options }) { + const diffLimit = Math.max(2000, Number(options.maxDiffChars || 30000)); + const diffText = truncateText(combinedDiff, diffLimit, "head"); + const context = options.context ? `\nProject context:\n${String(options.context)}\n` : ""; + const criteria = Array.isArray(options.criteria) && options.criteria.length > 0 + ? options.criteria.map((item) => `- ${item}`).join("\n") + : "- Check whether this batch is safe to land atomically.\n- Identify blockers, missing verification, semantic conflicts, and owner decisions.\n- Prefer needs_owner when product intent or acceptance criteria require human judgment."; + return `Review an ODW batch of captured worktree diffs before atomic landing. + +The combined diff has already been applied to your current working directory for +this review node. Inspect the files and run relevant tests/checks there. Do not +edit files. 
+ +New files from the captured diff may appear as untracked in git status inside +this temporary review workspace. Do not treat that alone as a landing blocker: +approval lands the captured patch with applyWorktreeDiffs, including new files +listed below. + +Files: +${files.map((file) => `- ${file}`).join("\n")} + +Base commits: +${uniqueStrings(prepared.map((entry) => entry.base).filter(Boolean)).map((base) => `- ${base}`).join("\n") || "- unknown"} +${context} +Review criteria: +${criteria} + +Return decision: +- approve: safe to apply atomically after this gate. +- reject: do not apply; blockers or failed verification must be fixed first. +- needs_owner: owner/product decision is required before AI should land the batch. + +Required final response shape: +{ + "decision": "approve|reject|needs_owner", + "summary": "one concise verdict", + "blockers": ["required fixes before landing"], + "risks": ["non-blocking risks"], + "owner_questions": ["only consequential owner decisions not already specified"], + "verification": ["commands run or evidence inspected"], + "files_reviewed": ["path/to/file"] +} + +Return that JSON object only. If rejecting, do not write prose; put every failure +and command result into blockers and verification. + +Combined diff: +${diffText}`; +} + +function buildReviewerPrompt(basePrompt, reviewer) { + const perspective = reviewer.perspective ? `\nReviewer perspective: ${reviewer.perspective}\n` : ""; + return `${basePrompt}${perspective} +Be adversarial and evidence-backed. 
Do not edit files.`; +} + +function normalizeWorktreeReview(review, reviewer = {}) { + if (!review || review.ok === false) { + return { + reviewer: reviewer.label || "review", + decision: "reject", + summary: firstText(review?.error?.message, "reviewer failed or returned no result"), + blockers: [firstText(review?.error?.message, "reviewer failed or returned no result")], + risks: [], + owner_questions: [], + verification: [], + files_reviewed: [] + }; + } + if (typeof review === "object" && !Array.isArray(review)) { + const decision = ["approve", "reject", "needs_owner"].includes(review.decision) ? review.decision : inferReviewDecision(review.summary || ""); + return { + reviewer: reviewer.label || "review", + decision, + summary: firstText(review.summary, JSON.stringify(review).slice(0, 1000)), + blockers: stringArray(review.blockers), + risks: stringArray(review.risks), + owner_questions: stringArray(review.owner_questions), + verification: stringArray(review.verification), + files_reviewed: stringArray(review.files_reviewed) + }; + } + const text = String(review); + return { + reviewer: reviewer.label || "review", + decision: inferReviewDecision(text), + summary: text.slice(0, 1000), + blockers: /reject|fail|blocker|失败|拒绝|不通过/i.test(text) ? [text.slice(0, 1000)] : [], + risks: [], + owner_questions: /needs_owner|owner|拍板|决策/i.test(text) ? [text.slice(0, 1000)] : [], + verification: [], + files_reviewed: [] + }; +} + +function inferReviewDecision(text) { + const value = String(text || ""); + if (/needs_owner|owner|拍板|决策|需要.*确认/i.test(value)) { + return "needs_owner"; + } + if (/reject|fail|blocker|failed|失败|拒绝|不通过|阻塞/i.test(value)) { + return "reject"; + } + return "approve"; +} + +function aggregateWorktreeReviewGate({ files, reviews }) { + const rejected = reviews.filter((review) => review.decision === "reject"); + const owner = reviews.filter((review) => review.decision === "needs_owner"); + const decision = rejected.length > 0 ? 
"reject" : (owner.length > 0 ? "needs_owner" : "approve"); + const ok = decision === "approve"; + return { + ok, + decision, + applyReady: ok, + files, + preflight: { ok: true, changed: true }, + reviews, + blockers: uniqueStrings(reviews.flatMap((review) => review.blockers)), + risks: uniqueStrings(reviews.flatMap((review) => review.risks)), + owner_questions: uniqueStrings(reviews.flatMap((review) => review.owner_questions)), + verification: uniqueStrings(reviews.flatMap((review) => review.verification)) + }; +} + +function emitWorktreeReviewGate(label, gate) { + const event = { + type: "worktree_review_gate", + label, + ok: gate.ok === true, + decision: gate.decision, + applyReady: gate.applyReady === true, + files: Array.isArray(gate.files) ? gate.files.length : 0, + file_samples: previewStrings(gate.files, 8, 160), + reviewers: Array.isArray(gate.reviews) ? gate.reviews.length : 0, + review_decisions: previewStrings( + (gate.reviews || []).map((review) => `${review.reviewer || "review"}:${review.decision || "unknown"}`), + 8, + 160 + ), + blockers: Array.isArray(gate.blockers) ? gate.blockers.length : 0, + blocker_samples: previewStrings(gate.blockers, 5, 500), + risks: Array.isArray(gate.risks) ? gate.risks.length : 0, + risk_samples: previewStrings(gate.risks, 5, 500), + owner_questions: Array.isArray(gate.owner_questions) ? gate.owner_questions.length : 0, + owner_question_samples: previewStrings(gate.owner_questions, 5, 500), + verification_samples: previewStrings(gate.verification, 5, 500) + }; + if (gate.preflight?.ok === false) { + event.preflight_category = firstText(gate.preflight.category, "preflight_failed"); + event.preflight_message = truncateText(firstText(gate.preflight.message, event.preflight_category), 500); + event.category = event.preflight_category; + event.message = event.preflight_message; + } + emit(event); +} + +function stringArray(value) { + return Array.isArray(value) ? 
value.filter((item) => typeof item === "string").map((item) => item.trim()).filter(Boolean) : []; +} + +function previewStrings(values, maxItems = 5, maxChars = 300) { + return stringArray(values).slice(0, maxItems).map((value) => + value.length > maxChars ? `${value.slice(0, maxChars - 1)}…` : value + ); +} + +function uniqueStrings(values) { + return [...new Set(stringArray(values))]; +} + +function captureMainWorktreeSnapshot(options = {}) { + return { + label: options.label || "", + ...captureGitSnapshot(cwd) + }; +} + +function assertMainWorktreeUnchanged(snapshot, options = {}) { + const label = options.label || snapshot?.label || "main-worktree"; + const before = snapshot && typeof snapshot === "object" ? snapshot : { ok: false, files: [], hashes: {} }; + const after = captureGitSnapshot(cwd); + const beforeHashes = before.hashes && typeof before.hashes === "object" ? before.hashes : {}; + const afterHashes = after.hashes && typeof after.hashes === "object" ? after.hashes : {}; + const beforeFiles = new Set(Object.keys(beforeHashes)); + const afterFiles = new Set(Object.keys(afterHashes)); + const added = [...afterFiles].filter((file) => !beforeFiles.has(file)).sort(); + const removed = [...beforeFiles].filter((file) => !afterFiles.has(file)).sort(); + const modified = [...afterFiles].filter((file) => beforeFiles.has(file) && beforeHashes[file] !== afterHashes[file]).sort(); + const files = uniqueStrings([...added, ...removed, ...modified]); + const result = { + ok: before.ok === true && after.ok === true && files.length === 0, + label, + before_files: before.files?.length || 0, + after_files: after.files?.length || 0, + added, + removed, + modified, + files, + error: before.error || after.error || undefined + }; + emit({ + type: "worktree_snapshot_check", + label, + ok: result.ok, + files: files.length, + file_samples: previewStrings(files, 8, 160), + added: added.length, + removed: removed.length, + modified: modified.length, + message: result.error + }); 
+ return result; +} + +function restoreMainWorktreeSnapshot(snapshot, check = null, options = {}) { + const label = options.label || snapshot?.label || "main-worktree-restore"; + const before = snapshot && typeof snapshot === "object" ? snapshot : { ok: false, files: [], hashes: {}, contents: {} }; + const detected = check && typeof check === "object" ? check : assertMainWorktreeUnchanged(snapshot, { label: `${label}-precheck` }); + const contents = before.contents && typeof before.contents === "object" ? before.contents : {}; + const restored = []; + const removed = []; + const errors = []; + + for (const file of stringArray(detected.added)) { + try { + rmSync(`${cwd}/${file}`, { force: true }); + removed.push(file); + } catch (error) { + errors.push(`${file}: ${String(error?.message ?? error)}`); + } + } + + for (const file of uniqueStrings([...(detected.modified || []), ...(detected.removed || [])])) { + try { + const encoded = contents[file]; + const path = `${cwd}/${file}`; + if (encoded === null || encoded === undefined) { + rmSync(path, { force: true }); + removed.push(file); + } else { + mkdirSync(dirname(path), { recursive: true }); + writeFileSync(path, Buffer.from(String(encoded), "base64")); + restored.push(file); + } + } catch (error) { + errors.push(`${file}: ${String(error?.message ?? 
error)}`); + } + } + + const after = assertMainWorktreeUnchanged(snapshot, { label: `${label}-after` }); + const result = { + ok: errors.length === 0 && after.ok === true, + label, + restored, + removed, + errors, + after + }; + emit({ + type: "worktree_snapshot_restore", + label, + ok: result.ok, + restored: restored.length, + removed: removed.length, + files: uniqueStrings([...restored, ...removed]).length, + file_samples: previewStrings([...restored, ...removed], 8, 160), + message: errors.join("; ") || undefined + }); + return result; +} + +function captureGitSnapshot(dir) { + try { + const files = gitChangedFiles(dir); + const hashes = {}; + const contents = {}; + for (const file of files) { + const path = `${dir}/${file}`; + if (existsSync(path)) { + const content = readFileSync(path); + hashes[file] = createHash("sha256").update(content).digest("hex"); + contents[file] = content.toString("base64"); + } else { + hashes[file] = null; + contents[file] = null; + } + } + return { ok: true, files, hashes, contents }; + } catch (error) { + return { ok: false, files: [], hashes: {}, contents: {}, error: String(error?.message ?? 
error) }; + } +} + +function gitChangedFiles(dir) { + const tracked = execFileSync("git", ["-C", dir, "diff", "--name-only", "HEAD", "--", ...WORKTREE_DIFF_EXCLUDES], { + encoding: "utf8", + maxBuffer: 8 * 1024 * 1024 + }); + const untracked = execFileSync("git", ["-C", dir, "ls-files", "--others", "--exclude-standard", "--", ...WORKTREE_DIFF_EXCLUDES], { + encoding: "utf8", + maxBuffer: 8 * 1024 * 1024 + }); + return uniqueStrings(`${tracked}\n${untracked}`.split(/\r?\n/)); +} + +function runGitApply(args, input) { + return runGitApplyIn(cwd, args, input); +} + +function runGitApplyIn(dir, args, input) { + try { + execFileSync("git", ["-C", dir, ...args], { input, encoding: "utf8", maxBuffer: 64 * 1024 * 1024 }); + return { ok: true, message: "" }; + } catch (error) { + return { + ok: false, + message: firstText(error?.stderr?.toString?.(), error?.stdout?.toString?.(), error?.message, "git apply failed") + }; + } +} + async function runAgent(prompt, options) { if (options.isolation === "worktree") { const wt = createWorktree(cwd, options); @@ -1395,11 +2151,14 @@ async function dispatchBackend(prompt, options) { mockResult.summary = { ...(mockResult.summary || {}), model: String(options.mockResolvedModel) }; } // Mock-only: write the requested file plus an executor-scratch file under - // .pandacode/ so the worktree diff-capture + exclusion path is testable for free. - if (options.mockWriteFile && options.execCwd) { - writeFileSync(`${options.execCwd}/${options.mockWriteFile}`, `mock change by ${options.label || "agent"}\n`); - mkdirSync(`${options.execCwd}/.pandacode`, { recursive: true }); - writeFileSync(`${options.execCwd}/.pandacode/scratch.txt`, "executor metadata that must not pollute the captured diff\n"); + // .pandacode/ so diff-capture and read-only guards are testable for free. 
+ if (options.mockWriteFile) { + const mockCwd = options.execCwd || cwd; + const mockWritePath = `${mockCwd}/${options.mockWriteFile}`; + mkdirSync(dirname(mockWritePath), { recursive: true }); + writeFileSync(mockWritePath, `mock change by ${options.label || "agent"}\n`); + mkdirSync(`${mockCwd}/.pandacode`, { recursive: true }); + writeFileSync(`${mockCwd}/.pandacode/scratch.txt`, "executor metadata that must not pollute the captured diff\n"); } return mockResult; } @@ -1422,6 +2181,60 @@ function mockResultForSchema(options, prompt) { if (schemaName.endsWith("security-finding.schema.json")) { return { findings: [], clean_files: [], uncertain: [] }; } + if (schemaName.endsWith("odw-worktree-review.schema.json")) { + const text = String(prompt || ""); + const repeatReject = text.match(/MOCK_REJECT_(ONCE|TWICE|THRICE)/)?.[1]; + if (repeatReject) { + const key = options.label || options.id || "review"; + const seen = mockReviewRejectOnce.get(key) || 0; + mockReviewRejectOnce.set(key, seen + 1); + const rejectLimit = repeatReject === "THRICE" ? 3 : repeatReject === "TWICE" ? 2 : 1; + if (seen < rejectLimit) { + const file = text.match(/MOCK_REJECT_(?:ONCE|TWICE|THRICE)_FILE:([^\s]+)/)?.[1]; + const blocker = text.match(/MOCK_REJECT_(?:ONCE|TWICE|THRICE)_BLOCKER:([^\n]+)/)?.[1]?.trim(); + return { + decision: "reject", + summary: `mock review rejected attempt ${seen + 1}`, + blockers: [blocker || (file ? 
`mock one-time blocker in ${file}` : "mock one-time blocker")], + risks: [], + owner_questions: [], + verification: ["mock preflight passed"], + files_reviewed: [] + }; + } + } + if (/MOCK_NEEDS_OWNER/.test(text)) { + return { + decision: "needs_owner", + summary: "mock review needs owner decision", + blockers: [], + risks: ["mock owner-sensitive product decision"], + owner_questions: ["mock owner question"], + verification: ["mock preflight passed"], + files_reviewed: [] + }; + } + if (/\bMOCK_REJECT\b/.test(text)) { + return { + decision: "reject", + summary: "mock review rejected the batch", + blockers: ["mock blocker"], + risks: [], + owner_questions: [], + verification: ["mock preflight passed"], + files_reviewed: [] + }; + } + return { + decision: "approve", + summary: "mock review approved the batch", + blockers: [], + risks: [], + owner_questions: [], + verification: ["mock preflight passed"], + files_reviewed: [] + }; + } if (schemaName.endsWith("codex-plan.schema.json")) { return { status: "planned", @@ -1474,7 +2287,7 @@ function mockResultForSchema(options, prompt) { changed_files: [], verification: [], risks: [], - adapter: { backend: backend === "mock" ? 
"mock" : "pandacode", runtime: options.runtime || inferPandaRuntime(options) }, + adapter: { backend: "pandacode", runtime: options.runtime || inferPandaRuntime(options) }, error: null }; } @@ -1622,10 +2435,10 @@ async function autoAnswerNeedsInput(result, runtime, fallbackSession, execCwd, t if (timeoutMs) { answerArgs.push("--timeout-ms", String(timeoutMs)); } - if (runtime === "codex") { - answerArgs.push("--codexctl-bin", codexctlBin); - } - result = await runPandaCodeCommand(runtime, "answer", answerArgs, execCwd); + result = await runPandaCodeCommand(runtime, "answer", answerArgs, execCwd, { + session, + label: `${session || fallbackSession || "agent"}-answer-${round}` + }); } return result; } @@ -1633,13 +2446,28 @@ async function autoAnswerNeedsInput(result, runtime, fallbackSession, execCwd, t async function runPandaCode(prompt, options) { const execCwd = options.execCwd || cwd; const runtime = inferPandaRuntime(options); - const promptFile = writePromptFile(prompt, { ...options, label: `${runtime}-${options.label || options.id || "agent"}` }); const session = sanitizeSessionName( options.session || options.sessionName || `${runId}-${options.id || options.nodeId || options.label || "agent"}-${options.attempt || 1}` ); const selectedProvider = options.provider || options.bambooProvider || provider; + const selectedModel = options.model || model; + if (runtime === "bamboo") { + const keyPreflight = bambooApiKeyPreflight({ provider: selectedProvider, model: selectedModel, session, label: options.label || options.id }); + if (keyPreflight) { + emit({ + type: "panda_preflight_blocked", + runtime, + session, + label: options.label || options.id || "", + category: keyPreflight.error?.category, + remediation: keyPreflight.remediation + }); + return keyPreflight; + } + } + const promptFile = writePromptFile(prompt, { ...options, label: `${runtime}-${options.label || options.id || "agent"}` }); const args = [ runtime, "exec" @@ -1659,7 +2487,6 @@ async function 
runPandaCode(prompt, options) { promptFile, "--json" ); - const selectedModel = options.model || model; if (selectedModel) { args.push("--model", selectedModel); } @@ -1687,20 +2514,189 @@ async function runPandaCode(prompt, options) { args.push("--timeout-ms", String(timeoutMs)); } if (runtime === "codex") { - args.push("--codexctl-bin", codexctlBin); // Default to full access because a coding node usually must install // dependencies (npm/pip/cargo) and reach the network, and the only narrower - // mode codexctl exposes — workspace-write — also BLOCKS network, which breaks + // mode codex exposes — workspace-write — also BLOCKS network, which breaks // real builds (verified: `npm install` fails with connect EPERM under it). // Authors can opt a node down with { permission: "limited" } to confine it to // the working dir with no network (good for reviewing/analysing code). const permission = options.permission === "limited" ? "limited" : "max"; args.push("--permission", permission); } - const result = await runPandaCodeCommand(runtime, "exec", args, execCwd); + const result = await runPandaCodeCommand(runtime, "exec", args, execCwd, { + session, + label: options.label || options.id || options.nodeId + }); return autoAnswerNeedsInput(result, runtime, session, execCwd, timeoutMs); } +const BAMBOO_PROVIDER_API_KEY_ENVS = { + openai: ["OPENAI_API_KEY"], + anthropic: ["ANTHROPIC_API_KEY"], + deepseek: ["DEEPSEEK_API_KEY"], + xiaomi: ["XIAOMI_API_KEY", "MIMO_API_KEY"], + kimi: ["KIMI_API_KEY", "MOONSHOT_API_KEY"], + zhipu: ["ZHIPU_API_KEY", "BIGMODEL_API_KEY", "GLM_API_KEY"], + minimax: ["MINIMAX_API_KEY", "MINIMAXI_API_KEY"], + qwen: ["QWEN_API_KEY", "DASHSCOPE_API_KEY", "BAILIAN_API_KEY", "ALIBABA_API_KEY"], + stepfun: ["STEPFUN_API_KEY", "STEP_API_KEY", "STEP_PLAN_API_KEY"] +}; + +const BAMBOO_PROVIDER_ALIASES = { + openai: ["openai", "openai-compatible", "gpt", "gpt-4", "gpt-5"], + anthropic: ["anthropic", "claude"], + deepseek: ["deepseek", "deepseek-v4", 
"deepseek-v4-pro", "deepseek-chat"], + xiaomi: ["xiaomi", "mimo", "mimo-v2.5-pro"], + kimi: ["kimi", "moonshot", "moonshot-ai", "kimi-k2.6", "kimi-k2.5", "kimi-k2-thinking"], + zhipu: ["zhipu", "bigmodel", "glm", "chatglm", "glm-5.1", "glm-5", "glm-5-turbo", "glm-4.7"], + minimax: ["minimax", "minimaxi", "minimax-m3", "m3", "minimax-m2.7", "m2.7"], + qwen: ["qwen", "dashscope", "aliyun", "alibaba", "bailian", "tongyi", "qwen3.7-max", "qwen3.6-plus", "qwen3.6-flash"], + stepfun: ["stepfun", "step", "stepai", "step-ai", "step-3.7", "step-3.7-flash", "jieyue", "jieyuexingchen"] +}; + +function bambooApiKeyPreflight({ provider: selectedProvider, model: selectedModel, session, label }) { + const providerName = resolveBambooProviderName(selectedProvider, selectedModel); + if (selectedProvider && !providerName) { + const supported = Object.keys(BAMBOO_PROVIDER_API_KEY_ENVS).join(", "); + return { + ok: false, + backend: "pandacode", + runtime: "bamboo", + action: "preflight", + state: "blocked", + session, + provider: selectedProvider, + model: selectedModel || undefined, + remediation: `Use a supported Bamboo provider: ${supported}.`, + error: { + category: "bamboo_unknown_provider", + message: `Unknown Bamboo provider "${selectedProvider}". 
Use one of: ${supported}.`, + retryable: false + }, + adapter: { + backend: "pandacode", + runtime: "bamboo", + preflight: true + }, + label: label || undefined + }; + } + const requiredEnv = bambooApiKeyEnvCandidates(providerName); + if (requiredEnv.some(envValueSet) || bambooConfigHasApiKey()) { + return null; + } + const providerLabel = providerName || "configured Bamboo provider"; + const remediation = `Set one of ${requiredEnv.join(", ")} or configure api_key in PandaCode Bamboo config.`; + return { + ok: false, + backend: "pandacode", + runtime: "bamboo", + action: "preflight", + state: "blocked", + session, + provider: providerName || selectedProvider || undefined, + model: selectedModel || undefined, + missing: requiredEnv, + remediation, + error: { + category: "bamboo_missing_api_key", + message: `Bamboo ${providerLabel} is missing an API key. ${remediation}`, + retryable: false + }, + adapter: { + backend: "pandacode", + runtime: "bamboo", + preflight: true + }, + label: label || undefined + }; +} + +function resolveBambooProviderName(selectedProvider, selectedModel) { + if (String(selectedProvider || "").trim()) { + return normalizeBambooProvider(selectedProvider); + } + return inferBambooProviderFromModel(selectedModel) + || normalizeBambooProvider(process.env.PANDACODE_BAMBOO_PROVIDER) + || "deepseek"; +} + +function normalizeBambooProvider(value) { + const normalized = bambooHintTokens(value); + if (!normalized.length) { + return null; + } + const joined = normalized.join("-"); + for (const [providerName, aliases] of Object.entries(BAMBOO_PROVIDER_ALIASES)) { + if (aliases.some((alias) => hintMatchesAlias(joined, normalized, alias))) { + return providerName; + } + } + return null; +} + +function inferBambooProviderFromModel(value) { + return normalizeBambooProvider(value); +} + +function bambooHintTokens(value) { + const text = String(value || "").trim().toLowerCase(); + if (!text) { + return []; + } + return text + .replace(/_/g, "-") + 
.split(/[^a-z0-9.]+/) + .filter(Boolean); +} + +function hintMatchesAlias(joinedHint, tokens, alias) { + const normalizedAlias = alias.toLowerCase().replace(/_/g, "-"); + return tokens.includes(normalizedAlias) + || joinedHint === normalizedAlias + || joinedHint.startsWith(`${normalizedAlias}-`) + || joinedHint.startsWith(`${normalizedAlias}.`); +} + +function bambooApiKeyEnvCandidates(providerName) { + return dedupeStrings([ + "PANDACODE_BAMBOO_API_KEY", + "BAMBOO_API_KEY", + ...(BAMBOO_PROVIDER_API_KEY_ENVS[providerName] || []) + ]); +} + +function envValueSet(name) { + return typeof process.env[name] === "string" && process.env[name].trim() !== ""; +} + +function bambooConfigHasApiKey() { + const candidates = dedupeStrings([ + process.env.PANDACODE_BAMBOO_CONFIG_DIR ? `${process.env.PANDACODE_BAMBOO_CONFIG_DIR}/config.toml` : "", + process.env.BAMBOO_CONFIG_DIR ? `${process.env.BAMBOO_CONFIG_DIR}/config.toml` : "", + `${os.homedir()}/.pandacode/bamboo/config.toml` + ]); + for (const path of candidates) { + if (!path || !existsSync(path)) { + continue; + } + try { + const raw = readFileSync(path, "utf8"); + const match = raw.match(/^\s*api_key\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s#]+))/m); + if (match && String(match[1] || match[2] || match[3] || "").trim() !== "") { + return true; + } + } catch { + return false; + } + } + return false; +} + +function dedupeStrings(values) { + return [...new Set(values.filter((value) => typeof value === "string" && value.trim()).map((value) => value.trim()))]; +} + function inferPandaRuntime(options) { const explicit = options.runtime || options.backendRuntime || options.modelRuntime; if (explicit) { @@ -1736,7 +2732,7 @@ function sanitizeSessionName(value) { .slice(0, 180) || "odw-agent"; } -function runPandaCodeCommand(runtime, action, args, execCwd = cwd) { +function runPandaCodeCommand(runtime, action, args, execCwd = cwd, context = {}) { return new Promise((resolve) => { const child = spawn(pandacodeBin, args, { cwd: execCwd, 
@@ -1757,6 +2753,7 @@ function runPandaCodeCommand(runtime, action, args, execCwd = cwd) { backend: "pandacode", runtime, action, + session: context.session || context.label || "", exit_code: null, stdout_tail: stdout.slice(-4000), stderr_tail: stderr.slice(-4000), @@ -1766,7 +2763,7 @@ function runPandaCodeCommand(runtime, action, args, execCwd = cwd) { child.on("close", (code) => { const parsed = parsePandaCodeReportFromStdout(stdout) || parseJsonObjectFromText(stdout); if (parsed) { - resolve(normalizePandaCodeReport(parsed, { runtime, action, exit_code: code, stdout, stderr })); + resolve(normalizePandaCodeReport(parsed, { runtime, action, exit_code: code, stdout, stderr, ...context })); return; } resolve({ @@ -1774,6 +2771,7 @@ function runPandaCodeCommand(runtime, action, args, execCwd = cwd) { backend: "pandacode", runtime, action, + session: context.session || context.label || "", exit_code: code, stdout_tail: stdout.slice(-4000), stderr_tail: stderr.slice(-4000), @@ -1818,7 +2816,16 @@ function normalizePandaCodeReport(report, context) { } const runtime = report.runtime || context.runtime; const action = report.action || context.action; - const rawReportPath = writePandaCodeRawReport(report, { runtime, action }); + const rawReportPath = writePandaCodeRawReport(report, { runtime, action, session: context.session, label: context.label }); + const rawSummary = report.summary && typeof report.summary === "object" && !Array.isArray(report.summary) + ? report.summary + : null; + const rawStart = report.start && typeof report.start === "object" && !Array.isArray(report.start) + ? report.start + : null; + const rawExecute = report.execute && typeof report.execute === "object" && !Array.isArray(report.execute) + ? 
report.execute + : null; const record = compactPandaRecord(report.record); const summary = compactPandaSummary(report.summary); const start = compactPandaCommand(report.start); @@ -1828,14 +2835,17 @@ function normalizePandaCodeReport(report, context) { if (rawReportPath) { artifacts.raw_report = rawReportPath; } - const lastAgentMessage = truncateText( + const fullLastAgentMessage = readPandaCodeLastAssistantMessage(report) - || summary?.last_agent_message - || start?.last_agent_message - || execute?.last_agent_message - || "", - 4000 - ); + || rawSummary?.last_agent_message + || rawSummary?.lastAgentMessage + || rawStart?.last_agent_message + || rawStart?.summary?.last_agent_message + || rawExecute?.last_agent_message + || rawExecute?.summary?.last_agent_message + || ""; + const structuredOutput = parseJsonObjectFromText(fullLastAgentMessage); + const lastAgentMessage = truncateText(fullLastAgentMessage, 4000); const error = compactPandaError(report.error || start?.error || execute?.error); // A non-zero process exit means the executor failed, even when its JSON report // omits `ok` or optimistically reports ok:true. odw's core job is to surface @@ -1849,13 +2859,14 @@ function normalizePandaCodeReport(report, context) { // before compaction drops it, so observability can show it (vs "inherit"). 
model: report.summary?.model || report.record?.model || report.model || undefined, action, - session: report.session || record?.session || "", + session: report.session || record?.session || context.session || context.label || "", state: report.state || summary?.status || "unknown", exit_code: context.exit_code, run_id: report.run_id || report.runId || record?.run_id || summary?.run_id || report.session || "", thread_id: report.thread_id || report.threadId || record?.thread_id || summary?.thread_id, thread_path: report.thread_path || report.threadPath || record?.thread_path || summary?.thread_path, last_agent_message: lastAgentMessage, + structured_output: structuredOutput || undefined, summary: compactPandaNodeSummary(summary, { start, execute }), ...domainFields, artifacts, @@ -1930,10 +2941,14 @@ function compactPandaDomainFields(report) { return output; } -function writePandaCodeRawReport(report, { runtime, action }) { +function writePandaCodeRawReport(report, { runtime, action, session: fallbackSession, label }) { try { - const session = sanitizeSessionName(report.session || report.record?.session || `${runtime || "runtime"}-${action || "action"}`).slice(0, 80); - const path = `${runDir}/pandacode-${sanitizeSessionName(runtime || "runtime")}-${session}.report.json`; + const rawSession = report.session || report.record?.session || fallbackSession || label || `${runtime || "runtime"}-${action || "action"}`; + const session = sanitizeSessionName( + rawSession + ).slice(0, 80); + const actionSuffix = action ? 
`-${sanitizeSessionName(action)}` : ""; + const path = `${runDir}/pandacode-${sanitizeSessionName(runtime || "runtime")}-${session}${actionSuffix}.report.json`; mkdirSync(dirname(path), { recursive: true }); writeFileSync(path, JSON.stringify(report, null, 2)); return path; @@ -2330,6 +3345,12 @@ function workflowSandboxGlobals(workflowInput) { budget: globalThis.budget, odw: globalThis.odw, pandacode: globalThis.pandacode, + applyWorktreeDiff: globalThis.applyWorktreeDiff, + applyWorktreeDiffs: globalThis.applyWorktreeDiffs, + reviewWorktreeDiffs: globalThis.reviewWorktreeDiffs, + captureMainWorktreeSnapshot: globalThis.captureMainWorktreeSnapshot, + assertMainWorktreeUnchanged: globalThis.assertMainWorktreeUnchanged, + restoreMainWorktreeSnapshot: globalThis.restoreMainWorktreeSnapshot, workflow: globalThis.workflow, setTimeout, clearTimeout, diff --git a/odw/src/pack/templates/workflow-api.d.ts b/odw/src/pack/templates/workflow-api.d.ts index 7efa019..4f27213 100644 --- a/odw/src/pack/templates/workflow-api.d.ts +++ b/odw/src/pack/templates/workflow-api.d.ts @@ -7,7 +7,7 @@ export interface WorkflowMeta { name: string; description?: string; - /** When this workflow should be used (shown in the saved-workflow list). */ + /** When this workflow should be used (emitted as metadata for callers/tools). */ whenToUse?: string; /** * Phase declarations. `model` lets a phase set a default model that its @@ -44,8 +44,8 @@ export interface AgentOptions { * agents that mutate files in parallel do not conflict. The worktree is * removed when the node finishes (success, error, or timeout); the agent's * changes are returned in `result.worktree` as a diff. Requires cwd to be a - * git repository. NOTE: a worktree only contains COMMITTED files, so commit or - * stage any input files (specs, fixtures) the agent must read before running. + * git repository. 
NOTE: a worktree only contains COMMITTED files, so commit + * any input files (specs, fixtures) the agent must read before running. */ isolation?: "worktree"; /** @@ -102,6 +102,146 @@ export interface ParallelOptions { concurrency?: number; } +export interface WorktreeDiff { + changed: boolean; + files: string[]; + diff: string; + /** Commit SHA the isolated worktree was created from, when available. */ + base?: string | null; + error?: string; +} + +export type WorktreeDiffCandidate = WorktreeDiff | { worktree?: WorktreeDiff | null }; + +export type WorktreePatchApplyErrorCategory = + | "invalid_worktree_diff" + | "batch_preflight_failed" + | "patch_conflict" + | "patch_apply_failed"; + +export interface WorktreePatchApplyResult { + ok: boolean; + applied: boolean; + files: string[]; + base?: string | null; + changed?: boolean; + error?: { + category: WorktreePatchApplyErrorCategory; + message: string; + }; +} + +export interface WorktreePatchApplyOptions { + label?: string; + /** + * Batch only: opt into partial landing. By default `applyWorktreeDiffs` + * preflights the whole batch and applies it as one patch, so a conflict leaves + * the main cwd untouched. 
+ */ + continueOnError?: boolean; +} + +export interface WorktreePatchBatchResult { + ok: boolean; + applied: number; + failed: number; + partial: boolean; + results: WorktreePatchApplyResult[]; +} + +export interface MainWorktreeSnapshot { + ok: boolean; + label?: string; + files: string[]; + hashes: Record; + contents: Record; + error?: string; +} + +export interface MainWorktreeUnchangedResult { + ok: boolean; + label: string; + before_files: number; + after_files: number; + added: string[]; + removed: string[]; + modified: string[]; + files: string[]; + error?: string; +} + +export interface MainWorktreeRestoreResult { + ok: boolean; + label: string; + restored: string[]; + removed: string[]; + errors: string[]; + after: MainWorktreeUnchangedResult; +} + +export type WorktreeReviewDecision = "approve" | "reject" | "needs_owner"; + +export interface WorktreeReviewReviewer { + id?: string; + label?: string; + runtime?: "claude" | "codex" | "bamboo" | string; + provider?: string; + model?: string; + effort?: string; + timeout?: string; + permission?: "limited" | "max"; + perspective?: string; + retry?: AgentOptions["retry"]; +} + +export interface WorktreeReviewOptions { + label?: string; + phase?: string; + context?: string; + criteria?: string[]; + runtime?: "claude" | "codex" | "bamboo" | string; + provider?: string; + model?: string; + effort?: string; + timeout?: string; + permission?: "limited" | "max"; + reviewerCount?: number; + maxReviewers?: number; + maxDiffChars?: number; + reviewers?: Array; + retry?: AgentOptions["retry"]; +} + +export interface WorktreeReviewResult { + reviewer: string; + decision: WorktreeReviewDecision; + summary: string; + blockers: string[]; + risks: string[]; + owner_questions: string[]; + verification: string[]; + files_reviewed: string[]; +} + +export interface WorktreeReviewGateResult { + /** True only when `decision === "approve"` and the batch can proceed. 
*/ + ok: boolean; + decision: WorktreeReviewDecision; + applyReady: boolean; + files: string[]; + preflight: { + ok: boolean; + changed?: boolean; + category?: "invalid_worktree_diff" | "patch_conflict" | "review_workspace_failed"; + message?: string; + }; + reviews: WorktreeReviewResult[]; + blockers?: string[]; + risks?: string[]; + owner_questions?: string[]; + verification?: string[]; +} + export interface WorkflowBudget { total: number | null; /** @@ -116,14 +256,14 @@ export interface WorkflowBudget { } export type OdwErrorCategory = - | "codexctl_not_found" - | "codexctl_auth" - | "codexctl_rate_limit" - | "codexctl_network" - | "codexctl_permission" - | "codexctl_model" - | "codexctl_input" - | "codexctl_failed" + | "codex_not_found" + | "codex_auth" + | "codex_rate_limit" + | "codex_network" + | "codex_permission" + | "codex_model" + | "codex_input" + | "codex_failed" | "workflow_agent_failed" | "schema_mismatch" | "verification_failed" @@ -134,7 +274,7 @@ export interface OdwErrorFeedback { origin: { phase: string; agent: string; - backend?: "codexctl" | "claude-code" | "shell" | string; + backend?: "codex" | "claude-code" | "shell" | string; attempt?: number; }; error: { @@ -203,6 +343,69 @@ export declare function parallel( options?: ParallelOptions ): Promise; +/** + * Apply a captured `result.worktree` patch back into the main cwd. The runner + * first performs `git apply --check`; conflicts return + * `{ ok:false, error:{ category:"patch_conflict" } }` without mutating files. + */ +export declare function applyWorktreeDiff( + candidate: WorktreeDiffCandidate, + options?: WorktreePatchApplyOptions +): WorktreePatchApplyResult; + +/** + * Apply multiple captured worktree patches. By default this is atomic at the + * workflow level: the runner checks the combined patch first, then applies it as + * one patch; if any patch conflicts, no patch is written to cwd. Set + * `continueOnError:true` only when partial landing is intentional. 
+ */ +export declare function applyWorktreeDiffs( + candidates: WorktreeDiffCandidate[] | WorktreeDiffCandidate, + options?: WorktreePatchApplyOptions +): WorktreePatchBatchResult; + +/** + * Review captured worktree patches before landing. The gate first preflights + * the combined patch without mutating cwd, then applies the patch to a temporary + * candidate worktree and runs one or more reviewer agents there with a + * structured verdict schema. Only `decision:"approve"` returns `ok:true` / + * `applyReady:true`; `reject` and `needs_owner` both block automatic apply. + */ +export declare function reviewWorktreeDiffs( + candidates: WorktreeDiffCandidate[] | WorktreeDiffCandidate, + options?: WorktreeReviewOptions +): Promise; + +/** + * Capture the current main cwd's changed-file content hashes without staging or + * mutating git state. Useful for read-only verification guards after an + * approve-only apply. + */ +export declare function captureMainWorktreeSnapshot(options?: { + label?: string; +}): MainWorktreeSnapshot; + +/** + * Compare the current main cwd against a prior snapshot and emit a + * `worktree_snapshot_check` event. Returns ok:false if verification or any + * other node changed files after the snapshot. + */ +export declare function assertMainWorktreeUnchanged( + snapshot: MainWorktreeSnapshot, + options?: { label?: string } +): MainWorktreeUnchangedResult; + +/** + * Restore the main cwd back to a prior snapshot for files changed after that + * snapshot. Added files are removed; modified/removed snapshot files are + * rewritten from the snapshot contents. Emits `worktree_snapshot_restore`. 
+ */ +export declare function restoreMainWorktreeSnapshot( + snapshot: MainWorktreeSnapshot, + check?: MainWorktreeUnchangedResult | null, + options?: { label?: string } +): MainWorktreeRestoreResult; + export declare function fanout( items: TItem[], mapper: (item: TItem, index: number) => Promise | TResult, diff --git a/pandacode/Cargo.toml b/pandacode/Cargo.toml index 9886b0a..db5d0fc 100644 --- a/pandacode/Cargo.toml +++ b/pandacode/Cargo.toml @@ -1,6 +1,6 @@ [package] name = "pandacode" -version = "0.3.1" +version = "0.4.0" edition = "2024" description = "PandaCode: one CLI shape to run coding tasks across codex, claude, and domestic-LLM (bamboo) runtimes." license = "MIT" diff --git a/pandacode/README.md b/pandacode/README.md index a5c0d2b..95b138f 100644 --- a/pandacode/README.md +++ b/pandacode/README.md @@ -3,8 +3,8 @@ PandaCode is an independent CLI for running coding tasks through multiple agent runtimes with one command shape. The first version supports: -- `pandacode codex ...`: Codex through `codexctl session` app-server/control-plane - commands. +- `pandacode codex ...`: Codex directly through `codex app-server` (JSON-RPC + over stdio), no external control-plane binary required. - `pandacode claude ...`: Claude Code through a real `tmux` session. - `pandacode bamboo ...`: Bamboo through its provider-native read/search/edit/write/bash coding loop for domestic OpenAI-compatible @@ -16,9 +16,8 @@ workflow system can call. By default, task execution uses the strongest production profile: -- Claude: `opus` with `max` effort. -- Codex: `gpt-5.5` with `xhigh` effort, based on local `codexctl models` - support. +- Claude: `fable` with `xhigh` effort. +- Codex: `gpt-5.5` with `xhigh` effort. - Bamboo: `deepseek` provider with Bamboo's provider default model and `high` reasoning effort. Use `--provider`, `--model`, and `--effort` to choose another domestic model. 
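Reviewer note, offered as a sketch only: the approve-only landing rule described by the `reviewWorktreeDiffs` and `applyWorktreeDiffs` doc comments in the `workflow-api.d.ts` hunk earlier in this diff can be written as a small guard. The types below are simplified local mirrors of the declared ones, and `landIfApproved` is a hypothetical helper that does not exist in ODW. The apply step is injected as a callback so the block runs without the ODW runtime.

```typescript
// Simplified local mirrors of types declared in workflow-api.d.ts.
// They are assumptions for illustration, not the real exported types.
type Decision = "approve" | "reject" | "needs_owner";

interface GateResult {
  ok: boolean;
  decision: Decision;
  applyReady: boolean;
  blockers?: string[];
}

interface BatchResult {
  ok: boolean;
  applied: number;
  failed: number;
  partial: boolean;
}

type LandingOutcome =
  | { landed: true; applied: number }
  | { landed: false; reason: string };

// Hypothetical helper: call the apply step only when the review gate approved.
function landIfApproved(gate: GateResult, applyBatch: () => BatchResult): LandingOutcome {
  // `reject` and `needs_owner` both block automatic apply.
  if (!gate.ok || !gate.applyReady || gate.decision !== "approve") {
    const why = (gate.blockers ?? []).join("; ") || gate.decision;
    return { landed: false, reason: `review blocked apply: ${why}` };
  }
  const batch = applyBatch();
  if (!batch.ok) {
    return { landed: false, reason: `apply failed for ${batch.failed} patch(es)` };
  }
  return { landed: true, applied: batch.applied };
}
```

In a workflow script the gate would presumably come from `await reviewWorktreeDiffs(candidates)` and the callback would wrap `applyWorktreeDiffs(candidates)`. That wiring is an assumption and is not shown in this diff.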
@@ -71,8 +70,8 @@ If a runtime asks for external input, `exec`/`resume` return `state: "waiting_for_user"` with `pending_user_input` instead of treating the turn as a failure. Use `pandacode answer --choice N --wait` or `--text ...` to continue the session. Claude answers the visible TUI prompt; -Codex delegates to `codexctl session answer`; Bamboo maps `answer` to a resume -turn that passes the selected/text answer back into the same Bamboo run history. +Codex and Bamboo map `answer` to a resume turn that passes the selected/text +answer back into the same thread/run history. The stable high-level states are: @@ -118,6 +117,22 @@ pandacode bamboo exec --provider deepseek --model deepseek-v4-pro --effort high pandacode claude exec - < task.md ``` +Prompt parts can be appended after the task with a repeatable `--prompt-append` +flag on `run`, `resume`, and every runtime's `exec`/`resume`: + +```bash +pandacode codex exec --task-file worker.md \ + --prompt-append builtin:implementation-worker \ + --prompt-append @spec/runs/V1/worker-packet.md +pandacode claude exec --task "fix the failing tests" \ + --prompt-append @prompts/house-rules.md \ + --prompt-append "text:never edit CI config" +``` + +All runtimes resolve prompt parts locally and append them to the task with +visible separators. `builtin:NAME` role prompts are embedded in the PandaCode +binary, and `@FILE`, `file:PATH`, and `text:TEXT` read files or literal text. + ## Agent Integration Contract Future workflow agents should treat `pandacode` as the only public interface. @@ -125,6 +140,10 @@ Once the binary is installed and the requested backend is available on the machine, an agent can run coding tasks without preparing Claude settings, MCP files, hooks, or project configuration. +Human-facing top-level inspection commands such as `pandacode doctor`, +`pandacode models`, and `pandacode list` print compact summaries by default. +Pass `--json` whenever a caller needs the full machine-readable report. 
+ Recommended agent loop: ```bash @@ -144,13 +163,13 @@ one runtime is usable; inspect each runtime's `ok`, `missing`, and With `--runtime auto`, a known `--model` can select the matching backend: domestic model ids such as `kimi-k2.6` select Bamboo and infer their provider, -Claude aliases such as `opus` select Claude, and `gpt-*` ids select Codex. +Claude aliases such as `fable` or `opus` select Claude, and `gpt-*` ids select Codex. Use `--provider` only when the Bamboo provider cannot be inferred from the model id or you want to override the default. -The same command shape applies to Codex and Bamboo. `answer --choice` maps to -`codexctl session answer --pick`; `logs --visible` remains Claude-only because -Codex and Bamboo have structured run snapshots rather than a terminal +The same command shape applies to Codex and Bamboo. `answer --choice` picks +from the recorded pending question options; `logs --visible` remains Claude-only +because Codex and Bamboo have structured run snapshots rather than a terminal viewport. For Bamboo domestic-model runs, prefer the top-level command for normal use: @@ -183,20 +202,46 @@ PandaCode owns the runtime glue internally: ## Runtime Mapping -Codex uses `codexctl session start/send/execute/read/watch/interrupt/stop/list`. -`exec` starts the app-server session and then calls `session execute`, because -`session start` is a Plan-mode turn and PandaCode is meant to be an executor. -The PandaCode session record stores the Codex `run_id`, `thread_id`, and local -log paths under `.pandacode/sessions/codex`. Each Codex session gets its own -control channel: logs are written under `.pandacode/codex/runs//logs` -and the codexctl daemon socket is a short per-session temp socket. This keeps -parallel workflow nodes from sharing one codexctl daemon/run namespace. 
-Long task prompts are transported by file reference: PandaCode stores the full -task under `.pandacode//prompts/` and sends a short instruction telling -the runtime to read that file. This avoids tmux paste limits and codexctl -app-server pipe/socket failures while preserving the original task text for -observability. Codex start also retries transient transport failures with a fresh -control socket. +Codex spawns one `codex app-server` process per turn and drives it over stdio +JSON-RPC: `thread/start` (or `thread/resume`) followed by `turn/start` in +default collaboration mode, waiting for `turn/completed`. Thread state persists +as Codex rollout files, so resume works across processes without a daemon. The +PandaCode session record stores the Codex `thread_id` and `thread_path` under +`.pandacode/sessions/codex`, and the JSONL protocol traffic is logged to +`.pandacode/codex/logs/.jsonl`. + +Codex turns run with a PandaCode-managed clean Codex home under +`~/.pandacode/codex-home/`: auth material is copied from the user's +Codex home, while `config.toml`, `AGENTS.md`, and skills are deliberately left +out. Switch accounts with `--auth-home ~/.codex-work` (copies that home's auth +into its own managed directory), or pass `--codex-home DIR` / +`PANDACODE_CODEX_HOME` to use a full Codex home as-is. Auth material is +re-copied on every turn, so the managed home tracks the source home's current +tokens rather than going stale on a one-time snapshot. + +`pandacode codex doctor` reports the installed vs. tested codex CLI version +alongside the account and rate-limit status. PandaCode drives the codex +app-server JSON-RPC protocol directly, so re-verify after a codex CLI upgrade +that could change thread/turn methods or event names. + +Transient PandaCode outputs (prompts, logs, events, detached worker results) +accumulate under `.pandacode/`. 
Run `pandacode gc --days N` (optionally +`--dry-run`) to prune files older than N days; session records and the Codex +home are never touched. + +`--detach` returns immediately and runs the turn in a background worker that +keeps the session record fresh (status, last agent message, token usage), so +`status` doubles as a live watch. While a detached turn waits on +`item/tool/requestUserInput`, `answer --choice N|--text ...` replies inside the +same turn through the native structured-answer protocol; `interrupt` aborts the +active turn through `turn/interrupt`. For synchronous turns the question is +recorded, `waiting_for_user` is reported, and `answer` continues the thread as +a fresh turn instead. `--objective` sets a thread goal before the turn starts, +`logs --visible` reads back the structured thread history (turns and +messages), and `codex doctor` reports the account and rate-limit status from +the app-server. Claude task prompts larger than the paste threshold are +transported by file reference under `.pandacode//prompts/`; Codex +turns send the full task text directly over stdio. Claude uses `tmux` to start interactive Claude Code and sends turns into that session. Completion is detected with an explicit marker in the visible tmux diff --git a/pandacode/docs/agent-caller-quickstart.md b/pandacode/docs/agent-caller-quickstart.md index a60f4c6..5afdbef 100644 --- a/pandacode/docs/agent-caller-quickstart.md +++ b/pandacode/docs/agent-caller-quickstart.md @@ -74,7 +74,7 @@ Top-level commands expose common controls only: - JSON output With `--runtime auto`, known model ids can select a backend. Domestic model ids -select Bamboo and infer their provider, Claude aliases such as `opus` select +select Bamboo and infer their provider, Claude aliases such as `fable` or `opus` select Claude, and `gpt-*` ids select Codex. Permission defaults to `max`. 
Use `--permission limited` when the caller wants a diff --git a/pandacode/docs/cli-surface.html b/pandacode/docs/cli-surface.html index 37b5269..26e6244 100644 --- a/pandacode/docs/cli-surface.html +++ b/pandacode/docs/cli-surface.html @@ -155,17 +155,17 @@

PandaCode CLI Surface
-PandaCode v1 is a narrow coding-task executor for three runtimes: Codex through codexctl session, Claude Code through real tmux, and Bamboo as an embedded provider-native coding loop for domestic OpenAI-compatible models.
+PandaCode is a narrow coding-task executor for three runtimes: Codex directly through codex app-server stdio JSON-RPC, Claude Code through real tmux, and Bamboo as an embedded provider-native coding loop for domestic OpenAI-compatible models.
Codex Runtime implemented
-Uses codexctl session start/send/answer/execute/read/interrupt/stop/list. exec and resume call execute so PandaCode behaves as an executor, not only a planner. Structured questions flow through answer. Defaults to gpt-5.5, xhigh effort, and max permission.
+Spawns codex app-server per turn and drives thread/start/thread/resume + turn/start over stdio JSON-RPC in default collaboration mode. Structured questions pause as waiting_for_user and answer continues the thread. Defaults to gpt-5.5, xhigh effort, and max permission.
Claude Runtime implemented
-Starts Claude Code inside tmux with isolated local settings, inline PandaCode-owned hook settings, and an empty strict MCP config. Defaults to opus, max effort, and max permission. It pastes tasks into the TUI, waits for an explicit completion marker, records hook events, and reads visible/log output from tmux. The banned claude -p stream-json path is not used.
+Starts Claude Code inside tmux with isolated local settings, inline PandaCode-owned hook settings, and an empty strict MCP config. Defaults to fable, xhigh effort, and max permission. It pastes tasks into the TUI, waits for an explicit completion marker, records hook events, and reads visible/log output from tmux. The banned claude -p stream-json path is not used.
Bamboo Runtime implemented
@@ -212,7 +212,7 @@
Runtime Commands
answer Answer a runtime prompt waiting for user input.
- Claude maps to visible AskUserQuestion options; Codex maps --choice N to codexctl session answer --pick N.
+ Claude maps to visible AskUserQuestion options; Codex maps --choice N to the recorded pending question options and answers via a resume turn.
status
@@ -237,7 +237,7 @@
Runtime Commands

models List model choices for a runtime. - Codex delegates to codexctl models; Bamboo returns its built-in domestic model catalog. + Codex queries model/list on the app-server; Bamboo returns its built-in domestic model catalog. interrupt diff --git a/pandacode/prompts/builtin/bdd-spec-designer.md b/pandacode/prompts/builtin/bdd-spec-designer.md new file mode 100644 index 0000000..1594393 --- /dev/null +++ b/pandacode/prompts/builtin/bdd-spec-designer.md @@ -0,0 +1,40 @@ +# BDD Spec Designer + +You are a project-agnostic BDD and acceptance-contract designer. Do not implement. + +Your job is to convert intent into executable behavior contracts that workers and reviewers can verify. + +Inputs to require or infer: + +- user goal and target version +- audience and user-visible closed loop +- in-scope and out-of-scope behavior +- interfaces, states, permissions, data shapes, errors, and edge cases +- known project spec format or required output path + +Rules: + +- Treat BDD as the source of truth for what the product must do. +- Prefer small scenario IDs that can be referenced by code, tests, and review. +- Cover happy path, empty/loading/error states, permissions, concurrency, retries, invalid input, and rollback/undo when relevant. +- Do not hide decisions in prose. Lock them as explicit scenarios, assumptions, or open questions. +- If a decision materially changes behavior and cannot be discovered from context, ask a structured question. 
+ +Return: + +```text +decision: ready | needs_input | blocked +version: +scope: +bdd_contract: + - id: + given: + when: + then: + acceptance: +edge_cases: +io_and_data_shapes: +permissions_and_security: +open_questions: +next_actions: +``` diff --git a/pandacode/prompts/builtin/codebase-governance.md b/pandacode/prompts/builtin/codebase-governance.md new file mode 100644 index 0000000..288e064 --- /dev/null +++ b/pandacode/prompts/builtin/codebase-governance.md @@ -0,0 +1,41 @@ +# Codebase Governance + +You are the codebase governance lane. Do not implement unless explicitly paired with an implementation worker prompt. Your job is to protect long-term architecture, consistency, maintainability, and anti-mud boundaries. + +Focus on: + +- architecture boundaries and dependency direction +- module owner, data owner, API owner, migration owner +- allowed paths, forbidden paths, and change radius +- public contracts, schemas, error models, logging, permissions, and design system consistency +- shared/common/utils growth risk +- cyclic dependencies, deep imports, hidden globals, implicit side effects +- legacy safety, characterization tests, feature flags, rollback, and gradual migration +- same-class scan and guardrail update after defects + +Rules: + +- Prefer existing project patterns, public APIs, and module owners. +- Default new helper code to the owning module. Do not move code into shared/common/utils unless it is truly the same stable knowledge across at least two real use cases. +- Do not let a vertical slice become cross-layer chaos. A slice may touch UI/API/domain/storage/tests, but each layer must keep its boundary and owner. +- If a change needs more than five modules' internal details, recommend a contract, facade, seam, or preparatory refactor before continuing. +- Separate refactor and behavior change unless the active spec explicitly binds them. +- Legacy code without tests needs characterization or the smallest viable harness before risky change. 
+- Every defect fix should produce regression, same-class scan, and a guardrail or a recorded reason why no deterministic guardrail is feasible. + +Return: + +```text +status: ready | needs_changes | blocked +change_radius: +architecture_boundaries: +module_owners: +allowed_paths_check: +dependency_direction_check: +shared_or_utils_risk: +legacy_safety: +consistency_gates: +required_guardrails: +same_class_scan: +recommended_next_action: +``` diff --git a/pandacode/prompts/builtin/fresh-judge.md b/pandacode/prompts/builtin/fresh-judge.md new file mode 100644 index 0000000..4ac0600 --- /dev/null +++ b/pandacode/prompts/builtin/fresh-judge.md @@ -0,0 +1,24 @@ +# Fresh Judge + +You are an independent fresh-context final judge. Do not implement. + +Review the diff, artifacts, and evidence against: + +- user goal and active version/card scope +- requirements and BDD/spec IDs +- technical skeleton and design decisions +- triggered research dimensions +- required checks and evidence report +- security, data, permissions, and compatibility risks + +Report only correctness, scope, security/data, architecture, UX, test, or evidence blockers. Avoid broad style preferences. + +Return: + +```text +decision: pass | request_changes | blocked +blockers: +evidence_checked: +residual_risk: +next_actions: +``` diff --git a/pandacode/prompts/builtin/harness-verifier.md b/pandacode/prompts/builtin/harness-verifier.md new file mode 100644 index 0000000..037f11a --- /dev/null +++ b/pandacode/prompts/builtin/harness-verifier.md @@ -0,0 +1,36 @@ +# Harness Verifier + +You are a verification and evidence agent. Do not expand product scope. + +Your job is to prove whether the implementation satisfies the BDD/spec and user-visible closed loop. 
+ +Verify with the strongest available method: + +- unit, integration, contract, E2E, migration, and regression tests +- BDD scenario mapping +- real app operation through UI/browser/CLI/API when relevant +- screenshots, logs, traces, reports, and artifacts +- accessibility, responsiveness, error-state, and empty-state checks for UI +- data integrity, permissions, and rollback checks for backend/data work + +Rules: + +- Prefer real execution evidence over static claims. +- For UI, inspect rendered output, not just code. +- Map evidence back to BDD/spec IDs. +- Report failures as blockers with reproduction steps. +- Do not mark done because tests are absent. Absence of a harness is a gap. + +Return: + +```text +decision: pass | request_changes | blocked +verified_scope: +bdd_coverage: +checks_run: +evidence_paths: +failures: +missing_harness: +residual_risk: +next_actions: +``` diff --git a/pandacode/prompts/builtin/implementation-worker.md b/pandacode/prompts/builtin/implementation-worker.md new file mode 100644 index 0000000..055cd66 --- /dev/null +++ b/pandacode/prompts/builtin/implementation-worker.md @@ -0,0 +1,36 @@ +# Implementation Worker + +You are an implementation worker executing one scoped dispatch card. + +Inputs must include: + +- card ID, version ID, and objective +- referenced BDD/spec IDs +- technical skeleton or architecture constraints +- allowed paths and forbidden paths +- required checks and evidence paths +- stop conditions + +Rules: + +- Read the referenced contracts before editing. +- Stay inside the card scope. Do not redefine product intent, architecture, or BDD. +- Prefer existing project patterns and helper APIs. +- Keep edits minimal and cohesive. +- If required contracts are missing, contradictory, or unsafe, stop and report `blocked`. +- Run the required checks. If checks cannot run, report why and provide the strongest evidence available. +- Do not self-approve final quality. 
+ +Return: + +```text +status: completed | blocked | failed +card_id: +changes: +contracts_satisfied: +checks_run: +evidence: +blockers: +residual_risk: +handoff: +``` diff --git a/pandacode/prompts/builtin/orchestrator-base.md b/pandacode/prompts/builtin/orchestrator-base.md new file mode 100644 index 0000000..e553853 --- /dev/null +++ b/pandacode/prompts/builtin/orchestrator-base.md @@ -0,0 +1,29 @@ +# Orchestrator Base + +You are a project-agnostic version-based engineering orchestrator. Work from durable artifacts, not chat memory. + +Authority order: + +1. user goal and explicit approvals +2. project rules, specs, BDD, architecture, and design sources of truth +3. active version/card packet +4. role prompt +5. local judgment + +Default state machine: + +1. Intake: identify active version, node, row, card, and done criteria. +2. Design lock: ensure BDD/spec, technical skeleton, UX/design, and trigger scan are sufficient. +3. Dispatch: create narrow lanes with role, run_id, scope, inputs, allowed paths, checks, evidence, and stop conditions. +4. Monitor: watch each lane for running, needs_input, completed, failed, or blocked. +5. Synthesize: merge outputs, resolve conflicts, and keep decisions attached to sources. +6. Verify: require harness evidence and independent review before done. +7. Release: record result, evidence, residual risk, and next-version candidates. + +Rules: + +- Use session-style multi-turn control when a lane may ask questions or need follow-up. +- Pause a lane on needs_input; surface the question, answer by run_id, then continue the same lane. +- Do not let workers redefine product intent, architecture, or acceptance criteria. +- Do not silently expand scope. Escalate irreversible or product-semantic decisions. +- Report done only with evidence mapped to the requested scope. 
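Reviewer note, offered as a sketch only: the "Monitor" step and the `needs_input` rule in `orchestrator-base.md` imply a small dispatch over lane status. The `Lane` record, the `OrchestratorAction` union, and `nextAction` below are hypothetical names chosen for illustration. Only the five status values come from the prompt text.

```typescript
// Lane statuses named in the Monitor step of the orchestrator prompt.
type LaneStatus = "running" | "needs_input" | "completed" | "failed" | "blocked";

// Hypothetical lane record; field names are illustrative only.
interface Lane {
  runId: string;
  status: LaneStatus;
  question?: string;
}

// Hypothetical actions the orchestrator could take for one lane.
type OrchestratorAction =
  | { kind: "wait"; runId: string }
  | { kind: "ask_owner"; runId: string; question: string }
  | { kind: "verify"; runId: string }
  | { kind: "escalate"; runId: string; status: "failed" | "blocked" };

function nextAction(lane: Lane): OrchestratorAction {
  switch (lane.status) {
    case "running":
      return { kind: "wait", runId: lane.runId };
    case "needs_input":
      // Pause the lane and surface its question; the answer is later routed by run_id.
      return {
        kind: "ask_owner",
        runId: lane.runId,
        question: lane.question ?? "(no question recorded)",
      };
    case "completed":
      // Completion alone is not "done": harness evidence and review come next.
      return { kind: "verify", runId: lane.runId };
    case "failed":
    case "blocked":
      return { kind: "escalate", runId: lane.runId, status: lane.status };
  }
}
```

Keeping the action a tagged union means a new lane status fails type-checking in `nextAction` until a case is added for it, which fits the prompt's rule against silently expanding scope.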
diff --git a/pandacode/prompts/builtin/product-ux-taste-designer.md b/pandacode/prompts/builtin/product-ux-taste-designer.md new file mode 100644 index 0000000..5c8b5e5 --- /dev/null +++ b/pandacode/prompts/builtin/product-ux-taste-designer.md @@ -0,0 +1,39 @@ +# Product UX Taste Designer + +You are a project-agnostic product, UX, and taste designer. Do not implement. + +Your job is to make the product decision-complete enough that implementation is not guessing. + +Design across: + +- user goal, motivation, and success moment +- primary workflow and secondary workflows +- information architecture and navigation +- state model: empty, loading, partial, error, success, disabled, permission-denied, destructive, undo +- interaction model: controls, feedback, latency, keyboard/touch, accessibility +- copy tone and hierarchy +- visual direction and design-system constraints when UI exists +- CLI/API ergonomics when no visual UI exists + +Rules: + +- Design the actual usable experience, not a marketing shell. +- Match density and visual language to the domain. +- Use existing design systems and components when present. +- Avoid generic AI-looking defaults. Lock concrete taste decisions: layout density, typography, color behavior, motion, and component style. +- For high-impact ambiguity, ask a structured question with a recommended default. + +Return: + +```text +decision: ready | needs_input | blocked +user_outcome: +workflow: +screen_or_surface_states: +interaction_rules: +information_architecture: +visual_or_interface_direction: +accessibility_and_responsiveness: +acceptance_checks: +open_questions: +``` diff --git a/pandacode/prompts/builtin/release-devlog-keeper.md b/pandacode/prompts/builtin/release-devlog-keeper.md new file mode 100644 index 0000000..7b659ab --- /dev/null +++ b/pandacode/prompts/builtin/release-devlog-keeper.md @@ -0,0 +1,35 @@ +# Release Devlog Keeper + +You are a release and devlog keeper. Do not implement. 
+ +Your job is to turn completed version work into durable project memory. + +Record: + +- version ID and user-visible outcome +- BDD/spec IDs delivered +- implementation summary by behavior, not by noisy file lists +- decisions made and why +- checks run and evidence paths +- known residual risks and follow-up candidates +- blockers, skipped checks, or deferred work + +Rules: + +- Devlog records process and decisions; it does not replace BDD, architecture, or design sources of truth. +- Keep entries timestamped and concise. +- Do not hide failed checks. Record them with reason and next action. +- Separate completed work from proposed next versions. + +Return: + +```text +decision: recorded | needs_input | blocked +version: +delivered: +decisions: +checks_and_evidence: +residual_risk: +next_version_candidates: +devlog_entry: +``` diff --git a/pandacode/prompts/builtin/research-trigger.md b/pandacode/prompts/builtin/research-trigger.md new file mode 100644 index 0000000..52b4de9 --- /dev/null +++ b/pandacode/prompts/builtin/research-trigger.md @@ -0,0 +1,38 @@ +# Research Trigger + +You are a project-agnostic research and standards trigger scanner. Do not implement. + +Your job is to decide which extra rules, skills, docs, or research packets must be loaded before planning or execution. + +Scan for triggers: + +- frontend, UX, visual design, accessibility, responsive layout +- backend, database, migrations, queues, concurrency, cache, jobs +- auth, permissions, privacy, security, secrets, compliance +- payments, billing, credits, quota, pricing +- infrastructure, cloud resources, deploy, observability, rollback +- AI/model behavior, prompts, evals, agent orchestration +- browser automation, human-like verification, screenshots, recordings +- documents, spreadsheets, presentations, generated media +- risky refactors, cross-module contracts, public APIs, data loss + +Rules: + +- Load only what the task needs; do not flood the worker with irrelevant doctrine. 
+- Name the exact source to load when known. If unknown, name the missing source as a blocker or question. +- Distinguish hard requirements from useful context. +- If the project already has rules/specs, prefer those over generic defaults. + +Return: + +```text +decision: ready | needs_input | blocked +required_triggers: + - trigger: + reason: + source_to_load: + applies_to: +optional_context: +missing_sources: +questions: +``` diff --git a/pandacode/prompts/builtin/reviewer-red-team.md b/pandacode/prompts/builtin/reviewer-red-team.md new file mode 100644 index 0000000..fbd3654 --- /dev/null +++ b/pandacode/prompts/builtin/reviewer-red-team.md @@ -0,0 +1,33 @@ +# Reviewer Red Team + +You are an independent adversarial reviewer. Do not implement. + +Review the work against: + +- user goal and version scope +- BDD/spec IDs and acceptance criteria +- technical skeleton, module boundaries, and dependency rules +- product/UX/design decisions +- security, privacy, permission, and data integrity +- migration, compatibility, and rollback safety +- tests, harnesses, and evidence quality +- unnecessary scope expansion or hidden assumptions + +Rules: + +- Findings first. Prioritize correctness, regressions, data loss, security, architecture, and missing evidence. +- Cite concrete files, commands, artifacts, or missing contract IDs when available. +- Do not request broad style changes unless they create real risk. +- If no blocking issue is found, say so and name residual risks. + +Return: + +```text +decision: pass | request_changes | blocked +findings: +evidence_checked: +scope_drift: +missing_tests_or_evidence: +residual_risk: +recommended_next_step: +``` diff --git a/pandacode/prompts/builtin/tech-architect.md b/pandacode/prompts/builtin/tech-architect.md new file mode 100644 index 0000000..3c15a4d --- /dev/null +++ b/pandacode/prompts/builtin/tech-architect.md @@ -0,0 +1,40 @@ +# Tech Architect + +You are a project-agnostic technical architect. 
Do not implement unless explicitly dispatched as a worker. + +Your job is to lock the technical skeleton before coding starts. + +Evaluate and define: + +- module boundaries and ownership +- dependency direction and forbidden imports +- data ownership, persistence, migrations, and compatibility constraints +- API, event, job, and file contracts +- framework/library choices and reasons +- security, auth, permission, privacy, and secret-handling constraints +- observability and failure behavior +- test seams and harness requirements + +Rules: + +- Prefer existing project architecture over new abstractions. +- Add an abstraction only when it removes real complexity or matches an established pattern. +- Make irreversible choices explicit. +- If BDD and architecture conflict, report the conflict instead of smoothing it over. +- Do not allow workers to redefine module boundaries during implementation. + +Return: + +```text +decision: ready | request_changes | needs_input | blocked +technical_skeleton: +module_boundaries: +dependency_rules: +data_owners: +contracts: +technology_choices: +forbidden_moves: +migration_or_compatibility_notes: +test_and_harness_requirements: +open_questions: +``` diff --git a/pandacode/prompts/builtin/version-advisor.md b/pandacode/prompts/builtin/version-advisor.md new file mode 100644 index 0000000..ea9a4a4 --- /dev/null +++ b/pandacode/prompts/builtin/version-advisor.md @@ -0,0 +1,43 @@ +# Version Advisor + +You are an independent read-only Advisor. Do not edit files and do not implement. + +You are not an approval authority. Your job is to advise the main Orchestrator / executing agent on the next bounded action and the context it must load before acting. 
+ +Advise on: + +- user-visible closed loop +- BDD/spec ambiguity +- product, UX, architecture, and data-contract readiness +- principles, rules, skills, and research packets that should be loaded now +- missing research triggers or project rules +- missing harness or evidence paths +- parallelism, isolation, and sequencing risks +- irreversible decisions requiring the user + +Rules: + +- Recommend the smallest stable next action for the Orchestrator. +- Prefer `ask_user` for product-semantic decisions that cannot be inferred. +- Prefer `revise_spec` for missing BDD, contracts, harness, or evidence. +- Prefer `load_context` when the next action depends on principles, rules, skills, research, design/taste, architecture, or prior evidence not yet loaded. +- Prefer `split_row` when parallel execution is unsafe or a row is too broad. +- Do not claim to approve or reject. The Orchestrator decides and records whether it follows your recommendation. + +Return: + +```text +recommendation: continue_current_row | revise_spec | load_context | split_row | ask_user | stop_for_blocker | run_verification | proceed_next_row +next_action: +required_context_to_load: +principles_or_rules_to_apply: +bdd_or_spec_gaps: +product_ux_gaps: +trigger_gaps: +harness_gaps: +contract_or_architecture_gaps: +parallelism_risks: +missing_evidence: +user_decisions_required: +why_this_next_step: +``` diff --git a/pandacode/prompts/builtin/worker-card.md b/pandacode/prompts/builtin/worker-card.md new file mode 100644 index 0000000..d02d4b1 --- /dev/null +++ b/pandacode/prompts/builtin/worker-card.md @@ -0,0 +1,14 @@ +# Worker Card + +You are a worker executing one dispatch card. For new orchestration, prefer the fuller `implementation-worker` builtin; this prompt remains as a compact compatibility role. + +Rules: + +- Read the referenced spec and BDD before editing. +- Stay inside allowed paths and respect forbidden paths. +- Implement only the bound card scope. +- Run the required checks. 
+- Produce or update the requested evidence artifact. +- Stop if BDD, contracts, permissions, migrations, or runtime resources are ambiguous. + +Do not self-approve final quality. The orchestrator or fresh judge will review evidence. diff --git a/pandacode/src/agent.rs b/pandacode/src/agent.rs index df89b55..56b6800 100644 --- a/pandacode/src/agent.rs +++ b/pandacode/src/agent.rs @@ -10602,7 +10602,8 @@ mod tests { #[test] #[cfg(unix)] fn resolve_writable_path_blocks_symlink_escape_with_create_dirs() { - let base = std::env::temp_dir().join(format!("pandacode-wpath-{}", crate::io::now_millis())); + let base = + std::env::temp_dir().join(format!("pandacode-wpath-{}", crate::io::now_millis())); let cwd = base.join("cwd"); let outside = base.join("outside"); std::fs::create_dir_all(&cwd).unwrap(); diff --git a/pandacode/src/cli.rs b/pandacode/src/cli.rs index 5d60259..5ade5f1 100644 --- a/pandacode/src/cli.rs +++ b/pandacode/src/cli.rs @@ -42,13 +42,25 @@ pub enum Commands { Interrupt(AgentSessionCommandArgs), #[command(about = "Stop the latest or selected session")] Stop(AgentSessionCommandArgs), + #[command( + about = "Wait until the given sessions settle; succeed only when every session completes and every expected artifact exists" + )] + Wait(WaitCommandArgs), + #[command( + about = "Reclaim disk: prune PandaCode-owned prompts/logs/events/detached files older than --days (session records and Codex home are never touched)" + )] + Gc(GcCommandArgs), #[command(about = "Check runtimes and required local binaries")] Doctor(GlobalArgs), #[command(about = "List known PandaCode sessions for all runtimes")] List(GlobalArgs), #[command(about = "List models for all runtimes")] Models(GlobalArgs), - #[command(subcommand, about = "Run tasks through Codex app-server/control-plane")] + #[command( + subcommand, + about = "Run tasks directly through codex app-server (stdio JSON-RPC, no daemon)", + after_help = "Examples:\n pandacode codex exec --task \"fix the failing tests\" --cd 
.\n pandacode codex exec --detach --session build --task-file task.md # background turn\n pandacode codex status --session build # live watch (state + last agent message)\n pandacode codex answer --session build --choice 2 # answer a pending question\n pandacode codex interrupt --session build # abort the active turn\n pandacode codex exec --auth-home ~/.codex-work --task \"...\" # another account's auth, clean config\n pandacode codex logs --session build --visible # structured thread history\n pandacode codex doctor # health + account + rate limits" + )] Codex(RuntimeCommand), #[command(subcommand, about = "Run tasks through Claude Code in tmux")] Claude(RuntimeCommand), @@ -126,9 +138,13 @@ pub struct ClaudeHookArgs { #[derive(Debug, Args, Clone)] pub struct AnswerCommandArgs { - #[arg(long, default_value = "latest")] + #[arg(long, default_value = "latest", help = "Session id, or latest")] pub session: String, - #[arg(long, default_value = ".")] + #[arg( + long, + default_value = ".", + help = "Workspace directory for session state" + )] pub cd: PathBuf, #[arg( long, @@ -145,9 +161,9 @@ pub struct AnswerCommandArgs { pub text: Option, #[arg(long, help = "Wait for the runtime to continue after answering")] pub wait: bool, - #[arg(long)] + #[arg(long, help = "Wait timeout in milliseconds")] pub timeout_ms: Option, - #[arg(long)] + #[arg(long, help = "Print machine-readable JSON")] pub json: bool, #[command(flatten)] pub bins: RuntimeBins, @@ -155,9 +171,13 @@ pub struct AnswerCommandArgs { #[derive(Debug, Args, Clone)] pub struct GlobalArgs { - #[arg(long, default_value = ".")] + #[arg( + long, + default_value = ".", + help = "Workspace directory for session state" + )] pub cd: PathBuf, - #[arg(long)] + #[arg(long, help = "Print machine-readable JSON")] pub json: bool, #[command(flatten)] pub bins: RuntimeBins, @@ -165,19 +185,76 @@ pub struct GlobalArgs { #[derive(Debug, Args, Clone)] pub struct RuntimeGlobalArgs { - #[arg(long, default_value = ".")] + #[arg( + 
long, + default_value = ".", + help = "Workspace directory for session state" + )] pub cd: PathBuf, - #[arg(long)] + #[arg(long, help = "Print machine-readable JSON")] pub json: bool, #[command(flatten)] pub bins: RuntimeBins, } +#[derive(Debug, Args, Clone)] +pub struct WaitCommandArgs { + #[arg( + long = "session", + required = true, + value_name = "SESSION", + help = "Session id to wait for; repeat for multiple lanes" + )] + pub sessions: Vec, + #[arg(long, default_value = ".", help = "Workspace directory")] + pub cd: PathBuf, + #[arg( + long, + default_value_t = 1_800_000, + help = "Overall wait timeout in milliseconds" + )] + pub timeout_ms: u64, + #[arg(long, default_value_t = 5_000, help = "Poll interval in milliseconds")] + pub interval_ms: u64, + #[arg( + long = "expect-artifact", + value_name = "PATH", + help = "File that must exist (relative to --cd) for the wait to succeed; repeat for multiple files" + )] + pub expect_artifact: Vec, + #[arg(long, help = "Print machine-readable JSON")] + pub json: bool, +} + +#[derive(Debug, Args, Clone)] +pub struct GcCommandArgs { + #[arg(long, default_value = ".", help = "Workspace directory")] + pub cd: PathBuf, + #[arg( + long, + default_value_t = 7, + help = "Delete PandaCode-owned prompt/log/event/detached files older than this many days" + )] + pub days: u64, + #[arg( + long, + help = "Report what would be deleted without removing anything" + )] + pub dry_run: bool, + #[arg(long, help = "Print machine-readable JSON")] + pub json: bool, +} + #[derive(Debug, Args, Clone)] pub struct AgentTaskCommandArgs { #[command(flatten)] pub common: TaskCommandArgs, - #[arg(long, value_enum, default_value_t = RuntimeSelector::Auto)] + #[arg( + long, + value_enum, + default_value_t = RuntimeSelector::Auto, + help = "Runtime to use; auto selects from model/provider hints" + )] pub runtime: RuntimeSelector, #[arg( long, @@ -190,7 +267,7 @@ pub struct AgentTaskCommandArgs { pub struct AgentSessionCommandArgs { #[command(flatten)] 
pub common: SessionCommandArgs, - #[arg(long, value_enum, default_value_t = RuntimeSelector::Auto)] + #[arg(long, value_enum, default_value_t = RuntimeSelector::Auto, help = "Runtime to inspect")] pub runtime: RuntimeSelector, } @@ -198,7 +275,7 @@ pub struct AgentSessionCommandArgs { pub struct AgentLogsCommandArgs { #[command(flatten)] pub common: LogsCommandArgs, - #[arg(long, value_enum, default_value_t = RuntimeSelector::Auto)] + #[arg(long, value_enum, default_value_t = RuntimeSelector::Auto, help = "Runtime to inspect")] pub runtime: RuntimeSelector, } @@ -206,7 +283,7 @@ pub struct AgentLogsCommandArgs { pub struct AgentAnswerCommandArgs { #[command(flatten)] pub common: AnswerCommandArgs, - #[arg(long, value_enum, default_value_t = RuntimeSelector::Auto)] + #[arg(long, value_enum, default_value_t = RuntimeSelector::Auto, help = "Runtime to answer")] pub runtime: RuntimeSelector, } @@ -217,17 +294,39 @@ pub struct TaskCommandArgs { help = "Read task from stdin when this positional is '-'" )] pub stdin: Option, - #[arg(long, conflicts_with = "task_file")] + #[arg(long, conflicts_with = "task_file", help = "Inline task text")] pub task: Option, - #[arg(long, value_name = "PATH")] + #[arg(long, value_name = "PATH", help = "Read task text from a file")] pub task_file: Option, - #[arg(long, default_value = ".")] + #[arg( + long, + value_name = "TEXT|builtin:NAME|@FILE|file:PATH|text:TEXT", + help = "Append a prompt part after the task; repeat for multiple ordered parts. 
All runtimes resolve builtin:NAME (embedded role prompts), @FILE, file:PATH, and text:TEXT locally" + )] + pub prompt_append: Vec, + #[arg( + long, + help = "Codex/Claude: return immediately and run the turn in a detached background worker; observe with status, block with `pandacode wait`, end with stop" + )] + pub detach: bool, + #[arg( + long, + value_name = "PATH", + help = "Require this file to exist (relative to --cd) after a completed turn, otherwise the state becomes no_report; repeat for multiple files" + )] + pub expect_artifact: Vec, + #[arg( + long, + help = "Codex only: set a thread goal/objective before the turn starts" + )] + pub objective: Option, + #[arg(long, default_value = ".", help = "Workspace directory")] pub cd: PathBuf, - #[arg(long, default_value = "latest")] + #[arg(long, default_value = "latest", help = "Session id, or latest")] pub session: String, - #[arg(long)] + #[arg(long, help = "Model id for this turn")] pub model: Option, - #[arg(long)] + #[arg(long, help = "Reasoning/effort level when supported")] pub effort: Option, #[arg( long, @@ -235,9 +334,9 @@ pub struct TaskCommandArgs { help = "Agent permission mode. 
New sessions default to max; resume inherits the stored mode unless set" )] pub permission: Option, - #[arg(long)] + #[arg(long, help = "Wait timeout in milliseconds")] pub timeout_ms: Option, - #[arg(long)] + #[arg(long, help = "Print machine-readable JSON")] pub json: bool, #[command(flatten)] pub bins: RuntimeBins, @@ -257,11 +356,15 @@ pub struct BambooTaskCommandArgs { #[derive(Debug, Args, Clone)] pub struct SessionCommandArgs { - #[arg(long, default_value = "latest")] + #[arg(long, default_value = "latest", help = "Session id, or latest")] pub session: String, - #[arg(long, default_value = ".")] + #[arg( + long, + default_value = ".", + help = "Workspace directory for session state" + )] pub cd: PathBuf, - #[arg(long)] + #[arg(long, help = "Print machine-readable JSON")] pub json: bool, #[command(flatten)] pub bins: RuntimeBins, @@ -269,11 +372,15 @@ pub struct SessionCommandArgs { #[derive(Debug, Args, Clone)] pub struct LogsCommandArgs { - #[arg(long, default_value = "latest")] + #[arg(long, default_value = "latest", help = "Session id, or latest")] pub session: String, - #[arg(long, default_value = ".")] + #[arg( + long, + default_value = ".", + help = "Workspace directory for session state" + )] pub cd: PathBuf, - #[arg(long, default_value_t = 100)] + #[arg(long, default_value_t = 100, help = "Number of log lines to show")] pub tail: usize, #[arg( long, @@ -281,7 +388,7 @@ pub struct LogsCommandArgs { help = "Claude only: capture the final visible viewport instead of scrollback tail" )] pub visible: bool, - #[arg(long)] + #[arg(long, help = "Print machine-readable JSON")] pub json: bool, #[command(flatten)] pub bins: RuntimeBins, @@ -289,15 +396,24 @@ pub struct LogsCommandArgs { #[derive(Debug, Args, Clone)] pub struct ModelCommandArgs { - #[arg(long, default_value = "latest")] + #[arg(long, default_value = "latest", help = "Session id, or latest")] pub session: String, - #[arg(long, default_value = ".")] + #[arg( + long, + default_value = ".", + help = 
"Workspace directory for session state" + )] pub cd: PathBuf, - #[arg(long = "model", alias = "set", value_name = "MODEL")] + #[arg( + long = "model", + alias = "set", + value_name = "MODEL", + help = "Model id for the next turn" + )] pub model: String, - #[arg(long)] + #[arg(long, help = "Reasoning/effort level for the next turn when supported")] pub effort: Option, - #[arg(long)] + #[arg(long, help = "Print machine-readable JSON")] pub json: bool, #[command(flatten)] pub bins: RuntimeBins, @@ -440,8 +556,6 @@ pub struct BambooRunArgs { #[derive(Debug, Args, Clone)] pub struct RuntimeBins { - #[arg(long, hide = true, default_value = "codexctl")] - pub codexctl_bin: String, #[arg(long, hide = true, default_value = "codex")] pub codex_bin: String, #[arg(long, hide = true, default_value = "claude")] @@ -450,16 +564,31 @@ pub struct RuntimeBins { pub tmux_bin: String, #[arg(long, hide = true, default_value = "summary")] pub log_mode: String, + #[arg( + long, + alias = "codex-auth-home", + value_name = "DIR", + conflicts_with = "codex_home", + help = "Codex account switch: copy auth material from this Codex home (e.g. ~/.codex-work) into PandaCode's managed clean home; that home's config/AGENTS.md/skills are NOT loaded" + )] + pub auth_home: Option, + #[arg( + long, + value_name = "DIR", + help = "Use this full Codex home as-is (loads its config, rules, and session storage). 
Prefer --auth-home for plain account switching" + )] + pub codex_home: Option, } impl Default for RuntimeBins { fn default() -> Self { Self { - codexctl_bin: "codexctl".to_string(), codex_bin: "codex".to_string(), claude_bin: "claude".to_string(), tmux_bin: "tmux".to_string(), log_mode: "summary".to_string(), + auth_home: None, + codex_home: None, } } } diff --git a/pandacode/src/io.rs b/pandacode/src/io.rs index 024637d..09a58ad 100644 --- a/pandacode/src/io.rs +++ b/pandacode/src/io.rs @@ -24,6 +24,28 @@ impl std::fmt::Display for JsonAlreadyEmitted { impl std::error::Error for JsonAlreadyEmitted {} +pub const DETACHED_ENV: &str = "PANDACODE_DETACHED"; + +pub fn detached_worker() -> bool { + std::env::var_os(DETACHED_ENV).is_some() +} + +/// Expected report artifacts (relative to the workspace root) that are still +/// missing. Used to turn a `completed` turn into `no_report`. +pub fn missing_artifacts(root: &Path, expected: &serde_json::Value) -> Vec { + expected + .as_array() + .map(|paths| { + paths + .iter() + .filter_map(|path| path.as_str()) + .filter(|path| !root.join(path).exists()) + .map(ToString::to_string) + .collect() + }) + .unwrap_or_default() +} + pub fn now_millis() -> u128 { SystemTime::now() .duration_since(UNIX_EPOCH) @@ -106,6 +128,106 @@ fn read_task_file(path: &Path, workspace_root: Option<&Path>) -> Result } } +pub fn apply_prompt_parts( + task: &str, + parts: &[String], + workspace_root: Option<&Path>, +) -> Result { + if parts.is_empty() { + return Ok(task.to_string()); + } + let mut combined = task.trim_end().to_string(); + for (index, part) in parts.iter().enumerate() { + let resolved = resolve_local_prompt_part(part, workspace_root) + .with_context(|| format!("resolve --prompt-append part {}", index + 1))?; + combined.push_str("\n\n----- appended prompt part "); + combined.push_str(&(index + 1).to_string()); + combined.push_str(" -----\n\n"); + combined.push_str(resolved.trim_end()); + } + Ok(combined) +} + +pub const BUILTIN_PROMPTS: 
&[(&str, &str)] = &[ + ( + "orchestrator-base", + include_str!("../prompts/builtin/orchestrator-base.md"), + ), + ( + "version-advisor", + include_str!("../prompts/builtin/version-advisor.md"), + ), + ( + "bdd-spec-designer", + include_str!("../prompts/builtin/bdd-spec-designer.md"), + ), + ( + "tech-architect", + include_str!("../prompts/builtin/tech-architect.md"), + ), + ( + "product-ux-taste-designer", + include_str!("../prompts/builtin/product-ux-taste-designer.md"), + ), + ( + "codebase-governance", + include_str!("../prompts/builtin/codebase-governance.md"), + ), + ( + "research-trigger", + include_str!("../prompts/builtin/research-trigger.md"), + ), + ( + "implementation-worker", + include_str!("../prompts/builtin/implementation-worker.md"), + ), + ("worker-card", include_str!("../prompts/builtin/worker-card.md")), + ( + "harness-verifier", + include_str!("../prompts/builtin/harness-verifier.md"), + ), + ( + "reviewer-red-team", + include_str!("../prompts/builtin/reviewer-red-team.md"), + ), + ("fresh-judge", include_str!("../prompts/builtin/fresh-judge.md")), + ( + "release-devlog-keeper", + include_str!("../prompts/builtin/release-devlog-keeper.md"), + ), +]; + +fn builtin_prompt(name: &str) -> Result<String> { + if let Some((_, text)) = BUILTIN_PROMPTS.iter().find(|(key, _)| *key == name) { + return Ok((*text).to_string()); + } + let known = BUILTIN_PROMPTS + .iter() + .map(|(key, _)| *key) + .collect::<Vec<_>>() + .join(", "); + bail!("unknown builtin prompt part {name}; known builtins: {known}") +} + +fn resolve_local_prompt_part(part: &str, workspace_root: Option<&Path>) -> Result<String> { + if let Some(name) = part + .strip_prefix("builtin:") + .or_else(|| part.strip_prefix("@builtin:")) + { + return builtin_prompt(name); + } + if let Some(path) = part.strip_prefix("file:") { + return read_task_file(Path::new(path), workspace_root); + } + if let Some(path) = part.strip_prefix('@') { + return read_task_file(Path::new(path), workspace_root); + } + if let Some(text) =
part.strip_prefix("text:") { + return Ok(text.to_string()); + } + Ok(part.to_string()) +} + pub fn write_prompt_file(root: &Path, runtime: &str, session: &str, task: &str) -> Result { let dir = pandacode_dir(root).join(runtime).join("prompts"); fs::create_dir_all(&dir).with_context(|| format!("create {}", dir.display()))?; @@ -207,10 +329,6 @@ pub fn command_report( }) } -pub fn parse_json_or_null(text: &str) -> Option { - serde_json::from_str::(text.trim()).ok() -} - pub fn tail(text: &str, lines: usize) -> String { let mut tail = text.lines().rev().take(lines).collect::>(); tail.reverse(); @@ -388,6 +506,55 @@ mod tests { assert_eq!(strip_ansi_controls("\u{1b}[31mred\u{1b}[0m"), "red"); } + #[test] + fn prompt_parts_append_in_order_with_separators() { + let parts = vec!["literal advice".to_string(), "text:builtin: looks literal".to_string()]; + let combined = apply_prompt_parts("do the task", &parts, None).unwrap(); + assert!(combined.starts_with("do the task")); + let first = combined.find("----- appended prompt part 1 -----").unwrap(); + let second = combined.find("----- appended prompt part 2 -----").unwrap(); + assert!(first < second); + assert!(combined.contains("literal advice")); + assert!(combined.contains("builtin: looks literal")); + } + + #[test] + fn prompt_parts_empty_keeps_task_unchanged() { + assert_eq!( + apply_prompt_parts("task text", &[], None).unwrap(), + "task text" + ); + } + + #[test] + fn prompt_parts_read_files_and_resolve_builtins() { + let root = std::env::temp_dir().join(format!( + "pandacode-prompt-part-{}-{}", + std::process::id(), + now_millis() + )); + fs::create_dir_all(&root).unwrap(); + fs::write(root.join("role.md"), "be careful").unwrap(); + let at_form = apply_prompt_parts("t", &["@role.md".to_string()], Some(&root)).unwrap(); + assert!(at_form.contains("be careful")); + let file_form = + apply_prompt_parts("t", &["file:role.md".to_string()], Some(&root)).unwrap(); + assert!(file_form.contains("be careful")); + let builtin = 
apply_prompt_parts( + "t", + &["builtin:implementation-worker".to_string()], + Some(&root), + ) + .unwrap(); + assert!(builtin.len() > "t".len() + 50); + let builtin_at = + apply_prompt_parts("t", &["@builtin:fresh-judge".to_string()], Some(&root)).unwrap(); + assert!(builtin_at.len() > "t".len() + 50); + let unknown = apply_prompt_parts("t", &["builtin:not-a-role".to_string()], Some(&root)); + assert!(unknown.is_err()); + fs::remove_dir_all(&root).ok(); + } + #[test] fn long_tasks_are_dispatched_by_file_reference() { let path = PathBuf::from("/tmp/pandacode-task.md"); diff --git a/pandacode/src/main.rs b/pandacode/src/main.rs index d3a200a..96d1e31 100644 --- a/pandacode/src/main.rs +++ b/pandacode/src/main.rs @@ -44,6 +44,8 @@ async fn run(cli: Cli) -> Result<()> { Commands::Artifacts(args) => runtimes::session_agent(args, "artifacts").await, Commands::Interrupt(args) => runtimes::session_agent(args, "interrupt").await, Commands::Stop(args) => runtimes::session_agent(args, "stop").await, + Commands::Wait(args) => runtimes::wait_sessions(args), + Commands::Gc(args) => runtimes::gc_sessions(args), Commands::Doctor(args) => runtimes::doctor(args).await, Commands::List(args) => runtimes::list_all(args), Commands::Models(args) => runtimes::models_all(args).await, @@ -86,6 +88,8 @@ fn command_wants_json(command: &Commands) -> bool { | Commands::Stop(args) => args.common.json, Commands::Logs(args) => args.common.json, Commands::Doctor(args) | Commands::List(args) | Commands::Models(args) => args.json, + Commands::Wait(args) => args.json, + Commands::Gc(args) => args.json, Commands::Codex(command) | Commands::Claude(command) => runtime_command_wants_json(command), Commands::Bamboo(command) => bamboo_command_wants_json(command), } diff --git a/pandacode/src/runtimes/bamboo.rs b/pandacode/src/runtimes/bamboo.rs index 87ed7b6..036c555 100644 --- a/pandacode/src/runtimes/bamboo.rs +++ b/pandacode/src/runtimes/bamboo.rs @@ -105,6 +105,10 @@ async fn answer(args: 
AnswerCommandArgs) -> Result<()> { stdin: None, task: Some(answer), task_file: None, + prompt_append: Vec::new(), + detach: false, + expect_artifact: Vec::new(), + objective: None, cd: root, session: record.session, model: None, @@ -128,12 +132,19 @@ async fn run_turn( resume_target: Option, action: &str, ) -> Result<()> { + if args.common.detach { + anyhow::bail!("--detach is only supported on the codex runtime"); + } + if args.common.objective.is_some() { + anyhow::bail!("--objective is only supported on the codex runtime"); + } let raw_task = read_task( args.common.task.as_deref(), args.common.task_file.as_deref(), args.common.stdin.as_deref(), Some(&root), )?; + let raw_task = crate::io::apply_prompt_parts(&raw_task, &args.common.prompt_append, Some(&root))?; let model = effective_model(args.common.model.as_deref(), record.model.as_deref()); let provider = effective_provider( args.provider.as_deref(), diff --git a/pandacode/src/runtimes/claude.rs b/pandacode/src/runtimes/claude.rs index a9962bb..26c8a0d 100644 --- a/pandacode/src/runtimes/claude.rs +++ b/pandacode/src/runtimes/claude.rs @@ -6,7 +6,7 @@ use std::{ time::Duration, }; -use anyhow::{Result, bail}; +use anyhow::{Context, Result, bail}; use serde_json::{Value, json}; use crate::{ @@ -23,8 +23,8 @@ use crate::{ }; const RUNTIME: &str = "claude"; -const DEFAULT_MODEL: &str = "opus"; -const DEFAULT_EFFORT: &str = "max"; +const DEFAULT_MODEL: &str = "fable"; +const DEFAULT_EFFORT: &str = "xhigh"; struct ClaudeLaunch<'a> { tmux_name: &'a str, @@ -79,6 +79,7 @@ pub fn record_hook(args: ClaudeHookArgs) -> Result<()> { } fn exec(args: TaskCommandArgs) -> Result<()> { + reject_codex_only_flags(&args)?; let root = workspace(&args.cd)?; let task = crate::io::read_task( args.task.as_deref(), @@ -86,6 +87,7 @@ fn exec(args: TaskCommandArgs) -> Result<()> { args.stdin.as_deref(), Some(&root), )?; + let task = crate::io::apply_prompt_parts(&task, &args.prompt_append, Some(&root))?; let session_name = if 
args.session == "latest" { generated_session(RUNTIME) } else { @@ -94,6 +96,30 @@ fn exec(args: TaskCommandArgs) -> Result<()> { let model = effective_model(args.model.as_deref(), None); let effort = effective_effort(args.effort, None); let permission = effective_permission(args.permission, None); + if args.detach { + return spawn_detached_claude_worker( + "exec", + &root, + &session_name, + &task, + &args, + &model, + &effort, + permission, + ); + } + let mut pre_record = SessionRecord::new(RUNTIME, &session_name, "claude-tmux", &root); + pre_record.tmux_name = Some(session_name.clone()); + pre_record.model = Some(model.clone()); + pre_record.effort = Some(effort.clone()); + pre_record.permission = Some(permission.as_value().to_string()); + pre_record.artifacts = json!({ + "status": "running", + "runner_pid": std::process::id(), + "detached": crate::io::detached_worker(), + "expected_artifacts": args.expect_artifact, + }); + session::save(&root, &mut pre_record)?; ensure_started( &root, &session_name, @@ -120,7 +146,7 @@ fn exec(args: TaskCommandArgs) -> Result<()> { &marker, Some(&event_log_path(&root, &session_name)), turn_started, - args.timeout_ms.unwrap_or(120_000), + args.timeout_ms.unwrap_or_else(default_wait_timeout_ms), )?; let mut record = SessionRecord::new(RUNTIME, &session_name, "claude-tmux", &root); @@ -129,12 +155,24 @@ fn exec(args: TaskCommandArgs) -> Result<()> { record.model = Some(model); record.effort = Some(effort); record.permission = Some(permission.as_value().to_string()); + let missing = crate::io::missing_artifacts(&root, &json!(args.expect_artifact)); + let mut state = wait["status"].clone(); + let mut ok_value = wait["ok"].clone(); + if state == "completed" && !missing.is_empty() { + state = json!("no_report"); + ok_value = json!(false); + } record.artifacts = json!({ "prompt_file": prompt_file, "transport": if dispatch_task.is_some() { "file_reference" } else { "direct" }, "debug_log": debug_log_path(&root, &session_name), 
"event_log": event_log_path(&root, &session_name), - "tmux_session": session_name + "tmux_session": session_name, + "status": state, + "runner_pid": std::process::id(), + "detached": crate::io::detached_worker(), + "expected_artifacts": args.expect_artifact, + "missing_artifacts": missing, }); if wait["status"] == "waiting_for_user" { record.artifacts["pending_marker"] = json!(marker); @@ -149,12 +187,14 @@ fn exec(args: TaskCommandArgs) -> Result<()> { &marker, ); let last_agent_message = strip_completion_marker( - wait.get("last_agent_message").cloned().unwrap_or(Value::Null), + wait.get("last_agent_message") + .cloned() + .unwrap_or(Value::Null), &marker, ); Ok(json!({ - "ok": wait["ok"], - "state": wait["status"], + "ok": ok_value, + "state": record.artifacts["status"].clone(), "runtime": RUNTIME, "action": "exec", "session": record.session, @@ -203,6 +243,7 @@ impl Drop for StartedSessionGuard<'_> { } fn resume(args: TaskCommandArgs) -> Result<()> { + reject_codex_only_flags(&args)?; let root = workspace(&args.cd)?; let task = crate::io::read_task( args.task.as_deref(), @@ -210,6 +251,7 @@ fn resume(args: TaskCommandArgs) -> Result<()> { args.stdin.as_deref(), Some(&root), )?; + let task = crate::io::apply_prompt_parts(&task, &args.prompt_append, Some(&root))?; let mut record = session::load(&root, RUNTIME, &args.session)?; let tmux = record .tmux_name @@ -218,6 +260,23 @@ fn resume(args: TaskCommandArgs) -> Result<()> { let model = effective_model(args.model.as_deref(), record.model.as_deref()); let effort = effective_effort(args.effort, record.effort.as_deref()); let permission = effective_permission(args.permission, record.permission.as_deref()); + if args.detach { + let session_name = record.session.clone(); + return spawn_detached_claude_worker( + "resume", + &root, + &session_name, + &task, + &args, + &model, + &effort, + permission, + ); + } + record.artifacts["status"] = json!("running"); + record.artifacts["runner_pid"] = json!(std::process::id()); + 
record.artifacts["detached"] = json!(crate::io::detached_worker()); + session::save(&root, &mut record)?; if tmux_has_session(&args.bins.tmux_bin, &tmux)? && args.permission.is_some() && permission != PermissionMode::from_record(record.permission.as_deref()) @@ -260,7 +319,7 @@ fn resume(args: TaskCommandArgs) -> Result<()> { &marker, Some(&event_log_path(&root, &tmux)), turn_started, - args.timeout_ms.unwrap_or(120_000), + args.timeout_ms.unwrap_or_else(default_wait_timeout_ms), )?; fill_claude_session_from_events(&root, &tmux, &mut record); record.model = Some(model); @@ -273,6 +332,15 @@ fn resume(args: TaskCommandArgs) -> Result<()> { "direct" }); record.artifacts["event_log"] = json!(event_log_path(&root, &tmux)); + let missing = crate::io::missing_artifacts(&root, &json!(args.expect_artifact)); + let mut resume_state = wait["status"].clone(); + let mut resume_ok = wait["ok"].clone(); + if resume_state == "completed" && !missing.is_empty() { + resume_state = json!("no_report"); + resume_ok = json!(false); + } + record.artifacts["status"] = resume_state.clone(); + record.artifacts["missing_artifacts"] = json!(missing); if wait["status"] == "waiting_for_user" { record.artifacts["pending_marker"] = json!(marker); } else if let Some(object) = record.artifacts.as_object_mut() { @@ -288,12 +356,14 @@ fn resume(args: TaskCommandArgs) -> Result<()> { &marker, ); let last_agent_message = strip_completion_marker( - wait.get("last_agent_message").cloned().unwrap_or(Value::Null), + wait.get("last_agent_message") + .cloned() + .unwrap_or(Value::Null), &marker, ); let report = json!({ - "ok": wait["ok"], - "state": wait["status"], + "ok": resume_ok, + "state": resume_state, "runtime": RUNTIME, "action": "resume", "session": record.session, @@ -374,7 +444,7 @@ fn answer(args: AnswerCommandArgs) -> Result<()> { marker, Some(&event_log_path(&root, &tmux)), answer_started, - args.timeout_ms.unwrap_or(120_000), + args.timeout_ms.unwrap_or_else(default_wait_timeout_ms), )?) 
} else { None @@ -429,10 +499,19 @@ fn status(args: SessionCommandArgs) -> Result<()> { None }; let event_log = event_log_path(&root, tmux); - let state = claude_state(&event_log, visible.as_deref(), alive); + let live_state = claude_state(&event_log, visible.as_deref(), alive); + // Top-level `state` mirrors the recorded execution outcome so it matches + // codex status and `pandacode wait`; the real-time tmux view is `live_state`. + let state = record + .artifacts + .get("status") + .and_then(|value| value.as_str()) + .map(ToString::to_string) + .unwrap_or_else(|| live_state.clone()); output_json(&json!({ - "ok": true, + "ok": state == "completed" || state == "idle", "state": state, + "live_state": live_state, "runtime": RUNTIME, "action": "status", "session": record.session, @@ -585,6 +664,7 @@ pub fn doctor_report(root: &Path, bins: &RuntimeBins) -> Result Result) -> Vec command } +#[allow(clippy::too_many_arguments)] +fn spawn_detached_claude_worker( + action: &str, + root: &Path, + session: &str, + task: &str, + args: &TaskCommandArgs, + model: &str, + effort: &str, + permission: PermissionMode, +) -> Result<()> { + let task_file = write_prompt_file(root, RUNTIME, &format!("{session}-detach"), task)?; + // Pre-write a starting record so `pandacode wait`/status see the lane + // immediately, before the background worker has launched its tmux session. 
+ let mut pre = SessionRecord::new(RUNTIME, session, "claude-tmux", root); + pre.tmux_name = Some(session.to_string()); + pre.model = Some(model.to_string()); + pre.effort = Some(effort.to_string()); + pre.permission = Some(permission.as_value().to_string()); + pre.artifacts = json!({ + "status": "starting", + "detached": true, + "expected_artifacts": args.expect_artifact, + }); + session::save(root, &mut pre)?; + let out_dir = pandacode_dir(root).join(RUNTIME).join("detached"); + fs::create_dir_all(&out_dir)?; + let result_file = out_dir.join(format!("{session}.json")); + let stdout = fs::File::create(&result_file)?; + let stderr = stdout.try_clone()?; + let exe = std::env::current_exe()?; + let mut command = std::process::Command::new(exe); + command + .arg("claude") + .arg(action) + .args(["--session", session]) + .args(["--task-file", &task_file.to_string_lossy()]) + .args(["--cd", &root.to_string_lossy()]) + .args(["--model", model]) + .args(["--effort", effort]) + .args(["--permission", permission.as_value()]) + .args(["--claude-bin", &args.bins.claude_bin]) + .args(["--tmux-bin", &args.bins.tmux_bin]) + .args(["--log-mode", &args.bins.log_mode]) + .arg("--json") + .env(crate::io::DETACHED_ENV, "1") + .stdin(std::process::Stdio::null()) + .stdout(stdout) + .stderr(stderr); + if let Some(timeout) = args.timeout_ms { + command.args(["--timeout-ms", &timeout.to_string()]); + } + for artifact in &args.expect_artifact { + command.args(["--expect-artifact", &artifact.to_string_lossy()]); + } + #[cfg(unix)] + { + use std::os::unix::process::CommandExt; + command.process_group(0); + } + let child = command + .spawn() + .context("spawn detached claude worker")?; + output_json(&json!({ + "ok": true, + "state": "running", + "runtime": RUNTIME, + "action": action, + "session": session, + "detached": true, + "worker_pid": child.id(), + "result_file": result_file, + "note": "turn runs in a detached worker; poll `pandacode claude status` or block with `pandacode wait`", + 
})) +} + +fn reject_codex_only_flags(args: &TaskCommandArgs) -> Result<()> { + if args.objective.is_some() { + bail!("--objective is only supported on the codex runtime"); + } + Ok(()) +} + +fn default_wait_timeout_ms() -> u64 { + if crate::io::detached_worker() { + 1_800_000 + } else { + 120_000 + } +} + fn effective_model(explicit: Option<&str>, stored: Option<&str>) -> String { explicit.or(stored).unwrap_or(DEFAULT_MODEL).to_string() } @@ -924,7 +1095,9 @@ fn claude_usage_from_transcript(transcript_path: &str) -> Option { if role != Some("assistant") { continue; } - let Some(usage) = message.and_then(|m| m.get("usage")).or_else(|| event.get("usage")) + let Some(usage) = message + .and_then(|m| m.get("usage")) + .or_else(|| event.get("usage")) else { continue; }; @@ -1274,11 +1447,12 @@ mod tests { fn bins() -> RuntimeBins { RuntimeBins { - codexctl_bin: "codexctl".to_string(), codex_bin: "codex".to_string(), claude_bin: "claude".to_string(), tmux_bin: "tmux".to_string(), log_mode: "summary".to_string(), + auth_home: None, + codex_home: None, } } @@ -1288,6 +1462,10 @@ mod tests { stdin: None, task: Some("fix".to_string()), task_file: None, + prompt_append: Vec::new(), + detach: false, + expect_artifact: Vec::new(), + objective: None, cd: PathBuf::from("/repo"), session: "latest".to_string(), model: Some("sonnet".to_string()), @@ -1333,6 +1511,10 @@ mod tests { stdin: None, task: Some("fix".to_string()), task_file: None, + prompt_append: Vec::new(), + detach: false, + expect_artifact: Vec::new(), + objective: None, cd: PathBuf::from("/repo"), session: "latest".to_string(), model: Some("sonnet".to_string()), @@ -1526,8 +1708,10 @@ mod tests { #[test] fn claude_usage_sums_assistant_transcript_tokens() { - let dir = std::env::temp_dir() - .join(format!("pandacode-claude-usage-{}", crate::io::now_millis())); + let dir = std::env::temp_dir().join(format!( + "pandacode-claude-usage-{}", + crate::io::now_millis() + )); std::fs::create_dir_all(&dir).unwrap(); let path = 
dir.join("transcript.jsonl"); std::fs::write( diff --git a/pandacode/src/runtimes/codex.rs b/pandacode/src/runtimes/codex.rs index 15f2894..38a4c93 100644 --- a/pandacode/src/runtimes/codex.rs +++ b/pandacode/src/runtimes/codex.rs @@ -1,12 +1,20 @@ -use std::{ - collections::hash_map::DefaultHasher, - fs, - hash::{Hash, Hasher}, - path::{Path, PathBuf}, -}; +//! Codex runtime driven directly through `codex app-server` over stdio. +//! +//! Each turn spawns one app-server process, starts or resumes a thread, runs +//! `turn/start`, and waits for `turn/completed`. Thread state persists as +//! Codex rollout files, so resume works across processes without a daemon. +//! +//! `--detach` re-execs pandacode as a background worker that owns the live +//! app-server for the whole turn. The worker keeps the session record fresh +//! (status/last message/usage), waits in-protocol on `requestUserInput` so +//! `answer` can reply inside the same turn, and honors `interrupt` through a +//! control file. 
+ +use std::path::{Path, PathBuf}; +use std::time::{Duration, Instant}; use anyhow::{Context, Result, bail}; -use serde_json::json; +use serde_json::{Value, json}; use crate::{ cli::{ @@ -14,22 +22,22 @@ use crate::{ RuntimeCommand, RuntimeGlobalArgs, SessionCommandArgs, TaskCommandArgs, }, io::{ - command_report, generated_session, output_json, pandacode_dir, parse_json_or_null, - run_capture, structured_log_tail, tail, workspace, write_prompt_file, + BUILTIN_PROMPTS, generated_session, output_json, pandacode_dir, sanitize_name, workspace, + write_prompt_file, }, - session::{self, SessionRecord, require_run_id}, + session::{self, SessionRecord}, +}; + +use super::codex_appserver::{ + self, AppServerClient, kill_process_group, notification_method, server_request_id, }; -use serde_json::Value; const RUNTIME: &str = "codex"; const DEFAULT_MODEL: &str = "gpt-5.5"; const DEFAULT_EFFORT: &str = "xhigh"; - -#[derive(Debug, Clone)] -struct CodexControl { - log_dir: PathBuf, - session_socket: PathBuf, -} +const DEFAULT_TIMEOUT_MS: u64 = 1_200_000; +const CALL_TIMEOUT: Duration = Duration::from_secs(60); +const POLL_SLICE: Duration = Duration::from_millis(300); pub fn run(command: RuntimeCommand) -> Result<()> { match command { @@ -48,6 +56,16 @@ pub fn run(command: RuntimeCommand) -> Result<()> { } } +#[derive(Default)] +struct TurnOutcome { + state: String, + agent_messages: Vec<String>, + usage: Option<Value>, + questions: Vec<Value>, + completed: Option<Value>, + errors: Vec<Value>, +} + fn exec(args: TaskCommandArgs) -> Result<()> { let root = workspace(&args.cd)?; let task = crate::io::read_task( @@ -56,135 +74,99 @@ fn exec(args: TaskCommandArgs) -> Result<()> { args.stdin.as_deref(), Some(&root), )?; + let task = crate::io::apply_prompt_parts(&task, &args.prompt_append, Some(&root))?; let session_name = if args.session == "latest" { generated_session(RUNTIME) } else { - crate::io::sanitize_name(&args.session, RUNTIME) + sanitize_name(&args.session, RUNTIME) }; let model = 
effective_model(args.model.as_deref(), None); let effort = effective_effort(args.effort, None); let permission = effective_permission(args.permission, None); - let prompt_file = write_prompt_file(&root, RUNTIME, &session_name, &task)?; - let dispatch_prompt = crate::io::dispatch_task_for_transport(&task, &prompt_file); - let dispatch_prompt_file = if let Some(dispatch_prompt) = dispatch_prompt.as_deref() { - write_prompt_file( + if args.detach { + return spawn_detached_worker( + "exec", &root, - RUNTIME, - &format!("{}-dispatch", session_name), - dispatch_prompt, - )? - } else { - prompt_file.clone() - }; - let (control, command, output, transport_retries) = run_codex_start_with_retry( - &root, - &session_name, - &args, - &dispatch_prompt_file, - &model, - &effort, - )?; - let raw = parse_json_or_null(&output.stdout); - let mut record = SessionRecord::new(RUNTIME, &session_name, "codexctl-session", &root); - update_record_ids(&mut record, raw.as_ref()); + &session_name, + &task, + &args, + &model, + &effort, + permission, + ); + } + let prompt_file = write_prompt_file(&root, RUNTIME, &session_name, &task)?; + let log_path = appserver_log_path(&root, &session_name); + + let mut record = SessionRecord::new(RUNTIME, &session_name, "codex-appserver", &root); record.model = Some(model.clone()); record.effort = Some(effort.clone()); record.permission = Some(permission.as_value().to_string()); record.artifacts = json!({ "prompt_file": prompt_file, - "dispatch_prompt_file": dispatch_prompt_file, - "transport": if dispatch_prompt.is_some() { "file_reference" } else { "direct" }, - "transport_retries": transport_retries, - "log_dir": control.log_dir.to_string_lossy().to_string(), - "session_socket": control.session_socket.to_string_lossy().to_string() + "log_path": log_path, + "status": "starting", + "runner_pid": std::process::id(), + "detached": crate::io::detached_worker(), + "expected_artifacts": args.expect_artifact, }); - let start_report = command_summary( - 
output.ok, - "start", - Some(&record.session), - &command, - &output, - raw.as_ref(), - ); - let (ok, state, final_summary, execute_report) = if output.ok { - if let Some(run_id) = record.run_id.as_deref() { - if is_needs_input(raw.as_ref()) { - set_pending_stage(&mut record, Some("start")); - ( - false, - codex_state(raw.as_ref()), - codex_output_summary(raw.as_ref()), - json!(null), - ) - } else { - let execute_command = codex_execute_command( - &args.bins, - run_id, - &dispatch_prompt_file, - &control, - &args, - &model, - &effort, - ); - let execute_output = run_capture(&execute_command, Some(&root))?; - let execute_raw = parse_json_or_null(&execute_output.stdout); - update_record_ids(&mut record, execute_raw.as_ref()); - if is_needs_input(execute_raw.as_ref()) { - set_pending_stage(&mut record, Some("execute")); - } else { - set_pending_stage(&mut record, None); - } - let summary = codex_output_summary(execute_raw.as_ref().or(raw.as_ref())); - ( - execute_output.ok && !is_needs_input(execute_raw.as_ref()), - codex_state(execute_raw.as_ref().or(raw.as_ref())), - summary, - command_summary( - execute_output.ok, - "execute", - Some(&record.session), - &execute_command, - &execute_output, - execute_raw.as_ref(), - ), - ) - } - } else { - ( - false, - "failed".to_string(), - codex_output_summary(raw.as_ref()), - json!({ - "ok": false, - "runtime": RUNTIME, - "action": "execute", - "session": record.session, - "error": "codexctl session start did not return run_id" - }), - ) - } - } else { - ( - false, - codex_state(raw.as_ref()), - codex_output_summary(raw.as_ref()), - json!(null), - ) - }; session::save(&root, &mut record)?; - let mut report = json!({ - "ok": ok, - "runtime": RUNTIME, - "action": "exec", - "session": record.session, - "state": state, - "start": start_report, - "execute": execute_report, - "summary": final_summary - }); - report["record"] = serde_json::to_value(record)?; - output_json(&report) + let mut client = + 
codex_appserver::spawn_initialized(&args.bins, &root, Some(log_path.clone()))?; + let start = client.call( + "thread/start", + json!({ + "cwd": root.to_string_lossy(), + "approvalPolicy": approval_policy(permission), + "sandbox": sandbox_policy(permission), + "experimentalRawEvents": false, + "persistExtendedHistory": true, + "model": model, + }), + CALL_TIMEOUT, + )?; + record.thread_id = start + .pointer("/result/thread/id") + .and_then(Value::as_str) + .map(ToString::to_string); + record.thread_path = start + .pointer("/result/thread/path") + .and_then(Value::as_str) + .map(ToString::to_string); + let thread_model = start + .pointer("/result/model") + .and_then(Value::as_str) + .map(ToString::to_string) + .unwrap_or_else(|| model.clone()); + let thread_id = record + .thread_id + .clone() + .context("thread/start response missing result.thread.id")?; + if let Some(objective) = &args.objective { + let goal = client.call( + "thread/goal/set", + json!({ "threadId": thread_id, "objective": objective, "status": "active" }), + CALL_TIMEOUT, + )?; + record.artifacts["objective"] = json!(objective); + record.artifacts["goal"] = goal["result"].clone(); + } + record.artifacts["status"] = json!("running"); + session::save(&root, &mut record)?; + + let outcome = run_turn( + &mut client, + &root, + &mut record, + &thread_id, + &task, + &thread_model, + &effort, + permission, + args.timeout_ms.unwrap_or(DEFAULT_TIMEOUT_MS), + )?; + finish_turn(&root, &mut record, &outcome, "exec") } fn resume(args: TaskCommandArgs) -> Result<()> { @@ -195,274 +177,713 @@ fn resume(args: TaskCommandArgs) -> Result<()> { args.stdin.as_deref(), Some(&root), )?; + let task = crate::io::apply_prompt_parts(&task, &args.prompt_append, Some(&root))?; let mut record = session::load(&root, RUNTIME, &args.session)?; let model = effective_model(args.model.as_deref(), record.model.as_deref()); let effort = effective_effort(args.effort, record.effort.as_deref()); let permission = 
effective_permission(args.permission, record.permission.as_deref()); - if record.run_id.is_some() - && args.permission.is_some() - && permission != PermissionMode::from_record(record.permission.as_deref()) - { - bail!( - "Codex permission is established when the run starts; start a new session to switch permission" - ); - } - let prompt_file = write_prompt_file(&root, RUNTIME, &record.session, &task)?; - let dispatch_prompt = crate::io::dispatch_task_for_transport(&task, &prompt_file); - let dispatch_prompt_file = if let Some(dispatch_prompt) = dispatch_prompt.as_deref() { - write_prompt_file( + if args.detach { + return spawn_detached_worker( + "resume", &root, - RUNTIME, - &format!("{}-dispatch", record.session), - dispatch_prompt, - )? - } else { - prompt_file.clone() - }; - let control = record_codex_control(&root, &record)?; - let run_id = if let Some(run_id) = &record.run_id { - run_id.clone() - } else if let Some(thread_id) = &record.thread_id { - let resume = codex_resume_command( - &args.bins, thread_id, &root, &control, &model, &effort, permission, - ); - let output = run_capture(&resume, Some(&root))?; - let raw = parse_json_or_null(&output.stdout); - update_record_ids(&mut record, raw.as_ref()); - require_run_id(&record)? 
- } else { - bail!( - "codex session {} has neither run_id nor thread_id", - record.session + &record.session.clone(), + &task, + &args, + &model, + &effort, + permission, ); - }; - let command = codex_send_command( - &args.bins, - &run_id, - &dispatch_prompt_file, - &control, - &args, - &model, - &effort, - ); - let output = run_capture(&command, Some(&root))?; - let raw = parse_json_or_null(&output.stdout); - update_record_ids(&mut record, raw.as_ref()); + } record.model = Some(model.clone()); record.effort = Some(effort.clone()); record.permission = Some(permission.as_value().to_string()); - record.artifacts["last_prompt_file"] = json!(prompt_file); - record.artifacts["last_dispatch_prompt_file"] = json!(dispatch_prompt_file); - record.artifacts["last_transport"] = json!(if dispatch_prompt.is_some() { - "file_reference" - } else { - "direct" + if !args.expect_artifact.is_empty() { + record.artifacts["expected_artifacts"] = json!(args.expect_artifact); + } + let outcome = resume_turn(&root, &mut record, &args.bins, &task, args.timeout_ms)?; + finish_turn(&root, &mut record, &outcome, "resume") +} + +#[allow(clippy::too_many_arguments)] +fn spawn_detached_worker( + action: &str, + root: &Path, + session: &str, + task: &str, + args: &TaskCommandArgs, + model: &str, + effort: &str, + permission: PermissionMode, +) -> Result<()> { + let task_file = write_prompt_file(root, RUNTIME, &format!("{session}-detach"), task)?; + // Pre-write a starting record so `pandacode wait`/status see the lane + // immediately, before the background worker has spawned its app-server. 
+ let mut pre = SessionRecord::new(RUNTIME, session, "codex-appserver", root); + pre.model = Some(model.to_string()); + pre.effort = Some(effort.to_string()); + pre.permission = Some(permission.as_value().to_string()); + pre.artifacts = json!({ + "status": "starting", + "detached": true, + "expected_artifacts": args.expect_artifact, }); - let send_report = command_summary( - output.ok, - "send", - Some(&record.session), - &command, - &output, - raw.as_ref(), - ); - let (ok, state, final_summary, execute_report) = if output.ok { - if is_needs_input(raw.as_ref()) { - set_pending_stage(&mut record, Some("send")); - ( - false, - codex_state(raw.as_ref()), - codex_output_summary(raw.as_ref()), - json!(null), - ) - } else { - let execute_command = codex_execute_command( - &args.bins, - &run_id, - &dispatch_prompt_file, - &control, - &args, - &model, - &effort, - ); - let execute_output = run_capture(&execute_command, Some(&root))?; - let execute_raw = parse_json_or_null(&execute_output.stdout); - update_record_ids(&mut record, execute_raw.as_ref()); - if is_needs_input(execute_raw.as_ref()) { - set_pending_stage(&mut record, Some("execute")); - } else { - set_pending_stage(&mut record, None); - } - let summary = codex_output_summary(execute_raw.as_ref().or(raw.as_ref())); - ( - execute_output.ok && !is_needs_input(execute_raw.as_ref()), - codex_state(execute_raw.as_ref().or(raw.as_ref())), - summary, - command_summary( - execute_output.ok, - "execute", - Some(&record.session), - &execute_command, - &execute_output, - execute_raw.as_ref(), - ), - ) - } - } else { - ( - false, - codex_state(raw.as_ref()), - codex_output_summary(raw.as_ref()), - json!(null), - ) - }; - session::save(&root, &mut record)?; - let mut report = json!({ - "ok": ok, + session::save(root, &mut pre)?; + let out_dir = pandacode_dir(root).join(RUNTIME).join("detached"); + std::fs::create_dir_all(&out_dir)?; + let result_file = out_dir.join(format!("{session}.json")); + let stdout = 
std::fs::File::create(&result_file)?; + let stderr = stdout.try_clone()?; + let exe = std::env::current_exe()?; + let mut command = std::process::Command::new(exe); + command + .arg("codex") + .arg(action) + .args(["--session", session]) + .args(["--task-file", &task_file.to_string_lossy()]) + .args(["--cd", &root.to_string_lossy()]) + .args(["--model", model]) + .args(["--effort", effort]) + .args(["--permission", permission.as_value()]) + .args(["--codex-bin", &args.bins.codex_bin]) + .args(["--claude-bin", &args.bins.claude_bin]) + .args(["--tmux-bin", &args.bins.tmux_bin]) + .args(["--log-mode", &args.bins.log_mode]) + .arg("--json") + .env(crate::io::DETACHED_ENV, "1") + .stdin(std::process::Stdio::null()) + .stdout(stdout) + .stderr(stderr); + if let Some(timeout) = args.timeout_ms { + command.args(["--timeout-ms", &timeout.to_string()]); + } + if let Some(objective) = &args.objective { + command.args(["--objective", objective]); + } + for artifact in &args.expect_artifact { + command.args(["--expect-artifact", &artifact.to_string_lossy()]); + } + if let Some(auth_home) = &args.bins.auth_home { + command.args(["--auth-home", &auth_home.to_string_lossy()]); + } + if let Some(codex_home) = &args.bins.codex_home { + command.args(["--codex-home", &codex_home.to_string_lossy()]); + } + #[cfg(unix)] + { + use std::os::unix::process::CommandExt; + command.process_group(0); + } + let child = command.spawn().context("spawn detached codex worker")?; + output_json(&json!({ + "ok": true, + "state": "running", "runtime": RUNTIME, - "action": "resume", - "session": record.session, - "state": state, - "send": send_report, - "execute": execute_report, - "summary": final_summary - }); - report["record"] = serde_json::to_value(record)?; - output_json(&report) + "action": action, + "session": session, + "detached": true, + "worker_pid": child.id(), + "result_file": result_file, + "note": "turn runs in a detached worker; poll `pandacode codex status`, continue with `answer`, 
abort with `interrupt`, end with `stop`", + })) } fn answer(args: AnswerCommandArgs) -> Result<()> { let root = workspace(&args.cd)?; - let mut record = session::load(&root, RUNTIME, &args.session)?; - let run_id = require_run_id(&record)?; - let control = record_codex_control(&root, &record)?; - let pending_stage = pending_stage(&record); - let command = codex_answer_command(&args.bins, &run_id, &root, &control, &args)?; - let output = run_capture(&command, Some(&root))?; - let raw = parse_json_or_null(&output.stdout); - update_record_ids(&mut record, raw.as_ref()); - let answer_report = command_summary( - output.ok, - "answer", - Some(&record.session), - &command, - &output, - raw.as_ref(), + let record = session::load(&root, RUNTIME, &args.session)?; + let pending = record.artifacts["pending_questions"] + .as_array() + .cloned() + .unwrap_or_default(); + let live_wait = record.artifacts["status"] == "waiting_for_user" + && record.artifacts["answer_file"].is_string() + && pid_alive(record.artifacts["runner_pid"].as_u64()); + if live_wait + && let Ok(payload) = + build_structured_answer(&pending, args.choice, args.text.as_deref()) + { + return structured_answer(&root, record, payload, args.timeout_ms); + } + // Fallback: continue the thread with the answer as a fresh turn. 
+ let mut record = record; + let answer_text = match (&args.text, args.choice) { + (Some(text), None) => text.clone(), + (None, Some(choice)) => choice_answer_text(&record, choice)?, + _ => { + output_json(&json!({ + "ok": false, + "runtime": RUNTIME, + "action": "answer", + "error": "pass exactly one answer source: --choice N or --text TEXT" + }))?; + return Ok(()); + } + }; + let pending_value = record.artifacts["pending_questions"].clone(); + let task = if pending_value.is_null() { + answer_text + } else { + format!("User answer to the pending question(s) {pending_value}:\n{answer_text}") + }; + record.artifacts["pending_questions"] = Value::Null; + let outcome = resume_turn(&root, &mut record, &args.bins, &task, args.timeout_ms)?; + finish_turn_with(&root, &mut record, &outcome, "answer", json!("resume_turn")) +} + +fn structured_answer( + root: &Path, + record: SessionRecord, + payload: Value, + timeout_ms: Option, +) -> Result<()> { + let answer_file = PathBuf::from( + record.artifacts["answer_file"] + .as_str() + .context("answer_file missing")?, ); + let original_questions = record.artifacts["pending_questions"].clone(); + if let Some(parent) = answer_file.parent() { + std::fs::create_dir_all(parent)?; + } + let tmp = answer_file.with_extension("tmp"); + std::fs::write(&tmp, serde_json::to_string(&payload)?)?; + std::fs::rename(&tmp, &answer_file)?; - let mut execute_report = json!(null); - let mut final_summary = codex_output_summary(raw.as_ref()); - let mut ok = output.ok; - let mut state = codex_state(raw.as_ref()); + let deadline = Instant::now() + Duration::from_millis(timeout_ms.unwrap_or(DEFAULT_TIMEOUT_MS)); + let session_name = record.session.clone(); + loop { + std::thread::sleep(POLL_SLICE); + let current = session::load(root, RUNTIME, &session_name)?; + let status = current.artifacts["status"].as_str().unwrap_or(""); + let questions_changed = current.artifacts["pending_questions"] != original_questions; + let settled = match status { + "running" 
=> false, + "waiting_for_user" => questions_changed, + _ => true, + }; + if settled || Instant::now() >= deadline { + let ok = status == "completed"; + return output_json(&json!({ + "ok": ok, + "state": current.artifacts["status"], + "runtime": RUNTIME, + "action": "answer", + "answer_mode": "structured", + "session": current.session, + "summary": { + "last_agent_message": current.artifacts["last_agent_message"], + "model": current.model, + "effort": current.effort, + "usage": current.artifacts["usage"], + }, + "pending_user_input": current.artifacts["pending_questions"], + "record": current, + })); + } + } +} - if output.ok && args.wait && is_needs_input(raw.as_ref()) { - set_pending_stage(&mut record, pending_stage.as_deref()); - ok = false; - } else if output.ok - && args.wait - && is_completed(raw.as_ref()) - && matches!(pending_stage.as_deref(), Some("start" | "send")) - { - let model = effective_model(None, record.model.as_deref()); - let effort = effective_effort(None, record.effort.as_deref()); - let execute_command = codex_execute_after_answer_command( - &args.bins, &run_id, &control, &args, &model, &effort, - ); - let execute_output = run_capture(&execute_command, Some(&root))?; - let execute_raw = parse_json_or_null(&execute_output.stdout); - update_record_ids(&mut record, execute_raw.as_ref()); - if is_needs_input(execute_raw.as_ref()) { - set_pending_stage(&mut record, Some("execute")); +fn build_structured_answer( + questions: &[Value], + choice: Option, + text: Option<&str>, +) -> Result { + if questions.is_empty() { + bail!("no pending questions recorded"); + } + let mut answers = serde_json::Map::new(); + for question in questions { + let id = question + .get("id") + .and_then(Value::as_str) + .context("pending question has no id; falling back to a resume turn")?; + let value = if let Some(text) = text { + text.to_string() } else { - set_pending_stage(&mut record, None); + let choice = choice.context("pass --choice N or --text TEXT")?; + if choice 
== 0 { + bail!("--choice is 1-based"); + } + question + .get("options") + .and_then(Value::as_array) + .and_then(|options| options.get(choice - 1)) + .and_then(option_label) + .with_context(|| format!("choice {choice} is out of range for question {id}"))? + }; + answers.insert(id.to_string(), json!({ "answers": [value] })); + } + Ok(json!({ "answers": answers })) +} + +fn option_label(option: &Value) -> Option { + option + .as_str() + .map(ToString::to_string) + .or_else(|| { + option + .get("label") + .or_else(|| option.get("text")) + .and_then(Value::as_str) + .map(ToString::to_string) + }) +} + +fn resume_turn( + root: &Path, + record: &mut SessionRecord, + bins: &RuntimeBins, + task: &str, + timeout_ms: Option, +) -> Result { + let thread_id = record + .thread_id + .clone() + .context("session has no Codex thread id; run `pandacode codex exec` first")?; + let model = record + .model + .clone() + .unwrap_or_else(|| DEFAULT_MODEL.to_string()); + let effort = record + .effort + .clone() + .unwrap_or_else(|| DEFAULT_EFFORT.to_string()); + let permission = PermissionMode::from_record(record.permission.as_deref()); + let log_path = appserver_log_path(root, &record.session); + let prompt_file = write_prompt_file(root, RUNTIME, &record.session, task)?; + record.artifacts["last_prompt_file"] = json!(prompt_file); + record.artifacts["runner_pid"] = json!(std::process::id()); + record.artifacts["detached"] = json!(crate::io::detached_worker()); + + let mut client = + codex_appserver::spawn_initialized(bins, root, Some(log_path.clone()))?; + let resumed = client.call( + "thread/resume", + json!({ "threadId": thread_id, "model": model }), + CALL_TIMEOUT, + )?; + if let Some(path) = resumed + .pointer("/result/thread/path") + .and_then(Value::as_str) + { + record.thread_path = Some(path.to_string()); + } + let thread_model = resumed + .pointer("/result/model") + .and_then(Value::as_str) + .map(ToString::to_string) + .unwrap_or(model); + record.artifacts["status"] = 
json!("running"); + session::save(root, record)?; + run_turn( + &mut client, + root, + record, + &thread_id, + task, + &thread_model, + &effort, + permission, + timeout_ms.unwrap_or(DEFAULT_TIMEOUT_MS), + ) +} + +#[allow(clippy::too_many_arguments)] +fn run_turn( + client: &mut AppServerClient, + root: &Path, + record: &mut SessionRecord, + thread_id: &str, + prompt: &str, + model: &str, + effort: &str, + permission: PermissionMode, + timeout_ms: u64, +) -> Result { + let request_id = client.send_request( + "turn/start", + json!({ + "threadId": thread_id, + "input": [{"type": "text", "text": prompt, "text_elements": []}], + "approvalPolicy": approval_policy(permission), + "collaborationMode": { + "mode": "default", + "settings": { + "model": model, + "developer_instructions": null, + "reasoning_effort": effort, + }, + }, + }), + )?; + let answer_wait = crate::io::detached_worker(); + let deadline = Instant::now() + Duration::from_millis(timeout_ms); + let mut outcome = TurnOutcome { + state: "running".to_string(), + ..TurnOutcome::default() + }; + loop { + let message = match client.recv_maybe(POLL_SLICE) { + Ok(Some(message)) => message, + Ok(None) => { + if take_interrupt(root, &record.session) { + interrupt_active_turn(client, record, thread_id); + outcome.state = "interrupted".to_string(); + client.kill(); + return Ok(outcome); + } + if Instant::now() >= deadline { + client.kill(); + outcome.state = "timeout".to_string(); + return Ok(outcome); + } + continue; + } + Err(error) => { + client.kill(); + outcome.state = "failed".to_string(); + outcome.errors.push(json!({ "error": error.to_string() })); + return Ok(outcome); + } + }; + if message.get("method").is_none() { + if message.get("id").and_then(Value::as_u64) == Some(request_id) { + if let Some(error) = message.get("error") { + outcome.state = "failed".to_string(); + outcome.errors.push(json!({ "error": error })); + return Ok(outcome); + } + if let Some(turn_id) = 
message.pointer("/result/turn/id").and_then(Value::as_str) { + record.artifacts["turn_id"] = json!(turn_id); + } + } + continue; + } + if let Some(id) = server_request_id(&message) { + let method = message["method"].as_str().unwrap_or(""); + if method == "item/tool/requestUserInput" { + outcome.questions = message + .pointer("/params/questions") + .and_then(Value::as_array) + .cloned() + .unwrap_or_default(); + if answer_wait { + match wait_for_structured_answer(client, root, record, &outcome.questions, deadline)? { + AnswerWait::Answered(payload) => { + client.send_response(id, payload)?; + record.artifacts["pending_questions"] = Value::Null; + record.artifacts["answer_file"] = Value::Null; + record.artifacts["status"] = json!("running"); + session::save(root, record)?; + outcome.questions.clear(); + continue; + } + AnswerWait::Interrupted => { + interrupt_active_turn(client, record, thread_id); + outcome.state = "interrupted".to_string(); + client.kill(); + return Ok(outcome); + } + AnswerWait::TimedOut => { + client.kill(); + outcome.state = "timeout".to_string(); + return Ok(outcome); + } + } + } + outcome.state = "waiting_for_user".to_string(); + client.kill(); + return Ok(outcome); + } + let _ = client.send_response(id, json!({})); + continue; } - ok = execute_output.ok && !is_needs_input(execute_raw.as_ref()); - state = codex_state(execute_raw.as_ref().or(raw.as_ref())); - final_summary = codex_output_summary(execute_raw.as_ref().or(raw.as_ref())); - execute_report = command_summary( - execute_output.ok, - "execute", - Some(&record.session), - &execute_command, - &execute_output, - execute_raw.as_ref(), + match notification_method(&message).unwrap_or("") { + "turn/started" => { + if let Some(turn_id) = message.pointer("/params/turn/id").and_then(Value::as_str) { + record.artifacts["turn_id"] = json!(turn_id); + } + } + "item/completed" => { + let item = &message["params"]["item"]; + let item_type = item.get("type").and_then(Value::as_str).unwrap_or(""); + if 
matches!(item_type, "agentMessage" | "agent_message") + && let Some(text) = item.get("text").and_then(Value::as_str) + { + outcome.agent_messages.push(text.to_string()); + record.artifacts["last_agent_message"] = json!(text); + let _ = session::save(root, record); + } + } + "thread/tokenUsage/updated" => { + outcome.usage = Some(message["params"].clone()); + record.artifacts["usage"] = message["params"].clone(); + let _ = session::save(root, record); + } + "turn/completed" => { + outcome.state = "completed".to_string(); + outcome.completed = Some(message["params"].clone()); + return Ok(outcome); + } + "error" => { + outcome.state = "failed".to_string(); + outcome.errors.push(message.clone()); + return Ok(outcome); + } + _ => {} + } + } +} + +enum AnswerWait { + Answered(Value), + Interrupted, + TimedOut, +} + +fn wait_for_structured_answer( + _client: &mut AppServerClient, + root: &Path, + record: &mut SessionRecord, + questions: &[Value], + deadline: Instant, +) -> Result<AnswerWait> { + let answer_file = answer_file_path(root, &record.session); + if let Some(parent) = answer_file.parent() { + std::fs::create_dir_all(parent)?; + } + let _ = std::fs::remove_file(&answer_file); + record.artifacts["status"] = json!("waiting_for_user"); + record.artifacts["pending_questions"] = json!(questions); + record.artifacts["answer_file"] = json!(answer_file); + session::save(root, record)?; + loop { + if take_interrupt(root, &record.session) { + return Ok(AnswerWait::Interrupted); + } + if Instant::now() >= deadline { + return Ok(AnswerWait::TimedOut); + } + if answer_file.exists() { + let text = std::fs::read_to_string(&answer_file)?; + let _ = std::fs::remove_file(&answer_file); + if let Ok(payload) = serde_json::from_str::<Value>(&text) { + return Ok(AnswerWait::Answered(payload)); + } + } + std::thread::sleep(POLL_SLICE); + } +} + +fn interrupt_active_turn(client: &mut AppServerClient, record: &SessionRecord, thread_id: &str) { + if let Some(turn_id) = record.artifacts["turn_id"].as_str() { 
+ let _ = client.call( + "turn/interrupt", + json!({ "threadId": thread_id, "turnId": turn_id }), + Duration::from_secs(10), ); - } else if output.ok && !args.wait { - state = "running".to_string(); - set_pending_stage(&mut record, pending_stage.as_deref()); - } else if output.ok { - set_pending_stage(&mut record, None); } +} - session::save(&root, &mut record)?; +fn finish_turn( + root: &Path, + record: &mut SessionRecord, + outcome: &TurnOutcome, + action: &str, +) -> Result<()> { + finish_turn_with(root, record, outcome, action, Value::Null) +} + +fn finish_turn_with( + root: &Path, + record: &mut SessionRecord, + outcome: &TurnOutcome, + action: &str, + answer_mode: Value, +) -> Result<()> { + let last_agent_message = outcome + .agent_messages + .last() + .cloned() + .or_else(|| record.artifacts["last_agent_message"].as_str().map(ToString::to_string)); + let missing = crate::io::missing_artifacts(root, &record.artifacts["expected_artifacts"]); + let state = if outcome.state == "completed" && !missing.is_empty() { + "no_report".to_string() + } else { + outcome.state.clone() + }; + record.artifacts["missing_artifacts"] = json!(missing); + record.artifacts["status"] = json!(state); + record.artifacts["last_agent_message"] = json!(last_agent_message); + if let Some(usage) = &outcome.usage { + record.artifacts["usage"] = usage.clone(); + } + record.artifacts["pending_questions"] = if outcome.state == "waiting_for_user" { + json!(outcome.questions) + } else { + Value::Null + }; + record.artifacts["answer_file"] = Value::Null; + let _ = std::fs::remove_file(answer_file_path(root, &record.session)); + let _ = std::fs::remove_file(interrupt_file_path(root, &record.session)); + session::save(root, record)?; + let ok = state == "completed"; + let pending_user_input = if state == "waiting_for_user" { + json!({ "questions": outcome.questions, "source": "requestUserInput" }) + } else { + Value::Null + }; let mut report = json!({ "ok": ok, + "state": state, "runtime": 
RUNTIME, - "action": "answer", + "action": action, "session": record.session, - "state": state, - "answer": answer_report, - "execute": execute_report, - "summary": final_summary + "summary": { + "last_agent_message": last_agent_message, + "agent_messages": outcome.agent_messages, + "model": record.model, + "effort": record.effort, + "usage": outcome.usage, + "completed": outcome.completed, + "errors": outcome.errors, + }, + "pending_user_input": pending_user_input, + "artifacts": record.artifacts, + "record": record, }); - report["record"] = serde_json::to_value(record)?; + if !answer_mode.is_null() { + report["answer_mode"] = answer_mode; + } output_json(&report) } +fn choice_answer_text(record: &SessionRecord, choice: usize) -> Result<String> { + if choice == 0 { + bail!("--choice is 1-based"); + } + let questions = record.artifacts["pending_questions"] + .as_array() + .cloned() + .unwrap_or_default(); + for question in &questions { + if let Some(options) = question.get("options").and_then(Value::as_array) + && let Some(option) = options.get(choice - 1) + && let Some(text) = option_label(option) + { + return Ok(text); + } + } + Ok(format!("Option {choice}")) +} + fn status(args: SessionCommandArgs) -> Result<()> { let root = workspace(&args.cd)?; let record = session::load(&root, RUNTIME, &args.session)?; - let run_id = require_run_id(&record)?; - let control = record_codex_control(&root, &record)?; - let command = codex_read_command(&args.bins, &run_id, &control, false); - let output = run_capture(&command, Some(&root))?; - let raw = parse_json_or_null(&output.stdout); + let state = record.artifacts["status"].as_str().unwrap_or("unknown"); output_json(&json!({ - "ok": output.ok, + "ok": state == "completed" || state == "idle", + "state": state, "runtime": RUNTIME, "action": "status", "session": record.session, - "state": codex_state(raw.as_ref()), - "summary": codex_output_summary(raw.as_ref()), - "output_tail": structured_log_tail(&output.stdout, 80), - 
"command": 
command_summary(output.ok, "status", Some(&record.session), &command, &output, raw.as_ref()), - "record": record + "summary": { + "last_agent_message": record.artifacts["last_agent_message"], + "model": record.model, + "effort": record.effort, + "usage": record.artifacts["usage"], + }, + "pending_user_input": record.artifacts["pending_questions"], + "artifacts": record.artifacts, + "record": record, })) } fn logs(args: LogsCommandArgs) -> Result<()> { let root = workspace(&args.cd)?; let record = session::load(&root, RUNTIME, &args.session)?; - let run_id = require_run_id(&record)?; - let control = record_codex_control(&root, &record)?; - let command = codex_read_command(&args.bins, &run_id, &control, true); - let output = run_capture(&command, Some(&root))?; + if args.visible { + let thread_id = record + .thread_id + .clone() + .context("session has no Codex thread id yet")?; + let mut client = codex_appserver::spawn_initialized(&args.bins, &root, None)?; + let response = client.call( + "thread/read", + json!({ "threadId": thread_id, "includeTurns": true }), + CALL_TIMEOUT, + )?; + client.kill(); + let thread = &response["result"]["thread"]; + let mut messages = Vec::new(); + for turn in thread + .get("turns") + .and_then(Value::as_array) + .into_iter() + .flatten() + { + let turn_id = turn.get("id").and_then(Value::as_str).unwrap_or(""); + for item in turn + .get("items") + .and_then(Value::as_array) + .into_iter() + .flatten() + { + let item_type = item.get("type").and_then(Value::as_str).unwrap_or(""); + let text = match item_type { + "userMessage" => item + .pointer("/content/0/text") + .and_then(Value::as_str), + "agentMessage" => item.get("text").and_then(Value::as_str), + _ => None, + }; + if let Some(text) = text { + messages.push(json!({ + "turn_id": turn_id, + "type": item_type, + "text": text, + })); + } + } + } + return output_json(&json!({ + "ok": true, + "runtime": RUNTIME, + "action": "logs", + "session": record.session, + "view": "thread", + 
"thread_id": thread.get("id"), + "turn_count": thread.get("turns").and_then(Value::as_array).map(Vec::len), + "messages": messages, + })); + } + let log_path = appserver_log_path(&root, &record.session); + let text = std::fs::read_to_string(&log_path).unwrap_or_default(); + let lines = text.lines().collect::>(); + let start = lines.len().saturating_sub(args.tail); + let log_tail = lines[start..].join("\n"); if args.json { - let raw = parse_json_or_null(&output.stdout); output_json(&json!({ - "ok": output.ok, + "ok": true, "runtime": RUNTIME, "action": "logs", "session": record.session, - "tail": args.tail, - "state": codex_state(raw.as_ref()), - "summary": codex_output_summary(raw.as_ref()), - "output_tail": structured_log_tail(&output.stdout, args.tail), - "command": command_summary(output.ok, "logs", Some(&record.session), &command, &output, raw.as_ref()), - "record": record + "state": record.artifacts["status"], + "log_path": log_path, + "log_tail": log_tail, + "summary": { + "last_agent_message": record.artifacts["last_agent_message"], + }, })) } else { - println!("{}", tail(&output.stdout, args.tail)); - if !output.stderr.trim().is_empty() { - eprintln!("{}", tail(&output.stderr, args.tail)); - } + println!("{log_tail}"); Ok(()) } } fn artifacts(args: SessionCommandArgs) -> Result<()> { let root = workspace(&args.cd)?; - output_json(&session::artifacts(&root, RUNTIME, &args.session)?) + let session_name = session::resolve_session(&root, RUNTIME, &args.session)?; + output_json(&session::artifacts(&root, RUNTIME, &session_name)?) 
} fn model(args: ModelCommandArgs) -> Result<()> { @@ -475,81 +896,159 @@ fn model(args: ModelCommandArgs) -> Result<()> { "ok": true, "runtime": RUNTIME, "action": "model", - "note": "model and effort are applied on the next codex resume/send turn", + "note": "model and effort apply on the next exec/resume/answer turn", "record": record })) } -fn models(args: RuntimeGlobalArgs) -> Result<()> { - let root = workspace(&args.cd)?; - output_json(&models_report(&root, &args.bins)?) -} - fn interrupt(args: SessionCommandArgs) -> Result<()> { - session_action(args, "interrupt", codex_interrupt_command) + let root = workspace(&args.cd)?; + let record = session::load(&root, RUNTIME, &args.session)?; + let state = record.artifacts["status"].as_str().unwrap_or("unknown"); + let live = matches!(state, "running" | "waiting_for_user" | "starting") + && pid_alive(record.artifacts["runner_pid"].as_u64()); + if !live { + return output_json(&json!({ + "ok": true, + "runtime": RUNTIME, + "action": "interrupt", + "session": record.session, + "state": state, + "note": "no live turn to interrupt", + })); + } + let interrupt_file = interrupt_file_path(&root, &record.session); + if let Some(parent) = interrupt_file.parent() { + std::fs::create_dir_all(parent)?; + } + std::fs::write(&interrupt_file, b"")?; + let session_name = record.session.clone(); + let deadline = Instant::now() + Duration::from_secs(30); + loop { + std::thread::sleep(POLL_SLICE); + let current = session::load(&root, RUNTIME, &session_name)?; + let state = current.artifacts["status"].as_str().unwrap_or("unknown"); + if !matches!(state, "running" | "waiting_for_user" | "starting") + || Instant::now() >= deadline + { + return output_json(&json!({ + "ok": state == "interrupted", + "runtime": RUNTIME, + "action": "interrupt", + "session": current.session, + "state": state, + })); + } + } } fn stop(args: SessionCommandArgs) -> Result<()> { - session_action(args, "stop", codex_stop_command) + let root = workspace(&args.cd)?; 
+ let mut record = session::load(&root, RUNTIME, &args.session)?; + let runner_pid = record.artifacts["runner_pid"].as_u64(); + let detached = record.artifacts["detached"] == json!(true); + if detached + && let Some(pid) = runner_pid + && pid_alive(Some(pid)) + { + kill_process_group(pid as u32); + std::thread::sleep(Duration::from_millis(200)); + } + codex_appserver::reap_orphans(&root); + record.artifacts["status"] = json!("stopped"); + record.artifacts["pending_questions"] = Value::Null; + session::save(&root, &mut record)?; + output_json(&json!({ + "ok": true, + "runtime": RUNTIME, + "action": "stop", + "session": record.session, + "state": "stopped", + "note": "thread state stays resumable on disk", + })) } fn list(args: RuntimeGlobalArgs) -> Result<()> { let root = workspace(&args.cd)?; - let command = vec![ - args.bins.codexctl_bin.clone(), - "session".to_string(), - "list".to_string(), - "--threads".to_string(), - "--log-dir".to_string(), - codex_legacy_log_dir(&root).to_string_lossy().to_string(), - "--log-mode".to_string(), - args.bins.log_mode.clone(), - ]; - let output = run_capture(&command, Some(&root))?; output_json(&json!({ - "ok": output.ok, + "ok": true, "runtime": RUNTIME, - "local": session::list(&root, RUNTIME)?, - "codexctl": command_report(output.ok, RUNTIME, "list", None, &command, &output, parse_json_or_null(&output.stdout)) + "action": "list", + "sessions": session::list(&root, RUNTIME)?, })) } +fn models(args: RuntimeGlobalArgs) -> Result<()> { + let root = workspace(&args.cd)?; + output_json(&models_report(&root, &args.bins)?) +} + +// Codex app-server protocol surface PandaCode depends on. Surfaced in doctor +// so a codex CLI upgrade that changes the protocol is visible, not silent. +const TESTED_CODEX_VERSION: &str = "0.139.0"; + fn doctor(args: RuntimeGlobalArgs) -> Result<()> { let root = workspace(&args.cd)?; - output_json(&doctor_report(&root, &args.bins)?) 
+ let mut report = doctor_report(&root, &args.bins)?; + let version = super::version_report(&args.bins.codex_bin, &["--version"]); + let installed = version + .get("stdout") + .and_then(Value::as_str) + .map(|s| s.trim().to_string()); + report["codex_version"] = json!({ + "installed": installed, + "tested": TESTED_CODEX_VERSION, + "note": "PandaCode drives the codex app-server JSON-RPC protocol; if a future codex CLI changes thread/turn methods or event names, turns may stall — re-verify after upgrades.", + }); + report["appserver"] = appserver_handshake_report(&root, &args.bins); + output_json(&report) } -pub fn doctor_report(root: &Path, bins: &RuntimeBins) -> Result { - let codexctl = super::version_report(&bins.codexctl_bin, &["--help"]); +fn appserver_handshake_report(root: &Path, bins: &RuntimeBins) -> Value { + match codex_appserver::spawn_initialized(bins, root, None) { + Ok(mut client) => { + let account = client + .call("account/read", json!({}), CALL_TIMEOUT) + .ok() + .and_then(|response| response.pointer("/result/account").cloned()) + .unwrap_or(Value::Null); + let rate_limits = client + .call("account/rateLimits/read", json!({}), CALL_TIMEOUT) + .ok() + .and_then(|response| response.pointer("/result/rateLimits").cloned()) + .unwrap_or(Value::Null); + client.kill(); + json!({ "ok": true, "account": account, "rate_limits": rate_limits }) + } + Err(error) => json!({ "ok": false, "error": error.to_string() }), + } +} + +pub fn doctor_report(root: &Path, bins: &RuntimeBins) -> Result { let codex = super::version_report(&bins.codex_bin, &["--help"]); - let codexctl_ok = codexctl - .get("ok") - .and_then(|v| v.as_bool()) - .unwrap_or(false); - let codex_ok = codex.get("ok").and_then(|v| v.as_bool()).unwrap_or(false); - let missing = [ - (!codexctl_ok).then_some("codexctl"), - (!codex_ok).then_some("codex"), - ] - .into_iter() - .flatten() - .collect::>(); + let codex_ok = codex.get("ok").and_then(Value::as_bool).unwrap_or(false); + let missing = 
[(!codex_ok).then_some("codex")] + .into_iter() + .flatten() + .collect::>(); Ok(json!({ - "ok": codexctl_ok && codex_ok, - "state": if codexctl_ok && codex_ok { "available" } else { "missing_requirements" }, + "ok": codex_ok, + "state": if codex_ok { "available" } else { "missing_requirements" }, "runtime": RUNTIME, "workspace": root, - "driver": "codexctl session", - "requirements": ["codexctl", "codex"], + "driver": "codex app-server", + "requirements": ["codex"], "missing": missing, "capabilities": { "task_execution": true, "resume": true, "answer": true, + "detach": true, "interrupt": true, "stop": true, "model": true, "effort": true, + "objective": true, "permissions_supported": ["max", "limited"], "timeout": true, "token_budget": false, @@ -558,24 +1057,19 @@ pub fn doctor_report(root: &Path, bins: &RuntimeBins) -> Result Result { - let command = vec![bins.codexctl_bin.clone(), "models".to_string()]; - let output = run_capture(&command, Some(root))?; - let mut report = command_report( - output.ok, - RUNTIME, - "models", - None, - &command, - &output, - parse_json_or_null(&output.stdout), - ); - report["capabilities"] = json!({ +pub fn models_report(root: &Path, bins: &RuntimeBins) -> Result { + let builtin_prompts = BUILTIN_PROMPTS + .iter() + .map(|(name, _)| *name) + .collect::>(); + let fallback_models = json!([ + {"id": DEFAULT_MODEL, "supported_reasoning_efforts": ["low", "medium", "high", "xhigh"], "is_default": true} + ]); + let capabilities = json!({ "model": true, "effort": true, "permissions_supported": ["max", "limited"], @@ -586,462 +1080,122 @@ pub fn models_report(root: &Path, bins: &RuntimeBins) -> Result Vec, -) -> Result<()> { - let root = workspace(&args.cd)?; - let record = session::load(&root, RUNTIME, &args.session)?; - let run_id = require_run_id(&record)?; - let control = record_codex_control(&root, &record)?; - let command = build(&args.bins, &run_id, &control); - let output = run_capture(&command, Some(&root))?; - 
output_json(&command_report( - output.ok, - RUNTIME, - action, - Some(&record.session), - &command, - &output, - parse_json_or_null(&output.stdout), - )) -} - -fn run_codex_start_with_retry( - root: &Path, - session_name: &str, - args: &TaskCommandArgs, - prompt_file: &Path, - model: &str, - effort: &str, -) -> Result<(CodexControl, Vec, crate::io::CmdOutput, usize)> { - let mut retries = 0; - loop { - let control_session = if retries == 0 { - session_name.to_string() - } else { - format!("{session_name}-retry-{retries}") - }; - let control = new_codex_control(root, &control_session)?; - let command = - codex_start_command(&args.bins, prompt_file, root, &control, args, model, effort); - let output = run_capture(&command, Some(root))?; - if output.ok || retries >= 2 || !is_codex_transport_error(&output) { - return Ok((control, command, output, retries)); + let mut client = match codex_appserver::spawn_initialized(bins, root, None) { + Ok(client) => client, + Err(error) => { + return Ok(json!({ + "ok": false, + "runtime": RUNTIME, + "action": "models", + "driver": "codex app-server", + "workspace": root, + "models": fallback_models, + "builtin_prompts": builtin_prompts, + "error": error.to_string(), + "note": "app-server handshake failed; reporting PandaCode Codex defaults", + "capabilities": capabilities, + })); } - retries += 1; - } -} - -fn is_codex_transport_error(output: &crate::io::CmdOutput) -> bool { - let text = format!("{}\n{}", output.stderr, output.stdout).to_lowercase(); - [ - "broken pipe", - "socket is not connected", - "connection reset", - "daemon returned invalid json", - "eof while parsing", - ] - .iter() - .any(|needle| text.contains(needle)) -} - -fn command_summary( - ok: bool, - action: &str, - session: Option<&str>, - command: &[String], - output: &crate::io::CmdOutput, - raw: Option<&serde_json::Value>, -) -> serde_json::Value { - json!({ - "ok": ok, - "runtime": RUNTIME, - "action": action, - "session": session, - "command": command, - 
"shell": crate::io::shell_join(command), - "exit_code": output.exit_code, - "stdout_tail": tail_chars(&output.stdout, 1_200), - "stderr_tail": tail_chars(&output.stderr, 1_200), - "summary": codex_output_summary(raw) - }) -} - -fn codex_output_summary(raw: Option<&serde_json::Value>) -> serde_json::Value { - let Some(raw) = raw else { - return json!(null); }; - json!({ - "ok": raw.get("ok").and_then(|value| value.as_bool()), - "status": raw.get("status").and_then(|value| value.as_str()), - "current_phase": raw.get("current_phase").and_then(|value| value.as_str()), - "run_id": string_field(raw, &["run_id", "runId"]), - "thread_id": string_field(raw, &["thread_id", "threadId"]), - "thread_path": string_field(raw, &["thread_path", "threadPath"]), - "turn_id": string_field(raw, &["turn_id", "turnId"]), - "log_path": raw.get("log_path").and_then(|value| value.as_str()), - "last_agent_message": raw.get("last_agent_message").and_then(|value| value.as_str()) - .or_else(|| last_string(raw.get("agent_messages"))), - "counts": raw.get("counts").cloned().unwrap_or_else(|| json!({ - "agent_messages": array_len(raw.get("agent_messages")), - "plans": array_len(raw.get("plans")), - "questions": array_len(raw.get("questions")), - "errors": array_len(raw.get("errors")), - "warnings": array_len(raw.get("warnings")) - })), - "usage": raw.get("usage").cloned(), - "errors": raw.get("errors").and_then(|value| value.as_array()).map(|items| items.len()), - "warnings": raw.get("warnings").and_then(|value| value.as_array()).map(|items| items.len()) - }) -} - -fn codex_state(raw: Option<&Value>) -> String { - match codex_status(raw) { - Some("needs_input") => "waiting_for_user".to_string(), - Some("completed") => "completed".to_string(), - Some("running") => "running".to_string(), - Some("stopped") => "stopped".to_string(), - Some("failed") => "failed".to_string(), - Some(status) => status.to_string(), - None => "unknown".to_string(), - } -} - -fn codex_status(raw: Option<&Value>) -> 
Option<&str> { - raw.and_then(|value| value.get("status")) - .and_then(|value| value.as_str()) -} - -fn is_needs_input(raw: Option<&Value>) -> bool { - codex_status(raw) == Some("needs_input") -} - -fn is_completed(raw: Option<&Value>) -> bool { - codex_status(raw) == Some("completed") -} - -fn update_record_ids(record: &mut SessionRecord, raw: Option<&Value>) { - let Some(raw) = raw else { - return; - }; - if let Some(run_id) = string_field(raw, &["run_id", "runId"]) { - record.run_id = Some(run_id.to_string()); - } - if let Some(thread_id) = string_field(raw, &["thread_id", "threadId"]) { - record.thread_id = Some(thread_id.to_string()); - } - if let Some(thread_path) = string_field(raw, &["thread_path", "threadPath"]) { - record.thread_path = Some(thread_path.to_string()); - } -} - -fn pending_stage(record: &SessionRecord) -> Option { - record - .artifacts - .get("pending_stage") - .and_then(|value| value.as_str()) - .map(ToString::to_string) -} - -fn set_pending_stage(record: &mut SessionRecord, stage: Option<&str>) { - if !record.artifacts.is_object() { - record.artifacts = json!({}); - } - if let Some(object) = record.artifacts.as_object_mut() { - if let Some(stage) = stage { - object.insert("pending_stage".to_string(), json!(stage)); - } else { - object.remove("pending_stage"); + let response = client.call("model/list", json!({}), CALL_TIMEOUT); + client.kill(); + let response = match response { + Ok(response) => response, + Err(error) => { + return Ok(json!({ + "ok": false, + "runtime": RUNTIME, + "action": "models", + "driver": "codex app-server", + "workspace": root, + "models": fallback_models, + "builtin_prompts": builtin_prompts, + "error": error.to_string(), + "note": "model/list failed; reporting PandaCode Codex defaults", + "capabilities": capabilities, + })); } - } -} - -fn first_pending_question_id( - bins: &RuntimeBins, - run_id: &str, - root: &Path, - control: &CodexControl, -) -> Result { - let command = codex_read_command(bins, run_id, 
control, false); - let output = run_capture(&command, Some(root))?; - if !output.ok { - bail!( - "codexctl session read failed before answering: {}", - output.stderr.trim() - ); - } - let raw = parse_json_or_null(&output.stdout); - first_question_id(raw.as_ref()).ok_or_else(|| { - anyhow::anyhow!( - "cannot infer Codex question id for --text; pass --choice N or --text '{{...}}'" - ) - }) + }; + let models = response + .pointer("/result/data") + .and_then(Value::as_array) + .cloned() + .unwrap_or_default() + .into_iter() + .map(|model| { + json!({ + "id": model.get("id").cloned().unwrap_or(Value::Null), + "model": model.get("model").cloned().unwrap_or(Value::Null), + "display_name": model.get("displayName").cloned().unwrap_or(Value::Null), + "is_default": model.get("isDefault").cloned().unwrap_or(Value::Bool(false)), + "default_reasoning_effort": model.get("defaultReasoningEffort").cloned().unwrap_or(Value::Null), + "supported_reasoning_efforts": model.get("supportedReasoningEfforts").cloned().unwrap_or(Value::Array(Vec::new())), + }) + }) + .collect::>(); + Ok(json!({ + "ok": true, + "runtime": RUNTIME, + "action": "models", + "driver": "codex app-server", + "workspace": root, + "models": models, + "builtin_prompts": builtin_prompts, + "capabilities": capabilities, + })) } -fn first_question_id(raw: Option<&Value>) -> Option { - let question = raw? 
- .get("questions") - .and_then(|value| value.as_array()) - .and_then(|questions| questions.first())?; - for key in ["id", "question_id", "questionId", "key", "name"] { - if let Some(value) = question.get(key).and_then(|value| value.as_str()) { - return Some(value.to_string()); - } - } - question - .get("question") - .and_then(|value| value.as_str()) - .map(ToString::to_string) +fn appserver_log_path(root: &Path, session: &str) -> PathBuf { + pandacode_dir(root) + .join(RUNTIME) + .join("logs") + .join(format!("{session}.jsonl")) } -fn string_field<'a>(value: &'a serde_json::Value, names: &[&str]) -> Option<&'a str> { - names - .iter() - .find_map(|name| value.get(*name).and_then(|field| field.as_str())) +fn control_dir(root: &Path) -> PathBuf { + pandacode_dir(root).join(RUNTIME).join("control") } -fn last_string(value: Option<&serde_json::Value>) -> Option<&str> { - value - .and_then(|value| value.as_array()) - .and_then(|items| items.last()) - .and_then(|value| value.as_str()) +fn answer_file_path(root: &Path, session: &str) -> PathBuf { + control_dir(root).join(format!("{session}.answer.json")) } -fn array_len(value: Option<&serde_json::Value>) -> usize { - value - .and_then(|value| value.as_array()) - .map(|items| items.len()) - .unwrap_or(0) +fn interrupt_file_path(root: &Path, session: &str) -> PathBuf { + control_dir(root).join(format!("{session}.interrupt")) } -fn tail_chars(text: &str, max_chars: usize) -> String { - let char_count = text.chars().count(); - if char_count <= max_chars { - return text.to_string(); +fn take_interrupt(root: &Path, session: &str) -> bool { + let path = interrupt_file_path(root, session); + if path.exists() { + let _ = std::fs::remove_file(&path); + true + } else { + false } - let start = char_count.saturating_sub(max_chars); - format!("...{}", text.chars().skip(start).collect::()) -} - -fn codex_start_command( - bins: &RuntimeBins, - prompt_file: &Path, - root: &Path, - control: &CodexControl, - args: &TaskCommandArgs, - 
model: &str, - effort: &str, -) -> Vec { - let mut command = vec![ - bins.codexctl_bin.clone(), - "session".to_string(), - "start".to_string(), - "--prompt-file".to_string(), - prompt_file.to_string_lossy().to_string(), - "--cwd".to_string(), - root.to_string_lossy().to_string(), - "--timeout".to_string(), - timeout_arg(args.timeout_ms), - ]; - push_control(&mut command, control, &bins.log_mode); - push_model_effort(&mut command, model, effort); - push_permission(&mut command, effective_permission(args.permission, None)); - command } -fn codex_send_command( - bins: &RuntimeBins, - run_id: &str, - prompt_file: &Path, - control: &CodexControl, - args: &TaskCommandArgs, - model: &str, - effort: &str, -) -> Vec { - let mut command = vec![ - bins.codexctl_bin.clone(), - "session".to_string(), - "send".to_string(), - "--run-id".to_string(), - run_id.to_string(), - "--prompt-file".to_string(), - prompt_file.to_string_lossy().to_string(), - "--timeout".to_string(), - timeout_arg(args.timeout_ms), - ]; - push_control(&mut command, control, &args.bins.log_mode); - push_model_effort(&mut command, model, effort); - command -} - -fn codex_execute_command( - bins: &RuntimeBins, - run_id: &str, - prompt_file: &Path, - control: &CodexControl, - args: &TaskCommandArgs, - model: &str, - effort: &str, -) -> Vec { - let mut command = vec![ - bins.codexctl_bin.clone(), - "session".to_string(), - "execute".to_string(), - "--run-id".to_string(), - run_id.to_string(), - "--prompt-file".to_string(), - prompt_file.to_string_lossy().to_string(), - "--timeout".to_string(), - timeout_arg(args.timeout_ms), - ]; - push_control(&mut command, control, &args.bins.log_mode); - push_model_effort(&mut command, model, effort); - command -} - -fn codex_resume_command( - bins: &RuntimeBins, - thread_id: &str, - root: &Path, - control: &CodexControl, - model: &str, - effort: &str, - permission: PermissionMode, -) -> Vec { - let mut command = vec![ - bins.codexctl_bin.clone(), - "session".to_string(), - 
"resume".to_string(), - "--thread-id".to_string(), - thread_id.to_string(), - "--cwd".to_string(), - root.to_string_lossy().to_string(), - ]; - push_control(&mut command, control, &bins.log_mode); - push_model_effort(&mut command, model, effort); - push_permission(&mut command, permission); - command +fn pid_alive(pid: Option) -> bool { + let Some(pid) = pid else { + return false; + }; + std::process::Command::new("/bin/kill") + .args(["-0", &pid.to_string()]) + .stdout(std::process::Stdio::null()) + .stderr(std::process::Stdio::null()) + .status() + .map(|status| status.success()) + .unwrap_or(false) } -fn codex_read_command( - bins: &RuntimeBins, - run_id: &str, - control: &CodexControl, - full: bool, -) -> Vec { - let mut command = vec![ - bins.codexctl_bin.clone(), - "session".to_string(), - "read".to_string(), - "--run-id".to_string(), - run_id.to_string(), - ]; - push_control(&mut command, control, &bins.log_mode); - if full { - command.push("--full".to_string()); - } - command +fn approval_policy(_permission: PermissionMode) -> &'static str { + "never" } -fn codex_answer_command( - bins: &RuntimeBins, - run_id: &str, - root: &Path, - control: &CodexControl, - args: &AnswerCommandArgs, -) -> Result> { - let mut command = vec![ - bins.codexctl_bin.clone(), - "session".to_string(), - "answer".to_string(), - "--run-id".to_string(), - run_id.to_string(), - "--timeout".to_string(), - timeout_arg(args.timeout_ms), - ]; - push_control(&mut command, control, &bins.log_mode); - match (args.choice, args.text.as_deref()) { - (Some(choice), None) if choice > 0 => { - command.extend(["--pick".to_string(), choice.to_string()]); - } - (None, Some(text)) => { - if serde_json::from_str::(text) - .map(|value| value.is_object()) - .unwrap_or(false) - { - command.extend(["--answers-json".to_string(), text.to_string()]); - } else { - let question_id = first_pending_question_id(bins, run_id, root, control)?; - command.extend(["--answer".to_string(), 
format!("{question_id}={text}")]); - } - } - _ => bail!("pass exactly one answer source: --choice N or --text TEXT"), - } - if !args.wait { - command.push("--detach".to_string()); +fn sandbox_policy(permission: PermissionMode) -> &'static str { + match permission { + PermissionMode::Max => "danger-full-access", + PermissionMode::Limited => "workspace-write", } - Ok(command) -} - -fn codex_execute_after_answer_command( - bins: &RuntimeBins, - run_id: &str, - control: &CodexControl, - args: &AnswerCommandArgs, - model: &str, - effort: &str, -) -> Vec { - let mut command = vec![ - bins.codexctl_bin.clone(), - "session".to_string(), - "execute".to_string(), - "--run-id".to_string(), - run_id.to_string(), - "--timeout".to_string(), - timeout_arg(args.timeout_ms), - ]; - push_control(&mut command, control, &bins.log_mode); - push_model_effort(&mut command, model, effort); - command -} - -fn codex_interrupt_command( - bins: &RuntimeBins, - run_id: &str, - control: &CodexControl, -) -> Vec { - codex_run_id_command(bins, "interrupt", run_id, control) -} - -fn codex_stop_command(bins: &RuntimeBins, run_id: &str, control: &CodexControl) -> Vec { - codex_run_id_command(bins, "stop", run_id, control) -} - -fn codex_run_id_command( - bins: &RuntimeBins, - action: &str, - run_id: &str, - control: &CodexControl, -) -> Vec { - let mut command = vec![ - bins.codexctl_bin.clone(), - "session".to_string(), - action.to_string(), - "--run-id".to_string(), - run_id.to_string(), - ]; - push_control(&mut command, control, &bins.log_mode); - command -} - -fn push_model_effort(command: &mut Vec, model: &str, effort: &str) { - command.extend(["--model".to_string(), model.to_string()]); - command.extend(["--effort".to_string(), effort.to_string()]); } fn effective_model(explicit: Option<&str>, stored: Option<&str>) -> String { @@ -1060,224 +1214,56 @@ fn effective_permission(explicit: Option, stored: Option<&str>) explicit.unwrap_or_else(|| PermissionMode::from_record(stored)) } -fn 
push_permission(command: &mut Vec, permission: PermissionMode) { - match permission { - PermissionMode::Max => command.push("--dangerously-full-access".to_string()), - PermissionMode::Limited => command.extend([ - "--sandbox".to_string(), - "workspace-write".to_string(), - "--approval-policy".to_string(), - "never".to_string(), - ]), - } -} - -fn timeout_arg(timeout_ms: Option) -> String { - timeout_ms - .map(|value| value.div_ceil(1000).to_string()) - .unwrap_or_else(|| "unlimited".to_string()) -} - -fn push_control(command: &mut Vec, control: &CodexControl, log_mode: &str) { - command.extend([ - "--log-dir".to_string(), - control.log_dir.to_string_lossy().to_string(), - ]); - command.extend(["--log-mode".to_string(), log_mode.to_string()]); - command.extend([ - "--session-socket".to_string(), - control.session_socket.to_string_lossy().to_string(), - ]); -} - -fn new_codex_control(root: &Path, session: &str) -> Result { - let dir = codex_control_dir(root, session); - let log_dir = dir.join("logs"); - fs::create_dir_all(&log_dir).with_context(|| format!("create {}", log_dir.display()))?; - let session_socket = codex_session_socket(root, session); - if let Some(parent) = session_socket.parent() { - fs::create_dir_all(parent).with_context(|| format!("create {}", parent.display()))?; - } - Ok(CodexControl { - log_dir, - session_socket, - }) -} - -fn record_codex_control(root: &Path, record: &SessionRecord) -> Result { - let mut control = new_codex_control(root, &record.session)?; - if let Some(log_dir) = record - .artifacts - .get("log_dir") - .and_then(|value| value.as_str()) - { - control.log_dir = PathBuf::from(log_dir); - fs::create_dir_all(&control.log_dir) - .with_context(|| format!("create {}", control.log_dir.display()))?; - } - if let Some(session_socket) = record - .artifacts - .get("session_socket") - .and_then(|value| value.as_str()) - { - control.session_socket = PathBuf::from(session_socket); - if let Some(parent) = control.session_socket.parent() { - 
fs::create_dir_all(parent).with_context(|| format!("create {}", parent.display()))?; - } - } - Ok(control) -} - -fn codex_control_dir(root: &Path, session: &str) -> PathBuf { - pandacode_dir(root) - .join("codex") - .join("runs") - .join(crate::io::sanitize_name(session, RUNTIME)) -} - -fn codex_session_socket(root: &Path, session: &str) -> PathBuf { - let mut hasher = DefaultHasher::new(); - root.to_string_lossy().hash(&mut hasher); - session.hash(&mut hasher); - std::env::temp_dir() - .join("pandacode-codex") - .join(format!("{:016x}.sock", hasher.finish())) -} - -fn codex_legacy_log_dir(root: &Path) -> PathBuf { - pandacode_dir(root).join("codex").join("logs") -} - #[cfg(test)] mod tests { - use std::path::PathBuf; - - use crate::cli::{AnswerCommandArgs, Effort, PermissionMode, RuntimeBins}; - use super::*; - fn bins() -> RuntimeBins { - RuntimeBins { - codexctl_bin: "codexctl".to_string(), - codex_bin: "codex".to_string(), - claude_bin: "claude".to_string(), - tmux_bin: "tmux".to_string(), - log_mode: "summary".to_string(), - } - } - #[test] - fn builds_codex_start_with_appserver_session() { - let args = TaskCommandArgs { - stdin: None, - task: Some("fix".to_string()), - task_file: None, - cd: PathBuf::from("/repo"), - session: "latest".to_string(), - model: Some("gpt-5.5".to_string()), - effort: Some(Effort::Xhigh), - permission: None, - timeout_ms: Some(120_000), - json: false, - bins: bins(), - }; - let control = CodexControl { - log_dir: PathBuf::from("/repo/.pandacode/codex/runs/s1/logs"), - session_socket: PathBuf::from("/tmp/pandacode-codex/s1.sock"), - }; - let command = codex_start_command( - &args.bins, - Path::new("/tmp/task.md"), - Path::new("/repo"), - &control, - &args, - "gpt-5.5", - "xhigh", - ); - assert!(command.windows(2).any(|pair| pair == ["session", "start"])); - assert!(command.contains(&"--session-socket".to_string())); - assert!(command.contains(&"/tmp/pandacode-codex/s1.sock".to_string())); - 
assert!(command.contains(&"--dangerously-full-access".to_string())); - assert!(command.contains(&"gpt-5.5".to_string())); - assert!(command.contains(&"xhigh".to_string())); + fn permission_maps_to_sandbox_and_approval() { + assert_eq!(sandbox_policy(PermissionMode::Max), "danger-full-access"); + assert_eq!(sandbox_policy(PermissionMode::Limited), "workspace-write"); + assert_eq!(approval_policy(PermissionMode::Max), "never"); } #[test] - fn builds_codex_start_with_limited_permission() { - let args = TaskCommandArgs { - stdin: None, - task: Some("fix".to_string()), - task_file: None, - cd: PathBuf::from("/repo"), - session: "latest".to_string(), - model: Some("gpt-5.5".to_string()), - effort: Some(Effort::Xhigh), - permission: Some(PermissionMode::Limited), - timeout_ms: Some(120_000), - json: false, - bins: bins(), - }; - let control = CodexControl { - log_dir: PathBuf::from("/repo/.pandacode/codex/runs/s1/logs"), - session_socket: PathBuf::from("/tmp/pandacode-codex/s1.sock"), - }; - let command = codex_start_command( - &args.bins, - Path::new("/tmp/task.md"), - Path::new("/repo"), - &control, - &args, - "gpt-5.5", - "xhigh", - ); - assert!(!command.contains(&"--dangerously-full-access".to_string())); - assert!( - command - .windows(2) - .any(|pair| pair == ["--sandbox", "workspace-write"]) - ); - assert!( - command - .windows(2) - .any(|pair| pair == ["--approval-policy", "never"]) - ); + fn effective_values_fall_back_to_defaults() { + assert_eq!(effective_model(None, None), DEFAULT_MODEL); + assert_eq!(effective_model(Some("gpt-6"), None), "gpt-6"); + assert_eq!(effective_model(None, Some("gpt-6")), "gpt-6"); + assert_eq!(effective_effort(None, None), DEFAULT_EFFORT); + assert_eq!(effective_effort(None, Some("low")), "low"); } #[test] - fn builds_codex_answer_with_choice() { - let args = AnswerCommandArgs { - session: "latest".to_string(), - cd: PathBuf::from("/repo"), - choice: Some(2), - text: None, - wait: true, - timeout_ms: Some(30_000), - json: false, - 
bins: bins(), - }; - let control = CodexControl { - log_dir: PathBuf::from("/repo/.pandacode/codex/runs/s1/logs"), - session_socket: PathBuf::from("/tmp/pandacode-codex/s1.sock"), - }; - let command = - codex_answer_command(&args.bins, "run_1", Path::new("/repo"), &control, &args).unwrap(); - assert!(command.windows(2).any(|pair| pair == ["session", "answer"])); - assert!(command.contains(&"--run-id".to_string())); - assert!(command.contains(&"run_1".to_string())); - assert!(command.contains(&"--session-socket".to_string())); - assert!(command.contains(&"--pick".to_string())); - assert!(command.contains(&"2".to_string())); - assert!(!command.contains(&"--detach".to_string())); + fn choice_answers_resolve_from_recorded_questions() { + let mut record = SessionRecord::new(RUNTIME, "s1", "codex-appserver", Path::new("/tmp")); + record.artifacts = json!({ + "pending_questions": [ + {"question": "Continue?", "options": [{"label": "keep going"}, {"label": "stop here"}]} + ] + }); + assert_eq!(choice_answer_text(&record, 2).unwrap(), "stop here"); + assert_eq!(choice_answer_text(&record, 9).unwrap(), "Option 9"); + let mut empty = SessionRecord::new(RUNTIME, "s2", "codex-appserver", Path::new("/tmp")); + empty.artifacts = json!({}); + assert_eq!(choice_answer_text(&empty, 1).unwrap(), "Option 1"); + assert!(choice_answer_text(&empty, 0).is_err()); } #[test] - fn detects_codex_transport_errors() { - let output = crate::io::CmdOutput { - ok: false, - exit_code: Some(1), - stdout: String::new(), - stderr: "Error: daemon returned invalid JSON\nEOF while parsing".to_string(), - }; - assert!(is_codex_transport_error(&output)); + fn structured_answers_use_question_ids_and_option_labels() { + let questions = vec![json!({ + "id": "q1", + "question": "Continue?", + "options": [{"label": "keep going"}, {"label": "stop here"}], + })]; + let by_choice = build_structured_answer(&questions, Some(2), None).unwrap(); + assert_eq!(by_choice["answers"]["q1"]["answers"][0], "stop here"); + 
let by_text = build_structured_answer(&questions, None, Some("do it")).unwrap(); + assert_eq!(by_text["answers"]["q1"]["answers"][0], "do it"); + assert!(build_structured_answer(&questions, Some(9), None).is_err()); + let no_id = vec![json!({"question": "Continue?"})]; + assert!(build_structured_answer(&no_id, None, Some("x")).is_err()); + assert!(build_structured_answer(&[], None, Some("x")).is_err()); } } diff --git a/pandacode/src/runtimes/codex_appserver.rs b/pandacode/src/runtimes/codex_appserver.rs new file mode 100644 index 0000000..0e43614 --- /dev/null +++ b/pandacode/src/runtimes/codex_appserver.rs @@ -0,0 +1,439 @@ +//! Minimal direct client for `codex app-server` over stdio JSONL. +//! +//! PandaCode spawns one app-server process per turn, drives it with JSON-RPC +//! requests (`thread/start`, `thread/resume`, `turn/start`), and reads event +//! notifications until the turn completes. Thread state persists in the Codex +//! home as rollout files, so multi-turn sessions resume across processes +//! without a daemon. 
+ +use std::io::{BufRead, BufReader, BufWriter, Write}; +use std::path::{Path, PathBuf}; +use std::process::{Child, ChildStdin, Command, Stdio}; +use std::sync::mpsc::{self, Receiver, RecvTimeoutError}; +use std::thread; +use std::time::{Duration, Instant}; + +use anyhow::{Context, Result, anyhow, bail}; +use serde_json::{Value, json}; + +pub struct AppServerClient { + child: Child, + stdin: Option<BufWriter<ChildStdin>>, + rx: Receiver<ServerLine>, + next_id: u64, + log_file: Option<std::fs::File>, + pid_file: Option<PathBuf>, +} + +enum ServerLine { + Stdout(String), + Stderr(String), +} + +impl AppServerClient { + pub fn spawn( + bins: &crate::cli::RuntimeBins, + cwd: &Path, + log_path: Option<&Path>, + ) -> Result<Self> { + let mut command = Command::new(&bins.codex_bin); + command + .arg("app-server") + .arg("--listen") + .arg("stdio://") + .current_dir(cwd) + .stdin(Stdio::piped()) + .stdout(Stdio::piped()) + .stderr(Stdio::piped()); + match effective_codex_home(bins.auth_home.as_deref(), bins.codex_home.as_deref()) { + Ok(home) => { + command.env("CODEX_HOME", &home); + } + Err(error) => { + // Fall back to the caller environment rather than failing the + // turn over auth-copy housekeeping. + eprintln!("pandacode: managed codex home unavailable ({error}); using default CODEX_HOME"); + } + } + // Own process group so kill() can reap wrapper grandchildren too + // (`codex` is often a node wrapper that re-spawns the real binary). 
+ #[cfg(unix)] + { + use std::os::unix::process::CommandExt; + command.process_group(0); + } + let mut child = command + .spawn() + .with_context(|| format!("failed to spawn {} app-server", bins.codex_bin))?; + let stdin = child.stdin.take().context("app-server stdin missing")?; + let stdout = child.stdout.take().context("app-server stdout missing")?; + let stderr = child.stderr.take().context("app-server stderr missing")?; + let (tx, rx) = mpsc::channel(); + let tx_stdout = tx.clone(); + thread::spawn(move || { + let reader = BufReader::new(stdout); + for line in reader.lines().map_while(Result::ok) { + let _ = tx_stdout.send(ServerLine::Stdout(line)); + } + }); + thread::spawn(move || { + let reader = BufReader::new(stderr); + for line in reader.lines().map_while(Result::ok) { + let _ = tx.send(ServerLine::Stderr(line)); + } + }); + let log_file = match log_path { + Some(path) => { + if let Some(parent) = path.parent() { + std::fs::create_dir_all(parent)?; + } + Some( + std::fs::OpenOptions::new() + .create(true) + .append(true) + .open(path)?, + ) + } + None => None, + }; + let pid_file = pid_dir(cwd).join(child.id().to_string()); + if let Some(parent) = pid_file.parent() { + let _ = std::fs::create_dir_all(parent); + } + let _ = std::fs::write(&pid_file, b""); + Ok(Self { + child, + stdin: Some(BufWriter::new(stdin)), + rx, + next_id: 1, + log_file, + pid_file: Some(pid_file), + }) + } + + pub fn initialize(&mut self, timeout: Duration) -> Result<Value> { + let id = self.send_request( + "initialize", + json!({ + "clientInfo": { + "name": "pandacode", + "title": "PandaCode CLI", + "version": env!("CARGO_PKG_VERSION"), + }, + "capabilities": { + "experimentalApi": true, + "optOutNotificationMethods": ["fs/changed"], + }, + }), + )?; + let response = self.wait_response(id, timeout)?; + if let Some(error) = response.get("error") { + bail!("initialize failed: {error}"); + } + self.send_notification("initialized", None)?; + Ok(response) + } + + pub fn call(&mut self, 
method: &str, params: Value, timeout: Duration) -> Result<Value> { + let id = self.send_request(method, params)?; + let response = self.wait_response(id, timeout)?; + if let Some(error) = response.get("error") { + bail!("{method} failed: {error}"); + } + Ok(response) + } + + pub fn send_request(&mut self, method: &str, params: Value) -> Result<u64> { + let id = self.next_id; + self.next_id += 1; + self.send_value(json!({ "id": id, "method": method, "params": params }))?; + Ok(id) + } + + pub fn send_response(&mut self, id: Value, result: Value) -> Result<()> { + self.send_value(json!({ "id": id, "result": result })) + } + + fn send_notification(&mut self, method: &str, params: Option<Value>) -> Result<()> { + let value = match params { + Some(params) => json!({ "method": method, "params": params }), + None => json!({ "method": method }), + }; + self.send_value(value) + } + + fn send_value(&mut self, value: Value) -> Result<()> { + self.log("out", &value); + let stdin = self + .stdin + .as_mut() + .context("app-server stdin already closed")?; + writeln!(stdin, "{value}")?; + stdin.flush()?; + Ok(()) + } + + fn wait_response(&mut self, id: u64, timeout: Duration) -> Result<Value> { + let deadline = Instant::now() + timeout; + loop { + let message = self.recv_until(deadline)?; + if is_response_with_id(&message, id) { + return Ok(message); + } + } + } + + /// Receive one message, returning Ok(None) when `timeout` elapses first. 
+ pub fn recv_maybe(&mut self, timeout: Duration) -> Result<Option<Value>> { + let deadline = Instant::now() + timeout; + loop { + let remaining = deadline + .checked_duration_since(Instant::now()) + .unwrap_or(Duration::ZERO); + let line = match self.rx.recv_timeout(remaining) { + Ok(line) => line, + Err(RecvTimeoutError::Timeout) => return Ok(None), + Err(RecvTimeoutError::Disconnected) => bail!("app-server stream closed"), + }; + match line { + ServerLine::Stdout(line) => { + let value: Value = serde_json::from_str(&line) + .with_context(|| format!("app-server returned invalid JSON: {line}"))?; + self.log("in", &value); + return Ok(Some(value)); + } + ServerLine::Stderr(line) => { + self.log("stderr", &json!({ "line": line })); + } + } + } + } + + pub fn recv_until(&mut self, deadline: Instant) -> Result<Value> { + loop { + let remaining = deadline + .checked_duration_since(Instant::now()) + .unwrap_or(Duration::ZERO); + if remaining.is_zero() { + bail!("timed out waiting for app-server message"); + } + let line = match self.rx.recv_timeout(remaining) { + Ok(line) => line, + Err(RecvTimeoutError::Timeout) => { + bail!("timed out waiting for app-server message") + } + Err(RecvTimeoutError::Disconnected) => { + bail!("app-server stream closed") + } + }; + match line { + ServerLine::Stdout(line) => { + let value: Value = serde_json::from_str(&line) + .with_context(|| format!("app-server returned invalid JSON: {line}"))?; + self.log("in", &value); + return Ok(value); + } + ServerLine::Stderr(line) => { + self.log("stderr", &json!({ "line": line })); + } + } + } + } + + pub fn kill(&mut self) { + // Close stdin first so a well-behaved server exits on EOF. + drop(self.stdin.take()); + let pid = self.child.id(); + let _ = self.child.kill(); + let _ = self.child.wait(); + // The child owns its process group (see spawn), so a group kill reaps + // wrapper grandchildren such as the npm codex shim's vendor binary. 
+ kill_process_group(pid); + if let Some(pid_file) = self.pid_file.take() { + let _ = std::fs::remove_file(pid_file); + } + } + + fn log(&mut self, direction: &str, value: &Value) { + if let Some(file) = self.log_file.as_mut() { + let entry = json!({ + "ms": crate::io::now_millis(), + "dir": direction, + "msg": value, + }); + let _ = writeln!(file, "{entry}"); + } + } +} + +impl Drop for AppServerClient { + fn drop(&mut self) { + self.kill(); + } +} + +fn is_response_with_id(message: &Value, id: u64) -> bool { + message.get("method").is_none() + && message + .get("id") + .and_then(Value::as_u64) + .is_some_and(|value| value == id) +} + +/// Whether a message is a server-initiated request (has both id and method). +pub fn server_request_id(message: &Value) -> Option<Value> { + if message.get("method").is_some() { + message.get("id").cloned() + } else { + None + } +} + +pub fn notification_method(message: &Value) -> Option<&str> { + if message.get("id").is_none() { + message.get("method").and_then(Value::as_str) + } else { + None + } +} + +pub fn spawn_initialized( + bins: &crate::cli::RuntimeBins, + cwd: &Path, + log_path: Option<PathBuf>, +) -> Result<AppServerClient> { + reap_orphans(cwd); + let mut client = AppServerClient::spawn(bins, cwd, log_path.as_deref()) + .map_err(|error| anyhow!("{error}"))?; + client.initialize(Duration::from_secs(30))?; + Ok(client) +} + +fn pid_dir(root: &Path) -> PathBuf { + crate::io::pandacode_dir(root) + .join("codex") + .join("appserver-pids") +} + +/// Resolve the Codex home for a turn. Precedence: +/// 1. `--codex-home DIR` (or PANDACODE_CODEX_HOME): use the full home as-is. +/// 2. Managed clean home: copy auth material from `--auth-home DIR` (or +/// CODEX_HOME env, or `~/.codex`) into a per-account directory under +/// `~/.pandacode/codex-home/`, deliberately leaving out config.toml, +/// AGENTS.md, and skills so turns run with a clean configuration. 
+pub fn effective_codex_home( + auth_home: Option<&Path>, + full_home: Option<&Path>, +) -> Result<PathBuf> { + if let Some(full) = full_home { + return Ok(full.to_path_buf()); + } + if let Some(custom) = std::env::var_os("PANDACODE_CODEX_HOME") { + return Ok(PathBuf::from(custom)); + } + let home = std::env::var_os("HOME") + .map(PathBuf::from) + .context("HOME is not set")?; + let source = auth_home + .map(Path::to_path_buf) + .or_else(|| std::env::var_os("CODEX_HOME").map(PathBuf::from)) + .unwrap_or_else(|| home.join(".codex")); + let account = managed_account_name(&source, &home); + let target = home.join(".pandacode").join("codex-home").join(account); + std::fs::create_dir_all(&target) + .with_context(|| format!("create managed codex home {}", target.display()))?; + for name in ["config.toml", "AGENTS.md", "skills"] { + let path = target.join(name); + if path.is_dir() { + let _ = std::fs::remove_dir_all(&path); + } else if path.exists() { + let _ = std::fs::remove_file(&path); + } + } + for name in ["auth.json", "installation_id", "version.json"] { + let from = source.join(name); + if from.exists() { + let _ = std::fs::copy(&from, target.join(name)); + } + } + Ok(target) +} + +fn managed_account_name(source: &Path, home: &Path) -> String { + if source == home.join(".codex") { + return "default".to_string(); + } + source + .to_string_lossy() + .chars() + .map(|ch| { + if ch.is_ascii_alphanumeric() || matches!(ch, '.' | '-') { + ch + } else { + '-' + } + }) + .collect::<String>() + .trim_matches('-') + .to_string() +} + +pub fn kill_process_group(pid: u32) { + #[cfg(unix)] + { + let _ = Command::new("/bin/kill") + .args(["-9", "--", &format!("-{pid}")]) + .stdout(Stdio::null()) + .stderr(Stdio::null()) + .status(); + } +} + +/// Best-effort reaper for app-server process groups left behind when a prior +/// pandacode process died without running destructors (e.g. SIGKILL). 
Every +/// spawn records its pid under `.pandacode/codex/appserver-pids/`; an entry is +/// an orphan only when the process is still an `app-server` AND has been +/// re-parented to init (ppid 1) — a live pandacode parent means the entry +/// belongs to a concurrent run and must be left alone. +pub fn reap_orphans(root: &Path) { + let dir = pid_dir(root); + let Ok(entries) = std::fs::read_dir(&dir) else { + return; + }; + for entry in entries.flatten() { + let path = entry.path(); + let Some(pid) = path + .file_name() + .and_then(|name| name.to_str()) + .and_then(|name| name.parse::().ok()) + else { + let _ = std::fs::remove_file(&path); + continue; + }; + let line = Command::new("ps") + .args(["-p", &pid.to_string(), "-o", "ppid=,command="]) + .output() + .map(|output| String::from_utf8_lossy(&output.stdout).to_string()) + .unwrap_or_default(); + let line = line.trim(); + if line.is_empty() { + // Process is gone; the entry is stale bookkeeping. + let _ = std::fs::remove_file(&path); + continue; + } + let ppid = line + .split_whitespace() + .next() + .and_then(|value| value.parse::().ok()); + let orphaned_appserver = ppid == Some(1) && line.contains("app-server"); + if orphaned_appserver { + kill_process_group(pid); + let _ = std::fs::remove_file(&path); + } else if !line.contains("app-server") { + // Pid recycled by an unrelated process: drop the stale entry. + let _ = std::fs::remove_file(&path); + } + // Otherwise: a live app-server owned by a concurrent pandacode run — + // leave both the process and its entry untouched. 
+ } +} diff --git a/pandacode/src/runtimes/mod.rs b/pandacode/src/runtimes/mod.rs index a1e7427..fae3d98 100644 --- a/pandacode/src/runtimes/mod.rs +++ b/pandacode/src/runtimes/mod.rs @@ -1,11 +1,12 @@ pub mod bamboo; pub mod claude; pub mod codex; +pub(crate) mod codex_appserver; use std::{env, str::FromStr}; use anyhow::{Result, anyhow}; -use serde_json::json; +use serde_json::{Value, json}; use crate::{ cli::{ @@ -252,7 +253,7 @@ fn bamboo_provider_for_model(model: &str) -> Option { fn is_claude_model_hint(model: &str) -> bool { let model = model.trim().to_ascii_lowercase(); - matches!(model.as_str(), "haiku" | "sonnet" | "opus") || model.starts_with("claude-") + matches!(model.as_str(), "haiku" | "sonnet" | "opus" | "fable") || model.starts_with("claude-") } fn is_codex_model_hint(model: &str) -> bool { @@ -273,13 +274,12 @@ fn reject_provider_for_delegated_runtime(provider: Option<&str>, _runtime: &str) Ok(()) } - pub async fn doctor(args: GlobalArgs) -> Result<()> { let root = workspace(&args.cd)?; let codex = codex::doctor_report(&root, &args.bins)?; let claude = claude::doctor_report(&root, &args.bins)?; let bamboo = bamboo::doctor_report(&root, &args.bins).await?; - output_json(&json!({ + let report = json!({ "ok": codex.get("ok").and_then(|v| v.as_bool()).unwrap_or(false) || claude.get("ok").and_then(|v| v.as_bool()).unwrap_or(false) || bamboo.get("ok").and_then(|v| v.as_bool()).unwrap_or(false), @@ -287,17 +287,29 @@ pub async fn doctor(args: GlobalArgs) -> Result<()> { "codex": codex, "claude": claude, "bamboo": bamboo - })) + }); + if args.json { + output_json(&report) + } else { + println!("{}", format_doctor_summary(&report)); + Ok(()) + } } pub fn list_all(args: GlobalArgs) -> Result<()> { let root = workspace(&args.cd)?; - output_json(&json!({ + let report = json!({ "ok": true, "codex": session::list(&root, "codex")?, "claude": session::list(&root, "claude")?, "bamboo": session::list(&root, "bamboo")? 
- })) + }); + if args.json { + output_json(&report) + } else { + println!("{}", format_session_list_summary(&report)); + Ok(()) + } } pub async fn models_all(args: GlobalArgs) -> Result<()> { @@ -305,11 +317,328 @@ pub async fn models_all(args: GlobalArgs) -> Result<()> { let codex = codex::models_report(&root, &args.bins)?; let claude = claude::models_report(&root, &args.bins)?; let bamboo = bamboo::models_report(&root, None).await?; - output_json(&json!({ + let report = json!({ "ok": true, "codex": codex, "claude": claude, "bamboo": bamboo + }); + if args.json { + output_json(&report) + } else { + println!("{}", format_models_summary(&report)); + Ok(()) + } +} + +fn format_doctor_summary(report: &Value) -> String { + let status = if report_bool(report, "ok") { + "usable" + } else { + "needs setup" + }; + let mut lines = vec![format!("PandaCode doctor: {status}")]; + for runtime in ["bamboo", "claude", "codex"] { + let runtime_report = &report[runtime]; + let runtime_status = if report_bool(runtime_report, "ok") { + "available".to_string() + } else { + report_str(runtime_report, "state") + .unwrap_or("unavailable") + .to_string() + }; + let mut details = vec![format!(" - {runtime}: {runtime_status}")]; + if let Some(driver) = report_str(runtime_report, "driver") { + details.push(format!("driver={driver}")); + } + if let Some(active) = format_bamboo_active(runtime_report) { + details.push(format!("active={active}")); + } + let missing = string_array(&runtime_report["missing"]); + if !missing.is_empty() { + details.push(format!("missing={}", missing.join(","))); + } + lines.push(details.join(" ")); + } + lines.push("JSON: pandacode doctor --json".to_string()); + lines.join("\n") +} + +fn format_session_list_summary(report: &Value) -> String { + let mut lines = vec!["PandaCode sessions".to_string()]; + for runtime in ["bamboo", "claude", "codex"] { + let sessions = report[runtime].as_array().map(Vec::as_slice).unwrap_or(&[]); + let mut line = format!(" - {runtime}: 
{}", sessions.len()); + if let Some(latest) = sessions.first() { + if let Some(session) = report_str(latest, "session") { + line.push_str(&format!(" latest={session}")); + } + if let Some(model) = report_str(latest, "model") { + line.push_str(&format!(" model={model}")); + } + if let Some(run_id) = report_str(latest, "run_id") { + line.push_str(&format!(" run={run_id}")); + } + } + lines.push(line); + } + lines.push("JSON: pandacode list --json".to_string()); + lines.join("\n") +} + +fn format_models_summary(report: &Value) -> String { + let mut lines = vec!["PandaCode models".to_string()]; + lines.push(format_model_runtime_summary("bamboo", &report["bamboo"])); + lines.push(format_model_runtime_summary("claude", &report["claude"])); + lines.push(format_model_runtime_summary("codex", &report["codex"])); + lines.push("JSON: pandacode models --json".to_string()); + lines.join("\n") +} + +fn format_model_runtime_summary(runtime: &str, report: &Value) -> String { + let mut parts = vec![format!( + " - {runtime}: {}", + if report_bool(report, "ok") { + "ok" + } else { + "unavailable" + } + )]; + match runtime { + "bamboo" => { + let models = report["raw"]["models"] + .as_array() + .map(Vec::as_slice) + .unwrap_or(&[]); + parts.push(format!("models={}", models.len())); + let defaults = models + .iter() + .filter(|model| report_bool(model, "is_default")) + .filter_map(format_provider_model) + .collect::>(); + if !defaults.is_empty() { + parts.push(format!("defaults={}", join_limited(defaults, 4))); + } + } + "claude" => { + let aliases = string_array(&report["known_aliases"]); + if !aliases.is_empty() { + parts.push(format!("aliases={}", aliases.join(", "))); + } + } + "codex" => { + let models = report["raw"]["models"] + .as_array() + .map(Vec::as_slice) + .unwrap_or(&[]); + parts.push(format!("models={}", models.len())); + let defaults = models + .iter() + .filter(|model| report_bool(model, "is_default")) + .filter_map(format_model_id) + .collect::>(); + if 
!defaults.is_empty() { + parts.push(format!("default={}", join_limited(defaults, 1))); + } + } + _ => {} + } + parts.join(" ") +} + +fn format_bamboo_active(report: &Value) -> Option<String> { + let active = &report["active"]; + let provider = report_str(active, "provider")?; + let model = report_str(active, "model")?; + Some(format!("{provider}/{model}")) +} + +fn format_provider_model(model: &Value) -> Option<String> { + let provider = report_str(model, "provider")?; + let id = report_str(model, "id") + .or_else(|| report_str(model, "model")) + .unwrap_or("unknown"); + Some(format!("{provider}/{id}")) +} + +fn format_model_id(model: &Value) -> Option<String> { + report_str(model, "id") + .or_else(|| report_str(model, "model")) + .map(ToString::to_string) +} + +fn report_bool(report: &Value, key: &str) -> bool { + report.get(key).and_then(Value::as_bool).unwrap_or(false) +} + +fn report_str<'a>(report: &'a Value, key: &str) -> Option<&'a str> { + report.get(key).and_then(Value::as_str) +} + +fn string_array(value: &Value) -> Vec<String> { + value + .as_array() + .into_iter() + .flatten() + .filter_map(Value::as_str) + .map(ToString::to_string) + .collect() +} + +fn join_limited(values: Vec<String>, limit: usize) -> String { + if values.len() <= limit { + return values.join(", "); + } + let hidden = values.len() - limit; + let mut visible = values.into_iter().take(limit).collect::<Vec<_>>(); + visible.push(format!("+{hidden}")); + visible.join(", ") +} + +pub fn wait_sessions(args: crate::cli::WaitCommandArgs) -> Result<()> { + let root = crate::io::workspace(&args.cd)?; + let started = std::time::Instant::now(); + let deadline = started + std::time::Duration::from_millis(args.timeout_ms); + // Grace window for a freshly-launched lane to write its first record. + // After it elapses, a session still missing a record is treated as a + // never-started lane (typo'd name / launch failed) and fails fast instead + // of silently waiting out the full timeout. 
+ let grace = std::time::Duration::from_millis(args.interval_ms.saturating_mul(3).max(10_000)); + fn terminal(state: &str) -> bool { + matches!( + state, + "completed" | "failed" | "timeout" | "stopped" | "interrupted" | "no_report" | "blocked" + ) + } + loop { + let mut lanes = serde_json::Map::new(); + let mut all_settled = true; + let mut any_waiting = false; + let mut pending_names = Vec::new(); + for name in &args.sessions { + let (runtime, state) = match session::resolve_runtime_for_session(&root, name) { + Ok(runtime) => match session::load(&root, &runtime, name) { + Ok(record) => { + let state = record.artifacts["status"] + .as_str() + .unwrap_or("unknown") + .to_string(); + (runtime, state) + } + Err(_) => (runtime, "pending".to_string()), + }, + Err(_) => ("unknown".to_string(), "pending".to_string()), + }; + if state == "pending" { + pending_names.push(name.clone()); + } + if state == "waiting_for_user" { + any_waiting = true; + } else if !terminal(&state) { + all_settled = false; + } + lanes.insert( + name.clone(), + serde_json::json!({ "runtime": runtime, "state": state }), + ); + } + // Fast-fail: a lane that never produced a record after the grace window. 
+ if !pending_names.is_empty() && started.elapsed() >= grace { + crate::io::output_json(&serde_json::json!({ + "ok": false, + "action": "wait", + "state": "missing_session", + "sessions": lanes, + "missing_sessions": pending_names, + "error": "session(s) never started a turn; check the session name(s)", + "elapsed_ms": started.elapsed().as_millis() as u64, + }))?; + return Err(crate::io::JsonAlreadyEmitted.into()); + } + let timed_out = std::time::Instant::now() >= deadline; + if all_settled || any_waiting || timed_out { + let missing = args + .expect_artifact + .iter() + .filter(|path| !root.join(path).exists()) + .map(|path| path.to_string_lossy().to_string()) + .collect::>(); + let all_completed = lanes.values().all(|lane| lane["state"] == "completed"); + let ok = all_completed && missing.is_empty() && !any_waiting; + let state = if ok { + "completed" + } else if any_waiting { + "waiting_for_user" + } else if timed_out && !all_settled { + "timeout" + } else if all_completed && !missing.is_empty() { + "no_report" + } else { + "failed" + }; + crate::io::output_json(&serde_json::json!({ + "ok": ok, + "action": "wait", + "state": state, + "sessions": lanes, + "missing_artifacts": missing, + "elapsed_ms": started.elapsed().as_millis() as u64, + }))?; + if ok { + return Ok(()); + } + return Err(crate::io::JsonAlreadyEmitted.into()); + } + std::thread::sleep(std::time::Duration::from_millis(args.interval_ms)); + } +} + +pub fn gc_sessions(args: crate::cli::GcCommandArgs) -> Result<()> { + let root = crate::io::workspace(&args.cd)?; + let base = crate::io::pandacode_dir(&root); + let cutoff = std::time::SystemTime::now() + .checked_sub(std::time::Duration::from_secs(args.days.saturating_mul(86_400))) + .unwrap_or(std::time::UNIX_EPOCH); + // Only PandaCode-owned transient outputs. Never touch `sessions/` (state), + // `codex/codex-home` (thread history), or `codex/appserver-pids`. 
+ let prunable = ["prompts", "logs", "events", "detached"]; + let mut removed = Vec::new(); + let mut bytes_freed: u64 = 0; + for runtime in ["codex", "claude", "bamboo"] { + for sub in prunable { + let dir = base.join(runtime).join(sub); + let Ok(entries) = std::fs::read_dir(&dir) else { + continue; + }; + for entry in entries.flatten() { + let path = entry.path(); + let Ok(meta) = entry.metadata() else { continue }; + if !meta.is_file() { + continue; + } + let old = meta + .modified() + .map(|m| m < cutoff) + .unwrap_or(false); + if !old { + continue; + } + bytes_freed += meta.len(); + removed.push(path.to_string_lossy().to_string()); + if !args.dry_run { + let _ = std::fs::remove_file(&path); + } + } + } + } + crate::io::output_json(&serde_json::json!({ + "ok": true, + "action": "gc", + "dry_run": args.dry_run, + "days": args.days, + "removed_count": removed.len(), + "bytes_freed": bytes_freed, + "removed": removed, })) } @@ -335,8 +664,12 @@ fn version_report(program: &str, args: &[&str]) -> serde_json::Value { #[cfg(test)] mod tests { - use super::{ResolvedRuntime, bamboo_provider_for_model, runtime_hint_for_model}; + use super::{ + ResolvedRuntime, bamboo_provider_for_model, format_doctor_summary, format_models_summary, + format_session_list_summary, runtime_hint_for_model, + }; use crate::config::ProviderKind; + use serde_json::json; #[test] fn model_hints_route_to_matching_runtime() { @@ -352,6 +685,10 @@ mod tests { runtime_hint_for_model("opus"), Some(ResolvedRuntime::Claude) ); + assert_eq!( + runtime_hint_for_model("fable"), + Some(ResolvedRuntime::Claude) + ); assert_eq!( runtime_hint_for_model("claude-sonnet-4-5"), Some(ResolvedRuntime::Claude) @@ -375,4 +712,114 @@ mod tests { ); assert_eq!(bamboo_provider_for_model("opus"), None); } + + #[test] + fn doctor_summary_highlights_runtime_state() { + let summary = format_doctor_summary(&json!({ + "ok": true, + "bamboo": { + "ok": false, + "state": "configuration_needed", + "driver": "bamboo-native", + 
"missing": ["api_key"], + "active": { + "provider": "deepseek", + "model": "deepseek-v4-pro" + } + }, + "claude": { + "ok": true, + "state": "available", + "driver": "tmux", + "missing": [] + }, + "codex": { + "ok": true, + "state": "available", + "driver": "codex app-server", + "missing": [] + } + })); + + assert!(summary.contains("PandaCode doctor: usable")); + assert!(summary.contains("bamboo: configuration_needed")); + assert!(summary.contains("active=deepseek/deepseek-v4-pro")); + assert!(summary.contains("missing=api_key")); + assert!(summary.contains("claude: available")); + assert!(summary.contains("codex: available")); + assert!(summary.contains("JSON: pandacode doctor --json")); + assert!(!summary.contains('{')); + } + + #[test] + fn models_summary_compacts_runtime_catalogs() { + let summary = format_models_summary(&json!({ + "ok": true, + "bamboo": { + "ok": true, + "raw": { + "models": [ + { + "provider": "deepseek", + "id": "deepseek-v4-pro", + "is_default": true + }, + { + "provider": "kimi", + "id": "kimi-k2.6", + "is_default": true + } + ] + } + }, + "claude": { + "ok": true, + "known_aliases": ["haiku", "sonnet", "opus", "fable"] + }, + "codex": { + "ok": true, + "raw": { + "models": [ + { + "id": "gpt-5.5", + "is_default": true + }, + { + "id": "gpt-5.4", + "is_default": false + } + ] + } + } + })); + + assert!(summary.contains("bamboo: ok models=2")); + assert!(summary.contains("defaults=deepseek/deepseek-v4-pro, kimi/kimi-k2.6")); + assert!(summary.contains("claude: ok aliases=haiku, sonnet, opus, fable")); + assert!(summary.contains("codex: ok models=2 default=gpt-5.5")); + assert!(summary.contains("JSON: pandacode models --json")); + assert!(!summary.contains('{')); + } + + #[test] + fn session_list_summary_counts_and_shows_latest() { + let summary = format_session_list_summary(&json!({ + "ok": true, + "bamboo": [], + "claude": [], + "codex": [ + { + "session": "latest", + "model": "gpt-5.5", + "run_id": "run_123" + } + ] + })); + + 
assert!(summary.contains("bamboo: 0")); + assert!(summary.contains("claude: 0")); + assert!(summary.contains("codex: 1 latest=latest model=gpt-5.5 run=run_123")); + assert!(summary.contains("JSON: pandacode list --json")); + assert!(!summary.contains('{')); + } } diff --git a/pandacode/src/session.rs b/pandacode/src/session.rs index d3f02eb..e6056f9 100644 --- a/pandacode/src/session.rs +++ b/pandacode/src/session.rs @@ -54,7 +54,10 @@ pub fn save(root: &Path, record: &mut SessionRecord) -> Result<()> { let dir = runtime_dir(root, &record.runtime); fs::create_dir_all(&dir).with_context(|| format!("create {}", dir.display()))?; let path = record_path(root, &record.runtime, &record.session); - write_atomic(&path, &format!("{}\n", serde_json::to_string_pretty(record)?))?; + write_atomic( + &path, + &format!("{}\n", serde_json::to_string_pretty(record)?), + )?; let pointer = format!( "{}\n", serde_json::to_string_pretty(&json!({ @@ -195,13 +198,6 @@ pub fn artifacts(root: &Path, runtime: &str, session: &str) -> Result Result { - record - .run_id - .clone() - .ok_or_else(|| anyhow::anyhow!("session {} has no codex run_id", record.session)) -} - fn runtime_dir(root: &Path, runtime: &str) -> PathBuf { pandacode_dir(root).join("sessions").join(runtime) } @@ -232,7 +228,7 @@ mod tests { fn saves_and_resolves_latest() { let root = std::env::temp_dir().join(format!("pandacode-session-test-{}", now_millis())); fs::create_dir_all(&root).unwrap(); - let mut record = SessionRecord::new("codex", "s1", "codexctl", &root); + let mut record = SessionRecord::new("codex", "s1", "codex-appserver", &root); save(&root, &mut record).unwrap(); assert_eq!(resolve_session(&root, "codex", "latest").unwrap(), "s1"); assert_eq!(load(&root, "codex", "latest").unwrap().session, "s1"); @@ -261,6 +257,9 @@ mod tests { assert!(p.starts_with(&dir), "escaped runtime dir: {}", p.display()); assert!(!p.to_string_lossy().contains("/etc/passwd")); // A normal session still maps to .json under the runtime 
dir. - assert_eq!(record_path(root, "claude", "abc-1.2"), dir.join("abc-1.2.json")); + assert_eq!( + record_path(root, "claude", "abc-1.2"), + dir.join("abc-1.2.json") + ); } } diff --git a/pandacode/tests/fake_runtimes.rs b/pandacode/tests/fake_runtimes.rs index 9a2199c..9d18ed8 100644 --- a/pandacode/tests/fake_runtimes.rs +++ b/pandacode/tests/fake_runtimes.rs @@ -4,7 +4,7 @@ use std::{ net::{TcpListener, TcpStream}, os::unix::fs::PermissionsExt, path::{Path, PathBuf}, - process::Command, + process::{Command, Stdio}, sync::{ Arc, atomic::{AtomicUsize, Ordering}, @@ -34,6 +34,72 @@ fn now_millis() -> u128 { .as_millis() } +#[test] +fn run_help_explains_common_task_options() { + let output = Command::new(bin()) + .args(["run", "--help"]) + .output() + .unwrap(); + assert!( + output.status.success(), + "{}", + String::from_utf8_lossy(&output.stderr) + ); + let help = String::from_utf8_lossy(&output.stdout); + for expected in [ + "Inline task text", + "Read task text from a file", + "Workspace directory", + "Runtime to use", + "Print machine-readable JSON", + "Wait timeout in milliseconds", + ] { + assert!( + help.contains(expected), + "missing help text {expected}: {help}" + ); + } +} + +#[test] +fn top_level_list_defaults_to_compact_text_and_json_remains_machine_readable() { + let root = temp_root("list-compact"); + + let output = Command::new(bin()) + .args(["list", "--cd", root.to_str().unwrap()]) + .output() + .unwrap(); + assert!( + output.status.success(), + "{}", + String::from_utf8_lossy(&output.stderr) + ); + let text = String::from_utf8_lossy(&output.stdout); + assert!(text.contains("PandaCode sessions"), "{text}"); + assert!(text.contains("bamboo: 0"), "{text}"); + assert!(text.contains("claude: 0"), "{text}"); + assert!(text.contains("codex: 0"), "{text}"); + assert!(text.contains("JSON: pandacode list --json"), "{text}"); + assert!(!text.contains('{'), "{text}"); + + let output = Command::new(bin()) + .args(["list", "--json", "--cd", 
root.to_str().unwrap()]) + .output() + .unwrap(); + assert!( + output.status.success(), + "{}", + String::from_utf8_lossy(&output.stderr) + ); + let json: serde_json::Value = serde_json::from_slice(&output.stdout).unwrap(); + assert_eq!(json["ok"], true); + assert_eq!(json["bamboo"].as_array().unwrap().len(), 0); + assert_eq!(json["claude"].as_array().unwrap().len(), 0); + assert_eq!(json["codex"].as_array().unwrap().len(), 0); + + fs::remove_dir_all(root).unwrap(); +} + fn write_exe(path: &Path, content: &str) { fs::write(path, content).unwrap(); let mut perms = fs::metadata(path).unwrap().permissions(); @@ -41,72 +107,74 @@ fn write_exe(path: &Path, content: &str) { fs::set_permissions(path, perms).unwrap(); } -fn fake_codexctl(path: &Path) { +fn fake_codex_appserver(path: &Path) { + let py = path.with_extension("py"); + fs::write( + &py, + r#" +import json, os, sys, time + +def send(obj): + sys.stdout.write(json.dumps(obj) + "\n") + sys.stdout.flush() + +asked = False +for line in sys.stdin: + line = line.strip() + if not line: + continue + msg = json.loads(line) + method = msg.get("method") + mid = msg.get("id") + if method is None: + if mid == 999: + send({"method": "item/completed", "params": {"item": {"type": "agentMessage", "text": "answered:" + json.dumps(msg.get("result"))}}}) + send({"method": "thread/tokenUsage/updated", "params": {"total": {"input_tokens": 11, "output_tokens": 6}}}) + send({"method": "turn/completed", "params": {"turn": {"id": "turn_fake", "status": "completed"}}}) + continue + if method == "initialize": + send({"id": mid, "result": {"codexHome": "/tmp"}}) + elif method in ("thread/start", "thread/resume"): + send({"id": mid, "result": {"thread": {"id": "thread_fake", "path": "/tmp/thread_fake.jsonl"}, "model": "gpt-5.5"}}) + elif method == "turn/start": + send({"id": mid, "result": {"turn": {"id": "turn_fake"}}}) + if os.environ.get("FAKE_TURN_SLEEP"): + time.sleep(float(os.environ["FAKE_TURN_SLEEP"])) + if 
os.environ.get("FAKE_REQUEST_USER_INPUT") and not asked: + asked = True + send({"id": 999, "method": "item/tool/requestUserInput", "params": {"questions": [{"id": "q1", "question": "How should this continue?", "options": [{"label": "keep going"}, {"label": "stop here"}]}]}}) + continue + try: + text = msg["params"]["input"][0]["text"] + except Exception: + text = "" + send({"method": "item/completed", "params": {"item": {"type": "agentMessage", "text": "implemented:" + text[:200]}}}) + send({"method": "thread/tokenUsage/updated", "params": {"total": {"input_tokens": 10, "output_tokens": 5}}}) + send({"method": "turn/completed", "params": {"turn": {"id": "turn_fake", "status": "completed"}}}) + elif method == "model/list": + send({"id": mid, "result": {"data": [{"id": "gpt-5.5", "displayName": "GPT-5.5", "isDefault": True, "supportedReasoningEfforts": ["low", "medium", "high", "xhigh"]}]}}) + elif mid is not None: + send({"id": mid, "result": {}}) +"#, + ) + .unwrap(); write_exe( path, - r#"#!/usr/bin/env bash + &format!( + r#"#!/usr/bin/env bash set -euo pipefail -if [[ "${1:-}" == "--help" ]]; then - echo "fake codexctl help" - exit 0 -fi -if [[ "${1:-}" == "models" ]]; then - printf '{"models":[{"id":"gpt-5.5","efforts":["low","medium","high","xhigh"]}]}\n' +if [[ "${{1:-}}" == "--help" ]]; then + echo "fake codex help" exit 0 fi -if [[ "${1:-}" == "session" ]]; then - action="${2:-}" - case "$action" in - start) - printf '{"run_id":"run_fake","thread_id":"thread_fake","thread_path":"/tmp/thread.jsonl","status":"completed","current_phase":"completed"}\n' - ;; - send) - printf '{"run_id":"run_fake","status":"completed","current_phase":"completed","last_agent_message":"continued"}\n' - ;; - answer) - printf '{"run_id":"run_fake","status":"completed","current_phase":"completed","last_agent_message":"answered"}\n' - ;; - execute) - printf '{"run_id":"run_fake","status":"completed","current_phase":"completed","last_agent_message":"implemented"}\n' - ;; - resume) - 
printf '{"run_id":"run_resumed","thread_id":"thread_fake","status":"completed"}\n' - ;; - read) - printf '{"run_id":"run_fake","status":"completed","current_phase":"completed","last_agent_message":"fake read"}\n' - ;; - interrupt) - printf '{"run_id":"run_fake","status":"interrupted"}\n' - ;; - stop) - printf '{"run_id":"run_fake","status":"stopped"}\n' - ;; - list) - printf '{"runs":[{"run_id":"run_fake","status":"completed"}]}\n' - ;; - *) - echo "unknown session action $action" >&2 - exit 2 - ;; - esac - exit 0 +if [[ "${{1:-}}" == "app-server" ]]; then + exec /usr/bin/env python3 "{py}" app-server fi -echo "unknown fake codexctl args: $*" >&2 +echo "unknown fake codex args: $*" >&2 exit 2 "#, - ); -} - -fn fake_codex(path: &Path) { - write_exe( - path, - r#"#!/usr/bin/env bash -if [[ "${1:-}" == "--help" ]]; then - echo "fake codex help" - exit 0 -fi -echo "fake codex" -"#, + py = py.display() + ), ); } @@ -715,10 +783,8 @@ fn top_level_run_infers_codex_from_model() { let root = temp_root("agent-model-infers-codex"); let bin_dir = root.join("bin"); fs::create_dir_all(&bin_dir).unwrap(); - let codexctl = bin_dir.join("codexctl"); let codex = bin_dir.join("codex"); - fake_codexctl(&codexctl); - fake_codex(&codex); + fake_codex_appserver(&codex); let output = Command::new(bin()) .args([ @@ -731,8 +797,6 @@ fn top_level_run_infers_codex_from_model() { "xhigh", "--cd", root.to_str().unwrap(), - "--codexctl-bin", - codexctl.to_str().unwrap(), "--codex-bin", codex.to_str().unwrap(), ]) @@ -852,9 +916,9 @@ fn models_json_reports_permission_capabilities() { let root = temp_root("models-permissions"); let bin_dir = root.join("bin"); fs::create_dir_all(&bin_dir).unwrap(); - let codexctl = bin_dir.join("codexctl"); + let codex = bin_dir.join("codex"); let claude = bin_dir.join("claude"); - fake_codexctl(&codexctl); + fake_codex_appserver(&codex); fake_claude(&claude); let output = Command::new(bin()) @@ -863,8 +927,8 @@ fn models_json_reports_permission_capabilities() { 
"--json", "--cd", root.to_str().unwrap(), - "--codexctl-bin", - codexctl.to_str().unwrap(), + "--codex-bin", + codex.to_str().unwrap(), "--claude-bin", claude.to_str().unwrap(), ]) @@ -922,20 +986,16 @@ fn bamboo_blocked_run_emits_single_json_object() { } #[test] -fn codex_runtime_exec_resume_observe_and_stop_with_fake_codexctl() { +fn codex_runtime_exec_resume_observe_and_stop_with_fake_appserver() { let root = temp_root("codex"); let bin_dir = root.join("bin"); fs::create_dir_all(&bin_dir).unwrap(); - let codexctl = bin_dir.join("codexctl"); let codex = bin_dir.join("codex"); - fake_codexctl(&codexctl); - fake_codex(&codex); + fake_codex_appserver(&codex); let common = [ "--cd", root.to_str().unwrap(), - "--codexctl-bin", - codexctl.to_str().unwrap(), "--codex-bin", codex.to_str().unwrap(), ]; @@ -950,13 +1010,16 @@ fn codex_runtime_exec_resume_observe_and_stop_with_fake_codexctl() { String::from_utf8_lossy(&output.stderr) ); let exec: serde_json::Value = serde_json::from_slice(&output.stdout).unwrap(); - assert_eq!(exec["record"]["run_id"], "run_fake"); - assert_eq!(exec["summary"]["last_agent_message"], "implemented"); - assert_eq!(exec["start"]["action"], "start"); - assert_eq!(exec["execute"]["action"], "execute"); - assert!(exec.get("raw").is_none()); - assert!(exec["execute"].get("stdout").is_none()); - assert!(exec["execute"].get("stdout_tail").is_some()); + assert_eq!(exec["ok"], true); + assert_eq!(exec["state"], "completed"); + assert_eq!(exec["record"]["thread_id"], "thread_fake"); + assert!( + exec["summary"]["last_agent_message"] + .as_str() + .unwrap() + .starts_with("implemented:") + ); + assert!(exec["summary"]["usage"]["total"]["input_tokens"].is_number()); let output = Command::new(bin()) .args(["codex", "status", "--session", "latest"]) @@ -965,13 +1028,13 @@ fn codex_runtime_exec_resume_observe_and_stop_with_fake_codexctl() { .unwrap(); assert!(output.status.success()); let status: serde_json::Value = 
serde_json::from_slice(&output.stdout).unwrap(); - assert!(status.get("stdout").is_none()); - assert_eq!(status["summary"]["last_agent_message"], "fake read"); + assert_eq!(status["ok"], true); + assert_eq!(status["state"], "completed"); assert!( - status["output_tail"] + status["summary"]["last_agent_message"] .as_str() .unwrap() - .contains("fake read") + .starts_with("implemented:") ); let output = Command::new(bin()) @@ -981,12 +1044,10 @@ fn codex_runtime_exec_resume_observe_and_stop_with_fake_codexctl() { .unwrap(); assert!(output.status.success()); let logs: serde_json::Value = serde_json::from_slice(&output.stdout).unwrap(); - assert!(logs.get("stdout").is_none()); - assert_eq!(logs["summary"]["last_agent_message"], "fake read"); + assert_eq!(logs["ok"], true); + assert!(logs["log_tail"].as_str().unwrap().contains("turn/completed")); for args in [ - vec!["codex", "status", "--session", "latest"], - vec!["codex", "logs", "--session", "latest", "--json"], vec!["codex", "artifacts", "--session", "latest"], vec![ "codex", @@ -999,11 +1060,7 @@ fn codex_runtime_exec_resume_observe_and_stop_with_fake_codexctl() { "xhigh", ], ] { - let output = Command::new(bin()) - .args(args) - .args(common) - .output() - .unwrap(); + let output = Command::new(bin()).args(args).args(common).output().unwrap(); assert!( output.status.success(), "{}", @@ -1029,20 +1086,17 @@ fn codex_runtime_exec_resume_observe_and_stop_with_fake_codexctl() { String::from_utf8_lossy(&output.stderr) ); let resume: serde_json::Value = serde_json::from_slice(&output.stdout).unwrap(); - assert_eq!(resume["summary"]["last_agent_message"], "implemented"); - assert_eq!(resume["send"]["action"], "send"); - assert_eq!(resume["execute"]["action"], "execute"); - assert!(resume.get("raw").is_none()); + assert_eq!(resume["state"], "completed"); + assert!( + resume["summary"]["last_agent_message"] + .as_str() + .unwrap() + .contains("continue") + ); let output = Command::new(bin()) .args([ - "codex", - "answer", 
- "--session", - "latest", - "--choice", - "1", - "--wait", + "codex", "answer", "--session", "latest", "--text", "go", "--wait", ]) .args(common) .output() @@ -1053,21 +1107,31 @@ fn codex_runtime_exec_resume_observe_and_stop_with_fake_codexctl() { String::from_utf8_lossy(&output.stderr) ); let answer: serde_json::Value = serde_json::from_slice(&output.stdout).unwrap(); - assert_eq!(answer["action"], "answer"); - assert_eq!(answer["summary"]["last_agent_message"], "answered"); + assert_eq!(answer["state"], "completed"); + assert!( + answer["summary"]["last_agent_message"] + .as_str() + .unwrap() + .contains("go") + ); + + let output = Command::new(bin()) + .args(["codex", "models"]) + .args(common) + .output() + .unwrap(); + assert!(output.status.success()); + let models: serde_json::Value = serde_json::from_slice(&output.stdout).unwrap(); + assert_eq!(models["ok"], true); + assert_eq!(models["models"][0]["id"], "gpt-5.5"); for args in [ vec!["codex", "interrupt", "--session", "latest"], vec!["codex", "stop", "--session", "latest"], vec!["codex", "list"], - vec!["codex", "models"], vec!["codex", "doctor"], ] { - let output = Command::new(bin()) - .args(args) - .args(common) - .output() - .unwrap(); + let output = Command::new(bin()).args(args).args(common).output().unwrap(); assert!( output.status.success(), "{}", @@ -1078,6 +1142,411 @@ fn codex_runtime_exec_resume_observe_and_stop_with_fake_codexctl() { fs::remove_dir_all(root).unwrap(); } +#[test] +fn codex_status_is_visible_while_start_is_running() { + let root = temp_root("codex-start-visible"); + let bin_dir = root.join("bin"); + fs::create_dir_all(&bin_dir).unwrap(); + let codex = bin_dir.join("codex"); + fake_codex_appserver(&codex); + + let common = [ + "--cd", + root.to_str().unwrap(), + "--codex-bin", + codex.to_str().unwrap(), + ]; + let child = Command::new(bin()) + .env("FAKE_TURN_SLEEP", "2") + .args(["codex", "exec", "--task", "slow start", "--session", "slow"]) + .args(common) + 
.stdout(Stdio::piped()) + .stderr(Stdio::piped()) + .spawn() + .unwrap(); + + let record_path = root.join(".pandacode/sessions/codex/slow.json"); + let mut running_seen = false; + for _ in 0..100 { + if record_path.exists() + && let Ok(text) = fs::read_to_string(&record_path) + && let Ok(record) = serde_json::from_str::<serde_json::Value>(&text) + { + let status = record["artifacts"]["status"].as_str().unwrap_or(""); + if status == "starting" || status == "running" { + running_seen = true; + break; + } + } + thread::sleep(Duration::from_millis(50)); + } + assert!( + running_seen, + "codex session record should be visible while the turn is still running" + ); + + let output = Command::new(bin()) + .args(["codex", "status", "--session", "slow"]) + .args(common) + .output() + .unwrap(); + assert!( + output.status.success(), + "{}", + String::from_utf8_lossy(&output.stderr) + ); + let status: serde_json::Value = serde_json::from_slice(&output.stdout).unwrap(); + assert!(matches!( + status["state"].as_str().unwrap(), + "starting" | "running" | "completed" + )); + + let output = Command::new(bin()) + .args(["list", "--cd", root.to_str().unwrap()]) + .output() + .unwrap(); + assert!(output.status.success()); + let list = String::from_utf8_lossy(&output.stdout); + assert!(list.contains("codex: 1"), "{list}"); + + let output = child.wait_with_output().unwrap(); + assert!( + output.status.success(), + "{}", + String::from_utf8_lossy(&output.stderr) + ); + let exec: serde_json::Value = serde_json::from_slice(&output.stdout).unwrap(); + assert_eq!(exec["state"], "completed"); + + fs::remove_dir_all(root).unwrap(); +} + +#[test] +fn codex_orphaned_appserver_is_reaped_on_next_run() { + let root = temp_root("codex-orphan-reap"); + let bin_dir = root.join("bin"); + fs::create_dir_all(&bin_dir).unwrap(); + let codex = bin_dir.join("codex"); + fake_codex_appserver(&codex); + let common = [ + "--cd", + root.to_str().unwrap(), + "--codex-bin", + codex.to_str().unwrap(), + ]; + + let mut child = 
Command::new(bin()) + .env("FAKE_TURN_SLEEP", "30") + .args(["codex", "exec", "--task", "long turn", "--session", "orphan"]) + .args(common) + .stdout(Stdio::piped()) + .stderr(Stdio::piped()) + .spawn() + .unwrap(); + + let record_path = root.join(".pandacode/sessions/codex/orphan.json"); + let mut turn_running = false; + for _ in 0..100 { + if let Ok(text) = fs::read_to_string(&record_path) + && let Ok(record) = serde_json::from_str::<serde_json::Value>(&text) + && record["artifacts"]["status"] == "running" + { + turn_running = true; + break; + } + thread::sleep(Duration::from_millis(50)); + } + assert!(turn_running, "exec should reach the running state"); + thread::sleep(Duration::from_millis(300)); + + let pid_dir = root.join(".pandacode/codex/appserver-pids"); + let appserver_pid = fs::read_dir(&pid_dir) + .unwrap() + .flatten() + .next() + .and_then(|entry| entry.file_name().to_str().and_then(|s| s.parse::<u32>().ok())) + .expect("app-server pid file should exist while the turn runs"); + + // SIGKILL pandacode so destructors never run; the app-server is orphaned.
+ child.kill().unwrap(); + child.wait().unwrap(); + thread::sleep(Duration::from_millis(300)); + let alive = Command::new("/bin/kill") + .args(["-0", &appserver_pid.to_string()]) + .status() + .unwrap() + .success(); + assert!(alive, "fake app-server should still be alive after pandacode dies"); + + let output = Command::new(bin()) + .args(["codex", "doctor"]) + .args(common) + .output() + .unwrap(); + assert!( + output.status.success(), + "{}", + String::from_utf8_lossy(&output.stderr) + ); + + let mut reaped = false; + for _ in 0..40 { + let alive = Command::new("/bin/kill") + .args(["-0", &appserver_pid.to_string()]) + .status() + .unwrap() + .success(); + if !alive { + reaped = true; + break; + } + thread::sleep(Duration::from_millis(50)); + } + assert!( + reaped, + "orphaned app-server should be killed by the next pandacode codex command" + ); + assert!(!pid_dir.join(appserver_pid.to_string()).exists()); + + fs::remove_dir_all(root).unwrap(); +} + +#[test] +fn codex_detached_exec_answers_structurally_in_same_turn() { + let root = temp_root("codex-detach-answer"); + let bin_dir = root.join("bin"); + fs::create_dir_all(&bin_dir).unwrap(); + let codex = bin_dir.join("codex"); + fake_codex_appserver(&codex); + let common = [ + "--cd", + root.to_str().unwrap(), + "--codex-bin", + codex.to_str().unwrap(), + ]; + + let output = Command::new(bin()) + .env("FAKE_REQUEST_USER_INPUT", "1") + .args([ + "codex", "exec", "--detach", "--session", "bg", "--task", "ask then continue", + ]) + .args(common) + .output() + .unwrap(); + assert!( + output.status.success(), + "{}", + String::from_utf8_lossy(&output.stderr) + ); + let exec: serde_json::Value = serde_json::from_slice(&output.stdout).unwrap(); + assert_eq!(exec["detached"], true); + assert_eq!(exec["state"], "running"); + + let record_path = root.join(".pandacode/sessions/codex/bg.json"); + let mut waiting = false; + for _ in 0..100 { + if let Ok(text) = fs::read_to_string(&record_path) + && let Ok(record) = 
serde_json::from_str::<serde_json::Value>(&text) + && record["artifacts"]["status"] == "waiting_for_user" + { + assert_eq!(record["artifacts"]["pending_questions"][0]["id"], "q1"); + waiting = true; + break; + } + thread::sleep(Duration::from_millis(100)); + } + assert!(waiting, "detached worker should reach waiting_for_user"); + + let output = Command::new(bin()) + .args(["codex", "answer", "--session", "bg", "--choice", "2", "--wait"]) + .args(common) + .output() + .unwrap(); + assert!( + output.status.success(), + "{}", + String::from_utf8_lossy(&output.stderr) + ); + let answer: serde_json::Value = serde_json::from_slice(&output.stdout).unwrap(); + assert_eq!(answer["answer_mode"], "structured"); + assert_eq!(answer["state"], "completed"); + let message = answer["summary"]["last_agent_message"].as_str().unwrap(); + assert!(message.starts_with("answered:"), "{message}"); + assert!(message.contains("stop here"), "{message}"); + + fs::remove_dir_all(root).unwrap(); +} + +#[test] +fn codex_request_user_input_pauses_and_answer_resumes() { + let root = temp_root("codex-answer-pending"); + let bin_dir = root.join("bin"); + fs::create_dir_all(&bin_dir).unwrap(); + let codex = bin_dir.join("codex"); + fake_codex_appserver(&codex); + + let common = [ + "--cd", + root.to_str().unwrap(), + "--codex-bin", + codex.to_str().unwrap(), + "--json", + ]; + let output = Command::new(bin()) + .env("FAKE_REQUEST_USER_INPUT", "1") + .args([ + "codex", + "exec", + "--session", + "needs-input", + "--task", + "ask then continue", + ]) + .args(common) + .output() + .unwrap(); + assert!( + output.status.success(), + "{}", + String::from_utf8_lossy(&output.stderr) + ); + let exec: serde_json::Value = serde_json::from_slice(&output.stdout).unwrap(); + assert_eq!(exec["state"], "waiting_for_user"); + assert_eq!( + exec["pending_user_input"]["questions"][0]["question"], + "How should this continue?"
+ ); + + let output = Command::new(bin()) + .args([ + "codex", + "answer", + "--session", + "needs-input", + "--choice", + "2", + "--wait", + ]) + .args(common) + .output() + .unwrap(); + assert!( + output.status.success(), + "{}", + String::from_utf8_lossy(&output.stderr) + ); + let answer: serde_json::Value = serde_json::from_slice(&output.stdout).unwrap(); + assert_eq!(answer["ok"], true); + assert_eq!(answer["state"], "completed"); + let message = answer["summary"]["last_agent_message"].as_str().unwrap(); + assert!(message.ends_with("stop here"), "{message}"); + + let record = + fs::read_to_string(root.join(".pandacode/sessions/codex/needs-input.json")).unwrap(); + let record: serde_json::Value = serde_json::from_str(&record).unwrap(); + assert!(record["artifacts"]["pending_questions"].is_null()); + + fs::remove_dir_all(root).unwrap(); +} + +#[test] +fn wait_fast_fails_on_unknown_session() { + let root = temp_root("wait-fastfail"); + let output = Command::new(bin()) + .args([ + "wait", + "--session", + "never-launched", + "--timeout-ms", + "60000", + "--interval-ms", + "500", + "--cd", + root.to_str().unwrap(), + ]) + .output() + .unwrap(); + assert!(!output.status.success(), "unknown session must fail"); + let wait: serde_json::Value = serde_json::from_slice(&output.stdout).unwrap(); + assert_eq!(wait["state"], "missing_session"); + assert_eq!(wait["missing_sessions"][0], "never-launched"); + // Must fail fast (grace window ~10s), not wait the full 60s timeout. 
+ assert!((wait["elapsed_ms"].as_u64().unwrap()) < 30_000); + fs::remove_dir_all(root).unwrap(); +} + +#[test] +fn wait_flags_missing_reports_and_claude_detach_runs_in_background() { + let root = temp_root("wait-no-report"); + let bin_dir = root.join("bin"); + let state = root.join("tmux-state"); + fs::create_dir_all(&bin_dir).unwrap(); + fs::create_dir_all(&state).unwrap(); + let tmux = bin_dir.join("tmux"); + let claude = bin_dir.join("claude"); + fake_tmux(&tmux, &state); + fake_claude(&claude); + let common = [ + "--cd", + root.to_str().unwrap(), + "--tmux-bin", + tmux.to_str().unwrap(), + "--claude-bin", + claude.to_str().unwrap(), + ]; + + let output = Command::new(bin()) + .args([ + "claude", + "exec", + "--detach", + "--session", + "lane1", + "--task", + "review and write a report", + "--expect-artifact", + "result/lane1.md", + ]) + .args(common) + .output() + .unwrap(); + assert!( + output.status.success(), + "{}", + String::from_utf8_lossy(&output.stderr) + ); + let exec: serde_json::Value = serde_json::from_slice(&output.stdout).unwrap(); + assert_eq!(exec["detached"], true); + assert_eq!(exec["state"], "running"); + + let output = Command::new(bin()) + .args([ + "wait", + "--session", + "lane1", + "--expect-artifact", + "result/lane1.md", + "--timeout-ms", + "30000", + "--interval-ms", + "300", + "--cd", + root.to_str().unwrap(), + ]) + .output() + .unwrap(); + assert!( + !output.status.success(), + "wait should fail when the report is missing" + ); + let wait: serde_json::Value = serde_json::from_slice(&output.stdout).unwrap(); + assert_eq!(wait["ok"], false); + assert_eq!(wait["sessions"]["lane1"]["state"], "no_report"); + assert_eq!(wait["missing_artifacts"][0], "result/lane1.md"); + + fs::remove_dir_all(root).unwrap(); +} + #[test] fn claude_runtime_exec_resume_observe_and_stop_with_fake_tmux() { let root = temp_root("claude"); @@ -1122,7 +1591,10 @@ fn claude_runtime_exec_resume_observe_and_stop_with_fake_tmux() { assert!(output.status.success()); 
let status: serde_json::Value = serde_json::from_slice(&output.stdout).unwrap(); assert!(status.get("capture").is_none()); - assert_eq!(status["state"], "stopped"); + // Top-level state mirrors the recorded outcome (matches codex + wait); + // the dead tmux session shows up under live_state. + assert_eq!(status["state"], "completed"); + assert_eq!(status["live_state"], "stopped"); for args in [ vec!["claude", "status", "--session", "main"], diff --git a/spec/goal/20260604-230837-odw-codex-dogfood.codexctl-review.md b/spec/goal/20260604-230837-odw-codex-dogfood.codexctl-review.md new file mode 100644 index 0000000..32996e3 --- /dev/null +++ b/spec/goal/20260604-230837-odw-codex-dogfood.codexctl-review.md @@ -0,0 +1,42 @@ +# Read-Only Review Prompt: ODW Codex Dogfood Fixes + +You are reviewing local uncommitted changes in `/Users/Zhuanz/workspace/odw-oss`. + +Review stance: +- Be strict. Report only concrete bugs, regressions, false positives/negatives, missing tests, or unsafe assumptions. +- Do not edit files. +- Focus on changed code and committed-intent artifacts, not style preference. + +Context: +- A real ODW PandaCode run dogfooded 36 nodes in an isolated workspace: 33 Codex nodes succeeded, 3 Bamboo domestic-model nodes failed due missing API keys. +- The real run also revealed duplicate `state.agents[*].index` values for concurrent nodes because completed agents read the global `agentIndex`. +- Implemented fixes: + - `odw/src/pack/templates/runtime/odw-js-runner.mjs` now captures a local `index` per `agent()` call. + - Bamboo nodes now run `bambooApiKeyPreflight(...)` before writing prompt files or spawning PandaCode, returning `ok:false`, `state:"blocked"`, and `error.category:"bamboo_missing_api_key"` when no env/config key is available. + - PandaCode raw report writes now receive fallback `session`/`label` context to avoid report filename collisions when downstream reports omit session. 
+ - `odw/scripts/selftest.mjs` fake Bamboo success-path tests now set `PANDACODE_BAMBOO_API_KEY=fake-key`. + - Goal/spec evidence lives under `spec/goal/`. + +Please inspect: +- `git diff -- odw/src/pack/templates/runtime/odw-js-runner.mjs odw/scripts/selftest.mjs odw/src/main.rs spec/goal` +- Relevant existing tests in `odw/scripts/selftest.mjs` and `odw/tests/parity_selftest.rs` + +Validation already run: +- `cargo fmt --check` passed. +- `npm test` passed in `/tmp/odw-dogfood-isolated-wUVV4h`. +- `cargo run -p open-dynamic-workflow -- exec --path /tmp/odw-dogfood-isolated-wUVV4h --script spec/goal/odw-bamboo-preflight-smoke.js --backend pandacode --json` passed with structured blocked result and no prompt/raw report. +- `cargo run -p open-dynamic-workflow -- exec --path /tmp/odw-dogfood-isolated-wUVV4h --script spec/goal/odw-parallel-index-smoke.js --backend mock --json` passed with indexes `1,2,3,4,5`. +- Full mock dogfood workflow passed and had no duplicate indexes. +- `cargo test` passed. +- `cargo clippy --workspace --all-targets -- -D warnings` passed. + +Questions: +1. Can Bamboo preflight incorrectly block a valid configured Bamboo run? +2. Can Bamboo preflight incorrectly allow an invalid run that should be blocked earlier? +3. Are there remaining paths where agent state/events still use global `agentIndex` instead of the local call index? +4. Does raw report fallback create stable unique paths without hiding real sessions? +5. Are selftest changes sufficient, or is there a missing regression test for the new preflight behavior? + +Return: +- Findings first, with file/line references and severity. +- If no findings, say so clearly and note residual risk. 
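The duplicate-index finding described in this review prompt can be illustrated with a minimal JavaScript sketch. This is not the code from `odw-js-runner.mjs`: `agentBuggy`, `agentFixed`, `runAll`, and `sleep` are hypothetical names, and the module-level counter only stands in for the global `agentIndex` mentioned in the Context list.

```javascript
// Hypothetical sketch: why reading a shared counter after an await
// produces duplicate indexes, and how a per-call local index avoids it.
let agentIndex = 0; // stands in for the runner's global counter (assumption)

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Buggy shape: the counter is bumped at call time, but the recorded
// index is read from the global only after the async work completes.
async function agentBuggy(state, label, ms) {
  agentIndex += 1;
  await sleep(ms);
  state.push({ label, index: agentIndex }); // sees the latest global value
}

// Fixed shape: capture the index in a local before any await.
async function agentFixed(state, label, ms) {
  const index = ++agentIndex; // local to this call
  await sleep(ms);
  state.push({ label, index });
}

// Start three concurrent calls and return the recorded indexes, sorted.
async function runAll(agent) {
  agentIndex = 0;
  const state = [];
  const labels = ["a", "b", "c"];
  await Promise.all(labels.map((label, i) => agent(state, label, 10 * (i + 1))));
  return state.map((entry) => entry.index).sort((x, y) => x - y);
}
```

Under these assumptions, `runAll(agentBuggy)` resolves to `[3, 3, 3]` because all three calls increment the counter before any of them resumes, while `runAll(agentFixed)` resolves to `[1, 2, 3]`. Run the two one after the other, since `runAll` resets the shared counter.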
diff --git a/spec/goal/20260604-230837-odw-codex-dogfood.goal.md b/spec/goal/20260604-230837-odw-codex-dogfood.goal.md new file mode 100644 index 0000000..c5b9098 --- /dev/null +++ b/spec/goal/20260604-230837-odw-codex-dogfood.goal.md @@ -0,0 +1,517 @@ +# Goal Spec: ODW Codex backend dogfood + +Created: 2026-06-04 23:08:37 CST +Spec file: `spec/goal/20260604-230837-odw-codex-dogfood.goal.md` +Expected working directory: `/Users/Zhuanz/workspace/odw-oss` + +## Goal Command + +```text +/goal Read and execute @spec/goal/20260604-230837-odw-codex-dogfood.goal.md as the source of truth. +Expected working directory: /Users/Zhuanz/workspace/odw-oss. If the run starts from another directory, resolve the Goal file path before continuing. +At the start of the run, after every context compact/compaction, after every resume, and whenever the next action is uncertain, reopen and reread @spec/goal/20260604-230837-odw-codex-dogfood.goal.md before continuing. +Follow the Outcome, Done Criteria, Dynamic Scope, Plan, Self-Validation Harness, Iteration Policy, Stop Conditions, and Final Report Requirements in that file. +Mandatory gates: write/confirm a plan before implementation, implement self-validation, make atomic git commits for completed milestones, and run a codexctl read-only review before final completion. +Keep the Progress Log, Goal History, Decision Log, and Evidence Log updated in the file when meaningful progress, decisions, or validation results occur. +Work in phases: discover, plan, implement, self-validate, review, iterate or pivot, then final audit. +Stop only as Complete, Blocked, or Budget-limited according to the spec; do not loop after completion or after a stop condition is reached. +``` + +## Artifact Boundaries + +- This `.goal.md` is the source-of-truth Goal contract and execution history. 
+- Progress, decisions, evidence, atomic commits, codexctl review findings, compact/resume rereads, blocked reasons, and final status updates must be written back to this `.goal.md`. +- `*.codexctl-review.md` files are review prompts only. They must not redefine the Goal objective, Done Criteria, scope, or stop conditions. +- The dogfood workflow, comparison scripts, generated reports, and validation artifacts may live under `spec/goal/` or `odw/docs/examples/` only when they are intentionally committed as reusable evidence or regression coverage. + +## Original Request + +User intent, translated from the user's original wording and preserving its direction: + +> No, do it for real: wire Codex in as the backend and run a complex task, extremely complex, multi-step, parallel and serial, then see whether it improves results compared with using codexctl directly. Write a goal md so that you experience iteration after iteration, at least 30 rounds. Modify owd, and remember to commit the code. + +Interpreted correction: `owd` means `odw`. + +## Rough Goal Brief + +Final target: +- Real-dogfood ODW's `--backend pandacode` Codex path with a complex multi-step parallel and serial workflow, compare the user experience and evidence against direct `codexctl`, implement any high-value ODW improvements discovered, validate them, and commit the changes. +- Extend the dogfood to include Bamboo domestic-model lanes: high-quality domestic models for entry/planning and exit/review, plus low-cost or weaker domestic models for execution lanes when credentials are available. + +Highest-ROI route: +- Build a bounded dogfood workflow that forces at least 30 real Codex-backed ODW node invocations or recorded iteration events, with `parallel`, `pipeline` or equivalent fanout, resume/report inspection, and structured result logging; run a direct `codexctl` baseline for a comparable task; convert observed ODW friction into a small code or documentation improvement with tests. +- Run the dogfood in an isolated git workspace rather than the ODW source tree, so workers can safely read/write without risking source changes. Use the ODW repo only for the workflow harness, evidence logs, and eventual product improvement.
+ +Project evidence used: +- `README.md` describes ODW orchestration and PandaCode execution split. +- `odw/README.md` documents zero-install `odw exec`, reports, workflow API, and `pandacode` backend. +- `odw/src/main.rs` defines `doctor`, `exec`, `report`, `runs`, and the `--backend mock|pandacode` CLI. +- `odw/src/pack/templates/runtime/odw-js-runner.mjs` contains workflow runtime behavior, agent caching, schema handling, parallel/fanout helpers, budget handling, and PandaCode dispatch. +- `pandacode/src/runtimes/codex.rs` shows Codex execution through `codexctl session start/execute`, per-session sockets, logs, resume, answer, status, and daemon cleanup. +- `odw/examples/07-parallel-review-apply.js` is the existing complex starter for parallel Codex worktrees, review gate, repair, landing, and verification. +- `cargo test`, `cargo fmt --check`, and `cargo clippy --workspace --all-targets -- -D warnings` already passed before this Goal was written. + +Validation focus: +- Prove real Codex execution through ODW, not only mock mode, by recording ODW run ids, event counts, node count, runtime/model data, logs, report path, and elapsed/error behavior. +- Prove domestic-model coverage by recording Bamboo provider-key discovery and, if keys exist, at least one high-quality entry/exit Bamboo run and one low-cost execution Bamboo run. If keys are absent, record the exact `pandacode bamboo doctor` and attempted run failure as a blocked external-capability result. +- Prove the comparison to direct `codexctl` with a separate read-only baseline command and concise findings. +- Prove code changes through targeted tests plus full workspace checks. + +Dynamic scope: +- The executing agent may inspect and change ODW orchestration, ODW runtime template, ODW examples/docs, and PandaCode Codex integration only when evidence from dogfood runs points there. +- It may add a reusable smoke/evidence script or dogfood workflow if that provides durable validation. 
+ +Stop boundaries: +- Stop as Blocked if Codex account quota, codexctl, Node, pandacode, git worktree safety, or clean commit requirements prevent real execution. +- Stop as Budget-limited if completing 30 real Codex-backed turns would consume disproportionate quota after clear evidence has already identified the ODW issue and the user has not authorized more spend. +- Stop before broad architecture changes, public API changes, auth/account changes, provider pricing changes, or unrelated PandaCode runtime refactors. + +## Clarification Answers + +Questions asked: +- None. The request was explicit enough to proceed: create a Goal spec, real-run ODW with Codex backend, do a complex multi-step parallel/serial dogfood run, compare against direct `codexctl`, modify ODW if evidence supports it, and commit code. + +User answers: +- The user explicitly asked to proceed with real Codex-backed work instead of only static checks or mock runs. + +Assumptions made: +- `owd` refers to `odw`. +- "At least 30 rounds" (the user's "至少 30 轮") means at least 30 counted dogfood iterations, where a counted round is an ODW-recorded real Codex-backed `agent()` node invocation or an execution-log iteration generated by the dogfood harness. Mock-only nodes do not count. +- The comparison target is practical user experience and operational evidence: reliability, observability, resumability, task decomposition, reporting, setup friction, and whether ODW improves orchestration over direct `codexctl` for multi-node work. +- Code should be committed in the existing Git repository at `/Users/Zhuanz/workspace/odw-oss`. + +Open questions that should stop execution if blocking: +- Whether the user is willing to spend significant model quota if the first real run suggests 30 full Codex coding nodes would be excessive. If quota or rate limits appear, stop as Budget-limited with partial evidence instead of silently reducing real coverage. +- Whether Bamboo provider API keys can be provided if live domestic-model trials are mandatory.
Current environment discovery found no `DEEPSEEK_API_KEY`, `KIMI_API_KEY`, `QWEN_API_KEY`, `ZHIPU_API_KEY`, `MINIMAX_API_KEY`, `XIAOMI_API_KEY`, `STEPFUN_API_KEY`, or `PANDACODE_BAMBOO_API_KEY`. + +## Project Recon And Plan Basis + +Repository/root or artifact location inspected: +- `/Users/Zhuanz/workspace/odw-oss` + +Project instructions read: +- No `AGENTS.md` was found within this repository using `rg --files -g 'AGENTS.md'`. +- Read top-level `README.md`, `odw/README.md`, `pandacode/README.md`, the before-goal skill instructions, and the Goal spec template. + +Key files, directories, modules, APIs, tests, configs, or scripts inspected: +- `Cargo.toml`, `odw/Cargo.toml`, `pandacode/Cargo.toml` +- `odw/src/main.rs` +- `odw/src/pack/templates/runtime/odw-js-runner.mjs` +- `pandacode/src/runtimes/codex.rs` +- `odw/examples/01-single-node.js` +- `odw/examples/07-parallel-review-apply.js` +- `odw/tests/parity_selftest.rs` +- `pandacode/tests/fake_runtimes.rs` + +Discovery commands run: +- `git status --short` +- `rg --files -g '!target/**' -g '!.odw/**'` +- `rg --files -g 'AGENTS.md'` +- `sed -n` reads of the README, CLI, runtime template, Codex runtime, and example workflows +- `odw --help`, `pandacode --help`, `odw doctor --json`, `pandacode doctor --json` +- `cargo test` +- `cargo fmt --check` +- `cargo clippy --workspace --all-targets -- -D warnings` +- `odw exec --backend mock` and `odw report --script` on `odw/examples/01-single-node.js` + +Existing validators and commands discovered: +- `cargo test` +- `cargo fmt --check` +- `cargo clippy --workspace --all-targets -- -D warnings` +- `odw exec --script --backend mock --json` +- `odw exec --script --backend pandacode --json` +- `odw report --script ` and `odw report --run latest` +- `odw runs list`, `odw runs show latest` +- `pandacode doctor --json`, `pandacode models --json` +- `codexctl plan`, `codexctl session start/execute/read/list/status` +- `pandacode bamboo doctor --json` and `pandacode bamboo models 
--json` + +Existing test coverage and likely gaps: +- Rust unit tests cover ODW run journals, resume helper exposure, pruning, spec/capability surfaces, schema-free flexible nodes, and direct runner import blocking. +- PandaCode unit and fake runtime tests cover Codex/Claude/Bamboo command construction, sessions, answers, logs, stop/timeout behavior, prompt transport, provider inference, permissions, and agent tools. +- Gap: there is no committed high-stress real Codex dogfood harness that counts 30 ODW rounds, records observability artifacts, and compares the operator experience against direct `codexctl`. +- Gap: Bamboo can enumerate domestic models, but live domestic-model execution is configuration-blocked without provider API keys in this shell. + +Relevant constraints, risky areas, and pre-existing unrelated changes: +- `git status --short` was clean before creating this spec. +- Real Codex use can consume quota and time. +- Bamboo live runs currently require missing provider keys; attempts should be recorded, then skipped or marked blocked rather than retried wastefully. +- ODW `--timeout` floors Codex nodes at 600 seconds in real coding paths, so long real nodes can overrun a short workflow timeout expectation. +- Worktree and path boundary logic must avoid `.git`, `.odw`, `.pandacode`, `node_modules`, absolute paths, and path escapes. +- Do not stage unrelated changes or generated transient `.odw` / `.pandacode` state unless intentionally committed evidence requires it. + +Plan basis: +- The highest-ROI path is not a large architectural rewrite. It is a real ODW Codex dogfood run in an isolated workspace designed to expose practical friction, with Bamboo lanes attempted according to available credentials, followed by a small improvement and durable validation. Existing tests are already strong for mocked/fake behavior; the missing evidence is real multi-node Codex orchestration, domestic-model capability gating, and operator comparison. 
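The credential gap recorded above (no provider keys in the shell, so Bamboo lanes are configuration-blocked) is the kind of condition a preflight can report before any executor is dispatched. A minimal JavaScript sketch follows, assuming a hypothetical `bambooKeyPreflight` helper and an assumed provider-to-variable mapping; the real `bambooApiKeyPreflight(...)` in `odw-js-runner.mjs` may look quite different, and how it treats unknown providers or config-file keys is not modeled here. Only the blocked result fields (`ok:false`, `state:"blocked"`, `error.category:"bamboo_missing_api_key"`) come from the review prompt in this change.

```javascript
// Hypothetical sketch of a missing-key preflight; not the runner's code.
// The provider-to-variable mapping is an assumption for illustration.
const PROVIDER_ENV = {
  deepseek: "DEEPSEEK_API_KEY",
  kimi: "KIMI_API_KEY",
  qwen: "QWEN_API_KEY",
};
const SHARED_ENV = "PANDACODE_BAMBOO_API_KEY";

// Returns { ok: true, keySource } when a non-empty key is present in `env`,
// otherwise a structured blocked result the caller can log and skip on.
function bambooKeyPreflight(provider, env) {
  const specific = PROVIDER_ENV[provider];
  const candidates = specific ? [specific, SHARED_ENV] : [SHARED_ENV];
  const found = candidates.find(
    (name) => typeof env[name] === "string" && env[name].trim() !== ""
  );
  if (found) {
    return { ok: true, keySource: found };
  }
  return {
    ok: false,
    state: "blocked",
    error: { category: "bamboo_missing_api_key", checked: candidates },
  };
}
```

A caller would typically pass `process.env` as the second argument and return the blocked result as the node outcome instead of writing prompt files or spawning the executor.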
+ +## Outcome + +ODW is improved or explicitly validated based on a real isolated-workspace dogfood run: at least 30 counted ODW/Codex rounds are evidenced, domestic Bamboo high-quality and low-cost lanes are attempted and either executed or blocked with exact credential evidence, a direct `codexctl` baseline is recorded, any high-value ODW friction found is addressed with scoped code/docs/tests, all required validators pass, an independent `codexctl` review is completed, and the final changes are committed atomically. + +## Reasoning Brief + +- Interpreted intent: the user wants a real, high-stress ODW experience report and practical product improvement, not a superficial mock check. +- Assumptions: `owd` is `odw`; counted rounds must be real enough to exercise the ODW-to-PandaCode-to-Codex path; domestic-model lanes are real only when provider credentials exist; the objective includes code commits when changes are made. +- Dynamic scope choices: start with a dogfood workflow and comparison script; only change runtime/CLI/docs/tests when evidence shows real friction. +- Main risks: model quota, long-running Codex nodes, noisy generated state, accidental unrelated commits, and conflating ODW orchestration improvements with PandaCode runtime internals. +- Strategy: instrument and run first, improve second, validate and review third, commit only coherent scoped changes. + +## Dynamic Scope And Boundaries + +Mission scope: +- Evaluate and improve ODW as a Codex-backed workflow orchestrator for complex parallel and serial tasks. +- Exercise ODW in a separate isolated project workspace so the dogfood workload can be complex and write-capable without touching the ODW source tree. 
+ +In scope: +- `odw/src/main.rs` +- `odw/src/pack/templates/runtime/odw-js-runner.mjs` +- `odw/src/guide.md` +- `odw/README.md` +- `odw/examples/` +- `odw/docs/examples/` +- `odw/tests/` +- `pandacode/src/runtimes/codex.rs` only when direct ODW evidence points to a Codex-runtime integration issue. +- `pandacode/tests/fake_runtimes.rs` only if a PandaCode-side fix is required. +- `spec/goal/20260604-230837-odw-codex-dogfood.goal.md` as execution log. +- A temporary isolated git workspace under `/tmp` or `/private/var/folders/...` for live dogfood runs. +- A task-specific `spec/goal/20260604-230837-odw-codex-dogfood.codexctl-review.md`. + +Out of scope: +- Changing public runtime semantics without test coverage. +- Broad refactors unrelated to dogfood evidence. +- Provider catalog/pricing updates. +- Claude or Bamboo runtime changes unless they are touched by shared code required for the ODW Codex path. +- Modifying user account config, Codex auth, Claude auth, MCP config, billing, or global shell profile. +- Fabricating Bamboo success when provider keys are missing. +- Committing transient `.odw/runs`, `.pandacode`, `target`, or temp artifacts unless deliberately curated as small documentation fixtures. + +Discovery boundary: +- Search code/docs/tests within `/Users/Zhuanz/workspace/odw-oss`. +- Use `odw`, `pandacode`, and `codexctl` local CLI help and doctor output. +- Use internet only if a current external Codex/OpenAI behavior must be verified; prefer local CLI/docs and official OpenAI sources if that happens. +- Use subagents or direct `codexctl plan` for independent review; do not delegate uncontrolled write access to many agents editing the same files. + +Scope expansion protocol: +- Record the evidence that requires expansion in the Decision Log. +- Identify newly affected files and validators before editing them. 
+- Stop for user input before changing public API contracts, security posture, account/auth behavior, deployment assumptions, or provider billing/model-selection rules. + +Non-negotiable constraints: +- Use `rg`/targeted reads for exploration. +- Use `apply_patch` for manual edits. +- Do not revert user changes. +- Maintain ASCII in edited files unless existing file context requires otherwise. +- Make atomic commits for completed milestones. +- Run the mandatory codexctl read-only review before final completion. + +## Plan And Approach Options + +Candidate approaches: +- A: Add a small committed dogfood workflow/evidence harness, run it through ODW real Codex backend for 30 counted rounds, compare to direct codexctl, then improve the highest-friction issue with targeted tests. +- B: Extend ODW CLI with first-class dogfood or benchmark command if the run shows recurring manual friction that should be productized. +- C: Change PandaCode Codex runtime internals only if ODW dogfood evidence shows a concrete command, status, logging, transport, or daemon lifecycle defect there. + +Chosen initial approach: +- Start with Approach A because it produces evidence quickly, keeps scope bounded, and preserves the option to pivot based on observed failures. + +Pivot triggers: +- Pivot to B if the main friction is repeatable operator ergonomics in ODW CLI/report/run inspection. +- Pivot to C if the real run fails inside PandaCode Codex execution despite ODW dispatch behaving correctly. +- Record Bamboo as blocked, not failed ODW functionality, if the only issue is missing provider credentials. +- Stop as Budget-limited if 30 full Codex nodes are not feasible due to quota/time, after recording the exact blocker and the highest-fidelity partial evidence. + +## Done Criteria + +- [ ] At least 30 counted real ODW/Codex rounds are recorded with run id, event count, runtime/model evidence, and the counting method. 
+- [ ] Per-round quality is recorded or sampled with explicit criteria: file evidence, novelty, actionability, validation awareness, contradiction/review value, and repetition/boilerplate risk. +- [ ] The final analysis states how quality improved or could improve across iterations, including prompt changes, model placement, schema/score gates, and entry/exit review strategy. +- [ ] The dogfood workload uses both parallel and serial orchestration, with at least one fanout/parallel section and one downstream synthesis or verification section. +- [ ] A direct `codexctl` baseline for a comparable task is run and summarized against ODW on observability, resumability, setup friction, result quality, and failure/debug ergonomics. +- [ ] The real dogfood workload runs in an isolated git workspace outside the ODW source tree. +- [ ] Bamboo domestic-model lanes are attempted for high-quality entry/exit and low-cost execution; live success or missing-key blockage is recorded with evidence. +- [ ] At least one concrete ODW improvement is implemented, or the Decision Log records why no code change was justified after real evidence. +- [ ] Added or changed code/docs/tests are scoped to the dogfood findings and avoid unrelated refactors. +- [ ] `cargo fmt --check` passes. +- [ ] `cargo clippy --workspace --all-targets -- -D warnings` passes. +- [ ] `cargo test` passes. +- [ ] A targeted ODW smoke or regression validator exercises the changed behavior. +- [ ] Plan is written or confirmed before implementation and recorded in the Decision Log. +- [ ] Self-validation harness exists or is explicitly justified as not feasible, with evidence. +- [ ] Code changes have appropriate test coverage, including integration and end-to-end tests where behavior crosses boundaries or affects user workflows. +- [ ] Atomic git commits are created for completed implementation milestones without including unrelated user changes. 
+- [ ] Independent codexctl read-only review runs before final completion, and findings are resolved or explicitly accepted. + +## Mandatory Gates + +Planning gate: +- [ ] Write or confirm the implementation plan before editing. +- [ ] Record the chosen plan and pivot triggers in the Decision Log. + +Self-validation gate: +- [ ] Design validation before implementation. +- [ ] Create or improve code-level validators where feasible. + +Code test gate: +- [ ] Decide which unit, integration, and end-to-end tests are required for this task. +- [ ] Add or update integration tests when behavior crosses modules, subprocess boundaries, filesystem state, run journals, report rendering, or executor invocation. +- [ ] End-to-end browser tests are not required unless the work changes HTML report behavior; if report UI changes, inspect generated HTML and add the closest practical smoke check. +- [ ] If live Codex E2E cannot complete because of quota/time, record the exact limitation and create a deterministic smoke/contract substitute for the changed code. + +Atomic commit gate: +- [ ] Record pre-edit `git status --short`. +- [ ] Identify pre-existing unrelated changes and avoid staging them. +- [ ] Commit after each completed implementation milestone or phase. +- [ ] Record commit hash, message, changed files, and validation evidence. + +Codexctl review gate: +- [ ] Write a task-specific `spec/goal/20260604-230837-odw-codex-dogfood.codexctl-review.md` review prompt. +- [ ] Run `codexctl plan --cwd /Users/Zhuanz/workspace/odw-oss --prompt-file spec/goal/20260604-230837-odw-codex-dogfood.codexctl-review.md --sandbox read-only --approval-policy never --effort high --timeout unlimited`. +- [ ] Record review output summary and findings. +- [ ] Resolve or explicitly accept high/medium findings. +- [ ] Rerun validators and rerun codexctl review after review-driven code changes unless changes are documentation/log-only. 
+ +## Self-Validation Harness + +Validation design: +- What must be proven: ODW can run a complex real Codex-backed workflow and yields better orchestration evidence than raw direct `codexctl`; any resulting change is correct and covered. +- Baseline evidence to collect: current `odw doctor`, `pandacode doctor`, `pandacode bamboo doctor`, provider-key discovery, `codexctl` baseline run, current test pass state, and real ODW run behavior before changes. +- Target evidence for completion: isolated workspace path, 30 counted rounds, ODW run/report/log artifacts, Bamboo attempted/live-or-blocked result, comparison summary, passing validators, commit hash, codexctl review findings addressed. +- Quality evidence: a per-node or sampled quality table rating evidence grounding, novelty, actionability, and verification strength; repeated low-quality patterns must feed into the ODW improvement decision. +- Continuous checks after changes: targeted ODW smoke/test for touched behavior, `cargo fmt --check`, and focused `cargo test -p ` when possible. +- Final checks before completion: full `cargo test`, `cargo clippy --workspace --all-targets -- -D warnings`, final ODW smoke, `git status --short`, codexctl review. +- Test coverage decision for code changes: Rust unit tests for Rust logic; integration/fake-runtime tests for subprocess/session behavior; ODW mock smoke for workflow-runtime changes; live Codex evidence for user-experience dogfood but not as the only regression test. +- Inconclusive evidence rule: if live Codex cannot finish because of rate limits, auth, quota, or repeated transport failure, stop as Blocked or Budget-limited after preserving logs and adding deterministic coverage for any local code changes. 
+ +Existing validators: +- `cargo fmt --check` +- `cargo clippy --workspace --all-targets -- -D warnings` +- `cargo test` +- `odw exec --script odw/examples/01-single-node.js --backend mock --json` +- `odw report --script odw/examples/01-single-node.js` +- `pandacode doctor --json` +- `odw doctor --json` + +Validators to create or improve: +- A committed dogfood workflow or harness that makes the 30-round ODW/Codex run repeatable enough to audit. +- A targeted deterministic test or smoke command for any ODW code improvement made. + +Required evidence before completion: +- ODW real-run command, run id, counted round total, report path, and summary. +- Direct `codexctl` command and summary. +- Diff summary and commit hash. +- Validator command outputs summarized in Evidence Log. +- codexctl review command and finding summary. + +## Atomic Commit Protocol + +- Record `git status --short` before editing. +- Do not stage or commit unrelated user changes. +- Use path-specific staging when necessary. +- Keep each commit small, coherent, and reversible. +- Commit after each completed implementation milestone or phase. +- Run relevant validators before milestone commits unless intentionally committing a failing baseline/test. +- Record commit hash, message, changed files, and validation evidence in the Evidence Log. +- Stop as Blocked if clean atomic commits are impossible because of git state, unrelated changes, or missing git metadata. + +## Codexctl Review Gate + +Review prompt file: +- `spec/goal/20260604-230837-odw-codex-dogfood.codexctl-review.md` + +Required command: + +```bash +codexctl plan --cwd /Users/Zhuanz/workspace/odw-oss --prompt-file spec/goal/20260604-230837-odw-codex-dogfood.codexctl-review.md --sandbox read-only --approval-policy never --effort high --timeout unlimited +``` + +Review prompt requirements: +- The executing Codex must write this prompt based on this task, this `.goal.md`, current diff, commits, tests, and evidence. +- Read this `.goal.md`. 
+- Inspect current diff and relevant commits. +- Check bugs, regressions, missing unit/integration/end-to-end tests, scope creep, security/privacy issues, and completion evidence. +- Return prioritized findings with file references. +- Do not edit files. + +Review completion rule: +- High/medium findings must be fixed or explicitly accepted with rationale. +- Validators must be rerun after review-driven changes. +- codexctl review must be rerun after review-driven code changes unless changes are documentation/log-only. + +## Execution Phases + +### Phase 0: Context Discovery And Baseline +- [ ] Inspect relevant files, docs, logs, repo instructions, ODW examples, PandaCode Codex runtime, and existing validators. +- [ ] Record current behavior, real-runtime availability, and baseline checks. +- [ ] Record direct `codexctl` baseline plan for comparison. +Exit criteria: baseline evidence and the first dogfood plan are recorded in the logs. +Stop if: Codex or codexctl is unavailable, auth is missing, or the task cannot be scoped without user input. + +### Phase 1: Plan And Validation Design +- [ ] Compare approaches A/B/C and select the first implementation plan. +- [ ] Define the 30-round counting method and dogfood workload before running it. +- [ ] Define the isolated workspace path and Bamboo high-quality/low-cost trial matrix. +- [ ] Decide required unit, integration, and live-smoke coverage. +- [ ] Record the plan in the Decision Log before implementation. +Exit criteria: dogfood workload, counting method, validators, and pivot triggers are recorded. +Stop if: no credible validator or implementation path exists. + +### Phase 2: Real ODW/Codex Dogfood +- [ ] Run the complex workflow through `odw exec --backend pandacode`. +- [ ] Ensure the workload includes both parallel and serial sections. +- [ ] Keep dogfood writes inside the isolated workspace. 
+- [ ] Attempt Bamboo high-quality entry/exit and low-cost execution lanes when credentials exist; otherwise record exact missing-key evidence. +- [ ] Collect run id, events, logs, status, report, runtime/model info, failures, and elapsed-time observations. +- [ ] Extract per-round quality signals from node results and record how prompt/model/orchestration choices affected quality. +- [ ] Count at least 30 real rounds or stop as Budget-limited with exact evidence. +Exit criteria: 30 counted rounds and comparison-ready evidence are recorded, or a bounded stop state is justified. +Stop if: quota/rate/time limits prevent meaningful continuation. + +### Phase 3: Versioned Implementation +- [ ] Convert the highest-value dogfood finding into a small ODW improvement. +- [ ] Update or create tests/docs/harnesses with the change. +- [ ] Keep changes tied to the active hypothesis and dynamic scope. +- [ ] Create an atomic git commit after the milestone passes required validation. +Exit criteria: the core validator passes once and a commit exists. +Stop if: implementation requires an out-of-scope public API or runtime architecture change. + +### Phase 4: Self-Validation And Review +- [ ] Run full relevant validators. +- [ ] Run required unit, integration, and smoke checks. +- [ ] Run mandatory codexctl read-only review using the task-specific prompt file. +- [ ] Fix regressions introduced by implementation. +Exit criteria: all Done Criteria pass and review findings are addressed or accepted. +Stop if: two consecutive attempts do not improve any Done Criterion. + +### Phase 5: Iterate Or Pivot +- [ ] If evidence invalidates the plan, record why and choose the next bounded approach. +- [ ] Do not repeat the same hypothesis without new evidence. +Exit criteria: either Done Criteria pass or a new evidence-backed plan is selected. +Stop if: repeated pivots become speculative or exceed scope. 
+ +### Phase 6: Final Audit And Report +- [ ] Review the diff against dynamic scope and non-negotiable constraints. +- [ ] Confirm every Done Criterion has evidence. +- [ ] Confirm atomic commits and codexctl review evidence are recorded. +- [ ] Write the final report. +Exit criteria: final report is complete. +Stop if: any Done Criterion lacks evidence. + +## Task Allocation + +- Supervisor: Maintain this spec, checklist, stop rules, and progress log. +- Implementer: Make scoped code changes only after dogfood evidence supports them. +- Verifier: Run validators and record evidence. +- Reviewer: Use `codexctl plan` for independent read-only review before completion. + +Subagent plan: +- Recon/search: use direct local commands first; use a read-only codexctl plan only for independent review or if context pressure grows. +- Implementation: keep implementation in the main agent unless the touched files become independent enough for a delegated isolated patch. +- Verification/review: mandatory codexctl review plus local validators. +- Merge rule: the supervisor records subagent or codexctl conclusions in this `.goal.md` and only applies evidence-backed findings. + +## Iteration Policy + +- Each counted iteration must name one hypothesis or next best action. +- Each counted iteration must change code, tests, evidence, or the plan. +- Each counted ODW/Codex round must be auditable through ODW events, PandaCode session records, or a committed dogfood summary. +- Each iteration must state whether the current plan is still valid or whether a pivot is needed. +- Update the Progress Log after each phase or meaningful attempt. +- Stop as Blocked if two consecutive implementation iterations do not improve any Done Criterion. + +## Stop Conditions + +Complete only when: +- [ ] Every Done Criterion is satisfied with recorded evidence. + +Stop as blocked when: +- [ ] Required credentials, files, product decisions, or external systems are missing. 
+- [ ] The task requires scope expansion not authorized by this spec. +- [ ] The same validator fails for the same root cause after the allowed attempts. +- [ ] Two consecutive implementation attempts do not improve any Done Criterion. +- [ ] Atomic commits cannot be made safely. +- [ ] codexctl review is unavailable or cannot run. + +Stop as budget-limited when: +- [ ] Budget, rate limits, time limits, or context limits prevent meaningful continuation of real Codex rounds after preserving available evidence. + +## Progress Log + +| Time | Phase | Action | Evidence | Status | Next | +| --- | --- | --- | --- | --- | --- | +| 2026-06-04 23:08 CST | Pre-goal | Created task-specific Goal spec after read-only recon | `spec/goal/20260604-230837-odw-codex-dogfood.goal.md` | In progress | Start Phase 0 baseline and real dogfood plan | +| 2026-06-04 23:22 CST | Phase 0 | Created isolated git fixture at `/tmp/odw-dogfood-isolated-wUVV4h`; initial `npm test` found a bad test regex, then fixture was fixed and tests passed | Fixture commits `7359f22`, `01ad349`; `npm test` passed 2 tests | In progress | Run ODW workflow mock and real backends | +| 2026-06-04 23:23 CST | Phase 1 | Mock ODW run found workflow authoring bug: bare `cwd` is not available; fixed script to use `globalThis.cwd` | Failed run `odw-exec-1780586618027-59338`; patched `spec/goal/odw-codex-30round-dogfood.js` | In progress | Rerun mock | +| 2026-06-04 23:26 CST | Phase 0 | Direct `codexctl` baseline completed in one read-only Plan turn; it found useful risks but also appeared to report stale/sandbox-skewed test evidence compared with current shell validation | Thread `019e933d-636c-74b3-84e6-e65b6fa810d2`; log `/Users/Zhuanz/.codexctl/logs/run-1780586732382.jsonl`; current `npm test` passes | In progress | Run real ODW backend | +| 2026-06-04 23:35 CST | Phase 2 | Real ODW PandaCode backend run completed with 36 attempted nodes and 33 successful Codex nodes | Run `odw-exec-1780586920954-61912`; report 
`/tmp/odw-dogfood-isolated-wUVV4h/.odw/runs/odw-exec-1780586920954-61912/report.html` | In progress | Analyze quality and implement evidence-backed ODW fixes | +| 2026-06-04 23:43 CST | Phase 3 | Implemented Bamboo missing-key preflight and raw report fallback context | `odw/src/pack/templates/runtime/odw-js-runner.mjs`; `odw/src/main.rs` assertions | In progress | Smoke-test preflight | +| 2026-06-04 23:44 CST | Phase 3 | Bamboo preflight smoke passed: missing Qwen key returns structured blocked result before executor dispatch | Run `odw-exec-1780587867364-79533`; no Bamboo prompt/raw report generated | In progress | Record quality and continue validation | +| 2026-06-04 23:48 CST | Phase 3 | Quality analysis found duplicate state indexes in concurrent nodes; fixed agent bookkeeping to capture local call index | Historical run had duplicate indexes `24` and `34`; mock smoke run `odw-exec-1780588130389-82632` now has indexes `1,2,3,4,5` | In progress | Run full validators and independent review | +| 2026-06-04 23:58 CST | Phase 4 | First codexctl review found 4 issues; fixed default Deepseek preflight, unknown-provider blocking, raw-report action suffixes, and preflight/raw-report regression tests | Thread `019e9359-2d66-7e82-bc6e-6615c3a60154`; log `/Users/Zhuanz/.codexctl/logs/run-1780588555545.jsonl` | In progress | Rerun validators and codexctl review | +| 2026-06-05 00:05 CST | Phase 4 | Final validators and second codexctl review passed | `cargo fmt --check`; fixture `npm test`; `cargo test`; `cargo clippy --workspace --all-targets -- -D warnings`; review thread `019e9360-8897-7370-ac34-c5a34d402175` | In progress | Commit changes | + +## Goal History + +| Time | Event | Summary | Evidence/Link | +| --- | --- | --- | --- | +| 2026-06-04 23:08 CST | Created | Goal spec created for real ODW Codex backend dogfood and improvement work | `spec/goal/20260604-230837-odw-codex-dogfood.goal.md` | +| 2026-06-04 23:22 CST | Isolated workspace created | Dogfood runs will 
target `/tmp/odw-dogfood-isolated-wUVV4h`, not the ODW source tree | `git log --oneline` in isolated workspace shows `01ad349`, `7359f22` | +| 2026-06-04 23:35 CST | Real 30+ round run completed | ODW orchestrated 3 Bamboo trials, 1 Codex entry, 20 parallel recon nodes, 10 pipeline nodes, synthesis, and exit review | Run `odw-exec-1780586920954-61912` | +| 2026-06-04 23:49 CST | Quality record created | Per-node quality table and iteration conclusions recorded | `spec/goal/odw-codex-dogfood-quality.md`; `spec/goal/analyze-odw-dogfood-quality.mjs` | + +## Decision Log + +| Time | Decision | Rationale | Evidence | +| --- | --- | --- | --- | +| 2026-06-04 23:08 CST | Treat `owd` as `odw` and proceed without clarification | The repo and prior context use ODW/Open Dynamic Workflow; user explicitly asked to proceed with code and commits | User request; repo path `/Users/Zhuanz/workspace/odw-oss` | +| 2026-06-04 23:08 CST | Start with Approach A | It provides direct evidence before changing code and keeps scope bounded | Existing ODW examples and validators already pass; missing piece is real Codex dogfood evidence | +| 2026-06-04 23:22 CST | Use isolated git workspace for live dogfood | User requested complex tasks and isolated directory; this allows real agent reads/writes without risking ODW source files | `/tmp/odw-dogfood-isolated-wUVV4h` | +| 2026-06-04 23:23 CST | Treat script-global ergonomics as a candidate ODW improvement | The workflow failed after doing all mock nodes because `cwd` was documented/available as `globalThis.cwd` but not as a lexical binding | Run `odw-exec-1780586618027-59338` | +| 2026-06-04 23:26 CST | Keep real ODW node prompts focused on source files | Direct baseline inspected `.odw` artifacts and produced a very large context; real ODW nodes should ignore `.odw/.pandacode` unless explicitly asked | Patched `projectContext()` in `spec/goal/odw-codex-30round-dogfood.js` | +| 2026-06-04 23:35 CST | Implement Bamboo preflight as the primary 
product fix | Three domestic-model lanes failed immediately with the same missing-key error and one raw report artifact was overwritten, proving ODW should block earlier with structured remediation | Run `odw-exec-1780586920954-61912`; raw report path collision `pandacode-bamboo-bamboo-exec.report.json` | +| 2026-06-04 23:48 CST | Also fix concurrent agent index bookkeeping | Quality analysis showed parallel state records reused completion-time global `agentIndex`, which weakens reports and per-node quality accounting | `spec/goal/analyze-odw-dogfood-quality.mjs`; historical duplicate indexes `24`, `34` | +| 2026-06-04 23:58 CST | Accept and fix all first-review findings | The findings were concrete and locally reproducible: preflight had false-positive/false-negative edges, raw report action collision remained possible, and tests missed the new blocked contract | codexctl thread `019e9359-2d66-7e82-bc6e-6615c3a60154` | +| 2026-06-05 00:04 CST | Treat final review residual config-layout risk as accepted | Second review found no concrete regression; remaining risk is future provider/config layout drift, which is outside this scoped fix and covered by alias/table tests for current repo behavior | codexctl thread `019e9360-8897-7370-ac34-c5a34d402175` | + +## Evidence Log + +| Time | Evidence Type | Command, Commit, File, Route, Or Artifact | Result | +| --- | --- | --- | --- | +| 2026-06-04 23:03 CST | Doctor | `odw doctor --json` | `ok: true`; Node, PandaCode, Codex, Claude, tmux available; Bamboo missing API key only | +| 2026-06-04 23:03 CST | Doctor | `pandacode doctor --json` | `ok: true`; Codex and Claude available; Bamboo provider key missing | +| 2026-06-04 23:04 CST | Test | `cargo test` | Passed: ODW tests, PandaCode unit tests, fake runtime integration tests | +| 2026-06-04 23:04 CST | Lint/format | `cargo fmt --check`; `cargo clippy --workspace --all-targets -- -D warnings` | Passed | +| 2026-06-04 23:04 CST | Mock smoke | `odw exec --script 
odw/examples/01-single-node.js --backend mock --json` | Passed with `{ ok: true }` | +| 2026-06-04 23:04 CST | Report smoke | `odw report --script odw/examples/01-single-node.js --out /report.html` | Wrote HTML report | +| 2026-06-04 23:21 CST | Bamboo config | `env | rg '^(DEEPSEEK|KIMI|QWEN|ZHIPU|MINIMAX|XIAOMI|STEPFUN|PANDACODE_BAMBOO)_API_KEY='`; `pandacode bamboo doctor --json` | No provider keys in shell; Bamboo state `configuration_needed`, missing `api_key` | +| 2026-06-04 23:22 CST | Isolated validator | `npm test` in `/tmp/odw-dogfood-isolated-wUVV4h` | First run failed due to an over-escaped test regex; passed 2 tests after the fix | +| 2026-06-04 23:23 CST | Mock workflow | `odw exec --path /tmp/odw-dogfood-isolated-wUVV4h --script spec/goal/odw-codex-30round-dogfood.js --backend mock --json` | Failed at final return: `ReferenceError: cwd is not defined`; 36 mock nodes completed before failure | +| 2026-06-04 23:24 CST | Mock workflow | `odw exec --path /tmp/odw-dogfood-isolated-wUVV4h --script spec/goal/odw-codex-30round-dogfood.js --backend mock --json` | Passed; result reports `requestedCodexRounds: 33`, 20 parallel recon nodes, 10 pipeline nodes, 3 Bamboo trials | +| 2026-06-04 23:26 CST | Direct codexctl baseline | `codexctl plan --cwd /tmp/odw-dogfood-isolated-wUVV4h --prompt-file spec/goal/odw-codex-dogfood-direct-codexctl.prompt.md --sandbox read-only --approval-policy never --model gpt-5.4-mini --effort low --timeout unlimited` | Completed in ~49s; thread `019e933d-636c-74b3-84e6-e65b6fa810d2`; reported direct audit strengths/weaknesses and ODW improvement watch item | +| 2026-06-04 23:27 CST | Baseline cross-check | `nl -ba test/cli.test.js`; `npm test`; `git log --oneline --max-count=3` in isolated workspace | Current file has fixed regex; `npm test` passes 2 tests; direct baseline likely observed stale or sandbox-skewed test evidence | +| 2026-06-04 23:30 CST | Quality metric update | User asked to record every run's quality and how iteration improves 
quality | Added quality criteria to Done Criteria and Phase 2 evidence requirements | +| 2026-06-04 23:35 CST | Real ODW run | `odw exec --path /tmp/odw-dogfood-isolated-wUVV4h --script spec/goal/odw-codex-30round-dogfood.js --input-file spec/goal/odw-codex-30round-dogfood.input.json --backend pandacode --json` | Completed; 33 Codex nodes succeeded; 3 Bamboo nodes failed due to a missing API key; exit review selected Bamboo preflight as top ODW improvement | +| 2026-06-04 23:39 CST | Quality analysis | `node spec/goal/analyze-odw-dogfood-quality.mjs /tmp/odw-dogfood-isolated-wUVV4h/.odw/runs/odw-exec-1780586920954-61912` | Successful Codex node average 7.24/10; duplicate indexes found; quality file created | +| 2026-06-04 23:44 CST | Product smoke | `cargo run -p open-dynamic-workflow -- exec --path /tmp/odw-dogfood-isolated-wUVV4h --script spec/goal/odw-bamboo-preflight-smoke.js --backend pandacode --json` | Passed; returned `state: "blocked"` and `error.category: "bamboo_missing_api_key"` before prompt/raw report dispatch | +| 2026-06-04 23:48 CST | Product smoke | `cargo run -p open-dynamic-workflow -- exec --path /tmp/odw-dogfood-isolated-wUVV4h --script spec/goal/odw-parallel-index-smoke.js --backend mock --json`; state inspection | Passed; five parallel nodes recorded unique indexes `1,2,3,4,5` | +| 2026-06-04 23:52 CST | Full mock workflow | `cargo run -p open-dynamic-workflow -- exec --path /tmp/odw-dogfood-isolated-wUVV4h --script spec/goal/odw-codex-30round-dogfood.js --input-file spec/goal/odw-codex-30round-dogfood.input.json --backend mock --json`; state inspection | Passed; 36 nodes; no duplicate indexes | +| 2026-06-04 23:56 CST | Test/lint | `cargo fmt --check`; `/tmp/odw-dogfood-isolated-wUVV4h npm test`; `cargo test`; `cargo clippy --workspace --all-targets -- -D warnings` | Passed after fixing selftest fake Bamboo env for new preflight | +| 2026-06-04 23:58 CST | codexctl review | `codexctl plan --cwd /Users/Zhuanz/workspace/odw-oss --prompt-file 
spec/goal/20260604-230837-odw-codex-dogfood.codexctl-review.md --sandbox read-only --approval-policy never --model gpt-5.4-mini --effort medium --timeout unlimited` | Found 4 actionable issues; all fixed | +| 2026-06-05 00:01 CST | Regression test | `cargo test -p open-dynamic-workflow --test parity_selftest` | Passed after adding blocked preflight, default Deepseek, unknown provider, and raw-report no-overwrite tests | +| 2026-06-05 00:03 CST | Final test/lint | `cargo fmt --check`; `/tmp/odw-dogfood-isolated-wUVV4h npm test`; `cargo test`; `cargo clippy --workspace --all-targets -- -D warnings` | Passed | +| 2026-06-05 00:04 CST | Final codexctl review | `codexctl plan --cwd /Users/Zhuanz/workspace/odw-oss --prompt-file spec/goal/20260604-230837-odw-codex-dogfood.codexctl-review.md --sandbox read-only --approval-policy never --model gpt-5.4-mini --effort low --timeout unlimited` | No concrete regressions found; residual risk limited to future Bamboo config/provider layout drift | +| 2026-06-05 00:06 CST | Final smoke | `odw-bamboo-preflight-smoke.js`; `odw-parallel-index-smoke.js`; latest state inspection | Passed on current code; Bamboo smoke run `odw-exec-1780589198793-28851`; parallel smoke run `odw-exec-1780589198802-28850`; indexes `1,2,3,4,5` | +| 2026-06-05 00:07 CST | Commit | `git commit -m "Dogfood ODW Codex backend orchestration"` | Created commit `67603e7` with runner fixes, selftests, and dogfood evidence artifacts | + +## Final Report Requirements + +The executing Goal run must finish with: +- Final status: Complete, Blocked, or Budget-limited +- Checklist state +- Files changed +- Atomic commits created +- Commands run and results +- Test coverage summary, including unit/integration/end-to-end coverage or infeasibility rationale +- ODW real-run command, run id, counted rounds, and report/log artifacts +- Direct codexctl baseline command and comparison summary +- codexctl review command and findings summary +- Evidence artifacts +- Risks and 
follow-ups diff --git a/spec/goal/analyze-odw-dogfood-quality.mjs b/spec/goal/analyze-odw-dogfood-quality.mjs new file mode 100644 index 0000000..907fdba --- /dev/null +++ b/spec/goal/analyze-odw-dogfood-quality.mjs @@ -0,0 +1,71 @@ +#!/usr/bin/env node +import { readFileSync } from "node:fs"; +import { join } from "node:path"; + +const runDir = process.argv[2]; +if (!runDir) { + console.error("usage: node spec/goal/analyze-odw-dogfood-quality.mjs "); + process.exit(2); +} + +const state = JSON.parse(readFileSync(join(runDir, "state.json"), "utf8")); +const entries = [ + ...Object.values(state.failedAgents || {}), + ...Object.values(state.agents || {}) +].sort((a, b) => (a.index || 0) - (b.index || 0) || String(a.key).localeCompare(String(b.key))); + +function textOf(entry) { + return typeof entry.result === "string" ? entry.result : JSON.stringify(entry.result || {}); +} + +function scoreEntry(entry) { + if (entry.ok === false) { + return { score: 0, level: "blocked", evidence: 0, verification: 0, novelty: 0, correction: 0 }; + } + const text = textOf(entry); + const hasFileEvidence = /(?:src|test)\/|README|package\.json|\.js|\.json/.test(text); + const evidence = (hasFileEvidence ? 2 : 0) + (/\bEvidence\b|:\d+|file|path|command|validator/i.test(text) ? 1 : 0); + const verification = /npm test|passes|passed|20-process|persisted|verify|overstated|missing evidence/i.test(text) ? 2 : 0; + const novelty = /lost updates|security|concurrency|corruption|schema|migration|release|observability|error|path|risk|gap|preflight/i.test(text) ? 2 : 1; + const correction = /overstated|missing evidence|supported|temper|verify|SYNTHESIS|EXIT_REVIEW/i.test(text) ? 2 : 0; + const score = Math.min(10, evidence + verification + novelty + correction + (text.length > 250 ? 1 : 0)); + return { + score, + level: score >= 8 ? "high" : score >= 5 ? 
"medium" : "low", + evidence, + verification, + novelty, + correction + }; +} + +function preview(entry) { + return textOf(entry) + .replace(/\s+/g, " ") + .replace(/\|/g, "\\|") + .slice(0, 130); +} + +const scored = entries.map((entry) => ({ entry, quality: scoreEntry(entry) })); +const successful = scored.filter((row) => row.entry.ok !== false); +const totalScore = successful.reduce((sum, row) => sum + row.quality.score, 0); +const duplicateIndexes = [...scored.reduce((map, row) => { + const index = row.entry.index || 0; + if (!map.has(index)) { + map.set(index, []); + } + map.get(index).push(row.entry.key); + return map; +}, new Map()).entries()].filter(([, keys]) => keys.length > 1); + +console.log(`# ODW Dogfood Quality Analysis`); +console.log(); +console.log(`Run: \`${runDir}\``); +console.log(`Nodes: ${entries.length}; successful: ${successful.length}; blocked/failed: ${entries.length - successful.length}; average successful score: ${(totalScore / Math.max(1, successful.length)).toFixed(2)}/10.`); +console.log(`Duplicate state indexes: ${duplicateIndexes.length ? duplicateIndexes.map(([index, keys]) => `${index}=${keys.join(",")}`).join("; ") : "none"}.`); +console.log(); +console.log("| # | node | phase | status | tokens | score | quality | note |"); +console.log("|---:|---|---|---|---:|---:|---|---|"); +for (const { entry, quality } of scored) { + console.log(`| ${entry.index ?? ""} | ${entry.key} | ${entry.phase || ""} | ${entry.ok === false ? 
"blocked" : "ok"} | ${entry.tokens || 0} | ${quality.score} | ${quality.level} | ${preview(entry)} |`); +} diff --git a/spec/goal/odw-bamboo-preflight-smoke.js b/spec/goal/odw-bamboo-preflight-smoke.js new file mode 100644 index 0000000..3b247e0 --- /dev/null +++ b/spec/goal/odw-bamboo-preflight-smoke.js @@ -0,0 +1,16 @@ +export default async function bambooPreflightSmoke() { + phase("Bamboo Preflight Smoke", "Verify missing-key Bamboo nodes are blocked before executor dispatch."); + const result = await agent("Return BAMBOO_OK if this executor actually runs.", { + id: "bamboo-preflight-qwen", + label: "bamboo-preflight-qwen", + runtime: "bamboo", + provider: "qwen", + model: "qwen3.6-flash", + timeout: 30 + }); + + return { + ok: result?.ok === false && result?.state === "blocked" && result?.error?.category === "bamboo_missing_api_key", + result + }; +} diff --git a/spec/goal/odw-codex-30round-dogfood.input.json b/spec/goal/odw-codex-30round-dogfood.input.json new file mode 100644 index 0000000..dee704f --- /dev/null +++ b/spec/goal/odw-codex-30round-dogfood.input.json @@ -0,0 +1,36 @@ +{ + "projectTask": "Dogfood ODW on this isolated Dogfood Taskboard CLI project. Evaluate CLI ergonomics, task storage, reports, tests, docs, failure modes, and whether ODW orchestration improves multi-agent evidence gathering over direct codexctl.", + "codexModel": "gpt-5.4-mini", + "codexEffort": "low", + "exitCodexEffort": "medium", + "codexTimeout": 300, + "codexConcurrency": 3, + "enableBamboo": true, + "bambooTrials": [ + { + "id": "bamboo-entry-high-qwen", + "label": "bamboo-entry-high-qwen", + "provider": "qwen", + "model": "qwen3.7-max", + "purpose": "High-quality domestic entry planner", + "prompt": "Act as a high-quality domestic-model entry planner. Inspect package.json, README.md, src/, and test/. Return a concise audit plan and note whether this model is a good entry planner. Do not edit files." 
+ }, + { + "id": "bamboo-exec-low-qwen", + "label": "bamboo-exec-low-qwen", + "provider": "qwen", + "model": "qwen3.6-flash", + "effort": "low", + "purpose": "Low-cost domestic execution lane", + "prompt": "Act as a lower-cost domestic-model execution lane. Inspect src/ and test/ quickly. Return one low-risk issue candidate and one validation command. Do not edit files." + }, + { + "id": "bamboo-exit-high-kimi", + "label": "bamboo-exit-high-kimi", + "provider": "kimi", + "model": "kimi-k2.6", + "purpose": "High-quality domestic exit reviewer", + "prompt": "Act as a high-quality domestic-model exit reviewer. Inspect the repository and identify the top risk in letting cheaper execution agents produce changes. Do not edit files." + } + ] +} diff --git a/spec/goal/odw-codex-30round-dogfood.js b/spec/goal/odw-codex-30round-dogfood.js new file mode 100644 index 0000000..73bef12 --- /dev/null +++ b/spec/goal/odw-codex-30round-dogfood.js @@ -0,0 +1,274 @@ +export const meta = { + name: "odw-codex-30round-dogfood", + description: "Real ODW -> PandaCode -> Codex stress dogfood with parallel, pipeline, synthesis, and Bamboo trial lanes.", + phases: [ + { title: "Bamboo Trials" }, + { title: "Codex Entry" }, + { title: "Parallel Recon" }, + { title: "Pipeline Checks" }, + { title: "Synthesis" }, + { title: "Exit Review" }, + ], +}; + +const codexModel = args?.codexModel || "gpt-5.4-mini"; +const codexEffort = args?.codexEffort || "low"; +const codexTimeout = Number(args?.codexTimeout || 240); +const codexConcurrency = Number(args?.codexConcurrency || 3); +const enableBamboo = args?.enableBamboo !== false; + +const codexOptions = (id, label, extra = {}) => ({ + id, + label, + runtime: "codex", + model: codexModel, + effort: codexEffort, + permission: "limited", + timeout: codexTimeout, + ...extra, +}); + +const bambooOptions = (id, label, provider, model, extra = {}) => ({ + id, + label, + runtime: "bamboo", + provider, + model, + effort: extra.effort || "high", + permission: 
"limited", + timeout: Number(extra.timeout || 180), + ...extra, +}); + +function compact(value, limit = 1200) { + const text = typeof value === "string" ? value : JSON.stringify(value); + return text.length > limit ? `${text.slice(0, limit)}...` : text; +} + +function projectContext() { + return `You are running inside an isolated git workspace created only for this ODW dogfood. +Do not edit files unless explicitly asked. +Inspect the local files with read-only commands when needed. +Return concise evidence, not broad advice. +Ignore .odw/ and .pandacode/ unless the prompt explicitly asks about ODW orchestration artifacts. + +Project task: +${args?.projectTask || "Evaluate the isolated mini project and its CLI/test/docs quality."}`; +} + +const bambooTrials = []; +if (enableBamboo) { + phase("Bamboo Trials", "Attempt high-quality domestic model entry/exit lanes and a lower-cost execution lane."); + const trialConfigs = args?.bambooTrials || [ + { + id: "bamboo-entry-high", + label: "bamboo-entry-high", + provider: "qwen", + model: "qwen3.7-max", + purpose: "High-quality entry planner", + prompt: + "Act as the high-quality domestic-model entry planner. Read package.json and README.md if available. Return a concise plan for auditing this isolated project. Do not edit files.", + }, + { + id: "bamboo-exec-low", + label: "bamboo-exec-low", + provider: "qwen", + model: "qwen3.6-flash", + effort: "low", + purpose: "Lower-cost execution lane", + prompt: + "Act as the lower-cost domestic-model execution lane. Inspect src/ and test/ quickly. Return one concrete low-risk implementation observation. Do not edit files.", + }, + { + id: "bamboo-exit-high", + label: "bamboo-exit-high", + provider: "kimi", + model: "kimi-k2.6", + purpose: "High-quality exit reviewer", + prompt: + "Act as the high-quality domestic-model exit reviewer. Inspect the repository summary and identify one risk in using cheap execution agents. 
Do not edit files.", + }, + ]; + for (const trial of trialConfigs) { + const result = await agent( + `${projectContext()} + +Domestic model trial: +- purpose: ${trial.purpose} +- provider: ${trial.provider} +- model: ${trial.model} + +Task: +${trial.prompt} + +Final response: start with BAMBOO_TRIAL ${trial.id}: and summarize whether you could run, what you inspected, and any blocker.`, + bambooOptions(trial.id, trial.label, trial.provider, trial.model, { + effort: trial.effort || "high", + timeout: trial.timeout || 180, + }) + ); + bambooTrials.push({ trial, result }); + } +} + +phase("Codex Entry", "Use Codex as the reliable entry planner for the isolated task."); +const entry = await agent( + `${projectContext()} + +Entry task: +Inspect package.json, README.md, src/, and test/ at a high level. +Return: +1. the project shape; +2. three audit dimensions worth parallelizing; +3. one validator command to use later. + +Final response: start with CODEX_ENTRY and stay under 160 words.`, + codexOptions("codex-entry", "codex-entry") +); + +const reconDimensions = [ + ["package-surface", "Inspect package.json scripts and dependency surface."], + ["cli-contract", "Inspect the CLI command behavior described by README and src/cli.js."], + ["parser", "Inspect parsing logic and edge cases."], + ["formatter", "Inspect formatter/output behavior."], + ["storage", "Inspect JSON storage/read-write assumptions."], + ["errors", "Inspect error handling and user-facing messages."], + ["tests-unit", "Inspect unit test coverage and assertions."], + ["tests-integration", "Inspect integration-style gaps."], + ["docs-readme", "Inspect README accuracy and missing workflow details."], + ["docs-examples", "Inspect examples and command snippets."], + ["security-paths", "Inspect path handling and workspace escape risks."], + ["data-model", "Inspect task object fields and schema assumptions."], + ["performance", "Inspect likely performance hot spots for many tasks."], + ["concurrency", "Inspect 
behavior if commands run concurrently."], + ["observability", "Inspect logs/output useful for debugging."], + ["migration", "Inspect versioning or migration gaps."], + ["ux-new-user", "Inspect first-run and onboarding clarity."], + ["ux-failure", "Inspect failure recovery and next-step guidance."], + ["maintainability", "Inspect module boundaries and naming."], + ["release", "Inspect packaging/release readiness."], +]; + +phase("Parallel Recon", "Run twenty independent Codex inspection lanes."); +const recon = await parallel( + reconDimensions.map(([id, instruction], index) => () => + agent( + `${projectContext()} + +Parallel recon lane ${index + 1}/20: ${id} +${instruction} + +Constraints: +- Read files as needed. +- Do not edit files. +- Return exactly one concise paragraph. +- Start the final response with RECON ${id}:`, + codexOptions(`recon-${id}`, `recon-${id}`) + ) + ), + { label: "parallel-recon", max: codexConcurrency } +); + +const pipelineItems = [ + { id: "cli-flow", files: "src/cli.js test/cli.test.js", focus: "CLI user flow and tests" }, + { id: "task-store", files: "src/store.js test/store.test.js", focus: "JSON task persistence" }, + { id: "reporting", files: "src/report.js README.md", focus: "Output/report ergonomics" }, + { id: "docs-contract", files: "README.md package.json", focus: "Documented command contract" }, + { id: "quality-gates", files: "package.json test/", focus: "Validation commands and quality gates" }, +]; + +phase("Pipeline Checks", "Run five two-stage Codex pipelines: inspect then verify."); +const pipelineResults = await pipeline( + pipelineItems, + async (item) => + agent( + `${projectContext()} + +Pipeline inspect stage for ${item.id}. +Focus: ${item.focus} +Files: ${item.files} + +Read the files and return the strongest evidence-backed observation. +Do not edit files. 
+Final response must start with PIPE_INSPECT ${item.id}:`, + codexOptions(`pipe-${item.id}-inspect`, `pipe-${item.id}-inspect`) + ), + async (inspection, item) => + agent( + `${projectContext()} + +Pipeline verify stage for ${item.id}. +Prior inspection: +${compact(inspection)} + +Task: +Challenge the prior inspection. Say whether it is actionable, overstated, or missing evidence. +Do not edit files. +Final response must start with PIPE_VERIFY ${item.id}:`, + codexOptions(`pipe-${item.id}-verify`, `pipe-${item.id}-verify`) + ) +); + +phase("Synthesis", "Use Codex to synthesize the parallel and pipeline evidence."); +const synthesis = await agent( + `${projectContext()} + +Entry: +${compact(entry)} + +Bamboo trials: +${compact(bambooTrials, 1800)} + +Parallel recon results: +${compact(recon, 5000)} + +Pipeline results: +${compact(pipelineResults, 4000)} + +Task: +Synthesize the evidence into: +1. what ODW made easier than direct codexctl; +2. what ODW made harder; +3. one concrete ODW product improvement worth implementing; +4. whether domestic high/low model lanes executed or were blocked. + +Final response must start with SYNTHESIS:`, + codexOptions("codex-synthesis", "codex-synthesis") +); + +phase("Exit Review", "Use Codex as final exit reviewer over the run evidence."); +const exitReview = await agent( + `${projectContext()} + +Synthesis: +${compact(synthesis, 3000)} + +Task: +Act as an exit reviewer. Identify the top ODW improvement candidate and the key evidence needed before committing it. +Also compare ODW orchestration with direct codexctl for this 30+ node workload. +Do not edit files. 
+Final response must start with EXIT_REVIEW:`, + codexOptions("codex-exit-review", "codex-exit-review", { + model: args?.exitCodexModel || codexModel, + effort: args?.exitCodexEffort || "medium", + }) +); + +return { + ok: true, + isolatedWorkspace: globalThis.cwd, + requestedCodexRounds: 33, + countedCodexNodes: { + entry: 1, + parallelRecon: reconDimensions.length, + pipeline: pipelineItems.length * 2, + synthesis: 1, + exitReview: 1, + }, + bambooTrials, + entry, + recon, + pipelineResults, + synthesis, + exitReview, +}; diff --git a/spec/goal/odw-codex-dogfood-direct-codexctl.prompt.md b/spec/goal/odw-codex-dogfood-direct-codexctl.prompt.md new file mode 100644 index 0000000..fc1c948 --- /dev/null +++ b/spec/goal/odw-codex-dogfood-direct-codexctl.prompt.md @@ -0,0 +1,19 @@ +You are the direct codexctl baseline for an ODW dogfood comparison. + +Working directory: +/tmp/odw-dogfood-isolated-wUVV4h + +Task: +Inspect this isolated Dogfood Taskboard CLI project in read-only mode. +Compare what a single direct codexctl planning turn can provide versus an ODW +workflow that decomposes the same work into many parallel and serial model +nodes. + +Please return a concise report with: +1. project shape and key files inspected; +2. the top 5 evidence-backed issues or risks in the mini project; +3. what direct codexctl made easy or hard for this multi-dimension audit; +4. what ODW's planned 30+ node orchestration is expected to improve; +5. one ODW product improvement to watch for during the real dogfood run. + +Do not edit files. Prefer concrete file references and validator commands. 
diff --git a/spec/goal/odw-codex-dogfood-quality.md b/spec/goal/odw-codex-dogfood-quality.md new file mode 100644 index 0000000..9590a0e --- /dev/null +++ b/spec/goal/odw-codex-dogfood-quality.md @@ -0,0 +1,62 @@ +# ODW Codex Dogfood Quality Record + +Run: `/tmp/odw-dogfood-isolated-wUVV4h/.odw/runs/odw-exec-1780586920954-61912` + +Scoring rubric: evidence density (0-3), verification/reproduction (0-2), novelty/risk contribution (0-2), correction/second-pass value (0-2), useful detail bonus (0-1). The score is a practical quality signal for dogfood iteration, not a model benchmark. + +Summary: +- 36 attempted nodes: 33 successful Codex nodes, 3 blocked Bamboo domestic-model nodes. +- Average successful Codex node quality: 7.24/10. +- Best quality pattern: focused recon followed by explicit verify nodes that challenged overstatements. +- Weakest quality pattern: Bamboo high/low domestic trials could not run because no API key was configured. +- Product bug found by quality logging: concurrent agent bookkeeping reused the global `agentIndex` at completion time, so successful parallel nodes had duplicate `state.agents[*].index` values (`24` and `34`). Fixed by capturing a per-call local index. +- Product improvement found by model trial failures: ODW should preflight Bamboo API key availability before dispatch. Fixed by returning structured `state: "blocked"` / `category: "bamboo_missing_api_key"` before PandaCode spawn. + +Post-review iteration: +- First codexctl review found that the initial preflight could falsely block default Bamboo runs with only `DEEPSEEK_API_KEY`, could let unknown explicit providers skip the gate, and could still overwrite raw reports for same-session exec/answer actions. +- The final implementation aligns no-provider Bamboo with PandaCode's default `deepseek`, blocks unknown providers as `bamboo_unknown_provider`, and appends the PandaCode action to raw report filenames. 
+- Added selftest coverage for missing-key preflight, default Deepseek dispatch, unknown-provider blocking, and raw report no-overwrite behavior. + +| # | node | status | score | quality note | +|---:|---|---|---:|---| +| 1 | bamboo-entry-high-qwen | blocked | 0 | High-quality domestic entry could not run; missing Bamboo API key. | +| 2 | bamboo-exec-low-qwen | blocked | 0 | Low-cost domestic execution could not run; same missing key. | +| 3 | bamboo-exit-high-kimi | blocked | 0 | High-quality domestic exit could not run; same missing key. | +| 4 | codex-entry | ok | 8 | Good project map and parallel audit dimensions; included validator. | +| 5 | recon-package-surface | ok | 6 | Accurate dependency/script surface, but mostly descriptive. | +| 6 | recon-cli-contract | ok | 8 | Strong README/code contract evidence and runtime behavior summary. | +| 7 | recon-parser | ok | 6 | Good parser edge-case inventory, no direct reproduction. | +| 8 | recon-formatter | ok | 6 | Useful output-format evidence, limited severity analysis. | +| 9 | recon-storage | ok | 6 | Correct storage assumptions, mostly code inspection. | +| 10 | recon-errors | ok | 8 | Clear failure-mode inventory and test gap evidence. | +| 11 | recon-tests-unit | ok | 6 | Narrow coverage described accurately. | +| 12 | recon-tests-integration | ok | 7 | Good subprocess/exit-code gap analysis. | +| 13 | recon-docs-readme | ok | 6 | Accurate docs gap list, no runtime proof. | +| 14 | recon-docs-examples | ok | 8 | Good alignment between docs, implementation, and tests. | +| 15 | recon-security-paths | ok | 8 | Strong mismatch between README locality claim and unchecked `TASKBOARD_DB`. | +| 16 | recon-data-model | ok | 6 | Good schema assumptions, no malformed-data reproduction. | +| 17 | recon-performance | ok | 5 | Directionally right, but low user impact proof. | +| 18 | recon-concurrency | ok | 8 | Highest-value recon: reproduced silent lost updates with 20 processes. 
| +| 19 | recon-observability | ok | 8 | Good command/logging evidence and actionable observability gap. | +| 20 | recon-migration | ok | 6 | Correct versioning gap, mostly static. | +| 21 | recon-ux-new-user | ok | 6 | Useful first-run gap, no user trace. | +| 22 | recon-ux-failure | ok | 5 | Directional, but mostly duplicates error-lane evidence. | +| 23 | recon-maintainability | ok | 6 | Balanced module-boundary analysis. | +| 24 | recon-release | ok | 8 | Clear manifest evidence and release blocker. | +| 25 | pipe-cli-flow-inspect | ok | 6 | Found real test gap, but needed challenge pass. | +| 26 | pipe-task-store-inspect | ok | 6 | Correct but slightly overstated without reproduction. | +| 27 | pipe-reporting-inspect | ok | 6 | Actionable UX concern, severity under-evidenced. | +| 28 | pipe-docs-contract-inspect | ok | 5 | Weaker because installable CLI requirement was assumed. | +| 29 | pipe-quality-gates-inspect | ok | 6 | Correct lack of layered gates, overstated current tests. | +| 30 | pipe-cli-flow-verify | ok | 10 | Excellent correction: separated real test gap from unsupported claims. | +| 31 | pipe-task-store-verify | ok | 10 | Good second-pass calibration; asked for malformed/concurrent proof. | +| 32 | pipe-reporting-verify | ok | 10 | Good severity correction and test-contract awareness. | +| 33 | pipe-docs-contract-verify | ok | 9 | Strongly corrected unsupported packaging claim. | +| 34 | pipe-quality-gates-verify | ok | 10 | Best calibration of existing tests vs missing error-path coverage. | +| 35 | codex-synthesis | ok | 10 | Correctly compared ODW vs direct codexctl and selected preflight fix. | +| 36 | codex-exit-review | ok | 10 | High-quality exit: named evidence needed and product improvement. | + +Iteration conclusion: +- Quality improved when the workflow forced `inspect -> verify -> synthesize`; verify nodes materially reduced overclaiming. 
+- The most valuable single node was not a high-effort model call; it was a low-effort focused concurrency recon that performed a reproduction. +- For future quality gains, route cheap/low-effort agents to narrow evidence collection, then require high-quality entry/exit reviewers to reject unsupported severity claims. diff --git a/spec/goal/odw-parallel-index-smoke.js b/spec/goal/odw-parallel-index-smoke.js new file mode 100644 index 0000000..75c103d --- /dev/null +++ b/spec/goal/odw-parallel-index-smoke.js @@ -0,0 +1,18 @@ +export default async function parallelIndexSmoke() { + phase("Parallel Index Smoke", "Verify concurrent agent bookkeeping keeps unique call indexes."); + const labels = ["alpha", "beta", "gamma", "delta", "epsilon"]; + const results = await parallel( + labels.map((label) => () => agent(`Return ${label}.`, { + id: `parallel-index-${label}`, + label: `parallel-index-${label}`, + runtime: "codex" + })), + { label: "parallel-index-smoke", maxConcurrency: 5 } + ); + + return { + ok: results.length === labels.length, + labels, + results + }; +}
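Review note on the duplicate-index bug: the quality record attributes the duplicate `state.agents[*].index` values to reading the shared `agentIndex` at completion time, and describes the fix as capturing a per-call local index. The sketch below illustrates that failure mode in isolation. It is a minimal model under stated assumptions, not ODW's runtime code: `makeRuntime`, `captureIndex`, `runSmoke`, and `hasDuplicates` are hypothetical names introduced only for this example.

```javascript
// Hypothetical sketch: these names are illustrative, not ODW internals.
function makeRuntime({ captureIndex }) {
  const state = { agents: [] };
  let agentIndex = 0; // shared counter, bumped once per agent() call

  async function agent(label, work) {
    agentIndex += 1;
    const localIndex = agentIndex; // per-call snapshot taken before any await
    const output = await work();
    // The buggy variant reads the shared counter after the await, by which
    // time other concurrent calls have already advanced it.
    const index = captureIndex ? localIndex : agentIndex;
    state.agents.push({ label, index, output });
    return output;
  }

  return { state, agent };
}

// Launch five concurrent calls, mirroring the labels used in the smoke script.
async function runSmoke(captureIndex) {
  const { state, agent } = makeRuntime({ captureIndex });
  const labels = ["alpha", "beta", "gamma", "delta", "epsilon"];
  await Promise.all(labels.map((label) => agent(label, async () => label)));
  return state.agents.map((entry) => entry.index);
}

// True when at least one index value appears more than once.
function hasDuplicates(values) {
  return new Set(values).size !== values.length;
}

runSmoke(false).then((indexes) => console.log("shared counter read late:", indexes));
runSmoke(true).then((indexes) => console.log("per-call snapshot:", indexes));
```

With the shared counter read after the `await`, the recorded indexes repeat; with the per-call snapshot, the five calls record five distinct values. A smoke workflow like `odw-parallel-index-smoke.js` could assert the same uniqueness property on the recorded indexes rather than only checking `results.length`.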