12 changes: 12 additions & 0 deletions .changeset/plan-file-escape-hatch.md
@@ -0,0 +1,12 @@
---
"@planningo/duul": minor
---

Fix the recurring `-32602: plan required` (and `code`/`approved_plan` equivalents) failure where a caller's tool call collapsed to an empty `{}` and looped.

Two-part fix:

- **Reachable guards.** The large required string fields (`plan`, `code`, `approved_plan`) are now `optional` at the schema level, so an empty/partial call reaches the handler instead of being rejected pre-handler by the MCP SDK. Callers now get actionable retry guidance instead of an opaque `-32602` zod error. (Partition's short `workspace_root` was also relaxed to `optional` for the same reachability reason — the handler still hard-requires it.)
- **File escape hatch.** Added `plan_file` (plan review), `code_file` + `approved_plan_file` (code review), and `approved_plan_file` (execution partition). Callers can write large content to a file with a normal Write call and pass a short relative path; the server reads it (scoped, symlink-guarded, `tracked_only` bypassed for the caller's own artifact). This avoids the large-argument serialization failure that made models emit `{}`.

The reviewer system prompts now emit free-text fields in compressed style to reduce output tokens. Exactly one of the inline field or its `*_file` companion is required.
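
The reachable guard described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not DUUL's handler: `PlanReviewArgs`, `GuardResult`, `isNonBlank`, and `checkPlanArgs` are hypothetical names, and the guidance strings are invented for the example. It assumes the schema already lets an empty `{}` through, so the check runs inside the handler and can return retry guidance instead of a bare `-32602`.

```typescript
// Hypothetical sketch: names and messages are illustrative, not DUUL's actual code.
interface PlanReviewArgs {
  plan?: string;
  plan_file?: string;
  workspace_root?: string;
}

type GuardResult =
  | { ok: true; source: 'inline' | 'file' }
  | { ok: false; guidance: string };

// True only for a string that contains non-whitespace characters.
function isNonBlank(value: string | undefined): value is string {
  return typeof value === 'string' && value.trim().length > 0;
}

// Runs inside the handler, after the relaxed schema has accepted the call.
function checkPlanArgs(args: PlanReviewArgs): GuardResult {
  if (isNonBlank(args.plan)) {
    return { ok: true, source: 'inline' };
  }
  if (isNonBlank(args.plan_file)) {
    if (!isNonBlank(args.workspace_root)) {
      return {
        ok: false,
        guidance: 'plan_file was given without workspace_root. Retry with both fields set.',
      };
    }
    return { ok: true, source: 'file' };
  }
  return {
    ok: false,
    guidance:
      'Neither plan nor plan_file was provided. Write the plan to .duul/plan.md, ' +
      'then retry with plan_file: ".duul/plan.md" and workspace_root.',
  };
}

// An empty call now yields guidance the caller can act on.
console.log(checkPlanArgs({}));
```

In this sketch a non-blank inline value wins when both fields are present, which matches the precedence the tests in this PR exercise for `resolveInlineOrFile`.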
28 changes: 19 additions & 9 deletions .claude/agents/duul-planner.md
@@ -1,7 +1,7 @@
---
name: duul-planner
description: "DUUL Phase 1 plan ping-pong agent. Writes implementation plans and iterates with the plan reviewer until APPROVE. Runs on Sonnet to save tokens."
model: sonnet
description: "DUUL Phase 1 plan ping-pong agent. Writes implementation plans and iterates with the plan reviewer until APPROVE. Runs on Opus for plan quality; writes plans in compressed caveman style to save tokens."
model: opus
tools: Read, Edit, Write, Bash, Grep, Glob, mcp__duul__request_plan_review
---

@@ -25,9 +25,12 @@ You will receive:
- The approach and architecture
- Edge cases and error handling
- Dependencies and imports needed
3. **Call `request_plan_review`** with the plan. ALWAYS include:
- `workspace_root`: the workspace root path
- `plan`: your detailed plan text

**Write the plan in compressed "caveman" style to save tokens:** drop articles (a/an/the), filler (just/really/basically), and pleasantries; prefer fragments over full sentences; use short synonyms. Keep EXACT: file paths, identifiers, function/type names, code, and the verbatim quote of the user's request. Brevity must never drop a required section or change technical meaning.
3. **Submit the plan via file (preferred for reliability).** Write the full plan markdown to `.duul/plan.md` under the workspace root using the Write tool, THEN call `request_plan_review` with `plan_file: ".duul/plan.md"` (relative path). This avoids the large-argument tool-call failure where a big inline `plan` string collapses to an empty `{}`. For a short plan you may inline `plan` instead. ALWAYS include:
- `workspace_root`: the workspace root path (required when using `plan_file`)
- `plan_file`: `".duul/plan.md"` (relative path) — OR `plan` with the full text inline for short plans
- `user_original_request`: the user's verbatim message
- `project_context`: file tree, changed files, relevant code snippets
- `artifact_refs`: key files the reviewer should look at
- `iteration_count`: starts at 1, increment each round
@@ -61,17 +64,24 @@ approved_plan: <the full approved plan text>

## Tool input rules (CRITICAL)

When calling `request_plan_review`, your tool input MUST include the `plan` field with the full plan markdown as a string. Do **NOT** send an empty `{}` object — that triggers an MCP validation error (`-32602: plan required`).
A large inline `plan` string is the #1 cause of failed DUUL calls: the model tries to emit a big markdown value inside the tool's large schema and the whole argument object collapses to an empty `{}`, which the MCP server rejects (`-32602: plan required`) — then the call loops.

**The fix: route the large plan through a file.**

**Minimum valid call:**
1. Write the full plan markdown to `.duul/plan.md` under the workspace root with the **Write** tool (Write has a tiny, reliable schema — big content goes through fine here).
2. Call `request_plan_review` with a *small* argument object that points at the file:

```json
{
"plan": "## Problem\n<verbatim user request>\n\n## Files\n- path/to/file.ts: <change>\n\n## Approach\n<...>\n\n## Edge cases\n<...>",
"plan_file": ".duul/plan.md",
"workspace_root": "/absolute/path/to/repo",
"user_original_request": "<verbatim user message>",
"iteration_count": 1
}
```

If you find yourself unable to write the plan text in one tool-use turn (e.g. the plan is too long), draft and finalize the plan in your thinking/scratch first, then make a single tool call with the complete `plan` string. **Never call the tool with placeholder, empty, or partial input.** If the tool returns the validation error above, you wrote an empty input — re-read your draft and call again with the full `plan` string populated.
The server reads `.duul/plan.md` and uses its contents as the plan. `plan_file` must be a **relative** path inside `workspace_root`.

**Short plans only:** you may instead inline `plan` directly. Exactly one of `plan` or `plan_file` is required.

**If a call ever errors:** read the error text — the server now returns actionable guidance (it no longer just says `-32602`). Do **NOT** retry the identical empty call. Switch to the `plan_file` path above. Update the file and call again with the same `plan_file` on each REVISE round.
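
For a programmatic caller, the two steps above might look like the sketch below. It assumes Node.js; `stagePlanForReview`, `PlanReviewArgs`, and `PLAN_FILE` are hypothetical names, and only the fields from the minimum valid call are populated. Sending the arguments to the MCP server is deliberately left out.

```typescript
import { mkdirSync, mkdtempSync, readFileSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { dirname, join } from 'node:path';

// Hypothetical shape: only the fields from the minimum valid call.
interface PlanReviewArgs {
  plan_file: string;
  workspace_root: string;
  user_original_request: string;
  iteration_count: number;
}

// Relative to workspace_root, as the planner instructions require.
const PLAN_FILE = '.duul/plan.md';

// Step 1: write the large plan to disk. Step 2: return a small argument object.
function stagePlanForReview(
  workspaceRoot: string,
  planMarkdown: string,
  userRequest: string,
  iteration: number,
): PlanReviewArgs {
  const target = join(workspaceRoot, PLAN_FILE);
  mkdirSync(dirname(target), { recursive: true });
  writeFileSync(target, planMarkdown, 'utf8');
  return {
    plan_file: PLAN_FILE,
    workspace_root: workspaceRoot,
    user_original_request: userRequest,
    iteration_count: iteration,
  };
}

// Demo against a throwaway directory standing in for a real repository.
const demoRoot = mkdtempSync(join(tmpdir(), 'duul-demo-'));
const demoArgs = stagePlanForReview(demoRoot, '## Problem\nadd retry logic', 'add retry logic', 1);
console.log(JSON.stringify(demoArgs, null, 2));
console.log(readFileSync(join(demoRoot, demoArgs.plan_file), 'utf8').length, 'characters staged');
```

On a REVISE round the same function can be called again with the updated plan and an incremented `iteration`; the file is overwritten and the argument object stays small regardless of plan size.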
8 changes: 4 additions & 4 deletions CLAUDE.md
@@ -15,11 +15,11 @@ Only activate when the user mentions **"DUUL"** (or **"두울"**) in their reque

**CRITICAL:** This is a continuous, uninterrupted sequence. Do **NOT** pause between phases to ask the user "should I proceed?" or "should I implement?". The user already authorized the full loop when they requested DUUL.

### Phase 1: Upfront-plan Ping-Pong (delegated to Sonnet subagent)
### Phase 1: Upfront-plan Ping-Pong (delegated to Opus subagent)

**To save tokens, Phase 1 runs on Sonnet via the `duul-planner` subagent.** The reviewer catches any plan issues, so Sonnet is sufficient for plan authoring.
**Phase 1 runs on Opus via the `duul-planner` subagent for maximum plan quality.** To keep token cost down, the planner writes plans in compressed "caveman" style and submits the plan via a file (`plan_file`) rather than a large inline string.

1. **Launch the `duul-planner` subagent** using the Agent tool with the user's requirements, workspace root path, and any relevant context. The subagent runs on Sonnet automatically (`model: sonnet` in its definition).
1. **Launch the `duul-planner` subagent** using the Agent tool with the user's requirements, workspace root path, and any relevant context. The subagent runs on Opus automatically (`model: opus` in its definition).
2. The subagent handles the entire plan ping-pong loop internally:
- Writes a detailed implementation plan
- Calls `request_plan_review` and iterates on REVISE feedback
@@ -35,7 +35,7 @@ Phase 2 runs on the **main agent (Opus)** for maximum code quality.

7. **Write the actual code** to the project files based on the approved plan (received from the `duul-planner` subagent). Use your edit/write tools to make real changes.
8. **Run lint if available.** Check `package.json` scripts for `lint`, `lint:fix`, or `eslint`, or check for a Makefile/config equivalent. If a lint command exists, run it with auto-fix (e.g. `npm run lint -- --fix` or `npx eslint --fix`). Fix any remaining errors before proceeding. If no lint is configured, skip this step.
9. Call `request_code_review` with the code and the approved plan.
9. Call `request_code_review` with the code and the approved plan. **For large content, avoid huge inline strings** (they can make the tool call collapse to an empty `{}` and fail validation): write the code to `.duul/code.md` and the plan to `.duul/plan.md`, then pass `code_file: ".duul/code.md"` and `approved_plan_file: ".duul/plan.md"` (relative paths) plus `workspace_root`. Inline `code`/`approved_plan` only for small content.
10. If `review_status === "incomplete"`: check `missing_context` and retry with narrower scope.
11. If `blocking_issues.length > 0` or `verdict === "REVISE"`: fix the code in the actual files, re-run lint if applicable, and call again.
12. If `requires_human_review === true`: pause and ask the user.
8 changes: 4 additions & 4 deletions README.ko.md
@@ -25,7 +25,7 @@ DUUL은 [Model Context Protocol](https://modelcontextprotocol.io/) 서버로, MC

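The calling agent iterates with the reviewer at each phase until it receives an `APPROVE` verdict, then proceeds to the next phase. This creates a cross-model review workflow in which one LLM verifies another LLM's work.

**Token-efficient design:** Phase 1 (plan authoring) is delegated to a Sonnet-class subagent, which is sufficient because the reviewer catches problems in the plan. Phase 2 (code implementation) runs on Opus for maximum code quality. This reduces Phase 1 token cost by about 80%.
**Token-efficient design:** Both phases run on Opus for maximum quality. To lower cost, the Phase 1 planner writes plans in compressed "caveman" style and submits large plans via a file (`plan_file`) instead of a huge inline string, and the reviewer outputs its results in the same compressed form.

The reviewer has **workspace-aware file exploration**: when given a `workspace_root`, it uses 7 built-in tools (reading files, searching code, listing directories, and so on) to make informed review decisions.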

@@ -268,9 +268,9 @@ node scripts/token-report.mjs --plan max20 --all-time

```mermaid
flowchart TD
Start(["User: 'Proceed with development using DUUL'"]):::trigger --> Plan["Write implementation plan\n(Sonnet subagent)"]:::sonnet
Start(["User: 'Proceed with development using DUUL'"]):::trigger --> Plan["Write implementation plan\n(Opus subagent)"]:::planner

subgraph Phase1["Phase 1: Plan Ping-Pong — Sonnet (max 7 iterations)"]
subgraph Phase1["Phase 1: Plan Ping-Pong — Opus (max 7 iterations)"]
Plan --> PR["request_plan_review"]
PR --> IterCheck1{iteration\nlimit?}
IterCheck1 -- "exceeded" --> Human1["⏸ requires_human_review: true"]
@@ -305,7 +305,7 @@ flowchart TD
classDef trigger fill:#e1f5fe,stroke:#0288d1,color:#01579b
classDef approved fill:#e8f5e9,stroke:#388e3c,color:#1b5e20
classDef done fill:#c8e6c9,stroke:#2e7d32,color:#1b5e20,stroke-width:2px
classDef sonnet fill:#fff3e0,stroke:#f57c00,color:#e65100
classDef planner fill:#fff3e0,stroke:#f57c00,color:#e65100
classDef opus fill:#ede7f6,stroke:#7b1fa2,color:#4a148c
```

8 changes: 4 additions & 4 deletions README.md
@@ -25,7 +25,7 @@ DUUL is a [Model Context Protocol](https://modelcontextprotocol.io/) server that

The calling agent iterates with the reviewer on each phase until it receives an `APPROVE` verdict, then moves to the next phase. This creates a cross-model peer review workflow where one LLM checks the work of another.

**Token-efficient by design:** Phase 1 (plan authoring) is delegated to a Sonnet-class subagent, since the reviewer catches any plan issues anyway. Phase 2 (code implementation) stays on Opus for maximum code quality. This typically reduces Phase 1 token costs by ~80%.
**Token-efficient by design:** Both phases run on Opus for maximum quality. To keep cost down, the Phase 1 planner writes plans in compressed "caveman" style and submits large plans via a file (`plan_file`) instead of a giant inline string, and the reviewer emits its findings in the same compressed form.

The reviewer has **workspace-aware file exploration** -- when given a `workspace_root`, it can autonomously browse the codebase using 7 built-in tools (read files, search code, list directories, etc.) to make informed review decisions instead of speculating.

@@ -268,9 +268,9 @@ Reads `~/.duul/usage.jsonl` (set `DUUL_DEBUG_TOKEN=1` in your MCP env to enable

```mermaid
flowchart TD
Start(["User: 'run DUUL'"]):::trigger --> Plan["Write implementation plan\n(Sonnet subagent)"]:::sonnet
Start(["User: 'run DUUL'"]):::trigger --> Plan["Write implementation plan\n(Opus subagent)"]:::planner

subgraph Phase1["Phase 1: Plan Ping-Pong — Sonnet (max 7 iterations)"]
subgraph Phase1["Phase 1: Plan Ping-Pong — Opus (max 7 iterations)"]
Plan --> PR["request_plan_review"]
PR --> IterCheck1{iteration\nlimit?}
IterCheck1 -- "exceeded" --> Human1["⏸ requires_human_review: true"]
@@ -305,7 +305,7 @@ flowchart TD
classDef trigger fill:#e1f5fe,stroke:#0288d1,color:#01579b
classDef approved fill:#e8f5e9,stroke:#388e3c,color:#1b5e20
classDef done fill:#c8e6c9,stroke:#2e7d32,color:#1b5e20,stroke-width:2px
classDef sonnet fill:#fff3e0,stroke:#f57c00,color:#e65100
classDef planner fill:#fff3e0,stroke:#f57c00,color:#e65100
classDef opus fill:#ede7f6,stroke:#7b1fa2,color:#4a148c
```

4 changes: 2 additions & 2 deletions package-lock.json

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

101 changes: 101 additions & 0 deletions src/__tests__/inline-or-file.test.ts
@@ -0,0 +1,101 @@
import { test } from 'node:test';
import assert from 'node:assert/strict';
import { mkdtempSync, writeFileSync, rmSync, symlinkSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { resolveInlineOrFile, type WorkspaceScope } from '../services/filesystem.js';

function makeScope(root: string): WorkspaceScope {
return { root, workingDirectories: null, linkedRoots: [], trackedOnly: false };
}

test('returns inline value when present', async () => {
const result = await resolveInlineOrFile({ inline: 'hello world', file: undefined, scope: null, label: 'plan' });
assert.equal(result, 'hello world');
});

test('reads from file when inline is empty', async () => {
const dir = mkdtempSync(join(tmpdir(), 'duul-inline-'));
try {
writeFileSync(join(dir, 'plan.md'), '## Plan\nfull contents from file');
const result = await resolveInlineOrFile({
inline: undefined,
file: 'plan.md',
scope: makeScope(dir),
label: 'plan',
});
assert.equal(result, '## Plan\nfull contents from file');
} finally {
rmSync(dir, { recursive: true, force: true });
}
});

test('whitespace-only inline falls through to file', async () => {
const dir = mkdtempSync(join(tmpdir(), 'duul-inline-'));
try {
writeFileSync(join(dir, 'plan.md'), 'real content');
const result = await resolveInlineOrFile({
inline: ' \n ',
file: 'plan.md',
scope: makeScope(dir),
label: 'plan',
});
assert.equal(result, 'real content');
} finally {
rmSync(dir, { recursive: true, force: true });
}
});

test('inline takes precedence over file', async () => {
const dir = mkdtempSync(join(tmpdir(), 'duul-inline-'));
try {
writeFileSync(join(dir, 'plan.md'), 'from file');
const result = await resolveInlineOrFile({
inline: 'from inline',
file: 'plan.md',
scope: makeScope(dir),
label: 'plan',
});
assert.equal(result, 'from inline');
} finally {
rmSync(dir, { recursive: true, force: true });
}
});

test('returns undefined when neither inline nor file is provided', async () => {
const result = await resolveInlineOrFile({ inline: undefined, file: undefined, scope: null, label: 'plan' });
assert.equal(result, undefined);
});

test('throws when file is given but no workspace scope is set', async () => {
await assert.rejects(
() => resolveInlineOrFile({ inline: undefined, file: 'plan.md', scope: null, label: 'plan' }),
/plan_file was provided.*no workspace_root/s,
);
});

test('throws when file path escapes the workspace root', async () => {
const dir = mkdtempSync(join(tmpdir(), 'duul-inline-'));
try {
await assert.rejects(
() => resolveInlineOrFile({ inline: undefined, file: '../escape.md', scope: makeScope(dir), label: 'plan' }),
/outside project root/,
);
} finally {
rmSync(dir, { recursive: true, force: true });
}
});

test('blocks an in-root symlink that points at a secret (.env)', async () => {
const dir = mkdtempSync(join(tmpdir(), 'duul-inline-'));
try {
writeFileSync(join(dir, '.env'), 'SECRET=topsecret');
symlinkSync(join(dir, '.env'), join(dir, 'innocent.md'));
await assert.rejects(
() => resolveInlineOrFile({ inline: undefined, file: 'innocent.md', scope: makeScope(dir), label: 'plan' }),
/Access denied \(sensitive file\)/,
);
} finally {
rmSync(dir, { recursive: true, force: true });
}
});
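
The diff shown here does not include `resolveInlineOrFile` itself. As a rough illustration of the behaviour these tests pin down, one possible shape is sketched below. Everything in it is an assumption: `resolveInlineOrFileSketch` is a hypothetical name, `Scope` is a simplified stand-in for `WorkspaceScope`, the `.env` deny-list is invented, and the error messages are chosen only so that they would match the regexes in the tests.

```typescript
import { readFile, realpath } from 'node:fs/promises';
import { basename, isAbsolute, relative, resolve, sep } from 'node:path';

// Hypothetical, simplified stand-in for the real WorkspaceScope.
interface Scope {
  root: string;
}

interface ResolveArgs {
  inline: string | undefined;
  file: string | undefined;
  scope: Scope | null;
  label: string;
}

// Assumed deny-list; the real sensitive-file rules are not shown in this diff.
const SENSITIVE_NAME = /^\.env(\..*)?$/;

// True when target is strictly below root (both paths already absolute).
function isInside(root: string, target: string): boolean {
  const rel = relative(root, target);
  return rel !== '' && rel !== '..' && !rel.startsWith('..' + sep) && !isAbsolute(rel);
}

async function resolveInlineOrFileSketch(args: ResolveArgs): Promise<string | undefined> {
  const { inline, file, scope, label } = args;

  // A non-blank inline value wins; whitespace-only falls through to the file.
  if (inline !== undefined && inline.trim() !== '') return inline;
  if (file === undefined || file.trim() === '') return undefined;

  if (scope === null) {
    throw new Error(`${label}_file was provided but no workspace_root is set`);
  }

  const root = await realpath(scope.root);
  const lexical = resolve(root, file);
  // Reject "../" escapes before touching the file system.
  if (!isInside(root, lexical)) {
    throw new Error(`${label}_file is outside project root`);
  }

  // Follow symlinks, then re-check containment and the deny-list on the real target.
  const real = await realpath(lexical);
  if (!isInside(root, real)) {
    throw new Error(`${label}_file is outside project root`);
  }
  if (SENSITIVE_NAME.test(basename(real))) {
    throw new Error('Access denied (sensitive file)');
  }
  return readFile(real, 'utf8');
}
```

Checking the path twice, once lexically and once after `realpath`, is what lets a sketch like this reject both a plain `../` escape and an in-root symlink whose real target is a secret.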
7 changes: 7 additions & 0 deletions src/prompts/code-review-system.ts
@@ -30,6 +30,13 @@ A junior developer wrote code based on an approved plan. You must verify that ev
- Do NOT put actionable corrections in \`non_blocking_suggestions\` to soften the tone — if the code would be more correct or safer with the change, it belongs in \`blocking_issues\` with verdict "REVISE".
- \`confidence\`: Your honest confidence (0-1). If the code is too short to fully evaluate, or context is missing, be honest about it and set \`requires_human_review: true\`.

## Output Style — Compressed (token economy)
Write every free-text VALUE (logic_validation, blocking_issues.description/suggestion, non_blocking_suggestions, vulnerabilities.description, symptom_impact prose, symptom_match_notes) in compressed "caveman" style to save tokens:
- Drop articles (a/an/the), filler (just/really/basically/actually/simply), and pleasantries.
- Prefer fragments over full sentences. Pattern: "[location] [problem]. [fix]." beats prose.
- Use short synonyms (big not extensive, fix not "implement a solution for").
Keep EXACT and uncompressed: JSON keys, enum values (APPROVE/REVISE, severities), \`optimized_snippet\` code, file paths, identifiers, function/type names, and any quoted user text (\`user_original_request_echo\` stays verbatim). Brevity must never drop a required field, soften a blocking issue, or change technical meaning.

## Verdict Calibration
Do NOT conflate positive tone with APPROVE. Code can be "almost perfect" and still require REVISE. The verdict is determined solely by whether blocking_issues is empty:
- blocking_issues is empty → APPROVE is allowed
7 changes: 7 additions & 0 deletions src/prompts/plan-review-system.ts
@@ -28,6 +28,13 @@ You are reviewing a development plan submitted by a junior developer. Your job i
- \`edge_cases\`: List specific scenarios the plan does not account for.
- \`checklist_for_implementation\`: Concrete steps the developer must follow during coding.

## Output Style — Compressed (token economy)
Write every free-text VALUE (architectural_analysis, blocking_issues.description/suggestion, non_blocking_suggestions, edge_cases, checklist_for_implementation, symptom_impact prose, symptom_match_notes) in compressed "caveman" style to save tokens:
- Drop articles (a/an/the), filler (just/really/basically/actually/simply), and pleasantries.
- Prefer fragments over full sentences. Pattern: "[thing] [problem]. [fix]." beats prose.
- Use short synonyms (big not extensive, fix not "implement a solution for").
Keep EXACT and uncompressed: JSON keys, enum values (APPROVE/REVISE, severities), code, file paths, identifiers, function/type names, and any quoted user text (\`user_original_request_echo\` stays verbatim). Brevity must never drop a required field, soften a blocking issue, or change technical meaning.

## Verdict Calibration
Do NOT conflate positive tone with APPROVE. A plan can be "mostly good" or "almost there" and still require REVISE. The verdict is determined solely by whether blocking_issues is empty:
- blocking_issues is empty → APPROVE is allowed (but not required if you have low confidence)