Planningo · Suprhimp · Jun 5, 2026
diff --git a/.changeset/plan-file-escape-hatch.md b/.changeset/plan-file-escape-hatch.md
@@ -0,0 +1,12 @@
+---
+"@planningo/duul": minor
+---
+
+Fix the recurring `-32602: plan required` (and `code`/`approved_plan` equivalents) failure where a caller's tool call collapsed to an empty `{}` and looped.
+
+Two-part fix:
+
+- **Reachable guards.** The large required string fields (`plan`, `code`, `approved_plan`) are now `optional` at the schema level, so an empty/partial call reaches the handler instead of being rejected pre-handler by the MCP SDK. Callers now get actionable retry guidance instead of an opaque `-32602` zod error. (Partition's short `workspace_root` was also relaxed to `optional` for the same reachability reason — the handler still hard-requires it.)
+- **File escape hatch.** Added `plan_file` (plan review), `code_file` + `approved_plan_file` (code review), and `approved_plan_file` (execution partition). Callers can write large content to a file with a normal Write call and pass a short relative path; the server reads it (scoped, symlink-guarded, `tracked_only` bypassed for the caller's own artifact). This avoids the large-argument serialization failure that made models emit `{}`.
+
+The reviewer system prompts now instruct the reviewer to write free-text fields in compressed style to reduce output tokens. Exactly one of the inline field or its `*_file` companion is required.
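
The "reachable guards" idea in the changeset can be illustrated with a minimal sketch. It assumes the fields are optional at the schema level, so an empty `{}` reaches the handler, which then returns retry guidance itself. The names `PlanReviewInput`, `GuardResult`, and `guardPlanInput`, and the message wording, are hypothetical and are not taken from the DUUL source.

```typescript
// Hypothetical input shape: every field is optional, so `{}` passes schema validation.
interface PlanReviewInput {
  plan?: string;
  plan_file?: string;
  workspace_root?: string;
}

type GuardResult =
  | { ok: true; source: 'inline' | 'file' }
  | { ok: false; message: string };

function isBlank(value: string | undefined): boolean {
  return value === undefined || value.trim() === '';
}

// Handler-level guard: decides where the plan comes from, or explains how to retry.
function guardPlanInput(input: PlanReviewInput): GuardResult {
  if (!isBlank(input.plan)) {
    return { ok: true, source: 'inline' };
  }
  if (!isBlank(input.plan_file)) {
    if (isBlank(input.workspace_root)) {
      return {
        ok: false,
        message: '`plan_file` was provided but `workspace_root` is missing. Pass the workspace root and retry.',
      };
    }
    return { ok: true, source: 'file' };
  }
  // Assumed wording: the point is that the caller gets a next step, not a bare error code.
  return {
    ok: false,
    message:
      'No plan received (arguments were empty or partial). Do not repeat the same call: ' +
      'write the plan to `.duul/plan.md`, then call again with `plan_file` and `workspace_root`.',
  };
}

console.log(guardPlanInput({}));
console.log(guardPlanInput({ plan_file: '.duul/plan.md', workspace_root: '/absolute/path/to/repo' }));
```

Returning a result object rather than throwing keeps the guidance text in the tool response, which is what lets the calling model change its next call instead of looping.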
diff --git a/.claude/agents/duul-planner.md b/.claude/agents/duul-planner.md
@@ -1,7 +1,7 @@
 ---
 name: duul-planner
-description: "DUUL Phase 1 plan ping-pong agent. Writes implementation plans and iterates with the plan reviewer until APPROVE. Runs on Sonnet to save tokens."
-model: sonnet
+description: "DUUL Phase 1 plan ping-pong agent. Writes implementation plans and iterates with the plan reviewer until APPROVE. Runs on Opus for plan quality; writes plans in compressed caveman style to save tokens."
+model: opus
 tools: Read, Edit, Write, Bash, Grep, Glob, mcp__duul__request_plan_review
 ---
 
@@ -25,9 +25,12 @@ You will receive:
    - The approach and architecture
    - Edge cases and error handling
    - Dependencies and imports needed
-3. **Call `request_plan_review`** with the plan. ALWAYS include:
-   - `workspace_root`: the workspace root path
-   - `plan`: your detailed plan text
+
+   **Write the plan in compressed "caveman" style to save tokens:** drop articles (a/an/the), filler (just/really/basically), and pleasantries; prefer fragments over full sentences; use short synonyms. Keep EXACT: file paths, identifiers, function/type names, code, and the verbatim quote of the user's request. Brevity must never drop a required section or change technical meaning.
+3. **Submit the plan via file (preferred for reliability).** Write the full plan markdown to `.duul/plan.md` under the workspace root using the Write tool, THEN call `request_plan_review` with `plan_file: ".duul/plan.md"` (relative path). This avoids the large-argument tool-call failure where a big inline `plan` string collapses to an empty `{}`. For a short plan you may inline `plan` instead. ALWAYS include:
+   - `workspace_root`: the workspace root path (required when using `plan_file`)
+   - `plan_file`: `".duul/plan.md"` (relative path) — OR `plan` with the full text inline for short plans
+   - `user_original_request`: the user's verbatim message
    - `project_context`: file tree, changed files, relevant code snippets
    - `artifact_refs`: key files the reviewer should look at
    - `iteration_count`: starts at 1, increment each round
@@ -61,17 +64,24 @@ approved_plan: <the full approved plan text>
 
 ## Tool input rules (CRITICAL)
 
-When calling `request_plan_review`, your tool input MUST include the `plan` field with the full plan markdown as a string. Do **NOT** send an empty `{}` object — that triggers an MCP validation error (`-32602: plan required`).
+A large inline `plan` string is the #1 cause of failed DUUL calls: the model tries to emit a big markdown value inside the tool's large schema and the whole argument object collapses to an empty `{}`, which the MCP server rejects (`-32602: plan required`) — then the call loops.
+
+**The fix: route the large plan through a file.**
 
-**Minimum valid call:**
+1. Write the full plan markdown to `.duul/plan.md` under the workspace root with the **Write** tool (Write has a tiny, reliable schema — big content goes through fine here).
+2. Call `request_plan_review` with a *small* argument object that points at the file:
 
 ```json
 {
-  "plan": "## Problem\n<verbatim user request>\n\n## Files\n- path/to/file.ts: <change>\n\n## Approach\n<...>\n\n## Edge cases\n<...>",
+  "plan_file": ".duul/plan.md",
   "workspace_root": "/absolute/path/to/repo",
   "user_original_request": "<verbatim user message>",
   "iteration_count": 1
 }
 ```
 
-If you find yourself unable to write the plan text in one tool-use turn (e.g. the plan is too long), draft and finalize the plan in your thinking/scratch first, then make a single tool call with the complete `plan` string. **Never call the tool with placeholder, empty, or partial input.** If the tool returns the validation error above, you wrote an empty input — re-read your draft and call again with the full `plan` string populated.
+The server reads `.duul/plan.md` and uses its contents as the plan. `plan_file` must be a **relative** path inside `workspace_root`.
+
+**Short plans only:** you may instead inline `plan` directly. Exactly one of `plan` or `plan_file` is required.
+
+**If a call ever errors:** read the error text — the server now returns actionable guidance (it no longer just says `-32602`). Do **NOT** retry the identical empty call. Switch to the `plan_file` path above. Update the file and call again with the same `plan_file` on each REVISE round.
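
The caller-side routine described in the agent instructions above can be sketched as a small helper that keeps the tool arguments short. This is an illustration only: `preparePlanSubmission`, `PlanSubmission`, and the `INLINE_LIMIT_CHARS` cutoff are hypothetical, since the instructions say only that "short plans" may be inlined and give no threshold.

```typescript
// Hypothetical argument shape, limited to fields named in the agent instructions.
interface PlanReviewArgs {
  workspace_root: string;
  user_original_request: string;
  iteration_count: number;
  plan?: string;
  plan_file?: string;
}

// Assumption: an illustrative cutoff, not a value from the DUUL source.
const INLINE_LIMIT_CHARS = 1500;
const PLAN_FILE = '.duul/plan.md';

interface PlanSubmission {
  args: PlanReviewArgs;
  // Non-null when the caller must write the plan to disk before calling the tool.
  writeFirst: { relativePath: string; contents: string } | null;
}

function preparePlanSubmission(
  workspaceRoot: string,
  userRequest: string,
  planText: string,
  iteration: number,
): PlanSubmission {
  const base = {
    workspace_root: workspaceRoot,
    user_original_request: userRequest,
    iteration_count: iteration,
  };
  if (planText.length <= INLINE_LIMIT_CHARS) {
    // Short plan: inline it and send no plan_file.
    return { args: { ...base, plan: planText }, writeFirst: null };
  }
  // Large plan: the arguments carry only a relative path; the content goes to the file.
  return {
    args: { ...base, plan_file: PLAN_FILE },
    writeFirst: { relativePath: PLAN_FILE, contents: planText },
  };
}

const round1 = preparePlanSubmission(
  '/absolute/path/to/repo',
  '<verbatim user message>',
  'x'.repeat(5000),
  1,
);
console.log(JSON.stringify(round1.args, null, 2));
```

On a REVISE round the same helper would be called with the updated plan text and an incremented `iteration`, so the `plan_file` path stays the same across rounds.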
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -15,11 +15,11 @@ Only activate when the user mentions **"DUUL"** (or **"두울"**) in their reque
 
 **CRITICAL:** This is a continuous, uninterrupted sequence. Do **NOT** pause between phases to ask the user "should I proceed?" or "should I implement?". The user already authorized the full loop when they requested DUUL.
 
-### Phase 1: Upfront-plan Ping-Pong (delegated to Sonnet subagent)
+### Phase 1: Upfront-plan Ping-Pong (delegated to Opus subagent)
 
-**To save tokens, Phase 1 runs on Sonnet via the `duul-planner` subagent.** The reviewer catches any plan issues, so Sonnet is sufficient for plan authoring.
+**Phase 1 runs on Opus via the `duul-planner` subagent for maximum plan quality.** To keep token cost down, the planner writes plans in compressed "caveman" style and submits the plan via a file (`plan_file`) rather than a large inline string.
 
-1. **Launch the `duul-planner` subagent** using the Agent tool with the user's requirements, workspace root path, and any relevant context. The subagent runs on Sonnet automatically (`model: sonnet` in its definition).
+1. **Launch the `duul-planner` subagent** using the Agent tool with the user's requirements, workspace root path, and any relevant context. The subagent runs on Opus automatically (`model: opus` in its definition).
 2. The subagent handles the entire plan ping-pong loop internally:
    - Writes a detailed implementation plan
    - Calls `request_plan_review` and iterates on REVISE feedback
@@ -35,7 +35,7 @@ Phase 2 runs on the **main agent (Opus)** for maximum code quality.
 
 7. **Write the actual code** to the project files based on the approved plan (received from the `duul-planner` subagent). Use your edit/write tools to make real changes.
 8. **Run lint if available.** Check `package.json` scripts for `lint`, `lint:fix`, or `eslint`, or check for a Makefile/config equivalent. If a lint command exists, run it with auto-fix (e.g. `npm run lint -- --fix` or `npx eslint --fix`). Fix any remaining errors before proceeding. If no lint is configured, skip this step.
-9. Call `request_code_review` with the code and the approved plan.
+9. Call `request_code_review` with the code and the approved plan. **For large content, avoid huge inline strings** (they can make the tool call collapse to an empty `{}` and fail validation): write the code to `.duul/code.md` and the plan to `.duul/plan.md`, then pass `code_file: ".duul/code.md"` and `approved_plan_file: ".duul/plan.md"` (relative paths) plus `workspace_root`. Inline `code`/`approved_plan` only for small content.
 10. If `review_status === "incomplete"`: check `missing_context` and retry with narrower scope.
 11. If `blocking_issues.length > 0` or `verdict === "REVISE"`: fix the code in the actual files, re-run lint if applicable, and call again.
 12. If `requires_human_review === true`: pause and ask the user.

diff --git a/README.ko.md b/README.ko.md
@@ -25,7 +25,7 @@ DUUL은 [Model Context Protocol](https://modelcontextprotocol.io/) 서버로, MC
 
 호출 에이전트는 각 단계에서 `APPROVE` 판정을 받을 때까지 리뷰어와 반복하고, 이후 다음 단계로 진행합니다. 이를 통해 한 LLM이 다른 LLM의 작업을 검증하는 크로스 모델 리뷰 워크플로우를 만듭니다.
 
-**토큰 효율 설계:** 1단계(계획 작성)는 Sonnet급 서브에이전트에 위임합니다 — 리뷰어가 계획의 문제를 잡아주므로 충분합니다. 2단계(코드 구현)는 최대 코드 품질을 위해 Opus에서 실행됩니다. 이를 통해 1단계 토큰 비용이 약 80% 절감됩니다.
+**토큰 효율 설계:** 두 단계 모두 최대 품질을 위해 Opus에서 실행됩니다. 비용을 낮추기 위해, 1단계 플래너는 압축된 "케이브맨" 스타일로 계획을 작성하고, 큰 계획은 거대한 인라인 문자열 대신 파일(`plan_file`)로 제출하며, 리뷰어도 같은 압축 형식으로 결과를 출력합니다.
 
 리뷰어는 **워크스페이스 인식 파일 탐색** 기능을 갖추고 있어, `workspace_root`가 주어지면 7개의 내장 도구(파일 읽기, 코드 검색, 디렉토리 목록 등)를 사용하여 정보에 기반한 리뷰 결정을 내립니다.
 
@@ -268,9 +268,9 @@ node scripts/token-report.mjs --plan max20 --all-time
 
 ```mermaid
 flowchart TD
-    Start(["사용자: 'DUUL로 개발 진행해줘'"]):::trigger --> Plan["구현 계획 작성\n(Sonnet 서브에이전트)"]:::sonnet
+    Start(["사용자: 'DUUL로 개발 진행해줘'"]):::trigger --> Plan["구현 계획 작성\n(Opus 서브에이전트)"]:::planner
 
-    subgraph Phase1["1단계: 계획 핑퐁 — Sonnet (최대 7회 반복)"]
+    subgraph Phase1["1단계: 계획 핑퐁 — Opus (최대 7회 반복)"]
         Plan --> PR["request_plan_review"]
         PR --> IterCheck1{반복\n제한?}
         IterCheck1 -- "초과" --> Human1["⏸ requires_human_review: true"]
@@ -305,7 +305,7 @@ flowchart TD
     classDef trigger fill:#e1f5fe,stroke:#0288d1,color:#01579b
     classDef approved fill:#e8f5e9,stroke:#388e3c,color:#1b5e20
     classDef done fill:#c8e6c9,stroke:#2e7d32,color:#1b5e20,stroke-width:2px
-    classDef sonnet fill:#fff3e0,stroke:#f57c00,color:#e65100
+    classDef planner fill:#fff3e0,stroke:#f57c00,color:#e65100
     classDef opus fill:#ede7f6,stroke:#7b1fa2,color:#4a148c
 ```
 

diff --git a/README.md b/README.md
@@ -25,7 +25,7 @@ DUUL is a [Model Context Protocol](https://modelcontextprotocol.io/) server that
 
 The calling agent iterates with the reviewer on each phase until it receives an `APPROVE` verdict, then moves to the next phase. This creates a cross-model peer review workflow where one LLM checks the work of another.
 
-**Token-efficient by design:** Phase 1 (plan authoring) is delegated to a Sonnet-class subagent, since the reviewer catches any plan issues anyway. Phase 2 (code implementation) stays on Opus for maximum code quality. This typically reduces Phase 1 token costs by ~80%.
+**Token-efficient by design:** Both phases run on Opus for maximum quality. To keep cost down, the Phase 1 planner writes plans in compressed "caveman" style and submits large plans via a file (`plan_file`) instead of a giant inline string, and the reviewer emits its findings in the same compressed form.
 
 The reviewer has **workspace-aware file exploration** -- when given a `workspace_root`, it can autonomously browse the codebase using 7 built-in tools (read files, search code, list directories, etc.) to make informed review decisions instead of speculating.
 
@@ -268,9 +268,9 @@ Reads `~/.duul/usage.jsonl` (set `DUUL_DEBUG_TOKEN=1` in your MCP env to enable
 
 ```mermaid
 flowchart TD
-    Start(["User: 'run DUUL'"]):::trigger --> Plan["Write implementation plan\n(Sonnet subagent)"]:::sonnet
+    Start(["User: 'run DUUL'"]):::trigger --> Plan["Write implementation plan\n(Opus subagent)"]:::planner
 
-    subgraph Phase1["Phase 1: Plan Ping-Pong — Sonnet (max 7 iterations)"]
+    subgraph Phase1["Phase 1: Plan Ping-Pong — Opus (max 7 iterations)"]
         Plan --> PR["request_plan_review"]
         PR --> IterCheck1{iteration\nlimit?}
         IterCheck1 -- "exceeded" --> Human1["⏸ requires_human_review: true"]
@@ -305,7 +305,7 @@ flowchart TD
     classDef trigger fill:#e1f5fe,stroke:#0288d1,color:#01579b
     classDef approved fill:#e8f5e9,stroke:#388e3c,color:#1b5e20
     classDef done fill:#c8e6c9,stroke:#2e7d32,color:#1b5e20,stroke-width:2px
-    classDef sonnet fill:#fff3e0,stroke:#f57c00,color:#e65100
+    classDef planner fill:#fff3e0,stroke:#f57c00,color:#e65100
     classDef opus fill:#ede7f6,stroke:#7b1fa2,color:#4a148c
 ```
 

diff --git a/package-lock.json b/package-lock.json
diff --git a/src/__tests__/inline-or-file.test.ts b/src/__tests__/inline-or-file.test.ts
@@ -0,0 +1,101 @@
+import { test } from 'node:test';
+import assert from 'node:assert/strict';
+import { mkdtempSync, writeFileSync, rmSync, symlinkSync } from 'node:fs';
+import { tmpdir } from 'node:os';
+import { join } from 'node:path';
+import { resolveInlineOrFile, type WorkspaceScope } from '../services/filesystem.js';
+
+function makeScope(root: string): WorkspaceScope {
+  return { root, workingDirectories: null, linkedRoots: [], trackedOnly: false };
+}
+
+test('returns inline value when present', async () => {
+  const result = await resolveInlineOrFile({ inline: 'hello world', file: undefined, scope: null, label: 'plan' });
+  assert.equal(result, 'hello world');
+});
+
+test('reads from file when inline is empty', async () => {
+  const dir = mkdtempSync(join(tmpdir(), 'duul-inline-'));
+  try {
+    writeFileSync(join(dir, 'plan.md'), '## Plan\nfull contents from file');
+    const result = await resolveInlineOrFile({
+      inline: undefined,
+      file: 'plan.md',
+      scope: makeScope(dir),
+      label: 'plan',
+    });
+    assert.equal(result, '## Plan\nfull contents from file');
+  } finally {
+    rmSync(dir, { recursive: true, force: true });
+  }
+});
+
+test('whitespace-only inline falls through to file', async () => {
+  const dir = mkdtempSync(join(tmpdir(), 'duul-inline-'));
+  try {
+    writeFileSync(join(dir, 'plan.md'), 'real content');
+    const result = await resolveInlineOrFile({
+      inline: '   \n  ',
+      file: 'plan.md',
+      scope: makeScope(dir),
+      label: 'plan',
+    });
+    assert.equal(result, 'real content');
+  } finally {
+    rmSync(dir, { recursive: true, force: true });
+  }
+});
+
+test('inline takes precedence over file', async () => {
+  const dir = mkdtempSync(join(tmpdir(), 'duul-inline-'));
+  try {
+    writeFileSync(join(dir, 'plan.md'), 'from file');
+    const result = await resolveInlineOrFile({
+      inline: 'from inline',
+      file: 'plan.md',
+      scope: makeScope(dir),
+      label: 'plan',
+    });
+    assert.equal(result, 'from inline');
+  } finally {
+    rmSync(dir, { recursive: true, force: true });
+  }
+});
+
+test('returns undefined when neither inline nor file is provided', async () => {
+  const result = await resolveInlineOrFile({ inline: undefined, file: undefined, scope: null, label: 'plan' });
+  assert.equal(result, undefined);
+});
+
+test('throws when file is given but no workspace scope is set', async () => {
+  await assert.rejects(
+    () => resolveInlineOrFile({ inline: undefined, file: 'plan.md', scope: null, label: 'plan' }),
+    /plan_file was provided.*no workspace_root/s,
+  );
+});
+
+test('throws when file path escapes the workspace root', async () => {
+  const dir = mkdtempSync(join(tmpdir(), 'duul-inline-'));
+  try {
+    await assert.rejects(
+      () => resolveInlineOrFile({ inline: undefined, file: '../escape.md', scope: makeScope(dir), label: 'plan' }),
+      /outside project root/,
+    );
+  } finally {
+    rmSync(dir, { recursive: true, force: true });
+  }
+});
+
+test('blocks an in-root symlink that points at a secret (.env)', async () => {
+  const dir = mkdtempSync(join(tmpdir(), 'duul-inline-'));
+  try {
+    writeFileSync(join(dir, '.env'), 'SECRET=topsecret');
+    symlinkSync(join(dir, '.env'), join(dir, 'innocent.md'));
+    await assert.rejects(
+      () => resolveInlineOrFile({ inline: undefined, file: 'innocent.md', scope: makeScope(dir), label: 'plan' }),
+      /Access denied \(sensitive file\)/,
+    );
+  } finally {
+    rmSync(dir, { recursive: true, force: true });
+  }
+});
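
The behavior these tests pin down can be summarized in a simplified, synchronous sketch. It is not the implementation in `src/services/filesystem.ts`, which this diff does not show: `resolveInlineOrFileSketch`, `ScopeSketch`, and the one-pattern `SENSITIVE` deny-list are hypothetical, and the real function is async and takes a fuller `WorkspaceScope`.

```typescript
import { mkdtempSync, readFileSync, realpathSync, rmSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { basename, isAbsolute, join, relative, resolve, sep } from 'node:path';

// Hypothetical, reduced scope: only the root is modeled here.
interface ScopeSketch {
  root: string;
}

interface ResolveArgs {
  inline: string | undefined;
  file: string | undefined;
  scope: ScopeSketch | null;
  label: string; // 'plan' makes errors mention `plan_file`
}

// Assumption: a deliberately tiny deny-list, for illustration only.
const SENSITIVE = /^\.env(\..*)?$/;

function isInside(root: string, target: string): boolean {
  const rel = relative(root, target);
  return rel !== '..' && !rel.startsWith('..' + sep) && !isAbsolute(rel);
}

function resolveInlineOrFileSketch(args: ResolveArgs): string | undefined {
  const { inline, file, scope, label } = args;
  // A non-blank inline value wins, even when a file is also given.
  if (inline !== undefined && inline.trim() !== '') return inline;
  if (file === undefined || file.trim() === '') return undefined;
  if (scope === null) {
    throw new Error(`${label}_file was provided but no workspace_root is set`);
  }
  const root = realpathSync(scope.root);
  const requested = resolve(root, file);
  if (!isInside(root, requested)) {
    throw new Error(`${label}_file is outside project root`);
  }
  // Follow symlinks, then re-check both containment and the deny-list on the real target.
  const real = realpathSync(requested);
  if (!isInside(root, real)) {
    throw new Error(`${label}_file is outside project root`);
  }
  if (SENSITIVE.test(basename(real))) {
    throw new Error('Access denied (sensitive file)');
  }
  return readFileSync(real, 'utf8');
}

// Usage: a whitespace-only inline value falls through to the file.
const demoDir = mkdtempSync(join(tmpdir(), 'duul-sketch-'));
try {
  writeFileSync(join(demoDir, 'plan.md'), '## Plan\nfrom file');
  console.log(
    resolveInlineOrFileSketch({ inline: '  ', file: 'plan.md', scope: { root: demoDir }, label: 'plan' }),
  );
} finally {
  rmSync(demoDir, { recursive: true, force: true });
}
```

Checking the path twice, once as requested and once after `realpathSync`, is what lets a guard of this kind reject both `../` traversal and an in-root symlink that points at a protected file.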
diff --git a/src/prompts/code-review-system.ts b/src/prompts/code-review-system.ts
@@ -30,6 +30,13 @@ A junior developer wrote code based on an approved plan. You must verify that ev
 - Do NOT put actionable corrections in \`non_blocking_suggestions\` to soften the tone — if the code would be more correct or safer with the change, it belongs in \`blocking_issues\` with verdict "REVISE".
 - \`confidence\`: Your honest confidence (0-1). If the code is too short to fully evaluate, or context is missing, be honest about it and set \`requires_human_review: true\`.
 
+## Output Style — Compressed (token economy)
+Write every free-text VALUE (logic_validation, blocking_issues.description/suggestion, non_blocking_suggestions, vulnerabilities.description, symptom_impact prose, symptom_match_notes) in compressed "caveman" style to save tokens:
+- Drop articles (a/an/the), filler (just/really/basically/actually/simply), and pleasantries.
+- Prefer fragments over full sentences. Pattern: "[location] [problem]. [fix]." beats prose.
+- Use short synonyms (big not extensive, fix not "implement a solution for").
+Keep EXACT and uncompressed: JSON keys, enum values (APPROVE/REVISE, severities), \`optimized_snippet\` code, file paths, identifiers, function/type names, and any quoted user text (\`user_original_request_echo\` stays verbatim). Brevity must never drop a required field, soften a blocking issue, or change technical meaning.
+
 ## Verdict Calibration
 Do NOT conflate positive tone with APPROVE. Code can be "almost perfect" and still require REVISE. The verdict is determined solely by whether blocking_issues is empty:
 - blocking_issues is empty → APPROVE is allowed

diff --git a/src/prompts/plan-review-system.ts b/src/prompts/plan-review-system.ts
@@ -28,6 +28,13 @@ You are reviewing a development plan submitted by a junior developer. Your job i
 - \`edge_cases\`: List specific scenarios the plan does not account for.
 - \`checklist_for_implementation\`: Concrete steps the developer must follow during coding.
 
+## Output Style — Compressed (token economy)
+Write every free-text VALUE (architectural_analysis, blocking_issues.description/suggestion, non_blocking_suggestions, edge_cases, checklist_for_implementation, symptom_impact prose, symptom_match_notes) in compressed "caveman" style to save tokens:
+- Drop articles (a/an/the), filler (just/really/basically/actually/simply), and pleasantries.
+- Prefer fragments over full sentences. Pattern: "[thing] [problem]. [fix]." beats prose.
+- Use short synonyms (big not extensive, fix not "implement a solution for").
+Keep EXACT and uncompressed: JSON keys, enum values (APPROVE/REVISE, severities), code, file paths, identifiers, function/type names, and any quoted user text (\`user_original_request_echo\` stays verbatim). Brevity must never drop a required field, soften a blocking issue, or change technical meaning.
+
 ## Verdict Calibration
 Do NOT conflate positive tone with APPROVE. A plan can be "mostly good" or "almost there" and still require REVISE. The verdict is determined solely by whether blocking_issues is empty:
 - blocking_issues is empty → APPROVE is allowed (but not required if you have low confidence)