diff --git a/.claude-plugin/marketplace.json b/.claude-plugin/marketplace.json index b733539..029a9c4 100644 --- a/.claude-plugin/marketplace.json +++ b/.claude-plugin/marketplace.json @@ -11,7 +11,7 @@ "name": "look", "source": "./src", "description": "Sequential code review with fresh agent contexts. Runs multiple independent review passes to catch more issues.", - "version": "0.2.1", + "version": "0.3.0", "author": { "name": "HartBrook" }, "repository": "https://github.com/HartBrook/lookagain", "license": "MIT", diff --git a/CHANGELOG.md b/CHANGELOG.md index 4e72da5..a83a8c6 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -5,6 +5,18 @@ All notable changes to this project will be documented in this file. The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). +## [0.3.0] - 2026-01-28 + +### Changed + +- **Breaking**: Replaced unsupported `arguments:` frontmatter array and `$ARGUMENTS.` dot-access syntax with agent-side parsing of `$ARGUMENTS`. Claude Code only supports `$ARGUMENTS` (whole string) and `$ARGUMENTS[N]` (positional) — the named dot-access syntax was never interpolated, causing the agent to see literal placeholder text and fall back to safe defaults. +- Commands now include a "Parse arguments" section with a defaults table. The agent parses `key=value` pairs from the raw `$ARGUMENTS` string and applies documented defaults for missing keys. +- Commands log resolved configuration to make argument values visible and debuggable. +- Frontmatter uses `argument-hint` (supported) instead of `arguments:` (unsupported). +- Updated static tests to enforce the new pattern and reject the old `arguments:` frontmatter. +- Updated behavioral evals with new test cases for empty arguments (defaults) and partial arguments. +- Updated CONTRIBUTING.md prompt authoring guidance for the correct argument pattern. + ## [0.2.1] - 2026-01-28 ### Fixed diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 11af6d0..7cb8827 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -67,7 +67,7 @@ make test make eval ``` -`make test` runs fast, offline checks that validate plugin structure: file existence, JSON validity, frontmatter fields, cross-references between manifests, and that all frontmatter arguments are referenced as `$ARGUMENTS.` in the instruction body (not just in display sections). +`make test` runs fast, offline checks that validate plugin structure: file existence, JSON validity, frontmatter fields, cross-references between manifests, and that commands accepting arguments use the correct pattern (`argument-hint` in frontmatter, `$ARGUMENTS` placeholder, and a defaults table in the body). `make eval` runs [promptfoo](https://promptfoo.dev) evals that send the interpolated prompts to Claude and assert on behavioral correctness. For example, it verifies that `auto-fix=false` causes the model to skip fixes, and that `passes=5` results in 5 planned passes. @@ -120,9 +120,11 @@ You can also test the plugin through the marketplace install flow, which is clos When editing or adding command prompts in `src/commands/`: -- Define arguments in the YAML frontmatter with `name`, `description`, and `default`. -- Reference arguments in the instruction body using `$ARGUMENTS.` — not just in display sections. The executing agent needs to see the interpolated value at the point where it makes decisions. For example, write `If $ARGUMENTS.auto-fix is true` rather than `If auto-fix is enabled`. -- `make test` enforces that every frontmatter argument appears as `$ARGUMENTS.` somewhere in the body. If you add an argument, the test will fail until you reference it. +- **Use `$ARGUMENTS` for the raw string.** Claude Code replaces `$ARGUMENTS` with whatever the user typed after the command name. There is no `$ARGUMENTS.name` dot-access syntax — only `$ARGUMENTS` (whole string) and `$ARGUMENTS[N]` (positional). Do NOT use an `arguments:` array in frontmatter — it is not a supported Claude Code feature and will not be interpolated. +- **Add `argument-hint`** in frontmatter to document expected input format (e.g., `argument-hint: "[key=value ...]"`). +- **Include a defaults table** in the body listing each key, its default, and a description. Instruct the agent to parse `key=value` pairs from `$ARGUMENTS` and fall back to defaults for missing keys. +- **Log the resolved configuration** so it is visible in output and reviewable in evals. +- `make test` enforces that commands using `$ARGUMENTS` have `argument-hint` in frontmatter, a defaults table in the body, and do NOT use the unsupported `arguments:` frontmatter array. - After changing prompt logic, run `make eval` to verify models still interpret the arguments correctly. ## Pull Requests diff --git a/evals/prompt-loader.js b/evals/prompt-loader.js index 84daccc..a2ea6fc 100644 --- a/evals/prompt-loader.js +++ b/evals/prompt-loader.js @@ -1,6 +1,7 @@ -// Loads a markdown command file, strips frontmatter, interpolates -// $ARGUMENTS.* tokens with test-case variables, and prepends a -// meta-instruction so the model describes its plan without executing. +// Loads a markdown command file, strips frontmatter, replaces the +// $ARGUMENTS placeholder with a key=value string built from test-case +// variables, and prepends a meta-instruction so the model describes +// its plan without executing. const fs = require("fs"); const path = require("path"); @@ -18,18 +19,17 @@ function generatePrompt(context) { // Strip YAML frontmatter (between opening and closing ---) const stripped = raw.replace(/^---\n[\s\S]*?\n---\n/, ""); - // Replace $ARGUMENTS. with matching arg_ variable. - // Argument names may contain hyphens (e.g. auto-fix, max-passes). - const interpolated = stripped.replace( - /\$ARGUMENTS\.([\w-]+)/g, - (_match, name) => { - const key = `arg_${name}`; - if (key in vars) { - return vars[key]; - } - return _match; // leave unresolved tokens as-is - }, - ); + // Build a key=value argument string from all vars prefixed with arg_. + // e.g. { arg_passes: "5", arg_auto-fix: "false" } → "passes=5 auto-fix=false" + const argPairs = Object.entries(vars) + .filter(([k]) => k.startsWith("arg_")) + .map(([k, v]) => `${k.slice(4)}=${v}`) + .join(" "); + + // Replace the $ARGUMENTS placeholder with the built argument string. + // This mirrors what Claude Code does at runtime: $ARGUMENTS is replaced + // with the raw text the user typed after the command name. + const interpolated = stripped.replace(/\$ARGUMENTS/g, argPairs); const meta = [ "You are analyzing a Claude Code plugin command prompt.", diff --git a/evals/promptfooconfig.yaml b/evals/promptfooconfig.yaml index 46fee81..49f322d 100644 --- a/evals/promptfooconfig.yaml +++ b/evals/promptfooconfig.yaml @@ -1,4 +1,4 @@ -description: "Behavioral evals for lookagain prompt interpolation" +description: "Behavioral evals for lookagain argument handling" prompts: - file://prompt-loader.js @@ -9,6 +9,22 @@ providers: max_tokens: 2048 tests: + # ================================================================== + # again.md — no arguments (defaults) + # ================================================================== + - description: "no arguments → model uses all defaults (auto-fix=true, passes=3, thorough)" + vars: + prompt_file: src/commands/again.md + assert: + - type: llm-rubric + value: > + The response must use default values for all settings since the + argument string is empty: 3 passes, staged target, auto-fix true, + thorough model, and max-passes 7. It must describe applying fixes + for must_fix issues (since auto-fix defaults to true). It should + NOT say arguments are missing or unavailable — an empty string + simply means the user wants all defaults. + # ================================================================== # again.md — auto-fix interpretation # ================================================================== @@ -114,6 +130,21 @@ tests: reviewing all changes on the current branch versus the base branch. It should reference branch comparison or merge-base. + # ================================================================== + # again.md — partial arguments (only auto-fix specified) + # ================================================================== + - description: "only auto-fix=false → defaults for everything else, no fixing" + vars: + prompt_file: src/commands/again.md + arg_auto-fix: "false" + assert: + - type: llm-rubric + value: > + The response must parse auto-fix=false from the argument string + and use defaults for all other settings (3 passes, staged target, + thorough model, max-passes 7). It must NOT apply any fixes since + auto-fix is explicitly false. + # ================================================================== # tidy.md — all flag # ================================================================== @@ -139,3 +170,19 @@ tests: The response must describe calculating a cutoff date by subtracting 3 days from today, and only removing runs older than that cutoff. It should keep runs from the last 3 days. + + # ================================================================== + # tidy.md — no arguments (defaults) + # ================================================================== + - description: "tidy with no arguments → keep=1, all=false" + vars: + prompt_file: src/commands/tidy.md + assert: + - type: llm-rubric + value: > + The response must use default values since the argument string is + empty: keep=1 (keep runs from the last 1 day) and all=false (do + NOT remove all runs). It should describe date-based filtering + with a 1-day retention window. An empty argument string means + the user wants all defaults — it should NOT say arguments are + missing or unavailable. diff --git a/scripts/test.sh b/scripts/test.sh index 637098f..87b8198 100755 --- a/scripts/test.sh +++ b/scripts/test.sh @@ -129,28 +129,20 @@ test_frontmatter() { check_frontmatter "$PROJECT_ROOT/src/skills/lookagain-output-format/SKILL.md" name description } -test_argument_interpolation() { - # Verify that arguments defined in frontmatter are referenced using - # $ARGUMENTS. syntax in the instruction body, not just in the - # Configuration display section. This prevents the executing agent - # from missing argument values and falling back to safe defaults. +test_argument_handling() { + # Verify that command files using arguments follow the correct pattern: + # 1. Frontmatter has argument-hint (not the unsupported arguments: array) + # 2. Body contains $ARGUMENTS placeholder for the raw argument string + # 3. Body contains a defaults table with Key/Default columns + # This ensures the agent receives and parses arguments at runtime + # rather than relying on non-existent compile-time interpolation. for file in "$PROJECT_ROOT"/src/commands/*.md; do local relpath="${file#"$PROJECT_ROOT"/}" - # Extract argument names from frontmatter - local args - args=$(awk ' - NR==1 && /^---$/ { in_fm=1; next } - in_fm && /^---$/ { exit } - in_fm && /^ - name: / { gsub(/^ - name: /, ""); print } - ' "$file") - - if [[ -z "$args" ]]; then - continue - fi + local frontmatter + frontmatter=$(awk 'NR==1{next} /^---$/{exit} {print}' "$file") - # Extract the body (everything after the second ---) local body body=$(awk ' NR==1 && /^---$/ { in_fm=1; next } @@ -158,19 +150,30 @@ test_argument_interpolation() { !in_fm { print } ' "$file") - # For each argument, verify $ARGUMENTS. appears in the body - local all_found=1 - while IFS= read -r arg; do - local ref="\$ARGUMENTS.${arg}" - if ! echo "$body" | grep -qF "$ref"; then - fail "$relpath: argument '$arg' defined but \$ARGUMENTS.$arg never used in body" - all_found=0 - fi - done <<< "$args" - - if [[ $all_found -eq 1 ]]; then - pass "$relpath: all arguments interpolated in body" + # Skip commands that don't accept arguments + if ! echo "$body" | grep -qF '$ARGUMENTS'; then + continue + fi + + # Must NOT use the unsupported arguments: array in frontmatter + if echo "$frontmatter" | grep -q "^arguments:"; then + fail "$relpath: uses unsupported 'arguments:' frontmatter — use 'argument-hint:' and agent-side parsing instead" + continue + fi + + # Must have argument-hint in frontmatter + if ! echo "$frontmatter" | grep -q "^argument-hint:"; then + fail "$relpath: missing 'argument-hint:' in frontmatter" + continue fi + + # Must have a defaults table (Key | Default header pattern) + if ! echo "$body" | grep -q "| Key | Default"; then + fail "$relpath: missing defaults table (expected '| Key | Default |' header)" + continue + fi + + pass "$relpath: argument handling correct" done } @@ -369,8 +372,8 @@ echo "--- frontmatter ---" test_frontmatter echo "" -echo "--- argument interpolation ---" -test_argument_interpolation +echo "--- argument handling ---" +test_argument_handling echo "" echo "--- cross-references ---" diff --git a/src/commands/again.md b/src/commands/again.md index 7243ae4..4c2f395 100644 --- a/src/commands/again.md +++ b/src/commands/again.md @@ -1,45 +1,53 @@ --- name: again description: Run sequential code review passes with fresh contexts to catch more issues -arguments: - - name: passes - description: Number of review passes to run - default: "3" - - name: target - description: "What to review: staged, commit, branch, or a file/directory path" - default: "staged" - - name: auto-fix - description: Automatically fix must_fix issues between passes (true/false) - default: "true" - - name: model - description: "Reviewer model: fast (haiku), balanced (sonnet), thorough (inherit)" - default: "thorough" - - name: max-passes - description: Maximum passes if must_fix issues persist - default: "7" +argument-hint: "[key=value ...]" --- # Iterative Code Review You orchestrate sequential, multi-pass code review. Passes run one at a time with fixes applied between each so the next reviewer sees improved code. -## Configuration +## Phase 0: Setup -- **Passes**: $ARGUMENTS.passes -- **Target**: $ARGUMENTS.target -- **Auto-fix**: $ARGUMENTS.auto-fix -- **Model**: $ARGUMENTS.model -- **Max passes**: $ARGUMENTS.max-passes +### Parse arguments -## Phase 0: Setup +The user may pass key=value pairs after the command name. The raw argument string is: + +> $ARGUMENTS + +If the argument string is empty or blank, the user provided no overrides — use all defaults. Parse `key=value` pairs from the string. For any key not provided, use the default. + +| Key | Default | Description | +|---|---|---| +| `passes` | `3` | Number of review passes to run | +| `target` | `staged` | What to review: `staged`, `commit`, `branch`, or a file/directory path | +| `auto-fix` | `true` | Automatically fix must_fix issues between passes (`true` or `false`) | +| `model` | `thorough` | Reviewer model: `fast` (haiku), `balanced` (sonnet), `thorough` (inherit) | +| `max-passes` | `7` | Maximum passes if must_fix issues persist | + +**Log the resolved configuration** so it is visible in the output: + +``` +Configuration: + passes: + target: + auto-fix: + model: + max-passes: +``` + +### Generate run ID + +Generate a run ID: `YYYY-MM-DDTHH-MM-SS`. All output goes under `.lookagain//`. + +### Orchestrator role -1. Generate a run ID: `YYYY-MM-DDTHH-MM-SS`. All output goes under `.lookagain//`. -2. Do NOT read the codebase yourself. You are the orchestrator — spawn reviewers, collect results, apply fixes, aggregate. -3. Create a TodoWrite list with one item per pass plus aggregation. Mark items `in_progress` when starting and `completed` when done. +Do NOT read the codebase yourself. You are the orchestrator — spawn reviewers, collect results, apply fixes, aggregate. Create a TodoWrite list with one item per pass plus aggregation. Mark items `in_progress` when starting and `completed` when done. ### Resolve scope -Determine the scope instruction to pass to each reviewer: +Using the resolved `target` value, determine the scope instruction to pass to each reviewer: | Target value | Scope instruction for reviewer | |---|---| @@ -50,7 +58,7 @@ Determine the scope instruction to pass to each reviewer: ### Resolve model -Map the model argument to the Task tool model parameter: +Using the resolved `model` value, map to the Task tool model parameter: | Model value | Task model | |---|---| @@ -62,17 +70,17 @@ Map the model argument to the Task tool model parameter: CRITICAL: Passes run in sequence, NOT in parallel. Each pass reviews code after previous fixes. -For each pass (1 through $ARGUMENTS.passes): +For each pass (1 through the resolved `passes` value): **Review**: Spawn a fresh subagent via the Task tool using the `lookagain-reviewer` agent. Include: pass number, scope instruction, and instruction to use the `lookagain-output-format` skill. Set the model parameter based on the resolved model. Do NOT include findings from previous passes. **Collect**: Parse the JSON response. Store findings and track which pass found each issue. -**Fix**: If `$ARGUMENTS.auto-fix` is `true`, apply fixes for `must_fix` issues only. Minimal changes, no refactoring. +**Fix**: If the resolved `auto-fix` value is `true`, apply fixes for `must_fix` issues only. Minimal changes, no refactoring. If `auto-fix` is `false`, skip this step entirely — do not apply any fixes. **Log**: "Pass N complete. Found X must_fix, Y should_fix, Z suggestions." -After completing $ARGUMENTS.passes passes, if `must_fix` issues remain and total passes < $ARGUMENTS.max-passes, run additional passes. +After completing the configured number of passes, if `must_fix` issues remain and total passes < the resolved `max-passes` value, run additional passes. ## Phase 2: Aggregate @@ -122,6 +130,6 @@ Include the count of previous runs (glob `.lookagain/????-??-??T??-??-??/`, subt 1. **Sequential**: Never launch passes in parallel. Each must complete before the next starts. 2. **Fresh context**: Always use the Task tool for subagents. 3. **Independence**: Never tell subagents what previous passes found. -4. **Minimal fixes**: Only change what's necessary when `$ARGUMENTS.auto-fix` is `true`. +4. **Minimal fixes**: Only apply fixes when the resolved `auto-fix` value is `true`. 5. **Valid JSON**: If subagent output fails to parse, log the error and continue. -6. **Respect max-passes**: Never exceed $ARGUMENTS.max-passes. +6. **Respect max-passes**: Never exceed the resolved `max-passes` value. diff --git a/src/commands/tidy.md b/src/commands/tidy.md index d6c4e0d..b79b221 100644 --- a/src/commands/tidy.md +++ b/src/commands/tidy.md @@ -2,19 +2,26 @@ name: tidy description: Remove old lookagain review runs, keeping today's results by default tools: Glob, Bash(rm -rf .lookagain/????-??-??T??-??-??) -arguments: - - name: keep - description: "Keep runs from the last N days (default: 1)" - default: "1" - - name: all - description: "Remove all runs including today's (true/false)" - default: "false" +argument-hint: "[key=value ...]" --- # Tidy Old Review Runs You are cleaning up old review results from `.lookagain/`. +## Parse arguments + +The user may pass key=value pairs after the command name. The raw argument string is: + +> $ARGUMENTS + +If the argument string is empty or blank, the user provided no overrides — use all defaults. Parse `key=value` pairs from the string. For any key not provided, use the default. + +| Key | Default | Description | +|---|---|---| +| `keep` | `1` | Keep runs from the last N days | +| `all` | `false` | Remove all runs including today's (`true` or `false`) | + ## Process 1. Use Glob to list all directories under `.lookagain/` matching the pattern `.lookagain/????-??-??T??-??-??/`. @@ -22,8 +29,8 @@ You are cleaning up old review results from `.lookagain/`. 2. If no run directories exist, tell the user there's nothing to tidy and stop. 3. Determine which runs to remove: - - If `$ARGUMENTS.all` is `true`, remove ALL run directories. - - Otherwise, calculate the cutoff date by subtracting `$ARGUMENTS.keep` days from today's date. Remove only run directories whose date portion (the `YYYY-MM-DD` prefix of the directory name) is strictly before the cutoff date. + - If the resolved `all` value is `true`, remove ALL run directories. + - Otherwise, calculate the cutoff date by subtracting the resolved `keep` value (in days) from today's date. Remove only run directories whose date portion (the `YYYY-MM-DD` prefix of the directory name) is strictly before the cutoff date. 4. Before deleting, validate each directory name matches the exact pattern `YYYY-MM-DDTHH-MM-SS` (all digits in the right positions). Skip any directory that doesn't match. @@ -43,5 +50,5 @@ If nothing was removed, say so: ``` ## Nothing to Tidy -All N run(s) are within the last $ARGUMENTS.keep day(s). +All N run(s) are within the keep window. ``` diff --git a/src/dot-claude-plugin/plugin.json b/src/dot-claude-plugin/plugin.json index 082f580..a60ffe1 100644 --- a/src/dot-claude-plugin/plugin.json +++ b/src/dot-claude-plugin/plugin.json @@ -1,6 +1,6 @@ { "name": "look", - "version": "0.2.1", + "version": "0.3.0", "description": "Sequential code review with fresh agent contexts. Runs multiple independent review passes to catch more issues.", "author": { "name": "HartBrook" }, "repository": "https://github.com/HartBrook/lookagain",