Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion .claude-plugin/marketplace.json
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@
"name": "look",
"source": "./src",
"description": "Sequential code review with fresh agent contexts. Runs multiple independent review passes to catch more issues.",
"version": "0.2.1",
"version": "0.3.0",
"author": { "name": "HartBrook" },
"repository": "https://github.com/HartBrook/lookagain",
"license": "MIT",
Expand Down
12 changes: 12 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,18 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.3.0] - 2026-01-28

### Changed

- **Breaking**: Replaced unsupported `arguments:` frontmatter array and `$ARGUMENTS.<name>` dot-access syntax with agent-side parsing of `$ARGUMENTS`. Claude Code only supports `$ARGUMENTS` (whole string) and `$ARGUMENTS[N]` (positional) — the named dot-access syntax was never interpolated, causing the agent to see literal placeholder text and fall back to safe defaults.
- Commands now include a "Parse arguments" section with a defaults table. The agent parses `key=value` pairs from the raw `$ARGUMENTS` string and applies documented defaults for missing keys.
- Commands log resolved configuration to make argument values visible and debuggable.
- Frontmatter uses `argument-hint` (supported) instead of `arguments:` (unsupported).
- Updated static tests to enforce the new pattern and reject the old `arguments:` frontmatter.
- Updated behavioral evals with new test cases for empty arguments (defaults) and partial arguments.
- Updated CONTRIBUTING.md prompt authoring guidance for the correct argument pattern.

## [0.2.1] - 2026-01-28

### Fixed
Expand Down
10 changes: 6 additions & 4 deletions CONTRIBUTING.md
Original file line number Diff line number Diff line change
Expand Up @@ -67,7 +67,7 @@ make test
make eval
```

`make test` runs fast, offline checks that validate plugin structure: file existence, JSON validity, frontmatter fields, cross-references between manifests, and that all frontmatter arguments are referenced as `$ARGUMENTS.<name>` in the instruction body (not just in display sections).
`make test` runs fast, offline checks that validate plugin structure: file existence, JSON validity, frontmatter fields, cross-references between manifests, and that commands accepting arguments use the correct pattern (`argument-hint` in frontmatter, `$ARGUMENTS` placeholder, and a defaults table in the body).

`make eval` runs [promptfoo](https://promptfoo.dev) evals that send the interpolated prompts to Claude and assert on behavioral correctness. For example, it verifies that `auto-fix=false` causes the model to skip fixes, and that `passes=5` results in 5 planned passes.

Expand Down Expand Up @@ -120,9 +120,11 @@ You can also test the plugin through the marketplace install flow, which is clos

When editing or adding command prompts in `src/commands/`:

- Define arguments in the YAML frontmatter with `name`, `description`, and `default`.
- Reference arguments in the instruction body using `$ARGUMENTS.<name>` — not just in display sections. The executing agent needs to see the interpolated value at the point where it makes decisions. For example, write `If $ARGUMENTS.auto-fix is true` rather than `If auto-fix is enabled`.
- `make test` enforces that every frontmatter argument appears as `$ARGUMENTS.<name>` somewhere in the body. If you add an argument, the test will fail until you reference it.
- **Use `$ARGUMENTS` for the raw string.** Claude Code replaces `$ARGUMENTS` with whatever the user typed after the command name. There is no `$ARGUMENTS.name` dot-access syntax — only `$ARGUMENTS` (whole string) and `$ARGUMENTS[N]` (positional). Do NOT use an `arguments:` array in frontmatter — it is not a supported Claude Code feature and will not be interpolated.
- **Add `argument-hint`** in frontmatter to document expected input format (e.g., `argument-hint: "[key=value ...]"`).
- **Include a defaults table** in the body listing each key, its default, and a description. Instruct the agent to parse `key=value` pairs from `$ARGUMENTS` and fall back to defaults for missing keys.
- **Log the resolved configuration** so it is visible in output and reviewable in evals.
- `make test` enforces that commands using `$ARGUMENTS` have `argument-hint` in frontmatter, a defaults table in the body, and do NOT use the unsupported `arguments:` frontmatter array.
- After changing prompt logic, run `make eval` to verify models still interpret the arguments correctly.

## Pull Requests
Expand Down
30 changes: 15 additions & 15 deletions evals/prompt-loader.js
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
// Loads a markdown command file, strips frontmatter, interpolates
// $ARGUMENTS.* tokens with test-case variables, and prepends a
// meta-instruction so the model describes its plan without executing.
// Loads a markdown command file, strips frontmatter, replaces the
// $ARGUMENTS placeholder with a key=value string built from test-case
// variables, and prepends a meta-instruction so the model describes
// its plan without executing.

const fs = require("fs");
const path = require("path");
Expand All @@ -18,18 +19,17 @@ function generatePrompt(context) {
// Strip YAML frontmatter (between opening and closing ---)
const stripped = raw.replace(/^---\n[\s\S]*?\n---\n/, "");

// Replace $ARGUMENTS.<name> with matching arg_<name> variable.
// Argument names may contain hyphens (e.g. auto-fix, max-passes).
const interpolated = stripped.replace(
/\$ARGUMENTS\.([\w-]+)/g,
(_match, name) => {
const key = `arg_${name}`;
if (key in vars) {
return vars[key];
}
return _match; // leave unresolved tokens as-is
},
);
// Build a key=value argument string from all vars prefixed with arg_.
// e.g. { arg_passes: "5", arg_auto-fix: "false" } → "passes=5 auto-fix=false"
const argPairs = Object.entries(vars)
.filter(([k]) => k.startsWith("arg_"))
.map(([k, v]) => `${k.slice(4)}=${v}`)
.join(" ");

// Replace the $ARGUMENTS placeholder with the built argument string.
// This mirrors what Claude Code does at runtime: $ARGUMENTS is replaced
// with the raw text the user typed after the command name.
const interpolated = stripped.replace(/\$ARGUMENTS/g, argPairs);

const meta = [
"You are analyzing a Claude Code plugin command prompt.",
Expand Down
49 changes: 48 additions & 1 deletion evals/promptfooconfig.yaml
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
description: "Behavioral evals for lookagain prompt interpolation"
description: "Behavioral evals for lookagain argument handling"

prompts:
- file://prompt-loader.js
Expand All @@ -9,6 +9,22 @@ providers:
max_tokens: 2048

tests:
# ==================================================================
# again.md — no arguments (defaults)
# ==================================================================
- description: "no arguments → model uses all defaults (auto-fix=true, passes=3, thorough)"
vars:
prompt_file: src/commands/again.md
assert:
- type: llm-rubric
value: >
The response must use default values for all settings since the
argument string is empty: 3 passes, staged target, auto-fix true,
thorough model, and max-passes 7. It must describe applying fixes
for must_fix issues (since auto-fix defaults to true). It should
NOT say arguments are missing or unavailable — an empty string
simply means the user wants all defaults.

# ==================================================================
# again.md — auto-fix interpretation
# ==================================================================
Expand Down Expand Up @@ -114,6 +130,21 @@ tests:
reviewing all changes on the current branch versus the base
branch. It should reference branch comparison or merge-base.

# ==================================================================
# again.md — partial arguments (only auto-fix specified)
# ==================================================================
- description: "only auto-fix=false → defaults for everything else, no fixing"
vars:
prompt_file: src/commands/again.md
arg_auto-fix: "false"
assert:
- type: llm-rubric
value: >
The response must parse auto-fix=false from the argument string
and use defaults for all other settings (3 passes, staged target,
thorough model, max-passes 7). It must NOT apply any fixes since
auto-fix is explicitly false.

# ==================================================================
# tidy.md — all flag
# ==================================================================
Expand All @@ -139,3 +170,19 @@ tests:
The response must describe calculating a cutoff date by subtracting
3 days from today, and only removing runs older than that cutoff.
It should keep runs from the last 3 days.

# ==================================================================
# tidy.md — no arguments (defaults)
# ==================================================================
- description: "tidy with no arguments → keep=1, all=false"
vars:
prompt_file: src/commands/tidy.md
assert:
- type: llm-rubric
value: >
The response must use default values since the argument string is
empty: keep=1 (keep runs from the last 1 day) and all=false (do
NOT remove all runs). It should describe date-based filtering
with a 1-day retention window. An empty argument string means
the user wants all defaults — it should NOT say arguments are
missing or unavailable.
65 changes: 34 additions & 31 deletions scripts/test.sh
Original file line number Diff line number Diff line change
Expand Up @@ -129,48 +129,51 @@ test_frontmatter() {
check_frontmatter "$PROJECT_ROOT/src/skills/lookagain-output-format/SKILL.md" name description
}

test_argument_interpolation() {
# Verify that arguments defined in frontmatter are referenced using
# $ARGUMENTS.<name> syntax in the instruction body, not just in the
# Configuration display section. This prevents the executing agent
# from missing argument values and falling back to safe defaults.
test_argument_handling() {
# Verify that command files using arguments follow the correct pattern:
# 1. Frontmatter has argument-hint (not the unsupported arguments: array)
# 2. Body contains $ARGUMENTS placeholder for the raw argument string
# 3. Body contains a defaults table with Key/Default columns
# This ensures the agent receives and parses arguments at runtime
# rather than relying on non-existent compile-time interpolation.

for file in "$PROJECT_ROOT"/src/commands/*.md; do
local relpath="${file#"$PROJECT_ROOT"/}"

# Extract argument names from frontmatter
local args
args=$(awk '
NR==1 && /^---$/ { in_fm=1; next }
in_fm && /^---$/ { exit }
in_fm && /^ - name: / { gsub(/^ - name: /, ""); print }
' "$file")

if [[ -z "$args" ]]; then
continue
fi
local frontmatter
frontmatter=$(awk 'NR==1{next} /^---$/{exit} {print}' "$file")

# Extract the body (everything after the second ---)
local body
body=$(awk '
NR==1 && /^---$/ { in_fm=1; next }
in_fm && /^---$/ { in_fm=0; next }
!in_fm { print }
' "$file")

# For each argument, verify $ARGUMENTS.<name> appears in the body
local all_found=1
while IFS= read -r arg; do
local ref="\$ARGUMENTS.${arg}"
if ! echo "$body" | grep -qF "$ref"; then
fail "$relpath: argument '$arg' defined but \$ARGUMENTS.$arg never used in body"
all_found=0
fi
done <<< "$args"

if [[ $all_found -eq 1 ]]; then
pass "$relpath: all arguments interpolated in body"
# Skip commands that don't accept arguments
if ! echo "$body" | grep -qF '$ARGUMENTS'; then
continue
fi

# Must NOT use the unsupported arguments: array in frontmatter
if echo "$frontmatter" | grep -q "^arguments:"; then
fail "$relpath: uses unsupported 'arguments:' frontmatter — use 'argument-hint:' and agent-side parsing instead"
continue
fi

# Must have argument-hint in frontmatter
if ! echo "$frontmatter" | grep -q "^argument-hint:"; then
fail "$relpath: missing 'argument-hint:' in frontmatter"
continue
fi

# Must have a defaults table (Key | Default header pattern)
if ! echo "$body" | grep -q "| Key | Default"; then
fail "$relpath: missing defaults table (expected '| Key | Default |' header)"
continue
fi

pass "$relpath: argument handling correct"
done
}

Expand Down Expand Up @@ -369,8 +372,8 @@ echo "--- frontmatter ---"
test_frontmatter
echo ""

echo "--- argument interpolation ---"
test_argument_interpolation
echo "--- argument handling ---"
test_argument_handling
echo ""

echo "--- cross-references ---"
Expand Down
74 changes: 41 additions & 33 deletions src/commands/again.md
Original file line number Diff line number Diff line change
@@ -1,45 +1,53 @@
---
name: again
description: Run sequential code review passes with fresh contexts to catch more issues
arguments:
- name: passes
description: Number of review passes to run
default: "3"
- name: target
description: "What to review: staged, commit, branch, or a file/directory path"
default: "staged"
- name: auto-fix
description: Automatically fix must_fix issues between passes (true/false)
default: "true"
- name: model
description: "Reviewer model: fast (haiku), balanced (sonnet), thorough (inherit)"
default: "thorough"
- name: max-passes
description: Maximum passes if must_fix issues persist
default: "7"
argument-hint: "[key=value ...]"
---

# Iterative Code Review

You orchestrate sequential, multi-pass code review. Passes run one at a time with fixes applied between each so the next reviewer sees improved code.

## Configuration
## Phase 0: Setup

- **Passes**: $ARGUMENTS.passes
- **Target**: $ARGUMENTS.target
- **Auto-fix**: $ARGUMENTS.auto-fix
- **Model**: $ARGUMENTS.model
- **Max passes**: $ARGUMENTS.max-passes
### Parse arguments

## Phase 0: Setup
The user may pass key=value pairs after the command name. The raw argument string is:

> $ARGUMENTS

If the argument string is empty or blank, the user provided no overrides — use all defaults. Parse `key=value` pairs from the string. For any key not provided, use the default.

| Key | Default | Description |
|---|---|---|
| `passes` | `3` | Number of review passes to run |
| `target` | `staged` | What to review: `staged`, `commit`, `branch`, or a file/directory path |
| `auto-fix` | `true` | Automatically fix must_fix issues between passes (`true` or `false`) |
| `model` | `thorough` | Reviewer model: `fast` (haiku), `balanced` (sonnet), `thorough` (inherit) |
| `max-passes` | `7` | Maximum passes if must_fix issues persist |

**Log the resolved configuration** so it is visible in the output:

```
Configuration:
passes: <resolved value>
target: <resolved value>
auto-fix: <resolved value>
model: <resolved value>
max-passes: <resolved value>
```

### Generate run ID

Generate a run ID: `YYYY-MM-DDTHH-MM-SS`. All output goes under `.lookagain/<run-id>/`.

### Orchestrator role

1. Generate a run ID: `YYYY-MM-DDTHH-MM-SS`. All output goes under `.lookagain/<run-id>/`.
2. Do NOT read the codebase yourself. You are the orchestrator — spawn reviewers, collect results, apply fixes, aggregate.
3. Create a TodoWrite list with one item per pass plus aggregation. Mark items `in_progress` when starting and `completed` when done.
Do NOT read the codebase yourself. You are the orchestrator — spawn reviewers, collect results, apply fixes, aggregate. Create a TodoWrite list with one item per pass plus aggregation. Mark items `in_progress` when starting and `completed` when done.

### Resolve scope

Determine the scope instruction to pass to each reviewer:
Using the resolved `target` value, determine the scope instruction to pass to each reviewer:

| Target value | Scope instruction for reviewer |
|---|---|
Expand All @@ -50,7 +58,7 @@ Determine the scope instruction to pass to each reviewer:

### Resolve model

Map the model argument to the Task tool model parameter:
Using the resolved `model` value, map to the Task tool model parameter:

| Model value | Task model |
|---|---|
Expand All @@ -62,17 +70,17 @@ Map the model argument to the Task tool model parameter:

CRITICAL: Passes run in sequence, NOT in parallel. Each pass reviews code after previous fixes.

For each pass (1 through $ARGUMENTS.passes):
For each pass (1 through the resolved `passes` value):

**Review**: Spawn a fresh subagent via the Task tool using the `lookagain-reviewer` agent. Include: pass number, scope instruction, and instruction to use the `lookagain-output-format` skill. Set the model parameter based on the resolved model. Do NOT include findings from previous passes.

**Collect**: Parse the JSON response. Store findings and track which pass found each issue.

**Fix**: If `$ARGUMENTS.auto-fix` is `true`, apply fixes for `must_fix` issues only. Minimal changes, no refactoring.
**Fix**: If the resolved `auto-fix` value is `true`, apply fixes for `must_fix` issues only. Minimal changes, no refactoring. If `auto-fix` is `false`, skip this step entirely — do not apply any fixes.

**Log**: "Pass N complete. Found X must_fix, Y should_fix, Z suggestions."

After completing $ARGUMENTS.passes passes, if `must_fix` issues remain and total passes < $ARGUMENTS.max-passes, run additional passes.
After completing the configured number of passes, if `must_fix` issues remain and total passes < the resolved `max-passes` value, run additional passes.

## Phase 2: Aggregate

Expand Down Expand Up @@ -122,6 +130,6 @@ Include the count of previous runs (glob `.lookagain/????-??-??T??-??-??/`, subt
1. **Sequential**: Never launch passes in parallel. Each must complete before the next starts.
2. **Fresh context**: Always use the Task tool for subagents.
3. **Independence**: Never tell subagents what previous passes found.
4. **Minimal fixes**: Only change what's necessary when `$ARGUMENTS.auto-fix` is `true`.
4. **Minimal fixes**: Only apply fixes when the resolved `auto-fix` value is `true`.
5. **Valid JSON**: If subagent output fails to parse, log the error and continue.
6. **Respect max-passes**: Never exceed $ARGUMENTS.max-passes.
6. **Respect max-passes**: Never exceed the resolved `max-passes` value.
Loading