Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 2 additions & 3 deletions .claude-plugin/marketplace.json
Original file line number Diff line number Diff line change
Expand Up @@ -11,14 +11,13 @@
"name": "look",
"source": "./src",
"description": "Sequential code review with fresh agent contexts. Runs multiple independent review passes to catch more issues.",
"version": "0.3.0",
"version": "0.4.0",
"author": { "name": "HartBrook" },
"repository": "https://github.com/HartBrook/lookagain",
"license": "MIT",
"keywords": ["code-review", "quality", "iterative", "multi-pass"],
"commands": ["./commands/again.md", "./commands/tidy.md"],
"agents": ["./agents/lookagain-reviewer.md"],
"skills": ["./skills/lookagain-output-format"],
"skills": ["./skills/again", "./skills/tidy", "./skills/lookagain-output-format"],
"strict": false
}
]
Expand Down
10 changes: 10 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,16 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.4.0] - 2026-01-29

### Changed

- Migrated commands from `commands/` directory to `skills/` directory format (`SKILL.md` files) to fix CLI command resolution issues on newer Claude Code versions. See [GSD plugin issue #218](https://github.com/glittercowboy/get-shit-done/issues/218) for background.
- Added `disable-model-invocation: true` to `/look:again` and `/look:tidy` to prevent unintended auto-invocation.
- Renamed frontmatter field `tools` to `allowed-tools` per skills format.
- Updated test suite, build script, evals, and documentation to reflect new file layout.
- No changes to user-facing command names: `/look:again` and `/look:tidy` work as before.

## [0.3.0] - 2026-01-28

### Changed
Expand Down
48 changes: 32 additions & 16 deletions CONTRIBUTING.md
Original file line number Diff line number Diff line change
Expand Up @@ -17,18 +17,18 @@ lookagain/
├── .claude-plugin/
│ └── marketplace.json # Marketplace manifest
├── src/
│ ├── commands/ # Plugin commands
│ │ ├── again.md # Main orchestrator
│ │ └── tidy.md # Tidy old review runs
│ ├── agents/ # Subagent definitions
│ │ └── lookagain-reviewer.md
│ ├── skills/ # Output format specs
│ │ └── lookagain-output-format/
│ ├── skills/ # Plugin skills (commands + reference)
│ │ ├── again/ # Main orchestrator (/look:again)
│ │ ├── tidy/ # Tidy old review runs (/look:tidy)
│ │ └── lookagain-output-format/ # Output format spec
│ ├── dot-claude-plugin/ # Plugin manifest (becomes .claude-plugin/)
│ └── dot-claude/ # Claude settings (becomes .claude/)
├── scripts/
│ ├── package.sh # Build script
│ └── test.sh # Plugin validation tests
│ ├── test.sh # Plugin validation tests
│ └── integration-test.sh # End-to-end integration test
├── evals/ # Behavioral evals (promptfoo)
│ ├── promptfooconfig.yaml
│ └── prompt-loader.js
Expand Down Expand Up @@ -65,13 +65,27 @@ make test
# Behavioral evals — verifies models interpret prompt arguments correctly
# Requires ANTHROPIC_API_KEY
make eval

# Integration test — end-to-end run of look:again with auto-fix
# Requires ANTHROPIC_API_KEY
make integration
```

`make test` runs fast, offline checks that validate plugin structure: file existence, JSON validity, frontmatter fields, cross-references between manifests, and that commands accepting arguments use the correct pattern (`argument-hint` in frontmatter, `$ARGUMENTS` placeholder, and a defaults table in the body).
The three test layers catch different kinds of regressions:

| Layer | Command | What it catches | Cost |
|-------|---------|----------------|------|
| **Structure** | `make test` | Broken manifests, missing files, frontmatter drift, cross-reference mismatches | Free, offline, fast |
| **Evals** | `make eval` | Model misinterpreting arguments (e.g. ignoring `auto-fix=false`, wrong pass count) | API calls (~$0.05/run) |
| **Integration** | `make integration` | End-to-end failures: review not finding bugs, auto-fix not applying, output artifacts missing, tidy not cleaning up | API calls (~$0.50/run) |

**When to run each:**

`make eval` runs [promptfoo](https://promptfoo.dev) evals that send the interpolated prompts to Claude and assert on behavioral correctness. For example, it verifies that `auto-fix=false` causes the model to skip fixes, and that `passes=5` results in 5 planned passes.
- **`make test`** — always, before every commit. It's fast and offline. Catches most structural mistakes from editing manifests, renaming files, or changing frontmatter.
- **`make eval`** — after changing prompt wording or argument handling in skill files. The evals send the interpolated prompts to Claude and assert on behavioral correctness (e.g. that `auto-fix=false` causes the model to skip fixes, that `passes=5` results in 5 planned passes). Cheap enough to run liberally.
- **`make integration`** — after changes that affect the review pipeline end-to-end (orchestration logic, agent prompts, output format). Spins up a temp git repo with contrived bugs, runs `/look:again` with auto-fix, and verifies that bugs are found, fixed, and that output artifacts are correct. Also tests `/look:tidy` cleanup. More expensive, so run when the cheaper layers aren't sufficient.

Evals require an Anthropic API key and cost a small amount per run. Set the key before running:
Both `make eval` and `make integration` require an Anthropic API key. Set it before running:

```bash
# Option 1: export for the current shell session
Expand Down Expand Up @@ -109,22 +123,24 @@ You can also test the plugin through the marketplace install flow, which is clos

### Key Files

- **[src/commands/again.md](src/commands/again.md)**: Main orchestrator logic. Controls pass execution, auto-fixing, and aggregation.
- **[src/skills/again/SKILL.md](src/skills/again/SKILL.md)**: Main orchestrator logic. Controls pass execution, auto-fixing, and aggregation.
- **[src/skills/tidy/SKILL.md](src/skills/tidy/SKILL.md)**: Tidy skill for pruning old review runs.
- **[src/agents/lookagain-reviewer.md](src/agents/lookagain-reviewer.md)**: Reviewer subagent. Defines how individual review passes work.
- **[src/skills/lookagain-output-format/SKILL.md](src/skills/lookagain-output-format/SKILL.md)**: JSON output format specification.
- **[src/dot-claude-plugin/plugin.json](src/dot-claude-plugin/plugin.json)**: Plugin metadata and version.
- **[src/commands/tidy.md](src/commands/tidy.md)**: Tidy command for pruning old review runs.
- **[.claude-plugin/marketplace.json](.claude-plugin/marketplace.json)**: Marketplace manifest for plugin discovery and installation.

### Writing Command Prompts
### Writing Skill Prompts

When editing or adding command prompts in `src/commands/`:
When editing or adding skills in `src/skills/`:

- **Use `$ARGUMENTS` for the raw string.** Claude Code replaces `$ARGUMENTS` with whatever the user typed after the command name. There is no `$ARGUMENTS.name` dot-access syntax — only `$ARGUMENTS` (whole string) and `$ARGUMENTS[N]` (positional). Do NOT use an `arguments:` array in frontmatter — it is not a supported Claude Code feature and will not be interpolated.
- Each skill is a directory containing a `SKILL.md` file (e.g., `src/skills/again/SKILL.md`).
- **Use `$ARGUMENTS` for the raw string.** Claude Code replaces `$ARGUMENTS` with whatever the user typed after the skill name. There is no `$ARGUMENTS.name` dot-access syntax — only `$ARGUMENTS` (whole string) and `$ARGUMENTS[N]` (positional). Do NOT use an `arguments:` array in frontmatter — it is not a supported Claude Code feature and will not be interpolated.
- **Add `argument-hint`** in frontmatter to document expected input format (e.g., `argument-hint: "[key=value ...]"`).
- **Add `disable-model-invocation: true`** for user-triggered actions to prevent Claude from auto-invoking them.
- **Include a defaults table** in the body listing each key, its default, and a description. Instruct the agent to parse `key=value` pairs from `$ARGUMENTS` and fall back to defaults for missing keys.
- **Log the resolved configuration** so it is visible in output and reviewable in evals.
- `make test` enforces that commands using `$ARGUMENTS` have `argument-hint` in frontmatter, a defaults table in the body, and do NOT use the unsupported `arguments:` frontmatter array.
- `make test` enforces that skills using `$ARGUMENTS` have `argument-hint` in frontmatter, a defaults table in the body, and do NOT use the unsupported `arguments:` frontmatter array.
- After changing prompt logic, run `make eval` to verify models still interpret the arguments correctly.

## Pull Requests
Expand All @@ -138,4 +154,4 @@ When editing or adding command prompts in `src/commands/`:

## Versioning

Update the version in `src/dot-claude-plugin/plugin.json` when making releases. The marketplace entry in `.claude-plugin/marketplace.json` also has a `version` field — keep both in sync. The test suite (`make test`) validates that commands, agents, and skills match between the two files.
Update the version in `src/dot-claude-plugin/plugin.json` when making releases. The marketplace entry in `.claude-plugin/marketplace.json` also has a `version` field — keep both in sync. The test suite (`make test`) validates that agents and skills match between the two files.
3 changes: 3 additions & 0 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,9 @@ dev: build ## Build and start Claude Code with plugin loaded
eval: ## Run behavioral evals (requires ANTHROPIC_API_KEY)
@npx promptfoo@latest eval -c evals/promptfooconfig.yaml

integration: build ## Run integration test (requires ANTHROPIC_API_KEY)
@./scripts/integration-test.sh

clean: ## Remove build artifacts
@rm -rf dist/
@echo "Cleaned dist/"
19 changes: 15 additions & 4 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -140,15 +140,26 @@ Previous runs are preserved. Use `/look:tidy` to prune old results:
| should_fix | No | Performance issues, poor error handling |
| suggestion | No | Refactoring, documentation, style |

## Updating

To update to the latest version:

```bash
/plugin marketplace update hartbrook-plugins
/plugin uninstall look@hartbrook-plugins
/plugin install look@hartbrook-plugins
```

## Development

```bash
make test # Structural validation (offline, fast)
make eval # Behavioral evals via promptfoo (requires ANTHROPIC_API_KEY)
make dev # Build and start Claude Code with the plugin loaded
make test # Structural validation (offline, fast)
make eval # Behavioral evals via promptfoo (requires ANTHROPIC_API_KEY)
make integration # End-to-end integration test (requires ANTHROPIC_API_KEY)
make dev # Build and start Claude Code with the plugin loaded
```

See [CONTRIBUTING.md](CONTRIBUTING.md) for full development setup and guidelines.
Run `make test` before every commit (free, offline). Run `make eval` after changing prompt wording or argument handling. Run `make integration` after changes to the review pipeline or output format. See [CONTRIBUTING.md](CONTRIBUTING.md) for details on what each layer catches.

## License

Expand Down
38 changes: 19 additions & 19 deletions evals/promptfooconfig.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -10,11 +10,11 @@ providers:

tests:
# ==================================================================
# again.md — no arguments (defaults)
# again — no arguments (defaults)
# ==================================================================
- description: "no arguments → model uses all defaults (auto-fix=true, passes=3, thorough)"
vars:
prompt_file: src/commands/again.md
prompt_file: src/skills/again/SKILL.md
assert:
- type: llm-rubric
value: >
Expand All @@ -26,11 +26,11 @@ tests:
simply means the user wants all defaults.

# ==================================================================
# again.md — auto-fix interpretation
# again — auto-fix interpretation
# ==================================================================
- description: "auto-fix=true → model plans to apply fixes"
vars:
prompt_file: src/commands/again.md
prompt_file: src/skills/again/SKILL.md
arg_passes: "3"
arg_target: staged
arg_auto-fix: "true"
Expand All @@ -45,7 +45,7 @@ tests:

- description: "auto-fix=false → model skips fixes"
vars:
prompt_file: src/commands/again.md
prompt_file: src/skills/again/SKILL.md
arg_passes: "3"
arg_target: staged
arg_auto-fix: "false"
Expand All @@ -59,11 +59,11 @@ tests:
It should not describe applying any code fixes.

# ==================================================================
# again.md — passes count
# again — passes count
# ==================================================================
- description: "passes=5 → model plans exactly 5 initial passes"
vars:
prompt_file: src/commands/again.md
prompt_file: src/skills/again/SKILL.md
arg_passes: "5"
arg_target: staged
arg_auto-fix: "true"
Expand All @@ -79,11 +79,11 @@ tests:
5 sequential passes before considering additional passes.

# ==================================================================
# again.md — model resolution
# again — model resolution
# ==================================================================
- description: "model=fast → reviewer uses haiku"
vars:
prompt_file: src/commands/again.md
prompt_file: src/skills/again/SKILL.md
arg_passes: "3"
arg_target: staged
arg_auto-fix: "true"
Expand All @@ -99,7 +99,7 @@ tests:

- description: "model=thorough → no explicit model override"
vars:
prompt_file: src/commands/again.md
prompt_file: src/skills/again/SKILL.md
arg_passes: "3"
arg_target: staged
arg_auto-fix: "true"
Expand All @@ -113,11 +113,11 @@ tests:
current model). It should NOT set the model to haiku or sonnet.

# ==================================================================
# again.md — scope resolution
# again — scope resolution
# ==================================================================
- description: "target=branch → branch-based diff scope"
vars:
prompt_file: src/commands/again.md
prompt_file: src/skills/again/SKILL.md
arg_passes: "3"
arg_target: branch
arg_auto-fix: "true"
Expand All @@ -131,11 +131,11 @@ tests:
branch. It should reference branch comparison or merge-base.

# ==================================================================
# again.md — partial arguments (only auto-fix specified)
# again — partial arguments (only auto-fix specified)
# ==================================================================
- description: "only auto-fix=false → defaults for everything else, no fixing"
vars:
prompt_file: src/commands/again.md
prompt_file: src/skills/again/SKILL.md
arg_auto-fix: "false"
assert:
- type: llm-rubric
Expand All @@ -146,11 +146,11 @@ tests:
auto-fix is explicitly false.

# ==================================================================
# tidy.md — all flag
# tidy — all flag
# ==================================================================
- description: "all=true → removes all runs"
vars:
prompt_file: src/commands/tidy.md
prompt_file: src/skills/tidy/SKILL.md
arg_keep: "1"
arg_all: "true"
assert:
Expand All @@ -161,7 +161,7 @@ tests:

- description: "all=false, keep=3 → date-based retention"
vars:
prompt_file: src/commands/tidy.md
prompt_file: src/skills/tidy/SKILL.md
arg_keep: "3"
arg_all: "false"
assert:
Expand All @@ -172,11 +172,11 @@ tests:
It should keep runs from the last 3 days.

# ==================================================================
# tidy.md — no arguments (defaults)
# tidy — no arguments (defaults)
# ==================================================================
- description: "tidy with no arguments → keep=1, all=false"
vars:
prompt_file: src/commands/tidy.md
prompt_file: src/skills/tidy/SKILL.md
assert:
- type: llm-rubric
value: >
Expand Down
Loading