feat: add supermodel skill command #125

Closed
jonathanpopham wants to merge 2 commits into supermodeltools:main from jonathanpopham:feat/skill-command

Conversation

Contributor

@jonathanpopham jonathanpopham commented Apr 13, 2026

Summary

  • Adds supermodel skill — prints a generic prompt teaching AI agents to use .graph.* files
  • Prompt is repo/language agnostic, benchmark-tuned
  • Includes regression test locking in the 6 key prompt elements

Benchmark (Django, 8 failing tests)

| Variant | Cost | Duration | Turns |
|---|---|---|---|
| Naked (no shards) | $0.30 | 122s | 20 |
| Grey's hand-crafted prompt | $0.12 | 29s | 9 |
| supermodel skill (this PR) | $0.11 | 31s | 7 |

Test plan

  • `go build ./...` passes
  • `go test ./cmd/ -run TestSkill` passes
  • Docker benchmark run confirms parity with hand-crafted prompt

Summary by CodeRabbit

Release Notes

  • New Features

    • Added skill command that provides guidance on navigating code relationships using .graph files.
  • Documentation

    • Introduced documentation describing .graph file format, naming conventions, and usage guidance.
    • Updated benchmark results with expanded comparison metrics across multiple implementation variants.

Revised the generic skill prompt based on benchmark trace analysis.
Three changes: teach the .graph naming convention so agents construct
paths directly, bold the read-order directive, and tell agents to
check graph files before grepping for structure.

Skill v2: $0.11, 31s, 7 turns (was $0.15, 42s, 11 turns)
Matches Grey's hand-crafted Django prompt: $0.12, 29s, 9 turns
Locks in the six key elements that drove benchmark results:
graph extension, three section names, naming convention example,
and read-order directive.

coderabbitai bot commented Apr 13, 2026

Walkthrough

This PR introduces a new skill CLI command that prints instructional guidance on using .graph code relationship files. Supporting changes include documentation of the .graph file format, tests for the command's output, and updates to benchmark results and documentation reflecting new experiment variants.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Skill Command & Tests: `cmd/skill.go`, `cmd/skill_test.go` | Added a new Cobra subcommand `skill` that prints a `skillPrompt` constant to stdout. The prompt instructs agents to consult `.graph.*` files before source files and describes the file format (naming convention; expected sections: `[deps]`, `[calls]`, `[impact]`). Two test cases validate prompt content presence and minimum length. |
| Documentation & Guidance: `benchmark/CLAUDE.skill.md` | New markdown file documenting `.graph.*` file usage, the naming convention (inserting `.graph` before the extension), expected sections, and best practices for using graph files to navigate code relationships. |
| Benchmark Results Updates: `benchmark/results/summary.md`, `benchmark/results/blog-post-draft.md` | Updated the benchmark comparison from two variants to four: Naked Claude, Supermodel with crafted prompt, Supermodel with auto-generated prompt, and three-file shards. Revised the metrics table, cost/turns/duration values, and narrative descriptions to reflect the new experiment design. |
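
The naming convention documented above (insert `.graph` before the file extension) can be sketched in a few lines of Go. Note that `graphPath` is a hypothetical helper for illustration, not code from this PR:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// graphPath inserts ".graph" before a source file's extension,
// following the convention the skill prompt teaches, e.g.
// "django/db/models.py" -> "django/db/models.graph.py".
func graphPath(src string) string {
	ext := filepath.Ext(src) // e.g. ".py"
	return strings.TrimSuffix(src, ext) + ".graph" + ext
}

func main() {
	fmt.Println(graphPath("django/db/models.py")) // django/db/models.graph.py
	fmt.Println(graphPath("cmd/skill.go"))        // cmd/skill.graph.go
}
```

Teaching this rule lets an agent construct the graph path directly from any source path instead of searching for it.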

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • greynewell

Poem

📚 Graph files light the path ahead,
No more guessing where to tread,
.graph shards organize the way,
Skill command saves the day,
Benchmarks proved—relationships clear! ✨

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description check | ⚠️ Warning | The description covers the key aspects but doesn't follow the required template structure with 'What', 'Why', and 'Test plan' sections. | Restructure the description to match the template: add an explicit 'What' section (summarizing the change) and a 'Why' section (motivation/context), and organize the 'Test plan' with checkboxes for `make test` and `make lint`. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%; the required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (1 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and concisely summarizes the main change: adding a new `supermodel skill` command that provides a generic prompt for AI agents. |



@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
cmd/skill.go (1)

9-19: Keep prompt content single-sourced to avoid doc drift.

skillPrompt is effectively duplicated with benchmark/CLAUDE.skill.md. Consider enforcing exact parity (or generating one from the other) so future edits don’t silently diverge.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/skill.go` around lines 9 - 19, The string constant skillPrompt is
duplicated in benchmark/CLAUDE.skill.md; consolidate to a single source of truth
by loading the prompt text from one canonical file instead of hardcoding it in
cmd/skill.go (or by generating benchmark/CLAUDE.skill.md from skillPrompt at
build time). Update cmd/skill.go to read the canonical prompt (e.g., from
benchmark/CLAUDE.skill.md) into the skillPrompt variable (or add a build step
that writes the canonical file), and add a small test or CI check that verifies
exact parity between skillPrompt and benchmark/CLAUDE.skill.md to prevent future
drift.
cmd/skill_test.go (1)

8-32: Add one behavior test for the command path, not just the constant.

Right now the tests only assert on the skillPrompt constant. Add a test that executes the skill command and asserts its output contains the prompt, so command-wiring regressions are caught too.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: cad1440b-55c3-4210-9196-8e0c138d87b5

📥 Commits

Reviewing files that changed from the base of the PR and between 5552bb6 and 8a9f1aa.

⛔ Files ignored due to path filters (1)
  • benchmark/results/benchmark_results.zip is excluded by !**/*.zip
📒 Files selected for processing (6)
  • benchmark/CLAUDE.skill.md
  • benchmark/results/blog-post-draft.md
  • benchmark/results/skill-v2.txt
  • benchmark/results/summary.md
  • cmd/skill.go
  • cmd/skill_test.go

Comment on lines +12 to +14
| Cost | $0.30 | $0.12 | $0.15 | $0.25 |
| Turns | 20 | 9 | 11 | 16 |
| Duration | 122s | 29s | 42s | 73s |

⚠️ Potential issue | 🟠 Major

skill (generic) benchmark values look stale against this PR’s stated results.

Line 12-Line 14 and Line 18 still show the older run ($0.15, 42s, 11 turns). The PR objective states the updated result is $0.11, 31s, 7 turns, so the derived comparison line is also off.

📌 Suggested update
```diff
-| Cost               | $0.30        | $0.12                | $0.15           | $0.25        |
-| Turns              | 20           | 9                    | 11              | 16           |
-| Duration           | 122s         | 29s                  | 42s             | 73s          |
+| Cost               | $0.30        | $0.12                | $0.11           | $0.25        |
+| Turns              | 20           | 9                    | 7               | 16           |
+| Duration           | 122s         | 29s                  | 31s             | 73s          |

-**skill (generic prompt): 50% cheaper, 66% faster, 45% fewer turns vs naked**
+**skill (generic prompt): 63% cheaper, 75% faster, 65% fewer turns vs naked**
```

Also applies to: 18-18

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmark/results/summary.md` around lines 12 - 14, Update the stale "skill
(generic)" benchmark entries in the results table: replace the old values
($0.15, 42s, 11 turns) with the PR-stated results ($0.11, 31s, 7 turns) in the
row for "skill (generic)" and then recompute and update the derived comparison
line (the comparison column that references those values) so the cost, duration,
and turns reflect the new numbers consistently throughout the table.

@greynewell
Contributor

Closing in favor of a new PR from the rebased origin branch that includes Windows CI fixes (TMP/TEMP env vars and filepath separators). The code changes from this PR are preserved in supermodeltools:feat/skill-command.

@greynewell greynewell closed this Apr 13, 2026
