feat: add supermodel skill command #125

Closed
jonathanpopham wants to merge 2 commits into supermodeltools:main from jonathanpopham:feat/skill-command

Conversation

Contributor

@jonathanpopham jonathanpopham commented Apr 13, 2026

Summary

  • Adds supermodel skill — prints a generic prompt teaching AI agents to use .graph.* files
  • Prompt is repo/language agnostic, benchmark-tuned
  • Includes regression test locking in the 6 key prompt elements

Benchmark (Django, 8 failing tests)

| Variant | Cost | Duration | Turns |
|---|---|---|---|
| Naked (no shards) | $0.30 | 122s | 20 |
| Grey's hand-crafted prompt | $0.12 | 29s | 9 |
| supermodel skill (this PR) | $0.11 | 31s | 7 |

Test plan

  • `go build ./...` passes
  • `go test ./cmd/ -run TestSkill` passes
  • Docker benchmark run confirms parity with hand-crafted prompt

Summary by CodeRabbit

Release Notes

  • New Features

    • Added skill command that provides guidance on navigating code relationships using .graph files.
  • Documentation

    • Introduced documentation describing .graph file format, naming conventions, and usage guidance.
    • Updated benchmark results with expanded comparison metrics across multiple implementation variants.

Revised the generic skill prompt based on benchmark trace analysis.
Three changes: teach the .graph naming convention so agents construct
paths directly, bold the read-order directive, and tell agents to
check graph files before grepping for structure.

Skill v2: $0.11, 31s, 7 turns (was $0.15, 42s, 11 turns)
Matches Grey's hand-crafted Django prompt: $0.12, 29s, 9 turns
Locks in the six key elements that drove benchmark results:
graph extension, three section names, naming convention example,
and read-order directive.

coderabbitai bot commented Apr 13, 2026

Walkthrough

This PR introduces a new skill CLI command that prints instructional guidance on using .graph code relationship files. Supporting changes include documentation of the .graph file format, tests for the command's output, and updates to benchmark results and documentation reflecting new experiment variants.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Skill Command & Tests: `cmd/skill.go`, `cmd/skill_test.go` | Added a new Cobra subcommand `skill` that prints a `skillPrompt` constant to stdout. The prompt instructs agents to consult `.graph.*` files before source files and describes the file format (naming convention; expected sections: `[deps]`, `[calls]`, `[impact]`). Two test cases validate prompt content presence and minimum length. |
| Documentation & Guidance: `benchmark/CLAUDE.skill.md` | New markdown file documenting `.graph.*` file usage, the naming convention (inserting `.graph` before the extension), expected sections, and best practices for using graph files to navigate code relationships. |
| Benchmark Results Updates: `benchmark/results/summary.md`, `benchmark/results/blog-post-draft.md` | Updated the benchmark comparison from two variants to four: Naked Claude, Supermodel with crafted prompt, Supermodel with auto-generated prompt, and three-file shards. Revised the metrics table, cost/turns/duration values, and narrative descriptions to reflect the new experiment design. |
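
The naming convention documented above (insert `.graph` before the file extension) can be sketched in a few lines of Go. Note that `graphPath` is a hypothetical helper for illustration, not code from this PR:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// graphPath inserts ".graph" before a source file's extension,
// following the convention the skill prompt teaches, e.g.
// "django/db/models.py" -> "django/db/models.graph.py".
func graphPath(src string) string {
	ext := filepath.Ext(src) // e.g. ".py"
	return strings.TrimSuffix(src, ext) + ".graph" + ext
}

func main() {
	fmt.Println(graphPath("django/db/models.py")) // django/db/models.graph.py
	fmt.Println(graphPath("cmd/skill.go"))        // cmd/skill.graph.go
}
```

Teaching this rule lets an agent construct the graph path directly from any source path instead of searching for it.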

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • greynewell

Poem

📚 Graph files light the path ahead,
No more guessing where to tread,
.graph shards organize the way,
Skill command saves the day,
Benchmarks proved—relationships clear! ✨

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description check | ⚠️ Warning | The description covers the key aspects but doesn't follow the required template structure with 'What', 'Why', and 'Test plan' sections. | Restructure the description to match the template: add an explicit 'What' section (summarizing the change) and a 'Why' section (motivation/context), and organize the 'Test plan' with checkboxes for `make test` and `make lint`. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%; the required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (1 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and concisely summarizes the main change: adding a new `supermodel skill` command that provides a generic prompt for AI agents. |



@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
cmd/skill.go (1)

9-19: Keep prompt content single-sourced to avoid doc drift.

skillPrompt is effectively duplicated with benchmark/CLAUDE.skill.md. Consider enforcing exact parity (or generating one from the other) so future edits don’t silently diverge.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/skill.go` around lines 9 - 19, The string constant skillPrompt is
duplicated in benchmark/CLAUDE.skill.md; consolidate to a single source of truth
by loading the prompt text from one canonical file instead of hardcoding it in
cmd/skill.go (or by generating benchmark/CLAUDE.skill.md from skillPrompt at
build time). Update cmd/skill.go to read the canonical prompt (e.g., from
benchmark/CLAUDE.skill.md) into the skillPrompt variable (or add a build step
that writes the canonical file), and add a small test or CI check that verifies
exact parity between skillPrompt and benchmark/CLAUDE.skill.md to prevent future
drift.
cmd/skill_test.go (1)

8-32: Add one behavior test for the command path, not just the constant.

Right now the tests only assert on the skillPrompt constant. Add a test that executes the skill command and asserts its output contains the prompt, so command-wiring regressions are caught too.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: cad1440b-55c3-4210-9196-8e0c138d87b5

📥 Commits

Reviewing files that changed from the base of the PR and between 5552bb6 and 8a9f1aa.

⛔ Files ignored due to path filters (1)
  • benchmark/results/benchmark_results.zip is excluded by !**/*.zip
📒 Files selected for processing (6)
  • benchmark/CLAUDE.skill.md
  • benchmark/results/blog-post-draft.md
  • benchmark/results/skill-v2.txt
  • benchmark/results/summary.md
  • cmd/skill.go
  • cmd/skill_test.go

Comment on lines +12 to +14
| Cost | $0.30 | $0.12 | $0.15 | $0.25 |
| Turns | 20 | 9 | 11 | 16 |
| Duration | 122s | 29s | 42s | 73s |

⚠️ Potential issue | 🟠 Major

skill (generic) benchmark values look stale against this PR’s stated results.

Line 12-Line 14 and Line 18 still show the older run ($0.15, 42s, 11 turns). The PR objective states the updated result is $0.11, 31s, 7 turns, so the derived comparison line is also off.

📌 Suggested update
```diff
-| Cost               | $0.30        | $0.12                | $0.15           | $0.25        |
-| Turns              | 20           | 9                    | 11              | 16           |
-| Duration           | 122s         | 29s                  | 42s             | 73s          |
+| Cost               | $0.30        | $0.12                | $0.11           | $0.25        |
+| Turns              | 20           | 9                    | 7               | 16           |
+| Duration           | 122s         | 29s                  | 31s             | 73s          |

-**skill (generic prompt): 50% cheaper, 66% faster, 45% fewer turns vs naked**
+**skill (generic prompt): 63% cheaper, 75% faster, 65% fewer turns vs naked**
```

Also applies to: 18-18

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmark/results/summary.md` around lines 12 - 14, Update the stale "skill
(generic)" benchmark entries in the results table: replace the old values
($0.15, 42s, 11 turns) with the PR-stated results ($0.11, 31s, 7 turns) in the
row for "skill (generic)" and then recompute and update the derived comparison
line (the comparison column that references those values) so the cost, duration,
and turns reflect the new numbers consistently throughout the table.

@greynewell
Contributor

Closing in favor of a new PR from the rebased origin branch that includes Windows CI fixes (TMP/TEMP env vars and filepath separators). The code changes from this PR are preserved in supermodeltools:feat/skill-command.

@greynewell greynewell closed this Apr 13, 2026
