Add business-level naming guidance to samurai skill #24
Merged
…kill) Variant A of A/B test for samurai test-naming guidance. Extends the existing skill with an inline Naming section in SKILL.md (forbidden patterns + lazy-load pointer) plus skill/naming.md with the actor / verb / outcome replacement protocol, heuristic table, and bad → good pairs grounded in real production trees. Worked example in api.md updated so its Test() names already clear the new bar (no "list empty", no generic "check").
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Hardened A/B eval (no lazy load + mixed fixtures + N=3) ranked V2 at 98.4% vs V1 at 92.1%. V1's extra detail over-flagged passive voice ("first loan is added to the history"). V2 keeps all 4 forbidden patterns with examples and adds a safe-examples disclaimer so domain-natural uses of "has", "returns", "no" stay PASS.
Full results: eval/naming-length/RESULTS-HARD.md
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
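To make the four forbidden patterns concrete, here is a minimal Go sketch of a name checker. It is not part of the skill, which is prose guidance for a model. The function `flagName`, the rule labels, and every regular expression are assumptions written for illustration; they are deliberately narrow and will miss cases and raise false positives (for example, a three-digit amount would be flagged as an HTTP status).

```go
package main

import (
	"fmt"
	"regexp"
)

// rule pairs a forbidden-pattern label with a heuristic regexp.
// Both the labels and the regexps are hypothetical, not taken from SKILL.md.
type rule struct {
	label string
	re    *regexp.Regexp
}

var rules = []rule{
	// camelCase or PascalCase tokens such as getLoan or GetLoan.
	{"Go identifier leak", regexp.MustCompile(`\b[a-z]+[A-Z][A-Za-z]*\b|\b[A-Z][a-z]+[A-Z][A-Za-z]*\b`)},
	// Any standalone three-digit number from 100 to 599.
	{"HTTP status code", regexp.MustCompile(`\b[1-5][0-9]{2}\b`)},
	// Generic test-speak; "has", "returns", "is" are intentionally not listed.
	{"assertion phrasing", regexp.MustCompile(`(?i)\b(should|check|assert|verify)\b`)},
	// Words that describe the data structure rather than the business outcome.
	{"data-shape word", regexp.MustCompile(`(?i)\b(list|slice|nil|null)\b`)},
}

// flagName returns the labels of every rule the test name trips.
// An empty result means the name passes this (rough) check.
func flagName(name string) []string {
	var hits []string
	for _, r := range rules {
		if r.re.MatchString(name) {
			hits = append(hits, r.label)
		}
	}
	return hits
}

func main() {
	names := []string{
		"check list empty",
		"GetLoan returns 404",
		"borrower has insufficient collateral",
		"first loan is added to the history",
	}
	for _, n := range names {
		fmt.Printf("%q -> %v\n", n, flagName(n))
	}
}
```

The last two names trip no rule, mirroring the safe-examples disclaimer: domain-natural phrasing and passive voice should stay PASS.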
Records the two-pass evaluation that picked V2 (129w) as the inline naming section: first eval was non-discriminating because every variant loaded the lazy naming.md (RESULTS.md); hardened eval (RESULTS-HARD.md) stripped the pointer, used mixed textbook/borderline/clean fixtures, and ran N=3 per cell to expose stable failure modes. V2 won at 98.4%. Includes both fixture sets, both ground-truth files, all 5 variants (loaded and no-load), and the prompt-build script.
…reator review
- Drop bare `has` from the inline assertion-phrasing list in SKILL.md: it conflicted with the safe-examples carve-out for `has insufficient collateral` and was absent from naming.md's heuristics row, so a model cross-checking the two files saw conflicting signals.
- Pair the HTTP-status examples in naming.md with their domain verbs (`200 → accepts`, `404 → reports missing`, `409 → rejects duplicate id`) so the list no longer reads as if all HTTP leaks are rejections.

Coverage on the hardened eval is unchanged (both edits are inside cases V2 already PASSed).
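The status-to-verb pairing can be sketched as a small Go lookup. Only the three pairs come from the commit message; the `describe` helper and the "loan registry" actor are hypothetical names used for illustration.

```go
package main

import (
	"fmt"
	"net/http"
)

// domainVerb holds the three pairs quoted in the commit message.
var domainVerb = map[int]string{
	http.StatusOK:       "accepts",
	http.StatusNotFound: "reports missing",
	http.StatusConflict: "rejects duplicate id",
}

// describe builds a business-level phrase from an actor and an HTTP status.
// It reports false for statuses that have no pairing in the map.
func describe(actor string, status int) (string, bool) {
	verb, ok := domainVerb[status]
	if !ok {
		return "", false
	}
	return actor + " " + verb, true
}

func main() {
	// A slice keeps the output order stable; map iteration order is random.
	for _, status := range []int{http.StatusOK, http.StatusNotFound, http.StatusConflict} {
		name, _ := describe("loan registry", status) // hypothetical actor
		fmt.Printf("%d -> %q\n", status, name)
	}
}
```

Note that only one of the three verbs is a rejection, which is the point of the edit: a status code leaking into a test name can stand for acceptance or a missing resource as easily as for a refusal.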
Summary
Adds a `## Naming` section to `cmd/claude-setup/skill/SKILL.md` plus a lazy-loaded `naming.md` so Claude keeps `s.Test("...")` names at business level: no Go identifier leaks, no HTTP status codes, no assertion phrasing, no data-shape words.

The inline section is the V2 winner from a two-pass A/B evaluation (`eval/naming-length/`):
- First eval (`RESULTS.md`): non-discriminating. Every variant pointed at the lazy-loaded `naming.md` and subagents loaded it, so the inline section's own carrying capacity was never tested. All variants scored 20/20.
- Hardened eval (`RESULTS-HARD.md`): stripped the `[naming.md]` pointer, used mixed fixtures (textbook + borderline + clean), N=3 subagent runs per cell. V2 (129w) won at 98.4% vs V1 (206w, 92.1%), which over-flagged passive voice, and V5 (16w, 85.7%), which lost structure-leak detection.

Variant B (two-skill split) was prepared on `naming-variant-b` but deferred: V2 already clears the carrying-capacity bar without the extra install surface (two skills, two version markers, six files entries). If review-phase precision regresses in production use, B can be re-evaluated against shipped A.

Test plan
- `go build ./cmd/claude-setup`: green
- `.claude/skills/samurai/` contains V2 `## Naming` (4-bullet list + safe-examples disclaimer) and the 4.1K `naming.md`
- `has`/`returns`/`is` stayed PASS

🤖 Generated with Claude Code