Han Feedback: plan-a-feature + plan-implementation (2026-05-29) #40

@mjansen401

Han Feedback — 2026-05-29

Skills used: han:plan-a-feature, han:plan-implementation
Context: Planning and implementing the contribution of /han-feedback to testdouble/han as a pull request — speccing what the skill does, then producing the PR workflow and SKILL.md adaptation checklist
Outcome: Feature specification, implementation plan, and a live PR at #39


han:plan-a-feature

What worked well

  • Codebase-first discovery before any interview. The skill fetched real SKILL.md examples, the CONTRIBUTING.md requirements, the skills index format, writing voice guide, and the long-form doc template from the Han repo before surfacing a single question. No placeholder decisions. The interview started with real evidence, not assumptions.
  • Review agents caught 16 genuine findings. The fish shell glob failure on empty directories (ls -t dir/*.md with fish raises an error rather than returning empty output) would have produced mysterious first-run behavior. The partial write leaving a file that satisfies the dedup check, the ambiguous-response handling gaps, and the step-order difference between the original skill and the spec — all real, all caught before the SKILL.md was written.
  • Only 2 questions reached the user. Output folder and skills index group. Everything else resolved from the Han repo's own codebase and writing standards. The "what genuinely needs human judgment" filter held.
  • Project-manager synthesis found orphaned cross-references the earlier passes missed. Artifacts came out internally consistent — every F# had a matching D# and vice versa.
  • The spec survived "what does the gh CLI actually output?" The 4-branch gh failure model in the original skill collapsed to 3 in the spec after the on-call-engineer verified the CLI signals. That correction happened at spec time, not at implementation time.
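
The fish glob finding above can be illustrated with a minimal sketch. It assumes the skill only needs the Markdown files in a folder, newest first, and it sidesteps shell globbing entirely. The function name `newest_markdown_files` is hypothetical and is not taken from the skill or the Han repo.

```python
from pathlib import Path


def newest_markdown_files(directory: str) -> list[Path]:
    """Return the .md files in `directory`, newest first (hypothetical helper)."""
    root = Path(directory)
    if not root.is_dir():
        # Assumption: a missing directory is treated the same as an empty one.
        return []
    # Path.glob simply yields nothing when no file matches, whereas an
    # unmatched wildcard in fish makes the whole command fail.
    return sorted(root.glob("*.md"), key=lambda p: p.stat().st_mtime, reverse=True)
```

On a first run against an empty output folder this returns an empty list, so the caller can branch on "no previous files" without parsing a shell error.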

What didn't work

  • Protocol overhead is sized for software features, not skill contributions. The spec produces Outcome, Actors, Primary Flow, Alternate Flows, Edge Cases, User Interactions, Coordinations, Out of Scope, Open Items. For a markdown instruction file where the full behavioral surface fits in 10 steps, the 70-line spec is more scaffolding than the content warrants. The useful artifact for this use case turned out to be the implementation checklist (in plan-implementation), not the spec itself.
  • Step counting drift between phases. The original skill had 8 steps. Review agent findings added directory creation and moved the emptiness check, producing a 10-step spec. The implementation plan then had to carefully number them again. Three numbering artifacts in three files that all needed to stay in sync. A direct adaptation checklist against the original would have been cleaner for this use case.
  • Rating dimensions were underspecified in the spec ("adapt to skill type"). This triggered a finding (F8) and a new decision (D9). A more prescriptive template for the rating table format would have resolved this at template time, not spec time.

Overall

plan-a-feature produced a specification that is genuinely accurate and internally consistent, and the review agents added real value. The friction is in protocol fit: the skill is designed for speccing software features with coordinations across systems, actors, and failure modes. A skill contribution to an external repo is closer to a documentation PR with a well-defined checklist. The spec framework is not wrong here, but it generates more scaffolding than the decision count justifies. Worth considering a lighter-weight "contribution spec" path for this class of work.

Rating

| Dimension | Score |
| --- | --- |
| Spec completeness | 5/5 |
| Evidence-first discipline | 5/5 |
| Review agent signal quality | 5/5 |
| Output length vs. decision count | 2/5 |
| Protocol fit for documentation contributions | 3/5 |

han:plan-implementation

What worked well

  • The adaptation checklist is the right artifact. 13 numbered changes against the original SKILL.md, each citing the decision that drives it. A developer can diff the original and the checklist and know exactly what changed and why. That's more useful than re-reading the spec.
  • On-call-engineer was the right specialist for a skill file. The failure path instructions are load-bearing in a way that structural or behavioral analysis wouldn't catch. The 4-to-3 gh branch collapse, the fish glob failure, the partial-file dedup trap — these are the kinds of errors that ship silently and confuse users. Exactly what the on-call-engineer is there for.
  • .discovery-notes.md as shared context worked. The file prevented re-grepping across specialists. Admin access confirmation, skill count, and writing voice rules were all in one place: read once, referenced everywhere.
  • 1-round convergence was the right outcome. All 16 findings plan-level, none spec-level. The gate correctly went straight to synthesis.
  • YAGNI discipline: partial-file detection was correctly deferred. Write tool can't distinguish partial from absent. The simpler instruction (tell user the path, have them check) satisfies the same behavioral commitment. That's the rule working correctly.

What didn't work

  • Full plan-implementation protocol for a documentation PR. The RAID log, security posture, and testing strategy sections are appropriately scoped (security = the gate is in the skill itself; testing = install locally and run) but still written at full template length. Two specialists, two template sections, one iteration round, project-manager synthesis — the process produced the right plan, but 80% of the output is infrastructure for decisions that were obvious from the start.
  • CONTRIBUTING.md checklist items were found by specialist agents. The Evidence breadcrumb, the em-dash in the issue title, the step-order difference — a pre-flight read of the source SKILL.md against CONTRIBUTING.md would have caught these in 5 minutes. Dispatching agents to find them added turns without adding judgment. The pre-flight read is a better match for the discovery pattern here.
  • Plan section for CHANGELOG required a live skill count that wasn't in the discovery notes. Correct call (count from disk), but it surfaced as a finding (C1) that could have been in the standard pre-flight.

Overall

plan-implementation produced an actionable plan with the right adaptation checklist at its center. The specialist combination was correct. The gap is the same as plan-a-feature: the protocol is sized for code changes, and applying it to a documentation contribution generates scaffolding that outweighs the decision content. The framework is not wrong, and the findings were real, but the planning time spent per decision favors a lighter checklist-based approach for this use case. The skill contribution path would benefit from a dedicated mode or a reduced-template option for documentation-only changes.

Rating

| Dimension | Score |
| --- | --- |
| Adaptation checklist quality | 5/5 |
| Specialist selection | 5/5 |
| Finding quality (signal-to-noise) | 4/5 |
| YAGNI discipline | 5/5 |
| Output length vs. decision count | 2/5 |
| Protocol fit for documentation contributions | 3/5 |
| Round efficiency | 5/5 |
