
docs: address cli-agent-skill-patterns gaps from issue #173 (pprose audit) #175

Merged: jlevy merged 3 commits into main from claude/hopeful-hopper-thbntw on Jun 13, 2026
Conversation

@jlevy

@jlevy jlevy commented Jun 13, 2026

Owner

Closes #173.

Folds the five pprose-surfaced gaps from #173 into the cli-agent-skill-patterns guideline, plus a wider distribution audit that keeps our research current. Each guideline edit is tagged with the ladder rung it touches, so the §0 simple baseline stays untouched. All five gaps were verified against pprose's working install.py (not just the issue prose) before editing.

Guideline edits (the five gaps)

  1. Name the L2b variant (§6.0, §0.3) — a self-installing skill that writes a compact managed AGENTS.md block but takes on none of the L3 platform (no hooks/prime/setup/DocCache). Named as a variant within L2 (reference: pprose); the four-rung ladder L0–L3 is unchanged, with no L2a/L2b renumber — the owner chose the lighter single-named-sentence form over a full ladder split. qmd stays the plain-L2 reference.
  2. Format-versioning is artifact-driven, not L3-only (§6.0 closer, §6.6) — any rung that writes a merged/managed artifact (an AGENTS.md block, or a committed generated skill shared across versions) needs the format stamp and forward-compat guard. Hooks/prime/setup/DocCache stay L3-specific.
  3. Generator-side dev-build pin rule (§6.7) — bake the running version only if it is a real, resolvable published release; otherwise fall back to a known-good published pin (pprose's DISCOVERY_VERSION + is_pypi_release). A dev/editable checkout's version (0.1.1.dev49+abc1234) never resolves via uvx/npx.
  4. Drop the redundant surface= tag (§2, §6.6) — both marker examples now carry only format=fNN; artifacts are identified by location. (pprose already dropped it; tbd/flowmark still emit it — see follow-on below.)
  5. Multi-block collapse (§2, with a §6.6 pointer) — on install, collapse stale duplicate blocks to one at the first match, matching a small set of known legacy begin-prefixes.
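
A minimal sketch of the gap-3 pin rule, for illustration only. It assumes that "resolvable" can be approximated by version shape alone (a plain dotted release with no dev, pre-release, or local `+hash` segment). The names `looks_published` and `pin_for_generated_skill` and the fallback value are hypothetical; pprose's real `is_pypi_release` may check more than this.

```python
import re

# Hypothetical known-good published pin; pprose calls its equivalent
# DISCOVERY_VERSION. The value here is made up for the example.
DISCOVERY_VERSION = "0.1.0"

# Assumption: a published-looking version is purely dotted integers,
# e.g. "0.2.3". Anything with ".dev", "rc", or "+local" is rejected.
_RELEASE_RE = re.compile(r"\d+(\.\d+)*")


def looks_published(version: str) -> bool:
    """Return True if the version string has the shape of a final release."""
    return _RELEASE_RE.fullmatch(version) is not None


def pin_for_generated_skill(running_version: str) -> str:
    """Pick the version to bake into generated uvx/npx runner instructions."""
    if looks_published(running_version):
        return running_version
    # A dev/editable checkout never resolves from the index, so fall back.
    return DISCOVERY_VERSION


print(pin_for_generated_skill("0.2.3"))
print(pin_for_generated_skill("0.1.1.dev49+abc1234"))
```

A shape check like this cannot tell whether a release was actually uploaded, so a real generator would likely pair it with a stricter test or a hand-maintained fallback pin.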

Current-practice corrections (§6.6 native-scanning table): Cursor now scans .agents/skills/ natively (verified against Cursor docs); native per-agent skill dirs (.cursor/skills/, .github/skills/, .gemini/skills/, Google Antigravity .agent/skills/, …) are proliferating. §11/§12 and References updated (pprose L2b variant, qmd L2, taste-skill/anthropics L0).

Research (the broader audit)

  • New research-2026-06-13-skill-distribution-landscape.md — channel taxonomy, a distribution decision matrix (when each rung/channel is right), case studies across L0/L2/L2b/L3 (taste-skill, anthropics/skills, qmd, pprose, tbd), and the skills.sh leaderboard reality: the most-installed skills are simple L0/L1 vendor knowledge skills (Vercel find-skills ~2.0M, Anthropic frontend-design ~540K, Microsoft Azure ~5.8M), not self-installing platforms. Distribution complexity does not track adoption. The doc records both shapes considered for gap 1 (the L2a/L2b split vs. a single named sentence) and which was adopted.
  • New plan-2026-06-13-cli-skill-guideline-pprose-gaps.md — maps each gap to its edit; Phase 1 marked done and the gap-1 Open Question resolved to the single named variant.
  • Refreshed three stale research docs (foundational CLI-as-skill brief; standard-paths native-dirs note; meta-skill 15K→1% budget correction).

Scope notes

  • All docs were formatted with the repo's flowmark (format:md); new content strictly follows the common-doc guidelines (no spaced em dashes; "and" not +/& in new prose; Title Case headings). Pre-existing + in untouched sections of the guideline was left alone to keep this diff substantive rather than a repo-wide style sweep.
  • Follow-on (not in this PR): gap 4 implies a small tbd generator change — setup.ts emits surface=agents-md on the AGENTS.md begin line, which the guideline now calls unnecessary. Dropping it is safe (block is rewritten whole; detection is prefix-anchored) but is a code change with golden/drift-test updates, best tracked separately.

https://claude.ai/code/session_01MvbozVS4yEWoR4pkJCMSkJ


Generated by Claude Code

Source-level audit (via checkout-third-party-repo) of how popular and shipped
skills distribute, to ground issue #173 and keep our research current.

- New research-2026-06-13-skill-distribution-landscape.md: the channel taxonomy,
  a distribution decision matrix, case studies across L0/L2a/L2b/L3 (taste-skill,
  anthropics/skills, qmd, pprose, tbd), and the skills.sh leaderboard reality
  (simple L0/L1 vendor knowledge skills dominate adoption).
- Verified all five issue #173 gaps against pprose's install.py (dev-build pin
  selection, surface= drop, multi-block collapse, artifact-driven format stamps).
- Current-practice corrections: Cursor now scans .agents/skills/ natively
  (Cursor docs); native per-agent skill dirs are proliferating.
- Refreshed three stale research docs (foundational CLI-as-skill brief,
  standard-paths native-dirs note, meta-skill 15K->1% budget correction).
- New plan-2026-06-13-cli-skill-guideline-pprose-gaps.md maps each gap to a
  concrete, rung-tagged guideline edit.

https://claude.ai/code/session_01MvbozVS4yEWoR4pkJCMSkJ
@deepsource-io

deepsource-io Bot commented Jun 13, 2026


DeepSource Code Review

We reviewed changes in b68dc4e...1208ee5 on this pull request. Below is the summary for the review, and you can see the individual issues we found as inline review comments.

Code Review Summary

Analyzer Status Updated (UTC) Details
Secrets Jun 13, 2026 7:43p.m. Review ↗

Important

AI Review is run only on demand for your team. We're only showing results of static analysis review right now. To trigger AI Review, comment @deepsourcebot review on this thread.

@github-actions

github-actions Bot commented Jun 13, 2026


Coverage Report for packages/tbd

| Status | Category | Percentage | Covered / Total |
|---|---|---|---|
| 🔵 | Lines | 33.9% | 2876 / 8483 |
| 🔵 | Statements | 33.82% | 2977 / 8801 |
| 🔵 | Functions | 38.99% | 452 / 1159 |
| 🔵 | Branches | 29.98% | 1362 / 4543 |

File Coverage: No changed files found.
Generated in workflow #1017 for commit 1208ee5 by the Vitest Coverage Report Action

@jlevy jlevy force-pushed the claude/hopeful-hopper-thbntw branch from 2c54b53 to ade659c Compare June 13, 2026 19:27
@jlevy

jlevy commented Jun 13, 2026

Owner Author

Senior Engineering Review: Skill Distribution Guidance

I reviewed PR #175 (docs: address cli-agent-skill-patterns gaps from issue #173) at head 2c54b531af967074be541c48d4339c80c7575760, including the diff, PR discussion, local project docs, the tbd workflow guidance, and the external skill-distribution sources cited or implied by the PR.

Bottom line

I agree with the PR's main direction: the guideline should strongly recommend stopping at the lowest integration rung that works. Most tools should ship a small repo-root skills/<name>/SKILL.md and maybe a self-installing skill mirror. Very few should copy tbd's full L3 setup with hooks, prime, setup, DocCache, and a meta-skill.

I would request changes before merge because a few internal contradictions still weaken the recommendation.

Findings

  1. [P1] §6.6 still scopes upgrade/migration guidance to “L3 only.”

    packages/tbd/docs/guidelines/cli-agent-skill-patterns.md:571-577 correctly says format versioning is artifact-driven and applies to any rung that writes a generated artifact it may later need to upgrade or refuse to clobber. That explicitly includes an L2b managed AGENTS.md block.

    But packages/tbd/docs/guidelines/cli-agent-skill-patterns.md:816-819 still says:

    • “Upgrade existing installs deliberately (L3 only).”
    • “An L2 tool that only writes discovery-dir skills can skip all of this...”

    That conflicts with the new L2b definition. An L2b tool writes a managed AGENTS.md block, so it needs the §6.6 format stamp and forward-compatibility guard even though it is not L3. I would rename this section around the artifact, e.g. “Upgrade managed generated artifacts deliberately,” and change the parenthetical to “An L2a tool...” or “A tool that only writes fully-overwritten discovery-dir skills...”

  2. [P2] The refreshed standard-path research still contains stale Cursor/native-scan claims.

    docs/project/research/current/research-agent-skills-standard-paths.md:144-149 says Cursor now scans .agents/skills/ natively, but the table and precision note still say otherwise:

    • :203 says Cursor is .agents/skills/ “per skills CLI support” and that path docs are thinner.
    • :217-224 says Cursor is reached via npx skills add, not native scan.

    Cursor's current docs corroborate native loading from .agents/skills/, .cursor/skills/, ~/.agents/skills/, and ~/.cursor/skills/, plus compatibility loading from .claude/skills/ and .codex/skills/. The research doc should reconcile this instead of carrying both claims.

    More generally, I would keep the distinction explicit: “verified vendor-native scan” vs “Vercel installer support.” Vercel's supported-agent table is useful evidence for distribution reach, but it should not be treated as source verification for every vendor's native loader.

  3. [P3] The plan/spec still reads like pre-implementation state.

    docs/project/specs/active/plan-2026-06-13-cli-skill-guideline-pprose-gaps.md:10 still says this is a draft proposal and the guideline edits are not applied. :202-210 still lists open questions that the PR has already answered, including the L2a/L2b split and Cursor native-scanning confirmation.

    I would update this to “implemented/review” state, mark the decisions as decided, or move it out of active draft state. Otherwise future agents will treat already-settled PR decisions as unresolved planning work.

Research corroboration

The main research claims check out.

  • skills.sh corroborates the public agent-skills ecosystem and leaderboard shape, including large install counts for small knowledge/prompt skills.
  • Vercel Skills supported agents corroborates the broad installer/distribution story and the importance of .agents/skills/ as a portable project-level convention.
  • Cursor Agent Skills docs corroborate native Cursor loading from .agents/skills/, .cursor/skills/, and user-level variants.
  • pprose install.py corroborates the L2b pattern: .agents/skills/, .claude/skills/, marker-bounded AGENTS.md, format=fNN, a forward-compatibility guard, collapse of stale duplicate managed blocks, and release-pinned runner instructions.
  • qmd supports the L2a pattern: installable skill plus plugin/MCP packaging without a managed AGENTS.md block.
  • anthropics/skills and taste-skill support the repo-root skills and plugin-marketplace distribution examples.

The nuance I would add: the ecosystem is fragmenting into many native directories, but the best response is not for every CLI to write every native path itself. The durable baseline is still canonical repo-root skills plus .agents/skills/ portability and a .claude/skills/ mirror where Claude Code support matters. Let broader installers or plugin packaging fan out additional per-agent mirrors unless the tool has a strong reason to own that surface.

Holistic recommendation

I would make the guideline's recommendation ladder even sharper:

  1. Default: repo-root canonical skill. Ship skills/<name>/SKILL.md in the repo. Keep it small, installable, and readable by humans and crawlers.
  2. Skill routes to CLI. A skill should teach the agent when and how to invoke the CLI, not duplicate full CLI help or become a second product surface.
  3. Portable project mirror: .agents/skills/. Use this as the cross-agent project path.
  4. Claude mirror: .claude/skills/. Required if Claude Code support is first-class, because Claude Code does not treat .agents/skills/ as the canonical native location.
  5. L2a: self-installing skill only. A CLI may write its own discovery-dir skill and stop there. Full overwrite is enough; no migration apparatus needed.
  6. L2b: self-installing skill plus managed AGENTS.md. Only add this when the tool needs always-on bootstrap instructions. Once it writes into a human-authored project-instruction file, it needs marker bounds, format=fNN, a forward guard, deterministic output, and duplicate-block cleanup.
  7. L3: full agent platform. Hooks, prime, setup, DocCache, and meta-skills are justified only for tools with broad lifecycle/state/knowledge-library needs. tbd qualifies; most CLIs do not.
  8. Plugin marketplaces are packaging channels, not the baseline. Use them for bundled skills/MCP/hooks/trust/discovery. Do not make plugin packaging the required path for a basic skill to work.
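
To make the L2b requirements in item 6 concrete, here is a rough sketch of a marker-bounded upsert with a format stamp and forward guard. The marker text (`<!-- BEGIN EXAMPLE-TOOL format=fNN -->`), the tool name, and all function names are hypothetical. This does not reproduce pprose's or tbd's actual markers or code.

```python
import re

CURRENT_FORMAT = 2  # this hypothetical generator writes format=f02

# Hypothetical marker syntax, chosen only for the example.
BEGIN_RE = re.compile(r"<!-- BEGIN EXAMPLE-TOOL format=f(\d+) -->")
END = "<!-- END EXAMPLE-TOOL -->"


class NewerFormatError(RuntimeError):
    """The file holds a block written by a newer generator."""


def render_block(body: str) -> str:
    """Render the managed block deterministically from its body."""
    begin = f"<!-- BEGIN EXAMPLE-TOOL format=f{CURRENT_FORMAT:02d} -->"
    return f"{begin}\n{body}\n{END}"


def upsert_block(text: str, body: str) -> str:
    """Insert or rewrite the managed block; leave human text untouched."""
    block = render_block(body)
    match = BEGIN_RE.search(text)
    if match is None:
        # No managed block yet: append it after the human-authored content.
        if not text or text.endswith("\n\n"):
            sep = ""
        elif text.endswith("\n"):
            sep = "\n"
        else:
            sep = "\n\n"
        return text + sep + block + "\n"
    if int(match.group(1)) > CURRENT_FORMAT:
        # Forward guard: refuse to clobber a block from a newer version.
        raise NewerFormatError(f"found format f{match.group(1)}")
    end = text.find(END, match.end())
    if end == -1:
        raise ValueError("begin marker without matching end marker")
    # The block is rewritten whole, so old contents never leak through.
    return text[: match.start()] + block + text[end + len(END):]
```

Running `upsert_block` twice with the same body should return the same text, which is the "deterministic output" property. A plain-L2 tool that fully overwrites its own discovery-dir skill needs none of this.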

What tbd should do to dogfood this

I would turn the PR's recommendations into a small tbd dogfooding roadmap:

  1. Fix this PR's internal contradictions before merge.

    • Rename or re-scope the §6.6 “L3 only” upgrade section.
    • Reconcile Cursor/native-scan research.
    • Update the active spec status and open questions.
  2. Make tbd's own distribution contract explicit.

    • State that tbd is intentionally L3, not the baseline example for ordinary CLIs.
    • Document the exact surfaces tbd owns: generated skill(s), managed AGENTS.md block, hooks, setup/prime, DocCache/meta-skill.
    • State which lower rungs tbd recommends for simpler tools and why.
  3. Generate and test tbd's own canonical skill from one source.

    • Ensure the source of truth for the tbd skill can produce repo-local .agents/skills/tbd/SKILL.md and .claude/skills/tbd/SKILL.md consistently.
    • Add/keep drift tests so generated skill content cannot silently diverge from bundled docs.
    • Keep the skill small: it should route agents to tbd prime, tbd shortcut, and tbd guidelines, not embed the whole manual.
  4. Clean up tbd's managed marker format.

    • The PR says surface= is redundant because artifact identity comes from file location and marker name. I agree.
    • tbd should follow through with a real migration: accept old markers with surface=..., emit the cleaner marker going forward, collapse duplicates, and guard newer format=fNN blocks.
  5. Make format migration artifact-driven in code and tests.

    • Test managed AGENTS.md blocks independently from hook files and skill files.
    • Treat each generated artifact as having its own format/migration contract where needed.
    • Avoid implying that format stamps exist only because a tool is L3.
  6. Improve setup/doctor output around rungs and surfaces.

    • tbd setup/doctor should be able to explain what surfaces are installed and why tbd uses L3.
    • The message should reinforce the public recommendation: simpler tools usually want L1, L2a, or L2b.
  7. Add a visible follow-up bead for the tbd generator cleanup.

    • I would not force the generator migration into this docs PR if the intent is to keep it pure docs.
    • But the follow-up should be tracked explicitly: remove/deprecate surface=, add compatibility tests, verify generated output, and update docs/examples after the migration lands.
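
The marker migration in item 4 could look roughly like the sketch below. It assumes prefix-anchored detection, so an old begin line carrying `surface=...` still matches, and it collapses every matched block to a single fresh block at the first match. The prefixes, the `EXAMPLE-OLDNAME` legacy name, and the function name are hypothetical, and the forward guard for newer `format=fNN` blocks is left out for brevity.

```python
# Hypothetical begin/end prefixes: the current marker name plus one legacy name.
BEGIN_PREFIXES = ("<!-- BEGIN EXAMPLE-TOOL", "<!-- BEGIN EXAMPLE-OLDNAME")
END_PREFIXES = ("<!-- END EXAMPLE-TOOL", "<!-- END EXAMPLE-OLDNAME")


def collapse_blocks(text: str, new_block: str) -> str:
    """Replace the first managed block with new_block and drop the rest."""
    out = []
    inside = False  # currently skipping lines of an old managed block
    placed = False  # new_block has already been emitted
    for line in text.splitlines():
        stripped = line.strip()
        if not inside and stripped.startswith(BEGIN_PREFIXES):
            inside = True
            if not placed:
                out.append(new_block)  # keep position of the first match
                placed = True
            continue
        if inside:
            if stripped.startswith(END_PREFIXES):
                inside = False
            continue  # drop old block contents and its end marker
        out.append(line)
    if inside:
        raise ValueError("unterminated managed block")
    return "\n".join(out) + "\n"


old = (
    "intro\n"
    "<!-- BEGIN EXAMPLE-OLDNAME surface=agents-md format=f01 -->\n"
    "stale\n"
    "<!-- END EXAMPLE-OLDNAME -->\n"
    "middle\n"
    "<!-- BEGIN EXAMPLE-TOOL format=f02 -->\n"
    "duplicate\n"
    "<!-- END EXAMPLE-TOOL -->\n"
)
fresh = "<!-- BEGIN EXAMPLE-TOOL format=f02 -->\nfresh\n<!-- END EXAMPLE-TOOL -->"
print(collapse_blocks(old, fresh))
```

Text outside the markers ("intro", "middle") is preserved in order. This sketch does not tidy blank lines left around removed duplicates, which a real implementation might want to normalize.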

Suggested merge stance

I would not block on the tbd generator cleanup if it is tracked as a follow-up. I would block on the three documentation consistency issues above, because they are in the recommendation surface this PR is changing.

Fold the five pprose-surfaced gaps from issue #173 into the guideline, each
tagged with the ladder rung it touches so the §0 baseline stays untouched.

- Name the L2b variant (§6.0, §0.3): an L2 self-installer that also writes a
  managed AGENTS.md block, with none of the L3 platform; pprose is the reference.
  Kept as a named variant within L2, not a separate renumbered rung.
- Reframe format-versioning as artifact-driven, not L3-only (§6.0 closer, §6.6):
  any rung that writes a merged/managed artifact needs the format stamp and
  forward-compat guard; hooks/prime/setup/DocCache stay L3-specific.
- Add the generator-side dev-build pin rule (§6.7): bake a resolvable published
  release, not an unpublishable dev/pre-release version.
- Drop the redundant surface= tag from both marker examples (§2, §6.6); identify
  artifacts by location.
- Add multi-block collapse to §2 (and a §6.6 pointer): collapse stale duplicate
  blocks to one, matching known legacy begin-prefixes.
- Refresh the §6.6 native-scanning table: Cursor now scans .agents/skills/
  natively; note native per-agent skill dirs are multiplying.
- Update §11/§12 and References (pprose L2b variant, qmd L2, taste-skill/
  anthropics L0).

Sync the plan and research docs to the chosen form: record gap 1 resolved as the
single named variant in the plan (Phase 1 done, Open Questions resolved), and note
in the landscape research which shape was adopted vs. the rejected L2a/L2b split.

https://claude.ai/code/session_01MvbozVS4yEWoR4pkJCMSkJ
@jlevy jlevy force-pushed the claude/hopeful-hopper-thbntw branch from ade659c to 02d4a8e Compare June 13, 2026 19:32
…ative, spec status)

Address the three blocking findings from the senior engineering review on PR #175:

- P1 (guideline §6.6): retitle "Upgrade existing installs deliberately (L3 only)"
  to "Upgrade managed generated artifacts deliberately" and rewrite the parenthetical
  around the artifact, not the rung. A plain-L2 tool (discovery-dirs only) still skips
  migration; the L2b variant, which merges a managed AGENTS.md block into a
  human-authored file, needs the format stamp and forward-compat guard even though it
  is not L3. Removes the contradiction with the artifact-driven rule in §6.0.
- P2 (research-agent-skills-standard-paths): reconcile the Cursor claims. Cursor
  natively scans .agents/skills/ and .cursor/skills/ (plus the ~/ variants, recursive,
  with .claude/.codex compatibility loading) per the Cursor Agent Skills docs (verified
  2026-06-13). Move Cursor from the installer-reach bucket to verified-native in both
  the table and the precision note, and make the verified-native vs. Vercel-installer-
  reach distinction explicit.
- P3 (plan spec): flip Status from "Draft (not yet applied)" to "Implemented"; the
  Open Questions and Phase 1 were already marked resolved in the prior commit. Record
  the review's dogfooding roadmap as tracked Phase 2 follow-ons.

https://claude.ai/code/session_01MvbozVS4yEWoR4pkJCMSkJ

jlevy commented Jun 13, 2026

Owner Author

Thanks for the detailed review. Addressed the three blocking findings in 1208ee5 (on top of the gap-1 revision in 02d4a8e):

P1 — §6.6 "L3 only" upgrade scope. Retitled "Upgrade existing installs deliberately (L3 only)" → "Upgrade managed generated artifacts deliberately", and rewrote the parenthetical around the artifact, not the rung: a plain-L2 tool (discovery-dirs only, full overwrite) still skips migration, but the L2b variant — which merges a managed AGENTS.md block into a human-authored file — needs the format stamp and forward-compat guard even though it isn't L3. No longer contradicts the artifact-driven rule in §6.0.

P2 — Cursor native-scan contradiction. Verified against the current Cursor Agent Skills docs: native recursive scan of .agents/skills/ and .cursor/skills/ (+ the ~/ variants), with compatibility loading from .claude/skills/ and .codex/skills/. Moved Cursor from the installer-reach bucket to verified-native in both the path table and the precision note, and made the verified-vendor-native vs. Vercel-installer-reach distinction explicit (Vercel's supported-agent table is installer reach, not proof a vendor's own loader reads the path).

P3 — spec reads pre-implementation. Flipped Status from "Draft (not yet applied)" → "Implemented … this spec is a record, not an open proposal." (The Open Questions and Phase 1 checklist were already marked resolved in 02d4a8e.)

Non-blocking items, considered:

  • Holistic recommendation ladder — the guideline already frames plugin marketplaces as a packaging/publishing channel rather than the baseline (§1, the §6.6 Codex note, and the "Anthropic's own marketplaces are the exception" note at the end). Happy to fold your sharper 8-point ladder into one explicit summary block as a follow-up if you'd like it stated in a single place.
  • Dogfooding roadmap (7 pts) — tracked as Phase 2 follow-ons in the spec: explicit tbd distribution contract, single-source canonical skill + drift test, artifact-driven format migration in code/tests, the surface= generator migration, and surface/rung output in setup/doctor. Kept out of this pure-docs PR per your merge stance.

One heads-up: since your review (made at 2c54b53), the gap-1 shape changed per owner direction to a single named "L2b" variant within L2 — no L2a/L2b ladder renumber; qmd is the plain-L2 reference, pprose the L2b one. Your holistic rec lists L2a/L2b as distinct rungs, so if you'd prefer the full split instead, say the word and I'll switch it back (it's a small change).


Generated by Claude Code

@jlevy jlevy merged commit b156bfa into main Jun 13, 2026
6 checks passed
@jlevy jlevy deleted the claude/hopeful-hopper-thbntw branch June 13, 2026 19:51
Development

Successfully merging this pull request may close these issues.

cli-agent-skill-patterns: five small gaps surfaced by an L2-ish CLI (pprose)
