feat(cli): add skill quality evaluator (#119) #154
Merged
Adds `asm eval <skill-path>` which scores a skill's SKILL.md against seven best-practice categories (structure, description, prompt engineering, context efficiency, safety, testability, naming) and emits a structured report with an overall 0-100 score plus the top three actionable suggestions.

`--fix` applies deterministic frontmatter fixes (missing version, inferred effort, canonical key ordering, CRLF normalisation, trailing-whitespace stripping, creator from git). `--fix --dry-run` previews a unified diff without writing, and `--fix` on a real run creates a `SKILL.md.bak` before modifying. Supports `--json` and `--machine` for programmatic consumers.

Scope choices documented inline in `src/evaluator.ts`: the issue uses `author`/`type`/`XS/S/M/L/XL` terminology which does not match the existing SKILL.md schema described in the README and `utils/frontmatter.ts` (`creator`, `metadata.version`, `low/medium/high/max`). The evaluator maps to existing conventions instead of silently introducing a schema change, and defers `type` since no codebase field uses it. The optional "integrate with asm publish" bullet from the issue is deferred for a follow-up; this PR ships eval standalone.

Five pre-existing test failures (4 publishSkill gh-CLI flows, 1 import-integration) are environment-specific on `main` and unrelated to this change; CI on main is green.

Closes #119
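The roll-up described above (per-category scores combined into one overall 0-100 score, plus the top three suggestions) might look roughly like the following. This is a minimal sketch, not the code in `src/evaluator.ts`: the `CategoryResult` shape, the equal-weight average, the "weakest category first" ordering, and the `aggregate` name are all assumptions made for illustration.

```typescript
// Hypothetical shapes; the real types in src/evaluator.ts may differ.
interface CategoryResult {
  name: string;
  score: number; // assumed to be on a 0-100 scale per category
  suggestions: string[];
}

interface Report {
  overall: number;
  topSuggestions: string[];
}

// Assumption: the overall score is the rounded, equal-weight mean of the
// category scores. The PR does not state the actual weighting.
function aggregate(categories: CategoryResult[]): Report {
  if (categories.length === 0) {
    return { overall: 0, topSuggestions: [] };
  }

  let total = 0;
  for (const c of categories) {
    total += c.score;
  }
  const overall = Math.round(total / categories.length);

  // Assumption: suggestions from the weakest categories surface first,
  // and the list is capped at three entries.
  const weakestFirst = [...categories].sort((a, b) => a.score - b.score);
  const topSuggestions: string[] = [];
  for (const c of weakestFirst) {
    for (const s of c.suggestions) {
      if (topSuggestions.length < 3) {
        topSuggestions.push(s);
      }
    }
  }

  return { overall, topSuggestions };
}
```

Keeping the aggregator a pure function of the category results makes it straightforward to unit test each scorer in isolation and the roll-up separately.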
Summary

- `asm eval <skill-path>` which scores a skill's `SKILL.md` against seven best-practice categories (structure, description, prompt engineering, context efficiency, safety, testability, naming & conventions) and emits a structured report with an overall 0–100 score plus the top three actionable suggestions.
- `--fix` applies deterministic frontmatter fixes (add missing version, infer effort from body size, canonical key ordering, CRLF normalisation, trailing-whitespace stripping, creator from `git config user.name`) and creates a `SKILL.md.bak` backup before writing.
- `--fix --dry-run` previews a unified diff without touching disk.
- `--json` and `--machine` (v1 envelope) outputs for programmatic consumers, matching the shape used by `doctor` / `publish`.

Changes
- `src/evaluator.ts`: category scorers, report aggregator, auto-fix planner, unified diff helper, formatters, machine-envelope helper.
- `src/evaluator.test.ts`: 37 unit tests covering every category scorer, each auto-fixable item in isolation, dry-run, backup creation, idempotency, format helpers, and end-to-end `evaluateSkill`.
- `src/cli.ts`: new `--fix` flag in `ParsedArgs`, help text, `cmdEval` dispatcher, `switch` case, and `eval` added to the `commands` array in `isCLIMode`. `doctor` also added to the array (was previously relying on the fallback branch).
- `src/cli.test.ts`: `eval --help`, missing-path error, `--json`, `--machine`, `--fix --dry-run` non-writing behaviour, and `--fix` backup creation. `isCLIMode` tests for `eval` and `doctor`.

Schema alignment (intentional)
The issue uses `author` / `type` / `XS/S/M/L/XL`, which don't match the existing `SKILL.md` schema described in the README and parsed by `src/utils/frontmatter.ts`. Rather than silently introduce a new schema, the evaluator maps issue terminology to existing conventions:

| Issue terminology | Existing convention |
| --- | --- |
| `author` | `creator` (or `metadata.creator`) |
| `version` | `metadata.version` preferred, `version` fallback (via `resolveVersion`) |
| `XS/S/M/L/XL` | `low/medium/high/max` (README table) |
| `type` | Deferred; no codebase field uses it |

The decision is documented in the module docstring at the top of `src/evaluator.ts`.

Scope note
The optional "Can be integrated into `asm publish` as pre-publish quality gate (medium)" item from the issue is deferred. Hooking into the existing publish pipeline would widen the blast radius of this PR (existing publish tests, a behavior change for an already-shipped command). This PR ships `eval` standalone; publish integration is a clean follow-up.

Testing
- `bun test src/evaluator.test.ts`: 37 pass, 0 fail, 70 `expect()` calls.
- `bun test src/cli.test.ts --test-name-pattern "eval"`: 7 new CLI integration tests pass.
- `bun run typecheck`: clean.
- `bunx prettier --check src/evaluator.ts src/evaluator.test.ts src/cli.ts src/cli.test.ts`: clean.
- `bun run build`: succeeds (run by the `pre-push` hook).
- `bun test tests/e2e/bun-e2e.test.ts`: passes (run by the `pre-push` hook).
- Ran `asm eval` against `./skills/hello-world` (scored 40/F) and `./skills/skill-index-updater` (scored higher) to confirm scoring differentiates real skills rather than being degenerate.

Note on pre-existing test failures
Five unit tests fail locally on both `main` and this branch due to local environment state (4 `publishSkill > ...` tests that depend on git / `gh` CLI state, and 1 `CLI integration: import > import existing skills are skipped` test that collides with the user's globally-installed skills). These failures exist on `main` prior to this PR, verified via `git stash` + `bun test`. CI on `main` is green, so the CI sandbox is not affected. This PR does not add or touch any of those tests.

Test plan
- `asm eval ./skills/hello-world` produces a scored report
- `asm eval ./skills/hello-world --json` emits parseable JSON with 7 categories
- `asm eval <tempdir>/skill --fix --dry-run` prints a diff and does not modify `SKILL.md`
- `asm eval <tempdir>/skill --fix` creates `SKILL.md.bak` and rewrites the original
- `asm eval <bogus>` exits with code 1 and a helpful error
- `asm eval` with no path prints the usage error and exits with code 2
- `asm eval --help` prints help
- `asm eval ./skills/hello-world --machine` emits a v1 envelope with `command: "eval"`

Closes #119
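
For reviewers, the `--fix` contract exercised by the test plan (a dry run leaves the file untouched, a real run writes a `.bak` first, and a second run is a no-op) can be sketched as below. This is an illustration under stated assumptions, not the PR's implementation: `normalise`, `applyFix`, and the in-memory `Map` standing in for the filesystem are hypothetical, and only two of the listed fixes (CRLF normalisation and trailing-whitespace stripping) are modelled. Whether the real command skips the backup when nothing changes is also an assumption here.

```typescript
// In-memory stand-in for the filesystem: path -> file contents (assumption).
type Files = Map<string, string>;

// Models two of the deterministic fixes named in the PR:
// CRLF -> LF, then strip trailing spaces and tabs on every line.
function normalise(text: string): string {
  return text
    .replace(/\r\n/g, "\n")
    .split("\n")
    .map((line) => line.replace(/[ \t]+$/, ""))
    .join("\n");
}

interface FixResult {
  changed: boolean; // would the fix alter the file?
  wrote: boolean; // did we actually write anything?
}

// Hypothetical helper: computes the fixed text, and writes only when
// dryRun is false and the text actually changed.
function applyFix(files: Files, path: string, dryRun: boolean): FixResult {
  const original = files.get(path);
  if (original === undefined) {
    throw new Error(`no such file: ${path}`);
  }

  const fixed = normalise(original);
  const changed = fixed !== original;
  if (!changed || dryRun) {
    return { changed, wrote: false };
  }

  files.set(`${path}.bak`, original); // back up before modifying
  files.set(path, fixed);
  return { changed, wrote: true };
}
```

Because `normalise` is idempotent, calling `applyFix` a second time on an already-fixed file reports `changed: false` and writes nothing, which is the property the idempotency tests are described as covering.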