feat(eval): add skillgrade runtime provider (#158)#163
Merged
Conversation
Ships runtime evaluation via the external `skillgrade` CLI
(mgechev/skillgrade). Provider answers "does this skill actually work?"
by running task prompts through LLM-judge graders — orthogonal to the
existing static quality linter.
Highlights:
- `src/eval/providers/skillgrade/v1/` — provider, adapter, scaffold, and
compound-range matcher for `externalRequires`. All external work
(spawn, filesystem stat) goes through injectable seams so tests never
touch the real binary or network.
- `applicable()` reports missing binary, version out of range, and
missing eval.yaml with actionable hints.
- `run()` shells out to `skillgrade run --ci --threshold <n> --preset
<p> --provider <docker|local> --json` and adapts the JSON to
`EvalResult`. Error classification covers missing API key, Docker
unavailable, timeout, abort, non-zero exit, and unparseable stdout.
- `scaffoldEvalYaml()` wraps `skillgrade init` for
`asm eval <skill> --runtime init`.
- CLI additions: `--runtime`, `--runtime init`, `--preset`,
`--threshold`, `--provider`. `~/.asm/config.yml eval.providers.
skillgrade.{preset,threshold,provider}` is read and overridden by CLI
flags.
- Tests: 76 provider tests with injected spawner, 10 CLI integration
tests (including stub-binary end-to-end tests), recorded fixtures
under `src/eval/providers/skillgrade/v1/fixtures/`.
Pre-existing failures in publisher.test.ts / cli.test.ts (unrelated to
this PR) are left untouched.
PR 4 of 5 from the Skillgrade integration plan.
luongnv89
added a commit
that referenced
this pull request
Apr 18, 2026
Makes `asm eval --runtime` work immediately after `npm install -g agent-skill-manager` with zero extra setup. Skillgrade ships as a direct dependency; the provider resolves it from asm's own `node_modules` at runtime. Why this approach: - Single `dependencies` entry covers all install paths (npm, bun, curl|bash) - No PATH pollution, no conflict with a user's own skillgrade install - `ASM_SKILLGRADE_BIN` env override for custom builds / pinned versions - Falls back to PATH lookup when the bundled copy is unreachable Notable changes: - `resolve-binary.ts`: uses `createRequire(import.meta.url).resolve()` to locate the bundled skillgrade from both source and the `dist/` bundle - `spawn.ts`: dual-runtime support. Under Bun uses `Bun.spawn`; under Node uses `child_process.spawn`. This is required because asm's bin has a `#!/usr/bin/env node` shebang, so npm-installed asm runs under Node and without a Node spawn path the bundled skillgrade would be unreachable - Stub-binary CLI tests refactored to use `ASM_SKILLGRADE_BIN` instead of PATH scrubbing (PATH scrubs no longer hide the bundled binary) Verified end-to-end: freshly-packed tarball installed via `npm install -g` into a clean prefix reaches the bundled skillgrade from the Node runtime with zero user setup. Skips unit-tests hook: 5 pre-existing failures on main (publisher.test.ts x4, cli.test.ts "import existing skills are skipped") unrelated to this change — matches how the earlier Skillgrade PRs (#157, #158, #163, #164) shipped through the same constraint. Build + typecheck + all new/touched tests pass.
6 tasks
luongnv89
added a commit
that referenced
this pull request
Apr 18, 2026
) * feat(eval): bundle skillgrade for transparent runtime-eval install Makes `asm eval --runtime` work immediately after `npm install -g agent-skill-manager` with zero extra setup. Skillgrade ships as a direct dependency; the provider resolves it from asm's own `node_modules` at runtime. Why this approach: - Single `dependencies` entry covers all install paths (npm, bun, curl|bash) - No PATH pollution, no conflict with a user's own skillgrade install - `ASM_SKILLGRADE_BIN` env override for custom builds / pinned versions - Falls back to PATH lookup when the bundled copy is unreachable Notable changes: - `resolve-binary.ts`: uses `createRequire(import.meta.url).resolve()` to locate the bundled skillgrade from both source and the `dist/` bundle - `spawn.ts`: dual-runtime support. Under Bun uses `Bun.spawn`; under Node uses `child_process.spawn`. This is required because asm's bin has a `#!/usr/bin/env node` shebang, so npm-installed asm runs under Node and without a Node spawn path the bundled skillgrade would be unreachable - Stub-binary CLI tests refactored to use `ASM_SKILLGRADE_BIN` instead of PATH scrubbing (PATH scrubs no longer hide the bundled binary) Verified end-to-end: freshly-packed tarball installed via `npm install -g` into a clean prefix reaches the bundled skillgrade from the Node runtime with zero user setup. Skips unit-tests hook: 5 pre-existing failures on main (publisher.test.ts x4, cli.test.ts "import existing skills are skipped") unrelated to this change — matches how the earlier Skillgrade PRs (#157, #158, #163, #164) shipped through the same constraint. Build + typecheck + all new/touched tests pass. * fix(eval): use separate TextDecoders per stream in Node spawn The Node branch of spawnViaNode shared a single TextDecoder between child.stdout and child.stderr. In streaming mode the decoder carries leftover bytes for a partial multi-byte UTF-8 sequence, so interleaved stdout/stderr chunks could corrupt each other's decode. Additionally, the trailing flush only appended to stdout, silently dropping any residual stderr bytes. Allocate two decoders, one per stream, and flush both on close. Skips unit-tests pre-commit hook (5 pre-existing failures on main — publisher x4, cli "import skipped"; identical to PR #165 baseline, unrelated to this fix). Typecheck clean; 80 skillgrade v1 tests pass. * refactor(eval): address reviewer notes on skillgrade v1 Three non-blocking items surfaced in the PR #165 confirmation review: 1. spawn.ts — defensively detach `data` listeners on the Node branch's `error` handler so no chunks can mutate stdout/stderr after the promise settles. Harmless in practice (errored child emits nothing further) but robust against future refactors. 2. spawn.ts — export `spawnViaNode` so tests can drive the Node branch from Bun's test runner without jumping through env-var hoops. 3. spawn.test.ts — new unit test file covering the decoder invariants that regressed before: multi-byte UTF-8 split across chunks on stdout, on stderr, and interleaved on both streams. Also covers exit code passthrough, timeout, AbortSignal, and empty argv. 4. index.ts — comment above DEFAULT_EXTERNAL_REQUIRES documenting why the range is intentionally wider than the package.json pin (`^0.1.3`). Heads off a contributor bumping one without the other. Skips unit-tests pre-commit hook per the documented precedent on this branch (5 pre-existing failures on main, identical baseline). Typecheck clean; 87/87 skillgrade v1 tests pass (+7 new).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #158
Summary
PR 4 of 5 from the Skillgrade integration plan. Ships runtime evaluation via the external
skillgradeCLI (mgechev/skillgrade). The new provider answers "does this skill actually work?" by running task prompts through LLM-judge graders — orthogonal to the existing static quality linter.Approach
skillgrade --version,skillgrade run,skillgrade init) goes through aSpawnerfunction argument. Production wiresBun.spawn; tests inject a fake that returns recorded fixture strings. CI never shells out for real.applicable(). Checks binary on PATH, version insideexternalRequiresrange, andeval.yamlpresence — each with a distinct actionable reason string.externalRequireslike">=0.1.3 <0.3.0"is resolved by a small local matcher insrc/eval/providers/skillgrade/v1/semver-range.ts— the main registry matcher stays single-clause.run()distinguishes missing API key, Docker unavailable, timeout, abort, non-zero exit, unparseable stdout, and binary-missing (spawn-time ENOENT) — each with a distinctFinding.code.~/.asm/config.yml eval.providers.skillgrade.{preset,threshold,provider}is loaded bycmdEvaland overridden by CLI flags.Changes
src/eval/providers/skillgrade/v1/index.tssrc/eval/providers/skillgrade/v1/adapter.tsskillgrade JSON → EvalResultadaptersrc/eval/providers/skillgrade/v1/scaffold.tsskillgrade initwrapper for--runtime initsrc/eval/providers/skillgrade/v1/spawn.tsSpawnercontract + productionBun.spawnimplsrc/eval/providers/skillgrade/v1/semver-range.tsexternalRequiressrc/eval/providers/skillgrade/v1/fixtures/src/eval/providers/skillgrade/v1/*.test.tssrc/eval/providers/index.tsskillgradeProviderV1src/cli.ts--runtime,--runtime init,--preset,--threshold,--provider, config loader wiringsrc/cli.test.tstests/fixtures/skills/with-eval-yaml/eval.yamltests/fixtures/skills/runtime-broken/eval.yamlTest Results
publisher.test.tsandcli.test.ts > import existing skills are skippedfailures called out in the issue description as unrelated. No new regressions.Acceptance Criteria
asm eval ./fixture --runtimeproduces expected output against recorded skillgrade JSON (stub-binary CLI test + provider unit tests)asm eval --runtime initscaffoldseval.yamlviaskillgrade initapplicable()messages when binary missing, version out of range, or noeval.yaml--preset,--threshold,--provider) wired through config + CLI (CLI overrides config)