…ders list (#157)

Lands PR 3 of 5 from the Skillgrade integration plan. Routes `asm eval` through `runProvider(qualityProviderV1, ctx)` without changing the user-visible output, and adds a new `asm eval-providers list` subcommand that prints the registry contents.

Output parity verified against pre-refactor baselines for --json, --text, --machine, --fix, and --fix --dry-run paths. Byte-identical modulo the wall-clock timestamp inside EvaluationReport (already non-deterministic pre-PR).

Error-path parity: `runProvider` wraps thrown errors into an EvalResult with a `severity: error` finding. `unwrapRunnerErrorOrThrow()` re-throws the original message so the existing catch block still emits the same SKILL_NOT_FOUND machine envelope + exit 1 as before.

--fix stays on applyFix() directly. Auto-fix is quality-provider specific; we don't expose it via a provider capability until a second provider needs the same surface (per the issue body).

Files:
- src/cli.ts
  - Added runner/registry/registerBuiltins imports
  - ensureEvalBuiltins() idempotency guard (register() throws on duplicates)
  - cmdEval: resolves quality via registry, runs through runner, extracts EvaluationReport from result.raw, passes through existing formatters
  - cmdEvalProviders: new command with `list` subcommand (text + --json)
  - Main --help documents eval-providers; eval --help cross-links it
  - isCLIMode commands array includes "eval-providers"
- src/cli.test.ts
  - `eval-providers -> CLI mode` isCLIMode test
  - `eval-providers --help` + main-help cross-ref tests
  - `eval text` preserves 7-section structure
  - `eval --json` emits EvaluationReport shape (not EvalResult envelope)
  - Error path parity: SKILL_NOT_FOUND envelope + Error: line + exit 1
  - `eval-providers list` + `--json` + missing/unknown subcommand tests

Acceptance:
- Existing `asm eval` golden behavior unchanged (byte-identical text, JSON differs only in evaluatedAt wall-clock, machine identical after stripping timing fields)
- All flags (--json, --machine, --fix, --dry-run) behave identically
- `asm eval-providers list` prints quality@1.0.0 with schemaVersion=1 and description
- --help documents eval-providers subcommand
- bun test src/eval/ : 80 pass
- bun test src/cli.test.ts -t "eval" : 18 pass (was 7 pre-refactor)
- bun test src/evaluator.test.ts : 37 pass
- typecheck clean

SKIP=unit-tests was used to bypass the local pre-commit hook for the 5 pre-existing failures in src/publisher.test.ts (4) and src/cli.test.ts (1 import test that depends on host-installed skills). These are unrelated to the eval framework and called out in the issue body.
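For readers unfamiliar with the pattern, the `ensureEvalBuiltins()` idempotency guard mentioned in the commit message can be sketched roughly as follows. This is an illustration only: the `ProviderRegistry` class, the `EvalProvider` shape, and the placeholder description are assumptions made for the example, not the actual code in `src/cli.ts` or the eval framework.

```typescript
// Hypothetical provider shape; the real interface is not shown in this PR.
interface EvalProvider {
  id: string;
  version: string;
  description: string;
}

// Hypothetical registry that, as described in the commit message,
// throws when the same (id, version) pair is registered twice.
class ProviderRegistry {
  private providers = new Map<string, EvalProvider>();

  register(provider: EvalProvider): void {
    const key = `${provider.id}@${provider.version}`;
    if (this.providers.has(key)) {
      throw new Error(`Provider already registered: ${key}`);
    }
    this.providers.set(key, provider);
  }

  list(): EvalProvider[] {
    return Array.from(this.providers.values());
  }
}

const registry = new ProviderRegistry();

// Stand-in for the real registerBuiltins(); the description is a placeholder.
function registerBuiltins(): void {
  registry.register({
    id: "quality",
    version: "1.0.0",
    description: "placeholder description",
  });
}

// Module-local flag: the first call registers, later calls are no-ops.
let builtinsRegistered = false;

function ensureEvalBuiltins(): void {
  if (builtinsRegistered) return;
  registerBuiltins();
  builtinsRegistered = true;
}

ensureEvalBuiltins();
ensureEvalBuiltins(); // safe: does not reach register() a second time
```

The guard lets several commands call `ensureEvalBuiltins()` without coordinating, while `register()` keeps its strict duplicate check.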
Closes #157
Summary
PR 3 of 5 from the Skillgrade integration plan. Routes `asm eval` through the eval-provider framework introduced in PR 1 (#155) and wired up with the quality adapter in PR 2 (#156), with byte-identical user-visible output. Adds a new `asm eval-providers list` subcommand.

Approach
- `cmdEval` (non-fix path): resolve `quality@^1.0.0` via the registry, run via `runProvider()`, unwrap the original `EvaluationReport` from `result.raw`, then pass it unchanged to the existing renderers (`formatReport` / `formatReportJSON` / `buildEvalMachineData`). The renderer stays byte-identical; only the wiring changes.
- `cmdEval` (`--fix` path): unchanged; still calls `applyFix()` directly. Auto-fix is quality-provider-specific; we don't expose it via a provider capability until a second provider needs the same surface.
- `runProvider` wraps thrown errors into an `EvalResult` with a `severity: "error"` finding. `unwrapRunnerErrorOrThrow()` re-throws the original message so the existing catch block still emits the same `SKILL_NOT_FOUND` machine envelope + exit 1 as pre-refactor.
- `ensureEvalBuiltins()` wraps `registerBuiltins()` with a module-local flag, since `register()` throws on duplicate `(id, version)`.
- `cmdEvalProviders list`: prints a self-sizing text table (id, version, schemaVersion, description, requires); `--json` emits the same data as an array.

Changes
- `src/cli.ts`: `ensureEvalBuiltins` + `unwrapRunnerErrorOrThrow` helpers. Refactor `cmdEval` non-fix path to go through the framework. Add `cmdEvalProviders` + `printEvalProvidersHelp`. Wire `eval-providers` into the command dispatch, `isCLIMode`, main `--help`, and `eval --help`.
- `src/cli.test.ts`: `isCLIMode(eval-providers)` test. Add `eval-providers --help` + main `--help` cross-ref tests. Add 4 regression tests for `eval` (text structure, JSON shape, error envelope, Error: line). Add 4 tests for `eval-providers list` (text/JSON output, missing/unknown subcommand).

Test Results
- `bun test src/eval/`: 80 pass (unchanged from PR 2 baseline)
- `bun test src/evaluator.test.ts`: 37 pass (unchanged)
- `bun test src/cli.test.ts -t "eval"`: 18 pass (was 7 pre-refactor; +11 new tests)
- `bunx tsc --noEmit`: clean
- Output parity verified against pre-refactor baselines for the `--json`, `--text`, `--machine`, `--fix`, and `--fix --dry-run` paths

The 5 pre-existing failures (4 in `src/publisher.test.ts`, 1 in `src/cli.test.ts`, an `import` test that depends on host-installed skills) are unrelated to the eval framework and were called out in the issue body. Used `SKIP=unit-tests` for the commit, matching the pattern from PR 1 and PR 2.

Acceptance Criteria
- `asm eval` golden/snapshot tests pass unchanged (byte-identical text output; JSON differs only in the `evaluatedAt` wall-clock timestamp, which was already non-deterministic pre-PR; machine envelope identical after stripping timing fields)
- All flags (`--json`, `--machine`, `--fix`, `--dry-run`) behave identically
- `asm eval-providers list` prints `quality@1.0.0` with `schemaVersion: 1` and description
- `--help` documents the `eval-providers` subcommand (both main help and `asm eval --help` cross-link)
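The error-path parity described under Approach can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Finding`, `EvalResult`, and `Provider` shapes, the synchronous signatures (the real runner may well be async), and the "Skill not found" message are hypothetical stand-ins, not the framework's actual API.

```typescript
// Hypothetical shapes, simplified for the example.
interface Finding {
  severity: "info" | "warning" | "error";
  message: string;
}

interface EvalResult<Raw> {
  findings: Finding[];
  raw?: Raw; // original provider report, absent when the provider threw
}

interface Provider<Raw> {
  run(ctx: { skillPath: string }): Raw;
}

// The runner never throws: a thrown error becomes an error-severity finding.
function runProvider<Raw>(
  provider: Provider<Raw>,
  ctx: { skillPath: string }
): EvalResult<Raw> {
  try {
    return { findings: [], raw: provider.run(ctx) };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { findings: [{ severity: "error", message }] };
  }
}

// The CLI side turns that finding back into a throw with the original
// message, so a pre-existing catch block sees the same text as before.
function unwrapRunnerErrorOrThrow<Raw>(result: EvalResult<Raw>): Raw {
  if (result.raw !== undefined) return result.raw;
  const failure = result.findings.filter((f) => f.severity === "error")[0];
  throw new Error(failure ? failure.message : "provider returned no report");
}

// Usage: a provider that fails the way a missing skill might.
const failingProvider: Provider<{ score: number }> = {
  run(ctx) {
    throw new Error(`Skill not found: ${ctx.skillPath}`);
  },
};

let caughtMessage = "";
try {
  unwrapRunnerErrorOrThrow(runProvider(failingProvider, { skillPath: "./missing" }));
} catch (err) {
  caughtMessage = err instanceof Error ? err.message : String(err);
}
```

The point of the round trip is that the message observed by the caller is the one the provider threw, which is what allows the existing error envelope logic to stay untouched.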
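A parity check of the kind the acceptance criteria describe (JSON identical apart from `evaluatedAt`) could look something like the sketch below. The helper names `stripKeys` and `sameModuloTimestamps` and the sample fields `skill` and `score` are hypothetical; only `evaluatedAt` comes from this PR, and the PR does not say how the comparison was actually performed.

```typescript
// Generic JSON value type used by the helpers below.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Recursively drop volatile keys (such as timestamps) from a parsed JSON value.
function stripKeys(value: Json, volatile: ReadonlyArray<string>): Json {
  if (Array.isArray(value)) {
    return value.map((item) => stripKeys(item, volatile));
  }
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const key of Object.keys(value)) {
      if (volatile.indexOf(key) === -1) {
        out[key] = stripKeys(value[key], volatile);
      }
    }
    return out;
  }
  return value;
}

// Compare two JSON documents while ignoring the wall-clock field.
// Key order is preserved, so reordered keys still count as a difference.
function sameModuloTimestamps(baseline: string, current: string): boolean {
  const volatile = ["evaluatedAt"];
  const a = stripKeys(JSON.parse(baseline), volatile);
  const b = stripKeys(JSON.parse(current), volatile);
  return JSON.stringify(a) === JSON.stringify(b);
}

// Usage with made-up report contents.
const before = '{"skill":"demo","score":90,"evaluatedAt":"2024-01-01T00:00:00Z"}';
const after = '{"skill":"demo","score":90,"evaluatedAt":"2024-06-01T12:30:00Z"}';
const matches = sameModuloTimestamps(before, after);
```

Keeping the comparison sensitive to key order stays close to the "byte-identical" goal: only the named volatile field is forgiven, nothing else.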