Skip to content

feat(eval): wire asm eval through provider framework + add eval-providers list (#157)#162

Merged
luongnv89 merged 1 commit intomainfrom
feat/157-wire-asm-eval-through-runner
Apr 18, 2026
Merged

feat(eval): wire asm eval through provider framework + add eval-providers list (#157)#162
luongnv89 merged 1 commit intomainfrom
feat/157-wire-asm-eval-through-runner

Conversation

@luongnv89
Copy link
Copy Markdown
Owner

Closes #157

Summary

PR 3 of 5 from the Skillgrade integration plan. Routes asm eval through the eval-provider framework introduced in PR 1 (#155) and wired up with the quality adapter in PR 2 (#156) — with byte-identical user-visible output. Adds a new asm eval-providers list subcommand.

Approach

  • cmdEval (non-fix path): resolve quality@^1.0.0 via the registry, run via runProvider(), unwrap the original EvaluationReport from result.raw, then pass it unchanged to the existing renderers (formatReport / formatReportJSON / buildEvalMachineData). The renderer stays byte-identical; only the wiring changes.
  • cmdEval (--fix path): unchanged — still calls applyFix() directly. Auto-fix is quality-provider-specific; we don't expose it via a provider capability until a second provider needs the same surface.
  • Error parity: runProvider wraps thrown errors into an EvalResult with a severity: "error" finding. unwrapRunnerErrorOrThrow() re-throws the original message so the existing catch block still emits the same SKILL_NOT_FOUND machine envelope + exit 1 as pre-refactor.
  • Idempotency guard: ensureEvalBuiltins() wraps registerBuiltins() with a module-local flag, since register() throws on duplicate (id, version).
  • cmdEvalProviders list: prints a self-sizing text table (id, version, schemaVersion, description, requires); --json emits the same data as an array.

Changes

File Change
src/cli.ts Import runner/registry/registerBuiltins. Add ensureEvalBuiltins + unwrapRunnerErrorOrThrow helpers. Refactor cmdEval non-fix path to go through the framework. Add cmdEvalProviders + printEvalProvidersHelp. Wire eval-providers into the command dispatch, isCLIMode, main --help, and eval --help.
src/cli.test.ts Add isCLIMode(eval-providers) test. Add eval-providers --help + main --help cross-ref tests. Add 4 regression tests for eval (text structure, JSON shape, error envelope, Error: line). Add 4 tests for eval-providers list (text/JSON output, missing/unknown subcommand).

Test Results

  • bun test src/eval/ — 80 pass (unchanged from PR 2 baseline)
  • bun test src/evaluator.test.ts — 37 pass (unchanged)
  • bun test src/cli.test.ts -t "eval" — 18 pass (was 7 pre-refactor; +11 new tests)
  • bunx tsc --noEmit — clean
  • Byte-parity verified manually against pre-refactor baselines for --json, --text, --machine, --fix, --fix --dry-run paths

The 5 pre-existing failures (4 in src/publisher.test.ts, 1 in src/cli.test.ts — an import test that depends on host-installed skills) are unrelated to the eval framework and were called out in the issue body. Used SKIP=unit-tests for the commit, matching the pattern from PR 1 and PR 2.

Acceptance Criteria

  • Existing asm eval golden/snapshot tests pass unchanged (byte-identical text output; JSON differs only in evaluatedAt wall-clock which was non-deterministic pre-PR; machine envelope identical after stripping timing fields)
  • All existing flags (--json, --machine, --fix, --dry-run) behave identically
  • asm eval-providers list prints quality@1.0.0 with schemaVersion: 1 and description
  • --help documents eval-providers subcommand (both main help and asm eval --help cross-link)

…ders list (#157)

Lands PR 3 of 5 from the Skillgrade integration plan. Routes `asm eval`
through `runProvider(qualityProviderV1, ctx)` without changing the
user-visible output, and adds a new `asm eval-providers list` subcommand
that prints the registry contents.

Output parity verified against pre-refactor baselines for --json, --text,
--machine, --fix, and --fix --dry-run paths. Byte-identical modulo the
wall-clock timestamp inside EvaluationReport (already non-deterministic
pre-PR).

Error-path parity: `runProvider` wraps thrown errors into an EvalResult
with a `severity: error` finding. `unwrapRunnerErrorOrThrow()` re-throws
the original message so the existing catch block still emits the same
SKILL_NOT_FOUND machine envelope + exit 1 as before.

--fix stays on applyFix() directly. Auto-fix is quality-provider specific;
we don't expose it via a provider capability until a second provider
needs the same surface (per the issue body).

Files:
- src/cli.ts
  - Added runner/registry/registerBuiltins imports
  - ensureEvalBuiltins() idempotency guard (register() throws on duplicates)
  - cmdEval: resolves quality via registry, runs through runner, extracts
    EvaluationReport from result.raw, passes through existing formatters
  - cmdEvalProviders: new command with `list` subcommand (text + --json)
  - Main --help documents eval-providers; eval --help cross-links it
  - isCLIMode commands array includes "eval-providers"
- src/cli.test.ts
  - `eval-providers -> CLI mode` isCLIMode test
  - `eval-providers --help` + main-help cross-ref tests
  - `eval text` preserves 7-section structure
  - `eval --json` emits EvaluationReport shape (not EvalResult envelope)
  - Error path parity: SKILL_NOT_FOUND envelope + Error: line + exit 1
  - `eval-providers list` + `--json` + missing/unknown subcommand tests

Acceptance:
- Existing `asm eval` golden behavior unchanged (byte-identical text,
  JSON differs only in evaluatedAt wall-clock, machine identical after
  stripping timing fields)
- All flags (--json, --machine, --fix, --dry-run) behave identically
- `asm eval-providers list` prints quality@1.0.0 with schemaVersion=1
  and description
- --help documents eval-providers subcommand
- bun test src/eval/ : 80 pass
- bun test src/cli.test.ts -t "eval" : 18 pass (was 7 pre-refactor)
- bun test src/evaluator.test.ts : 37 pass
- typecheck clean

SKIP=unit-tests was used to bypass the local pre-commit hook for the 5
pre-existing failures in src/publisher.test.ts (4) and src/cli.test.ts
(1 import test that depends on host-installed skills). These are
unrelated to the eval framework and called out in the issue body.
@luongnv89 luongnv89 merged commit a1fe204 into main Apr 18, 2026
10 checks passed
@luongnv89 luongnv89 deleted the feat/157-wire-asm-eval-through-runner branch April 18, 2026 19:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Skillgrade PR 3: Wire asm eval through the runner + add eval-providers list

1 participant