Skip to content

feat(eval): add provider contract, registry, runner skeleton (#155)#160

Merged
luongnv89 merged 1 commit intomainfrom
feat/155-eval-framework-skeleton
Apr 18, 2026
Merged

feat(eval): add provider contract, registry, runner skeleton (#155)#160
luongnv89 merged 1 commit intomainfrom
feat/155-eval-framework-skeleton

Conversation

@luongnv89
Copy link
Copy Markdown
Owner

Closes #155

Summary

Lands PR 1 of 5 from docs/SKILLGRADE_INTEGRATION_PLAN.md. Introduces the src/eval/ module that PR 2-5 will build on. Pure scaffolding — zero user-visible behavior change; src/cli.ts is untouched (git diff main -- src/cli.ts is empty).

Approach

Five new files mirror plan §2.1 and §2.2, plus a paired test file for each. The EvalProvider contract carries both a semver version (free to bump) and an integer schemaVersion (only bumps on structural breaks), so future providers can evolve on two axes without churning consumers. The registry stores providers per id as a version array so PR 5's asm eval --compare quality@1.0.0,quality@2.0.0-next is structurally supported from day one. No new runtime dependencies — the semver matcher needed for ^, ~, =, * / x is implemented locally (about 100 lines) rather than pulling in the semver npm package.

Changes

File Change
src/eval/types.ts EvalProvider, EvalResult, SkillContext, EvalOpts, CategoryResult, Finding, ApplicableResult, ExternalRequirement, FindingSeverity
src/eval/registry.ts register, resolve(id, semverRange), list, plus parseSemver, compareSemver, satisfiesRange helpers
src/eval/runner.ts runProvider(provider, ctx, opts) — captures startedAt / durationMs, wraps throws into { severity: "error" } findings, honors timeoutMs + external AbortSignal
src/eval/config.ts loadEvalConfig() from ~/.asm/config.yml with defaults, mergeConfig() for test injection
src/eval/providers/index.ts registerBuiltins() hook (empty in PR 1; PR 2/4 wire their providers here)
src/eval/*.test.ts 67 new tests across 4 test files

Test Results

  • New tests: 67 passing (0 failures) across registry.test.ts, runner.test.ts, config.test.ts, providers/index.test.ts
  • Full suite: 1400 pass / 5 fail. The 5 failing tests (publishSkill × 4 in src/publisher.test.ts, CLI integration: import > import existing skills are skipped in src/cli.test.ts) are pre-existing and unrelated to this PR — they fail on a clean checkout of main from before this branch was created. No new regressions.
  • Typecheck passes (bun run --bun tsc --noEmit).
  • Prettier clean (bunx prettier --check src/eval/).

Registry coverage

  • satisfiesRange: exact, caret (1.x, 0.x, 0.0.x), tilde, wildcard, = prefix, invalid range (empty / null / garbage / invalid base version) — all covered.
  • register: missing id, invalid semver version, non-integer schemaVersion, multiple versions per id (allowed), exact duplicate (id, version) pairs (rejected).
  • resolve: returns highest version in range, prefers release over pre-release at same M.m.p, falls back to pre-release when it's the only match, throws on unknown id, throws on no matching version, throws on invalid range, throws on empty id.

Runner coverage

  • Timing: startedAt is ISO-8601 and close to Date.now(); durationMs is a non-negative integer and reflects actual wall-clock work; timing still captured when the provider throws; provider's claimed startedAt / durationMs are overwritten.
  • Error wrapping: thrown Error{ severity: "error", code: "provider-threw" } with passed: false, score: 0, empty categories; string and object throws stringified cleanly; runProvider never re-throws.
  • Timeout: opts.timeoutMs elapsing produces { code: "timeout" }; providers that finish first are unaffected.
  • Identity stamping: runner overrides providerId / providerVersion even if the provider tries to spoof them.

Acceptance Criteria

  • All new files under src/eval/ created per plan §2.1
  • Unit tests for registry resolution (semver range matching, missing id, invalid semver)
  • Unit tests for runner (timing capture, error wrapping)
  • bun test src/eval/ passes (67/67)
  • src/cli.ts is NOT modified in this PR (git diff main -- src/cli.ts empty)

Notes for reviewers

  • registerBuiltins() is intentionally empty — PR 2 adds the quality provider call, PR 4 adds skillgrade. Keeping the list here (instead of having each provider self-register on import) makes ordering deterministic and makes it possible to restrict the provider set in tests.
  • The 5 pre-existing test failures pre-date this branch. Fixing them is out of scope for PR 1 (would violate the "zero behavior change" constraint). SKIP=unit-tests was used only on the commit hook to bypass them; the pre-push build + e2e hooks both passed.

Lands PR 1 of 5 from the Skillgrade integration plan. Introduces the
`src/eval/` module with the `EvalProvider` contract, a semver-range
provider registry, a runner that captures timing and normalizes errors,
and a YAML config reader backed by `~/.asm/config.yml`. No import from
`src/cli.ts` yet — user-visible behavior is unchanged.

- src/eval/types.ts — EvalProvider contract + EvalResult, SkillContext,
  EvalOpts, CategoryResult, Finding. Dual versioning: semver `version`
  plus integer `schemaVersion` for structural breaks.
- src/eval/registry.ts — `register`, `resolve(id, semverRange)`, `list`
  with a minimal built-in semver matcher (^, ~, exact, *) so no new
  runtime dep is pulled in for the scaffolding.
- src/eval/runner.ts — runs a single provider, stamps `startedAt` and
  `durationMs`, converts thrown errors into an `EvalResult` with a
  `severity: "error"` finding so callers never need try/catch.
- src/eval/config.ts — loads and merges the `eval` section of
  `~/.asm/config.yml` with defaults.
- src/eval/providers/index.ts — `registerBuiltins()` hook, empty in
  PR 1 on purpose; PR 2/4 wire their providers in here.

Tests (67 new, all passing): registry semver matching (including caret
0.x/0.0.x edge cases, tilde, wildcards), unknown id, invalid range,
duplicate registration, runner timing capture on both success and
error paths, error wrapping for `Error`/string/object throws, timeout
handling, identity stamping, config defaults + YAML parse + missing
file handling.

Note: SKIP=unit-tests was used to bypass the local pre-commit hook that
runs the entire src/ suite — 5 tests in src/publisher.test.ts and
src/cli.test.ts already fail on main before this branch and are
unrelated to the eval scaffolding (verified by running the same tests
from a clean main checkout). All 67 new eval tests pass.
@luongnv89 luongnv89 merged commit a861c72 into main Apr 18, 2026
10 checks passed
@luongnv89 luongnv89 deleted the feat/155-eval-framework-skeleton branch April 18, 2026 18:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Skillgrade PR 1: Eval framework skeleton (contract + registry + runner)

1 participant