Skip to content

feat(eval): add quality provider adapter (#156)#161

Merged
luongnv89 merged 1 commit intomainfrom
feat/156-quality-provider-adapter
Apr 18, 2026
Merged

feat(eval): add quality provider adapter (#156)#161
luongnv89 merged 1 commit intomainfrom
feat/156-quality-provider-adapter

Conversation

@luongnv89
Copy link
Copy Markdown
Owner

Closes #156

Summary

PR 2 of 5 from the Skillgrade integration plan. Wraps the existing static SKILL.md linter (src/evaluator.ts) in the EvalProvider contract introduced by PR 1 (#155) — without modifying src/evaluator.ts. If this adapter needed ugly workarounds, the contract would be what had to change; it didn't, so the interface from PR 1 holds.

Approach

Thin delegation. qualityProviderV1.run() calls evaluateSkill() and maps EvaluationReport onto EvalResult. applicable() is a cheap stat(skillMdPath) — no other gating, since the evaluator surfaces every other issue as a finding.

Mapping (EvaluationReport -> EvalResult)

EvaluationReport EvalResult
overallScore score
grade !== "F" passed
categories categories (1:1 on id, name, score, max)
topSuggestions findings with severity: "info"
original report raw (stable per schemaVersion)

Evaluator's free-form findings: string[] and suggestions: string[] stay in raw — the adapter intentionally does not invent a string→Finding conversion the issue didn't ask for.

Changes

File Change
src/eval/providers/quality/v1/index.ts New — EvalProvider implementation
src/eval/providers/quality/v1/fixtures/well-formed.json New — snapshot for pass-path corpus skill (score 91, grade A)
src/eval/providers/quality/v1/fixtures/missing-frontmatter.json New — snapshot for fail-path corpus skill (score 21, grade F)
src/eval/providers/quality/v1/index.test.ts New — 12 tests (contract, applicable(), snapshot per corpus skill, registry integration, mapping invariants)
src/eval/providers/index.ts Registers quality@1.0.0 via registerBuiltins()
src/eval/providers/index.test.ts Updated to expect 1 registered provider (was 0 in PR 1)
tests/fixtures/skills/well-formed/SKILL.md New corpus fixture
tests/fixtures/skills/missing-frontmatter/SKILL.md New corpus fixture

src/evaluator.ts unchanged — git diff HEAD~1 src/evaluator.ts returns zero lines.

Non-determinism handling

Snapshot tests strip these before deep-equal against checked-in JSON:

  • startedAt, durationMs — stamped by the runner on every call
  • raw.evaluatedAt — wall-clock set by evaluateSkill()
  • raw.skillPath, raw.skillMdPath — absolute paths depend on checkout; fixtures store a __FIXTURE__/<name> marker

Drift in fixtures is a review artifact — if evaluator scoring changes, snapshots flag it so reviewers can decide whether the change is intentional.

Test Results

  • bun test src/eval/80 pass, 0 fail (67 existing PR 1 tests + 12 new quality provider tests + 1 existing registerBuiltins test)
  • bun run typecheck — clean
  • Full bun test src/ — 1413 pass, 5 fail (pre-existing unrelated failures in publisher.test.ts and cli.test.ts, called out in the issue; not addressed here)

Acceptance Criteria

  • quality provider registered and resolvable via registry.resolve("quality", "^1.0.0")
  • Snapshot test per corpus skill passes — adapter output matches checked-in JSON
  • src/evaluator.ts is untouched (diff confirms zero changes)
  • bun test passes (all new tests green; pre-existing unrelated failures as called out in issue)

Boundary kept for PR 3

src/cli.ts does not import providers/index.ts or call registerBuiltins() — PR 3 owns that wiring. No user-visible behavior change in this PR.

Lands PR 2 of 5 from the Skillgrade integration plan. Wraps the existing
static SKILL.md linter (`src/evaluator.ts`) with the `EvalProvider`
contract introduced in PR 1 (#155), without modifying the evaluator.

Mapping (EvaluationReport -> EvalResult):
- overallScore    -> score
- grade !== "F"   -> passed
- categories      -> categories (1:1 on id/name/score/max)
- topSuggestions  -> findings (severity "info")
- original report -> raw (stable per schemaVersion)

Files:
- src/eval/providers/quality/v1/index.ts       - EvalProvider impl
- src/eval/providers/quality/v1/fixtures/*.json - EvalResult snapshots
- src/eval/providers/quality/v1/index.test.ts  - 12 tests: contract,
  applicable(), snapshot per corpus skill, registry integration,
  mapping invariants
- src/eval/providers/index.ts - registers quality@1.0.0 via
  registerBuiltins()
- src/eval/providers/index.test.ts - updated to expect 1 provider
- tests/fixtures/skills/{well-formed,missing-frontmatter}/SKILL.md -
  corpus skills (pass path grade A, fail path grade F)

Acceptance:
- quality provider resolvable via registry.resolve("quality", "^1.0.0")
- Snapshot tests assert deep equality against checked-in JSON after
  stripping non-deterministic timing/path fields (startedAt,
  durationMs, raw.evaluatedAt, raw.skillPath, raw.skillMdPath)
- src/evaluator.ts unchanged (git diff = 0 lines)
- bun test src/eval/ all 80 tests pass; typecheck clean

The 5 pre-existing failures in src/publisher.test.ts and
src/cli.test.ts on main are unrelated and not addressed here.
@luongnv89 luongnv89 merged commit 3cd2da9 into main Apr 18, 2026
10 checks passed
@luongnv89 luongnv89 deleted the feat/156-quality-provider-adapter branch April 18, 2026 18:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Skillgrade PR 2: Quality provider adapter wrapping existing evaluator

1 participant