feat(eval): add quality provider adapter (#156) #161
Merged
Lands PR 2 of 5 from the Skillgrade integration plan. Wraps the existing static SKILL.md linter (`src/evaluator.ts`) with the `EvalProvider` contract introduced in PR 1 (#155), without modifying the evaluator.

Mapping (EvaluationReport -> EvalResult):
- overallScore -> score
- grade !== "F" -> passed
- categories -> categories (1:1 on id/name/score/max)
- topSuggestions -> findings (severity "info")
- original report -> raw (stable per schemaVersion)

Files:
- src/eval/providers/quality/v1/index.ts - EvalProvider impl
- src/eval/providers/quality/v1/fixtures/*.json - EvalResult snapshots
- src/eval/providers/quality/v1/index.test.ts - 12 tests: contract, applicable(), snapshot per corpus skill, registry integration, mapping invariants
- src/eval/providers/index.ts - registers quality@1.0.0 via registerBuiltins()
- src/eval/providers/index.test.ts - updated to expect 1 provider
- tests/fixtures/skills/{well-formed,missing-frontmatter}/SKILL.md - corpus skills (pass path grade A, fail path grade F)

Acceptance:
- quality provider resolvable via registry.resolve("quality", "^1.0.0")
- Snapshot tests assert deep equality against checked-in JSON after stripping non-deterministic timing/path fields (startedAt, durationMs, raw.evaluatedAt, raw.skillPath, raw.skillMdPath)
- src/evaluator.ts unchanged (git diff = 0 lines)
- bun test src/eval/: all 80 tests pass; typecheck clean

The 5 pre-existing failures in src/publisher.test.ts and src/cli.test.ts on main are unrelated and not addressed here.
Closes #156
Summary
PR 2 of 5 from the Skillgrade integration plan. Wraps the existing static SKILL.md linter (`src/evaluator.ts`) in the `EvalProvider` contract introduced by PR 1 (#155), without modifying `src/evaluator.ts`. If this adapter had needed ugly workarounds, the contract would have been what had to change; it didn't, so the interface from PR 1 holds.

Approach
Thin delegation. `qualityProviderV1.run()` calls `evaluateSkill()` and maps `EvaluationReport` onto `EvalResult`. `applicable()` is a cheap `stat(skillMdPath)` with no other gating, since the evaluator surfaces every other issue as a finding.

Mapping (EvaluationReport -> EvalResult)
- `overallScore` -> `score`
- `grade !== "F"` -> `passed`
- `categories` -> `categories` (1:1 on `id`, `name`, `score`, `max`)
- `topSuggestions` -> `findings` with `severity: "info"`
- original report -> `raw` (stable per `schemaVersion`)

The evaluator's free-form `findings: string[]` and `suggestions: string[]` stay in `raw`; the adapter intentionally does not invent a string-to-Finding conversion the issue didn't ask for.

Changes
- `src/eval/providers/quality/v1/index.ts`: `EvalProvider` implementation
- `src/eval/providers/quality/v1/fixtures/well-formed.json`: `EvalResult` snapshot
- `src/eval/providers/quality/v1/fixtures/missing-frontmatter.json`: `EvalResult` snapshot
- `src/eval/providers/quality/v1/index.test.ts`: 12 tests (contract, `applicable()`, snapshot per corpus skill, registry integration, mapping invariants)
- `src/eval/providers/index.ts`: registers `quality@1.0.0` via `registerBuiltins()`
- `src/eval/providers/index.test.ts`: updated to expect 1 provider
- `tests/fixtures/skills/well-formed/SKILL.md`: corpus skill, pass path (grade A)
- `tests/fixtures/skills/missing-frontmatter/SKILL.md`: corpus skill, fail path (grade F)

`src/evaluator.ts` is unchanged: `git diff HEAD~1 src/evaluator.ts` returns zero lines.

Non-determinism handling
Snapshot tests strip these before deep-equal against checked-in JSON:
- `startedAt`, `durationMs`: stamped by the runner on every call
- `raw.evaluatedAt`: wall-clock time set by `evaluateSkill()`
- `raw.skillPath`, `raw.skillMdPath`: absolute paths depend on checkout; fixtures store a `__FIXTURE__/<name>` marker

Drift in fixtures is a review artifact: if evaluator scoring changes, snapshots flag it so reviewers can decide whether the change is intentional.
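For reviewers who want to see the normalization step concretely, here is a minimal sketch of how the fields listed above could be stripped before comparison. It is not the code in `index.test.ts`: the helper name `normalizeForSnapshot`, the `JsonObject` type, and the choice to write the same `__FIXTURE__/<name>` marker into both path fields are assumptions made for illustration. Only the field names and the marker format come from this description.

```typescript
// Hypothetical helper: strips or rewrites the non-deterministic fields named in this PR.
type JsonObject = Record<string, unknown>;

function isObject(value: unknown): value is JsonObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function normalizeForSnapshot(result: JsonObject, fixtureName: string): JsonObject {
  // Shallow copy so the caller's result is left untouched.
  const normalized: JsonObject = { ...result };

  // Runner-stamped timing fields change on every call.
  delete normalized["startedAt"];
  delete normalized["durationMs"];

  if (isObject(result["raw"])) {
    const raw: JsonObject = { ...result["raw"] };

    // Wall-clock timestamp set by the evaluator.
    delete raw["evaluatedAt"];

    // Absolute paths depend on the checkout; replace them with a stable marker.
    // Assumption: both path fields receive the same marker string.
    const marker = `__FIXTURE__/${fixtureName}`;
    for (const key of ["skillPath", "skillMdPath"]) {
      if (key in raw) raw[key] = marker;
    }
    normalized["raw"] = raw;
  }

  return normalized;
}

// Usage with made-up values; the score and paths are illustrative only.
const normalizedExample = normalizeForSnapshot(
  {
    score: 90,
    startedAt: "2024-01-01T00:00:00.000Z",
    durationMs: 12,
    raw: {
      evaluatedAt: "2024-01-01T00:00:00.000Z",
      skillPath: "/home/ci/repo/tests/fixtures/skills/well-formed",
      skillMdPath: "/home/ci/repo/tests/fixtures/skills/well-formed/SKILL.md",
    },
  },
  "well-formed",
);
console.log(JSON.stringify(normalizedExample, null, 2));
```

Once both the live result and the checked-in fixture are in this normalized form, a plain deep-equality assertion is enough, and any remaining difference reflects a real change in evaluator output.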
Test Results
- `bun test src/eval/`: 80 pass, 0 fail (67 existing PR 1 tests + 12 new quality provider tests + 1 existing `registerBuiltins` test)
- `bun run typecheck`: clean
- `bun test src/`: 1413 pass, 5 fail (pre-existing unrelated failures in `publisher.test.ts` and `cli.test.ts`, called out in the issue; not addressed here)

Acceptance Criteria
- `quality` provider registered and resolvable via `registry.resolve("quality", "^1.0.0")`
- `src/evaluator.ts` is untouched (diff confirms zero changes)
- `bun test` passes (all new tests green; pre-existing unrelated failures as called out in issue)

Boundary kept for PR 3
`src/cli.ts` does not import `providers/index.ts` or call `registerBuiltins()`; PR 3 owns that wiring. No user-visible behavior change in this PR.
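As a reading aid for the EvaluationReport -> EvalResult mapping described earlier, the sketch below shows the shape of the delegation in isolation. The interfaces are trimmed-down and hypothetical: the field names `overallScore`, `grade`, `categories`, `topSuggestions`, `score`, `passed`, `findings`, and `raw` come from this description, while `mapReport`, the `message` field on a finding, and treating `topSuggestions` as `string[]` are assumptions. The real `EvalProvider` contract from PR 1 may differ.

```typescript
// Hypothetical, trimmed-down shapes for illustration only.
interface ReportCategory {
  id: string;
  name: string;
  score: number;
  max: number;
}

interface EvaluationReportLike {
  overallScore: number;
  grade: string;
  categories: ReportCategory[];
  topSuggestions: string[]; // assumed element type
}

interface InfoFinding {
  severity: "info";
  message: string; // assumed field name
}

interface MappedResult {
  score: number;
  passed: boolean;
  categories: ReportCategory[];
  findings: InfoFinding[];
  raw: EvaluationReportLike;
}

// Pure mapping: no scoring logic lives here, it only reshapes the report.
function mapReport(report: EvaluationReportLike): MappedResult {
  return {
    score: report.overallScore,
    // Any grade other than "F" counts as a pass.
    passed: report.grade !== "F",
    // 1:1 copy of the four category fields.
    categories: report.categories.map(({ id, name, score, max }) => ({ id, name, score, max })),
    // Every top suggestion becomes an informational finding.
    findings: report.topSuggestions.map((message) => ({ severity: "info" as const, message })),
    // The original report is kept whole for consumers that need more detail.
    raw: report,
  };
}

// Usage with made-up values for a failing skill.
const mappedExample = mapReport({
  overallScore: 20,
  grade: "F",
  categories: [{ id: "frontmatter", name: "Frontmatter", score: 0, max: 10 }],
  topSuggestions: ["Add YAML frontmatter"],
});
console.log(mappedExample.passed, mappedExample.findings.length);
```

Keeping the mapping a pure function of the report is what lets the snapshot tests treat any change in output as a change in evaluator behavior rather than in the adapter.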