fix(pipeline): deprecate hits/misses grader format, use assertions natively #858

Merged
christso merged 3 commits into main from fix/deprecate-hits-misses-grader-fallback
Mar 30, 2026

@christso
Collaborator

Fixes grader output handling in pipeline grade/bench to support the deprecated hits/misses format while graders transition to emitting assertions natively.
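A minimal sketch of how a hits/misses fallback can sit beside native assertions. It rests on assumed shapes: a native grader output carries an assertions array of { text, passed } entries, and the deprecated output carries hits and misses string arrays. The names normalizeGraderOutput and passRate are illustrative only and are not taken from grade.ts.

```typescript
// Hypothetical shapes; the real agentv grader schema may differ.
interface Assertion {
  text: string;
  passed: boolean;
}

// Preferred format: graders emit assertions directly.
interface AssertionGraderOutput {
  assertions: Assertion[];
}

// Deprecated format: flat lists of satisfied (hits) and unsatisfied (misses) criteria.
interface LegacyGraderOutput {
  hits: string[];
  misses: string[];
}

type GraderOutput = AssertionGraderOutput | LegacyGraderOutput;

// Type guard: true when the output already uses the assertions format.
function isAssertionOutput(out: GraderOutput): out is AssertionGraderOutput {
  return Array.isArray((out as AssertionGraderOutput).assertions);
}

// Use native assertions when present; otherwise convert hits/misses.
// TODO: remove the hits/misses branch once all graders emit assertions.
function normalizeGraderOutput(out: GraderOutput): Assertion[] {
  if (isAssertionOutput(out)) {
    return out.assertions;
  }
  return [
    ...out.hits.map((text) => ({ text, passed: true })),
    ...out.misses.map((text) => ({ text, passed: false })),
  ];
}

// Fraction of assertions that passed; 0 when there are none.
function passRate(assertions: Assertion[]): number {
  if (assertions.length === 0) return 0;
  return assertions.filter((a) => a.passed).length / assertions.length;
}

const legacy: GraderOutput = {
  hits: ["cites a source", "returns JSON"],
  misses: ["handles empty input"],
};
const converted = normalizeGraderOutput(legacy);
console.log(converted.length, passRate(converted).toFixed(2));
```

Keeping the conversion in one function means the fallback can later be deleted in a single place, which is what a TODO marker for future removal is aiming at.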

  • grade.ts: adds TODO comment marking the hits/misses fallback for future removal
  • bench.ts: reads LLM grader results from disk to avoid context-window loss across batches
  • SKILL.md: documents write-to-disk approach for LLM grader subagents
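One way a grader subagent could follow the write-to-disk approach, sketched under assumptions. The llm_grader_results/<name>.json layout matches the path named in the bench commit message; the payload (a score plus assertions) and the helper name writeGraderResult are hypothetical. Writing to a temporary file and then renaming it is a design choice added here so a reader never sees a half-written result; it is not confirmed to be what SKILL.md prescribes.

```typescript
import { mkdirSync, mkdtempSync, readFileSync, renameSync, writeFileSync } from "node:fs";
import { join } from "node:path";
import { tmpdir } from "node:os";

// Hypothetical payload a grader subagent might persist for one test.
interface LlmGraderResult {
  score: number;
  assertions: { text: string; passed: boolean }[];
}

// Persists one test's result to <rootDir>/llm_grader_results/<testName>.json.
// The temp file lives in the same directory, so the rename stays on one
// filesystem (atomic on POSIX) and readers see either nothing or the full file.
function writeGraderResult(rootDir: string, testName: string, result: LlmGraderResult): string {
  const dir = join(rootDir, "llm_grader_results");
  mkdirSync(dir, { recursive: true });
  const target = join(dir, `${testName}.json`);
  const tmp = `${target}.tmp`;
  writeFileSync(tmp, JSON.stringify(result, null, 2) + "\n", "utf8");
  renameSync(tmp, target);
  return target;
}

// Demo: a subagent grading the "greeting" test writes its verdict to disk.
const root = mkdtempSync(join(tmpdir(), "grader-"));
const written = writeGraderResult(root, "greeting", {
  score: 0.5,
  assertions: [
    { text: "says hello", passed: true },
    { text: "uses the user's name", passed: false },
  ],
});
const roundTrip = JSON.parse(readFileSync(written, "utf8")) as LlmGraderResult;
console.log(roundTrip.score, roundTrip.assertions.length);
```

Because each subagent only has to produce a file, its result no longer has to survive in the orchestrating agent's context window between batches.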

Note: @agentv/studio has pre-existing build errors unrelated to this change.

@cloudflare-workers-and-pages

cloudflare-workers-and-pages bot commented Mar 30, 2026

Deploying agentv with Cloudflare Pages

Latest commit: c6feef1
Status: ✅ Deploy successful!
Preview URL: https://3d55a48a.agentv.pages.dev
Branch Preview URL: https://fix-deprecate-hits-misses-gr.agentv.pages.dev

christso and others added 2 commits March 30, 2026 01:03
…k only

bench now reads LLM grader results exclusively from
llm_grader_results/<name>.json per test. Removes the --llm-scores
flag, stdin reading, and readStdin() — simplifying the interface
to a single positional arg.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
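A sketch of the disk-only read path this commit describes, under stated assumptions: results live at <runDir>/llm_grader_results/<name>.json, runDir stands in for whatever the single positional argument points to, and the payload shape is hypothetical. loadLlmGraderResults is an illustrative name, not the function in bench.ts.

```typescript
import { existsSync, mkdirSync, mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { join } from "node:path";
import { tmpdir } from "node:os";

// Hypothetical result shape; the real file contents may differ.
interface LlmGraderResult {
  score: number;
  assertions: { text: string; passed: boolean }[];
}

interface LoadedResults {
  found: Map<string, LlmGraderResult>;
  missing: string[];
}

// Reads llm_grader_results/<name>.json for each test under runDir.
// Tests without a file on disk are listed in `missing` instead of being scored.
function loadLlmGraderResults(runDir: string, testNames: string[]): LoadedResults {
  const found = new Map<string, LlmGraderResult>();
  const missing: string[] = [];
  for (const name of testNames) {
    const file = join(runDir, "llm_grader_results", `${name}.json`);
    if (!existsSync(file)) {
      missing.push(name);
      continue;
    }
    found.set(name, JSON.parse(readFileSync(file, "utf8")) as LlmGraderResult);
  }
  return { found, missing };
}

// Demo: one test has a result on disk, the other does not.
const runDir = mkdtempSync(join(tmpdir(), "bench-"));
mkdirSync(join(runDir, "llm_grader_results"));
writeFileSync(
  join(runDir, "llm_grader_results", "greeting.json"),
  JSON.stringify({ score: 1, assertions: [{ text: "says hello", passed: true }] }),
);
const { found, missing } = loadLlmGraderResults(runDir, ["greeting", "farewell"]);
console.log([...found.keys()], missing);
```

A CLI wrapper could take runDir from process.argv[2], which would match the single-positional-argument interface; that mapping is an assumption. Surfacing missing files rather than defaulting them to a score makes a subagent that never wrote its result visible. How bench.ts actually handles a missing file is not stated in this PR.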
christso merged commit 290b13d into main Mar 30, 2026
2 checks passed
christso deleted the fix/deprecate-hits-misses-grader-fallback branch March 30, 2026 01:08
Labels: None yet
Projects: None yet

1 participant