refactor(core): remove borderline verdict by christso · Pull Request #857 · EntityProcess/agentv

christso · 2026-03-29T22:14:19Z

Summary

Remove borderline from EvaluationVerdict type, simplifying to pass | fail | skip
Simplify scoreToVerdict(): scores < 0.8 are now fail (previously 0.6-0.8 was borderline)
Simplify negateScore(): only swap pass↔fail, skip stays skip
Remove borderline field from EvalSummary and fix inconsistent threshold (was 0.5, now matches 0.8)
Update composite evaluator: only pass counts as passing in threshold aggregator
Update all tests, examples, docs, baseline JSONL fixtures, and skill references

Industry research confirms no major eval framework uses a named intermediate verdict. The numeric score already captures nuance, and the borderline verdict added complexity without clear value.
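As a rough illustration of the simplified mapping described above, here is a minimal sketch. It assumes `scoreToVerdict` takes a single numeric score and that `skip` is assigned elsewhere, never derived from a score. The signature is an assumption for illustration, not the actual code in the repository.

```typescript
// Hypothetical sketch of the post-change mapping; the real definitions live in the agentv core package.
type EvaluationVerdict = "pass" | "fail" | "skip";

// The PR states that scores below 0.8 are now 'fail'.
const PASS_THRESHOLD = 0.8;

// Assumed signature: a score maps to pass or fail only.
function scoreToVerdict(score: number): EvaluationVerdict {
  return score >= PASS_THRESHOLD ? "pass" : "fail";
}

// A score of 0.7 fell in the old borderline band (0.6-0.8); it now maps to "fail".
const formerlyBorderline = scoreToVerdict(0.7);
const atThreshold = scoreToVerdict(0.8);
```

Under this sketch, a score of exactly 0.8 passes, consistent with the summary's statement that only scores below 0.8 fail.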

Test plan

bun run build — TypeScript compilation passes
bun run test — all 1713 tests pass (1295 core + 67 eval + 351 cli)
bun run lint — Biome passes
grep -r "borderline" packages/ apps/ examples/ plugins/ — no remaining code references
Pre-push hooks pass (build, typecheck, lint, test, validate)

🤖 Generated with Claude Code

cloudflare-workers-and-pages · 2026-03-29T22:15:01Z

Deploying agentv with Cloudflare Pages

Latest commit:	`a7e37b2`
Status:	⚡️ Build in progress...


Simplify EvaluationVerdict to 'pass' | 'fail' | 'skip'. Scores below 0.8 are now 'fail' (previously 0.6-0.8 was 'borderline'). Remove borderline from EvalSummary, scoreToVerdict, negateScore, and composite evaluator.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Change borderline expectations to fail (scores 0.6-0.8 are now fail). Remove borderline-specific tests in negation and composite-threshold. Update threshold aggregator tests since only pass verdicts count.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
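To make "only pass verdicts count" concrete, the following sketch shows one way a threshold aggregator could behave after this change. The names `ChildResult` and `aggregateByThreshold` are hypothetical, and excluding `skip` results from the denominator is an assumption; the actual composite evaluator may treat skips differently.

```typescript
type EvaluationVerdict = "pass" | "fail" | "skip";

// Hypothetical shape for a child evaluator's result.
interface ChildResult {
  name: string;
  verdict: EvaluationVerdict;
}

// Hypothetical aggregator: passes when the fraction of passing children reaches `threshold`.
// Assumption: skipped children are left out of the count entirely.
function aggregateByThreshold(children: ChildResult[], threshold: number): EvaluationVerdict {
  const considered = children.filter((c) => c.verdict !== "skip");
  if (considered.length === 0) return "skip";
  // Only "pass" counts as passing; there is no intermediate verdict to include.
  const passing = considered.filter((c) => c.verdict === "pass").length;
  return passing / considered.length >= threshold ? "pass" : "fail";
}

const children: ChildResult[] = [
  { name: "a", verdict: "pass" },
  { name: "b", verdict: "fail" },
  { name: "c", verdict: "pass" },
];
const lenient = aggregateByThreshold(children, 0.6); // 2 of 3 pass
const strict = aggregateByThreshold(children, 0.7);
```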

- Extract PASS_THRESHOLD = 0.8 as single source of truth in scoring.ts
- Replace magic 0.8 in evaluate.ts and orchestrator.ts with the constant
- Add file header to scoring.ts explaining the scoring model
- Use data-driven NEGATED_VERDICT map instead of ternary chain
- Remove dead isNonEmptyString import from composite.ts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
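The data-driven map mentioned in this commit might look roughly like the sketch below. The `ScoredResult` shape and the `1 - score` inversion are assumptions made for illustration; only the `NEGATED_VERDICT` name and the pass↔fail swap with `skip` unchanged come from the PR text.

```typescript
type EvaluationVerdict = "pass" | "fail" | "skip";

// A lookup table replaces a chain of ternaries: pass and fail swap, skip stays skip.
const NEGATED_VERDICT: Record<EvaluationVerdict, EvaluationVerdict> = {
  pass: "fail",
  fail: "pass",
  skip: "skip",
};

// Hypothetical result shape, not taken from the repository.
interface ScoredResult {
  score: number;
  verdict: EvaluationVerdict;
}

// Assumption: negation inverts the score as 1 - score, except for skipped results,
// which are returned with their score untouched.
function negateScore(result: ScoredResult): ScoredResult {
  const score = result.verdict === "skip" ? result.score : 1 - result.score;
  return { score, verdict: NEGATED_VERDICT[result.verdict] };
}

const negated = negateScore({ score: 0.25, verdict: "fail" });
```

Typing the map as `Record<EvaluationVerdict, EvaluationVerdict>` means the compiler flags a missing entry if a verdict is ever added to or removed from the union.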

christso and others added 5 commits March 29, 2026 22:47

docs: remove borderline references from examples, docs, and baselines

afd2944

Update example scripts, documentation, baseline JSONL fixtures, and skill references to reflect binary pass/fail verdict system.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

style: fix biome formatting in composite-threshold test

1f10cb1

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

christso force-pushed the refactor/remove-borderline-verdict branch from c00ace3 to a7e37b2 March 29, 2026 22:49

christso merged commit 7ae533e into main Mar 29, 2026
1 of 2 checks passed

christso deleted the refactor/remove-borderline-verdict branch March 29, 2026 22:49
