fix(studio): use 0.8 pass threshold instead of 1.0 #863
Merged
…hold Replace hardcoded `score >= 1` checks in 5 server endpoints with a configurable pass_threshold loaded from config.yaml in the runs directory. Defaults to PASS_THRESHOLD (0.8) from @agentv/core when no config exists. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…d isPassing helper Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…tils Replace hardcoded score >= 1.0 with PASS_THRESHOLD (0.8) in listResultFiles pass count calculation so it aligns with the standard evaluation threshold. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace hardcoded score < 1 checks with isPassing(score, passThreshold) using the studio config's pass_threshold (default 0.8). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…hardcoded score >= 1 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace hardcoded `score >= 1` checks with `isPassing(score, passThreshold)` using the `useStudioConfig` hook in EvalSidebar, DatasetSidebar, and DatasetPage. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
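As a rough illustration of the helper these commits refer to, here is a minimal sketch of `isPassing(score, passThreshold)`. The name and argument order come from the commit messages; the inclusive comparison, the default argument, and the `countResults` helper are assumptions made for this example, and the real code in `apps/studio/src/lib/api.ts` may differ.

```typescript
// Assumed default, mirroring the 0.8 value the PR attributes to PASS_THRESHOLD in @agentv/core.
const DEFAULT_PASS_THRESHOLD = 0.8;

// Sketch only: the inclusive (>=) comparison and the default argument are assumptions.
function isPassing(score: number, passThreshold: number = DEFAULT_PASS_THRESHOLD): boolean {
  return score >= passThreshold;
}

// Hypothetical helper showing how pass/fail counts could be derived from raw scores.
function countResults(
  scores: number[],
  passThreshold?: number,
): { passed: number; failed: number } {
  const passed = scores.filter((s) => isPassing(s, passThreshold)).length;
  return { passed, failed: scores.length - passed };
}
```

Under the old `score >= 1` check a score of 0.85 would count as a failure; in this sketch it passes unless the threshold is raised above it.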
Deploying agentv with Cloudflare Pages
| Latest commit: | 74e4c0f |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://42293bce.agentv.pages.dev |
| Branch Preview URL: | https://fix-862-pass-threshold.agentv.pages.dev |
Remove redundant Output and Task tabs that showed identical file trees. Replace with a single Files tab for browsing eval artifacts. Remove legacy fallback logic for pre-manifest result formats. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The loadStudioConfig was receiving the project root (cwd) instead of the actual runs directory. Now correctly constructs the path to .agentv/results/runs/config.yaml. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The config is a global Studio setting, not per-run data. It belongs alongside cache.json in the .agentv/ directory root. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Re-read .agentv/config.yaml on each API request instead of once at startup so external edits are picked up immediately. Add POST /api/config endpoint to save config changes. Add /settings route with card-based UI for editing pass threshold. Add settings link to sidebar. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Read config once per request, not once per result row. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
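The "once per request, not once per result row" change can be pictured with the sketch below. It assumes a loader function that returns the current config; `StudioConfig`, `ResultRow`, and `annotateRows` are hypothetical names for this example and are not taken from the PR's code.

```typescript
// Hypothetical shapes for illustration; the real types in the PR may differ.
interface StudioConfig {
  pass_threshold: number;
}

interface ResultRow {
  test_id: string;
  score: number;
}

// Load the config a single time, then reuse the threshold for every row.
// Calling loadConfig() inside the map callback would re-read the file per row.
function annotateRows(
  rows: ResultRow[],
  loadConfig: () => StudioConfig,
): Array<ResultRow & { passed: boolean }> {
  const { pass_threshold } = loadConfig();
  return rows.map((row) => ({ ...row, passed: row.score >= pass_threshold }));
}
```

Because the loader is invoked per request rather than at startup, an external edit to the config file would still be picked up on the next request, while a single request stays internally consistent.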
Summary

- Use `PASS_THRESHOLD` (0.8) from `@agentv/core` as the default pass threshold instead of 1.0
- Make the threshold configurable as `pass_threshold` via `config.yaml` in the `.agentv/results/runs/` directory
- Replace hardcoded `score >= 1` checks across server (5) and client (7) code

How to override the threshold

Create `.agentv/results/runs/config.yaml` and set `pass_threshold` in it.

The Studio server reads this on startup and serves it via `/api/config`. The frontend fetches it and uses it for all pass/fail UI decisions. Default remains 0.8 (matching `@agentv/core` `PASS_THRESHOLD`).

Files changed

Server:
- `apps/cli/src/commands/results/studio-config.ts` (new): config loader
- `apps/cli/src/commands/results/serve.ts`: `/api/config` endpoint + threshold in 5 endpoints
- `apps/cli/src/commands/trace/utils.ts`: `listResultFiles` pass count

Client:
- `apps/studio/src/lib/api.ts`: `useStudioConfig` hook, `isPassing` helper
- `apps/studio/src/lib/types.ts`: `StudioConfigResponse` type
- `apps/studio/src/components/RunDetail.tsx`: pass/fail counting
- `apps/studio/src/components/EvalDetail.tsx`: failure reason display
- `apps/studio/src/components/Sidebar.tsx`: sidebar pass/fail indicators
- `apps/studio/src/routes/runs/$runId_.dataset.$dataset.tsx`: dataset page

Tests:
- `apps/cli/test/commands/results/studio-config.test.ts` (new): 6 tests

Test plan

- Run `agentv studio` and verify scores between 0.8 and 1.0 show as passed

Closes #862
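To make the override concrete, the sketch below shows what a `config.yaml` with a `pass_threshold` key might contain and how a loader could fall back to the default. The value 0.9, the function name `readPassThreshold`, the single-line regex parse (a stand-in for a real YAML parser), and the 0 to 1 validity range are all assumptions for this example; the actual loader lives in `studio-config.ts` and may behave differently. Later commit messages in this PR also describe moving the file to `.agentv/config.yaml`.

```typescript
// Assumed default, matching the 0.8 value described for PASS_THRESHOLD.
const DEFAULT_PASS_THRESHOLD = 0.8;

// Illustrative file contents; 0.9 is an arbitrary example value.
const exampleConfigYaml = "pass_threshold: 0.9\n";

// Hypothetical reader: takes the file text (or null when no file exists)
// and returns the threshold, falling back to the default on any problem.
function readPassThreshold(yamlText: string | null): number {
  if (yamlText === null) return DEFAULT_PASS_THRESHOLD;
  // Simplified parse of a top-level "pass_threshold: <number>" line.
  const match = /^pass_threshold:\s*([0-9]*\.?[0-9]+)\s*$/m.exec(yamlText);
  if (!match) return DEFAULT_PASS_THRESHOLD;
  const value = Number(match[1]);
  // Assumption: thresholds outside [0, 1] are treated as invalid.
  const valid = Number.isFinite(value) && value >= 0 && value <= 1;
  return valid ? value : DEFAULT_PASS_THRESHOLD;
}
```

Falling back to the default for a missing or malformed file keeps Studio usable without any configuration, which is consistent with the description's statement that the default remains 0.8.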
🤖 Generated with Claude Code