refactor: rename dataset to suite across codebase #943
Closed
Labels
core: Anything pertaining to core functionality of AgentV
Description
Objective
Rename dataset → suite everywhere. An eval file is a test suite (lifecycle hooks, workspace setup/teardown, execution config), not a dataset (passive input/output pairs).
Context
The field has ping-ponged: dataset → eval_set (#678) → dataset (#814). The #814 rename to dataset cited "industry conventions (Braintrust, LangSmith, DeepEval)" — but those platforms use "dataset" because their datasets really are just input/output pairs with no execution semantics. In agentv, eval files have:
- before_all/after_all lifecycle hooks
- before_each/after_each per-test hooks
- Workspace setup/teardown and pooling
- Execution config (trials, timeout, concurrency)
- Target matrix evaluation
That's a test suite, not a dataset. suite is the accurate term.
Scope
~190 occurrences across ~27 files:
| Area | Files | Occurrences |
|---|---|---|
| packages/core/ | types, orchestrator, yaml-parser, jsonl-parser, schema, validator, trace | ~50 |
| apps/cli/ | artifact-writer, trace commands, pipeline, results/serve, tests | ~80 |
| apps/studio/ | api, types, Sidebar, RunDetail, routes, Breadcrumbs | ~110 |
| apps/web/ (docs) | TBD | TBD |
| examples/ | baseline JSONL files | ~42 files |
Wire format (JSONL results)
// Before
{"test_id":"foo","dataset":"my-eval","score":1.0,...}
// After
{"test_id":"foo","suite":"my-eval","score":1.0,...}
Backward compatibility
Same pattern as the #814 rename:
- JSONL parser accepts both suite and dataset (deprecated alias)
- Zod schema accepts both field names
- CLI --group-by dataset accepted as deprecated alias for --group-by suite
- Pipeline bench/grade read manifest.suite ?? manifest.dataset
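The read-side fallback in the list above could look roughly like the sketch below. It is an illustration only, not AgentV's actual parser: `RawResultRecord`, `readSuite`, and `parseResultLine` are hypothetical names, and the real code may route this through the Zod schema instead of a hand-written helper.

```typescript
// Hypothetical shape of one parsed JSONL result line (illustrative only).
interface RawResultRecord {
  test_id: string;
  suite?: string;
  /** Deprecated alias, still accepted on read. */
  dataset?: string;
  score?: number;
}

// Prefer the new field and fall back to the deprecated alias.
// Works for result records and for manifests alike.
function readSuite(source: { suite?: string; dataset?: string }): string | undefined {
  return source.suite ?? source.dataset;
}

// Parse one JSONL line and normalize it so downstream code only sees `suite`.
function parseResultLine(line: string): Omit<RawResultRecord, "dataset"> {
  const raw = JSON.parse(line) as RawResultRecord;
  // Drop the deprecated key; keep every other field as-is.
  const { dataset: _deprecated, ...rest } = raw;
  const suite = readSuite(raw);
  return suite === undefined ? rest : { ...rest, suite };
}
```

With this shape, a legacy line that carries only `dataset` comes out with `suite` set, and a line that carries both keeps the `suite` value. The same `readSuite` call expresses the `manifest.suite ?? manifest.dataset` read in the pipeline.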
Internal types
// Before
interface EvalTest {
  readonly dataset?: string;
}
interface EvaluationResult {
  readonly dataset?: string;
}
// After
interface EvalTest {
  readonly suite?: string;
}
interface EvaluationResult {
  readonly suite?: string;
}
Studio UI
- "Datasets" → "Suites" in headings, sidebar labels
- Route /runs/:runId/dataset/:dataset → /runs/:runId/suite/:suite
- DatasetSidebar → SuiteSidebar component rename (or keep generic)
- API endpoint /api/runs/:filename/datasets → /suites
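To make the route rename concrete, here is a small sketch that maps a legacy Studio path onto the new one. `toSuitePath` is a hypothetical helper, and this issue does not say whether Studio should redirect old URLs at all; the function only illustrates the mapping.

```typescript
// Hypothetical helper: map a legacy Studio run path to its renamed form.
// Paths that do not match the legacy pattern are returned unchanged.
function toSuitePath(path: string): string {
  const match = /^\/runs\/([^/]+)\/dataset\/([^/]+)$/.exec(path);
  if (match === null) return path;
  const [, runId, suite] = match;
  return `/runs/${runId}/suite/${suite}`;
}
```

For example, `/runs/abc/dataset/my-eval` maps to `/runs/abc/suite/my-eval`, while a path that already uses `/suite/` passes through untouched.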
YAML
No change needed — the top-level name field already names the suite. The dataset field on test cases becomes suite (optional grouping tag).
Acceptance Signals
- EvalTest.suite and EvaluationResult.suite replace .dataset
- JSONL output writes suite field
- JSONL parser reads both suite and dataset (backward compat)
- CLI commands use suite (--group-by suite, trace stats, etc.)
- Studio UI labels say "Suites", routes use /suite/
- Studio API endpoints renamed
- Example baseline JSONL files updated
- Docs updated on agentv.dev
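For the CLI item in the list above, accepting the deprecated `--group-by dataset` value might be handled with a small normalizer like the one below. `normalizeGroupBy` is a hypothetical name, and the warning callback is an assumption: the issue says the alias is deprecated but does not say whether the CLI prints a notice.

```typescript
// Hypothetical normalizer for the --group-by value.
// "dataset" is mapped to "suite"; every other value passes through unchanged.
function normalizeGroupBy(
  value: string,
  warn: (message: string) => void = () => {},
): string {
  if (value === "dataset") {
    // Assumption: a deprecation notice is emitted when the alias is used.
    warn("--group-by dataset is deprecated; use --group-by suite");
    return "suite";
  }
  return value;
}
```

Doing the mapping once at the argument-parsing boundary means the rest of the command only ever sees `suite`.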
Non-Goals
- Changing the YAML name field (already neutral)
- Changing the file extension (.eval.yaml is fine)
- Removing backward-compat aliases in this PR (deprecate only)