refactor: rename dataset to suite across codebase #943

@christso

Objective

Rename dataset → suite everywhere. An eval file is a test suite (lifecycle hooks, workspace setup/teardown, execution config), not a dataset (passive input/output pairs).

Context

The field has ping-ponged: dataset → eval_set (#678) → dataset (#814). The #814 rename to dataset cited "industry conventions (Braintrust, LangSmith, DeepEval)", but those platforms use "dataset" because their datasets really are just input/output pairs with no execution semantics. In agentv, eval files have:

  • before_all / after_all lifecycle hooks
  • before_each / after_each per-test hooks
  • Workspace setup/teardown and pooling
  • Execution config (trials, timeout, concurrency)
  • Target matrix evaluation

That's a test suite, not a dataset. suite is the accurate term.

Scope

~190 occurrences across ~27 files:

| Area | Files | Occurrences |
| --- | --- | --- |
| packages/core/ | types, orchestrator, yaml-parser, jsonl-parser, schema, validator, trace | ~50 |
| apps/cli/ | artifact-writer, trace commands, pipeline, results/serve, tests | ~80 |
| apps/studio/ | api, types, Sidebar, RunDetail, routes, Breadcrumbs | ~110 |
| apps/web/ (docs) | TBD | TBD |
| examples/ | baseline JSONL files | ~42 files |

Wire format (JSONL results)

// Before
{"test_id":"foo","dataset":"my-eval","score":1.0,...}

// After
{"test_id":"foo","suite":"my-eval","score":1.0,...}

Backward compatibility

Same pattern as the #814 rename:

  • JSONL parser accepts both suite and dataset (deprecated alias)
  • Zod schema accepts both field names
  • CLI --group-by dataset accepted as deprecated alias for --group-by suite
  • Pipeline bench/grade read manifest.suite ?? manifest.dataset
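
The read-side aliasing above could be sketched roughly as follows. This is a minimal illustration, not the actual agentv parser: `RawResultRecord`, `NormalizedResultRecord`, `normalizeResultRecord`, `parseResultLine`, and `resolveGroupBy` are hypothetical names, and the real code may do this inside the Zod schema instead.

```typescript
// Hypothetical shape of one parsed JSONL result line; only the fields
// relevant to the rename are modelled.
interface RawResultRecord {
  readonly test_id: string;
  readonly suite?: string;
  readonly dataset?: string; // deprecated alias, still accepted on read
  readonly score?: number;
}

interface NormalizedResultRecord {
  readonly test_id: string;
  readonly suite?: string;
  readonly score?: number;
}

// Prefer `suite`; fall back to the deprecated `dataset` alias.
// The legacy key is dropped so downstream code only ever sees `suite`.
function normalizeResultRecord(raw: RawResultRecord): NormalizedResultRecord {
  const { dataset, suite, ...rest } = raw;
  const resolved = suite ?? dataset;
  return resolved === undefined ? rest : { ...rest, suite: resolved };
}

// Parse one JSONL line and normalize its grouping field.
function parseResultLine(line: string): NormalizedResultRecord {
  return normalizeResultRecord(JSON.parse(line) as RawResultRecord);
}

// Map the deprecated `--group-by dataset` value onto `suite`.
// Any other value passes through unchanged.
function resolveGroupBy(value: string): { key: string; deprecated: boolean } {
  return value === "dataset"
    ? { key: "suite", deprecated: true }
    : { key: value, deprecated: false };
}
```

When both keys are present, `suite` wins, which mirrors the `manifest.suite ?? manifest.dataset` rule. The `deprecated` flag leaves it to the caller to decide whether to print a warning.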

Internal types

// Before
interface EvalTest {
  readonly dataset?: string;
}
interface EvaluationResult {
  readonly dataset?: string;
}

// After
interface EvalTest {
  readonly suite?: string;
}
interface EvaluationResult {
  readonly suite?: string;
}

Studio UI

  • "Datasets" → "Suites" in headings, sidebar labels
  • Route /runs/:runId/dataset/:dataset → /runs/:runId/suite/:suite
  • DatasetSidebar → SuiteSidebar component rename (or keep generic)
  • API endpoint /api/runs/:filename/datasets → /api/runs/:filename/suites
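
If old Studio links should keep working after the route rename (the issue does not say either way), a small path rewrite could cover it. A sketch under that assumption, where `LEGACY_DATASET_ROUTE` and `toSuiteRoute` are hypothetical names rather than anything in apps/studio/:

```typescript
// Matches the pre-rename Studio route /runs/:runId/dataset/:dataset.
const LEGACY_DATASET_ROUTE = /^\/runs\/([^/]+)\/dataset\/([^/]+)$/;

// Returns the equivalent /runs/:runId/suite/:suite path for a legacy
// route, or null when the path is not a legacy dataset route.
function toSuiteRoute(path: string): string | null {
  const match = LEGACY_DATASET_ROUTE.exec(path);
  if (match === null) return null;
  const [, runId, name] = match;
  return `/runs/${runId}/suite/${name}`;
}
```

A router could call this before matching and redirect whenever the result is non-null.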

YAML

No change to the top-level name field, which already names the suite. The dataset field on test cases becomes suite (optional grouping tag).

Acceptance Signals

  • EvalTest.suite and EvaluationResult.suite replace .dataset
  • JSONL output writes suite field
  • JSONL parser reads both suite and dataset (backward compat)
  • CLI commands use suite (--group-by suite, trace stats, etc.)
  • Studio UI labels say "Suites", routes use /suite/
  • Studio API endpoints renamed
  • Example baseline JSONL files updated
  • Docs updated on agentv.dev

Non-Goals

  • Changing the YAML name field (already neutral)
  • Changing the file extension (.eval.yaml is fine)
  • Removing backward-compat aliases in this PR (deprecate only)

Labels: core (Anything pertaining to core functionality of AgentV)
