Merged
4 changes: 2 additions & 2 deletions apps/web/src/content/docs/evaluation/eval-cases.mdx
@@ -51,7 +51,7 @@ input:
content: What is 15 + 27?
```

-When suite-level `input` is defined in the eval file, those messages are prepended to the test's input. See [Suite-level Input](/evaluation/eval-files/#suite-level-input).
+When suite-level `input` is defined in the eval file, those messages are prepended to the test's input. See [Suite-level Input](/docs/evaluation/eval-files/#suite-level-input).

## Expected Output

@@ -137,7 +137,7 @@ tests:
# Inherits suite-level hooks.before_all
```

-See [Workspace Lifecycle Hooks](/targets/configuration/#workspace-lifecycle-hooks) for the full workspace config reference.
+See [Workspace Lifecycle Hooks](/docs/targets/configuration/#workspace-lifecycle-hooks) for the full workspace config reference.

## Per-Case Metadata

4 changes: 2 additions & 2 deletions apps/web/src/content/docs/evaluation/eval-files.mdx
@@ -35,7 +35,7 @@ tests:
| `description` | Human-readable description of the evaluation |
| `dataset` | Optional dataset identifier |
| `execution` | Default execution config (`target`, `fail_on_error`, `threshold`, etc.) |
-| `workspace` | Suite-level workspace config — inline object or string path to an [external workspace file](/guides/workspace-pool/#external-workspace-config) |
+| `workspace` | Suite-level workspace config — inline object or string path to an [external workspace file](/docs/guides/workspace-pool/#external-workspace-config) |
| `tests` | Array of individual tests, or a string path to an external file |
| `assertions` | Suite-level evaluators appended to each test unless `execution.skip_defaults: true` is set on the test |
| `input` | Suite-level input messages prepended to each test's input unless `execution.skip_defaults: true` is set on the test |
@@ -88,7 +88,7 @@ tests:
input: Check API health
```

-`assertions` supports all evaluator types, including deterministic assertion types (`contains`, `regex`, `is_json`, `equals`) and `rubrics`. See [Tests](/evaluation/eval-cases/#per-test-assertions) for per-test assertions usage.
+`assertions` supports all evaluator types, including deterministic assertion types (`contains`, `regex`, `is_json`, `equals`) and `rubrics`. See [Tests](/docs/evaluation/eval-cases/#per-test-assertions) for per-test assertions usage.

### Suite-level Input

2 changes: 1 addition & 1 deletion apps/web/src/content/docs/evaluation/sdk.mdx
@@ -90,7 +90,7 @@ export default defineCodeGrader(({ trace, outputText }) => ({

`defineCodeGrader` graders are referenced in YAML with `type: code-grader` and `command: [bun, run, grader.ts]`. `defineAssertion` uses convention-based discovery instead — just place in `.agentv/assertions/` and reference by name.

-For detailed patterns, input/output contracts, and language-agnostic examples, see [Code Graders](/evaluators/code-graders/).
+For detailed patterns, input/output contracts, and language-agnostic examples, see [Code Graders](/docs/evaluators/code-graders/).

## Programmatic API

2 changes: 1 addition & 1 deletion apps/web/src/content/docs/evaluators/custom-assertions.mdx
@@ -18,7 +18,7 @@ AgentV provides two SDK functions for custom evaluation logic:

**Use `defineAssertion()`** when you want a named assertion type that can be referenced across eval files without specifying a command path. It uses a simplified result contract focused on `pass` and optional `score`.

-**Use `defineCodeGrader()`** when you need full control over scoring with explicit `assertions` arrays, or when the evaluator is a one-off grader tied to a specific eval. See [Code Graders](/evaluators/code-graders/) for details.
+**Use `defineCodeGrader()`** when you need full control over scoring with explicit `assertions` arrays, or when the evaluator is a one-off grader tied to a specific eval. See [Code Graders](/docs/evaluators/code-graders/) for details.

Both functions handle stdin/stdout JSON parsing, snake_case-to-camelCase conversion, Zod validation, and error handling automatically.

2 changes: 1 addition & 1 deletion apps/web/src/content/docs/evaluators/llm-graders.mdx
@@ -19,7 +19,7 @@ tests:
# No assertions needed — default llm-grader evaluates against criteria
```

-When `assertions` **is** present, no default grader is added. To use an LLM grader alongside other graders, declare it explicitly. See [How criteria and assertions interact](/evaluation/eval-cases/#how-criteria-and-assertions-interact).
+When `assertions` **is** present, no default grader is added. To use an LLM grader alongside other graders, declare it explicitly. See [How criteria and assertions interact](/docs/evaluation/eval-cases/#how-criteria-and-assertions-interact).

## Configuration

6 changes: 3 additions & 3 deletions apps/web/src/content/docs/getting-started/quickstart.mdx
@@ -67,7 +67,7 @@ Results appear in `.agentv/results/eval_<timestamp>.jsonl` with scores, reasoning

## Next Steps

-- Learn about [eval file formats](/evaluation/eval-files/)
-- Configure [targets](/targets/configuration/) for different providers
-- Create [custom evaluators](/evaluators/custom-evaluators/)
+- Learn about [eval file formats](/docs/evaluation/eval-files/)
+- Configure [targets](/docs/targets/configuration/) for different providers
+- Create [custom evaluators](/docs/evaluators/custom-evaluators/)
- If setup drifts, rerun: `agentv init`
2 changes: 1 addition & 1 deletion apps/web/src/content/docs/guides/git-cache-workspace.mdx
@@ -5,7 +5,7 @@ sidebar:
order: 3
---

-AgentV evaluations that use `workspace.repos` clone repositories directly from their source (git URL or local path) into a workspace directory. [Workspace pooling](/guides/workspace-pool/) (enabled by default) eliminates repeated clone costs by reusing materialized workspaces across runs.
+AgentV evaluations that use `workspace.repos` clone repositories directly from their source (git URL or local path) into a workspace directory. [Workspace pooling](/docs/guides/workspace-pool/) (enabled by default) eliminates repeated clone costs by reusing materialized workspaces across runs.

## Eval setup lifecycle

@@ -257,7 +257,7 @@ After converting, you can:
- Use `code-grader` for custom scoring logic
- Define `tool-trajectory` assertions to check tool usage patterns

-See [Skill Evals (evals.json)](/guides/agent-skills-evals/) for the full field mapping and side-by-side comparison.
+See [Skill Evals (evals.json)](/docs/guides/agent-skills-evals/) for the full field mapping and side-by-side comparison.

## Migration from Skill-Creator

2 changes: 1 addition & 1 deletion apps/web/src/content/docs/index.mdx
@@ -7,7 +7,7 @@ hero:
file: ../../assets/logo.svg
actions:
- text: Get Started
-link: /getting-started/introduction/
+link: /docs/getting-started/introduction/
icon: right-arrow
- text: GitHub
link: https://github.com/EntityProcess/agentv
2 changes: 1 addition & 1 deletion apps/web/src/content/docs/targets/coding-agents.mdx
@@ -20,7 +20,7 @@ When an eval test includes `type: file` inputs, agent providers do **not** receive

The agent is expected to read the files itself using its filesystem tools.

-This differs from [LLM providers](/targets/llm-providers), which receive file content embedded directly in the prompt as XML:
+This differs from [LLM providers](/docs/targets/llm-providers), which receive file content embedded directly in the prompt as XML:

```xml
<file path="src/example.ts">
2 changes: 1 addition & 1 deletion apps/web/src/content/docs/targets/configuration.mdx
@@ -162,7 +162,7 @@ Each hook config accepts:
}
```

-**Suite vs per-test:** When both are defined, test-level fields replace suite-level fields. See [Per-Test Workspace Config](/evaluation/eval-cases/#per-case-workspace-config) for examples.
+**Suite vs per-test:** When both are defined, test-level fields replace suite-level fields. See [Per-Test Workspace Config](/docs/evaluation/eval-cases/#per-case-workspace-config) for examples.

### Repository Lifecycle

2 changes: 1 addition & 1 deletion apps/web/src/content/docs/tools/convert.mdx
@@ -31,7 +31,7 @@ Outputs a `.eval.yaml` file alongside the input.
agentv convert evals.json
```

-Converts an [Agent Skills `evals.json`](/guides/agent-skills-evals) file into an AgentV EVAL YAML file. The converter:
+Converts an [Agent Skills `evals.json`](/docs/guides/agent-skills-evals) file into an AgentV EVAL YAML file. The converter:

- Maps `prompt` → `input` message array
- Maps `expected_output` → `expected_output`
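
Every hunk in this PR applies the same mechanical change: prefixing site-internal link paths with `/docs`. As a rough sketch of how such a rewrite could be applied in bulk (the `sed` invocation and sample text below are illustrative assumptions, not the method actually used in this PR):

```shell
# Illustrative only: prefix markdown link targets like (/evaluation/...)
# with /docs, limited to the doc sections touched by this PR so that
# external URLs and anchors are left alone.
echo 'See [Tests](/evaluation/eval-cases/) and [Targets](/targets/configuration/).' |
  sed -E 's#\(/(evaluation|evaluators|targets|guides|getting-started|tools)/#(/docs/\1/#g'
# prints: See [Tests](/docs/evaluation/eval-cases/) and [Targets](/docs/targets/configuration/).
```

In a real checkout this would run over the `.mdx` sources (for example via `find apps/web/src/content/docs -name '*.mdx' -exec sed -E -i ... {} +`), followed by a link check to confirm nothing outside the intended sections was rewritten.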