diff --git a/apps/web/src/content/docs/evaluation/eval-cases.mdx b/apps/web/src/content/docs/evaluation/eval-cases.mdx
index 539cdc788..afe5cae07 100644
--- a/apps/web/src/content/docs/evaluation/eval-cases.mdx
+++ b/apps/web/src/content/docs/evaluation/eval-cases.mdx
@@ -51,7 +51,7 @@ input:
 content: What is 15 + 27?
 ```
 
-When suite-level `input` is defined in the eval file, those messages are prepended to the test's input. See [Suite-level Input](/evaluation/eval-files/#suite-level-input).
+When suite-level `input` is defined in the eval file, those messages are prepended to the test's input. See [Suite-level Input](/docs/evaluation/eval-files/#suite-level-input).
 
 ## Expected Output
 
@@ -137,7 +137,7 @@ tests:
 # Inherits suite-level hooks.before_all
 ```
 
-See [Workspace Lifecycle Hooks](/targets/configuration/#workspace-lifecycle-hooks) for the full workspace config reference.
+See [Workspace Lifecycle Hooks](/docs/targets/configuration/#workspace-lifecycle-hooks) for the full workspace config reference.
 
 ## Per-Case Metadata
 
diff --git a/apps/web/src/content/docs/evaluation/eval-files.mdx b/apps/web/src/content/docs/evaluation/eval-files.mdx
index 41c03eb97..cdfbaf9ec 100644
--- a/apps/web/src/content/docs/evaluation/eval-files.mdx
+++ b/apps/web/src/content/docs/evaluation/eval-files.mdx
@@ -35,7 +35,7 @@ tests:
 | `description` | Human-readable description of the evaluation |
 | `dataset` | Optional dataset identifier |
 | `execution` | Default execution config (`target`, `fail_on_error`, `threshold`, etc.) |
-| `workspace` | Suite-level workspace config — inline object or string path to an [external workspace file](/guides/workspace-pool/#external-workspace-config) |
+| `workspace` | Suite-level workspace config — inline object or string path to an [external workspace file](/docs/guides/workspace-pool/#external-workspace-config) |
 | `tests` | Array of individual tests, or a string path to an external file |
 | `assertions` | Suite-level evaluators appended to each test unless `execution.skip_defaults: true` is set on the test |
 | `input` | Suite-level input messages prepended to each test's input unless `execution.skip_defaults: true` is set on the test |
@@ -88,7 +88,7 @@ tests:
 input: Check API health
 ```
 
-`assertions` supports all evaluator types, including deterministic assertion types (`contains`, `regex`, `is_json`, `equals`) and `rubrics`. See [Tests](/evaluation/eval-cases/#per-test-assertions) for per-test assertions usage.
+`assertions` supports all evaluator types, including deterministic assertion types (`contains`, `regex`, `is_json`, `equals`) and `rubrics`. See [Tests](/docs/evaluation/eval-cases/#per-test-assertions) for per-test assertions usage.
 
 ### Suite-level Input
 
diff --git a/apps/web/src/content/docs/evaluation/sdk.mdx b/apps/web/src/content/docs/evaluation/sdk.mdx
index e7e912661..3bddcbb7c 100644
--- a/apps/web/src/content/docs/evaluation/sdk.mdx
+++ b/apps/web/src/content/docs/evaluation/sdk.mdx
@@ -90,7 +90,7 @@ export default defineCodeGrader(({ trace, outputText }) => ({
 
 `defineCodeGrader` graders are referenced in YAML with `type: code-grader` and `command: [bun, run, grader.ts]`. `defineAssertion` uses convention-based discovery instead — just place in `.agentv/assertions/` and reference by name.
 
-For detailed patterns, input/output contracts, and language-agnostic examples, see [Code Graders](/evaluators/code-graders/).
+For detailed patterns, input/output contracts, and language-agnostic examples, see [Code Graders](/docs/evaluators/code-graders/).
 
 ## Programmatic API
 
diff --git a/apps/web/src/content/docs/evaluators/custom-assertions.mdx b/apps/web/src/content/docs/evaluators/custom-assertions.mdx
index 00ec39417..5987a0dae 100644
--- a/apps/web/src/content/docs/evaluators/custom-assertions.mdx
+++ b/apps/web/src/content/docs/evaluators/custom-assertions.mdx
@@ -18,7 +18,7 @@ AgentV provides two SDK functions for custom evaluation logic:
 
 **Use `defineAssertion()`** when you want a named assertion type that can be referenced across eval files without specifying a command path. It uses a simplified result contract focused on `pass` and optional `score`.
 
-**Use `defineCodeGrader()`** when you need full control over scoring with explicit `assertions` arrays, or when the evaluator is a one-off grader tied to a specific eval. See [Code Graders](/evaluators/code-graders/) for details.
+**Use `defineCodeGrader()`** when you need full control over scoring with explicit `assertions` arrays, or when the evaluator is a one-off grader tied to a specific eval. See [Code Graders](/docs/evaluators/code-graders/) for details.
 
 Both functions handle stdin/stdout JSON parsing, snake_case-to-camelCase conversion, Zod validation, and error handling automatically.
 
diff --git a/apps/web/src/content/docs/evaluators/llm-graders.mdx b/apps/web/src/content/docs/evaluators/llm-graders.mdx
index 7df1546fc..9e17027c8 100644
--- a/apps/web/src/content/docs/evaluators/llm-graders.mdx
+++ b/apps/web/src/content/docs/evaluators/llm-graders.mdx
@@ -19,7 +19,7 @@ tests:
 # No assertions needed — default llm-grader evaluates against criteria
 ```
 
-When `assertions` **is** present, no default grader is added. To use an LLM grader alongside other graders, declare it explicitly. See [How criteria and assertions interact](/evaluation/eval-cases/#how-criteria-and-assertions-interact).
+When `assertions` **is** present, no default grader is added. To use an LLM grader alongside other graders, declare it explicitly. See [How criteria and assertions interact](/docs/evaluation/eval-cases/#how-criteria-and-assertions-interact).
 
 ## Configuration
 
diff --git a/apps/web/src/content/docs/getting-started/quickstart.mdx b/apps/web/src/content/docs/getting-started/quickstart.mdx
index d9f99b68c..81b35127c 100644
--- a/apps/web/src/content/docs/getting-started/quickstart.mdx
+++ b/apps/web/src/content/docs/getting-started/quickstart.mdx
@@ -67,7 +67,7 @@ Results appear in `.agentv/results/eval_.jsonl` with scores, reasonin
 
 ## Next Steps
 
-- Learn about [eval file formats](/evaluation/eval-files/)
-- Configure [targets](/targets/configuration/) for different providers
-- Create [custom evaluators](/evaluators/custom-evaluators/)
+- Learn about [eval file formats](/docs/evaluation/eval-files/)
+- Configure [targets](/docs/targets/configuration/) for different providers
+- Create [custom evaluators](/docs/evaluators/custom-evaluators/)
 - If setup drifts, rerun: `agentv init`
diff --git a/apps/web/src/content/docs/guides/git-cache-workspace.mdx b/apps/web/src/content/docs/guides/git-cache-workspace.mdx
index a78e6124c..3079c151d 100644
--- a/apps/web/src/content/docs/guides/git-cache-workspace.mdx
+++ b/apps/web/src/content/docs/guides/git-cache-workspace.mdx
@@ -5,7 +5,7 @@ sidebar:
   order: 3
 ---
 
-AgentV evaluations that use `workspace.repos` clone repositories directly from their source (git URL or local path) into a workspace directory. [Workspace pooling](/guides/workspace-pool/) (enabled by default) eliminates repeated clone costs by reusing materialized workspaces across runs.
+AgentV evaluations that use `workspace.repos` clone repositories directly from their source (git URL or local path) into a workspace directory. [Workspace pooling](/docs/guides/workspace-pool/) (enabled by default) eliminates repeated clone costs by reusing materialized workspaces across runs.
 
 ## Eval setup lifecycle
 
diff --git a/apps/web/src/content/docs/guides/skill-improvement-workflow.mdx b/apps/web/src/content/docs/guides/skill-improvement-workflow.mdx
index 6843e5b81..53f937e32 100644
--- a/apps/web/src/content/docs/guides/skill-improvement-workflow.mdx
+++ b/apps/web/src/content/docs/guides/skill-improvement-workflow.mdx
@@ -257,7 +257,7 @@ After converting, you can:
 - Use `code-grader` for custom scoring logic
 - Define `tool-trajectory` assertions to check tool usage patterns
 
-See [Skill Evals (evals.json)](/guides/agent-skills-evals/) for the full field mapping and side-by-side comparison.
+See [Skill Evals (evals.json)](/docs/guides/agent-skills-evals/) for the full field mapping and side-by-side comparison.
 
 ## Migration from Skill-Creator
 
diff --git a/apps/web/src/content/docs/index.mdx b/apps/web/src/content/docs/index.mdx
index a9748102c..79de69d31 100644
--- a/apps/web/src/content/docs/index.mdx
+++ b/apps/web/src/content/docs/index.mdx
@@ -7,7 +7,7 @@ hero:
     file: ../../assets/logo.svg
   actions:
     - text: Get Started
-      link: /getting-started/introduction/
+      link: /docs/getting-started/introduction/
       icon: right-arrow
     - text: GitHub
       link: https://github.com/EntityProcess/agentv
diff --git a/apps/web/src/content/docs/targets/coding-agents.mdx b/apps/web/src/content/docs/targets/coding-agents.mdx
index eba7107a0..ae5ac3002 100644
--- a/apps/web/src/content/docs/targets/coding-agents.mdx
+++ b/apps/web/src/content/docs/targets/coding-agents.mdx
@@ -20,7 +20,7 @@ When an eval test includes `type: file` inputs, agent providers do **not** recei
 
 The agent is expected to read the files itself using its filesystem tools.
 
-This differs from [LLM providers](/targets/llm-providers), which receive file content embedded directly in the prompt as XML:
+This differs from [LLM providers](/docs/targets/llm-providers), which receive file content embedded directly in the prompt as XML:
 
 ```xml
diff --git a/apps/web/src/content/docs/targets/configuration.mdx b/apps/web/src/content/docs/targets/configuration.mdx
index fc09001a3..31fd57bb2 100644
--- a/apps/web/src/content/docs/targets/configuration.mdx
+++ b/apps/web/src/content/docs/targets/configuration.mdx
@@ -162,7 +162,7 @@ Each hook config accepts:
 }
 ```
 
-**Suite vs per-test:** When both are defined, test-level fields replace suite-level fields. See [Per-Test Workspace Config](/evaluation/eval-cases/#per-case-workspace-config) for examples.
+**Suite vs per-test:** When both are defined, test-level fields replace suite-level fields. See [Per-Test Workspace Config](/docs/evaluation/eval-cases/#per-case-workspace-config) for examples.
 
 ### Repository Lifecycle
 
diff --git a/apps/web/src/content/docs/tools/convert.mdx b/apps/web/src/content/docs/tools/convert.mdx
index aeac6640f..24ad82814 100644
--- a/apps/web/src/content/docs/tools/convert.mdx
+++ b/apps/web/src/content/docs/tools/convert.mdx
@@ -31,7 +31,7 @@ Outputs a `.eval.yaml` file alongside the input.
 agentv convert evals.json
 ```
 
-Converts an [Agent Skills `evals.json`](/guides/agent-skills-evals) file into an AgentV EVAL YAML file. The converter:
+Converts an [Agent Skills `evals.json`](/docs/guides/agent-skills-evals) file into an AgentV EVAL YAML file. The converter:
 
 - Maps `prompt` → `input` message array
 - Maps `expected_output` → `expected_output`