From a92e221e559b9a71733a655d9e1c82dd1df6f04d Mon Sep 17 00:00:00 2001
From: Christopher Tso
Date: Sat, 28 Mar 2026 09:20:43 +1100
Subject: [PATCH] =?UTF-8?q?docs(agentv-bench):=20dispatch=20grader=20subag?=
 =?UTF-8?q?ents=20per=20(test=20=C3=97=20LLM=20grader)=20in=20parallel?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 plugins/agentv-dev/skills/agentv-bench/SKILL.md | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/plugins/agentv-dev/skills/agentv-bench/SKILL.md b/plugins/agentv-dev/skills/agentv-bench/SKILL.md
index f1b27e61..29cf8317 100644
--- a/plugins/agentv-dev/skills/agentv-bench/SKILL.md
+++ b/plugins/agentv-dev/skills/agentv-bench/SKILL.md
@@ -358,9 +358,7 @@ The agent reads `llm_graders/.json` for each test, grades the response usi
 }
 ```
 
-**Subagent environments (Claude Code):** Dispatch the `grader` subagent (read `agents/grader.md`) for this step.
-
-**Non-subagent environments (VS Code Copilot, Codex, etc.):** Perform LLM grading inline. Read each `llm_graders/.json`, grade the response against the `prompt_content` criteria, score 0.0–1.0 with evidence, and write the result to `llm_scores.json` in the run directory.
+Dispatch one `grader` subagent (read `agents/grader.md`) **per (test × LLM grader) pair**, all in parallel. For example, 5 tests × 2 LLM graders = 10 subagents launched simultaneously. Each subagent reads `/llm_graders/.json`, grades the corresponding `/response.md` against the `prompt_content` criteria, and returns its score (0.0–1.0) and assertions. After all subagents complete, merge their results into a single `llm_scores.json` in the run directory.
 
 **Note:** `pipeline bench` merges LLM scores into `index.jsonl` with a full `scores[]` array per entry, matching the CLI-mode schema. The web dashboard (`agentv results serve`) reads this format directly — no separate conversion script is needed. Run `agentv results validate ` to verify compatibility.
 
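
The fan-out/merge shape the patch documents — one grading task per (test × LLM grader) pair, all dispatched at once, results merged into a single `llm_scores.json`-style structure — can be sketched as below. This is an illustrative sketch only: the `grade` stub, the task shape, and the merged-dict layout are assumptions, not the agentv-bench implementation or its subagent protocol.

```python
# Illustrative sketch of the per-(test × grader) parallel dispatch described
# in the patch. grade() is a stand-in for a grader subagent; the real one
# reads the grader spec and response from the run directory.
import json
from concurrent.futures import ThreadPoolExecutor

tests = ["test_a", "test_b", "test_c", "test_d", "test_e"]
graders = ["accuracy", "style"]  # hypothetical grader names


def grade(test_id: str, grader_id: str) -> dict:
    # Stand-in grader: returns a score in [0.0, 1.0] plus assertions.
    return {"test": test_id, "grader": grader_id, "score": 1.0, "assertions": []}


# 5 tests × 2 LLM graders = 10 tasks launched simultaneously.
pairs = [(t, g) for t in tests for g in graders]
with ThreadPoolExecutor(max_workers=len(pairs)) as pool:
    results = list(pool.map(lambda p: grade(*p), pairs))

# After all tasks complete, merge into one llm_scores.json-style mapping.
merged: dict[str, list[dict]] = {}
for r in results:
    merged.setdefault(r["test"], []).append(
        {"grader": r["grader"], "score": r["score"], "assertions": r["assertions"]}
    )
print(json.dumps(merged, indent=2))
```

Running all pairs concurrently (rather than grading inline, one at a time) is what the patch trades the old per-environment instructions for: every pair is independent, so total wall-clock time is bounded by the slowest single grading call.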