From a92e221e559b9a71733a655d9e1c82dd1df6f04d Mon Sep 17 00:00:00 2001
From: Christopher Tso
Date: Sat, 28 Mar 2026 09:20:43 +1100
Subject: [PATCH] =?UTF-8?q?docs(agentv-bench):=20dispatch=20grader=20subag?=
 =?UTF-8?q?ents=20per=20(test=20=C3=97=20LLM=20grader)=20in=20parallel?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 plugins/agentv-dev/skills/agentv-bench/SKILL.md | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/plugins/agentv-dev/skills/agentv-bench/SKILL.md b/plugins/agentv-dev/skills/agentv-bench/SKILL.md
index f1b27e61..29cf8317 100644
--- a/plugins/agentv-dev/skills/agentv-bench/SKILL.md
+++ b/plugins/agentv-dev/skills/agentv-bench/SKILL.md
@@ -358,9 +358,7 @@ The agent reads `llm_graders/.json` for each test, grades the response usi
 }
 ```
 
-**Subagent environments (Claude Code):** Dispatch the `grader` subagent (read `agents/grader.md`) for this step.
-
-**Non-subagent environments (VS Code Copilot, Codex, etc.):** Perform LLM grading inline. Read each `llm_graders/.json`, grade the response against the `prompt_content` criteria, score 0.0–1.0 with evidence, and write the result to `llm_scores.json` in the run directory.
+Dispatch one `grader` subagent (read `agents/grader.md`) **per (test × LLM grader) pair**, all in parallel. For example, 5 tests × 2 LLM graders = 10 subagents launched simultaneously. Each subagent reads `/llm_graders/.json`, grades the corresponding `/response.md` against the `prompt_content` criteria, and returns its score (0.0–1.0) and assertions. After all subagents complete, merge their results into a single `llm_scores.json` in the run directory.
 
 **Note:** `pipeline bench` merges LLM scores into `index.jsonl` with a full `scores[]` array per entry, matching the CLI-mode schema. The web dashboard (`agentv results serve`) reads this format directly — no separate conversion script is needed. Run `agentv results validate ` to verify compatibility.
 
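
The fan-out/merge shape the patch documents — one grading task per (test × LLM grader) pair, all dispatched at once, results merged into a single `llm_scores.json`-style structure — can be sketched as below. This is an illustrative sketch only: the `grade` stub, the task shape, and the merged-dict layout are assumptions, not the agentv-bench implementation or its subagent protocol.

```python
# Illustrative sketch of the per-(test × grader) parallel dispatch described
# in the patch. grade() is a stand-in for a grader subagent; the real one
# reads the grader spec and response from the run directory.
import json
from concurrent.futures import ThreadPoolExecutor

tests = ["test_a", "test_b", "test_c", "test_d", "test_e"]
graders = ["accuracy", "style"]  # hypothetical grader names


def grade(test_id: str, grader_id: str) -> dict:
    # Stand-in grader: returns a score in [0.0, 1.0] plus assertions.
    return {"test": test_id, "grader": grader_id, "score": 1.0, "assertions": []}


# 5 tests × 2 LLM graders = 10 tasks launched simultaneously.
pairs = [(t, g) for t in tests for g in graders]
with ThreadPoolExecutor(max_workers=len(pairs)) as pool:
    results = list(pool.map(lambda p: grade(*p), pairs))

# After all tasks complete, merge into one llm_scores.json-style mapping.
merged: dict[str, list[dict]] = {}
for r in results:
    merged.setdefault(r["test"], []).append(
        {"grader": r["grader"], "score": r["score"], "assertions": r["assertions"]}
    )
print(json.dumps(merged, indent=2))
```

Running all pairs concurrently (rather than grading inline, one at a time) is what the patch trades the old per-environment instructions for: every pair is independent, so total wall-clock time is bounded by the slowest single grading call.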