From de497304ad0d73864bb623656325d057a05ff910 Mon Sep 17 00:00:00 2001 From: ashwinimanoj Date: Thu, 4 Jun 2026 22:40:00 +0530 Subject: [PATCH] fix(shield): add P0-gate to /plan-review verdict + fix doc inconsistencies MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Brings /plan-review to parity with /prd-review's verdict semantics and resolves four file-level contradictions in the skill. P0-gate (#4) — the substantive fix: - New shield/scripts/compute_plan_verdict.py computes the weighted composite, detects P0s (Critical-severity findings graded D/F), and applies the P0-gate: a composite >= 2.5 with any P0 is gated to "Needs Work", not "Ready". Previously the verdict was composite-only, so a Critical-F could be diluted across the 10 PM dims (then down-weighted to 0.7) and the plan still scored "Ready". prd-review/scoring.md was forked from plan-review WITH this gate added; it was never back-ported until now. - The script is the single source of truth for persona weights (WEIGHTS), which resolves #3 (scoring.md omitted platform/backend engineers and used divergent persona names vs dimensions.md). Eval (RED -> GREEN, committed): - shield/evals/plan-review-verdict.yaml + 4 grades.json fixtures; run.py gains a `verdict` branch. RED: 0/4 (script absent). GREEN: 4/4, including high-composite-p0 -> "Needs Work (composite 3.61, blocked by 1 P0)". - Existing plan-review-trd gates suite: 5/5, no regression. Doc/consistency fixes: - scoring.md (#3,#4): P0-gate verdict table, completed+renamed weight table, fixed overlapping verdict-label ranges, references the script for weights. - templates.md (#1,#2,#6): replaced dead plan-review/{N}-{slug}/ paths with the date-keyed reviews/plan/{date}{_counter}/ paths; rewrote the Dispatch Prompt for Pattern A subagent_type dispatch (no inlined agent markdown); added a Deterministic Gates section to the summary template. - SKILL.md (#4,#5): P0-gated verdict in description; new Step 1b source-plan.md immutable snapshot; scoring step now invokes compute_plan_verdict.py; documented source-plan.md + grades.json in the output tree. - dimensions.md: weights reference the canonical script table. Version: shield 2.26.0 -> 2.27.0 (marketplace.json). Co-Authored-By: Claude Opus 4.8 (1M context) --- .claude-plugin/marketplace.json | 2 +- shield/evals/plan-review-verdict.yaml | 47 ++++++ .../fixtures/clean-ready/grades.json | 12 ++ .../fixtures/high-composite-p0/grades.json | 12 ++ .../fixtures/needs-work-threshold/grades.json | 11 ++ .../fixtures/not-ready/grades.json | 9 ++ shield/evals/run.py | 26 +++ shield/scripts/compute_plan_verdict.py | 152 ++++++++++++++++++ shield/skills/general/plan-review/SKILL.md | 38 ++++- .../skills/general/plan-review/dimensions.md | 7 +- shield/skills/general/plan-review/scoring.md | 52 ++++-- .../skills/general/plan-review/templates.md | 50 +++--- 12 files changed, 373 insertions(+), 45 deletions(-) create mode 100644 shield/evals/plan-review-verdict.yaml create mode 100644 shield/evals/plan-review-verdict/fixtures/clean-ready/grades.json create mode 100644 shield/evals/plan-review-verdict/fixtures/high-composite-p0/grades.json create mode 100644 shield/evals/plan-review-verdict/fixtures/needs-work-threshold/grades.json create mode 100644 shield/evals/plan-review-verdict/fixtures/not-ready/grades.json create mode 100755 shield/scripts/compute_plan_verdict.py diff --git a/.claude-plugin/marketplace.json b/.claude-plugin/marketplace.json index 3ab06b81..e17b31a8 100644 --- a/.claude-plugin/marketplace.json +++ b/.claude-plugin/marketplace.json @@ -9,7 +9,7 @@ { "name": "shield", "description": "Unified SDLC plugin \u2014 research, planning, PM integration, implementation, and continuous review with multi-domain support and specialist agents", - "version": "2.26.0", + "version": "2.27.0", "source": "./shield", "category": "development" }, diff --git a/shield/evals/plan-review-verdict.yaml b/shield/evals/plan-review-verdict.yaml new file mode 100644 index 00000000..23db3246 --- /dev/null +++ b/shield/evals/plan-review-verdict.yaml @@ -0,0 +1,47 @@ +# shield/evals/plan-review-verdict.yaml +# Eval suite for /plan-review verdict computation (composite + P0-gate). +# Invoked by `uv run shield/evals/run.py plan-review-verdict`. +# +# Each fixture is a grades.json holding aggregated per-persona grades plus the +# classified findings. compute_plan_verdict.py turns those into a composite +# score and applies the P0-gate. The `verdict` expectation lists substrings +# that must ALL appear in the script's stdout. + +name: plan-review-verdict +description: > + Asserts that /plan-review's verdict logic applies the P0-gate: a high + composite with a Critical D/F finding is gated to Needs Work, while a clean + high composite is Ready and a weak plan is Not Ready. + +cases: + # Strong grades, no Critical D/F → Ready. + - name: clean-ready + fixture: fixtures/clean-ready + expect: + verdict: + - "Ready" + - "composite 3.6" + + # THE bug today: composite >= 2.5 but a Critical-F is present. Without the + # P0-gate this scored "Ready"; the gate must flip it to Needs Work. + - name: high-composite-p0 + fixture: fixtures/high-composite-p0 + expect: + verdict: + - "Needs Work" + - "blocked by 1 P0" + + # Mid composite, no P0 (the lone finding is Important/C = P1) → Needs Work, + # and crucially NOT "blocked by" (distinguishes gate from threshold). + - name: needs-work-threshold + fixture: fixtures/needs-work-threshold + expect: + verdict: + - "Needs Work" + + # Weak grades across the board → Not Ready on composite alone. + - name: not-ready + fixture: fixtures/not-ready + expect: + verdict: + - "Not Ready" diff --git a/shield/evals/plan-review-verdict/fixtures/clean-ready/grades.json b/shield/evals/plan-review-verdict/fixtures/clean-ready/grades.json new file mode 100644 index 00000000..89fdd015 --- /dev/null +++ b/shield/evals/plan-review-verdict/fixtures/clean-ready/grades.json @@ -0,0 +1,12 @@ +{ + "personas": [ + { "name": "architect", "grade": "A" }, + { "name": "security-engineer", "grade": "A" }, + { "name": "dx-engineer", "grade": "B" }, + { "name": "agile-coach", "grade": "A" }, + { "name": "product-manager", "grade": "B" } + ], + "findings": [ + { "id": "PM6", "severity": "Warning", "grade": "C" } + ] +} diff --git a/shield/evals/plan-review-verdict/fixtures/high-composite-p0/grades.json b/shield/evals/plan-review-verdict/fixtures/high-composite-p0/grades.json new file mode 100644 index 00000000..700b1e57 --- /dev/null +++ b/shield/evals/plan-review-verdict/fixtures/high-composite-p0/grades.json @@ -0,0 +1,12 @@ +{ + "personas": [ + { "name": "architect", "grade": "A" }, + { "name": "security-engineer", "grade": "A" }, + { "name": "dx-engineer", "grade": "B" }, + { "name": "agile-coach", "grade": "A" }, + { "name": "product-manager", "grade": "B" } + ], + "findings": [ + { "id": "PM2", "severity": "Critical", "grade": "F" } + ] +} diff --git a/shield/evals/plan-review-verdict/fixtures/needs-work-threshold/grades.json b/shield/evals/plan-review-verdict/fixtures/needs-work-threshold/grades.json new file mode 100644 index 00000000..f86f9263 --- /dev/null +++ b/shield/evals/plan-review-verdict/fixtures/needs-work-threshold/grades.json @@ -0,0 +1,11 @@ +{ + "personas": [ + { "name": "architect", "grade": "C" }, + { "name": "security-engineer", "grade": "C" }, + { "name": "dx-engineer", "grade": "B" }, + { "name": "product-manager", "grade": "C" } + ], + "findings": [ + { "id": "PM4", "severity": "Important", "grade": "C" } + ] +} diff --git a/shield/evals/plan-review-verdict/fixtures/not-ready/grades.json b/shield/evals/plan-review-verdict/fixtures/not-ready/grades.json new file mode 100644 index 00000000..87a62ba8 --- /dev/null +++ b/shield/evals/plan-review-verdict/fixtures/not-ready/grades.json @@ -0,0 +1,9 @@ +{ + "personas": [ + { "name": "architect", "grade": "D" }, + { "name": "security-engineer", "grade": "F" }, + { "name": "dx-engineer", "grade": "D" }, + { "name": "product-manager", "grade": "D" } + ], + "findings": [] +} diff --git a/shield/evals/run.py b/shield/evals/run.py index 7c957432..9eb0a450 100755 --- a/shield/evals/run.py +++ b/shield/evals/run.py @@ -36,6 +36,7 @@ VALIDATE_TRD = SCRIPTS_DIR / "validate_trd.py" VALIDATE_PLAN = SCRIPTS_DIR / "validate_plan.py" CHECK_PLAN_REVIEW_TRD = SCRIPTS_DIR / "check_plan_review_trd.py" +COMPUTE_PLAN_VERDICT = SCRIPTS_DIR / "compute_plan_verdict.py" def _run_validator(script: Path, target: Path) -> tuple[int, str]: @@ -125,6 +126,31 @@ def run_suite(suite_name: str, only_case: str | None = None, verbose: bool = Fal elif verbose: print(f" gates OK ({gates_expect})") + # plan-review verdict check (composite + P0-gate). The `verdict` + # expectation is one or more substrings that must ALL appear in the + # compute_plan_verdict.py stdout for grades.json. + verdict_expect = expect.get("verdict") + if verdict_expect is not None: + needles = [verdict_expect] if isinstance(verdict_expect, str) else list(verdict_expect) + grades_path = fixture_dir / "grades.json" + proc = subprocess.run( + [sys.executable, str(COMPUTE_PLAN_VERDICT), str(grades_path)], + capture_output=True, + text=True, + ) + stdout = proc.stdout.strip() + missing = [n for n in needles if n not in stdout] + if proc.returncode != 0: + failed_assertions.append( + f"verdict: script exit={proc.returncode} stderr={proc.stderr.strip()!r}" + ) + elif missing: + failed_assertions.append( + f"verdict: missing {missing!r} in stdout={stdout!r}" + ) + elif verbose: + print(f" verdict OK ({needles})") + if failed_assertions: print(f" FAIL {name}") for fa in failed_assertions: diff --git a/shield/scripts/compute_plan_verdict.py b/shield/scripts/compute_plan_verdict.py new file mode 100755 index 00000000..78662449 --- /dev/null +++ b/shield/scripts/compute_plan_verdict.py @@ -0,0 +1,152 @@ +#!/usr/bin/env python3 +# /// script +# requires-python = ">=3.11" +# /// +"""compute_plan_verdict.py — deterministic verdict for /plan-review. + +Turns aggregated per-persona grades plus the classified findings into a +composite readiness score and applies the P0-gate documented in +`shield/skills/general/plan-review/scoring.md`. This mechanizes the +"averaging problem" guard: a strong composite cannot hide a Critical-severity +D/F finding. + +This module is the SINGLE SOURCE OF TRUTH for plan-review persona weights. +`scoring.md` and `dimensions.md` reference this table rather than restating it. + +Input (JSON on stdin or a file path argument): + + { + "personas": [ {"name": "architect", "grade": "B"}, ... ], + "findings": [ {"id": "PM2", "severity": "Critical", "grade": "F"}, ... ] + } + +- `personas[].name` must be a known persona (see WEIGHTS); unknown names error. +- `personas[].grade` is the persona's aggregated letter grade (A-F). +- `findings[]` is the classified finding list; a P0 is any finding with + severity "Critical" graded D or F. + +Output (stdout), three stable lines: + + composite: 2.93 (B) + p0_count: 1 + verdict: Needs Work (composite 2.93, blocked by 1 P0) + +Exit codes: + 0 — verdict computed. + 2 — usage / input error (unknown persona, bad grade, malformed JSON). +""" +from __future__ import annotations + +import argparse +import json +import sys +from pathlib import Path +from typing import Any + +# Canonical persona weights. Mirrors dimensions.md / personas.md. Core = 1.0, +# supporting = 0.7. The PM persona weight (0.7) applies to the grade rolled up +# from the 10 PM dim subagents (PM1-PM10). +WEIGHTS: dict[str, float] = { + "architect": 1.0, + "security-engineer": 1.0, + "dx-engineer": 1.0, + "platform-engineer": 1.0, + "backend-engineer": 1.0, + "finops-analyst": 0.7, + "agile-coach": 0.7, + "sre": 0.7, + "product-manager": 0.7, +} + +GRADE_TO_NUM: dict[str, int] = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0} + + +def _letter(num: float) -> str: + """Map a numeric average to a letter using scoring.md's range table.""" + if num >= 3.5: + return "A" + if num >= 2.5: + return "B" + if num >= 1.5: + return "C" + if num >= 0.5: + return "D" + return "F" + + +def _composite(personas: list[dict[str, Any]]) -> float: + """Weighted average of activated persona grades. Denominator is the sum of + weights for personas that actually ran — not all of WEIGHTS.""" + num = 0.0 + denom = 0.0 + for p in personas: + name = p.get("name") + if name not in WEIGHTS: + raise ValueError(f"unknown persona: {name!r} (known: {sorted(WEIGHTS)})") + grade = (p.get("grade") or "").strip().upper() + if grade not in GRADE_TO_NUM: + raise ValueError(f"bad grade {grade!r} for persona {name!r}") + weight = WEIGHTS[name] + num += GRADE_TO_NUM[grade] * weight + denom += weight + if denom == 0: + raise ValueError("no activated personas — cannot compute composite") + return num / denom + + +def _p0_count(findings: list[dict[str, Any]]) -> int: + """P0 = grade D or F on a Critical-severity finding (scoring.md).""" + count = 0 + for f in findings: + severity = (f.get("severity") or "").strip().lower() + grade = (f.get("grade") or "").strip().upper() + if severity == "critical" and grade in {"D", "F"}: + count += 1 + return count + + +def _verdict(composite: float, p0_count: int) -> str: + """Composite + P0-gate. A high composite with any P0 is gated to Needs Work.""" + if composite < 1.5: + return f"Not Ready (composite {composite:.2f})" + if composite < 2.5: + return f"Needs Work (composite {composite:.2f})" + if p0_count > 0: + plural = "P0" if p0_count == 1 else "P0s" + return f"Needs Work (composite {composite:.2f}, blocked by {p0_count} {plural})" + return f"Ready (composite {composite:.2f})" + + +def compute(payload: dict[str, Any]) -> str: + personas = payload.get("personas") or [] + findings = payload.get("findings") or [] + composite = _composite(personas) + p0 = _p0_count(findings) + return ( + f"composite: {composite:.2f} ({_letter(composite)})\n" + f"p0_count: {p0}\n" + f"verdict: {_verdict(composite, p0)}" + ) + + +def main(argv: list[str]) -> int: + parser = argparse.ArgumentParser(description=__doc__.split("\n", 1)[0]) + parser.add_argument( + "grades_json", + nargs="?", + default=None, + help="Path to grades.json. Reads stdin when omitted.", + ) + args = parser.parse_args(argv) + try: + raw = Path(args.grades_json).read_text() if args.grades_json else sys.stdin.read() + payload = json.loads(raw) + print(compute(payload)) + except (ValueError, json.JSONDecodeError, OSError) as exc: + print(f"error: {exc}", file=sys.stderr) + return 2 + return 0 + + +if __name__ == "__main__": + sys.exit(main(sys.argv[1:])) diff --git a/shield/skills/general/plan-review/SKILL.md b/shield/skills/general/plan-review/SKILL.md index 53d73809..a3ed2781 100644 --- a/shield/skills/general/plan-review/SKILL.md +++ b/shield/skills/general/plan-review/SKILL.md @@ -1,6 +1,6 @@ --- name: plan-review -description: Use when a plan, architecture doc, or execution plan exists and needs expert review before implementation. Triggers on /plan-review, review my plan, document review. +description: Use when a plan, architecture doc, or execution plan exists and needs expert review before implementation. Produces a scored analysis with a P0-gated verdict and an enhanced plan. Triggers on /plan-review, review my plan, document review. --- # Plan Review @@ -14,6 +14,8 @@ All review output goes into a per-run, date-keyed folder under the feature's `re ``` {output_dir}/{feature}/reviews/plan/{date}{_counter}/ ← {review_dir} ├── summary.md ← {review_summary} (scored analysis, main output) +├── source-plan.md ← immutable verbatim snapshot of the reviewed plan +├── grades.json ← aggregated persona grades + findings (verdict input) ├── enhanced-plan.md ← {review_enhanced} (enhanced plan with feedback applied) └── detailed/ └── .md ← {review_detailed} (one per dispatched agent) @@ -59,17 +61,26 @@ At startup, call execute-steps to register these steps. Execute them in order, u | 0i | `lld_draft_review` — apply the LLD structural rubric (missing always-on, missing forced subsection, vague TBDs in always-on, PoD lifted but vague) to every `docs/shield/{feature}/lld-*.md` draft | when feature-folder LLD drafts exist | Yes — High/Medium/Review depending on issue | | 1 | Load plan document | always | Yes | | 1a | Detect prior PRD in feature folder — read prd.meta.json if present | only if prd.meta.json exists | No | +| 1b | Snapshot the loaded plan to `{review_dir}/source-plan.md` (immutable) — see Source Snapshot | always | Yes | | 2 | Select reviewer personas | always | Yes | | 3 | Dispatch selected agents in parallel | always | Yes | -| 4 | Parse grades + calculate scores | always | Yes | +| 4 | Parse grades; compute composite + verdict via `compute_plan_verdict.py` (P0-gate) — see Collection & Scoring | always | Yes | | 5 | Generate enhanced plan | always | Yes | -| 6 | Write summary + detailed findings (gates 0a-0e flow in here as Critical findings) | always | Yes | +| 6 | Write summary + detailed findings (gates 0a-0i flow in here as Critical/High findings) | always | Yes | | 7 | Update manifest | always | Yes | ### Step 1a: Detect prior PRD If `{output_dir}/{feature}/prd.meta.json` exists (alongside `{prd}` = `{output_dir}/{feature}/prd.md`), read it. Use its `sections_present` and `type` to inform the plan-vs-PRD alignment check (future enhancement — for now, record it in `{review_summary}` as a "Source PRD" header line, e.g. `Source PRD: prd.md (type: standard, rubric: 1.2)`). This gives reviewers visibility into which PRD version the plan was built from. +### Step 1b: Source Snapshot + +Immediately after loading the plan (and before dispatch), copy the plan markdown verbatim to +`{review_dir}/source-plan.md`. This is an immutable snapshot — the enhanced plan is annotated +separately as `enhanced-plan.md`, never in place. The snapshot makes each date-keyed review +folder self-contained for audit (reviews never overwrite prior runs). Mirrors `/prd-review`'s +`source-prd.md`. When the plan source is HTML, snapshot the parsed markdown and note the origin. + ## Plan Input The skill reads plan data from (in priority order): @@ -379,14 +390,27 @@ After all agents return: templates expect. 3. **Per-persona grade** — average numeric grades (A=4, B=3, C=2, D=1, F=0) within each persona, round using ranges in `scoring.md`. -4. **Composite score** — weighted average using persona weights from `dimensions.md` (PM - persona is 0.7, applied to the aggregated PM grade), convert to verdict per `scoring.md` - thresholds. -5. **Classify recommendations** — P0/P1/P2 per severity rules in `scoring.md`. +4. **Classify recommendations** — P0/P1/P2 per severity rules in `scoring.md`. A P0 is any + finding graded D or F on a **Critical** severity evaluation point. Failed deterministic + gates (0a–0i) are also P0s. +5. **Compute composite + verdict deterministically** — do NOT compute the verdict by hand. + Build a `grades.json` payload — `{ "personas": [{"name": "", "grade": ""}, ...], + "findings": [{"id": "...", "severity": "Critical|Important|Warning", "grade": ""}, ...] }` + — and run: + + ```bash + uv run "$CLAUDE_PLUGIN_ROOT/scripts/compute_plan_verdict.py" {review_dir}/grades.json + ``` + + It returns the composite (with weights from its canonical `WEIGHTS` table), the P0 count, and + the verdict string — applying the P0-gate (a composite ≥ 2.5 with any P0 is gated to **Needs + Work**, not Ready). Use the emitted `verdict:` line verbatim in `summary.md`. ## Output Write to `{review_dir}` = `{output_dir}/{feature}/reviews/plan/{date}{_counter}/`: +- `source-plan.md` — immutable verbatim snapshot of the reviewed plan (Step 1b) +- `grades.json` — aggregated persona grades + classified findings (input to `compute_plan_verdict.py`) - `{review_summary}` (`summary.md`) — scored evaluation with consolidated recommendations - `{review_enhanced}` (`enhanced-plan.md`) — enhanced version of original plan with feedback applied - `{review_detailed}` (`detailed/.md`) — full output from each dispatched agent diff --git a/shield/skills/general/plan-review/dimensions.md b/shield/skills/general/plan-review/dimensions.md index c22ad11e..7811330b 100644 --- a/shield/skills/general/plan-review/dimensions.md +++ b/shield/skills/general/plan-review/dimensions.md @@ -48,9 +48,10 @@ These continue to dispatch as full persona agents — they are not decomposed in | Backend engineer | `shield:backend-engineer` | 1.0 | Application code, API design | | Security engineer | `shield:security-engineer` | 1.0 | Security posture, threat modeling | -Persona weights (used by `scoring.md` composite calculation) are unchanged from `personas.md`. -The PM persona weight is 0.7 — it now applies to the aggregated PM grade rolled up from the -10 dim subagents. +Persona weights are defined canonically in `shield/scripts/compute_plan_verdict.py` (`WEIGHTS`); +`scoring.md` and the table above reference that single source. The PM persona weight is 0.7 — it +applies to the aggregated PM grade rolled up from the 10 dim subagents. Platform Engineer and +Backend Engineer are weight 1.0 (Core); both are in the canonical table. ## Dispatch shape per pattern diff --git a/shield/skills/general/plan-review/scoring.md b/shield/skills/general/plan-review/scoring.md index 7a6b5f64..0ccf8047 100644 --- a/shield/skills/general/plan-review/scoring.md +++ b/shield/skills/general/plan-review/scoring.md @@ -26,17 +26,21 @@ Average all evaluation point grades for a persona, then round to the nearest let Weighted average of all activated persona grades. -**Persona Weights:** +**Persona Weights:** the canonical table lives in `shield/scripts/compute_plan_verdict.py` +(`WEIGHTS`). It is the single source of truth — `dimensions.md` and this file reference it +rather than restating values. For human reference, the current weights are: -| Persona | Weight | Role | +| Persona (subagent slug) | Weight | Role | |---------|--------|------| -| Cloud Architect | 1.0 | Core | -| Security Engineer | 1.0 | Core | -| DX Engineer | 1.0 | Core | -| Cost/FinOps | 0.7 | Supporting | -| Agile Coach | 0.7 | Supporting | -| Operations | 0.7 | Supporting | -| Product Manager | 0.7 | Supporting | +| `architect` | 1.0 | Core | +| `security-engineer` | 1.0 | Core | +| `dx-engineer` | 1.0 | Core | +| `platform-engineer` | 1.0 | Core | +| `backend-engineer` | 1.0 | Core | +| `finops-analyst` | 0.7 | Supporting | +| `agile-coach` | 0.7 | Supporting | +| `sre` | 0.7 | Supporting | +| `product-manager` | 0.7 | Supporting (applied to the grade rolled up from PM1-PM10) | **Formula:** @@ -44,15 +48,33 @@ Weighted average of all activated persona grades. composite = sum(persona_numeric_grade * weight) / sum(activated_weights) ``` -Only activated personas contribute to the composite. The denominator is the sum of weights for personas that actually ran — not all 7. +Only activated personas contribute to the composite. The denominator is the sum of weights +for personas that actually ran — not all nine. -## Verdict Thresholds +## Verdict — composite + P0-gate -| Composite Range | Verdict | +**Do not compute the verdict by hand.** Feed the aggregated persona grades and the classified +findings to `shield/scripts/compute_plan_verdict.py`; it returns the composite, the P0 count, +and the verdict string. The SKILL.md scoring step invokes it. + +The composite alone can hide a fatal gap (the "averaging problem"): enough strong personas can +drown out one F on a Critical dimension — and plan-review makes this worse by first averaging +the ten PM dims into one grade, then weighting that aggregate at only 0.7. **P0 presence GATES +the verdict.** + +| Condition | Verdict | |----------------|---------| -| A - B (>= 2.5) | Ready | -| B - C (1.5 - 2.4) | Needs Work | -| D - F (< 1.5) | Not Ready | +| Composite < 1.5 | **Not Ready** | +| Composite 1.5 – 2.4 | **Needs Work** | +| Composite ≥ 2.5 AND any P0 present | **Needs Work** (composite is informational; the P0 floor binds) | +| Composite ≥ 2.5 AND zero P0s | **Ready** | + +**Verdict line in `summary.md`** (verbatim from the script): +- With P0s: `Needs Work (composite 3.61, blocked by 1 P0)` +- Clean: `Ready (composite 3.61)` + +This is aligned with `/prd-review`'s P0-gate (`prd-review/scoring.md`) — same averaging-problem +guard, same gate semantics. ## Priority Classification diff --git a/shield/skills/general/plan-review/templates.md b/shield/skills/general/plan-review/templates.md index 8d421ee0..e4918b78 100644 --- a/shield/skills/general/plan-review/templates.md +++ b/shield/skills/general/plan-review/templates.md @@ -2,24 +2,21 @@ ## Dispatch Prompt -For each selected agent, use this prompt structure with the Agent tool: +Dispatch each selected reviewer via the `Agent` tool with `subagent_type` set to the agent's +slug (e.g. `shield:architect`, `shield:security-engineer`, or one of the PM1-PM10 dim subagents +— see `dimensions.md`). The subagent loads its OWN persona definition; do **not** inline the +agent markdown into the prompt. Pass only the plan path and the rubric pointer: ``` -You are reviewing a plan document as a specialized reviewer. +You are reviewing a plan document in plan-review mode. - -{full content of the agent's markdown file} - + +{path to the plan document — e.g. docs/shield/{feature}/plan.md or the source-plan.md snapshot} + - -{full content of the plan document} - - - -{full content of scoring.md} - - -Review the plan according to your persona's evaluation points. Grade each point A-F using the scoring rubric. Produce your output in the exact format specified in your persona's Output Format section. +Grade against your persona's evaluation points using the rubric at +`shield/skills/general/plan-review/scoring.md`. Produce output in the exact format +specified in your persona's Output Format section. Important: - Grade based on what the plan SAYS, not what you assume @@ -28,7 +25,8 @@ Important: - Recommendations must be actionable — not "improve this" but "add X to section Y" ``` -Use `subagent_type` matching the agent name (e.g., `shield:architect`) when available, otherwise use `general-purpose`. +Fall back to `subagent_type: general-purpose` only if the named subagent is unavailable; in +that case inline the agent's markdown file as a `` block. ## plan-review-summary.md Format @@ -40,6 +38,20 @@ Use `subagent_type` matching the agent name (e.g., `shield:architect`) when avai **Reviewers:** **Composite Score:** +## Deterministic Gates + +Results of gates 0a–0i (run before persona dispatch). List each as PASS or the named +Critical/High/Medium finding. A failed gate is a P0 regardless of persona grades. + +| Gate | Check | Result | +|------|-------|--------| +| 0a | plan.json schema (validate_plan.py) | PASS | +| 0b | TRD 14-section presence (validate_trd.py) | PASS | +| 0c | Stale design_refs[] anchors | PASS | +| 0d | PRD↔TRD duplication | PASS | +| 0e | Implementation-manual (TRD §7) | PASS | +| 0f–0i | touches_lld / registry / undocumented / LLD-draft | PASS | + ## Score Summary | Persona | Grade | Key Finding | @@ -109,10 +121,10 @@ After writing both files, summarize: > **P1 issues:** 5 (should fix for quality) > **P2 issues:** 3 (nice to have) > -> Files written: -> - `{feature}/plan-review/{N}-{slug}/summary.md` — full scored analysis -> - `{feature}/plan-review/{N}-{slug}/enhanced-plan.md` — enhanced plan with recommendations applied -> - `{feature}/plan-review/{N}-{slug}/detailed/.md` — per-agent detailed findings +> Files written (under `{review_dir}` = `{output_dir}/{feature}/reviews/plan/{date}{_counter}/`): +> - `summary.md` — full scored analysis +> - `enhanced-plan.md` — enhanced plan with recommendations applied +> - `detailed/.md` — per-agent detailed findings > > Review the analysis and enhanced plan. You can edit either file before proceeding. >