fix(shield): add P0-gate to /plan-review verdict + fix doc inconsistencies #63
Merged
…ncies

Brings /plan-review to parity with /prd-review's verdict semantics and resolves four file-level contradictions in the skill.

P0-gate (#4), the substantive fix:
- New shield/scripts/compute_plan_verdict.py computes the weighted composite, detects P0s (Critical-severity findings graded D/F), and applies the P0-gate: a composite >= 2.5 with any P0 is gated to "Needs Work", not "Ready". Previously the verdict was composite-only, so a Critical-F could be diluted across the 10 PM dims (then down-weighted to 0.7) and the plan still scored "Ready". prd-review/scoring.md was forked from plan-review WITH this gate added; it was never back-ported until now.
- The script is the single source of truth for persona weights (WEIGHTS), which resolves #3 (scoring.md omitted platform/backend engineers and used divergent persona names vs dimensions.md).

Eval (RED -> GREEN, committed):
- shield/evals/plan-review-verdict.yaml + 4 grades.json fixtures; run.py gains a `verdict` branch. RED: 0/4 (script absent). GREEN: 4/4, including high-composite-p0 -> "Needs Work (composite 3.61, blocked by 1 P0)".
- Existing plan-review-trd gates suite: 5/5, no regression.

Doc/consistency fixes:
- scoring.md (#3,#4): P0-gate verdict table, completed+renamed weight table, fixed overlapping verdict-label ranges, references the script for weights.
- templates.md (#1,#2,#6): replaced dead plan-review/{N}-{slug}/ paths with the date-keyed reviews/plan/{date}{_counter}/ paths; rewrote the Dispatch Prompt for Pattern A subagent_type dispatch (no inlined agent markdown); added a Deterministic Gates section to the summary template.
- SKILL.md (#4,#5): P0-gated verdict in description; new Step 1b source-plan.md immutable snapshot; scoring step now invokes compute_plan_verdict.py; documented source-plan.md + grades.json in the output tree.
- dimensions.md: weights reference the canonical script table.

Version: shield 2.26.0 -> 2.27.0 (marketplace.json).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
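
The gate described in the commit message can be sketched as follows. This is a minimal illustration, not the contents of `compute_plan_verdict.py`: the `Finding` type, the `is_p0` and `gated_verdict` names, and the label returned below the threshold are assumptions. Only the 2.5 threshold, the P0 definition (Critical severity graded D/F), and the "Needs Work (composite …, blocked by N P0)" wording come from the text above.

```python
from dataclasses import dataclass

READY_THRESHOLD = 2.5  # composite bar for "Ready", per the commit message


@dataclass(frozen=True)
class Finding:
    """Hypothetical finding record; the real grades.json schema is not shown."""
    severity: str  # e.g. "Critical"
    grade: str     # letter grade, "A" through "F"


def is_p0(finding: Finding) -> bool:
    # P0 = Critical-severity finding graded D or F.
    return finding.severity == "Critical" and finding.grade in {"D", "F"}


def gated_verdict(composite: float, findings: list[Finding]) -> str:
    p0_count = sum(is_p0(f) for f in findings)
    if composite < READY_THRESHOLD:
        # Assumed label: the source does not list the below-threshold verdicts.
        return "Needs Work"
    if p0_count > 0:
        # The gate: a high composite cannot outvote a P0.
        return f"Needs Work (composite {composite:.2f}, blocked by {p0_count} P0)"
    return "Ready"


if __name__ == "__main__":
    print(gated_verdict(3.61, [Finding("Critical", "F")]))
    print(gated_verdict(3.61, [Finding("High", "F")]))
```

Checking the P0 count after the threshold test keeps the composite as the first filter and the gate as a veto on top of it, which matches the "composite >= 2.5 with any P0" phrasing.
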
Summary

Brings `/plan-review` to parity with `/prd-review`'s verdict semantics and fixes four file-level contradictions discovered while comparing the two skills.

The core fix is the P0-gate: `prd-review/scoring.md` was explicitly forked from `plan-review/scoring.md` with a P0-gate added, but the gate was never back-ported. So a plan with a Critical-severity D/F finding could still score Ready, because that F got averaged across the 10 PM dims and then down-weighted to 0.7 (the "averaging problem"). `/plan-review` computed P0s but never let them gate the verdict.

Changes

- `shield/scripts/compute_plan_verdict.py` computes composite + P0 count and applies the gate (composite ≥ 2.5 with any P0 → Needs Work). Skill scoring step now invokes it.
- `scoring.md` weight table incomplete + divergent persona names → `WEIGHTS` is now the single source of truth; `scoring.md`/`dimensions.md` reference it. Added platform/backend engineers; fixed `Cloud Architect`/`Operations` naming; fixed overlapping verdict-label ranges.
- `templates.md` reported the dead `plan-review/{N}-{slug}/` path → `reviews/plan/{date}{_counter}/`.
- `templates.md` Dispatch Prompt used the pre-restructure inlined-persona pattern → `subagent_type` dispatch.
- … `source-prd.md`) → `source-plan.md`.

Eval coverage (per `updating-plugin-assets`)

New deterministic suite: `shield/evals/plan-review-verdict.yaml` (+ 4 `grades.json` fixtures), run via `uv run shield/evals/run.py plan-review-verdict`.

- RED (script absent): `0/4 cases passed.`
- GREEN: `4/4 cases passed`, including `high-composite-p0` → `Needs Work (composite 3.61, blocked by 1 P0)`, the exact bug.
- `plan-review-trd` gates suite still `5/5`.

Note for reviewers

This makes the verdict script-computed rather than LLM-computed, a deliberate execution-model change that makes the gate deterministic and testable, matching the existing `run.py` + `check_plan_review_trd.py` pattern. If preferred, the script can instead stay an eval-only oracle with the verdict left to the LLM (small revert of the SKILL.md scoring step).

Version: shield `2.26.0 → 2.27.0`.

🤖 Generated with Claude Code
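
To make the "averaging problem" from the Summary concrete, here is a small worked sketch. Everything in it other than the 10 PM dimensions, the 0.7 PM weight, and the 2.5 threshold is an assumption for illustration: the 4-point grade scale, the second persona and its weight, and the normalisation by total weight are hypothetical and may differ from what `compute_plan_verdict.py` actually does.

```python
# Assumed grade-point scale; the skill's real mapping is not shown in this PR.
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}


def persona_score(grades: list[str]) -> float:
    """Unweighted mean of one persona's dimension grades."""
    return sum(GRADE_POINTS[g] for g in grades) / len(grades)


def composite(grades_by_persona: dict[str, list[str]],
              weights: dict[str, float]) -> float:
    """Weighted mean of persona scores (assumed: normalised by total weight)."""
    total_weight = sum(weights[p] for p in grades_by_persona)
    weighted = sum(weights[p] * persona_score(g)
                   for p, g in grades_by_persona.items())
    return weighted / total_weight


# Hypothetical plan: the PM persona has 10 dimensions, one of them a Critical F.
grades = {
    "pm": ["A"] * 9 + ["F"],
    "architect": ["A", "B", "A", "B"],  # invented second persona
}
weights = {"pm": 0.7, "architect": 1.0}  # 0.7 is from the PR text; 1.0 is assumed

score = composite(grades, weights)
print(f"PM score: {persona_score(grades['pm']):.2f}")
print(f"composite: {score:.2f}, clears 2.5: {score >= 2.5}")
```

Under these assumptions the single F only lowers the PM persona to 3.6 out of 4, and the 0.7 weight shrinks its influence further, so a composite-only verdict still clears the 2.5 bar. That is the case the P0-gate is meant to catch.
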