fix(shield): add P0-gate to /plan-review verdict + fix doc inconsistencies #63

Merged
ashwinimanoj merged 1 commit into main from fix/plan-review-p0-gate-and-consistency
Jun 4, 2026

Conversation

@ashwinimanoj

Contributor

Summary

Brings /plan-review to parity with /prd-review's verdict semantics and fixes four file-level contradictions discovered while comparing the two skills.

The core fix is the P0-gate: prd-review/scoring.md was explicitly forked from plan-review/scoring.md with a P0-gate added — but the gate was never back-ported. So a plan with a Critical-severity D/F finding could still score Ready, because that F got averaged across the 10 PM dims and then down-weighted to 0.7 (the "averaging problem"). /plan-review computed P0s but never let them gate the verdict.
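The averaging problem can be made concrete with a short calculation. This is only an illustrative sketch: the 4-point grade scale, the second persona, its weight, and all grades are invented for the example; only the 10 PM dimensions, the 0.7 weight, and the 2.5 figure come from this PR. Normalizing by the sum of weights is also an assumption about how the composite is formed.

```python
# Hypothetical 4-point scale; the real scale is defined in the skill's scoring docs.
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}

# Ten PM dimensions: nine A grades and one Critical-severity F (invented data).
pm_grades = ["A"] * 9 + ["F"]
pm_score = sum(GRADE_POINTS[g] for g in pm_grades) / len(pm_grades)

PM_WEIGHT = 0.7          # the down-weighting mentioned in the PR text
ENGINEER_WEIGHT = 1.0    # hypothetical weight for a second persona
engineer_score = 4.0     # hypothetical score for that persona

# Weighted average across personas (assumed normalization by total weight).
composite = (PM_WEIGHT * pm_score + ENGINEER_WEIGHT * engineer_score) / (
    PM_WEIGHT + ENGINEER_WEIGHT
)

print(round(pm_score, 2))   # 3.6
print(round(composite, 2))  # 3.84
print(composite >= 2.5)     # True: a composite-only verdict would not flag the F
```

Under these assumptions a single F moves the PM score only from 4.0 to 3.6, and the composite stays well above 2.5, which is why a composite-only verdict cannot surface the finding.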

Changes

| # | Issue | Fix |
|---|-------|-----|
| 4 | No P0-gate (vs prd-review) | New shield/scripts/compute_plan_verdict.py computes composite + P0 count and applies the gate (composite ≥ 2.5 with any P0 → Needs Work). Skill scoring step now invokes it. |
| 3 | scoring.md weight table incomplete + divergent persona names | Script WEIGHTS is now the single source of truth; scoring.md/dimensions.md reference it. Added platform/backend engineers; fixed Cloud Architect/Operations naming; fixed overlapping verdict-label ranges. |
| 1 | templates.md reported the dead plan-review/{N}-{slug}/ path | Replaced with date-keyed reviews/plan/{date}{_counter}/. |
| 2 | templates.md Dispatch Prompt used the pre-restructure inlined-persona pattern | Rewrote for Pattern A subagent_type dispatch. |
| 6 | Summary template predated gates 0a–0i | Added a Deterministic Gates section. |
| 5 | No immutable source snapshot (vs prd-review's source-prd.md) | New Step 1b writes source-plan.md. |
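The gate described for issue 4 could look roughly like the function below. This is a minimal sketch and not the contents of compute_plan_verdict.py: the function name, the `not_ready_below` cutoff of 1.5, and the assumption that 2.5 is also the Ready cutoff are all hypothetical. The message format mirrors the eval output quoted in this PR.

```python
def plan_verdict(
    composite: float,
    p0_count: int,
    ready_at: float = 2.5,         # assumed to match the 2.5 figure in the PR
    not_ready_below: float = 1.5,  # hypothetical lower cutoff
) -> str:
    """Map a composite score and a P0 count to a verdict label."""
    if composite < not_ready_below:
        return "Not Ready"
    if composite < ready_at:
        return "Needs Work"
    # P0-gate: a composite that would otherwise be Ready is blocked by any P0.
    if p0_count > 0:
        return f"Needs Work (composite {composite:.2f}, blocked by {p0_count} P0)"
    return "Ready"


print(plan_verdict(3.61, 1))  # Needs Work (composite 3.61, blocked by 1 P0)
print(plan_verdict(3.61, 0))  # Ready
```

Keeping the gate as a pure function of two numbers is what makes it straightforward to cover with deterministic fixtures.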

Eval coverage (per updating-plugin-assets)

New deterministic suite: shield/evals/plan-review-verdict.yaml (+ 4 grades.json fixtures), run via uv run shield/evals/run.py plan-review-verdict.

  • RED (script absent): 0/4 cases passed.
  • GREEN: 4/4 cases passed, including high-composite-p0 → Needs Work (composite 3.61, blocked by 1 P0), which is the exact bug.
  • No regression: existing plan-review-trd gates suite still 5/5.
=== eval suite: plan-review-verdict (4 cases) ===
  PASS clean-ready
  PASS high-composite-p0
  PASS needs-work-threshold
  PASS not-ready
=== 4/4 cases passed ===

Note for reviewers

This makes the verdict script-computed rather than LLM-computed — a deliberate execution-model change that makes the gate deterministic and testable, matching the existing run.py + check_plan_review_trd.py pattern. If preferred, the script can instead stay an eval-only oracle with the verdict left to the LLM (small revert of the SKILL.md scoring step).

Version: shield 2.26.0 → 2.27.0.

🤖 Generated with Claude Code

…ncies

Brings /plan-review to parity with /prd-review's verdict semantics and
resolves four file-level contradictions in the skill.

P0-gate (#4) — the substantive fix:
- New shield/scripts/compute_plan_verdict.py computes the weighted composite,
  detects P0s (Critical-severity findings graded D/F), and applies the P0-gate:
  a composite >= 2.5 with any P0 is gated to "Needs Work", not "Ready".
  Previously the verdict was composite-only, so a Critical-F could be diluted
  across the 10 PM dims (then down-weighted to 0.7) and the plan still scored
  "Ready". prd-review/scoring.md was forked from plan-review WITH this gate
  added; it was never back-ported until now.
- The script is the single source of truth for persona weights (WEIGHTS),
  which resolves #3 (scoring.md omitted platform/backend engineers and used
  divergent persona names vs dimensions.md).
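
P0 detection as defined above (Critical-severity findings graded D or F) can be sketched as a simple filter. The finding schema used here (`severity` and `grade` keys) and the sample data are hypothetical; the real grades.json layout is not shown in this PR.

```python
P0_GRADES = {"D", "F"}


def count_p0s(findings: list[dict]) -> int:
    """Count findings that are Critical severity AND graded D or F."""
    return sum(
        1
        for finding in findings
        if finding["severity"] == "Critical" and finding["grade"] in P0_GRADES
    )


# Invented sample findings, for illustration only.
sample = [
    {"dimension": "rollback", "severity": "Critical", "grade": "F"},  # P0
    {"dimension": "metrics", "severity": "Critical", "grade": "B"},   # good grade
    {"dimension": "naming", "severity": "Minor", "grade": "D"},       # not Critical
]
print(count_p0s(sample))  # 1
```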

Eval (RED -> GREEN, committed):
- shield/evals/plan-review-verdict.yaml + 4 grades.json fixtures; run.py gains
  a `verdict` branch. RED: 0/4 (script absent). GREEN: 4/4, including
  high-composite-p0 -> "Needs Work (composite 3.61, blocked by 1 P0)".
- Existing plan-review-trd gates suite: 5/5, no regression.

Doc/consistency fixes:
- scoring.md (#3,#4): P0-gate verdict table, completed+renamed weight table,
  fixed overlapping verdict-label ranges, references the script for weights.
- templates.md (#1,#2,#6): replaced dead plan-review/{N}-{slug}/ paths with the
  date-keyed reviews/plan/{date}{_counter}/ paths; rewrote the Dispatch Prompt
  for Pattern A subagent_type dispatch (no inlined agent markdown); added a
  Deterministic Gates section to the summary template.
- SKILL.md (#4,#5): P0-gated verdict in description; new Step 1b source-plan.md
  immutable snapshot; scoring step now invokes compute_plan_verdict.py;
  documented source-plan.md + grades.json in the output tree.
- dimensions.md: weights reference the canonical script table.
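
The date-keyed output path and the Step 1b snapshot could be implemented along these lines. This is a sketch under assumptions: the ISO date format, the `_2`, `_3` counter suffix, and both function names are hypothetical, since the PR only gives the reviews/plan/{date}{_counter}/ pattern and the source-plan.md file name.

```python
from datetime import date
from pathlib import Path


def next_review_dir(root: Path, day: date) -> Path:
    """Create and return reviews/plan/<date>, adding _2, _3, ... on collisions."""
    base = root / "reviews" / "plan"
    candidate = base / day.isoformat()
    counter = 2
    while candidate.exists():
        candidate = base / f"{day.isoformat()}_{counter}"
        counter += 1
    candidate.mkdir(parents=True)
    return candidate


def snapshot_plan(plan_path: Path, review_dir: Path) -> Path:
    """Copy the plan to source-plan.md, refusing to overwrite an existing snapshot."""
    target = review_dir / "source-plan.md"
    # Mode "x" raises FileExistsError if the snapshot is already there,
    # which is one simple way to keep it immutable after the first write.
    with open(target, "x", encoding="utf-8") as handle:
        handle.write(plan_path.read_text(encoding="utf-8"))
    return target
```

Calling `snapshot_plan` a second time for the same review directory raises `FileExistsError` rather than silently replacing the reviewed text.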

Version: shield 2.26.0 -> 2.27.0 (marketplace.json).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@ashwinimanoj ashwinimanoj merged commit 5ab6a65 into main Jun 4, 2026
12 checks passed
@ashwinimanoj ashwinimanoj deleted the fix/plan-review-p0-gate-and-consistency branch June 4, 2026 17:47