diff --git a/.agents/skills/babysit-pr/SKILL.md b/.agents/skills/babysit-pr/SKILL.md index 2fced03bc5..baaeb10189 100644 --- a/.agents/skills/babysit-pr/SKILL.md +++ b/.agents/skills/babysit-pr/SKILL.md @@ -27,9 +27,9 @@ This skill always runs exactly one pass. It never waits or repeats internally. F The skill uses two HTML-comment sentinels. -**Addressed sentinel**: ``. The `core@` suffix records which plugin version produced the reply. Appended on its own line at the end of every reply the skill posts (both thread replies and the CodeRabbit summary). This is how the skill knows, on re-runs, which threads and CodeRabbit review-body comments it already handled. Dedupe matches by the version-agnostic prefix ``) to find sentinels regardless of version; grep `babysit-pr:addressed v1 core@3.4.1` to find ones from a specific version. +**Addressed sentinel**: ``. The `core@` suffix records which plugin version produced the reply. Appended on its own line at the end of every reply the skill posts (both thread replies and the CodeRabbit summary). This is how the skill knows, on re-runs, which threads and CodeRabbit review-body comments it already handled. Dedupe matches by the version-agnostic prefix ``) to find sentinels regardless of version; grep `babysit-pr:addressed v1 core@3.5.0` to find ones from a specific version. -**Follow-up sentinel**: ``. Attached to replies that defer an out-of-scope comment as a tracked follow-up (see the Scope subsection and the Defer verdict in step 6). Grep `babysit-pr:followup` across PR conversation JSON to enumerate deferred items. This sentinel is additive — the post-reply scripts still append the `addressed` sentinel at the end, so a deferred thread is correctly machine-classified as addressed (the skill _has_ handled it — by deferring). Human reviewers and future sweeps distinguish deferred from resolved by looking for the follow-up sentinel. +**Follow-up sentinel**: ``. Attached to replies that defer an out-of-scope comment as a tracked follow-up (see the Scope subsection and the Defer verdict in step 6). Grep `babysit-pr:followup` across PR conversation JSON to enumerate deferred items. This sentinel is additive — the post-reply scripts still append the `addressed` sentinel at the end, so a deferred thread is correctly machine-classified as addressed (the skill _has_ handled it — by deferring). Human reviewers and future sweeps distinguish deferred from resolved by looking for the follow-up sentinel. **Sentinel recency rules.** The script emits a per-thread `activityState` with three values: @@ -266,7 +266,7 @@ Body templates (the script appends the `addressed` sentinel if missing): - **Agree**: `Addressed in . .` - **Disagree**: `Leaving current behavior. .` - **Already fixed**: `Already handled by . .` -- **Defer**: `Out of scope for this PR; this looks like follow-up work rather than something introduced or required by this change. .\n\n` +- **Defer**: `Out of scope for this PR; this looks like follow-up work rather than something introduced or required by this change. .\n\n` For Defer replies, include the follow-up sentinel on its own line as shown. The script will append the `addressed` sentinel after it on its own line, so the final body ends with the follow-up sentinel followed by a blank line followed by the `addressed` sentinel — `grep babysit-pr:followup` finds the deferral and `grep babysit-pr:addressed` still marks the thread handled for dedupe. @@ -281,7 +281,7 @@ bash scripts/postSentinelPrComment.sh "$PR_NUMBER" "$BODY" The CodeRabbit summary body should: - Group verdicts under **Agree / Disagree / Already fixed / Deferred (out of scope)** headings. Omit a heading if its list is empty. -- Under **Deferred (out of scope)**, list each deferred CodeRabbit review-body comment as a bullet, followed on its own line by `` so grep catches them individually. +- Under **Deferred (out of scope)**, list each deferred CodeRabbit review-body comment as a bullet, followed on its own line by `` so grep catches them individually. - Include the commit URL for fixes. - Include every current CodeRabbit review-body comment's `fingerprint` — addressed and deferred — in a fenced block at the end (one per line, before the sentinel) so future runs can dedupe. Deferred comments count as handled for dedupe purposes. diff --git a/.agents/skills/babysit-pr/scripts/_sentinel.sh b/.agents/skills/babysit-pr/scripts/_sentinel.sh index 855da99018..f98566c996 100644 --- a/.agents/skills/babysit-pr/scripts/_sentinel.sh +++ b/.agents/skills/babysit-pr/scripts/_sentinel.sh @@ -8,7 +8,7 @@ # replies; the `core@X.Y.Z` suffix records which plugin version produced it. SENTINEL_PREFIX='' +SENTINEL='' # Echo $1 with SENTINEL appended on its own trailing paragraph, unless the # body already contains any version of the sentinel (matched via SENTINEL_PREFIX). diff --git a/.agents/skills/commit-push-pr/SKILL.md b/.agents/skills/commit-push-pr/SKILL.md index 67cc0395fc..fe67ae7673 100644 --- a/.agents/skills/commit-push-pr/SKILL.md +++ b/.agents/skills/commit-push-pr/SKILL.md @@ -45,6 +45,6 @@ Script paths in this procedure are written as `scripts/...`, relative to this SK 4. Push the branch to origin. 5. Look up the current agent session ID by running this skill's bundled script: `bash scripts/find-session-id.sh ''`. Pass a distinctive verbatim chunk (≥10 words) from the most recent user message; see the script header for quoting constraints. If the script prints `codex `, use `Agent session: codex resume `. If it prints `claude-code `, use `Agent session: claude --resume `. If empty, there is no session footer line. 6. Check for an existing PR with `gh pr view`. - - No PR: create with `gh pr create`. Title = commit subject. Description = the PR body shape above, followed by the session footer line if known and ``. - - PR exists: refresh the body via `gh pr edit --body` so (a) the new commit's changes are reflected in the prose while existing `## Summary`, `## Validation`, and `## Notes` sections are preserved unless clearly stale, (b) any known session footer line is appended if missing, never removing or rewriting existing `Agent session: ...` or `Agent session ID: ...` lines, and (c) any existing `` line is preserved verbatim, appending `` if absent. Then report the URL. + - No PR: create with `gh pr create`. Title = commit subject. Description = the PR body shape above, followed by the session footer line if known and ``. + - PR exists: refresh the body via `gh pr edit --body` so (a) the new commit's changes are reflected in the prose while existing `## Summary`, `## Validation`, and `## Notes` sections are preserved unless clearly stale, (b) any known session footer line is appended if missing, never removing or rewriting existing `Agent session: ...` or `Agent session ID: ...` lines, and (c) any existing `` line is preserved verbatim, appending `` if absent. Then report the URL. 7. End with one short text response: branch name and the full PR URL (e.g., `https://github.com/clipboardhealth/core-utils/pull/123`). Never use shorthand like `repo#123` — always output the complete URL. diff --git a/.agents/skills/in-depth-review/SKILL.md b/.agents/skills/in-depth-review/SKILL.md new file mode 100644 index 0000000000..11e667ae51 --- /dev/null +++ b/.agents/skills/in-depth-review/SKILL.md @@ -0,0 +1,494 @@ +--- +name: in-depth-review +description: Multi-agent code review of the current branch or a PR — parallel engineering, minimalist, conventions, and AntiSlop reviewers plus conditional security, database, and frontend specialists that debate, then a synthesized review posted as anchored PR comments. Use when the user asks for a deep, thorough, or multi-perspective review, reviews a large or high-stakes diff, or runs /in-depth-review [PR-number-or-URL]. For small diffs or a quick single-pass review, use simple-review instead. +--- + +# In-Depth Review + +Run a multi-agent code review on the current branch _or_ on a PR identified by argument. By default dispatches four parallel reviewers (engineering+customer, minimalist, conventions, anti-AI-slop) and conditionally adds up to three domain specialists (security, database, frontend) when the diff touches their surface. The first-principles adversarial agent (Agent A) is **opt-in** — include it only when the user explicitly asks (see Invocation). Agents review in parallel, debate once, and the main agent moderator-filters and synthesizes a report. Author vs reviewer mode adapts the post-review flow. + +## Invocation + +- **`/in-depth-review`** — review the current branch (resolves the open PR for the branch if any, otherwise diffs against the default branch). +- **`/in-depth-review `** — fast path for reviewing someone else's PR while sitting on `main` (typical entry from Claude Desktop in code mode on a worktree). Skips the local branch checkout, fetches diff and metadata via `gh`, forces **reviewer mode**, and skips the plan/execute path. Accepts either a bare number (resolved against the current repo) or a full GitHub PR URL (which also identifies the owner/repo, useful when the worktree's remote differs). + +### Agent roster + +| Letter | Name | Role | Default state | +| ------ | ----------- | ------------------------------------- | -------------------------------------------- | +| A | Adversarial | First-principles adversarial | Opt-in only (`with adversarial`, `+A`, etc.) | +| B | Engineering | High engineering standards + customer | Always on | +| C | Minimalist | Smallest-diff posture | Always on | +| D | Conventions | Repo-rules / docs | Always on | +| E | Security | Security & API surface | Conditional (auth/route/contract files) | +| F | Database | DB schema, queries, migrations | Conditional (migration/schema/model files) | +| G | Frontend | FE conventions & UX correctness | Conditional (FE files) | +| H | AntiSlop | Anti-AI-bias / pushback on slop | Always on | + +**Refer to agents by name in everything the user sees** (Summary, Actionable items, Raised-by lines, Disagreements, Withdrawn). The letter is a compact prefix for finding IDs (`B1`, `C3`, `Adversarial`/`A1`, …) and a shorthand inside this document — never the only label in user-facing prose. The Round 2 `original_author` field stores the **name**, not the letter. + +### Agent A (first-principles adversarial) is opt-in + +The Adversarial agent (A) is skipped by default. Include it only when the user's invocation contains a clear opt-in phrase such as `with adversarial`, `with agent a`, `include adversarial`, `add the adversarial agent`, `+A`, or equivalent. If the user gives a more ambiguous instruction (e.g. "run the full review"), ask once whether to include the Adversarial agent before dispatching. Record the decision (`agent_a_enabled: true|false` + the phrase that triggered the opt-in, if any) in `/tmp/in-depth-review-meta.json` and surface it in the Summary so the user can see what ran. + +## Scope + +Two entry paths: + +### Path A — PR-argument fast path (when an argument is provided) + +- Parse the argument: bare integer → use current repo (`gh repo view --json nameWithOwner --jq .nameWithOwner`); URL → extract `/` and `` from the URL. +- Do **not** check out the PR branch. Operate from whatever the working directory is (typically `main` in a worktree). Convention/file reads happen against a verified-fresh ref — see **Freshness preflight** below. +- Mode is **locked to reviewer**. Skip mode detection. + +Gather in parallel: + +```bash +gh pr view --repo / --json number,url,title,body,baseRefName,author,headRefOid,headRepository,files +gh pr diff --repo / +gh api user --jq .login +``` + +Persist: + +- Diff stdout from `gh pr diff` → `/tmp/in-depth-review-diff.patch`. +- `pr.title + pr.body` → `/tmp/in-depth-review-context.md`. +- `pr.files[].path` → `/tmp/in-depth-review-files.txt`. +- Meta (PR number/url/baseRefName/author, viewer login, head SHA, owner/repo, mode=`reviewer`, conditional-agent set after classification) → `/tmp/in-depth-review-meta.json`. + +If `pr.author.login == viewer.login`, still proceed in reviewer mode (the user explicitly asked for the PR-arg path; honor it) but flag this in the synthesis Summary so the user can switch to a manual `/in-depth-review` run on their checkout if they meant to implement. + +### Path B — current-branch path (no argument) + +- If there is an open PR for the current branch, review that PR. Base = `baseRefName`, diff = `git diff $base...HEAD`. Capture PR title + body as context. +- Otherwise, review the current branch vs the default branch (`main`, fall back to `master`). Capture `git log --format='%h %s%n%n%b' $base..HEAD` as context. +- **Never** review uncommitted working-tree changes. If the branch has no diff vs base, stop and report. + +Gather in parallel: + +```bash +git rev-parse --abbrev-ref HEAD +git symbolic-ref refs/remotes/origin/HEAD 2>/dev/null | sed 's|refs/remotes/origin/||' +gh pr list --head "$(git branch --show-current)" --state open --json number,url,title,body,baseRefName,author,headRefOid +gh api user --jq .login +gh repo view --json nameWithOwner --jq .nameWithOwner +git diff --name-only "$base"...HEAD +``` + +Determine **mode**: + +- PR exists and `pr.author.login == viewer.login` → **author mode**. +- PR exists and authors differ → **reviewer mode**. +- No PR exists → **author mode** (reviewing your own pre-PR branch). + +Persist context so subagents read it from disk: + +- Diff → `/tmp/in-depth-review-diff.patch` +- Context (PR body or commit log) → `/tmp/in-depth-review-context.md` +- Changed-file list → `/tmp/in-depth-review-files.txt` +- Mode + repo metadata (PR number, url, base, head SHA, author, viewer, owner/repo, list of conditional agents that will run, verified-fresh `context_ref` from the freshness preflight) → `/tmp/in-depth-review-meta.json` + +## Freshness preflight (runs before Diff classification — both paths) + +Stale local state is the #1 cause of false-positive findings: it produces "evidence" of bugs that were already fixed on `main`. Before dispatching any agent, the main agent **must** verify that the ref it tells subagents to read for context is current with the remote, and pass that ref into subagent prompts so agents read code via `git show :` and `git grep ... -- ` rather than the worktree filesystem. + +Run this preflight on the **primary repo** (the one containing the diff under review): + +```bash +git fetch origin "$base" --quiet 2>&1 | tail -3 +git rev-parse --abbrev-ref HEAD +git status --porcelain +git rev-list --left-right --count "HEAD...origin/${base}" # prints "\t" +git log -1 --format='%h %ci' "origin/${base}" +``` + +Decide `context_ref` (the ref subagents must read for repo context, distinct from the diff itself): + +- **Path A (reviewer mode, PR-arg):** ideal `context_ref = origin/${base}` (whatever the PR's base branch is, usually `main`). The worktree filesystem must **only** be used as `context_ref` when `HEAD == origin/${base}` AND working tree is clean AND `git fetch` succeeded. +- **Path B (author mode, current-branch):** `context_ref = origin/${base}`. The user's local feature-branch files are the _diff_, not the _pre-PR context_; pre-PR convention/neighbor reads must come from `origin/${base}`. + +If any of the following are true, **stop and ask the user before continuing**: + +1. `git fetch origin "$base"` failed (offline, auth, repo permissions). Print the error. +2. `HEAD` differs from `origin/${base}` (Path A only — Path B expects HEAD on a feature branch). +3. Working tree is dirty (`git status --porcelain` is non-empty) **and** the dirty paths overlap the diff's changed-file list or any path subagents will need to read. +4. `HEAD` is behind `origin/${base}` (any non-zero "behind" count). +5. `HEAD` is more than a small number of commits ahead of `origin/${base}` on Path A (ahead implies the user is mid-work on a branch that isn't the PR; their worktree is not a clean main). + +Warn template (substitute the verified state): + +> Freshness check for `/` at ``: +> +> - on branch `` (expected `` for Path A) +> - `` commit(s) ahead, `` commit(s) behind `origin/` (last remote commit: ` `) +> - working tree: `` +> +> Reading code from this worktree may surface findings based on stale or local-only state. +> Reply: `proceed` (use worktree anyway and accept the risk), `use-origin` (read context via `git show origin/:` instead — recommended, no checkout needed), or `stop` (let me clean up and re-run). + +**Never** run `git checkout`, `git stash`, `git reset`, or any other state-modifying git command on the user's behalf. The skill warns and asks; the user is the only actor that resolves local state. + +Record the resolved `context_ref` (e.g. `origin/main`, or the literal string `worktree (stale, user accepted risk)`) in `/tmp/in-depth-review-meta.json`. Every subagent prompt must explicitly state which ref to read and how (see "Reading code in subagents" below). + +### Reading code in subagents + +When `context_ref` is `origin/`: + +- Read repo context via `git show "${context_ref}:"` (for whole files) and `git grep -n "${context_ref}" -- ` (for searches). +- The Read tool reads the working tree, which may be stale; subagents must **prefer** `git show` for context reads. The working tree may only be read when the file isn't tracked at `context_ref` (e.g. brand-new file in the PR diff) and that fact is explicitly noted in the finding. + +When `context_ref` is `worktree (stale, user accepted risk)`: + +- Subagents may use Read on the worktree, but every finding must include a one-line "verified against: worktree-stale" caveat in the `point`, and the moderator filter must downgrade CRITICAL/MAJOR to MINOR unless the evidence is internal to the diff itself. + +## Cross-repo evidence policy (binding for all agents) + +A finding's evidence is "cross-repo" when it depends on code in any repo other than the one containing the diff (e.g. a consumer or producer in another service). Subagents must **NOT** silently read external repos — stale checkouts fabricate evidence and unverifiable paths can't be checked. Emit such findings with an `evidence_required` field and cap severity at **MAJOR** until the moderator verifies the evidence on a fresh ref; after Round 1 the moderator collects every `evidence_required` block and asks the user for access before Round 2. + +Read [references/cross-repo-evidence.md](references/cross-repo-evidence.md) before raising or finalizing any cross-repo finding — it has the `evidence_required` schema, the moderator's access-request prompt, the freshness rules for external repos, and `skip` handling. + +## Diff classification (runs before Round 1) + +Match the changed-file list against these path patterns. A conditional agent runs only when its pattern matches at least one changed file: + +- **E — Security** triggers on: any file under `routes/`, `controllers/`, `middleware*/`, files matching `auth*`, `*permission*`, `*acl*`, `*token*`, `*session*`, response serializers, OpenAPI / contract definitions, or new API endpoint files. +- **F — Database & migrations** triggers on: `migrations/`, `*.sql`, files matching `schema*`, Mongoose/Prisma model files (`models/`, `*.model.ts`, `*.schema.ts`), repository / DAL files, or files containing query builders. +- **G — Frontend** triggers on: `*.tsx`, `*.jsx`, `*.css`, `*.scss`, files under `pages/`, `components/`, `hooks/`, or anything importing from `react`, `@tanstack/react-query`, or a design-system package. + +Record the dispatched set in `/tmp/in-depth-review-meta.json` so Round 2 knows the full agent list. + +## Severity rubric (binding for all agents) + +- **CRITICAL** — realistic input causes incorrect behavior, data loss, security regression, broken contract, or a paging incident. +- **MAJOR** — meaningful degradation of correctness/UX/observability under realistic conditions; OR a documented-convention violation with concrete downstream impact. +- **MINOR** — cheap, concrete improvement with a named benefit. +- **NIT** — only admissible if (a) repeats ≥2× in the diff, (b) conflicts with a documented convention, or (c) is a one-line trivial fix. Otherwise drop silently. + +Every finding **must** include a `failure_mode` field: one sentence on the concrete user-, oncall-, or maintainer-visible bad outcome that would occur if not fixed. Hypotheticals like "a future caller might…" do not satisfy this; such findings will be dropped during moderation. + +Run the litmus test on every candidate finding before keeping it: _"What is the concrete, current, product-visible cost of leaving this code in?"_ If you cannot answer in one sentence, drop the finding. + +### Do-not-raise list (binding) + +Do not surface any of: + +- Speculative defensiveness at trusted internal boundaries (e.g. "what if this private helper got null?"). +- Restating the obvious ("consider a comment explaining what this does"). +- Hypothetical future-caller scenarios with no current caller. +- Style/formatting a linter or formatter already covers. +- Test-coverage demands on trivially-verifiable code (one-line passthroughs, simple getters). +- "Add observability" without naming a concrete failure mode it would help debug. +- Abstract SOLID-style "consider extracting…" suggestions without a concrete failure mode. +- Aesthetic naming preferences. Only raise names that mislead about behavior. + +## Rounds (hard cap: 2) + +### Round 1 — parallel independent reviews + +Dispatch all selected agents **in the same message** via the `Agent` tool with `subagent_type: general-purpose`. The default selected set is **Engineering, Minimalist, Conventions, AntiSlop** (B/C/D/H), plus **Security, Database, Frontend** (E/F/G) when triggered by classification. **The Adversarial agent (A) is dispatched only when `agent_a_enabled` is true** (see Invocation — "Agent A is opt-in"). Each agent gets the file paths above, their role brief, and permission to Read repo files for context. Do **not** let agents see each other's output in this round. **Each agent emits at most 8 findings, prioritized — not exhaustive.** + +**Every subagent prompt must include the following input contract verbatim** (substitute `` with the value resolved in the Freshness preflight): + +> **Context-read contract.** The verified-fresh ref for repo context is ``. For any file you read that is part of the primary repo's tracked content (i.e. _not_ a brand-new file in this PR's diff), use `git show ":"` or `git grep -n "" -- ` rather than the Read tool on the worktree. If you read from the worktree because the file is new in the diff or untracked at ``, say so in the finding's `point`. +> +> **Cross-repo evidence contract.** If a finding's load-bearing evidence is in any repo other than this one, do NOT silently read another local checkout — those checkouts may be stale or on unrelated branches. Instead, emit the finding with an `evidence_required` field naming the repo(s) and the specific verification question, and cap your severity at MAJOR. The moderator will ask the user for verified access before Round 2 and re-dispatch you if needed. + +Required output per finding: `id` (`A1`, `B3`, `C2`, `D4`, `E1`, `F1`, `G1`, `H1`, …), `severity`, `file:lines`, `title` (one sentence), `point` (one short paragraph), `failure_mode` (one sentence), optional `suggested_fix` (see schema below), and optional `evidence_required` (object with `repos: string[]` and `what_to_verify: string` — required whenever the finding depends on cross-repo state). + +**`suggested_fix` schema** (when present): an object with two string fields, both code (not prose), preserving the file's actual indentation: + +```json +"suggested_fix": { + "before": "", + "after": "" +} +``` + +- For a **pure deletion**, set `after` to the empty string `""`. +- For a **pure addition** at a new line (nothing replaced), set `before` to the empty string `""` and prefix `after` with a one-line comment indicating where it lands (e.g. `// add after line 142`). +- For a **structural change** with no clean drop-in (e.g. "move this file"), omit `suggested_fix` entirely and put the guidance in `point` / `failure_mode`. +- Both fields must be code snippets directly usable to compute a diff. Do not include surrounding prose, do not paraphrase, and do not elide content with `// ...`. If the change is too large to inline both, omit `suggested_fix` and describe the fix in `point`. + +#### Always-on agents (Engineering / Minimalist / Conventions / AntiSlop) and the opt-in Adversarial agent + +**Adversarial (A) — First-principles. Opt-in only.** Dispatch only when the user explicitly asks (see Invocation — "Agent A is opt-in"). Skip silently otherwise. Posture: _"What is the single most important thing this PR gets wrong? Then up to 7 more, ranked."_ Question whether the change actually solves the underlying problem and whether the chosen approach is right. Challenge specific library / pattern / approach choices when they don't hold up on their own merits. Do **not** challenge foundational stack choices that are out of scope for the PR. Flag internal inconsistencies inside the diff itself. Specifically also probe: + +- Should this PR be split, or is it bundling unrelated tickets? +- Is the root cause of the bug clear, and is it fixed in _every_ place it manifests? +- Is a feature flag warranted but missing? +- Are there old feature flags or dead references this change makes deletable? + +Conventions are the Conventions agent's (D) job — do not double up. + +**Engineering (B) — High engineering standards + customer-centric.** Posture: _"For each change, name a realistic input or condition that would expose a bug. If you cannot, do not raise it."_ Evaluate edge cases, error paths, observability, tests (coverage of real risk, not lines), high-level security (E does the deep dive when dispatched), concurrency, performance at real scale, data integrity, accessibility, UX, degraded-mode behavior, backward compatibility, on-call implications, and domain assumptions in the diff. Specifically also probe: + +- Async/await correctness — ordering matches the actual dependency between calls. +- Timezone correctness in date-handling code; currency variables explicitly in minor units. +- When API surface changed: AuthN, AuthZ, sensitive-data leakage, PII handling. Hand the deep dive to Security (E) if dispatched. +- When a migration is in the diff: rollback strategy, backward compatibility during rolling deploy, ETL / downstream impact. Hand to Database (F) if dispatched. +- When DB schema or queries changed: indexes for new query patterns, N+1 risk, cascade-delete semantics, data-type fit. Hand to Database (F) if dispatched. +- Telemetry covers _business / product_ value, not only engineering surface metrics (latency / error rate). +- Contract / backward-compatibility for any consumer-visible response shape change. If you suspect a consumer break, the evidence is cross-repo — emit `evidence_required` and cap at MAJOR rather than asserting it; never raise a speculative "the FE will fail to parse" finding without verified evidence. + +Skip items that fail the realistic-input test. + +**Minimalist (C).** Posture: the smallest diff that ships the intent is the best diff. Identify unneeded abstractions, speculative generality, dead branches, redundant validation, defensive code at trusted internal boundaries, comments that restate code, new files / utilities that duplicate existing ones, tests that test the framework rather than behavior, flags / config knobs without a concrete caller. Specifically also probe: + +- Duplicated error handlers in the same scope. +- Commented-out code or dead branches (still gated by `failure_mode`). + +For every "delete this" finding, `failure_mode` must state the _concrete cost of keeping the code_. + +**Conventions (D) — Repo-rules.** Sole owner of convention checks (backend / shared). Read `AGENTS.md`, `CLAUDE.md`, `.rules/`, `.cursorrules`, `docs/adr/` (or equivalent), package READMEs in the diff's path, and one or two neighboring files in the same service for in-practice patterns. Then check the diff for: + +- Preferred internal packages over third-party ones (e.g. `@clipboard-health/*`, internal `lib/*`). +- Terminology rules (worker/workplace, not HCP/facility). +- `lodash` removal preference. +- `@clipboard-health/datetime` over `date-fns`, `date-fns-tz`, `moment`, `luxon`. +- Internal `Money` package for currencies; explicit minor-units variable names. +- Logging conventions (`longContext`, `ObjectId.toString()`, no variables in log messages). +- Null/undefined checks via `isDefined` / `isNil` over truthy. +- 4+ argument functions → params object. +- Error-handling patterns in the same service (typed `ServiceResult` vs throw). +- Test conventions, naming, file location (service test structure, `createTestContext` / `tearDown`, `beforeAll` / `afterAll` for GET). +- Business-meaningful magic constants that should be named. +- ENV access via the project's Configuration abstraction, not direct `process.env`. +- Logging library standard for the repo (e.g. Winston) — derive from existing code, not hardcoded. +- Pinned dependencies; lock file (`package-lock.json` / `yarn.lock`) in sync with `package.json`. +- RESTful route shape and uniform response format within a service. +- Internal inconsistency _within the diff itself_ (e.g. half of imports from one package family, half from upstream). + +Frontend conventions are the Frontend agent's (G) territory when G is dispatched. If Frontend is not dispatched (no FE files), Conventions (D) may surface FE convention drift if it appears. + +Tag every finding `[CONVENTION]`. Cap severity at MAJOR (only when behavior diverges as a result of the violation) or MINOR otherwise. At the top of your output, list the specific files/sources you checked. + +**AntiSlop (H) — Anti-AI-bias.** Posture: _"This PR may have been written or assisted by an LLM. For each addition, ask: 'is this line earning its keep, or is it pattern-matching what code is supposed to look like?' If a reasonable senior engineer wouldn't have written it, call it out and propose deleting or shrinking it."_ Your job is to push back on AI-shaped padding that the other agents are too polite to call slop. Specifically watch for, **inside the diff itself**: + +- **Defensive code at trusted internal boundaries.** Null / undefined guards on private helpers whose callers' types already guarantee non-null. `try` / `catch` wrapping a single call that doesn't itself throw, or that re-throws unchanged, or that "logs and swallows" without naming what to do next. Optional-chaining cascades through types that don't include optionality. +- **Defensiveness against unrealistic product / domain scenarios.** Code that guards against situations that wouldn't actually arise when the product is used as designed. Ask the litmus question: _"In the real product flow this code participates in, what user action, system event, or upstream call could land us in this branch?"_ If the answer is "none" or "I had to invent one to justify the guard", the code is slop. Concrete shapes this takes: + - A `null` / `undefined` guard on an ID immediately after that ID was used to load (and find) the entity. + - A `null` / `undefined` / `""` / `0` branch on a field whose TypeScript type permits only one of those values, or whose Zod / class-validator schema already rejected the others at the request boundary. + - A re-validation of a value the request DTO already validated upstream in the same request lifecycle. + - A consistency check (`if (a !== b) throw`) between two fields the data model forbids being unequal. + - A branch for a product-impossible state — e.g. "what if the worker is assigned to two workplaces at once" when the product enforces single-active-assignment; "what if the shift end is before its start" after the schema already enforces ordering; "what if the dispute amount is negative" when the validator caps it. + - A retry / fallback layer around an SDK or library call that already retries or already returns a typed error you can ignore. + - A `catch` clause for an error class the underlying code is statically known not to throw, or that "logs and swallows" without naming what to do next. + - A "future-proof" code path (`if (feature === "v2")`) when there is no `v2` and no concrete ticket creating one. + + Don't accept "but what if upstream changes?" as a defense — that's the hypothetical-future-caller anti-pattern. The fix when upstream genuinely changes is a typed-error / schema update in the same PR, not preemptive guards. + +- **Restating-the-code comments.** `// fetch the user`, `// loop through items`, `// add 1 to counter`. JSDoc on private helpers that only restates the signature. `// Note: this is important` lines. Section banners (`// === HELPERS ===`) inside short files. +- **Empty scaffolding.** `// TODO` comments with no owner, ticket, or trigger condition. Redundant pre-conditions (`if (!x) throw` immediately after a Zod parse). Log lines added "for debugging" that survive into the PR. Default `else { return undefined; }` after already-exhaustive branching. `_unused` parameter prefixes that should just be deletions. +- **Speculative generality.** A helper called once that wraps two trivial lines. A `Map` / `Set` / config object keyed by a single hardcoded value. A "strategy" / "registry" pattern with one strategy. Type unions whose only second case is `never`, a placeholder, or a string literal nobody emits. +- **Unused parameters / overloads / fields.** Function args destructured but never read; interface methods with empty implementations; type fields written but never consumed downstream in the diff; new optional fields with no producer or consumer. +- **Tests that exercise the framework, not the code.** Tests that assert "the mock returns what the mock returned" via `jest.fn().mockReturnValue(x); expect(fn()).toBe(x)`. Snapshot tests with no semantic assertion. Tests that mock the unit under test. Tests with names that describe the implementation (`it("calls foo.bar"…)`) rather than the behavior. +- **Dead AI breadcrumbs.** Variables whose only use is being logged or echoed in a debug branch; `console.log` / `console.error` calls that should have been removed before commit; commented-out alternative implementations left in the diff; "\_legacy" parameter renames where nothing actually changed about how it's read. +- **Tone / description mismatch.** PR description / commit message claims behavior the diff doesn't have, or describes the diff in language that pattern-matches engineering writing without naming the actual change. Variable / function names that sound right but don't reflect the value's role (`data`, `result`, `processed`, `_handle`, `doStuff`). + +For every finding, `failure_mode` must name the _concrete cost of leaving the code in_ — e.g. "future readers waste cycles understanding a wrapper that does nothing", "the unused parameter masks the next refactor's mistake", "the `try` / `catch` silently swallows an unrelated runtime error", "the restating comment goes stale in the next change and misleads readers", "the speculative helper makes call-site search return one false hit per call". A finding without a concrete cost-of-keeping fails the moderator filter. + +The default `suggested_fix` shape is **delete** (empty `after`) or **simplify** (smaller `after` than `before`). Suggesting "add a justifying comment" is itself slop — do not propose it. If the right move is structural (e.g. inline the wrapper), omit `suggested_fix` and describe the move in `point`. + +Stay in your lane: + +- Do **not** raise items that the Conventions agent (D) owns (terminology, lodash, datetime / money libraries, logging context shape) — that's their turf. +- Do **not** demand observability, tests, or error handling that **does not exist** in the diff — that's the inverse of your job. You only call out what's _present_ and unnecessary. +- Do **not** challenge whether the PR solves the right problem — that's the Adversarial agent's (A) turf when opted in. +- A change that's small but unnecessary is still slop. A change that's large but earns each line is not. + +Cap output at 8 findings. `failure_mode` required. + +**Round 2 expanded role — audit the other agents' findings, not just your own.** In Round 2 you receive every other agent's Round 1 findings. In addition to defending or withdrawing your own (the standard Round 2 contract), you are the AI-bias counterweight in the debate: audit every other agent's finding for the same slop patterns you scan code for. Mark each such finding `disagree` with a one-line `slop:` label in the `reasoning` field. Specifically watch for: + +- **Asks-for-more-defensiveness.** "Add a null check", "wrap in try / catch", "validate this input", "guard against …". If the value is already typed, already validated upstream, or already guaranteed by a preceding call inside the same scope, tag `slop: asks for defensive guard on already-narrowed value`. +- **Hypothetical concerns.** "What if a future caller passes …", "consider the case where …", "if someone later …". If the finding cannot name a current realistic input or product flow that hits the concern, tag `slop: hypothetical future caller — no current path`. +- **Restating-the-obvious comment requests.** "Add a JSDoc explaining what this does", "consider a comment here". If the name + signature already convey intent, tag `slop: restating-the-obvious comment request`. +- **Abstract "extract this helper" / SOLID-aesthetic refactors.** Refactor suggestions whose `failure_mode` is shape-of-the-code rather than a concrete cost of keeping the inline version. Tag `slop: refactor with no concrete cost-of-keeping`. +- **Observability demands without a named failure mode.** "Add a log here", "emit a metric". If the finding can't name the specific failure the telemetry would help debug, tag `slop: observability without named failure mode`. +- **Test-coverage demands on trivially-verifiable code.** "Add a unit test for this getter / passthrough / one-line wrapper". If the code's correctness is type-evident or already exercised by a higher-level test in the same PR, tag `slop: test for trivially-verifiable code`. +- **Domain-impossible scenarios surfaced by other agents.** Any finding that worries about a state the product cannot produce given how it actually works. Apply the same litmus question to _the finding_ that you apply to code: "what real product flow could land us here?" If none, tag `slop: defends against a state the product cannot produce`. + +The moderator filter weighs your slop tags heavily — see the moderator filter step on AntiSlop tags. The original author gets a chance to defend in their own Round 2 entry (a concrete `final_failure_mode` naming a real, product-specific cost); without that defense, the finding is dropped. + +Even when your Round 1 findings are sparse because the diff is genuinely clean, do not under-spend on the Round 2 audit. The audit is often the bigger lever — it stops slop _suggestions_ from polluting the final review. + +#### Conditional agents + +**Security (E) — API surface.** Dispatched when the diff touches auth / route / permission / serializer / contract files. Posture: _"Where could this change leak data, bypass authorization, or expand the trust boundary?"_ Specifically check: + +- AuthN: every new or modified endpoint has authenticated identity unless explicitly public. +- AuthZ: per-endpoint and per-resource permission checks; cross-tenant / cross-user access. +- Secrets in configuration — no hardcoded keys, no secrets in client bundles. +- No self-made or client-side cryptography. +- SQL / NoSQL injection vectors (string-concatenated queries, unvalidated `$where`, raw aggregation pipelines from user input). +- Sensitive data in response payloads — over-fetching, over-serialization, internal IDs leaked. +- PII handling — PII fields logged, cached, or sent to telemetry. +- Input validation at the boundary (Zod / class-validator) on every new endpoint. +- For numeric inputs: explicit `min` / `max` bounds. Verify what the JS engine does with `Number.MAX_SAFE_INTEGER + 1`, BigInt conversion, negative values, and whether the catch path returns a structured 400 or a 500 with stack log. + +Cap output at 8 findings. `failure_mode` required. + +**Database (F) — Schema, queries, migrations.** Dispatched when the diff touches migrations / SQL / schema / model / query files. Posture: _"What breaks at production scale or during deploy?"_ Specifically check: + +- Index coverage for new query patterns; missing compound indexes. +- N+1 query shapes; loops issuing per-iteration queries. +- Schema normalization — denormalization that creates write-amplification or update anomalies. +- Data types — range, precision, locale (numeric, date, money). Mongoose `Number` is IEEE 754 double; safe for integers up to 2^53. +- Cascade-delete and FK-on-delete semantics; orphan risk. +- Migration rollback strategy; can the migration run online without downtime? +- Backward compatibility during a rolling deploy (old code reads new schema, new code reads old schema). +- ETL / downstream consumer impact when schema or field semantics change. +- New collections / tables ship with the indexes their query patterns need on day one. +- Dual-write fields (storing both `amount` and `amountInMinorUnits`-style pairs): is canonicalization happening on write, or does the on-disk record encode two contradictory values? +- Backfill for new fields on existing records: present, deferred (with ticket), or missing? + +Cap output at 8 findings. `failure_mode` required. + +**Frontend (G) — Conventions & UX-correctness.** Dispatched when the diff touches `*.tsx` / `*.jsx` / FE component / hook / page paths. Posture: _"Does this follow our FE conventions, and will it behave correctly under realistic user conditions?"_ Specifically check: + +- API calls go through v2 / custom hooks, not raw `fetch` / `axios` in components. +- React Query: thin wrappers (`useGetQuery`, etc.), not raw `useQuery` / `useMutation`. +- Falsy-value checks use `isDefined()` / `isNil()`, not `!value`. +- Test actions wrapped in `act()`. +- Zod schemas: `z.nativeEnum` used where a TS enum already exists. +- Prop drilling — when a value is passed through ≥2 layers, is a context preferable? +- Feature flags via `useCbhFlag` (or repo equivalent), not direct config reads. +- New components / styles pulled from the design-system library, not built from scratch. +- Loading / empty / error states present for any new data-fetching surface. +- Accessibility: keyboard navigation, ARIA roles, labels on interactive controls. + +Cap output at 8 findings. `failure_mode` required. + +### Round 2 — debate + +Dispatch the **same set of agents** that ran in Round 1, in parallel. Each agent receives **all Round 1 outputs** (passed inline) and must: + +- Re-examine each of their own comments and **withdraw** any that don't survive scrutiny. +- For each comment from the other agents: mark `agree`, `disagree` (with reasoning), or `refine` (propose a tighter version). +- Flag items where disagreement is substantive and unlikely to resolve. + +Output per item: `id`, `original_author` (agent **name**, e.g. `Engineering`, `Minimalist`, `Conventions`, `AntiSlop`, `Security`, `Database`, `Frontend`, `Adversarial` — not the bare letter), `verdict` (keep | withdraw | agree | disagree | refine), `final_severity`, `final_title`, `final_failure_mode`, `reasoning`, `suggested_fix`, and rebuttals `[{from, stance, reasoning}]` where `from` is also the agent name. + +**AntiSlop (H) plays an enhanced Round 2 role.** Beyond defending / withdrawing its own findings and voting on the others, AntiSlop audits every other agent's findings for AI-bias patterns — asks-for-more-defensiveness, hypothetical "future caller" concerns, restating-the-obvious comment requests, abstract "extract this helper" refactors, observability / test-coverage demands without a named failure mode, and guards-against-states-the-product-cannot-produce. AntiSlop tags those `disagree` with a one-line `slop:` label in `reasoning`. These tags feed the moderator filter (see step 2 there); the original author's Round 2 entry is their chance to rebut with a concrete, product-specific `final_failure_mode`. When other agents see an AntiSlop slop tag on one of their own findings, they should either rebut with a concrete failure mode or withdraw — not both stand on the original framing and re-assert the same wording. + +**This is the last round.** Residual disagreement goes to the Disagreements section. + +## Moderator filter (main agent — runs after Round 2, before synthesis) + +Apply these filters in order. They are non-negotiable: + +1. **Drop findings with empty or hypothetical `failure_mode`.** "A future caller might…", "in case someone…" → drop. +2. **Apply AntiSlop slop tags.** For every Round 2 entry where AntiSlop voted `disagree` with a `slop:` label in `reasoning`, check the original author's Round 2 rebuttal. If the author did not provide a concrete, product-specific `final_failure_mode` (a real user-, oncall-, or maintainer-visible cost realized through an actual product flow) that addresses AntiSlop's specific tag, **drop the finding**. AntiSlop's audit is not a unilateral veto, but the burden of proof shifts to the original author once AntiSlop tags slop. Record dropped findings in Withdrawn with the slop label so the user can audit AntiSlop's calls (e.g. `B4 — dropped: AntiSlop tagged "slop: asks for defensive guard on already-narrowed value", Engineering did not rebut with concrete failure_mode`). +3. **Drop do-not-raise items** that slipped through. +4. **Apply the NIT gate**. NITs that don't meet (a)/(b)/(c) → drop. NITs that do meet are kept internally but hidden by default in synthesis. +5. **Merge near-duplicates** across agents into one item (preserve all attributions in `Raised by:`). +6. **Apply hard caps**: at most **6 actionable** items (CRITICAL/MAJOR/MINOR) and **8 NITs** retained internally. Anything beyond is summarized as "N additional items omitted; ask for the full list." + +Filtered items go into Withdrawn (one-liner) so nothing is invisible. + +## Synthesize (main agent) + +### Summary + +One short paragraph: what the change does and your overall recommendation (ship / ship with changes / do not ship). Do not state the mode, name which agents ran, attribute headline verdicts to specific agents, or note whether Adversarial ran — that methodology metadata is noise for the user. Keep it to substance: the headline concerns and the recommendation. + +### Actionable + +Up to 6 items that survived scrutiny and the moderator filter. Format each: + +- **[SEVERITY] Title** — `file:lines` +- **Point:** one sentence. +- **Why it matters:** the `failure_mode` sentence. +- **Counterpoint (if any):** one sentence on the rebuttal and why it didn't overturn. +- **Suggested fix (before → after):** two stacked fenced code blocks with language tags — first labeled `// Before` (the current code from `file:lines`), then `// After` (the replacement). Use the same language tag for both. Omit when the fix is structural (no clean drop-in) and explain in prose instead. For a pure deletion, show the `Before` block and write "_Delete these lines._" under it. For a pure addition, show only the `After` block prefixed with `// Add after line `. +- **Raised by:** comma-separated agent **names** (e.g. `Engineering, Conventions`); **agreed by:** comma-separated agent names. Do not use bare letters in this field — the user shouldn't have to decode A/B/C. + +Render template: + +````markdown +**Suggested fix (before → after):** + +```ts +// Before — src/foo/bar.ts:42-45 +const x = doThing(); +if (x !== null) { + return x; +} +``` + +```ts +// After +const x = doThing(); +return isDefined(x) ? x : undefined; +``` +```` + +Order by severity (CRITICAL → MAJOR → MINOR), then by file. + +### Disagreements + +Items where agents substantively disagreed and did not converge. One sentence per side (attribute each by agent **name**, e.g. "Engineering: …" / "Minimalist: …"), then the moderator's call with a one-line reason. + +### Nits + +Do **not** print the nits list by default. Print only one line: _"N nits available (M from convention audit). Reply `show nits` to see them."_ If N is 0, omit the section entirely. When the user replies `show nits`, print the gated NITs as one-liners with `file:lines`, the raising agent's **name**, and `[CONVENTION]` tag where applicable, then re-prompt user gate 1. + +### Withdrawn + +Terse one-liners of Round 1 comments that were withdrawn or filtered. Each line ends with the raising agent's **name** in parentheses (e.g. "C7 — superseded by C4 (Minimalist)"). For transparency only. + +## User gate 1 — select items + +Ask exactly: _"Which items do you want to act on? Reply with ids (e.g. `B1, C2`), `all actionable`, `show nits`, or `none`."_ Do not proceed until the user answers. `none` ends the skill cleanly. `show nits` prints the hidden list, then re-asks the same question. + +## Branch on mode + +### Author mode + +**Plan.** For the selected ids only, produce an ordered implementation plan: steps, files touched per step, tests to add/update, verification commands. Do **not** edit any files yet. + +**User gate 2 (author mode).** Ask: _"What do you want to do? (`implement-locally` / `post-as-review` / `both` / `edit-plan` / `cancel`)"_ + +- `implement-locally` → execute (see below). +- `post-as-review` → post anchored review (see below); skill ends. +- `both` → post the review first, then execute. +- `edit-plan` → revise from feedback, re-prompt. +- `cancel` → stop. + +**Execute.** Use `TodoWrite` to track each step, apply the plan, run the verification, report results. If a step surfaces a new substantive issue not in the selected items, stop and ask before expanding scope. + +### Reviewer mode + +Skip the plan step entirely — you are not implementing someone else's code. + +**User gate 2 (reviewer mode).** Ask: _"Post these on the PR? (`post` / `edit` / `cancel`)"_ + +- `post` → post anchored review (see below). +- `edit` → ask which items to drop or refine, then re-prompt. +- `cancel` → stop. + +## Posting an anchored PR review (both modes) + +When the user approves items to post, submit a **single** review via the GitHub Reviews API (`event: COMMENT` — never `APPROVE`/`REQUEST_CHANGES`), with every selected actionable item as an inline comment anchored to its diff line. Never use loose issue comments. + +Follow [references/posting-pr-review.md](references/posting-pr-review.md) exactly — it has the `gh api` call, the payload shape, the three-block review body (attribution line, synthesis summary, apply-all prompt), and the per-comment body budget and format. After posting, print the review `html_url` and a one-line summary of how many comments were posted (and how many fell back to general notes, if any). + +## Rules + +- Use fresh `general-purpose` subagents for every round — do not reuse other subagent types. +- Issue all agent calls in a single message for each round so they run truly in parallel. +- Keep large content on disk (`/tmp/in-depth-review-*`) so round-2 prompts stay compact. +- If one agent fails or returns malformed output, re-dispatch **that agent only**; do not restart the round. +- If the diff is empty, stop with a one-line message — do not invent findings. +- Do not auto-apply recommendations and do not auto-post reviews. Both user gates are mandatory. +- The moderator filter is non-negotiable: empty `failure_mode` → drop, NIT not meeting the gate → drop, do-not-raise items → drop, caps applied. +- Hide nits by default. Only print them when the user replies `show nits`. +- Reviews are always `event: COMMENT`. Never approve or request changes on the user's behalf. +- Posted comment bodies are terse: ≤60 words of prose + the code block, no bulleted lists, no prose preamble on the `suggestion`, no trailing addenda, and `Why it matters` is dropped when it would only rephrase the Point. The Example/context block is opt-out — include it only when prose + `suggestion` is genuinely ambiguous. +- Conditional agents (E/F/G) run only when classification matches. Do not invent triggers; if the user wants a forced full run, they will say so. +- The Adversarial agent (A) is opt-in. Default selected set is Engineering / Minimalist / Conventions / AntiSlop (B/C/D/H), plus the classified subset of Security / Database / Frontend (E/F/G). Dispatch Adversarial only when the user explicitly asks (`with adversarial`, `with agent a`, `+A`, etc.). Surface the Adversarial on/off decision in the Summary so the user can see what ran. +- **Refer to agents by name in every user-facing surface** — Summary, Actionable items (`Raised by` / `agreed by`), Disagreements, Withdrawn, and `original_author` / `rebuttals[].from` in Round 2 output. Bare letters (A/B/C/D/E/F/G) are acceptable only as ID prefixes (`B1`, `C3`) and as parenthetical shorthand next to the name. The agent-name table in the "Agent roster" section is the authoritative mapping; do not invent alternative names. +- Freshness preflight is mandatory before any agent dispatch. Subagents read repo context via `git show ":"` / `git grep ... ""`, not the worktree filesystem, unless the resolved `context_ref` is `worktree (stale, user accepted risk)`. +- Never run state-modifying git commands on the user's behalf (`checkout`, `stash`, `reset`, `clean`, `pull` with merge). The skill warns and asks; the user resolves local state. `git fetch` is allowed because it does not modify the working tree. +- Cross-repo evidence is opt-in by the user. Subagents tag findings with `evidence_required` and cap them at MAJOR until verified; the moderator asks the user for paths or `gh:owner/repo` slugs and runs the freshness preflight on each before treating the evidence as load-bearing. `skip` downgrades the finding to MINOR with a "speculative" prefix. +- `suggested_fix` is always a `{ before, after }` object of language-matched code snippets (never prose). **In-chat synthesis Actionable section:** render both halves as separate language-tagged fenced blocks (`// Before` then `// After`) — the user has no GitHub renderer in their terminal, so they need the pair to review the change before approving the post. **PR-review inline comments:** render ONLY the `suggestion` block — GitHub already renders the diff vs. the anchored line(s), so a separate `// Before` block would duplicate what's on screen. Empty `before` means pure addition (still emit a `suggestion` block; prefix `after` with `// Add after line `). Empty `after` means pure deletion (omit the `suggestion` block and write "_Delete the anchored line(s)._"). Omit `suggested_fix` entirely for structural changes with no clean drop-in. diff --git a/.agents/skills/in-depth-review/references/cross-repo-evidence.md b/.agents/skills/in-depth-review/references/cross-repo-evidence.md new file mode 100644 index 0000000000..8f25125f43 --- /dev/null +++ b/.agents/skills/in-depth-review/references/cross-repo-evidence.md @@ -0,0 +1,37 @@ +# Cross-repo evidence policy (binding for all agents) + +Read this before raising or finalizing any finding whose evidence may live outside the repo containing the diff. Referenced from `SKILL.md`. + +A finding's evidence is "cross-repo" when it depends on code in any repo other than the one containing the diff. Examples: the diff is in `clipboard-health` but the finding claims that `cbh-admin-frontend`, `payment-service`, or `worker-service-backend` still calls a deprecated endpoint. + +**Subagents must NOT silently read external repos.** Doing so risks (a) reading a stale local checkout and fabricating evidence (see "Freshness preflight"), or (b) citing a path the user has no checkout of, which the moderator can't verify. + +Subagent rule: if a finding's load-bearing evidence is in another repo, the subagent must emit the finding with the additional field: + +```json +"evidence_required": { + "repos": ["cbh-admin-frontend", "payment-service"], + "what_to_verify": "concrete grep / file path / question the moderator should answer to confirm or kill the finding" +} +``` + +…and cap the finding's severity at **MAJOR**. Findings marked `evidence_required` cannot be CRITICAL until the moderator has confirmed the cross-repo evidence on a verified-fresh ref. + +Moderator rule: after Round 1, collect every `evidence_required` block across all subagents and ask the user **before Round 2**: + +> Some findings depend on code outside ``. To verify, I need access to: +> +> - `` — to check `` (raised by agent ) +> - `` — to check `` (raised by agent ) +> +> For each repo, reply with one of: +> +> - a local path (e.g. `/Users/you/repos/cbh/`) — I will run the freshness preflight on it before reading +> - `gh:/` — I will fetch the file content via `gh api repos///contents/?ref=main` instead of cloning +> - `skip` — the finding will be downgraded to "speculative — cross-repo evidence not verified" and capped at MINOR +> +> Or reply `skip all` to downgrade every cross-repo finding. + +For each user-provided local path, run the **same freshness preflight** as on the primary repo (fetch, check ahead/behind, check working-tree cleanliness). If the external repo is stale or on a non-default branch, warn with the same template and require explicit user acknowledgement before reading. Always read external code via `git show "${external_context_ref}:"` / `git grep ... "${external_context_ref}" -- `, never via the worktree filesystem. + +After verification, re-dispatch only the affected subagents (one agent per repo group) with the verified evidence (or its absence) inlined, so they can finalize severity for Round 2. Findings whose cross-repo evidence the user `skip`s are kept but capped at MINOR with a "speculative" prefix in the title. diff --git a/.agents/skills/in-depth-review/references/posting-pr-review.md b/.agents/skills/in-depth-review/references/posting-pr-review.md new file mode 100644 index 0000000000..06b764a9f3 --- /dev/null +++ b/.agents/skills/in-depth-review/references/posting-pr-review.md @@ -0,0 +1,105 @@ +# Posting an anchored PR review (both modes) + +Read this once the user approves items to post. Referenced from `SKILL.md`. + +## Contents + +- API call and payload shape +- Review body (top-level comment): attribution line, synthesis summary, apply-all prompt +- Each comment body: budget, template, formatting rules + +## API call and payload + +Always post via the GitHub Reviews API as a **single** review, with all selected actionable items as inline review comments anchored to specific diff lines. **Never** loose issue comments. + +Build the payload as JSON and pipe it through `gh api --input`: + +```bash +# /tmp/in-depth-review-payload.json contains: {event, commit_id, body, comments: [...]} +gh api -X POST "repos///pulls//reviews" --input /tmp/in-depth-review-payload.json +``` + +Payload shape: + +```json +{ + "event": "COMMENT", + "commit_id": "", + "body": "", + "comments": [ + { "path": "src/foo.ts", "line": 42, "side": "RIGHT", "body": "..." }, + { + "path": "src/bar.ts", + "start_line": 10, + "line": 14, + "start_side": "RIGHT", + "side": "RIGHT", + "body": "..." + } + ] +} +``` + +Rules for posting: + +- `event` is **always `COMMENT`** — never `APPROVE` or `REQUEST_CHANGES`. Approval / blocking is a human decision, not the skill's. +- For multi-line ranges, set `start_line`, `line`, `start_side: "RIGHT"`, `side: "RIGHT"`. +- For findings without a clean line anchor (rare), pick the first changed line of the most relevant file rather than dropping the comment or going loose. +- If literally nothing is anchorable for a given finding, fold it into the review `body` as a labeled "general note" and flag this in the post-confirmation summary so the user knows. +- Use `commit_id` from `pr.headRefOid` so comments anchor to the reviewed commit. +- Resolve `/` once via `gh repo view --json nameWithOwner --jq .nameWithOwner` and stash it in `/tmp/in-depth-review-meta.json` alongside `head_sha` so permalinks below can be built without re-querying. + +## Review body (top-level comment) + +The `body` field on the review (not the inline comment bodies) must contain three blocks, in order: + +**1. Attribution line.** A single sentence disclosing that the review was generated by Claude Code. Use exactly: + +> _These comments were generated by @\ using Claude Code._ + +Substitute `` from `gh api user --jq .login`. Italicize the line so it renders as muted text. This is non-negotiable — collaborators must be able to tell at a glance that the review is AI-assisted. Do **not** mention the In-Depth Review skill by name; it's a local skill the PR author can't see or use. + +**2. Synthesis Summary.** The same one-paragraph summary printed in the in-chat synthesis: what the change does and your recommendation. Do not include mode, per-agent headline verdicts, or which agents ran — that's methodology noise. + +**3. "Apply all comments at once" prompt.** A fenced block containing a self-contained Claude Code prompt the PR author (or any agent operator) can copy-paste into a Claude Code session on a checkout of this branch to address every inline comment in one shot. Template (substitute the `<…>` placeholders before posting): + +````markdown +**Apply all comments at once** — paste this into Claude Code on a checkout of this branch: + +``` +Fetch the most recent review by @ on PR . For every inline comment in that review, address the issue: when the comment includes a `suggestion` block, apply it verbatim; otherwise implement an equivalent fix that satisfies the comment's "Why it matters" rationale. After resolving each thread, post a reply on that thread with a one-line summary of what you changed. When all comments are handled, run the project's tests, commit the changes with a message that references the review, and report back any threads you could not resolve and why. +``` +```` + +Build the body in `/tmp/in-depth-review-payload.json` with literal newlines (use `jq -n --arg body "$BODY" '{body: $body, ...}'` or build the JSON via a small heredoc-fed `python -c` so newlines are real `\n` in the JSON string, not the two characters `\` `n`). + +## Each comment body + +**Budget:** ≤60 words of prose total + the code block. The hard caps below are binding — if a section won't fit, the finding is probably two findings; split or drop one. + +````markdown +**[SEVERITY] Title** + + + +**Why it matters:** + +**Suggested fix:** + +```suggestion + +``` +```` + +Rules for code and links inside the comment body: + +- **No bulleted lists in the comment body.** Bullets fragment reasoning; either the prose fits in one sentence or it doesn't belong on the PR. +- **No prose preamble on the suggested fix.** The code block speaks for itself — don't write "Suggested fix — route Rate the same way as Pay" before the block; just show the block. +- **No trailing addenda.** "If feature X isn't shipped yet…", "Note that this also affects…", "While you're here…" — these belong in the in-chat synthesis, not on the posted comment. The reviewee can ask follow-ups on the thread. +- **Include a `suggestion` block whenever it makes the fix concrete.** Skip it only when the issue is purely structural or when prose alone says everything. +- **`suggestion` block stands alone on the PR.** GitHub already renders the diff vs. the anchored line(s) inside the suggestion block (red `-` lines + green `+` lines + one-click apply). Do **NOT** also include a `// Before` fenced block — it duplicates what GitHub already shows and makes the comment twice as long. This is different from the in-chat **Actionable** section, which DOES render `// Before` + `// After` pairs so the user can review the change before approving the post (the user has no GitHub renderer in their terminal). +- **When to skip the `suggestion` block:** the fix is structural with no clean drop-in (e.g. "move this file"), OR `suggested_fix.after` is empty (pure deletion — write "_Delete the anchored line(s)._" instead), OR `suggested_fix.before` is empty (pure addition — the `suggestion` block still works, but prefix `suggested_fix.after` with a `// Add after line ` comment so the intent reads cleanly). +- **Optional Example/context fenced block is opt-out by default.** Include a non-`suggestion` code block only when prose-plus-`suggestion` is genuinely ambiguous — e.g. you're flagging a pattern violation and the offending pattern is not in the anchored lines. Render it under an **Example / context:** heading as a regular language-tagged fenced block, with a permalink in the prose above pointing to where it lives in the tree. If you're tempted to paste the anchored lines as "context," you're rebuilding the `// Before` block GitHub already renders — drop it. Default: skip. +- **Permalinks for in-prose line references.** Build them as `https://github.com///blob//#L-L` (single line: `#L`). Always pin to `head_sha`, never `main` or a branch name — branch links break as soon as the branch moves. Use Markdown link syntax with a short, meaningful label (e.g. `[the existing ServiceResult pattern](https://github.com/...#L88-L104)`), not the raw URL. + +After posting, print the review `html_url` and a one-line summary of how many comments were posted (and how many fell back to general notes, if any). diff --git a/.agents/skills/simple-review/SKILL.md b/.agents/skills/simple-review/SKILL.md new file mode 100644 index 0000000000..78fe5e7eed --- /dev/null +++ b/.agents/skills/simple-review/SKILL.md @@ -0,0 +1,472 @@ +--- +name: simple-review +description: Single-pass code review of the current branch or a PR using the in-depth review rubric without subagents, posted as anchored PR comments. Use when the user asks to review a small or medium diff, wants a fast or low-budget review, or runs /simple-review [PR-number-or-URL]. For large or high-stakes diffs that benefit from multiple debating reviewers, use in-depth-review instead. +--- + +# Simple Review + +Single-pass code review you (the main agent) perform yourself — **no subagents**. Condenses the rules from `/in-depth-review` into one checklist you walk through directly. Use this for small/medium PRs and when budget matters. Use `/in-depth-review` instead when the diff is large, high-stakes, or benefits from independent perspectives that can debate. + +## Invocation + +- **`/simple-review`** — review the current branch (resolves the open PR for the branch if any, otherwise diffs against the default branch). +- **`/simple-review `** — fast path for reviewing someone else's PR while sitting on `main` (typical entry from a worktree). Skips local branch checkout, fetches diff/metadata via `gh`, forces **reviewer mode**, skips the plan/execute path. Accepts a bare number (current repo) or full GitHub URL (identifies owner/repo). + +## Scope + +### Path A — PR-argument fast path (argument provided) + +- Parse the argument: bare integer → `gh repo view --json nameWithOwner --jq .nameWithOwner`; URL → extract `/` and ``. +- Do **not** check out the PR branch. +- Mode is **locked to reviewer**. + +Gather in parallel: + +```bash +gh pr view --repo / --json number,url,title,body,baseRefName,author,headRefOid,files +gh pr diff --repo / +gh api user --jq .login +``` + +If `pr.author.login == viewer.login`, still proceed in reviewer mode but flag this in Summary so the user can switch to a manual `/simple-review` run on their checkout if they meant to implement. + +### Path B — current-branch path (no argument) + +- Open PR for current branch → review that PR. Base = `baseRefName`, diff = `git diff $base...HEAD`. Context = PR title + body. +- No PR → diff vs default branch (`main`, fall back to `master`). Context = `git log --format='%h %s%n%n%b' $base..HEAD`. +- **Never** review uncommitted working-tree changes. Empty diff → stop and report. + +Determine **mode**: + +- PR exists and `pr.author.login == viewer.login` → **author mode**. +- PR exists and authors differ → **reviewer mode**. +- No PR → **author mode**. + +You hold review state in-context — no `/tmp/*` persistence needed for findings (the PR-posting step writes one transient payload file; see Posting an anchored PR review). + +## Freshness preflight (mandatory before reading code) + +Stale local state produces false-positive findings. Before reviewing, verify the ref you'll read for context is current. + +Run on the **primary repo** (the one containing the diff): + +```bash +git fetch origin "$base" --quiet +git rev-parse --abbrev-ref HEAD +git status --porcelain +git rev-list --left-right --count "HEAD...origin/${base}" +git log -1 --format='%h %ci' "origin/${base}" +``` + +Decide `context_ref`: + +- **Path A (reviewer):** `context_ref = origin/${base}` (usually `main`). Worktree may be used only when `HEAD == origin/${base}` AND clean AND fetch succeeded. +- **Path B (author):** `context_ref = origin/${base}`. The local feature branch IS the diff; pre-PR context comes from `origin/${base}`. + +Stop and ask the user when: + +1. `git fetch` failed (offline, auth). +2. `HEAD` differs from `origin/${base}` on Path A. +3. Working tree is dirty AND dirty paths overlap the diff's changed-file list or anything you'll need to read. +4. `HEAD` is behind `origin/${base}` (any non-zero "behind"). +5. `HEAD` is more than a small number of commits ahead of `origin/${base}` on Path A. + +Warn template (substitute verified state): + +> Freshness check for `/` at ``: +> +> - on branch `` (expected `` for Path A) +> - `` ahead, `` behind `origin/` (last: ` `) +> - working tree: `` +> +> Reading from this worktree may surface findings based on stale state. +> Reply: `proceed` (use worktree, accept the risk), `use-origin` (read context via `git show origin/:` — recommended), or `stop`. + +**Never** run `git checkout`, `stash`, `reset`, or other state-modifying git on the user's behalf. The skill warns and asks; the user resolves local state. `git fetch` is allowed (read-only). + +For Path A PR review when the worktree is dirty or on a non-base branch, the practical workaround is to fetch the PR head to a local ref: + +```bash +git fetch origin "pull//head:refs/remotes/origin/pr-" --quiet +``` + +Then read PR-head content via `git show origin/pr-:` and base context via `git show origin/:`. No checkout needed. + +### Reading code + +When `context_ref = origin/` (or `origin/pr-` for PR head): + +- Read via `git show "${context_ref}:"` (whole files) or `git grep -n "${context_ref}" -- ` (search). +- Avoid the Read tool on the worktree filesystem for tracked content — it may be stale. +- Worktree Read is OK only for files brand-new in the diff (untracked at `context_ref`); note "verified against: worktree" in the finding. + +When `context_ref = worktree (stale, user accepted risk)`: + +- Read tool is OK but every CRITICAL/MAJOR finding is downgraded to MINOR unless evidence is internal to the diff itself. Tag each finding with "verified against: worktree-stale". + +## Cross-repo evidence policy + +A finding's evidence is "cross-repo" when its load-bearing claim depends on code in any repo other than the one containing the diff — most commonly a consumer, producer, or downstream parser of something the diff changes. **Never silently read external repos and never claim a downstream impact you have not verified** — speculating that "the FE will break" without reading the consumer is a top source of false-positive findings. Cap any cross-repo finding at **MAJOR** until verified. + +Read [references/cross-repo-evidence.md](references/cross-repo-evidence.md) before raising or finalizing any cross-repo finding — it has when the policy fires, the verify-or-downgrade procedure, the access-request template, and what "verify" means for contract/schema vs API-response changes. + +## Diff classification (pick which lenses apply) + +Walk the changed-file list. Activate lenses that match: + +- **Security lens** triggers on: `routes/`, `controllers/`, `middleware*/`, files matching `auth*`/`*permission*`/`*acl*`/`*token*`/`*session*`, response serializers, OpenAPI/contract definitions, new API endpoint files. +- **Database lens** triggers on: `migrations/`, `*.sql`, files matching `schema*`, Mongoose/Prisma model files (`models/`, `*.model.ts`, `*.schema.ts`), repository/DAL files, query builders. +- **Frontend lens** triggers on: `*.tsx`, `*.jsx`, `*.css`, `*.scss`, `pages/`, `components/`, `hooks/`, or anything importing from `react`, `@tanstack/react-query`, or a design-system package. + +Always-on lenses: **Engineering**, **Minimalism**, **Conventions**, **AntiSlop**. + +## Severity rubric (binding) + +- **CRITICAL** — realistic input causes incorrect behavior, data loss, security regression, broken contract, or a paging incident. +- **MAJOR** — meaningful degradation of correctness/UX/observability under realistic conditions; OR a documented-convention violation with concrete downstream impact. +- **MINOR** — cheap, concrete improvement with a named benefit. +- **NIT** — only admissible if (a) repeats ≥2× in the diff, (b) conflicts with a documented convention, or (c) is a one-line trivial fix. + +Every finding **must** include a `failure_mode`: one sentence on the concrete user-, oncall-, or maintainer-visible bad outcome that would occur if not fixed. Hypotheticals like "a future caller might…" do **NOT** satisfy this — drop the finding. + +### Do-not-raise list (binding) + +- Speculative defensiveness at trusted internal boundaries. +- Restating the obvious ("consider a comment explaining what this does"). +- Hypothetical future-caller scenarios with no current caller. +- Style/formatting a linter or formatter covers. +- Test-coverage demands on trivially-verifiable code. +- "Add observability" without naming a concrete failure mode it would help debug. +- Abstract SOLID-style "consider extracting…" without a concrete failure mode. +- Aesthetic naming preferences — only raise names that mislead about behavior. + +## Review checklist + +Walk the diff once. For each lens, you're looking for _the smallest number of high-signal findings_, not exhaustive coverage. Cap **6 actionable items** (CRITICAL/MAJOR/MINOR) plus **8 nits** retained internally; anything beyond is "N additional items omitted; ask for the full list." + +For every candidate finding, run the litmus test before keeping it: _"What is the concrete, current, product-visible cost of leaving this code in?"_ If you can't answer in one sentence, drop it. + +### Engineering (always) + +For each change name a realistic input or condition that would expose a bug. If you cannot, do not raise it. + +- Edge cases, error paths, observability of real failure modes. +- Tests cover real risk, not lines. +- Concurrency, performance at real scale, data integrity. +- Backward compatibility, on-call implications, degraded-mode behavior. +- Async/await ordering matches actual data dependencies. +- Timezone correctness in date code; currency variables explicitly in minor units. +- API surface changed → AuthN/AuthZ/PII (deep dive under Security). +- Migration → rollback, rolling-deploy compat, ETL/downstream impact (deep dive under Database). +- Schema/query changed → indexes for new patterns, N+1, cascade semantics, type fit (deep dive under Database). +- Telemetry covers business/product value, not only engineering surface metrics. +- Contract/backward-compat for any consumer-visible response shape change. **If you suspect a consumer break, the finding is cross-repo — follow the Cross-repo evidence policy before raising it. Do not raise speculative "the FE will fail to parse" findings without reading the FE.** + +### Minimalism (always) + +The smallest diff that ships the intent is the best diff. + +- Unneeded abstractions, speculative generality, dead branches. +- Redundant validation, defensive code at trusted internal boundaries. +- Comments that restate code; new files/utilities that duplicate existing ones. +- Tests that exercise the framework, not behavior. +- Flags/config knobs without a concrete caller. +- Duplicated error handlers in the same scope. +- Commented-out code or dead branches. + +For every "delete this" finding, `failure_mode` must state the **concrete cost of keeping the code**. + +### Conventions (always) + +You are the convention owner — read the repo's actual conventions before flagging anything. Consult, in order of priority: + +- `git show ${context_ref}:AGENTS.md` and `CLAUDE.md` (if present) +- `git ls-tree -r --name-only ${context_ref} -- .rules` then read each rule file relevant to the diff +- Neighboring files in the same module/service for in-practice patterns +- Package READMEs in the touched paths + +Check the diff for (only what's actually documented in the consulted sources): + +- Preferred internal packages over third-party (`@clipboard-health/*`, internal `lib/*`). +- Terminology rules (worker/workplace, not HCP/facility). +- `lodash` removal preference. +- `@clipboard-health/datetime` over `date-fns`/`date-fns-tz`/`moment`/`luxon`. +- Internal `Money` package for currencies; explicit minor-units variable names. +- Logging conventions (`longContext`, `ObjectId.toString()`, no variables in log messages). +- Null/undefined checks via `isDefined`/`isNil` over truthy. +- 4+ argument functions → params object. +- Error-handling patterns in the same service (typed `ServiceResult` vs throw). +- Test conventions, naming, file location (service test structure, `createTestContext`/`tearDown`, `beforeAll/afterAll` for GET). +- Business-meaningful magic constants that should be named. +- ENV access via Configuration abstraction, not direct `process.env`. +- Logging library standard for the repo — derive from existing code, not hardcoded. +- Pinned dependencies; lock file in sync with `package.json`. +- RESTful route shape and uniform response format within a service. +- **Internal inconsistency within the diff itself** (e.g. half of imports from one package family, half from upstream; same persisted shape written three different ways across three call sites). + +Frontend conventions are the Frontend lens's job when activated. If FE lens isn't active (no FE files), surface FE convention drift here only if it appears. + +Tag every convention finding with `[CONVENTION]` in the title. Cap severity at MAJOR (only when behavior diverges as a result) or MINOR otherwise. + +### AntiSlop (always) + +This PR may have been written or assisted by an LLM. For each addition, ask: _"Is this line earning its keep, or is it pattern-matching what code is supposed to look like?"_ Push back on what other lenses are too polite to flag. Apply to additions **inside the diff itself**. + +- **Defensive code at trusted internal boundaries.** Null guards on private helpers whose callers' types guarantee non-null; `try`/`catch` wrapping a single non-throwing call, or that re-throws unchanged, or that "logs and swallows" without naming what to do next; optional-chaining through types that don't include optionality. +- **Defensiveness against unrealistic product scenarios.** Litmus: _"In the real product flow this code participates in, what user action / system event / upstream call could land us in this branch?"_ If the answer is "none" or "I had to invent one to justify the guard", it's slop. Concrete shapes: + - Null/undefined guard on an ID immediately after that ID was used to load (and find) the entity. + - A `null`/`undefined`/`""`/`0` branch on a field whose TypeScript or Zod/class-validator already rejects those. + - Re-validation of a value the request DTO already validated upstream in the same lifecycle. + - Consistency check (`if (a !== b) throw`) between two fields the data model forbids being unequal. + - Branch for a product-impossible state. + - Retry/fallback around an SDK call that already retries or returns a typed error. + - `catch` for an error class statically known not to throw, or that logs+swallows. + - "Future-proof" code path with no current `v2`. +- **Restating-the-code comments.** `// fetch the user`, JSDoc on private helpers that only restates the signature, `// Note: this is important`, section banners in short files. +- **Empty scaffolding.** `// TODO` with no owner/ticket; redundant pre-conditions; debug logs that survived to the PR; default `else { return undefined; }` after exhaustive branching; `_unused` prefixes that should be deletions. +- **Speculative generality.** Helper called once that wraps two trivial lines; `Map`/`Set`/config keyed by a single hardcoded value; "strategy"/"registry" pattern with one strategy; union types whose only second case is `never`/placeholder. +- **Unused parameters/overloads/fields.** Args destructured but never read; interface methods with empty implementations; new optional fields with no producer or consumer. +- **Tests that exercise the framework, not the code.** `jest.fn().mockReturnValue(x); expect(fn()).toBe(x)`; snapshot tests with no semantic assertion; tests that mock the unit under test; test names that describe the implementation (`it("calls foo.bar"…)`) instead of behavior. +- **Dead AI breadcrumbs.** Variables whose only use is logging or debug branches; `console.log`/`console.error` that should have been removed before commit; commented-out alternative implementations. +- **Tone/description mismatch.** PR description claims behavior the diff doesn't have; variable/function names that pattern-match engineering writing without naming the role (`data`, `result`, `processed`, `_handle`, `doStuff`). + +Default `suggested_fix` is **delete** (empty `after`) or **simplify** (smaller `after`). Suggesting "add a justifying comment" is itself slop — do not propose it. + +### Security lens (when triggered) + +Where could this change leak data, bypass authorization, or expand the trust boundary? + +- AuthN: every new/modified endpoint has authenticated identity unless explicitly public. +- AuthZ: per-endpoint AND per-resource permission checks; cross-tenant/cross-user access. +- Secrets in configuration — no hardcoded keys, no secrets in client bundles. +- No self-made or client-side cryptography. +- SQL/NoSQL injection: string-concatenated queries, unvalidated `$where`, raw aggregation pipelines from user input. +- Sensitive data in response payloads — over-fetching, over-serialization, internal IDs leaked. +- PII fields logged, cached, or sent to telemetry. +- Input validation at the boundary (Zod / class-validator) on every new endpoint. +- For numeric inputs: explicit `min`/`max` bounds. Verify what the JS engine does with `Number.MAX_SAFE_INTEGER + 1`, BigInt conversion, negative values, and whether the catch path returns a structured 400 or a 500 with stack log. + +### Database lens (when triggered) + +What breaks at production scale or during deploy? + +- Index coverage for new query patterns; missing compound indexes. +- N+1 query shapes; loops issuing per-iteration queries. +- Schema normalization — denormalization that creates write-amplification or update anomalies. +- Data types — range, precision, locale (numeric, date, money). Mongoose `Number` is IEEE 754 double; safe for integers up to 2^53. +- Cascade-delete and FK-on-delete semantics; orphan risk. +- Migration rollback strategy; online without downtime. +- Backward compatibility during a rolling deploy (old code reads new schema; new code reads old schema). +- ETL / downstream consumer impact when schema or field semantics change. +- New collections / tables ship with the indexes their query patterns need on day one. +- **Dual-write fields** (storing both `amount` and `amountInMinorUnits`-style pairs): is canonicalization happening on write, or does the on-disk record encode two contradictory values? +- Backfill for new fields on existing records: present, deferred (with ticket), or missing? + +### Frontend lens (when triggered) + +Does this follow our FE conventions, and will it behave correctly under realistic user conditions? + +- API calls through v2 / custom hooks, not raw `fetch`/`axios` in components. +- React Query: thin wrappers (`useGetQuery`, etc.), not raw `useQuery`/`useMutation`. +- Falsy checks via `isDefined()`/`isNil()`, not `!value`. +- Test actions wrapped in `act()`. +- Zod: `z.nativeEnum` when a TS enum already exists. +- Prop drilling — when a value passes through ≥2 layers, consider context. +- Feature flags via `useCbhFlag` (or repo equivalent), not direct config reads. +- New components/styles from the design-system library, not built from scratch. +- Loading / empty / error states for any new data-fetching surface. +- Accessibility: keyboard navigation, ARIA roles, labels on interactive controls. + +## Self-filter (before synthesis) + +After walking the checklist, apply these filters to your candidate findings: + +1. **Drop findings with empty or hypothetical `failure_mode`.** "A future caller might…", "in case someone…" → drop. +2. **Self-audit for slop.** For every finding you wrote, ask: does it match a slop pattern (asks-for-defensive-guard on already-narrowed value; hypothetical future caller; restating-obvious comment request; abstract refactor with no concrete cost-of-keeping; observability without named failure mode; test for trivially-verifiable code; defends against a state the product cannot produce)? If yes and you can't write a concrete, product-specific cost in one sentence — drop it. Being your own AntiSlop reviewer is the main lever for keeping this skill honest. +3. **Cross-repo audit.** For every finding, ask: does the failure_mode reference a downstream actor (FE, mobile, consumer service, rolling deploy, external library user) or a contract/schema/public-artifact boundary? If yes, did you actually read the relevant external file(s) to confirm the claim, or are you reasoning from priors about "how FEs usually work"? If you didn't read it, the finding is cross-repo — route to the Cross-repo evidence policy (verify, ask for access, or downgrade to a clearly-labeled "speculative" MINOR). Do not ship a "consumer will break" finding sourced from imagination. +4. **Drop do-not-raise items** that slipped through. +5. **Apply the NIT gate.** NITs that don't meet (a)/(b)/(c) → drop. Kept internally but hidden by default in synthesis. +6. **Merge near-duplicates** under one finding (note which lens(es) surfaced it). +7. **Apply hard caps.** 6 actionable, 8 NITs retained, rest summarized as "N additional items omitted". + +Track dropped items in a Withdrawn section (one-liner each) so the user can see what was filtered. + +## suggested_fix schema (when present) + +```json +"suggested_fix": { + "before": "", + "after": "" +} +``` + +- Pure deletion: `after` is `""`. +- Pure addition at a new line: `before` is `""`, prefix `after` with `// add after line `. +- Structural change with no clean drop-in: omit `suggested_fix`, describe in `point`/`failure_mode`. +- Both must be code (not prose), preserving the file's indentation. Don't elide with `// ...`. + +## Synthesize + +### Summary + +One short paragraph: what the change does and your overall recommendation (ship / ship with changes / do not ship). Do not state the mode, list the lenses you applied, or list the convention sources you consulted — that metadata is noise for the user. Keep it to substance. + +### Actionable + +Up to 6 items. Format each: + +- **[SEVERITY] Title** — `file:lines` +- **Point:** one sentence. +- **Why it matters:** the `failure_mode`. +- **Suggested fix (before → after):** two stacked fenced code blocks — first `// Before` (current code), then `// After` (replacement). Same language tag for both. For a pure deletion, show only `Before` and write "_Delete these lines._" For a pure addition, show only `After` prefixed with `// Add after line `. Omit when structural; explain in prose. +- **Lens(es):** which lens(es) surfaced this (e.g. `Engineering, AntiSlop`). + +Render template: + +````markdown +**Suggested fix (before → after):** + +```ts +// Before — src/foo/bar.ts:42-45 +const x = doThing(); +if (x !== null) { + return x; +} +``` + +```ts +// After +const x = doThing(); +return isDefined(x) ? x : undefined; +``` +```` + +Order by severity (CRITICAL → MAJOR → MINOR), then by file. + +### Nits + +Do **not** print by default. Print only: _"N nit(s) available (M from convention audit). Pick `Show the N NIT(s) first` in the picker below to see them."_ If N is 0, omit this entire section. Surface the "show nits" affordance as an option in the User gate 1 `AskUserQuestion` call (not as typed text). When selected, print one-liners with `file:lines` and `[CONVENTION]` tag where applicable, then re-call User gate 1. + +### Withdrawn + +Terse one-liners of items you dropped (do-not-raise, NIT gate, self-slop audit). Transparency only. + +## User gate 1 — select items + +Ask via **`AskUserQuestion`**, not as a text prompt. Render each actionable finding as an option, with a trailing "show the N NIT(s) first" option when `N > 0`. Set `multiSelect: true` so the user can pick any combination (including none). Do not proceed until the user answers. + +**Template:** + +```ts +AskUserQuestion({ + questions: [ + { + question: "Which findings should I ?", // "post as inline comments on PR " / "implement locally" / "act on" + header: "Findings", + multiSelect: true, + options: [ + { label: "1 - ()", description: "" }, + { label: "2 - …", description: "…" }, + // …one entry per actionable finding (cap 6) + { label: "Show the NIT(s) first", description: "Print the NIT list before deciding." }, // omit when N == 0 + ], + }, + ], +}); +``` + +Option-label format: ` - ()`. The id matches the in-chat numbering in the Actionable section above. The `description` is one short sentence — the failure_mode, not the fix. Do not pad with file paths; the user already has the full Actionable section in the chat. + +Branching on the answer: + +- **Zero findings selected** (user submitted no boxes / picked Skip) → end the skill with a one-line "no findings selected — nothing to do" reply. +- **Only "Show the N NIT(s) first"** selected → print the NITs (one-liners with `file:lines` and `[CONVENTION]` tag where applicable), then re-call `AskUserQuestion` for this same gate. +- **Findings selected (with or without the NIT toggle)** → proceed to gate 2. If the NIT toggle was also checked, print NITs once before gate 2. + +## Branch on mode + +### Author mode + +**Plan.** For the selected ids only, produce an ordered implementation plan: steps, files touched per step, tests to add/update, verification commands. Do **not** edit files yet. + +**User gate 2 (author)** — ask via **`AskUserQuestion`** (`multiSelect: false`): + +```ts +AskUserQuestion({ + questions: [ + { + question: "What do you want to do with the selected findings?", + header: "Next step", + multiSelect: false, + options: [ + { label: "Implement locally", description: "Apply the plan against your checkout." }, + { + label: "Post as review", + description: "Post anchored comments on your own PR for the team to see, then stop.", + }, + { label: "Both", description: "Post the review first, then execute the plan locally." }, + { label: "Edit the plan", description: "Revise the plan; I'll re-prompt this gate." }, + { label: "Cancel", description: "Stop." }, + ], + }, + ], +}); +``` + +- `Implement locally` → execute the plan. +- `Post as review` → post anchored review (below); skill ends. +- `Both` → post review first, then execute. +- `Edit the plan` → revise, re-prompt this gate. +- `Cancel` → stop. + +**Execute.** Use TodoWrite to track each step. Apply the plan, run verification, report results. If a step surfaces a new substantive issue not in the selected items, stop and ask before expanding scope. + +### Reviewer mode + +Skip the plan step — you are not implementing someone else's code. + +**User gate 2 (reviewer)** — ask via **`AskUserQuestion`** (`multiSelect: false`): + +```ts +AskUserQuestion({ + questions: [ + { + question: "Post these on the PR?", + header: "Post review", + multiSelect: false, + options: [ + { + label: "Post", + description: "Submit a single COMMENT review with the selected anchored comments.", + }, + { label: "Edit", description: "Ask which to drop or refine, then re-prompt." }, + { label: "Cancel", description: "Stop." }, + ], + }, + ], +}); +``` + +- `Post` → post anchored review. +- `Edit` → ask which to drop or refine, then re-prompt. +- `Cancel` → stop. + +## Posting an anchored PR review (both modes) + +When the user approves items to post, submit a **single** review via the GitHub Reviews API (`event: COMMENT` — never `APPROVE`/`REQUEST_CHANGES`), with every selected actionable item as an inline comment anchored to its diff line. Never use loose issue comments. + +Follow [references/posting-pr-review.md](references/posting-pr-review.md) exactly — it has the `gh api` call, the payload shape, the three-block review body (attribution line, summary, apply-all prompt), and the per-comment body budget and format. After posting, print the review `html_url` and a one-line summary of how many comments were posted (and fallback general-notes count if any). + +## Rules + +- **No subagents.** This skill runs entirely in the main context. If a finding needs deeper independent investigation that you can't do confidently in-line, surface it as a flagged finding rather than guess. +- Issue parallel `gh`/`git` commands in a single message when independent (PR view, diff, files, viewer). +- If a finding has empty `failure_mode` → drop. NIT not meeting the gate → drop. Do-not-raise items → drop. Caps applied (6 actionable, 8 nits). +- Hide nits by default. Print only when the user selects `Show the N NIT(s) first` in the User gate 1 picker. +- Reviews are always `event: COMMENT`. Never approve or request changes on the user's behalf. +- Conditional lenses (Security/Database/Frontend) run only when classification matches. +- Freshness preflight is mandatory before reading code. Read repo context via `git show ":"`, not the worktree filesystem, unless `context_ref = worktree (stale, user accepted risk)`. +- Never run state-modifying git commands on the user's behalf (`checkout`, `stash`, `reset`, `clean`, `pull` with merge). Warn and ask. `git fetch` is allowed. +- Cross-repo evidence is opt-in by the user. The policy fires whenever the diff touches a contract/schema/published-artifact boundary OR the finding's failure_mode references a downstream actor (FE, mobile, consumer service, rolling deploy). Identify the specific external file you'd need to read, search locally first, then ask the user using the access-request template. Cap at MAJOR until verified; `skip` → either drop or keep as MINOR with a visible "speculative — assumes ``" prefix. **Never raise a "consumer will break" finding without reading the consumer.** +- `suggested_fix` is always a `{ before, after }` object of code (not prose). **In-chat Actionable:** render both halves as separate `// Before` then `// After` fenced blocks. **PR-review inline comments:** render only the `suggestion` block (GitHub already renders the diff). Empty `before` = pure addition; empty `after` = pure deletion (omit `suggestion`, write "_Delete the anchored line(s)._"); omit entirely for structural changes. +- Both user gates are mandatory and **must** be presented via `AskUserQuestion` (the structured picker), not as inline text questions. Do not auto-apply recommendations and do not auto-post reviews. diff --git a/.agents/skills/simple-review/references/cross-repo-evidence.md b/.agents/skills/simple-review/references/cross-repo-evidence.md new file mode 100644 index 0000000000..1e2dfcd910 --- /dev/null +++ b/.agents/skills/simple-review/references/cross-repo-evidence.md @@ -0,0 +1,64 @@ +# Cross-repo evidence policy + +Read this before raising or finalizing any cross-repo finding. Referenced from `SKILL.md`. + +A finding's evidence is "cross-repo" when its load-bearing claim depends on code in any repo other than the one containing the diff — most commonly a producer, consumer, or downstream parser of something the diff changes. **Never silently read external repos** — they may be stale or you may have no checkout at all. Equally, **never claim a downstream impact you have not verified.** Speculating that "the FE will break" or "a consumer will choke on this" without reading the consumer is a top source of false-positive findings. + +## When this policy fires + +Treat a finding as cross-repo _before_ raising it whenever any of these apply: + +1. The diff touches a **contract/schema package** (anything in `packages/contract-*`, `packages/*-contract*`, ts-rest contracts, JSON Schema files, OpenAPI specs, Protobuf, Avro, etc.) — consumers of these packages live in other repos by design. +2. The diff alters a **deployed-artifact boundary**: HTTP response shape, queue message shape, public SDK signature, exported library type, env-var contract, DB-record shape read by other services. +3. The failure mode you wrote references a downstream actor by role rather than name: _"the FE will…", "the mobile app will…", "consumers will…", "a rolling deploy will break for…"_ — you cannot verify any of those claims from inside the diff's repo alone. +4. The finding is a "backward-compatibility" or "rolling-deploy" concern about a producer change. +5. The finding's argument is "this diverges from how other consumers do X" — and "other consumers" are not in this repo. + +If any of (1)–(5) is true, the finding is cross-repo. Move on to **Verify or downgrade** below before keeping it in your candidate list. + +## Verify or downgrade (binding) + +For each cross-repo finding: + +1. **Identify the specific consumer/producer file(s)** you would need to read to confirm the claim. Be concrete: _"facility app's `useGetInvoiceBalances` hook to see whether it imports the contract schema and whether `.parse()` is strict."_ If you cannot name a specific artifact, the finding is speculative — drop it. +2. **Search likely locations you already have access to** before asking the user. Sibling repos under the same parent directory, monorepo workspaces, vendored copies. Use targeted `grep`/`git grep` — do not recursively scan the whole filesystem. +3. **If found locally**, run the **same freshness preflight** on that external repo (branch, dirty state, behind/ahead). Read context via `git show "${external_context_ref}:"`, never via worktree filesystem. Note "verified against: `@`" in the finding. +4. **If not found locally**, ask the user for access using the template below. Do not raise the finding until the user has responded. +5. **If the user picks `skip`**, you have two choices: + - **Drop the finding entirely** if the speculative version doesn't pass the litmus test ("concrete, current, product-visible cost"). + - **Keep as MINOR with explicit "speculative" prefix in the title** — only when the worst-case is genuinely concerning AND you can name the _assumption_ the finding rests on (e.g., _"assuming the FE imports this schema and parses strictly, then…"_). Make the assumption visible so the PR author can confirm or refute it. + +Severity cap on any cross-repo finding is **MAJOR** until verified. Anything verified as actual breakage can be raised to its true severity. + +## Access-request template + +When asking the user for access to external repos, name the specific file and the specific question. Do not ask vaguely for "access to repo X" — ask for the evidence that would change your mind. + +> One finding depends on code outside ``. Before I raise it, I need to verify: +> +> **Finding:** `` +> **Claim:** `` +> **What I need to read:** `/` — to check ``. +> +> Options: +> +> - **Local path** — give me an absolute path to a checkout (I'll run freshness preflight and read via `git show`). +> - **`gh:/`** — I'll fetch via `gh api repos///contents/?ref=main`. +> - **`skip`** — I'll either drop the finding or keep it as MINOR with a "speculative — assumes ``" prefix; you can confirm/refute. +> - **`skip all`** — apply `skip` to every remaining cross-repo finding. + +## What "verify" means in practice + +For a contract/schema change, the questions you must answer before claiming consumer impact are concrete: + +- Does the consumer **import the contract schema directly**, or define its own local schema? (Local schema = your contract change has zero direct effect on the consumer's parser.) +- If it imports, does it call `.strict()`, `.passthrough()`, or rely on the default (strip unknowns)? Required fields missing → parse failure; extra fields → silently stripped under default. +- Is the consumer pinned to a published version of the package, or symlinked/workspace-resolved? Pinned → upgrade is explicit and the consumer team controls timing. +- Is the field actually read at runtime, or just typed? Type-only references don't fail at runtime even when the schema diverges. + +For an API response change (without consumer-side schema): + +- Does the consumer JSON-parse and read specific keys, or pass the body through opaquely (e.g. to a logger, to storage)? +- Does any layer in between (BFF, gateway) apply its own validation? + +Answer these from real code, not from priors about "how FEs usually work." When you can't answer, you do not have a finding — you have a hypothesis. diff --git a/.agents/skills/simple-review/references/posting-pr-review.md b/.agents/skills/simple-review/references/posting-pr-review.md new file mode 100644 index 0000000000..6b54d7ca16 --- /dev/null +++ b/.agents/skills/simple-review/references/posting-pr-review.md @@ -0,0 +1,100 @@ +# Posting an anchored PR review (both modes) + +Read this once the user approves items to post. Referenced from `SKILL.md`. + +## Contents + +- API call and payload shape +- Review body (top-level): attribution line, summary, apply-all prompt +- Each comment body: budget, template, formatting rules + +## API call and payload + +Post via the GitHub Reviews API as a **single** review with all selected actionable items as inline review comments anchored to specific diff lines. **Never** loose issue comments. + +```bash +# /tmp/simple-review-payload.json contains: {event, commit_id, body, comments: [...]} +gh api -X POST "repos///pulls//reviews" --input /tmp/simple-review-payload.json +``` + +Payload shape: + +```json +{ + "event": "COMMENT", + "commit_id": "", + "body": "", + "comments": [ + { "path": "src/foo.ts", "line": 42, "side": "RIGHT", "body": "..." }, + { + "path": "src/bar.ts", + "start_line": 10, + "line": 14, + "start_side": "RIGHT", + "side": "RIGHT", + "body": "..." + } + ] +} +``` + +Rules: + +- `event` is **always `COMMENT`** — never `APPROVE` or `REQUEST_CHANGES`. Approval is a human decision. +- Multi-line ranges: `start_line`, `line`, `start_side: "RIGHT"`, `side: "RIGHT"`. +- No clean anchor → pick the first changed line of the most relevant file; if literally nothing anchorable, fold into the review `body` as a labeled "general note" and flag in the post-confirmation summary. +- `commit_id` from `pr.headRefOid`. +- Build JSON with literal newlines (use `jq -n --arg body "$BODY" '...'` or a small `python -c`/heredoc), not `\n` escapes. + +## Review body (top-level) + +Three blocks, in order: + +**1. Attribution line.** Exactly: + +> _These comments were generated by @\ using Claude Code._ + +Italicized. Substitute `` from `gh api user --jq .login`. Non-negotiable — collaborators must be able to tell at a glance the review is AI-assisted. Do **not** mention the Simple Review skill by name; it's a local skill the PR author can't see or use. + +**2. Summary.** Same paragraph from the in-chat synthesis: what the change does and your recommendation. Do not include mode, lenses applied, or convention sources consulted — that's noise. + +**3. "Apply all comments at once" prompt.** A fenced block with a self-contained Claude Code prompt the PR author can paste into a Claude Code session on a checkout of this branch: + +````markdown +**Apply all comments at once** — paste this into Claude Code on a checkout of this branch: + +``` +Fetch the most recent review by @ on PR . For every inline comment in that review, address the issue: when the comment includes a `suggestion` block, apply it verbatim; otherwise implement an equivalent fix that satisfies the comment's "Why it matters" rationale. After resolving each thread, post a reply on that thread with a one-line summary of what you changed. When all comments are handled, run the project's tests, commit the changes with a message that references the review, and report back any threads you could not resolve and why. +``` +```` + +## Each comment body + +**Budget:** ≤60 words of prose total + the code block. Hard caps below are binding — if a section won't fit, the finding is probably two findings; split or drop one. + +````markdown +**[SEVERITY] Title** + + + +**Why it matters:** + +**Suggested fix:** + +```suggestion + +``` +```` + +Rules: + +- **No bulleted lists in the comment body.** Bullets fragment reasoning; either the prose fits in one sentence or it doesn't belong on the PR. +- **No prose preamble on the suggested fix.** The code block speaks for itself. Don't write "Suggested fix — route Rate the same way as Pay" before the block; just show the block. +- **No trailing addenda.** "If feature X isn't shipped yet…", "Note that this also affects…", "While you're here…" — these belong in the in-chat synthesis, not on the posted comment. The reviewee can ask follow-ups on the thread. +- **Optional Example/context fenced block is opt-out by default.** Include a non-suggestion code block only when prose-plus-`suggestion` is genuinely ambiguous (e.g. you're flagging a pattern violation and the offending pattern is not in the anchored lines). Default: skip. +- **Include a `suggestion` block whenever it makes the fix concrete.** Skip when the issue is purely structural or when prose alone says everything. +- **`suggestion` block stands alone on the PR.** GitHub already renders the diff vs. anchored line(s). Do **NOT** include a `// Before` fenced block in PR comments — it duplicates what GitHub shows. This is different from the in-chat **Actionable** section, which DOES render `// Before` + `// After` pairs so the user can review before approving the post. +- **When to skip the `suggestion` block:** structural fix with no drop-in (use prose + a target-shape fenced block), OR `suggested_fix.after` is empty (pure deletion — write "_Delete the anchored line(s)._" instead), OR `suggested_fix.before` is empty (pure addition — `suggestion` block still works; prefix `after` with `// Add after line `). +- **Permalinks for in-prose line references.** Build as `https://github.com///blob//#L-L` (single line: `#L`). Always pin to `head_sha`. Use Markdown link syntax with a meaningful label. + +After posting, print the review `html_url` and a one-line summary of how many comments were posted (and fallback general-notes count if any). diff --git a/cspell.json b/cspell.json index def1540861..dd251a9890 100644 --- a/cspell.json +++ b/cspell.json @@ -8,6 +8,8 @@ "airbyte", "amannn", "anthropics", + "AntiSlop", + "anchorable", "apikey", "appkey", "argjson", @@ -40,12 +42,15 @@ "constate", "coreutils", "creds", + "cursorrules", "datadoghq", "datamodeling", "datatypes", "datetime", "decamelize", "dedupe", + "defensiveness", + "denormalization", "depcruise", "devkit", "direnv", @@ -89,6 +94,7 @@ "lastminute", "lcov", "leie", + "luxon", "Mailpit", "maxage", "maxtimeout", @@ -108,13 +114,16 @@ "nvmrc", "Nx's", "oneline", + "oncall", "overcomplicate", "oxfmt", "oxfmtrc", "oxlint", "oxlintrc", "parallelizable", + "passthroughs", "pcpu", + "permalinks", "PGID", "pgid", "pipefail", @@ -129,6 +138,7 @@ "RECERTIFICATION", "REFCODE", "reposted", + "reviewee", "retryable", "readlink", "rollup", @@ -148,6 +158,7 @@ "sonarcloud", "sourceable", "sourcegraph", + "subagents", "subshells", "stringifying", "Stringly", @@ -174,6 +185,7 @@ "unmatch", "unpublish", "unstub", + "unvalidated", "Updte", "upserting", "upserts", diff --git a/plugins/core/.claude-plugin/plugin.json b/plugins/core/.claude-plugin/plugin.json index 60a52a94c6..71cea7d52f 100644 --- a/plugins/core/.claude-plugin/plugin.json +++ b/plugins/core/.claude-plugin/plugin.json @@ -1,6 +1,6 @@ { "name": "core", - "version": "3.4.1", + "version": "3.5.0", "description": "Clipboard's core development tools.", "author": { "name": "Clipboard", diff --git a/plugins/core/README.md b/plugins/core/README.md index f44b3dc893..90d087eba9 100644 --- a/plugins/core/README.md +++ b/plugins/core/README.md @@ -13,6 +13,8 @@ Clipboard's core development tools. - [commit-push-pr](#commit-push-pr) - [datadog-investigate](#datadog-investigate) - [flaky-test-debugger](#flaky-test-debugger) + - [in-depth-review](#in-depth-review) + - [simple-review](#simple-review) - [local-package](#local-package) - [seed-data](#seed-data) - [simplify](#simplify) @@ -68,6 +70,14 @@ Investigate production issues by querying Datadog logs, metrics, and APM traces, Debug and fix flaky Playwright E2E tests using Playwright reports and Datadog. Invoke with `/flaky-test-debugger` or when investigating intermittent test failures. +### in-depth-review + +Run a multi-agent code review on the current branch or a PR identified by argument. Defaults to engineering, minimalist, conventions, and AntiSlop reviewers, conditionally adds security/database/frontend specialists based on changed files, and includes the adversarial reviewer only when explicitly requested. Invoke with `/in-depth-review` or `/in-depth-review `. + +### simple-review + +Run a single-pass code review on the current branch or a PR identified by argument. Uses the in-depth review rubric without subagents for smaller reviews or lower-budget passes. Invoke with `/simple-review` or `/simple-review `. + ### local-package Use Clipboard's internal CLI (`@clipboard-health/cli`) to link and unlink packages across repositories for local development. Invoke with `/local-package` or let Claude auto-trigger when discussing local package development. diff --git a/plugins/core/skills/babysit-pr/SKILL.md b/plugins/core/skills/babysit-pr/SKILL.md index 2fced03bc5..baaeb10189 100644 --- a/plugins/core/skills/babysit-pr/SKILL.md +++ b/plugins/core/skills/babysit-pr/SKILL.md @@ -27,9 +27,9 @@ This skill always runs exactly one pass. It never waits or repeats internally. F The skill uses two HTML-comment sentinels. -**Addressed sentinel**: ``. The `core@` suffix records which plugin version produced the reply. Appended on its own line at the end of every reply the skill posts (both thread replies and the CodeRabbit summary). This is how the skill knows, on re-runs, which threads and CodeRabbit review-body comments it already handled. Dedupe matches by the version-agnostic prefix ``) to find sentinels regardless of version; grep `babysit-pr:addressed v1 core@3.4.1` to find ones from a specific version. +**Addressed sentinel**: ``. The `core@` suffix records which plugin version produced the reply. Appended on its own line at the end of every reply the skill posts (both thread replies and the CodeRabbit summary). This is how the skill knows, on re-runs, which threads and CodeRabbit review-body comments it already handled. Dedupe matches by the version-agnostic prefix ``) to find sentinels regardless of version; grep `babysit-pr:addressed v1 core@3.5.0` to find ones from a specific version. -**Follow-up sentinel**: ``. Attached to replies that defer an out-of-scope comment as a tracked follow-up (see the Scope subsection and the Defer verdict in step 6). Grep `babysit-pr:followup` across PR conversation JSON to enumerate deferred items. This sentinel is additive — the post-reply scripts still append the `addressed` sentinel at the end, so a deferred thread is correctly machine-classified as addressed (the skill _has_ handled it — by deferring). Human reviewers and future sweeps distinguish deferred from resolved by looking for the follow-up sentinel. +**Follow-up sentinel**: ``. Attached to replies that defer an out-of-scope comment as a tracked follow-up (see the Scope subsection and the Defer verdict in step 6). Grep `babysit-pr:followup` across PR conversation JSON to enumerate deferred items. This sentinel is additive — the post-reply scripts still append the `addressed` sentinel at the end, so a deferred thread is correctly machine-classified as addressed (the skill _has_ handled it — by deferring). Human reviewers and future sweeps distinguish deferred from resolved by looking for the follow-up sentinel. **Sentinel recency rules.** The script emits a per-thread `activityState` with three values: @@ -266,7 +266,7 @@ Body templates (the script appends the `addressed` sentinel if missing): - **Agree**: `Addressed in . .` - **Disagree**: `Leaving current behavior. .` - **Already fixed**: `Already handled by . .` -- **Defer**: `Out of scope for this PR; this looks like follow-up work rather than something introduced or required by this change. .\n\n` +- **Defer**: `Out of scope for this PR; this looks like follow-up work rather than something introduced or required by this change. .\n\n` For Defer replies, include the follow-up sentinel on its own line as shown. The script will append the `addressed` sentinel after it on its own line, so the final body ends with the follow-up sentinel followed by a blank line followed by the `addressed` sentinel — `grep babysit-pr:followup` finds the deferral and `grep babysit-pr:addressed` still marks the thread handled for dedupe. @@ -281,7 +281,7 @@ bash scripts/postSentinelPrComment.sh "$PR_NUMBER" "$BODY" The CodeRabbit summary body should: - Group verdicts under **Agree / Disagree / Already fixed / Deferred (out of scope)** headings. Omit a heading if its list is empty. -- Under **Deferred (out of scope)**, list each deferred CodeRabbit review-body comment as a bullet, followed on its own line by `` so grep catches them individually. +- Under **Deferred (out of scope)**, list each deferred CodeRabbit review-body comment as a bullet, followed on its own line by `` so grep catches them individually. - Include the commit URL for fixes. - Include every current CodeRabbit review-body comment's `fingerprint` — addressed and deferred — in a fenced block at the end (one per line, before the sentinel) so future runs can dedupe. Deferred comments count as handled for dedupe purposes. diff --git a/plugins/core/skills/babysit-pr/scripts/_sentinel.sh b/plugins/core/skills/babysit-pr/scripts/_sentinel.sh index 855da99018..f98566c996 100644 --- a/plugins/core/skills/babysit-pr/scripts/_sentinel.sh +++ b/plugins/core/skills/babysit-pr/scripts/_sentinel.sh @@ -8,7 +8,7 @@ # replies; the `core@X.Y.Z` suffix records which plugin version produced it. SENTINEL_PREFIX='' +SENTINEL='' # Echo $1 with SENTINEL appended on its own trailing paragraph, unless the # body already contains any version of the sentinel (matched via SENTINEL_PREFIX). diff --git a/plugins/core/skills/commit-push-pr/SKILL.md b/plugins/core/skills/commit-push-pr/SKILL.md index 67cc0395fc..fe67ae7673 100644 --- a/plugins/core/skills/commit-push-pr/SKILL.md +++ b/plugins/core/skills/commit-push-pr/SKILL.md @@ -45,6 +45,6 @@ Script paths in this procedure are written as `scripts/...`, relative to this SK 4. Push the branch to origin. 5. Look up the current agent session ID by running this skill's bundled script: `bash scripts/find-session-id.sh ''`. Pass a distinctive verbatim chunk (≥10 words) from the most recent user message; see the script header for quoting constraints. If the script prints `codex `, use `Agent session: codex resume `. If it prints `claude-code `, use `Agent session: claude --resume `. If empty, there is no session footer line. 6. Check for an existing PR with `gh pr view`. - - No PR: create with `gh pr create`. Title = commit subject. Description = the PR body shape above, followed by the session footer line if known and ``. - - PR exists: refresh the body via `gh pr edit --body` so (a) the new commit's changes are reflected in the prose while existing `## Summary`, `## Validation`, and `## Notes` sections are preserved unless clearly stale, (b) any known session footer line is appended if missing, never removing or rewriting existing `Agent session: ...` or `Agent session ID: ...` lines, and (c) any existing `` line is preserved verbatim, appending `` if absent. Then report the URL. + - No PR: create with `gh pr create`. Title = commit subject. Description = the PR body shape above, followed by the session footer line if known and ``. + - PR exists: refresh the body via `gh pr edit --body` so (a) the new commit's changes are reflected in the prose while existing `## Summary`, `## Validation`, and `## Notes` sections are preserved unless clearly stale, (b) any known session footer line is appended if missing, never removing or rewriting existing `Agent session: ...` or `Agent session ID: ...` lines, and (c) any existing `` line is preserved verbatim, appending `` if absent. Then report the URL. 7. End with one short text response: branch name and the full PR URL (e.g., `https://github.com/clipboardhealth/core-utils/pull/123`). Never use shorthand like `repo#123` — always output the complete URL. diff --git a/plugins/core/skills/in-depth-review/SKILL.md b/plugins/core/skills/in-depth-review/SKILL.md new file mode 100644 index 0000000000..11e667ae51 --- /dev/null +++ b/plugins/core/skills/in-depth-review/SKILL.md @@ -0,0 +1,494 @@ +--- +name: in-depth-review +description: Multi-agent code review of the current branch or a PR — parallel engineering, minimalist, conventions, and AntiSlop reviewers plus conditional security, database, and frontend specialists that debate, then a synthesized review posted as anchored PR comments. Use when the user asks for a deep, thorough, or multi-perspective review, reviews a large or high-stakes diff, or runs /in-depth-review [PR-number-or-URL]. For small diffs or a quick single-pass review, use simple-review instead. +--- + +# In-Depth Review + +Run a multi-agent code review on the current branch _or_ on a PR identified by argument. By default dispatches four parallel reviewers (engineering+customer, minimalist, conventions, anti-AI-slop) and conditionally adds up to three domain specialists (security, database, frontend) when the diff touches their surface. The first-principles adversarial agent (Agent A) is **opt-in** — include it only when the user explicitly asks (see Invocation). Agents review in parallel, debate once, and the main agent moderator-filters and synthesizes a report. Author vs reviewer mode adapts the post-review flow. + +## Invocation + +- **`/in-depth-review`** — review the current branch (resolves the open PR for the branch if any, otherwise diffs against the default branch). +- **`/in-depth-review `** — fast path for reviewing someone else's PR while sitting on `main` (typical entry from Claude Desktop in code mode on a worktree). Skips the local branch checkout, fetches diff and metadata via `gh`, forces **reviewer mode**, and skips the plan/execute path. Accepts either a bare number (resolved against the current repo) or a full GitHub PR URL (which also identifies the owner/repo, useful when the worktree's remote differs). + +### Agent roster + +| Letter | Name | Role | Default state | +| ------ | ----------- | ------------------------------------- | -------------------------------------------- | +| A | Adversarial | First-principles adversarial | Opt-in only (`with adversarial`, `+A`, etc.) | +| B | Engineering | High engineering standards + customer | Always on | +| C | Minimalist | Smallest-diff posture | Always on | +| D | Conventions | Repo-rules / docs | Always on | +| E | Security | Security & API surface | Conditional (auth/route/contract files) | +| F | Database | DB schema, queries, migrations | Conditional (migration/schema/model files) | +| G | Frontend | FE conventions & UX correctness | Conditional (FE files) | +| H | AntiSlop | Anti-AI-bias / pushback on slop | Always on | + +**Refer to agents by name in everything the user sees** (Summary, Actionable items, Raised-by lines, Disagreements, Withdrawn). The letter is a compact prefix for finding IDs (`B1`, `C3`, `Adversarial`/`A1`, …) and a shorthand inside this document — never the only label in user-facing prose. The Round 2 `original_author` field stores the **name**, not the letter. + +### Agent A (first-principles adversarial) is opt-in + +The Adversarial agent (A) is skipped by default. Include it only when the user's invocation contains a clear opt-in phrase such as `with adversarial`, `with agent a`, `include adversarial`, `add the adversarial agent`, `+A`, or equivalent. If the user gives a more ambiguous instruction (e.g. "run the full review"), ask once whether to include the Adversarial agent before dispatching. Record the decision (`agent_a_enabled: true|false` + the phrase that triggered the opt-in, if any) in `/tmp/in-depth-review-meta.json` and surface it in the Summary so the user can see what ran. + +## Scope + +Two entry paths: + +### Path A — PR-argument fast path (when an argument is provided) + +- Parse the argument: bare integer → use current repo (`gh repo view --json nameWithOwner --jq .nameWithOwner`); URL → extract `/` and `` from the URL. +- Do **not** check out the PR branch. Operate from whatever the working directory is (typically `main` in a worktree). Convention/file reads happen against a verified-fresh ref — see **Freshness preflight** below. +- Mode is **locked to reviewer**. Skip mode detection. + +Gather in parallel: + +```bash +gh pr view --repo / --json number,url,title,body,baseRefName,author,headRefOid,headRepository,files +gh pr diff --repo / +gh api user --jq .login +``` + +Persist: + +- Diff stdout from `gh pr diff` → `/tmp/in-depth-review-diff.patch`. +- `pr.title + pr.body` → `/tmp/in-depth-review-context.md`. +- `pr.files[].path` → `/tmp/in-depth-review-files.txt`. +- Meta (PR number/url/baseRefName/author, viewer login, head SHA, owner/repo, mode=`reviewer`, conditional-agent set after classification) → `/tmp/in-depth-review-meta.json`. + +If `pr.author.login == viewer.login`, still proceed in reviewer mode (the user explicitly asked for the PR-arg path; honor it) but flag this in the synthesis Summary so the user can switch to a manual `/in-depth-review` run on their checkout if they meant to implement. + +### Path B — current-branch path (no argument) + +- If there is an open PR for the current branch, review that PR. Base = `baseRefName`, diff = `git diff $base...HEAD`. Capture PR title + body as context. +- Otherwise, review the current branch vs the default branch (`main`, fall back to `master`). Capture `git log --format='%h %s%n%n%b' $base..HEAD` as context. +- **Never** review uncommitted working-tree changes. If the branch has no diff vs base, stop and report. + +Gather in parallel: + +```bash +git rev-parse --abbrev-ref HEAD +git symbolic-ref refs/remotes/origin/HEAD 2>/dev/null | sed 's|refs/remotes/origin/||' +gh pr list --head "$(git branch --show-current)" --state open --json number,url,title,body,baseRefName,author,headRefOid +gh api user --jq .login +gh repo view --json nameWithOwner --jq .nameWithOwner +git diff --name-only "$base"...HEAD +``` + +Determine **mode**: + +- PR exists and `pr.author.login == viewer.login` → **author mode**. +- PR exists and authors differ → **reviewer mode**. +- No PR exists → **author mode** (reviewing your own pre-PR branch). + +Persist context so subagents read it from disk: + +- Diff → `/tmp/in-depth-review-diff.patch` +- Context (PR body or commit log) → `/tmp/in-depth-review-context.md` +- Changed-file list → `/tmp/in-depth-review-files.txt` +- Mode + repo metadata (PR number, url, base, head SHA, author, viewer, owner/repo, list of conditional agents that will run, verified-fresh `context_ref` from the freshness preflight) → `/tmp/in-depth-review-meta.json` + +## Freshness preflight (runs before Diff classification — both paths) + +Stale local state is the #1 cause of false-positive findings: it produces "evidence" of bugs that were already fixed on `main`. Before dispatching any agent, the main agent **must** verify that the ref it tells subagents to read for context is current with the remote, and pass that ref into subagent prompts so agents read code via `git show :` and `git grep ... -- ` rather than the worktree filesystem. + +Run this preflight on the **primary repo** (the one containing the diff under review): + +```bash +git fetch origin "$base" --quiet 2>&1 | tail -3 +git rev-parse --abbrev-ref HEAD +git status --porcelain +git rev-list --left-right --count "HEAD...origin/${base}" # prints "\t" +git log -1 --format='%h %ci' "origin/${base}" +``` + +Decide `context_ref` (the ref subagents must read for repo context, distinct from the diff itself): + +- **Path A (reviewer mode, PR-arg):** ideal `context_ref = origin/${base}` (whatever the PR's base branch is, usually `main`). The worktree filesystem must **only** be used as `context_ref` when `HEAD == origin/${base}` AND working tree is clean AND `git fetch` succeeded. +- **Path B (author mode, current-branch):** `context_ref = origin/${base}`. The user's local feature-branch files are the _diff_, not the _pre-PR context_; pre-PR convention/neighbor reads must come from `origin/${base}`. + +If any of the following are true, **stop and ask the user before continuing**: + +1. `git fetch origin "$base"` failed (offline, auth, repo permissions). Print the error. +2. `HEAD` differs from `origin/${base}` (Path A only — Path B expects HEAD on a feature branch). +3. Working tree is dirty (`git status --porcelain` is non-empty) **and** the dirty paths overlap the diff's changed-file list or any path subagents will need to read. +4. `HEAD` is behind `origin/${base}` (any non-zero "behind" count). +5. `HEAD` is more than a small number of commits ahead of `origin/${base}` on Path A (ahead implies the user is mid-work on a branch that isn't the PR; their worktree is not a clean main). + +Warn template (substitute the verified state): + +> Freshness check for `/` at ``: +> +> - on branch `` (expected `` for Path A) +> - `` commit(s) ahead, `` commit(s) behind `origin/` (last remote commit: ` `) +> - working tree: `` +> +> Reading code from this worktree may surface findings based on stale or local-only state. +> Reply: `proceed` (use worktree anyway and accept the risk), `use-origin` (read context via `git show origin/:` instead — recommended, no checkout needed), or `stop` (let me clean up and re-run). + +**Never** run `git checkout`, `git stash`, `git reset`, or any other state-modifying git command on the user's behalf. The skill warns and asks; the user is the only actor that resolves local state. + +Record the resolved `context_ref` (e.g. `origin/main`, or the literal string `worktree (stale, user accepted risk)`) in `/tmp/in-depth-review-meta.json`. Every subagent prompt must explicitly state which ref to read and how (see "Reading code in subagents" below). + +### Reading code in subagents + +When `context_ref` is `origin/`: + +- Read repo context via `git show "${context_ref}:"` (for whole files) and `git grep -n "${context_ref}" -- ` (for searches). +- The Read tool reads the working tree, which may be stale; subagents must **prefer** `git show` for context reads. The working tree may only be read when the file isn't tracked at `context_ref` (e.g. brand-new file in the PR diff) and that fact is explicitly noted in the finding. + +When `context_ref` is `worktree (stale, user accepted risk)`: + +- Subagents may use Read on the worktree, but every finding must include a one-line "verified against: worktree-stale" caveat in the `point`, and the moderator filter must downgrade CRITICAL/MAJOR to MINOR unless the evidence is internal to the diff itself. + +## Cross-repo evidence policy (binding for all agents) + +A finding's evidence is "cross-repo" when it depends on code in any repo other than the one containing the diff (e.g. a consumer or producer in another service). Subagents must **NOT** silently read external repos — stale checkouts fabricate evidence and unverifiable paths can't be checked. Emit such findings with an `evidence_required` field and cap severity at **MAJOR** until the moderator verifies the evidence on a fresh ref; after Round 1 the moderator collects every `evidence_required` block and asks the user for access before Round 2. + +Read [references/cross-repo-evidence.md](references/cross-repo-evidence.md) before raising or finalizing any cross-repo finding — it has the `evidence_required` schema, the moderator's access-request prompt, the freshness rules for external repos, and `skip` handling. + +## Diff classification (runs before Round 1) + +Match the changed-file list against these path patterns. A conditional agent runs only when its pattern matches at least one changed file: + +- **E — Security** triggers on: any file under `routes/`, `controllers/`, `middleware*/`, files matching `auth*`, `*permission*`, `*acl*`, `*token*`, `*session*`, response serializers, OpenAPI / contract definitions, or new API endpoint files. +- **F — Database & migrations** triggers on: `migrations/`, `*.sql`, files matching `schema*`, Mongoose/Prisma model files (`models/`, `*.model.ts`, `*.schema.ts`), repository / DAL files, or files containing query builders. +- **G — Frontend** triggers on: `*.tsx`, `*.jsx`, `*.css`, `*.scss`, files under `pages/`, `components/`, `hooks/`, or anything importing from `react`, `@tanstack/react-query`, or a design-system package. + +Record the dispatched set in `/tmp/in-depth-review-meta.json` so Round 2 knows the full agent list. + +## Severity rubric (binding for all agents) + +- **CRITICAL** — realistic input causes incorrect behavior, data loss, security regression, broken contract, or a paging incident. +- **MAJOR** — meaningful degradation of correctness/UX/observability under realistic conditions; OR a documented-convention violation with concrete downstream impact. +- **MINOR** — cheap, concrete improvement with a named benefit. +- **NIT** — only admissible if (a) repeats ≥2× in the diff, (b) conflicts with a documented convention, or (c) is a one-line trivial fix. Otherwise drop silently. + +Every finding **must** include a `failure_mode` field: one sentence on the concrete user-, oncall-, or maintainer-visible bad outcome that would occur if not fixed. Hypotheticals like "a future caller might…" do not satisfy this; such findings will be dropped during moderation. + +Run the litmus test on every candidate finding before keeping it: _"What is the concrete, current, product-visible cost of leaving this code in?"_ If you cannot answer in one sentence, drop the finding. + +### Do-not-raise list (binding) + +Do not surface any of: + +- Speculative defensiveness at trusted internal boundaries (e.g. "what if this private helper got null?"). +- Restating the obvious ("consider a comment explaining what this does"). +- Hypothetical future-caller scenarios with no current caller. +- Style/formatting a linter or formatter already covers. +- Test-coverage demands on trivially-verifiable code (one-line passthroughs, simple getters). +- "Add observability" without naming a concrete failure mode it would help debug. +- Abstract SOLID-style "consider extracting…" suggestions without a concrete failure mode. +- Aesthetic naming preferences. Only raise names that mislead about behavior. + +## Rounds (hard cap: 2) + +### Round 1 — parallel independent reviews + +Dispatch all selected agents **in the same message** via the `Agent` tool with `subagent_type: general-purpose`. The default selected set is **Engineering, Minimalist, Conventions, AntiSlop** (B/C/D/H), plus **Security, Database, Frontend** (E/F/G) when triggered by classification. **The Adversarial agent (A) is dispatched only when `agent_a_enabled` is true** (see Invocation — "Agent A is opt-in"). Each agent gets the file paths above, their role brief, and permission to Read repo files for context. Do **not** let agents see each other's output in this round. **Each agent emits at most 8 findings, prioritized — not exhaustive.** + +**Every subagent prompt must include the following input contract verbatim** (substitute `` with the value resolved in the Freshness preflight): + +> **Context-read contract.** The verified-fresh ref for repo context is ``. For any file you read that is part of the primary repo's tracked content (i.e. _not_ a brand-new file in this PR's diff), use `git show ":"` or `git grep -n "" -- ` rather than the Read tool on the worktree. If you read from the worktree because the file is new in the diff or untracked at ``, say so in the finding's `point`. +> +> **Cross-repo evidence contract.** If a finding's load-bearing evidence is in any repo other than this one, do NOT silently read another local checkout — those checkouts may be stale or on unrelated branches. Instead, emit the finding with an `evidence_required` field naming the repo(s) and the specific verification question, and cap your severity at MAJOR. The moderator will ask the user for verified access before Round 2 and re-dispatch you if needed. + +Required output per finding: `id` (`A1`, `B3`, `C2`, `D4`, `E1`, `F1`, `G1`, `H1`, …), `severity`, `file:lines`, `title` (one sentence), `point` (one short paragraph), `failure_mode` (one sentence), optional `suggested_fix` (see schema below), and optional `evidence_required` (object with `repos: string[]` and `what_to_verify: string` — required whenever the finding depends on cross-repo state). + +**`suggested_fix` schema** (when present): an object with two string fields, both code (not prose), preserving the file's actual indentation: + +```json +"suggested_fix": { + "before": "", + "after": "" +} +``` + +- For a **pure deletion**, set `after` to the empty string `""`. +- For a **pure addition** at a new line (nothing replaced), set `before` to the empty string `""` and prefix `after` with a one-line comment indicating where it lands (e.g. `// add after line 142`). +- For a **structural change** with no clean drop-in (e.g. "move this file"), omit `suggested_fix` entirely and put the guidance in `point` / `failure_mode`. +- Both fields must be code snippets directly usable to compute a diff. Do not include surrounding prose, do not paraphrase, and do not elide content with `// ...`. If the change is too large to inline both, omit `suggested_fix` and describe the fix in `point`. + +#### Always-on agents (Engineering / Minimalist / Conventions / AntiSlop) and the opt-in Adversarial agent + +**Adversarial (A) — First-principles. Opt-in only.** Dispatch only when the user explicitly asks (see Invocation — "Agent A is opt-in"). Skip silently otherwise. Posture: _"What is the single most important thing this PR gets wrong? Then up to 7 more, ranked."_ Question whether the change actually solves the underlying problem and whether the chosen approach is right. Challenge specific library / pattern / approach choices when they don't hold up on their own merits. Do **not** challenge foundational stack choices that are out of scope for the PR. Flag internal inconsistencies inside the diff itself. Specifically also probe: + +- Should this PR be split, or is it bundling unrelated tickets? +- Is the root cause of the bug clear, and is it fixed in _every_ place it manifests? +- Is a feature flag warranted but missing? +- Are there old feature flags or dead references this change makes deletable? + +Conventions are the Conventions agent's (D) job — do not double up. + +**Engineering (B) — High engineering standards + customer-centric.** Posture: _"For each change, name a realistic input or condition that would expose a bug. If you cannot, do not raise it."_ Evaluate edge cases, error paths, observability, tests (coverage of real risk, not lines), high-level security (E does the deep dive when dispatched), concurrency, performance at real scale, data integrity, accessibility, UX, degraded-mode behavior, backward compatibility, on-call implications, and domain assumptions in the diff. Specifically also probe: + +- Async/await correctness — ordering matches the actual dependency between calls. +- Timezone correctness in date-handling code; currency variables explicitly in minor units. +- When API surface changed: AuthN, AuthZ, sensitive-data leakage, PII handling. Hand the deep dive to Security (E) if dispatched. +- When a migration is in the diff: rollback strategy, backward compatibility during rolling deploy, ETL / downstream impact. Hand to Database (F) if dispatched. +- When DB schema or queries changed: indexes for new query patterns, N+1 risk, cascade-delete semantics, data-type fit. Hand to Database (F) if dispatched. +- Telemetry covers _business / product_ value, not only engineering surface metrics (latency / error rate). +- Contract / backward-compatibility for any consumer-visible response shape change. If you suspect a consumer break, the evidence is cross-repo — emit `evidence_required` and cap at MAJOR rather than asserting it; never raise a speculative "the FE will fail to parse" finding without verified evidence. + +Skip items that fail the realistic-input test. + +**Minimalist (C).** Posture: the smallest diff that ships the intent is the best diff. Identify unneeded abstractions, speculative generality, dead branches, redundant validation, defensive code at trusted internal boundaries, comments that restate code, new files / utilities that duplicate existing ones, tests that test the framework rather than behavior, flags / config knobs without a concrete caller. Specifically also probe: + +- Duplicated error handlers in the same scope. +- Commented-out code or dead branches (still gated by `failure_mode`). + +For every "delete this" finding, `failure_mode` must state the _concrete cost of keeping the code_. + +**Conventions (D) — Repo-rules.** Sole owner of convention checks (backend / shared). Read `AGENTS.md`, `CLAUDE.md`, `.rules/`, `.cursorrules`, `docs/adr/` (or equivalent), package READMEs in the diff's path, and one or two neighboring files in the same service for in-practice patterns. Then check the diff for: + +- Preferred internal packages over third-party ones (e.g. `@clipboard-health/*`, internal `lib/*`). +- Terminology rules (worker/workplace, not HCP/facility). +- `lodash` removal preference. +- `@clipboard-health/datetime` over `date-fns`, `date-fns-tz`, `moment`, `luxon`. +- Internal `Money` package for currencies; explicit minor-units variable names. +- Logging conventions (`longContext`, `ObjectId.toString()`, no variables in log messages). +- Null/undefined checks via `isDefined` / `isNil` over truthy. +- 4+ argument functions → params object. +- Error-handling patterns in the same service (typed `ServiceResult` vs throw). +- Test conventions, naming, file location (service test structure, `createTestContext` / `tearDown`, `beforeAll` / `afterAll` for GET). +- Business-meaningful magic constants that should be named. +- ENV access via the project's Configuration abstraction, not direct `process.env`. +- Logging library standard for the repo (e.g. Winston) — derive from existing code, not hardcoded. +- Pinned dependencies; lock file (`package-lock.json` / `yarn.lock`) in sync with `package.json`. +- RESTful route shape and uniform response format within a service. +- Internal inconsistency _within the diff itself_ (e.g. half of imports from one package family, half from upstream). + +Frontend conventions are the Frontend agent's (G) territory when G is dispatched. If Frontend is not dispatched (no FE files), Conventions (D) may surface FE convention drift if it appears. + +Tag every finding `[CONVENTION]`. Cap severity at MAJOR (only when behavior diverges as a result of the violation) or MINOR otherwise. At the top of your output, list the specific files/sources you checked. + +**AntiSlop (H) — Anti-AI-bias.** Posture: _"This PR may have been written or assisted by an LLM. For each addition, ask: 'is this line earning its keep, or is it pattern-matching what code is supposed to look like?' If a reasonable senior engineer wouldn't have written it, call it out and propose deleting or shrinking it."_ Your job is to push back on AI-shaped padding that the other agents are too polite to call slop. Specifically watch for, **inside the diff itself**: + +- **Defensive code at trusted internal boundaries.** Null / undefined guards on private helpers whose callers' types already guarantee non-null. `try` / `catch` wrapping a single call that doesn't itself throw, or that re-throws unchanged, or that "logs and swallows" without naming what to do next. Optional-chaining cascades through types that don't include optionality. +- **Defensiveness against unrealistic product / domain scenarios.** Code that guards against situations that wouldn't actually arise when the product is used as designed. Ask the litmus question: _"In the real product flow this code participates in, what user action, system event, or upstream call could land us in this branch?"_ If the answer is "none" or "I had to invent one to justify the guard", the code is slop. Concrete shapes this takes: + - A `null` / `undefined` guard on an ID immediately after that ID was used to load (and find) the entity. + - A `null` / `undefined` / `""` / `0` branch on a field whose TypeScript type permits only one of those values, or whose Zod / class-validator schema already rejected the others at the request boundary. + - A re-validation of a value the request DTO already validated upstream in the same request lifecycle. + - A consistency check (`if (a !== b) throw`) between two fields the data model forbids being unequal. + - A branch for a product-impossible state — e.g. "what if the worker is assigned to two workplaces at once" when the product enforces single-active-assignment; "what if the shift end is before its start" after the schema already enforces ordering; "what if the dispute amount is negative" when the validator caps it. + - A retry / fallback layer around an SDK or library call that already retries or already returns a typed error you can ignore. + - A `catch` clause for an error class the underlying code is statically known not to throw, or that "logs and swallows" without naming what to do next. + - A "future-proof" code path (`if (feature === "v2")`) when there is no `v2` and no concrete ticket creating one. + + Don't accept "but what if upstream changes?" as a defense — that's the hypothetical-future-caller anti-pattern. The fix when upstream genuinely changes is a typed-error / schema update in the same PR, not preemptive guards. + +- **Restating-the-code comments.** `// fetch the user`, `// loop through items`, `// add 1 to counter`. JSDoc on private helpers that only restates the signature. `// Note: this is important` lines. Section banners (`// === HELPERS ===`) inside short files. +- **Empty scaffolding.** `// TODO` comments with no owner, ticket, or trigger condition. Redundant pre-conditions (`if (!x) throw` immediately after a Zod parse). Log lines added "for debugging" that survive into the PR. Default `else { return undefined; }` after already-exhaustive branching. `_unused` parameter prefixes that should just be deletions. +- **Speculative generality.** A helper called once that wraps two trivial lines. A `Map` / `Set` / config object keyed by a single hardcoded value. A "strategy" / "registry" pattern with one strategy. Type unions whose only second case is `never`, a placeholder, or a string literal nobody emits. +- **Unused parameters / overloads / fields.** Function args destructured but never read; interface methods with empty implementations; type fields written but never consumed downstream in the diff; new optional fields with no producer or consumer. +- **Tests that exercise the framework, not the code.** Tests that assert "the mock returns what the mock returned" via `jest.fn().mockReturnValue(x); expect(fn()).toBe(x)`. Snapshot tests with no semantic assertion. Tests that mock the unit under test. Tests with names that describe the implementation (`it("calls foo.bar"…)`) rather than the behavior. +- **Dead AI breadcrumbs.** Variables whose only use is being logged or echoed in a debug branch; `console.log` / `console.error` calls that should have been removed before commit; commented-out alternative implementations left in the diff; "\_legacy" parameter renames where nothing actually changed about how it's read. +- **Tone / description mismatch.** PR description / commit message claims behavior the diff doesn't have, or describes the diff in language that pattern-matches engineering writing without naming the actual change. Variable / function names that sound right but don't reflect the value's role (`data`, `result`, `processed`, `_handle`, `doStuff`). + +For every finding, `failure_mode` must name the _concrete cost of leaving the code in_ — e.g. "future readers waste cycles understanding a wrapper that does nothing", "the unused parameter masks the next refactor's mistake", "the `try` / `catch` silently swallows an unrelated runtime error", "the restating comment goes stale in the next change and misleads readers", "the speculative helper makes call-site search return one false hit per call". A finding without a concrete cost-of-keeping fails the moderator filter. + +The default `suggested_fix` shape is **delete** (empty `after`) or **simplify** (smaller `after` than `before`). Suggesting "add a justifying comment" is itself slop — do not propose it. If the right move is structural (e.g. inline the wrapper), omit `suggested_fix` and describe the move in `point`. + +Stay in your lane: + +- Do **not** raise items that the Conventions agent (D) owns (terminology, lodash, datetime / money libraries, logging context shape) — that's their turf. +- Do **not** demand observability, tests, or error handling that **does not exist** in the diff — that's the inverse of your job. You only call out what's _present_ and unnecessary. +- Do **not** challenge whether the PR solves the right problem — that's the Adversarial agent's (A) turf when opted in. +- A change that's small but unnecessary is still slop. A change that's large but earns each line is not. + +Cap output at 8 findings. `failure_mode` required. + +**Round 2 expanded role — audit the other agents' findings, not just your own.** In Round 2 you receive every other agent's Round 1 findings. In addition to defending or withdrawing your own (the standard Round 2 contract), you are the AI-bias counterweight in the debate: audit every other agent's finding for the same slop patterns you scan code for. Mark each such finding `disagree` with a one-line `slop:` label in the `reasoning` field. Specifically watch for: + +- **Asks-for-more-defensiveness.** "Add a null check", "wrap in try / catch", "validate this input", "guard against …". If the value is already typed, already validated upstream, or already guaranteed by a preceding call inside the same scope, tag `slop: asks for defensive guard on already-narrowed value`. +- **Hypothetical concerns.** "What if a future caller passes …", "consider the case where …", "if someone later …". If the finding cannot name a current realistic input or product flow that hits the concern, tag `slop: hypothetical future caller — no current path`. +- **Restating-the-obvious comment requests.** "Add a JSDoc explaining what this does", "consider a comment here". If the name + signature already convey intent, tag `slop: restating-the-obvious comment request`. +- **Abstract "extract this helper" / SOLID-aesthetic refactors.** Refactor suggestions whose `failure_mode` is shape-of-the-code rather than a concrete cost of keeping the inline version. Tag `slop: refactor with no concrete cost-of-keeping`. +- **Observability demands without a named failure mode.** "Add a log here", "emit a metric". If the finding can't name the specific failure the telemetry would help debug, tag `slop: observability without named failure mode`. +- **Test-coverage demands on trivially-verifiable code.** "Add a unit test for this getter / passthrough / one-line wrapper". If the code's correctness is type-evident or already exercised by a higher-level test in the same PR, tag `slop: test for trivially-verifiable code`. +- **Domain-impossible scenarios surfaced by other agents.** Any finding that worries about a state the product cannot produce given how it actually works. Apply the same litmus question to _the finding_ that you apply to code: "what real product flow could land us here?" If none, tag `slop: defends against a state the product cannot produce`. + +The moderator filter weighs your slop tags heavily — see the moderator filter step on AntiSlop tags. The original author gets a chance to defend in their own Round 2 entry (a concrete `final_failure_mode` naming a real, product-specific cost); without that defense, the finding is dropped. + +Even when your Round 1 findings are sparse because the diff is genuinely clean, do not under-spend on the Round 2 audit. The audit is often the bigger lever — it stops slop _suggestions_ from polluting the final review. + +#### Conditional agents + +**Security (E) — API surface.** Dispatched when the diff touches auth / route / permission / serializer / contract files. Posture: _"Where could this change leak data, bypass authorization, or expand the trust boundary?"_ Specifically check: + +- AuthN: every new or modified endpoint has authenticated identity unless explicitly public. +- AuthZ: per-endpoint and per-resource permission checks; cross-tenant / cross-user access. +- Secrets in configuration — no hardcoded keys, no secrets in client bundles. +- No self-made or client-side cryptography. +- SQL / NoSQL injection vectors (string-concatenated queries, unvalidated `$where`, raw aggregation pipelines from user input). +- Sensitive data in response payloads — over-fetching, over-serialization, internal IDs leaked. +- PII handling — PII fields logged, cached, or sent to telemetry. +- Input validation at the boundary (Zod / class-validator) on every new endpoint. +- For numeric inputs: explicit `min` / `max` bounds. Verify what the JS engine does with `Number.MAX_SAFE_INTEGER + 1`, BigInt conversion, negative values, and whether the catch path returns a structured 400 or a 500 with stack log. + +Cap output at 8 findings. `failure_mode` required. + +**Database (F) — Schema, queries, migrations.** Dispatched when the diff touches migrations / SQL / schema / model / query files. Posture: _"What breaks at production scale or during deploy?"_ Specifically check: + +- Index coverage for new query patterns; missing compound indexes. +- N+1 query shapes; loops issuing per-iteration queries. +- Schema normalization — denormalization that creates write-amplification or update anomalies. +- Data types — range, precision, locale (numeric, date, money). Mongoose `Number` is IEEE 754 double; safe for integers up to 2^53. +- Cascade-delete and FK-on-delete semantics; orphan risk. +- Migration rollback strategy; can the migration run online without downtime? +- Backward compatibility during a rolling deploy (old code reads new schema, new code reads old schema). +- ETL / downstream consumer impact when schema or field semantics change. +- New collections / tables ship with the indexes their query patterns need on day one. +- Dual-write fields (storing both `amount` and `amountInMinorUnits`-style pairs): is canonicalization happening on write, or does the on-disk record encode two contradictory values? +- Backfill for new fields on existing records: present, deferred (with ticket), or missing? + +Cap output at 8 findings. `failure_mode` required. + +**Frontend (G) — Conventions & UX-correctness.** Dispatched when the diff touches `*.tsx` / `*.jsx` / FE component / hook / page paths. Posture: _"Does this follow our FE conventions, and will it behave correctly under realistic user conditions?"_ Specifically check: + +- API calls go through v2 / custom hooks, not raw `fetch` / `axios` in components. +- React Query: thin wrappers (`useGetQuery`, etc.), not raw `useQuery` / `useMutation`. +- Falsy-value checks use `isDefined()` / `isNil()`, not `!value`. +- Test actions wrapped in `act()`. +- Zod schemas: `z.nativeEnum` used where a TS enum already exists. +- Prop drilling — when a value is passed through ≥2 layers, is a context preferable? +- Feature flags via `useCbhFlag` (or repo equivalent), not direct config reads. +- New components / styles pulled from the design-system library, not built from scratch. +- Loading / empty / error states present for any new data-fetching surface. +- Accessibility: keyboard navigation, ARIA roles, labels on interactive controls. + +Cap output at 8 findings. `failure_mode` required. + +### Round 2 — debate + +Dispatch the **same set of agents** that ran in Round 1, in parallel. Each agent receives **all Round 1 outputs** (passed inline) and must: + +- Re-examine each of their own comments and **withdraw** any that don't survive scrutiny. +- For each comment from the other agents: mark `agree`, `disagree` (with reasoning), or `refine` (propose a tighter version). +- Flag items where disagreement is substantive and unlikely to resolve. + +Output per item: `id`, `original_author` (agent **name**, e.g. `Engineering`, `Minimalist`, `Conventions`, `AntiSlop`, `Security`, `Database`, `Frontend`, `Adversarial` — not the bare letter), `verdict` (keep | withdraw | agree | disagree | refine), `final_severity`, `final_title`, `final_failure_mode`, `reasoning`, `suggested_fix`, and rebuttals `[{from, stance, reasoning}]` where `from` is also the agent name. + +**AntiSlop (H) plays an enhanced Round 2 role.** Beyond defending / withdrawing its own findings and voting on the others, AntiSlop audits every other agent's findings for AI-bias patterns — asks-for-more-defensiveness, hypothetical "future caller" concerns, restating-the-obvious comment requests, abstract "extract this helper" refactors, observability / test-coverage demands without a named failure mode, and guards-against-states-the-product-cannot-produce. AntiSlop tags those `disagree` with a one-line `slop:` label in `reasoning`. These tags feed the moderator filter (see step 2 there); the original author's Round 2 entry is their chance to rebut with a concrete, product-specific `final_failure_mode`. When other agents see an AntiSlop slop tag on one of their own findings, they should either rebut with a concrete failure mode or withdraw — not both stand on the original framing and re-assert the same wording. + +**This is the last round.** Residual disagreement goes to the Disagreements section. + +## Moderator filter (main agent — runs after Round 2, before synthesis) + +Apply these filters in order. They are non-negotiable: + +1. **Drop findings with empty or hypothetical `failure_mode`.** "A future caller might…", "in case someone…" → drop. +2. **Apply AntiSlop slop tags.** For every Round 2 entry where AntiSlop voted `disagree` with a `slop:` label in `reasoning`, check the original author's Round 2 rebuttal. If the author did not provide a concrete, product-specific `final_failure_mode` (a real user-, oncall-, or maintainer-visible cost realized through an actual product flow) that addresses AntiSlop's specific tag, **drop the finding**. AntiSlop's audit is not a unilateral veto, but the burden of proof shifts to the original author once AntiSlop tags slop. Record dropped findings in Withdrawn with the slop label so the user can audit AntiSlop's calls (e.g. `B4 — dropped: AntiSlop tagged "slop: asks for defensive guard on already-narrowed value", Engineering did not rebut with concrete failure_mode`). +3. **Drop do-not-raise items** that slipped through. +4. **Apply the NIT gate**. NITs that don't meet (a)/(b)/(c) → drop. NITs that do meet are kept internally but hidden by default in synthesis. +5. **Merge near-duplicates** across agents into one item (preserve all attributions in `Raised by:`). +6. **Apply hard caps**: at most **6 actionable** items (CRITICAL/MAJOR/MINOR) and **8 NITs** retained internally. Anything beyond is summarized as "N additional items omitted; ask for the full list." + +Filtered items go into Withdrawn (one-liner) so nothing is invisible. + +## Synthesize (main agent) + +### Summary + +One short paragraph: what the change does and your overall recommendation (ship / ship with changes / do not ship). Do not state the mode, name which agents ran, attribute headline verdicts to specific agents, or note whether Adversarial ran — that methodology metadata is noise for the user. Keep it to substance: the headline concerns and the recommendation. + +### Actionable + +Up to 6 items that survived scrutiny and the moderator filter. Format each: + +- **[SEVERITY] Title** — `file:lines` +- **Point:** one sentence. +- **Why it matters:** the `failure_mode` sentence. +- **Counterpoint (if any):** one sentence on the rebuttal and why it didn't overturn. +- **Suggested fix (before → after):** two stacked fenced code blocks with language tags — first labeled `// Before` (the current code from `file:lines`), then `// After` (the replacement). Use the same language tag for both. Omit when the fix is structural (no clean drop-in) and explain in prose instead. For a pure deletion, show the `Before` block and write "_Delete these lines._" under it. For a pure addition, show only the `After` block prefixed with `// Add after line `. +- **Raised by:** comma-separated agent **names** (e.g. `Engineering, Conventions`); **agreed by:** comma-separated agent names. Do not use bare letters in this field — the user shouldn't have to decode A/B/C. + +Render template: + +````markdown +**Suggested fix (before → after):** + +```ts +// Before — src/foo/bar.ts:42-45 +const x = doThing(); +if (x !== null) { + return x; +} +``` + +```ts +// After +const x = doThing(); +return isDefined(x) ? x : undefined; +``` +```` + +Order by severity (CRITICAL → MAJOR → MINOR), then by file. + +### Disagreements + +Items where agents substantively disagreed and did not converge. One sentence per side (attribute each by agent **name**, e.g. "Engineering: …" / "Minimalist: …"), then the moderator's call with a one-line reason. + +### Nits + +Do **not** print the nits list by default. Print only one line: _"N nits available (M from convention audit). Reply `show nits` to see them."_ If N is 0, omit the section entirely. When the user replies `show nits`, print the gated NITs as one-liners with `file:lines`, the raising agent's **name**, and `[CONVENTION]` tag where applicable, then re-prompt user gate 1. + +### Withdrawn + +Terse one-liners of Round 1 comments that were withdrawn or filtered. Each line ends with the raising agent's **name** in parentheses (e.g. "C7 — superseded by C4 (Minimalist)"). For transparency only. + +## User gate 1 — select items + +Ask exactly: _"Which items do you want to act on? Reply with ids (e.g. `B1, C2`), `all actionable`, `show nits`, or `none`."_ Do not proceed until the user answers. `none` ends the skill cleanly. `show nits` prints the hidden list, then re-asks the same question. + +## Branch on mode + +### Author mode + +**Plan.** For the selected ids only, produce an ordered implementation plan: steps, files touched per step, tests to add/update, verification commands. Do **not** edit any files yet. + +**User gate 2 (author mode).** Ask: _"What do you want to do? (`implement-locally` / `post-as-review` / `both` / `edit-plan` / `cancel`)"_ + +- `implement-locally` → execute (see below). +- `post-as-review` → post anchored review (see below); skill ends. +- `both` → post the review first, then execute. +- `edit-plan` → revise from feedback, re-prompt. +- `cancel` → stop. + +**Execute.** Use `TodoWrite` to track each step, apply the plan, run the verification, report results. If a step surfaces a new substantive issue not in the selected items, stop and ask before expanding scope. + +### Reviewer mode + +Skip the plan step entirely — you are not implementing someone else's code. + +**User gate 2 (reviewer mode).** Ask: _"Post these on the PR? (`post` / `edit` / `cancel`)"_ + +- `post` → post anchored review (see below). +- `edit` → ask which items to drop or refine, then re-prompt. +- `cancel` → stop. + +## Posting an anchored PR review (both modes) + +When the user approves items to post, submit a **single** review via the GitHub Reviews API (`event: COMMENT` — never `APPROVE`/`REQUEST_CHANGES`), with every selected actionable item as an inline comment anchored to its diff line. Never use loose issue comments. + +Follow [references/posting-pr-review.md](references/posting-pr-review.md) exactly — it has the `gh api` call, the payload shape, the three-block review body (attribution line, synthesis summary, apply-all prompt), and the per-comment body budget and format. After posting, print the review `html_url` and a one-line summary of how many comments were posted (and how many fell back to general notes, if any). + +## Rules + +- Use fresh `general-purpose` subagents for every round — do not reuse other subagent types. +- Issue all agent calls in a single message for each round so they run truly in parallel. +- Keep large content on disk (`/tmp/in-depth-review-*`) so round-2 prompts stay compact. +- If one agent fails or returns malformed output, re-dispatch **that agent only**; do not restart the round. +- If the diff is empty, stop with a one-line message — do not invent findings. +- Do not auto-apply recommendations and do not auto-post reviews. Both user gates are mandatory. +- The moderator filter is non-negotiable: empty `failure_mode` → drop, NIT not meeting the gate → drop, do-not-raise items → drop, caps applied. +- Hide nits by default. Only print them when the user replies `show nits`. +- Reviews are always `event: COMMENT`. Never approve or request changes on the user's behalf. +- Posted comment bodies are terse: ≤60 words of prose + the code block, no bulleted lists, no prose preamble on the `suggestion`, no trailing addenda, and `Why it matters` is dropped when it would only rephrase the Point. The Example/context block is opt-out — include it only when prose + `suggestion` is genuinely ambiguous. +- Conditional agents (E/F/G) run only when classification matches. Do not invent triggers; if the user wants a forced full run, they will say so. +- The Adversarial agent (A) is opt-in. Default selected set is Engineering / Minimalist / Conventions / AntiSlop (B/C/D/H), plus the classified subset of Security / Database / Frontend (E/F/G). Dispatch Adversarial only when the user explicitly asks (`with adversarial`, `with agent a`, `+A`, etc.). Surface the Adversarial on/off decision in the Summary so the user can see what ran. +- **Refer to agents by name in every user-facing surface** — Summary, Actionable items (`Raised by` / `agreed by`), Disagreements, Withdrawn, and `original_author` / `rebuttals[].from` in Round 2 output. Bare letters (A/B/C/D/E/F/G) are acceptable only as ID prefixes (`B1`, `C3`) and as parenthetical shorthand next to the name. The agent-name table in the "Agent roster" section is the authoritative mapping; do not invent alternative names. +- Freshness preflight is mandatory before any agent dispatch. Subagents read repo context via `git show ":"` / `git grep ... ""`, not the worktree filesystem, unless the resolved `context_ref` is `worktree (stale, user accepted risk)`. +- Never run state-modifying git commands on the user's behalf (`checkout`, `stash`, `reset`, `clean`, `pull` with merge). The skill warns and asks; the user resolves local state. `git fetch` is allowed because it does not modify the working tree. +- Cross-repo evidence is opt-in by the user. Subagents tag findings with `evidence_required` and cap them at MAJOR until verified; the moderator asks the user for paths or `gh:owner/repo` slugs and runs the freshness preflight on each before treating the evidence as load-bearing. `skip` downgrades the finding to MINOR with a "speculative" prefix. +- `suggested_fix` is always a `{ before, after }` object of language-matched code snippets (never prose). **In-chat synthesis Actionable section:** render both halves as separate language-tagged fenced blocks (`// Before` then `// After`) — the user has no GitHub renderer in their terminal, so they need the pair to review the change before approving the post. **PR-review inline comments:** render ONLY the `suggestion` block — GitHub already renders the diff vs. the anchored line(s), so a separate `// Before` block would duplicate what's on screen. Empty `before` means pure addition (still emit a `suggestion` block; prefix `after` with `// Add after line `). Empty `after` means pure deletion (omit the `suggestion` block and write "_Delete the anchored line(s)._"). Omit `suggested_fix` entirely for structural changes with no clean drop-in. diff --git a/plugins/core/skills/in-depth-review/references/cross-repo-evidence.md b/plugins/core/skills/in-depth-review/references/cross-repo-evidence.md new file mode 100644 index 0000000000..8f25125f43 --- /dev/null +++ b/plugins/core/skills/in-depth-review/references/cross-repo-evidence.md @@ -0,0 +1,37 @@ +# Cross-repo evidence policy (binding for all agents) + +Read this before raising or finalizing any finding whose evidence may live outside the repo containing the diff. Referenced from `SKILL.md`. + +A finding's evidence is "cross-repo" when it depends on code in any repo other than the one containing the diff. Examples: the diff is in `clipboard-health` but the finding claims that `cbh-admin-frontend`, `payment-service`, or `worker-service-backend` still calls a deprecated endpoint. + +**Subagents must NOT silently read external repos.** Doing so risks (a) reading a stale local checkout and fabricating evidence (see "Freshness preflight"), or (b) citing a path the user has no checkout of, which the moderator can't verify. + +Subagent rule: if a finding's load-bearing evidence is in another repo, the subagent must emit the finding with the additional field: + +```json +"evidence_required": { + "repos": ["cbh-admin-frontend", "payment-service"], + "what_to_verify": "concrete grep / file path / question the moderator should answer to confirm or kill the finding" +} +``` + +…and cap the finding's severity at **MAJOR**. Findings marked `evidence_required` cannot be CRITICAL until the moderator has confirmed the cross-repo evidence on a verified-fresh ref. + +Moderator rule: after Round 1, collect every `evidence_required` block across all subagents and ask the user **before Round 2**: + +> Some findings depend on code outside ``. To verify, I need access to: +> +> - `` — to check `` (raised by agent ) +> - `` — to check `` (raised by agent ) +> +> For each repo, reply with one of: +> +> - a local path (e.g. `/Users/you/repos/cbh/`) — I will run the freshness preflight on it before reading +> - `gh:/` — I will fetch the file content via `gh api repos///contents/?ref=main` instead of cloning +> - `skip` — the finding will be downgraded to "speculative — cross-repo evidence not verified" and capped at MINOR +> +> Or reply `skip all` to downgrade every cross-repo finding. + +For each user-provided local path, run the **same freshness preflight** as on the primary repo (fetch, check ahead/behind, check working-tree cleanliness). If the external repo is stale or on a non-default branch, warn with the same template and require explicit user acknowledgement before reading. Always read external code via `git show "${external_context_ref}:"` / `git grep ... "${external_context_ref}" -- `, never via the worktree filesystem. + +After verification, re-dispatch only the affected subagents (one agent per repo group) with the verified evidence (or its absence) inlined, so they can finalize severity for Round 2. Findings whose cross-repo evidence the user `skip`s are kept but capped at MINOR with a "speculative" prefix in the title. diff --git a/plugins/core/skills/in-depth-review/references/posting-pr-review.md b/plugins/core/skills/in-depth-review/references/posting-pr-review.md new file mode 100644 index 0000000000..06b764a9f3 --- /dev/null +++ b/plugins/core/skills/in-depth-review/references/posting-pr-review.md @@ -0,0 +1,105 @@ +# Posting an anchored PR review (both modes) + +Read this once the user approves items to post. Referenced from `SKILL.md`. + +## Contents + +- API call and payload shape +- Review body (top-level comment): attribution line, synthesis summary, apply-all prompt +- Each comment body: budget, template, formatting rules + +## API call and payload + +Always post via the GitHub Reviews API as a **single** review, with all selected actionable items as inline review comments anchored to specific diff lines. **Never** loose issue comments. + +Build the payload as JSON and pipe it through `gh api --input`: + +```bash +# /tmp/in-depth-review-payload.json contains: {event, commit_id, body, comments: [...]} +gh api -X POST "repos///pulls//reviews" --input /tmp/in-depth-review-payload.json +``` + +Payload shape: + +```json +{ + "event": "COMMENT", + "commit_id": "", + "body": "", + "comments": [ + { "path": "src/foo.ts", "line": 42, "side": "RIGHT", "body": "..." }, + { + "path": "src/bar.ts", + "start_line": 10, + "line": 14, + "start_side": "RIGHT", + "side": "RIGHT", + "body": "..." + } + ] +} +``` + +Rules for posting: + +- `event` is **always `COMMENT`** — never `APPROVE` or `REQUEST_CHANGES`. Approval / blocking is a human decision, not the skill's. +- For multi-line ranges, set `start_line`, `line`, `start_side: "RIGHT"`, `side: "RIGHT"`. +- For findings without a clean line anchor (rare), pick the first changed line of the most relevant file rather than dropping the comment or going loose. +- If literally nothing is anchorable for a given finding, fold it into the review `body` as a labeled "general note" and flag this in the post-confirmation summary so the user knows. +- Use `commit_id` from `pr.headRefOid` so comments anchor to the reviewed commit. +- Resolve `/` once via `gh repo view --json nameWithOwner --jq .nameWithOwner` and stash it in `/tmp/in-depth-review-meta.json` alongside `head_sha` so permalinks below can be built without re-querying. + +## Review body (top-level comment) + +The `body` field on the review (not the inline comment bodies) must contain three blocks, in order: + +**1. Attribution line.** A single sentence disclosing that the review was generated by Claude Code. Use exactly: + +> _These comments were generated by @\ using Claude Code._ + +Substitute `` from `gh api user --jq .login`. Italicize the line so it renders as muted text. This is non-negotiable — collaborators must be able to tell at a glance that the review is AI-assisted. Do **not** mention the In-Depth Review skill by name; it's a local skill the PR author can't see or use. + +**2. Synthesis Summary.** The same one-paragraph summary printed in the in-chat synthesis: what the change does and your recommendation. Do not include mode, per-agent headline verdicts, or which agents ran — that's methodology noise. + +**3. "Apply all comments at once" prompt.** A fenced block containing a self-contained Claude Code prompt the PR author (or any agent operator) can copy-paste into a Claude Code session on a checkout of this branch to address every inline comment in one shot. Template (substitute the `<…>` placeholders before posting): + +````markdown +**Apply all comments at once** — paste this into Claude Code on a checkout of this branch: + +``` +Fetch the most recent review by @ on PR . For every inline comment in that review, address the issue: when the comment includes a `suggestion` block, apply it verbatim; otherwise implement an equivalent fix that satisfies the comment's "Why it matters" rationale. After resolving each thread, post a reply on that thread with a one-line summary of what you changed. When all comments are handled, run the project's tests, commit the changes with a message that references the review, and report back any threads you could not resolve and why. +``` +```` + +Build the body in `/tmp/in-depth-review-payload.json` with literal newlines (use `jq -n --arg body "$BODY" '{body: $body, ...}'` or build the JSON via a small heredoc-fed `python -c` so newlines are real `\n` in the JSON string, not the two characters `\` `n`). + +## Each comment body + +**Budget:** ≤60 words of prose total + the code block. The hard caps below are binding — if a section won't fit, the finding is probably two findings; split or drop one. + +````markdown +**[SEVERITY] Title** + + + +**Why it matters:** + +**Suggested fix:** + +```suggestion + +``` +```` + +Rules for code and links inside the comment body: + +- **No bulleted lists in the comment body.** Bullets fragment reasoning; either the prose fits in one sentence or it doesn't belong on the PR. +- **No prose preamble on the suggested fix.** The code block speaks for itself — don't write "Suggested fix — route Rate the same way as Pay" before the block; just show the block. +- **No trailing addenda.** "If feature X isn't shipped yet…", "Note that this also affects…", "While you're here…" — these belong in the in-chat synthesis, not on the posted comment. The reviewee can ask follow-ups on the thread. +- **Include a `suggestion` block whenever it makes the fix concrete.** Skip it only when the issue is purely structural or when prose alone says everything. +- **`suggestion` block stands alone on the PR.** GitHub already renders the diff vs. the anchored line(s) inside the suggestion block (red `-` lines + green `+` lines + one-click apply). Do **NOT** also include a `// Before` fenced block — it duplicates what GitHub already shows and makes the comment twice as long. This is different from the in-chat **Actionable** section, which DOES render `// Before` + `// After` pairs so the user can review the change before approving the post (the user has no GitHub renderer in their terminal). +- **When to skip the `suggestion` block:** the fix is structural with no clean drop-in (e.g. "move this file"), OR `suggested_fix.after` is empty (pure deletion — write "_Delete the anchored line(s)._" instead), OR `suggested_fix.before` is empty (pure addition — the `suggestion` block still works, but prefix `suggested_fix.after` with a `// Add after line ` comment so the intent reads cleanly). +- **Optional Example/context fenced block is opt-out by default.** Include a non-`suggestion` code block only when prose-plus-`suggestion` is genuinely ambiguous — e.g. you're flagging a pattern violation and the offending pattern is not in the anchored lines. Render it under an **Example / context:** heading as a regular language-tagged fenced block, with a permalink in the prose above pointing to where it lives in the tree. If you're tempted to paste the anchored lines as "context," you're rebuilding the `// Before` block GitHub already renders — drop it. Default: skip. +- **Permalinks for in-prose line references.** Build them as `https://github.com///blob//#L-L` (single line: `#L`). Always pin to `head_sha`, never `main` or a branch name — branch links break as soon as the branch moves. Use Markdown link syntax with a short, meaningful label (e.g. `[the existing ServiceResult pattern](https://github.com/...#L88-L104)`), not the raw URL. + +After posting, print the review `html_url` and a one-line summary of how many comments were posted (and how many fell back to general notes, if any). diff --git a/plugins/core/skills/simple-review/SKILL.md b/plugins/core/skills/simple-review/SKILL.md new file mode 100644 index 0000000000..78fe5e7eed --- /dev/null +++ b/plugins/core/skills/simple-review/SKILL.md @@ -0,0 +1,472 @@ +--- +name: simple-review +description: Single-pass code review of the current branch or a PR using the in-depth review rubric without subagents, posted as anchored PR comments. Use when the user asks to review a small or medium diff, wants a fast or low-budget review, or runs /simple-review [PR-number-or-URL]. For large or high-stakes diffs that benefit from multiple debating reviewers, use in-depth-review instead. +--- + +# Simple Review + +Single-pass code review you (the main agent) perform yourself — **no subagents**. Condenses the rules from `/in-depth-review` into one checklist you walk through directly. Use this for small/medium PRs and when budget matters. Use `/in-depth-review` instead when the diff is large, high-stakes, or benefits from independent perspectives that can debate. + +## Invocation + +- **`/simple-review`** — review the current branch (resolves the open PR for the branch if any, otherwise diffs against the default branch). +- **`/simple-review `** — fast path for reviewing someone else's PR while sitting on `main` (typical entry from a worktree). Skips local branch checkout, fetches diff/metadata via `gh`, forces **reviewer mode**, skips the plan/execute path. Accepts a bare number (current repo) or full GitHub URL (identifies owner/repo). + +## Scope + +### Path A — PR-argument fast path (argument provided) + +- Parse the argument: bare integer → `gh repo view --json nameWithOwner --jq .nameWithOwner`; URL → extract `/` and ``. +- Do **not** check out the PR branch. +- Mode is **locked to reviewer**. + +Gather in parallel: + +```bash +gh pr view --repo / --json number,url,title,body,baseRefName,author,headRefOid,files +gh pr diff --repo / +gh api user --jq .login +``` + +If `pr.author.login == viewer.login`, still proceed in reviewer mode but flag this in Summary so the user can switch to a manual `/simple-review` run on their checkout if they meant to implement. + +### Path B — current-branch path (no argument) + +- Open PR for current branch → review that PR. Base = `baseRefName`, diff = `git diff $base...HEAD`. Context = PR title + body. +- No PR → diff vs default branch (`main`, fall back to `master`). Context = `git log --format='%h %s%n%n%b' $base..HEAD`. +- **Never** review uncommitted working-tree changes. Empty diff → stop and report. + +Determine **mode**: + +- PR exists and `pr.author.login == viewer.login` → **author mode**. +- PR exists and authors differ → **reviewer mode**. +- No PR → **author mode**. + +You hold review state in-context — no `/tmp/*` persistence needed for findings (the PR-posting step writes one transient payload file; see Posting an anchored PR review). + +## Freshness preflight (mandatory before reading code) + +Stale local state produces false-positive findings. Before reviewing, verify the ref you'll read for context is current. + +Run on the **primary repo** (the one containing the diff): + +```bash +git fetch origin "$base" --quiet +git rev-parse --abbrev-ref HEAD +git status --porcelain +git rev-list --left-right --count "HEAD...origin/${base}" +git log -1 --format='%h %ci' "origin/${base}" +``` + +Decide `context_ref`: + +- **Path A (reviewer):** `context_ref = origin/${base}` (usually `main`). Worktree may be used only when `HEAD == origin/${base}` AND clean AND fetch succeeded. +- **Path B (author):** `context_ref = origin/${base}`. The local feature branch IS the diff; pre-PR context comes from `origin/${base}`. + +Stop and ask the user when: + +1. `git fetch` failed (offline, auth). +2. `HEAD` differs from `origin/${base}` on Path A. +3. Working tree is dirty AND dirty paths overlap the diff's changed-file list or anything you'll need to read. +4. `HEAD` is behind `origin/${base}` (any non-zero "behind"). +5. `HEAD` is more than a small number of commits ahead of `origin/${base}` on Path A. + +Warn template (substitute verified state): + +> Freshness check for `/` at ``: +> +> - on branch `` (expected `` for Path A) +> - `` ahead, `` behind `origin/` (last: ` `) +> - working tree: `` +> +> Reading from this worktree may surface findings based on stale state. +> Reply: `proceed` (use worktree, accept the risk), `use-origin` (read context via `git show origin/:` — recommended), or `stop`. + +**Never** run `git checkout`, `stash`, `reset`, or other state-modifying git on the user's behalf. The skill warns and asks; the user resolves local state. `git fetch` is allowed (read-only). + +For Path A PR review when the worktree is dirty or on a non-base branch, the practical workaround is to fetch the PR head to a local ref: + +```bash +git fetch origin "pull//head:refs/remotes/origin/pr-" --quiet +``` + +Then read PR-head content via `git show origin/pr-:` and base context via `git show origin/:`. No checkout needed. + +### Reading code + +When `context_ref = origin/` (or `origin/pr-` for PR head): + +- Read via `git show "${context_ref}:"` (whole files) or `git grep -n "${context_ref}" -- ` (search). +- Avoid the Read tool on the worktree filesystem for tracked content — it may be stale. +- Worktree Read is OK only for files brand-new in the diff (untracked at `context_ref`); note "verified against: worktree" in the finding. + +When `context_ref = worktree (stale, user accepted risk)`: + +- Read tool is OK but every CRITICAL/MAJOR finding is downgraded to MINOR unless evidence is internal to the diff itself. Tag each finding with "verified against: worktree-stale". + +## Cross-repo evidence policy + +A finding's evidence is "cross-repo" when its load-bearing claim depends on code in any repo other than the one containing the diff — most commonly a consumer, producer, or downstream parser of something the diff changes. **Never silently read external repos and never claim a downstream impact you have not verified** — speculating that "the FE will break" without reading the consumer is a top source of false-positive findings. Cap any cross-repo finding at **MAJOR** until verified. + +Read [references/cross-repo-evidence.md](references/cross-repo-evidence.md) before raising or finalizing any cross-repo finding — it has when the policy fires, the verify-or-downgrade procedure, the access-request template, and what "verify" means for contract/schema vs API-response changes. + +## Diff classification (pick which lenses apply) + +Walk the changed-file list. Activate lenses that match: + +- **Security lens** triggers on: `routes/`, `controllers/`, `middleware*/`, files matching `auth*`/`*permission*`/`*acl*`/`*token*`/`*session*`, response serializers, OpenAPI/contract definitions, new API endpoint files. +- **Database lens** triggers on: `migrations/`, `*.sql`, files matching `schema*`, Mongoose/Prisma model files (`models/`, `*.model.ts`, `*.schema.ts`), repository/DAL files, query builders. +- **Frontend lens** triggers on: `*.tsx`, `*.jsx`, `*.css`, `*.scss`, `pages/`, `components/`, `hooks/`, or anything importing from `react`, `@tanstack/react-query`, or a design-system package. + +Always-on lenses: **Engineering**, **Minimalism**, **Conventions**, **AntiSlop**. + +## Severity rubric (binding) + +- **CRITICAL** — realistic input causes incorrect behavior, data loss, security regression, broken contract, or a paging incident. +- **MAJOR** — meaningful degradation of correctness/UX/observability under realistic conditions; OR a documented-convention violation with concrete downstream impact. +- **MINOR** — cheap, concrete improvement with a named benefit. +- **NIT** — only admissible if (a) repeats ≥2× in the diff, (b) conflicts with a documented convention, or (c) is a one-line trivial fix. + +Every finding **must** include a `failure_mode`: one sentence on the concrete user-, oncall-, or maintainer-visible bad outcome that would occur if not fixed. Hypotheticals like "a future caller might…" do **NOT** satisfy this — drop the finding. + +### Do-not-raise list (binding) + +- Speculative defensiveness at trusted internal boundaries. +- Restating the obvious ("consider a comment explaining what this does"). +- Hypothetical future-caller scenarios with no current caller. +- Style/formatting a linter or formatter covers. +- Test-coverage demands on trivially-verifiable code. +- "Add observability" without naming a concrete failure mode it would help debug. +- Abstract SOLID-style "consider extracting…" without a concrete failure mode. +- Aesthetic naming preferences — only raise names that mislead about behavior. + +## Review checklist + +Walk the diff once. For each lens, you're looking for _the smallest number of high-signal findings_, not exhaustive coverage. Cap **6 actionable items** (CRITICAL/MAJOR/MINOR) plus **8 nits** retained internally; anything beyond is "N additional items omitted; ask for the full list." + +For every candidate finding, run the litmus test before keeping it: _"What is the concrete, current, product-visible cost of leaving this code in?"_ If you can't answer in one sentence, drop it. + +### Engineering (always) + +For each change name a realistic input or condition that would expose a bug. If you cannot, do not raise it. + +- Edge cases, error paths, observability of real failure modes. +- Tests cover real risk, not lines. +- Concurrency, performance at real scale, data integrity. +- Backward compatibility, on-call implications, degraded-mode behavior. +- Async/await ordering matches actual data dependencies. +- Timezone correctness in date code; currency variables explicitly in minor units. +- API surface changed → AuthN/AuthZ/PII (deep dive under Security). +- Migration → rollback, rolling-deploy compat, ETL/downstream impact (deep dive under Database). +- Schema/query changed → indexes for new patterns, N+1, cascade semantics, type fit (deep dive under Database). +- Telemetry covers business/product value, not only engineering surface metrics. +- Contract/backward-compat for any consumer-visible response shape change. **If you suspect a consumer break, the finding is cross-repo — follow the Cross-repo evidence policy before raising it. Do not raise speculative "the FE will fail to parse" findings without reading the FE.** + +### Minimalism (always) + +The smallest diff that ships the intent is the best diff. + +- Unneeded abstractions, speculative generality, dead branches. +- Redundant validation, defensive code at trusted internal boundaries. +- Comments that restate code; new files/utilities that duplicate existing ones. +- Tests that exercise the framework, not behavior. +- Flags/config knobs without a concrete caller. +- Duplicated error handlers in the same scope. +- Commented-out code or dead branches. + +For every "delete this" finding, `failure_mode` must state the **concrete cost of keeping the code**. + +### Conventions (always) + +You are the convention owner — read the repo's actual conventions before flagging anything. Consult, in order of priority: + +- `git show ${context_ref}:AGENTS.md` and `CLAUDE.md` (if present) +- `git ls-tree -r --name-only ${context_ref} -- .rules` then read each rule file relevant to the diff +- Neighboring files in the same module/service for in-practice patterns +- Package READMEs in the touched paths + +Check the diff for (only what's actually documented in the consulted sources): + +- Preferred internal packages over third-party (`@clipboard-health/*`, internal `lib/*`). +- Terminology rules (worker/workplace, not HCP/facility). +- `lodash` removal preference. +- `@clipboard-health/datetime` over `date-fns`/`date-fns-tz`/`moment`/`luxon`. +- Internal `Money` package for currencies; explicit minor-units variable names. +- Logging conventions (`longContext`, `ObjectId.toString()`, no variables in log messages). +- Null/undefined checks via `isDefined`/`isNil` over truthy. +- 4+ argument functions → params object. +- Error-handling patterns in the same service (typed `ServiceResult` vs throw). +- Test conventions, naming, file location (service test structure, `createTestContext`/`tearDown`, `beforeAll/afterAll` for GET). +- Business-meaningful magic constants that should be named. +- ENV access via Configuration abstraction, not direct `process.env`. +- Logging library standard for the repo — derive from existing code, not hardcoded. +- Pinned dependencies; lock file in sync with `package.json`. +- RESTful route shape and uniform response format within a service. +- **Internal inconsistency within the diff itself** (e.g. half of imports from one package family, half from upstream; same persisted shape written three different ways across three call sites). + +Frontend conventions are the Frontend lens's job when activated. If FE lens isn't active (no FE files), surface FE convention drift here only if it appears. + +Tag every convention finding with `[CONVENTION]` in the title. Cap severity at MAJOR (only when behavior diverges as a result) or MINOR otherwise. + +### AntiSlop (always) + +This PR may have been written or assisted by an LLM. For each addition, ask: _"Is this line earning its keep, or is it pattern-matching what code is supposed to look like?"_ Push back on what other lenses are too polite to flag. Apply to additions **inside the diff itself**. + +- **Defensive code at trusted internal boundaries.** Null guards on private helpers whose callers' types guarantee non-null; `try`/`catch` wrapping a single non-throwing call, or that re-throws unchanged, or that "logs and swallows" without naming what to do next; optional-chaining through types that don't include optionality. +- **Defensiveness against unrealistic product scenarios.** Litmus: _"In the real product flow this code participates in, what user action / system event / upstream call could land us in this branch?"_ If the answer is "none" or "I had to invent one to justify the guard", it's slop. Concrete shapes: + - Null/undefined guard on an ID immediately after that ID was used to load (and find) the entity. + - A `null`/`undefined`/`""`/`0` branch on a field whose TypeScript or Zod/class-validator already rejects those. + - Re-validation of a value the request DTO already validated upstream in the same lifecycle. + - Consistency check (`if (a !== b) throw`) between two fields the data model forbids being unequal. + - Branch for a product-impossible state. + - Retry/fallback around an SDK call that already retries or returns a typed error. + - `catch` for an error class statically known not to throw, or that logs+swallows. + - "Future-proof" code path with no current `v2`. +- **Restating-the-code comments.** `// fetch the user`, JSDoc on private helpers that only restates the signature, `// Note: this is important`, section banners in short files. +- **Empty scaffolding.** `// TODO` with no owner/ticket; redundant pre-conditions; debug logs that survived to the PR; default `else { return undefined; }` after exhaustive branching; `_unused` prefixes that should be deletions. +- **Speculative generality.** Helper called once that wraps two trivial lines; `Map`/`Set`/config keyed by a single hardcoded value; "strategy"/"registry" pattern with one strategy; union types whose only second case is `never`/placeholder. +- **Unused parameters/overloads/fields.** Args destructured but never read; interface methods with empty implementations; new optional fields with no producer or consumer. +- **Tests that exercise the framework, not the code.** `jest.fn().mockReturnValue(x); expect(fn()).toBe(x)`; snapshot tests with no semantic assertion; tests that mock the unit under test; test names that describe the implementation (`it("calls foo.bar"…)`) instead of behavior. +- **Dead AI breadcrumbs.** Variables whose only use is logging or debug branches; `console.log`/`console.error` that should have been removed before commit; commented-out alternative implementations. +- **Tone/description mismatch.** PR description claims behavior the diff doesn't have; variable/function names that pattern-match engineering writing without naming the role (`data`, `result`, `processed`, `_handle`, `doStuff`). + +Default `suggested_fix` is **delete** (empty `after`) or **simplify** (smaller `after`). Suggesting "add a justifying comment" is itself slop — do not propose it. + +### Security lens (when triggered) + +Where could this change leak data, bypass authorization, or expand the trust boundary? + +- AuthN: every new/modified endpoint has authenticated identity unless explicitly public. +- AuthZ: per-endpoint AND per-resource permission checks; cross-tenant/cross-user access. +- Secrets in configuration — no hardcoded keys, no secrets in client bundles. +- No self-made or client-side cryptography. +- SQL/NoSQL injection: string-concatenated queries, unvalidated `$where`, raw aggregation pipelines from user input. +- Sensitive data in response payloads — over-fetching, over-serialization, internal IDs leaked. +- PII fields logged, cached, or sent to telemetry. +- Input validation at the boundary (Zod / class-validator) on every new endpoint. +- For numeric inputs: explicit `min`/`max` bounds. Verify what the JS engine does with `Number.MAX_SAFE_INTEGER + 1`, BigInt conversion, negative values, and whether the catch path returns a structured 400 or a 500 with stack log. + +### Database lens (when triggered) + +What breaks at production scale or during deploy? + +- Index coverage for new query patterns; missing compound indexes. +- N+1 query shapes; loops issuing per-iteration queries. +- Schema normalization — denormalization that creates write-amplification or update anomalies. +- Data types — range, precision, locale (numeric, date, money). Mongoose `Number` is IEEE 754 double; safe for integers up to 2^53. +- Cascade-delete and FK-on-delete semantics; orphan risk. +- Migration rollback strategy; online without downtime. +- Backward compatibility during a rolling deploy (old code reads new schema; new code reads old schema). +- ETL / downstream consumer impact when schema or field semantics change. +- New collections / tables ship with the indexes their query patterns need on day one. +- **Dual-write fields** (storing both `amount` and `amountInMinorUnits`-style pairs): is canonicalization happening on write, or does the on-disk record encode two contradictory values? +- Backfill for new fields on existing records: present, deferred (with ticket), or missing? + +### Frontend lens (when triggered) + +Does this follow our FE conventions, and will it behave correctly under realistic user conditions? + +- API calls through v2 / custom hooks, not raw `fetch`/`axios` in components. +- React Query: thin wrappers (`useGetQuery`, etc.), not raw `useQuery`/`useMutation`. +- Falsy checks via `isDefined()`/`isNil()`, not `!value`. +- Test actions wrapped in `act()`. +- Zod: `z.nativeEnum` when a TS enum already exists. +- Prop drilling — when a value passes through ≥2 layers, consider context. +- Feature flags via `useCbhFlag` (or repo equivalent), not direct config reads. +- New components/styles from the design-system library, not built from scratch. +- Loading / empty / error states for any new data-fetching surface. +- Accessibility: keyboard navigation, ARIA roles, labels on interactive controls. + +## Self-filter (before synthesis) + +After walking the checklist, apply these filters to your candidate findings: + +1. **Drop findings with empty or hypothetical `failure_mode`.** "A future caller might…", "in case someone…" → drop. +2. **Self-audit for slop.** For every finding you wrote, ask: does it match a slop pattern (asks-for-defensive-guard on already-narrowed value; hypothetical future caller; restating-obvious comment request; abstract refactor with no concrete cost-of-keeping; observability without named failure mode; test for trivially-verifiable code; defends against a state the product cannot produce)? If yes and you can't write a concrete, product-specific cost in one sentence — drop it. Being your own AntiSlop reviewer is the main lever for keeping this skill honest. +3. **Cross-repo audit.** For every finding, ask: does the failure_mode reference a downstream actor (FE, mobile, consumer service, rolling deploy, external library user) or a contract/schema/public-artifact boundary? If yes, did you actually read the relevant external file(s) to confirm the claim, or are you reasoning from priors about "how FEs usually work"? If you didn't read it, the finding is cross-repo — route to the Cross-repo evidence policy (verify, ask for access, or downgrade to a clearly-labeled "speculative" MINOR). Do not ship a "consumer will break" finding sourced from imagination. +4. **Drop do-not-raise items** that slipped through. +5. **Apply the NIT gate.** NITs that don't meet (a)/(b)/(c) → drop. Kept internally but hidden by default in synthesis. +6. **Merge near-duplicates** under one finding (note which lens(es) surfaced it). +7. **Apply hard caps.** 6 actionable, 8 NITs retained, rest summarized as "N additional items omitted". + +Track dropped items in a Withdrawn section (one-liner each) so the user can see what was filtered. + +## suggested_fix schema (when present) + +```json +"suggested_fix": { + "before": "", + "after": "" +} +``` + +- Pure deletion: `after` is `""`. +- Pure addition at a new line: `before` is `""`, prefix `after` with `// add after line `. +- Structural change with no clean drop-in: omit `suggested_fix`, describe in `point`/`failure_mode`. +- Both must be code (not prose), preserving the file's indentation. Don't elide with `// ...`. + +## Synthesize + +### Summary + +One short paragraph: what the change does and your overall recommendation (ship / ship with changes / do not ship). Do not state the mode, list the lenses you applied, or list the convention sources you consulted — that metadata is noise for the user. Keep it to substance. + +### Actionable + +Up to 6 items. Format each: + +- **[SEVERITY] Title** — `file:lines` +- **Point:** one sentence. +- **Why it matters:** the `failure_mode`. +- **Suggested fix (before → after):** two stacked fenced code blocks — first `// Before` (current code), then `// After` (replacement). Same language tag for both. For a pure deletion, show only `Before` and write "_Delete these lines._" For a pure addition, show only `After` prefixed with `// Add after line `. Omit when structural; explain in prose. +- **Lens(es):** which lens(es) surfaced this (e.g. `Engineering, AntiSlop`). + +Render template: + +````markdown +**Suggested fix (before → after):** + +```ts +// Before — src/foo/bar.ts:42-45 +const x = doThing(); +if (x !== null) { + return x; +} +``` + +```ts +// After +const x = doThing(); +return isDefined(x) ? x : undefined; +``` +```` + +Order by severity (CRITICAL → MAJOR → MINOR), then by file. + +### Nits + +Do **not** print by default. Print only: _"N nit(s) available (M from convention audit). Pick `Show the N NIT(s) first` in the picker below to see them."_ If N is 0, omit this entire section. Surface the "show nits" affordance as an option in the User gate 1 `AskUserQuestion` call (not as typed text). When selected, print one-liners with `file:lines` and `[CONVENTION]` tag where applicable, then re-call User gate 1. + +### Withdrawn + +Terse one-liners of items you dropped (do-not-raise, NIT gate, self-slop audit). Transparency only. + +## User gate 1 — select items + +Ask via **`AskUserQuestion`**, not as a text prompt. Render each actionable finding as an option, with a trailing "show the N NIT(s) first" option when `N > 0`. Set `multiSelect: true` so the user can pick any combination (including none). Do not proceed until the user answers. + +**Template:** + +```ts +AskUserQuestion({ + questions: [ + { + question: "Which findings should I ?", // "post as inline comments on PR " / "implement locally" / "act on" + header: "Findings", + multiSelect: true, + options: [ + { label: "1 - ()", description: "" }, + { label: "2 - …", description: "…" }, + // …one entry per actionable finding (cap 6) + { label: "Show the NIT(s) first", description: "Print the NIT list before deciding." }, // omit when N == 0 + ], + }, + ], +}); +``` + +Option-label format: ` - ()`. The id matches the in-chat numbering in the Actionable section above. The `description` is one short sentence — the failure_mode, not the fix. Do not pad with file paths; the user already has the full Actionable section in the chat. + +Branching on the answer: + +- **Zero findings selected** (user submitted no boxes / picked Skip) → end the skill with a one-line "no findings selected — nothing to do" reply. +- **Only "Show the N NIT(s) first"** selected → print the NITs (one-liners with `file:lines` and `[CONVENTION]` tag where applicable), then re-call `AskUserQuestion` for this same gate. +- **Findings selected (with or without the NIT toggle)** → proceed to gate 2. If the NIT toggle was also checked, print NITs once before gate 2. + +## Branch on mode + +### Author mode + +**Plan.** For the selected ids only, produce an ordered implementation plan: steps, files touched per step, tests to add/update, verification commands. Do **not** edit files yet. + +**User gate 2 (author)** — ask via **`AskUserQuestion`** (`multiSelect: false`): + +```ts +AskUserQuestion({ + questions: [ + { + question: "What do you want to do with the selected findings?", + header: "Next step", + multiSelect: false, + options: [ + { label: "Implement locally", description: "Apply the plan against your checkout." }, + { + label: "Post as review", + description: "Post anchored comments on your own PR for the team to see, then stop.", + }, + { label: "Both", description: "Post the review first, then execute the plan locally." }, + { label: "Edit the plan", description: "Revise the plan; I'll re-prompt this gate." }, + { label: "Cancel", description: "Stop." }, + ], + }, + ], +}); +``` + +- `Implement locally` → execute the plan. +- `Post as review` → post anchored review (below); skill ends. +- `Both` → post review first, then execute. +- `Edit the plan` → revise, re-prompt this gate. +- `Cancel` → stop. + +**Execute.** Use TodoWrite to track each step. Apply the plan, run verification, report results. If a step surfaces a new substantive issue not in the selected items, stop and ask before expanding scope. + +### Reviewer mode + +Skip the plan step — you are not implementing someone else's code. + +**User gate 2 (reviewer)** — ask via **`AskUserQuestion`** (`multiSelect: false`): + +```ts +AskUserQuestion({ + questions: [ + { + question: "Post these on the PR?", + header: "Post review", + multiSelect: false, + options: [ + { + label: "Post", + description: "Submit a single COMMENT review with the selected anchored comments.", + }, + { label: "Edit", description: "Ask which to drop or refine, then re-prompt." }, + { label: "Cancel", description: "Stop." }, + ], + }, + ], +}); +``` + +- `Post` → post anchored review. +- `Edit` → ask which to drop or refine, then re-prompt. +- `Cancel` → stop. + +## Posting an anchored PR review (both modes) + +When the user approves items to post, submit a **single** review via the GitHub Reviews API (`event: COMMENT` — never `APPROVE`/`REQUEST_CHANGES`), with every selected actionable item as an inline comment anchored to its diff line. Never use loose issue comments. + +Follow [references/posting-pr-review.md](references/posting-pr-review.md) exactly — it has the `gh api` call, the payload shape, the three-block review body (attribution line, summary, apply-all prompt), and the per-comment body budget and format. After posting, print the review `html_url` and a one-line summary of how many comments were posted (and fallback general-notes count if any). + +## Rules + +- **No subagents.** This skill runs entirely in the main context. If a finding needs deeper independent investigation that you can't do confidently in-line, surface it as a flagged finding rather than guess. +- Issue parallel `gh`/`git` commands in a single message when independent (PR view, diff, files, viewer). +- If a finding has empty `failure_mode` → drop. NIT not meeting the gate → drop. Do-not-raise items → drop. Caps applied (6 actionable, 8 nits). +- Hide nits by default. Print only when the user selects `Show the N NIT(s) first` in the User gate 1 picker. +- Reviews are always `event: COMMENT`. Never approve or request changes on the user's behalf. +- Conditional lenses (Security/Database/Frontend) run only when classification matches. +- Freshness preflight is mandatory before reading code. Read repo context via `git show ":"`, not the worktree filesystem, unless `context_ref = worktree (stale, user accepted risk)`. +- Never run state-modifying git commands on the user's behalf (`checkout`, `stash`, `reset`, `clean`, `pull` with merge). Warn and ask. `git fetch` is allowed. +- Cross-repo evidence is opt-in by the user. The policy fires whenever the diff touches a contract/schema/published-artifact boundary OR the finding's failure_mode references a downstream actor (FE, mobile, consumer service, rolling deploy). Identify the specific external file you'd need to read, search locally first, then ask the user using the access-request template. Cap at MAJOR until verified; `skip` → either drop or keep as MINOR with a visible "speculative — assumes ``" prefix. **Never raise a "consumer will break" finding without reading the consumer.** +- `suggested_fix` is always a `{ before, after }` object of code (not prose). **In-chat Actionable:** render both halves as separate `// Before` then `// After` fenced blocks. **PR-review inline comments:** render only the `suggestion` block (GitHub already renders the diff). Empty `before` = pure addition; empty `after` = pure deletion (omit `suggestion`, write "_Delete the anchored line(s)._"); omit entirely for structural changes. +- Both user gates are mandatory and **must** be presented via `AskUserQuestion` (the structured picker), not as inline text questions. Do not auto-apply recommendations and do not auto-post reviews. diff --git a/plugins/core/skills/simple-review/references/cross-repo-evidence.md b/plugins/core/skills/simple-review/references/cross-repo-evidence.md new file mode 100644 index 0000000000..1e2dfcd910 --- /dev/null +++ b/plugins/core/skills/simple-review/references/cross-repo-evidence.md @@ -0,0 +1,64 @@ +# Cross-repo evidence policy + +Read this before raising or finalizing any cross-repo finding. Referenced from `SKILL.md`. + +A finding's evidence is "cross-repo" when its load-bearing claim depends on code in any repo other than the one containing the diff — most commonly a producer, consumer, or downstream parser of something the diff changes. **Never silently read external repos** — they may be stale or you may have no checkout at all. Equally, **never claim a downstream impact you have not verified.** Speculating that "the FE will break" or "a consumer will choke on this" without reading the consumer is a top source of false-positive findings. + +## When this policy fires + +Treat a finding as cross-repo _before_ raising it whenever any of these apply: + +1. The diff touches a **contract/schema package** (anything in `packages/contract-*`, `packages/*-contract*`, ts-rest contracts, JSON Schema files, OpenAPI specs, Protobuf, Avro, etc.) — consumers of these packages live in other repos by design. +2. The diff alters a **deployed-artifact boundary**: HTTP response shape, queue message shape, public SDK signature, exported library type, env-var contract, DB-record shape read by other services. +3. The failure mode you wrote references a downstream actor by role rather than name: _"the FE will…", "the mobile app will…", "consumers will…", "a rolling deploy will break for…"_ — you cannot verify any of those claims from inside the diff's repo alone. +4. The finding is a "backward-compatibility" or "rolling-deploy" concern about a producer change. +5. The finding's argument is "this diverges from how other consumers do X" — and "other consumers" are not in this repo. + +If any of (1)–(5) is true, the finding is cross-repo. Move on to **Verify or downgrade** below before keeping it in your candidate list. + +## Verify or downgrade (binding) + +For each cross-repo finding: + +1. **Identify the specific consumer/producer file(s)** you would need to read to confirm the claim. Be concrete: _"facility app's `useGetInvoiceBalances` hook to see whether it imports the contract schema and whether `.parse()` is strict."_ If you cannot name a specific artifact, the finding is speculative — drop it. +2. **Search likely locations you already have access to** before asking the user. Sibling repos under the same parent directory, monorepo workspaces, vendored copies. Use targeted `grep`/`git grep` — do not recursively scan the whole filesystem. +3. **If found locally**, run the **same freshness preflight** on that external repo (branch, dirty state, behind/ahead). Read context via `git show "${external_context_ref}:"`, never via worktree filesystem. Note "verified against: `@`" in the finding. +4. **If not found locally**, ask the user for access using the template below. Do not raise the finding until the user has responded. +5. **If the user picks `skip`**, you have two choices: + - **Drop the finding entirely** if the speculative version doesn't pass the litmus test ("concrete, current, product-visible cost"). + - **Keep as MINOR with explicit "speculative" prefix in the title** — only when the worst-case is genuinely concerning AND you can name the _assumption_ the finding rests on (e.g., _"assuming the FE imports this schema and parses strictly, then…"_). Make the assumption visible so the PR author can confirm or refute it. + +Severity cap on any cross-repo finding is **MAJOR** until verified. Anything verified as actual breakage can be raised to its true severity. + +## Access-request template + +When asking the user for access to external repos, name the specific file and the specific question. Do not ask vaguely for "access to repo X" — ask for the evidence that would change your mind. + +> One finding depends on code outside ``. Before I raise it, I need to verify: +> +> **Finding:** `` +> **Claim:** `` +> **What I need to read:** `/` — to check ``. +> +> Options: +> +> - **Local path** — give me an absolute path to a checkout (I'll run freshness preflight and read via `git show`). +> - **`gh:/`** — I'll fetch via `gh api repos///contents/?ref=main`. +> - **`skip`** — I'll either drop the finding or keep it as MINOR with a "speculative — assumes ``" prefix; you can confirm/refute. +> - **`skip all`** — apply `skip` to every remaining cross-repo finding. + +## What "verify" means in practice + +For a contract/schema change, the questions you must answer before claiming consumer impact are concrete: + +- Does the consumer **import the contract schema directly**, or define its own local schema? (Local schema = your contract change has zero direct effect on the consumer's parser.) +- If it imports, does it call `.strict()`, `.passthrough()`, or rely on the default (strip unknowns)? Required fields missing → parse failure; extra fields → silently stripped under default. +- Is the consumer pinned to a published version of the package, or symlinked/workspace-resolved? Pinned → upgrade is explicit and the consumer team controls timing. +- Is the field actually read at runtime, or just typed? Type-only references don't fail at runtime even when the schema diverges. + +For an API response change (without consumer-side schema): + +- Does the consumer JSON-parse and read specific keys, or pass the body through opaquely (e.g. to a logger, to storage)? +- Does any layer in between (BFF, gateway) apply its own validation? + +Answer these from real code, not from priors about "how FEs usually work." When you can't answer, you do not have a finding — you have a hypothesis. diff --git a/plugins/core/skills/simple-review/references/posting-pr-review.md b/plugins/core/skills/simple-review/references/posting-pr-review.md new file mode 100644 index 0000000000..6b54d7ca16 --- /dev/null +++ b/plugins/core/skills/simple-review/references/posting-pr-review.md @@ -0,0 +1,100 @@ +# Posting an anchored PR review (both modes) + +Read this once the user approves items to post. Referenced from `SKILL.md`. + +## Contents + +- API call and payload shape +- Review body (top-level): attribution line, summary, apply-all prompt +- Each comment body: budget, template, formatting rules + +## API call and payload + +Post via the GitHub Reviews API as a **single** review with all selected actionable items as inline review comments anchored to specific diff lines. **Never** loose issue comments. + +```bash +# /tmp/simple-review-payload.json contains: {event, commit_id, body, comments: [...]} +gh api -X POST "repos///pulls//reviews" --input /tmp/simple-review-payload.json +``` + +Payload shape: + +```json +{ + "event": "COMMENT", + "commit_id": "", + "body": "", + "comments": [ + { "path": "src/foo.ts", "line": 42, "side": "RIGHT", "body": "..." }, + { + "path": "src/bar.ts", + "start_line": 10, + "line": 14, + "start_side": "RIGHT", + "side": "RIGHT", + "body": "..." + } + ] +} +``` + +Rules: + +- `event` is **always `COMMENT`** — never `APPROVE` or `REQUEST_CHANGES`. Approval is a human decision. +- Multi-line ranges: `start_line`, `line`, `start_side: "RIGHT"`, `side: "RIGHT"`. +- No clean anchor → pick the first changed line of the most relevant file; if literally nothing anchorable, fold into the review `body` as a labeled "general note" and flag in the post-confirmation summary. +- `commit_id` from `pr.headRefOid`. +- Build JSON with literal newlines (use `jq -n --arg body "$BODY" '...'` or a small `python -c`/heredoc), not `\n` escapes. + +## Review body (top-level) + +Three blocks, in order: + +**1. Attribution line.** Exactly: + +> _These comments were generated by @\ using Claude Code._ + +Italicized. Substitute `` from `gh api user --jq .login`. Non-negotiable — collaborators must be able to tell at a glance the review is AI-assisted. Do **not** mention the Simple Review skill by name; it's a local skill the PR author can't see or use. + +**2. Summary.** Same paragraph from the in-chat synthesis: what the change does and your recommendation. Do not include mode, lenses applied, or convention sources consulted — that's noise. + +**3. "Apply all comments at once" prompt.** A fenced block with a self-contained Claude Code prompt the PR author can paste into a Claude Code session on a checkout of this branch: + +````markdown +**Apply all comments at once** — paste this into Claude Code on a checkout of this branch: + +``` +Fetch the most recent review by @ on PR . For every inline comment in that review, address the issue: when the comment includes a `suggestion` block, apply it verbatim; otherwise implement an equivalent fix that satisfies the comment's "Why it matters" rationale. After resolving each thread, post a reply on that thread with a one-line summary of what you changed. When all comments are handled, run the project's tests, commit the changes with a message that references the review, and report back any threads you could not resolve and why. +``` +```` + +## Each comment body + +**Budget:** ≤60 words of prose total + the code block. Hard caps below are binding — if a section won't fit, the finding is probably two findings; split or drop one. + +````markdown +**[SEVERITY] Title** + + + +**Why it matters:** + +**Suggested fix:** + +```suggestion + +``` +```` + +Rules: + +- **No bulleted lists in the comment body.** Bullets fragment reasoning; either the prose fits in one sentence or it doesn't belong on the PR. +- **No prose preamble on the suggested fix.** The code block speaks for itself. Don't write "Suggested fix — route Rate the same way as Pay" before the block; just show the block. +- **No trailing addenda.** "If feature X isn't shipped yet…", "Note that this also affects…", "While you're here…" — these belong in the in-chat synthesis, not on the posted comment. The reviewee can ask follow-ups on the thread. +- **Optional Example/context fenced block is opt-out by default.** Include a non-suggestion code block only when prose-plus-`suggestion` is genuinely ambiguous (e.g. you're flagging a pattern violation and the offending pattern is not in the anchored lines). Default: skip. +- **Include a `suggestion` block whenever it makes the fix concrete.** Skip when the issue is purely structural or when prose alone says everything. +- **`suggestion` block stands alone on the PR.** GitHub already renders the diff vs. anchored line(s). Do **NOT** include a `// Before` fenced block in PR comments — it duplicates what GitHub shows. This is different from the in-chat **Actionable** section, which DOES render `// Before` + `// After` pairs so the user can review before approving the post. +- **When to skip the `suggestion` block:** structural fix with no drop-in (use prose + a target-shape fenced block), OR `suggested_fix.after` is empty (pure deletion — write "_Delete the anchored line(s)._" instead), OR `suggested_fix.before` is empty (pure addition — `suggestion` block still works; prefix `after` with `// Add after line `). +- **Permalinks for in-prose line references.** Build as `https://github.com///blob//#L-L` (single line: `#L`). Always pin to `head_sha`. Use Markdown link syntax with a meaningful label. + +After posting, print the review `html_url` and a one-line summary of how many comments were posted (and fallback general-notes count if any).