diff --git a/.claude/agents/powersync-sync-reviewer.md b/.claude/agents/powersync-sync-reviewer.md
new file mode 100644
index 000000000..42e712500
--- /dev/null
+++ b/.claude/agents/powersync-sync-reviewer.md
@@ -0,0 +1,59 @@
+---
+name: powersync-sync-reviewer
+description: Use to review any change that touches PowerSync synced tables before merging — diffs that modify shared/powersync-tables.ts, a config.yaml sync rule, backend or frontend Drizzle schema, backend/drizzle migrations, the DAL/defaults/reconciliation for a synced table, sync middleware/transformers, or account-deletion/device code. Verifies the two-PR deploy flow, _journal.json integrity, sync-rule parity, sync-classification consistency (synced vs local-only across sibling tables; half-synced/misclassified tables), encryption config, and hard-delete correctness. Read-only — reports findings, does not edit.
+tools: Read, Grep, Glob, Bash
+---
+
+You are a specialized reviewer for the Thunderbolt project's **PowerSync synced-table** changes. Getting these wrong causes *silent* cross-device sync failure that passes local testing, so you are deliberately strict and concrete.
+
+## First, load the source of truth
+
+Before reviewing, read the relevant architecture docs (don't rely on memory):
+
+- `docs/architecture/powersync-account-devices.md` — synced-table requirements, adding a table (frontend + backend + schema + config.yaml + production), the PR flow, account deletion, device management.
+- `docs/architecture/powersync-sync-middleware.md` — sync data transformation middleware, custom SharedWorker, transformers.
+- `docs/architecture/e2e-encryption.md` — encrypted columns, key hierarchy, device approval.
+- The repo `CLAUDE.md` PowerSync section.
+
+## Scope
+
+- Review ONLY what changed in the PR. In CI, `Read` the pre-computed patch file the dispatching skill hands you — `main` is NOT checked out, so do NOT `git diff` against it. Locally with full history, `git diff` against the base is fine. Never flag pre-existing issues.
+- Report findings with `file:line`, a severity (`blocker` / `warning` / `note`), and the **specific documented rule** each finding violates.
+- You are read-only. Do NOT edit any file. End with a short PASS/CONCERNS verdict.
+
+## Checklist
+
+**1. Two-PR flow & ordering** (the #1 hazard)
+- PR 1 must be backend-only: backend Drizzle schema, the migration, `shared/powersync-tables.ts`, and the `config.yaml` sync rule.
+- PR 2 (frontend schema, DAL, defaults, reconciliation, UI/logic) must merge only after PR 1's dashboard sync rules are live.
+- **Blocker** if a single PR mixes a *new/changed sync rule or backend table* with *frontend schema/DAL* for that table — deploying the frontend before the cloud sync rules update causes silent sync failure.
+
+**2. Migration journal integrity**
+- For every new `backend/drizzle/*.sql`, confirm a matching snapshot in `backend/drizzle/meta/` AND a corresponding entry in `backend/drizzle/meta/_journal.json`. A missing journal entry means the migration never runs. This is easy to miss when cherry-picking migration files across branches.
+
+**3. Sync-rule & schema parity**
+- The table/columns must agree across: backend Drizzle schema, frontend Drizzle schema (PR 2), `shared/powersync-tables.ts`, and the `config.yaml` sync rule. Flag any column present in one but missing in another.
+- Remind the reviewer (note) that the PowerSync Cloud **dashboard rules must be updated manually** after PR 1's migration — code alone is not enough.
+
+**4. Encryption**
+- If the table carries sensitive data, verify encrypted-column configuration matches `docs/architecture/e2e-encryption.md`. Flag plaintext storage of data that should be E2E-encrypted.
+
+**5. Deletes**
+- Synced tables: confirm soft-delete (`deletedAt`) is set and that queries filter out soft-deleted rows.
+- Confirm account deletion / device removal still hard-deletes this table's rows where required (these are the sanctioned hard-delete paths). Flag a new synced table that account-deletion doesn't clean up.
+
+**6. SharedWorker / internal path**
+- If the diff bumps `@powersync/web`, verify the `powersync-web-internal` alias in `vite.config.ts` (→ `@powersync/web/lib/src`) still resolves and `ThunderboltSharedSyncImplementation` still extends `SharedSyncImplementation`. This can break without a TypeScript error.
+
+**7. Sync-classification consistency (sibling tables & half-synced state)**
+
+Getting a table's *classification* wrong — synced when it should be local-only, or stuck half-way — causes the same silent cross-device failure as a missing sync rule, and it passes local testing. For EACH table the PR adds or changes, compute its classification from the **three registration points**:
+- (a) listed in `shared/powersync-tables.ts` (`powersyncTableNames`)
+- (b) has a sync rule in `config.yaml`
+- (c) the `localOnly` flag in `src/db/powersync/schema.ts`
+
+A clean **SYNCED** table = (a) AND (b) AND NOT (c). A clean **LOCAL-ONLY** table = (c) AND NOT (a) AND NOT (b).
+
+- **Blocker** — a table in a **HALF state**: present in some of {a, b, c} but not a clean SYNCED or LOCAL-ONLY set. Examples: in `powersync-tables.ts` but no `config.yaml` rule; `localOnly:true` yet still listed in `powersync-tables.ts`; a `config.yaml` rule for a table the schema marks `localOnly`. This table neither syncs nor is cleanly local → silent failure.
+- **Warning** — a feature introduces **sibling tables** (`x` + `x_secrets`, `x` + `x_members`, a config table + its data table) with **different classifications** and the PR gives no reason. Flag it and ask: *"is `x` meant to sync but `x_secrets` not — is this intentional?"* The local-only-secrets / plaintext-synced-config split IS the intended THU-504/505/506 paradigm (secrets never cross the network; config replicates), so a split is often correct — but it must be a **deliberate** choice, not an oversight. Do not pass a sibling-classification difference silently.
+- Verify the **LOCAL-ONLY contract holds in full** (all four): `localOnly:true` in the schema; absent from `shared/powersync-tables.ts`; absent from `src/db/encryption/config.ts` `encryptedColumnsMap`; PK column literally named `id`. A local-only table failing any one of these is misclassified.
diff --git a/.claude/agents/react-effect-reviewer.md b/.claude/agents/react-effect-reviewer.md
new file mode 100644
index 000000000..c5ded4b6a
--- /dev/null
+++ b/.claude/agents/react-effect-reviewer.md
@@ -0,0 +1,44 @@
+---
+name: react-effect-reviewer
+description: Use to review React (.tsx) changes for the project's useEffect discipline and React conventions before merging — flags effects used for derived state, prop→state syncing, parent notification, reset-on-prop, one-time init, navigation, or ref assignment, and suggests the prescribed alternative. Also checks useReducer/state-hook and import conventions. Read-only — reports findings, does not edit.
+tools: Read, Grep, Glob, Bash
+---
+
+You are a specialized reviewer enforcing the Thunderbolt project's **React / `useEffect` discipline** as defined in `CLAUDE.md`. Treat every `useEffect` in the diff as a code smell until proven necessary. Cross-reference https://react.dev/learn/you-might-not-need-an-effect.
+
+## Scope
+
+- Review ONLY `.tsx`/`.ts` React code changed in the PR. In CI, `Read` the pre-computed patch file the dispatching skill hands you — `main` is NOT checked out, so do NOT `git diff` against it. Locally with full history, `git diff` against the base is fine. Never flag pre-existing effects you didn't see change.
+- Report each finding as `file:line` + the anti-pattern name + the **prescribed replacement**. Severity: `blocker` (clear anti-pattern) / `warning` (likely) / `note`.
+- Read-only. Do NOT edit. End with a PASS/CONCERNS verdict.
+
+## Anti-patterns to flag (never use `useEffect` for these)
+
+| Smell in the diff | Prescribed fix |
+|---|---|
+| Deriving state from props/state (setState in effect from props) | Compute during render: `const x = derive(props)` or `useMemo` |
+| Syncing a prop into state | Use the prop directly, or a ref to detect prop changes during render |
+| Notifying a parent of a state change | Call the callback in the event handler that caused the change |
+| Resetting state when a prop changes | `key` prop on the component, or a `useState` lazy initializer |
+| One-time init from already-available data | `useState(() => computeInitial())` |
+| Navigation side effect in an effect | Return `<Navigate replace />` in JSX |
+| Assigning to a ref in an effect | Assign `ref.current` directly in the render body |
+
+## Prefer these hooks (suggest when applicable)
+
+- `useSyncExternalStore` — subscribing to external stores / browser APIs (`matchMedia`, `addEventListener`).
+- `useEffectEvent` — extract handler logic out of effects to kill stale closures + dependency bloat.
+- `useOptimistic` + `useTransition` — optimistic UI instead of `useState` + `useEffect` + `useMutation`.
+- `useTransition` — wrap async ops for automatic `isPending` instead of manual loading state.
+- `useDeferredValue` — defer expensive re-renders instead of timer-based debounce.
+
+## Legitimate uses — do NOT flag these
+
+DOM event listeners with cleanup, external system subscriptions (WebSocket, SDK listeners), DOM measurements/scroll, timers with cleanup, analytics/tracking, async operations on mount.
+
+## Also check (project conventions)
+
+- 3+ `useState` in one component → recommend `useReducer`.
+- Component state/logic that could be a testable `use[Component]State()` hook.
+- Imports: direct (`useEffect`, not `React.useEffect`); top-level over inline `await import(...)` unless avoiding a circular dep.
+- Early return over nested conditionals; `const`/helper functions over `let` reassigned in branches.
diff --git a/.claude/skills/thunder-deep-review/SKILL.md b/.claude/skills/thunder-deep-review/SKILL.md
new file mode 100644
index 000000000..8830df943
--- /dev/null
+++ b/.claude/skills/thunder-deep-review/SKILL.md
@@ -0,0 +1,126 @@
+---
+name: thunder-deep-review
+description: Reviews a pull request, diff, or branch against Thunderbolt's house rules (no any, type-over-interface, arrow fns, const/early-return, no error-swallowing or defensive code, useReducer at 3+ useState, useEffect discipline, route code-splitting, soft-deletes, integration-agnostic logic) and the 80 architecture invariants. Goes beyond bug-finding to catch architecture/abstraction-layer mistakes, undocumented intent, convention drift, bloat/supply-chain, and readability. Use when reviewing a PR/diff/branch, when the user says review, code-review, deep-review, or "review before merge". Read-only: reports findings, never edits or posts to the PR.
+---
+
+# Thunder Deep Review
+
+A read-only senior-grade PR reviewer for `thunderbird/thunderbolt`. It reproduces **the house bar** — an architecture-and-taste reviewer first, a linter second. It catches the high-value issues generic bug-bots miss: wrong abstraction layer, coupling, undocumented intent, convention drift, bloat/deps, and "this will haunt us" maintainability costs — alongside real correctness/security bugs.
+
+> The "house bar" is a standard, not a person. Never name or impersonate any individual. Encode the bar, not the human.
+
+## Operating rules (low freedom — always)
+
+1. **Read-only.** Never edit files, never post to the PR, never use Write/Edit. Output a findings report only.
+2. **Diff-scoped.** Only flag lines that *changed* in this diff. Never flag pre-existing code you didn't see change (note it as context at most).
+3. **Codebase-aware.** Before asserting any *cross-file* claim (architecture, data, security), Read the surrounding/imported files (`src/dal/*`, `src/db/tables.ts`, `src/db/schema.ts`, `shared/powersync-tables.ts`, `powersync-service/config/config.yaml`, `src/app.tsx`, `CLAUDE.md`). The signature catches are invisible from the diff alone.
+4. **Verification bar.** Every behavioral claim must quote the exact `file:line` that proves it. If a claim rests on naming or an unconfirmed assumption, downgrade it to a question or drop it. Never infer behavior from a symbol name.
+5. **Never post to PR / never deploy.** Write findings to a review file or return them; nothing leaves the working tree.
+
+## Inputs
+
+- The diff: either a patch file path given to you, or run read-only `git diff main...HEAD` (or `gh pr diff <N>` if a PR number is given). Do not reconstruct the diff by hand.
+- The repo working tree (for cross-file context via Read/Grep/Glob).
+
+## Workflow (follow in order — copy this checklist)
+
+```
+[ ] 0. Map changed files → domains (frontend TS/React, backend, migrations, sync schema, config, tests, docs)
+[ ] 1. PASS A — SCAN (over-generate candidates, all categories)
+[ ] 2. Cross-file context expansion for architecture/data/security candidates
+[ ] 3. PASS B — FILTER (re-read code; keep only grounded, rule-cited findings)
+[ ] 4. Category recall floors (MUST≈100%, SHOULD≈75%)
+[ ] 5. Noise suppression (nit cap, skip lint/generated/lockfiles)
+[ ] 6. Self-validation gate (real id + real file:line + the bar would flag it)
+[ ] 7. Severity-rank, emit report + machine-readable trailer
+```
+
+### Execution model — category fan-out (CRITICAL for recall)
+**Do not review the whole diff in one mental pass — you will skim and miss the majority of issues, especially on large diffs.** Instead run the scan as **separate, exhaustive, category-scoped passes** (one focused pass per lens A–J below; when orchestrated, spawn one read-only sub-reviewer per lens in parallel, then merge). Each pass cares about ONE category and must walk **every changed hunk in every changed file** for that category — no sampling, no "representative" subset. For diffs over ~800 lines, process the file list in chunks and confirm you covered every file before emitting. The single most common failure is a short, shallow finding list on a big diff; treat fewer findings than changed-files as a red flag that you skimmed.
+
+**When you spawn ANY sub-reviewer** (a lane pass, a micro-specialist, or a domain subagent below), include the diff/patch file path you were given in its `Task` prompt and tell it to `Read` that file. A spawned subagent **cannot** reconstruct the diff itself — in CI `main` is not checked out, so `git diff main...HEAD` fails. The patch file you were handed is the single source of truth every worker reviews.
+
+### Enumerate-then-assess (CRITICAL for recall on mechanical checks)
+For the mechanical, countable patterns, **do not eyeball — first ENUMERATE every occurrence in the diff, then judge each one individually.** Holistic scanning silently drops the boring ones. Before emitting a lane, build the relevant list(s) and assess each entry:
+- Correctness: every changed `if (...)`/guard (is it always-true → dead?), every `||` used for a default (should it be `??`?), every `!`/`as` cast (value possibly undefined?), every `arr[0]`/index (array possibly empty?), every `?.` (is the LHS always defined → redundant?), every `.then(`/`.catch(` (should be async/await?), every `try/catch` (swallowing? empty?), every `=== 'literal'` (should be a range/set?), every numeric/`.length` used as a count.
+- Conventions: every **new identifier** (const/var/function/boolean) — check each against naming rules AND naming-quality (vague like `variables`/`data`? misleading like `bypassed` when it means disabled? boolean missing `is/should/has`? a name implying the wrong flow?); every `any`/`as any`; every ALL_CAPS frontend const.
+- Docs-intent: scan **every** changed comment, user-facing string, and doc line for typos, undocumented markers/magic constants, and stale/duplicated/incomplete docs.
+Enumeration is the difference between ~45% and ~80% recall — it is mandatory, not optional.
+
+### High-recall mode (multi-run union — optional, for important PRs)
+A single fan-out catches a strong but incomplete slice; independent runs miss *different* items. For high-stakes PRs, run the full category fan-out **2–3 times** (the specialists are stochastic) and **union** the findings, deduping only exact root-cause+location duplicates. Measured effect: a 2-run union lifts recall ~7–8 points over a single run. Default (cheap) mode is one run; reach for multi-run union when completeness matters more than cost.
+
+### Deep mode (micro-specialists — for the highest recall)
+The broad lenses (A–J) skim the *boring, enumerable* gaps. To close them, add **six narrow micro-specialists**, each a single exhaustive job, and **union** their findings onto the broad-lane + multi-run results. Measured effect: micro-specialists lift recall from ~52% to **~63%** (e.g. they newly catch dead guards, `.then/.catch`, capture-await, drop-unused-export, split-component, restructure-to-flat-tree). Each runs as its own read-only sub-reviewer:
+1. **dead-code** — every `if`-guard (provably always-true → dead?), unreachable branch, empty block, redundant `?.`, `||`-should-be-`??`, leftover no-op.
+2. **async-hygiene** — every `.then/.catch` (→async/await?), unawaited fire-and-forget, ignored resolved `{error}` (Better-Auth-style APIs don't throw), empty/swallowing catch, un-awaited capture/track.
+3. **naming-casing** — every NEW identifier: vague/misleading/wrong-flow/boolean-prefix + frontend ALL_CAPS↔camelCase and JSON-wire-value UPPER_CASE.
+4. **typo-docs** — spell-check every changed comment/string/doc line; JSDoc on every new exported util; stale/duplicated/incomplete docs; undocumented magic markers.
+5. **simplify-restructure** — hard-to-follow → flat decision tree; one-off guard/hook → fold into existing gate; repeated condition → named const (`isFullUser`); `deps`-object → flat; presentation embedded in fetch → split; over-engineered → simple lookup.
+6. **module-surface-dry** — export used only in-module → drop; cross-file helper not exported → export; per-file client/`from`-address/const → single shared source; list duplicated in two places.
+Narrow attention budgets force the enumeration that broad lanes skip. The remaining ~37% are mostly **subjective taste** (rename this var, restructure this block) and small-n volatility — diminishing returns beyond here.
+
+### Specialized subagent workers (dispatch by domain)
+For narrow, high-stakes domains the repo ships dedicated read-only reviewer subagents — **dispatch them in addition to the lanes when the diff touches their area** (they hold deeper domain rules than a general lane can):
+- **`powersync-sync-reviewer`** — invoke whenever the diff touches `shared/powersync-tables.ts`, `config.yaml` sync rules, backend/frontend Drizzle schema, `backend/drizzle/**` migrations, `src/db/powersync/**`, or a synced-table DAL/defaults/reconciliation. It verifies the two-PR deploy flow, `_journal.json` integrity, sync-rule/column parity, **sync-classification consistency** (synced vs local-only across sibling tables; half-synced/misclassified tables — a silent cross-device-failure class general lanes miss), encryption config, and hard-delete correctness.
+- **`react-effect-reviewer`** — invoke on `.tsx`/`.ts` React diffs for the full `useEffect`-discipline catalogue.
+Treat their `blocker` findings as blocking. These are the "narrow + durable + reusable" checks that correctly live as their own agent files, not as prose lanes.
+
+### 1. PASS A — SCAN (recall pass, over-generate)
+Walk **every changed hunk** (per the execution model). List **every** candidate concern — aim to surface the real issues a meticulous senior reviewer would raise, which is typically **one finding per ~60–120 changed lines**, not 3–5 for the whole PR. **Over-generate — do not self-censor; a missed issue costs far more than a candidate dropped in Pass B.** For each candidate record: `file:line`, one-line concern, suspected category, suspected severity. Run these category lenses over the diff (use `references/review-heuristics.md` trigger table + IF–THEN rules + the deep correctness checklist):
+
+- **A. Correctness / logic** *(largest issue bucket — go deep, trace data flow line-by-line)* — real bugs; nullable-sync issues; error-before-first-message; off-by-one / wrong-variable in a branch or payload (e.g. `new_position` and `old_position` both set to the same source); unreachable/dead branches & dead defensive guards (`if (x)` where `x` cannot be falsy here); missing throwing `default` on enum/db-type switches; `||` used for defaulting where `??` is needed (clobbers valid `0`/`''`/`false`); non-null `!` / unsafe cast on a possibly-undefined value; unguarded array index (`dns.lookup(...)[0]`, `.find()[0]`) on a possibly-empty array; redundant optional chaining on a value that's always defined (`.all()?.`, a `= []`-defaulted destructure); exact-equality where a range/set is meant (`=== '127.0.0.1'` misses the rest of `127.0.0.0/8`); recursion that can blow the stack / N+1 query where an iterative BFS or single query fits; non-deterministic ordering (sort on a non-unique key without a tiebreak); a function whose return type/contract changed (e.g. now returns a query builder instead of `Promise<T|null>`) breaking callers; `.then/.catch` where the codebase uses async/await; fire-and-forget async whose rejection is silently dropped (should be awaited). **See the deep correctness checklist in `references/review-heuristics.md` §3.**
+- **B. Architecture / abstraction & coupling** *(highest value)* — second route paralleling a canonical one (fold into the single `/chat`/`/inference` path so token-tracking/auth/observability centralize); logic branching on a specific model/provider/integration (push into a data column/config); DB guard logic outside the DAL; cross-cutting concern enforced per-route instead of at one point; reinventing an existing primitive; premature abstraction (hook/wrapper/util/config-object for 1–2 call sites) **and** missing abstraction (pattern recurs 3+ times); placeholder/temporary code not named as such.
+- **C. Conventions / house rules** — `references/house-rules.md` (TS + React + data). camelCase even for constants, no `let`, early-return over nested ifs, `@/` imports, `.test`+`bun:test`, direct imports.
+- **D. React patterns** — `useReducer` at 3+ `useState`; `useEffect` used for derived state / prop-sync / parent-notify / reset-on-prop / one-time-init / navigation / ref-assignment (see `references/house-rules.md` effect catalogue); reducer actions modeled as events not setters.
+- **E. Error handling** — caught error downgraded to `console.warn`/silent return on a path that shouldn't fail → "let it throw"; error/early-return branch with **no** logging → "at least `console.error`"; over-defensive guard on trusted data; `setTimeout`/`requestAnimationFrame` papering over a race/ordering bug. **Test exception:** a test file's `spyOn(console,'error')` for an intentionally-triggered error is the prescribed `R-SUPPRESSCONSOLE` pattern, NOT error-swallowing — never flag it here.
+- **F. Testability** *(see `references/testing-rules.md` for the full test standard)* — `mock.module`/`vi.mock` of a **shared** module = **blocker** (`R-NOMOCKSHARED`: leaks globally → #1 CI flake; use real impls + test DB/provider, don't just mock it better); mocking an internal collaborator = modularity smell → DI (`R-NOMOCK`/`R-DITEST`); backend tests should inject `database`/`fetchFn` via `createApp` + `createTestDb()` (`R-DITEST`); real waits / `vi.useFakeTimers()` instead of `getClock()` (`R-FAKETIMERS`); `.spec`/`vitest` (`R-TEST`); `as any`/`as unknown as`; branchy logic without unit tests. Note: a test's `spyOn(console,'error')` for an expected error is the prescribed pattern (`R-SUPPRESSCONSOLE`) — do NOT flag it as error-swallowing.
+- **G. Bundle / bloat / supply-chain** — non-lazy settings/admin/enterprise/OIDC route in `app.tsx`; new dependency (esp. parsers) / unpinned `^`; hardcoded list that will grow → derive it.
+- **H. Security / privacy** — endpoint without auth/session; denylist where allowlist fits; per-IP rate limit on an authed route; PII/owned-domain/real-IP/seeded-UUID in code; unescaped model output in HTML/widget attrs; hard-delete of user data (must soft-delete); non-nullable column on a synced table (+ two-PR deploy); migration missing `_journal.json` entry.
+- **I. Docs-intent / completeness / scope** — undocumented magic reference / missing rationale ("what is this constant/flag?"); incompleteness ("should the sibling cases be handled too?"); inconsistency with a sibling path; unrelated changes mixed in (split PR); leftover/dead code; no-value comments restating the next line.
+- **J. Readability / simplification** *(dedicated pass — high-frequency, easy to under-call)* — hard-to-follow branching that should be a **flat decision tree** ("has session → allowed; no session → branch on mode"); deep nesting that should be **early-return**; a repeated condition that should become a **named const** (`const isFullUser = isAuthenticated && !isAnonymous`); a `deps`-object that reads cleaner as **flat params**; an unnecessary wrapper/indirection; a presentation/display component that should be **split out** of its data-fetching component; over-engineered logic (e.g. dot-notation nested-path parsing) that should be a **simple flat lookup**; 5 lines that collapse to 1. Frame as "this is hard to follow — restructure as …".
+
+### 2. Cross-file context expansion
+For every A/B/G/H candidate, Read the relevant cross-file targets (DAL, schema, `config.yaml`, `shared/powersync-tables.ts`, `app.tsx`, `CLAUDE.md`, the importing/imported modules) and confirm the claim against actual code. Under-contextualized long functions are where false positives spike — expand context before asserting.
+
+### 3. PASS B — FILTER (precision pass, fresh re-read)
+For each Pass-A candidate, KEEP it only if it can **(1) quote the exact offending line** AND **(2) cite a specific house-rule id (`references/house-rules.md`) or invariant id (`references/architecture-invariants.md`)** — OR it is a concrete correctness/security/privacy bug with `file:line` evidence. Otherwise DROP. Do **not** run an open-ended "are there more bugs?" self-critique loop on already-clean code (it invents issues) — only validate the candidates from Pass A.
+
+### 4. Category recall floors (asymmetric)
+- **MUST / MUST-NOT rules → ≈100% recall** — never miss: `any`, error-swallowing, frontend hard-delete, `useEffect` for derived-state/prop-sync, PII in logs, unscoped/cross-tenant query, non-nullable synced column.
+- **SHOULD rules → ≈75% recall** — `useReducer` at 3+, helper extraction, JSDoc: report when clear, skip when marginal.
+- Prefer **one extra question over one missed blocker.**
+
+### 5. Noise suppression (mandatory — but never at the cost of a real rule-cited finding)
+- The cap applies **only to purely cosmetic nits that carry no `R-*`/`INV-*` id** (e.g. spacing/wording taste): surface at most 5, append "plus N similar". **Any finding that cites a real rule or invariant id is ALWAYS surfaced — never capped, collapsed, or dropped for volume.** A convention violation (camelCase, `let`, naming, JSDoc, deps-object, `.then/.catch`) and a docs-intent catch (undocumented marker/constant) are rule-grounded findings, not cosmetic nits.
+- **Skip** only what `/thundercheck` (eslint/prettier/tsc) mechanically auto-fixes (pure formatting/import-order), generated files, `*.lock`/`bun.lockb`, auto-generated Drizzle SQL, vendored deps. Do NOT skip naming, typing, error-handling, or structural conventions — those are not auto-fixed.
+- Lead with **"No blocking issues"** when true (but only after the full per-hunk scan — an empty list on a non-trivial diff almost always means you skimmed).
+
+### 6. Self-validation gate
+For each surviving finding verify: (1) cites a real rule/invariant id, (2) points to a real `file:line` in the diff, (3) the house bar would genuinely flag it (not a preference dressed as a rule). Drop any failing (1)–(3).
+
+### 7. Emit
+Rank by severity then `file:line`. Use the exact format in `assets/finding-template.md`, including the machine-readable trailer.
+
+## Severity & confidence (summary — full ladder in `references/severity-rubric.md`)
+
+Severity ⊥ confidence (two independent axes). Down-weight low confidence, **never silently drop it**:
+- **blocker / high-confidence** → direct statement + prescribed fix.
+- **warning / medium** → frame as a question ("intended?").
+- **note / low** → "flagging, no strong feelings."
+
+| Tier | Examples | Block? |
+|---|---|---|
+| **Blocker** | real correctness bug; error-swallowing on trusted path; frontend hard-delete; `useEffect` anti-pattern; PII in logs; unscoped query; non-nullable synced column; migration w/o journal entry; **"this will haunt us" maintainability/architecture cost** | Yes |
+| **Convention** | `any`; `interface` over `type`; `function` kw; `let` where const+early-return works; ALL_CAPS const; 3+ `useState`; non-lazy route; `.spec`/vitest | Soft |
+| **Nit / note** | cosmetic, no-value comment, numeric separators | No |
+| **Pre-existing** | issue already in base — context only, not a new blocker | No |
+
+## Reference files (read on demand — progressive disclosure)
+- `references/review-heuristics.md` — IF–THEN heuristics + the diff-signal → comment trigger table (the core review intelligence). **Read this for Pass A.**
+- `references/house-rules.md` — TS / React / data conventions with rule ids (`R-*`), incl. the full `useEffect` anti-pattern catalogue.
+- `references/testing-rules.md` — the test-file standard with rule ids (`R-NOMOCKSHARED`, `R-DITEST`, `R-FAKETIMERS`, `R-SUPPRESSCONSOLE`, `R-BUNTESTCWD`). **Read this whenever the diff changes a `*.test.ts(x)` file.**
+- `references/architecture-invariants.md` — all 80 invariants (`INV-01..INV-80`), titles indexed up top.
+- `references/severity-rubric.md` — the severity ladder, tone calibration, confidence rules.
+- `references/style-exemplars.md` — anonymized few-shot exemplars; match this register.
+- `assets/finding-template.md` — exact output format.
diff --git a/.claude/skills/thunder-deep-review/assets/finding-template.md b/.claude/skills/thunder-deep-review/assets/finding-template.md
new file mode 100644
index 000000000..d18db0e91
--- /dev/null
+++ b/.claude/skills/thunder-deep-review/assets/finding-template.md
@@ -0,0 +1,46 @@
+# Output Contract (exact format — low freedom)
+
+Emit findings in this shape. Never post to the PR; return the report or write it to a review file only.
+
+**Voice:** write every finding for a **teammate**, not a linter — warm, collaborative, curious. Lead with a question or first-person framing where it fits ("I wonder if…", "Could we…?", "Heads up —"). Be specific and kind; never robotic, terse, or scolding.
+
+**Rule/invariant ids are INTERNAL.** Cite a real `R-*`/`INV-*` id while grounding/keeping a finding (the verification bar + dedup use it), but it lives in the internal `rule` field — **never write rule-ids, section refs (`§`), or scaffolding (`INV-01`, `R-REDUCER`, "H mass-assignment…") into the human-facing problem/fix text.** They're meaningless to a person reading a PR comment.
+
+## Header (always)
+First line is a tally, then a one-line verdict:
+
+```
+Thunder Deep Review — <N> blocking, <N> convention, <N> nits
+Verdict: PASS | CONCERNS | BLOCK
+```
+Lead with **"No blocking issues."** when there are none.
+
+## Findings (grouped by severity: Blocking → Convention → Nits)
+Each finding is one tight unit:
+
+```
+- `path:line` — <problem, in friendly register> — <prescribed fix> (confidence: high|med|low)
+```
+(No rule-id in this line — it's human-facing; the id stays in the internal `rule` field.)
+
+- Blocking findings: clear, warm statement + the fix.
+- Convention findings: friendly and specific (the `R-*` id grounds it internally; don't show it).
+- Medium/low confidence: phrase the problem as a question.
+- Nit cap: at most 5 nits shown; if more, end the Nits group with `…plus <N> similar`.
+
+Optionally add a collapsed rationale for non-obvious findings:
+```
+  <details>why: <one-line>; verification: quoted offending line</details>
+```
+
+## Machine-readable trailer (always last line)
+For CI/orchestrator gating:
+
+```
+<!-- thunder-severity: {"blocking":<n>,"convention":<n>,"nit":<n>} -->
+```
+
+## Hard constraints
+- Read-only. No Write/Edit, no PR posting, no deploy.
+- Only changed lines. Pre-existing issues are context, never new blockers.
+- Every finding cites a real `file:line` and (for convention/architecture) a real `R-*`/`INV-*` id, or it is dropped.
diff --git a/.claude/skills/thunder-deep-review/references/architecture-invariants.md b/.claude/skills/thunder-deep-review/references/architecture-invariants.md
new file mode 100644
index 000000000..954d1b5cc
--- /dev/null
+++ b/.claude/skills/thunder-deep-review/references/architecture-invariants.md
@@ -0,0 +1,112 @@
+# Architecture Invariants (INV-01..INV-80)
+
+Global properties that survive across subsystems. Verify any change against the relevant ones; cite `INV-NN` when a finding rests on one. Condensed from the architecture deep-dive — read the full `01..10-*.md` docs for detail.
+
+## Index (titles)
+- Identity & Tenancy: INV-01..05
+- Trust & Auth: INV-06..13
+- E2E Encryption: INV-14..29
+- Sync Pipeline: INV-30..37
+- Persistence & Soft-Delete: INV-38..48
+- Backend Skeleton: INV-49..61
+- AI / Chat / MCP: INV-62..70
+- Frontend Devices & Reset: INV-71..74
+- Tauri Shell: INV-75..80
+
+## Identity & Tenancy
+- **INV-01** `user_id` always from JWT, never client payload; sync rules scope every row by `user_id = request.user_id()`. Cross-tenant leak structurally impossible.
+- **INV-02** `X-Device-ID` mandatory on every PowerSync token request + upload (400 `DEVICE_ID_REQUIRED`).
+- **INV-03** Device id is client-generated UUID v4 in `localStorage[thunderbolt_device_id]`; backend never mints.
+- **INV-04** Email stored lowercase (`normalizeEmail`); lookups must normalize.
+- **INV-05** PR-preview envs bypass auth (`shouldBypassWaitlist`).
+
+## Trust & Authentication
+- **INV-06** Bearer = `rawToken.HMAC_b64`; verified with `timingSafeEqual`; PowerSync path-2 verifies signature independently.
+- **INV-07** `tauri://localhost` always trusted origin; backend's own origin auto-added.
+- **INV-08** `accountLinking.trustedProviders=['sso']` only when authMode ∈ {oidc, saml}.
+- **INV-09** OTP sign-in requires `x-challenge-token` + existing-user-or-approved-waitlist (defense in depth).
+- **INV-10** OAuth integration secrets never leave backend (confidential-client proxy).
+- **INV-11** OAuth redirect URIs: exact CORS match + hardcoded app URL + loopback any-port; HTTPS-on-localhost rejected.
+- **INV-12** Tauri loopback ports fixed `[17421,17422,17423]`, shared Rust↔backend; PKCE per RFC 8252.
+- **INV-13** Post-update redirect flag set before relaunch; consumed once.
+
+## E2E Encryption (E2EE_ENABLED)
+- **INV-14** CK is AES-256-GCM, single algo/length everywhere.
+- **INV-15** Hybrid envelope V1 (`0x01`), exactly 1194 bytes.
+- **INV-16** Wrapping key one-shot, HKDF-SHA-256 over `ssEcdh||ssMlkem`.
+- **INV-17** CK never leaves device extractable (transient only at setup/recovery).
+- **INV-18** CK invalidation broadcasts via `BroadcastChannel('thunderbolt-ck-invalidation')`.
+- **INV-19** Codec encode fail-closed in steady state (`e2eeSetupComplete && !ck` throws).
+- **INV-20** Codec decode tolerant — passes through on missing CK / decrypt error (warns only; never breaks sync).
+- **INV-21** Trust transition atomic — upsertEnvelope + markDeviceTrusted in one tx.
+- **INV-22** Revoke atomic — deleteEnvelope + revokeDevice + revokeDeviceSessions in one tx.
+- **INV-23** Trusted device envelope overwrite-protected (only the device re-keys).
+- **INV-24** Approval requires canary proof AND trusted-caller.
+- **INV-25** Re-bootstrap requires old canary secret.
+- **INV-26** Canary hash compare constant-time.
+- **INV-27** Device limit 10 active/user, enforced in-tx.
+- **INV-28** First-device flow returns recovery key exactly once (in-memory).
+- **INV-29** Approve-device never touches local non-extractable CK.
+
+## Sync Pipeline
+- **INV-30** Empty transformer pipeline ⇒ byte-identical to `SqliteBucketStorage`.
+- **INV-31** Re-encoded payload always same wire format (JSON/BSON in = out).
+- **INV-32** Two sync paths — custom SharedWorker (Chrome/Edge/FF) vs main-thread transformer (Safari/iOS/Tauri).
+- **INV-33** `@powersync/web` `@internal` override via `powersync-web-internal` alias — brittle across upgrades; re-verify `generateStreamingImplementation()`.
+- **INV-34** HTTP streaming, not WebSockets (`SyncStreamConnectionMethod.HTTP`).
+- **INV-35** `fetchCredentials` returns `null` (not throw) on auth failure.
+- **INV-36** `uploadData` throws on failure, never `transaction.complete()` — PowerSync retries.
+- **INV-37** Error→reset mapping: 410→account_deleted; 403 DEVICE_DISCONNECTED→device_revoked; 409 DEVICE_ID_TAKEN→reset; 400 DEVICE_ID_REQUIRED→reset; 401→session_expired.
+
+## Persistence & Soft-Delete
+- **INV-38** Frontend NEVER hard-deletes — `deletedAt = nowIso()`. Hard delete only in account/device flows.
+- **INV-39** Backend prefers soft-delete; hard delete only for account deletion, PowerSync DELETE, device revoke.
+- **INV-40** `settings` is the only frontend table without `deletedAt` (only hard-delete in frontend DAL).
+- **INV-41** `deletedAt IS NULL` filtered at WHERE + partial indexes.
+- **INV-42** `clearNullableColumns` nulls every nullable non-PK/FK/unique col except `deletedAt`/`userId`.
+- **INV-43** Cascades hand-rolled in DAL (backend FKs intentionally omitted).
+- **INV-44** Insert-then-update upsert (PowerSync views lack `ON CONFLICT`).
+- **INV-45** `isInsertConflictError` unwraps `error.cause` (Drizzle wraps).
+- **INV-46** Composite-PK `(id, user_id)`/`(key, user_id)` for default-data tables.
+- **INV-47** `reconcileDefaults` runs AFTER `waitForInitialSync`.
+- **INV-48** `defaultHash` is the modification tracker — `update*` strips it; `reset*ToDefault` recomputes.
+
+## Backend Skeleton
+- **INV-49** Single Elysia tree at `/v1`; Better Auth uses `.all('/*')` not `.mount()` (mount bypasses rate limiting).
+- **INV-50** `safeErrorHandler` re-applied per plugin (Elysia confines `onError`).
+- **INV-51** Error envelope always `{success:false, data:null, error:reasonPhrase}`; internals never leak.
+- **INV-52** Domain error codes returned, not thrown (bypass `safeErrorHandler`).
+- **INV-53** PGLite default driver; Postgres only when `DATABASE_DRIVER=postgres`.
+- **INV-54** `runMigrations()` before `createApp`; discovered via `process.cwd()/drizzle`.
+- **INV-55** Migration `_journal.json` MUST include new entries (else migration never runs).
+- **INV-56** CORS allowed headers must include any new custom request header (else preflight fails silently). Browser-readable response headers go in `corsExposeHeaders`.
+- **INV-57** Settings parsed once + cached (`clearSettingsCache()` for tests).
+- **INV-58** Trusted proxy headers: `cf-connecting-ip`/`true-client-ip` only; XFF/Forwarded never trusted.
+- **INV-59** Rate-limit table shared across tiers; key prefix discriminates.
+- **INV-60** User-keyed rate limit must be `.use()`d INSIDE `guard({auth:true})`.
+- **INV-61** IP-keyed rate limit skips `'unknown'`.
+
+## AI / Chat / MCP
+- **INV-62** User's outbound message persisted before model invoked (crash-safe).
+- **INV-63** Partial assistant messages persisted ≤ every 200ms while streaming.
+- **INV-64** `stepCountIs(maxSteps)` is the only stop condition; final-step nudge disables tools.
+- **INV-65** Backend `/v1/inference/*` only accepts `stream:true`; privileged roles after `messages[0]` downgraded to user.
+- **INV-66** MCP traffic rides universal proxy `/v1/proxy`; proxy hop authed with session bearer, upstream MCP cred rides as `X-Proxy-Passthrough-Authorization`.
+- **INV-67** PostHog `privacy_mode===true` when configured; conversation content never in `$ai_input`/`$ai_output`.
+- **INV-68** MCP server config AND credentials are local-only (never synced/E2EE'd); `mcp_secrets` is bearer|oauth union.
+- **INV-69** Encryption mismatch fatal — `chatThread.isEncrypted !== selectedModel.isConfidential` throws synchronously.
+- **INV-70** Widget tags must be self-closing `<widget:name … />`; parser silently drops invalid widgets.
+
+## Frontend Devices & Reset
+- **INV-71** Single canonical reset: `clearLocalData()` → `window.location.replace(...)`.
+- **INV-72** Current-device-revoked dual-tracked (`powersyncCredentialsInvalid` event + React Query watcher).
+- **INV-73** Sign-in modal suppressed if revoked-modal already fired.
+- **INV-74** `hadDeviceOnceRef` distinguishes first-load from post-deletion sync wipe.
+
+## Tauri Shell
+- **INV-75** Window hidden until React mounts (`visible:false`).
+- **INV-76** Single instance per OS (focuses existing window).
+- **INV-77** `tauri-plugin-http` capability gated by `native_fetch` flag.
+- **INV-78** CSP `connect-src` explicit allow (backend 8000, Ollama 11434); `script-src 'self'`.
+- **INV-79** Backend proxy CSP `sandbox` — every proxied response sets sandbox + `Content-Disposition: attachment` + `nosniff` + CORP.
+- **INV-80** `createSafeFetch` IP-pins every redirect hop (DNS before connect; max 5 redirects).
diff --git a/.claude/skills/thunder-deep-review/references/house-rules.md b/.claude/skills/thunder-deep-review/references/house-rules.md
new file mode 100644
index 000000000..c74cad501
--- /dev/null
+++ b/.claude/skills/thunder-deep-review/references/house-rules.md
@@ -0,0 +1,49 @@
+# House Rules (CLAUDE.md / AGENTS.md) — with rule ids
+
+Cite the `R-*` id when surfacing a finding. Source of truth is `CLAUDE.md` (symlinked `AGENTS.md`) at repo root — read it if in doubt.
+
+## TypeScript & style
+- **R-NOANY** — never use `any` (incl. `as any`, `as unknown as`). Unsafe non-null `!` on possibly-undefined also flagged.
+- **R-TYPE** — prefer `type` over `interface`.
+- **R-ARROW** — prefer arrow functions over the `function` keyword.
+- **R-NOLET** — prefer `const` over `let`; extract a helper with early return instead of setting a `let` inside conditionals.
+- **R-CAMEL** — camelCase for consts and variables (yes, even module constants in frontend/TS). Backend JSON string values in responses may be ALL_CAPS (they are wire values, not TS identifiers) — do not flag those.
+- **R-EARLY** — prefer early return over long if / nested code.
+- **R-IMPORT** — direct imports (`useEffect`, not `React.useEffect`); `@/...` over deep relative paths; top-level imports over inline `await import(...)` unless a circular dep requires it.
+- **R-ASYNC** — prefer async/await over `.then/.catch`.
+- **R-JSDOC** — add JSDoc to new utility functions.
+- **R-COMMENT** — only comment non-obvious code; remove comments that restate the next line.
+- **R-NUMSEP** — numeric separators on large literals (`16_000`).
+- **R-ONEFILE** — loosely one React component per file.
+
+## React patterns
+- **R-REDUCER** — use `useReducer` when a component needs 3+ `useState`. Model reducer actions as **events** (`SEARCH_STARTED`), not setters (`SET_FOO`).
+- **R-STATEHOOK** — abstract state/logic into a `use[Component]State()` hook to separate computation from display and enable unit testing.
+- **R-EFFECT** — treat every `useEffect` as a smell until proven necessary. **Never** use an effect for:
+  - deriving state from props/state → compute during render or `useMemo`
+  - syncing props into state → use the prop directly, or a ref to detect prop change during render
+  - notifying parents of state changes → call the callback in the event handler
+  - resetting state when a prop changes → `key` prop, or `useState` lazy initializer
+  - one-time init from already-available data → `useState(() => compute())`
+  - navigation side-effects → return `<Navigate replace />` in JSX
+  - assigning to refs → assign `ref.current` in the render body
+  - **Prefer** `useSyncExternalStore` (external stores / browser APIs), `useEffectEvent` (extract handler logic), `useOptimistic`+`useTransition`, `useTransition`, `useDeferredValue`.
+  - **Legitimate** (keep): DOM listeners w/ cleanup, external subscriptions (WebSocket/SDK), DOM measurement/scroll, timers w/ cleanup, analytics, async-on-mount.
+- **R-LAZY** — keep the entry bundle small. New top-level routes default to `React.lazy(() => import(...))` unless on the chat/landing critical path. Static: Chat, layouts, small auth/error pages. Lazy: all settings/admin pages, secondary features, waitlist, SSO flows. Pair lazy imports with a content-area `<Suspense>`.
+- **R-VARCSS** — use standard Tailwind classes for properties with responsive theme overrides (`rounded-*`, spacing). Only use `var()` syntax for the custom variables without a Tailwind equivalent (`text-[length:var(--font-size-*)]`, `h-[var(--touch-height-*)]`, `size-[var(--icon-size-*)]`, `min-h-[var(--min-touch-height)]`).
+
+## Data, errors, architecture
+- **R-SOFTDEL** — **Frontend never hard-deletes.** Always soft-delete (`deletedAt = nowIso()`; call update APIs). Only exception: explicit account/device removal flows. Backend prefers soft-delete; hard delete only for account deletion, PowerSync DELETE ops, device revocation.
+- **R-ERRSWALLOW** — prefer optimistic over defensive code. Let errors surface loudly in development; don't wrap trusted calls in try/catch that swallows. Handle errors architecturally at higher levels. Distinguish: (a) swallowing a real error → let it throw; (b) an error branch with *no* log → add `console.error`.
+- **R-NODEFENSIVE** — don't add null-checks / guards against conditions that can't occur on trusted data.
+- **R-DAL** — DB logic lives in the DAL (`src/dal/*`), not inline in components/settings.
+- **R-HTTP** — use the app's `HttpClient` (`src/lib/http.ts`): `getHttpClient()` for authed backend calls, `http` for external APIs. No bare `fetch()`.
+- **R-MIGRATION** — generate Drizzle migrations with `bun db generate`, never hand-write SQL. Always verify `backend/drizzle/meta/_journal.json` includes the new entry (else it never runs). Never `bun db push` against prod.
+- **R-SYNCNULL** — new synced PowerSync columns must be nullable; adding a synced table is a two-PR deploy (backend schema + `config.yaml` sync rule + dashboard rules FIRST, frontend SECOND).
+- **R-CORS** — a new custom request header needs no CORS change (echo-back), but a browser-readable response header must be added to `corsExposeHeaders`.
+- **R-BUN** — use `bun` (not npm); `bun test` (not vitest); install latest (`bun add <pkg>@latest`).
+- **R-SIMPLE** — bias to tasteful simplicity; avoid over-engineering, premature optimization, and defensive patterns that obscure intent. Question and recommend alternatives.
+
+## Tests (summary — full standard in `references/testing-rules.md`)
+- **R-TEST** — tests live as `<file>.test.ts` next to source and use `bun:test`; never `.spec` files, never `vitest`.
+- **R-NOMOCK** — `mock.module()` of an internal/shared module is a modularity smell → prefer dependency injection (inject `httpClient`/`fetch`/`database`). Mocking is OK only for truly-external boundaries: external/auth/third-party APIs, browser APIs absent in the test env, and React Router hooks. Its stronger sibling is **`R-NOMOCKSHARED`** (a `mock.module` of a *shared* module is a blocker-tier CI-flake leak) — see `references/testing-rules.md`.
diff --git a/.claude/skills/thunder-deep-review/references/review-heuristics.md b/.claude/skills/thunder-deep-review/references/review-heuristics.md
new file mode 100644
index 000000000..34ae44a2b
--- /dev/null
+++ b/.claude/skills/thunder-deep-review/references/review-heuristics.md
@@ -0,0 +1,138 @@
+# Review Heuristics — the core review intelligence
+
+Apply these in **Pass A (Scan)**. Two parts: (1) a fast diff-signal → comment trigger table, (2) IF–THEN heuristics by category. Each heuristic, when it fires, must still pass the Pass-B verification bar (quote the line + cite a rule/invariant id) before it is surfaced.
+
+## 1. Diff-signal → comment trigger table (scan for these literal signals)
+
+| Diff signal | Likely finding | Rule/Cat |
+|---|---|---|
+| `const ALL_CAPS =` / `FOO_BAR:` keys in TS/frontend (incl. `*.config.ts`, `scripts/`, `e2e/`, `*.test.ts`) | "camelCase even for constants" | R-CAMEL |
+| `let x` then reassignment; `let isX=false; if(...) isX=true` | "avoid `let` — early-return + `const`" | R-NOLET |
+| `import … from '../../types'` (deep relative) | "use `@/types`" | R-IMPORT |
+| `.spec.ts` file; `from 'vitest'` | "use `.test` + `bun:test`" | R-TEST |
+| `mock.module(`/`vi.mock`/`jest.mock` of a SHARED module (`@/hooks/*`, `@/components/ui/*`, any app module other tests import) | "global mock leaks across files → #1 CI flake — DON'T mock shared modules; use real impls + test DB/provider" | R-NOMOCKSHARED (blocker) |
+| `mock.module(`/`vi.mock`/`jest.mock` of an internal collaborator | "mocking smell → DI; or justify" | R-NOMOCK |
+| partial `mock.module` of a shared module (missing exports) | "include EVERY export or it breaks the next test file" | R-NOMOCKSHARED |
+| real wait in a test (`await sleep`, real `setTimeout`, `vi.useFakeTimers()`) | "fake timers are global — advance via `getClock()` in `act()`" | R-FAKETIMERS |
+| `spyOn(console,'error')` in a test that triggers an expected error | OK — prescribed pattern, NOT error-swallowing (do not flag) | R-SUPPRESSCONSOLE |
+| backend route test that module-mocks the db/`fetch` instead of `createApp({ database, fetchFn })` + `createTestDb()` | "inject via DI; roll back in `afterEach` `cleanup()`" | R-DITEST |
+| Component with 3+ `useState(` | "switch to `useReducer`" | R-REDUCER |
+| useState + useEffect that syncs/derives from props/state | "derive during render, don't effect" | R-EFFECT |
+| `useEffect(` with no adjacent justifying comment | "justify the effect or remove" | R-EFFECT |
+| reducer with `SET_FOO`/`SET_BAR` actions | "actions = events (SEARCH_STARTED), not setters" | R-REDUCER |
+| Second backend route duplicating `/chat`/`/inference` per provider | "fold into the one canonical endpoint" | architecture |
+| `if (model === 'mistral')`, hardcoded `google`/`microsoft` in generic code | "stay integration-agnostic — data column/config" | architecture |
+| Hardcoded array listing every model / every table name | "list will grow & bloat — derive it" | bloat |
+| `React.lazy` missing on a settings/admin/enterprise/OIDC route in `app.tsx` | "lazy-load — bundle + attack surface" | R-LAZY |
+| New `dependencies` entry (esp. docx/pdf parsers); unpinned `^` | "supply-chain + bundle risk; pin" | bloat |
+| Backend route handler with no auth/session guard; denylist arrays | "require auth; allowlist > denylist" | security |
+| `rateLimit` keyed by IP on an authenticated route | "per-user, not per-IP" | security |
+| Literal IPs, owned domains, home paths, seeded UUIDs in migrations/tests | "use example.com / 1.1.1.1; regen UUIDs" | security |
+| Template-string building HTML/widget tags w/o escape; `sandbox="… allow-popups"` | "escape attrs; tighten sandbox" | security |
+| `db.delete()` / hard delete on user-data tables (frontend) | "must soft-delete (`deletedAt`)" | R-SOFTDEL |
+| Non-nullable column added to a synced table schema | "synced cols must be nullable; confirm two-PR deploy" | INV / sync |
+| Migration SQL added without `_journal.json` entry | "journal entry missing — migration never runs" | INV-55 |
+| DB query/mutation inline in a component/settings file | "move into `dal.ts`" | architecture |
+| `catch (e) { console.warn(...) }` and continue; OR error branch with no logging | "let it throw" / "at least `console.error`" | R-ERRSWALLOW |
+| `setTimeout`/`requestAnimationFrame` near init/race/flicker | "don't paper over the race; fix the branching" | error-handling |
+| New `use[X]()`/wrapper with 1–2 call sites | "premature abstraction — inline it" | simplicity |
+| Comment restating the next line (`// Save X` above `saveX()`) | "remove — self-explanatory" | R-COMMENT |
+| `as any` / `as unknown as`; non-null `!` on possibly-undefined | "avoid the cast / unsafe assertion" | R-NOANY |
+| switch/if over enum/db-type with no throwing `default` | "add throwing `default` — would silently fail" | error-handling |
+| Prompt strings the code then string-matches on exact wording | "brittle; breaks across languages/providers" | llm |
+| dev-env/CI/playwright/prettier churn intermixed with feature | "separate PR" | scope |
+| `bun.lock`/`package.json`/workflow churn unrelated to PR title | "intentional?" | scope |
+| Recomputed value a hook already provides (`?? 'http://localhost:8000/v1'` next to `useSettings()`) | "the hook already does this" | simplicity |
+| Numeric literal `16000` without separators | "use `16_000`" | R-NUMSEP |
+| Single-row setter/getter called in a loop | "plural batch variant — one SQL query" | perf |
+| `useQuery` result assumed complete (no pagination) | "anticipate scale; lazy single SQL lookup" | perf |
+| Hardcoded config (domain/cloudUrl/CORS) where app self-hosts | "should be env-driven" | architecture |
+| Verbose user copy ("This action cannot be undone.") | "trim to minimal clear phrasing" | ux |
+| Same list duplicated in a `.md` doc and source | "don't store twice — link" | docs |
+
+## 2. IF–THEN heuristics by category
+
+### Architecture & coupling (highest value)
+- IF a diff adds a route/endpoint paralleling an existing canonical one THEN ask if it folds into the existing endpoint so cross-cutting hooks (token tracking, auth, observability) stay centralized — flag the second path.
+- IF new logic branches on a specific model/provider/integration THEN push the knowledge into data (a models-table column, config object, or tool/system-prompt instruction) so it survives those becoming plugins.
+- IF DB validation/guard logic (soft-delete, `isSystem`) lives in a bespoke helper THEN move it into the DAL.
+- IF a cross-cutting concern (auth, prompt-injection sanitization) is enforced per-route THEN push it to the single enforcement point.
+- IF a general-purpose function gains feature-specific logic THEN flag the coupling; generalize or move it out.
+- IF new code reimplements an existing primitive (memoize, platform helpers, debounce, `useQuery`, dayjs, a localStorage hook) THEN redirect to the existing one.
+- IF placeholder/temporary code lands THEN require it be named temporary and point at the eventual home.
+
+### Abstraction altitude
+- IF a new hook/wrapper/util/config-object wraps a single call site or trivial values THEN challenge as premature — inline it ("the value doesn't justify the complexity").
+- IF the same expression/logic recurs 3+ times THEN extract a named helper — but only on genuine recurrence.
+- IF code is "janky/funky/hacky" (mutating a function after it runs, re-throwing to control flow) THEN flag it as a smell and ask for the clean structure.
+
+### Error handling
+- IF an error is caught and downgraded to `console.warn`/silent return on a path that shouldn't fail THEN "let it throw so we can fix it" (anti-defensive).
+- IF an error/early-return branch has NO logging THEN require at least `console.error`/`console.warn` ("some future debugger gets burned"). (Distinguish from the swallow case.)
+- IF a switch/if over an enum/db-type has no throwing `default` THEN flag silent-failure on a future variant.
+- IF a race/ordering/flicker bug is "fixed" with `setTimeout`/`rAF`/extra state THEN reject as masking the real branching bug.
+- IF code is defensive against an impossible condition on trusted data THEN call it overly defensive; prefer loud failure, fix upstream.
+
+### Testability — full standard in `references/testing-rules.md`
+- IF a test `mock.module()`s a **SHARED** module (`@/hooks/*`, `@/components/ui/*`, or any app module other tests import) THEN it is a **blocker** (`R-NOMOCKSHARED`): the mock leaks globally across every test file in the worker — the #1 cause of CI failures that pass alone but fail together (`Export named 'X' not found`, `undefined is not an object`). Don't tell them to mock it better — tell them to **stop mocking shared modules** and use real impls + a test DB/provider. This is NOT just a DI taste nit; press it as a CI-flake blocker.
+- IF a test partial-mocks a shared module (omits some exports) THEN flag it (`R-NOMOCKSHARED`): a missing export breaks the next test file — include EVERY export if a shared-module mock is truly unavoidable.
+- IF a test relies on heavy mocking (esp. `mock.module` of internal collaborators) THEN modularity smell → DI (`R-NOMOCK`/`R-DITEST`), or at minimum a comment why the mock is unavoidable. Mocking is OK only for truly-external things: external/auth/third-party APIs, browser APIs absent in the test env, and React Router hooks — do NOT flag those.
+- IF a backend route test module-mocks the db or `fetch` THEN redirect to DI (`R-DITEST`): inject via `createApp({ database, fetchFn })` + `createTestDb()`, and roll back in `afterEach` via `cleanup()`.
+- IF a test uses a real wait (`await sleep`, real `setTimeout`) or re-installs timers (`vi.useFakeTimers()`) THEN flag (`R-FAKETIMERS`): fake timers are installed globally — advance time via `getClock()` (`tickAsync`/`runAllAsync`) inside `act()`.
+- IF a test intentionally triggers an error THEN it should `spyOn(console,'error').mockImplementation(() => {})` in `beforeAll` (`R-SUPPRESSCONSOLE`). **Do NOT flag this as error-swallowing** — it is the OPPOSITE of the production `R-ERRSWALLOW` rule; in a test it is the prescribed pattern, not a smell.
+- IF branchy logic (parsers, `prepareStep`, reconciliation, `getOrCreate`) lacks unit tests THEN request tests; suggest extracting a pure function.
+- IF a test uses `.spec`/`vitest` or `as any`/`as unknown as` THEN flag (`R-TEST`/`R-NOANY`).
+
+### Correctness
+- IF you find a genuine bug (nullable-sync, error-before-first-assistant-message, off-by-one branch) THEN label it explicitly "real bug" and distinguish from nits.
+- IF logic depends on the LLM emitting exact wording THEN flag brittleness (non-English/cross-provider); prefer a structural fix.
+
+### Security / privacy / data / perf
+- IF a backend endpoint is added without auth/session THEN ask whether it needs a session; prefer allowlist over denylist.
+- IF rate-limiting is per-IP on an authed route THEN suggest per-user.
+- IF code embeds PII, an owned domain, a real IP, or seeded UUIDs THEN flag (use `example.com`, `1.1.1.1`, regenerate UUIDs).
+- IF model/external output is interpolated into HTML/widget attrs THEN require escaping; tighten over-broad `sandbox`.
+- IF a delete touches user data THEN require soft-delete unless it is explicit account/device removal.
+- IF a new synced PowerSync column is non-nullable THEN flag sync-breaking; confirm the two-PR deploy.
+- IF a single-row setter/getter is called in a loop THEN suggest a plural batch variant.
+- IF a `useQuery` result is assumed complete THEN anticipate scale; flag missing pagination.
+- IF config is hardcoded but the app self-hosts/enterprise/standalone THEN ask whether it should be env-driven.
+
+### Docs-intent / completeness / scope
+- IF a magic constant/flag/reference appears with no explanation THEN ask what it is / require a justifying comment (the "what is this?" catch).
+- IF a change handles one case but sibling cases exist THEN ask "should the others be handled too?" (completeness).
+- IF a path is inconsistent with a sibling path THEN flag the inconsistency.
+- IF a PR mixes unrelated changes THEN recommend splitting; call out incidental lockfile/version churn ("intentional?").
+- IF a real-but-out-of-scope concern surfaces THEN defer with a follow-up (Linear) rather than blocking.
+
+## 3. Deep correctness checklist (run line-by-line on every changed hunk)
+
+Correctness is the largest bucket of real findings and the easiest to skim past. For each changed hunk, trace the data flow and check **every** item:
+
+- **Wrong/duplicated variable in a payload or branch** — e.g. an event sends `new_position` and `old_position` both from `updates[0].order` (delta always 0); a copy-pasted object reuses the wrong field.
+- **Defaulting operator** — `a || b` where `a` can be a *valid* `0`/`''`/`false` → should be `a ?? b`.
+- **Unsafe assertion** — `x!` or `x as T` on a value that can be `undefined`/`null` here.
+- **Unguarded index** — `arr[0]` / `lookup(...)[0]` without checking the array is non-empty (`dns.lookup`, `.filter()[0]`, query `.rows[0]`).
+- **Redundant optional chaining** — `?.` on something always defined (`.all()` always returns an array; a `= []`-defaulted destructure) → dead/ misleading code.
+- **Exact-equality where a range/set is meant** — `=== '127.0.0.1'` misses `127.0.0.0/8`; a single host check that should be CIDR/range; an enum compared to one literal where several apply.
+- **Dead / unreachable branch & dead defensive guard** — `if (value)` / `if (x)` where `x` cannot be falsy at that point; a branch whose condition is impossible (`isAnonymous && !isAuthenticated`).
+- **Missing throwing `default`** — `switch`/if-chain over an enum or db-type with no `else`/`default` that throws → silent failure on a future variant.
+- **Missing-guard on external/IO result** — DNS, fetch, db row, JSON parse used without handling the empty/error case.
+- **Body/size/pagination limits** — a proxy/handler that reads a request body or `useQuery` result with no size cap / pagination → unbounded.
+- **Return-contract change** — a function that changed from `Promise<T|null>` to a raw query builder (or dropped an `await`) → callers now mishandle it.
+- **Non-deterministic ordering** — `ORDER BY` / `.sort()` on a non-unique key with no tiebreak.
+- **Recursion / N+1** — recursive id/descendant collection that can blow the stack or fire N queries → prefer iterative BFS / one query.
+- **Async hygiene** — `.then/.catch` where the codebase uses async/await; fire-and-forget async whose rejection is silently dropped (await it, or it's an error-swallow).
+- **Brittle string/keyword matching** — logic that depends on the LLM/user emitting exact wording (breaks across languages/providers); an over-broad regex (a generic self-closing-tag matcher that also eats real HTML).
+- **Hardcoded-true / redundant property** — a field set to a constant (`has_coordinates: true`) that's always that value → redundant; flag it.
+- **Token/length confusion** — a `token_count` computed from `.length` (characters), not a tokenizer → wrong; rename or remove.
+
+## 4. Docs-intent, naming & micro-quality (high-frequency, easy to skim past)
+
+- **Undocumented marker / magic reference** — an internal tracking code or magic string leaked into a comment or constant with no explanation (`external-6`, `external-14`, `M3`, `M4`, a bare flag). Ask "what is this?" — one finding per distinct marker.
+- **Typos in user-facing or doc strings** — e.g. "privacy police" → "privacy policy".
+- **Docs placement / completeness** — telemetry/feature docs buried in README that belong in a dedicated file; a "how to add X" checklist that omits a now-required step (update the doc, add the telemetry); a list duplicated in a doc and source.
+- **Naming** — boolean not prefixed `is/should/has`; a vague name (`variables` → `updated`); a misleading word (`bypassed` where "disabled" is meant); a name implying the wrong flow (`confirmation` email vs `joined-waitlist`).
+- **Unnecessary wrapper / indirection** — a one-line function wrapping `import.meta.env.X` → reference it directly; a `deps` object bundling args that read cleaner flat.
+- **Safety defaults** — a permissive flag (`AUTH_ALLOW_ANONYMOUS`) that should default to `false` in code, not just in `.env.example`.
+- **Single-source config** — a hardcoded `from` address / client constructed per-file that should live once in a shared module and be imported.
diff --git a/.claude/skills/thunder-deep-review/references/severity-rubric.md b/.claude/skills/thunder-deep-review/references/severity-rubric.md
new file mode 100644
index 000000000..c8317862a
--- /dev/null
+++ b/.claude/skills/thunder-deep-review/references/severity-rubric.md
@@ -0,0 +1,33 @@
+# Severity Rubric, Tone & Confidence
+
+Two **independent** axes: **severity** (how much it matters) ⊥ **confidence** (how sure you are). Map them to register; never silently drop a low-confidence item — down-weight it to a question.
+
+## Severity ladder (hand the reader the tier in plain words)
+
+| Tier | When | Signature register | Block? |
+|---|---|---|---|
+| **Praise** | genuine taste signal | "Nice, very clean", "Good call" | — |
+| **Nit** | style/cosmetic | "Nitpick: …", "minor", "this might be nitpicky" | No |
+| **Non-blocking idea** | suggestion, no strong opinion | "just an idea", "thinking out loud", "safe to ignore I think?", "ok merging as-is if not trivial" | No |
+| **Convention** | house rule, terse | "These should be camelCase", "no `let` here", "use `{ }`" | Soft-firm |
+| **Real bug / concern** | genuine correctness | "This is a real bug", "this looks real" | Yes |
+| **Future-pain (architectural)** | silent maintainability cost | "this will come back to haunt us", "painful someday", "some future debugger gets burned", "this is a code smell" | Yes (even if nothing breaks today) |
+| **Hard block** | won't hold up | "I don't think this approach holds up", "we need these columns nullable" (imperative + *need*) | Yes |
+
+## Confidence → register
+- **High** → direct statement + prescribed fix.
+- **Medium** → question form ("intended?", "shouldn't we…?", "was this intentional?").
+- **Low** → "flagging, no strong feelings — drop if I'm wrong."
+
+## Emission rules
+1. Lead with a question or first-person framing; questions do real work (Socratic challenge, info-seek, polite directive, scope/consistency probe).
+2. Attach an explicit severity word so the author knows act-now vs later.
+3. For non-blockers, grant explicit permission to merge as-is.
+4. For out-of-scope concerns, defer with a follow-up rather than block.
+5. Distinguish nit from real bug explicitly — never let a cosmetic note and a correctness bug read at the same volume.
+6. Convention nits stay terse; may repeat across files (consistency over padding) but count once toward the nit cap.
+7. Reserve "haunt us / painful someday / code smell" for **architectural & maintainability** blockers — the strongest non-correctness register; don't spend it on cosmetics.
+8. Anti-defensive (let-it-throw) and anti-mocking (DI) are pressed hard even when nothing is broken — the two strongest standing positions.
+
+## High-confidence AI-authored "tells" (pounce on these)
+Setter-style reducer actions; `setTimeout`/`rAF` race "fixes"; auto-added obvious comments; `as any`; randomly-generated UUIDs/domains possibly from training data; growing hardcoded enumerations; defensive try/catch swallowing; reinventing an existing primitive.
diff --git a/.claude/skills/thunder-deep-review/references/style-exemplars.md b/.claude/skills/thunder-deep-review/references/style-exemplars.md
new file mode 100644
index 000000000..6d2576339
--- /dev/null
+++ b/.claude/skills/thunder-deep-review/references/style-exemplars.md
@@ -0,0 +1,51 @@
+# Style Exemplars (anonymized) — match this register
+
+Few-shot exemplars distilled from real held-out reviews, paraphrased and de-identified. They encode the **bar**, not any person. Format: signal → finding (in-register) → severity. Match the voice: lead with a question/first-person framing, name the tier, prescribe the fix.
+
+1. **Signal:** a new backend route proxies one provider in parallel to the existing `/chat`/`/inference` path.
+   **Finding:** "I'd think of this as another model provider on the existing chat endpoint rather than a separate backend integration — that's where we'd centralize token tracking later. Can the encrypted body ride the one canonical endpoint instead? Prefer a single chat path for all providers." → *Future-pain / architecture (block).*
+
+2. **Signal:** code references `M4` / `external-6` / a bare flag with no explanation.
+   **Finding:** "What is `M4` here? There's no rationale in the code or nearby — can we name it or add a comment so the intent survives?" → *Docs-intent (warning, question).*
+
+3. **Signal:** a hardcoded array enumerating every model ever shipped, used to filter defaults.
+   **Finding:** "Over time this list gets really long and bloats the bundle — hate to grow it. Could we derive it instead (e.g. remove any unedited system model not in the current defaults) rather than enumerating removed ones?" → *Bloat (block-ish, architectural).*
+
+4. **Signal:** `const FOO_BAR = …` / ALL_CAPS keys in a frontend `.ts`/test file.
+   **Finding:** "These should be camelCase to match the codebase — the ALL_CAPS convention only applies to wire/JSON string values, not our TS identifiers." → *Convention (R-CAMEL).*
+
+5. **Signal:** `try { ok() } catch (e) { console.warn(e) }` around a trusted call.
+   **Finding:** "Let's not swallow this — prefer letting it throw so we catch it in dev. Swallowed errors tend to come back to haunt us." → *Error-handling (R-ERRSWALLOW, block).*
+
+6. **Signal:** a component with 4+ `useState`.
+   **Finding:** "This is a lot of `useState`s — would a `useReducer` make it less bug-prone and easier to follow?" → *React (R-REDUCER, convention).*
+
+7. **Signal:** a test leans on `mock.module(...)` of an internal collaborator.
+   **Finding:** "I'd love to avoid mocking our own modules here — it's usually a sign the code wants dependency injection. Could we inject the dependency instead, or at least note why the mock is unavoidable?" → *Testability (R-NOMOCK).*
+
+7a. **Signal:** a new test file does `mock.module('@/hooks/use-settings', () => ({ useSettings: () => ({ … }) }))` (a shared module).
+    **Finding:** "This is a blocker — `mock.module()` on a shared module like `@/hooks/use-settings` leaks globally to every test file in the worker. It'll pass alone and then fail in CI when run together (`undefined is not an object`, `Export named 'X' not found`) — the #1 CI-flake here. Don't mock it better; drop the mock and use the real hook with a test DB/provider (`setupTestDatabase` + `createTestProvider`)." → *Testability (R-NOMOCKSHARED, block).*
+
+7b. **Signal:** a test mocks `@/components/ui/dialog` but lists only `Dialog`/`DialogContent`, omitting `DialogFooter`.
+    **Finding:** "If we truly must mock a shared module, it has to include EVERY export — a missing `DialogFooter` here will crash the next test file that imports it (`Export named 'DialogFooter' not found`). Better: don't mock the shared UI module at all." → *Testability (R-NOMOCKSHARED, block).*
+
+8. **Signal:** a one-off `useAnonymousSessionGuard` hook gates auth alongside the existing `AuthGate`.
+   **Finding:** "This feels like a footgun — anonymous auth is just another auth type, so I'd fold it into `AuthGate` rather than a separate guard/hook. If someone later removed the other gate they might not realize this silently creates users. Could we do it imperatively in the auth logic?" → *Architecture / footgun (block).*
+
+9. **Signal:** a branch handles `isAnonymous && !isAuthenticated`.
+   **Finding:** "How is it possible for a user to be anonymous *and* not authenticated? This branch looks unreachable — can we remove it or correct the condition?" → *Correctness / dead code (warning).*
+
+10. **Signal:** a switch adds one new model/provider case; siblings exist nearby.
+    **Finding:** "Should the other three models be added here too? Want to make sure this isn't half-applied." → *Completeness (question).*
+
+11. **Signal:** a flicker/ordering bug is patched with `setTimeout(() => …, 0)`.
+    **Finding:** "I'm skeptical of the `setTimeout` here — it papers over an ordering bug rather than fixing it. Does fixing the branching remove the need for the timer?" → *Error-handling (block).*
+
+12. **Signal:** a frontend flow calls `db.delete()` on a user-data row.
+    **Finding:** "We always soft-delete on the frontend — set `deletedAt` instead of hard-deleting, unless this is an explicit account/device removal." → *Data (R-SOFTDEL, block).*
+
+13. **Signal:** a `deps` object bundles two collaborators passed into a function/hook.
+    **Finding:** "The `deps` object feels heavier than needed — deps are just inputs; I'd keep them flat as plain args, it reads easier." → *Readability (nit/idea).*
+
+14. **Signal:** a non-nullable column is added to a synced PowerSync table.
+    **Finding:** "We need these synced columns nullable — a non-nullable add breaks sync for existing rows. Also confirm the two-PR deploy (backend + `config.yaml` rules before the frontend)." → *Hard block (R-SYNCNULL).*
diff --git a/.claude/skills/thunder-deep-review/references/testing-rules.md b/.claude/skills/thunder-deep-review/references/testing-rules.md
new file mode 100644
index 000000000..8cf1c948e
--- /dev/null
+++ b/.claude/skills/thunder-deep-review/references/testing-rules.md
@@ -0,0 +1,19 @@
+# Testing Rules — with rule ids
+
+Cite the `R-*` id when surfacing a test-file finding. Source of truth is the two testing docs: `docs/development/testing.md` (frontend/general) and `backend/docs/testing.md` (backend). Read them if in doubt.
+
+These rules apply to **test files** (`*.test.ts`, `*.test.tsx`). They are the standard a senior reviewer holds tests to — distinct from, and sometimes the **opposite of**, the production house rules (`house-rules.md`). The most important inversion is `R-SUPPRESSCONSOLE` vs `R-ERRSWALLOW` — see that rule.
+
+## The rules
+
+- **R-NOMOCKSHARED** *(blocker)* — **Never `mock.module()` a SHARED module.** Bun's `mock.module()` creates **global, persistent mocks that leak across test files** for ALL tests in the same worker — the **#1 cause of CI failures that pass individually but fail when run together**. Shared modules include `@/hooks/*`, `@/components/ui/*`, and any app module imported by other tests. Symptom errors in CI (not locally): `SyntaxError: Export named 'X' not found in module`, `TypeError: undefined is not an object`, `TypeError: X is not a function`. The fix is **not** "mock it better" — it's **don't mock shared modules at all**: use real implementations with a test DB/provider (`setupTestDatabase`/`createTestProvider` on the frontend; see `R-DITEST`). Mock ONLY truly-external things: external/auth/third-party APIs, browser APIs absent in the test env (e.g. `window.location.reload`), and React Router hooks when testing navigation. **If you absolutely must mock a shared module, you must include EVERY export** (a partial mock drops an export and breaks the next test file that imports it). This is a **blocker**, not a taste nit — a shared-module `mock.module` is a latent CI flake even if it passes today.
+
+- **R-DITEST** *(should)* — **Prefer dependency injection over mocking.** Inject collaborators instead of `mock.module`-ing them.
+  - **Frontend:** inject `httpClient`/`fetch` as a defaulted parameter. ✅ `export const checkInbox = async (params, httpClient: HttpClient = ky) => { … httpClient.get(…) }` — ❌ `mock.module('ky', () => ({ … }))`. For component tests, use real impls behind a test provider (`createTestProvider()`) and a test DB (`setupTestDatabase`/`resetTestDatabase`/`teardownTestDatabase` from `@/dal/test-utils`).
+  - **Backend:** use the PGlite in-memory DB. `createTestDb()` (from `@/test-utils/db`) returns `{ client, db, cleanup }`; the single PGlite instance is reused and migrations run once in preload (`bunfig.toml`). Inject the test DB via `createApp({ database })` and external calls via `createApp({ fetchFn })` (the `AppDeps` type). Each test runs in a transaction that you **roll back via `cleanup()` in `afterEach`** for isolation. `globalThis.fetch` is mocked in preload to **throw** if a test calls it without DI — so never module-mock the db or fetch; inject them.
+
+- **R-FAKETIMERS** *(should)* — **Fake timers are installed globally** (auto-installed before each test, uninstalled after) so tests run fast and deterministically. **Never use real waits** (`await sleep(...)`, real `setTimeout`) and **never re-install timers** (`vi.useFakeTimers()`). To advance time, use `getClock()` from `@/testing-library` inside `act()`: `await act(async () => { await getClock().tickAsync(300) })` for a specific duration, or `getClock().runAllAsync()` to settle all pending timers (also speeds up HTTP libs with retry logic like `ky`).
+
+- **R-SUPPRESSCONSOLE** *(should)* — **Suppress EXPECTED console errors in tests.** For a test that intentionally triggers an error, silence the noise with `spyOn(console, 'error').mockImplementation(() => {})` in `beforeAll`. **CRITICAL — this is the OPPOSITE of the production `R-ERRSWALLOW` rule, and must NOT be flagged as error-swallowing.** In production code an error branch with no log earns a "add `console.error`" finding (`R-ERRSWALLOW`); in a **test** that deliberately provokes an error, `spyOn(console,'error')` is the prescribed, correct pattern. The error-handling lane must never treat a test's `spyOn(console,'error')` as a swallow. (It IS still worth a note if a test spies on `console.error` but does **not** actually trigger an expected error — that hides real failures.)
+
+- **R-BUNTESTCWD** *(nit)* — **Don't run `bun test` from the repo root.** Bun's positional args are **substring filters, not paths**, so a filter like `src/` also matches `backend/src/...` and pulls in backend tests. Use the `test` script (`bun run test`), which scopes discovery via `bun test --cwd=src` and runs `shared/*.test.ts` and other out-of-tree paths by explicit path. (Tests live as `<file>.test.ts` next to source and use `bun:test`, not `vitest` — see `R-TEST`/`R-BUN`.)
diff --git a/.github/scripts/review-orchestrator.mjs b/.github/scripts/review-orchestrator.mjs
new file mode 100644
index 000000000..53f20b5e3
--- /dev/null
+++ b/.github/scripts/review-orchestrator.mjs
@@ -0,0 +1,1171 @@
+#!/usr/bin/env node
+
+/* This Source Code Form is subject to the terms of the Mozilla Public
+ * License, v. 2.0. If a copy of the MPL was not distributed with this
+ * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
+
+// =============================================================================
+// review-orchestrator.mjs
+//
+// Deterministic control loop for the advisory "thunder-deep-review" bot.
+//
+// WHY THIS EXISTS:
+//   The workflow must NOT delegate orchestration to the model's tool-calls.
+//   All GitHub I/O — polling A's bots, building the skip-list, computing the
+//   resolved-diff, and posting the INLINE PR REVIEW — happens HERE, in
+//   deterministic code. The model is invoked separately and emits STRUCTURED
+//   JSON FINDINGS ONLY. This script reads that JSON, validates it, maps each
+//   finding to a diff position, and posts it as one PR review with N inline
+//   comments anchored to file:line (Bugbot-style).
+//
+// EXECUTION MODEL (two phases, see thunder-deep-review.yml):
+//   Phase "pre":   poll A's bots  -> write skip-list + diff to files for the model step.
+//   Phase "post":  read model JSON -> dedup vs our own open threads -> resolve
+//                  our own fixed threads -> POST ONE PR REVIEW (event=COMMENT)
+//                  with the NEW findings as inline comments.
+//   Run as:  node review-orchestrator.mjs <pre|post>
+//
+// FAIL-SOFT (advisory): on ANY unrecoverable error we log and
+//   process.exit(0) WITHOUT posting. A missing review is acceptable; a broken
+//   or duplicate review is not. Next push retries.
+//
+// ZERO npm DEPENDENCIES: Node >=18 built-in global `fetch` only. No Octokit.
+//
+// ───────────────────────────────────────────────────────────────────────────
+// IDENTITY: the orchestrator posts via the default GITHUB_TOKEN, so its review
+//   comments are authored as `github-actions[bot]` — THUNDER_BOT_LOGIN resolves
+//   to that (set in thunder-deep-review.yml). No separate GitHub App is needed.
+//   The model-findings JSON path/contract is fixed via the THUNDER_*_FILE env
+//   vars set in the workflow (see below).
+// RESOLVED: EXPECTED_BOTS Cursor Bugbot app.id (1210556).
+// =============================================================================
+
+import { readFile, writeFile } from 'node:fs/promises';
+import { createHash } from 'node:crypto';
+
+// -----------------------------------------------------------------------------
+// CONFIG / CONSTANTS
+// -----------------------------------------------------------------------------
+
+/**
+ * EXPECTED_BOTS — the external "Job A" SaaS reviewers we sequence behind.
+ *
+ * Identify bots by NUMERIC `app.id`, NOT by login strings or
+ * `app.slug` — logins ("cursor[bot]") and slugs drift, and a single app can
+ * spawn multiple check-runs. The numeric id is stable.
+ *
+ * To find an app id: open a check-run that bot produced and GET its
+ * `/repos/{owner}/{repo}/check-runs/{id}`, then read `.app.id`. Or hit
+ * `/repos/{owner}/{repo}/commits/{sha}/check-runs` and read each run's `.app.id`.
+ *
+ * `loginHint` is ONLY used as a secondary, best-effort filter for review
+ * COMMENTS (the comment author is a login, not an app id — see buildSkipList).
+ *
+ * Scope (Job A): Cursor Bugbot only. GitHub Advanced Security and CodeRabbit
+ * are out of scope; add either later by registering its app.id below.
+ */
+const EXPECTED_BOTS = [
+  // Cursor Bugbot — real numeric GitHub App id (slug "cursor", name "Cursor Bugbot").
+  { name: 'cursor-bugbot', appId: 1210556, loginHint: 'cursor[bot]' },
+];
+
+// Identity of OUR OWN review comments, so we can dedup against / resolve only
+// our own threads (never touch Bugbot's or a human's). The orchestrator runs
+// under the default GITHUB_TOKEN, so the author login is github-actions[bot]
+// (THUNDER_BOT_LOGIN, see env below). We also stamp every comment body with a
+// hidden marker (SELF_COMMENT_MARKER) as the robust secondary signal.
+const SELF_COMMENT_MARKER = '<!-- thunder-deep-review-finding -->';
+
+// Severity enum the model output is validated against. Anything else is dropped.
+const SEVERITY_ENUM = ['blocking', 'convention', 'nit'];
+
+// Deterministic deep-mode gate. Diff bigger than this => deep mode.
+const DEEP_MODE_CHANGED_LINES = 600; // tunable for the team's PR sizes.
+const DEEP_MODE_FILE_COUNT = 40; // tunable for the team's PR sizes.
+// Hard GitHub cliffs: /commits/{sha}/check-runs caps at 1000 check-suites,
+// PR files list caps at 3000 files. Past the file cap we switch to bounded mode.
+const PR_FILES_TRUNCATION_CAP = 3000;
+// In bounded mode (PR exceeds the file cap) we deterministically clip the diff
+// handed to the model to the first N per-file sections so "honor bounded mode"
+// is a REAL scope limit, not just an unenforced prompt hint.
+const BOUNDED_MODE_FILE_LIMIT = 200;
+
+// Per-comment body length cap so a single inline comment never balloons.
+const MAX_COMMENT_BODY = 60_000; // GitHub hard limit is 65536; leave headroom.
+// Cap on inline comments per review so we never post a wall of nits.
+const MAX_INLINE_COMMENTS = 50; // overflow rolls into the review summary body.
+
+// Polling timing: jittered exponential backoff, honor Retry-After.
+const POLL_BASE_MS = 15_000; // start at ~15s
+const POLL_CAP_MS = 60_000; // cap at ~60s
+const DISCOVERY_TIMEOUT_MS = 60_000; // ~60s: "never appeared" => treat bot as disabled
+const COMPLETION_TIMEOUT_MS = 5 * 60_000; // ~5min: "exists but not completed yet"
+
+// File handoff between the two phases / the model step.
+// Paths come from the THUNDER_*_FILE env vars set in thunder-deep-review.yml.
+const SKIPLIST_FILE = process.env.THUNDER_SKIPLIST_FILE ?? '/tmp/thunder-skiplist.json';
+const DIFF_FILE = process.env.THUNDER_DIFF_FILE ?? '/tmp/thunder-diff.patch';
+const FINDINGS_FILE = process.env.THUNDER_FINDINGS_FILE ?? '/tmp/thunder-findings.json';
+const DEEPMODE_FILE = process.env.THUNDER_DEEPMODE_FILE ?? '/tmp/thunder-deepmode.json';
+
+// -----------------------------------------------------------------------------
+// ENVIRONMENT (read once, validated in main)
+// -----------------------------------------------------------------------------
+
+const env = {
+  token: process.env.GITHUB_TOKEN ?? process.env.GH_TOKEN ?? '',
+  repoFull: process.env.GITHUB_REPOSITORY ?? '', // "owner/repo"
+  prNumber: process.env.PR_NUMBER ?? '',
+  // Reconcile EVERYTHING to the PR head SHA, never the synthetic merge ref.
+  headSha: process.env.PR_HEAD_SHA ?? '',
+  // GitHub PR action event name: "reopened" suppresses thread resolution.
+  eventAction: process.env.PR_ACTION ?? '',
+  apiBase: process.env.GITHUB_API_URL ?? 'https://api.github.com',
+  graphqlBase: process.env.GITHUB_GRAPHQL_URL ?? 'https://api.github.com/graphql',
+  // Login OUR review comments are authored as. The orchestrator posts via the
+  // default GITHUB_TOKEN, so the author login is github-actions[bot]. The hidden
+  // marker (SELF_COMMENT_MARKER) is the robust secondary match.
+  selfLogin: process.env.THUNDER_BOT_LOGIN ?? 'github-actions[bot]',
+  // Allow a `review:deep` label to force deep mode in addition to the size gate.
+  hasDeepLabel: (process.env.PR_LABELS ?? '').split(',').map((s) => s.trim()).includes('review:deep'),
+};
+
+const [owner, repo] = env.repoFull.split('/');
+
+// -----------------------------------------------------------------------------
+// SMALL UTILITIES
+// -----------------------------------------------------------------------------
+
+const log = (...args) => console.log('[thunder-orchestrator]', ...args);
+const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
+const jitter = (ms) => Math.round(ms * (0.8 + Math.random() * 0.4)); // ±20%
+
+/** Fail-soft exit: log the reason and exit 0 so the advisory bot never breaks CI. */
+const failSoft = (reason, err) => {
+  log('FAIL-SOFT (exiting 0 without posting):', reason, err?.message ?? '');
+  process.exit(0);
+};
+
+/**
+ * githubFetch — single choke-point for every GitHub REST call.
+ *  - Sends auth + API version headers.
+ *  - 4xx => throw immediately (bail; retrying won't help) EXCEPT 403/429 with a
+ *    rate-limit signal, which we treat as retryable backoff.
+ *  - 5xx => signal retryable so callers can back off.
+ *  - Honors `Retry-After` and `x-ratelimit-reset` when rate-limited.
+ * Returns { ok, status, json, linkHeader, retryAfterMs }.
+ */
+const githubFetch = async (url, init = {}) => {
+  const res = await fetch(url, {
+    ...init,
+    headers: {
+      Authorization: `Bearer ${env.token}`,
+      Accept: 'application/vnd.github+json',
+      'X-GitHub-Api-Version': '2022-11-28',
+      'User-Agent': 'thunder-deep-review-orchestrator',
+      // Declare the JSON body explicitly (matches graphqlFetch). Harmless on the
+      // bodyless GETs; required so POSTs that send JSON.stringify(...) — the
+      // create-review call — are parsed as JSON rather than mis-sniffed.
+      'Content-Type': 'application/json',
+      ...(init.headers ?? {}),
+    },
+  });
+
+  const retryAfterMs = computeRetryAfterMs(res);
+  const linkHeader = res.headers.get('link') ?? '';
+
+  // Rate-limited: surface as retryable regardless of 403/429 framing. GitHub's
+  // SECONDARY rate limit returns 403 with a `retry-after` header and a
+  // remaining quota > 0 — treat that as retryable too, not a permanent 4xx.
+  const remaining = res.headers.get('x-ratelimit-remaining');
+  const isRateLimited =
+    res.status === 429 ||
+    (res.status === 403 && (remaining === '0' || res.headers.get('retry-after') !== null));
+
+  if (res.status >= 500 || isRateLimited) {
+    return { ok: false, retryable: true, status: res.status, json: null, linkHeader, retryAfterMs };
+  }
+  if (res.status >= 400) {
+    // Non-rate-limit 4xx: permanent for this request. Caller decides bail vs ignore.
+    const body = await res.text().catch(() => '');
+    return { ok: false, retryable: false, status: res.status, json: null, linkHeader, retryAfterMs, body };
+  }
+  const json = res.status === 204 ? null : await res.json().catch(() => null);
+  return { ok: true, retryable: false, status: res.status, json, linkHeader, retryAfterMs };
+};
+
+/** Parse Retry-After (seconds) or x-ratelimit-reset (epoch secs) into a ms delay. */
+const computeRetryAfterMs = (res) => {
+  const retryAfter = res.headers.get('retry-after');
+  if (retryAfter) {
+    const secs = Number(retryAfter);
+    if (Number.isFinite(secs)) return secs * 1000;
+  }
+  const reset = res.headers.get('x-ratelimit-reset');
+  const remaining = res.headers.get('x-ratelimit-remaining');
+  if (reset && remaining === '0') {
+    const deltaMs = Number(reset) * 1000 - Date.now();
+    if (Number.isFinite(deltaMs) && deltaMs > 0) return Math.min(deltaMs, POLL_CAP_MS);
+  }
+  return 0;
+};
+
+/** Follow RFC-5988 `Link: <url>; rel="next"` to paginate. */
+const nextLink = (linkHeader) => {
+  // e.g. <https://api...?page=2>; rel="next", <https://api...?page=9>; rel="last"
+  const match = linkHeader.split(',').find((p) => p.includes('rel="next"'));
+  if (!match) return null;
+  const urlMatch = match.match(/<([^>]+)>/);
+  return urlMatch ? urlMatch[1] : null;
+};
+
+/**
+ * paginateAll — GET every page of a list endpoint, following Link headers.
+ * Retries 5xx / rate-limit with jittered backoff; bails on hard 4xx.
+ * `cap` guards against pathological pagination (e.g. the 3000-file cliff).
+ */
+const paginateAll = async (firstUrl, { cap = Infinity } = {}) => {
+  const out = [];
+  let url = firstUrl;
+  let truncated = false;
+  while (url) {
+    // Retry budget is PER PAGE, not shared across the whole pagination —
+    // otherwise a few flaky early pages exhaust the budget for all later ones.
+    let attempt = 0;
+    let r = await githubFetch(url);
+    while (!r.ok && r.retryable && attempt < 5) {
+      attempt += 1;
+      await sleep(Math.max(r.retryAfterMs, jitter(Math.min(POLL_BASE_MS * 2 ** attempt, POLL_CAP_MS))));
+      r = await githubFetch(url);
+    }
+    if (!r.ok) {
+      throw new Error(`GET ${url} -> ${r.status} ${r.body ?? ''}`);
+    }
+    const page = Array.isArray(r.json) ? r.json : (r.json?.check_runs ?? []);
+    out.push(...page);
+    if (out.length >= cap) {
+      truncated = true;
+      break;
+    }
+    url = nextLink(r.linkHeader);
+  }
+  return { items: out, truncated };
+};
+
+const apiUrl = (path) => `${env.apiBase}/repos/${owner}/${repo}${path}`;
+
+/**
+ * graphqlFetch — single choke-point for the GraphQL calls we need (review-thread
+ * resolution state + resolveReviewThread). Retries 5xx / rate-limit; returns
+ * `{ ok, data, errors }`. Mutations that fail are fail-soft at the caller.
+ */
+const graphqlFetch = async (query, variables) => {
+  let attempt = 0;
+  for (;;) {
+    const res = await fetch(env.graphqlBase, {
+      method: 'POST',
+      headers: {
+        Authorization: `Bearer ${env.token}`,
+        'Content-Type': 'application/json',
+        'User-Agent': 'thunder-deep-review-orchestrator',
+      },
+      body: JSON.stringify({ query, variables }),
+    });
+    // Retry 5xx, 429, and GitHub's secondary rate limit (403 + retry-after).
+    const secondaryLimited = res.status === 403 && res.headers.get('retry-after') !== null;
+    if (res.status >= 500 || res.status === 429 || secondaryLimited) {
+      if (attempt < 5) {
+        attempt += 1;
+        await sleep(Math.max(computeRetryAfterMs(res), jitter(Math.min(POLL_BASE_MS * 2 ** attempt, POLL_CAP_MS))));
+        continue;
+      }
+      return { ok: false, data: null, errors: [{ message: `graphql ${res.status}` }] };
+    }
+    const body = await res.json().catch(() => null);
+    if (!res.ok || !body) {
+      return { ok: false, data: null, errors: body?.errors ?? [{ message: `graphql ${res.status}` }] };
+    }
+    return { ok: !body.errors?.length, data: body.data ?? null, errors: body.errors ?? null };
+  }
+};
+
+// =============================================================================
+// POLLER
+// =============================================================================
+//
+// Enumerate the head-SHA check-runs ONCE, find the EXPECTED_BOTS runs by
+// numeric app.id, then poll each existing run BY ID until status===completed
+// (accept ANY conclusion). A short discovery sub-timeout treats "never
+// appeared" as "bot disabled on this PR" (empty contribution) vs the full
+// completion timeout for "run exists but hasn't finished yet".
+//
+// UNCHANGED by the inline-review redesign — only the comment SINK changed.
+// =============================================================================
+
+/** One enumeration of all check-runs for the head SHA (paginated). */
+const listCheckRuns = async () => {
+  // `filter=all` so we don't only see the latest run per app.
+  const url = apiUrl(`/commits/${env.headSha}/check-runs?per_page=100&filter=all`);
+  const { items } = await paginateAll(url);
+  return items;
+};
+
+/**
+ * pollForBots — block until all DISCOVERABLE expected bots reach completed,
+ * or the relevant timeouts elapse. Returns the set of bot app.ids that
+ * actually completed (used only for observability; the skip-list is built
+ * from comments + outputs separately).
+ */
+const pollForBots = async () => {
+  const start = Date.now();
+  const expectedIds = new Set(EXPECTED_BOTS.map((b) => b.appId).filter((id) => id > 0));
+  if (expectedIds.size === 0) {
+    log('no real EXPECTED_BOTS app.ids configured — skipping poll, skip-list will be best-effort.');
+    return;
+  }
+
+  let backoff = POLL_BASE_MS;
+  // Track, per bot id: have we ever SEEN a run (discovery), and is it completed?
+  const seen = new Set();
+  const completed = new Set();
+
+  for (;;) {
+    let runs;
+    try {
+      runs = await listCheckRuns();
+    } catch (err) {
+      // Transient list failure: back off and retry within the overall window.
+      if (Date.now() - start > COMPLETION_TIMEOUT_MS) {
+        log('poll: completion timeout during list error — proceeding.', err.message);
+        return;
+      }
+      await sleep(jitter(backoff));
+      backoff = Math.min(backoff * 2, POLL_CAP_MS);
+      continue;
+    }
+    // Successful list: reset the backoff so a few transient errors don't leave
+    // us polling at the 60s cap for the rest of the discovery window.
+    backoff = POLL_BASE_MS;
+
+    for (const run of runs) {
+      const appId = run.app?.id;
+      if (!expectedIds.has(appId)) continue;
+      seen.add(appId);
+      if (run.status === 'completed') completed.add(appId); // ANY conclusion is terminal.
+    }
+
+    const elapsed = Date.now() - start;
+
+    // Discovery sub-timeout: any expected bot we've NEVER seen after ~60s is
+    // treated as disabled and dropped from the wait-set.
+    const stillWaiting = [...expectedIds].filter((id) => {
+      if (completed.has(id)) return false; // done
+      if (!seen.has(id) && elapsed > DISCOVERY_TIMEOUT_MS) return false; // never appeared => disabled
+      return true; // either seen-but-not-complete, or not-yet-seen within discovery window
+    });
+
+    if (stillWaiting.length === 0) {
+      log(`poll: all discoverable bots terminal (completed=${[...completed].join(',') || 'none'}).`);
+      return;
+    }
+    if (elapsed > COMPLETION_TIMEOUT_MS) {
+      log(`poll: completion timeout (${Math.round(elapsed / 1000)}s) — proceeding with whatever posted.`);
+      return;
+    }
+
+    await sleep(jitter(backoff));
+    backoff = Math.min(backoff * 2, POLL_CAP_MS);
+  }
+};
+
+// =============================================================================
+// SKIP-LIST
+// =============================================================================
+//
+// Build the "already reported by A" skip-list so the model surfaces only what
+// A missed. Sources, in priority order:
+//   1. PR review (inline) comments authored by the bots — AUTHORITATIVE.
+//   2. The bots' check-run `output.text/summary` — BEST-EFFORT (free text).
+// Dedup key is normalized on file + diff_hunk + side/original_line, NOT raw
+// line numbers (lines drift on force-push).
+//
+// UNCHANGED by the inline-review redesign.
+// =============================================================================
+
+/** Stable dedup key for an inline comment. */
+const dedupKeyFromReviewComment = (c) => {
+  const parts = [
+    c.path ?? '',
+    // diff_hunk anchors the comment to surrounding code, resilient to line drift.
+    (c.diff_hunk ?? '').trim().slice(0, 200),
+    c.side ?? c.original_side ?? '',
+    String(c.original_line ?? c.line ?? ''),
+  ];
+  return normalizeKey(parts.join(' '));
+};
+
+const normalizeKey = (s) => createHash('sha1').update(s).digest('hex').slice(0, 16);
+
+/**
+ * Is this review comment from one of our expected Job-A bots (Cursor Bugbot
+ * et al.)? The skip-list MUST stay scoped to those external reviewers: pulling
+ * in unrelated bots — especially THIS workflow's own `github-actions[bot]`
+ * comments — would nudge the model to suppress its own prior findings, which
+ * then trips the "missing hash ⇒ resolve thread" convergence on the next run.
+ * Match an expected bot by its (drift-prone, best-effort) loginHint, and NEVER
+ * count our own comments (selfLogin or the self marker).
+ */
+const isBotComment = (c) => {
+  const login = c.user?.login ?? '';
+  if (login === env.selfLogin || (c.body ?? '').includes(SELF_COMMENT_MARKER)) return false;
+  return EXPECTED_BOTS.some((b) => b.loginHint && b.loginHint === login);
+};
+
+const buildSkipList = async () => {
+  const entries = [];
+
+  // 1) Inline review comments (authoritative). Paginate.
+  try {
+    const url = apiUrl(`/pulls/${env.prNumber}/comments?per_page=100`);
+    const { items } = await paginateAll(url);
+    for (const c of items) {
+      if (!isBotComment(c)) continue;
+      entries.push({
+        key: dedupKeyFromReviewComment(c),
+        source: 'inline',
+        path: c.path,
+        hunk: (c.diff_hunk ?? '').trim().slice(0, 200),
+        summary: (c.body ?? '').slice(0, 280),
+      });
+    }
+  } catch (err) {
+    // Best-effort: a skip-list failure must not abort the whole review.
+    log('skip-list: inline-comment fetch failed (continuing best-effort):', err.message);
+  }
+
+  // 2) Check-run outputs (best-effort, free text). We can't anchor these to a
+  //    hunk, so we only record a coarse summary the model can defer to.
+  try {
+    const runs = await listCheckRuns();
+    const expectedIds = new Set(EXPECTED_BOTS.map((b) => b.appId).filter((id) => id > 0));
+    for (const run of runs) {
+      if (!expectedIds.has(run.app?.id)) continue;
+      const text = `${run.output?.title ?? ''}\n${run.output?.summary ?? ''}`.trim();
+      if (!text) continue;
+      entries.push({
+        key: normalizeKey(`checkrun ${run.id}`),
+        source: 'checkrun-summary',
+        path: null,
+        hunk: null,
+        summary: text.slice(0, 600),
+      });
+    }
+  } catch (err) {
+    log('skip-list: check-run output fetch failed (continuing best-effort):', err.message);
+  }
+
+  // De-dup the skip-list itself by key.
+  const byKey = new Map(entries.map((e) => [e.key, e]));
+  return [...byKey.values()];
+};
+
+// =============================================================================
+// DEEP-MODE GATE
+// =============================================================================
+//
+// Compute diff size DETERMINISTICALLY in code; do NOT let the model decide
+// when to spend on deep mode. Detect the 3000-file truncation cliff and fall
+// back to a bounded mode flag the model honors.
+//
+// UNCHANGED by the inline-review redesign.
+// =============================================================================
+
+const computeDeepMode = async () => {
+  let fileCount = 0;
+  let changedLines = 0;
+  let truncated = false;
+  try {
+    const url = apiUrl(`/pulls/${env.prNumber}/files?per_page=100`);
+    const { items, truncated: t } = await paginateAll(url, { cap: PR_FILES_TRUNCATION_CAP });
+    truncated = t;
+    fileCount = items.length;
+    for (const f of items) changedLines += (f.additions ?? 0) + (f.deletions ?? 0);
+  } catch (err) {
+    log('deep-mode: file list failed, defaulting to non-deep bounded:', err.message);
+  }
+
+  const sizeTriggered = changedLines >= DEEP_MODE_CHANGED_LINES || fileCount >= DEEP_MODE_FILE_COUNT;
+  return {
+    deepMode: env.hasDeepLabel || sizeTriggered,
+    boundedMode: truncated, // past the file cap, review a bounded subset only.
+    fileCount,
+    changedLines,
+    truncated,
+    reason: env.hasDeepLabel ? 'review:deep label' : sizeTriggered ? 'size gate' : 'default single fan-out',
+  };
+};
+
+// =============================================================================
+// OUTPUT CONTRACT
+// =============================================================================
+//
+// The model writes STRUCTURED JSON findings to FINDINGS_FILE. We NEVER trust
+// raw model markdown. Validate severities against the enum, drop anything
+// malformed, then render each finding into an inline comment body here.
+//
+// Expected JSON shape (the action step's prompt must enforce this):
+//   { "findings": [ { "severity": "blocking"|"convention"|"nit",
+//                     "file": "src/x.ts", "line": 42, "side": "RIGHT"|"LEFT",
+//                     "title": "…", "body": "…", "rule": "optional-invariant-id" } ] }
+// `side` is optional and defaults to RIGHT (the added/changed side). A finding
+// whose line is not in the diff falls back to a file-level / summary comment.
+// =============================================================================
+
+/**
+ * Stable per-finding hash, used to match a finding to an EXISTING open thread
+ * across pushes (convergence). Anchored on file + normalized diff context + rule
+ * + title — NOT raw line numbers (they drift on force-push). The `title` is what
+ * separates two DISTINCT findings that share the same file, hunk, rule, and
+ * severity; without it they collapse to one hash, so the second finding gets
+ * dropped as "already open" and its feedback silently disappears. The model does
+ * not emit a hunk, so we derive a coarse code anchor from the diff (see
+ * hunkAnchorFor).
+ */
+const findingHash = (f) =>
+  normalizeKey(
+    [f.file ?? '', f.rule ?? '', (f.anchor ?? '').trim().slice(0, 200), f.severity, (f.title ?? '').trim()].join(' '),
+  );
+
+const readModelFindings = async () => {
+  const raw = await readFile(FINDINGS_FILE, 'utf8');
+  const parsed = JSON.parse(raw); // throws on malformed => fail-soft upstream.
+  const list = Array.isArray(parsed?.findings) ? parsed.findings : [];
+  // Validate + normalize. Drop anything with an out-of-enum severity.
+  const valid = [];
+  for (const f of list) {
+    const severity = String(f.severity ?? '').toLowerCase();
+    if (!SEVERITY_ENUM.includes(severity)) {
+      log(`dropping finding with invalid severity: ${JSON.stringify(f.severity)}`);
+      continue;
+    }
+    const side = String(f.side ?? 'RIGHT').toUpperCase() === 'LEFT' ? 'LEFT' : 'RIGHT';
+    valid.push({
+      severity,
+      file: typeof f.file === 'string' ? f.file : null,
+      // Valid diff lines are 1-based; reject <= 0 so a bad value falls back to
+      // a file-level / summary placement rather than a never-matching line.
+      line: Number.isFinite(f.line) && f.line >= 1 ? f.line : null,
+      side,
+      title: String(f.title ?? '').slice(0, 300),
+      body: String(f.body ?? '').slice(0, 4000),
+      rule: f.rule ? String(f.rule).slice(0, 80) : null,
+    });
+  }
+  return valid;
+};
+
+// =============================================================================
+// DIFF POSITION MAPPING
+// =============================================================================
+//
+// The "create a review" REST API anchors inline comments with { path, line,
+// side } (the line in the file at the head SHA). A comment can only attach to a
+// line that is PART OF THE DIFF. We parse the unified diff (already produced
+// deterministically into DIFF_FILE) to learn, per file, which (side, line)
+// positions are commentable, and to capture a small code anchor per hunk so a
+// finding can be matched across pushes even as line numbers drift.
+// A finding whose line is NOT in the diff falls back to a file-level comment
+// (subject_type=file) or, if even the file isn't in the diff, the review body
+// summary — it NEVER crashes the post.
+// =============================================================================
+
+/**
+ * parseUnifiedDiff — minimal unified-diff parser.
+ * Returns Map<path, { rightLines:Set<number>, leftLines:Set<number>,
+ *                     hunks: [{ rightStart, leftStart, text }] }>.
+ * We only need membership ("is this line commentable?") and a per-hunk text
+ * anchor; we do NOT need GitHub's legacy `position` integer (line+side is the
+ * modern, drift-resilient anchor).
+ */
+const parseUnifiedDiff = (patch) => {
+  const files = new Map();
+  let current = null;
+  let rightLine = 0;
+  let leftLine = 0;
+  let hunk = null;
+
+  // Split on CRLF or LF — a trailing \r would otherwise get baked into the
+  // hunk anchor text and destabilize the finding hash across pushes.
+  const lines = patch.split(/\r?\n/);
+  for (const raw of lines) {
+    if (raw.startsWith('diff --git ')) {
+      current = null;
+      hunk = null;
+      continue;
+    }
+    if (raw.startsWith('+++ ')) {
+      // "+++ b/path" — the head-side path. "/dev/null" => deletion (no RIGHT side).
+      const p = raw.slice(4).replace(/^b\//, '').trim();
+      if (p === '/dev/null') {
+        current = null;
+        continue;
+      }
+      current = { rightLines: new Set(), leftLines: new Set(), hunks: [] };
+      files.set(p, current);
+      continue;
+    }
+    if (raw.startsWith('@@')) {
+      // @@ -l,s +l,s @@ optional section heading
+      const m = raw.match(/@@ -(\d+)(?:,\d+)? \+(\d+)(?:,\d+)? @@/);
+      if (!m || !current) {
+        hunk = null;
+        continue;
+      }
+      leftLine = Number(m[1]);
+      rightLine = Number(m[2]);
+      hunk = { rightStart: rightLine, leftStart: leftLine, text: raw };
+      current.hunks.push(hunk);
+      continue;
+    }
+    if (!current) continue;
+    const tag = raw[0];
+    if (tag === '+') {
+      current.rightLines.add(rightLine);
+      if (hunk) hunk.text += `\n${raw}`;
+      rightLine += 1;
+    } else if (tag === '-') {
+      current.leftLines.add(leftLine);
+      if (hunk) hunk.text += `\n${raw}`;
+      leftLine += 1;
+    } else if (tag === ' ') {
+      // Context line (leading space): unchanged, but it IS part of the hunk and
+      // GitHub accepts review comments on it ("unchanged lines shown for
+      // context"). Record it on BOTH sides so a finding on a context line still
+      // anchors inline instead of being needlessly demoted to a file comment.
+      // ONLY a leading-space line counts — an empty token (the trailing split
+      // artifact, or stray text outside a hunk) must NOT advance the counters or
+      // be marked commentable, or it would falsely anchor a non-diff line.
+      current.rightLines.add(rightLine);
+      current.leftLines.add(leftLine);
+      if (hunk) hunk.text += `\n${raw}`;
+      rightLine += 1;
+      leftLine += 1;
+    }
+    // tag === '\\' ("\ No newline at end of file") and empty/other tokens are
+    // not content lines — skip without touching the line counters.
+  }
+  return files;
+};
+
+/** Load + parse the pre-computed diff. Best-effort: missing diff => empty map. */
+const loadDiffIndex = async () => {
+  try {
+    const patch = await readFile(DIFF_FILE, 'utf8');
+    return parseUnifiedDiff(patch);
+  } catch (err) {
+    log('diff: could not read/parse DIFF_FILE (all findings fall back to summary):', err.message);
+    return new Map();
+  }
+};
+
+/**
+ * Deterministically clip a unified diff to its first `limit` per-file sections.
+ * Each file section starts at a `diff --git ` line, so cutting on that boundary
+ * always yields a still-valid patch (no half-file/half-hunk). Returns the
+ * original patch unchanged when it has `limit` files or fewer.
+ */
+const clipDiffToFileLimit = (patch, limit) => {
+  const lines = patch.split('\n');
+  let fileCount = 0;
+  const kept = [];
+  for (const line of lines) {
+    if (line.startsWith('diff --git ')) {
+      fileCount += 1;
+      if (fileCount > limit) break;
+    }
+    kept.push(line);
+  }
+  return { clipped: fileCount > limit, fileCount: Math.min(fileCount, limit), text: kept.join('\n') };
+};
+
+/**
+ * In bounded mode, rewrite DIFF_FILE in place to its first
+ * BOUNDED_MODE_FILE_LIMIT file sections so the model gets a REAL, deterministic
+ * scope — not just a prompt asking it to self-limit. No-op when not bounded or
+ * the diff already fits. Best-effort: a read/write failure leaves the full diff.
+ */
+const enforceBoundedDiff = async (boundedMode) => {
+  if (!boundedMode) return;
+  try {
+    const patch = await readFile(DIFF_FILE, 'utf8');
+    const { clipped, fileCount, text } = clipDiffToFileLimit(patch, BOUNDED_MODE_FILE_LIMIT);
+    if (!clipped) return;
+    await writeFile(DIFF_FILE, text);
+    log(`pre: bounded mode — clipped diff to first ${fileCount} files (${BOUNDED_MODE_FILE_LIMIT} cap).`);
+  } catch (err) {
+    log('pre: bounded-diff clip failed (continuing with full diff):', err.message);
+  }
+};
+
+/**
+ * Find the hunk in `fileEntry` that contains (side, line). Used to derive a
+ * stable code anchor for the finding hash so convergence survives line drift.
+ */
+const stripHunkHeader = (text) => text.replace(/^@@.*@@.*/, '').trim().slice(0, 200);
+
+const hunkAnchorFor = (fileEntry, line, side) => {
+  if (!fileEntry) return '';
+  // Pick the LAST hunk whose start is at/before the target line — that's the
+  // one actually containing it. Picking the first (any start ≤ line) would
+  // anchor a late-file finding to an early hunk and weaken cross-push
+  // convergence. Coarse (first ~200 chars) on purpose — stability over exactness.
+  let containing = null;
+  for (const h of fileEntry.hunks) {
+    const start = side === 'LEFT' ? h.leftStart : h.rightStart;
+    if (line >= start) containing = h;
+  }
+  if (containing) {
+    const anchor = stripHunkHeader(containing.text);
+    if (anchor) return anchor;
+  }
+  return stripHunkHeader(fileEntry.hunks[0]?.text ?? '');
+};
+
+/**
+ * Classify each finding against the diff index:
+ *  - `inline`  : (side, line) is commentable → real anchored inline comment.
+ *  - `file`    : file is in the diff but the line isn't → file-level comment.
+ *  - `summary` : file not in the diff at all → rolled into the review body.
+ * Also attaches a stable `anchor` (+ `hash`) for cross-push convergence.
+ */
+const classifyFindings = (findings, diffIndex) =>
+  findings.map((f) => {
+    const fileEntry = f.file ? diffIndex.get(f.file) : null;
+    const lineSet = f.side === 'LEFT' ? fileEntry?.leftLines : fileEntry?.rightLines;
+    const anchor = hunkAnchorFor(fileEntry, f.line ?? 0, f.side);
+    const withAnchor = { ...f, anchor };
+    const hash = findingHash(withAnchor);
+
+    if (f.file && fileEntry && f.line != null && lineSet?.has(f.line)) {
+      return { ...withAnchor, hash, placement: 'inline' };
+    }
+    if (f.file && fileEntry) {
+      return { ...withAnchor, hash, placement: 'file' };
+    }
+    return { ...withAnchor, hash, placement: 'summary' };
+  });
+
+// =============================================================================
+// OWN-THREAD STATE (dedup + resolution targets) — via GraphQL
+// =============================================================================
+//
+// To converge across pushes we need, for OUR OWN review comments only:
+//   - the stable finding-hash we stamped (read from a hidden HTML comment in
+//     each comment body), and
+//   - the review-thread node id + isResolved flag (REST review comments don't
+//     expose thread/resolution; GraphQL `reviewThreads` does).
+// We then: (a) skip posting a finding whose hash already has an OPEN own-thread
+// (dedup), and (b) `resolveReviewThread` our own OPEN threads whose finding is
+// gone from the current set (fixed/changed code).
+// =============================================================================
+
+/** Pull the stamped finding-hash out of one of our comment bodies, if present. */
+const hashFromOwnBody = (body) => {
+  const m = (body ?? '').match(/<!-- thunder-finding-hash:([0-9a-f]{8,}) -->/);
+  return m ? m[1] : null;
+};
+
+/** Pull EVERY stamped finding-hash out of a body (a review body lists many). */
+const hashesFromBody = (body) => [...(body ?? '').matchAll(/<!-- thunder-finding-hash:([0-9a-f]{8,}) -->/g)].map((m) => m[1]);
+
+/**
+ * Is this login OURS? GraphQL returns a Bot login WITHOUT the "[bot]" suffix
+ * ("github-actions"), while the REST-derived selfLogin carries it
+ * ("github-actions[bot]") — normalize both before comparing.
+ */
+const isSelfLogin = (login) => login.replace(/\[bot\]$/, '') === env.selfLogin.replace(/\[bot\]$/, '');
+
+/**
+ * Did WE author this review/comment? Login match, with the hidden
+ * SELF_COMMENT_MARKER as the canonical fallback.
+ */
+const isOwnAuthor = (login, body) => isSelfLogin(login) || (body ?? '').includes(SELF_COMMENT_MARKER);
+
+/**
+ * Fetch the finding-hashes we already SUMMARIZED in our own prior review bodies.
+ * Summary/overflow findings never become review THREADS (they're body bullets,
+ * not inline comments), so reviewThreads can't dedup them — without this each
+ * push reposts the same "Additional notes". We stamp every summary bullet with a
+ * hidden hash (see buildReviewPayload) and read them back here. Best-effort: a
+ * GraphQL failure just means we may repost a summary line, never a crash.
+ */
+const fetchOwnSummarizedHashes = async () => {
+  const query = `
+    query($owner:String!, $repo:String!, $pr:Int!, $cursor:String) {
+      repository(owner:$owner, name:$repo) {
+        pullRequest(number:$pr) {
+          reviews(first:100, after:$cursor) {
+            pageInfo { hasNextPage endCursor }
+            nodes { body author { login } }
+          }
+        }
+      }
+    }`;
+
+  const out = new Set();
+  let cursor = null;
+  for (let page = 0; page < 10; page += 1) {
+    // eslint-disable-next-line no-await-in-loop -- sequential cursor pagination
+    const r = await graphqlFetch(query, { owner, repo, pr: Number(env.prNumber), cursor });
+    if (!r.ok || !r.data) {
+      log('own-reviews: GraphQL fetch failed (summary dedup best-effort):', r.errors?.[0]?.message ?? '');
+      return out;
+    }
+    const conn = r.data.repository?.pullRequest?.reviews;
+    for (const node of conn?.nodes ?? []) {
+      if (!isOwnAuthor(node.author?.login ?? '', node.body)) continue;
+      for (const h of hashesFromBody(node.body)) out.add(h);
+    }
+    if (!conn?.pageInfo?.hasNextPage) break;
+    cursor = conn.pageInfo.endCursor;
+  }
+  return out;
+};
+
+/**
+ * Fetch OUR OWN review threads on this PR via GraphQL, paginated.
+ * Returns [{ threadId, isResolved, resolvedByLogin, hash }] where hash is the
+ * stamped finding hash on the FIRST (root) comment of a thread we authored.
+ * `resolvedByLogin` is the login that resolved the thread (null if open), used
+ * to tell a human resolution apart from the bot's own auto-resolution.
+ */
+const fetchOwnThreads = async () => {
+  const query = `
+    query($owner:String!, $repo:String!, $pr:Int!, $cursor:String) {
+      repository(owner:$owner, name:$repo) {
+        pullRequest(number:$pr) {
+          reviewThreads(first:100, after:$cursor) {
+            pageInfo { hasNextPage endCursor }
+            nodes {
+              id
+              isResolved
+              isOutdated
+              resolvedBy { login }
+              comments(first:1) {
+                nodes { body author { login } }
+              }
+            }
+          }
+        }
+      }
+    }`;
+
+  const out = [];
+  let cursor = null;
+  // Hard page cap (~1000 threads) so a stuck/looping endCursor can't spin forever.
+  for (let page = 0; page < 10; page += 1) {
+    const r = await graphqlFetch(query, {
+      owner,
+      repo,
+      pr: Number(env.prNumber),
+      cursor,
+    });
+    if (!r.ok || !r.data) {
+      log('own-threads: GraphQL fetch failed (treating as no prior threads):', r.errors?.[0]?.message ?? '');
+      return out;
+    }
+    const conn = r.data.repository?.pullRequest?.reviewThreads;
+    for (const node of conn?.nodes ?? []) {
+      const root = node.comments?.nodes?.[0];
+      if (!root) continue;
+      if (!isOwnAuthor(root.author?.login ?? '', root.body)) continue;
+      out.push({
+        threadId: node.id,
+        isResolved: Boolean(node.isResolved),
+        resolvedByLogin: node.resolvedBy?.login ?? null,
+        hash: hashFromOwnBody(root.body),
+      });
+    }
+    if (!conn?.pageInfo?.hasNextPage) break;
+    cursor = conn.pageInfo.endCursor;
+  }
+  return out;
+};
+
+/** Mark one of OUR threads resolved. Fail-soft per call. */
+const resolveThread = async (threadId) => {
+  const mutation = `
+    mutation($threadId:ID!) {
+      resolveReviewThread(input:{threadId:$threadId}) {
+        thread { id isResolved }
+      }
+    }`;
+  const r = await graphqlFetch(mutation, { threadId });
+  if (!r.ok) {
+    log('resolve-thread: failed (continuing):', r.errors?.[0]?.message ?? threadId);
+    return false;
+  }
+  return true;
+};
+
+// =============================================================================
+// INLINE REVIEW RENDERING + POST
+// =============================================================================
+//
+// Build ONE PR review: event=COMMENT (advisory — never approve/request-changes),
+// commit_id=head SHA, comments=[{path, line, side, body}, ...]. Findings that
+// can't anchor to a diff line are file-level comments (subject_type=file) or, if
+// even the file isn't in the diff, summarized in the review body.
+// =============================================================================
+
+const SEVERITY_LABEL = { blocking: '🚫 Blocking', convention: '📐 Convention', nit: '🔧 Nit' };
+
+// Heading for body-summarized findings (summary placements + inline overflow +
+// the 422 fallback). Single source so the skip-post check and the fallback's
+// dedup-guard match the heading we actually emit.
+const SUMMARY_HEADING = `### Additional notes (couldn't anchor to a diff line)`;
+
+/** Render one finding into an inline-comment markdown body (with hidden hash). */
+const renderCommentBody = (f) => {
+  // Rule/invariant ids (f.rule) are INTERNAL grounding only — used for the
+  // dedup hash and the model's self-validation. They're meaningless to a human
+  // in a PR comment, so they are never rendered into the comment body.
+  const head = `**${SEVERITY_LABEL[f.severity]} — ${f.title}**`;
+  const stamp = `\n\n${SELF_COMMENT_MARKER}<!-- thunder-finding-hash:${f.hash} -->`;
+  const body = `${head}\n\n${f.body}${stamp}`;
+  // Unreachable while readModelFindings caps title@300 + body@4000 (well under
+  // MAX_COMMENT_BODY); kept intentionally in case those caps rise — not dead code.
+  return body.length > MAX_COMMENT_BODY ? `${body.slice(0, MAX_COMMENT_BODY - stamp.length - 16)}\n…${stamp}` : body;
+};
+
+/**
+ * Is this finding actually anchorable as an inline/file comment in the diff we
+ * parsed? This is the SAME test GitHub applies server-side when it validates
+ * `comments[]` on `POST /pulls/{n}/reviews`, so mirroring it here lets us demote
+ * anything unanchorable to the summary body instead of risking a 422 that drops
+ * the whole review.
+ */
+const isAnchorable = (f, diffIndex) => {
+  const fileEntry = f.file ? diffIndex.get(f.file) : null;
+  if (!fileEntry) return false; // file not in the real PR diff at all
+  if (f.placement === 'file') return true; // file is in the diff → file-level ok
+  const lineSet = f.side === 'LEFT' ? fileEntry.leftLines : fileEntry.rightLines;
+  return f.line != null && Boolean(lineSet?.has(f.line));
+};
+
+/**
+ * Build the review payload from the findings we decided to POST (already
+ * deduped against open own-threads). Inline + file-level findings become
+ * `comments[]`; summary-only findings (and inline overflow past the cap) roll
+ * into the review `body`.
+ *
+ * RESILIENCE: an inline/file comment is added to `comments[]` ONLY if its
+ * (path, line, side) is actually anchorable in the diff we parsed — re-verified
+ * here against `diffIndex`, not just trusting the `placement` tag — so a stray
+ * path/line can never reach GitHub as an inline comment and 422 the review.
+ */
+const buildReviewPayload = ({ toPost, diffIndex, deepInfo, skipCount }) => {
+  const comments = [];
+  const overflow = [];
+
+  for (const f of toPost) {
+    // Summary placements and anything not anchorable in the REAL diff roll into
+    // the body so the POST can never 422 ("Path could not be resolved").
+    if (f.placement === 'summary' || !isAnchorable(f, diffIndex)) {
+      overflow.push(f);
+      continue;
+    }
+    if (comments.length >= MAX_INLINE_COMMENTS) {
+      overflow.push(f);
+      continue;
+    }
+    const base = { path: f.file, body: renderCommentBody(f) };
+    // file-level comment: subject_type=file, no line. Otherwise anchor by line+side.
+    comments.push(f.placement === 'file' ? { ...base, subject_type: 'file' } : { ...base, line: f.line, side: f.side });
+  }
+
+  const summaryLines = [];
+  for (const f of overflow) {
+    const loc = f.file ? `\`${f.file}${f.line ? `:${f.line}` : ''}\`` : '';
+    // Trailing hidden hash so the NEXT run can see this summary finding was
+    // already reported (summary findings have no thread, so reviewThreads can't
+    // dedup them — fetchOwnSummarizedHashes reads these markers back).
+    summaryLines.push(`- **${SEVERITY_LABEL[f.severity]}** ${loc} — ${f.title} <!-- thunder-finding-hash:${f.hash} -->`);
+  }
+
+  const header =
+    `## 🔭 thunder-deep-review (advisory)\n` +
+    `Complements the other bots — surfaces only what they did not flag. ` +
+    `Never approves, never requests changes, never gates merge.\n` +
+    `<sub>head: \`${env.headSha.slice(0, 12)}\` · mode: ${deepInfo.deepMode ? 'deep' : 'single'}` +
+    `${deepInfo.boundedMode ? ' (bounded: diff exceeded file cap)' : ''} · ` +
+    `deferred ${skipCount} item(s) already reported by other bots (best-effort dedup)</sub>\n`;
+
+  const summaryBlock =
+    summaryLines.length === 0 ? '' : `\n${SUMMARY_HEADING}\n${summaryLines.join('\n')}\n`;
+
+  const body = `${header}${summaryBlock}`;
+  return { event: 'COMMENT', commit_id: env.headSha, body, comments };
+};
+
+/** POST the review. event=COMMENT only — NEVER approve/request-changes. */
+const postReview = async (payload) => {
+  if (payload.comments.length === 0 && !payload.body.includes(SUMMARY_HEADING)) {
+    // Nothing to anchor inline and nothing to summarize beyond the header —
+    // i.e. zero new findings. Skip posting entirely rather than leave a
+    // contentless review on the PR.
+    log('post: no new inline comments and no summary items — skipping review post.');
+    return;
+  }
+  const r = await githubFetch(apiUrl(`/pulls/${env.prNumber}/reviews`), {
+    method: 'POST',
+    body: JSON.stringify(payload),
+  });
+  if (r.ok) {
+    log(`posted inline review: ${payload.comments.length} inline comment(s), event=COMMENT.`);
+    return;
+  }
+  // Last-resort resilience: GitHub rejects the WHOLE review if ANY inline
+  // comment can't be anchored ("Path could not be resolved"). Rather than lose
+  // every good finding, retry ONCE with all inline comments dropped, folding
+  // their locations into the summary body so the findings still reach the human.
+  if (r.status === 422 && payload.comments.length > 0) {
+    log(`post: 422 on inline anchors (${r.body ?? ''}) — retrying comment-free with findings in the body.`);
+    const demoted = payload.comments.map((c) => {
+      const lineRef = c.line ? `:${c.line}` : '';
+      // Carry the hidden finding-hash from the inline body onto the demoted
+      // bullet so the next run's summary-dedup (fetchOwnSummarizedHashes) sees
+      // it and does not repost — otherwise this fallback path reintroduces the
+      // "summary findings never dedupe" bug it is rescuing.
+      const hash = hashFromOwnBody(c.body);
+      const stamp = hash ? ` <!-- thunder-finding-hash:${hash} -->` : '';
+      return `- \`${c.path}${lineRef}\` — ${c.body.split('\n')[0].replace(/\*\*/g, '')}${stamp}`;
+    });
+    // payload.body may ALREADY carry a SUMMARY_HEADING block (mixed-placement
+    // reviews with summary findings) — append the demoted bullets under it
+    // rather than emitting a duplicate header.
+    const demotedBlock = demoted.join('\n');
+    const fallbackBody = payload.body.includes(SUMMARY_HEADING)
+      ? `${payload.body}${demotedBlock}\n`
+      : `${payload.body}\n${SUMMARY_HEADING}\n${demotedBlock}\n`;
+    const retry = await githubFetch(apiUrl(`/pulls/${env.prNumber}/reviews`), {
+      method: 'POST',
+      body: JSON.stringify({ event: 'COMMENT', commit_id: env.headSha, body: fallbackBody, comments: [] }),
+    });
+    if (!retry.ok) throw new Error(`create review retry failed: ${retry.status} ${retry.body ?? ''}`);
+    log(`posted comment-free fallback review: ${demoted.length} finding(s) in body, event=COMMENT.`);
+    return;
+  }
+  throw new Error(`create review failed: ${r.status} ${r.body ?? ''}`);
+};
+
+// =============================================================================
+// PHASES
+// =============================================================================
+
+/** PRE phase: poll A's bots, then write the skip-list + deep-mode flag for the model step. */
+const runPre = async () => {
+  await pollForBots();
+  const [skipList, deepInfo] = await Promise.all([buildSkipList(), computeDeepMode()]);
+  await writeFile(SKIPLIST_FILE, JSON.stringify({ headSha: env.headSha, skipList }, null, 2));
+  await writeFile(DEEPMODE_FILE, JSON.stringify(deepInfo, null, 2));
+  // The DIFF_FILE itself is produced by a separate deterministic workflow step
+  // (the GitHub API's PR diff, Accept: v3.diff — the exact merge-base..head set
+  // GitHub validates review anchors against). This script never shells out to
+  // git; it only does GitHub REST I/O. But when the PR overran the file cap we
+  // clip that diff here so bounded mode is an ENFORCED scope, not a prompt hint.
+  await enforceBoundedDiff(deepInfo.boundedMode);
+  log(`pre: wrote ${skipList.length} skip-list entries; deepMode=${deepInfo.deepMode} (${deepInfo.reason}).`);
+};
+
+/**
+ * POST phase: read validated model findings, map them to diff positions, dedup
+ * against our own OPEN review threads, resolve our own FIXED threads, then post
+ * ONE PR review (event=COMMENT) carrying only the NEW findings as inline
+ * comments.
+ */
+const runPost = async () => {
+  const rawFindings = await readModelFindings(); // throws => fail-soft
+  let skipCount = 0;
+  let deepInfo = { deepMode: false, boundedMode: false };
+  try {
+    skipCount = JSON.parse(await readFile(SKIPLIST_FILE, 'utf8')).skipList?.length ?? 0;
+    deepInfo = JSON.parse(await readFile(DEEPMODE_FILE, 'utf8'));
+  } catch {
+    /* best-effort metadata only */
+  }
+
+  // 1) Map findings to diff positions (inline / file / summary) + stable hash.
+  const diffIndex = await loadDiffIndex();
+  const findings = classifyFindings(rawFindings, diffIndex);
+  const currentHashes = new Set(findings.map((f) => f.hash));
+
+  // 2) Fetch our own prior state: open inline THREADS (hash + resolution) and
+  //    the hashes we already SUMMARIZED in prior review bodies. Summary findings
+  //    have no thread, so without the latter each push reposts the same notes.
+  const [ownThreads, summarizedHashes] = await Promise.all([fetchOwnThreads(), fetchOwnSummarizedHashes()]);
+  const openHashes = new Set(ownThreads.filter((t) => !t.isResolved && t.hash).map((t) => t.hash));
+  // Threads a HUMAN resolved (decided the finding is intentional). We must NOT
+  // re-post these even though the underlying code is unchanged, or the bot would
+  // fight the human's resolve on every push. We scope to human resolutions only:
+  // a thread WE auto-resolved (step 4) is resolved precisely because its code was
+  // fixed, so if that code regresses the finding genuinely reappears and SHOULD
+  // re-post — hence bot-self-resolved hashes are deliberately left out.
+  const humanResolvedHashes = new Set(
+    ownThreads.filter((t) => t.isResolved && t.hash && t.resolvedByLogin && !isSelfLogin(t.resolvedByLogin)).map((t) => t.hash),
+  );
+
+  // 3) Dedup: post only findings NOT already present as an OPEN own-thread, NOT
+  //    human-resolved (respect the human's decision), and — for summary-placement
+  //    findings, which have no thread — not already listed in a prior review's
+  //    "Additional notes" body.
+  const toPost = findings.filter(
+    (f) => !openHashes.has(f.hash) && !humanResolvedHashes.has(f.hash) && !summarizedHashes.has(f.hash),
+  );
+
+  // 4) Convergence: resolve our OWN open threads whose finding is gone now.
+  //    Suppressed on `reopened` (the prior threads may be from another base).
+  //    ALSO suppressed when the model emitted ZERO findings this run: an empty
+  //    result is the model saying "nothing new to add" (the prompt tells it to
+  //    return {"findings":[]} in that case), NOT "every prior finding is fixed".
+  //    Without this guard an empty run would mark every open own-thread stale and
+  //    resolve it even though the flagged code was never touched.
+  if (env.eventAction === 'reopened') {
+    log('post: reopened event — skipping thread resolution.');
+  } else if (findings.length === 0) {
+    log('post: model returned no findings — skipping thread resolution (no fix signal).');
+  } else {
+    const stale = ownThreads.filter((t) => !t.isResolved && t.hash && !currentHashes.has(t.hash));
+    for (const t of stale) {
+      // eslint-disable-next-line no-await-in-loop -- small N, ordered for clear logs
+      await resolveThread(t.threadId);
+    }
+    if (stale.length) log(`post: resolved ${stale.length} own thread(s) whose finding is fixed.`);
+  }
+
+  // 5) Post the new findings as one inline review.
+  const payload = buildReviewPayload({ toPost, diffIndex, deepInfo, skipCount });
+  await postReview(payload);
+};
+
+// =============================================================================
+// MAIN
+// =============================================================================
+
+const main = async () => {
+  const phase = process.argv[2];
+  if (!['pre', 'post'].includes(phase)) failSoft(`usage: review-orchestrator.mjs <pre|post> (got "${phase}")`);
+  if (!env.token) failSoft('missing GITHUB_TOKEN/GH_TOKEN');
+  if (!owner || !repo) failSoft(`missing/invalid GITHUB_REPOSITORY: "${env.repoFull}"`);
+  if (!env.prNumber) failSoft('missing PR_NUMBER');
+  if (!env.headSha) failSoft('missing PR_HEAD_SHA'); // must reconcile to head, not merge ref.
+
+  try {
+    if (phase === 'pre') await runPre();
+    else await runPost();
+  } catch (err) {
+    failSoft(`unrecoverable error in ${phase} phase`, err);
+  }
+};
+
+main();
diff --git a/.github/workflows/thunder-deep-review.yml b/.github/workflows/thunder-deep-review.yml
new file mode 100644
index 000000000..7ae478f18
--- /dev/null
+++ b/.github/workflows/thunder-deep-review.yml
@@ -0,0 +1,319 @@
+# =============================================================================
+# thunder-deep-review — advisory, comment-only AI review. Never approves, never
+# requests changes, never merges. Human approval + merge stay exactly as today.
+#
+# Input-name verification:
+#   - The `anthropics/claude-code-action` input names below were verified
+#     against the live @v1 `action.yml` (2026-06): `prompt`, `anthropic_api_key`,
+#     `claude_args`, and `use_sticky_comment` all EXIST. `model:` /
+#     `allowed_tools:` / `review_event:` do NOT exist in @v1 — do NOT
+#     reintroduce them (pass --model/--allowedTools via `claude_args`).
+#     Re-confirm only if the action major version changes.
+#   - Action commit SHAs are PINNED below (checkout / claude-code-action).
+#   - The only secret the team sets is the ANTHROPIC_API_KEY repo secret.
+#   - Recommended: CODEOWNERS-protect this workflow + .github/scripts/review-orchestrator.mjs
+#     (they hold the review logic + the API-key reference).
+#
+# Orchestration is deterministic CODE, not model tool-calls: .github/scripts/
+# review-orchestrator.mjs does ALL GitHub I/O (poll A's bots, build the
+# skip-list, post the inline PR review). The model step is READ-ONLY and ONLY
+# emits structured JSON findings as the action's structured_output (via
+# --json-schema); a separate non-model step materializes that to a file.
+# =============================================================================
+
+name: thunder-deep-review
+
+on:
+  pull_request:
+    # Include ready_for_review so un-drafting triggers a review; drafts
+    # are skipped by the job `if:` below.
+    types: [opened, synchronize, reopened, ready_for_review]
+    # NOTE: fork PRs are GATED OFF entirely by the job `if:` — do NOT switch
+    # to `pull_request_target` + checkout of fork head (the key-exposure
+    # pwn-request). Fork coverage, if ever wanted, is a separate `workflow_run`
+    # job that reads metadata only and never executes fork code.
+
+# Least privilege + the three scopes the mechanism actually needs.
+permissions:
+  contents: read # read the diff/files
+  pull-requests: write # post the inline PR review (POST /pulls/{n}/reviews)
+  checks: read # poll A's bot check-runs (without this the poller 403s)
+
+concurrency:
+  # Only review the latest push. cancel-in-progress during the debounce
+  # `sleep` (below) is free. No cross-run lock: a run posts a review only when it
+  # has new findings (it skips otherwise) and converges via open-thread hash dedup
+  # (the post phase resolves/skips the threads it already owns), so a cancelled run is safe.
+  group: thunder-deep-review-${{ github.event.pull_request.number }}
+  cancel-in-progress: true
+
+jobs:
+  review:
+    # Hard-gate on same-repo PRs only (no forks), skip Dependabot, skip
+    # drafts. This is the primary defense — fork code must never run with
+    # access to repo secrets.
+    if: >-
+      github.event.pull_request.head.repo.full_name == github.repository &&
+      github.actor != 'dependabot[bot]' &&
+      github.event.pull_request.draft == false
+
+    # GitHub-hosted ephemeral runner: a clean VM per job. Safe now that the
+    # job only runs on same-repo PRs (fork `if:` gate above) and uses the
+    # public Anthropic API (no cloud creds on the runner).
+    runs-on: ubuntu-latest
+
+    # Cap total runner time so a hung model API call can't hold the
+    # runner indefinitely.
+    timeout-minutes: 20
+
+    env:
+      # Reconcile EVERYTHING to the PR HEAD sha, never the synthetic merge ref.
+      # Both the poller and the diff must describe the same code.
+      PR_NUMBER: ${{ github.event.pull_request.number }}
+      PR_HEAD_SHA: ${{ github.event.pull_request.head.sha }}
+      PR_ACTION: ${{ github.event.action }}
+      PR_LABELS: ${{ join(github.event.pull_request.labels.*.name, ',') }}
+      GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+      # The orchestrator posts the review under the default GITHUB_TOKEN, so the
+      # author login is github-actions[bot]. The script matches/resolves its own
+      # threads by this login (+ a hidden marker). No separate GitHub App.
+      THUNDER_BOT_LOGIN: github-actions[bot]
+      # File handoff between steps (must match .github/scripts/review-orchestrator.mjs).
+      THUNDER_DIFF_FILE: ${{ github.workspace }}/thunder-diff.patch
+      THUNDER_SKIPLIST_FILE: ${{ github.workspace }}/thunder-skiplist.json
+      THUNDER_DEEPMODE_FILE: ${{ github.workspace }}/thunder-deepmode.json
+      THUNDER_FINDINGS_FILE: ${{ github.workspace }}/thunder-findings.json
+
+    steps:
+      # ─────────────────────────────────────────────────────────────────────
+      # DEBOUNCE. Rapid pushes (rebases/typo fixes) cancel during this
+      # sleep at zero model cost. Only survivors proceed to spend on the model.
+      # ─────────────────────────────────────────────────────────────────────
+      - name: Debounce rapid pushes
+        run: sleep 60
+
+      # ─────────────────────────────────────────────────────────────────────
+      # Checkout. persist-credentials:false (don't leave a usable token on disk
+      # for the model step), check out the HEAD sha shallow. The diff itself is
+      # fetched from the GitHub API (next step), so we need NO base/head fetch
+      # and NOT fetch-depth:0 — the working tree is only here for the model's
+      # Read/Grep/Glob tools and the .claude symlink-prune step. SHA pinned to
+      # v7.0.0.
+      # ─────────────────────────────────────────────────────────────────────
+      - name: Checkout (head, shallow)
+        uses: actions/checkout@9c091bb21b7c1c1d1991bb908d89e4e9dddfe3e0 # v7.0.0
+        with:
+          persist-credentials: false
+          ref: ${{ github.event.pull_request.head.sha }} # review the head sha
+          fetch-depth: 1 # shallow; the diff comes from the API, not local git
+
+      # ─────────────────────────────────────────────────────────────────────
+      # Pre-compute the diff DETERMINISTICALLY (no model, no Bash tool in
+      # the model step). The model reads this file via `Read`.
+      #
+      # It MUST equal GitHub's REAL PR diff — the same merge-base..head ("Files
+      # changed") set that `POST /pulls/{n}/reviews` validates inline `comments[]`
+      # against — so we fetch it straight from the API (Accept: v3.diff) instead
+      # of a local `git diff`. A local TWO-DOT `git diff base..head` is WRONG:
+      # base.sha is the CURRENT tip of the base branch, not the merge-base, so
+      # when the PR branch is behind base it injects base-only commits as REVERSE
+      # diffs. The model then anchors findings to files NOT in the real PR diff,
+      # and `POST /pulls/{n}/reviews` 422s ("Path could not be resolved"),
+      # dropping the whole review.
+      #
+      # GITHUB_TOKEN is in the job env; this is NOT the read-only model step, so
+      # using it here is fine. `set -o pipefail` so an API error fails loudly
+      # instead of writing a truncated/empty diff that silently reviews nothing.
+      # ─────────────────────────────────────────────────────────────────────
+      - name: Compute diff to a bounded file
+        run: |
+          set -o pipefail
+          gh api "repos/${GITHUB_REPOSITORY}/pulls/${PR_NUMBER}" \
+            -H "Accept: application/vnd.github.v3.diff" > "$THUNDER_DIFF_FILE"
+          echo "diff bytes: $(wc -c < "$THUNDER_DIFF_FILE")"
+
+      # ─────────────────────────────────────────────────────────────────────
+      # The repo commits `.claude/{commands,agents}/thunderbot*` as symlinks into
+      # a local-only `.thunderbot/` dir that is NOT checked out in CI, so they are
+      # dangling links here. The Claude SDK scans `.claude/` (the project setting
+      # source) and crashes (`ENOENT ... statx`) on a dangling link. Remove ONLY
+      # broken symlinks; the real skill/agent files are left untouched.
+      # ─────────────────────────────────────────────────────────────────────
+      - name: Prune dangling .claude symlinks (local-only tooling)
+        run: |
+          find .claude -type l | while read -r l; do
+            [ -e "$l" ] || { echo "pruning dangling symlink: $l"; rm -f "$l"; }
+          done
+
+      # ─────────────────────────────────────────────────────────────────────
+      # (PRE phase): poll A's bots, build the skip-list, compute the
+      # deterministic deep-mode flag. ALL GitHub I/O is in this script.
+      # This step touches untrusted PR metadata but holds no model credentials.
+      # ─────────────────────────────────────────────────────────────────────
+      - name: Orchestrator — poll bots + build skip-list (pre)
+        run: node .github/scripts/review-orchestrator.mjs pre
+        # NOTE: .github/scripts/review-orchestrator.mjs is checked in at the
+        # repo-relative path; recommend CODEOWNERS-protecting it.
+
+      # ─────────────────────────────────────────────────────────────────────
+      # The MODEL step. READ-ONLY: emits STRUCTURED JSON findings ONLY as the
+      # action's structured_output (via --json-schema), NOT to a file (it has no
+      # Write/Edit/Bash tool). The next step materializes that output to the
+      # findings file. It does NO GitHub I/O.
+      #
+      # Input names below were verified against the live @v1 `action.yml`
+      # (2026-06); re-confirm only if the action major version changes:
+      #      - auth via `anthropic_api_key` (the Anthropic public API directly;
+      #        NOT Bedrock/Vertex). The only secret the team provisions.
+      #      - sticky/consolidated comment via `use_sticky_comment`. We render+post
+      #        the inline review ourselves in the post step, so it stays "false"
+      #        here — the post step is the single owner of the inline review.
+      #      - pass model + tools through `claude_args`, NOT `model:`/`allowed_tools:`.
+      #      - there is NO `review_event:` input in @v1.
+      #      - allowedTools = Read,Grep,Glob,Skill,Task,Agent. NO Bash(...). NO write tools.
+      #        `Skill` and the subagent-spawn tool REQUIRE permission (per the tools
+      #        reference); the action runs the Agent SDK in DEFAULT permission mode
+      #        with NO canUseTool callback (headless CI), so any permission-required
+      #        tool that is NOT pre-approved by an allow rule is denied with no prompt.
+      #        The whole review is invoked via the `Skill` tool ("Use the
+      #        thunder-deep-review skill"), and the skill fans out to read-only
+      #        sub-reviewers via the subagent-spawn tool — that tool is named `Task`
+      #        in some SDK versions and `Agent` in others, so BOTH are allow-listed
+      #        for forward/backward compatibility (the unused name is harmless). If
+      #        only the wrong name were listed, every spawn (the per-lens sub-
+      #        reviewers + the dispatched powersync-sync/react-effect reviewers)
+      #        would be silently denied, collapsing the fan-out to a single pass.
+      #        Read/Grep/Glob need no permission but are listed for clarity. The
+      #        skill + its subagents under `.claude/{skills,agents}/` are auto-
+      #        discovered because the SDK loads the `project` setting source by default.
+      # SHA pinned to v1.
+      # PIN the model id LITERALLY here — do NOT source from vars.* (a
+      #         mutable repo/org var changes review behavior with no PR/audit trail).
+      # ─────────────────────────────────────────────────────────────────────
+      - name: thunder-deep-review (model — structured findings only)
+        id: review_model
+        uses: anthropics/claude-code-action@806af32823ef69c8ef357086c573a902af641307 # v1
+        with:
+          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
+          # Pass the workflow token so the action authenticates to GitHub WITH it
+          # instead of minting a Claude App token via OIDC — which would require
+          # `id-token: write` (we keep least-privilege without it). The action's
+          # OIDC path only runs when github_token is empty; providing it skips OIDC.
+          github_token: ${{ secrets.GITHUB_TOKEN }}
+          use_sticky_comment: "false" # we own the inline review via the post step
+          # Model id pinned LITERALLY (a mutable vars.* would change review
+          # behavior with no PR/audit trail). claude-opus-4-8 is a pinned
+          # snapshot id (the 4.6+ generation uses a dateless-but-pinned format).
+          # --allowedTools: Skill (invokes the review skill — permission-required,
+          # so it MUST be allow-listed or the headless run denies it and no-ops),
+          # Task AND Agent (the skill's read-only sub-reviewer fan-out — also
+          # permission-required; the subagent-spawn tool is `Task` in some SDK
+          # versions and `Agent` in others, so both names are listed), and
+          # Read,Grep,Glob (read-only working-tree access). NO Bash,
+          # NO write tools — the model is fully READ-ONLY and CANNOT write a file.
+          # --max-turns caps the agent loop.
+          # --json-schema: the model is read-only, so it can't write the findings
+          # file directly. Instead it returns findings as the action's STRUCTURED
+          # OUTPUT (action.yml `outputs.structured_output`, populated only when a
+          # --json-schema is provided in claude_args). The next step materializes
+          # that output into ${THUNDER_FINDINGS_FILE}. The schema MUST stay aligned
+          # with the orchestrator's readModelFindings parser:
+          #   findings[].severity ∈ {blocking,convention,nit}; file (string|null);
+          #   line (integer|null, parser keeps only >= 1); side ∈ {RIGHT,LEFT}
+          #   (optional, defaults RIGHT); title; body; rule (string|null).
+          claude_args: >-
+            --model claude-opus-4-8
+            --allowedTools Read,Grep,Glob,Skill,Task,Agent
+            --max-turns 20
+            --json-schema '{"type":"object","additionalProperties":false,"required":["findings"],"properties":{"findings":{"type":"array","items":{"type":"object","additionalProperties":false,"required":["severity","file","line","title","body","rule"],"properties":{"severity":{"type":"string","enum":["blocking","convention","nit"]},"file":{"type":["string","null"]},"line":{"type":["integer","null"],"minimum":1},"side":{"type":"string","enum":["RIGHT","LEFT"]},"title":{"type":"string"},"body":{"type":"string"},"rule":{"type":["string","null"]}}}}}}'
+          prompt: |
+            Use the thunder-deep-review skill to review this pull request.
+            The unified diff to review is in the file: ${{ env.THUNDER_DIFF_FILE }} (read it with Read).
+            A skip-list of issues ALREADY reported by other bots is in:
+              ${{ env.THUNDER_SKIPLIST_FILE }}
+            Do NOT repeat anything on the skip-list. Surface ONLY what the other
+            bots missed (architecture, conventions, docs-intent, readability).
+            Deep-mode / bounded-mode flags are in: ${{ env.THUNDER_DEEPMODE_FILE }} —
+            honor them; do NOT decide spend yourself.
+
+            OUTPUT CONTRACT: you are READ-ONLY — do NOT attempt to write any file.
+            Return your findings as STRUCTURED OUTPUT matching the provided JSON
+            schema — a single object:
+              { "findings": [
+                  { "severity": "blocking" | "convention" | "nit",
+                    "file": "<path>", "line": <int|null>,
+                    "side": "RIGHT" | "LEFT",
+                    "title": "<short, plain-language summary>",
+                    "body": "<the explanation + suggested fix, in warm human prose>",
+                    "rule": "<the rule/invariant id, e.g. R-NOANY or INV-01, or null>" } ] }
+            VOICE — write `title` and `body` for a TEAMMATE, not a linter: warm,
+            collaborative, curious. Lead with a question or first-person framing
+            where it fits ("I wonder if…", "Could we…?", "Heads up —"). Be specific
+            and kind; never robotic, terse, or scolding.
+            HUMAN-FACING TEXT — `title` and `body` are shown verbatim to a person,
+            so write ONLY plain prose + the concrete fix. Do NOT put rule/invariant
+            ids, section refs, or internal scaffolding in them — NO "R-REDUCER",
+            NO "INV-01", NO "(review-heuristics §2)", NO "H mass-assignment …".
+            Put the id ONLY in the `rule` field; it is internal grounding and is
+            NEVER shown to the human.
+            Emit NO markdown headers, NO severity trailer, NO PR comment — the
+            orchestrator renders and posts. If nothing to add, return {"findings":[]}.
+
+      # ─────────────────────────────────────────────────────────────────────
+      # Persist the model's STRUCTURED OUTPUT to the findings file the post phase
+      # reads. The model step is READ-ONLY (no Write/Edit/Bash tool), so it cannot
+      # create ${THUNDER_FINDINGS_FILE} itself — the action's structured_output
+      # (populated by the --json-schema above) is the handoff channel. This step
+      # has legitimate shell access and materializes the file.
+      #
+      # Security: structured_output is model-derived and can contain quotes, shell
+      # metacharacters, or hostile PR content. We pass it via `env:` (NEVER inlined
+      # into the run: body) and read it with Node from process.env, so there is no
+      # script-generation injection surface (GitHub's recommended pattern for
+      # untrusted expression values).
+      #
+      # Fail posture: an EMPTY structured_output (model step skipped/cancelled/
+      # errored — this step is `if: always()`) falls back to an empty findings set
+      # so the post phase stays a clean no-op instead of crashing on ENOENT. A
+      # NON-empty but malformed payload throws here, surfacing a real schema
+      # regression instead of silently posting nothing.
+      # ─────────────────────────────────────────────────────────────────────
+      - name: Persist structured findings to file
+        if: always()
+        env:
+          STRUCTURED_OUTPUT: ${{ steps.review_model.outputs.structured_output }}
+        run: |
+          node <<'NODE'
+          const fs = require('node:fs');
+          const file = process.env.THUNDER_FINDINGS_FILE;
+          if (!file) throw new Error('THUNDER_FINDINGS_FILE is unset');
+          const raw = process.env.STRUCTURED_OUTPUT;
+          if (!raw) {
+            // Empty structured_output means the model step produced NOTHING
+            // (skipped/cancelled/errored) — distinct from a genuine empty result,
+            // which arrives as the non-empty string '{"findings":[]}'. Fail loudly
+            // so a broken model step is a red check, never a silent green no-op
+            // (codex second opinion).
+            console.error('::error::model step produced no structured_output — review did not run');
+            process.exit(1);
+          }
+          const parsed = JSON.parse(raw); // throws on malformed -> surfaces a real regression
+          if (!parsed || !Array.isArray(parsed.findings)) {
+            throw new Error('structured_output must be an object with a findings[] array');
+          }
+          fs.writeFileSync(file, `${JSON.stringify(parsed)}\n`, 'utf8');
+          console.log(`wrote ${parsed.findings.length} finding(s); bytes: ${fs.statSync(file).size}`);
+          NODE
+
+      # ─────────────────────────────────────────────────────────────────────
+      # (POST phase): read+validate the model's JSON, compute the
+      # resolved-diff structurally, render markdown deterministically, and post
+      # ONE PR review (POST /pulls/{n}/reviews, event=COMMENT) with anchored inline
+      # comments — no issue comment, no PATCH/upsert, no lock. No model creds here;
+      # pure GitHub I/O.
+      # `if: always()` so a non-zero model step still lets us post (or skip
+      # fail-soft) — but the script itself exits 0 on any error.
+      # ─────────────────────────────────────────────────────────────────────
+      - name: Orchestrator — render + post inline review (post)
+        if: always()
+        run: node .github/scripts/review-orchestrator.mjs post
diff --git a/.gitignore b/.gitignore
index 2cc1b5941..007c8c6f1 100644
--- a/.gitignore
+++ b/.gitignore
@@ -54,6 +54,9 @@ mise.toml
 !.claude/agents/**
 !.claude/rules/
 !.claude/rules/**
+!.claude/skills/
+!.claude/skills/thunder-deep-review/
+!.claude/skills/thunder-deep-review/**
 .zed/**
 .zedscratches/**
 .mcp.json