diff --git a/docs/design/ci-testing-corridor-app.md b/docs/design/ci-testing-corridor-app.md
new file mode 100644
index 0000000..1ce0dbd
--- /dev/null
+++ b/docs/design/ci-testing-corridor-app.md
@@ -0,0 +1,229 @@
+# Continuous Integration Testing for the Corridor App
+
+**Status:** Draft
+**Author:** Tanuj Vasudeva
+**Linear:** [COR-5749 — Build complete e2e tests for CLI + Claude Code](https://linear.app/corridorsecurity/issue/COR-5749/build-complete-e2e-tests-for-cli-claude-code)
+**Project:** [Continuous Integration Testing for the Corridor App](https://linear.app/corridorsecurity/project/continuous-integration-testing-for-the-corridor-app-9b6cc143a2ab)
+**Cycles in scope:** 34 (May 25 – Jun 8, 2026), 35 (Jun 8 – Jun 22, 2026), with cycle 36 stretch
+
+---
+
+## 1. Context
+
+"Application" here means the functional surface that lives on the user's device and interacts with their coding agents — the CLI binary, the IDE extension, the hooks plugins, and the MCP server registrations they write into Cursor / Claude Code / Codex / Cascade / Droid / Windsurf config trees.
+
+### 1.1 The problem we keep hitting
+
+As we have added agent providers (Cursor → Claude Code → Codex → Cascade → Droid → Windsurf), our mock-heavy unit tests have grown insufficient at modeling real-world install + run + uninstall behavior. Concretely:
+
+- `cli/cmd/install.go` is a 946-LoC orchestrator (`runInstall`) that calls 18 sibling functions. Any one of them silently returning early ships a broken first-install for every new customer (COR-6093, COR-5973).
+- We have **two** customer-facing uninstall paths — `cli/cmd/uninstall.go` and `scripts/uninstall.py` — each with its own hand-maintained list of IDE config locations. They are guaranteed to drift (COR-6101).
+- We have **5 critical CLI flows** documented in `E2E_TEST_FLOWS.md` as **manually-tested only**. The most consequential — Codex managed-hooks plist merge (COR-6043) — has the property that a single bad release wipes out the MDM-managed Codex hook config across an entire customer fleet, and we would not catch it before merge.
+- POV-customer-facing regressions in the last quarter (Etsy: stale CLI from missing auto-update; Eleven Labs: 20s `analyzePlan` calls; Factory: matcher mismatch between embedded plugin and ide-extension provider) all share the same root cause: the unit tests pass against mocked filesystems, the manual smoke is run once at release time, and the regression is observed in production.
+
+### 1.2 What landed in cycle 33 (the foundation we're building on)
+
+Cycle 33 (May 11 – May 25) shipped the test infrastructure that the next two cycles depend on. The relevant PRs are:
+
+| PR | Linear | What it gave us |
+|---|---|---|
+| corridor#3150 (the original) | COR-4103 | First attempt at full Claude Code E2E. The architectural review of #3150 is what motivated COR-4317's principles (test the right things at the right level, prefer determinism over LLM-coupled tests, use pytest + uv instead of hand-rolled). |
+| corridor#4900 | COR-5749 | **Snapshot/diff harness for installation tracking.** Captures the on-disk state of `$HOME` (MCP JSON, hook binaries, rule files, Auth0 token, install marker) before/after each `corridor install` / `corridor uninstall`, then diffs against a checked-in golden snapshot. This is the primitive everything else in this project leans on. |
+| corridor#5120 | COR-5749 | **Activated the installation-tracking CI gate** with the first per-target allowlists. Any PR that touches `cli/` now has to either match the golden snapshot or explicitly update the allowlist for the affected target — install-side-effect drift is no longer silent. |
+| corridor#5126 | COR-5749 | **Migrated `cli/tests/agent` off the hand-rolled diff onto the harness.** Existing agent tests now share a single diffing engine with the install-tracking gate, so we don't have two implementations of "what changed under `$HOME`?" to maintain. |
+| corridor#4901 + #4912 | COR-5857 | **CLI and hooks emit structured JSONL events** to `~/.corridor/debug/*.jsonl`. This replaces free-form debug logs and gives the harness something to assert against ("the `mcp.write` event fired exactly once with target=cursor"). |
+| corridor#5082 | COR-5973 | **E2E smoke for `corridor install`** (in design — first end-to-end test using the harness against the real `runInstall`, not unit-mocked sub-pieces). |
+
+### 1.3 What this design doc is and isn't
+
+This doc plans cycles 34 and 35 — what we test, in what order, against which agents, and where each piece of test code lives.
+
+It is **not** a re-litigation of the architecture decisions in COR-4317 (e2e/ mono-project, deterministic-tests-for-deterministic-systems, pinned-version compatibility matrix). Those are settled; this doc adopts them.
+
+---
+
+## 2. Goal
+
+> On some machine, install the Corridor CLI, do a bunch of stuff with it, and uninstall. At each step, test for the correct input and output.
+
+Concretely, by the end of cycle 35 we want a CI job that, on every PR touching `cli/`, `vscode-extension/`, or `e2e/toolchain/`:
+
+1. Builds the CLI binary from source.
+2. Spins up a clean `$HOME` and a mock Corridor backend.
+3. Runs the install + use + uninstall cycle against **each supported agent** (Claude Code, Cursor, Codex), pinned to a specific agent version.
+4. Asserts both:
+   - **On-disk side effects** match the snapshot (MCP JSON, hooks, rules, plist for Codex on macOS) via the COR-5749 harness.
+   - **Behavioral output** matches expectations (correct JSONL events, correct exit codes, correct stdout banners) via direct event assertions.
+5. Fails the build if either set of assertions fails — no flaky-test allowlists, no skip annotations without a linked follow-up ticket.
+
+We are explicitly **not** trying to test Claude Code's / Cursor's / Codex's own behavior — we are testing the seam between those agents and our CLI. Where we need the agent to actually invoke a hook or load an MCP server, we drive the agent's own non-interactive entrypoint (`claude --print …`, Cursor's `cursor-agent`, Codex's `codex exec …`) with a fixed prompt and assert on the JSONL events Corridor emits, not on the LLM output.
+
+---
+
+## 3. Cycle plan
+
+### Cycle 33 (just-completed) — Test infra + Claude Code coverage
+
+Status: **In Review** on parent COR-5749. Carrying the install-tracking gate, JSONL events, and the first install smoke into cycle 34.
+
+Carryover into cycle 34:
+- Land the remaining sub-PRs of COR-5749.
+- Land COR-5973 (`corridor install` smoke) — already in design, PR corridor#5082.
+- Backfill the Claude Code half of the agent matrix below to parity with what cycle 33's harness can already assert.
+
+### Cycle 34 — Cursor coverage + uninstall guarantees
+
+**Theme:** Bring Cursor (CLI + extension) to the same coverage level Claude Code has at the end of cycle 33, and close the highest-impact uninstall regression risk.
+
+| Workstream | Tickets | What "done" looks like |
+|---|---|---|
+| **CW-1. Cursor install/uninstall snapshot** | new sub-issue of COR-5749, alongside existing harness | Cursor target added to the install-tracking allowlist + snapshot. CI fails on any `cli/cmd/install_cursor.go` or matcher change that drifts from the golden. |
+| **CW-2. Extension ↔ CLI delegation E2E** | COR-5845 (extension uses local plugin dir), follow-up of COR-4225 | Pytest under `e2e/toolchain/` that builds the VSIX, invokes `corridor install --target ide-extension` through the extension's `CliProvider`, and asserts the resulting `~/.cursor/mcp.json` + hook wrapper match the golden. Catches the "extension and CLI install paths drift" class of bug (the original motivation for COR-4225). |
+| **CW-3. Curl-installer E2E** | COR-6269 (cancelled — re-scope) | Playwright spec under `e2e/tests/cli-curl-installer.spec.ts` that runs the staging install script in a tmpdir-scoped `$HOME`, asserts binary lands at `~/.corridor/bin/corridor`, `corridor --version` is a non-empty semver, `corridor install --target none --no-mcp` exits 0. The cancelled ticket's PR (corridor#5328) had reviewer notes — fold those in. Skip on Windows. |
+| **CW-4. Codex managed-hooks merge automation** (macOS) | COR-6043 | Go integration test gated `//go:build darwin` that exercises the actual `defaults read/write` path against a throwaway `com.corridor.test.codex.${RANDOM}` plist key. Wired into `.github/workflows/test-cli-installer.yml` as a `runs-on: macos-latest` job. Removes the manual entry from `E2E_TEST_FLOWS.md`. **Highest single-regression-blast-radius item in the project; do this in cycle 34, not cycle 35.** |
+| **CW-5. Auto-update regression guard** | COR-5715 / COR-6330 / COR-6363 | A test that pins an old CLI binary, runs the session-start hook against the mock backend, and asserts the binary at `~/.corridor/bin/corridor` is now the new version. The Etsy POV regression (CLI versions weeks out of date) was exactly this bug class. |
+| **CW-6. E2E runtime budget** | replacement for cancelled COR-6124 | The existing e2e CI is too slow to add three more agent matrices to. Re-open with a concrete target (e.g. p95 ≤ 8 minutes) and the proven wins from corridor#5207 (sharding), #5208 (Playwright cache), #5209 (parallelize dep containers). Without this, cycle 35 can't actually add Codex to the matrix without timing out. |
+
+**Cycle 34 explicit non-goals:**
+- Codex agent end-to-end (planned for cycle 35).
+- VS Code marketplace install of the VSIX (depends on a separate build chain — track in cycle 36 stretch).
+- Windows curl-installer parity (track separately; not blocking enterprise rollout).
+
+**Cycle 34 dependency order** (to reduce wasted work if we slip):
+
+```
+CW-6 (CI runtime) ──┐
+                    ├──► CW-1 (Cursor snapshot) ──► CW-2 (Extension ↔ CLI E2E)
+                    ├──► CW-3 (curl installer)
+                    └──► CW-4 (Codex plist) ──► CW-5 (auto-update guard)
+```
+
+CW-6 is on the critical path because every other workstream lengthens CI; if we don't pay down runtime first, we'll be tempted to mark new tests `runs-on: only when label X is set`, which silently re-creates the "manually run" problem.
+
+### Cycle 35 — Codex coverage + install/uninstall consolidation
+
+**Theme:** Codex is the next provider rollout. Ship the same coverage we now have for Claude Code and Cursor, and close the dual-uninstall-path drift risk while we're touching the install code anyway.
+
+| Workstream | Tickets | What "done" looks like |
+|---|---|---|
+| **CW-7. Codex install/uninstall snapshot** | new sub-issue of COR-5749 | Codex target added to the install-tracking allowlist + snapshot, including the macOS plist branch (depends on CW-4 having landed the test harness for `defaults read/write`). |
+| **CW-8. Codex plugin auto-enable** | COR-5942 (already shipped — add regression test) | Test asserting that after `corridor install --target codex`, the Corridor plugin is **enabled**, not just present. The original bug was "plugin is there but user has to press Enter to install." This is the kind of regression the snapshot diff alone won't catch — needs JSONL-event assertion. |
+| **CW-9. Codex E2E with `codex exec`** | new sub-issue, mirrors Claude Code's cycle-33 work | Pytest that drives `codex exec` non-interactively against a fixed prompt that triggers a Corridor hook, asserts the JSONL event sequence (`hook.fire` with the expected matcher). Pinned Codex version in the compatibility matrix. |
+| **CW-10. Unify install/uninstall config registry** | COR-6101 | Single source of truth for "list of IDE config locations" shared by `cli/cmd/uninstall.go` and `scripts/uninstall.py`. Add a unit test that fails if the two paths diverge. Lower priority than CW-7/8/9 but can be parallelized; a junior eng can drive this with Cursor in the loop. |
+| **CW-11. Declarative `IDETarget` manifest** | COR-6093 | Refactor `cli/cmd/install.go` so adding a new agent is a sub-50-LoC manifest entry rather than a 700-LoC imperative branch. **Stretch** — only attempt if CW-7–10 are tracking ahead of plan. The risk is that a half-finished refactor in cycle 35 destabilizes the same code path we're trying to harden. |
+| **CW-12. CI gate becomes blocking** | new sub-issue | Today the install-tracking allowlist gate is advisory (PRs can override). After CW-1, CW-7 land and the false-positive rate has settled, flip it to required. This is the bit that actually prevents the "merged on Friday, regression on Monday" pattern. |
+
+**Cycle 35 explicit non-goals:**
+- Cascade, Droid, Windsurf agent E2E. Each gets its own cycle once Codex is stable; we don't have customer pressure to ship them faster than that.
+- Browser-side dashboard E2E for the Compliance dashboard (COR-2438 is shipped; its tests live under `e2e/webapp/` and are the responsibility of a different track).
+
+### Cycle 36 — stretch / contingency
+
+If anything from cycles 33–35 slips, cycle 36 is the catch-up window. If everything ships, the stretch work is:
+
+- Cascade + Droid + Windsurf E2E to parity with Claude/Cursor/Codex (likely one cycle each, though much smaller than cycle 33's infra build because the harness is reusable).
+- VS Code Marketplace VSIX install path E2E (different build chain — needs `@vscode/test-electron` or equivalent; out of scope until then).
+- Windows curl-installer parity.
+- A "dogfood telemetry" CI job that spot-checks the JSONL events emitted by every team member's installed CLI on a daily cron. This is the engineering-side complement to the "every Corridork has Corridor installed" cultural commitment from §1.
+
+---
+
+## 4. Architecture decisions (adopted from COR-4317)
+
+These are not new — they were settled in the architectural review of corridor#3150. Repeating them here so the cycle-34/35 PRs don't re-debate them.
+
+1. **Test the right things at the right level.** Most of what corridor#3150 attempted is already covered by `test_cli.py` and `test_hooks.py`. The genuine gap is the install side-effect surface and the install-tracking gate — that is what cycles 33–35 target.
+2. **Deterministic tests for deterministic systems.** `corridor install`, the hooks binary, and MCP config writes are all deterministic given a fixed input. Do not couple their tests to non-deterministic Claude/Codex API calls. Where we need to drive an agent, use a fixed prompt and assert on Corridor's JSONL events, not on the LLM's reply.
+3. **Established tools, not hand-rolled.** pytest + uv for Python, standard `testing` package for Go, `@playwright/test` for browser. No `PASS`/`FAIL` globals + `assert_true()`. The cycle-33 migration off the hand-rolled diff (corridor#5126) is the template.
+4. **e2e/ as a mono-project.**
+   ```
+   e2e/
+   ├── toolchain/   # Python: CLI, hooks, agent integration  (pytest + uv)
+   ├── webapp/      # Playwright: frontend + backend browser
+   ├── extension/   # VS Code VSIX
+   ├── shared/      # Docker images, scripts, CI helpers
+   └── README.md
+   ```
+   Cycles 34/35 work lives under `toolchain/` and `extension/`. The mono-project layout (COR-4322) is a prerequisite — track separately if not yet landed.
+5. **Pinned compatibility matrix.** Each agent under test pins a specific version in `e2e/toolchain/conftest.py`. We update the pin in a dedicated PR with a "I read the agent's changelog and nothing relevant changed" reviewer note. We do **not** rely on `npm install -g …` returning a stable version.
+
+---
+
+## 5. Test surface (what each test asserts)
+
+For every (agent, host-OS) pair in `{Claude Code, Cursor, Codex} × {linux, macOS}`, the canonical install/use/uninstall test asserts:
+
+**Install side**
+- Exit code 0; recognizable success banner on stdout.
+- MCP JSON written at the agent's expected path (`~/.cursor/mcp.json`, `~/.claude/mcp.json`, Codex plist on macOS, etc.) with the correct `corridor` server entry and api-key.
+- Hook binaries copied to `~/.corridor/bin/` with the expected file mode.
+- Per-target rule file written for platforms that support it (`writeCorridorRule`).
+- Auth0 token persisted to `~/.corridor/config.env`.
+- Install marker created.
+- JSONL event log contains exactly one `install.start`, one `install.complete`, and one `mcp.write` per target, in that order.
+- No prompts on stdin (the install path supports non-interactive via env vars; the test reveals any regression there).
+
+**Use side** (one fixed prompt per agent, asserting on Corridor's behavior, not the agent's)
+- Driving the agent's non-interactive entrypoint produces the expected sequence of `hook.fire` events with the right matchers.
+- The matcher is the agent-specific one — Factory's `.___.*`, Cursor's `mcp__.*|Bash`, Codex's `*` — and the test fails on a regression in either matcher (the COR-5844 bug class).
+
+**Uninstall side**
+- Exit code 0 on `corridor uninstall --non-interactive`.
+- All on-disk side effects from install are gone.
+- Foreign entries (the seeded `/usr/local/bin/user-tool` Codex plist scenario) are preserved — this is the COR-6043 / MDM regression guard.
+- JSONL event log contains exactly one `uninstall.start` and one `uninstall.complete`.
+
+**`--target` scoping**
+- Re-running with `--target=<single agent>` touches only that agent's files (regression coverage for `filterTargetsByFlag`).
+
+---
+
+## 6. Risks and mitigations
+
+| Risk | Mitigation |
+|---|---|
+| **CI runtime balloons past the timeout as we add agents.** | CW-6 first. Hard p95 budget enforced in CI. If a new agent matrix would push us over, we shard further or split the workflow file. |
+| **Allowlist becomes a rubber-stamp.** Every drift PR adds an allowlist entry, the gate becomes advisory in practice. | CW-12: flip the gate to required after the allowlists stabilize. Require a Linear ticket linked from any allowlist diff in the PR template. |
+| **Codex plist test pollutes a developer's real `com.openai.codex`.** | Throwaway plist key namespaced as `com.corridor.test.codex.${RANDOM}`. Cleanup runs on `t.Cleanup` even on failure. Documented in COR-6043. |
+| **Pinned agent versions go stale; we ship a regression because the matrix didn't catch a real-world version.** | Quarterly cadence for bumping the pinned versions. The bump PR is its own ticket so the changelog review is the entire PR diff. |
+| **The 946-LoC `runInstall` orchestrator (COR-6093) keeps growing during cycle 34/35 because we're adding test hooks.** | Do **not** mix CW-11 (the `IDETarget` refactor) into the test-coverage PRs. The refactor lands in cycle 35 only after coverage is good enough to catch its regressions. |
+| **Two-uninstall-path drift (COR-6101) becomes worse during cycle 34.** | CW-10 in cycle 35 closes this. Until then, every PR that adds an IDE config location to one path is required (PR-template checkbox) to add it to the other. |
+| **POV-customer Slack reports a regression we should have caught — but the test for it is "in cycle 36 stretch."** | Treat any such report as a cycle-immediate priority that bumps a stretch item out, with a backfilled E2E test as the fix's acceptance criterion. The test-first habit is the cultural shift this project is trying to enforce. |
+
+---
+
+## 7. Out of scope (and why)
+
+- **LLM-quality regression testing.** Whether Claude/Codex generate a "good" response to a Corridor finding is not a CI question; it's an evals question and lives in a different project.
+- **Performance benchmarking of `analyzePlan`.** COR-5842 (20s `analyzePlan`) is a backend perf issue, not a CLI install issue. Performance work belongs in the Reliability project.
+- **Auth0 sign-in browser flow.** Covered by the dashboard Playwright tests under `e2e/webapp/`. The CLI tests use a pre-provisioned token from the mock backend.
+- **Tests for cosmetic CLI output.** First-install banners (COR-4081), warning suppression (COR-6145), and similar are validated by snapshot-of-stdout in the smoke test, not by per-line assertions.
+
+---
+
+## 8. Cultural commitments (from §1.2 of the original draft)
+
+These are not testable in CI but are in scope for this project because they are the only way to catch the bugs CI inherently can't:
+
+1. **Every member on the eng team has Corridor installed and uses it daily.** This is our best defense against the bugs that only show up under "real human in front of an editor" conditions — laggy hooks, confusing rule wording, stale CLI versions sitting on a laptop for two weeks. If you don't like a piece of functionality for your dev workflow, you are likely not alone — flag it, file a ticket, gate the fix behind a feature flag, ship.
+2. **Manually-tested-only flows in `E2E_TEST_FLOWS.md` are tech debt.** Every entry in that file is a ticket assigned to someone, with a target cycle. The file shrinks every cycle until it's empty. Cycle 35 ends with at most 1 entry remaining (Windows curl installer, which is intentionally deferred).
+
+---
+
+## 9. Open questions
+
+- **Where do extension VSIX install tests live?** `e2e/extension/` per COR-4317, but the build chain for invoking VS Code in CI (`@vscode/test-electron`) is not yet stood up. Decision needed by mid-cycle 34 if we want VSIX coverage in cycle 35.
+- **Do we run the macOS Codex plist job on every PR, or only on PRs touching `cli/cmd/install_codex.go` and friends?** macOS GitHub runners are 10× the cost of Linux. Recommendation: every PR touching `cli/`, with a path-filter to skip when only `e2e/webapp/` changed.
+- **Mock backend: shared fixture or per-test?** Cycle 33's harness uses a shared `httptest.NewServer` per process. As we add the agent E2E paths in cycles 34/35 we'll want per-test isolation for some tests (so a misbehaving test can't pollute the next one's `client-settings` payload). Worth a small refactor PR before CW-2 lands.
+
+---
+
+## 10. References
+
+- Linear project: [Continuous Integration Testing for the Corridor App](https://linear.app/corridorsecurity/project/continuous-integration-testing-for-the-corridor-app-9b6cc143a2ab)
+- Parent issue: [COR-5749](https://linear.app/corridorsecurity/issue/COR-5749/build-complete-e2e-tests-for-cli-claude-code)
+- Architecture principles: [COR-4317 — Improved E2E testing framework](https://linear.app/corridorsecurity/issue/COR-4317/improved-e2e-testing-framework)
+- Mono-project layout: [COR-4322](https://linear.app/corridorsecurity/issue/COR-4322/reorganize-e2e-into-mono-project-structure)
+- Manually-tested flow inventory: `E2E_TEST_FLOWS.md` (corridor monorepo)
+- High-leverage individual gaps: [COR-5973](https://linear.app/corridorsecurity/issue/COR-5973/) (`corridor install` smoke), [COR-6043](https://linear.app/corridorsecurity/issue/COR-6043/) (Codex plist merge), [COR-6093](https://linear.app/corridorsecurity/issue/COR-6093/) (`install.go` declarative refactor), [COR-6101](https://linear.app/corridorsecurity/issue/COR-6101/) (uninstall-path unification)
+- Cycle-33 foundation PRs: corridor#3150, #4900, #5120, #5126, #4901, #4912, #5082