From 00fbb43782585ab52979bce0a7be8f7979895e85 Mon Sep 17 00:00:00 2001
From: Brett Chien <brettchien@users.noreply.github.com>
Date: Wed, 24 Jun 2026 23:50:03 +0800
Subject: [PATCH 01/18] docs(adr): openab-agent multi-vendor OAuth & credential
 storage

Proposed ADR for the openab-agent LLM-provider OAuth revamp: a two-axis
OAuthVendor adapter (auth flow vs inference transport), a cross-process
flock-guarded credential-store invariant for auth.json, the
CLAUDE_CODE_OAUTH_TOKEN env route, a 14-variant vendor feasibility matrix,
and the /auth (PR #1185) auth-trigger model. Surfaced while reviewing
PR #1187 (first OAuth vendor).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---
 docs/adr/openab-agent-oauth.md | 267 +++++++++++++++++++++++++++++++++
 1 file changed, 267 insertions(+)
 create mode 100644 docs/adr/openab-agent-oauth.md

diff --git a/docs/adr/openab-agent-oauth.md b/docs/adr/openab-agent-oauth.md
new file mode 100644
index 000000000..41a74a8e5
--- /dev/null
+++ b/docs/adr/openab-agent-oauth.md
@@ -0,0 +1,267 @@
+# ADR: openab-agent — Multi-Vendor LLM-Provider OAuth & Credential Storage
+
+- **Status:** Proposed
+- **Date:** 2026-06-24
+- **Author:** @brettchien
+- **Related:** `docs/adr/openab-agent.md` (charter), `docs/adr/openab-agent-mcp.md` §6 (MCP OAuth + §6.1 storage format), PR #1187 (Anthropic OAuth, first provider), PR #1185 (`/auth` device-flow relay), PR #1111 (`--no-browser`)
+
+---
+
+## 1. Context & Motivation
+
+### 1.1 Why now
+`openab-agent` reaches LLM providers in two ways: `ANTHROPIC_API_KEY` (pay-per-token) and an existing
+Codex subscription-OAuth tenant in `~/.openab/agent/auth.json`. **PR #1187** adds native **Anthropic
+(Claude Pro/Max) OAuth** as a second subscription tenant. This is the moment to set the pattern for *every*
+future provider rather than let each PR hand-roll its own flow.
+
+### 1.2 What PR #1187 surfaced
+Reviewing #1187 exposed a latent, **release-blocker-class storage bug** that is independent of any single
+provider: `auth.json` is a shared multi-writer file with an **unlocked read-modify-write**, and openab-agent
+runs **one process per Discord thread** (`SessionPool` in `crates/openab-core/src/acp/pool.rs` → `crates/openab-core/src/acp/connection.rs`
+spawns one `openab-agent` child per thread). So ordinary concurrent multi-thread usage = concurrent
+processes refreshing the same OAuth token → refresh-token-rotation reuse → worst case OAuth 2.1 §10.4
+**token-family revocation = fleet-wide logout**. API-key users never hit this (no refresh); **OAuth adoption
+is what activates the bug.**
+
+### 1.3 The wider demand
+openab packages 14 agent variants (`kiro, claude, codex, copilot, cursor, gemini, grok, hermes, mimocode,
+opencode, antigravity, pi, native, agentcore`). Several wrap a model vendor reachable by subscription OAuth.
+A coherent extension model lets openab-agent (the `native` variant) host these directly. PR #1185 already
+shipped a Discord `/auth` slash command that relays a device-flow login — the agreed near-term auth UX.
+
+---
+
+## 2. Goals & Non-Goals
+
+### In scope
+- A single **`OAuthVendor` adapter** (auth axis) reused by all subscription-OAuth providers.
+- Keeping the **inference axis** (per-provider request/response transport) **separate** from auth.
+- A **concurrency-safe credential store**: all `auth.json` writes funnel through one locked
+  read-modify-write helper (covers MCP `CredentialStore` + provider tenants).
+- Support for the OAuth styles real vendors use: **PKCE public**, **PKCE + bundled client_secret**,
+  **device flow (RFC 8628)**, and **pre-provisioned long-lived token via env** (`CLAUDE_CODE_OAUTH_TOKEN`).
+- Compatibility with PR #1185's `/auth` poll-and-exit relay model.
+
+### Out of scope
+- **Layer-3 auto-trigger** (agent auto-launches login on a mid-turn 401). DEFERRED (Brett, 2026-06-24);
+  the manual `/auth` command is sufficient for now.
+- Building every vendor at once. This ADR sets the model; vendors land incrementally.
+- Non-OAuth backends: `agentcore` (AWS SigV4/IAM/Bedrock) is explicitly outside the OAuth surface.
+- MCP-server OAuth internals — owned by `docs/adr/openab-agent-mcp.md`; this ADR only shares the storage
+  layer with it.
+
+---
+
+## 3. Prior Art Survey
+(Per `docs/adr/pr-contribution-guidelines.md`, OpenClaw + Hermes are mandatory references.)
+
+- **Pi (`earendil-works/pi`)** — primary source ported for #1187's Anthropic flow (PKCE endpoints, Claude
+  Code identity headers, system block, tool-name casing). Also ships `CLAUDE_CODE_OAUTH_TOKEN` support
+  (pi #3591) and provider extensions for Kiro / Cursor / xAI — evidence the per-vendor adapter shape works.
+- **OpenClaw** — API keys + subscription OAuth. Anthropic via **setup-token** or **reuse of local Claude
+  CLI** (no native PKCE login). Codex via full PKCE. Stores per-profile `{access,refresh,expires,accountId}`
+  and treats the profile file as a **token sink refreshed under a file lock** — direct corroboration for the
+  locked-RMW decision (§5.4).
+- **Hermes Agent (NousResearch)** — `PROVIDER_REGISTRY` dataclasses declare each provider's auth type +
+  URLs + env vars; one `resolve_runtime_provider()` entry point. Anthropic is **API-key only** (or reuse
+  `~/.claude/.credentials.json`). `auth.json` guarded with `fcntl`/`msvcrt` file locks — again corroborates
+  §5.4. The registry pattern is the spiritual model for `OAuthVendor`.
+- **Vendor CLIs (evidence for the matrix, §6):** Gemini CLI (`code_assist/oauth2.ts`), Antigravity
+  (`opencode-antigravity-auth` + `ANTIGRAVITY_API_SPEC.md`), GitHub Copilot CLI, Kiro CLI, xAI/Grok,
+  Xiaomi MiMo Code — surveyed 2026-06-24 (§6).
+
+**How this ADR differs:** like OpenClaw/Hermes it keeps one namespaced multi-tenant credential file with
+atomic writes + per-refresh rotation handling, and (unlike both) adds native PKCE logins. It adds two things
+neither documents cleanly: a **two-axis** auth/inference split, and an explicit **cross-process** locked-RMW
+invariant (both flag file locks but for the simpler single-process case).
+
+---
+
+## 4. Design Decision
+
+1. **Adopt a two-axis model.** Auth (how a credential is obtained/refreshed/stored) and inference (how a
+   request is sent) are **orthogonal** and must not be coupled. A vendor that serves Claude over Google's
+   Code Assist envelope (agy — the Antigravity CLI variant; see §6) reuses neither Anthropic's Messages-V1
+   transport nor its auth.
+2. **Auth axis = one `OAuthVendor` descriptor + shared driver** (§5.1). New vendor = new descriptor; the
+   PKCE/state/loopback/paste/device-poll/storage driver is written once.
+3. **Inference axis = one provider per wire format** (§5.2). Four formats today; no reuse across them.
+4. **Credential storage = single locked-RMW funnel** (§5.4). Establishes the invariant: *every* write to
+   `auth.json` goes through `with_auth_locked`, with refresh performed outside the lock + re-read inside
+   (single-flight). This is a Consequence of the multi-writer/cross-process reality, not an optional perf
+   tweak.
+5. **Credential-source precedence:** explicit API key → pre-provisioned long-lived OAuth token env
+   (`CLAUDE_CODE_OAUTH_TOKEN` and equivalents) → stored interactive OAuth tenant. Rationale + why the env
+   path is the preferred fleet mode: §5.3.
+
+---
+
+## 5. Detailed Design
+
+### 5.1 `OAuthVendor` (auth axis)
+```rust
+trait OAuthVendor {
+    fn namespace(&self) -> &str;                 // "codex" / "anthropic-oauth" / "antigravity" ...
+    fn client_id(&self) -> String;               // env override + default
+    fn client_secret(&self) -> Option<String> { None }    // Gemini = Some(bundled); agy TBD (§9 Q2); Anthropic/Codex = None
+    fn authorize_url(&self) -> &str;
+    fn token_url(&self) -> &str;
+    fn redirect(&self) -> (u16, &str);
+    fn scope(&self) -> &str;
+    fn extra_authorize_params(&self) -> &[(&str,&str)] { &[] }       // Anthropic: ("code","true")
+    fn token_body(&self) -> TokenBodyFormat { TokenBodyFormat::Form } // Anthropic = Json-no-scope
+    fn grant(&self) -> AuthGrant { AuthGrant::Pkce }                  // DeviceCode for copilot/kiro
+}
+enum TokenBodyFormat { Form, Json }
+enum AuthGrant { Pkce, DeviceCode }
+```
+Shared driver handles browser-loopback, no-browser paste, device-code poll, and refresh once. Fold the
+existing Codex flow into the shared `accept_callback_code` helper (its comment already says "fold it in");
+unify the `127.0.0.1` vs `localhost` bind while there.
+
+### 5.2 Inference providers (inference axis — no reuse)
+| Provider | Endpoint | Wire format | Vendors |
+|---|---|---|---|
+| `AnthropicProvider` (exists) | `api.anthropic.com/v1/messages` | Anthropic Messages V1 | claude; mimocode `/anthropic` mirror |
+| `OpenAiProvider` (exists) | OpenAI-style `/v1/chat/completions` | OpenAI Chat/Responses | codex, grok, copilot, mimocode |
+| `AntigravityProvider` (new) | `cloudcode-pa.googleapis.com` | Google Code Assist (`{project,model,request}`→`{candidates[]}`) | gemini, agy |
+| `AwsQProvider` (new, heaviest) | AWS CodeWhisperer/Q | AWS proprietary event-stream | kiro |
+
+OAuth-mode request decoration (Bearer + identity headers/system-block/tool-name casing) stays in the
+inference provider; if shared, a small `decorate_request()` hook — never folded into `OAuthVendor`.
+
+### 5.3 Credential-source precedence & the env route
+Anthropic offers a route that bypasses interactive login entirely: `claude setup-token` mints a long-lived
+subscription OAuth token (~1-year per Anthropic's Claude Code docs) exposed as **`CLAUDE_CODE_OAUTH_TOKEN`**. For pods, ops mints it once and injects
+it as a k8s secret — no interactive flow, no `auth.json` write, near-zero race exposure. openab-agent should
+read it as a Bearer subscription source, precedence: `ANTHROPIC_API_KEY` → `CLAUDE_CODE_OAUTH_TOKEN` →
+stored `anthropic-oauth` tenant. This is the recommended fleet mode; interactive OAuth is for self-service.
+
+### 5.4 Concurrency & storage invariant (folds in the flock decision)
+`auth.json` is multi-tenant (`codex`, `anthropic-oauth`, `mcp:<server>`×N) and written by two code paths
+(`save_tokens_for`, `McpCredentialStore`) across **multiple processes** (one per Discord thread). The
+required invariant:
+```rust
+// ALL writers funnel through this. auth.rs storage layer.
+fn with_auth_locked<R>(f: impl FnOnce(&mut HashMap<String, AuthEntry>) -> R) -> Result<R> {
+    let _g = flock_exclusive("auth.json.lock")?;  // sidecar file (NOT auth.json — rename swaps its inode)
+    let mut map = read_auth_file(&path)?;          // re-read inside lock (anti lost-update + thundering herd)
+    let r = f(&mut map);
+    write_auth_file(&path, &map)?;                 // existing atomic tmp+rename
+    Ok(r)
+}
+```
+- **Refresh runs OUTSIDE the lock**, result committed inside (re-read → if file already newer, adopt it →
+  else write) ⟹ N processes do 1 real refresh.
+- **`flock(2)`, not a sentinel lockfile**: kernel auto-releases on fd close / process death → no stale lock.
+- **try-lock + timeout** so a hung holder degrades to a graceful error, never a wedged worker.
+- **Crate:** `rustix::fs::flock` (already in the tree, safe API), gated `#[cfg(unix)]` with a non-unix
+  no-op — mirroring the existing atomic-write cfg split. (openab-agent is de-facto unix-only: its
+  `ci-openab-agent.yml` is linux, deploy is always container; Windows binaries are the broker only.)
+- This invariant serves the MCP `CredentialStore` too — see `openab-agent-mcp.md` §6.1.
+- **Until this lands**, prefer the `CLAUDE_CODE_OAUTH_TOKEN` env route (§5.3 — no refresh write, no race);
+  treat interactive Anthropic OAuth as not-yet-hardened for high concurrency.
+
+---
+
+## 6. Vendor feasibility matrix (surveyed 2026-06-24)
+```
+Variant      OAuth style                Inference bucket               Native feasibility
+──────────────────────────────────────────────────────────────────────────────────────────
+claude       PKCE public  (+env token)  Anthropic Messages V1         ✅ done (#1187) + add env route
+codex        PKCE public / device       OpenAI                        ✅ done (has device flow)
+grok (xAI)   xai-oauth (sub) / api-key  OpenAI-compatible             🟢 easy (reuse OpenAiProvider)
+mimocode     MiMo Platform OAuth/key    OpenAI-compat (+/anthropic)   🟢 easy (dual-bucket; OAuth low-ROI)
+copilot      GitHub device flow         OpenAI-compat (githubcopilot) 🟡 token exchange + CC headers
+gemini       PKCE + bundled secret      Google Code Assist            🟡 new provider
+antigravity  PKCE + bundled secret      Google Code Assist            🟡 same provider; ToS-risk*
+kiro         AWS Builder ID device flow AWS Q/CodeWhisperer (propr.)  🔴 hard (event-stream)
+cursor       Cursor browser OAuth       Cursor proprietary proxy      🔴 reverse-eng, ToS-risk*
+hermes       API-key                    multi                         ⚪ agent shell, not a vendor
+opencode     BYO (per-auth plugins)     multi                         ⚪ agent shell
+pi           BYO (provider extensions)  multi                         ⚪ agent shell
+native       —                          —                             = openab-agent itself
+agentcore    AWS SigV4/IAM (not OAuth)  AWS Bedrock                   ❌ out of OAuth scope
+```
+Concrete values (verified): codex `app_EMoamEEZ73f0CkXaXp7hrann` (no secret, form); claude
+`9d1c250a-e61b-44d9-88ed-5944d1962f5e` (no secret, JSON no-scope); gemini
+`681255809395-oo8ft2oprdrnp9e3aqf6av3hmdib135j` (+ bundled `GOCSPX-…`, git-safe); agy
+`1071006060591-tmhssin2h21lcre235vtolojh4g403ep` (likely bundled — UNCONFIRMED; redirect `localhost:51121`,
+scopes add `cclog`/`experimentsandconfigs`; inference `cloudcode-pa`, needs GCP `project` field; one OAuth
+unlocks Claude+Gemini+GPT-OSS; agy ≠ Messages V1).
+
+\* **ToS-risk** = relies on the vendor's official-client OAuth credentials + subscription quota from a
+third-party application (openab-agent) rather than the vendor's own client — which may violate that vendor's
+Terms of Service.
+
+---
+
+## 7. Auth-trigger UX (PR #1185)
+`/auth` is broker-side (`crates/openab-core/src/discord.rs`) — openab-agent advertises no slash commands;
+it exposes CLI subcommands the relay shells out to via `$OPENAB_AGENT_AUTH_COMMAND`. The relay is
+**poll-and-exit**: print URL+code to stdout, poll the AS, exit 0.
+- **Anthropic has NO device flow** (claude.ai = authorization_code only; RFC 8628 unshipped,
+  anthropics/claude-code #22992) and #1187's `--no-browser` reads the code from **stdin**, which the relay
+  cannot feed → undrivable by `/auth`.
+- **Resolution for interactive Claude self-service = two-step, code-as-CLI-arg:** `/auth claude` persists
+  PKCE verifier+state as a pending entry (reuse the existing `mcp-pending`/`AuthEntry::Pending` machinery) +
+  prints the `code=true` URL (claude.ai shows a copyable code); `/auth claude <code>` →
+  `anthropic-oauth --code <code>` completes. No stdin. (Fallback: broker pipes a follow-up DM/modal to child
+  stdin — #1185 v2.) For pods, the §5.3 env route avoids all of this.
+
+---
+
+## 8. Rejected alternatives
+- **Per-vendor bespoke flows (status quo):** rejected — N copies of PKCE/loopback/refresh; #1187 already
+  duplicated the Codex flow. Doesn't scale to 5+ vendors.
+- **Force everything through rmcp `CredentialStore`:** rejected — lossy. `TokenStore` (provider) and rmcp
+  `StoredCredentials` are different on-disk shapes (untagged `AuthEntry`); the translation drops fields
+  (see `openab-agent-mcp.md` §6.1). The shared layer must sit *below* both (file RMW), not in one's trait.
+- **`oauth2` crate for the provider flows:** rejected as the primary path — it enforces RFC form-encoded
+  token bodies; Anthropic needs JSON-no-scope, fighting the lib. May still supply PKCE/state primitives.
+  Does **not** solve the race (stateless).
+- **In-process `Mutex` / tokio single-flight, or a sentinel lockfile (create→delete), for the race:**
+  rejected — see §5.4 (in-process locks are useless across the per-thread processes; a sentinel lockfile
+  deadlocks if a holder dies, whereas `flock(2)` auto-releases on death).
+- **Device flow for Anthropic:** not available (Anthropic ships no device endpoint). Hence the env route +
+  two-step interactive (§7).
+- **Layer-3 auto-trigger now:** deferred — `/auth` manual is sufficient (Brett, 2026-06-24).
+
+---
+
+## 9. Decisions & open questions
+1. **Default-model staleness — DECIDED (Brett 2026-06-24): no hardcoded default; require via config/env,
+   fail-loud.** Hardcoding `claude-opus-4-8` is a recurring 404 timebomb: this PR exists because the prior
+   dated default 404'd on the subscription endpoint, and 4.6+ dateless IDs are **fixed canonical IDs, not
+   evergreen aliases** — there is no floating "-latest" to lean on, and Messages V1 mandates a `model`.
+   Resolve model as ACP/CLI `model_override` → `OPENAB_AGENT_MODEL` → **error** (no hardcoded fallback);
+   drop the three duplicated default sites (`llm.rs:153`, `acp.rs:385/446`). Consequence: removes the
+   zero-config default (deployments set model via values.yaml/env already; needs a clear error message +
+   CHANGELOG note). Also eliminates the silent Opus cost bump for API-key users.
+2. agy `client_secret` requirement — confirm from the plugin `src/constants.ts` / agy binary.
+3. Which heavy/ToS-risk vendors to actually build: kiro (AWS proprietary), cursor (reverse-eng + ToS-risk),
+   agy (ToS-risk). Needs an explicit go/no-go.
+4. Does the locked-RMW fix also subsume `openab-agent-mcp.md` open items #1 (reqwest 0.12/0.13 split) and
+   #8 (doctor/runtime two-store split)? Likely partial.
+
+---
+
+## 10. References
+
+### Internal
+- `docs/adr/openab-agent.md` — agent charter (4 tools, no SDK, thin HTTP)
+- `docs/adr/openab-agent-mcp.md` — MCP client + §6 OAuth + §6.1 storage format
+- `docs/adr/pr-contribution-guidelines.md` — prior-art requirements
+- PR #1187 (Anthropic OAuth), PR #1185 (`/auth`), PR #1111 (`--no-browser`)
+
+### External — projects
+- Pi `earendil-works/pi` (ported flow; `CLAUDE_CODE_OAUTH_TOKEN` #3591) · OpenClaw · Hermes Agent
+- Gemini CLI `code_assist/oauth2.ts` · `NoeFabris/opencode-antigravity-auth` (+ `ANTIGRAVITY_API_SPEC.md`)
+- GitHub `copilot-cli` · Kiro CLI / `pi-provider-kiro` / `kiro-gateway` · xAI API / `pi-xai-oauth`
+- Xiaomi `MiMo-Code`
+
+### External — specs
+- RFC 8628 (Device Authorization Grant) · OAuth 2.1 §10.4 (refresh-token rotation/reuse)
+- anthropics/claude-code #22992 (device-flow request), #20215 (MCP device flow)
+- [Documenting Architecture Decisions — Nygard (2011)](https://cognitect.com/blog/2011/11/15/documenting-architecture-decisions.html)

From b5e46e775dbb617af1bc57067aadfcb4fd69425e Mon Sep 17 00:00:00 2001
From: Brett Chien <brettchien@users.noreply.github.com>
Date: Thu, 25 Jun 2026 00:07:35 +0800
Subject: [PATCH 02/18] =?UTF-8?q?docs(adr):=20address=20review=20=E2=80=94?=
 =?UTF-8?q?=20prefer=20oauth2=20crate,=20drop=20rollout,=20vendor=20names?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Build the OAuthVendor driver on the official `oauth2` crate (already in-tree
  via the MCP side) instead of a hand-rolled PKCE/exchange/refresh flow; the
  Anthropic JSON-token-body quirk is applied via the crate's custom http-client
  hook. Reframe §8 accordingly (hand-rolled flow is the rejected alternative).
- Remove the project Rollout-plan section (internal sequencing, not ADR
  material); keep the race-window mitigation in §5.4.
- Use vendor names only; drop internal fleet-agent references from the matrix.
- Replace unexplained "ToS-gray" with "ToS-risk" + a definition.
- Fix cross-references (crate-qualified paths, §-refs, line numbers) and move
  the settled model-default decision under "Decisions & open questions".

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---
 docs/adr/openab-agent-oauth.md | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/docs/adr/openab-agent-oauth.md b/docs/adr/openab-agent-oauth.md
index 41a74a8e5..cef14986a 100644
--- a/docs/adr/openab-agent-oauth.md
+++ b/docs/adr/openab-agent-oauth.md
@@ -84,8 +84,10 @@ invariant (both flag file locks but for the simpler single-process case).
    request is sent) are **orthogonal** and must not be coupled. A vendor that serves Claude over Google's
    Code Assist envelope (agy — the Antigravity CLI variant; see §6) reuses neither Anthropic's Messages-V1
    transport nor its auth.
-2. **Auth axis = one `OAuthVendor` descriptor + shared driver** (§5.1). New vendor = new descriptor; the
-   PKCE/state/loopback/paste/device-poll/storage driver is written once.
+2. **Auth axis = one `OAuthVendor` descriptor + a shared driver built on the official `oauth2` crate** (§5.1;
+   the crate is already in-tree via the MCP side). New vendor = new descriptor; PKCE/CSRF/auth-code
+   exchange/refresh come from the crate, **not hand-rolled**. The few vendor quirks (e.g. Anthropic's JSON
+   token body) are applied through the crate's custom http-client hook, not by forking the flow.
 3. **Inference axis = one provider per wire format** (§5.2). Four formats today; no reuse across them.
 4. **Credential storage = single locked-RMW funnel** (§5.4). Establishes the invariant: *every* write to
    `auth.json` goes through `with_auth_locked`, with refresh performed outside the lock + re-read inside
@@ -116,9 +118,13 @@ trait OAuthVendor {
 enum TokenBodyFormat { Form, Json }
 enum AuthGrant { Pkce, DeviceCode }
 ```
-Shared driver handles browser-loopback, no-browser paste, device-code poll, and refresh once. Fold the
-existing Codex flow into the shared `accept_callback_code` helper (its comment already says "fold it in");
-unify the `127.0.0.1` vs `localhost` bind while there.
+The shared driver is built on the **official `oauth2` crate** (already a dependency via the MCP side): it
+supplies PKCE, CSRF `state`, the authorization-code exchange, and refresh; the descriptor only feeds it
+per-vendor config. Hand-rolled code is limited to what the crate does not cover — the loopback/paste/
+device-code callback plumbing (fold the existing Codex flow into the shared `accept_callback_code` helper —
+its comment already says "fold it in"; unify the `127.0.0.1` vs `localhost` bind) and the single
+body-encoding override (Anthropic's JSON-no-scope token request, applied via the crate's custom http-client
+hook rather than a separate flow).
 
 ### 5.2 Inference providers (inference axis — no reuse)
 | Provider | Endpoint | Wire format | Vendors |
@@ -218,9 +224,11 @@ it exposes CLI subcommands the relay shells out to via `$OPENAB_AGENT_AUTH_COMMA
 - **Force everything through rmcp `CredentialStore`:** rejected — lossy. `TokenStore` (provider) and rmcp
   `StoredCredentials` are different on-disk shapes (untagged `AuthEntry`); the translation drops fields
   (see `openab-agent-mcp.md` §6.1). The shared layer must sit *below* both (file RMW), not in one's trait.
-- **`oauth2` crate for the provider flows:** rejected as the primary path — it enforces RFC form-encoded
-  token bodies; Anthropic needs JSON-no-scope, fighting the lib. May still supply PKCE/state primitives.
-  Does **not** solve the race (stateless).
+- **Fully hand-rolled OAuth flow:** rejected — it reimplements PKCE/CSRF/exchange/refresh that the official
+  `oauth2` crate (already in-tree) provides. The crate is the chosen basis (§4 decision 2, §5.1); its one
+  friction — it defaults to RFC form-encoded token bodies while Anthropic needs JSON-no-scope — is handled
+  via the crate's custom http-client hook, not by abandoning it. (`oauth2` is stateless and does **not**
+  solve the auth.json race — that's the storage-layer's job, §5.4.)
 - **In-process `Mutex` / tokio single-flight, or a sentinel lockfile (create→delete), for the race:**
   rejected — see §5.4 (in-process locks are useless across the per-thread processes; a sentinel lockfile
   deadlocks if a holder dies, whereas `flock(2)` auto-releases on death).

From 285b335ccde6b0edcacabd614f7c035e194ae673 Mon Sep 17 00:00:00 2001
From: Brett Chien <brettchien@users.noreply.github.com>
Date: Thu, 25 Jun 2026 00:14:54 +0800
Subject: [PATCH 03/18] docs(adr): per-tenant refresh lock + per-user PKCE
 keying (Mira review r1)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Fold in two release-blocker-class concurrency fixes from Mira's review:

- §5.4 refresh-token rotation: the prior "refresh outside lock, re-read on
  commit" claim was wrong — N processes each send a refresh with the same
  RT_old before committing, tripping OAuth 2.1 §10.4 reuse -> token-family
  revocation. Replace with a per-tenant exclusive lock: network refresh held
  under the tenant lock only (not the global lock), so exactly one refresh per
  tenant per expiry with no head-of-line blocking across tenants. Extends to
  mcp:<server> tenants.
- §7 /auth: key the pending PKCE verifier+state by the initiating Discord user
  id instead of a single global entry — prevents concurrent-user overwrite
  (PKCE mismatch) and session hijack.
- §5.1 OAuthVendor::redirect() -> Option, since device flows have no redirect.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---
 docs/adr/openab-agent-oauth.md | 69 +++++++++++++++++++++++++---------
 1 file changed, 52 insertions(+), 17 deletions(-)

diff --git a/docs/adr/openab-agent-oauth.md b/docs/adr/openab-agent-oauth.md
index cef14986a..f50a350d4 100644
--- a/docs/adr/openab-agent-oauth.md
+++ b/docs/adr/openab-agent-oauth.md
@@ -89,10 +89,11 @@ invariant (both flag file locks but for the simpler single-process case).
    exchange/refresh come from the crate, **not hand-rolled**. The few vendor quirks (e.g. Anthropic's JSON
    token body) are applied through the crate's custom http-client hook, not by forking the flow.
 3. **Inference axis = one provider per wire format** (§5.2). Four formats today; no reuse across them.
-4. **Credential storage = single locked-RMW funnel** (§5.4). Establishes the invariant: *every* write to
-   `auth.json` goes through `with_auth_locked`, with refresh performed outside the lock + re-read inside
-   (single-flight). This is a Consequence of the multi-writer/cross-process reality, not an optional perf
-   tweak.
+4. **Credential storage = locked-RMW funnel + per-tenant refresh lock** (§5.4). *Every* write to `auth.json`
+   goes through `with_auth_locked` (global lock — file integrity); *every* token refresh serializes on a
+   **per-tenant** lock so concurrent processes perform exactly one network refresh per tenant and never
+   present a rotated `RT_old` twice (which would trigger OAuth 2.1 §10.4 token-family revocation). A
+   Consequence of the multi-writer/cross-process reality, not an optional perf tweak.
 5. **Credential-source precedence:** explicit API key → pre-provisioned long-lived OAuth token env
    (`CLAUDE_CODE_OAUTH_TOKEN` and equivalents) → stored interactive OAuth tenant. Rationale + why the env
    path is the preferred fleet mode: §5.3.
@@ -109,7 +110,7 @@ trait OAuthVendor {
     fn client_secret(&self) -> Option<String> { None }    // Gemini = Some(bundled); agy TBD (§9 Q2); Anthropic/Codex = None
     fn authorize_url(&self) -> &str;
     fn token_url(&self) -> &str;
-    fn redirect(&self) -> (u16, &str);
+    fn redirect(&self) -> Option<(u16, &'static str)> { None } // Some((port,path)) for loopback PKCE; None for device flow (no redirect endpoint)
     fn scope(&self) -> &str;
     fn extra_authorize_params(&self) -> &[(&str,&str)] { &[] }       // Anthropic: ("code","true")
     fn token_body(&self) -> TokenBodyFormat { TokenBodyFormat::Form } // Anthropic = Json-no-scope
@@ -146,26 +147,56 @@ stored `anthropic-oauth` tenant. This is the recommended fleet mode; interactive
 
 ### 5.4 Concurrency & storage invariant (folds in the flock decision)
 `auth.json` is multi-tenant (`codex`, `anthropic-oauth`, `mcp:<server>`×N) and written by two code paths
-(`save_tokens_for`, `McpCredentialStore`) across **multiple processes** (one per Discord thread). The
-required invariant:
+(`save_tokens_for`, `McpCredentialStore`) across **multiple processes** (one per Discord thread). Two
+distinct hazards demand two locks.
+
+**(a) File integrity — one global lock.** Every write funnels through a single locked RMW so concurrent
+writers never lost-update the shared file:
 ```rust
 // ALL writers funnel through this. auth.rs storage layer.
 fn with_auth_locked<R>(f: impl FnOnce(&mut HashMap<String, AuthEntry>) -> R) -> Result<R> {
     let _g = flock_exclusive("auth.json.lock")?;  // sidecar file (NOT auth.json — rename swaps its inode)
-    let mut map = read_auth_file(&path)?;          // re-read inside lock (anti lost-update + thundering herd)
+    let mut map = read_auth_file(&path)?;          // re-read inside lock (anti lost-update)
     let r = f(&mut map);
     write_auth_file(&path, &map)?;                 // existing atomic tmp+rename
     Ok(r)
 }
 ```
-- **Refresh runs OUTSIDE the lock**, result committed inside (re-read → if file already newer, adopt it →
-  else write) ⟹ N processes do 1 real refresh.
-- **`flock(2)`, not a sentinel lockfile**: kernel auto-releases on fd close / process death → no stale lock.
-- **try-lock + timeout** so a hung holder degrades to a graceful error, never a wedged worker.
+
+**(b) Refresh-token rotation — one lock per tenant.** An earlier draft ran the refresh *outside* the lock
+and committed the result inside, claiming "N processes do 1 real refresh." **That is wrong** (Mira review,
+2026-06-24): re-read-on-commit only prevents a lost *write* — every process has already *sent* a network
+refresh carrying the same `RT_old` before it reaches the commit. Under OAuth 2.1 §10.4 refresh-token
+rotation, the second `RT_old` presentation reads as reuse and the AS **revokes the whole token family** =
+exactly the fleet-wide logout this ADR exists to prevent. Holding the *global* exclusive lock across the
+network refresh would serialize it, but then a slow refresh for one tenant head-of-line-blocks every other
+tenant (MCP servers, Codex). So: **one exclusive lock file per tenant**, network I/O held under the tenant
+lock only — never under the global lock:
+```rust
+fn get_or_refresh(tenant: &str) -> Result<String> {
+    // 1. fast path — fresh token under a shared (read) global lock
+    if let Some(t) = read_fresh(tenant)? { return Ok(t); }
+    // 2. serialize refreshes for THIS tenant only (other tenants unaffected)
+    let _tg = flock_exclusive(&format!("auth.json.{tenant}.lock"))?;
+    // 3. double-check — another process may have refreshed while we waited on the tenant lock
+    if let Some(t) = read_fresh(tenant)? { return Ok(t); }
+    // 4. exactly one network refresh per tenant per expiry — tenant lock held, global lock NOT
+    let fresh = perform_network_refresh(tenant)?;
+    // 5. commit under the global lock (fast inode swap, no network I/O inside)
+    with_auth_locked(|m| m.insert(tenant.into(), fresh.clone()))?;
+    Ok(fresh.access_token)
+}
+```
+- **`flock(2)`, not a sentinel lockfile**: kernel auto-releases on fd close / process death → a hung or
+  killed refresher frees its tenant lock instantly. No stale lock, no manual timeout/orphan cleanup.
+- **try-lock + timeout** on the global lock so a wedged writer degrades to a graceful error, never a wedged
+  worker.
 - **Crate:** `rustix::fs::flock` (already in the tree, safe API), gated `#[cfg(unix)]` with a non-unix
   no-op — mirroring the existing atomic-write cfg split. (openab-agent is de-facto unix-only: its
   `ci-openab-agent.yml` is linux, deploy is always container; Windows binaries are the broker only.)
-- This invariant serves the MCP `CredentialStore` too — see `openab-agent-mcp.md` §6.1.
+- Each MCP `mcp:<server>` tenant takes its own tenant lock by the same rule, so the MCP `CredentialStore`
+  refreshes are serialized per server too — the invariant serves it directly (see `openab-agent-mcp.md`
+  §6.1).
 - **Until this lands**, prefer the `CLAUDE_CODE_OAUTH_TOKEN` env route (§5.3 — no refresh write, no race);
   treat interactive Anthropic OAuth as not-yet-hardened for high concurrency.
 
@@ -211,10 +242,14 @@ it exposes CLI subcommands the relay shells out to via `$OPENAB_AGENT_AUTH_COMMA
   anthropics/claude-code #22992) and #1187's `--no-browser` reads the code from **stdin**, which the relay
   cannot feed → undrivable by `/auth`.
 - **Resolution for interactive Claude self-service = two-step, code-as-CLI-arg:** `/auth claude` persists
-  PKCE verifier+state as a pending entry (reuse the existing `mcp-pending`/`AuthEntry::Pending` machinery) +
-  prints the `code=true` URL (claude.ai shows a copyable code); `/auth claude <code>` →
-  `anthropic-oauth --code <code>` completes. No stdin. (Fallback: broker pipes a follow-up DM/modal to child
-  stdin — #1185 v2.) For pods, the §5.3 env route avoids all of this.
+  the PKCE verifier+state as a pending entry **keyed by the initiating Discord user id**
+  (`pending:claude:<discord_user_id>`, reuse the existing `mcp-pending`/`AuthEntry::Pending` machinery) +
+  prints the `code=true` URL (claude.ai shows a copyable code); `/auth claude <code>` forwards that same user
+  id so `anthropic-oauth --code <code>` loads the matching verifier and completes. No stdin. **Per-user
+  keying is required** (Mira review, 2026-06-24): a single global pending entry lets a second concurrent
+  user's verifier overwrite the first's → PKCE mismatch on exchange, and worse, lets user B complete a flow
+  user A initiated (session hijack). (Fallback: broker pipes a follow-up DM/modal to child stdin — #1185 v2.)
+  For pods, the §5.3 env route avoids all of this.
 
 ---
 

From f3dedfe0abecb1d6d20b0f772abbee536786e1b2 Mon Sep 17 00:00:00 2001
From: Brett Chien <brettchien@users.noreply.github.com>
Date: Thu, 25 Jun 2026 00:19:23 +0800
Subject: [PATCH 04/18] docs(adr): bundled-secret storage + pending-entry GC
 (Mira review r2)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- §6/§9 Q2: correct the stale "git-safe" claim on the gemini GOCSPX-
  secret. The value is non-confidential by RFC 8252 / Google docs, but
  GitHub now push-protects Google secrets by default (changelog 2026-03)
  and partner-scans them for auto-revoke, so a raw literal is not safe in a
  public repo. Decide: encode-at-rest (scanner-evasion for a non-secret, NOT
  a security control) or env-inject at runtime — not raw text.
- §7: pending PKCE entries get created_at + a 15-min GC sweep in
  with_auth_locked so abandoned /auth attempts don't accumulate.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---
 docs/adr/openab-agent-oauth.md | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/docs/adr/openab-agent-oauth.md b/docs/adr/openab-agent-oauth.md
index f50a350d4..85b567dad 100644
--- a/docs/adr/openab-agent-oauth.md
+++ b/docs/adr/openab-agent-oauth.md
@@ -223,7 +223,7 @@ agentcore    AWS SigV4/IAM (not OAuth)  AWS Bedrock                   ❌ out of
 ```
 Concrete values (verified): codex `app_EMoamEEZ73f0CkXaXp7hrann` (no secret, form); claude
 `9d1c250a-e61b-44d9-88ed-5944d1962f5e` (no secret, JSON no-scope); gemini
-`681255809395-oo8ft2oprdrnp9e3aqf6av3hmdib135j` (+ bundled `GOCSPX-…`, git-safe); agy
+`681255809395-oo8ft2oprdrnp9e3aqf6av3hmdib135j` (+ bundled `GOCSPX-…` — non-confidential by spec but **not** safe as raw repo text; storage decided in §9 Q2); agy
 `1071006060591-tmhssin2h21lcre235vtolojh4g403ep` (likely bundled — UNCONFIRMED; redirect `localhost:51121`,
 scopes add `cclog`/`experimentsandconfigs`; inference `cloudcode-pa`, needs GCP `project` field; one OAuth
 unlocks Claude+Gemini+GPT-OSS; agy ≠ Messages V1).
@@ -250,6 +250,9 @@ it exposes CLI subcommands the relay shells out to via `$OPENAB_AGENT_AUTH_COMMA
   user's verifier overwrite the first's → PKCE mismatch on exchange, and worse, lets user B complete a flow
   user A initiated (session hijack). (Fallback: broker pipes a follow-up DM/modal to child stdin — #1185 v2.)
   For pods, the §5.3 env route avoids all of this.
+- **Pending-entry GC** (Mira review, 2026-06-24): stamp each `AuthEntry::Pending` with `created_at`;
+  `with_auth_locked` opportunistically drops pending entries older than 15 min on every write, so abandoned
+  `/auth` attempts (user never pastes a code) don't accumulate stale verifiers in `auth.json`.
 
 ---
 
@@ -282,7 +285,19 @@ it exposes CLI subcommands the relay shells out to via `$OPENAB_AGENT_AUTH_COMMA
    drop the three duplicated default sites (`llm.rs:153`, `acp.rs:385/446`). Consequence: removes the
    zero-config default (deployments set model via values.yaml/env already; needs a clear error message +
    CHANGELOG note). Also eliminates the silent Opus cost bump for API-key users.
-2. agy `client_secret` requirement — confirm from the plugin `src/constants.ts` / agy binary.
+2. **Bundled `client_secret` storage — REFINED (Mira review 2026-06-24).** Google Code-Assist vendors
+   (gemini, agy) ship a `GOCSPX-…` desktop-app secret. By RFC 8252 and Google's own docs this value is
+   **non-confidential** (installed-app secret, "obviously not treated as a secret") — there is no
+   confidentiality to protect, so obscuring it adds zero cryptographic security. But it is **not safe as raw
+   text in a public repo for operational reasons**: GitHub push-protection covers Google secrets **by default**
+   (changelog 2026-03), so a raw `GOCSPX-` literal blocks contributor `git push`, and GitHub↔Google partner
+   token-scanning may **auto-revoke** the credential once it lands in a public commit. Decision: do **not**
+   commit the raw literal; pick per vendor —
+   (a) **encode-at-rest** (split/base64) in source — purely scanner-evasion for an already-public value, *not*
+   a security control; label it as such inline so reviewers aren't misled into treating it as a real secret; or
+   (b) **inject at runtime via env** (no secret in the repo at all) — cleaner provenance, consistent with the
+   §5.3 env-route preference, at the cost of the bundled zero-config UX.
+   Still confirm whether agy actually *requires* a secret, from the plugin `src/constants.ts` / agy binary.
 3. Which heavy/ToS-risk vendors to actually build: kiro (AWS proprietary), cursor (reverse-eng + ToS-risk),
    agy (ToS-risk). Needs an explicit go/no-go.
 4. Does the locked-RMW fix also subsume `openab-agent-mcp.md` open items #1 (reqwest 0.12/0.13 split) and

From 8a6ada102514f73129838be355ebc60b0f274875 Mon Sep 17 00:00:00 2001
From: Brett Chien <brettchien@users.noreply.github.com>
Date: Thu, 25 Jun 2026 00:27:01 +0800
Subject: [PATCH 05/18] =?UTF-8?q?docs(adr):=20close=20=C2=A79=20Q2/Q3=20?=
 =?UTF-8?q?=E2=80=94=20env-injection=20default,=20vendor=20go/no-go=20(Bre?=
 =?UTF-8?q?tt=20decisions=20r3)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- §9 Q2 DECIDED: env-injection (b) is the default for the gemini/agy bundled
  client_secret; encode-at-rest (a) is the fallback for bundled zero-config
  binaries only, framed as scanner-evasion (not security). Cite rclone
  rcloneEncryptedClientSecret + obscure.MustReveal() as the canonical
  precedent (§10).
- §9 Q3 DECIDED: GO gemini/grok (first wave) + agy (experimental, opt-in,
  ToS caveat — shares gemini's Code-Assist provider, residual risk is ToS not
  secret storage); No-Go cursor/kiro. Mirror as a build-decision line under §6.
- §10: add rclone + GitHub/Google secret-scanning references.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---
 docs/adr/openab-agent-oauth.md | 53 ++++++++++++++++++++++++----------
 1 file changed, 38 insertions(+), 15 deletions(-)

diff --git a/docs/adr/openab-agent-oauth.md b/docs/adr/openab-agent-oauth.md
index 85b567dad..c8f75db48 100644
--- a/docs/adr/openab-agent-oauth.md
+++ b/docs/adr/openab-agent-oauth.md
@@ -232,6 +232,9 @@ unlocks Claude+Gemini+GPT-OSS; agy ≠ Messages V1).
 third-party application (openab-agent) rather than the vendor's own client — which may violate that vendor's
 Terms of Service.
 
+**Build decision (§9 Q3, Brett 2026-06-24):** GO `gemini`/`grok` (first wave) and `agy` (experimental,
+opt-in, ToS caveat); No-Go `cursor`/`kiro`.
+
 ---
 
 ## 7. Auth-trigger UX (PR #1185)
@@ -285,21 +288,36 @@ it exposes CLI subcommands the relay shells out to via `$OPENAB_AGENT_AUTH_COMMA
    drop the three duplicated default sites (`llm.rs:153`, `acp.rs:385/446`). Consequence: removes the
    zero-config default (deployments set model via values.yaml/env already; needs a clear error message +
    CHANGELOG note). Also eliminates the silent Opus cost bump for API-key users.
-2. **Bundled `client_secret` storage — REFINED (Mira review 2026-06-24).** Google Code-Assist vendors
-   (gemini, agy) ship a `GOCSPX-…` desktop-app secret. By RFC 8252 and Google's own docs this value is
-   **non-confidential** (installed-app secret, "obviously not treated as a secret") — there is no
-   confidentiality to protect, so obscuring it adds zero cryptographic security. But it is **not safe as raw
-   text in a public repo for operational reasons**: GitHub push-protection covers Google secrets **by default**
-   (changelog 2026-03), so a raw `GOCSPX-` literal blocks contributor `git push`, and GitHub↔Google partner
-   token-scanning may **auto-revoke** the credential once it lands in a public commit. Decision: do **not**
-   commit the raw literal; pick per vendor —
-   (a) **encode-at-rest** (split/base64) in source — purely scanner-evasion for an already-public value, *not*
-   a security control; label it as such inline so reviewers aren't misled into treating it as a real secret; or
-   (b) **inject at runtime via env** (no secret in the repo at all) — cleaner provenance, consistent with the
-   §5.3 env-route preference, at the cost of the bundled zero-config UX.
-   Still confirm whether agy actually *requires* a secret, from the plugin `src/constants.ts` / agy binary.
-3. Which heavy/ToS-risk vendors to actually build: kiro (AWS proprietary), cursor (reverse-eng + ToS-risk),
-   agy (ToS-risk). Needs an explicit go/no-go.
+2. **Bundled `client_secret` storage — DECIDED (Brett 2026-06-24): env-injection default; encode-at-rest
+   fallback.** Google Code-Assist vendors (gemini, agy) ship a `GOCSPX-…` desktop-app secret. By RFC 8252 and
+   Google's own docs this value is **non-confidential** (installed-app secret, "obviously not treated as a
+   secret") — there is no confidentiality to protect, so obscuring it adds zero cryptographic security. But it
+   is **not safe as raw text in a public repo for operational reasons**: GitHub push-protection covers Google
+   secrets **by default** (changelog 2026-03), so a raw `GOCSPX-` literal blocks contributor `git push`, and
+   GitHub↔Google partner token-scanning may **auto-revoke** the credential once it lands in a public commit.
+   Decision: do **not** commit the raw literal —
+   - **(b) inject at runtime via env is the default** (no secret in the repo at all): cleanest provenance,
+     consistent with the §5.3 env-route preference, removes the ToS/leak surface entirely. Costs the bundled
+     zero-config UX, which fleet/pod deployments don't need (they set env already).
+   - **(a) encode-at-rest** (split/base64) is the **fallback** only where a bundled zero-config binary is
+     genuinely required: it is **scanner-evasion for an already-public value, *not* a security control** —
+     label it as such inline so reviewers aren't misled. This is a well-adopted pattern: **rclone** declares a
+     `rcloneEncryptedClientSecret` constant and calls `obscure.MustReveal()` at runtime for exactly this
+     reason (bypass static scanners / partner auto-revoke, explicitly not encryption) — see §10.
+   Still confirm whether agy actually *requires* a secret, from the plugin `src/constants.ts` / agy binary;
+   under the (b) default this is moot unless agy is shipped as a bundled zero-config binary.
+3. **Vendor go/no-go — DECIDED (Brett 2026-06-24).**
+   - **GO:** `gemini`, `grok` (high value, clean APIs, low ToS risk) — first wave.
+   - **GO (experimental, opt-in):** `agy`. Marginal eng cost is low — it shares gemini's Google Code-Assist
+     `AntigravityProvider`, and one OAuth unlocks Claude+Gemini+GPT-OSS. Its one residual risk is **ToS**
+     (drives Antigravity's official client + the user's subscription quota from a third-party app), which is
+     *independent of* the now-solved secret-storage question and cannot be engineered away — so agy stays
+     behind an **explicit opt-in flag with a documented ToS caveat**, and the user accepts the risk on their
+     own subscription. Watch for `429 RESOURCE_EXHAUSTED` (shared-quota exhaustion) and `cloudcode-pa`
+     endpoint drift (semi-internal Google API; agy auto-updates). openab already runs agy in production via
+     the ECS `antigravity` variant, so the auth/quota behaviour is first-hand-known.
+   - **No-Go:** `cursor` (reverse-engineered proprietary proxy + high ToS/account-ban risk), `kiro` (AWS Q
+     event-stream protocol — high maintenance cost). Revisit only on explicit demand.
 4. Does the locked-RMW fix also subsume `openab-agent-mcp.md` open items #1 (reqwest 0.12/0.13 split) and
    #8 (doctor/runtime two-store split)? Likely partial.
 
@@ -318,8 +336,13 @@ it exposes CLI subcommands the relay shells out to via `$OPENAB_AGENT_AUTH_COMMA
 - Gemini CLI `code_assist/oauth2.ts` · `NoeFabris/opencode-antigravity-auth` (+ `ANTIGRAVITY_API_SPEC.md`)
 - GitHub `copilot-cli` · Kiro CLI / `pi-provider-kiro` / `kiro-gateway` · xAI API / `pi-xai-oauth`
 - Xiaomi `MiMo-Code`
+- **rclone `rclone/rclone`** — `rcloneEncryptedClientSecret` constant + runtime `obscure.MustReveal()`: the
+  canonical real-world precedent for §9 Q2 encode-at-rest (scanner-evasion of a non-confidential bundled
+  OAuth secret, explicitly not encryption).
 
 ### External — specs
 - RFC 8628 (Device Authorization Grant) · OAuth 2.1 §10.4 (refresh-token rotation/reuse)
 - anthropics/claude-code #22992 (device-flow request), #20215 (MCP device flow)
+- GitHub secret-scanning — Google secrets push-protected by default (changelog 2026-03-31); Google
+  google-auth-library-nodejs #959 (desktop client secret is non-confidential)
 - [Documenting Architecture Decisions — Nygard (2011)](https://cognitect.com/blog/2011/11/15/documenting-architecture-decisions.html)

From ed83dc2b1ad4ec63764cba9efdb79819d7af4b4e Mon Sep 17 00:00:00 2001
From: Brett Chien <brettchien@users.noreply.github.com>
Date: Thu, 25 Jun 2026 00:30:15 +0800
Subject: [PATCH 06/18] docs(adr): confirm agy secret + ecosystem evidence
 (GitHub survey r4)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

GitHub survey 2026-06-24 grounds the agy GO decision:
- §6/§9 Q2: agy client_secret requirement CONFIRMED — it needs a GOCSPX-
  secret, a public constant >=20 antigravity-auth repos hardcode verbatim
  (NoeFabris/opencode-antigravity-auth, router-for-me/CLIProxyAPI, ...).
  Literal deliberately NOT pasted into the doc — would trip the very §9 Q2
  push-protection, so we dogfood the env/encode decision. Redirect confirmed
  localhost:51121/oauth-callback.
- §9 Q3: ecosystem evidence — agy OAuth is widely ported (opencode/pi/hermes/
  openclaw plugins + proxies), proving the integration, while the same
  ecosystem's anti-ban / quota-locking / multi-account-rotation tooling
  empirically confirms the ToS-ban + 429 risks, reinforcing the opt-in gate.
- §10: add the antigravity OAuth ecosystem reference set.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---
 docs/adr/openab-agent-oauth.md | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/docs/adr/openab-agent-oauth.md b/docs/adr/openab-agent-oauth.md
index c8f75db48..2117ce936 100644
--- a/docs/adr/openab-agent-oauth.md
+++ b/docs/adr/openab-agent-oauth.md
@@ -224,7 +224,10 @@ agentcore    AWS SigV4/IAM (not OAuth)  AWS Bedrock                   ❌ out of
 Concrete values (verified): codex `app_EMoamEEZ73f0CkXaXp7hrann` (no secret, form); claude
 `9d1c250a-e61b-44d9-88ed-5944d1962f5e` (no secret, JSON no-scope); gemini
 `681255809395-oo8ft2oprdrnp9e3aqf6av3hmdib135j` (+ bundled `GOCSPX-…` — non-confidential by spec but **not** safe as raw repo text; storage decided in §9 Q2); agy
-`1071006060591-tmhssin2h21lcre235vtolojh4g403ep` (likely bundled — UNCONFIRMED; redirect `localhost:51121`,
+`1071006060591-tmhssin2h21lcre235vtolojh4g403ep` (bundled secret **CONFIRMED** 2026-06-24 — the same public
+`GOCSPX-…` constant ≥20 antigravity-auth ecosystem repos hardcode verbatim, e.g. `NoeFabris/opencode-antigravity-auth`,
+`router-for-me/CLIProxyAPI`; **deliberately not reproduced here** — pasting the literal would trip the very
+§9 Q2 push-protection, so we dogfood the env/encode decision; redirect `localhost:51121/oauth-callback`,
 scopes add `cclog`/`experimentsandconfigs`; inference `cloudcode-pa`, needs GCP `project` field; one OAuth
 unlocks Claude+Gemini+GPT-OSS; agy ≠ Messages V1).
 
@@ -304,8 +307,9 @@ it exposes CLI subcommands the relay shells out to via `$OPENAB_AGENT_AUTH_COMMA
      label it as such inline so reviewers aren't misled. This is a well-adopted pattern: **rclone** declares a
      `rcloneEncryptedClientSecret` constant and calls `obscure.MustReveal()` at runtime for exactly this
      reason (bypass static scanners / partner auto-revoke, explicitly not encryption) — see §10.
-   Still confirm whether agy actually *requires* a secret, from the plugin `src/constants.ts` / agy binary;
-   under the (b) default this is moot unless agy is shipped as a bundled zero-config binary.
+   agy secret requirement is now **CONFIRMED** (2026-06-24): agy *does* require a `GOCSPX-…` client_secret, a
+   public ecosystem constant (≥20 antigravity-auth repos hardcode it — §6). Under the (b) env default it is
+   injected, never committed; (a) applies only if agy is ever shipped as a bundled zero-config binary.
 3. **Vendor go/no-go — DECIDED (Brett 2026-06-24).**
    - **GO:** `gemini`, `grok` (high value, clean APIs, low ToS risk) — first wave.
    - **GO (experimental, opt-in):** `agy`. Marginal eng cost is low — it shares gemini's Google Code-Assist
@@ -315,7 +319,12 @@ it exposes CLI subcommands the relay shells out to via `$OPENAB_AGENT_AUTH_COMMA
      behind an **explicit opt-in flag with a documented ToS caveat**, and the user accepts the risk on their
      own subscription. Watch for `429 RESOURCE_EXHAUSTED` (shared-quota exhaustion) and `cloudcode-pa`
      endpoint drift (semi-internal Google API; agy auto-updates). openab already runs agy in production via
-     the ECS `antigravity` variant, so the auth/quota behaviour is first-hand-known.
+     the ECS `antigravity` variant, so the auth/quota behaviour is first-hand-known. **Ecosystem evidence
+     (GitHub survey 2026-06-24):** agy OAuth is widely ported — ≥20 public repos hardcode the identical
+     client_id/secret (opencode/pi/hermes/openclaw plugins, standalone proxies) — so the integration is
+     proven, *and* the same ecosystem is full of "anti-ban", "strict quota locking" and "multi-account
+     rotation" tooling, which empirically confirms the ToS-ban and `429` quota-exhaustion risks are real, not
+     theoretical — reinforcing the opt-in gate.
    - **No-Go:** `cursor` (reverse-engineered proprietary proxy + high ToS/account-ban risk), `kiro` (AWS Q
      event-stream protocol — high maintenance cost). Revisit only on explicit demand.
 4. Does the locked-RMW fix also subsume `openab-agent-mcp.md` open items #1 (reqwest 0.12/0.13 split) and
@@ -334,6 +343,10 @@ it exposes CLI subcommands the relay shells out to via `$OPENAB_AGENT_AUTH_COMMA
 ### External — projects
 - Pi `earendil-works/pi` (ported flow; `CLAUDE_CODE_OAUTH_TOKEN` #3591) · OpenClaw · Hermes Agent
 - Gemini CLI `code_assist/oauth2.ts` · `NoeFabris/opencode-antigravity-auth` (+ `ANTIGRAVITY_API_SPEC.md`)
+- **Antigravity (agy) OAuth ecosystem** (GitHub survey 2026-06-24, ≥20 repos hardcoding the same client_id/
+  secret) — `NoeFabris/opencode-antigravity-auth`, `router-for-me/CLIProxyAPI`, `andyvandaric/opencode-ag-auth`
+  (quota-locking/anti-ban), `Meapri/hermes-google-antigravity-plugin`, `wbbtmusic/openclaw-antigravity-oauth`;
+  evidence the integration is proven and that ToS-ban/quota mitigations are a real ecosystem concern (§9 Q3)
 - GitHub `copilot-cli` · Kiro CLI / `pi-provider-kiro` / `kiro-gateway` · xAI API / `pi-xai-oauth`
 - Xiaomi `MiMo-Code`
 - **rclone `rclone/rclone`** — `rcloneEncryptedClientSecret` constant + runtime `obscure.MustReveal()`: the

From db88c7a2ed76cda9f7251bec3d86837e7fd2df1e Mon Sep 17 00:00:00 2001
From: Brett Chien <brettchien@users.noreply.github.com>
Date: Thu, 25 Jun 2026 00:35:30 +0800
Subject: [PATCH 07/18] docs(adr): agy-vendor vs agy-CLI/ACP clarification +
 MCP store revamp in scope (r5)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Grounded by a codebase survey (2026-06-24):

- §4/§6: clarify "agy as a GO vendor" means a native OAuthVendor + Code-Assist
  inference provider — it does NOT run the agy CLI. agy speaks no ACP; the
  existing `antigravity` runtime variant (Mira/ECS) only works via a dedicated
  agy-acp adapter that shells out to the agy binary per prompt and polls its
  SQLite DB. The provider path sidesteps ACP entirely and supersedes the
  CLI-wrapper for native use, so agy's lack of ACP doesn't block the GO.
- §5.4/§9 Q4: make the MCP CredentialStore revamp explicitly in-scope. auth.json
  has NO lock today (only atomic rename); provider save_tokens (auth.rs:234) and
  McpCredentialStore::save/clear (auth.rs:284-328) are two independent unlocked
  RMW callers, so with_auth_locked must wrap BOTH or the race persists. Also
  correct the stale save_tokens_for name and note with_auth_locked is new.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---
 docs/adr/openab-agent-oauth.md | 32 +++++++++++++++++++++++++-------
 1 file changed, 25 insertions(+), 7 deletions(-)

diff --git a/docs/adr/openab-agent-oauth.md b/docs/adr/openab-agent-oauth.md
index 2117ce936..c09fdb413 100644
--- a/docs/adr/openab-agent-oauth.md
+++ b/docs/adr/openab-agent-oauth.md
@@ -82,8 +82,13 @@ invariant (both flag file locks but for the simpler single-process case).
 
 1. **Adopt a two-axis model.** Auth (how a credential is obtained/refreshed/stored) and inference (how a
    request is sent) are **orthogonal** and must not be coupled. A vendor that serves Claude over Google's
-   Code Assist envelope (agy — the Antigravity CLI variant; see §6) reuses neither Anthropic's Messages-V1
-   transport nor its auth.
+   Code Assist envelope (agy; see §6) reuses neither Anthropic's Messages-V1 transport nor its auth.
+   **Note — "agy as a GO vendor" ≠ running the agy CLI.** This ADR adds agy as an `OAuthVendor` + Code-Assist
+   inference *provider* consumed by the **native** openab-agent; it does **not** spawn the agy binary. That
+   matters because **agy speaks no ACP** — the existing `antigravity` *runtime variant* (Mira on ECS) only
+   works via a dedicated `agy-acp` adapter that shells out to the agy CLI per prompt and polls its SQLite
+   conversation DB to synthesize ACP events. The vendor/provider path here **sidesteps ACP entirely** and
+   supersedes that CLI-wrapper for native use — agy's lack of ACP is irrelevant to it (see §6).
 2. **Auth axis = one `OAuthVendor` descriptor + a shared driver built on the official `oauth2` crate** (§5.1;
    the crate is already in-tree via the MCP side). New vendor = new descriptor; PKCE/CSRF/auth-code
    exchange/refresh come from the crate, **not hand-rolled**. The few vendor quirks (e.g. Anthropic's JSON
@@ -146,9 +151,15 @@ read it as a Bearer subscription source, precedence: `ANTHROPIC_API_KEY` → `CL
 stored `anthropic-oauth` tenant. This is the recommended fleet mode; interactive OAuth is for self-service.
 
 ### 5.4 Concurrency & storage invariant (folds in the flock decision)
-`auth.json` is multi-tenant (`codex`, `anthropic-oauth`, `mcp:<server>`×N) and written by two code paths
-(`save_tokens_for`, `McpCredentialStore`) across **multiple processes** (one per Discord thread). Two
-distinct hazards demand two locks.
+`auth.json` is multi-tenant (`codex`, `anthropic-oauth`, `mcp:<server>`×N) and written by **two independent
+read-modify-write call sites** across **multiple processes** (one per Discord thread): provider tokens via
+`save_tokens` (`openab-agent/src/auth.rs:234`) and **MCP** OAuth creds via `McpCredentialStore::save`/`clear`
+(`auth.rs:284-328`), plus the MCP pending-login finalize path. **Today there is *no* lock on `auth.json`** —
+only an atomic `tmp+rename` in the shared low-level `write_auth_file`; the two RMW callers each do their own
+unlocked `read_auth_file → mutate map → write_auth_file`, so a concurrent provider-refresh and MCP-save
+last-writer-wins the *entire* map (lost update). Two distinct hazards demand two locks — **and the fix is not
+provider-only: the MCP `CredentialStore` is a co-equal RMW caller that must be routed through the same
+invariant** (see §9 Q4). (`with_auth_locked` below is *new* — the thing this ADR introduces.)
 
 **(a) File integrity — one global lock.** Every write funnels through a single locked RMW so concurrent
 writers never lost-update the shared file:
@@ -327,8 +338,15 @@ it exposes CLI subcommands the relay shells out to via `$OPENAB_AGENT_AUTH_COMMA
      theoretical — reinforcing the opt-in gate.
    - **No-Go:** `cursor` (reverse-engineered proprietary proxy + high ToS/account-ban risk), `kiro` (AWS Q
      event-stream protocol — high maintenance cost). Revisit only on explicit demand.
-4. Does the locked-RMW fix also subsume `openab-agent-mcp.md` open items #1 (reqwest 0.12/0.13 split) and
-   #8 (doctor/runtime two-store split)? Likely partial.
+4. **MCP credential store must be revamped together — IN SCOPE (Brett 2026-06-24).** The locked-RMW +
+   per-tenant-lock invariant (§5.4) is **not** a provider-only change. `McpCredentialStore::save`/`clear`
+   (`auth.rs:284-328`) is a co-equal unlocked RMW writer of `auth.json`, and the MCP pending-login finalize
+   path writes it too. Introducing `with_auth_locked` therefore requires routing **both** the provider
+   (`save_tokens`) and the MCP `CredentialStore` (+ pending finalize) through it — otherwise the lock is
+   bypassed by half the writers and the race persists. Land them in the same change. This also likely
+   subsumes (partially) `openab-agent-mcp.md` open items #1 (reqwest 0.12/0.13 split) and #8 (doctor/runtime
+   two-store split); confirm against that ADR. Connects to the pending OAuth-revamp follow-up flagged on the
+   `feat/openab-agent-mcp-resilience` PR.
 
 ---
 

From 2c6063b182e9d0982f01c996af62b1697fbd5904 Mon Sep 17 00:00:00 2001
From: Brett Chien <brettchien@users.noreply.github.com>
Date: Thu, 25 Jun 2026 01:04:10 +0800
Subject: [PATCH 08/18] =?UTF-8?q?docs(adr):=20fix=20stale=20=C2=A79=20Q4?=
 =?UTF-8?q?=20cross-reference?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The §9 Q4 tail referenced 'openab-agent-mcp.md open items #1 (reqwest
0.12/0.13 split) and #8 (doctor/runtime two-store split)' — but that ADR's
§10 Open Questions has only two items (mcp.json location; native-vs-broker
parity), neither matching, and no such numbered items / terms exist anywhere
in it. The phantom reference dated to the original draft. Replace with an
accurate statement: McpCredentialStore reuses the same TokenStore/auth.json
storage (openab-agent-mcp.md §6.1), so the lock lands once and serves both;
the reqwest version conflict is the rmcp-OAuth dependency issue surfaced on
the feat/openab-agent-mcp-resilience PR, not an mcp-ADR open item.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---
 docs/adr/openab-agent-oauth.md | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/docs/adr/openab-agent-oauth.md b/docs/adr/openab-agent-oauth.md
index c09fdb413..475b0cae6 100644
--- a/docs/adr/openab-agent-oauth.md
+++ b/docs/adr/openab-agent-oauth.md
@@ -343,10 +343,11 @@ it exposes CLI subcommands the relay shells out to via `$OPENAB_AGENT_AUTH_COMMA
    (`auth.rs:284-328`) is a co-equal unlocked RMW writer of `auth.json`, and the MCP pending-login finalize
    path writes it too. Introducing `with_auth_locked` therefore requires routing **both** the provider
    (`save_tokens`) and the MCP `CredentialStore` (+ pending finalize) through it — otherwise the lock is
-   bypassed by half the writers and the race persists. Land them in the same change. This also likely
-   subsumes (partially) `openab-agent-mcp.md` open items #1 (reqwest 0.12/0.13 split) and #8 (doctor/runtime
-   two-store split); confirm against that ADR. Connects to the pending OAuth-revamp follow-up flagged on the
-   `feat/openab-agent-mcp-resilience` PR.
+   bypassed by half the writers and the race persists. Land them in the same change. (`McpCredentialStore`
+   reuses the same `TokenStore`/`auth.json` storage that `openab-agent-mcp.md` §6.1 describes, so the lock
+   lands once and serves both.) Connects to the pending OAuth-revamp follow-up flagged on the
+   `feat/openab-agent-mcp-resilience` PR — itself driven by the rmcp/`reqwest` dependency-version conflict its
+   OAuth adoption surfaced.
 
 ---
 

From 515d49b08ce66a6fb80d2ac88bbf6f3d633b7fde Mon Sep 17 00:00:00 2001
From: Brett Chien <brettchien@users.noreply.github.com>
Date: Thu, 25 Jun 2026 01:05:44 +0800
Subject: [PATCH 09/18] =?UTF-8?q?docs(adr):=20=C2=A79=20Q2=20=E2=80=94=20e?=
 =?UTF-8?q?ncode-at-rest=20is=20the=20default,=20env=20is=20the=20alternat?=
 =?UTF-8?q?ive=20(Brett)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Brett's call given the 93%-plaintext ecosystem survey: keep the bundled
zero-config UX via encode-at-rest (obscure, rclone obscure.MustReveal style)
as the DEFAULT, with env-injection as the alternative for fleet/pod. Reverses
the prior (b)-default ordering. Framing unchanged: encode-at-rest is
scanner-evasion for a non-confidential value, not a security control. Record
the survey numbers (99/107 plaintext; auto-revoke largely unrealized) and that
the real risk mitigated is org-repo push-protection friction, not credential
loss.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---
 docs/adr/openab-agent-oauth.md | 29 +++++++++++++++++------------
 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/docs/adr/openab-agent-oauth.md b/docs/adr/openab-agent-oauth.md
index 475b0cae6..2e688af1e 100644
--- a/docs/adr/openab-agent-oauth.md
+++ b/docs/adr/openab-agent-oauth.md
@@ -302,25 +302,30 @@ it exposes CLI subcommands the relay shells out to via `$OPENAB_AGENT_AUTH_COMMA
    drop the three duplicated default sites (`llm.rs:153`, `acp.rs:385/446`). Consequence: removes the
    zero-config default (deployments set model via values.yaml/env already; needs a clear error message +
    CHANGELOG note). Also eliminates the silent Opus cost bump for API-key users.
-2. **Bundled `client_secret` storage — DECIDED (Brett 2026-06-24): env-injection default; encode-at-rest
-   fallback.** Google Code-Assist vendors (gemini, agy) ship a `GOCSPX-…` desktop-app secret. By RFC 8252 and
+2. **Bundled `client_secret` storage — DECIDED (Brett 2026-06-24): encode-at-rest default; env-injection
+   alternative.** Google Code-Assist vendors (gemini, agy) ship a `GOCSPX-…` desktop-app secret. By RFC 8252 and
    Google's own docs this value is **non-confidential** (installed-app secret, "obviously not treated as a
    secret") — there is no confidentiality to protect, so obscuring it adds zero cryptographic security. But it
    is **not safe as raw text in a public repo for operational reasons**: GitHub push-protection covers Google
    secrets **by default** (changelog 2026-03), so a raw `GOCSPX-` literal blocks contributor `git push`, and
    GitHub↔Google partner token-scanning may **auto-revoke** the credential once it lands in a public commit.
    Decision: do **not** commit the raw literal —
-   - **(b) inject at runtime via env is the default** (no secret in the repo at all): cleanest provenance,
-     consistent with the §5.3 env-route preference, removes the ToS/leak surface entirely. Costs the bundled
-     zero-config UX, which fleet/pod deployments don't need (they set env already).
-   - **(a) encode-at-rest** (split/base64) is the **fallback** only where a bundled zero-config binary is
-     genuinely required: it is **scanner-evasion for an already-public value, *not* a security control** —
-     label it as such inline so reviewers aren't misled. This is a well-adopted pattern: **rclone** declares a
-     `rcloneEncryptedClientSecret` constant and calls `obscure.MustReveal()` at runtime for exactly this
-     reason (bypass static scanners / partner auto-revoke, explicitly not encryption) — see §10.
+   - **(a) encode-at-rest is the default** (split/base64, decoded at runtime): keeps the bundled zero-config
+     UX while dodging push-protection and partner auto-revoke. It is **scanner-evasion for an already-public
+     value, *not* a security control** — label it as such inline so reviewers aren't misled into treating it
+     as a real secret. Well-adopted pattern: **rclone** declares a `rcloneEncryptedClientSecret` constant and
+     calls `obscure.MustReveal()` at runtime for exactly this reason (bypass static scanners / partner
+     auto-revoke, explicitly not encryption) — see §10.
+   - **(b) inject at runtime via env is the alternative** (no secret in the repo at all): cleanest provenance,
+     consistent with the §5.3 env-route preference; preferred where a deployment already sets env (fleet/pod)
+     and the bundled zero-config UX isn't needed.
+   Empirically the ecosystem overwhelmingly commits the literal plaintext (survey: 99/107 ≈ 93%; obscure 4,
+   env 1) and the shared secret is **not** being aggressively auto-revoked despite 100+ public copies — so the
+   risk we mitigate is mainly **contributor push-protection friction on the `openabdev/openab` org repo**, not
+   credential loss. Encode-at-rest buys that with zero UX cost, hence the default.
    agy secret requirement is now **CONFIRMED** (2026-06-24): agy *does* require a `GOCSPX-…` client_secret, a
-   public ecosystem constant (≥20 antigravity-auth repos hardcode it — §6). Under the (b) env default it is
-   injected, never committed; (a) applies only if agy is ever shipped as a bundled zero-config binary.
+   public ecosystem constant (≥20 antigravity-auth repos hardcode it — §6); it ships encode-at-rest by default
+   per this decision (or via env where (b) applies).
 3. **Vendor go/no-go — DECIDED (Brett 2026-06-24).**
    - **GO:** `gemini`, `grok` (high value, clean APIs, low ToS risk) — first wave.
    - **GO (experimental, opt-in):** `agy`. Marginal eng cost is low — it shares gemini's Google Code-Assist

From f3701103c9f6c79f9a600e561bb1827b798e931f Mon Sep 17 00:00:00 2001
From: Brett Chien <brettchien@users.noreply.github.com>
Date: Thu, 25 Jun 2026 01:18:44 +0800
Subject: [PATCH 10/18] =?UTF-8?q?feat(auth):=20cross-process=20locking=20f?=
 =?UTF-8?q?or=20auth.json=20=E2=80=94=20codex=20+=20MCP=20(ADR=20=C2=A75.4?=
 =?UTF-8?q?)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Implements the ADR §5.4 invariant. auth.json had NO lock: provider
save_tokens and McpCredentialStore::save/clear each did an independent
unlocked read-modify-write, so a concurrent codex refresh and MCP save
last-writer-wins the whole map, and N processes (one openab-agent per Discord
thread) could each refresh the codex token with the same RT_old → OAuth 2.1
§10.4 token-family revocation (fleet-wide logout).

Two locks, flock(2) on sidecar files (kernel auto-releases on death), cfg(unix)
with a non-unix no-op:

- with_auth_locked: global exclusive lock across the re-read -> mutate ->
  atomic-write. ALL writers funnel through it — save_tokens (codex) and
  McpCredentialStore::save/clear (MCP) — so writers merge onto the latest
  on-disk state instead of lost-updating. Held only for the fast file RMW,
  never across network I/O.
- lock_tenant_refresh: per-tenant refresh serialisation. get_valid_token /
  force_refresh take the codex tenant lock (non-blocking try + async backoff +
  10s timeout, held across the network refresh), with a double-checked re-read
  so a process that waited adopts the token another already refreshed → exactly
  one real refresh per tenant per expiry, no RT_old reuse.

Uses libc::flock (already a cfg(unix) dep); rustix is NOT in-tree (ADR text was
optimistic). New test asserts the locked RMW merges codex + MCP tenants without
lost-update. fmt + clippy -D warnings + test (191 passed) green.

Known gap (for review): the MCP *network* refresh is owned by rmcp's
AuthorizationManager (it refreshes then calls CredentialStore::save), so MCP
writes get the file-integrity lock but MCP refreshes are not yet tenant-
serialised. Closing that needs an rmcp-level hook — proposed as follow-up.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---
 openab-agent/src/auth.rs | 255 ++++++++++++++++++++++++++++++++++-----
 1 file changed, 222 insertions(+), 33 deletions(-)

diff --git a/openab-agent/src/auth.rs b/openab-agent/src/auth.rs
index d9a701237..24040fa0a 100644
--- a/openab-agent/src/auth.rs
+++ b/openab-agent/src/auth.rs
@@ -214,6 +214,145 @@ fn write_auth_file(path: &Path, map: &HashMap<String, AuthEntry>) -> Result<()>
     Ok(())
 }
 
+// ── auth.json cross-process locking (ADR §5.4) ──────────────────────────────
+//
+// `auth.json` is written by multiple processes (one openab-agent per Discord
+// thread) and by two code paths within each (`save_tokens` for the codex tenant
+// + `McpCredentialStore` for MCP servers). Two hazards, two locks:
+//
+//   (a) File integrity — every read-modify-write funnels through `with_auth_locked`,
+//       which holds an exclusive `flock` on an `auth.json.global.lock` sidecar
+//       across the re-read → mutate → atomic-write. The re-read *inside* the lock
+//       is what makes concurrent writers merge instead of lost-update.
+//   (b) Refresh-token rotation — `lock_tenant_refresh` serialises the network
+//       refresh per tenant so concurrent processes present a rotated `RT_old`
+//       only once, never tripping OAuth 2.1 §10.4 token-family revocation.
+//
+// `flock(2)` (not a sentinel lockfile) so the kernel auto-releases on fd close /
+// process death — no stale lock, no orphan cleanup. The lock lives on a sidecar,
+// never on `auth.json` itself, because the atomic tmp+rename swaps that inode out
+// from under any lock held on it. `#[cfg(unix)]`; a non-unix build is a no-op
+// (openab-agent is de-facto unix-only — see `write_auth_file`).
+
+/// Sidecar lock path `auth.json.<suffix>.lock`, next to the auth file so a
+/// test-injected tempdir locks its own sidecar rather than the real `$HOME` one.
+#[cfg(unix)]
+fn lock_path_for(auth: &Path, suffix: &str) -> PathBuf {
+    let dir = auth.parent().unwrap_or_else(|| Path::new("."));
+    dir.join(format!("auth.json.{suffix}.lock"))
+}
+
+/// RAII guard releasing the advisory lock on drop. The kernel also drops it on
+/// fd close / process death, so a crashed holder never wedges the file.
+#[cfg(unix)]
+struct AuthFileLock {
+    file: std::fs::File,
+}
+
+#[cfg(unix)]
+impl Drop for AuthFileLock {
+    fn drop(&mut self) {
+        use std::os::unix::io::AsRawFd;
+        // SAFETY: `self.file` owns a valid fd; flock has no memory effects.
+        unsafe { libc::flock(self.file.as_raw_fd(), libc::LOCK_UN) };
+    }
+}
+
+#[cfg(unix)]
+fn open_lock_file(lock: &Path) -> Result<std::fs::File> {
+    use std::os::unix::fs::OpenOptionsExt;
+    if let Some(dir) = lock.parent() {
+        std::fs::create_dir_all(dir)?;
+    }
+    Ok(std::fs::OpenOptions::new()
+        .write(true)
+        .create(true)
+        .truncate(false)
+        .mode(0o600)
+        .open(lock)?)
+}
+
+/// Blocking exclusive lock. Used ONLY for the global file RMW, which performs no
+/// network I/O while held, so acquisition blocks at most for another process's
+/// fast tmp+rename — never for a slow refresh (those take the per-tenant lock).
+#[cfg(unix)]
+fn flock_exclusive(lock: &Path) -> Result<AuthFileLock> {
+    use std::os::unix::io::AsRawFd;
+    let file = open_lock_file(lock)?;
+    // SAFETY: valid fd held by `file`; flock has no memory effects.
+    let rc = unsafe { libc::flock(file.as_raw_fd(), libc::LOCK_EX) };
+    if rc != 0 {
+        return Err(std::io::Error::last_os_error().into());
+    }
+    Ok(AuthFileLock { file })
+}
+
+/// Non-blocking exclusive lock; `Ok(None)` means another holder owns it.
+#[cfg(unix)]
+fn flock_try_exclusive(lock: &Path) -> Result<Option<AuthFileLock>> {
+    use std::os::unix::io::AsRawFd;
+    let file = open_lock_file(lock)?;
+    // SAFETY: valid fd held by `file`; flock has no memory effects.
+    let rc = unsafe { libc::flock(file.as_raw_fd(), libc::LOCK_EX | libc::LOCK_NB) };
+    if rc == 0 {
+        return Ok(Some(AuthFileLock { file }));
+    }
+    let err = std::io::Error::last_os_error();
+    if err.raw_os_error() == Some(libc::EWOULDBLOCK) {
+        Ok(None)
+    } else {
+        Err(err.into())
+    }
+}
+
+/// (a) File-integrity funnel (ADR §5.4). Holds the global sidecar lock across a
+/// re-read → mutate → atomic-write so the codex `save_tokens` path AND the MCP
+/// `McpCredentialStore` never lost-update the shared map: each writer merges onto
+/// the latest on-disk state. A corrupt file is quarantined by `read_auth_file`
+/// and treated as empty (`unwrap_or_default`), matching the prior save behaviour.
+fn with_auth_locked<R>(
+    path: &Path,
+    f: impl FnOnce(&mut HashMap<String, AuthEntry>) -> R,
+) -> Result<R> {
+    #[cfg(unix)]
+    let _guard = flock_exclusive(&lock_path_for(path, "global"))?;
+    let mut map = read_auth_file(path).unwrap_or_default();
+    let r = f(&mut map);
+    write_auth_file(path, &map)?;
+    Ok(r)
+}
+
+/// (b) Per-tenant refresh serialisation (ADR §5.4). The returned guard is held by
+/// the caller across the network refresh so concurrent processes do exactly one
+/// real refresh per tenant — never presenting a rotated `RT_old` twice (OAuth 2.1
+/// §10.4 family revocation). Non-blocking acquire + async backoff so a refresh in
+/// flight elsewhere never blocks this executor thread; on timeout we return `None`
+/// and let the caller proceed (degrade to a possible double-refresh, never wedge).
+#[cfg(unix)]
+async fn lock_tenant_refresh(auth: &Path, tenant: &str) -> Option<AuthFileLock> {
+    let lock = lock_path_for(auth, &format!("refresh.{tenant}"));
+    let deadline = std::time::Instant::now() + std::time::Duration::from_secs(10);
+    loop {
+        match flock_try_exclusive(&lock) {
+            Ok(Some(guard)) => return Some(guard),
+            Ok(None) => {
+                if std::time::Instant::now() >= deadline {
+                    tracing::warn!(
+                        tenant,
+                        "timed out waiting for refresh lock; proceeding unserialised"
+                    );
+                    return None;
+                }
+                tokio::time::sleep(std::time::Duration::from_millis(100)).await;
+            }
+            Err(e) => {
+                tracing::warn!(tenant, error = %e, "refresh lock unavailable; proceeding unserialised");
+                return None;
+            }
+        }
+    }
+}
+
 pub fn load_tokens() -> Result<TokenStore> {
     let path = auth_path();
     let map = read_auth_file(&path).map_err(|_| {
@@ -232,10 +371,9 @@ pub fn load_tokens() -> Result<TokenStore> {
 }
 
 fn save_tokens(store: &TokenStore) -> Result<()> {
-    let path = auth_path();
-    let mut map = read_auth_file(&path).unwrap_or_default();
-    map.insert(CODEX_NAMESPACE.to_string(), AuthEntry::Token(store.clone()));
-    write_auth_file(&path, &map)
+    with_auth_locked(&auth_path(), |map| {
+        map.insert(CODEX_NAMESPACE.to_string(), AuthEntry::Token(store.clone()));
+    })
 }
 
 /// rmcp [`CredentialStore`] backed by the shared `auth.json` file (ADR §6.1
@@ -283,61 +421,89 @@ impl CredentialStore for McpCredentialStore {
 
     async fn save(&self, mut credentials: StoredCredentials) -> Result<(), AuthError> {
         use oauth2::{RefreshToken, TokenResponse};
-        let mut map = read_auth_file(&self.path).unwrap_or_default();
 
         // OAuth 2.1 §10.4: when a refresh response omits `refresh_token`, the
         // prior one stays valid. rmcp's `refresh_token()` rebuilds the stored
         // credentials from the refresh response alone, so a rotating-but-omitting
         // AS would lose our fallback — splice the prior refresh_token back in.
+        // The prior-read happens inside the lock (re-read) so two writers can't
+        // race the splice. The whole RMW funnels through `with_auth_locked` so an
+        // interleaving codex `save_tokens` never lost-updates this MCP entry.
         let incoming_has_refresh = credentials
             .token_response
             .as_ref()
             .and_then(|tr| tr.refresh_token())
             .is_some_and(|rt| !rt.secret().is_empty());
-        if !incoming_has_refresh {
-            if let Some(AuthEntry::Mcp(old)) = map.get(&self.key) {
-                let prior = old
-                    .token_response
-                    .as_ref()
-                    .and_then(|tr| tr.refresh_token())
-                    .map(|rt| rt.secret().to_string())
-                    .filter(|s| !s.is_empty());
+        let key = self.key.clone();
+        with_auth_locked(&self.path, move |map| {
+            if !incoming_has_refresh {
+                let prior = match map.get(&key) {
+                    Some(AuthEntry::Mcp(old)) => old
+                        .token_response
+                        .as_ref()
+                        .and_then(|tr| tr.refresh_token())
+                        .map(|rt| rt.secret().to_string())
+                        .filter(|s| !s.is_empty()),
+                    _ => None,
+                };
                 if let (Some(prior), Some(tr)) = (prior, credentials.token_response.as_mut()) {
                     tr.set_refresh_token(Some(RefreshToken::new(prior)));
                 }
             }
-        }
-
-        map.insert(self.key.clone(), AuthEntry::Mcp(credentials));
-        write_auth_file(&self.path, &map).map_err(|e| AuthError::InternalError(e.to_string()))
+            map.insert(key.clone(), AuthEntry::Mcp(credentials));
+        })
+        .map_err(|e| AuthError::InternalError(e.to_string()))
     }
 
     async fn clear(&self) -> Result<(), AuthError> {
-        let mut map = match read_auth_file(&self.path) {
-            Ok(m) => m,
-            Err(_) => return Ok(()),
+        // Same global lock as every other writer; preserves the "last entry
+        // removed → delete the file" behaviour rather than leaving an empty map.
+        let run = || -> Result<()> {
+            #[cfg(unix)]
+            let _guard = flock_exclusive(&lock_path_for(&self.path, "global"))?;
+            let mut map = match read_auth_file(&self.path) {
+                Ok(m) => m,
+                Err(_) => return Ok(()),
+            };
+            if map.remove(&self.key).is_none() {
+                return Ok(());
+            }
+            if map.is_empty() {
+                let _ = std::fs::remove_file(&self.path);
+                return Ok(());
+            }
+            write_auth_file(&self.path, &map)
         };
-        if map.remove(&self.key).is_none() {
-            return Ok(());
-        }
-        if map.is_empty() {
-            let _ = std::fs::remove_file(&self.path);
-            return Ok(());
-        }
-        write_auth_file(&self.path, &map).map_err(|e| AuthError::InternalError(e.to_string()))
+        run().map_err(|e| AuthError::InternalError(e.to_string()))
     }
 }
 
 pub async fn get_valid_token() -> Result<String> {
-    let mut store = load_tokens()?;
-    if store.is_expired() {
-        store = refresh_token(&store).await?;
-        save_tokens(&store)?;
+    // 1. Fast path: a fresh token needs no lock.
+    let store = load_tokens()?;
+    if !store.is_expired() {
+        return Ok(store.access_token);
     }
-    Ok(store.access_token)
+    // 2. Serialise the refresh for the codex tenant — held across the network
+    //    call so a second process does not present the same RT_old (§5.4 (b)).
+    #[cfg(unix)]
+    let _refresh_guard = lock_tenant_refresh(&auth_path(), CODEX_NAMESPACE).await;
+    // 3. Double-check: another process may have refreshed while we waited.
+    let store = load_tokens()?;
+    if !store.is_expired() {
+        return Ok(store.access_token);
+    }
+    // 4. Exactly one network refresh per tenant per expiry (tenant lock held).
+    let fresh = refresh_token(&store).await?;
+    // 5. Commit under the global file lock.
+    save_tokens(&fresh)?;
+    Ok(fresh.access_token)
 }
 
 pub async fn force_refresh() -> Result<String> {
+    // Serialise even a forced refresh so two of them can't both rotate RT_old.
+    #[cfg(unix)]
+    let _refresh_guard = lock_tenant_refresh(&auth_path(), CODEX_NAMESPACE).await;
     let store = load_tokens()?;
     let new_store = refresh_token(&store).await?;
     save_tokens(&new_store)?;
@@ -1090,4 +1256,27 @@ mod tests {
         let pending = map.get("mcp-pending:srv");
         assert!(matches!(pending, Some(AuthEntry::Pending(_))));
     }
+
+    #[test]
+    fn with_auth_locked_merges_concurrent_tenants_no_lost_update() {
+        // Two locked RMWs against the same file — the codex tenant and an MCP
+        // tenant — must both survive: the second writer re-reads inside the lock
+        // and merges, instead of clobbering the first (the §5.4 lost-update fix).
+        let dir = tempfile::tempdir().unwrap();
+        let path = dir.path().join("auth.json");
+
+        with_auth_locked(&path, |m| {
+            m.insert("codex".to_string(), AuthEntry::Token(make_store(1)));
+        })
+        .unwrap();
+        with_auth_locked(&path, |m| {
+            m.insert("github".to_string(), AuthEntry::Mcp(make_mcp_creds()));
+        })
+        .unwrap();
+
+        let map = read_auth_file(&path).unwrap();
+        assert_eq!(map.len(), 2, "second write merged, did not lost-update");
+        assert_eq!(token_of(map.get("codex")).expires_at, 1);
+        assert!(matches!(map.get("github"), Some(AuthEntry::Mcp(_))));
+    }
 }

From 38007e28eb8b46e1b566b46e1d48b638be7b2ee0 Mon Sep 17 00:00:00 2001
From: Brett Chien <brettchien@users.noreply.github.com>
Date: Thu, 25 Jun 2026 01:24:24 +0800
Subject: [PATCH 11/18] =?UTF-8?q?feat(auth):=20address=20Mira=20code=20rev?=
 =?UTF-8?q?iew=20=E2=80=94=20pending=20GC=20+=20portable=20WouldBlock?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Round-2 code review (Mira):
- §7 pending-entry GC: add created_at to PendingPasteLogin and sweep
  AuthEntry::Pending older than 15 min inside with_auth_locked on every write,
  so abandoned /auth two-step attempts don't accumulate. PendingPasteLogin is
  currently a legacy tombstone with no live writer (PKCE state lives in rmcp's
  in-memory StateStore); the field + GC land now per ADR §7 and are
  forward-compatible with the forthcoming /auth two-step flow, and meanwhile
  sweep legacy stray entries (created_at default 0 reads as ancient). New test
  covers stale-swept / fresh-kept / real-tenant-untouched.
- flock_try_exclusive: match std::io::ErrorKind::WouldBlock instead of a single
  raw EWOULDBLOCK errno, covering EAGAIN/EWOULDBLOCK across libc/BSD.
- MCP network-refresh serialisation stays a follow-up (rmcp owns the refresh) —
  Mira concurs.

fmt + clippy -D warnings + test (192 passed) green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---
 openab-agent/src/auth.rs | 70 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 69 insertions(+), 1 deletion(-)

diff --git a/openab-agent/src/auth.rs b/openab-agent/src/auth.rs
index 24040fa0a..25c3827db 100644
--- a/openab-agent/src/auth.rs
+++ b/openab-agent/src/auth.rs
@@ -74,6 +74,14 @@ pub struct PendingPasteLogin {
     /// (written before this field existed) deserializable.
     #[serde(default)]
     pub resource: Option<String>,
+    /// Unix-seconds stamp set when the pending entry is written; `with_auth_locked`
+    /// expires entries older than 15 min (ADR §7) so an abandoned `/auth` two-step
+    /// (verifier written, code never pasted) doesn't accumulate. The two-step flow
+    /// that *writes* pending state is forthcoming; today this struct is a legacy
+    /// read-tolerant tombstone, so the field exists mainly for the GC. `#[serde(default)]`
+    /// (= 0) reads pre-existing/legacy entries as ancient → swept on the next write.
+    #[serde(default)]
+    pub created_at: u64,
 }
 
 /// `auth.json` value type. Untagged Serde enum: `TokenStore` has required
@@ -298,7 +306,9 @@ fn flock_try_exclusive(lock: &Path) -> Result<Option<AuthFileLock>> {
         return Ok(Some(AuthFileLock { file }));
     }
     let err = std::io::Error::last_os_error();
-    if err.raw_os_error() == Some(libc::EWOULDBLOCK) {
+    // flock can report EWOULDBLOCK or EAGAIN depending on libc/BSD; both map to
+    // `ErrorKind::WouldBlock`, so use the std abstraction (Mira review).
+    if err.kind() == std::io::ErrorKind::WouldBlock {
         Ok(None)
     } else {
         Err(err.into())
@@ -318,10 +328,27 @@ fn with_auth_locked<R>(
     let _guard = flock_exclusive(&lock_path_for(path, "global"))?;
     let mut map = read_auth_file(path).unwrap_or_default();
     let r = f(&mut map);
+    gc_stale_pending(&mut map);
     write_auth_file(path, &map)?;
     Ok(r)
 }
 
+/// Opportunistic GC (ADR §7): drop `AuthEntry::Pending` entries older than 15 min
+/// on every locked write, so abandoned `/auth` two-step attempts don't accumulate.
+/// `created_at == 0` (legacy/unstamped entries) reads as ancient and is swept.
+const PENDING_TTL_SECONDS: u64 = 15 * 60;
+
+fn gc_stale_pending(map: &mut HashMap<String, AuthEntry>) {
+    let now = SystemTime::now()
+        .duration_since(UNIX_EPOCH)
+        .unwrap_or_default()
+        .as_secs();
+    map.retain(|_, entry| match entry {
+        AuthEntry::Pending(p) => now.saturating_sub(p.created_at) <= PENDING_TTL_SECONDS,
+        _ => true,
+    });
+}
+
 /// (b) Per-tenant refresh serialisation (ADR §5.4). The returned guard is held by
 /// the caller across the network refresh so concurrent processes do exactly one
 /// real refresh per tenant — never presenting a rotated `RT_old` twice (OAuth 2.1
@@ -1013,6 +1040,7 @@ mod tests {
             token_url: "https://example.com/token".to_string(),
             provider_name: "anthropic-mcp".to_string(),
             resource: None,
+            created_at: 0,
         }
     }
 
@@ -1279,4 +1307,44 @@ mod tests {
         assert_eq!(token_of(map.get("codex")).expires_at, 1);
         assert!(matches!(map.get("github"), Some(AuthEntry::Mcp(_))));
     }
+
+    #[test]
+    fn with_auth_locked_gcs_stale_pending_but_keeps_fresh_and_tokens() {
+        // ADR §7: a locked write opportunistically sweeps `Pending` entries older
+        // than 15 min, while fresh pending state and real tenants are untouched.
+        let dir = tempfile::tempdir().unwrap();
+        let path = dir.path().join("auth.json");
+        let now = SystemTime::now()
+            .duration_since(UNIX_EPOCH)
+            .unwrap()
+            .as_secs();
+        // Seed through the un-GC'd writer so the stale entry exists pre-sweep.
+        let mut seed = HashMap::new();
+        seed.insert("codex".to_string(), AuthEntry::Token(make_store(1)));
+        seed.insert(
+            "mcp-pending:stale".to_string(),
+            AuthEntry::Pending(make_pending()), // created_at = 0 → ancient
+        );
+        seed.insert(
+            "mcp-pending:fresh".to_string(),
+            AuthEntry::Pending(PendingPasteLogin {
+                created_at: now,
+                ..make_pending()
+            }),
+        );
+        write_auth_file(&path, &seed).unwrap();
+
+        with_auth_locked(&path, |_m| {}).unwrap();
+
+        let map = read_auth_file(&path).unwrap();
+        assert!(
+            map.get("mcp-pending:stale").is_none(),
+            "stale pending swept"
+        );
+        assert!(
+            matches!(map.get("mcp-pending:fresh"), Some(AuthEntry::Pending(_))),
+            "fresh pending kept"
+        );
+        assert!(map.get("codex").is_some(), "real tenant untouched");
+    }
 }

From f53f4631d56b700dd3395924d7b80bd1e817000b Mon Sep 17 00:00:00 2001
From: Brett Chien <brettchien@users.noreply.github.com>
Date: Thu, 25 Jun 2026 03:04:40 +0800
Subject: [PATCH 12/18] =?UTF-8?q?feat(auth):=20serialise=20MCP=20token=20r?=
 =?UTF-8?q?efresh=20cross-process=20(close=20=C2=A75.4=20(b)=20gap)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Closes the known gap from f370110: MCP refreshes now get the same per-tenant
serialisation as codex.

rmcp's CredentialStore exposes no pre-refresh hook, but openab drives the MCP
refresh explicitly in McpRuntimeManager::resolve_oauth_dial via
client.get_access_token(). Wrap that call (per-server) with the existing
auth::lock_tenant_refresh so only one process refreshes a given server at a
time. No explicit double-check needed: rmcp's get_access_token re-load()s
auth.json each call and returns the cached token without a network refresh when
remaining >= REFRESH_BUFFER (rmcp auth.rs:1238), so a process that loses the
race adopts the token the winner wrote to the shared file — no second RT_old
presentation, no OAuth 2.1 §10.4 family revocation. rmcp already single-flights
within one process via its AuthorizationManager Mutex; this closes the
cross-process gap.

- auth.rs: lock_tenant_refresh + AuthFileLock made pub(crate) for the mcp module.
- ADR §5.4: document the resolve_oauth_dial serialisation point.

fmt + clippy -D warnings + test (192 passed) green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---
 docs/adr/openab-agent-oauth.md  |  6 +++++-
 openab-agent/src/auth.rs        |  4 ++--
 openab-agent/src/mcp/runtime.rs | 19 +++++++++++++++----
 3 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/docs/adr/openab-agent-oauth.md b/docs/adr/openab-agent-oauth.md
index 2e688af1e..753e86fb6 100644
--- a/docs/adr/openab-agent-oauth.md
+++ b/docs/adr/openab-agent-oauth.md
@@ -207,7 +207,11 @@ fn get_or_refresh(tenant: &str) -> Result<String> {
   `ci-openab-agent.yml` is linux, deploy is always container; Windows binaries are the broker only.)
 - Each MCP `mcp:<server>` tenant takes its own tenant lock by the same rule, so the MCP `CredentialStore`
   refreshes are serialized per server too — the invariant serves it directly (see `openab-agent-mcp.md`
-  §6.1).
+  §6.1). rmcp owns the MCP refresh internally (no pre-refresh `CredentialStore` hook), but openab drives it
+  explicitly at `resolve_oauth_dial` (`mcp/runtime.rs`) via `client.get_access_token()`; the per-server
+  tenant lock wraps that call, and rmcp's `get_access_token` re-`load()`s `auth.json` and skips the network
+  refresh when the token is already fresh, so the lock-loser adopts the winner's token (cross-process
+  single-flight) without a second `RT_old` presentation.
 - **Until this lands**, prefer the `CLAUDE_CODE_OAUTH_TOKEN` env route (§5.3 — no refresh write, no race);
   treat interactive Anthropic OAuth as not-yet-hardened for high concurrency.
 
diff --git a/openab-agent/src/auth.rs b/openab-agent/src/auth.rs
index 25c3827db..2a3d78305 100644
--- a/openab-agent/src/auth.rs
+++ b/openab-agent/src/auth.rs
@@ -253,7 +253,7 @@ fn lock_path_for(auth: &Path, suffix: &str) -> PathBuf {
 /// RAII guard releasing the advisory lock on drop. The kernel also drops it on
 /// fd close / process death, so a crashed holder never wedges the file.
 #[cfg(unix)]
-struct AuthFileLock {
+pub(crate) struct AuthFileLock {
     file: std::fs::File,
 }
 
@@ -356,7 +356,7 @@ fn gc_stale_pending(map: &mut HashMap<String, AuthEntry>) {
 /// flight elsewhere never blocks this executor thread; on timeout we return `None`
 /// and let the caller proceed (degrade to a possible double-refresh, never wedge).
 #[cfg(unix)]
-async fn lock_tenant_refresh(auth: &Path, tenant: &str) -> Option<AuthFileLock> {
+pub(crate) async fn lock_tenant_refresh(auth: &Path, tenant: &str) -> Option<AuthFileLock> {
     let lock = lock_path_for(auth, &format!("refresh.{tenant}"));
     let deadline = std::time::Instant::now() + std::time::Duration::from_secs(10);
     loop {
diff --git a/openab-agent/src/mcp/runtime.rs b/openab-agent/src/mcp/runtime.rs
index 0c8ac7936..46c038254 100644
--- a/openab-agent/src/mcp/runtime.rs
+++ b/openab-agent/src/mcp/runtime.rs
@@ -631,10 +631,21 @@ impl McpRuntimeManager {
         if !has_refresh {
             return Err(needs_login());
         }
-        // Expired but refreshable: rmcp needs the authorization server's
-        // metadata configured before it can exchange the refresh token, so
-        // discover it now (network) and let `get_access_token` perform the
-        // rotation. Any failure surfaces as user-actionable `NeedsAuth`.
+        // Expired but refreshable. Serialise the refresh per-server across
+        // processes (ADR §5.4 (b)): openab runs one process per Discord thread,
+        // all sharing this server's stored creds, so two of them would otherwise
+        // present the same `RT_old` and trip OAuth 2.1 §10.4 token-family
+        // revocation. rmcp already single-flights within one process (shared
+        // `AuthorizationManager` `Mutex`); this closes the cross-process gap.
+        // The lock is held across the network refresh below. The double-check is
+        // implicit: rmcp's `get_access_token` re-`load()`s `auth.json` and skips
+        // the network refresh when the token is already fresh, so a process that
+        // loses the race adopts the token the winner wrote. Non-unix = no-op.
+        #[cfg(unix)]
+        let _refresh_guard = crate::auth::lock_tenant_refresh(&self.auth_path, name).await;
+        // rmcp needs the authorization server's metadata configured before it can
+        // exchange the refresh token, so discover it now (network) and let
+        // `get_access_token` perform the rotation. Failures surface as `NeedsAuth`.
         {
             let mut mgr = client.auth_manager.lock().await;
             mgr.initialize_from_store().await.map_err(|e| {

From 842c70feccd911b341f5b5dbd0cb818fbc2acdbc Mon Sep 17 00:00:00 2001
From: Brett Chien <brettchien@users.noreply.github.com>
Date: Thu, 25 Jun 2026 04:00:14 +0800
Subject: [PATCH 13/18] =?UTF-8?q?refactor(auth):=20/simplify=20cleanups=20?=
 =?UTF-8?q?=E2=80=94=20dedup=20lock=20acquire,=20reuse=20refresh=20fd?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Quality-only cleanups from a 4-angle /simplify pass (no behaviour change):
- Extract lock_global(path) -> Result<Option<AuthFileLock>> and route both
  with_auth_locked and McpCredentialStore::clear through it, so the "global"
  sidecar name + the cfg(unix) acquire live in one place instead of two
  copy-pasted blocks. clear flattens (drop the inner run closure).
- lock_tenant_refresh opens the lock fd once and re-issues flock on it each
  retry, instead of re-opening (and re-create_dir_all-ing) the file every
  100ms under contention. Removes the now-unused flock_try_exclusive helper.
- Document lock_tenant_refresh's double-check contract (the lock only
  serialises; callers must re-check freshness after acquiring).

fmt + clippy -D warnings + test (192 passed) green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---
 openab-agent/src/auth.rs | 121 ++++++++++++++++++++++-----------------
 1 file changed, 67 insertions(+), 54 deletions(-)

diff --git a/openab-agent/src/auth.rs b/openab-agent/src/auth.rs
index 2a3d78305..34bb24f44 100644
--- a/openab-agent/src/auth.rs
+++ b/openab-agent/src/auth.rs
@@ -295,23 +295,19 @@ fn flock_exclusive(lock: &Path) -> Result<AuthFileLock> {
     Ok(AuthFileLock { file })
 }
 
-/// Non-blocking exclusive lock; `Ok(None)` means another holder owns it.
-#[cfg(unix)]
-fn flock_try_exclusive(lock: &Path) -> Result<Option<AuthFileLock>> {
-    use std::os::unix::io::AsRawFd;
-    let file = open_lock_file(lock)?;
-    // SAFETY: valid fd held by `file`; flock has no memory effects.
-    let rc = unsafe { libc::flock(file.as_raw_fd(), libc::LOCK_EX | libc::LOCK_NB) };
-    if rc == 0 {
-        return Ok(Some(AuthFileLock { file }));
+/// Acquire the global `auth.json` write lock (a no-op `None` guard off-unix).
+/// Both `with_auth_locked` and `McpCredentialStore::clear` — which needs a
+/// delete-on-empty tail the funnel can't express — acquire here, so the
+/// `"global"` sidecar name and the acquire policy live in exactly one place.
+fn lock_global(path: &Path) -> Result<Option<AuthFileLock>> {
+    #[cfg(unix)]
+    {
+        Ok(Some(flock_exclusive(&lock_path_for(path, "global"))?))
     }
-    let err = std::io::Error::last_os_error();
-    // flock can report EWOULDBLOCK or EAGAIN depending on libc/BSD; both map to
-    // `ErrorKind::WouldBlock`, so use the std abstraction (Mira review).
-    if err.kind() == std::io::ErrorKind::WouldBlock {
+    #[cfg(not(unix))]
+    {
+        let _ = path;
         Ok(None)
-    } else {
-        Err(err.into())
     }
 }
 
@@ -324,8 +320,7 @@ fn with_auth_locked<R>(
     path: &Path,
     f: impl FnOnce(&mut HashMap<String, AuthEntry>) -> R,
 ) -> Result<R> {
-    #[cfg(unix)]
-    let _guard = flock_exclusive(&lock_path_for(path, "global"))?;
+    let _guard = lock_global(path)?;
     let mut map = read_auth_file(path).unwrap_or_default();
     let r = f(&mut map);
     gc_stale_pending(&mut map);
@@ -352,31 +347,51 @@ fn gc_stale_pending(map: &mut HashMap<String, AuthEntry>) {
 /// (b) Per-tenant refresh serialisation (ADR §5.4). The returned guard is held by
 /// the caller across the network refresh so concurrent processes do exactly one
 /// real refresh per tenant — never presenting a rotated `RT_old` twice (OAuth 2.1
-/// §10.4 family revocation). Non-blocking acquire + async backoff so a refresh in
-/// flight elsewhere never blocks this executor thread; on timeout we return `None`
-/// and let the caller proceed (degrade to a possible double-refresh, never wedge).
+/// §10.4 family revocation). Non-blocking acquire on a single fd + async backoff
+/// so a refresh in flight elsewhere never blocks this executor thread; on timeout
+/// we return `None` and let the caller proceed (degrade to a possible
+/// double-refresh, never wedge).
+///
+/// The lock only *serialises* — it does not decide whether to refresh. Callers
+/// MUST re-check token freshness after acquiring (the double-check) so a process
+/// that lost the race adopts the token the winner wrote instead of refreshing
+/// again: `get_valid_token` does this explicitly; the MCP path relies on rmcp's
+/// `get_access_token` re-`load()`ing the store.
 #[cfg(unix)]
 pub(crate) async fn lock_tenant_refresh(auth: &Path, tenant: &str) -> Option<AuthFileLock> {
+    use std::os::unix::io::AsRawFd;
     let lock = lock_path_for(auth, &format!("refresh.{tenant}"));
+    // Open the lock fd once; re-issue `flock` on it each retry instead of
+    // re-opening (and re-`create_dir_all`-ing) the same file every 100 ms.
+    let file = match open_lock_file(&lock) {
+        Ok(f) => f,
+        Err(e) => {
+            tracing::warn!(tenant, error = %e, "refresh lock unavailable; proceeding unserialised");
+            return None;
+        }
+    };
     let deadline = std::time::Instant::now() + std::time::Duration::from_secs(10);
     loop {
-        match flock_try_exclusive(&lock) {
-            Ok(Some(guard)) => return Some(guard),
-            Ok(None) => {
-                if std::time::Instant::now() >= deadline {
-                    tracing::warn!(
-                        tenant,
-                        "timed out waiting for refresh lock; proceeding unserialised"
-                    );
-                    return None;
-                }
-                tokio::time::sleep(std::time::Duration::from_millis(100)).await;
-            }
-            Err(e) => {
-                tracing::warn!(tenant, error = %e, "refresh lock unavailable; proceeding unserialised");
-                return None;
-            }
+        // SAFETY: valid fd held by `file`; flock has no memory effects.
+        let rc = unsafe { libc::flock(file.as_raw_fd(), libc::LOCK_EX | libc::LOCK_NB) };
+        if rc == 0 {
+            return Some(AuthFileLock { file });
+        }
+        let err = std::io::Error::last_os_error();
+        // EWOULDBLOCK/EAGAIN (both `ErrorKind::WouldBlock`) = another holder is
+        // refreshing; any other errno is a real failure we degrade on.
+        if err.kind() != std::io::ErrorKind::WouldBlock {
+            tracing::warn!(tenant, error = %err, "refresh lock unavailable; proceeding unserialised");
+            return None;
+        }
+        if std::time::Instant::now() >= deadline {
+            tracing::warn!(
+                tenant,
+                "timed out waiting for refresh lock; proceeding unserialised"
+            );
+            return None;
         }
+        tokio::time::sleep(std::time::Duration::from_millis(100)).await;
     }
 }
 
@@ -483,25 +498,23 @@ impl CredentialStore for McpCredentialStore {
     }
 
     async fn clear(&self) -> Result<(), AuthError> {
-        // Same global lock as every other writer; preserves the "last entry
-        // removed → delete the file" behaviour rather than leaving an empty map.
-        let run = || -> Result<()> {
-            #[cfg(unix)]
-            let _guard = flock_exclusive(&lock_path_for(&self.path, "global"))?;
-            let mut map = match read_auth_file(&self.path) {
-                Ok(m) => m,
-                Err(_) => return Ok(()),
-            };
-            if map.remove(&self.key).is_none() {
-                return Ok(());
-            }
-            if map.is_empty() {
-                let _ = std::fs::remove_file(&self.path);
-                return Ok(());
-            }
-            write_auth_file(&self.path, &map)
+        // Same global lock as every other writer, but with a delete-on-empty tail
+        // `with_auth_locked` can't express (it always writes), so `clear` acquires
+        // the shared `lock_global` directly rather than funnelling through it.
+        let _guard =
+            lock_global(&self.path).map_err(|e| AuthError::InternalError(e.to_string()))?;
+        let mut map = match read_auth_file(&self.path) {
+            Ok(m) => m,
+            Err(_) => return Ok(()),
         };
-        run().map_err(|e| AuthError::InternalError(e.to_string()))
+        if map.remove(&self.key).is_none() {
+            return Ok(());
+        }
+        if map.is_empty() {
+            let _ = std::fs::remove_file(&self.path);
+            return Ok(());
+        }
+        write_auth_file(&self.path, &map).map_err(|e| AuthError::InternalError(e.to_string()))
     }
 }
 

From 4e6f2467ca23081f7581f4cf8ba77a26ea72cdd6 Mon Sep 17 00:00:00 2001
From: Brett Chien <brettchien@users.noreply.github.com>
Date: Thu, 25 Jun 2026 04:07:13 +0800
Subject: [PATCH 14/18] =?UTF-8?q?docs(auth):=20/code-review=20=E2=80=94=20?=
 =?UTF-8?q?harden=20GC=20+=20refresh-lock=20contract=20comments?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Doc-only clarifications from a /code-review pass (no behaviour change):
- PendingPasteLogin.created_at: warn loudly that any writer of a fresh Pending
  entry MUST stamp created_at, else gc_stale_pending sweeps it on the next
  locked write (latent footgun for the forthcoming /auth two-step writer).
- lock_tenant_refresh: correct the contract — reuse-safety comes from loading
  the refresh token INSIDE the lock (so force_refresh, which always refreshes
  on a 401, is reuse-safe too); the post-lock expiry re-check is only an
  optimisation to skip a redundant refresh.

Review also surfaced a pre-existing, out-of-scope namespacing gap (an MCP
server literally named "codex" collides with the codex tenant's auth.json key
and refresh lock — committed MCP creds use the bare server name, not
mcp:<server> as ADR §5.4 describes) — flagged for a separate change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---
 openab-agent/src/auth.rs | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/openab-agent/src/auth.rs b/openab-agent/src/auth.rs
index 34bb24f44..a11ec5658 100644
--- a/openab-agent/src/auth.rs
+++ b/openab-agent/src/auth.rs
@@ -80,6 +80,11 @@ pub struct PendingPasteLogin {
     /// that *writes* pending state is forthcoming; today this struct is a legacy
     /// read-tolerant tombstone, so the field exists mainly for the GC. `#[serde(default)]`
     /// (= 0) reads pre-existing/legacy entries as ancient → swept on the next write.
+    ///
+    /// **Any code that writes a fresh `Pending` entry MUST set `created_at` to the
+    /// current Unix time** — an unstamped (0) entry is treated as ancient and is
+    /// swept by `gc_stale_pending` on the very next locked write, so a verifier
+    /// written without stamping would vanish before the user pastes the code.
     #[serde(default)]
     pub created_at: u64,
 }
@@ -352,11 +357,15 @@ fn gc_stale_pending(map: &mut HashMap<String, AuthEntry>) {
 /// we return `None` and let the caller proceed (degrade to a possible
 /// double-refresh, never wedge).
 ///
-/// The lock only *serialises* — it does not decide whether to refresh. Callers
-/// MUST re-check token freshness after acquiring (the double-check) so a process
-/// that lost the race adopts the token the winner wrote instead of refreshing
-/// again: `get_valid_token` does this explicitly; the MCP path relies on rmcp's
-/// `get_access_token` re-`load()`ing the store.
+/// Reuse-safety comes from loading the refresh token *inside* the lock: a
+/// process that waited then loads the token the winner just wrote, so it never
+/// re-presents a rotated `RT_old`. Re-checking expiry after acquiring is an
+/// additional optimisation that skips a redundant network refresh — `get_valid_token`
+/// does this explicitly, and the MCP path gets it free from rmcp's `get_access_token`
+/// (which re-`load()`s and returns early when the token is already fresh).
+/// `force_refresh` intentionally skips that optimisation and always refreshes (it
+/// runs on a 401, where the clock-fresh token is already known-bad); it stays
+/// reuse-safe because it, too, loads inside the lock.
 #[cfg(unix)]
 pub(crate) async fn lock_tenant_refresh(auth: &Path, tenant: &str) -> Option<AuthFileLock> {
     use std::os::unix::io::AsRawFd;

From b40693c31c87f1fa8133f08a30f71b1e860f83c5 Mon Sep 17 00:00:00 2001
From: Brett Chien <brettchien@users.noreply.github.com>
Date: Thu, 25 Jun 2026 04:10:43 +0800
Subject: [PATCH 15/18] build(openab-agent): declare standalone [workspace] to
 fix CI

ci-openab-agent.yml runs cargo fmt/clippy/test/build with
working-directory: openab-agent, but the crate is not a member of the parent
openab workspace (members = crates/openab-core, openab-gateway) and was not
excluded, so cargo errors 'current package believes it's in a workspace when
it's not' and every step fails before doing any work. openab-agent is
intentionally standalone (own version + dual reqwest 0.12/0.13 for rmcp), so
the correct fix is an empty [workspace] table making it its own root.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---
 openab-agent/Cargo.toml | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/openab-agent/Cargo.toml b/openab-agent/Cargo.toml
index 5f146ca21..57f3d55ae 100644
--- a/openab-agent/Cargo.toml
+++ b/openab-agent/Cargo.toml
@@ -50,3 +50,11 @@ libc = "0.2"
 [dev-dependencies]
 tempfile = "3"
 temp-env = "0.3.6"
+
+# openab-agent is a standalone crate (its own version + a dual reqwest 0.12/0.13
+# dep set for rmcp), deliberately NOT a member of the parent `openab` workspace
+# (members = crates/openab-core, openab-gateway). Declare an empty workspace so
+# cargo, run from this directory, treats it as its own root instead of erroring
+# "current package believes it's in a workspace when it's not" — which otherwise
+# breaks every cargo step in ci-openab-agent.yml (working-directory: openab-agent).
+[workspace]

From 752be4c1106d7dd6774cb1549b36ffa8feb89f95 Mon Sep 17 00:00:00 2001
From: Brett Chien <brettchien@users.noreply.github.com>
Date: Thu, 25 Jun 2026 04:24:10 +0800
Subject: [PATCH 16/18] Revert "build(openab-agent): declare standalone
 [workspace] to fix CI"

This reverts commit b40693c31c87f1fa8133f08a30f71b1e860f83c5.
---
 openab-agent/Cargo.toml | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/openab-agent/Cargo.toml b/openab-agent/Cargo.toml
index 57f3d55ae..5f146ca21 100644
--- a/openab-agent/Cargo.toml
+++ b/openab-agent/Cargo.toml
@@ -50,11 +50,3 @@ libc = "0.2"
 [dev-dependencies]
 tempfile = "3"
 temp-env = "0.3.6"
-
-# openab-agent is a standalone crate (its own version + a dual reqwest 0.12/0.13
-# dep set for rmcp), deliberately NOT a member of the parent `openab` workspace
-# (members = crates/openab-core, openab-gateway). Declare an empty workspace so
-# cargo, run from this directory, treats it as its own root instead of erroring
-# "current package believes it's in a workspace when it's not" — which otherwise
-# breaks every cargo step in ci-openab-agent.yml (working-directory: openab-agent).
-[workspace]

From 212d4bbf43366cfc00ea125487421510924bd7d8 Mon Sep 17 00:00:00 2001
From: Brett Chien <brettchien@users.noreply.github.com>
Date: Thu, 25 Jun 2026 04:25:18 +0800
Subject: [PATCH 17/18] ci(openab-agent): append [workspace] in CI + harden ACP
 smoke test
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

ci-openab-agent.yml ran cargo from working-directory: openab-agent without the
[workspace] table the crate needs (it's standalone, not a parent-workspace
member), so every step failed with 'believes it's in a workspace when it's
not' — the workflow had been red independently of any PR. Dockerfile.unified
already works around this by appending the table at build time; replicate that
in the workflow rather than committing it to Cargo.toml (which would
double-append in the Dockerfile and break the image build).

Also harden the ACP smoke test: build the release binary in its own step and
bump the response timeout 5s -> 30s so a loaded runner doesn't flake the
agent's first-response window (the binary itself responds in <1s locally,
verified incl. a clean-HOME release build).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---
 .github/workflows/ci-openab-agent.yml | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/.github/workflows/ci-openab-agent.yml b/.github/workflows/ci-openab-agent.yml
index 11d9290ca..550edb4ac 100644
--- a/.github/workflows/ci-openab-agent.yml
+++ b/.github/workflows/ci-openab-agent.yml
@@ -24,17 +24,27 @@ jobs:
       - uses: Swatinem/rust-cache@v2
         with:
           workspaces: openab-agent
+      # openab-agent is a standalone crate, not a member of the parent openab
+      # workspace (members = crates/openab-core, openab-gateway) and not excluded,
+      # so cargo run from this directory errors "believes it's in a workspace when
+      # it's not". Declare an empty workspace, mirroring Dockerfile.unified, so
+      # every cargo step below resolves it as its own root.
+      - run: printf '\n[workspace]\n' >> Cargo.toml
       - run: cargo fmt --check
       - run: cargo clippy -- -D warnings
       - run: cargo test
       - run: cargo test -- --ignored
         env:
           ANTHROPIC_API_KEY: "fake-key-for-ci"
+      # Build in its own step so a build failure is distinct from a smoke failure
+      # and the timed run below races only the binary, not a concurrent compile.
+      - name: Build release binary
+        run: cargo build --release
       - name: ACP smoke test
         run: |
-          cargo build --release
-          # Test: initialize returns valid ACP response
-          RESP=$(echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{}}' | timeout 5 ./target/release/openab-agent 2>/dev/null | head -1)
+          # Generous timeout so a loaded runner doesn't flake the agent's
+          # first-response window (the prior 5s was tight right after a release build).
+          RESP=$(echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{}}' | timeout 30 ./target/release/openab-agent 2>/dev/null | head -1)
           echo "Response: $RESP"
           echo "$RESP" | grep -q '"agentInfo"' || (echo "FAIL: no agentInfo in response" && exit 1)
           echo "$RESP" | grep -q '"openab-agent"' || (echo "FAIL: wrong agent name" && exit 1)

From 235945094f8840ce8c62094e6d90f070d8ae41ac Mon Sep 17 00:00:00 2001
From: Brett Chien <brettchien@users.noreply.github.com>
Date: Thu, 25 Jun 2026 04:37:10 +0800
Subject: [PATCH 18/18] ci(openab-agent): surface ACP smoke-test stderr + exit
 code for diagnosis

The smoke test produces empty stdout only in CI (the binary responds with
agentInfo in <1s locally across debug/release/clean-HOME/CI-env-var runs).
Capture stderr + exit code so the next CI run reveals why stdout is empty.
---
 .github/workflows/ci-openab-agent.yml | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/.github/workflows/ci-openab-agent.yml b/.github/workflows/ci-openab-agent.yml
index 550edb4ac..5ba974853 100644
--- a/.github/workflows/ci-openab-agent.yml
+++ b/.github/workflows/ci-openab-agent.yml
@@ -42,9 +42,14 @@ jobs:
         run: cargo build --release
       - name: ACP smoke test
         run: |
-          # Generous timeout so a loaded runner doesn't flake the agent's
-          # first-response window (the prior 5s was tight right after a release build).
-          RESP=$(echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{}}' | timeout 30 ./target/release/openab-agent 2>/dev/null | head -1)
+          set +e
+          echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{}}' \
+            | timeout 30 ./target/release/openab-agent > /tmp/acp_out.txt 2> /tmp/acp_err.txt
+          code=$?
+          echo "exit code: $code"
+          echo "--- stdout ---"; cat /tmp/acp_out.txt
+          echo "--- stderr ---"; cat /tmp/acp_err.txt
+          RESP=$(head -1 /tmp/acp_out.txt)
           echo "Response: $RESP"
           echo "$RESP" | grep -q '"agentInfo"' || (echo "FAIL: no agentInfo in response" && exit 1)
           echo "$RESP" | grep -q '"openab-agent"' || (echo "FAIL: wrong agent name" && exit 1)