From a7064ea7e13d4bb69e2ec7a668ea3f65f9d3d7df Mon Sep 17 00:00:00 2001 From: Kenny Daniel Date: Mon, 29 Jun 2026 10:28:14 -0700 Subject: [PATCH 01/59] LLP 0046-0048: multi-tenant OIDC login on the client (design docs) Integration branch base for the chunk-2 OIDC client work. Carries the decision, design, and plan that the implementation PRs realize. --- llp/0046-oidc-login-client.decision.md | 173 ++++++++++++++++++++++++ llp/0047-oidc-login-client.design.md | 179 +++++++++++++++++++++++++ llp/0048-oidc-login-client.plan.md | 104 ++++++++++++++ 3 files changed, 456 insertions(+) create mode 100644 llp/0046-oidc-login-client.decision.md create mode 100644 llp/0047-oidc-login-client.design.md create mode 100644 llp/0048-oidc-login-client.plan.md diff --git a/llp/0046-oidc-login-client.decision.md b/llp/0046-oidc-login-client.decision.md new file mode 100644 index 0000000..5137ae5 --- /dev/null +++ b/llp/0046-oidc-login-client.decision.md @@ -0,0 +1,173 @@ +# LLP 0046: Multi-tenant OIDC login on the client + +**Type:** Decision +**Status:** Draft +**Systems:** CLI, Onboarding, Query, MCP +**Author:** Kenny / Claude +**Date:** 2026-06-29 +**Related:** LLP 0009, LLP 0011, LLP 0033, LLP 0044 + +> The server side of multi-tenant OIDC login already shipped: hypaware-server is +> a rendezvous that runs the OIDC flow against a configurable provider and mints +> its own revocable refresh token plus a short org-scoped access JWT. This is the +> client half: teaching `hyp` to run the browser flow, store the refresh token in +> the existing 0600 credential store, and refresh silently on the query path. +> Grilled against the corpus on 2026-06-29; the decisions below are the +> client-local forks resolved in that session. + +## Summary + +This is **chunk 2** of the multi-tenant OAuth login program. The architecture was +decided server-side and lives in the hypaware-server repo: +`../hypaware-server/llp/0017-multi-tenant-oauth-login.decision.md` (the decision), +`0018-oidc-login-server.design.md` (the server design), and +`0019-oidc-login-server.plan.md` (the shipped server plan). That work added the +login endpoints, claimed-domain to org resolution, and the token endpoint. + +This document does **not** re-decide that architecture. It records only the +**client-local** choices. Everything cross-cutting (why hypaware-server owns `org`, +why it mints its own refresh token, the forge-proof org boundary) defers up to +server LLP 0017. + +## Context: what exists today + +The client already has a per-target remote credential path, and chunk 2 extends it +rather than inventing a parallel one: + +- **Credential store** (`src/core/remote/credentials.js`): a single `0600` file at + `/remote-credentials.json` (HYP_HOME-overridable), per-target shape + `{ "": { "token": "..." } }`, written atomically (tmp + rename + chmod). + `writeToken`, `readCredentials`, `removeToken`, `resolveToken` (env then file). + This is LLP 0033's query-scoped store. +- **Command** (`src/core/cli/remote_commands.js`): `hyp remote login ` reads a + token from `--token-file ` or stdin and calls `writeToken`. Siblings: `remote + add` (writes `query.remotes. = { url }`), `remote list`, `remote remove`. +- **Query attach** (`src/core/mcp/remote_verb.js`, `client.js`): `resolveToken` reads + the token, `createHttpMcpClient({ url, token })` sets `Authorization: Bearer`, and a + 401/403 throws `remote rejected the credential ... re-run 'hyp remote login'`. +- **Net-new for chunk 2:** there is no browser-open, loopback listener, PKCE, or + refresh-token handling anywhere in the client today. + +## Decisions + +### D1: Login is a browser mode of `hyp remote login ` + + +Server LLP 0017 named the client command `hypaware login`. The client's entire remote +model is per-target under `hyp remote ...`, and the credential store is keyed by +target; a separate top-level `hyp login` would create a second credential path for the +same servers. **Decision:** `hyp remote login ` gains an interactive browser +mode, selected when no static token is supplied (or forced with `--browser`), and still +falls back to `--token-file`/stdin for static tokens. One command, one store, one more +way to populate it. The server prose should treat `hypaware login` as the intent and +`hyp remote login` as the realized spelling, not silently diverge. + +### D2: Ephemeral loopback redirect on 127.0.0.1 (RFC 8252) + + +The client starts a single-shot HTTP listener on `127.0.0.1` at an OS-assigned port, +serving one path (`/callback`), and uses `http://127.0.0.1:/callback` as the +`redirect_uri`. The server already restricts `redirect_uri` to loopback hosts, so this +matches. Rejected a fixed port (collisions, and the server would have to allowlist it). +The listener is bound for one login, with a short timeout, then closed. + +### D3: The client owns the downstream PKCE leg + + +There are two PKCE legs. The client is the OAuth app talking to hypaware-server (the +downstream leg): it generates a verifier and an S256 challenge, sends the challenge to +`/login/start`, and holds the verifier to present at `/identity/token`. The upstream leg +(hypaware-server to the IdP) is entirely server-internal; the client never sees it. The +verifier lives in memory for one flow, is never logged, never persisted. + +### D4: The session is a discriminated record in the same credential file + + +Each per-target record carries `kind: 'static' | 'oidc'`. A `static` record is the +LLP 0033 `{ token }`; an `oidc` record carries `{ refreshToken, accessJwt, expiresAt, +org }`. **Migration is read-implicit:** a legacy record with a `token` and no `kind` is +treated as `kind: 'static'` on read, so existing files keep working without a rewrite; +writes stamp `kind`. One file, one resolve path, one remove path. Rejected a separate +sessions file (two resolve/remove paths for the same per-target secret). Rejected silent +additive fields without a discriminator (the explicit `kind` makes "is this an OIDC +session" a property of the record, not an inference from which fields are present). + +### D5: Silent refresh and re-login live on the attach path + + +When the query attach path needs a token for an `oidc` target, a session-aware resolver +checks `expiresAt`; if it is within a skew window of expiry, it POSTs the refresh grant +to the token endpoint, persists the new JWT, and proceeds. If a live request still +returns 401, the attach path refreshes once and retries before surfacing an error. A +refresh that fails with 401 `invalid_grant` (the refresh row was revoked or expired) +surfaces the existing "re-run `hyp remote login`" guidance, now meaning re-run the +browser flow. Rejected refreshing eagerly on every call (needless token-endpoint +traffic). The per-target env override (`HYP_REMOTE_TOKEN_`) still wins for CI. + +### D6: Identity endpoints derive from the configured remote URL origin + + +A target is configured as `query.remotes. = { url }`, the MCP endpoint. The login +and token endpoints live on the same server under `/v1/identity/...`. **Decision:** the +client derives the identity base as `/v1/identity`, so no second URL is +configured. **Caveat (open question):** this assumes identity is mounted at the origin +root; a deployment hosted under a path prefix (`host/hypaware/mcp`) is not covered by the +MVP. If that case arises, an optional explicit `identityUrl` on the target is the +additive fix, defaulting to origin-derive. We do not build server-advertised discovery +in this chunk. + +### D7: Org selection is a `--org` selector; surfaced errors are explained + + +`hyp remote login --org acme` passes `org` to `/login/start` as a selector +only; the server resolves the real org and may bounce `no_membership`, +`org_selection_required`, or `org_not_permitted` to the loopback as an `error`. The +command translates each into a clear message. The client never sees the user's org list, +so on `org_selection_required` it instructs the user to re-run with `--org ` rather +than enumerating. The client never asserts the org; it only selects. + +### D8: Static tokens remain the headless escape hatch; no device-code flow + + +When there is no display or browser (`--no-browser`, or no opener found), the client +prints the authorize URL for manual open while the loopback still waits. For a truly +headless host with no reachable loopback (a remote SSH shell), the browser flow is not +usable; the existing `--token-file`/stdin static-token path stays as the documented +fallback. We do **not** build a device-code flow: hypaware-server's chunk-1 token +endpoint exposes only the `authorization_code` and `refresh_token` grants, so a device +flow would require new server work and is out of scope. + +## Consequences + +- One command, one store, one resolve path: OIDC sessions and static tokens coexist, + distinguished by `kind`. +- The client gains its first browser-open, loopback-listener, and PKCE code; small, + dependency-free, isolated under `src/core/remote/`. +- The IdP stays out of the steady-state path: only login and refresh-expiry touch the + network beyond normal queries. +- LLP 0033's credential-store and 401 contracts are now extended; 0033 needs a + follow-up edit when the code lands (noted in the plan), not a rewrite here. +- Out of scope (server chunks 3 to 5): the hyperparam.app OIDC provider, login-minted + gateway enrollment, and self-serve DNS-TXT domain claiming. The client change is + provider-agnostic and works against any hypaware-server with login configured. + +## Open questions + +- **System vocabulary.** Login/credentials is arguably a new subsystem. This doc tags + the existing `CLI, Onboarding, Query, MCP`. Should LLP 0000 gain an `Identity` (or + `Auth`) System tag, and should LLP 0033's credential store move under it? +- **Path-prefixed identity hosting.** D6 derives from the origin; an explicit + `identityUrl` override is the additive fix if a real deployment needs it. +- **OS keychain.** The refresh token lands in the `0600` file, consistent with LLP 0033; + the keychain remains the named follow-up there, not this chunk. +- **Gateway provisioning.** Server chunk 4 may have `hyp remote login` also receive a + gateway token for unattended ingest. Deferred; the D4 record shape should not preclude + adding a gateway-token field later. + +## References + +- `../hypaware-server/llp/0017-multi-tenant-oauth-login.decision.md` (architecture) +- `../hypaware-server/llp/0018-oidc-login-server.design.md` (server design) +- `../hypaware-server/llp/0019-oidc-login-server.plan.md` (shipped server plan) +- [LLP 0009](./0009-cli-registry.spec.md), [LLP 0011](./0011-setup-and-onboarding.decision.md), [LLP 0033](./0033-remote-query-attach.spec.md) +- Designed in [LLP 0047](./0047-oidc-login-client.design.md); sequenced in [LLP 0048](./0048-oidc-login-client.plan.md) diff --git a/llp/0047-oidc-login-client.design.md b/llp/0047-oidc-login-client.design.md new file mode 100644 index 0000000..cb1b968 --- /dev/null +++ b/llp/0047-oidc-login-client.design.md @@ -0,0 +1,179 @@ +# LLP 0047: Multi-tenant OIDC login on the client, implementation design + +**Type:** design +**Status:** Draft +**Systems:** CLI, Onboarding, Query, MCP, Config +**Author:** Kenny / Claude +**Date:** 2026-06-29 +**Related:** LLP 0009, LLP 0011, LLP 0033, LLP 0046 +**Decided-by:** LLP 0046 — multi-tenant OIDC login on the client + +> [LLP 0046](./0046-oidc-login-client.decision.md) grilled and decided the client-local +> shape of chunk 2: a browser mode on `hyp remote login`, an ephemeral loopback +> redirect, the downstream PKCE leg, a discriminated `kind` record in the existing 0600 +> store, origin-derived identity endpoints, and silent refresh on the attach path. This +> document is the implementation design: the modules and functions to add, the exact +> wire contract with hypaware-server, and the test plan. + +## Summary + +Chunk 2 adds a browser authorization-code flow to the client and a silent-refresh hook +to the query path, against the login surface hypaware-server shipped in its LLP 0019. +The new code is small, dependency-free (Node stdlib `crypto`, `http`, and a platform +opener), and isolated under `src/core/remote/`. The credential store and the `hyp remote +login` command are extended, not replaced. + +## The server contract (what the client speaks) + +All paths are under the identity base `/v1/identity` (LLP 0046 D6). + +**Start (browser navigates here):** `GET /login/start` with query params: + +| param | value | +|-------|-------| +| `redirect_uri` | `http://127.0.0.1:/callback` (loopback http only) | +| `code_challenge` | base64url SHA-256 of the client verifier | +| `code_challenge_method` | `S256` | +| `state` | client CSRF token, echoed back on the loopback callback | +| `org` | optional org selector (`--org`) | + +The server parks a flight and 302s the browser to the upstream provider. After the +provider returns, the server resolves the org and 302s the browser to the client's +`redirect_uri`. + +**Loopback callback (the client's listener catches this):** +`GET /callback?code=&state=` on success, or +`GET /callback?error=&state=` on failure. The `state` must equal the one +the client sent; `error` is a provider error or a resolution error (`access_denied`, +`no_membership`, `org_selection_required`, `org_not_permitted`). + +**Token endpoint:** `POST /token`, JSON body, two grants: + +- `{ "grant_type": "authorization_code", "code": "...", "code_verifier": "..." }` + → `200 { session_id, refresh_token, access_jwt, expires_at, org }` +- `{ "grant_type": "refresh_token", "refresh_token": "..." }` + → `200 { access_jwt, expires_at, org }` or `401 { error: "invalid_grant" }` + +Note the response field is `access_jwt` (not `access_token`) and `expires_at` is an ISO +timestamp. This is hypaware-server's own token, verified by its own resource path; there +is no external JWKS on the client. + +## Modules to add (under `src/core/remote/`) + +### `pkce.js` +`createPkcePair()` → `{ verifier, challenge }`. Verifier is 32 random bytes base64url; +challenge is base64url SHA-256 of the verifier. Pure, synchronous, stdlib `crypto`. +Realizes LLP 0046 D3. + +### `loopback.js` +`startLoopbackReceiver({ state, timeoutMs })` → `{ redirectUri, waitForCode() }`. Binds +`http://127.0.0.1:0` (OS-assigned port) and returns `redirectUri` so the caller can build +the start URL before opening the browser. `waitForCode()` resolves `{ code }` when +`/callback` arrives with a matching `state`; rejects on `error=`, `state` mismatch, or +timeout. Serves a minimal "login complete, you can close this tab" page, then closes. +Single-shot. Realizes LLP 0046 D2. + +### `open_browser.js` +`openBrowser(url)` spawns the platform opener (`open` darwin, `xdg-open` linux, `start` +win32) detached; returns whether an opener was found. `--no-browser` skips it and prints +the URL. Realizes LLP 0046 D8. + +### `identity_client.js` +Plain JSON over an injectable `fetchImpl`, distinct from the MCP JSON-RPC `client.js`: +- `exchangeCode({ identityBase, code, codeVerifier, fetchImpl })` + → `{ refreshToken, accessJwt, expiresAt, org }` +- `refreshSession({ identityBase, refreshToken, fetchImpl })` + → `{ accessJwt, expiresAt, org }`, or throws a typed `invalid_grant` error the attach + path turns into re-login guidance. + +### `oidc_login.js` +`loginWithBrowser({ identityBase, org, openBrowser, now })` orchestrates D2/D3: +generate PKCE + a random `state`, start the loopback receiver, build the start URL, open +the browser (or print it), await the code, exchange it, and return the session +`{ refreshToken, accessJwt, expiresAt, org }`. No persistence here; the caller stores it. + +## Modules to extend + +### `credentials.js` (LLP 0046 D4) +Records gain a discriminator `kind: 'static' | 'oidc'`. Additions: +- `writeSession(stateDir, target, { refreshToken, accessJwt, expiresAt, org })` writes a + `kind: 'oidc'` record through the same atomic 0600 path as `writeToken`. +- a normalizing read: a legacy record with `token` and no `kind` is read as + `kind: 'static'`, so existing files keep working without a rewrite. +- `resolveAccessJwt({ target, env, stateDir, identityBase, now, fetchImpl })`: the + session-aware resolver for the attach path. For a `static` record (or env override) it + returns the token as today. For an `oidc` record it returns a fresh access JWT, calling + `refreshSession` and persisting the new JWT/expiry when the stored one is within a skew + window of expiry. The env override `HYP_REMOTE_TOKEN_` still wins. +- `removeToken` already drops the whole per-target record, so it covers both kinds. + +### `remote_commands.js` (LLP 0046 D1, D7, D8) +`runRemoteLogin` gains browser mode: parse `--browser` / `--no-browser` / `--org ` +/ `--token-file `. With a token file or piped stdin, behave exactly as today +(`kind: 'static'`). Otherwise run `loginWithBrowser` against the target's identity base, +then `writeSession`. Print the resolved org on success. Translate a callback `error` into +a clear message per D7. + +### `remote_verb.js` and `client.js` (LLP 0046 D5) +The attach path calls `resolveAccessJwt` instead of `resolveToken`. On a live 401/403 +from `client.js`, the attach path refreshes once and retries before surfacing the error; +a refresh that fails with `invalid_grant` surfaces the existing re-login message (now: +re-run the browser flow). + +## Security notes + +- `state` is a random per-login CSRF token; a callback whose `state` does not match is + rejected without reading `code`. +- The loopback listener binds only `127.0.0.1`, is single-shot, and times out. +- The PKCE verifier lives in memory for one flow, is never logged, never persisted. +- The refresh token and access JWT inherit the 0600 atomic-write store; neither is ever + written to config or logs. +- The token endpoint is reached over the target's own origin (https in any real + deployment); the access JWT is hypaware-server's own credential, so there is no + external JWKS trust on the client. + +## Telemetry (log-driven development) + +Emit structured logs around: login start (target, has-org), loopback bind (port), +browser open (opener found or printed), code receipt (success or error kind), token +exchange, and refresh (refreshed or invalid_grant). Attributes use `component: +'remote-oidc'`, an `operation`, and a `status`; never log the verifier, code, refresh +token, or access JWT (hash or redact if identity matters). Give the smoke a stable +`smoke_step` per phase. + +## Tests + +Traditional unit tests (deterministic; injected clock, fetch, and http): +- `pkce`: challenge is the base64url SHA-256 of the verifier; pairs differ. +- `loopback`: a matching `state` resolves `{ code }`; a mismatched `state`, an `error=`, + and a timeout each reject; the listener closes after one request. +- `identity_client`: `exchangeCode` and `refreshSession` post the right body and parse + the response; a 401 surfaces a typed `invalid_grant`. +- `credentials`: an `oidc` record round-trips; a legacy `token`-only record reads as + `static`; `resolveAccessJwt` refreshes a stale JWT and persists it, returns a fresh one + untouched, falls back to a static token, and honors the env override; a refresh failure + propagates. +- `remote_commands`: `--org` is forwarded; a callback `error` maps to the right message; + the static `--token-file` path is unchanged. + +Hermetic smoke `remote_oidc_login` (a stub identity server signing real tokens, temp +HYP_HOME): browser flow (loopback driven by a scripted client), session stored, query +attaches the access JWT, a forced expiry triggers a silent refresh, a revoked refresh row +triggers the re-login message. + +## @refs to add when the code lands + +- `oidc_login.js`: `@ref LLP 0046#d3 [implements]` (downstream PKCE leg owned by client) +- `loopback.js`: `@ref LLP 0046#d2 [implements]` (ephemeral 127.0.0.1 redirect, RFC 8252) +- `credentials.js` session additions: `@ref LLP 0046#d4 [implements]` (discriminated + `kind` record), and `@ref LLP 0033#credentials [constrained-by]` +- `resolveAccessJwt` / attach path: `@ref LLP 0046#d5 [implements]` (silent refresh and + 401 re-login on the attach path) +- `runRemoteLogin` browser mode: `@ref LLP 0046#d1 [implements]` (browser mode of `hyp + remote login`) + +## References + +- `../hypaware-server/llp/0018-oidc-login-server.design.md` (the server side of this contract) +- [LLP 0046](./0046-oidc-login-client.decision.md) (decisions), [LLP 0033](./0033-remote-query-attach.spec.md) (credential store + attach), [LLP 0009](./0009-cli-registry.spec.md) (command registry) +- Sequenced in [LLP 0048](./0048-oidc-login-client.plan.md) diff --git a/llp/0048-oidc-login-client.plan.md b/llp/0048-oidc-login-client.plan.md new file mode 100644 index 0000000..9295491 --- /dev/null +++ b/llp/0048-oidc-login-client.plan.md @@ -0,0 +1,104 @@ +# LLP 0048: Multi-tenant OIDC login on the client, plan + +**Type:** plan +**Status:** Draft +**Systems:** CLI, Onboarding, Query, MCP, Config +**Author:** Kenny / Claude +**Date:** 2026-06-29 +**Related:** LLP 0033, LLP 0046, LLP 0047 +**Generated-by:** LLP 0047 — multi-tenant OIDC login on the client, implementation design + +> [LLP 0047](./0047-oidc-login-client.design.md) named the modules and the wire +> contract. This plan turns them into a task graph of independently-mergeable PRs, with +> the dependency edges that let the local primitives land in parallel before the command +> and attach-path wiring that consume them. + +## Sequencing at a glance + +``` +M1 local primitives (parallel) M2 command M3 attach M4 smoke + T1 pkce.js ............................\ + T2 loopback.js ....................... \ + T3 identity_client.js ................ >--> T6 runRemoteLogin --> T7 resolveAccessJwt --> T8 hermetic + T4 credentials kind + writeSession ... / browser mode on attach path smoke + T5 oidc_login.js (needs T1,T2,T3) ..../ +``` + +Each task is a PR with its own unit tests, lands green under `npm test`, and carries the +`@ref` annotations LLP 0047 lists. Follow the same discipline the server used in its LLP +0019: typecheck and tests gate every merge; the LLP edit lands with its code. + +## Milestone 1: local primitives + +No command wiring, no network beyond injected fetch; each is unit-testable in isolation. + +- **T1 `pkce.js`.** `createPkcePair()`. Test: challenge is the base64url SHA-256 of the + verifier; two pairs differ. (`@ref LLP 0046#d3`) +- **T2 `loopback.js`.** `startLoopbackReceiver({ state, timeoutMs })`. Test (injected + requests against the real bound port): matching `state` resolves `{ code }`; mismatch, + `error=`, and timeout reject; the listener closes after one request. (`@ref LLP 0046#d2`) +- **T3 `identity_client.js`.** `exchangeCode`, `refreshSession` over an injected + `fetchImpl`. Test: correct request body and parsed response; a 401 surfaces a typed + `invalid_grant`. +- **T4 `credentials.js` discriminated `kind`.** Add `writeSession`, the read-normalizing + of legacy `token`-only records to `static`, and a `RemoteCredentialRecord` interface in + the sibling `.d.ts`. Test: `oidc` record round-trips; legacy record reads as `static`; + `removeToken` still clears both. (`@ref LLP 0046#d4`, `@ref LLP 0033#credentials`) +- **T5 `oidc_login.js`.** `loginWithBrowser(...)`, composing T1+T2+T3 plus + `open_browser.js`. Depends on T1, T2, T3. Test: a scripted loopback + stub identity + server drives a full code-to-session exchange; `--no-browser` prints the URL. + +T1 to T4 are mutually independent and parallelize; T5 joins them. + +## Milestone 2: command flow + +- **T6 `runRemoteLogin` browser mode.** Flag parsing (`--browser` / `--no-browser` / + `--org` / `--token-file`), origin-derive the identity base from the target's `url`, run + `loginWithBrowser`, persist with `writeSession`, print the resolved org. Translate + callback `error` codes into clear messages (D7). Static `--token-file`/stdin path + unchanged. Depends on M1. (`@ref LLP 0046#d1`, `#d6`, `#d7`) + Test: `--org` forwarded; each error code maps to its message; static path untouched. + +## Milestone 3: attach-path integration + +- **T7 silent refresh + 401 retry.** Swap the attach path (`remote_verb.js`) to + `resolveAccessJwt`; add the one-shot refresh-and-retry on a live 401 in `client.js`; + keep the env override winning. A failed refresh surfaces the re-login message. Depends + on T4 (and benefits from T6 existing for end-to-end manual check). (`@ref LLP 0046#d5`) + Test: a stale stored JWT is refreshed and persisted before the call; a 401 mid-flight + triggers exactly one refresh+retry; `invalid_grant` maps to re-login guidance. + +## Milestone 4: hermetic smoke + +- **T8 `remote_oidc_login` smoke.** A stub identity server (signing real tokens) plus a + scripted loopback in a temp HYP_HOME: login, session stored as `kind: 'oidc'`, query + attaches the access JWT, a forced expiry drives a silent refresh, a revoked refresh row + drives the re-login message. Assert both the user-visible result and the telemetry + `smoke_step` markers per the repo's log-driven-development guidance. + +## Cross-doc follow-ups (land with the code, not before) + +- **Edit LLP 0033.** Its credential-store and 401 sections now have an OIDC variant. + When T4/T7 land, update 0033 to reference the `kind` discriminator and the + refresh-then-retry behavior, so 0033 stays the honest source for the attach path. +- **Promote 0046/0047/0048.** Move them Draft → Accepted/Active as the design is approved, + and to Implemented when the milestones land (the server-side 0015/0016 precedent). +- **Decide the `Identity` System tag** (LLP 0046 open question) before tagging the code; + if adopted, update LLP 0000's vocabulary in the same change. +- **Run `/ref-check`** after each milestone so no `@ref` dangles and every `#anchor` + resolves. + +## Out of scope (tracked elsewhere) + +- hyperparam.app minimal OIDC provider (server chunk 3). +- Login-minted gateway enrollment (server chunk 4); keep the D4 record shape open to a + future gateway-token field. +- Self-serve DNS-TXT domain claiming (server chunk 5). +- OS keychain for the refresh token (LLP 0033 follow-up). +- Path-prefixed identity hosting / explicit `identityUrl` (LLP 0046 D6 caveat). + +## References + +- [LLP 0046](./0046-oidc-login-client.decision.md) (decisions), [LLP 0047](./0047-oidc-login-client.design.md) (design) +- [LLP 0033](./0033-remote-query-attach.spec.md) (credential store + attach), [LLP 0009](./0009-cli-registry.spec.md) (command registry), [LLP 0011](./0011-setup-and-onboarding.decision.md) (onboarding) +- `../hypaware-server/llp/0019-oidc-login-server.plan.md` (the server-side plan this mirrors) From 0cc12ccf10273f60f435d3a8acd6aa4e96f93597 Mon Sep 17 00:00:00 2001 From: Kenny Daniel Date: Mon, 29 Jun 2026 11:33:36 -0700 Subject: [PATCH 02/59] remote/oidc: PKCE + ephemeral loopback receiver (T1, T2) (#197) * remote/oidc: PKCE pair + ephemeral loopback receiver (T1, T2) Two dependency-free local primitives for the browser login flow: - pkce.js: createPkcePair() -> { verifier, challenge }, S256 over stdlib crypto. The client's downstream PKCE leg (LLP 0046 D3). - loopback.js: startLoopbackReceiver({ state, timeoutMs }) binds a single-shot 127.0.0.1:0 HTTP listener serving /callback, returns its redirectUri up front, and resolves { code } on a state-matched callback (rejecting on mismatch, error=, or timeout). RFC 8252 ephemeral redirect (LLP 0046 D2). Unit tests cover the SHA-256 challenge derivation, fresh randomness, and the loopback success / state-mismatch / error / timeout paths. * remote/oidc: loopback close() rejects pending waitForCode; route post-listen errors Review follow-ups (PR #197): - close() before a code arrives now rejects an in-flight (or future) waitForCode() instead of leaving it permanently pending. - a post-listen server 'error' now settles a pending waitForCode() and closes the server (via fail()), not a no-op rejectStart. - tests: close()-rejects-pending, and a non-/callback probe 404s without consuming the single shot. --- src/core/remote/loopback.js | 176 ++++++++++++++++++++++++++++++ src/core/remote/pkce.js | 37 +++++++ test/core/remote-loopback.test.js | 77 +++++++++++++ test/core/remote-pkce.test.js | 31 ++++++ 4 files changed, 321 insertions(+) create mode 100644 src/core/remote/loopback.js create mode 100644 src/core/remote/pkce.js create mode 100644 test/core/remote-loopback.test.js create mode 100644 test/core/remote-pkce.test.js diff --git a/src/core/remote/loopback.js b/src/core/remote/loopback.js new file mode 100644 index 0000000..66422fd --- /dev/null +++ b/src/core/remote/loopback.js @@ -0,0 +1,176 @@ +// @ts-check + +import http from 'node:http' + +import { Attr, getLogger } from '../observability/index.js' + +/** + * Ephemeral loopback redirect receiver (RFC 8252 §7.3) for the browser + * authorization-code flow. The client binds a single-shot HTTP listener on + * `127.0.0.1` at an OS-assigned port, serves one path (`/callback`), and uses + * `http://127.0.0.1:/callback` as the OAuth `redirect_uri`. The server + * already restricts `redirect_uri` to loopback hosts, so this matches; a fixed + * port was rejected (collisions, and the server would have to allowlist it). + * + * @import { Server, ServerResponse } from 'node:http' + */ + +const CALLBACK_PATH = '/callback' +const DEFAULT_TIMEOUT_MS = 5 * 60 * 1000 + +/** + * Start the single-shot loopback receiver. Binds `127.0.0.1:0`, then resolves + * `{ redirectUri, port, waitForCode, close }` so the caller can build the start + * URL before opening the browser. `waitForCode()` resolves `{ code }` when + * `/callback` arrives with a matching `state`; it rejects on `error=`, a + * `state` mismatch, or timeout. The listener serves a minimal "you can close + * this tab" page, then closes after one request. + * + * @param {{ state: string, timeoutMs?: number }} args + * @returns {Promise<{ redirectUri: string, port: number, waitForCode: () => Promise<{ code: string }>, close: () => void }>} + * @ref LLP 0046#d2 [implements]: ephemeral 127.0.0.1 redirect, single-shot, timed out (RFC 8252) + */ +export function startLoopbackReceiver({ state, timeoutMs = DEFAULT_TIMEOUT_MS }) { + const log = getLogger('remote') + + // One-flow result channel: the request handler (or the timeout) settles it + // exactly once; waitForCode() adopts these resolvers. + let settled = false + let started = false + /** @type {(value: { code: string }) => void} */ + let resolveCode = () => {} + /** @type {(err: Error) => void} */ + let rejectCode = () => {} + /** @type {{ code: string } | { error: Error } | undefined} */ + let pending + + /** @param {{ code: string }} value */ + function deliver(value) { + if (settled) return + settled = true + clearTimeout(timer) + server.close(() => {}) + pending = value + resolveCode(value) + } + /** @param {Error} err @param {string} kind */ + function fail(err, kind) { + if (settled) return + settled = true + clearTimeout(timer) + server.close(() => {}) + pending = { error: err } + log.warn('remote.loopback_error', { + [Attr.COMPONENT]: 'remote-oidc', + [Attr.OPERATION]: 'remote.loopback', + [Attr.STATUS]: 'failed', + [Attr.ERROR_KIND]: kind, + smoke_step: 'loopback_callback', + }) + rejectCode(err) + } + + /** @type {Server} */ + const server = http.createServer((req, res) => { + const url = new URL(req.url ?? '/', 'http://127.0.0.1') + if (url.pathname !== CALLBACK_PATH) { + res.writeHead(404, { 'content-type': 'text/plain' }) + res.end('not found') + return + } + const params = url.searchParams + const returnedState = params.get('state') + const error = params.get('error') + const code = params.get('code') + + // A callback whose state does not match is rejected without reading the + // code (CSRF guard). Serve a neutral page either way. + if (returnedState !== state) { + respond(res, 'Login state mismatch. You can close this tab.') + fail(new Error('loopback received a callback with a mismatched state'), 'state_mismatch') + return + } + if (error) { + respond(res, 'Login failed. You can close this tab and return to the terminal.') + fail(Object.assign(new Error(`login failed: ${error}`), { callbackError: error }), error) + return + } + if (!code) { + respond(res, 'Login response was missing a code. You can close this tab.') + fail(new Error('loopback callback had neither code nor error'), 'no_code') + return + } + respond(res, 'Login complete. You can close this tab and return to the terminal.') + deliver({ code }) + }) + + /** @type {ReturnType} */ + let timer + + return new Promise((resolveStart, rejectStart) => { + server.on('error', (err) => { + const e = err instanceof Error ? err : new Error(String(err)) + // A post-listen error must settle a pending waitForCode() (and close the + // server), not just no-op against the already-resolved start promise. + if (started) { + fail(e, 'server_error') + return + } + if (settled) return + settled = true + rejectStart(e) + }) + + server.listen(0, '127.0.0.1', () => { + started = true + const addr = server.address() + const port = addr && typeof addr === 'object' ? addr.port : 0 + const redirectUri = `http://127.0.0.1:${port}/callback` + log.info('remote.loopback_bind', { + [Attr.COMPONENT]: 'remote-oidc', + [Attr.OPERATION]: 'remote.loopback', + [Attr.STATUS]: 'ok', + port, + smoke_step: 'loopback_bind', + }) + + timer = setTimeout(() => { + fail(new Error('timed out waiting for the browser login to complete'), 'timeout') + }, timeoutMs) + if (typeof timer.unref === 'function') timer.unref() + + resolveStart({ + redirectUri, + port, + waitForCode() { + return new Promise((resolve, reject) => { + // If the callback already arrived (fast browser), settle now. + if (pending && 'code' in pending) return resolve(pending) + if (pending && 'error' in pending) return reject(pending.error) + resolveCode = resolve + rejectCode = reject + }) + }, + close() { + // Closing before a code arrives must reject an in-flight (or future) + // waitForCode(), or the caller's promise hangs forever. + if (!settled) { + settled = true + clearTimeout(timer) + const err = new Error('loopback receiver closed before a code arrived') + pending = { error: err } + rejectCode(err) + } + server.close(() => {}) + }, + }) + }) + }) +} + +/** @param {ServerResponse} res @param {string} message */ +function respond(res, message) { + const body = `HypAware login

${message}

` + res.writeHead(200, { 'content-type': 'text/html; charset=utf-8' }) + res.end(body) +} diff --git a/src/core/remote/pkce.js b/src/core/remote/pkce.js new file mode 100644 index 0000000..34b8422 --- /dev/null +++ b/src/core/remote/pkce.js @@ -0,0 +1,37 @@ +// @ts-check + +import crypto from 'node:crypto' + +/** + * PKCE (RFC 7636) for the client's **downstream** OAuth leg: `hyp` is the + * OAuth app talking to hypaware-server. The client generates a verifier and + * an S256 challenge, sends the challenge to `/login/start`, and holds the + * verifier to present at `/token`. The upstream leg (hypaware-server to the + * IdP) is server-internal and the client never sees it. + * + * Pure and synchronous over stdlib `crypto`; no I/O, no persistence. The + * verifier lives in memory for one flow and is never logged. + */ + +/** + * Generate a one-flow PKCE pair. The verifier is 32 random bytes base64url + * encoded; the challenge is the base64url SHA-256 of the verifier ASCII. + * + * @returns {{ verifier: string, challenge: string }} + * @ref LLP 0046#d3 [implements]: client owns the downstream PKCE leg; verifier + S256 challenge, in-memory for one flow + */ +export function createPkcePair() { + const verifier = base64url(crypto.randomBytes(32)) + const challenge = base64url(crypto.createHash('sha256').update(verifier).digest()) + return { verifier, challenge } +} + +/** + * Base64url with no padding (RFC 7636 §A). + * + * @param {Buffer} buf + * @returns {string} + */ +function base64url(buf) { + return buf.toString('base64').replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '') +} diff --git a/test/core/remote-loopback.test.js b/test/core/remote-loopback.test.js new file mode 100644 index 0000000..a8e5c08 --- /dev/null +++ b/test/core/remote-loopback.test.js @@ -0,0 +1,77 @@ +// @ts-check + +import test from 'node:test' +import assert from 'node:assert/strict' + +import { startLoopbackReceiver } from '../../src/core/remote/loopback.js' + +/** + * Drive the real bound port with a real HTTP request, the way the browser + * would hit the loopback redirect. + * + * @param {string} redirectUri + * @param {Record} query + * @returns {Promise} + */ +async function hitCallback(redirectUri, query) { + const url = new URL(redirectUri) + for (const [k, v] of Object.entries(query)) url.searchParams.set(k, v) + const res = await fetch(url, { redirect: 'manual' }) + await res.text() + return res.status +} + +test('a matching state resolves { code } and closes the listener', async () => { + const recv = await startLoopbackReceiver({ state: 's-good', timeoutMs: 2000 }) + const waiting = recv.waitForCode() + const status = await hitCallback(recv.redirectUri, { code: 'abc123', state: 's-good' }) + assert.equal(status, 200) + assert.deepEqual(await waiting, { code: 'abc123' }) + + // The listener is single-shot: a second request must fail to connect. + await assert.rejects(() => hitCallback(recv.redirectUri, { code: 'x', state: 's-good' })) +}) + +test('a mismatched state rejects without yielding a code', async () => { + const recv = await startLoopbackReceiver({ state: 's-real', timeoutMs: 2000 }) + // Attach the rejection assertion synchronously so the (synchronous) reject + // inside the request handler always lands on a handler. + const assertion = assert.rejects(() => recv.waitForCode(), /mismatched state/) + await hitCallback(recv.redirectUri, { code: 'leak', state: 's-forged' }) + await assertion +}) + +test('an error= callback rejects with the error code attached', async () => { + const recv = await startLoopbackReceiver({ state: 's1', timeoutMs: 2000 }) + const assertion = assert.rejects(() => recv.waitForCode(), (err) => { + assert.match(/** @type {Error} */ (err).message, /org_selection_required/) + assert.equal(/** @type {any} */ (err).callbackError, 'org_selection_required') + return true + }) + await hitCallback(recv.redirectUri, { error: 'org_selection_required', state: 's1' }) + await assertion +}) + +test('a timeout rejects', async () => { + const recv = await startLoopbackReceiver({ state: 's1', timeoutMs: 50 }) + await assert.rejects(() => recv.waitForCode(), /timed out/) +}) + +test('close() before a code arrives rejects a pending waitForCode (no hang)', async () => { + const recv = await startLoopbackReceiver({ state: 's1', timeoutMs: 2000 }) + const assertion = assert.rejects(() => recv.waitForCode(), /closed before a code arrived/) + recv.close() + await assertion +}) + +test('a request to a path other than /callback does not consume the single shot', async () => { + const recv = await startLoopbackReceiver({ state: 's1', timeoutMs: 2000 }) + const waiting = recv.waitForCode() + // A favicon-style probe to another path 404s without settling the flow. + const probe = await fetch(new URL('/favicon.ico', recv.redirectUri)) + await probe.text() + assert.equal(probe.status, 404) + // The real callback still works afterward. + await hitCallback(recv.redirectUri, { code: 'ok', state: 's1' }) + assert.deepEqual(await waiting, { code: 'ok' }) +}) diff --git a/test/core/remote-pkce.test.js b/test/core/remote-pkce.test.js new file mode 100644 index 0000000..dbbea31 --- /dev/null +++ b/test/core/remote-pkce.test.js @@ -0,0 +1,31 @@ +// @ts-check + +import test from 'node:test' +import assert from 'node:assert/strict' +import crypto from 'node:crypto' + +import { createPkcePair } from '../../src/core/remote/pkce.js' + +/** base64url with no padding, the encoding RFC 7636 mandates. */ +function base64url(/** @type {Buffer} */ buf) { + return buf.toString('base64').replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '') +} + +test('challenge is the base64url SHA-256 of the verifier', () => { + const { verifier, challenge } = createPkcePair() + const expected = base64url(crypto.createHash('sha256').update(verifier).digest()) + assert.equal(challenge, expected) +}) + +test('verifier and challenge are base64url (no +/= padding chars)', () => { + const { verifier, challenge } = createPkcePair() + assert.match(verifier, /^[A-Za-z0-9_-]+$/) + assert.match(challenge, /^[A-Za-z0-9_-]+$/) +}) + +test('two pairs differ (fresh randomness per flow)', () => { + const a = createPkcePair() + const b = createPkcePair() + assert.notEqual(a.verifier, b.verifier) + assert.notEqual(a.challenge, b.challenge) +}) From c4c4ca4abeb93779490909b0d92da236632a306a Mon Sep 17 00:00:00 2001 From: Kenny Daniel Date: Mon, 29 Jun 2026 11:36:20 -0700 Subject: [PATCH 03/59] remote/oidc: identity client + browser opener + login orchestrator (T3, T5) (#198) * remote/oidc: identity client, browser opener, login orchestrator (T3, T5) Completes milestone-1 local primitives, composing chunk 1's PKCE + loopback: - identity_client.js: exchangeCode / refreshSession over an injectable fetch against /v1/identity/token; a 401 invalid_grant surfaces a typed InvalidGrantError. Response field is access_jwt, expires_at is ISO. No external JWKS on the client. - open_browser.js: platform opener (open / xdg-open / cmd start), detached; returns whether an opener was found (LLP 0046 D8). - oidc_login.js: loginWithBrowser() orchestrates PKCE + a random state, starts the loopback receiver, builds the /login/start URL, opens the browser (or prints the URL), awaits the code, and exchanges it for the session. No persistence; the caller stores it (LLP 0046 D2/D3). - types.d.ts: OidcSession / RefreshedAccess shared interfaces. Unit tests cover the token request bodies + response mapping, the invalid_grant typing, each platform opener, and the full PKCE->loopback->exchange orchestration including --no-browser and loopback cleanup on failure. * remote/oidc: open_browser must not crash on async spawn ENOENT Review follow-ups (PR #198): - spawn delivers a missing-opener failure as an async 'error' event, not a synchronous throw. Without a listener it became an uncaught exception that crashed the CLI on the D8 no-opener path. Attach child.on('error') to swallow it; document the boolean return as best-effort. - oidc_login now always prints the URL as a fallback even when the opener reported success, so a silently-failed launcher never strands the user. - test now models spawn's async 'error' emission (EventEmitter child) and asserts no process crash; keep the synchronous-throw case too. --- src/core/remote/identity_client.js | 137 +++++++++++++++++++++++ src/core/remote/oidc_login.js | 108 ++++++++++++++++++ src/core/remote/open_browser.js | 52 +++++++++ src/core/remote/types.d.ts | 27 +++++ test/core/remote-identity-client.test.js | 86 ++++++++++++++ test/core/remote-oidc-login.test.js | 102 +++++++++++++++++ test/core/remote-open-browser.test.js | 69 ++++++++++++ 7 files changed, 581 insertions(+) create mode 100644 src/core/remote/identity_client.js create mode 100644 src/core/remote/oidc_login.js create mode 100644 src/core/remote/open_browser.js create mode 100644 src/core/remote/types.d.ts create mode 100644 test/core/remote-identity-client.test.js create mode 100644 test/core/remote-oidc-login.test.js create mode 100644 test/core/remote-open-browser.test.js diff --git a/src/core/remote/identity_client.js b/src/core/remote/identity_client.js new file mode 100644 index 0000000..800149b --- /dev/null +++ b/src/core/remote/identity_client.js @@ -0,0 +1,137 @@ +// @ts-check + +import { Attr, getLogger } from '../observability/index.js' + +/** + * Plain JSON client for hypaware-server's identity surface + * (`/v1/identity`), distinct from the MCP JSON-RPC `client.js`. It + * speaks the two token grants the server exposes (LLP 0047 §the-server- + * contract): `authorization_code` (browser login) and `refresh_token` (silent + * refresh). The response field is `access_jwt` (not `access_token`) and + * `expires_at` is an ISO timestamp; the JWT is hypaware-server's own + * credential, so there is no external JWKS trust on the client. + * + * @import { OidcSession, RefreshedAccess } from '../../../src/core/remote/types.js' + */ + +/** + * Error thrown when the token endpoint rejects a refresh with + * `invalid_grant` (the refresh row was revoked or expired). The attach path + * turns this into the "re-run `hyp remote login`" guidance (LLP 0046 D5). + */ +export class InvalidGrantError extends Error { + /** @param {string} [message] */ + constructor(message = 'refresh token was rejected (invalid_grant)') { + super(message) + this.name = 'InvalidGrantError' + /** @type {'invalid_grant'} */ + this.code = 'invalid_grant' + } +} + +/** + * Exchange an authorization code for a session (the `authorization_code` + * grant). Presents the held PKCE verifier. + * + * @param {{ identityBase: string, code: string, codeVerifier: string, fetchImpl?: typeof fetch }} args + * @returns {Promise} + */ +export async function exchangeCode({ identityBase, code, codeVerifier, fetchImpl }) { + const body = { grant_type: 'authorization_code', code, code_verifier: codeVerifier } + const json = await postToken({ identityBase, body, fetchImpl, operation: 'remote.exchange_code' }) + return { + refreshToken: str(json.refresh_token, 'refresh_token'), + accessJwt: str(json.access_jwt, 'access_jwt'), + expiresAt: str(json.expires_at, 'expires_at'), + org: str(json.org, 'org'), + } +} + +/** + * Refresh an access JWT (the `refresh_token` grant). Throws + * {@link InvalidGrantError} on a 401 `invalid_grant`. + * + * @param {{ identityBase: string, refreshToken: string, fetchImpl?: typeof fetch }} args + * @returns {Promise} + */ +export async function refreshSession({ identityBase, refreshToken, fetchImpl }) { + const body = { grant_type: 'refresh_token', refresh_token: refreshToken } + const json = await postToken({ identityBase, body, fetchImpl, operation: 'remote.refresh' }) + return { + accessJwt: str(json.access_jwt, 'access_jwt'), + expiresAt: str(json.expires_at, 'expires_at'), + org: str(json.org, 'org'), + } +} + +/** + * POST a JSON body to `/token` and parse the JSON response. A + * 401 whose body carries `error: "invalid_grant"` becomes an + * {@link InvalidGrantError}; any other non-2xx becomes a generic error. The + * refresh token, access JWT, code, and verifier are never logged. + * + * @param {{ identityBase: string, body: Record, fetchImpl?: typeof fetch, operation: string }} args + * @returns {Promise>} + */ +async function postToken({ identityBase, body, fetchImpl, operation }) { + const doFetch = fetchImpl ?? /** @type {typeof fetch | undefined} */ (globalThis.fetch) + if (typeof doFetch !== 'function') { + throw new Error('no fetch implementation available for the identity client') + } + const log = getLogger('remote') + const url = `${trimSlash(identityBase)}/token` + const res = await doFetch(url, { + method: 'POST', + headers: { 'content-type': 'application/json', accept: 'application/json' }, + body: JSON.stringify(body), + }) + const text = await safeText(res) + /** @type {any} */ + let json + try { + json = text ? JSON.parse(text) : {} + } catch { + throw new Error(`identity endpoint returned a non-JSON response (HTTP ${res.status})`) + } + if (!res.ok) { + const errCode = json && typeof json.error === 'string' ? json.error : undefined + log.warn('remote.token_error', { + [Attr.COMPONENT]: 'remote-oidc', + [Attr.OPERATION]: operation, + [Attr.STATUS]: 'failed', + [Attr.ERROR_KIND]: errCode ?? `http_${res.status}`, + }) + if (res.status === 401 && errCode === 'invalid_grant') { + throw new InvalidGrantError() + } + throw new Error(`identity endpoint rejected the grant (HTTP ${res.status}${errCode ? ` ${errCode}` : ''})`) + } + log.info('remote.token_ok', { + [Attr.COMPONENT]: 'remote-oidc', + [Attr.OPERATION]: operation, + [Attr.STATUS]: 'ok', + }) + return json && typeof json === 'object' ? json : {} +} + +/** @param {unknown} v @param {string} field @returns {string} */ +function str(v, field) { + if (typeof v !== 'string' || v.length === 0) { + throw new Error(`identity response missing '${field}'`) + } + return v +} + +/** @param {string} base @returns {string} */ +function trimSlash(base) { + return base.replace(/\/+$/, '') +} + +/** @param {any} res @returns {Promise} */ +async function safeText(res) { + try { + return await res.text() + } catch { + return '' + } +} diff --git a/src/core/remote/oidc_login.js b/src/core/remote/oidc_login.js new file mode 100644 index 0000000..4435d3a --- /dev/null +++ b/src/core/remote/oidc_login.js @@ -0,0 +1,108 @@ +// @ts-check + +import crypto from 'node:crypto' + +import { Attr, getLogger } from '../observability/index.js' +import { exchangeCode } from './identity_client.js' +import { startLoopbackReceiver } from './loopback.js' +import { openBrowser as defaultOpenBrowser } from './open_browser.js' +import { createPkcePair } from './pkce.js' + +/** + * Orchestrate the browser authorization-code flow (LLP 0046 D2/D3): generate + * a PKCE pair and a random CSRF `state`, start the ephemeral loopback + * receiver, build the `/login/start` URL, open the browser (or print the URL), + * await the loopback `code`, exchange it at `/token`, and return the session. + * No persistence here: the caller stores the returned session. + * + * @import { OidcSession } from '../../../src/core/remote/types.js' + */ + +/** + * @param {{ + * identityBase: string, + * org?: string, + * noBrowser?: boolean, + * openBrowser?: typeof defaultOpenBrowser, + * fetchImpl?: typeof fetch, + * startReceiver?: typeof startLoopbackReceiver, + * timeoutMs?: number, + * print?: (line: string) => void, + * }} args + * @returns {Promise} + * @ref LLP 0046#d3 [implements]: client orchestrates the downstream PKCE leg; verifier held in memory, presented at /token + */ +export async function loginWithBrowser({ + identityBase, + org, + noBrowser = false, + openBrowser = defaultOpenBrowser, + fetchImpl, + startReceiver = startLoopbackReceiver, + timeoutMs, + print = () => {}, +}) { + const log = getLogger('remote') + const { verifier, challenge } = createPkcePair() + const state = crypto.randomBytes(16).toString('hex') + + const receiver = await startReceiver({ state, timeoutMs }) + try { + const startUrl = buildStartUrl({ identityBase, redirectUri: receiver.redirectUri, challenge, state, org }) + + log.info('remote.login_start', { + [Attr.COMPONENT]: 'remote-oidc', + [Attr.OPERATION]: 'remote.login', + [Attr.STATUS]: 'ok', + has_org: Boolean(org), + smoke_step: 'login_start', + }) + + const opened = noBrowser ? false : openBrowser(startUrl) + if (opened) { + // The opener boolean is best-effort (a silently-failed launcher still + // returns true), so always print the URL as a fallback path. + print(`Opened your browser to sign in. Waiting for the redirect...`) + print(`If it did not open, visit:\n\n ${startUrl}\n`) + } else { + print(`Open this URL in your browser to sign in:\n\n ${startUrl}\n`) + } + log.info('remote.browser_open', { + [Attr.COMPONENT]: 'remote-oidc', + [Attr.OPERATION]: 'remote.login', + [Attr.STATUS]: opened ? 'ok' : 'skipped', + opener_found: opened, + smoke_step: 'browser_open', + }) + + const { code } = await receiver.waitForCode() + const session = await exchangeCode({ identityBase, code, codeVerifier: verifier, fetchImpl }) + log.info('remote.login_complete', { + [Attr.COMPONENT]: 'remote-oidc', + [Attr.OPERATION]: 'remote.login', + [Attr.STATUS]: 'ok', + smoke_step: 'login_complete', + }) + return session + } finally { + receiver.close() + } +} + +/** + * Build the `GET /login/start` URL the browser navigates to (LLP 0047 §the- + * server-contract). `org` is an optional selector only; the server resolves + * the real org. + * + * @param {{ identityBase: string, redirectUri: string, challenge: string, state: string, org?: string }} args + * @returns {string} + */ +export function buildStartUrl({ identityBase, redirectUri, challenge, state, org }) { + const url = new URL(`${identityBase.replace(/\/+$/, '')}/login/start`) + url.searchParams.set('redirect_uri', redirectUri) + url.searchParams.set('code_challenge', challenge) + url.searchParams.set('code_challenge_method', 'S256') + url.searchParams.set('state', state) + if (org) url.searchParams.set('org', org) + return url.toString() +} diff --git a/src/core/remote/open_browser.js b/src/core/remote/open_browser.js new file mode 100644 index 0000000..743c7a9 --- /dev/null +++ b/src/core/remote/open_browser.js @@ -0,0 +1,52 @@ +// @ts-check + +import { spawn } from 'node:child_process' +import process from 'node:process' + +/** + * Spawn the platform browser opener for `url`, detached, so the login flow + * does not hold the opener process. Returns whether an opener was found; the + * caller prints the URL for manual open when it was not (or under + * `--no-browser`). The static `--token-file`/stdin path remains the headless + * escape hatch (LLP 0046 D8); no device-code flow. + * + * @param {string} url + * @param {{ platform?: NodeJS.Platform, spawnImpl?: typeof spawn }} [opts] + * @returns {boolean} + * @ref LLP 0046#d8 [implements]: print-the-URL fallback when no opener; static token stays the headless path + */ +export function openBrowser(url, opts = {}) { + const platform = opts.platform ?? process.platform + const spawnImpl = opts.spawnImpl ?? spawn + const opener = openerFor(platform) + if (!opener) return false + try { + const child = spawnImpl(opener.command, [...opener.args, url], { + detached: true, + stdio: 'ignore', + }) + // A missing opener (e.g. no `xdg-open`) is delivered ASYNCHRONOUSLY as an + // 'error' event, not a synchronous throw. Without a listener that becomes + // an uncaught exception that crashes the process. Swallow it: the printed + // URL and the loopback timeout are the real backstops (D8). The boolean + // return is therefore best-effort, not a guarantee the browser opened. + if (child && typeof child.on === 'function') child.on('error', () => {}) + if (child && typeof child.unref === 'function') child.unref() + return true + } catch { + return false + } +} + +/** + * @param {NodeJS.Platform} platform + * @returns {{ command: string, args: string[] } | undefined} + */ +function openerFor(platform) { + if (platform === 'darwin') return { command: 'open', args: [] } + if (platform === 'win32') return { command: 'cmd', args: ['/c', 'start', ''] } + // Treat every other Unix as freedesktop (xdg-open). A missing xdg-open + // surfaces as an async spawn 'error' event (swallowed above); the caller's + // loopback waits and the URL is printed, so login still completes manually. + return { command: 'xdg-open', args: [] } +} diff --git a/src/core/remote/types.d.ts b/src/core/remote/types.d.ts new file mode 100644 index 0000000..c0456da --- /dev/null +++ b/src/core/remote/types.d.ts @@ -0,0 +1,27 @@ +/** + * Shared types for the remote OIDC client (LLP 0046-0048). Imported via + * `@import` from the sibling `.js` modules using a repo-root-anchored `.js` + * specifier so the published declaration build resolves them identically. + */ + +/** + * A full OIDC session, the result of the `authorization_code` grant. The + * refresh token is long-lived and revocable; the access JWT is short-lived + * and re-minted by the refresh grant. + */ +export interface OidcSession { + refreshToken: string + accessJwt: string + /** ISO-8601 expiry of `accessJwt`. */ + expiresAt: string + /** The org hypaware-server resolved for this session. */ + org: string +} + +/** The result of the `refresh_token` grant: a fresh access JWT only. */ +export interface RefreshedAccess { + accessJwt: string + /** ISO-8601 expiry of `accessJwt`. */ + expiresAt: string + org: string +} diff --git a/test/core/remote-identity-client.test.js b/test/core/remote-identity-client.test.js new file mode 100644 index 0000000..f2c6ba6 --- /dev/null +++ b/test/core/remote-identity-client.test.js @@ -0,0 +1,86 @@ +// @ts-check + +import test from 'node:test' +import assert from 'node:assert/strict' + +import { exchangeCode, refreshSession, InvalidGrantError } from '../../src/core/remote/identity_client.js' + +/** + * A fetch stub that records the last request and returns `reply`. + * + * @param {{ status?: number, body: any }} reply + */ +function stubFetch(reply) { + /** @type {{ url: string, init: any }[]} */ + const calls = [] + const fetchImpl = /** @type {any} */ (async (/** @type {string} */ url, /** @type {any} */ init) => { + calls.push({ url, init }) + const status = reply.status ?? 200 + return { + ok: status >= 200 && status < 300, + status, + text: async () => (typeof reply.body === 'string' ? reply.body : JSON.stringify(reply.body)), + } + }) + return { fetchImpl, calls } +} + +test('exchangeCode posts the authorization_code grant and maps the response', async () => { + const { fetchImpl, calls } = stubFetch({ + body: { session_id: 'sess-1', refresh_token: 'rt-1', access_jwt: 'jwt-1', expires_at: '2026-06-29T12:00:00Z', org: 'acme' }, + }) + const session = await exchangeCode({ + identityBase: 'https://hyp.internal/v1/identity', + code: 'auth-code', + codeVerifier: 'verifier-1', + fetchImpl, + }) + assert.deepEqual(session, { refreshToken: 'rt-1', accessJwt: 'jwt-1', expiresAt: '2026-06-29T12:00:00Z', org: 'acme' }) + assert.equal(calls[0].url, 'https://hyp.internal/v1/identity/token') + assert.deepEqual(JSON.parse(calls[0].init.body), { grant_type: 'authorization_code', code: 'auth-code', code_verifier: 'verifier-1' }) +}) + +test('refreshSession posts the refresh_token grant and maps the response', async () => { + const { fetchImpl, calls } = stubFetch({ + body: { access_jwt: 'jwt-2', expires_at: '2026-06-29T13:00:00Z', org: 'acme' }, + }) + const refreshed = await refreshSession({ + identityBase: 'https://hyp.internal/v1/identity', + refreshToken: 'rt-1', + fetchImpl, + }) + assert.deepEqual(refreshed, { accessJwt: 'jwt-2', expiresAt: '2026-06-29T13:00:00Z', org: 'acme' }) + assert.deepEqual(JSON.parse(calls[0].init.body), { grant_type: 'refresh_token', refresh_token: 'rt-1' }) +}) + +test('a 401 invalid_grant surfaces a typed InvalidGrantError', async () => { + const { fetchImpl } = stubFetch({ status: 401, body: { error: 'invalid_grant' } }) + await assert.rejects( + () => refreshSession({ identityBase: 'https://hyp.internal/v1/identity', refreshToken: 'stale', fetchImpl }), + (err) => { + assert.ok(err instanceof InvalidGrantError) + assert.equal(/** @type {any} */ (err).code, 'invalid_grant') + return true + }, + ) +}) + +test('a non-invalid_grant error throws a generic error, not InvalidGrantError', async () => { + const { fetchImpl } = stubFetch({ status: 500, body: { error: 'server_error' } }) + await assert.rejects( + () => refreshSession({ identityBase: 'https://hyp.internal/v1/identity', refreshToken: 'rt', fetchImpl }), + (err) => { + assert.ok(!(err instanceof InvalidGrantError)) + assert.match(/** @type {Error} */ (err).message, /HTTP 500/) + return true + }, + ) +}) + +test('a response missing access_jwt is rejected', async () => { + const { fetchImpl } = stubFetch({ body: { expires_at: 'x', org: 'acme' } }) + await assert.rejects( + () => refreshSession({ identityBase: 'https://hyp.internal/v1/identity', refreshToken: 'rt', fetchImpl }), + /missing 'access_jwt'/, + ) +}) diff --git a/test/core/remote-oidc-login.test.js b/test/core/remote-oidc-login.test.js new file mode 100644 index 0000000..86119ef --- /dev/null +++ b/test/core/remote-oidc-login.test.js @@ -0,0 +1,102 @@ +// @ts-check + +import test from 'node:test' +import assert from 'node:assert/strict' + +import { loginWithBrowser, buildStartUrl } from '../../src/core/remote/oidc_login.js' + +/** + * A scripted loopback receiver: captures the start URL the orchestrator built + * (via the redirectUri it hands back) and yields a fixed code. + * + * @param {{ code?: string, reject?: Error }} [opts] + */ +function scriptedReceiver(opts = {}) { + let closed = false + const startReceiver = /** @type {any} */ (async (/** @type {{ state: string }} */ args) => ({ + redirectUri: 'http://127.0.0.1:54321/callback', + port: 54321, + state: args.state, + waitForCode: async () => { + if (opts.reject) throw opts.reject + return { code: opts.code ?? 'the-code' } + }, + close: () => { closed = true }, + })) + return { startReceiver, wasClosed: () => closed } +} + +test('drives PKCE -> loopback -> exchange and returns the session', async (t) => { + /** @type {any[]} */ + const tokenCalls = [] + const fetchImpl = /** @type {any} */ (async (/** @type {string} */ url, /** @type {any} */ init) => { + tokenCalls.push({ url, body: JSON.parse(init.body) }) + return { + ok: true, + status: 200, + text: async () => JSON.stringify({ session_id: 's', refresh_token: 'rt', access_jwt: 'jwt', expires_at: '2026-06-29T12:00:00Z', org: 'acme' }), + } + }) + /** @type {string[]} */ + const openedUrls = [] + const { startReceiver, wasClosed } = scriptedReceiver({ code: 'code-xyz' }) + + const session = await loginWithBrowser({ + identityBase: 'https://hyp.internal/v1/identity', + org: 'acme', + openBrowser: (url) => { openedUrls.push(url); return true }, + fetchImpl, + startReceiver, + }) + + assert.deepEqual(session, { refreshToken: 'rt', accessJwt: 'jwt', expiresAt: '2026-06-29T12:00:00Z', org: 'acme' }) + // The browser was opened to a /login/start URL carrying the challenge + state + org. + const opened = new URL(openedUrls[0]) + assert.equal(opened.pathname, '/v1/identity/login/start') + assert.equal(opened.searchParams.get('code_challenge_method'), 'S256') + assert.ok(opened.searchParams.get('code_challenge')) + assert.ok(opened.searchParams.get('state')) + assert.equal(opened.searchParams.get('org'), 'acme') + assert.equal(opened.searchParams.get('redirect_uri'), 'http://127.0.0.1:54321/callback') + // The code was exchanged with the held verifier. + assert.equal(tokenCalls[0].body.grant_type, 'authorization_code') + assert.equal(tokenCalls[0].body.code, 'code-xyz') + assert.ok(tokenCalls[0].body.code_verifier) + assert.equal(wasClosed(), true) +}) + +test('--no-browser prints the URL instead of opening it', async () => { + const fetchImpl = /** @type {any} */ (async () => ({ + ok: true, status: 200, + text: async () => JSON.stringify({ refresh_token: 'rt', access_jwt: 'jwt', expires_at: 'x', org: 'acme' }), + })) + const { startReceiver } = scriptedReceiver() + /** @type {string[]} */ + const printed = [] + let openCalled = false + await loginWithBrowser({ + identityBase: 'https://hyp.internal/v1/identity', + noBrowser: true, + openBrowser: () => { openCalled = true; return true }, + fetchImpl, + startReceiver, + print: (line) => printed.push(line), + }) + assert.equal(openCalled, false) + assert.match(printed.join('\n'), /Open this URL/) + assert.match(printed.join('\n'), /\/login\/start/) +}) + +test('closes the loopback even when the flow rejects', async () => { + const { startReceiver, wasClosed } = scriptedReceiver({ reject: new Error('login failed: access_denied') }) + await assert.rejects( + () => loginWithBrowser({ identityBase: 'https://h/v1/identity', openBrowser: () => true, startReceiver }), + /access_denied/, + ) + assert.equal(wasClosed(), true) +}) + +test('buildStartUrl omits org when not given', () => { + const url = new URL(buildStartUrl({ identityBase: 'https://h/v1/identity', redirectUri: 'http://127.0.0.1:1/callback', challenge: 'c', state: 's' })) + assert.equal(url.searchParams.get('org'), null) +}) diff --git a/test/core/remote-open-browser.test.js b/test/core/remote-open-browser.test.js new file mode 100644 index 0000000..1745896 --- /dev/null +++ b/test/core/remote-open-browser.test.js @@ -0,0 +1,69 @@ +// @ts-check + +import test from 'node:test' +import assert from 'node:assert/strict' +import { EventEmitter } from 'node:events' + +import { openBrowser } from '../../src/core/remote/open_browser.js' + +/** + * A spawn stub that records the command and returns a child resembling a real + * ChildProcess (an EventEmitter with `unref`). + */ +function stubSpawn() { + /** @type {{ command: string, args: string[] }[]} */ + const calls = [] + /** @type {any[]} */ + const children = [] + const spawnImpl = /** @type {any} */ ((/** @type {string} */ command, /** @type {string[]} */ args) => { + calls.push({ command, args }) + const child = Object.assign(new EventEmitter(), { unref() {} }) + children.push(child) + return child + }) + return { spawnImpl, calls, children } +} + +test('darwin uses `open`', () => { + const { spawnImpl, calls } = stubSpawn() + const ok = openBrowser('https://x/auth', { platform: 'darwin', spawnImpl }) + assert.equal(ok, true) + assert.equal(calls[0].command, 'open') + assert.deepEqual(calls[0].args, ['https://x/auth']) +}) + +test('linux uses `xdg-open`', () => { + const { spawnImpl, calls } = stubSpawn() + openBrowser('https://x/auth', { platform: 'linux', spawnImpl }) + assert.equal(calls[0].command, 'xdg-open') +}) + +test('win32 uses `cmd /c start`', () => { + const { spawnImpl, calls } = stubSpawn() + openBrowser('https://x/auth', { platform: 'win32', spawnImpl }) + assert.equal(calls[0].command, 'cmd') + assert.deepEqual(calls[0].args, ['/c', 'start', '', 'https://x/auth']) +}) + +test('a synchronous spawn throw returns false (caller prints the URL)', () => { + const spawnImpl = /** @type {any} */ (() => { throw new Error('EACCES') }) + const ok = openBrowser('https://x/auth', { platform: 'linux', spawnImpl }) + assert.equal(ok, false) +}) + +test('an async spawn ENOENT (missing opener) does not crash the process', async () => { + // Real spawn delivers a missing-binary failure as an async "error" event, + // not a synchronous throw. openBrowser must attach a listener so that event + // cannot become an uncaught exception. + const { spawnImpl, children } = stubSpawn() + const ok = openBrowser('https://x/auth', { platform: 'linux', spawnImpl }) + assert.equal(ok, true) // best-effort: it spawned without throwing + let crashed = false + const onUnhandled = () => { crashed = true } + process.once('uncaughtException', onUnhandled) + // Emit the async failure the way a missing xdg-open would. + children[0].emit('error', Object.assign(new Error('spawn xdg-open ENOENT'), { code: 'ENOENT' })) + await new Promise((r) => setTimeout(r, 10)) + process.removeListener('uncaughtException', onUnhandled) + assert.equal(crashed, false) +}) From 44d88fc7c846ccc180c567cb3c051afe6180560d Mon Sep 17 00:00:00 2001 From: Kenny Daniel Date: Mon, 29 Jun 2026 11:40:02 -0700 Subject: [PATCH 04/59] remote/oidc: discriminated credential record + session-aware resolve (T4) (#203) * remote/oidc: discriminated credential record + session-aware resolve (T4) Extends the 0600 credential store to carry OIDC sessions alongside the existing static tokens (LLP 0046 D4), and adds the attach-path resolver (LLP 0046 D5): - Records gain a kind discriminator. readCredentials normalizes each record; a legacy token-only record (no kind) reads as static, so existing files keep working without a rewrite. writeToken now stamps kind: 'static'. - writeSession(stateDir, target, { refreshToken, accessJwt, expiresAt, org }) writes a kind: 'oidc' record through the same atomic 0600 path. - resolveAccessJwt({ target, env, stateDir, identityBase, now, fetchImpl }): env override wins; a static record returns its token; an oidc record returns a fresh access JWT, refreshing + persisting when the cached one is within a 60s skew of expiry, and propagating a refresh failure (typed invalid_grant). resolveToken stays for the stdio proxy. - removeToken drops either kind (whole-record delete, unchanged). - types.d.ts: RemoteStaticRecord / RemoteOidcRecord / the union. Unit tests cover the round-trip, legacy normalization, env override, fresh vs stale refresh + persistence, and failure propagation. Full suite green. * remote/oidc: normalizeRecord keeps a usable static token on a corrupt record Review follow-up (PR #199): an incomplete oidc shape (refreshToken but no accessJwt) on a hand-edited record no longer returns null and drop a working static `token`; it falls through to the static branch. Added boundary tests: the static fallthrough, a malformed-oidc drop, resolveToken on an oidc record, and a skew-window-boundary refresh. --- src/core/remote/credentials.js | 205 +++++++++++++++++++--- src/core/remote/types.d.ts | 29 +++ test/core/remote-credentials-oidc.test.js | 149 ++++++++++++++++ test/core/remote-credentials.test.js | 4 +- 4 files changed, 363 insertions(+), 24 deletions(-) create mode 100644 test/core/remote-credentials-oidc.test.js diff --git a/src/core/remote/credentials.js b/src/core/remote/credentials.js index f529f4c..3120a42 100644 --- a/src/core/remote/credentials.js +++ b/src/core/remote/credentials.js @@ -4,21 +4,35 @@ import fs from 'node:fs/promises' import path from 'node:path' import process from 'node:process' +import { refreshSession } from './identity_client.js' + +/** + * @import { RemoteCredentialRecord, RemoteOidcRecord } from '../../../src/core/remote/types.js' + */ + /** * Query-scoped credential store for the human-CLI `--remote` path (LLP 0033 * §credentials). The token is **never** config (secrets-never-in-config): it * lives in a single `0600` file, written atomically, mirroring `central`'s * `identity.json` single-file precedent. * + * Each per-target record is discriminated by `kind` (LLP 0046 D4): a + * `static` record is the LLP 0033 `{ token }`; an `oidc` record carries the + * refresh token + cached access JWT of a browser-login session. Migration is + * read-implicit: a legacy record with a `token` and no `kind` reads as + * `static`, so existing files keep working without a rewrite. One file, one + * resolve path, one remove path. + * * Stakes are low by scoping: what lands here is the **query-scoped** token * (read/compute tools only; cannot author configs or mint tokens), not the - * all-powerful operator token (LLP 0033 §credential-stakes). AI clients that - * install the endpoint directly hold their own token in their own MCP - * config; this store is only for `hyp`'s client path. + * all-powerful operator token (LLP 0033 §credential-stakes). */ const CREDENTIALS_BASENAME = 'remote-credentials.json' +/** Refresh an `oidc` access JWT once it is within this window of expiry. */ +const REFRESH_SKEW_MS = 60 * 1000 + /** * @param {string} stateDir * @returns {string} @@ -40,11 +54,16 @@ export function remoteTokenEnvVar(target) { } /** - * Read the credential map. Returns `{}` when the file is absent; throws on a - * corrupt file so a silent empty map can't mask a broken store. + * Read the credential map, normalizing each record to a discriminated + * {@link RemoteCredentialRecord}. Returns `{}` when the file is absent; throws + * on a corrupt file so a silent empty map can't mask a broken store. A legacy + * `token`-only record is read as `kind: 'static'` (read-implicit migration, + * LLP 0046 D4). * * @param {string} stateDir - * @returns {Promise>} + * @returns {Promise>} + * @ref LLP 0046#d4 [implements]: discriminated kind record; legacy token-only reads as static + * @ref LLP 0033#credentials [constrained-by]: single 0600 per-target store, secrets never in config */ export async function readCredentials(stateDir) { let raw @@ -64,18 +83,49 @@ export async function readCredentials(stateDir) { if (!parsed || typeof parsed !== 'object' || Array.isArray(parsed)) { throw new Error(`remote credentials file must be a JSON object: ${remoteCredentialsPath(stateDir)}`) } - /** @type {Record} */ + /** @type {Record} */ const out = {} for (const [target, entry] of Object.entries(/** @type {Record} */ (parsed))) { - if (entry && typeof entry === 'object' && typeof (/** @type {any} */ (entry).token) === 'string') { - out[target] = { token: /** @type {any} */ (entry).token } - } + const record = normalizeRecord(entry) + if (record) out[target] = record } return out } /** - * Store (or replace) the token for a target. Atomic tmp+rename, mode `0600`. + * Normalize a stored entry into a discriminated record, or `null` if it is + * neither a usable static nor oidc record. + * + * @param {unknown} entry + * @returns {RemoteCredentialRecord | null} + */ +function normalizeRecord(entry) { + if (!entry || typeof entry !== 'object') return null + const e = /** @type {Record} */ (entry) + // A complete oidc record (explicit kind, or an inferable refresh+access pair). + if (e.kind === 'oidc' || (e.kind === undefined && typeof e.refreshToken === 'string')) { + if (typeof e.refreshToken === 'string' && typeof e.accessJwt === 'string') { + return { + kind: 'oidc', + refreshToken: e.refreshToken, + accessJwt: e.accessJwt, + expiresAt: typeof e.expiresAt === 'string' ? e.expiresAt : '', + org: typeof e.org === 'string' ? e.org : '', + } + } + // Incomplete oidc shape: fall through so a usable static `token` on the + // same (corrupt/hand-edited) record is not silently dropped. + } + // Legacy / static: a bare token with no (or static) kind. + if (typeof e.token === 'string') { + return { kind: 'static', token: e.token } + } + return null +} + +/** + * Store (or replace) a static query-scoped token for a target, stamped + * `kind: 'static'`. Atomic tmp+rename, mode `0600`. * * @param {string} stateDir * @param {string} target @@ -85,12 +135,36 @@ export async function readCredentials(stateDir) { export async function writeToken(stateDir, target, token) { await fs.mkdir(stateDir, { recursive: true }) const current = await readCredentials(stateDir) - current[target] = { token } + current[target] = { kind: 'static', token } await writeCredentials(stateDir, current) } /** - * Remove a target's stored token. Returns whether one was present. + * Store (or replace) an OIDC session for a target, stamped `kind: 'oidc'`. + * Same atomic 0600 path as {@link writeToken} (LLP 0046 D4). + * + * @param {string} stateDir + * @param {string} target + * @param {{ refreshToken: string, accessJwt: string, expiresAt: string, org: string }} session + * @returns {Promise} + * @ref LLP 0046#d4 [implements]: oidc session written through the same atomic 0600 store as static tokens + */ +export async function writeSession(stateDir, target, session) { + await fs.mkdir(stateDir, { recursive: true }) + const current = await readCredentials(stateDir) + current[target] = { + kind: 'oidc', + refreshToken: session.refreshToken, + accessJwt: session.accessJwt, + expiresAt: session.expiresAt, + org: session.org, + } + await writeCredentials(stateDir, current) +} + +/** + * Remove a target's stored record (either kind). Returns whether one was + * present. * * @param {string} stateDir * @param {string} target @@ -105,9 +179,12 @@ export async function removeToken(stateDir, target) { } /** - * Resolve a target's token at query time. Order: per-target env var - * (CI/ephemeral) → stored file → error. An env override never falls through - * to the file, so an ephemeral token always wins (LLP 0033 §credentials). + * Resolve a target's bearer token at query time, **without** session-aware + * refresh. Order: per-target env var (CI/ephemeral) → stored file → error. An + * env override never falls through to the file (LLP 0033 §credentials). For an + * `oidc` record this returns the cached access JWT as-is; the attach path uses + * {@link resolveAccessJwt} for refresh. Kept for the stdio proxy and other + * non-refreshing callers. * * @param {{ target: string, env: NodeJS.ProcessEnv, stateDir: string }} args * @returns {Promise<{ ok: true, token: string, source: 'env' | 'file' } | { ok: false, error: string }>} @@ -120,18 +197,102 @@ export async function resolveToken({ target, env, stateDir }) { } const creds = await readCredentials(stateDir) const entry = creds[target] - if (entry && entry.token.length > 0) { - return { ok: true, token: entry.token, source: 'file' } + const token = entry ? bearerOf(entry) : '' + if (token.length > 0) { + return { ok: true, token, source: 'file' } + } + return { ok: false, error: noTokenError(target, envName) } +} + +/** + * Session-aware resolver for the query attach path (LLP 0046 D5). The + * per-target env override still wins. A `static` record returns its token. An + * `oidc` record returns a fresh access JWT, calling `refreshSession` and + * persisting the new JWT/expiry when the stored one is within a skew window of + * expiry; a non-stale JWT is returned untouched. A refresh failure (including + * a typed `invalid_grant`) propagates to the caller. + * + * @param {{ + * target: string, + * env: NodeJS.ProcessEnv, + * stateDir: string, + * identityBase?: string, + * now?: number, + * fetchImpl?: typeof fetch, + * }} args + * @returns {Promise<{ ok: true, token: string, source: 'env' | 'file', kind?: 'static' | 'oidc' } | { ok: false, error: string }>} + * @ref LLP 0046#d5 [implements]: silent refresh on the attach path; env override still wins + */ +export async function resolveAccessJwt({ target, env, stateDir, identityBase, now = Date.now(), fetchImpl }) { + const envName = remoteTokenEnvVar(target) + const fromEnv = env[envName] + if (typeof fromEnv === 'string' && fromEnv.length > 0) { + return { ok: true, token: fromEnv, source: 'env', kind: 'static' } + } + + const creds = await readCredentials(stateDir) + const entry = creds[target] + if (!entry) { + return { ok: false, error: noTokenError(target, envName) } } - return { - ok: false, - error: `no token for '${target}' - run 'hyp remote login ${target}' (or set ${envName})`, + if (entry.kind === 'static') { + return entry.token.length > 0 + ? { ok: true, token: entry.token, source: 'file', kind: 'static' } + : { ok: false, error: noTokenError(target, envName) } } + + // oidc: refresh if the cached JWT is missing, unparseable, or near expiry. + if (!isFresh(entry, now)) { + if (!identityBase) { + return { ok: false, error: `cannot refresh '${target}': no identity endpoint resolved` } + } + const refreshed = await refreshSession({ identityBase, refreshToken: entry.refreshToken, fetchImpl }) + /** @type {RemoteOidcRecord} */ + const next = { ...entry, accessJwt: refreshed.accessJwt, expiresAt: refreshed.expiresAt, org: refreshed.org } + await writeSession(stateDir, target, next) + return { ok: true, token: refreshed.accessJwt, source: 'file', kind: 'oidc' } + } + return { ok: true, token: entry.accessJwt, source: 'file', kind: 'oidc' } +} + +/** + * Whether an `oidc` record's cached JWT is still safely usable: present and + * more than the skew window from its parseable expiry. + * + * @param {RemoteOidcRecord} record + * @param {number} now + * @returns {boolean} + */ +function isFresh(record, now) { + if (!record.accessJwt) return false + const expiry = Date.parse(record.expiresAt) + if (Number.isNaN(expiry)) return false + return expiry - now > REFRESH_SKEW_MS +} + +/** + * The bearer token a record presents as-is (no refresh): the static token, or + * the oidc cached access JWT. + * + * @param {RemoteCredentialRecord} record + * @returns {string} + */ +function bearerOf(record) { + return record.kind === 'oidc' ? record.accessJwt : record.token +} + +/** + * @param {string} target + * @param {string} envName + * @returns {string} + */ +function noTokenError(target, envName) { + return `no token for '${target}' - run 'hyp remote login ${target}' (or set ${envName})` } /** * @param {string} stateDir - * @param {Record} map + * @param {Record} map * @returns {Promise} */ async function writeCredentials(stateDir, map) { diff --git a/src/core/remote/types.d.ts b/src/core/remote/types.d.ts index c0456da..0a07c0c 100644 --- a/src/core/remote/types.d.ts +++ b/src/core/remote/types.d.ts @@ -25,3 +25,32 @@ export interface RefreshedAccess { expiresAt: string org: string } + +/** + * A legacy / static credential record: a bare query-scoped token (LLP 0033). + * On read, a record with a `token` and no `kind` is normalized to this. + */ +export interface RemoteStaticRecord { + kind: 'static' + token: string +} + +/** + * An OIDC session record (LLP 0046 D4): the revocable refresh token plus the + * cached short-lived access JWT, its expiry, and the resolved org. + */ +export interface RemoteOidcRecord { + kind: 'oidc' + refreshToken: string + accessJwt: string + /** ISO-8601 expiry of `accessJwt`. */ + expiresAt: string + org: string +} + +/** + * One per-target record in `remote-credentials.json`, discriminated by + * `kind`. Both kinds live in the same file, share one resolve path, and are + * dropped by the same `removeToken`. + */ +export type RemoteCredentialRecord = RemoteStaticRecord | RemoteOidcRecord diff --git a/test/core/remote-credentials-oidc.test.js b/test/core/remote-credentials-oidc.test.js new file mode 100644 index 0000000..28c3c89 --- /dev/null +++ b/test/core/remote-credentials-oidc.test.js @@ -0,0 +1,149 @@ +// @ts-check + +import test from 'node:test' +import assert from 'node:assert/strict' +import fs from 'node:fs/promises' +import os from 'node:os' +import path from 'node:path' + +import { + readCredentials, + remoteCredentialsPath, + removeToken, + resolveAccessJwt, + resolveToken, + writeSession, + writeToken, +} from '../../src/core/remote/credentials.js' + +async function tmpState() { + return fs.mkdtemp(path.join(os.tmpdir(), 'hyp-oidc-creds-')) +} + +const FUTURE = '2999-01-01T00:00:00Z' +const PAST = '2000-01-01T00:00:00Z' + +test('an oidc session round-trips with kind: oidc', async () => { + const dir = await tmpState() + await writeSession(dir, 'prod', { refreshToken: 'rt', accessJwt: 'jwt', expiresAt: FUTURE, org: 'acme' }) + const creds = await readCredentials(dir) + assert.deepEqual(creds.prod, { kind: 'oidc', refreshToken: 'rt', accessJwt: 'jwt', expiresAt: FUTURE, org: 'acme' }) + const st = await fs.stat(remoteCredentialsPath(dir)) + assert.equal(st.mode & 0o777, 0o600) +}) + +test('a record with a refreshToken but no accessJwt still yields its usable static token', async () => { + const dir = await tmpState() + // A corrupt/hand-edited record: an incomplete oidc shape that also carries a + // working static token. The token must not be silently dropped. + await fs.writeFile(remoteCredentialsPath(dir), JSON.stringify({ prod: { refreshToken: 'rt', token: 'still-good' } })) + const creds = await readCredentials(dir) + assert.deepEqual(creds.prod, { kind: 'static', token: 'still-good' }) +}) + +test('a malformed oidc record (no accessJwt, no token) is dropped on read', async () => { + const dir = await tmpState() + await fs.writeFile(remoteCredentialsPath(dir), JSON.stringify({ prod: { kind: 'oidc', refreshToken: 'rt' } })) + assert.deepEqual(await readCredentials(dir), {}) +}) + +test('resolveToken returns an oidc record cached access JWT as-is', async () => { + const dir = await tmpState() + await writeSession(dir, 'prod', { refreshToken: 'rt', accessJwt: 'jwt-cached', expiresAt: FUTURE, org: 'acme' }) + const r = await resolveToken({ target: 'prod', env: {}, stateDir: dir }) + assert.deepEqual(r, { ok: true, token: 'jwt-cached', source: 'file' }) +}) + +test('resolveAccessJwt refreshes a JWT inside the skew window (not yet past)', async () => { + const dir = await tmpState() + // Expiry 30s in the future, inside the 60s skew window: must refresh. + const now = Date.parse('2026-06-29T12:00:00Z') + const soon = new Date(now + 30 * 1000).toISOString() + await writeSession(dir, 'prod', { refreshToken: 'rt', accessJwt: 'jwt-soon', expiresAt: soon, org: 'acme' }) + let refreshed = false + const fetchImpl = /** @type {any} */ (async () => { + refreshed = true + return { ok: true, status: 200, text: async () => JSON.stringify({ access_jwt: 'jwt-new', expires_at: FUTURE, org: 'acme' }) } + }) + const r = await resolveAccessJwt({ target: 'prod', env: {}, stateDir: dir, identityBase: 'https://h/v1/identity', now, fetchImpl }) + assert.equal(refreshed, true) + assert.equal(/** @type {any} */ (r).token, 'jwt-new') +}) + +test('a legacy token-only record reads as kind: static', async () => { + const dir = await tmpState() + // Write the pre-kind on-disk shape directly. + await fs.writeFile(remoteCredentialsPath(dir), JSON.stringify({ prod: { token: 'legacy' } })) + const creds = await readCredentials(dir) + assert.deepEqual(creds.prod, { kind: 'static', token: 'legacy' }) +}) + +test('writeToken now stamps kind: static', async () => { + const dir = await tmpState() + await writeToken(dir, 'prod', 'sk-1') + const creds = await readCredentials(dir) + assert.deepEqual(creds.prod, { kind: 'static', token: 'sk-1' }) +}) + +test('removeToken clears an oidc record too', async () => { + const dir = await tmpState() + await writeSession(dir, 'prod', { refreshToken: 'rt', accessJwt: 'jwt', expiresAt: FUTURE, org: 'acme' }) + assert.equal(await removeToken(dir, 'prod'), true) + assert.deepEqual(await readCredentials(dir), {}) +}) + +test('resolveAccessJwt returns a static token unchanged', async () => { + const dir = await tmpState() + await writeToken(dir, 'prod', 'sk-1') + const r = await resolveAccessJwt({ target: 'prod', env: {}, stateDir: dir }) + assert.deepEqual(r, { ok: true, token: 'sk-1', source: 'file', kind: 'static' }) +}) + +test('resolveAccessJwt honors the per-target env override over the file', async () => { + const dir = await tmpState() + await writeSession(dir, 'prod', { refreshToken: 'rt', accessJwt: 'jwt', expiresAt: FUTURE, org: 'acme' }) + const r = await resolveAccessJwt({ target: 'prod', env: { HYP_REMOTE_TOKEN_PROD: 'env-tok' }, stateDir: dir }) + assert.deepEqual(r, { ok: true, token: 'env-tok', source: 'env', kind: 'static' }) +}) + +test('resolveAccessJwt returns a fresh oidc JWT without calling refresh', async () => { + const dir = await tmpState() + await writeSession(dir, 'prod', { refreshToken: 'rt', accessJwt: 'jwt-fresh', expiresAt: FUTURE, org: 'acme' }) + let called = false + const fetchImpl = /** @type {any} */ (async () => { called = true; return { ok: true, status: 200, text: async () => '{}' } }) + const r = await resolveAccessJwt({ target: 'prod', env: {}, stateDir: dir, identityBase: 'https://h/v1/identity', fetchImpl }) + assert.equal(/** @type {any} */ (r).token, 'jwt-fresh') + assert.equal(called, false) +}) + +test('resolveAccessJwt refreshes a stale oidc JWT and persists the new one', async () => { + const dir = await tmpState() + await writeSession(dir, 'prod', { refreshToken: 'rt', accessJwt: 'jwt-old', expiresAt: PAST, org: 'acme' }) + const fetchImpl = /** @type {any} */ (async () => ({ + ok: true, status: 200, + text: async () => JSON.stringify({ access_jwt: 'jwt-new', expires_at: FUTURE, org: 'acme' }), + })) + const r = await resolveAccessJwt({ target: 'prod', env: {}, stateDir: dir, identityBase: 'https://h/v1/identity', fetchImpl }) + assert.equal(/** @type {any} */ (r).token, 'jwt-new') + // The new JWT + expiry were persisted. + const creds = await readCredentials(dir) + assert.equal(/** @type {any} */ (creds.prod).accessJwt, 'jwt-new') + assert.equal(/** @type {any} */ (creds.prod).expiresAt, FUTURE) +}) + +test('resolveAccessJwt errors with login guidance when no record exists', async () => { + const dir = await tmpState() + const r = await resolveAccessJwt({ target: 'prod', env: {}, stateDir: dir }) + assert.equal(r.ok, false) + assert.match(/** @type {any} */ (r).error, /no token for 'prod' - run 'hyp remote login prod'/) +}) + +test('resolveAccessJwt propagates a refresh failure (invalid_grant)', async () => { + const dir = await tmpState() + await writeSession(dir, 'prod', { refreshToken: 'stale', accessJwt: 'jwt-old', expiresAt: PAST, org: 'acme' }) + const fetchImpl = /** @type {any} */ (async () => ({ ok: false, status: 401, text: async () => JSON.stringify({ error: 'invalid_grant' }) })) + await assert.rejects( + () => resolveAccessJwt({ target: 'prod', env: {}, stateDir: dir, identityBase: 'https://h/v1/identity', fetchImpl }), + (err) => { assert.equal(/** @type {any} */ (err).code, 'invalid_grant'); return true }, + ) +}) diff --git a/test/core/remote-credentials.test.js b/test/core/remote-credentials.test.js index f596950..effab97 100644 --- a/test/core/remote-credentials.test.js +++ b/test/core/remote-credentials.test.js @@ -34,7 +34,7 @@ test('writeToken persists, is 0600, and round-trips', async () => { const dir = await tmpState() await writeToken(dir, 'prod', 'sk-1') const creds = await readCredentials(dir) - assert.equal(creds.prod.token, 'sk-1') + assert.deepEqual(creds.prod, { kind: 'static', token: 'sk-1' }) const st = await fs.stat(remoteCredentialsPath(dir)) assert.equal(st.mode & 0o777, 0o600) }) @@ -47,7 +47,7 @@ test('writeToken merges, removeToken drops only the named target', async () => { assert.equal(await removeToken(dir, 'prod'), false) // already gone const creds = await readCredentials(dir) assert.equal(creds.prod, undefined) - assert.equal(creds.staging.token, 'sk-2') + assert.deepEqual(creds.staging, { kind: 'static', token: 'sk-2' }) }) test('resolveToken order: env overrides file', async () => { From b25785c0e449e6596a74b0ddbcd21997e6c5d565 Mon Sep 17 00:00:00 2001 From: Kenny Daniel Date: Mon, 29 Jun 2026 11:43:17 -0700 Subject: [PATCH 05/59] remote/oidc: browser mode for hyp remote login (T6) (#200) * remote/oidc: browser mode for `hyp remote login` (T6) runRemoteLogin gains an interactive browser authorization-code mode (LLP 0046 D1), selected by default when no static token is supplied: - Flag parsing: --token-file / stdin keep the static path unchanged (kind: 'static', the headless escape hatch, D8); --org selects an org; --browser forces the flow even with stdin piped; --no-browser prints the URL instead of opening it. - The identity base is derived from the configured target URL's origin as /v1/identity, so no second URL is configured (D6). - On success the resolved org is printed and the OIDC session is stored via writeSession. - A server-surfaced callback error (access_denied, no_membership, org_selection_required, org_not_permitted) is translated to a clear message; org_selection_required instructs a --org re-run rather than enumerating the user's orgs (D7). A small `deps.login` seam keeps the browser path unit-testable. Tests cover --org + identity-base forwarding, --no-browser, each error mapping, the unconfigured-target refusal, and the unchanged static token-file and piped-stdin paths. * remote/oidc: robust login flag parsing; note --org on the static path Review follow-ups (PR #200): - the target name now skips the value slot of --token-file/--org via a firstPositional() helper, so `login --org acme` (name omitted) is a usage error instead of misreading 'acme' as the target. - a static path with --org now prints a note that --org applies only to the browser flow, instead of silently dropping it. - tests: --browser overriding piped stdin, only-flags usage error, --org-missing-value exit, and the static --org-ignored note. --- src/core/cli/remote_commands.js | 188 +++++++++++++++++++++--- test/core/remote-login-command.test.js | 190 +++++++++++++++++++++++++ 2 files changed, 357 insertions(+), 21 deletions(-) create mode 100644 test/core/remote-login-command.test.js diff --git a/src/core/cli/remote_commands.js b/src/core/cli/remote_commands.js index d7a77ed..82a317e 100644 --- a/src/core/cli/remote_commands.js +++ b/src/core/cli/remote_commands.js @@ -10,8 +10,10 @@ import { readCredentials, remoteTokenEnvVar, removeToken, + writeSession, writeToken, } from '../remote/credentials.js' +import { loginWithBrowser } from '../remote/oidc_login.js' /** * @import { CommandRunContext } from '../../../collectivus-plugin-kernel-types.js' @@ -34,6 +36,8 @@ export async function runRemoteHelp(argv, ctx) { if (argv.length === 0 || argv[0] === '--help' || argv[0] === '-h') { ctx.stdout.write('usage: hyp remote [args...]\n') ctx.stdout.write(' subcommands: add, login, list, remove\n') + ctx.stdout.write(' login: browser sign-in by default; --token-file/stdin for a static token,\n') + ctx.stdout.write(' --org to select an org, --no-browser to print the URL\n') return 0 } ctx.stderr.write(`hyp remote: unknown subcommand '${argv[0]}'\n`) @@ -77,41 +81,98 @@ export async function runRemoteAdd(argv, ctx) { } /** - * `hyp remote login `: store the query-scoped token for a target. - * Token source: `--token-file ` or piped stdin (a TTY with neither is - * an error, never a hang). + * `hyp remote login `: populate the target's query-scoped credential. + * + * Two modes, one store (LLP 0046 D1). A **static** token still comes from + * `--token-file ` or piped stdin, unchanged (the headless escape hatch, + * D8). Otherwise an interactive **browser** authorization-code flow runs + * against the target's identity endpoint and stores an OIDC session. `--org` + * selects an org; `--browser` forces the flow even with stdin piped; + * `--no-browser` prints the URL instead of opening it. * * @param {string[]} argv * @param {CommandRunContext} ctx - * @ref LLP 0033#credentials [implements]: token to the 0600 store, never config; one login per server + * @param {{ login?: typeof loginWithBrowser }} [deps] test seam for the browser flow + * @ref LLP 0046#d1 [implements]: browser mode of `hyp remote login`; one command, one store, one more way to populate it */ -export async function runRemoteLogin(argv, ctx) { - const positional = argv.filter((a) => !a.startsWith('-')) - const name = positional[0] - if (!name) { - ctx.stderr.write('usage: hyp remote login [--token-file ]\n') - return 2 - } +export async function runRemoteLogin(argv, ctx, deps = {}) { const tokenFileIdx = argv.indexOf('--token-file') const tokenFile = tokenFileIdx >= 0 ? argv[tokenFileIdx + 1] : undefined if (tokenFileIdx >= 0 && (!tokenFile || tokenFile.startsWith('-'))) { ctx.stderr.write('hyp remote login: --token-file expects a path\n') return 2 } + const orgIdx = argv.indexOf('--org') + const org = orgIdx >= 0 ? argv[orgIdx + 1] : undefined + if (orgIdx >= 0 && (!org || org.startsWith('-'))) { + ctx.stderr.write('hyp remote login: --org expects an org name\n') + return 2 + } + // The target name is the first positional. Skip the VALUE slot of a + // value-taking flag so e.g. `login --org acme` (name omitted) is not misread + // as the target 'acme'. + const name = firstPositional(argv, new Set(['--token-file', '--org'])) + if (!name) { + ctx.stderr.write('usage: hyp remote login [--token-file ] [--org ] [--no-browser]\n') + return 2 + } + const forceBrowser = argv.includes('--browser') + const noBrowser = argv.includes('--no-browser') + + const stdin = /** @type {any} */ (ctx.stdin ?? process.stdin) + const stdinPiped = !!stdin && !stdin.isTTY + // Static path: an explicit token file, or a piped token unless --browser + // forces the interactive flow. + const useStatic = !!tokenFile || (stdinPiped && !forceBrowser) + + if (useStatic) { + // --org only applies to the browser flow; say so rather than silently drop it. + if (org) { + ctx.stderr.write('note: --org is ignored with a static token (it applies to the browser login flow)\n') + } + return runStaticLogin(name, tokenFile, stdin, ctx) + } + return runBrowserLogin(name, { org, noBrowser }, ctx, deps.login ?? loginWithBrowser) +} +/** + * Return the first positional argument (not a flag, not the value slot of a + * value-taking flag). + * + * @param {string[]} argv + * @param {Set} valueFlags + * @returns {string | undefined} + */ +function firstPositional(argv, valueFlags) { + for (let i = 0; i < argv.length; i++) { + const a = argv[i] + if (a.startsWith('-')) { + if (valueFlags.has(a)) i++ // consume its value + continue + } + return a + } + return undefined +} + +/** + * The static-token path (LLP 0033, unchanged behavior): read a token from + * `--token-file` or piped stdin and store it as a `kind: 'static'` record. + * + * @param {string} name + * @param {string | undefined} tokenFile + * @param {any} stdin + * @param {CommandRunContext} ctx + * @returns {Promise} + * @ref LLP 0046#d8 [implements]: static token stays the documented headless fallback + */ +async function runStaticLogin(name, tokenFile, stdin, ctx) { /** @type {string} */ let token try { - if (tokenFile) { - token = (await fs.readFile(tokenFile, 'utf8')).trim() - } else { - const stdin = /** @type {any} */ (ctx.stdin ?? process.stdin) - if (stdin && stdin.isTTY) { - ctx.stderr.write("hyp remote login: provide the token via --token-file or pipe it on stdin\n") - return 2 - } - token = (await readAllStdin(stdin)).trim() - } + token = tokenFile + ? (await fs.readFile(tokenFile, 'utf8')).trim() + : (await readAllStdin(stdin)).trim() } catch (err) { ctx.stderr.write(`hyp remote login: ${err instanceof Error ? err.message : String(err)}\n`) return 1 @@ -134,6 +195,91 @@ export async function runRemoteLogin(argv, ctx) { return 0 } +/** + * The browser authorization-code path (LLP 0046 D1/D6/D7). Derives the + * identity base from the configured target URL's origin, runs the loopback + * flow, and stores the resulting OIDC session. + * + * @param {string} name + * @param {{ org?: string, noBrowser: boolean }} opts + * @param {CommandRunContext} ctx + * @param {typeof loginWithBrowser} login + * @returns {Promise} + */ +async function runBrowserLogin(name, { org, noBrowser }, ctx, login) { + const remotes = await readConfiguredRemotes(ctx) + const entry = remotes[name] + if (!entry) { + ctx.stderr.write(`hyp remote login: '${name}' is not a configured target - add it first with 'hyp remote add ${name} '\n`) + ctx.stderr.write(" (or pass a static token with --token-file )\n") + return 2 + } + const identityBase = deriveIdentityBase(entry.url) + if (!identityBase) { + ctx.stderr.write(`hyp remote login: target '${name}' has an invalid url (${entry.url})\n`) + return 2 + } + + const stateDir = readObservabilityEnv(ctx.env).stateDir + try { + const session = await login({ + identityBase, + org, + noBrowser, + print: (line) => ctx.stderr.write(`${line}\n`), + }) + await writeSession(stateDir, name, session) + ctx.stdout.write(`logged in to '${name}' as org '${session.org}' (session stored, mode 0600)\n`) + return 0 + } catch (err) { + const callbackError = /** @type {any} */ (err)?.callbackError + ctx.stderr.write(`hyp remote login: ${explainLoginError(callbackError, err)}\n`) + return 1 + } +} + +/** + * Derive the identity base `/v1/identity` from a target's MCP URL + * (LLP 0046 D6): identity is mounted at the same origin, so no second URL is + * configured. Returns `null` for an unparseable URL. + * + * @param {string} url + * @returns {string | null} + * @ref LLP 0046#d6 [implements]: identity endpoints derive from the configured remote URL origin + */ +export function deriveIdentityBase(url) { + try { + return `${new URL(url).origin}/v1/identity` + } catch { + return null + } +} + +/** + * Translate a server-surfaced callback `error` (D7) into a clear message. The + * client never sees the user's org list, so `org_selection_required` instructs + * a re-run with `--org` rather than enumerating. + * + * @param {string | undefined} callbackError + * @param {unknown} err + * @returns {string} + * @ref LLP 0046#d7 [implements]: org selector errors explained; never enumerate the user's orgs + */ +function explainLoginError(callbackError, err) { + switch (callbackError) { + case 'access_denied': + return 'login was denied at the provider' + case 'no_membership': + return 'this account is not a member of any org on this server - ask an admin to invite you' + case 'org_selection_required': + return 'this account has more than one org - re-run with --org to choose one' + case 'org_not_permitted': + return 'the selected org is not permitted for this account - check the --org name' + default: + return err instanceof Error ? err.message : String(err) + } +} + /** * `hyp remote list`: targets + token status (`stored` / `env` / `missing`), * never the token itself. diff --git a/test/core/remote-login-command.test.js b/test/core/remote-login-command.test.js new file mode 100644 index 0000000..d052ea4 --- /dev/null +++ b/test/core/remote-login-command.test.js @@ -0,0 +1,190 @@ +// @ts-check + +import test from 'node:test' +import assert from 'node:assert/strict' +import fs from 'node:fs/promises' +import os from 'node:os' +import path from 'node:path' + +import { runRemoteLogin, deriveIdentityBase } from '../../src/core/cli/remote_commands.js' +import { readCredentials } from '../../src/core/remote/credentials.js' + +async function tmpHome() { + return fs.mkdtemp(path.join(os.tmpdir(), 'hyp-login-')) +} + +/** + * Build a ctx with a stub stdin (TTY by default so the browser path is the + * default), captured streams, and a configured `prod` target written to a + * real config file (the command resolves targets from the config file). + * + * @param {{ hypHome: string, stdin?: any, remotes?: any }} opts + */ +async function makeCtx({ hypHome, stdin, remotes }) { + /** @type {string[]} */ const out = [] + /** @type {string[]} */ const err = [] + const configPath = path.join(hypHome, 'config.json') + const resolvedRemotes = remotes ?? { prod: { url: 'https://hyp.internal/mcp' } } + await fs.writeFile(configPath, JSON.stringify({ version: 2, query: { remotes: resolvedRemotes } })) + const ctx = /** @type {any} */ ({ + env: { HYP_HOME: hypHome, HYP_CONFIG: configPath }, + config: { version: 2, query: { remotes: resolvedRemotes } }, + stdin: stdin ?? { isTTY: true }, + stdout: { write: (/** @type {string} */ s) => out.push(s) }, + stderr: { write: (/** @type {string} */ s) => err.push(s) }, + }) + return { ctx, out, err } +} + +test('deriveIdentityBase yields /v1/identity', () => { + assert.equal(deriveIdentityBase('https://hyp.internal/mcp'), 'https://hyp.internal/v1/identity') + assert.equal(deriveIdentityBase('https://hyp.internal:8443/a/b/mcp'), 'https://hyp.internal:8443/v1/identity') + assert.equal(deriveIdentityBase('not a url'), null) +}) + +test('browser mode forwards --org and the derived identity base, then stores the session', async () => { + const hypHome = await tmpHome() + const { ctx, out } = await makeCtx({ hypHome }) + /** @type {any} */ let seen + const login = /** @type {any} */ (async (/** @type {any} */ args) => { + seen = args + return { refreshToken: 'rt', accessJwt: 'jwt', expiresAt: '2999-01-01T00:00:00Z', org: 'acme' } + }) + + const code = await runRemoteLogin(['prod', '--org', 'acme'], ctx, { login }) + assert.equal(code, 0) + assert.equal(seen.identityBase, 'https://hyp.internal/v1/identity') + assert.equal(seen.org, 'acme') + assert.equal(seen.noBrowser, false) + assert.match(out.join(''), /logged in to 'prod' as org 'acme'/) + + const stateDir = path.join(hypHome, 'hypaware') + const creds = await readCredentials(stateDir) + assert.equal(/** @type {any} */ (creds.prod).kind, 'oidc') + assert.equal(/** @type {any} */ (creds.prod).refreshToken, 'rt') +}) + +test('--no-browser passes noBrowser through to the flow', async () => { + const hypHome = await tmpHome() + const { ctx } = await makeCtx({ hypHome }) + /** @type {any} */ let seen + const login = /** @type {any} */ (async (/** @type {any} */ args) => { + seen = args + return { refreshToken: 'rt', accessJwt: 'jwt', expiresAt: '2999-01-01T00:00:00Z', org: 'acme' } + }) + await runRemoteLogin(['prod', '--no-browser'], ctx, { login }) + assert.equal(seen.noBrowser, true) +}) + +test('a callback error maps to a clear org-selection message', async () => { + const hypHome = await tmpHome() + const { ctx, err } = await makeCtx({ hypHome }) + const login = /** @type {any} */ (async () => { + throw Object.assign(new Error('login failed: org_selection_required'), { callbackError: 'org_selection_required' }) + }) + const code = await runRemoteLogin(['prod'], ctx, { login }) + assert.equal(code, 1) + assert.match(err.join(''), /re-run with --org /) +}) + +test('no_membership maps to its own message', async () => { + const hypHome = await tmpHome() + const { ctx, err } = await makeCtx({ hypHome }) + const login = /** @type {any} */ (async () => { + throw Object.assign(new Error('x'), { callbackError: 'no_membership' }) + }) + await runRemoteLogin(['prod'], ctx, { login }) + assert.match(err.join(''), /not a member of any org/) +}) + +test('browser mode on an unconfigured target refuses before any flow', async () => { + const hypHome = await tmpHome() + const { ctx, err } = await makeCtx({ hypHome, remotes: {} }) + let called = false + const login = /** @type {any} */ (async () => { called = true; return {} }) + const code = await runRemoteLogin(['ghost'], ctx, { login }) + assert.equal(code, 2) + assert.equal(called, false) + assert.match(err.join(''), /not a configured target/) +}) + +test('the static --token-file path is unchanged (stores kind: static)', async () => { + const hypHome = await tmpHome() + const tokenFile = path.join(hypHome, 'tok.txt') + await fs.writeFile(tokenFile, 'sk-static\n') + const { ctx, out } = await makeCtx({ hypHome }) + let called = false + const login = /** @type {any} */ (async () => { called = true; return {} }) + const code = await runRemoteLogin(['prod', '--token-file', tokenFile], ctx, { login }) + assert.equal(code, 0) + assert.equal(called, false) // browser flow not entered + assert.match(out.join(''), /stored query-scoped token for 'prod'/) + const stateDir = path.join(hypHome, 'hypaware') + const creds = await readCredentials(stateDir) + assert.deepEqual(creds.prod, { kind: 'static', token: 'sk-static' }) +}) + +test('a piped stdin token still takes the static path', async () => { + const hypHome = await tmpHome() + // A non-TTY stdin that yields a token, the way a piped `echo tok |` does. + const stdin = { + isTTY: false, + async *[Symbol.asyncIterator]() { yield Buffer.from('piped-tok\n') }, + } + const { ctx } = await makeCtx({ hypHome, stdin }) + let called = false + const login = /** @type {any} */ (async () => { called = true; return {} }) + const code = await runRemoteLogin(['prod'], ctx, { login }) + assert.equal(code, 0) + assert.equal(called, false) + const stateDir = path.join(hypHome, 'hypaware') + const creds = await readCredentials(stateDir) + assert.deepEqual(creds.prod, { kind: 'static', token: 'piped-tok' }) +}) + +test('--browser overrides a piped stdin token and takes the browser flow', async () => { + const hypHome = await tmpHome() + const stdin = { + isTTY: false, + async *[Symbol.asyncIterator]() { yield Buffer.from('piped-tok\n') }, + } + const { ctx } = await makeCtx({ hypHome, stdin }) + let called = false + const login = /** @type {any} */ (async () => { + called = true + return { refreshToken: 'rt', accessJwt: 'jwt', expiresAt: '2999-01-01T00:00:00Z', org: 'acme' } + }) + const code = await runRemoteLogin(['prod', '--browser'], ctx, { login }) + assert.equal(code, 0) + assert.equal(called, true) +}) + +test('a missing target name (only flags) is a usage error, not a flag value misread as the name', async () => { + const hypHome = await tmpHome() + const { ctx, err } = await makeCtx({ hypHome }) + let called = false + const login = /** @type {any} */ (async () => { called = true; return {} }) + // `--org acme` with no positional name must not be read as target 'acme'. + const code = await runRemoteLogin(['--org', 'acme'], ctx, { login }) + assert.equal(code, 2) + assert.equal(called, false) + assert.match(err.join(''), /usage: hyp remote login /) +}) + +test('--org as the last arg with no value is a usage error', async () => { + const hypHome = await tmpHome() + const { ctx, err } = await makeCtx({ hypHome }) + const code = await runRemoteLogin(['prod', '--org'], ctx, {}) + assert.equal(code, 2) + assert.match(err.join(''), /--org expects an org name/) +}) + +test('--org is noted as ignored when a static token forces the static path', async () => { + const hypHome = await tmpHome() + const tokenFile = path.join(hypHome, 'tok.txt') + await fs.writeFile(tokenFile, 'sk-static\n') + const { ctx, err } = await makeCtx({ hypHome }) + const code = await runRemoteLogin(['prod', '--token-file', tokenFile, '--org', 'acme'], ctx, {}) + assert.equal(code, 0) + assert.match(err.join(''), /--org is ignored with a static token/) +}) From 197ce4a1773eadbcea541d40df87d58c66e16d5c Mon Sep 17 00:00:00 2001 From: Kenny Daniel Date: Mon, 29 Jun 2026 11:45:19 -0700 Subject: [PATCH 06/59] remote/oidc: silent refresh + one-shot 401 retry on the attach path (T7) (#201) * remote/oidc: silent refresh + one-shot 401 retry on the attach path (T7) The query attach path becomes session-aware (LLP 0046 D5): - remote_verb.js resolves the bearer via resolveAccessJwt (not resolveToken), passing the origin-derived identity base. A stale cached JWT is refreshed and persisted before the call. - client.js tags a 401/403 rejection (authError + status) so the attach path can recognize it. On a live auth error for an oidc/file session, the attach path forces a single refresh and retries once; an env override or static token is surfaced as-is (nothing to refresh). - A refresh that fails invalid_grant surfaces "remote session expired - re-run 'hyp remote login '". - resolveAccessJwt gains forceRefresh for the retry; deriveIdentityBase moves to credentials.js as the shared home for the login command and the attach path (single D6 definition). Tests: a stale stored JWT refreshes + persists pre-call; a mid-flight 401 triggers exactly one refresh + retry; invalid_grant maps to re-login guidance; a static 401 is not retried. Full suite green (1527). * remote/oidc: map invalid_grant on the initial resolve too (not just 401 retry) Review follow-up (PR #201): the initial resolveAccessJwt can itself refresh (and throw invalid_grant) when the stored JWT is already stale, which escaped as an unhandled rejection. Wrap it and map via a shared mapRefreshError helper, exactly like the mid-flight 401 retry path. Added a stale-JWT pre-call-refresh-fails unit test. --- src/core/cli/remote_commands.js | 18 +-- src/core/mcp/client.js | 7 +- src/core/mcp/remote_verb.js | 98 ++++++++++++++-- src/core/remote/credentials.js | 26 ++++- test/core/remote-login-command.test.js | 4 +- test/core/remote-verb-oidc.test.js | 153 +++++++++++++++++++++++++ 6 files changed, 271 insertions(+), 35 deletions(-) create mode 100644 test/core/remote-verb-oidc.test.js diff --git a/src/core/cli/remote_commands.js b/src/core/cli/remote_commands.js index 82a317e..87c1cbc 100644 --- a/src/core/cli/remote_commands.js +++ b/src/core/cli/remote_commands.js @@ -7,6 +7,7 @@ import process from 'node:process' import { defaultConfigPath } from '../config/schema.js' import { readObservabilityEnv } from '../observability/env.js' import { + deriveIdentityBase, readCredentials, remoteTokenEnvVar, removeToken, @@ -238,23 +239,6 @@ async function runBrowserLogin(name, { org, noBrowser }, ctx, login) { } } -/** - * Derive the identity base `/v1/identity` from a target's MCP URL - * (LLP 0046 D6): identity is mounted at the same origin, so no second URL is - * configured. Returns `null` for an unparseable URL. - * - * @param {string} url - * @returns {string | null} - * @ref LLP 0046#d6 [implements]: identity endpoints derive from the configured remote URL origin - */ -export function deriveIdentityBase(url) { - try { - return `${new URL(url).origin}/v1/identity` - } catch { - return null - } -} - /** * Translate a server-surfaced callback `error` (D7) into a clear message. The * client never sees the user's org list, so `org_selection_required` instructs diff --git a/src/core/mcp/client.js b/src/core/mcp/client.js index 2b671bf..6b97b3a 100644 --- a/src/core/mcp/client.js +++ b/src/core/mcp/client.js @@ -67,7 +67,12 @@ export function createHttpMcpClient(opts) { } if (!res.ok) { if (res.status === 401 || res.status === 403) { - throw new Error(`remote rejected the credential (HTTP ${res.status}) - re-run 'hyp remote login'`) + // Tag the error so the attach path can recognize an auth rejection and + // attempt a single silent refresh + retry (LLP 0046 D5). + throw Object.assign( + new Error(`remote rejected the credential (HTTP ${res.status}) - re-run 'hyp remote login'`), + { authError: true, status: res.status }, + ) } const text = await safeText(res) throw new Error(`MCP ${method} failed: HTTP ${res.status}${text ? ` - ${text.slice(0, 200)}` : ''}`) diff --git a/src/core/mcp/remote_verb.js b/src/core/mcp/remote_verb.js index 89f91b3..477287e 100644 --- a/src/core/mcp/remote_verb.js +++ b/src/core/mcp/remote_verb.js @@ -1,7 +1,8 @@ // @ts-check import { readObservabilityEnv } from '../observability/env.js' -import { resolveToken } from '../remote/credentials.js' +import { deriveIdentityBase, resolveAccessJwt } from '../remote/credentials.js' +import { InvalidGrantError } from '../remote/identity_client.js' import { createHttpMcpClient } from './client.js' /** @@ -34,24 +35,97 @@ export async function runRemoteVerb({ verb, params, target, ctx }) { } const stateDir = readObservabilityEnv(ctx.env).stateDir - const resolved = await resolveToken({ target, env: ctx.env, stateDir }) + const identityBase = deriveIdentityBase(entry.url) ?? undefined + /** @type {Awaited>} */ + let resolved + try { + // The initial resolve can itself refresh (and fail) when the stored JWT is + // already stale; map an invalid_grant here too, not only on the 401 retry. + resolved = await resolveAccessJwt({ target, env: ctx.env, stateDir, identityBase }) + } catch (err) { + return mapRefreshError(err, target) + } if (!resolved.ok) { return { ok: false, error: resolved.error, exitCode: 2 } } - const client = createHttpMcpClient({ url: entry.url, token: resolved.token }) + /** Run one full attach attempt with a given bearer token. */ + const attempt = (/** @type {string} */ token) => callRemoteTool({ url: entry.url, token, verb, params }) + try { - await client.initialize() - const toolResult = await client.callTool(verb.tool, params) - if (toolResult?.isError) { - const text = firstTextContent(toolResult) ?? 'remote tool reported an error' - return { ok: false, error: text, exitCode: 1 } - } - const structured = toolResult?.structuredContent ?? parseTextContent(toolResult) - return { ok: true, result: structured, notices: serverCapNotices(structured) } + return await attempt(resolved.token) } catch (err) { - return { ok: false, error: err instanceof Error ? err.message : String(err), exitCode: 1 } + // A live 401/403 on an OIDC session means the cached JWT is stale or was + // revoked early. Refresh once and retry before surfacing (LLP 0046 D5). An + // env override or a static token cannot be refreshed, so it surfaces as-is. + const refreshable = resolved.kind === 'oidc' && resolved.source === 'file' + if (!refreshable || !isAuthError(err)) { + return { ok: false, error: err instanceof Error ? err.message : String(err), exitCode: 1 } + } + /** @type {Awaited>} */ + let refreshed + try { + refreshed = await resolveAccessJwt({ target, env: ctx.env, stateDir, identityBase, forceRefresh: true }) + } catch (refreshErr) { + return mapRefreshError(refreshErr, target) + } + if (!refreshed.ok) { + return { ok: false, error: refreshed.error, exitCode: 2 } + } + try { + return await attempt(refreshed.token) + } catch (retryErr) { + return { ok: false, error: retryErr instanceof Error ? retryErr.message : String(retryErr), exitCode: 1 } + } + } +} + +/** + * One remote attach attempt: initialize, call the tool, and shape the result. + * Throws on transport/auth errors (so the caller can retry); a tool-level + * `isError` is a normal return, not a throw (it is not retryable). + * + * @param {{ url: string, token: string, verb: VerbRegistration, params: Record }} args + * @returns {Promise<{ ok: true, result: unknown, notices: string[] } | { ok: false, error: string, exitCode: number }>} + */ +async function callRemoteTool({ url, token, verb, params }) { + const client = createHttpMcpClient({ url, token }) + await client.initialize() + const toolResult = await client.callTool(verb.tool, params) + if (toolResult?.isError) { + const text = firstTextContent(toolResult) ?? 'remote tool reported an error' + return { ok: false, error: text, exitCode: 1 } } + const structured = toolResult?.structuredContent ?? parseTextContent(toolResult) + return { ok: true, result: structured, notices: serverCapNotices(structured) } +} + +/** + * Map a refresh failure to a verb result: a typed `invalid_grant` (the refresh + * row was revoked or expired) becomes the re-login guidance (LLP 0046 D5); any + * other failure is a generic error. + * + * @param {unknown} err + * @param {string} target + * @returns {{ ok: false, error: string, exitCode: number }} + */ +function mapRefreshError(err, target) { + if (err instanceof InvalidGrantError) { + return { ok: false, error: `remote session expired - re-run 'hyp remote login ${target}'`, exitCode: 2 } + } + return { ok: false, error: err instanceof Error ? err.message : String(err), exitCode: 1 } +} + +/** + * Whether a thrown error is an MCP-client auth rejection (a 401/403 tagged by + * `client.js`), eligible for a one-shot refresh + retry. + * + * @param {unknown} err + * @returns {boolean} + */ +function isAuthError(err) { + return !!err && typeof err === 'object' && + (/** @type {any} */ (err).authError === true || /** @type {any} */ (err).status === 401 || /** @type {any} */ (err).status === 403) } /** diff --git a/src/core/remote/credentials.js b/src/core/remote/credentials.js index 3120a42..890de6d 100644 --- a/src/core/remote/credentials.js +++ b/src/core/remote/credentials.js @@ -33,6 +33,24 @@ const CREDENTIALS_BASENAME = 'remote-credentials.json' /** Refresh an `oidc` access JWT once it is within this window of expiry. */ const REFRESH_SKEW_MS = 60 * 1000 +/** + * Derive the identity base `/v1/identity` from a target's MCP URL + * (LLP 0046 D6): identity is mounted at the same origin, so no second URL is + * configured. Returns `null` for an unparseable URL. Shared by the login + * command and the attach path. + * + * @param {string} url + * @returns {string | null} + * @ref LLP 0046#d6 [implements]: identity endpoints derive from the configured remote URL origin + */ +export function deriveIdentityBase(url) { + try { + return `${new URL(url).origin}/v1/identity` + } catch { + return null + } +} + /** * @param {string} stateDir * @returns {string} @@ -219,11 +237,12 @@ export async function resolveToken({ target, env, stateDir }) { * identityBase?: string, * now?: number, * fetchImpl?: typeof fetch, + * forceRefresh?: boolean, * }} args * @returns {Promise<{ ok: true, token: string, source: 'env' | 'file', kind?: 'static' | 'oidc' } | { ok: false, error: string }>} * @ref LLP 0046#d5 [implements]: silent refresh on the attach path; env override still wins */ -export async function resolveAccessJwt({ target, env, stateDir, identityBase, now = Date.now(), fetchImpl }) { +export async function resolveAccessJwt({ target, env, stateDir, identityBase, now = Date.now(), fetchImpl, forceRefresh = false }) { const envName = remoteTokenEnvVar(target) const fromEnv = env[envName] if (typeof fromEnv === 'string' && fromEnv.length > 0) { @@ -241,8 +260,9 @@ export async function resolveAccessJwt({ target, env, stateDir, identityBase, no : { ok: false, error: noTokenError(target, envName) } } - // oidc: refresh if the cached JWT is missing, unparseable, or near expiry. - if (!isFresh(entry, now)) { + // oidc: refresh if forced (a live 401 retry), or the cached JWT is missing, + // unparseable, or near expiry. + if (forceRefresh || !isFresh(entry, now)) { if (!identityBase) { return { ok: false, error: `cannot refresh '${target}': no identity endpoint resolved` } } diff --git a/test/core/remote-login-command.test.js b/test/core/remote-login-command.test.js index d052ea4..67d234e 100644 --- a/test/core/remote-login-command.test.js +++ b/test/core/remote-login-command.test.js @@ -6,8 +6,8 @@ import fs from 'node:fs/promises' import os from 'node:os' import path from 'node:path' -import { runRemoteLogin, deriveIdentityBase } from '../../src/core/cli/remote_commands.js' -import { readCredentials } from '../../src/core/remote/credentials.js' +import { runRemoteLogin } from '../../src/core/cli/remote_commands.js' +import { deriveIdentityBase, readCredentials } from '../../src/core/remote/credentials.js' async function tmpHome() { return fs.mkdtemp(path.join(os.tmpdir(), 'hyp-login-')) diff --git a/test/core/remote-verb-oidc.test.js b/test/core/remote-verb-oidc.test.js new file mode 100644 index 0000000..838a6ae --- /dev/null +++ b/test/core/remote-verb-oidc.test.js @@ -0,0 +1,153 @@ +// @ts-check + +/** + * @import { TestContext } from 'node:test' + */ + +import test from 'node:test' +import assert from 'node:assert/strict' +import fs from 'node:fs/promises' +import os from 'node:os' +import path from 'node:path' + +import { querySqlVerb } from '../../src/core/query/verb.js' +import { verbToCommand } from '../../src/core/cli/verb_command.js' +import { readCredentials, writeSession } from '../../src/core/remote/credentials.js' + +const cmd = verbToCommand(querySqlVerb) +const MCP_URL = 'https://hyp.internal/mcp' +const FUTURE = '2999-01-01T00:00:00Z' +const PAST = '2000-01-01T00:00:00Z' + +/** + * Install a combined fetch stub that routes the identity `/token` refresh and + * the MCP JSON-RPC against the same origin. The MCP side accepts only + * `validJwt`; anything else gets a 401, simulating a stale/revoked access JWT. + * + * @param {TestContext} t + * @param {{ validJwt: string, refreshTo?: string, refreshInvalid?: boolean }} opts + */ +function stubServers(t, opts) { + const original = globalThis.fetch + t.after(() => { globalThis.fetch = original }) + const state = { validJwt: opts.validJwt, refreshCalls: 0 } + + globalThis.fetch = /** @type {any} */ (async (/** @type {string} */ url, /** @type {any} */ init) => { + const reply = (/** @type {any} */ obj, status = 200, ct = 'application/json') => ({ + ok: status >= 200 && status < 300, + status, + headers: { get: (/** @type {string} */ k) => k.toLowerCase() === 'content-type' ? ct : (k.toLowerCase() === 'mcp-session-id' ? 'sess-1' : null) }, + text: async () => (typeof obj === 'string' ? obj : JSON.stringify(obj)), + }) + + // Identity refresh endpoint. + if (String(url).includes('/v1/identity/token')) { + state.refreshCalls++ + if (opts.refreshInvalid) return reply({ error: 'invalid_grant' }, 401) + state.validJwt = opts.refreshTo ?? state.validJwt + return reply({ access_jwt: state.validJwt, expires_at: FUTURE, org: 'acme' }) + } + + // MCP JSON-RPC. + const req = JSON.parse(init.body) + if (req.method === 'initialize') return reply({ jsonrpc: '2.0', id: req.id, result: { protocolVersion: '2025-06-18' } }) + if (req.method === 'notifications/initialized') return { ok: true, status: 202, headers: { get: () => null }, text: async () => '' } + if (req.method === 'tools/call') { + if (init.headers.authorization !== `Bearer ${state.validJwt}`) return reply({}, 401) + return reply({ jsonrpc: '2.0', id: req.id, result: { structuredContent: { columns: ['n'], rows: [{ n: 7 }] }, isError: false } }) + } + return reply({ jsonrpc: '2.0', id: req.id, error: { code: -32601, message: 'no' } }) + }) + return state +} + +async function tmpHome() { + return fs.mkdtemp(path.join(os.tmpdir(), 'hyp-attach-')) +} + +/** @param {string} hypHome */ +function ctxWith(hypHome) { + /** @type {string[]} */ const out = [] + /** @type {string[]} */ const err = [] + const ctx = /** @type {any} */ ({ + env: { HYP_HOME: hypHome }, + config: { version: 2, query: { remotes: { prod: { url: MCP_URL } } } }, + query: {}, storage: {}, + stdout: { write: (/** @type {string} */ s) => out.push(s) }, + stderr: { write: (/** @type {string} */ s) => err.push(s) }, + }) + return { ctx, out, err } +} + +test('a stale stored JWT is refreshed and persisted before the call', async (t) => { + const hypHome = await tmpHome() + const stateDir = path.join(hypHome, 'hypaware') + await writeSession(stateDir, 'prod', { refreshToken: 'rt', accessJwt: 'jwt-old', expiresAt: PAST, org: 'acme' }) + const state = stubServers(t, { validJwt: 'jwt-new', refreshTo: 'jwt-new' }) + + const { ctx, out } = ctxWith(hypHome) + const code = await cmd.run(['SELECT 1', '--remote', 'prod', '--format', 'json'], ctx) + assert.equal(code, 0) + assert.deepEqual(JSON.parse(out.join('')), [{ n: 7 }]) + // Refreshed exactly once (pre-call, because the stored JWT was stale). + assert.equal(state.refreshCalls, 1) + // The new JWT was persisted. + const creds = await readCredentials(stateDir) + assert.equal(/** @type {any} */ (creds.prod).accessJwt, 'jwt-new') +}) + +test('a 401 mid-flight triggers exactly one refresh + retry', async (t) => { + const hypHome = await tmpHome() + const stateDir = path.join(hypHome, 'hypaware') + // Fresh-looking stored JWT, but the server rejects it (early revocation). + await writeSession(stateDir, 'prod', { refreshToken: 'rt', accessJwt: 'jwt-revoked', expiresAt: FUTURE, org: 'acme' }) + const state = stubServers(t, { validJwt: 'jwt-fresh', refreshTo: 'jwt-fresh' }) + + const { ctx, out } = ctxWith(hypHome) + const code = await cmd.run(['SELECT 1', '--remote', 'prod', '--format', 'json'], ctx) + assert.equal(code, 0) + assert.deepEqual(JSON.parse(out.join('')), [{ n: 7 }]) + assert.equal(state.refreshCalls, 1) +}) + +test('a refresh that fails invalid_grant surfaces the re-login guidance', async (t) => { + const hypHome = await tmpHome() + const stateDir = path.join(hypHome, 'hypaware') + await writeSession(stateDir, 'prod', { refreshToken: 'stale', accessJwt: 'jwt-revoked', expiresAt: FUTURE, org: 'acme' }) + stubServers(t, { validJwt: 'jwt-never', refreshInvalid: true }) + + const { ctx, err } = ctxWith(hypHome) + const code = await cmd.run(['SELECT 1', '--remote', 'prod', '--format', 'json'], ctx) + assert.equal(code, 2) + assert.match(err.join(''), /remote session expired - re-run 'hyp remote login prod'/) +}) + +test('a stale JWT whose pre-call refresh fails maps to re-login (not an unhandled throw)', async (t) => { + const hypHome = await tmpHome() + const stateDir = path.join(hypHome, 'hypaware') + // Stale stored JWT: the initial resolve refreshes before the call, and that + // refresh is the one that fails invalid_grant. + await writeSession(stateDir, 'prod', { refreshToken: 'stale', accessJwt: 'jwt-old', expiresAt: PAST, org: 'acme' }) + stubServers(t, { validJwt: 'jwt-never', refreshInvalid: true }) + + const { ctx, err } = ctxWith(hypHome) + const code = await cmd.run(['SELECT 1', '--remote', 'prod', '--format', 'json'], ctx) + assert.equal(code, 2) + assert.match(err.join(''), /remote session expired - re-run 'hyp remote login prod'/) +}) + +test('a static token that 401s is not retried (cannot refresh)', async (t) => { + const hypHome = await tmpHome() + const stateDir = path.join(hypHome, 'hypaware') + // A static token the server rejects; there is nothing to refresh. + await fs.mkdir(stateDir, { recursive: true }) + const { writeToken } = await import('../../src/core/remote/credentials.js') + await writeToken(stateDir, 'prod', 'static-bad') + const state = stubServers(t, { validJwt: 'something-else' }) + + const { ctx, err } = ctxWith(hypHome) + const code = await cmd.run(['SELECT 1', '--remote', 'prod'], ctx) + assert.equal(code, 1) + assert.equal(state.refreshCalls, 0) + assert.match(err.join(''), /re-run 'hyp remote login'/) +}) From a83a3ec949ae5d51770ae5a5479ebd0c39f89338 Mon Sep 17 00:00:00 2001 From: Kenny Daniel Date: Mon, 29 Jun 2026 11:48:09 -0700 Subject: [PATCH 07/59] remote/oidc: hermetic smoke + LLP follow-ups (T8) (#202) * remote/oidc: hermetic smoke + LLP follow-ups (T8) - remote_oidc_login smoke: one in-process server plays both the identity surface (/v1/identity/login/start + /token, signing real per-call tokens) and the MCP endpoint (/mcp, accepting only the current JWT). A scripted opener drives the real loopback redirect. Asserts the user-visible result and the remote-oidc smoke_step telemetry across: browser login -> session stored kind: 'oidc' -> query attaches the JWT -> forced expiry drives a silent refresh + persist -> a revoked refresh row drives the re-login message. - Fix surfaced by the smoke: when the stored JWT is already stale, the initial resolveAccessJwt refreshes pre-call and can throw invalid_grant outside the 401-retry try/catch. remote_verb now maps a refresh failure to the re-login guidance on both the initial-resolve and the mid-flight-401 paths (shared mapRefreshError). Added a unit test for the stale-initial-resolve path. - LLP 0033: its credential-store and attach sections now note the kind discriminator and the silent-refresh + 401-retry behavior, pointing at LLP 0046 D4/D5. - Promote LLP 0046/0047/0048 Draft -> Accepted now that the milestones land. All @refs resolve (ref-check clean). * remote/oidc: smoke convention + cleanup follow-ups Review follow-ups (PR #202): - hoist the inline import('node:http').IncomingMessage type to the top-level @import block (repo hard rule: no inline import() types). - drop runQuery's unused cmd param. - move obs.shutdown() into the finally and force-close keep-alive sockets (closeAllConnections) so the server does not wait on idle timeouts. --- .../smoke/flows/remote_oidc_login.js | 252 ++++++++++++++++++ llp/0033-remote-query-attach.spec.md | 15 +- llp/0046-oidc-login-client.decision.md | 2 +- llp/0047-oidc-login-client.design.md | 2 +- llp/0048-oidc-login-client.plan.md | 2 +- 5 files changed, 267 insertions(+), 6 deletions(-) create mode 100644 hypaware-core/smoke/flows/remote_oidc_login.js diff --git a/hypaware-core/smoke/flows/remote_oidc_login.js b/hypaware-core/smoke/flows/remote_oidc_login.js new file mode 100644 index 0000000..39dd9f2 --- /dev/null +++ b/hypaware-core/smoke/flows/remote_oidc_login.js @@ -0,0 +1,252 @@ +// @ts-check + +import http from 'node:http' +import process from 'node:process' + +import { installObservability } from '../../../src/core/observability/index.js' +import { loginWithBrowser } from '../../../src/core/remote/oidc_login.js' +import { + readCredentials, + writeSession, + remoteCredentialsPath, +} from '../../../src/core/remote/credentials.js' +import { querySqlVerb } from '../../../src/core/query/verb.js' +import { verbToCommand } from '../../../src/core/cli/verb_command.js' + +/** + * @import { AddressInfo } from 'node:net' + * @import { IncomingMessage } from 'node:http' + */ + +/** + * Hermetic smoke for the multi-tenant OIDC client login (LLP 0046-0048). One + * in-process server plays both roles against a single origin: + * + * - the identity surface `/v1/identity/{login/start,token}` (signs + * real per-call tokens), and + * - the MCP endpoint `/mcp` (accepts only the current access JWT). + * + * The flow drives the full chunk-2 path in a temp HYP_HOME: + * + * browser login (scripted opener -> loopback redirect) -> session stored as + * kind: 'oidc' -> query attaches the access JWT -> a forced expiry drives a + * silent refresh + persist -> a revoked refresh row drives the re-login + * message. + * + * Asserts both the user-visible result and the `smoke_step` telemetry the + * remote-oidc modules emit (Log-Driven Development). + * + * @param {{ harness: any, expect: any }} args + * @ref LLP 0046#d5 [tests]: silent refresh + re-login on the attach path, end to end against a stub identity server + */ +export async function run({ harness, expect }) { + const obs = installObservability() + if (!obs.tracer.provider) { + throw new Error('remote_oidc_login: tracer provider not installed - expected HYP_DEV_TELEMETRY=1') + } + + const server = await startStubServer() + const origin = `http://127.0.0.1:${server.port}` + const mcpUrl = `${origin}/mcp` + const identityBase = `${origin}/v1/identity` + const stateDir = harness.stateDir + + try { + // ----- smoke_step: browser_login ----- + // A scripted opener: instead of launching a browser, GET the start URL. + // The stub 302s to the loopback redirect_uri with a code, which the real + // loopback receiver catches. + const openBrowser = (/** @type {string} */ url) => { + fetch(url).catch(() => {}) + return true + } + const session = await loginWithBrowser({ identityBase, org: 'acme', openBrowser }) + await writeSession(stateDir, 'prod', session) + + const afterLogin = await readCredentials(stateDir) + expect.that('login: session stored as kind oidc', afterLogin.prod?.kind, (v) => v === 'oidc') + expect.that('login: resolved org is acme', session.org, (v) => v === 'acme') + expect.that('login: a refresh token was issued', session.refreshToken, (v) => typeof v === 'string' && v.length > 0) + + // ----- smoke_step: attach_query ----- + const cmd = verbToCommand(querySqlVerb) + const first = runQuery(mcpUrl) + let code = await cmd.run(['SELECT 1', '--remote', 'prod', '--format', 'json'], first.ctx) + expect.that('attach: query exits 0 with the access JWT', code, (v) => v === 0) + expect.that('attach: rows returned', first.out.join(''), (s) => s.includes('"n": 1') || s.includes('"n":1')) + expect.that('attach: no refresh needed on a fresh session', server.state.refreshCalls, (v) => v === 0) + + // ----- smoke_step: silent_refresh ----- + // Force the stored access JWT to look expired; the next query must refresh. + await forceExpiry(stateDir, 'prod') + const second = runQuery(mcpUrl) + code = await cmd.run(['SELECT 1', '--remote', 'prod', '--format', 'json'], second.ctx) + expect.that('refresh: query still exits 0 after silent refresh', code, (v) => v === 0) + expect.that('refresh: exactly one refresh happened', server.state.refreshCalls, (v) => v === 1) + const afterRefresh = await readCredentials(stateDir) + expect.that('refresh: a new access JWT was persisted', afterRefresh.prod?.kind === 'oidc' && /** @type {any} */ (afterRefresh.prod).accessJwt !== /** @type {any} */ (afterLogin.prod).accessJwt, (v) => v === true) + + // ----- smoke_step: revoked_refresh ----- + // Revoke the refresh row and force expiry again: the attach path must + // surface the re-login guidance, not a generic error. + server.state.refreshRevoked = true + await forceExpiry(stateDir, 'prod') + const third = runQuery(mcpUrl) + code = await cmd.run(['SELECT 1', '--remote', 'prod', '--format', 'json'], third.ctx) + expect.that('revoked: query exits nonzero', code, (v) => v !== 0) + expect.that('revoked: re-login guidance surfaced', third.err.join(''), (s) => /re-run 'hyp remote login prod'/.test(s)) + } finally { + // Force-close keep-alive sockets so close() does not wait on idle timeouts, + // then flush telemetry even if an assertion above threw. + if (typeof server.closeAllConnections === 'function') server.closeAllConnections() + await new Promise((resolve) => server.close(() => resolve(undefined))) + await obs.shutdown() + } + + // ----- telemetry: the remote-oidc path emitted its smoke_step markers ----- + const logs = await expect.logs() + const oidcLogs = logs.filter((l) => l.attributes?.hyp_component === 'remote-oidc') + expect.that('telemetry: remote-oidc logs were emitted', oidcLogs, (rows) => rows.length > 0) + const steps = new Set(oidcLogs.map((l) => l.attributes?.smoke_step).filter(Boolean)) + expect.that('telemetry: login_complete step present', steps.has('login_complete'), (v) => v === true) + expect.that('telemetry: loopback_bind step present', steps.has('loopback_bind'), (v) => v === true) +} + +/** + * Build a ctx for the query verb against `mcpUrl`, with captured streams and a + * configured `prod` target. The credential resolves from HYP_HOME's state dir. + * + * @param {string} mcpUrl + */ +function runQuery(mcpUrl) { + /** @type {string[]} */ const out = [] + /** @type {string[]} */ const err = [] + const ctx = /** @type {any} */ ({ + env: { HYP_HOME: process.env.HYP_HOME }, + config: { version: 2, query: { remotes: { prod: { url: mcpUrl } } } }, + query: {}, storage: {}, + stdout: { write: (/** @type {string} */ s) => out.push(s) }, + stderr: { write: (/** @type {string} */ s) => err.push(s) }, + }) + return { ctx, out, err } +} + +/** + * Rewrite a target's stored OIDC record so its access JWT reads as expired, + * forcing the next resolve to refresh. + * + * @param {string} stateDir + * @param {string} target + */ +async function forceExpiry(stateDir, target) { + const fs = await import('node:fs/promises') + const creds = await readCredentials(stateDir) + const rec = /** @type {any} */ (creds[target]) + rec.expiresAt = '2000-01-01T00:00:00Z' + await fs.writeFile(remoteCredentialsPath(stateDir), JSON.stringify(creds, null, 2) + '\n', { mode: 0o600 }) +} + +/** + * Start the combined identity + MCP stub server. Signs a fresh access JWT on + * each grant; the MCP side accepts only the latest one. + * + * @returns {Promise<{ port: number, state: any, close: (cb: () => void) => void, closeAllConnections: () => void }>} + */ +function startStubServer() { + const state = { jwtSeq: 0, validJwt: '', refreshToken: 'rt-smoke', refreshRevoked: false, refreshCalls: 0 } + const mint = () => { + state.jwtSeq += 1 + state.validJwt = `jwt-${state.jwtSeq}` + return state.validJwt + } + const FUTURE = '2999-01-01T00:00:00Z' + + const server = http.createServer((req, res) => { + const url = new URL(req.url ?? '/', 'http://127.0.0.1') + const json = (/** @type {any} */ obj, status = 200) => { + res.writeHead(status, { 'content-type': 'application/json' }) + res.end(JSON.stringify(obj)) + } + + // Identity: browser start -> 302 to the loopback redirect with a code. + if (req.method === 'GET' && url.pathname === '/v1/identity/login/start') { + const redirectUri = url.searchParams.get('redirect_uri') ?? '' + const stateParam = url.searchParams.get('state') ?? '' + const loc = `${redirectUri}?code=auth-code&state=${encodeURIComponent(stateParam)}` + res.writeHead(302, { location: loc }) + res.end() + return + } + + // Identity: token endpoint (authorization_code + refresh_token grants). + if (req.method === 'POST' && url.pathname === '/v1/identity/token') { + readBody(req).then((body) => { + const grant = body.grant_type + if (grant === 'authorization_code') { + return json({ session_id: 'sess-1', refresh_token: state.refreshToken, access_jwt: mint(), expires_at: FUTURE, org: 'acme' }) + } + if (grant === 'refresh_token') { + state.refreshCalls += 1 + if (state.refreshRevoked) return json({ error: 'invalid_grant' }, 401) + return json({ access_jwt: mint(), expires_at: FUTURE, org: 'acme' }) + } + return json({ error: 'unsupported_grant_type' }, 400) + }) + return + } + + // MCP: accept only the current access JWT. + if (req.method === 'POST' && url.pathname === '/mcp') { + readBody(req).then((rpc) => { + if (rpc.method === 'initialize') { + res.writeHead(200, { 'content-type': 'application/json', 'mcp-session-id': 'mcp-1' }) + return res.end(JSON.stringify({ jsonrpc: '2.0', id: rpc.id, result: { protocolVersion: '2025-06-18' } })) + } + if (rpc.method === 'notifications/initialized') { + res.writeHead(202) + return res.end() + } + if (rpc.method === 'tools/call') { + const auth = req.headers['authorization'] + if (auth !== `Bearer ${state.validJwt}`) return json({ jsonrpc: '2.0', id: rpc.id }, 401) + return json({ jsonrpc: '2.0', id: rpc.id, result: { structuredContent: { columns: ['n'], rows: [{ n: 1 }] }, isError: false } }) + } + return json({ jsonrpc: '2.0', id: rpc.id, error: { code: -32601, message: 'no' } }) + }) + return + } + + json({ error: 'not_found' }, 404) + }) + + return new Promise((resolve) => { + server.listen(0, '127.0.0.1', () => { + const addr = /** @type {AddressInfo} */ (server.address()) + resolve({ + port: addr.port, + state, + close: (/** @type {() => void} */ cb) => server.close(cb), + closeAllConnections: () => server.closeAllConnections?.(), + }) + }) + }) +} + +/** + * @param {IncomingMessage} req + * @returns {Promise} + */ +function readBody(req) { + return new Promise((resolve) => { + /** @type {Buffer[]} */ const chunks = [] + req.on('data', (c) => chunks.push(c)) + req.on('end', () => { + const text = Buffer.concat(chunks).toString('utf8') + try { + resolve(text ? JSON.parse(text) : {}) + } catch { + resolve({}) + } + }) + }) +} diff --git a/llp/0033-remote-query-attach.spec.md b/llp/0033-remote-query-attach.spec.md index 8032357..e4055ed 100644 --- a/llp/0033-remote-query-attach.spec.md +++ b/llp/0033-remote-query-attach.spec.md @@ -89,13 +89,22 @@ a free invariant. The URL is non-secret and committable; the token is not config (secrets-never-in-config, server LLP 0000). Resolution for the human-CLI path: - **Storage:** `/remote-credentials.json`, mode `0600`, atomic tmp+rename, a - single map `{ "": { "token": "…" } }` (kernel-managed state, + single map `{ "": { "kind": "…", … } }` (kernel-managed state, [LLP 0004](./0004-activation-and-paths.spec.md); mirrors `central`'s - `identity.json` single-file precedent). One `hyp remote login` per server. + `identity.json` single-file precedent). One `hyp remote login` per server. Each + record is **discriminated by `kind`** ([LLP 0046 D4](./0046-oidc-login-client.decision.md#d4)): + a `static` record is the bare `{ token }` of this spec; an `oidc` record carries + a refresh token plus a cached short-lived access JWT from a browser login. A + legacy `token`-only record (no `kind`) reads as `static`, so existing files keep + working without a rewrite. - **Resolution at query time:** per-target env `HYP_REMOTE_TOKEN_` (CI/ephemeral) → stored file → error (`no token for '' — run 'hyp remote login '`). A *per-target* env var so a stored var can never - silently authenticate the wrong server. + silently authenticate the wrong server. For an `oidc` record the attach path is + **session-aware** ([LLP 0046 D5](./0046-oidc-login-client.decision.md#d5)): it + silently refreshes a near-expiry access JWT, and on a live `401`/`403` it + refreshes once and retries before surfacing; a refresh that fails `invalid_grant` + surfaces the same re-login guidance, now meaning re-run the browser flow. - AI clients that install the endpoint directly hold the token in **their own** MCP config — `hyp`'s store is only for the human-CLI client path. diff --git a/llp/0046-oidc-login-client.decision.md b/llp/0046-oidc-login-client.decision.md index 5137ae5..f500a53 100644 --- a/llp/0046-oidc-login-client.decision.md +++ b/llp/0046-oidc-login-client.decision.md @@ -1,7 +1,7 @@ # LLP 0046: Multi-tenant OIDC login on the client **Type:** Decision -**Status:** Draft +**Status:** Accepted **Systems:** CLI, Onboarding, Query, MCP **Author:** Kenny / Claude **Date:** 2026-06-29 diff --git a/llp/0047-oidc-login-client.design.md b/llp/0047-oidc-login-client.design.md index cb1b968..ca63e21 100644 --- a/llp/0047-oidc-login-client.design.md +++ b/llp/0047-oidc-login-client.design.md @@ -1,7 +1,7 @@ # LLP 0047: Multi-tenant OIDC login on the client, implementation design **Type:** design -**Status:** Draft +**Status:** Accepted **Systems:** CLI, Onboarding, Query, MCP, Config **Author:** Kenny / Claude **Date:** 2026-06-29 diff --git a/llp/0048-oidc-login-client.plan.md b/llp/0048-oidc-login-client.plan.md index 9295491..b402a11 100644 --- a/llp/0048-oidc-login-client.plan.md +++ b/llp/0048-oidc-login-client.plan.md @@ -1,7 +1,7 @@ # LLP 0048: Multi-tenant OIDC login on the client, plan **Type:** plan -**Status:** Draft +**Status:** Accepted **Systems:** CLI, Onboarding, Query, MCP, Config **Author:** Kenny / Claude **Date:** 2026-06-29 From 8910de13195ad838640601e008f31cabc88113ca Mon Sep 17 00:00:00 2001 From: Kenny Daniel Date: Mon, 29 Jun 2026 12:31:43 -0700 Subject: [PATCH 08/59] remote/oidc: open the browser via rundll32 on win32 `cmd /c start ` treats `&` as a command separator, so the authorize URL (always multi-param: redirect_uri, code_challenge, state, ...) was truncated at the first `&`, opening a PKCE-less URL the server rejects. Spawn `rundll32 url.dll,FileProtocolHandler` directly instead, so the URL reaches the handler verbatim with no shell parsing. --- src/core/remote/open_browser.js | 6 +++++- test/core/remote-open-browser.test.js | 10 ++++++---- 2 files changed, 11 insertions(+), 5 deletions(-) diff --git a/src/core/remote/open_browser.js b/src/core/remote/open_browser.js index 743c7a9..c1681e2 100644 --- a/src/core/remote/open_browser.js +++ b/src/core/remote/open_browser.js @@ -44,7 +44,11 @@ export function openBrowser(url, opts = {}) { */ function openerFor(platform) { if (platform === 'darwin') return { command: 'open', args: [] } - if (platform === 'win32') return { command: 'cmd', args: ['/c', 'start', ''] } + // win32: NOT `cmd /c start `. cmd treats `&` as a command separator, so + // an unquoted authorize URL (always multi-param: redirect_uri, code_challenge, + // state, ...) is truncated at the first `&`, opening a PKCE-less URL. rundll32 + // is spawned directly (no shell), so the URL reaches the handler verbatim. + if (platform === 'win32') return { command: 'rundll32', args: ['url.dll,FileProtocolHandler'] } // Treat every other Unix as freedesktop (xdg-open). A missing xdg-open // surfaces as an async spawn 'error' event (swallowed above); the caller's // loopback waits and the URL is printed, so login still completes manually. diff --git a/test/core/remote-open-browser.test.js b/test/core/remote-open-browser.test.js index 1745896..ca1f669 100644 --- a/test/core/remote-open-browser.test.js +++ b/test/core/remote-open-browser.test.js @@ -38,11 +38,13 @@ test('linux uses `xdg-open`', () => { assert.equal(calls[0].command, 'xdg-open') }) -test('win32 uses `cmd /c start`', () => { +test('win32 uses rundll32 so `&` in the URL is not a cmd separator', () => { const { spawnImpl, calls } = stubSpawn() - openBrowser('https://x/auth', { platform: 'win32', spawnImpl }) - assert.equal(calls[0].command, 'cmd') - assert.deepEqual(calls[0].args, ['/c', 'start', '', 'https://x/auth']) + // A real authorize URL has multiple `&`-joined params; cmd /c start would + // truncate it at the first `&`. rundll32 receives it as a single argv. + openBrowser('https://x/auth?a=1&b=2', { platform: 'win32', spawnImpl }) + assert.equal(calls[0].command, 'rundll32') + assert.deepEqual(calls[0].args, ['url.dll,FileProtocolHandler', 'https://x/auth?a=1&b=2']) }) test('a synchronous spawn throw returns false (caller prints the URL)', () => { From eca56c24e8bc38ce2ef44450e933052f234be54d Mon Sep 17 00:00:00 2001 From: Kenny Daniel Date: Mon, 29 Jun 2026 12:31:52 -0700 Subject: [PATCH 09/59] remote/oidc: refresh OIDC sessions on the stdio proxy path The `hyp mcp --remote` proxy resolved the bearer token once via the non-refreshing resolveToken and reused it for the whole long-lived proxy, so an OIDC session's short-lived access JWT expired and every forwarded message returned a generic HTTP 401 with no refresh and no re-login hint. Resolve a fresh access JWT per forwarded message via resolveAccessJwt (env/static tokens pass through unchanged), refresh once and retry on a live 401/403, and map invalid_grant to "re-run 'hyp remote login'". LLP 0033 now documents that the proxy fallback shares the session-aware path. --- llp/0033-remote-query-attach.spec.md | 5 +- src/core/mcp/proxy.js | 93 +++++++++++++++++++---- test/core/remote-proxy-oidc.test.js | 107 +++++++++++++++++++++++++++ 3 files changed, 190 insertions(+), 15 deletions(-) create mode 100644 test/core/remote-proxy-oidc.test.js diff --git a/llp/0033-remote-query-attach.spec.md b/llp/0033-remote-query-attach.spec.md index e4055ed..4f401f2 100644 --- a/llp/0033-remote-query-attach.spec.md +++ b/llp/0033-remote-query-attach.spec.md @@ -104,7 +104,10 @@ a free invariant. The URL is non-secret and committable; the token is not config **session-aware** ([LLP 0046 D5](./0046-oidc-login-client.decision.md#d5)): it silently refreshes a near-expiry access JWT, and on a live `401`/`403` it refreshes once and retries before surfacing; a refresh that fails `invalid_grant` - surfaces the same re-login guidance, now meaning re-run the browser flow. + surfaces the same re-login guidance, now meaning re-run the browser flow. The + stdio proxy fallback ([LLP 0034 §proxy-fallback](./0034-mcp-host-intrinsic.decision.md#proxy-fallback)) + shares this session-aware path, resolving a fresh JWT per forwarded message so a + long-lived proxy does not pin one short-lived access JWT. - AI clients that install the endpoint directly hold the token in **their own** MCP config — `hyp`'s store is only for the human-CLI client path. diff --git a/src/core/mcp/proxy.js b/src/core/mcp/proxy.js index 8f69742..dc101a9 100644 --- a/src/core/mcp/proxy.js +++ b/src/core/mcp/proxy.js @@ -4,7 +4,8 @@ import process from 'node:process' import { Attr, getLogger } from '../observability/index.js' import { readObservabilityEnv } from '../observability/env.js' -import { resolveToken } from '../remote/credentials.js' +import { deriveIdentityBase, resolveAccessJwt } from '../remote/credentials.js' +import { InvalidGrantError } from '../remote/identity_client.js' import { parseRpcResponse } from './client.js' import { INTERNAL_ERROR, jsonRpcError } from './jsonrpc.js' import { serveStdio } from './stdio.js' @@ -37,9 +38,18 @@ export async function runMcpProxy({ target, ctx }) { return 2 } const stateDir = readObservabilityEnv(ctx.env).stateDir - const resolved = await resolveToken({ target, env: ctx.env, stateDir }) - if (!resolved.ok) { - ctx.stderr.write(`hyp mcp: ${resolved.error}\n`) + const identityBase = deriveIdentityBase(entry.url) ?? undefined + // Fail fast (exit 2) if there is no usable credential at all. Resolve again + // per message below so an OIDC access JWT that expires mid-session refreshes, + // rather than the whole long-lived proxy holding one short-lived JWT. + try { + const probe = await resolveAccessJwt({ target, env: ctx.env, stateDir, identityBase }) + if (!probe.ok) { + ctx.stderr.write(`hyp mcp: ${probe.error}\n`) + return 2 + } + } catch (err) { + ctx.stderr.write(`hyp mcp: ${proxyAuthMessage(err, target)}\n`) return 2 } const fetchImpl = /** @type {typeof fetch | undefined} */ (globalThis.fetch) @@ -55,33 +65,72 @@ export async function runMcpProxy({ target, ctx }) { // Lifecycle line to stderr: stdout is the protocol channel. ctx.stderr.write(`hyp mcp: proxying stdio → ${entry.url} (target '${target}')\n`) + /** Forward one message to the remote with the given bearer token. */ + const forward = (/** @type {any} */ message, /** @type {string} */ token) => { + /** @type {Record} */ + const headers = { + 'content-type': 'application/json', + accept: 'application/json, text/event-stream', + authorization: `Bearer ${token}`, + ...(sessionId ? { 'mcp-session-id': sessionId } : {}), + } + return fetchImpl(entry.url, { method: 'POST', headers, body: JSON.stringify(message) }) + } + const server = { /** @param {any} message */ async handleMessage(message) { const isNotify = message && typeof message === 'object' && message.id === undefined - /** @type {Record} */ - const headers = { - 'content-type': 'application/json', - accept: 'application/json, text/event-stream', - authorization: `Bearer ${resolved.token}`, - ...(sessionId ? { 'mcp-session-id': sessionId } : {}), + const id = message?.id ?? null + + // Resolve a fresh access JWT per message: an OIDC session silently + // refreshes a near-expiry JWT, env/static tokens pass through unchanged. + let resolved + try { + resolved = await resolveAccessJwt({ target, env: ctx.env, stateDir, identityBase }) + } catch (err) { + return isNotify ? null : jsonRpcError(id, INTERNAL_ERROR, proxyAuthMessage(err, target)) + } + if (!resolved.ok) { + return isNotify ? null : jsonRpcError(id, INTERNAL_ERROR, resolved.error) } + let res try { - res = await fetchImpl(entry.url, { method: 'POST', headers, body: JSON.stringify(message) }) + res = await forward(message, resolved.token) } catch (err) { - return isNotify ? null : jsonRpcError(message?.id ?? null, INTERNAL_ERROR, `proxy fetch failed: ${err instanceof Error ? err.message : String(err)}`) + return isNotify ? null : jsonRpcError(id, INTERNAL_ERROR, `proxy fetch failed: ${err instanceof Error ? err.message : String(err)}`) + } + + // A live 401/403 on an OIDC session means the cached JWT was revoked or + // expired early: force one refresh + retry before surfacing (LLP 0046 D5). + // An env/static token cannot refresh, so it falls through to the error. + if ((res.status === 401 || res.status === 403) && resolved.kind === 'oidc' && resolved.source === 'file') { + let refreshed + try { + refreshed = await resolveAccessJwt({ target, env: ctx.env, stateDir, identityBase, forceRefresh: true }) + } catch (err) { + return isNotify ? null : jsonRpcError(id, INTERNAL_ERROR, proxyAuthMessage(err, target)) + } + if (refreshed.ok) { + try { + res = await forward(message, refreshed.token) + } catch (err) { + return isNotify ? null : jsonRpcError(id, INTERNAL_ERROR, `proxy fetch failed: ${err instanceof Error ? err.message : String(err)}`) + } + } } + const sid = res.headers?.get?.('mcp-session-id') if (sid) sessionId = sid if (isNotify) return null if (!res.ok) { - return jsonRpcError(message?.id ?? null, INTERNAL_ERROR, `remote returned HTTP ${res.status}`) + return jsonRpcError(id, INTERNAL_ERROR, `remote returned HTTP ${res.status}`) } try { return await parseRpcResponse(res, message?.id) } catch (err) { - return jsonRpcError(message?.id ?? null, INTERNAL_ERROR, err instanceof Error ? err.message : String(err)) + return jsonRpcError(id, INTERNAL_ERROR, err instanceof Error ? err.message : String(err)) } }, } @@ -100,3 +149,19 @@ export async function runMcpProxy({ target, ctx }) { log.info('mcp.proxy_stop', { [Attr.COMPONENT]: 'mcp', [Attr.OPERATION]: 'mcp.proxy' }) return 0 } + +/** + * Message for a refresh failure: a typed `invalid_grant` (the refresh row was + * revoked or expired) becomes re-login guidance (LLP 0046 D5); anything else + * surfaces as a generic error. + * + * @param {unknown} err + * @param {string} target + * @returns {string} + */ +function proxyAuthMessage(err, target) { + if (err instanceof InvalidGrantError) { + return `remote session expired - re-run 'hyp remote login ${target}'` + } + return err instanceof Error ? err.message : String(err) +} diff --git a/test/core/remote-proxy-oidc.test.js b/test/core/remote-proxy-oidc.test.js new file mode 100644 index 0000000..25f2bc5 --- /dev/null +++ b/test/core/remote-proxy-oidc.test.js @@ -0,0 +1,107 @@ +// @ts-check + +import test from 'node:test' +import assert from 'node:assert/strict' +import fs from 'node:fs/promises' +import os from 'node:os' +import path from 'node:path' +import { Readable } from 'node:stream' + +import { runMcpProxy } from '../../src/core/mcp/proxy.js' +import { writeSession, readCredentials } from '../../src/core/remote/credentials.js' + +const MCP_URL = 'https://hyp.internal/mcp' +const TOKEN_URL = 'https://hyp.internal/v1/identity/token' +const FUTURE = '2999-01-01T00:00:00Z' + +async function tmpHome() { + return fs.mkdtemp(path.join(os.tmpdir(), 'hyp-proxy-')) +} + +/** + * @param {any} obj + * @param {number} status + */ +function reply(obj, status = 200) { + return { + ok: status >= 200 && status < 300, + status, + headers: { get: (/** @type {string} */ k) => (k.toLowerCase() === 'content-type' ? 'application/json' : null) }, + text: async () => (typeof obj === 'string' ? obj : JSON.stringify(obj)), + } +} + +/** + * Build a ctx whose stdin yields the given JSON-RPC lines then EOFs, with + * captured stdout/stderr and a configured `prod` target. + * + * @param {{ hypHome: string, lines: any[] }} opts + */ +function makeCtx({ hypHome, lines }) { + /** @type {string[]} */ const out = [] + /** @type {string[]} */ const err = [] + const stdin = Readable.from(lines.map((m) => JSON.stringify(m) + '\n')) + const ctx = /** @type {any} */ ({ + env: { HYP_HOME: hypHome }, + config: { version: 2, query: { remotes: { prod: { url: MCP_URL } } } }, + stdin, + stdout: { write: (/** @type {string} */ s) => out.push(s) }, + stderr: { write: (/** @type {string} */ s) => err.push(s) }, + }) + return { ctx, out, err } +} + +test('proxy refreshes an oidc session on a live 401 and retries (no longer dies on a stale JWT)', async (t) => { + const original = globalThis.fetch + t.after(() => { globalThis.fetch = original }) + + const hypHome = await tmpHome() + const stateDir = path.join(hypHome, 'hypaware') + // A non-expired record (so the startup probe does not refresh), but the MCP + // side rejects the cached JWT: it is the live-401 path that must refresh. + await writeSession(stateDir, 'prod', { refreshToken: 'rt', accessJwt: 'jwt-old', expiresAt: FUTURE, org: 'acme' }) + + let refreshCalls = 0 + const validJwt = 'jwt-new' + globalThis.fetch = /** @type {any} */ (async (/** @type {string} */ url, /** @type {any} */ init) => { + if (url === TOKEN_URL) { + refreshCalls += 1 + return reply({ access_jwt: validJwt, expires_at: FUTURE, org: 'acme' }) + } + const body = JSON.parse(init.body) + if (init.headers.authorization !== `Bearer ${validJwt}`) return reply({ jsonrpc: '2.0', id: body.id }, 401) + return reply({ jsonrpc: '2.0', id: body.id, result: { ok: true } }) + }) + + const { ctx, out } = makeCtx({ hypHome, lines: [{ jsonrpc: '2.0', id: 1, method: 'tools/call', params: { name: 'x' } }] }) + const code = await runMcpProxy({ target: 'prod', ctx }) + + assert.equal(code, 0) + assert.equal(refreshCalls, 1) + assert.match(out.join(''), /"result"/) + assert.match(out.join(''), /"ok":\s*true/) + // The refreshed JWT was persisted back to the 0600 store. + const creds = await readCredentials(stateDir) + assert.equal(/** @type {any} */ (creds.prod).accessJwt, validJwt) +}) + +test('proxy surfaces re-login guidance when the refresh is rejected (invalid_grant)', async (t) => { + const original = globalThis.fetch + t.after(() => { globalThis.fetch = original }) + + const hypHome = await tmpHome() + const stateDir = path.join(hypHome, 'hypaware') + await writeSession(stateDir, 'prod', { refreshToken: 'stale', accessJwt: 'jwt-old', expiresAt: FUTURE, org: 'acme' }) + + globalThis.fetch = /** @type {any} */ (async (/** @type {string} */ url, /** @type {any} */ init) => { + if (url === TOKEN_URL) return reply({ error: 'invalid_grant' }, 401) + const body = JSON.parse(init.body) + return reply({ jsonrpc: '2.0', id: body.id }, 401) // always reject the cached JWT + }) + + const { ctx, out } = makeCtx({ hypHome, lines: [{ jsonrpc: '2.0', id: 1, method: 'tools/call', params: { name: 'x' } }] }) + const code = await runMcpProxy({ target: 'prod', ctx }) + + assert.equal(code, 0) // the proxy stays up; the error rides back as a JSON-RPC error + assert.match(out.join(''), /re-run 'hyp remote login prod'/) +}) From 841cce926ec1258e892dd5f3be7999f92028b943 Mon Sep 17 00:00:00 2001 From: Kenny Daniel Date: Mon, 29 Jun 2026 12:32:03 -0700 Subject: [PATCH 10/59] remote/oidc: tolerate a refresh response that omits org refreshSession hard-required `org`, so a refresh grant that re-mints only the access JWT (org is fixed for the refresh token's life) threw a misleading "missing org" error and no near-expiry session could refresh. Treat org as optional on refresh and keep the org already stored. Also share trimSlash between identity_client and the start-URL builder, and correct the resolveToken doc now that the proxy refreshes too. --- src/core/remote/credentials.js | 10 ++++++---- src/core/remote/identity_client.js | 16 +++++++++++++--- src/core/remote/oidc_login.js | 4 ++-- test/core/remote-credentials-oidc.test.js | 15 +++++++++++++++ test/core/remote-identity-client.test.js | 6 ++++++ 5 files changed, 42 insertions(+), 9 deletions(-) diff --git a/src/core/remote/credentials.js b/src/core/remote/credentials.js index 890de6d..2738ca4 100644 --- a/src/core/remote/credentials.js +++ b/src/core/remote/credentials.js @@ -200,9 +200,9 @@ export async function removeToken(stateDir, target) { * Resolve a target's bearer token at query time, **without** session-aware * refresh. Order: per-target env var (CI/ephemeral) → stored file → error. An * env override never falls through to the file (LLP 0033 §credentials). For an - * `oidc` record this returns the cached access JWT as-is; the attach path uses - * {@link resolveAccessJwt} for refresh. Kept for the stdio proxy and other - * non-refreshing callers. + * `oidc` record this returns the cached access JWT as-is. Both the query attach + * path and the stdio proxy now use {@link resolveAccessJwt} for session-aware + * refresh; this lower-level reader is kept for any non-refreshing caller. * * @param {{ target: string, env: NodeJS.ProcessEnv, stateDir: string }} args * @returns {Promise<{ ok: true, token: string, source: 'env' | 'file' } | { ok: false, error: string }>} @@ -268,7 +268,9 @@ export async function resolveAccessJwt({ target, env, stateDir, identityBase, no } const refreshed = await refreshSession({ identityBase, refreshToken: entry.refreshToken, fetchImpl }) /** @type {RemoteOidcRecord} */ - const next = { ...entry, accessJwt: refreshed.accessJwt, expiresAt: refreshed.expiresAt, org: refreshed.org } + // `org` is fixed for the refresh token's life; keep the stored one when the + // refresh response omits it (refreshSession returns '' in that case). + const next = { ...entry, accessJwt: refreshed.accessJwt, expiresAt: refreshed.expiresAt, org: refreshed.org || entry.org } await writeSession(stateDir, target, next) return { ok: true, token: refreshed.accessJwt, source: 'file', kind: 'oidc' } } diff --git a/src/core/remote/identity_client.js b/src/core/remote/identity_client.js index 800149b..ae6cff7 100644 --- a/src/core/remote/identity_client.js +++ b/src/core/remote/identity_client.js @@ -60,7 +60,11 @@ export async function refreshSession({ identityBase, refreshToken, fetchImpl }) return { accessJwt: str(json.access_jwt, 'access_jwt'), expiresAt: str(json.expires_at, 'expires_at'), - org: str(json.org, 'org'), + // The refresh grant only has to re-mint the access JWT; `org` is fixed for + // the life of the refresh token. Treat it as optional here and let the + // caller keep the org it already stored, so a server that omits it on + // refresh does not turn every silent refresh into a hard error. + org: typeof json.org === 'string' ? json.org : '', } } @@ -122,8 +126,14 @@ function str(v, field) { return v } -/** @param {string} base @returns {string} */ -function trimSlash(base) { +/** + * Strip trailing slashes from an identity base so `${base}/token` never + * double-slashes. Shared with the login orchestrator's start-URL builder. + * + * @param {string} base + * @returns {string} + */ +export function trimSlash(base) { return base.replace(/\/+$/, '') } diff --git a/src/core/remote/oidc_login.js b/src/core/remote/oidc_login.js index 4435d3a..96f1927 100644 --- a/src/core/remote/oidc_login.js +++ b/src/core/remote/oidc_login.js @@ -3,7 +3,7 @@ import crypto from 'node:crypto' import { Attr, getLogger } from '../observability/index.js' -import { exchangeCode } from './identity_client.js' +import { exchangeCode, trimSlash } from './identity_client.js' import { startLoopbackReceiver } from './loopback.js' import { openBrowser as defaultOpenBrowser } from './open_browser.js' import { createPkcePair } from './pkce.js' @@ -98,7 +98,7 @@ export async function loginWithBrowser({ * @returns {string} */ export function buildStartUrl({ identityBase, redirectUri, challenge, state, org }) { - const url = new URL(`${identityBase.replace(/\/+$/, '')}/login/start`) + const url = new URL(`${trimSlash(identityBase)}/login/start`) url.searchParams.set('redirect_uri', redirectUri) url.searchParams.set('code_challenge', challenge) url.searchParams.set('code_challenge_method', 'S256') diff --git a/test/core/remote-credentials-oidc.test.js b/test/core/remote-credentials-oidc.test.js index 28c3c89..c2d7af3 100644 --- a/test/core/remote-credentials-oidc.test.js +++ b/test/core/remote-credentials-oidc.test.js @@ -131,6 +131,21 @@ test('resolveAccessJwt refreshes a stale oidc JWT and persists the new one', asy assert.equal(/** @type {any} */ (creds.prod).expiresAt, FUTURE) }) +test('resolveAccessJwt keeps the stored org when a refresh response omits it', async () => { + const dir = await tmpState() + await writeSession(dir, 'prod', { refreshToken: 'rt', accessJwt: 'jwt-old', expiresAt: PAST, org: 'acme' }) + // A refresh grant that re-mints only the access JWT (no org field) must not + // wipe the stored org or fail; org is fixed for the refresh token's life. + const fetchImpl = /** @type {any} */ (async () => ({ + ok: true, status: 200, + text: async () => JSON.stringify({ access_jwt: 'jwt-new', expires_at: FUTURE }), + })) + const r = await resolveAccessJwt({ target: 'prod', env: {}, stateDir: dir, identityBase: 'https://h/v1/identity', fetchImpl }) + assert.equal(/** @type {any} */ (r).token, 'jwt-new') + const creds = await readCredentials(dir) + assert.equal(/** @type {any} */ (creds.prod).org, 'acme') +}) + test('resolveAccessJwt errors with login guidance when no record exists', async () => { const dir = await tmpState() const r = await resolveAccessJwt({ target: 'prod', env: {}, stateDir: dir }) diff --git a/test/core/remote-identity-client.test.js b/test/core/remote-identity-client.test.js index f2c6ba6..069c3a3 100644 --- a/test/core/remote-identity-client.test.js +++ b/test/core/remote-identity-client.test.js @@ -53,6 +53,12 @@ test('refreshSession posts the refresh_token grant and maps the response', async assert.deepEqual(JSON.parse(calls[0].init.body), { grant_type: 'refresh_token', refresh_token: 'rt-1' }) }) +test('refreshSession tolerates a response that omits org (returns org: "")', async () => { + const { fetchImpl } = stubFetch({ body: { access_jwt: 'jwt-2', expires_at: '2026-06-29T13:00:00Z' } }) + const refreshed = await refreshSession({ identityBase: 'https://hyp.internal/v1/identity', refreshToken: 'rt-1', fetchImpl }) + assert.deepEqual(refreshed, { accessJwt: 'jwt-2', expiresAt: '2026-06-29T13:00:00Z', org: '' }) +}) + test('a 401 invalid_grant surfaces a typed InvalidGrantError', async () => { const { fetchImpl } = stubFetch({ status: 401, body: { error: 'invalid_grant' } }) await assert.rejects( From 044b3d71e2ab8ca56dffc2769aebfeb55c8d1088 Mon Sep 17 00:00:00 2001 From: Kenny Daniel Date: Mon, 29 Jun 2026 12:32:17 -0700 Subject: [PATCH 11/59] remote/oidc: point empty-stdin login at --browser; unify arg parsing Any non-TTY stdin (a wrapper, an IDE terminal, /dev/null) routes login to the static path, so an empty stdin failed with a bare "empty token" even though browser sign-in is the default. The error now points at --browser. Also unify positional parsing: the value-flag-aware parser firstPositional becomes positionals(), used by add/remove/login alike, so a future value-taking flag on any subcommand can't misread its value as a name. --- src/core/cli/remote_commands.js | 31 ++++++++++++++++---------- test/core/remote-login-command.test.js | 14 ++++++++++++ 2 files changed, 33 insertions(+), 12 deletions(-) diff --git a/src/core/cli/remote_commands.js b/src/core/cli/remote_commands.js index 87c1cbc..9cd1a51 100644 --- a/src/core/cli/remote_commands.js +++ b/src/core/cli/remote_commands.js @@ -55,8 +55,7 @@ export async function runRemoteHelp(argv, ctx) { * @ref LLP 0033#commands [implements]: `remote add` is a local-layer writer; URL in config, token never in config */ export async function runRemoteAdd(argv, ctx) { - const positional = argv.filter((a) => !a.startsWith('-')) - const [name, url] = positional + const [name, url] = positionals(argv) if (!name || !url) { ctx.stderr.write('usage: hyp remote add \n') return 2 @@ -112,7 +111,7 @@ export async function runRemoteLogin(argv, ctx, deps = {}) { // The target name is the first positional. Skip the VALUE slot of a // value-taking flag so e.g. `login --org acme` (name omitted) is not misread // as the target 'acme'. - const name = firstPositional(argv, new Set(['--token-file', '--org'])) + const name = positionals(argv, new Set(['--token-file', '--org']))[0] if (!name) { ctx.stderr.write('usage: hyp remote login [--token-file ] [--org ] [--no-browser]\n') return 2 @@ -137,23 +136,27 @@ export async function runRemoteLogin(argv, ctx, deps = {}) { } /** - * Return the first positional argument (not a flag, not the value slot of a - * value-taking flag). + * Return the positional arguments in order, skipping flags and the value slot + * of any value-taking flag (so e.g. `--org acme` is not read as a positional). + * The one parser every `remote` subcommand uses, so a value flag added to any + * of them never misreads its value as a positional. * * @param {string[]} argv - * @param {Set} valueFlags - * @returns {string | undefined} + * @param {Set} [valueFlags] + * @returns {string[]} */ -function firstPositional(argv, valueFlags) { +function positionals(argv, valueFlags = new Set()) { + /** @type {string[]} */ + const out = [] for (let i = 0; i < argv.length; i++) { const a = argv[i] if (a.startsWith('-')) { if (valueFlags.has(a)) i++ // consume its value continue } - return a + out.push(a) } - return undefined + return out } /** @@ -180,6 +183,11 @@ async function runStaticLogin(name, tokenFile, stdin, ctx) { } if (!token) { ctx.stderr.write('hyp remote login: empty token\n') + // Any non-TTY stdin (a wrapper, an IDE terminal, /dev/null) routes here + // even when no token was piped; point at the browser flow it bypassed. + if (!tokenFile) { + ctx.stderr.write(' (to sign in with a browser instead, re-run with --browser)\n') + } return 2 } @@ -309,8 +317,7 @@ export async function runRemoteList(argv, ctx) { * @param {CommandRunContext} ctx */ export async function runRemoteRemove(argv, ctx) { - const positional = argv.filter((a) => !a.startsWith('-')) - const name = positional[0] + const name = positionals(argv)[0] if (!name) { ctx.stderr.write('usage: hyp remote remove \n') return 2 diff --git a/test/core/remote-login-command.test.js b/test/core/remote-login-command.test.js index 67d234e..8c1d8a0 100644 --- a/test/core/remote-login-command.test.js +++ b/test/core/remote-login-command.test.js @@ -142,6 +142,20 @@ test('a piped stdin token still takes the static path', async () => { assert.deepEqual(creds.prod, { kind: 'static', token: 'piped-tok' }) }) +test('an empty piped stdin (no token) points at --browser instead of just "empty token"', async () => { + const hypHome = await tmpHome() + // Non-TTY stdin that yields nothing, the way `< /dev/null` or some wrappers do. + const stdin = { + isTTY: false, + async *[Symbol.asyncIterator]() { /* no chunks */ }, + } + const { ctx, err } = await makeCtx({ hypHome, stdin }) + const code = await runRemoteLogin(['prod'], ctx, {}) + assert.equal(code, 2) + assert.match(err.join(''), /empty token/) + assert.match(err.join(''), /re-run with --browser/) +}) + test('--browser overrides a piped stdin token and takes the browser flow', async () => { const hypHome = await tmpHome() const stdin = { From 8a1a4fab4b40a5ec24469260b28fcd0bf2297fc5 Mon Sep 17 00:00:00 2001 From: Kenny Daniel Date: Mon, 29 Jun 2026 12:32:29 -0700 Subject: [PATCH 12/59] remote/oidc: use Node's native base64url in PKCE Buffer's 'base64url' encoding produces unpadded base64url (RFC 7636) directly, so the hand-rolled +/_/= replace chain was redundant. --- src/core/remote/pkce.js | 15 +++------------ 1 file changed, 3 insertions(+), 12 deletions(-) diff --git a/src/core/remote/pkce.js b/src/core/remote/pkce.js index 34b8422..6c8048e 100644 --- a/src/core/remote/pkce.js +++ b/src/core/remote/pkce.js @@ -21,17 +21,8 @@ import crypto from 'node:crypto' * @ref LLP 0046#d3 [implements]: client owns the downstream PKCE leg; verifier + S256 challenge, in-memory for one flow */ export function createPkcePair() { - const verifier = base64url(crypto.randomBytes(32)) - const challenge = base64url(crypto.createHash('sha256').update(verifier).digest()) + // `base64url` is unpadded base64url (RFC 7636 §A) natively, no replace chain. + const verifier = crypto.randomBytes(32).toString('base64url') + const challenge = crypto.createHash('sha256').update(verifier).digest().toString('base64url') return { verifier, challenge } } - -/** - * Base64url with no padding (RFC 7636 §A). - * - * @param {Buffer} buf - * @returns {string} - */ -function base64url(buf) { - return buf.toString('base64').replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '') -} From 03fba6bb9eb88b105ec4282ca958b408691af4bd Mon Sep 17 00:00:00 2001 From: Kenny Daniel Date: Mon, 29 Jun 2026 13:00:10 -0700 Subject: [PATCH 13/59] mcp: retry oidc refresh after notification auth failures --- src/core/mcp/client.js | 37 ++++++++++++++++++++++-------- test/core/remote-verb-oidc.test.js | 26 +++++++++++++++++++-- 2 files changed, 52 insertions(+), 11 deletions(-) diff --git a/src/core/mcp/client.js b/src/core/mcp/client.js index 6b97b3a..0a59876 100644 --- a/src/core/mcp/client.js +++ b/src/core/mcp/client.js @@ -62,18 +62,14 @@ export function createHttpMcpClient(opts) { if (notify) { // A notification gets `202 Accepted` with no body. - if (!res.ok && res.status !== 202) throw new Error(`MCP ${method} failed: HTTP ${res.status}`) + if (!res.ok && res.status !== 202) { + if (isAuthStatus(res.status)) throw authRejectionError(res.status) + throw new Error(`MCP ${method} failed: HTTP ${res.status}`) + } return undefined } if (!res.ok) { - if (res.status === 401 || res.status === 403) { - // Tag the error so the attach path can recognize an auth rejection and - // attempt a single silent refresh + retry (LLP 0046 D5). - throw Object.assign( - new Error(`remote rejected the credential (HTTP ${res.status}) - re-run 'hyp remote login'`), - { authError: true, status: res.status }, - ) - } + if (isAuthStatus(res.status)) throw authRejectionError(res.status) const text = await safeText(res) throw new Error(`MCP ${method} failed: HTTP ${res.status}${text ? ` - ${text.slice(0, 200)}` : ''}`) } @@ -108,6 +104,29 @@ export function createHttpMcpClient(opts) { } } +/** + * @param {number} status + * @returns {boolean} + */ +function isAuthStatus(status) { + return status === 401 || status === 403 +} + +/** + * Tag the error so the attach path can recognize an auth rejection and attempt + * a single silent refresh + retry. + * + * @param {number} status + * @returns {Error} + * @ref LLP 0046#d5 [implements]: live MCP 401/403 failures are tagged so attach can refresh once + */ +function authRejectionError(status) { + return Object.assign( + new Error(`remote rejected the credential (HTTP ${status}) - re-run 'hyp remote login'`), + { authError: true, status }, + ) +} + /** * Parse a Streamable-HTTP response into the JSON-RPC message matching `id`. * The server may answer a POST with a single `application/json` body or an diff --git a/test/core/remote-verb-oidc.test.js b/test/core/remote-verb-oidc.test.js index 838a6ae..486f51a 100644 --- a/test/core/remote-verb-oidc.test.js +++ b/test/core/remote-verb-oidc.test.js @@ -25,7 +25,7 @@ const PAST = '2000-01-01T00:00:00Z' * `validJwt`; anything else gets a 401, simulating a stale/revoked access JWT. * * @param {TestContext} t - * @param {{ validJwt: string, refreshTo?: string, refreshInvalid?: boolean }} opts + * @param {{ validJwt: string, refreshTo?: string, refreshInvalid?: boolean, notificationAuthStatus?: 401 | 403 }} opts */ function stubServers(t, opts) { const original = globalThis.fetch @@ -51,7 +51,10 @@ function stubServers(t, opts) { // MCP JSON-RPC. const req = JSON.parse(init.body) if (req.method === 'initialize') return reply({ jsonrpc: '2.0', id: req.id, result: { protocolVersion: '2025-06-18' } }) - if (req.method === 'notifications/initialized') return { ok: true, status: 202, headers: { get: () => null }, text: async () => '' } + if (req.method === 'notifications/initialized') { + if (opts.notificationAuthStatus && init.headers.authorization !== `Bearer ${state.validJwt}`) return reply({}, opts.notificationAuthStatus) + return { ok: true, status: 202, headers: { get: () => null }, text: async () => '' } + } if (req.method === 'tools/call') { if (init.headers.authorization !== `Bearer ${state.validJwt}`) return reply({}, 401) return reply({ jsonrpc: '2.0', id: req.id, result: { structuredContent: { columns: ['n'], rows: [{ n: 7 }] }, isError: false } }) @@ -110,6 +113,25 @@ test('a 401 mid-flight triggers exactly one refresh + retry', async (t) => { assert.equal(state.refreshCalls, 1) }) +test('a notification 401 or 403 during handshake triggers exactly one refresh + retry', async (t) => { + for (const status of /** @type {const} */ ([401, 403])) { + await t.test(`HTTP ${status}`, async (t) => { + const hypHome = await tmpHome() + const stateDir = path.join(hypHome, 'hypaware') + // Fresh-looking stored JWT, but the server accepts initialize and rejects + // the initialized notification before tools/call can run. + await writeSession(stateDir, 'prod', { refreshToken: 'rt', accessJwt: 'jwt-revoked', expiresAt: FUTURE, org: 'acme' }) + const state = stubServers(t, { validJwt: 'jwt-fresh', refreshTo: 'jwt-fresh', notificationAuthStatus: status }) + + const { ctx, out } = ctxWith(hypHome) + const code = await cmd.run(['SELECT 1', '--remote', 'prod', '--format', 'json'], ctx) + assert.equal(code, 0) + assert.deepEqual(JSON.parse(out.join('')), [{ n: 7 }]) + assert.equal(state.refreshCalls, 1) + }) + } +}) + test('a refresh that fails invalid_grant surfaces the re-login guidance', async (t) => { const hypHome = await tmpHome() const stateDir = path.join(hypHome, 'hypaware') From 65867096dbaec479b2fee619fe760039f78b834c Mon Sep 17 00:00:00 2001 From: Kenny Daniel Date: Mon, 29 Jun 2026 13:18:36 -0700 Subject: [PATCH 14/59] Fix no-browser remote login review issues --- llp/0047-oidc-login-client.design.md | 2 +- llp/0048-oidc-login-client.plan.md | 2 +- src/core/cli/remote_commands.js | 10 +++++----- test/core/remote-login-command.test.js | 17 +++++++++++++++++ 4 files changed, 24 insertions(+), 7 deletions(-) diff --git a/llp/0047-oidc-login-client.design.md b/llp/0047-oidc-login-client.design.md index ca63e21..eb3b8f2 100644 --- a/llp/0047-oidc-login-client.design.md +++ b/llp/0047-oidc-login-client.design.md @@ -6,7 +6,7 @@ **Author:** Kenny / Claude **Date:** 2026-06-29 **Related:** LLP 0009, LLP 0011, LLP 0033, LLP 0046 -**Decided-by:** LLP 0046 — multi-tenant OIDC login on the client +**Decided-by:** LLP 0046: multi-tenant OIDC login on the client > [LLP 0046](./0046-oidc-login-client.decision.md) grilled and decided the client-local > shape of chunk 2: a browser mode on `hyp remote login`, an ephemeral loopback diff --git a/llp/0048-oidc-login-client.plan.md b/llp/0048-oidc-login-client.plan.md index b402a11..dafe1cb 100644 --- a/llp/0048-oidc-login-client.plan.md +++ b/llp/0048-oidc-login-client.plan.md @@ -6,7 +6,7 @@ **Author:** Kenny / Claude **Date:** 2026-06-29 **Related:** LLP 0033, LLP 0046, LLP 0047 -**Generated-by:** LLP 0047 — multi-tenant OIDC login on the client, implementation design +**Generated-by:** LLP 0047: multi-tenant OIDC login on the client, implementation design > [LLP 0047](./0047-oidc-login-client.design.md) named the modules and the wire > contract. This plan turns them into a task graph of independently-mergeable PRs, with diff --git a/src/core/cli/remote_commands.js b/src/core/cli/remote_commands.js index 9cd1a51..6eb7844 100644 --- a/src/core/cli/remote_commands.js +++ b/src/core/cli/remote_commands.js @@ -121,9 +121,9 @@ export async function runRemoteLogin(argv, ctx, deps = {}) { const stdin = /** @type {any} */ (ctx.stdin ?? process.stdin) const stdinPiped = !!stdin && !stdin.isTTY - // Static path: an explicit token file, or a piped token unless --browser - // forces the interactive flow. - const useStatic = !!tokenFile || (stdinPiped && !forceBrowser) + // Static path: an explicit token file, or a piped token unless a browser + // mode flag forces the authorization-code flow. + const useStatic = !!tokenFile || (stdinPiped && !forceBrowser && !noBrowser) if (useStatic) { // --org only applies to the browser flow; say so rather than silently drop it. @@ -183,8 +183,8 @@ async function runStaticLogin(name, tokenFile, stdin, ctx) { } if (!token) { ctx.stderr.write('hyp remote login: empty token\n') - // Any non-TTY stdin (a wrapper, an IDE terminal, /dev/null) routes here - // even when no token was piped; point at the browser flow it bypassed. + // Non-TTY stdin without a browser-mode flag routes here even when no + // token was piped; point at the browser flow it bypassed. if (!tokenFile) { ctx.stderr.write(' (to sign in with a browser instead, re-run with --browser)\n') } diff --git a/test/core/remote-login-command.test.js b/test/core/remote-login-command.test.js index 8c1d8a0..3fa6682 100644 --- a/test/core/remote-login-command.test.js +++ b/test/core/remote-login-command.test.js @@ -76,6 +76,23 @@ test('--no-browser passes noBrowser through to the flow', async () => { assert.equal(seen.noBrowser, true) }) +test('--no-browser still uses browser mode when stdin is non-TTY', async () => { + const hypHome = await tmpHome() + const stdin = { + isTTY: false, + async *[Symbol.asyncIterator]() { /* no chunks */ }, + } + const { ctx } = await makeCtx({ hypHome, stdin }) + /** @type {any} */ let seen + const login = /** @type {any} */ (async (/** @type {any} */ args) => { + seen = args + return { refreshToken: 'rt', accessJwt: 'jwt', expiresAt: '2999-01-01T00:00:00Z', org: 'acme' } + }) + const code = await runRemoteLogin(['prod', '--no-browser'], ctx, { login }) + assert.equal(code, 0) + assert.equal(seen.noBrowser, true) +}) + test('a callback error maps to a clear org-selection message', async () => { const hypHome = await tmpHome() const { ctx, err } = await makeCtx({ hypHome }) From a5689c9b354a3098371add0b16b25547aa1adae8 Mon Sep 17 00:00:00 2001 From: Kenny Daniel Date: Mon, 29 Jun 2026 14:18:05 -0700 Subject: [PATCH 15/59] remote/oidc: harden loopback against a malformed request target A request whose target makes `new URL` throw (e.g. `GET //`) raised an uncaught exception inside the loopback request listener, crashing the whole `hyp remote login` process. Any local process or port scanner hitting the OS-assigned loopback port during the login window could trigger it. Wrap the parse in try/catch and answer 400 without settling the flow, so the real callback can still arrive. --- src/core/remote/loopback.js | 13 +++++++++++- test/core/remote-loopback.test.js | 35 +++++++++++++++++++++++++++++++ 2 files changed, 47 insertions(+), 1 deletion(-) diff --git a/src/core/remote/loopback.js b/src/core/remote/loopback.js index 66422fd..df1335c 100644 --- a/src/core/remote/loopback.js +++ b/src/core/remote/loopback.js @@ -72,7 +72,18 @@ export function startLoopbackReceiver({ state, timeoutMs = DEFAULT_TIMEOUT_MS }) /** @type {Server} */ const server = http.createServer((req, res) => { - const url = new URL(req.url ?? '/', 'http://127.0.0.1') + // A malformed request target (e.g. `GET //` or `http://[`) makes `new URL` + // throw; an uncaught throw in the request listener would crash the whole + // `hyp remote login` process. Treat it as a stray request: 400 and ignore, + // never settle the flow, so the real callback can still arrive. + let url + try { + url = new URL(req.url ?? '/', 'http://127.0.0.1') + } catch { + res.writeHead(400, { 'content-type': 'text/plain' }) + res.end('bad request') + return + } if (url.pathname !== CALLBACK_PATH) { res.writeHead(404, { 'content-type': 'text/plain' }) res.end('not found') diff --git a/test/core/remote-loopback.test.js b/test/core/remote-loopback.test.js index a8e5c08..f4ee1c5 100644 --- a/test/core/remote-loopback.test.js +++ b/test/core/remote-loopback.test.js @@ -2,9 +2,32 @@ import test from 'node:test' import assert from 'node:assert/strict' +import net from 'node:net' import { startLoopbackReceiver } from '../../src/core/remote/loopback.js' +/** + * Send a raw request line the way `fetch` never would, so we can exercise a + * request target that makes `new URL` throw. Resolves with the response bytes. + * + * @param {string} redirectUri + * @param {string} target the raw request-target after the method + * @returns {Promise} + */ +function rawRequest(redirectUri, target) { + const port = Number(new URL(redirectUri).port) + return new Promise((resolve, reject) => { + const sock = net.connect(port, '127.0.0.1', () => { + sock.write(`GET ${target} HTTP/1.1\r\nHost: 127.0.0.1\r\nConnection: close\r\n\r\n`) + }) + let buf = '' + sock.setEncoding('utf8') + sock.on('data', (chunk) => { buf += chunk }) + sock.on('end', () => resolve(buf)) + sock.on('error', reject) + }) +} + /** * Drive the real bound port with a real HTTP request, the way the browser * would hit the loopback redirect. @@ -64,6 +87,18 @@ test('close() before a code arrives rejects a pending waitForCode (no hang)', as await assertion }) +test('a malformed request target returns 400 without crashing or settling the flow', async () => { + const recv = await startLoopbackReceiver({ state: 's1', timeoutMs: 2000 }) + const waiting = recv.waitForCode() + // `GET //` makes `new URL('//', base)` throw; the handler must answer 400 + // rather than letting the throw crash the process. + const resp = await rawRequest(recv.redirectUri, '//') + assert.match(resp, /^HTTP\/1\.1 400/) + // The flow is untouched: the real callback afterward still resolves. + await hitCallback(recv.redirectUri, { code: 'ok', state: 's1' }) + assert.deepEqual(await waiting, { code: 'ok' }) +}) + test('a request to a path other than /callback does not consume the single shot', async () => { const recv = await startLoopbackReceiver({ state: 's1', timeoutMs: 2000 }) const waiting = recv.waitForCode() From 1a3f04b1d11e2d590f69907e4f6b204b6359a9c2 Mon Sep 17 00:00:00 2001 From: Kenny Daniel Date: Mon, 29 Jun 2026 14:20:14 -0700 Subject: [PATCH 16/59] remote/oidc: don't silently discard a piped token under --no-browser `echo $TOKEN | hyp remote login foo --no-browser` routed to the browser flow and never read stdin, so the supplied token was silently dropped. --no-browser is a browser-mode modifier (print the URL) yet a piped token signals static intent. Peek stdin in that case: a non-empty piped value is stored statically (with a note that --no-browser was ignored), while an empty pipe still falls through to the browser flow, preserving the documented non-TTY --no-browser behavior. Extracts persistStaticToken so both static paths share one writer. --- src/core/cli/remote_commands.js | 42 ++++++++++++++++++++++++++ test/core/remote-login-command.test.js | 18 +++++++++++ 2 files changed, 60 insertions(+) diff --git a/src/core/cli/remote_commands.js b/src/core/cli/remote_commands.js index 6eb7844..a34e31e 100644 --- a/src/core/cli/remote_commands.js +++ b/src/core/cli/remote_commands.js @@ -132,6 +132,33 @@ export async function runRemoteLogin(argv, ctx, deps = {}) { } return runStaticLogin(name, tokenFile, stdin, ctx) } + + // `--no-browser` with a piped token is contradictory: the flag is a browser + // mode modifier ("print the URL") but a piped token signals static intent. + // Peek stdin to tell a real token apart from an empty pipe: a non-empty value + // is stored statically (never silently discarded), while an empty pipe falls + // through to the browser flow (which is what `--no-browser` is for). + if (stdinPiped && noBrowser && !forceBrowser && !tokenFile) { + /** @type {string} */ + let piped + try { + piped = (await readAllStdin(stdin)).trim() + } catch (err) { + ctx.stderr.write(`hyp remote login: ${err instanceof Error ? err.message : String(err)}\n`) + return 1 + } + if (piped) { + ctx.stderr.write('note: --no-browser is ignored because a token was piped on stdin (storing it statically)\n') + if (org) { + ctx.stderr.write('note: --org is ignored with a static token (it applies to the browser login flow)\n') + } + return persistStaticToken(name, piped, ctx) + } + // Empty pipe: this is the documented "print the URL" path. The browser flow + // below must not re-read the (now exhausted) stdin, so signal it explicitly. + return runBrowserLogin(name, { org, noBrowser }, ctx, deps.login ?? loginWithBrowser) + } + return runBrowserLogin(name, { org, noBrowser }, ctx, deps.login ?? loginWithBrowser) } @@ -191,6 +218,21 @@ async function runStaticLogin(name, tokenFile, stdin, ctx) { return 2 } + return persistStaticToken(name, token, ctx) +} + +/** + * Store an already-read static token to the 0600 store and print the + * confirmation, with a nudge when the target isn't configured. Shared by the + * `--token-file`/piped static path and the `--no-browser`-with-a-piped-token + * peek. + * + * @param {string} name + * @param {string} token a non-empty, trimmed token + * @param {CommandRunContext} ctx + * @returns {Promise} + */ +async function persistStaticToken(name, token, ctx) { const stateDir = readObservabilityEnv(ctx.env).stateDir await writeToken(stateDir, name, token) ctx.stdout.write(`stored query-scoped token for '${name}' (mode 0600)\n`) diff --git a/test/core/remote-login-command.test.js b/test/core/remote-login-command.test.js index 3fa6682..fe4476d 100644 --- a/test/core/remote-login-command.test.js +++ b/test/core/remote-login-command.test.js @@ -173,6 +173,24 @@ test('an empty piped stdin (no token) points at --browser instead of just "empty assert.match(err.join(''), /re-run with --browser/) }) +test('--no-browser with a piped token stores it statically instead of discarding it', async () => { + const hypHome = await tmpHome() + const stdin = { + isTTY: false, + async *[Symbol.asyncIterator]() { yield Buffer.from('piped-tok\n') }, + } + const { ctx, err } = await makeCtx({ hypHome, stdin }) + let called = false + const login = /** @type {any} */ (async () => { called = true; return {} }) + const code = await runRemoteLogin(['prod', '--no-browser'], ctx, { login }) + assert.equal(code, 0) + assert.equal(called, false) // the browser flow must NOT run; the token wins + assert.match(err.join(''), /--no-browser is ignored because a token was piped/) + const stateDir = path.join(hypHome, 'hypaware') + const creds = await readCredentials(stateDir) + assert.deepEqual(creds.prod, { kind: 'static', token: 'piped-tok' }) +}) + test('--browser overrides a piped stdin token and takes the browser flow', async () => { const hypHome = await tmpHome() const stdin = { From 5f492b6b3282b96e49b361a8b909ab090aa8c993 Mon Sep 17 00:00:00 2001 From: Kenny Daniel Date: Mon, 29 Jun 2026 14:20:53 -0700 Subject: [PATCH 17/59] remote/oidc: point a failed browser login at the headless escape hatches On a headless box the opener silently fails but still reports success, so the browser flow blocks until the loopback timeout and then printed only the bare timeout error. The old fast-fail TTY guard that named --token-file is gone now that browser login is the default. On any non-callback failure (timeout, transport), append guidance pointing at --token-file, piped stdin, and --no-browser, so the user is never left stuck without a next step. --- src/core/cli/remote_commands.js | 7 +++++++ test/core/remote-login-command.test.js | 24 ++++++++++++++++++++++++ 2 files changed, 31 insertions(+) diff --git a/src/core/cli/remote_commands.js b/src/core/cli/remote_commands.js index a34e31e..56d4abf 100644 --- a/src/core/cli/remote_commands.js +++ b/src/core/cli/remote_commands.js @@ -285,6 +285,13 @@ async function runBrowserLogin(name, { org, noBrowser }, ctx, login) { } catch (err) { const callbackError = /** @type {any} */ (err)?.callbackError ctx.stderr.write(`hyp remote login: ${explainLoginError(callbackError, err)}\n`) + // A server-surfaced callback error (org selection, membership) is already + // actionable. A local failure - most importantly a timeout, which is what a + // headless box hits when the opener silently fails - is not, so point at the + // headless escape hatches rather than leaving the user stuck (LLP 0046 D8). + if (!callbackError) { + ctx.stderr.write(" (on a machine with no browser, pass a static token with --token-file or pipe it on stdin; --no-browser prints the URL to open elsewhere)\n") + } return 1 } } diff --git a/test/core/remote-login-command.test.js b/test/core/remote-login-command.test.js index fe4476d..5dd775b 100644 --- a/test/core/remote-login-command.test.js +++ b/test/core/remote-login-command.test.js @@ -104,6 +104,30 @@ test('a callback error maps to a clear org-selection message', async () => { assert.match(err.join(''), /re-run with --org /) }) +test('a browser login timeout points at the headless escape hatches', async () => { + const hypHome = await tmpHome() + const { ctx, err } = await makeCtx({ hypHome }) + // A local failure with no server callbackError (e.g. the loopback timeout a + // headless box hits when the opener silently fails). + const login = /** @type {any} */ (async () => { + throw new Error('timed out waiting for the browser login to complete') + }) + const code = await runRemoteLogin(['prod'], ctx, { login }) + assert.equal(code, 1) + assert.match(err.join(''), /timed out/) + assert.match(err.join(''), /--token-file or pipe it on stdin/) +}) + +test('a server callback error does not append the headless hint (it is already actionable)', async () => { + const hypHome = await tmpHome() + const { ctx, err } = await makeCtx({ hypHome }) + const login = /** @type {any} */ (async () => { + throw Object.assign(new Error('x'), { callbackError: 'org_selection_required' }) + }) + await runRemoteLogin(['prod'], ctx, { login }) + assert.doesNotMatch(err.join(''), /--token-file or pipe it on stdin/) +}) + test('no_membership maps to its own message', async () => { const hypHome = await tmpHome() const { ctx, err } = await makeCtx({ hypHome }) From 6d61bf7f1234065b18c2ab6dd8a4d7caea33c4bd Mon Sep 17 00:00:00 2001 From: Kenny Daniel Date: Mon, 29 Jun 2026 14:22:08 -0700 Subject: [PATCH 18/59] remote/oidc: persist a rotated refresh token from the refresh grant refreshSession dropped any refresh_token in the token response and resolveAccessJwt re-stored the old refreshToken, so a server that rotates refresh tokens on use (one-time-use) would 401 on the next silent refresh and force a full re-login every session after the first. Surface a rotated refresh_token through RefreshedAccess and keep the stored one only when the server omits it (stable-token servers are unaffected). --- src/core/remote/credentials.js | 13 ++++++++++-- src/core/remote/identity_client.js | 5 +++++ src/core/remote/types.d.ts | 6 ++++++ test/core/remote-credentials-oidc.test.js | 26 +++++++++++++++++++++++ test/core/remote-identity-client.test.js | 14 ++++++++++-- 5 files changed, 60 insertions(+), 4 deletions(-) diff --git a/src/core/remote/credentials.js b/src/core/remote/credentials.js index 2738ca4..27d5f3e 100644 --- a/src/core/remote/credentials.js +++ b/src/core/remote/credentials.js @@ -269,8 +269,17 @@ export async function resolveAccessJwt({ target, env, stateDir, identityBase, no const refreshed = await refreshSession({ identityBase, refreshToken: entry.refreshToken, fetchImpl }) /** @type {RemoteOidcRecord} */ // `org` is fixed for the refresh token's life; keep the stored one when the - // refresh response omits it (refreshSession returns '' in that case). - const next = { ...entry, accessJwt: refreshed.accessJwt, expiresAt: refreshed.expiresAt, org: refreshed.org || entry.org } + // refresh response omits it (refreshSession returns '' in that case). A + // rotated refresh token (one-time-use servers) replaces the stored one; + // otherwise keep it, so the next refresh sends a live token rather than a + // consumed one that would 401 and force a re-login every session. + const next = { + ...entry, + accessJwt: refreshed.accessJwt, + expiresAt: refreshed.expiresAt, + org: refreshed.org || entry.org, + refreshToken: refreshed.refreshToken || entry.refreshToken, + } await writeSession(stateDir, target, next) return { ok: true, token: refreshed.accessJwt, source: 'file', kind: 'oidc' } } diff --git a/src/core/remote/identity_client.js b/src/core/remote/identity_client.js index ae6cff7..d3c8185 100644 --- a/src/core/remote/identity_client.js +++ b/src/core/remote/identity_client.js @@ -65,6 +65,11 @@ export async function refreshSession({ identityBase, refreshToken, fetchImpl }) // caller keep the org it already stored, so a server that omits it on // refresh does not turn every silent refresh into a hard error. org: typeof json.org === 'string' ? json.org : '', + // A rotated (one-time-use) refresh token, if the server issues one. Empty + // when the server keeps the refresh token stable; the caller then retains + // the token it already stored. Storing a stale token here would 401 on the + // next refresh and force a full re-login every session. + refreshToken: typeof json.refresh_token === 'string' ? json.refresh_token : '', } } diff --git a/src/core/remote/types.d.ts b/src/core/remote/types.d.ts index 0a07c0c..f2ccdc7 100644 --- a/src/core/remote/types.d.ts +++ b/src/core/remote/types.d.ts @@ -24,6 +24,12 @@ export interface RefreshedAccess { /** ISO-8601 expiry of `accessJwt`. */ expiresAt: string org: string + /** + * A rotated refresh token, when the server issues one-time-use refresh + * tokens. Empty when the server keeps the refresh token stable, in which + * case the caller retains the one it already stored. + */ + refreshToken: string } /** diff --git a/test/core/remote-credentials-oidc.test.js b/test/core/remote-credentials-oidc.test.js index c2d7af3..954ba46 100644 --- a/test/core/remote-credentials-oidc.test.js +++ b/test/core/remote-credentials-oidc.test.js @@ -131,6 +131,32 @@ test('resolveAccessJwt refreshes a stale oidc JWT and persists the new one', asy assert.equal(/** @type {any} */ (creds.prod).expiresAt, FUTURE) }) +test('resolveAccessJwt persists a rotated refresh token from the refresh response', async () => { + const dir = await tmpState() + await writeSession(dir, 'prod', { refreshToken: 'rt-old', accessJwt: 'jwt-old', expiresAt: PAST, org: 'acme' }) + // A one-time-use server rotates the refresh token on each refresh. + const fetchImpl = /** @type {any} */ (async () => ({ + ok: true, status: 200, + text: async () => JSON.stringify({ access_jwt: 'jwt-new', expires_at: FUTURE, org: 'acme', refresh_token: 'rt-new' }), + })) + await resolveAccessJwt({ target: 'prod', env: {}, stateDir: dir, identityBase: 'https://h/v1/identity', fetchImpl }) + const creds = await readCredentials(dir) + // The consumed token must be replaced; storing rt-old would 401 next refresh. + assert.equal(/** @type {any} */ (creds.prod).refreshToken, 'rt-new') +}) + +test('resolveAccessJwt keeps the stored refresh token when the server does not rotate', async () => { + const dir = await tmpState() + await writeSession(dir, 'prod', { refreshToken: 'rt-stable', accessJwt: 'jwt-old', expiresAt: PAST, org: 'acme' }) + const fetchImpl = /** @type {any} */ (async () => ({ + ok: true, status: 200, + text: async () => JSON.stringify({ access_jwt: 'jwt-new', expires_at: FUTURE, org: 'acme' }), + })) + await resolveAccessJwt({ target: 'prod', env: {}, stateDir: dir, identityBase: 'https://h/v1/identity', fetchImpl }) + const creds = await readCredentials(dir) + assert.equal(/** @type {any} */ (creds.prod).refreshToken, 'rt-stable') +}) + test('resolveAccessJwt keeps the stored org when a refresh response omits it', async () => { const dir = await tmpState() await writeSession(dir, 'prod', { refreshToken: 'rt', accessJwt: 'jwt-old', expiresAt: PAST, org: 'acme' }) diff --git a/test/core/remote-identity-client.test.js b/test/core/remote-identity-client.test.js index 069c3a3..ed57357 100644 --- a/test/core/remote-identity-client.test.js +++ b/test/core/remote-identity-client.test.js @@ -49,14 +49,24 @@ test('refreshSession posts the refresh_token grant and maps the response', async refreshToken: 'rt-1', fetchImpl, }) - assert.deepEqual(refreshed, { accessJwt: 'jwt-2', expiresAt: '2026-06-29T13:00:00Z', org: 'acme' }) + // No rotated refresh_token in the response, so refreshToken comes back empty + // (the caller keeps the one it already stored). + assert.deepEqual(refreshed, { accessJwt: 'jwt-2', expiresAt: '2026-06-29T13:00:00Z', org: 'acme', refreshToken: '' }) assert.deepEqual(JSON.parse(calls[0].init.body), { grant_type: 'refresh_token', refresh_token: 'rt-1' }) }) +test('refreshSession returns a rotated refresh_token when the server issues one', async () => { + const { fetchImpl } = stubFetch({ + body: { access_jwt: 'jwt-2', expires_at: '2026-06-29T13:00:00Z', org: 'acme', refresh_token: 'rt-2' }, + }) + const refreshed = await refreshSession({ identityBase: 'https://hyp.internal/v1/identity', refreshToken: 'rt-1', fetchImpl }) + assert.equal(refreshed.refreshToken, 'rt-2') +}) + test('refreshSession tolerates a response that omits org (returns org: "")', async () => { const { fetchImpl } = stubFetch({ body: { access_jwt: 'jwt-2', expires_at: '2026-06-29T13:00:00Z' } }) const refreshed = await refreshSession({ identityBase: 'https://hyp.internal/v1/identity', refreshToken: 'rt-1', fetchImpl }) - assert.deepEqual(refreshed, { accessJwt: 'jwt-2', expiresAt: '2026-06-29T13:00:00Z', org: '' }) + assert.deepEqual(refreshed, { accessJwt: 'jwt-2', expiresAt: '2026-06-29T13:00:00Z', org: '', refreshToken: '' }) }) test('a 401 invalid_grant surfaces a typed InvalidGrantError', async () => { From 093ec1f2542ed79386eb94ac8a6243a4ee0026cd Mon Sep 17 00:00:00 2001 From: Kenny Daniel Date: Mon, 29 Jun 2026 14:23:10 -0700 Subject: [PATCH 19/59] remote/oidc: preserve sibling credential records the writer can't normalize readCredentials drops any record it can't normalize; writeToken/writeSession/ removeToken read through it, so an unrelated login or remove silently deleted a hand-edited or newer-version sibling record from the 0600 store. Route the write path through a raw read that keeps unrecognized records intact. removeToken can now also drop a record that doesn't normalize, instead of reporting it absent. --- src/core/remote/credentials.js | 40 ++++++++++++++++------- test/core/remote-credentials-oidc.test.js | 28 ++++++++++++++++ 2 files changed, 57 insertions(+), 11 deletions(-) diff --git a/src/core/remote/credentials.js b/src/core/remote/credentials.js index 27d5f3e..d513332 100644 --- a/src/core/remote/credentials.js +++ b/src/core/remote/credentials.js @@ -84,6 +84,28 @@ export function remoteTokenEnvVar(target) { * @ref LLP 0033#credentials [constrained-by]: single 0600 per-target store, secrets never in config */ export async function readCredentials(stateDir) { + const parsed = await readRawCredentials(stateDir) + /** @type {Record} */ + const out = {} + for (const [target, entry] of Object.entries(parsed)) { + const record = normalizeRecord(entry) + if (record) out[target] = record + } + return out +} + +/** + * Read the credential file as its raw per-target object, **without** dropping + * records that don't normalize. The write path uses this so an unrelated login + * or remove rewrites the store without deleting a sibling record it could not + * interpret (a hand-edited entry, or one written by a newer version). Returns + * `{}` when the file is absent; throws on a corrupt file, same as + * {@link readCredentials}. + * + * @param {string} stateDir + * @returns {Promise>} + */ +async function readRawCredentials(stateDir) { let raw try { raw = await fs.readFile(remoteCredentialsPath(stateDir), 'utf8') @@ -101,13 +123,7 @@ export async function readCredentials(stateDir) { if (!parsed || typeof parsed !== 'object' || Array.isArray(parsed)) { throw new Error(`remote credentials file must be a JSON object: ${remoteCredentialsPath(stateDir)}`) } - /** @type {Record} */ - const out = {} - for (const [target, entry] of Object.entries(/** @type {Record} */ (parsed))) { - const record = normalizeRecord(entry) - if (record) out[target] = record - } - return out + return /** @type {Record} */ (parsed) } /** @@ -152,7 +168,7 @@ function normalizeRecord(entry) { */ export async function writeToken(stateDir, target, token) { await fs.mkdir(stateDir, { recursive: true }) - const current = await readCredentials(stateDir) + const current = await readRawCredentials(stateDir) current[target] = { kind: 'static', token } await writeCredentials(stateDir, current) } @@ -169,7 +185,7 @@ export async function writeToken(stateDir, target, token) { */ export async function writeSession(stateDir, target, session) { await fs.mkdir(stateDir, { recursive: true }) - const current = await readCredentials(stateDir) + const current = await readRawCredentials(stateDir) current[target] = { kind: 'oidc', refreshToken: session.refreshToken, @@ -189,7 +205,9 @@ export async function writeSession(stateDir, target, session) { * @returns {Promise} */ export async function removeToken(stateDir, target) { - const current = await readCredentials(stateDir) + // Operate on the raw file so a record we can't normalize is still removable, + // and so removing one target never drops an unrelated sibling. + const current = await readRawCredentials(stateDir) if (!Object.prototype.hasOwnProperty.call(current, target)) return false delete current[target] await writeCredentials(stateDir, current) @@ -323,7 +341,7 @@ function noTokenError(target, envName) { /** * @param {string} stateDir - * @param {Record} map + * @param {Record} map * @returns {Promise} */ async function writeCredentials(stateDir, map) { diff --git a/test/core/remote-credentials-oidc.test.js b/test/core/remote-credentials-oidc.test.js index 954ba46..69d16d0 100644 --- a/test/core/remote-credentials-oidc.test.js +++ b/test/core/remote-credentials-oidc.test.js @@ -85,6 +85,34 @@ test('writeToken now stamps kind: static', async () => { assert.deepEqual(creds.prod, { kind: 'static', token: 'sk-1' }) }) +test('writing one target preserves a sibling record that does not normalize', async () => { + const dir = await tmpState() + // A valid sibling we understand, plus a record from a hypothetical newer + // version that normalizeRecord would drop on read. + await fs.writeFile(remoteCredentialsPath(dir), JSON.stringify({ + future: { kind: 'webauthn', handle: 'abc' }, + })) + await writeToken(dir, 'prod', 'sk-1') + // The future record must still be on disk after an unrelated login. + const raw = JSON.parse(await fs.readFile(remoteCredentialsPath(dir), 'utf8')) + assert.deepEqual(raw.future, { kind: 'webauthn', handle: 'abc' }) + assert.deepEqual(raw.prod, { kind: 'static', token: 'sk-1' }) +}) + +test('removeToken drops a record that does not normalize and keeps the rest', async () => { + const dir = await tmpState() + await fs.writeFile(remoteCredentialsPath(dir), JSON.stringify({ + ghost: { kind: 'webauthn', handle: 'abc' }, + keep: { kind: 'static', token: 'sk-keep' }, + })) + // removeToken must find and remove the un-normalizable record... + assert.equal(await removeToken(dir, 'ghost'), true) + const raw = JSON.parse(await fs.readFile(remoteCredentialsPath(dir), 'utf8')) + assert.equal(raw.ghost, undefined) + // ...without disturbing the sibling. + assert.deepEqual(raw.keep, { kind: 'static', token: 'sk-keep' }) +}) + test('removeToken clears an oidc record too', async () => { const dir = await tmpState() await writeSession(dir, 'prod', { refreshToken: 'rt', accessJwt: 'jwt', expiresAt: FUTURE, org: 'acme' }) From b14ce3ec2883a41e6849d84446d59d9a0d628266 Mon Sep 17 00:00:00 2001 From: Kenny Daniel Date: Mon, 29 Jun 2026 14:23:54 -0700 Subject: [PATCH 20/59] mcp/proxy: make the startup credential probe a non-refreshing presence check The fail-fast probe called resolveAccessJwt, so a near-expiry OIDC JWT triggered a network refresh at startup that the first handleMessage then repeated (two refreshes + two writes before any message forwarded), and a transient refresh blip aborted a proxy that would otherwise have started. Use the non-refreshing resolveToken for the presence check; the per-message resolveAccessJwt still does the session-aware refresh. --- src/core/mcp/proxy.js | 13 +++++++----- test/core/remote-proxy-oidc.test.js | 31 +++++++++++++++++++++++++++++ 2 files changed, 39 insertions(+), 5 deletions(-) diff --git a/src/core/mcp/proxy.js b/src/core/mcp/proxy.js index dc101a9..dc33661 100644 --- a/src/core/mcp/proxy.js +++ b/src/core/mcp/proxy.js @@ -4,7 +4,7 @@ import process from 'node:process' import { Attr, getLogger } from '../observability/index.js' import { readObservabilityEnv } from '../observability/env.js' -import { deriveIdentityBase, resolveAccessJwt } from '../remote/credentials.js' +import { deriveIdentityBase, resolveAccessJwt, resolveToken } from '../remote/credentials.js' import { InvalidGrantError } from '../remote/identity_client.js' import { parseRpcResponse } from './client.js' import { INTERNAL_ERROR, jsonRpcError } from './jsonrpc.js' @@ -39,11 +39,14 @@ export async function runMcpProxy({ target, ctx }) { } const stateDir = readObservabilityEnv(ctx.env).stateDir const identityBase = deriveIdentityBase(entry.url) ?? undefined - // Fail fast (exit 2) if there is no usable credential at all. Resolve again - // per message below so an OIDC access JWT that expires mid-session refreshes, - // rather than the whole long-lived proxy holding one short-lived JWT. + // Fail fast (exit 2) if there is no usable credential at all. This is a + // presence check only - resolveToken never refreshes - so a near-expiry OIDC + // JWT does not trigger a network refresh here that the first handleMessage + // would immediately repeat, and a transient refresh blip can't abort a proxy + // that would otherwise start. The per-message resolveAccessJwt below does the + // session-aware refresh. try { - const probe = await resolveAccessJwt({ target, env: ctx.env, stateDir, identityBase }) + const probe = await resolveToken({ target, env: ctx.env, stateDir }) if (!probe.ok) { ctx.stderr.write(`hyp mcp: ${probe.error}\n`) return 2 diff --git a/test/core/remote-proxy-oidc.test.js b/test/core/remote-proxy-oidc.test.js index 25f2bc5..6a734c1 100644 --- a/test/core/remote-proxy-oidc.test.js +++ b/test/core/remote-proxy-oidc.test.js @@ -85,6 +85,37 @@ test('proxy refreshes an oidc session on a live 401 and retries (no longer dies assert.equal(/** @type {any} */ (creds.prod).accessJwt, validJwt) }) +test('a stale JWT at startup refreshes once (lazily on the first message), not twice', async (t) => { + const original = globalThis.fetch + t.after(() => { globalThis.fetch = original }) + + const hypHome = await tmpHome() + const stateDir = path.join(hypHome, 'hypaware') + // Already-expired cached JWT: a probe that refreshed would do it once, then + // the first message again. The probe is a presence check now, so exactly one + // refresh happens. + await writeSession(stateDir, 'prod', { refreshToken: 'rt', accessJwt: 'jwt-old', expiresAt: '2000-01-01T00:00:00Z', org: 'acme' }) + + let refreshCalls = 0 + const validJwt = 'jwt-new' + globalThis.fetch = /** @type {any} */ (async (/** @type {string} */ url, /** @type {any} */ init) => { + if (url === TOKEN_URL) { + refreshCalls += 1 + return reply({ access_jwt: validJwt, expires_at: FUTURE, org: 'acme' }) + } + const body = JSON.parse(init.body) + if (init.headers.authorization !== `Bearer ${validJwt}`) return reply({ jsonrpc: '2.0', id: body.id }, 401) + return reply({ jsonrpc: '2.0', id: body.id, result: { ok: true } }) + }) + + const { ctx, out } = makeCtx({ hypHome, lines: [{ jsonrpc: '2.0', id: 1, method: 'tools/call', params: { name: 'x' } }] }) + const code = await runMcpProxy({ target: 'prod', ctx }) + + assert.equal(code, 0) + assert.equal(refreshCalls, 1) + assert.match(out.join(''), /"ok":\s*true/) +}) + test('proxy surfaces re-login guidance when the refresh is rejected (invalid_grant)', async (t) => { const original = globalThis.fetch t.after(() => { globalThis.fetch = original }) From a4c33237418b091c718353b5d8e08d5d1a1e8285 Mon Sep 17 00:00:00 2001 From: Kenny Daniel Date: Mon, 29 Jun 2026 14:25:00 -0700 Subject: [PATCH 21/59] mcp/proxy: surface a failed forced refresh, not a bare HTTP 401 On a live 401, when the forced refresh returned ok:false (e.g. the record vanished, or no identity endpoint resolved), the proxy skipped the retry block and let the stale 401 fall through to a generic 'remote returned HTTP 401', discarding the refresh reason. The one-shot verb path already surfaces it. Return refreshed.error from the retry block so the actual reason rides back. --- src/core/mcp/proxy.js | 6 ++++++ test/core/remote-proxy-oidc.test.js | 30 ++++++++++++++++++++++++++++- 2 files changed, 35 insertions(+), 1 deletion(-) diff --git a/src/core/mcp/proxy.js b/src/core/mcp/proxy.js index dc33661..175a2c9 100644 --- a/src/core/mcp/proxy.js +++ b/src/core/mcp/proxy.js @@ -121,6 +121,12 @@ export async function runMcpProxy({ target, ctx }) { } catch (err) { return isNotify ? null : jsonRpcError(id, INTERNAL_ERROR, `proxy fetch failed: ${err instanceof Error ? err.message : String(err)}`) } + } else { + // The forced refresh could not produce a token (e.g. no identity + // endpoint). Surface that reason rather than letting the stale 401 + // fall through to a generic "remote returned HTTP 401" with no + // guidance, mirroring the one-shot verb path. + return isNotify ? null : jsonRpcError(id, INTERNAL_ERROR, refreshed.error) } } diff --git a/test/core/remote-proxy-oidc.test.js b/test/core/remote-proxy-oidc.test.js index 6a734c1..5e20283 100644 --- a/test/core/remote-proxy-oidc.test.js +++ b/test/core/remote-proxy-oidc.test.js @@ -8,7 +8,7 @@ import path from 'node:path' import { Readable } from 'node:stream' import { runMcpProxy } from '../../src/core/mcp/proxy.js' -import { writeSession, readCredentials } from '../../src/core/remote/credentials.js' +import { writeSession, readCredentials, remoteCredentialsPath } from '../../src/core/remote/credentials.js' const MCP_URL = 'https://hyp.internal/mcp' const TOKEN_URL = 'https://hyp.internal/v1/identity/token' @@ -116,6 +116,34 @@ test('a stale JWT at startup refreshes once (lazily on the first message), not t assert.match(out.join(''), /"ok":\s*true/) }) +test('proxy surfaces a failed forced refresh instead of a bare HTTP 401', async (t) => { + const original = globalThis.fetch + t.after(() => { globalThis.fetch = original }) + + const hypHome = await tmpHome() + const stateDir = path.join(hypHome, 'hypaware') + // Fresh JWT (probe + first resolve succeed), but the MCP side rejects it and, + // as a side effect, the stored record vanishes (e.g. a concurrent logout), so + // the forced refresh returns ok:false rather than throwing. + await writeSession(stateDir, 'prod', { refreshToken: 'rt', accessJwt: 'jwt-fresh', expiresAt: FUTURE, org: 'acme' }) + + globalThis.fetch = /** @type {any} */ (async (/** @type {string} */ url, /** @type {any} */ init) => { + if (url === MCP_URL) { + await fs.rm(remoteCredentialsPath(stateDir), { force: true }) + const body = JSON.parse(init.body) + return reply({ jsonrpc: '2.0', id: body.id }, 401) + } + return reply({}, 500) + }) + + const { ctx, out } = makeCtx({ hypHome, lines: [{ jsonrpc: '2.0', id: 1, method: 'tools/call', params: { name: 'x' } }] }) + const code = await runMcpProxy({ target: 'prod', ctx }) + assert.equal(code, 0) + // The refresh failure reason rides back, not a generic "remote returned HTTP 401". + assert.match(out.join(''), /no token for 'prod'/) + assert.doesNotMatch(out.join(''), /remote returned HTTP 401/) +}) + test('proxy surfaces re-login guidance when the refresh is rejected (invalid_grant)', async (t) => { const original = globalThis.fetch t.after(() => { globalThis.fetch = original }) From b70e02db3f778a9bb576c571d9001518a4b0615c Mon Sep 17 00:00:00 2001 From: Kenny Daniel Date: Mon, 29 Jun 2026 14:28:20 -0700 Subject: [PATCH 22/59] remote/oidc: give the refresh-retry policy primitives one home The re-login message and the 'refreshable' predicate were duplicated verbatim across proxy.js and remote_verb.js, free to drift. Extract sessionExpiredMessage (beside InvalidGrantError) and isRefreshable (beside the resolver that owns the kind+source distinction), and have both attach paths call them. The retry orchestration stays per-path since the proxy and the one-shot verb return different shapes. --- src/core/mcp/proxy.js | 8 ++++---- src/core/mcp/remote_verb.js | 8 ++++---- src/core/remote/credentials.js | 13 +++++++++++++ src/core/remote/identity_client.js | 12 ++++++++++++ test/core/remote-credentials-oidc.test.js | 7 +++++++ test/core/remote-identity-client.test.js | 6 +++++- 6 files changed, 45 insertions(+), 9 deletions(-) diff --git a/src/core/mcp/proxy.js b/src/core/mcp/proxy.js index 175a2c9..850dc02 100644 --- a/src/core/mcp/proxy.js +++ b/src/core/mcp/proxy.js @@ -4,8 +4,8 @@ import process from 'node:process' import { Attr, getLogger } from '../observability/index.js' import { readObservabilityEnv } from '../observability/env.js' -import { deriveIdentityBase, resolveAccessJwt, resolveToken } from '../remote/credentials.js' -import { InvalidGrantError } from '../remote/identity_client.js' +import { deriveIdentityBase, isRefreshable, resolveAccessJwt, resolveToken } from '../remote/credentials.js' +import { InvalidGrantError, sessionExpiredMessage } from '../remote/identity_client.js' import { parseRpcResponse } from './client.js' import { INTERNAL_ERROR, jsonRpcError } from './jsonrpc.js' import { serveStdio } from './stdio.js' @@ -108,7 +108,7 @@ export async function runMcpProxy({ target, ctx }) { // A live 401/403 on an OIDC session means the cached JWT was revoked or // expired early: force one refresh + retry before surfacing (LLP 0046 D5). // An env/static token cannot refresh, so it falls through to the error. - if ((res.status === 401 || res.status === 403) && resolved.kind === 'oidc' && resolved.source === 'file') { + if ((res.status === 401 || res.status === 403) && isRefreshable(resolved)) { let refreshed try { refreshed = await resolveAccessJwt({ target, env: ctx.env, stateDir, identityBase, forceRefresh: true }) @@ -170,7 +170,7 @@ export async function runMcpProxy({ target, ctx }) { */ function proxyAuthMessage(err, target) { if (err instanceof InvalidGrantError) { - return `remote session expired - re-run 'hyp remote login ${target}'` + return sessionExpiredMessage(target) } return err instanceof Error ? err.message : String(err) } diff --git a/src/core/mcp/remote_verb.js b/src/core/mcp/remote_verb.js index 477287e..046c464 100644 --- a/src/core/mcp/remote_verb.js +++ b/src/core/mcp/remote_verb.js @@ -1,8 +1,8 @@ // @ts-check import { readObservabilityEnv } from '../observability/env.js' -import { deriveIdentityBase, resolveAccessJwt } from '../remote/credentials.js' -import { InvalidGrantError } from '../remote/identity_client.js' +import { deriveIdentityBase, isRefreshable, resolveAccessJwt } from '../remote/credentials.js' +import { InvalidGrantError, sessionExpiredMessage } from '../remote/identity_client.js' import { createHttpMcpClient } from './client.js' /** @@ -58,7 +58,7 @@ export async function runRemoteVerb({ verb, params, target, ctx }) { // A live 401/403 on an OIDC session means the cached JWT is stale or was // revoked early. Refresh once and retry before surfacing (LLP 0046 D5). An // env override or a static token cannot be refreshed, so it surfaces as-is. - const refreshable = resolved.kind === 'oidc' && resolved.source === 'file' + const refreshable = isRefreshable(resolved) if (!refreshable || !isAuthError(err)) { return { ok: false, error: err instanceof Error ? err.message : String(err), exitCode: 1 } } @@ -111,7 +111,7 @@ async function callRemoteTool({ url, token, verb, params }) { */ function mapRefreshError(err, target) { if (err instanceof InvalidGrantError) { - return { ok: false, error: `remote session expired - re-run 'hyp remote login ${target}'`, exitCode: 2 } + return { ok: false, error: sessionExpiredMessage(target), exitCode: 2 } } return { ok: false, error: err instanceof Error ? err.message : String(err), exitCode: 1 } } diff --git a/src/core/remote/credentials.js b/src/core/remote/credentials.js index d513332..75c53b5 100644 --- a/src/core/remote/credentials.js +++ b/src/core/remote/credentials.js @@ -304,6 +304,19 @@ export async function resolveAccessJwt({ target, env, stateDir, identityBase, no return { ok: true, token: entry.accessJwt, source: 'file', kind: 'oidc' } } +/** + * Whether a resolved credential can be silently refreshed: an `oidc` record + * read from the file. A per-target env override and a `static` token cannot be + * refreshed. The attach paths call this instead of each re-deriving the + * kind+source rule, so "what is refreshable" has one owner (LLP 0046 D5). + * + * @param {{ source?: 'env' | 'file', kind?: 'static' | 'oidc' }} resolved a successful resolveAccessJwt result + * @returns {boolean} + */ +export function isRefreshable(resolved) { + return resolved.kind === 'oidc' && resolved.source === 'file' +} + /** * Whether an `oidc` record's cached JWT is still safely usable: present and * more than the skew window from its parseable expiry. diff --git a/src/core/remote/identity_client.js b/src/core/remote/identity_client.js index d3c8185..9ce3427 100644 --- a/src/core/remote/identity_client.js +++ b/src/core/remote/identity_client.js @@ -29,6 +29,18 @@ export class InvalidGrantError extends Error { } } +/** + * The user-facing re-login guidance for an expired or revoked OIDC session + * (LLP 0046 D5). Lives with {@link InvalidGrantError} so the wording has one + * home, shared by the stdio proxy and the one-shot verb attach path. + * + * @param {string} target + * @returns {string} + */ +export function sessionExpiredMessage(target) { + return `remote session expired - re-run 'hyp remote login ${target}'` +} + /** * Exchange an authorization code for a session (the `authorization_code` * grant). Presents the held PKCE verifier. diff --git a/test/core/remote-credentials-oidc.test.js b/test/core/remote-credentials-oidc.test.js index 69d16d0..3ad7600 100644 --- a/test/core/remote-credentials-oidc.test.js +++ b/test/core/remote-credentials-oidc.test.js @@ -7,6 +7,7 @@ import os from 'node:os' import path from 'node:path' import { + isRefreshable, readCredentials, remoteCredentialsPath, removeToken, @@ -23,6 +24,12 @@ async function tmpState() { const FUTURE = '2999-01-01T00:00:00Z' const PAST = '2000-01-01T00:00:00Z' +test('isRefreshable is true only for an oidc record read from the file', () => { + assert.equal(isRefreshable({ kind: 'oidc', source: 'file' }), true) + assert.equal(isRefreshable({ kind: 'oidc', source: 'env' }), false) // env override never refreshes + assert.equal(isRefreshable({ kind: 'static', source: 'file' }), false) +}) + test('an oidc session round-trips with kind: oidc', async () => { const dir = await tmpState() await writeSession(dir, 'prod', { refreshToken: 'rt', accessJwt: 'jwt', expiresAt: FUTURE, org: 'acme' }) diff --git a/test/core/remote-identity-client.test.js b/test/core/remote-identity-client.test.js index ed57357..5090f90 100644 --- a/test/core/remote-identity-client.test.js +++ b/test/core/remote-identity-client.test.js @@ -3,7 +3,7 @@ import test from 'node:test' import assert from 'node:assert/strict' -import { exchangeCode, refreshSession, InvalidGrantError } from '../../src/core/remote/identity_client.js' +import { exchangeCode, refreshSession, InvalidGrantError, sessionExpiredMessage } from '../../src/core/remote/identity_client.js' /** * A fetch stub that records the last request and returns `reply`. @@ -25,6 +25,10 @@ function stubFetch(reply) { return { fetchImpl, calls } } +test('sessionExpiredMessage names the target for re-login', () => { + assert.equal(sessionExpiredMessage('prod'), "remote session expired - re-run 'hyp remote login prod'") +}) + test('exchangeCode posts the authorization_code grant and maps the response', async () => { const { fetchImpl, calls } = stubFetch({ body: { session_id: 'sess-1', refresh_token: 'rt-1', access_jwt: 'jwt-1', expires_at: '2026-06-29T12:00:00Z', org: 'acme' }, From 42fedbf672607251650ab6638941438930bfef38 Mon Sep 17 00:00:00 2001 From: Kenny Daniel Date: Mon, 29 Jun 2026 14:30:31 -0700 Subject: [PATCH 23/59] remote/oidc: cache the parsed credential file between reads The stdio proxy resolves a token per forwarded message, so resolveAccessJwt re-read and re-parsed the 0600 file on every message even when the cached JWT was nowhere near expiry. Add a single-entry parse cache keyed by path + mtime + size, busted on every write through this module, so a fresh-JWT message skips the read and parse while our own writes and external edits are still picked up. --- src/core/remote/credentials.js | 38 ++++++++++++++++++++--- test/core/remote-credentials-oidc.test.js | 22 +++++++++++++ 2 files changed, 56 insertions(+), 4 deletions(-) diff --git a/src/core/remote/credentials.js b/src/core/remote/credentials.js index 75c53b5..6355a14 100644 --- a/src/core/remote/credentials.js +++ b/src/core/remote/credentials.js @@ -33,6 +33,17 @@ const CREDENTIALS_BASENAME = 'remote-credentials.json' /** Refresh an `oidc` access JWT once it is within this window of expiry. */ const REFRESH_SKEW_MS = 60 * 1000 +/** + * Single-entry parse cache for the credential file. The stdio proxy resolves a + * token per forwarded message, so without this every message re-read and + * re-parsed the 0600 file. Keyed by path + mtime + size and busted on every + * write through this module, so a fresh-JWT message skips disk and parse while + * any real change (our own write, or an external edit) is still picked up. + * + * @type {{ path: string, mtimeMs: number, size: number, value: Record } | null} + */ +let rawCache = null + /** * Derive the identity base `/v1/identity` from a target's MCP URL * (LLP 0046 D6): identity is mounted at the same origin, so no second URL is @@ -106,9 +117,23 @@ export async function readCredentials(stateDir) { * @returns {Promise>} */ async function readRawCredentials(stateDir) { + const p = remoteCredentialsPath(stateDir) + /** @type {import('node:fs').Stats} */ + let stat + try { + stat = await fs.stat(p) + } catch (err) { + if (err && /** @type {NodeJS.ErrnoException} */ (err).code === 'ENOENT') return {} + throw err + } + // Cache hit: same file, unchanged since the last parse. Return a clone so a + // mutating caller (the write path) can't corrupt the cached value. + if (rawCache && rawCache.path === p && rawCache.mtimeMs === stat.mtimeMs && rawCache.size === stat.size) { + return structuredClone(rawCache.value) + } let raw try { - raw = await fs.readFile(remoteCredentialsPath(stateDir), 'utf8') + raw = await fs.readFile(p, 'utf8') } catch (err) { if (err && /** @type {NodeJS.ErrnoException} */ (err).code === 'ENOENT') return {} throw err @@ -118,12 +143,14 @@ async function readRawCredentials(stateDir) { try { parsed = JSON.parse(raw) } catch { - throw new Error(`remote credentials file is not valid JSON: ${remoteCredentialsPath(stateDir)}`) + throw new Error(`remote credentials file is not valid JSON: ${p}`) } if (!parsed || typeof parsed !== 'object' || Array.isArray(parsed)) { - throw new Error(`remote credentials file must be a JSON object: ${remoteCredentialsPath(stateDir)}`) + throw new Error(`remote credentials file must be a JSON object: ${p}`) } - return /** @type {Record} */ (parsed) + const value = /** @type {Record} */ (parsed) + rawCache = { path: p, mtimeMs: stat.mtimeMs, size: stat.size, value } + return structuredClone(value) } /** @@ -366,4 +393,7 @@ async function writeCredentials(stateDir, map) { await fs.rename(tmpPath, finalPath) // Re-assert the mode in case the file pre-existed with looser perms. await fs.chmod(finalPath, 0o600).catch(() => {}) + // Invalidate the read cache: our own write must be visible to the next read + // regardless of mtime resolution. + rawCache = null } diff --git a/test/core/remote-credentials-oidc.test.js b/test/core/remote-credentials-oidc.test.js index 3ad7600..ceceda9 100644 --- a/test/core/remote-credentials-oidc.test.js +++ b/test/core/remote-credentials-oidc.test.js @@ -3,6 +3,7 @@ import test from 'node:test' import assert from 'node:assert/strict' import fs from 'node:fs/promises' +import fsp from 'node:fs/promises' import os from 'node:os' import path from 'node:path' @@ -24,6 +25,27 @@ async function tmpState() { const FUTURE = '2999-01-01T00:00:00Z' const PAST = '2000-01-01T00:00:00Z' +test('a second read with no change skips re-reading the file (parse cache)', async (t) => { + const dir = await tmpState() + await writeToken(dir, 'prod', 'sk-1') // creates the file and busts the cache + const spy = t.mock.method(fsp, 'readFile') + const isCredFile = (/** @type {any} */ c) => String(c.arguments[0]).endsWith('remote-credentials.json') + + await readCredentials(dir) // miss: reads + parses once + await readCredentials(dir) // hit: served from cache, no read + assert.equal(spy.mock.calls.filter(isCredFile).length, 1) +}) + +test('a write is visible to the very next read (cache is busted on write)', async (t) => { + const dir = await tmpState() + await writeToken(dir, 'prod', 'sk-1') + assert.deepEqual((await readCredentials(dir)).prod, { kind: 'static', token: 'sk-1' }) + // Overwrite through the module; the next read must see the new value, not the + // cached one. + await writeToken(dir, 'prod', 'sk-2') + assert.deepEqual((await readCredentials(dir)).prod, { kind: 'static', token: 'sk-2' }) +}) + test('isRefreshable is true only for an oidc record read from the file', () => { assert.equal(isRefreshable({ kind: 'oidc', source: 'file' }), true) assert.equal(isRefreshable({ kind: 'oidc', source: 'env' }), false) // env override never refreshes From ede2969d82a7ffe8c4c834ee74f05fdfa7feea77 Mon Sep 17 00:00:00 2001 From: Kenny Daniel Date: Mon, 29 Jun 2026 15:16:37 -0700 Subject: [PATCH 24/59] remote/oidc: map invalid_grant by error code, not HTTP 401 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit RFC 6749 §5.2 returns HTTP 400 for invalid_grant, but the token client only raised InvalidGrantError on a 401. Against a spec-compliant server a revoked or expired refresh row would surface as a generic error instead of the re-login guidance the attach paths are built to show. Key the typed rejection on the body's error code and accept either status. --- src/core/remote/identity_client.js | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/src/core/remote/identity_client.js b/src/core/remote/identity_client.js index 9ce3427..c2073dc 100644 --- a/src/core/remote/identity_client.js +++ b/src/core/remote/identity_client.js @@ -87,9 +87,10 @@ export async function refreshSession({ identityBase, refreshToken, fetchImpl }) /** * POST a JSON body to `/token` and parse the JSON response. A - * 401 whose body carries `error: "invalid_grant"` becomes an - * {@link InvalidGrantError}; any other non-2xx becomes a generic error. The - * refresh token, access JWT, code, and verifier are never logged. + * non-2xx whose body carries `error: "invalid_grant"` becomes an + * {@link InvalidGrantError} (regardless of the exact status: RFC 6749 uses 400, + * some servers 401); any other non-2xx becomes a generic error. The refresh + * token, access JWT, code, and verifier are never logged. * * @param {{ identityBase: string, body: Record, fetchImpl?: typeof fetch, operation: string }} args * @returns {Promise>} @@ -122,7 +123,11 @@ async function postToken({ identityBase, body, fetchImpl, operation }) { [Attr.STATUS]: 'failed', [Attr.ERROR_KIND]: errCode ?? `http_${res.status}`, }) - if (res.status === 401 && errCode === 'invalid_grant') { + // Key the typed rejection on the OAuth error code, not the HTTP status: + // RFC 6749 §5.2 returns 400 for `invalid_grant`, but some deployments use + // 401. Either way a revoked/expired refresh row must reach the re-login + // guidance, so trust the body's `error` field over the status code. + if (errCode === 'invalid_grant') { throw new InvalidGrantError() } throw new Error(`identity endpoint rejected the grant (HTTP ${res.status}${errCode ? ` ${errCode}` : ''})`) From cad0c816126e48b4da07e6178d2336860dc9b22d Mon Sep 17 00:00:00 2001 From: Kenny Daniel Date: Mon, 29 Jun 2026 15:16:46 -0700 Subject: [PATCH 25/59] remote: lift the defensive safeText reader into one shared helper The MCP client and the identity client both wrapped res.text() to return '' on an unreadable body, byte-for-byte the same. Extract it to src/core/http_text.js so the two cannot drift. --- src/core/http_text.js | 18 ++++++++++++++++++ src/core/mcp/client.js | 10 ++-------- src/core/remote/identity_client.js | 9 +-------- 3 files changed, 21 insertions(+), 16 deletions(-) create mode 100644 src/core/http_text.js diff --git a/src/core/http_text.js b/src/core/http_text.js new file mode 100644 index 0000000..51226e4 --- /dev/null +++ b/src/core/http_text.js @@ -0,0 +1,18 @@ +// @ts-check + +/** + * Read a fetch `Response` body as text, returning `''` instead of throwing when + * the body is absent or unreadable. Both the MCP client and the identity client + * decode an error body defensively (the response may be empty or already + * consumed), so the helper has one home rather than a copy in each. + * + * @param {{ text(): Promise }} res + * @returns {Promise} + */ +export async function safeText(res) { + try { + return await res.text() + } catch { + return '' + } +} diff --git a/src/core/mcp/client.js b/src/core/mcp/client.js index 0a59876..b9c3b8e 100644 --- a/src/core/mcp/client.js +++ b/src/core/mcp/client.js @@ -1,5 +1,7 @@ // @ts-check +import { safeText } from '../http_text.js' + /** * MCP **client** over Streamable HTTP: the consumer half of remote attach * (LLP 0033). `hyp` is an MCP client, so `hyp --remote ` @@ -184,11 +186,3 @@ function pickById(messages, id) { return messages.find((m) => m && typeof m === 'object' && m.id === id) } -/** @param {any} res */ -async function safeText(res) { - try { - return await res.text() - } catch { - return '' - } -} diff --git a/src/core/remote/identity_client.js b/src/core/remote/identity_client.js index c2073dc..a732a4b 100644 --- a/src/core/remote/identity_client.js +++ b/src/core/remote/identity_client.js @@ -1,5 +1,6 @@ // @ts-check +import { safeText } from '../http_text.js' import { Attr, getLogger } from '../observability/index.js' /** @@ -159,11 +160,3 @@ export function trimSlash(base) { return base.replace(/\/+$/, '') } -/** @param {any} res @returns {Promise} */ -async function safeText(res) { - try { - return await res.text() - } catch { - return '' - } -} From eb61be8a2403bf902c41d8b964b8f8b6fa0e8b41 Mon Sep 17 00:00:00 2001 From: Kenny Daniel Date: Mon, 29 Jun 2026 15:16:58 -0700 Subject: [PATCH 26/59] remote/oidc: declare the fs.Stats type via @import, not inline CLAUDE.md forbids inline import('...') types: declare them at the top of the file with @import and reference the bare name. --- src/core/remote/credentials.js | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/core/remote/credentials.js b/src/core/remote/credentials.js index 6355a14..c115c67 100644 --- a/src/core/remote/credentials.js +++ b/src/core/remote/credentials.js @@ -7,6 +7,7 @@ import process from 'node:process' import { refreshSession } from './identity_client.js' /** + * @import { Stats } from 'node:fs' * @import { RemoteCredentialRecord, RemoteOidcRecord } from '../../../src/core/remote/types.js' */ @@ -118,7 +119,7 @@ export async function readCredentials(stateDir) { */ async function readRawCredentials(stateDir) { const p = remoteCredentialsPath(stateDir) - /** @type {import('node:fs').Stats} */ + /** @type {Stats} */ let stat try { stat = await fs.stat(p) From 1511615048a3b88d6b87b9f9e31082d681cfea8a Mon Sep 17 00:00:00 2001 From: Kenny Daniel Date: Mon, 29 Jun 2026 15:17:08 -0700 Subject: [PATCH 27/59] mcp: give the 401 refresh-and-retry policy one home The proxy and the one-shot verb path each re-implemented the same attach policy: run the call, and on a live 401/403 against a refreshable OIDC session force one refresh and retry once. They diverged on how an auth failure is detected (an HTTP 401 response vs a thrown 401) and on how the outcome is formatted. Extract attachWithRefresh in credentials.js, next to isRefreshable and resolveAccessJwt, so the retry decision lives in one tested place; each caller now supplies only its op and a refresh thunk. --- src/core/mcp/proxy.js | 60 ++++++++++++++++++---------------- src/core/mcp/remote_verb.js | 49 +++++++++++++-------------- src/core/remote/credentials.js | 34 +++++++++++++++++++ 3 files changed, 88 insertions(+), 55 deletions(-) diff --git a/src/core/mcp/proxy.js b/src/core/mcp/proxy.js index 850dc02..a699ae5 100644 --- a/src/core/mcp/proxy.js +++ b/src/core/mcp/proxy.js @@ -4,7 +4,7 @@ import process from 'node:process' import { Attr, getLogger } from '../observability/index.js' import { readObservabilityEnv } from '../observability/env.js' -import { deriveIdentityBase, isRefreshable, resolveAccessJwt, resolveToken } from '../remote/credentials.js' +import { attachWithRefresh, deriveIdentityBase, resolveAccessJwt, resolveToken } from '../remote/credentials.js' import { InvalidGrantError, sessionExpiredMessage } from '../remote/identity_client.js' import { parseRpcResponse } from './client.js' import { INTERNAL_ERROR, jsonRpcError } from './jsonrpc.js' @@ -98,37 +98,41 @@ export async function runMcpProxy({ target, ctx }) { return isNotify ? null : jsonRpcError(id, INTERNAL_ERROR, resolved.error) } - let res - try { - res = await forward(message, resolved.token) - } catch (err) { - return isNotify ? null : jsonRpcError(id, INTERNAL_ERROR, `proxy fetch failed: ${err instanceof Error ? err.message : String(err)}`) - } - - // A live 401/403 on an OIDC session means the cached JWT was revoked or - // expired early: force one refresh + retry before surfacing (LLP 0046 D5). - // An env/static token cannot refresh, so it falls through to the error. - if ((res.status === 401 || res.status === 403) && isRefreshable(resolved)) { - let refreshed + // Forward via the shared attach policy: a live 401/403 on a refreshable + // OIDC session forces one refresh + retry before surfacing (LLP 0046 D5); + // an env/static token cannot refresh, so its 401 falls through. `op` folds + // both the fetch outcome and a fetch failure into `value`, so the retry + // decision stays in attachWithRefresh and never throws here. + /** @type {(token: string) => Promise<{ authFailed: boolean, value: { res: Awaited> } | { err: unknown } }>} */ + const op = async (token) => { try { - refreshed = await resolveAccessJwt({ target, env: ctx.env, stateDir, identityBase, forceRefresh: true }) + const res = await forward(message, token) + return { authFailed: res.status === 401 || res.status === 403, value: { res } } } catch (err) { - return isNotify ? null : jsonRpcError(id, INTERNAL_ERROR, proxyAuthMessage(err, target)) - } - if (refreshed.ok) { - try { - res = await forward(message, refreshed.token) - } catch (err) { - return isNotify ? null : jsonRpcError(id, INTERNAL_ERROR, `proxy fetch failed: ${err instanceof Error ? err.message : String(err)}`) - } - } else { - // The forced refresh could not produce a token (e.g. no identity - // endpoint). Surface that reason rather than letting the stale 401 - // fall through to a generic "remote returned HTTP 401" with no - // guidance, mirroring the one-shot verb path. - return isNotify ? null : jsonRpcError(id, INTERNAL_ERROR, refreshed.error) + return { authFailed: false, value: { err } } } } + let out + try { + out = await attachWithRefresh({ + resolved, + refresh: () => resolveAccessJwt({ target, env: ctx.env, stateDir, identityBase, forceRefresh: true }), + op, + }) + } catch (err) { + return isNotify ? null : jsonRpcError(id, INTERNAL_ERROR, proxyAuthMessage(err, target)) + } + if (!out.ok) { + // The forced refresh could not produce a token (e.g. no identity + // endpoint). Surface that reason rather than letting the stale 401 + // fall through to a generic "remote returned HTTP 401" with no guidance. + return isNotify ? null : jsonRpcError(id, INTERNAL_ERROR, out.error) + } + if ('err' in out.value) { + const err = out.value.err + return isNotify ? null : jsonRpcError(id, INTERNAL_ERROR, `proxy fetch failed: ${err instanceof Error ? err.message : String(err)}`) + } + const res = out.value.res const sid = res.headers?.get?.('mcp-session-id') if (sid) sessionId = sid diff --git a/src/core/mcp/remote_verb.js b/src/core/mcp/remote_verb.js index 046c464..7d4c52e 100644 --- a/src/core/mcp/remote_verb.js +++ b/src/core/mcp/remote_verb.js @@ -1,7 +1,7 @@ // @ts-check import { readObservabilityEnv } from '../observability/env.js' -import { deriveIdentityBase, isRefreshable, resolveAccessJwt } from '../remote/credentials.js' +import { attachWithRefresh, deriveIdentityBase, resolveAccessJwt } from '../remote/credentials.js' import { InvalidGrantError, sessionExpiredMessage } from '../remote/identity_client.js' import { createHttpMcpClient } from './client.js' @@ -49,35 +49,30 @@ export async function runRemoteVerb({ verb, params, target, ctx }) { return { ok: false, error: resolved.error, exitCode: 2 } } - /** Run one full attach attempt with a given bearer token. */ - const attempt = (/** @type {string} */ token) => callRemoteTool({ url: entry.url, token, verb, params }) - - try { - return await attempt(resolved.token) - } catch (err) { - // A live 401/403 on an OIDC session means the cached JWT is stale or was - // revoked early. Refresh once and retry before surfacing (LLP 0046 D5). An - // env override or a static token cannot be refreshed, so it surfaces as-is. - const refreshable = isRefreshable(resolved) - if (!refreshable || !isAuthError(err)) { - return { ok: false, error: err instanceof Error ? err.message : String(err), exitCode: 1 } - } - /** @type {Awaited>} */ - let refreshed - try { - refreshed = await resolveAccessJwt({ target, env: ctx.env, stateDir, identityBase, forceRefresh: true }) - } catch (refreshErr) { - return mapRefreshError(refreshErr, target) - } - if (!refreshed.ok) { - return { ok: false, error: refreshed.error, exitCode: 2 } - } + // One full attach attempt, folded into the shape attachWithRefresh wants: a + // thrown 401/403 (stale or revoked OIDC JWT) is reported as `authFailed` so + // the shared policy can force one refresh and retry; any other throw is a + // terminal non-auth error folded into the returned verb-result. + const op = async (/** @type {string} */ token) => { try { - return await attempt(refreshed.token) - } catch (retryErr) { - return { ok: false, error: retryErr instanceof Error ? retryErr.message : String(retryErr), exitCode: 1 } + return { authFailed: false, value: await callRemoteTool({ url: entry.url, token, verb, params }) } + } catch (err) { + /** @type {{ ok: false, error: string, exitCode: number }} */ + const value = { ok: false, error: err instanceof Error ? err.message : String(err), exitCode: 1 } + return { authFailed: isAuthError(err), value } } } + + try { + const out = await attachWithRefresh({ + resolved, + refresh: () => resolveAccessJwt({ target, env: ctx.env, stateDir, identityBase, forceRefresh: true }), + op, + }) + return out.ok ? out.value : { ok: false, error: out.error, exitCode: 2 } + } catch (refreshErr) { + return mapRefreshError(refreshErr, target) + } } /** diff --git a/src/core/remote/credentials.js b/src/core/remote/credentials.js index c115c67..f0881a1 100644 --- a/src/core/remote/credentials.js +++ b/src/core/remote/credentials.js @@ -345,6 +345,40 @@ export function isRefreshable(resolved) { return resolved.kind === 'oidc' && resolved.source === 'file' } +/** + * The one-shot refresh + retry policy both attach paths share (LLP 0046 D5): + * run `op(token)` with the already-resolved token; if `op` reports an auth + * failure on a refreshable (oidc/file) credential, force one refresh and run + * `op` once more. Env overrides and static tokens cannot refresh, so their auth + * failure is returned as-is. The two callers differ only in how `op` detects an + * auth failure (a thrown 401 for the verb path, an HTTP 401 response for the + * stdio proxy) and how they format the outcome; the retry decision lives here. + * + * `op` returns `{ authFailed, value }` and never throws for an auth failure (it + * folds it into `value`), so the final attempt's `value` is returned verbatim + * even when it too failed auth. A `refresh()` that throws (e.g. a typed + * `invalid_grant`) propagates to the caller, which maps it to re-login guidance. + * + * @template T + * @param {{ + * resolved: { ok: true, token: string, source?: 'env' | 'file', kind?: 'static' | 'oidc' }, + * refresh: () => Promise<{ ok: true, token: string, source?: 'env' | 'file', kind?: 'static' | 'oidc' } | { ok: false, error: string }>, + * op: (token: string) => Promise<{ authFailed: boolean, value: T }>, + * }} args + * @returns {Promise<{ ok: true, value: T } | { ok: false, error: string }>} + * @ref LLP 0046#d5 [implements]: the 401 -> force refresh -> retry-once policy, one home for both attach paths + */ +export async function attachWithRefresh({ resolved, refresh, op }) { + const first = await op(resolved.token) + if (!first.authFailed || !isRefreshable(resolved)) { + return { ok: true, value: first.value } + } + const refreshed = await refresh() + if (!refreshed.ok) return { ok: false, error: refreshed.error } + const second = await op(refreshed.token) + return { ok: true, value: second.value } +} + /** * Whether an `oidc` record's cached JWT is still safely usable: present and * more than the skew window from its parseable expiry. From 89350da3998006413151ea57e02df59897eaa4ab Mon Sep 17 00:00:00 2001 From: Kenny Daniel Date: Mon, 29 Jun 2026 15:17:15 -0700 Subject: [PATCH 28/59] remote/oidc: sanitize the OAuth error code from the loopback callback Anyone can hit the ephemeral loopback port, so the redirect's error param is attacker-influenceable. It flowed unsanitized into the Error message, the log ERROR_KIND, and the terminal, where a crafted value could inject newlines into logs or terminal output. Bound it to the RFC 6749 printable error-code range and a max length before use. --- src/core/remote/loopback.js | 21 ++++++++++++++++++++- 1 file changed, 20 insertions(+), 1 deletion(-) diff --git a/src/core/remote/loopback.js b/src/core/remote/loopback.js index df1335c..f4cb872 100644 --- a/src/core/remote/loopback.js +++ b/src/core/remote/loopback.js @@ -102,8 +102,13 @@ export function startLoopbackReceiver({ state, timeoutMs = DEFAULT_TIMEOUT_MS }) return } if (error) { + // The redirect's `error` is attacker-influenceable (anyone can hit the + // loopback port). Bound it to a safe token before it reaches the error + // message, the log ERROR_KIND, and the terminal, so a crafted value can't + // inject newlines into logs or terminal output. + const safeError = sanitizeErrorCode(error) respond(res, 'Login failed. You can close this tab and return to the terminal.') - fail(Object.assign(new Error(`login failed: ${error}`), { callbackError: error }), error) + fail(Object.assign(new Error(`login failed: ${safeError}`), { callbackError: safeError }), safeError) return } if (!code) { @@ -179,6 +184,20 @@ export function startLoopbackReceiver({ state, timeoutMs = DEFAULT_TIMEOUT_MS }) }) } +/** + * Reduce an OAuth `error` redirect param to a bounded, log-safe token. RFC 6749 + * error codes are `%x20-21 / %x23-5B / %x5D-7E`; we keep that printable range, + * drop control chars (newlines especially), and cap the length so a hostile + * callback can't inject lines into logs or the terminal. + * + * @param {string} error + * @returns {string} + */ +function sanitizeErrorCode(error) { + const cleaned = error.replace(/[^\x20-\x7E]/g, '').replace(/["\\]/g, '').trim().slice(0, 80) + return cleaned || 'unknown_error' +} + /** @param {ServerResponse} res @param {string} message */ function respond(res, message) { const body = `HypAware login

${message}

` From 11b13a63db231f7a6ade96b170f6a9130b2c48b5 Mon Sep 17 00:00:00 2001 From: Kenny Daniel Date: Mon, 29 Jun 2026 15:42:00 -0700 Subject: [PATCH 29/59] remote/oidc: tolerate a concurrent refresh-token rotation The 0600 credential store is shared by concurrent hyp processes (a second MCP client, or a verb call beside a running proxy). With one-time-use refresh tokens, the loser of a refresh race got invalid_grant only because the winner had already rotated the row, and was told to re-run 'hyp remote login' despite holding a valid session. resolveAccessJwt now re-reads the store once on invalid_grant: if the stored refresh token changed, it adopts the winner's freshly-minted session (its JWT if still fresh, else a refresh with the rotated token) instead of surfacing re-login guidance. An unchanged token is a real revocation and still surfaces the guidance. Documented in LLP 0046 D5. --- llp/0046-oidc-login-client.decision.md | 8 +++++ src/core/remote/credentials.js | 25 +++++++++++++-- test/core/remote-credentials-oidc.test.js | 39 +++++++++++++++++++++++ 3 files changed, 69 insertions(+), 3 deletions(-) diff --git a/llp/0046-oidc-login-client.decision.md b/llp/0046-oidc-login-client.decision.md index f500a53..9d247a8 100644 --- a/llp/0046-oidc-login-client.decision.md +++ b/llp/0046-oidc-login-client.decision.md @@ -104,6 +104,14 @@ surfaces the existing "re-run `hyp remote login`" guidance, now meaning re-run t browser flow. Rejected refreshing eagerly on every call (needless token-endpoint traffic). The per-target env override (`HYP_REMOTE_TOKEN_`) still wins for CI. +Because the `0600` store is shared by concurrent `hyp` processes (a second MCP client, +or a verb call beside a running proxy), an `invalid_grant` is not unconditionally +terminal: with one-time-use refresh tokens, the loser of a refresh race sees +`invalid_grant` only because the winner already rotated the row. The resolver re-reads +the store once on `invalid_grant`; if the stored refresh token changed, it adopts the +winner's freshly-minted session instead of forcing a re-login. An unchanged token is a +genuine revocation and still surfaces the re-login guidance. + ### D6: Identity endpoints derive from the configured remote URL origin diff --git a/src/core/remote/credentials.js b/src/core/remote/credentials.js index f0881a1..8de8073 100644 --- a/src/core/remote/credentials.js +++ b/src/core/remote/credentials.js @@ -4,7 +4,7 @@ import fs from 'node:fs/promises' import path from 'node:path' import process from 'node:process' -import { refreshSession } from './identity_client.js' +import { InvalidGrantError, refreshSession } from './identity_client.js' /** * @import { Stats } from 'node:fs' @@ -296,7 +296,7 @@ export async function resolveAccessJwt({ target, env, stateDir, identityBase, no } const creds = await readCredentials(stateDir) - const entry = creds[target] + let entry = creds[target] if (!entry) { return { ok: false, error: noTokenError(target, envName) } } @@ -312,7 +312,26 @@ export async function resolveAccessJwt({ target, env, stateDir, identityBase, no if (!identityBase) { return { ok: false, error: `cannot refresh '${target}': no identity endpoint resolved` } } - const refreshed = await refreshSession({ identityBase, refreshToken: entry.refreshToken, fetchImpl }) + let refreshed + try { + refreshed = await refreshSession({ identityBase, refreshToken: entry.refreshToken, fetchImpl }) + } catch (err) { + // Concurrent-rotation tolerance: another `hyp` process sharing this 0600 + // store (a second MCP client, or a verb call alongside a proxy) may have + // refreshed first, consuming our refresh row so this refresh comes back + // `invalid_grant`. Re-read once: if the stored refresh token changed, that + // process already minted a fresh session - adopt it (its JWT if still + // fresh, else refresh with the rotated token) rather than forcing a + // needless re-login. An unchanged token is a real revocation; re-throw. + if (!(err instanceof InvalidGrantError)) throw err + const latest = (await readCredentials(stateDir))[target] + if (!latest || latest.kind !== 'oidc' || latest.refreshToken === entry.refreshToken) throw err + if (isFresh(latest, now)) { + return { ok: true, token: latest.accessJwt, source: 'file', kind: 'oidc' } + } + entry = latest + refreshed = await refreshSession({ identityBase, refreshToken: latest.refreshToken, fetchImpl }) + } /** @type {RemoteOidcRecord} */ // `org` is fixed for the refresh token's life; keep the stored one when the // refresh response omits it (refreshSession returns '' in that case). A diff --git a/test/core/remote-credentials-oidc.test.js b/test/core/remote-credentials-oidc.test.js index ceceda9..0c3dfca 100644 --- a/test/core/remote-credentials-oidc.test.js +++ b/test/core/remote-credentials-oidc.test.js @@ -236,6 +236,45 @@ test('resolveAccessJwt errors with login guidance when no record exists', async assert.match(/** @type {any} */ (r).error, /no token for 'prod' - run 'hyp remote login prod'/) }) +test('resolveAccessJwt adopts a concurrent process\'s fresh session on invalid_grant', async () => { + const dir = await tmpState() + await writeSession(dir, 'prod', { refreshToken: 'rt-old', accessJwt: 'jwt-old', expiresAt: PAST, org: 'acme' }) + let calls = 0 + // A second `hyp` process refreshed first, rotating the row out from under us, + // and stored its fresh session. Our refresh comes back invalid_grant; we must + // adopt the winner's JWT rather than force a re-login. + const fetchImpl = /** @type {any} */ (async () => { + calls++ + await writeSession(dir, 'prod', { refreshToken: 'rt-winner', accessJwt: 'jwt-winner', expiresAt: FUTURE, org: 'acme' }) + return { ok: false, status: 401, text: async () => JSON.stringify({ error: 'invalid_grant' }) } + }) + const r = await resolveAccessJwt({ target: 'prod', env: {}, stateDir: dir, identityBase: 'https://h/v1/identity', fetchImpl }) + assert.equal(/** @type {any} */ (r).token, 'jwt-winner') + assert.equal(calls, 1) // the fresh adopted JWT means no second refresh +}) + +test('resolveAccessJwt refreshes with a rotated token when the adopted session is also stale', async () => { + const dir = await tmpState() + await writeSession(dir, 'prod', { refreshToken: 'rt-old', accessJwt: 'jwt-old', expiresAt: PAST, org: 'acme' }) + let calls = 0 + const fetchImpl = /** @type {any} */ (async (/** @type {string} */ _url, /** @type {any} */ opts) => { + calls++ + const sent = JSON.parse(opts.body).refresh_token + if (sent === 'rt-old') { + // Winner rotated to rt-new but its JWT is also already stale. + await writeSession(dir, 'prod', { refreshToken: 'rt-new', accessJwt: 'jwt-mid', expiresAt: PAST, org: 'acme' }) + return { ok: false, status: 401, text: async () => JSON.stringify({ error: 'invalid_grant' }) } + } + // Retry with the rotated token succeeds. + return { ok: true, status: 200, text: async () => JSON.stringify({ access_jwt: 'jwt-final', expires_at: FUTURE, org: 'acme', refresh_token: 'rt-final' }) } + }) + const r = await resolveAccessJwt({ target: 'prod', env: {}, stateDir: dir, identityBase: 'https://h/v1/identity', fetchImpl }) + assert.equal(/** @type {any} */ (r).token, 'jwt-final') + assert.equal(calls, 2) + const creds = await readCredentials(dir) + assert.equal(/** @type {any} */ (creds.prod).refreshToken, 'rt-final') +}) + test('resolveAccessJwt propagates a refresh failure (invalid_grant)', async () => { const dir = await tmpState() await writeSession(dir, 'prod', { refreshToken: 'stale', accessJwt: 'jwt-old', expiresAt: PAST, org: 'acme' }) From 4e1c4a964aa00592f1461742cc83960e90da6159 Mon Sep 17 00:00:00 2001 From: Kenny Daniel Date: Mon, 29 Jun 2026 15:42:07 -0700 Subject: [PATCH 30/59] remote/login: drop the redundant !tokenFile guard The no-browser-with-piped-stdin branch is only reached when useStatic is false, and useStatic already forces the static path whenever --token-file is present, so tokenFile is guaranteed absent here. Drop the dead condition. --- src/core/cli/remote_commands.js | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/src/core/cli/remote_commands.js b/src/core/cli/remote_commands.js index 56d4abf..3efa3ca 100644 --- a/src/core/cli/remote_commands.js +++ b/src/core/cli/remote_commands.js @@ -138,7 +138,9 @@ export async function runRemoteLogin(argv, ctx, deps = {}) { // Peek stdin to tell a real token apart from an empty pipe: a non-empty value // is stored statically (never silently discarded), while an empty pipe falls // through to the browser flow (which is what `--no-browser` is for). - if (stdinPiped && noBrowser && !forceBrowser && !tokenFile) { + // `!useStatic` already guarantees no `--token-file` here (it forces the static + // path), so this only needs the browser-mode + piped-stdin conditions. + if (stdinPiped && noBrowser && !forceBrowser) { /** @type {string} */ let piped try { From 8d4d572ecc2cff54891038cda1a87ad6a526bd22 Mon Sep 17 00:00:00 2001 From: Kenny Daniel Date: Mon, 29 Jun 2026 15:42:07 -0700 Subject: [PATCH 31/59] mcp: share isAuthStatus instead of inlining the 401/403 check Export the 401/403 predicate from client.js and have the stdio proxy's per-message forward call it, so "which statuses mean auth" has one home rather than a copy in the proxy. --- src/core/mcp/client.js | 6 +++++- src/core/mcp/proxy.js | 4 ++-- 2 files changed, 7 insertions(+), 3 deletions(-) diff --git a/src/core/mcp/client.js b/src/core/mcp/client.js index b9c3b8e..85aae8a 100644 --- a/src/core/mcp/client.js +++ b/src/core/mcp/client.js @@ -107,10 +107,14 @@ export function createHttpMcpClient(opts) { } /** + * Whether an HTTP status is an auth rejection (401/403) eligible for a one-shot + * refresh + retry. The single home for "which statuses mean auth", shared with + * the stdio proxy's per-message forward. + * * @param {number} status * @returns {boolean} */ -function isAuthStatus(status) { +export function isAuthStatus(status) { return status === 401 || status === 403 } diff --git a/src/core/mcp/proxy.js b/src/core/mcp/proxy.js index a699ae5..2fa5253 100644 --- a/src/core/mcp/proxy.js +++ b/src/core/mcp/proxy.js @@ -6,7 +6,7 @@ import { Attr, getLogger } from '../observability/index.js' import { readObservabilityEnv } from '../observability/env.js' import { attachWithRefresh, deriveIdentityBase, resolveAccessJwt, resolveToken } from '../remote/credentials.js' import { InvalidGrantError, sessionExpiredMessage } from '../remote/identity_client.js' -import { parseRpcResponse } from './client.js' +import { isAuthStatus, parseRpcResponse } from './client.js' import { INTERNAL_ERROR, jsonRpcError } from './jsonrpc.js' import { serveStdio } from './stdio.js' @@ -107,7 +107,7 @@ export async function runMcpProxy({ target, ctx }) { const op = async (token) => { try { const res = await forward(message, token) - return { authFailed: res.status === 401 || res.status === 403, value: { res } } + return { authFailed: isAuthStatus(res.status), value: { res } } } catch (err) { return { authFailed: false, value: { err } } } From 2e5737be5be1b2b33b77514d6103495b945a153f Mon Sep 17 00:00:00 2001 From: Kenny Daniel Date: Mon, 29 Jun 2026 16:18:15 -0700 Subject: [PATCH 32/59] remote/oidc: address review findings on the login + attach paths - proxy: a 401/403 surviving the refresh+retry now returns the re-login guidance the verb path gives, not a bare "remote returned HTTP 401". - identity: reject a non-date expires_at at parse time so a bad server value can't turn every forwarded message into a fresh refresh that re-stores the same value and never self-corrects. - credentials: drop the parse cache before the invalid_grant re-read so a same-size concurrent token rotation can't hide behind the cache; and stop deep-cloning the whole store on every per-message read (only the in-place write path clones now). - share the 401/403 status check (isAuthStatus) and the invalid_grant -> session-expired mapping (describeRefreshError) instead of copy-pasting them across the proxy and verb attach paths. - loopback: treat a present-but-empty error= callback as an error. --- src/core/mcp/proxy.js | 31 ++++++------------ src/core/mcp/remote_verb.js | 12 +++---- src/core/remote/credentials.js | 25 ++++++++++----- src/core/remote/identity_client.js | 40 ++++++++++++++++++++++-- src/core/remote/loopback.js | 4 +-- test/core/remote-identity-client.test.js | 24 +++++++++++++- test/core/remote-oidc-login.test.js | 2 +- test/core/remote-proxy-oidc.test.js | 24 ++++++++++++++ 8 files changed, 120 insertions(+), 42 deletions(-) diff --git a/src/core/mcp/proxy.js b/src/core/mcp/proxy.js index 2fa5253..7a0b0ec 100644 --- a/src/core/mcp/proxy.js +++ b/src/core/mcp/proxy.js @@ -5,7 +5,7 @@ import process from 'node:process' import { Attr, getLogger } from '../observability/index.js' import { readObservabilityEnv } from '../observability/env.js' import { attachWithRefresh, deriveIdentityBase, resolveAccessJwt, resolveToken } from '../remote/credentials.js' -import { InvalidGrantError, sessionExpiredMessage } from '../remote/identity_client.js' +import { describeRefreshError, sessionExpiredMessage } from '../remote/identity_client.js' import { isAuthStatus, parseRpcResponse } from './client.js' import { INTERNAL_ERROR, jsonRpcError } from './jsonrpc.js' import { serveStdio } from './stdio.js' @@ -52,7 +52,7 @@ export async function runMcpProxy({ target, ctx }) { return 2 } } catch (err) { - ctx.stderr.write(`hyp mcp: ${proxyAuthMessage(err, target)}\n`) + ctx.stderr.write(`hyp mcp: ${describeRefreshError(err, target).message}\n`) return 2 } const fetchImpl = /** @type {typeof fetch | undefined} */ (globalThis.fetch) @@ -92,7 +92,7 @@ export async function runMcpProxy({ target, ctx }) { try { resolved = await resolveAccessJwt({ target, env: ctx.env, stateDir, identityBase }) } catch (err) { - return isNotify ? null : jsonRpcError(id, INTERNAL_ERROR, proxyAuthMessage(err, target)) + return isNotify ? null : jsonRpcError(id, INTERNAL_ERROR, describeRefreshError(err, target).message) } if (!resolved.ok) { return isNotify ? null : jsonRpcError(id, INTERNAL_ERROR, resolved.error) @@ -120,7 +120,7 @@ export async function runMcpProxy({ target, ctx }) { op, }) } catch (err) { - return isNotify ? null : jsonRpcError(id, INTERNAL_ERROR, proxyAuthMessage(err, target)) + return isNotify ? null : jsonRpcError(id, INTERNAL_ERROR, describeRefreshError(err, target).message) } if (!out.ok) { // The forced refresh could not produce a token (e.g. no identity @@ -138,7 +138,12 @@ export async function runMcpProxy({ target, ctx }) { if (sid) sessionId = sid if (isNotify) return null if (!res.ok) { - return jsonRpcError(id, INTERNAL_ERROR, `remote returned HTTP ${res.status}`) + // A 401/403 that survives the refresh + retry above (server-side clock + // skew, or a JWT revoked independently of the refresh row) is a dead + // credential: give the same re-login guidance the verb path does rather + // than a bare status the caller can't act on. + const detail = isAuthStatus(res.status) ? sessionExpiredMessage(target) : `remote returned HTTP ${res.status}` + return jsonRpcError(id, INTERNAL_ERROR, detail) } try { return await parseRpcResponse(res, message?.id) @@ -162,19 +167,3 @@ export async function runMcpProxy({ target, ctx }) { log.info('mcp.proxy_stop', { [Attr.COMPONENT]: 'mcp', [Attr.OPERATION]: 'mcp.proxy' }) return 0 } - -/** - * Message for a refresh failure: a typed `invalid_grant` (the refresh row was - * revoked or expired) becomes re-login guidance (LLP 0046 D5); anything else - * surfaces as a generic error. - * - * @param {unknown} err - * @param {string} target - * @returns {string} - */ -function proxyAuthMessage(err, target) { - if (err instanceof InvalidGrantError) { - return sessionExpiredMessage(target) - } - return err instanceof Error ? err.message : String(err) -} diff --git a/src/core/mcp/remote_verb.js b/src/core/mcp/remote_verb.js index 7d4c52e..3bf5273 100644 --- a/src/core/mcp/remote_verb.js +++ b/src/core/mcp/remote_verb.js @@ -2,8 +2,8 @@ import { readObservabilityEnv } from '../observability/env.js' import { attachWithRefresh, deriveIdentityBase, resolveAccessJwt } from '../remote/credentials.js' -import { InvalidGrantError, sessionExpiredMessage } from '../remote/identity_client.js' -import { createHttpMcpClient } from './client.js' +import { describeRefreshError } from '../remote/identity_client.js' +import { createHttpMcpClient, isAuthStatus } from './client.js' /** * @import { CommandRunContext, VerbRegistration } from '../../../collectivus-plugin-kernel-types.js' @@ -105,10 +105,8 @@ async function callRemoteTool({ url, token, verb, params }) { * @returns {{ ok: false, error: string, exitCode: number }} */ function mapRefreshError(err, target) { - if (err instanceof InvalidGrantError) { - return { ok: false, error: sessionExpiredMessage(target), exitCode: 2 } - } - return { ok: false, error: err instanceof Error ? err.message : String(err), exitCode: 1 } + const { sessionExpired, message } = describeRefreshError(err, target) + return { ok: false, error: message, exitCode: sessionExpired ? 2 : 1 } } /** @@ -120,7 +118,7 @@ function mapRefreshError(err, target) { */ function isAuthError(err) { return !!err && typeof err === 'object' && - (/** @type {any} */ (err).authError === true || /** @type {any} */ (err).status === 401 || /** @type {any} */ (err).status === 403) + (/** @type {any} */ (err).authError === true || isAuthStatus(/** @type {any} */ (err).status)) } /** diff --git a/src/core/remote/credentials.js b/src/core/remote/credentials.js index 8de8073..152e8cb 100644 --- a/src/core/remote/credentials.js +++ b/src/core/remote/credentials.js @@ -114,10 +114,16 @@ export async function readCredentials(stateDir) { * `{}` when the file is absent; throws on a corrupt file, same as * {@link readCredentials}. * + * The read path (it builds a fresh normalized map and never mutates the parsed + * object) takes the cached value directly; only a `mutable` caller that edits + * the map in place before writing it back gets a defensive clone, so the + * per-message proxy resolve doesn't deep-clone the whole store on every hit. + * * @param {string} stateDir + * @param {{ mutable?: boolean }} [opts] * @returns {Promise>} */ -async function readRawCredentials(stateDir) { +async function readRawCredentials(stateDir, { mutable = false } = {}) { const p = remoteCredentialsPath(stateDir) /** @type {Stats} */ let stat @@ -127,10 +133,9 @@ async function readRawCredentials(stateDir) { if (err && /** @type {NodeJS.ErrnoException} */ (err).code === 'ENOENT') return {} throw err } - // Cache hit: same file, unchanged since the last parse. Return a clone so a - // mutating caller (the write path) can't corrupt the cached value. + // Cache hit: same file, unchanged since the last parse. if (rawCache && rawCache.path === p && rawCache.mtimeMs === stat.mtimeMs && rawCache.size === stat.size) { - return structuredClone(rawCache.value) + return mutable ? structuredClone(rawCache.value) : rawCache.value } let raw try { @@ -151,7 +156,7 @@ async function readRawCredentials(stateDir) { } const value = /** @type {Record} */ (parsed) rawCache = { path: p, mtimeMs: stat.mtimeMs, size: stat.size, value } - return structuredClone(value) + return mutable ? structuredClone(value) : value } /** @@ -196,7 +201,7 @@ function normalizeRecord(entry) { */ export async function writeToken(stateDir, target, token) { await fs.mkdir(stateDir, { recursive: true }) - const current = await readRawCredentials(stateDir) + const current = await readRawCredentials(stateDir, { mutable: true }) current[target] = { kind: 'static', token } await writeCredentials(stateDir, current) } @@ -213,7 +218,7 @@ export async function writeToken(stateDir, target, token) { */ export async function writeSession(stateDir, target, session) { await fs.mkdir(stateDir, { recursive: true }) - const current = await readRawCredentials(stateDir) + const current = await readRawCredentials(stateDir, { mutable: true }) current[target] = { kind: 'oidc', refreshToken: session.refreshToken, @@ -235,7 +240,7 @@ export async function writeSession(stateDir, target, session) { export async function removeToken(stateDir, target) { // Operate on the raw file so a record we can't normalize is still removable, // and so removing one target never drops an unrelated sibling. - const current = await readRawCredentials(stateDir) + const current = await readRawCredentials(stateDir, { mutable: true }) if (!Object.prototype.hasOwnProperty.call(current, target)) return false delete current[target] await writeCredentials(stateDir, current) @@ -324,6 +329,10 @@ export async function resolveAccessJwt({ target, env, stateDir, identityBase, no // fresh, else refresh with the rotated token) rather than forcing a // needless re-login. An unchanged token is a real revocation; re-throw. if (!(err instanceof InvalidGrantError)) throw err + // Drop the parse cache before re-reading: we are reacting to a probable + // concurrent write, and a same-size rewrite landing within one mtime tick + // would otherwise be hidden behind the cache, making us miss the rotation. + rawCache = null const latest = (await readCredentials(stateDir))[target] if (!latest || latest.kind !== 'oidc' || latest.refreshToken === entry.refreshToken) throw err if (isFresh(latest, now)) { diff --git a/src/core/remote/identity_client.js b/src/core/remote/identity_client.js index a732a4b..8b0d9e5 100644 --- a/src/core/remote/identity_client.js +++ b/src/core/remote/identity_client.js @@ -42,6 +42,25 @@ export function sessionExpiredMessage(target) { return `remote session expired - re-run 'hyp remote login ${target}'` } +/** + * Classify a refresh failure so both attach paths report it the same way: a + * typed `invalid_grant` (the refresh row was revoked or expired) is a session + * expiry that maps to the re-login guidance; anything else surfaces its own + * message. The one home for this decision, shared by the stdio proxy and the + * verb attach path, so the two can never drift in what they tell the user. + * + * @param {unknown} err + * @param {string} target + * @returns {{ sessionExpired: boolean, message: string }} + * @ref LLP 0046#d5 [implements]: invalid_grant -> re-login guidance, one home for both attach paths + */ +export function describeRefreshError(err, target) { + if (err instanceof InvalidGrantError) { + return { sessionExpired: true, message: sessionExpiredMessage(target) } + } + return { sessionExpired: false, message: err instanceof Error ? err.message : String(err) } +} + /** * Exchange an authorization code for a session (the `authorization_code` * grant). Presents the held PKCE verifier. @@ -55,7 +74,7 @@ export async function exchangeCode({ identityBase, code, codeVerifier, fetchImpl return { refreshToken: str(json.refresh_token, 'refresh_token'), accessJwt: str(json.access_jwt, 'access_jwt'), - expiresAt: str(json.expires_at, 'expires_at'), + expiresAt: isoTimestamp(json.expires_at, 'expires_at'), org: str(json.org, 'org'), } } @@ -72,7 +91,7 @@ export async function refreshSession({ identityBase, refreshToken, fetchImpl }) const json = await postToken({ identityBase, body, fetchImpl, operation: 'remote.refresh' }) return { accessJwt: str(json.access_jwt, 'access_jwt'), - expiresAt: str(json.expires_at, 'expires_at'), + expiresAt: isoTimestamp(json.expires_at, 'expires_at'), // The refresh grant only has to re-mint the access JWT; `org` is fixed for // the life of the refresh token. Treat it as optional here and let the // caller keep the org it already stored, so a server that omits it on @@ -149,6 +168,23 @@ function str(v, field) { return v } +/** + * A non-empty string that also parses as a date. The stdio proxy refreshes + * whenever the stored expiry is unparseable, so accepting a non-date + * `expires_at` (e.g. epoch-seconds-as-string) would make every forwarded + * message a fresh refresh that re-stores the same bad value and never + * self-corrects. Fail the refresh loudly at parse time instead. + * + * @param {unknown} v @param {string} field @returns {string} + */ +function isoTimestamp(v, field) { + const s = str(v, field) + if (Number.isNaN(Date.parse(s))) { + throw new Error(`identity response field '${field}' is not a valid timestamp`) + } + return s +} + /** * Strip trailing slashes from an identity base so `${base}/token` never * double-slashes. Shared with the login orchestrator's start-URL builder. diff --git a/src/core/remote/loopback.js b/src/core/remote/loopback.js index f4cb872..2d793a1 100644 --- a/src/core/remote/loopback.js +++ b/src/core/remote/loopback.js @@ -101,12 +101,12 @@ export function startLoopbackReceiver({ state, timeoutMs = DEFAULT_TIMEOUT_MS }) fail(new Error('loopback received a callback with a mismatched state'), 'state_mismatch') return } - if (error) { + if (params.has('error')) { // The redirect's `error` is attacker-influenceable (anyone can hit the // loopback port). Bound it to a safe token before it reaches the error // message, the log ERROR_KIND, and the terminal, so a crafted value can't // inject newlines into logs or terminal output. - const safeError = sanitizeErrorCode(error) + const safeError = sanitizeErrorCode(error ?? '') respond(res, 'Login failed. You can close this tab and return to the terminal.') fail(Object.assign(new Error(`login failed: ${safeError}`), { callbackError: safeError }), safeError) return diff --git a/test/core/remote-identity-client.test.js b/test/core/remote-identity-client.test.js index 5090f90..2a4188e 100644 --- a/test/core/remote-identity-client.test.js +++ b/test/core/remote-identity-client.test.js @@ -3,7 +3,7 @@ import test from 'node:test' import assert from 'node:assert/strict' -import { exchangeCode, refreshSession, InvalidGrantError, sessionExpiredMessage } from '../../src/core/remote/identity_client.js' +import { exchangeCode, refreshSession, describeRefreshError, InvalidGrantError, sessionExpiredMessage } from '../../src/core/remote/identity_client.js' /** * A fetch stub that records the last request and returns `reply`. @@ -104,3 +104,25 @@ test('a response missing access_jwt is rejected', async () => { /missing 'access_jwt'/, ) }) + +test('a non-date expires_at is rejected at refresh time, not stored to loop forever', async () => { + const { fetchImpl } = stubFetch({ body: { access_jwt: 'jwt', expires_at: '1719600000', org: 'acme' } }) + await assert.rejects( + () => refreshSession({ identityBase: 'https://hyp.internal/v1/identity', refreshToken: 'rt', fetchImpl }), + /'expires_at' is not a valid timestamp/, + ) +}) + +test('describeRefreshError maps invalid_grant to session-expired re-login guidance', () => { + assert.deepEqual(describeRefreshError(new InvalidGrantError(), 'prod'), { + sessionExpired: true, + message: "remote session expired - re-run 'hyp remote login prod'", + }) +}) + +test('describeRefreshError passes a non-invalid_grant error through as a generic message', () => { + assert.deepEqual(describeRefreshError(new Error('identity endpoint rejected the grant (HTTP 500)'), 'prod'), { + sessionExpired: false, + message: 'identity endpoint rejected the grant (HTTP 500)', + }) +}) diff --git a/test/core/remote-oidc-login.test.js b/test/core/remote-oidc-login.test.js index 86119ef..f328f44 100644 --- a/test/core/remote-oidc-login.test.js +++ b/test/core/remote-oidc-login.test.js @@ -68,7 +68,7 @@ test('drives PKCE -> loopback -> exchange and returns the session', async (t) => test('--no-browser prints the URL instead of opening it', async () => { const fetchImpl = /** @type {any} */ (async () => ({ ok: true, status: 200, - text: async () => JSON.stringify({ refresh_token: 'rt', access_jwt: 'jwt', expires_at: 'x', org: 'acme' }), + text: async () => JSON.stringify({ refresh_token: 'rt', access_jwt: 'jwt', expires_at: '2026-06-29T12:00:00Z', org: 'acme' }), })) const { startReceiver } = scriptedReceiver() /** @type {string[]} */ diff --git a/test/core/remote-proxy-oidc.test.js b/test/core/remote-proxy-oidc.test.js index 5e20283..bd7bd6d 100644 --- a/test/core/remote-proxy-oidc.test.js +++ b/test/core/remote-proxy-oidc.test.js @@ -144,6 +144,30 @@ test('proxy surfaces a failed forced refresh instead of a bare HTTP 401', async assert.doesNotMatch(out.join(''), /remote returned HTTP 401/) }) +test('proxy gives re-login guidance, not a bare 401, when a clean refresh is still rejected', async (t) => { + const original = globalThis.fetch + t.after(() => { globalThis.fetch = original }) + + const hypHome = await tmpHome() + const stateDir = path.join(hypHome, 'hypaware') + await writeSession(stateDir, 'prod', { refreshToken: 'rt', accessJwt: 'jwt-old', expiresAt: FUTURE, org: 'acme' }) + + // The refresh succeeds (mints a new JWT), but the MCP side rejects every JWT + // (e.g. server-side clock skew or a revocation independent of the refresh row). + globalThis.fetch = /** @type {any} */ (async (/** @type {string} */ url, /** @type {any} */ init) => { + if (url === TOKEN_URL) return reply({ access_jwt: 'jwt-new', expires_at: FUTURE, org: 'acme' }) + const body = JSON.parse(init.body) + return reply({ jsonrpc: '2.0', id: body.id }, 401) + }) + + const { ctx, out } = makeCtx({ hypHome, lines: [{ jsonrpc: '2.0', id: 1, method: 'tools/call', params: { name: 'x' } }] }) + const code = await runMcpProxy({ target: 'prod', ctx }) + + assert.equal(code, 0) + assert.match(out.join(''), /re-run 'hyp remote login prod'/) + assert.doesNotMatch(out.join(''), /remote returned HTTP 401/) +}) + test('proxy surfaces re-login guidance when the refresh is rejected (invalid_grant)', async (t) => { const original = globalThis.fetch t.after(() => { globalThis.fetch = original }) From af039835ac93f24f6a100f9a527f09f293ff8a8e Mon Sep 17 00:00:00 2001 From: Kenny Daniel Date: Mon, 29 Jun 2026 17:34:11 -0700 Subject: [PATCH 33/59] remote/oidc: keep a refreshable session whose cached JWT is gone normalizeRecord dropped an explicit oidc record that had a valid refresh token but a missing or empty accessJwt (a partial write, an interrupted refresh, a hand edit), and the proxy presence-probe (resolveToken) likewise treated it as no-credential. Both forced a needless full re-login for a session that resolveAccessJwt could mint a fresh JWT for. Keep an explicit-kind oidc record whenever its refresh token is non-empty, defaulting accessJwt to ''. Inferred (no-kind) records still require the full refresh+access pair so one carrying a static token is not hijacked. resolveToken now reports an oidc record as present even when its cached JWT is empty, so the fail-fast probe does not refuse a refreshable session. --- src/core/remote/credentials.js | 58 +++++++++++++++-------- test/core/remote-credentials-oidc.test.js | 11 ++++- 2 files changed, 49 insertions(+), 20 deletions(-) diff --git a/src/core/remote/credentials.js b/src/core/remote/credentials.js index 152e8cb..6e02646 100644 --- a/src/core/remote/credentials.js +++ b/src/core/remote/credentials.js @@ -169,19 +169,29 @@ async function readRawCredentials(stateDir, { mutable = false } = {}) { function normalizeRecord(entry) { if (!entry || typeof entry !== 'object') return null const e = /** @type {Record} */ (entry) - // A complete oidc record (explicit kind, or an inferable refresh+access pair). - if (e.kind === 'oidc' || (e.kind === undefined && typeof e.refreshToken === 'string')) { - if (typeof e.refreshToken === 'string' && typeof e.accessJwt === 'string') { - return { - kind: 'oidc', - refreshToken: e.refreshToken, - accessJwt: e.accessJwt, - expiresAt: typeof e.expiresAt === 'string' ? e.expiresAt : '', - org: typeof e.org === 'string' ? e.org : '', - } + /** @param {string} accessJwt @returns {RemoteOidcRecord} */ + const oidc = (accessJwt) => ({ + kind: 'oidc', + refreshToken: e.refreshToken, + accessJwt, + expiresAt: typeof e.expiresAt === 'string' ? e.expiresAt : '', + org: typeof e.org === 'string' ? e.org : '', + }) + if (e.kind === 'oidc') { + // Explicit oidc: the refresh token is what makes the record usable - the + // cached access JWT is derivable from it. So keep a record whose `accessJwt` + // is missing or empty (a partial write, an interrupted refresh, a hand edit) + // rather than dropping it; resolveAccessJwt mints a fresh JWT from the + // refresh token, where dropping it would force a needless full re-login. + if (typeof e.refreshToken === 'string' && e.refreshToken.length > 0) { + return oidc(typeof e.accessJwt === 'string' ? e.accessJwt : '') } - // Incomplete oidc shape: fall through so a usable static `token` on the - // same (corrupt/hand-edited) record is not silently dropped. + // No usable refresh token: fall through to the static `token` check. + } else if (e.kind === undefined && typeof e.refreshToken === 'string' && e.refreshToken.length > 0 && typeof e.accessJwt === 'string') { + // Inferred oidc (legacy file with no explicit kind): require the full + // refresh + access pair before reading an unkinded record as oidc, so a + // record that also carries a static `token` is not hijacked away from it. + return oidc(e.accessJwt) } // Legacy / static: a bare token with no (or static) kind. if (typeof e.token === 'string') { @@ -250,10 +260,15 @@ export async function removeToken(stateDir, target) { /** * Resolve a target's bearer token at query time, **without** session-aware * refresh. Order: per-target env var (CI/ephemeral) → stored file → error. An - * env override never falls through to the file (LLP 0033 §credentials). For an - * `oidc` record this returns the cached access JWT as-is. Both the query attach - * path and the stdio proxy now use {@link resolveAccessJwt} for session-aware - * refresh; this lower-level reader is kept for any non-refreshing caller. + * env override never falls through to the file (LLP 0033 §credentials). + * + * This is a **presence** check, not a working-token guarantee: an `oidc` record + * reports `ok` whenever it can yield a token, which includes a refreshable + * record whose cached JWT is empty or stale (the returned `token` is then that + * cached value, possibly `''`). A caller that needs a usable bearer must go + * through {@link resolveAccessJwt}, which refreshes; this lower-level reader + * exists for the stdio proxy's fail-fast probe, which only asks "is there a + * credential here at all" and must not refuse a session it could refresh. * * @param {{ target: string, env: NodeJS.ProcessEnv, stateDir: string }} args * @returns {Promise<{ ok: true, token: string, source: 'env' | 'file' } | { ok: false, error: string }>} @@ -266,9 +281,14 @@ export async function resolveToken({ target, env, stateDir }) { } const creds = await readCredentials(stateDir) const entry = creds[target] - const token = entry ? bearerOf(entry) : '' - if (token.length > 0) { - return { ok: true, token, source: 'file' } + if (entry) { + const token = bearerOf(entry) + // A non-empty cached bearer is usable as-is; an oidc record is also present + // when only its refresh token survives (normalizeRecord guarantees a + // non-empty refreshToken), since resolveAccessJwt can mint a fresh JWT. + if (token.length > 0 || entry.kind === 'oidc') { + return { ok: true, token, source: 'file' } + } } return { ok: false, error: noTokenError(target, envName) } } diff --git a/test/core/remote-credentials-oidc.test.js b/test/core/remote-credentials-oidc.test.js index 0c3dfca..ce4b818 100644 --- a/test/core/remote-credentials-oidc.test.js +++ b/test/core/remote-credentials-oidc.test.js @@ -70,9 +70,18 @@ test('a record with a refreshToken but no accessJwt still yields its usable stat assert.deepEqual(creds.prod, { kind: 'static', token: 'still-good' }) }) -test('a malformed oidc record (no accessJwt, no token) is dropped on read', async () => { +test('an oidc record with a refresh token but no cached accessJwt is kept (refreshable)', async () => { const dir = await tmpState() + // A partial write / interrupted refresh: the refresh token survives but the + // cached JWT does not. The record is still usable - resolveAccessJwt can mint + // a fresh JWT - so it must be kept (with an empty accessJwt), not dropped. await fs.writeFile(remoteCredentialsPath(dir), JSON.stringify({ prod: { kind: 'oidc', refreshToken: 'rt' } })) + assert.deepEqual((await readCredentials(dir)).prod, { kind: 'oidc', refreshToken: 'rt', accessJwt: '', expiresAt: '', org: '' }) +}) + +test('an oidc record with neither a refresh token nor a static token is dropped on read', async () => { + const dir = await tmpState() + await fs.writeFile(remoteCredentialsPath(dir), JSON.stringify({ prod: { kind: 'oidc', accessJwt: 'orphan-jwt' } })) assert.deepEqual(await readCredentials(dir), {}) }) From 43f89383c880b6ff168e19550f00989206673eca Mon Sep 17 00:00:00 2001 From: Kenny Daniel Date: Mon, 29 Jun 2026 17:34:21 -0700 Subject: [PATCH 34/59] remote/oidc: bound the /token request with a timeout postToken's fetch had no deadline, but the stdio proxy refreshes on the per-message hot path (resolveAccessJwt -> refreshSession), so a hung or slow identity endpoint blocked a forwarded JSON-RPC message indefinitely. The loopback receiver had a timeout; the token call, the one actually on the request path, did not. Wrap the fetch and body read in a 30s AbortController, clearing the timer in finally so a fast response does not keep the event loop alive, and surface an abort as a clear "did not respond within 30s" error. --- src/core/remote/identity_client.js | 38 +++++++++++++++++++++++++----- 1 file changed, 32 insertions(+), 6 deletions(-) diff --git a/src/core/remote/identity_client.js b/src/core/remote/identity_client.js index 8b0d9e5..f5ebe68 100644 --- a/src/core/remote/identity_client.js +++ b/src/core/remote/identity_client.js @@ -15,6 +15,13 @@ import { Attr, getLogger } from '../observability/index.js' * @import { OidcSession, RefreshedAccess } from '../../../src/core/remote/types.js' */ +/** + * Upper bound on a single `/token` request. The stdio proxy refreshes on the + * per-message hot path (resolveAccessJwt -> refreshSession), so a hung or very + * slow identity endpoint must not block a forwarded JSON-RPC message forever. + */ +const TOKEN_TIMEOUT_MS = 30 * 1000 + /** * Error thrown when the token endpoint rejects a refresh with * `invalid_grant` (the refresh row was revoked or expired). The attach path @@ -122,12 +129,31 @@ async function postToken({ identityBase, body, fetchImpl, operation }) { } const log = getLogger('remote') const url = `${trimSlash(identityBase)}/token` - const res = await doFetch(url, { - method: 'POST', - headers: { 'content-type': 'application/json', accept: 'application/json' }, - body: JSON.stringify(body), - }) - const text = await safeText(res) + // Bound the whole request (connect + body read) so a hung endpoint can't wedge + // the proxy hot path. clearTimeout in `finally` so a fast response doesn't keep + // the timer (and the event loop) alive. + const controller = new AbortController() + const timer = setTimeout(() => controller.abort(), TOKEN_TIMEOUT_MS) + /** @type {Awaited>} */ + let res + /** @type {string} */ + let text + try { + res = await doFetch(url, { + method: 'POST', + headers: { 'content-type': 'application/json', accept: 'application/json' }, + body: JSON.stringify(body), + signal: controller.signal, + }) + text = await safeText(res) + } catch (err) { + if (controller.signal.aborted) { + throw new Error(`identity endpoint did not respond within ${TOKEN_TIMEOUT_MS / 1000}s`) + } + throw err + } finally { + clearTimeout(timer) + } /** @type {any} */ let json try { From 6ccf8fedbd550d4b4ed49e92f7b516a65a6fdb6b Mon Sep 17 00:00:00 2001 From: Kenny Daniel Date: Mon, 29 Jun 2026 17:34:21 -0700 Subject: [PATCH 35/59] mcp/proxy: do not advise re-login for an env/static credential 401 A 401/403 surviving the refresh+retry was reported as "remote session expired - re-run hyp remote login" regardless of credential kind. For a per-target env override (never read from the store) or a CI static token, re-login cannot fix it. Gate the session-expired wording on isRefreshable(resolved); otherwise point at the credential itself. --- src/core/mcp/proxy.js | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-) diff --git a/src/core/mcp/proxy.js b/src/core/mcp/proxy.js index 7a0b0ec..f34079d 100644 --- a/src/core/mcp/proxy.js +++ b/src/core/mcp/proxy.js @@ -4,7 +4,7 @@ import process from 'node:process' import { Attr, getLogger } from '../observability/index.js' import { readObservabilityEnv } from '../observability/env.js' -import { attachWithRefresh, deriveIdentityBase, resolveAccessJwt, resolveToken } from '../remote/credentials.js' +import { attachWithRefresh, deriveIdentityBase, isRefreshable, resolveAccessJwt, resolveToken } from '../remote/credentials.js' import { describeRefreshError, sessionExpiredMessage } from '../remote/identity_client.js' import { isAuthStatus, parseRpcResponse } from './client.js' import { INTERNAL_ERROR, jsonRpcError } from './jsonrpc.js' @@ -138,11 +138,17 @@ export async function runMcpProxy({ target, ctx }) { if (sid) sessionId = sid if (isNotify) return null if (!res.ok) { - // A 401/403 that survives the refresh + retry above (server-side clock - // skew, or a JWT revoked independently of the refresh row) is a dead - // credential: give the same re-login guidance the verb path does rather - // than a bare status the caller can't act on. - const detail = isAuthStatus(res.status) ? sessionExpiredMessage(target) : `remote returned HTTP ${res.status}` + // A 401/403 that survives the refresh + retry above is a dead + // credential. For a refreshable OIDC session (server-side clock skew, or + // a JWT revoked independently of the refresh row) the fix is re-login. + // For an env/static token re-login cannot help (an env override is never + // read from the store, a CI token needs replacing), so point at the + // credential itself rather than mis-advising a login. + const detail = !isAuthStatus(res.status) + ? `remote returned HTTP ${res.status}` + : isRefreshable(resolved) + ? sessionExpiredMessage(target) + : `remote rejected the credential for '${target}' (HTTP ${res.status}) - check the token` return jsonRpcError(id, INTERNAL_ERROR, detail) } try { From a887c51d4425ed277d54475c2c804be5d68fddaf Mon Sep 17 00:00:00 2001 From: Kenny Daniel Date: Mon, 29 Jun 2026 17:34:21 -0700 Subject: [PATCH 36/59] mcp/remote-verb: guard a missing global fetch up front The proxy fails fast with a clear message when globalThis.fetch is absent; the verb path did not, so a fetchless runtime threw the identity client's lower-level "no fetch" deep inside the attach path with no guidance. Add the same early guard, since both the refresh and the tool call need fetch. --- src/core/mcp/remote_verb.js | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/src/core/mcp/remote_verb.js b/src/core/mcp/remote_verb.js index 3bf5273..c0e3470 100644 --- a/src/core/mcp/remote_verb.js +++ b/src/core/mcp/remote_verb.js @@ -34,6 +34,14 @@ export async function runRemoteVerb({ verb, params, target, ctx }) { } } + // Both the silent refresh and the remote tool call need a global fetch; fail + // with a clear reason here rather than letting an oidc refresh throw the + // identity client's lower-level "no fetch" deep inside the attach path (the + // stdio proxy guards the same way). + if (typeof (/** @type {unknown} */ (globalThis.fetch)) !== 'function') { + return { ok: false, error: 'no fetch implementation available for the remote verb', exitCode: 1 } + } + const stateDir = readObservabilityEnv(ctx.env).stateDir const identityBase = deriveIdentityBase(entry.url) ?? undefined /** @type {Awaited>} */ From 24b136c08b55b791942db6103da0ced5d9eb9bca Mon Sep 17 00:00:00 2001 From: Kenny Daniel Date: Mon, 29 Jun 2026 17:34:34 -0700 Subject: [PATCH 37/59] remote/loopback: close the callback socket so login does not linger The browser opens /callback over a keep-alive connection. server.close() waits for in-flight sockets to drain, so without Connection: close the idle keep-alive socket held the event loop until Node's keepAliveTimeout (~5s) and hyp remote login hung that long at exit. Set Connection: close on the response so close() finishes promptly while the user still sees the page. --- src/core/remote/loopback.js | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/src/core/remote/loopback.js b/src/core/remote/loopback.js index 2d793a1..74e01ca 100644 --- a/src/core/remote/loopback.js +++ b/src/core/remote/loopback.js @@ -201,6 +201,12 @@ function sanitizeErrorCode(error) { /** @param {ServerResponse} res @param {string} message */ function respond(res, message) { const body = `HypAware login

${message}

` - res.writeHead(200, { 'content-type': 'text/html; charset=utf-8' }) + // Browsers open the callback over a keep-alive connection. `server.close()` + // waits for in-flight sockets to drain, so without `Connection: close` the + // idle keep-alive socket would hold the event loop until Node's + // keepAliveTimeout (~5s) and `hyp remote login` would hang that long at exit. + // Asking the browser to close after this single response lets close() finish + // promptly while the user still sees the page. + res.writeHead(200, { 'content-type': 'text/html; charset=utf-8', connection: 'close' }) res.end(body) } From 1887932dfe5f4d5c32f7344c520bfa73efdc3fa6 Mon Sep 17 00:00:00 2001 From: Kenny Daniel Date: Mon, 29 Jun 2026 17:34:34 -0700 Subject: [PATCH 38/59] remote/oidc: phrase the browser-open line as an attempt openBrowser's boolean is best-effort: a launcher that exists but fails (no display on a headless box) still returns true. "Opened your browser" then overstated what happened. Say "Opening" instead; the URL is still printed right below as the real fallback. --- src/core/remote/oidc_login.js | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/src/core/remote/oidc_login.js b/src/core/remote/oidc_login.js index 96f1927..ff03ad6 100644 --- a/src/core/remote/oidc_login.js +++ b/src/core/remote/oidc_login.js @@ -60,9 +60,10 @@ export async function loginWithBrowser({ const opened = noBrowser ? false : openBrowser(startUrl) if (opened) { - // The opener boolean is best-effort (a silently-failed launcher still - // returns true), so always print the URL as a fallback path. - print(`Opened your browser to sign in. Waiting for the redirect...`) + // The opener boolean is best-effort: a launcher that exists but fails (no + // display on a headless box) still returns true. So phrase this as an + // attempt, not a fact, and always print the URL as the real fallback. + print(`Opening your browser to sign in. Waiting for the redirect...`) print(`If it did not open, visit:\n\n ${startUrl}\n`) } else { print(`Open this URL in your browser to sign in:\n\n ${startUrl}\n`) From d079a062bab33ad4b9cc08a761bb95445e54002d Mon Sep 17 00:00:00 2001 From: Kenny Daniel Date: Mon, 29 Jun 2026 17:34:34 -0700 Subject: [PATCH 39/59] smoke/remote-oidc: force expiry through the cache-busting write path forceExpiry rewrote the credential file with a raw fs.writeFile at the same byte length. A same-size rewrite landing within one mtime tick is hidden behind the credential module's parse cache, so the next resolve could read the pre-expiry record, skip the refresh, and flake the one-refresh assertion. Write through writeSession, which invalidates that cache. --- hypaware-core/smoke/flows/remote_oidc_login.js | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/hypaware-core/smoke/flows/remote_oidc_login.js b/hypaware-core/smoke/flows/remote_oidc_login.js index 39dd9f2..9206a60 100644 --- a/hypaware-core/smoke/flows/remote_oidc_login.js +++ b/hypaware-core/smoke/flows/remote_oidc_login.js @@ -8,7 +8,6 @@ import { loginWithBrowser } from '../../../src/core/remote/oidc_login.js' import { readCredentials, writeSession, - remoteCredentialsPath, } from '../../../src/core/remote/credentials.js' import { querySqlVerb } from '../../../src/core/query/verb.js' import { verbToCommand } from '../../../src/core/cli/verb_command.js' @@ -139,11 +138,18 @@ function runQuery(mcpUrl) { * @param {string} target */ async function forceExpiry(stateDir, target) { - const fs = await import('node:fs/promises') const creds = await readCredentials(stateDir) const rec = /** @type {any} */ (creds[target]) - rec.expiresAt = '2000-01-01T00:00:00Z' - await fs.writeFile(remoteCredentialsPath(stateDir), JSON.stringify(creds, null, 2) + '\n', { mode: 0o600 }) + // Write through writeSession (not a raw fs.writeFile) so the credential + // module's parse cache is invalidated. A raw same-size rewrite landing within + // one mtime tick would be hidden behind that cache, and the next resolve would + // read the pre-expiry record, skip the refresh, and flake this smoke. + await writeSession(stateDir, target, { + refreshToken: rec.refreshToken, + accessJwt: rec.accessJwt, + expiresAt: '2000-01-01T00:00:00Z', + org: rec.org, + }) } /** From dae52f8fb41e880feb5298e2bd27d1d3defe8b02 Mon Sep 17 00:00:00 2001 From: Kenny Daniel Date: Mon, 29 Jun 2026 19:08:27 -0700 Subject: [PATCH 40/59] remote/loopback: send Connection: close on stray 404/400 replies A browser favicon or probe hitting the loopback port on a non-/callback path was answered without Connection: close, so the idle keep-alive socket held the event loop until Node's keepAliveTimeout (~5s) and hyp remote login hung that long at exit. Match respond()'s keep-alive close on the 404 and 400 paths. --- src/core/remote/loopback.js | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/src/core/remote/loopback.js b/src/core/remote/loopback.js index 74e01ca..4c006ee 100644 --- a/src/core/remote/loopback.js +++ b/src/core/remote/loopback.js @@ -76,16 +76,20 @@ export function startLoopbackReceiver({ state, timeoutMs = DEFAULT_TIMEOUT_MS }) // throw; an uncaught throw in the request listener would crash the whole // `hyp remote login` process. Treat it as a stray request: 400 and ignore, // never settle the flow, so the real callback can still arrive. + // `connection: close` on these stray-request replies for the same reason + // respond() sets it: a browser favicon/probe over keep-alive would otherwise + // hold an idle socket open, and server.close() (on the real callback) waits + // for it to drain, hanging `hyp remote login` at exit for ~5s. let url try { url = new URL(req.url ?? '/', 'http://127.0.0.1') } catch { - res.writeHead(400, { 'content-type': 'text/plain' }) + res.writeHead(400, { 'content-type': 'text/plain', connection: 'close' }) res.end('bad request') return } if (url.pathname !== CALLBACK_PATH) { - res.writeHead(404, { 'content-type': 'text/plain' }) + res.writeHead(404, { 'content-type': 'text/plain', connection: 'close' }) res.end('not found') return } From e7b1a61b2854925a029bb371d8ace797ada7a329 Mon Sep 17 00:00:00 2001 From: Kenny Daniel Date: Mon, 29 Jun 2026 19:08:36 -0700 Subject: [PATCH 41/59] remote/credentials: serialize writes with a lock and harden the refresh resolver - Wrap writeToken/writeSession/removeToken in a cross-process O_EXCL lock that re-reads the freshest file inside the lock, so two logins for different targets can no longer clobber each other's just-rotated one-time-use refresh token (which forced a needless re-login). A stale lock is stolen by age. - Drop the parse cache before the forced-refresh read so the live-401 retry sees a concurrent rotation, symmetric with the invalid_grant re-read. - Read the per-target env override through one helper in both resolvers. - Add describeAuthRejection, the shared "why is this credential dead" wording for a 401 surviving the refresh + retry, and surface attachWithRefresh's final authFailed so callers can word their guidance accordingly. --- src/core/remote/credentials.js | 173 ++++++++++++++++++++++++++++----- 1 file changed, 146 insertions(+), 27 deletions(-) diff --git a/src/core/remote/credentials.js b/src/core/remote/credentials.js index 6e02646..b322223 100644 --- a/src/core/remote/credentials.js +++ b/src/core/remote/credentials.js @@ -4,7 +4,7 @@ import fs from 'node:fs/promises' import path from 'node:path' import process from 'node:process' -import { InvalidGrantError, refreshSession } from './identity_client.js' +import { InvalidGrantError, refreshSession, sessionExpiredMessage } from './identity_client.js' /** * @import { Stats } from 'node:fs' @@ -34,6 +34,12 @@ const CREDENTIALS_BASENAME = 'remote-credentials.json' /** Refresh an `oidc` access JWT once it is within this window of expiry. */ const REFRESH_SKEW_MS = 60 * 1000 +/** Max time to wait for the cross-process write lock before giving up. */ +const LOCK_TIMEOUT_MS = 5 * 1000 + +/** A lock file older than this is treated as abandoned by a crashed holder. */ +const LOCK_STALE_MS = 30 * 1000 + /** * Single-entry parse cache for the credential file. The stdio proxy resolves a * token per forwarded message, so without this every message re-read and @@ -210,10 +216,11 @@ function normalizeRecord(entry) { * @returns {Promise} */ export async function writeToken(stateDir, target, token) { - await fs.mkdir(stateDir, { recursive: true }) - const current = await readRawCredentials(stateDir, { mutable: true }) - current[target] = { kind: 'static', token } - await writeCredentials(stateDir, current) + await withCredentialsLock(stateDir, async () => { + const current = await readRawCredentials(stateDir, { mutable: true }) + current[target] = { kind: 'static', token } + await writeCredentials(stateDir, current) + }) } /** @@ -227,16 +234,17 @@ export async function writeToken(stateDir, target, token) { * @ref LLP 0046#d4 [implements]: oidc session written through the same atomic 0600 store as static tokens */ export async function writeSession(stateDir, target, session) { - await fs.mkdir(stateDir, { recursive: true }) - const current = await readRawCredentials(stateDir, { mutable: true }) - current[target] = { - kind: 'oidc', - refreshToken: session.refreshToken, - accessJwt: session.accessJwt, - expiresAt: session.expiresAt, - org: session.org, - } - await writeCredentials(stateDir, current) + await withCredentialsLock(stateDir, async () => { + const current = await readRawCredentials(stateDir, { mutable: true }) + current[target] = { + kind: 'oidc', + refreshToken: session.refreshToken, + accessJwt: session.accessJwt, + expiresAt: session.expiresAt, + org: session.org, + } + await writeCredentials(stateDir, current) + }) } /** @@ -250,11 +258,13 @@ export async function writeSession(stateDir, target, session) { export async function removeToken(stateDir, target) { // Operate on the raw file so a record we can't normalize is still removable, // and so removing one target never drops an unrelated sibling. - const current = await readRawCredentials(stateDir, { mutable: true }) - if (!Object.prototype.hasOwnProperty.call(current, target)) return false - delete current[target] - await writeCredentials(stateDir, current) - return true + return withCredentialsLock(stateDir, async () => { + const current = await readRawCredentials(stateDir, { mutable: true }) + if (!Object.prototype.hasOwnProperty.call(current, target)) return false + delete current[target] + await writeCredentials(stateDir, current) + return true + }) } /** @@ -275,8 +285,8 @@ export async function removeToken(stateDir, target) { */ export async function resolveToken({ target, env, stateDir }) { const envName = remoteTokenEnvVar(target) - const fromEnv = env[envName] - if (typeof fromEnv === 'string' && fromEnv.length > 0) { + const fromEnv = envOverride(env, target) + if (fromEnv !== undefined) { return { ok: true, token: fromEnv, source: 'env' } } const creds = await readCredentials(stateDir) @@ -315,11 +325,17 @@ export async function resolveToken({ target, env, stateDir }) { */ export async function resolveAccessJwt({ target, env, stateDir, identityBase, now = Date.now(), fetchImpl, forceRefresh = false }) { const envName = remoteTokenEnvVar(target) - const fromEnv = env[envName] - if (typeof fromEnv === 'string' && fromEnv.length > 0) { + const fromEnv = envOverride(env, target) + if (fromEnv !== undefined) { return { ok: true, token: fromEnv, source: 'env', kind: 'static' } } + // A forced refresh is the live-401 retry: drop the parse cache so the read + // below sees the freshest refresh token. A concurrent rotation may have + // rewritten the file with an identical size within one mtime tick, which the + // cache key (path + mtime + size) would otherwise miss - symmetric with the + // invalid_grant re-read below. + if (forceRefresh) rawCache = null const creds = await readCredentials(stateDir) let entry = creds[target] if (!entry) { @@ -407,24 +423,58 @@ export function isRefreshable(resolved) { * even when it too failed auth. A `refresh()` that throws (e.g. a typed * `invalid_grant`) propagates to the caller, which maps it to re-login guidance. * + * The `ok` result carries the final attempt's `authFailed` so the caller can + * tell a 401 that *survived* the refresh + retry (a dead credential) from a + * clean success, and word the guidance accordingly via {@link describeAuthRejection}. + * * @template T * @param {{ * resolved: { ok: true, token: string, source?: 'env' | 'file', kind?: 'static' | 'oidc' }, * refresh: () => Promise<{ ok: true, token: string, source?: 'env' | 'file', kind?: 'static' | 'oidc' } | { ok: false, error: string }>, * op: (token: string) => Promise<{ authFailed: boolean, value: T }>, * }} args - * @returns {Promise<{ ok: true, value: T } | { ok: false, error: string }>} + * @returns {Promise<{ ok: true, value: T, authFailed: boolean } | { ok: false, error: string }>} * @ref LLP 0046#d5 [implements]: the 401 -> force refresh -> retry-once policy, one home for both attach paths */ export async function attachWithRefresh({ resolved, refresh, op }) { const first = await op(resolved.token) if (!first.authFailed || !isRefreshable(resolved)) { - return { ok: true, value: first.value } + return { ok: true, value: first.value, authFailed: first.authFailed } } const refreshed = await refresh() if (!refreshed.ok) return { ok: false, error: refreshed.error } const second = await op(refreshed.token) - return { ok: true, value: second.value } + return { ok: true, value: second.value, authFailed: second.authFailed } +} + +/** + * Explain a 401/403 that survived the one-shot refresh + retry, by *why* the + * credential is dead, so the stdio proxy and the verb attach path advise the + * user identically (LLP 0046 D5). One home so the two can never drift: + * - a refreshable oidc session that still 401s is an expired session: re-login + * (exit 2, same as an invalid_grant refresh failure); + * - a per-target env override can't be fixed by re-login (it always wins over + * the store), so point at the env var (exit 1); + * - a static file token re-login *does* replace, so advise re-login (exit 1). + * + * @param {{ target: string, status: number, resolved: { source?: 'env' | 'file', kind?: 'static' | 'oidc' } }} args + * @returns {{ message: string, exitCode: number }} + * @ref LLP 0046#d5 [implements]: 401-after-retry guidance, one home for both attach paths + */ +export function describeAuthRejection({ target, status, resolved }) { + if (isRefreshable(resolved)) { + return { message: sessionExpiredMessage(target), exitCode: 2 } + } + if (resolved.source === 'env') { + return { + message: `remote rejected the credential for '${target}' (HTTP ${status}) - re-login cannot fix an env override; check ${remoteTokenEnvVar(target)}`, + exitCode: 1, + } + } + return { + message: `remote rejected the credential (HTTP ${status}) - re-run 'hyp remote login'`, + exitCode: 1, + } } /** @@ -453,6 +503,20 @@ function bearerOf(record) { return record.kind === 'oidc' ? record.accessJwt : record.token } +/** + * The per-target env override, or `undefined`. `HYP_REMOTE_TOKEN_` + * always wins over the stored file (LLP 0033 §credentials); both resolvers read + * it through here so the "env always wins" rule has one home. + * + * @param {NodeJS.ProcessEnv} env + * @param {string} target + * @returns {string | undefined} + */ +function envOverride(env, target) { + const value = env[remoteTokenEnvVar(target)] + return typeof value === 'string' && value.length > 0 ? value : undefined +} + /** * @param {string} target * @param {string} envName @@ -462,6 +526,61 @@ function noTokenError(target, envName) { return `no token for '${target}' - run 'hyp remote login ${target}' (or set ${envName})` } +/** @param {number} ms @returns {Promise} */ +function delay(ms) { + return new Promise((resolve) => setTimeout(resolve, ms)) +} + +/** + * Serialize the read-modify-write of the shared 0600 store across concurrent + * `hyp` processes (a verb call beside a proxy, two MCP clients). Without it two + * writers to *different* targets each read the whole map and rename; the later + * rename clobbers the earlier writer's just-rotated one-time-use refresh token + * for the other target, forcing a needless re-login. An `O_EXCL` lock file is + * the cross-process mutex, and the cache is dropped on entry so the locked body + * reads the freshest on-disk map (an identical-size sibling rewrite the parse + * cache would otherwise miss). A stale lock left by a crashed holder is stolen + * by age, so a crash can't wedge every future write. + * + * @template T + * @param {string} stateDir + * @param {() => Promise} fn + * @returns {Promise} + */ +async function withCredentialsLock(stateDir, fn) { + await fs.mkdir(stateDir, { recursive: true }) + const lockPath = `${remoteCredentialsPath(stateDir)}.lock` + const deadline = Date.now() + LOCK_TIMEOUT_MS + for (;;) { + try { + const handle = await fs.open(lockPath, 'wx') + await handle.close() + break + } catch (err) { + if (!err || /** @type {NodeJS.ErrnoException} */ (err).code !== 'EEXIST') throw err + // Steal a lock abandoned by a crashed holder, identified by age. + try { + const st = await fs.stat(lockPath) + if (Date.now() - st.mtimeMs > LOCK_STALE_MS) { + await fs.rm(lockPath, { force: true }) + continue + } + } catch { /* lock vanished between open and stat: retry the acquire */ } + if (Date.now() > deadline) { + throw new Error('timed out acquiring the remote credentials lock') + } + await delay(25) + } + } + try { + // Read the freshest on-disk map inside the lock; see the doc comment. + rawCache = null + return await fn() + } finally { + await fs.rm(lockPath, { force: true }) + } +} + /** * @param {string} stateDir * @param {Record} map From c0dd7fa63379565d1943ee9da711769a2dd656b6 Mon Sep 17 00:00:00 2001 From: Kenny Daniel Date: Mon, 29 Jun 2026 19:08:57 -0700 Subject: [PATCH 42/59] remote/attach: unify dead-credential and no-fetch guidance across verb and proxy The verb path advised "re-run hyp remote login" for any surviving 401, including an env-override token that re-login can never fix, and never gave the session-expired guidance the proxy gives when a freshly-refreshed JWT is still rejected (clock skew, or a revocation independent of the refresh row). Route both attach paths through describeAuthRejection so an expired oidc session, an env override, and a static file token each get the right message and exit code. Share the no-fetch wording through one constant so the three entrypoints cannot drift. --- src/core/mcp/proxy.js | 19 +++++------- src/core/mcp/remote_verb.js | 24 +++++++++++--- src/core/remote/identity_client.js | 9 +++++- test/core/remote-verb-oidc.test.js | 50 ++++++++++++++++++++++++++++++ 4 files changed, 85 insertions(+), 17 deletions(-) diff --git a/src/core/mcp/proxy.js b/src/core/mcp/proxy.js index f34079d..999845f 100644 --- a/src/core/mcp/proxy.js +++ b/src/core/mcp/proxy.js @@ -4,8 +4,8 @@ import process from 'node:process' import { Attr, getLogger } from '../observability/index.js' import { readObservabilityEnv } from '../observability/env.js' -import { attachWithRefresh, deriveIdentityBase, isRefreshable, resolveAccessJwt, resolveToken } from '../remote/credentials.js' -import { describeRefreshError, sessionExpiredMessage } from '../remote/identity_client.js' +import { attachWithRefresh, deriveIdentityBase, describeAuthRejection, resolveAccessJwt, resolveToken } from '../remote/credentials.js' +import { describeRefreshError, NO_FETCH_MESSAGE } from '../remote/identity_client.js' import { isAuthStatus, parseRpcResponse } from './client.js' import { INTERNAL_ERROR, jsonRpcError } from './jsonrpc.js' import { serveStdio } from './stdio.js' @@ -57,7 +57,7 @@ export async function runMcpProxy({ target, ctx }) { } const fetchImpl = /** @type {typeof fetch | undefined} */ (globalThis.fetch) if (typeof fetchImpl !== 'function') { - ctx.stderr.write('hyp mcp: no fetch implementation available for the proxy\n') + ctx.stderr.write(`hyp mcp: ${NO_FETCH_MESSAGE} for the proxy\n`) return 1 } @@ -139,16 +139,13 @@ export async function runMcpProxy({ target, ctx }) { if (isNotify) return null if (!res.ok) { // A 401/403 that survives the refresh + retry above is a dead - // credential. For a refreshable OIDC session (server-side clock skew, or - // a JWT revoked independently of the refresh row) the fix is re-login. - // For an env/static token re-login cannot help (an env override is never - // read from the store, a CI token needs replacing), so point at the - // credential itself rather than mis-advising a login. + // credential. Explain by why it is dead - a refreshable OIDC session is + // an expired session (re-login), a per-target env override re-login + // cannot fix, a static file token re-login replaces - via the shared + // policy the verb attach path also uses, so the two never drift. const detail = !isAuthStatus(res.status) ? `remote returned HTTP ${res.status}` - : isRefreshable(resolved) - ? sessionExpiredMessage(target) - : `remote rejected the credential for '${target}' (HTTP ${res.status}) - check the token` + : describeAuthRejection({ target, status: res.status, resolved }).message return jsonRpcError(id, INTERNAL_ERROR, detail) } try { diff --git a/src/core/mcp/remote_verb.js b/src/core/mcp/remote_verb.js index c0e3470..c1d78b5 100644 --- a/src/core/mcp/remote_verb.js +++ b/src/core/mcp/remote_verb.js @@ -1,8 +1,8 @@ // @ts-check import { readObservabilityEnv } from '../observability/env.js' -import { attachWithRefresh, deriveIdentityBase, resolveAccessJwt } from '../remote/credentials.js' -import { describeRefreshError } from '../remote/identity_client.js' +import { attachWithRefresh, deriveIdentityBase, describeAuthRejection, resolveAccessJwt } from '../remote/credentials.js' +import { describeRefreshError, NO_FETCH_MESSAGE } from '../remote/identity_client.js' import { createHttpMcpClient, isAuthStatus } from './client.js' /** @@ -39,7 +39,7 @@ export async function runRemoteVerb({ verb, params, target, ctx }) { // identity client's lower-level "no fetch" deep inside the attach path (the // stdio proxy guards the same way). if (typeof (/** @type {unknown} */ (globalThis.fetch)) !== 'function') { - return { ok: false, error: 'no fetch implementation available for the remote verb', exitCode: 1 } + return { ok: false, error: `${NO_FETCH_MESSAGE} for the remote verb`, exitCode: 1 } } const stateDir = readObservabilityEnv(ctx.env).stateDir @@ -61,13 +61,18 @@ export async function runRemoteVerb({ verb, params, target, ctx }) { // thrown 401/403 (stale or revoked OIDC JWT) is reported as `authFailed` so // the shared policy can force one refresh and retry; any other throw is a // terminal non-auth error folded into the returned verb-result. + // The status of the most recent auth rejection, so a 401/403 that survives the + // refresh + retry can be explained with the real code (defaults to 401). + let lastAuthStatus = 401 const op = async (/** @type {string} */ token) => { try { return { authFailed: false, value: await callRemoteTool({ url: entry.url, token, verb, params }) } } catch (err) { + const authFailed = isAuthError(err) + if (authFailed) lastAuthStatus = Number(/** @type {any} */ (err).status) || lastAuthStatus /** @type {{ ok: false, error: string, exitCode: number }} */ const value = { ok: false, error: err instanceof Error ? err.message : String(err), exitCode: 1 } - return { authFailed: isAuthError(err), value } + return { authFailed, value } } } @@ -77,7 +82,16 @@ export async function runRemoteVerb({ verb, params, target, ctx }) { refresh: () => resolveAccessJwt({ target, env: ctx.env, stateDir, identityBase, forceRefresh: true }), op, }) - return out.ok ? out.value : { ok: false, error: out.error, exitCode: 2 } + if (!out.ok) return { ok: false, error: out.error, exitCode: 2 } + // A 401/403 that survived the one-shot refresh + retry is a dead credential. + // Advise by *why* it is dead - identically to the stdio proxy - rather than + // surfacing client.js's blanket "re-run hyp remote login", which mis-advises + // an env override that re-login can never fix (LLP 0046 D5). + if (out.authFailed) { + const { message, exitCode } = describeAuthRejection({ target, status: lastAuthStatus, resolved }) + return { ok: false, error: message, exitCode } + } + return out.value } catch (refreshErr) { return mapRefreshError(refreshErr, target) } diff --git a/src/core/remote/identity_client.js b/src/core/remote/identity_client.js index f5ebe68..1180e08 100644 --- a/src/core/remote/identity_client.js +++ b/src/core/remote/identity_client.js @@ -22,6 +22,13 @@ import { Attr, getLogger } from '../observability/index.js' */ const TOKEN_TIMEOUT_MS = 30 * 1000 +/** + * Shared base wording for a missing global `fetch`, so every attach entrypoint + * (the verb, the stdio proxy, this identity client) reports the same phrase with + * its own context appended rather than three independently-drifting strings. + */ +export const NO_FETCH_MESSAGE = 'no fetch implementation available' + /** * Error thrown when the token endpoint rejects a refresh with * `invalid_grant` (the refresh row was revoked or expired). The attach path @@ -125,7 +132,7 @@ export async function refreshSession({ identityBase, refreshToken, fetchImpl }) async function postToken({ identityBase, body, fetchImpl, operation }) { const doFetch = fetchImpl ?? /** @type {typeof fetch | undefined} */ (globalThis.fetch) if (typeof doFetch !== 'function') { - throw new Error('no fetch implementation available for the identity client') + throw new Error(`${NO_FETCH_MESSAGE} for the identity client`) } const log = getLogger('remote') const url = `${trimSlash(identityBase)}/token` diff --git a/test/core/remote-verb-oidc.test.js b/test/core/remote-verb-oidc.test.js index 486f51a..500cf41 100644 --- a/test/core/remote-verb-oidc.test.js +++ b/test/core/remote-verb-oidc.test.js @@ -173,3 +173,53 @@ test('a static token that 401s is not retried (cannot refresh)', async (t) => { assert.equal(state.refreshCalls, 0) assert.match(err.join(''), /re-run 'hyp remote login'/) }) + +test('an env-override token that 401s does not advise a re-login it cannot fix', async (t) => { + const hypHome = await tmpHome() + // The env override wins and is never read from the store, so re-login cannot + // fix it. The verb must point at the env var, not tell the user to re-login + // (matching the stdio proxy, LLP 0046 D5). + const state = stubServers(t, { validJwt: 'something-else' }) + const { ctx, err } = ctxWith(hypHome) + ctx.env.HYP_REMOTE_TOKEN_PROD = 'env-bad' + const code = await cmd.run(['SELECT 1', '--remote', 'prod'], ctx) + assert.equal(code, 1) + assert.equal(state.refreshCalls, 0) + const joined = err.join('') + assert.match(joined, /HYP_REMOTE_TOKEN_PROD/) + assert.doesNotMatch(joined, /re-run 'hyp remote login/) +}) + +test('an oidc session whose freshly-refreshed JWT is still rejected gets re-login guidance (exit 2)', async (t) => { + const original = globalThis.fetch + t.after(() => { globalThis.fetch = original }) + const hypHome = await tmpHome() + const stateDir = path.join(hypHome, 'hypaware') + await writeSession(stateDir, 'prod', { refreshToken: 'rt', accessJwt: 'jwt-old', expiresAt: FUTURE, org: 'acme' }) + + // The refresh succeeds (mints jwt-new), but the MCP side rejects every JWT + // (server-side clock skew, or a revocation independent of the refresh row). + // The surviving 401 is a dead session: re-login guidance, exit 2. + let refreshCalls = 0 + globalThis.fetch = /** @type {any} */ (async (/** @type {string} */ url, /** @type {any} */ init) => { + const reply = (/** @type {any} */ obj, status = 200) => ({ + ok: status >= 200 && status < 300, + status, + headers: { get: (/** @type {string} */ k) => k.toLowerCase() === 'content-type' ? 'application/json' : null }, + text: async () => JSON.stringify(obj), + }) + if (String(url).includes('/v1/identity/token')) { + refreshCalls++ + return reply({ access_jwt: 'jwt-new', expires_at: FUTURE, org: 'acme' }) + } + const req = JSON.parse(init.body) + if (req.method === 'initialize') return reply({ jsonrpc: '2.0', id: req.id, result: { protocolVersion: '2025-06-18' } }) + return reply({ jsonrpc: '2.0', id: req.id }, 401) + }) + + const { ctx, err } = ctxWith(hypHome) + const code = await cmd.run(['SELECT 1', '--remote', 'prod'], ctx) + assert.equal(code, 2) + assert.equal(refreshCalls, 1) + assert.match(err.join(''), /remote session expired - re-run 'hyp remote login prod'/) +}) From 6482d06ca967b58e90e6756f9c3ae8153d82ef28 Mon Sep 17 00:00:00 2001 From: Kenny Daniel Date: Mon, 29 Jun 2026 19:09:04 -0700 Subject: [PATCH 43/59] remote/identity: treat a 401 or non-JSON refresh failure as session-expired A revoked refresh row answered with an empty or non-JSON 401 body never became InvalidGrantError, so the user got a generic error instead of re-login guidance. The only credential the token endpoint authenticates is the refresh token, so classify any 401 there as invalid_grant (alongside an explicit invalid_grant at any status), and only fail on a non-JSON body when the response was otherwise ok. --- src/core/remote/identity_client.js | 24 +++++++++++++++++------- test/core/remote-identity-client.test.js | 18 ++++++++++++++++++ 2 files changed, 35 insertions(+), 7 deletions(-) diff --git a/src/core/remote/identity_client.js b/src/core/remote/identity_client.js index 1180e08..794f297 100644 --- a/src/core/remote/identity_client.js +++ b/src/core/remote/identity_client.js @@ -161,12 +161,17 @@ async function postToken({ identityBase, body, fetchImpl, operation }) { } finally { clearTimeout(timer) } + // Parse failure is not fatal on its own: an error response may carry an empty + // or non-JSON body, and we still want to classify it by status below. Only a + // *successful* response with an unparseable body is an error (we can't read + // the tokens out of it). /** @type {any} */ - let json + let json = {} + let parseFailed = false try { json = text ? JSON.parse(text) : {} } catch { - throw new Error(`identity endpoint returned a non-JSON response (HTTP ${res.status})`) + parseFailed = true } if (!res.ok) { const errCode = json && typeof json.error === 'string' ? json.error : undefined @@ -176,15 +181,20 @@ async function postToken({ identityBase, body, fetchImpl, operation }) { [Attr.STATUS]: 'failed', [Attr.ERROR_KIND]: errCode ?? `http_${res.status}`, }) - // Key the typed rejection on the OAuth error code, not the HTTP status: - // RFC 6749 §5.2 returns 400 for `invalid_grant`, but some deployments use - // 401. Either way a revoked/expired refresh row must reach the re-login - // guidance, so trust the body's `error` field over the status code. - if (errCode === 'invalid_grant') { + // A revoked/expired refresh row must reach the re-login guidance even when + // the body is empty or non-JSON: the only credential this endpoint + // authenticates is the refresh token, so a 401 means re-login regardless of + // whether an OAuth error object came back. RFC 6749 §5.2 returns 400 for + // `invalid_grant`, so honor an explicit `invalid_grant` at any status too; + // a different 400 (a malformed request) stays a generic error. + if (errCode === 'invalid_grant' || res.status === 401) { throw new InvalidGrantError() } throw new Error(`identity endpoint rejected the grant (HTTP ${res.status}${errCode ? ` ${errCode}` : ''})`) } + if (parseFailed) { + throw new Error(`identity endpoint returned a non-JSON response (HTTP ${res.status})`) + } log.info('remote.token_ok', { [Attr.COMPONENT]: 'remote-oidc', [Attr.OPERATION]: operation, diff --git a/test/core/remote-identity-client.test.js b/test/core/remote-identity-client.test.js index 2a4188e..1ab34a0 100644 --- a/test/core/remote-identity-client.test.js +++ b/test/core/remote-identity-client.test.js @@ -85,6 +85,24 @@ test('a 401 invalid_grant surfaces a typed InvalidGrantError', async () => { ) }) +test('a 401 with an empty body still surfaces InvalidGrantError (re-login guidance)', async () => { + // An edge proxy may answer the revoked refresh row with a bare 401, no OAuth + // error object. It must still map to session-expired, not a generic error. + const { fetchImpl } = stubFetch({ status: 401, body: '' }) + await assert.rejects( + () => refreshSession({ identityBase: 'https://hyp.internal/v1/identity', refreshToken: 'stale', fetchImpl }), + (err) => err instanceof InvalidGrantError, + ) +}) + +test('a 401 with a non-JSON body still surfaces InvalidGrantError', async () => { + const { fetchImpl } = stubFetch({ status: 401, body: 'unauthorized' }) + await assert.rejects( + () => refreshSession({ identityBase: 'https://hyp.internal/v1/identity', refreshToken: 'stale', fetchImpl }), + (err) => err instanceof InvalidGrantError, + ) +}) + test('a non-invalid_grant error throws a generic error, not InvalidGrantError', async () => { const { fetchImpl } = stubFetch({ status: 500, body: { error: 'server_error' } }) await assert.rejects( From d8de797e1ed85a52a11eb6b5725cd64913b2f2a6 Mon Sep 17 00:00:00 2001 From: Kenny Daniel Date: Mon, 29 Jun 2026 19:09:12 -0700 Subject: [PATCH 44/59] remote/login: --no-browser always selects the browser flow Drop the stdin peek that tried to reconcile --no-browser with a piped token. --no-browser now unconditionally means "browser flow, print the URL"; a piped token without a browser-mode flag still takes the static path, so a token is only ignored when --no-browser is given explicitly. Removes a fragile branch and a duplicated note. --- src/core/cli/remote_commands.js | 33 ++++---------------------- test/core/remote-login-command.test.js | 21 +++++++++------- 2 files changed, 17 insertions(+), 37 deletions(-) diff --git a/src/core/cli/remote_commands.js b/src/core/cli/remote_commands.js index 3efa3ca..8884031 100644 --- a/src/core/cli/remote_commands.js +++ b/src/core/cli/remote_commands.js @@ -133,34 +133,11 @@ export async function runRemoteLogin(argv, ctx, deps = {}) { return runStaticLogin(name, tokenFile, stdin, ctx) } - // `--no-browser` with a piped token is contradictory: the flag is a browser - // mode modifier ("print the URL") but a piped token signals static intent. - // Peek stdin to tell a real token apart from an empty pipe: a non-empty value - // is stored statically (never silently discarded), while an empty pipe falls - // through to the browser flow (which is what `--no-browser` is for). - // `!useStatic` already guarantees no `--token-file` here (it forces the static - // path), so this only needs the browser-mode + piped-stdin conditions. - if (stdinPiped && noBrowser && !forceBrowser) { - /** @type {string} */ - let piped - try { - piped = (await readAllStdin(stdin)).trim() - } catch (err) { - ctx.stderr.write(`hyp remote login: ${err instanceof Error ? err.message : String(err)}\n`) - return 1 - } - if (piped) { - ctx.stderr.write('note: --no-browser is ignored because a token was piped on stdin (storing it statically)\n') - if (org) { - ctx.stderr.write('note: --org is ignored with a static token (it applies to the browser login flow)\n') - } - return persistStaticToken(name, piped, ctx) - } - // Empty pipe: this is the documented "print the URL" path. The browser flow - // below must not re-read the (now exhausted) stdin, so signal it explicitly. - return runBrowserLogin(name, { org, noBrowser }, ctx, deps.login ?? loginWithBrowser) - } - + // Browser flow. `--no-browser` selects it explicitly ("print the URL instead + // of opening one"), so the flag wins outright: with it set we never read stdin + // as a static token. A piped token *without* a browser-mode flag already took + // the static path above (`useStatic`), so nothing is swallowed silently there; + // only an explicit `--no-browser` ignores a pipe, by design. return runBrowserLogin(name, { org, noBrowser }, ctx, deps.login ?? loginWithBrowser) } diff --git a/test/core/remote-login-command.test.js b/test/core/remote-login-command.test.js index 5dd775b..277eca7 100644 --- a/test/core/remote-login-command.test.js +++ b/test/core/remote-login-command.test.js @@ -197,22 +197,25 @@ test('an empty piped stdin (no token) points at --browser instead of just "empty assert.match(err.join(''), /re-run with --browser/) }) -test('--no-browser with a piped token stores it statically instead of discarding it', async () => { +test('--no-browser takes the browser flow even with a piped token (the flag wins)', async () => { const hypHome = await tmpHome() const stdin = { isTTY: false, async *[Symbol.asyncIterator]() { yield Buffer.from('piped-tok\n') }, } - const { ctx, err } = await makeCtx({ hypHome, stdin }) - let called = false - const login = /** @type {any} */ (async () => { called = true; return {} }) + const { ctx } = await makeCtx({ hypHome, stdin }) + /** @type {any} */ let seen + const login = /** @type {any} */ (async (/** @type {any} */ args) => { + seen = args + return { refreshToken: 'rt', accessJwt: 'jwt', expiresAt: '2999-01-01T00:00:00Z', org: 'acme' } + }) const code = await runRemoteLogin(['prod', '--no-browser'], ctx, { login }) assert.equal(code, 0) - assert.equal(called, false) // the browser flow must NOT run; the token wins - assert.match(err.join(''), /--no-browser is ignored because a token was piped/) - const stateDir = path.join(hypHome, 'hypaware') - const creds = await readCredentials(stateDir) - assert.deepEqual(creds.prod, { kind: 'static', token: 'piped-tok' }) + // The flag selects the browser flow (which prints the URL); the pipe is not + // read as a static token. A piped token without --no-browser still takes the + // static path (covered above), so a token is only ignored when --no-browser + // is given explicitly. + assert.equal(seen.noBrowser, true) }) test('--browser overrides a piped stdin token and takes the browser flow', async () => { From 2cb0cc3f7e37b0c1c749a1c6961bc2f652d4bdd7 Mon Sep 17 00:00:00 2001 From: Kenny Daniel Date: Mon, 29 Jun 2026 21:44:32 -0700 Subject: [PATCH 45/59] remote/credentials: serialize oidc refresh with a compare-and-swap commit resolveAccessJwt read the stored record and POSTed the refresh grant outside the write lock, taking the lock only for the final writeSession. Two `hyp` processes sharing one HYP_HOME (a verb call beside a proxy, two MCP clients) could therefore both read the same stale oidc record and both spend the same one-time-use refresh row, and a concurrent `hyp remote login` writing a fresh session could be clobbered by a proxy's later write carrying the superseded token. Refresh now runs through an optimistic compare-and-swap: the network call stays outside the lock (a 30s identity call must not wedge the 5s write lock), but the commit re-reads under the lock and persists only when the stored refresh token still matches the one we refreshed from. A mismatch means a sibling rotated first, so we adopt the stored session when usable or loop and refresh with the rotated token instead of overwriting it. The invalid_grant re-read now happens under the lock too, so a loser that refreshed before the winner persisted reliably sees the winner's write rather than forcing a spurious re-login. --- src/core/remote/credentials.js | 179 +++++++++++++++------- test/core/remote-credentials-oidc.test.js | 21 +++ 2 files changed, 147 insertions(+), 53 deletions(-) diff --git a/src/core/remote/credentials.js b/src/core/remote/credentials.js index b322223..3136525 100644 --- a/src/core/remote/credentials.js +++ b/src/core/remote/credentials.js @@ -40,6 +40,13 @@ const LOCK_TIMEOUT_MS = 5 * 1000 /** A lock file older than this is treated as abandoned by a crashed holder. */ const LOCK_STALE_MS = 30 * 1000 +/** + * Bound on the optimistic refresh compare-and-swap loop. Each extra attempt + * means a sibling rotated the refresh token between our refresh and our commit; + * a handful of rounds is far more than any real contention reaches. + */ +const REFRESH_MAX_ATTEMPTS = 4 + /** * Single-entry parse cache for the credential file. The stdio proxy resolves a * token per forwarded message, so without this every message re-read and @@ -234,17 +241,30 @@ export async function writeToken(stateDir, target, token) { * @ref LLP 0046#d4 [implements]: oidc session written through the same atomic 0600 store as static tokens */ export async function writeSession(stateDir, target, session) { - await withCredentialsLock(stateDir, async () => { - const current = await readRawCredentials(stateDir, { mutable: true }) - current[target] = { - kind: 'oidc', - refreshToken: session.refreshToken, - accessJwt: session.accessJwt, - expiresAt: session.expiresAt, - org: session.org, - } - await writeCredentials(stateDir, current) - }) + await withCredentialsLock(stateDir, () => commitSession(stateDir, target, session)) +} + +/** + * Persist an oidc session WITHOUT taking the write lock: the caller must already + * hold it. {@link writeSession} is the locked public entry point; the refresh + * compare-and-swap in {@link refreshOidcSession} re-reads and writes inside one + * lock acquisition and uses this directly to keep that read-modify-write atomic. + * + * @param {string} stateDir + * @param {string} target + * @param {{ refreshToken: string, accessJwt: string, expiresAt: string, org: string }} session + * @returns {Promise} + */ +async function commitSession(stateDir, target, session) { + const current = await readRawCredentials(stateDir, { mutable: true }) + current[target] = { + kind: 'oidc', + refreshToken: session.refreshToken, + accessJwt: session.accessJwt, + expiresAt: session.expiresAt, + org: session.org, + } + await writeCredentials(stateDir, current) } /** @@ -331,13 +351,10 @@ export async function resolveAccessJwt({ target, env, stateDir, identityBase, no } // A forced refresh is the live-401 retry: drop the parse cache so the read - // below sees the freshest refresh token. A concurrent rotation may have - // rewritten the file with an identical size within one mtime tick, which the - // cache key (path + mtime + size) would otherwise miss - symmetric with the - // invalid_grant re-read below. + // below sees the freshest refresh token a sibling may have just rotated in. if (forceRefresh) rawCache = null const creds = await readCredentials(stateDir) - let entry = creds[target] + const entry = creds[target] if (!entry) { return { ok: false, error: noTokenError(target, envName) } } @@ -347,53 +364,109 @@ export async function resolveAccessJwt({ target, env, stateDir, identityBase, no : { ok: false, error: noTokenError(target, envName) } } - // oidc: refresh if forced (a live 401 retry), or the cached JWT is missing, - // unparseable, or near expiry. - if (forceRefresh || !isFresh(entry, now)) { - if (!identityBase) { - return { ok: false, error: `cannot refresh '${target}': no identity endpoint resolved` } - } + // oidc: a fresh cached JWT needs neither a network call nor the lock. + if (!forceRefresh && isFresh(entry, now)) { + return { ok: true, token: entry.accessJwt, source: 'file', kind: 'oidc' } + } + if (!identityBase) { + return { ok: false, error: `cannot refresh '${target}': no identity endpoint resolved` } + } + return refreshOidcSession({ target, stateDir, identityBase, fetchImpl, now, from: entry, envName }) +} + +/** + * Refresh an oidc session with an optimistic compare-and-swap commit, so two + * `hyp` processes sharing this 0600 store (a verb call beside a proxy, two MCP + * clients) never double-spend a one-time-use refresh row or clobber a session a + * sibling just rotated. + * + * The network refresh runs OUTSIDE the write lock: a 30s identity call must not + * wedge the 5s lock that static-token writes also contend for. Only the commit + * takes the lock, and persists our result solely when the stored refresh token + * still matches the one we refreshed from. A mismatch means a sibling rotated + * first (a parallel proxy refresh, or a concurrent `hyp remote login`); we adopt + * the stored session when it is usable, or loop and refresh again with the + * rotated token, rather than overwriting it. On `invalid_grant` we re-read under + * the lock - which blocks until any in-flight sibling commit lands, closing the + * window where a loser refreshed before the winner persisted - and adopt a + * changed token; an unchanged token is a genuine revocation and re-throws. + * + * @param {{ + * target: string, + * stateDir: string, + * identityBase: string, + * fetchImpl?: typeof fetch, + * now: number, + * from: RemoteOidcRecord, + * envName: string, + * }} args + * @returns {Promise<{ ok: true, token: string, source: 'file', kind: 'oidc' } | { ok: false, error: string }>} + * @ref LLP 0046#d5 [implements]: serialized refresh commit - no double-spend or clobber across concurrent hyp processes + */ +async function refreshOidcSession({ target, stateDir, identityBase, fetchImpl, now, from, envName }) { + for (let attempt = 0; attempt < REFRESH_MAX_ATTEMPTS; attempt++) { let refreshed try { - refreshed = await refreshSession({ identityBase, refreshToken: entry.refreshToken, fetchImpl }) + refreshed = await refreshSession({ identityBase, refreshToken: from.refreshToken, fetchImpl }) } catch (err) { - // Concurrent-rotation tolerance: another `hyp` process sharing this 0600 - // store (a second MCP client, or a verb call alongside a proxy) may have - // refreshed first, consuming our refresh row so this refresh comes back - // `invalid_grant`. Re-read once: if the stored refresh token changed, that - // process already minted a fresh session - adopt it (its JWT if still - // fresh, else refresh with the rotated token) rather than forcing a - // needless re-login. An unchanged token is a real revocation; re-throw. if (!(err instanceof InvalidGrantError)) throw err - // Drop the parse cache before re-reading: we are reacting to a probable - // concurrent write, and a same-size rewrite landing within one mtime tick - // would otherwise be hidden behind the cache, making us miss the rotation. - rawCache = null - const latest = (await readCredentials(stateDir))[target] - if (!latest || latest.kind !== 'oidc' || latest.refreshToken === entry.refreshToken) throw err + // Our refresh row was consumed. Re-read under the lock so a sibling's + // winning commit is guaranteed visible (it holds the lock across its + // write). A changed-and-fresh token is an adoptable rotation; a changed + // stale token means loop and refresh with it; an unchanged token is a + // genuine revocation - re-throw. + const latest = await withCredentialsLock(stateDir, () => readOidcRecord(stateDir, target)) + if (!latest || latest.refreshToken === from.refreshToken) throw err if (isFresh(latest, now)) { return { ok: true, token: latest.accessJwt, source: 'file', kind: 'oidc' } } - entry = latest - refreshed = await refreshSession({ identityBase, refreshToken: latest.refreshToken, fetchImpl }) + from = latest + continue } - /** @type {RemoteOidcRecord} */ - // `org` is fixed for the refresh token's life; keep the stored one when the - // refresh response omits it (refreshSession returns '' in that case). A - // rotated refresh token (one-time-use servers) replaces the stored one; - // otherwise keep it, so the next refresh sends a live token rather than a - // consumed one that would 401 and force a re-login every session. - const next = { - ...entry, - accessJwt: refreshed.accessJwt, - expiresAt: refreshed.expiresAt, - org: refreshed.org || entry.org, - refreshToken: refreshed.refreshToken || entry.refreshToken, + const outcome = await withCredentialsLock(stateDir, async () => { + const latest = await readOidcRecord(stateDir, target) + if (latest && latest.refreshToken !== from.refreshToken) { + // A sibling rotated the one-time-use session under us while our refresh + // was in flight. Adopt a usable stored session; otherwise loop and + // refresh with the rotated token rather than clobber it. + if (isFresh(latest, now)) return { token: latest.accessJwt } + return { retryFrom: latest } + } + // No rotation: our result is authoritative. A rotated refresh token (from + // a one-time-use server) replaces the consumed one, otherwise the stored + // token is kept; `org` is fixed for the token's life, so an omitted one + // keeps the stored value. + const base = latest || from + await commitSession(stateDir, target, { + refreshToken: refreshed.refreshToken || base.refreshToken, + accessJwt: refreshed.accessJwt, + expiresAt: refreshed.expiresAt, + org: refreshed.org || base.org, + }) + return { token: refreshed.accessJwt } + }) + if (outcome.token !== undefined) { + return { ok: true, token: outcome.token, source: 'file', kind: 'oidc' } } - await writeSession(stateDir, target, next) - return { ok: true, token: refreshed.accessJwt, source: 'file', kind: 'oidc' } + from = /** @type {RemoteOidcRecord} */ (outcome.retryFrom) } - return { ok: true, token: entry.accessJwt, source: 'file', kind: 'oidc' } + // The stored session kept rotating under us past the attempt bound; treat it + // as unresolvable rather than spin forever. + return { ok: false, error: noTokenError(target, envName) } +} + +/** + * Read a target's normalized record inside a held lock, returning it only when + * it is an oidc session (a static record or absent target yields null). The + * caller's lock acquisition dropped the parse cache, so this reads from disk. + * + * @param {string} stateDir + * @param {string} target + * @returns {Promise} + */ +async function readOidcRecord(stateDir, target) { + const rec = (await readCredentials(stateDir))[target] + return rec && rec.kind === 'oidc' ? rec : null } /** diff --git a/test/core/remote-credentials-oidc.test.js b/test/core/remote-credentials-oidc.test.js index ce4b818..fdba5c6 100644 --- a/test/core/remote-credentials-oidc.test.js +++ b/test/core/remote-credentials-oidc.test.js @@ -223,6 +223,27 @@ test('resolveAccessJwt keeps the stored refresh token when the server does not r assert.equal(/** @type {any} */ (creds.prod).refreshToken, 'rt-stable') }) +test('resolveAccessJwt adopts a sibling login instead of clobbering it on a successful refresh', async () => { + const dir = await tmpState() + await writeSession(dir, 'prod', { refreshToken: 'rt-old', accessJwt: 'jwt-old', expiresAt: PAST, org: 'acme' }) + let calls = 0 + // While our refresh is in flight, a concurrent `hyp remote login` rotates the + // session on disk to a brand-new refresh token. Our refresh still succeeds, + // but its result was derived from the now-superseded rt-old, so committing it + // would clobber the fresh login. The compare-and-swap must adopt the login's + // session instead. + const fetchImpl = /** @type {any} */ (async () => { + calls++ + await writeSession(dir, 'prod', { refreshToken: 'rt-login', accessJwt: 'jwt-login', expiresAt: FUTURE, org: 'acme' }) + return { ok: true, status: 200, text: async () => JSON.stringify({ access_jwt: 'jwt-new', expires_at: FUTURE, org: 'acme', refresh_token: 'rt-new' }) } + }) + const r = await resolveAccessJwt({ target: 'prod', env: {}, stateDir: dir, identityBase: 'https://h/v1/identity', fetchImpl }) + assert.equal(/** @type {any} */ (r).token, 'jwt-login') + assert.equal(calls, 1) + // The login's refresh token survives; our refresh result did not overwrite it. + assert.equal(/** @type {any} */ ((await readCredentials(dir)).prod).refreshToken, 'rt-login') +}) + test('resolveAccessJwt keeps the stored org when a refresh response omits it', async () => { const dir = await tmpState() await writeSession(dir, 'prod', { refreshToken: 'rt', accessJwt: 'jwt-old', expiresAt: PAST, org: 'acme' }) From ae39cb559ee9fa177e7ae59dcbf4de56ffbde6f8 Mon Sep 17 00:00:00 2001 From: Kenny Daniel Date: Mon, 29 Jun 2026 21:45:27 -0700 Subject: [PATCH 46/59] remote/loopback: surface a provider error before the state check The callback handler ran the CSRF state comparison before the error check, so an error redirect that omits `state` (RFC 6749 only requires echoing it when present, and some providers drop it on errors like access_denied) fell into the state branch. The user was told their login hit a "mismatched state" and got the headless static-token hint, instead of the real "login failed: access_denied". Surface an `error` callback first. That branch never reads the authorization code, so the CSRF guard (which exists only to stop an attacker's code being adopted) does not apply, and the error value is already sanitized before it reaches the message, log, and terminal. The state guard still runs ahead of the code path. --- src/core/remote/loopback.js | 20 +++++++++++++------- test/core/remote-loopback.test.js | 15 +++++++++++++++ 2 files changed, 28 insertions(+), 7 deletions(-) diff --git a/src/core/remote/loopback.js b/src/core/remote/loopback.js index 4c006ee..f5986c9 100644 --- a/src/core/remote/loopback.js +++ b/src/core/remote/loopback.js @@ -98,13 +98,12 @@ export function startLoopbackReceiver({ state, timeoutMs = DEFAULT_TIMEOUT_MS }) const error = params.get('error') const code = params.get('code') - // A callback whose state does not match is rejected without reading the - // code (CSRF guard). Serve a neutral page either way. - if (returnedState !== state) { - respond(res, 'Login state mismatch. You can close this tab.') - fail(new Error('loopback received a callback with a mismatched state'), 'state_mismatch') - return - } + // An `error` callback is surfaced before the state check: it carries no + // authorization code, so the CSRF guard (which exists only to stop an + // attacker's code being adopted) does not apply, and a provider that drops + // `state` on an error redirect (RFC 6749 only requires echoing it when + // present) must still report the real failure rather than a misleading + // "state mismatch" plus a headless-token hint. if (params.has('error')) { // The redirect's `error` is attacker-influenceable (anyone can hit the // loopback port). Bound it to a safe token before it reaches the error @@ -115,6 +114,13 @@ export function startLoopbackReceiver({ state, timeoutMs = DEFAULT_TIMEOUT_MS }) fail(Object.assign(new Error(`login failed: ${safeError}`), { callbackError: safeError }), safeError) return } + // A callback whose state does not match is rejected without reading the + // code (CSRF guard). Serve a neutral page either way. + if (returnedState !== state) { + respond(res, 'Login state mismatch. You can close this tab.') + fail(new Error('loopback received a callback with a mismatched state'), 'state_mismatch') + return + } if (!code) { respond(res, 'Login response was missing a code. You can close this tab.') fail(new Error('loopback callback had neither code nor error'), 'no_code') diff --git a/test/core/remote-loopback.test.js b/test/core/remote-loopback.test.js index f4ee1c5..52a27fd 100644 --- a/test/core/remote-loopback.test.js +++ b/test/core/remote-loopback.test.js @@ -75,6 +75,21 @@ test('an error= callback rejects with the error code attached', async () => { await assertion }) +test('an error= callback with no state reports the real error, not a state mismatch', async () => { + const recv = await startLoopbackReceiver({ state: 's1', timeoutMs: 2000 }) + // A provider that drops `state` on an error redirect must still surface the + // denial, not be misreported as a CSRF/state mismatch (which also appends a + // misleading headless-token hint downstream). + const assertion = assert.rejects(() => recv.waitForCode(), (err) => { + assert.match(/** @type {Error} */ (err).message, /access_denied/) + assert.doesNotMatch(/** @type {Error} */ (err).message, /mismatched state/) + assert.equal(/** @type {any} */ (err).callbackError, 'access_denied') + return true + }) + await hitCallback(recv.redirectUri, { error: 'access_denied' }) + await assertion +}) + test('a timeout rejects', async () => { const recv = await startLoopbackReceiver({ state: 's1', timeoutMs: 50 }) await assert.rejects(() => recv.waitForCode(), /timed out/) From 55151277c976faaef75fa1c7fda107890a1c92a6 Mon Sep 17 00:00:00 2001 From: Kenny Daniel Date: Mon, 29 Jun 2026 21:48:03 -0700 Subject: [PATCH 47/59] remote/commands: guard static-login and remove against the new lock throw writeToken and removeToken now run inside withCredentialsLock, which throws "timed out acquiring the remote credentials lock" under contention. Their call sites in the static-login (persistStaticToken) and remove (runRemoteRemove) paths were unguarded, so a contended write surfaced a raw uncaught error instead of the friendly `hyp remote login:` / `hyp remote remove:` message that held before the lock existed. The remove path also already mutated config before the throw, leaving a partial removal with no explanation. Wrap both calls. Static login keeps its `hyp remote login:` contract and exit 1; remove reports the lock failure and, since the config edit already landed, tells the user the stored token could not be removed. --- src/core/cli/remote_commands.js | 23 +++++++++++++++++++++-- test/core/remote-login-command.test.js | 26 +++++++++++++++++++++++++- 2 files changed, 46 insertions(+), 3 deletions(-) diff --git a/src/core/cli/remote_commands.js b/src/core/cli/remote_commands.js index 8884031..b1a9bf5 100644 --- a/src/core/cli/remote_commands.js +++ b/src/core/cli/remote_commands.js @@ -213,7 +213,14 @@ async function runStaticLogin(name, tokenFile, stdin, ctx) { */ async function persistStaticToken(name, token, ctx) { const stateDir = readObservabilityEnv(ctx.env).stateDir - await writeToken(stateDir, name, token) + try { + await writeToken(stateDir, name, token) + } catch (err) { + // writeToken now contends for the cross-process credentials lock and can + // throw a lock timeout; keep the friendly `hyp remote login:` contract. + ctx.stderr.write(`hyp remote login: ${err instanceof Error ? err.message : String(err)}\n`) + return 1 + } ctx.stdout.write(`stored query-scoped token for '${name}' (mode 0600)\n`) // A friendly nudge if the target isn't configured: the token still @@ -365,7 +372,19 @@ export async function runRemoteRemove(argv, ctx) { return 1 } const stateDir = readObservabilityEnv(ctx.env).stateDir - const removedToken = await removeToken(stateDir, name) + let removedToken = false + try { + removedToken = await removeToken(stateDir, name) + } catch (err) { + // removeToken now contends for the cross-process credentials lock and can + // throw a lock timeout. The config edit above already landed, so report the + // partial state rather than letting a raw error escape. + ctx.stderr.write(`hyp remote remove: ${err instanceof Error ? err.message : String(err)}\n`) + if (removedConfig) { + ctx.stderr.write(` (removed '${name}' from config; its stored token could not be removed)\n`) + } + return 1 + } if (!removedConfig && !removedToken) { ctx.stderr.write(`hyp remote remove: no target or token named '${name}'\n`) return 1 diff --git a/test/core/remote-login-command.test.js b/test/core/remote-login-command.test.js index 277eca7..4d68050 100644 --- a/test/core/remote-login-command.test.js +++ b/test/core/remote-login-command.test.js @@ -6,7 +6,7 @@ import fs from 'node:fs/promises' import os from 'node:os' import path from 'node:path' -import { runRemoteLogin } from '../../src/core/cli/remote_commands.js' +import { runRemoteLogin, runRemoteRemove } from '../../src/core/cli/remote_commands.js' import { deriveIdentityBase, readCredentials } from '../../src/core/remote/credentials.js' async function tmpHome() { @@ -165,6 +165,30 @@ test('the static --token-file path is unchanged (stores kind: static)', async () assert.deepEqual(creds.prod, { kind: 'static', token: 'sk-static' }) }) +test('a static login write failure keeps the friendly hyp remote login: message', async () => { + const hypHome = await tmpHome() + // Put a file where the state dir must be created, so writeToken's lock setup + // (mkdir) fails like a contended lock would, surfacing a thrown error. + await fs.writeFile(path.join(hypHome, 'hypaware'), 'not a dir') + const tokenFile = path.join(hypHome, 'tok.txt') + await fs.writeFile(tokenFile, 'sk-static\n') + const { ctx, err } = await makeCtx({ hypHome }) + const code = await runRemoteLogin(['prod', '--token-file', tokenFile], ctx, {}) + assert.equal(code, 1) + assert.match(err.join(''), /^hyp remote login: /m) +}) + +test('a remove whose token removal fails reports the partial state, not a raw throw', async () => { + const hypHome = await tmpHome() + await fs.writeFile(path.join(hypHome, 'hypaware'), 'not a dir') + const { ctx, err } = await makeCtx({ hypHome }) + const code = await runRemoteRemove(['prod'], ctx) + assert.equal(code, 1) + assert.match(err.join(''), /^hyp remote remove: /m) + // The config edit already landed, so the user is told the token lingered. + assert.match(err.join(''), /removed 'prod' from config; its stored token could not be removed/) +}) + test('a piped stdin token still takes the static path', async () => { const hypHome = await tmpHome() // A non-TTY stdin that yields a token, the way a piped `echo tok |` does. From 5662d2f19b3e6d9b2083d77b87d35079733bbc8d Mon Sep 17 00:00:00 2001 From: Kenny Daniel Date: Mon, 29 Jun 2026 21:49:21 -0700 Subject: [PATCH 48/59] remote/identity: treat an empty 2xx token body as transient, not missing-field safeText returns '' for both an empty body and a mid-body read failure (a reset connection). On a successful token response that empty string parsed to {} with parseFailed=false, so it slipped past the non-JSON guard and reached the field extractors, which threw "identity response missing 'access_jwt'" - a permanent-looking contract error for what was a transient network blip during a silent refresh. Reject an empty body on a 2xx with an explicit "empty response - try again" error before the field extraction, so the failure reads as transient. --- src/core/remote/identity_client.js | 11 +++++++++-- test/core/remote-identity-client.test.js | 15 +++++++++++++++ 2 files changed, 24 insertions(+), 2 deletions(-) diff --git a/src/core/remote/identity_client.js b/src/core/remote/identity_client.js index 794f297..55c1145 100644 --- a/src/core/remote/identity_client.js +++ b/src/core/remote/identity_client.js @@ -163,8 +163,8 @@ async function postToken({ identityBase, body, fetchImpl, operation }) { } // Parse failure is not fatal on its own: an error response may carry an empty // or non-JSON body, and we still want to classify it by status below. Only a - // *successful* response with an unparseable body is an error (we can't read - // the tokens out of it). + // *successful* response with an empty or unparseable body is an error (we + // can't read the tokens out of it) - see the post-status guards. /** @type {any} */ let json = {} let parseFailed = false @@ -195,6 +195,13 @@ async function postToken({ identityBase, body, fetchImpl, operation }) { if (parseFailed) { throw new Error(`identity endpoint returned a non-JSON response (HTTP ${res.status})`) } + if (text === '') { + // safeText returns '' for both an empty body and a mid-body read failure + // (a reset connection). On a 2xx that is a transient truncation, not a + // contract violation, so fail with that framing rather than letting the + // field extractors below report a misleading "missing 'access_jwt'". + throw new Error(`identity endpoint returned an empty response (HTTP ${res.status}) - try again`) + } log.info('remote.token_ok', { [Attr.COMPONENT]: 'remote-oidc', [Attr.OPERATION]: operation, diff --git a/test/core/remote-identity-client.test.js b/test/core/remote-identity-client.test.js index 1ab34a0..4d6f9d7 100644 --- a/test/core/remote-identity-client.test.js +++ b/test/core/remote-identity-client.test.js @@ -123,6 +123,21 @@ test('a response missing access_jwt is rejected', async () => { ) }) +test('a 2xx with an empty body fails as transient, not a misleading missing-field error', async () => { + // safeText returns '' for both an empty body and a mid-body read failure; on a + // success status that is a transient truncation, so it must not surface as + // "missing 'access_jwt'" (which reads like a permanent contract violation). + const { fetchImpl } = stubFetch({ status: 200, body: '' }) + await assert.rejects( + () => refreshSession({ identityBase: 'https://hyp.internal/v1/identity', refreshToken: 'rt', fetchImpl }), + (err) => { + assert.match(/** @type {Error} */ (err).message, /empty response/) + assert.doesNotMatch(/** @type {Error} */ (err).message, /missing/) + return true + }, + ) +}) + test('a non-date expires_at is rejected at refresh time, not stored to loop forever', async () => { const { fetchImpl } = stubFetch({ body: { access_jwt: 'jwt', expires_at: '1719600000', org: 'acme' } }) await assert.rejects( From 82b5fe089455ad77beae42a88eec46b36c28ae5c Mon Sep 17 00:00:00 2001 From: Kenny Daniel Date: Mon, 29 Jun 2026 21:50:23 -0700 Subject: [PATCH 49/59] remote/credentials: drop an empty static token on read normalizeRecord accepted a record whose `token` was the empty string as a present static record, so `hyp remote list` reported it as `stored` while resolveToken and resolveAccessJwt both rejected an empty token with the "run hyp remote login" guidance - contradictory signals for the same target. An empty token is no credential, so normalize it to absent; list and the resolvers now agree the target is logged out. --- src/core/remote/credentials.js | 7 +++++-- test/core/remote-credentials-oidc.test.js | 10 ++++++++++ 2 files changed, 15 insertions(+), 2 deletions(-) diff --git a/src/core/remote/credentials.js b/src/core/remote/credentials.js index 3136525..b7343ac 100644 --- a/src/core/remote/credentials.js +++ b/src/core/remote/credentials.js @@ -206,8 +206,11 @@ function normalizeRecord(entry) { // record that also carries a static `token` is not hijacked away from it. return oidc(e.accessJwt) } - // Legacy / static: a bare token with no (or static) kind. - if (typeof e.token === 'string') { + // Legacy / static: a bare token with no (or static) kind. An empty token is + // no credential at all - resolveToken/resolveAccessJwt both reject it - so + // drop it here too, otherwise `remote list` would report a present-but-unusable + // record as `stored` while every query says the target is logged out. + if (typeof e.token === 'string' && e.token.length > 0) { return { kind: 'static', token: e.token } } return null diff --git a/test/core/remote-credentials-oidc.test.js b/test/core/remote-credentials-oidc.test.js index fdba5c6..e670332 100644 --- a/test/core/remote-credentials-oidc.test.js +++ b/test/core/remote-credentials-oidc.test.js @@ -79,6 +79,16 @@ test('an oidc record with a refresh token but no cached accessJwt is kept (refre assert.deepEqual((await readCredentials(dir)).prod, { kind: 'oidc', refreshToken: 'rt', accessJwt: '', expiresAt: '', org: '' }) }) +test('a static record with an empty token is dropped on read (not reported as stored)', async () => { + const dir = await tmpState() + // A hand-edited / partially-written record with no usable token. It must read + // as absent so `remote list` and the resolvers agree it is logged out. + await fs.writeFile(remoteCredentialsPath(dir), JSON.stringify({ prod: { token: '' } })) + assert.deepEqual(await readCredentials(dir), {}) + const r = await resolveToken({ target: 'prod', env: {}, stateDir: dir }) + assert.equal(r.ok, false) +}) + test('an oidc record with neither a refresh token nor a static token is dropped on read', async () => { const dir = await tmpState() await fs.writeFile(remoteCredentialsPath(dir), JSON.stringify({ prod: { kind: 'oidc', accessJwt: 'orphan-jwt' } })) From da6aabde2b9267539768b303f45c0bf908c53f31 Mon Sep 17 00:00:00 2001 From: Kenny Daniel Date: Mon, 29 Jun 2026 21:52:10 -0700 Subject: [PATCH 50/59] mcp: share one MCP request-header builder across client and proxy The Streamable-HTTP request header map (content-type, the JSON+SSE accept, bearer, mcp-session-id) was built byte-identically in two places: the in-process client's rpc() and the stdio proxy's forward(). A protocol header change in one would silently miss the other, drifting `hyp --remote` from `hyp mcp`. Extract mcpRequestHeaders() in client.js (alongside the already-shared isAuthStatus and parseRpcResponse) and call it from both. --- src/core/mcp/client.js | 26 +++++++++++++++++++------- src/core/mcp/proxy.js | 10 ++-------- 2 files changed, 21 insertions(+), 15 deletions(-) diff --git a/src/core/mcp/client.js b/src/core/mcp/client.js index 85aae8a..f1c135a 100644 --- a/src/core/mcp/client.js +++ b/src/core/mcp/client.js @@ -51,13 +51,7 @@ export function createHttpMcpClient(opts) { method, ...(params !== undefined ? { params } : {}), } - /** @type {Record} */ - const headers = { - 'content-type': 'application/json', - accept: 'application/json, text/event-stream', - ...(opts.token ? { authorization: `Bearer ${opts.token}` } : {}), - ...(sessionId ? { 'mcp-session-id': sessionId } : {}), - } + const headers = mcpRequestHeaders({ token: opts.token, sessionId }) const res = await doFetch(opts.url, { method: 'POST', headers, body: JSON.stringify(body) }) const sid = res.headers?.get?.('mcp-session-id') if (sid) sessionId = sid @@ -106,6 +100,24 @@ export function createHttpMcpClient(opts) { } } +/** + * Build the headers an MCP Streamable-HTTP POST needs: JSON + SSE content + * negotiation, an optional bearer, and the session id once the server has + * issued one. The single home for the request header shape, shared with the + * stdio proxy's per-message forward so the two transports can't drift. + * + * @param {{ token?: string, sessionId?: string }} args + * @returns {Record} + */ +export function mcpRequestHeaders({ token, sessionId }) { + return { + 'content-type': 'application/json', + accept: 'application/json, text/event-stream', + ...(token ? { authorization: `Bearer ${token}` } : {}), + ...(sessionId ? { 'mcp-session-id': sessionId } : {}), + } +} + /** * Whether an HTTP status is an auth rejection (401/403) eligible for a one-shot * refresh + retry. The single home for "which statuses mean auth", shared with diff --git a/src/core/mcp/proxy.js b/src/core/mcp/proxy.js index 999845f..4ddf65f 100644 --- a/src/core/mcp/proxy.js +++ b/src/core/mcp/proxy.js @@ -6,7 +6,7 @@ import { Attr, getLogger } from '../observability/index.js' import { readObservabilityEnv } from '../observability/env.js' import { attachWithRefresh, deriveIdentityBase, describeAuthRejection, resolveAccessJwt, resolveToken } from '../remote/credentials.js' import { describeRefreshError, NO_FETCH_MESSAGE } from '../remote/identity_client.js' -import { isAuthStatus, parseRpcResponse } from './client.js' +import { isAuthStatus, mcpRequestHeaders, parseRpcResponse } from './client.js' import { INTERNAL_ERROR, jsonRpcError } from './jsonrpc.js' import { serveStdio } from './stdio.js' @@ -70,13 +70,7 @@ export async function runMcpProxy({ target, ctx }) { /** Forward one message to the remote with the given bearer token. */ const forward = (/** @type {any} */ message, /** @type {string} */ token) => { - /** @type {Record} */ - const headers = { - 'content-type': 'application/json', - accept: 'application/json, text/event-stream', - authorization: `Bearer ${token}`, - ...(sessionId ? { 'mcp-session-id': sessionId } : {}), - } + const headers = mcpRequestHeaders({ token, sessionId }) return fetchImpl(entry.url, { method: 'POST', headers, body: JSON.stringify(message) }) } From 7390eed44b75fdad9bd77e566ab6568fd78c1605 Mon Sep 17 00:00:00 2001 From: Kenny Daniel Date: Mon, 29 Jun 2026 23:34:07 -0700 Subject: [PATCH 51/59] remote/credentials: don't resurrect a session removed during refresh The OIDC refresh compare-and-swap re-read the stored record under the lock but fell back to the in-flight `from` record when the row was gone (`base = latest || from`), so a concurrent `hyp remote remove` landing while a refresh was in flight got its deletion clobbered: the commit wrote the stale session back. Treat a missing record under the lock as a removal and decline to commit, returning the normal no-token error. --- src/core/remote/credentials.js | 14 ++++++++++---- test/core/remote-credentials-oidc.test.js | 21 +++++++++++++++++++++ 2 files changed, 31 insertions(+), 4 deletions(-) diff --git a/src/core/remote/credentials.js b/src/core/remote/credentials.js index b7343ac..2a90180 100644 --- a/src/core/remote/credentials.js +++ b/src/core/remote/credentials.js @@ -428,7 +428,13 @@ async function refreshOidcSession({ target, stateDir, identityBase, fetchImpl, n } const outcome = await withCredentialsLock(stateDir, async () => { const latest = await readOidcRecord(stateDir, target) - if (latest && latest.refreshToken !== from.refreshToken) { + if (!latest) { + // `hyp remote remove` deleted the session while our refresh was in + // flight. Honor the deletion under the lock instead of writing the + // stale row back and resurrecting removed credentials. + return { removed: true } + } + if (latest.refreshToken !== from.refreshToken) { // A sibling rotated the one-time-use session under us while our refresh // was in flight. Adopt a usable stored session; otherwise loop and // refresh with the rotated token rather than clobber it. @@ -439,15 +445,15 @@ async function refreshOidcSession({ target, stateDir, identityBase, fetchImpl, n // a one-time-use server) replaces the consumed one, otherwise the stored // token is kept; `org` is fixed for the token's life, so an omitted one // keeps the stored value. - const base = latest || from await commitSession(stateDir, target, { - refreshToken: refreshed.refreshToken || base.refreshToken, + refreshToken: refreshed.refreshToken || latest.refreshToken, accessJwt: refreshed.accessJwt, expiresAt: refreshed.expiresAt, - org: refreshed.org || base.org, + org: refreshed.org || latest.org, }) return { token: refreshed.accessJwt } }) + if (outcome.removed) return { ok: false, error: noTokenError(target, envName) } if (outcome.token !== undefined) { return { ok: true, token: outcome.token, source: 'file', kind: 'oidc' } } diff --git a/test/core/remote-credentials-oidc.test.js b/test/core/remote-credentials-oidc.test.js index e670332..5e34b2d 100644 --- a/test/core/remote-credentials-oidc.test.js +++ b/test/core/remote-credentials-oidc.test.js @@ -254,6 +254,27 @@ test('resolveAccessJwt adopts a sibling login instead of clobbering it on a succ assert.equal(/** @type {any} */ ((await readCredentials(dir)).prod).refreshToken, 'rt-login') }) +test('resolveAccessJwt honors a concurrent remove instead of resurrecting the session', async () => { + const dir = await tmpState() + await writeSession(dir, 'prod', { refreshToken: 'rt-old', accessJwt: 'jwt-old', expiresAt: PAST, org: 'acme' }) + let calls = 0 + // While our refresh is in flight, the user runs `hyp remote remove prod`, + // deleting the row. Our refresh still succeeds, but committing its result + // would write the removed session back. The compare-and-swap must observe the + // deletion under the lock and decline to resurrect it. + const fetchImpl = /** @type {any} */ (async () => { + calls++ + await removeToken(dir, 'prod') + return { ok: true, status: 200, text: async () => JSON.stringify({ access_jwt: 'jwt-new', expires_at: FUTURE, org: 'acme', refresh_token: 'rt-new' }) } + }) + const r = await resolveAccessJwt({ target: 'prod', env: {}, stateDir: dir, identityBase: 'https://h/v1/identity', fetchImpl }) + assert.equal(r.ok, false) + assert.match(/** @type {any} */ (r).error, /no token for 'prod' - run 'hyp remote login prod'/) + assert.equal(calls, 1) + // The removed session stays removed; nothing was written back. + assert.equal(/** @type {any} */ ((await readCredentials(dir)).prod), undefined) +}) + test('resolveAccessJwt keeps the stored org when a refresh response omits it', async () => { const dir = await tmpState() await writeSession(dir, 'prod', { refreshToken: 'rt', accessJwt: 'jwt-old', expiresAt: PAST, org: 'acme' }) From 4cb8898f6738abb180a68c6c61d913131ac57405 Mon Sep 17 00:00:00 2001 From: Kenny Daniel Date: Tue, 30 Jun 2026 10:13:30 -0700 Subject: [PATCH 52/59] remote/identity: don't borrow refresh wording on an authorization_code 401 A rejected authorization code (expired, replayed, or mis-scoped client) went through the same 401 -> InvalidGrantError mapping as a refresh, so a first-time login reported 'refresh token was rejected' for a credential that never had a refresh token. Map an authorization_code 401 to a code-specific message instead. --- src/core/remote/identity_client.js | 6 ++++++ test/core/remote-identity-client.test.js | 14 ++++++++++++++ 2 files changed, 20 insertions(+) diff --git a/src/core/remote/identity_client.js b/src/core/remote/identity_client.js index 55c1145..4bfec10 100644 --- a/src/core/remote/identity_client.js +++ b/src/core/remote/identity_client.js @@ -188,6 +188,12 @@ async function postToken({ identityBase, body, fetchImpl, operation }) { // `invalid_grant`, so honor an explicit `invalid_grant` at any status too; // a different 400 (a malformed request) stays a generic error. if (errCode === 'invalid_grant' || res.status === 401) { + // The authorization_code grant has no refresh token: a rejected code + // (expired, replayed, or mis-scoped client) must not borrow the refresh + // wording, or a first-time login reports a refresh token it never had. + if (body.grant_type === 'authorization_code') { + throw new Error('the authorization code was rejected (it may have expired or already been used) - run the login again') + } throw new InvalidGrantError() } throw new Error(`identity endpoint rejected the grant (HTTP ${res.status}${errCode ? ` ${errCode}` : ''})`) diff --git a/test/core/remote-identity-client.test.js b/test/core/remote-identity-client.test.js index 4d6f9d7..3b6d8d8 100644 --- a/test/core/remote-identity-client.test.js +++ b/test/core/remote-identity-client.test.js @@ -85,6 +85,20 @@ test('a 401 invalid_grant surfaces a typed InvalidGrantError', async () => { ) }) +test('a 401 on the authorization_code grant does not borrow the refresh-token wording', async () => { + // A first-time login has no refresh token, so a rejected code (expired, + // replayed, mis-scoped client) must not report "refresh token was rejected". + const { fetchImpl } = stubFetch({ status: 401, body: { error: 'invalid_grant' } }) + await assert.rejects( + () => exchangeCode({ identityBase: 'https://hyp.internal/v1/identity', code: 'spent', codeVerifier: 'v', fetchImpl }), + (err) => { + assert.ok(!(err instanceof InvalidGrantError)) + assert.match(/** @type {Error} */ (err).message, /authorization code was rejected/) + return true + }, + ) +}) + test('a 401 with an empty body still surfaces InvalidGrantError (re-login guidance)', async () => { // An edge proxy may answer the revoked refresh row with a bare 401, no OAuth // error object. It must still map to session-expired, not a generic error. From 5ed5eef621077cfed81082000cd061e4ee4df8c0 Mon Sep 17 00:00:00 2001 From: Kenny Daniel Date: Tue, 30 Jun 2026 10:13:36 -0700 Subject: [PATCH 53/59] remote/login: don't blame the browser flow when only the session write fails In runBrowserLogin the single-use code exchange and writeSession shared one try/catch, so a write failure (most likely a lock timeout under a concurrent hyp process) printed the generic login-failed message plus the headless 'no browser' hint, even though the sign-in succeeded. Split the two phases: a post-exchange write failure now reports a store failure and points at the lock, without implying the browser login failed. --- src/core/cli/remote_commands.js | 21 +++++++++++++++++---- test/core/remote-login-command.test.js | 18 ++++++++++++++++++ 2 files changed, 35 insertions(+), 4 deletions(-) diff --git a/src/core/cli/remote_commands.js b/src/core/cli/remote_commands.js index b1a9bf5..68bdffd 100644 --- a/src/core/cli/remote_commands.js +++ b/src/core/cli/remote_commands.js @@ -18,6 +18,7 @@ import { loginWithBrowser } from '../remote/oidc_login.js' /** * @import { CommandRunContext } from '../../../collectivus-plugin-kernel-types.js' + * @import { OidcSession } from '../../../src/core/remote/types.js' */ /** @@ -258,16 +259,15 @@ async function runBrowserLogin(name, { org, noBrowser }, ctx, login) { } const stateDir = readObservabilityEnv(ctx.env).stateDir + /** @type {OidcSession} */ + let session try { - const session = await login({ + session = await login({ identityBase, org, noBrowser, print: (line) => ctx.stderr.write(`${line}\n`), }) - await writeSession(stateDir, name, session) - ctx.stdout.write(`logged in to '${name}' as org '${session.org}' (session stored, mode 0600)\n`) - return 0 } catch (err) { const callbackError = /** @type {any} */ (err)?.callbackError ctx.stderr.write(`hyp remote login: ${explainLoginError(callbackError, err)}\n`) @@ -280,6 +280,19 @@ async function runBrowserLogin(name, { org, noBrowser }, ctx, login) { } return 1 } + // The single-use code is already spent by here, so a write failure (most + // likely a lock timeout under a concurrent hyp process) is not a login + // failure: say the sign-in worked but the store did not, and do not print the + // headless hint, which would wrongly imply the browser flow itself failed. + try { + await writeSession(stateDir, name, session) + } catch (err) { + ctx.stderr.write(`hyp remote login: signed in but could not store the session: ${err instanceof Error ? err.message : String(err)}\n`) + ctx.stderr.write(" (re-run 'hyp remote login' once any other hyp process releases the credentials lock)\n") + return 1 + } + ctx.stdout.write(`logged in to '${name}' as org '${session.org}' (session stored, mode 0600)\n`) + return 0 } /** diff --git a/test/core/remote-login-command.test.js b/test/core/remote-login-command.test.js index 4d68050..f6b171b 100644 --- a/test/core/remote-login-command.test.js +++ b/test/core/remote-login-command.test.js @@ -64,6 +64,24 @@ test('browser mode forwards --org and the derived identity base, then stores the assert.equal(/** @type {any} */ (creds.prod).refreshToken, 'rt') }) +test('a successful sign-in whose session write fails reports a store failure, not a login failure', async () => { + const hypHome = await tmpHome() + const { ctx, out, err } = await makeCtx({ hypHome }) + // Make the session write fail: put a plain file where the state dir must be, + // so withCredentialsLock's mkdir throws. The single-use code is already spent. + await fs.writeFile(path.join(hypHome, 'hypaware'), 'not a dir') + const login = /** @type {any} */ (async () => ({ + refreshToken: 'rt', accessJwt: 'jwt', expiresAt: '2999-01-01T00:00:00Z', org: 'acme', + })) + + const code = await runRemoteLogin(['prod'], ctx, { login }) + assert.equal(code, 1) + // The browser flow itself worked, so do not blame it or print the headless hint. + assert.match(err.join(''), /signed in but could not store the session/) + assert.doesNotMatch(err.join(''), /machine with no browser/) + assert.equal(out.join(''), '') +}) + test('--no-browser passes noBrowser through to the flow', async () => { const hypHome = await tmpHome() const { ctx } = await makeCtx({ hypHome }) From d2a3c4b321ac998c83807bb86a1fc9a5a184c227 Mon Sep 17 00:00:00 2001 From: Kenny Daniel Date: Tue, 30 Jun 2026 10:13:42 -0700 Subject: [PATCH 54/59] remote/credentials: name the target in the static-token rejection guidance describeAuthRejection's static-token branch told the user to 're-run hyp remote login' with no target name, while the env branch, sessionExpiredMessage, and noTokenError all include it. Thread the target through so the re-login advice is consistent across paths. --- src/core/remote/credentials.js | 2 +- test/core/remote-verb-oidc.test.js | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/src/core/remote/credentials.js b/src/core/remote/credentials.js index 2a90180..7d02eb9 100644 --- a/src/core/remote/credentials.js +++ b/src/core/remote/credentials.js @@ -554,7 +554,7 @@ export function describeAuthRejection({ target, status, resolved }) { } } return { - message: `remote rejected the credential (HTTP ${status}) - re-run 'hyp remote login'`, + message: `remote rejected the credential for '${target}' (HTTP ${status}) - re-run 'hyp remote login ${target}'`, exitCode: 1, } } diff --git a/test/core/remote-verb-oidc.test.js b/test/core/remote-verb-oidc.test.js index 500cf41..23a83d5 100644 --- a/test/core/remote-verb-oidc.test.js +++ b/test/core/remote-verb-oidc.test.js @@ -171,7 +171,7 @@ test('a static token that 401s is not retried (cannot refresh)', async (t) => { const code = await cmd.run(['SELECT 1', '--remote', 'prod'], ctx) assert.equal(code, 1) assert.equal(state.refreshCalls, 0) - assert.match(err.join(''), /re-run 'hyp remote login'/) + assert.match(err.join(''), /re-run 'hyp remote login prod'/) }) test('an env-override token that 401s does not advise a re-login it cannot fix', async (t) => { From 5ae41857c1a185327dc9918777271f5d387b9293 Mon Sep 17 00:00:00 2001 From: Kenny Daniel Date: Tue, 30 Jun 2026 11:15:25 -0700 Subject: [PATCH 55/59] remote/credentials: make OIDC refresh single-flight under a real lock The optimistic refresh compare-and-swap (refresh outside the lock, commit if the stored token still matched) had a race: a loser that re-read on invalid_grant before the winner committed saw an unchanged token and wrongly forced a re-login. The lock's stale-steal was also unsafe - two contenders could both steal an aged lock and clobber each other, and a slow-but-live holder could be stolen by the 30s age guess. Refresh is now single-flight: the whole read-decide-refresh-commit runs inside one lock hold, so only one process ever calls the token endpoint for a store at a time. A sibling never sees invalid_grant from contention, so invalid_grant under the lock is an unambiguous revocation; the retry loop and the "did the token change" re-read are gone. The lock records the holder's host+pid and is a real mutex: a contender steals a held lock only when that process is confirmed dead, and steals atomically (single-winner rename) so two stealers can't both adopt it; a holder only removes a lock still its own. Age remains a backstop for a cross-host or unparseable lock. Lock wait/stale thresholds raised to cover holding the lock across the bounded token call. Updates LLP 0046 D5 to specify the mechanism, not the racy policy. --- llp/0046-oidc-login-client.decision.md | 30 ++- src/core/remote/credentials.js | 260 ++++++++++++++-------- test/core/remote-credentials-oidc.test.js | 96 ++++---- 3 files changed, 242 insertions(+), 144 deletions(-) diff --git a/llp/0046-oidc-login-client.decision.md b/llp/0046-oidc-login-client.decision.md index 9d247a8..9233521 100644 --- a/llp/0046-oidc-login-client.decision.md +++ b/llp/0046-oidc-login-client.decision.md @@ -105,12 +105,30 @@ browser flow. Rejected refreshing eagerly on every call (needless token-endpoint traffic). The per-target env override (`HYP_REMOTE_TOKEN_`) still wins for CI. Because the `0600` store is shared by concurrent `hyp` processes (a second MCP client, -or a verb call beside a running proxy), an `invalid_grant` is not unconditionally -terminal: with one-time-use refresh tokens, the loser of a refresh race sees -`invalid_grant` only because the winner already rotated the row. The resolver re-reads -the store once on `invalid_grant`; if the stored refresh token changed, it adopts the -winner's freshly-minted session instead of forcing a re-login. An unchanged token is a -genuine revocation and still surfaces the re-login guidance. +or a verb call beside a running proxy), refreshing a one-time-use refresh token is +**single-flight under the write lock**, not an optimistic race. A resolver that finds a +stale (or force-refreshed) JWT takes the cross-process lock, re-reads the store, and then +either adopts a sibling's session if one was minted while it waited (a different, +still-fresh JWT, with no token-endpoint call) or performs the refresh and commits, all +inside the one lock hold. Only one process ever calls the token endpoint for a given +store at a time, so the lost-refresh-race case disappears at the source: a sibling never +sees `invalid_grant` from contention because it never refreshes concurrently. An +`invalid_grant` observed under the lock is therefore an unambiguous revocation and +surfaces the "re-run `hyp remote login`" guidance. **Rejected** the optimistic +compare-and-swap (refresh outside the lock, commit only if the stored token still +matched): its "unchanged token means revoked" test had a race window, because a loser +could re-read before the winner committed and then wrongly force a re-login. + +The lock is held across the bounded (token-endpoint timeout) refresh call, so it is a +real mutex, not a millisecond commit latch. **Decision:** the lock file records the +holder's `{host, pid}`, and a contender steals it only when that holder is confirmed dead +(a same-host liveness probe), not by a fixed age guess; the steal is an atomic rename so +two contenders can never both adopt the same abandoned lock, and a holder only ever +removes a lock that is still its own. A long age threshold remains a backstop for a +holder on another machine (a shared `HOME`) or an unparseable lock. **Rejected** stealing +purely by lock-file age: a fixed age cannot tell a crashed holder from a merely slow +refresh, so it both wedged longer than needed on crashes and let two live writers clobber +each other (the later rename dropping the other's just-rotated refresh token). ### D6: Identity endpoints derive from the configured remote URL origin diff --git a/src/core/remote/credentials.js b/src/core/remote/credentials.js index 7d02eb9..2f15ed2 100644 --- a/src/core/remote/credentials.js +++ b/src/core/remote/credentials.js @@ -1,10 +1,11 @@ // @ts-check import fs from 'node:fs/promises' +import os from 'node:os' import path from 'node:path' import process from 'node:process' -import { InvalidGrantError, refreshSession, sessionExpiredMessage } from './identity_client.js' +import { refreshSession, sessionExpiredMessage } from './identity_client.js' /** * @import { Stats } from 'node:fs' @@ -34,18 +35,23 @@ const CREDENTIALS_BASENAME = 'remote-credentials.json' /** Refresh an `oidc` access JWT once it is within this window of expiry. */ const REFRESH_SKEW_MS = 60 * 1000 -/** Max time to wait for the cross-process write lock before giving up. */ -const LOCK_TIMEOUT_MS = 5 * 1000 - -/** A lock file older than this is treated as abandoned by a crashed holder. */ -const LOCK_STALE_MS = 30 * 1000 +/** + * Max time to wait for the cross-process write lock before giving up. The lock + * is now held across the bounded token-endpoint refresh (single-flight, see + * {@link refreshOidcSession}), so a waiter must outlast one in-flight refresh + * rather than a millisecond commit. A crashed holder is stolen immediately by + * liveness, so this only bounds waiting on a *live* sibling's refresh. + */ +const LOCK_TIMEOUT_MS = 45 * 1000 /** - * Bound on the optimistic refresh compare-and-swap loop. Each extra attempt - * means a sibling rotated the refresh token between our refresh and our commit; - * a handful of rounds is far more than any real contention reaches. + * Age backstop for stealing an abandoned lock when liveness can't be checked: a + * holder on another machine (a shared `HOME`) or an unparseable/legacy lock + * file. It must exceed the longest legitimate hold (one refresh) so a live, slow + * holder is never stolen; same-host crashes are caught far sooner by pid + * liveness, not by this age. */ -const REFRESH_MAX_ATTEMPTS = 4 +const LOCK_STALE_MS = 90 * 1000 /** * Single-entry parse cache for the credential file. The stdio proxy resolves a @@ -378,21 +384,23 @@ export async function resolveAccessJwt({ target, env, stateDir, identityBase, no } /** - * Refresh an oidc session with an optimistic compare-and-swap commit, so two - * `hyp` processes sharing this 0600 store (a verb call beside a proxy, two MCP - * clients) never double-spend a one-time-use refresh row or clobber a session a - * sibling just rotated. + * Refresh an oidc session **single-flight under the write lock**, so two `hyp` + * processes sharing this 0600 store (a verb call beside a proxy, two MCP clients) + * never double-spend a one-time-use refresh row or clobber a session a sibling + * just rotated. * - * The network refresh runs OUTSIDE the write lock: a 30s identity call must not - * wedge the 5s lock that static-token writes also contend for. Only the commit - * takes the lock, and persists our result solely when the stored refresh token - * still matches the one we refreshed from. A mismatch means a sibling rotated - * first (a parallel proxy refresh, or a concurrent `hyp remote login`); we adopt - * the stored session when it is usable, or loop and refresh again with the - * rotated token, rather than overwriting it. On `invalid_grant` we re-read under - * the lock - which blocks until any in-flight sibling commit lands, closing the - * window where a loser refreshed before the winner persisted - and adopt a - * changed token; an unchanged token is a genuine revocation and re-throws. + * The whole read-decide-refresh-commit runs inside one lock hold, so only one + * process ever calls the token endpoint for a given store at a time. After + * acquiring the lock we re-read the store: if a sibling already minted a newer, + * still-fresh JWT while we waited, we adopt it with no network call; otherwise we + * refresh from the freshest stored refresh token and commit. Because no sibling + * refreshes concurrently, an `invalid_grant` here is an unambiguous revocation, + * not a lost race - it propagates (as a typed error) to the re-login guidance, + * with none of the old "did the token change since I refreshed" re-read to get + * wrong. The adopt check compares against the JWT we entered with (`from`), so a + * forced refresh of that same failing JWT still refreshes rather than re-adopting + * it. Holding the lock across the bounded token call is the cost of removing the + * race entirely; see {@link withCredentialsLock} for why that is safe. * * @param {{ * target: string, @@ -404,64 +412,35 @@ export async function resolveAccessJwt({ target, env, stateDir, identityBase, no * envName: string, * }} args * @returns {Promise<{ ok: true, token: string, source: 'file', kind: 'oidc' } | { ok: false, error: string }>} - * @ref LLP 0046#d5 [implements]: serialized refresh commit - no double-spend or clobber across concurrent hyp processes + * @ref LLP 0046#d5 [implements]: single-flight refresh under the lock - no double-spend, clobber, or lost-race re-login across concurrent hyp processes */ async function refreshOidcSession({ target, stateDir, identityBase, fetchImpl, now, from, envName }) { - for (let attempt = 0; attempt < REFRESH_MAX_ATTEMPTS; attempt++) { - let refreshed - try { - refreshed = await refreshSession({ identityBase, refreshToken: from.refreshToken, fetchImpl }) - } catch (err) { - if (!(err instanceof InvalidGrantError)) throw err - // Our refresh row was consumed. Re-read under the lock so a sibling's - // winning commit is guaranteed visible (it holds the lock across its - // write). A changed-and-fresh token is an adoptable rotation; a changed - // stale token means loop and refresh with it; an unchanged token is a - // genuine revocation - re-throw. - const latest = await withCredentialsLock(stateDir, () => readOidcRecord(stateDir, target)) - if (!latest || latest.refreshToken === from.refreshToken) throw err - if (isFresh(latest, now)) { - return { ok: true, token: latest.accessJwt, source: 'file', kind: 'oidc' } - } - from = latest - continue + return withCredentialsLock(stateDir, async () => { + const latest = await readOidcRecord(stateDir, target) + if (!latest) { + // `hyp remote remove` ran (or a static login replaced the record) before we + // got the lock. Honor that instead of resurrecting a refreshed session. + return { ok: false, error: noTokenError(target, envName) } } - const outcome = await withCredentialsLock(stateDir, async () => { - const latest = await readOidcRecord(stateDir, target) - if (!latest) { - // `hyp remote remove` deleted the session while our refresh was in - // flight. Honor the deletion under the lock instead of writing the - // stale row back and resurrecting removed credentials. - return { removed: true } - } - if (latest.refreshToken !== from.refreshToken) { - // A sibling rotated the one-time-use session under us while our refresh - // was in flight. Adopt a usable stored session; otherwise loop and - // refresh with the rotated token rather than clobber it. - if (isFresh(latest, now)) return { token: latest.accessJwt } - return { retryFrom: latest } - } - // No rotation: our result is authoritative. A rotated refresh token (from - // a one-time-use server) replaces the consumed one, otherwise the stored - // token is kept; `org` is fixed for the token's life, so an omitted one - // keeps the stored value. - await commitSession(stateDir, target, { - refreshToken: refreshed.refreshToken || latest.refreshToken, - accessJwt: refreshed.accessJwt, - expiresAt: refreshed.expiresAt, - org: refreshed.org || latest.org, - }) - return { token: refreshed.accessJwt } - }) - if (outcome.removed) return { ok: false, error: noTokenError(target, envName) } - if (outcome.token !== undefined) { - return { ok: true, token: outcome.token, source: 'file', kind: 'oidc' } + // A sibling refreshed (or the user re-logged in) while we waited for the + // lock: adopt its newer, still-fresh JWT with no token-endpoint call. + if (latest.accessJwt !== from.accessJwt && isFresh(latest, now)) { + return { ok: true, token: latest.accessJwt, source: 'file', kind: 'oidc' } } - from = /** @type {RemoteOidcRecord} */ (outcome.retryFrom) - } - // The stored session kept rotating under us past the attempt bound; treat it - // as unresolvable rather than spin forever. - return { ok: false, error: noTokenError(target, envName) } + // We are the single in-flight refresher. Refresh from the freshest stored + // refresh token (a sibling may have rotated it to a token whose JWT is itself + // already stale) and commit. A rotated one-time-use token replaces the + // consumed one; otherwise the stored token is kept. `org` is fixed for the + // token's life, so an omitted one keeps the stored value. + const refreshed = await refreshSession({ identityBase, refreshToken: latest.refreshToken, fetchImpl }) + await commitSession(stateDir, target, { + refreshToken: refreshed.refreshToken || latest.refreshToken, + accessJwt: refreshed.accessJwt, + expiresAt: refreshed.expiresAt, + org: refreshed.org || latest.org, + }) + return { ok: true, token: refreshed.accessJwt, source: 'file', kind: 'oidc' } + }) } /** @@ -613,21 +592,116 @@ function delay(ms) { return new Promise((resolve) => setTimeout(resolve, ms)) } +/** + * This process's lock-owner tag, written into the lock file so a contender can + * tell a crashed holder (steal it) from a live one (wait). Host-qualified so a + * pid recorded on a different machine (a shared `HOME`) is never probed against + * this machine's process table. + */ +const LOCK_OWNER = JSON.stringify({ host: os.hostname(), pid: process.pid }) + +/** + * Whether the process that wrote a lock file is gone, so its lock is safe to + * steal. A same-host holder is probed with signal 0: `ESRCH` means no such + * process (dead); a delivered signal or `EPERM` means it is alive. A holder on a + * different host, or an unparseable/legacy lock, can't be probed and is reported + * "not dead" so the caller falls back to the age backstop. + * + * @param {string} ownerText raw lock-file contents + * @returns {boolean} + */ +function lockHolderIsDead(ownerText) { + /** @type {{ host?: unknown, pid?: unknown }} */ + let owner + try { + owner = JSON.parse(ownerText) + } catch { + return false + } + if (!owner || owner.host !== os.hostname() || typeof owner.pid !== 'number') return false + try { + process.kill(owner.pid, 0) + return false // signal accepted: the holder is alive + } catch (err) { + return /** @type {NodeJS.ErrnoException} */ (err).code === 'ESRCH' + } +} + +/** + * Try to steal a held lock, returning whether the caller should retry the + * acquire (the lock is gone or was abandoned) versus keep waiting (a live + * holder). Stealing is atomic: we rename the lock aside before removing it, and + * a given lock can be renamed by exactly one contender (the loser sees `ENOENT`), + * so two stealers can never both adopt it and clobber each other. We only delete + * the captured file when its contents still match the abandoned lock we + * inspected, so a fresh lock a new holder grabbed in the race window is restored, + * not evicted. + * + * @param {string} lockPath + * @returns {Promise} true to retry the acquire, false to keep waiting + */ +async function stealLockIfAbandoned(lockPath) { + /** @type {Stats} */ + let st + /** @type {string} */ + let ownerText + try { + st = await fs.stat(lockPath) + ownerText = await fs.readFile(lockPath, 'utf8') + } catch { + return true // the lock vanished between the failed open and now: retry + } + const abandoned = lockHolderIsDead(ownerText) || Date.now() - st.mtimeMs > LOCK_STALE_MS + if (!abandoned) return false // a live holder mid-refresh: wait it out + const captured = `${lockPath}.stale-${process.pid}` + try { + await fs.rename(lockPath, captured) + } catch { + return true // another contender captured it first: retry the acquire + } + // We now exclusively own the captured file. Confirm it is the same abandoned + // lock we inspected (a new holder would have written a different owner tag) and + // only then discard it; otherwise put it back without clobbering whatever sits + // at lockPath now, and keep waiting. + let capturedText = '' + try { + capturedText = await fs.readFile(captured, 'utf8') + } catch { /* already gone: treat as removed */ } + if (capturedText === ownerText) { + await fs.rm(captured, { force: true }) + return true + } + try { + await fs.rename(captured, lockPath) + } catch { + await fs.rm(captured, { force: true }) + } + return false +} + /** * Serialize the read-modify-write of the shared 0600 store across concurrent * `hyp` processes (a verb call beside a proxy, two MCP clients). Without it two * writers to *different* targets each read the whole map and rename; the later * rename clobbers the earlier writer's just-rotated one-time-use refresh token * for the other target, forcing a needless re-login. An `O_EXCL` lock file is - * the cross-process mutex, and the cache is dropped on entry so the locked body - * reads the freshest on-disk map (an identical-size sibling rewrite the parse - * cache would otherwise miss). A stale lock left by a crashed holder is stolen - * by age, so a crash can't wedge every future write. + * the cross-process mutex; it records this process's `{host, pid}` so the lock + * is a real mutex even though it is now held across the bounded token-endpoint + * refresh ({@link refreshOidcSession}). The cache is dropped on entry so the + * locked body reads the freshest on-disk map (an identical-size sibling rewrite + * the parse cache would otherwise miss). + * + * A crashed holder's lock is stolen as soon as its pid is seen dead, not after a + * blind age wait, so a crash neither wedges every future write nor (the inverse) + * lets a fixed age threshold misfire on a merely slow live holder. The steal is + * atomic and single-winner (see {@link stealLockIfAbandoned}), and release only + * removes a lock that is still ours, so a holder can never evict a successor. * * @template T * @param {string} stateDir * @param {() => Promise} fn * @returns {Promise} + * @ref LLP 0046#d5 [implements]: pid-identified mutex with liveness-based, single-winner steal */ async function withCredentialsLock(stateDir, fn) { await fs.mkdir(stateDir, { recursive: true }) @@ -636,18 +710,15 @@ async function withCredentialsLock(stateDir, fn) { for (;;) { try { const handle = await fs.open(lockPath, 'wx') - await handle.close() + try { + await handle.writeFile(LOCK_OWNER) + } finally { + await handle.close() + } break } catch (err) { if (!err || /** @type {NodeJS.ErrnoException} */ (err).code !== 'EEXIST') throw err - // Steal a lock abandoned by a crashed holder, identified by age. - try { - const st = await fs.stat(lockPath) - if (Date.now() - st.mtimeMs > LOCK_STALE_MS) { - await fs.rm(lockPath, { force: true }) - continue - } - } catch { /* lock vanished between open and stat: retry the acquire */ } + if (await stealLockIfAbandoned(lockPath)) continue if (Date.now() > deadline) { throw new Error('timed out acquiring the remote credentials lock') } @@ -659,7 +730,12 @@ async function withCredentialsLock(stateDir, fn) { rawCache = null return await fn() } finally { - await fs.rm(lockPath, { force: true }) + // Remove only our own lock: if a slow hold ever overran the age backstop and + // a contender stole it, the file now belongs to a successor we must not evict. + try { + const owner = await fs.readFile(lockPath, 'utf8') + if (owner === LOCK_OWNER) await fs.rm(lockPath, { force: true }) + } catch { /* already gone */ } } } diff --git a/test/core/remote-credentials-oidc.test.js b/test/core/remote-credentials-oidc.test.js index 5e34b2d..fe22154 100644 --- a/test/core/remote-credentials-oidc.test.js +++ b/test/core/remote-credentials-oidc.test.js @@ -233,44 +233,40 @@ test('resolveAccessJwt keeps the stored refresh token when the server does not r assert.equal(/** @type {any} */ (creds.prod).refreshToken, 'rt-stable') }) -test('resolveAccessJwt adopts a sibling login instead of clobbering it on a successful refresh', async () => { +test('concurrent resolveAccessJwt is single-flight: the loser adopts the winner with no second token call', async () => { const dir = await tmpState() await writeSession(dir, 'prod', { refreshToken: 'rt-old', accessJwt: 'jwt-old', expiresAt: PAST, org: 'acme' }) let calls = 0 - // While our refresh is in flight, a concurrent `hyp remote login` rotates the - // session on disk to a brand-new refresh token. Our refresh still succeeds, - // but its result was derived from the now-superseded rt-old, so committing it - // would clobber the fresh login. The compare-and-swap must adopt the login's - // session instead. + // Two `hyp` processes resolve the same stale session at once. The write lock + // serializes them: the winner refreshes once under the lock and commits; the + // loser then acquires the lock, sees the fresh JWT, and adopts it without a + // second token-endpoint call. No lost-refresh-race, no double-spend. const fetchImpl = /** @type {any} */ (async () => { calls++ - await writeSession(dir, 'prod', { refreshToken: 'rt-login', accessJwt: 'jwt-login', expiresAt: FUTURE, org: 'acme' }) return { ok: true, status: 200, text: async () => JSON.stringify({ access_jwt: 'jwt-new', expires_at: FUTURE, org: 'acme', refresh_token: 'rt-new' }) } }) - const r = await resolveAccessJwt({ target: 'prod', env: {}, stateDir: dir, identityBase: 'https://h/v1/identity', fetchImpl }) - assert.equal(/** @type {any} */ (r).token, 'jwt-login') - assert.equal(calls, 1) - // The login's refresh token survives; our refresh result did not overwrite it. - assert.equal(/** @type {any} */ ((await readCredentials(dir)).prod).refreshToken, 'rt-login') + const both = await Promise.all([ + resolveAccessJwt({ target: 'prod', env: {}, stateDir: dir, identityBase: 'https://h/v1/identity', fetchImpl }), + resolveAccessJwt({ target: 'prod', env: {}, stateDir: dir, identityBase: 'https://h/v1/identity', fetchImpl }), + ]) + assert.equal(/** @type {any} */ (both[0]).token, 'jwt-new') + assert.equal(/** @type {any} */ (both[1]).token, 'jwt-new') + assert.equal(calls, 1) // only one process hit the token endpoint + assert.equal(/** @type {any} */ ((await readCredentials(dir)).prod).refreshToken, 'rt-new') }) -test('resolveAccessJwt honors a concurrent remove instead of resurrecting the session', async () => { +test('resolveAccessJwt does not resurrect a session removed before the refresh', async () => { const dir = await tmpState() await writeSession(dir, 'prod', { refreshToken: 'rt-old', accessJwt: 'jwt-old', expiresAt: PAST, org: 'acme' }) - let calls = 0 - // While our refresh is in flight, the user runs `hyp remote remove prod`, - // deleting the row. Our refresh still succeeds, but committing its result - // would write the removed session back. The compare-and-swap must observe the - // deletion under the lock and decline to resurrect it. - const fetchImpl = /** @type {any} */ (async () => { - calls++ - await removeToken(dir, 'prod') - return { ok: true, status: 200, text: async () => JSON.stringify({ access_jwt: 'jwt-new', expires_at: FUTURE, org: 'acme', refresh_token: 'rt-new' }) } - }) + // A concurrent `hyp remote remove prod` deletes the row. removeToken and the + // single-flight refresh share the write lock, so the removal lands fully before + // the refresher re-reads under the lock; it then finds nothing and declines to + // write a refreshed session back. + assert.equal(await removeToken(dir, 'prod'), true) + const fetchImpl = /** @type {any} */ (async () => { throw new Error('must not refresh a removed session') }) const r = await resolveAccessJwt({ target: 'prod', env: {}, stateDir: dir, identityBase: 'https://h/v1/identity', fetchImpl }) assert.equal(r.ok, false) assert.match(/** @type {any} */ (r).error, /no token for 'prod' - run 'hyp remote login prod'/) - assert.equal(calls, 1) // The removed session stays removed; nothing was written back. assert.equal(/** @type {any} */ ((await readCredentials(dir)).prod), undefined) }) @@ -297,43 +293,51 @@ test('resolveAccessJwt errors with login guidance when no record exists', async assert.match(/** @type {any} */ (r).error, /no token for 'prod' - run 'hyp remote login prod'/) }) -test('resolveAccessJwt adopts a concurrent process\'s fresh session on invalid_grant', async () => { +test('resolveAccessJwt with forceRefresh refreshes even when the cached JWT is still clock-fresh', async () => { const dir = await tmpState() - await writeSession(dir, 'prod', { refreshToken: 'rt-old', accessJwt: 'jwt-old', expiresAt: PAST, org: 'acme' }) + // The stored JWT is nowhere near expiry, but a live request just got a 401, so + // the attach path forces a refresh. forceRefresh must bypass the fresh-cache + // fast path and the under-lock adopt-guard (which compares against the same + // failing JWT) and actually mint a new token. + await writeSession(dir, 'prod', { refreshToken: 'rt', accessJwt: 'jwt-fresh', expiresAt: FUTURE, org: 'acme' }) let calls = 0 - // A second `hyp` process refreshed first, rotating the row out from under us, - // and stored its fresh session. Our refresh comes back invalid_grant; we must - // adopt the winner's JWT rather than force a re-login. const fetchImpl = /** @type {any} */ (async () => { calls++ - await writeSession(dir, 'prod', { refreshToken: 'rt-winner', accessJwt: 'jwt-winner', expiresAt: FUTURE, org: 'acme' }) - return { ok: false, status: 401, text: async () => JSON.stringify({ error: 'invalid_grant' }) } + return { ok: true, status: 200, text: async () => JSON.stringify({ access_jwt: 'jwt-forced', expires_at: FUTURE, org: 'acme', refresh_token: 'rt2' }) } }) - const r = await resolveAccessJwt({ target: 'prod', env: {}, stateDir: dir, identityBase: 'https://h/v1/identity', fetchImpl }) - assert.equal(/** @type {any} */ (r).token, 'jwt-winner') - assert.equal(calls, 1) // the fresh adopted JWT means no second refresh + const r = await resolveAccessJwt({ target: 'prod', env: {}, stateDir: dir, identityBase: 'https://h/v1/identity', fetchImpl, forceRefresh: true }) + assert.equal(/** @type {any} */ (r).token, 'jwt-forced') + assert.equal(calls, 1) }) -test('resolveAccessJwt refreshes with a rotated token when the adopted session is also stale', async () => { +test('resolveAccessJwt refreshes with the freshest stored refresh token in one shot', async () => { const dir = await tmpState() - await writeSession(dir, 'prod', { refreshToken: 'rt-old', accessJwt: 'jwt-old', expiresAt: PAST, org: 'acme' }) + // A sibling rotated the refresh token to rt-new but its JWT is itself already + // stale (e.g. clock skew or an immediately-expiring grant). Single-flight reads + // the rotated token under the lock and refreshes from it once - no retry loop. + await writeSession(dir, 'prod', { refreshToken: 'rt-new', accessJwt: 'jwt-mid', expiresAt: PAST, org: 'acme' }) let calls = 0 const fetchImpl = /** @type {any} */ (async (/** @type {string} */ _url, /** @type {any} */ opts) => { calls++ - const sent = JSON.parse(opts.body).refresh_token - if (sent === 'rt-old') { - // Winner rotated to rt-new but its JWT is also already stale. - await writeSession(dir, 'prod', { refreshToken: 'rt-new', accessJwt: 'jwt-mid', expiresAt: PAST, org: 'acme' }) - return { ok: false, status: 401, text: async () => JSON.stringify({ error: 'invalid_grant' }) } - } - // Retry with the rotated token succeeds. + assert.equal(JSON.parse(opts.body).refresh_token, 'rt-new') // refreshed from the freshest stored token return { ok: true, status: 200, text: async () => JSON.stringify({ access_jwt: 'jwt-final', expires_at: FUTURE, org: 'acme', refresh_token: 'rt-final' }) } }) const r = await resolveAccessJwt({ target: 'prod', env: {}, stateDir: dir, identityBase: 'https://h/v1/identity', fetchImpl }) assert.equal(/** @type {any} */ (r).token, 'jwt-final') - assert.equal(calls, 2) - const creds = await readCredentials(dir) - assert.equal(/** @type {any} */ (creds.prod).refreshToken, 'rt-final') + assert.equal(calls, 1) + assert.equal(/** @type {any} */ ((await readCredentials(dir)).prod).refreshToken, 'rt-final') +}) + +test('a write steals a lock abandoned by a crashed holder (dead pid)', async () => { + const dir = await tmpState() + await writeToken(dir, 'prod', 'sk-1') // creates the store + const lockPath = `${remoteCredentialsPath(dir)}.lock` + // Simulate a crashed holder: a lock file naming a dead pid on this host. + await fs.writeFile(lockPath, JSON.stringify({ host: os.hostname(), pid: 999999999 })) + // The next write must steal the lock by liveness (the pid is dead) and succeed, + // not wait out the age backstop or time out. + await writeToken(dir, 'prod', 'sk-2') + assert.deepEqual((await readCredentials(dir)).prod, { kind: 'static', token: 'sk-2' }) }) test('resolveAccessJwt propagates a refresh failure (invalid_grant)', async () => { From 43727a42b818c199ce240588c364f0dec7f7e653 Mon Sep 17 00:00:00 2001 From: Kenny Daniel Date: Tue, 30 Jun 2026 11:18:31 -0700 Subject: [PATCH 56/59] remote/loopback: ignore non-matching callbacks instead of aborting the login The receiver settled (and closed, single-shot) on the first /callback hit carrying error=, a mismatched state, or no code - and the error= branch ran before the state check. The ephemeral loopback port is reachable by any local process and by any page the user is browsing (a no-cors GET still reaches us), so a single `GET /callback?error=x` or `?state=wrong` before the genuine redirect arrived would fail-and-close the listener, leaving the real browser callback with connection-refused: a trivial login denial-of-service. Apply the CSRF state check first and ignore any non-matching-state request (neutral page, never settle), so only a callback bearing our exact state can abort or complete the login. The identity server echoes state on both success and error (LLP 0047), so genuine provider errors still match and surface. Aligns LLP 0047, which already documents error redirects as carrying state. --- llp/0047-oidc-login-client.design.md | 19 +++++++++----- src/core/remote/loopback.js | 35 +++++++++++++------------ test/core/remote-loopback.test.js | 39 +++++++++++++++------------- 3 files changed, 51 insertions(+), 42 deletions(-) diff --git a/llp/0047-oidc-login-client.design.md b/llp/0047-oidc-login-client.design.md index eb3b8f2..617fa04 100644 --- a/llp/0047-oidc-login-client.design.md +++ b/llp/0047-oidc-login-client.design.md @@ -69,9 +69,11 @@ Realizes LLP 0046 D3. `startLoopbackReceiver({ state, timeoutMs })` → `{ redirectUri, waitForCode() }`. Binds `http://127.0.0.1:0` (OS-assigned port) and returns `redirectUri` so the caller can build the start URL before opening the browser. `waitForCode()` resolves `{ code }` when -`/callback` arrives with a matching `state`; rejects on `error=`, `state` mismatch, or -timeout. Serves a minimal "login complete, you can close this tab" page, then closes. -Single-shot. Realizes LLP 0046 D2. +`/callback` arrives with a matching `state`; rejects on a matching-state `error=`, a +matching-state callback with no `code`, or timeout. A callback whose `state` does not +match (or is absent) is served a neutral page and ignored, never settling the flow, so a +stray or hostile hit on the loopback port cannot abort the login. Serves a minimal "login +complete, you can close this tab" page, then closes. Single-shot. Realizes LLP 0046 D2. ### `open_browser.js` `openBrowser(url)` spawns the platform opener (`open` darwin, `xdg-open` linux, `start` @@ -122,8 +124,10 @@ re-run the browser flow). ## Security notes -- `state` is a random per-login CSRF token; a callback whose `state` does not match is - rejected without reading `code`. +- `state` is a random per-login CSRF token. A callback whose `state` matches settles the + flow (a `code` resolves it, an `error=` rejects it); a callback whose `state` does not + match is ignored without reading `code`, not rejected, so a stray or hostile request to + the loopback port cannot abort the in-flight login. - The loopback listener binds only `127.0.0.1`, is single-shot, and times out. - The PKCE verifier lives in memory for one flow, is never logged, never persisted. - The refresh token and access JWT inherit the 0600 atomic-write store; neither is ever @@ -145,8 +149,9 @@ token, or access JWT (hash or redact if identity matters). Give the smoke a stab Traditional unit tests (deterministic; injected clock, fetch, and http): - `pkce`: challenge is the base64url SHA-256 of the verifier; pairs differ. -- `loopback`: a matching `state` resolves `{ code }`; a mismatched `state`, an `error=`, - and a timeout each reject; the listener closes after one request. +- `loopback`: a matching `state` resolves `{ code }`; a matching-state `error=` and a + timeout each reject; a mismatched `state` is ignored (the flow keeps waiting and a later + valid callback still resolves); the listener closes after the settling request. - `identity_client`: `exchangeCode` and `refreshSession` post the right body and parse the response; a 401 surfaces a typed `invalid_grant`. - `credentials`: an `oidc` record round-trips; a legacy `token`-only record reads as diff --git a/src/core/remote/loopback.js b/src/core/remote/loopback.js index f5986c9..5b47d49 100644 --- a/src/core/remote/loopback.js +++ b/src/core/remote/loopback.js @@ -98,29 +98,30 @@ export function startLoopbackReceiver({ state, timeoutMs = DEFAULT_TIMEOUT_MS }) const error = params.get('error') const code = params.get('code') - // An `error` callback is surfaced before the state check: it carries no - // authorization code, so the CSRF guard (which exists only to stop an - // attacker's code being adopted) does not apply, and a provider that drops - // `state` on an error redirect (RFC 6749 only requires echoing it when - // present) must still report the real failure rather than a misleading - // "state mismatch" plus a headless-token hint. + // CSRF guard, applied to every callback before it can settle the login: only + // a request carrying our exact `state` can have come from the browser we sent + // to the IdP. Anything else gets a neutral page and is IGNORED, never settling + // the flow, so it cannot abort an in-flight login - we keep waiting for the + // genuine redirect or the timeout. This is deliberate: the ephemeral loopback + // port is reachable by any local process and by any page the user is browsing + // (a no-cors GET still reaches us), so failing the login on the first stray or + // hostile `?error=` / wrong-`?state=` hit would be a trivial login DoS. The + // identity server echoes `state` on both success and error (LLP 0047), so a + // real provider error still matches here and surfaces below. + if (returnedState !== state) { + respond(res, 'Unexpected login callback. You can close this tab.') + return + } if (params.has('error')) { - // The redirect's `error` is attacker-influenceable (anyone can hit the - // loopback port). Bound it to a safe token before it reaches the error - // message, the log ERROR_KIND, and the terminal, so a crafted value can't - // inject newlines into logs or terminal output. + // The redirect's `error` is attacker-influenceable (anyone who learns the + // state, or guesses it, can hit the port). Bound it to a safe token before + // it reaches the error message, the log ERROR_KIND, and the terminal, so a + // crafted value can't inject newlines into logs or terminal output. const safeError = sanitizeErrorCode(error ?? '') respond(res, 'Login failed. You can close this tab and return to the terminal.') fail(Object.assign(new Error(`login failed: ${safeError}`), { callbackError: safeError }), safeError) return } - // A callback whose state does not match is rejected without reading the - // code (CSRF guard). Serve a neutral page either way. - if (returnedState !== state) { - respond(res, 'Login state mismatch. You can close this tab.') - fail(new Error('loopback received a callback with a mismatched state'), 'state_mismatch') - return - } if (!code) { respond(res, 'Login response was missing a code. You can close this tab.') fail(new Error('loopback callback had neither code nor error'), 'no_code') diff --git a/test/core/remote-loopback.test.js b/test/core/remote-loopback.test.js index 52a27fd..d62e0be 100644 --- a/test/core/remote-loopback.test.js +++ b/test/core/remote-loopback.test.js @@ -55,13 +55,17 @@ test('a matching state resolves { code } and closes the listener', async () => { await assert.rejects(() => hitCallback(recv.redirectUri, { code: 'x', state: 's-good' })) }) -test('a mismatched state rejects without yielding a code', async () => { +test('a mismatched state is ignored, not consumed: the flow keeps waiting', async () => { const recv = await startLoopbackReceiver({ state: 's-real', timeoutMs: 2000 }) - // Attach the rejection assertion synchronously so the (synchronous) reject - // inside the request handler always lands on a handler. - const assertion = assert.rejects(() => recv.waitForCode(), /mismatched state/) - await hitCallback(recv.redirectUri, { code: 'leak', state: 's-forged' }) - await assertion + const waiting = recv.waitForCode() + // A forged-state callback (a CSRF attempt, or any stray hit on the loopback + // port) must NOT settle the login: failing on it would be a trivial login DoS. + // It gets a neutral page and is ignored. + const status = await hitCallback(recv.redirectUri, { code: 'leak', state: 's-forged' }) + assert.equal(status, 200) + // The genuine callback still resolves afterward. + await hitCallback(recv.redirectUri, { code: 'real', state: 's-real' }) + assert.deepEqual(await waiting, { code: 'real' }) }) test('an error= callback rejects with the error code attached', async () => { @@ -75,19 +79,18 @@ test('an error= callback rejects with the error code attached', async () => { await assertion }) -test('an error= callback with no state reports the real error, not a state mismatch', async () => { +test('an error= callback with no state is ignored, not surfaced (anti-DoS)', async () => { const recv = await startLoopbackReceiver({ state: 's1', timeoutMs: 2000 }) - // A provider that drops `state` on an error redirect must still surface the - // denial, not be misreported as a CSRF/state mismatch (which also appends a - // misleading headless-token hint downstream). - const assertion = assert.rejects(() => recv.waitForCode(), (err) => { - assert.match(/** @type {Error} */ (err).message, /access_denied/) - assert.doesNotMatch(/** @type {Error} */ (err).message, /mismatched state/) - assert.equal(/** @type {any} */ (err).callbackError, 'access_denied') - return true - }) - await hitCallback(recv.redirectUri, { error: 'access_denied' }) - await assertion + const waiting = recv.waitForCode() + // A stateless `?error=` is indistinguishable from an attacker poking the + // loopback port, so it must not abort the login. Our identity server echoes + // `state` on error redirects, so a genuine denial still matches (see the + // matching-state error= test above) and surfaces. + const status = await hitCallback(recv.redirectUri, { error: 'access_denied' }) + assert.equal(status, 200) + // The real callback still resolves afterward; the stray error did not settle it. + await hitCallback(recv.redirectUri, { code: 'real', state: 's1' }) + assert.deepEqual(await waiting, { code: 'real' }) }) test('a timeout rejects', async () => { From 7cccc26fa0c2f365e6f55768d235177be00b6f62 Mon Sep 17 00:00:00 2001 From: Kenny Daniel Date: Tue, 30 Jun 2026 11:20:45 -0700 Subject: [PATCH 57/59] remote/login: accept --org=/--token-file= equals form runRemoteLogin matched these flags with argv.indexOf('--org'), which only finds the bare ` --org value` form. `hyp remote login prod --org=acme` left org undefined and let positionals skip --org=acme as a flag, so the browser flow ran with no org and a multi-org account later failed with org_selection_required. --token-file=path likewise fell through to the browser flow instead of reading the file. The rest of the CLI already takes the equals form (core_commands' --token-file=), so this was an inconsistency users hit. Parse both forms via a shared valueFlag helper. The space form still rejects a following flag-looking token as "expects a value"; the equals form takes its value verbatim and rejects only an empty one. --- src/core/cli/remote_commands.js | 40 +++++++++++++++++++++----- test/core/remote-login-command.test.js | 35 ++++++++++++++++++++++ 2 files changed, 68 insertions(+), 7 deletions(-) diff --git a/src/core/cli/remote_commands.js b/src/core/cli/remote_commands.js index 68bdffd..47782cc 100644 --- a/src/core/cli/remote_commands.js +++ b/src/core/cli/remote_commands.js @@ -97,15 +97,15 @@ export async function runRemoteAdd(argv, ctx) { * @ref LLP 0046#d1 [implements]: browser mode of `hyp remote login`; one command, one store, one more way to populate it */ export async function runRemoteLogin(argv, ctx, deps = {}) { - const tokenFileIdx = argv.indexOf('--token-file') - const tokenFile = tokenFileIdx >= 0 ? argv[tokenFileIdx + 1] : undefined - if (tokenFileIdx >= 0 && (!tokenFile || tokenFile.startsWith('-'))) { + const tokenFileArg = valueFlag(argv, '--token-file') + const tokenFile = tokenFileArg.value + if (tokenFileArg.present && !tokenFile) { ctx.stderr.write('hyp remote login: --token-file expects a path\n') return 2 } - const orgIdx = argv.indexOf('--org') - const org = orgIdx >= 0 ? argv[orgIdx + 1] : undefined - if (orgIdx >= 0 && (!org || org.startsWith('-'))) { + const orgArg = valueFlag(argv, '--org') + const org = orgArg.value + if (orgArg.present && !org) { ctx.stderr.write('hyp remote login: --org expects an org name\n') return 2 } @@ -158,7 +158,7 @@ function positionals(argv, valueFlags = new Set()) { for (let i = 0; i < argv.length; i++) { const a = argv[i] if (a.startsWith('-')) { - if (valueFlags.has(a)) i++ // consume its value + if (valueFlags.has(a)) i++ // consume its value (`--flag value`; `--flag=value` carries its own) continue } out.push(a) @@ -166,6 +166,32 @@ function positionals(argv, valueFlags = new Set()) { return out } +/** + * Read a value-taking flag in either `--flag value` or `--flag=value` form. The + * `=` form is accepted because the rest of the CLI takes it (e.g. core_commands' + * `--token-file=`), so `login prod --org=acme` must not silently drop the org and + * fall through to a no-org browser flow. In the space form a following token that + * is itself a flag (or absent) is not a value, so the caller can report "expects + * a value"; in the `=` form the value is explicit (even `''`, which the caller + * rejects). + * + * @param {string[]} argv + * @param {string} flag e.g. `--org` + * @returns {{ present: boolean, value: string | undefined }} + */ +function valueFlag(argv, flag) { + const eq = `${flag}=` + for (let i = 0; i < argv.length; i++) { + const a = argv[i] + if (a === flag) { + const next = argv[i + 1] + return { present: true, value: next !== undefined && !next.startsWith('-') ? next : undefined } + } + if (a.startsWith(eq)) return { present: true, value: a.slice(eq.length) } + } + return { present: false, value: undefined } +} + /** * The static-token path (LLP 0033, unchanged behavior): read a token from * `--token-file` or piped stdin and store it as a `kind: 'static'` record. diff --git a/test/core/remote-login-command.test.js b/test/core/remote-login-command.test.js index f6b171b..f6c8dfb 100644 --- a/test/core/remote-login-command.test.js +++ b/test/core/remote-login-command.test.js @@ -306,3 +306,38 @@ test('--org is noted as ignored when a static token forces the static path', asy assert.equal(code, 0) assert.match(err.join(''), /--org is ignored with a static token/) }) + +test('--org=acme (equals form) is honored, not silently dropped', async () => { + const hypHome = await tmpHome() + const { ctx } = await makeCtx({ hypHome }) + /** @type {any} */ let seen + const login = /** @type {any} */ (async (/** @type {any} */ args) => { + seen = args + return { refreshToken: 'rt', accessJwt: 'jwt', expiresAt: '2999-01-01T00:00:00Z', org: 'acme' } + }) + const code = await runRemoteLogin(['prod', '--org=acme'], ctx, { login }) + assert.equal(code, 0) + assert.equal(seen.org, 'acme') // not undefined, which would run a no-org browser flow +}) + +test('--token-file=path (equals form) takes the static path, not the browser flow', async () => { + const hypHome = await tmpHome() + const tokenFile = path.join(hypHome, 'tok.txt') + await fs.writeFile(tokenFile, 'sk-static\n') + const { ctx } = await makeCtx({ hypHome }) + let called = false + const login = /** @type {any} */ (async () => { called = true; return {} }) + const code = await runRemoteLogin(['prod', `--token-file=${tokenFile}`], ctx, { login }) + assert.equal(code, 0) + assert.equal(called, false) // equals form must not fall through to the browser flow + const creds = await readCredentials(path.join(hypHome, 'hypaware')) + assert.deepEqual(creds.prod, { kind: 'static', token: 'sk-static' }) +}) + +test('--org= (equals form, empty value) is a usage error', async () => { + const hypHome = await tmpHome() + const { ctx, err } = await makeCtx({ hypHome }) + const code = await runRemoteLogin(['prod', '--org='], ctx, {}) + assert.equal(code, 2) + assert.match(err.join(''), /--org expects an org name/) +}) From fe1922a5666c69233a57c22cf840f449230e58be Mon Sep 17 00:00:00 2001 From: Kenny Daniel Date: Tue, 30 Jun 2026 14:17:42 -0700 Subject: [PATCH 58/59] remote/credentials: replace the liveness-probe lock with an age-stale mutex The cross-process credential lock (LLP 0046 D5) kept its single-flight refresh but shipped a crash-recovery mechanism with six defects: a same-host liveness probe that could not see a suspended-but-alive holder (so the age backstop stole live locks), a 45s wait shorter than the 90s steal age (so waiters never reached the steal), an empty-lock wedge on a failed owner-tag write, and a rename-aside-and-restore steal whose restore gap allowed a double hold. Replace all of it with one rule set (LLP 0049 D1): grant only by O_EXCL, break a lock past one stale age with a plain fs.rm, and release only when a per-acquisition nonce still matches. Breaking never grants the lock, so a four-line break is safe where the old steal was not. One LOCK_STALE_MS (60s, 2x the bounded token call) replaces two constants in tension; the {host,pid} tag, process.kill probe, stealLockIfAbandoned, and LOCK_TIMEOUT_MS are gone, along with the node:os import. The single-flight refresh under the lock is unchanged and still necessary for one-time-use refresh-token rotation. Retargets the one steal test from a dead-pid probe to an age-stale break. --- llp/0046-oidc-login-client.decision.md | 12 +- llp/0049-remote-credentials-lock.decision.md | 138 ++++++++++++++++ src/core/remote/credentials.js | 159 ++++++------------- test/core/remote-credentials-oidc.test.js | 12 +- 4 files changed, 206 insertions(+), 115 deletions(-) create mode 100644 llp/0049-remote-credentials-lock.decision.md diff --git a/llp/0046-oidc-login-client.decision.md b/llp/0046-oidc-login-client.decision.md index 9233521..5722d65 100644 --- a/llp/0046-oidc-login-client.decision.md +++ b/llp/0046-oidc-login-client.decision.md @@ -120,7 +120,17 @@ matched): its "unchanged token means revoked" test had a race window, because a could re-read before the winner committed and then wrongly force a re-login. The lock is held across the bounded (token-endpoint timeout) refresh call, so it is a -real mutex, not a millisecond commit latch. **Decision:** the lock file records the +real mutex, not a millisecond commit latch. + +> **Superseded in part by [LLP 0049](./0049-remote-credentials-lock.decision.md).** +> The *single-flight refresh under the lock* above stands. The crash-recovery +> mechanism originally decided here (a `{host, pid}` liveness probe plus an age +> backstop plus a rename-aside-and-restore steal) shipped with six defects and is +> replaced by LLP 0049 D1: one age threshold, break by `fs.rm`, grant only by +> `O_EXCL`, release guarded by a per-acquisition nonce. The original recovery +> reasoning is retained below for history. + +**Decision (superseded):** the lock file records the holder's `{host, pid}`, and a contender steals it only when that holder is confirmed dead (a same-host liveness probe), not by a fixed age guess; the steal is an atomic rename so two contenders can never both adopt the same abandoned lock, and a holder only ever diff --git a/llp/0049-remote-credentials-lock.decision.md b/llp/0049-remote-credentials-lock.decision.md new file mode 100644 index 0000000..9844f0e --- /dev/null +++ b/llp/0049-remote-credentials-lock.decision.md @@ -0,0 +1,138 @@ +# LLP 0049: The remote-credentials lock is an age-stale mutex, not a liveness probe + +**Type:** Decision +**Status:** Accepted +**Systems:** CLI, Query, MCP +**Author:** Kenny / Claude +**Date:** 2026-06-30 +**Related:** LLP 0033, LLP 0046 + +> LLP 0046 D5 introduced a cross-process write lock on the shared `0600` +> credential store so a one-time-use refresh token is rotated single-flight. The +> lock worked; its *crash-recovery* mechanism (a `{host, pid}` liveness probe plus +> an age backstop plus a rename-aside-and-restore steal) did not. A review of the +> shipped code found six defects clustered entirely in that recovery path. This +> decision replaces the recovery mechanism with the simplest thing that is +> provably safe, and supersedes the second half of LLP 0046 D5. + +## What stays (and why) + +The **single-flight refresh under the lock** decision of LLP 0046 D5 is sound and +unchanged: the read-decide-refresh-commit still runs inside one lock hold, so only +one process calls the token endpoint for a given store at a time, and an +`invalid_grant` observed under the lock is an unambiguous revocation rather than a +lost race. We do **not** move the network refresh outside the lock. LLP 0046 D5 +already rejected that ("optimistic compare-and-swap"), and the rejection is +correct: the refresh token is one-time-use, so two processes refreshing the same +token concurrently would double-spend it, and a loser that re-read before the +winner committed could not tell a contended `invalid_grant` from a real +revocation. Holding the bounded (30s token-endpoint timeout) refresh inside the +lock is the price of correctness and we keep paying it. + +The bugs were never in *holding* the lock. They were in *recovering* a lock whose +holder died. + +## The defects being retired + +The recovery mechanism (`lockHolderIsDead`, `stealLockIfAbandoned`, the +`{host, pid}` owner tag, `LOCK_STALE_MS` OR-d with liveness, `LOCK_TIMEOUT_MS`) +produced: + +1. **Live-holder theft.** `abandoned = lockHolderIsDead(...) || age > LOCK_STALE_MS` + applied the age backstop even to a probeable, *alive* same-host holder, so a + holder suspended past 90s (laptop sleep, NFS stall) had its lock stolen and two + writers clobbered a just-rotated refresh token. This contradicted the function's + own doc comment. +2. **Unreachable steal.** `LOCK_TIMEOUT_MS` (45s) was smaller than `LOCK_STALE_MS` + (90s), so an in-flight waiter gave up before an abandoned lock ever aged into + stealable territory; cross-host crashes and PID reuse wedged every contender for + 45s. +3. **Empty-lock wedge.** A failed owner-tag write left a 0-byte lock that parsed as + "not dead" and blocked all writes until the 90s backstop. +4. **Restore-gap double-hold.** `stealLockIfAbandoned`'s rename-aside-then-restore + branch left `lockPath` briefly absent, so a third contender's `O_EXCL` create + could succeed alongside the restored holder. + +(Numbering here is local to this doc; the PR review thread carries the full list.) + +## Decision + + + +### D1: One age threshold, break by unlink, grant only by `O_EXCL` + +Replace the entire recovery apparatus with three rules: + +- **Grant only by `O_EXCL`.** A process holds the lock if and only if its + `fs.open(lockPath, 'wx')` (i.e. `O_CREAT | O_EXCL`) succeeds. The OS guarantees + exactly one winner per file-existence epoch. Nothing else ever confers the lock. +- **Break by age, with one constant.** A contender that sees `EEXIST` stats the + lock; if its mtime is older than `LOCK_STALE_MS`, the holder is treated as dead + and the contender `fs.rm`s the lock, then loops back to the `O_EXCL` create. + `LOCK_STALE_MS` is set to **60s**, comfortably above the longest legitimate hold + (the 30s-bounded token call plus a millisecond commit) and below user patience. + There is **no** liveness probe, **no** `{host, pid}` tag, and **no** second + timeout: because the lock's mtime is fixed at acquisition and the wall clock only + advances, every waiter is guaranteed to either acquire (the holder released) or + break (age crossed `LOCK_STALE_MS`) within one stale interval. A generous overall + deadline (`2 × LOCK_STALE_MS`) remains only as a runaway-loop backstop. +- **Release only your own lock.** The lock file records a per-acquisition random + nonce (`crypto.randomUUID`). Release reads the file and unlinks only if the nonce + still matches, so a holder whose overran lock was already broken and re-acquired + by a successor never deletes that successor's lock. + +Breaking **never grants** the lock: a breaker that unlinks a stale file must still +win the `O_EXCL` create like everyone else. + +**Rejected** keeping a liveness probe "for same-host precision": `process.kill(pid, 0)` +cannot see a suspended-but-alive holder, is defeated by PID reuse, is meaningless +across a shared `HOME`, and was the source of defects 1-2. A single age threshold, +chosen against the *bounded* hold time, subsumes every case the probe tried to +distinguish. + +**Rejected** the rename-aside-and-verify steal: its restore branch was defect 4, +and it is unnecessary once the grant is unconditionally `O_EXCL` - a plain `fs.rm` +break is single-winner *in effect* because the grant, not the break, decides who +holds. + +### Safety argument + +- **Mutual exclusion.** The lock is granted only by `O_EXCL`, which has one winner. + Breaking only removes a file. Two processes can therefore hold simultaneously + only if a break removed a *live* (non-stale) holder's lock - which requires a + fresh lock to be created in the window between the breaker's `stat` (which read + `age > LOCK_STALE_MS`) and its `rm`: two adjacent filesystem awaits. +- **Bounded worst case.** Even in that window, both writers commit via atomic + `tmp + rename` of the credential file, so no torn or partial file is ever + observed. The only possible loss is one one-time-use refresh-token rotation being + overwritten, costing **at most one needless re-login**, which self-heals on the + next refresh. We accept this bounded, self-healing edge instead of chasing + perfect exclusion, which is unattainable without OS-auto-released advisory locks + (`flock`/`fcntl`) that Node's standard `fs` does not expose and that we will not + add a native dependency to reach. +- **No deadlock.** A crashed holder's lock is always broken within `LOCK_STALE_MS`, + so no contender waits longer than one stale interval. +- **Single-flight preserved.** Absent the crash-recovery window, exactly one + process holds the lock and therefore exactly one refreshes, so one-time-use + rotation is never double-spent (the property LLP 0046 D5 needs). + +## Consequences + +- Net deletion: `lockHolderIsDead`, `stealLockIfAbandoned`, the `LOCK_OWNER` + `{host, pid}` tag, and `LOCK_TIMEOUT_MS` all go; `withCredentialsLock` shrinks to + an `O_EXCL` acquire loop plus a four-line `breakLockIfStale`. The `node:os` import + is no longer needed; `node:crypto` is added for the nonce. +- One constant (`LOCK_STALE_MS`) with one job, derived from the bounded token call, + replaces two constants that were in tension. +- The recovery semantics change is observable to exactly one test + (`a write steals a lock abandoned by a crashed holder`), which is retargeted from + a dead-PID steal to an age-stale steal (back-date the lock's mtime). +- Out of scope, tracked separately from this lock change: the missing `AbortSignal` + on the MCP forward/rpc fetch, and `https` enforcement on the derived identity + origin (both raised in the same review). + +## References + +- [LLP 0033](./0033-remote-query-attach.spec.md) (the `0600` query-scoped store) +- [LLP 0046](./0046-oidc-login-client.decision.md) D5 (single-flight refresh; this + doc supersedes its recovery mechanism) diff --git a/src/core/remote/credentials.js b/src/core/remote/credentials.js index 2f15ed2..7bd3193 100644 --- a/src/core/remote/credentials.js +++ b/src/core/remote/credentials.js @@ -1,7 +1,7 @@ // @ts-check +import crypto from 'node:crypto' import fs from 'node:fs/promises' -import os from 'node:os' import path from 'node:path' import process from 'node:process' @@ -36,22 +36,18 @@ const CREDENTIALS_BASENAME = 'remote-credentials.json' const REFRESH_SKEW_MS = 60 * 1000 /** - * Max time to wait for the cross-process write lock before giving up. The lock - * is now held across the bounded token-endpoint refresh (single-flight, see - * {@link refreshOidcSession}), so a waiter must outlast one in-flight refresh - * rather than a millisecond commit. A crashed holder is stolen immediately by - * liveness, so this only bounds waiting on a *live* sibling's refresh. + * The one lock constant: a lock whose mtime is older than this is treated as + * abandoned by a dead holder and broken (LLP 0049 D1). It is set comfortably + * above the longest *legitimate* hold - the 30s-bounded token-endpoint refresh + * ({@link refreshOidcSession}) plus a millisecond commit - so a live holder mid + * refresh is never broken, and comfortably below user patience. There is no + * separate wait timeout and no liveness probe: because a lock's mtime is fixed at + * acquisition and the clock only advances, every waiter is guaranteed to either + * acquire (the holder released) or break the lock (its age crossed this) within + * one stale interval. {@link withCredentialsLock} keeps a `2x` overall deadline + * only as a runaway-loop backstop. */ -const LOCK_TIMEOUT_MS = 45 * 1000 - -/** - * Age backstop for stealing an abandoned lock when liveness can't be checked: a - * holder on another machine (a shared `HOME`) or an unparseable/legacy lock - * file. It must exceed the longest legitimate hold (one refresh) so a live, slow - * holder is never stolen; same-host crashes are caught far sooner by pid - * liveness, not by this age. - */ -const LOCK_STALE_MS = 90 * 1000 +const LOCK_STALE_MS = 60 * 1000 /** * Single-entry parse cache for the credential file. The stdio proxy resolves a @@ -593,90 +589,28 @@ function delay(ms) { } /** - * This process's lock-owner tag, written into the lock file so a contender can - * tell a crashed holder (steal it) from a live one (wait). Host-qualified so a - * pid recorded on a different machine (a shared `HOME`) is never probed against - * this machine's process table. - */ -const LOCK_OWNER = JSON.stringify({ host: os.hostname(), pid: process.pid }) - -/** - * Whether the process that wrote a lock file is gone, so its lock is safe to - * steal. A same-host holder is probed with signal 0: `ESRCH` means no such - * process (dead); a delivered signal or `EPERM` means it is alive. A holder on a - * different host, or an unparseable/legacy lock, can't be probed and is reported - * "not dead" so the caller falls back to the age backstop. - * - * @param {string} ownerText raw lock-file contents - * @returns {boolean} - */ -function lockHolderIsDead(ownerText) { - /** @type {{ host?: unknown, pid?: unknown }} */ - let owner - try { - owner = JSON.parse(ownerText) - } catch { - return false - } - if (!owner || owner.host !== os.hostname() || typeof owner.pid !== 'number') return false - try { - process.kill(owner.pid, 0) - return false // signal accepted: the holder is alive - } catch (err) { - return /** @type {NodeJS.ErrnoException} */ (err).code === 'ESRCH' - } -} - -/** - * Try to steal a held lock, returning whether the caller should retry the - * acquire (the lock is gone or was abandoned) versus keep waiting (a live - * holder). Stealing is atomic: we rename the lock aside before removing it, and - * a given lock can be renamed by exactly one contender (the loser sees `ENOENT`), - * so two stealers can never both adopt it and clobber each other. We only delete - * the captured file when its contents still match the abandoned lock we - * inspected, so a fresh lock a new holder grabbed in the race window is restored, - * not evicted. + * Break a lock whose holder is past {@link LOCK_STALE_MS}, so a crashed holder can + * never wedge the store for longer than one stale interval (LLP 0049 D1). The + * break is a plain `fs.rm`: it does not grant the lock, it only clears a dead + * file, so the contender that broke it must still win the `O_EXCL` create like + * everyone else. That is what makes a four-line break safe where the old + * rename-aside-and-restore steal was not - exclusivity is decided by the create, + * never by the break. * * @param {string} lockPath - * @returns {Promise} true to retry the acquire, false to keep waiting + * @returns {Promise} */ -async function stealLockIfAbandoned(lockPath) { +async function breakLockIfStale(lockPath) { /** @type {Stats} */ let st - /** @type {string} */ - let ownerText try { st = await fs.stat(lockPath) - ownerText = await fs.readFile(lockPath, 'utf8') - } catch { - return true // the lock vanished between the failed open and now: retry - } - const abandoned = lockHolderIsDead(ownerText) || Date.now() - st.mtimeMs > LOCK_STALE_MS - if (!abandoned) return false // a live holder mid-refresh: wait it out - const captured = `${lockPath}.stale-${process.pid}` - try { - await fs.rename(lockPath, captured) - } catch { - return true // another contender captured it first: retry the acquire - } - // We now exclusively own the captured file. Confirm it is the same abandoned - // lock we inspected (a new holder would have written a different owner tag) and - // only then discard it; otherwise put it back without clobbering whatever sits - // at lockPath now, and keep waiting. - let capturedText = '' - try { - capturedText = await fs.readFile(captured, 'utf8') - } catch { /* already gone: treat as removed */ } - if (capturedText === ownerText) { - await fs.rm(captured, { force: true }) - return true - } - try { - await fs.rename(captured, lockPath) } catch { - await fs.rm(captured, { force: true }) + return // vanished between the failed open and now: the loop retries the create } - return false + // A holder within its budget is alive (or recently so): wait it out. Only an + // age past the bounded-hold ceiling marks it dead and breakable. + if (Date.now() - st.mtimeMs > LOCK_STALE_MS) await fs.rm(lockPath, { force: true }) } /** @@ -684,41 +618,48 @@ async function stealLockIfAbandoned(lockPath) { * `hyp` processes (a verb call beside a proxy, two MCP clients). Without it two * writers to *different* targets each read the whole map and rename; the later * rename clobbers the earlier writer's just-rotated one-time-use refresh token - * for the other target, forcing a needless re-login. An `O_EXCL` lock file is - * the cross-process mutex; it records this process's `{host, pid}` so the lock - * is a real mutex even though it is now held across the bounded token-endpoint - * refresh ({@link refreshOidcSession}). The cache is dropped on entry so the - * locked body reads the freshest on-disk map (an identical-size sibling rewrite - * the parse cache would otherwise miss). + * for the other target, forcing a needless re-login. An `O_EXCL` lock file is the + * cross-process mutex, held across the bounded token-endpoint refresh + * ({@link refreshOidcSession}). The cache is dropped on entry so the locked body + * reads the freshest on-disk map (an identical-size sibling rewrite the parse + * cache would otherwise miss). * - * A crashed holder's lock is stolen as soon as its pid is seen dead, not after a - * blind age wait, so a crash neither wedges every future write nor (the inverse) - * lets a fixed age threshold misfire on a merely slow live holder. The steal is - * atomic and single-winner (see {@link stealLockIfAbandoned}), and release only - * removes a lock that is still ours, so a holder can never evict a successor. + * Crash recovery is age-only (LLP 0049 D1): the lock is granted solely by the + * `O_EXCL` create, a holder past {@link LOCK_STALE_MS} is broken with a plain + * `fs.rm`, and release removes the file only if its per-acquisition nonce is still + * ours, so a holder whose overran lock was broken and re-acquired never evicts the + * successor. No liveness probe, no `{host, pid}` tag, no second timeout: the only + * deadline is a `2x` runaway-loop backstop, because a fixed-mtime lock is + * guaranteed to be acquired or broken within one stale interval. * * @template T * @param {string} stateDir * @param {() => Promise} fn * @returns {Promise} - * @ref LLP 0046#d5 [implements]: pid-identified mutex with liveness-based, single-winner steal + * @ref LLP 0049#d1 [implements]: age-stale mutex - grant only by O_EXCL, break by rm, release by nonce */ async function withCredentialsLock(stateDir, fn) { await fs.mkdir(stateDir, { recursive: true }) const lockPath = `${remoteCredentialsPath(stateDir)}.lock` - const deadline = Date.now() + LOCK_TIMEOUT_MS + const nonce = crypto.randomUUID() + const deadline = Date.now() + LOCK_STALE_MS * 2 for (;;) { try { const handle = await fs.open(lockPath, 'wx') try { - await handle.writeFile(LOCK_OWNER) + await handle.writeFile(nonce) + } catch (err) { + // A create that could not record its nonce must not leave an empty lock + // that wedges contenders until the stale age; drop our own fresh file. + await fs.rm(lockPath, { force: true }) + throw err } finally { await handle.close() } break } catch (err) { if (!err || /** @type {NodeJS.ErrnoException} */ (err).code !== 'EEXIST') throw err - if (await stealLockIfAbandoned(lockPath)) continue + await breakLockIfStale(lockPath) if (Date.now() > deadline) { throw new Error('timed out acquiring the remote credentials lock') } @@ -730,11 +671,11 @@ async function withCredentialsLock(stateDir, fn) { rawCache = null return await fn() } finally { - // Remove only our own lock: if a slow hold ever overran the age backstop and - // a contender stole it, the file now belongs to a successor we must not evict. + // Remove only our own lock: if a hold ever overran the stale age and a + // contender broke and re-acquired it, the file now belongs to a successor. try { const owner = await fs.readFile(lockPath, 'utf8') - if (owner === LOCK_OWNER) await fs.rm(lockPath, { force: true }) + if (owner === nonce) await fs.rm(lockPath, { force: true }) } catch { /* already gone */ } } } diff --git a/test/core/remote-credentials-oidc.test.js b/test/core/remote-credentials-oidc.test.js index fe22154..fe1cbca 100644 --- a/test/core/remote-credentials-oidc.test.js +++ b/test/core/remote-credentials-oidc.test.js @@ -328,14 +328,16 @@ test('resolveAccessJwt refreshes with the freshest stored refresh token in one s assert.equal(/** @type {any} */ ((await readCredentials(dir)).prod).refreshToken, 'rt-final') }) -test('a write steals a lock abandoned by a crashed holder (dead pid)', async () => { +test('a write breaks a lock left stale by a crashed holder', async () => { const dir = await tmpState() await writeToken(dir, 'prod', 'sk-1') // creates the store const lockPath = `${remoteCredentialsPath(dir)}.lock` - // Simulate a crashed holder: a lock file naming a dead pid on this host. - await fs.writeFile(lockPath, JSON.stringify({ host: os.hostname(), pid: 999999999 })) - // The next write must steal the lock by liveness (the pid is dead) and succeed, - // not wait out the age backstop or time out. + // Simulate a crashed holder: a leftover lock file, back-dated well past the + // stale age so it reads as abandoned (LLP 0049 D1 - age, not liveness). + await fs.writeFile(lockPath, 'crashed-holder-nonce') + const stale = new Date(Date.now() - 5 * 60 * 1000) + await fs.utimes(lockPath, stale, stale) + // The next write must break the stale lock by age and succeed, not time out. await writeToken(dir, 'prod', 'sk-2') assert.deepEqual((await readCredentials(dir)).prod, { kind: 'static', token: 'sk-2' }) }) From 20688e87a85588c812ea7fd42f682dd9d99b4a89 Mon Sep 17 00:00:00 2001 From: Kenny Daniel Date: Tue, 30 Jun 2026 16:29:00 -0700 Subject: [PATCH 59/59] remote/credentials: re-validate the refresh decision against disk under the lock Two concurrency bugs shared one root cause: a decision inside the lock body trusted an observation made before the lock was acquired, which can be tens of seconds stale under contention. - The sibling-adopt freshness check used the `now` sampled at resolveAccessJwt entry, so after a long lock wait it could adopt a sibling JWT that had already expired. Re-sample the clock at the check instead. - The refresh commit wrote the session back unconditionally, so in the bounded lock-break double-hold window (LLP 0049) it could resurrect a session a concurrent remove had deleted, or clobber a fresh login. Make the commit a compare-and-swap against the refresh token it read under the lock. The CAS makes LLP 0049's "at most one needless re-login" worst case actually hold. The lock primitive itself is unchanged: its residual double-hold window is irreducible with Node's stdlib fs and already accepted by the design. --- llp/0049-remote-credentials-lock.decision.md | 16 +++--- src/core/remote/credentials.js | 54 +++++++++++++++----- test/core/remote-credentials-oidc.test.js | 32 ++++++++++++ 3 files changed, 83 insertions(+), 19 deletions(-) diff --git a/llp/0049-remote-credentials-lock.decision.md b/llp/0049-remote-credentials-lock.decision.md index 9844f0e..6249b4f 100644 --- a/llp/0049-remote-credentials-lock.decision.md +++ b/llp/0049-remote-credentials-lock.decision.md @@ -104,12 +104,16 @@ holds. `age > LOCK_STALE_MS`) and its `rm`: two adjacent filesystem awaits. - **Bounded worst case.** Even in that window, both writers commit via atomic `tmp + rename` of the credential file, so no torn or partial file is ever - observed. The only possible loss is one one-time-use refresh-token rotation being - overwritten, costing **at most one needless re-login**, which self-heals on the - next refresh. We accept this bounded, self-healing edge instead of chasing - perfect exclusion, which is unattainable without OS-auto-released advisory locks - (`flock`/`fcntl`) that Node's standard `fs` does not expose and that we will not - add a native dependency to reach. + observed. The refresh commit is a **compare-and-swap** against the refresh token + it read under the lock, so a second writer in the double-hold window cannot + resurrect a session a concurrent `remove` deleted nor clobber a fresh login - it + declines to write when the on-disk record is no longer the one it refreshed from. + The only remaining loss is the two writers double-spending one one-time-use + refresh-token rotation (the loser gets `invalid_grant`), costing **at most one + needless re-login**, which self-heals on the next refresh. We accept this bounded, + self-healing edge instead of chasing perfect exclusion, which is unattainable + without OS-auto-released advisory locks (`flock`/`fcntl`) that Node's standard + `fs` does not expose and that we will not add a native dependency to reach. - **No deadlock.** A crashed holder's lock is always broken within `LOCK_STALE_MS`, so no contender waits longer than one stale interval. - **Single-flight preserved.** Absent the crash-recovery window, exactly one diff --git a/src/core/remote/credentials.js b/src/core/remote/credentials.js index 7bd3193..2fb7468 100644 --- a/src/core/remote/credentials.js +++ b/src/core/remote/credentials.js @@ -251,17 +251,32 @@ export async function writeSession(stateDir, target, session) { /** * Persist an oidc session WITHOUT taking the write lock: the caller must already - * hold it. {@link writeSession} is the locked public entry point; the refresh - * compare-and-swap in {@link refreshOidcSession} re-reads and writes inside one - * lock acquisition and uses this directly to keep that read-modify-write atomic. + * hold it. {@link writeSession} is the locked public entry point; the single-flight + * refresh in {@link refreshOidcSession} re-reads and writes inside one lock + * acquisition and uses this directly to keep that read-modify-write atomic. + * + * When `ifRefreshToken` is given, the write is a compare-and-swap: it commits only + * if the on-disk record is still the oidc record that token came from. Under the + * lock that always holds, so a normal refresh always commits. It diverges only in + * the bounded lock-break double-hold window (LLP 0049): there a concurrent `remove` + * or re-login may have changed the record while the refresher was on the network, + * and honoring that newer write - rather than resurrecting a removed session or + * clobbering a fresh login - is what keeps the worst case at the one needless + * re-login LLP 0049 promises. Returns whether the write happened. * * @param {string} stateDir * @param {string} target * @param {{ refreshToken: string, accessJwt: string, expiresAt: string, org: string }} session - * @returns {Promise} + * @param {string} [ifRefreshToken] commit only if the stored record still carries this refresh token + * @returns {Promise} + * @ref LLP 0049#d1 [implements]: refresh commit is a compare-and-swap, so the double-hold window cannot resurrect or clobber a sibling write */ -async function commitSession(stateDir, target, session) { +async function commitSession(stateDir, target, session, ifRefreshToken) { const current = await readRawCredentials(stateDir, { mutable: true }) + if (ifRefreshToken !== undefined) { + const cur = /** @type {Record | undefined} */ (current[target]) + if (!cur || typeof cur !== 'object' || cur.refreshToken !== ifRefreshToken) return false + } current[target] = { kind: 'oidc', refreshToken: session.refreshToken, @@ -270,6 +285,7 @@ async function commitSession(stateDir, target, session) { org: session.org, } await writeCredentials(stateDir, current) + return true } /** @@ -376,7 +392,7 @@ export async function resolveAccessJwt({ target, env, stateDir, identityBase, no if (!identityBase) { return { ok: false, error: `cannot refresh '${target}': no identity endpoint resolved` } } - return refreshOidcSession({ target, stateDir, identityBase, fetchImpl, now, from: entry, envName }) + return refreshOidcSession({ target, stateDir, identityBase, fetchImpl, from: entry, envName }) } /** @@ -398,19 +414,25 @@ export async function resolveAccessJwt({ target, env, stateDir, identityBase, no * it. Holding the lock across the bounded token call is the cost of removing the * race entirely; see {@link withCredentialsLock} for why that is safe. * + * Two decisions deliberately re-validate against disk at the point of use rather + * than trusting a pre-lock observation, since acquiring the lock can take tens of + * seconds: the adopt freshness check re-samples the clock (a JWT that looked fresh + * at entry may now be within the skew window), and the commit is a compare-and-swap + * against the token we refreshed from (the record may have been removed or replaced + * in the bounded double-hold window of LLP 0049). + * * @param {{ * target: string, * stateDir: string, * identityBase: string, * fetchImpl?: typeof fetch, - * now: number, * from: RemoteOidcRecord, * envName: string, * }} args * @returns {Promise<{ ok: true, token: string, source: 'file', kind: 'oidc' } | { ok: false, error: string }>} * @ref LLP 0046#d5 [implements]: single-flight refresh under the lock - no double-spend, clobber, or lost-race re-login across concurrent hyp processes */ -async function refreshOidcSession({ target, stateDir, identityBase, fetchImpl, now, from, envName }) { +async function refreshOidcSession({ target, stateDir, identityBase, fetchImpl, from, envName }) { return withCredentialsLock(stateDir, async () => { const latest = await readOidcRecord(stateDir, target) if (!latest) { @@ -418,23 +440,29 @@ async function refreshOidcSession({ target, stateDir, identityBase, fetchImpl, n // got the lock. Honor that instead of resurrecting a refreshed session. return { ok: false, error: noTokenError(target, envName) } } - // A sibling refreshed (or the user re-logged in) while we waited for the - // lock: adopt its newer, still-fresh JWT with no token-endpoint call. - if (latest.accessJwt !== from.accessJwt && isFresh(latest, now)) { + // A sibling refreshed (or the user re-logged in) while we waited for the lock: + // adopt its newer JWT with no token-endpoint call. Freshness is re-checked + // against the clock now, not the `now` sampled before the (possibly long) lock + // wait, so we never adopt a sibling token that expired while we waited. + if (latest.accessJwt !== from.accessJwt && isFresh(latest, Date.now())) { return { ok: true, token: latest.accessJwt, source: 'file', kind: 'oidc' } } // We are the single in-flight refresher. Refresh from the freshest stored // refresh token (a sibling may have rotated it to a token whose JWT is itself // already stale) and commit. A rotated one-time-use token replaces the // consumed one; otherwise the stored token is kept. `org` is fixed for the - // token's life, so an omitted one keeps the stored value. + // token's life, so an omitted one keeps the stored value. The commit is a + // compare-and-swap on `latest.refreshToken`: should the bounded double-hold + // window have let a concurrent remove or login change the record while we were + // on the network, we surface the fresh JWT for this in-flight request but leave + // the newer on-disk state untouched rather than resurrecting or clobbering it. const refreshed = await refreshSession({ identityBase, refreshToken: latest.refreshToken, fetchImpl }) await commitSession(stateDir, target, { refreshToken: refreshed.refreshToken || latest.refreshToken, accessJwt: refreshed.accessJwt, expiresAt: refreshed.expiresAt, org: refreshed.org || latest.org, - }) + }, latest.refreshToken) return { ok: true, token: refreshed.accessJwt, source: 'file', kind: 'oidc' } }) } diff --git a/test/core/remote-credentials-oidc.test.js b/test/core/remote-credentials-oidc.test.js index fe1cbca..92e4a8f 100644 --- a/test/core/remote-credentials-oidc.test.js +++ b/test/core/remote-credentials-oidc.test.js @@ -271,6 +271,38 @@ test('resolveAccessJwt does not resurrect a session removed before the refresh', assert.equal(/** @type {any} */ ((await readCredentials(dir)).prod), undefined) }) +test('a refresh does not resurrect a record removed during the network call (commit is a compare-and-swap)', async () => { + const dir = await tmpState() + await writeSession(dir, 'prod', { refreshToken: 'rt-old', accessJwt: 'jwt-old', expiresAt: PAST, org: 'acme' }) + // Simulate the bounded lock-break double-hold window (LLP 0049): while this + // refresher is on the network, a concurrent holder removes the record (writing + // the file directly, as if it too held the lock). The commit must honor that + // removal, not write the refreshed session back over it. + const fetchImpl = /** @type {any} */ (async () => { + await fs.writeFile(remoteCredentialsPath(dir), JSON.stringify({})) + return { ok: true, status: 200, text: async () => JSON.stringify({ access_jwt: 'jwt-new', expires_at: FUTURE, org: 'acme', refresh_token: 'rt-new' }) } + }) + const r = await resolveAccessJwt({ target: 'prod', env: {}, stateDir: dir, identityBase: 'https://h/v1/identity', fetchImpl }) + // The in-flight request still gets the freshly minted JWT... + assert.equal(/** @type {any} */ (r).token, 'jwt-new') + // ...but the removed record stays removed - nothing is resurrected on disk. + assert.equal(/** @type {any} */ ((await readCredentials(dir)).prod), undefined) +}) + +test('a refresh does not clobber a record a concurrent login replaced during the network call', async () => { + const dir = await tmpState() + await writeSession(dir, 'prod', { refreshToken: 'rt-old', accessJwt: 'jwt-old', expiresAt: PAST, org: 'acme' }) + // A concurrent holder re-logs in (static token) while we are refreshing the old + // oidc session. The CAS on the refreshed-from token must leave the newer login. + const fetchImpl = /** @type {any} */ (async () => { + await fs.writeFile(remoteCredentialsPath(dir), JSON.stringify({ prod: { kind: 'static', token: 'sk-fresh' } })) + return { ok: true, status: 200, text: async () => JSON.stringify({ access_jwt: 'jwt-new', expires_at: FUTURE, org: 'acme', refresh_token: 'rt-new' }) } + }) + await resolveAccessJwt({ target: 'prod', env: {}, stateDir: dir, identityBase: 'https://h/v1/identity', fetchImpl }) + // The concurrent login wins; the stale refresh did not overwrite it. + assert.deepEqual(/** @type {any} */ ((await readCredentials(dir)).prod), { kind: 'static', token: 'sk-fresh' }) +}) + test('resolveAccessJwt keeps the stored org when a refresh response omits it', async () => { const dir = await tmpState() await writeSession(dir, 'prod', { refreshToken: 'rt', accessJwt: 'jwt-old', expiresAt: PAST, org: 'acme' })