oauth/forward: refresh near-expired upstream id_token at /token (#121)#122

Merged
BorisTyshkevich merged 4 commits into
mainfrom
oauth/forward-mode-id-token-refresh-121
May 16, 2026

Conversation

@BorisTyshkevich
Collaborator

Closes #121.

What

Forward-mode bearer = upstream id_token verbatim. Google's silent-SSO sometimes returns an id_token with only minutes of remaining life, so the MCP client gets a session far shorter than the 1h expires_in we advertise. This PR uses the upstream's refresh_token grant at /token to mint a fresh id_token before forwarding.

Why

Pod log from antalya-mcp 2026-05-15:

```
19:30:38 INF upstream code exchange succeeded expires_in=3598
19:38:39 ERR OAuth token expired exp=1778872374
```

8 minutes from auth to disconnect — id_token had ~2 min of life when we forwarded it. With the fix, each /authorize round-trip yields a deterministic ~1h window end-to-end.

Changes

  • `handleOAuthAuthorize`: when `oauth.upstream_offline_access` is true, add Google's `access_type=offline` + `prompt=consent` to the upstream /authorize request alongside the existing `offline_access` scope. Whichever the provider recognises unlocks a refresh_token; the other is silently ignored.
  • New helper `refreshUpstreamIDToken`: posts `grant_type=refresh_token` to the upstream /token, validates the returned id_token, returns it + parsed claims.
  • `handleOAuthTokenAuthCode`: after the initial code exchange + identity validation, if in forward mode AND a refresh_token was returned AND the id_token's remaining life is below `forwardModeIDTokenRefreshThresholdSeconds` (55 min), refresh the id_token and swap it into the bearer. Soft-fail on errors — keep the original near-expired token rather than break /token.
  • The upstream refresh_token is discarded and never exposed downstream; the no-downstream-refresh decision from #115 (RFC: support CIMD inbound at /authorize) is unchanged.

Tests (5 new)

  • `TestOAuthAuthorize_OfflineAccessParams`: with/without `upstream_offline_access`, /authorize either does or does not send `access_type=offline` + `prompt=consent` + `offline_access` scope.
  • `TestOAuthToken_RefreshesNearExpiredIDToken`: 2-min id_token triggers one refresh_token call; response `expires_in` > 50 min.
  • `TestOAuthToken_SkipsRefreshWhenIDTokenFresh`: 57-min id_token, zero refresh calls.
  • `TestOAuthToken_RefreshFailureSoftFallsBack`: upstream 500 on refresh → /token still returns 200 with original id_token.
  • `TestOAuthToken_NoRefreshWhenUpstreamReturnsNoRefreshToken`: upstream code exchange returned no refresh_token → refresh never attempted.

```
ok github.com/altinity/altinity-mcp/cmd/altinity-mcp 8.337s
ok github.com/altinity/altinity-mcp/pkg/clickhouse 8.644s
ok github.com/altinity/altinity-mcp/pkg/config 0.015s
ok github.com/altinity/altinity-mcp/pkg/jwe_auth 0.009s
ok github.com/altinity/altinity-mcp/pkg/server 16.861s
```

Test plan

  • Build image, roll to antalya-mcp + otel-google-mcp.
  • Flip `upstream_offline_access: true` in antalya/otel-google values.
  • First /authorize via claude.ai → expect Google offline-access consent screen (one-time).
  • Subsequent /authorize rounds: id_token forwarded with fresh ~1h `exp`.
  • Wait > 5 min after auth, run `execute_query` — should NOT report disconnected.
  • Pod log: `OAuth /token: refreshed near-expired id_token via upstream refresh_token grant` when applicable.

Out of scope

  • Downstream refresh_token (still removed per #115, RFC: support CIMD inbound at /authorize).
  • Extending sessions beyond Google's 1h id_token cap — would require reintroducing downstream refresh.
  • gating-mode deployments — refresh branch gated on `oauthForwardMode()`.

BorisTyshkevich and others added 4 commits May 16, 2026 12:58
In forward mode the bearer the MCP client receives is the upstream
id_token verbatim. Google's silent-SSO can return a cached id_token
whose exp is set from the original mint time, sometimes leaving only
minutes of remaining life. Pod log evidence from antalya-mcp 2026-05-15:

  19:30:38  INF  upstream code exchange succeeded   expires_in=3598
  19:38:39  ERR  OAuth token expired               exp=1778872374

i.e. an id_token with ~2 min of life forwarded as a 1h bearer. Result:
ChatGPT / claude.ai reports "disconnected" minutes after connecting.

Fix:

* /authorize: when oauth.upstream_offline_access is true, add both
  Google's `access_type=offline` + `prompt=consent` and the RFC-strict
  `offline_access` scope. Either form unlocks a refresh_token from the
  upstream, depending on provider; the unrecognised form is ignored.

* /token (forward mode + refresh_token returned + id_token remaining
  life < 55 min): immediately exchange the upstream refresh_token for
  a fresh id_token via grant_type=refresh_token, swap into the bearer
  before forwarding. Soft-fail on refresh errors — keep the original
  near-expired id_token rather than break /token entirely.

* The upstream refresh_token is NOT exposed downstream. #115's
  no-downstream-refresh decision stands. This change only ensures the
  1h window the AS metadata advertises is the window the client
  actually gets.

Tests (oauth_forward_refresh_test.go):

* TestOAuthAuthorize_OfflineAccessParams: with/without offline_access
  on cfg, /authorize sends access_type=offline+prompt=consent+offline_access
  scope (or none of them).
* TestOAuthToken_RefreshesNearExpiredIDToken: 2-min id_token triggers
  one refresh_token call; expires_in in /token response > 50 min.
* TestOAuthToken_SkipsRefreshWhenIDTokenFresh: 57-min id_token, zero
  refresh calls.
* TestOAuthToken_RefreshFailureSoftFallsBack: upstream returns 500 on
  refresh; /token still 200, original id_token forwarded.
* TestOAuthToken_NoRefreshWhenUpstreamReturnsNoRefreshToken: no upstream
  refresh_token → refresh path not attempted.

Gating-mode deployments unaffected — the refresh branch is gated on
oauthForwardMode().

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sending offline_access scope to Google produces a HARD invalid_scope
error at /authorize (Google does not silently ignore unknown scopes).
Previous commit assumed it was a no-op; it isn't. Provider-detect by
issuer:

* Google issuer -> access_type=offline + prompt=consent, NO offline_access scope.
* Anything else (Auth0, etc.) -> offline_access scope, no access_type.

E2E failure that surfaced this: "Access blocked: Some requested scopes
were invalid. invalid=[offline_access]" on first /authorize against
antalya-mcp after #122 was deployed.

Test updated to cover both providers x both offline_access settings.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The bearer returned to MCP clients is the upstream id_token under ALL
broker-mode deployments (`bearerToken := tokenResp.IDToken` runs
unconditionally), not just forward mode. Initial #121 patch gated the
refresh on `oauthForwardMode()` and silently skipped
gating+broker_upstream deployments (github-mcp, otel-google-gating-mcp)
which suffer the exact same Google-cached-id_token disconnect.

Reproduced today on github-mcp: same overnight-disconnect symptom as
antalya. Switching the gate from `oauthForwardMode()` to
`oauthBrokerMode()` makes the refresh fire on every broker deployment.

Constant renamed `forwardModeIDTokenRefreshThresholdSeconds` ->
`brokerModeIDTokenRefreshThresholdSeconds` to match the new scope.

Added test `TestOAuthToken_RefreshesNearExpiredIDToken_GatingBrokerUpstream`
guarding the broader gate; existing forward-mode tests stay green.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
1. Drop default prompt=consent on Google /authorize. Google mints the
   refresh_token on first interactive consent and honors it on silent
   SSO thereafter; forcing consent every authorize shows the
   "access your account when not using" screen on every login and is
   UX-hostile. New config field oauth.upstream_force_consent (default
   false) restores the previous behaviour for operators who need to
   force re-enrollment (rotated upstream client, revoked grant).

2. Drop dead `var _ = io.Discard` placeholder from
   oauth_forward_refresh_test.go.

3. Surface error_description in refreshUpstreamIDToken's error
   wrapping. Google's refresh-token failures are diagnostically richer
   in error_description ("Token has been expired or revoked") than the
   bare error enum ("invalid_grant"). New refreshErrorFields helper +
   sanitizeErrorDesc (strips control chars, caps at 120 bytes) extract
   the description without leaking arbitrary IdP-side bytes.

4. Three new tests:
   - TestOAuthToken_RefreshFallsBackWhenUpstreamReturnsNoIDToken:
     refresh response with access_token only (no id_token) must
     soft-fail and forward the original near-expired id_token.
   - TestOAuthToken_RefreshErrorDescriptionSurfacedInLog: invalid_grant
     + "Token has been expired or revoked" round-trips through the
     wrapped error untouched.
   - TestOAuthAuthorize_ForceConsentFlag: prompt=consent only set when
     UpstreamForceConsent=true; default omitted.

5. billing-mcp values already carry upstream_offline_access: true
   (Auth0 issuer uses the offline_access scope path; no change to its
   pre-existing config needed). No file edit; verified in review.

Existing TestOAuthAuthorize_OfflineAccessParams.google_enabled updated
to assert prompt=consent is now absent by default.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@BorisTyshkevich BorisTyshkevich merged commit 9499a95 into main May 16, 2026
4 checks passed

Development

Successfully merging this pull request may close these issues.

forward mode: short-lived id_tokens cause early disconnect
