oauth/forward: refresh near-expired upstream id_token at /token (#121)#122
Merged
Merged
Conversation
In forward mode the bearer the MCP client receives is the upstream id_token verbatim. Google's silent-SSO can return a cached id_token whose exp is set from the original mint time, sometimes leaving only minutes of remaining life. Pod log evidence from antalya-mcp 2026-05-15: 19:30:38 INF upstream code exchange succeeded expires_in=3598 19:38:39 ERR OAuth token expired exp=1778872374 i.e. an id_token with ~2 min of life forwarded as a 1h bearer. Result: ChatGPT / claude.ai reports "disconnected" minutes after connecting. Fix: * /authorize: when oauth.upstream_offline_access is true, add both Google's `access_type=offline` + `prompt=consent` and the RFC-strict `offline_access` scope. Either form unlocks a refresh_token from the upstream, depending on provider; the unrecognised form is ignored. * /token (forward mode + refresh_token returned + id_token remaining life < 55 min): immediately exchange the upstream refresh_token for a fresh id_token via grant_type=refresh_token, swap into the bearer before forwarding. Soft-fail on refresh errors — keep the original near-expired id_token rather than break /token entirely. * The upstream refresh_token is NOT exposed downstream. #115's no-downstream-refresh decision stands. This change only ensures the 1h window the AS metadata advertises is the window the client actually gets. Tests (oauth_forward_refresh_test.go): * TestOAuthAuthorize_OfflineAccessParams: with/without offline_access on cfg, /authorize sends access_type=offline+prompt=consent+offline_access scope (or none of them). * TestOAuthToken_RefreshesNearExpiredIDToken: 2-min id_token triggers one refresh_token call; expires_in in /token response > 50 min. * TestOAuthToken_SkipsRefreshWhenIDTokenFresh: 57-min id_token, zero refresh calls. * TestOAuthToken_RefreshFailureSoftFallsBack: upstream returns 500 on refresh; /token still 200, original id_token forwarded. * TestOAuthToken_NoRefreshWhenUpstreamReturnsNoRefreshToken: no upstream refresh_token → refresh path not attempted. Gating-mode deployments unaffected — the refresh branch is gated on oauthForwardMode(). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sending offline_access scope to Google produces a HARD invalid_scope error at /authorize (Google does not silently ignore unknown scopes). Previous commit assumed it was a no-op; it isn't. Provider-detect by issuer: * Google issuer -> access_type=offline + prompt=consent, NO offline_access scope. * Anything else (Auth0, etc.) -> offline_access scope, no access_type. E2E failure that surfaced this: "Access blocked: Some requested scopes were invalid. invalid=[offline_access]" on first /authorize against antalya-mcp after #122 was deployed. Test updated to cover both providers x both offline_access settings. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The bearer returned to MCP clients is the upstream id_token under ALL broker-mode deployments (`bearerToken := tokenResp.IDToken` runs unconditionally), not just forward mode. Initial #121 patch gated the refresh on `oauthForwardMode()` and silently skipped gating+broker_upstream deployments (github-mcp, otel-google-gating-mcp) which suffer the exact same Google-cached-id_token disconnect. Reproduced today on github-mcp: same overnight-disconnect symptom as antalya. Pulling `oauthForwardMode()` -> `oauthBrokerMode()` makes the refresh fire on every broker deployment. Constant renamed `forwardModeIDTokenRefreshThresholdSeconds` -> `brokerModeIDTokenRefreshThresholdSeconds` to match the new scope. Added test `TestOAuthToken_RefreshesNearExpiredIDToken_GatingBrokerUpstream` guarding the broader gate; existing forward-mode tests stay green. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
1. Drop default prompt=consent on Google /authorize. Google mints the
refresh_token on first interactive consent and honors it on silent
SSO thereafter; forcing consent every authorize shows the
"access your account when not using" screen on every login and is
UX-hostile. New config field oauth.upstream_force_consent (default
false) restores the previous behaviour for operators who need to
force re-enrollment (rotated upstream client, revoked grant).
2. Drop dead `var _ = io.Discard` placeholder from
oauth_forward_refresh_test.go.
3. Surface error_description in refreshUpstreamIDToken's error
wrapping. Google's refresh-token failures are diagnostically richer
in error_description ("Token has been expired or revoked") than the
bare error enum ("invalid_grant"). New refreshErrorFields helper +
sanitizeErrorDesc (strips control chars, caps at 120 bytes) extract
the description without leaking arbitrary IdP-side bytes.
4. Two new tests:
- TestOAuthToken_RefreshFallsBackWhenUpstreamReturnsNoIDToken:
refresh response with access_token only (no id_token) must
soft-fail and forward the original near-expired id_token.
- TestOAuthToken_RefreshErrorDescriptionSurfacedInLog: invalid_grant
+ "Token has been expired or revoked" round-trips through the
wrapped error untouched.
- TestOAuthAuthorize_ForceConsentFlag: prompt=consent only set when
UpstreamForceConsent=true; default omitted.
5. billing-mcp values already carry upstream_offline_access: true
(Auth0 issuer uses the offline_access scope path; no change to its
pre-existing config needed). No file edit; verified in review.
Existing TestOAuthAuthorize_OfflineAccessParams.google_enabled updated
to assert prompt=consent is now absent by default.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #121.
What
Forward-mode bearer = upstream id_token verbatim. Google's silent-SSO sometimes returns an id_token with only minutes of remaining life, so the MCP client gets a session far shorter than the 1h
expires_inwe advertise. This PR uses the upstream's refresh_token grant at /token to mint a fresh id_token before forwarding.Why
Pod log from antalya-mcp 2026-05-15:
```
19:30:38 INF upstream code exchange succeeded expires_in=3598
19:38:39 ERR OAuth token expired exp=1778872374
```
8 minutes from auth to disconnect — id_token had ~2 min of life when we forwarded it. With the fix, each /authorize round-trip yields a deterministic ~1h window end-to-end.
Changes
Tests (5 new)
```
ok github.com/altinity/altinity-mcp/cmd/altinity-mcp 8.337s
ok github.com/altinity/altinity-mcp/pkg/clickhouse 8.644s
ok github.com/altinity/altinity-mcp/pkg/config 0.015s
ok github.com/altinity/altinity-mcp/pkg/jwe_auth 0.009s
ok github.com/altinity/altinity-mcp/pkg/server 16.861s
```
Test plan
Out of scope