Restore OAuth against the reworked Cloud scope model#53
Conversation
The Cloud console OAuth server reworked its scope model (RAR v2): the old scope names (account.admin, teams.read, ...) were replaced by OIDC scopes plus umbrella all/project:all/organization:all scopes, with concrete project/org selection handled on the consent screen via RFC 9396 authorization_details. That broke the hosted MCP twice over: the process-lifetime discovery cache kept advertising the removed scope names (clients got invalid_scope), and mirroring the new 120-scope catalog would exceed the authorize endpoint's scope parameter length cap. Auth fixes: - Advertise a curated scope set (openid profile email all project:all organization:all), intersected with the authorization server's live scopes_supported; fall back to mirroring discovery for projects with custom catalogs. Overridable via MCP_OAUTH_SCOPES. - Give the discovery cache a 5-minute TTL with stale-fallback on refresh failure, so authorization-server changes propagate without a redeploy. Housekeeping in the same change: - Consolidate module-level constants into constants.py (dedupes DEFAULT_ENDPOINT, prefixes the generic docs_search names with DOCS_). - Add a pyright (basic) check to the lint job and fix the type errors it found: Optional annotations, ASGIApp typing, AnyUrl resource URIs, a missing inline-content guard, and the anyio.to_thread submodule import.
Greptile SummaryThis PR restores the hosted MCP's OAuth flow against Appwrite Cloud's reworked RAR v2 scope model by curating the advertised scope set and adding a TTL-based discovery cache with stale fallback. It also consolidates module-level constants into
Confidence Score: 4/5Safe to merge; all changes are additive and the core auth contract (issuer binding, audience check, JWKS RS256 verification) is unchanged. The two items worth watching are both edge-case behaviors under specific failure conditions. The curated-scope and TTL-cache logic is the heart of this PR and is well-covered by the new test suite. Two non-blocking concerns exist: during an AS outage, stale metadata is served correctly but every request independently retries the failed fetch (10 s timeout each), because the stale entry's timestamp is never refreshed; and the MCP_OAUTH_SCOPES env override falls back to the full discovered catalog when none of its scopes intersect with the AS — counter to the operator's intent. auth.py — the stale-fallback path in both authorization_server_metadata() and authorization_server_metadata_sync() and the _advertised_scopes fallback logic when the env override produces an empty intersection. Important Files Changed
Reviews (1): Last reviewed commit: "Restore OAuth against the reworked Cloud..." | Re-trigger Greptile |
| async def authorization_server_metadata() -> dict: | ||
| project_id = configured_project_id() | ||
| cached = _discovery_cache.get(project_id) | ||
| cached = _cached_discovery(project_id) | ||
| if cached is not None: | ||
| return cached | ||
|
|
||
| url = discovery_url() | ||
| async with httpx.AsyncClient(timeout=10.0, follow_redirects=True) as client: | ||
| resp = await client.get(url) | ||
| resp.raise_for_status() | ||
| metadata = _validate_discovery(resp.json(), url) | ||
|
|
||
| _discovery_cache[project_id] = metadata | ||
| try: | ||
| async with httpx.AsyncClient(timeout=10.0, follow_redirects=True) as client: | ||
| resp = await client.get(url) | ||
| resp.raise_for_status() | ||
| metadata = _validate_discovery(resp.json(), url) | ||
| except Exception as exc: | ||
| stale = _cached_discovery(project_id, allow_stale=True) | ||
| if stale is not None: | ||
| _log(f"Discovery refresh failed ({exc}); serving stale metadata.") | ||
| return stale | ||
| raise | ||
|
|
||
| _store_discovery(project_id, metadata) | ||
| return metadata | ||
|
|
||
|
|
||
| def authorization_server_metadata_sync() -> dict: | ||
| project_id = configured_project_id() | ||
| cached = _discovery_cache.get(project_id) | ||
| cached = _cached_discovery(project_id) | ||
| if cached is not None: | ||
| return cached | ||
|
|
||
| url = discovery_url() | ||
| resp = httpx.get(url, timeout=10.0, follow_redirects=True) | ||
| resp.raise_for_status() | ||
| metadata = _validate_discovery(resp.json(), url) | ||
| _discovery_cache[project_id] = metadata | ||
| try: | ||
| resp = httpx.get(url, timeout=10.0, follow_redirects=True) | ||
| resp.raise_for_status() | ||
| metadata = _validate_discovery(resp.json(), url) | ||
| except Exception as exc: | ||
| stale = _cached_discovery(project_id, allow_stale=True) | ||
| if stale is not None: | ||
| _log(f"Discovery refresh failed ({exc}); serving stale metadata.") | ||
| return stale | ||
| raise | ||
|
|
||
| _store_discovery(project_id, metadata) | ||
| return metadata |
There was a problem hiding this comment.
Continuous retry overhead when AS is down
Both authorization_server_metadata() and authorization_server_metadata_sync() return stale data on a failed refresh, but they don't update the cached timestamp. This means every subsequent request — including every _verify_sync call, which runs in a thread-pool thread — will attempt a fresh 10-second HTTP fetch before falling back to the stale copy. Under any meaningful load during an AS outage, concurrent threads each block for the full timeout=10.0, which can exhaust the threadpool and dramatically degrade response times despite the MCP continuing to serve correct results. A simple fix is to bump the cached entry's timestamp when a stale hit is served so the entry is treated as fresh for the next TTL window.
| def _advertised_scopes(metadata: dict) -> list[str]: | ||
| """The scope set to advertise: the preferred scopes intersected with the | ||
| authorization server's live ``scopes_supported`` (so a renamed/removed scope | ||
| is never advertised). Falls back to mirroring the full discovery list when | ||
| none of the preferred scopes exist — e.g. a self-hosted project with a | ||
| custom, compact scope catalog.""" | ||
| discovered = metadata.get("scopes_supported") | ||
| if not isinstance(discovered, list): | ||
| raise ValueError( | ||
| f"authorization server discovery missing scopes_supported: {discovery_url()}" | ||
| ) | ||
| return scopes | ||
| scopes = [scope for scope in preferred_scopes() if scope in discovered] | ||
| if scopes: | ||
| return scopes | ||
| _log( | ||
| "None of the preferred scopes are in the authorization server's " | ||
| "scopes_supported; advertising the full discovered list." | ||
| ) | ||
| return discovered |
There was a problem hiding this comment.
MCP_OAUTH_SCOPES override silently falls back to the full catalog
When MCP_OAUTH_SCOPES is set but none of the specified scopes exist in the AS's scopes_supported, _advertised_scopes falls back to mirroring the full discovered list. For Cloud, that's 120+ scopes — the overflow condition the env var is specifically meant to prevent. An operator who sets this variable to work around a non-standard AS catalog would find it has no effect, and the same scope-parameter overflow will occur. The fallback log message fires, but the behavior is counter-intuitive and the current tests only cover the case where the override scopes exist in the catalog.
Summary
The Cloud console OAuth server reworked its scope model (RAR v2 — appwrite-labs/cloud#4546 and follow-ups): the old scope names (
account.admin,teams.read,projects.read, …) no longer exist. The server now publishes ~120 fine-grained scopes (project:*,organization:*) plus umbrella scopes (all,project:all,organization:all), with concrete project/organization selection happening on the consent screen via RFC 9396authorization_details.That broke the hosted MCP's OAuth flow in two independent ways, because the MCP mirrors the authorization server's entire
scopes_supportedinto its RFC 9728 protected-resource metadata and MCP clients request every scope advertised there:invalid_scope("Scope 'account.admin' is not supported") from the authorize endpoint.scopeparameter.The rest of the contract is intact: discovery, JWKS (RS256), and
audbinding viaresourceindicators all still work, and the issuer moving to the regional host (fra.cloud.appwrite.io) is already handled by the issuer-from-discovery logic.Auth fixes
openid profile email all project:all organization:all, intersected with the AS's livescopes_supportedso a renamed/removed scope is never advertised. Falls back to mirroring the full discovered list when none of the preferred scopes exist (self-hosted projects with custom scope catalogs). Overridable viaMCP_OAUTH_SCOPES(space-separated).Housekeeping (same branch)
constants.py: module-level constants consolidated into a single file, grouped by consumer. DedupesDEFAULT_ENDPOINT(was defined in bothauth.pyandserver.py); genericdocs_searchnames gained aDOCS_prefix; private-by-convention names dropped the leading underscore now that they're shared. Mutable/derived module state (_UPLOAD_TRANSPORT,SERVICE_CLASSES) stays where it lives.pyright(basic mode oversrc/, configured inpyproject.toml) added to the dev group and thelintjob, plus the AGENTS.md pre-PR checklist. Fixed the 13 pre-existing errors it found:Optionalnarrowing indocs_search._rank,ASGIApptyping forRequireBearer,AnyUrlconstruction for resource URIs,dict | Nonereturn onToolManager.get_tool, a missing inline-contentNoneguard in uploads,doc_param.description or "", the enum schema annotation, and theanyio.to_threadsubmodule import (previously relied on another library importing it first).Verification
fra.cloud.appwrite.ioauthorization server; 401 challenge intact./authorizewith the curated scope set +resource=https://mcp.appwrite.io/mcp— it 302s to the consent screen instead of erroring.Note
Production needs a deploy of this change — a bare restart would still hit failure mode 2.