Toolbox API comes with a basic OAuth2 client. This commit sets up details about two important OAuth flows:
- the authorization flow, in which the user is sent to a web page where an authorization code is generated, which is then exchanged for an access token.
- the token refresh endpoint, where users can obtain a new access token and a new refresh token.

A couple of important aspects:
- the client app id is resolved upstream
- as are the actual endpoints for authorization and token refresh
- S256 is the only code challenge method supported
…ation url OAuth endpoint `.well-known/oauth-authorization-server` provides metadata about the endpoint for dynamic client registration and supported response types. This commit adds support for deserializing these values.
OAuth allows programmatic client registration for apps like Coder Toolbox via the DCR endpoint, which requires a name for the client app, the requested scopes, the redirect URI, etc. DCR replies with a similar structure, but in addition it returns two very important properties: client_id, a unique client identifier string, and client_secret, a secret string value used by clients to authenticate to the token endpoint.
The Coder Toolbox plugin should protect against authorization code interception attacks by using the PKCE security extension, which involves a cryptographically random string (128 characters) known as the code verifier and a code challenge derived from the code verifier using the S256 challenge method.
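As a rough illustration of the PKCE pieces described above, here is a minimal sketch using only the JDK. The function names `generateCodeVerifier` and `codeChallengeS256` are hypothetical and not taken from the plugin; the character set and the 43..128 length bounds follow RFC 7636 §4.1.

```kotlin
import java.security.MessageDigest
import java.security.SecureRandom
import java.util.Base64

// Unreserved characters allowed in a code verifier by RFC 7636 §4.1.
private const val VERIFIER_CHARS =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"

/** Hypothetical helper: builds a cryptographically random code verifier. */
fun generateCodeVerifier(length: Int = 128, random: SecureRandom = SecureRandom()): String {
    require(length in 43..128) { "code verifier must be 43..128 characters long" }
    return buildString(length) {
        repeat(length) { append(VERIFIER_CHARS[random.nextInt(VERIFIER_CHARS.length)]) }
    }
}

/** S256 challenge: BASE64URL(SHA-256(ASCII(verifier))), without padding. */
fun codeChallengeS256(verifier: String): String {
    val digest = MessageDigest.getInstance("SHA-256")
        .digest(verifier.toByteArray(Charsets.US_ASCII))
    return Base64.getUrlEncoder().withoutPadding().encodeToString(digest)
}
```

The verifier stays on the client and is only sent with the token exchange request, while the challenge travels with the authorization request.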
The OAuth2-compatible authentication manager provided by Toolbox
- authentication and token endpoints are now passed via the login configuration object
- similar for client_id and client_secret
- PKCE is now enabled
…injection
- remove ServiceLocator dependency from CoderToolboxContext
- move OAuth manager creation to CoderToolboxExtension for cleaner separation
- refactor CoderOAuthManager to use a configuration-based approach instead of constructor injection

The idea behind these changes is that the createRefreshConfig API does not receive a configuration object that can provide the client id, the secret, or even the refresh URL. So initially we worked around the issue by passing the necessary data via the constructor. However, this approach means a couple of things:
- the actual auth manager can be created only at a very late stage, when a URL is provided by users
- we can't easily pass around the auth manager without coupling the components
- we have to create a new auth manager instance if the user logs out and logs in to a different URL
- the service locator needs to be passed around because it is the actual factory of OAuth managers in Toolbox

Instead, we went with a different approach: CoderOAuthManager will derive and store the refresh configs once the authorization config is received. If the user logs out and logs in to a different URL, the refresh data is also guaranteed to be updated. On top of that, this approach allows us to get rid of all of the issues mentioned above.
Toolbox can automatically handle the exchange of an authorization code for a token by handling the custom URI for OAuth. This commit calls the necessary API in the Coder Toolbox URI handling.
POST /api/v2/oauth2-provider/apps is actually for manual admin registration for admin created apps. Programmatic Dynamic Client Registration is done via `POST /oauth2/register`. At the same time I included `registration_access_token` and `registration_client_uri` to use it later in order to refresh the client secret without re-registering the client app.
A bunch of code thrown around to launch the OAuth flow. Still needs a couple of things:
- persist the client id and registration uri and token
- re-use client id instead of re-registering every time
- properly handle scenarios where OAuth is not available
- the OAuth right now can be enabled if we log out and then hit next in the deployment screen
A new config `preferAuthViaApiToken` allows users to continue to use API tokens for authentication when OAuth2 is available on the Coder deployment.
Account implementation with logic to resolve the account once the token is retrieved. Marshalling logic for the account is also added. There is a limitation in the Toolbox API where createRefreshConfig is not receiving the auth params. We worked around by capturing and storing these params in the createAuthConfig but this is unreliable. Instead we use the account to pass the missing info around.
OAuth2 should be launched only if the user prefers it over any other method of auth and the server supports it.
Fallback on client_secret_basic or None depending on what the Coder server supports.
…n endpoint Based on the auth method type we need to send client id and client secret as a basic auth header or part of the body as an encoded url form
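The difference between the auth methods could be sketched roughly as follows. `TokenRequestParts` and `applyClientAuth` are hypothetical names introduced for illustration, and the method strings are the standard OAuth2 values (`client_secret_basic`, `client_secret_post`, `none`); how the plugin actually models this may differ.

```kotlin
import java.net.URLEncoder
import java.util.Base64

/** Hypothetical holder for the parts of a token request that depend on the auth method. */
data class TokenRequestParts(val headers: Map<String, String>, val form: Map<String, String>)

private fun formEncode(value: String): String = URLEncoder.encode(value, "UTF-8")

/** Hypothetical helper: attaches client credentials according to the chosen auth method. */
fun applyClientAuth(
    method: String,
    clientId: String,
    clientSecret: String?,
    form: Map<String, String>,
): TokenRequestParts = when (method) {
    "client_secret_basic" -> {
        // RFC 6749 §2.3.1: id and secret are form-encoded, joined with ':' and Base64-encoded.
        val raw = formEncode(clientId) + ":" + formEncode(requireNotNull(clientSecret))
        val header = "Basic " + Base64.getEncoder().encodeToString(raw.toByteArray(Charsets.UTF_8))
        TokenRequestParts(mapOf("Authorization" to header), form)
    }
    "client_secret_post" -> TokenRequestParts(
        emptyMap(),
        form + mapOf("client_id" to clientId, "client_secret" to requireNotNull(clientSecret)),
    )
    // Public client: only the client id is sent, in the form body.
    "none" -> TokenRequestParts(emptyMap(), form + ("client_id" to clientId))
    else -> throw IllegalArgumentException("unsupported auth method: $method")
}
```

The returned form map would still need to be serialized as `application/x-www-form-urlencoded` by whatever HTTP client sends the request.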
We encountered a couple of issues with the Toolbox API which is inflexible:
- we don't have complete control over which parameters are sent as query and which as body
- we don't get full logging (basic auth, headers and body) for debugging purposes
- it doesn't integrate that well with our existing http client used for polling
- we spent more than a couple of hours trying to understand why Coder rejects the authorization call with:
```
{"error":"invalid_request","error_description":"The request is missing required parameters or is otherwise malformed"} from Coder server.
```
Instead we will slowly discard the existing logic and rely on enhancements to our existing http client.
Basically, the login screen will first try to determine if mTLS auth is configured and use that. Otherwise
it will check if the user wants to use OAuth over the API token, if available. When the flag is
true, the login screen will query the Coder server to see if OAuth2 is supported.
If it is, the browser is launched pointing to the authentication URL. If not, we default to
the API token authentication.
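The decision order described above can be summarized in a small sketch. `AuthMethod` and `chooseAuthMethod` are hypothetical names; the server probe is passed as a lambda so that the Coder deployment is only queried when the user actually prefers OAuth.

```kotlin
enum class AuthMethod { MTLS, OAUTH2, API_TOKEN }

/** Hypothetical helper: picks the auth method in the order described in the commit message. */
fun chooseAuthMethod(
    mtlsConfigured: Boolean,
    preferOAuth: Boolean,
    serverSupportsOAuth: () -> Boolean, // assumed to hit the OAuth metadata endpoint
): AuthMethod = when {
    mtlsConfigured -> AuthMethod.MTLS
    // The probe runs only when the flag is set, thanks to short-circuit evaluation.
    preferOAuth && serverSupportsOAuth() -> AuthMethod.OAUTH2
    else -> AuthMethod.API_TOKEN
}
```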
The OAuth2 server implementation needs to provide an authorization code that can be exchanged for an access token. To make sure the authorization code belongs to "our" login request, the client provides a state value when launching the authorization URL, which the OAuth2 server has to send back together with the auth code. This fix makes sure the authorization code is actually sent, and that the state value is the same as in our initial request.
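A minimal sketch of such a check, assuming the callback query has already been parsed into a map. `CallbackResult` and `validateCallback` are hypothetical names, and the constant-time comparison is a defensive choice made here, not something the commit states.

```kotlin
import java.security.MessageDigest

sealed interface CallbackResult {
    data class Code(val value: String) : CallbackResult
    data class Failure(val reason: String) : CallbackResult
}

/** Hypothetical helper: accepts the callback only if state matches and a code is present. */
fun validateCallback(params: Map<String, String>, expectedState: String): CallbackResult {
    val state = params["state"]
    val stateMatches = state != null &&
        MessageDigest.isEqual(state.toByteArray(), expectedState.toByteArray())
    if (!stateMatches) return CallbackResult.Failure("state mismatch")

    val code = params["code"]
    if (code.isNullOrBlank()) return CallbackResult.Failure("missing authorization code")
    return CallbackResult.Code(code)
}
```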
This fix reports an error to the user when the token exchange request fails, returns an empty body, or returns a body that does not contain the token.
The logic for exchanging auth codes for tokens and for refreshing tokens was used in multiple places without any code reuse strategy. Extracted an OAuth service that handles the basic operations.
The metadata endpoint provides an absolute URL for the client registration endpoint, which we should use instead of hardcoding the path relative to the base url.
https://datatracker.ietf.org/doc/html/rfc7591 normalizes the client registration error responses and forces providers to always include json with an error code and an error message. This patch captures the error response and builds a pretty message and displays it to the user.
RFC 6749 §4.1.2.1 and RFC 7636 §4.4.1 specify that the error code and an optional error_description can be returned as query params in the callback URI. Similarly, per RFC 6749 §5.2, the exchange of authorization codes for tokens can return a JSON body containing an error code and an error message, which was never handled in our code.
This upgrade will need TBX 3.4 or higher to be installed. The upgrade is needed to benefit from the fixes related to displaying UI pages in the URI handler. In addition I reworked the main build.gradle and extracted everything into a small custom plugin.
Due to the dependency on the new API.
OAuth callbacks are URL-encoded; error details in particular need to be decoded before surfacing them to the user.
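A small sketch of decoding callback parameters with the JDK's `URLDecoder`. `parseCallbackQuery` and `describeCallbackError` are hypothetical helpers, and the ` - ` separator in the message is an arbitrary choice for the example.

```kotlin
import java.net.URLDecoder

/** Hypothetical helper: splits a raw callback query string and decodes each key and value. */
fun parseCallbackQuery(rawQuery: String): Map<String, String> =
    rawQuery.split("&")
        .filter { it.isNotEmpty() }
        .associate { pair ->
            val key = pair.substringBefore("=")
            val value = pair.substringAfter("=", "")
            URLDecoder.decode(key, "UTF-8") to URLDecoder.decode(value, "UTF-8")
        }

/** Returns a user-facing message when the callback carries an OAuth error, else null. */
fun describeCallbackError(params: Map<String, String>): String? {
    val error = params["error"] ?: return null
    val description = params["error_description"]
    return if (description.isNullOrBlank()) error else "$error - $description"
}
```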
We ended up with error messages like `An error was encountered: <error-code>: <some error description>`. The second ":" is a bit repetitive.
matifali
left a comment
Non-engineering approval. I am fine shipping this, given it's opt-in.
Just ensure we provide a good experience in case the setting is enabled, but the deployment does not have CODER_EXPERIMENTS=oauth2 enabled.
Yes, I can confirm that OAuth happens only when the user explicitly enables the OAuth authentication AND the backend exposes the necessary endpoints.
Go's html/template has a built-in security filter (urlFilter) that only allows http, https, and mailto URL schemes. Any other scheme gets replaced with #ZgotmplZ. The OAuth2 app's callback URL uses a custom URI scheme which the filter considers unsafe. For example, the Coder JetBrains plugin exposes a callback URI with the scheme jetbrains://, which was effectively changed by the template engine into #ZgotmplZ. Of course this is not an actual callback. When users clicked the cancel button nothing happened. The fix was simple: we now wrap the app's registered callback URI into htmltemplate.URL. In addition, while testing this PR with coder/coder-jetbrains-toolbox#209 I discovered that we are also not compliant with https://www.rfc-editor.org/rfc/rfc6749#section-4.1.2.1 which requires the server to attach the local state if it was provided by the client in the original request. Also, it is optional but generally good practice to include `error_description` in the error responses. In fact we follow this pattern for the other types of error responses, so this is not a one-off.
code-asher
left a comment
I ran out of time but will finish tomorrow!
throw Exception(errorMessage)
}

private fun createAuthorizationService(): CoderAuthorizationApi {
There was a problem hiding this comment.
This is a nit, but when I initially saw the oauth2 auth service create yet another auth service it seemed weird to me, but this is really just an http/api client right? Not really doing any service-like things, I think?
In my mind it would involve state management to be a service, which has implications for how it should behave in the code (which is why I thought maybe it was a problem to recreate it above without updating the one on the class).
I guess in that sense OAuth2Service is not necessarily a service either, just a wrapper around the API calls. Actually, could these all be methods directly on the coder rest client? Feels to me like it could be part of the sdk, they are just more API endpoints after all.
There was a problem hiding this comment.
Hmm... this is an interesting point and I'd like to discuss/philosophize a bit over it.
First, CoderAuthorizationApi is just a Retrofit interface, i.e. an HTTP client definition, and OAuth2Service is stateless: it just wraps those API calls with error handling.
Could these live on CoderRestClient? In principle yes, they're just more HTTP calls to the Coder deployment. But there's a practical issue: OAuth2Service is used before CoderRestClient exists. The discovery/registration/exchange calls are pre-authentication, so they use a bare HTTP client (no auth interceptors, no token). CoderRestClient is constructed with a token or OAuth context already in hand, and its HTTP client is configured with auth interceptors. Mixing unauthenticated OAuth endpoint calls into that client would mean either:
- Building a second internal HTTP client without auth interceptors
- Making the auth interceptors conditional per-request
Both add complexity for little gain. So I think the current separation makes sense architecturally.
Now regarding the naming - services usually orchestrate business logic. I guess I confused request construction with logic. I'll try to come up with a better name.
val newAuthResponse = OAuth2Service(context).refreshToken(oauthContext!!)
this.oauthContext.tokenResponse = newAuthResponse
There was a problem hiding this comment.
nbd at all but we use oauthContext without a this and then this.oauthContext, is there a reason for that?
I know I always say this haha but !! feels like a trap waiting to spring in the future, maybe we could pass in the context to refreshToken() or something?
There was a problem hiding this comment.
Arghhhh....I always tell myself - let's quickly throw two ! together and I'll rewrite the code later, for now let's make it work. As you have noticed...that later never comes :)
block()
try {
val response = block()
if (response.code() == HttpURLConnection.HTTP_UNAUTHORIZED && oauthContext.hasRefreshToken()) {
There was a problem hiding this comment.
If oauthContext is nullable would this not require a `?`? Should we do an oauthContext.let or something? Would also let us get rid of that !! if we pass that around.
But I assume it must not require it since it is building, just not sure how haha
There was a problem hiding this comment.
It works because hasRefreshToken is an extension function on a nullable context:
fun CoderOAuthSessionContext?.hasRefreshToken(): Boolean = this?.tokenResponse?.refreshToken != null
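To illustrate why the call compiles without a safe-call operator, here is a self-contained sketch of an extension function declared on a nullable receiver. `Tokens` and `SessionContext` are simplified stand-ins, not the plugin's real classes.

```kotlin
// Simplified stand-ins for the real token response and session context types.
data class Tokens(val accessToken: String, val refreshToken: String?)
data class SessionContext(val tokenResponse: Tokens?)

// The receiver type is nullable, so `this` may be null inside the function body
// and callers can invoke it directly on a nullable reference.
fun SessionContext?.hasRefreshToken(): Boolean = this?.tokenResponse?.refreshToken != null

// Example caller: no `?.` or `!!` is needed on the nullable parameter.
fun canRefresh(context: SessionContext?): Boolean = context.hasRefreshToken()
```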
There was a problem hiding this comment.
Ooooooo did not know that was possible. Neat.
refreshToken()
true
} catch (e: Exception) {
context.logger.error(e, "Failed to refresh access token")
There was a problem hiding this comment.
If the refresh fails, do we need to look at the response and possibly discard the token? If it is some kind of permanent auth failure. Otherwise I imagine we would keep trying to refresh with the same token.
There was a problem hiding this comment.
I think I'll log an issue. There are transient reasons (network failure?) for which it is worth retrying at the next API call. But for other errors like invalid_grant or unauthorized_client it is probably pointless to retry. And it is not an easy problem to tackle: do we disrupt the user and remove their workspaces, stop the ongoing ssh connections and pop up the login screen, or do we stop the world but still keep the workspaces visible? I think it is worth pondering and maybe involving @matifali in the discussion, but in a separate ticket.
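One possible starting point for that ticket is to separate the two kinds of failure explicitly. This is only a sketch: `RefreshFailure` and `classifyRefreshFailure` are hypothetical, and the set of permanent errors contains just the two codes mentioned above.

```kotlin
enum class RefreshFailure { TRANSIENT, PERMANENT }

// Error codes for which retrying with the same refresh token is assumed to be pointless.
private val permanentErrors = setOf("invalid_grant", "unauthorized_client")

/**
 * Hypothetical helper: `oauthError` is the `error` field of the token endpoint response,
 * or null when no response body was received (for example on a network failure).
 */
fun classifyRefreshFailure(oauthError: String?): RefreshFailure =
    if (oauthError != null && oauthError in permanentErrors) {
        RefreshFailure.PERMANENT
    } else {
        RefreshFailure.TRANSIENT
    }
```

What the plugin should do on a permanent failure (log out, keep workspaces visible, and so on) is the open product question and is deliberately left out of the sketch.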
code-asher
left a comment
Tried it out and oauth worked fabulously! Just FYI I only reviewed the oauth stuff and skipped buildSrc.
Had some observations (some of these might be pre-existing issues):
- If I deselect the oauth option in the settings it seems to immediately go back to using the API key. But the reverse seems not to be true, if I select oauth while already logged in then it keeps using my API key (until I restart Toolbox).
Is that what we want? I think I expected my session to keep using whatever it used to log in until I explicitly logged out. I also kind of expected my API key to be deleted once I switched to oauth (or rather I expected it would be overridden with the oauth access token), but it seems to have been preserved, so even if I explicitly log out and then disable oauth, it still authenticates with the API key which was surprising to me.
- This one is kinda wacky and probably not going to happen in practice so might not be worth addressing, but if I try to log in, then disable oauth, then click "allow", the error message does not really make sense, says "oauth or api token is required" but I feel like the error should really be something like "unable to log in with oauth because it is disabled". Or, we should remember what we used to start the login and carry that through the whole process.
- When I launched Toolbox I got "Error encountered while setting up Coder" and "authorization failed" which looks a bit scary but according to the logs my API key was invalid (makes sense, I have not launched Toolbox in a long while). We should add the actual error message to that dialog.
- If I click out of Toolbox while the cli is downloading (for example if I close the browser window opened by the oauth process) then Toolbox closes itself (idk why they made it like this) and if I re-open it I am back on the URL screen. It looks like in the background the setup is still ongoing though. If I try to log in again, they seem to clash and I do get the workspaces list but I also get the security dialog (maybe because it is trying to use a file that got overwritten by the other). Same thing happens if I click "back" and log in again. Not sure what happens if I do that and try to log into a different deployment while the other is ongoing.
refreshOAuthToken()
oauthSession = CoderSetupWizardContext.oauthSession!!.copy()
There was a problem hiding this comment.
Could we call refreshToken on the rest client instead of duplicating it here? Would have to create the rest client first of course. Mostly it feels kinda scattered to me with how we have two copies of oauth session context and we have to kinda glue them together.
Also maybe we could return the session to avoid !!.
There was a problem hiding this comment.
I tried to do it quickly - but there are a number of issues:
- cli also needs to be initialized
- I need to find a good way to do the refresh token in the http client before any other API call is made.
Is it alright if I log an issue and treat this separately?
There was a problem hiding this comment.
> if I deselect the oauth option in the settings it seems to immediately go back to using the API key. But the reverse seems not to be true

This sounds really scary. If you are logged in via oauth, and go to the Settings page and deselect the "Prefer OAuth.." option, the API token should be used only at the next restart or if you log out and log in again. The client should not terminate automatically and go through the login screen again. Is this what you are experiencing? (I can't reproduce it.)
And yeah... it was intentional to keep the API token stored.
@code-asher do you think I can treat all of these in a separate issue/PR? It is a lot of stuff that I need to test and think about. I will need Atif's help to decide the behavior for some of these scenarios (like removing the API token).
There was a problem hiding this comment.
> do you think I can treat all of these into a separate issue/PR
Yes of course!
> If you are logged via oauth, and go to the Settings page and deselect the "Prefer OAuth.." the API token should be used only at the next restart or if you log out and log in again. The client should not terminate automatically, and go through the login screen again. Is this what you are experiencing
What happened is that I deselected "prefer oauth", then I revoked permissions for the oauth app in my Coder dashboard. But it kept connecting to workspaces just fine.
To me, this felt unexpected, I thought the current session would keep using oauth (and so connecting would result in an auth failure), and for a second it felt like somehow it had bypassed auth lol until I realized it was probably just using my old API key I had configured before switching to oauth.
I think it is because we check the "prefer oauth" setting each time rather than having that info embedded as part of the session itself. IMO it should only affect the decision we make when first logging in.
Currently named OAuth2Service, which oversells itself: it only orchestrates HTTP request construction; there is no business logic orchestration or state management.
We used to share the oauth context as a global val that could be mutated once a token had to be refreshed. Instead, we changed the code to pass a modified copy of this oauth context with the refreshed token.
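The copy-instead-of-mutate idea can be sketched with Kotlin data classes. `TokenPair`, `OAuthSession` and `withRefreshedTokens` are hypothetical names used only for this example, not the plugin's actual types.

```kotlin
// Immutable stand-ins: all properties are `val`, so a refresh cannot mutate shared state.
data class TokenPair(val accessToken: String, val refreshToken: String?)
data class OAuthSession(val clientId: String, val tokens: TokenPair)

/** Returns a new session carrying the refreshed tokens; the receiver is left untouched. */
fun OAuthSession.withRefreshedTokens(fresh: TokenPair): OAuthSession = copy(tokens = fresh)
```

Callers then hand the returned instance to whoever needs the refreshed token, rather than relying on every holder of a shared reference observing the mutation.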
Go's html/template has a built-in security filter (urlFilter) that only allows http, https, and mailto URL schemes. Any other scheme gets replaced with #ZgotmplZ. The OAuth2 app's callback URL uses a custom URI scheme which the filter considers unsafe. For example, the Coder JetBrains plugin exposes a callback URI with the scheme jetbrains://, which was effectively changed by the template engine into #ZgotmplZ. Of course this is not an actual callback. When users clicked the cancel button nothing happened. The fix was simple: we now wrap the app's registered callback URI into htmltemplate.URL. Usually this needs some validation, otherwise the linter will complain about it. The callback URI used by the Cancel logic is actually validated by our backend when the client app is programmatically registered via the dynamic OAuth2 registration endpoints, so we refactored the validation around that code and re-used some of it in the Cancel handling to make sure we don't allow URIs like `javascript` and `data`, even though in theory these URIs were already validated. In addition, while testing this PR with coder/coder-jetbrains-toolbox#209 I discovered that we are also not compliant with https://www.rfc-editor.org/rfc/rfc6749#section-4.1.2.1 which requires the server to attach the local state if it was provided by the client in the original request. Also, it is optional but generally good practice to include `error_description` in the error responses. In fact we follow this pattern for the other types of error responses, so this is not a one-off.
- resolves #20323 <img width="1485" height="771" alt="Cancel_page_with_invalid_uri" src="https://github.com/user-attachments/assets/5539d234-9ce3-4dda-b421-d023fc9aa99e" /> <img width="486" height="746" alt="Coder Toolbox handling the Cancel button" src="https://github.com/user-attachments/assets/acab71a6-d29c-4fa9-80ba-3c0095bbdc8f" />
code-asher
left a comment
Putting down an approval, we can follow up on anything remaining separately!
Recent versions of Coder act as an OAuth 2.1 authorization server for first- and third‑party applications.
This PR aims to support authenticating via OAuth with Coder Toolbox while retaining backward compatibility for authentication via API tokens or via certificates.
This PR is a WIP: