feat(auth): browser OAuth project login — kbagent project login (PKCE) (0.54.0)#377
feat(auth): browser OAuth project login — kbagent project login (PKCE) (0.54.0)#377soustruh wants to merge 2 commits into
kbagent project login (PKCE) (0.54.0)#377Conversation
padak
left a comment
There was a problem hiding this comment.
Code review — kbagent project login (OAuth + PKCE)
Reviewed the full diff against CONTRIBUTING.md and verified the critical paths empirically on a local checkout of the branch. This is a carefully engineered PR (PKCE, CSRF, flock-serialized refresh, single silent-refresh chokepoint — really nice). But there is one blocking bug that makes the headline feature fail on every real stack, and the test suite cannot catch it because of how the prerequisite is structured.
Verdict: please do not merge yet — 1 blocking bug + feature is inert until the external prerequisite lands.
🔴 BLOCKING
1. mint_storage_token rejects 201 Created — project login fails on every real stack.
src/keboola_agent_cli/oauth.py L305:
if response.status_code != 200:
raise KeboolaApiError(f"Storage token creation rejected ({response.status_code}) ...")The Keboola Storage API returns 201 Created for POST /v2/storage/tokens, not 200. Confirmed two ways:
- The official
keboola/storage-api-php-client/apiary.apibdocuments success as201. - Empirically calling
mint_storage_tokenagainst a simulated 201 response:mint REJECTED a valid 201 response -> [OAUTH_ERROR] Storage token creation rejected (201) ...
So the OAuth dance succeeds, the code is exchanged, and the final mint step always fails with the misleading message "The OAuth access token may lack project access" — even though the token is fine.
Why no test caught it: the in-repo fake Connection server (tests/test_oauth.py L2219) returns _json(200, {"token": ...}). With a realistic 201, the red CI would have caught this.
Note the existing client validates success as status_code < 400 (http_base.py L155), which is why the classic KeboolaClient.create_token() works — the new oauth.py bypasses the shared client with a raw httpx.post and its own stricter == 200 check.
Fix: if response.status_code not in (200, 201): — and update the fake server in the test to return 201 so the regression can't come back. The 200-only check in _post_token_request is fine (League /oauth/token returns 200 OK per RFC 6749 §5.1); only the Storage mint is 201.
🟡 NON-BLOCKING (decide before merge)
2. External "OAuth client" prerequisite. The PR itself flags it: real stacks need a registered public OAuth client (league:oauth2-server:create-client kbagent-cli --public ..., owned by the Connection team). Until that lands, project login returns OAUTH_ERROR: invalid_client on every stack, cannot be E2E-tested (hence the E2E_OAUTH_INTERACTIVE gate), and that is exactly why bug #1 slipped through. Merging now ships code that no one has run against the real API, plus the PR's own open question ("which scope grants the project-token mapping for a non-MCP client") is unresolved. Suggest holding until the client is registered and the flow is verified once, or shipping the command explicitly marked preview/experimental after #1 is fixed.
3. Over-privileged minted token (least privilege). oauth.py L280 requests canManageBuckets, canManageDevBranches, and notably canManageTokens: True — an effectively master-grade token (can mint further tokens), albeit ~2 h lived and unscoped (no componentAccess[] like the existing create_token()). If Connection does not clamp to the user's role server-side, this is an escalation vector for a readOnly member. At minimum document why canManageTokens is required; ideally narrow it.
4. Re-login with a different alias creates a duplicate. In services/oauth_login_service.py::_persist: if the project is already registered as myproj and the user runs login --project prod, the re-login detection falls through to the else branch and a second entry for the same project is created — contradicting the "never duplicates" claim in the PR description. Edge case; a warning would suffice.
5. No retry on transient 5xx. mint_storage_token and _post_token_request do a one-shot httpx.post with no retry/backoff, unlike BaseHttpClient (429/5xx retry). A transient 503 during login kills the flow.
🟢 NIT
6. services/oauth_login_service.py L1462 (re-login branch): project.active_branch_id = ... is set on a project instance that is then discarded (edit_project takes kwargs). Dead/misleading code; active_branch_id survives only because edit_project doesn't touch it. Finish wiring it or drop the line.
7. Budget ambiguity: keboola-expert.md is 61,937 B (~60.5 KB). CLAUDE.md and CONTRIBUTING.md say "hard 60 KB budget" in two places, but the PR commit and the file itself work against "62 KB". Either the file is ~0.5 KB over the documented cap, or the docs disagree with reality. Reconcile the number.
✅ Verified good (not assumed)
- Permission model:
project.login→admininOPERATION_REGISTRY✓ - 3-layer architecture: clean —
oauth.py= L3 (no Typer/Rich),oauth_login_service= L2,project_login.py= L1 thin ✓ - Doc/plugin sync (convention #17):
context.py,CLAUDE.md,keboola-expert.md(version gate + gotcha),SKILL.md(triggers + regenerated table —make skill-checksays up-to-date),commands-reference.md,gotchas.md(since v0.54.0),changelog.py,plugin.json/marketplace.json— all present ✓ - serve REST skip: legitimate (terminal-only, interactive browser) and documented ✓
- Flow security: PKCE S256,
secrets.compare_digeston state (constant-time CSRF), binds127.0.0.1only, single-use callback, refresh token under 0600/atomic/flock, access token never persisted, tokens masked in logs and errors ✓ - Silent refresh: single chokepoint in
resolve_projects(), lazy import off the hot path,isinstancegate, flock + re-read-after-lock against rotation races ✓ - Lint/format/typecheck:
ruff check+formatclean;tyonly one unrelatedunresolved-importwarning (build script, non-blocking) ✓ - New-code tests:
tests/test_oauth.py35/35 passed ✓
Overall: excellent groundwork; fix the 201 status check (with a 201-returning fake server), and align the merge with the OAuth-client registration so the flow gets one real end-to-end run before users meet it.
padak
left a comment
There was a problem hiding this comment.
Review of #377 — feat(auth): browser OAuth project login — kbagent project login (PKCE) (0.54.0)
Generated by
kbagent-pr-reviewersubagent. Verdict and findings below
are advisory; the human author retains every veto. CI-coverable issues
(lint, format, tests) are confirmed viamake check, not duplicated here.
Summary
This PR adds kbagent project login — a browser-based Authorization Code + PKCE OAuth flow for adding projects without manual token copying. The architecture is sound: the OAuth layer is cleanly separated in oauth.py, silent refresh is wired at the single BaseService.resolve_projects() chokepoint, and all plugin/doc surfaces required by CONTRIBUTING.md convention #17 are updated. make check passes (3887 tests, 8 skipped), make typecheck is clean (0 new diagnostics), and make skill-check is green. Two genuine bugs were found, one of which is a security concern affecting token least-privilege.
Verdict: REQUEST CHANGES — two BLOCKING findings must be addressed before merge.
Verdict
- Verdict: REQUEST CHANGES
- Blocking findings: 2
- Non-blocking findings: 2
- Nits: 1
Blocking findings
[B-1] src/keboola_agent_cli/oauth.py:305 — mint_storage_token rejects HTTP 201, silently breaking OAuth login on any stack that returns 201 for POST /v2/storage/tokens
mint_storage_token uses a raw httpx.post() call with an explicit if response.status_code != 200 guard. The Keboola Storage API POST /v2/storage/tokens is a resource-creation endpoint; the analogous Manage API endpoint (POST /manage/projects/{id}/tokens) explicitly returns HTTP 201 per tests/test_manage_client.py:157, and the test comment at line 171 ("Creates token when API returns 200 instead of 201") establishes that 201 is the primary expected code for token creation. Every other place in this codebase that calls a Keboola creation endpoint goes through BaseHttpClient._do_request(), which accepts any status_code < 400 — so 200 and 201 both succeed there. mint_storage_token is the only callsite that hard-checks != 200, meaning a stack that returns 201 for the Bearer-auth token mint path would produce a spurious OAUTH_ERROR and abort every login and every silent refresh cycle (which runs at each BaseService.resolve_projects() call, i.e. on every kbagent command invocation for an OAuth project).
The test suite never mocks the endpoint returning 201 (all httpx_mock.add_response calls for /v2/storage/tokens omit status_code, which defaults to 200), so the bug would never be caught by CI.
Fix: change if response.status_code != 200: to if response.status_code not in (200, 201): (matching the pattern used elsewhere in client.py:1565 and client.py:1597), and add a test variant that mocks HTTP 201 for POST /v2/storage/tokens and asserts the token is returned correctly.
[B-2] src/keboola_agent_cli/oauth.py:284 — canManageTokens: True on a 2-hour minted Storage token is over-privileged
mint_storage_token requests "canManageTokens": True on the Storage token it mints via Authorization: Bearer. canManageTokens grants the holder the ability to: (a) create new Storage API tokens — including tokens without an expiresIn (permanent), (b) list all tokens on the project, and (c) revoke (delete) other tokens. The only kbagent command that actually requires canManageTokens on the token it holds is config oauth-url (gated explicitly in config_service.py:1942). Every other command (config list, storage buckets, workspace create, job run, etc.) does not need it.
The PR justification — "same flags as project refresh" — conflates two different things: project refresh mints a token intended to be the permanent project credential (needs canManageTokens because it will later be used to call config oauth-url), while the OAuth-minted token is a short-lived intermediary whose scope should be bounded to what kbagent actually calls during its 2-hour window.
Granting canManageTokens: True on a short-lived token creates a 2-hour window in which anyone who obtains the token (stolen config.json, debug log leak, memory scrape) can mint arbitrary new Storage tokens — including permanent ones — on the project, effectively escalating to a persistent foothold after the original OAuth token expires.
Fix: remove "canManageTokens": True from mint_storage_token. Users of kbagent config oauth-url with an OAuth-minted project will receive the existing MISSING_MASTER_TOKEN error; this is acceptable behavior (they should use project add --token with a master token if they need that command). If the project team decides config oauth-url support via OAuth-login is required, make canManageTokens an explicit opt-in parameter to mint_storage_token and document the trade-off.
Non-blocking findings
[NB-1] src/keboola_agent_cli/oauth.py:157 and oauth.py:275 — OAuth HTTP calls bypass BaseHttpClient retry logic; transient 429/500/503 produces a hard OAUTH_ERROR
Both _post_token_request (line 157, used for code exchange and token refresh) and mint_storage_token (line 275) use a standalone httpx.post() that bypasses BaseHttpClient._do_request(). The base client automatically retries on 429, 500, 502, 503, and 504 with exponential backoff (1s, 2s, 4s) up to 3 attempts — the standard behavior every other kbagent HTTP call gets. A transient rate-limit (429) or server hiccup (500) during a token refresh or mint call will instead surface as a hard OAUTH_ERROR (exit code 3) rather than a retried success.
Since the silent refresh fires inside resolve_projects() at the start of every kbagent command invocation for OAuth projects, a momentary API degradation could cause an entire kbagent storage tables run to fail with exit 3 — an error that looks like "re-run kbagent project login" rather than "wait a moment and retry." The impact is worse during the initial project login interactive flow where the token mint is the final step after the user has already completed the browser flow.
Fix (suggested, not required for merge): extract a small _post_json helper that wraps httpx.post() with 2-3 retries on RETRYABLE_STATUS_CODES, reusing constants from constants.py. This can be a follow-up rather than blocking the PR if the author prefers.
[NB-2] tests/test_e2e.py:11103 — TestE2EOAuthLogin is gated on a non-existent OAuth client registration; the real-stack E2E can never run in CI until a prerequisite ships
The E2E test test_interactive_browser_login is correctly gated behind @pytest.mark.skipif(not os.environ.get("E2E_OAUTH_INTERACTIVE"), ...). However, even with E2E_OAUTH_INTERACTIVE=1, the test cannot run until the kbagent public OAuth client (kbagent-cli) is registered on the target stack — a dependency outside this PR with no tracked issue or timeline. The only E2E test that runs unconditionally is test_login_command_surface (OAUTH-1), which merely checks --help output.
This means the full project login flow — including token minting, config persistence, and the re-authentication idempotency path — has no E2E coverage that can actually execute today. Per CONTRIBUTING.md: "Every new CLI command MUST have a corresponding E2E test in tests/test_e2e.py." The structural compliance is present but the gate condition means it effectively has zero E2E coverage for the foreseeable future.
This is a policy gap rather than a code bug. The fix should be tracked as an issue: "Register kbagent-cli OAuth client on connection.keboola.com to enable E2E_OAUTH_INTERACTIVE tests" — reference it in the PR description so reviewers know it is acknowledged.
Nits
[NIT-1]src/keboola_agent_cli/commands/_helpers.py:98— newly added"OAUTH_ERROR"inmap_error_to_exit_codeuses a raw string literal in a tuple comparison. The preexisting"INVALID_TOKEN"and"MISSING_MASTER_TOKEN"strings are the same antipattern. CONTRIBUTING.md mandatesErrorCodeenum usage.check_error_codes.pyonly checks keyword-argument form (error_code="...") so this slips through CI. The triple should read(ErrorCode.INVALID_TOKEN, ErrorCode.MISSING_MASTER_TOKEN, ErrorCode.OAUTH_ERROR)— sinceErrorCodeis aStrEnum, theincomparison works identically.
Verification log
gh pr view 377 --json title,body,files,additions,deletions,baseRefName,headRefName,labels,state→ 27 files, +2030/-38, state=OPEN, conventionalfeat(auth):prefix ✓git rev-parse --abbrev-ref HEAD→feat/oauth-project-login(correct worktree) ✓- Layer violation grep (typer in services, httpx in commands, formatter in clients) → empty ✓
make check→ 3887 passed, 8 skipped, 120 deselected ✓make typecheck→ 1 diagnostic (preexisting unresolved-import forhatchlinginscripts/hatch_build.py, not in PR code) ✓make skill-check→SKILL.md is up-to-date✓OPERATION_REGISTRYinpermissions.py→"project.login": "admin"present ✓AGENT_CONTEXTincommands/context.py→project loginentry present ✓CLAUDE.md ## All CLI Commands→kbagent project loginline present at 300 ✓plugins/kbagent/agents/keboola-expert.md §1 Rule 6 VERSION GATE→project login (browser OAuth + PKCE) = 0.54.0+at line 71 ✓plugins/kbagent/agents/keboola-expert.md §3 Inline Gotchas→project login is INTERACTIVE-ONLYat line 310 ✓plugins/kbagent/skills/kbagent/references/commands-reference.md→project loginentry present ✓plugins/kbagent/skills/kbagent/references/gotchas.md→## project login is interactive-only... (since v0.54.0)✓oauth.py:305—if response.status_code != 200:hard-checks exactly 200; analogous Manage API endpoint returns 201 pertests/test_manage_client.py:157; no test variant covers HTTP 201 response forPOST /v2/storage/tokens✓ (confirmed bug)oauth.py:284—"canManageTokens": Truepresent;config_service.py:1942confirms onlyconfig oauth-urlrequires it; all other commands function without it ✓ (confirmed concern)_helpers.py:98—"OAUTH_ERROR"raw string in tuple;check_error_codes.pydoes not scan tuple literals ✓ (confirmed)- Behavior verification (interactive browser flow): cannot reproduce — requires the registered public OAuth client on the stack, which is documented in the PR as not yet existing. Per the reviewer playbook this is an unverified behavior claim noted explicitly.
Open questions for the author
-
Has
POST /v2/storage/tokenswithAuthorization: Bearerbeen confirmed by direct API inspection (not test mocks) to return 200 (not 201)? If so, a comment in the code with a source reference would prevent future confusion and make the!= 200check defensible. -
Is
config oauth-urlintended to work for projects added viakbagent project login? If yes,canManageTokens: Trueis required (with documented trade-off). If no, the flag should be removed and the docs updated to say "useproject add --tokenwith a master token for OAuth URL generation."
Browser-based project authorization, an alternative to manual token setup. Follows the production MCP-server pattern. - oauth.py: PKCE S256, loopback callback, code exchange, refresh rotation, Storage-token minting via Bearer. - ProjectConfig gains an OAuthCredentials block; only the refresh token is persisted (0600 config.json). - BaseService.resolve_projects() renews expired minted tokens for CLI, MCP subprocess, and serve. - ErrorCode.OAUTH_ERROR maps to exit 3; project.login is admin. - Docs and plugin files synced; keboola-expert.md compressed to stay under its 62KB budget. Real stacks need a public OAuth client registered in Connection; see the PR description for the command.
The errors exist on main; CI never sees them because it runs only ruff and pytest, and the pre-commit hook checks staged files only.
85edf41 to
6e2195b
Compare
|
I know, it's not verified & complete yet, I marked it as a draft, but yep, I accidentally left it as ready for review for a couple of hours 😇 |
Summary
Adds
kbagent project login— browser-based OAuth project authorization (Authorization Code + PKCE, RFC 7636/8252) as an alternative to manual token setup. The user runskbagent project login --url <stack>, a browser opens, they log in and pick the project on the Connection consent screen, and the CLI receives credentials on a127.0.0.1loopback callback. No token copying.Architecture (the production MCP-server pattern, verified against
keboola/mcp-server)client_secret— PKCE S256 proves possession; CSRFstatevalidated on the callback./oauth/authorize+/oauth/token) — the same server the remote MCP server authenticates to in production.POST /v2/storage/tokenswithAuthorization: Bearer, then discarded. Queue API and AI Service don't accept Bearer yet (per the MCP server'sProxyAccessTokenworkaround), so the minted token keeps every existing command path unchanged.BaseService.resolve_projects()chokepoint — covers CLI commands, the MCP subprocess env, andkbagent servewith zero per-call-site changes. Refresh-token rotation (League revokes the old token) is persisted under an inter-process flock (.oauth-refresh.lock); a failed refresh degrades to a per-project 401 instead of killing multi-project fan-outs.active_branch_id, never duplicates).ErrorCode.OAUTH_ERROR→ exit 3;project.loginregistered asadminin the permission engine.⛔ External prerequisite (not in this PR)
Real stacks need a public OAuth client registered in Connection (owner: Connection team; same console command used for the MCP server registration, per the MCP Server Deployment Handbook):
League validates redirect URIs by exact match (no RFC 8252 any-loopback-port), hence the fixed port set. Until the cross-stack name is finalized,
KBAGENT_OAUTH_CLIENT_IDoverrides the defaultkbagent-cliid. Open question for the Connection team: which scope grants the project-token mapping for a non-MCP client (MCP usesemail+claudai).Verification
make checkgreen — 3887 passed, lint/format/changelog clean;make typecheckclean (fixed 2 pre-existing ty errors in touched test files).tests/test_oauth.py, including a full real-HTTP protocol round-trip against an in-repo fake Connection OAuth server that enforces PKCE S256 and League-style refresh rotation (wrong verifier rejected; revoked refresh token rejected).tests/test_e2e.py::TestE2EOAuthLogin— command-surface test runs in CI; the interactive real-stack browser test is gated onE2E_OAUTH_INTERACTIVE=1+ the registered client (cannot run until the prerequisite lands).serve REST endpoint: skipped
project loginis genuinely terminal-only (interactive browser + human consent), per the documented skip rule in CONTRIBUTING.Plugin/doc sync (convention #17)
AGENT_CONTEXT (
context.py),CLAUDE.mdcommand list,commands-reference.md,gotchas.md(since v0.54.0),keboola-expert.md(version gate + INTERACTIVE-ONLY gotcha; compressed stale passages to stay under the 62KB prompt budget), SKILL.md triggers + regenerated table,plugin.json/marketplace.jsonviamake version-sync.