Skip to content

feat(auth): browser OAuth project login — kbagent project login (PKCE) (0.54.0)#377

Draft
soustruh wants to merge 2 commits into
mainfrom
feat/oauth-project-login
Draft

feat(auth): browser OAuth project login — kbagent project login (PKCE) (0.54.0)#377
soustruh wants to merge 2 commits into
mainfrom
feat/oauth-project-login

Conversation

@soustruh
Copy link
Copy Markdown

@soustruh soustruh commented Jun 2, 2026

Summary

Adds kbagent project login — browser-based OAuth project authorization (Authorization Code + PKCE, RFC 7636/8252) as an alternative to manual token setup. The user runs kbagent project login --url <stack>, a browser opens, they log in and pick the project on the Connection consent screen, and the CLI receives credentials on a 127.0.0.1 loopback callback. No token copying.

Architecture (the production MCP-server pattern, verified against keboola/mcp-server)

  • Public client, no client_secret — PKCE S256 proves possession; CSRF state validated on the callback.
  • Targets the Connection League OAuth2 server (/oauth/authorize + /oauth/token) — the same server the remote MCP server authenticates to in production.
  • Hybrid token model: the OAuth access token is used transiently to mint a short-lived Storage API token via POST /v2/storage/tokens with Authorization: Bearer, then discarded. Queue API and AI Service don't accept Bearer yet (per the MCP server's ProxyAccessToken workaround), so the minted token keeps every existing command path unchanged.
  • Silent renewal at the BaseService.resolve_projects() chokepoint — covers CLI commands, the MCP subprocess env, and kbagent serve with zero per-call-site changes. Refresh-token rotation (League revokes the old token) is persisted under an inter-process flock (.oauth-refresh.lock); a failed refresh degrades to a per-project 401 instead of killing multi-project fan-outs.
  • Re-login into a registered project updates it in place (preserves active_branch_id, never duplicates).
  • Refresh token persisted in config.json under the existing 0600/atomic/flock protections; the OAuth access token is never written to disk. New ErrorCode.OAUTH_ERROR → exit 3; project.login registered as admin in the permission engine.

⛔ External prerequisite (not in this PR)

Real stacks need a public OAuth client registered in Connection (owner: Connection team; same console command used for the MCP server registration, per the MCP Server Deployment Handbook):

bin/console league:oauth2-server:create-client kbagent-cli \
  --grant-type authorization_code --grant-type refresh_token \
  --redirect-uri http://127.0.0.1:8765/callback \
  --redirect-uri http://127.0.0.1:8766/callback \
  --redirect-uri http://127.0.0.1:8767/callback \
  --redirect-uri http://127.0.0.1:8768/callback \
  --redirect-uri http://127.0.0.1:8769/callback \
  --public

League validates redirect URIs by exact match (no RFC 8252 any-loopback-port), hence the fixed port set. Until the cross-stack name is finalized, KBAGENT_OAUTH_CLIENT_ID overrides the default kbagent-cli id. Open question for the Connection team: which scope grants the project-token mapping for a non-MCP client (MCP uses email + claudai).

Verification

  • make check green — 3887 passed, lint/format/changelog clean; make typecheck clean (fixed 2 pre-existing ty errors in touched test files).
  • 35 new tests in tests/test_oauth.py, including a full real-HTTP protocol round-trip against an in-repo fake Connection OAuth server that enforces PKCE S256 and League-style refresh rotation (wrong verifier rejected; revoked refresh token rejected).
  • Live local demo: login → 0600 persistence → re-login in place → silent refresh with rotation persisted on disk.
  • tests/test_e2e.py::TestE2EOAuthLogin — command-surface test runs in CI; the interactive real-stack browser test is gated on E2E_OAUTH_INTERACTIVE=1 + the registered client (cannot run until the prerequisite lands).

serve REST endpoint: skipped

project login is genuinely terminal-only (interactive browser + human consent), per the documented skip rule in CONTRIBUTING.

Plugin/doc sync (convention #17)

AGENT_CONTEXT (context.py), CLAUDE.md command list, commands-reference.md, gotchas.md (since v0.54.0), keboola-expert.md (version gate + INTERACTIVE-ONLY gotcha; compressed stale passages to stay under the 62KB prompt budget), SKILL.md triggers + regenerated table, plugin.json/marketplace.json via make version-sync.

Copy link
Copy Markdown
Member

@padak padak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review — kbagent project login (OAuth + PKCE)

Reviewed the full diff against CONTRIBUTING.md and verified the critical paths empirically on a local checkout of the branch. This is a carefully engineered PR (PKCE, CSRF, flock-serialized refresh, single silent-refresh chokepoint — really nice). But there is one blocking bug that makes the headline feature fail on every real stack, and the test suite cannot catch it because of how the prerequisite is structured.

Verdict: please do not merge yet — 1 blocking bug + feature is inert until the external prerequisite lands.


🔴 BLOCKING

1. mint_storage_token rejects 201 Createdproject login fails on every real stack.

src/keboola_agent_cli/oauth.py L305:

if response.status_code != 200:
    raise KeboolaApiError(f"Storage token creation rejected ({response.status_code}) ...")

The Keboola Storage API returns 201 Created for POST /v2/storage/tokens, not 200. Confirmed two ways:

  • The official keboola/storage-api-php-client/apiary.apib documents success as 201.
  • Empirically calling mint_storage_token against a simulated 201 response:
    mint REJECTED a valid 201 response -> [OAUTH_ERROR] Storage token creation rejected (201) ...
    

So the OAuth dance succeeds, the code is exchanged, and the final mint step always fails with the misleading message "The OAuth access token may lack project access" — even though the token is fine.

Why no test caught it: the in-repo fake Connection server (tests/test_oauth.py L2219) returns _json(200, {"token": ...}). With a realistic 201, the red CI would have caught this.

Note the existing client validates success as status_code < 400 (http_base.py L155), which is why the classic KeboolaClient.create_token() works — the new oauth.py bypasses the shared client with a raw httpx.post and its own stricter == 200 check.

Fix: if response.status_code not in (200, 201): — and update the fake server in the test to return 201 so the regression can't come back. The 200-only check in _post_token_request is fine (League /oauth/token returns 200 OK per RFC 6749 §5.1); only the Storage mint is 201.


🟡 NON-BLOCKING (decide before merge)

2. External "OAuth client" prerequisite. The PR itself flags it: real stacks need a registered public OAuth client (league:oauth2-server:create-client kbagent-cli --public ..., owned by the Connection team). Until that lands, project login returns OAUTH_ERROR: invalid_client on every stack, cannot be E2E-tested (hence the E2E_OAUTH_INTERACTIVE gate), and that is exactly why bug #1 slipped through. Merging now ships code that no one has run against the real API, plus the PR's own open question ("which scope grants the project-token mapping for a non-MCP client") is unresolved. Suggest holding until the client is registered and the flow is verified once, or shipping the command explicitly marked preview/experimental after #1 is fixed.

3. Over-privileged minted token (least privilege). oauth.py L280 requests canManageBuckets, canManageDevBranches, and notably canManageTokens: True — an effectively master-grade token (can mint further tokens), albeit ~2 h lived and unscoped (no componentAccess[] like the existing create_token()). If Connection does not clamp to the user's role server-side, this is an escalation vector for a readOnly member. At minimum document why canManageTokens is required; ideally narrow it.

4. Re-login with a different alias creates a duplicate. In services/oauth_login_service.py::_persist: if the project is already registered as myproj and the user runs login --project prod, the re-login detection falls through to the else branch and a second entry for the same project is created — contradicting the "never duplicates" claim in the PR description. Edge case; a warning would suffice.

5. No retry on transient 5xx. mint_storage_token and _post_token_request do a one-shot httpx.post with no retry/backoff, unlike BaseHttpClient (429/5xx retry). A transient 503 during login kills the flow.


🟢 NIT

6. services/oauth_login_service.py L1462 (re-login branch): project.active_branch_id = ... is set on a project instance that is then discarded (edit_project takes kwargs). Dead/misleading code; active_branch_id survives only because edit_project doesn't touch it. Finish wiring it or drop the line.

7. Budget ambiguity: keboola-expert.md is 61,937 B (~60.5 KB). CLAUDE.md and CONTRIBUTING.md say "hard 60 KB budget" in two places, but the PR commit and the file itself work against "62 KB". Either the file is ~0.5 KB over the documented cap, or the docs disagree with reality. Reconcile the number.


✅ Verified good (not assumed)

  • Permission model: project.loginadmin in OPERATION_REGISTRY
  • 3-layer architecture: clean — oauth.py = L3 (no Typer/Rich), oauth_login_service = L2, project_login.py = L1 thin ✓
  • Doc/plugin sync (convention #17): context.py, CLAUDE.md, keboola-expert.md (version gate + gotcha), SKILL.md (triggers + regenerated table — make skill-check says up-to-date), commands-reference.md, gotchas.md (since v0.54.0), changelog.py, plugin.json/marketplace.json — all present ✓
  • serve REST skip: legitimate (terminal-only, interactive browser) and documented ✓
  • Flow security: PKCE S256, secrets.compare_digest on state (constant-time CSRF), binds 127.0.0.1 only, single-use callback, refresh token under 0600/atomic/flock, access token never persisted, tokens masked in logs and errors ✓
  • Silent refresh: single chokepoint in resolve_projects(), lazy import off the hot path, isinstance gate, flock + re-read-after-lock against rotation races ✓
  • Lint/format/typecheck: ruff check + format clean; ty only one unrelated unresolved-import warning (build script, non-blocking) ✓
  • New-code tests: tests/test_oauth.py 35/35 passed ✓

Overall: excellent groundwork; fix the 201 status check (with a 201-returning fake server), and align the merge with the OAuth-client registration so the flow gets one real end-to-end run before users meet it.

Copy link
Copy Markdown
Member

@padak padak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review of #377 — feat(auth): browser OAuth project login — kbagent project login (PKCE) (0.54.0)

Generated by kbagent-pr-reviewer subagent. Verdict and findings below
are advisory; the human author retains every veto. CI-coverable issues
(lint, format, tests) are confirmed via make check, not duplicated here.

Summary

This PR adds kbagent project login — a browser-based Authorization Code + PKCE OAuth flow for adding projects without manual token copying. The architecture is sound: the OAuth layer is cleanly separated in oauth.py, silent refresh is wired at the single BaseService.resolve_projects() chokepoint, and all plugin/doc surfaces required by CONTRIBUTING.md convention #17 are updated. make check passes (3887 tests, 8 skipped), make typecheck is clean (0 new diagnostics), and make skill-check is green. Two genuine bugs were found, one of which is a security concern affecting token least-privilege.

Verdict: REQUEST CHANGES — two BLOCKING findings must be addressed before merge.

Verdict

  • Verdict: REQUEST CHANGES
  • Blocking findings: 2
  • Non-blocking findings: 2
  • Nits: 1

Blocking findings

[B-1] src/keboola_agent_cli/oauth.py:305mint_storage_token rejects HTTP 201, silently breaking OAuth login on any stack that returns 201 for POST /v2/storage/tokens

mint_storage_token uses a raw httpx.post() call with an explicit if response.status_code != 200 guard. The Keboola Storage API POST /v2/storage/tokens is a resource-creation endpoint; the analogous Manage API endpoint (POST /manage/projects/{id}/tokens) explicitly returns HTTP 201 per tests/test_manage_client.py:157, and the test comment at line 171 ("Creates token when API returns 200 instead of 201") establishes that 201 is the primary expected code for token creation. Every other place in this codebase that calls a Keboola creation endpoint goes through BaseHttpClient._do_request(), which accepts any status_code < 400 — so 200 and 201 both succeed there. mint_storage_token is the only callsite that hard-checks != 200, meaning a stack that returns 201 for the Bearer-auth token mint path would produce a spurious OAUTH_ERROR and abort every login and every silent refresh cycle (which runs at each BaseService.resolve_projects() call, i.e. on every kbagent command invocation for an OAuth project).

The test suite never mocks the endpoint returning 201 (all httpx_mock.add_response calls for /v2/storage/tokens omit status_code, which defaults to 200), so the bug would never be caught by CI.

Fix: change if response.status_code != 200: to if response.status_code not in (200, 201): (matching the pattern used elsewhere in client.py:1565 and client.py:1597), and add a test variant that mocks HTTP 201 for POST /v2/storage/tokens and asserts the token is returned correctly.

[B-2] src/keboola_agent_cli/oauth.py:284canManageTokens: True on a 2-hour minted Storage token is over-privileged

mint_storage_token requests "canManageTokens": True on the Storage token it mints via Authorization: Bearer. canManageTokens grants the holder the ability to: (a) create new Storage API tokens — including tokens without an expiresIn (permanent), (b) list all tokens on the project, and (c) revoke (delete) other tokens. The only kbagent command that actually requires canManageTokens on the token it holds is config oauth-url (gated explicitly in config_service.py:1942). Every other command (config list, storage buckets, workspace create, job run, etc.) does not need it.

The PR justification — "same flags as project refresh" — conflates two different things: project refresh mints a token intended to be the permanent project credential (needs canManageTokens because it will later be used to call config oauth-url), while the OAuth-minted token is a short-lived intermediary whose scope should be bounded to what kbagent actually calls during its 2-hour window.

Granting canManageTokens: True on a short-lived token creates a 2-hour window in which anyone who obtains the token (stolen config.json, debug log leak, memory scrape) can mint arbitrary new Storage tokens — including permanent ones — on the project, effectively escalating to a persistent foothold after the original OAuth token expires.

Fix: remove "canManageTokens": True from mint_storage_token. Users of kbagent config oauth-url with an OAuth-minted project will receive the existing MISSING_MASTER_TOKEN error; this is acceptable behavior (they should use project add --token with a master token if they need that command). If the project team decides config oauth-url support via OAuth-login is required, make canManageTokens an explicit opt-in parameter to mint_storage_token and document the trade-off.

Non-blocking findings

[NB-1] src/keboola_agent_cli/oauth.py:157 and oauth.py:275 — OAuth HTTP calls bypass BaseHttpClient retry logic; transient 429/500/503 produces a hard OAUTH_ERROR

Both _post_token_request (line 157, used for code exchange and token refresh) and mint_storage_token (line 275) use a standalone httpx.post() that bypasses BaseHttpClient._do_request(). The base client automatically retries on 429, 500, 502, 503, and 504 with exponential backoff (1s, 2s, 4s) up to 3 attempts — the standard behavior every other kbagent HTTP call gets. A transient rate-limit (429) or server hiccup (500) during a token refresh or mint call will instead surface as a hard OAUTH_ERROR (exit code 3) rather than a retried success.

Since the silent refresh fires inside resolve_projects() at the start of every kbagent command invocation for OAuth projects, a momentary API degradation could cause an entire kbagent storage tables run to fail with exit 3 — an error that looks like "re-run kbagent project login" rather than "wait a moment and retry." The impact is worse during the initial project login interactive flow where the token mint is the final step after the user has already completed the browser flow.

Fix (suggested, not required for merge): extract a small _post_json helper that wraps httpx.post() with 2-3 retries on RETRYABLE_STATUS_CODES, reusing constants from constants.py. This can be a follow-up rather than blocking the PR if the author prefers.

[NB-2] tests/test_e2e.py:11103TestE2EOAuthLogin is gated on a non-existent OAuth client registration; the real-stack E2E can never run in CI until a prerequisite ships

The E2E test test_interactive_browser_login is correctly gated behind @pytest.mark.skipif(not os.environ.get("E2E_OAUTH_INTERACTIVE"), ...). However, even with E2E_OAUTH_INTERACTIVE=1, the test cannot run until the kbagent public OAuth client (kbagent-cli) is registered on the target stack — a dependency outside this PR with no tracked issue or timeline. The only E2E test that runs unconditionally is test_login_command_surface (OAUTH-1), which merely checks --help output.

This means the full project login flow — including token minting, config persistence, and the re-authentication idempotency path — has no E2E coverage that can actually execute today. Per CONTRIBUTING.md: "Every new CLI command MUST have a corresponding E2E test in tests/test_e2e.py." The structural compliance is present but the gate condition means it effectively has zero E2E coverage for the foreseeable future.

This is a policy gap rather than a code bug. The fix should be tracked as an issue: "Register kbagent-cli OAuth client on connection.keboola.com to enable E2E_OAUTH_INTERACTIVE tests" — reference it in the PR description so reviewers know it is acknowledged.

Nits

  • [NIT-1] src/keboola_agent_cli/commands/_helpers.py:98 — newly added "OAUTH_ERROR" in map_error_to_exit_code uses a raw string literal in a tuple comparison. The preexisting "INVALID_TOKEN" and "MISSING_MASTER_TOKEN" strings are the same antipattern. CONTRIBUTING.md mandates ErrorCode enum usage. check_error_codes.py only checks keyword-argument form (error_code="...") so this slips through CI. The triple should read (ErrorCode.INVALID_TOKEN, ErrorCode.MISSING_MASTER_TOKEN, ErrorCode.OAUTH_ERROR) — since ErrorCode is a StrEnum, the in comparison works identically.

Verification log

  • gh pr view 377 --json title,body,files,additions,deletions,baseRefName,headRefName,labels,state → 27 files, +2030/-38, state=OPEN, conventional feat(auth): prefix ✓
  • git rev-parse --abbrev-ref HEADfeat/oauth-project-login (correct worktree) ✓
  • Layer violation grep (typer in services, httpx in commands, formatter in clients) → empty ✓
  • make check → 3887 passed, 8 skipped, 120 deselected ✓
  • make typecheck → 1 diagnostic (preexisting unresolved-import for hatchling in scripts/hatch_build.py, not in PR code) ✓
  • make skill-checkSKILL.md is up-to-date
  • OPERATION_REGISTRY in permissions.py"project.login": "admin" present ✓
  • AGENT_CONTEXT in commands/context.pyproject login entry present ✓
  • CLAUDE.md ## All CLI Commandskbagent project login line present at 300 ✓
  • plugins/kbagent/agents/keboola-expert.md §1 Rule 6 VERSION GATEproject login (browser OAuth + PKCE) = 0.54.0+ at line 71 ✓
  • plugins/kbagent/agents/keboola-expert.md §3 Inline Gotchasproject login is INTERACTIVE-ONLY at line 310 ✓
  • plugins/kbagent/skills/kbagent/references/commands-reference.mdproject login entry present ✓
  • plugins/kbagent/skills/kbagent/references/gotchas.md## project login is interactive-only... (since v0.54.0)
  • oauth.py:305if response.status_code != 200: hard-checks exactly 200; analogous Manage API endpoint returns 201 per tests/test_manage_client.py:157; no test variant covers HTTP 201 response for POST /v2/storage/tokens ✓ (confirmed bug)
  • oauth.py:284"canManageTokens": True present; config_service.py:1942 confirms only config oauth-url requires it; all other commands function without it ✓ (confirmed concern)
  • _helpers.py:98"OAUTH_ERROR" raw string in tuple; check_error_codes.py does not scan tuple literals ✓ (confirmed)
  • Behavior verification (interactive browser flow): cannot reproduce — requires the registered public OAuth client on the stack, which is documented in the PR as not yet existing. Per the reviewer playbook this is an unverified behavior claim noted explicitly.

Open questions for the author

  1. Has POST /v2/storage/tokens with Authorization: Bearer been confirmed by direct API inspection (not test mocks) to return 200 (not 201)? If so, a comment in the code with a source reference would prevent future confusion and make the != 200 check defensible.

  2. Is config oauth-url intended to work for projects added via kbagent project login? If yes, canManageTokens: True is required (with documented trade-off). If no, the flag should be removed and the docs updated to say "use project add --token with a master token for OAuth URL generation."

@soustruh soustruh marked this pull request as draft June 2, 2026 22:56
soustruh added 2 commits June 3, 2026 00:57
Browser-based project authorization, an alternative to manual token
setup. Follows the production MCP-server pattern.

- oauth.py: PKCE S256, loopback callback, code exchange, refresh
  rotation, Storage-token minting via Bearer.
- ProjectConfig gains an OAuthCredentials block; only the refresh
  token is persisted (0600 config.json).
- BaseService.resolve_projects() renews expired minted tokens for
  CLI, MCP subprocess, and serve.
- ErrorCode.OAUTH_ERROR maps to exit 3; project.login is admin.
- Docs and plugin files synced; keboola-expert.md compressed to stay
  under its 62KB budget.

Real stacks need a public OAuth client registered in Connection; see
the PR description for the command.
The errors exist on main; CI never sees them because it runs only
ruff and pytest, and the pre-commit hook checks staged files only.
@soustruh soustruh force-pushed the feat/oauth-project-login branch from 85edf41 to 6e2195b Compare June 2, 2026 23:02
Copy link
Copy Markdown
Author

soustruh commented Jun 2, 2026

I know, it's not verified & complete yet, I marked it as a draft, but yep, I accidentally left it as ready for review for a couple of hours 😇

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants