feat(pools): channel parity (REST + Telegram) and refresh-grant token-host scoping#47
Merged
Conversation
Move pool create/list/status/rotate/remove logic into internal/poolops so CLI, REST, and Telegram share one implementation. Adds structured errors (already-pooled member, live-member removal, delete conflict) returned channel-neutrally.
sluice pool create|list|status|rotate|remove now delegate to the shared poolops package; rotate-race hint points to 'sluice pool status'.
Add /api/pools CRUD plus status/rotate via poolops; pool-create honors the 409 conflict contract and returns structured pool-delete conflict payloads. Token-host grant parse restricted to form bodies.
Telegram bot reaches pool management through poolops, matching CLI and REST channel parity.
…grants Restrict the pool token-host phantom split-host expansion to refresh_token grant requests so non-refresh traffic is unaffected; parse only form bodies.
Reword the pool failover Telegram notice into human-readable text (task 4b).
Document the pool management surfaces across CLI/REST/Telegram in CLAUDE.md and README, and record the channel-feature-parity plan (incl. task 4b) as completed.
… request data split the pool-delete 409 body into PoolReferencedErrorResponse so the generic ErrorResponse envelope is no longer coupled to one endpoint (regenerated api.gen.go via oapi-codegen). PostApiPools now builds the 201 body from the request members + failover default instead of gating on a store read-back, so a read-back error can no longer report a successful create as a 500. poolStatusError no longer checks ErrCredentialInUseByPool, which is unreachable from the pool handlers (raised only by the credential-removal path).
…probe cap requestFlowGrantType now skips the string(body)+ParseQuery grant_type probe unless the request is an HTTP POST whose scheme+host matches a known OAuth token endpoint (new OAuthIndex.MatchesHost), removing the per-request body parse on the hot path for non-OAuth traffic. The form/JSON parsers in extractRequestRefreshToken and requestGrantType no longer double-stringify the body for an explicit form Content-Type. The probe cap is raised 8KiB->64KiB so a large RFC 7523 refresh payload at a pool token host is still expanded, with a rate-limited WARNING when the cap truncates a probe. StreamRequestModifier's buffered-path-only limitation is documented explicitly.
FormatFailoverNotice rendered '(unknown reason)' / 'after unknown reason' when the reason tag was empty. Drop the reason clause entirely in that case for both the exhausted and normal messages. The cred_failover / pool_exhausted audit Reason format is unchanged.
the /pool status hint in the rotate-race message wrote the raw pool name into an HTML-parsed reply, so a name with <,>,& would break rendering; htmlCode it like the other occurrence. The pool-remove test guarded on 'err == nil', which GetPool returns even for a missing pool; assert the pool row is actually gone instead.
the /api/* curl examples omit the Authorization header though the endpoints are behind BearerAuth; add a single leading note covering the credential, pool, and rule examples.
Invert poolCreateError: conflict sentinels stay 409, genuine client-input validation sentinels are 400, and everything else (tx/DB/INSERT failures inside store.CreatePoolWithMembers, which are wrapped fmt.Errorf strings with no sentinel) now defaults to 500 instead of being misclassified as a client 400. Add typed, errors.Is-able sentinels in internal/store for the client-validation cases that previously had only wrapped messages (ErrPoolNoMembers, ErrPoolStrategyInvalid, ErrPoolMemberDuplicate, ErrPoolMemberNotFound, ErrPoolMemberNotOAuth), wrapped at origin so the existing human-readable text is preserved verbatim. Document the new 500 response on POST /api/pools in the OpenAPI spec and regenerate.
…pty case humanizeFailoverReason now renders an unknown tag as 'unknown reason (<tag>)' so the surrounding 'failed over ... after %s.' and exhausted '... to (%s).' clauses read naturally instead of the redundant 'failed over ... after failover (<tag>).'. Remove the unreachable case "": branch: FormatFailoverNotice is the sole caller and short-circuits an empty reason before ever calling here, so the empty-tag wording has a single source of truth there. Update the notice/humanize tests accordingly.
poolStatus sends with HTML parse mode (htmlCode emits <code>), but appended m.LastFailureReason (upstream error text, may contain < > &) raw, which breaks rendering or is rejected by the Bot API. Escape it with htmlEscape (prose, not an identifier, so not htmlCode) consistent with every other user-facing value on that line. Add a test asserting a reason with < > & is rendered escaped.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
Brings credential pools to feature parity across all management channels and fixes a grant-blind token-host rewrite.
Pools were CLI-only (
sluice pool create|list|status|rotate|remove) with no HTTP API and no Telegram commands, while every other store-backed surface (policy, credentials, bindings, MCP upstreams) is reachable from CLI and REST and Telegram. The pool operation logic also lived inline incmd/sluice, which is the root cause of the parity gap.Separately, sluice rewrote every request to a pool's shared OAuth token host (such as
auth.openai.com), including non-refresh grants, so a fresh in-containercodex login --device-auth(adevice_codegrant) was corrupted into400 token_exchange_user_error.Changes
feat(poolops): new channel-agnosticinternal/poolopspackage (Create, List, Status, Rotate, Remove) with typed results and sentinel errors. The epoch-guarded rotate write and theResolveActivestatus derivation now live here exactly once.feat(cli):cmd/sluicepool handlers rewired as thin wrappers overinternal/poolops. No behavior change.feat(api): spec-first REST endpoints.GET/POST /api/pools,GET/DELETE /api/pools/{name},POST /api/pools/{name}/rotate. Error mapping is 400 for validation, 404 for unknown pool, 409 for name collision or already-pooled member or a pool still referenced by a binding (the 409 body carries the structured referencing-binding list), 500 otherwise.feat(telegram):/pool create|list|status|rotate|removedispatching tointernal/poolops, status rendered like the CLI, errors as plain text, added to the grouped help and command menu. No message auto-delete because pool args carry only credential names.fix(proxy): the pool token-host phantom expansion is now gated ongrant_type=refresh_token.device_code,authorization_code, and absent-grant requests reach upstream byte-unchanged, so a fresh in-containercodex login --device-authis no longer corrupted. The request-side grant probe is bounded (8 KiB, form or JSON content types only) to keep the proxy hot path cheap.feat(telegram): the pool failover notification is reworded frompool X failed over A -> B (401)to a friendlier plain-text form, for examplePool "openai_pool" failed over from "openai_oauth_2" to "openai_oauth" after auth failure (401).. Thecred_failoverandpool_exhaustedaudit Reason formats are unchanged.docs: README documents the new/poolTelegram commands and/api/poolsREST endpoints. CLAUDE.md notes pool channel parity viainternal/poolops. The channel-parity principle itself was already codified in CLAUDE.md and CONTRIBUTING.md in docs: channel feature-parity principle + pool-parity/token-host follow-up plan #45.Testing
internal/poolopsunit tests cover every op success path plus each sentinel error (empty members, static member rejection, namespace collision, unknown pool, in-use-by-pool, rotate epoch-race no-op). REST handler tests cover happy paths plus 400, 404, and 409 including the duplicate-name and already-pooled-member and structured-binding-conflict paths. Telegram tests cover each subcommand success and error. Proxy tests are fail-before/pass-after on the grant scoping (refresh_token still expanded, device_code and authorization_code and absent grants byte-unchanged) plus the failover-notice wording.Full
go test ./...green,go vet ./...andgo vet -tags=e2e ./e2e/clean, gofumpt clean, golangci-lint 0 issues,make generateidempotent,make lint-apiclean on the pool endpoints.Plan:
docs/plans/20260518-channel-feature-parity.md.