Skip to content

feat(pools): channel parity (REST + Telegram) and refresh-grant token-host scoping#47

Merged
nnemirovsky merged 17 commits into
mainfrom
channel-feature-parity
May 18, 2026
Merged

feat(pools): channel parity (REST + Telegram) and refresh-grant token-host scoping#47
nnemirovsky merged 17 commits into
mainfrom
channel-feature-parity

Conversation

@nnemirovsky
Copy link
Copy Markdown
Owner

What

Brings credential pools to feature parity across all management channels and fixes a grant-blind token-host rewrite.

Pools were CLI-only (sluice pool create|list|status|rotate|remove) with no HTTP API and no Telegram commands, while every other store-backed surface (policy, credentials, bindings, MCP upstreams) is reachable from CLI and REST and Telegram. The pool operation logic also lived inline in cmd/sluice, which is the root cause of the parity gap.

Separately, sluice rewrote every request to a pool's shared OAuth token host (such as auth.openai.com), including non-refresh grants, so a fresh in-container codex login --device-auth (a device_code grant) was corrupted into 400 token_exchange_user_error.

Changes

  1. feat(poolops): new channel-agnostic internal/poolops package (Create, List, Status, Rotate, Remove) with typed results and sentinel errors. The epoch-guarded rotate write and the ResolveActive status derivation now live here exactly once.
  2. feat(cli): cmd/sluice pool handlers rewired as thin wrappers over internal/poolops. No behavior change.
  3. feat(api): spec-first REST endpoints. GET/POST /api/pools, GET/DELETE /api/pools/{name}, POST /api/pools/{name}/rotate. Error mapping is 400 for validation, 404 for unknown pool, 409 for name collision or already-pooled member or a pool still referenced by a binding (the 409 body carries the structured referencing-binding list), 500 otherwise.
  4. feat(telegram): /pool create|list|status|rotate|remove dispatching to internal/poolops, status rendered like the CLI, errors as plain text, added to the grouped help and command menu. No message auto-delete because pool args carry only credential names.
  5. fix(proxy): the pool token-host phantom expansion is now gated on grant_type=refresh_token. device_code, authorization_code, and absent-grant requests reach upstream byte-unchanged, so a fresh in-container codex login --device-auth is no longer corrupted. The request-side grant probe is bounded (8 KiB, form or JSON content types only) to keep the proxy hot path cheap.
  6. feat(telegram): the pool failover notification is reworded from pool X failed over A -> B (401) to a friendlier plain-text form, for example Pool "openai_pool" failed over from "openai_oauth_2" to "openai_oauth" after auth failure (401).. The cred_failover and pool_exhausted audit Reason formats are unchanged.
  7. docs: README documents the new /pool Telegram commands and /api/pools REST endpoints. CLAUDE.md notes pool channel parity via internal/poolops. The channel-parity principle itself was already codified in CLAUDE.md and CONTRIBUTING.md in docs: channel feature-parity principle + pool-parity/token-host follow-up plan #45.

Testing

internal/poolops unit tests cover every op success path plus each sentinel error (empty members, static member rejection, namespace collision, unknown pool, in-use-by-pool, rotate epoch-race no-op). REST handler tests cover happy paths plus 400, 404, and 409 including the duplicate-name and already-pooled-member and structured-binding-conflict paths. Telegram tests cover each subcommand success and error. Proxy tests are fail-before/pass-after on the grant scoping (refresh_token still expanded, device_code and authorization_code and absent grants byte-unchanged) plus the failover-notice wording.

Full go test ./... green, go vet ./... and go vet -tags=e2e ./e2e/ clean, gofumpt clean, golangci-lint 0 issues, make generate idempotent, make lint-api clean on the pool endpoints.

Plan: docs/plans/20260518-channel-feature-parity.md.

Move pool create/list/status/rotate/remove logic into internal/poolops
so CLI, REST, and Telegram share one implementation. Adds structured
errors (already-pooled member, live-member removal, delete conflict)
returned channel-neutrally.
sluice pool create|list|status|rotate|remove now delegate to the
shared poolops package; rotate-race hint points to 'sluice pool status'.
Add /api/pools CRUD plus status/rotate via poolops; pool-create
honors the 409 conflict contract and returns structured
pool-delete conflict payloads. Token-host grant parse restricted
to form bodies.
Telegram bot reaches pool management through poolops, matching CLI
and REST channel parity.
…grants

Restrict the pool token-host phantom split-host expansion to
refresh_token grant requests so non-refresh traffic is unaffected;
parse only form bodies.
Reword the pool failover Telegram notice into human-readable text
(task 4b).
Document the pool management surfaces across CLI/REST/Telegram in
CLAUDE.md and README, and record the channel-feature-parity plan
(incl. task 4b) as completed.

This comment was marked as outdated.

… request data

split the pool-delete 409 body into PoolReferencedErrorResponse so the
generic ErrorResponse envelope is no longer coupled to one endpoint
(regenerated api.gen.go via oapi-codegen). PostApiPools now builds the
201 body from the request members + failover default instead of gating
on a store read-back, so a read-back error can no longer report a
successful create as a 500. poolStatusError no longer checks
ErrCredentialInUseByPool, which is unreachable from the pool handlers
(raised only by the credential-removal path).
…probe cap

requestFlowGrantType now skips the string(body)+ParseQuery grant_type
probe unless the request is an HTTP POST whose scheme+host matches a
known OAuth token endpoint (new OAuthIndex.MatchesHost), removing the
per-request body parse on the hot path for non-OAuth traffic. The
form/JSON parsers in extractRequestRefreshToken and requestGrantType no
longer double-stringify the body for an explicit form Content-Type. The
probe cap is raised 8KiB->64KiB so a large RFC 7523 refresh payload at a
pool token host is still expanded, with a rate-limited WARNING when the
cap truncates a probe. StreamRequestModifier's buffered-path-only
limitation is documented explicitly.
FormatFailoverNotice rendered '(unknown reason)' / 'after unknown
reason' when the reason tag was empty. Drop the reason clause entirely
in that case for both the exhausted and normal messages. The
cred_failover / pool_exhausted audit Reason format is unchanged.
the /pool status hint in the rotate-race message wrote the raw pool name
into an HTML-parsed reply, so a name with <,>,& would break rendering;
htmlCode it like the other occurrence. The pool-remove test guarded on
'err == nil', which GetPool returns even for a missing pool; assert the
pool row is actually gone instead.
the /api/* curl examples omit the Authorization header though the
endpoints are behind BearerAuth; add a single leading note covering the
credential, pool, and rule examples.

This comment was marked as outdated.

This comment was marked as outdated.

Invert poolCreateError: conflict sentinels stay 409, genuine client-input
validation sentinels are 400, and everything else (tx/DB/INSERT failures
inside store.CreatePoolWithMembers, which are wrapped fmt.Errorf strings
with no sentinel) now defaults to 500 instead of being misclassified as a
client 400. Add typed, errors.Is-able sentinels in internal/store for the
client-validation cases that previously had only wrapped messages
(ErrPoolNoMembers, ErrPoolStrategyInvalid, ErrPoolMemberDuplicate,
ErrPoolMemberNotFound, ErrPoolMemberNotOAuth), wrapped at origin so the
existing human-readable text is preserved verbatim. Document the new 500
response on POST /api/pools in the OpenAPI spec and regenerate.
…pty case

humanizeFailoverReason now renders an unknown tag as 'unknown reason
(<tag>)' so the surrounding 'failed over ... after %s.' and exhausted
'... to (%s).' clauses read naturally instead of the redundant 'failed
over ... after failover (<tag>).'. Remove the unreachable case "":
branch: FormatFailoverNotice is the sole caller and short-circuits an
empty reason before ever calling here, so the empty-tag wording has a
single source of truth there. Update the notice/humanize tests
accordingly.
poolStatus sends with HTML parse mode (htmlCode emits <code>), but
appended m.LastFailureReason (upstream error text, may contain < > &)
raw, which breaks rendering or is rejected by the Bot API. Escape it
with htmlEscape (prose, not an identifier, so not htmlCode) consistent
with every other user-facing value on that line. Add a test asserting a
reason with < > & is rendered escaped.

This comment was marked as outdated.

@nnemirovsky nnemirovsky requested a review from Copilot May 18, 2026 08:53
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 24 out of 25 changed files in this pull request and generated no new comments.

Files not reviewed (1)
  • internal/api/api.gen.go: Language not supported

@nnemirovsky nnemirovsky merged commit 6d5e680 into main May 18, 2026
10 checks passed
@nnemirovsky nnemirovsky deleted the channel-feature-parity branch May 18, 2026 09:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants