fix(api): model-group routing + expose group names in /v1/models by HeimaoLST · Pull Request #2 · minervacap2022/CLIProxyAPI

HeimaoLST · 2026-05-25T09:10:40Z

Two related bugs in the model-group plumbing.

1. `keyConfigMiddleware` was reading the wrong gin context key

AuthMiddleware (internal/api/server.go:475) sets the authenticated principal under "userApiKey":

c.Set("userApiKey", result.Principal)

But keyConfigMiddleware (internal/api/server.go:1588) was reading "apiKey":

apiKeyRaw, exists := c.Get("apiKey")  // never exists

The middleware therefore early-returned on every request and never populated "apiKeyConfig" / "modelGroup" in the context. Downstream, ginKeyConfigs() in sdk/api/handlers/handlers.go always returned (nil, nil), modelgroup.IsGroupModel was always false, and group names like claude-failover fell through to the normal model lookup which fails with:

unknown provider for model claude-failover  (HTTP 502)

Fix: read "userApiKey" to match what AuthMiddleware sets.

2. Configured model-groups were invisible in `/v1/models`

The /v1/models endpoint only returned models registered in the global registry. Clients that pick a model from that listing (e.g. Claude Code) had no way to discover that a group name was a valid model identifier.

Fix: add serveModelsWithGroups() helper that appends each configured model-group as a virtual entry to the response:

{
  "id": "claude-failover",
  "object": "model",
  "owned_by": "model-group",
  "type": "model-group",
  "display_name": "claude-failover"
}

The unified models handler now routes through this helper for both the Claude and OpenAI branches so both client families see the groups.

Verification

End-to-end against a local build of this branch with config:

api-key-configs:
  - key: sk-test-key-001
    model-group: claude-failover
    allow-other-models: true
model-groups:
  - name: claude-failover
    models:
      - {model: claude-sonnet-4-6, priority: 3}
      - {model: claude-opus-4-7,   priority: 3}
      - {model: gpt-5.4,           priority: 2}

Endpoint	Result
`GET /v1/models`	`claude-failover`, `doubao`, `openai` listed alongside real models (`type=model-group`)
`POST /v1/chat/completions {"model": "claude-failover"}`	200, routed to `claude-sonnet-4-6` (highest priority in the group)
`POST /v1/messages {"model": "claude-failover", "stream": true}`	200, SSE stream with `model=claude-sonnet-4-6` in `message_start`

Before the fix: every request above returned 502 unknown provider for model claude-failover.

- Simplify project ID selection to always use the backend project ID returned by Gemini onboarding - Update Gemini CLI version from 0.31.0 to 0.34.0 - Add 'terminal' to User-Agent string for better client identification Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ment Both branches assign finalProjectID = responseProjectID, so move the assignment outside the conditional and keep only the logging inside.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

After a Codex CLI compact, the client sends a full conversation transcript (with compaction items or assistant messages) as input. Previously, normalizeResponseSubsequentRequest() unconditionally merged this with stale lastRequest/lastResponseOutput, breaking function_call/function_call_output pairings and causing 400 errors ("No tool output found for function call"). Add inputContainsFullTranscript() heuristic that detects compaction items (type=compaction/compaction_summary) or assistant messages in the input array, and bypasses the merge when a full transcript is present. Fixes router-for-me#2207

Codex CLI gates the built-in image_generation tool behind AuthMode::Chatgpt (OAuth only). When clients connect via API key auth through CPA, the tool is absent from requests, making image generation unavailable through the reverse proxy. Changes: 1. Inject image_generation tool (codex_executor.go): Add ensureImageGenerationTool() that appends {"type":"image_generation","output_format":"png"} to the tools array if not already present. Applied to all three execution paths: Execute, executeCompact, and ExecuteStream. 2. Route aliases for Codex CLI direct access (server.go): Add /backend-api/codex/responses routes that map to the same OpenAI Responses API handlers as /v1/responses. This allows Codex CLI to connect via chatgpt_base_url config while keeping AuthMode::Chatgpt, which enables the built-in image_generation tool on the client side. 3. Unit tests (codex_executor_imagegen_test.go): Cover no-tools, existing tools, already-present, empty array, and mixed built-in tool scenarios.

Move credits handling from executor-level retry to conductor-level orchestration. When all free-tier auths are exhausted (429/503), the conductor discovers auths with available Google One AI credits and retries with enabledCreditTypes injected via context flag. Key changes: - Add AntigravityCreditsHint system for tracking per-auth credits state - Conductor tries credits fallback after all auths fail (Execute/Stream/Count) - Executor injects enabledCreditTypes only when conductor sets context flag - Credits fallback respects provider scope (requires antigravity in providers) - Add context cancellation check in credits fallback to avoid wasted requests - Remove executor-level attemptCreditsFallback and preferCredits machinery - Restructure 429 decision logic (parse details first, keyword fallback) - Expand shouldAbort to cover INVALID_ARGUMENT/FAILED_PRECONDITION/500+UNKNOWN - Support human-readable retry delay parsing (e.g. "1h43m56s")

…sion affinity

CountTokens upstream API does not support enabledCreditTypes, so remove the dead credits fallback path from ExecuteCount and delete the unused tryAntigravityCreditsExecuteCount method. Fix gofmt on credits test file.

… read X-Amp-Thread-Id

…ferred body on success - findAllAntigravityCreditsCandidateAuths now filters by PinnedAuthMetadataKey to prevent credential isolation violations during credits fallback - Release deferredBody reference on success path to avoid holding large payloads in memory for the lifetime of the gin context

…s-only logging Remove deferred body optimization and maxErrorLog constants that were unrelated to credits fallback. Keep only MarkCreditsUsed/CreditsUsed helpers for flagging requests that consumed AI credits.

…auths as fallback candidates Replace antigravityCreditsAvailableForModel with inline known/unknown split. Auths whose credit hints are not yet populated are kept as lower-priority candidates instead of being rejected, breaking the chicken-and-egg deadlock at cold start.

…image-generation-tool-injection feat(codex): inject image_generation tool + route aliases for Codex CLI image generation

- Included `/v1/images` in AI API path prefixes. - Introduced tests to validate `/v1/images/generations` and `/v1/images/edits` as AI API paths.

…credits-fallback feat(antigravity): conductor-level credits fallback for Claude models

Align GPT-5.5 Codex metadata with runtime cache

…ool injection - Modified `ensureImageGenerationTool` to accept `baseModel` for conditional logic. - Ensured `gpt-5.3-codex-spark` models bypass image_generation tool injection. - Updated relevant tests and executor logic to reflect changes.

…redits-stream-fallback fix(antigravity): trigger credits fallback for streaming

Normalize Codex context, thinking-signature, previous-response, and auth failures to explicit error codes: context_too_large, thinking_signature_invalid, previous_response_not_found, auth_unavailable. Refs router-for-me#2596.

- Introduced `disallowFreeAuthFromMetadata` and `isFreeCodexAuth` to enforce skipping free-tier credentials. - Modified scheduler logic to honor `DisallowFreeAuthMetadataKey` during auth selection. - Updated `ensureImageGenerationTool` to skip tool injection for free-tier Codex auth. - Added context utility `WithDisallowFreeAuth` and integrated with image handlers. - Augmented relevant tests to cover free-tier exclusion scenarios.

- Updated text formatting with bold emphasis for consistent branding. - Refined wording for VisionCoder's promotion details in Chinese, Japanese, and English README.

…LIProxyAPI

…ion failover - Introduced `homeRedisOperationTimeout` and `homeSubscriptionReceiveTimeout` constants for configurable timeouts. - Enhanced Redis connection options with operation timeout settings and failover mechanisms. - Implemented subscription failover logic on heartbeat timeouts to improve resilience. - Updated message handling to support additional Redis event types, including Pong and Subscription.

Added a new section for Codex Switch tool with details.

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Add Codex Switch tool to README

- Removed obsolete Redis protocol test cases and helper functions that were no longer relevant due to recent architecture changes. - Streamlined remaining test files to align with updated Redis handling and connection management logic.

- Registered new models: `gemini-3-flash-agent` and `gemini-3.5-flash-low` with detailed specifications. - Includes support for dynamic thinking levels and extended context capabilities.

- Added new reasoning levels: `none`, `minimal`, and `unsupported` to Codex model configurations. - Introduced metadata sanitization and normalization for reasoning levels in API response. - Extended unit tests to cover reasoning levels validation and metadata sanitation logic.

- Added new model `gemini-3.5-flash` to the registry with enhanced intelligence and speed capabilities. - Supports extended thinking levels (`minimal`, `low`, `medium`, `high`) and dynamic adjustments. - Expanded generation methods, including content creation and token counting.

… capabilities - Registered `gemini-3.5-flash` model with dynamic thinking levels and extended token limits. - Supports multiple generation methods, including cached and batch content creation.

- Updated `ConvertClaudeRequestToGemini` to ignore empty `text` entries during processing. - Added unit tests to ensure empty `text` parts are skipped correctly. Closes: router-for-me#3485

Add reasoning_effort to usage event payloads

…eue operations - Added support for advanced RESP commands (`AUTH`, `SUBSCRIBE`, `RPOP`, `LPOP`) with extended functionality. - Implemented queue operations for usage events via `RPOP` and `LPOP` commands. - Introduced subscription handling with new Pub/Sub message features and error handling improvements. - Updated Redis connection logic to enforce authentication requirements and validate inputs. - Expanded related unit tests to cover new scenarios and edge cases.

…roject-id-onboard fix: require antigravity project id

fix: scope antigravity credits fallback gate

…-length-stream-errors-dev fix codex context length stream errors

Add cluster-specific docker-compose configuration for CLIProxyAPI

fix(auth): update import paths to v7 for registry and executor

…nversion - Updated `ConvertClaudeRequestToGemini` logic to treat `system` role as `developer`. - Added unit test case to validate the behavior. Closes: router-for-me#3510

- Registered `grok-build-0.1` model with enhanced context length and agentic engineering support. - Supports dynamic thinking levels for improved software workflows.

- Acknowledged APIKEY.FUN as a sponsor with details on their services and exclusive project-specific benefits. - Updated Japanese (README_JA.md), Chinese (README_CN.md), and English (README.md) documentation. - Added new sponsorship image (`assets/apikey.png`).

- Sync 238 upstream commits up to v7.1.20 (50d19e2) - Module path bumped v6 -> v7 - Resolved 5 conflicts: * Dockerfile -> keep ENTRYPOINT docker-entrypoint.sh * claude_executor.go -> keep retry comment (upstream already adopted RefreshTokensWithRetry) * helps/usage_helpers.go -> adopt upstream hasOpenAIStyleUsageTokenFields (more thorough than our null check) * sdk/api/handlers/handlers.go -> merge model-group routing + upstream image-handler refactor * sdk/cliproxy/service.go -> keep warmup scheduler + adopt upstream home/redisqueue/diff imports - Dropped dead import internal/usage (upstream removed legacy package; replaced by api_key_usage plugin) - Tests: 50 pass, 1 pre-existing failure (TestParseDoubaoRetryAfter time-drift in main, unrelated to merge)

Klik fork — upstream removed the in-process usage tracker in commit 18bb9c3 (moved to Redis queue + external consumer). We restore the legacy endpoints plus add a snapshot persistor so usage data survives restarts without needing the heavyweight CPA-Manager pipeline. - restore internal/usage/ from before 18bb9c3 (v6 module path bumped) - add internal/usage/persistence.go: Redis snapshot Persistor - on startup: load prior snapshot - while running: flush dirty snapshot every 5s (configurable) - on shutdown: final flush - on redis failure: log error, continue in pure in-memory mode - new config: cfg.UsagePersistence { addr, password, db, key, flush-interval-seconds } - new handler endpoints (preserved upstream v6 surface): GET /v0/management/usage GET /v0/management/usage/export POST /v0/management/usage/import - Management Handler.usageStats wired in via SetUsageStatistics() - Service.startUsagePersistor() ties it all together at Run() time

Two related bugs in the model-group plumbing: 1. keyConfigMiddleware was reading the wrong gin context key --------------------------------------------------------- AuthMiddleware sets the authenticated API key under "userApiKey": c.Set("userApiKey", result.Principal) but keyConfigMiddleware in server.go:1588 was reading "apiKey": apiKeyRaw, exists := c.Get("apiKey") // never exists The middleware therefore early-returned on every request and never populated "apiKeyConfig" / "modelGroup" in the context. Downstream, ginKeyConfigs() in sdk/api/handlers/handlers.go always returned (nil, nil), modelgroup.IsGroupModel was always false, and group names like "claude-failover" fell through to the normal model lookup which fails with "unknown provider for model claude-failover" (502). Fix: read "userApiKey" to match what AuthMiddleware sets. 2. Configured model-groups were invisible in /v1/models ---------------------------------------------------- The /v1/models endpoint only returned models registered in the global registry. Clients (e.g. Claude Code) that pick a model from that listing had no way to discover that a group name was a valid model identifier. Fix: add serveModelsWithGroups() helper that appends each configured model-group as a virtual entry to the response: { "id": "claude-failover", "object": "model", "owned_by": "model-group", "type": "model-group", "display_name": "claude-failover" } The unified models handler now goes through this helper for both the Claude and OpenAI branches so both client families see the groups. Verified end-to-end: - GET /v1/models → 3 model-group entries listed alongside real models - POST /v1/chat/completions {"model": "claude-failover"} → 200, routed to claude-sonnet-4-6 (highest priority in the group) - POST /v1/messages {"model": "claude-failover", "stream": true} → 200, SSE stream with model=claude-sonnet-4-6 in message_start

github-actions · 2026-05-25T09:10:51Z

This pull request targeted main.

The base branch has been automatically changed to dev.

kslamph and others added 30 commits March 29, 2026 23:52

refactor(gemini-cli): simplify redundant if/else in project ID assign…

91387ca

…ment Both branches assign finalProjectID = responseProjectID, so move the assignment outside the conditional and keep only the logging inside.

fix(claude-auth): dedupe OAuth refresh and honor 429 backoff

6431cec

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

fix(executor): route Claude refresh through retry-aware auth

29e32aa

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

fix(websocket): narrow compact replay detection

d2d0e6f

fix(websocket): gate compact replay by downstream support

4ca00f7

feat: support extracting X-Amp-Thread-Id header as session id for ses…

4d6457e

…sion affinity

fix(antigravity): remove credits fallback from CountTokens, fix gofmt

4de5c29

CountTokens upstream API does not support enabledCreditTypes, so remove the dead credits fallback path from ExecuteCount and delete the unused tryAntigravityCreditsExecuteCount method. Fix gofmt on credits test file.

fix: forward HTTP headers to executor Options so session affinity can…

8e49c79

… read X-Amp-Thread-Id

refactor(logging): strip unrelated deferred body changes, keep credit…

920b6ef

…s-only logging Remove deferred body optimization and maxErrorLog constants that were unrelated to credits fallback. Keep only MarkCreditsUsed/CreditsUsed helpers for flagging requests that consumed AI credits.

Merge pull request router-for-me#2962 from MoYeRanqianzhi/feat/codex-…

8eb56e5

…image-generation-tool-injection feat(codex): inject image_generation tool + route aliases for Codex CLI image generation

perf(antigravity): async credits hint refresh for warm tokens

7ad1900

feat(logging): add AI API path support for image routes

25137b1

- Included `/v1/images` in AI API path prefixes. - Introduced tests to validate `/v1/images/generations` and `/v1/images/edits` as AI API paths.

Merge pull request router-for-me#2971 from sususu98/feat/antigravity-…

12195a2

…credits-fallback feat(antigravity): conductor-level credits fallback for Claude models

feat(models): add GPT-5.5 model entry to registry JSON

7d5f6d9

Add GPT-5.5 Codex model support

736018a

Merge pull request router-for-me#2989 from ben-vargas/gpt-5-5-support

1576d14

Align GPT-5.5 Codex metadata with runtime cache

chore(models): remove GPT-5.5 model entry from registry JSON

7b89583

fix antigravity credits stream fallback

5f5d593

Merge pull request router-for-me#3007 from sususu98/fix-antigravity-c…

36cc762

…redits-stream-fallback fix(antigravity): trigger credits fallback for streaming

fix(codex): classify known upstream failures

4056c25

Normalize Codex context, thinking-signature, previous-response, and auth failures to explicit error codes: context_too_large, thinking_signature_invalid, previous_response_not_found, auth_unavailable. Refs router-for-me#2596.

Add CPA Usage Keeper to README ecosystem list

faad8e3

docs:Add CPA Usage Keeper to README ecosystem list

cf043f6

sususu98 and others added 28 commits May 19, 2026 16:05

fix codex context length stream errors

ad86830

style(docs): improve sponsor section clarity in README files

67f2251

- Updated text formatting with bold emphasis for consistent branding. - Refined wording for VisionCoder's promotion details in Chinese, Japanese, and English README.

feat(docker): add cluster-specific docker-compose configuration for C…

7efc162

…LIProxyAPI

Add Codex Switch tool to README

7f68fa2

Added a new section for Codex Switch tool with details.

Update README.md

5ef7693

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Add reasoning effort to usage events

0de0ad0

Merge pull request router-for-me#3482 from 9ycrooked/patch-1

b9589e8

Add Codex Switch tool to README

feat(models): add Gemini 3.5 Flash models to registry

ea25949

- Registered new models: `gemini-3-flash-agent` and `gemini-3.5-flash-low` with detailed specifications. - Includes support for dynamic thinking levels and extended context capabilities.

feat(models): add Gemini 3.5 Flash to registry with enhanced thinking…

0ec07e5

… capabilities - Registered `gemini-3.5-flash` model with dynamic thinking levels and extended token limits. - Supports multiple generation methods, including cached and batch content creation.

fix(translator): skip empty text parts in Claude request conversion

1c632d1

- Updated `ConvertClaudeRequestToGemini` to ignore empty `text` entries during processing. - Added unit tests to ensure empty `text` parts are skipped correctly. Closes: router-for-me#3485

Merge pull request router-for-me#3484 from yavon007/main

f1ee883

Add reasoning_effort to usage event payloads

Merge pull request router-for-me#3254 from sususu98/fix/antigravity-p…

42e9605

…roject-id-onboard fix: require antigravity project id

Merge pull request router-for-me#3382 from sususu98/dev

8b9ecff

fix: scope antigravity credits fallback gate

Merge pull request router-for-me#3476 from sususu98/fix/codex-context…

48a1c88

…-length-stream-errors-dev fix codex context length stream errors

Merge pull request router-for-me#3477 from router-for-me/cluster

21fad9d

Add cluster-specific docker-compose configuration for CLIProxyAPI

fix(auth): update import paths to v7 for registry and executor

3c62a9a

Merge pull request router-for-me#3498 from router-for-me/test

cecd393

fix(auth): update import paths to v7 for registry and executor

fix(translator): handle system role as developer in Claude request co…

33f4904

…nversion - Updated `ConvertClaudeRequestToGemini` logic to treat `system` role as `developer`. - Added unit test case to validate the behavior. Closes: router-for-me#3510

feat(models): add Grok Build 0.1 to registry

aaec919

- Registered `grok-build-0.1` model with enhanced context length and agentic engineering support. - Supports dynamic thinking levels for improved software workflows.

github-actions Bot changed the base branch from main to dev May 25, 2026 09:10

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(api): model-group routing + expose group names in /v1/models#2

fix(api): model-group routing + expose group names in /v1/models#2
HeimaoLST wants to merge 242 commits into
devfrom
fix/model-group-context-key

HeimaoLST commented May 25, 2026

Uh oh!

github-actions Bot commented May 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

20 participants

Conversation

HeimaoLST commented May 25, 2026

1. keyConfigMiddleware was reading the wrong gin context key

2. Configured model-groups were invisible in /v1/models

Verification

Uh oh!

github-actions Bot commented May 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

20 participants

1. `keyConfigMiddleware` was reading the wrong gin context key

2. Configured model-groups were invisible in `/v1/models`