fix(api): model-group routing + expose group names in /v1/models#2
Open
HeimaoLST wants to merge 242 commits into
Open
fix(api): model-group routing + expose group names in /v1/models#2HeimaoLST wants to merge 242 commits into
HeimaoLST wants to merge 242 commits into
Conversation
- Simplify project ID selection to always use the backend project ID returned by Gemini onboarding - Update Gemini CLI version from 0.31.0 to 0.34.0 - Add 'terminal' to User-Agent string for better client identification Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ment Both branches assign finalProjectID = responseProjectID, so move the assignment outside the conditional and keep only the logging inside.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
After a Codex CLI compact, the client sends a full conversation
transcript (with compaction items or assistant messages) as input.
Previously, normalizeResponseSubsequentRequest() unconditionally
merged this with stale lastRequest/lastResponseOutput, breaking
function_call/function_call_output pairings and causing 400 errors
("No tool output found for function call").
Add inputContainsFullTranscript() heuristic that detects compaction
items (type=compaction/compaction_summary) or assistant messages in
the input array, and bypasses the merge when a full transcript is
present.
Fixes router-for-me#2207
Codex CLI gates the built-in image_generation tool behind
AuthMode::Chatgpt (OAuth only). When clients connect via API key
auth through CPA, the tool is absent from requests, making image
generation unavailable through the reverse proxy.
Changes:
1. Inject image_generation tool (codex_executor.go):
Add ensureImageGenerationTool() that appends
{"type":"image_generation","output_format":"png"} to the tools
array if not already present. Applied to all three execution
paths: Execute, executeCompact, and ExecuteStream.
2. Route aliases for Codex CLI direct access (server.go):
Add /backend-api/codex/responses routes that map to the same
OpenAI Responses API handlers as /v1/responses. This allows
Codex CLI to connect via chatgpt_base_url config while keeping
AuthMode::Chatgpt, which enables the built-in image_generation
tool on the client side.
3. Unit tests (codex_executor_imagegen_test.go):
Cover no-tools, existing tools, already-present, empty array,
and mixed built-in tool scenarios.
Move credits handling from executor-level retry to conductor-level orchestration. When all free-tier auths are exhausted (429/503), the conductor discovers auths with available Google One AI credits and retries with enabledCreditTypes injected via context flag. Key changes: - Add AntigravityCreditsHint system for tracking per-auth credits state - Conductor tries credits fallback after all auths fail (Execute/Stream/Count) - Executor injects enabledCreditTypes only when conductor sets context flag - Credits fallback respects provider scope (requires antigravity in providers) - Add context cancellation check in credits fallback to avoid wasted requests - Remove executor-level attemptCreditsFallback and preferCredits machinery - Restructure 429 decision logic (parse details first, keyword fallback) - Expand shouldAbort to cover INVALID_ARGUMENT/FAILED_PRECONDITION/500+UNKNOWN - Support human-readable retry delay parsing (e.g. "1h43m56s")
CountTokens upstream API does not support enabledCreditTypes, so remove the dead credits fallback path from ExecuteCount and delete the unused tryAntigravityCreditsExecuteCount method. Fix gofmt on credits test file.
… read X-Amp-Thread-Id
…ferred body on success - findAllAntigravityCreditsCandidateAuths now filters by PinnedAuthMetadataKey to prevent credential isolation violations during credits fallback - Release deferredBody reference on success path to avoid holding large payloads in memory for the lifetime of the gin context
…s-only logging Remove deferred body optimization and maxErrorLog constants that were unrelated to credits fallback. Keep only MarkCreditsUsed/CreditsUsed helpers for flagging requests that consumed AI credits.
…auths as fallback candidates Replace antigravityCreditsAvailableForModel with inline known/unknown split. Auths whose credit hints are not yet populated are kept as lower-priority candidates instead of being rejected, breaking the chicken-and-egg deadlock at cold start.
…image-generation-tool-injection feat(codex): inject image_generation tool + route aliases for Codex CLI image generation
- Included `/v1/images` in AI API path prefixes. - Introduced tests to validate `/v1/images/generations` and `/v1/images/edits` as AI API paths.
…credits-fallback feat(antigravity): conductor-level credits fallback for Claude models
Align GPT-5.5 Codex metadata with runtime cache
…ool injection - Modified `ensureImageGenerationTool` to accept `baseModel` for conditional logic. - Ensured `gpt-5.3-codex-spark` models bypass image_generation tool injection. - Updated relevant tests and executor logic to reflect changes.
…redits-stream-fallback fix(antigravity): trigger credits fallback for streaming
Normalize Codex context, thinking-signature, previous-response, and auth failures to explicit error codes: context_too_large, thinking_signature_invalid, previous_response_not_found, auth_unavailable. Refs router-for-me#2596.
- Introduced `disallowFreeAuthFromMetadata` and `isFreeCodexAuth` to enforce skipping free-tier credentials. - Modified scheduler logic to honor `DisallowFreeAuthMetadataKey` during auth selection. - Updated `ensureImageGenerationTool` to skip tool injection for free-tier Codex auth. - Added context utility `WithDisallowFreeAuth` and integrated with image handlers. - Augmented relevant tests to cover free-tier exclusion scenarios.
- Updated text formatting with bold emphasis for consistent branding. - Refined wording for VisionCoder's promotion details in Chinese, Japanese, and English README.
…ion failover - Introduced `homeRedisOperationTimeout` and `homeSubscriptionReceiveTimeout` constants for configurable timeouts. - Enhanced Redis connection options with operation timeout settings and failover mechanisms. - Implemented subscription failover logic on heartbeat timeouts to improve resilience. - Updated message handling to support additional Redis event types, including Pong and Subscription.
Added a new section for Codex Switch tool with details.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Add Codex Switch tool to README
- Removed obsolete Redis protocol test cases and helper functions that were no longer relevant due to recent architecture changes. - Streamlined remaining test files to align with updated Redis handling and connection management logic.
- Registered new models: `gemini-3-flash-agent` and `gemini-3.5-flash-low` with detailed specifications. - Includes support for dynamic thinking levels and extended context capabilities.
- Added new reasoning levels: `none`, `minimal`, and `unsupported` to Codex model configurations. - Introduced metadata sanitization and normalization for reasoning levels in API response. - Extended unit tests to cover reasoning levels validation and metadata sanitation logic.
- Added new model `gemini-3.5-flash` to the registry with enhanced intelligence and speed capabilities. - Supports extended thinking levels (`minimal`, `low`, `medium`, `high`) and dynamic adjustments. - Expanded generation methods, including content creation and token counting.
… capabilities - Registered `gemini-3.5-flash` model with dynamic thinking levels and extended token limits. - Supports multiple generation methods, including cached and batch content creation.
- Updated `ConvertClaudeRequestToGemini` to ignore empty `text` entries during processing. - Added unit tests to ensure empty `text` parts are skipped correctly. Closes: router-for-me#3485
Add reasoning_effort to usage event payloads
…eue operations - Added support for advanced RESP commands (`AUTH`, `SUBSCRIBE`, `RPOP`, `LPOP`) with extended functionality. - Implemented queue operations for usage events via `RPOP` and `LPOP` commands. - Introduced subscription handling with new Pub/Sub message features and error handling improvements. - Updated Redis connection logic to enforce authentication requirements and validate inputs. - Expanded related unit tests to cover new scenarios and edge cases.
…roject-id-onboard fix: require antigravity project id
fix: scope antigravity credits fallback gate
…-length-stream-errors-dev fix codex context length stream errors
Add cluster-specific docker-compose configuration for CLIProxyAPI
fix(auth): update import paths to v7 for registry and executor
…nversion - Updated `ConvertClaudeRequestToGemini` logic to treat `system` role as `developer`. - Added unit test case to validate the behavior. Closes: router-for-me#3510
- Registered `grok-build-0.1` model with enhanced context length and agentic engineering support. - Supports dynamic thinking levels for improved software workflows.
- Acknowledged APIKEY.FUN as a sponsor with details on their services and exclusive project-specific benefits. - Updated Japanese (README_JA.md), Chinese (README_CN.md), and English (README.md) documentation. - Added new sponsorship image (`assets/apikey.png`).
- Sync 238 upstream commits up to v7.1.20 (50d19e2) - Module path bumped v6 -> v7 - Resolved 5 conflicts: * Dockerfile -> keep ENTRYPOINT docker-entrypoint.sh * claude_executor.go -> keep retry comment (upstream already adopted RefreshTokensWithRetry) * helps/usage_helpers.go -> adopt upstream hasOpenAIStyleUsageTokenFields (more thorough than our null check) * sdk/api/handlers/handlers.go -> merge model-group routing + upstream image-handler refactor * sdk/cliproxy/service.go -> keep warmup scheduler + adopt upstream home/redisqueue/diff imports - Dropped dead import internal/usage (upstream removed legacy package; replaced by api_key_usage plugin) - Tests: 50 pass, 1 pre-existing failure (TestParseDoubaoRetryAfter time-drift in main, unrelated to merge)
Klik fork — upstream removed the in-process usage tracker in commit 18bb9c3 (moved to Redis queue + external consumer). We restore the legacy endpoints plus add a snapshot persistor so usage data survives restarts without needing the heavyweight CPA-Manager pipeline. - restore internal/usage/ from before 18bb9c3 (v6 module path bumped) - add internal/usage/persistence.go: Redis snapshot Persistor - on startup: load prior snapshot - while running: flush dirty snapshot every 5s (configurable) - on shutdown: final flush - on redis failure: log error, continue in pure in-memory mode - new config: cfg.UsagePersistence { addr, password, db, key, flush-interval-seconds } - new handler endpoints (preserved upstream v6 surface): GET /v0/management/usage GET /v0/management/usage/export POST /v0/management/usage/import - Management Handler.usageStats wired in via SetUsageStatistics() - Service.startUsagePersistor() ties it all together at Run() time
Two related bugs in the model-group plumbing:
1. keyConfigMiddleware was reading the wrong gin context key
---------------------------------------------------------
AuthMiddleware sets the authenticated API key under "userApiKey":
c.Set("userApiKey", result.Principal)
but keyConfigMiddleware in server.go:1588 was reading "apiKey":
apiKeyRaw, exists := c.Get("apiKey") // never exists
The middleware therefore early-returned on every request and never
populated "apiKeyConfig" / "modelGroup" in the context. Downstream,
ginKeyConfigs() in sdk/api/handlers/handlers.go always returned
(nil, nil), modelgroup.IsGroupModel was always false, and group
names like "claude-failover" fell through to the normal model
lookup which fails with "unknown provider for model claude-failover"
(502).
Fix: read "userApiKey" to match what AuthMiddleware sets.
2. Configured model-groups were invisible in /v1/models
----------------------------------------------------
The /v1/models endpoint only returned models registered in the
global registry. Clients (e.g. Claude Code) that pick a model from
that listing had no way to discover that a group name was a valid
model identifier.
Fix: add serveModelsWithGroups() helper that appends each configured
model-group as a virtual entry to the response:
{
"id": "claude-failover",
"object": "model",
"owned_by": "model-group",
"type": "model-group",
"display_name": "claude-failover"
}
The unified models handler now goes through this helper for both
the Claude and OpenAI branches so both client families see the
groups.
Verified end-to-end:
- GET /v1/models → 3 model-group entries listed alongside real models
- POST /v1/chat/completions {"model": "claude-failover"} → 200,
routed to claude-sonnet-4-6 (highest priority in the group)
- POST /v1/messages {"model": "claude-failover", "stream": true}
→ 200, SSE stream with model=claude-sonnet-4-6 in message_start
|
This pull request targeted The base branch has been automatically changed to |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Two related bugs in the model-group plumbing.
1.
keyConfigMiddlewarewas reading the wrong gin context keyAuthMiddleware(internal/api/server.go:475) sets the authenticated principal under"userApiKey":But
keyConfigMiddleware(internal/api/server.go:1588) was reading"apiKey":The middleware therefore early-returned on every request and never populated
"apiKeyConfig"/"modelGroup"in the context. Downstream,ginKeyConfigs()insdk/api/handlers/handlers.goalways returned(nil, nil),modelgroup.IsGroupModelwas alwaysfalse, and group names likeclaude-failoverfell through to the normal model lookup which fails with:Fix: read
"userApiKey"to match whatAuthMiddlewaresets.2. Configured model-groups were invisible in
/v1/modelsThe
/v1/modelsendpoint only returned models registered in the global registry. Clients that pick a model from that listing (e.g. Claude Code) had no way to discover that a group name was a valid model identifier.Fix: add
serveModelsWithGroups()helper that appends each configured model-group as a virtual entry to the response:{ "id": "claude-failover", "object": "model", "owned_by": "model-group", "type": "model-group", "display_name": "claude-failover" }The unified models handler now routes through this helper for both the Claude and OpenAI branches so both client families see the groups.
Verification
End-to-end against a local build of this branch with config:
GET /v1/modelsclaude-failover,doubao,openailisted alongside real models (type=model-group)POST /v1/chat/completions {"model": "claude-failover"}claude-sonnet-4-6(highest priority in the group)POST /v1/messages {"model": "claude-failover", "stream": true}model=claude-sonnet-4-6inmessage_startBefore the fix: every request above returned
502 unknown provider for model claude-failover.