Skip to content

fix(api): model-group routing + expose group names in /v1/models#2

Open
HeimaoLST wants to merge 242 commits into
devfrom
fix/model-group-context-key
Open

fix(api): model-group routing + expose group names in /v1/models#2
HeimaoLST wants to merge 242 commits into
devfrom
fix/model-group-context-key

Conversation

@HeimaoLST
Copy link
Copy Markdown

Two related bugs in the model-group plumbing.

1. keyConfigMiddleware was reading the wrong gin context key

AuthMiddleware (internal/api/server.go:475) sets the authenticated principal under "userApiKey":

c.Set("userApiKey", result.Principal)

But keyConfigMiddleware (internal/api/server.go:1588) was reading "apiKey":

apiKeyRaw, exists := c.Get("apiKey")  // never exists

The middleware therefore early-returned on every request and never populated "apiKeyConfig" / "modelGroup" in the context. Downstream, ginKeyConfigs() in sdk/api/handlers/handlers.go always returned (nil, nil), modelgroup.IsGroupModel was always false, and group names like claude-failover fell through to the normal model lookup which fails with:

unknown provider for model claude-failover  (HTTP 502)

Fix: read "userApiKey" to match what AuthMiddleware sets.

2. Configured model-groups were invisible in /v1/models

The /v1/models endpoint only returned models registered in the global registry. Clients that pick a model from that listing (e.g. Claude Code) had no way to discover that a group name was a valid model identifier.

Fix: add serveModelsWithGroups() helper that appends each configured model-group as a virtual entry to the response:

{
  "id": "claude-failover",
  "object": "model",
  "owned_by": "model-group",
  "type": "model-group",
  "display_name": "claude-failover"
}

The unified models handler now routes through this helper for both the Claude and OpenAI branches so both client families see the groups.

Verification

End-to-end against a local build of this branch with config:

api-key-configs:
  - key: sk-test-key-001
    model-group: claude-failover
    allow-other-models: true
model-groups:
  - name: claude-failover
    models:
      - {model: claude-sonnet-4-6, priority: 3}
      - {model: claude-opus-4-7,   priority: 3}
      - {model: gpt-5.4,           priority: 2}
Endpoint Result
GET /v1/models claude-failover, doubao, openai listed alongside real models (type=model-group)
POST /v1/chat/completions {"model": "claude-failover"} 200, routed to claude-sonnet-4-6 (highest priority in the group)
POST /v1/messages {"model": "claude-failover", "stream": true} 200, SSE stream with model=claude-sonnet-4-6 in message_start

Before the fix: every request above returned 502 unknown provider for model claude-failover.

kslamph and others added 30 commits March 29, 2026 23:52
- Simplify project ID selection to always use the backend project ID returned by Gemini onboarding
- Update Gemini CLI version from 0.31.0 to 0.34.0
- Add 'terminal' to User-Agent string for better client identification

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ment

Both branches assign finalProjectID = responseProjectID, so move the
assignment outside the conditional and keep only the logging inside.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
After a Codex CLI compact, the client sends a full conversation
transcript (with compaction items or assistant messages) as input.
Previously, normalizeResponseSubsequentRequest() unconditionally
merged this with stale lastRequest/lastResponseOutput, breaking
function_call/function_call_output pairings and causing 400 errors
("No tool output found for function call").

Add inputContainsFullTranscript() heuristic that detects compaction
items (type=compaction/compaction_summary) or assistant messages in
the input array, and bypasses the merge when a full transcript is
present.

Fixes router-for-me#2207
Codex CLI gates the built-in image_generation tool behind
AuthMode::Chatgpt (OAuth only). When clients connect via API key
auth through CPA, the tool is absent from requests, making image
generation unavailable through the reverse proxy.

Changes:

1. Inject image_generation tool (codex_executor.go):
   Add ensureImageGenerationTool() that appends
   {"type":"image_generation","output_format":"png"} to the tools
   array if not already present. Applied to all three execution
   paths: Execute, executeCompact, and ExecuteStream.

2. Route aliases for Codex CLI direct access (server.go):
   Add /backend-api/codex/responses routes that map to the same
   OpenAI Responses API handlers as /v1/responses. This allows
   Codex CLI to connect via chatgpt_base_url config while keeping
   AuthMode::Chatgpt, which enables the built-in image_generation
   tool on the client side.

3. Unit tests (codex_executor_imagegen_test.go):
   Cover no-tools, existing tools, already-present, empty array,
   and mixed built-in tool scenarios.
Move credits handling from executor-level retry to conductor-level
orchestration. When all free-tier auths are exhausted (429/503), the
conductor discovers auths with available Google One AI credits and
retries with enabledCreditTypes injected via context flag.

Key changes:
- Add AntigravityCreditsHint system for tracking per-auth credits state
- Conductor tries credits fallback after all auths fail (Execute/Stream/Count)
- Executor injects enabledCreditTypes only when conductor sets context flag
- Credits fallback respects provider scope (requires antigravity in providers)
- Add context cancellation check in credits fallback to avoid wasted requests
- Remove executor-level attemptCreditsFallback and preferCredits machinery
- Restructure 429 decision logic (parse details first, keyword fallback)
- Expand shouldAbort to cover INVALID_ARGUMENT/FAILED_PRECONDITION/500+UNKNOWN
- Support human-readable retry delay parsing (e.g. "1h43m56s")
CountTokens upstream API does not support enabledCreditTypes, so
remove the dead credits fallback path from ExecuteCount and delete
the unused tryAntigravityCreditsExecuteCount method. Fix gofmt on
credits test file.
…ferred body on success

- findAllAntigravityCreditsCandidateAuths now filters by PinnedAuthMetadataKey
  to prevent credential isolation violations during credits fallback
- Release deferredBody reference on success path to avoid holding large
  payloads in memory for the lifetime of the gin context
…s-only logging

Remove deferred body optimization and maxErrorLog constants that were
unrelated to credits fallback. Keep only MarkCreditsUsed/CreditsUsed
helpers for flagging requests that consumed AI credits.
…auths as fallback candidates

Replace antigravityCreditsAvailableForModel with inline known/unknown
split. Auths whose credit hints are not yet populated are kept as
lower-priority candidates instead of being rejected, breaking the
chicken-and-egg deadlock at cold start.
…image-generation-tool-injection

feat(codex): inject image_generation tool + route aliases for Codex CLI image generation
- Included `/v1/images` in AI API path prefixes.
- Introduced tests to validate `/v1/images/generations` and `/v1/images/edits` as AI API paths.
…credits-fallback

feat(antigravity): conductor-level credits fallback for Claude models
Align GPT-5.5 Codex metadata with runtime cache
…ool injection

- Modified `ensureImageGenerationTool` to accept `baseModel` for conditional logic.
- Ensured `gpt-5.3-codex-spark` models bypass image_generation tool injection.
- Updated relevant tests and executor logic to reflect changes.
…redits-stream-fallback

fix(antigravity): trigger credits fallback for streaming
Normalize Codex context, thinking-signature, previous-response, and auth failures to explicit error codes: context_too_large, thinking_signature_invalid, previous_response_not_found, auth_unavailable.

Refs router-for-me#2596.
- Introduced `disallowFreeAuthFromMetadata` and `isFreeCodexAuth` to enforce skipping free-tier credentials.
- Modified scheduler logic to honor `DisallowFreeAuthMetadataKey` during auth selection.
- Updated `ensureImageGenerationTool` to skip tool injection for free-tier Codex auth.
- Added context utility `WithDisallowFreeAuth` and integrated with image handlers.
- Augmented relevant tests to cover free-tier exclusion scenarios.
sususu98 and others added 28 commits May 19, 2026 16:05
- Updated text formatting with bold emphasis for consistent branding.
- Refined wording for VisionCoder's promotion details in Chinese, Japanese, and English README.
…ion failover

- Introduced `homeRedisOperationTimeout` and `homeSubscriptionReceiveTimeout` constants for configurable timeouts.
- Enhanced Redis connection options with operation timeout settings and failover mechanisms.
- Implemented subscription failover logic on heartbeat timeouts to improve resilience.
- Updated message handling to support additional Redis event types, including Pong and Subscription.
Added a new section for Codex Switch tool with details.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
- Removed obsolete Redis protocol test cases and helper functions that were no longer relevant due to recent architecture changes.
- Streamlined remaining test files to align with updated Redis handling and connection management logic.
- Registered new models: `gemini-3-flash-agent` and `gemini-3.5-flash-low` with detailed specifications.
- Includes support for dynamic thinking levels and extended context capabilities.
- Added new reasoning levels: `none`, `minimal`, and `unsupported` to Codex model configurations.
- Introduced metadata sanitization and normalization for reasoning levels in API response.
- Extended unit tests to cover reasoning levels validation and metadata sanitation logic.
- Added new model `gemini-3.5-flash` to the registry with enhanced intelligence and speed capabilities.
- Supports extended thinking levels (`minimal`, `low`, `medium`, `high`) and dynamic adjustments.
- Expanded generation methods, including content creation and token counting.
… capabilities

- Registered `gemini-3.5-flash` model with dynamic thinking levels and extended token limits.
- Supports multiple generation methods, including cached and batch content creation.
- Updated `ConvertClaudeRequestToGemini` to ignore empty `text` entries during processing.
- Added unit tests to ensure empty `text` parts are skipped correctly.

Closes: router-for-me#3485
Add reasoning_effort to usage event payloads
…eue operations

- Added support for advanced RESP commands (`AUTH`, `SUBSCRIBE`, `RPOP`, `LPOP`) with extended functionality.
- Implemented queue operations for usage events via `RPOP` and `LPOP` commands.
- Introduced subscription handling with new Pub/Sub message features and error handling improvements.
- Updated Redis connection logic to enforce authentication requirements and validate inputs.
- Expanded related unit tests to cover new scenarios and edge cases.
…roject-id-onboard

fix: require antigravity project id
fix: scope antigravity credits fallback gate
…-length-stream-errors-dev

fix codex context length stream errors
Add cluster-specific docker-compose configuration for CLIProxyAPI
fix(auth): update import paths to v7 for registry and executor
…nversion

- Updated `ConvertClaudeRequestToGemini` logic to treat `system` role as `developer`.
- Added unit test case to validate the behavior.

Closes: router-for-me#3510
- Registered `grok-build-0.1` model with enhanced context length and agentic engineering support.
- Supports dynamic thinking levels for improved software workflows.
- Acknowledged APIKEY.FUN as a sponsor with details on their services and exclusive project-specific benefits.
- Updated Japanese (README_JA.md), Chinese (README_CN.md), and English (README.md) documentation.
- Added new sponsorship image (`assets/apikey.png`).
- Sync 238 upstream commits up to v7.1.20 (50d19e2)
- Module path bumped v6 -> v7
- Resolved 5 conflicts:
  * Dockerfile               -> keep ENTRYPOINT docker-entrypoint.sh
  * claude_executor.go       -> keep retry comment (upstream already adopted RefreshTokensWithRetry)
  * helps/usage_helpers.go   -> adopt upstream hasOpenAIStyleUsageTokenFields (more thorough than our null check)
  * sdk/api/handlers/handlers.go -> merge model-group routing + upstream image-handler refactor
  * sdk/cliproxy/service.go  -> keep warmup scheduler + adopt upstream home/redisqueue/diff imports
- Dropped dead import internal/usage (upstream removed legacy package; replaced by api_key_usage plugin)
- Tests: 50 pass, 1 pre-existing failure (TestParseDoubaoRetryAfter time-drift in main, unrelated to merge)
Klik fork — upstream removed the in-process usage tracker in commit 18bb9c3
(moved to Redis queue + external consumer). We restore the legacy endpoints
plus add a snapshot persistor so usage data survives restarts without needing
the heavyweight CPA-Manager pipeline.

- restore internal/usage/ from before 18bb9c3 (v6 module path bumped)
- add internal/usage/persistence.go: Redis snapshot Persistor
  - on startup: load prior snapshot
  - while running: flush dirty snapshot every 5s (configurable)
  - on shutdown: final flush
  - on redis failure: log error, continue in pure in-memory mode
- new config: cfg.UsagePersistence { addr, password, db, key, flush-interval-seconds }
- new handler endpoints (preserved upstream v6 surface):
    GET  /v0/management/usage
    GET  /v0/management/usage/export
    POST /v0/management/usage/import
- Management Handler.usageStats wired in via SetUsageStatistics()
- Service.startUsagePersistor() ties it all together at Run() time
Two related bugs in the model-group plumbing:

1. keyConfigMiddleware was reading the wrong gin context key
   ---------------------------------------------------------
   AuthMiddleware sets the authenticated API key under "userApiKey":

       c.Set("userApiKey", result.Principal)

   but keyConfigMiddleware in server.go:1588 was reading "apiKey":

       apiKeyRaw, exists := c.Get("apiKey")  // never exists

   The middleware therefore early-returned on every request and never
   populated "apiKeyConfig" / "modelGroup" in the context. Downstream,
   ginKeyConfigs() in sdk/api/handlers/handlers.go always returned
   (nil, nil), modelgroup.IsGroupModel was always false, and group
   names like "claude-failover" fell through to the normal model
   lookup which fails with "unknown provider for model claude-failover"
   (502).

   Fix: read "userApiKey" to match what AuthMiddleware sets.

2. Configured model-groups were invisible in /v1/models
   ----------------------------------------------------
   The /v1/models endpoint only returned models registered in the
   global registry. Clients (e.g. Claude Code) that pick a model from
   that listing had no way to discover that a group name was a valid
   model identifier.

   Fix: add serveModelsWithGroups() helper that appends each configured
   model-group as a virtual entry to the response:

       {
         "id": "claude-failover",
         "object": "model",
         "owned_by": "model-group",
         "type": "model-group",
         "display_name": "claude-failover"
       }

   The unified models handler now goes through this helper for both
   the Claude and OpenAI branches so both client families see the
   groups.

Verified end-to-end:
  - GET /v1/models  → 3 model-group entries listed alongside real models
  - POST /v1/chat/completions  {"model": "claude-failover"}  → 200,
    routed to claude-sonnet-4-6 (highest priority in the group)
  - POST /v1/messages  {"model": "claude-failover", "stream": true}
    → 200, SSE stream with model=claude-sonnet-4-6 in message_start
@github-actions github-actions Bot changed the base branch from main to dev May 25, 2026 09:10
@github-actions
Copy link
Copy Markdown

This pull request targeted main.

The base branch has been automatically changed to dev.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.