[codex] add provider fallback routing and credential pools #367

Draft
furukama wants to merge 1 commit into main from codex/provider-fallback-pools

@furukama
Contributor

What changed

This adds Hermes-style primary model routing for the main session model path.

  • add runtime config for ordered primary-model fallbacks and adaptive context-tier downgrade on 429
  • resolve an ordered route plan on the host, including per-provider credential pools discovered from env/runtime secrets
  • send the full routing plan into the worker/container, while redacting pooled secrets from IPC files on disk
  • add container-side recovery logic for least-used credential rotation, 401/402 credential failover, provider/model failover, and context-tier downgrade on 429
  • include primary-model routing in worker signatures so persistent workers/containers restart when the fallback chain or pool changes
  • add targeted tests for pool discovery, routing-plan assembly, IPC redaction, worker-signature invalidation, and container recovery behavior
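As a rough illustration of the routing-plan and IPC-redaction bullets above, here is a minimal sketch. Every type, field, and function name in it (`RoutingPlan`, `RouteStep`, `PooledCredential`, `redactPlanForDisk`) is hypothetical and not taken from the PR's code; it only shows the idea of keeping usable secrets in memory while writing a redacted copy to disk.

```typescript
// Hypothetical shapes; the PR's real type and field names are not shown here.
interface PooledCredential {
  id: string; // stable label, safe to write to disk
  apiKey: string; // secret, must not reach IPC files
}

interface RouteStep {
  provider: string;
  model: string;
  credentials: PooledCredential[];
}

interface RoutingPlan {
  steps: RouteStep[]; // ordered: index 0 is the primary route
}

const REDACTED = "[redacted]";

// Returns a copy of the plan with every pooled secret replaced.
// The input plan is left untouched so it can still be used in memory.
function redactPlanForDisk(plan: RoutingPlan): RoutingPlan {
  return {
    steps: plan.steps.map((step) => ({
      ...step,
      credentials: step.credentials.map((c) => ({ ...c, apiKey: REDACTED })),
    })),
  };
}

// Example plan with placeholder provider names and fake keys.
const examplePlan: RoutingPlan = {
  steps: [
    {
      provider: "provider-a",
      model: "model-large",
      credentials: [
        { id: "a-1", apiKey: "sk-example-a1" },
        { id: "a-2", apiKey: "sk-example-a2" },
      ],
    },
    {
      provider: "provider-b",
      model: "model-fallback",
      credentials: [{ id: "b-1", apiKey: "sk-example-b1" }],
    },
  ],
};

// What would be serialized into an IPC file in this sketch.
const exampleOnDisk: string = JSON.stringify(redactPlanForDisk(examplePlan));
```

In this sketch the redacted copy keeps route order and credential ids, so a reader of the file can still see the shape of the fallback chain without seeing any key.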

Why

HybridClaw auto-discovers model providers today, but it still executes the primary model path as a single resolved provider credential. That means a transient provider outage, an exhausted key, or a long-context 429 can fail the whole turn even when alternate providers or keys are already configured.

Impact

Users with multiple provider credentials or fallback models configured can now recover from more provider-side failures automatically without manually switching models.

  • on 401 and similar auth failures, recovery can rotate to another key in the same provider pool
  • on 402/quota-style failures, recovery can move off the exhausted credential
  • on 429 responses, recovery can first downgrade the active context tier, then rotate credentials, then fail over across configured fallback models/providers
  • persistent workers pick up routing/pool changes deterministically
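The recovery order in the list above can be sketched as a small decision function. This is an illustration under assumptions, not the PR's implementation: the `Action` and `RecoveryState` names are hypothetical, moving to the next route once a pool is exhausted on 401/402 is an assumed reading of "credential failover", and treating every other status as a plain failure is a simplification.

```typescript
// Hypothetical action names for illustration only.
type Action = "downgrade-context" | "rotate-credential" | "next-route" | "fail";

interface RecoveryState {
  canDowngradeContext: boolean; // a smaller context tier is still available
  hasSpareCredential: boolean; // another usable key exists in this provider's pool
  hasNextRoute: boolean; // another fallback model/provider remains in the plan
}

// Maps an HTTP status and the current recovery state to the next step.
function nextAction(status: number, s: RecoveryState): Action {
  // 429: try the cheapest recovery first, a smaller context tier.
  if (status === 429 && s.canDowngradeContext) return "downgrade-context";

  // 401 (auth), 402 (quota), or a 429 that cannot downgrade further:
  // prefer another key in the same pool, then the next configured route.
  if (status === 401 || status === 402 || status === 429) {
    if (s.hasSpareCredential) return "rotate-credential";
    if (s.hasNextRoute) return "next-route"; // assumption for 401/402
  }

  // Assumption: anything else is surfaced as a failure in this sketch.
  return "fail";
}
```

For example, a 429 with every option still open yields `"downgrade-context"`, while the same 429 after the tier is already at its minimum yields `"rotate-credential"`.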

Root cause

The main session model execution path only resolved one provider credential up front and retried that same route. It had no ordered fallback chain, no pooled credential rotation, and no adaptive downgrade for long-context rate-limit scenarios.
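The pooled rotation that was missing here can be illustrated with a least-used selection over a credential pool. The sketch below assumes a simple in-memory counter per key and a `disabled` flag set after an auth or quota failure; `PoolEntry`, `pickLeastUsed`, and `acquire` are hypothetical names, and the PR's actual bookkeeping may differ.

```typescript
// Hypothetical pool entry; only the id is stored here, not the secret.
interface PoolEntry {
  id: string;
  uses: number; // how many requests this credential has served
  disabled: boolean; // set after a 401/402-style failure
}

// Returns the enabled entry with the fewest uses, or undefined if none remain.
// Ties are resolved in favor of the earliest entry in the pool.
function pickLeastUsed(pool: PoolEntry[]): PoolEntry | undefined {
  let best: PoolEntry | undefined;
  for (const entry of pool) {
    if (entry.disabled) continue;
    if (best === undefined || entry.uses < best.uses) best = entry;
  }
  return best;
}

// Picks a credential and records the use so later picks spread the load.
function acquire(pool: PoolEntry[]): PoolEntry | undefined {
  const entry = pickLeastUsed(pool);
  if (entry !== undefined) entry.uses += 1;
  return entry;
}
```

When `acquire` returns `undefined`, the pool is exhausted, which is the point where a caller would move on to the next configured fallback model or provider.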

Validation

Passed:

  • npm run typecheck
  • npm run lint
  • npm run format
  • npm exec vitest -- run tests/provider-api-key-utils.test.ts tests/providers.model-routing.test.ts tests/container.model-routing-state.test.ts tests/ipc.test.ts tests/worker-signature.test.ts

Blocked in this desktop environment:

  • broader runner/provider suites currently hit a pre-existing better-sqlite3 native binary mismatch (NODE_MODULE_VERSION 127 vs 141), so they could not be used as final validation here
