fix(proxy): prefer budget-safe routing and support image-generation compatibility ("code":"invalid_request_error","param":"tools")#421
Conversation
There was a problem hiding this comment.
Pull request overview
This PR updates the proxy’s load-balancing and sticky-session routing to proactively avoid routing Responses traffic to accounts that are already above a configured budget threshold, and to rebind durable codex_session affinity when the pinned account becomes budget-pressured and a healthier option exists.
Changes:
- Adjust load balancer selection to prefer below-threshold (“budget-safe”) accounts when any exist.
- Extend sticky-session reallocation logic to include durable
codex_sessionmappings under budget pressure. - Add unit + integration regressions and OpenSpec deltas documenting the new routing behavior.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
app/modules/proxy/load_balancer.py |
Implements budget-safe preference wrapper around account selection and enables durable codex_session rebind under budget pressure. |
tests/unit/test_proxy_load_balancer_refresh.py |
Adds unit coverage proving a budget-safe account is preferred over a pressured one. |
tests/integration/test_proxy_sticky_sessions.py |
Adds integration coverage for session_id-pinned Codex routing reallocation once the pinned account crosses the budget threshold. |
openspec/changes/reallocate-codex-session-budget-pressure/tasks.md |
Tracks implementation + verification tasks for the change set. |
openspec/changes/reallocate-codex-session-budget-pressure/specs/sticky-session-operations/spec.md |
Specifies durable codex_session rebind behavior under budget pressure. |
openspec/changes/reallocate-codex-session-budget-pressure/specs/responses-api-compat/spec.md |
Specifies budget-safe preference behavior for Responses routes. |
openspec/changes/reallocate-codex-session-budget-pressure/proposal.md |
Documents motivation, changes, and expected impact. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 28 out of 28 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
Works for me |
|
i am getting this error too |
Works for me as well |
Soju06
left a comment
There was a problem hiding this comment.
PR #421 Code Review
Verdict
APPROVE — merge now. Well-targeted surgical fix with strong test coverage, all 18 CI checks green, and multiple independent community confirmations.
Summary of changes (what this actually does)
- Budget-safe routing — new
_select_account_preferring_budget_safe()wrapper filters out accounts whereprimary_used_percent > thresholdORsecondary_used_percent > thresholdbefore calling the existingselect_account(); falls back gracefully if no safe account exists. Wired into 3 call sites (main dispatch, sticky fallback, non-sticky path). - Sticky reallocation expanded — previously only
PROMPT_CACHEsticky sessions would rebind off a budget-pressured account; nowCODEX_SESSIONdoes too. This is the main fix forusage_limit_reachedrepeating despite healthy accounts available. - Built-in tools passthrough —
ResponsesRequest/V1ResponsesRequestnowvalidate_tool_types(allow_builtin_tools=True), while_strip_compact_unsupported_fields()stripstools/tool_choice/parallel_tool_callsfrom compact path. Matches upstream contract asymmetry. - Image-generation → HTTP transport —
_resolve_stream_transport()now auto-prefers HTTP whentools[].type == "image_generation"is present (only inautomode; explicitwebsocketconfig still honored). max_sse_event_bytesdefault — raised 2 MiB → 16 MiB to match common upstream WS frame ceiling, so large built-in tool payloads don't 1009 beforeresponse.completed.
Correctness findings
Blocking
- None.
Non-blocking
- transport escape hatch: if an operator has
upstream_stream_transport=websocketexplicitly set, image_generation requests still go over WS. Mitigated by the 16 MiB raise, but worth a doc note. Intentional per PR body. _state_above_budget_thresholdusesany(...)across(used_percent, secondary_used_percent). Both beingNonereturnsFalse→ treated as safe. Correct for the intended "only penalize known-pressured" semantic, but means missing telemetry = optimistic. Acceptable.- memory pressure: 8× increase in per-event buffer ceiling. Per-event, streamed, not pooled — low impact, but worth mentioning in release notes.
Nits
preferred_states and len(preferred_states) != len(state_list)is a micro-opt that also gates the re-entry; fine but could beif 0 < len(preferred_states) < len(state_list):for readability. Not worth a change.- Naming:
budget_exhausted→budget_pressuredrename inload_balancer.pyis the right call (matches> thresholdsemantic, not literal exhaustion).
Test coverage assessment
Solid. Each of the 4 claimed behaviors has direct tests:
test_proxy_sticky_sessions.py— 5 new/updated scenarios includingcodex_session_reallocates_when_pinned_budget_exhaustedandsticky_switches_when_pinned_rate_limitedtest_proxy_load_balancer_refresh.py::test_*prefers_budget_safe_account_when_any_existtest_openai_requests.py—builtin_tool,compact_strips_tool_fieldscasestest_proxy_utils.py—stream_responses_auto_transport_prefers_http_for_image_generation_tool+ explicit-override-respected counterparttest_proxy_compact.py— compact path stripping verified
openspec change docs (4 proposals) provide spec-level traceability — nice.
Existing Copilot PR review has already run twice (no new comments on the latest pass).
Recommendation for merge
Merge now, squash. 30+ reactions on #426, multiple "works for me" confirmations, CI green. Blocking further would just accrue more duplicate issue reports.
After merge:
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0db3341556
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| def _state_above_budget_threshold(state: AccountState, budget_threshold_pct: float) -> bool: | ||
| return any( | ||
| used_percent is not None and used_percent > budget_threshold_pct | ||
| for used_percent in (state.used_percent, state.secondary_used_percent) |
There was a problem hiding this comment.
Restrict budget-pressure gating to primary usage
_state_above_budget_threshold now marks an account as pressured when either used_percent or secondary_used_percent exceeds the threshold, but usage_weighted selection still prioritizes lower secondary usage first. That combination can route back to a near-exhausted primary account (e.g., A: primary 99/secondary 5 vs B: primary 10/secondary 96), because both get classified as pressured and the budget-safe filter is bypassed before normal ranking picks A. This can recreate the same usage_limit_reached behavior the change is meant to prevent.
Useful? React with 👍 / 👎.
|
Good catch from @chatgpt-codex-connector — the P1 flagged on When both are Reproductiondef usage_sort_key(primary, secondary, aid):
sec = secondary if secondary is not None else primary
return (sec, primary, 0.0, aid)
A = usage_sort_key(99, 5, "A") # (5, 99, ...)
B = usage_sort_key(10, 96, "B") # (96, 10, ...)
min(A, B) == A # picks near-exhausted primaryImpact
Suggested follow-up (not blocking this PR)Narrow def _state_above_budget_threshold(state, threshold):
return state.used_percent is not None and state.used_percent > thresholdOr keep the Given the P1 is a narrow edge case on the degraded-pool fallback (not the main path), I'd still recommend merging as-is and handling this in a follow-up so the broader sticky + tools fix ships for the 30+ users waiting on #426. |
|
To use Codex here, create an environment for this repo. |
Summary
codex_sessionaffinity once the pinned account crosses the configured budget threshold and a healthier candidate exists/backend-api/codex/responsesand/v1/responsesroutes while stripping tool-only fields from compact requestsimage_generationrequestsWhy
Repeated
usage_limit_reachedfailures were still occurring even while other accounts had headroom.Live DB inspection showed the failures were being routed to accounts whose local primary usage at request time was already
97-100%. The proxy was still allowing those near-exhaustedACTIVEaccounts to win fresh selection, and durable backendcodex_sessionaffinity only reallocated under narrower conditions.Separately, recent Codex Desktop builds now send newer built-in Responses tools such as
image_generation. That exposed proxy-side compatibility issues including this concrete failure on full Responses requests:{"type":"error","status":400,"error":{"message":"Invalid request payload","type":"invalid_request_error","code":"invalid_request_error","param":"tools"}}The underlying issues were:
gpt-5.4could still carry large inlineimage_generationoutputs over upstream websocket, which made the proxy hit a local frame ceiling and fail beforeresponse.completedVerification
uv run pytest tests/integration/test_proxy_sticky_sessions.py -q -k "test_proxy_stream_sticky_threads_reallocate_by_prompt_cache_key or test_proxy_codex_session_id_pins_responses_and_compact_without_sticky_threads or test_proxy_codex_session_id_reallocates_when_pinned_budget_exhausted or test_proxy_codex_session_id_compact_first_pins_followup_stream_without_sticky_threads or test_proxy_sticky_switches_when_pinned_rate_limited"uv run pytest tests/unit/test_proxy_load_balancer_refresh.py -q -k "prefers_budget_safe_account_when_any_exist"uv run pytest -q tests/unit/test_openai_requests.py tests/integration/test_openai_compat_features.py tests/integration/test_proxy_compact.py -k "builtin_tool or builtin_tools or compact_strips_tool_fields or responses_accepts_builtin_tool_passthrough or v1_responses_forwards_builtin_tools or v1_chat_completions_rejects_builtin_tools or responses_accepts_builtin_tools"uv run pytest -q tests/unit/test_settings_multi_replica.py tests/unit/test_proxy_websocket_client.py -k "multi_replica_defaults or connect_responses_websocket_uses_websockets_transport"uv run pytest tests/unit/test_proxy_utils.py -q -k "resolve_stream_transport_prefers_http_for_image_generation_even_with_native_codex_headers or resolve_stream_transport_keeps_explicit_websocket_override_for_image_generation or stream_responses_auto_transport_prefers_http_for_image_generation_tool or stream_responses_auto_transport_uses_model_preference or stream_responses_auto_transport_uses_bootstrap_model_preference_when_registry_unloaded"uvx ruff check app/modules/proxy/load_balancer.py tests/integration/test_proxy_sticky_sessions.py tests/unit/test_proxy_load_balancer_refresh.py app/core/openai/requests.py app/core/openai/v1_requests.py app/core/config/settings.py app/core/clients/proxy.py tests/unit/test_openai_requests.py tests/integration/test_openai_compat_features.py tests/integration/test_proxy_compact.py tests/unit/test_settings_multi_replica.py tests/unit/test_proxy_websocket_client.py tests/unit/test_proxy_utils.py openspec/changes/reallocate-codex-session-budget-pressure openspec/changes/support-responses-builtin-tools openspec/changes/raise-upstream-event-size-limit openspec/changes/route-image-generation-over-httpuvx ruff format --check app/modules/proxy/load_balancer.py tests/integration/test_proxy_sticky_sessions.py tests/unit/test_proxy_load_balancer_refresh.py app/core/openai/requests.py app/core/openai/v1_requests.py app/core/config/settings.py app/core/clients/proxy.py tests/unit/test_openai_requests.py tests/integration/test_openai_compat_features.py tests/integration/test_proxy_compact.py tests/unit/test_settings_multi_replica.py tests/unit/test_proxy_websocket_client.py tests/unit/test_proxy_utils.py openspec/changes/reallocate-codex-session-budget-pressure openspec/changes/support-responses-builtin-tools openspec/changes/raise-upstream-event-size-limit openspec/changes/route-image-generation-over-httpopenspec validate --specs