Skip to content

fix(proxy): prefer budget-safe routing and support image-generation compatibility ("code":"invalid_request_error","param":"tools")#421

Merged
Soju06 merged 4 commits intoSoju06:mainfrom
mhughdo:hung/reallocate-budget-safe-routing
Apr 21, 2026
Merged

fix(proxy): prefer budget-safe routing and support image-generation compatibility ("code":"invalid_request_error","param":"tools")#421
Soju06 merged 4 commits intoSoju06:mainfrom
mhughdo:hung/reallocate-budget-safe-routing

Conversation

@mhughdo
Copy link
Copy Markdown
Contributor

@mhughdo mhughdo commented Apr 16, 2026

Summary

  • reallocate durable backend codex_session affinity once the pinned account crosses the configured budget threshold and a healthier candidate exists
  • prefer budget-safe Responses routing candidates over already-pressured candidates whenever any safe candidate exists
  • allow built-in Responses tools on full /backend-api/codex/responses and /v1/responses routes while stripping tool-only fields from compact requests
  • raise the default upstream Responses event/message limit to 16 MiB and make auto transport prefer upstream HTTP for image_generation requests

Why

Repeated usage_limit_reached failures were still occurring even while other accounts had headroom.

Live DB inspection showed the failures were being routed to accounts whose local primary usage at request time was already 97-100%. The proxy was still allowing those near-exhausted ACTIVE accounts to win fresh selection, and durable backend codex_session affinity only reallocated under narrower conditions.

Separately, recent Codex Desktop builds now send newer built-in Responses tools such as image_generation. That exposed proxy-side compatibility issues including this concrete failure on full Responses requests:

{"type":"error","status":400,"error":{"message":"Invalid request payload","type":"invalid_request_error","code":"invalid_request_error","param":"tools"}}

The underlying issues were:

  • full Responses and compact requests did not sanitize tool payloads correctly for the different upstream contracts
  • websocket-preferred models such as gpt-5.4 could still carry large inline image_generation outputs over upstream websocket, which made the proxy hit a local frame ceiling and fail before response.completed

Verification

  • uv run pytest tests/integration/test_proxy_sticky_sessions.py -q -k "test_proxy_stream_sticky_threads_reallocate_by_prompt_cache_key or test_proxy_codex_session_id_pins_responses_and_compact_without_sticky_threads or test_proxy_codex_session_id_reallocates_when_pinned_budget_exhausted or test_proxy_codex_session_id_compact_first_pins_followup_stream_without_sticky_threads or test_proxy_sticky_switches_when_pinned_rate_limited"
  • uv run pytest tests/unit/test_proxy_load_balancer_refresh.py -q -k "prefers_budget_safe_account_when_any_exist"
  • uv run pytest -q tests/unit/test_openai_requests.py tests/integration/test_openai_compat_features.py tests/integration/test_proxy_compact.py -k "builtin_tool or builtin_tools or compact_strips_tool_fields or responses_accepts_builtin_tool_passthrough or v1_responses_forwards_builtin_tools or v1_chat_completions_rejects_builtin_tools or responses_accepts_builtin_tools"
  • uv run pytest -q tests/unit/test_settings_multi_replica.py tests/unit/test_proxy_websocket_client.py -k "multi_replica_defaults or connect_responses_websocket_uses_websockets_transport"
  • uv run pytest tests/unit/test_proxy_utils.py -q -k "resolve_stream_transport_prefers_http_for_image_generation_even_with_native_codex_headers or resolve_stream_transport_keeps_explicit_websocket_override_for_image_generation or stream_responses_auto_transport_prefers_http_for_image_generation_tool or stream_responses_auto_transport_uses_model_preference or stream_responses_auto_transport_uses_bootstrap_model_preference_when_registry_unloaded"
  • uvx ruff check app/modules/proxy/load_balancer.py tests/integration/test_proxy_sticky_sessions.py tests/unit/test_proxy_load_balancer_refresh.py app/core/openai/requests.py app/core/openai/v1_requests.py app/core/config/settings.py app/core/clients/proxy.py tests/unit/test_openai_requests.py tests/integration/test_openai_compat_features.py tests/integration/test_proxy_compact.py tests/unit/test_settings_multi_replica.py tests/unit/test_proxy_websocket_client.py tests/unit/test_proxy_utils.py openspec/changes/reallocate-codex-session-budget-pressure openspec/changes/support-responses-builtin-tools openspec/changes/raise-upstream-event-size-limit openspec/changes/route-image-generation-over-http
  • uvx ruff format --check app/modules/proxy/load_balancer.py tests/integration/test_proxy_sticky_sessions.py tests/unit/test_proxy_load_balancer_refresh.py app/core/openai/requests.py app/core/openai/v1_requests.py app/core/config/settings.py app/core/clients/proxy.py tests/unit/test_openai_requests.py tests/integration/test_openai_compat_features.py tests/integration/test_proxy_compact.py tests/unit/test_settings_multi_replica.py tests/unit/test_proxy_websocket_client.py tests/unit/test_proxy_utils.py openspec/changes/reallocate-codex-session-budget-pressure openspec/changes/support-responses-builtin-tools openspec/changes/raise-upstream-event-size-limit openspec/changes/route-image-generation-over-http
  • openspec validate --specs

Copilot AI review requested due to automatic review settings April 16, 2026 14:12
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the proxy’s load-balancing and sticky-session routing to proactively avoid routing Responses traffic to accounts that are already above a configured budget threshold, and to rebind durable codex_session affinity when the pinned account becomes budget-pressured and a healthier option exists.

Changes:

  • Adjust load balancer selection to prefer below-threshold (“budget-safe”) accounts when any exist.
  • Extend sticky-session reallocation logic to include durable codex_session mappings under budget pressure.
  • Add unit + integration regressions and OpenSpec deltas documenting the new routing behavior.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
app/modules/proxy/load_balancer.py Implements budget-safe preference wrapper around account selection and enables durable codex_session rebind under budget pressure.
tests/unit/test_proxy_load_balancer_refresh.py Adds unit coverage proving a budget-safe account is preferred over a pressured one.
tests/integration/test_proxy_sticky_sessions.py Adds integration coverage for session_id-pinned Codex routing reallocation once the pinned account crosses the budget threshold.
openspec/changes/reallocate-codex-session-budget-pressure/tasks.md Tracks implementation + verification tasks for the change set.
openspec/changes/reallocate-codex-session-budget-pressure/specs/sticky-session-operations/spec.md Specifies durable codex_session rebind behavior under budget pressure.
openspec/changes/reallocate-codex-session-budget-pressure/specs/responses-api-compat/spec.md Specifies budget-safe preference behavior for Responses routes.
openspec/changes/reallocate-codex-session-budget-pressure/proposal.md Documents motivation, changes, and expected impact.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread app/modules/proxy/load_balancer.py Outdated
Comment thread app/modules/proxy/load_balancer.py
@mhughdo mhughdo changed the title fix(proxy): prefer budget-safe responses routing fix(proxy): prefer budget-safe routing and support image-generation compatibility Apr 17, 2026
@mhughdo mhughdo requested a review from Copilot April 17, 2026 04:18
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 28 out of 28 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@mhughdo mhughdo changed the title fix(proxy): prefer budget-safe routing and support image-generation compatibility fix(proxy): prefer budget-safe routing and support image-generation compatibility ("code":"invalid_request_error","param":"tools") Apr 17, 2026
@cani1989
Copy link
Copy Markdown

Works for me

@nika34-glitch
Copy link
Copy Markdown

i am getting this error too

@MPiland
Copy link
Copy Markdown

MPiland commented Apr 21, 2026

Works for me

Works for me as well

Copy link
Copy Markdown
Owner

@Soju06 Soju06 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR #421 Code Review

Verdict

APPROVE — merge now. Well-targeted surgical fix with strong test coverage, all 18 CI checks green, and multiple independent community confirmations.

Summary of changes (what this actually does)

  1. Budget-safe routing — new _select_account_preferring_budget_safe() wrapper filters out accounts where primary_used_percent > threshold OR secondary_used_percent > threshold before calling the existing select_account(); falls back gracefully if no safe account exists. Wired into 3 call sites (main dispatch, sticky fallback, non-sticky path).
  2. Sticky reallocation expanded — previously only PROMPT_CACHE sticky sessions would rebind off a budget-pressured account; now CODEX_SESSION does too. This is the main fix for usage_limit_reached repeating despite healthy accounts available.
  3. Built-in tools passthroughResponsesRequest / V1ResponsesRequest now validate_tool_types(allow_builtin_tools=True), while _strip_compact_unsupported_fields() strips tools / tool_choice / parallel_tool_calls from compact path. Matches upstream contract asymmetry.
  4. Image-generation → HTTP transport_resolve_stream_transport() now auto-prefers HTTP when tools[].type == "image_generation" is present (only in auto mode; explicit websocket config still honored).
  5. max_sse_event_bytes default — raised 2 MiB → 16 MiB to match common upstream WS frame ceiling, so large built-in tool payloads don't 1009 before response.completed.

Correctness findings

Blocking

  • None.

Non-blocking

  • transport escape hatch: if an operator has upstream_stream_transport=websocket explicitly set, image_generation requests still go over WS. Mitigated by the 16 MiB raise, but worth a doc note. Intentional per PR body.
  • _state_above_budget_threshold uses any(...) across (used_percent, secondary_used_percent). Both being None returns False → treated as safe. Correct for the intended "only penalize known-pressured" semantic, but means missing telemetry = optimistic. Acceptable.
  • memory pressure: 8× increase in per-event buffer ceiling. Per-event, streamed, not pooled — low impact, but worth mentioning in release notes.

Nits

  • preferred_states and len(preferred_states) != len(state_list) is a micro-opt that also gates the re-entry; fine but could be if 0 < len(preferred_states) < len(state_list): for readability. Not worth a change.
  • Naming: budget_exhaustedbudget_pressured rename in load_balancer.py is the right call (matches > threshold semantic, not literal exhaustion).

Test coverage assessment

Solid. Each of the 4 claimed behaviors has direct tests:

  • test_proxy_sticky_sessions.py — 5 new/updated scenarios including codex_session_reallocates_when_pinned_budget_exhausted and sticky_switches_when_pinned_rate_limited
  • test_proxy_load_balancer_refresh.py::test_*prefers_budget_safe_account_when_any_exist
  • test_openai_requests.pybuiltin_tool, compact_strips_tool_fields cases
  • test_proxy_utils.pystream_responses_auto_transport_prefers_http_for_image_generation_tool + explicit-override-respected counterpart
  • test_proxy_compact.py — compact path stripping verified

openspec change docs (4 proposals) provide spec-level traceability — nice.

Existing Copilot PR review has already run twice (no new comments on the latest pass).

Recommendation for merge

Merge now, squash. 30+ reactions on #426, multiple "works for me" confirmations, CI green. Blocking further would just accrue more duplicate issue reports.

After merge:

  • Close #426 and #441 pointing to the release commit
  • Release note bullet: "Large built-in-tool Responses payloads (e.g. image_generation) now route over HTTP by default in auto mode and have a 16 MiB frame ceiling."

@Soju06
Copy link
Copy Markdown
Owner

Soju06 commented Apr 21, 2026

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0db3341556

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +1301 to +1304
def _state_above_budget_threshold(state: AccountState, budget_threshold_pct: float) -> bool:
return any(
used_percent is not None and used_percent > budget_threshold_pct
for used_percent in (state.used_percent, state.secondary_used_percent)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Restrict budget-pressure gating to primary usage

_state_above_budget_threshold now marks an account as pressured when either used_percent or secondary_used_percent exceeds the threshold, but usage_weighted selection still prioritizes lower secondary usage first. That combination can route back to a near-exhausted primary account (e.g., A: primary 99/secondary 5 vs B: primary 10/secondary 96), because both get classified as pressured and the budget-safe filter is bypassed before normal ranking picks A. This can recreate the same usage_limit_reached behavior the change is meant to prevent.

Useful? React with 👍 / 👎.

@Soju06
Copy link
Copy Markdown
Owner

Soju06 commented Apr 21, 2026

Good catch from @chatgpt-codex-connector — the P1 flagged on _state_above_budget_threshold reproduces:

Account A: primary 99%, secondary  5%  -> key=(5, 99, ...)
Account B: primary 10%, secondary 96%  -> key=(96, 10, ...)

When both are > budget_threshold, the new preferred-safe filter collapses to the full pool. On the usage_weighted strategy, _usage_sort_key sorts by (secondary_used, primary_used, ...), so min(A, B) returns A — the near-exhausted primary — which is exactly the usage_limit_reached case this PR aims to avoid.

Reproduction

def usage_sort_key(primary, secondary, aid):
    sec = secondary if secondary is not None else primary
    return (sec, primary, 0.0, aid)

A = usage_sort_key(99, 5, "A")   # (5, 99, ...)
B = usage_sort_key(10, 96, "B")  # (96, 10, ...)
min(A, B) == A  # picks near-exhausted primary

Impact

  • Only reproduces under routing_strategy=usage_weighted (not capacity_weighted / round_robin)
  • Requires all candidates to be > budget_threshold (the defensive fallback path — already the rare/degraded case)
  • Does NOT fully regress the fix: the sticky-reallocation and tool-payload changes are orthogonal and still land
  • When any candidate is below threshold, the new filter still works as intended

Suggested follow-up (not blocking this PR)

Narrow _state_above_budget_threshold to consider the primary window only (the one that throws usage_limit_reached), so near-exhausted-primary accounts get filtered out even when their secondary is low:

def _state_above_budget_threshold(state, threshold):
    return state.used_percent is not None and state.used_percent > threshold

Or keep the any(...) but add a secondary tiebreaker that penalizes high primary in the sort key when all candidates are pressured.

Given the P1 is a narrow edge case on the degraded-pool fallback (not the main path), I'd still recommend merging as-is and handling this in a follow-up so the broader sticky + tools fix ships for the 30+ users waiting on #426.

@chatgpt-codex-connector
Copy link
Copy Markdown

To use Codex here, create an environment for this repo.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants