fix(proxy): prefer budget-safe routing and support image-generation compatibility ("code":"invalid_request_error","param":"tools") by mhughdo · Pull Request #421 · Soju06/codex-lb

mhughdo · 2026-04-16T14:12:19Z

Summary

reallocate durable backend codex_session affinity once the pinned account crosses the configured budget threshold and a healthier candidate exists
prefer budget-safe Responses routing candidates over already-pressured candidates whenever any safe candidate exists
allow built-in Responses tools on full /backend-api/codex/responses and /v1/responses routes while stripping tool-only fields from compact requests
raise the default upstream Responses event/message limit to 16 MiB and make auto transport prefer upstream HTTP for image_generation requests

Why

Repeated usage_limit_reached failures were still occurring even while other accounts had headroom.

Live DB inspection showed the failures were being routed to accounts whose local primary usage at request time was already 97-100%. The proxy was still allowing those near-exhausted ACTIVE accounts to win fresh selection, and durable backend codex_session affinity only reallocated under narrower conditions.

Separately, recent Codex Desktop builds now send newer built-in Responses tools such as image_generation. That exposed proxy-side compatibility issues including this concrete failure on full Responses requests:

{"type":"error","status":400,"error":{"message":"Invalid request payload","type":"invalid_request_error","code":"invalid_request_error","param":"tools"}}

The underlying issues were:

full Responses and compact requests did not sanitize tool payloads correctly for the different upstream contracts
websocket-preferred models such as gpt-5.4 could still carry large inline image_generation outputs over upstream websocket, which made the proxy hit a local frame ceiling and fail before response.completed

Verification

uv run pytest tests/integration/test_proxy_sticky_sessions.py -q -k "test_proxy_stream_sticky_threads_reallocate_by_prompt_cache_key or test_proxy_codex_session_id_pins_responses_and_compact_without_sticky_threads or test_proxy_codex_session_id_reallocates_when_pinned_budget_exhausted or test_proxy_codex_session_id_compact_first_pins_followup_stream_without_sticky_threads or test_proxy_sticky_switches_when_pinned_rate_limited"
uv run pytest tests/unit/test_proxy_load_balancer_refresh.py -q -k "prefers_budget_safe_account_when_any_exist"
uv run pytest -q tests/unit/test_openai_requests.py tests/integration/test_openai_compat_features.py tests/integration/test_proxy_compact.py -k "builtin_tool or builtin_tools or compact_strips_tool_fields or responses_accepts_builtin_tool_passthrough or v1_responses_forwards_builtin_tools or v1_chat_completions_rejects_builtin_tools or responses_accepts_builtin_tools"
uv run pytest -q tests/unit/test_settings_multi_replica.py tests/unit/test_proxy_websocket_client.py -k "multi_replica_defaults or connect_responses_websocket_uses_websockets_transport"
uv run pytest tests/unit/test_proxy_utils.py -q -k "resolve_stream_transport_prefers_http_for_image_generation_even_with_native_codex_headers or resolve_stream_transport_keeps_explicit_websocket_override_for_image_generation or stream_responses_auto_transport_prefers_http_for_image_generation_tool or stream_responses_auto_transport_uses_model_preference or stream_responses_auto_transport_uses_bootstrap_model_preference_when_registry_unloaded"
uvx ruff check app/modules/proxy/load_balancer.py tests/integration/test_proxy_sticky_sessions.py tests/unit/test_proxy_load_balancer_refresh.py app/core/openai/requests.py app/core/openai/v1_requests.py app/core/config/settings.py app/core/clients/proxy.py tests/unit/test_openai_requests.py tests/integration/test_openai_compat_features.py tests/integration/test_proxy_compact.py tests/unit/test_settings_multi_replica.py tests/unit/test_proxy_websocket_client.py tests/unit/test_proxy_utils.py openspec/changes/reallocate-codex-session-budget-pressure openspec/changes/support-responses-builtin-tools openspec/changes/raise-upstream-event-size-limit openspec/changes/route-image-generation-over-http
uvx ruff format --check app/modules/proxy/load_balancer.py tests/integration/test_proxy_sticky_sessions.py tests/unit/test_proxy_load_balancer_refresh.py app/core/openai/requests.py app/core/openai/v1_requests.py app/core/config/settings.py app/core/clients/proxy.py tests/unit/test_openai_requests.py tests/integration/test_openai_compat_features.py tests/integration/test_proxy_compact.py tests/unit/test_settings_multi_replica.py tests/unit/test_proxy_websocket_client.py tests/unit/test_proxy_utils.py openspec/changes/reallocate-codex-session-budget-pressure openspec/changes/support-responses-builtin-tools openspec/changes/raise-upstream-event-size-limit openspec/changes/route-image-generation-over-http
openspec validate --specs

Copilot

Pull request overview

This PR updates the proxy’s load-balancing and sticky-session routing to proactively avoid routing Responses traffic to accounts that are already above a configured budget threshold, and to rebind durable codex_session affinity when the pinned account becomes budget-pressured and a healthier option exists.

Changes:

Adjust load balancer selection to prefer below-threshold (“budget-safe”) accounts when any exist.
Extend sticky-session reallocation logic to include durable codex_session mappings under budget pressure.
Add unit + integration regressions and OpenSpec deltas documenting the new routing behavior.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
`app/modules/proxy/load_balancer.py`	Implements budget-safe preference wrapper around account selection and enables durable `codex_session` rebind under budget pressure.
`tests/unit/test_proxy_load_balancer_refresh.py`	Adds unit coverage proving a budget-safe account is preferred over a pressured one.
`tests/integration/test_proxy_sticky_sessions.py`	Adds integration coverage for `session_id`-pinned Codex routing reallocation once the pinned account crosses the budget threshold.
`openspec/changes/reallocate-codex-session-budget-pressure/tasks.md`	Tracks implementation + verification tasks for the change set.
`openspec/changes/reallocate-codex-session-budget-pressure/specs/sticky-session-operations/spec.md`	Specifies durable `codex_session` rebind behavior under budget pressure.
`openspec/changes/reallocate-codex-session-budget-pressure/specs/responses-api-compat/spec.md`	Specifies budget-safe preference behavior for Responses routes.
`openspec/changes/reallocate-codex-session-budget-pressure/proposal.md`	Documents motivation, changes, and expected impact.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot

Pull request overview

Copilot reviewed 28 out of 28 changed files in this pull request and generated no new comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

cani1989 · 2026-04-19T15:17:18Z

Works for me

nika34-glitch · 2026-04-19T23:23:22Z

i am getting this error too

MPiland · 2026-04-21T00:44:40Z

Works for me

Works for me as well

Soju06

PR #421 Code Review

Verdict

APPROVE — merge now. Well-targeted surgical fix with strong test coverage, all 18 CI checks green, and multiple independent community confirmations.

Summary of changes (what this actually does)

Budget-safe routing — new _select_account_preferring_budget_safe() wrapper filters out accounts where primary_used_percent > threshold OR secondary_used_percent > threshold before calling the existing select_account(); falls back gracefully if no safe account exists. Wired into 3 call sites (main dispatch, sticky fallback, non-sticky path).
Sticky reallocation expanded — previously only PROMPT_CACHE sticky sessions would rebind off a budget-pressured account; now CODEX_SESSION does too. This is the main fix for usage_limit_reached repeating despite healthy accounts available.
Built-in tools passthrough — ResponsesRequest / V1ResponsesRequest now validate_tool_types(allow_builtin_tools=True), while _strip_compact_unsupported_fields() strips tools / tool_choice / parallel_tool_calls from compact path. Matches upstream contract asymmetry.
Image-generation → HTTP transport — _resolve_stream_transport() now auto-prefers HTTP when tools[].type == "image_generation" is present (only in auto mode; explicit websocket config still honored).
max_sse_event_bytes default — raised 2 MiB → 16 MiB to match common upstream WS frame ceiling, so large built-in tool payloads don't 1009 before response.completed.

Correctness findings

Blocking

None.

Non-blocking

transport escape hatch: if an operator has upstream_stream_transport=websocket explicitly set, image_generation requests still go over WS. Mitigated by the 16 MiB raise, but worth a doc note. Intentional per PR body.
_state_above_budget_threshold uses any(...) across (used_percent, secondary_used_percent). Both being None returns False → treated as safe. Correct for the intended "only penalize known-pressured" semantic, but means missing telemetry = optimistic. Acceptable.
memory pressure: 8× increase in per-event buffer ceiling. Per-event, streamed, not pooled — low impact, but worth mentioning in release notes.

Nits

preferred_states and len(preferred_states) != len(state_list) is a micro-opt that also gates the re-entry; fine but could be if 0 < len(preferred_states) < len(state_list): for readability. Not worth a change.
Naming: budget_exhausted → budget_pressured rename in load_balancer.py is the right call (matches > threshold semantic, not literal exhaustion).

Test coverage assessment

Solid. Each of the 4 claimed behaviors has direct tests:

test_proxy_sticky_sessions.py — 5 new/updated scenarios including codex_session_reallocates_when_pinned_budget_exhausted and sticky_switches_when_pinned_rate_limited
test_proxy_load_balancer_refresh.py::test_*prefers_budget_safe_account_when_any_exist
test_openai_requests.py — builtin_tool, compact_strips_tool_fields cases
test_proxy_utils.py — stream_responses_auto_transport_prefers_http_for_image_generation_tool + explicit-override-respected counterpart
test_proxy_compact.py — compact path stripping verified

openspec change docs (4 proposals) provide spec-level traceability — nice.

Existing Copilot PR review has already run twice (no new comments on the latest pass).

Recommendation for merge

Merge now, squash. 30+ reactions on #426, multiple "works for me" confirmations, CI green. Blocking further would just accrue more duplicate issue reports.

After merge:

Close #426 and #441 pointing to the release commit
Release note bullet: "Large built-in-tool Responses payloads (e.g. image_generation) now route over HTTP by default in auto mode and have a 16 MiB frame ceiling."

Soju06 · 2026-04-21T02:54:26Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0db3341556

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-21T03:01:27Z

+def _state_above_budget_threshold(state: AccountState, budget_threshold_pct: float) -> bool:
+    return any(
+        used_percent is not None and used_percent > budget_threshold_pct
+        for used_percent in (state.used_percent, state.secondary_used_percent)


Restrict budget-pressure gating to primary usage

_state_above_budget_threshold now marks an account as pressured when either used_percent or secondary_used_percent exceeds the threshold, but usage_weighted selection still prioritizes lower secondary usage first. That combination can route back to a near-exhausted primary account (e.g., A: primary 99/secondary 5 vs B: primary 10/secondary 96), because both get classified as pressured and the budget-safe filter is bypassed before normal ranking picks A. This can recreate the same usage_limit_reached behavior the change is meant to prevent.

Useful? React with 👍 / 👎.

Soju06 · 2026-04-21T03:03:01Z

Good catch from @chatgpt-codex-connector — the P1 flagged on _state_above_budget_threshold reproduces:

Account A: primary 99%, secondary  5%  -> key=(5, 99, ...)
Account B: primary 10%, secondary 96%  -> key=(96, 10, ...)

When both are > budget_threshold, the new preferred-safe filter collapses to the full pool. On the usage_weighted strategy, _usage_sort_key sorts by (secondary_used, primary_used, ...), so min(A, B) returns A — the near-exhausted primary — which is exactly the usage_limit_reached case this PR aims to avoid.

Reproduction

def usage_sort_key(primary, secondary, aid):
    sec = secondary if secondary is not None else primary
    return (sec, primary, 0.0, aid)

A = usage_sort_key(99, 5, "A")   # (5, 99, ...)
B = usage_sort_key(10, 96, "B")  # (96, 10, ...)
min(A, B) == A  # picks near-exhausted primary

Impact

Only reproduces under routing_strategy=usage_weighted (not capacity_weighted / round_robin)
Requires all candidates to be > budget_threshold (the defensive fallback path — already the rare/degraded case)
Does NOT fully regress the fix: the sticky-reallocation and tool-payload changes are orthogonal and still land
When any candidate is below threshold, the new filter still works as intended

Suggested follow-up (not blocking this PR)

Narrow _state_above_budget_threshold to consider the primary window only (the one that throws usage_limit_reached), so near-exhausted-primary accounts get filtered out even when their secondary is low:

def _state_above_budget_threshold(state, threshold):
    return state.used_percent is not None and state.used_percent > threshold

Or keep the any(...) but add a secondary tiebreaker that penalizes high primary in the sort key when all candidates are pressured.

Given the P1 is a narrow edge case on the degraded-pool fallback (not the main path), I'd still recommend merging as-is and handling this in a follow-up so the broader sticky + tools fix ships for the 30+ users waiting on #426.

chatgpt-codex-connector · 2026-04-21T03:03:12Z

To use Codex here, create an environment for this repo.

fix(proxy): prefer budget-safe responses routing

128213b

Copilot AI review requested due to automatic review settings April 16, 2026 14:12

Copilot started reviewing on behalf of mhughdo April 16, 2026 14:12 View session

Copilot AI reviewed Apr 16, 2026

View reviewed changes

Comment thread app/modules/proxy/load_balancer.py Outdated

Comment thread app/modules/proxy/load_balancer.py

mhughdo added 3 commits April 16, 2026 21:20

test(proxy): align budget-safe stickiness expectations

1302166

chore(proxy): clarify budget-safe routing naming

98cb65a

fix(proxy): support responses image-generation compatibility

0db3341

mhughdo changed the title ~~fix(proxy): prefer budget-safe responses routing~~ fix(proxy): prefer budget-safe routing and support image-generation compatibility Apr 17, 2026

mhughdo requested a review from Copilot April 17, 2026 04:18

Copilot started reviewing on behalf of mhughdo April 17, 2026 04:18 View session

Copilot AI reviewed Apr 17, 2026

View reviewed changes

mhughdo changed the title ~~fix(proxy): prefer budget-safe routing and support image-generation compatibility~~ fix(proxy): prefer budget-safe routing and support image-generation compatibility ("code":"invalid_request_error","param":"tools") Apr 17, 2026

mhughdo mentioned this pull request Apr 17, 2026

Codex update has broken this #426

Closed

stemirkhan mentioned this pull request Apr 17, 2026

fix(responses): allow built-in tools in Responses requests #431

Closed

This was referenced Apr 18, 2026

feat(settings): add image generation toggle for OpenCode users #435

Closed

feat(settings): add image generation toggle button for PR #421 #436

Closed

Soju06 mentioned this pull request Apr 21, 2026

API Compatibility Drift - Codex App fails with "Invalid request payload" for tools, while Codex CLI still works #441

Closed

Soju06 approved these changes Apr 21, 2026

View reviewed changes

chatgpt-codex-connector bot reviewed Apr 21, 2026

View reviewed changes

Soju06 mentioned this pull request Apr 21, 2026

Invalid request payload from codex-lb after codex-cli v0.122.0 update #443

Closed

RazvanBugoi mentioned this pull request Apr 21, 2026

fix(proxy): prefer budget-safe routing and support image-generation compatibility RazvanBugoi/codex-lb#1

Merged

Soju06 merged commit e632d94 into Soju06:main Apr 21, 2026
21 of 22 checks passed

This was referenced Apr 21, 2026

[follow-up] Narrow _state_above_budget_threshold primary-only gating #446

Open

chore(main): release 1.14.0 #447

Open

This was referenced Apr 21, 2026

fix(api-keys): reuse shared copy button for created keys #432

Merged

feat(api-keys): show assigned account availability in picker #422

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(proxy): prefer budget-safe routing and support image-generation compatibility ("code":"invalid_request_error","param":"tools")#421

fix(proxy): prefer budget-safe routing and support image-generation compatibility ("code":"invalid_request_error","param":"tools")#421
Soju06 merged 4 commits intoSoju06:mainfrom
mhughdo:hung/reallocate-budget-safe-routing

mhughdo commented Apr 16, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

cani1989 commented Apr 19, 2026

Uh oh!

nika34-glitch commented Apr 19, 2026

Uh oh!

MPiland commented Apr 21, 2026

Uh oh!

Soju06 left a comment

Uh oh!

Soju06 commented Apr 21, 2026

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

chatgpt-codex-connector bot Apr 21, 2026

Uh oh!

Soju06 commented Apr 21, 2026

Uh oh!

chatgpt-codex-connector bot commented Apr 21, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

Conversation

mhughdo commented Apr 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Why

Verification

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

cani1989 commented Apr 19, 2026

Uh oh!

nika34-glitch commented Apr 19, 2026

Uh oh!

MPiland commented Apr 21, 2026

Uh oh!

Soju06 left a comment

Choose a reason for hiding this comment

PR #421 Code Review

Verdict

Summary of changes (what this actually does)

Correctness findings

Blocking

Non-blocking

Nits

Test coverage assessment

Recommendation for merge

Uh oh!

Soju06 commented Apr 21, 2026

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Apr 21, 2026

Choose a reason for hiding this comment

Uh oh!

Soju06 commented Apr 21, 2026

Reproduction

Impact

Suggested follow-up (not blocking this PR)

Uh oh!

chatgpt-codex-connector bot commented Apr 21, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

mhughdo commented Apr 16, 2026 •

edited

Loading