fix(gemini_cli): harden 429 failure handling by MasuRii · Pull Request #145 · Mirrowel/LLM-API-Key-Proxy

MasuRii · 2026-03-01T21:25:31Z

Summary

This PR hardens Gemini CLI rate-limit behavior by preserving proper HTTP semantics when 429 errors occur before streaming starts. It ensures clients receive accurate non-200 responses and retry metadata instead of a committed 200 stream.

Problem

Pre-stream failures could be surfaced after a streaming response was already committed. This made 429 handling ambiguous for clients and could hide actionable retry timing.

Root Cause

Streaming bootstrap and response commit timing were not clearly separated. Failures that occurred before first chunk emission could still flow through a path that had already committed streaming semantics.

Changes

Added explicit pre-stream failure handling in the rotator flow
Introduced bootstrap failure signaling for failures before first chunk emission
Primed first streaming chunk before constructing StreamingResponse
Preserved status code and Retry-After metadata for pre-stream HTTP errors
Prevented retry/rotation after stream commit

Impact

Improves Gemini CLI 429 resilience and correctness
Restores correct HTTP behavior for pre-stream failures
Enables clients to honor retry guidance reliably

Validation

Verified PR metadata update through GitHub CLI
Confirmed PR title reflects fix/hardening framing for Gemini CLI 429
Confirmed body was applied via --body-file

Risks

Low: behavior changes are focused on pre-stream error paths
Existing successful streaming paths remain unchanged

Checklist

Title uses conventional fix/hardening framing for Gemini CLI 429
PR body updated using local markdown file via --body-file
Body sections follow the required standard structure
Scope matches committed changes without introducing unrelated claims

Summary by CodeRabbit

Bug Fixes
- Streaming endpoints now return proper HTTP error responses for pre-transmission failures, replacing in-band error handling.
- Added support for rate-limit retry-after headers in streaming responses.
- Improved error handling for mid-stream failures with structured error messages and completion markers.

Introduce StreamBootstrapError for failures before first chunk emission and use it in streaming execution flow to preserve correct HTTP semantics. Prevent retry/rotation after stream commit and aggregate pre-stream failure metadata for accurate status propagation.

Prime the first streaming chunk before constructing StreamingResponse so bootstrap failures can return non-200 responses with Retry-After headers when available.

mirrobot-agent · 2026-03-03T01:34:57Z

Starting review of the Gemini CLI 429 hardening changes. This PR introduces pre-stream failure handling to ensure proper HTTP semantics. Let me analyze the streaming bootstrap logic and error handling flow...

mirrobot-agent

Overall Assessment

This PR introduces well-structured pre-stream failure handling to ensure proper HTTP semantics for 429 errors before stream commit. The approach is sound: priming the first chunk before constructing the StreamingResponse allows the HTTP layer to return appropriate status codes and headers instead of committing to a 200 response prematurely.

The implementation is thorough, with careful attention to tracking pre-stream failure metadata and preserving Retry-After information. The separation between pre-stream failures (raising StreamBootstrapError) and post-commit failures (in-band SSE errors) is clear and logical.

Architectural Feedback

The design effectively addresses the core problem of ambiguous 429 handling before streaming starts. The StreamBootstrapError exception provides a clean mechanism for signaling bootstrap failures to the HTTP layer.

One consideration: the status code selection logic in _build_stream_bootstrap_error uses the last status code when failures are mixed. This might not always represent the most actionable information for clients (e.g., a 429 buried among 500s).

Key Suggestions

Exception handling at line 1093: The re-raising of last_exception when there are no accumulated errors could propagate non-StreamBootstrapError exceptions to the HTTP layer. Consider wrapping or documenting expected exception types.
Test coverage: Given the significance of this behavioral change, comprehensive tests for pre-stream failure paths would strengthen reliability, especially for 429 handling with Retry-After headers.
Status code priority: The current logic prioritizes uniformity (all 429s → 429) over presence (any 429 → 429). Consider whether this matches the intended client retry behavior.

Questions for the Author

What's the intended behavior when retry_after is 0? Currently, the Retry-After header is omitted.
Should 429 status codes be prioritized if they appear at any point in the failure chain, even if mixed with other status codes?

This review was generated by an AI assistant.

mirrobot-agent · 2026-03-03T01:39:33Z

src/rotator_library/client/executor.py

+                return
+
+            if last_exception and not error_accumulator.has_errors():
+                raise last_exception


This check re-raises last_exception if there are no accumulated errors. However, this exception may not be a StreamBootstrapError, which could lead to unhandled exceptions at the HTTP layer. Consider whether this should wrap the exception in a StreamBootstrapError or document the expected exception types.

mirrobot-agent · 2026-03-03T01:39:33Z

src/rotator_library/client/executor.py

+            if all(code == 429 for code in prestream_status_codes):
+                status_code = 429
+            else:
+                status_code = prestream_status_codes[-1]


The status code selection logic uses the last status code when not all are 429. For mixed failures like [500, 429, 500], this returns 500. Consider whether prioritizing 429 (if any exist) or using the most severe status code would be more appropriate for client retry behavior.

mirrobot-agent · 2026-03-03T01:39:33Z

src/proxy_app/main.py

+                status_code = e.status_code if e.status_code else 500
+                headers = {}
+                if e.retry_after is not None and e.retry_after > 0:
+                    headers["Retry-After"] = str(int(e.retry_after))


The Retry-After header is only set when retry_after > 0. If retry_after is exactly 0, the header will not be set. Consider whether retry_after >= 0 would be more appropriate, or document why 0 is excluded.

Mirrowel · 2026-03-03T03:34:58Z

@mirrobot-agent Explain to me what exactly does this PR do? How does it "fix" 429 errors, if it does.

mirrobot-agent · 2026-03-03T03:35:37Z

@Mirrowel, I'll explain what this PR does and how it addresses 429 errors.

What This PR Does

This PR hardens the handling of 429 (Rate Limit Exceeded) errors in the Gemini CLI integration by ensuring proper HTTP semantics when rate-limit failures occur before the streaming response begins.

The Problem (Before This Fix)

When a 429 error occurred before the first chunk of a stream was emitted, the system could:

Commit to a 200 OK streaming response prematurely
Then surface the 429 error after the stream had already started
This made it ambiguous for clients - they received a 200 status code but then got rate-limited, hiding the actual retry timing information

The Root Cause

The streaming bootstrap logic and response commit timing were not clearly separated. Failures that occurred before the first chunk could still flow through a path that had already committed to streaming semantics.

The Fix

The PR introduces pre-stream failure handling with these key changes:

Prime the first chunk before constructing StreamingResponse - This ensures we know if the request will succeed before committing to a 200 response
Introduce StreamBootstrapError exception - Signals bootstrap failures to the HTTP layer
Preserve HTTP status codes and Retry-After headers - Clients now receive the actual 429 status code with proper retry metadata
Prevent retry/rotation after stream commit - Once streaming starts, error handling follows a different path

How It "Fixes" 429 Errors

This doesn't prevent 429 errors from occurring (those are imposed by the Gemini API). Instead, it fixes how 429 errors are communicated to clients:

✅ Clients now receive a proper 429 status code instead of 200
✅ Retry-After headers are preserved so clients know when to retry
✅ HTTP semantics are correct - no more ambiguous 200 responses that fail mid-stream
✅ Retry guidance is reliable - clients can honor the API's retry timing

Validation

The PR body indicates manual verification was performed:

Confirmed PR metadata updates through GitHub CLI
Verified the fix/hardening framing in the title
Confirmed body was applied correctly

The changes are focused on error paths, so existing successful streaming behavior remains unchanged.

This explanation was generated by an AI assistant.

coderabbitai

Actionable comments posted: 2

♻️ Duplicate comments (1)

src/rotator_library/client/executor.py (1)

1092-1101: ⚠️ Potential issue | 🟠 Major

Raw raise last_exception can bypass bootstrap HTTP mapping.

At Lines [1092]-[1093], re-raising a non-StreamBootstrapError can skip the structured pre-stream HTTP error path and lose consistent status/payload behavior.

🔧 Proposed fix

-            if last_exception and not error_accumulator.has_errors():
-                raise last_exception
+            if last_exception and not error_accumulator.has_errors():
+                fallback_error = {
+                    "error": {
+                        "message": str(last_exception),
+                        "type": "proxy_internal_error",
+                    }
+                }
+                raise self._build_stream_bootstrap_error(
+                    fallback_error,
+                    prestream_status_codes,
+                    prestream_retry_after_values,
+                    last_exception,
+                ) from last_exception

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/rotator_library/client/executor.py` around lines 1092 - 1101, The code
currently does "if last_exception and not error_accumulator.has_errors(): raise
last_exception" which can bypass the structured pre-stream HTTP mapping;
instead, detect when last_exception is set and do not re-raise it raw—instead
pass its info into the existing bootstrap error path by calling
self._build_stream_bootstrap_error(...) (use
error_accumulator.build_client_error_response(), prestream_status_codes,
prestream_retry_after_values and last_exception) and raise that
StreamBootstrapError so all errors follow the same HTTP/status/payload mapping;
update the branch that references last_exception and
error_accumulator.has_errors() to always raise the result of
self._build_stream_bootstrap_error rather than raw last_exception.

🧹 Nitpick comments (1)

src/rotator_library/client/executor.py (1)

890-895: Pass the caught exception object into bootstrap error synthesis (and preserve chaining).

At Line [894], original can be non-exception payload data; _build_stream_bootstrap_error(..., last_exception=...) is safer with the caught exception object. Also chain with from to preserve traceback provenance.

🔧 Proposed fix

-                                            raise self._build_stream_bootstrap_error(
+                                            raise self._build_stream_bootstrap_error(
                                                 error_data,
                                                 prestream_status_codes,
                                                 prestream_retry_after_values,
-                                                original,
-                                            )
+                                                e,
+                                            ) from e
@@
-                                            raise self._build_stream_bootstrap_error(
+                                            raise self._build_stream_bootstrap_error(
                                                 error_data,
                                                 prestream_status_codes,
                                                 prestream_retry_after_values,
                                                 e,
-                                            )
+                                            ) from e

Also applies to: 951-956

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/rotator_library/client/executor.py` around lines 890 - 895, The raise is
passing a non-exception payload (`original`) into _build_stream_bootstrap_error;
change the call sites (around the raise that calls _build_stream_bootstrap_error
and the similar block at the 951-956 region) to pass the actual caught exception
object (e.g., the exception variable from the except block) as the
last_exception/last_exception parameter instead of `original`, and re-raise the
constructed exception using "raise <constructed_exception> from
<caught_exception>" to preserve traceback chaining; locate the calls to
_build_stream_bootstrap_error in executor.py and replace the last arg with the
caught exception variable and add the from clause when raising.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/proxy_app/main.py`:
- Around line 1035-1044: The raw logging calls can raise when
response.model_dump() is not present and thus convert a successful response into
a 500; update the block that calls raw_logger.log_final_response to guard the
model_dump call (e.g., check hasattr(response, "model_dump") or use a
try/except) and pass a safe fallback (like the raw response or its str/repr) for
the body so logging never raises; modify the code referencing raw_logger,
response.model_dump(), response.headers and status_code to safely derive body
without throwing and then call raw_logger.log_final_response with the safe body
value.
- Around line 994-995: The code sets headers["Retry-After"] =
str(int(e.retry_after)), which floors fractional retry_after values and can make
clients retry too early; change this to round up instead (e.g., import math and
set headers["Retry-After"] = str(math.ceil(e.retry_after))) so any fractional
seconds are preserved by rounding up; update the exception handling block that
accesses e.retry_after and ensure math.ceil is used rather than int to compute
the header value.

---

Duplicate comments:
In `@src/rotator_library/client/executor.py`:
- Around line 1092-1101: The code currently does "if last_exception and not
error_accumulator.has_errors(): raise last_exception" which can bypass the
structured pre-stream HTTP mapping; instead, detect when last_exception is set
and do not re-raise it raw—instead pass its info into the existing bootstrap
error path by calling self._build_stream_bootstrap_error(...) (use
error_accumulator.build_client_error_response(), prestream_status_codes,
prestream_retry_after_values and last_exception) and raise that
StreamBootstrapError so all errors follow the same HTTP/status/payload mapping;
update the branch that references last_exception and
error_accumulator.has_errors() to always raise the result of
self._build_stream_bootstrap_error rather than raw last_exception.

---

Nitpick comments:
In `@src/rotator_library/client/executor.py`:
- Around line 890-895: The raise is passing a non-exception payload (`original`)
into _build_stream_bootstrap_error; change the call sites (around the raise that
calls _build_stream_bootstrap_error and the similar block at the 951-956 region)
to pass the actual caught exception object (e.g., the exception variable from
the except block) as the last_exception/last_exception parameter instead of
`original`, and re-raise the constructed exception using "raise
<constructed_exception> from <caught_exception>" to preserve traceback chaining;
locate the calls to _build_stream_bootstrap_error in executor.py and replace the
last arg with the caught exception variable and add the from clause when
raising.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b62f6e4 and 65d9d95.

📒 Files selected for processing (3)

src/proxy_app/main.py
src/rotator_library/client/executor.py
src/rotator_library/core/errors.py

coderabbitai · 2026-03-03T03:54:44Z

src/proxy_app/main.py

+                if e.retry_after is not None and e.retry_after > 0:
+                    headers["Retry-After"] = str(int(e.retry_after))


⚠️ Potential issue | 🟡 Minor

Avoid truncating Retry-After to a lower value.

At Line [995], int(e.retry_after) floors fractional values (e.g., 0.5 -> 0), which can cause clients to retry too early.

🔧 Proposed fix

+import math @@ - if e.retry_after is not None and e.retry_after > 0: - headers["Retry-After"] = str(int(e.retry_after)) + if e.retry_after is not None and e.retry_after > 0: + headers["Retry-After"] = str(math.ceil(e.retry_after))

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if e.retry_after is not None and e.retry_after > 0:

headers["Retry-After"] = str(int(e.retry_after))

import math

if e.retry_after is not None and e.retry_after > 0:

headers["Retry-After"] = str(math.ceil(e.retry_after))

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/proxy_app/main.py` around lines 994 - 995, The code sets headers["Retry-After"] = str(int(e.retry_after)), which floors fractional retry_after values and can make clients retry too early; change this to round up instead (e.g., import math and set headers["Retry-After"] = str(math.ceil(e.retry_after))) so any fractional seconds are preserved by rounding up; update the exception handling block that accesses e.retry_after and ensure math.ceil is used rather than int to compute the header value.

coderabbitai · 2026-03-03T03:54:44Z

src/proxy_app/main.py

+        if raw_logger:
+            # Assuming response has status_code and headers attributes
+            # This might need adjustment based on the actual response object
+            response_headers = response.headers if hasattr(response, "headers") else None
+            status_code = response.status_code if hasattr(response, "status_code") else 200
+            raw_logger.log_final_response(
+                status_code=status_code,
+                headers=response_headers,
+                body=response.model_dump(),
+            )


⚠️ Potential issue | 🟠 Major

Don’t let raw logging break a successful completion response.

At Line [1043], response.model_dump() is unconditional. If a provider/plugin returns a non-Pydantic object, this can raise and turn a successful upstream response into a 500 when raw logging is enabled.

🔧 Proposed fix

if raw_logger: # Assuming response has status_code and headers attributes # This might need adjustment based on the actual response object response_headers = response.headers if hasattr(response, "headers") else None status_code = response.status_code if hasattr(response, "status_code") else 200 + if hasattr(response, "model_dump"): + response_body = response.model_dump() + elif isinstance(response, dict): + response_body = response + else: + response_body = {"response": str(response)} raw_logger.log_final_response( status_code=status_code, headers=response_headers, - body=response.model_dump(), + body=response_body, )

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/proxy_app/main.py` around lines 1035 - 1044, The raw logging calls can raise when response.model_dump() is not present and thus convert a successful response into a 500; update the block that calls raw_logger.log_final_response to guard the model_dump call (e.g., check hasattr(response, "model_dump") or use a try/except) and pass a safe fallback (like the raw response or its str/repr) for the body so logging never raises; modify the code referencing raw_logger, response.model_dump(), response.headers and status_code to safely derive body without throwing and then call raw_logger.log_final_response with the safe body value.

MasuRii added 2 commits March 2, 2026 05:25

fix(proxy): return HTTP errors before stream commit

b82ef28

Prime the first streaming chunk before constructing StreamingResponse so bootstrap failures can return non-200 responses with Retry-After headers when available.

MasuRii requested a review from Mirrowel as a code owner March 1, 2026 21:25

MasuRii changed the title ~~fix(streaming): harden bootstrap error handling for SSE~~ fix(gemini_cli): harden 429 failure handling Mar 1, 2026

chore(rotator): restore removed inline comments

65d9d95

MasuRii marked this pull request as draft March 1, 2026 23:14

mirrobot-agent bot reviewed Mar 3, 2026

View reviewed changes

Mirrowel added the needs discussion label Mar 3, 2026

coderabbitai bot reviewed Mar 3, 2026

View reviewed changes

Repository owner deleted a comment from coderabbitai bot Mar 3, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(gemini_cli): harden 429 failure handling#145

fix(gemini_cli): harden 429 failure handling#145
MasuRii wants to merge 3 commits intoMirrowel:devfrom
MasuRii:fix/stream-bootstrap-handling

MasuRii commented Mar 1, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

mirrobot-agent bot commented Mar 3, 2026

Uh oh!

mirrobot-agent bot left a comment

Uh oh!

mirrobot-agent bot Mar 3, 2026

Uh oh!

mirrobot-agent bot Mar 3, 2026

Uh oh!

mirrobot-agent bot Mar 3, 2026

Uh oh!

Mirrowel commented Mar 3, 2026

Uh oh!

mirrobot-agent bot commented Mar 3, 2026

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Mar 3, 2026

Uh oh!

coderabbitai bot Mar 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		if e.retry_after is not None and e.retry_after > 0:
		headers["Retry-After"] = str(int(e.retry_after))

Uh oh!

Conversation

MasuRii commented Mar 1, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Problem

Root Cause

Changes

Impact

Validation

Risks

Checklist

Summary by CodeRabbit

Uh oh!

mirrobot-agent bot commented Mar 3, 2026

Uh oh!

mirrobot-agent bot left a comment

Choose a reason for hiding this comment

Overall Assessment

Architectural Feedback

Key Suggestions

Questions for the Author

Uh oh!

mirrobot-agent bot Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

mirrobot-agent bot Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

mirrobot-agent bot Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

Mirrowel commented Mar 3, 2026

Uh oh!

mirrobot-agent bot commented Mar 3, 2026

What This PR Does

The Problem (Before This Fix)

The Root Cause

The Fix

How It "Fixes" 429 Errors

Validation

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

MasuRii commented Mar 1, 2026 •

edited by coderabbitai bot

Loading