Skip to content

fix(gemini_cli): harden 429 failure handling#145

Draft
MasuRii wants to merge 3 commits intoMirrowel:devfrom
MasuRii:fix/stream-bootstrap-handling
Draft

fix(gemini_cli): harden 429 failure handling#145
MasuRii wants to merge 3 commits intoMirrowel:devfrom
MasuRii:fix/stream-bootstrap-handling

Conversation

@MasuRii
Copy link
Contributor

@MasuRii MasuRii commented Mar 1, 2026

Summary

This PR hardens Gemini CLI rate-limit behavior by preserving proper HTTP semantics when 429 errors occur before streaming starts. It ensures clients receive accurate non-200 responses and retry metadata instead of a committed 200 stream.

Problem

Pre-stream failures could be surfaced after a streaming response was already committed. This made 429 handling ambiguous for clients and could hide actionable retry timing.

Root Cause

Streaming bootstrap and response commit timing were not clearly separated. Failures that occurred before first chunk emission could still flow through a path that had already committed streaming semantics.

Changes

  • Added explicit pre-stream failure handling in the rotator flow
  • Introduced bootstrap failure signaling for failures before first chunk emission
  • Primed first streaming chunk before constructing StreamingResponse
  • Preserved status code and Retry-After metadata for pre-stream HTTP errors
  • Prevented retry/rotation after stream commit

Impact

  • Improves Gemini CLI 429 resilience and correctness
  • Restores correct HTTP behavior for pre-stream failures
  • Enables clients to honor retry guidance reliably

Validation

  • Verified PR metadata update through GitHub CLI
  • Confirmed PR title reflects fix/hardening framing for Gemini CLI 429
  • Confirmed body was applied via --body-file

Risks

  • Low: behavior changes are focused on pre-stream error paths
  • Existing successful streaming paths remain unchanged

Checklist

  • Title uses conventional fix/hardening framing for Gemini CLI 429
  • PR body updated using local markdown file via --body-file
  • Body sections follow the required standard structure
  • Scope matches committed changes without introducing unrelated claims

Summary by CodeRabbit

  • Bug Fixes
    • Streaming endpoints now return proper HTTP error responses for pre-transmission failures, replacing in-band error handling.
    • Added support for rate-limit retry-after headers in streaming responses.
    • Improved error handling for mid-stream failures with structured error messages and completion markers.

MasuRii added 2 commits March 2, 2026 05:25
Introduce StreamBootstrapError for failures before first chunk emission and use it in streaming execution flow to preserve correct HTTP semantics.

Prevent retry/rotation after stream commit and aggregate pre-stream failure metadata for accurate status propagation.
Prime the first streaming chunk before constructing StreamingResponse so bootstrap failures can return non-200 responses with Retry-After headers when available.
@MasuRii MasuRii requested a review from Mirrowel as a code owner March 1, 2026 21:25
@MasuRii MasuRii changed the title fix(streaming): harden bootstrap error handling for SSE fix(gemini_cli): harden 429 failure handling Mar 1, 2026
@MasuRii MasuRii marked this pull request as draft March 1, 2026 23:14
@mirrobot-agent
Copy link
Contributor

mirrobot-agent bot commented Mar 3, 2026

Starting review of the Gemini CLI 429 hardening changes. This PR introduces pre-stream failure handling to ensure proper HTTP semantics. Let me analyze the streaming bootstrap logic and error handling flow...

Copy link
Contributor

@mirrobot-agent mirrobot-agent bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall Assessment

This PR introduces well-structured pre-stream failure handling to ensure proper HTTP semantics for 429 errors before stream commit. The approach is sound: priming the first chunk before constructing the StreamingResponse allows the HTTP layer to return appropriate status codes and headers instead of committing to a 200 response prematurely.

The implementation is thorough, with careful attention to tracking pre-stream failure metadata and preserving Retry-After information. The separation between pre-stream failures (raising StreamBootstrapError) and post-commit failures (in-band SSE errors) is clear and logical.

Architectural Feedback

The design effectively addresses the core problem of ambiguous 429 handling before streaming starts. The StreamBootstrapError exception provides a clean mechanism for signaling bootstrap failures to the HTTP layer.

One consideration: the status code selection logic in _build_stream_bootstrap_error uses the last status code when failures are mixed. This might not always represent the most actionable information for clients (e.g., a 429 buried among 500s).

Key Suggestions

  • Exception handling at line 1093: The re-raising of last_exception when there are no accumulated errors could propagate non-StreamBootstrapError exceptions to the HTTP layer. Consider wrapping or documenting expected exception types.
  • Test coverage: Given the significance of this behavioral change, comprehensive tests for pre-stream failure paths would strengthen reliability, especially for 429 handling with Retry-After headers.
  • Status code priority: The current logic prioritizes uniformity (all 429s → 429) over presence (any 429 → 429). Consider whether this matches the intended client retry behavior.

Questions for the Author

  • What's the intended behavior when retry_after is 0? Currently, the Retry-After header is omitted.
  • Should 429 status codes be prioritized if they appear at any point in the failure chain, even if mixed with other status codes?

This review was generated by an AI assistant.

return

if last_exception and not error_accumulator.has_errors():
raise last_exception
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check re-raises last_exception if there are no accumulated errors. However, this exception may not be a StreamBootstrapError, which could lead to unhandled exceptions at the HTTP layer. Consider whether this should wrap the exception in a StreamBootstrapError or document the expected exception types.

if all(code == 429 for code in prestream_status_codes):
status_code = 429
else:
status_code = prestream_status_codes[-1]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The status code selection logic uses the last status code when not all are 429. For mixed failures like [500, 429, 500], this returns 500. Consider whether prioritizing 429 (if any exist) or using the most severe status code would be more appropriate for client retry behavior.

status_code = e.status_code if e.status_code else 500
headers = {}
if e.retry_after is not None and e.retry_after > 0:
headers["Retry-After"] = str(int(e.retry_after))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Retry-After header is only set when retry_after > 0. If retry_after is exactly 0, the header will not be set. Consider whether retry_after >= 0 would be more appropriate, or document why 0 is excluded.

@Mirrowel
Copy link
Owner

Mirrowel commented Mar 3, 2026

@mirrobot-agent Explain to me what exactly does this PR do? How does it "fix" 429 errors, if it does.

@mirrobot-agent
Copy link
Contributor

mirrobot-agent bot commented Mar 3, 2026

@Mirrowel, I'll explain what this PR does and how it addresses 429 errors.

What This PR Does

This PR hardens the handling of 429 (Rate Limit Exceeded) errors in the Gemini CLI integration by ensuring proper HTTP semantics when rate-limit failures occur before the streaming response begins.

The Problem (Before This Fix)

When a 429 error occurred before the first chunk of a stream was emitted, the system could:

  1. Commit to a 200 OK streaming response prematurely
  2. Then surface the 429 error after the stream had already started
  3. This made it ambiguous for clients - they received a 200 status code but then got rate-limited, hiding the actual retry timing information

The Root Cause

The streaming bootstrap logic and response commit timing were not clearly separated. Failures that occurred before the first chunk could still flow through a path that had already committed to streaming semantics.

The Fix

The PR introduces pre-stream failure handling with these key changes:

  1. Prime the first chunk before constructing StreamingResponse - This ensures we know if the request will succeed before committing to a 200 response
  2. Introduce StreamBootstrapError exception - Signals bootstrap failures to the HTTP layer
  3. Preserve HTTP status codes and Retry-After headers - Clients now receive the actual 429 status code with proper retry metadata
  4. Prevent retry/rotation after stream commit - Once streaming starts, error handling follows a different path

How It "Fixes" 429 Errors

This doesn't prevent 429 errors from occurring (those are imposed by the Gemini API). Instead, it fixes how 429 errors are communicated to clients:

  • ✅ Clients now receive a proper 429 status code instead of 200
  • Retry-After headers are preserved so clients know when to retry
  • HTTP semantics are correct - no more ambiguous 200 responses that fail mid-stream
  • Retry guidance is reliable - clients can honor the API's retry timing

Validation

The PR body indicates manual verification was performed:

  • Confirmed PR metadata updates through GitHub CLI
  • Verified the fix/hardening framing in the title
  • Confirmed body was applied correctly

The changes are focused on error paths, so existing successful streaming behavior remains unchanged.

This explanation was generated by an AI assistant.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
src/rotator_library/client/executor.py (1)

1092-1101: ⚠️ Potential issue | 🟠 Major

Raw raise last_exception can bypass bootstrap HTTP mapping.

At Lines [1092]-[1093], re-raising a non-StreamBootstrapError can skip the structured pre-stream HTTP error path and lose consistent status/payload behavior.

🔧 Proposed fix
-            if last_exception and not error_accumulator.has_errors():
-                raise last_exception
+            if last_exception and not error_accumulator.has_errors():
+                fallback_error = {
+                    "error": {
+                        "message": str(last_exception),
+                        "type": "proxy_internal_error",
+                    }
+                }
+                raise self._build_stream_bootstrap_error(
+                    fallback_error,
+                    prestream_status_codes,
+                    prestream_retry_after_values,
+                    last_exception,
+                ) from last_exception
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/rotator_library/client/executor.py` around lines 1092 - 1101, The code
currently does "if last_exception and not error_accumulator.has_errors(): raise
last_exception" which can bypass the structured pre-stream HTTP mapping;
instead, detect when last_exception is set and do not re-raise it raw—instead
pass its info into the existing bootstrap error path by calling
self._build_stream_bootstrap_error(...) (use
error_accumulator.build_client_error_response(), prestream_status_codes,
prestream_retry_after_values and last_exception) and raise that
StreamBootstrapError so all errors follow the same HTTP/status/payload mapping;
update the branch that references last_exception and
error_accumulator.has_errors() to always raise the result of
self._build_stream_bootstrap_error rather than raw last_exception.
🧹 Nitpick comments (1)
src/rotator_library/client/executor.py (1)

890-895: Pass the caught exception object into bootstrap error synthesis (and preserve chaining).

At Line [894], original can be non-exception payload data; _build_stream_bootstrap_error(..., last_exception=...) is safer with the caught exception object. Also chain with from to preserve traceback provenance.

🔧 Proposed fix
-                                            raise self._build_stream_bootstrap_error(
+                                            raise self._build_stream_bootstrap_error(
                                                 error_data,
                                                 prestream_status_codes,
                                                 prestream_retry_after_values,
-                                                original,
-                                            )
+                                                e,
+                                            ) from e
@@
-                                            raise self._build_stream_bootstrap_error(
+                                            raise self._build_stream_bootstrap_error(
                                                 error_data,
                                                 prestream_status_codes,
                                                 prestream_retry_after_values,
                                                 e,
-                                            )
+                                            ) from e

Also applies to: 951-956

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/rotator_library/client/executor.py` around lines 890 - 895, The raise is
passing a non-exception payload (`original`) into _build_stream_bootstrap_error;
change the call sites (around the raise that calls _build_stream_bootstrap_error
and the similar block at the 951-956 region) to pass the actual caught exception
object (e.g., the exception variable from the except block) as the
last_exception/last_exception parameter instead of `original`, and re-raise the
constructed exception using "raise <constructed_exception> from
<caught_exception>" to preserve traceback chaining; locate the calls to
_build_stream_bootstrap_error in executor.py and replace the last arg with the
caught exception variable and add the from clause when raising.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/proxy_app/main.py`:
- Around line 1035-1044: The raw logging calls can raise when
response.model_dump() is not present and thus convert a successful response into
a 500; update the block that calls raw_logger.log_final_response to guard the
model_dump call (e.g., check hasattr(response, "model_dump") or use a
try/except) and pass a safe fallback (like the raw response or its str/repr) for
the body so logging never raises; modify the code referencing raw_logger,
response.model_dump(), response.headers and status_code to safely derive body
without throwing and then call raw_logger.log_final_response with the safe body
value.
- Around line 994-995: The code sets headers["Retry-After"] =
str(int(e.retry_after)), which floors fractional retry_after values and can make
clients retry too early; change this to round up instead (e.g., import math and
set headers["Retry-After"] = str(math.ceil(e.retry_after))) so any fractional
seconds are preserved by rounding up; update the exception handling block that
accesses e.retry_after and ensure math.ceil is used rather than int to compute
the header value.

---

Duplicate comments:
In `@src/rotator_library/client/executor.py`:
- Around line 1092-1101: The code currently does "if last_exception and not
error_accumulator.has_errors(): raise last_exception" which can bypass the
structured pre-stream HTTP mapping; instead, detect when last_exception is set
and do not re-raise it raw—instead pass its info into the existing bootstrap
error path by calling self._build_stream_bootstrap_error(...) (use
error_accumulator.build_client_error_response(), prestream_status_codes,
prestream_retry_after_values and last_exception) and raise that
StreamBootstrapError so all errors follow the same HTTP/status/payload mapping;
update the branch that references last_exception and
error_accumulator.has_errors() to always raise the result of
self._build_stream_bootstrap_error rather than raw last_exception.

---

Nitpick comments:
In `@src/rotator_library/client/executor.py`:
- Around line 890-895: The raise is passing a non-exception payload (`original`)
into _build_stream_bootstrap_error; change the call sites (around the raise that
calls _build_stream_bootstrap_error and the similar block at the 951-956 region)
to pass the actual caught exception object (e.g., the exception variable from
the except block) as the last_exception/last_exception parameter instead of
`original`, and re-raise the constructed exception using "raise
<constructed_exception> from <caught_exception>" to preserve traceback chaining;
locate the calls to _build_stream_bootstrap_error in executor.py and replace the
last arg with the caught exception variable and add the from clause when
raising.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b62f6e4 and 65d9d95.

📒 Files selected for processing (3)
  • src/proxy_app/main.py
  • src/rotator_library/client/executor.py
  • src/rotator_library/core/errors.py

Comment on lines +994 to +995
if e.retry_after is not None and e.retry_after > 0:
headers["Retry-After"] = str(int(e.retry_after))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Avoid truncating Retry-After to a lower value.

At Line [995], int(e.retry_after) floors fractional values (e.g., 0.5 -> 0), which can cause clients to retry too early.

🔧 Proposed fix
+import math
@@
-                if e.retry_after is not None and e.retry_after > 0:
-                    headers["Retry-After"] = str(int(e.retry_after))
+                if e.retry_after is not None and e.retry_after > 0:
+                    headers["Retry-After"] = str(math.ceil(e.retry_after))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if e.retry_after is not None and e.retry_after > 0:
headers["Retry-After"] = str(int(e.retry_after))
import math
if e.retry_after is not None and e.retry_after > 0:
headers["Retry-After"] = str(math.ceil(e.retry_after))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/proxy_app/main.py` around lines 994 - 995, The code sets
headers["Retry-After"] = str(int(e.retry_after)), which floors fractional
retry_after values and can make clients retry too early; change this to round up
instead (e.g., import math and set headers["Retry-After"] =
str(math.ceil(e.retry_after))) so any fractional seconds are preserved by
rounding up; update the exception handling block that accesses e.retry_after and
ensure math.ceil is used rather than int to compute the header value.

Comment on lines +1035 to +1044
if raw_logger:
# Assuming response has status_code and headers attributes
# This might need adjustment based on the actual response object
response_headers = response.headers if hasattr(response, "headers") else None
status_code = response.status_code if hasattr(response, "status_code") else 200
raw_logger.log_final_response(
status_code=status_code,
headers=response_headers,
body=response.model_dump(),
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Don’t let raw logging break a successful completion response.

At Line [1043], response.model_dump() is unconditional. If a provider/plugin returns a non-Pydantic object, this can raise and turn a successful upstream response into a 500 when raw logging is enabled.

🔧 Proposed fix
         if raw_logger:
             # Assuming response has status_code and headers attributes
             # This might need adjustment based on the actual response object
             response_headers = response.headers if hasattr(response, "headers") else None
             status_code = response.status_code if hasattr(response, "status_code") else 200
+            if hasattr(response, "model_dump"):
+                response_body = response.model_dump()
+            elif isinstance(response, dict):
+                response_body = response
+            else:
+                response_body = {"response": str(response)}
             raw_logger.log_final_response(
                 status_code=status_code,
                 headers=response_headers,
-                body=response.model_dump(),
+                body=response_body,
             )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/proxy_app/main.py` around lines 1035 - 1044, The raw logging calls can
raise when response.model_dump() is not present and thus convert a successful
response into a 500; update the block that calls raw_logger.log_final_response
to guard the model_dump call (e.g., check hasattr(response, "model_dump") or use
a try/except) and pass a safe fallback (like the raw response or its str/repr)
for the body so logging never raises; modify the code referencing raw_logger,
response.model_dump(), response.headers and status_code to safely derive body
without throwing and then call raw_logger.log_final_response with the safe body
value.

Repository owner deleted a comment from coderabbitai bot Mar 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants