fix(embeddings): add 429 backoff to stop 31K Sentry event flood (#2898)#2904
Conversation
…inyhumansai#2898) Introduces a reactive retry loop in OpenAiEmbedding::embed that retries on 429 and 503 up to MAX_429_RETRIES (3) times before bailing. Retry-After delta-seconds from the server are honoured when present; exponential backoff (1 s, 2 s, 4 s, capped at 30 s) is used as fallback. The bail message preserves the canonical "(429 …)" shape so the TransientUpstreamHttp classifier in core::observability continues to demote it to a warning breadcrumb rather than a Sentry error event. Backoff logic is extracted into retry_after.rs so cohere.rs can share it without duplication.
Mirrors the openai.rs retry loop in CohereEmbedding::embed using the same retry_after::backoff_ms_for_attempt helper. Cohere's canonical error shape "Cohere embed API error (429 …)" also matches the api error (429 prefix checked by is_transient_upstream_http_message, so Sentry suppression is preserved on the bail path.
…reamHttp Proves that the canonical error messages produced by openai.rs and cohere.rs on 429 — both the live-path format and the retry-cap bail format — are already matched by is_transient_upstream_http_message via the "(429 " token, and therefore routed to warn! instead of a Sentry error event.
Three new async tests for OpenAiEmbedding::embed: - embed_429_retries_with_retry_after_then_succeeds: mock returns 429×2 then 200, verifies exactly 3 requests and correct embedding result. - embed_429_indefinite_bails_after_retry_cap: mock always 429, verifies MAX_429_RETRIES+1 requests and canonical transient error message. - embed_429_without_retry_after_uses_exponential_backoff: unit-tests backoff_ms_for_attempt helper directly plus an end-to-end 429→200 round-trip. Two new tests for CohereEmbedding via TestCohereEmbedding wrapper: - cohere_embed_429_then_success - cohere_embed_429_indefinite_bails_after_cap
… (no shadow impl) Add a `base_url` field to `CohereEmbedding` (defaulting to `COHERE_API_BASE` in the public constructor, no caller breakage) and a `#[cfg(test)]` `with_base_url` builder so tests can point at a local axum mock server. Delete `TestCohereEmbedding` — the shadow implementation that duplicated the full retry loop, header construction, 429 detection, and Retry-After parsing. Rewrite both 429 backoff tests to construct a real `CohereEmbedding` via `with_base_url` and call `CohereEmbedding::embed` directly, mirroring the pattern already used by the OpenAI 429 tests.
…st-only base url Replace the dead post-loop anyhow::bail! in both openai.rs and cohere.rs with unreachable! and an explanatory comment. The bail was structurally unreachable: on the final attempt the retryable guard (attempt < MAX_429_RETRIES) is false, so execution falls into the existing non-2xx branch which already bails with the body-bearing "... (429 ...): <body>" format — the format that tests assert and that TransientUpstreamHttp in core::observability uses to suppress the Sentry event. Also: document why CohereEmbedding::with_base_url is cfg(test)-only on the method itself (OpenAI takes its base URL via new(); Cohere historically hardcoded COHERE_API_BASE so the builder fills the gap for 429 tests), and log the retried response body at debug level instead of discarding it.
📝 WalkthroughWalkthroughAdds a Retry-After/backoff utility and uses it in OpenAI and Cohere embedding clients to retry HTTP 429/503 with Retry-After precedence and exponential fallback; observability is updated with a regression test classifying these 429s as transient. ChangesEmbedding 429 Retry and Backoff
Sequence Diagram(s)sequenceDiagram
participant Client
participant EmbeddingAPI as OpenAI/Cohere API
participant RetryUtils
participant Observability
Client->>EmbeddingAPI: POST /embeddings (attempt N)
EmbeddingAPI-->>Client: 429/503 (+ optional Retry-After)
Client->>RetryUtils: parse_retry_after_ms / backoff_ms_for_attempt
RetryUtils-->>Client: delay ms
Client->>Client: sleep(delay)
alt retries exhausted or non-retryable
Client->>Observability: report_error_or_expected(status, body)
Client-->>Client: return error
else 200 OK
EmbeddingAPI-->>Client: 200 + embeddings
Client-->>Client: parse, validate, return embeddings
end
Estimated Code Review Effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly Related PRs
Suggested Reviewers
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/openhuman/embeddings/openai_tests.rs (1)
233-241:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winThis 429 test now sleeps ~7s due to the new retry loop.
This mock returns 429 with no
Retry-After, so the new backoff loop retries with the exponential fallback (1s + 2s + 4s ≈ 7s of realtokio::time::sleep) before bailing. AddRetry-After: 0to keep CI fast while preserving the canonical-format assertions.💚 Keep the assertions, drop the wall-clock delay
let app = Router::new().route( "/v1/embeddings", - post(|| async { - ( - StatusCode::TOO_MANY_REQUESTS, - r#"{"error":{"message":"Rate limit exceeded.","type":"rate_limit_error"}}"#, - ) - }), + post(|| async { + axum::response::Response::builder() + .status(StatusCode::TOO_MANY_REQUESTS) + .header("Retry-After", "0") + .body(axum::body::Body::from( + r#"{"error":{"message":"Rate limit exceeded.","type":"rate_limit_error"}}"#, + )) + .unwrap() + }), );🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/embeddings/openai_tests.rs` around lines 233 - 241, The 429 mock handler for the "/v1/embeddings" route causes long real sleeps because it doesn't set a Retry-After header; update the Router::new().route handler so it returns a 429 response that includes the HTTP header "Retry-After: 0" alongside the body (i.e., modify the closure used in Router::new().route for "/v1/embeddings" to include the Retry-After header in the response tuple/object), preserving the same status and JSON body and keeping the existing assertions.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/openhuman/embeddings/openai_tests.rs`:
- Around line 587-596: The test comment is misleading: the mock response inside
the branch guarding `if *n == 1` actually sets a "Retry-After: 0" header so the
HTTP round-trip exercises the header-based path rather than the no-header
exponential backoff; update the comment to state that this branch returns a
0-second Retry-After header (so the header path is taken) and, if you intend to
exercise the no-header exponential fallback end-to-end, remove the header here
or add a separate case that omits the "Retry-After" header; reference the `if *n
== 1` branch and the `backoff_ms_for_attempt` assertions so readers understand
which parts test header vs. exponential behavior.
In `@src/openhuman/embeddings/openai.rs`:
- Around line 127-174: The current embed function acquires a single rate-limit
token once before the retry loop, but each HTTP attempt (including retries) must
consume a token; modify embed so it calls
super::rate_limit::acquire_embedding_slot(&self.base_url).await inside the retry
loop immediately before performing req.send().await (or the request send path),
ensuring each attempt invokes TokenBucket::acquire and therefore every retry is
budgeted by the shared per-endpoint bucket; remove or adjust the single pre-loop
acquire to avoid double-consumption or leave it out if you move consumption into
the loop.
---
Outside diff comments:
In `@src/openhuman/embeddings/openai_tests.rs`:
- Around line 233-241: The 429 mock handler for the "/v1/embeddings" route
causes long real sleeps because it doesn't set a Retry-After header; update the
Router::new().route handler so it returns a 429 response that includes the HTTP
header "Retry-After: 0" alongside the body (i.e., modify the closure used in
Router::new().route for "/v1/embeddings" to include the Retry-After header in
the response tuple/object), preserving the same status and JSON body and keeping
the existing assertions.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 18bb235f-052d-4d7a-b0a2-c213d61e74f8
📒 Files selected for processing (6)
src/core/observability.rssrc/openhuman/embeddings/cohere.rssrc/openhuman/embeddings/mod.rssrc/openhuman/embeddings/openai.rssrc/openhuman/embeddings/openai_tests.rssrc/openhuman/embeddings/retry_after.rs
graycyrus
left a comment
There was a problem hiding this comment.
@YOMXXX hey! the code looks solid — clean retry implementation, Retry-After header parsing is correct, the unreachable! macro is properly justified, and the test coverage is thorough across both providers. CI is still pending on Rust Quality (fmt+clippy), core coverage, and the Tauri build, so i'll hold off on approving until those are green.
One thing worth flagging that CodeRabbit didn't catch: after cap exhaustion on a sustained 503, the bail message is formatted as "Embedding API error (503 Service Unavailable): ...". This won't match the (429 substring token in is_transient_upstream_http_message, so a persistent 503 burst would still fire Sentry events rather than demoting to a warn breadcrumb. If you want 503 cap-exhaustion bails to be suppressed the same way 429s are, the classifier needs (503 added to TRANSIENT_STATUS_MESSAGE_TOKENS (or an equivalent check), and the observability test should cover the 503 bail message shapes. If sustained 503 should remain alertable (reasonable case — service down is different from rate limited), just add a comment clarifying that intent so it doesn't look like an oversight.
On CodeRabbit's finding about the rate-limit slot being acquired once before the retry loop: that one's valid and should be addressed before this merges — retries bypass acquire_embedding_slot, which could let concurrent callers exceed the per-endpoint cap and trigger additional 429s.
Once CI is green and the open threads are resolved, happy to approve.
…+cohere The pre-loop `acquire_embedding_slot` call only consumed one token per `embed` invocation, so retried 429/503 attempts bypassed the per-endpoint budget and concurrent callers could exceed the intended cap — ironically triggering more 429s. Move the acquire inside the loop so every HTTP attempt (initial + each retry) is budgeted by the shared token bucket. Loopback endpoints stay exempt as before.
…ter is absent The previous "no Retry-After" round-trip still set `Retry-After: 0` on the mock, so the embed loop never actually called `backoff_ms_for_attempt(.., None)` — the exponential fallback was only exercised by unit tests in `retry_after.rs`. Add a dedicated round-trip test that omits the header entirely so the loop must take the `BASE_BACKOFF_MS` branch, and assert the elapsed wait reflects the schedule. Also correct the misleading comment in the existing test so it points at the real coverage for the no-header branch.
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/openhuman/embeddings/openai_tests.rs (1)
1-690: ⚖️ Poor tradeoffTest file exceeds the 500-line guideline.
This file is 690 lines, which exceeds the guideline by ~38%. Consider splitting into separate test modules (e.g.,
openai_tests_basic.rs,openai_tests_errors.rs,openai_tests_retry.rs) to improve maintainability and align with the repository's file-size standard.As per coding guidelines: File size: prefer ≤ ~500 lines per source file; split modules when growing to maintain readability and single responsibility.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/embeddings/openai_tests.rs` around lines 1 - 690, The test file openai_tests.rs is too large (~690 lines); split it into smaller focused test modules (e.g., openai_tests_basic.rs for constructor/URL tests like trailing_slash_stripped, dimensions_custom, accessors, url_*; openai_tests_errors.rs for error-path tests like embed_missing_data_field, embed_missing_embedding_field_in_item, embed_non_numeric_value_errors, embed_count_mismatch, embed_dimension_mismatch, embed_malformed_json, embed_connection_refused; and openai_tests_retry.rs for 429/backoff tests like embed_429_retries_with_retry_after_then_succeeds, embed_429_indefinite_bails_after_retry_cap, embed_429_without_retry_after_uses_exponential_backoff, embed_429_no_retry_after_header_falls_back_to_exponential), move the shared helper start_mock and common imports into a new test utils module (e.g., mod test_util with start_mock and shared use statements) and update each new test file to import that utility and the same symbols (OpenAiEmbedding, retry_after::*, core::observability) so tests compile and maintain the original grouping and assertions.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@src/openhuman/embeddings/openai_tests.rs`:
- Around line 1-690: The test file openai_tests.rs is too large (~690 lines);
split it into smaller focused test modules (e.g., openai_tests_basic.rs for
constructor/URL tests like trailing_slash_stripped, dimensions_custom,
accessors, url_*; openai_tests_errors.rs for error-path tests like
embed_missing_data_field, embed_missing_embedding_field_in_item,
embed_non_numeric_value_errors, embed_count_mismatch, embed_dimension_mismatch,
embed_malformed_json, embed_connection_refused; and openai_tests_retry.rs for
429/backoff tests like embed_429_retries_with_retry_after_then_succeeds,
embed_429_indefinite_bails_after_retry_cap,
embed_429_without_retry_after_uses_exponential_backoff,
embed_429_no_retry_after_header_falls_back_to_exponential), move the shared
helper start_mock and common imports into a new test utils module (e.g., mod
test_util with start_mock and shared use statements) and update each new test
file to import that utility and the same symbols (OpenAiEmbedding,
retry_after::*, core::observability) so tests compile and maintain the original
grouping and assertions.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: bf1b6437-76c5-42ab-8b3a-8b5a924b93f0
📒 Files selected for processing (3)
src/openhuman/embeddings/cohere.rssrc/openhuman/embeddings/openai.rssrc/openhuman/embeddings/openai_tests.rs
🚧 Files skipped from review as they are similar to previous changes (2)
- src/openhuman/embeddings/cohere.rs
- src/openhuman/embeddings/openai.rs
|
@graycyrus all CI checks are green now (26/26 pass) and CodeRabbit re-approved after the rate-limit-slot-per-retry fix + the new end-to-end no-Retry-After test. Pinging in case this fell off your radar — happy to address anything else if needed. |
Summary
Fixes #2898 — the project's noisiest Sentry issue (TAURI-RUST-3, 31,328 events across 0.54.0 → 0.56.0).
OpenAiEmbedding::embedandCohereEmbedding::embedin a retry loop that handles 429 and 503 withRetry-Afterheader awareness and exponential fallback (1 s → 2 s → 4 s, capped at 30 s, max 3 retries / 4 attempts)."... API error (429 …): …"bail message so the existingcore::observability::ExpectedErrorKind::TransientUpstreamHttpclassifier continues to demote these towarn!breadcrumbs (no Sentry event). A new classification test pins all four canonical message shapes against that classifier.Why 31K events with the classifier already in place?
Sentry suppression was already correct via the
"(429 "substring token inTRANSIENT_STATUS_MESSAGE_TOKENS. The flood came from no backoff — each 429 looped back into the next request immediately, generating breadcrumbs and tag events at high rate. Backoff alone is expected to cut the volume by 99 %+.Files
src/openhuman/embeddings/retry_after.rsparse_retry_after_ms(delta-seconds form; HTTP-date intentionally not parsed) +backoff_ms_for_attempt+ 11 unit testssrc/openhuman/embeddings/openai.rssrc/openhuman/embeddings/cohere.rssrc/openhuman/embeddings/openai_tests.rssrc/core/observability.rsTest plan
pnpm debug rust embed_429andpnpm debug rust embedding_429— 6 + 1 passing locally (afterGGML_NATIVE=OFFworkaround for the unrelated whisper-rs-sys-mcpu=nativeclang issue on this dev box)Scope notes
OpenAiEmbeddingand inherits the fix; Ollama is loopback (rate-limit-exempt); cloud just dispatches.reqwest::header::RETRY_AFTERandtokio::time::sleep.(429substring soTransientUpstreamHttpsuppression still fires after cap exhaustion.Summary by CodeRabbit
New Features
Bug Fixes
Tests