fix(embeddings): add 429 backoff to stop 31K Sentry event flood (#2898) by YOMXXX · Pull Request #2904 · tinyhumansai/openhuman

YOMXXX · 2026-05-29T05:47:22Z

Summary

Fixes #2898 — the project's noisiest Sentry issue (TAURI-RUST-3, 31,328 events across 0.54.0 → 0.56.0).

Wraps OpenAiEmbedding::embed and CohereEmbedding::embed in a retry loop that handles 429 and 503 with Retry-After header awareness and exponential fallback (1 s → 2 s → 4 s, capped at 30 s, max 3 retries / 4 attempts).
Preserves the canonical "... API error (429 …): …" bail message so the existing core::observability::ExpectedErrorKind::TransientUpstreamHttp classifier continues to demote these to warn! breadcrumbs (no Sentry event). A new classification test pins all four canonical message shapes against that classifier.

Why 31K events with the classifier already in place?

Sentry suppression was already correct via the "(429 " substring token in TRANSIENT_STATUS_MESSAGE_TOKENS. The flood came from no backoff — each 429 looped back into the next request immediately, generating breadcrumbs and tag events at high rate. Backoff alone is expected to cut the volume by 99 %+.

Files

File	Change
`src/openhuman/embeddings/retry_after.rs`	New helper: `parse_retry_after_ms` (delta-seconds form; HTTP-date intentionally not parsed) + `backoff_ms_for_attempt` + 11 unit tests
`src/openhuman/embeddings/openai.rs`	Retry loop on 429/503 with Retry-After-aware backoff
`src/openhuman/embeddings/cohere.rs`	Same retry loop mirrored; `base_url` made injectable via `#[cfg(test)] with_base_url` so the new backoff tests drive real production code (not a shadow impl)
`src/openhuman/embeddings/openai_tests.rs`	3 async backoff tests (Retry-After success, indefinite cap, no-header fallback)
`src/core/observability.rs`	1 classification proof test

Test plan

pnpm debug rust embed_429 and pnpm debug rust embedding_429 — 6 + 1 passing locally (after GGML_NATIVE=OFF workaround for the unrelated whisper-rs-sys -mcpu=native clang issue on this dev box)

Scope notes

Voyage / Ollama / cloud-dispatch: not touched. Voyage wraps OpenAiEmbedding and inherits the fix; Ollama is loopback (rate-limit-exempt); cloud just dispatches.
No new deps. Uses reqwest::header::RETRY_AFTER and tokio::time::sleep.
Bail message unchanged: matches (429 substring so TransientUpstreamHttp suppression still fires after cap exhaustion.

Summary by CodeRabbit

New Features
- Embedding requests now retry on 429/503 responses (bounded attempts), honor Retry-After headers, and fall back to exponential backoff.
Bug Fixes
- 429 rate-limit responses are classified and handled consistently as transient, preserving suppression behavior for re-emitted errors.
Tests
- Added integration and unit tests for retry behavior, Retry-After parsing, backoff timing, and success/failure scenarios.

…inyhumansai#2898) Introduces a reactive retry loop in OpenAiEmbedding::embed that retries on 429 and 503 up to MAX_429_RETRIES (3) times before bailing. Retry-After delta-seconds from the server are honoured when present; exponential backoff (1 s, 2 s, 4 s, capped at 30 s) is used as fallback. The bail message preserves the canonical "(429 …)" shape so the TransientUpstreamHttp classifier in core::observability continues to demote it to a warning breadcrumb rather than a Sentry error event. Backoff logic is extracted into retry_after.rs so cohere.rs can share it without duplication.

Mirrors the openai.rs retry loop in CohereEmbedding::embed using the same retry_after::backoff_ms_for_attempt helper. Cohere's canonical error shape "Cohere embed API error (429 …)" also matches the api error (429 prefix checked by is_transient_upstream_http_message, so Sentry suppression is preserved on the bail path.

…reamHttp Proves that the canonical error messages produced by openai.rs and cohere.rs on 429 — both the live-path format and the retry-cap bail format — are already matched by is_transient_upstream_http_message via the "(429 " token, and therefore routed to warn! instead of a Sentry error event.

Three new async tests for OpenAiEmbedding::embed: - embed_429_retries_with_retry_after_then_succeeds: mock returns 429×2 then 200, verifies exactly 3 requests and correct embedding result. - embed_429_indefinite_bails_after_retry_cap: mock always 429, verifies MAX_429_RETRIES+1 requests and canonical transient error message. - embed_429_without_retry_after_uses_exponential_backoff: unit-tests backoff_ms_for_attempt helper directly plus an end-to-end 429→200 round-trip. Two new tests for CohereEmbedding via TestCohereEmbedding wrapper: - cohere_embed_429_then_success - cohere_embed_429_indefinite_bails_after_cap

… (no shadow impl) Add a `base_url` field to `CohereEmbedding` (defaulting to `COHERE_API_BASE` in the public constructor, no caller breakage) and a `#[cfg(test)]` `with_base_url` builder so tests can point at a local axum mock server. Delete `TestCohereEmbedding` — the shadow implementation that duplicated the full retry loop, header construction, 429 detection, and Retry-After parsing. Rewrite both 429 backoff tests to construct a real `CohereEmbedding` via `with_base_url` and call `CohereEmbedding::embed` directly, mirroring the pattern already used by the OpenAI 429 tests.

…st-only base url Replace the dead post-loop anyhow::bail! in both openai.rs and cohere.rs with unreachable! and an explanatory comment. The bail was structurally unreachable: on the final attempt the retryable guard (attempt < MAX_429_RETRIES) is false, so execution falls into the existing non-2xx branch which already bails with the body-bearing "... (429 ...): <body>" format — the format that tests assert and that TransientUpstreamHttp in core::observability uses to suppress the Sentry event. Also: document why CohereEmbedding::with_base_url is cfg(test)-only on the method itself (OpenAI takes its base URL via new(); Cohere historically hardcoded COHERE_API_BASE so the builder fills the gap for 429 tests), and log the retried response body at debug level instead of discarding it.

coderabbitai · 2026-05-29T05:47:37Z

📝 Walkthrough

Walkthrough

Adds a Retry-After/backoff utility and uses it in OpenAI and Cohere embedding clients to retry HTTP 429/503 with Retry-After precedence and exponential fallback; observability is updated with a regression test classifying these 429s as transient.

Changes

Embedding 429 Retry and Backoff

Layer / File(s)	Summary
Retry-After header parsing and backoff utility module `src/openhuman/embeddings/retry_after.rs`	New module with constants (`MAX_429_RETRIES`, `BASE_BACKOFF_MS`, `MAX_BACKOFF_MS`) and functions `parse_retry_after_ms()` and `backoff_ms_for_attempt()` to parse delta-seconds Retry-After headers and compute exponential backoff with saturation and capping. Includes unit tests for parsing edge cases and backoff selection.
Embeddings module export `src/openhuman/embeddings/mod.rs`	Exports new `retry_after` submodule as public.
OpenAI embedding 429/503 retry implementation & tests `src/openhuman/embeddings/openai.rs`, `src/openhuman/embeddings/openai_tests.rs`	Adds retry loop in `embed()` for HTTP 429/503 responses, respects `Retry-After` headers, uses exponential backoff when headers absent, moves `acquire_embedding_slot` inside the loop, reports non-success via `core::observability::report_error_or_expected`, and parses/validates embeddings within retry control flow. Adds Tokio tests for Retry-After:0 recovery, retry-cap bailout with transient classification, and exponential-backoff-path timing.
Cohere embedding 429/503 retry implementation & tests `src/openhuman/embeddings/cohere.rs`	Adds `base_url` field to `CohereEmbedding` with test-only `with_base_url()` builder; implements retry loop in `embed()` mirroring OpenAI pattern: Retry-After parsing, exponential backoff fallback, per-attempt rate-slot acquisition, observability error reporting, and embedding validation. Includes Axum mock-server tests asserting recovery and retry-cap behavior with exact request counts.
Observability classification test for 429 responses `src/core/observability.rs`	Regression test validates both OpenAI and Cohere HTTP 429 messages—canonical forms and retry-cap variants—classify as `TransientUpstreamHttp` to suppress Sentry error events.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant EmbeddingAPI as OpenAI/Cohere API
  participant RetryUtils
  participant Observability
  Client->>EmbeddingAPI: POST /embeddings (attempt N)
  EmbeddingAPI-->>Client: 429/503 (+ optional Retry-After)
  Client->>RetryUtils: parse_retry_after_ms / backoff_ms_for_attempt
  RetryUtils-->>Client: delay ms
  Client->>Client: sleep(delay)
  alt retries exhausted or non-retryable
    Client->>Observability: report_error_or_expected(status, body)
    Client-->>Client: return error
  else 200 OK
    EmbeddingAPI-->>Client: 200 + embeddings
    Client-->>Client: parse, validate, return embeddings
  end

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly Related PRs

tinyhumansai/openhuman#2461: Introduced a pre-request rate-limit gate for embeddings that complements the reactive retry/backoff changes here.
tinyhumansai/openhuman#2830: Related changes to ExpectedErrorKind transient classification used by the new observability test.
tinyhumansai/openhuman#2190: Intersects with routing embedding non-2xx responses through report_error_or_expected for expected-error classification.

Suggested Reviewers

graycyrus
oxoxDev

"I am a rabbit, small and spry,
I wait when Retry-After says 'try',
Exponential hops slow my pace,
Sentry quieted, peace in place,
🐇⏱️"

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'fix(embeddings): add 429 backoff to stop 31K Sentry event flood' directly reflects the main changes—adding 429 backoff logic and suppressing Sentry events—matching the PR's primary objective.
Linked Issues check	✅ Passed	The PR addresses all primary objectives from issue `#2898`: adds exponential backoff for 429/503 retries respecting Retry-After, ensures 429s are classified as transient to suppress Sentry events, includes unit tests, and maintains expected error message shape.
Out of Scope Changes check	✅ Passed	All changes are within scope: retry logic in OpenAI/Cohere embedding clients, new retry_after helper module, observability test for classification, and test infrastructure—no unrelated modifications detected.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/openhuman/embeddings/openai_tests.rs (1)

233-241: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

This 429 test now sleeps ~7s due to the new retry loop.

This mock returns 429 with no Retry-After, so the new backoff loop retries with the exponential fallback (1s + 2s + 4s ≈ 7s of real tokio::time::sleep) before bailing. Add Retry-After: 0 to keep CI fast while preserving the canonical-format assertions.

💚 Keep the assertions, drop the wall-clock delay

     let app = Router::new().route(
         "/v1/embeddings",
-        post(|| async {
-            (
-                StatusCode::TOO_MANY_REQUESTS,
-                r#"{"error":{"message":"Rate limit exceeded.","type":"rate_limit_error"}}"#,
-            )
-        }),
+        post(|| async {
+            axum::response::Response::builder()
+                .status(StatusCode::TOO_MANY_REQUESTS)
+                .header("Retry-After", "0")
+                .body(axum::body::Body::from(
+                    r#"{"error":{"message":"Rate limit exceeded.","type":"rate_limit_error"}}"#,
+                ))
+                .unwrap()
+        }),
     );

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/embeddings/openai_tests.rs` around lines 233 - 241, The 429
mock handler for the "/v1/embeddings" route causes long real sleeps because it
doesn't set a Retry-After header; update the Router::new().route handler so it
returns a 429 response that includes the HTTP header "Retry-After: 0" alongside
the body (i.e., modify the closure used in Router::new().route for
"/v1/embeddings" to include the Retry-After header in the response
tuple/object), preserving the same status and JSON body and keeping the existing
assertions.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/openhuman/embeddings/openai_tests.rs`:
- Around line 587-596: The test comment is misleading: the mock response inside
the branch guarding `if *n == 1` actually sets a "Retry-After: 0" header so the
HTTP round-trip exercises the header-based path rather than the no-header
exponential backoff; update the comment to state that this branch returns a
0-second Retry-After header (so the header path is taken) and, if you intend to
exercise the no-header exponential fallback end-to-end, remove the header here
or add a separate case that omits the "Retry-After" header; reference the `if *n
== 1` branch and the `backoff_ms_for_attempt` assertions so readers understand
which parts test header vs. exponential behavior.

In `@src/openhuman/embeddings/openai.rs`:
- Around line 127-174: The current embed function acquires a single rate-limit
token once before the retry loop, but each HTTP attempt (including retries) must
consume a token; modify embed so it calls
super::rate_limit::acquire_embedding_slot(&self.base_url).await inside the retry
loop immediately before performing req.send().await (or the request send path),
ensuring each attempt invokes TokenBucket::acquire and therefore every retry is
budgeted by the shared per-endpoint bucket; remove or adjust the single pre-loop
acquire to avoid double-consumption or leave it out if you move consumption into
the loop.

---

Outside diff comments:
In `@src/openhuman/embeddings/openai_tests.rs`:
- Around line 233-241: The 429 mock handler for the "/v1/embeddings" route
causes long real sleeps because it doesn't set a Retry-After header; update the
Router::new().route handler so it returns a 429 response that includes the HTTP
header "Retry-After: 0" alongside the body (i.e., modify the closure used in
Router::new().route for "/v1/embeddings" to include the Retry-After header in
the response tuple/object), preserving the same status and JSON body and keeping
the existing assertions.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 18bb235f-052d-4d7a-b0a2-c213d61e74f8

📥 Commits

Reviewing files that changed from the base of the PR and between db3fdc2 and 22f5266.

📒 Files selected for processing (6)

src/core/observability.rs
src/openhuman/embeddings/cohere.rs
src/openhuman/embeddings/mod.rs
src/openhuman/embeddings/openai.rs
src/openhuman/embeddings/openai_tests.rs
src/openhuman/embeddings/retry_after.rs

graycyrus

@YOMXXX hey! the code looks solid — clean retry implementation, Retry-After header parsing is correct, the unreachable! macro is properly justified, and the test coverage is thorough across both providers. CI is still pending on Rust Quality (fmt+clippy), core coverage, and the Tauri build, so i'll hold off on approving until those are green.

One thing worth flagging that CodeRabbit didn't catch: after cap exhaustion on a sustained 503, the bail message is formatted as "Embedding API error (503 Service Unavailable): ...". This won't match the (429 substring token in is_transient_upstream_http_message, so a persistent 503 burst would still fire Sentry events rather than demoting to a warn breadcrumb. If you want 503 cap-exhaustion bails to be suppressed the same way 429s are, the classifier needs (503 added to TRANSIENT_STATUS_MESSAGE_TOKENS (or an equivalent check), and the observability test should cover the 503 bail message shapes. If sustained 503 should remain alertable (reasonable case — service down is different from rate limited), just add a comment clarifying that intent so it doesn't look like an oversight.

On CodeRabbit's finding about the rate-limit slot being acquired once before the retry loop: that one's valid and should be addressed before this merges — retries bypass acquire_embedding_slot, which could let concurrent callers exceed the per-endpoint cap and trigger additional 429s.

Once CI is green and the open threads are resolved, happy to approve.

…+cohere The pre-loop `acquire_embedding_slot` call only consumed one token per `embed` invocation, so retried 429/503 attempts bypassed the per-endpoint budget and concurrent callers could exceed the intended cap — ironically triggering more 429s. Move the acquire inside the loop so every HTTP attempt (initial + each retry) is budgeted by the shared token bucket. Loopback endpoints stay exempt as before.

…ter is absent The previous "no Retry-After" round-trip still set `Retry-After: 0` on the mock, so the embed loop never actually called `backoff_ms_for_attempt(.., None)` — the exponential fallback was only exercised by unit tests in `retry_after.rs`. Add a dedicated round-trip test that omits the header entirely so the loop must take the `BASE_BACKOFF_MS` branch, and assert the elapsed wait reflects the schedule. Also correct the misleading comment in the existing test so it points at the real coverage for the no-header branch.

coderabbitai

🧹 Nitpick comments (1)

src/openhuman/embeddings/openai_tests.rs (1)

1-690: ⚖️ Poor tradeoff

Test file exceeds the 500-line guideline.

This file is 690 lines, which exceeds the guideline by ~38%. Consider splitting into separate test modules (e.g., openai_tests_basic.rs, openai_tests_errors.rs, openai_tests_retry.rs) to improve maintainability and align with the repository's file-size standard.

As per coding guidelines: File size: prefer ≤ ~500 lines per source file; split modules when growing to maintain readability and single responsibility.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/embeddings/openai_tests.rs` around lines 1 - 690, The test file
openai_tests.rs is too large (~690 lines); split it into smaller focused test
modules (e.g., openai_tests_basic.rs for constructor/URL tests like
trailing_slash_stripped, dimensions_custom, accessors, url_*;
openai_tests_errors.rs for error-path tests like embed_missing_data_field,
embed_missing_embedding_field_in_item, embed_non_numeric_value_errors,
embed_count_mismatch, embed_dimension_mismatch, embed_malformed_json,
embed_connection_refused; and openai_tests_retry.rs for 429/backoff tests like
embed_429_retries_with_retry_after_then_succeeds,
embed_429_indefinite_bails_after_retry_cap,
embed_429_without_retry_after_uses_exponential_backoff,
embed_429_no_retry_after_header_falls_back_to_exponential), move the shared
helper start_mock and common imports into a new test utils module (e.g., mod
test_util with start_mock and shared use statements) and update each new test
file to import that utility and the same symbols (OpenAiEmbedding,
retry_after::*, core::observability) so tests compile and maintain the original
grouping and assertions.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@src/openhuman/embeddings/openai_tests.rs`:
- Around line 1-690: The test file openai_tests.rs is too large (~690 lines);
split it into smaller focused test modules (e.g., openai_tests_basic.rs for
constructor/URL tests like trailing_slash_stripped, dimensions_custom,
accessors, url_*; openai_tests_errors.rs for error-path tests like
embed_missing_data_field, embed_missing_embedding_field_in_item,
embed_non_numeric_value_errors, embed_count_mismatch, embed_dimension_mismatch,
embed_malformed_json, embed_connection_refused; and openai_tests_retry.rs for
429/backoff tests like embed_429_retries_with_retry_after_then_succeeds,
embed_429_indefinite_bails_after_retry_cap,
embed_429_without_retry_after_uses_exponential_backoff,
embed_429_no_retry_after_header_falls_back_to_exponential), move the shared
helper start_mock and common imports into a new test utils module (e.g., mod
test_util with start_mock and shared use statements) and update each new test
file to import that utility and the same symbols (OpenAiEmbedding,
retry_after::*, core::observability) so tests compile and maintain the original
grouping and assertions.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: bf1b6437-76c5-42ab-8b3a-8b5a924b93f0

📥 Commits

Reviewing files that changed from the base of the PR and between 38daea6 and 4b77c8f.

📒 Files selected for processing (3)

src/openhuman/embeddings/cohere.rs
src/openhuman/embeddings/openai.rs
src/openhuman/embeddings/openai_tests.rs

🚧 Files skipped from review as they are similar to previous changes (2)

src/openhuman/embeddings/cohere.rs
src/openhuman/embeddings/openai.rs

YOMXXX · 2026-05-29T08:57:52Z

@graycyrus all CI checks are green now (26/26 pass) and CodeRabbit re-approved after the rate-limit-slot-per-retry fix + the new end-to-end no-Retry-After test. Pinging in case this fell off your radar — happy to address anything else if needed.

YOMXXX added 6 commits May 29, 2026 13:21

YOMXXX requested a review from a team May 29, 2026 05:47

coderabbitai Bot added rust-core Core Rust runtime in src/: CLI, core_server, shared infrastructure. sentry-traced-bug Bug identified via Sentry triage bug labels May 29, 2026

style(embeddings): cargo fmt

38daea6

coderabbitai Bot requested changes May 29, 2026

View reviewed changes

Comment thread src/openhuman/embeddings/openai_tests.rs

Comment thread src/openhuman/embeddings/openai.rs

graycyrus reviewed May 29, 2026

View reviewed changes

YOMXXX added 2 commits May 29, 2026 15:21

coderabbitai Bot reviewed May 29, 2026

View reviewed changes

coderabbitai Bot approved these changes May 29, 2026

View reviewed changes

graycyrus merged commit e9882f6 into tinyhumansai:main May 29, 2026
33 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(embeddings): add 429 backoff to stop 31K Sentry event flood (#2898)#2904

fix(embeddings): add 429 backoff to stop 31K Sentry event flood (#2898)#2904
graycyrus merged 9 commits into
tinyhumansai:mainfrom
YOMXXX:fix/2898-embedding-429-backoff

YOMXXX commented May 29, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 29, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram(s)

Estimated Code Review Effort

Possibly Related PRs

Suggested Reviewers

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

graycyrus left a comment

Uh oh!

coderabbitai Bot left a comment

Uh oh!

YOMXXX commented May 29, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

YOMXXX commented May 29, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Why 31K events with the classifier already in place?

Files

Test plan

Scope notes

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated Code Review Effort

Possibly Related PRs

Suggested Reviewers

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

graycyrus left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

YOMXXX commented May 29, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

YOMXXX commented May 29, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 29, 2026 •

edited

Loading