Skip to content

fix(embeddings): add 429 backoff to stop 31K Sentry event flood (#2898)#2904

Merged
graycyrus merged 9 commits into
tinyhumansai:mainfrom
YOMXXX:fix/2898-embedding-429-backoff
May 29, 2026
Merged

fix(embeddings): add 429 backoff to stop 31K Sentry event flood (#2898)#2904
graycyrus merged 9 commits into
tinyhumansai:mainfrom
YOMXXX:fix/2898-embedding-429-backoff

Conversation

@YOMXXX
Copy link
Copy Markdown
Contributor

@YOMXXX YOMXXX commented May 29, 2026

Summary

Fixes #2898 — the project's noisiest Sentry issue (TAURI-RUST-3, 31,328 events across 0.54.0 → 0.56.0).

  • Wraps OpenAiEmbedding::embed and CohereEmbedding::embed in a retry loop that handles 429 and 503 with Retry-After header awareness and exponential fallback (1 s → 2 s → 4 s, capped at 30 s, max 3 retries / 4 attempts).
  • Preserves the canonical "... API error (429 …): …" bail message so the existing core::observability::ExpectedErrorKind::TransientUpstreamHttp classifier continues to demote these to warn! breadcrumbs (no Sentry event). A new classification test pins all four canonical message shapes against that classifier.

Why 31K events with the classifier already in place?

Sentry suppression was already correct via the "(429 " substring token in TRANSIENT_STATUS_MESSAGE_TOKENS. The flood came from no backoff — each 429 looped back into the next request immediately, generating breadcrumbs and tag events at high rate. Backoff alone is expected to cut the volume by 99 %+.

Files

File Change
src/openhuman/embeddings/retry_after.rs New helper: parse_retry_after_ms (delta-seconds form; HTTP-date intentionally not parsed) + backoff_ms_for_attempt + 11 unit tests
src/openhuman/embeddings/openai.rs Retry loop on 429/503 with Retry-After-aware backoff
src/openhuman/embeddings/cohere.rs Same retry loop mirrored; `base_url` made injectable via `#[cfg(test)] with_base_url` so the new backoff tests drive real production code (not a shadow impl)
src/openhuman/embeddings/openai_tests.rs 3 async backoff tests (Retry-After success, indefinite cap, no-header fallback)
src/core/observability.rs 1 classification proof test

Test plan

  • pnpm debug rust embed_429 and pnpm debug rust embedding_429 — 6 + 1 passing locally (after GGML_NATIVE=OFF workaround for the unrelated whisper-rs-sys -mcpu=native clang issue on this dev box)

Scope notes

  • Voyage / Ollama / cloud-dispatch: not touched. Voyage wraps OpenAiEmbedding and inherits the fix; Ollama is loopback (rate-limit-exempt); cloud just dispatches.
  • No new deps. Uses reqwest::header::RETRY_AFTER and tokio::time::sleep.
  • Bail message unchanged: matches (429 substring so TransientUpstreamHttp suppression still fires after cap exhaustion.

Summary by CodeRabbit

  • New Features

    • Embedding requests now retry on 429/503 responses (bounded attempts), honor Retry-After headers, and fall back to exponential backoff.
  • Bug Fixes

    • 429 rate-limit responses are classified and handled consistently as transient, preserving suppression behavior for re-emitted errors.
  • Tests

    • Added integration and unit tests for retry behavior, Retry-After parsing, backoff timing, and success/failure scenarios.

Review Change Stack

YOMXXX added 6 commits May 29, 2026 13:21
…inyhumansai#2898)

Introduces a reactive retry loop in OpenAiEmbedding::embed that retries on
429 and 503 up to MAX_429_RETRIES (3) times before bailing.  Retry-After
delta-seconds from the server are honoured when present; exponential backoff
(1 s, 2 s, 4 s, capped at 30 s) is used as fallback.  The bail message
preserves the canonical "(429 …)" shape so the TransientUpstreamHttp
classifier in core::observability continues to demote it to a warning
breadcrumb rather than a Sentry error event.

Backoff logic is extracted into retry_after.rs so cohere.rs can share it
without duplication.
Mirrors the openai.rs retry loop in CohereEmbedding::embed using the same
retry_after::backoff_ms_for_attempt helper.  Cohere's canonical error shape
"Cohere embed API error (429 …)" also matches the api error (429 prefix
checked by is_transient_upstream_http_message, so Sentry suppression is
preserved on the bail path.
…reamHttp

Proves that the canonical error messages produced by openai.rs and cohere.rs
on 429 — both the live-path format and the retry-cap bail format — are already
matched by is_transient_upstream_http_message via the "(429 " token, and
therefore routed to warn! instead of a Sentry error event.
Three new async tests for OpenAiEmbedding::embed:
- embed_429_retries_with_retry_after_then_succeeds: mock returns 429×2 then
  200, verifies exactly 3 requests and correct embedding result.
- embed_429_indefinite_bails_after_retry_cap: mock always 429, verifies
  MAX_429_RETRIES+1 requests and canonical transient error message.
- embed_429_without_retry_after_uses_exponential_backoff: unit-tests
  backoff_ms_for_attempt helper directly plus an end-to-end 429→200 round-trip.

Two new tests for CohereEmbedding via TestCohereEmbedding wrapper:
- cohere_embed_429_then_success
- cohere_embed_429_indefinite_bails_after_cap
… (no shadow impl)

Add a `base_url` field to `CohereEmbedding` (defaulting to `COHERE_API_BASE`
in the public constructor, no caller breakage) and a `#[cfg(test)]`
`with_base_url` builder so tests can point at a local axum mock server.

Delete `TestCohereEmbedding` — the shadow implementation that duplicated the
full retry loop, header construction, 429 detection, and Retry-After parsing.
Rewrite both 429 backoff tests to construct a real `CohereEmbedding` via
`with_base_url` and call `CohereEmbedding::embed` directly, mirroring the
pattern already used by the OpenAI 429 tests.
…st-only base url

Replace the dead post-loop anyhow::bail! in both openai.rs and cohere.rs
with unreachable! and an explanatory comment.  The bail was structurally
unreachable: on the final attempt the retryable guard (attempt <
MAX_429_RETRIES) is false, so execution falls into the existing non-2xx
branch which already bails with the body-bearing "... (429 ...): <body>"
format — the format that tests assert and that TransientUpstreamHttp in
core::observability uses to suppress the Sentry event.

Also: document why CohereEmbedding::with_base_url is cfg(test)-only on the
method itself (OpenAI takes its base URL via new(); Cohere historically
hardcoded COHERE_API_BASE so the builder fills the gap for 429 tests), and
log the retried response body at debug level instead of discarding it.
@YOMXXX YOMXXX requested a review from a team May 29, 2026 05:47
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 29, 2026

📝 Walkthrough

Walkthrough

Adds a Retry-After/backoff utility and uses it in OpenAI and Cohere embedding clients to retry HTTP 429/503 with Retry-After precedence and exponential fallback; observability is updated with a regression test classifying these 429s as transient.

Changes

Embedding 429 Retry and Backoff

Layer / File(s) Summary
Retry-After header parsing and backoff utility module
src/openhuman/embeddings/retry_after.rs
New module with constants (MAX_429_RETRIES, BASE_BACKOFF_MS, MAX_BACKOFF_MS) and functions parse_retry_after_ms() and backoff_ms_for_attempt() to parse delta-seconds Retry-After headers and compute exponential backoff with saturation and capping. Includes unit tests for parsing edge cases and backoff selection.
Embeddings module export
src/openhuman/embeddings/mod.rs
Exports new retry_after submodule as public.
OpenAI embedding 429/503 retry implementation & tests
src/openhuman/embeddings/openai.rs, src/openhuman/embeddings/openai_tests.rs
Adds retry loop in embed() for HTTP 429/503 responses, respects Retry-After headers, uses exponential backoff when headers absent, moves acquire_embedding_slot inside the loop, reports non-success via core::observability::report_error_or_expected, and parses/validates embeddings within retry control flow. Adds Tokio tests for Retry-After:0 recovery, retry-cap bailout with transient classification, and exponential-backoff-path timing.
Cohere embedding 429/503 retry implementation & tests
src/openhuman/embeddings/cohere.rs
Adds base_url field to CohereEmbedding with test-only with_base_url() builder; implements retry loop in embed() mirroring OpenAI pattern: Retry-After parsing, exponential backoff fallback, per-attempt rate-slot acquisition, observability error reporting, and embedding validation. Includes Axum mock-server tests asserting recovery and retry-cap behavior with exact request counts.
Observability classification test for 429 responses
src/core/observability.rs
Regression test validates both OpenAI and Cohere HTTP 429 messages—canonical forms and retry-cap variants—classify as TransientUpstreamHttp to suppress Sentry error events.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant EmbeddingAPI as OpenAI/Cohere API
  participant RetryUtils
  participant Observability
  Client->>EmbeddingAPI: POST /embeddings (attempt N)
  EmbeddingAPI-->>Client: 429/503 (+ optional Retry-After)
  Client->>RetryUtils: parse_retry_after_ms / backoff_ms_for_attempt
  RetryUtils-->>Client: delay ms
  Client->>Client: sleep(delay)
  alt retries exhausted or non-retryable
    Client->>Observability: report_error_or_expected(status, body)
    Client-->>Client: return error
  else 200 OK
    EmbeddingAPI-->>Client: 200 + embeddings
    Client-->>Client: parse, validate, return embeddings
  end
Loading

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly Related PRs

  • tinyhumansai/openhuman#2461: Introduced a pre-request rate-limit gate for embeddings that complements the reactive retry/backoff changes here.
  • tinyhumansai/openhuman#2830: Related changes to ExpectedErrorKind transient classification used by the new observability test.
  • tinyhumansai/openhuman#2190: Intersects with routing embedding non-2xx responses through report_error_or_expected for expected-error classification.

Suggested Reviewers

  • graycyrus
  • oxoxDev

"I am a rabbit, small and spry,
I wait when Retry-After says 'try',
Exponential hops slow my pace,
Sentry quieted, peace in place,
🐇⏱️"

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fix(embeddings): add 429 backoff to stop 31K Sentry event flood' directly reflects the main changes—adding 429 backoff logic and suppressing Sentry events—matching the PR's primary objective.
Linked Issues check ✅ Passed The PR addresses all primary objectives from issue #2898: adds exponential backoff for 429/503 retries respecting Retry-After, ensures 429s are classified as transient to suppress Sentry events, includes unit tests, and maintains expected error message shape.
Out of Scope Changes check ✅ Passed All changes are within scope: retry logic in OpenAI/Cohere embedding clients, new retry_after helper module, observability test for classification, and test infrastructure—no unrelated modifications detected.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot added rust-core Core Rust runtime in src/: CLI, core_server, shared infrastructure. sentry-traced-bug Bug identified via Sentry triage bug labels May 29, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/openhuman/embeddings/openai_tests.rs (1)

233-241: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

This 429 test now sleeps ~7s due to the new retry loop.

This mock returns 429 with no Retry-After, so the new backoff loop retries with the exponential fallback (1s + 2s + 4s ≈ 7s of real tokio::time::sleep) before bailing. Add Retry-After: 0 to keep CI fast while preserving the canonical-format assertions.

💚 Keep the assertions, drop the wall-clock delay
     let app = Router::new().route(
         "/v1/embeddings",
-        post(|| async {
-            (
-                StatusCode::TOO_MANY_REQUESTS,
-                r#"{"error":{"message":"Rate limit exceeded.","type":"rate_limit_error"}}"#,
-            )
-        }),
+        post(|| async {
+            axum::response::Response::builder()
+                .status(StatusCode::TOO_MANY_REQUESTS)
+                .header("Retry-After", "0")
+                .body(axum::body::Body::from(
+                    r#"{"error":{"message":"Rate limit exceeded.","type":"rate_limit_error"}}"#,
+                ))
+                .unwrap()
+        }),
     );
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/embeddings/openai_tests.rs` around lines 233 - 241, The 429
mock handler for the "/v1/embeddings" route causes long real sleeps because it
doesn't set a Retry-After header; update the Router::new().route handler so it
returns a 429 response that includes the HTTP header "Retry-After: 0" alongside
the body (i.e., modify the closure used in Router::new().route for
"/v1/embeddings" to include the Retry-After header in the response
tuple/object), preserving the same status and JSON body and keeping the existing
assertions.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/openhuman/embeddings/openai_tests.rs`:
- Around line 587-596: The test comment is misleading: the mock response inside
the branch guarding `if *n == 1` actually sets a "Retry-After: 0" header so the
HTTP round-trip exercises the header-based path rather than the no-header
exponential backoff; update the comment to state that this branch returns a
0-second Retry-After header (so the header path is taken) and, if you intend to
exercise the no-header exponential fallback end-to-end, remove the header here
or add a separate case that omits the "Retry-After" header; reference the `if *n
== 1` branch and the `backoff_ms_for_attempt` assertions so readers understand
which parts test header vs. exponential behavior.

In `@src/openhuman/embeddings/openai.rs`:
- Around line 127-174: The current embed function acquires a single rate-limit
token once before the retry loop, but each HTTP attempt (including retries) must
consume a token; modify embed so it calls
super::rate_limit::acquire_embedding_slot(&self.base_url).await inside the retry
loop immediately before performing req.send().await (or the request send path),
ensuring each attempt invokes TokenBucket::acquire and therefore every retry is
budgeted by the shared per-endpoint bucket; remove or adjust the single pre-loop
acquire to avoid double-consumption or leave it out if you move consumption into
the loop.

---

Outside diff comments:
In `@src/openhuman/embeddings/openai_tests.rs`:
- Around line 233-241: The 429 mock handler for the "/v1/embeddings" route
causes long real sleeps because it doesn't set a Retry-After header; update the
Router::new().route handler so it returns a 429 response that includes the HTTP
header "Retry-After: 0" alongside the body (i.e., modify the closure used in
Router::new().route for "/v1/embeddings" to include the Retry-After header in
the response tuple/object), preserving the same status and JSON body and keeping
the existing assertions.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 18bb235f-052d-4d7a-b0a2-c213d61e74f8

📥 Commits

Reviewing files that changed from the base of the PR and between db3fdc2 and 22f5266.

📒 Files selected for processing (6)
  • src/core/observability.rs
  • src/openhuman/embeddings/cohere.rs
  • src/openhuman/embeddings/mod.rs
  • src/openhuman/embeddings/openai.rs
  • src/openhuman/embeddings/openai_tests.rs
  • src/openhuman/embeddings/retry_after.rs

Comment thread src/openhuman/embeddings/openai_tests.rs
Comment thread src/openhuman/embeddings/openai.rs
Copy link
Copy Markdown
Contributor

@graycyrus graycyrus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@YOMXXX hey! the code looks solid — clean retry implementation, Retry-After header parsing is correct, the unreachable! macro is properly justified, and the test coverage is thorough across both providers. CI is still pending on Rust Quality (fmt+clippy), core coverage, and the Tauri build, so i'll hold off on approving until those are green.

One thing worth flagging that CodeRabbit didn't catch: after cap exhaustion on a sustained 503, the bail message is formatted as "Embedding API error (503 Service Unavailable): ...". This won't match the (429 substring token in is_transient_upstream_http_message, so a persistent 503 burst would still fire Sentry events rather than demoting to a warn breadcrumb. If you want 503 cap-exhaustion bails to be suppressed the same way 429s are, the classifier needs (503 added to TRANSIENT_STATUS_MESSAGE_TOKENS (or an equivalent check), and the observability test should cover the 503 bail message shapes. If sustained 503 should remain alertable (reasonable case — service down is different from rate limited), just add a comment clarifying that intent so it doesn't look like an oversight.

On CodeRabbit's finding about the rate-limit slot being acquired once before the retry loop: that one's valid and should be addressed before this merges — retries bypass acquire_embedding_slot, which could let concurrent callers exceed the per-endpoint cap and trigger additional 429s.

Once CI is green and the open threads are resolved, happy to approve.

YOMXXX added 2 commits May 29, 2026 15:21
…+cohere

The pre-loop `acquire_embedding_slot` call only consumed one token per `embed`
invocation, so retried 429/503 attempts bypassed the per-endpoint budget and
concurrent callers could exceed the intended cap — ironically triggering more
429s. Move the acquire inside the loop so every HTTP attempt (initial + each
retry) is budgeted by the shared token bucket. Loopback endpoints stay exempt
as before.
…ter is absent

The previous "no Retry-After" round-trip still set `Retry-After: 0` on the mock,
so the embed loop never actually called `backoff_ms_for_attempt(.., None)` — the
exponential fallback was only exercised by unit tests in `retry_after.rs`. Add a
dedicated round-trip test that omits the header entirely so the loop must take
the `BASE_BACKOFF_MS` branch, and assert the elapsed wait reflects the schedule.
Also correct the misleading comment in the existing test so it points at the
real coverage for the no-header branch.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
src/openhuman/embeddings/openai_tests.rs (1)

1-690: ⚖️ Poor tradeoff

Test file exceeds the 500-line guideline.

This file is 690 lines, which exceeds the guideline by ~38%. Consider splitting into separate test modules (e.g., openai_tests_basic.rs, openai_tests_errors.rs, openai_tests_retry.rs) to improve maintainability and align with the repository's file-size standard.

As per coding guidelines: File size: prefer ≤ ~500 lines per source file; split modules when growing to maintain readability and single responsibility.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/embeddings/openai_tests.rs` around lines 1 - 690, The test file
openai_tests.rs is too large (~690 lines); split it into smaller focused test
modules (e.g., openai_tests_basic.rs for constructor/URL tests like
trailing_slash_stripped, dimensions_custom, accessors, url_*;
openai_tests_errors.rs for error-path tests like embed_missing_data_field,
embed_missing_embedding_field_in_item, embed_non_numeric_value_errors,
embed_count_mismatch, embed_dimension_mismatch, embed_malformed_json,
embed_connection_refused; and openai_tests_retry.rs for 429/backoff tests like
embed_429_retries_with_retry_after_then_succeeds,
embed_429_indefinite_bails_after_retry_cap,
embed_429_without_retry_after_uses_exponential_backoff,
embed_429_no_retry_after_header_falls_back_to_exponential), move the shared
helper start_mock and common imports into a new test utils module (e.g., mod
test_util with start_mock and shared use statements) and update each new test
file to import that utility and the same symbols (OpenAiEmbedding,
retry_after::*, core::observability) so tests compile and maintain the original
grouping and assertions.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@src/openhuman/embeddings/openai_tests.rs`:
- Around line 1-690: The test file openai_tests.rs is too large (~690 lines);
split it into smaller focused test modules (e.g., openai_tests_basic.rs for
constructor/URL tests like trailing_slash_stripped, dimensions_custom,
accessors, url_*; openai_tests_errors.rs for error-path tests like
embed_missing_data_field, embed_missing_embedding_field_in_item,
embed_non_numeric_value_errors, embed_count_mismatch, embed_dimension_mismatch,
embed_malformed_json, embed_connection_refused; and openai_tests_retry.rs for
429/backoff tests like embed_429_retries_with_retry_after_then_succeeds,
embed_429_indefinite_bails_after_retry_cap,
embed_429_without_retry_after_uses_exponential_backoff,
embed_429_no_retry_after_header_falls_back_to_exponential), move the shared
helper start_mock and common imports into a new test utils module (e.g., mod
test_util with start_mock and shared use statements) and update each new test
file to import that utility and the same symbols (OpenAiEmbedding,
retry_after::*, core::observability) so tests compile and maintain the original
grouping and assertions.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: bf1b6437-76c5-42ab-8b3a-8b5a924b93f0

📥 Commits

Reviewing files that changed from the base of the PR and between 38daea6 and 4b77c8f.

📒 Files selected for processing (3)
  • src/openhuman/embeddings/cohere.rs
  • src/openhuman/embeddings/openai.rs
  • src/openhuman/embeddings/openai_tests.rs
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/openhuman/embeddings/cohere.rs
  • src/openhuman/embeddings/openai.rs

@YOMXXX
Copy link
Copy Markdown
Contributor Author

YOMXXX commented May 29, 2026

@graycyrus all CI checks are green now (26/26 pass) and CodeRabbit re-approved after the rate-limit-slot-per-retry fix + the new end-to-end no-Retry-After test. Pinging in case this fell off your radar — happy to address anything else if needed.

@graycyrus graycyrus merged commit e9882f6 into tinyhumansai:main May 29, 2026
33 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug rust-core Core Rust runtime in src/: CLI, core_server, shared infrastructure. sentry-traced-bug Bug identified via Sentry triage

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Embedding API 429 floods Sentry with 31K events — no rate limit backoff

2 participants