fix(observability): suppress 429/rate-limit Sentry noise — TAURI-RUST-3#2899
Conversation
…T-3)
Three-layer suppression for upstream rate-limit events that were
generating ~31k Sentry events:
1. **`observability::is_upstream_rate_limit_message`** (new public predicate)
Catches all three observed wire shapes:
- `"rate_limit_error"` JSON type field (Anthropic/OpenAI envelope)
- `"upstream rate limit exceeded"` (OpenHuman backend wrapping upstream 429)
- `"429 rate limit exceeded"` (numeric-prefix form)
- `"api error (" + "rate limit exceeded"` (provider envelope catch-all)
Polarity contract: does NOT match `security::policy` action-budget message
(`"Rate limit exceeded: action budget exhausted"`) or bare "rate limit exceeded"
without the API error anchor.
2. **`expected_error_kind`** routes rate-limit messages to `TransientUpstreamHttp`
— demotes re-reports from `agent.run_single` / `web_channel.run_chat_task`
/ `rpc.invoke_method` to a `warn`-level breadcrumb instead of a Sentry event.
3. **`ops::api_error`** (primary per-attempt guard): when
`should_report_provider_http_failure(status)` would otherwise fire for a
non-429 status (e.g. HTTP 500), detect rate-limit phrases in the body and
skip `report_error` — logs a `warn` breadcrumb instead.
Targets OPENHUMAN-TAURI-S (~6 984 events: HTTP 500 wrapping
`"429 rate limit exceeded"` body).
4. **`before_send` in `main.rs`**: defense-in-depth filter that drops any future
call site that bypasses (2) or (3).
Covered Sentry issues (all resolved, fixes prevent recurrence):
- OPENHUMAN-TAURI-S ~6 984 events (500 wrapping "429 rate limit exceeded")
- OPENHUMAN-TAURI-6Y ~19 849 events (500 "upstream rate limit exceeded")
- OPENHUMAN-TAURI-2E ~1 482 events ("rate_limit_error" type in body)
- OPENHUMAN-TAURI-RQ ~741 events (embeddings 429 old format)
Tests: 9 new unit tests for the predicate and `expected_error_kind` in
`observability.rs`; 4 new tests in `ops.rs::rate_limit_body_suppression`.
Closes tinyhumansai#2898
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
💤 Files with no reviewable changes (2)
📝 WalkthroughWalkthroughAdds a shared classifier for upstream rate-limit message shapes and applies it in three places: observability routing ( ChangesUpstream rate-limit observability defense
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/main.rs`:
- Around line 105-111: The before_send filter currently lowercases and checks
only event.message and event.logentry for upstream rate-limit text before
calling is_upstream_rate_limit_message; update the filter to also inspect
exception values (e.g., iterate event.exception.values and any nested
Exception.value or Exception.stacktrace message strings) and include those
strings in the combined event_message used for to_ascii_lowercase, so
exception-only events containing the rate-limit text are caught by
openhuman_core::core::observability::is_upstream_rate_limit_message; ensure you
reference the same before_send closure and use event.exception (or
event.exception.values) and each Exception.value/message when building the
string checked.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: f6750fd1-a43b-44a5-811b-f1c1b70aae45
📒 Files selected for processing (3)
src/core/observability.rssrc/main.rssrc/openhuman/inference/provider/ops.rs
…ilter - Merged upstream/main (resolving conflict in src/core/observability.rs by taking upstream and re-applying the PR's rate-limit suppression on top) - CodeRabbit fix: updated the `before_send` rate-limit filter in `src/main.rs` to also inspect `event.exception.values` so exception-only Sentry events carrying rate-limit text are caught, not just message/logentry events (addresses review comment on PR tinyhumansai#2899) - `is_upstream_rate_limit_message` and its tests merged cleanly into the updated observability.rs alongside the new upstream classifiers (ContextWindowExceeded, EmptyProviderResponse, FilesystemUserPathInvalid, MemoryStorePiiRejection, MemoryStoreBreakerOpen, DiskFull)
oxoxDev
left a comment
There was a problem hiding this comment.
LGTM — careful, well-tested rate-limit suppression. The polarity contract is handled right: security::policy "Rate limit exceeded: action budget exhausted" stays reachable (tested wrapped + unwrapped), and bare "rate limit exceeded" without the api error ( envelope is not swallowed. Three-layer suppression (call-site guard → expected_error_kind → before_send net), tight anchors, CI green. Approving.
Nitpick (optional) — main.rs before_send checks only the last exception
event.exception.last().and_then(|e| e.value.as_deref()) inspects only the outermost frame; a rate-limit phrase nested in an earlier exception would slip past this net. Low risk (phrase normally in message/logentry/outermost), but iterating all frames is more complete:
let from_exception = event.exception.iter().filter_map(|e| e.value.as_deref());Nice work citing each Sentry ID + event count and covering the re-report wrapping paths in tests.
Summary
is_upstream_rate_limit_messagepredicate toobservability.rs— matches three observed wire shapes:"rate_limit_error"JSON type,"upstream rate limit exceeded"body phrase, and"429 rate limit exceeded"numeric-prefix formexpected_error_kind → TransientUpstreamHttpso re-reports fromagent.run_single/web_channel.run_chat_task/rpc.invoke_methodare demoted towarnbreadcrumbsops::api_error: when the HTTP status is non-429 (e.g. 500) but the body contains a rate-limit phrase, skipreport_error— covers OPENHUMAN-TAURI-S (~6 984 events: OpenHuman backend returning HTTP 500 wrapping"429 rate limit exceeded")before_senddefense-in-depth filter inmain.rsfor any future call site that bypasses the primary guardsCovered Sentry issues (~28k total events):
Polarity contract: does NOT suppress
security::policyaction-budget message ("Rate limit exceeded: action budget exhausted") or bare "rate limit exceeded" without the API error anchor — those are user-facing hard stops, not upstream quota hits.Test plan
cargo check— clean (pre-existing warnings inslack-backfillbin only)cargo test --lib observability::tests— 100 tests pass including 9 new onescargo test --lib inference::provider::ops::tests— 28 tests pass including 4 new onescargo test --lib— 9776 tests pass, 0 failurescargo fmtappliedCloses #2898
Summary by CodeRabbit
Improvements
Tests