fix(observability): demote channel supervisor restart noise (Sentry TAURI-RUST-15/-BB) #2879
…AURI-RUST-15)

Self-hosted Sentry's #1 unresolved tauri-rust issue by event count (`Channel discord error: error sending request for url ...; restarting`, ~11.4k events / 14d) and its Chinese-Windows WSAETIMEDOUT variant (TAURI-RUST-BB, ~815 events) both originate from the channel supervisor loop in `channels::runtime::supervision::spawn_supervised_listener`. The supervisor already restarts the listener with its own exponential backoff; sustained outages still surface through `health.bus` / `FAIL_ESCALATE_THRESHOLD`. Per-restart messages carry no actionable Sentry signal.

Previous path: `expected_error_kind` matched the English Discord body against `is_network_unreachable_message`, which demotes to `tracing::warn!`, still a Sentry event (just at lower severity). The Chinese-Windows variant escaped the English-only anchors entirely and emitted as a full Sentry error.

Fix: add a new `ChannelSupervisorRestart` classifier tier anchored on the Rust supervisor wrapper format (`"Channel <name> error: <inner>; restarting"`). The anchor is language-agnostic, so it covers OS-localized inner errors. Precedence is checked BEFORE `is_loopback_unavailable` and `is_network_unreachable_message` so the supervisor wrap always wins. Demotes to `tracing::info!` (breadcrumb only, no Sentry event).

Tests cover: English Discord gateway shape, Chinese WSAETIMEDOUT variant, four additional channel names (slack/telegram/whatsapp/gmessages), precedence over `NetworkUnreachable`, rejection of generic non-supervisor restart notes (`systemd: docker.service; restarting`), and a smoke test routing the verbatim Sentry body through `report_error_or_expected`.

Sentry-Issue: TAURI-RUST-15
Sentry-Issue: TAURI-RUST-BB
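As a rough illustration of the wrapper-format anchor described in this commit, a shape-only predicate might look like the following. This is a minimal sketch, not the PR's actual implementation: the function body and the exact prefix/suffix matching are assumptions, and only the function name and the `"Channel <name> error: <inner>; restarting"` format come from the PR text.

```rust
/// Hypothetical sketch of a supervisor-wrap predicate (assumed body, not
/// the code from this PR). Matches `Channel <name> error: <inner>; restarting`
/// by its outer shape only, so the language of `<inner>` is irrelevant.
fn is_channel_supervisor_restart_message(msg: &str) -> bool {
    let msg = msg.trim();
    // The wrapper starts with the literal "Channel " prefix...
    let Some(rest) = msg.strip_prefix("Channel ") else {
        return false;
    };
    // ...and ends with the literal "; restarting" suffix.
    let Some(rest) = rest.strip_suffix("; restarting") else {
        return false;
    };
    // What remains must be "<name> error: <inner>" with a single-token name.
    match rest.split_once(" error: ") {
        Some((name, _inner)) => !name.is_empty() && !name.contains(char::is_whitespace),
        None => false,
    }
}
```

Under these assumptions, a wrapped message with an English or an OS-localized inner error is accepted, while a generic note such as `systemd: docker.service; restarting` is rejected because it lacks the `Channel ` prefix. Matching only the outer literals is what keeps the check independent of the inner error's language.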
…rvisorRestart

The new `is_channel_supervisor_restart_message` classifier added in this PR takes precedence over `is_network_unreachable_message` in `expected_error_kind`. The pre-existing supervision test `supervision_discord_gateway_reqwest_failure_classifies_as_expected` asserted `NetworkUnreachable`; update it to assert `ChannelSupervisorRestart`, matching the new precedence and the broader language-agnostic anchor introduced for TAURI-RUST-15/-BB.

Sentry-Issue: TAURI-RUST-15
…SupervisorRestart

After rebase onto current `upstream/main`, the existing test `channel_supervisor_operation_timed_out_classifies_as_expected` (added in a sibling PR before this rebase landed) now hits the new `ChannelSupervisorRestart` precedence path instead of `NetworkUnreachable`. The new classifier is the broader anchor: it covers every ETIMEDOUT / WSAETIMEDOUT / hyper-prose supervisor-wrap shape the old test pinned, plus OS-localized variants the English-only `NetworkUnreachable` would have missed.

Update the assertion and comment to reflect the new precedence and tier difference (`ChannelSupervisorRestart` demotes to `info!`, vs `warn!` for `NetworkUnreachable`).

Sentry-Issue: TAURI-RUST-15
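The precedence and tier difference these commits describe can be sketched as follows. Everything here is a hedged illustration: the two-variant enum is a subset, `Demotion` and `demotion` are hypothetical names, the signature of `expected_error_kind` is assumed, and both predicate bodies are simplified stand-ins rather than the real anchors.

```rust
/// Subset of the classifier tiers named in this PR (the real enum has more).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ExpectedErrorKind {
    ChannelSupervisorRestart,
    NetworkUnreachable,
}

/// Hypothetical stand-in for the tracing level each tier demotes to.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Demotion {
    Info, // breadcrumb only, no Sentry event
    Warn, // still a Sentry event, at lower severity
}

/// Assumed shape check for `Channel <name> error: <inner>; restarting`.
fn is_channel_supervisor_restart_message(msg: &str) -> bool {
    msg.starts_with("Channel ") && msg.contains(" error: ") && msg.ends_with("; restarting")
}

/// Assumed English-only anchors; the real list is not shown in the PR text.
fn is_network_unreachable_message(msg: &str) -> bool {
    msg.contains("error sending request for url") || msg.contains("operation timed out")
}

/// Assumed signature. The supervisor wrap is checked first, so it wins
/// even when the inner error would also match the network anchors.
fn expected_error_kind(msg: &str) -> Option<ExpectedErrorKind> {
    if is_channel_supervisor_restart_message(msg) {
        Some(ExpectedErrorKind::ChannelSupervisorRestart)
    } else if is_network_unreachable_message(msg) {
        Some(ExpectedErrorKind::NetworkUnreachable)
    } else {
        None
    }
}

/// Maps each tier to the level stated in the commit message.
fn demotion(kind: ExpectedErrorKind) -> Demotion {
    match kind {
        ExpectedErrorKind::ChannelSupervisorRestart => Demotion::Info,
        ExpectedErrorKind::NetworkUnreachable => Demotion::Warn,
    }
}
```

In this sketch a wrapped message whose inner error contains `error sending request for url` classifies as `ChannelSupervisorRestart`, while the same inner error without the wrapper still classifies as `NetworkUnreachable`. That ordering is why the two pre-existing tests needed their assertions updated.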
Summary
Rebased version of #2691 by @oxoxDev on current `upstream/main`.

What changed from #2691: Dropped the empty CI-trigger commit; rebased the 3 functional commits cleanly onto current main. The stale base on #2691 caused `classifies_embedding_api_invalid_token_401_as_session_expired` (added to main via #2869) to fail; all 129 observability tests pass on this branch.

Closes #2691.
Original description (@oxoxDev)
- `ExpectedErrorKind::ChannelSupervisorRestart` classifier tier in `src/core/observability.rs`, demoting per-restart messages from `channels::runtime::supervision::spawn_supervised_listener` to a `tracing::info!` breadcrumb (no Sentry event).
- `is_channel_supervisor_restart_message` predicate anchors on the Rust supervisor wrapper format (`"Channel <name> error: <inner>; restarting"`). It is language-agnostic and covers OS-localized inner errors (Chinese-Windows WSAETIMEDOUT) that the English-only `is_network_unreachable_message` anchors miss.
- Checked before `is_loopback_unavailable` and `is_network_unreachable_message`.

Test plan
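- `cargo test --lib core::observability::tests`: 129 passed, 0 failed (including all new channel-supervisor tests and the stale-base `classifies_embedding_api_invalid_token_401_as_session_expired` test)
- `supervision_discord_gateway_reqwest_failure_classifies_as_expected` updated to assert `ChannelSupervisorRestart`
- `channel_supervisor_operation_timed_out_classifies_as_expected` updated to assert `ChannelSupervisorRestart` instead of `NetworkUnreachable`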