fix(observability): demote channel supervisor restart noise (Sentry TAURI-RUST-15/-BB)#2879

Merged
M3gA-Mind merged 3 commits into tinyhumansai:main from M3gA-Mind:fix/sentry-15-supervisor-restart-2691 on May 28, 2026

@M3gA-Mind
Contributor

Summary

Rebased version of #2691 by @oxoxDev on current upstream/main.

What changed from #2691: dropped the empty CI-trigger commit and rebased the 3 functional commits cleanly onto current main. The stale base on #2691 caused classifies_embedding_api_invalid_token_401_as_session_expired (added to main via #2869) to fail; on this branch all 129 observability tests pass.

Closes #2691.


Original description (@oxoxDev)

  • Add ExpectedErrorKind::ChannelSupervisorRestart classifier tier in src/core/observability.rs, demoting per-restart messages from channels::runtime::supervision::spawn_supervised_listener to a tracing::info! breadcrumb (no Sentry event).
  • New is_channel_supervisor_restart_message predicate anchors on the Rust supervisor wrapper format ("Channel <name> error: <inner>; restarting") — language-agnostic, covers OS-localized inner errors (Chinese-Windows WSAETIMEDOUT) that the English-only is_network_unreachable_message anchors miss.
  • Precedence checked BEFORE is_loopback_unavailable and is_network_unreachable_message.
  • Targets TAURI-RUST-15 (~11.4k events/14d Discord gateway) and TAURI-RUST-BB (~815 events Chinese-Windows variant).
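
The wrapper-anchored predicate described in the bullets above could look roughly like the following. This is a minimal sketch, not the code merged in `src/core/observability.rs`: only the function name and the `"Channel <name> error: <inner>; restarting"` shape come from the description, while the exact anchoring rules (trimming, single-token channel name) are assumptions.

```rust
/// Hypothetical sketch of the predicate named in the PR description.
/// Returns true when `msg` has the supervisor wrapper shape
/// `Channel <name> error: <inner>; restarting`.
/// Only the wrapper is inspected, never the inner error text, so the
/// check does not depend on the language of the OS error message.
pub fn is_channel_supervisor_restart_message(msg: &str) -> bool {
    let msg = msg.trim();

    // Anchor 1: fixed prefix written by the supervisor.
    let Some(rest) = msg.strip_prefix("Channel ") else {
        return false;
    };
    // Anchor 2: fixed suffix written by the supervisor.
    let Some(body) = rest.strip_suffix("; restarting") else {
        return false;
    };
    // Anchor 3: " error: " separates the channel name from the inner error.
    let Some((name, _inner)) = body.split_once(" error: ") else {
        return false;
    };

    // Assumption: a channel name is a single non-empty token.
    !name.is_empty() && !name.contains(char::is_whitespace)
}
```

Because the inner error is ignored, an English reqwest message and a localized Windows timeout message both match, while a line such as `systemd: docker.service; restarting` is rejected for lacking the `Channel ` prefix.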

Test plan

  • cargo test --lib core::observability::tests — 129 passed, 0 failed (including all new channel-supervisor tests and the stale-base classifies_embedding_api_invalid_token_401_as_session_expired test)
  • English Discord-gateway shape → ChannelSupervisorRestart
  • Chinese-Windows WSAETIMEDOUT shape → ChannelSupervisorRestart
  • Precedence over NetworkUnreachable
  • Generic restart logs not falsely classified

oxoxDev added 3 commits May 29, 2026 04:22
…AURI-RUST-15)

Self-hosted Sentry's #1 unresolved tauri-rust issue by event count
(`Channel discord error: error sending request for url ...; restarting`,
~11.4 k events / 14d) and its Chinese-Windows WSAETIMEDOUT variant
(TAURI-RUST-BB, ~815 events) both originate from the channel supervisor
loop in `channels::runtime::supervision::spawn_supervised_listener`. The
supervisor already restarts the listener with its own exponential
backoff; sustained outages still surface through `health.bus` /
`FAIL_ESCALATE_THRESHOLD`. Per-restart messages carry no actionable
Sentry signal.

Previous path: `expected_error_kind` matched the English Discord body
against `is_network_unreachable_message`, which demotes to `tracing::warn!`
— still a Sentry event (just at lower severity). The Chinese-Windows
variant escaped the English-only anchors entirely and emitted as a full
Sentry error.

Fix: add a new `ChannelSupervisorRestart` classifier tier anchored on
the Rust supervisor wrapper format (`"Channel <name> error: <inner>;
restarting"`) — language-agnostic so it covers OS-localized inner
errors. Precedence is checked BEFORE `is_loopback_unavailable` and
`is_network_unreachable_message` so the supervisor wrap always wins.
Demotes to `tracing::info!` (breadcrumb only — no Sentry event).

Tests cover: English Discord gateway shape, Chinese WSAETIMEDOUT
variant, four additional channel names (slack/telegram/whatsapp/
gmessages), precedence over `NetworkUnreachable`, rejection of
generic non-supervisor restart notes (`systemd: docker.service;
restarting`), and a smoke test routing the verbatim Sentry body
through `report_error_or_expected`.

Sentry-Issue: TAURI-RUST-15
Sentry-Issue: TAURI-RUST-BB
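
The test matrix listed in the commit message above can be pictured as a small table of inputs and expected outcomes. The sketch below is illustrative only: the predicate body is a simplified stand-in, the inner error text `connection reset` is invented for the example, and the helper names `cases` and `failing_cases` are hypothetical rather than taken from the repository's test module.

```rust
/// Simplified stand-in for the classifier (assumed shape check only).
fn is_channel_supervisor_restart_message(msg: &str) -> bool {
    msg.trim()
        .strip_prefix("Channel ")
        .and_then(|rest| rest.strip_suffix("; restarting"))
        .and_then(|body| body.split_once(" error: "))
        .map_or(false, |(name, _inner)| !name.is_empty())
}

/// Hypothetical case table: (message, should it classify as a restart?).
pub fn cases() -> Vec<(String, bool)> {
    let mut cases = Vec::new();

    // Every channel name mentioned in the commit message should match.
    for name in ["discord", "slack", "telegram", "whatsapp", "gmessages"] {
        cases.push((
            format!("Channel {name} error: connection reset; restarting"),
            true,
        ));
    }

    // Generic restart notes that lack the supervisor wrapper must not match.
    cases.push(("systemd: docker.service; restarting".to_string(), false));
    // A channel error without the "; restarting" suffix must not match either.
    cases.push(("Channel discord error: connection reset".to_string(), false));

    cases
}

/// Returns the messages whose classification differs from the expectation.
pub fn failing_cases() -> Vec<String> {
    cases()
        .into_iter()
        .filter(|(msg, want)| is_channel_supervisor_restart_message(msg) != *want)
        .map(|(msg, _)| msg)
        .collect()
}
```

Keeping the negative cases in the same table as the positive ones makes it harder for a later loosening of the anchor to slip through unnoticed.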
…rvisorRestart

The new `is_channel_supervisor_restart_message` classifier added in this
PR takes precedence over `is_network_unreachable_message` in
`expected_error_kind`. The pre-existing supervision test
`supervision_discord_gateway_reqwest_failure_classifies_as_expected`
asserted `NetworkUnreachable` — update it to assert
`ChannelSupervisorRestart`, matching the new precedence + the broader
language-agnostic anchor introduced for TAURI-RUST-15/-BB.

Sentry-Issue: TAURI-RUST-15
…SupervisorRestart

After rebase onto current `upstream/main`, the existing test
`channel_supervisor_operation_timed_out_classifies_as_expected` (added
in a sibling PR before this rebase landed) now hits the new
`ChannelSupervisorRestart` precedence path instead of
`NetworkUnreachable`. The new classifier is the broader anchor — it
covers every ETIMEDOUT / WSAETIMEDOUT / hyper-prose supervisor-wrap
shape the old test pinned, plus OS-localized variants the English-only
`NetworkUnreachable` would have missed.

Update the assertion + comment to reflect the new precedence and tier
difference (`ChannelSupervisorRestart` demotes to `info!`, vs `warn!`
for `NetworkUnreachable`).

Sentry-Issue: TAURI-RUST-15
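
The precedence and tier difference described in these commit messages can be sketched as follows. This is an assumption-laden illustration, not the merged code: the `Level` enum and `demoted_level` function are hypothetical stand-ins for the `tracing::info!` / `tracing::warn!` calls, the anchors inside `is_network_unreachable_message` are invented, and the `is_loopback_unavailable` check is omitted because its anchors are not shown in this PR.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExpectedErrorKind {
    ChannelSupervisorRestart,
    NetworkUnreachable,
}

/// Hypothetical stand-in for the tracing level an expected error is logged at.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Level {
    Info, // breadcrumb only, no Sentry event
    Warn, // still a Sentry event, at lower severity
}

/// Simplified wrapper check (assumed shape only).
fn is_channel_supervisor_restart_message(msg: &str) -> bool {
    msg.trim()
        .strip_prefix("Channel ")
        .and_then(|rest| rest.strip_suffix("; restarting"))
        .map_or(false, |body| body.contains(" error: "))
}

/// Stand-in with invented English-only anchors.
fn is_network_unreachable_message(msg: &str) -> bool {
    msg.contains("error sending request for url") || msg.contains("operation timed out")
}

/// The supervisor wrapper is checked first, so a wrapped network error is
/// classified as a restart rather than as a network failure.
pub fn expected_error_kind(msg: &str) -> Option<ExpectedErrorKind> {
    if is_channel_supervisor_restart_message(msg) {
        return Some(ExpectedErrorKind::ChannelSupervisorRestart);
    }
    if is_network_unreachable_message(msg) {
        return Some(ExpectedErrorKind::NetworkUnreachable);
    }
    None
}

/// Maps each kind to the level it is demoted to, per the commit messages.
pub fn demoted_level(kind: ExpectedErrorKind) -> Level {
    match kind {
        ExpectedErrorKind::ChannelSupervisorRestart => Level::Info,
        ExpectedErrorKind::NetworkUnreachable => Level::Warn,
    }
}
```

With this ordering, `Channel discord error: operation timed out; restarting` resolves to `ChannelSupervisorRestart` and logs at the info tier, whereas the bare `operation timed out` still resolves to `NetworkUnreachable`. That is why the two pre-existing assertions had to change.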
@M3gA-Mind M3gA-Mind requested a review from a team May 28, 2026 22:55
@coderabbitai
Contributor

coderabbitai Bot commented May 28, 2026

Warning

Review limit reached

@M3gA-Mind, we couldn't start this review because you've reached your PR review rate limit.


📥 Commits

Reviewing files that changed from the base of the PR and between 1ea0dde and d4f0f84.

📒 Files selected for processing (2)
  • src/core/observability.rs
  • src/openhuman/channels/runtime/supervision.rs


@M3gA-Mind M3gA-Mind merged commit 1cd617c into tinyhumansai:main May 28, 2026
28 checks passed

@stp45ks4ys-byte stp45ks4ys-byte left a comment


How's it going?


3 participants