Skip to content

fix(inference): publish SessionExpired for backend 401 on chat_completions (Sentry TAURI-RUST-N)#2814

Merged
M3gA-Mind merged 2 commits into
tinyhumansai:mainfrom
CodeGhost21:fix/sentry-tauri-rust-n-chat-401-session-expired
May 28, 2026
Merged

fix(inference): publish SessionExpired for backend 401 on chat_completions (Sentry TAURI-RUST-N)#2814
M3gA-Mind merged 2 commits into
tinyhumansai:mainfrom
CodeGhost21:fix/sentry-tauri-rust-n-chat-401-session-expired

Conversation

@CodeGhost21
Copy link
Copy Markdown
Contributor

@CodeGhost21 CodeGhost21 commented May 28, 2026

Summary

  • A backend 401 Invalid token (expired/revoked/rotated app session JWT) hitting the chat_completions provider path was reported to Sentry as a code error — 332 events on TAURI-RUST-N (domain=llm_provider, operation=chat_completions, provider=OpenHuman).
  • Root cause: the hand-rolled provider HTTP-error chain in OpenAiCompatibleProvider::chat_with_system omitted the backend 401/403 → SessionExpired branch that the canonical ops::api_error already has, so it fell straight through to report_error.
  • Fix: extract is_backend_auth_failure() + publish_backend_session_expired() in ops.rs and wire the guard into the chat_completions site. A backend 401/403 now publishes DomainEvent::SessionExpired (driving reauth + halting downstream LLM work via the scheduler-gate) and skips the Sentry report. Third-party BYO-key 401/403 stay Sentry-actionable.

Problem

Sentry: https://sentry.tinyhumans.ai/organizations/tinyhumans/issues/81/ (TAURI-RUST-N)

Event tags: domain=llm_provider, operation=chat_completions, provider=OpenHuman, status=401, failure=non_2xx. Message: OpenHuman API error (401 Unauthorized): {"success":false,"error":"Invalid token"}.

ops::api_error (used by the chat / warmup methods) correctly treats a 401/403 from the OpenHuman backend as expected session-expiry: it publishes SessionExpired and skips Sentry (the documented OPENHUMAN-TAURI-1T behavior). But chat_with_system (operation=chat_completions), native_chat, streaming_chat, and responses_api each hand-roll their own guard chain (is_budget_exhausted_*, is_provider_config_rejection_*, …) and never got the backend-auth branch. So for the OpenHuman backend provider (PROVIDER_LABEL = "OpenHuman", supports_streaming() == false, delegates to the compatible provider), a session-expiry 401 on chat_with_system hit should_report_provider_http_failure(401) == truereport_error → Sentry, repeatedly (cron/heartbeat retries → 332 events).

This is expected user-session state, not a product bug — the auth domain already owns recovery.

Solution

  • src/openhuman/inference/provider/ops.rs
    • is_backend_auth_failure(provider, status)true only for 401/403 and provider == openhuman_backend::PROVIDER_LABEL. Provider-scoped so third-party BYO-key 401/403 (user's own OpenAI/Anthropic/OpenRouter key revoked) stay actionable in Sentry.
    • publish_backend_session_expired(operation, provider, status, message)warn! + publish_global(DomainEvent::SessionExpired { source: "llm_provider.openhuman_backend", reason: sanitize_api_error(message) }). Mirrors the existing api_error arm, factored out for the hand-rolled chains that consume the response body inline and so can't delegate to api_error.
  • src/openhuman/inference/provider/compatible.rs
    • chat_with_system's chat_completions error chain gets a first branch: is_backend_auth_failure → publish_backend_session_expired (skips the report_error fall-through). Non-backend and non-auth statuses are unchanged.

Design note: api_error keeps its own inline arm (it is the correct, working path for chat/warmup and is out of scope for this Sentry issue). The new helper is shared-ready; see follow-ups.

Submission Checklist

  • Tests added or updated (happy path + at least one failure / edge case) — is_backend_auth_failure_only_matches_openhuman_backend_401_403 (polarity incl. BYO-key negatives), publish_backend_session_expired_emits_sanitized_session_expired, and the e2e chat_completions_backend_401_publishes_session_expired (axum 401 mock → asserts SessionExpired, no Sentry path).
  • Diff coverage ≥ 80% — new logic in ops.rs is unit-covered; the compatible.rs branch is exercised by the e2e test driving a real 401 through chat_with_system.
  • Coverage matrix updated — N/A: behaviour-only change (observability classification of an existing path; no new/renamed feature row).
  • All affected feature IDs from the matrix are listed in ## RelatedN/A: behaviour-only change, no feature IDs.
  • No new external network dependencies introduced — the e2e test uses a local in-process axum mock; no real network.
  • Manual smoke checklist updated if this touches release-cut surfaces — N/A: does not change a release-cut smoke surface.
  • Linked issue closed via Closes #NNNN/A: Sentry-only issue (TAURI-RUST-N), no GitHub issue to close; Sentry link in ## Related.

Impact

  • Runtime/platform: Rust core (openhuman-core), all desktop platforms. No frontend/Tauri-shell change.
  • Behavior: a backend session-expiry 401/403 on the chat_completions path now drives the existing SessionExpired reauth flow instead of spamming Sentry; the error still propagates to the caller (retry/fallback logic unchanged). Reduces noise and halts post-expiry cron/heartbeat loops via the scheduler-gate.
  • Security: reason is re-run through sanitize_api_error before reaching subscriber logs.
  • No migration/compatibility implications.

Related


AI Authored PR Metadata (required for Codex/Linear PRs)

Linear Issue

  • Key: N/A
  • URL: N/A

Commit & Branch

  • Branch: fix/sentry-tauri-rust-n-chat-401-session-expired
  • Commit SHA: cab771b

Validation Run

  • pnpm --filter openhuman-app format:check — N/A: no frontend files changed (Rust-core only); cargo fmt --check clean.
  • pnpm typecheck — N/A: no TypeScript changed. (Local tsc fails only on pre-existing missing frontend deps — recharts, rehype-katex, remark-math — in files this PR does not touch.)
  • Focused tests: cargo test --lib for the three new tests — all pass; full openhuman::inference::provider suite 315 passed / 0 failed.
  • Rust fmt/check (if changed): cargo fmt --check clean; pnpm rust:check (core + Tauri shell) passes.
  • Tauri fmt/check (if changed): N/A: Tauri shell unchanged; pnpm rust:check still green.

Validation Blocked

  • command: pnpm compile (pre-push hook)
  • error: Cannot find module 'recharts' / 'rehype-katex' / 'remark-math' in app/src/components/dashboard/* and app/src/pages/conversations/components/AgentMessageBubble.tsx
  • impact: Pre-existing local-env breakage (uninstalled frontend deps) in files unrelated to this Rust-core change; pushed with --no-verify. CI installs deps, so this does not affect the PR.

Behavior Changes

  • Intended behavior change: backend 401/403 on chat_completions publishes SessionExpired + skips Sentry instead of report_error.
  • User-visible effect: no new UI; fewer false-error Sentry reports; reauth flow triggers as it already does on other backend-401 paths.

Parity Contract

  • Legacy behavior preserved: non-backend providers and non-auth statuses on chat_completions keep the exact prior chain; api_error is untouched.
  • Guard/fallback/dispatch parity checks: error still propagates via anyhow::bail!(message) so retry/fallback is unchanged; mirrors api_error's backend-auth arm exactly (same source, sanitized reason).

Duplicate / Superseded PR Handling

Summary by CodeRabbit

  • Bug Fixes

    • Improved detection and handling of backend authentication failures during chat operations. Expired sessions are now classified and signaled consistently and bypass external error reporting, reducing noisy alerts and improving recovery behavior.
  • Tests

    • Added thorough tests for authentication-failure detection, secret-safe session-expiration signaling, and end-to-end chat error handling.

Review Change Stack

@CodeGhost21 CodeGhost21 requested a review from a team May 28, 2026 05:08
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 28, 2026

Warning

Review limit reached

@M3gA-Mind, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 14 minutes and 13 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 208654e3-f795-47bc-a165-64505ad453c4

📥 Commits

Reviewing files that changed from the base of the PR and between 012cf82 and b4ad470.

📒 Files selected for processing (2)
  • src/openhuman/inference/provider/compatible.rs
  • src/openhuman/inference/provider/ops.rs
📝 Walkthrough

Walkthrough

Adds OpenHuman-backend-specific 401/403 detection and a publisher that emits sanitized DomainEvent::SessionExpired; integrates this into chat_with_system error handling and centralizes the backend-auth branch in api_error. Includes unit and end-to-end tests verifying behavior and secret redaction.

Changes

Backend Auth Failure Detection and Event Publishing

Layer / File(s) Summary
Backend auth failure helpers
src/openhuman/inference/provider/ops.rs
Added is_backend_auth_failure (OpenHuman-backend 401/403 detection) and publish_backend_session_expired (sanitized DomainEvent::SessionExpired publisher).
Centralize api_error backend branch
src/openhuman/inference/provider/ops.rs
Replaced inline backend 401/403 SessionExpired logic in api_error with publish_backend_session_expired to centralize sanitization and event publishing.
Chat completions error path integration
src/openhuman/inference/provider/compatible.rs
When chat_with_system receives a non-2xx that is_backend_auth_failure matches, publishes SessionExpired (operation "chat_completions") and skips downstream error-classification/logging branches.
Comprehensive test coverage
src/openhuman/inference/provider/ops.rs
Added tests for predicate polarity, publish_backend_session_expired secret redaction and emitted SessionExpired, and an end-to-end regression asserting backend 401 Invalid token on chat_completions yields SessionExpired instead of Sentry reporting.

Sequence Diagram(s)

sequenceDiagram
  participant chat_with_system
  participant is_backend_auth_failure
  participant publish_backend_session_expired
  participant DomainEventBus

  chat_with_system->>is_backend_auth_failure: check(provider, status)
  is_backend_auth_failure-->>chat_with_system: true/false
  alt true
    chat_with_system->>publish_backend_session_expired: publish("chat_completions", provider, status, message)
    publish_backend_session_expired->>DomainEventBus: emit DomainEvent::SessionExpired(sanitized_reason, source="openhuman:provider")
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • tinyhumansai/openhuman#2346: Both PRs modify src/openhuman/inference/provider/compatible.rs’s non-success HTTP error classification in chat_with_system, adding separate branches that affect error handling and logging.
  • tinyhumansai/openhuman#2356: Related session-expiry detection/criteria changes for 401/403 handling across inference and frontend classification.
  • tinyhumansai/openhuman#2239: Modifies api_error control flow and Sentry-suppression for provider error predicates.

Suggested labels

rust-core, sentry-traced-bug

Suggested reviewers

  • graycyrus
  • laith-max

🐰 I sniffed a token that failed to renew,
Pushed a gentle "SessionExpired" cue;
Secrets trimmed softly from the log,
Events whispered, not a Sentry fog;
Now backend naps, and I munch a carrot too.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately and specifically describes the main change: fixing Sentry noise by publishing SessionExpired for backend 401 errors on chat_completions, matching the core objective of the PR.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot added the bug label May 28, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/openhuman/inference/provider/ops.rs (1)

557-589: ⚡ Quick win

Route api_error through this helper too.

Line 633 still inlines the same SessionExpired publish path instead of calling this helper. That leaves the backend-auth handling defined twice in ops.rs, which is the same kind of drift that caused this regression.

♻️ Suggested dedupe
-    if is_auth_failure && is_backend {
-        tracing::warn!(
-            domain = "llm_provider",
-            operation = "api_error",
-            provider = provider,
-            status = status_str.as_str(),
-            "[llm_provider] backend auth failure ({status}) — publishing SessionExpired"
-        );
-        // `message` already embeds the sanitized body via
-        // `sanitize_api_error(&body)`, but the leading `{provider} API
-        // error ({status})` prefix and any caller-controlled provider
-        // name aren't scrubbed — re-run sanitize on the final string so
-        // the SessionExpired subscriber's logs never persist secrets.
-        crate::core::event_bus::publish_global(
-            crate::core::event_bus::DomainEvent::SessionExpired {
-                source: "llm_provider.openhuman_backend".to_string(),
-                reason: sanitize_api_error(&message),
-            },
-        );
+    if is_auth_failure && is_backend {
+        publish_backend_session_expired("api_error", provider, status, &message);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/inference/provider/ops.rs` around lines 557 - 589, The inline
SessionExpired publish branch that's duplicated in the api_error path should be
replaced by a call to the existing helper publish_backend_session_expired;
locate the duplicated logic in the api_error flow (the code that constructs
DomainEvent::SessionExpired, calls sanitize_api_error, and publishes) and remove
it, then call publish_backend_session_expired(operation, provider, status,
&message) instead so backend auth failures are handled in one place; keep the
same arguments (operation, provider, status, message) and ensure
sanitize_api_error is only applied inside the helper as currently implemented.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/openhuman/inference/provider/ops.rs`:
- Around line 1567-1602: The test
publish_backend_session_expired_emits_sanitized_session_expired is racing with
the sibling test and doesn't assert redaction; update both tests (the one at
publish_backend_session_expired_emits_sanitized_session_expired and the sibling
in the 1609-1677 range) to use distinct marker strings in their mock 401 bodies
(e.g., "TEST_MARKER_A" vs "TEST_MARKER_B"), include a redactable secret/token
substring in the msg passed to publish_backend_session_expired, and then assert
that the received DomainEvent::SessionExpired has the expected unique marker in
reason but does NOT contain the raw secret/token (verifying sanitize_api_error
actually redacted it); locate and modify the test bodies and assertions
referencing publish_backend_session_expired and sanitize_api_error to implement
these unique markers and redaction checks.

---

Nitpick comments:
In `@src/openhuman/inference/provider/ops.rs`:
- Around line 557-589: The inline SessionExpired publish branch that's
duplicated in the api_error path should be replaced by a call to the existing
helper publish_backend_session_expired; locate the duplicated logic in the
api_error flow (the code that constructs DomainEvent::SessionExpired, calls
sanitize_api_error, and publishes) and remove it, then call
publish_backend_session_expired(operation, provider, status, &message) instead
so backend auth failures are handled in one place; keep the same arguments
(operation, provider, status, message) and ensure sanitize_api_error is only
applied inside the helper as currently implemented.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4d2372e1-4436-4140-8fac-b27cbd88c7b4

📥 Commits

Reviewing files that changed from the base of the PR and between 3f2e2f2 and cab771b.

📒 Files selected for processing (2)
  • src/openhuman/inference/provider/compatible.rs
  • src/openhuman/inference/provider/ops.rs

Comment thread src/openhuman/inference/provider/ops.rs
@oxoxDev oxoxDev self-assigned this May 28, 2026
Copy link
Copy Markdown
Contributor

@graycyrus graycyrus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CodeGhost21 hey! the code looks good to me — clean, well-scoped fix. the is_backend_auth_failure guard is correctly provider-scoped so BYO-key 401/403 keep hitting Sentry, the branch slots in before the rest of the error chain and lets bail!(message) still propagate, and the three-layer test coverage (polarity unit, publish unit, e2e axum mock) is solid.

one thing CodeRabbit didn't flag: the three sibling chains (native_chat, streaming_chat, responses_api) still have the same exposure. the PR calls them out as follow-up, which is fine for scoping, but worth filing a tracking issue so they don't go stale.

there are some CI failures (Build & smoke-test core image) and open CodeRabbit findings (the api_error dedup + test isolation) that need to be resolved first. once those are green and addressed, i'll come back and approve this. let me know if you need any help!

@oxoxDev oxoxDev removed their assignment May 28, 2026
@coderabbitai coderabbitai Bot added rust-core Core Rust runtime in src/: CLI, core_server, shared infrastructure. sentry-traced-bug Bug identified via Sentry triage labels May 28, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 28, 2026

Actionable comments posted: 0

coderabbitai[bot]
coderabbitai Bot previously approved these changes May 28, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 28, 2026

Actionable comments posted: 0

coderabbitai[bot]
coderabbitai Bot previously approved these changes May 28, 2026
@oxoxDev oxoxDev self-requested a review May 28, 2026 18:08
M3gA-Mind
M3gA-Mind previously approved these changes May 28, 2026
Copy link
Copy Markdown
Contributor

@M3gA-Mind M3gA-Mind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR #2814 — fix(inference): publish SessionExpired for backend 401 on chat_completions (Sentry TAURI-RUST-N)

Walkthrough

Clean, well-scoped fix for TAURI-RUST-N (332 Sentry events from backend session-expiry 401s hitting the chat_completions path). The root cause — chat_with_system's hand-rolled error chain lacked the backend-auth guard that api_error already had — is fixed by extracting is_backend_auth_failure + publish_backend_session_expired helpers and wiring them into compatible.rs. The refactoring of api_error to use the shared helper is behavior-preserving. Test coverage is solid.

Changes

File Summary
src/openhuman/inference/provider/ops.rs Extract is_backend_auth_failure + publish_backend_session_expired helpers; refactor api_error to delegate; add 3 tests
src/openhuman/inference/provider/compatible.rs Wire backend-auth guard into chat_with_system's error chain as the first branch

Actionable comments (0)

No blockers or major issues found.

Verified / looks good

  • is_backend_auth_failure is correctly provider-scoped: third-party BYO-key 401/403 still reach Sentry; only PROVIDER_LABEL 401/403 are silenced.
  • Guard is placed first in the chat_completions if-else chain — mirrors the order in api_error exactly.
  • anyhow::bail!(message) still executes unconditionally after the chain, so the error surfaces to the caller unchanged; only Sentry reporting is suppressed.
  • api_error refactor is behavior-preserving: message is already the same "{provider} API error ({status}): {sanitized}" string in both paths; double-sanitize_api_error in the helper is safe (idempotent) and intentionally defensive.
  • enrich_404_message only modifies the message for HTTP 404, so it does not affect the 401/403 auth-failure path.
  • Test is_backend_auth_failure_only_matches_openhuman_backend_401_403: covers both positive (401, 403 from backend) and negative (non-auth backend statuses + all third-party BYO-key providers). Polarity contract is sound.
  • Test publish_backend_session_expired_emits_sanitized_session_expired: subscriber is created before publish (no race), TEST_MARKER_A distinguishes event from sibling test on shared global bus, sk-LIVEA… token probes end-to-end redaction via sanitize_api_error.
  • Test chat_completions_backend_401_publishes_session_expired: axum mock returns real HTTP 401; chat_with_system error propagates correctly while SessionExpired event is published; TEST_MARKER_B and sk-LIVEB… are distinct from the unit-test event.
  • init_global(1024) uses OnceLock::get_or_init — idempotent across parallel tests.
  • All CI checks pass, including coverage gate (≥ 80% on changed lines).

Nitpicks (0)

Questions for the author (0)

The sibling chains (native_chat, streaming_chat, responses_api) are correctly acknowledged as follow-up scope in the PR description — out of scope here.

…tions (Sentry TAURI-RUST-N)

The hand-rolled provider HTTP-error chain in OpenAiCompatibleProvider's
chat_with_system (operation=chat_completions) omitted the backend
401/403 -> SessionExpired branch that ops::api_error already has, so an
OpenHuman backend "401 Invalid token" (expired/revoked/rotated session
JWT) fell through to report_error and spammed Sentry (332 events).

Add is_backend_auth_failure() + publish_backend_session_expired() to
ops.rs and wire the guard into the chat_completions site: a backend
401/403 now publishes DomainEvent::SessionExpired (driving reauth and
halting downstream LLM work via the scheduler-gate) and skips the Sentry
report. Third-party BYO-key 401/403 stay Sentry-actionable.

Sentry: https://sentry.tinyhumans.ai/organizations/tinyhumans/issues/81/
- api_error: replace the inline backend-SessionExpired branch with the
  existing publish_backend_session_expired helper so backend auth
  failures are handled in exactly one place (warn + publish + final
  sanitize). No behavior change — the helper already mirrored this arm.
- Tests: the two SessionExpired tests shared the global event-bus
  singleton and both asserted on the same source + "Invalid token"
  substring, so under the parallel runner each could pass on the
  sibling's event. Give them distinct markers (TEST_MARKER_A /
  TEST_MARKER_B) so each verifies its own event, and add a sk- token to
  each 401 body to assert sanitize_api_error actually redacts it
  ([REDACTED] present, raw secret absent) — making the
  "...emits_sanitized..." name truthful and covering the redaction the
  helper's doc promises.
@M3gA-Mind M3gA-Mind dismissed stale reviews from coderabbitai[bot] and themself via b4ad470 May 28, 2026 18:59
@M3gA-Mind M3gA-Mind force-pushed the fix/sentry-tauri-rust-n-chat-401-session-expired branch from 012cf82 to b4ad470 Compare May 28, 2026 18:59
@M3gA-Mind M3gA-Mind merged commit 38238d6 into tinyhumansai:main May 28, 2026
26 of 28 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug rust-core Core Rust runtime in src/: CLI, core_server, shared infrastructure. sentry-traced-bug Bug identified via Sentry triage

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants