fix(subconscious): suppress repeated Sentry noise when DB init fails (TAURI-RUST-A, closes #2903) by graycyrus · Pull Request #2906 · tinyhumansai/openhuman

graycyrus · 2026-05-29T05:52:07Z

Summary

Root cause: store::with_connection runs the full schema DDL on every call. Under concurrent in-process RPC calls (status poll every 3 s, tasks-list on Intelligence page, manual trigger), SQLite's default busy_timeout=0 causes immediate SQLITE_BUSY failures — producing ~1,300 identical Sentry events per 24 h from a single underlying cause (TAURI-RUST-A).
Fix: adopt a three-layer defence: (1) per-connection busy_timeout=5s to absorb the common concurrent-connection race, (2) application-level retry loop (3 retries, 100/300/900 ms exponential backoff) for the residual case where the timeout is exhausted, (3) retry classifier (is_sqlite_busy) that only retries DatabaseBusy/DatabaseLocked — schema or corruption errors fail through immediately so Sentry gets a real root cause.
Workspace-safe: unlike the previous global-flag approach, this retry strategy is per-connection and scoped by workspace_dir — no cross-workspace poisoning.
Replaces the earlier DDL_EVER_FAILED / DB_INIT_FAILED_SENTINEL approach that CodeRabbit correctly identified as globally poisoning all workspaces after one failure.

Test plan

cargo check passes (pure Rust change)
Existing subconscious unit tests pass (CRUD tasks/log/escalations, seed, recurrence roundtrip)
New tests: is_sqlite_busy_{matches,does_not_match}* assert the retry classifier, with_connection_resolves_external_write_contention exercises the retry loop against a real write-lock held for 250 ms
is_db_init_failed helper and DB_INIT_FAILED_SENTINEL constant removed — no workspace-poisoning possible

Notes

Merge of upstream/main resolves all conflicts. Pre-push hook failure (Prettier not installed in worktree) is pre-existing and unrelated to this Rust-only change.

Closes #2903

… (TAURI-RUST-A) DDL runs on every `with_connection` call. On Windows, if the very first call fails (directory lock, antivirus hold, network-drive WAL rejection, etc.) every subsequent `subconscious_tasks_list` poll re-runs the same failing path and fires `report_error_or_expected` — producing 1,499+ identical Sentry events from a single underlying cause (TAURI-RUST-A). Fix: add `DDL_EVER_FAILED` / `DDL_LOGGED` atomic flags in `store.rs`. - First failure: log at `error!` level (Sentry captures once) + set both flags. - Subsequent calls: return `DB_INIT_FAILED_SENTINEL` immediately, skipping the expensive re-open + DDL re-run entirely. - Read-only list handlers (`tasks_list`, `log_list`, `escalations_list`, `reflections_list`, `status`) in `schemas.rs` detect the sentinel via `is_db_init_failed()` and return empty payloads instead of propagating an error that would re-fire `report_error_or_expected` on every poll. - Write/mutate handlers (`tasks_add`, `tasks_update`, `tasks_remove`, `escalations_approve`, `escalations_dismiss`, `reflections_act`, `reflections_dismiss`) still surface the error — appropriate since those are explicit user actions, not background polls. Closes tinyhumansai#2903.

coderabbitai · 2026-05-29T05:52:22Z

📝 Walkthrough

Walkthrough

The PR prevents Sentry flooding from repeated DDL initialization failures by introducing a fast-fail sentinel. The store module caches the first DDL failure and returns a constant sentinel on subsequent calls. Five RPC handlers detect this sentinel and return empty responses instead of propagating errors.

Changes

Graceful Degradation for Subconscious DDL Init Failure

Layer / File(s)	Summary
DB Init Sentinel & Fast-Fail Core `src/openhuman/subconscious/store.rs`	`with_connection` now uses atomic flags to track DDL failures. First failure is logged once via `maybe_log_ddl_error`, then returns `DB_INIT_FAILED_SENTINEL`. Subsequent calls immediately return the sentinel without re-attempting DDL or DB open.
RPC Handler Graceful Degradation `src/openhuman/subconscious/schemas.rs`	Five handlers (`handle_status`, `handle_tasks_list`, `handle_log_list`, `handle_escalations_list`, `handle_reflections_list`) detect the sentinel via new `is_db_init_failed` helper and return empty/zeroed responses instead of failing. Non-sentinel errors are still propagated.
Sentinel Constant Validation Test `src/openhuman/subconscious/store_tests.rs`	Regression test validates `DB_INIT_FAILED_SENTINEL` is non-empty and contains `[subconscious]` prefix and `"db unavailable"` substring to ensure stable matching.

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly Related PRs

tinyhumansai/openhuman#2851: Both PRs modify the same subconscious RPC handlers in schemas.rs and the store::with_connection flow in store.rs, but implement different DDL failure recovery strategies (fast-fail sentinel vs. SQLITE_BUSY/LOCKED retry).

Suggested Labels

rust-core, sentry-traced-bug, bug

Suggested Reviewers

oxoxDev
M3gA-Mind

Poem

🐰 A sentinel stands guard by the door,
When DDL stumbles, it stops the roar—
Five handlers smile with empty grace,
No Sentry floods fill this place! ✨

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly identifies the main change: suppressing repeated Sentry noise when DB initialization fails, directly addressing the core problem in the changeset.
Linked Issues check	✅ Passed	The PR meets all objectives from `#2903`: implements graceful degradation for read-only handlers [`#2903`], prevents repeated Sentry logging [`#2903`], adds regression test [`#2903`], and maintains code quality.
Out of Scope Changes check	✅ Passed	All changes are in scope: store.rs implements the sentinel-based caching, schemas.rs applies graceful degradation, and store_tests.rs validates the sentinel contract as required.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/openhuman/subconscious/store.rs`:
- Around line 32-57: The global DDL failure flags (DDL_EVER_FAILED, DDL_LOGGED)
cause poisoning across workspaces because with_connection fast-bails for any
workspace once one fails; change this by scoping the sentinel to the workspace
(or by resetting the flags when the engine is reset). Either: replace the two
globals with a workspace-keyed cache (e.g., map from workspace path/DB path to
bools) and have with_connection consult/set the entry for the given
workspace_dir/DB path and use DB_INIT_FAILED_SENTINEL per-workspace, or add
logic in reset_engine_for_user_switch (and the SubconsciousEngine teardown) to
clear DDL_EVER_FAILED and DDL_LOGGED so a new workspace can retry migrations;
update references in with_connection to use the new per-workspace flag or
cleared globals accordingly.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 186bbee2-a2da-4f53-af74-4d130e359011

📥 Commits

Reviewing files that changed from the base of the PR and between db3fdc2 and e71bb2e.

📒 Files selected for processing (3)

src/openhuman/subconscious/schemas.rs
src/openhuman/subconscious/store.rs
src/openhuman/subconscious/store_tests.rs

coderabbitai · 2026-05-29T05:57:51Z

+static DDL_EVER_FAILED: AtomicBool = AtomicBool::new(false);
+
+/// Set to `true` once we have emitted the first error log. Ensures Sentry only
+/// captures the failure once — subsequent calls get a silent sentinel error.
+static DDL_LOGGED: AtomicBool = AtomicBool::new(false);
+
+/// Sentinel message returned after the first DDL failure. Read-only list
+/// handlers in `schemas.rs` match on this prefix to return empty payloads
+/// instead of propagating an error that would fire `report_error_or_expected`.
+pub const DB_INIT_FAILED_SENTINEL: &str = "[subconscious] db unavailable (init failed)";
+
 /// Open the subconscious database and run schema migrations.
+///
+/// After the first DDL failure the function returns [`DB_INIT_FAILED_SENTINEL`]
+/// immediately, skipping both the filesystem open and the DDL re-run. Callers
+/// that perform read-only list queries should treat this sentinel as "empty
+/// result" rather than an RPC error (see `schemas.rs`).
 pub fn with_connection<T>(
    workspace_dir: &Path,
    f: impl FnOnce(&Connection) -> Result<T>,
 ) -> Result<T> {
+    // Fast path: DDL already failed in this process — skip the re-run and
+    // return the sentinel without logging again.
+    if DDL_EVER_FAILED.load(Ordering::Relaxed) {
+        anyhow::bail!(DB_INIT_FAILED_SENTINEL);
+    }


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "== Call sites of store::with_connection ==" rg -nP --type=rs -C2 '\bstore::with_connection\s*\(' echo echo "== All workspace_dir usages (look for mutation/switch patterns) ==" rg -nP --type=rs -C3 '\bworkspace_dir\b' echo echo "== Potential workspace switch/update entry points ==" rg -nP --type=rs -C3 'set_.*workspace|workspace.*(switch|change|update)|load_config_with_timeout|load_or_init'

Repository: tinyhumansai/openhuman

Length of output: 154

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "== Call sites of store::with_connection ==" rg -n --glob='*.rs' '\bstore::with_connection\s*\(' -S echo echo "== All workspace_dir usages (look for mutation/switch patterns) ==" rg -n --glob='*.rs' '\bworkspace_dir\b' -S echo echo "== Potential workspace switch/update entry points ==" rg -n --glob='*.rs' 'set_.*workspace|workspace.*(switch|change|update)|load_config_with_timeout|load_or_init' -S

Repository: tinyhumansai/openhuman

Length of output: 50379

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "== Inspect src/openhuman/subconscious/global.rs ==" ls -la src/openhuman/subconscious/global.rs sed -n '1,220p' src/openhuman/subconscious/global.rs echo echo "== Search for multi-workspace or switching language in subconscious ==" rg -n --glob='src/openhuman/subconscious/**/*.rs' '(switch|switching|active_user|multi[-_ ]user|concurrent|per-user|leak|leaking|NOW-correct)' -S echo echo "== Look for how Config is loaded per request/task (workspace_dir scope) ==" rg -n --glob='src/openhuman/**/*.rs' 'Config::load_or_init|active_user\.toml|workspace_dir.*active_user|load_active|set_active' -S src/openhuman

Repository: tinyhumansai/openhuman

Length of output: 21563

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "== Inspect src/openhuman/subconscious/store.rs (atomic flags + maybe_log_ddl_error + with_connection failure paths) ==" sed -n '1,220p' src/openhuman/subconscious/store.rs echo sed -n '220,520p' src/openhuman/subconscious/store.rs echo echo "== Search for reset_engine_for_user_switch call sites ==" rg -n --glob='*.rs' 'reset_engine_for_user_switch\s*\(' -S src/openhuman echo echo "== Search for DDL_EVER_FAILED / DDL_LOGGED resets/usages ==" rg -n --glob='*.rs' 'DDL_EVER_FAILED|DDL_LOGGED|DB_INIT_FAILED_SENTINEL' -S src/openhuman/subconscious

Repository: tinyhumansai/openhuman

Length of output: 21425

🏁 Script executed:

#!/bin/bash set -euo pipefail sed -n '330,430p' src/openhuman/credentials/ops.rs echo echo "== Inspect src/openhuman/subconscious/schemas.rs handling DB_INIT_FAILED_SENTINEL ==" # show the relevant match/if block around the sentinel usage near line ~758 sed -n '720,810p' src/openhuman/subconscious/schemas.rs echo echo "== Find other DB_INIT_FAILED_SENTINEL uses ==" rg -n --glob='*.rs' 'DB_INIT_FAILED_SENTINEL' src/openhuman/subconscious -S

Repository: tinyhumansai/openhuman

Length of output: 6283

Fix: make DDL failure cache workspace-scoped (current global poisoning across user switches)

reset_engine_for_user_switch() clears the cached SubconsciousEngine on logout/user switch, but src/openhuman/subconscious/store.rs’s DDL_EVER_FAILED / DDL_LOGGED are never reset—so if DDL fails for one workspace_dir, with_connection will fast-bail with DB_INIT_FAILED_SENTINEL for all subsequent workspace_dir values in the same process, skipping DB open + migrations even if the new workspace would succeed. Scope the sentinel/cache by DB path/workspace (or reset the DDL flags when resetting the engine).

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/subconscious/store.rs` around lines 32 - 57, The global DDL failure flags (DDL_EVER_FAILED, DDL_LOGGED) cause poisoning across workspaces because with_connection fast-bails for any workspace once one fails; change this by scoping the sentinel to the workspace (or by resetting the flags when the engine is reset). Either: replace the two globals with a workspace-keyed cache (e.g., map from workspace path/DB path to bools) and have with_connection consult/set the entry for the given workspace_dir/DB path and use DB_INIT_FAILED_SENTINEL per-workspace, or add logic in reset_engine_for_user_switch (and the SubconsciousEngine teardown) to clear DDL_EVER_FAILED and DDL_LOGGED so a new workspace can retry migrations; update references in with_connection to use the new per-workspace flag or cleared globals accordingly.

…try (resolves CodeRabbit) Resolves the workspace-poisoning issue CodeRabbit flagged: the old DDL_EVER_FAILED / DDL_LOGGED globals would prevent any workspace from connecting after a single failure in the process lifetime. Adopts upstream/main's retry-based approach (busy_timeout=5s + 3 application-level retries at 100/300/900ms backoff) which is both correct and workspace-safe. Also removes: - DB_INIT_FAILED_SENTINEL constant and is_db_init_failed helper - maybe_log_ddl_error helper - Stale sentinel test in store_tests.rs - Degraded-empty-list fallback branches in schemas.rs handlers (errors now propagate normally; the retry loop handles transient SQLITE_BUSY before that point) Merges upstream/main to resolve conflicts.

graycyrus · 2026-05-29T13:34:35Z

@coderabbitai review

coderabbitai · 2026-05-29T13:34:41Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

graycyrus requested a review from a team May 29, 2026 05:52

coderabbitai Bot added rust-core Core Rust runtime in src/: CLI, core_server, shared infrastructure. sentry-traced-bug Bug identified via Sentry triage bug labels May 29, 2026

coderabbitai Bot requested changes May 29, 2026

View reviewed changes

graycyrus merged commit c52f75d into tinyhumansai:main May 29, 2026
32 of 33 checks passed

This was referenced May 29, 2026

revert: remove all Sentry error suppression from PRs #2813 #2850 #2899 #2906 #2915 #2923 #2924 #2928 #2930 #2934 #2937 #2939 #2958

Closed

revert: remove all Sentry error suppression (12 PRs) #2959

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(subconscious): suppress repeated Sentry noise when DB init fails (TAURI-RUST-A, closes #2903)#2906

fix(subconscious): suppress repeated Sentry noise when DB init fails (TAURI-RUST-A, closes #2903)#2906
graycyrus merged 2 commits into
tinyhumansai:mainfrom
graycyrus:fix/sentry-subconscious-ddl-init-noise-2903

graycyrus commented May 29, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented May 29, 2026 •

edited

Loading

Walkthrough

Changes

Estimated Code Review Effort

Possibly Related PRs

Suggested Labels

Suggested Reviewers

Poem

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot May 29, 2026

Uh oh!

graycyrus commented May 29, 2026

Uh oh!

coderabbitai Bot commented May 29, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

graycyrus commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Notes

Uh oh!

coderabbitai Bot commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated Code Review Effort

Possibly Related PRs

Suggested Labels

Suggested Reviewers

Poem

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 29, 2026

Choose a reason for hiding this comment

Uh oh!

graycyrus commented May 29, 2026

Uh oh!

coderabbitai Bot commented May 29, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

graycyrus commented May 29, 2026 •

edited

Loading

coderabbitai Bot commented May 29, 2026 •

edited

Loading