Skip to content

fix(subconscious): suppress repeated Sentry noise when DB init fails (TAURI-RUST-A, closes #2903)#2906

Merged
graycyrus merged 2 commits into
tinyhumansai:mainfrom
graycyrus:fix/sentry-subconscious-ddl-init-noise-2903
May 29, 2026
Merged

fix(subconscious): suppress repeated Sentry noise when DB init fails (TAURI-RUST-A, closes #2903)#2906
graycyrus merged 2 commits into
tinyhumansai:mainfrom
graycyrus:fix/sentry-subconscious-ddl-init-noise-2903

Conversation

@graycyrus
Copy link
Copy Markdown
Contributor

@graycyrus graycyrus commented May 29, 2026

Summary

  • Root cause: store::with_connection runs the full schema DDL on every call. Under concurrent in-process RPC calls (status poll every 3 s, tasks-list on Intelligence page, manual trigger), SQLite's default busy_timeout=0 causes immediate SQLITE_BUSY failures — producing ~1,300 identical Sentry events per 24 h from a single underlying cause (TAURI-RUST-A).
  • Fix: adopt a three-layer defence: (1) per-connection busy_timeout=5s to absorb the common concurrent-connection race, (2) application-level retry loop (3 retries, 100/300/900 ms exponential backoff) for the residual case where the timeout is exhausted, (3) retry classifier (is_sqlite_busy) that only retries DatabaseBusy/DatabaseLocked — schema or corruption errors fail through immediately so Sentry gets a real root cause.
  • Workspace-safe: unlike the previous global-flag approach, this retry strategy is per-connection and scoped by workspace_dir — no cross-workspace poisoning.
  • Replaces the earlier DDL_EVER_FAILED / DB_INIT_FAILED_SENTINEL approach that CodeRabbit correctly identified as globally poisoning all workspaces after one failure.

Test plan

  • cargo check passes (pure Rust change)
  • Existing subconscious unit tests pass (CRUD tasks/log/escalations, seed, recurrence roundtrip)
  • New tests: is_sqlite_busy_{matches,does_not_match}* assert the retry classifier, with_connection_resolves_external_write_contention exercises the retry loop against a real write-lock held for 250 ms
  • is_db_init_failed helper and DB_INIT_FAILED_SENTINEL constant removed — no workspace-poisoning possible

Notes

Merge of upstream/main resolves all conflicts. Pre-push hook failure (Prettier not installed in worktree) is pre-existing and unrelated to this Rust-only change.

Closes #2903

… (TAURI-RUST-A)

DDL runs on every `with_connection` call. On Windows, if the very first
call fails (directory lock, antivirus hold, network-drive WAL rejection,
etc.) every subsequent `subconscious_tasks_list` poll re-runs the same
failing path and fires `report_error_or_expected` — producing 1,499+
identical Sentry events from a single underlying cause (TAURI-RUST-A).

Fix: add `DDL_EVER_FAILED` / `DDL_LOGGED` atomic flags in `store.rs`.
- First failure: log at `error!` level (Sentry captures once) + set both flags.
- Subsequent calls: return `DB_INIT_FAILED_SENTINEL` immediately, skipping the
  expensive re-open + DDL re-run entirely.
- Read-only list handlers (`tasks_list`, `log_list`, `escalations_list`,
  `reflections_list`, `status`) in `schemas.rs` detect the sentinel via
  `is_db_init_failed()` and return empty payloads instead of propagating
  an error that would re-fire `report_error_or_expected` on every poll.
- Write/mutate handlers (`tasks_add`, `tasks_update`, `tasks_remove`,
  `escalations_approve`, `escalations_dismiss`, `reflections_act`,
  `reflections_dismiss`) still surface the error — appropriate since those
  are explicit user actions, not background polls.

Closes tinyhumansai#2903.
@graycyrus graycyrus requested a review from a team May 29, 2026 05:52
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 29, 2026

📝 Walkthrough

Walkthrough

The PR prevents Sentry flooding from repeated DDL initialization failures by introducing a fast-fail sentinel. The store module caches the first DDL failure and returns a constant sentinel on subsequent calls. Five RPC handlers detect this sentinel and return empty responses instead of propagating errors.

Changes

Graceful Degradation for Subconscious DDL Init Failure

Layer / File(s) Summary
DB Init Sentinel & Fast-Fail Core
src/openhuman/subconscious/store.rs
with_connection now uses atomic flags to track DDL failures. First failure is logged once via maybe_log_ddl_error, then returns DB_INIT_FAILED_SENTINEL. Subsequent calls immediately return the sentinel without re-attempting DDL or DB open.
RPC Handler Graceful Degradation
src/openhuman/subconscious/schemas.rs
Five handlers (handle_status, handle_tasks_list, handle_log_list, handle_escalations_list, handle_reflections_list) detect the sentinel via new is_db_init_failed helper and return empty/zeroed responses instead of failing. Non-sentinel errors are still propagated.
Sentinel Constant Validation Test
src/openhuman/subconscious/store_tests.rs
Regression test validates DB_INIT_FAILED_SENTINEL is non-empty and contains [subconscious] prefix and "db unavailable" substring to ensure stable matching.

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly Related PRs

  • tinyhumansai/openhuman#2851: Both PRs modify the same subconscious RPC handlers in schemas.rs and the store::with_connection flow in store.rs, but implement different DDL failure recovery strategies (fast-fail sentinel vs. SQLITE_BUSY/LOCKED retry).

Suggested Labels

rust-core, sentry-traced-bug, bug

Suggested Reviewers

  • oxoxDev
  • M3gA-Mind

Poem

🐰 A sentinel stands guard by the door,
When DDL stumbles, it stops the roar—
Five handlers smile with empty grace,
No Sentry floods fill this place! ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly identifies the main change: suppressing repeated Sentry noise when DB initialization fails, directly addressing the core problem in the changeset.
Linked Issues check ✅ Passed The PR meets all objectives from #2903: implements graceful degradation for read-only handlers [#2903], prevents repeated Sentry logging [#2903], adds regression test [#2903], and maintains code quality.
Out of Scope Changes check ✅ Passed All changes are in scope: store.rs implements the sentinel-based caching, schemas.rs applies graceful degradation, and store_tests.rs validates the sentinel contract as required.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot added rust-core Core Rust runtime in src/: CLI, core_server, shared infrastructure. sentry-traced-bug Bug identified via Sentry triage bug labels May 29, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/openhuman/subconscious/store.rs`:
- Around line 32-57: The global DDL failure flags (DDL_EVER_FAILED, DDL_LOGGED)
cause poisoning across workspaces because with_connection fast-bails for any
workspace once one fails; change this by scoping the sentinel to the workspace
(or by resetting the flags when the engine is reset). Either: replace the two
globals with a workspace-keyed cache (e.g., map from workspace path/DB path to
bools) and have with_connection consult/set the entry for the given
workspace_dir/DB path and use DB_INIT_FAILED_SENTINEL per-workspace, or add
logic in reset_engine_for_user_switch (and the SubconsciousEngine teardown) to
clear DDL_EVER_FAILED and DDL_LOGGED so a new workspace can retry migrations;
update references in with_connection to use the new per-workspace flag or
cleared globals accordingly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 186bbee2-a2da-4f53-af74-4d130e359011

📥 Commits

Reviewing files that changed from the base of the PR and between db3fdc2 and e71bb2e.

📒 Files selected for processing (3)
  • src/openhuman/subconscious/schemas.rs
  • src/openhuman/subconscious/store.rs
  • src/openhuman/subconscious/store_tests.rs

Comment thread src/openhuman/subconscious/store.rs Outdated
Comment on lines +32 to +57
static DDL_EVER_FAILED: AtomicBool = AtomicBool::new(false);

/// Set to `true` once we have emitted the first error log. Ensures Sentry only
/// captures the failure once — subsequent calls get a silent sentinel error.
static DDL_LOGGED: AtomicBool = AtomicBool::new(false);

/// Sentinel message returned after the first DDL failure. Read-only list
/// handlers in `schemas.rs` match on this prefix to return empty payloads
/// instead of propagating an error that would fire `report_error_or_expected`.
pub const DB_INIT_FAILED_SENTINEL: &str = "[subconscious] db unavailable (init failed)";

/// Open the subconscious database and run schema migrations.
///
/// After the first DDL failure the function returns [`DB_INIT_FAILED_SENTINEL`]
/// immediately, skipping both the filesystem open and the DDL re-run. Callers
/// that perform read-only list queries should treat this sentinel as "empty
/// result" rather than an RPC error (see `schemas.rs`).
pub fn with_connection<T>(
workspace_dir: &Path,
f: impl FnOnce(&Connection) -> Result<T>,
) -> Result<T> {
// Fast path: DDL already failed in this process — skip the re-run and
// return the sentinel without logging again.
if DDL_EVER_FAILED.load(Ordering::Relaxed) {
anyhow::bail!(DB_INIT_FAILED_SENTINEL);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Call sites of store::with_connection =="
rg -nP --type=rs -C2 '\bstore::with_connection\s*\('

echo
echo "== All workspace_dir usages (look for mutation/switch patterns) =="
rg -nP --type=rs -C3 '\bworkspace_dir\b'

echo
echo "== Potential workspace switch/update entry points =="
rg -nP --type=rs -C3 'set_.*workspace|workspace.*(switch|change|update)|load_config_with_timeout|load_or_init'

Repository: tinyhumansai/openhuman

Length of output: 154


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Call sites of store::with_connection =="
rg -n --glob='*.rs' '\bstore::with_connection\s*\(' -S

echo
echo "== All workspace_dir usages (look for mutation/switch patterns) =="
rg -n --glob='*.rs' '\bworkspace_dir\b' -S

echo
echo "== Potential workspace switch/update entry points =="
rg -n --glob='*.rs' 'set_.*workspace|workspace.*(switch|change|update)|load_config_with_timeout|load_or_init' -S

Repository: tinyhumansai/openhuman

Length of output: 50379


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Inspect src/openhuman/subconscious/global.rs =="
ls -la src/openhuman/subconscious/global.rs
sed -n '1,220p' src/openhuman/subconscious/global.rs

echo
echo "== Search for multi-workspace or switching language in subconscious =="
rg -n --glob='src/openhuman/subconscious/**/*.rs' '(switch|switching|active_user|multi[-_ ]user|concurrent|per-user|leak|leaking|NOW-correct)' -S

echo
echo "== Look for how Config is loaded per request/task (workspace_dir scope) =="
rg -n --glob='src/openhuman/**/*.rs' 'Config::load_or_init|active_user\.toml|workspace_dir.*active_user|load_active|set_active' -S src/openhuman

Repository: tinyhumansai/openhuman

Length of output: 21563


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Inspect src/openhuman/subconscious/store.rs (atomic flags + maybe_log_ddl_error + with_connection failure paths) =="
sed -n '1,220p' src/openhuman/subconscious/store.rs
echo
sed -n '220,520p' src/openhuman/subconscious/store.rs

echo
echo "== Search for reset_engine_for_user_switch call sites =="
rg -n --glob='*.rs' 'reset_engine_for_user_switch\s*\(' -S src/openhuman

echo
echo "== Search for DDL_EVER_FAILED / DDL_LOGGED resets/usages =="
rg -n --glob='*.rs' 'DDL_EVER_FAILED|DDL_LOGGED|DB_INIT_FAILED_SENTINEL' -S src/openhuman/subconscious

Repository: tinyhumansai/openhuman

Length of output: 21425


🏁 Script executed:

#!/bin/bash
set -euo pipefail

sed -n '330,430p' src/openhuman/credentials/ops.rs

echo
echo "== Inspect src/openhuman/subconscious/schemas.rs handling DB_INIT_FAILED_SENTINEL =="

# show the relevant match/if block around the sentinel usage near line ~758
sed -n '720,810p' src/openhuman/subconscious/schemas.rs

echo
echo "== Find other DB_INIT_FAILED_SENTINEL uses =="

rg -n --glob='*.rs' 'DB_INIT_FAILED_SENTINEL' src/openhuman/subconscious -S

Repository: tinyhumansai/openhuman

Length of output: 6283


Fix: make DDL failure cache workspace-scoped (current global poisoning across user switches)

reset_engine_for_user_switch() clears the cached SubconsciousEngine on logout/user switch, but src/openhuman/subconscious/store.rs’s DDL_EVER_FAILED / DDL_LOGGED are never reset—so if DDL fails for one workspace_dir, with_connection will fast-bail with DB_INIT_FAILED_SENTINEL for all subsequent workspace_dir values in the same process, skipping DB open + migrations even if the new workspace would succeed. Scope the sentinel/cache by DB path/workspace (or reset the DDL flags when resetting the engine).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/subconscious/store.rs` around lines 32 - 57, The global DDL
failure flags (DDL_EVER_FAILED, DDL_LOGGED) cause poisoning across workspaces
because with_connection fast-bails for any workspace once one fails; change this
by scoping the sentinel to the workspace (or by resetting the flags when the
engine is reset). Either: replace the two globals with a workspace-keyed cache
(e.g., map from workspace path/DB path to bools) and have with_connection
consult/set the entry for the given workspace_dir/DB path and use
DB_INIT_FAILED_SENTINEL per-workspace, or add logic in
reset_engine_for_user_switch (and the SubconsciousEngine teardown) to clear
DDL_EVER_FAILED and DDL_LOGGED so a new workspace can retry migrations; update
references in with_connection to use the new per-workspace flag or cleared
globals accordingly.

…try (resolves CodeRabbit)

Resolves the workspace-poisoning issue CodeRabbit flagged: the old
DDL_EVER_FAILED / DDL_LOGGED globals would prevent any workspace from
connecting after a single failure in the process lifetime.

Adopts upstream/main's retry-based approach (busy_timeout=5s +
3 application-level retries at 100/300/900ms backoff) which is both
correct and workspace-safe.

Also removes:
- DB_INIT_FAILED_SENTINEL constant and is_db_init_failed helper
- maybe_log_ddl_error helper
- Stale sentinel test in store_tests.rs
- Degraded-empty-list fallback branches in schemas.rs handlers
  (errors now propagate normally; the retry loop handles transient
  SQLITE_BUSY before that point)

Merges upstream/main to resolve conflicts.
@graycyrus
Copy link
Copy Markdown
Contributor Author

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 29, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@graycyrus graycyrus merged commit c52f75d into tinyhumansai:main May 29, 2026
32 of 33 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug rust-core Core Rust runtime in src/: CLI, core_server, shared infrastructure. sentry-traced-bug Bug identified via Sentry triage

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Subconscious schema DDL failure fires 1,499 Sentry errors on every RPC call

1 participant