[Onyx] Add IDB healing mechanism for "Internal error opening backing store"

Coming from https://github.com/Expensify/App/issues/87862#issuecomment-4428339647.

## Issue

For the `UnknownError: Internal error opening backing store for indexedDB.open` error class analyzed in #87862, neither retries (action item §8.2) nor degrading to `MemoryOnlyProvider` actually addresses the underlying problem. The session continues working from the in-memory cache so users don't see immediate degradation, but the storage layer is silently broken — and that broken state has two real costs:

1. **Operational cost**: log/Sentry volume from the silent retry storm (mitigated separately by §8.2).
2. **Data loss risk on offline refresh**: if the user is offline and accumulates queued writes (e.g. `SequentialQueue` items stored as an Onyx key), and the cache rebuilds empty from broken storage on refresh, those queued writes are gone. The user has no indication this happened.
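The second cost can be illustrated with a toy model. This is a sketch only, not Onyx code: `simulateOfflineRefresh`, the `queuedRequests` key, and the request names are all hypothetical, with one `Map` standing in for IndexedDB and another for the in-memory cache.

```typescript
type Queue = string[];

// Toy model: `disk` stands in for IndexedDB, `cache` for the in-memory cache.
function simulateOfflineRefresh(diskIsBroken: boolean): Queue {
    const disk = new Map<string, Queue>();
    const cache = new Map<string, Queue>();

    // The user queues two writes while offline.
    for (const request of ['AddComment', 'EditComment']) {
        const queue = [...(cache.get('queuedRequests') ?? []), request];
        cache.set('queuedRequests', queue); // the cache always absorbs the write
        if (!diskIsBroken) {
            disk.set('queuedRequests', queue); // persisted only when storage is healthy
        }
    }

    // Refresh: the cache is rebuilt from disk and nothing else survives.
    const reloadedCache = new Map(disk);
    return reloadedCache.get('queuedRequests') ?? [];
}
```

With healthy storage the function returns both queued requests; with broken storage it returns an empty queue, even though the session looked fine before the refresh.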

The right strategy is to attempt to **heal** the IDB connection so writes get back onto disk. There is no need to swap providers, show user-visible UI, or call `deleteDatabase()` (which Chapter 2 of #87862 proved also fails when corrupt LevelDB files persist on disk). Just reopen the connection and let normal operation resume if it heals; if it doesn't, fall through to the cache-only behavior the session already exhibits, without further log noise.

### Precedent: Dexie's workaround

Dexie ships [a clean precedent](https://github.com/dexie/Dexie.js/blob/master/src/classes/dexie/dexie-open.ts) for this approach — catch `UnknownError` from `indexedDB.open()` and retry up to 3 times. No backoff, no fallback, no provider swap:

```ts
.catch((err) => {
    switch (err?.name) {
        case 'UnknownError':
            if (state.PR1398_maxLoop > 0) {
                state.PR1398_maxLoop--;
                console.warn('Dexie: Workaround for Chrome UnknownError on open()');
                return tryOpenDB();
            }
            break;
        // ...
    }
    return Promise.reject(err);
});
```

There's also a sibling workaround in Dexie's `temp-transaction.ts`: when a transaction throws `InvalidStateError` while the DB reports as open, close and reopen the DB and retry once. Both are bounded by the same `PR1398_maxLoop = 3` budget.

## Solution

> **Scope: web (IDB) only.** The error class this issue addresses is Chromium-IDB-specific. The native SQLite provider hits a different set of errors (`disk I/O error`, `database is locked`, `database or disk is full`) with categorically different root causes — filesystem-level issues, lock contention, or genuine capacity exhaustion — none of which benefit from a close+reopen heal pattern. If a SQLite-side mitigation is needed, it should be designed and tracked separately.

Implement two related healing mechanisms on `IDBKeyValProvider`, designed to work together as a single healing strategy with a **shared retry budget** — directly mirroring Dexie's `PR1398_maxLoop` pattern.

### Shared retry counter

Maintain a single counter inside `IDBKeyValProvider` — call it `healAttemptsRemaining` — initialized to `3`. The counter is:

- **Decremented** on every heal attempt (both init retries and mid-session reopens).
- **Reset to 3** on every successful IDB operation (`IDBKeyValProvider.setItem`, `multiSet`, `mergeItem`, `multiMerge`, `removeItem`, etc.).
- **Checked** before any heal attempt — if it's already at 0, fall through to cache-only behavior without further attempts.

The counter, the heal logic, and all the log messages live entirely inside `IDBKeyValProvider`. None of this code runs on native; the SQLite provider is untouched.

This naturally creates a circuit breaker for the permanent-corruption case (no successes → counter drains to 0 → no further heal attempts), while allowing a healthy session to recover from multiple separate transient incidents (each success replenishes the budget). It mirrors Dexie's [`PR1398_maxLoop`](https://github.com/dexie/Dexie.js/blob/master/src/functions/temp-transaction.ts) directly.
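The three counter rules above could be wrapped in a small helper. This is a minimal sketch under assumptions: `HealBudget`, `tryConsume`, and `MAX_HEAL_ATTEMPTS` are hypothetical names, not existing Onyx or Dexie APIs.

```typescript
// Hypothetical helper; names are illustrative only.
const MAX_HEAL_ATTEMPTS = 3;

class HealBudget {
    private remaining = MAX_HEAL_ATTEMPTS;

    /** Checks the budget and, if any is left, consumes one attempt. Returns false once exhausted. */
    tryConsume(): boolean {
        if (this.remaining <= 0) {
            return false;
        }
        this.remaining--;
        return true;
    }

    /** Replenishes the budget; intended to be called after any successful IDB operation. */
    reset(): void {
        this.remaining = MAX_HEAL_ATTEMPTS;
    }

    get attemptsRemaining(): number {
        return this.remaining;
    }
}
```

Folding the check and the decrement into one `tryConsume()` call keeps the init path and the mid-session path from drifting apart, and the circuit-breaker behavior falls out of it: with no intervening `reset()`, the fourth call returns `false`.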

### 1. On provider init: retry `indexedDB.open()` on `UnknownError`

When `IDBKeyValProvider` init throws `UnknownError`, retry the `indexedDB.open()` call up to the remaining budget. This is the direct Dexie-style workaround for the transient post-`Clear cookies and site data` class (Walexander's repro in [Dexie #543](https://github.com/dexie/Dexie.js/issues/543)).

Sketch:

```ts
async function openWithHealing(dbName: string, version: number): Promise<IDBDatabase> {
    let lastError: unknown;
    while (healAttemptsRemaining > 0) {
        try {
            return await openIDB(dbName, version);
        } catch (error) {
            lastError = error;
            if (!(error instanceof DOMException) || error.name !== 'UnknownError') {
                throw error;
            }
            healAttemptsRemaining--;
            Logger.logInfo(`IDB heal: UnknownError on open, retrying. attemptsRemaining=${healAttemptsRemaining}`);
        }
    }
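    // If the budget was already 0 on entry, lastError is still undefined; throw a real Error instead.
    throw lastError ?? new Error('IDB heal: no attempts remaining for indexedDB.open()');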
}
```

### 2. Mid-session: `close + reopen` on this error

When the backing-store error fires during a write operation in an active session, attempt a `close + reopen` of the IDB connection (up to the remaining budget) before considering the operation failed:

```ts
async function healAndRetry<T>(operation: () => Promise<T>): Promise<T> {
    try {
        const result = await operation();
        healAttemptsRemaining = 3;  // reset budget on success — mirrors Dexie's pattern
        return result;
    } catch (error) {
        if (!isBackingStoreError(error) || healAttemptsRemaining <= 0) {
            throw error;
        }

        healAttemptsRemaining--;
        Logger.logInfo(`IDB heal: backing store error during operation — attempting close + reopen. attemptsRemaining=${healAttemptsRemaining}`);
        await closeConnection();
        await openIDB(DB_NAME, DB_VERSION);

        return healAndRetry(operation);  // recursive retry, bounded by shared counter
    }
}
```

If a heal succeeds and the subsequent operation completes, the counter resets to 3, so a fresh transient incident later in the session gets a full budget again. If 3 heal attempts fail in succession with no intervening success, the counter hits 0 and subsequent operations fall through to cache-only behavior without further heal attempts or log noise (the cache already absorbed the write per parent comment §5).
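The sketch above calls `isBackingStoreError` without defining it. One possible shape is shown below, assuming the error surfaces as a `DOMException`-like object whose `name` is `UnknownError` and whose `message` contains the Chromium backing-store text. Both the constant name and the matching rule are assumptions and would need checking against the errors actually observed in #87862.

```typescript
// Assumption: the message text below is what Chromium reports for this error class.
const BACKING_STORE_MESSAGE = 'Internal error opening backing store';

// Duck-typed so it works on DOMException and on plain objects used in tests.
function isBackingStoreError(error: unknown): boolean {
    if (typeof error !== 'object' || error === null) {
        return false;
    }
    const {name, message} = error as {name?: unknown; message?: unknown};
    return name === 'UnknownError' && typeof message === 'string' && message.includes(BACKING_STORE_MESSAGE);
}
```

Matching on both the name and the message keeps unrelated `UnknownError`s, and unrelated errors that merely quote the message, out of the heal path.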

### Important constraints

- **No provider swap.** The cache already serves reads and absorbs writes (parent comment §7). Swapping to `MemoryOnlyProvider` changes nothing observable during the session and adds complexity without benefit.
- **No user-visible UI / notification.** The session is already serving correctly from cache; there's nothing to surface.
- **No `deleteDatabase()` calls.** Chapter 2 of #87862 demonstrated that `deleteDatabase()` also fails when corrupt LevelDB files persist — it's not a viable healing primitive in this scenario.
- **Bounded heal attempts via shared counter.** At most 3 consecutive attempts without an intervening success; any success resets the budget. Healing should be cheap and silent; if it doesn't work within the budget, further attempts are pure noise until something succeeds and refreshes the budget.

### Coordination with action items §8.1 and §8.2

This issue can be worked on independently of the others — the heal logic lives **inside the provider** (e.g. `IDBKeyValProvider.setItem` wrapping the raw IDB call), so errors are caught and retried before they ever reach `tryOrDegradePerformance` or `retryOperation`. None of the other action items are strict prerequisites.

That said, they interact:

- [**§8.1** (async-catch fix + removal of the `'Internal error opening backing store'` string check)](https://github.com/Expensify/App/issues/90632). Currently the string check is dead code because of Bug 1, so it doesn't conflict with this issue. But §8.1 must remove the string check at the same time it fixes the catch — otherwise the live string check would route this error class to `degradePerformance`, which we don't want once heal is in place. As long as §8.1 ships its two changes together, the order with this issue doesn't matter.
- [**§8.2** (`NON_RETRIABLE_ERRORS` classification for this error class)](https://github.com/Expensify/App/issues/90633) prevents `OnyxUtils.retryOperation` from retrying the operation independently when it eventually reaches that layer (after a failed heal). Without §8.2, a heal-failed write would still produce 5 retry log lines + an exhaustion alert from `retryOperation`. With §8.2, it produces a single log line and exits cleanly. Either order works; the user-visible behavior is the same, but §8.2 cuts the residual log noise that survives a failed heal.

### Test plan

- Unit test (init heal): mock `indexedDB.open` to reject with `UnknownError` twice, then resolve. Confirm init succeeds after 3 attempts total (1 initial + 2 retries), heal-attempt logs fire, and the counter ends at `1` (3 - 2 decrements).
- Unit test (init heal exhaustion): mock `indexedDB.open` to always reject with `UnknownError`. Confirm init fails after 3 attempts, counter is `0`, and subsequent operations skip the heal path.
- Unit test (mid-session heal): mock `Storage.setItem` to reject once with the backing-store error, then resolve. Confirm a close+reopen happens, the second `setItem` succeeds, the heal log fires, and the counter is reset to `3` after success.
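- Unit test (mid-session heal exhaustion): mock `Storage.setItem` to reject with the backing-store error on every call (if it resolved after the 3rd rejection, the retry following the 3rd heal would succeed and reset the counter). Confirm exactly 3 heal attempts happen, the counter drains to `0`, and the 4th failing `setItem` does NOT trigger another heal attempt.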
- Unit test (counter reset after success): drain the counter to `1` with mid-session heals, then a successful `setItem`, then a new error. Confirm the counter was reset to `3` and a fresh heal attempt fires.
- Verify in VictoriaLogs post-deploy: `IDB heal` log lines appear, and a fraction of users emit them and then continue without further `Failed to save to storage` errors (indicating successful heal). The ratio of post-heal successes to heal attempts gives a direct readout of how often this mechanism helps.
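The two init-heal cases can be exercised without a real IndexedDB by injecting the opener. The following is a sketch under assumptions: `openWithBudget`, `makeFlakyOpener`, and the `Budget` shape are hypothetical stand-ins for the provider internals, and a plain `Error` named `UnknownError` stands in for the browser's `DOMException`.

```typescript
// Hypothetical, dependency-injected variant of the init retry loop.
// `Opener` stands in for a promisified indexedDB.open().
type Opener<T> = () => Promise<T>;

interface Budget {
    remaining: number;
}

function makeUnknownError(): Error {
    const error = new Error('Internal error opening backing store for indexedDB.open.');
    error.name = 'UnknownError';
    return error;
}

async function openWithBudget<T>(opener: Opener<T>, budget: Budget): Promise<T> {
    let lastError: unknown = new Error('IDB heal budget exhausted');
    while (budget.remaining > 0) {
        try {
            return await opener();
        } catch (error) {
            // Only UnknownError is retried; anything else is rethrown untouched.
            if (!(error instanceof Error) || error.name !== 'UnknownError') {
                throw error;
            }
            lastError = error;
            budget.remaining--;
        }
    }
    throw lastError;
}

// Fake opener: rejects with UnknownError `failures` times, then resolves.
function makeFlakyOpener(failures: number): {opener: Opener<string>; calls: () => number} {
    let callCount = 0;
    return {
        opener: async () => {
            callCount++;
            if (callCount <= failures) {
                throw makeUnknownError();
            }
            return 'connection';
        },
        calls: () => callCount,
    };
}
```

With `makeFlakyOpener(2)` and a budget of 3, the call resolves on the third attempt and leaves `remaining` at `1`. With an opener that always fails, it rejects with the last `UnknownError` after three calls and leaves `remaining` at `0`.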

