Skip to content

fix(telemetry): handle obvious main-branch field crashes#1115

Open
pedramamini wants to merge 1 commit into
mainfrom
fix/sentry-main-telemetry-noise
Open

fix(telemetry): handle obvious main-branch field crashes#1115
pedramamini wants to merge 1 commit into
mainfrom
fix/sentry-main-telemetry-noise

Conversation

@pedramamini

@pedramamini pedramamini commented Jun 21, 2026

Copy link
Copy Markdown
Collaborator

Summary

Triages the channel:stable (= main) Sentry issues for Maestro and fixes the four that affect the current 0.17.x release and have an obvious bug + path to resolution. All four are the same family the prior triages addressed: expected/recoverable conditions being reported to Sentry as crashes.

Validated each against current main code (not rc) before fixing. Issues that are build/packaging (MAESTRO-RS glibc), native Chromium crashes (SG/RB/SZ/Y), already-archived (1G), genuine spawn failures worth surfacing (R0/NM), or already-fixed-on-main but still trickling from old 0.15.x builds (4Y, BH) were deliberately left out of scope.

Fixes

Issue Symptom (current release) Fix
MAESTRO-RR "codex usage sample failed" — ~375 events, all reason: request: fetch failed The Codex usage sampler is best-effort; a thrown fetch (offline / DNS / TLS / abort-timeout) is expected. Stop reporting it to Sentry. The 401/403 carve-out and unexpected-HTTP reporting are unchanged.
MAESTRO-M9 RangeError: Invalid string length in agentSessions global-stats refresh V8 throws this when a session log is too large to read into one string. Treat it as an expected, recoverable data condition (skip the file + warn) rather than a Sentry crash. Matches the rc fix in #1114 (and the storage-layer pattern) so the two converge cleanly on the next rcmain merge.
MAESTRO-SP stats:record-shortcut-usage … Database not initialized A shortcut can fire before the stats DB finishes init. Skip the fire-and-forget analytics write when the DB isn't ready (new StatsDB.isReady()).
MAESTRO-QF sessions:setMany returned false (recoverable disk error) The debounced persistence flush throws this deliberately (to preserve isPending for retry) when the main process returns a recoverable disk error. Keep that expected user-environment condition out of Sentry; genuine flush failures still report.

Tests

Adds regression tests for RR (network failure not reported), SP (skips silently when DB not ready; records + broadcasts when ready), and QF (recoverable disk error not reported; genuine failure still reported).

  • npm run lint (tsc) — no new errors in changed files
  • npm run lint:eslint — clean
  • prettier --check — clean
  • Affected suites: 194 passed (codex-usage-sampler, stats handlers, stats-db, agentSessions, useDebouncedPersistence)

Summary by CodeRabbit

  • Bug Fixes
    • Improved error handling for quota metadata requests so recoverable network/offline-style failures no longer generate unnecessary error reporting.
    • Prevented stats recording and related events when the stats database isn’t ready.
    • Treated oversized session parsing issues as recoverable to avoid excessive error reporting.
    • Refined flush failure handling to suppress reporting for recoverable disk errors while still reporting genuine errors.
  • Tests
    • Added/expanded coverage for quota request behavior, shortcut usage recording, debounced persistence flush reporting, and oversized session parsing scenarios.

@coderabbitai

coderabbitai Bot commented Jun 21, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

This PR suppresses Sentry noise from recoverable/expected failures across three subsystems: network-level fetch errors in the Codex usage sampler, recoverable disk errors in the renderer session persistence hook, and uninitialized-DB errors in the stats:record-shortcut-usage IPC handler via a new StatsDB.isReady() method. It also adds conditional error handling for oversized session files in the agent sessions global stats pass.

Changes

Recoverable Error Noise Reduction

Layer / File(s) Summary
StatsDB isReady guard and IPC handler short-circuit
src/main/stats/stats-db.ts, src/main/ipc/handlers/stats.ts, src/__tests__/main/ipc/handlers/stats.test.ts
Adds StatsDB.isReady() (returns this.db !== null) and uses it in stats:record-shortcut-usage to return null without recording when the DB is not yet initialized; tests verify both the ready and not-ready handler paths.
Codex fetch error Sentry suppression
src/main/agents/codex-usage-sampler.ts, src/__tests__/main/agents/codex-usage-sampler.test.ts
Removes the reportCodexUsageFailure call from the fetch catch block so offline/abort/DNS/TLS failures return the error payload silently; test asserts captureMessage is never called for network errors.
Renderer persistence recoverable disk error suppression
src/renderer/hooks/utils/useDebouncedPersistence.ts, src/__tests__/renderer/hooks/utils/useDebouncedPersistence.test.ts
Makes captureException conditional on whether the error message contains "recoverable disk error"; tests cover both the suppressed case (setAll returns false) and the reported case (setAll rejects with a real error).
Oversized session file RangeError handling
src/main/ipc/handlers/agentSessions.ts
Treats RangeError in Claude and Codex incremental session file parsing as a recoverable condition with a dedicated warning log including the file path, skipping Sentry reporting; other error types continue to be captured and logged generically.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

  • RunMaestro/Maestro#1039: Earlier implementation of sampleCodexUsage network/quota-sampling behavior in codex-usage-sampler.ts that this PR modifies by removing Sentry reporting for fetch failures.
  • RunMaestro/Maestro#1094: Also modifies sampleCodexUsage Sentry-reporting logic (HTTP failure branches) with corresponding codex sampler tests, directly intersecting with this PR's fetch catch-block change.

Suggested labels

approved, ready to merge

🐇 A rabbit hops through errors galore,
But "recoverable" means: just ignore!
No Sentry alarm for a disk that might wheeze,
Or a fetch that times out in a cold network breeze.
The stats DB sleeps? We skip with a wink —
Less noise in the logs is more calming, I think! 🌿

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely summarizes the main change: fixing telemetry/Sentry reporting issues for expected crashes.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/sentry-main-telemetry-noise

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Addresses four stable-channel (main) Sentry issues seen on the current
0.17.x release, all where the bug and path to resolution are clear:

- MAESTRO-RR: the Codex usage sampler reported network "request: fetch
  failed" errors to Sentry (the dominant cause, ~375 events). Offline /
  DNS / TLS / abort-timeout failures are expected for a best-effort
  background sampler, so skip them - the UI still surfaces the error
  state. The existing 401/403 carve-out (un-logged-in CODEX_HOME) stays.

- MAESTRO-M9: the global-stats refresh reported the RangeError ("Invalid
  string length") that V8 throws when a session log is too large to read
  into a single string. Treat it as an expected, recoverable data
  condition (skip the file, log a warning) instead of a Sentry crash -
  mirrors the storage-layer pattern and the rc fix in #1114 so the two
  converge cleanly on the next rc -> main merge.

- MAESTRO-SP: stats:record-shortcut-usage threw "Database not
  initialized" when a shortcut fired before the stats DB finished
  initializing. Skip the fire-and-forget analytics write when the DB
  isn't ready yet (new StatsDB.isReady()).

- MAESTRO-QF: the debounced session-persistence flush reported the main
  process's deliberate "recoverable disk error" (a false return thrown
  only to preserve isPending for retry) to Sentry. Keep that expected
  user-environment condition out of Sentry; genuine flush failures still
  report.

Adds regression tests for RR, SP, and QF.
@pedramamini pedramamini force-pushed the fix/sentry-main-telemetry-noise branch from 71f85e1 to 4aa9b1d Compare June 21, 2026 14:28
@greptile-apps

greptile-apps Bot commented Jun 21, 2026

Copy link
Copy Markdown

Greptile Summary

This PR reduces Sentry noise by stopping four categories of expected, recoverable user-environment conditions from being reported as crashes in the channel:stable release. All four fixes follow the same pattern: distinguish genuinely unexpected failures from predictable error states (offline network, DB not yet initialized, transient disk error, oversized log file) and suppress only the latter.

  • MAESTRO-RR (codex-usage-sampler.ts): Fetch errors (offline, DNS/TLS, abort timeout) no longer go to Sentry; 401/403 and other unexpected HTTP errors still do.
  • MAESTRO-M9 (agentSessions.ts): Session log files ≥ 500 MB are skipped with a warning instead of crashing the whole stats pass with a V8 string-length RangeError.
  • MAESTRO-SP (stats.ts / stats-db.ts): Shortcut events that arrive before the stats DB finishes initializing now return null silently instead of throwing "Database not initialized" across the IPC bridge. A new isReady() method is added to StatsDB — but a pre-existing isReady() with different semantics already exists in the class, producing a duplicate identifier.
  • MAESTRO-QF (useDebouncedPersistence.ts): Recoverable disk errors (main process returned false from setAll/setMany) are filtered from Sentry via a message.includes('recoverable disk error') substring check; genuine exceptions still report.

Confidence Score: 3/5

Hold for the duplicate isReady() in stats-db.ts before merging — the other three fixes are clean and safe.

Three of the four fixes are straightforward and well-tested. The MAESTRO-SP fix introduces a new isReady() in StatsDB at line 103, but an existing isReady() with stronger semantics already lives in the same class at line 188–190. TypeScript treats duplicate method definitions as a compile error (TS2300), and the weaker new implementation could permit shortcut-usage writes before schema migrations finish if the duplicate is resolved in favour of the new definition.

src/main/stats/stats-db.ts — the duplicate isReady() needs to be resolved before this PR is merged.

Important Files Changed

Filename Overview
src/main/stats/stats-db.ts Adds isReady() at line 103 returning this.db !== null, but a pre-existing isReady() already exists at line 188 returning this.initialized && this.db !== null — duplicate method definition causes a TypeScript compile error and the new implementation has weaker semantics.
src/renderer/hooks/utils/useDebouncedPersistence.ts Suppresses Sentry reporting for recoverable disk errors via message.includes() string check — functional but fragile; genuine exceptions still report correctly.
src/main/agents/codex-usage-sampler.ts Stops reporting network fetch failures to Sentry; 401/403 and unexpected HTTP errors still report correctly. Clean change.
src/main/ipc/handlers/agentSessions.ts Adds a 500 MB size guard before reading session files; straightforward and safe.
src/main/ipc/handlers/stats.ts Adds db.isReady() guard before incrementShortcutUsage(); relies on StatsDB.isReady() which has a duplicate-definition issue in this PR.
src/tests/main/agents/codex-usage-sampler.test.ts Adds regression test confirming network TypeError is not sent to Sentry. Well-scoped and correct.
src/tests/main/ipc/handlers/stats.test.ts Adds isReady and incrementShortcutUsage mocks; covers both ready and not-ready paths for the shortcut-usage handler.
src/tests/renderer/hooks/utils/useDebouncedPersistence.test.ts Adds Sentry mock and two new tests: recoverable error not reported, genuine error reported. Coverage is precise and correct.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Recoverable condition occurs] --> B{Which fix?}
    B --> RR[MAESTRO-RR\nFetch throws in codex-usage-sampler]
    RR --> RR2[Return error snapshot\nNo Sentry report]
    B --> M9[MAESTRO-M9\nSession file >= 500 MB]
    M9 --> M9b{fileStat.size > MAX?}
    M9b -->|Yes| M9c[logger.warn + continue\nSkip file, no crash]
    M9b -->|No| M9d[readFile + parse normally]
    B --> SP[MAESTRO-SP\nShortcut fires before DB ready]
    SP --> SP2{db.isReady?}
    SP2 -->|false| SP3[return null silently]
    SP2 -->|true| SP4[incrementShortcutUsage\nbroadcast stats:updated]
    B --> QF[MAESTRO-QF\nsetAll/setMany returns false]
    QF --> QF2[persistInternal throws\nrecoverable disk error]
    QF2 --> QF3{message.includes\nrecoverable disk error?}
    QF3 -->|Yes| QF4[Suppress Sentry\nisPending preserved for retry]
    QF3 -->|No| QF5[captureException to Sentry]
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
    A[Recoverable condition occurs] --> B{Which fix?}
    B --> RR[MAESTRO-RR\nFetch throws in codex-usage-sampler]
    RR --> RR2[Return error snapshot\nNo Sentry report]
    B --> M9[MAESTRO-M9\nSession file >= 500 MB]
    M9 --> M9b{fileStat.size > MAX?}
    M9b -->|Yes| M9c[logger.warn + continue\nSkip file, no crash]
    M9b -->|No| M9d[readFile + parse normally]
    B --> SP[MAESTRO-SP\nShortcut fires before DB ready]
    SP --> SP2{db.isReady?}
    SP2 -->|false| SP3[return null silently]
    SP2 -->|true| SP4[incrementShortcutUsage\nbroadcast stats:updated]
    B --> QF[MAESTRO-QF\nsetAll/setMany returns false]
    QF --> QF2[persistInternal throws\nrecoverable disk error]
    QF2 --> QF3{message.includes\nrecoverable disk error?}
    QF3 -->|Yes| QF4[Suppress Sentry\nisPending preserved for retry]
    QF3 -->|No| QF5[captureException to Sentry]
Loading

Reviews (1): Last reviewed commit: "fix(telemetry): handle obvious main-bran..." | Re-trigger Greptile

Comment on lines +103 to +105
isReady(): boolean {
return this.db !== null;
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Duplicate isReady() definition — compile error and weaker semantics

This PR adds a second isReady() at line 103 that returns this.db !== null, but a pre-existing isReady() already exists in the same class at line 188–190 and returns this.initialized && this.db !== null. TypeScript will emit error TS2300: Duplicate identifier 'isReady'. Beyond the compile failure, the semantics differ: the pre-existing check requires both the DB handle to be open and initialize() to have run to completion (which includes running migrations); the new one only checks for the handle, meaning it could return true between the this.db = new Database(...) assignment (line 139) and runMigrations() finishing (line 149/151). The pre-existing method should be removed and the new one updated to preserve the stronger this.initialized && this.db !== null guard.

// Maestro bug, so keep it out of Sentry. Genuine flush failures (real
// exceptions) still report. (MAESTRO-QF)
const message = err instanceof Error ? err.message : String(err);
if (!message.includes('recoverable disk error')) {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 String-based error discrimination is fragile

The message.includes('recoverable disk error') guard couples the catch site (line 391) to the throw-site messages at lines 340 and 355 through a magic string. If either message is ever rephrased without updating this check, the recoverable error will silently start reaching Sentry (or, conversely, a new genuine error with that substring will be silently swallowed). A dedicated error subclass — e.g. class RecoverableDiskError extends Error {} — gives the same locality with a robust instanceof check that can't go stale.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/main/stats/stats-db.ts`:
- Around line 96-105: The StatsDB class has duplicate isReady() method
definitions, with one at lines 96-105 and another at line 188. Remove the first
isReady() method implementation (the one shown in the diff at lines 96-105 with
the JSDoc comment about the database handle being open) to eliminate the
duplicate method declaration, as having two methods with the same name in the
same class will cause compilation errors.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d35e94b3-1578-40c6-a3a6-329271cc42ec

📥 Commits

Reviewing files that changed from the base of the PR and between 549b402 and 71f85e1.

📒 Files selected for processing (8)
  • src/__tests__/main/agents/codex-usage-sampler.test.ts
  • src/__tests__/main/ipc/handlers/stats.test.ts
  • src/__tests__/renderer/hooks/utils/useDebouncedPersistence.test.ts
  • src/main/agents/codex-usage-sampler.ts
  • src/main/ipc/handlers/agentSessions.ts
  • src/main/ipc/handlers/stats.ts
  • src/main/stats/stats-db.ts
  • src/renderer/hooks/utils/useDebouncedPersistence.ts

Comment on lines +96 to +105
/**
* True once the underlying database handle is open and usable. Lets
* best-effort callers (e.g. fire-and-forget analytics writes) skip
* gracefully instead of throwing "Database not initialized" when a write
* races app startup (the handle is created in `initialize()`, called on
* app-ready). (MAESTRO-SP)
*/
isReady(): boolean {
return this.db !== null;
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify only one StatsDB.isReady() implementation exists after fix.
rg -nP '^\s*isReady\(\): boolean\s*\{' src/main/stats/stats-db.ts
# Expected: exactly one match

Repository: RunMaestro/Maestro

Length of output: 114


Remove the duplicate isReady() implementation at lines 96–105.

StatsDB defines isReady() twice: at line 103 (in the changed code) and line 188. Duplicate method declarations in a class are a critical error and will cause a compilation failure or unexpected behavior.

Suggested fix
-	/**
-	 * True once the underlying database handle is open and usable. Lets
-	 * best-effort callers (e.g. fire-and-forget analytics writes) skip
-	 * gracefully instead of throwing "Database not initialized" when a write
-	 * races app startup (the handle is created in `initialize()`, called on
-	 * app-ready). (MAESTRO-SP)
-	 */
-	isReady(): boolean {
-		return this.db !== null;
-	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/main/stats/stats-db.ts` around lines 96 - 105, The StatsDB class has
duplicate isReady() method definitions, with one at lines 96-105 and another at
line 188. Remove the first isReady() method implementation (the one shown in the
diff at lines 96-105 with the JSDoc comment about the database handle being
open) to eliminate the duplicate method declaration, as having two methods with
the same name in the same class will cause compilation errors.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
src/main/stats/stats-db.ts (1)

96-105: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Duplicate isReady() method causes compilation failure.

This is already flagged in a previous review. The class defines isReady() twice:

  • Lines 103-105: return this.db !== null;
  • Lines 188-190: return this.initialized && this.db !== null;

The existing implementation at lines 188-190 is semantically correct—it ensures migrations have completed before allowing writes. Remove the new duplicate and update the existing method's JSDoc to document the MAESTRO-SP use case.

Suggested fix

Remove lines 96-105 entirely, then update the existing method's docstring:

 	/**
-	 * Check if database is initialized and ready
+	 * True once the underlying database handle is open and usable. Lets
+	 * best-effort callers (e.g. fire-and-forget analytics writes) skip
+	 * gracefully instead of throwing "Database not initialized" when a write
+	 * races app startup (the handle is created in `initialize()`, called on
+	 * app-ready). (MAESTRO-SP)
 	 */
 	isReady(): boolean {
 		return this.initialized && this.db !== null;
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/main/stats/stats-db.ts` around lines 96 - 105, The isReady() method is
defined twice in the class causing a compilation failure. Remove the duplicate
isReady() method that returns this.db !== null along with its JSDoc comment
block. Keep the existing implementation of isReady() at lines 188-190 that
returns this.initialized && this.db !== null, and update its JSDoc to include
the MAESTRO-SP use case documentation that was in the removed method's comment.

Source: Linters/SAST tools

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@src/main/stats/stats-db.ts`:
- Around line 96-105: The isReady() method is defined twice in the class causing
a compilation failure. Remove the duplicate isReady() method that returns
this.db !== null along with its JSDoc comment block. Keep the existing
implementation of isReady() at lines 188-190 that returns this.initialized &&
this.db !== null, and update its JSDoc to include the MAESTRO-SP use case
documentation that was in the removed method's comment.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 5619ee3e-3450-432b-9847-09ad16d47417

📥 Commits

Reviewing files that changed from the base of the PR and between 71f85e1 and 4aa9b1d.

📒 Files selected for processing (8)
  • src/__tests__/main/agents/codex-usage-sampler.test.ts
  • src/__tests__/main/ipc/handlers/stats.test.ts
  • src/__tests__/renderer/hooks/utils/useDebouncedPersistence.test.ts
  • src/main/agents/codex-usage-sampler.ts
  • src/main/ipc/handlers/agentSessions.ts
  • src/main/ipc/handlers/stats.ts
  • src/main/stats/stats-db.ts
  • src/renderer/hooks/utils/useDebouncedPersistence.ts
🚧 Files skipped from review as they are similar to previous changes (6)
  • src/tests/main/agents/codex-usage-sampler.test.ts
  • src/renderer/hooks/utils/useDebouncedPersistence.ts
  • src/main/agents/codex-usage-sampler.ts
  • src/tests/renderer/hooks/utils/useDebouncedPersistence.test.ts
  • src/main/ipc/handlers/stats.ts
  • src/tests/main/ipc/handlers/stats.test.ts

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant