fix(cloud-task): drop unsafe append in log-gap inconsistency path#2102
Merged
Conversation
When `reconcileCloudLogGapOnce` fetched local + S3 logs and they came back with fewer entries than `expectedCount`, the fallback appended `newEntries` (the snapshot's tail slice at positions [expectedCount-N, expectedCount]) on top of `session.events` and ratcheted `processedLineCount = latestCount + newEntries.length`. The appended slice isn't contiguous with `latestCount`, so events ended up with duplicates and gaps and the count drifted past entries we never had. With cloud-task watchers reconnecting every ~2 min on `error: 'terminated'`, this stacked indefinitely and OOM'd the V8 renderer heap (16 renderer crashes observed in a single morning). Now the inconsistency path logs the warning and returns; the next snapshot/log update retries once the source has caught up. Generated-By: PostHog Code Task-Id: 3294a546-b9be-404e-ba5c-5d762cc95bd8
Contributor
Prompt To Fix All With AIFix the following 1 code review issue. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 1
apps/code/src/renderer/features/sessions/service/service.ts:3568-3573
Now that the warning fires specifically when the fetch lagged behind `expectedCount`, adding `fetchedCount: rawEntries.length` would make it immediately clear how far behind the S3/local fetch was — useful for diagnosing future incidents without needing to correlate separate log lines.
```suggestion
log.warn("Cloud task log count inconsistency", {
taskRunId,
currentCount,
expectedCount,
fetchedCount: rawEntries.length,
entriesReceived: newEntries.length,
});
```
Reviews (1): Last reviewed commit: "fix(cloud-task): drop unsafe append in l..." | Re-trigger Greptile |
Comment on lines
3568
to
3573
| log.warn("Cloud task log count inconsistency", { | ||
| taskRunId, | ||
| currentCount, | ||
| expectedCount, | ||
| entriesReceived: newEntries.length, | ||
| }); |
Contributor
There was a problem hiding this comment.
Now that the warning fires specifically when the fetch lagged behind
expectedCount, adding fetchedCount: rawEntries.length would make it immediately clear how far behind the S3/local fetch was — useful for diagnosing future incidents without needing to correlate separate log lines.
Suggested change
| log.warn("Cloud task log count inconsistency", { | |
| taskRunId, | |
| currentCount, | |
| expectedCount, | |
| entriesReceived: newEntries.length, | |
| }); | |
| log.warn("Cloud task log count inconsistency", { | |
| taskRunId, | |
| currentCount, | |
| expectedCount, | |
| fetchedCount: rawEntries.length, | |
| entriesReceived: newEntries.length, | |
| }); |
Prompt To Fix With AI
This is a comment left during a code review.
Path: apps/code/src/renderer/features/sessions/service/service.ts
Line: 3568-3573
Comment:
Now that the warning fires specifically when the fetch lagged behind `expectedCount`, adding `fetchedCount: rawEntries.length` would make it immediately clear how far behind the S3/local fetch was — useful for diagnosing future incidents without needing to correlate separate log lines.
```suggestion
log.warn("Cloud task log count inconsistency", {
taskRunId,
currentCount,
expectedCount,
fetchedCount: rawEntries.length,
entriesReceived: newEntries.length,
});
```
How can I resolve this? If you propose a fix, please make it concise.Make it immediately clear from a single log line how far behind the S3/local fetch was relative to expectedCount. Generated-By: PostHog Code Task-Id: 3294a546-b9be-404e-ba5c-5d762cc95bd8
Contributor
|
Reviews (2): Last reviewed commit: "log fetchedCount in cloud-log inconsiste..." | Re-trigger Greptile |
Contributor
|
Want your agent to iterate on Greptile's feedback? Try greploops. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes a renderer OOM crash loop on long-running cloud tasks. Observed: 16 V8 OOM renderer crashes between 06:12 and 06:34 on 2026-05-08 against task
aab06c8f-…(run accumulated 11k+ log entries).reconcileCloudLogGapOncehad a fallback that appendedupdate.newEntrieson top ofsession.eventswhenever local + S3 fetches couldn't catch up toexpectedCount. ButnewEntriesis the snapshot's tail slice (positions[expectedCount-N, expectedCount]) — it's not contiguous with the renderer'sprocessedLineCount. So each invocation:session.events(duplicates and gaps)processedLineCount = latestCount + newEntries.lengthpast entries we never actually hadWith cloud-task watchers reconnecting every ~2 min on
error: 'terminated', this stacked indefinitely until V8 OOM'd the renderer.The fix: when the fetched logs lag behind
expectedCount, log the existingCloud task log count inconsistencywarning and return. The next snapshot orlogsupdate retries once the source has caught up. The success branch (S3/local catches up toexpectedCount) is unchanged.Test plan
pnpm --filter code typecheckpnpm exec vitest run src/renderer/features/sessions/service/service.test.ts— all 80 tests passprocesses a pending cloud log gap…test to assert the corrected behavior: queue still drains (verified via secondreadLocalLogscall) but no append happens when fetches stay staleterminatederrors and confirm the renderer no longer OOMsNotes
Two other related leaks remain (out of scope here, deferred):
emittedLogEntriesinapps/code/src/main/services/cloud-task/service.tsgrows unbounded on single-subscriber runs (only pruned inemitCurrentSnapshot, which only fires for 2nd+ subscribers).emitCurrentSnapshotships the full historical log (newEntries: snapshotEntries) to every newly-attached renderer subscriber — for an 11k-entry run that's a multi-MB IPC payload per re-attach.Both are worth a follow-up. This PR is the conservative hotfix.