fix(code): coalesce writeLocalLogs per taskRunId to stop main-thread storm#2255
Open
k11kirky wants to merge 4 commits into
Open
fix(code): coalesce writeLocalLogs per taskRunId to stop main-thread storm#2255k11kirky wants to merge 4 commits into
k11kirky wants to merge 4 commits into
Conversation
Contributor
Prompt To Fix All With AIFix the following 1 code review issue. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 1
apps/code/src/main/services/local-logs/service.test.ts:79-93
**Prefer parameterised tests for equivalent null-return cases**
The two `readLocalLogs` tests — `"returns null when the file is missing"` and `"returns null on other read errors"` — differ only in the error object injected. Per the team style, these should be collapsed into a single `it.each`.
Reviews (1): Last reviewed commit: "fix(code): coalesce writeLocalLogs per t..." | Re-trigger Greptile |
Comment on lines
+79
to
+93
| const service = new LocalLogsService(); | ||
| await expect(service.readLocalLogs(RUN_ID)).resolves.toBeNull(); | ||
| }); | ||
|
|
||
| it("returns null on other read errors", async () => { | ||
| mockReadFile.mockRejectedValue(new Error("boom")); | ||
| const service = new LocalLogsService(); | ||
| await expect(service.readLocalLogs(RUN_ID)).resolves.toBeNull(); | ||
| }); | ||
| }); | ||
|
|
||
| describe("writeLocalLogs", () => { | ||
| it("writes content to the run's NDJSON path", async () => { | ||
| const service = new LocalLogsService(); | ||
| await service.writeLocalLogs(RUN_ID, "line1\n"); |
Contributor
There was a problem hiding this comment.
Prefer parameterised tests for equivalent null-return cases
The two readLocalLogs tests — "returns null when the file is missing" and "returns null on other read errors" — differ only in the error object injected. Per the team style, these should be collapsed into a single it.each.
Context Used: Do not attempt to comment on incorrect alphabetica... (source)
Prompt To Fix With AI
This is a comment left during a code review.
Path: apps/code/src/main/services/local-logs/service.test.ts
Line: 79-93
Comment:
**Prefer parameterised tests for equivalent null-return cases**
The two `readLocalLogs` tests — `"returns null when the file is missing"` and `"returns null on other read errors"` — differ only in the error object injected. Per the team style, these should be collapsed into a single `it.each`.
**Context Used:** Do not attempt to comment on incorrect alphabetica... ([source](https://app.greptile.com/review/custom-context?memory=instruction-0))
How can I resolve this? If you propose a fix, please make it concise.Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
b81c449 to
f2b4def
Compare
Contributor
Author
This stack of pull requests is managed by Graphite. Learn more about stacking. |
This was referenced May 21, 2026
jonathanlab
approved these changes
May 22, 2026
…storm Extract local NDJSON cache handling into a singleton LocalLogsService and single-flight writes per taskRunId with latest-wins coalescing. If a write is already in flight when another arrives for the same run, the new content replaces any queued content rather than spawning a parallel fs.promises.writeFile. When the renderer's gap-reconcile loop fires on every SSE snapshot (which happens whenever parseLogContent silently drops corrupted lines and the processedLineCount never catches up to the server's expectedCount), the old fire-and-forget writeLocalLogs would pile fs.promises.writeFile continuations onto the main thread, producing the FileHandle::CloseReq::Resolve hang signature we saw at app launch. Generated-By: PostHog Code Task-Id: 0270afde-6140-4867-bd57-52ae8a767a4d
…-content writes Tighten WriteState shape (drop the placeholder cast), skip mkdir after the first successful write per drain, skip writeFile when the coalesced content matches the last written payload, and use the DATA_DIR constant. Generated-By: PostHog Code Task-Id: 65171e71-4765-479c-b0c8-e4d0dd5c5bc7
The duplicate `INBOX_VIEWED` entries in `ANALYTICS_EVENTS` (one before PR #2228 added it as a typed event, and the second inside the proper Inbox section) were tripping `tsc`. Remove the legacy untyped placeholder plus the no-props `track(INBOX_VIEWED)` call in `navigateToInbox` — the properly-typed event already fires from `InboxSignalsTab` with the full report counts. Also trim the LocalLogsService docblock to focus on the non-obvious single-flight rationale. Generated-By: PostHog Code Task-Id: 65171e71-4765-479c-b0c8-e4d0dd5c5bc7
Generated-By: PostHog Code Task-Id: 65171e71-4765-479c-b0c8-e4d0dd5c5bc7
8913168 to
d7b92a8
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.

Summary
Layer 1 of a multi-PR fix for the launch hang and "can't click cloud task" symptoms reported by cloud-task users.
Extracts local NDJSON cache handling (
readLocalLogs/writeLocalLogs) into a singletonLocalLogsServiceand single-flights writes pertaskRunIdwith latest-wins coalescing. If a write is already in flight when another arrives for the same run, the new content replaces any queued content rather than spawning a parallelfs.promises.writeFile.Why this is needed
When the renderer's gap-reconcile loop fires on every SSE snapshot — which happens whenever
parseLogContentsilently drops corrupted lines andprocessedLineCountnever catches up to the server'sexpectedCount— the old fire-and-forgetwriteLocalLogspilesfs.promises.writeFilecontinuations onto the main process, producing theFileHandle::CloseReq::Resolvesaturation signature we saw at app launch.This is one of two distinct main-thread hangs reported on the same crash signature:
unzipSyncpath that affected all users.What this does NOT fix
Stops the storm, but doesn't address the underlying corruption-amplification loop or unbounded reconnects — those are layer 2 and 3 below.
Follow-ups (separate PRs to be stacked on this one)
Layer 2 — break the corruption-amplification feedback loop.
parseLogContent(rendererservice.ts:3539) silently drops malformed lines, soprocessedLineCount < expectedCountforever and every SSE snapshot triggers another gap-reconcile + S3 fetch + overwrite. Need to either:Fixes the "can't click on cloud task" symptom for users whose local NDJSON is already poisoned.
Layer 3 — bound the reconnect budget.
MAX_SSE_RECONNECT_ATTEMPTS = 5(cloud-task/service.ts:21) is defeated by two paths:handleStreamCompletionreconnects withcountAttempt: falsefor non-terminal clean EOF (cloud-task/service.ts:1057).retry/retryUnhealthyCloudSessionsresets the counter on every focus.Need a per-run cumulative cap and an explicit unrecoverable terminal state so the UI can surface "this run is broken" instead of looping silently.
Separate ticket — S3 source corruption.
The agent's local writer (
packages/agent/src/session-log-writer.ts:391) correctly appends\n, but user logs show records concatenated without separators across days. Missing newlines are being introduced somewhere in the agent-server upload/aggregation path. This PR limits the blast radius of corruption but doesn't stop it from being produced.Test plan
pnpm --filter code typecheckclean.pnpm --filter code test— new tests pass; remaining failures are pre-existing archive integration tests unrelated to this change.