fix(code): unblock main thread by decompressing skills zips off-thread #2242
Merged
`PosthogPluginService` runs `updateSkills()` during app startup, which calls `fflate.unzipSync` on the downloaded skills + context-mill archives. Pure-JS synchronous decompression on the Electron main thread pegged `CrBrowserMain` for tens of seconds on launch, surfacing as an app hang for users on v0.52.0.

Swap to fflate's async `unzip` via a small `unzipAsync` promisify helper in `extract-zip.ts`, used by both `extractZip` and the inner omnibus-zip path in `update-skills-saga.ts`. Async unzip yields the event loop, so the main thread stays responsive during extraction.

Build-time scripts (`vite.main.config.mts`, `scripts/pull-skills.mjs`) keep `unzipSync`, since they don't run in the Electron app.

Generated-By: PostHog Code
Task-Id: 0270afde-6140-4867-bd57-52ae8a767a4d
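A minimal sketch of what an `unzipAsync` promisify helper along these lines could look like. It assumes fflate's callback form `unzip(data, cb)`, where `cb` receives `(err, files)` and `files` maps archive paths to decompressed `Uint8Array`s. The factory name `makeUnzipAsync` and the locally declared types are hypothetical, and the unzip function is injected so the sketch runs without the `fflate` package; it is an illustration, not the code merged into `extract-zip.ts`.

```typescript
// Local mirror of fflate's unzip callback shape (assumption): `files` maps
// each archive path to its decompressed bytes. Declared here so the sketch
// compiles without the fflate package installed.
type Unzipped = Record<string, Uint8Array>;
type UnzipCallback = (err: Error | null, files: Unzipped) => void;
type CallbackUnzip = (data: Uint8Array, cb: UnzipCallback) => unknown;

// Hypothetical factory: wraps a callback-style unzip in a Promise so callers
// can `await` it instead of blocking on a synchronous call. A synchronous
// throw from `unzip` inside the executor also rejects the promise.
export function makeUnzipAsync(
  unzip: CallbackUnzip,
): (data: Uint8Array) => Promise<Unzipped> {
  return (data) =>
    new Promise<Unzipped>((resolve, reject) => {
      unzip(data, (err, files) => {
        if (err) {
          reject(err);
        } else {
          resolve(files);
        }
      });
    });
}

// Wiring in the app (assumes fflate's `unzip(data, cb)` overload):
//   import { unzip } from "fflate";
//   export const unzipAsync = makeUnzipAsync(unzip);
//
//   // before: const files = unzipSync(await readFile(zipPath));
//   // after:  const files = await unzipAsync(await readFile(zipPath));
```

With a helper shaped like this, call sites only change from a direct `unzipSync(...)` call to an `await`; the loop that writes the extracted entries can stay as it is.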
andrewm4894 approved these changes on May 20, 2026.

andrewm4894 (Member) left a comment:
was just looking into this too when doing local dev for some inbox stuff
k11kirky added a commit that referenced this pull request on May 25, 2026:
…storm (#2255)

## Summary

Layer 1 of a multi-PR fix for the launch hang and "can't click cloud task" symptoms reported by cloud-task users. Extracts local NDJSON cache handling (`readLocalLogs` / `writeLocalLogs`) into a singleton `LocalLogsService` and **single-flights writes per `taskRunId` with latest-wins coalescing**. If a write is already in flight when another arrives for the same run, the new content replaces any queued content rather than spawning a parallel `fs.promises.writeFile`.

### Why this is needed

When the renderer's gap-reconcile loop fires on every SSE snapshot (which happens whenever `parseLogContent` silently drops corrupted lines and `processedLineCount` never catches up to the server's `expectedCount`), the old fire-and-forget `writeLocalLogs` piles `fs.promises.writeFile` continuations onto the main process, producing the `FileHandle::CloseReq::Resolve` saturation signature we saw at app launch.

This is one of two distinct main-thread hangs reported on the same crash signature:

- [#2242](#2242) (merged) addressed the startup `unzipSync` path that affected all users.
- This PR addresses the cloud-task corruption-feedback loop that only manifested for users with cloud tasks.

### What this does NOT fix

Stops the storm, but doesn't address the underlying corruption-amplification loop or unbounded reconnects; those are layers 2 and 3 below.

## Follow-ups (separate PRs to be stacked on this one)

**Layer 2: break the corruption-amplification feedback loop.** `parseLogContent` (renderer `service.ts:3539`) silently drops malformed lines, so `processedLineCount < expectedCount` forever and every SSE snapshot triggers another gap-reconcile + S3 fetch + overwrite. Need to either:

- Track dropped-line counts and feed them into the reconciliation math so a known-corrupted file stops triggering gap-reconcile after one observation, or
- Hash-compare local vs S3 content and short-circuit the re-write when they match.

Fixes the "can't click on cloud task" symptom for users whose local NDJSON is already poisoned.

**Layer 3: bound the reconnect budget.** `MAX_SSE_RECONNECT_ATTEMPTS = 5` (`cloud-task/service.ts:21`) is defeated by two paths:

- `handleStreamCompletion` reconnects with `countAttempt: false` for non-terminal clean EOF (`cloud-task/service.ts:1057`).
- `retry` / `retryUnhealthyCloudSessions` resets the counter on every focus.

Need a per-run cumulative cap and an explicit unrecoverable terminal state so the UI can surface "this run is broken" instead of looping silently.

**Separate ticket: S3 source corruption.** The agent's local writer (`packages/agent/src/session-log-writer.ts:391`) correctly appends `\n`, but user logs show records concatenated without separators across days. Missing newlines are being introduced somewhere in the agent-server upload/aggregation path. This PR limits the *blast radius* of corruption but doesn't stop it from being produced.

## Test plan

- [x] Unit tests cover: single-flight coalescing, multi-run independence, propagation of in-flight resolution to all coalesced callers, recovery after write rejection, queue draining after completion.
- [x] `pnpm --filter code typecheck` clean.
- [x] `pnpm --filter code test`: new tests pass; remaining failures are pre-existing archive integration tests unrelated to this change.
- [ ] Manual verification on a dev build with cloud tasks (post-merge).
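The single-flight, latest-wins write described in the #2255 summary can be sketched as a small per-key queue. This is a hypothetical illustration, not the `LocalLogsService` from that PR: the class name `CoalescingWriter`, its `enqueue` method, and the injected `writeFn` (standing in for `fs.promises.writeFile`) are assumptions. At most one write per `taskRunId` runs at a time; calls that arrive while one is in flight collapse into a single queued write carrying the newest content, and all of those callers share its promise.

```typescript
// Hypothetical stand-in for fs.promises.writeFile, keyed by taskRunId.
type WriteFn = (taskRunId: string, content: string) => Promise<void>;

interface QueuedWrite {
  content: string; // newest content seen while a write was in flight
  promise: Promise<void>; // shared by every caller coalesced into this write
  resolve: () => void;
  reject: (err: unknown) => void;
}

// Hypothetical sketch: single-flight writes per key with latest-wins coalescing.
export class CoalescingWriter {
  private readonly writeFn: WriteFn;
  private readonly inFlight = new Map<string, Promise<void>>();
  private readonly queued = new Map<string, QueuedWrite>();

  constructor(writeFn: WriteFn) {
    this.writeFn = writeFn;
  }

  enqueue(taskRunId: string, content: string): Promise<void> {
    // Nothing running for this run: start writing immediately.
    if (!this.inFlight.has(taskRunId)) {
      return this.start(taskRunId, content);
    }
    // A write is in flight: replace any queued content (latest wins)
    // and hand back the promise shared by all coalesced callers.
    const existing = this.queued.get(taskRunId);
    if (existing) {
      existing.content = content;
      return existing.promise;
    }
    let resolve!: () => void;
    let reject!: (err: unknown) => void;
    const promise = new Promise<void>((res, rej) => {
      resolve = res;
      reject = rej;
    });
    this.queued.set(taskRunId, { content, promise, resolve, reject });
    return promise;
  }

  private start(taskRunId: string, content: string): Promise<void> {
    const run = this.writeFn(taskRunId, content).finally(() => {
      // Free the slot whether the write succeeded or failed, then drain
      // the single queued write (if any) for this run.
      this.inFlight.delete(taskRunId);
      const next = this.queued.get(taskRunId);
      if (next) {
        this.queued.delete(taskRunId);
        this.start(taskRunId, next.content).then(next.resolve, next.reject);
      }
    });
    this.inFlight.set(taskRunId, run);
    return run;
  }
}
```

In this sketch a coalesced caller's promise settles only when the queued (newest) content has been written, so resolution means content at least as new as that caller's is persisted. A failed write still clears the in-flight slot and starts any queued write, so one rejection does not wedge the queue for that run.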
## Summary
- `PosthogPluginService.updateSkills()` runs during startup and calls `fflate.unzipSync` on the downloaded skills + context-mill archives. Pure-JS sync decompression on the Electron main thread pegged `CrBrowserMain` for tens of seconds on launch, surfacing as an app hang for users on v0.52.0 (`Unresponsive for 81 seconds before sampling` in the crash log).
- Swap to fflate's async `unzip` via a small `unzipAsync` promisify helper in `extract-zip.ts`. Used by both `extractZip` and the inner omnibus-zip path in `update-skills-saga.ts`. Async unzip yields the event loop so the main thread stays responsive during extraction.
- Build-time scripts (`vite.main.config.mts`, `scripts/pull-skills.mjs`) keep `unzipSync`; they don't run in the Electron app.

## Root cause (from the crash log)
Main thread stuck at `node::fs::FileHandle::CloseReq::Resolve` → V8 → JIT'd JS for all 41/41 samples: the promise continuation right after `await readFile(zipPath)` resolves, which is exactly where `unzipSync` ran. All 4 `libuv-worker` threads parked in `uv_cond_wait`, confirming the hang was pure CPU on the main thread, not I/O. The 1.3 GB footprint matched the fully-materialized decompressed zip held in memory.

## Test plan
- `pnpm --filter code typecheck` clean
- `pnpm --filter code test`: all 1348 tests pass (one mock updated from `unzipSync` to async `unzip`)

Generated-By: PostHog Code
Task-Id: 0270afde-6140-4867-bd57-52ae8a767a4d
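To see why every root-cause sample landed in the continuation after `await readFile(zipPath)`, the sketch below counts `setInterval` heartbeats while CPU-bound work runs. `burnCpu` is a hypothetical stand-in for `unzipSync`, and `await Promise.resolve()` stands in for the `readFile` await. One 200 ms synchronous block lets no heartbeat through, because nothing else can run on the thread until it returns; the same work split into 5 ms slices with a `setTimeout(0)` yield between them lets the heartbeat run. Time-slicing on one thread is a different technique from the off-thread decompression this PR adopts; it is used here only because it shows the scheduling effect without spawning a worker.

```typescript
// Busy-wait for `ms` milliseconds of pure CPU. A hypothetical stand-in for a
// synchronous decompression call such as unzipSync.
function burnCpu(ms: number): void {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // spin: no other callback on this thread can run meanwhile
  }
}

// Count how many heartbeat callbacks fire while `work` runs.
async function heartbeatsDuring(
  work: () => Promise<void>,
  intervalMs = 10,
): Promise<number> {
  let beats = 0;
  const timer = setInterval(() => {
    beats += 1;
  }, intervalMs);
  try {
    await work();
  } finally {
    clearInterval(timer);
  }
  return beats;
}

// Mirrors the crash-log shape: one long sync block in the continuation
// after an await (Promise.resolve() stands in for `await readFile(zipPath)`).
async function blockingWork(): Promise<void> {
  await Promise.resolve();
  burnCpu(200);
}

// Same total CPU, split into 5 ms slices with a macrotask yield between them.
async function slicedWork(): Promise<void> {
  await Promise.resolve();
  for (let i = 0; i < 40; i += 1) {
    burnCpu(5);
    await new Promise<void>((resolve) => setTimeout(resolve, 0));
  }
}

async function main(): Promise<void> {
  const blocked = await heartbeatsDuring(blockingWork);
  const sliced = await heartbeatsDuring(slicedWork);
  // `blocked` is always 0; `sliced` depends on machine timing.
  console.log({ blocked, sliced });
}

void main();
```

In the Electron main process, the starved callbacks would include IPC handlers and other main-process events, which is consistent with a long `unzipSync` surfacing as an app hang.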