Skip to content

Perf WG-5: Extension JSONL ingestion — tail contract, async prescan, sync FS audit #75

@DFearing

Description

@DFearing

Summary

Workflow 5 of 5 from the 2026-05-03 performance review of main. Parent: #46. Siblings: #71, #72, #73 (renderer/React), #74 (relay perf).

Includes one real correctness bug (CR-4: data-loss already covered by an existing regression test that the majority of callers violate). The rest are blocking-IO and sync-FS items on the JSONL ingestion hot path. Together with WG-4 (#74), this addresses the architectural drift between the three near-identical watcher paths (extension SessionWatcher, relay watcher, Codex watcher) — only the Codex watcher currently follows the documented contracts; the others have diverged.

Critical

CR-4 · readNewFileLines partial-line tail silently dropped by 4 of 5 callers

This is the worst class of bug for a "relay everything" system: silent data loss.

  • extension/src/fs-utils.ts:26-51 — definition returns { lines, newSize, tail }; tail is the bytes after the last newline that the caller is supposed to feed back as lastTail next call to reassemble a JSONL line split across two writer flushes
  • Lossy callers (omit lastTail, ignore returned tail, advance fileSize past the partial bytes):
  • Working reference: extension/src/codex-session-watcher.ts:307 (passes session.fileTail, restored at line 274, persisted at line 310)
  • Contract is locked down by extension/test/fs-utils.test.ts:70-83 ("regression guard") which explicitly demonstrates the loss
  • A split line then triggers JSON.parse failure inside processTranscriptLine and is silently dropped (log.debug('Skipping unparseable line:'))
  • Fix: add fileTail: string to WatchedSession, JsonlEventSource state, and subagent state. Plumb through every caller. Mechanical change, working reference exists, regression test exists.

CR-7 · prescanExistingContent synchronously reads + parses the full JSONL on session detect

  • extension/src/transcript-parser.ts:438-521; called from extension/src/session-watcher.ts:388 (and from scripts/relay.ts:288 — that caller is WG-4)
  • Helper at extension/src/fs-utils.ts:7-16 does fs.openSync + Buffer.alloc(stat.size) + fs.readSync + fs.closeSyncBuffer.alloc zero-fills, which the caller immediately overwrites
  • Then loops JSON.parse(line.trim()) for every line in a tight loop
  • For a typical .sim-sessions/ transcript (500 KB / ~900 lines): 20-100 ms blocking the extension host UI thread. Real Claude transcripts can be 5-50 MB.
  • scanForActiveSessions runs every 1 s and may attach multiple sessions per tick (extension side at session-watcher.ts:264-353).
  • Fix: stream via readline.createInterface over fs.createReadStream (cleanest), OR yield via await new Promise(setImmediate) every 100 lines (cheapest). Switch Buffer.allocBuffer.allocUnsafe (caller fills from disk, zero-fill is wasted work).

Important

IR-16 · scanForActiveSessions runs every 1 s with sync FS scaling per-session

  • extension/src/session-watcher.ts:264-353
  • Per scan: readdirSync(CLAUDE_DIR) + per-dir isContainedProject (which itself does readdirSync + openSync + readSync(8KB) + JSON.parse until cwd is found, until cached) + per-jsonl statSync + per-subagent-dir existsSync + readdirSync + statSync per subagent file
  • scanForActiveSessions body owned by extension; the relay's equivalent at scripts/relay.ts:340-389 is owned by WG-4 (no caching at all there)
  • Fix: rely primarily on the existing dirWatcher callback as the trigger; downgrade the 1 s interval to a 5-10 s heartbeat fallback. Cache negative results (see IR-IO MR-10).

IR-17 · scanSubagentsDir runs on every appended-line read

  • extension/src/session-watcher.ts:461
  • readNewLines() is the per-write hot path (called for every fs.watch change event AND every 3 s poll tick) and it unconditionally calls scanSubagentsDir(...)fs.readdirSync(session.subagentsDir) at the end
  • The work after the first pass is just the readdirSync (already-watched files are skipped), but that syscall happens once per parent-file-change. Stress scenario (500 rapid writes) → 500 redundant readdirSyncs
  • The session.subagentsDirWatcher (set up inside scanSubagentsDir) already gets a callback when new files appear — that's the right mechanism. The unconditional call from readNewLines is redundant.
  • Fix: trigger scanSubagentsDir only once per session-attach; rely on the dir watcher for new subagent files.

IR-22 · discovery.ts:39 blocks activation up to 3 s on execSync('command -v node')

  • extension/src/discovery.ts:34-52
  • Synchronous shell spawn the first time getHookCommand is called from migrateHttpHooks (called from startClaudeRuntime at claude-runtime.ts:79)
  • Cached after first call but blocks the activation/UI thread. Worst case: 3 s typical 50-200 ms
  • Comment at line 32-34 explains why it's needed (Electron process.execPath is not node) — that justification stands; the issue is the synchronous call
  • Fix: execFile/async, or guard behind an actual hook-write path so it only runs when needed (currently runs as part of routine startup if any project-level settings.local.json exists).

IR-23 · JsonlEventSource.processExistingContent reads full file synchronously

  • extension/src/event-source.ts:45-54
  • start() calls processExistingContent()fs.readFileSync(this.filePath, 'utf-8') on the entire file, splits, emits every line as an event
  • Same shape as CR-7 but in the user-triggered "Connect to Running Agent" → "Watch JSONL File" path. Multi-hundred-ms freeze on the UI thread for large logs.
  • Fix: stream via fs.createReadStream + readline; emit events incrementally with setImmediate yields.

IR-24 · subagent-watcher.startWatchingSubagentFile reads full subagent JSONL on attach

  • extension/src/subagent-watcher.ts:103-126
  • For each newly-discovered subagent file: fs.readFileSync(filePath, 'utf-8') reads the entire file, then splits and JSON.parses every line just to populate seenToolUseIds for dedup
  • Sim subagent files are small (5-7 lines) so it's fast there, but real Claude subagent transcripts can be much larger. Hits the relay event loop on every subagent dir-watch event.
  • Fix: stream-and-yield, same as CR-7 / IR-23.

Minor

MR-10 · containedProjectCache only caches positive matches

  • extension/src/session-watcher.ts:226-229
  • Comment at line 226-228 explains positives-only caching with a justification (a fresh dir might race) that doesn't cover the common case. Negative results re-scan every 1 s.
  • Fix: TTL the negative cache (e.g., 30 s) or invalidate it from the CLAUDE_DIR watcher callback.

MR-11 · Permission timers are re-created on every readNewLines

  • extension/src/permission-detection.ts:36-83
  • Cheap individually, but the Array.from(pendingToolCalls.values()).some(...) (line 50) is O(pending) per call, executed twice per readNewLines.
  • Fix: only re-arm the timer when the pending set actually changes.

Coverage gap to add (to bench)

The stress sim writes 500 events at ~5 ms intervals = ~50 KB total — far below real-world transcript sizes. Add:

  • A bigfile scenario that pre-stages a 5 MB transcript before attaching, to exercise CR-7
  • A split-write scenario that flushes mid-line, to exercise CR-4 directly against the regression test contract

Parallelism

Independent of WG-1 (#71), WG-2 (#72), WG-3 (#73), WG-4 (#74). Owns extension/src/{fs-utils,transcript-parser,session-watcher,subagent-watcher,event-source,codex-session-watcher,discovery,permission-detection}.ts. Does not touch scripts/relay.ts (WG-4) or extension/src/webview-provider.ts (WG-4). The CR-4 caller fix at relay.ts:326 is owned by WG-4 — flag this in your PR if WG-4 hasn't merged yet, since it's a one-line change in a file you don't otherwise touch.

Test plan

  • CR-4: extend extension/test/fs-utils.test.ts with a regression test per fixed caller; run sim with deliberate mid-line flushes; confirm zero Skipping unparseable line debug entries
  • CR-7 / IR-23 / IR-24: connect to a 10 MB JSONL log; UI thread blocking time should drop from seconds to ~16 ms frames; events should arrive incrementally
  • IR-16 / IR-17: at stress sim 200 evt/s, strace -c -p $(pgrep ...) should show fewer getdents/stat syscalls
  • IR-22: extension activation time on a fresh workspace before/after; should be ~50-200 ms faster

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workingenhancementNew feature or request

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions