Skip to content

fix: serialize index.json read-modify-write to prevent race#25

Merged
khaliqgant merged 2 commits intomainfrom
fix/index-write-race
Apr 20, 2026
Merged

fix: serialize index.json read-modify-write to prevent race#25
khaliqgant merged 2 commits intomainfrom
fix/index-write-race

Conversation

@khaliqgant
Copy link
Copy Markdown
Member

@khaliqgant khaliqgant commented Apr 20, 2026

Summary

FileStorage.updateIndex (via save) did a plain loadIndex -> mutate -> saveIndex round trip against .trajectories/index.json with no serialization. When two callers in the same process ran concurrently, one writer could truncate the file to 0 bytes (Node's writeFile opens in "w" mode) while another was mid-readFile -> JSON.parse. The reader hit JSON.parse(""), threw SyntaxError: Unexpected end of JSON input, the error was caught and logged, and the reader then wrote its single new entry on top of an empty index - wiping every trajectory registered before it.

This PR serializes all index read-modify-write cycles with an in-process mutex, makes index writes atomic via tmp + rename, and makes loadIndex tolerate empty files silently as belt-and-braces.

Reproduction

A relay workflow fans out 3 agents sharing one upstream dependsOn step. All three hit stepStarted -> registerAgent -> flush -> FileStorage.save -> updateIndex in the same millisecond. Observed symptom:

Error loading trajectory index, using empty index: SyntaxError: Unexpected end of JSON input
    at JSON.parse (<anonymous>)
    at FileStorage.loadIndex (.../agent-trajectories/dist/chunk-2XT3DOJC.js:1071:19)
    at async FileStorage.updateIndex (.../chunk-2XT3DOJC.js:1091:19)
    at async FileStorage.save (.../chunk-2XT3DOJC.js:854:5)

Logged three times per fan-out; prior trajectories silently lost from the final index.json.

Fix

Three layers, in src/storage/file.ts:

  1. In-process mutex - withIndexLock(indexPath, task) keeps a Promise tail per absolute index path. Each updateIndex / reconcileIndex / delete / first-run initialize chains onto the previous task, so siblings in the same process cannot interleave. No new runtime deps - it's ~10 lines of promise chaining with swallowed-error tails.
  2. Atomic write - saveIndex now writes to index.json.tmp and calls fs.rename over the live file. rename is atomic on POSIX, so any reader (including one in another process) sees either the old complete file or the new complete file - never a half-written zero-byte state.
  3. Empty-read tolerance - loadIndex treats a zero-byte file as an empty index silently. With the mutex + atomic rename in place this should never happen in practice, but it's cheap defense-in-depth and avoids a scary log line on the rare corner case. Non-empty-but-malformed JSON still logs console.error - genuine corruption is worth surfacing.

Public API is unchanged. save / updateIndex / loadIndex signatures are identical.

Test plan

  • New tests/storage/concurrent-save.test.ts with three cases:
    • 20 concurrent save calls all land in the final index, the file is valid JSON, and no SyntaxError is logged.
    • Forcing index.json to an empty file (simulating a truncated-mid-write read) is tolerated without a console.error.
    • Non-empty malformed JSON still triggers console.error (regression guard).
  • All 238 existing tests still pass (npm run test:run).
  • npm run typecheck clean.
  • npm run build (tsup) succeeds.
  • npm run lint - pre-existing package.json formatter warning on main is unrelated and left untouched per the bugfix-only scope.

Open in Devin Review

Concurrent callers into `FileStorage.save` / `updateIndex` could
interleave their `loadIndex` (readFile + JSON.parse) and `saveIndex`
(writeFile "w" mode truncates) on `.trajectories/index.json`. Typical
trigger: a relay workflow fans out 3+ agents, each calls
`WorkflowTrajectory.registerAgent` -> `FileStorage.save` at the same
millisecond. One writer truncates the file to 0 bytes; a concurrent
reader hits `JSON.parse("")`, logs `SyntaxError: Unexpected end of
JSON input`, falls back to an empty index, and then writes its one
new entry on top - wiping every trajectory registered before it.

Fix has three layers:

1. In-process mutex keyed by absolute index path. A tiny promise-chain
   mutex (no new deps) queues `loadIndex -> mutate -> saveIndex` so
   siblings in the same process never interleave.
2. Atomic write: `saveIndex` writes to `index.json.tmp` and
   `fs.rename`s it over the live file. `rename` is atomic on POSIX so
   cross-process readers never observe a half-written file.
3. Tolerate empty reads: `loadIndex` treats a zero-byte file as an
   empty index silently (can only happen under the old truncate
   window; still worth being resilient). Non-empty malformed JSON
   still logs - that's a real corruption worth surfacing.

Also wraps `reconcileIndex`, `delete`, and the first-run seed write in
`initialize` in the same mutex so all index read-modify-write sites
are serialized, not just `updateIndex`.

Adds `tests/storage/concurrent-save.test.ts` with three cases: 20
concurrent saves all land in the final index with no SyntaxError;
empty index.json is tolerated silently; genuinely malformed JSON is
still logged.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fd8e9ed88a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/storage/file.ts Outdated
Comment on lines +719 to +721
const tmpPath = `${this.indexPath}.tmp`;
await writeFile(tmpPath, JSON.stringify(index, null, 2), "utf-8");
await rename(tmpPath, this.indexPath);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Use unique temp files for index writes

saveIndex always stages through the fixed path index.json.tmp, so concurrent writers from different processes (for example two CLI commands or relay workers touching the same data dir) can race each other: one process renames the tmp file first, and the other then fails with ENOENT on rename, causing save/delete to reject and potentially drop index updates. This regression is introduced by the new tmp+rename flow because there is no cross-process lock, so the temp filename must be made process-unique (or otherwise collision-free).

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 5 additional findings.

Open in Devin Review

@khaliqgant khaliqgant merged commit 4c3fb6f into main Apr 20, 2026
6 checks passed
@khaliqgant khaliqgant deleted the fix/index-write-race branch April 20, 2026 13:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant