fix: serialize index.json read-modify-write to prevent race#25
fix: serialize index.json read-modify-write to prevent race#25khaliqgant merged 2 commits intomainfrom
Conversation
Concurrent callers into `FileStorage.save` / `updateIndex` could
interleave their `loadIndex` (readFile + JSON.parse) and `saveIndex`
(writeFile "w" mode truncates) on `.trajectories/index.json`. Typical
trigger: a relay workflow fans out 3+ agents, each calls
`WorkflowTrajectory.registerAgent` -> `FileStorage.save` at the same
millisecond. One writer truncates the file to 0 bytes; a concurrent
reader hits `JSON.parse("")`, logs `SyntaxError: Unexpected end of
JSON input`, falls back to an empty index, and then writes its one
new entry on top - wiping every trajectory registered before it.
Fix has three layers:
1. In-process mutex keyed by absolute index path. A tiny promise-chain
mutex (no new deps) queues `loadIndex -> mutate -> saveIndex` so
siblings in the same process never interleave.
2. Atomic write: `saveIndex` writes to `index.json.tmp` and
`fs.rename`s it over the live file. `rename` is atomic on POSIX so
cross-process readers never observe a half-written file.
3. Tolerate empty reads: `loadIndex` treats a zero-byte file as an
empty index silently (can only happen under the old truncate
window; still worth being resilient). Non-empty malformed JSON
still logs - that's a real corruption worth surfacing.
Also wraps `reconcileIndex`, `delete`, and the first-run seed write in
`initialize` in the same mutex so all index read-modify-write sites
are serialized, not just `updateIndex`.
Adds `tests/storage/concurrent-save.test.ts` with three cases: 20
concurrent saves all land in the final index with no SyntaxError;
empty index.json is tolerated silently; genuinely malformed JSON is
still logged.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fd8e9ed88a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const tmpPath = `${this.indexPath}.tmp`; | ||
| await writeFile(tmpPath, JSON.stringify(index, null, 2), "utf-8"); | ||
| await rename(tmpPath, this.indexPath); |
There was a problem hiding this comment.
Use unique temp files for index writes
saveIndex always stages through the fixed path index.json.tmp, so concurrent writers from different processes (for example two CLI commands or relay workers touching the same data dir) can race each other: one process renames the tmp file first, and the other then fails with ENOENT on rename, causing save/delete to reject and potentially drop index updates. This regression is introduced by the new tmp+rename flow because there is no cross-process lock, so the temp filename must be made process-unique (or otherwise collision-free).
Useful? React with 👍 / 👎.
Summary
FileStorage.updateIndex(viasave) did a plainloadIndex-> mutate ->saveIndexround trip against.trajectories/index.jsonwith no serialization. When two callers in the same process ran concurrently, one writer could truncate the file to 0 bytes (Node'swriteFileopens in"w"mode) while another was mid-readFile->JSON.parse. The reader hitJSON.parse(""), threwSyntaxError: Unexpected end of JSON input, the error was caught and logged, and the reader then wrote its single new entry on top of an empty index - wiping every trajectory registered before it.This PR serializes all index read-modify-write cycles with an in-process mutex, makes index writes atomic via
tmp + rename, and makesloadIndextolerate empty files silently as belt-and-braces.Reproduction
A relay workflow fans out 3 agents sharing one upstream
dependsOnstep. All three hitstepStarted->registerAgent->flush->FileStorage.save->updateIndexin the same millisecond. Observed symptom:Logged three times per fan-out; prior trajectories silently lost from the final
index.json.Fix
Three layers, in
src/storage/file.ts:withIndexLock(indexPath, task)keeps aPromisetail per absolute index path. EachupdateIndex/reconcileIndex/delete/ first-runinitializechains onto the previous task, so siblings in the same process cannot interleave. No new runtime deps - it's ~10 lines of promise chaining with swallowed-error tails.saveIndexnow writes toindex.json.tmpand callsfs.renameover the live file.renameis atomic on POSIX, so any reader (including one in another process) sees either the old complete file or the new complete file - never a half-written zero-byte state.loadIndextreats a zero-byte file as an empty index silently. With the mutex + atomic rename in place this should never happen in practice, but it's cheap defense-in-depth and avoids a scary log line on the rare corner case. Non-empty-but-malformed JSON still logsconsole.error- genuine corruption is worth surfacing.Public API is unchanged.
save/updateIndex/loadIndexsignatures are identical.Test plan
tests/storage/concurrent-save.test.tswith three cases:savecalls all land in the final index, the file is valid JSON, and noSyntaxErroris logged.index.jsonto an empty file (simulating a truncated-mid-write read) is tolerated without aconsole.error.console.error(regression guard).npm run test:run).npm run typecheckclean.npm run build(tsup) succeeds.npm run lint- pre-existingpackage.jsonformatter warning onmainis unrelated and left untouched per the bugfix-only scope.