feat: add file locking to prevent concurrent state.json corruption#20
feat: add file locking to prevent concurrent state.json corruption#20don-petry wants to merge 2 commits intooneirosoft:mainfrom
Conversation
Use an advisory lock file (.dagger/lock) with atomic create_new to ensure only one dgr process can modify state.json at a time. The lock is held for the duration of a StoreSession and released on drop. Relates to oneirosoft#10 item 1. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR introduces an advisory lock file to prevent multiple concurrent dgr processes from simultaneously mutating the store (notably state.json), reducing the risk of store corruption under parallel invocations.
Changes:
- Added a
StoreLockguard that acquires an exclusive lock via atomiccreate_newfile creation and removes the lock file on drop. - Integrated lock acquisition into
StoreSessionopening/creation flows. - Added unit tests for lock acquisition, mutual exclusion, and cleanup; added
tempfileas a dev-dependency.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| src/core/store/lock.rs | New lock guard implementation and unit tests. |
| src/core/store/session.rs | Acquires the lock as part of session creation/opening. |
| src/core/store/mod.rs | Exposes the new lock module within store. |
| src/core/commit.rs | Switches session construction to a helper that acquires the lock. |
| Cargo.toml | Adds tempfile for the new unit tests. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| pub fn acquire(dagger_root: &Path) -> io::Result<Self> { | ||
| let lock_path = dagger_root.join("lock"); | ||
|
|
||
| // Try to create the lock file exclusively. | ||
| // O_CREAT | O_EXCL ensures atomic creation — fails if file already exists. | ||
| match fs::OpenOptions::new() | ||
| .write(true) | ||
| .create_new(true) | ||
| .open(&lock_path) |
There was a problem hiding this comment.
StoreLock::acquire assumes dagger_root already exists. That prevents taking the lock before store initialization (e.g., in open_or_initialize) and will fail with NotFound if the .dagger dir is missing. Consider creating the directory (e.g., fs::create_dir_all(dagger_root)?) before attempting create_new so locking can protect initialization too.
There was a problem hiding this comment.
Addressed in the latest push.
There was a problem hiding this comment.
Fixed in b5373c0. StoreLock::acquire now calls fs::create_dir_all(dagger_root) before attempting to create the lock file, so the lock can be taken even before the store directory is fully initialized.
| let store_initialization = initialize_store(&paths, trunk_branch)?; | ||
| let config = | ||
| load_config(&paths)?.ok_or_else(|| io::Error::other("dagger config is missing"))?; | ||
| let lock = StoreLock::acquire(&paths.root)?; |
There was a problem hiding this comment.
In open_or_initialize, the lock is acquired after initialize_store (which may write config.json/state.json/events). Two concurrent dgr processes can still race during initialization and write state concurrently before either takes the lock. If StoreLock::acquire is updated to create_dir_all on the root, consider acquiring the lock before calling initialize_store so initialization and subsequent reads/writes are fully mutually exclusive.
| let store_initialization = initialize_store(&paths, trunk_branch)?; | |
| let config = | |
| load_config(&paths)?.ok_or_else(|| io::Error::other("dagger config is missing"))?; | |
| let lock = StoreLock::acquire(&paths.root)?; | |
| let lock = StoreLock::acquire(&paths.root)?; | |
| let store_initialization = initialize_store(&paths, trunk_branch)?; | |
| let config = | |
| load_config(&paths)?.ok_or_else(|| io::Error::other("dagger config is missing"))?; |
There was a problem hiding this comment.
Addressed in the latest push.
There was a problem hiding this comment.
Fixed in b5373c0. Lock acquisition is now moved before initialize_store in open_or_initialize, so two concurrent processes can no longer race on initialization.
| let actions = | ||
| restack::plan_after_branch_advance(&state, node.id, &node.branch_name, old_head_oid)?; | ||
| let mut session = StoreSession { | ||
| repo: context.repo.clone(), | ||
| paths: store_paths, | ||
| config, | ||
| state, | ||
| }; | ||
| let mut session = StoreSession::from_parts(context.repo.clone(), store_paths, config, state)?; | ||
| record_branch_divergence_state(&mut session, node.id, BranchDivergenceState::Diverged)?; |
There was a problem hiding this comment.
StoreSession::from_parts acquires the store lock only at this point, but load_state and plan_after_branch_advance were already done earlier without holding the lock. That means this code can still read state.json while another dgr process holds the lock (and may be mid-write), and the planned restack actions may be based on stale state. Consider acquiring the lock before reading state / planning (or pass an already-acquired StoreLock into session construction) and reloading state after the lock is held.
There was a problem hiding this comment.
Addressed in the latest push.
There was a problem hiding this comment.
Fixed in b5373c0. The lock is now acquired in maybe_restack_after_commit_inner before load_state, and the pre-acquired lock is passed into the session via the new StoreSession::from_lock constructor. Note that load_config remains outside the lock because the config is written once during init and is effectively immutable after that—it is never modified by concurrent operations, so reading it without the lock is safe. The mutable state.json is the resource that needs serialization, and that is now fully covered.
- Ensure dagger_root directory exists before creating lock file - Move lock acquisition before initialize_store in open_or_initialize - Acquire lock before state read in maybe_restack_after_commit_inner - Add from_lock constructor to StoreSession for pre-acquired locks Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
don-petry
left a comment
There was a problem hiding this comment.
Automated review — NEEDS HUMAN REVIEW
Risk: MEDIUM
Reviewed commit: b5373c0b15429658a1c92853f5b29c1738fbdb78
Council vote: security=MEDIUM · correctness=MEDIUM · maintainability=LOW
Summary
The PR correctly implements advisory file locking to address issue #10 item 1, using sound RAII patterns with atomic create_new and idiomatic Rust. All three Copilot review threads from the first commit were resolved in the follow-up commit. However, mergeStateStatus is UNSTABLE with an empty statusCheckRollup — no CI run has confirmed cargo test passes for the new lock code, which AGENTS.md requires as the verification baseline. Two additional minor issues were identified: StoreSession::from_parts is public dead code that introduces a TOCTOU risk for future callers, and the silent removal of #[derive(Debug, Clone)] from StoreSession is an undocumented breaking change.
Linked issue analysis
This PR addresses item 1 of issue #10 (add file locking before state reads/writes). A lock is correctly acquired in all three session-creation paths (open_initialized, open_or_initialize, maybe_restack_after_commit_inner). Items 2–6 of issue #10 (write_atomic TOCTOU, cycle detection, state validation, --abort workflow, schema migration) remain open and are correctly not claimed by this PR.
Findings
Major
- [major] (security + correctness) —
mergeStateStatusisUNSTABLEandstatusCheckRollupis empty. No CI run confirmscargo testpasses for the new lock code. AGENTS.md requires passing tests as the verification baseline; this gate cannot be verified from PR metadata.
Minor
- [minor]
src/core/store/session.rs:41(correctness + maintainability) —StoreSession::from_partsis defined as public but is never called in this diff. Any future caller that reads state before callingfrom_partswill have a TOCTOU window. The method should either be removed or made private. - [minor]
src/core/store/session.rs:8(correctness + maintainability) — Removing#[derive(Debug, Clone)]fromStoreSessionis a silent breaking change. It is intentional (sinceStoreLockis notClone) but undocumented in the PR body.
Info
- [info]
src/core/store/lock.rs:16(security + correctness) — Stale lock: the lock file stores the owner PID butStoreLock::acquirenever verifies the PID is still alive. A SIGKILL'd process leaves a stale lock blocking all future operations until manually deleted. Acceptable for a local dev tool; the error message guides users to delete the file. - [info]
src/core/store/lock.rs:17(security) —create_dir_all(dagger_root)follows symlinks for intermediate path components. If.daggerwere a symlink, the lock file would be created at the symlink target. Low risk for a local dev tool where the user controls the git directory. - [info]
Cargo.toml:16(security) — New dev-dependencytempfile = "3"is a well-known crate with no typosquat risk. Floating semver is acceptable for dev-dependencies; the lockfile pins the resolved version. - [info]
src/core/store/lock.rs(correctness) — The lock is advisory (file-creation based), not OS-enforced. Code that constructsStoreSessionoutside the provided helpers bypasses mutual exclusion entirely. Expected approach for a single-machine CLI tool. - [info] (correctness) — Issue #10 has 6 items; this PR addresses only item 1 and correctly does not close the issue. Items 2–6 remain open.
- [info]
src/core/store/session.rs:59(maintainability) —open_initializedandopen_or_initializeconstructStoreSessionvia a direct struct literal (_lock: lock), whilecommit.rsusesfrom_lock(). Prefer the constructor everywhere for consistency. - [info]
src/core/store/lock.rs(maintainability) — Concurrent lock behavior is described as manual in the PR test plan; no automated smoke test was added. Acceptable for pure in-process locking logic.
CI status
mergeStateStatus is UNSTABLE with an empty statusCheckRollup — 0 checks reported (no passing, failing, or pending results available).
Reviewed automatically by the don-petry PR-review council (security: opus 4.6 · correctness: sonnet 4.6 · maintainability: sonnet 4.6 · synthesis: sonnet 4.6). The marker on line 1 lets the agent detect new commits and re-review. Reply with @don-petry if you need a human.
Why?
Running multiple
dgrprocesses concurrently (e.g., in editor hooks or parallel terminals) can cause them to read and writestate.jsonsimultaneously, leading to silent data corruption or lost updates. A file lock serializes access so only onedgrprocess modifies state at a time.Summary
StoreLockguard (src/core/store/lock.rs) that uses atomiccreate_newfile creation to prevent concurrentdgrprocesses from writing tostate.jsonsimultaneously.daggerdirectory if it doesn't exist before acquiring the lock, so locking works even on first initializationinitialize_storeto prevent races during initial setupStoreSessionso it is acquired when a session opens and released on dropRelates to #10 item 1 (file locking).
Test plan
cargo testpasses (new lock tests inlock.rs)dgrcommands simultaneously and verify the second reports "another dgr process appears to be running".dagger/lockrecovers from a stale lock after crash🤖 Generated with Claude Code