Summary
Workers can silently validate different filesystem states because review/test execution is not strictly pinned to the same checkout/commit, and testers can run decisive pass/fail checks from a dirty shared workspace.
Incident
First Light issue #221 / PR #222 exposed this clearly:
- reviewer approved based on the PR diff and described the fixed code
- tester ran the compile repro in /home/sai/.openclaw/workspace/firstlight
- tester reproduced the old compile error and reported that the fix was not present locally
- later verification showed:
  - origin/main at merged commit 24164242926923d9accc44cfa65892c1506e77d8 does contain the fix
  - the shared workspace was on a different dirty local HEAD and still had the old broken lines
This created a false contradiction where "dev/reviewer says compile should pass" and "tester says compile fails", but they were effectively validating different trees.
Evidence
Merged PR: yaqub0r/firstlight#222
- merge commit: 24164242926923d9accc44cfa65892c1506e77d8
Fixed code on origin/main:
hasHarnessSeed = harnessSeed.HasValue,
harnessSeed = harnessSeed ?? 0,
Broken code still present in the shared local workspace during test:
hasHarnessSeed = hasHarnessSeed,
harnessSeed = harnessSeed,
Observed local mismatch during follow-up:
- local workspace HEAD: e1c54402
- origin/main: 24164242
- workspace had additional dirty tracked changes and untracked files
Problem
Current worker coordination appears to allow at least one of these bad behaviors:
- tester uses a shared mutable workspace instead of an isolated clean worktree pinned to the target ref
- reviewer/dev/tester are not all given the same mandatory target commit/checkout contract
- decisive verification comments do not automatically include enough provenance (git rev-parse HEAD, dirty state, worktree path, target PR/commit)
- workers do not fail fast when the workspace is dirty or not at the intended ref
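To illustrate the missing provenance, here is a minimal Python sketch of what a tester could append to a verification comment. The function names (`format_provenance`, `provenance_for`) and the output layout are hypothetical and not part of DevClaw; only the `git rev-parse HEAD` and `git status --porcelain` commands are standard git.

```python
import subprocess


def run_git(worktree: str, *args: str) -> str:
    """Run a git command inside `worktree` and return its raw stdout."""
    done = subprocess.run(
        ["git", "-C", worktree, *args],
        check=True, capture_output=True, text=True,
    )
    return done.stdout


def format_provenance(worktree: str, target: str, head: str, status: str) -> str:
    """Render provenance lines for a verification comment (hypothetical layout)."""
    # Each non-empty line of `git status --porcelain` is one changed or untracked path.
    changed = [line for line in status.splitlines() if line.strip()]
    state = "clean" if not changed else f"dirty ({len(changed)} paths)"
    return "\n".join([
        f"worktree: {worktree}",
        f"target: {target}",
        f"HEAD: {head}",
        f"state: {state}",
    ])


def provenance_for(worktree: str, target: str) -> str:
    """Collect HEAD and dirty state from git, then format them."""
    head = run_git(worktree, "rev-parse", "HEAD").strip()
    status = run_git(worktree, "status", "--porcelain")
    return format_provenance(worktree, target, head, status)
```

With output like this attached to the tester's comment in the incident above, the HEAD/target mismatch and the dirty state would have been visible next to the "compile fails" verdict.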
Expected behavior
For review/test/build tasks, DevClaw should ensure all workers validate the same code state.
Suggested contract:
- pin every worker to an explicit commit SHA / PR head SHA / merge commit
- prefer isolated clean worktrees for developer/tester/reviewer execution
- before any pass/fail verdict, record and/or enforce:
  - repo path
  - worktree path
  - branch/ref
  - git rev-parse HEAD
  - dirty/clean status
- tester should refuse a definitive verification run if the tree is dirty or if HEAD does not match the requested target commit
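The contract above could be enforced roughly as follows. This is a sketch under assumptions, not DevClaw's actual implementation: `worktree_add_command` and `refusal_reasons` are hypothetical helper names, and the caller is assumed to pass the full SHA from `git rev-parse HEAD` and the raw output of `git status --porcelain`. The only git feature relied on is `git worktree add --detach <path> <commit>`, which creates a separate checkout at a fixed commit.

```python
from typing import List


def worktree_add_command(repo: str, path: str, commit: str) -> List[str]:
    """Build the git command that creates a detached worktree pinned to `commit`."""
    return ["git", "-C", repo, "worktree", "add", "--detach", path, commit]


def refusal_reasons(head: str, target: str, porcelain_status: str) -> List[str]:
    """Return the reasons a definitive run must be refused (empty list = proceed).

    `head` is expected to be the full SHA printed by `git rev-parse HEAD`;
    `target` may be a full SHA or an abbreviation of at least 7 hex digits.
    """
    reasons = []
    head = head.strip().lower()
    target = target.strip().lower()
    # Prefix match lets an abbreviated target such as 24164242 match a full HEAD.
    if len(target) < 7 or not head.startswith(target):
        reasons.append(f"HEAD {head[:8]} does not match target {target[:8]}")
    # Any non-empty porcelain line is a modified, staged, or untracked path.
    changed = [line for line in porcelain_status.splitlines() if line.strip()]
    if changed:
        reasons.append(f"worktree is dirty ({len(changed)} changed or untracked paths)")
    return reasons
```

Applied to the incident, a HEAD of e1c54402 against target 24164242926923d9accc44cfa65892c1506e77d8 in a dirty workspace would yield two refusal reasons, so the tester would report "cannot verify" instead of "compile fails".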
Why this matters
Without this, DevClaw can produce conflicting authoritative-seeming comments about the same issue even when reality is deterministic, which undermines trust in the workflow.