Workers can validate different checkouts and silently disagree on build/test results #171

@fujiwaranosai850

Summary

Workers can silently validate different filesystem states: review and test execution is not strictly pinned to the same checkout and commit, and testers can run decisive pass/fail checks from a dirty shared workspace.

Incident

First Light issue #221 / PR #222 exposed this clearly:

  • reviewer approved based on the PR diff and described the fixed code
  • tester ran the compile repro in /home/sai/.openclaw/workspace/firstlight
  • tester reproduced the old compile error and reported that the fix was not present locally
  • later verification showed:
    • origin/main at merged commit 24164242926923d9accc44cfa65892c1506e77d8 does contain the fix
    • the shared workspace was on a different dirty local HEAD and still had the old broken lines

This created a false contradiction: dev/reviewer said "compile should pass" while tester said "compile fails", but they were validating different trees.

Evidence

Merged PR:

  • yaqub0r/firstlight#222
  • merge commit: 24164242926923d9accc44cfa65892c1506e77d8

Fixed code on origin/main:

hasHarnessSeed = harnessSeed.HasValue,
harnessSeed = harnessSeed ?? 0,

Broken code still present in the shared local workspace during test:

hasHarnessSeed = hasHarnessSeed,
harnessSeed = harnessSeed,

Observed local mismatch during follow-up:

  • local workspace HEAD: e1c54402
  • origin/main: 24164242
  • workspace had additional dirty tracked changes and untracked files

Problem

Current worker coordination appears to allow at least one of these bad behaviors:

  1. tester uses a shared mutable workspace instead of an isolated clean worktree pinned to the target ref
  2. reviewer/dev/tester are not all given the same mandatory target commit/checkout contract
  3. decisive verification comments do not automatically include enough provenance (git rev-parse HEAD, dirty state, worktree path, target PR/commit)
  4. workers do not fail fast when the workspace is dirty or not at the intended ref
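
The provenance named in item 3 could be captured mechanically before a worker posts a verdict. The following is a minimal sketch in Python, not DevClaw's actual implementation: `Provenance`, `collect_provenance`, and `format_provenance` are hypothetical names introduced for illustration, and the sketch assumes `git` is on the PATH.

```python
import subprocess
from dataclasses import dataclass


@dataclass(frozen=True)
class Provenance:
    """Hypothetical record a worker attaches to a pass/fail comment."""
    worktree: str  # root of the worktree the check ran in
    ref: str       # branch name, or "HEAD" when detached
    head: str      # full SHA from `git rev-parse HEAD`
    dirty: bool    # tracked changes or untracked files present


def _git(path: str, *args: str) -> str:
    """Run a git command inside `path` and return its trimmed stdout."""
    result = subprocess.run(
        ["git", "-C", path, *args],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def collect_provenance(path: str) -> Provenance:
    """Read the checkout state that a verdict should be tied to."""
    return Provenance(
        worktree=_git(path, "rev-parse", "--show-toplevel"),
        ref=_git(path, "rev-parse", "--abbrev-ref", "HEAD"),
        head=_git(path, "rev-parse", "HEAD"),
        # --porcelain prints one line per changed or untracked path
        dirty=bool(_git(path, "status", "--porcelain")),
    )


def format_provenance(p: Provenance, target: str) -> str:
    """Render the record as plain text for a verification comment."""
    match = "yes" if p.head == target else "NO"
    return "\n".join([
        f"worktree: {p.worktree}",
        f"ref: {p.ref}",
        f"HEAD: {p.head}",
        f"target: {target}",
        f"HEAD matches target: {match}",
        f"state: {'dirty' if p.dirty else 'clean'}",
    ])
```

A tester would call `collect_provenance` on its working directory and append `format_provenance(...)` to the comment, so a reader can see at a glance whether two verdicts refer to the same tree.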

Expected behavior

For review/test/build tasks, DevClaw should ensure all workers validate the same code state.

Suggested contract:

  • pin every worker to an explicit commit SHA / PR head SHA / merge commit
  • prefer isolated clean worktrees for developer/tester/reviewer execution
  • before any pass/fail verdict, record and/or enforce:
    • repo path
    • worktree path
    • branch/ref
    • git rev-parse HEAD
    • dirty/clean status
  • tester should refuse a definitive verification run if the tree is dirty or if HEAD does not match the requested target commit
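
The contract above could be enforced with a small guard that runs before any test command. This is a sketch under assumptions, not an existing DevClaw API: the function names `verification_blockers`, `add_pinned_worktree`, and `require_clean_target` are hypothetical, and `git` is assumed to be on the PATH.

```python
import subprocess


def verification_blockers(head: str, target: str, porcelain: str) -> list[str]:
    """Return reasons to refuse a definitive run; an empty list means proceed."""
    blockers = []
    if head.lower() != target.lower():
        blockers.append(f"HEAD {head} does not match target {target}")
    if porcelain.strip():
        count = len(porcelain.strip().splitlines())
        blockers.append(f"worktree is dirty ({count} changed or untracked paths)")
    return blockers


def _git(path: str, *args: str) -> str:
    """Run a git command inside `path` and return its trimmed stdout."""
    result = subprocess.run(
        ["git", "-C", path, *args],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def add_pinned_worktree(repo: str, path: str, sha: str) -> None:
    """Create a separate worktree at `path`, detached at commit `sha`."""
    _git(repo, "worktree", "add", "--detach", path, sha)


def require_clean_target(path: str, target: str) -> None:
    """Raise before any test runs if the checkout is not the agreed state."""
    # Resolve the target to a full commit SHA so abbreviations compare equal.
    resolved = _git(path, "rev-parse", "--verify", f"{target}^{{commit}}")
    head = _git(path, "rev-parse", "HEAD")
    status = _git(path, "status", "--porcelain")
    blockers = verification_blockers(head, resolved, status)
    if blockers:
        raise RuntimeError(
            "refusing definitive verification: " + "; ".join(blockers)
        )
```

With the values from the incident (local HEAD e1c54402, target 24164242926923d9accc44cfa65892c1506e77d8, dirty tree), `verification_blockers` would return two reasons and the tester would stop instead of reporting a compile failure. Keeping the comparison in a pure function separate from the `git` calls makes the refusal logic testable without a repository.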

Why this matters

Without this, DevClaw can produce conflicting authoritative-seeming comments about the same issue even when reality is deterministic, which undermines trust in the workflow.
