
Investigate post-intervention regression family behind blocked-hold redispatch loops not fixed by #225 #227

@fujiwaranosai850

Description


This follow-up issue tracks the remaining regression family behind blocked-hold redispatch loops, which still persists after the fix merged in #225.

Why this needs a separate issue

Issue #225 fixed one real bug path and landed live on devclaw-local-dev, but live behavior shows the broader problem is not fully resolved.

PhysioLink issue #74 is still looping after the #225 fix went live.

That means we likely have an additional regression path beyond the one fixed in #225.

Current diagnosis

The operator's historical knowledge is important here:

  • this bug family did not exist before operator intervention work landed
  • therefore operator-intervention-era changes should be treated as the leading regression origin for the broader loop family, even if the immediate actor in a specific loop is not an actively configured intervention policy

Important distinction:

Observed remaining behavior

PhysioLink issue #74

The issue repeatedly:

  • enters blocked hold correctly: Doing -> Refining
  • then later appears back in runnable queue state: To Do
  • then gets dispatched again: To Do -> Doing

This continues even after #225 was merged live.
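A check over an issue's state history can confirm whether it is still looping by counting the unrequested Refining -> To Do re-entries. This is only a hedged illustration. The `Transition` record, its `human_restart` flag, and `count_unrequested_redispatches` are hypothetical names, and the check assumes the history records the move from Refining straight back to To Do.

```python
from dataclasses import dataclass


# Hypothetical transition record; field names are illustrative, not devclaw's schema.
@dataclass(frozen=True)
class Transition:
    from_state: str
    to_state: str
    human_restart: bool = False  # True only when an operator explicitly asked to resume


def count_unrequested_redispatches(history: list) -> int:
    """Count Refining -> To Do re-entries that no human requested.

    Each one is a lap of the loop: Doing -> Refining, back to To Do, then Doing again.
    """
    laps = 0
    for t in history:
        if t.from_state == "Refining" and t.to_state == "To Do" and not t.human_restart:
            laps += 1
    return laps


# Two laps of the pattern observed on PhysioLink #74.
lap = [Transition("Doing", "Refining"), Transition("Refining", "To Do"), Transition("To Do", "Doing")]
print(count_unrequested_redispatches(lap * 2))
```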

The worker reports the same underlying blocker each time:

  • code work is complete
  • closeout is blocked by a PR-detection bug that resolves the wrong branch (issue-65-local-dev-boot instead of the real issue-74-dev-prefix-publication lane)
  • the worker correctly falls back to blocked
  • the system still eventually reintroduces the issue into runnable state
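Branch resolution could be made fail-safe by matching the lane branch on the exact issue number and refusing to guess when there is no single match. This sketch does not reproduce the current devclaw logic. It assumes lane branches follow an `issue-<number>-<slug>` pattern, as both branch names above do, and `resolve_issue_branch` is a hypothetical helper.

```python
import re
from typing import List, Optional


def resolve_issue_branch(issue_number: int, branches: List[str]) -> Optional[str]:
    """Return the lane branch for this issue, or None when there is no unambiguous match.

    Assumes (hypothetically) branches are named "issue-<number>-<slug>". The trailing "-"
    in the pattern keeps issue 74 from matching "issue-740-..." as well as "issue-65-...".
    """
    pattern = re.compile(rf"^issue-{issue_number}-")
    matches = [b for b in branches if pattern.match(b)]
    # Returning None lets the caller stay blocked instead of closing out the wrong PR.
    return matches[0] if len(matches) == 1 else None


branches = ["issue-65-local-dev-boot", "issue-74-dev-prefix-publication"]
print(resolve_issue_branch(74, branches))
```

Returning `None` rather than a best guess means a missing or ambiguous lane surfaces as an explicit closeout failure instead of a silent mismatch.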

Why this looks like a separate bug path

The current live code now explicitly blocks intervention auto-requeue/queue actions from HOLD states, and PhysioLink has no active intervention policies.
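A toy model shows why a HOLD guard on the intervention actions alone may not stop the loop. Both functions below are hypothetical stand-ins, not devclaw code. The guard lives inside one caller, so a different path that writes To Do directly (for example a heartbeat or stale-session requeue) bypasses it entirely.

```python
HOLD_STATES = {"Refining"}  # assumption: Refining is the blocked-hold state


def intervention_requeue(state: str) -> str:
    """Intervention-side requeue with its own HOLD guard: held issues stay put."""
    if state in HOLD_STATES:
        return state
    return "To Do"


def heartbeat_requeue(state: str) -> str:
    """A separate caller that writes To Do directly; the state is ignored, no guard here."""
    return "To Do"


print(intervention_requeue("Refining"), heartbeat_requeue("Refining"))
```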

So the remaining loop likely comes from one of these shared paths:

  • heartbeat health auto-fix / stale-session requeue logic
  • shared workflow/event/slot-state code touched during intervention-era refactors
  • state-repair logic that wrongly interprets held blocked issues as requeueable
  • closeout failure handling around PR/branch mismatch that leaks back into runnable state
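One way to find which of these paths performs the Refining -> To Do write is to route state changes through a single function that records the source of every transition. This diagnostic sketch uses hypothetical names (`Issue`, `set_state`, `heartbeat_autofix`). It assumes all state writes can be funneled through one place.

```python
import inspect
from dataclasses import dataclass, field


@dataclass
class Issue:
    number: int
    state: str
    audit: list = field(default_factory=list)  # (from_state, to_state, source, caller)


def set_state(issue: Issue, new_state: str, source: str) -> None:
    """Hypothetical single write point that records who moved the issue."""
    caller = inspect.stack()[1].function  # name of the function that called set_state
    issue.audit.append((issue.state, new_state, source, caller))
    issue.state = new_state


def heartbeat_autofix(issue: Issue) -> None:
    """Hypothetical example caller standing in for a stale-session requeue."""
    set_state(issue, "To Do", source="heartbeat")


issue = Issue(74, "Refining")
heartbeat_autofix(issue)
print(issue.audit[-1])
```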

Goal of this issue

Identify and fix the remaining post-intervention regression path(s) that cause blocked/Refining issues to re-enter To Do/active work even when no human restart was requested and no live intervention policy is configured.
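The goal can be stated as an invariant that does not depend on the source: a held issue returns to runnable states only on an explicit human restart, whichever subsystem attempts the move. The sketch below is hypothetical. The `Source` values mirror the suspected paths listed above plus a human restart, and the state sets assume Refining is the only hold state.

```python
from enum import Enum


class Source(Enum):
    HUMAN_RESTART = "human_restart"
    HEARTBEAT = "heartbeat"
    STATE_REPAIR = "state_repair"
    CLOSEOUT_FAILURE = "closeout_failure"
    INTERVENTION = "intervention"


HOLD_STATES = frozenset({"Refining"})
RUNNABLE_STATES = frozenset({"To Do", "Doing"})


def may_transition(current: str, target: str, source: Source) -> bool:
    """Only an explicit human restart may move a held issue back to runnable work."""
    if current in HOLD_STATES and target in RUNNABLE_STATES:
        return source is Source.HUMAN_RESTART
    return True


print([s.value for s in Source if may_transition("Refining", "To Do", s)])
```

Routing every writer through one check like this keeps the invariant from depending on each caller remembering its own guard.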

Acceptance criteria

Relationship to existing issues

Metadata


Assignees

No one assigned

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
