
pipeline: orchestrator never recovers issues stuck with aw-dispatched but no resulting PR #1052

@microsasa

Problem

The orchestrator's eligibility filter (in .github/workflows/pipeline-orchestrator.yml, Dispatch implementer for unworked issues step) excludes any issue carrying the aw-dispatched label:

([.labels.nodes[].name] | any(. == "aw-dispatched") | not)
and ([.labels.nodes[].name] | any(. == "agentic-workflows") | not)
and ([.labels.nodes[].name] | any(. == "aw-protected-files") | not)
and ([.labels.nodes[].name] | any(. == "backlog") | not)

aw-dispatched is applied at dispatch time, immediately before triggering issue-implementer.lock.yml. But it is never removed, even when the implementer run fails to produce a PR.
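
To make the consequence concrete, here is a minimal Python model of the label-only filter. It is only a sketch: the real check is the jq expression quoted above, and the `is_eligible` helper and the `code-health` label are hypothetical names used for illustration.

```python
# Hypothetical model of the label-only eligibility filter. The real check is a
# jq expression in pipeline-orchestrator.yml; this only mirrors its logic.
EXCLUDED_LABELS = {"aw-dispatched", "agentic-workflows", "aw-protected-files", "backlog"}


def is_eligible(labels):
    """Return True when the issue carries none of the excluded labels."""
    return not (set(labels) & EXCLUDED_LABELS)


# "code-health" is an illustrative label, not taken from the workflow.
labels_before_dispatch = ["code-health"]
# aw-dispatched is added at dispatch time and never removed, so after a failed
# run the issue is filtered out on every later scheduled run.
labels_after_failed_run = ["code-health", "aw-dispatched"]

print(is_eligible(labels_before_dispatch))
print(is_eligible(labels_after_failed_run))
```

Because the filter only reads labels, nothing in it can distinguish "dispatched and in progress" from "dispatched and failed".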

Failure modes that leave issues stranded

  1. Threat-detection parse failure. For example, in issue #1030 ("[aw][code health] _first_pass uses isinstance on raw dict instead of typed ToolExecutionData accessor"; run 24737654133, 2026-04-21 17:48), the detection model did not emit THREAT_DETECTION_RESULT, so the job failed, no PR was created, and the label remained.
  2. Empty patch. The implementer concludes no change is needed, or makes a change inside a protected path; the push job silently no-ops, no PR is created, and the label remains.
  3. Dispatch of work the pipeline shouldn't attempt. The body of issue #924 ("refactor: extract caching and discovery from parser.py") explicitly says "Do not attempt via pipeline", but the filter is label-only and doesn't inspect body text, so the issue was dispatched once and is now stuck.

Once stranded, the orchestrator will never re-dispatch the issue — it's invisible to every subsequent scheduled run.

Evidence

Proposed fix

The orchestrator should detect issues that are "dispatched but no PR after threshold" and reconcile them.

Concrete approach (one step in pipeline-orchestrator.yml, runs before the dispatch step):

  1. Query all issues with aw-dispatched.
  2. For each, check whether there's any open/merged PR whose body or commits reference it (via #<num> or a branch-naming convention).
  3. If no PR exists AND the issue has been in aw-dispatched for > N hours (suggest: 2 hours — long enough for a normal run + detection + push):
    • Remove aw-dispatched
    • Post a comment noting the auto-unstick
    • Optionally add aw-retry-count:N so a second unstick can escalate to aw-stuck:implement for human review instead of looping forever.
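
The decision logic in the steps above can be sketched in Python as follows. This is a rough sketch under stated assumptions, not the proposed workflow step itself: `DispatchedIssue`, `references_issue`, `reconcile`, and `MAX_RETRIES` are hypothetical names, the PR bodies and commit messages are assumed to be fetched already, and how the label's application time is obtained (for example from the issue's label events) is left open.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

THRESHOLD = timedelta(hours=2)  # the suggested value of N
MAX_RETRIES = 1                 # assumption: the second unstick escalates


@dataclass
class DispatchedIssue:
    number: int
    dispatched_at: datetime     # when aw-dispatched was applied (assumed known)
    retry_count: int = 0        # parsed from an aw-retry-count:N label, if present


def references_issue(text, number):
    """True if text mentions #<number> as a whole token, so #105 does not match #1052."""
    return re.search(rf"(?<![\w#])#{number}(?!\d)", text) is not None


def reconcile(issue, pr_texts, now):
    """Return 'keep', 'unstick', or 'escalate' for one aw-dispatched issue.

    pr_texts holds the bodies and commit messages of open or merged PRs.
    """
    if any(references_issue(text, issue.number) for text in pr_texts):
        return "keep"           # a PR exists, nothing to recover
    if now - issue.dispatched_at <= THRESHOLD:
        return "keep"           # the implementer run may still be in progress
    if issue.retry_count >= MAX_RETRIES:
        return "escalate"       # add aw-stuck:implement for human review
    return "unstick"            # remove aw-dispatched, comment, bump the retry count


now = datetime(2026, 4, 22, 0, 0, tzinfo=timezone.utc)
stale = DispatchedIssue(number=1030, dispatched_at=now - timedelta(hours=6))
print(reconcile(stale, [], now))
print(reconcile(stale, ["Fixes #1030"], now))
```

Keeping the decision a pure function of the issue, the PR texts, and the current time means the label changes and the comment can be applied separately, and the threshold and escalation rule can be tested without calling the GitHub API.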

Out of scope (separate considerations)

  • Honoring free-text like "Do not attempt via pipeline" in issue bodies — better solved by making sure the issue author applies backlog at creation time, or by code-health only emitting automation-eligible issues.
  • Re-queuing failed runs themselves (transient failures like threat-detection parse) — this issue is about recovering the visible state, not about retrying specific actions.

Refs
