
fix(cascade): break the PR-covered-issue livelock — broaden in-flight guard + planning-phase reroute#28

Merged
ulmentflam merged 1 commit into main from fix/cascade-issue-livelock on Jun 11, 2026


@ulmentflam

Owner

Root cause (issue #27)

In corpus-forge run 2026-06-10T02-58-40Z, the agent held for ~300 consecutive keepalive turns while `nightly next` kept surfacing issue #125. The v0.0.10 keepalive fix worked as intended; the livelock is in the cascade.

The issue ranker's in-flight guard (issue #10 §bug-3) only matches GitHub closing keywords (fixes/closes/resolves #N) in open PR text. Verified against the real corpus-forge PRs: the epic #125 is referenced by its per-item PRs as "(#125)" in the title and "Addresses item #5 of #125" in the body — deliberately no closing keyword, since a per-item PR must not auto-close the epic. So the ranker never skipped it, the cascade never bottomed out at ideate/nothing, and the planning-phase doctrine was unreachable. (#120's occasional picks predate PR #133, which does say "Fixes #120" and was correctly suppressed once open.)
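
To make the gap concrete, here is a minimal sketch of a closing-keyword matcher run against the two reference styles quoted above. The regex and the `closing_refs` helper are illustrative assumptions, not the pattern actually used in triage.py.

```python
import re

# Hypothetical approximation of a closing-keyword matcher
# (fix/fixes/fixed, close/closes/closed, resolve/resolves/resolved + "#N").
# The real pattern in triage.py may differ.
CLOSING_REF = re.compile(
    r"\b(?:fix(?:es|ed)?|close[sd]?|resolve[sd]?)\s+#(\d+)",
    re.IGNORECASE,
)


def closing_refs(text: str) -> frozenset:
    """Return the issue numbers claimed via a closing keyword in ``text``."""
    return frozenset(int(n) for n in CLOSING_REF.findall(text))


# PR #133 style: a closing keyword, so the guard sees the issue as covered.
covered = closing_refs("Fixes #120")

# Per-item epic PR style: only bare mentions, so the guard sees nothing.
title_only = closing_refs("feat: some item (#125)")
body_only = closing_refs("Addresses item #5 of #125")
```

Under this sketch `covered` contains 120 while `title_only` and `body_only` are both empty, which matches the reported behavior: the epic is never recognized as in flight.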

The fix — two layers

1. Broadened in-flight signal (triage.py). fetch_open_pr_issue_refs_via_gh now returns an OpenPRRefs pair:

  • closing_refs — closing-keyword claims from all open PRs (unchanged behavior, takes precedence), skip reason "open PR already addresses this issue".
  • nightly_mention_refs — bare #N mentions, counted only from Nightly-authored PRs (nightly/* head branch), new skip reason "open Nightly PR references this issue (in-flight)". A bare mention is too weak a signal from arbitrary PRs, but in a PR the orchestrator itself opened it means the issue is being worked. GitHub issues/PRs share one number sequence, so over-matching PR cross-refs can't collide with issue numbers.

Injected fakes returning a bare frozenset[int] are coerced for back-compat, so the test-injection pattern survives.
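
The two-signal guard can be sketched roughly as follows. Only the names `OpenPRRefs`, `closing_refs`, `nightly_mention_refs`, and the two skip-reason strings come from this PR; `build_refs`, `coerce_refs`, `skip_reason`, the regexes, and the PR dict shape (`headRefName`, `title`, `body`) are hypothetical stand-ins, and mapping a legacy bare frozenset onto `closing_refs` is an assumption about how the coercion works.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Mapping, Optional, Union

# Illustrative patterns; the real ones in triage.py may differ.
CLOSING_REF = re.compile(
    r"\b(?:fix(?:es|ed)?|close[sd]?|resolve[sd]?)\s+#(\d+)", re.IGNORECASE
)
BARE_REF = re.compile(r"#(\d+)")
NIGHTLY_PREFIX = "nightly/"


@dataclass(frozen=True)
class OpenPRRefs:
    closing_refs: frozenset = frozenset()
    nightly_mention_refs: frozenset = frozenset()


def build_refs(prs: Iterable[Mapping[str, str]]) -> OpenPRRefs:
    """Collect both signals from open PRs (assumed dict shape, see lead-in)."""
    closing = set()
    mentions = set()
    for pr in prs:
        text = f"{pr.get('title', '')}\n{pr.get('body', '')}"
        # Closing keywords count from every open PR.
        closing.update(int(n) for n in CLOSING_REF.findall(text))
        # Bare mentions count only from Nightly-authored PRs.
        if pr.get("headRefName", "").startswith(NIGHTLY_PREFIX):
            mentions.update(int(n) for n in BARE_REF.findall(text))
    return OpenPRRefs(frozenset(closing), frozenset(mentions))


def coerce_refs(value: Union[OpenPRRefs, frozenset]) -> OpenPRRefs:
    """Accept a legacy bare frozenset, assumed here to mean closing refs."""
    if isinstance(value, OpenPRRefs):
        return value
    return OpenPRRefs(closing_refs=frozenset(value))


def skip_reason(issue: int, refs: OpenPRRefs) -> Optional[str]:
    """Closing-keyword refs take precedence over Nightly bare mentions."""
    if issue in refs.closing_refs:
        return "open PR already addresses this issue"
    if issue in refs.nightly_mention_refs:
        return "open Nightly PR references this issue (in-flight)"
    return None
```

With this shape, a `nightly/*` PR titled "(#125)" yields the in-flight skip reason for issue 125, while the same bare mention in a non-Nightly PR yields no skip at all.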

2. Livelock backstop (keepalive_hook.py). The Stop hook already records consecutive cascade-pick repeats in keepalive.history (the count has been intentionally unused since v0.0.3). Now, when a github_issue or accepted_rfc pick repeats across 3 consecutive turn boundaries, the hook injects the planning-phase prompt ("GENUINE WORK IS NEVER EXHAUSTED…") instead of "Continue on: X". Rationale: an actionable pick from those two rungs becomes a task plan within one turn, after which resume_in_flight outranks it and the fingerprint changes, so a third identical re-pick means the agent is holding, not working. resume_in_flight, unblocked_approval, and pr_rescue legitimately repeat across long tasks and never reroute. Detection reroutes to ideation; it never releases the session, so the v0.0.3 "only human intervention terminates" contract holds (the flaw in the removed cascade_loop guard was that it released the session, not that it detected the loop).
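
The backstop decision can be sketched as below. This is a simplified model, not the code in keepalive_hook.py: the `(rung, target)` tuple standing in for the pick fingerprint, the `trailing_repeats` and `next_prompt` helpers, and the reading of "repeats 3 consecutive turn boundaries" as "the same pick appears at the last three boundaries" are all assumptions. The planning-phase prompt is left truncated exactly as quoted above.

```python
from typing import Sequence, Tuple

# Hypothetical stand-in for the pick fingerprint recorded in keepalive.history.
Pick = Tuple[str, str]  # (rung, target)

# Only these rungs reroute; other rungs may legitimately repeat.
REROUTE_RUNGS = frozenset({"github_issue", "accepted_rfc"})
REPEAT_THRESHOLD = 3

# Truncated on purpose: the full prompt text is not reproduced in this PR.
PLANNING_PHASE_PROMPT = "GENUINE WORK IS NEVER EXHAUSTED…"


def trailing_repeats(history: Sequence[Pick]) -> int:
    """Count how many times the most recent pick repeats at the end of history."""
    if not history:
        return 0
    last = history[-1]
    count = 0
    for pick in reversed(history):
        if pick != last:
            break
        count += 1
    return count


def next_prompt(history: Sequence[Pick]) -> str:
    """Pick the prompt to inject at a turn boundary; never releases the session."""
    rung, target = history[-1]
    if rung in REROUTE_RUNGS and trailing_repeats(history) >= REPEAT_THRESHOLD:
        return PLANNING_PHASE_PROMPT  # reroute to ideation
    return f"Continue on: {target}"
```

Note that both branches return a prompt: in this model there is no path that ends the session, which mirrors the "detection reroutes, never releases" contract described above.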

Rules block (rules.py → regenerated AGENTS.md/CLAUDE.md) and skill.md updated to document both. Version bumped to 0.0.11.

Testing

make check clean: ruff, pyrefly, 1044 passed. New coverage includes a regression test shaped exactly like the real case ("(#125)" title + "Addresses item #5 of #125" body in a nightly/* PR → skipped), non-nightly bare mentions NOT skipped, closing-ref precedence, fetcher degradation, 3rd-re-pick reroute (first two normal), resume_in_flight ×5 never reroutes, fingerprint change resets to normal prompts.

Fixes #27.

🤖 Generated with Claude Code

… guard + planning-phase reroute

The github_issue rung re-picked epic issues forever when their open PRs
referenced them without closing keywords ("(#125)", "Addresses item #5
of #125" — deliberately not "Fixes #125", since a per-item PR must not
auto-close the epic). The closing-keyword guard (issue #10 §bug-3) never
skipped them, the cascade never fell through to ideate, the planning-
phase doctrine (which triggers only on `nothing`) was unreachable, and
the agent held for 300+ keepalive turns (issue #27).

- triage: the ranker now also skips issues mentioned as a bare `#N` in
  any open Nightly-authored PR (`nightly/*` head branch) — the
  orchestrator's own open PR referencing an issue means it is in
  flight. New OpenPRRefs carries both signals; closing-keyword refs
  (from all open PRs) take precedence; injected frozenset fakes coerce
  for back-compat. GitHub issues and PRs share one number sequence, so
  bare-mention over-matching of PR numbers can't collide with issues.
- keepalive: livelock backstop — when a github_issue/accepted_rfc pick
  repeats 3 consecutive turn boundaries (an actionable pick becomes a
  task plan in one turn, flipping the cascade to resume_in_flight), the
  Stop hook injects the planning-phase prompt instead of "Continue on:
  X". Detection reroutes to ideation; the session never releases —
  the old cascade_loop guard's flaw was releasing, not detecting.
  resume_in_flight / unblocked_approval / pr_rescue legitimately repeat
  and never reroute.
- Rules block, skill.md regenerated/updated; bump 0.0.11.

Fixes #27.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 11, 2026

Contributor

Warning

Review limit reached

@ulmentflam, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 39 minutes and 55 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: bd8712e9-9047-471d-af6a-33bd3754e299

📥 Commits

Reviewing files that changed from the base of the PR and between 058f5c3 and bdf0568.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (17)
  • AGENTS.md
  • CLAUDE.md
  • packages/nightly-core/pyproject.toml
  • packages/nightly-core/src/nightly_core/_version.py
  • packages/nightly-core/src/nightly_core/keepalive_hook.py
  • packages/nightly-core/src/nightly_core/rules.py
  • packages/nightly-core/src/nightly_core/triage.py
  • packages/nightly-core/tests/test_keepalive_hook.py
  • packages/nightly-core/tests/test_triage.py
  • packages/nightly-host-antigravity/pyproject.toml
  • packages/nightly-host-claude/pyproject.toml
  • packages/nightly-host-claude/src/nightly_host_claude/skill.md
  • packages/nightly-host-codex/pyproject.toml
  • packages/nightly-host-cursor/pyproject.toml
  • packages/nightly-host-gemini/pyproject.toml
  • packages/nightly-host-opencode/pyproject.toml
  • pyproject.toml

@ulmentflam merged commit d0658b0 into main on Jun 11, 2026 (3 checks passed)
@ulmentflam deleted the fix/cascade-issue-livelock branch on June 11, 2026 15:52


Development

Successfully merging this pull request may close these issues.

Cascade livelock: agent holds on PR-covered issues instead of reaching ideate/planning-phase

1 participant