fix(cascade): break the PR-covered-issue livelock — broaden in-flight guard + planning-phase reroute #28
The github_issue rung re-picked epic issues forever when their open PRs
referenced them without closing keywords ("(#125)", "Addresses item #5
of #125" — deliberately not "Fixes #125", since a per-item PR must not
auto-close the epic). The closing-keyword guard (issue #10 §bug-3) never
skipped them, the cascade never fell through to ideate, the planning-
phase doctrine (which triggers only on `nothing`) was unreachable, and
the agent held for 300+ keepalive turns (issue #27).
- triage: the ranker now also skips issues mentioned as a bare `#N` in
any open Nightly-authored PR (`nightly/*` head branch) — the
orchestrator's own open PR referencing an issue means it is in
flight. New OpenPRRefs carries both signals; closing-keyword refs
(from all open PRs) take precedence; injected frozenset fakes coerce
for back-compat. GitHub issues and PRs share one number sequence, so
bare-mention over-matching of PR numbers can't collide with issues.
- keepalive: livelock backstop — when a github_issue/accepted_rfc pick
repeats 3 consecutive turn boundaries (an actionable pick becomes a
task plan in one turn, flipping the cascade to resume_in_flight), the
Stop hook injects the planning-phase prompt instead of "Continue on:
X". Detection reroutes to ideation; the session never releases —
the old cascade_loop guard's flaw was releasing, not detecting.
resume_in_flight / unblocked_approval / pr_rescue legitimately repeat
and never reroute.
- Rules block, skill.md regenerated/updated; bump 0.0.11.
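
A minimal sketch of the broadened triage guard described in the first bullet, under stated assumptions: only `OpenPRRefs`, the `nightly/*` head-branch rule, closing-keyword precedence, the frozenset coercion and the two skip reasons come from this PR. The regexes, the dict-shaped PR records, the field names on `OpenPRRefs`, and the helpers `collect_refs` and `skip_reason` are hypothetical stand-ins, not the code in `triage.py`.

```python
import re
from dataclasses import dataclass

# Assumed patterns; the real ones in triage.py may differ.
CLOSING_RE = re.compile(
    r"\b(?:fix(?:e[sd])?|close[sd]?|resolve[sd]?)\s+#(\d+)", re.IGNORECASE
)
BARE_RE = re.compile(r"#(\d+)")


@dataclass(frozen=True)
class OpenPRRefs:
    closing_refs: frozenset = frozenset()          # from all open PRs
    nightly_mention_refs: frozenset = frozenset()  # from nightly/* PRs only


def collect_refs(prs):
    """Gather both signals from open PRs given as dicts (hypothetical shape)."""
    closing, mentions = set(), set()
    for pr in prs:
        text = f"{pr['title']}\n{pr['body']}"
        closing.update(int(n) for n in CLOSING_RE.findall(text))
        # A bare #N counts only when the orchestrator opened the PR itself.
        if pr["head"].startswith("nightly/"):
            mentions.update(int(n) for n in BARE_RE.findall(text))
    return OpenPRRefs(frozenset(closing), frozenset(mentions))


def skip_reason(issue, refs):
    """Return a skip reason for an in-flight issue, or None if it is pickable."""
    if isinstance(refs, frozenset):  # back-compat: older fakes inject a bare set
        refs = OpenPRRefs(closing_refs=refs)
    if issue in refs.closing_refs:   # closing keywords take precedence
        return "open PR already addresses this issue"
    if issue in refs.nightly_mention_refs:
        return "open Nightly PR references this issue (in-flight)"
    return None


# The real-case shape: no closing keyword, only bare mentions of the epic.
epic_pr = {
    "title": "Item 5 (#125)",
    "body": "Addresses item #5 of #125",
    "head": "nightly/item-5",
}
print(skip_reason(125, collect_refs([epic_pr])))
```

With only the closing-keyword pattern, this PR would contribute nothing and #125 would stay pickable, which is the livelock condition. The bare-mention pass also picks up `#5`, the over-matching the bullet above accepts.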
Fixes #27.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Root cause (issue #27)

corpus-forge run `2026-06-10T02-58-40Z`: the agent held for ~300 consecutive keepalive turns while `nightly next` kept surfacing issue #125. The v0.0.10 keepalive fix held fine; the livelock is in the cascade.

The issue ranker's in-flight guard (issue #10 §bug-3) only matches GitHub closing keywords (`fixes/closes/resolves #N`) in open PR text. Verified against the real corpus-forge PRs: the epic #125 is referenced by its per-item PRs as "(#125)" in the title and "Addresses item #5 of #125" in the body, deliberately with no closing keyword, since a per-item PR must not auto-close the epic. So the ranker never skipped it, the cascade never bottomed out at `ideate`/`nothing`, and the planning-phase doctrine was unreachable. (#120's occasional picks predate PR #133, which does say "Fixes #120" and was correctly suppressed once open.)

The fix: two layers
1. Broadened in-flight signal (`triage.py`). `fetch_open_pr_issue_refs_via_gh` now returns an `OpenPRRefs` pair:

- `closing_refs`: closing-keyword claims from all open PRs (unchanged behavior, takes precedence), skip reason "open PR already addresses this issue".
- `nightly_mention_refs`: bare `#N` mentions, counted only from Nightly-authored PRs (`nightly/*` head branch), new skip reason "open Nightly PR references this issue (in-flight)". A bare mention is too weak a signal from arbitrary PRs, but in a PR the orchestrator itself opened it means the issue is being worked on. GitHub issues/PRs share one number sequence, so over-matching PR cross-refs can't collide with issue numbers.

Injected fakes returning a bare `frozenset[int]` are coerced for back-compat, so the test-injection pattern survives.

2. Livelock backstop (`keepalive_hook.py`). The Stop hook already records consecutive cascade-pick repeats in `keepalive.history` (the count was intentionally unused since v0.0.3). Now: when a `github_issue` or `accepted_rfc` pick repeats 3 consecutive turn boundaries, the hook injects the planning-phase prompt ("GENUINE WORK IS NEVER EXHAUSTED…") instead of "Continue on: X". Rationale: an actionable pick from those two rungs becomes a task plan within one turn, after which `resume_in_flight` outranks it and the fingerprint changes, so a third identical re-pick means holding, not working. `resume_in_flight` / `unblocked_approval` / `pr_rescue` legitimately repeat across long tasks and never reroute. Detection reroutes to ideation; it never releases the session. The v0.0.3 "only human intervention terminates" contract holds (the removed `cascade_loop` guard's flaw was releasing, not detecting).

Rules block (`rules.py` → regenerated AGENTS.md/CLAUDE.md) and skill.md updated to document both. Version bumped to 0.0.11.

Testing
`make check` clean: ruff, pyrefly, 1044 passed. New coverage includes a regression test shaped exactly like the real case ("(#125)" title + "Addresses item #5 of #125" body in a `nightly/*` PR → skipped), non-nightly bare mentions NOT skipped, closing-ref precedence, fetcher degradation, 3rd-re-pick reroute (first two normal), `resume_in_flight` ×5 never reroutes, fingerprint change resets to normal prompts.

Fixes #27.
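
The reroute cases listed under Testing can be illustrated with a small sketch of the backstop logic. This is not the code in `keepalive_hook.py`: `PickHistory`, `next_prompt`, and the `(rung, target)` fingerprint are hypothetical names, and the sketch assumes that "repeats 3 consecutive turn boundaries" means the third consecutive sighting of the same pick is the one that reroutes. The planning-phase prompt is kept truncated exactly as quoted in this PR.

```python
from dataclasses import dataclass
from typing import Optional

# Rungs whose repeated pick signals holding; names taken from this PR.
REROUTABLE_RUNGS = frozenset({"github_issue", "accepted_rfc"})
REROUTE_AFTER = 3  # consecutive identical picks before rerouting
PLANNING_PROMPT = "GENUINE WORK IS NEVER EXHAUSTED…"  # truncated in the source


@dataclass
class PickHistory:
    """Hypothetical stand-in for the repeat count kept in keepalive.history."""
    fingerprint: Optional[tuple] = None
    repeats: int = 0


def next_prompt(history, rung, target):
    """Pick the Stop-hook prompt for this turn boundary; never releases."""
    fingerprint = (rung, target)
    if fingerprint == history.fingerprint:
        history.repeats += 1
    else:
        # Any change of pick resets the count, so normal prompts resume.
        history.fingerprint = fingerprint
        history.repeats = 1
    if rung in REROUTABLE_RUNGS and history.repeats >= REROUTE_AFTER:
        return PLANNING_PROMPT  # reroute to ideation instead of "Continue on"
    return f"Continue on: {target}"


history = PickHistory()
for _ in range(3):
    print(next_prompt(history, "github_issue", "#125"))
```

Both branches return a prompt, so the session is always held; only the content of the injected prompt changes. `resume_in_flight`, `unblocked_approval` and `pr_rescue` are absent from `REROUTABLE_RUNGS`, so they can repeat indefinitely without rerouting.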
🤖 Generated with Claude Code