fix(code-reviews): recover stuck reviews#3040

Merged
alex-alecu merged 3 commits into main from stuck-waiting-for-events
May 5, 2026

Conversation

@alex-alecu (Contributor) commented May 5, 2026

Summary

Some code reviews could stay queued because the worker can miss the first signal that tells it to start work. When that happened, the app had no backup path to wake the review again.

Before this PR, the app did roughly this:

  1. Mark review as queued.
  2. Send review to Worker.
  3. Worker creates Durable Object.
  4. Worker starts review in the background.

The weak point was step 4.

If the Worker accepted the review but the background start was missed or failed early, the review could stay queued forever.

Now the Worker does this:

  1. It accepts the review.
  2. It saves Durable Object state as queued.
  3. It schedules a backup alarm for 30 seconds later.
  4. It still tries to start the review right away.
  5. When the alarm fires:
    • if status is still queued, it starts the review
    • if status is already running, it does nothing
    • if status is done, it uses the alarm for cleanup as before
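
The alarm branch in step 5 can be sketched as a small pure function. This is a minimal illustration, not the PR's code: the type names, the three-value status, and `FALLBACK_ALARM_DELAY_MS` are all assumptions made for the example, and the real Durable Object presumably stores more state than this.

```typescript
// Sketch only: every name here is hypothetical, not taken from the PR diff.
type ReviewStatus = "queued" | "running" | "done";
type AlarmAction = "start-review" | "no-op" | "cleanup";

// Delay for the backup alarm ("30 seconds later" in the summary).
const FALLBACK_ALARM_DELAY_MS = 30_000;

// When should the backup alarm fire for a review accepted at `acceptedAtMs`?
function fallbackAlarmTime(acceptedAtMs: number): number {
  return acceptedAtMs + FALLBACK_ALARM_DELAY_MS;
}

// What should the alarm handler do, given the status saved in the Durable Object?
function decideAlarmAction(status: ReviewStatus): AlarmAction {
  switch (status) {
    case "queued":
      return "start-review"; // the immediate start was missed or failed early
    case "running":
      return "no-op"; // already started; never start the same review twice
    case "done":
      return "cleanup"; // the alarm keeps its earlier cleanup role
  }
}
```

Keeping the decision separate from the side effects makes the "no double start" rule easy to test without a running Worker.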

The fix adds a backup wake-up path for queued reviews and makes the app check worker state before deciding what to do next. This lets stuck reviews start again without starting the same review twice.

The cancel flow is also safer. If a worker request times out, the app no longer marks active reviews as cancelled unless it knows the review is still queued and has not started a worker session.
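
As a rough sketch of that cancel rule, assuming a review row exposes a status and a nullable session ID (the `ReviewRow` shape and the `canCancelLocally` name are hypothetical, not the real schema or router code):

```typescript
// Hypothetical row shape; field names are assumptions for illustration.
interface ReviewRow {
  status: "pending" | "queued" | "running" | "completed" | "cancelled";
  sessionId: string | null; // non-null once a Worker session has started
}

// After a Worker request times out, only cancel locally when the review is
// known to be queued and no Worker session exists yet.
function canCancelLocally(row: ReviewRow): boolean {
  return row.status === "queued" && row.sessionId === null;
}
```

Under this rule a queued review that already has a session stays queued, which matches the transitions listed under Verification.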

Tests cover the stuck queue case, the backup wake-up path, worker state checks, and safer cancel handling.

Verification

Manual test passed.

Tested:

  • Started local Postgres, Next.js, the code-review Worker, and a local cloud-agent-next-compatible endpoint, then dispatched a code review through the Worker and verified it reached running state.
  • Seeded pending and stale queued review rows, completed the active review through the internal status callback, and verified dispatch recovered the stale queued review.
  • Exercised user-facing tRPC cancellation for queued reviews with no Worker session and with an existing session.

Verified:

  • Worker status returned 404 for missing Durable Object state and 200 with running state for accepted reviews.
  • Database transitions included:
    • stale queued to running, with session IDs and started_at set
    • active running to completed
    • queued with no session, on cancel: moved to cancelled with completed_at set
    • queued with a session, on cancel: remained queued
  • Worker logs showed fallback alarm scheduling and alarm no-op after the review was already running.

Reviewer Notes

The main risk is the boundary between app state and worker state. The change keeps that logic high level: only retry when the worker says no review is active, and only cancel locally when the review has not started yet.
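
One way to picture the retry half of that boundary, assuming the app reduces the Worker status call to three outcomes (the `WorkerProbe` type and `decideRecovery` are hypothetical; treating an unreachable Worker as "leave alone" is an assumption chosen to avoid a double start, not something the PR states):

```typescript
// Hypothetical summary of a Worker status request.
type WorkerProbe =
  | { kind: "not-found" } // e.g. HTTP 404: no Durable Object state
  | { kind: "found"; status: "queued" | "running" | "done" } // e.g. HTTP 200
  | { kind: "unreachable" }; // timeout or network error

type RecoveryDecision = "redispatch" | "leave-alone";

// Only retry when the Worker positively reports that no review is active.
function decideRecovery(probe: WorkerProbe): RecoveryDecision {
  switch (probe.kind) {
    case "not-found":
      return "redispatch";
    case "found":
      return "leave-alone"; // the Worker owns this review, including its backup alarm
    case "unreachable":
      return "leave-alone"; // assumption: unknown state is not treated as idle
  }
}
```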

Comment thread apps/web/src/routers/code-reviews/code-reviews-router.ts
@kilo-code-bot (Bot, Contributor) commented May 5, 2026

Code Review Summary

Status: 1 Issue Found | Recommendation: Address before merge

Overview

| Severity | Count |
| --- | --- |
| CRITICAL | 0 |
| WARNING | 1 |
| SUGGESTION | 0 |

Issue Details

WARNING

| File | Line | Issue |
| --- | --- | --- |
| apps/web/src/routers/code-reviews/code-reviews-router.ts | 404 | Resolved in 0217365c343a45f0269063c15f623407decc2f09; no longer counted. |

Other Observations (not in diff)

Issues found in unchanged code that cannot receive inline comments:

| File | Line | Issue |
| --- | --- | --- |
| apps/web/src/lib/code-reviews/dispatch/dispatch-pending-reviews.ts | 265 | The queued claim does not bump updated_at, but stale recovery uses updated_at; an older pending review can be treated as stale immediately after it is claimed and be re-dispatched. |

Files Reviewed (2 files)
  • apps/web/src/routers/code-reviews-router.test.ts - 0 issues
  • apps/web/src/routers/code-reviews/code-reviews-router.ts - 0 new issues; previous inline issue resolved
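
The updated_at observation above can be reproduced with a small in-memory model. This is only a sketch: the `Review` shape, the 10-minute `STALE_AFTER_MS` threshold, and both function names are assumptions, and the real dispatch code works against database rows rather than plain objects.

```typescript
// Hypothetical in-memory model of a review row.
interface Review {
  id: string;
  status: "pending" | "queued";
  updatedAt: Date;
}

// Assumed staleness threshold; the real value is not given in the review.
const STALE_AFTER_MS = 10 * 60_000;

// Stale recovery keyed on updatedAt, as the observation describes.
function isStale(review: Review, now: Date): boolean {
  return (
    review.status === "queued" &&
    now.getTime() - review.updatedAt.getTime() > STALE_AFTER_MS
  );
}

// Claim a pending review; `bumpUpdatedAt` toggles the behaviour in question.
function claimForDispatch(review: Review, now: Date, bumpUpdatedAt: boolean): Review {
  return {
    ...review,
    status: "queued",
    updatedAt: bumpUpdatedAt ? now : review.updatedAt,
  };
}
```

In this model, a review that sat in pending for an hour is flagged as stale the moment it is claimed unless the claim also refreshes updatedAt.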


Reviewed by gpt-5.5-20260423 · 391,926 tokens

# Conflicts:
#	apps/web/src/lib/code-reviews/dispatch/dispatch-pending-reviews.test.ts
#	apps/web/src/lib/code-reviews/dispatch/dispatch-pending-reviews.ts
@alex-alecu alex-alecu merged commit f70339f into main May 5, 2026
39 checks passed
@alex-alecu alex-alecu deleted the stuck-waiting-for-events branch May 5, 2026 13:54