Closed
11 changes: 11 additions & 0 deletions .changeset/think-prompt-recovery.md
@@ -0,0 +1,11 @@
---
"@cloudflare/think": minor
---

Add DO chat recovery to `step.prompt()` retry loop.

When a prompt wait times out (e.g. the Think Durable Object died during a deploy), the retry loop now checks whether the DO's built-in chat recovery has picked up the interrupted submission before cancelling and re-submitting. If the submission is still `pending` or `running` (recovery in progress) or already `completed`, the workflow re-waits for the original completion event instead of wasting the in-flight turn.

This leverages Think's existing `_recoverSubmissionsOnStart()` and fiber recovery mechanisms, so no new RPC is needed (`inspectSubmission` already exists). The workflow uses a single event type across all retry attempts, so a recovered submission's completion event can reach any retry's `waitForEvent`.

Recovery is only attempted for `ThinkPromptTimeoutError` with `retryOnTimeout` enabled. Non-timeout errors (provider errors, validation failures) still go through the existing cancel + full retry path. If the recovery re-wait also times out, the loop falls through to cancel + full retry (no infinite recovery loop).
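
The decision described above can be sketched as a small pure function. This is only an illustration under stated assumptions, not the package's actual code: the `TimeoutContext` shape, the `RecoveryAction` names, and the `failed`/`cancelled` statuses are hypothetical, and only `pending`, `running`, and `completed` come from the description.

```typescript
// Hypothetical status union; only pending/running/completed are named in the changeset.
type SubmissionStatus = "pending" | "running" | "completed" | "failed" | "cancelled";

// Hypothetical action names for what the retry loop does next.
type RecoveryAction = "rewait" | "cancel-and-retry" | "fail-fast";

// Hypothetical bag of facts the loop would know after an attempt fails.
interface TimeoutContext {
  isTimeoutError: boolean; // the failure was a ThinkPromptTimeoutError
  retryOnTimeout: boolean; // the `retryOnTimeout` option
  alreadyRewaited: boolean; // a recovery re-wait was already tried for this attempt
  status?: SubmissionStatus; // what inspecting the submission reported, if anything
}

function decideAfterFailure(ctx: TimeoutContext): RecoveryAction {
  // Non-timeout errors skip recovery and take the cancel + full retry path.
  if (!ctx.isTimeoutError) return "cancel-and-retry";
  // Timeouts with retryOnTimeout disabled are not retried at all.
  if (!ctx.retryOnTimeout) return "fail-fast";
  // A second timeout after re-waiting falls through, so recovery cannot loop forever.
  if (ctx.alreadyRewaited) return "cancel-and-retry";
  switch (ctx.status) {
    case "pending":
    case "running":
    case "completed":
      // Recovery is in progress or done: wait for the original completion event.
      return "rewait";
    default:
      return "cancel-and-retry";
  }
}
```

Keeping the decision separate from the workflow steps makes the "re-wait at most once" rule easy to check in isolation.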
11 changes: 11 additions & 0 deletions .changeset/think-step-prompt-retries.md
@@ -0,0 +1,11 @@
---
"@cloudflare/think": patch
---

Add optional retries to `ThinkWorkflow.step.prompt()`.

`step.prompt()` now accepts a `retries` option with `{ maxAttempts?, baseDelayMs?, maxDelayMs?, retryOnTimeout? }`. When a prompt fails for any reason, the workflow waits with jittered exponential backoff and submits a fresh prompt attempt, mirroring the default behavior of `step.do()` retries. Failures are retried until `maxAttempts` total attempts have been made; the first attempt counts toward that limit. Set `retryOnTimeout: false` to fail fast on a wait timeout instead of retrying, since timeouts often repeat.
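
As a rough illustration of jittered exponential backoff with these options, the sketch below uses the "full jitter" variant. The exact formula and the default values used by `@cloudflare/think` are not stated here, so both are assumptions, as is the helper name `backoffDelayMs`.

```typescript
// Option names are taken from the description above.
interface RetryOptions {
  maxAttempts?: number;
  baseDelayMs?: number;
  maxDelayMs?: number;
  retryOnTimeout?: boolean;
}

// Hypothetical helper: delay to wait after failed attempt number `attempt` (1-based).
// `random` is injectable so the result can be made deterministic in tests.
function backoffDelayMs(
  attempt: number,
  opts: RetryOptions = {},
  random: () => number = Math.random,
): number {
  const base = opts.baseDelayMs ?? 1_000; // assumed default, not from the source
  const max = opts.maxDelayMs ?? 30_000; // assumed default, not from the source
  // The ceiling doubles with each attempt and is capped at maxDelayMs.
  const ceiling = Math.min(max, base * 2 ** (attempt - 1));
  // Full jitter: pick a point between 0 and the ceiling.
  return Math.floor(random() * ceiling);
}
```

Spreading each delay randomly below the ceiling keeps many workflows that failed at the same moment from retrying in lockstep.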

Retry state is durable: each retry uses unique workflow step names and idempotency keys, so retries survive workflow hibernation and replays. The first attempt keeps the original (`:submit`/`:wait`) step names so in-flight workflows from earlier versions continue to replay without re-executing completed steps.
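
One way to picture the naming rule is the sketch below. Only the `:submit`/`:wait` suffixes for the first attempt come from the description; the `:attempt-N` infix for later attempts and the helper name `stepNames` are hypothetical.

```typescript
interface AttemptStepNames {
  submit: string;
  wait: string;
}

// Hypothetical helper: derive step names for a given 1-based attempt number.
function stepNames(promptName: string, attempt: number): AttemptStepNames {
  // Attempt 1 keeps the original names so workflows started before retries
  // existed replay their completed steps without re-executing them.
  const prefix = attempt === 1 ? promptName : `${promptName}:attempt-${attempt}`;
  return { submit: `${prefix}:submit`, wait: `${prefix}:wait` };
}
```

For example, `stepNames("summarize", 1)` yields `summarize:submit` and `summarize:wait`, while attempt 2 gets distinct names that cannot collide with cached results from attempt 1.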

Before retrying, the workflow cancels the abandoned attempt's submission. Think keeps its own `chatRecovery` running for the submission (which preserves in-flight turn state across DO restarts/stalls). Without this cancellation, a lingering turn or recovery continuation for the old attempt could keep running and race the fresh attempt on the same session, producing duplicate or interleaved output. Each retry is also logged via `console.warn` with the step name, attempt, backoff delay, and error.