cloudflare · thomasgauvin · Jun 18, 2026
diff --git a/.changeset/think-prompt-recovery.md b/.changeset/think-prompt-recovery.md
@@ -0,0 +1,13 @@
+---
+"@cloudflare/think": minor
+---
+
+Add DO chat recovery to the `step.prompt()` retry loop.
+
+When a prompt wait times out (e.g. the Think Durable Object was restarting during a deploy), the retry loop first tries to recover the in-flight submission via the DO's built-in chat recovery before discarding it and resubmitting. It inspects the submission and, if it is still `pending`/`running` (recovery in progress) or already `completed`, re-waits for the original completion event — reusing the in-flight turn instead of wasting it.
+
+Recovery is resilient to the DO being temporarily unreachable: while `inspectSubmission` fails (for example, because the DO is still coming back up after a deploy), the submission is treated as still recovering rather than dead, so the loop backs off and re-checks instead of abandoning the durable submission. Recovery runs for a bounded number of rounds; if it cannot recover within that budget, it falls through to the cancel + fresh-resubmit path. Recovery failures never throw out of `step.prompt()`: a recovery-wait timeout, a terminal failure of the recovered turn, or invalid recovered output all fall through to a fresh retry.
+
+Each retry attempt uses a distinct event type derived from its key, so a delivered workflow event maps 1:1 to the submission that produced it and no event can be misattributed across attempts. The DO re-emits an interrupted submission's completion event with that same type, which the recovery wait listens on.
+
+Recovery is only attempted for `ThinkPromptTimeoutError` with `retryOnTimeout` enabled. Non-timeout errors (provider errors, validation failures) still go through the cancel + full-retry path. This leverages Think's existing submission recovery and fiber mechanisms — no new RPC is needed (`inspectSubmission` already exists).
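A minimal TypeScript sketch of the recovery decision described in this changeset, assuming one `inspectSubmission` probe per recovery round. The names `InspectResult`, `nextRecoveryAction`, and `recoveryEventType`, the `maxRounds` budget, the `think-prompt-` event prefix, and the `errored`/`cancelled` status names are hypothetical; only `pending`, `running`, and `completed` come from the text above. This is not Think's implementation.

```typescript
// Hypothetical sketch of the recovery decision in the step.prompt() retry loop.
// Only pending/running/completed come from the changeset; the rest is assumed.
type SubmissionStatus = "pending" | "running" | "completed" | "errored" | "cancelled";

// Result of one inspectSubmission probe: a status, or an RPC failure
// (for example, the Durable Object is still restarting after a deploy).
type InspectResult =
  | { kind: "status"; status: SubmissionStatus }
  | { kind: "unreachable"; error: unknown };

type RecoveryAction =
  | "rewait"    // wait again for the original completion event
  | "recheck"   // back off, then inspect again
  | "resubmit"; // cancel the old submission and start a fresh attempt

function nextRecoveryAction(
  result: InspectResult,
  round: number,    // 0-based recovery round
  maxRounds: number // bounded recovery budget
): RecoveryAction {
  // Once the budget is spent, fall through to cancel + fresh resubmit.
  if (round >= maxRounds) return "resubmit";
  if (result.kind === "unreachable") {
    // An unreachable DO means "still recovering", not a dead submission.
    return "recheck";
  }
  switch (result.status) {
    case "pending":
    case "running":
    case "completed":
      // Recovery in progress or already done: reuse the in-flight turn.
      return "rewait";
    default:
      // Terminal failure: recovery cannot help.
      return "resubmit";
  }
}

// One event type per attempt key, so a delivered event maps 1:1 to the
// submission that produced it. The prefix is an assumption.
function recoveryEventType(attemptKey: string): string {
  return `think-prompt-${attemptKey}`;
}
```

In this sketch a `recheck` would be followed by a backoff before the next probe, and a `rewait` that itself times out would be handled as a fresh-retry signal, matching the fall-through behavior the changeset describes.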
diff --git a/.changeset/think-step-prompt-retries.md b/.changeset/think-step-prompt-retries.md
@@ -0,0 +1,11 @@
+---
+"@cloudflare/think": patch
+---
+
+Add optional retries to `ThinkWorkflow.step.prompt()`.
+
+`step.prompt()` now accepts a `retries` option with `{ maxAttempts?, baseDelayMs?, maxDelayMs?, retryOnTimeout? }`. When a prompt attempt fails, the workflow waits with jittered exponential backoff and then submits a fresh attempt, mirroring the default behavior of `step.do()` retries. Failures are retried until `maxAttempts` total attempts (including the first) have been made. Set `retryOnTimeout: false` to fail fast on a wait timeout instead of retrying, since timeouts often repeat.
+
+Retry state is durable: each retry uses unique workflow step names and idempotency keys, so retries survive workflow hibernation and replays. The first attempt keeps the original (`:submit`/`:wait`) step names so in-flight workflows from earlier versions continue to replay without re-executing completed steps.
+
+Before retrying, the workflow cancels the abandoned attempt's submission. Think runs its own `chatRecovery` for each submission to preserve in-flight turn state across DO restarts and stalls; without the cancel, a lingering turn or recovery continuation from the old attempt could keep running and race the fresh attempt on the same session, producing duplicate or interleaved output. Each retry is also logged via `console.warn` with the step name, attempt number, backoff delay, and error.
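The retry loop from this changeset can be sketched as follows, under stated assumptions: the jitter comes from an FNV-1a hash of the step name and retry number so a replay recomputes the same delay (the docs describe the jitter as deterministic, but this formula, scaling the capped exponential delay into 50 to 100% of itself, is an assumption), the `:attempt-N` step-name suffix is hypothetical, and `PromptTimeoutError` stands in for `ThinkPromptTimeoutError`. The timeout-recovery step is omitted, and the injected `sleep` stands in for a durable workflow sleep.

```typescript
// Hypothetical sketch of the step.prompt() retry loop; not Think's code.

// Stand-in for ThinkPromptTimeoutError.
class PromptTimeoutError extends Error {}

interface StepNames {
  submit: string;
  wait: string;
}

// FNV-1a (32-bit) hash: a deterministic jitter source, so a replayed
// workflow recomputes the same delay instead of calling Math.random().
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0;
}

// Delay before retry `retry` (1 = after the first failure): the capped
// exponential delay, scaled into [50%, 100%] by the deterministic jitter.
function retryDelayMs(stepName: string, retry: number, baseDelayMs = 500, maxDelayMs = 5000): number {
  const capped = Math.min(maxDelayMs, baseDelayMs * 2 ** (retry - 1));
  const fraction = fnv1a(`${stepName}:${retry}`) / 4294967296; // in [0, 1)
  return Math.round(capped * (0.5 + 0.5 * fraction));
}

// Attempt 1 keeps the original names so older in-flight workflows replay
// cleanly; later attempts get unique names (suffix format is hypothetical).
function stepNamesForAttempt(name: string, attempt: number): StepNames {
  if (attempt === 1) return { submit: `${name}:submit`, wait: `${name}:wait` };
  return { submit: `${name}:attempt-${attempt}:submit`, wait: `${name}:attempt-${attempt}:wait` };
}

interface RetryOptions {
  maxAttempts: number;
  retryOnTimeout: boolean;
  baseDelayMs?: number;
  maxDelayMs?: number;
}

async function promptWithRetries<T>(
  name: string,
  opts: RetryOptions,
  runAttempt: (names: StepNames) => Promise<T>,       // submit + wait
  cancelAttempt: (names: StepNames) => Promise<void>, // cancel the submission
  sleep: (ms: number) => Promise<void>                // durable sleep in a real workflow
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    const names = stepNamesForAttempt(name, attempt);
    try {
      return await runAttempt(names);
    } catch (err) {
      const lastAttempt = attempt >= opts.maxAttempts;
      const failFast = err instanceof PromptTimeoutError && !opts.retryOnTimeout;
      if (lastAttempt || failFast) throw err;
      // Cancel the abandoned submission so it cannot race the fresh attempt.
      await cancelAttempt(names);
      const delay = retryDelayMs(name, attempt, opts.baseDelayMs, opts.maxDelayMs);
      console.warn("step.prompt retry", { name, attempt, delay, err });
      await sleep(delay);
    }
  }
}
```

Cancelling before the backoff, rather than after it, means the old submission has no chance to keep producing output while the workflow sleeps, which is the race the changeset is guarding against.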
diff --git a/docs/think/workflows.md b/docs/think/workflows.md
@@ -178,6 +178,43 @@ and throws `ThinkPromptTimeoutError`.
 Set `cancelOnTimeout: false` when you intentionally want the Think submission to
 continue after the Workflow stops waiting.
 
+## Retries
+
+Pass `retries` to retry a failed `step.prompt()` attempt:
+
+```typescript
+await step.prompt("summarize-file", {
+  prompt: "Summarize the file",
+  output: summarySchema,
+  timeout: "5 minutes",
+  retries: {
+    maxAttempts: 3,
+    baseDelayMs: 500,
+    maxDelayMs: 5000,
+    retryOnTimeout: true
+  }
+});
+```
+
+| Option           | Default | Description                                                        |
+| ---------------- | ------- | ------------------------------------------------------------------ |
+| `maxAttempts`    | `1`     | Total attempts, including the first. `1` disables retries.         |
+| `baseDelayMs`    | `500`   | Base delay (ms) for exponential backoff with deterministic jitter. |
+| `maxDelayMs`     | `5000`  | Upper bound (ms) on any single retry delay.                        |
+| `retryOnTimeout` | `true`  | Whether `ThinkPromptTimeoutError` is retried.                      |
+
+When retries are enabled and a wait times out (with `retryOnTimeout` left at
+`true`), `step.prompt()` first gives the Think Durable Object a chance to recover
+the in-flight submission. This helps when the Durable Object is restarting after
+a deploy: the Workflow waits for the original completion event instead of
+immediately discarding the turn and submitting a duplicate prompt. If recovery
+cannot complete within its bounded retry window, the Workflow cancels the
+abandoned submission and submits a fresh attempt.
+
+Set `retryOnTimeout: false` to fail fast on `ThinkPromptTimeoutError`. With
+multiple attempts enabled, `cancelOnTimeout` only applies to the final timed-out
+attempt; abandoned attempts are cancelled before a fresh retry starts.
+
 ## Boundary With Other Primitives
 
 Use `getScheduledTasks()` for recurring prompt submissions or deterministic

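To see how `baseDelayMs` and `maxDelayMs` in the retries table interact, here is a sketch of the pre-jitter delay ceilings, assuming the common doubling scheme `min(maxDelayMs, baseDelayMs * 2^(n - 1))` for retry `n`. The docs do not spell out the exact formula, and `backoffCeilings` is a hypothetical helper, not part of Think.

```typescript
// Hypothetical helper: pre-jitter delay ceilings (ms) for each retry,
// assuming the delay doubles per retry and is capped at maxDelayMs.
function backoffCeilings(maxAttempts: number, baseDelayMs = 500, maxDelayMs = 5000): number[] {
  const ceilings: number[] = [];
  // maxAttempts includes the first attempt, so there are maxAttempts - 1 retries.
  for (let retry = 1; retry < maxAttempts; retry++) {
    ceilings.push(Math.min(maxDelayMs, baseDelayMs * 2 ** (retry - 1)));
  }
  return ceilings;
}

const threeAttempts = backoffCeilings(3); // [500, 1000]
const sixAttempts = backoffCeilings(6);   // [500, 1000, 2000, 4000, 5000]
const noRetries = backoffCeilings(1);     // []
```

Under this assumption, the docs example (`maxAttempts: 3` with the default delays) backs off at most twice, with ceilings of 500 ms and 1000 ms, and `maxDelayMs` only starts to bind from the fifth retry onward.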
diff --git a/packages/think/src/tests/think-session.test.ts b/packages/think/src/tests/think-session.test.ts
@@ -3862,6 +3862,23 @@ describe("Think — onChatRecovery", () => {
     expect(incident?.status).toBe("scheduled");
   });
 
+  it("does not mark a running submission errored while a recovered retry is scheduled", async () => {
+    const agent = await freshRecoveryAgent(
+      `recover-retry-sweep-${crypto.randomUUID()}`
+    );
+    await agent.seedRunningSubmissionForTest("root-RS");
+    await agent.preScheduleRecoveryRetryForTest({
+      recoveredRequestId: "root-RS",
+      targetUserId: "user-RS",
+      incidentId: "inc-RS",
+      originalRequestId: "root-RS"
+    });
+
+    await agent.recoverSubmissionsOnStartForTest();
+
+    expect(await agent.getSubmissionStatusForTest("root-RS")).toBe("running");
+  });
+
   it("exhausts via onExhausted once the stable-state continue budget is spent", async () => {
     const agent = await freshRecoveryAgent(
       `stable-exhaust-${crypto.randomUUID()}`