cloudflare · threepointone · Jun 20, 2026 · Jun 17, 2026
diff --git a/.changeset/ai-chat-give-up-broadcast-first.md b/.changeset/ai-chat-give-up-broadcast-first.md
@@ -0,0 +1,19 @@
+---
+"@cloudflare/ai-chat": patch
+---
+
+`AIChatAgent` now delivers the terminal banner **before** persisting the durable
+terminal record when chat recovery gives up, converging onto
+`@cloudflare/think`'s broadcast-first ordering.
+
+Previously `_exhaustChatRecovery` persisted the durable terminal record first
+and broadcast the banner second. A terminal-record write can reject in the
+deploy/storage window a give-up runs in (#1730); under persist-first the throw
+propagated before the banner was sent, so the live banner was dropped on that
+pass and only delivered on the healthy re-run (potentially a different isolate,
+after the affected connections had gone). Broadcasting first makes the banner
+resilient to a failing storage write: the throw still propagates and the whole
+give-up re-runs on a healthy isolate, which persists the record idempotently and
+re-delivers the banner (the documented at-least-once edge). Persisting first
+gained no durability — the re-run persists either way — while losing this banner
+resilience, so both chat hosts now terminalize broadcast-first.
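The ordering argument in this changeset can be illustrated with a minimal sketch. It is not the `_exhaustChatRecovery` implementation: `GiveUpDeps` and `giveUpBroadcastFirst` are hypothetical names, and the real path is asynchronous, which this sketch flattens to synchronous calls for brevity.

```ts
// Hypothetical sketch: none of these names come from @cloudflare/ai-chat.
type GiveUpDeps = {
  // Sends the terminal banner to live connections.
  broadcastBanner: (message: string) => void;
  // Writes the durable terminal record; may throw in a bad storage window.
  persistTerminalRecord: (message: string) => void;
};

function giveUpBroadcastFirst(deps: GiveUpDeps, message: string): void {
  // 1. Broadcast first, so live clients see the banner even if the write fails.
  deps.broadcastBanner(message);
  // 2. Persist second. A throw still propagates to the caller, so the whole
  //    give-up can re-run later and persist the record then.
  deps.persistTerminalRecord(message);
}
```

Under the opposite order, a throwing write would skip the broadcast on that pass, which is the dropped-banner case the changeset describes.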
diff --git a/.changeset/chat-auto-continuation-event-driven-barrier.md b/.changeset/chat-auto-continuation-event-driven-barrier.md
@@ -0,0 +1,27 @@
+---
+"@cloudflare/ai-chat": minor
+---
+
+`AIChatAgent` now uses an event-driven auto-continuation barrier that parks
+indefinitely on an incomplete parallel tool batch instead of force-continuing
+after a fixed timeout.
+
+Previously, when a turn ended with several parallel client tool calls and only
+some results had arrived, `AIChatAgent` ran the completeness barrier _inside_
+the continuation turn and polled for up to 60s
+(`AUTO_CONTINUATION_PENDING_TOOL_TIMEOUT_MS`), after which it continued
+inference against whatever results had landed — potentially a half-complete tool
+batch. The barrier is now event-driven and runs _before_ the continuation is
+enqueued (converging onto `@cloudflare/think`'s model): it fires only once every
+result in the batch has arrived, re-arms as each sibling result is applied and
+when a streaming turn finalizes, guards against double-fire, and is gated on no
+active stream. There is **no orphan timeout** — a batch with a never-arriving
+sibling now parks budget-free until it completes (the same way a turn already
+parks on a pending HITL/client interaction) rather than force-continuing with
+missing results.
+
+This is a behavior change for the rare stuck-tool case: a result that never
+arrives no longer triggers a continuation after 60s; it parks until the missing
+result lands (or a later user turn / chat recovery repairs the transcript). A
+parked continuation leaves the same on-disk signature as a HITL park, so a
+deploy/crash mid-park recovers by re-arming rather than terminalizing.
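A rough sketch of the event-driven barrier described above, under stated assumptions: `ToolBatchBarrier` and its method names are hypothetical, and the real barrier also has to survive restarts, which an in-memory class like this does not attempt.

```ts
// Hypothetical sketch of an event-driven completeness barrier.
class ToolBatchBarrier {
  private readonly pending: Set<string>;
  private readonly onComplete: () => void;
  private fired = false;
  private streamActive = false;

  constructor(toolCallIds: string[], onComplete: () => void) {
    this.pending = new Set(toolCallIds);
    this.onComplete = onComplete;
  }

  // Called as each sibling tool result is applied; re-checks the barrier.
  applyResult(toolCallId: string): void {
    this.pending.delete(toolCallId);
    this.maybeFire();
  }

  // Called when a stream starts or finalizes; finalizing re-checks the barrier.
  setStreamActive(active: boolean): void {
    this.streamActive = active;
    if (!active) this.maybeFire();
  }

  private maybeFire(): void {
    // Guard against double-fire, gate on no active stream, and require the
    // whole batch. There is no timer: an incomplete batch simply stays parked.
    if (this.fired || this.streamActive || this.pending.size > 0) return;
    this.fired = true;
    this.onComplete();
  }
}
```

Because nothing in the sketch schedules a timeout, a batch with a never-arriving sibling never invokes `onComplete`, mirroring the "no orphan timeout" behavior.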
diff --git a/.changeset/chat-recovering-on-connect.md b/.changeset/chat-recovering-on-connect.md
@@ -0,0 +1,16 @@
+---
+"@cloudflare/ai-chat": minor
+---
+
+`AIChatAgent` now replays the live "recovering…" status on connect (#1620).
+
+Previously the `cf_agent_chat_recovering` frame was only broadcast live, so a
+client that connected (or reconnected) while a durable turn was mid-recovery —
+between a scheduled continuation and its first chunk — saw nothing and appeared
+frozen until the turn resumed or failed. It now receives the recovering status
+directly on connect (when no stream is active to resume), so `useAgentChat`'s
+`isRecovering` reflects the in-progress recovery immediately. This converges
+`AIChatAgent` onto `@cloudflare/think`'s behavior. The status is still cleared on
+completion, exhaustion, or any terminal outcome, and stale records (older than
+the recovering-flag TTL) are skipped so a recovery abandoned without a terminal
+outcome cannot show "recovering…" forever.
diff --git a/.changeset/chat-recovery-progress-credit-convergence.md b/.changeset/chat-recovery-progress-credit-convergence.md
@@ -0,0 +1,9 @@
+---
+"agents": patch
+"@cloudflare/ai-chat": patch
+"@cloudflare/think": patch
+---
+
+Converge recovery forward-progress crediting between `AIChatAgent` and `Think`.
+
+Both hosts now credit the recovery no-progress counter through one shared, host-agnostic rule (`shouldCreditStreamProgress`): a progress milestone (a started text/reasoning segment or a settled tool input/output) credits unconditionally, and mid-segment streaming deltas (`text-delta`/`reasoning-delta`/`tool-input-delta`) credit at most once per throttle window via a per-isolate `StreamProgressCreditThrottle`. Previously `AIChatAgent` credited only on chunk-type milestones while `Think` credited on its flush cadence, so a long single content segment spanning repeated crashes could read as "no progress" under `AIChatAgent` and false-fire its `no_progress_timeout`. The new rule is never coarser than either host's prior cadence, so it can only delay or avoid a false no-progress timeout, never hasten give-up.
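The crediting rule can be sketched as follows. This is an illustration only: the delta chunk types are taken from the changeset text, while the milestone chunk types, the `CreditThrottle` class, the `shouldCredit` signature, and the explicit `now` parameter are assumptions and not the real `shouldCreditStreamProgress` / `StreamProgressCreditThrottle` API.

```ts
// Assumed milestone chunk types (started segments, settled tool input/output).
const MILESTONE_TYPES = new Set([
  "text-start",
  "reasoning-start",
  "tool-input-available",
  "tool-output-available",
]);
// Delta chunk types named in the changeset.
const DELTA_TYPES = new Set(["text-delta", "reasoning-delta", "tool-input-delta"]);

// Hypothetical per-isolate throttle: at most one credit per window.
class CreditThrottle {
  private lastCreditAt = Number.NEGATIVE_INFINITY;
  private readonly windowMs: number;

  constructor(windowMs: number) {
    this.windowMs = windowMs;
  }

  tryAcquire(now: number): boolean {
    if (now - this.lastCreditAt < this.windowMs) return false;
    this.lastCreditAt = now;
    return true;
  }
}

function shouldCredit(chunkType: string, throttle: CreditThrottle, now: number): boolean {
  if (MILESTONE_TYPES.has(chunkType)) return true; // milestones credit unconditionally
  if (DELTA_TYPES.has(chunkType)) return throttle.tryAcquire(now); // deltas are throttled
  return false; // other chunk types do not count as progress in this sketch
}
```

With a throttle like this, a long single text segment keeps earning credit once per window, so it no longer reads as "no progress" between milestones.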
diff --git a/.changeset/chat-row-size-structured-compaction.md b/.changeset/chat-row-size-structured-compaction.md
@@ -0,0 +1,19 @@
+---
+"@cloudflare/ai-chat": minor
+---
+
+`AIChatAgent` now compacts oversized tool outputs structurally instead of
+replacing them with a flat summary string.
+
+Previously, when a persisted assistant message exceeded the SQLite row-size
+limit, `AIChatAgent` replaced each large tool output with a single English
+summary string (`"This tool output was too large to persist… Preview: …"`),
+discarding the original shape. It now uses the shared shape-preserving
+`truncateToolOutput` compactor (the same one `@cloudflare/think` already used):
+objects and arrays keep their structure, long strings are truncated in place
+with a `... [truncated N chars]` marker, and only genuinely unrepresentable
+nesting collapses to a marker object. This makes a compacted tool result far
+easier for the model to keep reasoning about, and converges `AIChatAgent` and
+`@cloudflare/think` onto one row-size compaction path. The
+`metadata.compactedToolOutputs` / `metadata.compactedTextParts` annotations and
+the compaction `console.warn`s are unchanged.
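To make "shape-preserving" concrete, here is a minimal sketch of the idea. It is not the shared `truncateToolOutput` compactor: the function name, the depth limit, the `{ truncated: true }` marker object, and the reading of `N` as the number of dropped characters are all assumptions.

```ts
// Hypothetical sketch of shape-preserving truncation.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

function truncateStrings(value: Json, maxChars: number, maxDepth = 8): Json {
  if (typeof value === "string") {
    if (value.length <= maxChars) return value;
    // Truncate in place and record how many characters were dropped (assumed meaning of N).
    const dropped = value.length - maxChars;
    return `${value.slice(0, maxChars)}... [truncated ${dropped} chars]`;
  }
  // Numbers, booleans, and null pass through unchanged.
  if (value === null || typeof value !== "object") return value;
  // Nesting deeper than the budget collapses to a marker object (assumed shape).
  if (maxDepth <= 0) return { truncated: true };
  if (Array.isArray(value)) {
    return value.map((item) => truncateStrings(item, maxChars, maxDepth - 1));
  }
  const out: { [key: string]: Json } = {};
  for (const [key, item] of Object.entries(value)) {
    out[key] = truncateStrings(item, maxChars, maxDepth - 1);
  }
  return out;
}
```

The point of the design is that keys and array positions survive, so a model reading the compacted result can still find the field it cares about, which a flat summary string does not allow.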
diff --git a/.changeset/chat-stream-stall-watchdog.md b/.changeset/chat-stream-stall-watchdog.md
@@ -0,0 +1,24 @@
+---
+"@cloudflare/ai-chat": minor
+---
+
+`AIChatAgent` can now detect and recover from a hung model/transport stream via
+the opt-in `chatStreamStallTimeoutMs` watchdog (#1626).
+
+Set `chatStreamStallTimeoutMs` (a class field, like `chatRecovery`) to the
+maximum number of milliseconds allowed between stream chunks. If a turn parks
+longer than that — a hung provider or a stalled transport — the watchdog aborts
+the live stream instead of leaving the turn spinning forever. When `chatRecovery`
+is enabled, the stall is routed into the same bounded-recovery machinery a
+deploy/eviction interruption uses: the partial generated so far is persisted and
+a continuation is scheduled (or, once the recovery budget is spent, the
+configured terminal message is delivered). With `chatRecovery` disabled, a stall
+surfaces as a terminal stream error so the spinner is cleared.
+
+The default is `0`, which disables the watchdog (no behavior change unless you
+opt in), matching `@cloudflare/think`. Because the watchdog measures the gap
+between chunks — not total turn duration — a steadily streaming turn never trips
+it regardless of overall length. Internally this is built on the shared
+`iterateWithStallWatchdog` primitive both `@cloudflare/ai-chat` and
+`@cloudflare/think` consume (an internal `agents/chat` seam, not a public API),
+so this change ships under the `@cloudflare/ai-chat` bump alone.
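The inter-chunk watchdog idea can be sketched generically as below. This is not the internal `iterateWithStallWatchdog` primitive, whose signature is not shown in this diff: `nextOrStall` and `withStallWatchdog` are hypothetical names, and the error message is invented for illustration.

```ts
// Hypothetical sketch: race each next() against a timer that restarts per chunk.
async function nextOrStall<T>(
  iterator: AsyncIterator<T>,
  stallMs: number
): Promise<IteratorResult<T>> {
  if (stallMs <= 0) return iterator.next(); // 0 disables the watchdog
  let timer: ReturnType<typeof setTimeout> | undefined;
  const stall = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => reject(new Error(`no chunk for ${stallMs}ms`)), stallMs);
  });
  try {
    return await Promise.race([iterator.next(), stall]);
  } finally {
    clearTimeout(timer); // the gap is measured per chunk, not per turn
  }
}

async function* withStallWatchdog<T>(
  source: AsyncIterable<T>,
  stallMs: number
): AsyncGenerator<T> {
  const iterator = source[Symbol.asyncIterator]();
  try {
    while (true) {
      const result = await nextOrStall(iterator, stallMs);
      if (result.done) return;
      yield result.value;
    }
  } finally {
    // Best-effort cleanup; not awaited, because a hung source may never settle.
    void iterator.return?.();
  }
}
```

Since the timer is re-created for every chunk, a stream that keeps producing chunks never trips it regardless of total duration, matching the behavior described above.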
diff --git a/.changeset/orphan-persist-reconcile-helper.md b/.changeset/orphan-persist-reconcile-helper.md
@@ -0,0 +1,18 @@
+---
+"agents": patch
+---
+
+Export `reconcileOrphanPartial` from `agents/chat`.
+
+This is the shared primitive that merges a freshly-reconstructed orphaned stream
+partial onto an assistant message that already owns its target id (an early
+persist at tool-approval time, or a continuation resuming the prior assistant
+message). It keeps all existing parts, appends only reconstructed parts whose
+`toolCallId` is not already present (so a recovery replay never duplicates a
+tool call), and overlays incoming metadata onto existing — preserving an
+in-place tool result that lives only in storage rather than letting a replayed
+chunk re-advance it. `@cloudflare/ai-chat`'s orphan-persist path now uses it;
+hosts whose orphan persist only runs at stream finalize don't need it (the
+shared reconstruction is already idempotent by `toolCallId`).
+
+Additive export only — no behavior change to existing APIs.
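The merge rule described in this changeset might look roughly like the sketch below. It does not reproduce the exported `reconcileOrphanPartial` signature, which this diff does not show: `Part`, `Message`, and `reconcilePartial` are simplified, hypothetical stand-ins, and the sketch assumes reconstructed parts without a `toolCallId` are not appended.

```ts
// Hypothetical, simplified message shape (not the AI SDK UIMessage type).
type Part = { type: string; toolCallId?: string; text?: string };
type Message = { id: string; parts: Part[]; metadata?: Record<string, unknown> };

function reconcilePartial(existing: Message, reconstructed: Message): Message {
  // Collect the tool calls the stored message already owns.
  const known = new Set<string>();
  for (const part of existing.parts) {
    if (part.toolCallId !== undefined) known.add(part.toolCallId);
  }
  // Append only reconstructed tool parts that are not already present, so a
  // recovery replay never duplicates a tool call or re-advances a stored result.
  const appended = reconstructed.parts.filter(
    (part) => part.toolCallId !== undefined && !known.has(part.toolCallId)
  );
  return {
    ...existing,
    parts: [...existing.parts, ...appended],
    // Incoming metadata is overlaid onto the existing metadata.
    metadata: { ...existing.metadata, ...reconstructed.metadata },
  };
}
```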
diff --git a/.changeset/orphan-persist-store-interface.md b/.changeset/orphan-persist-store-interface.md
@@ -0,0 +1,17 @@
+---
+"agents": patch
+---
+
+Export the `OrphanPersistStore<M = UIMessage>` type from `agents/chat`.
+
+This is the minimal message store the chat-recovery orphan-persist write goes
+through — the write subset of `SessionProvider` (the `getMessage`,
+`appendMessage`, and `updateMessage` methods). It is parameterized over the
+host's message type so the seam itself is not AI-SDK-specific: the AI-SDK chat
+hosts (`@cloudflare/ai-chat`, `@cloudflare/think`) instantiate it at the
+`UIMessage` default, while `SessionProvider` satisfies it at its own
+`SessionMessage`. Both hosts now route their orphan-persist write through a host
+adapter typed against this interface, turning the previous by-convention
+alignment into a type-enforced contract.
+
+Additive export only — no behavior change to existing APIs.
diff --git a/.changeset/recovery-stream-resolution-newest-row.md b/.changeset/recovery-stream-resolution-newest-row.md
@@ -0,0 +1,15 @@
+---
+"@cloudflare/ai-chat": patch
+"@cloudflare/think": patch
+---
+
+Recovery give-up now resolves the orphaned stream by newest metadata row.
+
+The stable-timeout/error give-up path that terminalizes an exhausted recovery
+turn previously resolved the turn's orphaned stream id with an in-memory
+first-match scan over all stream metadata, while the wake (restart) path already
+used the newest durable row keyed by the recovery-root request id. These two
+lookups are now a single seam, so both paths surface the same partial — the
+newest stream the turn produced — when a request id spans more than one
+recovery attempt. Single-attempt turns (one stream row per request id) are
+unaffected.
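The difference between a first-match scan and newest-row resolution can be shown with a small sketch. The row shape (`streamId`, `requestId`, `createdAt`) and the function name are assumptions for illustration; the real lookup reads durable stream metadata rather than an in-memory array.

```ts
// Hypothetical row shape for stream metadata.
type StreamRow = { streamId: string; requestId: string; createdAt: number };

// Resolve the newest stream a request id produced, or undefined if none exists.
function resolveNewestStream(rows: StreamRow[], requestId: string): StreamRow | undefined {
  let newest: StreamRow | undefined;
  for (const row of rows) {
    if (row.requestId !== requestId) continue;
    if (newest === undefined || row.createdAt > newest.createdAt) newest = row;
  }
  return newest;
}
```

When a request id spans several recovery attempts, a first-match scan can return the oldest stream while this lookup returns the latest partial; with one row per request id the two agree.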
diff --git a/.changeset/think-row-size-compaction-annotations.md b/.changeset/think-row-size-compaction-annotations.md
@@ -0,0 +1,16 @@
+---
+"@cloudflare/think": minor
+---
+
+`Think` now annotates and logs row-size compaction the same way
+`@cloudflare/ai-chat` does.
+
+When a persisted message exceeds the SQLite row-size limit and `Think` compacts
+its tool outputs or truncates its text parts to fit, the resulting message now
+carries `metadata.compactedToolOutputs` (the compacted tool-call IDs) and/or
+`metadata.compactedTextParts` (the truncated text-part indices), and `Think`
+emits a `console.warn` describing the compaction. The compaction itself is
+unchanged — `Think` already used the shared shape-preserving `truncateToolOutput`
+compactor — this only adds the previously ai-chat-only annotations/warnings so a
+client can tell that a stored message was compacted. Both packages now share one
+`enforceRowSizeLimit` implementation.
diff --git a/.github/workflows/nightly.yml b/.github/workflows/nightly.yml
@@ -6,6 +6,11 @@ on:
     - cron: "0 0 * * *"
   # Allow manual trigger for debugging
   workflow_dispatch:
+    inputs:
+      run_deployed:
+        description: "Also run the DEPLOYED recovery suites (real, billable Workers)"
+        type: boolean
+        default: false
 
 concurrency:
   group: ${{ github.workflow }}
@@ -66,3 +71,137 @@ jobs:
           CLOUDFLARE_API_TOKEN: ${{ secrets.CLOUDFLARE_API_TOKEN }}
         run: pnpm run test:e2e
         working-directory: packages/think
+
+  e2e-ai-chat-recovery:
+    name: "E2E: ai-chat (SIGKILL recovery)"
+    timeout-minutes: 30
+    runs-on: ubuntu-24.04
+    steps:
+      - uses: actions/checkout@v6
+        with:
+          fetch-depth: 1
+
+      - uses: ./.github/actions/install
+
+      - run: pnpm run build
+
+      # Real `wrangler dev` + mid-stream SIGKILL suite that proves AIChatAgent's
+      # convergence onto the shared `agents/chat` recovery engine survives a real
+      # isolate crash. Distinct from the `e2e-ai-chat` Playwright job above, which
+      # exercises the client-side resume protocol (disconnect/reconnect), not
+      # process death. (Excludes the env-gated `deployed-recovery` suite.)
+      - name: Run ai-chat SIGKILL recovery e2e tests
+        env:
+          CLOUDFLARE_ACCOUNT_ID: ${{ secrets.CLOUDFLARE_ACCOUNT_ID }}
+          CLOUDFLARE_API_TOKEN: ${{ secrets.CLOUDFLARE_API_TOKEN }}
+        run: pnpm run test:e2e
+        working-directory: packages/ai-chat
+
+  e2e-agents:
+    name: "E2E: agents (fiber eviction)"
+    timeout-minutes: 30
+    runs-on: ubuntu-24.04
+    steps:
+      - uses: actions/checkout@v6
+        with:
+          fetch-depth: 1
+
+      - uses: ./.github/actions/install
+
+      - run: pnpm run build
+
+      # Core `runFiber` SIGKILL recovery primitives (root/sub-agent, concurrency,
+      # scan-deadline yield, poison-row aging/backoff, facet multipass). Excluded
+      # from the default unit `test` target because it spawns real wrangler dev
+      # processes; runs here nightly instead.
+      - name: Run agents fiber-recovery e2e tests
+        env:
+          CLOUDFLARE_ACCOUNT_ID: ${{ secrets.CLOUDFLARE_ACCOUNT_ID }}
+          CLOUDFLARE_API_TOKEN: ${{ secrets.CLOUDFLARE_API_TOKEN }}
+        run: pnpm run test:e2e
+        working-directory: packages/agents
+
+  e2e-engine-genericity:
+    name: "E2E: shared-engine genericity (pi + tanstack)"
+    timeout-minutes: 30
+    runs-on: ubuntu-24.04
+    steps:
+      - uses: actions/checkout@v6
+        with:
+          fetch-depth: 1
+
+      - uses: ./.github/actions/install
+
+      - run: pnpm run build
+
+      # Genericity proofs that the shared `agents/chat` recovery engine is not
+      # AI-SDK-specific: a non-`UIMessage` PiAgent and a foreign (AG-UI / TanStack
+      # AI) tool vocabulary each recover from a real SIGKILL through the same
+      # engine + resume handshake + codec. Both use deterministic faux models; the
+      # real-Workers-AI tanstack leg stays gated behind RUN_WORKERS_AI_E2E.
+      - name: Run pi-recovery genericity e2e
+        run: pnpm run test:e2e
+        working-directory: experimental/pi-recovery
+
+      - name: Run tanstack-recovery genericity e2e
+        env:
+          CLOUDFLARE_ACCOUNT_ID: ${{ secrets.CLOUDFLARE_ACCOUNT_ID }}
+          CLOUDFLARE_API_TOKEN: ${{ secrets.CLOUDFLARE_API_TOKEN }}
+        run: pnpm run test:e2e
+        working-directory: experimental/tanstack-recovery
+
+  # ── DEPLOYED (Layer-5) recovery suites ──────────────────────────────────────
+  # These deploy REAL, billable Workers and drive recovery on Cloudflare's edge
+  # (not local workerd). They are OPT-IN: skipped unless the repo variable
+  # `RUN_DEPLOYED_E2E` is `1`, or a manual run sets the `run_deployed` input.
+  # Each suite uniquely names its throwaway Worker and always deletes it.
+  e2e-deployed-ai-chat:
+    name: "E2E (deployed): ai-chat recovery on real edge"
+    if: ${{ vars.RUN_DEPLOYED_E2E == '1' || github.event.inputs.run_deployed == 'true' }}
+    timeout-minutes: 30
+    runs-on: ubuntu-24.04
+    steps:
+      - uses: actions/checkout@v6
+        with:
+          fetch-depth: 1
+
+      - uses: ./.github/actions/install
+
+      - run: pnpm run build
+
+      # Deploys `wrangler.deployed.jsonc`, forces a mid-turn redeploy to evict the
+      # live DO, asserts recovery fires + the no-false-incident counterpart, then
+      # deletes the Worker.
+      - name: Run ai-chat DEPLOYED recovery e2e
+        env:
+          RUN_DEPLOYED_E2E: "1"
+          CLOUDFLARE_ACCOUNT_ID: ${{ secrets.CLOUDFLARE_ACCOUNT_ID }}
+          CLOUDFLARE_API_TOKEN: ${{ secrets.CLOUDFLARE_API_TOKEN }}
+        run: pnpm run test:e2e:deployed
+        working-directory: packages/ai-chat
+
+  e2e-deployed-think-probe:
+    name: "E2E (deployed): Think recovery probe on real edge"
+    if: ${{ vars.RUN_DEPLOYED_E2E == '1' || github.event.inputs.run_deployed == 'true' }}
+    timeout-minutes: 30
+    runs-on: ubuntu-24.04
+    steps:
+      - uses: actions/checkout@v6
+        with:
+          fetch-depth: 1
+
+      - uses: ./.github/actions/install
+
+      - run: pnpm run build
+
+      # Deploys the chat-recovery-probe under a throwaway name and runs the fast,
+      # deterministic abort-driven Think scenarios (a6 HITL, a7 server-orphan,
+      # a8 approval, idem), then deletes the Worker. The slow real-deploy-churn
+      # scenarios (a1/a2/a4/a5/a9/rapid) stay manual (see the probe README).
+      - name: Run Think recovery-probe DEPLOYED suite
+        env:
+          RUN_DEPLOYED_E2E: "1"
+          CLOUDFLARE_ACCOUNT_ID: ${{ secrets.CLOUDFLARE_ACCOUNT_ID }}
+          CLOUDFLARE_API_TOKEN: ${{ secrets.CLOUDFLARE_API_TOKEN }}
+        run: pnpm run test:e2e:deployed
+        working-directory: experimental/chat-recovery-probe
diff --git a/.gitignore b/.gitignore
@@ -140,6 +140,9 @@ dist
 .wrangler-*-state
 .dev.vars
 
+# e2e recovery harness miniflare/SQLite state
+**/.smoke-state/
+
 # macOS
 .DS_Store