Merge branch 'main' into run-store-write-adapter

d-cs · web-flow · commit 3b809b9aafbe · 2026-06-18T12:24:56.000+01:00
diff --git a/docs/ai-chat/custom-agents.mdx b/docs/ai-chat/custom-agents.mdx
@@ -179,6 +179,38 @@ for await (const turn of session) {
 }
 ```
 
+## Stopping generation
+
+The frontend stops a turn with [`transport.stopGeneration(chatId)`](/ai-chat/frontend#stop-generation), which writes a stop signal to the session's input stream. The stop aborts the current turn's generation but keeps the run alive, so the next message continues on the same session.
+
+`turn.signal` is a combined stop-and-cancel `AbortSignal`, fresh each turn. Pass it to `streamText` so the stop reaches the model, then let `turn.complete()` finish the turn:
+
+```ts trigger/my-chat.ts
+for await (const turn of session) {
+  const result = streamText({
+    model: anthropic("claude-sonnet-4-5"),
+    messages: turn.messages,
+    abortSignal: turn.signal, // fires on a user stop OR a run cancel
+    stopWhen: stepCountIs(15),
+  });
+
+  await turn.complete(result);
+
+  if (turn.stopped) {
+    // user stopped this turn — the partial response is already accumulated
+  }
+}
+```
+
+On a stop, `turn.complete()` cleans up the aborted parts of the partial response, accumulates it as its own assistant message, and writes turn-complete. The run does not end — the loop continues to the next turn.
+
+Read `turn.stopped` to tell a user stop from a full run cancel:
+
+- **User stop** (`transport.stopGeneration`): `turn.signal` aborts, `turn.stopped` is `true`, the partial response is accumulated, and the run stays alive for the next message.
+- **Run cancel** (cancelled, expired, or `maxDuration` exceeded): `turn.signal` aborts, `turn.stopped` is `false`, and `turn.complete()` returns without accumulating because the run is ending.
+
+A hand-rolled loop wires this itself with `chat.createStopSignal()` and `chat.cleanupAbortedParts()`. Two things `createSession` handles for you are easy to get wrong there — see the [hand-rolled loop checklist](#hand-rolled-loop-checklist).
+
 ## Hand-rolled loop with primitives
 
 For full control, skip `createSession` and compose the primitives directly:
diff --git a/docs/ai-chat/patterns/human-in-the-loop.mdx b/docs/ai-chat/patterns/human-in-the-loop.mdx
@@ -20,7 +20,7 @@ Turn N:
   LLM streams text → calls askUser tool (no execute)
   streamText ends with tool-call in `input-available` state
   onTurnComplete fires (finishReason = "tool-calls")
-  Agent idle
+  Agent suspends (compute freed) — maxDuration does not tick while paused
 
 Frontend:
   Renders question + option buttons from tool input
@@ -36,6 +36,14 @@ Turn N+1:
 
 The AI SDK's `toUIMessageStream` automatically reuses the assistant message ID across the pause (we pass `originalMessages` internally), so `responseMessage` in the post-resume `onTurnComplete` is the **full merged message** — the original text, the completed tool call, and any follow-up content — not just the new parts.
 
+## Duration and cost while paused
+
+A pause doesn't hold compute. After the model calls a no-execute tool, the turn finishes and the run stays warm for `idleTimeoutInSeconds` (default 30s), then **suspends** and frees its compute, the same way [`wait.for`](/wait-for) does. The user's `addToolOutput` wakes it back up.
+
+Because the run is suspended while it waits, the human's thinking time is not billed and does **not** count against [`maxDuration`](/runs/max-duration). `maxDuration` measures active CPU time and excludes suspended waitpoint time, exactly like `wait.for`, so a user can take minutes, hours, or days to answer without the run hitting `maxDuration`. The only time that counts is each turn's actual compute plus the short warm window before each suspend.
+
+You don't need to raise `maxDuration` or end the run to support long human waits. How long a single suspended pause stays open is governed by the run's suspend timeout, not `maxDuration`. If a wait outlives that timeout, the run ends, and the next `addToolOutput` boots a fresh continuation that picks up the resolved tool result.
+
 ## Backend: define the tool
 
 A HITL tool has an `inputSchema` describing what the model can ask, but **no `execute` function**. When the LLM calls it, `streamText` returns control to your agent.