docs/ai-chat/custom-agents.mdx
32 additions & 0 deletions
@@ -179,6 +179,38 @@ for await (const turn of session) {
}
```

## Stopping generation

The frontend stops a turn with [`transport.stopGeneration(chatId)`](/ai-chat/frontend#stop-generation), which writes a stop signal to the session's input stream. It aborts the current turn's generation but keeps the run alive, so the next message continues on the same session.

`turn.signal` is a combined stop-and-cancel `AbortSignal`, fresh each turn. Pass it to `streamText` so the stop reaches the model, then let `turn.complete()` finish the turn:

```ts trigger/my-chat.ts
for await (const turn of session) {
  const result = streamText({
    model: anthropic("claude-sonnet-4-5"),
    messages: turn.messages,
    abortSignal: turn.signal, // fires on a user stop OR a run cancel
    stopWhen: stepCountIs(15),
  });

  await turn.complete(result);

  if (turn.stopped) {
    // user stopped this turn; the partial response is already accumulated
  }
}
```

On a stop, `turn.complete()` cleans up the aborted parts of the partial response, accumulates it as its own assistant message, and writes turn-complete. The run does not end; the loop continues to the next turn.

Read `turn.stopped` to tell a user stop from a full run cancel:

- **User stop** (`transport.stopGeneration`): `turn.signal` aborts, `turn.stopped` is `true`, the partial response is accumulated, and the run stays alive for the next message.
- **Run cancel** (cancelled, expired, or `maxDuration` exceeded): `turn.signal` aborts, `turn.stopped` is `false`, and `turn.complete()` returns without accumulating because the run is ending.

A hand-rolled loop wires this itself with `chat.createStopSignal()` and `chat.cleanupAbortedParts()`. Two things `createSession` handles for you are easy to get wrong there; see the [hand-rolled loop checklist](#hand-rolled-loop-checklist).

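
To make the stop-versus-cancel distinction concrete, here is a minimal, framework-free sketch of how one abort signal plus a separate `stopped` flag could behave. It is not the SDK's implementation and does not use `chat.createStopSignal()`: `createTurnSignal`, `stop`, `cancel`, and `afterTurn` are hypothetical names, and the only real API involved is the standard `AbortController`.

```ts
// Hypothetical illustration only: none of these names are part of the chat SDK.
interface TurnSignal {
  signal: AbortSignal; // aborts on a user stop OR a run cancel
  readonly stopped: boolean; // true only when the user stopped the turn
  stop(): void; // simulates a user stop
  cancel(): void; // simulates a run cancel
}

function createTurnSignal(): TurnSignal {
  const controller = new AbortController();
  let stopped = false;
  return {
    signal: controller.signal,
    get stopped() {
      return stopped;
    },
    stop() {
      if (controller.signal.aborted) return; // the first abort wins
      stopped = true;
      controller.abort();
    },
    cancel() {
      controller.abort(); // aborts the signal but leaves `stopped` untouched
    },
  };
}

// Decide what a loop should do once a turn has ended.
function afterTurn(
  turn: TurnSignal,
): "continue" | "keep-partial-and-continue" | "exit" {
  if (!turn.signal.aborted) return "continue";
  return turn.stopped ? "keep-partial-and-continue" : "exit";
}
```

In both abort cases the signal looks identical to the model call; only the flag tells the loop whether to keep the partial response and wait for the next message or to exit because the run is ending.
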
## Hand-rolled loop with primitives

For full control, skip `createSession` and compose the primitives directly:
Agent suspends (compute freed) — maxDuration does not tick while paused

Frontend:
Renders question + option buttons from tool input
@@ -36,6 +36,14 @@ Turn N+1:

The AI SDK's `toUIMessageStream` automatically reuses the assistant message ID across the pause (we pass `originalMessages` internally), so `responseMessage` in the post-resume `onTurnComplete` is the **full merged message** (the original text, the completed tool call, and any follow-up content), not just the new parts.

## Duration and cost while paused

A pause doesn't hold compute. After the model calls a no-execute tool, the turn finishes and the run stays warm for `idleTimeoutInSeconds` (default 30s), then **suspends** and frees its compute, the same way [`wait.for`](/wait-for) does. The user's `addToolOutput` wakes it back up.

Because the run is suspended while it waits, the human's thinking time is not billed and does **not** count against [`maxDuration`](/runs/max-duration). `maxDuration` measures active CPU time and excludes suspended waitpoint time, exactly like `wait.for`, so a user can take minutes, hours, or days to answer without the run hitting `maxDuration`. The only time that counts is each turn's actual compute plus the short warm window before each suspend.

You don't need to raise `maxDuration` or end the run to support long human waits. How long a single suspended pause stays open is governed by the run's suspend timeout, not `maxDuration`; if a wait outlives it, the run ends, and the next `addToolOutput` boots a fresh continuation that picks up the resolved tool result.

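
As a rough worked example of the accounting described above, the sketch below adds up what would count toward `maxDuration` for a run with two turns and one long human wait. The turn timings and the `countedSeconds` helper are hypothetical; only the 30-second default warm window comes from the text, and the real measurement is done by the platform, not by your code.

```ts
// Hypothetical accounting sketch; the platform does the real measurement.
interface Turn {
  computeSeconds: number; // active time spent generating in this turn
  waitSeconds: number; // how long the human takes to answer afterwards
}

const IDLE_TIMEOUT_SECONDS = 30; // default warm window before the run suspends

function countedSeconds(
  turns: Turn[],
  idleTimeout: number = IDLE_TIMEOUT_SECONDS,
): number {
  return turns.reduce((total, turn) => {
    // Only the part of the wait inside the warm window counts;
    // the rest is spent suspended.
    const warm = Math.min(turn.waitSeconds, idleTimeout);
    return total + turn.computeSeconds + warm;
  }, 0);
}

const turns: Turn[] = [
  { computeSeconds: 12, waitSeconds: 6 * 60 * 60 }, // user answers after 6 hours
  { computeSeconds: 8, waitSeconds: 0 }, // final turn, no further wait
];

// 12s compute + 30s warm window + 8s compute = 50s counted
const counted = countedSeconds(turns);

// 12s + 21600s + 8s = 21620s of wall-clock time
const wallClock = turns.reduce(
  (total, turn) => total + turn.computeSeconds + turn.waitSeconds,
  0,
);
```

Under these assumed numbers, a conversation that spans about six hours of wall-clock time counts only 50 seconds, which is why long human waits need no `maxDuration` increase.
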
## Backend: define the tool

A HITL tool has an `inputSchema` describing what the model can ask, but **no `execute` function**. When the LLM calls it, `streamText` returns control to your agent.