From f3ab518c20ae914e26f21188a9badc73df2bfc52 Mon Sep 17 00:00:00 2001
From: Eric Allam
Date: Thu, 18 Jun 2026 11:51:32 +0100
Subject: [PATCH] docs(ai-chat): document HITL pause suspension and maxDuration

---
 docs/ai-chat/patterns/human-in-the-loop.mdx | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/docs/ai-chat/patterns/human-in-the-loop.mdx b/docs/ai-chat/patterns/human-in-the-loop.mdx
index 7a8028bf85b..843523f59d9 100644
--- a/docs/ai-chat/patterns/human-in-the-loop.mdx
+++ b/docs/ai-chat/patterns/human-in-the-loop.mdx
@@ -20,7 +20,7 @@ Turn N:
   LLM streams text → calls askUser tool (no execute)
   streamText ends with tool-call in `input-available` state
   onTurnComplete fires (finishReason = "tool-calls")
-  Agent idle
+  Agent suspends (compute freed) — maxDuration does not tick while paused
 
 Frontend:
   Renders question + option buttons from tool input
@@ -36,6 +36,14 @@ Turn N+1:
 
 The AI SDK's `toUIMessageStream` automatically reuses the assistant message ID across the pause (we pass `originalMessages` internally), so `responseMessage` in the post-resume `onTurnComplete` is the **full merged message** — the original text, the completed tool call, and any follow-up content — not just the new parts.
 
+## Duration and cost while paused
+
+A pause doesn't hold compute. After the model calls a no-execute tool, the turn finishes and the run stays warm for `idleTimeoutInSeconds` (default 30s), then **suspends** and frees its compute, the same way [`wait.for`](/wait-for) does. The user's `addToolOutput` wakes it back up.
+
+Because the run is suspended while it waits, the human's thinking time is not billed and does **not** count against [`maxDuration`](/runs/max-duration). `maxDuration` measures active CPU time and excludes suspended waitpoint time, exactly like `wait.for`, so a user can take minutes, hours, or days to answer without the run hitting `maxDuration`. The only time that counts is each turn's actual compute plus the short warm window before each suspend.
+
+You don't need to raise `maxDuration` or end the run to support long human waits. How long a single suspended pause stays open is governed by the run's suspend timeout, not `maxDuration`; if a wait outlives it the run ends, and the next `addToolOutput` boots a fresh continuation that picks up the resolved tool result.
+
 ## Backend: define the tool
 
 A HITL tool has an `inputSchema` describing what the model can ask, but **no `execute` function**. When the LLM calls it, `streamText` returns control to your agent.
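
As a rough illustration of the accounting described under "Duration and cost while paused", the sketch below models which seconds count toward `maxDuration`. It is a hypothetical model, not SDK code: `Turn`, `countedSeconds`, and `totalCountedSeconds` are made-up names, the 30-second warm window is the stated default for `idleTimeoutInSeconds`, and treating an answer that arrives inside the warm window as fully counted is an assumption.

```typescript
// Hypothetical model of how time counts toward maxDuration across HITL pauses.
// None of these names come from the SDK; they only illustrate the arithmetic.

interface Turn {
  computeSeconds: number;   // active time spent running the turn
  humanWaitSeconds: number; // wall-clock time until the user answers (0 if no pause)
}

// Stated default: the run stays warm this long before it suspends.
const IDLE_TIMEOUT_SECONDS = 30;

// Seconds of a single turn that count toward maxDuration under this model.
function countedSeconds(turn: Turn, idleTimeout: number = IDLE_TIMEOUT_SECONDS): number {
  // Only the warm window counts; anything past it is spent suspended.
  const warmSeconds = Math.min(turn.humanWaitSeconds, idleTimeout);
  return turn.computeSeconds + warmSeconds;
}

// Sum of counted seconds over every turn of the run.
function totalCountedSeconds(turns: Turn[], idleTimeout: number = IDLE_TIMEOUT_SECONDS): number {
  return turns.reduce((sum, turn) => sum + countedSeconds(turn, idleTimeout), 0);
}

// Turn N asks a question and the user takes two days to answer;
// turn N+1 resumes and finishes the reply without pausing again.
const turns: Turn[] = [
  { computeSeconds: 4, humanWaitSeconds: 2 * 24 * 60 * 60 },
  { computeSeconds: 6, humanWaitSeconds: 0 },
];

console.log(totalCountedSeconds(turns)); // 40
```

Under these assumptions a two-day wait contributes only the 30-second warm window, so the run's counted time is 40 seconds (4 + 30 + 6).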