From f3ab518c20ae914e26f21188a9badc73df2bfc52 Mon Sep 17 00:00:00 2001
From: Eric Allam <eallam@icloud.com>
Date: Thu, 18 Jun 2026 11:51:32 +0100
Subject: [PATCH] docs(ai-chat): document HITL pause suspension and maxDuration

---
 docs/ai-chat/patterns/human-in-the-loop.mdx | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/docs/ai-chat/patterns/human-in-the-loop.mdx b/docs/ai-chat/patterns/human-in-the-loop.mdx
index 7a8028bf85b..843523f59d9 100644
--- a/docs/ai-chat/patterns/human-in-the-loop.mdx
+++ b/docs/ai-chat/patterns/human-in-the-loop.mdx
@@ -20,7 +20,7 @@ Turn N:
   LLM streams text → calls askUser tool (no execute)
   streamText ends with tool-call in `input-available` state
   onTurnComplete fires (finishReason = "tool-calls")
-  Agent idle
+  Agent suspends (compute freed) — maxDuration does not tick while paused
 
 Frontend:
   Renders question + option buttons from tool input
@@ -36,6 +36,14 @@ Turn N+1:
 
 The AI SDK's `toUIMessageStream` automatically reuses the assistant message ID across the pause (we pass `originalMessages` internally), so `responseMessage` in the post-resume `onTurnComplete` is the **full merged message** — the original text, the completed tool call, and any follow-up content — not just the new parts.
 
+## Duration and cost while paused
+
+A pause doesn't hold compute. After the model calls a no-execute tool, the turn finishes and the run stays warm for `idleTimeoutInSeconds` (default 30s), then **suspends** and frees its compute, the same way [`wait.for`](/wait-for) does. The user's `addToolOutput` wakes it back up.
+
+Because the run is suspended while it waits, the human's thinking time is not billed and does **not** count against [`maxDuration`](/runs/max-duration). `maxDuration` measures active CPU time and excludes suspended waitpoint time, exactly like `wait.for`, so a user can take minutes, hours, or days to answer without the run hitting `maxDuration`. The only time that counts is each turn's actual compute plus the short warm window before each suspend.
+
+You don't need to raise `maxDuration` or end the run to support long human waits. How long a single suspended pause stays open is governed by the run's suspend timeout, not `maxDuration`. If a wait outlives that timeout, the run ends, and the next `addToolOutput` boots a fresh continuation that picks up the resolved tool result.
+
 ## Backend: define the tool
 
 A HITL tool has an `inputSchema` describing what the model can ask, but **no `execute` function**. When the LLM calls it, `streamText` returns control to your agent.