Think (and the agents Session memory layer) has no first-class prompt caching, and the one caching-adjacent primitive it does have — withCachedPrompt() / freezeSystemPrompt() — is in tension with dynamic per-turn context. This is the last item from a customer's hand-rolled patch set that the framework doesn't own (the recovery / transcript-integrity / compaction / sub-agent-signal items are addressed in #1623). Filing as a discussion to settle the design before implementing, because it crosses the provider boundary and interacts with compaction and the per-turn system suffix.
Current state (what the framework does today)
1. No provider cache breakpoints anywhere in core. There is no cacheControl / cache_control wiring in packages/agents or packages/think. Apps that want Anthropic/Bedrock prompt caching must set providerOptions themselves per turn (Think forwards TurnConfig.providerOptions, but adds no cache defaults). So even a perfectly stable system prefix is not actually cached at the provider unless the app hand-marks it.
2. withCachedPrompt() / freezeSystemPrompt() is opt-in and freezes at first use.
Session.withCachedPrompt() is a builder opt-in; Think.configureSession() returns the session unchanged by default.
freezeSystemPrompt() renders the context blocks once and persists the string to the prompt store; subsequent turns return the stored value verbatim (packages/agents/src/experimental/memory/session/context.ts):
```ts
async freezeSystemPrompt(): Promise<string> {
  if (this.promptStore) {
    const stored = await this.promptStore.get();
    if (stored !== null) return stored; // frozen — never re-rendered
  }
  if (!this.loaded) await this.load();
  const prompt = this.toSystemPrompt();
  if (this.promptStore) await this.promptStore.set(prompt);
  return prompt;
}
```
The in-memory snapshot has the same freeze semantics (toSystemPrompt() caches the first render; only refreshSnapshot() re-renders).
The frozen prefix is only invalidated by refreshSystemPrompt() (which compaction calls after a successful summarize).
3. The per-turn system suffix sits after the frozen base, with no cache boundary.
Think._systemPromptForTurn() appends a capability block reflecting the current tool set after the (possibly frozen) base:
```ts
private _systemPromptForTurn(baseSystem: string, tools: ToolSet): string {
  if (baseSystem.includes("You are running inside a Think agent.")) return baseSystem;
  return `${baseSystem.trimEnd()}\n\n${this._buildThinkCapabilityBlock(tools)}`;
}
```
So the string actually sent to the model is frozenBase + per-turn capability block, and nothing tells the provider where the cacheable prefix ends.
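To make the current behavior concrete, here is a toy model of points 2 and 3 together. It is a sketch, not the framework code: `FrozenPromptSketch`, `systemPromptForTurn`, and the prompt strings are hypothetical stand-ins, and the real methods are async and backed by a prompt store.

```ts
// Toy model only: these names mirror the shape of the behavior described
// above and are NOT the real Session / Think implementation.
class FrozenPromptSketch {
  private stored: string | null = null;
  private render: () => string;

  constructor(render: () => string) {
    this.render = render;
  }

  // First call renders and stores; later calls return the stored string verbatim.
  freeze(): string {
    if (this.stored === null) this.stored = this.render();
    return this.stored;
  }

  // Drops the stored string and re-renders (what a post-compaction refresh does).
  refresh(): string {
    this.stored = null;
    return this.freeze();
  }
}

// Stand-in for the per-turn capability suffix: appended after the frozen base.
function systemPromptForTurn(base: string, toolNames: string[]): string {
  return `${base.trimEnd()}\n\nTools available this turn: ${toolNames.join(", ")}`;
}

let memory = "user prefers metric units";
const session = new FrozenPromptSketch(() => `You are a helpful agent.\nMemory: ${memory}`);

const turn1 = systemPromptForTurn(session.freeze(), ["search"]);
memory = "user prefers imperial units"; // dynamic context changes between turns
const turn2 = systemPromptForTurn(session.freeze(), ["search", "calendar"]);

const frozenBase = session.freeze();    // still the first render
const afterRefresh = session.refresh(); // picks up the new memory, new prefix bytes
```

In this model `turn1` and `turn2` share `frozenBase` byte for byte and differ only in the suffix, yet each is a single opaque string, so a provider has no marker for where the shared part ends. The updated memory only appears after `refresh()`, which also changes the prefix.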
The two coupled problems
No way to express "cache this prefix" to the provider. Caching is the single biggest cost/latency lever for long agent loops, and today it's entirely a per-app providerOptions chore, re-implemented per provider (Anthropic ephemeral breakpoints, Bedrock, OpenAI implicit caching, Gemini).
Frozen vs dynamic is all-or-nothing. If you freeze for cache stability, per-turn dynamic context (e.g. set_context, current time, retrieved memory) can't update the cached region until you call refreshSystemPrompt() — which throws away the cache. If you don't freeze, the prefix can drift every turn and never caches. There is no notion of a stable cached prefix + a small dynamic tail within the system prompt.
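For reference, the per-app chore from the first problem looks roughly like the following for Anthropic. This is a sketch under assumptions: it uses locally defined types rather than SDK imports, it assumes the app builds AI SDK-style messages where the Anthropic provider reads a `cacheControl` entry under `providerOptions.anthropic`, and `withAnthropicCacheBreakpoint` is a hypothetical helper name.

```ts
// Local stand-in types; a real app would use the message types from its SDK.
type ProviderOptions = Record<string, Record<string, unknown>>;

interface SystemMessage {
  role: "system";
  content: string;
  providerOptions?: ProviderOptions;
}

// Hypothetical helper: marks a system message as the end of a cacheable prefix.
// The option shape is an assumption based on AI SDK-style Anthropic options.
function withAnthropicCacheBreakpoint(content: string): SystemMessage {
  return {
    role: "system",
    content,
    providerOptions: { anthropic: { cacheControl: { type: "ephemeral" } } },
  };
}

const markedSystem = withAnthropicCacheBreakpoint(
  "You are a helpful agent. <long, stable instructions>"
);
```

Every app repeats some version of this, and the option key and shape change per provider, which is the duplication a framework-level surface would remove.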
Design questions to settle
Where do breakpoints go? System prompt only, or also tools and the first N transcript messages? Anthropic allows up to 4 breakpoints; the highest-leverage layout for agent loops is usually [stable system] [tools] [stable history prefix].
Provider-agnostic abstraction. Should Think expose a declarative cache: { ttl, breakpoints } concept it lowers to each provider's providerOptions, or just document the manual path and offer a helper? How do we degrade on providers without explicit caching (OpenAI implicit, Gemini)?
Stable-prefix + dynamic-tail in the system prompt. Do we split the system prompt into a cached region (context blocks frozen) and an explicitly-dynamic region (per-turn) so the dynamic part lands after the cache breakpoint and doesn't bust it? The current capability suffix already implies this shape — it just isn't marked.
Interaction with compaction. refreshSystemPrompt() (called post-compaction) invalidates the frozen prefix and therefore the cache. Is that acceptable (compaction is rare), or do we want compaction to preserve the cached head?
AIChatAgent. It assembles messages in the subclass onChatMessage, so any caching primitive there is necessarily a documented pattern, not framework-owned. Do we want parity, or is this Think-only?
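One way to reason about the breakpoint question is as a planning step over ordered request segments. The sketch below assumes prefix-based cache matching (a segment can only be cached if everything before it is also unchanged) and uses the 4-breakpoint limit mentioned above. `Segment`, `planBreakpoints`, and the segment names are hypothetical, and the segment order is taken from the layout in the text; a provider may serialize the request in a different order.

```ts
interface Segment {
  name: string;
  stable: boolean; // true if the content is byte-identical from turn to turn
}

const MAX_BREAKPOINTS = 4; // Anthropic's limit, as stated in the text above

// Returns the names of the segments that should end with a cache breakpoint.
// Planning stops at the first unstable segment because, under prefix-based
// matching, nothing after a changing segment can be served from cache.
function planBreakpoints(segments: Segment[], max: number = MAX_BREAKPOINTS): string[] {
  const stableHead: string[] = [];
  for (const segment of segments) {
    if (!segment.stable) break;
    stableHead.push(segment.name);
  }
  if (max <= 0) return [];
  // If there are more stable segments than breakpoints, keep the last ones,
  // since those cover the longest prefixes.
  return stableHead.slice(-max);
}

const agentLoopLayout: Segment[] = [
  { name: "stable system", stable: true },
  { name: "tools", stable: true },
  { name: "stable history prefix", stable: true },
  { name: "recent turns", stable: false },
];
const planned = planBreakpoints(agentLoopLayout);

// A dynamic segment placed early cuts off everything behind it.
const withEarlyDynamicTail: Segment[] = [
  { name: "stable system", stable: true },
  { name: "dynamic system tail", stable: false },
  { name: "tools", stable: true },
];
const plannedEarlyTail = planBreakpoints(withEarlyDynamicTail);
```

Here `planned` contains all three stable segments, while `plannedEarlyTail` contains only `"stable system"`. That second case is worth keeping in mind for the stable-prefix plus dynamic-tail question: where the dynamic region lands relative to tools and history determines how much of the rest can still be cached.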
Tentative direction (for discussion, not a decision)
Add a declarative caching surface on Think / Session (e.g. mark context blocks or regions as cacheable, and a cache option that lowers to providerOptions breakpoints per provider).
Formalize the cacheable prefix + dynamic suffix split that _systemPromptForTurn already hints at, so the dynamic part is always after the last cache breakpoint.
Keep it opt-in and provider-aware; no-op cleanly where the provider has no explicit cache control.
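As a starting point for discussion, the direction above could be sketched as a small lowering function plus an explicit prefix/suffix split. Everything here is hypothetical: `CacheOption`, `lowerCacheMarker`, and `buildSystemParts` are not existing Think or Session APIs, and the Anthropic and Bedrock option shapes are assumptions modeled on AI SDK-style provider options that would need checking against the provider packages.

```ts
type Provider = "anthropic" | "bedrock" | "openai" | "google";
type ProviderMarker = Record<string, Record<string, unknown>>;

// Hypothetical declarative option; omitting it keeps caching off (opt-in).
interface CacheOption {
  ttl?: "5m" | "1h";
}

interface SystemPart {
  role: "system";
  content: string;
  providerOptions?: ProviderMarker;
}

// Lowers the declarative option to a provider-specific marker.
// Returns undefined where the provider has no explicit cache control,
// so callers fall through to a clean no-op.
function lowerCacheMarker(provider: Provider, cache: CacheOption): ProviderMarker | undefined {
  switch (provider) {
    case "anthropic":
      // Assumed shape: ephemeral breakpoint with an optional ttl.
      return {
        anthropic: { cacheControl: { type: "ephemeral", ...(cache.ttl ? { ttl: cache.ttl } : {}) } },
      };
    case "bedrock":
      // Assumed shape: Bedrock cache point.
      return { bedrock: { cachePoint: { type: "default" } } };
    default:
      // Implicit or no caching: nothing to mark.
      return undefined;
  }
}

// Emits the cacheable prefix and the dynamic suffix as separate parts, with the
// marker on the prefix only, so the suffix always lands after the breakpoint.
function buildSystemParts(
  cacheablePrefix: string,
  dynamicSuffix: string,
  provider: Provider,
  cache?: CacheOption
): SystemPart[] {
  const prefix: SystemPart = { role: "system", content: cacheablePrefix.trimEnd() };
  const marker = cache ? lowerCacheMarker(provider, cache) : undefined;
  if (marker) prefix.providerOptions = marker;
  return [prefix, { role: "system", content: dynamicSuffix }];
}

const anthropicParts = buildSystemParts("Frozen base prompt", "Capability block", "anthropic", { ttl: "1h" });
const openaiParts = buildSystemParts("Frozen base prompt", "Capability block", "openai", { ttl: "1h" });
const optedOutParts = buildSystemParts("Frozen base prompt", "Capability block", "anthropic");
```

The point of the shape is that the same call produces a marked prefix for Anthropic, an unmarked but still split prompt for OpenAI, and unchanged behavior when `cache` is omitted. Whether turn assembly can carry multiple system parts through to the provider is one of the things this discussion needs to confirm.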
Scope / non-goals
Not trying to auto-detect cacheability or auto-tune TTLs in v1.
Not changing default behavior — caching stays opt-in.
Related: #1593 (compaction token counting), #1623 (deploy-churn hardening).
References
packages/agents/src/experimental/memory/session/context.ts: toSystemPrompt / freezeSystemPrompt / getSystemPromptForEstimate / refreshSystemPrompt
packages/think/src/think.ts: _systemPromptForTurn / _buildThinkCapabilityBlock, turn assembly, TurnConfig.providerOptions
packages/agents/src/experimental/memory/session/session.ts: withCachedPrompt