
discussion: first-class prompt caching + a cacheable-but-dynamic system prefix for Think #1624

@threepointone


Summary

Think (and the agents Session memory layer) has no first-class prompt caching, and the one caching-adjacent primitive it does have — withCachedPrompt() / freezeSystemPrompt() — is in tension with dynamic per-turn context. This is the last item from a customer's hand-rolled patch set that the framework doesn't own (the recovery / transcript-integrity / compaction / sub-agent-signal items are addressed in #1623). Filing as a discussion to settle the design before implementing, because it crosses the provider boundary and interacts with compaction and the per-turn system suffix.

Related: #1593 (compaction token counting), #1623 (deploy-churn hardening).

Current state (what the framework does today)

1. No provider cache breakpoints anywhere in core. There is no cacheControl / cache_control wiring in packages/agents or packages/think. Apps that want Anthropic/Bedrock prompt caching must set providerOptions themselves per turn (Think forwards TurnConfig.providerOptions, but adds no cache defaults). So even a perfectly stable system prefix is not actually cached at the provider unless the app hand-marks it.
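For reference, the manual path today looks roughly like the following. This is a minimal sketch, assuming the providerOptions shape used by the AI SDK's Anthropic provider (`anthropic.cacheControl` with `type: "ephemeral"`); the `SystemMessage` type and the `markCacheable` helper are hypothetical and are not part of Think or agents.

```typescript
// Hypothetical local types; Think's real TurnConfig is not reproduced here.
type ProviderOptions = Record<string, Record<string, unknown>>;

interface SystemMessage {
  role: "system";
  content: string;
  providerOptions?: ProviderOptions;
}

// Assumption: the Anthropic provider reads `anthropic.cacheControl` on a
// message and places a cache breakpoint at the end of that message.
function markCacheable(content: string): SystemMessage {
  return {
    role: "system",
    content,
    providerOptions: { anthropic: { cacheControl: { type: "ephemeral" } } },
  };
}

const stablePrefix = markCacheable("You are a support agent for Example Corp.");
```

Every app has to write something like this itself, once per provider, which is the chore this issue is about.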

2. withCachedPrompt() / freezeSystemPrompt() is opt-in and freezes at first use.

  • Session.withCachedPrompt() is a builder opt-in; Think.configureSession() returns the session unchanged by default.
  • freezeSystemPrompt() renders the context blocks once and persists the string to the prompt store; subsequent turns return the stored value verbatim (packages/agents/src/experimental/memory/session/context.ts):
async freezeSystemPrompt(): Promise<string> {
  if (this.promptStore) {
    const stored = await this.promptStore.get();
    if (stored !== null) return stored;          // frozen — never re-rendered
  }
  if (!this.loaded) await this.load();
  const prompt = this.toSystemPrompt();
  if (this.promptStore) await this.promptStore.set(prompt);
  return prompt;
}
  • The in-memory snapshot has the same freeze semantics (toSystemPrompt() caches the first render; only refreshSnapshot() re-renders).
  • The frozen prefix is only invalidated by refreshSystemPrompt() (which compaction calls after a successful summarize).
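The freeze-then-refresh behaviour described above can be reproduced in isolation. The sketch below is illustrative only: `MemoryPromptStore`, `FrozenPrompt`, and `demoFreeze` are hypothetical names, and the real store and context classes have more responsibilities than shown.

```typescript
// Hypothetical stand-ins for the prompt store and context classes.
interface PromptStore {
  get(): Promise<string | null>;
  set(value: string): Promise<void>;
}

class MemoryPromptStore implements PromptStore {
  private value: string | null = null;
  async get(): Promise<string | null> { return this.value; }
  async set(value: string): Promise<void> { this.value = value; }
}

class FrozenPrompt {
  private readonly store: PromptStore;
  private readonly render: () => string;

  constructor(store: PromptStore, render: () => string) {
    this.store = store;
    this.render = render;
  }

  // First call renders and persists; later calls return the stored string.
  async freeze(): Promise<string> {
    const stored = await this.store.get();
    if (stored !== null) return stored;
    return this.refresh();
  }

  // Re-render and overwrite the stored value (the step compaction triggers).
  async refresh(): Promise<string> {
    const prompt = this.render();
    await this.store.set(prompt);
    return prompt;
  }
}

async function demoFreeze(): Promise<string[]> {
  let memory = "likes tea";
  const ctx = new FrozenPrompt(new MemoryPromptStore(), () => `MEMORY: ${memory}`);
  const first = await ctx.freeze();
  memory = "likes coffee";           // context changes after the first turn
  const second = await ctx.freeze(); // still the frozen value
  const third = await ctx.refresh(); // explicit refresh picks up the change
  return [first, second, third];
}
```

In this sketch the second call still returns the "likes tea" prompt even though the underlying context changed; only the explicit refresh picks up "likes coffee".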

3. The per-turn system suffix sits after the frozen base, with no cache boundary. Think._systemPromptForTurn() appends a capability block reflecting the current tool set after the (possibly frozen) base:

private _systemPromptForTurn(baseSystem: string, tools: ToolSet): string {
  if (baseSystem.includes("You are running inside a Think agent.")) return baseSystem;
  return `${baseSystem.trimEnd()}\n\n${this._buildThinkCapabilityBlock(tools)}`;
}

So the string actually sent to the model is frozenBase + per-turn capability block, and nothing tells the provider where the cacheable prefix ends.
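To make the missing boundary concrete: provider prompt caches match on identical leading content, so what matters is how much of the string is unchanged between turns. The sketch below measures that for two turns with different tool sets. `composeForTurn` is a hypothetical stand-in for the real composition, and the capability text is invented for illustration.

```typescript
// Length of the shared leading run of two strings, in UTF-16 code units.
function commonPrefixLength(a: string, b: string): number {
  const limit = Math.min(a.length, b.length);
  let i = 0;
  while (i < limit && a.charCodeAt(i) === b.charCodeAt(i)) i += 1;
  return i;
}

// Hypothetical stand-in for the per-turn composition:
// frozen base, blank line, then a capability block listing the tools.
function composeForTurn(base: string, toolNames: string[]): string {
  return `${base.trimEnd()}\n\nTools available: ${toolNames.join(", ")}`;
}

const base = "You are a helpful agent.\nMEMORY: user prefers short answers.";
const turn1 = composeForTurn(base, ["search", "read_file"]);
const turn2 = composeForTurn(base, ["search", "write_file"]);

// Everything up to the first differing tool name is identical across turns.
const shared = commonPrefixLength(turn1, turn2);
```

Here `shared` covers the whole base plus the start of the capability block, but only the caller knows that the stable region ends at `base.length`; that is the information a breakpoint would carry.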

The two coupled problems

  1. No way to express "cache this prefix" to the provider. Caching is the single biggest cost/latency lever for long agent loops, and today it's entirely a per-app providerOptions chore, re-implemented per provider (Anthropic ephemeral breakpoints, Bedrock, OpenAI implicit caching, Gemini).
  2. Frozen vs dynamic is all-or-nothing. If you freeze for cache stability, per-turn dynamic context (e.g. set_context, current time, retrieved memory) can't update the cached region until you call refreshSystemPrompt() — which throws away the cache. If you don't freeze, the prefix can drift every turn and never caches. There is no notion of a stable cached prefix + a small dynamic tail within the system prompt.

Design questions to settle

  • Where do breakpoints go? System prompt only, or also tools and the first N transcript messages? Anthropic allows up to 4 breakpoints; the highest-leverage layout for agent loops is usually [stable system] [tools] [stable history prefix].
  • Provider-agnostic abstraction. Should Think expose a declarative cache: { ttl, breakpoints } concept it lowers to each provider's providerOptions, or just document the manual path and offer a helper? How do we degrade on providers without explicit caching (OpenAI implicit, Gemini)?
  • Stable-prefix + dynamic-tail in the system prompt. Do we split the system prompt into a cached region (context blocks frozen) and an explicitly-dynamic region (per-turn) so the dynamic part lands after the cache breakpoint and doesn't bust it? The current capability suffix already implies this shape — it just isn't marked.
  • Interaction with compaction. refreshSystemPrompt() (called post-compaction) invalidates the frozen prefix and therefore the cache. Is that acceptable (compaction is rare), or do we want compaction to preserve the cached head?
  • Cache key drift from existing repair. Transcript repair (#1623) rewrites historical tool parts (orphan → errored, input normalization); any "cache the first N messages" breakpoint must account for the fact that history can be rewritten, which busts a message-prefix cache.
  • AIChatAgent. It assembles messages in the subclass onChatMessage, so any caching primitive there is necessarily a documented pattern, not framework-owned. Do we want parity, or is this Think-only?
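One way to reason about the breakpoint and history-rewrite questions together is to model the request as ordered segments and only place breakpoints while everything before them is stable. This is a sketch for discussion, not a proposal for the final API: `Segment`, `SegmentKind`, and `planBreakpoints` are hypothetical names, and the default budget of 4 is the Anthropic limit mentioned above.

```typescript
// Hypothetical segment model; none of these names exist in Think today.
type SegmentKind = "system" | "tools" | "history" | "dynamic";

interface Segment {
  kind: SegmentKind;
  stable: boolean; // true if the content is expected to be identical next turn
}

// Returns the indices of segments that should end with a cache breakpoint.
function planBreakpoints(segments: Segment[], maxBreakpoints = 4): number[] {
  if (maxBreakpoints <= 0) return [];
  // A prefix cache is only useful while every earlier segment is stable,
  // so stop at the first unstable one.
  const firstUnstable = segments.findIndex((s) => !s.stable);
  const stableCount = firstUnstable === -1 ? segments.length : firstUnstable;
  const indices = Array.from({ length: stableCount }, (_, i) => i);
  // With more stable segments than markers, keep the latest ones:
  // a later breakpoint covers a longer prefix.
  return indices.slice(-maxBreakpoints);
}

const layout: Segment[] = [
  { kind: "system", stable: true },
  { kind: "tools", stable: true },
  { kind: "history", stable: true },
  { kind: "dynamic", stable: false },
];

const plan = planBreakpoints(layout);
```

For this layout the plan is `[0, 1, 2]`. If transcript repair rewrote history this turn, the caller would mark the history segment as unstable and the plan would shrink to `[0, 1]`, which keeps the system and tools caches intact.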

Tentative direction (for discussion, not a decision)

  • Add a declarative caching surface on Think / Session (e.g. mark context blocks or regions as cacheable, and a cache option that lowers to providerOptions breakpoints per provider).
  • Formalize the cacheable prefix + dynamic suffix split that _systemPromptForTurn already hints at, so the dynamic part is always after the last cache breakpoint.
  • Keep it opt-in and provider-aware; no-op cleanly where the provider has no explicit cache control.
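As a starting point for discussion, lowering a declarative option to provider-specific markers could look like the sketch below. Everything here is an assumption: `CacheOption`, `Provider`, and `lowerCacheOption` are hypothetical, the Anthropic and Bedrock shapes follow what the AI SDK providers are understood to accept (`cacheControl` and `cachePoint`), and OpenAI and Gemini are treated as no-ops, following the framing above.

```typescript
// Hypothetical declarative option; the field names are illustrative only.
interface CacheOption {
  ttl?: "5m" | "1h";
}

type Provider = "anthropic" | "bedrock" | "openai" | "google";
type ProviderOptions = Record<string, Record<string, unknown>>;

// Lower the declarative option to a per-provider marker, or undefined when
// the provider has no explicit cache control to set.
function lowerCacheOption(
  provider: Provider,
  cache: CacheOption,
): ProviderOptions | undefined {
  switch (provider) {
    case "anthropic":
      // Assumed shape: the AI SDK Anthropic provider's cacheControl option.
      return {
        anthropic: {
          cacheControl: {
            type: "ephemeral",
            ...(cache.ttl ? { ttl: cache.ttl } : {}),
          },
        },
      };
    case "bedrock":
      // Assumed shape: the AI SDK Bedrock provider's cachePoint option.
      return { bedrock: { cachePoint: { type: "default" } } };
    case "openai":
    case "google":
      // No marker is emitted; implicit caching only needs a stable prefix.
      return undefined;
  }
}
```

Returning `undefined` rather than throwing keeps the option safe to leave on when an app switches providers, which matches the "no-op cleanly" goal.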

Scope / non-goals

References

  • packages/agents/src/experimental/memory/session/context.ts: toSystemPrompt / freezeSystemPrompt / getSystemPromptForEstimate / refreshSystemPrompt
  • packages/think/src/think.ts: _systemPromptForTurn / _buildThinkCapabilityBlock, turn assembly, TurnConfig.providerOptions
  • packages/agents/src/experimental/memory/session/session.ts: withCachedPrompt
