discussion: first-class prompt caching + a cacheable-but-dynamic system prefix for Think #1624

## Summary

Think (and the `agents` Session memory layer) has **no first-class prompt caching**, and its one caching-adjacent primitive, `withCachedPrompt()` / `freezeSystemPrompt()`, is in tension with dynamic per-turn context. This is the last item in a customer's hand-rolled patch set that the framework does not yet own; the recovery, transcript-integrity, compaction, and sub-agent-signal items are addressed in #1623. Filing this as a discussion to settle the design before implementing, because it crosses the provider boundary and interacts with compaction and the per-turn system suffix.

Related: #1593 (compaction token counting), #1623 (deploy-churn hardening).

## Current state (what the framework does today)

**1. No provider cache breakpoints anywhere in core.** There is no `cacheControl` / `cache_control` wiring in `packages/agents` or `packages/think`. Apps that want Anthropic/Bedrock prompt caching must set `providerOptions` themselves on every turn (Think forwards `TurnConfig.providerOptions` but adds no cache defaults). So on providers that require explicit breakpoints, even a perfectly stable system prefix is **not** cached unless the app hand-marks it.
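For reference, the manual path looks roughly like the sketch below. It assumes the AI SDK convention of attaching `providerOptions.anthropic.cacheControl` to a message; `withAnthropicCacheBreakpoint` and the local `SystemMessage` type are hypothetical stand-ins so the snippet runs without importing an SDK, and nothing here is Think API.

```ts
// Local stand-ins so the sketch runs without importing an SDK.
type ProviderOptions = Record<string, Record<string, unknown>>;

interface SystemMessage {
  role: "system";
  content: string;
  providerOptions?: ProviderOptions;
}

// Hypothetical helper (not part of Think): marks the end of a cacheable
// prefix using an Anthropic-style cacheControl option.
function withAnthropicCacheBreakpoint(content: string): SystemMessage {
  return {
    role: "system",
    content,
    providerOptions: {
      anthropic: { cacheControl: { type: "ephemeral" } },
    },
  };
}

// Today the app has to build this itself on every turn; Think only forwards it.
const markedSystem = withAnthropicCacheBreakpoint(
  "You are a support agent for Acme."
);
```

Every app that wants caching ends up carrying a helper like this, once per provider it targets.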

**2. `withCachedPrompt()` / `freezeSystemPrompt()` is opt-in and freezes at first use.**

- `Session.withCachedPrompt()` is a builder opt-in; `Think.configureSession()` returns the session unchanged by default.
- `freezeSystemPrompt()` renders the context blocks once and persists the string to the prompt store; subsequent turns return the stored value verbatim (`packages/agents/src/experimental/memory/session/context.ts`):

```ts
async freezeSystemPrompt(): Promise<string> {
  if (this.promptStore) {
    const stored = await this.promptStore.get();
    if (stored !== null) return stored;          // frozen — never re-rendered
  }
  if (!this.loaded) await this.load();
  const prompt = this.toSystemPrompt();
  if (this.promptStore) await this.promptStore.set(prompt);
  return prompt;
}
```

- The in-memory snapshot has the same freeze semantics (`toSystemPrompt()` caches the first render; only `refreshSnapshot()` re-renders).
- The frozen prefix is only invalidated by `refreshSystemPrompt()` (which compaction calls after a successful summarize).
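The freeze semantics in the bullets above can be modeled in a few lines. This is a minimal in-memory sketch, not the real session context class: `FrozenPromptSketch`, `setBlock`, and the block rendering format are hypothetical, and only the freeze/refresh behavior mirrors the description.

```ts
// Hypothetical in-memory model of the freeze semantics described above.
// It is not the real session context class; names are illustrative.
class FrozenPromptSketch {
  private blocks = new Map<string, string>();
  private stored: string | null = null;

  setBlock(label: string, value: string): void {
    this.blocks.set(label, value);
  }

  // Renders every block in insertion order.
  private render(): string {
    const lines: string[] = [];
    this.blocks.forEach((value, label) => lines.push(`${label}: ${value}`));
    return lines.join("\n");
  }

  // First call renders and stores; later calls return the stored string.
  freezeSystemPrompt(): string {
    if (this.stored === null) this.stored = this.render();
    return this.stored;
  }

  // Re-renders from the current blocks and replaces the stored string.
  refreshSystemPrompt(): string {
    this.stored = this.render();
    return this.stored;
  }
}

const ctx = new FrozenPromptSketch();
ctx.setBlock("persona", "helpful assistant");
ctx.setBlock("time", "09:00");

const firstTurn = ctx.freezeSystemPrompt();
ctx.setBlock("time", "09:05"); // dynamic context changes
const secondTurn = ctx.freezeSystemPrompt(); // still the 09:00 render
const afterRefresh = ctx.refreshSystemPrompt(); // picks up 09:05
```

The second turn sees stale context, and the only way to pick up the change is a refresh that replaces the whole frozen string.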

**3. The per-turn system suffix sits after the frozen base, with no cache boundary.** `Think._systemPromptForTurn()` appends a capability block reflecting the *current* tool set after the (possibly frozen) base:

```ts
private _systemPromptForTurn(baseSystem: string, tools: ToolSet): string {
  if (baseSystem.includes("You are running inside a Think agent.")) return baseSystem;
  return `${baseSystem.trimEnd()}\n\n${this._buildThinkCapabilityBlock(tools)}`;
}
```

So the string actually sent to the model is `frozenBase + per-turn capability block`, and nothing tells the provider where the cacheable prefix ends.
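To make the missing boundary concrete, the sketch below assembles the prompt for two turns with different tool sets and measures how much of the string they share. The capability block text and the helper names (`capabilityBlock`, `systemPromptForTurn`, `commonPrefixLength`) are assumptions for illustration; only the `base + "\n\n" + block` shape is taken from the snippet above.

```ts
// Hypothetical capability block; the real text is built by Think.
function capabilityBlock(toolNames: string[]): string {
  return `You are running inside a Think agent.\nAvailable tools: ${toolNames.join(", ")}`;
}

// Same shape as the snippet above: frozen base, blank line, per-turn block.
function systemPromptForTurn(base: string, toolNames: string[]): string {
  return `${base.trimEnd()}\n\n${capabilityBlock(toolNames)}`;
}

// Number of leading characters two strings have in common.
function commonPrefixLength(a: string, b: string): number {
  const limit = Math.min(a.length, b.length);
  let i = 0;
  while (i < limit && a[i] === b[i]) i += 1;
  return i;
}

const frozenBase = "You are a support agent for Acme.";
const turn1 = systemPromptForTurn(frozenBase, ["search", "read_file"]);
const turn2 = systemPromptForTurn(frozenBase, ["search", "write_file"]);

// The shared region covers the frozen base but stops inside the capability
// block, at a position the provider is never told about.
const shared = commonPrefixLength(turn1, turn2);
```

The stable region exists in practice; what is missing is a marker at (or before) the end of the frozen base.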

## The two coupled problems

1. **No way to express "cache this prefix" to the provider.** Caching is typically the largest cost/latency lever for long agent loops, yet today it is entirely a per-app `providerOptions` chore, re-implemented for each provider's mechanism (Anthropic ephemeral breakpoints, Bedrock, OpenAI implicit caching, Gemini).
2. **Frozen vs dynamic is all-or-nothing.** If you freeze for cache stability, per-turn dynamic context (e.g. `set_context`, current time, retrieved memory) can't update the cached region until you call `refreshSystemPrompt()` — which throws away the cache. If you don't freeze, the prefix can drift every turn and never caches. There is no notion of a **stable cached prefix + a small dynamic tail** within the system prompt.
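One way to picture the missing middle ground is a system prompt sent as two parts, with the breakpoint on the first. This is a sketch under assumptions: it presumes the provider accepts more than one system message and an Anthropic-style `cacheControl` option, and `buildSystemMessages` is a hypothetical helper rather than anything in Think today.

```ts
type ProviderOptions = Record<string, Record<string, unknown>>;

interface SystemMessage {
  role: "system";
  content: string;
  providerOptions?: ProviderOptions;
}

// Hypothetical: splits the system prompt into a cached head and an unmarked tail.
function buildSystemMessages(
  stablePrefix: string,
  dynamicTail: string
): SystemMessage[] {
  const head: SystemMessage = {
    role: "system",
    content: stablePrefix,
    // Breakpoint sits at the end of the stable region (Anthropic-style option).
    providerOptions: { anthropic: { cacheControl: { type: "ephemeral" } } },
  };
  // An empty tail is dropped so the request does not carry a blank message.
  if (dynamicTail.trim() === "") return [head];
  return [head, { role: "system", content: dynamicTail }];
}

const stable = "You are a support agent for Acme.";
const turnA = buildSystemMessages(stable, "Current time: 09:00");
const turnB = buildSystemMessages(stable, "Current time: 09:05");
// turnA[0] and turnB[0] are identical, so the cached head can be reused;
// only the unmarked tail differs between the two turns.
```

With this shape, per-turn context can change without touching the bytes before the breakpoint.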

## Design questions to settle

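- **Where do breakpoints go?** System prompt only, or also tools and the first N transcript messages? Anthropic allows up to 4 breakpoints per request and builds the cached prefix in the order tools, system, messages, so the highest-leverage layout for agent loops is usually `[tools] [stable system] [stable history prefix]`, and any change to the tool set invalidates everything cached after it.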
- **Provider-agnostic abstraction.** Should Think expose a declarative `cache: { ttl, breakpoints }` concept it lowers to each provider's `providerOptions`, or just document the manual path and offer a helper? How do we degrade on providers without explicit caching (OpenAI implicit, Gemini)?
- **Stable-prefix + dynamic-tail in the system prompt.** Do we split the system prompt into a cached region (context blocks frozen) and an explicitly-dynamic region (per-turn) so the dynamic part lands *after* the cache breakpoint and doesn't bust it? The current capability suffix already implies this shape — it just isn't marked.
- **Interaction with compaction.** `refreshSystemPrompt()` (called post-compaction) invalidates the frozen prefix and therefore the cache. Is that acceptable (compaction is rare), or do we want compaction to preserve the cached head?
- **Cache key drift from existing repair.** Transcript repair (#1623) rewrites historical tool parts (orphan → errored, input normalization); any "cache the first N messages" breakpoint must account for the fact that history can be rewritten, which busts a message-prefix cache.
- **AIChatAgent.** It assembles messages in the subclass `onChatMessage`, so any caching primitive there is necessarily a documented pattern, not framework-owned. Do we want parity, or is this Think-only?
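On the cache-key-drift question above, the effect of transcript repair on a message-prefix breakpoint can be checked mechanically. The sketch below is illustrative only: `TranscriptMessage` is a simplified hypothetical shape, and `unchangedPrefixCount` is not an existing helper. It counts how many leading messages survive a rewrite byte-for-byte.

```ts
// Simplified, hypothetical message shape for illustration.
interface TranscriptMessage {
  id: string;
  role: "user" | "assistant" | "tool";
  content: string;
}

// Counts leading messages that serialize identically in both versions.
// Relies on both versions being built with the same key order.
function unchangedPrefixCount(
  before: TranscriptMessage[],
  after: TranscriptMessage[]
): number {
  const limit = Math.min(before.length, after.length);
  let count = 0;
  while (
    count < limit &&
    JSON.stringify(before[count]) === JSON.stringify(after[count])
  ) {
    count += 1;
  }
  return count;
}

const beforeRepair: TranscriptMessage[] = [
  { id: "m1", role: "user", content: "Find the invoice." },
  { id: "m2", role: "tool", content: "search: (no result recorded)" },
  { id: "m3", role: "user", content: "Any luck?" },
];

// Repair rewrites the orphaned tool part into an errored one.
const afterRepair: TranscriptMessage[] = beforeRepair.map((message) =>
  message.id === "m2"
    ? { ...message, content: "search: errored (orphaned call)" }
    : message
);

// Only the first message is still byte-identical, so a breakpoint placed
// after the third message would miss the cache on the next request.
const stillCacheable = unchangedPrefixCount(beforeRepair, afterRepair);
```

A framework-owned breakpoint would need either to sit before any message that repair can touch, or to accept a cache miss on the turn after a repair.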

## Tentative direction (for discussion, not a decision)

- Add a declarative caching surface on Think / Session (e.g. mark context blocks or regions as `cacheable`, and a `cache` option that lowers to `providerOptions` breakpoints per provider).
- Formalize the **cacheable prefix + dynamic suffix** split that `_systemPromptForTurn` already hints at, so the dynamic part is always after the last cache breakpoint.
- Keep it opt-in and provider-aware; no-op cleanly where the provider has no explicit cache control.
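As a strawman for the first and third bullets, a declarative option could be lowered per provider roughly as follows. Everything here is hypothetical: `CacheOption`, `ProviderId`, and `lowerCacheOption` are not existing Think APIs, and the emitted shapes (Anthropic `cacheControl`, Bedrock `cachePoint`) are assumptions modeled on AI SDK provider options that would need checking against the provider packages.

```ts
type ProviderOptions = Record<string, Record<string, unknown>>;
type ProviderId = "anthropic" | "bedrock" | "openai" | "google";

// Hypothetical declarative option; the field names are not an existing Think API.
interface CacheOption {
  ttl?: "5m" | "1h";
}

// Lowers the declarative option to provider-specific options, or to
// undefined when there is nothing explicit to emit.
function lowerCacheOption(
  provider: ProviderId,
  cache?: CacheOption
): ProviderOptions | undefined {
  if (!cache) return undefined; // caching stays opt-in

  if (provider === "anthropic") {
    const cacheControl: Record<string, unknown> = { type: "ephemeral" };
    if (cache.ttl) cacheControl.ttl = cache.ttl;
    return { anthropic: { cacheControl } };
  }

  if (provider === "bedrock") {
    // Assumed shape for Bedrock cache points.
    return { bedrock: { cachePoint: { type: "default" } } };
  }

  // OpenAI and Google: no explicit breakpoint is emitted here, so this is a
  // clean no-op and any implicit provider-side caching is left alone.
  return undefined;
}

const anthropicOptions = lowerCacheOption("anthropic", { ttl: "1h" });
const openaiOptions = lowerCacheOption("openai", { ttl: "1h" });
const optedOut = lowerCacheOption("anthropic");
```

Returning `undefined` rather than an empty object keeps the opted-out and unsupported cases indistinguishable from today's behavior.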

## Scope / non-goals

- Not trying to auto-detect cacheability or auto-tune TTLs in v1.
- Not changing default behavior — caching stays opt-in.
- Token-counting/compaction accuracy is out of scope (handled by #1593).

## References

- `packages/agents/src/experimental/memory/session/context.ts` — `toSystemPrompt` / `freezeSystemPrompt` / `getSystemPromptForEstimate` / `refreshSystemPrompt`
- `packages/think/src/think.ts` — `_systemPromptForTurn` / `_buildThinkCapabilityBlock`, turn assembly, `TurnConfig.providerOptions`
- `packages/agents/src/experimental/memory/session/session.ts` — `withCachedPrompt`
