Skip to content

fix(provider): prompt-cache breakpoints — double rolling tail + tool-definition marker#1213

Merged
wqymi merged 5 commits into
mainfrom
vb/9ac0-cache-break-poin
Jun 23, 2026
Merged

fix(provider): prompt-cache breakpoints — double rolling tail + tool-definition marker#1213
wqymi merged 5 commits into
mainfrom
vb/9ac0-cache-break-poin

Conversation

@wqymi

@wqymi wqymi commented Jun 22, 2026

Copy link
Copy Markdown
Collaborator

Summary

Fixes the low prompt-cache hit rate by correcting where message-level cache breakpoints land, and adds the missing tool-definition breakpoint. Cross-checked against cc-haha source, openclacky's apply_message_caching, upstream opencode PR #26786 (packages/llm/src/cache-policy.ts), and Anthropic's official prompt-caching docs.

Commits (cache breakpoint work):

  1. pin prompt-cache breakpoints to rolling head — drop the drifting midpoint + before-last-user (by-index) markers that never tracked the request tail.
  2. double-marker rolling tail + tool-definition cache breakpoint — replace the single tail marker with a double rolling tail, and add a tool-definition breakpoint.
  3. skip assistant turns... then revert(provider): drop role-based skip — net no role filtering; the double-tail selects the last two messages by position.

Why the message-level change

Anthropic prompt caching is longest-common-prefix based with a backward lookback window (~20 blocks) from each breakpoint.

  • A single tail marker only hits when a turn appends < 20 content blocks; agentic turns with many tool calls routinely exceed that → full-history miss.
  • The decisive win is message removal: on a tool-call retry, Ctrl-C, or the user editing/deleting their latest message, a lone tail marker vanishes with that message — and how much surrounding prefix the provider then evicts depends on the upstream KV-cache implementation. A next-to-last marker is a still-present write the next lookback can land on, bounding the worst case to "recompute the removed message" instead of "recompute the whole history".
  • Cost is ~equal to a single marker: the two adjacent breakpoints write roughly the same incremental bytes (split in two), and a cache hit never rewrites.

What changed

Message-level — applyCaching (transform.ts)

  • Mark the last two non-system messages (rolling double buffer), by position. Mirrors openclacky's double-marker and cc's skipCacheWrite (which shifts the marker to the next-to-last message for the same reason).
  • No role-based filtering: protocol gating already lives in supportsCacheMarkers (plain @ai-sdk/openai / @ai-sdk/openai-compatible never reach applyCaching, mirroring upstream's RESPECTS_INLINE_HINTS whitelist). The providers that do reach it (anthropic, bedrock, openrouter/copilot/alibaba on Claude) all honor message-level markers including on assistant turns.

Tools — new ProviderTransform.tools() (transform.ts + llm.ts)

  • Cache hierarchy is tools → system → messages, so marking the last tool caches the whole tool-schema block as a stable prefix. Tools bypass message()'s providerID→SDK-key remap, so the marker is written under the SDK key directly (anthropic cacheControl / bedrock cachePoint) in llm.ts before streamText, after any _noop stub so it lands on the real last tool.

Notes on provider behavior (for reviewers)

  • providerOptions is namespaced per provider; a provider reads only its own namespace. For protocols that don't consume cache_control (plain OpenAI), user and assistant markers are equally absent from the payload — there is no assistant-specific "silent drop". openai-compatible cache hits come from the server's implicit prefix caching, independent of our markers.

Verification

  • bun test test/provider/transform.test.ts155 pass / 0 fail (multi-turn double-tail assertion, content-level provider marks last two by role-agnostic position, + 5 tools-breakpoint cases: anthropic/bedrock shapes, 1h TTL, unsupported providers, empty tools).
  • bun typecheck → clean.

Doc

  • internal/docs/cache-policy.md documents Fix C (double rolling tail, removal/KV-eviction rationale, protocol gating via supportsCacheMarkers) + Fix D (tools breakpoint). It's gitignored, so not in the diff.

@wqymi wqymi force-pushed the vb/9ac0-cache-break-poin branch from 1acb148 to 21c4ab3 Compare June 22, 2026 14:36
wqymi added 5 commits June 23, 2026 14:22
The applyCaching message-level strategy placed breakpoints at a drifting
midpoint and before-last-user, positions that shift every turn and never
cache the request tail. Since prompt caching is longest-common-prefix
based, only a marker pinned to the end grows the cached prefix each turn,
so the old markers spent the budget without improving hit rate (and often
landed on assistant/tool messages dropped by openai-compatible proxies).

Replace them with a stable two-breakpoint scheme: last system message +
last message (rolling head), mirroring upstream cc. Add a multi-turn test
asserting only those two positions are marked.
…akpoint

A single tail breakpoint only hits cache when a turn appends fewer than 20
content blocks (Anthropic's backward lookback window). Agentic turns with
multiple tool calls routinely exceed that, pushing the prior write out of the
window — and on openai-compatible proxies a lone marker landing on an assistant
message is silently dropped. Both break the message cache.

Mark the last TWO non-system messages instead (rolling double buffer): the
prior turn's tail marker survives as the read point while the new tail is the
next write, and the second marker also survives single-step tool-call retries
and user-initiated session forks. This mirrors openclacky's double-marker
strategy for proxied Claude.

Also add ProviderTransform.tools(): the cache hierarchy is tools -> system ->
messages, so marking the last tool caches the whole tool-schema block as a
stable prefix. Tools bypass message()'s providerID->SDK-key remap, so the
marker is written under the SDK key (anthropic cacheControl / bedrock
cachePoint) directly in llm.ts before streamText.

Update the multi-turn test to assert the two tail markers and add coverage for
the tools breakpoint (anthropic/bedrock shapes, 1h TTL, unsupported providers).
…roxy path

The double rolling-tail selection used a blind last-2 (nonSystem.slice(-2)).
On openai-compatible / content-level providers, cache_control on an assistant
message is silently dropped, so when the tail is [..., user, assistant] one of
the two markers evaporates and the double buffer collapses to a single marker —
re-triggering the original low-hit-rate bug.

Select the last 2 cacheable messages by role: Anthropic/Bedrock honor
message-level assistant markers so take the last 2 outright; for everyone else
skip assistant turns and take the last 2 user/tool messages so both markers
survive. Tool results are role:"tool" after conversion and remain cacheable.

Add a content-level (OpenRouter) test asserting only the user turns are marked.
The previous commit skipped assistant turns when selecting the two tail
breakpoints, on the theory that openai-compatible proxies silently drop
cache_control on assistant messages. That premise doesn't hold:

- providerOptions is namespaced per provider; a provider only reads its own
  namespace, so for protocols that don't consume cache_control (plain OpenAI),
  user AND assistant markers are equally absent from the payload — not an
  assistant-specific drop.
- supportsCacheMarkers already returns false for @ai-sdk/openai and
  @ai-sdk/openai-compatible, so those never reach applyCaching at all (the
  protocol whitelist already exists, mirroring upstream RESPECTS_INLINE_HINTS).
- The providers that do reach applyCaching (anthropic, bedrock, openrouter/
  copilot/alibaba on Claude) all honor message-level assistant markers, so
  skipping assistant just discards a valid breakpoint.

Select the last two messages by position again. Reframe the "why two" comment
around the real win: when the last message is removed (retry, Ctrl-C, user
edit/delete), the next-to-last marker is a still-present write the lookback
can land on, bounding eviction to the removed message rather than the whole
history. Cost is ~equal to a single marker (adjacent writes, no rewrite on hit).
…tools

applyCaching and tools() each built the per-provider cache-control marker
independently, and tools() diverged: it wrote { copilot: { cacheControl } }
where the copilot SDK actually reads copilot_cache_control, so the tool
breakpoint was silently ineffective on github-copilot.

Extract cacheMarkerOptions() as the single source of truth for the per-provider
marker shapes (anthropic/openrouter cacheControl, bedrock cachePoint,
openaiCompatible cache_control, copilot copilot_cache_control, alibaba). message()
keeps attaching the full object and remapping; tools() resolves one SDK-keyed
namespace via cacheMarkerFor(). Add a copilot tools test locking in the shape.
@wqymi wqymi force-pushed the vb/9ac0-cache-break-poin branch from fe25aeb to 2588478 Compare June 23, 2026 06:26
@wqymi wqymi merged commit 7793ccc into main Jun 23, 2026
2 of 3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant