fix(provider): prompt-cache breakpoints — double rolling tail + tool-definition marker#1213
Merged
Conversation
1acb148 to
21c4ab3
Compare
The applyCaching message-level strategy placed breakpoints at a drifting midpoint and before-last-user, positions that shift every turn and never cache the request tail. Since prompt caching is longest-common-prefix based, only a marker pinned to the end grows the cached prefix each turn, so the old markers spent the budget without improving hit rate (and often landed on assistant/tool messages dropped by openai-compatible proxies). Replace them with a stable two-breakpoint scheme: last system message + last message (rolling head), mirroring upstream cc. Add a multi-turn test asserting only those two positions are marked.
…akpoint A single tail breakpoint only hits cache when a turn appends fewer than 20 content blocks (Anthropic's backward lookback window). Agentic turns with multiple tool calls routinely exceed that, pushing the prior write out of the window — and on openai-compatible proxies a lone marker landing on an assistant message is silently dropped. Both break the message cache. Mark the last TWO non-system messages instead (rolling double buffer): the prior turn's tail marker survives as the read point while the new tail is the next write, and the second marker also survives single-step tool-call retries and user-initiated session forks. This mirrors openclacky's double-marker strategy for proxied Claude. Also add ProviderTransform.tools(): the cache hierarchy is tools -> system -> messages, so marking the last tool caches the whole tool-schema block as a stable prefix. Tools bypass message()'s providerID->SDK-key remap, so the marker is written under the SDK key (anthropic cacheControl / bedrock cachePoint) directly in llm.ts before streamText. Update the multi-turn test to assert the two tail markers and add coverage for the tools breakpoint (anthropic/bedrock shapes, 1h TTL, unsupported providers).
…roxy path The double rolling-tail selection used a blind last-2 (nonSystem.slice(-2)). On openai-compatible / content-level providers, cache_control on an assistant message is silently dropped, so when the tail is [..., user, assistant] one of the two markers evaporates and the double buffer collapses to a single marker — re-triggering the original low-hit-rate bug. Select the last 2 cacheable messages by role: Anthropic/Bedrock honor message-level assistant markers so take the last 2 outright; for everyone else skip assistant turns and take the last 2 user/tool messages so both markers survive. Tool results are role:"tool" after conversion and remain cacheable. Add a content-level (OpenRouter) test asserting only the user turns are marked.
The previous commit skipped assistant turns when selecting the two tail breakpoints, on the theory that openai-compatible proxies silently drop cache_control on assistant messages. That premise doesn't hold: - providerOptions is namespaced per provider; a provider only reads its own namespace, so for protocols that don't consume cache_control (plain OpenAI), user AND assistant markers are equally absent from the payload — not an assistant-specific drop. - supportsCacheMarkers already returns false for @ai-sdk/openai and @ai-sdk/openai-compatible, so those never reach applyCaching at all (the protocol whitelist already exists, mirroring upstream RESPECTS_INLINE_HINTS). - The providers that do reach applyCaching (anthropic, bedrock, openrouter/ copilot/alibaba on Claude) all honor message-level assistant markers, so skipping assistant just discards a valid breakpoint. Select the last two messages by position again. Reframe the "why two" comment around the real win: when the last message is removed (retry, Ctrl-C, user edit/delete), the next-to-last marker is a still-present write the lookback can land on, bounding eviction to the removed message rather than the whole history. Cost is ~equal to a single marker (adjacent writes, no rewrite on hit).
…tools
applyCaching and tools() each built the per-provider cache-control marker
independently, and tools() diverged: it wrote { copilot: { cacheControl } }
where the copilot SDK actually reads copilot_cache_control, so the tool
breakpoint was silently ineffective on github-copilot.
Extract cacheMarkerOptions() as the single source of truth for the per-provider
marker shapes (anthropic/openrouter cacheControl, bedrock cachePoint,
openaiCompatible cache_control, copilot copilot_cache_control, alibaba). message()
keeps attaching the full object and remapping; tools() resolves one SDK-keyed
namespace via cacheMarkerFor(). Add a copilot tools test locking in the shape.
fe25aeb to
2588478
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes the low prompt-cache hit rate by correcting where message-level cache breakpoints land, and adds the missing tool-definition breakpoint. Cross-checked against cc-haha source, openclacky's
apply_message_caching, upstream opencode PR #26786 (packages/llm/src/cache-policy.ts), and Anthropic's official prompt-caching docs.Commits (cache breakpoint work):
pin prompt-cache breakpoints to rolling head— drop the driftingmidpoint+before-last-user(by-index) markers that never tracked the request tail.double-marker rolling tail + tool-definition cache breakpoint— replace the single tail marker with a double rolling tail, and add a tool-definition breakpoint.skip assistant turns...thenrevert(provider): drop role-based skip— net no role filtering; the double-tail selects the last two messages by position.Why the message-level change
Anthropic prompt caching is longest-common-prefix based with a backward lookback window (~20 blocks) from each breakpoint.
What changed
Message-level —
applyCaching(transform.ts)skipCacheWrite(which shifts the marker to the next-to-last message for the same reason).supportsCacheMarkers(plain@ai-sdk/openai/@ai-sdk/openai-compatiblenever reachapplyCaching, mirroring upstream'sRESPECTS_INLINE_HINTSwhitelist). The providers that do reach it (anthropic, bedrock, openrouter/copilot/alibaba on Claude) all honor message-level markers including on assistant turns.Tools — new
ProviderTransform.tools()(transform.ts + llm.ts)tools → system → messages, so marking the last tool caches the whole tool-schema block as a stable prefix. Tools bypassmessage()'s providerID→SDK-key remap, so the marker is written under the SDK key directly (anthropiccacheControl/ bedrockcachePoint) inllm.tsbeforestreamText, after any_noopstub so it lands on the real last tool.Notes on provider behavior (for reviewers)
providerOptionsis namespaced per provider; a provider reads only its own namespace. For protocols that don't consumecache_control(plain OpenAI), user and assistant markers are equally absent from the payload — there is no assistant-specific "silent drop". openai-compatible cache hits come from the server's implicit prefix caching, independent of our markers.Verification
bun test test/provider/transform.test.ts→ 155 pass / 0 fail (multi-turn double-tail assertion, content-level provider marks last two by role-agnostic position, + 5 tools-breakpoint cases: anthropic/bedrock shapes, 1h TTL, unsupported providers, empty tools).bun typecheck→ clean.Doc
internal/docs/cache-policy.mddocuments Fix C (double rolling tail, removal/KV-eviction rationale, protocol gating via supportsCacheMarkers) + Fix D (tools breakpoint). It's gitignored, so not in the diff.