fix(provider): prompt-cache breakpoints — double rolling tail + tool-definition marker by wqymi · Pull Request #1213 · XiaomiMiMo/MiMo-Code

wqymi · 2026-06-22T14:26:48Z

Summary

Fixes the low prompt-cache hit rate by correcting where message-level cache breakpoints land, and adds the missing tool-definition breakpoint. Cross-checked against cc-haha source, openclacky's apply_message_caching, upstream opencode PR #26786 (packages/llm/src/cache-policy.ts), and Anthropic's official prompt-caching docs.

Commits (cache breakpoint work):

pin prompt-cache breakpoints to rolling head — drop the drifting midpoint + before-last-user (by-index) markers that never tracked the request tail.
double-marker rolling tail + tool-definition cache breakpoint — replace the single tail marker with a double rolling tail, and add a tool-definition breakpoint.
skip assistant turns... then revert(provider): drop role-based skip — net no role filtering; the double-tail selects the last two messages by position.

Why the message-level change

Anthropic prompt caching is longest-common-prefix based with a backward lookback window (~20 blocks) from each breakpoint.

A single tail marker only hits when a turn appends < 20 content blocks; agentic turns with many tool calls routinely exceed that → full-history miss.
The decisive win is message removal: on a tool-call retry, Ctrl-C, or the user editing/deleting their latest message, a lone tail marker vanishes with that message — and how much surrounding prefix the provider then evicts depends on the upstream KV-cache implementation. A next-to-last marker is a still-present write the next lookback can land on, bounding the worst case to "recompute the removed message" instead of "recompute the whole history".
Cost is ~equal to a single marker: the two adjacent breakpoints write roughly the same incremental bytes (split in two), and a cache hit never rewrites.

What changed

Message-level — applyCaching (transform.ts)

Mark the last two non-system messages (rolling double buffer), by position. Mirrors openclacky's double-marker and cc's skipCacheWrite (which shifts the marker to the next-to-last message for the same reason).
No role-based filtering: protocol gating already lives in supportsCacheMarkers (plain @ai-sdk/openai / @ai-sdk/openai-compatible never reach applyCaching, mirroring upstream's RESPECTS_INLINE_HINTS whitelist). The providers that do reach it (anthropic, bedrock, openrouter/copilot/alibaba on Claude) all honor message-level markers including on assistant turns.

Tools — new ProviderTransform.tools() (transform.ts + llm.ts)

Cache hierarchy is tools → system → messages, so marking the last tool caches the whole tool-schema block as a stable prefix. Tools bypass message()'s providerID→SDK-key remap, so the marker is written under the SDK key directly (anthropic cacheControl / bedrock cachePoint) in llm.ts before streamText, after any _noop stub so it lands on the real last tool.

Notes on provider behavior (for reviewers)

providerOptions is namespaced per provider; a provider reads only its own namespace. For protocols that don't consume cache_control (plain OpenAI), user and assistant markers are equally absent from the payload — there is no assistant-specific "silent drop". openai-compatible cache hits come from the server's implicit prefix caching, independent of our markers.

Verification

bun test test/provider/transform.test.ts → 155 pass / 0 fail (multi-turn double-tail assertion, content-level provider marks last two by role-agnostic position, + 5 tools-breakpoint cases: anthropic/bedrock shapes, 1h TTL, unsupported providers, empty tools).
bun typecheck → clean.

Doc

internal/docs/cache-policy.md documents Fix C (double rolling tail, removal/KV-eviction rationale, protocol gating via supportsCacheMarkers) + Fix D (tools breakpoint). It's gitignored, so not in the diff.

The applyCaching message-level strategy placed breakpoints at a drifting midpoint and before-last-user, positions that shift every turn and never cache the request tail. Since prompt caching is longest-common-prefix based, only a marker pinned to the end grows the cached prefix each turn, so the old markers spent the budget without improving hit rate (and often landed on assistant/tool messages dropped by openai-compatible proxies). Replace them with a stable two-breakpoint scheme: last system message + last message (rolling head), mirroring upstream cc. Add a multi-turn test asserting only those two positions are marked.

…akpoint A single tail breakpoint only hits cache when a turn appends fewer than 20 content blocks (Anthropic's backward lookback window). Agentic turns with multiple tool calls routinely exceed that, pushing the prior write out of the window — and on openai-compatible proxies a lone marker landing on an assistant message is silently dropped. Both break the message cache. Mark the last TWO non-system messages instead (rolling double buffer): the prior turn's tail marker survives as the read point while the new tail is the next write, and the second marker also survives single-step tool-call retries and user-initiated session forks. This mirrors openclacky's double-marker strategy for proxied Claude. Also add ProviderTransform.tools(): the cache hierarchy is tools -> system -> messages, so marking the last tool caches the whole tool-schema block as a stable prefix. Tools bypass message()'s providerID->SDK-key remap, so the marker is written under the SDK key (anthropic cacheControl / bedrock cachePoint) directly in llm.ts before streamText. Update the multi-turn test to assert the two tail markers and add coverage for the tools breakpoint (anthropic/bedrock shapes, 1h TTL, unsupported providers).

…roxy path The double rolling-tail selection used a blind last-2 (nonSystem.slice(-2)). On openai-compatible / content-level providers, cache_control on an assistant message is silently dropped, so when the tail is [..., user, assistant] one of the two markers evaporates and the double buffer collapses to a single marker — re-triggering the original low-hit-rate bug. Select the last 2 cacheable messages by role: Anthropic/Bedrock honor message-level assistant markers so take the last 2 outright; for everyone else skip assistant turns and take the last 2 user/tool messages so both markers survive. Tool results are role:"tool" after conversion and remain cacheable. Add a content-level (OpenRouter) test asserting only the user turns are marked.

The previous commit skipped assistant turns when selecting the two tail breakpoints, on the theory that openai-compatible proxies silently drop cache_control on assistant messages. That premise doesn't hold: - providerOptions is namespaced per provider; a provider only reads its own namespace, so for protocols that don't consume cache_control (plain OpenAI), user AND assistant markers are equally absent from the payload — not an assistant-specific drop. - supportsCacheMarkers already returns false for @ai-sdk/openai and @ai-sdk/openai-compatible, so those never reach applyCaching at all (the protocol whitelist already exists, mirroring upstream RESPECTS_INLINE_HINTS). - The providers that do reach applyCaching (anthropic, bedrock, openrouter/ copilot/alibaba on Claude) all honor message-level assistant markers, so skipping assistant just discards a valid breakpoint. Select the last two messages by position again. Reframe the "why two" comment around the real win: when the last message is removed (retry, Ctrl-C, user edit/delete), the next-to-last marker is a still-present write the lookback can land on, bounding eviction to the removed message rather than the whole history. Cost is ~equal to a single marker (adjacent writes, no rewrite on hit).

…tools applyCaching and tools() each built the per-provider cache-control marker independently, and tools() diverged: it wrote { copilot: { cacheControl } } where the copilot SDK actually reads copilot_cache_control, so the tool breakpoint was silently ineffective on github-copilot. Extract cacheMarkerOptions() as the single source of truth for the per-provider marker shapes (anthropic/openrouter cacheControl, bedrock cachePoint, openaiCompatible cache_control, copilot copilot_cache_control, alibaba). message() keeps attaching the full object and remapping; tools() resolves one SDK-keyed namespace via cacheMarkerFor(). Add a copilot tools test locking in the shape.

wqymi force-pushed the vb/9ac0-cache-break-poin branch from 1acb148 to 21c4ab3 Compare June 22, 2026 14:36

wqymi added 5 commits June 23, 2026 14:22

wqymi force-pushed the vb/9ac0-cache-break-poin branch from fe25aeb to 2588478 Compare June 23, 2026 06:26

wqymi merged commit 7793ccc into main Jun 23, 2026
2 of 3 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(provider): prompt-cache breakpoints — double rolling tail + tool-definition marker#1213

fix(provider): prompt-cache breakpoints — double rolling tail + tool-definition marker#1213
wqymi merged 5 commits into
mainfrom
vb/9ac0-cache-break-poin

wqymi commented Jun 22, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

wqymi commented Jun 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Why the message-level change

What changed

Notes on provider behavior (for reviewers)

Verification

Doc

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

wqymi commented Jun 22, 2026 •

edited

Loading