Fix inline summarization prompt cache misses #309701

Open
kevin-m-kent wants to merge 1 commit into microsoft:main from kevin-m-kent:kevin-m-kent/fix-inline-summarization-cache

Conversation

@kevin-m-kent (Contributor)

Problem

The background inline summarization call gets a 0% prompt cache hit rate on the 2nd and subsequent summarizations within the same turn.

Root Cause

The background summarizer forks messages from the main render but applies different post-processing than the tool calling loop:

  • Main agent call: stripInternalToolCallIds + validateToolMessages (filters orphaned tool results that lack matching assistant tool calls)
  • Background summarizer: stripInternalToolCallIds only (no validation)

After a summarization is applied, prompt-tsx re-renders the conversation with summarized history. Tool results that referenced tool calls from the now-summarized rounds become orphaned. The main call filters these out via validateToolMessages, but the background summarizer keeps them. This causes the message arrays to diverge, breaking prefix-based prompt caching (Anthropic cache_control).
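The filtering step at the heart of the root cause can be sketched as follows. This is a minimal illustration with an invented message shape and helper name; the real validateToolMessages in the Copilot codebase operates on richer types:

```typescript
// Hypothetical sketch: drop tool results whose originating assistant
// tool call is no longer present (e.g. it was summarized away).
type Message =
  | { role: 'assistant'; toolCalls?: { id: string }[] }
  | { role: 'tool'; toolCallId: string }
  | { role: 'user' | 'system'; content: string };

function filterOrphanedToolResults(messages: Message[]): Message[] {
  // Collect every tool-call id still issued by an assistant message.
  const knownIds = new Set<string>();
  for (const m of messages) {
    if (m.role === 'assistant') {
      for (const tc of m.toolCalls ?? []) knownIds.add(tc.id);
    }
  }
  // Keep tool results only when their matching tool call survives.
  return messages.filter(m => m.role !== 'tool' || knownIds.has(m.toolCallId));
}
```

The main agent call runs this kind of validation after stripping internal tool-call ids; the background summarizer skipped it, which is exactly where the two message arrays diverge.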

Why it only affects 2nd+ summarizations

  • 1st summarization: no prior summarization applied → no orphaned tool messages → messages match → good cache hit rate (65-98%)
  • 2nd+ summarization in the same turn: the prior summarization created orphaned messages → the summarizer keeps them while the main call filters them → messages diverge → 0% cache hit rate
  • Later summarizations after many rounds: orphaned messages from the prior summarization are no longer in the prompt (prompt-tsx trimmed them) → messages match again → good cache hit rate (98%)
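The pattern above follows from how prefix-based prompt caching works: only the longest identical leading run of messages is reusable, so a single divergent message early in the array invalidates everything after it. A toy sketch (invented helper, not the actual cache logic):

```typescript
// Toy illustration, not the real cache mechanism: prefix caching only
// reuses the longest identical leading run of the two message arrays.
function cachedPrefixLength(prev: string[], next: string[]): number {
  let n = 0;
  while (n < prev.length && n < next.length && prev[n] === next[n]) n++;
  return n;
}
```

When the main call filters an orphaned tool result that the summarizer keeps, the arrays diverge at that message and nothing after it is served from cache.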

Evidence

Analyzed multiple conversations across internal telemetry (CopilotChatEvents / engine.messages.length) and client-side exports (panel.request / summarizedconversationhistory):

  • Tools, system prompt, thinking config, and max window are all constant across calls — confirmed from engine.messages.length data
  • The 0% cache events consistently occur on the 2nd+ summarization within the same user turn
  • 1st summarizations consistently achieve 65-98% cache hit rates
  • The promptcachetokencount from summarizedconversationhistory telemetry events shows the pattern clearly

Fix

  1. Apply validateToolMessagesCore to the forked messages in the background summarizer, matching the main call's processing pipeline. This ensures orphaned tool messages are filtered identically.

  2. Move addCacheBreakpoints() to run before _startBackgroundSummarization for deterministic breakpoint ordering (code clarity).

Impact

For a ~90K-token summarization call, going from 0% to ~90%+ cache hit means ~80K fewer uncached input tokens per summarization event. This adds up significantly for long agentic conversations that trigger multiple summarizations per turn.
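The savings arithmetic can be checked directly. The post-fix hit rate of ~90% is the PR's estimate, not a measurement here:

```typescript
// Back-of-the-envelope savings using the figures in the description.
const promptTokens = 90_000;
const hitRateBefore = 0.0;  // 2nd+ summarization, pre-fix
const hitRateAfter = 0.9;   // post-fix estimate
const uncachedBefore = promptTokens * (1 - hitRateBefore); // 90,000
const uncachedAfter = promptTokens * (1 - hitRateAfter);   // ~9,000
const saved = Math.round(uncachedBefore - uncachedAfter);  // ~81,000, i.e. the "~80K" above
```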

The background inline summarizer forks messages from the main render
but applies different post-processing than the tool calling loop:

- Main agent call: stripInternalToolCallIds + validateToolMessages
  (filters orphaned tool results that lack matching assistant tool calls)
- Background summarizer: stripInternalToolCallIds only

After a summarization is applied, prompt-tsx re-renders the conversation
with summarized history. Tool results that referenced tool calls from
the now-summarized rounds become orphaned. The main call filters these
out via validateToolMessages, but the background summarizer keeps them.
This causes the message arrays to diverge, breaking prefix-based prompt
caching (e.g., Anthropic's cache_control).

The divergence specifically occurs on the 2nd+ summarization in the
same turn, because the 1st summarization creates the orphaned messages
that the 2nd summarization's forked copy includes but the main call
filters out. This explains the observed pattern of 0% cache hit rate
on 2nd+ summarizations while 1st summarizations get 65-98% hits.

Fix: apply validateToolMessagesCore to the forked messages in the
background summarizer, matching the main call's processing pipeline.

Also move addCacheBreakpoints() to run before
_startBackgroundSummarization to ensure cache breakpoint ordering is
deterministic (code clarity improvement).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 14, 2026 02:35

Copilot AI left a comment


Pull request overview

Fixes prompt cache misses in the inline background summarization flow by aligning its message post-processing and cache-breakpoint ordering with the main agent fetch pipeline. This lets Anthropic prefix-based prompt caching achieve high hit rates on 2nd+ summarizations within the same user turn.

Changes:

  • Move addCacheBreakpoints(result.messages) earlier so the background summarizer always receives messages with deterministic breakpoint placement.
  • Apply ToolCallingLoop.validateToolMessagesCore(...) (after stripping internal tool call IDs) to background summarizer forked messages to filter orphaned tool results and keep message arrays aligned with the main call.
  • Preserve the rendered last user message on the current turn earlier in the post-render flow via RenderedUserMessageMetadata.
Summary per file:

  • extensions/copilot/src/extension/intents/node/agentIntent.ts — Align background inline summarization message processing with the main agent pipeline and ensure cache breakpoints are applied before forking messages.

Copilot's findings

  • Files reviewed: 1/1 changed files
  • Comments generated: 1

Comment on lines +817 to +819:

    const strippedMainMessages = ToolCallingLoop.validateToolMessagesCore(
        ToolCallingLoop.stripInternalToolCallIds(mainRenderMessages),
    ).messages;

Copilot AI Apr 14, 2026


In the inline background summarization path, validateToolMessagesCore is called without the stripOrphanedToolCalls option. The main agent fetch path applies applyMessagePostProcessing(..., { stripOrphanedToolCalls: isGeminiFamily(endpoint) }), so for Gemini endpoints the background summarizer can keep orphaned toolCalls on assistant messages. That can (a) re-diverge the message prefix again (hurting cache parity) and (b) cause Gemini 400s due to missing 1:1 tool_call ↔ tool_result pairing. Consider passing { stripOrphanedToolCalls: isGeminiFamily(this.endpoint) } here (and importing isGeminiFamily) to match the main loop’s post-processing exactly.
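The reviewer's point can be illustrated with a minimal sketch (invented types and helper name): stripping orphaned toolCalls on assistant messages is the mirror image of filtering orphaned tool results, and Gemini's strict 1:1 tool_call ↔ tool_result pairing is why the option matters there.

```typescript
// Hypothetical sketch of the stripOrphanedToolCalls behavior: remove
// assistant toolCalls that have no matching tool result in the array.
type M =
  | { role: 'assistant'; toolCalls?: { id: string }[] }
  | { role: 'tool'; toolCallId: string };

function stripOrphanedToolCalls(messages: M[]): M[] {
  // Collect every tool-call id that still has a tool result.
  const resultIds = new Set(
    messages.flatMap(m => (m.role === 'tool' ? [m.toolCallId] : []))
  );
  return messages.map(m =>
    m.role === 'assistant' && m.toolCalls
      ? { ...m, toolCalls: m.toolCalls.filter(tc => resultIds.has(tc.id)) }
      : m
  );
}
```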


3 participants