Improve prefix cache inspection and warmup by wplll · Pull Request #1966 · Hmbown/CodeWhale

wplll · 2026-05-24T00:39:00Z

Summary

Improve DeepSeek prefix-cache stability diagnostics and warmup correctness by making cache inspection and warmup keys account for real prompt layers, tool schemas, project context packs, and skills.

Motivation

Previous cache diagnostics could report incomplete or misleading cache state because /cache inspect did not include the real tool catalog, warmup keys depended on prior inspection state, and project context pack cache invalidation could miss prompt-relevant changes.

Changes

Include the real request tool catalog in /cache inspect.
Add tool_catalog_hash and stable_prefix_hash to prompt inspection output.
Add a complete CacheWarmupKey covering provider, model, base URL, stable prefix, tool catalog, project pack, and skills.
Compute warmup keys from current app/request state instead of last_cache_inspection.
Make Project Context Pack generation deterministic with sorted entries and sort-before-truncate behavior.
Improve Project Context Pack cache invalidation with a manifest hash covering included file paths and README excerpt content.
Skip wire-payload dedup for short tool results and keep dedup strictly wire-only.
Update memory documentation to describe memory/session goal content as dynamic.

Cache hit improvement rationale

The changes increase probability of prefix cache reuse by stabilizing prompt-layer ordering, tool catalog ordering, and Project Context Pack rendering. They also make cache misses more diagnosable by exposing stable prefix, full reusable prefix, tool catalog, project pack, and skills hashes.

Performance considerations

Project Context Pack generation now walks prompt-relevant entries to compute a reliable manifest hash, while reusing rendered packs when the manifest is unchanged. Short tool results bypass dedup hashing and map lookup entirely. Warmup does not introduce extra catalog/network requests beyond the explicit warmup model call.

Risks and rollback

Risk is limited to cache diagnostics, warmup behavior, project context pack rendering, and wire payload compaction. Rollback can revert the cache inspection/warmup and project pack changes independently if provider behavior differs in production.

Follow-up work

Optionally allow first-run /cache inspect to build the current tool catalog before any model turn.
Consider adding config file excerpts to Project Context Pack only if they are intentionally part of the prompt, with matching manifest hash coverage.

gemini-code-assist

Code Review

This pull request enhances prompt stability and cache observability by implementing deterministic sorting for tools, MCP resources, and diagnostics, alongside a manifest-based caching system for project context packs. It introduces a CacheWarmupKey to identify stable prefixes and expands the /cache inspect command with JSON and verbose output modes. Additionally, user memory and session goals were relocated to dynamic prompt sections to improve prefix cache hits. The review feedback highlights opportunities to simplify JSON serialization using more idiomatic error handling and suggests lowering the tool result deduplication threshold to optimize token usage for medium-sized outputs.

gemini-code-assist · 2026-05-24T00:41:21Z

+    let content =
+        serde_json::to_string(&values).unwrap_or_else(|_| Value::Array(values).to_string());


The unwrap_or_else block here is redundant and potentially misleading. serde_json::to_string for a Vec<Value> is extremely unlikely to fail, and if it does, Value::Array(values).to_string() (which uses the Display trait) will also trigger a panic. It is cleaner and more idiomatic to use .expect() with a descriptive message.

Suggested change

let content =

serde_json::to_string(&values).unwrap_or_else(|_| Value::Array(values).to_string());

let content = serde_json::to_string(&values).expect("tool catalog serialization failed");

gemini-code-assist · 2026-05-24T00:41:22Z

+        let content = serde_json::to_string(tools)
+            .unwrap_or_else(|_| Value::Array(tools.to_vec()).to_string());


Similar to the tool_catalog_hash function, this fallback is redundant and includes an unnecessary tools.to_vec() clone. Using .expect() is preferred here as it avoids the overhead and clarifies that serialization is expected to succeed for these types.

let content = serde_json::to_string(tools).expect("tool catalog serialization failed");

gemini-code-assist · 2026-05-24T00:41:22Z

+    // Short content: skip dedup entirely — not worth the ref overhead.
+    // Do NOT compute hash or insert into seen_tool_results.
+    if original_chars <= TOOL_RESULT_SENT_CHAR_BUDGET {


The threshold for skipping tool result deduplication is set to the full truncation budget (12,000 characters). This is quite high; 12,000 characters (~3,000 tokens) is a significant amount of data to repeat in a request if the same result appears multiple times in the history. Since hashing and map lookup for a few kilobytes of data is extremely cheap, consider using a much lower threshold (e.g., 1,024 characters) to allow deduplication of medium-sized repeated results while still skipping it for truly trivial ones.

Suggested change

// Short content: skip dedup entirely — not worth the ref overhead.

// Do NOT compute hash or insert into seen_tool_results.

if original_chars <= TOOL_RESULT_SENT_CHAR_BUDGET {

// Very short content: skip dedup entirely — not worth the ref overhead.

// Do NOT compute hash or insert into seen_tool_results.

if original_chars <= 1024 {

Hmbown · 2026-05-24T01:00:07Z

Thanks for sending this. It is directly relevant to #1965 and the cache-continuity work, but I am not pulling it into v0.8.42 stabilization: the diff is large, currently DIRTY, only GitGuardian has run, and it touches prompt/cache/project-context behavior that needs focused review and telemetry.

I am routing this to v0.8.47. The most useful next shape would be smaller slices: cache inspection truth surface, warmup key correctness, project-context determinism, and wire-payload dedup, each with targeted tests and before/after cache-hit evidence.

Improve prefix cache inspection and warmup

0fd576a

gemini-code-assist Bot reviewed May 24, 2026

View reviewed changes

Hmbown added this to the v0.8.47 milestone May 24, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve prefix cache inspection and warmup#1966

Improve prefix cache inspection and warmup#1966
wplll wants to merge 1 commit into
Hmbown:mainfrom
wplll:feat/cache-hit-optimization

wplll commented May 24, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot May 24, 2026

Uh oh!

gemini-code-assist Bot May 24, 2026

Uh oh!

gemini-code-assist Bot May 24, 2026

Uh oh!

Hmbown commented May 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		let content =
		serde_json::to_string(&values).unwrap_or_else(\|_\| Value::Array(values).to_string());

	let content =
	serde_json::to_string(&values).unwrap_or_else(\|_\| Value::Array(values).to_string());
	let content = serde_json::to_string(&values).expect("tool catalog serialization failed");

		let content = serde_json::to_string(tools)
		.unwrap_or_else(\|_\| Value::Array(tools.to_vec()).to_string());

Conversation

wplll commented May 24, 2026

Summary

Motivation

Changes

Cache hit improvement rationale

Performance considerations

Risks and rollback

Follow-up work

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 24, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 24, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 24, 2026

Choose a reason for hiding this comment

Uh oh!

Hmbown commented May 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants