Overview
OpenRouter launched Response Caching — a feature that caches responses to identical API requests, so repeat requests return in a fraction of the time at zero token cost.
What it is
Response Caching stores the output of LLM API calls and serves cached responses for subsequent identical requests. This is particularly useful for:
- Repeated system prompts — Netclaw uses consistent system prompts (AGENTS.md, SOUL.md, skills) across sessions
- Common tool definitions — Tool schemas are sent with every request and rarely change
- Frequent recurring queries — Reminders, webhooks, and scheduled tasks often reuse the same prompt structures
- Memory retrieval context — Repeated memory lookups with similar queries
How to enable
Two methods:
- Request header: add `x-response-cache-control: cache` to API requests
- OpenRouter Presets: Enable via the OpenRouter presets UI for specific models
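As a minimal sketch of the header method, assuming a standard OpenRouter chat-completions call (the helper name and payload shape are illustrative, not Netclaw's actual code):

```python
# Sketch: opt a request into OpenRouter's response cache via the
# x-response-cache-control header. Helper name is hypothetical.

def build_cached_request_headers(api_key: str) -> dict:
    """Headers for an OpenRouter request with response caching enabled."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        # Opt this request into response caching
        "x-response-cache-control": "cache",
    }

headers = build_cached_request_headers("sk-or-...")
# POST these headers with a JSON body to
# https://openrouter.ai/api/v1/chat/completions
```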
Why this matters for Netclaw
Netclaw makes a high volume of LLM API calls:
- Every session turn sends system prompt + tool definitions + conversation history
- Reminders fire autonomously with structured prompts
- Webhooks process inbound payloads through LLM analysis
- Memory operations (find/store) generate API calls
Many of these involve repetitive prompt structures that would benefit significantly from caching.
Proposed implementation
- Add a configuration option for enabling response caching per provider/model
- For OpenRouter providers: support the `x-response-cache-control: cache` header
- Consider prompt prefix caching where applicable (e.g., Anthropic's caching headers)
- Surface cache hit/miss metrics in session or debug output
- Respect model-level caching support — not all models on OpenRouter may support this
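The first two points above could be wired together roughly like this — a per-provider config flag that gates header injection (the `ProviderConfig` shape and field names are assumptions for illustration):

```python
from dataclasses import dataclass

@dataclass
class ProviderConfig:
    name: str
    response_cache: bool = False  # hypothetical per-provider config flag

def request_headers(cfg: ProviderConfig, api_key: str) -> dict:
    """Build auth headers, adding the cache opt-in only when the provider
    is OpenRouter and caching is enabled in config."""
    headers = {"Authorization": f"Bearer {api_key}"}
    # Only OpenRouter is known to honor this header, so gate on provider name.
    if cfg.response_cache and cfg.name == "openrouter":
        headers["x-response-cache-control"] = "cache"
    return headers
```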
References
Type
Enhancement / Cost optimization