Summary
Request to expose Bedrock / Messages API prompt-cache TTL controls through ClaudeAgentOptions so long-running agentic applications can use the extended-cache-ttl-2025-04-11 beta (1h TTL) on the system-prompt prefix. Today the SDK only supports the default 5-minute ephemeral cache, which caps cross-session amortization for workloads that run many short-lived sessions against a shared system prompt.
Background
We run agentic security-case investigations on AWS Bedrock (Sonnet 4.6) via the Python Claude Agent SDK. Each investigation is a separate query() / ClaudeSDKClient session. The underlying system prompt is ~15–20K tokens and is intentionally byte-identical across investigations for a given org, so Bedrock's content-addressed prefix cache already amortizes it — but the default 5-min TTL limits how often back-to-back sessions get cache hits on the shared prefix.
The 1h TTL on the Messages API / Bedrock Converse requires two things that the SDK doesn't currently surface:
anthropic-beta: extended-cache-ttl-2025-04-11 header.
- An explicit
cache_control: {"type": "ephemeral", "ttl": "1h"} block on the system prompt.
What we checked (claude-agent-sdk 0.1.65, bundled CLI 2.1.117)
SdkBeta = Literal["context-1m-2025-08-07"] (claude_agent_sdk/types.py:29) is the only allowed beta; typed callers can't pass extended-cache-ttl-2025-04-11.
ClaudeAgentOptions has no cache_control, cache_ttl, cachePoint, or ephemeral field.
- A grep across the whole 0.1.65 wheel for
ttl|ephemeral|cache_control|cachePoint|cache_ttl returns no matches in the public API.
- The
extra_args: dict[str, str | None] escape hatch can smuggle arbitrary CLI flags through, but the bundled CLI doesn't appear to expose a TTL flag, and the Bedrock Converse requests we observe do not contain a cache_control block with a non-default TTL.
What we'd love to see
Either (or both) of:
- Widen
SdkBeta to include "extended-cache-ttl-2025-04-11", plus add a cache_ttl: Literal["5m", "1h"] = "5m" field on ClaudeAgentOptions (or on SystemPromptPreset / SystemPromptFile). When cache_ttl="1h", the SDK would emit the corresponding cache_control block on the system prompt and the anthropic-beta header.
- A supported pattern for pre-warming / pre-caching a shared prefix on Bedrock ahead of a batch of short-lived agent sessions. Our workload is 1000s of short multi-turn sessions per day, clustered in bursts with long quiet stretches — classic 1h-TTL territory.
A lower-level escape hatch (e.g. a typed system_prompt_blocks that accepts the raw Messages-API content-block shape with cache controls) would also work for us.
Why this matters
On one customer org we observe ~$11K / 14 days of Sonnet agentic spend. Cross-case amortization is currently capped by the 5-min TTL because our session cadence is bursty (minutes apart at peak) with long quiet stretches between bursts. Extending the prefix TTL to 1h on the shared system-prompt blob would push the cache-read share on cross-session turns materially higher and reduce cost in proportion.
Happy to share more detailed usage data privately if it would help scope or prioritize this, and happy to open a PR if you can point us at the preferred API shape (e.g. cache_ttl on ClaudeAgentOptions vs. a richer system-prompt block list vs. plumbing it through extra_args + a new CLI flag).
Thanks!
Summary
Request to expose Bedrock / Messages API prompt-cache TTL controls through
ClaudeAgentOptionsso long-running agentic applications can use theextended-cache-ttl-2025-04-11beta (1h TTL) on the system-prompt prefix. Today the SDK only supports the default 5-minute ephemeral cache, which caps cross-session amortization for workloads that run many short-lived sessions against a shared system prompt.Background
We run agentic security-case investigations on AWS Bedrock (Sonnet 4.6) via the Python Claude Agent SDK. Each investigation is a separate
query()/ClaudeSDKClientsession. The underlying system prompt is ~15–20K tokens and is intentionally byte-identical across investigations for a given org, so Bedrock's content-addressed prefix cache already amortizes it — but the default 5-min TTL limits how often back-to-back sessions get cache hits on the shared prefix.The 1h TTL on the Messages API / Bedrock Converse requires two things that the SDK doesn't currently surface:
anthropic-beta: extended-cache-ttl-2025-04-11header.cache_control: {"type": "ephemeral", "ttl": "1h"}block on the system prompt.What we checked (claude-agent-sdk 0.1.65, bundled CLI 2.1.117)
SdkBeta = Literal["context-1m-2025-08-07"](claude_agent_sdk/types.py:29) is the only allowed beta; typed callers can't passextended-cache-ttl-2025-04-11.ClaudeAgentOptionshas nocache_control,cache_ttl,cachePoint, orephemeralfield.ttl|ephemeral|cache_control|cachePoint|cache_ttlreturns no matches in the public API.extra_args: dict[str, str | None]escape hatch can smuggle arbitrary CLI flags through, but the bundled CLI doesn't appear to expose a TTL flag, and the Bedrock Converse requests we observe do not contain acache_controlblock with a non-default TTL.What we'd love to see
Either (or both) of:
SdkBetato include"extended-cache-ttl-2025-04-11", plus add acache_ttl: Literal["5m", "1h"] = "5m"field onClaudeAgentOptions(or onSystemPromptPreset/SystemPromptFile). Whencache_ttl="1h", the SDK would emit the correspondingcache_controlblock on the system prompt and theanthropic-betaheader.A lower-level escape hatch (e.g. a typed
system_prompt_blocksthat accepts the raw Messages-API content-block shape with cache controls) would also work for us.Why this matters
On one customer org we observe ~$11K / 14 days of Sonnet agentic spend. Cross-case amortization is currently capped by the 5-min TTL because our session cadence is bursty (minutes apart at peak) with long quiet stretches between bursts. Extending the prefix TTL to 1h on the shared system-prompt blob would push the cache-read share on cross-session turns materially higher and reduce cost in proportion.
Happy to share more detailed usage data privately if it would help scope or prioritize this, and happy to open a PR if you can point us at the preferred API shape (e.g.
cache_ttlonClaudeAgentOptionsvs. a richer system-prompt block list vs. plumbing it throughextra_args+ a new CLI flag).Thanks!