Skip to content

Expose prompt-cache cache_control / 1h TTL on ClaudeAgentOptions.system_prompt (Bedrock + Messages API) #864

@sharan-artemis

Description

@sharan-artemis

Summary

Request to expose Bedrock / Messages API prompt-cache TTL controls through ClaudeAgentOptions so long-running agentic applications can use the extended-cache-ttl-2025-04-11 beta (1h TTL) on the system-prompt prefix. Today the SDK only supports the default 5-minute ephemeral cache, which caps cross-session amortization for workloads that run many short-lived sessions against a shared system prompt.

Background

We run agentic security-case investigations on AWS Bedrock (Sonnet 4.6) via the Python Claude Agent SDK. Each investigation is a separate query() / ClaudeSDKClient session. The underlying system prompt is ~15–20K tokens and is intentionally byte-identical across investigations for a given org, so Bedrock's content-addressed prefix cache already amortizes it — but the default 5-min TTL limits how often back-to-back sessions get cache hits on the shared prefix.

The 1h TTL on the Messages API / Bedrock Converse requires two things that the SDK doesn't currently surface:

  1. anthropic-beta: extended-cache-ttl-2025-04-11 header.
  2. An explicit cache_control: {"type": "ephemeral", "ttl": "1h"} block on the system prompt.

What we checked (claude-agent-sdk 0.1.65, bundled CLI 2.1.117)

  • SdkBeta = Literal["context-1m-2025-08-07"] (claude_agent_sdk/types.py:29) is the only allowed beta; typed callers can't pass extended-cache-ttl-2025-04-11.
  • ClaudeAgentOptions has no cache_control, cache_ttl, cachePoint, or ephemeral field.
  • A grep across the whole 0.1.65 wheel for ttl|ephemeral|cache_control|cachePoint|cache_ttl returns no matches in the public API.
  • The extra_args: dict[str, str | None] escape hatch can smuggle arbitrary CLI flags through, but the bundled CLI doesn't appear to expose a TTL flag, and the Bedrock Converse requests we observe do not contain a cache_control block with a non-default TTL.

What we'd love to see

Either (or both) of:

  1. Widen SdkBeta to include "extended-cache-ttl-2025-04-11", plus add a cache_ttl: Literal["5m", "1h"] = "5m" field on ClaudeAgentOptions (or on SystemPromptPreset / SystemPromptFile). When cache_ttl="1h", the SDK would emit the corresponding cache_control block on the system prompt and the anthropic-beta header.
  2. A supported pattern for pre-warming / pre-caching a shared prefix on Bedrock ahead of a batch of short-lived agent sessions. Our workload is 1000s of short multi-turn sessions per day, clustered in bursts with long quiet stretches — classic 1h-TTL territory.

A lower-level escape hatch (e.g. a typed system_prompt_blocks that accepts the raw Messages-API content-block shape with cache controls) would also work for us.

Why this matters

On one customer org we observe ~$11K / 14 days of Sonnet agentic spend. Cross-case amortization is currently capped by the 5-min TTL because our session cadence is bursty (minutes apart at peak) with long quiet stretches between bursts. Extending the prefix TTL to 1h on the shared system-prompt blob would push the cache-read share on cross-session turns materially higher and reduce cost in proportion.

Happy to share more detailed usage data privately if it would help scope or prioritize this, and happy to open a PR if you can point us at the preferred API shape (e.g. cache_ttl on ClaudeAgentOptions vs. a richer system-prompt block list vs. plumbing it through extra_args + a new CLI flag).

Thanks!

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions