feat(api): tool_use, split cache breakpoints, 1h TTL, pricing fixes#14
feat(api): tool_use, split cache breakpoints, 1h TTL, pricing fixes#14That1Drifter merged 2 commits intomasterfrom
Conversation
Inner Claude now uses tool_use to guarantee structured output, removing the JSON-parse + malformed-retry path. Cache prefix split into a global CONTRACT breakpoint (5m) and a per-session scenario breakpoint (1h via extended TTL), so new sessions stop re-paying cache writes on the static contract. Pricing table re-keyed with exact dated prefixes so sonnet-4-5/4-6 don't collide. Anthropic client hoisted to a lazy singleton in both inner-claude and debrief; default stakeholder/debrief models bumped to claude-sonnet-4-6; debrief system prompt marked ephemeral. TODO updated with the broken next-lint and rawText followups. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Anthropic API rejects cache_control blocks ordered with a longer TTL after a shorter one (system.1.cache_control.ttl error). The previous split put CONTRACT at default 5m before scenario context at 1h. Bumping CONTRACT to 1h is strictly better anyway — it's static globally, so the longer TTL is paid once and read constantly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Staging smoke results ✅Deployed Test plan results
Bug caught and fixed mid-test ( Sample turn 4 usage (sonnet-4-6, surprise tier) Pre-existing bug spotted in server log (not from this PR — flagging as separate issue) Looks like a |
A Sonnet subagent with no project context played support-triage on staging via gstack browse and surfaced two critical bugs the existing suite never would have caught: an Inner Claude contract guard that rejects valid tool inputs intermittently (regression from the tool_use migration in #14) and a page-reload session destruction. Pulling the staging logs alongside the playthrough also surfaced a separate infra issue: the box is being OOM-killed under modest load, six SIGKILL events on 2026-04-10 alone. - Promote the three critical bugs to the top of the Now section with source-line references and proposed fix sketches. - Add five medium UX items to Polish (debrief structure, trust deltas, cost tooltip, objective badges, action log auto-expand). - Add a tech-debt note documenting the gstack-browse-with-in-URL-creds fetch artifact that caused at least one false-positive bug report, so future smoke runs use header-based auth instead. - Record verified false alarms (picker click works in real Chromium, debrief button doesn't submit a turn, the (?i) regex was already fixed in d791a3d, reset has a confirm guard) so future fresh-eyes reviews don't re-investigate them. - Mark back-nav and lint migration as done (shipped in #15). - Note that the demo GIF has a draft generated via the new stitching script. Also adds scripts/stitch-demo-gif.py (Pillow-based curated frame assembler) and gitignores .playwright-mcp/ to keep snapshot artifacts out of the repo, matching the existing .gstack/ convention. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Tightens API usage and prompt caching in the inner-Claude turn loop and the debrief generator.
inner-claude.tsnow defines anemit_turn_responsetool whoseinput_schemamirrorsInnerClaudeTurnResponse, and forces it viatool_choice. The JSON-in-text parse path and the malformed-retry branch are gone; only the truncation retry remains.isInnerClaudeResponsestays as the runtime guard against tool_input drift.extended-cache-ttl-2025-04-11beta header). Previously the contract was lumped into the scenario block, so each new session re-paid cache-write on the contract too.claude-sonnet-4-5,claude-sonnet-4-6,claude-opus-4-5,claude-opus-4-6,claude-haiku-4-5) so sonnet-4-5 and a future sonnet-4-6 don't collide on aclaude-sonnet-4prefix.startsWithlookup still resolves dated suffixes likeclaude-haiku-4-5-20251001.MODEL_STAKEHOLDERandDEBRIEF_MODELdefaults are nowclaude-sonnet-4-6. Env overrides preserved.new Anthropic({ apiKey })moved to a lazy module-level singleton in bothinner-claude.tsanddebrief.ts. The "ANTHROPIC_API_KEY not configured" error still throws at call time so import-time behavior is unchanged.SYSTEM_PROMPTand markedcache_control: ephemeral.pnpm lint(next lint deprecation) and the now-misnamedTurnCallResult.rawText.Test plan
pnpm typecheck— clean across 6 packagespnpm test— 17 core + 7 scenarios + 10 rubric tests passingusage.cache_read_input_tokenson turn 2+claude-sonnet-4-6default🤖 Generated with Claude Code