feat(api): tool_use, split cache breakpoints, 1h TTL, pricing fixes by That1Drifter · Pull Request #14 · That1Drifter/fieldwork

That1Drifter · 2026-04-10T21:53:21Z

Summary

Tightens API usage and prompt caching in the inner-Claude turn loop and the debrief generator.

tool_use for structured output — inner-claude.ts now defines an emit_turn_response tool whose input_schema mirrors InnerClaudeTurnResponse, and forces it via tool_choice. The JSON-in-text parse path and the malformed-retry branch are gone; only the truncation retry remains. isInnerClaudeResponse stays as the runtime guard against tool_input drift.
Split cache breakpoints — system prompt is now two ephemeral blocks. Block 1 is the static CONTRACT (5m TTL, reused globally across every scenario and session). Block 2 is the per-session scenario context + ticket sample (1h TTL via the extended-cache-ttl-2025-04-11 beta header). Previously the contract was lumped into the scenario block, so each new session re-paid cache-write on the contract too.
Pricing table — re-keyed to exact dated prefixes (claude-sonnet-4-5, claude-sonnet-4-6, claude-opus-4-5, claude-opus-4-6, claude-haiku-4-5) so sonnet-4-5 and a future sonnet-4-6 don't collide on a claude-sonnet-4 prefix. startsWith lookup still resolves dated suffixes like claude-haiku-4-5-20251001.
Model defaults bumped — MODEL_STAKEHOLDER and DEBRIEF_MODEL defaults are now claude-sonnet-4-6. Env overrides preserved.
Client hoist — new Anthropic({ apiKey }) moved to a lazy module-level singleton in both inner-claude.ts and debrief.ts. The "ANTHROPIC_API_KEY not configured" error still throws at call time so import-time behavior is unchanged.
Debrief caching — debrief system prompt extracted to SYSTEM_PROMPT and marked cache_control: ephemeral.
TODO — added followups for the broken pnpm lint (next lint deprecation) and the now-misnamed TurnCallResult.rawText.
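The forced tool call described in the summary can be sketched roughly as follows. This is a minimal illustration, not the code in inner-claude.ts: the fields on InnerClaudeTurnResponse (reply, done) are hypothetical placeholders because the real schema is not shown in this PR, the readTurnResponse helper is invented for the example, and keying the truncation case on stop_reason "max_tokens" is an assumption. The tools, tool_choice, and tool_use content-block shapes follow the Anthropic Messages API.

```typescript
// Hypothetical stand-in for InnerClaudeTurnResponse; the real fields are not shown in this PR.
interface InnerClaudeTurnResponse {
  reply: string;
  done: boolean;
}

// Tool definition whose input_schema mirrors the response type.
const emitTurnResponseTool = {
  name: "emit_turn_response",
  description: "Emit the structured response for this turn.",
  input_schema: {
    type: "object" as const,
    properties: {
      reply: { type: "string" },
      done: { type: "boolean" },
    },
    required: ["reply", "done"],
  },
};

// Forcing the tool means the model has to answer with a tool_use block for it.
const toolChoice = { type: "tool" as const, name: emitTurnResponseTool.name };

// Runtime guard: the tool input is still untrusted JSON, so check its shape.
function isInnerClaudeResponse(value: unknown): value is InnerClaudeTurnResponse {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return typeof v.reply === "string" && typeof v.done === "boolean";
}

// Minimal view of a Messages API response: only the parts this sketch reads.
interface ContentBlock { type: string; name?: string; input?: unknown }
interface MessageLike { stop_reason: string | null; content: ContentBlock[] }

type TurnResult =
  | { kind: "ok"; response: InnerClaudeTurnResponse }
  | { kind: "truncated" } // assumption: the caller retries with a larger max_tokens
  | { kind: "invalid" };  // tool input drifted from the schema

function readTurnResponse(message: MessageLike): TurnResult {
  if (message.stop_reason === "max_tokens") return { kind: "truncated" };
  const block = message.content.find(
    (b) => b.type === "tool_use" && b.name === emitTurnResponseTool.name,
  );
  const input = block?.input;
  if (!isInnerClaudeResponse(input)) return { kind: "invalid" };
  return { kind: "ok", response: input };
}
```

In a real call, emitTurnResponseTool would go in the request's tools array and toolChoice in tool_choice; the guard then runs on whatever the tool_use block carries.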

Test plan

pnpm typecheck — clean across 6 packages
pnpm test — 17 core + 7 scenarios + 10 rubric tests passing
Smoke a real session end-to-end on staging to confirm tool_use + 1h TTL header are accepted by the API and cache hits show in usage.cache_read_input_tokens on turn 2+
Confirm the debrief still renders narrative output with the new claude-sonnet-4-6 default

🤖 Generated with Claude Code

Inner Claude now uses tool_use to guarantee structured output, removing the JSON-parse + malformed-retry path. Cache prefix split into a global CONTRACT breakpoint (5m) and a per-session scenario breakpoint (1h via extended TTL), so new sessions stop re-paying cache writes on the static contract. Pricing table re-keyed with exact dated prefixes so sonnet-4-5/4-6 don't collide. Anthropic client hoisted to a lazy singleton in both inner-claude and debrief; default stakeholder/debrief models bumped to claude-sonnet-4-6; debrief system prompt marked ephemeral. TODO updated with the broken next-lint and rawText followups.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
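The pricing re-key can be pictured with a small prefix lookup. This is a sketch under assumptions: the rate numbers are placeholders for illustration rather than the repo's table, lookupPricing is a hypothetical name, and choosing the longest matching key is an extra safety measure that the PR does not claim to implement.

```typescript
interface Rates {
  inputPerMTok: number;
  outputPerMTok: number;
}

// Placeholder numbers for illustration only; not the repo's actual pricing table.
const PRICING: Record<string, Rates> = {
  "claude-sonnet-4-5": { inputPerMTok: 3, outputPerMTok: 15 },
  "claude-sonnet-4-6": { inputPerMTok: 3, outputPerMTok: 15 },
  "claude-opus-4-5": { inputPerMTok: 5, outputPerMTok: 25 },
  "claude-opus-4-6": { inputPerMTok: 5, outputPerMTok: 25 },
  "claude-haiku-4-5": { inputPerMTok: 1, outputPerMTok: 5 },
};

// Resolve a model ID (possibly with a dated suffix) to its pricing entry.
// Picks the longest key that prefixes the ID, so a shorter key can never
// shadow a more specific one.
function lookupPricing(model: string): { key: string; rates: Rates } | undefined {
  let best: { key: string; rates: Rates } | undefined;
  for (const [key, rates] of Object.entries(PRICING)) {
    if (model.startsWith(key) && (!best || key.length > best.key.length)) {
      best = { key, rates };
    }
  }
  return best;
}
```

With these keys, claude-haiku-4-5-20251001 resolves to the claude-haiku-4-5 entry, while a bare claude-sonnet-4 matches nothing instead of silently picking one of the two Sonnet rows.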

The Anthropic API rejects cache_control blocks ordered with a longer TTL after a shorter one (system.1.cache_control.ttl error). The previous split put CONTRACT at default 5m before scenario context at 1h. Bumping CONTRACT to 1h is strictly better anyway: it's static globally, so the longer TTL is paid once and read constantly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
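The ordering constraint from this commit can be illustrated with a two-block system prompt and a local pre-flight check. This is a hedged sketch: buildSystemBlocks and findTtlOrderViolation are hypothetical helpers, the rule is taken from the error message reported in this PR, and treating a missing ttl as 5m reflects the API default. The cache_control shape ({ type: "ephemeral", ttl }) follows the Messages API.

```typescript
type CacheTtl = "5m" | "1h";

interface SystemBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral"; ttl?: CacheTtl };
}

// Post-fix layout: both breakpoints use the 1h TTL, static contract first.
function buildSystemBlocks(contract: string, scenarioContext: string): SystemBlock[] {
  return [
    { type: "text", text: contract, cache_control: { type: "ephemeral", ttl: "1h" } },
    { type: "text", text: scenarioContext, cache_control: { type: "ephemeral", ttl: "1h" } },
  ];
}

// Returns the index of the first 1h block that follows a 5m block, or -1 if
// the ordering is acceptable. Blocks without cache_control are ignored.
function findTtlOrderViolation(blocks: SystemBlock[]): number {
  let seenShortTtl = false;
  for (let i = 0; i < blocks.length; i++) {
    const cacheControl = blocks[i].cache_control;
    if (!cacheControl) continue;
    const ttl = cacheControl.ttl ?? "5m"; // omitted ttl means the 5m default
    if (ttl === "5m") {
      seenShortTtl = true;
    } else if (seenShortTtl) {
      return i;
    }
  }
  return -1;
}
```

Run against the pre-fix layout (CONTRACT at the default TTL, scenario context at 1h), the check returns index 1, which lines up with the system.1 path in the API error.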

That1Drifter · 2026-04-10T22:01:09Z

Staging smoke results ✅

Deployed fcf7860 to staging and exercised the full turn loop + debrief on support-triage.

Test plan results

✅ tool_use accepted by API across haiku-4-5 and sonnet-4-6 — every call returned valid structured output, retried: false on every turn
✅ 1h cache header accepted — turn 1 wrote 6357-token prefix (cache_write: 6357, cache_read: 0)
✅ Cache hits on turn 2+ — every subsequent turn shows cache_write: 0, cache_read: 6357-6358
✅ Surprise tier auto-routes to claude-sonnet-4-6 — spam_reroute surprise fired and modelUsed: claude-sonnet-4-6, modelTier: stakeholder
✅ Debrief renders narrative on claude-sonnet-4-6 — multi-paragraph critique with per-turn references
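The cache checks in the results above come down to reading two fields on the API's usage object. The sketch below assumes the logged cache_write and cache_read values map directly onto cache_creation_input_tokens and cache_read_input_tokens; toTurnUsage and cacheOutcome are hypothetical helpers, not functions from this repo.

```typescript
// Subset of the usage object returned by the Anthropic Messages API.
interface ApiUsage {
  input_tokens: number;
  output_tokens: number;
  cache_creation_input_tokens?: number | null;
  cache_read_input_tokens?: number | null;
}

// Shape used in the logs quoted in this PR.
interface TurnUsage {
  input: number;
  output: number;
  cache_write: number;
  cache_read: number;
}

// Assumed mapping from the API fields to the logged shape.
function toTurnUsage(usage: ApiUsage): TurnUsage {
  return {
    input: usage.input_tokens,
    output: usage.output_tokens,
    cache_write: usage.cache_creation_input_tokens ?? 0,
    cache_read: usage.cache_read_input_tokens ?? 0,
  };
}

type CacheOutcome = "hit" | "write" | "none";

// Any cached tokens read counts as a hit, even if the turn also wrote some.
function cacheOutcome(usage: TurnUsage): CacheOutcome {
  if (usage.cache_read > 0) return "hit";
  if (usage.cache_write > 0) return "write";
  return "none";
}
```

Under this mapping, the turn 1 figures (cache_write: 6357, cache_read: 0) classify as a write and the later turns (cache_write: 0, cache_read: 6357-6358) as hits.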

Bug caught and fixed mid-test (fcf7860)
The first deploy hit 400 system.1.cache_control.ttl: a ttl='1h' cache_control block must not come after a ttl='5m' cache_control block. The Anthropic API requires longer-TTL cache blocks to be ordered before shorter ones. Bumped CONTRACT to 1h too — strictly better since it's static globally.

Sample turn 4 usage (sonnet-4-6, surprise tier)

usage: { input: 820, output: 565, cache_write: 0, cache_read: 6358 }
turnCostUsd: 0.0128
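The logged cost can be cross-checked by hand. The sketch below assumes Sonnet-class rates of $3 per million input tokens, $15 per million output tokens, cache reads at 10% of the input rate, and 1h cache writes at 2x the input rate. These rates are assumptions rather than values read from the repo's pricing table, although they do reproduce the logged figure.

```typescript
interface TurnUsage {
  input: number;
  output: number;
  cache_write: number;
  cache_read: number;
}

// Assumed USD rates per million tokens (not taken from the repo).
const ASSUMED_RATES = {
  input: 3,
  output: 15,
  cacheRead: 0.3,  // 10% of the input rate
  cacheWrite1h: 6, // 2x the input rate for the 1h TTL
};

function estimateTurnCostUsd(usage: TurnUsage): number {
  const perMillion =
    usage.input * ASSUMED_RATES.input +
    usage.output * ASSUMED_RATES.output +
    usage.cache_read * ASSUMED_RATES.cacheRead +
    usage.cache_write * ASSUMED_RATES.cacheWrite1h;
  return perMillion / 1_000_000;
}

// Turn 4 from the staging log:
// 820 * 3 + 565 * 15 + 6358 * 0.3 = 2460 + 8475 + 1907.4 = 12842.4
const turnFourCost = estimateTurnCostUsd({
  input: 820,
  output: 565,
  cache_write: 0,
  cache_read: 6358,
});
console.log(turnFourCost.toFixed(4)); // prints "0.0128"
```

For comparison, under the same assumed rates those 6358 prefix tokens would cost about $0.0191 at the uncached input rate instead of about $0.0019 as a cache read.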

Pre-existing bug spotted in server log (not from this PR — flagging as separate issue)

SyntaxError: Invalid regular expression: /(?i)(deploy|ship|launch|rollout|production)/: Invalid group
  at .next/server/app/api/debrief/route.js:39:10591
  called from app/api/turn/route.js

Looks like a payload_regex rubric rule is using Python's (?i) inline flag, which JS regex doesn't support. Worth a follow-up fix in @fieldwork/rubric to either flip to the i flag or strip (?i) prefixes when compiling rules.
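The second option mentioned above, stripping a leading (?i) when compiling rules, might look like the following. This is a hypothetical helper for illustration, not code from @fieldwork/rubric, and it handles only a (?i) at the very start of the pattern.

```typescript
// Translate a Python-style leading (?i) into the JavaScript "i" flag.
// JavaScript's RegExp constructor throws "Invalid group" on a bare (?i).
function compileRuleRegex(source: string): RegExp {
  const inlineFlag = "(?i)";
  if (source.startsWith(inlineFlag)) {
    return new RegExp(source.slice(inlineFlag.length), "i");
  }
  return new RegExp(source);
}

const deployRule = compileRuleRegex("(?i)(deploy|ship|launch|rollout|production)");
console.log(deployRule.test("Ship it to PRODUCTION")); // prints true
```

Patterns without the prefix compile unchanged and stay case-sensitive.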

That1Drifter · 2026-04-10T22:03:10Z

Correction: the regex bug I flagged in the previous comment was already fixed in d791a3d. The two log entries I saw were from restarts before this branch's deploy — the current build (fcf7860, restart 21:57:31) has had zero errors. Disregard that followup.

A Sonnet subagent with no project context played support-triage on staging via gstack browse and surfaced two critical bugs the existing suite never would have caught: an Inner Claude contract guard that rejects valid tool inputs intermittently (regression from the tool_use migration in #14) and a page-reload session destruction. Pulling the staging logs alongside the playthrough also surfaced a separate infra issue: the box is being OOM-killed under modest load, six SIGKILL events on 2026-04-10 alone.

- Promote the three critical bugs to the top of the Now section with source-line references and proposed fix sketches.
- Add five medium UX items to Polish (debrief structure, trust deltas, cost tooltip, objective badges, action log auto-expand).
- Add a tech-debt note documenting the gstack-browse-with-in-URL-creds fetch artifact that caused at least one false-positive bug report, so future smoke runs use header-based auth instead.
- Record verified false alarms (picker click works in real Chromium, debrief button doesn't submit a turn, the (?i) regex was already fixed in d791a3d, reset has a confirm guard) so future fresh-eyes reviews don't re-investigate them.
- Mark back-nav and lint migration as done (shipped in #15).
- Note that the demo GIF has a draft generated via the new stitching script.

Also adds scripts/stitch-demo-gif.py (Pillow-based curated frame assembler) and gitignores .playwright-mcp/ to keep snapshot artifacts out of the repo, matching the existing .gstack/ convention.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

That1Drifter and others added 2 commits April 10, 2026 16:52

That1Drifter merged commit 3be7b21 into master Apr 10, 2026
1 check passed

That1Drifter deleted the api-cache-toolify branch April 10, 2026 22:03

That1Drifter mentioned this pull request Apr 10, 2026

fix(core,web): normalize inner Claude tool input instead of rejecting it #16

Merged

4 tasks
