Skip to content

fix: cheaper claude -p calls + honest cost-led stats (0.24.1)#43

Merged
Shahinyanm merged 1 commit into
mainfrom
fix/complete-cheaper-honest-stats
Jun 13, 2026
Merged

fix: cheaper claude -p calls + honest cost-led stats (0.24.1)#43
Shahinyanm merged 1 commit into
mainfrom
fix/complete-cheaper-honest-stats

Conversation

@Shahinyanm

Copy link
Copy Markdown
Member

Why

Running complete on a real task showed two problems with the new stats:

  • claude -p carries ~46k tokens of Claude Code system-prompt + tool schemas in every call. Empirically measured: a 9-token "say ok" prompt reported ~46k cached tokens, ~$0.03.
  • Summing those cached tokens made the line read spent 53k tok — alarming and misleading (our actual prompt was ~2k).

Changes

  • --disallowed-tools on every one-shot claude -p call. We never use tools there; denying the built-in set keeps their schemas out of the prompt and roughly halves the overhead (~46k → ~25k cached). Measured empirically; the cache-creation cost floor remains, so it's a token win more than a cost win.
  • Honest, cost-led stats. claude -p token accounting is muddy (a large prompt lands in cache_creation, not input_tokens), so the line now leads with the real dollar cost for cost-reporting backends and shows clean token counts only for API backends (Anthropic/OpenAI report no caching muddle). Token sizes scale to M.
  • Backend tip. When a cost-reporting backend was used, a one-liner points at --backend anthropic (direct Haiku API — skips the harness, ~50× cheaper per task) or --backend ollama (free, local).

Verified on real data

complete tj-aknbh66f8n (a huge multi-session finance task) → correctly left open ("investigation without a concluding cause/fix; compacted mid-investigation"), cost $0.098 · saved ~2.8M→199 tok (13874×).

Tests updated (stats_suffix cost-led vs token-led, fmt_tokens M scale, E2E asserts cost $0.0012). Full local gate green.

🤖 Generated with Claude Code

One-shot claude -p calls now pass --disallowed-tools (we never use
tools), keeping built-in tool schemas out of the prompt — roughly halves
the harness overhead. Stats lead with the real dollar cost for claude -p
(its token counts are muddy: a big prompt lands in cache_creation, not
input_tokens) and show clean tokens only for API backends; sizes scale
to M. A tip points at --backend anthropic (~50× cheaper) / ollama (free)
when a cost-reporting backend is used.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@Shahinyanm Shahinyanm merged commit 04aa194 into main Jun 13, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant