fix: cheaper claude -p calls + honest cost-led stats (0.24.1)#43
Merged
Conversation
One-shot claude -p calls now pass --disallowed-tools (we never use tools), keeping built-in tool schemas out of the prompt — roughly halves the harness overhead. Stats lead with the real dollar cost for claude -p (its token counts are muddy: a big prompt lands in cache_creation, not input_tokens) and show clean tokens only for API backends; sizes scale to M. A tip points at --backend anthropic (~50× cheaper) / ollama (free) when a cost-reporting backend is used. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why
Running
completeon a real task showed two problems with the new stats:claude -pcarries ~46k tokens of Claude Code system-prompt + tool schemas in every call. Empirically measured: a 9-token "say ok" prompt reported ~46k cached tokens, ~$0.03.spent 53k tok— alarming and misleading (our actual prompt was ~2k).Changes
--disallowed-toolson every one-shotclaude -pcall. We never use tools there; denying the built-in set keeps their schemas out of the prompt and roughly halves the overhead (~46k → ~25k cached). Measured empirically; the cache-creation cost floor remains, so it's a token win more than a cost win.claude -ptoken accounting is muddy (a large prompt lands incache_creation, notinput_tokens), so the line now leads with the real dollar cost for cost-reporting backends and shows clean token counts only for API backends (Anthropic/OpenAI report no caching muddle). Token sizes scale toM.--backend anthropic(direct Haiku API — skips the harness, ~50× cheaper per task) or--backend ollama(free, local).Verified on real data
complete tj-aknbh66f8n(a huge multi-session finance task) → correctly left open ("investigation without a concluding cause/fix; compacted mid-investigation"),cost $0.098 · saved ~2.8M→199 tok (13874×).Tests updated (
stats_suffixcost-led vs token-led,fmt_tokensM scale, E2E assertscost $0.0012). Full local gate green.🤖 Generated with Claude Code