Summary
Every LLM call sends the JSON schema for every available tool in the request. Netclaw currently has ~30-40 tools registered, each with parameter schemas, descriptions, and examples. Measure how many tokens of each turn's input are actually tool definitions vs conversation content. The measurement itself is the deliverable — the decision on what to do with it depends on the numbers.
Why
Tool schemas are stable per-session — they don't change between turns unless the tool set itself changes (progressive disclosure, skill auto-load, etc.). That means they should live inside the cacheable prefix and ride along with the #608 fix for free. But there are two ways they could secretly be costing us:
- If the schemas are large enough (say, 5k+ tokens of pure schema), the uncached cost on the FIRST turn of every session is non-trivial. Even with caching, you pay the full cost once per session, plus again on any turn where the tool set changes.
- If the schemas sit in a position where they break prefix cache stability (e.g., if the tool list re-serializes with different ordering or whitespace each turn), they could be poisoning the cache we just worked so hard to make stable. A cheap way to rule this out is to hash the serialized tools payload each turn, as sketched below.
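A minimal stability probe, assuming the serialized `tools` JSON is available as a string at the request-builder (the `ToolsPayloadProbe.Check` hook here is hypothetical, not an existing Netclaw API):

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// One-off probe: hash the serialized `tools` field each turn and compare.
// Any hash change while the tool set is unchanged means the serialization
// is non-deterministic (ordering, whitespace) and is breaking the prefix cache.
static class ToolsPayloadProbe
{
    private static string? _lastHash;

    // Call with the exact serialized `tools` JSON that goes on the wire.
    public static void Check(string toolsJson, int turn)
    {
        string hash = Convert.ToHexString(
            SHA256.HashData(Encoding.UTF8.GetBytes(toolsJson)));

        if (_lastHash is not null && hash != _lastHash)
            Console.WriteLine($"turn {turn}: tools payload CHANGED ({_lastHash[..8]} -> {hash[..8]})");
        else
            Console.WriteLine($"turn {turn}: tools payload stable ({hash[..8]}, {toolsJson.Length} chars)");

        _lastHash = hash;
    }
}
```

If the hash drifts on an unchanged tool set, the serializer itself is the cache-breaker, independent of how large the schemas are.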
Method
Use the eval suite + Multi-Turn Cache Evolution table from the post-#608 baseline as a starting point. `multi_turn_text_growth` has 5 short chit-chat turns — minimal conversation payload, so the input size is almost entirely {persisted prompt + tool schemas + session block}.
From the current post-fix baseline (memory `6b42a0e4-8210-4e55-b9ca-8ff65c527cac`):

| eval | turn | input | cached | uncached |
|------|-----:|------:|-------:|---------:|
| `multi_turn_text_growth` | 1 | 5380 | 4707 | 673 |
| `multi_turn_text_growth` | 2 | 5038 | 4864 | 174 |
That's roughly 5000 tokens of "static baseline" on each turn. SOUL.md, AGENTS.md, and TOOLING.md account for some of it; tool schemas account for some of it. The question is what fraction.
Concrete steps
- Tap the Netclaw-side serialization point where tool schemas get added to the outgoing request. Log or capture the serialized JSON length of the `tools` field on one representative call.
- Compare to the persisted system prompt length from `ISystemPromptProvider.GetSystemPrompt()`.
- Compute the ratio: how much of the ~5000-token static prefix is tool schemas? If it's >40%, trimming has measurable impact; if it's <10%, it's not worth optimizing. See the sketch after this list.
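A minimal sketch of the whole measurement, assuming a registry that can hand back the tool definitions (the `IToolRegistry` and `ToolDefinition` shapes here are hypothetical stand-ins for whatever Netclaw actually uses; `ISystemPromptProvider.GetSystemPrompt()` is the entry point named above) and a rough chars/4 heuristic in place of a real tokenizer:

```csharp
using System;
using System.Text.Json;

// Hypothetical shapes; substitute Netclaw's actual registry/provider types.
public interface IToolRegistry { object[] GetToolDefinitions(); }
public interface ISystemPromptProvider { string GetSystemPrompt(); }

static class StaticPrefixBreakdown
{
    // Rough heuristic: ~4 chars per token for English text and JSON.
    // The provider's real tokenizer would be more accurate.
    static int EstimateTokens(string s) => s.Length / 4;

    public static void Report(IToolRegistry tools, ISystemPromptProvider prompts)
    {
        // Serialize once, the same way the request builder does on the wire.
        string toolsJson = JsonSerializer.Serialize(tools.GetToolDefinitions());
        string systemPrompt = prompts.GetSystemPrompt();

        int toolTokens = EstimateTokens(toolsJson);
        int promptTokens = EstimateTokens(systemPrompt);
        int staticTotal = toolTokens + promptTokens;

        Console.WriteLine($"tools:  {toolsJson.Length} chars, ~{toolTokens} tokens");
        Console.WriteLine($"prompt: {systemPrompt.Length} chars, ~{promptTokens} tokens");
        Console.WriteLine(
            $"tool share of static prefix: {100.0 * toolTokens / staticTotal:F1}%");
    }
}
```

Because the same heuristic sits in both the numerator and the denominator, the ratio is far less sensitive to the chars/4 error than the absolute token counts are.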
Potential follow-up actions (depend on the measurement)
Not filing as separate issues until we have numbers, but candidates are:
- Trim tool descriptions — if descriptions are verbose and repetitive, condensing them saves the same tokens on every single turn.
- More aggressive progressive disclosure — Netclaw already has `search_tools` for dynamic discovery. If the token savings are large, we could move more tools behind progressive discovery and let the agent pull them on demand (sketched after this list).
- Tool schema compression — some providers let you use a shorter format (OpenAI has `tool_choice: "auto"` with a parallel-friendly format); not available on all providers but worth checking.
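For the progressive-disclosure candidate, a minimal sketch of per-request tool filtering, assuming the hypothetical `ToolDefinition` shape from above and an illustrative always-on core set (`search_tools` is real per the above; the other core names are made up for the example):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape; substitute Netclaw's actual tool definition type.
public record ToolDefinition(string Name, string Description, object Parameters);

static class ProgressiveToolset
{
    // Illustrative always-on set: the discovery entry point plus a few basics.
    // Everything else stays out of the request until the agent asks for it.
    static readonly HashSet<string> Core = new() { "search_tools", "read_file", "exec" };

    public static IReadOnlyList<ToolDefinition> ForRequest(
        IReadOnlyList<ToolDefinition> all,
        IReadOnlySet<string> sessionLoaded) // tools pulled in earlier this session
        => all.Where(t => Core.Contains(t.Name) || sessionLoaded.Contains(t.Name))
              // Stable ordering so the serialized prefix stays byte-identical
              // between turns; an added tool changes the prefix once, not every turn.
              .OrderBy(t => t.Name, StringComparer.Ordinal)
              .ToList();
}
```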
Out of scope
- Any code changes based on the measurement — this issue is just the measurement. Decisions about filing follow-up issues will happen after we see real numbers.
- Non-measurement changes to the tool serialization pipeline.
Related
- `6b42a0e4-8210-4e55-b9ca-8ff65c527cac` (post-fix baseline eval numbers, when the memorizer box comes back online)