What the user sees
When chatting with a self-hosted Netclaw connected to a llama.cpp / llama-server backend, the assistant occasionally produces messages or tool calls where raw XML-style markup leaks into visible content. Symptoms include:
- `<tool_call>`, `<function=…>`, `<parameter=…>` tags appearing as plain text in the chat
- Stray `</think>` (or `<think>`) tags appearing in the assistant's reply or inside tool-call arguments
- A tool call whose `arguments` JSON value contains the literal text of another tool call concatenated onto the end
- Tool calls whose argument fields are partially or completely empty (e.g. `args={}`, `{"Path": ""}`) when the model clearly intended to populate them
- The same prompt working fine in one session but corrupting in another with longer history
This is almost always a chat-template mismatch on the inference server, not a Netclaw bug. Netclaw faithfully assembles the streaming deltas it receives — if the server emits <tool_call> literal text instead of structured tool-call deltas, that's what Netclaw sees.
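A quick way to confirm leakage from the Netclaw side is to scan assembled assistant text for the raw delimiters listed above. This is a minimal sketch; the tag list and the `find_leaked_markup` name are illustrative, not part of Netclaw:

```python
import re

# Heuristic scan for template-leak markers in assistant-visible text.
# The tag list covers the symptoms above (Qwen3-style XML plus
# DeepSeek-style <think> delimiters); extend it for other families.
LEAK_PATTERN = re.compile(
    r"</?think>|<tool_call>|</tool_call>|<function=[^>]*>|<parameter=[^>]*>"
)

def find_leaked_markup(text: str) -> list[str]:
    """Return every raw template tag that leaked into visible content."""
    return LEAK_PATTERN.findall(text)

# Example: a corrupted reply as a user would see it in the chat.
reply = "Sure.</think><tool_call><function=read_file><parameter=path>"
print(find_leaked_markup(reply))
# → ['</think>', '<tool_call>', '<function=read_file>', '<parameter=path>']
```

An empty result on a suspicious transcript suggests the corruption is something other than template leakage (e.g. the empty-arguments case, which needs the checklist below).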
Why it happens
Reasoning-capable open-weight models emit tool calls and reasoning blocks with model-specific delimiters. Common patterns:
- Qwen3 family — `<tool_call><function=…><parameter=…>…</parameter></tool_call>`, reasoning in `<think>…</think>`
- DeepSeek-R1 family — reasoning in `<think>…</think>`
- Hermes / Mistral / others — JSON-shaped tool calls
`llama-server` only knows how to parse these correctly when it's told to use the embedded chat template via `--jinja` (or an explicit `--chat-template-file`). Without that, it falls back to a heuristic parser that does not recognize the model's tool-call delimiters and lets the literal markup leak through into streaming output as plain text.
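The difference is visible in the raw streaming deltas. In an OpenAI-compatible stream, a correctly parsed tool call arrives in the delta's `tool_calls` field, while a template mismatch dumps the markup into `content`. A hedged sketch (the classifier and marker list are illustrative, not a Netclaw or llama.cpp API):

```python
# Substrings that indicate raw template markup in a content delta.
LEAK_MARKERS = ("<tool_call>", "<function=", "<parameter=", "</think>")

def classify_delta(delta: dict) -> str:
    """Classify one streamed chat.completion.chunk delta.

    'tool_call' -- server parsed the template; structured tool-call delta
    'leaked'    -- raw template markup arrived as plain content text
    'text'      -- ordinary assistant text
    """
    if delta.get("tool_calls"):
        return "tool_call"
    content = delta.get("content") or ""
    if any(marker in content for marker in LEAK_MARKERS):
        return "leaked"
    return "text"

# With --jinja, a tool call arrives structured:
print(classify_delta({"tool_calls": [{"index": 0, "function": {"name": "read_file"}}]}))
# Without it, the same call arrives as literal text:
print(classify_delta({"content": "<tool_call><function=read_file>"}))
```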
Models known to require --jinja for clean tool calling
This is not exhaustive, but the following families have been reported to exhibit XML / tool-call leakage when run through llama-server without --jinja:
- Qwen3 / Qwen3.5 / Qwen3-Coder — confirmed; `<tool_call>` XML and `</think>` leak into content and tool-call args
- Qwen2.5-Instruct (with tool calling) — covered explicitly by llama.cpp's function-calling docs as requiring `--jinja`
- DeepSeek-R1 distills — reasoning leakage if `--reasoning-format` is wrong for the consumer
Other reasoning-capable models likely behave similarly. As a rule of thumb, any model whose Hugging Face card describes a tool-call format using XML-style markup, or that ships its own `chat_template.jinja`, should be served with `--jinja`.
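One way to apply that rule of thumb programmatically is to grep the model's chat template for markup-style delimiters. The template string typically lives in the model's `tokenizer_config.json` (key `chat_template`) or in the GGUF metadata; the `needs_jinja` helper and marker list below are an illustrative sketch, not an official check:

```python
# Markers that suggest an XML-style tool-call format that a heuristic
# parser won't recognize. Illustrative, not exhaustive.
XML_STYLE_MARKERS = ("<tool_call>", "<function=", "<parameter=")

def needs_jinja(chat_template: str) -> bool:
    """Rule of thumb: does this template define markup-style tool-call
    delimiters, i.e. should the model be served with --jinja?"""
    return any(marker in chat_template for marker in XML_STYLE_MARKERS)

# Excerpt resembling a Qwen3-style template (illustrative, not verbatim).
excerpt = "{%- for call in tool_calls %}<tool_call><function={{ call.name }}>{%- endfor %}"
print(needs_jinja(excerpt))  # → True
```

A `False` here is not a guarantee that the fallback parser works, only that the obvious XML-style case doesn't apply.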
Recommended llama-server flags (Qwen3 example)
```
llama-server \
  --model <path/to/qwen3-gguf> \
  --jinja \
  --reasoning-format deepseek \
  --flash-attn on \
  --ctx-size <N> \
  --parallel <K> \
  --port <P>
```
- `--jinja` — mandatory; uses the GGUF's embedded chat template (which knows the model's tool-call delimiters)
- `--reasoning-format deepseek` — correct for Qwen3; the model uses the same `<think>`/`</think>` delimiters as DeepSeek
- For some buggy GGUF templates the community also publishes a corrected external template for use with `--chat-template-file <path>` (search the model's Hugging Face discussions)
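After restarting the server, one tool-calling request is enough to verify the fix. The payload below targets `llama-server`'s OpenAI-compatible `/v1/chat/completions` endpoint; the model name and tool schema are made up for illustration:

```python
import json

# Minimal tool-calling smoke-test request. Model name ("qwen3") and the
# get_weather tool are illustrative placeholders.
payload = {
    "model": "qwen3",
    "stream": False,
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
}

# POST this to http://localhost:<P>/v1/chat/completions.
# With --jinja the reply's message should carry a structured "tool_calls"
# array; without it, the raw <tool_call> markup often shows up inside
# "content" instead.
print(json.dumps(payload, indent=2))
```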
Diagnostic checklist
If a self-hosted Netclaw instance is producing corrupted tool calls or visible XML markup:
- Check the inference server's launch arguments for `--jinja`. If it's absent and the model is Qwen3 / DeepSeek-R1 / similar, that's almost certainly the issue.
- Check the model card on Hugging Face for the recommended `llama-server` / vLLM / Ollama command line. If it lists `--jinja` or a custom chat template file, follow it.
- Check the llama.cpp build commit for known parser-related regressions if argument fields are empty rather than corrupted with extra text. Check upstream issue/PR history for recent tool-call parser fixes.
- Try a higher-precision quantization (`Q5_K_XL` or `Q6_K_XL` instead of `Q4_*`). Tool-call structure is documented as quantization-sensitive — sub-4-bit quants frequently produce malformed tool calls even with the right template.
- Confirm the chat template embedded in the GGUF isn't itself broken — community-corrected templates exist for several Qwen3 quants on Hugging Face.
What Netclaw provides to help diagnose this
Netclaw emits diagnostic counters at three layers around every LLM streaming call:
- SSE layer — what came off the wire from the server (delta counts, suppressed deltas, finish reason)
- Middleware layer — what the chat-client decorator saw before the actor consumed it
- Actor layer — the assembled `ChatResponse` content breakdown (text chars, thinking chars, tool calls, finish reason)
These show up in the per-session log at `~/.netclaw/logs/sessions/<channel>_<thread>/session.log`. If counts match across all three layers but a tool call's `arguments` field is corrupted, the corruption originates upstream of Netclaw — almost always the inference server's chat template.
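Given those three snapshots, the comparison logic is straightforward. A sketch of the reasoning (the counter names and the `localize_divergence` helper are illustrative, not Netclaw's actual log schema):

```python
def localize_divergence(sse: dict, middleware: dict, actor: dict) -> str:
    """Compare per-call counter snapshots from the three logging layers.

    If all layers agree, the bytes were already corrupted before they
    reached Netclaw, i.e. on the inference server.
    """
    if sse != middleware:
        return "between SSE layer and middleware (Netclaw client)"
    if middleware != actor:
        return "between middleware and actor (Netclaw assembly)"
    return "upstream of Netclaw (inference server / chat template)"

# Example: all three layers report identical counts.
counters = {"deltas": 142, "tool_calls": 1, "finish_reason": "tool_calls"}
print(localize_divergence(counters, dict(counters), dict(counters)))
# → upstream of Netclaw (inference server / chat template)
```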
What this issue tracks
A short troubleshooting article (FAQ entry or docs page) covering:
- The symptoms above with one or two anonymized example fragments
- A short list of model families known to require `--jinja` (or equivalent template flag)
- The recommended diagnostic flow when a user reports XML leakage
- A pointer to llama.cpp's `function-calling.md` and the official Qwen llama.cpp guide
The article shouldn't try to be exhaustive — the goal is to short-circuit the obvious case ("user is on Qwen3 without `--jinja`") and point the rest at upstream documentation.
References
- llama.cpp `docs/function-calling.md`
- The official Qwen llama.cpp guide