
Classify tool call tokens#10

Merged
mcharytoniuk merged 29 commits into main from classify-tool-call-tokens on May 12, 2026

Conversation

@mcharytoniuk

No description provided.

mcharytoniuk and others added 29 commits May 5, 2026 03:30
- Rename ReasoningTokenClassifier to SampledTokenClassifier and accept
  optional reasoning + tool-call marker pairs.
- Add SampledToken::ToolCall variant and TokenUsage tool_call_tokens counter.
- Expose llama_rs_detect_tool_call_markers FFI that reports the autoparser's
  tools.format.section_start/end strings.
- completion_tokens now sums every classified output kind so OpenAI-style
  totals match generated output even for models without reasoning markers.
… gating

The autoparser's `analyze_template` only runs tool-call analysis when
`jinja_caps.supports_tool_calls` is true, which is itself computed by trying
to render the template against a synthetic tool-using conversation. Templates
that can't render that exact conversation (Qwen3 is one) end up reporting
`supports_tool_calls=false` even though they happily emit tool calls in real
use, and the autoparser then leaves `tools.format` empty.

`llama_rs_detect_tool_call_markers` now reproduces the autoparser's
diff-based detection directly: render the template with and without a
tool-call assistant turn (using plain ASCII synthetic names), strip
reasoning markers, locate the JSON payload by braces, and return the
surrounding text as the open/close markers. This stays grounded in the
template's actual emitted output instead of falling back to model-specific
heuristics.

Also adds `llama_rs_diagnose_tool_call_synthetic_renders` so callers can
inspect the rendered no-tools/with-tools outputs when detection fails.
Round-trip test confirms the configured marker pairs come back through
markers(), and the undetermined() constructor reports None for both —
matching the runtime behaviour the diff-based detector now relies on.
Merge ToolCall and Undeterminable arms into one branch where they share a
no-op body, document the new diagnose_tool_call_synthetic_renders helper's
errors section, and backtick OpenAI in the TokenUsage::completion_tokens
docstring.
New wrapper_chat_parse.{h,cpp} wrap llama.cpp's `common_chat_parse` so
Paddler can recover structured tool-call data without ever deserialising
model-output JSON in Rust. The handle owns the parsed common_chat_msg;
accessor functions return owned strings (count + indexed getters for the
tool_calls list, plus content / reasoning_content getters) and a free
function tears down the handle.

ParsedChatMessage / ParsedToolCall value objects (Rust side) are pure data
and carry their own unit tests. Model::parse_chat_message wraps the FFI
behind a typed Result, with ParseChatMessageError variants per failure
mode (FfiError, ParseException, StringUtf8Error, ToolsSerialization,
NoChatTemplate).

TestFixture::shared now uses OnceLock::get_or_init so multiple tests in a
binary don't race on LlamaBackend::init. New integration tests exercise
parse_chat_message on the env-driven default model (pure content, Qwen3
tool-call payload, partial input, multiple calls, reasoning section,
empty input). The classifier marker-detection test that used to live in
paddler_tests now lives in bindings-tests so the bindings carry their own
quality bar.
…tring detector; require compiled gpu backend in test fixture
@mcharytoniuk mcharytoniuk merged commit 71738a2 into main May 12, 2026
2 checks passed
@mcharytoniuk mcharytoniuk deleted the classify-tool-call-tokens branch May 12, 2026 18:42