
Classify tool call tokens#10

Merged
mcharytoniuk merged 29 commits into main from classify-tool-call-tokens on May 12, 2026

Conversation

@mcharytoniuk

No description provided.

mcharytoniuk and others added 29 commits May 5, 2026 03:30
- Rename ReasoningTokenClassifier to SampledTokenClassifier and accept
  optional reasoning + tool-call marker pairs.
- Add SampledToken::ToolCall variant and TokenUsage tool_call_tokens counter.
- Expose llama_rs_detect_tool_call_markers FFI that reports the autoparser's
  tools.format.section_start/end strings.
- completion_tokens now sums every classified output kind so OpenAI-style
  totals match generated output even for models without reasoning markers.
… gating

The autoparser's `analyze_template` only runs tool-call analysis when
`jinja_caps.supports_tool_calls` is true, which is itself computed by trying
to render the template against a synthetic tool-using conversation. Templates
that can't render that exact conversation (Qwen3 is one) end up reporting
`supports_tool_calls=false` even though they happily emit tool calls in real
use, and the autoparser then leaves `tools.format` empty.

`llama_rs_detect_tool_call_markers` now reproduces the autoparser's
diff-based detection directly: render the template with and without a
tool-call assistant turn (using plain ASCII synthetic names), strip
reasoning markers, locate the JSON payload by braces, and return the
surrounding text as the open/close markers. This stays grounded in the
template's actual emitted output instead of falling back to model-specific
heuristics.

Also adds `llama_rs_diagnose_tool_call_synthetic_renders` so callers can
inspect the rendered no-tools/with-tools outputs when detection fails.
Round-trip test confirms the configured marker pairs come back through
markers(), and the undetermined() constructor reports None for both —
matching the runtime behaviour the diff-based detector now relies on.
Merge ToolCall and Undeterminable arms into one branch where they share a
no-op body, document the new diagnose_tool_call_synthetic_renders helper's
errors section, and backtick OpenAI in the TokenUsage::completion_tokens
docstring.
New wrapper_chat_parse.{h,cpp} wrap llama.cpp's `common_chat_parse` so
Paddler can recover structured tool-call data without ever deserialising
model-output JSON in Rust. The handle owns the parsed common_chat_msg;
accessor functions return owned strings (count + indexed getters for the
tool_calls list, plus content / reasoning_content getters) and a free
function tears down the handle.

ParsedChatMessage / ParsedToolCall value objects (Rust side) are pure data
and carry their own unit tests. Model::parse_chat_message wraps the FFI
behind a typed Result, with ParseChatMessageError variants per failure
mode (FfiError, ParseException, StringUtf8Error, ToolsSerialization,
NoChatTemplate).

TestFixture::shared now uses OnceLock::get_or_init so multiple tests in a
binary don't race on LlamaBackend::init. New integration tests exercise
parse_chat_message on the env-driven default model (pure content, Qwen3
tool-call payload, partial input, multiple calls, reasoning section,
empty input). The classifier marker-detection test that used to live in
paddler_tests now lives in bindings-tests so the bindings carry their own
quality bar.
…tring detector; require compiled gpu backend in test fixture
@mcharytoniuk mcharytoniuk merged commit 71738a2 into main May 12, 2026
2 checks passed
@mcharytoniuk mcharytoniuk deleted the classify-tool-call-tokens branch May 12, 2026 18:42