Token usage and thinking classification #229
Open
mcharytoniuk wants to merge 46 commits into
Conversation
Wire format
-----------

- Replace GeneratedTokenResult::Token(String) with three explicit variants: ContentToken, ReasoningToken, ToolCallToken (plus the existing UndeterminableToken).
- Replace the GeneratedTokenResult::Done unit variant with Done(GenerationSummary), carrying the final TokenUsage with prompt/cached/image/audio/content/reasoning/tool_call/undeterminable token counts.

Agent
-----

- Construct a per-request SampledTokenClassifier from the model and feed every sampled token through ingest(), then emit the matching token variant on the inference channel; usage is converted from the bindings type once per generation and shipped on the Done event.

Transformer trait
-----------------

- TransformsOutgoingMessage::transform now returns Vec<TransformResult>, so a single message can produce multiple SSE chunks.

OpenAI compat endpoint
----------------------

- Add stream_options.include_usage; honor it to emit a final usage chunk in streaming mode.
- Route reasoning tokens to delta.reasoning_content and tool-call tokens into delta.tool_calls function-arguments fragments per the streaming spec.
- Forward the tools array from the request through to the agent.
- The non-streaming path replaces the simple text concatenator with an Arc<Mutex>-backed aggregator that buffers content/reasoning/tool-call text separately, parses tool-call JSON for name/arguments, and emits a single OpenAI chat.completion JSON with finish_reason "tool_calls" when applicable.

Local bindings
--------------

- The workspace dependency now points at the sibling llama-cpp-bindings checkout (mtmd became always-on, so the feature flag is dropped).
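The reworked wire enum might look roughly like this; variant and field names follow the description above, but the exact shapes in the PR may differ:

```rust
// Hypothetical sketch of the reworked wire types; names mirror the
// PR description, not the actual source.
#[derive(Debug, Default, PartialEq)]
pub struct TokenUsage {
    pub prompt_tokens: u64,
    pub cached_tokens: u64,
    pub image_tokens: u64,
    pub audio_tokens: u64,
    pub content_tokens: u64,
    pub reasoning_tokens: u64,
    pub tool_call_tokens: u64,
    pub undeterminable_tokens: u64,
}

#[derive(Debug, PartialEq)]
pub struct GenerationSummary {
    pub token_usage: TokenUsage,
}

#[derive(Debug, PartialEq)]
pub enum GeneratedTokenResult {
    ContentToken(String),
    ReasoningToken(String),
    ToolCallToken(String),
    UndeterminableToken(String),
    Done(GenerationSummary),
}

impl GeneratedTokenResult {
    // Any of the four token variants counts as a token; Done terminates.
    pub fn is_token(&self) -> bool {
        !matches!(self, Self::Done(_))
    }
}
```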
Adds a focused integration test that loads Qwen3 0.6B and asserts the classifier resolves both the reasoning and tool-call marker pairs to single special tokens. On failure the test attaches the rendered no-tools / with-tools template outputs so marker-extraction issues can be diagnosed without re-running the full inference pipeline.
The classifier emits the tool-call open/close markers as ToolCallToken
events alongside the JSON payload, so the non-streaming OpenAI aggregator's
buffer ends up shaped like `<tool_call>\n{...}\n</tool_call>`. Locate the
JSON object by its first `{` and last `}` before parsing — the same approach
llama.cpp's autoparser uses for JSON-native tool calls — so the resulting
function name and arguments survive marker text on either side.
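A minimal sketch of the brace-location step described above, assuming the buffer holds the whole tool-call section including the markers (the function name is illustrative):

```rust
// Slice from the first `{` to the last `}` so the surrounding
// <tool_call>/</tool_call> marker text does not break JSON parsing.
fn locate_json_object(buffer: &str) -> Option<&str> {
    let start = buffer.find('{')?;
    let end = buffer.rfind('}')?;
    if end < start {
        return None;
    }
    // Both braces are single-byte, so inclusive slicing is safe here.
    Some(&buffer[start..=end])
}
```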
OpenAIStreamingResponseTransformer now carries a saw_tool_call flag, flipped on the first ToolCallToken; when it is set, the trailing chunk's finish_reason becomes "tool_calls" instead of "stop". This aligns the streaming response with OpenAI's spec for tool-using completions and with the non-streaming aggregator, which already reports the same finish reason.

Bumps max_completion_tokens on the reasoning-routing and non-streaming-usage integration tests so Qwen3 has room to finish its <think> block before the content phase, then asserts on the corresponding streaming/non-streaming fields.
Two new transformer-level tests confirm that emitting a ToolCallToken during the turn flips the trailing chunk's finish_reason to "tool_calls" and that a content-only turn still finishes with "stop".
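The finish_reason latch can be sketched as below, assuming a trimmed-down transformer state (the real struct carries much more):

```rust
// Minimal model of the saw_tool_call latch: flipped once on the first
// ToolCallToken, consulted only when the trailing chunk is emitted.
#[derive(Default)]
struct StreamingState {
    saw_tool_call: bool,
}

impl StreamingState {
    fn on_tool_call_token(&mut self) {
        self.saw_tool_call = true;
    }

    // finish_reason for the trailing streaming chunk.
    fn finish_reason(&self) -> &'static str {
        if self.saw_tool_call {
            "tool_calls"
        } else {
            "stop"
        }
    }
}
```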
Generate one call_id per request inside the streaming transformer's state,
include it in every delta.tool_calls fragment, and reuse the same prefix
("call_<nanoid>") that the non-streaming aggregator emits. Strict OpenAI
clients require the id field on each tool-call delta to correlate fragments.
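The per-request id reuse might look like this; the PR generates the suffix with nanoid, while this dependency-free stand-in takes it as a parameter:

```rust
// Sketch of per-request tool-call id state. The "call_" prefix matches
// what the non-streaming aggregator emits; the suffix source is an
// assumption (the PR uses nanoid).
struct ToolCallIds {
    call_id: String,
}

impl ToolCallIds {
    fn new(suffix: &str) -> Self {
        Self {
            call_id: format!("call_{suffix}"),
        }
    }

    // Every delta.tool_calls fragment reuses the same id so strict
    // clients can correlate fragments into one tool call.
    fn fragment_id(&self) -> &str {
        &self.call_id
    }
}
```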
- token_usage::{completion_tokens,total_tokens} are const-eligible.
- token_usage_from_bindings is also const-eligible.
- Replace ContinuousBatchScheduler's ad-hoc i32->u64 cast with an explicit
expect-with-reason since max_tokens is non-negative by API contract.
- Use is_some_and / map_or_else instead of match-on-Option, drop the
Result wrapping on a panic-only test, and scope the non-streaming
Mutex guards so they drop before the empty-vec return.
- Adopt the GeneratedTokenResult::is_token method reference everywhere
  clippy flagged the previous closure as redundant.
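The Option cleanup in the list above, as an illustrative before/after (the function names and values are made up):

```rust
// Before: explicit match on Option.
fn max_tokens_reached_match(max_tokens: Option<u64>, generated: u64) -> bool {
    match max_tokens {
        Some(limit) => generated >= limit,
        None => false,
    }
}

// After: the clippy-preferred is_some_and form, same behavior.
fn max_tokens_reached_idiomatic(max_tokens: Option<u64>, generated: u64) -> bool {
    max_tokens.is_some_and(|limit| generated >= limit)
}
```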
Replace the single InferenceMessageKind.TOKEN with one kind per token variant — CONTENT_TOKEN, REASONING_TOKEN, TOOL_CALL_TOKEN, UNDETERMINABLE_TOKEN — and parse the `Done` payload as a TokenUsage-bearing GenerationSummary. is_token now matches every token kind, and is_terminal is its inverse. New unit tests cover each token variant, the populated summary on Done, and the rejection of the legacy string-form Done.
New paddler::tool_call_* modules — one struct each, single responsibility:

- ToolCallBuffer: append-only string buffer; pure data, fully unit-tested.
- ToolCallParser: thin wrapper over Model::parse_chat_message; never deserialises JSON in Rust on model output.
- ToolCallValidator: schema-driven where the tool declared one, JSON-object structural check otherwise; always invoked, with ValidatorBuildError surfacing schema-load failures and ToolCallValidationError separating UnknownToolName / InvalidJson / NotAnObject / SchemaMismatch.
- ToolCallEvent: explicit event enum (Pending / Resolved / ParseFailed / ValidationFailed); pure data, unit-testable.
- ToolCallPipeline: composes Buffer + Parser + Validator. The same component is shared by both endpoints; integration tests cover the end-to-end behaviour.

The wire format gains three new GeneratedTokenResult variants — ToolCallParsed (structured, always emitted on the close marker) plus ToolCallParseFailed and ToolCallValidationFailed (informational, do NOT terminate the request). paddler_types::ParsedToolCall is the shared wire value object.

Scheduler integration: ContinuousBatchActiveRequest holds an Option<ToolCallPipeline>; the scheduler feeds every ToolCallToken to the pipeline and finalises whenever the classifier transitions out of the in_tool_call state, emitting the resulting structured event downstream. The pipeline is constructed only when the request actually has tools.

OpenAI compat refactor: deleted parse_tool_call_payload, locate_json_object, and the ToolCallPayload struct — every JSON parse on model output is now gone. Both transformers consume the structured ToolCallParsed event and emit OpenAI-spec delta.tool_calls / message.tool_calls without ever peeking inside the model's raw text. Per-variant arms moved into focused helper methods (handle_content / handle_reasoning / handle_tool_call_parsed / handle_done) per the single-responsibility audit.
ParsedToolCall is added to paddler_types as a Serialize/Deserialize wire struct; bindings ships an internal twin for the FFI return.
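A hedged sketch of how the buffer/parser/validator composition described above could fit together. The stand-in finalize only checks for a non-empty buffer, whereas the real pipeline delegates parsing to Model::parse_chat_message and then validates; the event fields here are assumptions:

```rust
// Event variants follow the PR description; payload fields are illustrative.
#[derive(Debug, PartialEq)]
enum ToolCallEvent {
    Pending,
    Resolved { payload: String },
    ParseFailed { reason: String },
}

// Append-only buffer for ToolCallToken pieces; pure data.
#[derive(Default)]
struct ToolCallBuffer {
    text: String,
}

struct ToolCallPipeline {
    buffer: ToolCallBuffer,
}

impl ToolCallPipeline {
    fn new() -> Self {
        Self {
            buffer: ToolCallBuffer::default(),
        }
    }

    // Each ToolCallToken is buffered; nothing is parsed until finalize.
    fn ingest(&mut self, piece: &str) -> ToolCallEvent {
        self.buffer.text.push_str(piece);
        ToolCallEvent::Pending
    }

    // Called when the classifier leaves the in_tool_call section.
    // Stand-in logic only: the real pipeline parses and validates.
    fn finalize(self) -> ToolCallEvent {
        if self.buffer.text.trim().is_empty() {
            ToolCallEvent::ParseFailed {
                reason: "empty tool-call section".into(),
            }
        } else {
            ToolCallEvent::Resolved {
                payload: self.buffer.text,
            }
        }
    }
}
```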
…errors as hard Err; collapse image pipeline to one decode
…ool-call internal tests
Pull request overview
This PR expands Paddler’s token streaming protocol to classify streamed tokens (content vs reasoning vs tool-call) and report detailed token usage at completion, while adding an end-to-end tool-call parsing/validation pipeline. In parallel, it extracts the web/admin TypeScript wire schemas + state helpers into a new @intentee/paddler-client workspace package and updates the existing UI code to consume it.
Changes:
- Add a tool-call parsing/validation pipeline in paddler and surface structured tool-call events and richer “Done” summaries (usage breakdown) through streaming.
- Extend request params with parse_tool_calls, refactor ContinueFromConversationHistoryParams imports, and update OpenAI compatibility transformers + tests for new token kinds/usage.
- Introduce the paddler_client_javascript workspace package (schemas, streaming helpers, errors) and migrate resources/ts usage/tests to it.
Reviewed changes
Copilot reviewed 278 out of 290 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tsconfig.json | Switch module resolution to bundler for workspace-style TS imports. |
| resources/ts/webSocketProtocol.ts | Remove local helper (migrated to JS client). |
| resources/ts/urlToAgentDesiredModel_test.ts | Remove local test (migrated to JS client). |
| resources/ts/schemas/InferenceServiceGenerateTokensResponse.ts | Remove local schema (migrated to JS client). |
| resources/ts/matchWebSocketState.ts | Use @intentee/paddler-client WebSocket state types. |
| resources/ts/matchFetchJsonState.ts | Use @intentee/paddler-client fetch state types; simplify success path. |
| resources/ts/matchEventSourceUpdateState.ts | Use @intentee/paddler-client EventSource state types. |
| resources/ts/InferenceSocketClient.interface.ts | Remove local interface (moved/embedded in JS client). |
| resources/ts/inferenceParametersFormKeys.ts | Reintroduce boolean/number key helpers for UI forms. |
| resources/ts/hooks/useWebSocket.ts | Use shared WebSocket state constants/types from JS client. |
| resources/ts/hooks/usePrompt.ts | Switch HTTP streaming to streamHttpNdjson + updated response schema/tokenKind. |
| resources/ts/hooks/useFetchJson.ts | Use shared fetch state constants/types from JS client. |
| resources/ts/hooks/useChatTemplateOverride.ts | Import shared ChatTemplateSchema from JS client. |
| resources/ts/hooks/useBalancerDesiredState.ts | Import shared BalancerDesiredStateSchema from JS client. |
| resources/ts/hooks/useAgentDesiredModelUrl.ts | Import shared AgentDesiredModel + URL parser from JS client. |
| resources/ts/extractHuggingFaceUrlParts.ts | Remove local helper (migrated to JS client). |
| resources/ts/ConversationMessageContentPart.type.ts | Remove local type (migrated to JS client schemas). |
| resources/ts/ConversationMessage.type.ts | Remove local type (migrated to JS client schemas). |
| resources/ts/contexts/InferenceParametersContext.ts | Import shared InferenceParameters type from JS client. |
| resources/ts/contexts/ChatTemplateContext.ts | Import shared ChatTemplate type from JS client. |
| resources/ts/components/PromptPage.tsx | Import webSocketProtocol from JS client. |
| resources/ts/components/ModelMetadataPreviewButton.tsx | Import shared Agent type from JS client. |
| resources/ts/components/ModelMetadataLoader.tsx | Import shared Agent type from JS client. |
| resources/ts/components/ModelMetadata.tsx | Import shared Agent type from JS client. |
| resources/ts/components/ModelChatTemplateOverridePreviewButton.tsx | Import shared Agent type from JS client. |
| resources/ts/components/InferenceParametersContextProvider.tsx | Import shared InferenceParameters type from JS client. |
| resources/ts/components/InferenceParameterPoolingType.tsx | Import shared constants from JS client schemas. |
| resources/ts/components/InferenceParameterInput.tsx | Use local form-key helper types for numeric keys. |
| resources/ts/components/InferenceParameterCheckbox.tsx | Use local form-key helper types for boolean keys. |
| resources/ts/components/InferenceParameterCacheDtype.tsx | Import shared constants from JS client schemas. |
| resources/ts/components/ChatTemplateOverrideLoader.tsx | Import shared Agent type from JS client. |
| resources/ts/components/ChatTemplateEditButton.tsx | Import shared ChatTemplate type from JS client. |
| resources/ts/components/ChatTemplateContextProvider.tsx | Import shared ChatTemplate type from JS client. |
| resources/ts/components/ChangeModelPage.tsx | Import shared AgentDesiredModel type from JS client. |
| resources/ts/components/ChangeModelForm.tsx | Import shared BalancerDesiredState type from JS client. |
| resources/ts/components/BufferedRequestsStream.tsx | Import shared response schema from JS client. |
| resources/ts/components/AgentListStream.tsx | Import shared response schema from JS client. |
| resources/ts/components/AgentListAgentStatus.tsx | Import shared Agent type from JS client. |
| resources/ts/components/AgentList.tsx | Import shared Agent type from JS client. |
| resources/ts/components/AgentIssuesPreviewButton.tsx | Import shared AgentIssue type from JS client. |
| resources/ts/components/AgentIssues.tsx | Import shared AgentIssue type from JS client. |
| paddler/src/tool_call_validation_error.rs | Add validation error types for tool calls. |
| paddler/src/tool_call_pipeline.rs | Add buffering/parse/validate pipeline for tool-call fragments. |
| paddler/src/tool_call_parser.rs | Parse tool calls via llama.cpp and template-override fallback. |
| paddler/src/tool_call_parse_error.rs | Define parse errors for tool-call parsing. |
| paddler/src/tool_call_event.rs | Introduce tool-call pipeline events + unit tests. |
| paddler/src/sets_desired_state.rs | Move AgentDesiredState import to paddler_types. |
| paddler/src/lib.rs | Export new tool-call modules; remove llama_cpp_bindings re-export. |
| paddler/src/cancellation_token_stream_guard.rs | Formatting-only change to poll_next signature. |
| paddler/src/balancer/inference_service/http_route/api/post_generate_embedding_batch.rs | Update transformer trait to return multiple chunks; preserve NDJSON streaming. |
| paddler/src/balancer/inference_service/http_route/api/post_continue_from_conversation_history.rs | Update params import path for ContinueFromConversationHistoryParams. |
| paddler/src/balancer/chunk_forwarding_session_controller/transforms_outgoing_message.rs | Change transformer contract to Vec<TransformResult>. |
| paddler/src/balancer/chunk_forwarding_session_controller/mod.rs | Forward multiple transform results per outgoing message. |
| paddler/src/balancer/chunk_forwarding_session_controller/identity_transformer.rs | Adapt identity transformer to new multi-result interface. |
| paddler/src/balancer/agent_controller.rs | Update imports to paddler_types + new params path. |
| paddler/src/balancer/agent_controller_pool.rs | Update imports + minor formatting simplification. |
| paddler/src/balancer_applicable_state.rs | Use paddler_types::agent_desired_state::AgentDesiredState. |
| paddler/src/balancer_applicable_state_holder.rs | Use paddler_types::agent_desired_state::AgentDesiredState. |
| paddler/src/agent/reconciliation_service.rs | Use paddler_types::agent_desired_state::AgentDesiredState. |
| paddler/src/agent/prepared_conversation_history_request.rs | Carry parse_tool_calls + validated tools into prepared requests. |
| paddler/src/agent/prepare_conversation_history_request.rs | Thread parse_tool_calls through; consolidate image prep with prepared_for_inference. |
| paddler/src/agent/management_socket_client_service.rs | Use paddler_types::agent_desired_state::AgentDesiredState. |
| paddler/src/agent/jsonrpc/request.rs | Update ContinueFromConversationHistoryParams import path. |
| paddler/src/agent/jsonrpc/notification_params/set_state_params.rs | Use paddler_types::agent_desired_state::AgentDesiredState. |
| paddler/src/agent/continuous_batch_scheduler/tool_call_pipeline_build_outcome.rs | Add outcome type for enabling tool-call pipeline. |
| paddler/src/agent/continuous_batch_scheduler/tool_call_pass.rs | Add pass to finalize tool-call parsing when leaving tool-call section. |
| paddler/src/agent/continuous_batch_scheduler/sample_token_phase.rs | Extract sampling logic into a dedicated phase type. |
| paddler/src/agent/continuous_batch_scheduler/sample_outcome.rs | Define typed outcomes for sampling stage. |
| paddler/src/agent/continuous_batch_scheduler/ingesting_contribution.rs | Track prompt-ingest contributions per pass. |
| paddler/src/agent/continuous_batch_scheduler/generating_contribution.rs | Track generating contributions per pass. |
| paddler/src/agent/continuous_batch_scheduler/emit_token_phase.rs | Emit classified token events (content/reasoning/tool-call/undeterminable). |
| paddler/src/agent/continuous_batch_scheduler/emit_token_outcome.rs | Define outcomes for emission stage. |
| paddler/src/agent/continuous_batch_scheduler/decode_outcome.rs | Wrap decode outcomes + add unit tests for mapping. |
| paddler/src/agent/continuous_batch_scheduler/decode_batch_phase.rs | Extract decode execution into a dedicated phase. |
| paddler/src/agent/continuous_batch_scheduler/contributions.rs | Aggregate ingest/generate contributions per batch pass. |
| paddler/src/agent/continuous_batch_scheduler/completion_check_phase.rs | Stop on EOG or max_tokens using classifier usage counts. |
| paddler/src/agent/continuous_batch_scheduler/completion_check_outcome.rs | Define completion-check outcomes. |
| paddler/src/agent/continuous_batch_scheduler/commit_phase.rs | Commit batch-pass effects back into active requests. |
| paddler/src/agent/continuous_batch_scheduler/classify_token_phase.rs | Classify sampled tokens and track tool-call section transitions. |
| paddler/src/agent/continuous_batch_scheduler/classified_token.rs | Store raw vs visible pieces and tool-call section flags. |
| paddler/src/agent/continuous_batch_scheduler/batch_pass.rs | Encapsulate LlamaBatch plus contribution bookkeeping. |
| paddler/src/agent/continuous_batch_scheduler/advance_outcome.rs | Represent “advance” outcomes + add unit tests. |
| paddler/src/agent/continuous_batch_arbiter.rs | Update token-to-piece calls to new SampledToken API. |
| paddler/src/agent/continuous_batch_active_request.rs | Track token classifier, pending SampledToken, and optional tool-call pipeline. |
| paddler/src/agent/continue_from_conversation_history_request.rs | Update params import path. |
| paddler/src/agent_desired_state.rs | Stop re-exporting AgentDesiredState; use paddler_types directly. |
| paddler_types/src/request_params/mod.rs | Remove re-export of ContinueFromConversationHistoryParams. |
| paddler_types/src/request_params/continue_from_conversation_history_params/tool/tool_params/mod.rs | Remove re-export; require explicit path for FunctionCall. |
| paddler_types/src/request_params/continue_from_conversation_history_params/tool/tool_params/function_call/parameters_schema/raw_parameters_schema.rs | Simplify schema validation (keep required-in-properties check). |
| paddler_types/src/request_params/continue_from_conversation_history_params/tool/mod.rs | Fix FunctionCall import path and ordering. |
| paddler_types/src/request_params/continue_from_conversation_history_params/mod.rs | Add parse_tool_calls flag and propagate through validation. |
| paddler_types/src/lib.rs | Export new generation_summary module. |
| paddler_types/src/inference_server/request.rs | Update params import path. |
| paddler_types/src/generation_summary.rs | Add GenerationSummary carrying TokenUsage. |
| paddler_types/Cargo.toml | Add dependency on llama-cpp-bindings-types. |
| paddler_tests/tests/smolvlm2_generates_tokens_from_image_input.rs | Update params + token-kind checks + Done now carries summary. |
| paddler_tests/tests/qwen35_without_mmproj_rejects_image_with_multimodal_not_supported.rs | Add parse_tool_calls field. |
| paddler_tests/tests/qwen35_with_system_message_completes_without_thinking.rs | Add parse_tool_calls; update token-kind checks + Done summary. |
| paddler_tests/tests/qwen35_with_system_message_completes_with_thinking.rs | Add parse_tool_calls; update token-kind checks + Done summary. |
| paddler_tests/tests/qwen35_with_mmproj_generates_tokens_from_image.rs | Add parse_tool_calls; update token-kind checks + Done summary. |
| paddler_tests/tests/qwen35_thinking_multi_turn_conversation_stops_cleanly.rs | Add parse_tool_calls; update token-kind checks + Done summary. |
| paddler_tests/tests/qwen35_thinking_mode_stops_cleanly_before_max_tokens.rs | Add parse_tool_calls; update token-kind checks + Done summary. |
| paddler_tests/tests/qwen35_generation_stops_at_eog_before_max_tokens.rs | Add parse_tool_calls; update token-kind checks + Done summary. |
| paddler_tests/tests/qwen35_generates_tokens_for_long_system_and_user_prompt.rs | Add parse_tool_calls; update token-kind checks + Done summary. |
| paddler_tests/tests/qwen3_without_grammar_generates_unconstrained_output.rs | Update token-kind checks via is_token(). |
| paddler_tests/tests/qwen3_openai_streaming_usage_breakdown_with_thinking.rs | New test for OpenAI streaming usage chunk with thinking. |
| paddler_tests/tests/qwen3_openai_streaming_routes_reasoning_to_reasoning_content.rs | New test mapping reasoning tokens to OpenAI reasoning_content. |
| paddler_tests/tests/qwen3_openai_streaming_omits_usage_when_not_requested.rs | New test ensuring usage omitted unless requested. |
| paddler_tests/tests/qwen3_openai_streaming_emits_usage_when_requested.rs | New test ensuring trailing usage chunk emitted when requested. |
| paddler_tests/tests/qwen3_openai_streaming_emits_tool_calls_for_function_tool.rs | New test for structured OpenAI streaming tool calls. |
| paddler_tests/tests/qwen3_openai_non_streaming_usage_with_tool_calls.rs | New test for non-streaming usage + tool calls. |
| paddler_tests/tests/qwen3_openai_non_streaming_returns_usage.rs | New test for non-streaming usage fields. |
| paddler_tests/tests/qwen3_internal_endpoint_with_thinking_enabled_emits_reasoning_tokens.rs | New test for reasoning token classification + usage invariants. |
| paddler_tests/tests/qwen3_internal_endpoint_with_thinking_disabled_emits_no_reasoning_tokens.rs | New test ensuring no reasoning tokens when disabled. |
| paddler_tests/tests/qwen3_internal_endpoint_pure_content_usage.rs | New test for pure content usage breakdown. |
| paddler_tests/tests/qwen3_internal_endpoint_max_tokens_usage_matches.rs | New test ensuring usage completion count matches streamed token count. |
| paddler_tests/tests/qwen3_internal_endpoint_concurrent_requests_independent_usage.rs | New test ensuring per-request usage counters are independent. |
| paddler_tests/tests/qwen3_grammar_with_thinking_returns_incompatible_error.rs | Add parse_tool_calls field. |
| paddler_tests/tests/qwen3_generates_tokens_from_raw_prompt.rs | Update token-kind checks + Done summary. |
| paddler_tests/tests/qwen3_generates_tokens_from_conversation_history.rs | Add parse_tool_calls; update token-kind checks + Done summary. |
| paddler_tests/tests/qwen25vl_generates_tokens_from_image_input.rs | Add parse_tool_calls; update token-kind checks + Done summary. |
| paddler_tests/tests/continuous_batch_two_concurrent_multimodal_requests_produce_tokens.rs | Add parse_tool_calls; update token-kind checks + Done summary. |
| paddler_tests/tests/continuous_batch_stops_generation_when_stop_sender_dropped.rs | Update token-kind checks + Done summary. |
| paddler_tests/tests/continuous_batch_stops_at_max_tokens_boundary.rs | Update token-kind checks + Done summary. |
| paddler_tests/tests/continuous_batch_smoke.rs | Update token-kind checks + Done summary. |
| paddler_tests/tests/continuous_batch_serves_four_concurrent_requests.rs | Update token-kind checks + Done summary. |
| paddler_tests/tests/continuous_batch_reuses_slot_after_request_completes.rs | Update token-kind checks + Done summary. |
| paddler_tests/tests/continuous_batch_plain_and_multimodal_run_concurrently.rs | Add parse_tool_calls; update token-kind checks + Done summary. |
| paddler_tests/tests/continuous_batch_long_and_short_prompts_complete_concurrently.rs | Update token-kind checks + Done summary. |
| paddler_tests/tests/continuous_batch_generates_tokens_with_partial_layer_offload.rs | Update token-kind checks + Done summary. |
| paddler_tests/tests/continuous_batch_generates_tokens_with_distinct_k_and_v_cache_dtypes.rs | Update token-kind checks + Done summary. |
| paddler_tests/tests/continuous_batch_evicts_long_sequence_under_kv_pressure.rs | Update Done summary matching. |
| paddler_tests/tests/continuous_batch_concurrent_conversation_history_requests_complete.rs | Add parse_tool_calls; update token-kind checks + Done summary. |
| paddler_tests/tests/chat_template_swaps_between_inference_calls.rs | Add parse_tool_calls; update token-kind checks via is_token. |
| paddler_tests/tests/chat_template_override_replaces_model_builtin.rs | Add parse_tool_calls; update token-kind checks via is_token. |
| paddler_tests/tests/chat_template_drains_in_flight_inference_before_swap.rs | Add parse_tool_calls; update token-kind checks + Done summary. |
| paddler_tests/tests/balancer_completes_in_flight_inference_during_model_switch.rs | Fix race by awaiting first token before triggering model switch. |
| paddler_tests/tests/agent_text_only_model_rejects_image_input.rs | Add parse_tool_calls field. |
| paddler_tests/tests/agent_streams_tokens_from_raw_prompt.rs | Update token-kind checks via is_token(). |
| paddler_tests/tests/agent_streams_tokens_from_image_data_uri.rs | Add parse_tool_calls; update token-kind checks via is_token. |
| paddler_tests/tests/agent_streams_tokens_from_conversation_history_over_http.rs | Add parse_tool_calls; update token-kind checks via is_token(). |
| paddler_tests/tests/agent_serves_four_concurrent_clients_streaming_tokens.rs | Update token-kind checks via is_token(). |
| paddler_tests/tests/agent_returns_image_decoding_error_for_remote_url.rs | Add parse_tool_calls field. |
| paddler_tests/tests/agent_returns_image_decoding_error_for_malformed_data_uri.rs | Add parse_tool_calls field. |
| paddler_tests/tests/agent_returns_image_decoding_error_for_invalid_base64.rs | Add parse_tool_calls field. |
| paddler_tests/tests/agent_rejects_tool_with_invalid_required_field_in_schema.rs | Update FunctionCall import path; enable parse_tool_calls for test. |
| paddler_tests/tests/agent_raw_prompt_respects_max_tokens.rs | Update token-kind checks via is_token(). |
| paddler_tests/tests/agent_openai_chat_completions_non_streaming_returns_text.rs | Increase max tokens and explicitly disable thinking in template kwargs. |
| paddler_tests/tests/agent_grammar_with_thinking_returns_incompatible_error.rs | Add parse_tool_calls field. |
| paddler_tests/tests/agent_exits_cleanly_on_sigterm_during_multimodal_inference.rs | Add parse_tool_calls field. |
| paddler_tests/tests/agent_conversation_with_json_schema_grammar_returns_valid_json.rs | Add parse_tool_calls field. |
| paddler_tests/tests/agent_conversation_with_gbnf_grammar_constrains_output.rs | Add parse_tool_calls field. |
| paddler_tests/tests/agent_conversation_with_function_tool_succeeds.rs | Update FunctionCall import path; enable parse_tool_calls for tools test. |
| paddler_tests/tests/agent_conversation_history_respects_max_tokens.rs | Add parse_tool_calls; update token-kind checks via is_token(). |
| paddler_tests/tests/agent_conversation_accepts_empty_tools_list.rs | Add parse_tool_calls; update token-kind checks via is_token(). |
| paddler_tests/src/test_device.rs | Import llama_cpp_bindings directly (no longer re-exported by paddler). |
| paddler_tests/src/start_in_process_cluster_with_qwen3_6.rs | New helper to start cluster with Qwen3.6 model card. |
| paddler_tests/src/start_in_process_cluster_with_ministral_3.rs | New helper to start cluster with Ministral 3 model card. |
| paddler_tests/src/start_in_process_cluster_with_gemma_4.rs | New helper to start cluster with Gemma 4 model card. |
| paddler_tests/src/openai_chat_completions_client.rs | New test client for OpenAI chat completions endpoints. |
| paddler_tests/src/model_card/qwen3_6_35b_a3b.rs | New model card for Qwen3.6 35B A3B. |
| paddler_tests/src/model_card/mod.rs | Register new model cards/modules. |
| paddler_tests/src/model_card/ministral_3_14b_reasoning.rs | New model card for Ministral 3 14B reasoning. |
| paddler_tests/src/model_card/gemma_4_e4b_it.rs | New model card for Gemma 4 E4B IT. |
| paddler_tests/src/lib.rs | Export new helpers for OpenAI client and model cards. |
| paddler_tests/src/inference_http_client.rs | Make client cloneable; update params import path. |
| paddler_tests/src/collect_generated_tokens.rs | Accumulate text via token_text() abstraction. |
| paddler_tests/src/cluster_handle.rs | Add Drop impl to ensure subprocess cleanup; refactor shutdown ownership. |
| paddler_tests/Cargo.toml | Add hf-hub + llama-cpp-bindings deps for tests. |
| paddler_gui/src/running_balancer_snapshot.rs | Use AgentDesiredState from paddler_types in tests. |
| paddler_client/src/lib.rs | Make internal modules private and re-export intended public API. |
| paddler_client/src/client_inference.rs | Update params import path. |
| paddler_client_python/tests/test_tool_call_arguments.py | New tests for Python tool-call arguments tagged enum parsing. |
| paddler_client_python/tests/test_stream_ndjson.py | Update token/done wire shapes and message kind enum. |
| paddler_client_python/tests/test_response_stream.py | Update token kind to content-token. |
| paddler_client_python/tests/test_parsed_tool_call.py | New tests for Python parsed tool-call model. |
| paddler_client_python/tests/test_integration_inference.py | Update integration assertions for new token kind enum. |
| paddler_client_python/tests/test_client_inference.py | Update NDJSON helpers + message kind enum. |
| paddler_client_python/paddler_client/tool_call_arguments.py | Add Python ToolCallArguments tagged enum + parser. |
| paddler_client_python/paddler_client/parsed_tool_call.py | Add Python ParsedToolCall model + dict conversion. |
| paddler_client_javascript/tsconfig.json | Add TS config for new JS client package build output. |
| paddler_client_javascript/tests/webSocketProtocol.test.ts | Add tests for protocol mapping helper. |
| paddler_client_javascript/tests/urlToAgentDesiredModel.test.ts | Add tests for URL→model parsing behavior. |
| paddler_client_javascript/tests/streamHttpNdjson.test.ts | Add tests for NDJSON streaming helper + errors. |
| paddler_client_javascript/tests/schemas/ParsedToolCall.test.ts | Add tests for parsed tool-call schema. |
| paddler_client_javascript/tests/schemas/InferenceServiceGenerateTokensResponse.test.ts | Add tests for new token kinds + Done usage summary mapping. |
| paddler_client_javascript/tests/schemas/Agent.test.ts | Add tests for Agent schema parsing/validation. |
| paddler_client_javascript/tests/PaddlerError.test.ts | Add tests for new error subclasses. |
| paddler_client_javascript/tests/fetchJson.test.ts | Add tests for fetchJson helper + HttpError. |
| paddler_client_javascript/tests/extractHuggingFaceUrlParts.test.ts | Add tests for HuggingFace URL parsing helper. |
| paddler_client_javascript/src/WebSocketState.ts | Add shared WebSocket state union type. |
| paddler_client_javascript/src/webSocketProtocol.ts | Add shared WebSocket protocol mapping helper. |
| paddler_client_javascript/src/WebSocketError.ts | Add WebSocket error type. |
| paddler_client_javascript/src/WebSocketConnectionOpenedState.ts | Add opened WebSocket state type. |
| paddler_client_javascript/src/WebSocketConnectionErrorState.ts | Add error WebSocket state + frozen constant. |
| paddler_client_javascript/src/WebSocketConnectionClosedState.ts | Add closed WebSocket state + frozen constant. |
| paddler_client_javascript/src/WebSocketConnectingState.ts | Add connecting WebSocket state + frozen constant. |
| paddler_client_javascript/src/urlToAgentDesiredModel.ts | Implement URL→desired-model parsing with explicit error on unsupported formats. |
| paddler_client_javascript/src/streamHttpNdjson.ts | Add HTTP NDJSON streaming helper (Observable-based). |
| paddler_client_javascript/src/streamEventSource.ts | Add SSE/EventSource streaming helper emitting connection/data states. |
| paddler_client_javascript/src/ServerError.ts | Add server error type with integer code. |
| paddler_client_javascript/src/schemas/ValidatedParametersSchema.ts | Add schema for validated function parameter JSON Schema. |
| paddler_client_javascript/src/schemas/Tool.ts | Add OpenAI-like function tool schema definitions. |
| paddler_client_javascript/src/schemas/PoolingType.ts | Add pooling type enum schema. |
| paddler_client_javascript/src/schemas/ParsedToolCall.ts | Add parsed tool call schema with tagged arguments union. |
| paddler_client_javascript/src/schemas/ModelMetadata.ts | Add schema for model metadata map. |
| paddler_client_javascript/src/schemas/InferenceParameters.ts | Remove exported BooleanKeys/NumberKeys (moved to UI helper). |
| paddler_client_javascript/src/schemas/HuggingFaceModelReference.ts | Add schema for HuggingFace model reference. |
| paddler_client_javascript/src/schemas/HuggingFaceDownloadLock.ts | Minor formatting/import tweak. |
| paddler_client_javascript/src/schemas/GrammarConstraint.ts | Add schema for grammar constraints. |
| paddler_client_javascript/src/schemas/GenerateEmbeddingBatchParams.ts | Add schema for embedding batch params. |
| paddler_client_javascript/src/schemas/EmbeddingNormalizationMethod.ts | Add schema for embedding normalization variants. |
| paddler_client_javascript/src/schemas/EmbeddingInputDocument.ts | Add schema for embedding input documents. |
| paddler_client_javascript/src/schemas/Embedding.ts | Add schema for embedding response items. |
| paddler_client_javascript/src/schemas/ConversationMessageContentPart.ts | Add schema for multimodal conversation content parts. |
| paddler_client_javascript/src/schemas/ConversationMessage.ts | Add schema for conversation message payloads. |
| paddler_client_javascript/src/schemas/ContinueFromRawPromptParams.ts | Add schema for raw-prompt inference params. |
| paddler_client_javascript/src/schemas/ContinueFromConversationHistoryParams.ts | Add schema for conversation-history inference params, including `parse_tool_calls`. |
| paddler_client_javascript/src/schemas/ChatTemplate.ts | Add chat template schema. |
| paddler_client_javascript/src/schemas/BufferedRequestsResponse.ts | Add buffered requests snapshot schema. |
| paddler_client_javascript/src/schemas/BalancerDesiredState.ts | Add balancer desired state schema. |
| paddler_client_javascript/src/schemas/AgentsResponse.ts | Add agents response schema + stable sort transform. |
| paddler_client_javascript/src/schemas/AgentIssueModelPath.ts | Add schema for model-path structured errors. |
| paddler_client_javascript/src/schemas/AgentIssue.ts | Add schema for agent issue union. |
| paddler_client_javascript/src/schemas/AgentDesiredModel.ts | Add schema for desired model union variants. |
| paddler_client_javascript/src/schemas/Agent.ts | Add schema for agent status payload. |
| paddler_client_javascript/src/PaddlerError.ts | Add base error class for JS client. |
| paddler_client_javascript/src/JsonError.ts | Add JSON parse error carrying raw payload. |
| paddler_client_javascript/src/inferenceSocketClient.ts | Implement WS inference client; switch request id generation to crypto.randomUUID(). |
| paddler_client_javascript/src/HttpError.ts | Add HTTP error class carrying status code. |
| paddler_client_javascript/src/FetchJsonSuccessState.ts | Add fetch-json success state type. |
| paddler_client_javascript/src/FetchJsonState.ts | Add fetch-json state union type. |
| paddler_client_javascript/src/FetchJsonLoadingState.ts | Add loading state type + frozen constant. |
| paddler_client_javascript/src/FetchJsonErrorState.ts | Add error state type. |
| paddler_client_javascript/src/FetchJsonEmptyState.ts | Add empty state type + frozen constant. |
| paddler_client_javascript/src/fetchJson.ts | Add HTTP JSON helper with schema validation. |
| paddler_client_javascript/src/extractHuggingFaceUrlParts.ts | Add HuggingFace URL parsing helper without path-to-regexp. |
| paddler_client_javascript/src/EventSourceState.ts | Add EventSource state union type. |
| paddler_client_javascript/src/EventSourceInitialState.ts | Add initial EventSource state + frozen constant. |
| paddler_client_javascript/src/EventSourceDeserializationErrorState.ts | Add deserialization error state + frozen constant. |
| paddler_client_javascript/src/EventSourceDataSnapshotState.ts | Add typed data snapshot state for SSE. |
| paddler_client_javascript/src/EventSourceConnectionErrorState.ts | Add connection error state + frozen constant. |
| paddler_client_javascript/src/EventSourceConnectedState.ts | Add connected state + frozen constant. |
| paddler_client_javascript/src/ConnectionDroppedError.ts | Add error for dropped streaming connection keyed by request id. |
| paddler_client_javascript/shell.nix | Add dev shell with Node 22. |
| paddler_client_javascript/README.md | Document JS client usage patterns (WS, NDJSON, SSE). |
| paddler_client_javascript/package.json | Define package metadata, exports pattern, peer deps, and test/build scripts. |
| paddler_client_javascript/Makefile | Add package-local build/test targets. |
| paddler_client_javascript/.gitignore | Ignore node artifacts in JS client package. |
| paddler_cli/src/main.rs | Reorder module declarations/imports; keep conditional web panel init. |
| paddler_bootstrap/tests/runners.rs | Minor formatting simplification for agent runner start. |
| paddler_bootstrap/src/bootstrapped_agent_handle.rs | Import AgentDesiredState from paddler_types. |
| package.json | Mark repo private and add npm workspace for JS client package. |
| Makefile | Add top-level build/test targets for JS client workspace. |
| jarmuz/run-website.mjs | Watch JS client sources and trigger TS jobs on changes. |
| Cargo.toml | Switch llama-cpp-bindings crates from crates.io to path deps; add types crate. |
| Cargo.lock | Reflect llama-cpp-bindings path upgrade and new types crate; update deps. |
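Among the files above, `extractHuggingFaceUrlParts.ts` implements HuggingFace URL parsing without `path-to-regexp`, and `urlToAgentDesiredModel.ts` throws an explicit error on unsupported formats. A minimal sketch of such a parser is below; the return shape, segment layout (`/{owner}/{repo}/resolve/{revision}/{file...}`), and error messages are assumptions for illustration, not the package's actual API.

```typescript
// Hypothetical sketch of a HuggingFace URL parser in the spirit of
// extractHuggingFaceUrlParts; the real return shape may differ.
type HuggingFaceUrlParts = {
  owner: string;
  repo: string;
  revision: string;
  filePath: string;
};

function extractHuggingFaceUrlParts(input: string): HuggingFaceUrlParts {
  const url = new URL(input);

  if (url.hostname !== "huggingface.co") {
    throw new Error(`Unsupported host: ${url.hostname}`);
  }

  // Assumed shape: /{owner}/{repo}/resolve/{revision}/{filePath...}
  const segments = url.pathname.split("/").filter(Boolean);
  const resolveIndex = segments.indexOf("resolve");

  if (resolveIndex !== 2 || segments.length < resolveIndex + 2) {
    throw new Error(`Unsupported HuggingFace URL format: ${input}`);
  }

  return {
    owner: segments[0],
    repo: segments[1],
    revision: segments[resolveIndex + 1],
    filePath: segments.slice(resolveIndex + 2).join("/"),
  };
}
```

Doing this with `URL` and plain segment matching is what makes dropping the `path-to-regexp` dependency feasible: only one fixed path shape needs to be recognized.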
A review comment on lines +18 to +24 highlights this excerpt:

```ts
return new Observable(function (subscriber) {
  fetch(url, {
    body: JSON.stringify(body),
    headers: { "Content-Type": "application/json" },
    method: "POST",
    signal,
  })
```
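The excerpt above cuts off mid-chain. A hedged sketch of how such an NDJSON helper might continue is shown here: it buffers the POST response body and yields one parsed JSON value per newline-delimited line. To stay dependency-free it uses an async generator instead of the package's rxjs `Observable`; the parameter names (`url`, `body`, `signal`) come from the excerpt, and everything else is an assumption.

```typescript
// Standalone NDJSON streaming sketch (the real streamHttpNdjson.ts is
// Observable-based; this async-generator version is an illustration only).
async function* streamHttpNdjson<TData>(
  url: string,
  body: unknown,
  signal?: AbortSignal,
): AsyncGenerator<TData> {
  const response = await fetch(url, {
    body: JSON.stringify(body),
    headers: { "Content-Type": "application/json" },
    method: "POST",
    signal,
  });

  if (!response.ok || !response.body) {
    throw new Error(`HTTP ${response.status}`);
  }

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffered = "";

  for (;;) {
    const { done, value } = await reader.read();

    if (value) {
      buffered += decoder.decode(value, { stream: true });
    }

    // Emit every complete line; keep a trailing partial line buffered.
    const lines = buffered.split("\n");
    buffered = done ? "" : (lines.pop() ?? "");

    for (const line of lines) {
      if (line.trim()) {
        yield JSON.parse(line) as TData;
      }
    }

    if (done) {
      return;
    }
  }
}
```

The buffering step matters because a network chunk can end mid-line; only complete lines are parsed, and the remainder waits for the next read.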
A review comment on lines +75 to +77 highlights this excerpt:

```rust
match strategy {
    ValidationStrategy::JsonObjectOnly => Ok(()),
    ValidationStrategy::Schema(validator) => {
```
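The Rust excerpt above dispatches on a validation strategy: accept any JSON object as-is, or run a schema validator. The actual enum lives on the Rust side, but a comparable shape in the JS client's idiom would be a discriminated union; every name below is an assumption made for illustration.

```typescript
// Hypothetical TypeScript analogue of the Rust ValidationStrategy dispatch.
type ValidationStrategy =
  | { kind: "jsonObjectOnly" }
  | { kind: "schema"; validate: (value: unknown) => boolean };

function validatePayload(strategy: ValidationStrategy, payload: unknown): void {
  switch (strategy.kind) {
    case "jsonObjectOnly":
      // Accepted as-is, mirroring the Rust arm's Ok(()).
      return;
    case "schema":
      if (!strategy.validate(payload)) {
        throw new Error("payload failed schema validation");
      }
      return;
  }
}
```

A tagged union like this gives the TypeScript compiler the same exhaustiveness guarantee the Rust `match` gives: adding a new strategy variant forces every `switch` over it to be updated.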
…for emit + classify phases
…rds, consolidate OpenAI compat error chunks
…, simplify pipeline
…nd add waiter wakeup tests
…th strict-whitelist kwargs
… reasoning enabled
…d SSE shutdown and dispatch candidate tests
…GTERM during startup yields a clean exit
…shing; rename batch_n_tokens to n_batch
…e per-chunk cap; consolidate test agents into AgentConfig
…ap with a typed error
…ate iced Message, collapse nested if
…parser; raise TypeError on non-dict/non-list payloads
…ication' into token-usage-and-thinking-classification
…rnels are built once