
Token usage and thinking classification#229

Open
mcharytoniuk wants to merge 46 commits into main from token-usage-and-thinking-classification

Conversation

@mcharytoniuk
Contributor

No description provided.

Wire format
-----------
- Replace GeneratedTokenResult::Token(String) with three explicit variants:
  ContentToken, ReasoningToken, ToolCallToken (plus existing
  UndeterminableToken).
- Replace GeneratedTokenResult::Done unit variant with Done(GenerationSummary)
  carrying the final TokenUsage with prompt/cached/image/audio/content/
  reasoning/tool_call/undeterminable token counts.
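
A minimal sketch of the reworked wire enum, assuming the usage field names follow the counts listed above; the `completion_tokens` sum shown here is also an assumption, not confirmed by the PR text:

```rust
// Sketch of the reworked wire types. Field and method names beyond those
// named in the PR description are assumptions.
#[derive(Debug)]
pub struct TokenUsage {
    pub prompt_tokens: u64,
    pub cached_tokens: u64,
    pub image_tokens: u64,
    pub audio_tokens: u64,
    pub content_tokens: u64,
    pub reasoning_tokens: u64,
    pub tool_call_tokens: u64,
    pub undeterminable_tokens: u64,
}

impl TokenUsage {
    // Assumed definition: completion side of the usage breakdown.
    pub const fn completion_tokens(&self) -> u64 {
        self.content_tokens
            + self.reasoning_tokens
            + self.tool_call_tokens
            + self.undeterminable_tokens
    }
}

#[derive(Debug)]
pub struct GenerationSummary {
    pub token_usage: TokenUsage,
}

#[derive(Debug)]
pub enum GeneratedTokenResult {
    ContentToken(String),
    ReasoningToken(String),
    ToolCallToken(String),
    UndeterminableToken(String),
    Done(GenerationSummary),
}

impl GeneratedTokenResult {
    // Every variant except Done is a token.
    pub fn is_token(&self) -> bool {
        !matches!(self, Self::Done(_))
    }
}

fn main() {
    let token = GeneratedTokenResult::ReasoningToken("hmm".to_string());
    assert!(token.is_token());
}
```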

Agent
-----
- Construct a per-request SampledTokenClassifier from the model and feed every
  sampled token through ingest(), then emit the matching token variant on the
  inference channel; usage is converted from the bindings type once per
  generation and shipped on the Done event.

Transformer trait
-----------------
- TransformsOutgoingMessage::transform now returns Vec<TransformResult> so a
  single message can produce multiple SSE chunks.
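
The widened contract could be sketched as follows; the `TransformResult` variant names here are assumptions, only the trait name and the `Vec` return are from the PR:

```rust
// Sketch: one outgoing message may now fan out into several SSE chunks.
pub enum TransformResult {
    Chunk(String), // assumed variant name
    Skip,          // assumed variant name
}

pub trait TransformsOutgoingMessage {
    fn transform(&mut self, message: &str) -> Vec<TransformResult>;
}

// An identity transformer simply wraps its single result in a Vec.
struct IdentityTransformer;

impl TransformsOutgoingMessage for IdentityTransformer {
    fn transform(&mut self, message: &str) -> Vec<TransformResult> {
        vec![TransformResult::Chunk(message.to_string())]
    }
}

fn main() {
    let mut transformer = IdentityTransformer;
    assert_eq!(transformer.transform("data").len(), 1);
}
```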

OpenAI compat endpoint
----------------------
- Add stream_options.include_usage; honor it to emit a final usage chunk in
  streaming mode.
- Route reasoning tokens to delta.reasoning_content and tool-call tokens into
  delta.tool_calls function arguments fragments per the streaming spec.
- Forward the tools array from the request through to the agent.
- Non-streaming path replaces the simple text concatenator with an
  Arc<Mutex>-backed aggregator that buffers content/reasoning/tool-call text
  separately, parses tool-call JSON for name/arguments, and emits a single
  OpenAI chat.completion JSON with finish_reason "tool_calls" when applicable.

Local bindings
--------------
- Workspace dependency now points at sibling llama-cpp-bindings checkout
  (mtmd became always-on so the feature flag is dropped).

Adds a focused integration test that loads Qwen3 0.6B and asserts the
classifier resolves both the reasoning and tool-call marker pairs to
single special tokens. On failure the test attaches the rendered
no-tools / with-tools template outputs so marker-extraction issues can be
diagnosed without re-running the full inference pipeline.

The classifier emits the tool-call open/close markers as ToolCallToken
events alongside the JSON payload, so the non-streaming OpenAI aggregator's
buffer ends up shaped like `<tool_call>\n{...}\n</tool_call>`. Locate the
JSON object by its first `{` and last `}` before parsing — the same approach
llama.cpp's autoparser uses for JSON-native tool calls — so the resulting
function name and arguments survive marker text on either side.
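
The extraction step could be sketched like this (a dependency-free approximation, not the actual implementation; note that a later commit in this PR deletes the `locate_json_object` helper in favour of structured `ToolCallParsed` events):

```rust
// Sketch: find the JSON object between the first `{` and the last `}`
// so marker text on either side does not break parsing. Byte indexing is
// safe here because `{` and `}` are single-byte ASCII.
fn locate_json_object(buffer: &str) -> Option<&str> {
    let start = buffer.find('{')?;
    let end = buffer.rfind('}')?;
    if end < start {
        return None;
    }
    Some(&buffer[start..=end])
}

fn main() {
    let raw = "<tool_call>\n{\"name\":\"get_weather\",\"arguments\":{}}\n</tool_call>";
    assert_eq!(
        locate_json_object(raw),
        Some("{\"name\":\"get_weather\",\"arguments\":{}}")
    );
}
```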

OpenAIStreamingResponseTransformer now tracks a saw_tool_call flag, flipped
on the first ToolCallToken; the trailing chunk's finish_reason becomes
"tool_calls" instead of "stop" when set. Aligns the streaming response with
OpenAI's spec for tool-using completions and with the non-streaming
aggregator that already reports the same finish reason.
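
The latch could be sketched as follows (names beyond `saw_tool_call` and the two finish-reason strings are assumptions):

```rust
// Sketch: flipped on the first ToolCallToken, consulted when emitting the
// trailing chunk's finish_reason.
#[derive(Default)]
struct StreamState {
    saw_tool_call: bool,
}

impl StreamState {
    fn observe_tool_call_token(&mut self) {
        self.saw_tool_call = true;
    }

    fn finish_reason(&self) -> &'static str {
        if self.saw_tool_call {
            "tool_calls"
        } else {
            "stop"
        }
    }
}

fn main() {
    let mut state = StreamState::default();
    assert_eq!(state.finish_reason(), "stop");
    state.observe_tool_call_token();
    assert_eq!(state.finish_reason(), "tool_calls");
}
```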

Bumps max_completion_tokens on the reasoning-routing and non-streaming-usage
integration tests so Qwen3 has room to finish its <think> block before the
content phase, then asserts on the corresponding streaming/non-streaming
fields.

Two new transformer-level tests confirm that emitting a ToolCallToken
during the turn flips the trailing chunk's finish_reason to "tool_calls"
and that a content-only turn still finishes with "stop".

Generate one call_id per request inside the streaming transformer's state,
include it in every delta.tool_calls fragment, and reuse the same prefix
("call_<nanoid>") that the non-streaming aggregator emits. Strict OpenAI
clients require the id field on each tool-call delta to correlate fragments.
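
A dependency-free sketch of the per-request id; the real code uses the nanoid crate for the suffix, which is substituted here with a timestamp so the sketch compiles without dependencies:

```rust
use std::time::{SystemTime, UNIX_EPOCH};

// Sketch: one id per request, reused in every delta.tool_calls fragment.
// The "call_" prefix is from the PR; the suffix generation is an assumption
// standing in for nanoid.
fn new_call_id() -> String {
    let nanos = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock before UNIX epoch")
        .subsec_nanos();
    format!("call_{nanos:x}")
}

fn main() {
    let call_id = new_call_id();
    assert!(call_id.starts_with("call_"));
}
```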

- token_usage::{completion_tokens,total_tokens} are const-eligible.
- token_usage_from_bindings is also const-eligible.
- Replace ContinuousBatchScheduler's ad-hoc i32->u64 cast with an explicit
  expect-with-reason since max_tokens is non-negative by API contract.
- Use is_some_and / map_or_else instead of match-on-Option, drop the
  Result wrapping on a panic-only test, and scope the non-streaming
  Mutex guards so they drop before the empty-vec return.
- Adopt the GeneratedTokenResult::is_token method reference everywhere
  the previous lambda was inferred-redundant by clippy.

Replace the single InferenceMessageKind.TOKEN with one kind per token
variant — CONTENT_TOKEN, REASONING_TOKEN, TOOL_CALL_TOKEN,
UNDETERMINABLE_TOKEN — and parse the `Done` payload as a TokenUsage-bearing
GenerationSummary. is_token now matches every token kind and is_terminal
inverts off it. New unit tests cover each token variant, the populated
summary on Done, and the rejection of the legacy string-form Done.
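
This commit targets the Python client, but the classification rule is small enough to sketch; it is rendered in Rust here for consistency with the other examples, with is_terminal defined as the inverse of is_token as the commit describes:

```rust
// Sketch of the per-variant message kinds. Variant names follow the PR's
// CONTENT_TOKEN / REASONING_TOKEN / TOOL_CALL_TOKEN / UNDETERMINABLE_TOKEN
// list; everything else is an assumption.
enum InferenceMessageKind {
    ContentToken,
    ReasoningToken,
    ToolCallToken,
    UndeterminableToken,
    Done,
}

impl InferenceMessageKind {
    // is_token matches every token kind...
    fn is_token(&self) -> bool {
        matches!(
            self,
            Self::ContentToken
                | Self::ReasoningToken
                | Self::ToolCallToken
                | Self::UndeterminableToken
        )
    }

    // ...and is_terminal inverts off it.
    fn is_terminal(&self) -> bool {
        !self.is_token()
    }
}

fn main() {
    assert!(InferenceMessageKind::ReasoningToken.is_token());
    assert!(InferenceMessageKind::Done.is_terminal());
}
```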
New paddler::tool_call_* modules — one struct each, single responsibility:

- ToolCallBuffer: append-only string buffer; pure data, fully unit-tested.
- ToolCallParser: thin wrapper over Model::parse_chat_message; never
  deserialises JSON in Rust on model output.
- ToolCallValidator: schema-driven where the tool declared one,
  JSON-object structural check otherwise; always invoked, with
  ValidatorBuildError surfacing schema-load failures and
  ToolCallValidationError separating UnknownToolName / InvalidJson /
  NotAnObject / SchemaMismatch.
- ToolCallEvent: explicit event enum (Pending / Resolved / ParseFailed /
  ValidationFailed); pure data, unit-testable.
- ToolCallPipeline: composes Buffer + Parser + Validator. Same component
  is shared by both endpoints; integration tests cover the end-to-end
  behaviour.
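
ToolCallBuffer is the simplest of these pieces; a sketch of its append-only shape (method names are assumptions):

```rust
// Sketch of ToolCallBuffer: an append-only string accumulator at the head
// of the pipeline. Pure data, so it unit-tests without a model.
#[derive(Default)]
pub struct ToolCallBuffer {
    contents: String,
}

impl ToolCallBuffer {
    pub fn append(&mut self, fragment: &str) {
        self.contents.push_str(fragment);
    }

    pub fn contents(&self) -> &str {
        &self.contents
    }

    pub fn is_empty(&self) -> bool {
        self.contents.is_empty()
    }
}

fn main() {
    let mut buffer = ToolCallBuffer::default();
    buffer.append("<tool_call>");
    buffer.append("{\"name\":\"lookup\"}");
    assert!(buffer.contents().starts_with("<tool_call>"));
}
```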

Wire format gains three new GeneratedTokenResult variants —
ToolCallParsed (structured, always emitted on close marker) plus
ToolCallParseFailed and ToolCallValidationFailed (informational, do NOT
terminate the request). paddler_types::ParsedToolCall is the shared wire
value object.

Scheduler integration: ContinuousBatchActiveRequest holds an
Option<ToolCallPipeline>; the scheduler feeds every ToolCallToken to the
pipeline and finalises whenever the classifier transitions out of
in_tool_call state, emitting the resulting structured event downstream.
The pipeline is constructed only when the request actually has tools.

OpenAI compat refactor: deleted parse_tool_call_payload, locate_json_object,
and the ToolCallPayload struct — every JSON parse on model output is now
gone. Both transformers consume the structured ToolCallParsed event and
emit OpenAI-spec delta.tool_calls / message.tool_calls without ever
peeking inside the model's raw text. Per-variant arms moved into focused
helper methods (handle_content / handle_reasoning / handle_tool_call_parsed
/ handle_done) per single-responsibility audit.

ParsedToolCall is added to paddler_types as a Serialize/Deserialize wire
struct; bindings ships an internal twin for the FFI return.

…errors as hard Err; collapse image pipeline to one decode
Copilot AI review requested due to automatic review settings May 7, 2026 18:39
@mcharytoniuk mcharytoniuk requested review from a team as code owners May 7, 2026 18:39
Contributor

Copilot AI left a comment


Pull request overview

This PR expands Paddler’s token streaming protocol to classify streamed tokens (content vs reasoning vs tool-call) and report detailed token usage at completion, while adding an end-to-end tool-call parsing/validation pipeline. In parallel, it extracts the web/admin TypeScript wire schemas + state helpers into a new @intentee/paddler-client workspace package and updates the existing UI code to consume it.

Changes:

  • Add tool-call parsing/validation pipeline in paddler and surface structured tool-call events and richer “Done” summaries (usage breakdown) through streaming.
  • Extend request params with parse_tool_calls, refactor ContinueFromConversationHistoryParams imports, and update OpenAI compatibility transformers + tests for new token kinds/usage.
  • Introduce paddler_client_javascript workspace package (schemas, streaming helpers, errors) and migrate resources/ts usage/tests to it.

Reviewed changes

Copilot reviewed 278 out of 290 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
tsconfig.json Switch module resolution to bundler for workspace-style TS imports.
resources/ts/webSocketProtocol.ts Remove local helper (migrated to JS client).
resources/ts/urlToAgentDesiredModel_test.ts Remove local test (migrated to JS client).
resources/ts/schemas/InferenceServiceGenerateTokensResponse.ts Remove local schema (migrated to JS client).
resources/ts/matchWebSocketState.ts Use @intentee/paddler-client WebSocket state types.
resources/ts/matchFetchJsonState.ts Use @intentee/paddler-client fetch state types; simplify success path.
resources/ts/matchEventSourceUpdateState.ts Use @intentee/paddler-client EventSource state types.
resources/ts/InferenceSocketClient.interface.ts Remove local interface (moved/embedded in JS client).
resources/ts/inferenceParametersFormKeys.ts Reintroduce boolean/number key helpers for UI forms.
resources/ts/hooks/useWebSocket.ts Use shared WebSocket state constants/types from JS client.
resources/ts/hooks/usePrompt.ts Switch HTTP streaming to streamHttpNdjson + updated response schema/tokenKind.
resources/ts/hooks/useFetchJson.ts Use shared fetch state constants/types from JS client.
resources/ts/hooks/useChatTemplateOverride.ts Import shared ChatTemplateSchema from JS client.
resources/ts/hooks/useBalancerDesiredState.ts Import shared BalancerDesiredStateSchema from JS client.
resources/ts/hooks/useAgentDesiredModelUrl.ts Import shared AgentDesiredModel + URL parser from JS client.
resources/ts/extractHuggingFaceUrlParts.ts Remove local helper (migrated to JS client).
resources/ts/ConversationMessageContentPart.type.ts Remove local type (migrated to JS client schemas).
resources/ts/ConversationMessage.type.ts Remove local type (migrated to JS client schemas).
resources/ts/contexts/InferenceParametersContext.ts Import shared InferenceParameters type from JS client.
resources/ts/contexts/ChatTemplateContext.ts Import shared ChatTemplate type from JS client.
resources/ts/components/PromptPage.tsx Import webSocketProtocol from JS client.
resources/ts/components/ModelMetadataPreviewButton.tsx Import shared Agent type from JS client.
resources/ts/components/ModelMetadataLoader.tsx Import shared Agent type from JS client.
resources/ts/components/ModelMetadata.tsx Import shared Agent type from JS client.
resources/ts/components/ModelChatTemplateOverridePreviewButton.tsx Import shared Agent type from JS client.
resources/ts/components/InferenceParametersContextProvider.tsx Import shared InferenceParameters type from JS client.
resources/ts/components/InferenceParameterPoolingType.tsx Import shared constants from JS client schemas.
resources/ts/components/InferenceParameterInput.tsx Use local form-key helper types for numeric keys.
resources/ts/components/InferenceParameterCheckbox.tsx Use local form-key helper types for boolean keys.
resources/ts/components/InferenceParameterCacheDtype.tsx Import shared constants from JS client schemas.
resources/ts/components/ChatTemplateOverrideLoader.tsx Import shared Agent type from JS client.
resources/ts/components/ChatTemplateEditButton.tsx Import shared ChatTemplate type from JS client.
resources/ts/components/ChatTemplateContextProvider.tsx Import shared ChatTemplate type from JS client.
resources/ts/components/ChangeModelPage.tsx Import shared AgentDesiredModel type from JS client.
resources/ts/components/ChangeModelForm.tsx Import shared BalancerDesiredState type from JS client.
resources/ts/components/BufferedRequestsStream.tsx Import shared response schema from JS client.
resources/ts/components/AgentListStream.tsx Import shared response schema from JS client.
resources/ts/components/AgentListAgentStatus.tsx Import shared Agent type from JS client.
resources/ts/components/AgentList.tsx Import shared Agent type from JS client.
resources/ts/components/AgentIssuesPreviewButton.tsx Import shared AgentIssue type from JS client.
resources/ts/components/AgentIssues.tsx Import shared AgentIssue type from JS client.
paddler/src/tool_call_validation_error.rs Add validation error types for tool calls.
paddler/src/tool_call_pipeline.rs Add buffering/parse/validate pipeline for tool-call fragments.
paddler/src/tool_call_parser.rs Parse tool calls via llama.cpp and template-override fallback.
paddler/src/tool_call_parse_error.rs Define parse errors for tool-call parsing.
paddler/src/tool_call_event.rs Introduce tool-call pipeline events + unit tests.
paddler/src/sets_desired_state.rs Move AgentDesiredState import to paddler_types.
paddler/src/lib.rs Export new tool-call modules; remove llama_cpp_bindings re-export.
paddler/src/cancellation_token_stream_guard.rs Formatting-only change to poll_next signature.
paddler/src/balancer/inference_service/http_route/api/post_generate_embedding_batch.rs Update transformer trait to return multiple chunks; preserve NDJSON streaming.
paddler/src/balancer/inference_service/http_route/api/post_continue_from_conversation_history.rs Update params import path for ContinueFromConversationHistoryParams.
paddler/src/balancer/chunk_forwarding_session_controller/transforms_outgoing_message.rs Change transformer contract to Vec<TransformResult>.
paddler/src/balancer/chunk_forwarding_session_controller/mod.rs Forward multiple transform results per outgoing message.
paddler/src/balancer/chunk_forwarding_session_controller/identity_transformer.rs Adapt identity transformer to new multi-result interface.
paddler/src/balancer/agent_controller.rs Update imports to paddler_types + new params path.
paddler/src/balancer/agent_controller_pool.rs Update imports + minor formatting simplification.
paddler/src/balancer_applicable_state.rs Use paddler_types::agent_desired_state::AgentDesiredState.
paddler/src/balancer_applicable_state_holder.rs Use paddler_types::agent_desired_state::AgentDesiredState.
paddler/src/agent/reconciliation_service.rs Use paddler_types::agent_desired_state::AgentDesiredState.
paddler/src/agent/prepared_conversation_history_request.rs Carry parse_tool_calls + validated tools into prepared requests.
paddler/src/agent/prepare_conversation_history_request.rs Thread parse_tool_calls through; consolidate image prep with prepared_for_inference.
paddler/src/agent/management_socket_client_service.rs Use paddler_types::agent_desired_state::AgentDesiredState.
paddler/src/agent/jsonrpc/request.rs Update ContinueFromConversationHistoryParams import path.
paddler/src/agent/jsonrpc/notification_params/set_state_params.rs Use paddler_types::agent_desired_state::AgentDesiredState.
paddler/src/agent/continuous_batch_scheduler/tool_call_pipeline_build_outcome.rs Add outcome type for enabling tool-call pipeline.
paddler/src/agent/continuous_batch_scheduler/tool_call_pass.rs Add pass to finalize tool-call parsing when leaving tool-call section.
paddler/src/agent/continuous_batch_scheduler/sample_token_phase.rs Extract sampling logic into a dedicated phase type.
paddler/src/agent/continuous_batch_scheduler/sample_outcome.rs Define typed outcomes for sampling stage.
paddler/src/agent/continuous_batch_scheduler/ingesting_contribution.rs Track prompt-ingest contributions per pass.
paddler/src/agent/continuous_batch_scheduler/generating_contribution.rs Track generating contributions per pass.
paddler/src/agent/continuous_batch_scheduler/emit_token_phase.rs Emit classified token events (content/reasoning/tool-call/undeterminable).
paddler/src/agent/continuous_batch_scheduler/emit_token_outcome.rs Define outcomes for emission stage.
paddler/src/agent/continuous_batch_scheduler/decode_outcome.rs Wrap decode outcomes + add unit tests for mapping.
paddler/src/agent/continuous_batch_scheduler/decode_batch_phase.rs Extract decode execution into a dedicated phase.
paddler/src/agent/continuous_batch_scheduler/contributions.rs Aggregate ingest/generate contributions per batch pass.
paddler/src/agent/continuous_batch_scheduler/completion_check_phase.rs Stop on EOG or max_tokens using classifier usage counts.
paddler/src/agent/continuous_batch_scheduler/completion_check_outcome.rs Define completion-check outcomes.
paddler/src/agent/continuous_batch_scheduler/commit_phase.rs Commit batch-pass effects back into active requests.
paddler/src/agent/continuous_batch_scheduler/classify_token_phase.rs Classify sampled tokens and track tool-call section transitions.
paddler/src/agent/continuous_batch_scheduler/classified_token.rs Store raw vs visible pieces and tool-call section flags.
paddler/src/agent/continuous_batch_scheduler/batch_pass.rs Encapsulate LlamaBatch plus contribution bookkeeping.
paddler/src/agent/continuous_batch_scheduler/advance_outcome.rs Represent “advance” outcomes + add unit tests.
paddler/src/agent/continuous_batch_arbiter.rs Update token-to-piece calls to new SampledToken API.
paddler/src/agent/continuous_batch_active_request.rs Track token classifier, pending SampledToken, and optional tool-call pipeline.
paddler/src/agent/continue_from_conversation_history_request.rs Update params import path.
paddler/src/agent_desired_state.rs Stop re-exporting AgentDesiredState; use paddler_types directly.
paddler_types/src/request_params/mod.rs Remove re-export of ContinueFromConversationHistoryParams.
paddler_types/src/request_params/continue_from_conversation_history_params/tool/tool_params/mod.rs Remove re-export; require explicit path for FunctionCall.
paddler_types/src/request_params/continue_from_conversation_history_params/tool/tool_params/function_call/parameters_schema/raw_parameters_schema.rs Simplify schema validation (keep required-in-properties check).
paddler_types/src/request_params/continue_from_conversation_history_params/tool/mod.rs Fix FunctionCall import path and ordering.
paddler_types/src/request_params/continue_from_conversation_history_params/mod.rs Add parse_tool_calls flag and propagate through validation.
paddler_types/src/lib.rs Export new generation_summary module.
paddler_types/src/inference_server/request.rs Update params import path.
paddler_types/src/generation_summary.rs Add GenerationSummary carrying TokenUsage.
paddler_types/Cargo.toml Add dependency on llama-cpp-bindings-types.
paddler_tests/tests/smolvlm2_generates_tokens_from_image_input.rs Update params + token-kind checks + Done now carries summary.
paddler_tests/tests/qwen35_without_mmproj_rejects_image_with_multimodal_not_supported.rs Add parse_tool_calls field.
paddler_tests/tests/qwen35_with_system_message_completes_without_thinking.rs Add parse_tool_calls; update token-kind checks + Done summary.
paddler_tests/tests/qwen35_with_system_message_completes_with_thinking.rs Add parse_tool_calls; update token-kind checks + Done summary.
paddler_tests/tests/qwen35_with_mmproj_generates_tokens_from_image.rs Add parse_tool_calls; update token-kind checks + Done summary.
paddler_tests/tests/qwen35_thinking_multi_turn_conversation_stops_cleanly.rs Add parse_tool_calls; update token-kind checks + Done summary.
paddler_tests/tests/qwen35_thinking_mode_stops_cleanly_before_max_tokens.rs Add parse_tool_calls; update token-kind checks + Done summary.
paddler_tests/tests/qwen35_generation_stops_at_eog_before_max_tokens.rs Add parse_tool_calls; update token-kind checks + Done summary.
paddler_tests/tests/qwen35_generates_tokens_for_long_system_and_user_prompt.rs Add parse_tool_calls; update token-kind checks + Done summary.
paddler_tests/tests/qwen3_without_grammar_generates_unconstrained_output.rs Update token-kind checks via is_token().
paddler_tests/tests/qwen3_openai_streaming_usage_breakdown_with_thinking.rs New test for OpenAI streaming usage chunk with thinking.
paddler_tests/tests/qwen3_openai_streaming_routes_reasoning_to_reasoning_content.rs New test mapping reasoning tokens to OpenAI reasoning_content.
paddler_tests/tests/qwen3_openai_streaming_omits_usage_when_not_requested.rs New test ensuring usage omitted unless requested.
paddler_tests/tests/qwen3_openai_streaming_emits_usage_when_requested.rs New test ensuring trailing usage chunk emitted when requested.
paddler_tests/tests/qwen3_openai_streaming_emits_tool_calls_for_function_tool.rs New test for structured OpenAI streaming tool calls.
paddler_tests/tests/qwen3_openai_non_streaming_usage_with_tool_calls.rs New test for non-streaming usage + tool calls.
paddler_tests/tests/qwen3_openai_non_streaming_returns_usage.rs New test for non-streaming usage fields.
paddler_tests/tests/qwen3_internal_endpoint_with_thinking_enabled_emits_reasoning_tokens.rs New test for reasoning token classification + usage invariants.
paddler_tests/tests/qwen3_internal_endpoint_with_thinking_disabled_emits_no_reasoning_tokens.rs New test ensuring no reasoning tokens when disabled.
paddler_tests/tests/qwen3_internal_endpoint_pure_content_usage.rs New test for pure content usage breakdown.
paddler_tests/tests/qwen3_internal_endpoint_max_tokens_usage_matches.rs New test ensuring usage completion count matches streamed token count.
paddler_tests/tests/qwen3_internal_endpoint_concurrent_requests_independent_usage.rs New test ensuring per-request usage counters are independent.
paddler_tests/tests/qwen3_grammar_with_thinking_returns_incompatible_error.rs Add parse_tool_calls field.
paddler_tests/tests/qwen3_generates_tokens_from_raw_prompt.rs Update token-kind checks + Done summary.
paddler_tests/tests/qwen3_generates_tokens_from_conversation_history.rs Add parse_tool_calls; update token-kind checks + Done summary.
paddler_tests/tests/qwen25vl_generates_tokens_from_image_input.rs Add parse_tool_calls; update token-kind checks + Done summary.
paddler_tests/tests/continuous_batch_two_concurrent_multimodal_requests_produce_tokens.rs Add parse_tool_calls; update token-kind checks + Done summary.
paddler_tests/tests/continuous_batch_stops_generation_when_stop_sender_dropped.rs Update token-kind checks + Done summary.
paddler_tests/tests/continuous_batch_stops_at_max_tokens_boundary.rs Update token-kind checks + Done summary.
paddler_tests/tests/continuous_batch_smoke.rs Update token-kind checks + Done summary.
paddler_tests/tests/continuous_batch_serves_four_concurrent_requests.rs Update token-kind checks + Done summary.
paddler_tests/tests/continuous_batch_reuses_slot_after_request_completes.rs Update token-kind checks + Done summary.
paddler_tests/tests/continuous_batch_plain_and_multimodal_run_concurrently.rs Add parse_tool_calls; update token-kind checks + Done summary.
paddler_tests/tests/continuous_batch_long_and_short_prompts_complete_concurrently.rs Update token-kind checks + Done summary.
paddler_tests/tests/continuous_batch_generates_tokens_with_partial_layer_offload.rs Update token-kind checks + Done summary.
paddler_tests/tests/continuous_batch_generates_tokens_with_distinct_k_and_v_cache_dtypes.rs Update token-kind checks + Done summary.
paddler_tests/tests/continuous_batch_evicts_long_sequence_under_kv_pressure.rs Update Done summary matching.
paddler_tests/tests/continuous_batch_concurrent_conversation_history_requests_complete.rs Add parse_tool_calls; update token-kind checks + Done summary.
paddler_tests/tests/chat_template_swaps_between_inference_calls.rs Add parse_tool_calls; update token-kind checks via is_token.
paddler_tests/tests/chat_template_override_replaces_model_builtin.rs Add parse_tool_calls; update token-kind checks via is_token.
paddler_tests/tests/chat_template_drains_in_flight_inference_before_swap.rs Add parse_tool_calls; update token-kind checks + Done summary.
paddler_tests/tests/balancer_completes_in_flight_inference_during_model_switch.rs Fix race by awaiting first token before triggering model switch.
paddler_tests/tests/agent_text_only_model_rejects_image_input.rs Add parse_tool_calls field.
paddler_tests/tests/agent_streams_tokens_from_raw_prompt.rs Update token-kind checks via is_token().
paddler_tests/tests/agent_streams_tokens_from_image_data_uri.rs Add parse_tool_calls; update token-kind checks via is_token.
paddler_tests/tests/agent_streams_tokens_from_conversation_history_over_http.rs Add parse_tool_calls; update token-kind checks via is_token().
paddler_tests/tests/agent_serves_four_concurrent_clients_streaming_tokens.rs Update token-kind checks via is_token().
paddler_tests/tests/agent_returns_image_decoding_error_for_remote_url.rs Add parse_tool_calls field.
paddler_tests/tests/agent_returns_image_decoding_error_for_malformed_data_uri.rs Add parse_tool_calls field.
paddler_tests/tests/agent_returns_image_decoding_error_for_invalid_base64.rs Add parse_tool_calls field.
paddler_tests/tests/agent_rejects_tool_with_invalid_required_field_in_schema.rs Update FunctionCall import path; enable parse_tool_calls for test.
paddler_tests/tests/agent_raw_prompt_respects_max_tokens.rs Update token-kind checks via is_token().
paddler_tests/tests/agent_openai_chat_completions_non_streaming_returns_text.rs Increase max tokens and explicitly disable thinking in template kwargs.
paddler_tests/tests/agent_grammar_with_thinking_returns_incompatible_error.rs Add parse_tool_calls field.
paddler_tests/tests/agent_exits_cleanly_on_sigterm_during_multimodal_inference.rs Add parse_tool_calls field.
paddler_tests/tests/agent_conversation_with_json_schema_grammar_returns_valid_json.rs Add parse_tool_calls field.
paddler_tests/tests/agent_conversation_with_gbnf_grammar_constrains_output.rs Add parse_tool_calls field.
paddler_tests/tests/agent_conversation_with_function_tool_succeeds.rs Update FunctionCall import path; enable parse_tool_calls for tools test.
paddler_tests/tests/agent_conversation_history_respects_max_tokens.rs Add parse_tool_calls; update token-kind checks via is_token().
paddler_tests/tests/agent_conversation_accepts_empty_tools_list.rs Add parse_tool_calls; update token-kind checks via is_token().
paddler_tests/src/test_device.rs Import llama_cpp_bindings directly (no longer re-exported by paddler).
paddler_tests/src/start_in_process_cluster_with_qwen3_6.rs New helper to start cluster with Qwen3.6 model card.
paddler_tests/src/start_in_process_cluster_with_ministral_3.rs New helper to start cluster with Ministral 3 model card.
paddler_tests/src/start_in_process_cluster_with_gemma_4.rs New helper to start cluster with Gemma 4 model card.
paddler_tests/src/openai_chat_completions_client.rs New test client for OpenAI chat completions endpoints.
paddler_tests/src/model_card/qwen3_6_35b_a3b.rs New model card for Qwen3.6 35B A3B.
paddler_tests/src/model_card/mod.rs Register new model cards/modules.
paddler_tests/src/model_card/ministral_3_14b_reasoning.rs New model card for Ministral 3 14B reasoning.
paddler_tests/src/model_card/gemma_4_e4b_it.rs New model card for Gemma 4 E4B IT.
paddler_tests/src/lib.rs Export new helpers for OpenAI client and model cards.
paddler_tests/src/inference_http_client.rs Make client cloneable; update params import path.
paddler_tests/src/collect_generated_tokens.rs Accumulate text via token_text() abstraction.
paddler_tests/src/cluster_handle.rs Add Drop impl to ensure subprocess cleanup; refactor shutdown ownership.
paddler_tests/Cargo.toml Add hf-hub + llama-cpp-bindings deps for tests.
paddler_gui/src/running_balancer_snapshot.rs Use AgentDesiredState from paddler_types in tests.
paddler_client/src/lib.rs Make internal modules private and re-export intended public API.
paddler_client/src/client_inference.rs Update params import path.
paddler_client_python/tests/test_tool_call_arguments.py New tests for Python tool-call arguments tagged enum parsing.
paddler_client_python/tests/test_stream_ndjson.py Update token/done wire shapes and message kind enum.
paddler_client_python/tests/test_response_stream.py Update token kind to content-token.
paddler_client_python/tests/test_parsed_tool_call.py New tests for Python parsed tool-call model.
paddler_client_python/tests/test_integration_inference.py Update integration assertions for new token kind enum.
paddler_client_python/tests/test_client_inference.py Update NDJSON helpers + message kind enum.
paddler_client_python/paddler_client/tool_call_arguments.py Add Python ToolCallArguments tagged enum + parser.
paddler_client_python/paddler_client/parsed_tool_call.py Add Python ParsedToolCall model + dict conversion.
paddler_client_javascript/tsconfig.json Add TS config for new JS client package build output.
paddler_client_javascript/tests/webSocketProtocol.test.ts Add tests for protocol mapping helper.
paddler_client_javascript/tests/urlToAgentDesiredModel.test.ts Add tests for URL→model parsing behavior.
paddler_client_javascript/tests/streamHttpNdjson.test.ts Add tests for NDJSON streaming helper + errors.
paddler_client_javascript/tests/schemas/ParsedToolCall.test.ts Add tests for parsed tool-call schema.
paddler_client_javascript/tests/schemas/InferenceServiceGenerateTokensResponse.test.ts Add tests for new token kinds + Done usage summary mapping.
paddler_client_javascript/tests/schemas/Agent.test.ts Add tests for Agent schema parsing/validation.
paddler_client_javascript/tests/PaddlerError.test.ts Add tests for new error subclasses.
paddler_client_javascript/tests/fetchJson.test.ts Add tests for fetchJson helper + HttpError.
paddler_client_javascript/tests/extractHuggingFaceUrlParts.test.ts Add tests for HuggingFace URL parsing helper.
paddler_client_javascript/src/WebSocketState.ts Add shared WebSocket state union type.
paddler_client_javascript/src/webSocketProtocol.ts Add shared WebSocket protocol mapping helper.
paddler_client_javascript/src/WebSocketError.ts Add WebSocket error type.
paddler_client_javascript/src/WebSocketConnectionOpenedState.ts Add opened WebSocket state type.
paddler_client_javascript/src/WebSocketConnectionErrorState.ts Add error WebSocket state + frozen constant.
paddler_client_javascript/src/WebSocketConnectionClosedState.ts Add closed WebSocket state + frozen constant.
paddler_client_javascript/src/WebSocketConnectingState.ts Add connecting WebSocket state + frozen constant.
paddler_client_javascript/src/urlToAgentDesiredModel.ts Implement URL→desired-model parsing with explicit error on unsupported formats.
paddler_client_javascript/src/streamHttpNdjson.ts Add HTTP NDJSON streaming helper (Observable-based).
paddler_client_javascript/src/streamEventSource.ts Add SSE/EventSource streaming helper emitting connection/data states.
paddler_client_javascript/src/ServerError.ts Add server error type with integer code.
paddler_client_javascript/src/schemas/ValidatedParametersSchema.ts Add schema for validated function parameter JSON Schema.
paddler_client_javascript/src/schemas/Tool.ts Add OpenAI-like function tool schema definitions.
paddler_client_javascript/src/schemas/PoolingType.ts Add pooling type enum schema.
paddler_client_javascript/src/schemas/ParsedToolCall.ts Add parsed tool call schema with tagged arguments union.
paddler_client_javascript/src/schemas/ModelMetadata.ts Add schema for model metadata map.
paddler_client_javascript/src/schemas/InferenceParameters.ts Remove exported BooleanKeys/NumberKeys (moved to UI helper).
paddler_client_javascript/src/schemas/HuggingFaceModelReference.ts Add schema for HuggingFace model reference.
paddler_client_javascript/src/schemas/HuggingFaceDownloadLock.ts Minor formatting/import tweak.
paddler_client_javascript/src/schemas/GrammarConstraint.ts Add schema for grammar constraints.
paddler_client_javascript/src/schemas/GenerateEmbeddingBatchParams.ts Add schema for embedding batch params.
paddler_client_javascript/src/schemas/EmbeddingNormalizationMethod.ts Add schema for embedding normalization variants.
paddler_client_javascript/src/schemas/EmbeddingInputDocument.ts Add schema for embedding input documents.
paddler_client_javascript/src/schemas/Embedding.ts Add schema for embedding response items.
paddler_client_javascript/src/schemas/ConversationMessageContentPart.ts Add schema for multimodal conversation content parts.
paddler_client_javascript/src/schemas/ConversationMessage.ts Add schema for conversation message payloads.
paddler_client_javascript/src/schemas/ContinueFromRawPromptParams.ts Add schema for raw-prompt inference params.
paddler_client_javascript/src/schemas/ContinueFromConversationHistoryParams.ts Add schema for conversation-history inference params, including parse_tool_calls.
paddler_client_javascript/src/schemas/ChatTemplate.ts Add chat template schema.
paddler_client_javascript/src/schemas/BufferedRequestsResponse.ts Add buffered requests snapshot schema.
paddler_client_javascript/src/schemas/BalancerDesiredState.ts Add balancer desired state schema.
paddler_client_javascript/src/schemas/AgentsResponse.ts Add agents response schema + stable sort transform.
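A stable sort transform matters here because `Array.prototype.sort` is specified as stable since ES2019: agents that compare equal keep the server's relative order across snapshots, so the UI does not reshuffle rows. A sketch under assumed field names (`id`, `name` are not confirmed by the source):

```typescript
// Sketch of a stable-sort transform for an agents list. Sorting by a single
// key with the engine's (ES2019+) stable sort keeps the server's relative
// order for agents whose keys compare equal. Field names are assumptions.
interface AgentRow {
  id: string;
  name: string;
}

function sortAgents(agents: Array<AgentRow>): Array<AgentRow> {
  // Copy before sorting so the schema transform stays side-effect free.
  return [...agents].sort((a, b) => a.name.localeCompare(b.name));
}
```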
paddler_client_javascript/src/schemas/AgentIssueModelPath.ts Add schema for model-path structured errors.
paddler_client_javascript/src/schemas/AgentIssue.ts Add schema for agent issue union.
paddler_client_javascript/src/schemas/AgentDesiredModel.ts Add schema for desired model union variants.
paddler_client_javascript/src/schemas/Agent.ts Add schema for agent status payload.
paddler_client_javascript/src/PaddlerError.ts Add base error class for JS client.
paddler_client_javascript/src/JsonError.ts Add JSON parse error carrying raw payload.
paddler_client_javascript/src/inferenceSocketClient.ts Implement WS inference client; switch request id generation to crypto.randomUUID().
paddler_client_javascript/src/HttpError.ts Add HTTP error class carrying status code.
paddler_client_javascript/src/FetchJsonSuccessState.ts Add fetch-json success state type.
paddler_client_javascript/src/FetchJsonState.ts Add fetch-json state union type.
paddler_client_javascript/src/FetchJsonLoadingState.ts Add loading state type + frozen constant.
paddler_client_javascript/src/FetchJsonErrorState.ts Add error state type.
paddler_client_javascript/src/FetchJsonEmptyState.ts Add empty state type + frozen constant.
paddler_client_javascript/src/fetchJson.ts Add HTTP JSON helper with schema validation.
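A minimal sketch of a fetch-then-validate helper, assuming the real one threads the decoded body through a caller-supplied parser (e.g. a schema's parse function). The exact constructor signatures are assumptions; only the error-class name mirrors the files listed above:

```typescript
// Hypothetical fetch-JSON helper: HTTP failures raise a typed error, and the
// decoded body must pass schema validation before callers see it. The
// injectable fetchImpl parameter is illustrative (it simplifies testing).
class HttpError extends Error {
  constructor(public readonly status: number) {
    super(`HTTP ${status}`);
  }
}

async function fetchJson<T>(
  url: string,
  parse: (input: unknown) => T,
  fetchImpl: typeof fetch = fetch,
): Promise<T> {
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new HttpError(response.status);
  }
  // Validation happens after JSON decoding, so callers always receive a
  // value of the declared type or an exception — never an unchecked any.
  return parse(await response.json());
}
```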
paddler_client_javascript/src/extractHuggingFaceUrlParts.ts Add HuggingFace URL parsing helper without path-to-regexp.
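Dropping `path-to-regexp` for this is straightforward because the standard `URL` class already does the heavy lifting. A sketch assuming "resolve" URLs of the form `https://huggingface.co/{owner}/{repo}/resolve/{revision}/{filename}` — both the supported URL shape and the return type are assumptions:

```typescript
// Hand-rolled parser sketch for HuggingFace "resolve" URLs using only the
// standard URL class. Returns null for anything it does not recognize.
// The URL form and field names here are assumptions, not the file's API.
interface HuggingFaceUrlParts {
  owner: string;
  repo: string;
  revision: string;
  filename: string;
}

function extractHuggingFaceUrlParts(input: string): HuggingFaceUrlParts | null {
  const url = new URL(input);
  if (url.hostname !== "huggingface.co") {
    return null;
  }
  const segments = url.pathname.split("/").filter((s) => s.length > 0);
  // Expect: owner / repo / "resolve" / revision / ...filename path
  if (segments.length < 5 || segments[2] !== "resolve") {
    return null;
  }
  return {
    owner: segments[0],
    repo: segments[1],
    revision: segments[3],
    filename: segments.slice(4).join("/"),
  };
}
```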
paddler_client_javascript/src/EventSourceState.ts Add EventSource state union type.
paddler_client_javascript/src/EventSourceInitialState.ts Add initial EventSource state + frozen constant.
paddler_client_javascript/src/EventSourceDeserializationErrorState.ts Add deserialization error state + frozen constant.
paddler_client_javascript/src/EventSourceDataSnapshotState.ts Add typed data snapshot state for SSE.
paddler_client_javascript/src/EventSourceConnectionErrorState.ts Add connection error state + frozen constant.
paddler_client_javascript/src/EventSourceConnectedState.ts Add connected state + frozen constant.
paddler_client_javascript/src/ConnectionDroppedError.ts Add error for dropped streaming connection keyed by request id.
paddler_client_javascript/shell.nix Add dev shell with Node 22.
paddler_client_javascript/README.md Document JS client usage patterns (WS, NDJSON, SSE).
paddler_client_javascript/package.json Define package metadata, exports pattern, peer deps, and test/build scripts.
paddler_client_javascript/Makefile Add package-local build/test targets.
paddler_client_javascript/.gitignore Ignore node artifacts in JS client package.
paddler_cli/src/main.rs Reorder module declarations/imports; keep conditional web panel init.
paddler_bootstrap/tests/runners.rs Minor formatting simplification for agent runner start.
paddler_bootstrap/src/bootstrapped_agent_handle.rs Import AgentDesiredState from paddler_types.
package.json Mark repo private and add npm workspace for JS client package.
Makefile Add top-level build/test targets for JS client workspace.
jarmuz/run-website.mjs Watch JS client sources and trigger TS jobs on changes.
Cargo.toml Switch llama-cpp-bindings crates from crates.io to path deps; add types crate.
Cargo.lock Reflect llama-cpp-bindings path upgrade and new types crate; update deps.
Review thread on `streamHttpNdjson.ts`, lines 18–24:

```typescript
return new Observable(function (subscriber) {
  fetch(url, {
    body: JSON.stringify(body),
    headers: { "Content-Type": "application/json" },
    method: "POST",
    signal,
  })
```
Review thread on validation-strategy matching, lines 75–77:

```rust
match strategy {
    ValidationStrategy::JsonObjectOnly => Ok(()),
    ValidationStrategy::Schema(validator) => {
```
malzag and others added 21 commits May 9, 2026 03:37
…e per-chunk cap; consolidate test agents into AgentConfig
…parser; raise TypeError on non-dict/non-list payloads
…ication' into token-usage-and-thinking-classification