Token usage and thinking classification #229
Open
mcharytoniuk wants to merge 46 commits into
Conversation
Wire format
-----------

- Replace GeneratedTokenResult::Token(String) with three explicit variants: ContentToken, ReasoningToken, ToolCallToken (plus the existing UndeterminableToken).
- Replace the GeneratedTokenResult::Done unit variant with Done(GenerationSummary), carrying the final TokenUsage with prompt/cached/image/audio/content/reasoning/tool_call/undeterminable token counts.

Agent
-----

- Construct a per-request SampledTokenClassifier from the model and feed every sampled token through ingest(), then emit the matching token variant on the inference channel; usage is converted from the bindings type once per generation and shipped on the Done event.

Transformer trait
-----------------

- TransformsOutgoingMessage::transform now returns Vec<TransformResult>, so a single message can produce multiple SSE chunks.

OpenAI compat endpoint
----------------------

- Add stream_options.include_usage; honor it to emit a final usage chunk in streaming mode.
- Route reasoning tokens to delta.reasoning_content and tool-call tokens into delta.tool_calls function-arguments fragments per the streaming spec.
- Forward the tools array from the request through to the agent.
- The non-streaming path replaces the simple text concatenator with an Arc<Mutex>-backed aggregator that buffers content/reasoning/tool-call text separately, parses tool-call JSON for name/arguments, and emits a single OpenAI chat.completion JSON with finish_reason "tool_calls" when applicable.

Local bindings
--------------

- The workspace dependency now points at the sibling llama-cpp-bindings checkout (mtmd became always-on, so the feature flag is dropped).
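The reworked wire enum might look roughly like this; variant and field names follow the description above, but the exact shapes in the PR may differ:

```rust
// Hypothetical sketch of the reworked wire types; names mirror the
// PR description, not the actual source.
#[derive(Debug, Default, PartialEq)]
pub struct TokenUsage {
    pub prompt_tokens: u64,
    pub cached_tokens: u64,
    pub image_tokens: u64,
    pub audio_tokens: u64,
    pub content_tokens: u64,
    pub reasoning_tokens: u64,
    pub tool_call_tokens: u64,
    pub undeterminable_tokens: u64,
}

#[derive(Debug, PartialEq)]
pub struct GenerationSummary {
    pub token_usage: TokenUsage,
}

#[derive(Debug, PartialEq)]
pub enum GeneratedTokenResult {
    ContentToken(String),
    ReasoningToken(String),
    ToolCallToken(String),
    UndeterminableToken(String),
    Done(GenerationSummary),
}

impl GeneratedTokenResult {
    // Any of the four token variants counts as a token; Done terminates.
    pub fn is_token(&self) -> bool {
        !matches!(self, Self::Done(_))
    }
}
```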
Adds a focused integration test that loads Qwen3 0.6B and asserts the classifier resolves both the reasoning and tool-call marker pairs to single special tokens. On failure the test attaches the rendered no-tools / with-tools template outputs so marker-extraction issues can be diagnosed without re-running the full inference pipeline.
The classifier emits the tool-call open/close markers as ToolCallToken
events alongside the JSON payload, so the non-streaming OpenAI aggregator's
buffer ends up shaped like `<tool_call>\n{...}\n</tool_call>`. Locate the
JSON object by its first `{` and last `}` before parsing — the same approach
llama.cpp's autoparser uses for JSON-native tool calls — so the resulting
function name and arguments survive marker text on either side.
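A minimal sketch of the brace-location step described above, assuming the buffer holds the whole tool-call section including the markers (the function name is illustrative):

```rust
// Slice from the first `{` to the last `}` so the surrounding
// <tool_call>/</tool_call> marker text does not break JSON parsing.
fn locate_json_object(buffer: &str) -> Option<&str> {
    let start = buffer.find('{')?;
    let end = buffer.rfind('}')?;
    if end < start {
        return None;
    }
    // Both braces are single-byte, so inclusive slicing is safe here.
    Some(&buffer[start..=end])
}
```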
OpenAIStreamingResponseTransformer now carries a saw_tool_call flag, flipped on the first ToolCallToken; when it is set, the trailing chunk's finish_reason becomes "tool_calls" instead of "stop". This aligns the streaming response with OpenAI's spec for tool-using completions and with the non-streaming aggregator, which already reports the same finish reason.

Bumps max_completion_tokens on the reasoning-routing and non-streaming-usage integration tests so Qwen3 has room to finish its <think> block before the content phase, then asserts on the corresponding streaming/non-streaming fields.
Two new transformer-level tests confirm that emitting a ToolCallToken during the turn flips the trailing chunk's finish_reason to "tool_calls" and that a content-only turn still finishes with "stop".
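The finish_reason latch can be sketched as below, assuming a trimmed-down transformer state (the real struct carries much more):

```rust
// Minimal model of the saw_tool_call latch: flipped once on the first
// ToolCallToken, consulted only when the trailing chunk is emitted.
#[derive(Default)]
struct StreamingState {
    saw_tool_call: bool,
}

impl StreamingState {
    fn on_tool_call_token(&mut self) {
        self.saw_tool_call = true;
    }

    // finish_reason for the trailing streaming chunk.
    fn finish_reason(&self) -> &'static str {
        if self.saw_tool_call {
            "tool_calls"
        } else {
            "stop"
        }
    }
}
```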
Generate one call_id per request inside the streaming transformer's state,
include it in every delta.tool_calls fragment, and reuse the same prefix
("call_<nanoid>") that the non-streaming aggregator emits. Strict OpenAI
clients require the id field on each tool-call delta to correlate fragments.
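The per-request id reuse might look like this; the PR generates the suffix with nanoid, while this dependency-free stand-in takes it as a parameter:

```rust
// Sketch of per-request tool-call id state. The "call_" prefix matches
// what the non-streaming aggregator emits; the suffix source is an
// assumption (the PR uses nanoid).
struct ToolCallIds {
    call_id: String,
}

impl ToolCallIds {
    fn new(suffix: &str) -> Self {
        Self {
            call_id: format!("call_{suffix}"),
        }
    }

    // Every delta.tool_calls fragment reuses the same id so strict
    // clients can correlate fragments into one tool call.
    fn fragment_id(&self) -> &str {
        &self.call_id
    }
}
```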
- token_usage::{completion_tokens,total_tokens} are const-eligible.
- token_usage_from_bindings is also const-eligible.
- Replace ContinuousBatchScheduler's ad-hoc i32->u64 cast with an explicit
expect-with-reason since max_tokens is non-negative by API contract.
- Use is_some_and / map_or_else instead of match-on-Option, drop the
Result wrapping on a panic-only test, and scope the non-streaming
Mutex guards so they drop before the empty-vec return.
- Adopt the GeneratedTokenResult::is_token method reference everywhere
  clippy flagged the previous closure as redundant.
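The Option cleanup in the list above, as an illustrative before/after (the function names and values are made up):

```rust
// Before: explicit match on Option.
fn max_tokens_reached_match(max_tokens: Option<u64>, generated: u64) -> bool {
    match max_tokens {
        Some(limit) => generated >= limit,
        None => false,
    }
}

// After: the clippy-preferred is_some_and form, same behavior.
fn max_tokens_reached_idiomatic(max_tokens: Option<u64>, generated: u64) -> bool {
    max_tokens.is_some_and(|limit| generated >= limit)
}
```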
Replace the single InferenceMessageKind.TOKEN with one kind per token variant — CONTENT_TOKEN, REASONING_TOKEN, TOOL_CALL_TOKEN, UNDETERMINABLE_TOKEN — and parse the `Done` payload as a TokenUsage-bearing GenerationSummary. is_token now matches every token kind, and is_terminal is its inverse. New unit tests cover each token variant, the populated summary on Done, and the rejection of the legacy string-form Done.
New paddler::tool_call_* modules — one struct each, single responsibility:

- ToolCallBuffer: append-only string buffer; pure data, fully unit-tested.
- ToolCallParser: thin wrapper over Model::parse_chat_message; never deserialises JSON in Rust on model output.
- ToolCallValidator: schema-driven where the tool declared one, JSON-object structural check otherwise; always invoked, with ValidatorBuildError surfacing schema-load failures and ToolCallValidationError separating UnknownToolName / InvalidJson / NotAnObject / SchemaMismatch.
- ToolCallEvent: explicit event enum (Pending / Resolved / ParseFailed / ValidationFailed); pure data, unit-testable.
- ToolCallPipeline: composes Buffer + Parser + Validator. The same component is shared by both endpoints; integration tests cover the end-to-end behaviour.

The wire format gains three new GeneratedTokenResult variants — ToolCallParsed (structured, always emitted on the close marker) plus ToolCallParseFailed and ToolCallValidationFailed (informational, do NOT terminate the request). paddler_types::ParsedToolCall is the shared wire value object.

Scheduler integration: ContinuousBatchActiveRequest holds an Option<ToolCallPipeline>; the scheduler feeds every ToolCallToken to the pipeline and finalises whenever the classifier transitions out of the in_tool_call state, emitting the resulting structured event downstream. The pipeline is constructed only when the request actually has tools.

OpenAI compat refactor: deleted parse_tool_call_payload, locate_json_object, and the ToolCallPayload struct — every JSON parse on model output is now gone. Both transformers consume the structured ToolCallParsed event and emit OpenAI-spec delta.tool_calls / message.tool_calls without ever peeking inside the model's raw text. Per-variant arms moved into focused helper methods (handle_content / handle_reasoning / handle_tool_call_parsed / handle_done) per the single-responsibility audit.
ParsedToolCall is added to paddler_types as a Serialize/Deserialize wire struct; bindings ships an internal twin for the FFI return.
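A hedged sketch of how the buffer/parser/validator composition described above could fit together. The stand-in finalize only checks for a non-empty buffer, whereas the real pipeline delegates parsing to Model::parse_chat_message and then validates; the event fields here are assumptions:

```rust
// Event variants follow the PR description; payload fields are illustrative.
#[derive(Debug, PartialEq)]
enum ToolCallEvent {
    Pending,
    Resolved { payload: String },
    ParseFailed { reason: String },
}

// Append-only buffer for ToolCallToken pieces; pure data.
#[derive(Default)]
struct ToolCallBuffer {
    text: String,
}

struct ToolCallPipeline {
    buffer: ToolCallBuffer,
}

impl ToolCallPipeline {
    fn new() -> Self {
        Self {
            buffer: ToolCallBuffer::default(),
        }
    }

    // Each ToolCallToken is buffered; nothing is parsed until finalize.
    fn ingest(&mut self, piece: &str) -> ToolCallEvent {
        self.buffer.text.push_str(piece);
        ToolCallEvent::Pending
    }

    // Called when the classifier leaves the in_tool_call section.
    // Stand-in logic only: the real pipeline parses and validates.
    fn finalize(self) -> ToolCallEvent {
        if self.buffer.text.trim().is_empty() {
            ToolCallEvent::ParseFailed {
                reason: "empty tool-call section".into(),
            }
        } else {
            ToolCallEvent::Resolved {
                payload: self.buffer.text,
            }
        }
    }
}
```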
…errors as hard Err; collapse image pipeline to one decode
…ool-call internal tests
Pull request overview
This PR expands Paddler’s token streaming protocol to classify streamed tokens (content vs reasoning vs tool-call) and report detailed token usage at completion, while adding an end-to-end tool-call parsing/validation pipeline. In parallel, it extracts the web/admin TypeScript wire schemas + state helpers into a new @intentee/paddler-client workspace package and updates the existing UI code to consume it.
Changes:
- Add a tool-call parsing/validation pipeline in paddler and surface structured tool-call events and richer “Done” summaries (usage breakdown) through streaming.
- Extend request params with parse_tool_calls, refactor ContinueFromConversationHistoryParams imports, and update OpenAI compatibility transformers + tests for new token kinds/usage.
- Introduce the paddler_client_javascript workspace package (schemas, streaming helpers, errors) and migrate resources/ts usage/tests to it.
Reviewed changes
Copilot reviewed 278 out of 290 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tsconfig.json | Switch module resolution to bundler for workspace-style TS imports. |
| resources/ts/webSocketProtocol.ts | Remove local helper (migrated to JS client). |
| resources/ts/urlToAgentDesiredModel_test.ts | Remove local test (migrated to JS client). |
| resources/ts/schemas/InferenceServiceGenerateTokensResponse.ts | Remove local schema (migrated to JS client). |
| resources/ts/matchWebSocketState.ts | Use @intentee/paddler-client WebSocket state types. |
| resources/ts/matchFetchJsonState.ts | Use @intentee/paddler-client fetch state types; simplify success path. |
| resources/ts/matchEventSourceUpdateState.ts | Use @intentee/paddler-client EventSource state types. |
| resources/ts/InferenceSocketClient.interface.ts | Remove local interface (moved/embedded in JS client). |
| resources/ts/inferenceParametersFormKeys.ts | Reintroduce boolean/number key helpers for UI forms. |
| resources/ts/hooks/useWebSocket.ts | Use shared WebSocket state constants/types from JS client. |
| resources/ts/hooks/usePrompt.ts | Switch HTTP streaming to streamHttpNdjson + updated response schema/tokenKind. |
| resources/ts/hooks/useFetchJson.ts | Use shared fetch state constants/types from JS client. |
| resources/ts/hooks/useChatTemplateOverride.ts | Import shared ChatTemplateSchema from JS client. |
| resources/ts/hooks/useBalancerDesiredState.ts | Import shared BalancerDesiredStateSchema from JS client. |
| resources/ts/hooks/useAgentDesiredModelUrl.ts | Import shared AgentDesiredModel + URL parser from JS client. |
| resources/ts/extractHuggingFaceUrlParts.ts | Remove local helper (migrated to JS client). |
| resources/ts/ConversationMessageContentPart.type.ts | Remove local type (migrated to JS client schemas). |
| resources/ts/ConversationMessage.type.ts | Remove local type (migrated to JS client schemas). |
| resources/ts/contexts/InferenceParametersContext.ts | Import shared InferenceParameters type from JS client. |
| resources/ts/contexts/ChatTemplateContext.ts | Import shared ChatTemplate type from JS client. |
| resources/ts/components/PromptPage.tsx | Import webSocketProtocol from JS client. |
| resources/ts/components/ModelMetadataPreviewButton.tsx | Import shared Agent type from JS client. |
| resources/ts/components/ModelMetadataLoader.tsx | Import shared Agent type from JS client. |
| resources/ts/components/ModelMetadata.tsx | Import shared Agent type from JS client. |
| resources/ts/components/ModelChatTemplateOverridePreviewButton.tsx | Import shared Agent type from JS client. |
| resources/ts/components/InferenceParametersContextProvider.tsx | Import shared InferenceParameters type from JS client. |
| resources/ts/components/InferenceParameterPoolingType.tsx | Import shared constants from JS client schemas. |
| resources/ts/components/InferenceParameterInput.tsx | Use local form-key helper types for numeric keys. |
| resources/ts/components/InferenceParameterCheckbox.tsx | Use local form-key helper types for boolean keys. |
| resources/ts/components/InferenceParameterCacheDtype.tsx | Import shared constants from JS client schemas. |
| resources/ts/components/ChatTemplateOverrideLoader.tsx | Import shared Agent type from JS client. |
| resources/ts/components/ChatTemplateEditButton.tsx | Import shared ChatTemplate type from JS client. |
| resources/ts/components/ChatTemplateContextProvider.tsx | Import shared ChatTemplate type from JS client. |
| resources/ts/components/ChangeModelPage.tsx | Import shared AgentDesiredModel type from JS client. |
| resources/ts/components/ChangeModelForm.tsx | Import shared BalancerDesiredState type from JS client. |
| resources/ts/components/BufferedRequestsStream.tsx | Import shared response schema from JS client. |
| resources/ts/components/AgentListStream.tsx | Import shared response schema from JS client. |
| resources/ts/components/AgentListAgentStatus.tsx | Import shared Agent type from JS client. |
| resources/ts/components/AgentList.tsx | Import shared Agent type from JS client. |
| resources/ts/components/AgentIssuesPreviewButton.tsx | Import shared AgentIssue type from JS client. |
| resources/ts/components/AgentIssues.tsx | Import shared AgentIssue type from JS client. |
| paddler/src/tool_call_validation_error.rs | Add validation error types for tool calls. |
| paddler/src/tool_call_pipeline.rs | Add buffering/parse/validate pipeline for tool-call fragments. |
| paddler/src/tool_call_parser.rs | Parse tool calls via llama.cpp and template-override fallback. |
| paddler/src/tool_call_parse_error.rs | Define parse errors for tool-call parsing. |
| paddler/src/tool_call_event.rs | Introduce tool-call pipeline events + unit tests. |
| paddler/src/sets_desired_state.rs | Move AgentDesiredState import to paddler_types. |
| paddler/src/lib.rs | Export new tool-call modules; remove llama_cpp_bindings re-export. |
| paddler/src/cancellation_token_stream_guard.rs | Formatting-only change to poll_next signature. |
| paddler/src/balancer/inference_service/http_route/api/post_generate_embedding_batch.rs | Update transformer trait to return multiple chunks; preserve NDJSON streaming. |
| paddler/src/balancer/inference_service/http_route/api/post_continue_from_conversation_history.rs | Update params import path for ContinueFromConversationHistoryParams. |
| paddler/src/balancer/chunk_forwarding_session_controller/transforms_outgoing_message.rs | Change transformer contract to Vec<TransformResult>. |
| paddler/src/balancer/chunk_forwarding_session_controller/mod.rs | Forward multiple transform results per outgoing message. |
| paddler/src/balancer/chunk_forwarding_session_controller/identity_transformer.rs | Adapt identity transformer to new multi-result interface. |
| paddler/src/balancer/agent_controller.rs | Update imports to paddler_types + new params path. |
| paddler/src/balancer/agent_controller_pool.rs | Update imports + minor formatting simplification. |
| paddler/src/balancer_applicable_state.rs | Use paddler_types::agent_desired_state::AgentDesiredState. |
| paddler/src/balancer_applicable_state_holder.rs | Use paddler_types::agent_desired_state::AgentDesiredState. |
| paddler/src/agent/reconciliation_service.rs | Use paddler_types::agent_desired_state::AgentDesiredState. |
| paddler/src/agent/prepared_conversation_history_request.rs | Carry parse_tool_calls + validated tools into prepared requests. |
| paddler/src/agent/prepare_conversation_history_request.rs | Thread parse_tool_calls through; consolidate image prep with prepared_for_inference. |
| paddler/src/agent/management_socket_client_service.rs | Use paddler_types::agent_desired_state::AgentDesiredState. |
| paddler/src/agent/jsonrpc/request.rs | Update ContinueFromConversationHistoryParams import path. |
| paddler/src/agent/jsonrpc/notification_params/set_state_params.rs | Use paddler_types::agent_desired_state::AgentDesiredState. |
| paddler/src/agent/continuous_batch_scheduler/tool_call_pipeline_build_outcome.rs | Add outcome type for enabling tool-call pipeline. |
| paddler/src/agent/continuous_batch_scheduler/tool_call_pass.rs | Add pass to finalize tool-call parsing when leaving tool-call section. |
| paddler/src/agent/continuous_batch_scheduler/sample_token_phase.rs | Extract sampling logic into a dedicated phase type. |
| paddler/src/agent/continuous_batch_scheduler/sample_outcome.rs | Define typed outcomes for sampling stage. |
| paddler/src/agent/continuous_batch_scheduler/ingesting_contribution.rs | Track prompt-ingest contributions per pass. |
| paddler/src/agent/continuous_batch_scheduler/generating_contribution.rs | Track generating contributions per pass. |
| paddler/src/agent/continuous_batch_scheduler/emit_token_phase.rs | Emit classified token events (content/reasoning/tool-call/undeterminable). |
| paddler/src/agent/continuous_batch_scheduler/emit_token_outcome.rs | Define outcomes for emission stage. |
| paddler/src/agent/continuous_batch_scheduler/decode_outcome.rs | Wrap decode outcomes + add unit tests for mapping. |
| paddler/src/agent/continuous_batch_scheduler/decode_batch_phase.rs | Extract decode execution into a dedicated phase. |
| paddler/src/agent/continuous_batch_scheduler/contributions.rs | Aggregate ingest/generate contributions per batch pass. |
| paddler/src/agent/continuous_batch_scheduler/completion_check_phase.rs | Stop on EOG or max_tokens using classifier usage counts. |
| paddler/src/agent/continuous_batch_scheduler/completion_check_outcome.rs | Define completion-check outcomes. |
| paddler/src/agent/continuous_batch_scheduler/commit_phase.rs | Commit batch-pass effects back into active requests. |
| paddler/src/agent/continuous_batch_scheduler/classify_token_phase.rs | Classify sampled tokens and track tool-call section transitions. |
| paddler/src/agent/continuous_batch_scheduler/classified_token.rs | Store raw vs visible pieces and tool-call section flags. |
| paddler/src/agent/continuous_batch_scheduler/batch_pass.rs | Encapsulate LlamaBatch plus contribution bookkeeping. |
| paddler/src/agent/continuous_batch_scheduler/advance_outcome.rs | Represent “advance” outcomes + add unit tests. |
| paddler/src/agent/continuous_batch_arbiter.rs | Update token-to-piece calls to new SampledToken API. |
| paddler/src/agent/continuous_batch_active_request.rs | Track token classifier, pending SampledToken, and optional tool-call pipeline. |
| paddler/src/agent/continue_from_conversation_history_request.rs | Update params import path. |
| paddler/src/agent_desired_state.rs | Stop re-exporting AgentDesiredState; use paddler_types directly. |
| paddler_types/src/request_params/mod.rs | Remove re-export of ContinueFromConversationHistoryParams. |
| paddler_types/src/request_params/continue_from_conversation_history_params/tool/tool_params/mod.rs | Remove re-export; require explicit path for FunctionCall. |
| paddler_types/src/request_params/continue_from_conversation_history_params/tool/tool_params/function_call/parameters_schema/raw_parameters_schema.rs | Simplify schema validation (keep required-in-properties check). |
| paddler_types/src/request_params/continue_from_conversation_history_params/tool/mod.rs | Fix FunctionCall import path and ordering. |
| paddler_types/src/request_params/continue_from_conversation_history_params/mod.rs | Add parse_tool_calls flag and propagate through validation. |
| paddler_types/src/lib.rs | Export new generation_summary module. |
| paddler_types/src/inference_server/request.rs | Update params import path. |
| paddler_types/src/generation_summary.rs | Add GenerationSummary carrying TokenUsage. |
| paddler_types/Cargo.toml | Add dependency on llama-cpp-bindings-types. |
| paddler_tests/tests/smolvlm2_generates_tokens_from_image_input.rs | Update params + token-kind checks + Done now carries summary. |
| paddler_tests/tests/qwen35_without_mmproj_rejects_image_with_multimodal_not_supported.rs | Add parse_tool_calls field. |
| paddler_tests/tests/qwen35_with_system_message_completes_without_thinking.rs | Add parse_tool_calls; update token-kind checks + Done summary. |
| paddler_tests/tests/qwen35_with_system_message_completes_with_thinking.rs | Add parse_tool_calls; update token-kind checks + Done summary. |
| paddler_tests/tests/qwen35_with_mmproj_generates_tokens_from_image.rs | Add parse_tool_calls; update token-kind checks + Done summary. |
| paddler_tests/tests/qwen35_thinking_multi_turn_conversation_stops_cleanly.rs | Add parse_tool_calls; update token-kind checks + Done summary. |
| paddler_tests/tests/qwen35_thinking_mode_stops_cleanly_before_max_tokens.rs | Add parse_tool_calls; update token-kind checks + Done summary. |
| paddler_tests/tests/qwen35_generation_stops_at_eog_before_max_tokens.rs | Add parse_tool_calls; update token-kind checks + Done summary. |
| paddler_tests/tests/qwen35_generates_tokens_for_long_system_and_user_prompt.rs | Add parse_tool_calls; update token-kind checks + Done summary. |
| paddler_tests/tests/qwen3_without_grammar_generates_unconstrained_output.rs | Update token-kind checks via is_token(). |
| paddler_tests/tests/qwen3_openai_streaming_usage_breakdown_with_thinking.rs | New test for OpenAI streaming usage chunk with thinking. |
| paddler_tests/tests/qwen3_openai_streaming_routes_reasoning_to_reasoning_content.rs | New test mapping reasoning tokens to OpenAI reasoning_content. |
| paddler_tests/tests/qwen3_openai_streaming_omits_usage_when_not_requested.rs | New test ensuring usage omitted unless requested. |
| paddler_tests/tests/qwen3_openai_streaming_emits_usage_when_requested.rs | New test ensuring trailing usage chunk emitted when requested. |
| paddler_tests/tests/qwen3_openai_streaming_emits_tool_calls_for_function_tool.rs | New test for structured OpenAI streaming tool calls. |
| paddler_tests/tests/qwen3_openai_non_streaming_usage_with_tool_calls.rs | New test for non-streaming usage + tool calls. |
| paddler_tests/tests/qwen3_openai_non_streaming_returns_usage.rs | New test for non-streaming usage fields. |
| paddler_tests/tests/qwen3_internal_endpoint_with_thinking_enabled_emits_reasoning_tokens.rs | New test for reasoning token classification + usage invariants. |
| paddler_tests/tests/qwen3_internal_endpoint_with_thinking_disabled_emits_no_reasoning_tokens.rs | New test ensuring no reasoning tokens when disabled. |
| paddler_tests/tests/qwen3_internal_endpoint_pure_content_usage.rs | New test for pure content usage breakdown. |
| paddler_tests/tests/qwen3_internal_endpoint_max_tokens_usage_matches.rs | New test ensuring usage completion count matches streamed token count. |
| paddler_tests/tests/qwen3_internal_endpoint_concurrent_requests_independent_usage.rs | New test ensuring per-request usage counters are independent. |
| paddler_tests/tests/qwen3_grammar_with_thinking_returns_incompatible_error.rs | Add parse_tool_calls field. |
| paddler_tests/tests/qwen3_generates_tokens_from_raw_prompt.rs | Update token-kind checks + Done summary. |
| paddler_tests/tests/qwen3_generates_tokens_from_conversation_history.rs | Add parse_tool_calls; update token-kind checks + Done summary. |
| paddler_tests/tests/qwen25vl_generates_tokens_from_image_input.rs | Add parse_tool_calls; update token-kind checks + Done summary. |
| paddler_tests/tests/continuous_batch_two_concurrent_multimodal_requests_produce_tokens.rs | Add parse_tool_calls; update token-kind checks + Done summary. |
| paddler_tests/tests/continuous_batch_stops_generation_when_stop_sender_dropped.rs | Update token-kind checks + Done summary. |
| paddler_tests/tests/continuous_batch_stops_at_max_tokens_boundary.rs | Update token-kind checks + Done summary. |
| paddler_tests/tests/continuous_batch_smoke.rs | Update token-kind checks + Done summary. |
| paddler_tests/tests/continuous_batch_serves_four_concurrent_requests.rs | Update token-kind checks + Done summary. |
| paddler_tests/tests/continuous_batch_reuses_slot_after_request_completes.rs | Update token-kind checks + Done summary. |
| paddler_tests/tests/continuous_batch_plain_and_multimodal_run_concurrently.rs | Add parse_tool_calls; update token-kind checks + Done summary. |
| paddler_tests/tests/continuous_batch_long_and_short_prompts_complete_concurrently.rs | Update token-kind checks + Done summary. |
| paddler_tests/tests/continuous_batch_generates_tokens_with_partial_layer_offload.rs | Update token-kind checks + Done summary. |
| paddler_tests/tests/continuous_batch_generates_tokens_with_distinct_k_and_v_cache_dtypes.rs | Update token-kind checks + Done summary. |
| paddler_tests/tests/continuous_batch_evicts_long_sequence_under_kv_pressure.rs | Update Done summary matching. |
| paddler_tests/tests/continuous_batch_concurrent_conversation_history_requests_complete.rs | Add parse_tool_calls; update token-kind checks + Done summary. |
| paddler_tests/tests/chat_template_swaps_between_inference_calls.rs | Add parse_tool_calls; update token-kind checks via is_token. |
| paddler_tests/tests/chat_template_override_replaces_model_builtin.rs | Add parse_tool_calls; update token-kind checks via is_token. |
| paddler_tests/tests/chat_template_drains_in_flight_inference_before_swap.rs | Add parse_tool_calls; update token-kind checks + Done summary. |
| paddler_tests/tests/balancer_completes_in_flight_inference_during_model_switch.rs | Fix race by awaiting first token before triggering model switch. |
| paddler_tests/tests/agent_text_only_model_rejects_image_input.rs | Add parse_tool_calls field. |
| paddler_tests/tests/agent_streams_tokens_from_raw_prompt.rs | Update token-kind checks via is_token(). |
| paddler_tests/tests/agent_streams_tokens_from_image_data_uri.rs | Add parse_tool_calls; update token-kind checks via is_token. |
| paddler_tests/tests/agent_streams_tokens_from_conversation_history_over_http.rs | Add parse_tool_calls; update token-kind checks via is_token(). |
| paddler_tests/tests/agent_serves_four_concurrent_clients_streaming_tokens.rs | Update token-kind checks via is_token(). |
| paddler_tests/tests/agent_returns_image_decoding_error_for_remote_url.rs | Add parse_tool_calls field. |
| paddler_tests/tests/agent_returns_image_decoding_error_for_malformed_data_uri.rs | Add parse_tool_calls field. |
| paddler_tests/tests/agent_returns_image_decoding_error_for_invalid_base64.rs | Add parse_tool_calls field. |
| paddler_tests/tests/agent_rejects_tool_with_invalid_required_field_in_schema.rs | Update FunctionCall import path; enable parse_tool_calls for test. |
| paddler_tests/tests/agent_raw_prompt_respects_max_tokens.rs | Update token-kind checks via is_token(). |
| paddler_tests/tests/agent_openai_chat_completions_non_streaming_returns_text.rs | Increase max tokens and explicitly disable thinking in template kwargs. |
| paddler_tests/tests/agent_grammar_with_thinking_returns_incompatible_error.rs | Add parse_tool_calls field. |
| paddler_tests/tests/agent_exits_cleanly_on_sigterm_during_multimodal_inference.rs | Add parse_tool_calls field. |
| paddler_tests/tests/agent_conversation_with_json_schema_grammar_returns_valid_json.rs | Add parse_tool_calls field. |
| paddler_tests/tests/agent_conversation_with_gbnf_grammar_constrains_output.rs | Add parse_tool_calls field. |
| paddler_tests/tests/agent_conversation_with_function_tool_succeeds.rs | Update FunctionCall import path; enable parse_tool_calls for tools test. |
| paddler_tests/tests/agent_conversation_history_respects_max_tokens.rs | Add parse_tool_calls; update token-kind checks via is_token(). |
| paddler_tests/tests/agent_conversation_accepts_empty_tools_list.rs | Add parse_tool_calls; update token-kind checks via is_token(). |
| paddler_tests/src/test_device.rs | Import llama_cpp_bindings directly (no longer re-exported by paddler). |
| paddler_tests/src/start_in_process_cluster_with_qwen3_6.rs | New helper to start cluster with Qwen3.6 model card. |
| paddler_tests/src/start_in_process_cluster_with_ministral_3.rs | New helper to start cluster with Ministral 3 model card. |
| paddler_tests/src/start_in_process_cluster_with_gemma_4.rs | New helper to start cluster with Gemma 4 model card. |
| paddler_tests/src/openai_chat_completions_client.rs | New test client for OpenAI chat completions endpoints. |
| paddler_tests/src/model_card/qwen3_6_35b_a3b.rs | New model card for Qwen3.6 35B A3B. |
| paddler_tests/src/model_card/mod.rs | Register new model cards/modules. |
| paddler_tests/src/model_card/ministral_3_14b_reasoning.rs | New model card for Ministral 3 14B reasoning. |
| paddler_tests/src/model_card/gemma_4_e4b_it.rs | New model card for Gemma 4 E4B IT. |
| paddler_tests/src/lib.rs | Export new helpers for OpenAI client and model cards. |
| paddler_tests/src/inference_http_client.rs | Make client cloneable; update params import path. |
| paddler_tests/src/collect_generated_tokens.rs | Accumulate text via token_text() abstraction. |
| paddler_tests/src/cluster_handle.rs | Add Drop impl to ensure subprocess cleanup; refactor shutdown ownership. |
| paddler_tests/Cargo.toml | Add hf-hub + llama-cpp-bindings deps for tests. |
| paddler_gui/src/running_balancer_snapshot.rs | Use AgentDesiredState from paddler_types in tests. |
| paddler_client/src/lib.rs | Make internal modules private and re-export intended public API. |
| paddler_client/src/client_inference.rs | Update params import path. |
| paddler_client_python/tests/test_tool_call_arguments.py | New tests for Python tool-call arguments tagged enum parsing. |
| paddler_client_python/tests/test_stream_ndjson.py | Update token/done wire shapes and message kind enum. |
| paddler_client_python/tests/test_response_stream.py | Update token kind to content-token. |
| paddler_client_python/tests/test_parsed_tool_call.py | New tests for Python parsed tool-call model. |
| paddler_client_python/tests/test_integration_inference.py | Update integration assertions for new token kind enum. |
| paddler_client_python/tests/test_client_inference.py | Update NDJSON helpers + message kind enum. |
| paddler_client_python/paddler_client/tool_call_arguments.py | Add Python ToolCallArguments tagged enum + parser. |
| paddler_client_python/paddler_client/parsed_tool_call.py | Add Python ParsedToolCall model + dict conversion. |
| paddler_client_javascript/tsconfig.json | Add TS config for new JS client package build output. |
| paddler_client_javascript/tests/webSocketProtocol.test.ts | Add tests for protocol mapping helper. |
| paddler_client_javascript/tests/urlToAgentDesiredModel.test.ts | Add tests for URL→model parsing behavior. |
| paddler_client_javascript/tests/streamHttpNdjson.test.ts | Add tests for NDJSON streaming helper + errors. |
| paddler_client_javascript/tests/schemas/ParsedToolCall.test.ts | Add tests for parsed tool-call schema. |
| paddler_client_javascript/tests/schemas/InferenceServiceGenerateTokensResponse.test.ts | Add tests for new token kinds + Done usage summary mapping. |
| paddler_client_javascript/tests/schemas/Agent.test.ts | Add tests for Agent schema parsing/validation. |
| paddler_client_javascript/tests/PaddlerError.test.ts | Add tests for new error subclasses. |
| paddler_client_javascript/tests/fetchJson.test.ts | Add tests for fetchJson helper + HttpError. |
| paddler_client_javascript/tests/extractHuggingFaceUrlParts.test.ts | Add tests for HuggingFace URL parsing helper. |
| paddler_client_javascript/src/WebSocketState.ts | Add shared WebSocket state union type. |
| paddler_client_javascript/src/webSocketProtocol.ts | Add shared WebSocket protocol mapping helper. |
| paddler_client_javascript/src/WebSocketError.ts | Add WebSocket error type. |
| paddler_client_javascript/src/WebSocketConnectionOpenedState.ts | Add opened WebSocket state type. |
| paddler_client_javascript/src/WebSocketConnectionErrorState.ts | Add error WebSocket state + frozen constant. |
| paddler_client_javascript/src/WebSocketConnectionClosedState.ts | Add closed WebSocket state + frozen constant. |
| paddler_client_javascript/src/WebSocketConnectingState.ts | Add connecting WebSocket state + frozen constant. |
| paddler_client_javascript/src/urlToAgentDesiredModel.ts | Implement URL→desired-model parsing with explicit error on unsupported formats. |
| paddler_client_javascript/src/streamHttpNdjson.ts | Add HTTP NDJSON streaming helper (Observable-based). |
| paddler_client_javascript/src/streamEventSource.ts | Add SSE/EventSource streaming helper emitting connection/data states. |
| paddler_client_javascript/src/ServerError.ts | Add server error type with integer code. |
| paddler_client_javascript/src/schemas/ValidatedParametersSchema.ts | Add schema for validated function parameter JSON Schema. |
| paddler_client_javascript/src/schemas/Tool.ts | Add OpenAI-like function tool schema definitions. |
| paddler_client_javascript/src/schemas/PoolingType.ts | Add pooling type enum schema. |
| paddler_client_javascript/src/schemas/ParsedToolCall.ts | Add parsed tool call schema with tagged arguments union. |
| paddler_client_javascript/src/schemas/ModelMetadata.ts | Add schema for model metadata map. |
| paddler_client_javascript/src/schemas/InferenceParameters.ts | Remove exported BooleanKeys/NumberKeys (moved to UI helper). |
| paddler_client_javascript/src/schemas/HuggingFaceModelReference.ts | Add schema for HuggingFace model reference. |
| paddler_client_javascript/src/schemas/HuggingFaceDownloadLock.ts | Minor formatting/import tweak. |
| paddler_client_javascript/src/schemas/GrammarConstraint.ts | Add schema for grammar constraints. |
| paddler_client_javascript/src/schemas/GenerateEmbeddingBatchParams.ts | Add schema for embedding batch params. |
| paddler_client_javascript/src/schemas/EmbeddingNormalizationMethod.ts | Add schema for embedding normalization variants. |
| paddler_client_javascript/src/schemas/EmbeddingInputDocument.ts | Add schema for embedding input documents. |
| paddler_client_javascript/src/schemas/Embedding.ts | Add schema for embedding response items. |
| paddler_client_javascript/src/schemas/ConversationMessageContentPart.ts | Add schema for multimodal conversation content parts. |
| paddler_client_javascript/src/schemas/ConversationMessage.ts | Add schema for conversation message payloads. |
| paddler_client_javascript/src/schemas/ContinueFromRawPromptParams.ts | Add schema for raw-prompt inference params. |
| paddler_client_javascript/src/schemas/ContinueFromConversationHistoryParams.ts | Add schema for conversation-history inference params, including `parse_tool_calls`. |
| paddler_client_javascript/src/schemas/ChatTemplate.ts | Add chat template schema. |
| paddler_client_javascript/src/schemas/BufferedRequestsResponse.ts | Add buffered requests snapshot schema. |
| paddler_client_javascript/src/schemas/BalancerDesiredState.ts | Add balancer desired state schema. |
| paddler_client_javascript/src/schemas/AgentsResponse.ts | Add agents response schema + stable sort transform. |
| paddler_client_javascript/src/schemas/AgentIssueModelPath.ts | Add schema for model-path structured errors. |
| paddler_client_javascript/src/schemas/AgentIssue.ts | Add schema for agent issue union. |
| paddler_client_javascript/src/schemas/AgentDesiredModel.ts | Add schema for desired model union variants. |
| paddler_client_javascript/src/schemas/Agent.ts | Add schema for agent status payload. |
| paddler_client_javascript/src/PaddlerError.ts | Add base error class for JS client. |
| paddler_client_javascript/src/JsonError.ts | Add JSON parse error carrying raw payload. |
| paddler_client_javascript/src/inferenceSocketClient.ts | Implement WS inference client; switch request id generation to crypto.randomUUID(). |
| paddler_client_javascript/src/HttpError.ts | Add HTTP error class carrying status code. |
| paddler_client_javascript/src/FetchJsonSuccessState.ts | Add fetch-json success state type. |
| paddler_client_javascript/src/FetchJsonState.ts | Add fetch-json state union type. |
| paddler_client_javascript/src/FetchJsonLoadingState.ts | Add loading state type + frozen constant. |
| paddler_client_javascript/src/FetchJsonErrorState.ts | Add error state type. |
| paddler_client_javascript/src/FetchJsonEmptyState.ts | Add empty state type + frozen constant. |
| paddler_client_javascript/src/fetchJson.ts | Add HTTP JSON helper with schema validation. |
| paddler_client_javascript/src/extractHuggingFaceUrlParts.ts | Add HuggingFace URL parsing helper without path-to-regexp. |
| paddler_client_javascript/src/EventSourceState.ts | Add EventSource state union type. |
| paddler_client_javascript/src/EventSourceInitialState.ts | Add initial EventSource state + frozen constant. |
| paddler_client_javascript/src/EventSourceDeserializationErrorState.ts | Add deserialization error state + frozen constant. |
| paddler_client_javascript/src/EventSourceDataSnapshotState.ts | Add typed data snapshot state for SSE. |
| paddler_client_javascript/src/EventSourceConnectionErrorState.ts | Add connection error state + frozen constant. |
| paddler_client_javascript/src/EventSourceConnectedState.ts | Add connected state + frozen constant. |
| paddler_client_javascript/src/ConnectionDroppedError.ts | Add error for dropped streaming connection keyed by request id. |
| paddler_client_javascript/shell.nix | Add dev shell with Node 22. |
| paddler_client_javascript/README.md | Document JS client usage patterns (WS, NDJSON, SSE). |
| paddler_client_javascript/package.json | Define package metadata, exports pattern, peer deps, and test/build scripts. |
| paddler_client_javascript/Makefile | Add package-local build/test targets. |
| paddler_client_javascript/.gitignore | Ignore node artifacts in JS client package. |
| paddler_cli/src/main.rs | Reorder module declarations/imports; keep conditional web panel init. |
| paddler_bootstrap/tests/runners.rs | Minor formatting simplification for agent runner start. |
| paddler_bootstrap/src/bootstrapped_agent_handle.rs | Import AgentDesiredState from paddler_types. |
| package.json | Mark repo private and add npm workspace for JS client package. |
| Makefile | Add top-level build/test targets for JS client workspace. |
| jarmuz/run-website.mjs | Watch JS client sources and trigger TS jobs on changes. |
| Cargo.toml | Switch llama-cpp-bindings crates from crates.io to path deps; add types crate. |
| Cargo.lock | Reflect llama-cpp-bindings path upgrade and new types crate; update deps. |
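Among the files above, `extractHuggingFaceUrlParts.ts` implements HuggingFace URL parsing without `path-to-regexp`, and `urlToAgentDesiredModel.ts` throws an explicit error on unsupported formats. A minimal sketch of such a parser is below; the return shape, segment layout (`/{owner}/{repo}/resolve/{revision}/{file...}`), and error messages are assumptions for illustration, not the package's actual API.

```typescript
// Hypothetical sketch of a HuggingFace URL parser in the spirit of
// extractHuggingFaceUrlParts; the real return shape may differ.
type HuggingFaceUrlParts = {
  owner: string;
  repo: string;
  revision: string;
  filePath: string;
};

function extractHuggingFaceUrlParts(input: string): HuggingFaceUrlParts {
  const url = new URL(input);

  if (url.hostname !== "huggingface.co") {
    throw new Error(`Unsupported host: ${url.hostname}`);
  }

  // Assumed shape: /{owner}/{repo}/resolve/{revision}/{filePath...}
  const segments = url.pathname.split("/").filter(Boolean);
  const resolveIndex = segments.indexOf("resolve");

  if (resolveIndex !== 2 || segments.length < resolveIndex + 2) {
    throw new Error(`Unsupported HuggingFace URL format: ${input}`);
  }

  return {
    owner: segments[0],
    repo: segments[1],
    revision: segments[resolveIndex + 1],
    filePath: segments.slice(resolveIndex + 2).join("/"),
  };
}
```

Doing this with `URL` and plain segment matching is what makes dropping the `path-to-regexp` dependency feasible: only one fixed path shape needs to be recognized.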
A review comment on lines +18 to +24 highlights this excerpt:

```ts
return new Observable(function (subscriber) {
  fetch(url, {
    body: JSON.stringify(body),
    headers: { "Content-Type": "application/json" },
    method: "POST",
    signal,
  })
```
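The excerpt above cuts off mid-chain. A hedged sketch of how such an NDJSON helper might continue is shown here: it buffers the POST response body and yields one parsed JSON value per newline-delimited line. To stay dependency-free it uses an async generator instead of the package's rxjs `Observable`; the parameter names (`url`, `body`, `signal`) come from the excerpt, and everything else is an assumption.

```typescript
// Standalone NDJSON streaming sketch (the real streamHttpNdjson.ts is
// Observable-based; this async-generator version is an illustration only).
async function* streamHttpNdjson<TData>(
  url: string,
  body: unknown,
  signal?: AbortSignal,
): AsyncGenerator<TData> {
  const response = await fetch(url, {
    body: JSON.stringify(body),
    headers: { "Content-Type": "application/json" },
    method: "POST",
    signal,
  });

  if (!response.ok || !response.body) {
    throw new Error(`HTTP ${response.status}`);
  }

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffered = "";

  for (;;) {
    const { done, value } = await reader.read();

    if (value) {
      buffered += decoder.decode(value, { stream: true });
    }

    // Emit every complete line; keep a trailing partial line buffered.
    const lines = buffered.split("\n");
    buffered = done ? "" : (lines.pop() ?? "");

    for (const line of lines) {
      if (line.trim()) {
        yield JSON.parse(line) as TData;
      }
    }

    if (done) {
      return;
    }
  }
}
```

The buffering step matters because a network chunk can end mid-line; only complete lines are parsed, and the remainder waits for the next read.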
A review comment on lines +75 to +77 highlights this excerpt:

```rust
match strategy {
    ValidationStrategy::JsonObjectOnly => Ok(()),
    ValidationStrategy::Schema(validator) => {
```
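The Rust excerpt above dispatches on a validation strategy: accept any JSON object as-is, or run a schema validator. The actual enum lives on the Rust side, but a comparable shape in the JS client's idiom would be a discriminated union; every name below is an assumption made for illustration.

```typescript
// Hypothetical TypeScript analogue of the Rust ValidationStrategy dispatch.
type ValidationStrategy =
  | { kind: "jsonObjectOnly" }
  | { kind: "schema"; validate: (value: unknown) => boolean };

function validatePayload(strategy: ValidationStrategy, payload: unknown): void {
  switch (strategy.kind) {
    case "jsonObjectOnly":
      // Accepted as-is, mirroring the Rust arm's Ok(()).
      return;
    case "schema":
      if (!strategy.validate(payload)) {
        throw new Error("payload failed schema validation");
      }
      return;
  }
}
```

A tagged union like this gives the TypeScript compiler the same exhaustiveness guarantee the Rust `match` gives: adding a new strategy variant forces every `switch` over it to be updated.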
…for emit + classify phases
…rds, consolidate OpenAI compat error chunks
…, simplify pipeline
…nd add waiter wakeup tests
…th strict-whitelist kwargs
… reasoning enabled
…d SSE shutdown and dispatch candidate tests
…GTERM during startup yields a clean exit
…shing; rename batch_n_tokens to n_batch
…e per-chunk cap; consolidate test agents into AgentConfig
…ap with a typed error
…ate iced Message, collapse nested if
…parser; raise TypeError on non-dict/non-list payloads
…ication' into token-usage-and-thinking-classification
…rnels are built once