From 057fa2e6586a3ce0b82460cfe0eed1783ef42b9c Mon Sep 17 00:00:00 2001
From: Ashwin Giridharan
Date: Thu, 18 Jun 2026 20:38:30 -0700
Subject: [PATCH 1/2] docs: propose tool framework design for multi-type tool support

Introduces a design doc for a generic tool framework that handles the
full lifecycle of heterogeneous tool types (function, mcp, web_search,
file_search, code_interpreter) through a single pipeline with
type-specific handlers.

Key ideas:
- ResponsesTool becomes a tagged enum (backward-compatible serde)
- Request-scoped ToolRegistry routes function_call items by name
- ToolHandler trait allows adding new types without touching the loop
- function type is client-owned (requires_action); all others
  gateway-executed

Signed-off-by: Ashwin Giridharan
---
 docs/design/tool-framework.md | 294 ++++++++++++++++++++++++++++++++++
 1 file changed, 294 insertions(+)
 create mode 100644 docs/design/tool-framework.md

diff --git a/docs/design/tool-framework.md b/docs/design/tool-framework.md
new file mode 100644
index 0000000..ba1fff5
--- /dev/null
+++ b/docs/design/tool-framework.md
@@ -0,0 +1,294 @@
+# Design: Tool Framework
+
+> Status: Proposal
+> References: [ADR-01 D7](../adr/ADR-01_core.md), [ADR-03 D3](../adr/ADR-03_gateway_integration.md)
+
+---
+
+## Problem
+
+Clients send heterogeneous tool types (`function`, `mcp`, `web_search`, `file_search`, `code_interpreter`). vLLM only speaks function calling: it produces `function_call` output items regardless of tool origin. The gateway must bridge both directions: normalize inbound tools for inference, and route outbound calls to their correct executors.
+
+Today `ResponsesTool = FunctionTool`. This design replaces that with a type-aware framework that handles the full tool lifecycle for any tool type through a single pipeline.
+
+---
+
+## Principles
+
+1. **One pipeline, many types.** The tool lifecycle is the same for all types. What varies is the behavior at each stage.
+2. **vLLM is function-only.** Every tool type normalizes to `type: "function"` before inference. This is a permanent constraint.
+3. **Routing by registry, not heuristics.** After inference, `function_call` items are looked up in a request-scoped registry that maps names back to origin type and config.
+4. **Function tools are client-owned.** `type: "function"` is never gateway-executed. The response returns `status: "requires_action"` and the client resolves it. All other types are gateway-executed.
+5. **Additive.** New tool types implement a trait and register. The executor loop doesn't change.
+
+---
+
+## Architecture
+
+```mermaid
+graph TD
+    subgraph "Request Phase (once per request)"
+        REQ["Client Request<br/>tools: mixed types"]
+        PARSE["Parse + Validate<br/>per-type schemas"]
+        DISC["Discover<br/>MCP: tools/list"]
+        NORM["Normalize<br/>all → type: function"]
+        REG["Build Registry<br/>name → type + config"]
+    end
+
+    subgraph "Inference"
+        VLLM["vLLM<br/>sees only function tools"]
+    end
+
+    subgraph "Execution Phase (per iteration)"
+        ROUTE["Route<br/>registry lookup per call"]
+        EXEC_GW["Gateway Execute<br/>mcp / web / file / code"]
+        PASS["Passthrough<br/>function → requires_action"]
+        LOOP["Inject Results<br/>re-enter inference"]
+    end
+
+    REQ --> PARSE --> DISC --> NORM --> REG
+    REG --> VLLM
+    VLLM --> ROUTE
+    ROUTE -->|gateway-owned| EXEC_GW
+    ROUTE -->|client-owned| PASS
+    EXEC_GW --> LOOP --> VLLM
+
+    style REQ fill:#1a5c2a,color:#e0e0e0
+    style VLLM fill:#1a5c2a,color:#e0e0e0
+    style PARSE fill:#2a4a8a,color:#e0e0e0
+    style DISC fill:#2a4a8a,color:#e0e0e0
+    style NORM fill:#2a4a8a,color:#e0e0e0
+    style REG fill:#2a4a8a,color:#e0e0e0
+    style ROUTE fill:#2a4a8a,color:#e0e0e0
+    style EXEC_GW fill:#2a4a8a,color:#e0e0e0
+    style PASS fill:#2a4a8a,color:#e0e0e0
+    style LOOP fill:#2a4a8a,color:#e0e0e0
+```
+
+---
+
+## Pipeline Stages
+
+Every request with tools passes through 7 stages. Stages 1–4 run once at request start. Stages 5–7 repeat per inference iteration.
+
+| # | Stage | Generic (framework) | Type-Specific (handler) |
+|---|-------|---------------------|-------------------------|
+| 1 | **Parse** | Deserialize `tools[]`, classify by `type` | Validate required fields per type |
+| 2 | **Discover** | Iterate handlers, collect discovered tools | MCP: `tools/list`. Others: no-op |
+| 3 | **Normalize** | Flatten all into `Vec<FunctionTool>` for vLLM | MCP: schema → parameters. WebSearch: synthetic def |
+| 4 | **Register** | Build `HashMap<String, ToolEntry>` | Each handler declares ownership of its tool names |
+| 5 | **Route** | Look up `function_call.name` in registry | Determine: gateway-execute or client-passthrough |
+| 6 | **Execute** | Parallel execution with timeout + error isolation | MCP: JSON-RPC. WebSearch: HTTP API. Function: skip |
+| 7 | **Emit** | Forward type-specific SSE events to client | MCP: 7 events. WebSearch: 2 events. Function: 0 |
+
+Stages 1–4 produce two artifacts:
+- **Normalized tools**: `Vec<FunctionTool>` forwarded to vLLM
+- **Tool registry**: `ToolRegistry` consumed by dispatch for routing
+
+---
+
+## Core Types
+
+### Tool Classification
+
+```rust
+#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
+pub enum ToolType {
+    Function,
+    Mcp,
+    WebSearch,
+    FileSearch,
+    CodeInterpreter,
+}
+```
+
+### Request-Side Tool Param
+
+Replaces `pub type ResponsesTool = FunctionTool`:
+
+```rust
+#[derive(Debug, Clone, Serialize, Deserialize)]
+#[serde(tag = "type")]
+pub enum ResponsesTool {
+    #[serde(rename = "function")]
+    Function(FunctionToolParam),
+
+    #[serde(rename = "mcp")]
+    Mcp(McpToolParam),
+
+    #[serde(rename = "web_search_preview")]
+    WebSearch(WebSearchToolParam),
+
+    #[serde(rename = "file_search")]
+    FileSearch(FileSearchToolParam),
+
+    #[serde(rename = "code_interpreter")]
+    CodeInterpreter(CodeInterpreterToolParam),
+}
+```
+
+`#[serde(tag = "type")]` makes this wire-compatible with existing `{"type":"function",...}` requests.
+
+### Tool Registry
+
+```rust
+pub struct ToolEntry {
+    pub tool_type: ToolType,
+    pub config: Value,
+    pub server_label: Option<String>,
+}
+
+pub struct ToolRegistry {
+    entries: HashMap<String, ToolEntry>,
+}
+
+impl ToolRegistry {
+    pub fn lookup(&self, tool_name: &str) -> Option<&ToolEntry>;
+    pub fn gateway_owned_calls<'a>(&self, calls: &'a [FunctionToolCall]) -> Vec<&'a FunctionToolCall>;
+    pub fn client_owned_calls<'a>(&self, calls: &'a [FunctionToolCall]) -> Vec<&'a FunctionToolCall>;
+}
+```
+
+### Loop Decision
+
+```rust
+#[derive(Debug)]
+#[non_exhaustive]
+pub enum LoopDecision {
+    /// Gateway tools executed: inject results and re-infer.
+    Continue(Vec),
+
+    /// No tool calls: return response as completed.
+    Done,
+
+    /// Only client-owned function calls: return requires_action.
+    RequiresAction(Vec),
+
+    /// Mixed: gateway tools executed AND client calls pending.
+    /// Loop back; on next pass if only client calls remain → RequiresAction.
+    ContinuePartial {
+        results: Vec,
+        pending_client_calls: Vec,
+    },
+
+    /// Safety cap reached.
+    Incomplete(String),
+}
+```
+
+---
+
+## The ToolHandler Trait
+
+Each tool type implements this:
+
+```rust
+#[async_trait]
+pub trait ToolHandler: Send + Sync {
+    fn tool_type(&self) -> ToolType;
+
+    fn validate(&self, param: &Value) -> Result<(), ToolError>;
+
+    async fn discover(&self, param: &Value) -> Result<Vec<DiscoveredTool>, ToolError> {
+        Ok(vec![]) // default: no discovery needed
+    }
+
+    fn normalize(&self, param: &Value, discovered: &[DiscoveredTool]) -> Vec<FunctionTool>;
+
+    async fn execute(
+        &self,
+        tool_name: &str,
+        arguments: &str,
+        config: &Value,
+    ) -> Result;
+
+    fn event_prefix(&self) -> Option<&'static str> {
+        None // default: no special SSE events
+    }
+
+    fn output_item_type(&self) -> &'static str;
+}
+```
+
+Adding a new tool type means implementing this trait and registering it. No changes to the executor loop, accumulator, or streaming path.
+
+---
+
+## Per-Type Behavior
+
+| Stage | `function` | `mcp` | `web_search` | `file_search` | `code_interpreter` |
+|-------|-----------|-------|-------------|--------------|-------------------|
+| Validate | name required | server_url required | (none) | vector_store_ids required | (none) |
+| Discover | no-op | `tools/list` on server | no-op | no-op | no-op |
+| Normalize | passthrough | McpToolDef → FunctionTool | synthetic `web_search(query)` | synthetic `file_search(query)` | synthetic `code_interpreter(code)` |
+| Route | → client | → gateway | → gateway | → gateway | → gateway |
+| Execute | N/A | JSON-RPC `tools/call` | HTTP search API | vector store query | sandboxed container |
+| SSE events | `function_call_arguments.*` | `mcp_call.*` (7 events) | `web_search_call.*` (2) | `file_search_call.*` (2) | `code_interpreter_call.*` |
+| Response status | `requires_action` | `completed` | `completed` | `completed` | `completed` |
+
+---
+
+## Mixed-Tool Request Walkthrough
+
+Request:
+```json
+{
+  "tools": [
+    {"type": "function", "name": "run_shell", "parameters": {...}},
+    {"type": "mcp", "server_label": "db", "server_url": "http://db-mcp:8080"},
+    {"type": "web_search_preview"}
+  ],
+  "input": "Find papers on RLHF, check our DB, then run the import script"
+}
+```
+
+**Preparation:**
+- Discover: MCP server returns `[query_papers, insert_paper]`
+- Registry: `run_shell → Function`, `query_papers → Mcp`, `insert_paper → Mcp`, `web_search → WebSearch`
+- vLLM sees 4 function tools
+
+**Iteration 1:** Model calls `web_search("RLHF papers")` → gateway executes → loop back
+
+**Iteration 2:** Model calls `query_papers("topic=RLHF")` → gateway executes via JSON-RPC → loop back
+
+**Iteration 3:** Model calls `run_shell("python import.py")` → registry lookup → `Function` → **client-owned** → response returns `status: "requires_action"`
+
+Client executes locally, submits `function_call_output`, inference continues.
+
+---
+
+## Shipping Plan
+
+| PR | Scope | Depends on |
+|----|-------|------------|
+| **A: Tool Types + Registry** | `ToolType` enum, `ResponsesTool` enum, `ToolRegistry`, `ToolHandler` trait, `FunctionHandler`, normalize pipeline. No execution logic. | io types refactor |
+| **B: Type-Aware Dispatch** | Registry-based routing in `dispatch_tools`, `LoopDecision::RequiresAction` + `ContinuePartial`, `HandlerRegistry`. | PR A |
+| **C: MCP Handler** | First real `ToolHandler` impl: `tools/list` + `tools/call` via JSON-RPC. Stateless HTTP client. | PR A |
+| **D: Tool SSE Events** | Type-specific event emission during execution. Extends `SSEEventType`. | PR B + streaming |
+| **E: Output Item Types** | `OutputItem::McpCall`, `OutputItem::WebSearchCall`, etc. Storage + serialization. | PR B |
+
+PR A lands independently. PR C can parallelize with PR B. Future handlers (web_search, file_search, code_interpreter) implement the same trait.
+
+---
+
+## Design Decisions
+
+| # | Decision | Rationale |
+|---|----------|-----------|
+| D1 | Registry-based routing | Name prefixes leak implementation into the model's tool namespace. Registry is invisible to inference. |
+| D2 | Request-scoped registry | Different requests may target different MCP servers. Global state would require sync and conflict resolution. |
+| D3 | `function` never gateway-executed | Matches OpenAI spec. Enables agent clients (Codex, etc.) that own their tool implementations. "No client delegation" means the gateway doesn't punt *its* work, not that function tools can't exist. |
+| D4 | `ContinuePartial` in LoopDecision | Mixed requests need to execute gateway tools and loop, while tracking that client tools also exist. Without this, we'd skip gateway tools or lose client tools. |
+| D5 | MCP client is stateless | Each request opens fresh connections. Connection pooling per `server_url` is a follow-up optimization. |
+| D6 | `ResponsesTool` uses `#[serde(tag = "type")]` | Wire-compatible with existing `{"type":"function",...}`, so no client migration is needed. |
+
+---
+
+## Open Questions
+
+| # | Question | Proposed Answer |
+|---|----------|-----------------|
+| Q1 | What if MCP `tools/list` returns a name colliding with a `function` tool? | Function wins (client-defined takes precedence). Emit warning log. |
+| Q2 | How does `ContinuePartial` look to the streaming client? | Gateway tool events stream in real-time. Final status is `requires_action`. Client already handles incremental events. |
+| Q3 | Should `tool_choice: {function: {name: "x"}}` work for MCP-discovered tools? | Yes. vLLM sees all normalized functions. If the forced name is MCP-originated, the call routes through MCP naturally. |
+| Q4 | Should `prepare_tools` be a Praxis filter or part of `execute_loop`? | Part of `execute_loop` in core. Praxis wraps the whole loop, not individual tool stages. |

From 793675b462faf8a2a073910903c3b3bb7756e530 Mon Sep 17 00:00:00 2001
From: Ashwin Giridharan
Date: Thu, 18 Jun 2026 20:46:09 -0700
Subject: [PATCH 2/2] docs: add alternatives considered for function tool handling

Enumerates five alternatives (reject, ignore+warn, search MCP, require
executor, configurable per-request) and explains why passthrough with
requires_action was chosen.

Signed-off-by: Ashwin Giridharan
---
 docs/design/tool-framework.md | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/docs/design/tool-framework.md b/docs/design/tool-framework.md
index ba1fff5..038986a 100644
--- a/docs/design/tool-framework.md
+++ b/docs/design/tool-framework.md
@@ -284,6 +284,22 @@ PR A lands independently. PR C can parallelize with PR B. Future handlers (web_s
 
 ---
 
+## Alternatives Considered for `function` Tool Handling
+
+Decision D3 (`function` is never gateway-executed, returns `requires_action`) is the most debatable choice. Here are the alternatives we evaluated:
+
+| # | Alternative | Behavior | Why rejected |
+|---|-------------|----------|--------------|
+| A | **Reject function tools entirely** | Validate at parse time: if `type: "function"` is present, return 400. Force clients to back all tools with MCP servers. | Breaks OpenAI spec compatibility. Prevents agent clients (Codex, Claude Code) from using their natural pattern. Unnecessarily opinionated. |
+| B | **Ignore + warn** | Accept `function` tools, normalize to vLLM, but if model calls one: drop the call silently, log a warning, and continue inference without it. | Silent data loss. The model asked for a tool result and gets nothing, which produces hallucinated or degraded responses. Violates least-surprise. |
+| C | **Search MCP servers for matching name** | When model calls a `function` tool, check if any registered MCP server happens to expose a tool with that name. If found, execute via MCP. If not, fall back to `requires_action`. | Spooky action at a distance. Client declares `type: "function"` expecting to own execution, but gateway silently intercepts it if an MCP server has a name collision. Also adds latency (extra `tools/list` queries). |
+| D | **Gateway-execute all (require registered executor)** | Every `function` tool must have a backing executor configured in gateway config. No `requires_action` at all. | Requires operators to pre-configure every tool. Impossible for dynamic agent clients that generate tool definitions at runtime. Breaks the most common agentic pattern. |
+| E | **Configurable per-request** | Add a field like `function_execution: "client" \| "gateway"` to let the client choose. | Over-engineering for MVP. Adds complexity to every code path. If a real use case emerges, we can add it later without breaking the default. |
+
+**Chosen: passthrough with `requires_action`**: matches OpenAI spec exactly, zero surprise for clients, and cleanly separates "tools the gateway owns" from "tools the client owns" based solely on the `type` field the client already provides.
+
+---
+
 ## Open Questions
 
 | # | Question | Proposed Answer |
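
A note on the routing rule that D3 and the alternatives table argue for: it may be easier to review with a small, standalone sketch. The code below is not the proposed implementation. `Call` is a hypothetical stand-in for `FunctionToolCall`, `RouteDecision` is a simplified relative of `LoopDecision` that carries calls rather than executed results, and the `UnknownTool` variant and the `register`/`route` method names are assumptions made for illustration. The collision rule in `register` follows the proposed answer to Q1 (function wins).

```rust
use std::collections::HashMap;

/// Origin of a tool, mirroring the `ToolType` enum proposed in the doc.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum ToolType {
    Function,
    Mcp,
    WebSearch,
    FileSearch,
    CodeInterpreter,
}

/// Hypothetical stand-in for `FunctionToolCall`, reduced to two fields.
#[derive(Debug, Clone, PartialEq)]
pub struct Call {
    pub name: String,
    pub arguments: String,
}

/// Simplified routing outcome (an assumption: it carries calls, not results).
#[derive(Debug, PartialEq)]
pub enum RouteDecision {
    /// The model produced no tool calls.
    Done,
    /// Only gateway-owned calls: execute them, then re-infer.
    Continue(Vec<Call>),
    /// Only client-owned calls: return `requires_action`.
    RequiresAction(Vec<Call>),
    /// Mixed batch: execute gateway calls and remember the client calls.
    ContinuePartial {
        gateway_calls: Vec<Call>,
        pending_client_calls: Vec<Call>,
    },
    /// Name not in the registry (handling is not specified by the doc).
    UnknownTool(String),
}

/// Request-scoped map from normalized tool name to origin type.
#[derive(Default)]
pub struct Registry {
    entries: HashMap<String, ToolType>,
}

impl Registry {
    pub fn new() -> Self {
        Self::default()
    }

    /// Registers a name. An existing `Function` entry is never overwritten,
    /// so a client-defined tool wins a name collision (proposed answer to Q1).
    pub fn register(&mut self, name: &str, tool_type: ToolType) {
        if self.entries.get(name) != Some(&ToolType::Function) {
            self.entries.insert(name.to_string(), tool_type);
        }
    }

    pub fn lookup(&self, name: &str) -> Option<ToolType> {
        self.entries.get(name).copied()
    }

    /// Splits one batch of model calls by ownership and picks the next step.
    pub fn route(&self, calls: &[Call]) -> RouteDecision {
        let mut gateway_calls = Vec::new();
        let mut pending_client_calls = Vec::new();
        for call in calls {
            match self.lookup(&call.name) {
                None => return RouteDecision::UnknownTool(call.name.clone()),
                // Only `function` is client-owned; every other type is gateway-owned.
                Some(ToolType::Function) => pending_client_calls.push(call.clone()),
                Some(_) => gateway_calls.push(call.clone()),
            }
        }
        match (gateway_calls.is_empty(), pending_client_calls.is_empty()) {
            (true, true) => RouteDecision::Done,
            (false, true) => RouteDecision::Continue(gateway_calls),
            (true, false) => RouteDecision::RequiresAction(pending_client_calls),
            (false, false) => RouteDecision::ContinuePartial {
                gateway_calls,
                pending_client_calls,
            },
        }
    }
}
```

With the walkthrough's registry (`run_shell` as `Function`, `query_papers` as `Mcp`, `web_search` as `WebSearch`), a batch containing only `run_shell` routes to `RequiresAction`, and a batch containing both `query_papers` and `run_shell` routes to `ContinuePartial`. Ownership is decided from the registered type alone, which is the property the chosen option relies on.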