From 39c294b72d79c5fb8ee57b4bc08b1caec698df4e Mon Sep 17 00:00:00 2001 From: haoshan98 Date: Fri, 19 Jun 2026 10:03:38 +0000 Subject: [PATCH 1/3] Codex integration design Signed-off-by: haoshan98 --- docs/design/core-public-api.md | 249 ++++++++++++++++++++++++++++++++- 1 file changed, 247 insertions(+), 2 deletions(-) diff --git a/docs/design/core-public-api.md b/docs/design/core-public-api.md index fb92954..873fc45 100644 --- a/docs/design/core-public-api.md +++ b/docs/design/core-public-api.md @@ -1,7 +1,8 @@ # Design: `agentic-core` Public API > Status: Active — implementation in progress -> References: [ADR-03](../adr/ADR-03_gateway_integration.md), [Issue #42](https://github.com/vllm-project/agentic-api/issues/42), [Praxis #354](https://github.com/praxis-proxy/praxis/issues/354) +> References: [ADR-03](../adr/ADR-03_gateway_integration.md), [Issue #42](https://github.com/vllm-project/agentic-api/issues/42), +> [Issue #54](https://github.com/vllm-project/agentic-api/issues/54), [Praxis #354](https://github.com/praxis-proxy/praxis/issues/354) > Owner: @ashwing (tool dispatch, loop control, streaming tee) + @maralbahari (base loop, store integration) --- @@ -33,6 +34,7 @@ The base loop handles text messages. This design extends it with: 3. **Streaming tee** — forward SSE to client in real-time while accumulating for tool detection 4. **Extended SSE events** — function_call, reasoning, file_search, web_search event types 5. **Tool executor traits** — MCP, web_search, vector_store as pluggable implementations +6. **Codex CLI compatibility** — recognize Codex client-side tool types and route them without server execution --- @@ -123,7 +125,7 @@ pub async fn execute_loop( 1. Rehydrate (delegates to PR #46's `rehydrate_conversation`) 2. Call inference (delegates to PR #46's `call_inference` — returns stream lazily) 3. Accumulate response (via `ResponseAccumulator::from_stream`) -4. Check output for `OutputItem::FunctionCall` → `dispatch_tools` → loop or done +4. 
Check output for `OutputItem::FunctionCall` → `dispatch_tools` → loop, client action, or done 5. Persist final response (delegates to PR #46's `persist_response` with explicit handlers) **Phase 2 is non-streaming only.** The tool loop inspects the full accumulated response before deciding. Streaming + tool dispatch (forwarding events to client while detecting tool calls) requires Phase 3's tee pattern. @@ -252,6 +254,249 @@ How the complete pipeline maps to @leseb's proposed filter chain: --- +## Codex Integration + +Allow `agentic-api` to serve as the upstream layer for Codex CLI in the coming PR by @haoshan98, related to [Issue #54](https://github.com/vllm-project/agentic-api/issues/54). +`agentic-api` should accept Codex CLI traffic, route inference to vLLM-supported models, preserve +`previous_response_id` and conversation persistence, and pass client-owned tool calls back to Codex for local +execution. + +The immediate compatibility gap is request parsing and pass-through behavior. `agentic-api` already supports +`type: "function"`, but it does not yet recognize the Responses API tool shapes Codex uses for local/client tools: +`namespace`, `tool_search`, and `custom`. Today those request tools can fail before they reach upstream inference. +This section scopes the Codex integration to accepting those tool types losslessly and returning client-owned tool +calls to Codex CLI for local execution. + +Server-hosted tool types such as `file_search`, `web_search_preview`, and `code_interpreter` remain future +server-side work. They should not be conflated with this Codex compatibility pass. + +### Codex Tool Type Taxonomy + +Codex CLI sends tool declarations that are executed locally by the CLI. For this phase, `agentic-core` only needs +to recognize and preserve these shapes, normalize them for vLLM when necessary, and avoid treating them as +gateway-executed tools. 
+ +| Tool type | Executor | Core behavior | +|-----------|----------|---------------| +| `function` | Codex CLI by default | Already supported on the wire. Codex requests should return calls to Codex unless configuration marks the tool as gateway-owned. | +| `namespace` | Codex CLI | Accept the model-facing container shape and preserve child function metadata. Calls still arrive as `function_call` with an optional namespace. | +| `tool_search` | Codex CLI | Accept deferred-discovery shape, preserve `execution`, and return calls/output handling to Codex when `execution = "client"`. | +| `custom` | Codex CLI | Accept free-form/grammar tool shape and preserve `format` metadata for Codex. | + +The key requirement is compatibility, not server execution. These tools should pass through `agentic-api` +without request validation failures, and any client-owned model-emitted calls should be surfaced to Codex CLI +rather than executed inside the gateway. + +The key distinction is execution owner, not just wire type. `function` is a shared wire type: Codex can own local +functions, while future gateway integrations may also expose server-executed functions. The request normalizer must +classify every model-visible tool before inference and carry that registry through response handling. + +### Public Type Additions + +The current `ResponsesTool = FunctionTool` alias is too narrow for Codex. Replace it with a tagged tool enum that +preserves unknown shapes while giving Codex-used variants first-class names. + +Do not implement this as `#[serde(tag = "type")]` wrapping the existing `FunctionTool` struct directly, because +`FunctionTool` already stores the wire `type` field. Either use variant-specific payload structs that omit the +already-consumed tag, as sketched below, or write manual deserialization that preserves the raw object. 
+ +```rust +#[derive(Debug, Clone, Serialize, Deserialize)] +#[serde(untagged)] +pub enum ResponsesTool { + Known(KnownResponsesTool), + Unknown(Value), +} + +#[derive(Debug, Clone, Serialize, Deserialize)] +#[serde(tag = "type")] +pub enum KnownResponsesTool { + #[serde(rename = "function")] + Function(ResponsesFunctionTool), + #[serde(rename = "namespace")] + Namespace(CodexNamespaceTool), + #[serde(rename = "tool_search")] + ToolSearch(CodexToolSearchTool), + #[serde(rename = "custom")] + Custom(CodexCustomTool), +} + +pub struct ResponsesFunctionTool { + pub name: String, + pub description: Option<String>, + pub parameters: Option<Value>, + pub strict: Option<bool>, + #[serde(default)] + pub defer_loading: bool, +} + +pub struct CodexNamespaceTool { + pub name: String, + pub description: Option<String>, + #[serde(default)] + pub tools: Vec<CodexNamespaceMember>, +} + +#[derive(Debug, Clone, Serialize, Deserialize)] +#[serde(tag = "type")] +pub enum CodexNamespaceMember { + #[serde(rename = "function")] + Function(ResponsesFunctionTool), +} + +#[derive(Debug, Clone, Serialize, Deserialize)] +#[serde(rename_all = "snake_case")] +pub enum ToolSearchExecution { + Server, + Client, +} + +pub struct CodexToolSearchTool { + pub description: Option<String>, + pub execution: Option<ToolSearchExecution>, + pub parameters: Option<Value>, +} + +pub struct CodexCustomTool { + pub name: String, + pub description: Option<String>, + pub format: Option<Value>, + #[serde(default)] + pub defer_loading: bool, +} +``` + +The storage and upstream request paths should preserve raw unknown tools as `Value`. Unknown tool types must not +be executed by default. + +### Codex Call Shapes + +Codex's local router treats namespace as part of a function tool name, not as a separate payload type: + +| Response item | Local Codex payload | Gateway behavior | +|---------------|---------------------|------------------| +| `function_call` | `ToolPayload::Function { arguments }` | Preserve optional `namespace` and return the call to Codex when the tool is client-owned. 
| +| `tool_search_call` with `execution = "client"` | `ToolPayload::ToolSearch` | Return to Codex for local deferred discovery. | +| Hosted `tool_search_call` | Provider-owned | Do not execute locally; provider/upstream owns it. | +| `custom_tool_call` | `ToolPayload::Custom { input }` | Preserve free-form `input`, not JSON-schema function arguments. | + +`custom_tool_call` is for free-form/custom Responses tools, including grammar-based patch or code tools. It should +not be normalized as JSON-schema function arguments unless the adapter can reconstruct the original custom call +exactly. + +### Normalization And Registry + +Codex compatibility needs two related operations: + +1. Build an upstream-safe tool list for the selected inference backend. +2. Keep a lossless registry that maps model-emitted calls back to the original client-visible tool declaration. + +If vLLM only accepts flat function declarations, `tool_search` and `custom` become request normalization concerns. +`namespace` is mostly model-facing spec organization: `ToolSpec::Namespace` wraps function tools, and the model +still emits `function_call` with an optional `namespace`. Any backend-specific flattening is an adapter detail, +not the public semantics of the tool. 
+ +```rust +pub struct NormalizedTools { + pub upstream_tools: Vec<Value>, + pub registry: ToolRegistry, +} + +pub struct ToolName { + pub namespace: Option<String>, + pub name: String, +} + +pub enum ToolExecutionOwner { + Client, + Gateway, +} + +pub struct ToolRegistryEntry { + pub owner: ToolExecutionOwner, + pub original_type: String, + pub original_name: ToolName, + pub model_name: ToolName, + pub original_tool: Value, +} +``` + +For a namespace tool: + +```json +{ + "type": "namespace", + "name": "mcp__github", + "tools": [ + { "type": "function", "name": "create_issue", "description": "Create issue", "parameters": {} } + ] +} +``` + +the normalizer records a registry entry keyed by the split `ToolName`: + +```text +ToolName { namespace: Some("mcp__github"), name: "create_issue" } + -> owner = Client, original_type = "namespace" +``` + +Because the registry keys by split `ToolName`, two tools named `run` in different namespaces can coexist. When the +upstream response includes a `namespace` field on a `function_call`, preserve it. When an upstream backend only +returns an encoded flat name, recover the namespace and child tool name from the registry before returning the +response to Codex. This likely requires extending `FunctionToolCall` with: + +```rust +pub namespace: Option<String>, +``` + +`tool_search` may need to be adapted for backends that only understand functions, but the registry must preserve +`execution` and map client-executed `tool_search_call` / `tool_search_output` items back to the +Responses-compatible shape. Hosted/non-client tool search remains provider-owned and should not be handled by the +local Codex route. `custom` carries free-form `input` instead of JSON arguments, so the registry must retain +`format` and enough raw metadata to reconstruct the Codex-visible call. + +### Pass-Through Behavior + +Routing rules: + +1. Client-owned calls (`function`, namespaced `function`, client-executed `tool_search`, and `custom`) are returned + to Codex without gateway execution. +2. 
Gateway-owned calls execute inside `agentic-api` only when explicitly supported by a registered executor. +3. `namespace`, `tool_search`, and `custom` request declarations must not fail deserialization or validation. +4. The registry preserves the original request tool shape so the returned call can be interpreted by Codex CLI. +5. Unknown tool types are not executed by default. Preserve them when possible and reject them only when the + upstream cannot receive a safe normalized declaration. + +For Codex-owned calls, the gateway should not synthesize tool outputs. It persists the assistant call item, returns +the call to Codex, and expects Codex to continue the conversation with the corresponding tool output item after +local execution. + +The loop needs an explicit client-action decision so this path is not confused with either `Done` or +`Continue`: + +```rust +pub enum LoopDecision { + Continue(Vec), + RequiresClientAction(Vec), + Done, + Incomplete(String), +} +``` + +### Auto-Approval Model Alias + +[Issue #54](https://github.com/vllm-project/agentic-api/issues/54) also notes Codex's auto-approval request path. MVP support should add a simple model alias map in +configuration: + +```toml +[model_aliases] +codex-auto-review = "real-upstream-model" +``` + +`ExecutionContext` resolves aliases before `call_inference()`. A Codex-specific `/v1/models` response with model metadata can come later; the alias map is sufficient to unblock CLI compatibility without expanding the public API. + +--- + ## Open Questions 1. **`execute_loop` vs refactoring `execute`:** Should the loop wrapper be a new function or replace PR #46's `execute()`? Pending maralbahari's response on PR #46 review. From 6d7ce49244a364777fce675d9161ea026c1ded3c Mon Sep 17 00:00:00 2001 From: haoshan98 Date: Thu, 25 Jun 2026 11:01:03 +0000 Subject: [PATCH 2/3] Revert "Codex integration design" This reverts commit e1852fc0ffdf389942ff0e89f6b19018dd7bbf53. 
Signed-off-by: haoshan98 --- docs/design/core-public-api.md | 249 +-------------------------------- 1 file changed, 2 insertions(+), 247 deletions(-) diff --git a/docs/design/core-public-api.md b/docs/design/core-public-api.md index 873fc45..fb92954 100644 --- a/docs/design/core-public-api.md +++ b/docs/design/core-public-api.md @@ -1,8 +1,7 @@ # Design: `agentic-core` Public API > Status: Active — implementation in progress -> References: [ADR-03](../adr/ADR-03_gateway_integration.md), [Issue #42](https://github.com/vllm-project/agentic-api/issues/42), -> [Issue #54](https://github.com/vllm-project/agentic-api/issues/54), [Praxis #354](https://github.com/praxis-proxy/praxis/issues/354) +> References: [ADR-03](../adr/ADR-03_gateway_integration.md), [Issue #42](https://github.com/vllm-project/agentic-api/issues/42), [Praxis #354](https://github.com/praxis-proxy/praxis/issues/354) > Owner: @ashwing (tool dispatch, loop control, streaming tee) + @maralbahari (base loop, store integration) --- @@ -34,7 +33,6 @@ The base loop handles text messages. This design extends it with: 3. **Streaming tee** — forward SSE to client in real-time while accumulating for tool detection 4. **Extended SSE events** — function_call, reasoning, file_search, web_search event types 5. **Tool executor traits** — MCP, web_search, vector_store as pluggable implementations -6. **Codex CLI compatibility** — recognize Codex client-side tool types and route them without server execution --- @@ -125,7 +123,7 @@ pub async fn execute_loop( 1. Rehydrate (delegates to PR #46's `rehydrate_conversation`) 2. Call inference (delegates to PR #46's `call_inference` — returns stream lazily) 3. Accumulate response (via `ResponseAccumulator::from_stream`) -4. Check output for `OutputItem::FunctionCall` → `dispatch_tools` → loop, client action, or done +4. Check output for `OutputItem::FunctionCall` → `dispatch_tools` → loop or done 5. 
Persist final response (delegates to PR #46's `persist_response` with explicit handlers) **Phase 2 is non-streaming only.** The tool loop inspects the full accumulated response before deciding. Streaming + tool dispatch (forwarding events to client while detecting tool calls) requires Phase 3's tee pattern. @@ -254,249 +252,6 @@ How the complete pipeline maps to @leseb's proposed filter chain: --- -## Codex Integration - -Allow `agentic-api` to serve as the upstream layer for Codex CLI in the coming PR by @haoshan98, related to [Issue #54](https://github.com/vllm-project/agentic-api/issues/54). -`agentic-api` should accept Codex CLI traffic, route inference to vLLM-supported models, preserve -`previous_response_id` and conversation persistence, and pass client-owned tool calls back to Codex for local -execution. - -The immediate compatibility gap is request parsing and pass-through behavior. `agentic-api` already supports -`type: "function"`, but it does not yet recognize the Responses API tool shapes Codex uses for local/client tools: -`namespace`, `tool_search`, and `custom`. Today those request tools can fail before they reach upstream inference. -This section scopes the Codex integration to accepting those tool types losslessly and returning client-owned tool -calls to Codex CLI for local execution. - -Server-hosted tool types such as `file_search`, `web_search_preview`, and `code_interpreter` remain future -server-side work. They should not be conflated with this Codex compatibility pass. - -### Codex Tool Type Taxonomy - -Codex CLI sends tool declarations that are executed locally by the CLI. For this phase, `agentic-core` only needs -to recognize and preserve these shapes, normalize them for vLLM when necessary, and avoid treating them as -gateway-executed tools. - -| Tool type | Executor | Core behavior | -|-----------|----------|---------------| -| `function` | Codex CLI by default | Already supported on the wire. 
Codex requests should return calls to Codex unless configuration marks the tool as gateway-owned. | -| `namespace` | Codex CLI | Accept the model-facing container shape and preserve child function metadata. Calls still arrive as `function_call` with an optional namespace. | -| `tool_search` | Codex CLI | Accept deferred-discovery shape, preserve `execution`, and return calls/output handling to Codex when `execution = "client"`. | -| `custom` | Codex CLI | Accept free-form/grammar tool shape and preserve `format` metadata for Codex. | - -The key requirement is compatibility, not server execution. These tools should pass through `agentic-api` -without request validation failures, and any client-owned model-emitted calls should be surfaced to Codex CLI -rather than executed inside the gateway. - -The key distinction is execution owner, not just wire type. `function` is a shared wire type: Codex can own local -functions, while future gateway integrations may also expose server-executed functions. The request normalizer must -classify every model-visible tool before inference and carry that registry through response handling. - -### Public Type Additions - -The current `ResponsesTool = FunctionTool` alias is too narrow for Codex. Replace it with a tagged tool enum that -preserves unknown shapes while giving Codex-used variants first-class names. - -Do not implement this as `#[serde(tag = "type")]` wrapping the existing `FunctionTool` struct directly, because -`FunctionTool` already stores the wire `type` field. Either use variant-specific payload structs that omit the -already-consumed tag, as sketched below, or write manual deserialization that preserves the raw object. 
- -```rust -#[derive(Debug, Clone, Serialize, Deserialize)] -#[serde(untagged)] -pub enum ResponsesTool { - Known(KnownResponsesTool), - Unknown(Value), -} - -#[derive(Debug, Clone, Serialize, Deserialize)] -#[serde(tag = "type")] -pub enum KnownResponsesTool { - #[serde(rename = "function")] - Function(ResponsesFunctionTool), - #[serde(rename = "namespace")] - Namespace(CodexNamespaceTool), - #[serde(rename = "tool_search")] - ToolSearch(CodexToolSearchTool), - #[serde(rename = "custom")] - Custom(CodexCustomTool), -} - -pub struct ResponsesFunctionTool { - pub name: String, - pub description: Option<String>, - pub parameters: Option<Value>, - pub strict: Option<bool>, - #[serde(default)] - pub defer_loading: bool, -} - -pub struct CodexNamespaceTool { - pub name: String, - pub description: Option<String>, - #[serde(default)] - pub tools: Vec<CodexNamespaceMember>, -} - -#[derive(Debug, Clone, Serialize, Deserialize)] -#[serde(tag = "type")] -pub enum CodexNamespaceMember { - #[serde(rename = "function")] - Function(ResponsesFunctionTool), -} - -#[derive(Debug, Clone, Serialize, Deserialize)] -#[serde(rename_all = "snake_case")] -pub enum ToolSearchExecution { - Server, - Client, -} - -pub struct CodexToolSearchTool { - pub description: Option<String>, - pub execution: Option<ToolSearchExecution>, - pub parameters: Option<Value>, -} - -pub struct CodexCustomTool { - pub name: String, - pub description: Option<String>, - pub format: Option<Value>, - #[serde(default)] - pub defer_loading: bool, -} -``` - -The storage and upstream request paths should preserve raw unknown tools as `Value`. Unknown tool types must not -be executed by default. - -### Codex Call Shapes - -Codex's local router treats namespace as part of a function tool name, not as a separate payload type: - -| Response item | Local Codex payload | Gateway behavior | -|---------------|---------------------|------------------| -| `function_call` | `ToolPayload::Function { arguments }` | Preserve optional `namespace` and return the call to Codex when the tool is client-owned. 
| -| `tool_search_call` with `execution = "client"` | `ToolPayload::ToolSearch` | Return to Codex for local deferred discovery. | -| Hosted `tool_search_call` | Provider-owned | Do not execute locally; provider/upstream owns it. | -| `custom_tool_call` | `ToolPayload::Custom { input }` | Preserve free-form `input`, not JSON-schema function arguments. | - -`custom_tool_call` is for free-form/custom Responses tools, including grammar-based patch or code tools. It should -not be normalized as JSON-schema function arguments unless the adapter can reconstruct the original custom call -exactly. - -### Normalization And Registry - -Codex compatibility needs two related operations: - -1. Build an upstream-safe tool list for the selected inference backend. -2. Keep a lossless registry that maps model-emitted calls back to the original client-visible tool declaration. - -If vLLM only accepts flat function declarations, `tool_search` and `custom` become request normalization concerns. -`namespace` is mostly model-facing spec organization: `ToolSpec::Namespace` wraps function tools, and the model -still emits `function_call` with an optional `namespace`. Any backend-specific flattening is an adapter detail, -not the public semantics of the tool. 
- -```rust -pub struct NormalizedTools { - pub upstream_tools: Vec<Value>, - pub registry: ToolRegistry, -} - -pub struct ToolName { - pub namespace: Option<String>, - pub name: String, -} - -pub enum ToolExecutionOwner { - Client, - Gateway, -} - -pub struct ToolRegistryEntry { - pub owner: ToolExecutionOwner, - pub original_type: String, - pub original_name: ToolName, - pub model_name: ToolName, - pub original_tool: Value, -} -``` - -For a namespace tool: - -```json -{ - "type": "namespace", - "name": "mcp__github", - "tools": [ - { "type": "function", "name": "create_issue", "description": "Create issue", "parameters": {} } - ] -} -``` - -the normalizer records a registry entry keyed by the split `ToolName`: - -```text -ToolName { namespace: Some("mcp__github"), name: "create_issue" } - -> owner = Client, original_type = "namespace" -``` - -Because the registry keys by split `ToolName`, two tools named `run` in different namespaces can coexist. When the -upstream response includes a `namespace` field on a `function_call`, preserve it. When an upstream backend only -returns an encoded flat name, recover the namespace and child tool name from the registry before returning the -response to Codex. This likely requires extending `FunctionToolCall` with: - -```rust -pub namespace: Option<String>, -``` - -`tool_search` may need to be adapted for backends that only understand functions, but the registry must preserve -`execution` and map client-executed `tool_search_call` / `tool_search_output` items back to the -Responses-compatible shape. Hosted/non-client tool search remains provider-owned and should not be handled by the -local Codex route. `custom` carries free-form `input` instead of JSON arguments, so the registry must retain -`format` and enough raw metadata to reconstruct the Codex-visible call. - -### Pass-Through Behavior - -Routing rules: - -1. Client-owned calls (`function`, namespaced `function`, client-executed `tool_search`, and `custom`) are returned - to Codex without gateway execution. -2. 
Gateway-owned calls execute inside `agentic-api` only when explicitly supported by a registered executor. -3. `namespace`, `tool_search`, and `custom` request declarations must not fail deserialization or validation. -4. The registry preserves the original request tool shape so the returned call can be interpreted by Codex CLI. -5. Unknown tool types are not executed by default. Preserve them when possible and reject them only when the - upstream cannot receive a safe normalized declaration. - -For Codex-owned calls, the gateway should not synthesize tool outputs. It persists the assistant call item, returns -the call to Codex, and expects Codex to continue the conversation with the corresponding tool output item after -local execution. - -The loop needs an explicit client-action decision so this path is not confused with either `Done` or -`Continue`: - -```rust -pub enum LoopDecision { - Continue(Vec), - RequiresClientAction(Vec), - Done, - Incomplete(String), -} -``` - -### Auto-Approval Model Alias - -[Issue #54](https://github.com/vllm-project/agentic-api/issues/54) also notes Codex's auto-approval request path. MVP support should add a simple model alias map in -configuration: - -```toml -[model_aliases] -codex-auto-review = "real-upstream-model" -``` - -`ExecutionContext` resolves aliases before `call_inference()`. A Codex-specific `/v1/models` response with model metadata can come later; the alias map is sufficient to unblock CLI compatibility without expanding the public API. - ---- - ## Open Questions 1. **`execute_loop` vs refactoring `execute`:** Should the loop wrapper be a new function or replace PR #46's `execute()`? Pending maralbahari's response on PR #46 review. 
From 80fe88222ff992ec1f3d485336eddb1746a5de5b Mon Sep 17 00:00:00 2001 From: haoshan98 Date: Thu, 25 Jun 2026 11:03:40 +0000 Subject: [PATCH 3/3] Codex compatibility layer Signed-off-by: haoshan98 --- docs/design/codex-integration.md | 153 +++++++++++++++++++++++++++++++ 1 file changed, 153 insertions(+) create mode 100644 docs/design/codex-integration.md diff --git a/docs/design/codex-integration.md b/docs/design/codex-integration.md new file mode 100644 index 0000000..f73b6d9 --- /dev/null +++ b/docs/design/codex-integration.md @@ -0,0 +1,153 @@ +# Design: Codex CLI Integration + +> **References:** [Issue #54](https://github.com/vllm-project/agentic-api/issues/54), +> [PR #67](https://github.com/vllm-project/agentic-api/pull/67) +> **Owner:** @haoshan98 owns Codex compatibility. @ashwing's PR #67 owns the generic tool framework. + +--- + +## Summary + +`agentic-api` should work as an upstream layer for Codex CLI while routing inference to vLLM-supported models. + +This PR is an MVP compatibility slice. It lets `agentic-api` accept and preserve the Responses traffic Codex sends now, +without waiting for the full generic tool framework from PR #67. + +The important split: + +- **This PR:** preserve Codex request/response shapes and continuation state. +- **PR #67:** formalize generic tool normalization, execution, registry, ownership, and loop decisions. + +--- + +## Current PR Scope + +This PR should do only the minimum needed for Codex compatibility: + +- Add this standalone design doc. +- Accept Codex-used tool declarations without rejecting requests. +- Preserve unknown tool declarations and unknown input/output items as raw JSON. +- Preserve optional `namespace` on `function_call`. +- Preserve `tool_search_call` and `custom_tool_call` shapes. +- Preserve assistant tool-call items through `previous_response_id` rehydration. +- Add model alias routing from Codex-facing model names to local vLLM models. 
+- Add lightweight helper types/tests that document what #67 should formalize later. + +This PR should **not** build a second generic tool framework. + +--- + +## Deferred To PR #67 + +PR #67 should own the formal shared tool system: + +- `ToolHandler` / `Tool` trait shape. +- Generic tool normalization before `call_inference()`. +- Request-scoped tool registry. +- Client-owned vs gateway-owned dispatch. +- Requires-action / client-action loop decision. +- Live `execute_loop` orchestration and streaming tool events. + +The helper types in this PR are temporary. They express Codex requirements, but the canonical versions should come +from #67. After #67 lands, this slice should plug into or be refactored onto those abstractions. + +--- + +## Compatibility Rules + +The gateway should not detect Codex requests by user agent, route, or "is this Codex?" heuristics. Compatibility is +driven by Responses tool shapes and execution semantics, so it can be always on. + +| Shape | Behavior | +|-------|----------| +| `function` | Client-owned by default. Preserve declaration and return matching calls to the client unless configured as gateway-owned. | +| `namespace` | Model-facing grouping for function tools. Do not treat namespace as a separate executable call type. | +| `tool_search` | Client-owned only when `execution == "client"`. Hosted/non-client search is provider-owned. | +| `custom` | Client-owned by default. Preserve free-form / grammar metadata. | +| Unknown tool | Preserve as raw JSON. Never execute by default. | + +For response items: + +| Response item | Behavior | +|---------------|----------| +| `function_call` | Preserve optional `namespace`. | +| `tool_search_call` with `execution == "client"` | Return to the client for local deferred discovery. | +| Hosted / non-client `tool_search_call` | Do not execute locally. Leave to provider-specific handling. | +| `custom_tool_call` | Preserve free-form `input`; do not coerce into JSON function arguments. 
| +| Unknown output item | Preserve as raw JSON. Never execute by default. | + +--- + +## Requirements For #67 + +The generic framework should preserve enough metadata for Codex-compatible behavior: + +- raw original tool JSON +- model-visible tool name +- original client-visible identity +- optional namespace or an equivalent unambiguous key +- execution owner: `Client`, `Gateway`, or provider-owned +- raw hints such as `execution`, `format`, and `defer_loading` + +If namespaced tools need disambiguation, a split identity is useful: + +```rust +pub struct ToolName { + pub namespace: Option<String>, + pub name: String, +} +``` + +This avoids collisions such as two different namespaces both defining a tool named `run`. + +--- + +## Continuation + +Codex-owned tool calls must survive response-store continuation. + +Expected rehydration shape: + +```text +prior context + assistant tool call + Codex tool output + new input +``` + +On a turn that returns client-owned tool calls, storage should keep the assistant call item. On the next turn, Codex +submits the matching tool output item, and rehydration via `previous_response_id` should rebuild the full sequence. + +--- + +## Model Aliases + +Model aliases route Codex-facing model names to local vLLM models: + +```toml +[model_aliases] +codex-compatible = "qwen3-coder" +``` + +Alias resolution is only model routing. It must not imply approval, auto-review, or human-confirmation behavior. + +--- + +## Test Plan + +Current PR tests should cover: + +- `function`, `namespace`, `tool_search`, `custom`, and unknown tools round-trip. +- Extra fields are preserved. +- `function_call.namespace` round-trips. +- `tool_search_call` and `custom_tool_call` remain raw-compatible. +- Unknown input/output items remain raw JSON. +- `previous_response_id` rehydrates assistant tool calls before tool outputs. +- Model aliases resolve on executor and proxy paths. + +Post-#67 tests should prove the same behavior through the formal tool framework. 
+ +--- + +## Open Questions + +1. What exact requires-action payload type should #67 expose? +2. Should #67 use split `ToolName { namespace, name }` or a different unambiguous registry key? +3. Which Codex-used fields should become typed framework fields, and which should remain raw metadata?
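Review note on open question 2: the block below is a minimal sketch, assuming a request-scoped in-memory map is enough for the MVP, of how a split `ToolName` key keeps same-named tools in different namespaces apart, together with the pass-through alias lookup described under Model Aliases. `ToolRegistry`, `Owner`, `insert`, `owner_of`, and `resolve_model` are hypothetical names used only for illustration, and `mcp__shell` is a made-up namespace; the canonical types are expected to come from #67 and may look different.

```rust
use std::collections::HashMap;

/// Split tool identity, mirroring the `ToolName` shape from the design doc.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct ToolName {
    pub namespace: Option<String>,
    pub name: String,
}

/// Hypothetical execution owner; provider-owned tools are left out of this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Owner {
    Client,
    Gateway,
}

/// Hypothetical request-scoped registry keyed by the split identity.
#[derive(Debug, Default)]
pub struct ToolRegistry {
    entries: HashMap<ToolName, Owner>,
}

impl ToolRegistry {
    /// Build the lookup key from borrowed parts.
    fn key(namespace: Option<&str>, name: &str) -> ToolName {
        ToolName {
            namespace: namespace.map(str::to_owned),
            name: name.to_owned(),
        }
    }

    /// Record who executes the tool identified by (namespace, name).
    pub fn insert(&mut self, namespace: Option<&str>, name: &str, owner: Owner) {
        self.entries.insert(Self::key(namespace, name), owner);
    }

    /// Look up the owner; `None` means the tool was never declared.
    pub fn owner_of(&self, namespace: Option<&str>, name: &str) -> Option<Owner> {
        self.entries.get(&Self::key(namespace, name)).copied()
    }
}

/// Resolve a client-facing model name through the alias map.
/// Names without an alias pass through unchanged.
pub fn resolve_model<'a>(aliases: &'a HashMap<String, String>, requested: &'a str) -> &'a str {
    aliases.get(requested).map(String::as_str).unwrap_or(requested)
}
```

With this shape, `run` registered under `mcp__github` and `run` registered under `mcp__shell` resolve to separate entries, an undeclared un-namespaced `run` resolves to nothing, and a model name without an alias is forwarded as-is.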