diff --git a/docs/design/codex-integration.md b/docs/design/codex-integration.md new file mode 100644 index 0000000..f73b6d9 --- /dev/null +++ b/docs/design/codex-integration.md @@ -0,0 +1,153 @@ +# Design: Codex CLI Integration + +> **References:** [Issue #54](https://github.com/vllm-project/agentic-api/issues/54), +> [PR #67](https://github.com/vllm-project/agentic-api/pull/67) +> **Owner:** @haoshan98 for Codex compatibility. @ashwing PR #67 owns the generic tool framework. + +--- + +## Summary + +`agentic-api` should work as an upstream layer for Codex CLI while routing inference to vLLM-supported models. + +This PR is an MVP compatibility slice. It lets `agentic-api` accept and preserve Codex-used Responses traffic now, +without waiting for the full generic tool framework from PR #67. + +The important split: + +- **This PR:** preserve Codex request/response shapes and continuation state. +- **PR #67:** formalize generic tool normalization, execution, registry, ownership, and loop decisions. + +--- + +## Current PR Scope + +This PR should do only the minimum needed for Codex compatibility: + +- Add this standalone design doc. +- Accept Codex-used tool declarations without rejecting requests. +- Preserve unknown tool declarations and unknown input/output items as raw JSON. +- Preserve optional `namespace` on `function_call`. +- Preserve `tool_search_call` and `custom_tool_call` shapes. +- Preserve assistant tool-call items through `previous_response_id` rehydration. +- Add model alias routing for Codex-facing model names to local vLLM models. +- Add lightweight helper types/tests that document what #67 should formalize later. + +This PR should **not** build a second generic tool framework. + +--- + +## Deferred To PR #67 + +PR #67 should own the formal shared tool system: + +- `ToolHandler` / `Tool` trait shape. +- Generic tool normalization before `call_inference()`. +- Request-scoped tool registry. +- Client-owned vs gateway-owned dispatch. +- Requires-action / client-action loop decision. +- Live `execution_loop` orchestration and streaming tool events. + +The helper types in this PR are temporary. They express Codex requirements, but the canonical versions should come +from #67. After #67 lands, this slice should plug into or be refactored onto those abstractions. + +--- + +## Compatibility Rules + +The gateway should not detect requests by user agent, route, or "is this Codex?" heuristics. Compatibility is +driven by Responses tool shapes and execution semantics, so it can be always on. + +| Shape | Behavior | +|-------|----------| +| `function` | Client-owned by default. Preserve declaration and return matching calls to the client unless configured as gateway-owned. | +| `namespace` | Model-facing grouping for function tools. Do not treat namespace as a separate executable call type. | +| `tool_search` | Client-owned only when `execution == "client"`. Hosted/non-client search is provider-owned. | +| `custom` | Client-owned by default. Preserve free-form / grammar metadata. | +| Unknown tool | Preserve as raw JSON. Never execute by default. | + +For response items: + +| Response item | Behavior | +|---------------|----------| +| `function_call` | Preserve optional `namespace`. | +| `tool_search_call` with `execution == "client"` | Return to the client for local deferred discovery. | +| Hosted / non-client `tool_search_call` | Do not execute locally. Leave to provider-specific handling. | +| `custom_tool_call` | Preserve free-form `input`; do not coerce into JSON function arguments. | +| Unknown output item | Preserve as raw JSON. Never execute by default. | + +--- + +## Requirements For #67 + +The generic framework should preserve enough metadata for Codex-compatible behavior: + +- raw original tool JSON +- model-visible tool name +- original client-visible identity +- optional namespace or an equivalent unambiguous key +- execution owner: `Client`, `Gateway`, or provider-owned +- raw hints such as `execution`, `format`, and `defer_loading` + +If namespaced tools need disambiguation, a split identity is useful: + +```rust +pub struct ToolName { + pub namespace: Option, + pub name: String, +} +``` + +This avoids collisions such as two different namespaces both defining a tool named `run`. + +--- + +## Continuation + +Codex-owned tool calls must survive response-store continuation. + +Expected rehydration shape: + +```text +prior context + assistant tool call + Codex tool output + new input +``` + +On a turn that returns client-owned tool calls, storage should keep the assistant call item. On the next turn, Codex +submits the matching tool output item, and `previous_response_id` should rebuild the full sequence. + +--- + +## Model Aliases + +Model aliases route Codex-facing model names to local vLLM models: + +```toml +[model_aliases] +codex-compatible = "qwen3-coder" +``` + +Alias resolution is only model routing. It must not imply approval, auto-review, or human-confirmation behavior. + +--- + +## Test Plan + +Current PR tests should cover: + +- `function`, `namespace`, `tool_search`, `custom`, and unknown tools round-trip. +- Extra fields remain preserved. +- `function_call.namespace` round-trips. +- `tool_search_call` and `custom_tool_call` remain raw-compatible. +- Unknown input/output items remain raw JSON. +- `previous_response_id` rehydrates assistant tool calls before tool outputs. +- Model aliases resolve on executor and proxy paths. + +Post-#67 tests should prove the same behavior through the formal tool framework. + +--- + +## Open Questions + +1. What exact requires-action payload type should #67 expose? +2. Should #67 use split `ToolName { namespace, name }` or a different unambiguous registry key? +3. Which Codex-used fields should become typed framework fields, and which should remain raw metadata?