feat(server): add /v1/responses (OpenAI Responses API) for Codex CLI #91

audreyt wants to merge 1 commit into
Conversation
Implements the Responses API endpoint that Codex CLI (and other modern OpenAI tooling) speaks instead of `/v1/chat/completions`. The wire format is documented in OpenAI's Responses API; this implementation has been iterated against the Codex CLI binary's SSE parser shape until no remaining schema gaps were found.

Request parsing (`parse_responses_request`, `parse_responses_input`):

- Accepts the typed input array (`message`, `function_call`, `function_call_output`, `reasoning`, `custom_tool_call(_output)`, `local_shell_call(_output)`, `web_search_call(_output)`, `tool_search_call(_output)`, `image_generation_call(_output)`, `compaction`, `context_compaction`).
- Maps hosted-tool history to `function_call`/`function_call_output` so prior actions survive across turns; rejects unknown item types and non-completed status with 400 to avoid silent context loss.
- Strict content-array parsing: only string|null|array of recognized text blocks (`input_text`/`output_text`/`text`/`summary_text`/`reasoning_text`); rejects non-text modalities (`input_image`/`file`/`audio`) instead of accepting an empty prompt.
- Merges adjacent `function_call` items into the preceding assistant message so text + tool-call turns render as a single assistant block.
- Honors `reasoning.effort` (incl. `"minimal"`/`"none"`) and gates the reasoning summary surface on `reasoning.summary` opt-in.
- Rejects `previous_response_id`, `conversation`, and forced `tool_choice` explicitly (constrained decoding / persisted state not supported).

Output (`responses_sse_*`, `responses_final_response`):

- Emits the full streaming lifecycle: `response.created`, `output_item.added`/`.done`, `reasoning_summary_part.added`/`.done`, `reasoning_summary_text.delta`/`.done`, `content_part.added`/`.done`, `output_text.delta`/`.done`, `function_call_arguments.delta`/`.done`, `response.completed`.
- Branches the terminal event by finish reason: `response.failed` for errors and `response.incomplete` with reason `"max_tokens"` for length.
- Every event carries `sequence_number`; every `output_text` part carries `annotations: []`; `function_call` `output_item.added` ships with an empty arguments string (full args arrive via `function_call_arguments.done` and `output_item.done`), and item ids are stable across added/done.
- Tracks whether `</think>` was actually observed, so a truncated stream marks the reasoning item incomplete instead of "completed".
- Recovers gracefully when the DSML tool parse fails after the model was suppressed at the tool marker: the suppressed tail is flushed as additional `output_text` deltas so the streamed message matches `output_item.done`.

Tested by 25 rounds of `/codex:adversarial-review` against the same client this is meant to feed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
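The streaming lifecycle above can be sketched as a minimal SSE emitter. This is a hedged illustration, not the PR's actual code: `sse` and `stream_minimal_response` are hypothetical names, and only the happy path for a single text item is shown.

```python
import json

def sse(event: str, seq: int, data: dict) -> str:
    """Format one Server-Sent Event; every event carries sequence_number."""
    return f"event: {event}\ndata: {json.dumps(dict(data, sequence_number=seq))}\n\n"

def stream_minimal_response(text: str, item_id: str = "msg_0") -> list[str]:
    """Emit the happy-path event order for one output_text item.

    item_id stays stable across added/done, and every output_text
    part carries an explicit empty annotations array.
    """
    lifecycle = [
        ("response.created", {"response": {"status": "in_progress"}}),
        ("response.output_item.added",
         {"item": {"id": item_id, "type": "message", "status": "in_progress"}}),
        ("response.content_part.added",
         {"item_id": item_id,
          "part": {"type": "output_text", "text": "", "annotations": []}}),
        ("response.output_text.delta", {"item_id": item_id, "delta": text}),
        ("response.output_text.done", {"item_id": item_id, "text": text}),
        ("response.content_part.done",
         {"item_id": item_id,
          "part": {"type": "output_text", "text": text, "annotations": []}}),
        ("response.output_item.done",
         {"item": {"id": item_id, "type": "message", "status": "completed"}}),
        # On error the terminal event would be response.failed instead;
        # on a length stop, response.incomplete with reason "max_tokens".
        ("response.completed", {"response": {"status": "completed"}}),
    ]
    return [sse(name, seq, data) for seq, (name, data) in enumerate(lifecycle)]
```

Each returned string is one `text/event-stream` frame; a real implementation would interleave many deltas and the reasoning/function-call event families listed above.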
Just came here to feature-request this because I wanted to try Codex CLI. With this patch applied I think we should be able to add this to the config:

```toml
[model_providers.ds4]
name = "DS4"
base_url = "http://127.0.0.1:8000/v1"
wire_api = "responses"
stream_idle_timeout_ms = 1000000
```

And then run:
FYI: There is issue #22, which also contains a branch with a Responses API implementation (but no PR).
I think it's a bit unfortunate if this ends up with many different protocol versions in the core server, because it will be tricky to support them all at the same level of quality without blowing up the complexity. I wonder whether it would not be better to have a separate proxy that translates to one common protocol.
@mitsuhiko I understand the complexity argument, but I have the feeling the API side is the less fragile part of all this, provided we have good tests in place. The tool calling, which is the nightmare part, should in theory be shared across the paths. I'll investigate this PR soon and report back.
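As an illustration of the kind of wire-level test that keeps the API side stable (a hypothetical sketch, not from the PR), one can assert that a minimal typed-input request body round-trips through JSON. Field names follow the typed input items described in the PR (message items with `input_text` content blocks); the model name is a placeholder.

```python
import json

# Minimal typed-input request for /v1/responses, per the shapes the
# parser accepts: a single user message with one input_text block.
payload = {
    "model": "ds4",  # placeholder; match your server's configured model
    "stream": True,
    "input": [
        {
            "type": "message",
            "role": "user",
            "content": [{"type": "input_text", "text": "Hello"}],
        }
    ],
}

body = json.dumps(payload)
decoded = json.loads(body)
assert decoded["input"][0]["type"] == "message"
assert decoded["input"][0]["content"][0]["type"] == "input_text"
```

A real test suite would feed bodies like this through the request parser and also assert the 400 rejections (unknown item types, non-text modalities, `previous_response_id`, forced `tool_choice`).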
Handling this, news soon.
You can find the candidate for merging here: responses-api. It implements continuation from the current KV memory checkpoint by matching IDs, without the need for prefix matching.