# OpenAI Compatible

`contractVersion: 1.1.0`

Verified against: `@ai-sdk/openai@3.0.55` (verification date 2026-05-01). Re-verified on every dependency bump to this package.
The bundled OpenAI-compatible provider is the reference extension for any backend that speaks an OpenAI-schema API. It supports OpenAI direct, self-hosted gateways, corporate proxies, and local runtimes through one baseURL configuration value.
`protocol: openai-compatible`
| Field | Required | Type | Meaning |
|---|---|---|---|
| `protocol` | yes | `"openai-compatible"` | Selects this adapter. |
| `apiKeyRef` | yes | reference (`{ kind: "env" \| "keyring", name }`) | Credential reference. Resolved at request time; never persisted in the manifest. |
| `baseURL` | yes | absolute http/https URL | Forwarded without mutation. |
| `models` | yes | `string[]` | The model names this entry serves. Each is selectable via `/model` while this provider is active. |
| `apiShape` | no | `"chat-completions"` (default) \| `"responses"` | Selects the wire contract. |
| `timeoutMs` | no | integer ms | Request timeout budget. |
| `defaultParams` | no | object | Provider-specific request defaults. |
Required fields:

- `protocol` is the literal string `"openai-compatible"`. Core uses this, not the `settings.json.providers` map key, to look up the adapter.
- `apiKeyRef` stores only a credential reference. The provider resolves it at request time.
- `baseURL` is an absolute `http` or `https` URL and is forwarded without mutation.
- `models` lists every model name the backend at `baseURL` serves. The provider entry exposes the union of these via `/model`; the active model defaults to `models[0]` unless `active.model` overrides it.

Optional fields:

- `apiShape` selects the wire contract. It defaults to `chat-completions`.
- `timeoutMs` sets the request timeout budget.
- `defaultParams` supplies provider-specific request defaults.
Example: the entry lives at `settings.json.providers.<id>`. The user picks the id (typically the backend's name):
```json
{
  "providers": {
    "openai-prod": {
      "protocol": "openai-compatible",
      "apiKeyRef": { "kind": "env", "name": "OPENAI_API_KEY" },
      "baseURL": "https://api.openai.com/v1",
      "models": ["gpt-4o", "gpt-4o-mini"],
      "apiShape": "chat-completions"
    }
  },
  "active": { "provider": "openai-prod", "model": "gpt-4o" }
}
```

Two entries may share `protocol: "openai-compatible"` and remain independently selectable. Use this when you want one config to expose, say, an OpenAI cloud endpoint and a local OpenAI-compatible runtime:
```json
{
  "providers": {
    "openai-prod": {
      "protocol": "openai-compatible",
      "apiKeyRef": { "kind": "env", "name": "OPENAI_API_KEY" },
      "baseURL": "https://api.openai.com/v1",
      "models": ["gpt-4o", "gpt-4o-mini"]
    },
    "bailian": {
      "protocol": "openai-compatible",
      "apiKeyRef": { "kind": "keyring", "name": "bailian-api-key" },
      "baseURL": "http://192.168.1.253:8317/v1",
      "models": ["qwen3.6-plus", "glm-5"]
    }
  },
  "active": { "provider": "bailian", "model": "qwen3.6-plus" }
}
```

Each entry registers `(providerId, modelId)` pairs in the model registry. `/provider bailian` and `/provider openai-prod` swap the active provider; `/model qwen3.6-plus` and `/model gpt-4o` swap within the active provider's `models[]`. See Cardinality and Activation (Providers: loaded unlimited, active unlimited).
| `apiShape` | Endpoint family | Use when |
|---|---|---|
| `chat-completions` | Chat Completions compatible stream shape | The backend exposes `/chat/completions`-style chunks. |
| `responses` | Responses compatible stream shape | The backend exposes the newer Responses stream shape. |
Both shapes normalize through the shared protocol adapter and emit the same internal StreamEvent union. See Protocol Adapters for the wire-shape mapping rows.
A single baseURL configuration value covers any backend that speaks one of the two OpenAI schemas. The apiShape toggle selects which wire contract the adapter assembles and parses.
The Chat Completions stream emits `choices[].delta` chunks. The adapter handles three delta classes:

| Delta class | Adapter behavior |
|---|---|
| `choices[].delta.content` | Emit `text-delta`. |
| `choices[].delta.tool_calls[]` | Buffer `tool-call-delta` keyed by `index`; emit the assembled `tool-call` once the name is complete and `arguments` parses as JSON. |
| `choices[].finish_reason` | Emit exactly one `finish` event with the normalized reason. |
The Chat Completions shape is the historical default and the most widely supported across OSS proxies, corporate gateways, and local runtimes.
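For concreteness, a hedged sketch of a chunk sequence exercising all three delta classes. Chunks are abridged to the fields the adapter reads (`id`, `model`, and other envelope fields omitted); the tool name, arguments, and text are illustrative only:

```json
{"choices":[{"index":0,"delta":{"content":"Let me check."}}]}
{"choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"id":"call_1","type":"function","function":{"name":"get_weather","arguments":""}}]}}]}
{"choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"{\"city\":\"Berlin\"}"}}]}}]}
{"choices":[{"index":0,"delta":{},"finish_reason":"tool_calls"}]}
```

The first chunk becomes a `text-delta`; the two `tool_calls` fragments are buffered under `index` 0 and surface as one assembled `tool-call` once `get_weather` is complete and the arguments parse; the final chunk yields the single `finish` event.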
The Responses stream emits typed events on a single event channel. The adapter handles four event classes:

| Event class | Adapter behavior |
|---|---|
| `response.output_text.delta` | Emit `text-delta`. |
| `response.function_call_arguments.delta` | Buffer `tool-call-delta`; emit the assembled `tool-call` once `arguments` parse. |
| `response.reasoning_text.delta` | When `passReasoningToLoop: true`, emit `reasoning`; otherwise drop. |
| `response.completed` | Emit `finish` with the normalized reason. |
The Responses shape is the newer schema; backends that opt in expose richer event taxonomy (reasoning, structured output, citations) on the same channel.
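An equally abridged sketch of a Responses event sequence; real events carry additional envelope fields (sequence numbers, item ids, output indexes) that are omitted here, and the payloads are illustrative:

```json
{"type":"response.reasoning_text.delta","delta":"User wants current conditions."}
{"type":"response.output_text.delta","delta":"Let me check."}
{"type":"response.function_call_arguments.delta","delta":"{\"city\":"}
{"type":"response.function_call_arguments.delta","delta":"\"Berlin\"}"}
{"type":"response.completed","response":{"status":"completed"}}
```

The reasoning delta surfaces as a `reasoning` event only when `passReasoningToLoop: true`; otherwise the adapter drops it. The argument fragments buffer until they parse as JSON, then emit one assembled `tool-call`; `response.completed` yields the single `finish`.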
baseURL is forwarded without mutation to the underlying ai-sdk client. The adapter does not normalize trailing slashes, append paths, or rewrite the URL. The configuration is the URL the backend serves; if the backend wants /v1/chat/completions, the user provides a baseURL that resolves correctly under the chosen apiShape.
The OpenAI-compatible reference provider declares streaming and toolCalling as hard; structuredOutput as preferred; everything else (multimodal, reasoning, contextWindow, promptCaching) as probed. See Model Capabilities for the seven-vector definition.
Per-vendor notes: the probed defaults exist because OpenAI-compatible backends vary widely. Capability Negotiation verifies the actual posture at session start before any dependent flow relies on it.
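As a sketch only (the declaration schema itself is pinned by Model Capabilities, so the key names here are assumptions), the posture above maps onto the seven vectors like this:

```json
{
  "streaming": "hard",
  "toolCalling": "hard",
  "structuredOutput": "preferred",
  "multimodal": "probed",
  "reasoning": "probed",
  "contextWindow": "probed",
  "promptCaching": "probed"
}
```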
The adapter accepts the AI SDK's providerOptions.openai shape verbatim per @ai-sdk/openai@3.0.55. The native field set differs slightly between Responses and Chat Completions API shapes; both surfaces are covered below. Wire snake_case (e.g., reasoning_effort, service_tier, safety_identifier) is not accepted in defaultParams; only the AI SDK camelCase form is canonical. Wire names appear here as translation notes only.
| Field | Type / Values | Notes |
|---|---|---|
| `reasoningEffort` | `"none"` \| `"minimal"` \| `"low"` \| `"medium"` \| `"high"` \| `"xhigh"` | Default `"medium"`. Per-model gating per OpenAI's API reference: o-series Chat Completions accepts `minimal` \| `low` \| `medium` \| `high` only; GPT-5.1 has no `minimal`; GPT-5.1-Codex-Max accepts `none` \| `medium` \| `high` \| `xhigh`; `xhigh` for models after gpt-5.1-codex-max. Wire name: `reasoning_effort`. |
| `reasoningSummary` | `"auto"` \| `"detailed"` | Default undefined. When set, reasoning summaries appear in the stream as `reasoning` events and in non-streaming responses under the `reasoning` field. |
| `forceReasoning` | bool | Forces a reasoning pass even on models where it would otherwise be skipped. |
| `textVerbosity` | `"low"` \| `"medium"` \| `"high"` | Default `"medium"`. |
| `serviceTier` | `"auto"` \| `"flex"` \| `"priority"` \| `"default"` | Default `"auto"`. `flex` available on o3 / o4-mini / gpt-5; `priority` is Enterprise-gated. Wire name: `service_tier`. |
| `safetyIdentifier` | string | User-provided safety/abuse identifier. New in the latest stable release. Wire name: `safety_identifier`. |
| `systemMessageMode` | `"system"` \| `"developer"` \| `"remove"` | Renders the assembled system layer as a system message, a developer message, or omits it entirely. The `"remove"` interaction with Context Assembly matters: when the assembled system layer is load-bearing (it carries SM-stage bodies or system-message Context Provider contributions), `systemMessageMode: "remove"` emits a loud diagnostic and is rejected. See Provider Params § Reserved options. |
| `parallelToolCalls` | bool | Default true. |
| `store` | bool | Default true. |
| `maxToolCalls` | integer | Cap on built-in tool-call invocations per response. |
| `metadata` | `Record<string, string>` | Free-form metadata stored with the generation. |
| `conversation` | string | OpenAI Conversation id to continue. Mutually exclusive with `previousResponseId`. |
| `previousResponseId` | string | Continuation handle for the prior response. |
| `user` | string | End-user identifier (legacy alias of `safetyIdentifier`). |
| `logprobs` | bool \| number | Return token logprobs (or top-N where N is the number). |
| `truncation` | `"auto"` \| `"disabled"` | Truncation strategy when input exceeds the context window. Default `"disabled"` (request fails on overflow). |
| `strictJsonSchema` | bool | Default true. Strict structured-output schema enforcement. |
| `include` | `string[]` | Additional content to include in the response (e.g., `["file_search_call.results"]`, `["message.output_text.logprobs"]`). |
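For illustration, a Responses-shape `defaultParams` using the canonical camelCase forms might look like the following; the exact zone split and merge behavior are pinned by Provider Params, and the values are arbitrary:

```json
{
  "defaultParams": {
    "reasoningEffort": "high",
    "reasoningSummary": "auto",
    "serviceTier": "auto",
    "parallelToolCalls": true,
    "truncation": "auto",
    "strictJsonSchema": true
  }
}
```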
The Chat Completions surface accepts most of the Responses fields above (where applicable) plus provider-specific sampling controls that live here in the adapter-native bucket (not the common bucket):
| Field | Type / Values | Notes |
|---|---|---|
| `presencePenalty` | number | Sampling penalty. |
| `frequencyPenalty` | number | Sampling penalty. |
| `logitBias` | `Record<string, number>` | Token-id biases. |
| `maxCompletionTokens` | integer | Cap on completion tokens (Chat-Completions-specific; the common bucket's `maxOutputTokens` maps here for o-series and standard models). |
`reasoningEffort` on Chat Completions accepts `minimal` \| `low` \| `medium` \| `high` only, narrower than Responses.
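A comparable Chat Completions sketch with the adapter-native sampling controls; values are arbitrary and the token id in `logitBias` is purely illustrative:

```json
{
  "defaultParams": {
    "reasoningEffort": "medium",
    "presencePenalty": 0.2,
    "frequencyPenalty": 0.1,
    "logitBias": { "50256": -100 },
    "maxCompletionTokens": 4096
  }
}
```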
The full `defaultParams` shape (zone split, validation, merge layers) is pinned by Provider Params. The reservations below are the OpenAI-specific carve-outs:

- `promptCacheKey`, `promptCacheRetention`: cache identity is owned by Prompt Caching. The adapter forwards `prompt_cache_key` derived from the session id by default. The AI SDK exposes both as provider options; the wiki spec carves them out so cache identity routing remains an adapter concern in v1. `promptCacheRetention: 'in_memory' | '24h'` (`'24h'` on 5.1-series models only) becomes user-configurable in a future contract revision.
- `instructions`: reserved. Setting `instructions` would create a second system-prompt surface that bypasses Context Assembly. The single assembled system layer is the canonical source. A future contract revision may define a constrained interaction.
- `prediction`: speculative-decoding hint. Reserved in v1 pending a dedicated decoding-hints contract.
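To make the `instructions` carve-out concrete: a `defaultParams` like the one below carries a reserved key, so it should be refused during validation (per the Provider Params pipeline) rather than forwarded as a second system-prompt surface; the prompt text is illustrative:

```json
{
  "defaultParams": {
    "instructions": "You are a helpful assistant."
  }
}
```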
The adapter renders the Protocol Adapters `system` argument differently per `apiShape`:

| `apiShape` | Wire rendering of `system` |
|---|---|
| `chat-completions` | A synthetic `{ role: "system", content }` message is prepended to the `messages` array. |
| `responses` | The top-level `instructions` field carries the merged system content. |
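For illustration, with request bodies abridged to the relevant fields, the same assembled system layer lands on the wire as follows. Under `chat-completions` it is prepended as a message:

```json
{
  "model": "gpt-4o",
  "messages": [
    { "role": "system", "content": "<assembled system layer>" },
    { "role": "user", "content": "Hello" }
  ]
}
```

Under `responses` it rides the top-level `instructions` field:

```json
{
  "model": "gpt-4o",
  "instructions": "<assembled system layer>",
  "input": [
    { "role": "user", "content": "Hello" }
  ]
}
```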
Prefix caching on supported backends is automatic — there is no in-band cache marker to place. The adapter's job is to keep the static prefix byte-stable across turns: canonical JSON ordering, deterministic field order, no incidental whitespace drift. A stable prefix is the precondition for the backend's automatic prefix cache to hit.
Optional knobs the adapter forwards when the deployment exposes them:
- `prompt_cache_key`: a routing identifier the backend uses to keep cache entries attached to the same logical session across replicas. The adapter derives this from the session id by default; deployments that pin the cache to a different identity can override it.
- `prompt_cache_retention`: a retention hint where supported.
Cache observability lands on usage.prompt_tokens_details.cached_tokens in the stream's terminal finish usage payload, normalized into the shared usage bag described in Protocol Adapters.
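A representative terminal usage payload on a cache hit (OpenAI wire shape, abridged; numbers illustrative):

```json
{
  "usage": {
    "prompt_tokens": 2048,
    "completion_tokens": 96,
    "total_tokens": 2144,
    "prompt_tokens_details": { "cached_tokens": 1920 }
  }
}
```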
See Prompt Caching for the cross-provider strategy.
The provider never stores plaintext keys in manifests and never resolves apiKeyRef during construction. Request execution resolves the reference through the host secrets surface when present, falling back to host.env.get for environment references.
401 responses and credential-resolution failures surface as ProviderTransient / Unauthorized without including the resolved credential in errors or logs.
| Wire condition | Core class + code |
|---|---|
| 429 / rate limit | `ProviderTransient` / `RateLimited` |
| 5xx | `ProviderTransient` / `Provider5xx` |
| 401 | `ProviderTransient` / `Unauthorized` |
| Network timeout | `ProviderTransient` / `NetworkTimeout` |
| Tool capability mismatch | `ProviderCapability` / `MissingToolCalling` |
The adapter honors `signal` and has no filesystem side effects.
Changelog (1.1.0, additive on top of 1.0.0):

- OpenAI-compatible reference provider: `apiShape` defaults to `chat-completions`; standard provider error taxonomy; `apiKeyRef` resolved at request time.
- OpenAI-compatible wire-shape specialization on top of ai-sdk v6: both `chat-completions` and `responses` stream shapes behind the `apiShape` toggle. `baseURL` URL-routing semantics are forwarded without mutation.
- System-prompt rendering: `chat-completions` prepends a synthetic system-role message; `responses` uses top-level `instructions`. Automatic prefix caching benefits from byte-stable prefix serialization; optional `prompt_cache_key` / `prompt_cache_retention` forwarded where supported. Cache hits reported via `usage.prompt_tokens_details.cached_tokens`. See Prompt Caching.
- New "Native fields" section enumerates the AI SDK 3.0.55 `providerOptions.openai` surface verbatim for both API shapes. Responses includes the new `safetyIdentifier`, `systemMessageMode`, and `forceReasoning` fields; `reasoningEffort` carries the full `none | minimal | low | medium | high | xhigh` enum with per-model gating per OpenAI's API reference (GPT-5.1 has no `minimal`; GPT-5.1-Codex-Max accepts `none | medium | high | xhigh`; `xhigh` for "models after gpt-5.1-codex-max"). The Chat Completions section calls out adapter-native sampling fields (`presencePenalty`, `frequencyPenalty`, `logitBias`, `maxCompletionTokens`).
- Reserved (adapter-managed) subsection: `promptCacheKey` and `promptCacheRetention` (cache identity owned by Prompt Caching; rationale documented); `instructions` (would create a second system-prompt surface bypassing Context Assembly); `prediction` (speculative-decoding hint deferred to a future contract).
- `systemMessageMode: "remove"` documented as rejected when the assembled system layer is load-bearing (SM stage body, `system-message` Context Provider contributions). Loud diagnostic when allowed; mutually exclusive with the assembled system layer in the load-bearing case.
- Wire snake_case (`reasoning_effort`, `service_tier`, `safety_identifier`, etc.) documented as translation notes only; the canonical accepted shape in `defaultParams` is the AI SDK camelCase form.
- No removal of pre-existing prose; all changes are additive on top of 1.0.0.