
Z-M-Huang edited this page May 1, 2026 · 7 revisions

OpenAI-Compatible

contractVersion: 1.1.0

Verified against: @ai-sdk/openai@3.0.55 (verification date 2026-05-01). Re-verified on every dependency bump to this package.

The bundled OpenAI-compatible provider is the reference extension for any backend that speaks an OpenAI-schema API. It supports OpenAI direct, self-hosted gateways, corporate proxies, and local runtimes through one baseURL configuration value.

protocol: openai-compatible


Configuration

| Field | Required | Type | Meaning |
|---|---|---|---|
| protocol | yes | "openai-compatible" | Selects this adapter. |
| apiKeyRef | yes | reference ({ kind: "env" \| "keyring", name }) | Credential reference. Resolved at request time; never persisted in the manifest. |
| baseURL | yes | absolute http/https URL | Forwarded without mutation. |
| models | yes | string[] | The model names this entry serves. Each is selectable via /model while this provider is active. |
| apiShape | no | "chat-completions" (default) \| "responses" | Selects the wire contract. |
| timeoutMs | no | integer ms | Request timeout budget. |
| defaultParams | no | object | Provider-specific request defaults. |

Required fields:

  • protocol is the literal string "openai-compatible". Core uses this — not the settings.json.providers map key — to look up the adapter.
  • apiKeyRef stores only a credential reference. The provider resolves it at request time.
  • baseURL is an absolute http or https URL and is forwarded without mutation.
  • models lists every model name the backend at baseURL serves. The provider entry exposes the union of these via /model; the active model defaults to models[0] unless active.model overrides.

Optional fields:

  • apiShape selects the wire contract. It defaults to chat-completions.
  • timeoutMs sets the request timeout budget.
  • defaultParams supplies provider-specific request defaults.

Example — the entry lives at settings.json.providers.<id>. The user picks the id (typically the backend's name):

{
  "providers": {
    "openai-prod": {
      "protocol": "openai-compatible",
      "apiKeyRef": { "kind": "env", "name": "OPENAI_API_KEY" },
      "baseURL": "https://api.openai.com/v1",
      "models": ["gpt-4o", "gpt-4o-mini"],
      "apiShape": "chat-completions"
    }
  },
  "active": { "provider": "openai-prod", "model": "gpt-4o" }
}
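Under the assumption that manifests are validated in TypeScript, the field table above can be sketched as a type plus a minimal structural check. The type and function names are illustrative, not part of the contract:

```typescript
// Hypothetical type for one settings.json.providers entry, mirroring the
// Configuration table above. Only the field names come from this page.
type ApiKeyRef = { kind: "env" | "keyring"; name: string };

interface OpenAICompatibleEntry {
  protocol: "openai-compatible";               // literal; selects this adapter
  apiKeyRef: ApiKeyRef;                        // resolved at request time, never persisted
  baseURL: string;                             // absolute http/https URL, forwarded without mutation
  models: string[];                            // selectable via /model; models[0] is the default
  apiShape?: "chat-completions" | "responses"; // defaults to "chat-completions"
  timeoutMs?: number;                          // request timeout budget in ms
  defaultParams?: Record<string, unknown>;     // provider-specific request defaults
}

// Minimal structural check over the required fields.
function isOpenAICompatibleEntry(v: any): v is OpenAICompatibleEntry {
  return (
    v?.protocol === "openai-compatible" &&
    (v?.apiKeyRef?.kind === "env" || v?.apiKeyRef?.kind === "keyring") &&
    typeof v?.apiKeyRef?.name === "string" &&
    typeof v?.baseURL === "string" &&
    /^https?:\/\//.test(v.baseURL) &&
    Array.isArray(v?.models) && v.models.length > 0
  );
}
```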

Multiple openai-compatible backends

Two entries may share protocol: "openai-compatible" and remain independently selectable. Use this when you want one config to expose, say, an OpenAI cloud endpoint and a local OpenAI-compatible runtime:

{
  "providers": {
    "openai-prod": {
      "protocol": "openai-compatible",
      "apiKeyRef": { "kind": "env", "name": "OPENAI_API_KEY" },
      "baseURL": "https://api.openai.com/v1",
      "models": ["gpt-4o", "gpt-4o-mini"]
    },
    "bailian": {
      "protocol": "openai-compatible",
      "apiKeyRef": { "kind": "keyring", "name": "bailian-api-key" },
      "baseURL": "http://192.168.1.253:8317/v1",
      "models": ["qwen3.6-plus", "glm-5"]
    }
  },
  "active": { "provider": "bailian", "model": "qwen3.6-plus" }
}

Each entry registers (providerId, modelId) pairs in the model registry. /provider bailian and /provider openai-prod swap the active provider; /model qwen3.6-plus and /model gpt-4o swap within the active provider's models[]. See Cardinality and Activation — Providers are loaded unlimited and active unlimited.
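The (providerId, modelId) registration described above can be sketched as a pure expansion over the providers map. The shape is hypothetical; this page does not specify the real registry API:

```typescript
// Illustrative: expand a providers map into the (providerId, modelId) pairs
// the model registry holds. /provider swaps the first element; /model swaps
// the second within the active provider's models[].
type ProviderMap = Record<string, { models: string[] }>;

function registryPairs(providers: ProviderMap): Array<[string, string]> {
  const pairs: Array<[string, string]> = [];
  for (const [providerId, entry] of Object.entries(providers)) {
    for (const modelId of entry.models) pairs.push([providerId, modelId]);
  }
  return pairs;
}
```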


API shapes

| apiShape | Endpoint family | Use when |
|---|---|---|
| chat-completions | Chat Completions compatible stream shape | The backend exposes /chat/completions-style chunks. |
| responses | Responses compatible stream shape | The backend exposes the newer Responses stream shape. |

Both shapes normalize through the shared protocol adapter and emit the same internal StreamEvent union. See Protocol Adapters for the wire-shape mapping rows.


OpenAI-compatible wire-shape specialization

A single baseURL configuration value covers any backend that speaks one of the two OpenAI schemas. The apiShape toggle selects which wire contract the adapter assembles and parses.

chat-completions shape

The Chat Completions stream emits choices[].delta chunks. The adapter handles three delta classes:

| Delta class | Adapter behavior |
|---|---|
| choices[].delta.content | Emit text-delta. |
| choices[].delta.tool_calls[] | Buffer tool-call-delta keyed by index; emit assembled tool-call once the name is complete and arguments parses as JSON. |
| choices[].finish_reason | Emit exactly one finish event with the normalized reason. |

The Chat Completions shape is the historical default and the most widely supported across OSS proxies, corporate gateways, and local runtimes.
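The tool_calls[] buffering rule can be sketched as follows. The class and event shapes are illustrative, and "name is complete" is approximated here by a successful JSON parse of the accumulated arguments:

```typescript
// Sketch: buffer tool-call deltas keyed by index; emit the assembled call
// once the accumulated arguments string parses as JSON.
type ToolCallDelta = {
  index: number;
  id?: string;
  function?: { name?: string; arguments?: string };
};
type ToolCallEvent = { type: "tool-call"; id: string; name: string; args: unknown };

class ToolCallBuffer {
  private buf = new Map<number, { id: string; name: string; args: string }>();

  push(d: ToolCallDelta): ToolCallEvent | null {
    const slot = this.buf.get(d.index) ?? { id: "", name: "", args: "" };
    if (d.id) slot.id = d.id;
    if (d.function?.name) slot.name += d.function.name;
    if (d.function?.arguments) slot.args += d.function.arguments;
    this.buf.set(d.index, slot);
    if (slot.name && slot.args) {
      try {
        const args = JSON.parse(slot.args); // emit only once arguments parse
        this.buf.delete(d.index);
        return { type: "tool-call", id: slot.id, name: slot.name, args };
      } catch {
        // Arguments still streaming; keep buffering.
      }
    }
    return null;
  }
}
```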

responses shape

The Responses stream emits typed events on a single event channel. The adapter handles four event classes:

| Event class | Adapter behavior |
|---|---|
| response.output_text.delta | Emit text-delta. |
| response.function_call_arguments.delta | Buffer tool-call-delta; emit assembled tool-call once arguments parse. |
| response.reasoning_text.delta | When passReasoningToLoop: true, emit reasoning; otherwise drop. |
| response.completed | Emit finish with the normalized reason. |

The Responses shape is the newer schema; backends that opt in expose richer event taxonomy (reasoning, structured output, citations) on the same channel.
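The event dispatch above can be sketched as a switch over the typed event channel. Event type strings come from the table; the handler and event shapes are illustrative, and function-call argument deltas are assumed to go through the same buffering as in the chat-completions shape, so they are elided here:

```typescript
// Sketch of the responses-shape dispatch. Returns null for events that are
// buffered elsewhere or dropped.
type StreamEvent =
  | { type: "text-delta"; text: string }
  | { type: "reasoning"; text: string }
  | { type: "finish"; reason: string };

function dispatchResponsesEvent(
  ev: { type: string; delta?: string; response?: { status?: string } },
  opts: { passReasoningToLoop: boolean },
): StreamEvent | null {
  switch (ev.type) {
    case "response.output_text.delta":
      return { type: "text-delta", text: ev.delta ?? "" };
    case "response.reasoning_text.delta":
      // Forward reasoning only when the loop opted in; otherwise drop.
      return opts.passReasoningToLoop ? { type: "reasoning", text: ev.delta ?? "" } : null;
    case "response.completed":
      return { type: "finish", reason: ev.response?.status ?? "stop" };
    default:
      return null; // e.g. function_call_arguments deltas, buffered elsewhere
  }
}
```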

baseURL routing

baseURL is forwarded without mutation to the underlying ai-sdk client. The adapter does not normalize trailing slashes, append paths, or rewrite the URL. The configuration is the URL the backend serves; if the backend wants /v1/chat/completions, the user provides a baseURL that resolves correctly under the chosen apiShape.

Capability posture

The OpenAI-compatible reference provider declares streaming and toolCalling as hard; structuredOutput as preferred; everything else (multimodal, reasoning, contextWindow, promptCaching) as probed. See Model Capabilities for the seven-vector definition.

Per-vendor notes: the probed defaults exist because OpenAI-compatible backends vary widely. Capability Negotiation verifies the actual posture at session start before any dependent flow relies on it.


Native fields {#native-fields}

The adapter accepts the AI SDK's providerOptions.openai shape verbatim per @ai-sdk/openai@3.0.55. The native field set differs slightly between Responses and Chat Completions API shapes; both surfaces are covered below. Wire snake_case (e.g., reasoning_effort, service_tier, safety_identifier) is not accepted in defaultParams; only the AI SDK camelCase form is canonical. Wire names appear here as translation notes only.
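A minimal sketch of the snake_case rejection rule, assuming defaultParams arrives as a plain object (the function name is illustrative):

```typescript
// Wire snake_case is a translation note only; any underscore-bearing key in
// defaultParams is flagged so the caller can reject it with a diagnostic.
function rejectSnakeCaseKeys(params: Record<string, unknown>): string[] {
  return Object.keys(params).filter((k) => k.includes("_"));
}
```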

Responses API (apiShape: "responses")

| Field | Type / Values | Notes |
|---|---|---|
| reasoningEffort | "none" \| "minimal" \| "low" \| "medium" \| "high" \| "xhigh" | Default "medium". Per-model gating per OpenAI's API reference: o-series on Chat Completions accepts minimal \| low \| medium \| high; GPT-5.1 has no minimal; GPT-5.1-Codex-Max accepts none \| medium \| high \| xhigh; xhigh applies to models after gpt-5.1-codex-max. Wire name: reasoning_effort. |
| reasoningSummary | "auto" \| "detailed" | Default undefined. When set, reasoning summaries appear in the stream as reasoning events and in non-streaming responses under the reasoning field. |
| forceReasoning | bool | Forces a reasoning pass even on models where it would otherwise be skipped. |
| textVerbosity | "low" \| "medium" \| "high" | Default "medium". |
| serviceTier | "auto" \| "flex" \| "priority" \| "default" | Default "auto". flex available on o3 / o4-mini / gpt-5; priority is Enterprise-gated. Wire name: service_tier. |
| safetyIdentifier | string | User-provided safety/abuse identifier. New in latest stable. Wire name: safety_identifier. |
| systemMessageMode | "system" \| "developer" \| "remove" | Renders the assembled system layer as a system message, a developer message, or omits it entirely. When the assembled system layer is load-bearing (SM-stage bodies or system-message Context Provider contributions), "remove" is rejected; when allowed, it emits a loud diagnostic. See Provider Params § Reserved options. |
| parallelToolCalls | bool | Default true. |
| store | bool | Default true. |
| maxToolCalls | integer | Cap on built-in tool-call invocations per response. |
| metadata | Record<string, string> | Free metadata stored with the generation. |
| conversation | string | OpenAI Conversation id to continue. Mutually exclusive with previousResponseId. |
| previousResponseId | string | Continuation handle for the prior response. |
| user | string | End-user identifier (legacy alias of safetyIdentifier). |
| logprobs | bool \| number | Return token logprobs (or top-N where N is the number). |
| truncation | "auto" \| "disabled" | Truncation strategy when input exceeds the context window. Default "disabled" (request fails on overflow). |
| strictJsonSchema | bool | Default true. Strict structured-output schema enforcement. |
| include | string[] | Additional content to include in the response (["file_search_call.results"], ["message.output_text.logprobs"]). |
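For illustration, a defaultParams fragment for the responses shape using fields from the table above might look like this. The values are arbitrary examples, not recommendations; see Provider Params for the merge semantics:

```json
{
  "defaultParams": {
    "reasoningEffort": "high",
    "reasoningSummary": "auto",
    "serviceTier": "auto",
    "parallelToolCalls": true,
    "truncation": "auto"
  }
}
```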

Chat Completions API (apiShape: "chat-completions", default)

The Chat Completions surface accepts most of the Responses fields above (where applicable) plus provider-specific sampling controls that live here in the adapter-native bucket (not the common bucket):

| Field | Type / Values | Notes |
|---|---|---|
| presencePenalty | number | Sampling penalty. |
| frequencyPenalty | number | Sampling penalty. |
| logitBias | Record<string, number> | Token-id biases. |
| maxCompletionTokens | integer | Cap on completion tokens (Chat-Completions-specific; the common bucket's maxOutputTokens maps here for o-series and standard models). |

reasoningEffort on Chat Completions accepts minimal | low | medium | high only — narrower than Responses.

Reserved (adapter-managed; defaultParams MAY NOT set)

The full defaultParams shape (zone split, validation, merge layers) is pinned by Provider Params. The reservations below are the OpenAI-specific carve-outs:

  • promptCacheKey, promptCacheRetention — cache identity is owned by Prompt Caching. The adapter forwards prompt_cache_key derived from the session id by default. The AI SDK exposes both as provider options; the wiki spec carves them out so cache identity routing remains an adapter concern in v1. promptCacheRetention: 'in_memory' | '24h' ('24h' on 5.1 series only) becomes user-configurable in a future contract revision.
  • instructions — Reserved. Setting instructions would create a second system-prompt surface that bypasses Context Assembly. The single assembled system layer is the canonical source. Future contract revision may define a constrained interaction.
  • prediction — Speculative-decoding hint. Reserved in v1 pending a dedicated decoding-hints contract.
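The carve-outs above can be sketched as a reserved-key check run over defaultParams before merging (the constant and function names are illustrative):

```typescript
// Reserved, adapter-managed keys: naming any of them in defaultParams is a
// validation error per the carve-outs documented above.
const RESERVED_OPENAI_KEYS = [
  "promptCacheKey",
  "promptCacheRetention",
  "instructions",
  "prediction",
] as const;

function findReservedKeys(params: Record<string, unknown>): string[] {
  return RESERVED_OPENAI_KEYS.filter((k) => k in params);
}
```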

System-prompt rendering and prefix caching

The adapter renders the Protocol Adapters system argument differently per apiShape:

| apiShape | Wire rendering of system |
|---|---|
| chat-completions | A synthetic { role: "system", content } message is prepended to the messages array. |
| responses | The top-level instructions field carries the merged system content. |

Prefix caching on supported backends is automatic — there is no in-band cache marker to place. The adapter's job is to keep the static prefix byte-stable across turns: canonical JSON ordering, deterministic field order, no incidental whitespace drift. A stable prefix is the precondition for the backend's automatic prefix cache to hit.
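Byte-stable serialization of the static prefix can be sketched with a recursive key-sorting serializer. This is illustrative, not the adapter's actual serializer:

```typescript
// Canonical JSON: sort object keys recursively and emit no incidental
// whitespace, so the same logical prefix serializes to the same bytes on
// every turn -- the precondition for automatic prefix-cache hits.
function canonicalJson(value: unknown): string {
  if (Array.isArray(value)) {
    return "[" + value.map(canonicalJson).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    const keys = Object.keys(value as object).sort();
    return (
      "{" +
      keys.map((k) => JSON.stringify(k) + ":" + canonicalJson((value as any)[k])).join(",") +
      "}"
    );
  }
  return JSON.stringify(value);
}
```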

Optional knobs the adapter forwards when the deployment exposes them:

  • prompt_cache_key — a routing identifier the backend uses to keep cache entries attached to the same logical session across replicas. The adapter derives this from the session id by default; deployments that pin the cache to a different identity can override.
  • prompt_cache_retention — a retention hint where supported.

Cache observability lands on usage.prompt_tokens_details.cached_tokens in the stream's terminal finish usage payload, normalized into the shared usage bag described in Protocol Adapters.
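That normalization can be sketched as follows, assuming the wire usage payload carries the field names shown above (the UsageBag shape is illustrative; the real shared bag is defined in Protocol Adapters):

```typescript
// Normalize the terminal usage payload into a shared usage bag, lifting the
// cached-token count out of prompt_tokens_details.
interface UsageBag {
  inputTokens: number;
  outputTokens: number;
  cachedInputTokens: number;
}

function normalizeUsage(wire: {
  prompt_tokens?: number;
  completion_tokens?: number;
  prompt_tokens_details?: { cached_tokens?: number };
}): UsageBag {
  return {
    inputTokens: wire.prompt_tokens ?? 0,
    outputTokens: wire.completion_tokens ?? 0,
    cachedInputTokens: wire.prompt_tokens_details?.cached_tokens ?? 0,
  };
}
```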

See Prompt Caching for the cross-provider strategy.


Auth and secrets hygiene

The provider never stores plaintext keys in manifests and never resolves apiKeyRef during construction. Request execution resolves the reference through the host secrets surface when present, falling back to host.env.get for environment references.

401 responses and credential-resolution failures surface as ProviderTransient / Unauthorized without including the resolved credential in errors or logs.
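The resolution order can be sketched as below, assuming a host surface with an optional secrets getter and the host.env.get fallback named above. The Host interface and function name are illustrative:

```typescript
// Request-time credential resolution: keyring references go through the host
// secrets surface when present; env references fall back to host.env.get.
// The error message never contains a resolved credential.
interface Host {
  secrets?: { get(name: string): Promise<string | undefined> };
  env: { get(name: string): string | undefined };
}

async function resolveApiKeyRef(
  host: Host,
  ref: { kind: "env" | "keyring"; name: string },
): Promise<string> {
  const value =
    ref.kind === "keyring" ? await host.secrets?.get(ref.name) : host.env.get(ref.name);
  if (!value) {
    throw new Error(`credential reference ${ref.kind}:${ref.name} did not resolve`);
  }
  return value;
}
```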


Error translation

| Wire condition | Core class + code |
|---|---|
| 429 / rate limit | ProviderTransient / RateLimited |
| 5xx | ProviderTransient / Provider5xx |
| 401 | ProviderTransient / Unauthorized |
| Network timeout | ProviderTransient / NetworkTimeout |
| Tool capability mismatch | ProviderCapability / MissingToolCalling |

The adapter honors signal and has no filesystem side effects.
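The translation table can be sketched as a mapping function. The condition and result shapes are illustrative; only the class and code strings come from the table:

```typescript
// Map a wire condition to the core error taxonomy. Returns null when the
// condition is not one the adapter translates.
type CoreError = { class: "ProviderTransient" | "ProviderCapability"; code: string };

function translateWireError(cond: {
  status?: number;
  timedOut?: boolean;
  missingToolCalling?: boolean;
}): CoreError | null {
  if (cond.missingToolCalling) return { class: "ProviderCapability", code: "MissingToolCalling" };
  if (cond.timedOut) return { class: "ProviderTransient", code: "NetworkTimeout" };
  if (cond.status === 429) return { class: "ProviderTransient", code: "RateLimited" };
  if (cond.status === 401) return { class: "ProviderTransient", code: "Unauthorized" };
  if (cond.status !== undefined && cond.status >= 500) {
    return { class: "ProviderTransient", code: "Provider5xx" };
  }
  return null;
}
```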


Related pages


Changelog

1.0.0 — initial

  • OpenAI-compatible reference provider: apiShape defaults to chat-completions; standard provider error taxonomy; apiKeyRef resolved at request time.
  • OpenAI-compatible wire-shape specialization on top of ai-sdk v6: both chat-completions and responses stream shapes, selected by the apiShape toggle. baseURL routing semantics: the URL is forwarded without mutation.
  • System-prompt rendering: chat-completions prepends a synthetic system-role message; responses uses top-level instructions. Automatic prefix caching benefits from byte-stable prefix serialization; optional prompt_cache_key / prompt_cache_retention forwarded where supported. Cache hits reported via usage.prompt_tokens_details.cached_tokens. See Prompt Caching.

1.1.0 — defaultParams shape; native fields enumerated; reserved keys

  • New "Native fields" section enumerates the AI SDK 3.0.55 providerOptions.openai surface verbatim for both API shapes. Responses includes new safetyIdentifier, systemMessageMode, forceReasoning fields; reasoningEffort carries the full none | minimal | low | medium | high | xhigh enum with per-model gating per OpenAI's API reference (GPT-5.1 has no minimal; GPT-5.1-Codex-Max accepts none | medium | high | xhigh; xhigh for "models after gpt-5.1-codex-max"). Chat Completions section calls out adapter-native sampling fields (presencePenalty, frequencyPenalty, logitBias, maxCompletionTokens).
  • Reserved (adapter-managed) subsection: promptCacheKey and promptCacheRetention (cache identity owned by Prompt Caching; rationale documented); instructions (would create a second system-prompt surface bypassing Context Assembly); prediction (speculative-decoding hint deferred to future contract).
  • systemMessageMode: "remove" documented as rejected when the assembled system layer is load-bearing (SM stage body, system-message Context Provider contributions). Loud diagnostic when allowed; mutually exclusive with the assembled system layer in the load-bearing case.
  • Wire snake_case (reasoning_effort, service_tier, safety_identifier, etc.) documented as translation notes only; canonical accepted shape in defaultParams is the AI SDK camelCase form.
  • No removal of pre-existing prose; all changes are additive on top of 1.0.0.
