
LLM Universal Proxy

Chinese Documentation · Documentation

llmup is a single-binary LLM HTTP proxy. Put it between your client and your real model provider, and it gives you one stable local entrypoint even when the client protocol and upstream protocol do not match.

It is most useful when you want to:

  • use non-native models behind Codex CLI
  • route Claude Code or Gemini CLI through one local proxy
  • expose stable local model aliases instead of vendor model IDs

Important

llmup is designed for provider APIs and compatible endpoints. It is not a bridge into vendor first-party app subscriptions or bundled first-party CLI entitlements unless that vendor explicitly documents that kind of third-party access.

LLMUP dashboard

The optional local dashboard helps you inspect routing, streaming, cancellation, upstream state, and hook activity while the proxy is running.

Quick Start

The GA user-entry path is provider-neutral and starts with the CLI wrappers. The recommended config source is examples/quickstart-provider-neutral.yaml, using these stable local aliases:

  • preset-openai-compatible for the OpenAI-compatible lane
  • preset-anthropic-compatible for the Anthropic-compatible lane

MiniMax is only a replaceable OpenAI-compatible example, not a GA-required provider and not the mainline preset name. A concrete OpenAI + MiniMax sample remains in examples/quickstart-openai-minimax.yaml for users who want to replace the preset placeholders with named providers.

The provider-neutral config source is:

listen: 127.0.0.1:8080
upstream_timeout_secs: 120

upstreams:
  PRESET-ANTHROPIC-COMPATIBLE:
    api_root: PRESET_ANTHROPIC_ENDPOINT_BASE_URL
    format: anthropic
    provider_key_env: PRESET_ENDPOINT_API_KEY
    limits:
      context_window: 200000
      max_output_tokens: 128000
    surface_defaults:
      modalities:
        input: ["text"]
        output: ["text"]
      tools:
        supports_search: false
        supports_view_image: false
        apply_patch_transport: freeform
        supports_parallel_calls: false

  PRESET-OPENAI-COMPATIBLE:
    api_root: PRESET_OPENAI_ENDPOINT_BASE_URL
    format: openai-completion
    provider_key_env: PRESET_ENDPOINT_API_KEY
    limits:
      context_window: 200000
      max_output_tokens: 128000
    surface_defaults:
      modalities:
        input: ["text"]
        output: ["text"]
      tools:
        supports_search: false
        supports_view_image: false
        apply_patch_transport: freeform
        supports_parallel_calls: false

model_aliases:
  preset-anthropic-compatible: "PRESET-ANTHROPIC-COMPATIBLE:PRESET_ENDPOINT_MODEL"
  preset-openai-compatible: "PRESET-OPENAI-COMPATIBLE:PRESET_ENDPOINT_MODEL"

Set the preset environment variables before starting a wrapper-managed session:

git clone https://github.com/agentsmith-project/llm-universal-proxy.git
cd llm-universal-proxy
cargo build --locked --release

export PRESET_OPENAI_ENDPOINT_BASE_URL="https://openai-compatible.example/v1"
export PRESET_ANTHROPIC_ENDPOINT_BASE_URL="https://anthropic-compatible.example/v1"
export PRESET_ENDPOINT_MODEL="provider-model-id"
export PRESET_ENDPOINT_API_KEY="provider-api-key"
export LLM_UNIVERSAL_PROXY_AUTH_MODE=proxy_key
export LLM_UNIVERSAL_PROXY_KEY="local-proxy-key"

What those variables do:

  • PRESET_OPENAI_ENDPOINT_BASE_URL: API root for the OpenAI-compatible upstream, including its version segment such as /v1
  • PRESET_ANTHROPIC_ENDPOINT_BASE_URL: API root for the Anthropic-compatible upstream
  • PRESET_ENDPOINT_MODEL: provider model ID hydrated into both preset aliases
  • PRESET_ENDPOINT_API_KEY: env-sourced server-side provider credential used by both preset upstreams
  • LLM_UNIVERSAL_PROXY_AUTH_MODE: compatibility fallback for data-plane auth when static data_auth is omitted; use proxy_key when the proxy holds provider keys
  • LLM_UNIVERSAL_PROXY_KEY: proxy API key for clients in proxy_key mode; also used by the env fallback or by data_auth.proxy_key.env when configured

The PRESET_* values are a wrapper/config-source contract. The wrappers hydrate them into a concrete runtime config before starting the proxy. If you run llm-universal-proxy --config directly, replace the placeholders with concrete URLs and model names first.
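
For example, a direct run might look like this (a minimal sketch: my-config.yaml is a hypothetical concrete config, and the binary path follows from the release build above):

# run the proxy against a concrete config (no PRESET_* placeholders)
./target/release/llm-universal-proxy --config my-config.yaml &

# confirm it is up before pointing clients at it
curl -s http://127.0.0.1:8080/health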

Prefer static data_auth in YAML for data-plane auth; LLM_UNIVERSAL_PROXY_AUTH_MODE and LLM_UNIVERSAL_PROXY_KEY are the environment fallback when data_auth is omitted. In proxy_key mode, each upstream provider credential can come from provider_key.inline, provider_key.env, or legacy provider_key_env; the preset source above uses the legacy env-name form so wrappers can hydrate it from PRESET_ENDPOINT_API_KEY.

Reasoning effort such as xhigh is a client/request-side setting, not part of the model name. Keep the alias stable and set reasoning in the request or client config.
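
For example, on the proxy's OpenAI Responses route, effort travels in the request body while model keeps the stable alias. A hedged sketch: the body follows the standard OpenAI Responses shape, and the Bearer proxy-key header is an assumption about proxy_key mode rather than a documented contract:

# assumption: proxy_key mode accepts the proxy key as a Bearer token
curl -s http://127.0.0.1:8080/openai/v1/responses \
  -H "Authorization: Bearer $LLM_UNIVERSAL_PROXY_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "preset-openai-compatible",
    "input": "Summarize this repo",
    "reasoning": {"effort": "high"}
  }'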

Compatibility Contract

llmup gives clients a stable local protocol surface, not unlimited provider equivalence.

  • same-provider/native passthrough preserves provider-native fields and lifecycle state
  • compatible same-protocol lanes promise portable core/portable fields only; they are not native provider passthrough
  • translated paths target a portable core and may warn or reject non-portable provider-native features
  • native extensions and provider-owned lifecycle state stay on same-provider/native paths unless a documented shim says otherwise
  • Responses reasoning/compaction continuity is mode-bound: default/max_compat may drop an opaque carrier only when visible summary text or visible transcript history remains; strict/balanced fail closed; opaque-only reasoning and opaque-only compaction fail closed; same-provider/native passthrough preserves provider-owned state
  • the quickstart includes conservative text-only surface_defaults; turn on search, image, or parallel-tool flags only when that model surface really supports them (see the sketch after this list)
  • multimodal surface.modalities.input gates media types, not every source transport; HTTP(S) image/PDF URLs are distinct from provider or local URIs such as gs://, s3://, and file://
  • Gemini inlineData can be preserved when translating to OpenAI Chat/Responses, but all Gemini fileData.fileUri sources currently fail closed until an explicit fetch/upload adapter exists
  • typed media metadata must be internally consistent; conflicting MIME hints such as mime_type versus a file_data data URI are rejected before the upstream call
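
For a model surface that genuinely supports richer capabilities, the conservative defaults can be widened. A minimal sketch reusing the quickstart surface_defaults keys (the values are illustrative, not recommendations):

surface_defaults:
  modalities:
    input: ["text", "image"]   # widen input modalities only if the upstream model accepts them
    output: ["text"]
  tools:
    supports_search: true          # enable only if the model surface really supports search
    supports_view_image: true
    apply_patch_transport: freeform
    supports_parallel_calls: true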

Codex / Claude Code / Gemini Basic Setup

For day-to-day usage, prefer the repo's wrapper scripts instead of hand-configuring each client. They handle local environment isolation, base URL injection, preset hydration, and client-specific metadata.

The defaults in scripts/interactive_cli.py match the provider-neutral preset names:

  • Codex CLI: preset-openai-compatible
  • Claude Code: preset-anthropic-compatible
  • Gemini CLI: preset-openai-compatible

Codex CLI

bash scripts/run_codex_proxy.sh \
  --config-source examples/quickstart-provider-neutral.yaml \
  --workspace "$PWD" \
  --model preset-openai-compatible

Claude Code

bash scripts/run_claude_proxy.sh \
  --config-source examples/quickstart-provider-neutral.yaml \
  --workspace "$PWD" \
  --model preset-anthropic-compatible

Gemini CLI

bash scripts/run_gemini_proxy.sh \
  --config-source examples/quickstart-provider-neutral.yaml \
  --workspace "$PWD" \
  --model preset-openai-compatible

Pass --proxy-base http://127.0.0.1:8080 when you want to attach to a proxy you started separately. When --proxy-base is omitted, the wrapper renders the preset config, starts the proxy, waits for /health, launches the client, and stops the proxy when the session exits.
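
For example, attaching a Codex session to an already-running proxy (a sketch; whether --config-source is still needed alongside --proxy-base depends on the wrapper, so check scripts/run_codex_proxy.sh if in doubt):

# assumes a proxy is already listening on 127.0.0.1:8080
bash scripts/run_codex_proxy.sh \
  --proxy-base http://127.0.0.1:8080 \
  --workspace "$PWD" \
  --model preset-openai-compatible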

The base URL a wrapper configures and the proxy endpoint the client actually hits are related but not identical.

For Codex specifically, the wrapper currently fixes wire_api="responses", so Codex uses the Responses route:

  • Codex CLI: the wrapper sets OPENAI_BASE_URL=<proxy>/openai/v1, the client appends /responses, so the proxy serves /openai/v1/responses
  • Claude Code: the wrapper sets ANTHROPIC_BASE_URL=<proxy>/anthropic, the client appends /v1/messages, so the proxy serves /anthropic/v1/messages
  • Gemini CLI: the wrapper sets GOOGLE_GEMINI_BASE_URL=<proxy>/google, the client appends /v1beta/models/..., so the proxy serves /google/v1beta/models/...
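
To illustrate the routing above, a hedged direct call to the Anthropic lane: the x-api-key header carrying the proxy key is an assumption about proxy_key mode, while the body is the standard Anthropic Messages shape:

# assumption: the proxy key is passed where an Anthropic client would send its API key
curl -s http://127.0.0.1:8080/anthropic/v1/messages \
  -H "x-api-key: $LLM_UNIVERSAL_PROXY_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "preset-anthropic-compatible",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "Hello"}]
  }'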

Codex especially benefits from the wrapper because it injects temporary model metadata for proxy-backed aliases. For more detail, see docs/clients.md.

Most Common Static Configuration

The static YAML story is intentionally small:

  • listen: proxy listen address
  • upstream_timeout_secs: upstream request timeout
  • data_auth: process-wide data-plane auth mode; if omitted, the proxy uses the environment fallback
  • upstreams: named upstream API roots, formats, and credential policy
  • model_aliases: stable local names mapped to UPSTREAM:MODEL
  • surface_defaults / surface: optional client-visible capability metadata for wrappers and model catalogs
  • proxy: optional default upstream egress proxy
  • hooks: optional usage / exchange export hooks
  • debug_trace: optional local debug trace

Practical rules:

  • api_root should be the provider API root and include its version segment, such as .../v1 or .../v1beta
  • format pins the upstream protocol: openai-responses, openai-completion, anthropic, or google
  • aliases such as preset-openai-compatible and preset-anthropic-compatible are local names; they do not need to equal the upstream model ID
  • use structured aliases only when you want extra limits or surface metadata on top of target: UPSTREAM:MODEL (a sketch follows this list)
  • the provider-neutral PRESET_* placeholders are for wrapper-rendered config sources; direct static YAML should contain concrete URLs and model IDs
  • data_auth is a process-wide setting for all data-plane routes, not a per-upstream field; provider_key.inline, provider_key.env, and legacy provider_key_env choose per-upstream provider credential sources in proxy_key mode
  • if data_auth is omitted, LLM_UNIVERSAL_PROXY_AUTH_MODE and LLM_UNIVERSAL_PROXY_KEY provide the compatibility environment fallback
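
A structured alias might look like the sketch below. The target key is named in the rules above and the nested limits/surface keys mirror the quickstart upstream shape, but treat the exact nesting as illustrative; the alias name fast-coder is hypothetical:

model_aliases:
  fast-coder:                      # hypothetical local alias name
    target: "PRESET-OPENAI-COMPATIBLE:PRESET_ENDPOINT_MODEL"
    limits:
      max_output_tokens: 64000     # illustrative override on top of the upstream default
    surface:
      modalities:
        input: ["text"]
        output: ["text"]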

For the full YAML reference and more examples, see docs/configuration.md.

Container Image

Release images are published at ghcr.io/agentsmith-project/llm-universal-proxy. The current published container release is v0.2.27; Cargo package version 0.2.28 is the next release identity, not a published container tag yet. For production, pin ghcr.io/agentsmith-project/llm-universal-proxy:v0.2.27 or the published digest instead of relying on latest. Container usage, Docker Compose, one-minute smoke verification, Admin Dashboard auth boundaries, and GHCR access for authenticated or public pulls are documented in docs/container.md.
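
A pinned pull-and-run might look like this sketch. The exposed port, config mount point, and --config entrypoint flag are assumptions here; verify the supported invocation in docs/container.md:

docker pull ghcr.io/agentsmith-project/llm-universal-proxy:v0.2.27

# port, mount path, and flag shape are assumptions; see docs/container.md
docker run --rm \
  -p 127.0.0.1:8080:8080 \
  -v "$PWD/my-config.yaml:/config.yaml:ro" \
  ghcr.io/agentsmith-project/llm-universal-proxy:v0.2.27 \
  --config /config.yaml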

Dynamic Configuration Overview

Static YAML is the default. If you need live updates, the proxy also exposes admin endpoints for reading runtime state, replacing namespace config, and rotating the global data-plane auth config without restarting the whole process. Namespace payloads use a runtime shape. Global data_auth is either static YAML, the environment fallback, or the Admin API /admin/data-auth state.

Current admin endpoints:

  • GET /admin/state
  • GET /admin/data-auth
  • PUT /admin/data-auth
  • GET /admin/namespaces/:namespace/state
  • POST /admin/namespaces/:namespace/config
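
As a starting point, reading runtime state is a plain GET. Admin authentication is deliberately omitted from this sketch (the required credentials and auth boundaries are covered in docs/admin-dynamic-config.md), and the namespace name default is hypothetical:

# global runtime state (auth omitted; see docs/admin-dynamic-config.md)
curl -s http://127.0.0.1:8080/admin/state

# per-namespace state for a hypothetical namespace named "default"
curl -s http://127.0.0.1:8080/admin/namespaces/default/state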

That flow is documented in docs/admin-dynamic-config.md.

Keep Reading

  • docs/configuration.md: full YAML reference and more examples
  • docs/clients.md: client-specific wrapper and setup detail
  • docs/container.md: container usage, smoke verification, and GHCR access
  • docs/admin-dynamic-config.md: dynamic admin configuration flow

License

MIT License
