Multi-tenant, OpenAI-compatible LLM gateway. Route requests to cloud GPU pods (RunPod, Vast.ai, Lambda Labs), a local GPU, or commercial APIs — all behind one API endpoint.
Use cases:
- Self-host large models (7B–70B) on rented GPUs; pay only when running
- Drop-in replacement for OpenAI API — works with Open WebUI, Cursor, any OpenAI client
- Mix providers: local GPU for dev, cloud GPU for heavy loads, OpenAI as fallback
- Per-user billing, quotas, and tier restrictions for team deployments
Documentation:
- Quickstart
- Endpoints
- Tiers & Models
- Providers — RunPod, local GPU, OpenAI, Groq, DeepSeek, etc.
- Claude Code Router — use Claude Code with this bridge via ccr
- CLI Reference — user management, API keys, billing, observability
- Embeddings
- Preprocessing Pipeline
- Development & Architecture
- Deployment Modes — solo, hosted multi-user, gateway client, full self-hosted
- Workspace Tools — Open WebUI file I/O + code execution tool; includes recommended system prompt
- Website & Wiki Deployment — GitHub Pages, Netlify, Vercel, GitHub Wiki
# 1. Clone and configure
cp .env.example .env
$EDITOR .env # set provider keys, DB password, JWT secret
# 2. One-shot setup (builds images, starts stack, runs migrations, bootstraps admin)
bash scripts/setup.sh
# 3. Smoke test
curl -H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{"model":"llm-simple","messages":[{"role":"user","content":"hello"}]}' \
http://localhost:8000/v1/chat/completions

| Service | Port | URL |
|---|---|---|
| Bridge API | 8000 | http://localhost:8000 |
| Dashboard | 8501 | http://localhost:8501 |
| Open WebUI | 3000 | http://localhost:3000 |
| CCR (claude-code-router) | 3456 | http://localhost:3456 |
| Gateway (optional) | 8080 | http://localhost:8080 |
| Postgres | 5432 | (internal) |
| Redis | 6379 | (internal) |
| Ollama | 11434 | (internal) |
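
After setup it can help to confirm that the published ports respond. The following is a minimal sketch, not part of the project: the function names `service_url` and `check_services` are hypothetical, and the URLs are copied from the table above. It assumes each published service answers plain HTTP on its root path, and it treats any HTTP response as "up".

```shell
#!/bin/sh
# Hypothetical helpers; only the ports and URLs come from the table above.

# service_url NAME: print the local URL of a published service.
service_url() {
  case "$1" in
    bridge)    echo "http://localhost:8000" ;;
    dashboard) echo "http://localhost:8501" ;;
    openwebui) echo "http://localhost:3000" ;;
    ccr)       echo "http://localhost:3456" ;;
    gateway)   echo "http://localhost:8080" ;;
    *)         echo "unknown service: $1" >&2; return 1 ;;
  esac
}

# check_services: report which services accept HTTP connections.
# Any HTTP response counts as "up"; only connection failures
# or timeouts count as "down". The optional gateway is skipped.
check_services() {
  for name in bridge dashboard openwebui ccr; do
    url=$(service_url "$name")
    if curl -s -o /dev/null --max-time 3 "$url"; then
      echo "up   $name $url"
    else
      echo "down $name $url"
    fi
  done
}
```

Source the file and run `check_services` once `scripts/setup.sh` has finished. Postgres, Redis, and Ollama are left out because the table marks them as internal.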
Send requests using the OpenAI API format:
# Use a specific tier
curl -H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{"model":"llm-architecture","messages":[{"role":"user","content":"review this code"}]}' \
http://localhost:8000/v1/chat/completions
# Let the router pick (based on complexity)
curl ... -d '{"model":"llm-auto","messages":[...]}'
# Force a tier per-request
curl -H "X-Tier: simple" ...

Available models: llm-simple, llm-mid, llm-architecture, llm-maximum, llm-ultra, llm-vision, llm-auto
See Tiers & Models for full details and tier locking options.
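
The requests above can be wrapped in a small helper so the model name is checked before anything is sent. This is a sketch under stated assumptions: `BRIDGE_URL` and the function names `is_known_model`, `json_escape`, `chat_payload`, and `chat` are hypothetical and not part of the project. The model list, the `X-Tier` header, and the endpoint path are taken from the examples above. How the bridge combines `X-Tier` with the `model` field is not specified here, so the helper passes both through unchanged.

```shell
#!/bin/sh
# Hypothetical helper; BRIDGE_URL is an assumed variable name.
BRIDGE_URL="${BRIDGE_URL:-http://localhost:8000}"

# Model names copied from the "Available models" list.
LLM_MODELS="llm-simple llm-mid llm-architecture llm-maximum llm-ultra llm-vision llm-auto"

# is_known_model NAME: succeed if NAME is in the list above.
is_known_model() {
  for m in $LLM_MODELS; do
    [ "$m" = "$1" ] && return 0
  done
  return 1
}

# json_escape TEXT: escape backslashes and double quotes only.
# Newlines and control characters are not handled.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# chat_payload MODEL PROMPT: print a single-message request body.
chat_payload() {
  is_known_model "$1" || { echo "unknown model: $1" >&2; return 1; }
  printf '{"model":"%s","messages":[{"role":"user","content":"%s"}]}\n' \
    "$1" "$(json_escape "$2")"
}

# chat MODEL PROMPT [TIER]: POST the payload to the bridge.
# TIER, if given, is sent as the X-Tier header.
chat() {
  body=$(chat_payload "$1" "$2") || return 1
  if [ -n "${3:-}" ]; then
    curl -sS -H "Authorization: Bearer $API_KEY" \
      -H "Content-Type: application/json" \
      -H "X-Tier: $3" \
      -d "$body" "$BRIDGE_URL/v1/chat/completions"
  else
    curl -sS -H "Authorization: Bearer $API_KEY" \
      -H "Content-Type: application/json" \
      -d "$body" "$BRIDGE_URL/v1/chat/completions"
  fi
}
```

For example, `chat llm-architecture "review this code"` sends the same request as the first curl command above. For prompts that contain newlines, a JSON tool such as `jq` is a safer way to build the body.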
| Mode | Bridge | Gateway | Notes |
|---|---|---|---|
| Solo | Local | — | OPENWEBUI_BRIDGE_API_KEY = your key |
| Hosted multi-user | Shared server | — | Per-user keys via gpu-relay Pipeline |
| Gateway client | Remote (host's) | Local :8080 | Point Open WebUI at gateway |
| Full self-hosted | Own server | Optional | Complete control |
# Standalone — no main stack required
GATEWAY_BRIDGE_URL=https://your-bridge.example.com docker compose -f docker/docker-compose.gateway.yml up -d

Point any OpenAI client at http://localhost:8080 with your sk-llm- key. The gateway forwards requests to the upstream bridge with auth unchanged.
# .env — configure sync
OPENWEBUI_ADMIN_EMAIL=admin@example.com
OPENWEBUI_ADMIN_PASSWORD=<password>
PIPELINES_API_KEY=<from Open WebUI Admin → Pipelines>
# Create user with Open WebUI account + pipeline key mapping in one step
llmctl users add alice@example.com --sync-openwebui
llmctl users keys-add alice@example.com --sync-pipeline

MOCK_PROVIDERS=1 ./scripts/smoke_test.sh

Setting MOCK_PROVIDERS=1 routes all requests to the local Ollama service: no cloud account, no billing. See Development.
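
For a whole team, the two `llmctl users` commands shown above for alice@example.com can be repeated from a list of addresses. The following is a minimal sketch: the function name `provision_users`, the `LLMCTL` override variable, and the one-address-per-line file format are all assumptions made for illustration. Only the two `llmctl` invocations themselves come from this document.

```shell
#!/bin/sh
# provision_users FILE: run the two llmctl commands for each email in FILE.
# FILE holds one address per line; blank lines and lines starting
# with '#' are skipped. Stops at the first failing command.
# LLMCTL is a hypothetical override, e.g. LLMCTL="echo llmctl" for a dry run.
provision_users() {
  while IFS= read -r email || [ -n "$email" ]; do
    case "$email" in
      ''|'#'*) continue ;;
    esac
    # stdin is redirected so the command cannot consume the address list.
    ${LLMCTL:-llmctl} users add "$email" --sync-openwebui < /dev/null || return 1
    ${LLMCTL:-llmctl} users keys-add "$email" --sync-pipeline < /dev/null || return 1
  done < "$1"
}
```

Running `LLMCTL="echo llmctl" provision_users team.txt` prints the commands without executing them, which is a cheap way to review the list first.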