One local endpoint. All your AI providers.

Official Documentation · Quick Start · Installation · Setup Providers · API Reference
switchAILocal is a unified API gateway that lets you use all your AI providers through a single OpenAI-compatible endpoint running on your machine.
| Feature | Description |
|---|---|
| Modern Web UI | Single-file React dashboard to configure providers, manage model routing, and adjust settings (226 KB, zero dependencies) |
| Use Your Subscriptions | Connect Gemini CLI, Claude Code, Codex, Ollama, and more; no API keys needed |
| Single Endpoint | Any OpenAI-compatible tool works with http://localhost:18080 |
| CLI Attachments | Pass files and folders directly to CLI providers via `extra_body.cli` |
| Superbrain Intelligence | Autonomous self-healing: monitors executions, diagnoses failures with AI, auto-responds to prompts, restarts with corrective flags, and routes to fallback providers |
| Load Balancing | Round-robin across multiple accounts per provider |
| Intelligent Failover | Smart routing to alternatives based on capabilities and success rates |
| Observability Ready | Enterprise-grade JSON structured logging, raw NDJSON event streams, and a native `/metrics` endpoint for Prometheus & Grafana integration |
| Local-First | Everything runs on your machine; your data never leaves it |
| Provider | CLI Tool | Prefix | Status |
|---|---|---|---|
| Google Gemini | `gemini` | `geminicli:` | ✅ Ready |
| Anthropic Claude | `claude` | `claudecli:` | ✅ Ready |
| OpenAI Codex | `codex` | `codex:` | ✅ Ready |
| Mistral Vibe | `vibe` | `vibe:` | ✅ Ready |
| OpenCode | `opencode` | `opencode:` | ✅ Ready |
| Provider | Prefix | Status |
|---|---|---|
| Ollama | `ollama:` | ✅ Ready |
| LM Studio | `lmstudio:` | ✅ Ready |
| Provider | Prefix | Status |
|---|---|---|
| Traylinx switchAI | `switchai:` | ✅ Ready |
| Google AI Studio | `gemini:` | ✅ Ready |
| Anthropic API | `claude:` | ✅ Ready |
| OpenAI API | `openai:` | ✅ Ready |
| OpenRouter | `openai-compat:` | ✅ Ready |
| Requirement | Minimum | Notes |
|---|---|---|
| Node.js | 18+ | For npx install (recommended) |
| Go | 1.25+ | Only needed for building from source |
| Docker | Optional | Only needed for ail start --docker |
| macOS | Ventura+ | Linux support is experimental |
```bash
npx @traylinx/switchailocal
```

That's it. It downloads the right binary for your platform, caches it, and runs.
```bash
docker run -p 18080:18080 ghcr.io/traylinx/switchailocal:latest
```

```bash
git clone https://github.com/traylinx/switchAILocal.git
cd switchAILocal

# First-time setup (builds binaries, installs bridge service, registers CLI)
./ail.sh setup

# Then start the server (from anywhere, after setup)
ail start

# OR start with Docker (add --build to force rebuild)
ail start --docker --build
```

Choose the authentication method that works best for you:
If you already have the `gemini`, `claude`, or `vibe` CLI tools installed and authenticated, switchAILocal uses them automatically. No additional login required!
```bash
# Just use the CLI prefix - it works immediately
curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-test-123" \
  -d '{"model": "geminicli:gemini-2.5-pro", "messages": [...]}'
```

- ✅ Zero configuration: uses your existing CLI authentication
- ✅ Works immediately: no `--login` needed
- ✅ Supports `geminicli:`, `claudecli:`, `codex:`, `vibe:`, `opencode:`
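Every routing prefix in the tables above follows the same `provider:model` convention, split on the first colon. A minimal illustrative sketch of that split (an assumption for illustration, not the gateway's actual parser):

```python
def split_model(model: str):
    """Illustrative: split 'provider:model' on the first colon.

    No colon means no prefix, so the gateway auto-routes to an
    available provider instead.
    """
    provider, sep, name = model.partition(":")
    return (provider, name) if sep else (None, model)

print(split_model("geminicli:gemini-2.5-pro"))  # ('geminicli', 'gemini-2.5-pro')
print(split_model("gemini-2.5-pro"))            # (None, 'gemini-2.5-pro')
```

The same convention appears throughout the examples below, e.g. `ollama:llama3.2` or `claudecli:claude-sonnet-4`.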
Add your AI Studio or Anthropic API keys to `config.yaml`:

```yaml
gemini:
  api-key: "your-gemini-api-key"

claude:
  api-key: "your-claude-api-key"
```

Then use the prefixes without the `cli` suffix: `gemini:`, `claude:`
Only needed if:

- You don't have the CLI tools installed
- You don't have API keys
- You want switchAILocal to manage OAuth tokens directly

```bash
# Optional OAuth login (alternative to CLI wrappers)
./switchAILocal --login          # Google Gemini OAuth
./switchAILocal --claude-login   # Anthropic Claude OAuth
```

Google OAuth login requires the GEMINI_CLIENT_ID and GEMINI_CLIENT_SECRET environment variables. Most users should use Option A (CLI wrappers) instead.
See the Provider Guide for detailed setup instructions.
```bash
./ail.sh status
```

The server runs on http://localhost:18080.

When you omit the provider prefix, switchAILocal automatically routes to an available provider:
```bash
curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-test-123" \
  -d '{
    "model": "gemini-2.5-pro",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

Use the `provider:model` format to route to a specific provider:
```bash
# Force Gemini CLI
curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-test-123" \
  -d '{
    "model": "geminicli:gemini-2.5-pro",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

# Force Ollama
curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-test-123" \
  -d '{
    "model": "ollama:llama3.2",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

# Force Claude CLI
curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-test-123" \
  -d '{
    "model": "claudecli:claude-sonnet-4",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

# Force LM Studio
curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-test-123" \
  -d '{
    "model": "lmstudio:mistral-7b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

If you call external OpenAI-compatible APIs (e.g., DashScope, Groq) through the `openai-compatibility` section, but your HTTP client hardcodes a legacy model string (e.g., `"model": "alibaba:qwen-plus"`, colon included), you can intercept that payload by assigning an explicit alias:
```yaml
# config.yaml
openai-compatibility:
  - name: "alibaba"
    prefix: "alibaba"
    base-url: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
    models-url: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1/models"
    api-key-entries:
      - api-key: "sk-your-key-here"
    models:
      - name: "qwen-plus"           # Actual DashScope upstream model
        alias: "alibaba:qwen-plus"  # Captures the legacy, strictly formatted client requests
```

Now your locked-in client can query switchAILocal directly, without code changes:
```bash
curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-your-key" \
  -d '{
    "model": "alibaba:qwen-plus",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

```bash
curl http://localhost:18080/v1/models \
  -H "Authorization: Bearer sk-test-123"

# Filter by modality (text, image, audio, embedding, vision)
curl "http://localhost:18080/v1/models?modality=image" \
  -H "Authorization: Bearer sk-test-123"
```

switchAILocal supports the full OpenAI multimodal API surface:
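If you prefer Python to curl for listing models, the modality filter is just a query parameter. A minimal sketch using `requests` (the `sk-test-123` key is the placeholder from the examples above):

```python
import requests

BASE = "http://localhost:18080/v1"
HEADERS = {"Authorization": "Bearer sk-test-123"}

def list_models(modality=None):
    """GET /v1/models, optionally filtered via ?modality=..."""
    params = {"modality": modality} if modality else None
    resp = requests.get(f"{BASE}/models", headers=HEADERS, params=params)
    resp.raise_for_status()
    return [m["id"] for m in resp.json()["data"]]

# e.g. list_models("image") requests /v1/models?modality=image
```

Because the endpoint is OpenAI-compatible, `client.models.list()` from the official SDK works against the same URL as well.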
```bash
curl http://localhost:18080/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-test-123" \
  -d '{"model": "switchai-embed", "input": "Hello world"}'
```

```bash
curl http://localhost:18080/v1/images/generations \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-test-123" \
  -d '{"model": "dall-e-3", "prompt": "A sunset over mountains", "size": "1024x1024"}'
```

```bash
curl http://localhost:18080/v1/images/edits \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-test-123" \
  -d '{"model": "dall-e-3", "prompt": "Add a rainbow", "images": [{"image_url": "https://example.com/photo.png"}]}'
```

Text-to-speech works natively via the MiniMax T2A adapter: the gateway translates the OpenAI request shape to MiniMax `/v1/t2a_pro` behind the scenes. Use MiniMax voice IDs (not OpenAI voice names):
```bash
curl http://localhost:18080/v1/audio/speech \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-test-123" \
  -d '{
    "model": "minimax:speech-02-hd",
    "input": "Hello from switchAILocal",
    "voice": "male-qn-qingse",
    "response_format": "mp3"
  }' --output speech.mp3
```

Common voice IDs: `male-qn-qingse`, `female-shaonv`, `audiobook_male_2`, `presenter_male`, `clever_boy`. Formats: `mp3`, `pcm`, `flac`, `wav`.

```bash
curl http://localhost:18080/v1/audio/transcriptions \
  -H "Authorization: Bearer sk-test-123" \
  -F file="@audio.mp3" \
  -F model="whisper-large-v3"
```

Generate songs (text-to-music) or covers via MiniMax. Sync generation takes 30-90 s, with 100 requests/day on the Plus plan. Pass `stream: true` to stream raw MP3 bytes as they arrive (~20 s to first byte instead of ~60 s wall-clock).
```bash
# Sync: returns JSON with base64-encoded MP3
curl http://localhost:18080/v1/music/generations \
  -H "Authorization: Bearer sk-test-123" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimax:music-2.6",
    "lyrics": "[Verse]\nHello world\n[Chorus]\nLet us sing"
  }' | jq -r .data.audio | base64 -d > song.mp3

# Stream: returns audio/mpeg directly, playable as it downloads
curl -N http://localhost:18080/v1/music/generations \
  -H "Authorization: Bearer sk-test-123" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimax:music-2.6",
    "stream": true,
    "lyrics": "[Verse]\nHello world\n[Chorus]\nLet us sing"
  }' > song.mp3
```

Sync returns `{data: {audio: <base64>, duration_ms, sample_rate, channels, bitrate}, model, trace_id}`. Streaming returns `Content-Type: audio/mpeg` with progressive MP3 frames. For cover mode, use `"model": "minimax:music-cover"` with `"audio_url"` or `"audio_base64"` plus `"prompt"`.

```bash
curl http://localhost:18080/v1/music/lyrics \
  -H "Authorization: Bearer sk-test-123" \
  -H "Content-Type: application/json" \
  -d '{"mode":"write_full_song","prompt":"happy pop about coding"}'
```

Returns `{song_title, style_tags, lyrics}` with `[Verse]`/`[Chorus]` structure tags.
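The sync `/v1/music/generations` response shown earlier (a base64 MP3 in `data.audio`) can also be decoded without `jq`. A minimal Python sketch assuming exactly that response shape:

```python
import base64
import json

def save_song(response_json: str, path: str) -> int:
    """Decode the base64 MP3 from a sync /v1/music/generations response.

    Returns the number of MP3 bytes written.
    """
    audio_b64 = json.loads(response_json)["data"]["audio"]
    mp3_bytes = base64.b64decode(audio_b64)
    with open(path, "wb") as f:
        f.write(mp3_bytes)
    return len(mp3_bytes)
```

This is the Python equivalent of the `| jq -r .data.audio | base64 -d > song.mp3` pipeline in the curl example.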
```bash
curl http://localhost:18080/v1/audio/translations \
  -H "Authorization: Bearer sk-test-123" \
  -F file="@german_audio.mp3" \
  -F model="whisper-1"
```

switchAILocal passes through all standard and provider-specific parameters, so you can use tools, thinking mode, vision, and more, just as if you were talking directly to the upstream provider.

Built-in tools like web search are forwarded as-is to the upstream. MiniMax M2.7 is the canonical web-search provider: the model runs the search internally and returns the synthesized answer in `choices[0].message.content`.

> Critical: set `max_tokens >= 2000` when using `web_search`. Search results are folded into the prompt and inflate the context to 6k-13k tokens; lower budgets produce truncated or empty responses.
```bash
curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-test-123" \
  -d '{
    "model": "minimax:MiniMax-M2.7",
    "messages": [
      {"role": "user", "content": "What are the latest AI news today?"}
    ],
    "max_tokens": 2000,
    "tools": [
      {
        "type": "web_search",
        "max_keyword": 3,
        "force_search": true,
        "limit": 3
      }
    ],
    "tool_choice": "auto"
  }'
```

Standard OpenAI function calling works with any compatible provider:
```bash
curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-test-123" \
  -d '{
    "model": "gemini-2.5-pro",
    "messages": [
      {"role": "user", "content": "What is the weather in Berlin?"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get current weather for a location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {"type": "string", "description": "City name"},
              "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["location"]
          }
        }
      }
    ],
    "tool_choice": "auto"
  }'
```

Enable extended thinking for complex reasoning tasks:
```bash
curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-test-123" \
  -d '{
    "model": "claudecli:claude-sonnet-4",
    "messages": [
      {"role": "user", "content": "Prove that the square root of 2 is irrational"}
    ],
    "thinking": {
      "type": "enabled",
      "budget_tokens": 10000
    }
  }'
```

Send images for analysis using the standard multimodal content format:
```bash
curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-test-123" \
  -d '{
    "model": "geminicli:gemini-2.5-pro",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "What do you see in this image?"},
          {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
        ]
      }
    ]
  }'
```

Pass files and folders directly to CLI providers via `extra_body.cli`:
```bash
curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-test-123" \
  -d '{
    "model": "geminicli:gemini-2.5-pro",
    "messages": [
      {"role": "user", "content": "Review this code for security issues"}
    ],
    "extra_body": {
      "cli": {
        "files": ["/path/to/main.go", "/path/to/server.go"],
        "directories": ["/path/to/internal/"]
      }
    }
  }'
```

All chat endpoints support SSE streaming:
```bash
curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-test-123" \
  -N \
  -d '{
    "model": "gemini-2.5-pro",
    "messages": [{"role": "user", "content": "Tell me a story"}],
    "stream": true
  }'
```

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:18080/v1",
    api_key="sk-test-123",  # Must match a key in config.yaml
)

# Recommended: Auto-routing (switchAILocal picks the best available provider)
completion = client.chat.completions.create(
    model="gemini-2.5-pro",  # No prefix = auto-route to any logged-in provider
    messages=[
        {"role": "user", "content": "What is the meaning of life?"}
    ]
)
print(completion.choices[0].message.content)

# Streaming example
stream = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

# Optional: Explicit provider selection (use prefix only when needed)
completion = client.chat.completions.create(
    model="ollama:llama3.2",  # Force Ollama provider
    messages=[{"role": "user", "content": "Hello!"}]
)
```

```typescript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost:18080/v1',
  apiKey: 'sk-test-123', // Must match a key in config.yaml
});

async function main() {
  // Auto-routing
  const completion = await client.chat.completions.create({
    model: 'gemini-2.5-pro',
    messages: [
      { role: 'user', content: 'What is the meaning of life?' }
    ],
  });
  console.log(completion.choices[0].message.content);

  // Explicit provider selection
  const ollamaResponse = await client.chat.completions.create({
    model: 'ollama:llama3.2', // Force Ollama
    messages: [
      { role: 'user', content: 'Hello!' }
    ],
  });
}

main();
```

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:18080/v1",
    api_key="sk-test-123",
)

stream = client.chat.completions.create(
    model="geminicli:gemini-2.5-pro",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```

All settings are in `config.yaml`. Copy the example to get started:
```bash
cp config.example.yaml config.yaml
```

Key configuration options:

```yaml
# Server port (default: 18080)
port: 18080

# Enable Ollama integration
ollama:
  enabled: true
  base-url: "http://localhost:11434"

# Enable LM Studio
lmstudio:
  enabled: true
  base-url: "http://localhost:1234/v1"

# Enable LUA plugins for request/response modification
plugin:
  enabled: true
  plugin-dir: "./plugins"
```

See the Configuration Guide for all options.
The Cortex Router plugin provides intelligent, multi-tier routing that automatically selects the optimal model based on request content.
Enable intelligent routing in `config.yaml`:

```yaml
plugin:
  enabled: true
  enabled-plugins:
    - "cortex-router"

intelligence:
  enabled: true
  router-model: "ollama:qwen:0.5b"  # Fast classification model
  matrix:
    coding: "switchai-chat"
    reasoning: "switchai-reasoner"
    fast: "switchai-fast"
    secure: "ollama:llama3.2"       # Local model for sensitive data
```

When you use `model="auto"` or `model="cortex"`, the router analyzes your request through multiple tiers:
- Reflex Tier (<1ms): pattern matching for obvious cases (code blocks → coding model, PII → secure model)
- Semantic Tier (<20ms): embedding-based intent matching (requires Phase 2)
- Cognitive Tier (200-500ms): LLM-based classification with confidence scoring

```python
# Automatic intelligent routing
completion = client.chat.completions.create(
    model="auto",  # Let Cortex Router decide
    messages=[{"role": "user", "content": "Write a Python function to sort a list"}]
)
# → Routes to the coding model automatically
```

Enable advanced features for even smarter routing:
```yaml
intelligence:
  enabled: true

  # Semantic matching (faster than LLM classification)
  embedding:
    enabled: true
  semantic-tier:
    enabled: true

  # Skill-based prompt augmentation
  skill-matching:
    enabled: true

  # Quality-based model cascading
  cascade:
    enabled: true
```

21 pre-built skills, including:
- Language experts (Go, Python, TypeScript)
- Infrastructure (Docker, Kubernetes, DevOps)
- Security, Testing, Debugging
- Frontend, Vision, and more
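The reflex tier described earlier does plain pattern matching before any model call. A minimal illustrative sketch: the routing targets reuse the `matrix` example above, but the regex patterns are assumptions for illustration, not the plugin's actual rules:

```python
import re

# Illustrative reflex rules (assumed patterns, not the plugin's actual ones)
REFLEX_RULES = [
    (re.compile(r"```|def |func |class "), "switchai-chat"),    # code -> coding model
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "ollama:llama3.2"),  # SSN-like PII -> local model
]

def reflex_route(prompt: str, default: str = "switchai-fast") -> str:
    """First-match pattern routing; unmatched prompts fall through to a default."""
    for pattern, model in REFLEX_RULES:
        if pattern.search(prompt):
            return model
    return default

print(reflex_route("def sort(xs): ..."))      # switchai-chat
print(reflex_route("My SSN is 123-45-6789"))  # ollama:llama3.2
print(reflex_route("Tell me a joke"))         # switchai-fast
```

Because no model is invoked, this tier can answer in well under a millisecond; anything it cannot classify falls through to the semantic and cognitive tiers.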
See Cortex Router Phase 2 Guide for full documentation.

Read the Official switchAILocal Documentation
| Guide | Description |
|---|---|
| Installation | Getting started guide |
| Performance Guide | Rate limiting, load shedding, and profiling |
| Configuration | All configuration options |
| Providers | Setting up AI providers |
| API Reference | REST API documentation |
| Intelligent Systems | Memory, Heartbeat, Steering, and Hooks |
| Advanced Features | Payload overrides, failover, and more |
| State Box | Secure state management & configuration |
| Management Dashboard | Modern web UI for provider setup, model routing & settings |
```bash
# Recommended: use the setup script
./ail.sh setup

# OR build manually
go build -o switchAILocal ./cmd/server
go build -o bridge-agent ./cmd/bridge-agent

# Build the Management UI (optional)
./ail_ui.sh
```

| Guide | Description |
|---|---|
| SDK Usage | Embed switchAILocal in your Go apps |
| LUA Plugins | Custom request/response hooks |
| SDK Advanced | Create custom providers |
Contributions are welcome!

- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes
- Push and open a Pull Request
MIT License - see LICENSE for details.