
switchAILocal

One local endpoint. All your AI providers.

Official Documentation • Quick Start • Installation • Setup Providers • API Reference


What is switchAILocal?

switchAILocal is a unified API gateway that lets you use all your AI providers through a single OpenAI-compatible endpoint running on your machine.

Key Benefits

| Feature | Description |
| --- | --- |
| 🎨 Modern Web UI | Single-file React dashboard to configure providers, manage model routing, and adjust settings (226 KB, zero dependencies) |
| 🔑 Use Your Subscriptions | Connect Gemini CLI, Claude Code, Codex, Ollama, and more; no API keys needed |
| 🎯 Single Endpoint | Any OpenAI-compatible tool works with http://localhost:18080 |
| 📎 CLI Attachments | Pass files and folders directly to CLI providers via extra_body.cli |
| 🧠 Superbrain Intelligence | Autonomous self-healing: monitors executions, diagnoses failures with AI, auto-responds to prompts, restarts with corrective flags, and routes to fallback providers |
| ⚖️ Load Balancing | Round-robin across multiple accounts per provider |
| 🔄 Intelligent Failover | Smart routing to alternatives based on capabilities and success rates |
| 📊 Observability Ready | Enterprise-grade JSON structured logging, raw NDJSON event streams, and a native /metrics endpoint for Prometheus & Grafana integration |
| 🔒 Local-First | Everything runs on your machine; your data never leaves |

Supported Providers

CLI Tools (Use Your Paid Subscriptions)

| Provider | CLI Tool | Prefix | Status |
| --- | --- | --- | --- |
| Google Gemini | gemini | geminicli: | ✅ Ready |
| Anthropic Claude | claude | claudecli: | ✅ Ready |
| OpenAI Codex | codex | codex: | ✅ Ready |
| Mistral Vibe | vibe | vibe: | ✅ Ready |
| OpenCode | opencode | opencode: | ✅ Ready |

Local Models

| Provider | Prefix | Status |
| --- | --- | --- |
| Ollama | ollama: | ✅ Ready |
| LM Studio | lmstudio: | ✅ Ready |

Cloud APIs

| Provider | Prefix | Status |
| --- | --- | --- |
| Traylinx switchAI | switchai: | ✅ Ready |
| Google AI Studio | gemini: | ✅ Ready |
| Anthropic API | claude: | ✅ Ready |
| OpenAI API | openai: | ✅ Ready |
| OpenRouter | openai-compat: | ✅ Ready |

Prerequisites

| Requirement | Minimum | Notes |
| --- | --- | --- |
| Node.js | 18+ | For npx install (recommended) |
| Go | 1.25+ | Only needed for building from source |
| Docker | Optional | Only needed for ail start --docker |
| macOS | Ventura+ | Linux support is experimental |

Quick Start

1. Start the Server

Option 1: npx (Recommended)

npx @traylinx/switchailocal

That's it: the command downloads the right binary for your platform, caches it, and runs the server.

Option 2: Docker

docker run -p 18080:18080 ghcr.io/traylinx/switchailocal:latest

Option 3: Build from Source

git clone https://github.com/traylinx/switchAILocal.git
cd switchAILocal

# First-time setup (builds binaries, installs bridge service, registers CLI)
./ail.sh setup

# Then start the server (from anywhere, after setup)
ail start

# OR start with Docker (add --build to force rebuild)
ail start --docker --build

2. Connect Your Providers

Choose the authentication method that works best for you:

Option A: Local CLI Wrappers (Recommended - Zero Setup)

If you already have gemini, claude, or vibe CLI tools installed and authenticated, switchAILocal uses them automatically. No additional login required!

# Just use the CLI prefix - it works immediately
curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-test-123" \
  -d '{"model": "geminicli:gemini-2.5-pro", "messages": [...]}'

  • ✅ Zero configuration - uses your existing CLI authentication
  • ✅ Works immediately - no --login needed
  • ✅ Supports: geminicli:, claudecli:, codex:, vibe:, opencode:

Option B: API Keys (Standard)

Add your Google AI Studio or Anthropic API keys to config.yaml:

gemini:
  api-key: "your-gemini-api-key"
claude:
  api-key: "your-claude-api-key"

Then use the plain prefixes (without the cli suffix): gemini:, claude:
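
For example (a minimal sketch with the OpenAI Python SDK; the model name is illustrative):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:18080/v1", api_key="sk-test-123")

# Routes through the configured Google AI Studio key, not the gemini CLI
completion = client.chat.completions.create(
    model="gemini:gemini-2.5-pro",
    messages=[{"role": "user", "content": "Hello!"}],
)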

Option C: OAuth Cloud Proxy (Advanced - Alternative to CLI)

Only needed if:

  • ❌ You don't have the CLI tools installed
  • ❌ You don't have API keys
  • ✅ You want switchAILocal to manage OAuth tokens directly

# Optional OAuth login (alternative to CLI wrappers)
./switchAILocal --login        # Google Gemini OAuth
./switchAILocal --claude-login # Anthropic Claude OAuth

⚠️ Note: This requires GEMINI_CLIENT_ID and GEMINI_CLIENT_SECRET environment variables. Most users should use Option A (CLI wrappers) instead.

📖 See the Provider Guide for detailed setup instructions.

3. Check Status

./ail.sh status

The server runs on http://localhost:18080.


Usage Examples

Basic Request (Auto-Routing)

When you omit the provider prefix, switchAILocal automatically routes to an available provider:

curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-test-123" \
  -d '{
    "model": "gemini-2.5-pro",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Explicit Provider Selection

Use the provider:model format to route to a specific provider:

# Force Gemini CLI
curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-test-123" \
  -d '{
    "model": "geminicli:gemini-2.5-pro",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

# Force Ollama
curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-test-123" \
  -d '{
    "model": "ollama:llama3.2",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

# Force Claude CLI
curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-test-123" \
  -d '{
    "model": "claudecli:claude-sonnet-4",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

# Force LM Studio
curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-test-123" \
  -d '{
    "model": "lmstudio:mistral-7b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Custom Providers & Legacy Client Support

If you call external OpenAI-compatible APIs (e.g., DashScope, Groq) through the openai-compatibility config, but your HTTP client hardcodes a legacy model string (e.g., "model": "alibaba:qwen-plus", colon included), you can intercept that payload by assigning an explicit alias:

# config.yaml
openai-compatibility:
  - name: "alibaba"
    prefix: "alibaba"
    base-url: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
    models-url: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1/models"
    api-key-entries:
      - api-key: "sk-your-key-here"
    models:
      - name: "qwen-plus"                # Actual DashScope upstream model
        alias: "alibaba:qwen-plus"       # Captures legacy strictly formatted client requests

Your locked-in client can now query switchAILocal directly, without code changes:

curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-your-key" \
  -d '{
    "model": "alibaba:qwen-plus",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

List Available Models

curl http://localhost:18080/v1/models \
  -H "Authorization: Bearer sk-test-123"

# Filter by modality (text, image, audio, embedding, vision)
curl http://localhost:18080/v1/models?modality=image \
  -H "Authorization: Bearer sk-test-123"

Multimodal API

switchAILocal supports the full OpenAI multimodal API surface:

Embeddings

curl http://localhost:18080/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-test-123" \
  -d '{"model": "switchai-embed", "input": "Hello world"}'

Image Generation

curl http://localhost:18080/v1/images/generations \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-test-123" \
  -d '{"model": "dall-e-3", "prompt": "A sunset over mountains", "size": "1024x1024"}'

Image Editing

curl http://localhost:18080/v1/images/edits \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-test-123" \
  -d '{"model": "dall-e-3", "prompt": "Add a rainbow", "images": [{"image_url": "https://example.com/photo.png"}]}'

Text-to-Speech

Works natively via the MiniMax T2A adapter: the gateway translates the OpenAI request shape to MiniMax /v1/t2a_pro behind the scenes. Use MiniMax voice IDs (not OpenAI voice names):

curl http://localhost:18080/v1/audio/speech \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-test-123" \
  -d '{
    "model": "minimax:speech-02-hd",
    "input": "Hello from switchAILocal",
    "voice": "male-qn-qingse",
    "response_format": "mp3"
  }' --output speech.mp3

Common voice IDs: male-qn-qingse, female-shaonv, audiobook_male_2, presenter_male, clever_boy. Formats: mp3, pcm, flac, wav.
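
Through the SDK, the same request maps onto the standard speech endpoint (a sketch; note the voice field carries a MiniMax voice ID):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:18080/v1", api_key="sk-test-123")

speech = client.audio.speech.create(
    model="minimax:speech-02-hd",
    voice="male-qn-qingse",  # MiniMax voice ID, not an OpenAI voice name
    input="Hello from switchAILocal",
    response_format="mp3",
)
with open("speech.mp3", "wb") as f:
    f.write(speech.content)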

Audio Transcription

curl http://localhost:18080/v1/audio/transcriptions \
  -H "Authorization: Bearer sk-test-123" \
  -F file="@audio.mp3" \
  -F model="whisper-large-v3"

Music Generation

Generate songs (text-to-music) or covers via MiniMax. Synchronous requests take 30–90 s and are capped at 100/day on the Plus plan. Pass stream: true to stream raw MP3 bytes as they arrive (~20 s to first byte instead of ~60 s wall-clock).

# Sync: returns JSON with base64-encoded MP3
curl http://localhost:18080/v1/music/generations \
  -H "Authorization: Bearer sk-test-123" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimax:music-2.6",
    "lyrics": "[Verse]\nHello world\n[Chorus]\nLet us sing"
  }' | jq -r .data.audio | base64 -d > song.mp3

# Stream: returns audio/mpeg directly, playable as it downloads
curl -N http://localhost:18080/v1/music/generations \
  -H "Authorization: Bearer sk-test-123" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimax:music-2.6",
    "stream": true,
    "lyrics": "[Verse]\nHello world\n[Chorus]\nLet us sing"
  }' > song.mp3

Sync returns {data: {audio: <base64>, duration_ms, sample_rate, channels, bitrate}, model, trace_id}. Streaming returns Content-Type: audio/mpeg with progressive MP3 frames. For cover mode use "model": "minimax:music-cover" with "audio_url" or "audio_base64" + "prompt".
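
In code, the sync response decodes straight to a playable file (a sketch using requests, following the response shape above):

import base64
import requests

resp = requests.post(
    "http://localhost:18080/v1/music/generations",
    headers={"Authorization": "Bearer sk-test-123"},
    json={"model": "minimax:music-2.6",
          "lyrics": "[Verse]\nHello world\n[Chorus]\nLet us sing"},
)
resp.raise_for_status()
body = resp.json()

# data.audio is base64-encoded MP3, per the shape documented above
with open("song.mp3", "wb") as f:
    f.write(base64.b64decode(body["data"]["audio"]))
print(body["data"]["duration_ms"], "ms, trace:", body["trace_id"])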

Lyrics Generation

curl http://localhost:18080/v1/music/lyrics \
  -H "Authorization: Bearer sk-test-123" \
  -H "Content-Type: application/json" \
  -d '{"mode":"write_full_song","prompt":"happy pop about coding"}'

Returns {song_title, style_tags, lyrics} with [Verse]/[Chorus] structure tags.
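
Because the output is structured, it can feed straight into the music endpoint (a sketch using requests, chaining the two documented calls):

import requests

BASE = "http://localhost:18080/v1"
HEADERS = {"Authorization": "Bearer sk-test-123"}

# 1. Write a full song (returns song_title, style_tags, lyrics)
song = requests.post(f"{BASE}/music/lyrics", headers=HEADERS,
                     json={"mode": "write_full_song",
                           "prompt": "happy pop about coding"}).json()
print(song["song_title"], song["style_tags"])

# 2. Feed the structured lyrics into text-to-music
resp = requests.post(f"{BASE}/music/generations", headers=HEADERS,
                     json={"model": "minimax:music-2.6", "lyrics": song["lyrics"]})
# Decode resp.json()["data"]["audio"] as in the sync example above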

Audio Translation (to English)

curl http://localhost:18080/v1/audio/translations \
  -H "Authorization: Bearer sk-test-123" \
  -F file="@german_audio.mp3" \
  -F model="whisper-1"

Advanced Features

switchAILocal passes through all standard and provider-specific parameters, so you can use tools, thinking mode, vision, and more, just like talking directly to the upstream provider.

Tool Calling (Web Search)

Built-in tools like web search are forwarded as-is to the upstream. MiniMax M2.7 is the canonical web-search provider: the model runs the search internally and returns the synthesized answer in choices[0].message.content.

Critical: set max_tokens >= 2000 when using web_search. Search results are folded into the prompt and inflate the context to 6k–13k tokens; lower budgets produce truncated or empty responses.

curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-test-123" \
  -d '{
    "model": "minimax:MiniMax-M2.7",
    "messages": [
      {"role": "user", "content": "What are the latest AI news today?"}
    ],
    "max_tokens": 2000,
    "tools": [
      {
        "type": "web_search",
        "max_keyword": 3,
        "force_search": true,
        "limit": 3
      }
    ],
    "tool_choice": "auto"
  }'

Function Calling (OpenAI Standard)

Standard OpenAI function calling works with any compatible provider:

curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-test-123" \
  -d '{
    "model": "gemini-2.5-pro",
    "messages": [
      {"role": "user", "content": "What is the weather in Berlin?"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get current weather for a location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {"type": "string", "description": "City name"},
              "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["location"]
          }
        }
      }
    ],
    "tool_choice": "auto"
  }'
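
The request above only covers the first leg; a complete loop reads tool_calls from the response, runs the function locally, and sends the result back (a minimal sketch; get_weather and its stubbed result are hypothetical stand-ins, and a real client should handle the case where the model answers without calling a tool):

import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:18080/v1", api_key="sk-test-123")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string", "description": "City name"}},
            "required": ["location"],
        },
    },
}]

messages = [{"role": "user", "content": "What is the weather in Berlin?"}]
first = client.chat.completions.create(
    model="gemini-2.5-pro", messages=messages, tools=tools, tool_choice="auto"
)

call = first.choices[0].message.tool_calls[0]
args = json.loads(call.function.arguments)
result = {"location": args["location"], "temp_c": 21}  # stand-in for a real lookup

# Return the tool result so the model can compose the final answer
messages.append(first.choices[0].message)
messages.append({"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)})
final = client.chat.completions.create(model="gemini-2.5-pro", messages=messages, tools=tools)
print(final.choices[0].message.content)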

Thinking / Reasoning Mode

Enable extended thinking for complex reasoning tasks:

curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-test-123" \
  -d '{
    "model": "claudecli:claude-sonnet-4",
    "messages": [
      {"role": "user", "content": "Prove that the square root of 2 is irrational"}
    ],
    "thinking": {
      "type": "enabled",
      "budget_tokens": 10000
    }
  }'
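
thinking is not a standard OpenAI SDK parameter, so SDK users can pass it through extra_body, which merges into the top-level request JSON (a sketch):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:18080/v1", api_key="sk-test-123")

completion = client.chat.completions.create(
    model="claudecli:claude-sonnet-4",
    messages=[{"role": "user", "content": "Prove that the square root of 2 is irrational"}],
    # Merged into the top-level JSON body, matching the curl payload above
    extra_body={"thinking": {"type": "enabled", "budget_tokens": 10000}},
)
print(completion.choices[0].message.content)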

Vision (Image Analysis)

Send images for analysis using the standard multimodal content format:

curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-test-123" \
  -d '{
    "model": "geminicli:gemini-2.5-pro",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "What do you see in this image?"},
          {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
        ]
      }
    ]
  }'

CLI Attachments (Files & Folders)

Pass files and folders directly to CLI providers via extra_body.cli:

curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-test-123" \
  -d '{
    "model": "geminicli:gemini-2.5-pro",
    "messages": [
      {"role": "user", "content": "Review this code for security issues"}
    ],
    "extra_body": {
      "cli": {
        "files": ["/path/to/main.go", "/path/to/server.go"],
        "directories": ["/path/to/internal/"]
      }
    }
  }'
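
From the SDK the attachment block travels the same way; note that the SDK's extra_body kwarg merges keys into the request root, so reproducing the literal extra_body field from the curl above takes one extra level of nesting (a sketch; the paths are placeholders):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:18080/v1", api_key="sk-test-123")

completion = client.chat.completions.create(
    model="geminicli:gemini-2.5-pro",
    messages=[{"role": "user", "content": "Review this code for security issues"}],
    # Produces {"extra_body": {"cli": {...}}} on the wire, as in the curl example
    extra_body={"extra_body": {"cli": {
        "files": ["/path/to/main.go", "/path/to/server.go"],
        "directories": ["/path/to/internal/"],
    }}},
)
print(completion.choices[0].message.content)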

Streaming

All chat endpoints support SSE streaming:

curl http://localhost:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-test-123" \
  -N \
  -d '{
    "model": "gemini-2.5-pro",
    "messages": [{"role": "user", "content": "Tell me a story"}],
    "stream": true
  }'

SDK Integration

Python (OpenAI SDK)

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:18080/v1",
    api_key="sk-test-123",  # Must match a key in config.yaml
)

# Recommended: Auto-routing (switchAILocal picks the best available provider)
completion = client.chat.completions.create(
    model="gemini-2.5-pro",  # No prefix = auto-route to any logged-in provider
    messages=[
        {"role": "user", "content": "What is the meaning of life?"}
    ]
)
print(completion.choices[0].message.content)

# Streaming example
stream = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

# Optional: Explicit provider selection (use prefix only when needed)
completion = client.chat.completions.create(
    model="ollama:llama3.2",  # Force Ollama provider
    messages=[{"role": "user", "content": "Hello!"}]
)

JavaScript/Node.js (OpenAI SDK)

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost:18080/v1',
  apiKey: 'sk-test-123', // Must match a key in config.yaml
});

async function main() {
  // Auto-routing
  const completion = await client.chat.completions.create({
    model: 'gemini-2.5-pro',
    messages: [
      { role: 'user', content: 'What is the meaning of life?' }
    ],
  });

  console.log(completion.choices[0].message.content);

  // Explicit provider selection
  const ollamaResponse = await client.chat.completions.create({
    model: 'ollama:llama3.2',  // Force Ollama
    messages: [
      { role: 'user', content: 'Hello!' }
    ],
  });
}

main();

Streaming Example (Python)

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:18080/v1",
    api_key="sk-test-123",
)

stream = client.chat.completions.create(
    model="geminicli:gemini-2.5-pro",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Configuration

All settings are in config.yaml. Copy the example to get started:

cp config.example.yaml config.yaml

Key configuration options:

# Server port (default: 18080)
port: 18080

# Enable Ollama integration
ollama:
  enabled: true
  base-url: "http://localhost:11434"

# Enable LM Studio
lmstudio:
  enabled: true
  base-url: "http://localhost:1234/v1"

# Enable Lua plugins for request/response modification
plugin:
  enabled: true
  plugin-dir: "./plugins"

πŸ“– See Configuration Guide for all options.


Cortex Router: Intelligent Model Selection

The Cortex Router plugin provides intelligent, multi-tier routing that automatically selects the optimal model based on request content.

Quick Start

Enable intelligent routing in config.yaml:

plugin:
  enabled: true
  enabled-plugins:
    - "cortex-router"

intelligence:
  enabled: true
  router-model: "ollama:qwen:0.5b"  # Fast classification model
  matrix:
    coding: "switchai-chat"
    reasoning: "switchai-reasoner"
    fast: "switchai-fast"
    secure: "ollama:llama3.2"  # Local model for sensitive data

How It Works

When you use model="auto" or model="cortex", the router analyzes your request through multiple tiers:

  1. Reflex Tier (<1 ms): pattern matching for obvious cases (code blocks → coding model, PII → secure model)
  2. Semantic Tier (<20 ms): embedding-based intent matching (requires Phase 2)
  3. Cognitive Tier (200–500 ms): LLM-based classification with confidence scoring

# Automatic intelligent routing
completion = client.chat.completions.create(
    model="auto",  # Let Cortex Router decide
    messages=[{"role": "user", "content": "Write a Python function to sort a list"}]
)
# → Routes to the coding model automatically

Phase 2 Features (Optional)

Enable advanced features for even smarter routing:

intelligence:
  enabled: true
  
  # Semantic matching (faster than LLM classification)
  embedding:
    enabled: true
  semantic-tier:
    enabled: true
  
  # Skill-based prompt augmentation
  skill-matching:
    enabled: true
  
  # Quality-based model cascading
  cascade:
    enabled: true

21 Pre-built Skills, including:

  • Language experts (Go, Python, TypeScript)
  • Infrastructure (Docker, Kubernetes, DevOps)
  • Security, Testing, Debugging
  • Frontend, Vision, and more

📖 See the Cortex Router Phase 2 Guide for full documentation.


Documentation

📚 Read the Official switchAILocal Documentation

For Users

| Guide | Description |
| --- | --- |
| Installation | Getting started guide |
| Performance Guide | Rate limiting, load shedding, and profiling |
| Configuration | All configuration options |
| Providers | Setting up AI providers |
| API Reference | REST API documentation |
| Intelligent Systems | Memory, Heartbeat, Steering, and Hooks |
| Advanced Features | Payload overrides, failover, and more |
| State Box | Secure state management & configuration |
| Management Dashboard | Modern web UI for provider setup, model routing & settings |

Build from Source

# Recommended: use the setup script
./ail.sh setup

# OR build manually
go build -o switchAILocal ./cmd/server
go build -o bridge-agent ./cmd/bridge-agent

# Build the Management UI (optional)
./ail_ui.sh

For Developers

| Guide | Description |
| --- | --- |
| SDK Usage | Embed switchAILocal in your Go apps |
| Lua Plugins | Custom request/response hooks |
| SDK Advanced | Create custom providers |

Contributing

Contributions are welcome!

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes
  4. Push and open a Pull Request

License

MIT License - see LICENSE for details.


Maintained by Sebastian Schkudlara
