Multi-provider LLM gateway with automatic fallback and cost tracking. Provides a single HTTP API that routes requests across DeepSeek, Gemini, OpenAI, Anthropic, Ollama — and any OpenAI-compatible API — trying cheaper providers first and falling back automatically on failure.
```bash
# Install dependencies
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Set up at least one provider
export LLM_PROVIDER=deepseek
export DEEPSEEK_API_KEY=your-key

# Start the server
python main.py
```

The server runs on http://localhost:8090 by default.
All endpoints are served under /api/v1.0/.
| Endpoint | Method | Description |
|---|---|---|
| `/api/v1.0/classify` | POST | Classify items using AI (returns JSON) |
| `/api/v1.0/plan` | POST | Generate structured plans using AI (returns JSON) |
| `/api/v1.0/embed` | POST | Generate text embeddings (requires `OPENAI_API_KEY`) |
| `/api/v1.0/chat/completions` | POST | OpenAI-compatible chat with optional tool call support |
| `/api/v1.0/health` | GET | Health check with provider status |
Deprecated endpoints: The following unversioned routes still work but are deprecated and will be removed in a future release. Migrate to the `/api/v1.0/` equivalents above.
| Legacy Endpoint | Replacement |
|---|---|
| `POST /classify` | `POST /api/v1.0/classify` |
| `POST /plan` | `POST /api/v1.0/plan` |
| `POST /embed` | `POST /api/v1.0/embed` |
| `POST /v1/chat/completions` | `POST /api/v1.0/chat/completions` |
| `GET /health` | `GET /api/v1.0/health` |
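For clients still calling the old paths, the mapping above can be applied with a tiny shim. This helper is illustrative only, not part of the gateway:

```python
# Deprecated unversioned routes and their /api/v1.0 replacements,
# matching the migration table above.
LEGACY_ROUTES = {
    "/classify": "/api/v1.0/classify",
    "/plan": "/api/v1.0/plan",
    "/embed": "/api/v1.0/embed",
    "/v1/chat/completions": "/api/v1.0/chat/completions",
    "/health": "/api/v1.0/health",
}

def migrate_path(path: str) -> str:
    """Return the versioned equivalent of a legacy path; pass through otherwise."""
    return LEGACY_ROUTES.get(path, path)
```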
Send a prompt, get back a JSON classification response.
```bash
curl -X POST http://localhost:8090/api/v1.0/classify \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Classify these items: ..."}'
```

Generate a structured plan from context and a system prompt.
```bash
curl -X POST http://localhost:8090/api/v1.0/plan \
  -H "Content-Type: application/json" \
  -d '{
    "context": {"task": "...", "constraints": []},
    "system_prompt": "You are a planner. Return JSON."
  }'
```

Generate text embeddings using OpenAI's embedding models.
```bash
curl -X POST http://localhost:8090/api/v1.0/embed \
  -H "Content-Type: application/json" \
  -d '{"text": "text to embed"}'
```

Request body:

- `text`: String or list of strings to embed
- `model`: Embedding model (default: `text-embedding-ada-002`)
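Since `text` accepts either a single string or a list, a client can normalize before sending. A minimal sketch (the helper is hypothetical, not part of the gateway):

```python
import json

def build_embed_request(text, model="text-embedding-ada-002"):
    """Build an /api/v1.0/embed request body; `text` may be a str or a list of str."""
    if isinstance(text, str):
        text = [text]  # the endpoint accepts both; a list keeps responses uniform
    return json.dumps({"text": text, "model": model})
```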
Response:
```json
{
  "embeddings": [[0.1, 0.2, ...]],
  "model": "text-embedding-ada-002",
  "dimensions": 1536,
  "ai_call_log": {
    "provider": "openai",
    "model": "text-embedding-ada-002",
    "prompt_tokens": 5,
    "completion_tokens": 0,
    "cost_microcents": 1,
    "latency_ms": 150,
    "success": true
  }
}
```

OpenAI-compatible endpoint supporting optional tool calls. Provider-specific translation (e.g. Anthropic tool format) is handled transparently.
```bash
curl -X POST http://localhost:8090/api/v1.0/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```

Check service health and provider status.
```bash
curl http://localhost:8090/api/v1.0/health
```

Response:
```json
{
  "status": "healthy",
  "providers": [{"name": "deepseek", "model": "deepseek-chat"}],
  "embeddings_available": true
}
```

All configuration is via environment variables. Copy `.env.example` to `.env` and fill in your keys. Provider definitions (pricing, timeouts, features) live in `providers.json`.
| Variable | Default | Description |
|---|---|---|
| `LLM_PROVIDER` | `auto` | Provider: `auto`, `ollama`, `deepseek`, `gemini`, `openai`, `anthropic` |
When LLM_PROVIDER=auto, providers are tried in the priority order defined in providers.json (default: cheapest first). Only providers with configured env vars are used.
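The selection logic can be pictured as a filter-then-sort. This is an illustrative simplification, not the gateway's actual code; the `priority` and `env_key` field names are assumptions for the sketch:

```python
# Simplified view of auto mode: keep providers whose required key is
# configured, ordered cheapest-first by priority.
def eligible_providers(registry: dict, env: dict) -> list:
    """Providers with a configured key, in priority order."""
    ranked = sorted(registry.items(), key=lambda kv: kv[1]["priority"])
    return [name for name, cfg in ranked if env.get(cfg["env_key"])]

registry = {
    "openai":   {"priority": 3, "env_key": "OPENAI_API_KEY"},
    "deepseek": {"priority": 1, "env_key": "DEEPSEEK_API_KEY"},
    "gemini":   {"priority": 2, "env_key": "GEMINI_API_KEY"},
}
env = {"DEEPSEEK_API_KEY": "k1", "OPENAI_API_KEY": "k2"}
# deepseek is tried first, then openai; gemini is skipped (no key set)
```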
| Variable | Description |
|---|---|
| `OLLAMA_HOST` | Ollama server URL (e.g., `http://localhost:11434`) |
| `OLLAMA_MODEL` | Ollama model (e.g., `qwen2.5-coder:14b`) |
| `DEEPSEEK_API_KEY` | DeepSeek API key |
| `DEEPSEEK_MODEL` | DeepSeek model (default: `deepseek-chat`) |
| `GEMINI_API_KEY` | Google Gemini API key |
| `GEMINI_MODEL` | Gemini model (default: `gemini-2.0-flash`) |
| `OPENAI_API_KEY` | OpenAI API key (also required for `/embed`) |
| `OPENAI_MODEL` | OpenAI model (default: `gpt-4o-mini`) |
| `ANTHROPIC_API_KEY` | Anthropic API key |
| `ANTHROPIC_MODEL` | Anthropic model (default: `claude-3-5-sonnet-20241022`) |
At least one provider must have its required env vars configured (API key, or host for Ollama). Model env vars are optional — defaults come from providers.json.
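That fallback can be expressed in a few lines; the function itself is hypothetical, but the field names match the `providers.json` schema documented in this README:

```python
# Illustrative: prefer the model env var when set, else the
# default_model from providers.json.
def resolve_model(cfg: dict, env: dict) -> str:
    """Resolve a provider's model name from the environment or its default."""
    return env.get(cfg.get("env_model", ""), "") or cfg["default_model"]

cfg = {"env_model": "DEEPSEEK_MODEL", "default_model": "deepseek-chat"}
```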
| Variable | Default | Description |
|---|---|---|
| `PORT` | `8090` | HTTP port |
| `LOG_LEVEL` | `INFO` | Logging level |
Any provider with an OpenAI-compatible API (Groq, Together, Mistral, etc.) can be added with just a JSON entry. Add to providers.json:
```json
{
  "providers": {
    "groq": {
      "kind": "openai_compatible",
      "base_url": "https://api.groq.com/openai/v1",
      "env_key": "GROQ_API_KEY",
      "env_model": "GROQ_MODEL",
      "default_model": "llama-3.3-70b-versatile",
      "timeout": 60,
      "features": { "tool_calls": true, "json_mode": true },
      "pricing": { "input_per_1k_microcents": 0.59, "output_per_1k_microcents": 0.79 }
    }
  }
}
```

Then set `GROQ_API_KEY` in your environment. That's it — no Python changes needed.
For providers with non-OpenAI APIs (like Anthropic or Gemini), create a provider class in providers/ that extends Provider, then register its kind in providers/registry.py's _KIND_MAP.
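A rough sketch of that extension point. The class and map names follow the text above, but the actual `Provider` base-class interface in `providers/` may differ, and the custom provider here is entirely hypothetical:

```python
# Hypothetical shapes; see providers/ and providers/registry.py for the real ones.
class Provider:
    def __init__(self, config: dict):
        self.config = config

class MyCustomProvider(Provider):
    """A provider with a non-OpenAI API would translate requests here."""
    def chat(self, messages: list) -> dict:
        return {"role": "assistant", "content": "stub reply"}

# Register the new kind so providers.json entries can reference it.
_KIND_MAP = {"my_custom": MyCustomProvider}

def make_provider(cfg: dict) -> Provider:
    return _KIND_MAP[cfg["kind"]](cfg)
```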
| Field | Required | Description |
|---|---|---|
| `kind` | Yes | Provider class: `openai_compatible`, `anthropic`, `gemini`, `ollama` |
| `env_key` | Yes* | Env var for API key (*or `env_host` for Ollama) |
| `env_model` | No | Env var to override default model |
| `default_model` | Yes | Fallback model if env var is unset |
| `base_url` | No | API base URL (omit for default OpenAI endpoint) |
| `timeout` | No | Request timeout in seconds (default: 300) |
| `api_params` | No | Extra API params: `max_tokens`, `temperature`, etc. |
| `features` | No | `tool_calls`, `json_mode`, `reasoning_content` |
| `pricing` | No | `input_per_1k_microcents`, `output_per_1k_microcents` |
```bash
# Run all tests
pytest -v

# Run with coverage
pytest --cov=. --cov-report=term-missing

# Run specific test file
pytest tests/test_providers.py -v
```

Pre-built images are published to GitHub Container Registry on every release.
```bash
# Pull and run
docker run -p 8090:8090 \
  -e LLM_PROVIDER=auto \
  -e DEEPSEEK_API_KEY=key \
  ghcr.io/nullrabbitlabs/llm-gateway:latest
```

Pin to a specific version in production:
```bash
docker pull ghcr.io/nullrabbitlabs/llm-gateway:1.0.0
```

To build locally instead:
```bash
docker build -t llm-gateway .
docker run -p 8090:8090 \
  -e LLM_PROVIDER=auto \
  -e DEEPSEEK_API_KEY=key \
  llm-gateway
```

```
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│  Your Svc A │ │  Your Svc B │ │  Your Svc C │
│             │ │             │ │             │
└──────┬──────┘ └──────┬──────┘ └──────┬──────┘
       │ HTTP          │ HTTP          │ HTTP
       ▼               ▼               ▼
┌──────────────────────────────────────────────────────┐
│                 llm-gateway (Python)                 │
│  ┌────────────────────────────────────────────────┐  │
│  │           providers.json (registry)            │  │
│  │  ┌──────────────────────────────────────────┐  │  │
│  │  │ OpenAI-compatible: DeepSeek, OpenAI, ... │  │  │
│  │  │ Custom: Anthropic, Gemini, Ollama        │  │  │
│  │  └──────────────────────────────────────────┘  │  │
│  │ Features: Auto-fallback, Cost tracking, Retries│  │
│  │  Endpoints: /api/v1.0/classify, /plan, /embed  │  │
│  │             /chat/completions, /health         │  │
│  └────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────┘
```
See CONTRIBUTING.md.
LLM Gateway is the provider abstraction layer used by NullRabbit's AI agents for autonomous threat analysis across validator infrastructure and decentralised networks.
It is open-sourced as a standalone tool because multi-provider routing with cost tracking and automatic fallback is useful beyond security — if you're building AI agents or pipelines that need resilient LLM access, this does the job.
MIT — see LICENSE.