Multi-provider LLM proxy for Claude Code. Route different agent roles to different model providers with automatic fallback, racing, circuit breakers, and a native desktop GUI.
- Session pool timer leak fix — `closeAll()` now clears the `sweepTimer` interval (#199)
- Tauri GUI crash protection — defensive `if let` replaces unsafe `.unwrap()` in setup (#200)
- WS reconnect timer cleanup — prevent dual polling on reconnect (#198)
- Monitor signal handler — defensive exit handler prevents double-signal crashes (#197)
- Per-model connection pools — each model gets its own HTTP/2 connection for TCP isolation (#186)
- GOAWAY-aware retry — graceful HTTP/2 drain no longer marks pool as "failed" (#188)
View all releases · Full changelog
ModelWeaver sits between Claude Code and upstream model providers as a local HTTP proxy. It inspects the model field in each Anthropic Messages API request and routes it to the best-fit provider.
```
Claude Code ──→ ModelWeaver ──→ Anthropic (primary)
                (localhost) ──→ OpenRouter (fallback)
```

1. Match exact model name (`modelRouting`)
2. Match tier via substring (`tierPatterns`)
3. Fall back on 429 / 5xx errors
4. Race remaining providers on 429
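Sketched in TypeScript, the decision order looks roughly like this (`resolveRoute` and the argument shapes are hypothetical, not ModelWeaver's actual internals; the real router also handles weights, racing, and fallback):

```typescript
// Illustrative sketch of the routing decision order described above.
type Route = { provider: string; model?: string };

function resolveRoute(
  model: string,
  modelRouting: Record<string, Route[]>,
  tierPatterns: Record<string, string[]>,
  routing: Record<string, Route[]>,
): Route[] | undefined {
  // 1. An exact model name match wins.
  if (model in modelRouting) return modelRouting[model];
  // 2. Otherwise, substring-match the model name against tier patterns.
  for (const [tier, patterns] of Object.entries(tierPatterns)) {
    if (patterns.some((p) => model.includes(p))) return routing[tier];
  }
  // 3. No match: the proxy responds 502 with a descriptive error.
  return undefined;
}
```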
- Tier-based routing — route by model family (sonnet/opus/haiku) using substring pattern matching
- Exact model routing — route specific model names to dedicated providers (checked first)
- Automatic fallback — transparent failover on rate limits (429) and server errors (5xx)
- Adaptive racing — on 429, automatically races remaining providers simultaneously
- Model name rewriting — each provider in the chain can use a different model name
- Weighted distribution — spread traffic across providers by weight percentage
- Circuit breaker — per-provider circuit breaker with closed/open/half-open states, prevents hammering unhealthy providers
- Request hedging — sends multiple copies when a provider shows high latency variance (CV > 0.5), returns the fastest response
- TTFB timeout — fails slow providers before full timeout elapses (configurable per provider)
- Stall detection — detects stalled streams and aborts them, triggering fallback
- Connection pooling — per-provider undici Agent dispatcher with configurable pool size
- Per-model connection pools — isolate HTTP/2 connections per model via `modelPools` config for TCP-level isolation
- Connection retry — automatic retry with exponential backoff for stale connections, TTFB timeouts, and GOAWAY drains
- Session agent pooling — reuses HTTP/2 agents across requests within the same session for connection affinity
- Adaptive TTFB — dynamically adjusts TTFB timeout based on observed latency history
- GOAWAY-aware retry — graceful HTTP/2 GOAWAY drain no longer marks pool as "failed"
- Stream buffering — optional time-based and size-based SSE buffering (`streamBufferMs`, `streamBufferBytes`)
- Health scores — per-provider health scoring based on latency and error rates
- Provider error tracking — per-provider error counts with status code breakdown, displayed in GUI in real-time
- Concurrent limits — cap concurrent requests per provider
- Interactive setup wizard — guided configuration with API key validation, hedging config, and provider editing
- Config hot-reload — changes to config file are picked up automatically, no restart needed
- Daemon mode — background process with auto-restart, launchd integration, and reload support
- Desktop GUI — native Tauri app with real-time progress bars, provider health, error breakdown, and recent request history
Claude Code sessions die when the API returns 429 or 5xx — you lose context, wait, and retry manually. With multiple subagents (Explore, Plan, code review) firing simultaneously, one rate limit can cascade into multiple failed agents.
- 429 → instant race: Instead of waiting for retry-after, all remaining providers are raced simultaneously. Recovery in <2s vs 30-60s wait.
- Circuit breaker (3 failures in 60s → open): A degraded provider stops receiving traffic within seconds, not minutes. Auto-recovers via half-open probing.
- Connection retry (exponential backoff, up to 5 retries): Stale HTTP/2 connections, TTFB timeouts, and GOAWAY frames are retried transparently before escalating to fallback.
- Global backoff: If ALL providers are unhealthy, returns 503 immediately instead of wasting 30+ seconds trying each one sequentially.
Running everything through Anthropic API at Tier 1/2 rates gets expensive. A full Claude Code session with multiple subagents can generate 50-100+ API calls.
- Weighted routing with health blending: `finalWeight = (1 - 0.3) * staticWeight + 0.3 * healthScore`. Traffic distribution automatically shifts toward healthier providers when a provider degrades.
- Tier-based routing: Haiku-tier Explore agents (cheap, fast) never accidentally hit Opus-tier pricing. Sonnet coding agents don't burn expensive Opus tokens.
- Model rewriting per provider: The same `claude-sonnet-4-6` model name can route to different models on different providers — zero config changes in Claude Code.
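The health-blending formula above, as a one-liner (the function name is hypothetical):

```typescript
// Health-blended provider weight: 70% static config weight, 30% live health.
function finalWeight(staticWeight: number, healthScore: number): number {
  return (1 - 0.3) * staticWeight + 0.3 * healthScore;
}
```

A provider configured at weight 0.7 with a perfect health score blends to 0.79; as its health score drops, its share of traffic drops with it.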
Some providers have extreme latency variance (CV 1.5-4.0). A request that normally takes 3 seconds might take 30 seconds, freezing your coding session.
- Request hedging (CV > 0.5 triggers): Sends 2-4 copies of the same request simultaneously and returns the fastest response. Production data shows providers with CV 1.5-4.0 benefit significantly from hedging.
- Adaptive TTFB: Dynamically adjusts the "time to first byte" timeout based on each provider's observed latency history. No false positives on slow-but-healthy providers, fast failure on stuck ones.
- Stall detection (default 15s): If a streaming response stops sending data mid-stream, it aborts and falls back immediately.
- Health-score reordering: Providers with health scores below 0.5 are automatically deprioritized to the end of fallback chains. Score = `0.7 * successRate + 0.3 * latencyScore` (5-minute rolling window).
- Session agent pooling: Reuses HTTP/2 connections across requests within the same session. Eliminates per-request TCP+TLS handshake overhead — critical when subagents fire 10+ requests in rapid succession.
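A sketch of the CV-based hedging trigger (helper names are hypothetical; the real proxy tracks per-provider latency history):

```typescript
// Coefficient of variation (CV) = standard deviation / mean of recent latencies.
function coefficientOfVariation(latenciesMs: number[]): number {
  const mean = latenciesMs.reduce((a, b) => a + b, 0) / latenciesMs.length;
  const variance =
    latenciesMs.reduce((a, b) => a + (b - mean) ** 2, 0) / latenciesMs.length;
  return Math.sqrt(variance) / mean;
}

// Hedge (send speculative duplicate requests) when variance is high.
function shouldHedge(latenciesMs: number[], cvThreshold = 0.5): boolean {
  return latenciesMs.length >= 2 && coefficientOfVariation(latenciesMs) > cvThreshold;
}
```

Stable latencies (CV near 0) never hedge; a provider that usually answers in 3 s but sometimes takes 30 s crosses the threshold quickly.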
When coding through a proxy, you're normally blind to why responses are slow or failing.
- Desktop GUI: Real-time progress bars showing which provider handled each request, response time, and whether hedging fired.
- Health scores API: `curl /api/health-scores` shows per-provider scores (0-1). A score of 0.3 means the provider is failing ~50% of requests.
- Error breakdown: Per-provider error counts with status code breakdown. Spot patterns (e.g., a provider returning 502s consistently).
- Circuit breaker state: See which providers are open/closed/half-open in real-time.
- Hot-reload (300ms debounce): Edit `config.yaml` and the daemon picks up changes automatically. No restart, no killed in-flight requests.
- SIGHUP reload: After rebuilding from source, `modelweaver reload` restarts the worker without killing the monitor.
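The health score surfaced by `/api/health-scores` follows the formula given earlier (0.7 × successRate + 0.3 × latencyScore over a rolling window); a minimal sketch, with hypothetical helper names:

```typescript
// Per-provider health score: success rate dominates, latency contributes 30%.
function healthScore(successRate: number, latencyScore: number): number {
  return 0.7 * successRate + 0.3 * latencyScore;
}

// Providers scoring below the threshold are pushed to the end of fallback chains.
function isDeprioritized(score: number, threshold = 0.5): boolean {
  return score < threshold;
}
```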
| Pain Point | Feature | Advantage |
|---|---|---|
| 429 rate limits kill sessions | Adaptive racing on 429 | <2s recovery vs 30-60s wait |
| Provider downtime | Fallback chains + circuit breaker | Automatic failover, no manual intervention |
| High latency variance | Hedging (CV > 0.5) + adaptive TTFB | 3-5s responses even with 30s tail latency |
| Expensive API bills | Weighted distribution + tier routing | Traffic to cheapest acceptable provider |
| Blind to failures | GUI + health scores + error tracking | Know exactly what's failing and why |
| Stale connections | Connection retry + GOAWAY handling | Transparent recovery, no visible errors |
| Config changes need restart | Hot-reload (300ms debounce) | Change weights mid-session, zero downtime |
| Connection overhead per request | Session agent pooling (HTTP/2 reuse) | Faster sequential requests from subagents |
- Node.js 20 or later — Install Node.js
- `npx` — included with Node.js (no separate install needed)
ModelWeaver requires no permanent install — `npx` downloads and runs it on the fly. But if you prefer a global install:

```shell
npm install -g @kianwoon/modelweaver
```

After that, replace `npx @kianwoon/modelweaver` with `modelweaver` (or the shorter `mw`) in all commands below.
```shell
npx @kianwoon/modelweaver init
```

The wizard guides you through:
- Selecting from 6 preset providers (Anthropic, OpenRouter, Together AI, GLM/Z.ai, Minimax, Fireworks)
- Testing API keys to verify connectivity
- Setting up model routing tiers and hedging config
- Creating `~/.modelweaver/config.yaml` and `~/.modelweaver/.env`
```shell
# Foreground (see logs in terminal)
npx @kianwoon/modelweaver

# Background daemon (auto-restarts on crash)
npx @kianwoon/modelweaver start

# Install as launchd service (auto-start at login)
npx @kianwoon/modelweaver install
```

Then point Claude Code at the proxy:

```shell
export ANTHROPIC_BASE_URL=http://localhost:3456
export ANTHROPIC_API_KEY=unused-but-required
claude
```

```shell
npx @kianwoon/modelweaver init       # Interactive setup wizard
npx @kianwoon/modelweaver start      # Start as background daemon
npx @kianwoon/modelweaver stop       # Stop background daemon
npx @kianwoon/modelweaver status     # Show daemon status + service state
npx @kianwoon/modelweaver remove     # Stop daemon + remove PID and log files
npx @kianwoon/modelweaver reload     # Reload daemon worker (after rebuild)
npx @kianwoon/modelweaver install    # Install launchd service (auto-start at login)
npx @kianwoon/modelweaver uninstall  # Uninstall launchd service
npx @kianwoon/modelweaver gui        # Launch desktop GUI (auto-downloads binary)
npx @kianwoon/modelweaver [options]  # Run in foreground
```

Options:

```
-p, --port <number>    Server port (default: from config)
-c, --config <path>    Config file path (auto-detected)
-v, --verbose          Enable debug logging (default: off)
-h, --help             Show help
--global               Edit global config only
--path <file>          Write config to a specific file
```
Run ModelWeaver as a background process that survives terminal closure and auto-recovers from crashes.
```shell
npx @kianwoon/modelweaver start      # Start (forks monitor + daemon)
npx @kianwoon/modelweaver status     # Check if running
npx @kianwoon/modelweaver reload     # Reload worker after rebuild
npx @kianwoon/modelweaver stop       # Graceful stop (SIGTERM → SIGKILL after 5s)
npx @kianwoon/modelweaver remove     # Stop + remove PID file + log file
npx @kianwoon/modelweaver install    # Install launchd service
npx @kianwoon/modelweaver uninstall  # Uninstall launchd service
```

How it works: `start` forks a lightweight monitor process that owns the PID file. The monitor spawns the actual daemon worker. If the worker crashes, the monitor auto-restarts it with exponential backoff starting at 500ms (up to 10 attempts). After 60 seconds of stable running, the restart counter resets.
```
modelweaver.pid → Monitor process (handles signals, watches child)
  └── modelweaver.worker.pid → Daemon worker (runs HTTP server)
```
Files:
- `~/.modelweaver/modelweaver.pid` — monitor PID
- `~/.modelweaver/modelweaver.worker.pid` — worker PID
- `~/.modelweaver/modelweaver.log` — daemon output log
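The monitor's restart schedule can be sketched as follows (the doubling factor is an assumption for illustration; the docs only state exponential backoff starting at 500 ms, up to 10 attempts):

```typescript
// Hypothetical restart-delay schedule for the daemon monitor.
const MAX_RESTARTS = 10;

function restartDelayMs(attempt: number, baseMs = 500): number | null {
  if (attempt >= MAX_RESTARTS) return null; // give up after 10 attempts
  return baseMs * 2 ** attempt;             // 500, 1000, 2000, ...
}
```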
ModelWeaver ships a native desktop GUI built with Tauri. No Rust toolchain needed — the binary is auto-downloaded from GitHub Releases.
```shell
npx @kianwoon/modelweaver gui
```

First run downloads the latest binary for your platform (~10-30 MB). Subsequent launches use the cached version.
GUI features:
- Real-time progress bars with provider name and model info
- Provider health cards with error counts and status code breakdown
- Recent request history sorted by timestamp
- Config validation error banner
- Auto-reconnect on daemon restart
Supported platforms:
| Platform | Format |
|---|---|
| macOS (Apple Silicon) | .dmg |
| macOS (Intel) | .dmg |
| Linux (x86_64) | .AppImage |
| Windows (x86_64) | .msi |
Cached files are stored in ~/.modelweaver/gui/ with version tracking — new versions download automatically on the next gui launch.
Checked in order (first found wins):
1. `./modelweaver.yaml` (project-local)
2. `~/.modelweaver/config.yaml` (user-global)
```yaml
server:
  port: 3456                   # Server port (default: 3456)
  host: localhost              # Bind address (default: localhost)
  streamBufferMs: 0            # Time-based stream flush threshold (default: disabled)
  streamBufferBytes: 0         # Size-based stream flush threshold (default: disabled)
  globalBackoffEnabled: true   # Global backoff on repeated failures (default: true)
  unhealthyThreshold: 0.5      # Health score below which provider is unhealthy (default: 0.5, 0–1)
  maxBodySizeMB: 10            # Max request body size in MB (default: 10, 1–100)
  sessionIdleTtlMs: 600000     # Session agent pool idle TTL in ms (default: 600000 / 10min, min: 60000)
  disableThinking: false       # Strip thinking blocks from requests (default: false)

# Adaptive request hedging
hedging:
  speculativeDelay: 500        # ms before starting backup providers (default: 500)
  cvThreshold: 0.5             # latency CV threshold for hedging (default: 0.5)
  maxHedge: 4                  # max concurrent copies per request (default: 4)

providers:
  anthropic:
    baseUrl: https://api.anthropic.com
    apiKey: ${ANTHROPIC_API_KEY}      # Env var substitution
    timeout: 20000                    # Request timeout in ms (default: 20000)
    ttfbTimeout: 8000                 # TTFB timeout in ms (default: 8000)
    stallTimeout: 15000               # Stall detection timeout (default: 15000)
    poolSize: 10                      # Connection pool size (default: 10)
    concurrentLimit: 10               # Max concurrent requests (default: unlimited)
    connectionRetries: 3              # Retries for stale connections (default: 3, max: 10)
    staleAgentThresholdMs: 30000      # Mark pooled agent stale after idle ms (optional)
    modelPools:                       # Per-model pool size overrides (optional)
      "claude-sonnet-4-20250514": 20
    modelLimits:                      # Per-provider token limits (optional)
      maxOutputTokens: 16384
    authType: anthropic               # "anthropic" | "bearer" (default: anthropic)
    circuitBreaker:                   # Per-provider circuit breaker (optional)
      failureThreshold: 3             # Failures before opening circuit (alias: threshold, default: 3)
      windowSeconds: 60               # Time window for failure count (default: 60)
      cooldownSeconds: 30             # Cooldown in seconds (alias: cooldown, default: 30)
      rateLimitCooldownSeconds: 10    # Shorter cooldown for 429 rate limits (optional)
  openrouter:
    baseUrl: https://openrouter.ai/api
    apiKey: ${OPENROUTER_API_KEY}
    authType: bearer
    timeout: 60000

# Exact model name routing (checked FIRST, before tier patterns)
modelRouting:
  "glm-5-turbo":
    - provider: anthropic
  "MiniMax-M2.7":
    - provider: openrouter
      model: minimax/MiniMax-M2.7     # With model name rewrite
  # Weighted distribution example:
  # "claude-sonnet-4":
  #   - provider: anthropic
  #     weight: 70
  #   - provider: openrouter
  #     weight: 30

# Tier-based routing (fallback chain)
routing:
  sonnet:
    - provider: anthropic
      model: claude-sonnet-4-20250514   # Optional: rewrite model name
    - provider: openrouter
      model: anthropic/claude-sonnet-4  # Fallback
  opus:
    - provider: anthropic
      model: claude-opus-4-20250514
  haiku:
    - provider: anthropic
      model: claude-haiku-4-5-20251001

# Pattern matching: model name includes any string → matched to tier
tierPatterns:
  sonnet: ["sonnet", "3-5-sonnet", "3.5-sonnet"]
  opus: ["opus", "3-opus", "3.5-opus"]
  haiku: ["haiku", "3-haiku", "3.5-haiku"]
```

- Exact model name (`modelRouting`) — if the request model matches exactly, use that route
- Weighted distribution — if the model has `weight` entries, requests are distributed across providers proportionally
- Tier pattern (`tierPatterns` + `routing`) — substring-match the model name against patterns, then use the tier's provider chain
- No match — returns 502 with a descriptive error listing configured tiers and model routes
- First provider is primary, rest are fallbacks
- Fallback triggers on: 429 (rate limit), 5xx (server error), network timeout, stream stall
- Adaptive race mode — when a 429 is received, remaining providers are raced simultaneously (not sequentially) for faster recovery
- Circuit breaker — providers that repeatedly fail are temporarily skipped (auto-recovers after cooldown, configurable window)
- No fallback on: 4xx (bad request, auth failure, forbidden) — returned immediately
- Model rewriting: each provider entry can override the `model` field in the request body
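Putting the fallback rules together as a sketch in TypeScript (illustrative only: the real proxy streams responses, races the remaining providers on 429, and consults circuit breakers):

```typescript
// Illustrative fallback loop: try providers in order, fall back on 429/5xx,
// return 4xx immediately. `send` is a stand-in for the actual HTTP call.
type ProviderEntry = { provider: string; model?: string };
type Result = { status: number; provider: string };

async function tryChain(
  chain: ProviderEntry[],
  send: (entry: ProviderEntry) => Promise<Result>,
): Promise<Result> {
  let last: Result = { status: 502, provider: "none" };
  for (const entry of chain) {
    const res = await send(entry);    // model rewrite is applied per entry
    if (res.status < 400) return res; // success
    if (res.status !== 429 && res.status < 500) return res; // 4xx: no fallback
    last = res;                       // 429 / 5xx: try the next provider
  }
  return last; // all providers failed
}
```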
| Provider | Auth Type | Base URL |
|---|---|---|
| Anthropic | x-api-key | https://api.anthropic.com |
| OpenRouter | Bearer | https://openrouter.ai/api |
| Together AI | Bearer | https://api.together.xyz |
| GLM (Z.ai) | x-api-key | https://api.z.ai/api/anthropic |
| Minimax | x-api-key | https://api.minimax.io/anthropic |
| Fireworks | Bearer | https://api.fireworks.ai/inference/v1 |
Any OpenAI/Anthropic-compatible API works — just set `baseUrl` and `authType` appropriately.
In daemon mode, ModelWeaver watches the config file for changes and reloads automatically (debounced 300ms). You can also send a manual reload signal:
```shell
kill -SIGHUP $(cat ~/.modelweaver/modelweaver.pid)
```

Or use the CLI:

```shell
npx @kianwoon/modelweaver reload
```

Re-running `npx @kianwoon/modelweaver init` also signals the running daemon to reload.
- `curl http://localhost:3456/api/status` — returns circuit breaker state for all providers and server uptime
- `curl http://localhost:3456/api/version` — returns the running ModelWeaver version
- `curl http://localhost:3456/api/pool` — returns active connection pool state for all providers
- `curl http://localhost:3456/api/health-scores` — returns per-provider health scores based on latency and error rates
- `curl http://localhost:3456/api/sessions` — returns session agent pool statistics
```shell
# Aggregated request metrics (by model, provider, error type)
curl http://localhost:3456/api/metrics/summary

# Per-provider circuit breaker state
curl http://localhost:3456/api/circuit-breaker

# Hedging win/loss statistics
curl http://localhost:3456/api/hedging/stats
```

Claude Code sends different model names for different agent roles:
| Agent Role | Model Tier | Typical Model Name |
|---|---|---|
| Main conversation, coding | Sonnet | claude-sonnet-4-20250514 |
| Explore (codebase search) | Haiku | claude-haiku-4-5-20251001 |
| Plan (analysis) | Sonnet | claude-sonnet-4-20250514 |
| Complex subagents | Opus | claude-opus-4-20250514 |
| GLM/Z.ai models | Exact routing | glm-5-turbo |
| MiniMax models | Exact routing | MiniMax-M2.7 |
ModelWeaver uses the model name to determine which agent tier is calling, then routes accordingly.
```shell
npm install      # Install dependencies
npm test         # Run tests (299 tests)
npm run build    # Build for production (tsup)
npm run dev      # Run in dev mode (tsx)
```

Apache-2.0