
feat(openrouter): client-side rate limit for free-tier models only #13

Merged
mrdulasolutions merged 1 commit into main from claude/openrouter-free-rate-limit
May 13, 2026


@mrdulasolutions
Owner

Summary

Fixes user-reported failure:

```
bridge.call(drafts.rerunAgent) failed: sidecar returned error:
OpenRouter HTTP 429: Rate limit exceeded: free-models-per-min.
{"X-RateLimit-Limit":"16","X-RateLimit-Remaining":"0"}
```

OpenRouter caps free-tier models at 16 req/min. The current retry path (3 attempts, exponential backoff capped at 10s) can't outlast a 60-second rate window, and there's no client-side gate — so multiple concurrent agent flows (analyzer, draft generator, rerunAgent, sender lookup) blow through the cap in seconds and the 429 surfaces to the user.
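To make the timing gap concrete, here is a rough sketch of the worst-case total backoff. The helper `totalBackoffMs` and the 1s base delay are hypothetical; the PR only states 3 attempts and a 10s cap.

```typescript
// Hypothetical helper, not taken from the PR: sums the delays of a capped
// exponential backoff, counting one sleep per attempt (an upper bound).
function totalBackoffMs(attempts: number, baseMs: number, capMs: number): number {
  let total = 0;
  for (let i = 0; i < attempts; i++) {
    // Delay doubles each attempt but never exceeds the cap.
    total += Math.min(baseMs * 2 ** i, capMs);
  }
  return total;
}

// ASSUMPTION: a 1s base delay. 1s + 2s + 4s = 7s of total waiting.
const typicalMs = totalBackoffMs(3, 1_000, 10_000); // 7000

// Even if every delay hit the 10s cap, the total is only 30s.
const worstCaseMs = totalBackoffMs(3, 10_000, 10_000); // 30000
```

Under either assumption the retries finish well inside the 60s window, so a request that first fails early in the window has no way to wait it out.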

What this PR does

Two complementary changes in `sidecar/src/services/providers/openrouter.ts`, both scoped to `:free` models only (paid OpenRouter models bypass the gate entirely — they don't need it, and threading them through a 16/min bucket would just add latency for no reason):

1. Sliding-window rate limiter

Module-scope timestamp queue tracking the last 16 "issue this call" decisions. Before each free-model fetch:

  • Drop timestamps older than 60s.
  • If queue is under capacity → push current timestamp, proceed.
  • Otherwise → sleep until the oldest timestamp ages out (+50ms cushion to avoid herd retry at the boundary), then re-check.

Awaiters proceed in arrival order because each call awaits before mutating the array.
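The steps above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: every identifier (`tryClaim`, `acquireFreeModelSlot`, `issued`, the constants) is a hypothetical name, and the claim logic is split into a pure function only to make it easy to test.

```typescript
// Hypothetical names throughout; the PR text does not show its identifiers.
const WINDOW_MS = 60_000; // length of the rate window, per the PR text
const CAPACITY = 16;      // free-models-per-min cap, per X-RateLimit-Limit
const CUSHION_MS = 50;    // small cushion past the window boundary

// Module-scope queue of timestamps for calls already let through.
const issued: number[] = [];

/**
 * Try to claim a slot at time `now` (epoch ms).
 * Returns 0 if a slot was claimed, otherwise the number of ms to wait.
 */
function tryClaim(queue: number[], now: number): number {
  // Drop timestamps that are a full window old or older.
  while (queue.length > 0 && now - queue[0] >= WINDOW_MS) {
    queue.shift();
  }
  if (queue.length < CAPACITY) {
    queue.push(now);
    return 0;
  }
  // Full: wait until the oldest entry ages out, plus the cushion.
  return queue[0] + WINDOW_MS - now + CUSHION_MS;
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

/** Resolve once a free-model call may be issued; re-checks after each sleep. */
async function acquireFreeModelSlot(): Promise<void> {
  for (;;) {
    const waitMs = tryClaim(issued, Date.now());
    if (waitMs === 0) return;
    await sleep(waitMs);
  }
}
```

A caller would `await acquireFreeModelSlot()` immediately before the fetch for a `:free` model. Because `tryClaim` runs synchronously between awaits, two callers cannot claim the same slot within a single process.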

2. `X-RateLimit-Reset` header parsing on 429

If a 429 slips through (concurrent caller race, sidecar restart resetting the in-process queue, etc.), parse the server's `X-RateLimit-Reset` epoch value and sleep until that point instead of falling back to capped exponential backoff. Caps the sleep at 90s for safety against header bugs / clock skew.
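A sketch of how the reset header might be turned into a sleep duration. The function name `resetDelayMs` is hypothetical, and the PR does not say whether the epoch is in seconds or milliseconds, so the magnitude check below is an assumption rather than documented behaviour.

```typescript
const MAX_RESET_SLEEP_MS = 90_000; // safety cap, per the PR text

/**
 * Convert an X-RateLimit-Reset header value into a sleep duration in ms.
 * Returns null when the header is missing or unparseable, so the caller
 * can fall back to its ordinary backoff.
 */
function resetDelayMs(header: string | null, nowMs: number): number | null {
  if (header === null) return null;
  const raw = Number(header.trim());
  // Rejects "", whitespace (both parse to 0), and non-numeric text (NaN).
  if (!Number.isFinite(raw) || raw <= 0) return null;
  // ASSUMPTION: values below 1e12 are epoch seconds, otherwise epoch ms.
  const resetMs = raw < 1e12 ? raw * 1000 : raw;
  // Clamp to [0, 90s] to guard against header bugs and clock skew.
  return Math.min(Math.max(resetMs - nowMs, 0), MAX_RESET_SLEEP_MS);
}
```

With the standard fetch API the call site would look something like `resetDelayMs(response.headers.get("X-RateLimit-Reset"), Date.now())`, sleeping for the returned duration when it is not `null`.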

Free vs paid detection

By model-id convention: OpenRouter distributes free models with a `:free` suffix (e.g. `meta-llama/llama-3.3-70b-instruct:free`). The bare model id is the paid tier. `isFreeModel()` is a one-line predicate, easy to extend if OpenRouter changes their naming.
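For illustration, the predicate could be as small as the following. The name `isFreeModel` comes from the PR; the body is an assumed implementation of the suffix convention described above.

```typescript
// Assumed body: treats any model id ending in ":free" as free-tier.
function isFreeModel(modelId: string): boolean {
  return modelId.endsWith(":free");
}

const gated = isFreeModel("meta-llama/llama-3.3-70b-instruct:free"); // true
const bypassed = isFreeModel("meta-llama/llama-3.3-70b-instruct");   // false
```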

What this PR doesn't do

  • Doesn't gate paid models. Per the user's scoping decision.
  • Doesn't persist daily limits. OpenRouter also caps free models at roughly 200 requests/day; tracking that would need filesystem state and seems premature.
  • Doesn't fix Anthropic-side rate limits — different provider, different retry path in `services/anthropic.ts`.

Test plan

  • Trigger `drafts.rerunAgent` 20× rapidly with a free model configured — should queue, not 429
  • Same with a paid model — should NOT queue (no added latency)
  • Inspect sidecar logs for `free-model rate limit reached, queueing` entries during the burst

🤖 Generated with Claude Code

User report: `bridge.call(drafts.rerunAgent)` failed with
`OpenRouter HTTP 429: Rate limit exceeded: free-models-per-min.
X-RateLimit-Limit: 16`. Concurrent agent flows (analyzer, draft
generator, rerunAgent, sender lookup) burst through OpenRouter's
16 req/min cap on free models in seconds, and the existing retry
logic — 3 attempts with backoff capped at 10s — can't outlast a
60s rate window.

Two complementary changes, both isolated to `:free` model ids:

1. Sliding-window rate limiter in front of the fetch. A
   module-scope timestamp queue tracks the last N successful
   "issue this call" decisions; before each free-model call we
   drop expired timestamps and either claim a slot or sleep until
   the oldest ages out. Awaiters proceed in arrival order. Paid
   models bypass entirely — adding latency to a request that
   doesn't need throttling helps nothing.

2. On 429 responses, parse `X-RateLimit-Reset` and sleep until the
   window opens (capped at 90s) instead of falling back to
   exponential backoff that wouldn't survive 60s. Acts as a
   server-truthing safety net for the in-process limiter, which
   resets across sidecar restarts.

Free vs paid detection is by model-id convention: OpenRouter
distributes free models with a `:free` suffix; the bare model id
is the paid tier. No other changes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
mrdulasolutions merged commit c5e78e0 into main on May 13, 2026
2 of 3 checks passed