feat(openrouter): client-side rate limit for free-tier models only #13
Merged
User report: `bridge.call(drafts.rerunAgent)` failed with `OpenRouter HTTP 429: Rate limit exceeded: free-models-per-min. X-RateLimit-Limit: 16`.

Concurrent agent flows (analyzer, draft generator, rerunAgent, sender lookup) burst through OpenRouter's 16 req/min cap on free models in seconds, and the existing retry logic (3 attempts with backoff capped at 10s) can't outlast a 60s rate window.

Two complementary changes, both isolated to `:free` model ids:

1. Sliding-window rate limiter in front of the fetch. A module-scope timestamp queue tracks the last N successful "issue this call" decisions; before each free-model call we drop expired timestamps and either claim a slot or sleep until the oldest ages out. Awaiters proceed in arrival order. Paid models bypass the limiter entirely, since adding latency to a request that doesn't need throttling helps nothing.
2. On 429 responses, parse `X-RateLimit-Reset` and sleep until the window opens (capped at 90s) instead of falling back to exponential backoff that wouldn't survive 60s. This acts as a server-authoritative safety net for the in-process limiter, which resets across sidecar restarts.

Free vs paid detection is by model-id convention: OpenRouter distributes free models with a `:free` suffix; the bare model id is the paid tier.

No other changes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Summary
Fixes user-reported failure:
```
bridge.call(drafts.rerunAgent) failed: sidecar returned error:
OpenRouter HTTP 429: Rate limit exceeded: free-models-per-min.
{"X-RateLimit-Limit":"16","X-RateLimit-Remaining":"0"}
```
OpenRouter caps free-tier models at 16 req/min. The current retry path (3 attempts, exponential backoff capped at 10s) can't outlast a 60-second rate window, and there's no client-side gate — so multiple concurrent agent flows (analyzer, draft generator, rerunAgent, sender lookup) blow through the cap in seconds and the 429 surfaces to the user.
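A rough worst-case bound makes the mismatch concrete. Assuming each retry sleeps for at most the 10s cap, and generously counting a sleep before every one of the 3 attempts (a helper that only sleeps between attempts would sleep twice):

```math
T_{\text{backoff}} \le 3 \times 10\,\text{s} = 30\,\text{s} < 60\,\text{s}
```

So when all 16 slots were claimed within the last few seconds, no slot frees up for roughly a minute, and the caller runs out of retries while the window is still closed.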
What this PR does
Two complementary changes in `sidecar/src/services/providers/openrouter.ts`, both scoped to `:free` models only. Paid OpenRouter models bypass the gate entirely: they don't need it, and threading them through a 16/min bucket would only add latency.
1. Sliding-window rate limiter
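Module-scope timestamp queue tracking the last 16 "issue this call" decisions. Before each free-model fetch:

- drop timestamps that have aged out of the 60-second window;
- if fewer than 16 remain, record the current time and issue the call;
- otherwise, sleep until the oldest timestamp ages out, then re-check.

Awaiters proceed in arrival order because each call awaits before mutating the array.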
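A minimal TypeScript sketch of the sliding-window gate described above, under stated assumptions: the factory name `createSlidingWindowLimiter`, the `acquire()` function it returns, and the injectable `now`/`sleep` hooks are hypothetical and not taken from the PR's `openrouter.ts`. Arrival order is enforced here with a promise chain, which is one way to get the FIFO behavior described; the actual implementation may differ.

```typescript
// Hypothetical sketch of a sliding-window limiter (names are illustrative,
// not the PR's actual code).
type Clock = () => number;
type Sleep = (ms: number) => Promise<void>;

const realSleep: Sleep = (ms) =>
  new Promise<void>((resolve) => {
    setTimeout(resolve, ms);
  });

function createSlidingWindowLimiter(
  limit: number,
  windowMs: number,
  now: Clock = Date.now,
  sleep: Sleep = realSleep,
): () => Promise<void> {
  // Timestamps of calls admitted within the current window, oldest first.
  const issued: number[] = [];
  // Tail of a promise chain: each acquire() waits for the previous caller to
  // finish claiming its slot, so callers are admitted in arrival order.
  let tail: Promise<void> = Promise.resolve();

  async function claimSlot(): Promise<void> {
    for (;;) {
      const t = now();
      // Drop timestamps that have aged out of the window.
      while (issued.length > 0 && issued[0] <= t - windowMs) issued.shift();
      if (issued.length < limit) {
        issued.push(t); // claim a slot and let the call proceed
        return;
      }
      // Window is full: sleep until the oldest timestamp ages out, then re-check.
      await sleep(issued[0] + windowMs - t);
    }
  }

  return function acquire(): Promise<void> {
    const turn = tail.then(claimSlot);
    // Keep the chain usable for later callers even if a claim rejects.
    tail = turn.catch(() => undefined);
    return turn;
  };
}
```

Usage, assuming the 16/min free-tier cap: create one limiter with `createSlidingWindowLimiter(16, 60_000)` at module scope and `await` it before each `:free` fetch. Injecting the clock and sleep keeps the limiter testable without real 60-second waits.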
2. `X-RateLimit-Reset` header parsing on 429
If a 429 slips through (concurrent caller race, sidecar restart resetting the in-process queue, etc.), parse the server's `X-RateLimit-Reset` epoch value and sleep until that point instead of falling back to capped exponential backoff. Caps the sleep at 90s for safety against header bugs / clock skew.
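One way to compute the post-429 sleep, sketched under assumptions: the helper name `resetDelayMs` and its `fallbackMs` parameter are hypothetical, and because the exact unit of `X-RateLimit-Reset` isn't stated here, the sketch treats values below 10^12 as epoch seconds and larger values as epoch milliseconds. A missing or unparseable header falls back to whatever delay the existing backoff would have used.

```typescript
// Hypothetical helper: how long to wait after a 429 before retrying.
// Assumption: X-RateLimit-Reset is a Unix epoch timestamp; values < 1e12 are
// treated as seconds, larger values as milliseconds (a heuristic).
const MAX_RESET_WAIT_MS = 90_000; // the PR's 90s safety cap

function resetDelayMs(header: string | null, nowMs: number, fallbackMs: number): number {
  if (header === null) return fallbackMs; // no header: keep existing backoff
  const raw = Number(header.trim());
  if (!Number.isFinite(raw) || raw <= 0) return fallbackMs; // garbage header
  const resetMs = raw < 1e12 ? raw * 1000 : raw;
  // A reset already in the past (clock skew) means retry now;
  // a far-future reset (header bug) is capped at 90s.
  return Math.min(Math.max(resetMs - nowMs, 0), MAX_RESET_WAIT_MS);
}
```

In a retry loop this would be called with `res.headers.get("X-RateLimit-Reset")` and `Date.now()`, and the result passed to the sidecar's own sleep helper before the next attempt.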
Free vs paid detection
By model-id convention: OpenRouter distributes free models with a `:free` suffix (e.g. `meta-llama/llama-3.3-70b-instruct:free`). The bare model id is the paid tier. `isFreeModel()` is a one-line predicate, easy to extend if OpenRouter changes their naming.
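A guess at what the predicate and the free-only gate could look like. The PR names `isFreeModel()` but its body here is an assumption; `beforeOpenRouterCall` is a hypothetical name, and the `acquire` parameter stands in for whatever limiter function the sidecar actually uses. Taking the limiter as a parameter keeps paid-model calls from ever touching it.

```typescript
// Assumed shape of the one-line predicate: free models carry a ":free" suffix.
function isFreeModel(modelId: string): boolean {
  return modelId.endsWith(":free");
}

type Acquire = () => Promise<void>;

// Hypothetical gate: wait for a limiter slot only for free-tier model ids.
async function beforeOpenRouterCall(modelId: string, acquire: Acquire): Promise<void> {
  if (isFreeModel(modelId)) await acquire();
}
```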
What this PR doesn't do
Test plan
🤖 Generated with Claude Code