feat(openrouter): client-side rate limit for free-tier models only by mrdulasolutions · Pull Request #13 · mrdulasolutions/AOS-Mail

mrdulasolutions · 2026-05-13T18:10:25Z

Summary

Fixes user-reported failure:

```
bridge.call(drafts.rerunAgent) failed: sidecar returned error:
OpenRouter HTTP 429: Rate limit exceeded: free-models-per-min.
{"X-RateLimit-Limit":"16","X-RateLimit-Remaining":"0"}
```

OpenRouter caps free-tier models at 16 req/min. The current retry path (3 attempts, exponential backoff capped at 10s) can't outlast a 60-second rate window, and there's no client-side gate — so multiple concurrent agent flows (analyzer, draft generator, rerunAgent, sender lookup) blow through the cap in seconds and the 429 surfaces to the user.
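
To make the retry arithmetic concrete, here is a rough sketch of the total time such a retry path can spend waiting. The function name `totalBackoffMs`, the doubling schedule, and the base delays are assumptions for illustration; only the "3 attempts" and "10s cap" figures come from this PR.

```typescript
// Hypothetical helper: sums the waits of an exponential backoff schedule.
// Assumes delays double from baseMs and each delay is clamped to capMs.
function totalBackoffMs(attempts: number, baseMs: number, capMs: number): number {
  let total = 0;
  // A wait only happens between attempts, so there are (attempts - 1) waits.
  for (let i = 0; i < attempts - 1; i++) {
    total += Math.min(baseMs * 2 ** i, capMs);
  }
  return total;
}

// With an assumed 1s base: 1000 + 2000 ms of waiting in total.
const typicalWaitMs = totalBackoffMs(3, 1_000, 10_000);

// Worst case: every wait already sits at the 10s cap.
const worstCaseWaitMs = totalBackoffMs(3, 10_000, 10_000);
```

Even in the worst case the retries give up after 20s of waiting, well short of the 60s window, so a request that arrives just after the bucket empties cannot succeed by retrying alone.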

What this PR does

Two complementary changes in `sidecar/src/services/providers/openrouter.ts`, both scoped to `:free` models only (paid OpenRouter models bypass the gate entirely — they don't need it, and threading them through a 16/min bucket would just add latency for no reason):

1. Sliding-window rate limiter

Module-scope timestamp queue tracking the last 16 "issue this call" decisions. Before each free-model fetch:

- Drop timestamps older than 60s.
- If queue is under capacity → push current timestamp, proceed.
- Otherwise → sleep until the oldest timestamp ages out (+50ms cushion to avoid herd retry at the boundary), then re-check.

Awaiters proceed in arrival order because each call awaits before mutating the array.
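
The steps above can be sketched roughly as follows. This is not the code from `openrouter.ts`, which is not shown here; `createFreeModelLimiter`, `acquireSlot`, and the injectable `now`/`sleep` parameters are hypothetical names chosen so the sketch can be exercised with a fake clock. The 16-slot capacity, 60s window, and 50ms cushion are the values described in this PR.

```typescript
// Hypothetical sketch of a sliding-window gate; names are illustrative only.
const WINDOW_MS = 60_000; // length of the rate window
const CAPACITY = 16;      // free-tier cap reported in X-RateLimit-Limit
const CUSHION_MS = 50;    // small pad so waiters do not retry exactly at the boundary

type Clock = () => number;
type Sleep = (ms: number) => Promise<void>;

const realSleep: Sleep = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms));

function createFreeModelLimiter(now: Clock = Date.now, sleep: Sleep = realSleep) {
  // Timestamps of recent "issue this call" decisions, oldest first.
  const issued: number[] = [];

  return async function acquireSlot(): Promise<void> {
    while (true) {
      const t = now();

      // Drop timestamps that have aged out of the window.
      let oldest = issued[0];
      while (oldest !== undefined && t - oldest >= WINDOW_MS) {
        issued.shift();
        oldest = issued[0];
      }

      // Under capacity: claim a slot. There is no await between the check
      // and the push, so two callers cannot claim the same slot.
      if (oldest === undefined || issued.length < CAPACITY) {
        issued.push(t);
        return;
      }

      // Full: sleep until the oldest entry expires (plus cushion), then re-check.
      await sleep(oldest + WINDOW_MS - t + CUSHION_MS);
    }
  };
}
```

A caller would `await acquireSlot()` immediately before issuing the free-model fetch. Because the limiter re-checks after every sleep rather than assuming a slot is free, a waiter that loses the race to another caller simply sleeps again.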

2. `X-RateLimit-Reset` header parsing on 429

If a 429 slips through (concurrent caller race, sidecar restart resetting the in-process queue, etc.), parse the server's `X-RateLimit-Reset` epoch value and sleep until that point instead of falling back to capped exponential backoff. Caps the sleep at 90s for safety against header bugs / clock skew.
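
A minimal sketch of how the reset value could be turned into a bounded sleep. `sleepMsFromResetHeader` is a hypothetical name, and the seconds-versus-milliseconds heuristic is an assumption, since the description above only says "epoch value"; the 90s cap is the one stated in this PR.

```typescript
const MAX_RESET_SLEEP_MS = 90_000; // safety cap against header bugs / clock skew

// Hypothetical helper: returns how long to sleep, or null when the header is
// missing or unusable, in which case the caller would fall back to backoff.
function sleepMsFromResetHeader(
  headerValue: string | null,
  nowMs: number = Date.now()
): number | null {
  if (headerValue === null) return null;

  const raw = Number(headerValue.trim());
  if (!Number.isFinite(raw) || raw <= 0) return null;

  // Assumption: values below 1e12 are epoch seconds, larger ones epoch milliseconds.
  const resetMs = raw < 1e12 ? raw * 1000 : raw;

  // Clamp to [0, cap]: a reset time in the past means "retry now".
  return Math.min(Math.max(resetMs - nowMs, 0), MAX_RESET_SLEEP_MS);
}
```

With a `fetch` response, the input would typically come from `response.headers.get("X-RateLimit-Reset")`, which returns `null` when the header is absent. Whether the sidecar reads it from the HTTP headers or from the error body is not stated here.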

Free vs paid detection

By model-id convention: OpenRouter distributes free models with a `:free` suffix (e.g. `meta-llama/llama-3.3-70b-instruct:free`). The bare model id is the paid tier. `isFreeModel()` is a one-line predicate, easy to extend if OpenRouter changes their naming.
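
For illustration, the predicate could look like the sketch below. The name `isFreeModel` comes from this PR; the body is an assumption based on the `:free` suffix convention described above.

```typescript
// Assumed implementation: a model id is free-tier if it ends with ":free".
function isFreeModel(modelId: string): boolean {
  return modelId.endsWith(":free");
}

const freeId = "meta-llama/llama-3.3-70b-instruct:free";
const paidId = "meta-llama/llama-3.3-70b-instruct";
const gated = [freeId, paidId].filter(isFreeModel); // only the ":free" id is gated
```

Matching on the suffix rather than a substring keeps an id that merely contains `:free` somewhere in the middle from being throttled.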

What this PR doesn't do

- Doesn't gate paid models. Per the user's scoping decision.
- Doesn't persist daily limits. OpenRouter also has ~200/day on free; that would need filesystem state and seems premature.
- Doesn't fix Anthropic-side rate limits: different provider, different retry path in `services/anthropic.ts`.

Test plan

- Trigger `drafts.rerunAgent` 20× rapidly with a free model configured. Should queue, not 429.
- Same with a paid model. Should NOT queue (no added latency).
- Inspect sidecar logs for `free-model rate limit reached, queueing` entries during the burst.

🤖 Generated with Claude Code

User report: `bridge.call(drafts.rerunAgent)` failed with `OpenRouter HTTP 429: Rate limit exceeded: free-models-per-min. X-RateLimit-Limit: 16`. Concurrent agent flows (analyzer, draft generator, rerunAgent, sender lookup) burst through OpenRouter's 16 req/min cap on free models in seconds, and the existing retry logic (3 attempts with backoff capped at 10s) can't outlast a 60s rate window.

Two complementary changes, both isolated to `:free` model ids:

1. Sliding-window rate limiter in front of the fetch. A module-scope timestamp queue tracks the last N successful "issue this call" decisions; before each free-model call we drop expired timestamps and either claim a slot or sleep until the oldest ages out. Awaiters proceed in arrival order. Paid models bypass entirely, since adding latency to a request that doesn't need throttling helps nothing.

2. On 429 responses, parse `X-RateLimit-Reset` and sleep until the window opens (capped at 90s) instead of falling back to exponential backoff that wouldn't survive 60s. Acts as a server-truthing safety net for the in-process limiter, which resets across sidecar restarts.

Free vs paid detection is by model-id convention: OpenRouter distributes free models with a `:free` suffix; the bare model id is the paid tier.

No other changes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

mrdulasolutions merged commit c5e78e0 into main May 13, 2026
2 of 3 checks passed

mrdulasolutions mentioned this pull request May 13, 2026

chore(release): bump to v0.1.9 #20

Merged

mrdulasolutions merged 1 commit into main from claude/openrouter-free-rate-limit
