| Column | Sort key | Description |
|---|---|---|
| Rank | R |
Position based on current sort order (medals for top 3: 🥇🥈🥉) |
| Tier | — | SWE-bench tier (S+, S, A+, A, A-, B+, B, C) |
| SWE% | S |
SWE-bench Verified score — industry-standard for coding |
| CTX | C |
Context window size (e.g. 128k) |
| Model | M |
Model display name (favorites show ⭐ prefix) |
| Provider | O |
Provider name — press D to cycle provider filter |
| Latest Ping | L |
Most recent round-trip latency in milliseconds |
| Avg Ping | A |
Rolling average of ALL successful pings since launch |
| Health | H |
Current status: UP ✅, NO KEY 🔑, Timeout ⏳, Overloaded 🔥, Not Found 🚫 |
| Verdict | V |
Health verdict based on avg latency + stability |
| Stability | B |
Composite 0–100 consistency score (see below) |
| Up% | U |
Uptime — percentage of successful pings |
| Verdict | Meaning |
|---|---|
| Perfect | Avg < 400ms with stable p95/jitter |
| Normal | Avg < 1000ms, consistent responses |
| Slow | Avg 1000–2000ms |
| Spiky | Good avg but erratic tail latency (p95 >> avg) |
| Very Slow | Avg 2000–5000ms |
| Overloaded | Server returned 429/503 (rate limited or capacity hit) |
| Unstable | Was previously up but now timing out, or avg > 5000ms |
| Not Active | No successful pings yet |
| Pending | First ping still in flight |
The Stability column answers: "How consistent and predictable is this model?"
Average latency alone is misleading. A model averaging 250ms that randomly spikes to 6 s feels slower than a steady 400ms model. The stability score captures this.
Four signals, normalized to 0–100, combined with weights:
Stability = 0.30 × p95_score
+ 0.30 × jitter_score
+ 0.20 × spike_score
+ 0.20 × reliability_score
| Component | Weight | What it measures | Normalization |
|---|---|---|---|
| p95 latency | 30% | Worst 5% of response times | 100 × (1 - p95 / 5000), clamped 0–100 |
| Jitter (σ) | 30% | Standard deviation of ping times | 100 × (1 - jitter / 2000), clamped 0–100 |
| Spike rate | 20% | Fraction of pings above 3000ms | 100 × (1 - spikes / total_pings) |
| Reliability | 20% | Fraction of HTTP 200 pings | Direct uptime % (0–100) |
Example: Model A: avg 250ms, p95 6000ms → score ~30. Model B: avg 400ms, p95 650ms → score ~85. Model B feels faster in real usage.