Benchmark Methodology
This page explains how FasterAPI's performance claims are measured, what hardware was used, and how to reproduce the results yourself.
We run three categories of benchmarks:
| Category | What it measures | How |
|---|---|---|
| Component | Individual operations (routing, JSON encode/decode) | Tight loop around `time.perf_counter()` |
| Framework (Direct ASGI) | Full request cycle without network overhead | Synthetic ASGI scope → `app(scope, receive, send)` |
| End-to-End HTTP | Real HTTP performance including server + network | `httpx.AsyncClient` against a live uvicorn server |
The README numbers come from Framework (Direct ASGI) benchmarks — these isolate the framework's actual performance without conflating uvicorn's overhead.
Local environment:
- Machine: Apple Silicon (M-series)
- OS: macOS
- Python: 3.13.7
- uvloop: 0.21.x
- msgspec: 0.19.x
- FastAPI: 0.115.x (comparison target)
- Pydantic: 2.10.x

CI environment:
- Machine: GitHub Actions ubuntu-latest (2-core x86_64)
- OS: Ubuntu 22.04
- Python: 3.13
CI runners are significantly slower than local Apple Silicon. The CI benchmark workflow compares speedup ratios (FasterAPI/FastAPI), not raw req/s, to account for hardware differences.
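To make the ratio comparison concrete, here is a small sketch. The local numbers match the PR-comment example later on this page; the CI numbers are made-up assumptions purely for illustration:

```python
# Illustrative req/s figures. Local figures are taken from the PR-comment
# example on this page; CI figures are hypothetical stand-ins.
local = {"fasterapi": 150_000, "fastapi": 22_000}  # Apple Silicon
ci    = {"fasterapi": 55_000,  "fastapi": 8_100}   # 2-core CI runner

# Raw throughput differs by roughly 3x between machines,
# but the speedup ratio stays roughly stable:
local_speedup = local["fasterapi"] / local["fastapi"]
ci_speedup = ci["fasterapi"] / ci["fastapi"]
print(f"local speedup: {local_speedup:.2f}x")
print(f"CI speedup:    {ci_speedup:.2f}x")
```

This is why the CI workflow gates on the ratio, not on absolute req/s.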
This is the main benchmark used for the README results. It bypasses the network layer entirely.
- Creates both a FasterAPI and a FastAPI app with identical routes
- Constructs synthetic ASGI `scope`, `receive`, and `send` functions
- Calls `await app(scope, receive, send)` in a tight loop
- Measures throughput in requests/second
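The steps above can be sketched with a minimal stand-in ASGI app (hypothetical; the real benchmark drives the actual FasterAPI and FastAPI apps here instead):

```python
import asyncio
import time

# Minimal stand-in ASGI app. In the real benchmark this is the
# framework-built app object under test.
async def app(scope, receive, send):
    await send({"type": "http.response.start", "status": 200,
                "headers": [(b"content-type", b"application/json")]})
    await send({"type": "http.response.body", "body": b'{"status": "ok"}'})

async def bench(n: int = 10_000) -> float:
    # Synthetic ASGI scope: no socket, no parser, no network.
    scope = {"type": "http", "method": "GET", "path": "/health",
             "headers": [], "query_string": b""}

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        pass  # discard; a real run would also validate the response messages

    start = time.perf_counter()
    for _ in range(n):
        await app(scope, receive, send)
    elapsed = time.perf_counter() - start
    return n / elapsed

rps = asyncio.run(bench())
print(f"{rps:,.0f} req/s")
```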
| Endpoint | Method | Purpose |
|---|---|---|
| `/health` | GET | Minimal handler; measures framework dispatch overhead |
| `/users/{id}` | GET | Path parameter extraction + JSON response |
| `/users` | POST | JSON body parsing + validation + response |
1. Warm-up phase: 500 requests (not timed)
2. Measured phase: 50,000 requests
3. Timing: `time.perf_counter()` around the measured phase
4. Result: `requests / elapsed_seconds`
Both frameworks are benchmarked in the same process, same event loop, same Python version. This eliminates environmental variance.
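The timing protocol above can be sketched as a small harness (`run_benchmark` and its arguments are placeholders, not the actual benchmark API):

```python
import time

def run_benchmark(handler, n_warmup: int = 500, n_measured: int = 50_000) -> float:
    # 1. Warm-up phase: not timed
    for _ in range(n_warmup):
        handler()
    # 2-3. Measured phase, wrapped in time.perf_counter()
    start = time.perf_counter()
    for _ in range(n_measured):
        handler()
    elapsed = time.perf_counter() - start
    # 4. Result: requests / elapsed_seconds
    return n_measured / elapsed

# Any callable can stand in for the request cycle being measured.
rps = run_benchmark(lambda: None)
print(f"{rps:,.0f} calls/s")
```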
The benchmark is in `benchmarks/compare.py`. Run with the `--direct` flag:

```shell
python benchmarks/compare.py --direct
```

Setup:
- 100 routes registered (50 static, 30 single-param, 20 multi-param)
- 3 representative lookup paths tested
- 500,000 iterations × 3 paths = 1,500,000 total lookups
Measured: ops/second for each router implementation
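The lookup loop can be sketched with a hypothetical miniature router — an exact-match dict for static routes plus compiled patterns for parameterized ones. This is not FasterAPI's actual router implementation, just the shape of the measurement:

```python
import re
import time

# Hypothetical router tables (illustrative, much smaller than the 100-route setup).
static = {f"/static/{i}": f"handler_{i}" for i in range(50)}
param = [
    (re.compile(r"^/users/(?P<id>[^/]+)$"), "get_user"),
    (re.compile(r"^/teams/(?P<tid>[^/]+)/members/(?P<mid>[^/]+)$"), "get_member"),
]

def lookup(path):
    if path in static:                  # O(1) static hit
        return static[path], {}
    for pattern, handler in param:      # linear scan of parameterized routes
        m = pattern.match(path)
        if m:
            return handler, m.groupdict()
    return None, {}

# Representative lookup paths, hammered in a tight loop.
paths = ["/static/7", "/users/42", "/teams/1/members/2"]
N = 100_000
start = time.perf_counter()
for _ in range(N):
    for p in paths:
        lookup(p)
ops = N * len(paths) / (time.perf_counter() - start)
print(f"{ops:,.0f} lookups/s")
```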
Setup:
- Dict payload: {"id": 42, "name": "test", "email": "t@t.com", "scores": [1,2,3]}
- 1,000,000 iterations
Measured: encode ops/second
Setup:
- Same payload as encoding, as raw bytes
- Decoded into a typed Struct/BaseModel
- 1,000,000 iterations
Measured: decode+validate ops/second
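The decode+validate measurement looks roughly like this stdlib-only sketch (a stand-in: the real benchmark decodes into a `msgspec.Struct` or `pydantic.BaseModel`; the dataclass plus type checks here only approximate validation):

```python
import json
import time
from dataclasses import dataclass

@dataclass
class User:
    id: int
    name: str
    email: str
    scores: list

def decode_user(raw: bytes) -> User:
    obj = json.loads(raw)
    user = User(**obj)
    # Minimal stand-in validation: spot-check field types.
    assert isinstance(user.id, int) and isinstance(user.name, str)
    return user

# Same payload as the encoding benchmark, as raw bytes.
raw = b'{"id": 42, "name": "test", "email": "t@t.com", "scores": [1, 2, 3]}'
N = 100_000
start = time.perf_counter()
for _ in range(N):
    decode_user(raw)
ops = N / (time.perf_counter() - start)
print(f"{ops:,.0f} decode+validate ops/s")
```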
```shell
cd FasterAPI
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev,benchmark]"
```

Run the full suite, or the direct ASGI benchmark only:

```shell
python benchmarks/compare.py
python benchmarks/compare.py --direct
```

A quick standalone check of the msgspec vs stdlib `json` encoding comparison:

```python
import time
import msgspec
import json

data = {"id": 42, "name": "test", "scores": [1, 2, 3]}
N = 1_000_000

# msgspec
start = time.perf_counter()
for _ in range(N):
    msgspec.json.encode(data)
msgspec_rps = N / (time.perf_counter() - start)

# stdlib json
start = time.perf_counter()
for _ in range(N):
    json.dumps(data).encode()
json_rps = N / (time.perf_counter() - start)

print(f"msgspec: {msgspec_rps:,.0f} ops/s")
print(f"json: {json_rps:,.0f} ops/s")
print(f"speedup: {msgspec_rps/json_rps:.1f}x")
```

Every PR to `stage` or `master` triggers an automated benchmark that:
- Runs the direct ASGI benchmark (50,000 requests per endpoint)
- Runs the routing benchmark (1.5M lookups)
- Posts a comment on the PR with results
| Endpoint | FasterAPI | FastAPI | Speedup | vs Baseline |
|------------------|----------------|---------------|---------|-------------|
| GET /health | 150,000/s | 22,000/s | 6.82x | ⚪ -0.4% |
| GET /users/{id} | 128,000/s | 15,000/s | 8.53x | ⚪ -2.3% |
| POST /users | 95,000/s | 13,000/s | 7.31x | 🟢 +2.2% |
- Speedup = FasterAPI req/s ÷ FastAPI req/s
- vs Baseline compares the speedup ratio against the README baseline
- 🟢 = speedup improved by >2%
- ⚪ = within noise (±5%)
- 🔴 = speedup regressed by >5% — needs investigation
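The legend above can be sketched as a small classifier (a hypothetical helper, not the actual CI code; checks run in the order listed, improvement first):

```python
def classify(speedup: float, baseline: float) -> str:
    """Classify a PR's speedup ratio against the README baseline."""
    delta_pct = (speedup - baseline) / baseline * 100
    if delta_pct > 2:    # improved by >2%
        return "green"
    if delta_pct < -5:   # regressed by >5% — needs investigation
        return "red"
    return "white"       # within noise

# e.g. a ratio of 8.33 against the 8.53 baseline is about -2.3%: within noise
print(classify(8.33, 8.53))
```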
Raw req/s on CI runners will be 2-3x lower than local Apple Silicon. This is expected. The speedup ratio is hardware-independent and is what matters.
- Same process, same loop: Both frameworks run in the same Python process and event loop. Neither gets a "warm" advantage.
- Warm-up phase: 500 requests run before timing starts so that one-time costs (lazy imports, `lru_cache` warming) are excluded from the measurement.
- Identical routes: Both apps define the exact same endpoints with equivalent handler logic. The FasterAPI handler uses `msgspec.Struct`; FastAPI uses `pydantic.BaseModel`.
- No GC interference: The benchmark runs long enough (50K requests) that GC pauses are amortized and don't skew results.
- Deterministic input: The same request payload is used for every iteration. No randomness that could cause branch-prediction differences.
- Open source: All benchmark code is in `benchmarks/compare.py`. Run it yourself.