
Benchmark for latency and throughput on both GPU server and Modal #12

Closed
ArnavBharti wants to merge 14 commits into main from benchmarking


@ArnavBharti
Collaborator

=== Benchmark Summary ===
Mode: server
Target: https://scaledfocus--masala-embed-server-modal-fastapi-app.modal.run/v1/embeddings
Concurrency: 100
Payloads: 3000
Request timeout: 120s

=== Per-stage results ===
--- stage-1 (duration=20s, target_rps=5) ---
Requests: 100 Successes: 100 Failures: 0
Measured wall-time (first->last): 16.251 s
Throughput (successful reqs / wall-time): 6.154 req/s
End-to-end latency p50/p95/p99: 475.93 / 3199.36 / 4002.60 ms
vLLM p50/p95/p99: 166.60 / 344.90 / 364.09 ms
Normalize p50/p95/p99: 0.01 / 0.02 / 0.04 ms
Search p50/p95/p99: 0.17 / 0.26 / 0.29 ms
Server p50/p95/p99: 166.90 / 345.12 / 364.27 ms

--- stage-2 (duration=40s, target_rps=20) ---
Requests: 796 Successes: 796 Failures: 0
Measured wall-time (first->last): 40.084 s
Throughput (successful reqs / wall-time): 19.858 req/s
End-to-end latency p50/p95/p99: 513.94 / 588.97 / 633.42 ms
vLLM p50/p95/p99: 191.36 / 251.17 / 282.14 ms
Normalize p50/p95/p99: 0.01 / 0.02 / 0.02 ms
Search p50/p95/p99: 0.14 / 0.25 / 0.40 ms
Server p50/p95/p99: 191.60 / 251.36 / 282.32 ms

--- stage-3 (duration=60s, target_rps=50) ---
Requests: 2980 Successes: 2980 Failures: 0
Measured wall-time (first->last): 60.127 s
Throughput (successful reqs / wall-time): 49.562 req/s
End-to-end latency p50/p95/p99: 656.43 / 877.53 / 1108.32 ms
vLLM p50/p95/p99: 295.64 / 448.96 / 521.56 ms
Normalize p50/p95/p99: 0.01 / 0.02 / 0.04 ms
Search p50/p95/p99: 0.17 / 0.42 / 0.61 ms
Server p50/p95/p99: 295.85 / 449.30 / 521.91 ms

--- stage-4 (duration=60s, target_rps=100) ---
Requests: 5960 Successes: 5960 Failures: 0
Measured wall-time (first->last): 90.897 s
Throughput (successful reqs / wall-time): 65.569 req/s
End-to-end latency p50/p95/p99: 1513.67 / 1903.91 / 2114.91 ms
vLLM p50/p95/p99: 314.50 / 891.68 / 1387.54 ms
Normalize p50/p95/p99: 0.01 / 0.02 / 0.14 ms
Search p50/p95/p99: 0.20 / 0.60 / 0.78 ms
Server p50/p95/p99: 314.76 / 891.96 / 1387.78 ms

=== Overall ===
Requests: 9836 Successes: 9836 Failures: 0
Measured wall-time (first->last): 208.773 s
Throughput (successful reqs / wall-time): 47.113 req/s
End-to-end latency p50/p95/p99: 1330.15 / 1823.62 / 2086.31 ms
vLLM p50/p95/p99: 295.48 / 685.89 / 1349.32 ms
Normalize p50/p95/p99: 0.01 / 0.02 / 0.10 ms
Search p50/p95/p99: 0.18 / 0.55 / 0.74 ms
Server p50/p95/p99: 295.81 / 686.11 / 1349.63 ms
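
The throughput lines above are defined as successful requests divided by wall-time, so they can be cross-checked directly from the reported counts. The sketch below does that and also compares each stage against its target rate. It is not the benchmark script from this PR: `STAGES` and `summarize` are hypothetical names, and the numbers are copied by hand from the log. The p50 gap (end-to-end p50 minus server p50) is only a rough indicator of time spent outside the server, since a difference of medians is not the median of per-request differences.

```python
from typing import Dict

# Figures copied by hand from the per-stage results above (hypothetical layout).
STAGES: Dict[str, Dict[str, float]] = {
    "stage-1": {"target_rps": 5, "successes": 100, "wall_s": 16.251,
                "reported_rps": 6.154, "e2e_p50_ms": 475.93, "server_p50_ms": 166.90},
    "stage-2": {"target_rps": 20, "successes": 796, "wall_s": 40.084,
                "reported_rps": 19.858, "e2e_p50_ms": 513.94, "server_p50_ms": 191.60},
    "stage-3": {"target_rps": 50, "successes": 2980, "wall_s": 60.127,
                "reported_rps": 49.562, "e2e_p50_ms": 656.43, "server_p50_ms": 295.85},
    "stage-4": {"target_rps": 100, "successes": 5960, "wall_s": 90.897,
                "reported_rps": 65.569, "e2e_p50_ms": 1513.67, "server_p50_ms": 314.76},
}


def summarize(stages: Dict[str, Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """Recompute throughput per stage and derive two rough indicators."""
    summary = {}
    for name, s in stages.items():
        achieved = s["successes"] / s["wall_s"]  # successful reqs / wall-time
        summary[name] = {
            "achieved_rps": achieved,
            # Ratio of achieved rate to the stage's target rate.
            "fraction_of_target": achieved / s["target_rps"],
            # Difference of medians: a rough, not exact, non-server overhead.
            "p50_gap_ms": s["e2e_p50_ms"] - s["server_p50_ms"],
            # Deviation from the logged value (wall-time in the log is rounded).
            "abs_error_vs_log": abs(achieved - s["reported_rps"]),
        }
    return summary


if __name__ == "__main__":
    for name, row in summarize(STAGES).items():
        print(
            f"{name}: {row['achieved_rps']:.3f} req/s "
            f"({row['fraction_of_target']:.0%} of target), "
            f"p50 gap {row['p50_gap_ms']:.2f} ms"
        )
```

Read this way, stages 1 to 3 meet or exceed their target rates, while stage-4 reaches only about 66% of its 100 req/s target and its wall-time stretches from the configured 60 s to 90.897 s. The stage-4 p50 gap of roughly 1.2 s against a server p50 of about 315 ms suggests that most of the added latency accrues outside the timed server stages, though the log alone does not say where.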

@NirantK NirantK self-requested a review October 23, 2025 12:32
@NirantK NirantK closed this Nov 28, 2025


3 participants