Draft
- Add output_tps metric to load_test.py summary (QPS * avg completion tokens)
- Add locust-glm5p1.conf configuration file for GLM 5.1 TPS testing
- Add benchmark_glm5p1_tps.py comprehensive TPS benchmark suite with:
  - Concurrency sweep to find saturation point
  - Token length sweep for different configurations
  - Automated report generation with recommendations
- Update README.md with GLM 5.1 TPS benchmarking documentation

This helps measure and optimize TPS for GLM 5.1 deployments on B200 GPUs.

Co-authored-by: Teo Feliu <teofeliu@gmail.com>
Summary
This PR adds tooling to help measure and optimize TPS (tokens per second) for GLM 5.1 deployments, specifically to help the Revolut team diagnose and improve their B200 deployment performance.
Context
The Revolut team is seeing ~100 TPS on their GLM 5.1 deployment on 4xB200 GPUs, but expects ~300 TPS for a model of this size. This PR adds benchmarking tools to help diagnose that gap.
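The benchmark suite's concurrency sweep finds the saturation point by raising the concurrent user count until aggregate output TPS stops improving. A minimal sketch of that idea, where measure_tps is a hypothetical stand-in for one timed benchmark run (not an actual function from this PR):

```python
def find_saturation(measure_tps, concurrencies=(1, 2, 4, 8, 16, 32, 64)):
    """Return (concurrency, tps) at the point past which adding
    concurrent users no longer meaningfully raises aggregate TPS."""
    best_c, best_tps = None, 0.0
    for c in concurrencies:
        tps = measure_tps(c)  # one benchmark run at concurrency c
        # Stop once the gain over the previous best is under 5%.
        if best_tps and tps < best_tps * 1.05:
            break
        best_c, best_tps = c, tps
    return best_c, best_tps
```

With a deployment that saturates around 300 TPS, the sweep would stop early instead of wasting time on higher concurrencies.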
Changes
New files
- llm_bench/locust-glm5p1.conf: Locust configuration file optimized for GLM 5.1 TPS testing
- llm_bench/benchmark_glm5p1_tps.py: Comprehensive TPS benchmark suite that:
  - Sweeps concurrency to find the saturation point
  - Sweeps token lengths for different configurations
  - Generates an automated report with recommendations

Modified files
- llm_bench/load_test.py: Added output_tps metric to the summary output (calculated as QPS × average completion tokens)
- llm_bench/README.md: Added documentation for GLM 5.1 TPS benchmarking

Usage
Quick TPS measurement
locust --config locust-glm5p1.conf \
  -H https://api.fireworks.ai/inference \
  -k $FIREWORKS_API_KEY \
  -m accounts/fireworks/models/glm-5p1 \
  -u 16 -t 2min

Comprehensive benchmark suite
python benchmark_glm5p1_tps.py \
  --host https://api.fireworks.ai/inference \
  --api-key $FIREWORKS_API_KEY \
  --model accounts/fireworks/models/glm-5p1

Testing
The changes are additive and don't modify existing behavior. The new
output_tps metric is only shown for non-embedding/non-rerank benchmarks.
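As a rough illustration, the QPS × average-completion-tokens calculation behind the new metric can be sketched as follows (a minimal standalone version, not the actual load_test.py code):

```python
def output_tps(num_requests: int, duration_s: float,
               completion_tokens: list[int]) -> float:
    """Aggregate generated-token throughput: QPS * avg completion tokens."""
    if not completion_tokens or duration_s <= 0:
        return 0.0
    qps = num_requests / duration_s
    avg_tokens = sum(completion_tokens) / len(completion_tokens)
    return qps * avg_tokens

# e.g. 120 requests over 60 s, ~50 completion tokens each -> 100 output TPS
```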