feat(vllm): vllm gb200 dsv4 recipes by alec-flowers · Pull Request #103 · NVIDIA/srt-slurm

alec-flowers · 2026-04-28T00:02:32Z

Draft PR for the vLLM GB200 v0.20.0 branch.

Summary:

Adds a self-contained lm-eval benchmark runner for GSM8K-style evals against an OpenAI-compatible chat endpoint.
Keeps the existing SGLang-dependent gsm8k runner untouched; this new path uses python3 -m lm_eval --model local-chat-completions and does not require SGLang or an InferenceX workspace mount.
Bundles the GSM8K task YAML, score thresholds, and score validator under src/srtctl/benchmarks/scripts/lm-eval/.
Updates only recipes/vllm/deepseek-v4-pro/GB200/8k1k/disagg-gb200-1p4d-dep8-tp8-c256-c512-offload.yaml to run lm-eval with the bundled GSM8K task and VALIDATE_EVAL_SCORES=true.
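
As a rough illustration of the invocation described above, the sketch below assembles a python3 -m lm_eval --model local-chat-completions command line from the EVAL_LIMIT, EVAL_NUM_FEWSHOT, and EVAL_CONC environment variables named in this PR. It is not the contents of bench.sh. The build_lm_eval_cmd helper, the task name gsm8k, the default values, the endpoint URL, and the paths in the usage example are all assumptions made for illustration.

```python
import os
import shlex


def build_lm_eval_cmd(model_name, base_url, task_dir, output_dir, env=None):
    """Assemble an lm-eval command for an OpenAI-compatible chat endpoint.

    Hypothetical helper: the real bench.sh may pass different arguments.
    """
    env = os.environ if env is None else env
    conc = env.get("EVAL_CONC", "1")            # assumed default
    fewshot = env.get("EVAL_NUM_FEWSHOT", "5")  # assumed default
    limit = env.get("EVAL_LIMIT", "")           # empty: run the full task

    model_args = ",".join([
        f"model={model_name}",
        f"base_url={base_url}",
        f"num_concurrent={conc}",
    ])
    cmd = [
        "python3", "-m", "lm_eval",
        "--model", "local-chat-completions",
        "--model_args", model_args,
        "--tasks", "gsm8k",            # assumed name of the bundled task
        "--include_path", task_dir,    # directory holding the task YAML
        "--num_fewshot", fewshot,
        "--apply_chat_template",
        "--output_path", output_dir,
        "--log_samples",
    ]
    if limit:
        cmd += ["--limit", limit]
    return cmd


if __name__ == "__main__":
    # Placeholder values; none of these come from the PR itself.
    print(shlex.join(build_lm_eval_cmd(
        "Qwen/Qwen2.5-0.5B-Instruct",
        "http://localhost:8000/v1/chat/completions",
        "src/srtctl/benchmarks/scripts/lm-eval",
        "/tmp/lm-eval-out",
    )))
```

Returning the command as a list rather than a shell string keeps quoting out of the picture until the command is actually run or printed.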

Validation:

Local smoke: launched a small Qwen/Qwen2.5-0.5B-Instruct OpenAI-compatible chat endpoint and ran the bundled script with EVAL_LIMIT=2, EVAL_NUM_FEWSHOT=0, EVAL_CONC=1; it produced meta_env.json, results_*.json, and samples_*.jsonl successfully.
bash -n src/srtctl/benchmarks/scripts/lm-eval/bench.sh passed.
python3 -m py_compile src/srtctl/benchmarks/lm_eval.py src/srtctl/cli/do_sweep.py src/srtctl/benchmarks/scripts/lm-eval/validate_scores.py passed.
Focused tests passed: tests/test_benchmarks.py::TestLMEvalRunner, TestRunPostEval, and TestScriptsExist.
UV_DEFAULT_INDEX=https://pypi.org/simple make check passed (635 passed, 2 skipped, 6 deselected).
The known ty diagnostic in src/srtctl/core/validation.py is still emitted under the existing || true Makefile behavior.
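
The score check enabled by VALIDATE_EVAL_SCORES=true could look roughly like the sketch below, which loads the newest results_*.json from an output directory and compares metrics against minimum thresholds. This is not the bundled validate_scores.py. The function names, the threshold layout (task, then metric, then minimum), and the metric key used in the usage note are assumptions. The results layout (a top-level "results" object keyed by task) reflects how lm-eval output is commonly structured and should be checked against a real run.

```python
import glob
import json
import os


def load_latest_results(output_dir):
    """Return the parsed contents of the newest results_*.json under output_dir."""
    pattern = os.path.join(output_dir, "**", "results_*.json")
    paths = sorted(glob.glob(pattern, recursive=True), key=os.path.getmtime)
    if not paths:
        raise FileNotFoundError(f"no results_*.json under {output_dir}")
    with open(paths[-1], encoding="utf-8") as f:
        return json.load(f)


def validate_scores(results, thresholds):
    """Return a list of failure messages; an empty list means every check passed.

    thresholds maps task -> {metric: minimum}. This layout is an assumption,
    not the format of the thresholds file bundled in the PR.
    """
    failures = []
    for task, metrics in thresholds.items():
        task_scores = results.get("results", {}).get(task, {})
        for metric, minimum in metrics.items():
            score = task_scores.get(metric)
            if score is None:
                failures.append(f"{task}/{metric}: missing from results")
            elif score < minimum:
                failures.append(f"{task}/{metric}: {score:.4f} < {minimum:.4f}")
    return failures
```

A caller might run validate_scores(load_latest_results(out_dir), {"gsm8k": {"exact_match,strict-match": 0.9}}) and exit non-zero when the returned list is non-empty; the 0.9 threshold here is a placeholder, not a value from the PR.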

chore: track vLLM GB200 v0.20.0 baseline

2a7632a

alec-flowers changed the title ~~chore: track vLLM GB200 v0.20.0 baseline~~ feat(vllm): run DSv4 1p4d GSM8K via lm-eval Apr 28, 2026

alec-flowers force-pushed the aflowers/vllm-gb200-v0.20.0 branch from a48fe36 to 4acf4ea on April 28, 2026 03:39

alec-flowers changed the title ~~feat(vllm): run DSv4 1p4d GSM8K via lm-eval~~ feat(vllm): vllm gb200 dsv4 recipes Apr 28, 2026

Disable blocking apply preflight

c6df50a

alec-flowers force-pushed the aflowers/vllm-gb200-v0.20.0 branch from 707e933 to c6df50a on April 28, 2026 03:55

alec-flowers and others added 14 commits April 27, 2026 22:15

Fix DeepSeek V4 vLLM GB200 concurrencies (#105)

9aa7c9f

Keep max throughput DeepSeek V4 GB200 recipe

f50e486

Reserve infra node for DeepSeek V4 max tpt

2a0fd69

Keep selected DeepSeek V4 GB200 recipes

50b3970

Fit mid low latency recipe on 18 GB200 nodes

e913bb1

Run mid low latency at concurrencies 8 and 256

ac1da34

Split DeepSeek V4 GB200 curve recipes

54e7322

Split GB200 max throughput MegaMoE and offload recipes

18c5c67

Keep single offload max throughput recipe

b0a396e

Add vLLM v0.20 one-sided patch setup

3542513

Add InferenceX lm-eval runner for vLLM GSM8K

5e154a3

Vendor self-contained lm-eval GSM8K runner

5736dbb

Fix lm-eval dependency isolation

636ac46

Fix lm-eval venv setuptools pin

08a9082

alec-flowers force-pushed the aflowers/vllm-gb200-v0.20.0 branch from 406e5b4 to 7beaa58 on April 29, 2026 01:11

Add GB200 MegaMOE max throughput recipe

653a652

alec-flowers force-pushed the aflowers/vllm-gb200-v0.20.0 branch from 7beaa58 to 653a652 on April 29, 2026 01:32

alec-flowers added 7 commits April 28, 2026 21:05

Add GB200 mid-curve MegaMOE recipe

8dd0513

Run mid-curve MegaMOE at concurrency 128

e6058aa

Add GB200 DEP2 MegaMOE max throughput recipe

9a0c0bd

Restore GB200 low-middle curve recipe

78b5b0f

Fix GB200 MegaMOE 2x DEP8 recipe

5c1ae10

Update GB200 MegaMOE mid-curve concurrencies

a4be43b

Rename GB200 MegaMOE high throughput recipe

51659bd

alec-flowers added 23 commits April 29, 2026 09:50

Disable FlashInfer autotune on GB200 decode

a96b77e

Harden uv setup download

fac7557

Remove GB200 non-MegaMOE mid curve recipe

8e58743

Add GB200 TEP8 test recipe

370b012

Increase GB200 TEP8 prefill memory

5338083

Pin NIXL engine IDs for TEP8 test recipe

4d9457e

Remove stale GB200 max throughput recipe

892a0bb

Add GB200 DEP8 MegaMOE MTP test recipe

4d49f10

Narrow GB200 MTP test recipe concurrencies

6e69846

Use one MTP speculative token for GB200 test

54e870a

Add GB200 aggregate MegaMOE MTP test recipe

0e8bf18

Add PR41189 DSV4 disagg MTP2 test recipes

c5ac96b

Add PR41189 low-latency MTP2 c32-c64 fallback

a549f21

Add PR41189 3P1D offload prefill MTP2 fallback

cb871f0

Add PR41189 4P1D offload prefill MTP2 fallback

a0cee9b

Add PR41189 3P1D decode eager MTP2 fallback

ce9684c

Add PR41189 4P1D decode eager MTP2 fallback

e24d175

Add PR41189 3P1D decode eager c512 fallback

5136fb1

Add PR41189 3P1D MegaMOE c512 MTP2 probe

f2b5bc7

Promote GB200 MTP2 FP4 Pareto recipes

72c2eb2

Restore GB200 recipes and simplify MTP2 names

41a2b53

Fix vLLM NUMA bind patch path detection

f42aebb

Add vLLM issue 41603 reproduction artifacts

f676145

alec-flowers mentioned this pull request May 4, 2026

DeepSeek-V4 MTP2 GB200 throughput regression likely tied to FP32->FP4 cvt path (#41015) vllm-project/vllm#41603 (Open)

alec-flowers added 4 commits May 3, 2026 21:36

Remove failed vLLM issue 41603 artifacts

f0fc602

Use vLLM 0.20.1 for GB200 recipes

4bb97f1

Document vLLM issue 41603 patch wrapper

139cbcb

Remove duplicate PR41015 setup wrapper

8a8acdf

alec-flowers wants to merge 51 commits into main from aflowers/vllm-gb200-v0.20.0
