docs(qwen3-32b): add page with single-GH200 serving numbers by FeathBow · Pull Request #5 · openinfer-project/website

FeathBow · 2026-07-02T22:47:17Z

Summary

Add Qwen3-32B page — launch commands, serving numbers on a single GH200 120GB (QPS sweep, saturation ~5.3 req/s / ~680 out tok/s), architecture notes. Measured on openinfer main 5959f05.
Supported-models row — index table links the new page.
Sidebar entry — Models group now lists Qwen3-32B.
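
As a quick sanity check on the two saturation figures above, the sketch below derives the implied average output length per request. It assumes both numbers describe the same saturation point of the QPS sweep; the variable names are illustrative and do not come from the benchmark scripts.

```javascript
// Figures quoted in the summary; both are approximate ("~") in the source.
const saturationReqPerSec = 5.3;
const saturationOutTokPerSec = 680;

// Assumption: both numbers were measured at the same saturation point.
// Dividing token throughput by request throughput gives tokens per request.
const outTokPerRequest = saturationOutTokPerSec / saturationReqPerSec;

// Prints the implied average output length, rounded to a whole token.
console.log(Math.round(outTokPerRequest));
```

Under that assumption the sweep averages roughly 128 output tokens per request.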

Test plan

Rendered page reviewed at localhost (npm run build passes, 8 pages)
All commands on the page are runnable (download, launch, curl)
Benchmark data matches the run artifacts posted in the verification report

xiaguan · 2026-07-04T05:50:32Z

Thanks for the contribution! The benchmark methodology is solid and the HF correctness comparison is a nice touch. One structural change before we merge:

Merge into the Qwen3 family page

The existing qwen3-4b.md already serves as the Qwen3 family page — it hosts both 4B and 8B under one file, with 8B as a ### Qwen3-8B subsection (see qwen3-4b.md:65). Since your page explicitly notes 32B is "the same qwen3 crate ... same model_type: qwen3 ... no feature flags or build changes needed," it fits the same pattern.

Please merge the 32B content into qwen3-4b.md as a ### Qwen3-32B subsection (same shape as the 8B subsection), and delete src/content/docs/models/qwen3-32b.md. This keeps "one Qwen3 family = one page" consistent so future contributors know where Qwen3-N-B belongs.

Concretely:

Move the GH200 benchmark table, launch command, tool calling example, and architecture notes under ### Qwen3-32B in qwen3-4b.md.
Remove the standalone qwen3-32b.md file.
Drop the sidebar entry added in astro.config.mjs (the existing Qwen3-4B entry covers the family).
In index.mdx, keep the supported-models table as a single Qwen3 row pointing at /models/qwen3-4b/.

This also means tests/routes.mjs doesn't need changes (no new route), which avoids the silent CI gap that a standalone page would have introduced.
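
To illustrate the kind of gap meant here, a minimal sketch, assuming the route test keeps an explicit list of expected routes. The helper names and sample routes are hypothetical and are not taken from tests/routes.mjs.

```javascript
// Hypothetical helpers: none of these names come from the website repo.

// Normalize a route so "/models/qwen3-4b" and "/models/qwen3-4b/" compare equal.
function normalizeRoute(route) {
  const trimmed = route.trim();
  if (trimmed === "" || trimmed === "/") return "/";
  return "/" + trimmed.replace(/^\/+|\/+$/g, "") + "/";
}

// Return every built route that the test list does not mention.
function findUncoveredRoutes(builtRoutes, testedRoutes) {
  const tested = new Set(testedRoutes.map(normalizeRoute));
  return builtRoutes.map(normalizeRoute).filter((r) => !tested.has(r));
}

// Illustrative data only: a standalone 32B page adds a route the test list lacks.
const built = ["/", "/models/qwen3-4b/", "/models/qwen3-32b/"];
const tested = ["/", "/models/qwen3-4b"];
const uncovered = findUncoveredRoutes(built, tested);

// Lists the routes that would ship without any test touching them.
console.log(uncovered);
```

With a standalone page, the new route would build and deploy while the test list stayed green. Folding 32B into qwen3-4b.md adds no route, so nothing is left uncovered.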

Minor

index.mdx architecture column mixes axes. The current Qwen3-4B/8B row says "Full attention, tensor parallel" (a capability). If you keep that phrasing for the merged Qwen3 row, note that 32B also supports TP (same model_type). Alternatively, if the column describes the benchmark setup, both rows are actually TP1 single-GPU — pick one axis so readers aren't misled into thinking 4B/8B rely on TP while 32B relies on large VRAM.

FeathBow · 2026-07-04T12:11:30Z

Thanks for the look :) Tool calling seems to be a general capability shared by all Qwen3 sizes. Should it be pulled out into its own section rather than sitting under Qwen3-32B?

docs(qwen3-32b): add page with single-GH200 serving numbers

0bc419a

FeathBow force-pushed the feat/qwen3-32b-page branch from 24f53b9 to 0bc419a on July 4, 2026 12:08

FeathBow wants to merge 1 commit into openinfer-project:main from FeathBow:feat/qwen3-32b-page
