docs(qwen3-32b): add page with single-GH200 serving numbers#5

Open
FeathBow wants to merge 1 commit into openinfer-project:main from FeathBow:feat/qwen3-32b-page

FeathBow commented Jul 2, 2026

Summary

  • Add Qwen3-32B page — launch commands, serving numbers on a single GH200 120GB (QPS sweep, saturation ~5.3 req/s / ~680 out tok/s), architecture notes. Measured on openinfer main 5959f05.
  • Supported-models row — index table links the new page.
  • Sidebar entry — Models group now lists Qwen3-32B.
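The two saturation figures in the summary can be cross-checked with simple arithmetic. The sketch below assumes both numbers describe the same saturation point and that every request runs to completion; the function name is hypothetical and the result is only an implied average, not a figure reported in the PR.

```python
def implied_output_tokens_per_request(out_tok_per_s: float, req_per_s: float) -> float:
    """Average output tokens per request implied by two throughput figures."""
    if req_per_s <= 0:
        raise ValueError("req_per_s must be positive")
    return out_tok_per_s / req_per_s


# Approximate saturation figures quoted in the PR summary.
tokens = implied_output_tokens_per_request(out_tok_per_s=680.0, req_per_s=5.3)
print(f"{tokens:.1f}")  # prints 128.3
```

An implied average near 128 output tokens per request would be consistent with a fixed output length in the benchmark, though the summary does not state the output length used.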

Test plan

  • Rendered page reviewed at localhost (npm run build passes, 8 pages)
  • All commands on the page are runnable (download, launch, curl)
  • Benchmark data matches the run artifacts posted in the verification report

xiaguan commented Jul 4, 2026

Contributor

Thanks for the contribution! The benchmark methodology is solid and the HF correctness comparison is a nice touch. One structural change before we merge:

Merge into the Qwen3 family page

The existing qwen3-4b.md already serves as the Qwen3 family page — it hosts both 4B and 8B under one file, with 8B as a ### Qwen3-8B subsection (see qwen3-4b.md:65). Since your page explicitly notes 32B is "the same qwen3 crate ... same model_type: qwen3 ... no feature flags or build changes needed," it fits the same pattern.

Please merge the 32B content into qwen3-4b.md as a ### Qwen3-32B subsection (same shape as the 8B subsection), and delete src/content/docs/models/qwen3-32b.md. This keeps "one Qwen3 family = one page" consistent so future contributors know where Qwen3-N-B belongs.

Concretely:

  • Move the GH200 benchmark table, launch command, tool calling example, and architecture notes under ### Qwen3-32B in qwen3-4b.md.
  • Remove the standalone qwen3-32b.md file.
  • Drop the sidebar entry added in astro.config.mjs (the existing Qwen3-4B entry covers the family).
  • In index.mdx, keep the supported-models table as a single Qwen3 row pointing at /models/qwen3-4b/.

This also means tests/routes.mjs doesn't need changes (no new route), which avoids the silent CI gap that a standalone page would have introduced.
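The file-level part of the requested restructure can be checked locally before pushing. This is a minimal sketch, not part of the repository's tooling: `check_merge` is a hypothetical helper, and it assumes the family page sits in the same directory as the standalone page named in the review (the review gives only the filename `qwen3-4b.md`). It does not inspect `astro.config.mjs` or `index.mdx`.

```python
from pathlib import Path

# Directory taken from the qwen3-32b.md path quoted in the review.
DOCS_DIR = Path("src/content/docs/models")
# Assumption: the family page lives alongside the standalone page.
FAMILY_PAGE = DOCS_DIR / "qwen3-4b.md"
STANDALONE_PAGE = DOCS_DIR / "qwen3-32b.md"
HEADING = "### Qwen3-32B"


def check_merge(repo_root: Path) -> list[str]:
    """Return a list of problems; an empty list means the file layout matches the request."""
    problems = []
    if (repo_root / STANDALONE_PAGE).exists():
        problems.append(f"{STANDALONE_PAGE} still exists; it should be deleted")
    family = repo_root / FAMILY_PAGE
    if not family.is_file():
        problems.append(f"{FAMILY_PAGE} not found")
    else:
        lines = [line.strip() for line in family.read_text(encoding="utf-8").splitlines()]
        if HEADING not in lines:
            problems.append(f"{FAMILY_PAGE} has no '{HEADING}' subsection")
    return problems


if __name__ == "__main__":
    for problem in check_merge(Path(".")):
        print(problem)
```

Run from the repository root, the script prints nothing when the standalone page is gone and the family page contains the new subsection heading.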

Minor

index.mdx architecture column mixes axes. The current Qwen3-4B/8B row says "Full attention, tensor parallel" (a capability). If you keep that phrasing for the merged Qwen3 row, note that 32B also supports TP (same model_type). Alternatively, if the column describes the benchmark setup, both rows are actually TP1 single-GPU — pick one axis so readers aren't misled into thinking 4B/8B rely on TP while 32B relies on large VRAM.

FeathBow force-pushed the feat/qwen3-32b-page branch from 24f53b9 to 0bc419a on July 4, 2026 12:08
FeathBow commented Jul 4, 2026

Author

Thanks for the look :) Tool calling seems to be a general capability shared by all Qwen3 sizes. Should it be pulled out into its own section rather than sitting under Qwen3-32B?

