Add SGLang Cosmos3 serving docs#174

Open
mickqian wants to merge 7 commits into NVIDIA:main from mickqian:sglang-serve-cosmos3-docs

Conversation

@mickqian mickqian commented Jun 1, 2026

Summary

  • add a Generator with SGLang section next to the existing Diffusers and vLLM-Omni serving paths
  • document SGLang serve commands for Cosmos3-Nano and Cosmos3-Super
  • list the current SGLang-supported Cosmos3 visual generation checkpoints and note follow-up gaps for sound, V2V, and action modes
  • include OpenAI-compatible T2I and T2V request examples

Notes

This section is intended to mirror the vLLM-Omni style while keeping the current SGLang support boundary explicit.

@mickqian mickqian marked this pull request as ready for review June 1, 2026 16:59

lfengad (Collaborator) commented Jun 2, 2026

@yogeshbalaji Is this part of the plan for the landing page?

Comment thread README.md
```shell
sglang serve \
  --model-path nvidia/Cosmos3-Super-Image2Video \
  --num-gpus 4
```

If I'm not mistaken

sglang serve \
  --model-path nvidia/Cosmos3-Super-Image2Video \
  --num-gpus 4

is equivalent to CFG parallel + Ulysses degree 2, i.e.:

sglang serve \
  --model-path nvidia/Cosmos3-Super-Image2Video \
  --num-gpus 4 --enable-cfg-parallel --ulysses-degree 2

which is indeed the preferred way to serve multi-GPU inference, but only if the model fits on a single GPU (>80GB). It is the best setup for performance only; it doesn't reduce memory requirements.

A safer option would be to use FSDP as the example for the Cosmos3-Super checkpoint, since this setup actually does reduce the memory requirement by sharding the weights across GPUs, i.e.:

sglang serve \
  --model-path nvidia/Cosmos3-Super-Image2Video \
  --num-gpus 4 --use-fsdp-inference
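
To make the trade-off in this comment concrete, here is a minimal shell sketch that picks between the two launch styles. The helper name `pick_serve_flags` and its arguments are hypothetical. The flags are copied from the commands above and are not independently verified against the SGLang CLI. The sketch assumes that CFG parallelism splits the GPUs into two groups (which is what the stated equivalence of 4 GPUs to CFG parallel with Ulysses degree 2 implies) and that the GPU count is even.

```shell
# Hypothetical helper: choose `sglang serve` flags for a multi-GPU launch.
#   $1: number of GPUs (assumed even)
#   $2: "yes" if the full model fits on a single GPU, anything else otherwise
pick_serve_flags() {
  local num_gpus="$1" fits_on_one_gpu="$2"
  if [ "$fits_on_one_gpu" = "yes" ]; then
    # Assumption: CFG parallel takes a factor of 2, the rest goes to Ulysses.
    local ulysses_degree=$(( num_gpus / 2 ))
    echo "--num-gpus ${num_gpus} --enable-cfg-parallel --ulysses-degree ${ulysses_degree}"
  else
    # Weights do not fit on one GPU: shard them instead of replicating them.
    echo "--num-gpus ${num_gpus} --use-fsdp-inference"
  fi
}

# Print the resulting command rather than running it.
echo "sglang serve --model-path nvidia/Cosmos3-Super-Image2Video $(pick_serve_flags 4 no)"
```

The first branch favors throughput and the second favors fitting the model at all, which mirrors the distinction the comment draws.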

Author

If we are looking for memory-friendly setups, then yes, we could do better; either FSDP or offloading would work.

Comment thread README.md
```shell
git clone https://github.com/sgl-project/sglang.git
cd sglang
pip install -e "python[diffusion]"
```

Can we make a tag or stable release of the sglang repo and pin it here?
This command will always download top-of-tree sglang, which is not what we want in the README.

Author

@mickqian mickqian Jun 2, 2026

Good point. I added an optional checkout step plus a version note. The default keeps tracking upstream SGLang to pick up ongoing Cosmos 3 fixes and performance improvements, while production or reproducible deployments should pin a release tag or known-good commit before installing.

Comment thread README.md

| Model | Status | Notes |
| --- | --- | --- |
| `nvidia/Cosmos3-Nano` | Supported | Text-to-image, text-to-video, image-to-video |

Probably good to specify we support other modalities such as sound and action.

Author

Updated the wording to mention that Cosmos 3 includes video-with-sound and action/policy models, while keeping this SGLang section scoped to the currently supported T2I/T2V/I2V generator serving paths.

Comment thread README.md
cd sglang
# Optional: pin a release tag or known-good commit for reproducible deployments.
# git checkout <release-tag-or-commit>
pip install -e "python[diffusion]"

Probably best to support uv or venv

Author

Added a venv setup before the editable SGLang install.
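
As a rough illustration of how the venv setup and the optional pin discussed in this thread could fit together, the sketch below wraps the install steps in a function with a dry-run mode. `SGLANG_REF`, `DRY_RUN`, `run`, and `install_sglang` are hypothetical names introduced for this example and are not part of the README; the clone URL and the `pip install -e "python[diffusion]"` step are taken from the snippet above.

```shell
# With DRY_RUN=1, print each command instead of executing it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

install_sglang() {
  run git clone https://github.com/sgl-project/sglang.git
  run cd sglang
  # Optional pin: only check out a ref when SGLANG_REF is set (hypothetical variable).
  if [ -n "${SGLANG_REF:-}" ]; then
    run git checkout "$SGLANG_REF"
  fi
  # Create an isolated environment, then install SGLang into it in editable mode.
  run python3 -m venv .venv
  run .venv/bin/pip install -e "python[diffusion]"
}

# Preview the steps for a pinned install without touching the system.
DRY_RUN=1 SGLANG_REF="<release-tag-or-commit>" install_sglang
```

Leaving `SGLANG_REF` unset keeps the default behavior of tracking upstream, while setting it gives the reproducible variant the review asked for.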

Comment thread README.md
Comment on lines +472 to +494
job_id=$(curl -sS -X POST http://localhost:30000/v1/videos \
  --form-string "prompt=A small warehouse robot moves a blue box across a clean floor." \
  --form-string "negative_prompt=blurry, distorted, low quality" \
  --form-string "size=1280x720" \
  --form-string "num_frames=81" \
  --form-string "fps=24" \
  --form-string "num_inference_steps=35" \
  --form-string "guidance_scale=4.0" \
  --form-string "flow_shift=10.0" \
  --form-string "seed=42" \
  --form-string 'extra_params={"guardrails":true,"use_resolution_template":false,"use_duration_template":false}' \
  | python -c 'import json, sys; print(json.load(sys.stdin)["id"])')

while true; do
  status=$(curl -sS "http://localhost:30000/v1/videos/${job_id}" \
    | python -c 'import json, sys; print(json.load(sys.stdin)["status"])')
  [ "$status" = "completed" ] && break
  [ "$status" = "failed" ] && exit 1
  sleep 1
done

curl -sS -L "http://localhost:30000/v1/videos/${job_id}/content" \
  -o cosmos3_t2v_output.mp4

Can we add comments here to improve readability?

Author

Added comments for the submit, poll, and download steps in the video example.
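
For readers of this thread, one possible shape for a commented version of the submit and download steps is sketched below. It is not the text that landed in the README. `BASE_URL`, `json_field`, `submit_video_job`, and `download_video` are hypothetical names; the endpoints and form fields are copied from the snippet under review, and only a subset of the fields is shown. It assumes `python3` is available for JSON parsing.

```shell
BASE_URL="${BASE_URL:-http://localhost:30000}"

# Read a JSON object on stdin and print one top-level field.
json_field() {
  python3 -c 'import json, sys; print(json.load(sys.stdin)[sys.argv[1]])' "$1"
}

# Submit: POST the prompt as form fields and print the job id from the response.
submit_video_job() {
  curl -sS -X POST "${BASE_URL}/v1/videos" \
    --form-string "prompt=$1" \
    --form-string "size=1280x720" \
    --form-string "num_frames=81" \
    --form-string "fps=24" \
    --form-string "seed=42" \
    | json_field id
}

# Download: fetch the finished video for job id $1 into file $2.
download_video() {
  curl -sS -L "${BASE_URL}/v1/videos/$1/content" -o "$2"
}
```

A caller would capture the id with `job_id=$(submit_video_job "A small warehouse robot moves a blue box across a clean floor.")`, wait until the job reports `completed`, and then run `download_video "$job_id" cosmos3_t2v_output.mp4`.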

Comment thread README.md Outdated
Comment on lines +485 to +491
while true; do
  status=$(curl -sS "http://localhost:30000/v1/videos/${job_id}" \
    | python -c 'import json, sys; print(json.load(sys.stdin)["status"])')
  [ "$status" = "completed" ] && break
  [ "$status" = "failed" ] && exit 1
  sleep 1
done

Is there a cleaner way to write this?
Maybe

until [ "$status" = completed ]; do
  status=$(curl -sS "http://localhost:30000/v1/videos/${job_id}" | jq -r .status)
  [ "$status" = failed ] && exit 1
  sleep 1
done

Also, generation usually takes 3 minutes on a single-GPU instance, so we can try sleeping for a longer duration between polls.

Author

Switched the polling snippet to a cleaner jq + until loop.

Author

Increased the polling interval to 5 seconds to better match longer video generation times.
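
A polling loop like the one discussed here can also be given an overall timeout so that a stuck job does not block the script forever. The following is a sketch under stated assumptions, not the snippet from the PR: `poll_video_job`, `http_status`, `POLL_INTERVAL`, and `POLL_TIMEOUT` are hypothetical names, and the 600-second default is an arbitrary choice sized against the roughly 3-minute generation time mentioned above. The status command is passed in as an argument so the loop can be exercised without a running server.

```shell
# Poll a job until it completes, fails, or the timeout is reached.
#   $1: job id
#   $2: name of a command that prints the job status when given the job id
# Returns 0 on "completed", 1 on "failed", 2 on timeout.
poll_video_job() {
  local job_id="$1" get_status="$2"
  local interval="${POLL_INTERVAL:-5}"   # seconds between polls
  local timeout="${POLL_TIMEOUT:-600}"   # give up after this many seconds
  local waited=0 status=""
  while :; do
    status=$("$get_status" "$job_id")
    case "$status" in
      completed) return 0 ;;
      failed)    return 1 ;;
    esac
    if [ "$waited" -ge "$timeout" ]; then
      return 2
    fi
    sleep "$interval"
    waited=$(( waited + interval ))
  done
}

# Status command matching the jq-based suggestion in this thread.
http_status() {
  curl -sS "http://localhost:30000/v1/videos/$1" | jq -r .status
}
```

With a server running, `poll_video_job "$job_id" http_status || exit 1` would replace the inline loop; distinguishing the timeout (2) from a failed job (1) is a design choice made for this sketch.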

@atharvajoshi10 atharvajoshi10 left a comment

Thanks. Left some minor formatting comments; looks good otherwise.

@lfengad lfengad requested a review from vinjn June 3, 2026 06:28