Add SGLang Cosmos3 serving docs #174
@yogeshbalaji Is this on the plan for the landing page?
```shell
sglang serve \
  --model-path nvidia/Cosmos3-Super-Image2Video \
  --num-gpus 4
```
If I'm not mistaken,
sglang serve \
  --model-path nvidia/Cosmos3-Super-Image2Video \
  --num-gpus 4
is equivalent to CFG parallelism plus a Ulysses degree of 2, i.e.
sglang serve \
  --model-path nvidia/Cosmos3-Super-Image2Video \
  --num-gpus 4 --enable-cfg-parallel --ulysses-degree 2
which is indeed the preferred way to serve multi-GPU inference, but only if the model fits on a single GPU (>80GB). It is only the best setup for performance; it doesn't reduce memory requirements.
A safer option would be to use FSDP as the example for the Cosmos3-Super checkpoint, since that setup actually reduces the memory requirement by sharding the weights across GPUs, i.e.:
sglang serve \
  --model-path nvidia/Cosmos3-Super-Image2Video \
  --num-gpus 4 --use-fsdp-inference
If we are looking for memory-friendly setups, yes, we could do better; either FSDP or offloading would do.
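The trade-off discussed above can be put in rough numbers. The following is a minimal sketch, assuming CFG parallelism uses two groups (conditional and unconditional passes) and that FSDP shards weights evenly. The 100 GB figure is a hypothetical placeholder, not a measured size for any Cosmos 3 checkpoint, and activations and other buffers are ignored.

```shell
# Sketch only: idealized per-GPU weight footprint for the two layouts.
# Assumption: CFG parallelism contributes a factor of 2 to the GPU count.
cfg_degree=2
ulysses_degree=2
num_gpus=$((cfg_degree * ulysses_degree))

# per_gpu_weight_gb TOTAL_GB NUM_GPUS MODE
#   replicated: every GPU holds the full weights (CFG + Ulysses layout)
#   sharded:    weights are split evenly across GPUs (idealized FSDP)
per_gpu_weight_gb() {
  case "$3" in
    replicated) echo "$1" ;;
    sharded)    echo $(( ($1 + $2 - 1) / $2 )) ;;  # ceiling division
    *)          return 1 ;;
  esac
}

total_gb=100  # hypothetical checkpoint size, for illustration only
echo "num_gpus=${num_gpus}"
echo "replicated: $(per_gpu_weight_gb "$total_gb" "$num_gpus" replicated) GB per GPU"
echo "sharded:    $(per_gpu_weight_gb "$total_gb" "$num_gpus" sharded) GB per GPU"
```

Under these assumptions the replicated layout needs the full weights on each GPU, while the sharded layout needs roughly a quarter of them on four GPUs.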
```shell
git clone https://github.com/sgl-project/sglang.git
cd sglang
pip install -e "python[diffusion]"
```
Can we make a tag/stable release of the sglang repo and pin it here?
This command will always download top-of-tree sglang, which is not what we want in the README.
Good point. I added an optional checkout step plus a version note. The default keeps tracking upstream SGLang to pick up ongoing Cosmos 3 fixes and performance improvements, while production or reproducible deployments should pin a release tag or known-good commit before install.
| Model | Status | Notes |
| --- | --- | --- |
| `nvidia/Cosmos3-Nano` | Supported | Text-to-image, text-to-video, image-to-video |
Probably good to specify we support other modalities such as sound and action.
Updated the wording to mention that Cosmos 3 includes video-with-sound and action/policy models, while keeping this SGLang section scoped to the currently supported T2I/T2V/I2V generator serving paths.
cd sglang
# Optional: pin a release tag or known-good commit for reproducible deployments.
# git checkout <release-tag-or-commit>
pip install -e "python[diffusion]"
Probably best to support uv or venv
Added a venv setup before the editable SGLang install.
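For readers following along, a venv-based install with an optional pin might look roughly like the sketch below. It is not the exact text added to the README. `setup_sglang_env` and `SGLANG_REF` are hypothetical names introduced here for illustration; the clone URL and the `python[diffusion]` extra come from the snippet under review.

```shell
# Sketch only: create a venv, clone SGLang, optionally pin, then install.
# SGLANG_REF is a hypothetical variable: set it to a release tag or
# known-good commit to pin; leave it empty to track upstream.
setup_sglang_env() {
  python3 -m venv .venv || return 1
  . .venv/bin/activate || return 1
  git clone https://github.com/sgl-project/sglang.git || return 1
  cd sglang || return 1
  if [ -n "${SGLANG_REF:-}" ]; then
    git checkout "$SGLANG_REF" || return 1
  fi
  pip install -e "python[diffusion]"
}
```

Calling `setup_sglang_env` with `SGLANG_REF` unset tracks upstream; setting it first gives a reproducible install. Note that the function activates the venv and changes directory in the calling shell.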
job_id=$(curl -sS -X POST http://localhost:30000/v1/videos \
  --form-string "prompt=A small warehouse robot moves a blue box across a clean floor." \
  --form-string "negative_prompt=blurry, distorted, low quality" \
  --form-string "size=1280x720" \
  --form-string "num_frames=81" \
  --form-string "fps=24" \
  --form-string "num_inference_steps=35" \
  --form-string "guidance_scale=4.0" \
  --form-string "flow_shift=10.0" \
  --form-string "seed=42" \
  --form-string 'extra_params={"guardrails":true,"use_resolution_template":false,"use_duration_template":false}' \
  | python -c 'import json, sys; print(json.load(sys.stdin)["id"])')

while true; do
  status=$(curl -sS "http://localhost:30000/v1/videos/${job_id}" \
    | python -c 'import json, sys; print(json.load(sys.stdin)["status"])')
  [ "$status" = "completed" ] && break
  [ "$status" = "failed" ] && exit 1
  sleep 1
done

curl -sS -L "http://localhost:30000/v1/videos/${job_id}/content" \
  -o cosmos3_t2v_output.mp4
Can we add comments here to improve readability?
Added comments for the submit, poll, and download steps in the video example.
while true; do
  status=$(curl -sS "http://localhost:30000/v1/videos/${job_id}" \
    | python -c 'import json, sys; print(json.load(sys.stdin)["status"])')
  [ "$status" = "completed" ] && break
  [ "$status" = "failed" ] && exit 1
  sleep 1
done
Is there a cleaner way to write this? Maybe:
until [ "$status" = completed ]; do
  status=$(curl -sS "http://localhost:30000/v1/videos/${job_id}" | jq -r .status)
  [ "$status" = failed ] && exit 1
  sleep 1
done
Also, generation usually takes 3 minutes on a single-GPU instance, so we can try sleeping for a longer duration.
Switched the polling snippet to a cleaner jq + until loop.
Increased the polling interval to 5 seconds to better match longer video generation times.
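One way to combine the suggestions in this thread (a longer interval, a cleaner loop) while also bounding the wait is sketched below. This is an illustration, not the snippet that landed in the README. `poll_job` and `fetch_video_status` are hypothetical helper names, the return codes are an arbitrary choice, and the fetcher assumes `jq` is on `PATH` and that the `/v1/videos/<id>` endpoint behaves as in the example under review.

```shell
# Sketch only: poll until a job completes, fails, or the attempt budget runs out.
# poll_job FETCH_CMD [INTERVAL_SECONDS] [MAX_ATTEMPTS]
#   FETCH_CMD is any command or function that prints the current status string.
#   Returns 0 on "completed", 1 on "failed", 2 on fetch error, 3 on timeout.
poll_job() {
  _fetch=$1
  _interval=${2:-5}
  _max=${3:-120}
  _n=0
  while [ "$_n" -lt "$_max" ]; do
    _status=$("$_fetch") || return 2
    case "$_status" in
      completed) return 0 ;;
      failed)    return 1 ;;
    esac
    _n=$((_n + 1))
    sleep "$_interval"
  done
  return 3
}

# Example fetcher for the video endpoint; expects job_id to be set by the caller.
fetch_video_status() {
  curl -sS "http://localhost:30000/v1/videos/${job_id}" | jq -r .status
}
```

With `job_id` set from the submit step, `poll_job fetch_video_status 5 120` would wait up to about ten minutes before giving up. Passing the fetcher as an argument keeps the loop testable without a running server.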
atharvajoshi10
left a comment
Thanks. Left some minor formatting comments; looks good otherwise.
Summary
Generator with SGLang section next to the existing Diffusers and vLLM-Omni serving paths
Notes
This section is intended to mirror the vLLM-Omni style while keeping the current SGLang support boundary explicit.