From df000367811fd5d6202fc182dc6f0818e5f070b2 Mon Sep 17 00:00:00 2001
From: Mick <mickjagger19@icloud.com>
Date: Mon, 1 Jun 2026 17:54:14 +0800
Subject: [PATCH 1/8] Add SGLang Cosmos3 serving docs

---
 README.md | 99 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 99 insertions(+)
diff --git a/README.md b/README.md
index ed3571b..7cc0083 100644
--- a/README.md
+++ b/README.md
@@ -25,6 +25,7 @@
   - [Quickstart](#quickstart)
     - [Generator with Diffusers](#generator-with-diffusers)
     - [Generator with vLLM-Omni](#generator-with-vllm-omni)
+    - [Generator with SGLang](#generator-with-sglang)
     - [Reasoner with Transformers](#reasoner-with-transformers)
     - [Reasoner with vLLM](#reasoner-with-vllm)
   - [Troubleshooting](#troubleshooting)
@@ -413,6 +414,104 @@ References:
 
 </details>
 
+#### Generator with SGLang
+
+<details>
+<summary>Expand SGLang generator setup, endpoints, and request reference</summary>
+
+Use SGLang Diffusion for native Cosmos 3 visual generation behind OpenAI-compatible image and video APIs. SGLang currently supports text-to-image, text-to-video, and image-to-video for the generator checkpoints; video-to-video, video-with-sound, and action modes are planned separately.
+
+Supported checkpoints:
+
+| Model | Status | Notes |
+| --- | --- | --- |
+| `nvidia/Cosmos3-Nano` | Supported | Text-to-image, text-to-video, image-to-video |
+| `nvidia/Cosmos3-Super` | Supported | Use multiple GPUs for the 64B checkpoint |
+| `nvidia/Cosmos3-Super-Text2Image` | Supported | Text-to-image specialized checkpoint |
+| `nvidia/Cosmos3-Super-Image2Video` | Supported | Image-to-video specialized checkpoint |
+| `nvidia/Cosmos3-Nano-Policy-DROID` | Not supported yet | Action/policy checkpoint |
+
+Install SGLang with diffusion extras:
+
+```shell
+pip install -e "python[diffusion]"
+pip install "cosmos-guardrail==0.3.1"
+```
+
+Start a Nano server:
+
+```shell
+sglang serve \
+  --model-type diffusion \
+  --model-path nvidia/Cosmos3-Nano \
+  --num-gpus 1 \
+  --host 0.0.0.0 \
+  --port 8000 \
+  --output-path /tmp/cosmos3-sglang
+```
+
+For `Cosmos3-Super`, use multiple GPUs:
+
+```shell
+sglang serve \
+  --model-type diffusion \
+  --model-path nvidia/Cosmos3-Super \
+  --num-gpus 4 \
+  --host 0.0.0.0 \
+  --port 8000 \
+  --output-path /tmp/cosmos3-sglang
+```
+
+Vision endpoints:
+
+| Mode | Endpoint | Notes |
+| --- | --- | --- |
+| Text to image | `POST /v1/images/generations` | Returns base64 by default for Cosmos 3 |
+| Text to video | `POST /v1/videos/sync` or `POST /v1/videos` | `/sync` blocks and returns MP4 bytes with `Accept: video/mp4` |
+| Image to video | `POST /v1/videos/sync` or `POST /v1/videos` | Upload the conditioning image with `input_reference` |
+
+Text-to-video example:
+
+```shell
+curl -sS -X POST http://localhost:8000/v1/videos/sync \
+  -H "Accept: video/mp4" \
+  --form-string "prompt=A small warehouse robot moves a blue box across a clean floor." \
+  --form-string "negative_prompt=blurry, distorted, low quality" \
+  --form-string "size=1280x720" \
+  --form-string "num_frames=81" \
+  --form-string "fps=24" \
+  --form-string "num_inference_steps=35" \
+  --form-string "guidance_scale=4.0" \
+  --form-string "flow_shift=10.0" \
+  --form-string "seed=42" \
+  --form-string 'extra_params={"guardrails":true,"use_resolution_template":false,"use_duration_template":false}' \
+  -o cosmos3_t2v_output.mp4
+```
+
+Text-to-image example:
+
+```shell
+curl -sS -X POST http://localhost:8000/v1/images/generations \
+  -H "Content-Type: application/json" \
+  -d '{
+    "prompt": "A warehouse robot folds a blue cloth on a clean workbench.",
+    "size": "1280x720",
+    "n": 1,
+    "num_inference_steps": 35,
+    "guidance_scale": 6.0,
+    "flow_shift": 10.0,
+    "seed": 0,
+    "extra_args": {
+      "use_resolution_template": false,
+      "guardrails": true
+    }
+  }'
+```
+
+SGLang accepts the same Cosmos 3 request knobs used by the vLLM-Omni examples: `max_sequence_length`, `flow_shift`, `extra_params.guardrails`, `extra_params.use_resolution_template`, and `extra_params.use_duration_template`. Guardrails are enabled by default when `cosmos-guardrail` is installed; set `SGLANG_DISABLE_COSMOS3_GUARDRAILS=1` before starting the server to skip loading the guardrail models.
+
+</details>
+
 #### Reasoner with Transformers
 Coming soon!
 

From 52ff056b5b236b86b4c17cb7383f21f64ede54c6 Mon Sep 17 00:00:00 2001
From: Mick <mickjagger19@icloud.com>
Date: Mon, 1 Jun 2026 20:04:00 +0800
Subject: [PATCH 2/8] Use async SGLang video API in Cosmos3 docs

---
 README.md | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index 7cc0083..c9fd0bf 100644
--- a/README.md
+++ b/README.md
@@ -467,14 +467,13 @@ Vision endpoints:
 | Mode | Endpoint | Notes |
 | --- | --- | --- |
 | Text to image | `POST /v1/images/generations` | Returns base64 by default for Cosmos 3 |
-| Text to video | `POST /v1/videos/sync` or `POST /v1/videos` | `/sync` blocks and returns MP4 bytes with `Accept: video/mp4` |
-| Image to video | `POST /v1/videos/sync` or `POST /v1/videos` | Upload the conditioning image with `input_reference` |
+| Text to video | `POST /v1/videos` | Creates an async job; poll `GET /v1/videos/{id}` and download `/content` |
+| Image to video | `POST /v1/videos` | Upload the conditioning image with `input_reference` |
 
 Text-to-video example:
 
 ```shell
-curl -sS -X POST http://localhost:8000/v1/videos/sync \
-  -H "Accept: video/mp4" \
+job_id=$(curl -sS -X POST http://localhost:8000/v1/videos \
   --form-string "prompt=A small warehouse robot moves a blue box across a clean floor." \
   --form-string "negative_prompt=blurry, distorted, low quality" \
   --form-string "size=1280x720" \
@@ -485,6 +484,17 @@ curl -sS -X POST http://localhost:8000/v1/videos/sync \
   --form-string "flow_shift=10.0" \
   --form-string "seed=42" \
   --form-string 'extra_params={"guardrails":true,"use_resolution_template":false,"use_duration_template":false}' \
+  | python -c 'import json, sys; print(json.load(sys.stdin)["id"])')
+
+while true; do
+  status=$(curl -sS "http://localhost:8000/v1/videos/${job_id}" \
+    | python -c 'import json, sys; print(json.load(sys.stdin)["status"])')
+  [ "$status" = "completed" ] && break
+  [ "$status" = "failed" ] && exit 1
+  sleep 1
+done
+
+curl -sS -L "http://localhost:8000/v1/videos/${job_id}/content" \
   -o cosmos3_t2v_output.mp4
 ```
 

From 44e38e7128a53b760a36fe7da1cd6d72bf2a096a Mon Sep 17 00:00:00 2001
From: Mick <mickjagger19@icloud.com>
Date: Tue, 2 Jun 2026 00:58:52 +0800
Subject: [PATCH 3/8] Update SGLang install snippet

---
 README.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/README.md b/README.md
index c9fd0bf..7a722fa 100644
--- a/README.md
+++ b/README.md
@@ -434,6 +434,8 @@ Supported checkpoints:
 Install SGLang with diffusion extras:
 
 ```shell
+git clone https://github.com/sgl-project/sglang.git
+cd sglang
 pip install -e "python[diffusion]"
 pip install "cosmos-guardrail==0.3.1"
 ```

From 76c41280c998d60dac4bf0d18255b264b9f85529 Mon Sep 17 00:00:00 2001
From: Mick <mickjagger19@icloud.com>
Date: Tue, 2 Jun 2026 01:04:27 +0800
Subject: [PATCH 4/8] Remove vLLM reference from SGLang docs

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 7a722fa..bd17041 100644
--- a/README.md
+++ b/README.md
@@ -520,7 +520,7 @@ curl -sS -X POST http://localhost:8000/v1/images/generations \
   }'
 ```
 
-SGLang accepts the same Cosmos 3 request knobs used by the vLLM-Omni examples: `max_sequence_length`, `flow_shift`, `extra_params.guardrails`, `extra_params.use_resolution_template`, and `extra_params.use_duration_template`. Guardrails are enabled by default when `cosmos-guardrail` is installed; set `SGLANG_DISABLE_COSMOS3_GUARDRAILS=1` before starting the server to skip loading the guardrail models.
+SGLang accepts Cosmos 3 request options including `max_sequence_length`, `flow_shift`, `extra_params.guardrails`, `extra_params.use_resolution_template`, and `extra_params.use_duration_template`. Guardrails are enabled by default when `cosmos-guardrail` is installed; set `SGLANG_DISABLE_COSMOS3_GUARDRAILS=1` before starting the server to skip loading the guardrail models.
 
 </details>
 

From 54bf1407363f23798d06ae1124987b88dd8b00ff Mon Sep 17 00:00:00 2001
From: Mick <mickjagger19@icloud.com>
Date: Tue, 2 Jun 2026 01:10:56 +0800
Subject: [PATCH 5/8] Simplify SGLang Cosmos3 serve examples

Reduce the Nano serve command to --model-path only and drop the
--model-type, --host, --port, and --output-path flags from the
multi-GPU example. Replace the Cosmos3-Super example with
Cosmos3-Super-Image2Video, and point the curl examples at port 30000
instead of 8000.

---
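Note: the endpoint table says to upload the conditioning image with
`input_reference`, but the README has no image-to-video request example.
A minimal sketch of such a request is below. It assumes the field is sent
as a multipart file upload to the same `POST /v1/videos` endpoint used by
the text-to-video example; the `submit_i2v` helper and the `SGLANG_URL`
variable are hypothetical names used only for illustration.

```shell
# submit_i2v IMAGE_PATH PROMPT
# Submits an image-to-video job and prints the raw JSON response.
# Assumption: input_reference is a multipart file field, as the endpoint
# table suggests; other request fields are left at server defaults.
submit_i2v() {
  image_path=$1
  prompt=$2
  # SGLANG_URL is a hypothetical override; the default matches the README.
  curl -sS -X POST "${SGLANG_URL:-http://localhost:30000}/v1/videos" \
    --form-string "prompt=${prompt}" \
    -F "input_reference=@${image_path}"
}
```

Example: `submit_i2v first_frame.png "The robot lifts the box." | jq -r .id`
prints the job ID, which can then be polled as in the text-to-video example.
The prompt goes through `--form-string` so a leading `@` or `<` in the text
is not interpreted by curl.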
 README.md | 26 ++++++++------------------
 1 file changed, 8 insertions(+), 18 deletions(-)

diff --git a/README.md b/README.md
index bd17041..5fef260 100644
--- a/README.md
+++ b/README.md
@@ -443,25 +443,15 @@ pip install "cosmos-guardrail==0.3.1"
 Start a Nano server:
 
 ```shell
-sglang serve \
-  --model-type diffusion \
-  --model-path nvidia/Cosmos3-Nano \
-  --num-gpus 1 \
-  --host 0.0.0.0 \
-  --port 8000 \
-  --output-path /tmp/cosmos3-sglang
+sglang serve --model-path nvidia/Cosmos3-Nano
 ```
 
-For `Cosmos3-Super`, use multiple GPUs:
+To serve the image-to-video specialized checkpoint `nvidia/Cosmos3-Super-Image2Video`, use multiple GPUs:
 
 ```shell
 sglang serve \
-  --model-type diffusion \
-  --model-path nvidia/Cosmos3-Super \
-  --num-gpus 4 \
-  --host 0.0.0.0 \
-  --port 8000 \
-  --output-path /tmp/cosmos3-sglang
+  --model-path nvidia/Cosmos3-Super-Image2Video \
+  --num-gpus 4
 ```
 
 Vision endpoints:
@@ -475,7 +465,7 @@ Vision endpoints:
 Text-to-video example:
 
 ```shell
-job_id=$(curl -sS -X POST http://localhost:8000/v1/videos \
+job_id=$(curl -sS -X POST http://localhost:30000/v1/videos \
   --form-string "prompt=A small warehouse robot moves a blue box across a clean floor." \
   --form-string "negative_prompt=blurry, distorted, low quality" \
   --form-string "size=1280x720" \
@@ -489,21 +479,21 @@ job_id=$(curl -sS -X POST http://localhost:8000/v1/videos \
   | python -c 'import json, sys; print(json.load(sys.stdin)["id"])')
 
 while true; do
-  status=$(curl -sS "http://localhost:8000/v1/videos/${job_id}" \
+  status=$(curl -sS "http://localhost:30000/v1/videos/${job_id}" \
     | python -c 'import json, sys; print(json.load(sys.stdin)["status"])')
   [ "$status" = "completed" ] && break
   [ "$status" = "failed" ] && exit 1
   sleep 1
 done
 
-curl -sS -L "http://localhost:8000/v1/videos/${job_id}/content" \
+curl -sS -L "http://localhost:30000/v1/videos/${job_id}/content" \
   -o cosmos3_t2v_output.mp4
 ```
 
 Text-to-image example:
 
 ```shell
-curl -sS -X POST http://localhost:8000/v1/images/generations \
+curl -sS -X POST http://localhost:30000/v1/images/generations \
   -H "Content-Type: application/json" \
   -d '{
     "prompt": "A warehouse robot folds a blue cloth on a clean workbench.",

From 499f6295d35975a13c7171a8643332500d758110 Mon Sep 17 00:00:00 2001
From: Mick <mickjagger19@icloud.com>
Date: Tue, 2 Jun 2026 20:47:41 +0800
Subject: [PATCH 6/8] Clarify SGLang install pinning guidance

---
 README.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/README.md b/README.md
index 5fef260..a23e1de 100644
--- a/README.md
+++ b/README.md
@@ -436,10 +436,14 @@ Install SGLang with diffusion extras:
 ```shell
 git clone https://github.com/sgl-project/sglang.git
 cd sglang
+# Optional: pin a release tag or known-good commit for reproducible deployments.
+# git checkout <release-tag-or-commit>
 pip install -e "python[diffusion]"
 pip install "cosmos-guardrail==0.3.1"
 ```
 
+> **Version note:** Cosmos 3 support in SGLang Diffusion is actively improving. The command above installs the latest upstream SGLang so users get current Cosmos 3 fixes and performance work. For production deployments or reproducible benchmarks, pin an SGLang release tag or a known-good commit before running `pip install`.
+
 Start a Nano server:
 
 ```shell

From 03e80fe464b0e0b8d930a88ac1457a019e29443b Mon Sep 17 00:00:00 2001
From: Mick <mickjagger19@icloud.com>
Date: Wed, 3 Jun 2026 08:52:47 +0800
Subject: [PATCH 7/8] Update SGLang Cosmos3 README comments

Reword the section intro to scope it to the supported serving paths,
add virtualenv setup to the install snippet, parse JSON responses with
jq instead of python, poll every 5 seconds instead of every second, and
comment each step of the text-to-video example.

---
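Note: the polling loop in the README waits indefinitely if the status is
never `completed` or `failed` (for example, when `jq` prints `null` for an
error response). One possible bounded variant is sketched below. The
`poll_job` and `fetch_status` helpers are hypothetical names, and the
`completed` and `failed` status strings are taken from the README example.

```shell
# poll_job MAX_ATTEMPTS INTERVAL_SECONDS CMD [ARGS...]
# Runs CMD until it prints "completed" (return 0) or "failed" (return 1).
# Returns 2 once MAX_ATTEMPTS is exhausted, so an unexpected status such
# as "null" cannot loop forever.
poll_job() {
  max_attempts=$1
  interval=$2
  shift 2
  attempt=0
  while [ "$attempt" -lt "$max_attempts" ]; do
    # Treat a failing status command as "unknown" and keep polling.
    job_state=$("$@") || job_state=""
    case "$job_state" in
      completed) return 0 ;;
      failed) return 1 ;;
    esac
    attempt=$((attempt + 1))
    sleep "$interval"
  done
  return 2
}

# Hypothetical status command mirroring the README example; assumes jq is
# installed and the server listens on localhost:30000.
fetch_status() {
  curl -sS "http://localhost:30000/v1/videos/$1" | jq -r .status
}
```

Example: `poll_job 120 5 fetch_status "$job_id"` waits up to about ten
minutes. The helper uses `return` rather than `exit`, so pasting it into an
interactive shell does not close the session on failure, and it stores the
result in `job_state` because zsh reserves `status` as a read-only alias
for `$?`.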
 README.md | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/README.md b/README.md
index a23e1de..e1723e5 100644
--- a/README.md
+++ b/README.md
@@ -419,7 +419,7 @@ References:
 <details>
 <summary>Expand SGLang generator setup, endpoints, and request reference</summary>
 
-Use SGLang Diffusion for native Cosmos 3 visual generation behind OpenAI-compatible image and video APIs. SGLang currently supports text-to-image, text-to-video, and image-to-video for the generator checkpoints; video-to-video, video-with-sound, and action modes are planned separately.
+Use SGLang Diffusion for native Cosmos 3 visual generation behind OpenAI-compatible image and video APIs. Cosmos 3 also includes video-with-sound and action/policy models, but this section covers only the serving paths SGLang currently supports: text-to-image, text-to-video, and image-to-video.
 
 Supported checkpoints:
 
@@ -438,6 +438,9 @@ git clone https://github.com/sgl-project/sglang.git
 cd sglang
 # Optional: pin a release tag or known-good commit for reproducible deployments.
 # git checkout <release-tag-or-commit>
+python -m venv .venv
+source .venv/bin/activate
+python -m pip install --upgrade pip
 pip install -e "python[diffusion]"
 pip install "cosmos-guardrail==0.3.1"
 ```
@@ -469,6 +472,7 @@ Vision endpoints:
 Text-to-video example:
 
 ```shell
+# Submit an async video generation job and capture its ID.
 job_id=$(curl -sS -X POST http://localhost:30000/v1/videos \
   --form-string "prompt=A small warehouse robot moves a blue box across a clean floor." \
   --form-string "negative_prompt=blurry, distorted, low quality" \
@@ -480,16 +484,17 @@ job_id=$(curl -sS -X POST http://localhost:30000/v1/videos \
   --form-string "flow_shift=10.0" \
   --form-string "seed=42" \
   --form-string 'extra_params={"guardrails":true,"use_resolution_template":false,"use_duration_template":false}' \
-  | python -c 'import json, sys; print(json.load(sys.stdin)["id"])')
+  | jq -r .id)
 
-while true; do
-  status=$(curl -sS "http://localhost:30000/v1/videos/${job_id}" \
-    | python -c 'import json, sys; print(json.load(sys.stdin)["status"])')
-  [ "$status" = "completed" ] && break
+# Poll until the job completes. Cosmos 3 video generation can take several minutes.
+status=""
+until [ "$status" = "completed" ]; do
+  status=$(curl -sS "http://localhost:30000/v1/videos/${job_id}" | jq -r .status)
   [ "$status" = "failed" ] && exit 1
-  sleep 1
+  sleep 5
 done
 
+# Download the completed MP4.
 curl -sS -L "http://localhost:30000/v1/videos/${job_id}/content" \
   -o cosmos3_t2v_output.mp4
 ```

From 53049aeae8830171c0ba37b3dc3ebbd9f157a28c Mon Sep 17 00:00:00 2001
From: Mick <mickjagger19@icloud.com>
Date: Thu, 4 Jun 2026 08:19:54 +0800
Subject: [PATCH 8/8] Use SGLang main branch for Cosmos3 install

---
 README.md | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/README.md b/README.md
index e1723e5..1f89dc8 100644
--- a/README.md
+++ b/README.md
@@ -431,13 +431,11 @@ Supported checkpoints:
 | `nvidia/Cosmos3-Super-Image2Video` | Supported | Image-to-video specialized checkpoint |
 | `nvidia/Cosmos3-Nano-Policy-DROID` | Not supported yet | Action/policy checkpoint |
 
-Install SGLang with diffusion extras:
+Install SGLang from the main branch with diffusion extras:
 
 ```shell
-git clone https://github.com/sgl-project/sglang.git
+git clone --branch main https://github.com/sgl-project/sglang.git
 cd sglang
-# Optional: pin a release tag or known-good commit for reproducible deployments.
-# git checkout <release-tag-or-commit>
 python -m venv .venv
 source .venv/bin/activate
 python -m pip install --upgrade pip
@@ -445,7 +443,7 @@ pip install -e "python[diffusion]"
 pip install "cosmos-guardrail==0.3.1"
 ```
 
-> **Version note:** Cosmos 3 support in SGLang Diffusion is actively improving. The command above installs the latest upstream SGLang so users get current Cosmos 3 fixes and performance work. For production deployments or reproducible benchmarks, pin an SGLang release tag or a known-good commit before running `pip install`.
+> **Version note:** Cosmos 3 support in SGLang Diffusion currently requires the SGLang main branch. Switch to a stable SGLang release once Cosmos 3 support is included there.
 
 Start a Nano server: