Hi team, thank you for releasing Cosmos3.
I have a question about Reasoner input support.
On the Hugging Face model card for nvidia/Cosmos3-Nano, the Reasoner section appears to describe support for Text, Text+Image, and Text+Video, with video format listed as MP4.
However, when I run cosmos-framework inference in reasoner mode with a local MP4 in vision_path, inference fails: the pipeline appears to open vision_path as an image with PIL, which raises PIL.UnidentifiedImageError.
Could you clarify the expected behavior?
Is video input for reasoner mode currently supported in cosmos-framework CLI inference?
If yes, what is the correct input format and example JSON for video reasoning?
If not yet supported, is this a known gap between the model card and current implementation?
Are there recommended workarounds (for example, frame sampling or storyboard image input) until native video-reasoner support is available?
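To make the frame-sampling idea concrete, here is a rough sketch of what I have in mind. The helper name uniform_frame_indices is my own and is not part of cosmos-framework. It only picks which frames to extract. Decoding those frames with a separate tool and passing each one as an image in vision_path is my assumption, and I have not confirmed that reasoner mode would treat them as intended.

```python
def uniform_frame_indices(total_frames: int, num_samples: int) -> list[int]:
    """Pick frame indices spread evenly across a clip (hypothetical helper).

    The clip is split into num_samples equal segments and the frame at the
    midpoint of each segment is chosen, so the first and last moments of the
    video are not over-represented.
    """
    if total_frames <= 0 or num_samples <= 0:
        return []
    # Never ask for more frames than the clip contains.
    num_samples = min(num_samples, total_frames)
    # Integer arithmetic avoids floating-point rounding at segment midpoints.
    return [
        ((2 * i + 1) * total_frames) // (2 * num_samples)
        for i in range(num_samples)
    ]


if __name__ == "__main__":
    # Example: choose 4 frames from a 100-frame clip.
    print(uniform_frame_indices(100, 4))
```

Each selected frame would then be saved as a still image (for example PNG) and supplied where the MP4 path is currently rejected.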
Minimal reproduction:
model_mode: reasoner
vision_path: local MP4 file
command: torchrun inference with Cosmos3-Nano
observed error: PIL.UnidentifiedImageError on the MP4 path
Thanks in advance for clarification.