Question: Is Reasoner expected to support MP4 video input in cosmos-framework inference?

Hi team, thank you for releasing Cosmos3.

I have a question about Reasoner input support.

On the Hugging Face model card for nvidia/Cosmos3-Nano, the Reasoner section appears to describe support for Text, Text+Image, and Text+Video, with video format listed as MP4.

However, when I run cosmos-framework inference in reasoner mode with a local MP4 in vision_path, it fails because the pipeline tries to open vision_path as an image (PIL), which raises an UnidentifiedImageError.
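
To make the expectation concrete, here is a minimal sketch of the kind of check I assumed would happen before a loader is chosen. It uses only Python's standard `mimetypes` module. `classify_vision_path` is a hypothetical helper I wrote for illustration, not a cosmos-framework function, and I have not verified how the pipeline actually decides which loader to use.

```python
import mimetypes


def classify_vision_path(vision_path: str) -> str:
    """Return "image", "video", or "unknown" based on the file extension.

    Hypothetical helper for illustration only; not part of cosmos-framework.
    """
    # guess_type looks only at the file name, so the file need not exist.
    mime_type, _ = mimetypes.guess_type(vision_path)
    if mime_type is None:
        return "unknown"
    top_level = mime_type.split("/", 1)[0]
    return top_level if top_level in ("image", "video") else "unknown"


if __name__ == "__main__":
    for path in ("frame.png", "clip.mp4", "notes.txt"):
        print(path, classify_vision_path(path))
```

With a check like this, an MP4 path could be routed to a video decoder (or rejected with a clear message) instead of reaching PIL's image loader.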

Could you clarify the expected behavior?

1. Is video input for reasoner mode currently supported in cosmos-framework CLI inference?
2. If yes, what is the correct input format and example JSON for video reasoning?
3. If not yet supported, is this a known gap between the model card and current implementation?
4. Are there recommended workarounds (for example, frame sampling or storyboard image input) until native video-reasoner support is available?
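
For reference, the frame-sampling workaround I have in mind looks roughly like the sketch below. It assumes `opencv-python` is installed and that feeding the resulting PNG files to `vision_path` one at a time is acceptable; both function names are my own and nothing here comes from cosmos-framework.

```python
from pathlib import Path


def uniform_frame_indices(total_frames: int, num_samples: int) -> list[int]:
    """Pick up to num_samples frame indices spread evenly across the clip.

    Each index is the midpoint of one of num_samples equal-length segments.
    """
    if total_frames <= 0 or num_samples <= 0:
        return []
    num_samples = min(num_samples, total_frames)
    step = total_frames / num_samples
    return [int(step * i + step / 2) for i in range(num_samples)]


def extract_frames(video_path: str, out_dir: str, num_samples: int = 8) -> list[str]:
    """Save evenly spaced frames of an MP4 as PNG files and return their paths."""
    import cv2  # assumption: opencv-python is available in the environment

    capture = cv2.VideoCapture(video_path)
    if not capture.isOpened():
        raise OSError(f"could not open video: {video_path}")

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    try:
        # The reported frame count can be approximate for some containers.
        total = int(capture.get(cv2.CAP_PROP_FRAME_COUNT))
        for index in uniform_frame_indices(total, num_samples):
            capture.set(cv2.CAP_PROP_POS_FRAMES, index)
            ok, frame = capture.read()
            if not ok:
                continue  # skip frames the decoder cannot return
            frame_path = out / f"frame_{index:06d}.png"
            cv2.imwrite(str(frame_path), frame)
            saved.append(str(frame_path))
    finally:
        capture.release()
    return saved
```

For example, `extract_frames("clip.mp4", "frames", num_samples=8)` would write up to eight PNG files under `frames/`. Whether single frames or a tiled storyboard image gives the Reasoner enough temporal context is exactly what I am unsure about.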

Minimal reproduction

- model_mode: reasoner
- vision_path: local MP4 file
- command: torchrun inference with Cosmos3-Nano
- observed error: PIL.UnidentifiedImageError on the MP4 path

Thanks in advance for clarification.
