Question: Is Reasoner expected to support MP4 video input in cosmos-framework inference? #8

@reikote

Hi team, thank you for releasing Cosmos3.

I have a question about Reasoner input support.

On the Hugging Face model card for nvidia/Cosmos3-Nano, the Reasoner section appears to describe support for Text, Text+Image, and Text+Video, with video format listed as MP4.

However, when I run cosmos-framework inference in reasoner mode with a local MP4 in vision_path, it fails because the pipeline tries to open vision_path as an image (PIL), which raises an UnidentifiedImageError.

Could you clarify the expected behavior?

1. Is video input for reasoner mode currently supported in cosmos-framework CLI inference?
2. If yes, what is the correct input format and example JSON for video reasoning?
3. If not yet supported, is this a known gap between the model card and current implementation?
4. Are there recommended workarounds (for example, frame sampling or storyboard image input) until native video-reasoner support is available?
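
To make the frame-sampling workaround concrete, here is a minimal sketch of what it could look like. It assumes OpenCV (`opencv-python`) is installed. The helper names `evenly_spaced_indices` and `sample_frames` are hypothetical and not part of cosmos-framework. Whether reasoner mode accepts more than one image per request is also an assumption that would need confirming.

```python
from pathlib import Path


def evenly_spaced_indices(total_frames: int, num_samples: int) -> list[int]:
    """Pick frame indices at the midpoints of num_samples equal segments."""
    if total_frames <= 0 or num_samples <= 0:
        return []
    num_samples = min(num_samples, total_frames)
    step = total_frames / num_samples
    return [int(step * i + step / 2) for i in range(num_samples)]


def sample_frames(video_path: str, out_dir: str, num_samples: int = 8) -> list[str]:
    """Decode the selected frames with OpenCV and save each one as a JPEG."""
    import cv2  # imported lazily; requires the opencv-python package

    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        raise IOError(f"could not open video: {video_path}")

    # The frame count reported by the container can be approximate.
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    saved = []
    try:
        for idx in evenly_spaced_indices(total, num_samples):
            cap.set(cv2.CAP_PROP_POS_FRAMES, idx)  # seek to the target frame
            ok, frame = cap.read()
            if not ok:
                continue  # skip frames that fail to decode
            target = out / f"frame_{idx:06d}.jpg"
            cv2.imwrite(str(target), frame)
            saved.append(str(target))
    finally:
        cap.release()
    return saved
```

Each saved JPEG is a regular image that PIL can open, so one of them could be passed as vision_path in place of the MP4. This loses temporal information between the sampled frames, so it is only a stopgap and not equivalent to native video input.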

Minimal reproduction

- model_mode: reasoner
- vision_path: local MP4 file
- command: torchrun inference with Cosmos3-Nano
- observed error: PIL.UnidentifiedImageError on the MP4 path

Thanks in advance for clarification.
