Skip to content

Support vision/OCR extraction with included models (Gemini Flash 2.0, DeepSeek, etc.) #1

@rickpixelml

Description

@rickpixelml

Issue

User is asking whether the included models (Gemini Flash 2.0, DeepSeek, etc.) support vision capabilities such as OCR extraction within workflows. Currently it is unclear from the UI/docs whether these models can process image inputs for tasks like document text extraction.

Expectation

Clarify which included models support vision/image input.
If supported, document how to use vision capabilities (e.g., passing image URLs or base64 to LLM nodes).
If not supported, consider adding vision support for included models or clearly document the limitation.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions