Issue
The user asks whether the included models (Gemini 2.0 Flash, DeepSeek, etc.) support vision capabilities, such as OCR text extraction, within workflows. It is currently unclear from the UI and docs whether these models can process image inputs for tasks like document text extraction.
Expectation
- Clarify which of the included models support vision/image input.
- If supported, document how to use vision capabilities (e.g., passing image URLs or base64-encoded images to LLM nodes).
- If not supported, either add vision support for the included models or clearly document the limitation.
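If vision input is supported, the request would most likely follow the OpenAI-compatible multimodal message format that many providers accept, where an image is sent as a base64 data URL alongside the text prompt. A minimal sketch of building such a message (the helper name is hypothetical, and whether the included models accept this format is exactly what this issue asks to confirm):

```python
import base64

def build_vision_message(image_bytes: bytes, prompt: str,
                         mime: str = "image/png") -> dict:
    """Build one chat message combining a text prompt and an inline image.

    Uses the OpenAI-compatible multimodal content format: the image is
    embedded as a base64 data URL under an "image_url" content part.
    """
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                "image_url": {"url": f"data:{mime};base64,{b64}"},
            },
        ],
    }

# Example: ask the model to OCR a (placeholder) image.
msg = build_vision_message(b"fake-image-bytes",
                           "Extract all text from this document.")
print(msg["content"][1]["image_url"]["url"].split(",")[0])
# prints the data-URL prefix: data:image/png;base64
```

If the included models reject image content parts, documenting that error behavior would also satisfy the "clearly document the limitation" expectation.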