From 528392c18cf13cb8e8dbeabacedb13122a5635a1 Mon Sep 17 00:00:00 2001
From: fzowl
Date: Sun, 21 Dec 2025 14:37:46 +0100
Subject: [PATCH 1/5] voyage-multimodal-3.5 (video) support

---
 integrations/voyage.md | 125 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 125 insertions(+)

diff --git a/integrations/voyage.md b/integrations/voyage.md
index b64974ce..316ea41c 100644
--- a/integrations/voyage.md
+++ b/integrations/voyage.md
@@ -24,17 +24,50 @@ toc: true
 
 - [Installation](#installation)
 - [Usage](#usage)
+- [Supported Models](#supported-models)
 - [Example](#example)
 - [Contextualized Embeddings Example](#contextualized-embeddings-example)
+- [Multimodal Embeddings](#multimodal-embeddings)
 
 [Voyage AI](https://voyageai.com/)'s embedding and ranking models are state-of-the-art in retrieval accuracy. The integration supports the following models:
 
 - **`voyage-3.5`** and **`voyage-3.5-lite`** - Latest general-purpose embedding models with superior performance
 - **`voyage-3-large`** and **`voyage-3`** - High-performance general-purpose embedding models
 - **`voyage-context-3`** - Contextualized chunk embedding model that preserves document context for improved retrieval accuracy
+- **`voyage-multimodal-3.5`** - Multimodal model supporting text, images, and video (preview)
 - **`voyage-2`** and **`voyage-large-2`** - Proven models that outperform `intfloat/e5-mistral-7b-instruct` and `OpenAI/text-embedding-3-large` on the [MTEB Benchmark](https://github.com/embeddings-benchmark/mteb)
 
 For the complete list of available models, see the [Embeddings Documentation](https://docs.voyageai.com/embeddings/) and [Contextualized Chunk Embeddings](https://docs.voyageai.com/docs/contextualized-chunk-embeddings).
 
+## Supported Models
+
+### Text Embedding Models
+
+| Model | Description | Dimensions |
+|-------|-------------|------------|
+| `voyage-3.5` | Latest general-purpose embedding model | 1024 |
+| `voyage-3.5-lite` | Efficient model with lower latency | 1024 |
+| `voyage-3-large` | High-capacity embedding model | 1024 |
+| `voyage-3` | High-performance general-purpose model | 1024 |
+| `voyage-code-3` | Optimized for code retrieval | 1024 |
+| `voyage-finance-2` | Optimized for financial documents | 1024 |
+| `voyage-law-2` | Optimized for legal documents | 1024 |
+| `voyage-2` | Proven general-purpose model | 1024 |
+| `voyage-large-2` | Larger proven model | 1536 |
+
+### Multimodal Embedding Models
+
+| Model | Description | Dimensions | Modalities |
+|-------|-------------|------------|------------|
+| `voyage-multimodal-3` | Multimodal embedding model | 1024 | Text, Images |
+| `voyage-multimodal-3.5` | Multimodal embedding model (preview) | 256, 512, 1024, 2048 | Text, Images, Video |
+
+### Reranker Models
+
+| Model | Description |
+|-------|-------------|
+| `rerank-2` | High-accuracy reranker model |
+| `rerank-2-lite` | Efficient reranker with lower latency |
+
 ## Installation
 
 ```bash
@@ -188,6 +221,98 @@ result = embedder.run(documents=docs)
 
 For more examples, see the [contextualized embedder example](https://github.com/awinml/voyage-embedders-haystack/blob/voyage_context-3_model/examples/contextualized_embedder_example.py).
 
+## Multimodal Embeddings
+
+Voyage AI's `voyage-multimodal-3.5` model transforms unstructured data from multiple modalities (text, images, video) into a shared vector space. This enables mixed-media document retrieval and cross-modal semantic search.
+
+### Features
+
+- **Multiple modalities**: Supports text, images, and video in a single input
+- **Variable dimensions**: Output dimensions of 256, 512, 1024 (default), or 2048
+- **Interleaved content**: Mix text, images, and video in single inputs
+- **No preprocessing required**: Process documents with embedded images directly
+
+### Limits
+
+- Images: Max 20MB, 16 million pixels
+- Video: Max 20MB
+- Context: 32,000 tokens
+- Token counting: 560 image pixels = 1 token, 1120 video pixels = 1 token
+
+### Multimodal API Example
+
+The multimodal model uses a different API endpoint (`/v1/multimodalembeddings`):
+
+```python
+import os
+import voyageai
+from PIL import Image
+
+# Initialize client (uses VOYAGE_API_KEY environment variable)
+client = voyageai.Client(api_key=os.environ.get("VOYAGE_API_KEY"))
+
+# Text-only embedding
+result = client.multimodal_embed(
+    inputs=[["Your text here"]],
+    model="voyage-multimodal-3.5"
+)
+
+# Text + Image embedding
+image = Image.open("document.jpg")
+result = client.multimodal_embed(
+    inputs=[["Caption or context", image]],
+    model="voyage-multimodal-3.5",
+    output_dimension=1024  # Optional: 256, 512, 1024, or 2048
+)
+
+print(f"Dimensions: {len(result.embeddings[0])}")
+print(f"Tokens used: {result.total_tokens}")
+```
+
+### Video Embedding Example
+
+Video inputs require the `voyageai.video_utils` module. Use `optimize_video` to fit videos within the 32K token context:
+
+```python
+import os
+import voyageai
+from voyageai.video_utils import optimize_video
+
+client = voyageai.Client(api_key=os.environ.get("VOYAGE_API_KEY"))
+
+# Load and optimize video (videos can be large in tokens)
+with open("video.mp4", "rb") as f:
+    video_bytes = f.read()
+
+# Optimize to fit within token budget
+optimized_video = optimize_video(
+    video_bytes,
+    model="voyage-multimodal-3.5",
+    max_video_tokens=5000  # Limit tokens used by video
+)
+print(f"Optimized: {optimized_video.num_frames} frames, ~{optimized_video.estimated_num_tokens} tokens")
+
+# Embed video (optionally with text context)
+result = client.multimodal_embed(
+    inputs=[[optimized_video]],
+    model="voyage-multimodal-3.5"
+)
+
+print(f"Dimensions: {len(result.embeddings[0])}")
+print(f"Tokens used: {result.total_tokens}")
+```
+
+### Use Cases
+
+- Mixed-media document retrieval (PDFs, slides with images)
+- Image-text similarity search
+- Video content retrieval and search
+- Cross-modal semantic search
+
+For more information, see the [Multimodal Embeddings Documentation](https://docs.voyageai.com/docs/multimodal-embeddings).
+
+> **Note:** The `voyage-multimodal-3.5` model is currently in preview. Video input requires `voyageai` SDK version 0.3.6 or later.
+
 ## License
 
 `voyage-embedders-haystack` is distributed under the terms of the [Apache-2.0 license](https://github.com/awinml/voyage-embedders-haystack/blob/main/LICENSE).
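The token-counting rules introduced in the patch above (560 image pixels per token, 1120 video pixels per token, against a 32,000-token context) can be sanity-checked with simple arithmetic before calling the API. The sketch below is an editorial illustration only: the helper name and the ceiling rounding are assumptions, not part of the `voyageai` SDK, and the API's reported `total_tokens` remains authoritative.

```python
# Back-of-envelope token estimator for multimodal inputs, based on the
# conversion rates stated in the patch above. Illustrative only; the API's
# `total_tokens` field reports the exact count.

IMAGE_PIXELS_PER_TOKEN = 560
VIDEO_PIXELS_PER_TOKEN = 1120
CONTEXT_LIMIT_TOKENS = 32_000

def estimate_media_tokens(image_pixels: int = 0, video_pixels: int = 0) -> int:
    """Rough token count for the media portion of one multimodal input."""
    image_tokens = -(-image_pixels // IMAGE_PIXELS_PER_TOKEN)  # ceiling division
    video_tokens = -(-video_pixels // VIDEO_PIXELS_PER_TOKEN)
    return image_tokens + video_tokens

# A 1024x1024 image and ten 720p video frames, checked against the context limit.
image_cost = estimate_media_tokens(image_pixels=1024 * 1024)
video_cost = estimate_media_tokens(video_pixels=10 * 1280 * 720)
print(image_cost, video_cost)                              # 1873 8229
print(image_cost + video_cost <= CONTEXT_LIMIT_TOKENS)     # True
```

Under these rates a single high-resolution image is cheap, but video dominates the budget quickly, which is why the patch caps it with `max_video_tokens`.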
From b6c56ffd75f7716c03b61a5082a2ae023b1f26a3 Mon Sep 17 00:00:00 2001
From: fzowl
Date: Fri, 30 Jan 2026 14:10:54 +0100
Subject: [PATCH 2/5] Updated: voyage-multimodal-3.5 (video) support

---
 integrations/voyage.md | 94 ++++++++++++++++++++++--------------------
 1 file changed, 49 insertions(+), 45 deletions(-)

diff --git a/integrations/voyage.md b/integrations/voyage.md
index 316ea41c..fb2da82a 100644
--- a/integrations/voyage.md
+++ b/integrations/voyage.md
@@ -76,10 +76,11 @@ pip install voyage-embedders-haystack
 
 ## Usage
 
-You can use Voyage models with four components:
+You can use Voyage models with five components:
 - [VoyageTextEmbedder](https://github.com/awinml/voyage-embedders-haystack/blob/main/src/haystack_integrations/components/embedders/voyage_embedders/voyage_text_embedder.py) - For embedding query text
 - [VoyageDocumentEmbedder](https://github.com/awinml/voyage-embedders-haystack/blob/main/src/haystack_integrations/components/embedders/voyage_embedders/voyage_document_embedder.py) - For embedding documents
-- [VoyageContextualizedDocumentEmbedder](https://github.com/awinml/voyage-embedders-haystack/blob/voyage_context-3_model/src/haystack_integrations/components/embedders/voyage_embedders/voyage_contextualized_document_embedder.py) - For contextualized chunk embeddings with `voyage-context-3`
+- [VoyageContextualizedDocumentEmbedder](https://github.com/awinml/voyage-embedders-haystack/blob/main/src/haystack_integrations/components/embedders/voyage_embedders/voyage_contextualized_document_embedder.py) - For contextualized chunk embeddings with `voyage-context-3`
+- [VoyageMultimodalEmbedder](https://github.com/awinml/voyage-embedders-haystack/blob/main/src/haystack_integrations/components/embedders/voyage_embedders/voyage_multimodal_embedder.py) - For multimodal embeddings with `voyage-multimodal-3.5`
 - [VoyageRanker](https://github.com/awinml/voyage-embedders-haystack/blob/main/src/haystack_integrations/components/rankers/voyage/ranker.py) - For reranking documents
 
 ### Standard Embeddings
@@ -239,67 +240,70 @@ Voyage AI's `voyage-multimodal-3.5` model transforms unstructured data from mult
 - Context: 32,000 tokens
 - Token counting: 560 image pixels = 1 token, 1120 video pixels = 1 token
 
-### Multimodal API Example
+### Basic Multimodal Example
 
-The multimodal model uses a different API endpoint (`/v1/multimodalembeddings`):
+Use the `VoyageMultimodalEmbedder` component for multimodal embeddings. Each input is a list of content items (text, images, or videos):
 
 ```python
-import os
-import voyageai
-from PIL import Image
-
-# Initialize client (uses VOYAGE_API_KEY environment variable)
-client = voyageai.Client(api_key=os.environ.get("VOYAGE_API_KEY"))
+from haystack.dataclasses import ByteStream
+from haystack_integrations.components.embedders.voyage_embedders import VoyageMultimodalEmbedder
 
 # Text-only embedding
-result = client.multimodal_embed(
-    inputs=[["Your text here"]],
-    model="voyage-multimodal-3.5"
-)
+embedder = VoyageMultimodalEmbedder(model="voyage-multimodal-3.5")
+result = embedder.run(inputs=[["What is in this image?"]])
+print(f"Embedding dimensions: {len(result['embeddings'][0])}")
+
+# Mixed text and image embedding
+image_bytes = ByteStream.from_file_path("image.jpg")
+result = embedder.run(inputs=[["Describe this image:", image_bytes]])
+print(f"Tokens used: {result['meta']['total_tokens']}")
+```
+
+### Multimodal Example with Custom Dimensions
+
+```python
+from haystack.dataclasses import ByteStream
+from haystack_integrations.components.embedders.voyage_embedders import VoyageMultimodalEmbedder
 
-# Text + Image embedding
-image = Image.open("document.jpg")
-result = client.multimodal_embed(
-    inputs=[["Caption or context", image]],
+# Configure output dimensions (256, 512, 1024, or 2048)
+embedder = VoyageMultimodalEmbedder(
     model="voyage-multimodal-3.5",
-    output_dimension=1024  # Optional: 256, 512, 1024, or 2048
+    output_dimension=2048,  # Higher dimensions for better accuracy
+    input_type="document",  # Optimize for document retrieval
 )
 
-print(f"Dimensions: {len(result.embeddings[0])}")
-print(f"Tokens used: {result.total_tokens}")
+# Embed multiple inputs at once
+image1 = ByteStream.from_file_path("doc1.jpg")
+image2 = ByteStream.from_file_path("doc2.jpg")
+
+result = embedder.run(inputs=[
+    ["Document about machine learning", image1],
+    ["Technical diagram", image2],
+])
+
+print(f"Number of embeddings: {len(result['embeddings'])}")
+print(f"Image pixels processed: {result['meta']['image_pixels']}")
 ```
 
 ### Video Embedding Example
 
-Video inputs require the `voyageai.video_utils` module. Use `optimize_video` to fit videos within the 32K token context:
+Video inputs require the `voyageai.video_utils` module:
 
 ```python
-import os
-import voyageai
-from voyageai.video_utils import optimize_video
-
-client = voyageai.Client(api_key=os.environ.get("VOYAGE_API_KEY"))
+from voyageai.video_utils import Video
+from haystack_integrations.components.embedders.voyage_embedders import VoyageMultimodalEmbedder
 
-# Load and optimize video (videos can be large in tokens)
-with open("video.mp4", "rb") as f:
-    video_bytes = f.read()
+embedder = VoyageMultimodalEmbedder(model="voyage-multimodal-3.5")
 
-# Optimize to fit within token budget
-optimized_video = optimize_video(
-    video_bytes,
-    model="voyage-multimodal-3.5",
-    max_video_tokens=5000  # Limit tokens used by video
-)
-print(f"Optimized: {optimized_video.num_frames} frames, ~{optimized_video.estimated_num_tokens} tokens")
+# Load video using VoyageAI's Video utility
+video = Video.from_path("video.mp4", model="voyage-multimodal-3.5")
 
-# Embed video (optionally with text context)
-result = client.multimodal_embed(
-    inputs=[[optimized_video]],
-    model="voyage-multimodal-3.5"
-)
+# Embed video with optional text context
+result = embedder.run(inputs=[["Describe this video:", video]])
 
-print(f"Dimensions: {len(result.embeddings[0])}")
-print(f"Tokens used: {result.total_tokens}")
+print(f"Embedding dimensions: {len(result['embeddings'][0])}")
+print(f"Video pixels processed: {result['meta']['video_pixels']}")
+print(f"Total tokens: {result['meta']['total_tokens']}")
 ```
 
 ### Use Cases
@@ -311,7 +315,7 @@ print(f"Tokens used: {result.total_tokens}")
 
 For more information, see the [Multimodal Embeddings Documentation](https://docs.voyageai.com/docs/multimodal-embeddings).
 
-> **Note:** The `voyage-multimodal-3.5` model is currently in preview. Video input requires `voyageai` SDK version 0.3.6 or later.
+> **Note:** The `voyage-multimodal-3.5` model is currently in preview. Video input requires `voyageai` SDK version 0.3.6 or later and `pillow` for image processing.
 
 ## License

From 518e9cf07c2033a610dda5af289e1ac7df0949d0 Mon Sep 17 00:00:00 2001
From: fzowl
Date: Fri, 30 Jan 2026 14:14:44 +0100
Subject: [PATCH 3/5] Updated: voyage-multimodal-3.5 (video) support

---
 integrations/voyage.md | 2 --
 1 file changed, 2 deletions(-)

diff --git a/integrations/voyage.md b/integrations/voyage.md
index fb2da82a..d4e16475 100644
--- a/integrations/voyage.md
+++ b/integrations/voyage.md
@@ -315,8 +315,6 @@ print(f"Total tokens: {result['meta']['total_tokens']}")
 
 For more information, see the [Multimodal Embeddings Documentation](https://docs.voyageai.com/docs/multimodal-embeddings).
 
-> **Note:** The `voyage-multimodal-3.5` model is currently in preview. Video input requires `voyageai` SDK version 0.3.6 or later and `pillow` for image processing.
-
 ## License
 
 `voyage-embedders-haystack` is distributed under the terms of the [Apache-2.0 license](https://github.com/awinml/voyage-embedders-haystack/blob/main/LICENSE).
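The patches above pass `inputs` as a nested structure: a list of inputs, each itself a list of interleaved content items (text strings plus image or video objects). A malformed structure is an easy mistake to make, so a pre-flight shape check can fail fast before any API call. The sketch below is an editorial illustration: the function name and the exact accepted item types are assumptions, not part of `voyage-embedders-haystack`.

```python
# Illustrative pre-flight check for the nested `inputs` structure used by the
# multimodal examples above. Hypothetical helper, not part of the integration.

def validate_multimodal_inputs(inputs):
    """Return the number of inputs, raising ValueError on a malformed structure."""
    if not isinstance(inputs, list):
        raise ValueError("inputs must be a list of content-item lists")
    for i, items in enumerate(inputs):
        if not isinstance(items, list) or not items:
            raise ValueError(f"input {i} must be a non-empty list of content items")
        for item in items:
            # Text goes in as str; images/videos as SDK or ByteStream objects.
            if isinstance(item, (list, dict)):
                raise ValueError(f"input {i} contains a nested list/dict item")
    return len(inputs)

# One text-only input and one mixed text+media input (media stubbed as bytes).
print(validate_multimodal_inputs([["A sunset over the ocean"],
                                  ["Product photo", b"\x89PNG..."]]))  # 2
```

Each inner list becomes one embedding in `result['embeddings']`, which is why the text-only examples still wrap the string in a list of its own.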
From 8ded8480374bcb7d4e0f0af04e11caf2f5fb024d Mon Sep 17 00:00:00 2001
From: fzowl
Date: Fri, 13 Feb 2026 22:34:18 +0100
Subject: [PATCH 4/5] Updated: voyage-multimodal-3.5 (video) support

---
 integrations/voyage.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/integrations/voyage.md b/integrations/voyage.md
index d4e16475..03b21a7d 100644
--- a/integrations/voyage.md
+++ b/integrations/voyage.md
@@ -250,12 +250,12 @@ from haystack_integrations.components.embedders.voyage_embedders import VoyageMu
 
 # Text-only embedding
 embedder = VoyageMultimodalEmbedder(model="voyage-multimodal-3.5")
-result = embedder.run(inputs=[["What is in this image?"]])
+result = embedder.run(inputs=[["A sunset over the ocean"]])
 print(f"Embedding dimensions: {len(result['embeddings'][0])}")
 
 # Mixed text and image embedding
 image_bytes = ByteStream.from_file_path("image.jpg")
-result = embedder.run(inputs=[["Describe this image:", image_bytes]])
+result = embedder.run(inputs=[["Product photo for online store", image_bytes]])
 print(f"Tokens used: {result['meta']['total_tokens']}")
 ```
 
@@ -299,7 +299,7 @@ embedder = VoyageMultimodalEmbedder(model="voyage-multimodal-3.5")
 video = Video.from_path("video.mp4", model="voyage-multimodal-3.5")
 
 # Embed video with optional text context
-result = embedder.run(inputs=[["Describe this video:", video]])
+result = embedder.run(inputs=[["Machine learning tutorial", video]])
 
 print(f"Embedding dimensions: {len(result['embeddings'][0])}")
 print(f"Video pixels processed: {result['meta']['video_pixels']}")

From 5b15ecd4cd7559d0c9e6737fe802d93026547e09 Mon Sep 17 00:00:00 2001
From: fzowl
Date: Fri, 13 Feb 2026 22:43:49 +0100
Subject: [PATCH 5/5] Adding VoyageAI V4 family models

---
 integrations/voyage.md | 24 ++++++++++--------------
 1 file changed, 10 insertions(+), 14 deletions(-)

diff --git a/integrations/voyage.md b/integrations/voyage.md
index 03b21a7d..2a29c437 100644
--- a/integrations/voyage.md
+++ b/integrations/voyage.md
@@ -30,11 +30,11 @@ toc: true
 - [Multimodal Embeddings](#multimodal-embeddings)
 
 [Voyage AI](https://voyageai.com/)'s embedding and ranking models are state-of-the-art in retrieval accuracy. The integration supports the following models:
 
-- **`voyage-3.5`** and **`voyage-3.5-lite`** - Latest general-purpose embedding models with superior performance
-- **`voyage-3-large`** and **`voyage-3`** - High-performance general-purpose embedding models
+- **`voyage-4-large`**, **`voyage-4`**, and **`voyage-4-lite`** - Latest general-purpose embedding models with shared embedding space and MoE architecture
+- **`voyage-3.5`** and **`voyage-3.5-lite`** - General-purpose embedding models with superior performance
+- **`voyage-code-3`** - Optimized for code retrieval
 - **`voyage-context-3`** - Contextualized chunk embedding model that preserves document context for improved retrieval accuracy
 - **`voyage-multimodal-3.5`** - Multimodal model supporting text, images, and video (preview)
-- **`voyage-2`** and **`voyage-large-2`** - Proven models that outperform `intfloat/e5-mistral-7b-instruct` and `OpenAI/text-embedding-3-large` on the [MTEB Benchmark](https://github.com/embeddings-benchmark/mteb)
 
 For the complete list of available models, see the [Embeddings Documentation](https://docs.voyageai.com/embeddings/) and [Contextualized Chunk Embeddings](https://docs.voyageai.com/docs/contextualized-chunk-embeddings).
@@ -44,15 +44,12 @@ For the complete list of available models, see the [Embeddings Documentation](ht
 
 | Model | Description | Dimensions |
 |-------|-------------|------------|
-| `voyage-3.5` | Latest general-purpose embedding model | 1024 |
+| `voyage-4-large` | The best general-purpose and multilingual retrieval quality | 1024 (default), 256, 512, 2048 |
+| `voyage-4` | Optimized for general-purpose and multilingual retrieval quality | 1024 (default), 256, 512, 2048 |
+| `voyage-4-lite` | Optimized for latency and cost | 1024 (default), 256, 512, 2048 |
+| `voyage-3.5` | General-purpose embedding model | 1024 |
 | `voyage-3.5-lite` | Efficient model with lower latency | 1024 |
-| `voyage-3-large` | High-capacity embedding model | 1024 |
-| `voyage-3` | High-performance general-purpose model | 1024 |
 | `voyage-code-3` | Optimized for code retrieval | 1024 |
-| `voyage-finance-2` | Optimized for financial documents | 1024 |
-| `voyage-law-2` | Optimized for legal documents | 1024 |
-| `voyage-2` | Proven general-purpose model | 1024 |
-| `voyage-large-2` | Larger proven model | 1536 |
 
 ### Multimodal Embedding Models
 
@@ -92,11 +89,10 @@ To create semantic embeddings for documents, use `VoyageDocumentEmbedder` in you
 
 For improved retrieval quality, use `VoyageContextualizedDocumentEmbedder` with the `voyage-context-3` model. This component preserves context between related document chunks by grouping them together during embedding, reducing context loss that occurs when chunks are embedded independently
 
 **Important:** You must explicitly specify the `model` parameter when initializing any component. Choose from the available models listed in the [Embeddings Documentation](https://docs.voyageai.com/embeddings/). Recommended choices include:
-- `voyage-3.5` - Latest general-purpose model for best performance
-- `voyage-3.5-lite` - Efficient model with lower latency
-- `voyage-3-large` - High-capacity model for complex tasks
+- `voyage-4-large` - Best general-purpose and multilingual retrieval quality
+- `voyage-4` - Balanced general-purpose and multilingual retrieval quality
+- `voyage-4-lite` - Optimized for latency and cost
 - `voyage-context-3` - Contextualized embeddings for improved retrieval (use with `VoyageContextualizedDocumentEmbedder`)
-- `voyage-2` - Proven general-purpose model
 
 You can set the environment variable `VOYAGE_API_KEY` instead of passing the API key as an argument. To get an API key, please see the [Voyage AI website.](https://www.voyageai.com/)
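The model recommendations in the final patch (quality vs. latency vs. contextualized chunks, plus the V4 family's selectable output dimensions) can be summarized as a small lookup. The helper below is an editorial sketch only: the function, the priority labels, and the returned settings dict are assumptions for illustration and are not part of `voyage-embedders-haystack`.

```python
# Hypothetical model-selection helper mirroring the recommended-choices list
# added in PATCH 5/5. Illustrative only; not part of the integration.

RECOMMENDED_MODELS = {
    "best_quality": "voyage-4-large",    # best general-purpose/multilingual quality
    "balanced": "voyage-4",              # balanced quality
    "low_latency": "voyage-4-lite",      # optimized for latency and cost
    "contextualized": "voyage-context-3",  # use with VoyageContextualizedDocumentEmbedder
}

VALID_OUTPUT_DIMENSIONS = {256, 512, 1024, 2048}  # 1024 is the V4-family default

def pick_model(priority: str, output_dimension: int = 1024) -> dict:
    """Map a retrieval priority to a recommended model plus embedder settings."""
    if priority not in RECOMMENDED_MODELS:
        raise ValueError(f"unknown priority: {priority!r}")
    if output_dimension not in VALID_OUTPUT_DIMENSIONS:
        raise ValueError(f"output_dimension must be one of {sorted(VALID_OUTPUT_DIMENSIONS)}")
    return {"model": RECOMMENDED_MODELS[priority], "output_dimension": output_dimension}

print(pick_model("best_quality"))      # {'model': 'voyage-4-large', 'output_dimension': 1024}
print(pick_model("low_latency", 512))  # {'model': 'voyage-4-lite', 'output_dimension': 512}
```

Note that `output_dimension` applies to the V4 family (and `voyage-multimodal-3.5`); the 3.x text models in the tables above are listed with a fixed 1024-dimensional output.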