From 528392c18cf13cb8e8dbeabacedb13122a5635a1 Mon Sep 17 00:00:00 2001
From: fzowl
Date: Sun, 21 Dec 2025 14:37:46 +0100
Subject: [PATCH 1/5] voyage-multimodal-3.5 (video) support

---
 integrations/voyage.md | 125 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 125 insertions(+)

diff --git a/integrations/voyage.md b/integrations/voyage.md
index b64974ce..316ea41c 100644
--- a/integrations/voyage.md
+++ b/integrations/voyage.md
@@ -24,17 +24,50 @@ toc: true
 
 - [Installation](#installation)
 - [Usage](#usage)
+- [Supported Models](#supported-models)
 - [Example](#example)
 - [Contextualized Embeddings Example](#contextualized-embeddings-example)
+- [Multimodal Embeddings](#multimodal-embeddings)
 
 [Voyage AI](https://voyageai.com/)'s embedding and ranking models are state-of-the-art in retrieval accuracy. The integration supports the following models:
 
 - **`voyage-3.5`** and **`voyage-3.5-lite`** - Latest general-purpose embedding models with superior performance
 - **`voyage-3-large`** and **`voyage-3`** - High-performance general-purpose embedding models
 - **`voyage-context-3`** - Contextualized chunk embedding model that preserves document context for improved retrieval accuracy
+- **`voyage-multimodal-3.5`** - Multimodal model supporting text, images, and video (preview)
 - **`voyage-2`** and **`voyage-large-2`** - Proven models that outperform `intfloat/e5-mistral-7b-instruct` and `OpenAI/text-embedding-3-large` on the [MTEB Benchmark](https://github.com/embeddings-benchmark/mteb)
 
 For the complete list of available models, see the [Embeddings Documentation](https://docs.voyageai.com/embeddings/) and [Contextualized Chunk Embeddings](https://docs.voyageai.com/docs/contextualized-chunk-embeddings).
 
+## Supported Models
+
+### Text Embedding Models
+
+| Model | Description | Dimensions |
+|-------|-------------|------------|
+| `voyage-3.5` | Latest general-purpose embedding model | 1024 |
+| `voyage-3.5-lite` | Efficient model with lower latency | 1024 |
+| `voyage-3-large` | High-capacity embedding model | 1024 |
+| `voyage-3` | High-performance general-purpose model | 1024 |
+| `voyage-code-3` | Optimized for code retrieval | 1024 |
+| `voyage-finance-2` | Optimized for financial documents | 1024 |
+| `voyage-law-2` | Optimized for legal documents | 1024 |
+| `voyage-2` | Proven general-purpose model | 1024 |
+| `voyage-large-2` | Larger proven model | 1536 |
+
+### Multimodal Embedding Models
+
+| Model | Description | Dimensions | Modalities |
+|-------|-------------|------------|------------|
+| `voyage-multimodal-3` | Multimodal embedding model | 1024 | Text, Images |
+| `voyage-multimodal-3.5` | Multimodal embedding model (preview) | 256, 512, 1024, 2048 | Text, Images, Video |
+
+### Reranker Models
+
+| Model | Description |
+|-------|-------------|
+| `rerank-2` | High-accuracy reranker model |
+| `rerank-2-lite` | Efficient reranker with lower latency |
+
 ## Installation
 
 ```bash
@@ -188,6 +221,98 @@ result = embedder.run(documents=docs)
 
 For more examples, see the [contextualized embedder example](https://github.com/awinml/voyage-embedders-haystack/blob/voyage_context-3_model/examples/contextualized_embedder_example.py).
 
+## Multimodal Embeddings
+
+Voyage AI's `voyage-multimodal-3.5` model transforms unstructured data from multiple modalities (text, images, video) into a shared vector space. This enables mixed-media document retrieval and cross-modal semantic search.
+
+### Features
+
+- **Multiple modalities**: Supports text, images, and video in a single input
+- **Variable dimensions**: Output dimensions of 256, 512, 1024 (default), or 2048
+- **Interleaved content**: Mix text, images, and video in single inputs
+- **No preprocessing required**: Process documents with embedded images directly
+
+### Limits
+
+- Images: Max 20MB, 16 million pixels
+- Video: Max 20MB
+- Context: 32,000 tokens
+- Token counting: 560 image pixels = 1 token, 1120 video pixels = 1 token
+
+### Multimodal API Example
+
+The multimodal model uses a different API endpoint (`/v1/multimodalembeddings`):
+
+```python
+import os
+import voyageai
+from PIL import Image
+
+# Initialize client (uses VOYAGE_API_KEY environment variable)
+client = voyageai.Client(api_key=os.environ.get("VOYAGE_API_KEY"))
+
+# Text-only embedding
+result = client.multimodal_embed(
+    inputs=[["Your text here"]],
+    model="voyage-multimodal-3.5"
+)
+
+# Text + Image embedding
+image = Image.open("document.jpg")
+result = client.multimodal_embed(
+    inputs=[["Caption or context", image]],
+    model="voyage-multimodal-3.5",
+    output_dimension=1024  # Optional: 256, 512, 1024, or 2048
+)
+
+print(f"Dimensions: {len(result.embeddings[0])}")
+print(f"Tokens used: {result.total_tokens}")
+```
+
+### Video Embedding Example
+
+Video inputs require the `voyageai.video_utils` module. Use `optimize_video` to fit videos within the 32K token context:
+
+```python
+import os
+import voyageai
+from voyageai.video_utils import optimize_video
+
+client = voyageai.Client(api_key=os.environ.get("VOYAGE_API_KEY"))
+
+# Load and optimize video (videos can be large in tokens)
+with open("video.mp4", "rb") as f:
+    video_bytes = f.read()
+
+# Optimize to fit within token budget
+optimized_video = optimize_video(
+    video_bytes,
+    model="voyage-multimodal-3.5",
+    max_video_tokens=5000  # Limit tokens used by video
+)
+print(f"Optimized: {optimized_video.num_frames} frames, ~{optimized_video.estimated_num_tokens} tokens")
+
+# Embed video (optionally with text context)
+result = client.multimodal_embed(
+    inputs=[[optimized_video]],
+    model="voyage-multimodal-3.5"
+)
+
+print(f"Dimensions: {len(result.embeddings[0])}")
+print(f"Tokens used: {result.total_tokens}")
+```
+
+### Use Cases
+
+- Mixed-media document retrieval (PDFs, slides with images)
+- Image-text similarity search
+- Video content retrieval and search
+- Cross-modal semantic search
+
+For more information, see the [Multimodal Embeddings Documentation](https://docs.voyageai.com/docs/multimodal-embeddings).
+
+> **Note:** The `voyage-multimodal-3.5` model is currently in preview. Video input requires `voyageai` SDK version 0.3.6 or later.
+
 ## License
 
 `voyage-embedders-haystack` is distributed under the terms of the [Apache-2.0 license](https://github.com/awinml/voyage-embedders-haystack/blob/main/LICENSE).
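The token-counting rules introduced in the patch above (560 image pixels per token, 1120 video pixels per token, against a 32,000-token context) can be sanity-checked with simple arithmetic before calling the API. The sketch below is an editorial illustration only: the helper name and the ceiling rounding are assumptions, not part of the `voyageai` SDK, and the API's reported `total_tokens` remains authoritative.

```python
# Back-of-envelope token estimator for multimodal inputs, based on the
# conversion rates stated in the patch above. Illustrative only; the API's
# `total_tokens` field reports the exact count.

IMAGE_PIXELS_PER_TOKEN = 560
VIDEO_PIXELS_PER_TOKEN = 1120
CONTEXT_LIMIT_TOKENS = 32_000

def estimate_media_tokens(image_pixels: int = 0, video_pixels: int = 0) -> int:
    """Rough token count for the media portion of one multimodal input."""
    image_tokens = -(-image_pixels // IMAGE_PIXELS_PER_TOKEN)  # ceiling division
    video_tokens = -(-video_pixels // VIDEO_PIXELS_PER_TOKEN)
    return image_tokens + video_tokens

# A 1024x1024 image and ten 720p video frames, checked against the context limit.
image_cost = estimate_media_tokens(image_pixels=1024 * 1024)
video_cost = estimate_media_tokens(video_pixels=10 * 1280 * 720)
print(image_cost, video_cost)                              # 1873 8229
print(image_cost + video_cost <= CONTEXT_LIMIT_TOKENS)     # True
```

Under these rates a single high-resolution image is cheap, but video dominates the budget quickly, which is why the patch caps it with `max_video_tokens`.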
From b6c56ffd75f7716c03b61a5082a2ae023b1f26a3 Mon Sep 17 00:00:00 2001
From: fzowl
Date: Fri, 30 Jan 2026 14:10:54 +0100
Subject: [PATCH 2/5] Updated: voyage-multimodal-3.5 (video) support

---
 integrations/voyage.md | 94 ++++++++++++++++++++++--------------------
 1 file changed, 49 insertions(+), 45 deletions(-)

diff --git a/integrations/voyage.md b/integrations/voyage.md
index 316ea41c..fb2da82a 100644
--- a/integrations/voyage.md
+++ b/integrations/voyage.md
@@ -76,10 +76,11 @@ pip install voyage-embedders-haystack
 
 ## Usage
 
-You can use Voyage models with four components:
+You can use Voyage models with five components:
 - [VoyageTextEmbedder](https://github.com/awinml/voyage-embedders-haystack/blob/main/src/haystack_integrations/components/embedders/voyage_embedders/voyage_text_embedder.py) - For embedding query text
 - [VoyageDocumentEmbedder](https://github.com/awinml/voyage-embedders-haystack/blob/main/src/haystack_integrations/components/embedders/voyage_embedders/voyage_document_embedder.py) - For embedding documents
-- [VoyageContextualizedDocumentEmbedder](https://github.com/awinml/voyage-embedders-haystack/blob/voyage_context-3_model/src/haystack_integrations/components/embedders/voyage_embedders/voyage_contextualized_document_embedder.py) - For contextualized chunk embeddings with `voyage-context-3`
+- [VoyageContextualizedDocumentEmbedder](https://github.com/awinml/voyage-embedders-haystack/blob/main/src/haystack_integrations/components/embedders/voyage_embedders/voyage_contextualized_document_embedder.py) - For contextualized chunk embeddings with `voyage-context-3`
+- [VoyageMultimodalEmbedder](https://github.com/awinml/voyage-embedders-haystack/blob/main/src/haystack_integrations/components/embedders/voyage_embedders/voyage_multimodal_embedder.py) - For multimodal embeddings with `voyage-multimodal-3.5`
 - [VoyageRanker](https://github.com/awinml/voyage-embedders-haystack/blob/main/src/haystack_integrations/components/rankers/voyage/ranker.py) - For reranking documents
 
 ### Standard Embeddings
@@ -239,67 +240,70 @@ Voyage AI's `voyage-multimodal-3.5` model transforms unstructured data from mult
 - Context: 32,000 tokens
 - Token counting: 560 image pixels = 1 token, 1120 video pixels = 1 token
 
-### Multimodal API Example
+### Basic Multimodal Example
 
-The multimodal model uses a different API endpoint (`/v1/multimodalembeddings`):
+Use the `VoyageMultimodalEmbedder` component for multimodal embeddings. Each input is a list of content items (text, images, or videos):
 
 ```python
-import os
-import voyageai
-from PIL import Image
-
-# Initialize client (uses VOYAGE_API_KEY environment variable)
-client = voyageai.Client(api_key=os.environ.get("VOYAGE_API_KEY"))
+from haystack.dataclasses import ByteStream
+from haystack_integrations.components.embedders.voyage_embedders import VoyageMultimodalEmbedder
 
 # Text-only embedding
-result = client.multimodal_embed(
-    inputs=[["Your text here"]],
-    model="voyage-multimodal-3.5"
-)
+embedder = VoyageMultimodalEmbedder(model="voyage-multimodal-3.5")
+result = embedder.run(inputs=[["What is in this image?"]])
+print(f"Embedding dimensions: {len(result['embeddings'][0])}")
+
+# Mixed text and image embedding
+image_bytes = ByteStream.from_file_path("image.jpg")
+result = embedder.run(inputs=[["Describe this image:", image_bytes]])
+print(f"Tokens used: {result['meta']['total_tokens']}")
+```
+
+### Multimodal Example with Custom Dimensions
+
+```python
+from haystack.dataclasses import ByteStream
+from haystack_integrations.components.embedders.voyage_embedders import VoyageMultimodalEmbedder
 
-# Text + Image embedding
-image = Image.open("document.jpg")
-result = client.multimodal_embed(
-    inputs=[["Caption or context", image]],
+# Configure output dimensions (256, 512, 1024, or 2048)
+embedder = VoyageMultimodalEmbedder(
     model="voyage-multimodal-3.5",
-    output_dimension=1024  # Optional: 256, 512, 1024, or 2048
+    output_dimension=2048,  # Higher dimensions for better accuracy
+    input_type="document",  # Optimize for document retrieval
 )
 
-print(f"Dimensions: {len(result.embeddings[0])}")
-print(f"Tokens used: {result.total_tokens}")
+# Embed multiple inputs at once
+image1 = ByteStream.from_file_path("doc1.jpg")
+image2 = ByteStream.from_file_path("doc2.jpg")
+
+result = embedder.run(inputs=[
+    ["Document about machine learning", image1],
+    ["Technical diagram", image2],
+])
+
+print(f"Number of embeddings: {len(result['embeddings'])}")
+print(f"Image pixels processed: {result['meta']['image_pixels']}")
 ```
 
 ### Video Embedding Example
 
-Video inputs require the `voyageai.video_utils` module. Use `optimize_video` to fit videos within the 32K token context:
+Video inputs require the `voyageai.video_utils` module:
 
 ```python
-import os
-import voyageai
-from voyageai.video_utils import optimize_video
-
-client = voyageai.Client(api_key=os.environ.get("VOYAGE_API_KEY"))
+from voyageai.video_utils import Video
+from haystack_integrations.components.embedders.voyage_embedders import VoyageMultimodalEmbedder
 
-# Load and optimize video (videos can be large in tokens)
-with open("video.mp4", "rb") as f:
-    video_bytes = f.read()
+embedder = VoyageMultimodalEmbedder(model="voyage-multimodal-3.5")
 
-# Optimize to fit within token budget
-optimized_video = optimize_video(
-    video_bytes,
-    model="voyage-multimodal-3.5",
-    max_video_tokens=5000  # Limit tokens used by video
-)
-print(f"Optimized: {optimized_video.num_frames} frames, ~{optimized_video.estimated_num_tokens} tokens")
+# Load video using VoyageAI's Video utility
+video = Video.from_path("video.mp4", model="voyage-multimodal-3.5")
 
-# Embed video (optionally with text context)
-result = client.multimodal_embed(
-    inputs=[[optimized_video]],
-    model="voyage-multimodal-3.5"
-)
+# Embed video with optional text context
+result = embedder.run(inputs=[["Describe this video:", video]])
 
-print(f"Dimensions: {len(result.embeddings[0])}")
-print(f"Tokens used: {result.total_tokens}")
+print(f"Embedding dimensions: {len(result['embeddings'][0])}")
+print(f"Video pixels processed: {result['meta']['video_pixels']}")
+print(f"Total tokens: {result['meta']['total_tokens']}")
 ```
 
 ### Use Cases
@@ -311,7 +315,7 @@ print(f"Tokens used: {result.total_tokens}")
 
 For more information, see the [Multimodal Embeddings Documentation](https://docs.voyageai.com/docs/multimodal-embeddings).
 
-> **Note:** The `voyage-multimodal-3.5` model is currently in preview. Video input requires `voyageai` SDK version 0.3.6 or later.
+> **Note:** The `voyage-multimodal-3.5` model is currently in preview. Video input requires `voyageai` SDK version 0.3.6 or later and `pillow` for image processing.
 
 ## License

From 518e9cf07c2033a610dda5af289e1ac7df0949d0 Mon Sep 17 00:00:00 2001
From: fzowl
Date: Fri, 30 Jan 2026 14:14:44 +0100
Subject: [PATCH 3/5] Updated: voyage-multimodal-3.5 (video) support

---
 integrations/voyage.md | 2 --
 1 file changed, 2 deletions(-)

diff --git a/integrations/voyage.md b/integrations/voyage.md
index fb2da82a..d4e16475 100644
--- a/integrations/voyage.md
+++ b/integrations/voyage.md
@@ -315,8 +315,6 @@ print(f"Total tokens: {result['meta']['total_tokens']}")
 
 For more information, see the [Multimodal Embeddings Documentation](https://docs.voyageai.com/docs/multimodal-embeddings).
 
-> **Note:** The `voyage-multimodal-3.5` model is currently in preview. Video input requires `voyageai` SDK version 0.3.6 or later and `pillow` for image processing.
-
 ## License
 
 `voyage-embedders-haystack` is distributed under the terms of the [Apache-2.0 license](https://github.com/awinml/voyage-embedders-haystack/blob/main/LICENSE).
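The patches above pass `inputs` as a nested structure: a list of inputs, each itself a list of interleaved content items (text strings plus image or video objects). A malformed structure is an easy mistake to make, so a pre-flight shape check can fail fast before any API call. The sketch below is an editorial illustration: the function name and the exact accepted item types are assumptions, not part of `voyage-embedders-haystack`.

```python
# Illustrative pre-flight check for the nested `inputs` structure used by the
# multimodal examples above. Hypothetical helper, not part of the integration.

def validate_multimodal_inputs(inputs):
    """Return the number of inputs, raising ValueError on a malformed structure."""
    if not isinstance(inputs, list):
        raise ValueError("inputs must be a list of content-item lists")
    for i, items in enumerate(inputs):
        if not isinstance(items, list) or not items:
            raise ValueError(f"input {i} must be a non-empty list of content items")
        for item in items:
            # Text goes in as str; images/videos as SDK or ByteStream objects.
            if isinstance(item, (list, dict)):
                raise ValueError(f"input {i} contains a nested list/dict item")
    return len(inputs)

# One text-only input and one mixed text+media input (media stubbed as bytes).
print(validate_multimodal_inputs([["A sunset over the ocean"],
                                  ["Product photo", b"\x89PNG..."]]))  # 2
```

Each inner list becomes one embedding in `result['embeddings']`, which is why the text-only examples still wrap the string in a list of its own.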
From 8ded8480374bcb7d4e0f0af04e11caf2f5fb024d Mon Sep 17 00:00:00 2001
From: fzowl
Date: Fri, 13 Feb 2026 22:34:18 +0100
Subject: [PATCH 4/5] Updated: voyage-multimodal-3.5 (video) support

---
 integrations/voyage.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/integrations/voyage.md b/integrations/voyage.md
index d4e16475..03b21a7d 100644
--- a/integrations/voyage.md
+++ b/integrations/voyage.md
@@ -250,12 +250,12 @@ from haystack_integrations.components.embedders.voyage_embedders import VoyageMu
 
 # Text-only embedding
 embedder = VoyageMultimodalEmbedder(model="voyage-multimodal-3.5")
-result = embedder.run(inputs=[["What is in this image?"]])
+result = embedder.run(inputs=[["A sunset over the ocean"]])
 print(f"Embedding dimensions: {len(result['embeddings'][0])}")
 
 # Mixed text and image embedding
 image_bytes = ByteStream.from_file_path("image.jpg")
-result = embedder.run(inputs=[["Describe this image:", image_bytes]])
+result = embedder.run(inputs=[["Product photo for online store", image_bytes]])
 print(f"Tokens used: {result['meta']['total_tokens']}")
 ```
 
@@ -299,7 +299,7 @@ embedder = VoyageMultimodalEmbedder(model="voyage-multimodal-3.5")
 video = Video.from_path("video.mp4", model="voyage-multimodal-3.5")
 
 # Embed video with optional text context
-result = embedder.run(inputs=[["Describe this video:", video]])
+result = embedder.run(inputs=[["Machine learning tutorial", video]])
 
 print(f"Embedding dimensions: {len(result['embeddings'][0])}")
 print(f"Video pixels processed: {result['meta']['video_pixels']}")

From 5b15ecd4cd7559d0c9e6737fe802d93026547e09 Mon Sep 17 00:00:00 2001
From: fzowl
Date: Fri, 13 Feb 2026 22:43:49 +0100
Subject: [PATCH 5/5] Adding VoyageAI V4 family models

---
 integrations/voyage.md | 24 ++++++++++--------------
 1 file changed, 10 insertions(+), 14 deletions(-)

diff --git a/integrations/voyage.md b/integrations/voyage.md
index 03b21a7d..2a29c437 100644
--- a/integrations/voyage.md
+++ b/integrations/voyage.md
@@ -30,11 +30,11 @@ toc: true
 - [Multimodal Embeddings](#multimodal-embeddings)
 
 [Voyage AI](https://voyageai.com/)'s embedding and ranking models are state-of-the-art in retrieval accuracy. The integration supports the following models:
 
-- **`voyage-3.5`** and **`voyage-3.5-lite`** - Latest general-purpose embedding models with superior performance
-- **`voyage-3-large`** and **`voyage-3`** - High-performance general-purpose embedding models
+- **`voyage-4-large`**, **`voyage-4`**, and **`voyage-4-lite`** - Latest general-purpose embedding models with shared embedding space and MoE architecture
+- **`voyage-3.5`** and **`voyage-3.5-lite`** - General-purpose embedding models with superior performance
+- **`voyage-code-3`** - Optimized for code retrieval
 - **`voyage-context-3`** - Contextualized chunk embedding model that preserves document context for improved retrieval accuracy
 - **`voyage-multimodal-3.5`** - Multimodal model supporting text, images, and video (preview)
-- **`voyage-2`** and **`voyage-large-2`** - Proven models that outperform `intfloat/e5-mistral-7b-instruct` and `OpenAI/text-embedding-3-large` on the [MTEB Benchmark](https://github.com/embeddings-benchmark/mteb)
 
 For the complete list of available models, see the [Embeddings Documentation](https://docs.voyageai.com/embeddings/) and [Contextualized Chunk Embeddings](https://docs.voyageai.com/docs/contextualized-chunk-embeddings).
@@ -44,15 +44,12 @@ For the complete list of available models, see the [Embeddings Documentation](ht
 
 | Model | Description | Dimensions |
 |-------|-------------|------------|
-| `voyage-3.5` | Latest general-purpose embedding model | 1024 |
+| `voyage-4-large` | The best general-purpose and multilingual retrieval quality | 1024 (default), 256, 512, 2048 |
+| `voyage-4` | Optimized for general-purpose and multilingual retrieval quality | 1024 (default), 256, 512, 2048 |
+| `voyage-4-lite` | Optimized for latency and cost | 1024 (default), 256, 512, 2048 |
+| `voyage-3.5` | General-purpose embedding model | 1024 |
 | `voyage-3.5-lite` | Efficient model with lower latency | 1024 |
-| `voyage-3-large` | High-capacity embedding model | 1024 |
-| `voyage-3` | High-performance general-purpose model | 1024 |
 | `voyage-code-3` | Optimized for code retrieval | 1024 |
-| `voyage-finance-2` | Optimized for financial documents | 1024 |
-| `voyage-law-2` | Optimized for legal documents | 1024 |
-| `voyage-2` | Proven general-purpose model | 1024 |
-| `voyage-large-2` | Larger proven model | 1536 |
 
 ### Multimodal Embedding Models
 
@@ -92,11 +89,10 @@ To create semantic embeddings for documents, use `VoyageDocumentEmbedder` in you
 
 For improved retrieval quality, use `VoyageContextualizedDocumentEmbedder` with the `voyage-context-3` model. This component preserves context between related document chunks by grouping them together during embedding, reducing context loss that occurs when chunks are embedded independently
 
 **Important:** You must explicitly specify the `model` parameter when initializing any component. Choose from the available models listed in the [Embeddings Documentation](https://docs.voyageai.com/embeddings/). Recommended choices include:
-- `voyage-3.5` - Latest general-purpose model for best performance
-- `voyage-3.5-lite` - Efficient model with lower latency
-- `voyage-3-large` - High-capacity model for complex tasks
+- `voyage-4-large` - Best general-purpose and multilingual retrieval quality
+- `voyage-4` - Balanced general-purpose and multilingual retrieval quality
+- `voyage-4-lite` - Optimized for latency and cost
 - `voyage-context-3` - Contextualized embeddings for improved retrieval (use with `VoyageContextualizedDocumentEmbedder`)
-- `voyage-2` - Proven general-purpose model
 
 You can set the environment variable `VOYAGE_API_KEY` instead of passing the API key as an argument. To get an API key, please see the [Voyage AI website.](https://www.voyageai.com/)
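The model recommendations in the final patch (quality vs. latency vs. contextualized chunks, plus the V4 family's selectable output dimensions) can be summarized as a small lookup. The helper below is an editorial sketch only: the function, the priority labels, and the returned settings dict are assumptions for illustration and are not part of `voyage-embedders-haystack`.

```python
# Hypothetical model-selection helper mirroring the recommended-choices list
# added in PATCH 5/5. Illustrative only; not part of the integration.

RECOMMENDED_MODELS = {
    "best_quality": "voyage-4-large",    # best general-purpose/multilingual quality
    "balanced": "voyage-4",              # balanced quality
    "low_latency": "voyage-4-lite",      # optimized for latency and cost
    "contextualized": "voyage-context-3",  # use with VoyageContextualizedDocumentEmbedder
}

VALID_OUTPUT_DIMENSIONS = {256, 512, 1024, 2048}  # 1024 is the V4-family default

def pick_model(priority: str, output_dimension: int = 1024) -> dict:
    """Map a retrieval priority to a recommended model plus embedder settings."""
    if priority not in RECOMMENDED_MODELS:
        raise ValueError(f"unknown priority: {priority!r}")
    if output_dimension not in VALID_OUTPUT_DIMENSIONS:
        raise ValueError(f"output_dimension must be one of {sorted(VALID_OUTPUT_DIMENSIONS)}")
    return {"model": RECOMMENDED_MODELS[priority], "output_dimension": output_dimension}

print(pick_model("best_quality"))      # {'model': 'voyage-4-large', 'output_dimension': 1024}
print(pick_model("low_latency", 512))  # {'model': 'voyage-4-lite', 'output_dimension': 512}
```

Note that `output_dimension` applies to the V4 family (and `voyage-multimodal-3.5`); the 3.x text models in the tables above are listed with a fixed 1024-dimensional output.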