diff --git a/integrations/voyage.md b/integrations/voyage.md
index b64974c..2a29c43 100644
--- a/integrations/voyage.md
+++ b/integrations/voyage.md
@@ -24,17 +24,47 @@ toc: true
 - [Installation](#installation)
 - [Usage](#usage)
+- [Supported Models](#supported-models)
 - [Example](#example)
 - [Contextualized Embeddings Example](#contextualized-embeddings-example)
+- [Multimodal Embeddings](#multimodal-embeddings)
 
 [Voyage AI](https://voyageai.com/)'s embedding and ranking models are state-of-the-art in retrieval accuracy.
 
 The integration supports the following models:
 
-- **`voyage-3.5`** and **`voyage-3.5-lite`** - Latest general-purpose embedding models with superior performance
-- **`voyage-3-large`** and **`voyage-3`** - High-performance general-purpose embedding models
+- **`voyage-4-large`**, **`voyage-4`**, and **`voyage-4-lite`** - Latest general-purpose embedding models with shared embedding space and MoE architecture
+- **`voyage-3.5`** and **`voyage-3.5-lite`** - General-purpose embedding models with superior performance
+- **`voyage-code-3`** - Optimized for code retrieval
 - **`voyage-context-3`** - Contextualized chunk embedding model that preserves document context for improved retrieval accuracy
-- **`voyage-2`** and **`voyage-large-2`** - Proven models that outperform `intfloat/e5-mistral-7b-instruct` and `OpenAI/text-embedding-3-large` on the [MTEB Benchmark](https://github.com/embeddings-benchmark/mteb)
+- **`voyage-multimodal-3.5`** - Multimodal model supporting text, images, and video (preview)
 
 For the complete list of available models, see the [Embeddings Documentation](https://docs.voyageai.com/embeddings/) and [Contextualized Chunk Embeddings](https://docs.voyageai.com/docs/contextualized-chunk-embeddings).
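(Editor's aside, not part of the diff.) At query time, retrieval with any of the embedding models above reduces to comparing vectors. A minimal, library-agnostic sketch of that comparison — toy 3-dimensional vectors stand in for the real 1024-dimensional model output, and none of this is part of the integration's API:

```python
import math

def cosine_similarity(a, b):
    # Standard cosine similarity: dot product over the product of norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy embeddings standing in for real model output.
query = [0.1, 0.9, 0.2]
documents = {
    "doc_a": [0.1, 0.8, 0.3],  # points in nearly the same direction as the query
    "doc_b": [0.9, 0.1, 0.0],  # points elsewhere
}

# Rank documents by similarity to the query and keep the best match.
best = max(documents, key=lambda name: cosine_similarity(query, documents[name]))
print(best)  # doc_a
```

In practice a Haystack document store and retriever perform this comparison; the sketch only illustrates what the embedding vectors are used for downstream.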
+## Supported Models
+
+### Text Embedding Models
+
+| Model | Description | Dimensions |
+|-------|-------------|------------|
+| `voyage-4-large` | The best general-purpose and multilingual retrieval quality | 1024 (default), 256, 512, 2048 |
+| `voyage-4` | Optimized for general-purpose and multilingual retrieval quality | 1024 (default), 256, 512, 2048 |
+| `voyage-4-lite` | Optimized for latency and cost | 1024 (default), 256, 512, 2048 |
+| `voyage-3.5` | General-purpose embedding model | 1024 |
+| `voyage-3.5-lite` | Efficient model with lower latency | 1024 |
+| `voyage-code-3` | Optimized for code retrieval | 1024 |
+
+### Multimodal Embedding Models
+
+| Model | Description | Dimensions | Modalities |
+|-------|-------------|------------|------------|
+| `voyage-multimodal-3` | Multimodal embedding model | 1024 | Text, Images |
+| `voyage-multimodal-3.5` | Multimodal embedding model (preview) | 256, 512, 1024 (default), 2048 | Text, Images, Video |
+
+### Reranker Models
+
+| Model | Description |
+|-------|-------------|
+| `rerank-2` | High-accuracy reranker model |
+| `rerank-2-lite` | Efficient reranker with lower latency |
+
 ## Installation
 
 ```bash
@@ -43,10 +73,11 @@ pip install voyage-embedders-haystack
 
 ## Usage
 
-You can use Voyage models with four components:
+You can use Voyage models with five components:
 - [VoyageTextEmbedder](https://github.com/awinml/voyage-embedders-haystack/blob/main/src/haystack_integrations/components/embedders/voyage_embedders/voyage_text_embedder.py) - For embedding query text
 - [VoyageDocumentEmbedder](https://github.com/awinml/voyage-embedders-haystack/blob/main/src/haystack_integrations/components/embedders/voyage_embedders/voyage_document_embedder.py) - For embedding documents
-- [VoyageContextualizedDocumentEmbedder](https://github.com/awinml/voyage-embedders-haystack/blob/voyage_context-3_model/src/haystack_integrations/components/embedders/voyage_embedders/voyage_contextualized_document_embedder.py) - For contextualized chunk embeddings with `voyage-context-3`
+- [VoyageContextualizedDocumentEmbedder](https://github.com/awinml/voyage-embedders-haystack/blob/main/src/haystack_integrations/components/embedders/voyage_embedders/voyage_contextualized_document_embedder.py) - For contextualized chunk embeddings with `voyage-context-3`
+- [VoyageMultimodalEmbedder](https://github.com/awinml/voyage-embedders-haystack/blob/main/src/haystack_integrations/components/embedders/voyage_embedders/voyage_multimodal_embedder.py) - For multimodal embeddings with `voyage-multimodal-3.5`
 - [VoyageRanker](https://github.com/awinml/voyage-embedders-haystack/blob/main/src/haystack_integrations/components/rankers/voyage/ranker.py) - For reranking documents
 
 ### Standard Embeddings
 
@@ -58,11 +89,10 @@ To create semantic embeddings for documents, use `VoyageDocumentEmbedder` in you
 For improved retrieval quality, use `VoyageContextualizedDocumentEmbedder` with the `voyage-context-3` model. This component preserves context between related document chunks by grouping them together during embedding, reducing context loss that occurs when chunks are embedded independently.
 
 **Important:** You must explicitly specify the `model` parameter when initializing any component. Choose from the available models listed in the [Embeddings Documentation](https://docs.voyageai.com/embeddings/).
 Recommended choices include:
-- `voyage-3.5` - Latest general-purpose model for best performance
-- `voyage-3.5-lite` - Efficient model with lower latency
-- `voyage-3-large` - High-capacity model for complex tasks
+- `voyage-4-large` - Best general-purpose and multilingual retrieval quality
+- `voyage-4` - Balanced general-purpose and multilingual retrieval quality
+- `voyage-4-lite` - Optimized for latency and cost
 - `voyage-context-3` - Contextualized embeddings for improved retrieval (use with `VoyageContextualizedDocumentEmbedder`)
-- `voyage-2` - Proven general-purpose model
 
 You can set the environment variable `VOYAGE_API_KEY` instead of passing the API key as an argument. To get an API key, please see the [Voyage AI website](https://www.voyageai.com/).
 
@@ -188,6 +218,99 @@ result = embedder.run(documents=docs)
 
 For more examples, see the [contextualized embedder example](https://github.com/awinml/voyage-embedders-haystack/blob/voyage_context-3_model/examples/contextualized_embedder_example.py).
 
+## Multimodal Embeddings
+
+Voyage AI's `voyage-multimodal-3.5` model transforms unstructured data from multiple modalities (text, images, video) into a shared vector space. This enables mixed-media document retrieval and cross-modal semantic search.
+
+### Features
+
+- **Multiple modalities**: Supports text, images, and video in a single input
+- **Variable dimensions**: Output dimensions of 256, 512, 1024 (default), or 2048
+- **Interleaved content**: Mix text, images, and video within a single input
+- **No preprocessing required**: Process documents with embedded images directly
+
+### Limits
+
+- Images: Max 20MB, 16 million pixels
+- Video: Max 20MB
+- Context: 32,000 tokens
+- Token counting: 560 image pixels = 1 token, 1120 video pixels = 1 token
+
+### Basic Multimodal Example
+
+Use the `VoyageMultimodalEmbedder` component for multimodal embeddings. Each input is a list of content items (text, images, or videos):
+
+```python
+from haystack.dataclasses import ByteStream
+from haystack_integrations.components.embedders.voyage_embedders import VoyageMultimodalEmbedder
+
+# Text-only embedding
+embedder = VoyageMultimodalEmbedder(model="voyage-multimodal-3.5")
+result = embedder.run(inputs=[["A sunset over the ocean"]])
+print(f"Embedding dimensions: {len(result['embeddings'][0])}")
+
+# Mixed text and image embedding
+image_bytes = ByteStream.from_file_path("image.jpg")
+result = embedder.run(inputs=[["Product photo for online store", image_bytes]])
+print(f"Tokens used: {result['meta']['total_tokens']}")
+```
+
+### Multimodal Example with Custom Dimensions
+
+```python
+from haystack.dataclasses import ByteStream
+from haystack_integrations.components.embedders.voyage_embedders import VoyageMultimodalEmbedder
+
+# Configure output dimensions (256, 512, 1024, or 2048)
+embedder = VoyageMultimodalEmbedder(
+    model="voyage-multimodal-3.5",
+    output_dimension=2048,  # Higher dimensions for better accuracy
+    input_type="document",  # Optimize for document retrieval
+)
+
+# Embed multiple inputs at once
+image1 = ByteStream.from_file_path("doc1.jpg")
+image2 = ByteStream.from_file_path("doc2.jpg")
+
+result = embedder.run(inputs=[
+    ["Document about machine learning", image1],
+    ["Technical diagram", image2],
+])
+
+print(f"Number of embeddings: {len(result['embeddings'])}")
+print(f"Image pixels processed: {result['meta']['image_pixels']}")
+```
+
+### Video Embedding Example
+
+Video inputs require the `voyageai.video_utils` module:
+
+```python
+from voyageai.video_utils import Video
+from haystack_integrations.components.embedders.voyage_embedders import VoyageMultimodalEmbedder
+
+embedder = VoyageMultimodalEmbedder(model="voyage-multimodal-3.5")
+
+# Load video using VoyageAI's Video utility
+video = Video.from_path("video.mp4", model="voyage-multimodal-3.5")
+
+# Embed video with optional text context
+result = embedder.run(inputs=[["Machine learning tutorial", video]])
+
+print(f"Embedding dimensions: {len(result['embeddings'][0])}")
+print(f"Video pixels processed: {result['meta']['video_pixels']}")
+print(f"Total tokens: {result['meta']['total_tokens']}")
+```
+
+### Use Cases
+
+- Mixed-media document retrieval (PDFs, slides with images)
+- Image-text similarity search
+- Video content retrieval and search
+- Cross-modal semantic search
+
+For more information, see the [Multimodal Embeddings Documentation](https://docs.voyageai.com/docs/multimodal-embeddings).
+
 ## License
 
 `voyage-embedders-haystack` is distributed under the terms of the [Apache-2.0 license](https://github.com/awinml/voyage-embedders-haystack/blob/main/LICENSE).
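(Editor's aside, not part of the diff.) The token-accounting figures in the Limits list of the Multimodal Embeddings section (560 image pixels per token, 1120 video pixels per token, a 16-million-pixel image cap) can be turned into back-of-the-envelope helpers. `estimate_image_tokens` and `estimate_video_tokens` are hypothetical names, and rounding partial tokens up is an assumption, not documented billing behavior:

```python
import math

# Figures taken from the Limits section.
IMAGE_PIXELS_PER_TOKEN = 560
VIDEO_PIXELS_PER_TOKEN = 1120
MAX_IMAGE_PIXELS = 16_000_000

def estimate_image_tokens(width: int, height: int) -> int:
    """Hypothetical helper: rough token cost of one image, rounding up."""
    pixels = width * height
    if pixels > MAX_IMAGE_PIXELS:
        raise ValueError(f"{pixels} pixels exceeds the {MAX_IMAGE_PIXELS}-pixel image limit")
    return math.ceil(pixels / IMAGE_PIXELS_PER_TOKEN)

def estimate_video_tokens(total_pixels: int) -> int:
    """Hypothetical helper: rough token cost of a given number of video pixels."""
    return math.ceil(total_pixels / VIDEO_PIXELS_PER_TOKEN)

print(estimate_image_tokens(1024, 1024))   # 1873
print(estimate_video_tokens(1280 * 720))   # 823
```

Estimates like these are only for sizing inputs against the 32,000-token context before sending them; the API's `total_tokens` metadata remains the authoritative count.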