diff --git a/pages/generative-apis/reference-content/supported-models.mdx b/pages/generative-apis/reference-content/supported-models.mdx index be2d3ab1b8..1bb44a8944 100644 --- a/pages/generative-apis/reference-content/supported-models.mdx +++ b/pages/generative-apis/reference-content/supported-models.mdx @@ -3,7 +3,7 @@ title: Generative APIs supported models description: This page lists the open-source large language models supported by Scaleway. tags: dates: - validation: 2026-04-24 + validation: 2026-05-26 posted: 2024-04-18 --- This page provides a quick overview of available models in Scaleway's catalog and their core attributes. Expand any model below to see usage examples and detailed capabilities. @@ -22,19 +22,31 @@ This page provides a quick overview of available models in Scaleway's catalog an | Model name | Available in Serverless? | Maximum context window (tokens) | Maximum output (tokens) - Serverless | Modalities | License \* | |------------|--------------|--------------|------------|-----------|-----------| | [`gpt-oss-120b`](#gpt-oss-120b)| Yes | 128k | 32k | Text | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | +| [`gpt-oss-20b`](#gpt-oss-20b) | No | 131k | N/A | Text | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | | [`whisper-large-v3`](#whisper-large-v3) | Yes | - | - | Audio transcription | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | +| [`qwen3.6-35b-a3b`](#qwen36-35b-a3b)| Yes | 256k | 32k | Text, Code, Vision | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | | [`qwen3.5-397b-a17b`](#qwen35-397b-a17b)| Yes | 250k | 16k | Text, Code, Vision | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | +| [`qwen3.5-35b-a3b`](#qwen35-35b-a3b) | No | 262k | N/A | Text, Code, Vision | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | +| [`qwen3.5-122b-a10b`](#qwen35-122b-a10b) | No | 262k | N/A | Text, Code, Vision | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | | 
[`qwen3-235b-a22b-instruct-2507`](#qwen3-235b-a22b-instruct-2507) | Yes | 250k | 16k | Text | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | +| [`qwen3-235b-a22b-thinking-2507`](#qwen3-235b-a22b-thinking-2507) | No | 262k | N/A | Text | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | +| [`qwen3-embedding-8b`](#qwen3-embedding-8b) | Yes | 32k | N/A | Embeddings | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | +| [`qwen3-coder-30b-a3b-instruct`](#qwen3-coder-30b-a3b-instruct) | Yes | 128k | 32k | Code | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | +| [`qwen2.5-coder-32b-instruct`](#qwen25-coder-32b-instruct) | [EOL for Serverless](#end-of-life-eol-models-for-serverless) | 32k | [EOL for Serverless](#end-of-life-eol-models-for-serverless) | Code | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | +| [`gemma-4-31b-it`](#gemma-4-31b-it) | No | 262k | N/A | Text, Vision | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | +| [`gemma-4-26b-a4b-it`](#gemma-4-26b-a4b-it) | Yes | 256k | 32k | Text, Vision | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | | [`gemma-3-27b-it`](#gemma-3-27b-it) | Yes | 40k | 8k | Text, Vision | [Gemma](https://ai.google.dev/gemma/terms) | | [`llama-3.3-70b-instruct`](#llama-33-70b-instruct) | Yes | 100k (Serverless)/ 128k (Dedicated)| 16k | Text | [Llama 3.3 Community](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) | | [`llama-3.1-70b-instruct`](#llama-31-70b-instruct) | [EOL for Serverless](#end-of-life-eol-models-for-serverless) | 128k | [EOL for Serverless](#end-of-life-eol-models-for-serverless) | Text | [Llama 3.1 Community](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct/blob/main/LICENSE) | | [`llama-3.1-8b-instruct`](#llama-31-8b-instruct) | [EOL for Serverless](#end-of-life-eol-models-for-serverless) | 128k | [EOL for Serverless](#end-of-life-eol-models-for-serverless) | Text | [Llama 3.1 
Community](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct/blob/main/LICENSE) | +| [`llama-3-8b-instruct`](#llama-3-8b-instruct) | No | 8k | N/A | Text | [Meta Llama 3](https://www.llama.com/llama3/license/) | | [`llama-3-70b-instruct`](#llama-3-70b-instruct) | No | 8k | N/A | Text | [Llama 3 Community](https://huggingface.co/meta-llama/Meta-Llama-3-8B/blob/main/LICENSE) | | [`llama-3.1-nemotron-70b-instruct`](#llama-31-nemotron-70b-instruct) | No | 128k | N/A | Text | [Llama 3.1 Community](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct/blob/main/LICENSE) | | [`deepseek-r1-distill-llama-70b`](#deepseek-r1-distill-llama-70b) | [EOL for Serverless](#end-of-life-eol-models-for-serverless) | 16k (Serverless) / 128k (Dedicated) | 4k | Text | [MIT](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B/blob/main/LICENSE) and [Llama 3.3 Community](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct/blob/main/LICENSE) | | [`deepseek-r1-distill-llama-8b`](#deepseek-r1-distill-llama-8b) | No | 128k | N/A | Text | [MIT](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B/blob/main/LICENSE) and [Llama 3.1 Community](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct/blob/main/LICENSE) | | [`mistral-7b-instruct-v0.3`](#mistral-7b-instruct-v03) | No | 32k | N/A | Text | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | | [`mistral-large-3-675b-instruct-2512`](#mistral-large-3-675b-instruct-2512) | No | 250k | N/A | Text, Vision | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | +| [`mistral-medium-3.5-128b`](#mistral-medium-35-128b) | Yes | 256k | 16k | Text, Vision | [Modified MIT License](https://huggingface.co/mistralai/Mistral-Medium-3.5-128B/blob/main/LICENSE) | | [`mistral-small-3.2-24b-instruct-2506`](#mistral-small-32-24b-instruct-2506) | Yes | 128k | 32k | Text, Vision | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | | 
[`mistral-small-3.1-24b-instruct-2503`](#mistral-small-31-24b-instruct-2503) | [EOL for Serverless](#end-of-life-eol-models-for-serverless) | 128k | [EOL for Serverless](#end-of-life-eol-models-for-serverless) | Text, Vision | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | | [`mistral-small-24b-instruct-2501`](#mistral-small-24b-instruct-2501) | No | 32k | N/A | Text | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | @@ -47,11 +59,9 @@ This page provides a quick overview of available models in Scaleway's catalog an | [`pixtral-12b-2409`](#pixtral-12b-2409) | Yes | 128k | 4k | Text, Vision | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | | [`molmo-72b-0924`](#molmo-72b-0924) | No | 50k | N/A | Text, Vision | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) and [Twonyi Qianwen license](https://huggingface.co/Qwen/Qwen2-72B/blob/main/LICENSE)| | [`holo2-30b-a3b`](#holo2-30b-a3b)| Yes | 22k | 32k | Text, Vision | [CC-BY-NC-4.0](https://spdx.org/licenses/CC-BY-NC-4.0)| -| [`qwen3-embedding-8b`](#qwen3-embedding-8b) | Yes | 32k | N/A | Embeddings | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | -| [`qwen3-coder-30b-a3b-instruct`](#qwen3-coder-30b-a3b-instruct) | Yes | 128k | 32k | Code | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | -| [`qwen2.5-coder-32b-instruct`](#qwen25-coder-32b-instruct) | [EOL for Serverless](#end-of-life-eol-models-for-serverless) | 32k | [EOL for Serverless](#end-of-life-eol-models-for-serverless) | Code | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | | [`bge-multilingual-gemma2`](#bge-multilingual-gemma2) | Yes | 8k | N/A | Embeddings | [Gemma](https://ai.google.dev/gemma/terms) | -| [`sentence-t5-xxl`](#sentence-t5-xxl) | [EOL for Serverless](#end-of-life-eol-models-for-serverless) | 512 | [EOL for Serverless](#end-of-life-eol-models-for-serverless) | Embeddings | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |qwen3-235b-a22b-instruct-2507 +| 
[`sentence-t5-xxl`](#sentence-t5-xxl) | [EOL for Serverless](#end-of-life-eol-models-for-serverless) | 512 | [EOL for Serverless](#end-of-life-eol-models-for-serverless) | Embeddings | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) | +| [`minimax-m2.5`](#minimax-m25) | No | 197k | N/A | Code | [MIT](https://choosealicense.com/licenses/mit/) | \*Licences which are not open-weight and may restrict commercial usage (such as `CC-BY-NC-4.0`), do not apply to usage through Scaleway Products due to existing partnerships between Scaleway and the corresponding providers. Original licences are provided for transparency only. @@ -66,6 +76,31 @@ This page provides a quick overview of available models in Scaleway's catalog an Vision models can understand and analyze images, not generate them. You will use vision models through the `/v1/chat/completions` endpoint. +### Gemma-4-31b-it + +Released in April 2026, Gemma-4-31b-it is a frontier small-sized model for agentic and reasoning tasks in many languages. + +| Attribute | Value | +|-----------|-------| +| Provider | Google | +| Supports structured output | Yes | +| Supports function calling | Yes | +| Supports parallel tool-calling | Yes | +| Supported reasoning efforts | `none`, `low`, `medium`, `high` | +| Supported image formats | PNG, JPEG, WEBP, and non-animated GIFs | +| Maximum image resolution (pixels) | 896x896 | +| Token dimension (pixels) | 64x64 | +| Supported languages | English, Chinese, Japanese, Korean, and 136 additional languages | +| Compatible Instances (max context in tokens\*) - Dedicated Deployment | H100 (66k), H100-2 (262k), H100-SXM-2 (262k), H100-SXM-4 | +| Hugging Face model card | [google/gemma-4-31B-it](https://huggingface.co/google/gemma-4-31B-it) | + +\*Maximum context length is only mentioned when an instance's VRAM size limits context length. Otherwise, maximum context length is the one defined by the model.
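As a rough sizing aid, the "Maximum image resolution" and "Token dimension" rows can be combined to estimate how many tokens one image consumes. The sketch below assumes that an image is downscaled to fit within the maximum resolution and then split into square tiles of the token dimension, one token per tile. The real preprocessing (and any special tokens the model adds) may differ, so treat the result as an order-of-magnitude estimate only; `estimate_image_tokens` is a hypothetical helper, not part of any Scaleway API.

```python
import math


def estimate_image_tokens(width: int, height: int, max_side: int, tile: int) -> int:
    """Rough image-token estimate.

    Assumption: the image is downscaled proportionally so its longest side is at
    most max_side, then split into tile x tile patches, one token per patch.
    Special tokens (image start/end markers, etc.) are ignored.
    """
    # Downscale only if the image exceeds the maximum resolution.
    scale = min(1.0, max_side / max(width, height))
    w = max(1, round(width * scale))
    h = max(1, round(height * scale))
    # Partial tiles at the right and bottom edges count as full tiles.
    return math.ceil(w / tile) * math.ceil(h / tile)


# Gemma-4-31b-it values from the table: 896x896 maximum, 64x64 tiles.
print(estimate_image_tokens(896, 896, max_side=896, tile=64))  # 196
print(estimate_image_tokens(1920, 1080, max_side=896, tile=64))
```

Under the same assumption, a 1540x1540 image with 28x28 tiles would map to 55 x 55 tiles, which helps explain why larger maximum resolutions consume noticeably more of the context window.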
+ +#### Model name +``` +google/gemma-4-31b-it:bf16 +``` + ### Gemma-3-27b-it Gemma-3-27b-it is a model developed by Google to perform text processing and image analysis on many languages. The model was not trained specifically to output function / tool call tokens. Hence function calling is currently supported, but reliability remains limited. @@ -92,6 +127,32 @@ Pan & Scan is not yet supported for Gemma 3 images. This means that high-resolut google/gemma-3-27b-it:bf16 ``` +### Gemma-4-26b-a4b-it + +Released in April 2026, Gemma-4-26b-a4b-it is a frontier small-sized model for agentic and reasoning tasks in many languages. +This model uses a Mixture-of-Experts (MoE) architecture, which provides high throughput and lets it run on a single H100 GPU at its maximum context size. + +| Attribute | Value | +|-----------|-------| +| Provider | Google | +| Supports structured output | Yes | +| Supports function calling | Yes | +| Supports parallel tool-calling | Yes | +| Supported reasoning efforts | `none`, `low`, `medium`, `high` | +| Supported image formats | PNG, JPEG, WEBP, and non-animated GIFs | +| Maximum image resolution (pixels) | 896x896 | +| Token dimension (pixels) | 64x64 | +| Supported languages | English, Chinese, Japanese, Korean, and 136 additional languages | +| Compatible Instances (max context in tokens\*) - Dedicated Deployment | H100 (262k), H100-2 (262k), H100-SXM-2 (262k), H100-SXM-4 | +| Hugging Face model card | [google/gemma-4-26B-A4B-it](https://huggingface.co/google/gemma-4-26B-A4B-it) | + +\*Maximum context length is only mentioned when an instance's VRAM size limits context length. Otherwise, maximum context length is the one defined by the model. + +#### Model name +``` +google/gemma-4-26b-a4b-it:bf16 +``` + ### Mistral-large-3-675b-instruct-2512 Mistral-large-3-675b-instruct-2512 is a frontier model, performing among the best open-weight models as of December 2025. It is ideal for agentic workflows and image understanding.
@@ -113,6 +174,30 @@ Mistral-large-3-675b-instruct-2512 is a frontier model, performing among the bes ``` mistral/mistral-large-3-675b-instruct-2512:fp4 ``` +### Mistral-medium-3.5-128b + +Mistral-medium-3.5-128b is a unified model from Mistral with strong performance on instruction-following, reasoning, and coding tasks. + +| Attribute | Value | +|-----------|-------| +| Provider | Mistral | +| Supports structured output | Yes | +| Supports function calling | Yes | +| Supports parallel tool-calling | Yes | +| Supported reasoning efforts | `none`, `high` | +| Supported image formats | PNG, JPEG, WEBP, GIF | +| Maximum image resolution (pixels) | 1540x1540 | +| Token dimension (pixels) | 28x28 | +| Supported languages | English, French, German, Spanish, Portuguese, Italian, and 18 additional languages and dialects | +| Compatible Instances (max context in tokens\*) - Dedicated Deployment | H100-SXM-4 (256k), H100-SXM-8 (256k) | +| Hugging Face model card | [mistralai/Mistral-Medium-3.5-128B](https://huggingface.co/mistralai/Mistral-Medium-3.5-128B) | + +\*Maximum context length is only mentioned when an instance's VRAM size limits context length. Otherwise, maximum context length is the one defined by the model. + +#### Model name +``` +mistral/mistral-medium-3.5-128b:fp8 +``` ### Mistral-small-3.2-24b-instruct-2506 Mistral-small-3.2-24b-instruct-2506 is an improved version of Mistral-small-3.1, which performs better on tool-calling. @@ -167,6 +252,32 @@ mistral/mistral-small-3.1-24b-instruct-2503:bf16 mistral/mistral-small-3.1-24b-instruct-2503:fp8 ``` +### Qwen3.6-35b-a3b + +Released in April 2026, Qwen3.6-35b-a3b is a state-of-the-art small-sized model optimized for agentic tasks and logical reasoning.
+ +| Attribute | Value | +|-----------|-------| +| Provider | Qwen | +| Supports structured output | Yes | +| Supports function calling | Yes | +| Supports parallel tool-calling | Yes | +| Supported reasoning efforts | `none`, `low`, `medium`, `high` | +| Supported image formats | PNG, JPEG, GIF | +| Maximum image resolution (pixels) | 4096x4096 | +| Token dimension (pixels) | 32x32 | +| Supported languages | English, French, Portuguese, German, Romanian, Swedish, and 70 additional languages and dialects | +| Compatible Instances (max context in tokens\*) - Dedicated Deployment | H100, H100-2, H100-SXM-2 | +| Hugging Face model card | [qwen3.6-35b-a3b](https://huggingface.co/Qwen/Qwen3.6-35B-A3B) | + +\*Maximum context length is only mentioned when an instance's VRAM size limits context length. Otherwise, maximum context length is the one defined by the model. + +#### Model name +``` +qwen/qwen3.6-35b-a3b:bf16 +qwen/qwen3.6-35b-a3b:fp8 +``` + ### Qwen3.5-397b-a17b Qwen3.5-397b-a17b is a model developed by Qwen to perform text processing, agentic coding, image, and video analysis in several languages. This model was released as a frontier reasoning model on 16 February 2026. @@ -177,6 +288,7 @@ This model was released as a frontier reasoning model on 16 February 2026. | Supports structured output | Yes | | Supports function calling | Yes | | Supports parallel tool-calling | Yes | +| Supported reasoning efforts | `none`, `low`, `medium`, `high` | | Supported image formats | PNG, JPEG, WEBP, and non-animated GIFs | | Supported video formats | MP4, MPEG, MOV, OGG and WEBM | | Maximum image resolution (pixels) | 4096x4096 | @@ -192,6 +304,42 @@ This model was released as a frontier reasoning model on 16 February 2026. qwen/qwen3.5-397b-a17b:int4 ``` +### Qwen3.5-35b-a3b + +Released in March 2026, Qwen3.5-35b-a3b is a state-of-the-art small-sized model optimized for agentic and coding tasks, as well as logical reasoning. 
+ +| Attribute | Value | +|-----------|-------| +| Provider | Qwen | +| Supports parallel tool-calling | Yes | +| Compatible Instances (max context in tokens\*) - Dedicated Deployment | H100-2, H100-SXM-2, H100-SXM-4, H100-SXM-8 | +| Hugging Face model card | [Qwen3.5-35B-A3B-GPTQ-Int4](https://huggingface.co/Qwen/Qwen3.5-35B-A3B-GPTQ-Int4) | + +\*Maximum context length is only mentioned when an instance's VRAM size limits context length. Otherwise, maximum context length is the one defined by the model. + +#### Model name +``` +qwen/qwen3.5-35b-a3b:int4 +``` + +### Qwen3.5-122b-a10b + +Released in March 2026, Qwen3.5-122b-a10b is a state-of-the-art medium-sized model optimized for agentic and coding tasks, as well as logical reasoning. + +| Attribute | Value | +|-----------|-------| +| Provider | Qwen | +| Supports parallel tool-calling | Yes | +| Compatible Instances (max context in tokens\*) - Dedicated Deployment | H100-2, H100-SXM-2, H100-SXM-4, H100-SXM-8 | +| Hugging Face model card | [Qwen/Qwen3.5-122B-A10B-GPTQ-Int4](https://huggingface.co/Qwen/Qwen3.5-122B-A10B-GPTQ-Int4) | + +\*Maximum context length is only mentioned when an instance's VRAM size limits context length. Otherwise, maximum context length is the one defined by the model. + +#### Model name +``` +qwen/qwen3.5-122b-a10b:int4 +``` + ### Pixtral-12b-2409 Pixtral is a vision language model introducing a novel architecture: 12B parameter multimodal decoder plus 400M parameter vision encoder. It can analyze images and offer insights from visual content alongside text. @@ -325,6 +473,44 @@ openai/whisper-large-v3:bf16 ## Text models +### Gpt-oss-120b +Released on 5 August 2025, GPT OSS 120B is an open-weight model providing high throughput and strong reasoning capabilities. +Currently, this model should be used through the Responses API, as the Chat Completions API does not yet support tool-calling for this model.
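Since tool-calling with this model goes through the Responses API, the sketch below shows what a request body might look like. It assumes the OpenAI-compatible Responses API request shape (`model`, `input`, `tools`, `reasoning.effort`) and the serverless model ID `gpt-oss-120b` from the table at the top of the page. The `get_weather` tool and the `build_responses_request` helper are hypothetical, and the endpoint you send the body to (for example a `/v1/responses` path) should be checked against Scaleway's API reference. The function only builds and validates the JSON; it does not send anything.

```python
import json

# Reasoning efforts listed for gpt-oss-120b in its attribute table.
SUPPORTED_EFFORTS = {"low", "medium", "high"}


def build_responses_request(prompt: str, effort: str = "medium") -> dict:
    """Build a Responses API request body with one function tool (sketch only)."""
    if effort not in SUPPORTED_EFFORTS:
        raise ValueError(f"unsupported reasoning effort: {effort!r}")
    return {
        "model": "gpt-oss-120b",
        "input": prompt,
        "reasoning": {"effort": effort},
        # Hypothetical tool, using the Responses API's flat function-tool format.
        "tools": [
            {
                "type": "function",
                "name": "get_weather",
                "description": "Get the current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            }
        ],
    }


body = build_responses_request("What is the weather in Paris?", effort="low")
print(json.dumps(body, indent=2))
```

Validating the effort locally is a design choice: `none` is rejected because this model's table lists only `low`, `medium`, and `high`, so the error surfaces before any network call.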
+ +| Attribute | Value | +|-----------|-------| +| Provider | OpenAI | +| Supports structured output | Yes | +| Supports function calling | Yes | +| Supports parallel tool-calling | Yes | +| Supported reasoning efforts | `low`, `medium`, `high` | +| Supported languages | English | +| Compatible Instances (max context in tokens\*) - Dedicated Deployment | H100 | +| Hugging Face model card | [gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b) | + +\*Maximum context length is only mentioned when an instance's VRAM size limits context length. Otherwise, maximum context length is the one defined by the model. + +#### Model name +``` +openai/gpt-oss-120b:fp4 +``` + +### Gpt-oss-20b +GPT OSS 20b is an OpenAI open-weight model designed for powerful reasoning, agentic tasks, and versatile developer use cases. + +| Attribute | Value | +|-----------|-------| +| Provider | OpenAI | +| Supports parallel tool-calling | Yes | +| Compatible Instances (max context in tokens\*) - Dedicated Deployment | H100, H100-2, H100-SXM-2, H100-SXM-4, H100-SXM-8 | +| Hugging Face model card | [gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b) | + + +#### Model name +``` +openai/gpt-oss-20b:fp4 +``` + ### Qwen3-235b-a22b-instruct-2507 Released 23 July 2025, Qwen 3 235B A22B is an open-weight model, competitive in multiple benchmarks (such as [LM Arena for text use cases](https://lmarena.ai/leaderboard)) compared to Gemini 2.5 Pro and GPT4.5. @@ -346,25 +532,20 @@ Released 23 July 2025, Qwen 3 235B A22B is an open-weight model, competitive in qwen/qwen3-235b-a22b-instruct-2507 ``` -### Gpt-oss-120b -Released 5 August 2025, GPT OSS 120B is an open-weight model providing significant throughput performance and reasoning capabilities. -Currently, this model should be used through Responses API, as Chat Completion does not yet support tool-calling for this model. 
+### Qwen3-235b-a22b-thinking-2507 +Qwen3-235b-a22b-thinking-2507 is the reasoning (thinking) counterpart of Qwen3-235b-a22b-instruct-2507, optimized for instruction following, step-by-step logical reasoning, and long-context understanding across a wide range of domains. | Attribute | Value | |-----------|-------| -| Provider | OpenAI | -| Supports structured output | Yes | -| Supports function calling | Yes | +| Provider | Qwen | | Supports parallel tool-calling | Yes | -| Supported languages | English | -| Compatible Instances (max context in tokens\*) - Dedicated Deployment | H100 | -| Hugging Face model card | [gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b) | +| Compatible Instances (max context in tokens\*) - Dedicated Deployment | H100-2 (100k), H100-SXM-2 (100k), H100-SXM-4 (262k) | +| Hugging Face model card | [Qwen3-235B-A22B-Thinking-2507-AWQ](https://huggingface.co/QuantTrio/Qwen3-235B-A22B-Thinking-2507-AWQ) | -\*Maximum context length is only mentioned when an instance's VRAM size limits context length. Otherwise, maximum context length is the one defined by the model. #### Model name ``` -openai/gpt-oss-120b:fp4 +qwen/qwen3-235b-a22b-thinking-2507:awq ``` ### Llama-3.3-70b-instruct @@ -430,6 +611,20 @@ Llama 3.1 was designed to match the best proprietary models and outperform many meta/llama-3.1-8b-instruct:fp8 meta/llama-3.1-8b-instruct:bf16 ``` +### Llama-3-8b-instruct +Llama-3-8b-instruct is Meta's 8B-parameter Llama 3 model, fine-tuned for instruction following and dialogue use cases.
+ +| Attribute | Value | +|-----------|-------| +| Provider | Meta | +| Compatible Instances (max context in tokens\*) - Dedicated Deployment | L4, L40S, H100, H100-2, H100-SXM-2, H100-SXM-4, H100-SXM-8 | +| Hugging Face model card | [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) | + + +#### Model name +``` +meta/llama-3-8b-instruct:bf16 +``` ### Llama-3-70b-instruct Meta’s Llama 3 is an iteration of the open-access Llama family. @@ -677,6 +872,23 @@ With Qwen2.5-coder deployed at Scaleway, your company can benefit from code gene qwen/qwen2.5-coder-32b-instruct:int8 ``` +### MiniMax-M2.5 +MiniMax-M2.5 is a state-of-the-art coding and agentic model that excels at tool use, search, and office tasks, with high efficiency and cost-effectiveness. + +| Attribute | Value | +|-----------|-------| +| Provider | MiniMaxAI | +| Supports parallel tool-calling | Yes | +| Compatible Instances (max context in tokens\*) - Dedicated Deployment | H100-SXM-4, H100-SXM-8 | +| Hugging Face model card | [lukealonso/MiniMax-M2.5-NVFP4](https://huggingface.co/lukealonso/MiniMax-M2.5-NVFP4) | + +\*Maximum context length is only mentioned when an instance's VRAM size limits context length. Otherwise, maximum context length is the one defined by the model. + +#### Model name +``` +minimaxai/minimax-m2.5:nvfp4 +``` + ## Embeddings models ### Qwen3-embedding-8b