HuggingFaceMultiModal version mismatch #6

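The latest HuggingFace MultiModal fails to load a model, apparently because of a version incompatibility with either the llama core or the HuggingFace LLM base model. As the traceback below shows, the `HuggingFaceLLM` base class calls `AutoModelForCausalLM.from_pretrained`, which does not recognize `Qwen2VLConfig`.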

```python
from llama_index.multi_modal_llms.huggingface import HuggingFaceMultiModal

mm_llm = HuggingFaceMultiModal.from_model_name("Qwen/Qwen2-VL-2B-Instruct")
```

```
Traceback (most recent call last):
  File "/workspaces/raggify/temp/test.py", line 4, in <module>
    mm_llm = HuggingFaceMultiModal.from_model_name("Qwen/Qwen2-VL-2B-Instruct")
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspaces/raggify/.venv/lib/python3.12/site-packages/llama_index/multi_modal_llms/huggingface/base.py", line 255, in from_model_name
    return Qwen2VisionMultiModal(model_name=model_name, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspaces/raggify/.venv/lib/python3.12/site-packages/llama_index/multi_modal_llms/huggingface/base.py", line 96, in __init__
    super().__init__(**kwargs)
  File "/workspaces/raggify/.venv/lib/python3.12/site-packages/llama_index/llms/huggingface/base.py", line 212, in __init__
    model = model or AutoModelForCausalLM.from_pretrained(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspaces/raggify/.venv/lib/python3.12/site-packages/transformers/models/auto/auto_factory.py", line 607, in from_pretrained
    raise ValueError(
ValueError: Unrecognized configuration class <class 'transformers.models.qwen2_vl.configuration_qwen2_vl.Qwen2VLConfig'> for this kind of AutoModel: AutoModelForCausalLM.
Model type should be one of ApertusConfig, ArceeConfig, AriaTextConfig, BambaConfig, BartConfig, BertConfig, BertGenerationConfig, BigBirdConfig, BigBirdPegasusConfig, BioGptConfig, BitNetConfig, BlenderbotConfig, BlenderbotSmallConfig, BloomConfig, BltConfig, CamembertConfig, LlamaConfig, CodeGenConfig, CohereConfig, Cohere2Config, CpmAntConfig, CTRLConfig, Data2VecTextConfig, DbrxConfig, DeepseekV2Config, DeepseekV3Config, DiffLlamaConfig, DogeConfig, Dots1Config, ElectraConfig, Emu3Config, ErnieConfig, Ernie4_5Config, Ernie4_5_MoeConfig, Exaone4Config, FalconConfig, FalconH1Config, FalconMambaConfig, FlexOlmoConfig, FuyuConfig, GemmaConfig, Gemma2Config, Gemma3Config, Gemma3TextConfig, Gemma3nConfig, Gemma3nTextConfig, GitConfig, GlmConfig, Glm4Config, Glm4MoeConfig, GotOcr2Config, GPT2Config, GPT2Config, GPTBigCodeConfig, GPTNeoConfig, GPTNeoXConfig, GPTNeoXJapaneseConfig, GptOssConfig, GPTJConfig, GraniteConfig, GraniteMoeConfig, GraniteMoeHybridConfig, GraniteMoeSharedConfig, HeliumConfig, HunYuanDenseV1Config, HunYuanMoEV1Config, JambaConfig, JetMoeConfig, Lfm2Config, LlamaConfig, Llama4Config, Llama4TextConfig, LongcatFlashConfig, MambaConfig, Mamba2Config, MarianConfig, MBartConfig, MegaConfig, MegatronBertConfig, MiniMaxConfig, MinistralConfig, MistralConfig, MixtralConfig, MllamaConfig, ModernBertDecoderConfig, MoshiConfig, MptConfig, MusicgenConfig, MusicgenMelodyConfig, MvpConfig, NemotronConfig, OlmoConfig, Olmo2Config, Olmo3Config, OlmoeConfig, OpenLlamaConfig, OpenAIGPTConfig, OPTConfig, PegasusConfig, PersimmonConfig, PhiConfig, Phi3Config, Phi4MultimodalConfig, PhimoeConfig, PLBartConfig, ProphetNetConfig, QDQBertConfig, Qwen2Config, Qwen2MoeConfig, Qwen3Config, Qwen3MoeConfig, Qwen3NextConfig, RecurrentGemmaConfig, ReformerConfig, RemBertConfig, RobertaConfig, RobertaPreLayerNormConfig, RoCBertConfig, RoFormerConfig, RwkvConfig, SeedOssConfig, SmolLM3Config, Speech2Text2Config, StableLmConfig, Starcoder2Config, TransfoXLConfig, TrOCRConfig, 
VaultGemmaConfig, WhisperConfig, XGLMConfig, XLMConfig, XLMProphetNetConfig, XLMRobertaConfig, XLMRobertaXLConfig, XLNetConfig, xLSTMConfig, XmodConfig, ZambaConfig, Zamba2Config.
```
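Because the failure looks like a mismatch between packages, it may help to record which versions are actually installed before changing anything. The following is a minimal sketch using only the standard library. The distribution names in `PACKAGES` are assumptions inferred from the import paths in the traceback and may need adjusting; `installed_version` and `report` are hypothetical helper names.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed distribution names (inferred from the import paths, not confirmed);
# adjust them to match your environment.
PACKAGES = [
    "llama-index-core",
    "llama-index-llms-huggingface",
    "llama-index-multi-modal-llms-huggingface",
    "transformers",
    "torch",
    "torchvision",
]


def installed_version(name: str) -> str:
    """Return the installed version of a distribution, or a placeholder."""
    try:
        return version(name)
    except PackageNotFoundError:
        return "not installed"


def report(packages=PACKAGES) -> dict:
    """Map each distribution name to its installed version string."""
    return {name: installed_version(name) for name in packages}


if __name__ == "__main__":
    for name, ver in report().items():
        print(f"{name}: {ver}")
```

Attaching this output to a report makes it easier to see which combination of packages triggers the error.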

Downgrading HuggingFace MultiModal to version 0.4.2 resolved the error, but it also downgraded the dependent torch library to version 2.4.1 (torchvision 0.19.1).

In this state, using HuggingFace LLM for text summarization fails because the downgraded torch is older than the v2.6 minimum that is now required to call `torch.load`.

```python
from llama_index.llms.huggingface import HuggingFaceLLM

llm = HuggingFaceLLM(model_name="StabilityAI/stablelm-tuned-alpha-3b")
```

```
ValueError: Due to a serious vulnerability issue in torch.load, even with weights_only=True, we now require users to upgrade torch to at least v2.6 in order to use the function. This version restriction does not apply when loading files with safetensors.
See the vulnerability report here https://nvd.nist.gov/vuln/detail/CVE-2025-32434
```
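The error states that torch must be at least v2.6 unless the weights are loaded from safetensors. A small guard such as the sketch below could surface this condition before a model load is attempted. It is not part of any library: `parse_major_minor` and `torch_is_new_enough` are hypothetical helpers, and the `(2, 6)` minimum is taken from the error message above.

```python
import re

# Minimum torch version quoted in the error message above.
MIN_TORCH = (2, 6)


def parse_major_minor(version_string: str) -> tuple:
    """Extract (major, minor) from a version such as '2.4.1+cu121'."""
    match = re.match(r"(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version_string!r}")
    return int(match.group(1)), int(match.group(2))


def torch_is_new_enough(version_string: str, minimum: tuple = MIN_TORCH) -> bool:
    """Return True if the version meets the minimum (major, minor)."""
    return parse_major_minor(version_string) >= minimum


if __name__ == "__main__":
    print(torch_is_new_enough("2.4.1"))        # False
    print(torch_is_new_enough("2.6.0+cu124"))  # True
    try:
        import torch  # only checked if torch is installed

        print(torch.__version__, torch_is_new_enough(torch.__version__))
    except ImportError:
        print("torch is not installed")
```

With the torch 2.4.1 that the downgrade installed, the check returns `False`, which matches the failure shown above.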

With no other option, I gave up on stablelm and used Qwen for text summarization as well.
