[EXPERIMENT][WIP][OpenVINO] Add support for nvidia/Nemotron-Labs-Diffusion-VLM-8B #7

Open
mlukasze wants to merge 2 commits into main from enable/nvidia-nemotron-labs-diffusion-vlm-8b

@mlukasze (Owner)

⚠️ AUTOMATICALLY GENERATED BY OMEGA AGENT — REQUIRES HUMAN REVIEW ⚠️
This PR was produced from a local validation workflow and still needs maintainer review before merge.

What does this PR do?

  • adds OpenVINO export support for nvidia/Nemotron-Labs-Diffusion-VLM-8B (nemotron_labs_diffusion_vlm)
  • wires the VLM into the multi-component export path (vision embeddings, text embeddings, language model)
  • adds loader/runtime mapping so OVModelForVisualCausalLM.from_pretrained(..., trust_remote_code=True) can load the exported IR
  • documents the required CLI flow and known model-specific limitations
  • adds a lightweight registration/unit test for the new export config
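The multi-component layout described above can be checked after export with a small helper. This is a minimal sketch, not code from the PR: `missing_components` is a hypothetical name, and only `openvino_language_model.xml` is confirmed by this PR's inference script. The two embeddings file names are assumptions and should be adjusted to whatever the export actually writes.

```python
from pathlib import Path

# Only openvino_language_model.xml appears in this PR.
# The two embeddings file names below are ASSUMPTIONS for illustration.
EXPECTED_COMPONENTS = (
    "openvino_vision_embeddings_model.xml",
    "openvino_text_embeddings_model.xml",
    "openvino_language_model.xml",
)


def missing_components(model_dir, expected=EXPECTED_COMPONENTS):
    """Return the expected IR files that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in expected if not (root / name).is_file()]
```

An empty list from `missing_components("ov_model")` would suggest that all three components were written. It does not say anything about whether the IR is numerically correct.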

Installation instructions

pip install -U "transformers>=5.0.0" openvino
pip install -e ".[openvino]"

Export command line

optimum-cli export openvino \
  --model nvidia/Nemotron-Labs-Diffusion-VLM-8B \
  --task image-text-to-text \
  --weight-format int4 \
  --trust-remote-code \
  ov_model
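For scripted runs, the same command can be assembled as an argument list in Python. This is a sketch that only mirrors the flags shown above; `build_export_command` is a hypothetical helper, not part of optimum-intel.

```python
import shlex


def build_export_command(model_id, output_dir, weight_format="int4"):
    """Build the optimum-cli argument list using the flags from this PR."""
    return [
        "optimum-cli", "export", "openvino",
        "--model", model_id,
        "--task", "image-text-to-text",
        "--weight-format", weight_format,
        "--trust-remote-code",
        output_dir,
    ]


cmd = build_export_command("nvidia/Nemotron-Labs-Diffusion-VLM-8B", "ov_model")
# Print the command as a single shell-safe line for logging.
print(shlex.join(cmd))
# To actually run the export: subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting issues, and `check=True` makes a failed export raise instead of passing silently.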

Inference script

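import openvino as ov
from optimum.intel import OVModelForVisualCausalLM

model_dir = "ov_model"
# trust_remote_code=True is required for config/processor/chat-template handling.
model = OVModelForVisualCausalLM.from_pretrained(model_dir, trust_remote_code=True)

# Separately compile the language-model IR to confirm it loads on the target device.
# "GPU.1" is the second GPU; pick a name from core.available_devices on other machines.
core = ov.Core()
compiled = core.compile_model(f"{model_dir}/openvino_language_model.xml", "GPU.1")
print(type(model).__name__, compiled)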
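The script above hardcodes "GPU.1", which only exists on machines with a second GPU. A minimal sketch of a fallback follows; `pick_device` and its preference order are hypothetical and not part of this PR. The list of available names would come from `ov.Core().available_devices`.

```python
def pick_device(available, preferred=("GPU.1", "GPU.0", "GPU", "CPU")):
    """Return the first preferred device name that is present in `available`."""
    for name in preferred:
        if name in available:
            return name
    raise RuntimeError(f"none of {preferred} found in {list(available)}")


# Intended use with OpenVINO (left commented so the sketch runs without it):
# import openvino as ov
# core = ov.Core()
# device = pick_device(core.available_devices)
# compiled = core.compile_model("ov_model/openvino_language_model.xml", device)
```

The preference order is a design choice for this illustration: it tries the device validated in this PR first and ends with CPU.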

Testing

  • Unit tests: tests/openvino/test_nemotron_labs_diffusion_vlm.py
  • Docs updated: docs/source/openvino/export.mdx, docs/source/openvino/models.mdx
  • Local validation: exported IR, CPU load OK, GPU.0 compile OK, GPU.1 compile/load OK
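The per-device compile checks listed under local validation could be repeated with a loop like the one below. This is a sketch, not the workflow that produced the results above: `check_devices` is a hypothetical helper, and the compile step is passed in as a callable so the loop itself does not depend on OpenVINO.

```python
def check_devices(compile_fn, devices):
    """Call compile_fn(device) for each device and record OK or the error text."""
    results = {}
    for device in devices:
        try:
            compile_fn(device)
            results[device] = "OK"
        except Exception as exc:  # report the failure instead of aborting the loop
            results[device] = f"FAILED: {exc}"
    return results


# Intended use with OpenVINO (left commented so the sketch runs without it):
# import openvino as ov
# core = ov.Core()
# ir_path = "ov_model/openvino_language_model.xml"
# print(check_devices(lambda d: core.compile_model(ir_path, d), ["CPU", "GPU.0", "GPU.1"]))
```

A successful compile only shows that the IR loads on that device. Output quality still needs a separate check.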

Known limitations

  • the backbone uses bidirectional diffusion (dlm_paradigm=bidirectional) rather than standard autoregressive decoding
  • the model runs with use_cache=False (no KV cache), so generation does not follow the standard cached autoregressive path and remains model-specific
  • trust_remote_code=True is required for config/processor/chat-template handling
  • export emits tracer warnings from remote code; the produced IR should be validated on target devices
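To make the first two limitations concrete, the sketch below contrasts a causal attention mask with a bidirectional one. It is a generic illustration in plain Python, not the model's remote code; `attention_mask` is a hypothetical helper.

```python
def attention_mask(seq_len, causal):
    """Return a seq_len x seq_len boolean mask; True means position i may attend to j."""
    return [
        [(not causal) or j <= i for j in range(seq_len)]
        for i in range(seq_len)
    ]


for row in attention_mask(3, causal=True):
    print(row)   # lower-triangular: each position sees only itself and earlier ones
for row in attention_mask(3, causal=False):
    print(row)   # all True: every position sees the whole sequence
```

With the causal mask, row i depends only on positions up to i, which is what makes key/value caching straightforward in autoregressive decoding. With the bidirectional mask, every row depends on all positions.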

omega-intel and others added 2 commits May 26, 2026 19:26
…s_diffusion)

- Register NemotronOpenVINOConfig for nemotron_labs_diffusion
- Load remote-code checkpoint via AutoModel during OV export
- Tested export and OV generation on CPU

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…usion-VLM-8B

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@mlukasze added the experimental (Experimental work in progress) and do-not-merge (Do not merge yet) labels on May 27, 2026

2 participants