[OpenVINO] Add support for Zyphra ZAYA by mlukasze · Pull Request #4 · mlukasze/optimum-intel

mlukasze · 2026-05-08T06:37:57Z

⚠️ AUTOMATICALLY GENERATED BY MEAT AGENT — REQUIRES HUMAN REVIEW ⚠️
This PR was created by an AI agent as part of automated model enablement.
A human maintainer must review and approve it before it can be considered for merge.
Do NOT merge without human review and sign-off.

What does this PR do?

Adds OpenVINO export and runtime support for Zyphra ZAYA (zaya), a hybrid Mamba (SSM) + attention architecture implemented in Transformers as ZayaForCausalLM.

Installation instructions

pip install git+https://github.com/mlukasze/optimum-intel.git@feat/add-zaya-support
pip install --pre -U openvino openvino-tokenizers nncf --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/nightly

Exporting cmd-line

optimum-cli export openvino -m Zyphra/ZAYA1-8B --task text-generation-with-past --weight-format int4 zaya1-8b-ov/

Inference script

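from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

# Load the OpenVINO model produced by the export command above
model = OVModelForCausalLM.from_pretrained('zaya1-8b-ov/')
# The tokenizer comes from the original Hugging Face checkpoint
tokenizer = AutoTokenizer.from_pretrained('Zyphra/ZAYA1-8B')
inputs = tokenizer('Hello, how are you?', return_tensors='pt')
# Generate up to 20 new tokens, then decode prompt + completion
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))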

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you make sure to update the documentation with your changes?
Did you write any new necessary tests?

* [OpenVINO] Use performant 3GEMM MoE for Qwen3.5
  Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Fix quantization test
  Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
---------
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>

…patch
Three fixes for ZAYA1-8B models exported with past_key_values/present naming (rather than cache_params.* naming):

1. Class dispatch fix (from_pretrained): OVModelWithMambaForCausalLM requires cache_params.* input names. When an SSM model is exported with standard past_key_values/present naming, fall back to OVModelForCausalLM, which handles that naming correctly.
2. key_value_input_names rebuild (OVBaseDecoderModel.__init__): ZAYA exports even-layer KV-cache inputs as past_key_values.X.key/value and odd-layer KV-cache inputs as present.X.key/value (the same name is used for both input and output). The standard filter only picks up past_key_values.* names, so key_value_input_names is half-empty. When there are fewer KV inputs than outputs, rebuild the list in output order so that dict(zip(key_value_input_names, past_key_values)) is correct.
3. _reshape fix (OVBaseDecoderModel._reshape): present.X.key/value inputs (4-D KV tensors) must have their sequence dimension (dim 2) made dynamic, not dim 1 (num_heads). The previous else branch incorrectly set dim 1 to -1, making num_heads dynamic and causing 'Cannot get length of dynamic dimension' during empty-cache tensor creation in prepare_inputs.
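
The three fixes above can be sketched in plain Python. This is a hypothetical illustration, not the optimum-intel implementation: the helper names pick_model_class, rebuild_kv_input_names and make_seq_dim_dynamic are invented for this sketch, shapes are plain lists with -1 standing in for a dynamic dimension, and the KV layout [batch, num_heads, seq_len, head_dim] is assumed from the description of dim 1 and dim 2.

```python
import re
from typing import List

# Hypothetical helpers; the names are illustrative, not optimum-intel APIs.


def pick_model_class(input_names: List[str]) -> str:
    """Fix 1: use the Mamba-specific class only for cache_params.* naming."""
    if any(name.startswith("cache_params.") for name in input_names):
        return "OVModelWithMambaForCausalLM"
    # past_key_values/present naming falls back to the generic decoder class.
    return "OVModelForCausalLM"


def rebuild_kv_input_names(input_names: List[str], output_names: List[str]) -> List[str]:
    """Fix 2: order KV-cache input names to match the present.* outputs."""
    # Standard filter: only past_key_values.* inputs are recognized.
    standard = [name for name in input_names if name.startswith("past_key_values")]
    kv_outputs = [name for name in output_names if name.startswith("present.")]
    if len(standard) >= len(kv_outputs):
        return standard  # Nothing is missing; keep the standard list.

    available = set(input_names)
    ordered = []
    for out_name in kv_outputs:
        match = re.fullmatch(r"present\.(\d+)\.(key|value)", out_name)
        if match is None:
            continue
        past_name = f"past_key_values.{match.group(1)}.{match.group(2)}"
        if past_name in available:
            ordered.append(past_name)
        elif out_name in available:
            # Odd layers reuse the present.X name for both input and output.
            ordered.append(out_name)
    return ordered


def make_seq_dim_dynamic(input_name: str, shape: List[int]) -> List[int]:
    """Fix 3: mark dim 2 (seq_len) of a 4-D present.* KV input as dynamic."""
    new_shape = list(shape)
    if input_name.startswith("present.") and len(shape) == 4:
        # Assumed layout [batch, num_heads, seq_len, head_dim]; num_heads stays static.
        new_shape[2] = -1
    return new_shape


if __name__ == "__main__":
    inputs = [
        "input_ids", "attention_mask",
        "past_key_values.0.key", "past_key_values.0.value",
        "present.1.key", "present.1.value",
    ]
    outputs = ["logits", "present.0.key", "present.0.value",
               "present.1.key", "present.1.value"]
    print(pick_model_class(inputs))
    print(rebuild_kv_input_names(inputs, outputs))
    print(make_seq_dim_dynamic("present.1.key", [1, 8, 0, 128]))
```

Because the rebuilt list follows the output order, zipping it with cache tensors that are also in output order pairs each tensor with the intended input name.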

…d SSM+Attention models
Fix Concat shape mismatch in stateful dynamo export for ZAYA1-8B. opset13.gather with a scalar index can return a 0-D tensor, but opset13.concat requires all inputs to have the same rank. Normalize batch with reshape(-1) before concatenation. Also fixes ZayaDummyPastKeyValuesGenerator to read conv_kernel_size from the model config instead of hardcoding 2.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
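
The Concat rank mismatch described above can be reproduced with NumPy as a stand-in for the OpenVINO graph ops. This sketch assumes np.take and np.concatenate follow the same rank rules as opset13 gather and concat for this case; the function name build_target_shape and the shape values are hypothetical.

```python
import numpy as np


def build_target_shape(shape: np.ndarray, tail: np.ndarray) -> np.ndarray:
    """Concatenate the batch dim of `shape` with the `tail` dims."""
    # Gathering with a scalar index drops the indexed axis, so `batch` is 0-D.
    batch = np.take(shape, 0)
    # reshape(-1) lifts the 0-D scalar to a 1-element 1-D tensor, matching `tail`'s rank.
    batch_1d = np.reshape(batch, -1)
    return np.concatenate([batch_1d, tail])


if __name__ == "__main__":
    shape = np.array([2, 7, 64], dtype=np.int64)  # hypothetical runtime shape
    tail = np.array([4, 16], dtype=np.int64)

    batch = np.take(shape, 0)
    print("rank of gathered batch:", np.ndim(batch))
    try:
        # Without normalization the ranks differ (0-D vs 1-D) and concat fails.
        np.concatenate([batch, tail])
    except ValueError as err:
        print("concat failed:", err)

    print(build_target_shape(shape, tail).tolist())
```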

paulinebm and others added 3 commits May 7, 2026 14:04

[CI] Bump style-bot SHA + switch to GitHub App (huggingface#1726)

0e1a9df

* [CI] Bump style-bot SHA + switch to GitHub App
* [CI] Bump style-bot to merged SHA e2867e92

Add OpenVINO support for Zaya

9c528ab

Support Zyphra ZAYA hybrid SSM+attention export and runtime in OpenVINO. Includes exporter registration, model patching, runtime cache handling, Zaya-specific tests, and docs update.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

mlukasze force-pushed the feat/add-zaya-support branch from 0937be4 to 9c528ab May 8, 2026 06:39

mlukasze and others added 8 commits May 8, 2026 08:43

Adjust Zaya docs and tests

fa1404a

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Fix ZAYA1 stateful export on Transformers 4.57.1

4202c43

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Fix ZAYA1 OpenVINO stateful export runtime on Transformers 4.57.1

c2138dc

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Fix Mamba prev_hs state export for ZAYA
Replace in-place prev_hs c…

95990a0

…opy_() with assignment+detach() so OpenVINO make_stateful can materialize the missing ReadValue/Assign pair for ZAYA Mamba state caches.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add zaya to MODEL_NAMES in utils_tests.py

f7af850

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

[EXPERIMENT][WIP][OpenVINO] Add ZAYA support for Zyphra/ZAYA1-8B (hyb…

880fe90

…rid Mamba+Attention MoE)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

mlukasze wants to merge 11 commits into main from feat/add-zaya-support
