
[OpenVINO] Add support for Zyphra ZAYA#4

Draft
mlukasze wants to merge 11 commits into main from feat/add-zaya-support

mlukasze (Owner) commented on May 8, 2026

⚠️ AUTOMATICALLY GENERATED BY MEAT AGENT — REQUIRES HUMAN REVIEW ⚠️
This PR was created by an AI agent as part of automated model enablement.
A human maintainer must review and approve it before it can be considered for merge.
Do NOT merge without human review and sign-off.

What does this PR do?

Adds OpenVINO export and runtime support for Zyphra ZAYA (model type zaya), a hybrid attention + recurrent (Mamba-style SSM) architecture implemented in Transformers as ZayaForCausalLM.

Installation instructions

pip install git+https://github.com/mlukasze/optimum-intel.git@feat/add-zaya-support
pip install --pre -U openvino openvino-tokenizers nncf --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/nightly

Exporting cmd-line

optimum-cli export openvino -m Zyphra/ZAYA1-8B --task text-generation-with-past --weight-format int4 zaya1-8b-ov/

Inference script

from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model = OVModelForCausalLM.from_pretrained('zaya1-8b-ov/')
tokenizer = AutoTokenizer.from_pretrained('Zyphra/ZAYA1-8B')
inputs = tokenizer('Hello, how are you?', return_tensors='pt')
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

paulinebm and others added 3 commits May 7, 2026 14:04
* [CI] Bump style-bot SHA + switch to GitHub App

* [CI] Bump style-bot to merged SHA e2867e92
* [OpenVINO] Use performant 3GEMM MoE for Qwen3.5

Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>

* Fix quantization test

Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>

---------

Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
Support Zyphra ZAYA hybrid SSM+attention export and runtime in OpenVINO.
Includes exporter registration, model patching, runtime cache handling,
Zaya-specific tests, and docs update.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
mlukasze force-pushed the feat/add-zaya-support branch from 0937be4 to 9c528ab on May 8, 2026 06:39
mlukasze and others added 8 commits May 8, 2026 08:43
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…opy_() with assignment+detach() so OpenVINO make_stateful can materialize the missing ReadValue/Assign pair for ZAYA Mamba state caches.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…patch

Three fixes for ZAYA1-8B models exported with past_key_values/present
naming (rather than cache_params.* naming):

1. Class dispatch fix (from_pretrained):
   OVModelWithMambaForCausalLM requires cache_params.* input names.
   When an SSM model is exported with standard past_key_values/present
   naming, fall back to OVModelForCausalLM which handles that correctly.

2. key_value_input_names rebuild (OVBaseDecoderModel.__init__):
   ZAYA exports even-layer KV-cache inputs as past_key_values.X.key/value
   and odd-layer KV-cache inputs as present.X.key/value (the same name is
   used for both the input and the output). The standard filter only picks
   up past_key_values.* names, so key_value_input_names covers only the
   even layers. When there are fewer KV inputs than KV outputs, rebuild the
   list in output order so that
   dict(zip(key_value_input_names, past_key_values)) pairs each cache
   tensor with the correct input.

3. _reshape fix (OVBaseDecoderModel._reshape):
   present.X.key/value inputs (4-D KV tensors) must have their sequence
   dimension (dim 2) made dynamic, not dim 1 (num_heads).  The previous
   else branch incorrectly set dim 1 to -1, making num_heads dynamic and
   causing 'Cannot get length of dynamic dimension' during empty-cache
   tensor creation in prepare_inputs.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
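The name handling behind fixes 1 to 3 above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the optimum-intel implementation: the helper names pick_model_class, rebuild_key_value_input_names and dynamic_kv_shape are hypothetical, the model is assumed to already be known as an SSM/hybrid model, KV outputs are assumed to be named present.X.key/value, and KV tensors are assumed to use the layout [batch, num_heads, seq_len, head_dim].

```python
# Hypothetical helpers illustrating the three fixes; not the optimum-intel code.


def pick_model_class(input_names: list[str]) -> str:
    """Fix 1 (sketch): for a model already known to be SSM/hybrid, use the
    Mamba class only when cache_params.* inputs are present."""
    if any(name.startswith("cache_params.") for name in input_names):
        return "OVModelWithMambaForCausalLM"
    # Exported with past_key_values/present naming: fall back.
    return "OVModelForCausalLM"


def rebuild_key_value_input_names(input_names: list[str], output_names: list[str]) -> list[str]:
    """Fix 2 (sketch): recover one KV input name per KV output, in output order."""
    kv_inputs = [n for n in input_names if n.startswith("past_key_values")]
    kv_outputs = [n for n in output_names if n.startswith("present")]
    if len(kv_inputs) >= len(kv_outputs):
        return kv_inputs  # standard filter already found every KV input
    available = set(input_names)
    rebuilt = []
    for out_name in kv_outputs:
        if out_name in available:
            # Odd layers: the input reuses the output's present.X.* name.
            rebuilt.append(out_name)
        else:
            # Even layers: the input uses past_key_values.X.* naming.
            rebuilt.append(out_name.replace("present", "past_key_values", 1))
    return rebuilt


def dynamic_kv_shape(static_shape: list[int]) -> list[int]:
    """Fix 3 (sketch): for a 4-D KV tensor [batch, num_heads, seq_len, head_dim],
    mark batch and seq_len dynamic (-1) while keeping num_heads static."""
    shape = list(static_shape)
    shape[0] = -1  # batch (assumed dynamic, as for other inputs)
    shape[2] = -1  # sequence length, not dim 1 (num_heads)
    return shape
```

With the rebuilt list, zip(key_value_input_names, past_key_values) lines up layer by layer, because both sequences now follow the order of the model's present.* outputs.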
…rid Mamba+Attention MoE)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…d SSM+Attention models

Fix Concat shape mismatch in stateful dynamo export for ZAYA1-8B.
opset13.gather with a scalar index returns a 0-D tensor, but opset13.concat
requires all inputs to have the same rank. Normalize batch with reshape(-1)
before concatenation.
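The rank mismatch described above can be reproduced with a NumPy analogy. This sketch does not call the OpenVINO API: it assumes np.take stands in for opset13.gather and np.concatenate for opset13.concat, and the input shape and state dimensions below are hypothetical.

```python
import numpy as np

# Shape of a hypothetical input tensor: [batch, seq_len, hidden].
input_shape = np.array([2, 7, 64])

# Gathering with a scalar index drops a dimension, so the result is 0-D.
batch = np.take(input_shape, 0)
print(np.ndim(batch))  # 0

# Remaining static dims of a hypothetical state cache.
state_dims = np.array([16, 4])

try:
    np.concatenate([batch, state_dims])  # 0-D and 1-D inputs: rank mismatch
except ValueError as err:
    print("concat failed:", err)

# Normalizing to 1-D with reshape(-1) makes the ranks agree.
batch_1d = np.reshape(batch, -1)
state_shape = np.concatenate([batch_1d, state_dims])
print(state_shape.tolist())  # [2, 16, 4]
```

Reshaping to [-1] works whether the gathered value is 0-D or already 1-D, so the same code path is safe for both cases.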

Also fixes ZayaDummyPastKeyValuesGenerator to read conv_kernel_size from
model config instead of hardcoding 2.
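A minimal sketch of reading the kernel size from the config, assuming a Mamba-style conv-state layout [batch, channels, conv_kernel_size]. The function dummy_conv_state_shape and its channels parameter are hypothetical, and the real ZayaDummyPastKeyValuesGenerator may lay the tensor out differently.

```python
from types import SimpleNamespace


def dummy_conv_state_shape(config, batch_size: int, channels: int) -> tuple[int, int, int]:
    """Hypothetical helper: shape of a dummy conv-state tensor used during export.

    The kernel size is read from the model config instead of being hardcoded
    to 2, so the dummy input matches the model's configured convolution kernel.
    """
    kernel_size = config.conv_kernel_size  # previously hardcoded to 2
    return (batch_size, channels, kernel_size)


# Hypothetical config values, for illustration only.
config = SimpleNamespace(conv_kernel_size=4)
print(dummy_conv_state_shape(config, batch_size=2, channels=32))  # (2, 32, 4)
```

With a hardcoded 2, a model configured with a larger kernel would get a dummy state whose last dimension does not match the real cache, which is the mismatch the fix avoids.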

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>