
Add HY-OmniWeaving support for HunyuanVideo 1.5#13289

Open
ifilipis wants to merge 1 commit into Comfy-Org:master from ifilipis:OmniWeaving

Conversation


ifilipis commented Apr 4, 2026

https://huggingface.co/tencent/HY-OmniWeaving

Repackaged models:
https://huggingface.co/vafipas663/HY-OmniWeaving_repackaged/tree/main/split_files

  • Transformer is a finetune
  • CLIP is a finetune
  • VAE is original
  • Upsamplers are original

Tested with their model and encoder as-is.

Workflow:
[workflow screenshot: omniweave_t2v_test_00015_]


coderabbitai bot commented Apr 4, 2026

📝 Walkthrough


This pull request adds support for HunyuanVideo 1.5 "Omni" models by extending text encoder detection and checkpoint handling for Qwen2.5-VL encoders, adding attention tensor format conversion for HY-OmniWeave checkpoints, and introducing three new conditioning nodes. The changes include model detection logic updates, checkpoint key normalization, attention tensor merging for split Q/K/V formats, and UI/API extensions to expose the new hunyuan_video_15 CLIP type alongside new nodes for encoding text, concatenating vision outputs, and generating task-specific conditioning latents.
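The split-Q/K/V merge mentioned in the walkthrough can be sketched as follows. This is a hypothetical illustration, not the PR's actual code: checkpoints that store separate `q_proj`/`k_proj`/`v_proj` weights are fused into the single `qkv` tensor that the fused-attention layout expects (the `merge_qkv` helper and key names here are assumptions for the sketch).

```python
import torch

def merge_qkv(sd, prefix):
    """Fuse split q/k/v projection weights into one qkv tensor (illustrative)."""
    names = [f"{prefix}.{p}_proj.weight" for p in ("q", "k", "v")]
    if all(n in sd for n in names):
        # Concatenate along the output dimension, in q, k, v order.
        sd[f"{prefix}.qkv.weight"] = torch.cat([sd.pop(n) for n in names], dim=0)
    return sd

# Toy state dict with three split 8x8 projections.
sd = {
    "blocks.0.attn.q_proj.weight": torch.zeros(8, 8),
    "blocks.0.attn.k_proj.weight": torch.zeros(8, 8),
    "blocks.0.attn.v_proj.weight": torch.zeros(8, 8),
}
sd = merge_qkv(sd, "blocks.0.attn")
print(sd["blocks.0.attn.qkv.weight"].shape)  # torch.Size([24, 8])
```

The same pass leaves checkpoints that already ship a fused `qkv` tensor untouched, since the guard only fires when all three split keys are present.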

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning. Docstring coverage is 4.35%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Title check — ✅ Passed. The title accurately describes the main purpose of the PR: adding HY-OmniWeaving support to HunyuanVideo 1.5.
  • Description check — ✅ Passed. The description provides relevant context about the HY-OmniWeaving model, links to resources, testing information, and a workflow screenshot demonstrating the implementation.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@comfy_extras/nodes_hunyuan.py`:
- Around line 528-535: The omni_mask can exceed 1.0 (e.g., omni_mask[ref_idx]
becomes 2.0), which makes concat_mask negative after computing 1.0 - omni_mask;
clamp omni_mask to the [0,1] range before inverting so concat_mask remains a
proper 0/1 mask. Update the code that computes concat_mask (and/or immediately
before it) to use a clamped version of omni_mask (e.g., torch.clamp(omni_mask,
0.0, 1.0)) when computing 1.0 - omni_mask, referencing omni_mask, concat_mask,
cond_latent, latent_length and the preceding logic that modifies omni_mask
(including _encode_single_image/reference_images handling).

In `@comfy/sd.py`:
- Around line 1270-1276: detect_te_model() accepts checkpoints keyed under
model.language_model.* for both QWEN25_3B and QWEN25_7B, but the 3B loading path
calls omnigen2.te() with the raw sd (whereas the 7B path normalizes prefixes
before loading), which risks silent weight-dropping by
transformer.load_state_dict in SDClipModel.load_sd(); update the 3B branch to
perform the same key-prefix normalization as the 7B loader before calling
omnigen2.te() (i.e., rewrite keys from the model.language_model.* layout to the
expected model.* layout), or alternatively restrict detect_te_model() to only
detect the 7B layout—prefer the former and apply the same prefix-rewrite logic
where the 3B omnigen2.te(...) invocation occurs so the state dict keys match the
model expected by transformer.load_state_dict.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 07947a53-d006-487c-a7ba-e3c834765b33

📥 Commits

Reviewing files that changed from the base of the PR and between 13917b3 and 6447250.

📒 Files selected for processing (3)
  • comfy/sd.py
  • comfy_extras/nodes_hunyuan.py
  • nodes.py

Comment on lines +528 to +535
            encoded_ref = cls._encode_single_image(vae, reference_images[:1], width, height)
            ref_idx = 1 if latent_length > 1 else 0
            cond_latent[:, :, ref_idx:ref_idx + 1] += encoded_ref[:, :, :1]
            omni_mask[ref_idx] += 1.0

        cond_latent = comfy.utils.resize_to_batch_size(cond_latent, batch_size)
        # BaseModel/HunyuanVideo15 inverts concat_mask (mask = 1 - concat_mask), so pass the pre-inverted mask.
        concat_mask = (1.0 - omni_mask).view(1, 1, latent_length, 1, 1).expand(cond_latent.shape[0], 1, latent_length, cond_latent.shape[-2], cond_latent.shape[-1]).to(cond_latent.dtype)

⚠️ Potential issue | 🟠 Major

Clamp the TiV2V mask before inverting it.

Line 531 increments a slot that is already set to 1.0 by the conditioned-video branch, so omni_mask[ref_idx] becomes 2.0. After the 1.0 - omni_mask transform on Line 535, the TiV2V path sends -1.0 in concat_mask for that frame, which breaks the 0/1 mask semantics used by the other tasks.

Proposed fix
             encoded_ref = cls._encode_single_image(vae, reference_images[:1], width, height)
             ref_idx = 1 if latent_length > 1 else 0
             cond_latent[:, :, ref_idx:ref_idx + 1] += encoded_ref[:, :, :1]
-            omni_mask[ref_idx] += 1.0
+            omni_mask[ref_idx] = 1.0

         cond_latent = comfy.utils.resize_to_batch_size(cond_latent, batch_size)
+        omni_mask = omni_mask.clamp_(0.0, 1.0)
         # BaseModel/HunyuanVideo15 inverts concat_mask (mask = 1 - concat_mask), so pass the pre-inverted mask.
         concat_mask = (1.0 - omni_mask).view(1, 1, latent_length, 1, 1).expand(cond_latent.shape[0], 1, latent_length, cond_latent.shape[-2], cond_latent.shape[-1]).to(cond_latent.dtype)
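The failure mode and the effect of the clamp can be reproduced in isolation (a standalone toy sketch; `omni_mask` here is just a one-element tensor, not the node's actual state):

```python
import torch

# When latent_length == 1, both the conditioned-video branch and the
# TiV2V reference image write into the same slot, so the value reaches 2.0:
omni_mask = torch.zeros(1)
omni_mask[0] = 1.0   # conditioned-video branch marks the frame
omni_mask[0] += 1.0  # reference image lands on the same slot

print(1.0 - omni_mask)                  # tensor([-1.]) -- breaks 0/1 mask semantics
print(1.0 - omni_mask.clamp(0.0, 1.0))  # tensor([0.]) -- clamped, mask stays valid
```

Setting the slot with `omni_mask[ref_idx] = 1.0` instead of `+= 1.0` fixes the specific collision, while the clamp before inversion guards the invariant regardless of how many branches touch a slot.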
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@comfy_extras/nodes_hunyuan.py` around lines 528 - 535, The omni_mask can
exceed 1.0 (e.g., omni_mask[ref_idx] becomes 2.0), which makes concat_mask
negative after computing 1.0 - omni_mask; clamp omni_mask to the [0,1] range
before inverting so concat_mask remains a proper 0/1 mask. Update the code that
computes concat_mask (and/or immediately before it) to use a clamped version of
omni_mask (e.g., torch.clamp(omni_mask, 0.0, 1.0)) when computing 1.0 -
omni_mask, referencing omni_mask, concat_mask, cond_latent, latent_length and
the preceding logic that modifies omni_mask (including
_encode_single_image/reference_images handling).

Comment on lines +1270 to +1276
    # Qwen-VL checkpoints can be saved under model.language_model.* (e.g. HY-OmniWeave text encoder).
    if 'model.language_model.layers.0.self_attn.k_proj.bias' in sd:
        weight = sd['model.language_model.layers.0.self_attn.k_proj.bias']
        if weight.shape[0] == 256:
            return TEModel.QWEN25_3B
        if weight.shape[0] == 512:
            return TEModel.QWEN25_7B
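The `k_proj.bias` length works as a model fingerprint because under grouped-query attention it equals num_key_value_heads × head_dim. Assuming the published Qwen2.5 configs (2 KV heads for the 3B variant, 4 for the 7B, head_dim 128 for both), that yields 256 vs 512; a sketch of the mapping (function and table names are illustrative, not ComfyUI API):

```python
# GQA: k_proj output dim = num_key_value_heads * head_dim
QWEN25_CONFIGS = {
    # model: (num_key_value_heads, head_dim) -- assumed from published configs
    "QWEN25_3B": (2, 128),
    "QWEN25_7B": (4, 128),
}

def te_model_from_kproj_bias(length):
    """Map a k_proj.bias length back to the text-encoder variant (illustrative)."""
    for name, (kv_heads, head_dim) in QWEN25_CONFIGS.items():
        if kv_heads * head_dim == length:
            return name
    return None

print(te_model_from_kproj_bias(256))  # QWEN25_3B
print(te_model_from_kproj_bias(512))  # QWEN25_7B
```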

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

files=$(fd -a '(omnigen2|qwen_image|hunyuan_image|hunyuan_video)\.py$' .)

printf 'Inspecting wrapper files:\n%s\n\n' "$files"

for f in $files; do
  echo "== $f =="
  rg -n -C2 'model\.language_model|model\.layers\.0\.self_attn\.k_proj|state_dict_prefix_replace|def llama_detect|load_sd|load_state_dict' "$f"
  echo
done

Repository: Comfy-Org/ComfyUI

Length of output: 407


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Examine the critical line ranges in comfy/sd.py
echo "=== Lines 1270-1280 (detect_te_model QWEN25 detection) ==="
sed -n '1270,1280p' comfy/sd.py

echo ""
echo "=== Lines 1420-1445 (QWEN25_7B and prefix normalization) ==="
sed -n '1420,1445p' comfy/sd.py

echo ""
echo "=== Lines 1400-1420 (Context before QWEN25_7B) ==="
sed -n '1400,1420p' comfy/sd.py

echo ""
echo "=== Search for QWEN25_3B handling ==="
rg -n -A8 'QWEN25_3B' comfy/sd.py | head -50

Repository: Comfy-Org/ComfyUI

Length of output: 6141


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== omnigen2.te() and llama_detect definition ==="
rg -n -A15 'def te\(' comfy/text_encoders/omnigen2.py | head -60

echo ""
echo "=== llama_detect in comfy/sd.py to see what keys it expects ==="
rg -n -B2 -A20 'def llama_detect' comfy/sd.py

echo ""
echo "=== Check if omnigen2.te handles state_dict_prefix_replace or similar ==="
rg -n 'state_dict_prefix_replace|model\.language_model|model\.visual' comfy/text_encoders/omnigen2.py

Repository: Comfy-Org/ComfyUI

Length of output: 1642


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Omnigen2Model class and __init__ ==="
rg -n -B2 -A30 'class Omnigen2Model' comfy/text_encoders/omnigen2.py | head -80

echo ""
echo "=== Check if Omnigen2Model or parent does any prefix rewriting ==="
rg -n 'state_dict_prefix_replace|model\.language_model|load_state_dict' comfy/text_encoders/omnigen2.py

echo ""
echo "=== Check parent class if Omnigen2Model inherits ==="
rg -n 'import.*Omnigen|from.*omnigen' comfy/text_encoders/omnigen2.py | head -10

Repository: Comfy-Org/ComfyUI

Length of output: 1022


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Check SD1ClipModel ==="
rg -n -B2 -A20 'class SD1ClipModel' comfy/text_encoders/sd1_clip.py | head -60

echo ""
echo "=== Check Qwen25_3BModel ==="
fd 'qwen.*\.py$' --type f | xargs rg -l 'Qwen25_3BModel' 

echo ""
echo "=== Search for Qwen25_3BModel definition ==="
rg -n -B2 -A25 'class Qwen25_3BModel' comfy/

Repository: Comfy-Org/ComfyUI

Length of output: 176


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Find SD1ClipModel definition ==="
rg -n 'class SD1ClipModel' comfy/

echo ""
echo "=== Find where sd1_clip is imported from ==="
rg -n 'from.*sd1_clip|import.*sd1_clip' comfy/

echo ""
echo "=== Find Qwen25_3BModel ==="
rg -n 'class Qwen25_3BModel' comfy/

Repository: Comfy-Org/ComfyUI

Length of output: 2090


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Qwen25_3BModel definition ==="
sed -n '25,50p' comfy/text_encoders/omnigen2.py

echo ""
echo "=== SDClipModel and SD1ClipModel load_sd or __init__ ==="
sed -n '710,800p' comfy/sd1_clip.py

echo ""
echo "=== Search for load_state_dict in SD1ClipModel area ==="
rg -n -B5 -A15 'def load_state_dict|state_dict_prefix' comfy/sd1_clip.py | head -100

Repository: Comfy-Org/ComfyUI

Length of output: 3202


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Search for SDClipModel definition ==="
rg -n 'class SDClipModel' comfy/

echo ""
echo "=== Search for Qwen25_3B in llama.py ==="
rg -n -B2 -A30 'class Qwen25_3B' comfy/text_encoders/llama.py | head -80

echo ""
echo "=== Check if SDClipModel has load_sd or transformers loading ==="
rg -n -B5 -A20 'class SDClipModel' comfy/sd1_clip.py | head -80

Repository: Comfy-Org/ComfyUI

Length of output: 3913


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Check SDClipModel.load_sd implementation ==="
rg -n -B3 -A25 'def load_sd' comfy/sd1_clip.py | head -80

echo ""
echo "=== Check if Qwen25_3B or Llama2_ handle state dict prefix flexibly ==="
rg -n 'def forward|def load_state_dict|state_dict' comfy/text_encoders/llama.py | head -30

echo ""
echo "=== Verify: check what keys Llama2_ expects by looking at model structure ==="
sed -n '300,400p' comfy/text_encoders/llama.py | head -60

Repository: Comfy-Org/ComfyUI

Length of output: 4758


Check whether the 3B path needs the same prefix normalization as the 7B path.

detect_te_model() now accepts model.language_model.* prefixed layouts for both 256-dim (3B) and 512-dim (7B) models (lines 1271–1276). However, the 3B loader at line 1425 passes the state dict directly to omnigen2.te() with no key rewriting, while the 7B loader at lines 1431–1440 normalizes the prefixes before loading.

Since SDClipModel.load_sd() calls transformer.load_state_dict(sd, strict=False), PyTorch will silently ignore the mismatched keys model.language_model.* when the model expects model.layers.*. The checkpoint will appear supported but fail to load any weights.

The 3B path should either rewrite the keys the same way as 7B, or the detection should be scoped to only the 7B branch.
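The suggested rewrite is a pure key-renaming pass over the state dict. A minimal sketch, mimicking what a helper like comfy.utils.state_dict_prefix_replace does (this standalone version is illustrative, not the library's actual implementation):

```python
def state_dict_prefix_replace(sd, replace_map):
    """Rewrite key prefixes so the dict matches the layout the model expects."""
    out = {}
    for k, v in sd.items():
        for old, new in replace_map.items():
            if k.startswith(old):
                k = new + k[len(old):]
                break
        out[k] = v
    return out

# Rewrite the HY-OmniWeave layout to the layout the Qwen 3B transformer expects.
sd = {"model.language_model.layers.0.self_attn.k_proj.bias": "weights"}
sd = state_dict_prefix_replace(sd, {"model.language_model.": "model."})
print(list(sd))  # ['model.layers.0.self_attn.k_proj.bias']
```

Because `load_state_dict(sd, strict=False)` silently drops unmatched keys, applying this rewrite before the 3B `omnigen2.te()` call is what turns a silently-empty load into a real one.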

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@comfy/sd.py` around lines 1270 - 1276, detect_te_model() accepts checkpoints
keyed under model.language_model.* for both QWEN25_3B and QWEN25_7B, but the 3B
loading path calls omnigen2.te() with the raw sd (whereas the 7B path normalizes
prefixes before loading), which risks silent weight-dropping by
transformer.load_state_dict in SDClipModel.load_sd(); update the 3B branch to
perform the same key-prefix normalization as the 7B loader before calling
omnigen2.te() (i.e., rewrite keys from the model.language_model.* layout to the
expected model.* layout), or alternatively restrict detect_te_model() to only
detect the 7B layout—prefer the former and apply the same prefix-rewrite logic
where the 3B omnigen2.te(...) invocation occurs so the state dict keys match the
model expected by transformer.load_state_dict.
