add fix for qwen35 mtp layer name by artem-osmosis · Pull Request #2 · radixark/Megatron-Bridge

artem-osmosis · 2026-05-11T18:40:28Z

When we load Qwen3.5 into miles using Megatron-Bridge, the MTP modules are occasionally (specifically for the MoE models) loaded under names such as language_model.mtp.layers.0.transformer_layer.mlp.router.weight rather than language_model.mtp.layers.0.mtp_model_layer.mlp.router.weight. This change handles both naming schemes.
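To illustrate what handling both cases could look like, here is a minimal sketch that tries a parameter name under either infix. The two infixes (`transformer_layer`, `mtp_model_layer`) come from the description above; the helper names `mtp_name_candidates` and `resolve_mtp_name` are hypothetical and are not taken from the actual patch.

```python
from typing import Iterable, List, Optional

# The two infixes come from the PR description; the helper names are hypothetical.
MTP_INFIXES = ("transformer_layer", "mtp_model_layer")


def mtp_name_candidates(param_name: str) -> List[str]:
    """Return the name as given, followed by its spelling under the other infix."""
    candidates = [param_name]
    if ".mtp.layers." not in param_name:
        return candidates  # not an MTP parameter, nothing to rewrite
    for old in MTP_INFIXES:
        marker = "." + old + "."
        if marker in param_name:
            for new in MTP_INFIXES:
                if new != old:
                    candidates.append(param_name.replace(marker, "." + new + ".", 1))
    return candidates


def resolve_mtp_name(param_name: str, available: Iterable[str]) -> Optional[str]:
    """Pick whichever spelling is actually present, or None if neither is."""
    known = set(available)
    for candidate in mtp_name_candidates(param_name):
        if candidate in known:
            return candidate
    return None
```

A lookup written this way does not need to know in advance which spelling a given checkpoint uses; it only rewrites names that sit under `.mtp.layers.`.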

Guard EnergonProvider import with try/except so megatron-bridge works when megatron.energon is not installed.

Co-Authored-By: Yusheng Su <yushengsu.thu@gmail.com>
Co-Authored-By: gongyisheng <yishenggong9437@gmail.com>
Co-Authored-By: Claude <noreply@anthropic.com>
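The guarded-import pattern this commit describes can be sketched as follows. The module path `hypothetical_pkg.energon_provider` is a placeholder, since the commit message does not say where EnergonProvider is defined, and the helpers `optional_attr` and `require_energon` are illustrative names rather than part of Megatron-Bridge.

```python
import importlib
from typing import Any, Optional


def optional_attr(module_path: str, attr: str) -> Optional[Any]:
    """Import `attr` from `module_path`, returning None if the import fails."""
    try:
        module = importlib.import_module(module_path)
    except ImportError:
        return None
    return getattr(module, attr, None)


# Placeholder path: the real module defining EnergonProvider is not named in the commit.
EnergonProvider = optional_attr("hypothetical_pkg.energon_provider", "EnergonProvider")


def require_energon() -> Any:
    """Fail at use time, with a clear message, rather than at import time."""
    if EnergonProvider is None:
        raise RuntimeError(
            "EnergonProvider is unavailable; install megatron.energon to use it"
        )
    return EnergonProvider
```

The point of the guard is that importing the package succeeds either way, and only code paths that actually need the optional dependency raise.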

Add megatron_param_name field to HFWeightTuple so downstream consumers (e.g. RL pipelines) can map exported HF weights back to their original Megatron parameter names.

Co-Authored-By: Yusheng Su <yushengsu.thu@gmail.com>
Co-Authored-By: gongyisheng <yishenggong9437@gmail.com>
Co-Authored-By: Claude <noreply@anthropic.com>
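As a rough picture of the extended tuple, the sketch below defines a three-field NamedTuple with the field order (param_name, weight, megatron_param_name) stated in a later commit of this PR. The `None` default, the helper `index_by_megatron_name`, and the example parameter names in the usage are assumptions made for illustration; the real class may differ.

```python
from typing import Any, Dict, Iterable, List, NamedTuple, Optional


class HFWeightTuple(NamedTuple):
    param_name: str                            # HF-side parameter name
    weight: Any                                # a tensor in real use; any object here
    megatron_param_name: Optional[str] = None  # assumed default, not confirmed by the source


def index_by_megatron_name(weights: Iterable[HFWeightTuple]) -> Dict[str, List[str]]:
    """Map each Megatron parameter name to the HF names exported from it."""
    mapping: Dict[str, List[str]] = {}
    for hf_name, _weight, megatron_name in weights:  # three fields, so three targets
        if megatron_name is not None:
            mapping.setdefault(megatron_name, []).append(hf_name)
    return mapping
```

Call sites that still unpack two values (`name, weight = item`) raise `ValueError` once the third field exists, which is why the unpacking sites have to be updated together with the tuple.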

Propagate actual megatron linear_in/linear_out parameter names through HFWeightTuple in peft_bridge adapter weight streaming instead of None.

Co-Authored-By: Yusheng Su <yushengsu.thu@gmail.com>
Co-Authored-By: gongyisheng <yishenggong9437@gmail.com>
Co-Authored-By: Claude <noreply@anthropic.com>

- Add _pad_right_dim0 padding for vocab size mismatch in ColumnParallelMapping (embedding/output_layer weights)
- Fix HFWeightTuple unpacking in state.py, utils.py, and auto_bridge.py to handle the new 3-field NamedTuple (param_name, weight, megatron_param_name)

Co-Authored-By: Yusheng Su <yushengsu.thu@gmail.com>
Co-Authored-By: gongyisheng <yishenggong9437@gmail.com>
Co-Authored-By: Claude <noreply@anthropic.com>
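One reading of the name `_pad_right_dim0` is that rows are appended at the end of dimension 0 until the weight matches the larger vocab size. The sketch below shows that idea on plain nested lists instead of tensors. The function `pad_right_dim0`, the choice of zero fill, and the decision to leave an already large enough matrix unchanged are all assumptions; the commit message does not specify them.

```python
from typing import List

Matrix = List[List[float]]


def pad_right_dim0(weight: Matrix, target_rows: int) -> Matrix:
    """Append all-zero rows until the matrix has target_rows rows.

    Hypothetical stand-in for the padding helper named in the commit;
    zero fill is an assumption.
    """
    rows = len(weight)
    copied = [list(row) for row in weight]  # never mutate the caller's data
    if rows >= target_rows:
        return copied  # assumed behavior: no truncation
    width = len(weight[0]) if weight else 0
    padding = [[0.0] * width for _ in range(target_rows - rows)]
    return copied + padding
```

With real tensors the same effect is usually obtained by concatenating a zero block along dimension 0.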

Handle MoE expert layers (e.g. GPT-OSS gate_up_proj) whose HF param names do not end with the conventional ".weight" suffix:
- _select_hf_base_param_name: return hf_param even when it lacks the expected suffix, so MoE expert mappings are no longer silently dropped.
- _megatron_to_hf_adapter_name: append hf_suffix directly when hf_base_name does not end with base_suffix instead of returning None.
- _make_lora_param_name: remove hard ".weight" suffix requirement; gracefully handle both standard and MoE-style param names.

For grouped MoE experts:
- Use a single ".weight0" lookup instead of per-expert iteration, since grouped expert adapters share 2D weights across all experts.
- unsqueeze(0) on adapter tensors to restore the expected 3D shape when yielding HFWeightTuples for grouped experts.

Made-with: Cursor

This reverts commit d123265.

Refactor grouped expert adapter handling in peft_bridge to use overridable hook methods instead of hardcoded GPT-OSS-specific logic:
- _select_hf_base_param_name: accept HF params that don't end with the expected suffix (e.g. MoE expert names like gate_up_proj)
- _resolve_hf_adapter_param_name / _make_lora_param_name: handle param names without .weight suffix gracefully
- Add _get_grouped_expert_base_suffixes() hook: default per-expert iteration; GPT-OSS overrides to single .weight0 lookup
- Add _prepare_expert_adapter_for_hf() hook: default no-op; GPT-OSS overrides to unsqueeze(0) for 3D shape restoration
- Add unit test for grouped expert LoRA adapter export path

Made-with: Cursor
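The hook-based design in this commit can be pictured roughly as below. Only the two hook names come from the commit message. The class names, the method signatures, the `.weight<i>` suffix pattern for the default case, and the use of nested lists in place of tensors are assumptions made so the sketch runs on its own.

```python
from typing import Any, List, Tuple


class PeftBridgeSketch:
    """Hypothetical stand-in for the bridge base class."""

    def _get_grouped_expert_base_suffixes(self, num_experts: int) -> List[str]:
        # Default hook: one lookup per expert (suffix pattern is assumed).
        return [".weight%d" % i for i in range(num_experts)]

    def _prepare_expert_adapter_for_hf(self, adapter: Any) -> Any:
        # Default hook: no-op.
        return adapter

    def export_plan(self, num_experts: int, adapter: Any) -> Tuple[List[str], Any]:
        # Shared export logic calls the hooks and never checks the model type.
        suffixes = self._get_grouped_expert_base_suffixes(num_experts)
        return suffixes, self._prepare_expert_adapter_for_hf(adapter)


class GptOssBridgeSketch(PeftBridgeSketch):
    """Overrides both hooks, as the commit says GPT-OSS does."""

    def _get_grouped_expert_base_suffixes(self, num_experts: int) -> List[str]:
        # Grouped experts share one 2D adapter, so a single lookup suffices.
        return [".weight0"]

    def _prepare_expert_adapter_for_hf(self, adapter: Any) -> Any:
        # List analogue of tensor.unsqueeze(0): wrap 2D data in a leading axis.
        return [adapter]
```

Keeping the model-specific behavior in overrides means the shared export path stays free of GPT-OSS conditionals, and another architecture can opt in by overriding the same two methods.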

yushengsu-thu and others added 8 commits April 9, 2026 01:06

Revert "fix: support MoE expert LoRA adapter conversion in peft_bridge"

dc22ea0

This reverts commit d123265.

add fix for qwen35 mtp layer name

65355f0

artem-osmosis mentioned this pull request May 11, 2026

qwen3.5 lora compatibility (needs SGLang + Megatron-Bridge patches as well) radixark/miles#1112

Open

yushengsu-thu force-pushed the bridge branch from 3fd3768 to 6fde1c8 Compare May 25, 2026 15:11

artem-osmosis wants to merge 8 commits into radixark:bridge from artem-osmosis:qwen35_miles_lora_v4
