[feat] support multilora by mathewjhan · Pull Request #4 · radixark/Megatron-Bridge

mathewjhan · 2026-05-15T23:57:12Z

What does this PR do ?

Add a one line overview of what this PR aims to accomplish.

Changelog

Add specific line by line info of high level changes in this PR.

GitHub Actions CI

See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.

Before your PR is "Ready for review"

Pre checks:

Make sure you read and followed Contributor guidelines
Did you write any new necessary tests?
Did you add or update any necessary documentation?
Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
- Reviewer: Does the PR have correct import guards for all optional libraries?

If you haven't finished some of the above items, you can still open a "Draft" PR.

Additional Information

Related to # (issue)

Guard EnergonProvider import with try/except so megatron-bridge works when megatron.energon is not installed. Co-Authored-By: Yusheng Su <yushengsu.thu@gmail.com> Co-Authored-By: gongyisheng <yishenggong9437@gmail.com> Co-Authored-By: Claude <noreply@anthropic.com>
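The guarded-import pattern this commit describes can be sketched as follows. This is a minimal illustration, not the code from the PR: `try_import` and `HAVE_ENERGON` are hypothetical names, and the only thing taken from the commit message is that `megatron.energon` may be absent.

```python
import importlib
from types import ModuleType
from typing import Optional


def try_import(module_name: str) -> Optional[ModuleType]:
    """Return the imported module, or None when it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # ModuleNotFoundError is a subclass of ImportError, so a missing
        # optional dependency lands here instead of crashing at import time.
        return None


# Hypothetical flag: code that needs energon checks it before use.
energon = try_import("megatron.energon")
HAVE_ENERGON = energon is not None
```

One trade-off of catching `ImportError` broadly is that a genuine import failure inside the optional package is also swallowed, so callers that require the dependency should raise a clear error when the flag is false.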

Add megatron_param_name field to HFWeightTuple so downstream consumers (e.g. RL pipelines) can map exported HF weights back to their original Megatron parameter names. Co-Authored-By: Yusheng Su <yushengsu.thu@gmail.com> Co-Authored-By: gongyisheng <yishenggong9437@gmail.com> Co-Authored-By: Claude <noreply@anthropic.com>
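A rough sketch of what a three-field weight tuple could look like from a consumer's side. The field order (param_name, weight, megatron_param_name) follows the commit messages in this PR; the class name `WeightTupleSketch`, the `None` default, and the example parameter names are assumptions made for illustration.

```python
from typing import Any, Dict, Iterable, NamedTuple, Optional


class WeightTupleSketch(NamedTuple):
    param_name: str                            # HF-side parameter name
    weight: Any                                # a tensor in practice; any object here
    megatron_param_name: Optional[str] = None  # originating Megatron name, if known


def build_name_map(stream: Iterable[WeightTupleSketch]) -> Dict[str, Optional[str]]:
    """Map each exported HF name back to its Megatron source name."""
    name_map = {}
    # Consumers must unpack three fields; two-field unpacking would raise.
    for hf_name, _weight, megatron_name in stream:
        name_map[hf_name] = megatron_name
    return name_map


stream = [
    WeightTupleSketch("model.embed_tokens.weight", [0.0], "embedding.word_embeddings.weight"),
    WeightTupleSketch("lm_head.weight", [0.0]),  # source name not supplied
]
name_map = build_name_map(stream)
```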

Propagate actual megatron linear_in/linear_out parameter names through HFWeightTuple in peft_bridge adapter weight streaming instead of None. Co-Authored-By: Yusheng Su <yushengsu.thu@gmail.com> Co-Authored-By: gongyisheng <yishenggong9437@gmail.com> Co-Authored-By: Claude <noreply@anthropic.com>

- Add _pad_right_dim0 padding for vocab size mismatch in ColumnParallelMapping (embedding/output_layer weights)
- Fix HFWeightTuple unpacking in state.py, utils.py, and auto_bridge.py to handle the new 3-field NamedTuple (param_name, weight, megatron_param_name)

Co-Authored-By: Yusheng Su <yushengsu.thu@gmail.com> Co-Authored-By: gongyisheng <yishenggong9437@gmail.com> Co-Authored-By: Claude <noreply@anthropic.com>
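To make the vocab-size padding concrete, here is a small sketch that appends zero rows at the end of dimension 0. It assumes that is what `_pad_right_dim0` does, which the commit message suggests but does not spell out. Nested lists stand in for tensors so the example has no dependencies, and `pad_rows_to` is a hypothetical name.

```python
from typing import List


def pad_rows_to(weight: List[List[float]], target_rows: int, fill: float = 0.0) -> List[List[float]]:
    """Append `fill` rows at the end of dim 0 until there are `target_rows` rows."""
    rows = len(weight)
    if rows > target_rows:
        raise ValueError(f"cannot pad {rows} rows down to {target_rows}")
    width = len(weight[0]) if weight else 0
    padding = [[fill] * width for _ in range(target_rows - rows)]
    # Copy the original rows so the input is left untouched.
    return [list(row) for row in weight] + padding


hf_embedding = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # vocab size 3, hidden size 2
padded = pad_rows_to(hf_embedding, target_rows=4)    # padded vocab size 4
```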

Handle MoE expert layers (e.g. GPT-OSS gate_up_proj) whose HF param names do not end with the conventional ".weight" suffix:
- _select_hf_base_param_name: return hf_param even when it lacks the expected suffix, so MoE expert mappings are no longer silently dropped.
- _megatron_to_hf_adapter_name: append hf_suffix directly when hf_base_name does not end with base_suffix instead of returning None.
- _make_lora_param_name: remove hard ".weight" suffix requirement; gracefully handle both standard and MoE-style param names.

For grouped MoE experts:
- Use a single ".weight0" lookup instead of per-expert iteration, since grouped expert adapters share 2D weights across all experts.
- unsqueeze(0) on adapter tensors to restore the expected 3D shape when yielding HFWeightTuples for grouped experts.

Made-with: Cursor
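The suffix handling described above can be illustrated with a small string helper. This is a sketch of the stated behavior only: `adapter_name` is a hypothetical function, and the `.lora_A.weight` suffix and the example parameter paths are assumptions rather than values taken from the PR.

```python
def adapter_name(hf_base_name: str, base_suffix: str, hf_suffix: str) -> str:
    """Derive an adapter parameter name from a base HF parameter name."""
    if base_suffix and hf_base_name.endswith(base_suffix):
        # Standard case: swap the trailing base suffix for the adapter suffix.
        return hf_base_name[: -len(base_suffix)] + hf_suffix
    # MoE-style case: the name has no base suffix, so append instead of giving up.
    return hf_base_name + hf_suffix


standard = adapter_name("model.layers.0.self_attn.q_proj.weight", ".weight", ".lora_A.weight")
moe_style = adapter_name("model.layers.0.mlp.experts.gate_up_proj", ".weight", ".lora_A.weight")
```

Both calls produce a name ending in `.lora_A.weight`; the second is the case that would previously have been dropped.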

This reverts commit d123265.

Refactor grouped expert adapter handling in peft_bridge to use overridable hook methods instead of hardcoded GPT-OSS-specific logic:
- _select_hf_base_param_name: accept HF params that don't end with the expected suffix (e.g. MoE expert names like gate_up_proj)
- _resolve_hf_adapter_param_name / _make_lora_param_name: handle param names without .weight suffix gracefully
- Add _get_grouped_expert_base_suffixes() hook: default per-expert iteration; GPT-OSS overrides to single .weight0 lookup
- Add _prepare_expert_adapter_for_hf() hook: default no-op; GPT-OSS overrides to unsqueeze(0) for 3D shape restoration
- Add unit test for grouped expert LoRA adapter export path

Made-with: Cursor
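The hook-based design listed above might look roughly like the following. Class and method names are hypothetical stand-ins for the hooks named in the commit message, the per-expert `.weight{i}` suffix format is an assumption, and wrapping a value in a list stands in for `unsqueeze(0)` on a real tensor.

```python
from typing import Any, Dict, List, Tuple


class AdapterExportSketch:
    """Default behavior: one lookup per expert, adapters passed through unchanged."""

    def grouped_expert_base_suffixes(self, num_experts: int) -> List[str]:
        return [f".weight{i}" for i in range(num_experts)]

    def prepare_expert_adapter_for_hf(self, adapter: Any) -> Any:
        return adapter  # no-op by default

    def export(self, base_name: str, adapters: Dict[str, Any], num_experts: int) -> List[Tuple[str, Any]]:
        out = []
        for suffix in self.grouped_expert_base_suffixes(num_experts):
            prepared = self.prepare_expert_adapter_for_hf(adapters[suffix])
            out.append((base_name + suffix, prepared))
        return out


class SharedExpertExportSketch(AdapterExportSketch):
    """Override for models whose grouped experts share one 2D adapter."""

    def grouped_expert_base_suffixes(self, num_experts: int) -> List[str]:
        return [".weight0"]  # single lookup instead of per-expert iteration

    def prepare_expert_adapter_for_hf(self, adapter: Any) -> Any:
        return [adapter]  # adds a leading dimension, like unsqueeze(0)


adapters = {".weight0": [[1.0, 2.0]], ".weight1": [[3.0, 4.0]]}
default_out = AdapterExportSketch().export("experts.linear_fc1", adapters, 2)
shared_out = SharedExpertExportSketch().export("experts.linear_fc1", adapters, 2)
```

Keeping the model-specific behavior in two small overrides means the shared export loop does not need to know which model it is serving.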

yushengsu-thu and others added 18 commits April 9, 2026 01:06

Revert "fix: support MoE expert LoRA adapter conversion in peft_bridge"

dc22ea0

This reverts commit d123265.

[feat] add multilora layers

0d26b99

[fix] update to v4

e836b45

[temp]3

4416c99

[fix] comms

8e3374d

[fix] wrong rank masks

72149b4

[fix] sp shapes

e176534

[fix] adapter loading + expose

a47e357

[fix] move register adapter logic into the layer

0cc753e

[fix] naming

e7a9ae2

[fix] naming

6f2ee85

[misc] remove simplemultiloralinear

1ff9369

mathewjhan mentioned this pull request May 16, 2026

feat: multi-lora training radixark/miles#1141

Draft

mathewjhan marked this pull request as ready for review May 16, 2026 02:57

yushengsu-thu self-assigned this May 17, 2026

yushengsu-thu force-pushed the bridge branch from 3fd3768 to 6fde1c8 Compare May 25, 2026 15:11

mathewjhan wants to merge 18 commits into radixark:bridge from mathewjhan:radixark/multilora-support

2 participants
