[feat] support multilora #4

Open
mathewjhan wants to merge 18 commits into radixark:bridge from mathewjhan:radixark/multilora-support

Conversation

@mathewjhan

What does this PR do?

Add a one line overview of what this PR aims to accomplish.

Changelog

  • Add specific line by line info of high level changes in this PR.

GitHub Actions CI

See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (e.g. Numba, Pynini, Apex)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

If you haven't finished some of the above items, you can still open a "Draft" PR.

Additional Information

  • Related to # (issue)

yushengsu-thu and others added 18 commits April 9, 2026 01:06
Guard EnergonProvider import with try/except so megatron-bridge works
when megatron.energon is not installed.

Co-Authored-By: Yusheng Su <yushengsu.thu@gmail.com>
Co-Authored-By: gongyisheng <yishenggong9437@gmail.com>
Co-Authored-By: Claude <noreply@anthropic.com>
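The import guard described in this commit can be sketched roughly as follows. This is a minimal illustration, not the PR's actual diff: the flag `HAVE_ENERGON` and the helper `require_energon` are hypothetical names, and only the optional module name `megatron.energon` comes from the commit message.

```python
# Optional-dependency guard: importing this module must not fail when
# megatron.energon is absent. HAVE_ENERGON is a hypothetical flag name.
try:
    import megatron.energon  # noqa: F401  (optional dependency)
    HAVE_ENERGON = True
except ImportError:
    HAVE_ENERGON = False


def require_energon() -> None:
    """Raise a clear error at the point of use (hypothetical helper)."""
    if not HAVE_ENERGON:
        raise ImportError(
            "megatron.energon is not installed; install it to use the "
            "Energon data provider."
        )
```

Deferring the error to the point of use keeps the rest of the package importable, which is the behavior the commit message asks for.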
Add megatron_param_name field to HFWeightTuple so downstream consumers
(e.g. RL pipelines) can map exported HF weights back to their original
Megatron parameter names.

Co-Authored-By: Yusheng Su <yushengsu.thu@gmail.com>
Co-Authored-By: gongyisheng <yishenggong9437@gmail.com>
Co-Authored-By: Claude <noreply@anthropic.com>
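For readers unfamiliar with the tuple shape, here is a hedged sketch of a three-field `HFWeightTuple`. The field names follow the commit messages in this PR; the field types, the `None` default, and the example parameter names are assumptions made for illustration only.

```python
from typing import Any, NamedTuple, Optional


class HFWeightTuple(NamedTuple):
    """Sketch of the exported-weight record; types are assumed."""
    param_name: str                             # Hugging Face parameter name
    weight: Any                                 # a tensor in practice
    megatron_param_name: Optional[str] = None   # default is an assumption


# Illustrative names only; they are not taken from the PR.
item = HFWeightTuple(
    "model.layers.0.mlp.down_proj.weight",
    [0.0, 1.0],
    "decoder.layers.0.mlp.linear_fc2.weight",
)

# Call sites must now unpack three values. Unpacking into two targets
# (name, weight = item) would raise ValueError.
name, weight, megatron_name = item
```

A downstream consumer such as an RL pipeline could then key its bookkeeping on `megatron_name` while still loading `weight` under `name`.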
Propagate actual megatron linear_in/linear_out parameter names through
HFWeightTuple in peft_bridge adapter weight streaming instead of None.

Co-Authored-By: Yusheng Su <yushengsu.thu@gmail.com>
Co-Authored-By: gongyisheng <yishenggong9437@gmail.com>
Co-Authored-By: Claude <noreply@anthropic.com>
- Add _pad_right_dim0 padding for vocab size mismatch in
  ColumnParallelMapping (embedding/output_layer weights)
- Fix HFWeightTuple unpacking in state.py, utils.py, and auto_bridge.py
  to handle the new 3-field NamedTuple (param_name, weight,
  megatron_param_name)

Co-Authored-By: Yusheng Su <yushengsu.thu@gmail.com>
Co-Authored-By: gongyisheng <yishenggong9437@gmail.com>
Co-Authored-By: Claude <noreply@anthropic.com>
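The padding idea can be illustrated with a dependency-free sketch. The real `_pad_right_dim0` presumably operates on tensors; this stand-in uses nested lists, and both the zero fill value and the error on an oversized input are assumptions rather than confirmed behavior.

```python
def pad_right_dim0(rows, target_rows, fill=0.0):
    """Append fill rows along dim 0 until there are target_rows rows.

    Stand-in for a tensor helper: `rows` is a list of equal-length lists,
    e.g. an embedding table with one row per vocabulary entry.
    """
    if len(rows) > target_rows:
        raise ValueError("input has more rows than the target size")
    width = len(rows[0]) if rows else 0
    padding = [[fill] * width for _ in range(target_rows - len(rows))]
    return [list(r) for r in rows] + padding


# A 3-token embedding table padded to a vocabulary size of 5.
table = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
padded = pad_right_dim0(table, 5)
```

Padding on the right leaves existing token indices unchanged, so only the extra rows differ between the two vocabulary sizes.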
Handle MoE expert layers (e.g. GPT-OSS gate_up_proj) whose HF param
names do not end with the conventional ".weight" suffix:

- _select_hf_base_param_name: return hf_param even when it lacks the
  expected suffix, so MoE expert mappings are no longer silently dropped.
- _megatron_to_hf_adapter_name: append hf_suffix directly when
  hf_base_name does not end with base_suffix instead of returning None.
- _make_lora_param_name: remove hard ".weight" suffix requirement;
  gracefully handle both standard and MoE-style param names.

For grouped MoE experts:
- Use a single ".weight0" lookup instead of per-expert iteration, since
  grouped expert adapters share 2D weights across all experts.
- unsqueeze(0) on adapter tensors to restore the expected 3D shape when
  yielding HFWeightTuples for grouped experts.

Made-with: Cursor
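The suffix handling described above can be sketched as a small string transformation. The function name `adapter_name_for`, the `.lora_A.weight` suffix, and the example parameter names are hypothetical; the only behavior taken from the commit message is that a name lacking the base suffix gets the adapter suffix appended directly instead of being dropped.

```python
def adapter_name_for(hf_base_name: str, adapter_suffix: str,
                     base_suffix: str = ".weight") -> str:
    """Map a base HF parameter name to an adapter parameter name."""
    if hf_base_name.endswith(base_suffix):
        # Standard case: replace the trailing ".weight" with the adapter suffix.
        return hf_base_name[: -len(base_suffix)] + adapter_suffix
    # MoE-style case: no ".weight" suffix, so append directly.
    return hf_base_name + adapter_suffix


standard = adapter_name_for("model.layers.0.mlp.down_proj.weight",
                            ".lora_A.weight")
moe_style = adapter_name_for("model.layers.0.mlp.experts.gate_up_proj",
                             ".lora_A.weight")
```

Under these assumptions both inputs yield a usable adapter name, whereas a strict `.weight` requirement would have rejected the second one.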
Refactor grouped expert adapter handling in peft_bridge to use
overridable hook methods instead of hardcoded GPT-OSS-specific logic:

- _select_hf_base_param_name: accept HF params that don't end with
  the expected suffix (e.g. MoE expert names like gate_up_proj)
- _resolve_hf_adapter_param_name / _make_lora_param_name: handle
  param names without .weight suffix gracefully
- Add _get_grouped_expert_base_suffixes() hook: default per-expert
  iteration; GPT-OSS overrides to single .weight0 lookup
- Add _prepare_expert_adapter_for_hf() hook: default no-op;
  GPT-OSS overrides to unsqueeze(0) for 3D shape restoration
- Add unit test for grouped expert LoRA adapter export path

Made-with: Cursor
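The hook-based refactor can be pictured with the following sketch. The class names, the un-prefixed method names, and the `.weight{i}` per-expert suffix pattern are assumptions for illustration; wrapping a value in a list stands in for `unsqueeze(0)` on a tensor.

```python
class BridgeSketch:
    """Default hooks: iterate per expert, leave adapters untouched."""

    def grouped_expert_base_suffixes(self, num_experts):
        # Assumed naming: one suffix per expert (".weight0", ".weight1", ...).
        return [f".weight{i}" for i in range(num_experts)]

    def prepare_expert_adapter_for_hf(self, adapter):
        return adapter  # no-op by default


class GptOssBridgeSketch(BridgeSketch):
    """Override for grouped experts that share one 2D adapter weight."""

    def grouped_expert_base_suffixes(self, num_experts):
        return [".weight0"]  # single lookup for all experts

    def prepare_expert_adapter_for_hf(self, adapter):
        return [adapter]  # stand-in for adapter.unsqueeze(0): 2D -> 3D


default_suffixes = BridgeSketch().grouped_expert_base_suffixes(3)
gpt_oss_suffixes = GptOssBridgeSketch().grouped_expert_base_suffixes(3)
```

Keeping the model-specific behavior in overrides means the shared export loop only calls the two hooks and needs no GPT-OSS branches.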
@mathewjhan mathewjhan marked this pull request as ready for review May 16, 2026 02:57
@yushengsu-thu yushengsu-thu self-assigned this May 17, 2026
2 participants