
[draft] Multi lora #60

Closed

rcano-baseten wants to merge 1 commit into modelscope:main from basetenlabs:multi-lora

Conversation

@rcano-baseten

No description provided.


@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces a Multi-LoRA registry and routing mechanism to support per-sequence and per-token adapter dispatching in Megatron-Core models, including MoE layers. My review highlights potential issues with registry initialization, the safety of direct dictionary manipulation for cleanup, and performance considerations regarding per-token linear operations and buffer allocation in the routing path.

Comment thread src/mcore_bridge/bridge/gpt_bridge.py Outdated
num_slots: Number of concurrent adapters to support.
lora_config: ``peft.LoraConfig`` describing rank, target modules, etc.
"""
self._adapter_registry: dict = {} # logical_name → slot_name


Severity: medium

The _adapter_registry dictionary is initialized inside preallocate_adapters. If this method is called multiple times, it will overwrite the existing registry, potentially losing previously registered adapters. Consider checking if it already exists or initializing it in __init__.
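The guard the reviewer suggests can be sketched as follows. This is a hypothetical illustration, not code from the PR: the class name, the `_slots` attribute, and the method body are assumptions; only `_adapter_registry` and `preallocate_adapters` come from the quoted snippet.

```python
# Hypothetical sketch: initialize the registry once in __init__ so that
# repeated calls to preallocate_adapters() cannot wipe prior registrations.

class MultiLoraMixin:
    def __init__(self):
        # Registry lives here, created exactly once per instance.
        self._adapter_registry: dict = {}  # logical_name -> slot_name

    def preallocate_adapters(self, num_slots, lora_config=None):
        # Defensive fallback for instances created before this change;
        # never rebinds an existing registry.
        if not hasattr(self, "_adapter_registry"):
            self._adapter_registry = {}
        # Illustrative slot allocation (assumed, not from the PR):
        self._slots = [f"slot{i}" for i in range(num_slots)]
```

With this shape, calling `preallocate_adapters` a second time resizes the slots but leaves any previously registered adapters in place.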

Comment thread src/mcore_bridge/patcher.py Outdated
Comment on lines +883 to +887
if root is not None:
root.__dict__.pop('_expert_lora_indices', None)
if post is not None:
self.__dict__.pop('_lora_post_dispatch', None)
return result


Severity: medium

The cleanup logic in routed_experts_compute uses root.__dict__.pop and self.__dict__.pop. While this works, it is safer to check for the attribute with getattr and then remove it with delattr (or reset it with setattr(obj, '_expert_lora_indices', None)): manipulating __dict__ directly bypasses __delattr__ hooks and descriptors and can leave the object in an inconsistent state.

Comment thread src/mcore_bridge/tuners/lora.py Outdated
Comment on lines +376 to +387
for idx, slot in _lpl._idx_to_name.items():
if slot not in _lpl.lora_A:
continue
mask = token_idx == idx
if not mask.any():
continue
rows = mask.nonzero(as_tuple=True)[0]
x_sub = x_flat.index_select(0, rows).to(result.dtype)
delta = F.linear(_lpl.lora_dropout[slot](x_sub),
_lpl.lora_A[slot].weight.to(result.dtype))
delta = F.linear(delta, _lpl.lora_B[slot].weight.to(result.dtype))
result_flat.index_add_(0, rows, (delta * _lpl.scaling[slot]).to(result.dtype))


Severity: medium

The loop iterates over _lpl._idx_to_name.items() and issues separate F.linear calls per slot, so each forward pass launches O(num_adapters) small kernels. If the slot count grows, consider vectorizing across adapters (e.g. a batched/grouped matmul when all slots share the same rank); at minimum, keep the x_sub gather and the scatter-add on the hot path as tight as they are here.

Comment thread src/mcore_bridge/tuners/lora.py Outdated
Comment on lines +514 to +517
delta = torch.zeros(total_tokens, out_feat, dtype=dtype, device=result.device)
apply_routed_lora(delta, x_flat, token_indices,
lora_A_by_idx, lora_B_by_idx, scaling_by_idx, dropout_by_idx)
result = result + delta.view_as(result).to(result.dtype)


Severity: medium

Creating a new torch.zeros buffer for delta on every forward pass can be memory-intensive. Consider reusing a buffer if possible, or ensure that this allocation does not become a performance bottleneck during training.
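A buffer-reuse pattern along the lines the comment suggests is sketched below, again with numpy standing in for torch. The cache keying and function name are assumptions; note that in torch, reusing and zeroing a buffer in place interacts with autograd (a tensor still referenced by the graph must not be overwritten), so this applies most cleanly to inference or to buffers created outside the graph.

```python
import numpy as np

# Hypothetical delta-buffer cache: allocate once per (shape, dtype) and
# zero in place on reuse, instead of a fresh zeros() every forward pass.

_delta_cache = {}

def get_delta_buffer(total_tokens, out_feat, dtype=np.float32):
    key = (total_tokens, out_feat, np.dtype(dtype))
    buf = _delta_cache.get(key)
    if buf is None:
        buf = np.zeros((total_tokens, out_feat), dtype=dtype)
        _delta_cache[key] = buf
    else:
        buf.fill(0)  # reuse the allocation; only reset its contents
    return buf
```

Since total_tokens varies with batch shape, a real implementation might cap the cache size or allocate for the maximum expected token count and slice.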

@rcano-baseten rcano-baseten deleted the multi-lora branch May 4, 2026 19:13