Can the MoBA attention mechanism be plugged into other models and used directly, without training? #35
The technical report says:
Additionally, MoBA’s flexibility allows it to be integrated with existing models without substantial training cost, making it a practical continual pre-training solution for enhancing long-context capabilities in LLMs.
However, the README says:
Note: MoBA requires continue training of existing models to achieve its acceleration benefits. It is not a drop-in sparse attention solution that can be directly applied to pretrained models without additional training.
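For reference, the sketch below illustrates the block top-k gating idea described in the technical report, to make concrete what "inserting MoBA attention" into an existing model would involve. It is only a toy, single-head example under assumed shapes; the function name moba_style_attention, the block size, and the top_k value are illustrative and are not the repository's API.

```python
import torch
import torch.nn.functional as F

def moba_style_attention(q, k, v, block_size=64, top_k=3):
    """Toy single-head example. q, k, v: [seq_len, head_dim] float tensors."""
    seq_len, head_dim = k.shape
    n_blocks = (seq_len + block_size - 1) // block_size

    # Represent each key block by its mean-pooled key (the gating signal
    # described in the technical report).
    pad = n_blocks * block_size - seq_len
    k_padded = F.pad(k, (0, 0, 0, pad))
    block_repr = k_padded.view(n_blocks, block_size, head_dim).mean(dim=1)

    # Each query scores all blocks and keeps only its top-k blocks.
    gate = q @ block_repr.t()                               # [seq_len, n_blocks]
    top = gate.topk(min(top_k, n_blocks), dim=-1).indices   # [seq_len, top_k]

    # Token-level mask: query i may attend to token j only if j's block is
    # among query i's selected blocks. (A real drop-in would also need causal
    # masking and forced inclusion of the query's own block.)
    block_ids = torch.arange(seq_len, device=k.device) // block_size
    allowed = (block_ids.view(1, -1, 1) == top.unsqueeze(1)).any(-1)

    scores = (q @ k.t()) / head_dim ** 0.5
    scores = scores.masked_fill(~allowed, float("-inf"))
    return scores.softmax(dim=-1) @ v
```

For example, with q = k = v = torch.randn(1024, 64), each query attends to at most top_k * block_size of the 1024 keys. The question above is whether a pretrained model's attention can be replaced by such a gated, block-sparse variant at inference time without any further training.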