Can the MoBA attention mechanism be plugged into other models and used directly, without training? #35
The technical report says:
Additionally, MoBA’s flexibility allows it to be integrated with existing models without substantial training cost, making it a practical continual pre-training solution for enhancing long-context capabilities in LLMs.
However, the README says:
Note: MoBA requires continue training of existing models to achieve its acceleration benefits. It is not a drop-in sparse attention solution that can be directly applied to pretrained models without additional training.
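For reference, the sketch below illustrates the block top-k gating idea described in the technical report, to make concrete what "inserting MoBA attention" into an existing model would involve. It is only a toy, single-head example under assumed shapes; the function name moba_style_attention, the block size, and the top_k value are illustrative and are not the repository's API.

```python
import torch
import torch.nn.functional as F

def moba_style_attention(q, k, v, block_size=64, top_k=3):
    """Toy single-head example. q, k, v: [seq_len, head_dim] float tensors."""
    seq_len, head_dim = k.shape
    n_blocks = (seq_len + block_size - 1) // block_size

    # Represent each key block by its mean-pooled key (the gating signal
    # described in the technical report).
    pad = n_blocks * block_size - seq_len
    k_padded = F.pad(k, (0, 0, 0, pad))
    block_repr = k_padded.view(n_blocks, block_size, head_dim).mean(dim=1)

    # Each query scores all blocks and keeps only its top-k blocks.
    gate = q @ block_repr.t()                               # [seq_len, n_blocks]
    top = gate.topk(min(top_k, n_blocks), dim=-1).indices   # [seq_len, top_k]

    # Token-level mask: query i may attend to token j only if j's block is
    # among query i's selected blocks. (A real drop-in would also need causal
    # masking and forced inclusion of the query's own block.)
    block_ids = torch.arange(seq_len, device=k.device) // block_size
    allowed = (block_ids.view(1, -1, 1) == top.unsqueeze(1)).any(-1)

    scores = (q @ k.t()) / head_dim ** 0.5
    scores = scores.masked_fill(~allowed, float("-inf"))
    return scores.softmax(dim=-1) @ v
```

For example, with q = k = v = torch.randn(1024, 64), each query attends to at most top_k * block_size of the 1024 keys. The question above is whether a pretrained model's attention can be replaced by such a gated, block-sparse variant at inference time without any further training.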