
Can the MoBA attention mechanism be plugged into other models and used directly, without any training? #35

@YuHuiGao

Description


Can the MoBA attention mechanism be plugged into other models and used directly, without any training?

The technical report states:
Additionally, MoBA’s flexibility allows it to be integrated with existing models without substantial training cost, making it a practical continual pre-training solution for enhancing long-context capabilities in LLMs.
But the README says:
Note: MoBA requires continue training of existing models to achieve its acceleration benefits. It is not a drop-in sparse attention solution that can be directly applied to pretrained models without additional training.
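These two statements can read as contradictory at first glance. For context, the mechanism in question routes each query to a small top-k subset of key blocks, where blocks are scored by the affinity between the query and the mean-pooled keys of each block. Below is a minimal PyTorch sketch of that block-selection idea; the function name, shapes, and defaults are illustrative assumptions rather than this repo's actual API, and causal masking plus the paper's rule of always attending to the current block are omitted for brevity.

```python
# Hypothetical sketch of MoBA-style block-sparse attention (not the repo's API).
import torch

def moba_style_attention(q, k, v, block_size=64, top_k=3):
    """q, k, v: (seq_len, num_heads, head_dim); seq_len assumed divisible by block_size."""
    S, H, D = k.shape
    n_blocks = S // block_size
    # One representative per key block: the mean-pooled keys of that block.
    k_blocks = k.reshape(n_blocks, block_size, H, D).mean(dim=1)   # (n_blocks, H, D)
    # Gating: affinity of each query with each block representative.
    gate = torch.einsum("qhd,nhd->qhn", q, k_blocks)               # (S, H, n_blocks)
    chosen = gate.topk(min(top_k, n_blocks), dim=-1).indices       # (S, H, top_k)
    # Token-level mask: query i (head h) may attend to key j only if
    # j's block is among the top-k blocks chosen for (i, h).
    block_mask = torch.zeros(S, H, n_blocks, dtype=torch.bool)
    block_mask.scatter_(-1, chosen, True)
    block_id = torch.arange(S) // block_size                       # block index of each key
    token_mask = block_mask[:, :, block_id]                        # (S, H, S)
    # Standard scaled dot-product attention, restricted to the chosen blocks.
    scores = torch.einsum("qhd,khd->hqk", q, k) / D**0.5           # (H, S, S)
    scores = scores.masked_fill(~token_mask.permute(1, 0, 2), float("-inf"))
    return torch.einsum("hqk,khd->qhd", scores.softmax(dim=-1), v)

# Quick shape check with random tensors:
q, k, v = (torch.randn(256, 4, 32) for _ in range(3))
print(moba_style_attention(q, k, v).shape)  # torch.Size([256, 4, 32])
```

Because this gating reroutes attention mass away from the full softmax pattern the base model's weights were trained under, dropping it into a pretrained model changes the output distribution; that is presumably what the README's continued-training requirement refers to, while "without substantial training cost" in the report seems to mean the continual pre-training is cheap, not absent.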
