Thank you for this impressive work. I have a question about MoBA: compared to full attention, can MoBA reduce GPU memory consumption during both training and inference, thereby supporting longer input sequences?
We ran some experiments and observed a slight increase in memory consumption relative to full attention, despite a significant improvement in decoding speed. Is this expected?
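For context, our measurement was roughly along these lines (a minimal sketch, not our exact script: `model` is a placeholder for a causal LM with MoBA attention enabled, and we assume a HuggingFace-style `generate` interface; the memory and timing calls are standard PyTorch):

```python
import time
import torch

def measure_decode(model, input_ids, max_new_tokens=128):
    """Report peak GPU memory and decode throughput for one generation run."""
    torch.cuda.reset_peak_memory_stats()   # clear previous peak-memory counters
    torch.cuda.synchronize()
    start = time.time()
    with torch.no_grad():
        out = model.generate(input_ids, max_new_tokens=max_new_tokens)
    torch.cuda.synchronize()                # wait for all kernels before timing
    elapsed = time.time() - start
    peak_gib = torch.cuda.max_memory_allocated() / 1024**3
    new_tokens = out.shape[-1] - input_ids.shape[-1]
    print(f"peak memory: {peak_gib:.2f} GiB, "
          f"decode speed: {new_tokens / elapsed:.1f} tok/s")
```

With this setup, the MoBA run showed higher peak memory than the full-attention baseline at the same sequence length, even though tokens per second improved noticeably.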