Question about performance at different seqlen

Great work, guys!

My device has only 24 GB of VRAM, so I have to set the seqlen to around 4k when running the perf test, and at that length FSA performs worse than NSA. Besides, the reported speedup ratio seems to have been measured at the default 64k seqlen, which is not very common in practice.

How do NSA_ref and FSA perform when the seqlen is relatively short? It would be great if you could provide a benchmark comparing NSA_ref, FSA, and full attention across a range of seqlens.
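To make the request concrete, here is a rough sketch of the kind of sweep I have in mind. It is only a timing harness: `sweep_seqlen` and the entries of the `kernels` dict are hypothetical names, not anything from this repo, and each callable is assumed to build its own inputs for the given seqlen and run one forward pass.

```python
import time
from typing import Callable, Dict, List


def sweep_seqlen(
    kernels: Dict[str, Callable[[int], None]],
    seqlens: List[int],
    warmup: int = 2,
    iters: int = 5,
) -> Dict[int, Dict[str, float]]:
    """Return mean wall-clock seconds per call for each kernel at each seqlen.

    `kernels` maps a label (e.g. "NSA_ref", "FSA", "full") to a callable
    that takes a seqlen and runs one pass. These callables are placeholders;
    for GPU kernels they would need to synchronize the device (e.g. with
    torch.cuda.synchronize()) before returning so the timing is meaningful.
    """
    results: Dict[int, Dict[str, float]] = {}
    for n in seqlens:
        row: Dict[str, float] = {}
        for name, run in kernels.items():
            # Warm-up calls are excluded from the measurement.
            for _ in range(warmup):
                run(n)
            start = time.perf_counter()
            for _ in range(iters):
                run(n)
            row[name] = (time.perf_counter() - start) / iters
        results[n] = row
    return results
```

Calling it with seqlens such as `[1024, 2048, 4096, 8192]` and one callable per implementation would give a table of latency versus seqlen, which is the comparison I am asking about.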

Question about performance at different seqlen #2

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Question about performance at different seqlen #2

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions