
Question about performance at different seqlen #2

@jayden222


Great work, everyone!

My device only has 24GB of VRAM, so I have to set seqlen to around 4k when running the perf test, and at that length FSA performs worse than NSA. Also, the reported speedup ratios seem to have been measured at the default seqlen of 64k, which is not very common in practice.

How do NSA_ref and FSA perform when the seqlen is relatively short? It would be great if you could provide a benchmark comparing NSA_ref, FSA, and full attention across a range of seqlens.
