mismatch max_seqlen in _flash_attn_varlen_forward #32
Hi, thanks for your great work.
I noticed that in the MixedAttention function, the following code first computes the self-attention of the query (q) within its corresponding chunk.
```python
# self attn
_, _, _, _, self_attn_out_sh, self_attn_lse_hs, _, _ = (
    _flash_attn_varlen_forward(
        q=q,
        k=k,
        v=v,
        cu_seqlens_q=self_attn_cu_seqlen,
        cu_seqlens_k=self_attn_cu_seqlen,
        max_seqlen_q=max_seqlen,
        max_seqlen_k=max_seqlen,
        softmax_scale=softmax_scale,
        causal=True,
        dropout_p=0.0,
    )
)
```
However, `max_seqlen` is clearly larger than the longest chunk length implied by `self_attn_cu_seqlen`.
Line 96 in b5d5836: `max_seqlen_q=max_seqlen,`
I would like to know whether this leads to any issues, such as reduced computational efficiency or unintended behavior in the attention computation.
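For concreteness, here is a minimal sketch of the mismatch I mean (the chunk lengths are made up for illustration): the tight per-chunk bound one could pass as `max_seqlen_q`/`max_seqlen_k` is the longest individual chunk, derived from the differences of consecutive entries in `self_attn_cu_seqlen`, which is smaller than a global `max_seqlen`.

```python
# Hypothetical cumulative sequence lengths for three chunks of lengths 3, 5, 2,
# in the same format flash-attn's cu_seqlens arguments use.
self_attn_cu_seqlen = [0, 3, 8, 10]

# A global max_seqlen for the full (unchunked) sequence, as passed in the code above.
max_seqlen = 10

# The tight bound: the longest individual chunk, from consecutive differences.
chunk_lens = [b - a for a, b in zip(self_attn_cu_seqlen, self_attn_cu_seqlen[1:])]
tight_max_seqlen = max(chunk_lens)

print(tight_max_seqlen)  # 5, smaller than the max_seqlen=10 actually passed
```

So the value passed is a valid upper bound on every chunk length, just not the tight one.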