Add headdim 256 md5 and disable Python GC during timed iterations #24

Open
baoqiwen wants to merge 1 commit into umiswing:master from baoqiwen:bqw_d256

Conversation

@baoqiwen

Contributor

No description provided.

Comment thread benchmark_flashmask.py
fn()
# Benchmark

gc.collect()

Contributor Author

This prevents a GC pass from landing in the middle of a kernel run and skewing the timing statistics.
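As a rough illustration of the idea in this comment, here is a minimal sketch of a timing loop that collects garbage once up front and then keeps the collector off while iterations are timed. The helper name `time_without_gc` and its `warmup`/`iters` parameters are hypothetical and are not taken from `benchmark_flashmask.py`. It also measures host wall-clock time only; a real GPU benchmark would presumably need device synchronization or device-side timers as well.

```python
import gc
import time


def time_without_gc(fn, warmup=3, iters=10):
    """Return per-iteration wall-clock times (seconds) for fn, with GC paused.

    Hypothetical helper for illustration; not the PR's actual benchmark code.
    """
    # Warm-up runs are not timed.
    for _ in range(warmup):
        fn()

    # Clear pending garbage now so nothing is waiting to be collected.
    gc.collect()

    was_enabled = gc.isenabled()
    gc.disable()  # no automatic collections inside the timed loop
    try:
        times = []
        for _ in range(iters):
            start = time.perf_counter()
            fn()
            times.append(time.perf_counter() - start)
    finally:
        # Restore the collector to its previous state even if fn raises.
        if was_enabled:
            gc.enable()
    return times
```

Restoring the collector in a `finally` block keeps a failing benchmark from leaving GC disabled for the rest of the process.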

@baoqiwen baoqiwen force-pushed the bqw_d256 branch 2 times, most recently from f79e4ec to bdafe32 Compare June 26, 2026 10:13

def generate_random_eviction_mask(batch_size, seqlen_q, seqlen_k, h, start_row=None):
# np.random.seed(0)
np.random.seed(0)

Contributor Author

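The unit test for this mask failed once with a very small precision error. I spent an afternoon trying to reproduce it and could not. I suspect that, because the mask's seed was not fixed, some particular case exceeded the precision tolerance by a small margin.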
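To illustrate why fixing the seed makes such a failure reproducible, here is a small sketch. The function `make_random_rows` is a hypothetical stand-in, not the body of `generate_random_eviction_mask`; only the `np.random.seed(0)` call mirrors the diff above.

```python
import numpy as np


def make_random_rows(batch_size, seqlen_q, seqlen_k, h, seed=0):
    """Hypothetical stand-in for a random mask generator.

    Seeding the global NumPy RNG first means every call draws the same values.
    """
    np.random.seed(seed)
    # One random row index in [0, seqlen_q) per (batch, head, key column).
    return np.random.randint(0, seqlen_q, size=(batch_size, h, seqlen_k))


first = make_random_rows(2, 16, 8, 4)
second = make_random_rows(2, 16, 8, 4)
# Same seed, same mask: a failing case can be regenerated exactly.
assert np.array_equal(first, second)
```

Note that `np.random.seed` resets NumPy's global random state, so any other code drawing from `np.random` afterwards is affected too.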

@baoqiwen baoqiwen force-pushed the bqw_d256 branch 5 times, most recently from 73d91aa to 8afd22c Compare June 26, 2026 10:32
Comment thread benchmark_fa4_mask_mod.py
# Note(umiswing): fa4 does not support d 256
for D in [128]:
H = 4096 // D
if D == 192:

Contributor Author

Support for d192/dv128 is incomplete. This only keeps open the possibility of testing d != dv; it cannot actually be tested yet.

Comment thread benchmark_fa4_mask_mod.py
@@ -846,15 +853,21 @@ def main(examples: List[str] = ["all"], dtype='bf16'):
#doc_seq_lens_list = doc_seq_lens_list[::-1]
# Note(umiswing): fa4 does not support d 256
for D in [128]:

Contributor Author

fa4 mask mod support for d192/dv128 is incomplete. The functionality is added here for now, but it is not tested yet.

Comment thread benchmark_flashmask.py
#doc_seq_lens_list = doc_seq_lens_list[::-1]
for D in [128] if fm_version == 4 else [64, 128, 256]:
H = 4096 // D
for D in [128, 192, 256] if fm_version == 4 else [64, 128, 256]:

Contributor Author

Three data flows are involved, and all of them are monitored.
