Miscellaneous notes

Compile

git submodule update --init --recursive
mkdir -p build && cd build
cmake ..
cd ..
ln -s build/compile_commands.json compile_commands.json

If clangd reports errors, download the latest clangd and set the clangd path in VS Code.

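Planned kernel optimizations

Tile softmax_f32_kernel and use the online softmax algorithm to reduce the number of round trips to global memory
Flash attention, to fix the softmax memory blow-up and the memory bottleneck during prefill
cudaStream optimization
cudaGraph optimization
Make logits pinned memory to reduce copy overhead
Measure whether storing the scores or recomputing them is cheaper when computing O in flash attention; the current implementation stores the scores
Implement batched GEMV with HGEMM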
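
The tiled online softmax idea from the list above can be sketched on the CPU as follows. This is a minimal reference under stated assumptions, not the project's kernel: the function name `online_softmax` and the `tile_size` parameter are hypothetical, and `tile_size` only stands in for the chunk that one thread block would process. The point it illustrates is that the running maximum and the running sum are updated together in one sweep over the input, so the input is read twice in total (once for the statistics, once to normalize) instead of three times.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical CPU reference for tiled online softmax over a single row.
// Assumes at least one finite input value.
std::vector<float> online_softmax(const std::vector<float>& x, std::size_t tile_size) {
    if (tile_size == 0) tile_size = 1;  // guard against a non-advancing loop

    float running_max = -std::numeric_limits<float>::infinity();
    float running_sum = 0.0f;

    // Sweep 1: update max and sum together, one tile at a time.
    for (std::size_t start = 0; start < x.size(); start += tile_size) {
        const std::size_t end = std::min(start + tile_size, x.size());

        float new_max = running_max;
        for (std::size_t i = start; i < end; ++i) new_max = std::max(new_max, x[i]);

        // Rescale the sum accumulated so far so it is relative to new_max.
        running_sum *= std::exp(running_max - new_max);
        for (std::size_t i = start; i < end; ++i) running_sum += std::exp(x[i] - new_max);

        running_max = new_max;
    }

    // Sweep 2: normalize with the final statistics.
    std::vector<float> out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        out[i] = std::exp(x[i] - running_max) / running_sum;
    }
    return out;
}
```

Calling `online_softmax(row, 2)` and `online_softmax(row, row.size())` should give the same probabilities up to floating-point rounding, which is a convenient check when validating a GPU version against this reference.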

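Parallel prefill performance test

Input is the test.txt file in the project root (3,000 Chinese words), run in thinking mode; only the TTFT metric is compared.

With prefill enabled: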

[perf] prompt_tokens=1371, generated_tokens=373, total_tokens=1744, inference_time=12.295s, TTFT=3.745s, tokens/s=141.85

Without prefill:

[perf] prompt_tokens=1371, generated_tokens=7581, total_tokens=8952, inference_time=196.526s, TTFT=196.526s, tokens/s=45.55

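FlashAttention V1 optimizations

Implement online softmax with two reductions
For reg_score * value, split each warp along head_dim so that threads in the same warp access contiguous addresses instead of the previous strided pattern
Add swizzling to key accesses to reduce bank conflicts
Store key and value as bfloatx2 and use vectorized loads to eliminate the 2-way bank conflict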
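
To show why swizzling reduces bank conflicts, here is a small CPU-side sketch that only counts bank hits. It is not the layout used in this repository: it assumes shared memory with 32 four-byte banks, a hypothetical 32 x 32 tile of 4-byte words, and a warp in which lane `t` reads row `t` of the same column. The helper names (`plain_index`, `swizzled_index`, `max_bank_conflict`) are made up for the illustration, and the swizzle shown is the common XOR of the column with the row.

```cpp
#include <algorithm>
#include <array>

constexpr int kBanks = 32;  // assumed: 32 banks, each 4 bytes wide
constexpr int kTile = 32;   // assumed: 32 x 32 tile of 4-byte words

// Row-major layout: every row of a given column lands in the same bank.
int plain_index(int row, int col) { return row * kTile + col; }

// XOR swizzle: the column is permuted per row, so one logical column
// is spread over all banks. Valid for row, col in [0, 32).
int swizzled_index(int row, int col) { return row * kTile + (col ^ row); }

// Worst-case number of lanes hitting one bank when lane t reads (t, col).
template <typename IndexFn>
int max_bank_conflict(IndexFn index_fn, int col) {
    std::array<int, kBanks> hits{};
    int worst = 0;
    for (int lane = 0; lane < 32; ++lane) {
        const int bank = index_fn(lane, col) % kBanks;
        worst = std::max(worst, ++hits[bank]);
    }
    return worst;
}
```

Under these assumptions `max_bank_conflict(plain_index, c)` is 32 for any column `c` (a 32-way conflict), while `max_bank_conflict(swizzled_index, c)` is 1. The same index function has to be used for both the store into shared memory and the later load.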
