Merged (23 commits)
2411032  feat(ops): add native Conv1d CUDA kernel (#180)  (m96-chan, Jan 6, 2026)
feb3304  feat(examples): add Llama Guard 3 content safety classifier  (m96-chan, Jan 12, 2026)
5fcf3c3  feat(llm): add LLaMA 4 native CUDA kernels  (m96-chan, Jan 16, 2026)
86c821b  chore: add security benchmark scripts to gitignore  (m96-chan, Jan 16, 2026)
9eb3a4b  chore: update cutlass submodule (alignment fix)  (m96-chan, Jan 16, 2026)
ee6b1fe  feat(attention): add Flash Attention 3 for SM120 (Blackwell)  (m96-chan, Jan 16, 2026)
1241b74  feat(ops): add TMA utilities for SM90+ kernels  (m96-chan, Jan 16, 2026)
adee44b  wip(fa3): add TMA-enabled Flash Attention 3 kernel  (m96-chan, Jan 16, 2026)
dcd1f8e  feat(attention): integrate TMA FA3 into SDPA dispatch  (m96-chan, Jan 16, 2026)
5f763de  fix(fa3): resolve __syncthreads divergence causing kernel hang at scale  (m96-chan, Jan 16, 2026)
028af30  refactor(fa3): parallelize softmax and fix consumer warp indexing  (m96-chan, Jan 16, 2026)
a7c814c  fix(fa3): resolve non-determinism in TMA FA3 attention kernel  (m96-chan, Jan 16, 2026)
e3214b3  wip(fa4): add Flash Attention 4 SM120 Phase 1 BF16 baseline  (m96-chan, Jan 16, 2026)
b676995  wip(fa4): Phase 2 NVFP4 Q@K^T external validation  (m96-chan, Jan 16, 2026)
b7f66c3  bench(fa4): add Phase 3 full NVFP4 pipeline validation  (m96-chan, Jan 16, 2026)
d22576e  docs(fa4): add SM120 implementation report  (m96-chan, Jan 16, 2026)
7b99c62  feat(fa3): add SM120 tuning configs with version selection  (m96-chan, Jan 16, 2026)
1ac32c8  docs(fa3): document sync requirements in SM120 kernel  (m96-chan, Jan 17, 2026)
dc3658f  feat(fp8): add native PTX inline assembly for FP8 block-scale MMA  (m96-chan, Jan 17, 2026)
a9b75e2  feat(fa3): add FP8 block-scale MMA Flash Attention 3 for SM120  (m96-chan, Jan 17, 2026)
ce6de2b  feat(nn): add fused kernels for RMSNorm+Residual, SwiGLU, GeGLU  (m96-chan, Jan 26, 2026)
bea759c  fix(lint): resolve ruff lint errors in fused kernel files  (m96-chan, Jan 26, 2026)
0820572  fix(mypy): add type annotation for scores_max in llama4.py  (m96-chan, Jan 26, 2026)
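Commit ce6de2b fuses elementwise ops such as SwiGLU into single kernels. As a point of reference for what the fused op computes (a minimal NumPy sketch of the standard SwiGLU formulation, not the repository's CUDA implementation; the weight names `w_gate` and `w_up` are illustrative):

```python
import numpy as np

def silu(x):
    # SiLU (swish) activation: x * sigmoid(x), written as x / (1 + e^-x)
    return x / (1.0 + np.exp(-x))

def swiglu(x, w_gate, w_up):
    # SwiGLU: silu(x @ w_gate) * (x @ w_up).
    # A fused kernel computes the gate, the up-projection, and the
    # elementwise product in one pass instead of three separate launches.
    return silu(x @ w_gate) * (x @ w_up)

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 4))
w_gate = rng.standard_normal((4, 8))
w_up = rng.standard_normal((4, 8))
out = swiglu(x, w_gate, w_up)
assert out.shape == (2, 8)
```

The fusion here is purely a launch-count and memory-traffic optimization: the math is identical to running the three ops separately.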
1 change: 1 addition & 0 deletions in .gitignore

```diff
@@ -150,3 +150,4 @@ test_gpu/
 .claude/memory.jsonl
 .claude/benchmarks.db
 .claude/logs/
+examples/security/*_benchmark.py
```
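The added pattern ignores generated benchmark scripts under examples/security/ (commit 86c821b). A quick way to sanity-check which filenames a glob like this matches, using Python's `fnmatch` (whose rules are similar, though not identical, to git's gitignore matching):

```python
from fnmatch import fnmatch

# The glob component from the new .gitignore entry; git applies it
# to filenames under examples/security/.
pattern = "*_benchmark.py"

matches = {
    name: fnmatch(name, pattern)
    for name in ["attn_benchmark.py", "benchmark.py", "run.py"]
}
# Only names ending in "_benchmark.py" match; "benchmark.py" itself
# does not, because the pattern requires the underscore.
print(matches)
```

Note that git's matcher additionally handles `**`, negation (`!`), and directory-only patterns, which `fnmatch` does not.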