Changes from all commits (29 commits)
- 0aacbdc LUT by CC; tune more iters (Starmys, Jan 30, 2023)
- deca07d fix matmul param checker; bench unbind device (Starmys, Jan 31, 2023)
- 8528fa7 rearrange tuning code (Starmys, Jan 31, 2023)
- b718c59 LUT maker (Starmys, Jan 31, 2023)
- 67a981e fix matmul parameter checker (Starmys, Feb 2, 2023)
- bb91310 update LUT maker (Starmys, Feb 2, 2023)
- 624f234 fix LUT maker: aggregate log by idxmin (Starmys, Feb 2, 2023)
- d6e6386 fix sparse softmax & BCSR kernel; add 61 LUTs (Starmys, Feb 6, 2023)
- e1cc4cd add 70 LUTs (as default) (Starmys, Feb 6, 2023)
- bd11b5f add 75 LUTs (Starmys, Feb 6, 2023)
- 530e530 fix kernel.set_parameters() (Starmys, Feb 6, 2023)
- 40ea972 Merge branch 'main' of github.com:Starmys/SparTA into main (Starmys, Feb 6, 2023)
- 66f1748 refactoring: functional (Starmys, Feb 21, 2023)
- 1cbcbfe update operators; combine functions and operators (Starmys, Feb 24, 2023)
- 8de6950 move sparse attr to kernel level (Starmys, Mar 7, 2023)
- 604a5b3 fix port connection (Starmys, Mar 21, 2023)
- ff52951 SparseLinear DSD support dynamic input shape (Starmys, Mar 23, 2023)
- 8cad262 FlashSparseAttentionForwardKernel; fix softmax kernels (Starmys, Apr 13, 2023)
- 370a6db FlashSparseAttentionBackwardKernel with limited performance (Starmys, Apr 27, 2023)
- cba098b update FlashSparseAttentionBackwardKernel (Starmys, Apr 27, 2023)
- 6f8b721 update FlashSparseAttentionBackwardKernel (Starmys, May 4, 2023)
- 7aaf4c5 Flash Attention fp16 forward version 1 (Starmys, May 9, 2023)
- dcc4197 Flash Attention fp16 forward version 2: pad (bank conflict) & fp32-so… (Starmys, May 9, 2023)
- a1ff80c Flash Attention fp16 backward version 1 (Starmys, May 11, 2023)
- 2a16117 use dynamic shared memory in Flash Attention (Starmys, May 12, 2023)
- 92a6781 Flash Attention fp16 backward version 2 (Starmys, May 12, 2023)
- 1f636e1 update Flash Attention kernel: transpose (N, H) is not required (Starmys, May 16, 2023)
- d4027ba update Flash Attention FP32 kernel: transpose (N, H) is not required (Starmys, May 16, 2023)
- 5bcae7d Fix sparse kernel unit tests (Starmys, Feb 18, 2024)
1 change: 1 addition & 0 deletions .gitignore
@@ -8,3 +8,4 @@ _build
 generated
 test/bench/*/latency.csv
 test/bench/*/latency.png
+test/lut_maker/*.log.csv
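The new ignore entry follows the same shell-style globbing as the existing benchmark entries: in `.gitignore`, a single `*` matches within one path segment and does not cross `/`. As an illustrative sketch (not part of the change, with hypothetical file names), Python's `pathlib` mirrors this matching behavior:

```python
from pathlib import PurePosixPath

# Files the patterns are meant to cover (hypothetical names).
assert PurePosixPath('test/bench/matmul/latency.csv').match('test/bench/*/latency.csv')
assert PurePosixPath('test/lut_maker/tune.log.csv').match('test/lut_maker/*.log.csv')

# A single '*' matches exactly one path segment, so deeper files do not match.
assert not PurePosixPath('test/bench/a/b/latency.csv').match('test/bench/*/latency.csv')
```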
6 changes: 3 additions & 3 deletions docs/1-code-specializer.md
@@ -10,9 +10,9 @@ To balance between the flexibility, performance, and developing efficiency, we a

 | Layer | Base Class | Role |
 | :- | :- | :- |
-| Sparse Operator | [`sparta.nn.OperatorBase`](reference/nn.rst) | User interface as `torch.nn.Module` |
-| Sparse Context | `sparta.specializer.funtional.SparseCtxBase` | Function context to interact with `torch.autograd.Function` |
-| Sparse Kernel Placeholder | `sparta.specializer.funtional.KernelPlaceholder` | Collection of multiple kernel implementations |
+| Sparse Operator | [`sparta.nn.SparseOperator`](reference/nn.rst) | User interface as `torch.nn.Module` |
+| Sparse Context | `sparta.specializer.functional.SparseCtxBase` | Function context to interact with `torch.autograd.Function` |
+| Sparse Kernel Placeholder | `sparta.specializer.functional.KernelPlaceholder` | Collection of multiple kernel implementations |
 | Sparse Kernel | `sparta.specializer.kernels.KernelBase` | Tunable sparse CUDA kernel interface |

 ## Generating CUDA Codes
2 changes: 1 addition & 1 deletion docs/reference/nn.rst
@@ -2,7 +2,7 @@
 sparta.nn
 ===================================

-.. autoclass:: sparta.nn.OperatorBase
+.. autoclass:: sparta.nn.SparseOperator
     :members:

 .. autoclass:: sparta.nn.SparseLinear
4 changes: 2 additions & 2 deletions examples/sparse_attention.ipynb
@@ -122,7 +122,7 @@
 "source": [
 "Check whether the sparse operator works correctly.\n",
 "\n",
-"We provide `sparta.testing.sparse_multi_head_attention_reference()` function to calculate masked attention using dense method."
+"We provide `sparta.testing.sparse_multi_head_attention_forward_reference()` function to calculate masked attention using dense method."
 ]
 },
 {
@@ -141,7 +141,7 @@
 "value.requires_grad = True\n",
 "\n",
 "def dense_attention(query, key, value):\n",
-"    return sparta.testing.sparse_multi_head_attention_reference(query, key, value, mask)\n",
+"    return sparta.testing.sparse_multi_head_attention_forward_reference(query, key, value, mask)\n",
 "\n",
 "for sparse_out, dense_out in zip(forward_backward(dense_attention), forward_backward(sparse_attention)):\n",
 "    torch.testing.assert_close(sparse_out, dense_out)"
13 changes: 7 additions & 6 deletions setup.py
@@ -21,14 +21,15 @@
 os.makedirs(os.path.join('csrc', 'build'), exist_ok=True)
 with open(os.path.join('csrc', 'build', 'moe_sparse_forward_kernel.cu'), 'w') as f:
     f.write(moe_kernel)
+
 moe_ext = CUDAExtension(
-    name='sparse_moe_cpp',
+    name='sparta.sp_moe_ops',
     sources=[
         os.path.join('csrc', 'moe_sparse_forward.cpp'),
         os.path.join('csrc', 'build', 'moe_sparse_forward_kernel.cu'),
     ],
     extra_compile_args=[
-        '-std=c++14',
+        '-std=c++17',
         '-O3',
         '-U__CUDA_NO_HALF_OPERATORS__',
         '-U__CUDA_NO_HALF_CONVERSIONS__',
@@ -37,12 +38,12 @@
 ext_modules.append(moe_ext)

 seqlen_dynamic_attention_ext = CUDAExtension(
-    name='seqlen_dynamic_sparse_attention_cpp',
+    name='sparta.sp_attn_ops',
     sources=[
         os.path.join('csrc', 'seqlen_dynamic_sparse_attention_forward.cpp'),
         os.path.join('csrc', 'seqlen_dynamic_sparse_attention_forward_kernel.cu'),
     ],
-    extra_compile_args=['-std=c++14', '-O3'],
+    extra_compile_args=['-std=c++17', '-O3'],
 )
 ext_modules.append(seqlen_dynamic_attention_ext)

@@ -63,8 +64,8 @@
     cmdclass={'build_ext': BuildExtension},
     include_package_data=True,
     package_data={
-        'sparta.specializer.kernels.templates': ['*.j2'],
-        'sparta.specializer.kernels.look_up_tables': ['*.csv'],
+        'sparta.kernels.templates': ['*.j2'],
+        'sparta.kernels.look_up_tables': ['*.csv'],
         'sparta.tesa.templates': ['*.j2'],
     },
 )
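The dotted extension names introduced in this diff (`sparta.sp_moe_ops`, `sparta.sp_attn_ops`) make the compiled modules install as submodules of the `sparta` package rather than as top-level modules. A minimal sketch of that pattern with `torch.utils.cpp_extension` (illustrative only; the package and file names here are assumptions, not SparTA's actual layout):

```python
# Sketch: building a CUDA extension as a package submodule (requires PyTorch).
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

setup(
    name='mypkg',
    ext_modules=[
        CUDAExtension(
            # A dotted name places the built module at mypkg.ops,
            # so user code can write `from mypkg import ops`
            # instead of importing a loose top-level module.
            name='mypkg.ops',
            sources=['csrc/ops.cpp', 'csrc/ops_kernel.cu'],
            extra_compile_args=['-std=c++17', '-O3'],
        ),
    ],
    cmdclass={'build_ext': BuildExtension},
)
```

The `-std=c++17` bump matches recent PyTorch releases, whose headers require C++17 and reject `-std=c++14` at compile time.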
119 changes: 0 additions & 119 deletions sparta/common/tuning.py

This file was deleted.

99 changes: 0 additions & 99 deletions sparta/common/utils.py

This file was deleted.

7 changes: 7 additions & 0 deletions sparta/kernels/__init__.py
@@ -0,0 +1,7 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT license.
+
+from sparta.kernels.kernel_base import KernelBase, SparsityAttr, KernelGroup
+from sparta.kernels.matmul import SparseMatMulKernel, SparTASparseMatMulKernel, OpenAISparseMatMulKernel
+from sparta.kernels.softmax import SparseSoftmaxForwardKernel, SparTASparseSoftmaxForwardKernel, SparseSoftmaxBackwardKernel, SparTASparseSoftmaxBackwardKernel
+from sparta.kernels.attention import FlashSparseAttentionFP32ForwardKernel, FlashSparseAttentionFP32BackwardKernel, FlashSparseAttentionFP16ForwardKernel, FlashSparseAttentionFP16BackwardKernel