- All materials in this repository are based on lectures and code from Inflearn's CUDA course.
- The course link is as follows:
- All lecture materials and code are subject to the instructor's license and may be used only for educational purposes by course attendees.
- Commercial use is prohibited!!
- NVIDIA GPU
- OS: Ubuntu 20.04 (my setup), Windows 10 or later, Mac
- Install CUDA (I use v12.1)
- Pure Python environment (not a conda environment)
- If you want python3 to be the default python, set it up as follows:
sudo apt update
sudo apt install python3.8
sudo update-alternatives --install /usr/bin/python python /usr/bin/python3.8 10
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.8 10 # not needed if you don't use python3
sudo update-alternatives --config python3 # not needed if you don't use python3
- Install GLFW:
sudo apt-get install libglfw3-dev libglfw3
- Build and install CMake 3.30.0 from source (remove the apt version first):
sudo apt purge cmake
sudo apt install wget build-essential
wget https://github.com/Kitware/CMake/releases/download/v3.30.0/cmake-3.30.0.tar.gz
tar -xvzf cmake-3.30.0.tar.gz
cd cmake-3.30.0
./bootstrap --prefix=/usr/local
make
sudo make install
cmake --version
- If cmake --version does not show the new version, add /usr/local/bin to your PATH in ~/.bashrc:
vi ~/.bashrc
PATH=/usr/local/bin:$PATH:$HOME/bin
source ~/.bashrc
- Download CUDA Samples (I use v12.1):
wget https://github.com/NVIDIA/cuda-samples/archive/refs/tags/v12.1.tar.gz
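tar -zxvf v12.1.tar.gz
- or download the zip archive instead:
wget https://github.com/NVIDIA/cuda-samples/archive/refs/tags/v12.1.zip
unzip v12.1.zip
- Build the samples:
cd cuda-samples-12.1
make
sudo make install
- $ ubuntu-drivers devices (lists the recommended NVIDIA drivers)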
- $ sudo apt install nvidia-driver-xx
- Reboot!
- $ nvidia-smi (to check that the NVIDIA driver is working)
- Visit CUDA Zone to download the CUDA Toolkit
- $ sudo apt-get install build-essential (to get the GCC compilers)
- $ nvcc -V (you should now see the NVIDIA CUDA compiler version information)
- In each section, build the project as shown below and run the generated executable.
mkdir build
cd build
cmake ..
make
./generated_execution_file
1. part1_cuda_kernel: Start CUDA programming | Certificate
- print hello cuda (on Ubuntu)
- memory copy
- vector addition on the CPU or with CUDA
- error checking
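The Part 1 topics compare vector addition on the CPU with the CUDA version. The sketch below is a plain C++ CPU reference for vector addition. It also includes a tolerance-based check for results copied back from the GPU. It is not the course's code, and `vecAddCPU` and `resultsMatch` are hypothetical names. The check covers result validation only. CUDA API errors are usually checked separately through the `cudaError_t` values that runtime calls such as `cudaMemcpy` return, or through `cudaGetLastError` after a kernel launch.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// CPU reference for c[i] = a[i] + b[i]. A CUDA kernel typically assigns one
// element per thread instead of looping over all of them.
std::vector<float> vecAddCPU(const std::vector<float>& a,
                             const std::vector<float>& b) {
    assert(a.size() == b.size());
    std::vector<float> c(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        c[i] = a[i] + b[i];
    }
    return c;
}

// Returns true if every element of `gpu` is within `tol` of `ref`.
// Intended for comparing a device result (after copying it back to the host)
// against the CPU reference.
bool resultsMatch(const std::vector<float>& ref,
                  const std::vector<float>& gpu, float tol = 1e-6f) {
    if (ref.size() != gpu.size()) return false;
    for (std::size_t i = 0; i < ref.size(); ++i) {
        if (std::fabs(ref[i] - gpu[i]) > tol) return false;
    }
    return true;
}
```

A tolerance is used instead of exact equality because floating-point results from different code paths (for example, with or without FMA contraction) can differ in the last bits.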
2. part2_vector_addition: Study CUDA kernel launch | Certificate
- elapsed time
- CUDA kernel launch
- 1d vector addition
- Giga vector addition
- AXPY and FMA
- single precision
- linear interpolation
- thread and GPU
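The following CPU simulation illustrates the 1D kernel launch, AXPY with FMA, and linear interpolation topics. It is a sketch under stated assumptions, not the course's code. The function loops over `gridDim` blocks of `blockDim` threads. Each simulated thread computes its global index the way `blockIdx.x * blockDim.x + threadIdx.x` does in a kernel, and then applies AXPY (`y = a * x + y`) with `std::fma`. The sketch also shows one way to write linear interpolation `a + t * (b - a)` as two FMAs. The names `saxpySimulated` and `lerpFma` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Simulates a 1D launch of gridDim x blockDim "threads" on the CPU.
// Each thread derives its global index as a CUDA kernel would and
// skips indices past the end of the vector.
void saxpySimulated(float a, const std::vector<float>& x, std::vector<float>& y,
                    unsigned blockDim) {
    const std::size_t n = x.size();
    assert(y.size() == n && blockDim > 0);
    // Round the grid size up so every element is covered by some thread.
    const unsigned gridDim = static_cast<unsigned>((n + blockDim - 1) / blockDim);
    for (unsigned blockIdx = 0; blockIdx < gridDim; ++blockIdx) {
        for (unsigned threadIdx = 0; threadIdx < blockDim; ++threadIdx) {
            const std::size_t i = static_cast<std::size_t>(blockIdx) * blockDim + threadIdx;
            if (i < n) {
                y[i] = std::fma(a, x[i], y[i]);  // a * x + y with a single rounding
            }
        }
    }
}

// Linear interpolation a + t * (b - a), expressed as two fused multiply-adds:
// fma(-t, a, a) = a - t * a, then fma(t, b, ...) adds t * b.
float lerpFma(float a, float b, float t) {
    return std::fma(t, b, std::fma(-t, a, a));
}
```

Rounding the grid size up and guarding with `i < n` lets the same kernel handle a vector whose length is not a multiple of the block size.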
3. part3_memory_structure: Memory Structure | Certificate
- Memory hierarchy
- CUDA-specific 2D memory allocation functions and how to use pitched pointers
- Using 3D matrices and pitched pointers
- CUDA memory hierarchy
- Computing differences between adjacent elements: using shared memory
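The pitched-pointer topics depend on one piece of index arithmetic. `cudaMallocPitch` returns a row pitch in bytes that may exceed `width * sizeof(T)`. With `cudaMalloc3D`, each z-slice holds `height` rows of `pitch` bytes. The CPU sketch below reproduces that arithmetic. The alignment used to round up the pitch is an assumption for illustration, since the CUDA runtime chooses the real pitch. The names `pitchedOffset2D`, `pitchedOffset3D`, and `PitchedArray2D` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Byte offset of element (x, y) in a pitched 2D allocation:
// row y starts y * pitch bytes from the base, then x elements of elemSize.
std::size_t pitchedOffset2D(std::size_t x, std::size_t y,
                            std::size_t pitch, std::size_t elemSize) {
    return y * pitch + x * elemSize;
}

// Byte offset of element (x, y, z) when each z-slice holds `height` rows
// of `pitch` bytes (slice pitch = pitch * height).
std::size_t pitchedOffset3D(std::size_t x, std::size_t y, std::size_t z,
                            std::size_t pitch, std::size_t height,
                            std::size_t elemSize) {
    return (z * height + y) * pitch + x * elemSize;
}

// CPU stand-in for a pitched 2D float array. The pitch is rounded up to
// `alignment` bytes (an assumption; on the GPU the runtime picks it).
struct PitchedArray2D {
    std::size_t width, height, pitch;  // pitch is in bytes
    std::vector<float> storage;        // height * pitch bytes in total

    PitchedArray2D(std::size_t w, std::size_t h, std::size_t alignment)
        : width(w), height(h),
          pitch((w * sizeof(float) + alignment - 1) / alignment * alignment),
          storage(h * pitch / sizeof(float)) {
        assert(alignment > 0 && alignment % sizeof(float) == 0);
    }

    // Same arithmetic a kernel uses: (float*)((char*)base + y * pitch) + x
    float& at(std::size_t x, std::size_t y) {
        char* base = reinterpret_cast<char*>(storage.data());
        return *reinterpret_cast<float*>(base + pitchedOffset2D(x, y, pitch, sizeof(float)));
    }
};
```

Because the pitch is measured in bytes, the row offset must be applied to a `char*` before the pointer is cast back to the element type. Indexing a `float*` by `y * pitch` would skip four times too far.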
4. part4_matrix_multiply: Matrix Multiply | Certificate
- matrix copy
- Matrix Transpose
- Matrix Multiplication
- GEMM: general matrix-to-matrix multiplication
- Measuring the speed of CUDA variables depending on the memory they reside in
- Precision and speed
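Correctness checks for the transpose and GEMM kernels need a host reference. A minimal row-major CPU reference for GEMM (`C = alpha * A * B + beta * C`) and for matrix transpose is sketched below. It is not the course's implementation, and `gemmCPU` and `transposeCPU` are hypothetical names. In a naive CUDA GEMM kernel, each thread typically computes one `(row, col)` output, which corresponds to the body of the two outer loops here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major CPU reference: C = alpha * A * B + beta * C,
// where A is M x K, B is K x N and C is M x N.
void gemmCPU(std::size_t M, std::size_t N, std::size_t K, float alpha,
             const std::vector<float>& A, const std::vector<float>& B,
             float beta, std::vector<float>& C) {
    assert(A.size() == M * K && B.size() == K * N && C.size() == M * N);
    for (std::size_t row = 0; row < M; ++row) {
        for (std::size_t col = 0; col < N; ++col) {
            // One dot product per output element (one thread per (row, col) on a GPU).
            float sum = 0.0f;
            for (std::size_t k = 0; k < K; ++k) {
                sum += A[row * K + k] * B[k * N + col];
            }
            C[row * N + col] = alpha * sum + beta * C[row * N + col];
        }
    }
}

// Row-major transpose: `in` is rows x cols, the result is cols x rows.
std::vector<float> transposeCPU(std::size_t rows, std::size_t cols,
                                const std::vector<float>& in) {
    assert(in.size() == rows * cols);
    std::vector<float> out(rows * cols);
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            out[c * rows + r] = in[r * cols + c];
        }
    }
    return out;
}
```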
5. part5_atommic_operation: Atomic Operation | Certificate
- Control Flow
- How should if statements and for loops be optimized?
- When shared memory is used, the half-by-half approach is slightly faster than even-odd!!
- Using atomic operations to solve race conditions
- Computing a histogram with atomic operations
- Reduction problem solutions
- GEMV operation
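The half-by-half versus even-odd remark refers to two ways of pairing elements in a shared-memory reduction. The CPU simulation below sketches both schemes for a single block, assuming a power-of-two size. Each outer iteration stands for one `__syncthreads()`-separated step, and the inner loop plays the threads. The names are hypothetical. The comments give the usual explanation of the speed difference (warp divergence), not a measurement from the course.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Even-odd (interleaved) pairing: at stride s, threads whose index is a
// multiple of 2*s add element tid + s. Active threads are scattered across
// the block, so most warps mix active and idle threads (divergence).
float reduceEvenOdd(std::vector<float> data) {
    const std::size_t n = data.size();
    assert(n > 0 && (n & (n - 1)) == 0);  // power-of-two size assumed
    for (std::size_t s = 1; s < n; s *= 2) {
        for (std::size_t tid = 0; tid < n; ++tid) {
            if (tid % (2 * s) == 0) data[tid] += data[tid + s];
        }
    }
    return data[0];
}

// Half-by-half (sequential addressing): at stride s, the first s threads add
// element tid + s. Active threads stay contiguous, so whole warps become
// idle together instead of diverging.
float reduceHalfByHalf(std::vector<float> data) {
    const std::size_t n = data.size();
    assert(n > 0 && (n & (n - 1)) == 0);  // power-of-two size assumed
    for (std::size_t s = n / 2; s > 0; s /= 2) {
        for (std::size_t tid = 0; tid < s; ++tid) {
            data[tid] += data[tid + s];
        }
    }
    return data[0];
}
```

In both schemes, no thread reads a slot that another thread writes in the same step, so running the "threads" sequentially gives the same result as a parallel execution.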
6. part6_search_sort: Search & Sort | Certificate
- Linear Search
- Search All: find every matching position
- In CUDA, using a stride is the fastest approach.
- Binary Search
- With CUDA, binary search is not effective. Just use the CPU! The STL in particular is extremely fast.
- Sorting with CUDA: let's get into how to sort in CUDA in earnest!!
- Block-level parallel sorting
- CUDA even-odd sort: dramatically faster
- When sorting in parallel using global memory,
- the CUDA even-odd sort is quite slow.
- Block-level parallel sorting
- Bitonic Sort
- Think of it as a sorting method designed for parallel processing.
- Counting Merge Sort (a counting-based merge sort)
- The Large Scale Parallel Counting Merge Sort method, the one best suited to parallel processing
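For the search topics, the sketch below simulates a "search all" with a grid-stride loop on the CPU. It also shows the STL binary search that the notes recommend over a CUDA version. `totalThreads` stands in for `gridDim.x * blockDim.x`. The usual reason given for the stride approach being fast is that neighboring threads read neighboring elements, which allows coalesced memory access. The function names are hypothetical, and this is a sketch, not the course's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// "Search all" with a grid-stride loop, simulated on the CPU: thread tid
// checks indices tid, tid + totalThreads, tid + 2 * totalThreads, ...
// so on each pass neighboring threads touch neighboring elements.
std::vector<std::size_t> searchAllStrided(const std::vector<int>& data, int key,
                                          std::size_t totalThreads) {
    assert(totalThreads > 0);
    std::vector<std::size_t> found;
    for (std::size_t tid = 0; tid < totalThreads; ++tid) {
        for (std::size_t i = tid; i < data.size(); i += totalThreads) {
            if (data[i] == key) found.push_back(i);
        }
    }
    // GPU threads finish in no fixed order, so sort the hits to make
    // the result deterministic.
    std::sort(found.begin(), found.end());
    return found;
}

// Binary search on sorted data with std::lower_bound.
// Returns the index of key, or -1 if it is not present.
long binarySearchSTL(const std::vector<int>& sorted, int key) {
    auto it = std::lower_bound(sorted.begin(), sorted.end(), key);
    if (it == sorted.end() || *it != key) return -1;
    return static_cast<long>(it - sorted.begin());
}
```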
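The notes describe bitonic sort as a sorting method designed for parallel processing. Below is a CPU sketch of the standard bitonic sorting network, assuming a power-of-two input length and ascending order. In each `(k, j)` pass, every index `i` is paired with `i ^ j`, and all compare-exchanges in a pass are independent. A CUDA version can therefore give each index its own thread and synchronize between passes. `bitonicSort` is a hypothetical name, not the course's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Bitonic sort, ascending, for power-of-two lengths.
void bitonicSort(std::vector<int>& a) {
    const std::size_t n = a.size();
    assert(n > 0 && (n & (n - 1)) == 0);
    for (std::size_t k = 2; k <= n; k <<= 1) {          // size of the runs being merged
        for (std::size_t j = k >> 1; j > 0; j >>= 1) {  // compare distance in this pass
            for (std::size_t i = 0; i < n; ++i) {       // one "thread" per index
                const std::size_t partner = i ^ j;
                if (partner > i) {                      // each pair handled once
                    // Bit k of i decides whether this run is sorted up or down.
                    const bool ascending = (i & k) == 0;
                    if ((ascending && a[i] > a[partner]) ||
                        (!ascending && a[i] < a[partner])) {
                        std::swap(a[i], a[partner]);
                    }
                }
            }
        }
    }
}
```

The number of passes is fixed by `n` alone, regardless of the data. This fixed, data-independent comparison pattern is what makes the network a good fit for GPU threads.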
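The notes do not spell out the counting merge sort algorithm. The sketch below shows one common reading, a bottom-up merge sort whose merge step places each element by counting (ranking). It is not necessarily the course's Large Scale Parallel Counting Merge Sort. An element's output index is the number of elements before it in its own run plus the number of elements in the other run that must precede it. Each element's index is computed independently, so one GPU thread per element could do the same work. `countingMergeSort` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Bottom-up merge sort with a rank-counting merge.
std::vector<int> countingMergeSort(std::vector<int> a) {
    const std::size_t n = a.size();
    std::vector<int> out(n);
    for (std::size_t width = 1; width < n; width *= 2) {
        const int* src = a.data();
        for (std::size_t lo = 0; lo < n; lo += 2 * width) {
            const std::size_t mid = std::min(lo + width, n);     // left run  [lo, mid)
            const std::size_t hi = std::min(lo + 2 * width, n);  // right run [mid, hi)
            for (std::size_t i = lo; i < hi; ++i) {              // one "thread" per element
                std::size_t rank;
                if (i < mid) {
                    // Count right-run elements strictly smaller than src[i].
                    rank = (i - lo) + static_cast<std::size_t>(
                               std::lower_bound(src + mid, src + hi, src[i]) - (src + mid));
                } else {
                    // Count left-run elements <= src[i]; ties go left, keeping the sort stable.
                    rank = (i - mid) + static_cast<std::size_t>(
                               std::upper_bound(src + lo, src + mid, src[i]) - (src + lo));
                }
                out[lo + rank] = src[i];
            }
        }
        std::swap(a, out);  // merged runs become the input of the next pass
    }
    return a;
}
```

The asymmetric tie-breaking (`lower_bound` for the left run, `upper_bound` for the right run) guarantees that no two elements are assigned the same output slot when values repeat.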
- All descriptions in these materials have been modified by me, Hyunkoo Kim.
- (c) 2024. hyunkookim.me@gmail.com. All rights reserved.