Study_CUDA_Programming (based on C++11)

  • All materials in this repository are based on lectures and code from Inflearn's CUDA course.
  • The link to the course is as follows:
  • All lecture materials and code follow the instructor's license and are for educational use within the course only.
  • Commercial use is prohibited!

Prepare the following:

  • Nvidia GPU
  • OS: Ubuntu 20.04 (my setup), Windows 10 or later, or Mac
  • CUDA installed (v12.1 in my setup)
  • Pure Python environment (not a conda environment)
    • If you want python3 to be the default python, set it up as follows:
sudo apt update
sudo apt install python3.8
sudo update-alternatives --install /usr/bin/python python /usr/bin/python3.8 10
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.8 10 # not needed if you don't use python3
sudo update-alternatives --config python3 # not needed if you don't use python3

Install glfw3 packages

sudo apt-get install libglfw3-dev libglfw3

Install cmake 3.30

sudo apt purge cmake
sudo apt install wget build-essential

wget https://github.com/Kitware/CMake/releases/download/v3.30.0/cmake-3.30.0.tar.gz
tar -xvzf cmake-3.30.0.tar.gz
cd cmake-3.30.0
./bootstrap --prefix=/usr/local
make
sudo make install
cmake --version
  • If the cmake command is not found, add /usr/local/bin to your PATH as follows:
vi ~/.bashrc

PATH=/usr/local/bin:$PATH:$HOME/bin

source ~/.bashrc

Download CUDA Samples (for me, v12.1)

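# Download and extract either the tarball ...
wget https://github.com/NVIDIA/cuda-samples/archive/refs/tags/v12.1.tar.gz
tar -zxvf v12.1.tar.gz
# ... or the zip archive (same contents)
wget https://github.com/NVIDIA/cuda-samples/archive/refs/tags/v12.1.zip
unzip v12.1.zip
# Build from the extracted directory
cd cuda-samples-12.1
make
sudo make install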

CUDA for Ubuntu

  • $ ubuntu-drivers devices
  • $ sudo apt install nvidia-driver-xx
    • reboot!
  • $ nvidia-smi (only to check your NVIDIA driver)
    • visit CUDA Zone to get the CUDA toolkit
  • $ sudo apt install build-essential (to get the GCC compilers)
  • $ nvcc -V (you should now see the NVIDIA CUDA Compiler version message)

CUDA Tutorial

  • In each section, build the project as shown below and run the generated executable.
mkdir build
cd build
cmake ..
make
./generated_execution_file

This tutorial is structured as follows:

1. part1_cuda_kernel: Start CUDA programming | Certificate

  • print hello cuda (on Ubuntu)
  • memory copy
  • vector addition on the CPU or with CUDA
  • error check

2. part2_vector_addition: Study CUDA kernel launch | Certificate

  • elapsed time
  • CUDA kernel launch
  • 1d vector addition
  • Giga vector addition
  • AXPY and FMA
    • single precision
    • linear interpolation
  • thread and GPU
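
To make the AXPY, FMA, and linear interpolation items concrete, here is a minimal CPU-side sketch in C++11. It is not the course's code: the names `saxpy` and `lerp_fma` are hypothetical, and the plain loop stands in for the thread grid, where each CUDA thread would handle a single index `i`.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// AXPY in single precision: y[i] = a * x[i] + y[i].
// Assumption: in a CUDA kernel each thread would compute one index i;
// the loop below plays the role of the whole thread grid.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        // std::fma evaluates a * x[i] + y[i] with a single rounding step.
        y[i] = std::fma(a, x[i], y[i]);
    }
}

// Linear interpolation z = (1 - t) * x + t * y, rewritten as two FMAs:
// z = fma(t, y, fma(-t, x, x)).
float lerp_fma(float x, float y, float t) {
    return std::fma(t, y, std::fma(-t, x, x));
}
```

Calling `saxpy(2.0f, x, y)` updates `y` in place, and `lerp_fma(x, y, t)` returns `x` at `t = 0` and `y` at `t = 1`.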

3. part3_memory_structure: Memory Structure | Certificate

  • Memory hierarchy
  • CUDA-specific 2D memory allocation functions and how to use pitched pointers
  • Using 3D matrices with pitched pointers
  • CUDA memory hierarchy
  • Computing differences between adjacent elements: using shared memory
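
The pitched-pointer addressing used with 2D allocations can be sketched on the CPU as follows. With `cudaMallocPitch`, rows are `pitch` bytes apart rather than `width * sizeof(T)` bytes apart, so element (row, col) lives at byte offset `row * pitch + col * sizeof(T)`. The helpers below (`padded_pitch`, `pitched_offset`, `store_float`, `load_float`) are hypothetical names for illustration, and the 16-byte alignment in the usage note is an arbitrary assumption; on a real device the driver chooses the pitch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Round a row of `width` floats up to a multiple of `align` bytes.
// This only imitates the padding; the real pitch comes from the CUDA runtime.
std::size_t padded_pitch(std::size_t width, std::size_t align) {
    assert(align > 0);
    std::size_t row_bytes = width * sizeof(float);
    return (row_bytes + align - 1) / align * align;
}

// Byte offset of element (row, col): rows are `pitch` bytes apart,
// columns are sizeof(float) bytes apart.
std::size_t pitched_offset(std::size_t row, std::size_t col, std::size_t pitch) {
    return row * pitch + col * sizeof(float);
}

// Write one float into the byte buffer at the given offset.
void store_float(std::vector<unsigned char>& buf, std::size_t off, float v) {
    assert(off + sizeof v <= buf.size());
    std::memcpy(&buf[off], &v, sizeof v);
}

// Read one float back from the byte buffer at the given offset.
float load_float(const std::vector<unsigned char>& buf, std::size_t off) {
    assert(off + sizeof(float) <= buf.size());
    float v;
    std::memcpy(&v, &buf[off], sizeof v);
    return v;
}
```

For a 3-row matrix that is 5 floats wide, `padded_pitch(5, 16)` pads each 20-byte row to 32 bytes, and a buffer of `pitch * 3` bytes holds the whole matrix.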

4. part4_matrix_multiply: Matrix Multiply | Certificate

  • matrix copy
  • Matrix Transpose
  • Matrix Multiplication
  • GEMM: general matrix-to-matrix multiplication
  • Measuring CUDA variable access speed by memory type
  • Precision and speed improvements
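
As a reference for the GEMM item, the operation C = alpha * A * B + beta * C can be written as a plain C++11 loop nest. This is a sketch under the assumption of row-major storage, not the course's kernel; the function name `gemm` and its parameter order are chosen here for illustration. In a CUDA version, each (row, col) pair would typically be assigned to one thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// C (M x N) = alpha * A (M x K) * B (K x N) + beta * C, all row-major.
void gemm(std::size_t M, std::size_t N, std::size_t K, float alpha,
          const std::vector<float>& A, const std::vector<float>& B,
          float beta, std::vector<float>& C) {
    assert(A.size() == M * K && B.size() == K * N && C.size() == M * N);
    for (std::size_t row = 0; row < M; ++row) {
        for (std::size_t col = 0; col < N; ++col) {
            // Dot product of row `row` of A with column `col` of B.
            float acc = 0.0f;
            for (std::size_t k = 0; k < K; ++k) {
                acc += A[row * K + k] * B[k * N + col];
            }
            C[row * N + col] = alpha * acc + beta * C[row * N + col];
        }
    }
}
```

A CPU reference like this is also handy for checking the output of a GPU kernel element by element.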

5. part5_atommic_operation: Atomic Operation | Certificate

  • Control Flow
    • How should if statements and for loops be optimized?
    • When shared memory is used, half-by-half is slightly faster than even-odd!
  • Using atomic operations to solve race conditions
  • Computing a histogram with atomic operations
  • Solutions to the reduction problem
  • GEMV operation
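
The two reduction orderings named above can be sketched on the CPU like this. The interpretation is an assumption, since the README does not define the terms: "half-by-half" is taken to mean folding the upper half of the array onto the lower half at each step, and "even-odd" to mean adding each odd-positioned partner onto its even-positioned neighbor with a doubling stride. The function names are hypothetical, and both versions assume a power-of-two length.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

bool is_power_of_two(std::size_t n) { return n != 0 && (n & (n - 1)) == 0; }

// Half-by-half: at each step, "threads" 0..stride-1 add the element
// `stride` positions ahead, so the active range halves every step.
float reduce_half_by_half(std::vector<float> data) {
    assert(is_power_of_two(data.size()));
    for (std::size_t stride = data.size() / 2; stride > 0; stride /= 2) {
        for (std::size_t i = 0; i < stride; ++i) {
            data[i] += data[i + stride];
        }
        // A CUDA kernel would need a barrier such as __syncthreads() here.
    }
    return data[0];
}

// Even-odd (interleaved): at each step, every element at a multiple of
// 2 * stride absorbs its partner `stride` positions ahead.
float reduce_even_odd(std::vector<float> data) {
    assert(is_power_of_two(data.size()));
    for (std::size_t stride = 1; stride < data.size(); stride *= 2) {
        for (std::size_t i = 0; i + stride < data.size(); i += 2 * stride) {
            data[i] += data[i + stride];
        }
    }
    return data[0];
}
```

Both functions return the same sum; the difference on a GPU lies in which threads stay active and which memory locations they touch at each step.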

6. part6_search_sort: Search & Sort | Certificate

  • Linear Search
  • Search All: find every matching position
    • Using a stride is the fastest approach in CUDA.
  • Binary Search
    • Binary search is not effective with CUDA.
    • Just use the CPU! The STL in particular is very fast.
  • Sorting in CUDA, in depth
    • Block-level parallel sorting
      • CUDA even-odd sort: becomes very fast
    • When running a parallel sort in global memory,
      • even the CUDA (even-odd) version is quite slow.
  • Bitonic Sort
    • Can be regarded as a sorting method designed for parallel processing
  • Counting Merge Sort
    • The Large Scale Parallel Counting Merge Sort method, the one best suited to parallel processing
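
To show why even-odd sort and bitonic sort suit parallel hardware, here is a sequential C++11 sketch of both. It is an illustration and not the course's implementation: the names `odd_even_sort` and `bitonic_sort` are chosen here, and the bitonic version assumes a power-of-two input length. In each inner loop, the compare-and-swap operations touch disjoint pairs, so a CUDA kernel could run them as independent threads with a barrier between steps.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Even-odd (odd-even transposition) sort: n phases that alternate between
// pairs starting at even indices and pairs starting at odd indices.
void odd_even_sort(std::vector<int>& a) {
    const std::size_t n = a.size();
    for (std::size_t phase = 0; phase < n; ++phase) {
        for (std::size_t i = phase % 2; i + 1 < n; i += 2) {
            if (a[i] > a[i + 1]) std::swap(a[i], a[i + 1]);
        }
    }
}

// Bitonic sort: builds bitonic sequences of size k = 2, 4, ..., n and merges
// each one with compare distances j = k/2, k/4, ..., 1.
void bitonic_sort(std::vector<int>& a) {
    const std::size_t n = a.size();
    assert(n != 0 && (n & (n - 1)) == 0);  // power-of-two length assumed
    for (std::size_t k = 2; k <= n; k *= 2) {
        for (std::size_t j = k / 2; j > 0; j /= 2) {
            for (std::size_t i = 0; i < n; ++i) {
                const std::size_t partner = i ^ j;
                if (partner <= i) continue;  // handle each pair once
                const bool ascending = (i & k) == 0;
                if (ascending ? (a[i] > a[partner]) : (a[i] < a[partner])) {
                    std::swap(a[i], a[partner]);
                }
            }
        }
    }
}
```

Both functions sort in place in ascending order. The even-odd version needs n phases, while the bitonic network needs on the order of log² n steps, which is one reason it is often preferred for larger inputs.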

Additional Comments

  • All descriptions in the materials have been modified by me, Hyunkoo Kim.
  • (c) 2024. hyunkookim.me@gmail.com. All rights reserved.