This repository now includes a small CMake-based framework for CUDA operator practice.
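Build (Release):
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j
Optional architecture override (for example, 86 targets GPUs of compute capability 8.6):
cmake -S . -B build -DCMAKE_CUDA_ARCHITECTURES=86
Run the tests (ctest --test-dir requires CMake 3.20 or newer):
ctest --test-dir build --output-on-failure
Current test:
- vec_add_correctness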
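A correctness test in this layout can compare the GPU result against a CPU reference and exit with a nonzero status on mismatch, which ctest reports as a failure. The sketch below is illustrative only: the kernel name vec_add_kernel, the helper vec_add_max_error, the float element type, and the 256-thread block size are assumptions, not taken from the repository's vec_add_correctness test.

```cuda
#include <cassert>
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: c[i] = a[i] + b[i] for i < n.
__global__ void vec_add_kernel(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];  // guard the partially filled last block
}

// Hypothetical helper: runs vec_add on the GPU and returns the largest absolute
// difference from a CPU reference, or -1.0f if any CUDA call fails.
float vec_add_max_error(int n) {
    assert(n > 0);  // an empty grid would be an invalid launch configuration
    std::vector<float> ha(n), hb(n), hc(n);
    for (int i = 0; i < n; ++i) {
        ha[i] = 0.5f * i;
        hb[i] = 1.0f - 0.25f * i;
    }
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float *da = nullptr, *db = nullptr, *dc = nullptr;
    cudaError_t err = cudaMalloc(&da, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&db, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&dc, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(da, ha.data(), bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(db, hb.data(), bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        const int block = 256;
        const int grid = (n + block - 1) / block;
        vec_add_kernel<<<grid, block>>>(da, db, dc, n);
        err = cudaGetLastError();  // catches launch-configuration errors
    }
    // cudaMemcpy waits for the kernel and also reports errors raised while it ran.
    if (err == cudaSuccess) err = cudaMemcpy(hc.data(), dc, bytes, cudaMemcpyDeviceToHost);
    cudaFree(da);  // cudaFree(nullptr) is a no-op, so partial failures are safe
    cudaFree(db);
    cudaFree(dc);
    if (err != cudaSuccess) return -1.0f;

    float max_err = 0.0f;
    for (int i = 0; i < n; ++i) {
        max_err = std::fmax(max_err, std::fabs(hc[i] - (ha[i] + hb[i])));
    }
    return max_err;
}
```

A test's main could then compute e = vec_add_max_error(1 << 20) and return 0 only if e is between 0 and a small tolerance. A single float addition normally rounds identically on host and device, so the error here should be exactly zero, but keeping a tolerance lets the same pattern serve operators whose results can legitimately differ, such as reductions.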
Run the benchmark:
./build/vec_add_bench --size 16777216 --warmup 20 --iters 100
Arguments:
- --size: number of elements (default 1<<24 = 16777216)
- --warmup: number of untimed warmup launches before measurement
- --iters: number of measured launches
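The --warmup and --iters arguments suggest the usual event-based measurement pattern: run some untimed launches first, then record CUDA events around the timed ones and divide the elapsed time by the iteration count. A minimal sketch, in which time_launches and effective_gbps are hypothetical names and the launch callable is assumed to enqueue its work on the default stream:

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical helper: performs `warmup` untimed launches, then times `iters`
// launches with CUDA events and returns the average milliseconds per launch.
// `launch` is any callable that enqueues GPU work on the default stream.
template <typename Launch>
float time_launches(Launch launch, int warmup, int iters) {
    assert(warmup >= 0 && iters > 0);
    for (int i = 0; i < warmup; ++i) launch();  // absorbs one-time costs such as module loading

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    for (int i = 0; i < iters; ++i) launch();
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // wait until all measured work has finished

    float total_ms = 0.0f;
    cudaEventElapsedTime(&total_ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return total_ms / iters;
}

// Hypothetical helper: effective bandwidth in GB/s for `bytes` moved per launch.
// For vec_add on floats (an assumption), bytes = 3 * n * sizeof(float):
// two arrays read and one written.
double effective_gbps(double bytes, float avg_ms) {
    return bytes / (avg_ms * 1e-3) / 1e9;
}
```

As a worked example of the bandwidth formula: with --size 16777216 and float elements, vec_add moves 3 × 16777216 × 4 = 201,326,592 bytes per launch, so an average of 1 ms per launch would correspond to about 201 GB/s.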
To add a new operator:
- Add its declaration in include/ops/<your_op>.cuh.
- Add the CUDA implementation in src/ops/<your_op>.cu.
- Register the source file with cuda_ops inside CMakeLists.txt.
- Add a correctness test in tests/<your_op>_test.cu and register it with add_test.
- Add a benchmark binary in benchmarks/<your_op>_bench.cu.
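The first two steps above can be sketched with a hypothetical elementwise operator, scale (y = alpha * x). Both parts are shown in one block so it compiles on its own; in the repository the declaration would go in include/ops/scale.cuh and the implementation in src/ops/scale.cu. The ops namespace, the function signature, and the cudaError_t return convention are assumptions, not the repository's existing interface.

```cuda
// ---- would live in include/ops/scale.cuh (hypothetical operator) ----
// Header guard omitted because this sketch is a single file.
#include <cassert>
#include <cuda_runtime.h>

namespace ops {
// Computes y[i] = alpha * x[i] for i < n; x and y are device pointers.
// The launch is asynchronous on `stream`; the return value reports launch errors only.
cudaError_t scale(const float* x, float* y, float alpha, int n, cudaStream_t stream = 0);
}  // namespace ops

// ---- would live in src/ops/scale.cu ----
namespace ops {
namespace {
__global__ void scale_kernel(const float* x, float* y, float alpha, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = alpha * x[i];  // guard the partially filled last block
}
}  // namespace

cudaError_t scale(const float* x, float* y, float alpha, int n, cudaStream_t stream) {
    if (n <= 0) return cudaSuccess;  // an empty grid would be an invalid launch configuration
    assert(x != nullptr && y != nullptr);
    const int block = 256;
    const int grid = (n + block - 1) / block;
    scale_kernel<<<grid, block, 0, stream>>>(x, y, alpha, n);
    return cudaGetLastError();
}
}  // namespace ops
```

Returning cudaError_t from the launcher, rather than aborting inside it, leaves the reaction to errors to the caller, so the same function can be used from both the correctness test and the benchmark binary.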
Reusable helpers:
- include/common/cuda_check.cuh: CUDA error checks.
- include/common/timer.cuh: event-based GPU timing.
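The contents of these two headers are not shown here; the following is only a sketch of what such helpers commonly look like. CUDA_CHECK and GpuTimer are hypothetical names and may not match what the repository's headers actually define.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical error-check macro: on failure, prints the error name, location,
// and description, then exits with a failure status.
#define CUDA_CHECK(call)                                                 \
    do {                                                                 \
        cudaError_t err_ = (call);                                       \
        if (err_ != cudaSuccess) {                                       \
            std::fprintf(stderr, "CUDA error %s at %s:%d: %s\n",         \
                         cudaGetErrorName(err_), __FILE__, __LINE__,     \
                         cudaGetErrorString(err_));                      \
            std::exit(EXIT_FAILURE);                                     \
        }                                                                \
    } while (0)

// Hypothetical RAII timer: events are timestamped on the GPU timeline, so the
// measurement covers device execution rather than host-side launch overhead.
class GpuTimer {
public:
    GpuTimer() {
        CUDA_CHECK(cudaEventCreate(&start_));
        CUDA_CHECK(cudaEventCreate(&stop_));
    }
    ~GpuTimer() {
        cudaEventDestroy(start_);
        cudaEventDestroy(stop_);
    }
    GpuTimer(const GpuTimer&) = delete;
    GpuTimer& operator=(const GpuTimer&) = delete;

    void start(cudaStream_t s = 0) { CUDA_CHECK(cudaEventRecord(start_, s)); }
    void stop(cudaStream_t s = 0) { CUDA_CHECK(cudaEventRecord(stop_, s)); }

    // Blocks until the stop event has completed, then returns elapsed milliseconds.
    float elapsed_ms() {
        CUDA_CHECK(cudaEventSynchronize(stop_));
        float ms = 0.0f;
        CUDA_CHECK(cudaEventElapsedTime(&ms, start_, stop_));
        return ms;
    }

private:
    cudaEvent_t start_{};
    cudaEvent_t stop_{};
};
```

Wrapping the events in a class with deleted copy operations ensures each event pair is destroyed exactly once, even when a benchmark returns early.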