CalvinXKY/BasicCUDA

Intro

This project is a hands-on CUDA tutorial that helps learners master CUDA fundamentals and their integration with PyTorch. The official NVIDIA documentation often takes substantial time and effort to digest, so we focus on real-world examples and concise explanations to make learning more intuitive and efficient. Several key modules come in progressive versions, so readers can build understanding step by step while learning practical optimization techniques for kernels and functions. To keep compilation straightforward, each module is self-contained and lives in its own directory.

For example, to build and run the matrix multiplication module:

cd matrix_multiply 
make
./matMul
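To give a feel for the starting point of a CPU-to-GPU progression, here is a minimal sketch of a CPU reference matrix multiplication. It is not taken from the repository: the function name `matMulCpu`, the row-major flat-vector layout, and the parameter order are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the repository): CPU reference for C = A * B.
// A is m x k, B is k x n, C is m x n, all stored row-major in flat vectors.
std::vector<float> matMulCpu(const std::vector<float>& a,
                             const std::vector<float>& b,
                             std::size_t m, std::size_t k, std::size_t n) {
    assert(a.size() == m * k);
    assert(b.size() == k * n);
    std::vector<float> c(m * n, 0.0f);
    // In a naive CUDA kernel, each (row, col) pair below would typically be
    // handled by one thread; here the two outer loops visit them in sequence.
    for (std::size_t row = 0; row < m; ++row) {
        for (std::size_t col = 0; col < n; ++col) {
            float sum = 0.0f;
            for (std::size_t i = 0; i < k; ++i) {
                sum += a[row * k + i] * b[i * n + col];
            }
            c[row * n + col] = sum;
        }
    }
    return c;
}
```

A reference like this is commonly used to check a GPU kernel's output element by element before optimizing the kernel further.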

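🔍 Chinese Blog Posts

| 📚 Article | 📖 Type | 🧩 Code |
| --- | --- | --- |
| GPU Hardware: The Classic Tesla Architecture Explained | GPU Basics | - |
| GPU Hardware: A Brief History of GPUs for AI Compute | GPU Basics | - |
| GPU Software: Understanding GPU Memory (VRAM) and Its Basic Usage | GPU Basics | link |
| GPU Hardware: Introduction to MIG GPUs and A100 MIG in Practice | GPU Basics | - |
| GPU Hardware: What Is the Difference Between Tensor Cores and CUDA Cores? | GPU Basics | - |
| GPU Hardware: Ampere Architecture Hardware Analysis and A100 Testing | GPU Basics | link |
| CUDA Global Index Calculation and Grid/Block/threadIdx Mapping | CUDA C++ | link |
| Getting Started with CUDA: Matrix Multiplication from CPU to GPU | CUDA C++ | link |
| Getting Started with CUDA: Basic Usage of Virtual Memory Management (VMM) | CUDA C++ | link |
| CUDA in Practice: The Fused ScaledMaskSoftmax Operator for Training | CUDA C++ | link |
| Getting Started with CUDA: Common Tips and Techniques | CUDA C++ | link |
| CUDA in Practice: Custom PyTorch CUDA/C++ Extensions in 20 Lines of Code | CUDA C++ | link |
| NCCL Algorithms: Topology Construction and Path Selection | GPU Networking | link |
| Interpreting NCCL Initialization Logs | GPU Networking | - |
| NCCL Communication C++ Examples (1): Walking Through and Running the Basic Examples | GPU Networking | link |
| NCCL Communication C++ Examples (2): Establishing Multi-Node Connections with Sockets | GPU Networking | link |
| NCCL Communication C++ Examples (3): Concurrent Multi-Stream Communication (Non-Blocking) | GPU Networking | link |
| NCCL Communication C++ Examples (4): AlltoAll_Split Implementation and Analysis | GPU Networking | link |
| GPU Networking: GPU Network Topology in One Diagram | GPU Basics | - |
| PyTorch GPU Memory Management: Introduction and Source Code Analysis (1) | PyTorch | link |
| PyTorch GPU Memory Management: Introduction and Source Code Analysis (2) | PyTorch | link |
| PyTorch GPU Memory Management: Introduction and Source Code Analysis (3) | PyTorch | link |
| PyTorch GPU Memory Visualization and Snapshot Data Analysis | PyTorch | link |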

About

A tutorial for CUDA&PyTorch
