txfs19260817/LeetGPU

LeetGPU

A collection of exercises from LeetGPU (GitHub), featuring implementations in CUDA, PyTorch, and Triton.

Prerequisites

Tested on WSL2 Ubuntu 24.04.

System Requirements

sudo apt-get install -y python3.12-dev libgtest-dev
uv sync

Usage

Run CUDA Tests

make build && make test

Run CUDA Benchmarks

make build-release && make bench

Run Python Tests

make py-sync && make py-test

Clean

make clean

NCU

  1. Follow the Windows section (which applies to WSL2) of NVIDIA Developer Tools Solutions: Permission Issue with Performance Counters to grant all users access to the GPU performance counters.
  2. Restart WSL by running wsl --shutdown in PowerShell.
  3. Run ncu (without sudo):
# --set=full: most comprehensive profiling
# -f: force overwrite of output files if they already exist
# --kernel-name-base demangled: use human-readable (demangled) kernel names
# --kernel-name 'regex:vector_add': only profile kernels matching the regex "vector_add"
# -o vector_add: write results to files with the "vector_add" prefix (creates .ncu-rep files)
# ./001_vector_addition_benchmark: the executable to profile, here an nvbench program; its flags are listed at https://github.com/NVIDIA/nvbench/blob/main/docs/cli_help.md
# --profile: nvbench flag, run once only
# --axis "N=67108864": nvbench flag, run the benchmark with N=67108864
ncu \
  --set=full \
  -f \
  --kernel-name-base demangled \
  --kernel-name 'regex:vector_add' \
  -o vector_add \
  ./001_vector_addition_benchmark \
  --profile \
  --axis "N=67108864"

This generates vector_add.ncu-rep, which can be opened in:

  • Nsight Compute GUI (Windows): For interactive analysis with charts and recommendations
  • Command line: ncu -i vector_add.ncu-rep for text-based analysis
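The ncu invocation in step 3 hard-codes the kernel name, the benchmark binary, and the problem size. As a minimal sketch, the same flags could be wrapped in a small bash function so other exercises can be profiled the same way. The function name ncu_profile and the DRY_RUN variable are hypothetical and not part of this repository; the sketch also assumes the benchmark exposes an axis named N, as the vector addition benchmark does.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): builds the ncu command
# from step 3 for a given kernel regex, nvbench binary, and value of N.
ncu_profile() {
  local kernel="$1"   # kernel name regex, e.g. vector_add
  local binary="$2"   # nvbench executable, e.g. ./001_vector_addition_benchmark
  local n="$3"        # value for the benchmark's N axis (assumed to exist)
  local cmd=(
    ncu
    --set=full
    -f
    --kernel-name-base demangled
    --kernel-name "regex:${kernel}"
    -o "${kernel}"
    "${binary}"
    --profile
    --axis "N=${n}"
  )
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the assembled command for inspection instead of running it.
    printf '%s\n' "${cmd[*]}"
  else
    # Run ncu with each array element passed as a separate argument.
    "${cmd[@]}"
  fi
}

# Dry run: shows the command that would be executed, without needing a GPU.
DRY_RUN=1 ncu_profile vector_add ./001_vector_addition_benchmark 67108864
```

Keeping the arguments in an array avoids the quoting problems of building the command as one string. Unset DRY_RUN to actually run the profiler.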
