C++ port of Karpathy's microgpt — a minimal, dependency-free GPT training and inference implementation.
This version faithfully reproduces the original scalar autograd + GPT architecture in C++, and adds cuBLAS-accelerated linear layers (cublasSgemv for forward, cublasSgemv + cublasSger for backward) to offload matrix-vector operations to the GPU.
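The three BLAS calls map onto the usual matrix-vector derivatives. As a plain-CPU reference (the function names here are illustrative, not the repo's actual API), the forward pass computes `y = W·x` (what `cublasSgemv` does without transpose), the input gradient is `dx = Wᵀ·dy` (`cublasSgemv` with transpose), and the weight gradient is the rank-1 update `dW += dy·xᵀ` (`cublasSger`):

```cpp
#include <cstddef>
#include <vector>

// CPU reference for the three operations the cuBLAS path offloads.
// W is (out x in), stored row-major; x/dx have length `in`, y/dy length `out`.

// Forward: y = W * x   (cublasSgemv, no transpose)
void matvec_forward(const std::vector<float>& W, const std::vector<float>& x,
                    std::vector<float>& y, std::size_t out, std::size_t in) {
    for (std::size_t o = 0; o < out; ++o) {
        float acc = 0.0f;
        for (std::size_t i = 0; i < in; ++i) acc += W[o * in + i] * x[i];
        y[o] = acc;
    }
}

// Backward w.r.t. input: dx = W^T * dy   (cublasSgemv, transposed)
void matvec_backward_input(const std::vector<float>& W, const std::vector<float>& dy,
                           std::vector<float>& dx, std::size_t out, std::size_t in) {
    for (std::size_t i = 0; i < in; ++i) {
        float acc = 0.0f;
        for (std::size_t o = 0; o < out; ++o) acc += W[o * in + i] * dy[o];
        dx[i] = acc;
    }
}

// Backward w.r.t. weights: dW += dy * x^T   (rank-1 update, cublasSger)
void matvec_backward_weight(std::vector<float>& dW, const std::vector<float>& dy,
                            const std::vector<float>& x, std::size_t out, std::size_t in) {
    for (std::size_t o = 0; o < out; ++o)
        for (std::size_t i = 0; i < in; ++i)
            dW[o * in + i] += dy[o] * x[i];
}
```

Offloading pays off because each of these is O(out·in) work that the scalar autograd engine would otherwise perform one `Value` node at a time.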
src/
├── main.cpp # data loading, training loop, sampling
├── value.h # scalar autograd engine (Value with backward)
├── gpt.h # GPT forward pass: linear, softmax, rmsnorm, multi-head attention
└── linear_cublas.h # cuBLAS wrappers for gemv forward & backward
The default build uses cuBLAS to accelerate linear layers. Requires CUDA toolkit (tested with CUDA 12.8). Adjust the CUDA path in CMakeLists.txt if needed.
```
cmake -B build
cmake --build build
./build/microgpt
```

To build without CUDA, uncomment the CPU linear loop and comment out the cuBLAS block, then remove the CUDA-related lines from CMakeLists.txt.
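For reference, the CUDA-related CMake wiring typically looks like the sketch below. This is an assumption about the build setup, not the repo's exact CMakeLists.txt; `find_package(CUDAToolkit)` requires CMake 3.17+ and resolves cuBLAS regardless of where the toolkit is installed:

```cmake
cmake_minimum_required(VERSION 3.17)
project(microgpt CXX)

find_package(CUDAToolkit REQUIRED)          # locates CUDA headers and libraries

add_executable(microgpt src/main.cpp)
target_link_libraries(microgpt PRIVATE CUDA::cublas CUDA::cudart)
```

If CMake cannot find the toolkit, point it at your install with `-DCUDAToolkit_ROOT=/path/to/cuda` instead of editing hardcoded paths.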
The program auto-downloads names.txt from the makemore dataset if input.txt is not present.
| Param | Value |
|---|---|
| n_embd | 64 |
| n_head | 4 |
| n_layer | 1 |
| block_size | 16 |
The architecture follows GPT-2 style with the same minor differences as the original: RMSNorm instead of LayerNorm, ReLU instead of GeLU, and no biases. Training runs Adam with linear LR decay for 1000 steps, then samples 20 names at temperature 0.5.
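RMSNorm is the most visible of those differences: unlike LayerNorm, it subtracts no mean and adds no bias, only rescaling by the root-mean-square of the activations. A plain-float sketch (the repo applies the same math over autograd `Value` nodes; the `eps` default is an assumption):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// RMSNorm sketch: y_i = x_i / sqrt(mean(x^2) + eps).
// No mean subtraction, no learned bias -- the only state is the input itself.
std::vector<float> rmsnorm(const std::vector<float>& x, float eps = 1e-5f) {
    float ms = 0.0f;                         // mean of squares
    for (float v : x) ms += v * v;
    ms /= static_cast<float>(x.size());
    float scale = 1.0f / std::sqrt(ms + eps);
    std::vector<float> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) y[i] = x[i] * scale;
    return y;
}
```

Dropping the mean/bias terms halves the bookkeeping in the backward pass, which matters when every scalar is a graph node.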