Skip to content

poofeth/DXRL-Starter

Repository files navigation

Below is a ready‑to‑drop‑in README.md that walks through provisioning an Ubuntu 22.04 (or newer) box with CUDA 12.8 + A100 for TRL‑based GRPO training. Everything is written as sequential shell blocks so you can copy‑paste one chunk at a time.


# ⚡️ A100 CUDA 12.8 — GRPO Training Stack (PyTorch + TRL + vLLM + FlashAttention + W&B)

This guide assumes **Ubuntu 22.04+**, a working **CUDA 12.8** driver/toolkit, and an **A100** GPU.  
The same steps work on H100 & Blackwell cards; adjust compute capability flags if you build from source.

---

## 0  Prerequisites

```bash
# Sanity‑check that Ubuntu sees the GPU & CUDA 12.8
nvidia-smi          # should list A100 + Driver built against CUDA 12.8
nvcc --version      # CUDA compilation tools release 12.8

1  System update & build tooling

sudo apt update && sudo apt -y full-upgrade

# Core build chain + Python headers (needed for PyTorch / FlashAttention wheels)
sudo apt install -y build-essential pkg-config git curl wget ca-certificates \
  libssl-dev zlib1g-dev libbz2-dev libreadline-dev libsqlite3-dev \
  libncursesw5-dev xz-utils tk-dev libxml2-dev libxmlsec1-dev libffi-dev \
  liblzma-dev ninja-build

sudo apt-get install python3-dev python3.12-dev

2  Install pyenv (Python version / venv manager)

    apt install python3.12-venv

We already have Python 3.12.3 system‑wide, but pyenv lets us isolate project envs and pin micro‑versions as needed. (Medium)


3  Create a clean Python 3.12 env for training

python3 -m venv venv
source venv/bin/activate
python -m pip install --upgrade pip wheel setuptools

4  Install PyTorch + CUDA 12.8 wheels

pip install --pre torch torchvision torchaudio \
  --index-url https://download.pytorch.org/whl/nightly/cu128   # cu128 wheels

Official cu128 wheels are distributed via the PyTorch extra index. (GitHub)

✅ Quick GPU test

python - <<'PY'
import torch, platform
print("PyTorch", torch.__version__, "CUDA", torch.version.cuda)
print("GPU OK:", torch.cuda.is_available())
print("Device:", torch.cuda.get_device_name(0))
PY

5  Install TRL (Transformer RL) & friends

pip install trl transformers datasets evaluate accelerate

(The current TRL 0.19.x line supports GRPO, DPO, RM, etc.)


6  Install vLLM pre‑built for CUDA 12.8

# Pull the wheel compiled against cu128 and the PyTorch you already installed
pip install vllm --extra-index-url https://download.pytorch.org/whl/cu128

vLLM ships wheels built with CUDA 12.8—no manual compile needed. (vLLM)


7  Install FlashAttention 2 / 3

Flash‑Attn requires a short source build to link against your local CUDA toolkit:

pip install flash-attn --no-build-isolation     # fastest path
#   or, if you need FlashAttention‑3 for Hopper/H100:
# git clone https://github.com/Dao-AILab/flash-attention.git && cd flash-attention
# python setup.py install
#then lets do liger kernel for some extra speed and memory savings...
pip install liger-kernel

Flash‑Attn 2.8+ targets CUDA ≥ 12.3, with 12.8 recommended for best perf. (PyPI)


8  Weights & Biases (experiment tracking)

pip install wandb
wandb login   # paste your API key

Lightweight install—no GPU code. (PyPI)


9  Optional data / viz tooling

pip install pandas pyarrow numpy matplotlib seaborn tqdm rich

These cover Parquet IO (pyarrow), general data munging (pandas), and quick exploratory plots (matplotlib, seaborn) without clashing with the deep‑learning stack.


🎉 You’re ready to launch GRPO runs using TRL. Want to know more? https://huggingface.co/docs/trl/main/en/grpo_trainer

python -m trl.grpo_trainer \
  --model_name_or_path facebook/opt-1.3b \
  --reward_model my_reward_model \
  --logging_steps 10 \
  --report_to wandb

For distributed multi‑GPU launches on the A100, consult DeepSpeed or Accelerate config files as needed.

Happy training!


---

**Anything missing?**  
If you need Triton, cuDNN, or additional monitoring (e.g., `gpustat`, `nvtop`), just add them under a new bullet—this scaffold is meant to stay minimal and conflict‑free.
::contentReference[oaicite:5]{index=5}

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages