Code samples showing how to distribute LLM training across GPUs/nodes. The samples are written from first principles.
- train_ffns.py: distributed training of a Transformer's FFN sub-blocks (currently implemented: DDP, FSDP and TP).
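
As a rough illustration of the idea behind DDP, the sketch below simulates data parallelism on a single process with NumPy. It is not taken from train_ffns.py: the function names (`ffn_grads`, `simulated_ddp_grads`), the ReLU FFN without biases, and the mean-squared-error loss are all assumptions made for this example. Each simulated worker computes gradients on its own shard of the batch, and averaging those gradients (the role an all-reduce plays on real hardware) should reproduce the full-batch gradient when the shards are equally sized.

```python
import numpy as np


def ffn_grads(x, t, w1, w2):
    """Gradients of loss = 0.5 * mean_i ||relu(x_i @ w1) @ w2 - t_i||^2 w.r.t. w1 and w2."""
    h_pre = x @ w1                      # first linear layer
    h = np.maximum(h_pre, 0.0)          # ReLU
    y = h @ w2                          # second linear layer
    dy = (y - t) / x.shape[0]           # dLoss/dy, averaged over the local batch
    dw2 = h.T @ dy
    dh = dy @ w2.T
    dw1 = x.T @ (dh * (h_pre > 0))      # backprop through ReLU
    return dw1, dw2


def simulated_ddp_grads(x, t, w1, w2, n_workers):
    """Hypothetical helper: shard the batch, compute local grads, then average them."""
    x_shards = np.split(x, n_workers)   # requires batch size divisible by n_workers
    t_shards = np.split(t, n_workers)
    local = [ffn_grads(xs, ts, w1, w2) for xs, ts in zip(x_shards, t_shards)]
    # Stand-in for an all-reduce with averaging across workers.
    dw1 = sum(g[0] for g in local) / n_workers
    dw2 = sum(g[1] for g in local) / n_workers
    return dw1, dw2


rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))
t = rng.normal(size=(8, 4))
w1 = rng.normal(size=(4, 16))
w2 = rng.normal(size=(16, 4))

full = ffn_grads(x, t, w1, w2)
ddp = simulated_ddp_grads(x, t, w1, w2, n_workers=4)
```

Every worker holds a full copy of `w1` and `w2` here, which is the defining trait of DDP. FSDP and TP instead shard the weights themselves, so they need additional communication steps that this sketch does not cover.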