
martin-kukla/distributed-llm-code-samples


Distributed LLM training: code samples

Code samples showing how to distribute LLM training across GPUs/nodes. The samples are written from first principles.

Files

  • train_ffns.py: distributed training of the Transformer's FFN sub-blocks (currently implemented: DDP (Distributed Data Parallel), FSDP (Fully Sharded Data Parallel) and TP (Tensor Parallelism)).
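To make the three strategies concrete, here is a minimal single-process sketch in NumPy. It is not taken from train_ffns.py. The FFN shape, the summed squared-error loss and the collective helpers (`all_reduce`, `all_gather`, `reduce_scatter`) are assumptions made for illustration, and "devices" are simulated as list entries rather than real GPUs. The sketch checks that each strategy reproduces the single-device gradients.

```python
import numpy as np

# Hypothetical setup: one FFN sub-block y = relu(x @ W1) @ W2, loss = 0.5 * sum((y - t)**2).
# "Devices" are simulated as list entries inside a single process.
rng = np.random.default_rng(0)
B, D, H, N = 8, 4, 16, 2  # batch size, model dim, hidden dim, number of simulated devices
x, t = rng.normal(size=(B, D)), rng.normal(size=(B, D))
W1, W2 = rng.normal(size=(D, H)), rng.normal(size=(H, D))

def ffn_grads(x, t, W1, W2):
    """Forward pass plus manual backward pass on one batch; returns (dW1, dW2)."""
    z = x @ W1
    h = np.maximum(z, 0.0)
    dy = h @ W2 - t                  # dLoss/dy for the summed squared error
    dW2 = h.T @ dy
    dz = (dy @ W2.T) * (z > 0)       # backprop through ReLU
    return x.T @ dz, dW2

# Simulated collectives (hypothetical helpers, not a real communication library).
def all_reduce(parts):               # every device ends up with the sum
    total = sum(parts)
    return [total.copy() for _ in parts]

def all_gather(shards):              # every device ends up with the concatenation
    full = np.concatenate(shards, axis=0)
    return [full.copy() for _ in shards]

def reduce_scatter(parts):           # sum, then device i keeps the i-th row shard
    return np.array_split(sum(parts), len(parts), axis=0)

ref_dW1, ref_dW2 = ffn_grads(x, t, W1, W2)   # single-device reference
x_shards, t_shards = np.split(x, N), np.split(t, N)

# DDP: weights replicated, batch sharded, gradients all-reduced.
local = [ffn_grads(xs, ts, W1, W2) for xs, ts in zip(x_shards, t_shards)]
ddp_dW1 = all_reduce([g[0] for g in local])[0]
ddp_dW2 = all_reduce([g[1] for g in local])[0]

# FSDP: each device stores a row shard of W1 and W2; weights are all-gathered
# before compute and gradients are reduce-scattered back to the shard owners.
W1_full = all_gather(np.split(W1, N, axis=0))
W2_full = all_gather(np.split(W2, N, axis=0))
local = [ffn_grads(x_shards[i], t_shards[i], W1_full[i], W2_full[i]) for i in range(N)]
fsdp_dW1 = reduce_scatter([g[0] for g in local])   # N row shards of dW1
fsdp_dW2 = reduce_scatter([g[1] for g in local])   # N row shards of dW2

# TP (Megatron-style): W1 split by columns, W2 by rows; every device sees the full batch.
W1_cols, W2_rows = np.split(W1, N, axis=1), np.split(W2, N, axis=0)
z = [x @ w for w in W1_cols]
h = [np.maximum(zi, 0.0) for zi in z]               # ReLU is elementwise: no communication
y = all_reduce([hi @ w for hi, w in zip(h, W2_rows)])[0]  # sum the partial outputs
dy = y - t
tp_dW2 = [hi.T @ dy for hi in h]                    # row shards of dW2
tp_dW1 = [x.T @ ((dy @ w.T) * (zi > 0)) for w, zi in zip(W2_rows, z)]  # column shards of dW1

print(np.allclose(ddp_dW1, ref_dW1),
      np.allclose(np.concatenate(fsdp_dW1, axis=0), ref_dW1),
      np.allclose(np.concatenate(tp_dW1, axis=1), ref_dW1))
```

In a real multi-GPU setup the simulated helpers would be replaced by actual collectives (for example those in `torch.distributed`). The sketch only shows where communication happens: DDP all-reduces gradients, FSDP all-gathers weights and reduce-scatters gradients, and TP all-reduces the partial FFN outputs.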
