An experiment implementing the decoder part of the transformer architecture from the paper "Attention is All You Need" by Vaswani et al. (2017) in PyTorch.
The current model was trained on TinyStories Dataset.
This project is a work in progress.
| Name | Name | Last commit date | ||
|---|---|---|---|---|