Simulation code for the paper "Training Dynamics of Multi-Head Softmax Attention for In-Context Learning: Emergence, Convergence, and Optimality". arXiv: paperlink
requirements.tex lists the frozen package versions used in this project.
Before running training.py, edit the following line so that it uses your own Weights & Biases (wandb) project and entity names:
wandb.init(project="<your project name>", entity="<your entity name>", config=hparams_dict)
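As an alternative to hard-coding the names, the arguments for this call could be assembled from environment variables. The following is a minimal sketch, not part of the repository: the helper `build_wandb_kwargs`, the default project name, and the example hyperparameter keys are all hypothetical, and it assumes `WANDB_PROJECT` and `WANDB_ENTITY` are set in your shell.

```python
import os


def build_wandb_kwargs(hparams_dict, default_project="my-project"):
    """Hypothetical helper: collect keyword arguments for wandb.init.

    Reads the project and entity names from the WANDB_PROJECT and
    WANDB_ENTITY environment variables instead of editing the source.
    """
    kwargs = {
        "project": os.environ.get("WANDB_PROJECT", default_project),
        "config": dict(hparams_dict),  # copy so the caller's dict is untouched
    }
    entity = os.environ.get("WANDB_ENTITY")
    if entity:  # leave the entity out if it is unset or empty
        kwargs["entity"] = entity
    return kwargs


# Placeholder hyperparameters (assumed); the real keys live in training.py.
hparams_dict = {"lr": 1e-3, "num_heads": 2}
init_kwargs = build_wandb_kwargs(hparams_dict)

# In training.py, with wandb installed, the call would then become:
#   import wandb
#   wandb.init(**init_kwargs)
```

With this approach the same script can be run under different accounts by changing the environment rather than the code.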