Simulation code for paper "Training Dynamics of Multi-Head Softmax Attention for In-Context Learning: Emergence, Convergence, and Optimality"

FFishy-git/MS-Attn-Simulation

MS-Attn-Simulation

Simulation code for the paper "Training Dynamics of Multi-Head Softmax Attention for In-Context Learning: Emergence, Convergence, and Optimality". arXiv: paperlink

Installation

requirements.tex lists the pinned versions of the packages used in this project.

Simulation

Before running training.py, edit the following line so that the project and entity names match your Weights & Biases account:

wandb.init(project="<your project name>", entity="<your entity name>", config=hparams_dict)
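As an alternative to editing the line by hand, the project and entity names could be taken from the environment. The sketch below is not part of the repository: the helper names wandb_init_kwargs and start_run are hypothetical, and the contents of hparams_dict are placeholders, since the real dictionary is defined in training.py. It assumes the WANDB_PROJECT and WANDB_ENTITY environment variables, which wandb itself also recognizes.

```python
import os

# Hypothetical stand-in for the hparams_dict defined in training.py;
# the keys and values here are illustrative placeholders only.
hparams_dict = {"lr": 1e-3, "n_heads": 2}


def wandb_init_kwargs(hparams, env=os.environ):
    """Build keyword arguments for wandb.init from environment variables."""
    kwargs = {"config": hparams}
    # Only pass project/entity when the corresponding variable is set and non-empty.
    for key, var in (("project", "WANDB_PROJECT"), ("entity", "WANDB_ENTITY")):
        if env.get(var):
            kwargs[key] = env[var]
    return kwargs


def start_run(hparams):
    """Start a run; wandb is imported lazily so the helper above works without it."""
    import wandb

    return wandb.init(**wandb_init_kwargs(hparams))
```

With WANDB_PROJECT and WANDB_ENTITY exported in the shell, calling start_run(hparams_dict) would be equivalent to the hand-edited wandb.init line above.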
