See code here: https://github.com/NVIDIA/Megatron-LM/blob/ad58411ddb396aeb196f6a08bd9c4000a0f10361/megatron/training/training.py#L290
Implement for all models.