Attention Positional Encoding (APE) adds positional encoding (PE) directly to the attention mechanism, leaving the embedding to serve its sole purpose.
nlp research deep-learning experimental pytorch transformer neural-networks attention attention-mechanism architecture-design sequence-modeling positional-encoding efficient-transformers transformer-variants attention-positional-encoding ape-transformer no-qkv-projection embedding-purity
Updated May 5, 2026 - Jupyter Notebook
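
The idea in the description, injecting position inside attention rather than adding it to the embeddings, can be illustrated with a minimal sketch. It is not the repository's implementation. It assumes sinusoidal encodings, an additive positional term on the attention logits, and raw embeddings used directly as queries, keys, and values (a guess prompted by the `no-qkv-projection` topic). All function names are hypothetical, and the code is plain Python rather than PyTorch so that it runs without dependencies.

```python
import math


def sinusoidal_pe(seq_len, dim):
    """Standard sinusoidal positional encodings, one row per position."""
    pe = []
    for pos in range(seq_len):
        row = []
        for i in range(dim):
            angle = pos / (10000 ** (2 * (i // 2) / dim))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_with_positional_logits(emb):
    """Self-attention over position-free embeddings.

    Assumption (not confirmed by the source): position enters only as an
    additive term pe[i] . pe[j] on the logits, and the embeddings act as
    Q, K, and V without learned projections.
    Returns (attention weights, attended outputs).
    """
    seq_len, dim = len(emb), len(emb[0])
    pe = sinusoidal_pe(seq_len, dim)
    scale = math.sqrt(dim)
    weights = []
    for i in range(seq_len):
        logits = [
            (dot(emb[i], emb[j]) + dot(pe[i], pe[j])) / scale
            for j in range(seq_len)
        ]
        weights.append(softmax(logits))
    # Values are the untouched embeddings, so no positional signal leaks in.
    outputs = [
        [sum(weights[i][j] * emb[j][k] for j in range(seq_len)) for k in range(dim)]
        for i in range(seq_len)
    ]
    return weights, outputs


# Three identical tokens: any difference in attention comes from position alone.
tokens = [[1.0, 0.0, 0.5, 0.2] for _ in range(3)]
weights, outputs = attention_with_positional_logits(tokens)
```

Because the three tokens are identical, the content term of every logit is the same, so the attention pattern in `weights` is shaped only by the positional term, while `outputs` stays a mixture of unmodified embeddings.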