
Non-record: DART - Differential Attention Recurrent Transformer (Student submission, Kerala) #345

Open

anandks2006 wants to merge 2 commits into openai:main from anandks2006:main

Conversation

@anandks2006

Non-record submission for the unlimited compute track.

DART is a shared-weight recurrent transformer combining Differential Attention V2 (ICLR 2025) with per-loop specialisation mechanisms including low-rank Q deltas, memory tokens, deep supervision, and U-Net skip connections.

  • Model: 3.92M params, 3.55MB int8+zlib (22.5% of 16MB budget; size measurement sketched below)
  • Score: val_bpb = 1.85221128
  • Hardware: Google Colab T4 free tier
  • Training: 2000 steps, ~65M tokens
  • Author: BCA 2nd year undergraduate, Kerala, India
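
For context on the size figure, below is a minimal sketch of how an int8+zlib size can be computed (per-tensor symmetric int8 quantisation followed by zlib compression). The function name, quantisation scheme, and compression level are illustrative assumptions, not necessarily the exact checker used for the 16MB budget.

```python
import zlib

import numpy as np


def compressed_size_bytes(state_dict):
    """Estimate size after per-tensor symmetric int8 quantisation + zlib."""
    blobs = []
    for tensor in state_dict.values():  # assumes a PyTorch state_dict of tensors
        arr = tensor.detach().cpu().float().numpy()
        scale = float(np.abs(arr).max()) / 127.0 if arr.size else 1.0
        q = np.clip(np.round(arr / max(scale, 1e-12)), -127, 127).astype(np.int8)
        blobs.append(np.float32(scale).tobytes())  # keep the scale for dequantisation
        blobs.append(q.tobytes())
    return len(zlib.compress(b"".join(blobs), level=9))


# Illustrative usage:
# size = compressed_size_bytes(model.state_dict())
# print(f"{size / 2**20:.2f} MiB ({100 * size / (16 * 2**20):.1f}% of 16MB)")
```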

Full details are in the README, including honest documentation of limitations, compute constraints, and AI tool usage.
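
To make the architecture description concrete, here is a minimal PyTorch sketch of the shared-weight recurrent loop with differential attention and per-loop low-rank Q deltas. It is illustrative only: class names, the single-head layout, dimensions, and the lambda parameterisation are assumptions, and memory tokens, deep supervision heads, and U-Net skip connections are omitted for brevity; the actual code is in this PR.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DiffAttentionSketch(nn.Module):
    """Differential attention: the difference of two softmax attention maps,
    weighted by a learned lambda, with a per-loop low-rank delta on Q."""

    def __init__(self, dim, n_loops, rank=8):
        super().__init__()
        self.q = nn.Linear(dim, 2 * dim, bias=False)
        self.k = nn.Linear(dim, 2 * dim, bias=False)
        self.v = nn.Linear(dim, dim, bias=False)
        self.out = nn.Linear(dim, dim, bias=False)
        self.lmbda = nn.Parameter(torch.tensor(0.5))
        # Per-loop low-rank Q deltas (LoRA-style: the product starts at zero).
        self.q_delta_a = nn.Parameter(torch.randn(n_loops, dim, rank) * 0.02)
        self.q_delta_b = nn.Parameter(torch.zeros(n_loops, rank, 2 * dim))

    def forward(self, x, loop):
        B, T, D = x.shape
        # Shared Q projection plus the low-rank delta for this recurrent pass.
        q = self.q(x) + (x @ self.q_delta_a[loop]) @ self.q_delta_b[loop]
        q1, q2 = q.chunk(2, dim=-1)
        k1, k2 = self.k(x).chunk(2, dim=-1)
        v = self.v(x)
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1)

        def attn_map(qi, ki):
            scores = (qi @ ki.transpose(-2, -1)) / D ** 0.5
            return F.softmax(scores.masked_fill(causal, float("-inf")), dim=-1)

        # Differential attention: subtract the second map to cancel common noise.
        a = attn_map(q1, k1) - self.lmbda * attn_map(q2, k2)
        return self.out(a @ v)


class DARTLoopSketch(nn.Module):
    """One shared transformer block unrolled for n_loops recurrent passes."""

    def __init__(self, dim=256, n_loops=4):
        super().__init__()
        self.attn = DiffAttentionSketch(dim, n_loops)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.ln1, self.ln2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.n_loops = n_loops

    def forward(self, x):
        states = []  # per-loop outputs, one per recurrent pass
        for loop in range(self.n_loops):
            x = x + self.attn(self.ln1(x), loop)
            x = x + self.mlp(self.ln2(x))
            states.append(x)
        return states
```

Deep supervision in this setup amounts to attaching a language-model head and loss to each entry of `states`, not just the final one.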
