
Non-record: DART - Differential Attention Recurrent Transformer (Student submission, Kerala) #345

Open

anandks2006 wants to merge 2 commits into openai:main from anandks2006:main

Conversation

@anandks2006

Non-record submission for the unlimited compute track.

DART is a shared-weight recurrent transformer combining Differential Attention V2 (ICLR 2025) with per-loop specialisation mechanisms including low-rank Q deltas, memory tokens, deep supervision, and U-Net skip connections.

  • Model: 3.92M params, 3.55MB int8+zlib (22.5% of 16MB budget; size measurement sketched below)
  • Score: val_bpb = 1.85221128
  • Hardware: Google Colab T4 free tier
  • Training: 2000 steps, ~65M tokens
  • Author: BCA 2nd year undergraduate, Kerala, India
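
For context on the size figure, below is a minimal sketch of how an int8+zlib size can be computed (per-tensor symmetric int8 quantisation followed by zlib compression). The function name, quantisation scheme, and compression level are illustrative assumptions, not necessarily the exact checker used for the 16MB budget.

```python
import zlib

import numpy as np


def compressed_size_bytes(state_dict):
    """Estimate size after per-tensor symmetric int8 quantisation + zlib."""
    blobs = []
    for tensor in state_dict.values():  # assumes a PyTorch state_dict of tensors
        arr = tensor.detach().cpu().float().numpy()
        scale = float(np.abs(arr).max()) / 127.0 if arr.size else 1.0
        q = np.clip(np.round(arr / max(scale, 1e-12)), -127, 127).astype(np.int8)
        blobs.append(np.float32(scale).tobytes())  # keep the scale for dequantisation
        blobs.append(q.tobytes())
    return len(zlib.compress(b"".join(blobs), level=9))


# Illustrative usage:
# size = compressed_size_bytes(model.state_dict())
# print(f"{size / 2**20:.2f} MiB ({100 * size / (16 * 2**20):.1f}% of 16MB)")
```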

Full details are in the README, including honest documentation of limitations, compute constraints, and AI tool usage.
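
To make the architecture description concrete, here is a minimal PyTorch sketch of the shared-weight recurrent loop with differential attention and per-loop low-rank Q deltas. It is illustrative only: class names, the single-head layout, dimensions, and the lambda parameterisation are assumptions, and memory tokens, deep supervision heads, and U-Net skip connections are omitted for brevity; the actual code is in this PR.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DiffAttentionSketch(nn.Module):
    """Differential attention: the difference of two softmax attention maps,
    weighted by a learned lambda, with a per-loop low-rank delta on Q."""

    def __init__(self, dim, n_loops, rank=8):
        super().__init__()
        self.q = nn.Linear(dim, 2 * dim, bias=False)
        self.k = nn.Linear(dim, 2 * dim, bias=False)
        self.v = nn.Linear(dim, dim, bias=False)
        self.out = nn.Linear(dim, dim, bias=False)
        self.lmbda = nn.Parameter(torch.tensor(0.5))
        # Per-loop low-rank Q deltas (LoRA-style: the product starts at zero).
        self.q_delta_a = nn.Parameter(torch.randn(n_loops, dim, rank) * 0.02)
        self.q_delta_b = nn.Parameter(torch.zeros(n_loops, rank, 2 * dim))

    def forward(self, x, loop):
        B, T, D = x.shape
        # Shared Q projection plus the low-rank delta for this recurrent pass.
        q = self.q(x) + (x @ self.q_delta_a[loop]) @ self.q_delta_b[loop]
        q1, q2 = q.chunk(2, dim=-1)
        k1, k2 = self.k(x).chunk(2, dim=-1)
        v = self.v(x)
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1)

        def attn_map(qi, ki):
            scores = (qi @ ki.transpose(-2, -1)) / D ** 0.5
            return F.softmax(scores.masked_fill(causal, float("-inf")), dim=-1)

        # Differential attention: subtract the second map to cancel common noise.
        a = attn_map(q1, k1) - self.lmbda * attn_map(q2, k2)
        return self.out(a @ v)


class DARTLoopSketch(nn.Module):
    """One shared transformer block unrolled for n_loops recurrent passes."""

    def __init__(self, dim=256, n_loops=4):
        super().__init__()
        self.attn = DiffAttentionSketch(dim, n_loops)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.ln1, self.ln2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.n_loops = n_loops

    def forward(self, x):
        states = []  # per-loop outputs, one per recurrent pass
        for loop in range(self.n_loops):
            x = x + self.attn(self.ln1(x), loop)
            x = x + self.mlp(self.ln2(x))
            states.append(x)
        return states
```

Deep supervision in this setup amounts to attaching a language-model head and loss to each entry of `states`, not just the final one.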
