V2 Prototype: SwiGLU + Dropout + MuonWD + MidLayerLoop #340
Open
starfly-web wants to merge 3 commits into openai:main from
V2 Prototype Config for scaling to H100
This submission is a PoC of an optimized architecture intended for the competitive 10-minute track. It was developed under hardware constraints (a single RTX 2080 Ti, sm75) that make native FlashAttention impossible and the 10-minute token budget unattainable locally.

🚀 Architectural Justification
The script submitted here (train_gpt.py) integrates the cutting-edge data-efficiency techniques named in the title (SwiGLU, Dropout, MuonWD, MidLayerLoop), tailored exactly to the constraints of this challenge: Muon with decoupled weight decay (0.1 baseline) and 10% Dropout across both the Attention and MLP blocks, which helps stabilize massively overparameterized models trained on abbreviated token limits. Illustrative sketches of the SwiGLU block and the MuonWD update follow.
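For context, here is a minimal sketch of a SwiGLU feed-forward block with the 10% dropout described above, written in PyTorch. The module and projection names are illustrative assumptions, not the actual definitions in train_gpt.py:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SwiGLUMLP(nn.Module):
    """SwiGLU feed-forward block: (SiLU(x @ W_gate) * (x @ W_up)) @ W_down, plus dropout."""

    def __init__(self, dim: int, hidden_dim: int, dropout: float = 0.1):
        super().__init__()
        self.w_gate = nn.Linear(dim, hidden_dim, bias=False)  # gating projection
        self.w_up = nn.Linear(dim, hidden_dim, bias=False)    # value projection
        self.w_down = nn.Linear(hidden_dim, dim, bias=False)  # output projection
        self.drop = nn.Dropout(dropout)                       # 10% dropout per this PR

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.drop(self.w_down(F.silu(self.w_gate(x)) * self.w_up(x)))
```

Because SwiGLU splits the up-projection into a gated pair, hidden_dim is commonly set to roughly two thirds of the usual 4x expansion so the parameter count stays comparable to a GELU MLP.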
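Similarly, a minimal sketch of what "MuonWD" plausibly refers to: the Muon optimizer (momentum followed by Newton-Schulz orthogonalization of the update) combined with decoupled weight decay at the 0.1 value cited above. This is a simplified illustration under stated assumptions (no Nesterov momentum, 2D weight matrices only), not the optimizer actually submitted in train_gpt.py:

```python
import torch


def newton_schulz(G: torch.Tensor, steps: int = 5) -> torch.Tensor:
    """Approximately orthogonalize G with a quintic Newton-Schulz iteration."""
    a, b, c = 3.4445, -4.7750, 2.0315  # coefficients from the public Muon reference
    X = G / (G.norm() + 1e-7)          # normalize so the iteration converges
    transposed = G.size(0) > G.size(1)
    if transposed:
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * (A @ A)) @ X
    return X.T if transposed else X


class MuonWD(torch.optim.Optimizer):
    """Muon update with decoupled weight decay (applies to 2D weight matrices only)."""

    def __init__(self, params, lr: float = 0.02, momentum: float = 0.95,
                 weight_decay: float = 0.1):
        super().__init__(params, dict(lr=lr, momentum=momentum, weight_decay=weight_decay))

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            lr, mu, wd = group["lr"], group["momentum"], group["weight_decay"]
            for p in group["params"]:
                if p.grad is None:
                    continue
                buf = self.state[p].setdefault("momentum_buffer", torch.zeros_like(p))
                buf.mul_(mu).add_(p.grad)              # classic momentum accumulation
                p.mul_(1 - lr * wd)                    # decoupled weight decay (the "WD")
                p.add_(newton_schulz(buf), alpha=-lr)  # orthogonalized momentum step
```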
Feasibility and Verification

To demonstrate the viability of this request, a local train.log is included. This log shows:
- the total submission size (int8+zlib: 4805799 bytes), well within the strict 16MB limit (a size-check sketch follows below);
- the physical H100 compute needed to run the full training loop.
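For reproducibility, a hedged sketch of how an int8+zlib size figure could be checked against the 16MB limit. The exact packing used by the competition harness is not specified in this PR, so measuring a PyTorch state_dict with per-tensor absmax quantization, as below, is an assumption:

```python
import zlib

import numpy as np


def int8_zlib_bytes(state_dict: dict) -> int:
    """Quantize each tensor to int8 (per-tensor absmax) and return the zlib-compressed size."""
    blob = bytearray()
    for tensor in state_dict.values():
        w = tensor.detach().float().cpu().numpy()
        scale = float(np.abs(w).max()) / 127.0 if w.size else 1.0
        if scale == 0.0:
            scale = 1.0  # all-zero tensor: avoid division by zero
        blob += np.round(w / scale).astype(np.int8).tobytes()
    return len(zlib.compress(bytes(blob), level=9))


# e.g. assert int8_zlib_bytes(model.state_dict()) <= 16 * 2**20  # 16MB limit
```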