Record: 11L XSA + EMA + Int5-MLP (val_bpb=1.1399)#349

Mapika wants to merge 1 commit into openai:main from Mapika:submission/11L-XSA-EMA-Int5MLP



Mapika commented Mar 21, 2026

Summary

  • 11 layers with XSA (Exclusive Self-Attention) on last 4 layers
  • Continuous GPU float32 EMA (decay=0.997) — every step, no CPU transfers
  • Mixed int5 MLP / int6 attention / int8 embedding quantization
  • 8% magnitude pruning + zstd-22 compression
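
A minimal sketch of the continuous EMA update described above, using NumPy as a stand-in for the GPU float32 tensors (the function and variable names are illustrative, not the PR's actual code). The key property is that the shadow copy lives in float32 on the same device as the parameters and is updated in place every step, so no CPU transfers are needed:

```python
import numpy as np

def ema_update(ema_params, params, decay=0.997):
    """In-place EMA update, run after every optimizer step.

    ema <- decay * ema + (1 - decay) * param, with the shadow copy
    kept in float32 on the training device (NumPy here for illustration).
    """
    for ema, p in zip(ema_params, params):
        ema *= decay
        ema += (1.0 - decay) * p.astype(np.float32)
```

At evaluation time, the EMA weights are swapped in place of the raw parameters; with decay=0.997 the effective averaging window is roughly 1/(1-0.997) ≈ 333 steps.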

3-Seed Results (8xH100)

| Seed | val_bpb | artifact_bytes | valid |
|------|---------|----------------|-------|
| 42   | 1.14005 | 15,919,150     | yes   |
| 1337 | 1.13874 | 15,999,808     | yes   |
| 7    | 1.14080 | 15,882,678     | yes   |
| **Mean** | 1.1399 | | |
| **Std**  | 0.0009 | | |

All seeds trained in <600s, all under 16MB.
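
The sub-16MB artifact sizes come from the pruning and low-bit quantization listed in the summary. A minimal NumPy sketch of 8% magnitude pruning followed by symmetric 5-bit quantization (range [-15, 15]); this is an illustration of the technique, not the PR's exact packing or zstd pipeline:

```python
import numpy as np

def prune_and_quantize_int5(w, prune_frac=0.08):
    """Zero the smallest `prune_frac` of weights by magnitude, then
    symmetrically quantize to 5-bit integer levels in [-15, 15]."""
    w = w.astype(np.float32).copy()
    k = int(prune_frac * w.size)
    if k > 0:
        # k-th smallest absolute value becomes the pruning threshold
        thresh = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
        w[np.abs(w) <= thresh] = 0.0
    amax = np.abs(w).max()
    scale = amax / 15.0 if amax > 0 else 1.0
    q = np.clip(np.round(w / scale), -15, 15).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale
```

Pruning before quantization concentrates many values at exactly zero, which is what makes the final zstd pass (level 22 in the PR) so effective on the packed integer stream.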

Architecture

  • 26.8M params, 512 dim, 8 heads, 4 KV heads (GQA)
  • SmearGate + BigramHash(2048) + U-Net skip connections
  • Muon optimizer (WD=0.04), cosine warmdown (3000 iters)
  • ~5,850 steps at ~102ms/step
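
The learning-rate schedule above can be sketched as a constant phase followed by a cosine decay to zero over the final 3,000 iterations. The step counts come from the PR; the exact schedule shape (cosine-to-zero) is an assumption:

```python
import math

def lr_at(step, total_steps=5850, warmdown_iters=3000, base_lr=1.0):
    """Cosine warmdown: hold base_lr, then cosine-decay to 0 over the
    last `warmdown_iters` steps. Illustrative, not the PR's code."""
    start = total_steps - warmdown_iters
    if step < start:
        return base_lr
    frac = (step - start) / warmdown_iters
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * frac))
```

With ~5,850 total steps, the warmdown begins a little past the halfway mark (step 2,850) and reaches half the base LR at step 4,350.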

