Non-record: PR315 repro on 1xH100 PCIe, int6+zstd (val_bpb=1.8338)#356

Open
sjp611 wants to merge 1 commit into openai:main from sjp611:sjp611-pr315-1gpu-repro


sjp611 commented Mar 21, 2026

Summary

Notes

  • A single GPU fits only ~492 training steps in the time budget (vs ~6200 on 8xH100)
  • QAT (quantization-aware training) had only ~40 steps to adapt because of the limited training budget
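As a rough sanity check on these numbers, the sketch below converts the quoted step counts into implied average step times. It assumes the whole 10-minute (600 s) budget goes to training steps and ignores startup, compilation, and evaluation overhead, so the per-step figures are upper bounds, not measurements. `mean_step_time` is a hypothetical helper, not part of the PR.

```python
# Hypothetical back-of-envelope check of the step budgets quoted in the notes.
# Assumption: the full 10-minute (600 s) wallclock is spent on training steps,
# ignoring startup, compilation, and evaluation overhead.
WALLCLOCK_S = 600.0


def mean_step_time(steps: int, wallclock_s: float = WALLCLOCK_S) -> float:
    """Average seconds per step implied by fitting `steps` into the budget."""
    return wallclock_s / steps


one_gpu_steps = 492     # reported for 1xH100 PCIe
eight_gpu_steps = 6200  # reported for 8xH100
qat_steps = 40          # reported QAT adaptation steps on 1 GPU

print(f"1 GPU : {mean_step_time(one_gpu_steps):.3f} s/step")
print(f"8 GPUs: {mean_step_time(eight_gpu_steps):.3f} s/step")
print(f"step-count ratio: {eight_gpu_steps / one_gpu_steps:.1f}x")
print(f"QAT share of 1-GPU run: {qat_steps / one_gpu_steps:.1%}")
```

The step-count ratio comes out at about 12.6x, larger than the 8x difference in GPU count; this sketch does not attempt to explain that gap.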

Test plan

  • Training completes within the 10-minute wallclock limit
  • Artifact is under the 16MB limit (10.0MB)
  • int6+zstd round-trip evaluation passes
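The int6+zstd round-trip check can be illustrated with a minimal, self-contained sketch. It is not the PR's code: the symmetric per-tensor scale, the [-31, 31] integer range, storing one 6-bit value per byte, and the `quantize_int6` / `dequantize_int6` / `roundtrip` helpers are all hypothetical. It uses the third-party `zstandard` package when installed and falls back to stdlib `zlib` otherwise, so the compressor may not actually be zstd.

```python
# Minimal sketch of an int6 quantize -> compress -> decompress -> dequantize round trip.
# Assumptions (not taken from the PR): symmetric per-tensor scaling to [-31, 31],
# one int6 value stored per signed byte, and zstd via the third-party `zstandard`
# package, falling back to stdlib zlib when it is not installed.
import random
import struct

try:
    import zstandard

    def compress(data: bytes) -> bytes:
        return zstandard.ZstdCompressor(level=22).compress(data)

    def decompress(data: bytes) -> bytes:
        return zstandard.ZstdDecompressor().decompress(data)
except ImportError:
    import zlib  # stand-in compressor, NOT zstd

    def compress(data: bytes) -> bytes:
        return zlib.compress(data, 9)

    def decompress(data: bytes) -> bytes:
        return zlib.decompress(data)

QMAX = 31  # largest magnitude used for signed 6-bit values in this sketch


def quantize_int6(weights):
    """Map floats to integers in [-QMAX, QMAX] with one shared scale."""
    scale = max(abs(w) for w in weights) / QMAX or 1.0  # avoid a zero scale
    q = [max(-QMAX, min(QMAX, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int6(q, scale):
    """Recover approximate floats from integer codes."""
    return [v * scale for v in q]


def roundtrip(weights):
    """Quantize, serialize, compress, then invert every step."""
    q, scale = quantize_int6(weights)
    # Serialize: 4-byte float32 scale header, then one signed byte per value.
    payload = struct.pack("<f", scale) + struct.pack(f"<{len(q)}b", *q)
    blob = compress(payload)
    raw = decompress(blob)
    (scale_back,) = struct.unpack_from("<f", raw, 0)
    q_back = list(struct.unpack_from(f"<{len(q)}b", raw, 4))
    return dequantize_int6(q_back, scale_back), len(blob), scale


# Usage: round-trip a small random "weight" vector and check the error bound.
random.seed(0)
w = [random.gauss(0.0, 0.02) for _ in range(4096)]
w_hat, nbytes, scale = roundtrip(w)
max_err = max(abs(a - b) for a, b in zip(w, w_hat))
print(f"compressed bytes: {nbytes}, within half a step: {max_err <= scale / 2 + 1e-6}")
```

Storing each 6-bit code in a full byte leaves two unused bits per value before compression; the entropy coder recovers much of that, though an implementation could also bit-pack. The compressed length (`nbytes` here) is the kind of figure that would be compared against the 16MB artifact limit.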
