
feat(sa3): fp16 checkpoint converter (halves DiT on-disk size)#299

Draft
leszko wants to merge 1 commit into main from sa3-fp16-checkpoint

leszko (Collaborator) commented Jul 1, 2026

The medium DiT ships as ~9 GB of fp32 safetensors, but DEMON runs it in fp16, so we store 2× the bytes for precision that is discarded at load time.

scripts/sa3/sa3_convert_fp16.py rewrites a checkpoint, casting only fp32 tensors to fp16 (bf16 T5Gemma weights, int/bool tensors, and the __metadata__ entry are left unchanged). It writes either an fp16 copy (default <dir>-fp16) or, with --in-place, overwrites the original for the image-bake path.
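For reviewers unfamiliar with the file format, here is a minimal stdlib-only sketch of that cast. It assumes the standard safetensors layout (an 8-byte little-endian header length, a JSON header mapping tensor names to dtype/shape/data_offsets plus an optional __metadata__ entry, then the raw tensor bytes). The function names `convert_fp32_to_fp16` and `output_dir` are hypothetical, not the script's actual API; the sketch reads the whole file into memory and uses struct's half-precision "e" format, whereas the real script may well go through torch/safetensors instead.

```python
import json
import struct
from pathlib import Path


def convert_fp32_to_fp16(data: bytes) -> bytes:
    """Return a copy of a safetensors blob with every F32 tensor cast to F16.

    Other dtypes (BF16, F16, integer, BOOL) are copied byte-for-byte and the
    optional "__metadata__" entry is carried over unchanged.
    """
    # Layout: u64 little-endian header length, JSON header, raw tensor bytes.
    (header_len,) = struct.unpack("<Q", data[:8])
    header = json.loads(data[8:8 + header_len])
    body = data[8 + header_len:]

    metadata = header.pop("__metadata__", None)
    new_header, chunks, offset = {}, [], 0
    # Walk tensors in on-disk order and lay them out back to back.
    for name, info in sorted(header.items(), key=lambda kv: kv[1]["data_offsets"][0]):
        begin, end = info["data_offsets"]
        raw, dtype = body[begin:end], info["dtype"]
        if dtype == "F32":
            n = len(raw) // 4
            values = struct.unpack(f"<{n}f", raw)
            # "e" is IEEE half precision, rounded to nearest even. Finite values
            # too large for fp16 raise OverflowError here (a torch cast gives inf).
            raw, dtype = struct.pack(f"<{n}e", *values), "F16"
        new_header[name] = {
            "dtype": dtype,
            "shape": info["shape"],
            "data_offsets": [offset, offset + len(raw)],
        }
        chunks.append(raw)
        offset += len(raw)

    if metadata is not None:
        new_header["__metadata__"] = metadata
    header_bytes = json.dumps(new_header, separators=(",", ":")).encode("utf-8")
    header_bytes += b" " * (-len(header_bytes) % 8)  # pad to 8-byte alignment
    return struct.pack("<Q", len(header_bytes)) + header_bytes + b"".join(chunks)


def output_dir(src: Path, in_place: bool) -> Path:
    """Hypothetical helper: <dir>-fp16 by default, or src itself for --in-place."""
    return src if in_place else src.parent / f"{src.name}-fp16"
```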

  • Numerically a no-op at runtime: verified MAX_ABS_DIFF=0.0 across all 996 float tensors vs. the fp32 load.
  • DiT 9.2 → 4.6 GB; full checkpoint 9.8 → 5.5 GiB (44% smaller).
  • tests/unit/test_sa3_fp16_convert.py locks the contract with synthetic safetensors (no GPU/weights in CI).
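The size figures can be cross-checked with a little arithmetic. This sketch assumes, following the units the commit message uses, that the DiT figures are decimal GB and the checkpoint figures GiB, and that everything outside the DiT keeps its size; under those assumptions, halving the DiT reproduces the reported 5.5 GiB and 44% reduction.

```python
GB = 10**9   # decimal gigabyte
GiB = 2**30  # binary gibibyte

dit_fp32 = 9.22 * GB
dit_fp16 = dit_fp32 / 2              # 4-byte floats become 2-byte floats
ckpt_fp32 = 9.8 * GiB
rest = ckpt_fp32 - dit_fp32          # assumed unchanged by the conversion
ckpt_fp16 = rest + dit_fp16

print(f"DiT: {dit_fp32 / GB:.2f} GB -> {dit_fp16 / GB:.2f} GB")
print(f"checkpoint: {ckpt_fp32 / GiB:.1f} GiB -> {ckpt_fp16 / GiB:.1f} GiB")
print(f"reduction: {1 - ckpt_fp16 / ckpt_fp32:.0%}")

# Reading the checkpoint figure as decimal GB instead would predict ~5.19 GB.
naive_fp16_gb = (9.8 * GB - dit_fp32 + dit_fp16) / GB
```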

Baking this into the image (--in-place before promote_warm) is follow-up work, not part of this PR.

The medium DiT ships as ~9 GB of float32 safetensors but DEMON runs it
in fp16 (load_diffusion_cond does model.to(float16) after loading), so
storing fp32 doubles the disk/transfer bytes for precision we discard.

Add scripts/sa3/sa3_convert_fp16.py, which rewrites a checkpoint and
casts only float32 tensors to fp16 (bf16 T5Gemma is already 2 bytes per
element and is left alone; int/bool tensors and the safetensors
__metadata__ are preserved). It writes an fp16 copy (default <dir>-fp16)
or, with --in-place, rewrites the checkpoint in place for the image-bake
path.

Numerically a no-op at runtime: today's path loads fp32 weights w and
casts them with .to(fp16), giving fp16(w). The pre-converted checkpoint
stores fp16(w); load_state_dict upcasts it to fp32 (exact), and the same
.to(fp16) then recovers fp16(w) unchanged, because every fp16 value is
exactly representable in fp32. Verified on the medium checkpoint:
MAX_ABS_DIFF=0.0 across all 996 float tensors vs. the fp32 load.
DiT 9.22 GB -> 4.61 GB (checkpoint 9.8 -> 5.5 GiB, 44% smaller).
tests/unit/test_sa3_fp16_convert.py locks the contract with synthetic
safetensors (no GPU / weights in CI).
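The no-op argument hinges on one fact: any fp16 value survives an fp16 -> fp32 -> fp16 round trip unchanged. The small stdlib check below emulates both load paths with struct's "f" (float32) and "e" (float16) formats on synthetic Gaussian weights; the helper names and the 0.02 standard deviation are illustrative assumptions, not taken from the medium checkpoint.

```python
import random
import struct


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest float16 value (round-half-even)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def max_abs_diff(weights):
    diff = 0.0
    for w in weights:
        today = to_fp16(w)         # fp32 on disk, then .to(fp16) after loading
        stored = to_fp16(w)        # converter writes fp16 to disk
        loaded = to_fp32(stored)   # load_state_dict upcast into fp32 params (exact)
        runtime = to_fp16(loaded)  # the same .to(fp16) as today
        diff = max(diff, abs(runtime - today))
    return diff


rng = random.Random(0)
weights = [to_fp32(rng.gauss(0.0, 0.02)) for _ in range(10_000)]
print(max_abs_diff(weights))  # prints 0.0
```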

Consuming this in the baked image (run --in-place before promote_warm)
is Phase 1 (demon-public-demo) work, not part of this PR.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

leszko marked this pull request as ready for review July 1, 2026 09:49
leszko marked this pull request as draft July 1, 2026 09:54