feat(sa3): fp16 checkpoint converter (halves DiT on-disk size) #299
Draft
leszko wants to merge 1 commit into
The medium DiT ships as ~9 GB of float32 safetensors but DEMON runs it in fp16 (load_diffusion_cond does model.to(float16) after loading), so storing fp32 doubles the disk/transfer bytes for precision we discard.

Add scripts/sa3/sa3_convert_fp16.py: rewrites a checkpoint casting only float32 tensors to fp16 (bf16 T5Gemma is already 2 bytes -> left alone; int/bool and the safetensors __metadata__ preserved). Supports an fp16 copy (default <dir>-fp16) or --in-place for the image-bake path.

Numerically a no-op at runtime: today's path is fp16(fp32); pre-convert stores fp16(fp32) and load_state_dict upcasts fp16->fp32 (exact) before the same .to(fp16). Verified on the medium checkpoint: MAX_ABS_DIFF=0.0 across all 996 float tensors vs the fp32 load. DiT 9.22 GB -> 4.61 GB (checkpoint 9.8 -> 5.5 GiB, 44% smaller).

tests/unit/test_sa3_fp16_convert.py locks the contract with synthetic safetensors (no GPU / weights in CI).

Consuming this in the baked image (run --in-place before promote_warm) is Phase 1 (demon-public-demo) work, not part of this PR.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
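The no-op argument can be checked on scalars without the model. The sketch below is illustrative only and is not code from this PR: it emulates the two load paths with the standard library's `struct` half-precision format instead of torch, and the helper names (`to_fp32`, `to_fp16`, `runtime_today`, `runtime_preconverted`) are hypothetical.

```python
import struct


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE float16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def runtime_today(w32: float) -> float:
    # Current path: load the fp32 weight, then cast the model to fp16.
    return to_fp16(w32)


def runtime_preconverted(w32: float) -> float:
    stored = to_fp16(w32)     # converter writes fp16(fp32) to disk
    upcast = to_fp32(stored)  # loading into fp32 parameters: exact, fp16 is a subset of fp32
    return to_fp16(upcast)    # the same final cast to fp16


# A few fp32 weights, including an fp16 subnormal and the largest finite fp16.
samples = [to_fp32(v) for v in (0.1, -3.14159, 1e-5, 1234.567, 6.0e-8, 65504.0)]
max_abs_diff = max(abs(runtime_today(w) - runtime_preconverted(w)) for w in samples)
print("MAX_ABS_DIFF =", max_abs_diff)
```

Both paths end on the same fp16 value because the fp16 -> fp32 upcast loses nothing, so the second cast to fp16 returns exactly what was stored.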
The medium DiT ships as ~9 GB of fp32 safetensors, but DEMON runs it in fp16, so we store 2× the bytes for precision we discard at load.

`scripts/sa3/sa3_convert_fp16.py` rewrites a checkpoint casting only fp32 tensors → fp16 (bf16 T5Gemma, int/bool, and `__metadata__` left alone). Supports an fp16 copy (default `<dir>-fp16`) or `--in-place` for the image-bake path.

`tests/unit/test_sa3_fp16_convert.py` locks the contract with synthetic safetensors (no GPU/weights in CI).

Baking this into the image (`--in-place` before promote_warm) is follow-up work, not part of this PR.
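For reviewers who want the contract in a few lines, here is a minimal sketch of the "cast only F32, leave everything else alone" rule. It is not the script from this PR: it works on the raw safetensors byte layout (8-byte little-endian header length, JSON header, data buffer) using only the standard library, and `pack`, `unpack` and `convert_f32_to_f16` are hypothetical names. One known difference from a torch cast is flagged in the comments.

```python
import json
import struct


def pack(header: dict, data: bytes) -> bytes:
    """Serialize a header dict and data buffer in the safetensors layout."""
    header_bytes = json.dumps(header, separators=(",", ":")).encode("utf-8")
    return struct.pack("<Q", len(header_bytes)) + header_bytes + data


def unpack(blob: bytes) -> tuple[dict, bytes]:
    """Split a safetensors byte string into (header dict, data buffer)."""
    (header_len,) = struct.unpack("<Q", blob[:8])
    return json.loads(blob[8 : 8 + header_len]), blob[8 + header_len :]


def convert_f32_to_f16(blob: bytes) -> bytes:
    """Return a copy of the checkpoint with every F32 tensor stored as F16."""
    header, data = unpack(blob)
    new_header, chunks, offset = {}, [], 0
    if "__metadata__" in header:
        new_header["__metadata__"] = header["__metadata__"]  # preserved unchanged

    for name, info in header.items():
        if name == "__metadata__":
            continue
        begin, end = info["data_offsets"]
        raw, dtype = data[begin:end], info["dtype"]
        if dtype == "F32":
            count = len(raw) // 4
            values = struct.unpack(f"<{count}f", raw)
            # Rounds to nearest fp16. Unlike a torch cast, struct raises
            # OverflowError for finite values beyond the fp16 range.
            raw, dtype = struct.pack(f"<{count}e", *values), "F16"
        # BF16, integer and bool tensors fall through with their bytes untouched.
        new_header[name] = {
            "dtype": dtype,
            "shape": info["shape"],
            "data_offsets": [offset, offset + len(raw)],
        }
        chunks.append(raw)
        offset += len(raw)

    return pack(new_header, b"".join(chunks))
```

Offsets are recomputed as tensors are written because every F32 tensor shrinks to half its size, which shifts everything stored after it.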