Support Cosmos3-Super task-specialized (Text2Image / Image2Video) che…#13
Merged
e687153 to
7600d64
…ckpoints

These task-specialized diffusers checkpoints reuse the Cosmos3-Super architecture but omit unused modality weights and bundle their own VLM processor. Loading them previously failed, and the processor pulled a redundant full base-Super download.

- inference/model.py: tolerate absent action/sound projection-head weights in the diffusers load planner, mirroring the existing vision carve-out. Fixes the masked "TypeError: cannot pickle code objects" that surfaced when DCP tried to broadcast the missing-tensor ValueError across ranks. No-op for self-consistent base checkpoints: Nano/Super provide all modality weights, so the guards never fire.
- inference: add CheckpointConfig.vlm_processor_from_checkpoint. When set, the loader sources the VLM processor from the loaded checkpoint's own bundled files instead of the repository hardcoded in the model config, avoiding a redundant base-Super download. Enabled for the two task checkpoints; base Nano/Super keep their configured repository.
- docs/faq.md: add EADDRINUSE / --master-port entry.

Verified: Text2Image (t2i) and Image2Video (i2v) load and generate; a full base Cosmos3-Nano t2i run is unchanged with strict weight loading intact (carve-out never triggers).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
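The carve-out described for inference/model.py can be pictured with a minimal sketch. This is not the code from the PR: the function name `plan_load` and the parameter prefixes in `OPTIONAL_PREFIXES` are hypothetical, chosen only to illustrate how a load planner might skip optional modality heads while staying strict for everything else.

```python
from typing import Iterable, List

# Hypothetical prefixes for illustration; the real parameter names may differ.
OPTIONAL_PREFIXES = ("action_proj.", "sound_proj.", "vision_proj.")


def plan_load(model_keys: Iterable[str], checkpoint_keys: Iterable[str]) -> List[str]:
    """Return the keys to load, skipping optional weights the checkpoint omits."""
    available = set(checkpoint_keys)
    to_load: List[str] = []
    missing: List[str] = []
    for key in model_keys:
        if key in available:
            to_load.append(key)
        elif key.startswith(OPTIONAL_PREFIXES):
            # Tolerated: a task-specialized checkpoint may omit this head.
            continue
        else:
            missing.append(key)
    if missing:
        # Every non-optional weight is still required (strict loading).
        raise ValueError(f"missing tensors in checkpoint: {sorted(missing)}")
    return to_load
```

With a checkpoint that provides every key, the optional branch is never taken, which matches the "no-op for self-consistent base checkpoints" claim above.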
7600d64 to
0fd3d55
foreverlms
reviewed
Jun 3, 2026
    ),
    "Cosmos3-Super-Text2Image": CheckpointConfig(
        model_memory_bytes=MODEL_MEMORY_BYTES_BY_SIZE["32B"],
        config_file=str(CONFIG_DIR / "model/Cosmos3-Super.yaml"),
Collaborator
Are we sure these two specialized models can simply reuse the Super YAML file, and that they don't have their own specialized training configs?
Collaborator
Author
Yes, we ran the test and verified it.
foreverlms
approved these changes
Jun 3, 2026
…ckpoints

These task-specialized diffusers checkpoints reuse the Cosmos3-Super architecture but omit unused modality weights and bundle their own VLM processor. Loading them previously failed, and the processor pulled a redundant full base-Super download.

- inference/model.py: tolerate absent action/sound projection-head weights in the diffusers load planner, mirroring the existing vision carve-out. Fixes the masked "TypeError: cannot pickle code objects" that surfaced when DCP tried to broadcast the missing-tensor ValueError across ranks. No-op for self-consistent base checkpoints: Nano/Super provide all modality weights, so the guards never fire.
- inference: add CheckpointConfig.vlm_processor_from_checkpoint. When set, the loader sources the VLM processor from the loaded checkpoint's own bundled files instead of the repository hardcoded in the model config, avoiding a redundant base-Super download. Enabled for the two task checkpoints; base Nano/Super keep their configured repository.
- data/vfm/processors: clearer error when build_processor_lazy is given neither a repository nor a tokenizer_type source (never fires for existing call sites; only improves a previously-TypeError path).
- docs/faq.md: add EADDRINUSE / --master-port entry.
- docs/superpowers/specs: design spec for the processor-source change.

Verified: Text2Image (t2i) and Image2Video (i2v) load and generate; a full base Cosmos3-Nano t2i run is unchanged with strict weight loading intact (carve-out never triggers).
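As a rough illustration of the processor-source change, the sketch below shows one way the selection could work. Only the field name `vlm_processor_from_checkpoint` comes from this PR; the `checkpoint_path` field, the `resolve_processor_source` helper, and the error wording are assumptions made for the example, not the repository's actual API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckpointConfig:
    # checkpoint_path is a hypothetical field used only for this sketch.
    checkpoint_path: str
    # Flag named in the PR: prefer the processor files bundled with the checkpoint.
    vlm_processor_from_checkpoint: bool = False


def resolve_processor_source(ckpt: CheckpointConfig, configured_repo: Optional[str]) -> str:
    """Pick where the VLM processor is loaded from (hypothetical helper)."""
    if ckpt.vlm_processor_from_checkpoint:
        # Task checkpoints: reuse the bundled files, no extra base download.
        return ckpt.checkpoint_path
    if configured_repo is None:
        # Fail with an explicit message instead of an opaque TypeError later on.
        raise ValueError(
            "no processor source: set a repository in the model config "
            "or enable vlm_processor_from_checkpoint"
        )
    # Base checkpoints: keep the repository from the model config.
    return configured_repo
```

Under these assumptions, a task checkpoint with the flag set resolves to its own path, while a base checkpoint with the default flag keeps resolving to the configured repository.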