fix(modeling): include named_buffers in module split expansion for infer_auto_device_map #4020
Open
Anai-Guo wants to merge 1 commit into
…fer_auto_device_map

When infer_auto_device_map splits a module into its children (because it doesn't fit on one device), it expands the module into named_parameters(recurse=False) + named_children() but omits named_buffers(recurse=False). This means any buffer registered directly on a layer is never added to modules_to_treat and consequently never receives a device assignment. check_device_map then raises:

ValueError: The device_map provided does not give any device for the following parameters: model.language_model.layers.8.layer_scalar

Gemma-4 (google/gemma-4-E4B-it) exhibits this because its decoder layer registers layer_scalar via register_buffer, making it a buffer rather than a parameter.

Fix: add list(module.named_buffers(recurse=False)) in the four expansion sites:
- fallback_allocate module-split
- fallback_allocate parent-expansion
- infer_auto_device_map tied-module-split
- infer_auto_device_map main-module-split
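The expansion step described in the commit message can be sketched as follows. This is a minimal illustration, not the accelerate source: `expand_module` and `ToyModule` are hypothetical names, and `ToyModule` is only a stand-in that mimics the three `torch.nn.Module` methods involved (`named_parameters`, `named_buffers`, `named_children`) so the snippet runs without PyTorch installed.

```python
# Hypothetical sketch of the split-expansion step; not the accelerate implementation.


class ToyModule:
    """Tiny stand-in for torch.nn.Module exposing only the three methods used below."""

    def __init__(self, params=None, buffers=None, children=None):
        self._params = dict(params or {})
        self._buffers = dict(buffers or {})
        self._children = dict(children or {})

    def named_parameters(self, recurse=False):
        # Only the non-recursive case is needed for this sketch.
        return iter(self._params.items())

    def named_buffers(self, recurse=False):
        return iter(self._buffers.items())

    def named_children(self):
        return iter(self._children.items())


def expand_module(name, module, include_buffers=True):
    """Return the (qualified_name, object) entries produced by splitting `module`."""
    entries = list(module.named_parameters(recurse=False))
    if include_buffers:
        # The step the patch adds: tensors registered directly via register_buffer.
        entries += list(module.named_buffers(recurse=False))
    entries += list(module.named_children())
    return [(f"{name}.{child_name}", obj) for child_name, obj in entries]


# A layer with one direct parameter, one direct buffer, and one child module.
layer = ToyModule(
    params={"weight": "w"},
    buffers={"layer_scalar": "s"},
    children={"mlp": ToyModule(params={"weight": "w2"})},
)

before = [n for n, _ in expand_module("layers.8", layer, include_buffers=False)]
after = [n for n, _ in expand_module("layers.8", layer, include_buffers=True)]
print(before)  # the buffer is absent, so it can never be given a device
print(after)   # the buffer is now a candidate for device assignment
```

Because the function relies only on duck typing, the same call shape should work on a real `torch.nn.Module`, which provides all three methods with the `recurse` argument used here.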
Contributor

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.

Author

Still relevant; happy to address any review feedback. Please keep open.
Problem

`infer_auto_device_map` splits modules that don't fit on one device by expanding them into `named_parameters(recurse=False)` + `named_children()`. But it omits `named_buffers(recurse=False)`, so any `register_buffer` tensor on a layer never gets assigned a device. `check_device_map` then raises `ValueError: The device_map provided does not give any device for the following parameters: model.language_model.layers.8.layer_scalar`.

This breaks multi-GPU loading of Gemma-4 (`google/gemma-4-E4B-it`), whose `Gemma4DecoderLayer` registers `layer_scalar` as a buffer.

Closes #4014
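To make the failure mode concrete, here is a rough sketch of the kind of coverage check that produces the error above. It is a simplified assumption about the logic, not the actual `check_device_map` code: `uncovered_tensors` is a hypothetical helper, and the `self_attn.q_proj.weight` name is illustrative only.

```python
# Hypothetical, simplified coverage check; not the accelerate implementation.


def uncovered_tensors(tensor_names, device_map):
    """Return tensor names not covered by any device_map key.

    A key covers a tensor if it is the empty string (whole model),
    the tensor's own name, or the name of one of its parent modules.
    """
    missing = []
    for name in tensor_names:
        covered = any(
            key == "" or name == key or name.startswith(key + ".")
            for key in device_map
        )
        if not covered:
            missing.append(name)
    return missing


# Illustrative names: one parameter inside a child module, one direct buffer.
tensor_names = [
    "model.language_model.layers.8.self_attn.q_proj.weight",
    "model.language_model.layers.8.layer_scalar",
]

# Layer 8 was split, but only its child module received an entry.
split_map = {"model.language_model.layers.8.self_attn": 0}
missing_before = uncovered_tensors(tensor_names, split_map)

# With the buffer included in the expansion, it gets its own entry.
fixed_map = {
    "model.language_model.layers.8.self_attn": 0,
    "model.language_model.layers.8.layer_scalar": 0,
}
missing_after = uncovered_tensors(tensor_names, fixed_map)

print(missing_before)
print(missing_after)
```

The point of the sketch is that once a layer is split, nothing at the layer level covers its direct tensors any more, so each one needs its own entry in the device map.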
Fix

Add `list(module.named_buffers(recurse=False))` in the four module-expansion sites inside `infer_auto_device_map` / `fallback_allocate`:

- `fallback_allocate` module-split
- `fallback_allocate` parent-expansion
- `infer_auto_device_map` tied-module split
- `infer_auto_device_map` main-module split

Buffers are already included in `compute_module_sizes` (via `named_module_tensors`) and in the initial `modules_to_treat` construction; this patch makes the split-expansion paths consistent with both.

Repro (before fix)
🤖 Generated with Claude Code