fix(modeling): include named_buffers in module split expansion for infer_auto_device_map#4020

Open
Anai-Guo wants to merge 1 commit into huggingface:main from Anai-Guo:fix-infer-auto-device-map-missing-buffers

@Anai-Guo

Problem

infer_auto_device_map splits modules that don't fit on one device by expanding them into named_parameters(recurse=False) + named_children(). It omits named_buffers(recurse=False), however, so any tensor registered with register_buffer directly on a layer is never assigned a device.

check_device_map then raises:

ValueError: The device_map provided does not give any device for the following parameters:
  model.language_model.layers.8.layer_scalar
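To see why a missing buffer surfaces as this error, here is a minimal sketch of the kind of coverage check check_device_map performs. The helper uncovered_tensors and the toy Layer module are hypothetical and simplified; they are not accelerate's implementation. The sketch assumes a device_map key covers a tensor when it is "" (the whole model), equals the tensor's name, or is a dotted prefix of it.

```python
import torch
from torch import nn


class Layer(nn.Module):
    """Hypothetical toy layer: one child module plus one directly registered buffer."""

    def __init__(self) -> None:
        super().__init__()
        self.mlp = nn.Linear(4, 4)
        self.register_buffer("layer_scalar", torch.ones(1))


def uncovered_tensors(model: nn.Module, device_map: dict) -> list:
    """Return parameter and buffer names that no device_map key covers (simplified check)."""
    names = [n for n, _ in model.named_parameters()] + [n for n, _ in model.named_buffers()]
    missing = []
    for name in names:
        # "" maps the whole model; otherwise a key must match exactly or be a dotted prefix
        covered = any(key == "" or name == key or name.startswith(key + ".") for key in device_map)
        if not covered:
            missing.append(name)
    return missing


model = nn.Sequential(Layer())
# A split that lists the child module but forgets the buffer, as in the bug
print(uncovered_tensors(model, {"0.mlp": 0}))                       # ['0.layer_scalar']
print(uncovered_tensors(model, {"0.mlp": 0, "0.layer_scalar": 0}))  # []
```

The buffer is reported even though its parent layer was split, because once a module is expanded only the names that were actually queued end up as device_map keys.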

This breaks multi-GPU loading of Gemma-4 (google/gemma-4-E4B-it), whose Gemma4DecoderLayer registers layer_scalar as a buffer:

self.register_buffer("layer_scalar", torch.ones(1))  # transformers line 1347
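The omission can be reproduced without Gemma-4 on a toy layer that mirrors the register_buffer call above. ToyDecoderLayer and both expand_* helpers are hypothetical; they only mimic the one-level split described in this PR, and appending the buffers after the children is an assumed ordering.

```python
import torch
from torch import nn


class ToyDecoderLayer(nn.Module):
    """Hypothetical stand-in for a decoder layer that registers a buffer, like Gemma4DecoderLayer."""

    def __init__(self) -> None:
        super().__init__()
        self.mlp = nn.Linear(4, 4)
        self.register_buffer("layer_scalar", torch.ones(1))


def expand_before_fix(prefix: str, module: nn.Module) -> list:
    # Direct parameters plus children only; direct buffers are silently dropped
    items = list(module.named_parameters(recurse=False)) + list(module.named_children())
    return [f"{prefix}.{name}" for name, _ in items]


def expand_after_fix(prefix: str, module: nn.Module) -> list:
    # Same expansion, with buffers registered directly on the module appended
    items = (
        list(module.named_parameters(recurse=False))
        + list(module.named_children())
        + list(module.named_buffers(recurse=False))
    )
    return [f"{prefix}.{name}" for name, _ in items]


layer = ToyDecoderLayer()
print(expand_before_fix("layers.8", layer))  # ['layers.8.mlp']
print(expand_after_fix("layers.8", layer))   # ['layers.8.mlp', 'layers.8.layer_scalar']
print("layer_scalar" in layer.state_dict())  # True
```

Because named_parameters(recurse=False) only yields nn.Parameter objects registered on the module itself, a buffer like layer_scalar is invisible to the old expansion, even though, as a persistent buffer, it is part of the layer's state_dict and still needs a device.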

Closes #4014

Fix

Add list(module.named_buffers(recurse=False)) to the module expansion at each of the four split sites inside _infer_auto_device_map / fallback_allocate:

| Function | Expansion site | Line |
|---|---|---|
| fallback_allocate | module split | 1249 |
| fallback_allocate | parent expansion | 1264 |
| infer_auto_device_map | tied-module split | 1465 |
| infer_auto_device_map | main-module split | 1513 |
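The effect of these changed sites can be illustrated with a deliberately simplified greedy allocator. greedy_device_map, Layer and the byte budgets are hypothetical; the real _infer_auto_device_map also handles tied parameters, no-split classes and fallback allocation, none of which is modeled here. The sketch only shows that once a module is split, its direct buffers must re-enter the work queue to receive a device.

```python
import itertools

import torch
from torch import nn


class Layer(nn.Module):
    """Hypothetical toy layer: Linear(4, 4) child (80 bytes) plus a 4-byte buffer."""

    def __init__(self) -> None:
        super().__init__()
        self.mlp = nn.Linear(4, 4)
        self.register_buffer("layer_scalar", torch.ones(1))


def nbytes(obj) -> int:
    # Size of a single tensor, or of every parameter and buffer inside a module
    tensors = [obj] if isinstance(obj, torch.Tensor) else itertools.chain(obj.parameters(), obj.buffers())
    return sum(t.numel() * t.element_size() for t in tensors)


def expand(prefix: str, module: nn.Module, include_buffers: bool) -> list:
    # One-level split: direct parameters, children and (optionally) direct buffers
    items = list(module.named_parameters(recurse=False)) + list(module.named_children())
    if include_buffers:
        items += list(module.named_buffers(recurse=False))
    return [(f"{prefix}.{name}" if prefix else name, obj) for name, obj in items]


def greedy_device_map(model: nn.Module, budgets: dict, include_buffers: bool) -> dict:
    devices = list(budgets)
    remaining = dict(budgets)
    current = 0
    device_map = {}
    queue = expand("", model, include_buffers)
    while queue:
        name, obj = queue.pop(0)
        size = nbytes(obj)
        device = devices[current]
        if size <= remaining[device]:
            device_map[name] = device
            remaining[device] -= size
        elif isinstance(obj, nn.Module):
            # Too big for the current device: split one level and handle the parts first
            queue = expand(name, obj, include_buffers) + queue
        else:
            # A single tensor cannot be split: move on to the next device and retry it
            current += 1
            if current == len(devices):
                raise ValueError(f"{name} does not fit on any device")
            queue.insert(0, (name, obj))
    return device_map


model = nn.Sequential(Layer(), Layer())
budgets = {0: 100, 1: 100}  # bytes per device, chosen so the second layer must be split
print(greedy_device_map(model, budgets, include_buffers=False))
# {'0': 0, '1.mlp.weight': 1, '1.mlp.bias': 1}  ('1.layer_scalar' is never assigned)
print(greedy_device_map(model, budgets, include_buffers=True))
# {'0': 0, '1.mlp.weight': 1, '1.mlp.bias': 1, '1.layer_scalar': 1}
```

The first layer fits whole and its buffer is covered by the "0" key; only the split layer loses its buffer, which matches the error naming a single layer (layers.8) rather than every layer.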

Buffers are already included in compute_module_sizes (via named_module_tensors) and in the initial modules_to_treat construction — this patch makes the split-expansion paths consistent with both.
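The consistency argument can be checked directly: a module's size, counted over parameters and buffers as described above, should equal the total size of the pieces it is split into. The helpers below are hypothetical stand-ins, not accelerate's compute_module_sizes; without buffers in the split, the pieces come up short by exactly the buffer bytes.

```python
import itertools

import torch
from torch import nn


class Layer(nn.Module):
    """Hypothetical toy layer with a child module and a directly registered buffer."""

    def __init__(self) -> None:
        super().__init__()
        self.mlp = nn.Linear(4, 4)
        self.register_buffer("layer_scalar", torch.ones(1))


def nbytes(obj) -> int:
    # Parameters and buffers both count toward a module's size
    tensors = [obj] if isinstance(obj, torch.Tensor) else itertools.chain(obj.parameters(), obj.buffers())
    return sum(t.numel() * t.element_size() for t in tensors)


def split_parts(module: nn.Module, include_buffers: bool) -> list:
    parts = list(module.named_parameters(recurse=False)) + list(module.named_children())
    if include_buffers:
        parts += list(module.named_buffers(recurse=False))
    return parts


layer = Layer()
whole = nbytes(layer)
old_total = sum(nbytes(obj) for _, obj in split_parts(layer, include_buffers=False))
new_total = sum(nbytes(obj) for _, obj in split_parts(layer, include_buffers=True))
# float32: 64 weight bytes + 16 bias bytes + 4 buffer bytes
print(whole, old_total, new_total)  # 84 80 84
```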

Repro (before fix)

from accelerate import infer_auto_device_map, init_empty_weights
from accelerate.utils import get_balanced_memory
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("google/gemma-4-E4B-it")
with init_empty_weights():
    # Parameters are created on the meta device, so no real memory is allocated
    model = AutoModelForCausalLM.from_config(config)

# Two usable GPUs, two GPUs with no budget, plus 8 GiB of CPU RAM
memalloc = {"cpu": "8GiB", 0: "48GiB", 1: "48GiB", 2: "0GiB", 3: "0GiB"}
balanced = get_balanced_memory(model, max_memory=memalloc)
device_map = infer_auto_device_map(model, max_memory=balanced)
# -> ValueError: The device_map provided does not give any device for the
#    following parameters: model.language_model.layers.8.layer_scalar

🤖 Generated with Claude Code

…fer_auto_device_map

When infer_auto_device_map splits a module into its children (because it
doesn't fit on one device), it expands the module into
named_parameters(recurse=False) + named_children() but omits
named_buffers(recurse=False).

This means any buffer registered directly on a layer is never added to
modules_to_treat and consequently never receives a device assignment.
check_device_map then raises:

  ValueError: The device_map provided does not give any device for the
  following parameters: model.language_model.layers.8.layer_scalar

Gemma-4 (google/gemma-4-E4B-it) exhibits this because its decoder layer
registers layer_scalar via register_buffer, making it a buffer rather than
a parameter.

Fix: add list(module.named_buffers(recurse=False)) in the four expansion
sites (fallback_allocate module-split, fallback_allocate parent-expansion,
infer_auto_device_map tied-module-split, infer_auto_device_map
main-module-split).
@github-actions
Contributor

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@Anai-Guo
Author

Still relevant — happy to address any review feedback. Please keep open.


Development

Successfully merging this pull request may close these issues.

Unable to infer valid auto device map for gemma-4-E4B-it

1 participant