Refactor Mamba2 to use standardized output tracing#44087

Open
huyxdang wants to merge 2 commits into huggingface:main from huyxdang:mamb2-refractor-output-tracing

@huyxdang

Summary

Refactors the Mamba2 model to use the standardized output collection interface as part of #43979.

Changes

  • Standardized Output Mapping: Added _can_record_outputs to Mamba2PreTrainedModel, mapping hidden_states to Mamba2Block.
  • Base Model Refactor: Added @capture_outputs and @merge_with_config_defaults decorators to Mamba2Model.forward.
  • Head Model Refactor: Added @can_return_tuple decorator to Mamba2ForCausalLM.forward to handle automated tuple/dict packaging.
  • Boilerplate Removal: Removed manual output_hidden_states and return_dict parameter resolution and manual collection loops in both Mamba2Model and Mamba2ForCausalLM.
  • Architecture Simplification: Simplified Mamba2Block.forward to return hidden_states directly as a single tensor.
  • Bug Fix: Fixed a TypeError in src/transformers/integrations/hub_kernels.py where integer version numbers in the kernel mapping caused a crash during loading.
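
To illustrate the pattern behind these changes, here is a minimal pure-Python sketch of decorator-based output capture. It is a toy stand-in and not the transformers implementation: ToyBlock, ToyModel, and toy_capture_outputs are hypothetical names, and the real @capture_outputs decorator may work differently (for example, through module hooks). The sketch only shows why the block can return a single value and why the model's forward no longer needs a manual collection loop.

```python
import functools


class ToyBlock:
    """Hypothetical stand-in for a block that returns one value, not a tuple."""

    def __init__(self, scale):
        self.scale = scale

    def __call__(self, hidden_states):
        return [h * self.scale for h in hidden_states]


def toy_capture_outputs(forward):
    """Hypothetical decorator: records each layer's output while forward runs."""

    @functools.wraps(forward)
    def wrapper(self, hidden_states, output_hidden_states=False):
        if not output_hidden_states:
            last = forward(self, hidden_states)
            return {"last_hidden_state": last, "hidden_states": None}

        recorded = [hidden_states]  # start with the input to the first layer
        original_layers = self.layers

        def recording(layer):
            def call(x):
                out = layer(x)
                recorded.append(out)  # capture this layer's output
                return out

            return call

        # Temporarily swap in recording wrappers, then always restore the layers.
        self.layers = [recording(layer) for layer in original_layers]
        try:
            last = forward(self, hidden_states)
        finally:
            self.layers = original_layers
        return {"last_hidden_state": last, "hidden_states": tuple(recorded)}

    return wrapper


class ToyModel:
    def __init__(self):
        self.layers = [ToyBlock(2), ToyBlock(3)]

    @toy_capture_outputs
    def forward(self, hidden_states):
        # No manual collection loop and no output_hidden_states handling here.
        for layer in self.layers:
            hidden_states = layer(hidden_states)
        return hidden_states


model = ToyModel()
result = model.forward([1, 2], output_hidden_states=True)
```

In this sketch the decorator owns the output_hidden_states flag, so the body of forward stays focused on the computation itself.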

Technical Context

Unlike traditional Transformer models, which rely on attention mechanisms, Mamba2 is a State Space Model (SSM). It does not produce attention weights, so the refactor only captures hidden_states.

Migrate Mamba2Model and Mamba2ForCausalLM to use the PreTrainedModel output tracing decorators (@capture_outputs and @can_return_tuple).

This removes manual boilerplate for collecting hidden states and packing return tuples, aligning the implementation with the library standard.

Also fix a crash in hub_kernels.py where integer version numbers in the kernel mapping caused a TypeError during loading.
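
The failing code in hub_kernels.py is not shown here, so the following is only a hypothetical illustration of this class of bug: a string operation applied to a version that is sometimes an int. The helper version_spec and the mapping contents are made up for the example and are not taken from the library.

```python
def version_spec(version):
    """Build a version specifier string from a str or int version."""
    # Hypothetical helper: coercing with str() means an int such as 1
    # cannot make the concatenation below raise a TypeError.
    return "==" + str(version)


# Hypothetical mapping entries; names and versions are invented for illustration.
toy_mapping = {"kernel_a": "1.2", "kernel_b": 1}

specs = {name: version_spec(v) for name, v in toy_mapping.items()}
```

Without the str() call, the entry with the integer version would fail because Python does not allow concatenating a str with an int.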

Fixes huggingface#43979
@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: mamba2
