[Bug] TypeError: _upad_input() missing 1 required positional argument: 'unpad_input_func' when running inference #41

@panelwe

Problem description

When trying to run inference with the SpikingBrain-7B model, I hit TypeError: _upad_input() missing 1 required positional argument: 'unpad_input_func'. The model weights load normally, but the error is raised during model.generate().

Error log

<details> <summary>Click to expand the full error log</summary>
bash
Loading tokenizer...
Loading model...
Loading weights: 100%|████████████████| 395/395 [00:03<00:00, 113.33it/s]
Model loaded!

Step 2: Generate a reply
Error: _upad_input() missing 1 required positional argument: 'unpad_input_func'
Traceback (most recent call last):
  File "/data/pengwei/panel/snn/SpikingBrain-7B/test_m1.py", line 36, in <module>
    outputs = model.generate(...)
  File "/data/pengwei/.cache/huggingface/modules/transformers_modules/V1_hyphen_7B_hyphen_sft_hyphen_s3_hyphen_reasoning/modeling_gla_swa.py", line 328, in generate
    return super().generate(*args, **kwargs)
  File "/data/pengwei/.cache/huggingface/modules/transformers_modules/V1_hyphen_7B_hyphen_sft_hyphen_s3_hyphen_reasoning/window_attention.py", line 196, in forward
    query_states, key_states, value_states, indices_q, cu_seq_lens, max_seq_lens = _upad_input(
                                                                                   ^^^^^^^^^^^^
TypeError: _upad_input() missing 1 required positional argument: 'unpad_input_func'
</details>

Steps to reproduce

  1. Environment setup

bash
conda create -n spikingbrain python=3.11 -y
conda activate spikingbrain
pip install torch==2.7.1 --index-url https://download.pytorch.org/whl/cu118
pip install flash-attn==2.7.3 --no-build-isolation
pip install transformers==4.46.0
pip install flash-linear-attention==0.3.2
pip install accelerate sentencepiece protobuf safetensors
  2. Run the test code

python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "./models/V1-7B-sft-s3-reasoning"

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

messages = [{"role": "user", "content": "你好"}]
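# return_dict=True makes apply_chat_template return a mapping with input_ids,
# so inputs.input_ids below works regardless of the library's default.
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)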

outputs = model.generate(inputs.input_ids, max_new_tokens=100)

Environment

Component | Version
-- | --
Python | 3.11.15
PyTorch | 2.7.1+cu118
CUDA | 11.8
Transformers | 5.5.4 (also tried 4.46.0)
flash-attn | 2.7.3
flash-linear-attention | 0.3.2
fla-core | (installed with the above)
accelerate | 1.2.1
OS | Ubuntu 22.04
GPU | NVIDIA (CUDA 11.8)
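The install steps pin transformers==4.46.0 while the table lists 5.5.4, so it may help to confirm which versions the active interpreter actually sees. A minimal sketch using only the standard library; the distribution names are taken from the pip commands and table above, and `installed_versions` is a hypothetical helper name, not part of any of these libraries:

```python
from importlib import metadata

# Distribution names as they appear in the pip commands and table above.
PACKAGES = [
    "torch",
    "transformers",
    "flash-attn",
    "flash-linear-attention",
    "fla-core",
    "accelerate",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}: {version or 'not installed'}")
```

Running this inside the `spikingbrain` conda environment shows the versions resolved at import time, which can differ from what was last requested with pip.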

Attempted fixes

  • Downgraded flash-linear-attention to 0.1 (but it requires transformers>=4.45.0, which conflicts with bitnet)

  • Upgraded to flash-linear-attention==0.3.2

  • Cleared the Hugging Face cache (rm -rf ~/.cache/huggingface/modules/)

  • Manually edited window_attention.py to add the unpad_input argument

  • Set the environment variable FLASH_ATTENTION_DISABLE=1

  • Added attn_implementation="eager" or "sdpa" when loading the model

  • Disabled PyTorch SDPA (torch.backends.cuda.enable_flash_sdp(False))

Location of the problematic code

In the cached window_attention.py, line 196:

python
query_states, key_states, value_states, indices_q, cu_seq_lens, max_seq_lens = _upad_input(
    query_states, key_states, value_states, \
    attention_mask[:, -key_states.shape[1]:] if attention_mask is not None \
        else torch.ones(key_states.shape[:2]).to(key_states), q_len
)

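The signature of _upad_input appears to have changed and now requires an additional unpad_input_func argument.

Expected behavior

The model generates text output normally, without the missing-argument error.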
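One way to tolerate both signatures is to inspect the function at call time and pass the extra argument only when it is declared. The sketch below is an illustration under assumptions, not the repository's fix: `call_upad_input`, `old_upad` and `new_upad` are hypothetical names, and the stubs stand in for the real `_upad_input` so the example runs without a GPU. Which callable to supply as `unpad_input_func` in the real model code is not confirmed by this report.

```python
import inspect


def call_upad_input(upad_fn, *args, unpad_input_func=None):
    """Call upad_fn, passing unpad_input_func only if its signature declares it."""
    params = inspect.signature(upad_fn).parameters
    if "unpad_input_func" in params:
        if unpad_input_func is None:
            raise ValueError("this _upad_input variant requires unpad_input_func")
        return upad_fn(*args, unpad_input_func=unpad_input_func)
    return upad_fn(*args)


# Stand-ins for the two signatures (hypothetical, for illustration only).
def old_upad(query, key, value, attention_mask, query_length):
    return "old signature"


def new_upad(query, key, value, attention_mask, query_length, unpad_input_func):
    return unpad_input_func


if __name__ == "__main__":
    print(call_upad_input(old_upad, 1, 2, 3, 4, 5))
    print(call_upad_input(new_upad, 1, 2, 3, 4, 5, unpad_input_func=len))
```

In window_attention.py the same check could wrap the call on line 196, with the five existing arguments passed through unchanged.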
