Hi.
I have an issue loading the model, which causes this error:
WARNING:auto_gptq.nn_modules.fused_llama_mlp:skip module injection for FusedLlamaMLPForQuantizedModel not support integrate without triton yet.
RuntimeError Traceback (most recent call last)
in <cell line: 1>()
----> 1 model = AutoGPTQForCausalLM.from_quantized(model_id,use_safetensors=False)
2 tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side="right", use_fast=False)
2 frames
/usr/local/lib/python3.10/dist-packages/auto_gptq/modeling/_utils.py in autogptq_post_init(model, use_act_order, max_input_length)
256
257 for device, buffers in model.device_to_buffers.items():
--> 258 prepare_buffers(device, buffers["temp_state"], buffers["temp_dq"])
259
260 # Using the default from exllama repo here.
RuntimeError: no device index
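For context, the "no device index" error is raised from exllama's buffer setup when the device it receives is a bare `cuda` with no index (e.g. `cuda` instead of `cuda:0`). A minimal sketch of that distinction, assuming only that `torch` is installed:

```python
import torch

# A bare "cuda" device carries no index attribute; code that needs a
# concrete device (like prepare_buffers above) fails with "no device index".
bare = torch.device("cuda")
indexed = torch.device("cuda:0")

print(bare.index)     # None -> this is what triggers the error
print(indexed.index)  # 0
```

If that is the cause here, passing an explicit device when loading (e.g. `AutoGPTQForCausalLM.from_quantized(model_id, device="cuda:0", use_safetensors=False)`) may be worth trying; whether `device` is the right argument for your AutoGPTQ version is an assumption to verify against its docs.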