I want to use optimum to quantize gpt2-medium. It works fine when I load gpt2 with GPT2ForSequenceClassification.from_pretrained() on CPU, but it throws an error [RuntimeError: Expected all tensors to be on the same device, but found at least two devices] when I load gpt2 onto a GPU with device_map=torch.device('cuda:3').
Information
The official example scripts
My own modified scripts
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)
Reproduction (minimal, reproducible, runnable)
import os
import torch
from optimum.gptq import GPTQQuantizer
from transformers import GPT2ForSequenceClassification, GPT2Tokenizer

class Gpt2GPTQ:
    def __init__(self, model_dir: str, output_dir: str):
        self._output_dir = output_dir
        os.makedirs(self._output_dir, exist_ok=True)
        self._tokenizer = GPT2Tokenizer.from_pretrained(model_dir)
        self._model = GPT2ForSequenceClassification.from_pretrained(
            model_dir,
            torch_dtype=torch.float16,
            device_map=torch.device('cuda:3'))  # it's all OK when the model is loaded on CPU

    def quantization(self, calib_data):
        quantizer = GPTQQuantizer(bits=4, dataset=calib_data, block_name_to_quantize=None, model_seqlen=1024)
        quantized_model = quantizer.quantize_model(self._model, self._tokenizer)
        quantizer.save(quantized_model, self._output_dir)
        self._tokenizer.save_pretrained(self._output_dir)
gpt2_gptq = Gpt2GPTQ(
    '/tmp/gpt2-medium',
    '/tmp/gpt2_q'
)
gpt2_gptq.quantization(["auto-gptq is an easy-to-use model quantization library with user-friendly apis, based on GPTQ algorithm."])
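For comparison, the CPU-only variant that works for me differs only in the from_pretrained call (a minimal sketch; everything else in Gpt2GPTQ stays the same):

        # Loading without device_map keeps the model on CPU; quantization then completes without the device error.
        self._model = GPT2ForSequenceClassification.from_pretrained(
            model_dir,
            torch_dtype=torch.float16)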
Expected behavior
It should work normally when I load the model onto a specific GPU.
System Info
Who can help?
@fxmarty, @SunMarc
Hello there, could someone help look into this issue?