
RuntimeError: Expected all tensors to be on the same device, but found at least two devices #1889

Open
Daya-Jin opened this issue Jun 4, 2024 · 2 comments
Labels
bug, quantization

Comments

@Daya-Jin
Contributor

Daya-Jin commented Jun 4, 2024

System Info

optimum 1.20.0
Python 3.10.8

Who can help?

@fxmarty, @SunMarc

Hello there, could you help look into this issue?

I want to use optimum to quantize gpt2-medium. It works fine when I load gpt2 using GPT2ForSequenceClassification.from_pretrained() on the CPU, but it throws RuntimeError: Expected all tensors to be on the same device, but found at least two devices when I load gpt2 onto a GPU with device_map=torch.device('cuda:3').
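
For context, this error is raised whenever a single op receives tensors living on different devices. A minimal standalone snippet (illustration only, not the repro) that triggers the same message:

import torch

a = torch.ones(2, device='cuda:3')  # lives on GPU 3
b = torch.ones(2)                   # lives on the CPU
a + b  # RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:3 and cpu!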

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction (minimal, reproducible, runnable)

import os

import torch
from transformers import GPT2ForSequenceClassification, GPT2Tokenizer
from optimum.gptq import GPTQQuantizer


class Gpt2GPTQ:
    def __init__(self, model_dir: str, output_dir: str):
        self._output_dir = output_dir
        os.makedirs(self._output_dir, exist_ok=True)
        self._tokenizer = GPT2Tokenizer.from_pretrained(model_dir)
        self._model = GPT2ForSequenceClassification.from_pretrained(
            model_dir,
            torch_dtype=torch.float16,
            device_map=torch.device('cuda:3'),  # works fine when loaded on CPU instead
        )

    def quantization(self, calib_data):
        quantizer = GPTQQuantizer(bits=4, dataset=calib_data, block_name_to_quantize=None, model_seqlen=1024)
        quantized_model = quantizer.quantize_model(self._model, self._tokenizer)
        quantizer.save(quantized_model, self._output_dir)
        self._tokenizer.save_pretrained(self._output_dir)


gpt2_gptq = Gpt2GPTQ(
    '/tmp/gpt2-medium',
    '/tmp/gpt2_q',
)
gpt2_gptq.quantization(["auto-gptq is an easy-to-use model quantization library with user-friendly apis, based on GPTQ algorithm."])
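
A possible workaround sketch in the meantime, assuming the mismatch comes from quantization tensors being allocated on the default CUDA device (device 0): expose only the target GPU via CUDA_VISIBLE_DEVICES so it appears as cuda:0. This must be set before torch initializes CUDA.

# Workaround sketch, not a fix for the underlying bug: with only one GPU
# visible, every tensor ends up on the same device index.
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '3'  # physical GPU 3 is now cuda:0

import torch
from transformers import GPT2ForSequenceClassification, GPT2Tokenizer
from optimum.gptq import GPTQQuantizer

tokenizer = GPT2Tokenizer.from_pretrained('/tmp/gpt2-medium')
model = GPT2ForSequenceClassification.from_pretrained(
    '/tmp/gpt2-medium',
    torch_dtype=torch.float16,
    device_map=torch.device('cuda:0'),  # index 0 maps to physical GPU 3 here
)
quantizer = GPTQQuantizer(bits=4,
                          dataset=["auto-gptq is an easy-to-use model quantization library with user-friendly apis, based on GPTQ algorithm."],
                          model_seqlen=1024)
quantized_model = quantizer.quantize_model(model, tokenizer)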

Expected behavior

The model should quantize normally when loaded onto a specific GPU.

Daya-Jin added the bug label on Jun 4, 2024
@Daya-Jin
Contributor Author

Daya-Jin commented Jun 4, 2024

I've submitted a PR to try to fix this problem, but I haven't run the full test suite to make sure it's correct; I've only tested it on my local machine.

#1891

@SunMarc
Member

SunMarc commented Jun 4, 2024

Thanks for the report and for submitting a PR, @Daya-Jin!
