
RuntimeError: Expected all tensors to be on the same device, but found at least two devices #1889

Open
Daya-Jin opened this issue Jun 4, 2024 · 2 comments
Labels
bug, quantization

Comments

@Daya-Jin
Contributor

Daya-Jin commented Jun 4, 2024

System Info

optimum 1.20.0
Python 3.10.8

Who can help?

@fxmarty, @SunMarc

Hello there, could you help look into this issue?

I want to use optimum to quantize gpt2-medium. It works fine when I load gpt2 using GPT2ForSequenceClassification.from_pretrained() on the CPU, but it throws RuntimeError: Expected all tensors to be on the same device, but found at least two devices when I load gpt2 onto a GPU with device_map=torch.device('cuda:3').
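
For context, this error is raised whenever a single op receives tensors living on different devices. A minimal standalone snippet (illustration only, not the repro) that triggers the same message:

import torch

a = torch.ones(2, device='cuda:3')  # lives on GPU 3
b = torch.ones(2)                   # lives on the CPU
a + b  # RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:3 and cpu!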

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction (minimal, reproducible, runnable)

import os

import torch
from transformers import GPT2ForSequenceClassification, GPT2Tokenizer
from optimum.gptq import GPTQQuantizer


class Gpt2GPTQ:
    def __init__(self, model_dir: str, output_dir: str):
        self._output_dir = output_dir
        os.makedirs(self._output_dir, exist_ok=True)
        self._tokenizer = GPT2Tokenizer.from_pretrained(model_dir)
        self._model = GPT2ForSequenceClassification.from_pretrained(
            model_dir,
            torch_dtype=torch.float16,
            device_map=torch.device('cuda:3'),  # works fine when loaded on CPU instead
        )

    def quantization(self, calib_data):
        quantizer = GPTQQuantizer(bits=4, dataset=calib_data, block_name_to_quantize=None, model_seqlen=1024)
        quantized_model = quantizer.quantize_model(self._model, self._tokenizer)
        quantizer.save(quantized_model, self._output_dir)
        self._tokenizer.save_pretrained(self._output_dir)


gpt2_gptq = Gpt2GPTQ(
    '/tmp/gpt2-medium',
    '/tmp/gpt2_q',
)
gpt2_gptq.quantization(["auto-gptq is an easy-to-use model quantization library with user-friendly apis, based on GPTQ algorithm."])
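
A possible workaround sketch in the meantime, assuming the mismatch comes from quantization tensors being allocated on the default CUDA device (device 0): expose only the target GPU via CUDA_VISIBLE_DEVICES so it appears as cuda:0. This must be set before torch initializes CUDA.

# Workaround sketch, not a fix for the underlying bug: with only one GPU
# visible, every tensor ends up on the same device index.
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '3'  # physical GPU 3 is now cuda:0

import torch
from transformers import GPT2ForSequenceClassification, GPT2Tokenizer
from optimum.gptq import GPTQQuantizer

tokenizer = GPT2Tokenizer.from_pretrained('/tmp/gpt2-medium')
model = GPT2ForSequenceClassification.from_pretrained(
    '/tmp/gpt2-medium',
    torch_dtype=torch.float16,
    device_map=torch.device('cuda:0'),  # index 0 maps to physical GPU 3 here
)
quantizer = GPTQQuantizer(bits=4,
                          dataset=["auto-gptq is an easy-to-use model quantization library with user-friendly apis, based on GPTQ algorithm."],
                          model_seqlen=1024)
quantized_model = quantizer.quantize_model(model, tokenizer)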

Expected behavior

The model should quantize normally when loaded onto a specific GPU.

Daya-Jin added the bug label on Jun 4, 2024
@Daya-Jin
Contributor Author

Daya-Jin commented Jun 4, 2024

I've submitted a PR to try to fix this problem, but I haven't run the full test suite to make sure it's correct; I've only tested it on my local machine.

#1891

@SunMarc
Member

SunMarc commented Jun 4, 2024

Thanks for the report and for submitting a PR, @Daya-Jin!
