
Vicuna-1.5 Quantized using AWQ Not Working - CUDA Illegal Memory Access #2264

Open
mmaaz60 opened this issue Aug 18, 2023 · 3 comments

mmaaz60 commented Aug 18, 2023

Dear Team,

I followed the instructions at https://github.com/mit-han-lab/llm-awq#usage to quantize the Vicuna-13B-v1.5 model, and then followed the instructions at https://github.com/lm-sys/FastChat/blob/55b2f8fdb0e0b80d64e043e9fc9018641bf7289f/docs/awq.md to perform inference. I am getting a CUDA illegal memory access error:

Invalid response object from API: '{"object":"error","message":"**NETWORK ERROR DUE TO HIGH TRAFFIC. PLEASE REGENERATE OR REFRESH THIS PAGE.**\\n\\n(CUDA error: an illegal memory access was encountered\\nCUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.\\nFor debugging consider passing CUDA_LAUNCH_BLOCKING=1.\\nCompile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.\\n)","code":50001}' (HTTP response code was 400)

Any pointers would be greatly appreciated. Thanks
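The error text itself suggests the standard first debugging step: re-run with synchronous CUDA kernel launches so the stack trace points at the kernel that actually faulted, rather than a later API call. A minimal sketch follows; the `CUDA_LAUNCH_BLOCKING` variable is standard PyTorch/CUDA behavior, while the worker command is a commented-out placeholder, not the exact invocation from this issue:

```shell
# Force synchronous CUDA kernel launches so the reported stack trace
# matches the failing kernel (as the error message itself suggests).
export CUDA_LAUNCH_BLOCKING=1

# Then restart the FastChat model worker in the same shell
# (placeholder command; adjust the model path and flags to your setup):
# python3 -m fastchat.serve.model_worker --model-path <your-awq-model>

echo "CUDA_LAUNCH_BLOCKING=$CUDA_LAUNCH_BLOCKING"
```

With launches serialized this way, the worker runs slower but the traceback should name the real failing kernel, which narrows the bug down to either the AWQ kernels or FastChat's serving code.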

@merrymercy
Member

cc @tonylins @kentang-mit @ys-2020

@digisomni
Contributor

If you haven't gotten help with this yet and still want to try quantized models with FastChat, you can give my PR a whirl: #2365

On your model worker you have to set gptq-transformers-bits to 4 (or whatever bit width you're using) and gptq-transformers-disable-exllama. It should work then.
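For concreteness, the worker invocation under that PR might look like the following. This is a hypothetical sketch: the two flag names are taken from the comment above, but the `--` option form, the module path, and the model path are assumptions and may differ in the actual PR:

```shell
# Hypothetical launch of a FastChat model worker with a GPTQ-quantized model.
# Flag names come from the comment above; everything else is an assumption.
python3 -m fastchat.serve.model_worker \
    --model-path ./vicuna-13b-v1.5-gptq \
    --gptq-transformers-bits 4 \
    --gptq-transformers-disable-exllama
```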

@surak
Collaborator

surak commented Oct 22, 2023

@mmaaz60 Have you tried @digisomni's suggestion? Did it work out for you?
