ImportError: /home/myuser/.cache/torch_extensions/py310_cpu/exllama_ext/exllama_ext.so: undefined symbol: hipblasGetStream #154
Hey fellow MI60 chad.
🤝 I'm all about that 32GB of HBM2. I've installed exllama via pip using the command line that you provided, but I'm still seeing the ImportError related to hipblasGetStream.
I should have hipBLAS installed correctly, and I've checked for the
Hmm weird, what's the log say right before
Full output:
I removed exllama from text-generation-webui/repositories when I installed it via pip.
I kept my exllama folder in /repositories and it seemed to work, but... I'm not sure about the hip modules missing.
Do you think that it is potentially an environment/system issue with the ROCm installation and not an exllama issue? I've tried running with both the pip module installed and exllama cloned to text-generation-webui/repositories, but the output is identical to above. I am not sure what else to try; I might reformat and try on Ubuntu 22.04 instead of 20.04. What OS are you using with your MI60s? Or are you using a docker container with them?
I've switched between docker and a standalone system; I used 22.04 for both. There was another GitHub repo with a bunch of ROCm scripts for setting up Stable Diffusion, ooba, etc., but I can't seem to find it now.
I've been having mostly success running GPTQ single-GPU by following that rentry.co guide already. I say "mostly success" because some models output no tokens, gibberish, or some error, but other models run great. I have not been able to do any kind of multi-GPU yet though; so far I have only been running 30B/33B-sized models on each MI60. I'd love to get exllama working with multi-GPU so that I can run 65B-sized models across my 2 MI60s.
I was running GPTQ multi-GPU for 65B; it's pretty slow across the two MI60s, but you have soooo much memory to spare you could probably dump context like it's nothing.
They are both in physical x16 slots, but one is running in PCIe Gen3 x16 mode and the other is unfortunately running in PCIe Gen3 x4 mode at the moment.
I think... the MI60s HATE running at less than x16 for some reason. Just a theory, since I have always had trouble with them like that until I swapped to Epyc boards... Also, what's the output of rocm-smi? And maybe rocm-smi -a?
I've got a 4600G for the APU, so it detects 3 GPUs, but rocm-smi does not play super nicely with the APU. I did install every use-case from the amdgpu installer before I opened the issue. No issues with that installation. I think the next step is to reformat and upgrade to 22.04.
rocm-smi:
rocm-smi -a:
Hmm, yeah, maybe the next step will be 22.04, unfortunately. BTW, I noticed you have one mismatched card, exactly like mine: one of them is Samsung memory and one is Hynix, if I remember correctly. It doesn't change anything architecture-wise, but I just found it interesting that we both ended up with mismatched cards.
We could swap so that we'd both have a match! Ha ha 😛 Interestingly, one of my MI60s has 1x 8-pin + 1x 6-pin PCIe power connectors, and the other MI60 has 2x 8-pin PCIe power connectors, so they are mismatched in that way too.
Yep, confirmed we have probably exactly the same mismatched cards, LOL.
Out of curiosity, how are you cooling your MI60s? I am cooling with one 80mm fan per card, rigged up with 3D-printed fan shrouds so that they force air through the cards.
Can you try hiding your APU? I only had problems trying to use mine.
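Hiding an APU from ROCm is typically done with the ROCR_VISIBLE_DEVICES / HIP_VISIBLE_DEVICES environment variables. A minimal sketch, assuming the MI60s enumerate as devices 0 and 1 and the APU as 2 (check rocm-smi for the real indices):

```python
import os

# Assumed device indices: the two MI60s as 0 and 1, the 4600G APU as 2.
# Confirm with rocm-smi how they actually enumerate on your system.
os.environ["ROCR_VISIBLE_DEVICES"] = "0,1"  # hide the APU from the ROCr runtime
os.environ["HIP_VISIBLE_DEVICES"] = "0,1"   # and from the HIP runtime

import torch  # must be imported after the variables are set

print(torch.cuda.device_count())  # should report 2 with the APU hidden
```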
I already have these environment variables set. I've run with them as part of the command line as well, and the result is identical to the above.
Did you try a docker yet? I'm not sure if it will help, but here's what I used to set up my docker container for oobabooga before.
What's the output of
I have been procrastinating reformatting, so I can still tell you this! No output. Here it is without grep:
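One rough way to check both sides of this from Python (a generic probe, not what was run in the thread; the paths assume a default ROCm install and the cache path from the original error) is a ctypes load, which tells you whether libhipblas actually exports the symbol and whether the built extension can resolve it:

```python
import ctypes
import os

# Assumed default ROCm install path; adjust if hipBLAS lives elsewhere.
hipblas = ctypes.CDLL("/opt/rocm/lib/libhipblas.so")
print(hasattr(hipblas, "hipblasGetStream"))  # True if libhipblas exports the symbol

# Loading the built extension with RTLD_NOW resolves every symbol up front,
# so a missing hipblasGetStream surfaces here as an OSError instead of later.
ext_path = os.path.expanduser(
    "~/.cache/torch_extensions/py310_cpu/exllama_ext/exllama_ext.so"
)
ext = ctypes.CDLL(ext_path, mode=os.RTLD_NOW)
print("extension loaded, all symbols resolved")
```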
Pytorch doesn't have to be built linked to libhipblas.so. This links exllama_ext.so directly to hipblas to avoid potential errors like "exllama_ext.so: undefined symbol: hipblasGetStream" (turboderp#154)
Did you build Pytorch with
Maybe exllama should start linking to hipblas directly. It looks like the only part of torch itself that needs hipblas is FBGEMM, and that's both optional and doesn't get built for x86 32-bit. Could you try this change to the text-generation-webui/repositories version and see if it works? Engininja2/exllama@bb3473e
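Roughly what that change amounts to (paraphrased, not the literal commit; the source list and names below are placeholders, not exllama's real build script): when the extension is built with torch.utils.cpp_extension, hipBLAS can be handed to the linker directly on ROCm builds, so exllama_ext.so resolves hipblasGetStream itself instead of relying on it being re-exported through libtorch.

```python
import torch
from torch.utils.cpp_extension import load

# Placeholder source list; the real build enumerates its own .cpp/.cu files.
sources = ["exllama_ext/exllama_ext.cpp"]

exllama_ext = load(
    name="exllama_ext",
    sources=sources,
    # On a ROCm build of torch, ask the linker for hipBLAS directly so the
    # extension resolves hipblasGetStream at link time.
    extra_ldflags=["-lhipblas"] if torch.version.hip else [],
    verbose=True,
)
```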
I built Pytorch by cloning the official GitHub repo and following only the steps specified in their README for building from source. I definitely didn't explicitly set this environment variable, but I'm not sure if it is on or off by default.
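Whichever flags the source build ended up using, a quick sanity check of what the installed torch was actually built against (just a generic snippet, not something from the thread):

```python
import torch

print(torch.__version__)          # e.g. 2.0.1 for this source build
print(torch.version.hip)          # ROCm/HIP version the build targeted; None on CUDA/CPU builds
print(torch.cuda.is_available())  # True when the ROCm build can see at least one usable GPU
```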
I think this has resolved the original issue! Here is my latest run with output. Not a final success, but definitely good progress, and maybe we are at the point where I should close this issue and think about opening a new one?
I managed to catch the rocm-smi output a moment before the text-generation-webui server bombed out, so I can confirm that the model did load into VRAM across both of the cards!
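If catching rocm-smi at the right moment is fiddly, the same check can be done from inside the Python process right after the model loads (a generic snippet, not from the thread); on ROCm builds the MI60s show up through the regular torch.cuda API:

```python
import torch

# Report how much VRAM is in use on each visible GPU
# (torch.cuda.mem_get_info requires torch >= 1.12).
for i in range(torch.cuda.device_count()):
    free, total = torch.cuda.mem_get_info(i)
    used_gib = (total - free) / 2**30
    print(f"GPU {i}: {used_gib:.1f} GiB used of {total / 2**30:.1f} GiB")
```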
I installed python-pytorch-opt-rocm on Arch Linux and also needed the explicit
@sjstulga not sure if you're still having issues, but I wanted to point out that I was using CUDA_VISIBLE_DEVICES=0 (or 1) even when using my MI60s. I saw you have that in there by other names, but maybe it's worth a try.
I am using exllama through the oobabooga text-generation-webui with AMD/ROCm. I cloned exllama into the text-generation-webui/repositories folder and installed dependencies.
Devices: 2x AMD Instinct MI60 (gfx906)
Distro: Ubuntu 20.04.6
Kernel: 5.15.0-76-generic
ROCm: 5.6.0
PyTorch: 2.0.1, built from source
My command line:
Output: