ImportError: /home/myuser/.cache/torch_extensions/py310_cpu/exllama_ext/exllama_ext.so: undefined symbol: hipblasGetStream #154
Hey fellow MI60 chad.
🤝 I'm all about that 32GB of HBM2. I've installed exllama via pip using the command line that you provided, but I'm still seeing the ImportError related to hipblasGetStream.
I should have hipBLAS installed correctly, and I've checked for the
Hmm weird, what's the log say right before
Full output:
I removed exllama from text-generation-webui/repositories when I installed it via pip.
I kept my exllama folder in /repositories and it seemed to work, but... I'm not sure about the hip modules missing.
Do you think that it is potentially an environment/system issue with the ROCm installation and not an exllama issue? I've tried running with both the pip module installed and exllama cloned to text-generation-webui/repositories, but the output is identical to above. I am not sure what else to try; I might reformat and try on Ubuntu 22.04 instead of 20.04. What OS are you using with your MI60s? Or are you using a docker container with them?
I've switched between docker and a standalone system; I used 22.04 for both. There was another GitHub repo with a bunch of ROCm scripts for setting up Stable Diffusion, ooba, etc., but I can't seem to find it now.
I've been having mostly success running GPTQ single-GPU by following that rentry.co guide already. I say "mostly success" because some models output no tokens, gibberish, or some error, but other models run great. I have not been able to do any kind of multi-GPU yet though; so far I have only been running 30B/33B-sized models on each MI60. I'd love to get exllama working with multi-GPU so that I can run 65B-sized models across my 2 MI60s.
I was running GPTQ multi-GPU for 65B; it's pretty slow across the two MI60s, but you have soooo much memory to spare you could probably dump context like it's nothing.
They are both in physical x16 slots, but one is running in PCIe Gen3 x16 mode and the other is unfortunately running in PCIe Gen3 x4 mode at the moment.
I think... the MI60s HATE running at less than x16 for some reason. Just a theory, since I have always had trouble with them like that until I swapped to Epyc boards... Also, what's the output of rocm-smi? And maybe rocm-smi -a?
I've got a 4600G for the APU, so it detects 3 GPUs, but rocm-smi does not play super nicely with the APU. I did install every use-case from the amdgpu installer before I opened the issue. No issues with that installation. I think the next step is to reformat and upgrade to 22.04.
rocm-smi:
rocm-smi -a:
Hmm, yeah, maybe the next step will be 22.04, unfortunately. BTW, I noticed you have one mismatched card, exactly like mine: one of them is Samsung memory and one is Hynix, if I remember correctly. It doesn't change anything architecture-wise, but I just found it interesting that we both ended up with mismatched cards.
We could swap so that we'd both have a match! Ha ha 😛 Interestingly, one of my MI60s has 1x 8-pin + 1x 6-pin PCIe power connectors, and the other MI60 has 2x 8-pin PCIe power connectors, so they are mismatched in that way too.
Yep, confirmed we have probably exactly the same mismatched cards, LOL.
Out of curiosity, how are you cooling your MI60s? I am cooling with one 80mm fan per card, rigged up with 3D-printed fan shrouds so that they force air through the cards.
Can you try hiding your APU? I only had problems trying to use mine.
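Hiding an APU from ROCm is typically done with the ROCR_VISIBLE_DEVICES / HIP_VISIBLE_DEVICES environment variables. A minimal sketch, assuming the MI60s enumerate as devices 0 and 1 and the APU as 2 (check rocm-smi for the real indices):

```python
import os

# Assumed device indices: the two MI60s as 0 and 1, the 4600G APU as 2.
# Confirm with rocm-smi how they actually enumerate on your system.
os.environ["ROCR_VISIBLE_DEVICES"] = "0,1"  # hide the APU from the ROCr runtime
os.environ["HIP_VISIBLE_DEVICES"] = "0,1"   # and from the HIP runtime

import torch  # must be imported after the variables are set

print(torch.cuda.device_count())  # should report 2 with the APU hidden
```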
I already have these environment variables set. I've run with them as part of the command line as well, and the result is identical to the above.
Did you try a docker yet? I'm not sure if it will help, but here's what I used to set up my docker container for oobabooga before.
What's the output of
I have been procrastinating reformatting, so I can still tell you this! No output. Here it is without grep:
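One rough way to check both sides of this from Python (a generic probe, not what was run in the thread; the paths assume a default ROCm install and the cache path from the original error) is a ctypes load, which tells you whether libhipblas actually exports the symbol and whether the built extension can resolve it:

```python
import ctypes
import os

# Assumed default ROCm install path; adjust if hipBLAS lives elsewhere.
hipblas = ctypes.CDLL("/opt/rocm/lib/libhipblas.so")
print(hasattr(hipblas, "hipblasGetStream"))  # True if libhipblas exports the symbol

# Loading the built extension with RTLD_NOW resolves every symbol up front,
# so a missing hipblasGetStream surfaces here as an OSError instead of later.
ext_path = os.path.expanduser(
    "~/.cache/torch_extensions/py310_cpu/exllama_ext/exllama_ext.so"
)
ext = ctypes.CDLL(ext_path, mode=os.RTLD_NOW)
print("extension loaded, all symbols resolved")
```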
Pytorch doesn't have to be built linked to libhipblas.so. This links exllama_ext.so directly to hipblas to avoid potential errors like "exllama_ext.so: undefined symbol: hipblasGetStream" (turboderp#154)
Did you build Pytorch with
Maybe exllama should start linking to hipblas directly. It looks like the only part of torch itself that needs hipblas is FBGEMM, and that's both optional and doesn't get built for x86 32-bit. Could you try this change to the text-generation-webui/repositories version and see if it works? Engininja2/exllama@bb3473e
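Roughly what that change amounts to (paraphrased, not the literal commit; the source list and names below are placeholders, not exllama's real build script): when the extension is built with torch.utils.cpp_extension, hipBLAS can be handed to the linker directly on ROCm builds, so exllama_ext.so resolves hipblasGetStream itself instead of relying on it being re-exported through libtorch.

```python
import torch
from torch.utils.cpp_extension import load

# Placeholder source list; the real build enumerates its own .cpp/.cu files.
sources = ["exllama_ext/exllama_ext.cpp"]

exllama_ext = load(
    name="exllama_ext",
    sources=sources,
    # On a ROCm build of torch, ask the linker for hipBLAS directly so the
    # extension resolves hipblasGetStream at link time.
    extra_ldflags=["-lhipblas"] if torch.version.hip else [],
    verbose=True,
)
```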
I built Pytorch by cloning the official GitHub repo and following only the steps specified in their README for building from source. I definitely didn't explicitly set this environment variable, but I'm not sure if it is on or off by default.
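Whichever flags the source build ended up using, a quick sanity check of what the installed torch was actually built against (just a generic snippet, not something from the thread):

```python
import torch

print(torch.__version__)          # e.g. 2.0.1 for this source build
print(torch.version.hip)          # ROCm/HIP version the build targeted; None on CUDA/CPU builds
print(torch.cuda.is_available())  # True when the ROCm build can see at least one usable GPU
```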
I think this has resolved the original issue! Here is my latest run with output. Not a final success, but definitely good progress, and maybe we are at the point where I should close this issue and think about opening a new one?
I managed to catch the rocm-smi output a moment before the text-generation-webui server bombed out, so I can confirm that the model did load into VRAM across both of the cards!
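If catching rocm-smi at the right moment is fiddly, the same check can be done from inside the Python process right after the model loads (a generic snippet, not from the thread); on ROCm builds the MI60s show up through the regular torch.cuda API:

```python
import torch

# Report how much VRAM is in use on each visible GPU
# (torch.cuda.mem_get_info requires torch >= 1.12).
for i in range(torch.cuda.device_count()):
    free, total = torch.cuda.mem_get_info(i)
    used_gib = (total - free) / 2**30
    print(f"GPU {i}: {used_gib:.1f} GiB used of {total / 2**30:.1f} GiB")
```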
I installed python-pytorch-opt-rocm on Arch Linux and also needed the explicit
@sjstulga not sure if you're still having issues, but I wanted to point out that I was using CUDA_VISIBLE_DEVICES=0 (or 1) even when using my MI60s. I saw you have that in there by other names, but maybe it's worth a try.
I am using exllama through the oobabooga text-generation-webui with AMD/ROCm. I cloned exllama into the text-generation-webui/repositories folder and installed dependencies.
Devices: 2x AMD Instinct MI60 (gfx906)
Distro: Ubuntu 20.04.6
Kernel: 5.15.0-76-generic
ROCm: 5.6.0
PyTorch: 2.0.1, built from source
My command line:
Output: