ROCm containers fail on multi-gpu AMD systems #525

Closed
abn opened this issue Dec 26, 2024 · 4 comments

@abn (Contributor)

abn commented Dec 26, 2024

When attempting to run a model, the command fails.

$ ramalama --debug run granite-code
run_cmd:  podman run --rm -i --label RAMALAMA --security-opt=label=disable --name ramalama_rDnJbnsIEU -t --device /dev/dri --device /dev/kfd -e HIP_VISIBLE_DEVICES=1 --mount=type=bind,src=/home/abn/.local/share/ramalama/models/ollama/granite-code:latest,destination=/mnt/models/model.file,rw=false quay.io/ramalama/rocm:latest /bin/sh -c llama-cli -m /mnt/models/model.file --in-prefix '' --in-suffix '' -c 2048 --temp 0.8 -p 'You are a helpful assistant' -cnv
Error: Command '['podman', 'run', '--rm', '-i', '--label', 'RAMALAMA', '--security-opt=label=disable', '--name', 'ramalama_rDnJbnsIEU', '-t', '--device', '/dev/dri', '--device', '/dev/kfd', '-e', 'HIP_VISIBLE_DEVICES=1', '--mount=type=bind,src=/home/abn/.local/share/ramalama/models/ollama/granite-code:latest,destination=/mnt/models/model.file,rw=false', 'quay.io/ramalama/rocm:latest', '/bin/sh', '-c', "llama-cli -m /mnt/models/model.file --in-prefix '' --in-suffix '' -c 2048 --temp 0.8 -p 'You are a helpful assistant' -cnv"]' returned non-zero exit status 139.
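
Exit status 139 is worth decoding. By the common shell convention, a status above 128 means the process was terminated by signal number `status - 128`, so 139 points to signal 11 (SIGSEGV): most likely the process inside the container crashed with a segmentation fault rather than exiting with an error code of its own. A minimal Python sketch of that interpretation (`describe_exit_status` is a hypothetical helper for illustration, not part of RamaLama):

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret an exit status using the common shell convention.

    A status above 128 means the process was terminated by signal
    number (status - 128); anything else is the program's own exit code.
    """
    if status > 128:
        # signal.Signals raises ValueError for numbers that are not signals.
        return f"killed by {signal.Signals(status - 128).name}"
    return f"exited with code {status}"


print(describe_exit_status(139))  # killed by SIGSEGV
```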

Running the podman command directly gives the following output.

$ podman run --rm -i --label RAMALAMA --security-opt=label=disable --name ramalama_rDnJbnsIEU -t --device /dev/dri --device /dev/kfd -e HIP_VISIBLE_DEVICES=1 --mount=type=bind,src=/home/abn/.local/share/ramalama/models/ollama/granite-code:latest,destination=/mnt/models/model.file,rw=false quay.io/ramalama/rocm:latest /bin/sh -c llama-cli -m /mnt/models/model.file --in-prefix '' --in-suffix '' -c 2048 --temp 0.8 -p 'You are a helpful assistant' -cnv

rocBLAS error: Cannot read /opt/rocm-6.2.2/lib/rocblas/library/TensileLibrary.dat: Illegal seek for GPU arch : gfx1103
 List of available TensileLibrary Files : 
"/opt/rocm-6.2.2/lib/rocblas/library/TensileLibrary_lazy_gfx1010.dat"
"/opt/rocm-6.2.2/lib/rocblas/library/TensileLibrary_lazy_gfx1012.dat"
"/opt/rocm-6.2.2/lib/rocblas/library/TensileLibrary_lazy_gfx1030.dat"
"/opt/rocm-6.2.2/lib/rocblas/library/TensileLibrary_lazy_gfx1100.dat"
"/opt/rocm-6.2.2/lib/rocblas/library/TensileLibrary_lazy_gfx1101.dat"
"/opt/rocm-6.2.2/lib/rocblas/library/TensileLibrary_lazy_gfx1102.dat"

The error encountered in this case seems similar to the one seen in #497. However, in my case the desired outcome was either that the right GPU be selected or that the HSA_OVERRIDE_GFX_VERSION environment variable be set, rather than forcing the model to run on the CPU.
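
The rocBLAS message above lists the architectures for which this build ships Tensile kernel libraries: gfx1102 is present, but gfx1103, the architecture named in the error, is not. A small Python sketch that extracts the architecture names from such a listing (`supported_archs` and the regex are hypothetical, written only for this illustration):

```python
import re

# Matches lazy-loading library names such as "TensileLibrary_lazy_gfx1102.dat"
# and captures the architecture part ("gfx1102").
TENSILE_RE = re.compile(r"TensileLibrary_lazy_(gfx[0-9a-f]+)\.dat")


def supported_archs(rocblas_output: str) -> set[str]:
    """Return the GPU architectures named in a rocBLAS 'available files' listing."""
    return set(TENSILE_RE.findall(rocblas_output))


log = '''
"/opt/rocm-6.2.2/lib/rocblas/library/TensileLibrary_lazy_gfx1010.dat"
"/opt/rocm-6.2.2/lib/rocblas/library/TensileLibrary_lazy_gfx1012.dat"
"/opt/rocm-6.2.2/lib/rocblas/library/TensileLibrary_lazy_gfx1030.dat"
"/opt/rocm-6.2.2/lib/rocblas/library/TensileLibrary_lazy_gfx1100.dat"
"/opt/rocm-6.2.2/lib/rocblas/library/TensileLibrary_lazy_gfx1101.dat"
"/opt/rocm-6.2.2/lib/rocblas/library/TensileLibrary_lazy_gfx1102.dat"
'''
archs = supported_archs(log)
print(sorted(archs))       # ['gfx1010', 'gfx1012', 'gfx1030', 'gfx1100', 'gfx1101', 'gfx1102']
print("gfx1103" in archs)  # False
```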

The root cause in my setup seems to be related to the presence of multiple GPUs on the machine, although I am not certain.

$ rocminfo
...
==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    AMD Ryzen 7 7840HS w/ Radeon 780M Graphics
  Uuid:                    CPU-XX                             
  Marketing Name:          AMD Ryzen 7 7840HS w/ Radeon 780M Graphics 
...
*******                  
Agent 2                  
*******                  
  Name:                    gfx1102                            
  Uuid:                    GPU-XX                             
  Marketing Name:          AMD Radeon RX 7600M XT        
...
*******                  
Agent 3                  
*******                  
  Name:                    gfx1103                            
  Uuid:                    GPU-XX                             
  Marketing Name:          AMD Radeon 780M
...
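
HIP device indices count only GPU agents, so in this listing the RX 7600M XT would be HIP device 0 and the 780M iGPU device 1, which is consistent with the workarounds below. A minimal sketch that lists GPU agents in that order, assuming (as this report suggests) that HIP numbering follows rocminfo's GPU agent order. `parse_rocminfo` and `GpuAgent` are hypothetical names; real rocminfo output has many more fields, which this sketch simply ignores:

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Matches the "Name:" and "Marketing Name:" lines of rocminfo agent blocks.
FIELD_RE = re.compile(r"\s*(Name|Marketing Name):\s*(.*?)\s*$")


@dataclass
class GpuAgent:
    hip_index: int       # assumed HIP index: position among GPU agents
    arch: str            # e.g. "gfx1102"
    marketing_name: str  # e.g. "AMD Radeon RX 7600M XT"


def parse_rocminfo(text: str) -> list[GpuAgent]:
    """Collect GPU agents, in order, from rocminfo output.

    Heuristic: an agent counts as a GPU when its Name starts with "gfx".
    CPU agents (Agent 1 above) are skipped and do not consume an index.
    """
    gpus: list[GpuAgent] = []
    name = None
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if not match:
            continue
        key, value = match.groups()
        if key == "Name":
            name = value
        elif name is not None and name.startswith("gfx"):
            gpus.append(GpuAgent(len(gpus), name, value))
            name = None
    return gpus


sample = """
  Name:                    AMD Ryzen 7 7840HS w/ Radeon 780M Graphics
  Marketing Name:          AMD Ryzen 7 7840HS w/ Radeon 780M Graphics
  Name:                    gfx1102
  Marketing Name:          AMD Radeon RX 7600M XT
  Name:                    gfx1103
  Marketing Name:          AMD Radeon 780M
"""
for gpu in parse_rocminfo(sample):
    print(gpu.hip_index, gpu.arch, gpu.marketing_name)
# 0 gfx1102 AMD Radeon RX 7600M XT
# 1 gfx1103 AMD Radeon 780M
```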

Expected Outcome

  1. When a model is executed on a multi-GPU AMD system, the correct HIP device (perhaps chosen by some order of preference?) is selected and specified in the container. Alternatively, allowing the device to be specified via a flag (--gpu <num>) or respecting existing environment variables would be sufficient (#490, "Fixed gpu detection for cuda rocm etc using env vars", might resolve this).
  2. (Maybe?) When a model is executed on a HIP device whose GFX version differs from the highest version available on the system, set HSA_OVERRIDE_GFX_VERSION when running the container.
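
One possible reading of point 2 is sketched below: translate the selected device's arch name into the `major.minor.stepping` form that HSA_OVERRIDE_GFX_VERSION takes (gfx1103 becomes 11.0.3), and, if the container's rocBLAS has no kernels for that arch, fall back to the highest supported arch with the same major version. With the rocBLAS listing from this issue, that yields 11.0.2, the same value used in the workaround below. The "highest same-major" rule and all helper names are assumptions for illustration, and an override is not guaranteed to work on every GPU:

```python
from __future__ import annotations


def gfx_to_version(arch: str) -> str:
    """Turn an arch name such as "gfx1103" into a version string "11.0.3".

    Assumption: the last two characters are the minor version and stepping
    (one hex digit each) and everything before them is the major version.
    """
    digits = arch.removeprefix("gfx")
    major, minor, stepping = digits[:-2], digits[-2], digits[-1]
    return f"{int(major)}.{int(minor, 16)}.{int(stepping, 16)}"


def version_tuple(arch: str) -> tuple[int, ...]:
    """Numeric (major, minor, stepping) tuple, used for ordering archs."""
    return tuple(int(part) for part in gfx_to_version(arch).split("."))


def override_for(arch: str, supported: set[str]) -> str | None:
    """Hypothetical heuristic for an HSA_OVERRIDE_GFX_VERSION value.

    Returns None when `arch` already has rocBLAS kernels; otherwise the version
    of the highest supported arch sharing the same major version (or None).
    """
    if arch in supported:
        return None
    major = version_tuple(arch)[0]
    candidates = [a for a in supported if version_tuple(a)[0] == major]
    if not candidates:
        return None
    return gfx_to_version(max(candidates, key=version_tuple))


# Architectures from the rocBLAS error listing in this issue.
SUPPORTED = {"gfx1010", "gfx1012", "gfx1030", "gfx1100", "gfx1101", "gfx1102"}
print(override_for("gfx1103", SUPPORTED))  # 11.0.2
print(override_for("gfx1102", SUPPORTED))  # None
```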

I am happy to contribute code if a direction for the fix is provided.

Workarounds

In my local environment, I had to do one of the following to work around the issue. Either change had to be made directly in the podman command, as I could not figure out how to configure it via RamaLama.

  1. Set HIP_VISIBLE_DEVICES=0 to select the AMD Radeon RX 7600M XT.
  2. Set HSA_OVERRIDE_GFX_VERSION=11.0.2 so that it works with HIP_VISIBLE_DEVICES=1, which selects the iGPU and is what RamaLama passes to podman.

@ericcurtin (Collaborator)

Open to ideas, but by default I propose that RamaLama should simply use the single GPU with the most VRAM. I think heuristics more complex than that are not worth it.

For multi-GPU setups, or any other non-default way of running models, there should be a way to configure that, either via a flag, an environment variable, etc.

We should support the various settings the AI community already uses with llama.cpp, such as HIP_VISIBLE_DEVICES, HSA_OVERRIDE_GFX_VERSION, etc. There is no point in reinventing the wheel.

@abn (Contributor, Author)

abn commented Dec 28, 2024

From my perspective, I agree that defaulting to the GPU with the larger VRAM is a good choice, as it is likely what most users would expect anyway.

Edit: It seems this is already done in the code. In my case, around 8 GB is allocated to the APU as VRAM, while the GPU has 7.98 GB, which caused RamaLama to choose the APU instead of the GPU.
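
This outcome follows directly from a plain "most VRAM wins" rule: the APU's memory carve-out is marginally larger than the discrete GPU's VRAM, so the APU wins. A minimal sketch of that rule with illustrative sizes based on the figures above (the `Gpu` type and `pick_gpu` helper are hypothetical, not RamaLama's actual code, which must also read the VRAM sizes from the system):

```python
from __future__ import annotations

from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass
class Gpu:
    hip_index: int  # index that would be passed as HIP_VISIBLE_DEVICES
    name: str
    vram_bytes: int


def pick_gpu(gpus: list[Gpu]) -> Gpu | None:
    """Return the single GPU with the most VRAM, or None if there are none."""
    return max(gpus, key=lambda g: g.vram_bytes, default=None)


# Illustrative sizes: ~7.98 GB on the discrete GPU, ~8 GB carved out for the APU.
gpus = [
    Gpu(0, "AMD Radeon RX 7600M XT", int(7.98 * GIB)),
    Gpu(1, "AMD Radeon 780M", 8 * GIB),
]
chosen = pick_gpu(gpus)
print(chosen.hip_index, chosen.name)  # 1 AMD Radeon 780M
```

Near-ties like this one are exactly where an explicit override, via a flag or environment variable, matters.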

That said, something I have not fully understood in my original error is why the command failed when the iGPU (HIP_VISIBLE_DEVICES=1) reports GFX version 11.0.3 (gfx1103). This type of mismatch seems likely to be common on multi-GPU systems, such as those combining an AMD CPU and a dedicated AMD GPU.

Edit: Upon investigation, this looks more like a ROCm/PyTorch issue.

And yes, it would be great if RamaLama simply passed all HIP_* and HSA_* variables defined in the environment through to the container.
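
A sketch of that pass-through, assuming the variables are forwarded as podman `-e NAME=VALUE` arguments (the same form RamaLama already uses for HIP_VISIBLE_DEVICES in the command above). `passthrough_env_args` is a hypothetical helper, and #526 may implement this differently:

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# Variable prefixes to forward into the container, as suggested in this thread.
PASSTHROUGH_PREFIXES = ("HIP_", "HSA_")


def passthrough_env_args(environ: Mapping[str, str] = os.environ) -> list[str]:
    """Build podman "-e NAME=VALUE" arguments for every HIP_*/HSA_* variable."""
    args: list[str] = []
    for name in sorted(environ):
        if name.startswith(PASSTHROUGH_PREFIXES):
            args += ["-e", f"{name}={environ[name]}"]
    return args


env = {"HIP_VISIBLE_DEVICES": "0", "HSA_OVERRIDE_GFX_VERSION": "11.0.2", "HOME": "/home/abn"}
print(passthrough_env_args(env))
# ['-e', 'HIP_VISIBLE_DEVICES=0', '-e', 'HSA_OVERRIDE_GFX_VERSION=11.0.2']
```

Forwarding by prefix keeps RamaLama from having to know every ROCm setting individually, in line with the "no point in reinventing the wheel" comment.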

Edit: Proposed #526 for this change.

abn added a commit to abn/ramalama that referenced this issue Dec 28, 2024

@ericcurtin (Collaborator)

@abn, do we consider this issue closed?

@abn (Contributor, Author)

abn commented Jan 5, 2025

@ericcurtin I think we can. I do not think RamaLama can do much else here for now.

abn closed this as completed Jan 5, 2025