Bug: Vulkan backend unable to allocate memory when run across multiple GPUs for larger models #7804
Comments
On a side note, when the
@richardanaya This might be a case where your main GPU takes on a little bit more than the other one, and it also handles the UI, so it runs out of VRAM. Try running it with the second one as the main GPU.
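A minimal sketch of that suggestion on the command line, assuming the second 7900 XTX shows up as Vulkan device index 1 and that this build accepts the usual -mg/--main-gpu option (both the index and the flag usage here are illustrative, not taken from the thread):
.\server.exe -m ..\gguf_models\Cat-Llama-3-70B-instruct-Q4_K_M.gguf -ngl 100 -mg 1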
@0cc4m Hmm, thanks. I'm going to close this. I was able to get it to run by reducing the number of layers, though it was far fewer than I was expecting. I'll reopen if I can pin down more specific Vulkan issues.
@richardanaya I fixed a VRAM bug that affected Llama 3 on Vulkan in #7947; can you check whether that solved your issue? 70B should fit in 48 GB of VRAM.
@0cc4m I think there's been a significant regression: llama-server.exe doesn't seem to detect two 7900 XTXs anymore, while older versions of llama.cpp still see both graphics cards. I created a new bug here: #7997
@0cc4m Great news! Multi-GPU is running with larger models than I've ever been able to run before :) I can run some 70B models with an 8K context.
PS Z:\llama> .\llama-bench.exe -m ..\gguf_models\Cat-Llama-3-70B-instruct-Q4_K_M.gguf
build: 557b653 (3197)
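As a rough illustration of the kind of run described above, assuming this build supports the usual -c, -sm, and -ts options; the layer count, context size, and even tensor split are illustrative values, not ones confirmed in the thread:
.\llama-server.exe -m ..\gguf_models\Cat-Llama-3-70B-instruct-Q4_K_M.gguf -ngl 99 -c 8192 -sm layer -ts 1,1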
What happened?
I have two 24 GB 7900 XTXs, and I've noticed that when I try to offload models that are definitely within their combined VRAM, I get out-of-memory (OOM) errors. @0cc4m
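For context, the partial-offload workaround mentioned in the comments above (reducing the number of offloaded layers and splitting evenly across both cards) would look roughly like this; the layer count is an assumption for illustration, and -ts/--tensor-split is assumed to be available in this build:
.\server.exe -m ..\gguf_models\Cat-Llama-3-70B-instruct-Q4_K_M.gguf -ngl 60 -ts 1,1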
Name and Version
.\server.exe -m ..\gguf_models\Cat-Llama-3-70B-instruct-Q4_K_M.gguf -ngl 100
What operating system are you seeing the problem on?
Windows
Relevant log output