ROCm: Memory calculation fails to take HIP_VISIBLE_DEVICES into account #1104
Comments
Do you know if I am able to differentiate between the GPU types in `rocminfo`? I don't have AMD so I can't tell. Ideally there should be a `rocminfo` flag to skip iGPUs.
Yeah, that would be ideal, wouldn't it? :P FWIW this is the output of

I stubbed out the id of my discrete card just in case anyone could get up to shenanigans with it (I don't think this is actually the serial number, because there's an option for that too, and that never returns anything, but who knows). So maybe the unique ID is usable, as the iGPU has "N/A"? But I couldn't tell you if this is a guarantee. Incidentally there is also a

```json
{
    "card0": {
        "Unique ID": "0x9246____________",
        "VRAM Total Memory (B)": "17163091968",
        "VRAM Total Used Memory (B)": "708767744",
        "Card Series": "Navi 21 [Radeon RX 6800/6800 XT / 6900 XT]",
        "Card Model": "0x73bf",
        "Card Vendor": "Advanced Micro Devices, Inc. [AMD/ATI]",
        "Card SKU": "001",
        "Subsystem ID": "0x2406",
        "Device Rev": "0xc1",
        "Node ID": "1",
        "GUID": "45534",
        "GFX Version": "gfx1030"
    },
    "card1": {
        "Unique ID": "N/A",
        "VRAM Total Memory (B)": "67108864",
        "VRAM Total Used Memory (B)": "26079232",
        "Card Series": "Raphael",
        "Card Model": "0x164e",
        "Card Vendor": "Advanced Micro Devices, Inc. [AMD/ATI]",
        "Card SKU": "RAPHAEL",
        "Subsystem ID": "0x364e",
        "Device Rev": "0xc6",
        "Node ID": "2",
        "GUID": "52156",
        "GFX Version": "gfx1036"
    }
}
```
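A minimal sketch of the heuristic floated above: filter per-card JSON in the shape shown and drop cards whose "Unique ID" is "N/A", on the assumption (explicitly not a documented guarantee) that iGPUs report "N/A" there while discrete cards report a real id. The helper name and the trimmed sample data are illustrative, not koboldcpp code:

```python
import json

# Trimmed-down sample in the shape of the per-card JSON above.
SAMPLE = """{
  "card0": {"Unique ID": "0x9246____________", "VRAM Total Memory (B)": "17163091968"},
  "card1": {"Unique ID": "N/A", "VRAM Total Memory (B)": "67108864"}
}"""

def discrete_cards(smi_json: str) -> dict:
    """Keep only cards that look discrete.

    Heuristic only (NOT guaranteed by any spec): iGPUs appear to
    report "N/A" for "Unique ID", discrete cards a real hex id.
    """
    cards = json.loads(smi_json)
    return {name: info for name, info in cards.items()
            if info.get("Unique ID", "N/A") != "N/A"}

print(list(discrete_cards(SAMPLE)))  # → ['card0']
```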
To be clear, I'm not suggesting koboldcpp should actually do something like detect iGPUs and actively ignore them -- this would be counterproductive for people who actually want to use them, which is absolutely possible for most AMD configurations (though you may need
Decided to be un-lazy and cook up a PR anyway. The logic may or may not apply to NV as well, per my earlier comment. Note that the calculation as a whole is, I think, wrong/suspect/deficient when multiple GPUs are used simultaneously, since there seems to be no attempt at using
I have a ROCm compiled with support for both the discrete GPU and the iGPU, but with `HIP_VISIBLE_DEVICES` set to `0` to ensure only the discrete GPU is considered (the iGPU is just for experimenting, it's far too slow to meaningfully use). But because `rocminfo` and `rocm-smi` list both GPUs, setting `gpulayers` to `-1` to trigger the automatic calculation uses the reported iGPU memory. This is wrong/suboptimal in two respects:

1. …`GGML_HIP_UMA`, though, which tanks perf on discrete GPUs -- but I digress.
2. With `HIP_VISIBLE_DEVICES` set, the iGPU is actually not used for inference at all, yet its lower memory is used, causing the calculation to always offload 0 layers.

If I ugly hack my local koboldcpp.py to simply ignore any devices beyond the first, the auto-layer calculation does its job correctly. I'm too lazy to write a real patch fixing the problem, I just wanted to mention it here. Taking `HIP_VISIBLE_DEVICES` (and/or `CUDA_VISIBLE_DEVICES`, which AMD supports for compatibility) into account should not be particularly hard. Taking the iGPU reported memory thing into account is probably too complicated (AFAIK there isn't even a reliable way to detect it's an iGPU; things like the marketing name being "AMD Radeon Graphics" and the uuid being "GPU-XX" are certainly suggestive, but hardly convincing). If you're using an iGPU (or God forbid a combination) you probably don't want to rely on the auto-layer feature anyway.

I don't know if Nvidia would have similar issues with `CUDA_VISIBLE_DEVICES` active.