[Linux] (Exit code 133) Error when loading large LLM models #285
@yushijinhun which operating system are you on? Assuming Linux? |
Yes. I'm on Linux. |
Thanks. @neilmehta24 from our team is investigating this with priority as we speak. |
I built Electron v33.0.2 with |
Could you give a little more specific detail about exactly how you resolved this? I'm in a similar situation on Linux, with 16 GB VRAM and 128 GB system RAM; I could load models in Oobabooga but I am getting the 133 error in LM Studio. I'm not sure where to start, how I would "build" Electron, what Electron is, or how it relates to the Linux AppImage of LM Studio that I load. |
I run LM Studio v0.3.6 on Ubuntu 24.04 with a 7840HS CPU, 96 GB RAM, and 16 GB GPU VRAM, and encountered the same issue: small models can be loaded normally but large models cannot. I would appreciate it if @yushijinhun could share your compiled Electron version. |
Electron is a UI framework built on Chromium that LM Studio uses.
Here is my Electron v33.0.2 Linux x86_64 build (without the allocator shim): Google Drive. You can also follow Electron's Build Instructions to build it yourself. To replace LM Studio's stock Electron, first unpack the AppImage:
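For anyone unsure how to do the swap, a minimal sketch of the unpack-and-replace procedure is below. The AppImage file name is an example, and the exact name and layout of the Electron files inside the unpacked directory are assumptions; compare against what the stock build actually ships before overwriting anything.

```bash
# Unpack the AppImage; this creates a squashfs-root/ directory next to it
./LM_Studio-0.3.6.AppImage --appimage-extract   # file name is an example

# Back up the stock Electron binary, then drop in the custom build
# (the binary name inside squashfs-root/ is an assumption - check your tree;
#  you may also need the matching *.pak / *.so support files from the custom build)
cp squashfs-root/lm-studio squashfs-root/lm-studio.bak
cp /path/to/custom-electron/electron squashfs-root/lm-studio

# Launch the unpacked app directly
./squashfs-root/lm-studio
```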
Then you would see a |
@yushijinhun Thank you for that :) |
Why is this not the default? LM Studio exclusively uses GGUF files, which implies most users are limited on VRAM. |
This is a new bug in 0.3.6 because we updated our Electron version. |
Same issue when downgrading. |
I am using the Linux version of LM Studio 0.3.6 on Ubuntu 22.04 LTS, with two 3090 cards, 256 GB RAM, and a 24-core AMD CPU. For some reason the Qwen2 VL model (7B or 72B) does not load; I get the following message: "(Exit code: 133). Please check settings and try loading the model again." I was able to load the 7B version of the same model on a Windows machine (0.3.6) with 64 GB RAM and a 12 GB 4070 card, but here neither the 72B nor the 7B loads. My NVIDIA driver is 550. I thought it was a memory issue, so I tried Llama 70B, which loaded just fine; GPU utilisation was around 15 GB on each card. Are there any logs collected by LM Studio which I can share to help with this problem? |
Thanks @aamir-gmail! We are aware of this issue and we are working on a fix. The recommendation is to stay on 0.3.5 until it's out. Get 0.3.5 from https://lmstudio.ai/download#beta-and-experimental-releases |
FYI, I was able to load Llama 70B with 0.3.6 without a problem. I will keep you posted on how I go with 0.3.5.
|
Tried Qwen2 VL 7B and 72B from the download link you provided and still got the same error message: "(Exit code: 133). Please check the settings and try loading the model again." Are there any logs I can send you? Let me know where to find them if you require them.
|
I'm facing the same issue. I have no GPU, but I have 32 GB RAM and an Intel Core vPro i7. I'm using the latest version, LM Studio 0.3.8 (Build 4). |
Same issue here; even if I try to offload the whole thing to RAM (I have more than enough) it errors out. Currently I'm just dropping down a model size for the time being. Happy to share any logs that you need. |
Same issue on Ubuntu 24.04 with 64 GB RAM. The model I am trying to load is the 34.8 GB bartowski/DeepSeek-R1-Distill-Qwen-32B-GGUF/DeepSeek-R1-Distill-Qwen-32B-Q8_0.gguf |
I resolved this issue by doing the following. In my case I needed to first install the Vulkan tools.
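The exact commands were omitted from the comment above; on an Ubuntu/Debian-based system the Vulkan tooling is typically installed like this (package names are the standard Ubuntu ones and may differ on other distributions):

```bash
# Install the Vulkan utilities and loader (Ubuntu/Debian package names)
sudo apt install vulkan-tools libvulkan1

# Confirm that a Vulkan-capable device is actually visible to the loader
vulkaninfo --summary
```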
|
Same problem. 64 GB RAM, GeForce RTX 3060, Linux Mint 22.1 Cinnamon, model 'DeepSeek R1 Distill Llama 70B Q4_K_L': Error loading model. (Exit code: 133) But with Windows 11 on the same machine, with the same model copied from the Linux folder to the Windows folder, it loads and works without problems. Both are LM Studio 0.3.8 (Build 4). alphacentrino: your solution doesn't work on my computer. |
We are working on a fix, sorry for the wait. In the meantime, 0.3.5 should work: https://lmstudio.ai/download#beta-and-experimental-releases |
@yagil 0.3.5 has the same issue for me. I have two desktops, and the same issue occurs on both machines with 0.3.5 and the recent release. |
Hi Jachyme, |
I have the same issue with Ubuntu 22.04 (128 GB RAM, 4 GB VRAM). However, it works in 0.3.5 (Build 2) if I use:
- the CPU runtime from the settings
- offload layers set explicitly to 0 (the bin symbol appears); the GPU runtime might also work with this setting
- batch evaluation size reduced to 128 or lower

The trick with the reduced batch evaluation size shouldn't even be necessary, as the OP pointed out it all works with vanilla llama.cpp. However, going to a newer version (0.3.7 or 0.3.8) to use the R1 Distill models, for example, the "trick" with the reduced batch eval size does not work anymore and it tells me:

🥲 Failed to load the model
Error loading model.
(Exit code: 133). Please check settings and try loading the model again.

It loads the model and then at 10 GB RAM usage or so the error occurs. EDIT: BUG LOGS: [ModelLoadingProvider] Requested to load model bartowski/DeepSeek-R1-Distill-Qwen-14B-GGUF/DeepSeek-R1-Distill-Qwen-14B-Q4_K_M.gguf with opts {
|
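For comparison, since the same configuration reportedly loads fine in vanilla llama.cpp, a rough CPU-only llama.cpp invocation mirroring the settings above (0 GPU layers, batch size 128) might look like the sketch below; the binary name, model path, and context size are assumptions that depend on your build and download location.

```bash
# CPU-only load with a reduced batch size, roughly equivalent to the
# LM Studio settings described above (binary name and paths are examples)
./llama-cli \
  -m ~/models/DeepSeek-R1-Distill-Qwen-14B-Q4_K_M.gguf \
  --n-gpu-layers 0 \
  --batch-size 128 \
  -c 4096 \
  -p "Hello"
```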
FYI, the Qwen2-VL-Instruct model (7B or 72B) does not work on any LM Studio release from 0.3.5 to 0.3.8. To recap, my system is 2x 3090, 256 GB RAM, Ubuntu 22.04.
As a suggestion, please enable detailed logging so we can understand where and what exactly is failing and offer a fix.
|
I have edited my comment and added logs to my description. I hope these are the log messages you need. |
I have met the same situation/error, which says "(Exit code: 133). Please check settings and try loading the model again."

Error logs in case the error occurs:

In case of a successful pattern:

Environment
Software:
Hardware:
|
I want to add that it works with no problem if all layers plus the cache fit into VRAM. But with 0 layers offloaded to the GPU it stops loading, and that is not bound to the model size but only to the context size. So even a 70B model can be loaded if I set the context to 1000 or so. |
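That observation is consistent with the KV cache growing linearly with context length rather than with model file size. A back-of-envelope estimate is sketched below; the layer and head counts are illustrative values for a 70B-class GQA model, not figures taken from any specific GGUF.

```bash
# Rough fp16 KV-cache size: 2 (K and V) x layers x kv_heads x head_dim x bytes x context
layers=80; kv_heads=8; head_dim=128; bytes=2; ctx=32768
echo "$(( 2 * layers * kv_heads * head_dim * bytes * ctx / 1024 / 1024 )) MiB"   # prints "10240 MiB"
```

With those numbers, dropping the context from 32768 to around 1000 shrinks the cache from roughly 10 GiB to a few hundred MiB, which matches the report above that large models load once the context is reduced.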
@yagil any news on this? By the way, the latest 0.3.9 builds - both stable and beta - don't fix this, and users report that the 0.3.5 you linked doesn't either. Are there any cons to the proposed change to the Electron compilation flag discussed above, which seems to fix the issue? |
Does not seem to be solved in the new version 0.3.9 |
Version 0.3.9 still has the issue. By lowering the context length, I was able to load the model. |
How did you solve this problem? I'm trying to run a DeepSeek-R1 GGUF, which would run with 20 GPU layers offloaded. |
For what it's worth, I updated to the latest beta and that solved it for me. |
I have not been able to get it to work with any version or with any of the tricks mentioned. The application's user interface is the best of all I've tried; it makes me very angry that I can't use it. |
I was running into this same issue over the last couple of days when attempting to use DeepSeek-R1 GGUF. This approach worked for me. I am now able to load the 300GB+ versions of the model. |
0.3.9 build 6, issue still exists... |
I have the same problem on 0.3.9 |
the same problem on 0.3.9, model
|
Update about models loaded successfully and unsuccessfully with LM Studio 0.3.9 build 6 (Linux Mint 22, AMD Ryzen 7 5700G, 64 GB RAM):
|
When loading large LLMs (for example, Meta-Llama-3.1-70B-Instruct-IQ2_S with a 32768 context window), I encounter the error "(Exit code: 133). Please check settings and try loading the model again." My machine has 64 GB RAM and 16 GB VRAM, and I can load the model with the same configuration in llama.cpp. Therefore, the problem should not be caused by insufficient RAM or VRAM.

Further investigation into the coredump shows that the application crashes in the function _ZN15partition_alloc8internal32PartitionExcessiveAllocationSizeEm, which means the application is trying to allocate an excessive amount of memory through PartitionAlloc in a call to posix_memalign. This explains why the problem occurs only in LM Studio and not in llama.cpp, as Electron uses PartitionAlloc by default. This is tracked upstream in issue electron/electron#44291.

Apart from waiting for upstream to fix this issue, a potential workaround is to use a customized Electron build that disables PartitionAlloc.