OOM when using InternVL2_5-1B-MPO #3143
Can you share the following information?
I get the error when I run the code below.
Could you enable the INFO log level? Let's check what the log indicates:

```python
from lmdeploy import pipeline, TurbomindEngineConfig

pipe = pipeline(
    "./InternVL2_5-1B-MPO-4bit/",
    backend_config=TurbomindEngineConfig(model_format="awq", session_len=2048),
    log_level="INFO",
)
```
What's the memory size of the Jetson Orin?
The GPU memory size is 16 GB, and I have uploaded the log file: log.txt
Did you build lmdeploy from source? The default prebuilt package works on the x86_64 platform, not aarch64.
Yes, I built LMDeploy from source. By default, BUILD_MULTI_GPU is set to ON, but I modified it to OFF because there is only one GPU on the Jetson. |
Sure. |
This seems similar to #3006.
Thank you for sharing. I want to test InternVL2.5, so I can't downgrade to v0.4.0.
From the log, the OOM is triggered at the tuning stage. The most relevant option is `max_prefill_token_num`. You may also want to decrease `cache_max_entry_count`.
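For example, a minimal sketch applying both suggestions (the second option name was lost in the page extraction, so `cache_max_entry_count` is an assumption, and the values are only illustrative):

```python
from lmdeploy import pipeline, TurbomindEngineConfig

# Smaller prefill chunks and a smaller share of free GPU memory reserved for
# the KV cache leave more headroom during the tuning stage.
engine_cfg = TurbomindEngineConfig(
    model_format="awq",
    session_len=2048,
    max_prefill_token_num=1024,   # cap on tokens processed per prefill iteration
    cache_max_entry_count=0.3,    # fraction of free GPU memory for the KV cache
)
pipe = pipeline("./InternVL2_5-1B-MPO-4bit/", backend_config=engine_cfg,
                log_level="INFO")
```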
I still encountered the same error after adding those options. I have uploaded two log files (log_1.txt, log_2.txt): log_1 uses `max_prefill_token_num`, while log_2 uses both.
I followed the installation guide to build lmdeploy (0.7.0.post3) from source.
Inference using the PyTorch engine works fine.
However, after quantizing the model to 4-bit with AWQ, I encountered an OOM error when loading the model with the TurboMind engine.
I have tried setting session_len=2048 in TurbomindEngineConfig.
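For reference, a sketch of the two load paths described above (the unquantized model directory name is an assumption; the 4-bit path follows the thread):

```python
from lmdeploy import pipeline, PytorchEngineConfig, TurbomindEngineConfig

# Works: the original model on the PyTorch engine.
pipe_pt = pipeline("./InternVL2_5-1B-MPO/",
                   backend_config=PytorchEngineConfig(session_len=2048))

# OOMs on the 16 GB Jetson Orin: the AWQ 4-bit model on the TurboMind engine.
pipe_tm = pipeline("./InternVL2_5-1B-MPO-4bit/",
                   backend_config=TurbomindEngineConfig(model_format="awq",
                                                        session_len=2048))
```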