
OOM when using InternVL2_5-1B-MPO #3143

Open
BobHo5474 opened this issue Feb 14, 2025 · 12 comments

@BobHo5474

I followed the installation guide to build lmdeploy (0.7.0.post3) from source.
Inference using the PyTorch engine works fine.
However, after quantizing the model to 4-bit using AWQ, I encountered an OOM error when loading the model with the TurboMind engine.
I tried setting session_len=2048 in TurbomindEngineConfig.

@lvhan028
Collaborator

Can you share the following information?

  • the output of lmdeploy check_env
  • the reproducible code

@BobHo5474
Author

I get an error when I run lmdeploy check_env because I built lmdeploy on a Jetson Orin.

Below is the code:
from lmdeploy import pipeline, TurbomindEngineConfig, PytorchEngineConfig
pipe = pipeline("./InternVL2_5-1B-MPO-4bit/", backend_config=TurbomindEngineConfig(model_format="awq", session_len=2048))

I ran lmdeploy lite auto_awq OpenGVLab/InternVL2_5-1B-MPO --work-dir InternVL2_5-1B-MPO-4bit to quantize the model.

@lvhan028
Collaborator

Can you enable the INFO log level? Let's check what the log indicates:

from lmdeploy import pipeline, TurbomindEngineConfig, PytorchEngineConfig
pipe = pipeline("./InternVL2_5-1B-MPO-4bit/", backend_config=TurbomindEngineConfig(model_format="awq", session_len=2048), log_level='INFO')

@lvhan028
Collaborator

What's the memory size of the Jetson Orin?

@BobHo5474
Author

The GPU memory size is 16 GB, and I have uploaded the log file: log.txt

@lvhan028
Collaborator

lvhan028 commented Feb 17, 2025

Did you build lmdeploy from source? The default prebuilt package targets the x86_64 platform rather than aarch64.

@lvhan028 lvhan028 self-assigned this Feb 17, 2025
@BobHo5474
Author

Yes, I built LMDeploy from source. By default, BUILD_MULTI_GPU is set to ON, but I modified it to OFF because there is only one GPU on the Jetson.

@lvhan028
Collaborator

Sure.
@lzhangzz do you have any clue?

@quanfeifan

This seems similar to #3006 to me.

@BobHo5474
Author

Thank you for sharing. I want to test InternVL2.5, so I can't downgrade to v0.4.0.
@lzhangzz Do you have any clues on how to solve this issue?

@lzhangzz
Collaborator

lzhangzz commented Feb 25, 2025

From the log, the OOM is triggered at the tuning stage. The most relevant option is --max-prefill-token-num, whose default value is 8192. To start with, try decreasing it to 2048.

You may also want to decrease --cache-max-entry-count to 0.5 or even 0.25, as the KV cache is allocated before the intermediate buffers.
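
For reference, a minimal sketch of how these two knobs could be passed through the Python API, assuming the TurbomindEngineConfig keyword arguments mirror the CLI flags above (max_prefill_token_num, cache_max_entry_count); the values are the ones suggested in this thread, not verified defaults:

from lmdeploy import pipeline, TurbomindEngineConfig

pipe = pipeline(
    "./InternVL2_5-1B-MPO-4bit/",
    backend_config=TurbomindEngineConfig(
        model_format="awq",
        session_len=2048,
        max_prefill_token_num=2048,   # default 8192; smaller prefill chunks shrink the intermediate buffers
        cache_max_entry_count=0.25,   # fraction of free GPU memory reserved for the KV cache
    ),
    log_level="INFO",
)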

@BobHo5474
Author

I still encountered the same error after adding max_prefill_token_num=2048.
However, when I also added cache_max_entry_count=0.25, the OOM error did not occur, but I received "no kernel image is available for execution on the device", and inference then failed with this new error.
Does "no kernel image is available for execution on the device" indicate that lmdeploy 0.7.0.post3 is not supported on Jetson?

I have uploaded two log files (log_1.txt, log_2.txt): log_1 uses only max_prefill_token_num, while log_2 uses both options.
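
As a side note (not from this thread): "no kernel image is available for execution on the device" generally means the CUDA kernels were not compiled for the device's compute capability, which is sm_87 on Jetson Orin, so the TurboMind build would also need to target that architecture. A quick, hypothetical check of what the device and the local PyTorch install report:

import torch

major, minor = torch.cuda.get_device_capability(0)
print(f"device compute capability: sm_{major}{minor}")   # expected sm_87 on Jetson Orin
print("torch compiled for:", torch.cuda.get_arch_list())  # archs baked into this PyTorch build (TurboMind kernels are built separately)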
