Video2Text Inference is slow and high vram consumption #116

southkorea2013 · 2024-11-21T07:26:49Z

Hi,

I want to process a 90-second video, but it runs out of memory. Is there any way to reduce the VRAM consumption?
Thanks.

python -m mlx_vlm.video_generate --model mlx-community/Qwen2-VL-7B-Instruct-bf16 --max-tokens 500 --prompt "Describe this video" --video /Users/mdsadmin/demos/Excavator.mp4 --max-pixels 720 410 --fps 1.0
Loading model: mlx-community/Qwen2-VL-7B-Instruct-bf16
Fetching 14 files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 14/14 [00:00<00:00, 44183.79it/s]
==========
Video: /Users/mdsadmin/demos/Excavator.mp4 

Prompt: <|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
<|vision_start|><|video_pad|><|vision_end|>Describe this video<|im_end|>
<|im_start|>assistant

qwen-vl-utils using torchvision to read video.
Generating video description...
libc++abi: terminating due to uncaught exception of type std::runtime_error: Attempting to allocate 190794240000 bytes which is greater than the maximum allowed buffer size of 77309411328 bytes.
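
For scale, the two byte counts in the error above can be converted to GiB. This is only a small sketch using the numbers from the log; the variable names are made up for illustration.

```python
# Byte counts copied verbatim from the error message above.
requested_bytes = 190_794_240_000
max_buffer_bytes = 77_309_411_328

GIB = 1024 ** 3

requested_gib = requested_bytes / GIB
max_buffer_gib = max_buffer_bytes / GIB
ratio = requested_bytes / max_buffer_bytes

print(f"requested:  {requested_gib:.1f} GiB")
print(f"max buffer: {max_buffer_gib:.1f} GiB")
print(f"ratio:      {ratio:.2f}x")
```

Note that the error is about a single buffer exceeding the per-buffer limit, not about total memory use across the whole run.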


Blaizzy · 2024-11-21T07:28:10Z

Could you share the specs of your machine?

Blaizzy · 2024-11-21T07:35:01Z

I would recommend:

Trying 8-bit or 4-bit quants.
Trying the 2B version.
Or lowering the resolution further, to 512 or 224.
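
To get a feel for why resolution matters so much, here is a rough back-of-the-envelope sketch. It is not taken from the mlx-vlm source. It assumes the oversized buffer is a full attention score matrix (heads x tokens x tokens) in float32 inside the vision tower, and it assumes 14-pixel patches, 2 frames per temporal patch, 16 attention heads, and image sides rounded down to a multiple of 28. The function name and the 78-frame count are hypothetical. Under these assumptions the 720 x 410 case comes out to exactly the 190794240000 bytes from the failed allocation, which suggests (but does not prove) that memory grows with the square of the number of visual patches.

```python
def attention_scores_bytes(frames, height, width, patch=14, temporal_patch=2,
                           heads=16, bytes_per_value=4, round_to=28):
    """Estimate the size of one heads x tokens x tokens attention score buffer.

    Every default here is an assumption about the vision tower, not a value
    read from the mlx-vlm code.
    """
    # Assumed preprocessing: round each side down to a multiple of round_to.
    h = (height // round_to) * round_to
    w = (width // round_to) * round_to
    # Patches in one temporal slice, and the number of temporal slices.
    patches_per_slice = (h // patch) * (w // patch)
    slices = frames // temporal_patch
    tokens = patches_per_slice * slices
    # Quadratic in tokens: one score per (query, key) pair per head.
    return tokens, heads * tokens * tokens * bytes_per_value


GIB = 1024 ** 3
# 78 frames is a hypothetical count chosen for illustration.
for height, width in [(410, 720), (512, 512), (224, 224)]:
    tokens, size = attention_scores_bytes(78, height, width)
    print(f"{width}x{height}: {tokens} tokens, {size / GIB:.1f} GiB")
```

Because the estimate is quadratic in the token count, halving both sides of the frame cuts the patch count by about 4x and this buffer by about 16x. Quantizing the weights or switching to the 2B model reduces weight memory, which this sketch does not model.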

southkorea2013 · 2024-11-21T07:35:32Z

Hi Prince,

My test machine is an M3 Max with 128 GB of RAM.

Thanks,
Nan

southkorea2013 · 2024-11-21T07:37:09Z

> I would recommend:
>
> Trying 8bit or 4bit quants.
> Trying the 2B version.
> Or lowering the resolution further to 512 or 224

Ok, Thanks.

Blaizzy · 2024-11-21T07:45:10Z

Awesome!

It should work fine if you just lower the resolution.

I have an M3 Max with 96 GB of unified memory.

I can run this example in under a minute:
https://github.com/Blaizzy/mlx-vlm/blob/62bb0ee2f57354de4cd27e42be593049269353a4/examples/video_generation.ipynb

Blaizzy · 2024-11-21T07:45:49Z

> Ok, Thanks

My pleasure!
