Video2Text Inference is slow and high vram consumption #116

Open
southkorea2013 opened this issue Nov 21, 2024 · 6 comments

Comments

southkorea2013 commented Nov 21, 2024

Hi,

I want to process a 90-second video, but it runs out of memory. Is there any way to reduce the VRAM consumption?
Thanks.

python -m mlx_vlm.video_generate --model mlx-community/Qwen2-VL-7B-Instruct-bf16 --max-tokens 500 --prompt "Describe this video" --video /Users/mdsadmin/demos/Excavator.mp4 --max-pixels 720 410 --fps 1.0
Loading model: mlx-community/Qwen2-VL-7B-Instruct-bf16
Fetching 14 files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 14/14 [00:00<00:00, 44183.79it/s]
==========
Video: /Users/mdsadmin/demos/Excavator.mp4 

Prompt: <|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
<|vision_start|><|video_pad|><|vision_end|>Describe this video<|im_end|>
<|im_start|>assistant

qwen-vl-utils using torchvision to read video.
Generating video description...
libc++abi: terminating due to uncaught exception of type std::runtime_error: Attempting to allocate 190794240000 bytes which is greater than the maximum allowed buffer size of 77309411328 bytes.
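
For scale, the two byte counts in the error above can be converted to GiB. This is a minimal sketch in Python; the helper name `to_gib` is hypothetical, and only the two numbers come from the log.

```python
# Byte counts copied from the error message in the log above.
REQUESTED_BYTES = 190_794_240_000
MAX_BUFFER_BYTES = 77_309_411_328


def to_gib(num_bytes: int) -> float:
    """Convert a byte count to GiB (2**30 bytes)."""
    return num_bytes / 2**30


requested_gib = to_gib(REQUESTED_BYTES)
limit_gib = to_gib(MAX_BUFFER_BYTES)

print(f"requested: {requested_gib:.1f} GiB")
print(f"limit:     {limit_gib:.1f} GiB")
print(f"ratio:     {REQUESTED_BYTES / MAX_BUFFER_BYTES:.2f}x the limit")
```

In other words, a single allocation asks for roughly 177.7 GiB against a 72 GiB maximum buffer size, so the request is rejected outright.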

Blaizzy (Owner) commented Nov 21, 2024

Could you share the specs of your machine?

Blaizzy commented Nov 21, 2024

I would recommend:

  1. Trying 8bit or 4bit quants.
  2. Trying the 2B version.
  3. Or lowering the resolution further to 512 or 224
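
A rough way to see why lowering the resolution can help so much: the failed request of 190794240000 bytes equals 16 × 54600² × 4. One possible reading, which is an assumption and not confirmed anywhere in this thread, is a float32 attention-score buffer with 16 heads over 54,600 vision patches. Under that assumption the buffer grows with the square of the patch count. The sketch below uses a hypothetical helper, `attention_score_bytes`, purely for illustration; it is not mlx-vlm code.

```python
def attention_score_bytes(
    num_patches: int, num_heads: int = 16, bytes_per_value: int = 4
) -> int:
    """Size of a dense (heads x patches x patches) score buffer.

    ASSUMPTION: this layout is a guess that happens to match the
    number in the error message; it is not taken from mlx-vlm.
    """
    return num_heads * num_patches * num_patches * bytes_per_value


MAX_BUFFER_BYTES = 77_309_411_328  # limit reported in the error message

baseline = 54_600  # patch count that reproduces the failed request (assumed)
for scale in (1.0, 0.5, 0.25):
    patches = int(baseline * scale)
    size = attention_score_bytes(patches)
    status = "fits" if size <= MAX_BUFFER_BYTES else "too large"
    print(f"{patches:>6} patches -> {size / 2**30:7.1f} GiB ({status})")
```

If the assumption holds, halving the number of patches (fewer pixels per frame, or fewer sampled frames) cuts this buffer to a quarter, which would already bring it under the reported limit.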

southkorea2013 (Author) commented

Hi Prince,

My test machine is an M3 Max with 128 GB of RAM.

Thanks,
Nan

southkorea2013 (Author) commented

> I would recommend:
>
>   1. Trying 8bit or 4bit quants.
>   2. Trying the 2B version.
>   3. Or lowering the resolution further to 512 or 224

Ok, Thanks.

Blaizzy commented Nov 21, 2024

Awesome!

It should work fine if you just lower the resolution.

I have an M3 Max with 96 GB of unified memory.

I can run this example in under a minute:
https://github.com/Blaizzy/mlx-vlm/blob/62bb0ee2f57354de4cd27e42be593049269353a4/examples/video_generation.ipynb

Blaizzy commented Nov 21, 2024

> Ok, Thanks

My pleasure!
