qwen2-vl 2b 4-bit always getting OOM, yet llama3.2 11b works! #1326
Hey @mehamednews :), Qwen2-VL uses more memory than Llama-3.2 because of how it processes images: it encodes them at (near) native resolution, so the number of visual tokens grows with image size, whereas Llama-3.2 Vision resizes images to a fixed tile budget.
@mehamednews Apologies for the delay - that's actually weird, I'm pretty sure I reduced the VRAM requirement of Qwen by a lot, mainly via gradient checkpointing. Would it be possible to log your memory usage and take a screenshot? Also, if possible, could you print out the Unsloth info (Unsloth version, torch version, etc.)?
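For anyone gathering those numbers, a minimal sketch (assuming a single CUDA device; this snippet is not from the thread itself):

```python
# Hypothetical logging snippet to collect the versions and memory stats requested above.
import importlib.metadata
import torch
import transformers

print("Unsloth:", importlib.metadata.version("unsloth"))
print("torch:", torch.__version__, "| transformers:", transformers.__version__)
print("GPU:", torch.cuda.get_device_name(0))
print(f"Allocated:     {torch.cuda.memory_allocated() / 1024**3:.2f} GiB")
print(f"Reserved:      {torch.cuda.memory_reserved() / 1024**3:.2f} GiB")
print(f"Peak reserved: {torch.cuda.max_memory_reserved() / 1024**3:.2f} GiB")
```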
Hi @danielhanchen, I am facing the same issue when I try to fine-tune Qwen2-VL 7B on my custom dataset on an A5000 (24 GB) GPU. Llama 3.2 11B runs without problems, but I get out-of-memory errors with Qwen, and I'm not sure where the issue is. Here's my environment:
Could it be an image size issue? If so, can you explain how to reduce the image size using Unsloth's tokenizer wrapper? It's not clear from the documentation or code. Or should I just resize the images and then pass them to the tokenizer? Specifically, which part of Unsloth expects these two parameters when loading the model and tokenizer?

```python
# default processor
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")

# The default range for the number of visual tokens per image in the model is 4-16384.
# You can set min_pixels and max_pixels according to your needs, such as a token count
# range of 256-1280, to balance speed and memory usage.
# min_pixels = 256*28*28
# max_pixels = 1280*28*28
# processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct", min_pixels=min_pixels, max_pixels=max_pixels)
```
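For what it's worth, one way to test whether image size is the culprit is to downscale the dataset images with PIL before they reach the tokenizer/chat template. This is only a sketch, not an official Unsloth API; the `MAX_SIDE` cap and the `dataset.map` usage at the end are assumptions about your setup:

```python
from PIL import Image

MAX_SIDE = 784  # hypothetical cap, chosen as a multiple of 28 to align with Qwen2-VL's patch grid

def downscale(image: Image.Image, max_side: int = MAX_SIDE) -> Image.Image:
    """Shrink the longest side to max_side, keeping the aspect ratio."""
    if max(image.size) <= max_side:
        return image
    image = image.copy()
    image.thumbnail((max_side, max_side), Image.LANCZOS)  # resizes in place, preserves aspect ratio
    return image

# Example usage (assumes samples look like {"image": PIL.Image, "text": str}):
# dataset = dataset.map(lambda sample: {**sample, "image": downscale(sample["image"])})
```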
Qwen2-VL has always been memory hungry (compared to the other vision models), and even with Unsloth it still OOMs, while the larger Llama 3.2 11B works fine.
I'm using a dataset with high-resolution images (~1200px); running with the LaTeX dataset did work with Qwen.
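A rough back-of-the-envelope estimate of why ~1200px images are so heavy, assuming roughly one visual token per 28x28-pixel patch (consistent with the `min_pixels = 256*28*28` comment in the Qwen processor snippet quoted above):

```python
# Assumption: ~one visual token per 28x28-pixel patch.
width = height = 1200
visual_tokens = (width // 28) * (height // 28)
print(visual_tokens)  # 42 * 42 = 1764 visual tokens per image, before any text tokens
```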
Not sure if this can be fixed.
Any help would be appreciated.
Here's the code I'm using (replacing Llama 3.2 with Qwen fails):