qwen2-vl 2b 4-bit always getting OOM, yet llama3.2 11b works! #1326

Open
mehamednews opened this issue Nov 22, 2024 · 3 comments

@mehamednews

qwen2-vl has always been memory hungry (compared to the other vision models), and even with unsloth it still OOMs while the much larger llama3.2 11b works fine.
I'm using a dataset with high-resolution images (~1200px); running with the LaTeX dataset did work with qwen.
Not sure if this can be fixed.
Any help would be appreciated.

Here's the code I'm using (replacing llama3.2 with qwen fails):

import json
from unsloth import FastVisionModel, is_bf16_supported  # FastLanguageModel for LLMs
from unsloth.trainer import UnslothVisionDataCollator
from trl import SFTTrainer, SFTConfig
from datasets import load_dataset

model, tokenizer = FastVisionModel.from_pretrained(
    "unsloth/Llama-3.2-11B-Vision-Instruct-bnb-4bit",
    load_in_4bit=True,  # Use 4bit to reduce memory use. False for 16bit LoRA.
    use_gradient_checkpointing="unsloth",  # True or "unsloth" for long context
)

model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers=True,  # False if not finetuning vision layers
    finetune_language_layers=True,  # False if not finetuning language layers
    finetune_attention_modules=True,  # False if not finetuning attention layers
    finetune_mlp_modules=True,  # False if not finetuning MLP layers
    r=16,  # The larger, the higher the accuracy, but might overfit
    lora_alpha=16,  # Recommended alpha == r at least
    lora_dropout=0,
    bias="none",
    random_state=3407,
    use_rslora=False,  # We support rank stabilized LoRA
    loftq_config=None,  # And LoftQ
    # target_modules = "all-linear", # Optional now! Can specify a list if needed
)

# Load the JSONL dataset
dataset_path = "./label-dataset-train.jsonl"
dataset = []
with open(dataset_path, "r") as f:
    for line in f:
        sample = json.loads(line)
        # if len(sample["images"]) > 1:
        #     continue
        conversation = [
            {
                "role": "user",
                "content": [{"type": "text", "text": sample["query"]}, *[{"type": "image", "image": img} for img in sample["images"]]],
            },
            {"role": "assistant", "content": [{"type": "text", "text": sample["response"]}]},
        ]
        dataset.append({"messages": conversation})

converted_dataset = dataset
print(len(dataset))


FastVisionModel.for_training(model)  # Enable for training!

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    data_collator=UnslothVisionDataCollator(model, tokenizer),  # Must use!
    train_dataset=converted_dataset,
    args=SFTConfig(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        warmup_steps=10,
        max_steps=50,
        # num_train_epochs=1,  # Set this instead of max_steps for full training runs
        learning_rate=2e-4,
        fp16=not is_bf16_supported(),
        bf16=is_bf16_supported(),
        logging_steps=1,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
        report_to="none",  # For Weights and Biases
        # You MUST put the below items for vision finetuning:
        remove_unused_columns=False,
        dataset_text_field="",
        dataset_kwargs={"skip_prepare_dataset": True},
        dataset_num_proc=4,
        max_seq_length=2048,
    ),
)

trainer_stats = trainer.train()
@WizKnight

Hey @mehamednews :), Qwen2-VL uses more memory than Llama-3.2 due to its architecture and the way it processes images.
Since you're working with high-resolution images, try experimenting with a couple of things:

  1. Downsampling the image resolution to ~512px or ~256px. This can significantly reduce memory usage (see the sketch after this list).

  2. Increasing gradient_accumulation_steps in your code. This gives you a larger effective batch size without increasing the memory used per optimization step.
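
A minimal sketch of option 1, assuming the "images" entries in your JSONL are file paths (or anything PIL can open) as in your snippet above; thumbnail() keeps the aspect ratio while capping the longest side:

import json
from PIL import Image

MAX_SIDE = 512  # cap for the longest side; try 256 if it still OOMs

def load_and_downsample(path, max_side=MAX_SIDE):
    """Open an image and shrink it so its longest side is at most max_side."""
    img = Image.open(path).convert("RGB")
    img.thumbnail((max_side, max_side), Image.LANCZOS)  # in-place, keeps aspect ratio
    return img

dataset = []
with open("./label-dataset-train.jsonl", "r") as f:
    for line in f:
        sample = json.loads(line)
        images = [load_and_downsample(p) for p in sample["images"]]
        conversation = [
            {
                "role": "user",
                "content": [{"type": "text", "text": sample["query"]}, *[{"type": "image", "image": img} for img in images]],
            },
            {"role": "assistant", "content": [{"type": "text", "text": sample["response"]}]},
        ]
        dataset.append({"messages": conversation})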

@danielhanchen
Contributor

@mehamednews Apologies for the delay - that's actually weird; I'm pretty sure I reduced Qwen's VRAM requirement by a lot, mainly through gradient checkpointing.

Would it be possible to log your memory usage and take a screenshot? Also, if possible, could you print out the Unsloth info part (Unsloth version, torch version, etc.)?
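
A minimal sketch of the kind of logging I mean, using standard torch.cuda calls (print the first part before training and the peak after trainer.train()); the Unsloth/torch version info is the ASCII banner printed when the model loads:

import torch

gpu = torch.cuda.get_device_properties(0)
total_gb = gpu.total_memory / 1024**3
print(f"GPU = {gpu.name}. Total memory = {total_gb:.3f} GB.")

# ... run trainer.train() ...

peak_gb = torch.cuda.max_memory_reserved() / 1024**3  # peak memory reserved by PyTorch
print(f"Peak reserved memory = {peak_gb:.3f} GB ({peak_gb / total_gb * 100:.1f}% of total).")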

@Gladiator07

Gladiator07 commented Nov 26, 2024

Hi @danielhanchen, I am facing the same issue when I try to finetune qwen2-vl 7b on my custom dataset on an A5000 (24 GB) GPU. Llama 3.2 11b runs without problems, but I get out-of-memory errors with qwen; I'm not sure where the issue is.

here's my environment:

==((====))==  Unsloth 2024.11.9: Fast Qwen2_Vl vision patching. Transformers = 4.46.3.
   \\   /|    GPU: NVIDIA RTX A5000. Max memory: 23.988 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.5.1. CUDA = 8.6. CUDA Toolkit = 12.1.
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.28.post3. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
`Qwen2VLRotaryEmbedding` can now be fully parameterized by passing the model config through the `config` argument. All other arguments will be removed in v4.46

Could it be an image size issue? If so, can you guide me on how to reduce the image size using unsloth's tokenizer wrapper? It's not clear from the documentation or code. Or should I just resize the images and then pass them to the tokenizer?

Specifically, which part of unsloth accepts these two parameters when loading the model and tokenizer with unsloth?

from transformers import AutoProcessor

# Default processor
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")

# The default range for the number of visual tokens per image in the model is 4-16384.
# You can set min_pixels and max_pixels according to your needs, such as a token count
# range of 256-1280, to balance speed and memory usage.
# min_pixels = 256*28*28
# max_pixels = 1280*28*28
# processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct", min_pixels=min_pixels, max_pixels=max_pixels)
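
As a rough sanity check (my own arithmetic): each visual token corresponds to a 28x28-pixel patch, so the commented-out values translate into these pixel budgets; a ~1200px image stays under the default cap (and is therefore encoded near full resolution) but would be downscaled under the tighter max_pixels:

# Each visual token covers a 28x28-pixel patch, so min/max_pixels are just
# token budgets multiplied by 28*28.
min_pixels = 256 * 28 * 28        # 200,704 px    (~448 x 448)
max_pixels = 1280 * 28 * 28       # 1,003,520 px  (~1002 x 1002)
default_max = 16384 * 28 * 28     # 12,845,056 px (default upper bound)

image_px = 1200 * 1200            # ~1.44M px for a ~1200px square image
print(image_px < default_max)     # True  -> kept near full resolution by default
print(image_px < max_pixels)      # False -> would be downscaled with the tighter cap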
