
Anyres compatible fine-tuning of llava-1.6 mistral 7b and 34b #1347

Open
wants to merge 5 commits into main

Conversation

arielnlee

Low-rank fine-tuning with anyres for the LLaVA-NeXT models :)

@awzhgw

awzhgw commented Apr 12, 2024

That is a good PR. I am fine-tuning with this PR. Thanks!

@arielnlee

@arielnlee
Author

Ofc, glad you found it useful! I'm sure the author's version is far superior (<3 llava), but wanted to leave this here for others to use until we get the real magic :)

@awzhgw

@awzhgw

awzhgw commented Apr 14, 2024

@arielnlee I encountered an issue during the training process. I am using the LoRA fine-tuning method, and my data consists of two parts:

1. A large number of text-only question-answering dialogues.
2. Image question-answering dialogues.

During training, I found that training on the first, text-only part of the dataset is very slow, about as slow as the image part. After investigation, I found that the reason is:

In the __getitem__ method of the LazySupervisedDataset class in train.py:

        if 'image' in self.list_data_dict[i]:
            data_dict['image'] = image
            data_dict['image_size'] = image_size
        elif self.data_args.is_multimodal:
            # image does not exist in the data, but the model is multimodal
            crop_size = self.data_args.image_processor.crop_size
            data_dict['image'] = torch.zeros(3, crop_size['height'], crop_size['width'])
            data_dict['image_size'] = crop_size
        return data_dict

When I delete this code:

            elif self.data_args.is_multimodal:
                # image does not exist in the data, but the model is multimodal
                crop_size = self.data_args.image_processor.crop_size
                data_dict['image'] = torch.zeros(3, crop_size['height'], crop_size['width'])
                data_dict['image_size'] = crop_size
            return data_dict

the training process errors out with:

Traceback (most recent call last):
  File "/export/App/training_platform/PinoModel/LLaVA/llava/train/train_mem.py", line 9, in <module>
    train(attn_implementation="flash_attention_2")
  File "/export/App/training_platform/PinoModel/LLaVA/llava/train/train.py", line 1092, in train
    trainer.train()
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1537, in train
    return inner_training_loop(
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1854, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2744, in training_step
    self.accelerator.backward(loss)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py", line 1958, in backward
    self.deepspeed_engine_wrapped.backward(loss, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/deepspeed.py", line 167, in backward
    self.engine.backward(loss, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/engine.py", line 1964, in backward
    self.optimizer.backward(loss, retain_graph=retain_graph)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/zero/stage3.py", line 2152, in backward
    self.loss_scaler.backward(loss.float(), retain_graph=retain_graph)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/fp16/loss_scaler.py", line 63, in backward
    scaled_loss.backward(retain_graph=retain_graph)
  File "/usr/local/lib/python3.10/dist-packages/torch/_tensor.py", line 491, in backward
    torch.autograd.backward(
  File "/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py", line 251, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

How can I fine-tune on the text-only portion quickly?

Is there a way to do this and still end up training a good LLaVA model?
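
One direction I am considering (not verified): LLaVA's training arguments include group_by_modality_length, which is meant to put text-only and image samples into separate, length-sorted batches. It does not skip the dummy-image fallback above, but it keeps each batch homogeneous, which usually reduces padding waste. Conceptually it does something like this rough sketch (illustration only, not the actual LLaVA sampler):

from typing import Any, Dict, List

def modality_grouped_batches(samples: List[Dict[str, Any]], batch_size: int) -> List[List[int]]:
    """Group sample indices so a batch is either all text-only or all image samples."""
    image_idx = [i for i, s in enumerate(samples) if 'image' in s]
    text_idx = [i for i, s in enumerate(samples) if 'image' not in s]
    batches = []
    for bucket in (image_idx, text_idx):
        for start in range(0, len(bucket), batch_size):
            batches.append(bucket[start:start + batch_size])
    return batches

# Example: indices 0 and 2 have images, 1 and 3 are text-only.
print(modality_grouped_batches(
    [{'image': 'a.jpg'}, {'conversations': []}, {'image': 'b.jpg'}, {'conversations': []}],
    batch_size=2))  # [[0, 2], [1, 3]]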

@rohithbojja

I got adapter_model.safetensors instead of adapter_model.bin after LoRA fine-tuning of 1.6-mistral, and I'm getting this error
Loading checkpoint shards: 100%|██████████| 4/4 [00:03<00:00, 1.30it/s]
Traceback (most recent call last):
  File "/home/rohith/LLaVA-1.6-ft/scripts/merge_lora_weights.py", line 22, in <module>
    merge_lora(args)
  File "/home/rohith/LLaVA-1.6-ft/scripts/merge_lora_weights.py", line 8, in merge_lora
    tokenizer, model, image_processor, context_len = load_pretrained_model(args.model_path, args.model_base, model_name, device_map='cpu')
  File "/home/rohith/LLaVA-1.6-ft/llava/model/builder.py", line 112, in load_pretrained_model
    mm_projector_weights = torch.load(os.path.join(model_path, 'mm_projector.bin'), map_location='cpu')
  File "/home/rohith/miniconda3/envs/llava/lib/python3.10/site-packages/torch/serialization.py", line 986, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "/home/rohith/miniconda3/envs/llava/lib/python3.10/site-packages/torch/serialization.py", line 435, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/home/rohith/miniconda3/envs/llava/lib/python3.10/site-packages/torch/serialization.py", line 416, in __init__
    super().__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: '/home/rohith/Documents/mistral-llava/mm_projector.bin'

when trying to merge the model.

@awzhgw

awzhgw commented Apr 18, 2024

@rohithbojja Maybe the model_path is wrong. Please share your model_path and model_base args.

@awzhgw

awzhgw commented Apr 18, 2024

@rohithbojja

nohup python scripts/merge_lora_weights.py --model-path=../checkpoints/llava-v1.6-34b-xxx-lora-5000 --model-base=../checkpoints/llava-v1.6-34b --save-model-path=../checkpoints/llava-v1.6-34b-xxx-5000 &

@rohithbojja

@rohithbojja

nohup python scripts/merge_lora_weights.py --model-path=../checkpoints/llava-v1.6-34b-xxx-lora-5000 --model-base=../checkpoints/llava-v1.6-34b --save-model-path=../checkpoints/llava-v1.6-34b-xxx-5000 &

I've fixed it by adding "lora" to the model-path.
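
For anyone else hitting the FileNotFoundError above: as far as I can tell, load_pretrained_model in llava/model/builder.py picks its loading path from the checkpoint folder name, so a LoRA output directory whose name does not contain "lora" falls through to the branch that expects mm_projector.bin. This is only a paraphrased sketch of that dispatch (check builder.py in your own checkout; the details may differ):

def describe_loading_path(model_path, model_base, model_name):
    """Rough sketch of how builder.py decides what kind of checkpoint it is loading."""
    if 'lora' in model_name.lower() and model_base is not None:
        # LoRA checkpoint: load model_base, then apply and merge the adapter
        # weights (adapter_model.safetensors / adapter_model.bin) from model_path.
        return 'lora-merge'
    elif model_base is not None:
        # Projector-only checkpoint: looks for mm_projector.bin inside model_path,
        # which is the branch that raised the FileNotFoundError above.
        return 'projector-only'
    else:
        # Full checkpoint: load everything from model_path directly.
        return 'full-model'

# Example paths below are made up, just to show the naming-based dispatch.
print(describe_loading_path('/ckpt/llava-v1.6-mistral-7b-lora', '/ckpt/llava-v1.6-mistral-7b',
                            'llava-v1.6-mistral-7b-lora'))  # 'lora-merge'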

@findalexli

Can you please provide some examples of your training data?

system="""<|im_start|>system\nAnswer the questions.""",
roles=("<|im_start|>user\n", "<|im_start|>assistant\n"),

I was wondering why you chose to add a new conversation format. I was trying to fine-tune based on your PR with my existing data made for LLaVA 1.5 fine-tuning, which uses the 'v1' conversation version, but I'm currently running into issues where the tokenizer length is mismatched.

@rohithbojja

rohithbojja commented Apr 21, 2024

@rohithbojja

#!/bin/bash

deepspeed llava/train/train_mem.py \
    --lora_enable True --lora_r 16 --lora_alpha 32 --mm_projector_lr 2e-5 \
    --deepspeed ./scripts/zero2.json \
    --model_name_or_path /home/rohith/llava-v1.6-mistral-7b-bnb-4bit/ \
    --version mistral_instruct \
    --data_path /home/rohith/Desktop/vqa/vqa/images/filtered_dataset.json \
    --image_folder /home/rohith/Desktop/vqa/vqa/images/ \
    --vision_tower openai/clip-vit-large-patch14-336 \
    --mm_projector_type mlp2x_gelu \
    --mm_vision_select_layer -2 \
    --mm_use_im_start_end False \
    --mm_use_im_patch_token False \
    --mm_patch_merge_type spatial_unpad \
    --image_aspect_ratio anyres \
    --group_by_modality_length False \
    --bf16 False \
    --fp16 True \
    --output_dir /home/rohith/LLaVA-1.6-ft/llava_lora_mistral_med/ \
    --num_train_epochs 1 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 4 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 500 \
    --save_total_limit 5 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.05 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 True \
    --model_max_length 4096 \
    --gradient_checkpointing True \
    --dataloader_num_workers 4 \
    --lazy_preprocess True \
    --report_to wandb

Using this script gives me the error:

ValueError: .to is not supported for 4-bit or 8-bit bitsandbytes models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct dtype.

Using the original model doesn't give any error; the error only appears when I use the panoyo9829/llava-v1.6-mistral-7b-bnb-4bit model.
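
My reading of the error (not confirmed anywhere in this PR): a checkpoint that was already quantized with bitsandbytes cannot be moved or cast with .to(), which is what the trainer ends up doing, so the -bnb-4bit repo is probably the wrong starting point. The usual route is to start from the full-precision base model and quantize at load time with a BitsAndBytesConfig; if I remember right, LLaVA's train.py exposes this via its --bits 4 / --bits 8 options. A minimal sketch of load-time 4-bit quantization with plain transformers (the model name below is only an illustration, not the LLaVA loader):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Quantize the full-precision weights while loading, instead of loading an
# already 4-bit checkpoint and letting the trainer try to .to()/cast it.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2",  # illustrative full-precision base
    quantization_config=bnb_config,
    device_map="auto",
)

# Do not call model.to(...) afterwards; bitsandbytes has already placed and cast
# the weights, which is exactly what the ValueError above complains about.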

@findalexli

findalexli commented Apr 24, 2024 via email

@rohithbojja

rohithbojja commented Apr 24, 2024

@findalexli
Use this to download dataset

https://drive.google.com/file/d/1gYLOFaz7Mn-E2u9ksT0R2BOai7MnmNcm/view?usp=drivesdk

It has the following structure:

    VQA/
        images/
            img1
            img2
        train/
            filtered_dataset.json

One image is truncated; remove it. Use this script to detect it:

from PIL import Image
import os

trunk_ = 0

def is_truncated(image_path):
    try:
        # Open the image file
        img = Image.open(image_path)
        # Check if the image is truncated by trying to load it
        img.load()
        return False  # Image is not truncated
    except Exception as e:
        print(f"Error loading image {image_path}: {e}")
        return True  # Image is truncated or corrupt

def check_for_truncated_images(directory, trunk_):
    # Iterate through all files in the directory
    for filename in os.listdir(directory):
        # Check if the file is an image
        if filename.endswith(('.jpg', '.jpeg', '.png', '.gif', '.bmp')):
            image_path = os.path.join(directory, filename)
            if is_truncated(image_path):
                print(f"The image {filename} in directory {directory} is truncated.")
                trunk_ = 1
            else:
                trunk_ = 0
    print(trunk_)

directory_path = '/workspace/vqa/images'
check_for_truncated_images(directory_path, 0)

Also remove the corresponding entry in the JSON file, otherwise you'll end up failing at around 30% of the run.
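
If it helps, here is a small sketch for dropping the matching entries from filtered_dataset.json as well (this assumes the usual LLaVA list-of-dicts format where each multimodal entry has an 'image' field; the path and file names are placeholders you need to adjust):

import json

bad_images = {'REPLACE_WITH_TRUNCATED_FILENAME.jpg'}  # file name(s) reported by the script above

json_path = '/workspace/vqa/train/filtered_dataset.json'  # adjust to your layout
with open(json_path) as f:
    data = json.load(f)

# Keep text-only entries and image entries whose file is not in the bad set.
cleaned = [entry for entry in data if entry.get('image') not in bad_images]
print(f"Removed {len(data) - len(cleaned)} entries")

with open(json_path, 'w') as f:
    json.dump(cleaned, f, indent=2)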

Good luck

Sato-Daichi mentioned this pull request May 1, 2024
@diridiri

Hi, thanks for working on your own version of anyres LLaVA.

I have fine-tuned vicuna-v1.5-7b with anyres / spatial_unpad in the same configuration as above, but the result doesn't seem to work out well on lmms-eval, with an MME score of 357 / 224 (LLaVA-v1.5-7B: 1519 / 332).

Have you done any evaluation on public benchmarks and gotten similar scores?

@NicoZenith

Hi! Thanks for sharing!
However, when I execute your training script, it also trains the vision encoder (adapter_model.safetensors contains vision encoder weights).
Is there a way to disable gradient backprop into these weights, as is done for the original LLaVA fine-tuning?
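
For reference, my understanding (possibly wrong) is that upstream LLaVA's find_all_linear_names skips vision_tower / mm_projector / vision_resampler modules when building the LoRA target_modules, so I would expect something along the lines of this sketch (PEFT-based, illustrative only; the model argument is assumed to be the loaded LLaVA model before PEFT wrapping):

import torch.nn as nn
from peft import LoraConfig, get_peft_model

def wrap_with_lora(model):
    """Wrap a loaded LLaVA model with LoRA while keeping the vision stack out of it."""
    skip_keywords = ['vision_tower', 'mm_projector', 'vision_resampler', 'lm_head']

    # Collect Linear layer names that are not part of the vision stack.
    target_modules = sorted({
        name.split('.')[-1]
        for name, module in model.named_modules()
        if isinstance(module, nn.Linear) and not any(k in name for k in skip_keywords)
    })

    # Freeze the vision encoder explicitly so no gradients flow into it
    # (assumes the LLaVA model class exposes get_vision_tower()).
    for p in model.get_vision_tower().parameters():
        p.requires_grad = False

    lora_config = LoraConfig(r=16, lora_alpha=32, target_modules=target_modules,
                             lora_dropout=0.05, bias="none", task_type="CAUSAL_LM")
    return get_peft_model(model, lora_config)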

@ipheiman

Can you please provide some examples of your training data?

system="""<|im_start|>system\nAnswer the questions.""",
roles=("<|im_start|>user\n", "<|im_start|>assistant\n"),

I was wondering why you chose to add a new conversation format. I was trying to fine-tune based on your PR with my existing data made for LLaVA 1.5 fine-tuning, which uses the 'v1' conversation version, but I'm currently running into issues where the tokenizer length is mismatched.

@findalexli Hi there, did you find out why? I used the new template and ran into tokenization mismatch errors, so I'm going to try with v1 now. Let me know if you managed to fine-tune a LLaVA 1.6! :)
