How to finetune llava-ov from PREV_STAGE_CHECKPOINT? #378

BBBBchan opened this issue Dec 29, 2024 · 0 comments

After reading the scripts/train/README.md, I am attempting to reproduce the training of LLaVA-OneVision from scratch. I successfully ran the scripts/train/pretrain_siglip.sh script, specifying the output directory as checkpoints/projectors/${BASE_RUN_NAME}, where BASE_RUN_NAME is set to llavanext-model_zoo_google_siglip-so400m-patch14-384-model_zoo_Qwen_Qwen2.5-0.5B-Instruct-mlp2x_gelu-pretrain_blip558k_plain.

Upon completion of the pretraining phase, the output directory contains the following files:

config.json
mm_projector.bin
trainer_state.json
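
For reference, inspecting mm_projector.bin directly (a quick sketch; the key names are my own reading of the file, not something documented) shows that it holds only the projector weights, not the language model:

import torch

# mm_projector.bin is a torch-saved state dict produced by the pretraining stage;
# <BASE_RUN_NAME> stands for the long run name given above.
projector_path = "checkpoints/projectors/<BASE_RUN_NAME>/mm_projector.bin"
state = torch.load(projector_path, map_location="cpu")

# In my run this prints only the projector's MLP tensors (keys along the lines of
# "model.mm_projector.0.weight" -- my assumption), i.e. nothing that
# from_pretrained could use to rebuild the full model.
for name, tensor in state.items():
    print(name, tuple(tensor.shape))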

Then, following the scripts/train/finetune_ov.sh script, I set the PREV_STAGE_CHECKPOINT variable to checkpoints/projectors/llavanext-model_zoo_google_siglip-so400m-patch14-384-model_zoo_Qwen_Qwen2.5-0.5B-Instruct-mlp2x_gelu-pretrain_blip558k_plain, i.e. the Stage-1 output directory. However, running the finetune script then failed with the following error:

Traceback (most recent call last):
  File "/mnt/data/LLaVA-NeXT/llava/train/train_mem.py", line 4, in <module>
    train()
  File "/mnt/data/LLaVA-NeXT/llava/train/train.py", line 1496, in train
    model = get_model(model_args, training_args, bnb_model_from_pretrained_args)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/data/LLaVA-NeXT/llava/train/train.py", line 1428, in get_model
    model = LlavaQwenForCausalLM.from_pretrained(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/transformers/modeling_utils.py", line 3144, in from_pretrained
    raise EnvironmentError(
OSError: Error no file named pytorch_model.bin, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory checkpoints/projectors/llavanext-model_zoo_google_siglip-so400m-patch14-384-model_zoo_Qwen_Qwen2.5-0.5B-Instruct-mlp2x_gelu-pretrain_blip558k_plain.

It seems the complete checkpoint is missing: the previous stage saved only the mm_projector.bin file rather than the entire model. How can I obtain a full checkpoint from mm_projector.bin?
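
To make the question concrete, this is roughly the kind of merge I imagine would be needed (just a sketch under my own assumptions; I have not verified that LlavaQwenForCausalLM can be initialized from the plain Qwen weights this way, or that the key names in mm_projector.bin match the model's parameter names):

import torch
from llava.model import LlavaQwenForCausalLM  # the class shown in the traceback above

# Assumption: start from the plain Qwen2.5-0.5B-Instruct weights (a local path may
# be needed instead of the hub id) and graft the pretrained projector onto them,
# then save a directory in the layout from_pretrained expects.
model = LlavaQwenForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")

projector_state = torch.load(
    "checkpoints/projectors/<BASE_RUN_NAME>/mm_projector.bin",
    map_location="cpu",
)

# strict=False because the file only holds the projector tensors; every other
# parameter keeps its base-model value. Whether the key names actually line up
# is exactly what I am unsure about.
missing, unexpected = model.load_state_dict(projector_state, strict=False)
print("unexpected keys:", unexpected)

model.save_pretrained("checkpoints/llava-qwen2.5-0.5b-stage1-merged")

If the intended workflow instead passes mm_projector.bin to the finetune stage through a script argument rather than a merged checkpoint, a pointer to the right option would be just as helpful.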

P.S. In case my understanding was incorrect from the start, my assumption is that the correct script workflow for training llava-ov from scratch is:

Stage-1: pretrain_siglip.sh
Stage-1.5: finetune_ov.sh (using the checkpoint from Stage-1)
Stage-2: finetune_ov.sh (using the checkpoint from Stage-1.5)