finetune llava-onevision-qwen2-0.5b-si on custom dataset. #368

Open · yanbai1993 opened this issue Dec 18, 2024 · 2 comments

@yanbai1993 commented Dec 18, 2024

Hi,
I want to finetune llava-onevision-qwen2-0.5b-si on my own dataset. During inference with the finetuned model, a warning appears: "Some weights of LlavaQwenForCausalLM were not initialized from the model checkpoint at /mnt/dolphinfs/ssd_pool/docker/user/hadoop-perception-zw04/baiyan02/llava_log/llava-onevision-qwen2-0.5b-si-test and are newly initialized: ['lm_head.weight']." In addition, the inference output is garbled. However, when I test the original llava-onevision-qwen2-0.5b-si model on its own, it works normally.

My training script:
META_NAME='test'
OUTPUT_DIR=llava-onevision-qwen2-0.5b-si-$META_NAME
LLM_VERSION="llava-onevision-qwen2-0.5b-si"
VISION_MODEL_VERSION="llava-data/llava/siglip"
DATA_PATH="test.json"

PROMPT_VERSION="qwen_1_5"
LLM_VERSION_CLEAN="${LLM_VERSION//\//_}"
VISION_MODEL_VERSION_CLEAN="${VISION_MODEL_VERSION//\//_}"
RUN_NAME="llava-onevision-${VISION_MODEL_VERSION_CLEAN}-${LLM_VERSION_CLEAN}-${META_NAME}"
echo "MID_RUN_NAME: ${RUN_NAME}"

torchrun --nproc_per_node=8 --nnodes=1 \
    llava/train/train_mem.py \
    --deepspeed scripts/zero3.json \
    --model_name_or_path ${LLM_VERSION} \
    --version $PROMPT_VERSION \
    --data_path ${DATA_PATH} \
    --image_folder "" \
    --mm_tunable_parts="mm_vision_tower,mm_mlp_adapter,mm_language_model" \
    --mm_vision_tower_lr=2e-6 \
    --vision_tower ${VISION_MODEL_VERSION} \
    --mm_projector_type mlp2x_gelu \
    --mm_vision_select_layer -2 \
    --mm_use_im_start_end False \
    --mm_use_im_patch_token False \
    --group_by_modality_length True \
    --image_aspect_ratio anyres_max_9 \
    --image_grid_pinpoints "(1x1),...,(6x6)" \
    --mm_patch_merge_type spatial_unpad \
    --bf16 True \
    --run_name $RUN_NAME \
    --output_dir ${OUTPUT_DIR} \
    --num_train_epochs 1 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 2 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 1000 \
    --save_total_limit 1 \
    --learning_rate 1e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 True \
    --model_max_length 32768 \
    --gradient_checkpointing True \
    --dataloader_num_workers 4 \
    --lazy_preprocess True \
    --report_to none \
    --torch_compile True \
    --torch_compile_backend "inductor" \
    --dataloader_drop_last True \
    --frames_upbound 32 \
    --attn_implementation sdpa

During inference, I load the model with:
model_name = "llava_qwen"
tokenizer, model, image_processor, context_len = load_pretrained_model(model_path, None, model_name, attn_implementation="sdpa", device_map="auto", multimodal=True)
Thank you very much for your assistance.
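
A note on the likely mechanism (not stated in the original post, so treat it as an assumption): Qwen2-0.5B is configured with tie_word_embeddings: true, meaning lm_head.weight is not stored as a separate tensor but tied to the input embeddings. If the model_type recorded in the finetuned checkpoint's config.json is not recognized as a Qwen2 variant at load time, the tying step can be skipped and lm_head.weight is freshly initialized, which would explain both the warning and the garbled output. A minimal check, using the checkpoint directory name from the post (adjust the path to your setup):

import json

# Hypothetical path; point this at the finetuned checkpoint directory.
with open("llava-onevision-qwen2-0.5b-si-test/config.json") as f:
    cfg = json.load(f)

# If tie_word_embeddings is true but model_type is not a Qwen2 variant,
# lm_head.weight may be re-initialized instead of tied on load.
print("model_type:", cfg.get("model_type"))
print("tie_word_embeddings:", cfg.get("tie_word_embeddings"))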

@XIDIANPQZ

Same issue. Have you solved it?

@XIDIANPQZ

> Same issue. Have you solved it?

Updating model_type in config.json to "qwen2" fixes it.
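
A minimal sketch of that fix, assuming the checkpoint directory from the original post (back up config.json before editing):

import json
import pathlib

cfg_path = pathlib.Path("llava-onevision-qwen2-0.5b-si-test/config.json")
cfg = json.loads(cfg_path.read_text())

# Assumes the saved config carries a non-Qwen2 model_type (e.g. "llava_qwen");
# switching it to "qwen2" lets the loader tie lm_head to the embeddings again.
cfg["model_type"] = "qwen2"
cfg_path.write_text(json.dumps(cfg, indent=2))

After this edit, reloading with load_pretrained_model should no longer emit the "newly initialized: ['lm_head.weight']" warning.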
