I'm in trouble with error message 'Attempting to unscale FP16 gradients' #1484
ChanyoungE asked this question in Q&A (unanswered)
Replies: 0 comments
@haotian-liu
I'm trying to train the "llava-v1.5-7b" model with LLaVATrainer (train.py). When I run the command below, training fails with the error 'Attempting to unscale FP16 gradients'. What should I do?
python train.py \
  --lora_enable True --lora_r 128 --lora_alpha 256 \
  --mm_projector_lr 2e-5 \
  --model_name_or_path "offline model path.." \
  --version v1 \
  --data_path "data.json path...." \
  --image_folder "some image_folder..." \
  --vision_tower "vision tower offline path..." \
  --mm_projector_type mlp2x_gelu \
  --mm_vision_select_layer -2 \
  --mm_use_im_start_end False \
  --mm_use_im_patch_token False \
  --image_aspect_ratio pad \
  --group_by_modality_length True \
  --fp16 True \
  --output_dir "out path to .. lora-llava-v1.5-7b_1" \
  --num_train_epochs 10 \
  --per_device_train_batch_size 4 \
  --per_device_eval_batch_size 4 \
  --gradient_accumulation_steps 1 \
  --evaluation_strategy "no" \
  --save_strategy "steps" --save_steps 50000 --save_total_limit 1 \
  --learning_rate 2e-4 --weight_decay 0. --warmup_ratio 0.03 \
  --lr_scheduler_type "cosine" \
  --logging_steps 1 \
  --tf32 False \
  --model_max_length 2048 \
  --gradient_checkpointing True \
  --dataloader_num_workers 2 \
  --lazy_preprocess True
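For context: PyTorch's GradScaler raises "Attempting to unscale FP16 gradients" when a trainable parameter (and hence its gradient) is in float16, since unscaling is only supported for FP32 gradients. Here is a minimal sketch I used to audit which trainable parameters end up in FP16, with a toy nn.Linear standing in for the real LLaVA/LoRA model:

```python
import torch
import torch.nn as nn

# Toy stand-in for the real model; the actual LLaVA model would be
# audited with the same loop over named_parameters().
model = nn.Linear(8, 8).half()  # simulate a model loaded in fp16

# Trainable params left in float16 are exactly what trips GradScaler.unscale_().
fp16_trainable = [name for name, p in model.named_parameters()
                  if p.requires_grad and p.dtype == torch.float16]
print(fp16_trainable)  # → ['weight', 'bias']

# Workaround sketch (an assumption, not the official LLaVA fix): keep
# trainable parameters in fp32 so their gradients can be unscaled.
for _, p in model.named_parameters():
    if p.requires_grad:
        p.data = p.data.float()
```

If the audit lists any LoRA/projector parameters, they were loaded or cast to FP16 even though they are trainable, which matches the error above.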
Here's my environment:
OS : linux x64
GPU : tesla-v100
torch : 1.13.1
llava-torch : 1.2.2.post1
transformers : 4.36.2