I'm in trouble with error message 'Attempting to unscale FP16 gradients' #1484
ChanyoungE asked this question in Q&A (unanswered)
Replies: 0 comments
@haotian-liu
I'm trying to train the "llava-v1.5-7b" model with LLaVATrainer (train.py). When I run the command below, training fails with the error 'Attempting to unscale FP16 gradients'. What should I do?
python train.py \
  --lora_enable True --lora_r 128 --lora_alpha 256 \
  --mm_projector_lr 2e-5 \
  --model_name_or_path "offline model path.." \
  --version v1 \
  --data_path "data.json path...." \
  --image_folder "some image_folder..." \
  --vision_tower "vision tower offline path..." \
  --mm_projector_type mlp2x_gelu \
  --mm_vision_select_layer -2 \
  --mm_use_im_start_end False \
  --mm_use_im_patch_token False \
  --image_aspect_ratio pad \
  --group_by_modality_length True \
  --fp16 True \
  --output_dir "out path to .. lora-llava-v1.5-7b_1" \
  --num_train_epochs 10 \
  --per_device_train_batch_size 4 \
  --per_device_eval_batch_size 4 \
  --gradient_accumulation_steps 1 \
  --evaluation_strategy "no" \
  --save_strategy "steps" --save_steps 50000 --save_total_limit 1 \
  --learning_rate 2e-4 --weight_decay 0. --warmup_ratio 0.03 \
  --lr_scheduler_type "cosine" \
  --logging_steps 1 \
  --tf32 False \
  --model_max_length 2048 \
  --gradient_checkpointing True \
  --dataloader_num_workers 2 \
  --lazy_preprocess True
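For context: PyTorch's GradScaler raises "Attempting to unscale FP16 gradients" when a trainable parameter (and hence its gradient) is in float16, since unscaling is only supported for FP32 gradients. Here is a minimal sketch I used to audit which trainable parameters end up in FP16, with a toy nn.Linear standing in for the real LLaVA/LoRA model:

```python
import torch
import torch.nn as nn

# Toy stand-in for the real model; the actual LLaVA model would be
# audited with the same loop over named_parameters().
model = nn.Linear(8, 8).half()  # simulate a model loaded in fp16

# Trainable params left in float16 are exactly what trips GradScaler.unscale_().
fp16_trainable = [name for name, p in model.named_parameters()
                  if p.requires_grad and p.dtype == torch.float16]
print(fp16_trainable)  # → ['weight', 'bias']

# Workaround sketch (an assumption, not the official LLaVA fix): keep
# trainable parameters in fp32 so their gradients can be unscaled.
for _, p in model.named_parameters():
    if p.requires_grad:
        p.data = p.data.float()
```

If the audit lists any LoRA/projector parameters, they were loaded or cast to FP16 even though they are trainable, which matches the error above.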
Here's my environment:
OS : linux x64
GPU : tesla-v100
torch : 1.13.1
llava-torch : 1.2.2.post1
transformers : 4.36.2