Image-text dataset size: 66081; plain-text dataset size: 423. Both datasets use the sharegpt format, and their dataset_info entries are:
```json
"data_mixed_unimodal": {
  "file_name": "data_mixed_unimodal.json",
  "formatting": "sharegpt",
  "columns": {
    "messages": "messages",
    "images": "images"
  },
  "tags": {
    "role_tag": "role",
    "content_tag": "content",
    "user_tag": "user",
    "assistant_tag": "assistant",
    "system_tag": "system"
  }
},
"data_mixed_unimodal_txt": {
  "file_name": "data_mixed_unimodal_txt.json",
  "formatting": "sharegpt",
  "columns": {
    "messages": "messages"
  },
  "tags": {
    "role_tag": "role",
    "content_tag": "content",
    "user_tag": "user",
    "assistant_tag": "assistant",
    "system_tag": "system"
  }
}
```
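Before mixing the two files, it is worth verifying that every multimodal sample has exactly as many `<image>` placeholders in its messages as entries in its `images` list, and that the text-only file carries no stray `images` field — a mismatch can corrupt batches when the two datasets are interleaved. A minimal validation sketch (the function name and checks are my own, not part of LLaMA-Factory):

```python
import json

def validate_sharegpt(path, expect_images):
    """Check a sharegpt-format file: for multimodal data, the number of
    <image> placeholders in the messages must match len(images); for
    text-only data, no sample should carry images at all.
    Returns a list of (index, n_placeholders, n_images) for bad samples."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    bad = []
    for i, sample in enumerate(data):
        n_images = len(sample.get("images", []))
        n_tokens = sum(m["content"].count("<image>") for m in sample["messages"])
        if expect_images and n_tokens != n_images:
            bad.append((i, n_tokens, n_images))
        if not expect_images and n_images > 0:
            bad.append((i, n_tokens, n_images))
    return bad

# mismatches = validate_sharegpt("data_mixed_unimodal.json", expect_images=True)
```

Running this over both files before launching training costs seconds and rules out one entire class of silent data errors.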
The training command is:
```shell
dataset="data_mixed_unimodal,data_mixed_unimodal_txt"
DS_CONFIG_PATH=${BASE_PATH}/LLaMA-Factory/examples/deepspeed/ds_z2_config.json

torchrun $DISTRIBUTED_ARGS src/train.py \
    --deepspeed $DS_CONFIG_PATH \
    --stage sft \
    --do_train \
    --model_name_or_path $MODEL_PATH \
    --dataset_dir $DATASET \
    --dataset $dataset \
    --template qwen2_vl \
    --finetuning_type full \
    --output_dir $OUTPUT_PATH \
    --overwrite_cache \
    --overwrite_output_dir \
    --warmup_ratio 0.1 \
    --weight_decay 0.08 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 16 \
    --ddp_timeout 18000000 \
    --learning_rate 4e-6 \
    --lr_scheduler_type cosine \
    --logging_steps 200 \
    --cutoff_len ${CUT_OFF} \
    --save_strategy epoch \
    --plot_loss \
    --num_train_epochs 3 \
    --bf16 \
    --image_resolution 448 \
    --fix_embedding False \
    --fix_vit False \
    --attn_implementation $attn_implementation \
    --report_to none
```

(Note: the original command contained `--save_strateg` and `-fix_vit`, corrected here to `--save_strategy` and `--fix_vit`.)
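Whatever the root cause turns out to be, a run like this should abort at the first non-finite value instead of spending three epochs multiplying NaN gradients. A tiny guard helper (hypothetical, not a LLaMA-Factory API) that could wrap the logged loss and grad norm in a custom training loop or callback:

```python
import math

def check_finite(name, value):
    """Fail fast: raise as soon as a tracked scalar (loss, grad norm)
    becomes NaN/inf, rather than silently training on corrupt updates."""
    if not math.isfinite(value):
        raise RuntimeError(f"{name} became non-finite: {value}")
    return value
```

Hooking such a check into the logging path would have stopped this run at epoch 0.19, when `grad_norm` first reported `nan`.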
The goal is to train normally on a mix of plain-text and multimodal inputs, but the loss collapses to 0 and at inference the model only outputs question marks. The log file shows:
```
{'loss': 2.4421278141604166e+27, 'grad_norm': nan, 'learning_rate': 2.5641025641025644e-06, 'epoch': 0.19}
{'loss': 0.0, 'grad_norm': nan, 'learning_rate': 3.9902938328141285e-06, 'epoch': 0.38}
{'loss': 0.0, 'grad_norm': nan, 'learning_rate': 3.896854514436596e-06, 'epoch': 0.58}
{'loss': 0.0, 'grad_norm': nan, 'learning_rate': 3.7086363653163876e-06, 'epoch': 0.77}
{'loss': 0.0, 'grad_norm': nan, 'learning_rate': 3.4350439528372386e-06, 'epoch': 0.96}
{'loss': 0.0, 'grad_norm': nan, 'learning_rate': 3.0897476817442102e-06, 'epoch': 1.15}
{'loss': 0.0, 'grad_norm': nan, 'learning_rate': 2.690000734389941e-06, 'epoch': 1.35}
{'loss': 0.0, 'grad_norm': nan, 'learning_rate': 2.2557769927853283e-06, 'epoch': 1.54}
{'loss': 0.0, 'grad_norm': nan, 'learning_rate': 1.8087730173132427e-06, 'epoch': 1.73}
{'loss': 0.0, 'grad_norm': nan, 'learning_rate': 1.371323949551559e-06, 'epoch': 1.92}
{'loss': 0.0, 'grad_norm': nan, 'learning_rate': 9.652875075468517e-07, 'epoch': 2.12}
{'loss': 0.0, 'grad_norm': nan, 'learning_rate': 6.109518361827841e-07, 'epoch': 2.31}
{'loss': 0.0, 'grad_norm': nan, 'learning_rate': 3.2602178333604456e-07, 'epoch': 2.5}
{'loss': 0.0, 'grad_norm': nan, 'learning_rate': 1.24734253865034e-07, 'epoch': 2.69}
{'loss': 0.0, 'grad_norm': nan, 'learning_rate': 1.714684393284638e-08, 'epoch': 2.89}
```
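The log pins down the failure mode: the very first logged loss (~2.44e27) has already exploded and `grad_norm` is `nan` from the start, so every later update was corrupt and the zero losses that follow are an artifact, not convergence. A small sketch for locating the first bad entry in a log of this dict-literal format (the parser is my own, not a LLaMA-Factory utility):

```python
import math
import re

def diverged_at(log_text):
    """Scan Trainer-style log entries (Python dict literals) and return the
    epoch of the first entry whose grad_norm is non-finite or whose loss
    is exactly 0. eval() is tolerable here only because the input is a
    trusted local log file, never untrusted data."""
    for chunk in re.findall(r"\{[^{}]*\}", log_text):
        entry = eval(chunk, {"nan": float("nan"), "inf": float("inf")})
        if not math.isfinite(entry["grad_norm"]) or entry["loss"] == 0.0:
            return entry["epoch"]
    return None
```

Applied to the log above, this flags epoch 0.19 — i.e. the run was already broken within the first 200 logged steps, which points the investigation at the earliest batches rather than at anything late in training.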
What causes this, and what needs to change to train correctly on mixed data?
System Info
llamafactory version: 0.9.1.dev0