Second-stage pretraining with full-parameter fine-tuning: is the loss curve normal, and how can it be optimized? #5896
Unanswered · Shame-fight asked this question in Q&A
Replies: 1 comment
- Hello, I've run into a similar problem. Would you mind sharing contact info so we could discuss? WeChat: 18923400893
Reminder
System Info
llamafactory==0.9.1.dev0
Reproduction

```yaml
### model
model_name_or_path: /nanshu_data/jgx/LLM_Model/Qwen/Qwen2___5-0___5B

### method
stage: pt
do_train: true
finetuning_type: full
deepspeed: examples/deepspeed/ds_z3_config.json

### dataset
dataset: yingji_pt,CCI3_2w,refineweb,wiki_en,wiki_ch_6k,alpha_gpt4,BELLE
template: default
cutoff_len: 1024
max_samples: 200000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: saves/qwen2.5-0.5b/full/pt
logging_steps: 10
save_steps: 1000
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 2
gradient_accumulation_steps: 4
learning_rate: 1.0e-5
num_train_epochs: 4.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000

### eval
val_size: 0.05
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 1000
```
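A config like this is launched with `llamafactory-cli train <config>.yaml`. To reason about where step 5000 falls relative to the 4 epochs, a back-of-the-envelope sketch helps; the GPU count and post-packing sample count below are assumptions, not values from the post:

```python
# Rough step arithmetic for the config above. num_gpus and num_train_samples
# are placeholders (assumptions), not values reported in the original post.
per_device_train_batch_size = 2
gradient_accumulation_steps = 4
num_gpus = 8                   # assumption: set to your actual world size
cutoff_len = 1024
num_train_epochs = 4.0
num_train_samples = 700_000    # assumption: packed 1024-token samples after preprocessing

# Sequences consumed per optimizer step across all GPUs.
effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
steps_per_epoch = num_train_samples // effective_batch
total_steps = int(steps_per_epoch * num_train_epochs)

print(f"effective batch: {effective_batch} sequences "
      f"(~{effective_batch * cutoff_len:,} tokens per step)")
print(f"steps per epoch: {steps_per_epoch:,}; total steps: {total_steps:,}")
# Comparing the step where eval loss bottoms out against steps_per_epoch shows
# how many passes over the data the model made before it started to overfit;
# stepwise drops in train loss at epoch boundaries are a common sign of
# memorization in multi-epoch pretraining.
```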
Expected behavior
During pretraining, the train loss drops in a stepwise pattern, while the eval loss first decreases and then rises. Is this normal? Does the rise mean the model starts overfitting after 5000 steps? Should pretraining be continued, and how can the setup be improved? Also, which checkpoint (at how many steps) of the second-stage pretrained model should be picked for supervised instruction fine-tuning?
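Since the eval loss reaches a minimum and then climbs, the usual practice is to hand the checkpoint nearest that minimum to the SFT stage rather than the final one. A minimal sketch, assuming the standard Hugging Face Trainer checkpoint layout that LLaMA-Factory inherits (each checkpoint-<step> directory contains a trainer_state.json whose log_history records eval_loss):

```python
import json
from pathlib import Path

# Pick the checkpoint with the lowest eval_loss from the Trainer's log history.
# output_dir comes from the config above; the checkpoint layout is the standard
# Hugging Face Trainer one (an assumption about this particular run).
output_dir = Path("saves/qwen2.5-0.5b/full/pt")

# The newest checkpoint's trainer_state.json holds the full log history.
last_ckpt = max(output_dir.glob("checkpoint-*"),
                key=lambda p: int(p.name.rsplit("-", 1)[1]))
state = json.loads((last_ckpt / "trainer_state.json").read_text())

# Keep only the evaluation entries (those that carry an eval_loss key).
evals = [(rec["step"], rec["eval_loss"])
         for rec in state["log_history"] if "eval_loss" in rec]
best_step, best_loss = min(evals, key=lambda x: x[1])

# With save_steps == eval_steps == 1000, every eval step has a matching
# checkpoint directory, so checkpoint-{best_step} should exist on disk.
print(f"lowest eval_loss {best_loss:.4f} at step {best_step}")
print(f"candidate for SFT: {output_dir / f'checkpoint-{best_step}'}")
```

Reading the state file avoids re-running evaluation and works even after the run has finished.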
Others
No response