Hi @PhzCode, num_train_epochs refers to data epochs, not the block epochs in the paper. In your case, $n=61$, $D=32$, and $K=50$, so one data epoch is equivalent to $\frac{61}{3KD} \approx 0.013$ block epochs. Given your small dataset size, I suppose it would be fine not to finish one block epoch (otherwise it may lead to overfitting).
Reminder
System Info
Run command
Contents of test_llama3_full_sft.yaml
Reproduction
num_train_epochs is set to 3, and the total number of training iterations is computed as num_train_epochs * len(datasets). However, the paper says each block must be updated K times (with the hyperparameter set to 50), so the 32 layers should require 32 * 50 updates in total.
Under the current logic, since the dataset length is 61, only 3 * 61 = 183 iterations are executed, and training ends after fine-tuning reaches the 3rd layer.
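The step counts above can be checked with a quick sketch. This assumes one optimizer step per sample and a switch to the next block every K steps, as described in the report; the variable names are illustrative, not from the codebase:

```python
# Hypothetical sketch of the iteration-count arithmetic in this report.
num_train_epochs = 3
dataset_len = 61   # n: number of samples in the dataset
K = 50             # updates per block (paper hyperparameter)
D = 32             # number of blocks (layers)

# Steps actually executed under the current epoch-based logic.
total_steps = num_train_epochs * dataset_len   # 3 * 61 = 183

# Steps needed to finish one full block epoch (every block updated K times).
steps_per_block_epoch = K * D                  # 50 * 32 = 1600

# Number of blocks fully updated before training stops.
blocks_finished = total_steps // K             # 183 // 50 = 3

print(total_steps, steps_per_block_epoch, blocks_finished)
```

With these numbers, training covers only 3 of the 32 blocks before the epoch-based stopping condition ends the run, matching the observation above.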
Expected behavior
No response
Others
No response