
The number of BAdam finetuning iterations does not match the paper #6088

Open
PhzCode opened this issue Nov 20, 2024 · 4 comments

Comments

PhzCode commented Nov 20, 2024

Reminder

  • I have read the README and searched the existing issues.

System Info

Command used:

llamafactory-cli train examples/extras/badam/test_llama3_full_sft.yaml

Contents of test_llama3_full_sft.yaml:

### model
model_name_or_path: /models/llama/llama-2-7b-hf

### method
stage: sft
do_train: true
finetuning_type: full
use_badam: true
badam_mode: layer
badam_switch_mode: ascending
badam_switch_interval: 50
badam_verbose: 2
deepspeed: examples/deepspeed/ds_z3_config.json

### dataset
dataset: identity,alpaca_en_demo
template: llama2
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: saves/llama2-7b/full/test
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-5
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1

### eval
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500

Reproduction

num_train_epochs is 3, and the number of training iterations is computed as num_train_epochs * len(dataset). However, the paper says every block should be updated K times (the hyperparameter is set to 50), so 32 layers would require 32 * 50 iterations.

With the current logic, the dataset length is 61, so only 3 * 61 = 183 iterations are executed, and training ends after finetuning only up to the 3rd layer.
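A minimal sketch of the arithmetic described above (plain Python, not the trainer's actual code; it assumes one optimizer step per sample, consistent with the reported 3 * 61 = 183 steps):

```python
# Reproduce the iteration count reported above (illustrative arithmetic only).
num_train_epochs = 3
steps_per_epoch = 61   # reported dataset length after preprocessing
K = 50                 # badam_switch_interval: steps on one block before switching
D = 32                 # number of layers (BAdam blocks) in the 7B model

total_steps = num_train_epochs * steps_per_epoch   # 183 steps scheduled by the trainer
steps_per_block_epoch = K * D                      # 1600 steps needed to visit every block once
print(total_steps, total_steps // K)               # 183 steps -> only the first few blocks are updated
```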

Expected behavior

No response

Others

No response

github-actions bot added the pending (This problem is yet to be addressed) label Nov 20, 2024
Ledzy (Contributor) commented Nov 21, 2024

Hi @PhzCode, the num_train_epochs means the data epoch, not the block epoch in the paper. In your case, $n = 61$, $D = 32$, and $K = 50$, so one data epoch is equivalent to $\frac{n}{KD} = \frac{61}{1600} \approx 0.038$ block epochs, and your 3 data epochs cover only about 0.11 block epochs. Given your small dataset size, I suppose it is fine not to finish one block epoch (otherwise it may lead to overfitting).
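A minimal sketch of this conversion, assuming (as above) one optimizer step per sample; the numbers n = 61, D = 32, K = 50 are taken from this thread:

```python
# Data epoch vs. block epoch (illustrative; not LLaMA-Factory code).
n = 61   # optimizer steps in one data epoch (reported dataset length)
D = 32   # number of BAdam blocks (layers)
K = 50   # badam_switch_interval: steps per block before switching

steps_per_block_epoch = K * D                 # 1600
block_epochs_per_data_epoch = n / steps_per_block_epoch
print(block_epochs_per_data_epoch)            # ~0.038 block epochs per data epoch
```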

PhzCode (Author) commented Nov 25, 2024

Thanks. So if I want to finetune every block once, I need to increase n (the number of training data points). With D = 32 and K = 50 per block epoch, that would require 32 * 50 = 1600 data points, right? Have you tested with a dataset that large? @Ledzy

PhzCode closed this as completed Nov 25, 2024
PhzCode reopened this Nov 25, 2024
Ledzy (Contributor) commented Nov 25, 2024

@PhzCode Most finetuning datasets contain between 10k and 1M samples. BAdam was evaluated on Alpaca-GPT4 (50k samples) and MathInstruct (260k samples); see the paper for details.

For your question: 1600 samples correspond to one block epoch, and a block epoch does not have to be smaller than a data epoch. In other words, you can set more data epochs to make sure every block gets trained.
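To make this concrete, a back-of-the-envelope sketch (illustrative, same assumptions as above) of how many data epochs the demo dataset would need to complete one full block epoch:

```python
import math

# How many data epochs cover one block epoch? (illustrative arithmetic only)
steps_per_data_epoch = 61   # demo dataset from this thread
K, D = 50, 32               # badam_switch_interval and number of blocks

steps_per_block_epoch = K * D                                        # 1600
min_data_epochs = math.ceil(steps_per_block_epoch / steps_per_data_epoch)
print(min_data_epochs)      # 27 -> with this tiny dataset, 27 data epochs would likely overfit;
                            # a larger dataset is the practical fix
```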

PhzCode (Author) commented Nov 25, 2024

@Ledzy Understood. In LLaMA-Factory's llama3_full_sft.yaml, the dataset field points to a few demo datasets from this repo, so only 61 samples were used for finetuning. Setting the dataset to Alpaca-GPT4 should make it behave as expected.
