
The number of BAdam finetuning iterations does not match the paper #6088

Open
PhzCode opened this issue Nov 20, 2024 · 4 comments

Comments

PhzCode commented Nov 20, 2024

Reminder

  • I have read the README and searched the existing issues.

System Info

Command used:

llamafactory-cli train examples/extras/badam/test_llama3_full_sft.yaml

Contents of test_llama3_full_sft.yaml:

### model
model_name_or_path: /models/llama/llama-2-7b-hf

### method
stage: sft
do_train: true
finetuning_type: full
use_badam: true
badam_mode: layer
badam_switch_mode: ascending
badam_switch_interval: 50
badam_verbose: 2
deepspeed: examples/deepspeed/ds_z3_config.json

### dataset
dataset: identity,alpaca_en_demo
template: llama2
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: saves/llama2-7b/full/test
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-5
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1

### eval
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500

Reproduction

num_train_epochs is 3, and the number of training iterations is computed as num_train_epochs * len(dataset). However, the paper says every block should be updated K times (the hyperparameter is set to 50), so 32 layers would require 32 * 50 iterations.

With the current logic, the dataset length is 61, so only 3 * 61 = 183 iterations are executed, and training ends after finetuning only up to the 3rd layer.
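A minimal sketch of the arithmetic described above (plain Python, not the trainer's actual code; it assumes one optimizer step per sample, consistent with the reported 3 * 61 = 183 steps):

```python
# Reproduce the iteration count reported above (illustrative arithmetic only).
num_train_epochs = 3
steps_per_epoch = 61   # reported dataset length after preprocessing
K = 50                 # badam_switch_interval: steps on one block before switching
D = 32                 # number of layers (BAdam blocks) in the 7B model

total_steps = num_train_epochs * steps_per_epoch   # 183 steps scheduled by the trainer
steps_per_block_epoch = K * D                      # 1600 steps needed to visit every block once
print(total_steps, total_steps // K)               # 183 steps -> only the first few blocks are updated
```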

Expected behavior

No response

Others

No response

github-actions bot added the pending (This problem is yet to be addressed) label Nov 20, 2024
Ledzy (Contributor) commented Nov 21, 2024

Hi @PhzCode, the num_train_epochs means the data epoch, not the block epoch in the paper. In your case, $n = 61$, $D = 32$, and $K = 50$, so one data epoch is equivalent to $\frac{n}{KD} = \frac{61}{1600} \approx 0.038$ block epochs, and your 3 data epochs cover only about 0.11 block epochs. Given your small dataset size, I suppose it is fine not to finish one block epoch (otherwise it may lead to overfitting).
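A minimal sketch of this conversion, assuming (as above) one optimizer step per sample; the numbers n = 61, D = 32, K = 50 are taken from this thread:

```python
# Data epoch vs. block epoch (illustrative; not LLaMA-Factory code).
n = 61   # optimizer steps in one data epoch (reported dataset length)
D = 32   # number of BAdam blocks (layers)
K = 50   # badam_switch_interval: steps per block before switching

steps_per_block_epoch = K * D                 # 1600
block_epochs_per_data_epoch = n / steps_per_block_epoch
print(block_epochs_per_data_epoch)            # ~0.038 block epochs per data epoch
```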

PhzCode (Author) commented Nov 25, 2024

Thanks. So if I want to finetune every block once, I need to increase n (the number of training data points). With D = 32 and K = 50 per block epoch, that would require 32 * 50 = 1600 data points, right? Have you tested with a dataset that large? @Ledzy

PhzCode closed this as completed Nov 25, 2024
PhzCode reopened this Nov 25, 2024
Ledzy (Contributor) commented Nov 25, 2024

@PhzCode Most finetuning datasets contain between 10k and 1M samples. BAdam was evaluated on Alpaca-GPT4 (50k samples) and MathInstruct (260k samples); see the paper for details.

For your question: 1600 samples correspond to one block epoch, and a block epoch does not have to be smaller than a data epoch. In other words, you can set more data epochs to make sure every block gets trained.
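To make this concrete, a back-of-the-envelope sketch (illustrative, same assumptions as above) of how many data epochs the demo dataset would need to complete one full block epoch:

```python
import math

# How many data epochs cover one block epoch? (illustrative arithmetic only)
steps_per_data_epoch = 61   # demo dataset from this thread
K, D = 50, 32               # badam_switch_interval and number of blocks

steps_per_block_epoch = K * D                                        # 1600
min_data_epochs = math.ceil(steps_per_block_epoch / steps_per_data_epoch)
print(min_data_epochs)      # 27 -> with this tiny dataset, 27 data epochs would likely overfit;
                            # a larger dataset is the practical fix
```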

PhzCode (Author) commented Nov 25, 2024

@Ledzy Understood. In LLaMA-Factory's llama3_full_sft.yaml, the dataset field points to a few demo datasets from this repo, so only 61 samples were used for finetuning. Setting the dataset to Alpaca-GPT4 should make it behave as expected.
