
[Update] loader.py: evaluate will run separate evaluations on each eval_dataset #5522

Open: SrWYG wants to merge 1 commit into main

Conversation


SrWYG commented on Sep 24, 2024

If you pass a dictionary with names of datasets as keys and datasets as values, evaluate will run separate evaluations on each dataset. This can be useful to monitor how training affects other datasets or simply to get a more fine-grained evaluation

Seq2SeqTrainer now supports eval_dataset as a Dict.
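
For illustration, a minimal sketch of the resulting usage (the model and dataset variables are placeholders, not code from this PR). With a dict, the Trainer runs one evaluation pass per entry and prefixes each metric with the dataset name:

    from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments

    # Placeholder tokenized datasets; any datasets.Dataset works here.
    eval_sets = {
        "alpaca_en_demo": en_eval_set,
        "alpaca_zh_demo": zh_eval_set,
    }

    trainer = Seq2SeqTrainer(
        model=model,  # placeholder
        args=Seq2SeqTrainingArguments(output_dir="out", eval_strategy="steps", eval_steps=10),
        train_dataset=train_set,  # placeholder
        eval_dataset=eval_sets,   # dict of name -> dataset
    )
    trainer.train()
    # Metrics are logged per dataset, e.g. eval_alpaca_en_demo_loss
    # and eval_alpaca_zh_demo_loss.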

What does this PR do?

Fixes # (issue)

Before submitting

  • [ ✅ ] Did you read the contributor guideline?
  • [ ✅ ] Did you write any new necessary tests?
    • I tested it with Alpaca-format data, stage sft, model Qwen2.5-7B-Instruct, on both a single GPU and 2 GPUs using FSDP.
    • The per-dataset losses are printed and logged to the TensorBoard run logs, where they can be filtered by _loss in the TensorBoard web UI.

hiyouga added the pending (This problem is yet to be addressed) label on Sep 24, 2024
if merge:
    return merge_dataset([data for _, data in datasets.items()], data_args, seed=training_args.seed)
else:
    return datasets
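
For context, a sketch of how this hunk might sit in loader.py (the surrounding function and the _load_single_dataset helper are assumptions based on LLaMA-Factory's data module, not the exact diff):

    # Sketch only: names follow LLaMA-Factory's loader.py, but the exact
    # signature in the PR may differ.
    def _get_merged_dataset(dataset_names, model_args, data_args, training_args, stage, merge=True):
        # Load each named dataset, keeping its name as the key.
        datasets = {name: _load_single_dataset(name, model_args, data_args, training_args)
                    for name in dataset_names}
        if merge:
            # Training path: concatenate everything into one dataset.
            return merge_dataset([data for _, data in datasets.items()], data_args, seed=training_args.seed)
        else:
            # Evaluation path: hand the dict straight to the trainer so
            # each entry is evaluated separately.
            return datasets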
A contributor commented on the diff:

Does a dict match the declared return type Optional[Union["Dataset", "IterableDataset"]]?
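
One way to keep the annotation honest (an assumed fix for illustration, not code from the PR) is to widen the return type to cover the dict case:

    from typing import Dict, Optional, Union

    from datasets import Dataset, IterableDataset

    # Widened alias covering both the merged and the per-dataset return shapes.
    EvalDatasets = Optional[Union[Dataset, IterableDataset, Dict[str, Union[Dataset, IterableDataset]]]]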

chengchengpei (Contributor) commented:

Can we add some tests? The change is pretty big.

SrWYG (Author) commented on Oct 12, 2024

Test case YAML:

### model
## Testing on the 72b model is more convincing than the 7b model
model_name_or_path: /data3/models/Qwen2.5-72B-Instruct

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: q_proj,v_proj

### dataset
dataset: identity,alpaca_en_demo
packing: true
dataset_dir: data/
template: qwen
cutoff_len: 8000
## for test
max_samples: 100
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: saves/Qwen2.5-72B-Instruct/lora/test
logging_steps: 10
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 16
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
# bf16: true
fp16: true
ddp_timeout: 180000000
## It's OK with or without quantization
quantization_bit: 4
# neftune_noise_alpha: 5
lora_rank: 16
flash_attn: fa2


### eval
## You can specify val_size to split a validation set off the training set.
# val_size: 0.05
## Alternatively, specify eval_dataset to evaluate separately on each set
eval_dataset: alpaca_en_demo,alpaca_zh_demo
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 10

save_steps: 10

# low_cpu_mem_usage: False
# ddp_find_unused_parameters: False

You can test it on multiple GPUs with FSDP:

CUDA_VISIBLE_DEVICES=0,1 accelerate launch \
    --config_file examples/accelerate/fsdp_config.yaml \
    src/train.py examples/train_lora/qwen2_lora_sft.yaml

or on a single GPU with the 7B model:

CUDA_VISIBLE_DEVICES=0 llamafactory-cli train examples/train_lora/qwen2_lora_sft.yaml

It's only a change to the eval dataset handling, so I didn't test more models.

The separate evaluation losses will be printed in the training log and shown in the TensorBoard web UI:

tensorboard --logdir=saves/Qwen2.5-72B-Instruct/lora/test/runs --bind_all --port=6006
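
To verify the per-dataset tags programmatically, a small sketch using TensorBoard's event reader (the run directory below assumes the output_dir from the YAML above):

    from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

    # Point at a specific run subdirectory under .../runs if there are several.
    ea = EventAccumulator("saves/Qwen2.5-72B-Instruct/lora/test/runs")
    ea.Reload()
    loss_tags = [tag for tag in ea.Tags()["scalars"] if "_loss" in tag]
    print(loss_tags)  # expected to include entries like eval_alpaca_en_demo_loss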
