KeyError in DPO Trainer, evaluation_loop #2473

qingjianbuyi · 2024-12-13T12:36:45Z

System Info

TRL version: 0.12.1

Information

The official example scripts
My own modified scripts

Tasks

An officially supported task in the examples folder
My own task or dataset (give details below)

Reproduction

python examples/scripts/dpo.py \
    --dataset_name trl-lib/ultrafeedback_binarized \
    --model_name_or_path Qwen/Qwen2-0.5B-Instruct \
    --learning_rate 5.0e-6 \
    --num_train_epochs 1 \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 8 \
    --gradient_checkpointing \
    --logging_steps 25 \
    --eval_strategy steps \
    --eval_steps 50 \
    --output_dir Qwen2-0.5B-DPO \
    --no_remove_unused_columns \
    --use_peft \
    --lora_r 32 \
    --lora_alpha 16 \
    --generate_during_eval

Expected behavior

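            self.log(
                {
                    "game_log": wandb.Table(
                        columns=["Prompt", "Policy", "Ref Model"],
                        rows=[
                            [prompt, pol[len(prompt) :], ref[len(prompt) :]]
                            for prompt, pol, ref in zip(
                                # Original line; the collated batch appears to have no "prompt" key, hence the KeyError:
                                # random_batch["prompt"], policy_output_decoded, ref_output_decoded
                                # Proposed change: decode the prompt token IDs instead (batch_decode returns one string per row).
                                self.tokenizer.batch_decode(random_batch["prompt_input_ids"], skip_special_tokens=True),
                                policy_output_decoded,
                                ref_output_decoded,
                            )
                        ],
                    )
                }
            )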
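
The slicing in the snippet above can be checked in isolation. The following is a minimal sketch, not TRL code: `ToyTokenizer` is a hypothetical stand-in for a real tokenizer, and the batch layout (token IDs under `prompt_input_ids`, no raw `prompt` strings, left padding with ID 0) is an assumption based on the snippet. It shows that looking up `"prompt"` raises a KeyError, and that decoding the IDs per row yields prompts that can be stripped from the generated text.

```python
class ToyTokenizer:
    """Hypothetical stand-in for a tokenizer; only mimics batch_decode."""

    def __init__(self, vocab, pad_id=0):
        self.id_to_tok = dict(enumerate(vocab))
        self.pad_id = pad_id

    def batch_decode(self, batch_ids, skip_special_tokens=False):
        # Decode each row separately and return one string per row.
        decoded = []
        for ids in batch_ids:
            toks = [
                self.id_to_tok[i]
                for i in ids
                if not (skip_special_tokens and i == self.pad_id)
            ]
            decoded.append(" ".join(toks))
        return decoded


tok = ToyTokenizer(["<pad>", "hello", "world", "foo", "bar"])

# Assumed shape of the collated batch: token IDs only, left-padded with 0.
random_batch = {"prompt_input_ids": [[0, 1, 2], [0, 0, 3]]}

# Made-up generations that start with the prompt text.
policy_output_decoded = ["hello world foo", "foo bar bar"]
ref_output_decoded = ["hello world bar", "foo hello"]

# The original lookup fails because the key is absent.
try:
    random_batch["prompt"]
    key_error_raised = False
except KeyError:
    key_error_raised = True

# Decoding the IDs gives one prompt string per row.
prompts = tok.batch_decode(random_batch["prompt_input_ids"], skip_special_tokens=True)

# Strip the prompt prefix from each generation, as the table rows do.
rows = [
    [prompt, pol[len(prompt):], ref[len(prompt):]]
    for prompt, pol, ref in zip(prompts, policy_output_decoded, ref_output_decoded)
]
```

Skipping pad tokens matters here: if the padding were kept in the decoded prompt, `len(prompt)` would no longer match the prefix of the generated text and the slices would cut in the wrong place.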

Checklist

I have checked that my issue isn't already filed (see open issues)
I have included my system information
Any code provided is minimal, complete, and reproducible (more on MREs)
Any code provided is properly formatted in code blocks, (no screenshot, more on code blocks)
Any traceback provided is complete

qgallouedec added 🐛 bug Something isn't working 🏋 DPO Related to DPO labels Dec 13, 2024
