We read every piece of feedback, and take your input very seriously.
To see all available qualifiers, see our documentation.
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
examples
python examples/scripts/dpo.py \ --dataset_name trl-lib/ultrafeedback_binarized \ --model_name_or_path Qwen/Qwen2-0.5B-Instruct \ --learning_rate 5.0e-6 \ --num_train_epochs 1 \ --per_device_train_batch_size 2 \ --gradient_accumulation_steps 8 \ --gradient_checkpointing \ --logging_steps 25 \ --eval_strategy steps \ --eval_steps 50 \ --output_dir Qwen2-0.5B-DPO \ --no_remove_unused_columns \ --use_peft \ --lora_r 32 \ --lora_alpha 16 \ --generate_during_eval
self.log( { "game_log": wandb.Table( columns=["Prompt", "Policy", "Ref Model"], rows=[ [prompt, pol[len(prompt) :], ref[len(prompt) :]] for prompt, pol, ref in zip( # random_batch["prompt"], policy_output_decoded, ref_output_decoded self.tokenizer.decode(random_batch["prompt_input_ids"]), policy_output_decoded, ref_output_decoded ) ], ) } )
The text was updated successfully, but these errors were encountered:
No branches or pull requests
System Info
Information
Tasks
examples
folderReproduction
Expected behavior
Checklist
The text was updated successfully, but these errors were encountered: