Using Reinforcement Learning to Guide Chains of Thought

The repository is organized into two main parts:

with_trl/: where the reinforcement learning code, built with TRL, is located.

approach_sft/: the supervised fine-tuning (SFT) baselines.

To launch all SFT jobs, run:

cd approach_sft
./queue_all_jobs.sh
