JulesGM/Marg-Li-CoT

Using Reinforcement Learning to Guide Chains of Thought

With TRL:

This is where the reinforcement learning code is located.

Approach SFT:

This directory contains the supervised fine-tuning (SFT) baselines:

  • Generate, then learn, masked.
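The bullet above is terse. One plausible reading, which is an assumption and not something this README confirms, is that a chain of thought is first generated and the model is then trained on it with a loss mask, so that only some tokens (here, the completion) contribute to the loss. The sketch below illustrates that general idea in plain Python. The names `build_masked_labels` and `masked_mean_loss` are hypothetical and are not taken from this repository; the value `-100` is the ignore index conventionally used for labels by PyTorch's cross-entropy loss.

```python
# Hypothetical illustration of loss masking; not the repository's code.
IGNORE_INDEX = -100  # label value conventionally ignored by cross-entropy


def build_masked_labels(prompt_ids, completion_ids):
    """Concatenate prompt and completion, masking the prompt in the labels."""
    input_ids = list(prompt_ids) + list(completion_ids)
    # Prompt positions get IGNORE_INDEX so they do not contribute to the loss.
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(completion_ids)
    return input_ids, labels


def masked_mean_loss(token_losses, labels):
    """Average per-token losses over the positions that are not masked."""
    kept = [
        loss
        for loss, label in zip(token_losses, labels)
        if label != IGNORE_INDEX
    ]
    if not kept:
        return 0.0
    return sum(kept) / len(kept)
```

With a three-token prompt and a two-token completion, only the last two per-token losses would enter the average under this assumed scheme.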

Launch all SFT jobs with:

cd approach_sft
./queue_all_jobs.sh
