Added new "Post-training an LLM using GRPO with TRL" recipe 🧑‍🍳 #710