An awesome list of tools and resources to get started with Human in the Loop or RLHF (Reinforcement Learning from Human Feedback).
- **OpenAI - Aligning language models to follow instructions** | Internal blog post, how-to-use
- **Cornell University - Scaling Language Models: Methods, Analysis & Insights from Training Gopher** | Academic paper
- **Hugging Face - Illustrating Reinforcement Learning from Human Feedback (RLHF)** | Definition, blog post
- **LessWrong - RLHF** | Blog post
- **Unite.ai - What is Reinforcement Learning From Human Feedback (RLHF)** | Blog post
- **Surge.ai - Introduction to Reinforcement Learning with Human Feedback** | Blog post
- **Secrets of RLHF in Large Language Models** | Code and tutorials for RLHF in a nutshell
- **Scale - RLHF for Large Language Models** | Landing page, tool
- **Github - lucidrains/PaLM-rlhf-pytorch** | Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically ChatGPT but with PaLM
- **Github - anthropics/hh-rlhf** | Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback"
- **Github - conceptofmind/LaMDA-rlhf-pytorch** | Open-source pre-training implementation of Google's LaMDA in PyTorch. Adding RLHF similar to ChatGPT.
- **Github - opendilab/awesome-RLHF** | A curated list of reinforcement learning with human feedback resources (continually updated)
- **Github - CarperAI/trlx** | A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)
- **Github - sunzeyeah/RLHF** | Implementation of Chinese ChatGPT
- **Github - LAION-AI/Open-Assistant** | OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.
- **Github - xrsrke/instructGOOSE** | Implementation of Reinforcement Learning from Human Feedback (RLHF)
- **Github - arunprsh/ChatGPT-Decoded-GPT2-FAQ-Bot-RLHF-PPO** | A Practical Guide to Developing a Reliable FAQ Chatbot with Reinforcement Learning and Human Feedback using GPT-2 on AWS
- **Github - voidful/TextRL** | Implementation of ChatGPT RLHF (Reinforcement Learning with Human Feedback) on any generation model in Hugging Face's transformers (bloomz-176B/bloom/gpt/bart/T5/MetaICL)
- **Github - cogment/Cogment-verse** | Library of Environments, Human Actor UIs and Agent implementation for Human In the Loop Learning & Reinforcement Learning
- **Github - s-JoL/Open-Llama** | The complete training code of the open-source high-performance Llama model, including the full process from pre-training to RLHF.
- **Github - jianzhnie/open-chatgpt** | An open-source implementation of ChatGPT and RLHF. Implement a ChatGPT from scratch.
- **Github - andy-yangz/Awesome-RLHF** | Awesome Reinforcement Learning from Human Feedback, the secret behind ChatGPT XD
- **Github - jordimas/awesome-RLHF-language-models** | Curated list of resources for Reinforcement Learning from Human Feedback and Language Models
- **Github - RUCAIBox/LLMSurvey** | A collection of papers and resources related to Large Language Models.
- **Github - mfarisadip/T5-rlhf-pytorch** | Implementation of RLHF (Reinforcement Learning with Human Feedback) and GAN (Generative Adversarial Network) on top of the T5 architecture.
- **Github - CarperAI/Polygraph** | RLHF Mechanistic Interpretability and Deception
- **Github - ayulockin/T5-RLHF-TF** | Implementation of Reinforcement Learning from Human Feedback for Summarization Task in TensorFlow
- **Github - ckkissane/rlhf-shakespeare** | Shakespeare transformer fine-tuned to generate positive sentiment samples using RLHF
- **Github - G-U-N/T2I-HumanFeedback** | Implementations of Baseline Methods for Aligning Text2Img Diffusion Models with Human Feedback
- **Github - nazneenrajani/rlhf_langchain** | Langchain for RLHF
- **Github - uSaiPrashanth/raithubot-training** | Training an RLHF-transformer architecture to answer farmers' queries
- **Github - l294265421/alpaca-rlhf** | Finetuning alpaca with RLHF (Reinforcement Learning with Human Feedback)
- **Github - DaehanKim/EasyRLHF** | EasyRLHF aims to provide an easy and minimal interface to train RLHF LMs, using off-the-shelf solutions and datasets
- **Github - jeremy-collins/robot-rlhf** | Robot Learning through Human Feedback. Inspired by advancements in NLP, we train a robot policy via reinforcement learning using a reward function learned exclusively from human preferences.
- **Github - Sugoto/GPT-Model-with-RLHF** | This is a GPT 📜 model built from scratch that uses Reinforcement Learning with Human Feedback (RLHF) 🤖 to generate positive 👍 or negative 👎 recreations of Shakespeare's writing style 🎭.
- **Github - vincentmin/transformer_rlhf_eli5** | We train a transformer model using Reinforcement Learning from Human Feedback on the Reddit ELI5 dataset
- **Github - ojus1/MyMusicTransformer** | RLHF + MusicTransformer = Generate the music YOU love
- **Github - AmirMotefaker/Create-your-own-ChatGPT** | Create your own ChatGPT with Python