Delete training #174

Merged

merged 1 commit on Sep 6, 2024
7 changes: 5 additions & 2 deletions README.md
@@ -30,7 +30,6 @@ The repository includes the following:
The two primary scripts to generate results (more in `scripts/`):
1. `scripts/run_rm.py`: Run evaluations for reward models.
2. `scripts/run_dpo.py`: Run evaluations for direct preference optimization (DPO) models (and other models using implicit rewards, such as KTO).
3. `scripts/train_rm.py`: A basic RM training script built on [TRL](https://github.com/huggingface/trl).

## Quick Usage
RewardBench lets you quickly evaluate any reward model on any preference set.
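
A minimal usage sketch (assumptions: the package is published on PyPI as `rewardbench` and exposes a CLI with `--model`, `--dataset`, and `--batch_size` flags; check `rewardbench --help` or `scripts/run_rm.py` for the exact interface):

```bash
# Install the evaluation package (assumed PyPI name)
pip install rewardbench

# Score a reward model on a preference dataset; the flag names are assumptions,
# and the {placeholders} should be replaced with Hugging Face model/dataset IDs
rewardbench --model={your_reward_model} --dataset={your_preference_dataset} --batch_size=8
```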
@@ -81,6 +80,10 @@ Add the following to your `.bashrc`:
export HF_TOKEN="{your_token}"
```

## Training

For training, we recommend using [`open-instruct`](https://github.com/allenai/open-instruct).
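
A minimal setup sketch for that route (the clone URL comes from the link above; the editable install and the presence of a reward-model training recipe are assumptions, so follow open-instruct's own README for the actual commands):

```bash
# Clone the recommended training repository and install it for local use
git clone https://github.com/allenai/open-instruct.git
cd open-instruct
pip install -e .  # editable install is an assumption; see the repo's README
```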

## Contribute Your Model

For now, in order to contribute your model to the leaderboard, open an issue with the model name on HuggingFace (you can still evaluate local models with RewardBench, see below).
@@ -208,7 +211,7 @@ print(scores_per_section)
├── rewardbench/ <- Core utils and modeling files
| ├── models/ ├── Standalone files for running existing reward models
| └── *.py └── RewardBench tools and utilities
├── scripts/ <- Scripts and configs to train and evaluate reward models
├── scripts/ <- Scripts and configs to evaluate reward models
├── tests <- Unit tests
├── Dockerfile <- Build file for reproducible and scalable research at AI2
├── LICENSE
Binary file removed rewardbench.pdf
1 change: 0 additions & 1 deletion scripts/configs/README.md
@@ -3,4 +3,3 @@
The following configs are supported:
1. `beaker_eval.yaml`: Config for internal AI tooling to correctly set up the compute environment.
2. `eval_configs.yaml`: Configs for models to reproduce results on `run_rm.py`/`run_dpo.py`.
3. [in progress] `training_configs.yaml`: Configs for training reward models.
35 changes: 0 additions & 35 deletions scripts/configs/beaker_train.yaml

This file was deleted.

41 changes: 0 additions & 41 deletions scripts/configs/stage3_no_offloading.conf

This file was deleted.

31 changes: 0 additions & 31 deletions scripts/configs/train_configs.yaml

This file was deleted.

100 changes: 0 additions & 100 deletions scripts/submit_train_jobs.py

This file was deleted.
