
Commit

make work with core set

natolambert committed May 4, 2024
1 parent 188a87b commit 8570e12
Showing 2 changed files with 12 additions and 5 deletions.
11 changes: 8 additions & 3 deletions README.md
@@ -32,19 +32,24 @@ pip install rewardbench
 ```
 Then, run the following:
 ```
-python -m rewardbench --model={} --dataset={} --batch_size=8
+rewardbench --model={yourmodel} --dataset={yourdataset} --batch_size=8
 ```
 For a DPO model, pass --ref_model={} and the script will automatically route to the DPO scoring path.
 It automatically uses the tokenizer's chat template, but can also use FastChat conversation templates.
+
+To run the core RewardBench evaluation set, run:
+```
+rewardbench --model={yourmodel}
+```
 
 Examples:
 1. Normal operation
 ```
-python -m rewardbench --model=OpenAssistant/reward-model-deberta-v3-large-v2 --dataset=allenai/ultrafeedback_binarized_cleaned --split=test_gen --chat_template=raw
+rewardbench --model=OpenAssistant/reward-model-deberta-v3-large-v2 --dataset=allenai/ultrafeedback_binarized_cleaned --split=test_gen --chat_template=raw
 ```
 2. DPO model from local dataset (note `--load_json`)
 ```
-python -m rewardbench --model=Qwen/Qwen1.5-0.5B-Chat --ref_model=Qwen/Qwen1.5-0.5B --dataset=/net/nfs.cirrascale/allennlp/jacobm/herm/data/berkeley-nectar-binarized-preferences-random-rejected.jsonl --load_json
+rewardbench --model=Qwen/Qwen1.5-0.5B-Chat --ref_model=Qwen/Qwen1.5-0.5B --dataset=/net/nfs.cirrascale/allennlp/jacobm/herm/data/berkeley-nectar-binarized-preferences-random-rejected.jsonl --load_json
 ```
 
 ## Full Installation
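For context on the README change: documenting a bare `rewardbench` command instead of `python -m rewardbench` implies a console-script entry point. A hypothetical sketch of that registration, not the repo's actual setup.py; the target path `rewardbench.rewardbench:main` is an assumption based on the file edited below:

```python
# Hypothetical sketch: a console_scripts entry point is what makes a
# bare `rewardbench` command available after `pip install rewardbench`.
from setuptools import setup

setup(
    name="rewardbench",
    entry_points={
        "console_scripts": [
            # Assumed target: the main() edited in rewardbench/rewardbench.py
            "rewardbench=rewardbench.rewardbench:main",
        ],
    },
)
```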
6 changes: 4 additions & 2 deletions rewardbench/rewardbench.py
@@ -147,7 +147,7 @@ def main():
             custom_dialogue_formatting=False,
             tokenizer=tokenizer,
             logger=logger,
-            keep_columns=["text_chosen", "text_rejected", "id"],
+            keep_columns=["text_chosen", "text_rejected", "prompt"],
         )
     else:
         dataset = load_preference_dataset(
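A note on the `keep_columns` change above: the core-set loader now retains the `prompt` field instead of `id`. A minimal sketch of the column-filtering pattern, assuming the Hugging Face `datasets` library; the `keep_only` helper is illustrative, not the repo's `load_preference_dataset`:

```python
# Illustrative stand-in for the keep_columns step: drop every column
# except the ones downstream scoring needs.
from datasets import Dataset

def keep_only(dataset: Dataset, keep_columns: list[str]) -> Dataset:
    to_remove = [c for c in dataset.column_names if c not in keep_columns]
    return dataset.remove_columns(to_remove)

ds = Dataset.from_dict({
    "prompt": ["What is 2+2?"],
    "text_chosen": ["4"],
    "text_rejected": ["5"],
    "id": [0],
})
ds = keep_only(ds, ["text_chosen", "text_rejected", "prompt"])
print(ds.column_names)  # ['prompt', 'text_chosen', 'text_rejected']
```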
@@ -305,7 +305,9 @@ def main():

     if args.dataset == "allenai/reward-bench":
         out_dataset = dataset.add_column("results", results)
-        out_dataset = out_dataset.add_column("subset", subsets)
+        if args.debug:
+            subsets = subsets[:10]
+        out_dataset = out_dataset.add_column("subsets", subsets)
         out_dataset = out_dataset.to_pandas()  # I know this is meh
 
     results_grouped = {}
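The `--debug` guard added above keeps the `subsets` list aligned with a truncated dataset: `Dataset.add_column` requires the new column to match the dataset's length, and the `[:10]` slice suggests debug mode evaluates only the first 10 examples. A small self-contained illustration, assuming the Hugging Face `datasets` library; the 100-row toy data is made up:

```python
# Toy reproduction of the length mismatch this hunk guards against.
from datasets import Dataset

dataset = Dataset.from_dict({"results": [1] * 100})
subsets = ["chat"] * 100  # parallel metadata list, one entry per row

debug = True
if debug:
    dataset = dataset.select(range(10))  # mimic --debug truncation
    subsets = subsets[:10]               # keep the parallel list in sync

# Without the slice above, add_column would raise a length-mismatch error.
out_dataset = dataset.add_column("subsets", subsets)
print(len(out_dataset))  # 10
```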
