From 0979835d0e29a8382cbaa02732862e62237b7c17 Mon Sep 17 00:00:00 2001
From: Nathan Lambert
Date: Wed, 7 Feb 2024 13:21:24 -0800
Subject: [PATCH] Update README.md

---
 analysis/README.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/analysis/README.md b/analysis/README.md
index 61777a33..67272368 100644
--- a/analysis/README.md
+++ b/analysis/README.md
@@ -9,6 +9,7 @@ This returns the reward per-token to show how the reward evolves over a piece of
 python analysis/per_token_reward.py --model=OpenAssistant/reward-model-deberta-v3-large-v2 --text="I love to walk the dog, what do you like?"
 ```
 E.g. with OpenAssistant/reward-model-deberta-v3-large-v2
+```
 Reward: -0.544 | Substring: I
 Reward: -0.556 | Substring: I love
 Reward: -0.566 | Substring: I love to
@@ -21,7 +22,7 @@ Reward: 0.085 | Substring: I love to walk the dog, what do
 Reward: 0.089 | Substring: I love to walk the dog, what do you
 Reward: 0.09 | Substring: I love to walk the dog, what do you like
 Reward: 0.093 | Substring: I love to walk the dog, what do you like?
-
+```
 ### Model usage within eval. dataset
 To run this, execute:
 ```
@@ -80,4 +81,4 @@ This will also return the following table by default:
 | tulu-30b | 2 | 2 | 0 |
 | vicuna-33b-v1.3 | 1 | 1 | 0 |
 
-Total number of models involved: 44
\ No newline at end of file
+Total number of models involved: 44
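
For reference, a minimal sketch of how a per-token reward sweep like the one documented in this patch could be computed with Hugging Face `transformers`. This is an illustration only, not the actual contents of `per_token_reward.py`; it assumes the reward model loads as a standard single-label sequence-classification head, and decoded prefixes may differ slightly from the script's substrings.

```python
# Hypothetical sketch of a per-token reward sweep; per_token_reward.py's
# actual implementation may differ.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "OpenAssistant/reward-model-deberta-v3-large-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

text = "I love to walk the dog, what do you like?"
token_ids = tokenizer(text, add_special_tokens=False)["input_ids"]

# Score each prefix of the text, growing one token at a time,
# to show how the reward evolves across the sequence.
for end in range(1, len(token_ids) + 1):
    substring = tokenizer.decode(token_ids[:end])
    inputs = tokenizer(substring, return_tensors="pt")
    with torch.no_grad():
        reward = model(**inputs).logits[0][0].item()
    print(f"Reward: {round(reward, 3)} | Substring: {substring}")
```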