
Basic questions about using the repo for Llama and Mistral models #209

NamburiSrinath opened this issue Dec 11, 2024 · 1 comment

@NamburiSrinath

Hi team,

Great work!! Really appreciate you bringing so many details together. As a beginner, I have some basic questions, so I appreciate your patience.

If I understand it correctly, an LM can be evaluated

  • Either as a classifier, depending on how it was trained: if it is a standard reward model, we use run_rm.py, and if it was trained with DPO, we use run_dpo.py
  • Or by generating responses from the LM and having another LM (e.g., GPT-4) act as a judge over those responses: run_generative.py
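
For concreteness, I believe the classifier-style runs would look roughly like the commands below (the flag name is my assumption based on the run_generative.py commands further down, and I guess run_dpo.py also needs a reference model, so please correct me if the arguments differ):

python scripts/run_rm.py --model=<reward-model-checkpoint>
python scripts/run_dpo.py --model=<dpo-trained-checkpoint>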

Now, if my understanding is correct, Llama models (e.g., meta-llama/Llama-2-7b-chat-hf) are trained using a reward model, and Mistral models (e.g., mistralai/Mistral-7B-Instruct-v0.2) are trained using DPO.

Q: So my first question is: should run_rm.py be used for Llama models and run_dpo.py for Mistral models?

Q: If, instead of evaluating in classifier fashion, I plan to evaluate using the generative approach (which I believe more truly reflects the capability of the LM), should the commands below work without providing any chat templates?

python scripts/run_generative.py --model=meta-llama/Llama-2-7b-chat-hf --force_local
python scripts/run_generative.py --model=mistralai/Mistral-7B-Instruct-v0.2 --force_local

I'm aware that the main results in the paper (except for Table 8) are for classifier-based reward models; as such, I expect the final results from run_generative.py to be lower, since the generative approach is more challenging.

@natolambert
Collaborator

The big distinction is the type of LM it is. You can think about this with the HuggingFace Transformers abstractions. Most reward models are trained with AutoModelForSequenceClassification, which adds a value head that outputs a single logit -- these are run with run_rm (along with similar models that have slightly different architectures). A standard generative model is AutoModelForCausalLM, which is run with run_generative.
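
To make that concrete, here is a rough sketch of the two abstractions (model names are just placeholders, and this is not the repo's exact code):

import torch
from transformers import (
    AutoModelForCausalLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
)

# Classifier-style reward model: a value head yields one scalar logit per
# sequence, which classifier-style evaluation compares across chosen/rejected.
rm_name = "some-org/some-reward-model"  # placeholder checkpoint
rm_tok = AutoTokenizer.from_pretrained(rm_name)
rm = AutoModelForSequenceClassification.from_pretrained(rm_name, num_labels=1)
rm_inputs = rm_tok("prompt plus candidate response", return_tensors="pt")
with torch.no_grad():
    reward = rm(**rm_inputs).logits[0, 0]  # single scalar reward

# Standard generative model: produces text, which generative evaluation then
# scores with an LLM-as-a-judge.
gen_name = "mistralai/Mistral-7B-Instruct-v0.2"
gen_tok = AutoTokenizer.from_pretrained(gen_name)
gen = AutoModelForCausalLM.from_pretrained(gen_name)
out = gen.generate(**gen_tok("prompt", return_tensors="pt"), max_new_tokens=64)
print(gen_tok.decode(out[0], skip_special_tokens=True))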

run_dpo is largely deprecated, but it is designed to run a DPO-trained model as an implicit RM. These models can also be run with run_generative, but the usage as an RM is different (generative == LLM-as-a-judge).
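
For intuition, the implicit reward of a DPO-trained model is the beta-scaled log-probability ratio between the policy and its reference model. A minimal sketch of that idea (not the repo's exact code; the reference checkpoint and beta are assumptions, and a real implementation would score only the response tokens conditioned on the prompt):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

policy_name = "mistralai/Mistral-7B-Instruct-v0.2"  # DPO-trained policy
ref_name = "mistralai/Mistral-7B-v0.1"              # assumed reference model
tok = AutoTokenizer.from_pretrained(policy_name)    # assumes shared tokenizer
policy = AutoModelForCausalLM.from_pretrained(policy_name)
ref = AutoModelForCausalLM.from_pretrained(ref_name)

def sequence_logprob(model, text):
    # Sum of token log-probabilities of `text` under `model`.
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[:, :-1]
    logps = torch.log_softmax(logits, dim=-1)
    targets = ids[:, 1:]
    return logps.gather(-1, targets.unsqueeze(-1)).squeeze(-1).sum()

beta = 0.1  # assumed DPO beta

def implicit_reward(prompt, response):
    # Higher when the policy prefers the response more than the reference does.
    text = prompt + response
    return beta * (sequence_logprob(policy, text) - sequence_logprob(ref, text))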

Hope that helps @NamburiSrinath
