RLVR multinode (2 nodes) issue for mistral-nemo-12B. #526
Comments
I see. I will first investigate the gloo multinode issue then. Thanks!
Hi @vwxyzjn, it is working now. The issue was not gloo-related; I updated the reward model to be the same as the base model instead of tulu-3-8b-RM. Does that mean a different reward model can't be used? It shouldn't be the case though, right?
Also, is it required to train an RM beforehand, or can I pass the same DPOed model in the reward model arg and have it trained on the fly?
Yep, you should use a trained RM. No, you can't pass in a DPOed model as the RM. The primary purpose of the RM is to initialize the value model, so it needs to be the same architecture as the base model. If you don't want to initialize from an RM, you could also initialize from the DPO / SFT model, but in our paper we show this can lead to worse average performance.
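To make the architecture constraint concrete, here is a minimal sketch (not open-instruct's actual code) of what initializing the value model from an RM checkpoint implies. The RM path is a hypothetical Mistral-Nemo reward model, and the scalar value head is approximated with `AutoModelForSequenceClassification` with `num_labels=1`:

```python
from transformers import AutoModelForCausalLM, AutoModelForSequenceClassification

BASE_MODEL = "mistralai/Mistral-Nemo-Instruct-2407"  # policy (SFT/DPO checkpoint)
REWARD_MODEL = "your-org/mistral-nemo-12b-rm"        # hypothetical RM trained on the same arch

# The policy is a causal LM; the value model is a scalar-head model whose
# backbone is loaded from the RM weights. If the RM were a different
# architecture (e.g. a Llama-based Tulu RM), its weights could not be used
# to initialize a Mistral-Nemo value model.
policy = AutoModelForCausalLM.from_pretrained(BASE_MODEL)
value_model = AutoModelForSequenceClassification.from_pretrained(REWARD_MODEL, num_labels=1)

# Sanity check: the backbones must line up so the tokens scored by the
# RM/value model mean the same thing as the policy's tokens.
assert value_model.config.model_type == policy.config.model_type
assert value_model.config.vocab_size == policy.config.vocab_size
```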
I am trying to reproduce the Tulu-3-70B multinode setup for a Mistral-Nemo-12B model (mistralai/Mistral-Nemo-Instruct-2407). A single node doesn't work (it runs out of memory). For 2 nodes I am trying the following config -
But this also gives OOM -
If I set vllm_tensor_parallel_size = 2, then I get the following errors depending on the actor_num_gpus setting:
It waits at ray.get(pg.ready()) forever because it can't get one more GPU resource (see the placement-group sketch below).
But it runs into -
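For what it's worth, a hang at `ray.get(pg.ready())` usually means the placement group is requesting more GPU bundles than the cluster exposes, so it never becomes ready. Here is a minimal sketch, assuming 2 nodes with 8 GPUs each and that each vLLM engine needs `vllm_tensor_parallel_size` GPUs on top of the learner GPUs (the numbers and flag names are illustrative, not your exact config):

```python
import ray
from ray.util.placement_group import placement_group

ray.init(address="auto")  # assumes a 2-node Ray cluster with 8 GPUs per node (16 total)

actor_gpus = 14           # hypothetical learner GPUs across both nodes
vllm_engine_gpus = 2 * 2  # e.g. 2 engines x tensor_parallel_size=2
bundles = [{"GPU": 1, "CPU": 4}] * (actor_gpus + vllm_engine_gpus)

pg = placement_group(bundles, strategy="PACK")
# 14 + 4 = 18 > 16 available GPUs, so the group can never be scheduled and
# ray.get(pg.ready()) blocks forever; a timeout surfaces the problem instead.
ray.get(pg.ready(), timeout=60)
```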
Can anybody point out the correct 2-node config to run the Mistral-Nemo-12B model, with the RM kept as is (Tulu RM)?
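In case it helps, here is a rough back-of-the-envelope check (not an open-instruct utility; the parameter names mirror the flags above but are assumptions) for whether a given split even fits into 2 x 8 GPUs. It only counts devices, so a split that fits can still OOM if the 12B policy plus 12B value model don't fit in per-GPU memory:

```python
def fits(num_nodes: int, gpus_per_node: int,
         actor_gpus_per_node: list[int],
         vllm_num_engines: int, vllm_tensor_parallel_size: int) -> bool:
    """Check that learner GPUs + vLLM engine GPUs fit into the cluster."""
    total = num_nodes * gpus_per_node
    needed = sum(actor_gpus_per_node) + vllm_num_engines * vllm_tensor_parallel_size
    return needed <= total

# One split that fits 2 x 8 GPUs by count: 12 learner GPUs plus
# 2 vLLM engines at tensor_parallel_size=2 (4 GPUs) = 16.
print(fits(2, 8, [8, 4], vllm_num_engines=2, vllm_tensor_parallel_size=2))  # True
print(fits(2, 8, [8, 6], vllm_num_engines=2, vllm_tensor_parallel_size=2))  # False -> hangs
```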