I am using 1.9.0+cu111, and the error message says: argument "gather_list" must be specified on destination rank.
Also, I am confused: what is the point of gathering all the information into None in the llm master process?
As per PyTorch 1.9.0's documentation (https://pytorch.org/docs/1.9.0/distributed.html), the torch.distributed.gather_object method still takes an object_gather_list argument, so I don't see why you are getting this error.
Concerning the None: the object_gather_list argument specifies the variable into which all the obj values passed by the other processes are gathered on the destination process. A process that is only sending an obj therefore has no need to specify an object_gather_list. Conversely, the destination process (here self._llm_master_process) does not send any meaningful obj but does provide an object_gather_list, since it is receiving objects rather than sending one. You can find the destination process' code here.
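To illustrate the contract, here is a minimal sketch of how torch.distributed.gather_object is typically called (the rank layout and the gathered objects are hypothetical, not lamorel's actual code): only the destination rank pre-allocates the receive list, every other rank passes object_gather_list=None.

```python
import torch.distributed as dist

# Assumes the process group has already been initialized, e.g. via
# dist.init_process_group(backend="gloo").

obj = {"rank": dist.get_rank()}  # object this process contributes
dst = 0                          # rank that gathers everything

if dist.get_rank() == dst:
    # Destination rank: provide a list with one slot per process.
    gathered = [None for _ in range(dist.get_world_size())]
    dist.gather_object(obj, object_gather_list=gathered, dst=dst)
    # `gathered` now holds each rank's obj, ordered by rank.
else:
    # Non-destination ranks: only send, no gather list needed.
    dist.gather_object(obj, object_gather_list=None, dst=dst)
```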
In the __call_model function in lamorel/caller.py, you set object_gather_list=None. However, this is not allowed in torch.distributed.