I followed the official Ray debugger documentation and tried to step into the remote function with the remote command, but it hung indefinitely without any error or log output.
Active breakpoints:
index | timestamp | Ray task | filename:lineno
0 | 2025-03-06 12:03:52 | ray::main_task | /home/projects/Logic-RL/verl/trainer/ppo/ray_trainer.py:691
Enter breakpoint index or press enter to refresh: 0
> /home/projects/Logic-RL/verl/trainer/ppo/ray_trainer.py(692)fit()
-> actor_output = self.actor_rollout_wg.update_actor(batch)
(Pdb) s
--Call--
> /home/projects/Logic-RL/verl/single_controller/ray/base.py(38)func()
-> def func(*args, **kwargs):
(Pdb) n
> /home/projects/Logic-RL/verl/single_controller/ray/base.py(39)func()
-> args, kwargs = dispatch_fn(self, *args, **kwargs)
(Pdb) n
> /home/projects/Logic-RL/verl/single_controller/ray/base.py(40)func()
-> output = execute_fn(method_name, *args, **kwargs)
(Pdb) s
--Call--
> /home/projects/Logic-RL/verl/single_controller/ray/base.py(329)execute_all()
-> def execute_all(self, method_name: str, *args, **kwargs):
(Pdb) s
> /home/projects/Logic-RL/verl/single_controller/ray/base.py(330)execute_all()
-> return self.execute_all_async(method_name, *args, **kwargs)
(Pdb) s
--Call--
> /home/projects/Logic-RL/verl/single_controller/ray/base.py(335)execute_all_async()
-> def execute_all_async(self, method_name: str, *args, **kwargs):
(Pdb) n
> /home/projects/Logic-RL/verl/single_controller/ray/base.py(339)execute_all_async()
-> length = len(self._workers)
(Pdb) n
> /home/projects/Logic-RL/verl/single_controller/ray/base.py(340)execute_all_async()
-> if all(isinstance(arg, list) for arg in args) and all(isinstance(kwarg, list) for kwarg in kwargs.values()):
(Pdb) n
> /home/projects/Logic-RL/verl/single_controller/ray/base.py(341)execute_all_async()
-> if all(len(arg) == length for arg in args) and all(len(kwarg) == length for kwarg in kwargs.values()):
(Pdb) n
> /home/projects/Logic-RL/verl/single_controller/ray/base.py(343)execute_all_async()
-> result = []
(Pdb) n
> /home/projects/Logic-RL/verl/single_controller/ray/base.py(344)execute_all_async()
-> for i in range(length):
(Pdb) n
> /home/projects/Logic-RL/verl/single_controller/ray/base.py(345)execute_all_async()
-> sliced_args = tuple(arg[i] for arg in args)
(Pdb) n
> /home/projects/Logic-RL/verl/single_controller/ray/base.py(346)execute_all_async()
-> sliced_kwargs = {k: v[i] for k, v in kwargs.items()}
(Pdb) n
> /home/projects/Logic-RL/verl/single_controller/ray/base.py(347)execute_all_async()
-> remote_call = getattr(self._workers[i], method_name)
(Pdb) n
> /home/projects/Logic-RL/verl/single_controller/ray/base.py(348)execute_all_async()
-> result.append(remote_call.remote(*sliced_args, **sliced_kwargs))
(Pdb) remote
Continuing pdb session in different process...
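For context, the session above stops at the .remote(...) call inside execute_all_async, which is where the remote command was issued. Below is a minimal sketch of that dispatch loop, reconstructed only from the lines visible in the trace. FakeWorker and FakeRemoteMethod are hypothetical stand-ins backed by a thread pool, not Ray or verl APIs, and the fallback branch for non-list arguments is an assumption because the trace never reaches it.

```python
from concurrent.futures import Future, ThreadPoolExecutor

# Hypothetical stand-in for the Ray runtime: a small thread pool.
_pool = ThreadPoolExecutor(max_workers=4)


class FakeRemoteMethod:
    """Hypothetical stand-in for an actor method handle exposing .remote()."""

    def __init__(self, fn):
        self._fn = fn

    def remote(self, *args, **kwargs) -> Future:
        # Returns immediately with a future; the work runs elsewhere.
        return _pool.submit(self._fn, *args, **kwargs)


class FakeWorker:
    """Hypothetical worker; update_actor mirrors the method name in the trace."""

    def __init__(self, rank: int):
        self.rank = rank
        self.update_actor = FakeRemoteMethod(self._update_actor)

    def _update_actor(self, batch):
        return f"rank {self.rank} got {batch}"


def execute_all_async(workers, method_name: str, *args, **kwargs):
    length = len(workers)
    # Same checks as lines 340-341 in the trace: every argument is a list
    # with one entry per worker.
    per_worker = all(isinstance(a, list) and len(a) == length for a in args) and all(
        isinstance(v, list) and len(v) == length for v in kwargs.values()
    )
    result = []
    for i in range(length):
        if per_worker:
            sliced_args = tuple(arg[i] for arg in args)
            sliced_kwargs = {k: v[i] for k, v in kwargs.items()}
        else:
            # Assumption: pass arguments through unchanged (not shown in the trace).
            sliced_args, sliced_kwargs = args, kwargs
        remote_call = getattr(workers[i], method_name)
        # This is the line where the debugger session above was handed off.
        result.append(remote_call.remote(*sliced_args, **sliced_kwargs))
    return result


workers = [FakeWorker(rank) for rank in range(2)]
futures = execute_all_async(workers, "update_actor", ["batch-0", "batch-1"])
outputs = [f.result() for f in futures]
```

The point of the sketch is that the call returns a future right away and the method body runs in a different worker, which is why stepping with s or n at this line stays in the driver process.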
Could you give me some suggestions on how to tune the hyperparameters and how to fix the debugger hang?
Thx!
When running training with Qwen2.5-3B on 4x RTX 4090 GPUs (24GB each), I am encountering two issues:
My script is as follows: