It got stuck at policy_step=1024 with no errors, but also no videos were saved and no checkpoint was trained. I didn't change anything; I just cloned the repo and ran it.
Btw, what is the difference between the policy step and the environment step in the original DreamerV3 paper? How do I convert between them?
Thanks very much.
Hi @ruiiu, by default the selected accelerator is the CPU. If you did not change it and you have a GPU on which to train your agent, I suggest you run the following command:
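(Something along these lines, as a sketch: fabric.accelerator and fabric.devices are assumed to be the overrides that select the accelerator and the number of devices, and the experiment/environment arguments are just the Walker2d example used later in this thread.)
python sheeprl.py exp=dreamer_v3 env=mujoco env.id=Walker2d-v4 algo.cnn_keys.encoder=[rgb] fabric.accelerator=gpu fabric.devices=1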
The difference between policy and environment steps is that the policy step counts the number of times the actor selects actions during the environment interaction: at each iteration, the policy steps are incremented by num_envs * world_size, where num_envs is the number of environments and world_size is the number of devices you are using for training (you can set the latter with the fabric.devices=<world_size> parameter). The environment steps, instead, are the steps performed by the environments; they can differ from the policy steps because you can set, for example, action_repeat. The action_repeat parameter specifies that every time the actor selects an action, that action is repeated action_repeat times in the environment.
For example, suppose you are using a single device, a single environment for training, and action_repeat = 2. After running 500 policy steps, the number of environment steps will be 1000.
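In other words, the conversion is just env_steps = policy_steps * action_repeat. A minimal sketch of the bookkeeping (illustrative only; the variable names are not the actual sheeprl internals):

num_envs = 1        # parallel training environments
world_size = 1      # training devices (fabric.devices)
action_repeat = 2   # each selected action is repeated this many times in the env

policy_steps = 0
env_steps = 0
for _ in range(500):                       # 500 actor iterations
    policy_steps += num_envs * world_size  # one action selection per env per device
    env_steps += num_envs * world_size * action_repeat

print(policy_steps, env_steps)  # -> 500 1000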
I hope the difference between policy and environment steps is now clearer to you.
Hi, when I run the command
python sheeprl.py exp=dreamer_v3 env=mujoco env.id=Walker2d-v4 algo.cnn_keys.encoder=[rgb]
the following is the output:
Rank-0: policy_step=788, reward_env_2=0.31653469800949097
Rank-0: policy_step=828, reward_env_2=-1.8613801002502441
Rank-0: policy_step=828, reward_env_3=-0.6631340384483337
Rank-0: policy_step=840, reward_env_0=-3.4890027046203613
Rank-0: policy_step=876, reward_env_3=-4.6154303550720215
Rank-0: policy_step=880, reward_env_1=10.097464561462402
Rank-0: policy_step=888, reward_env_2=-6.006372928619385
Rank-0: policy_step=916, reward_env_0=2.8062071800231934
Rank-0: policy_step=928, reward_env_3=2.518906831741333
Rank-0: policy_step=944, reward_env_1=0.48591500520706177
Rank-0: policy_step=952, reward_env_2=0.014924541115760803
Rank-0: policy_step=964, reward_env_0=2.63313364982605
Rank-0: policy_step=996, reward_env_1=1.226755142211914
Rank-0: policy_step=1020, reward_env_2=1.3471245765686035
Rank-0: policy_step=1024, reward_env_0=-1.6578210592269897
Rank-0: policy_step=1024, reward_env_3=-6.501708507537842