It got stuck at policy_step=1024 with no errors, but also no videos were saved and no checkpoint was trained. I didn't change anything; I just cloned the repo and ran it.
Btw, what is the difference between the policy step and the environment step in the original DreamerV3 paper? How do I convert between them?
Thanks very much.
Hi @ruiiu, by default the selected accelerator is the CPU. If you did not change it and you have a GPU on which to train your agent, I suggest you run the following command:
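(Something along these lines, as a sketch: fabric.accelerator and fabric.devices are assumed to be the overrides that select the accelerator and the number of devices, and the experiment/environment arguments are just the Walker2d example used later in this thread.)
python sheeprl.py exp=dreamer_v3 env=mujoco env.id=Walker2d-v4 algo.cnn_keys.encoder=[rgb] fabric.accelerator=gpu fabric.devices=1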
The difference between policy and environment steps is that the policy step counts the number of times the actor selects actions during the environment interaction: at each iteration, the policy steps are incremented by num_envs * world_size, where num_envs is the number of environments and world_size is the number of devices you are using for training (you can set the latter with the fabric.devices=<world_size> parameter). The environment steps, instead, are the steps performed by the environments; they can differ from the policy steps because you can set, for example, action_repeat. The action_repeat parameter specifies that every time the actor selects an action, that action is repeated action_repeat times in the environment.
For example, suppose you are using a single device, a single environment for training, and action_repeat = 2. After running 500 policy steps, the number of environment steps will be 1000.
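In other words, the conversion is just env_steps = policy_steps * action_repeat. A minimal sketch of the bookkeeping (illustrative only; the variable names are not the actual sheeprl internals):

num_envs = 1        # parallel training environments
world_size = 1      # training devices (fabric.devices)
action_repeat = 2   # each selected action is repeated this many times in the env

policy_steps = 0
env_steps = 0
for _ in range(500):                       # 500 actor iterations
    policy_steps += num_envs * world_size  # one action selection per env per device
    env_steps += num_envs * world_size * action_repeat

print(policy_steps, env_steps)  # -> 500 1000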
I hope the difference between policy and environment steps is now clearer to you.
Hi, when I run the command
python sheeprl.py exp=dreamer_v3 env=mujoco env.id=Walker2d-v4 algo.cnn_keys.encoder=[rgb]
the following is the output:
Rank-0: policy_step=788, reward_env_2=0.31653469800949097
Rank-0: policy_step=828, reward_env_2=-1.8613801002502441
Rank-0: policy_step=828, reward_env_3=-0.6631340384483337
Rank-0: policy_step=840, reward_env_0=-3.4890027046203613
Rank-0: policy_step=876, reward_env_3=-4.6154303550720215
Rank-0: policy_step=880, reward_env_1=10.097464561462402
Rank-0: policy_step=888, reward_env_2=-6.006372928619385
Rank-0: policy_step=916, reward_env_0=2.8062071800231934
Rank-0: policy_step=928, reward_env_3=2.518906831741333
Rank-0: policy_step=944, reward_env_1=0.48591500520706177
Rank-0: policy_step=952, reward_env_2=0.014924541115760803
Rank-0: policy_step=964, reward_env_0=2.63313364982605
Rank-0: policy_step=996, reward_env_1=1.226755142211914
Rank-0: policy_step=1020, reward_env_2=1.3471245765686035
Rank-0: policy_step=1024, reward_env_0=-1.6578210592269897
Rank-0: policy_step=1024, reward_env_3=-6.501708507537842