
Issues with running MuJoCo Walker2d #314

Open
ruiiu opened this issue Jul 26, 2024 · 1 comment
ruiiu commented Jul 26, 2024

Hi, when I run the command python sheeprl.py exp=dreamer_v3 env=mujoco env.id=Walker2d-v4 algo.cnn_keys.encoder=[rgb], the following is the output:

Rank-0: policy_step=788, reward_env_2=0.31653469800949097
Rank-0: policy_step=828, reward_env_2=-1.8613801002502441
Rank-0: policy_step=828, reward_env_3=-0.6631340384483337
Rank-0: policy_step=840, reward_env_0=-3.4890027046203613
Rank-0: policy_step=876, reward_env_3=-4.6154303550720215
Rank-0: policy_step=880, reward_env_1=10.097464561462402
Rank-0: policy_step=888, reward_env_2=-6.006372928619385
Rank-0: policy_step=916, reward_env_0=2.8062071800231934
Rank-0: policy_step=928, reward_env_3=2.518906831741333
Rank-0: policy_step=944, reward_env_1=0.48591500520706177
Rank-0: policy_step=952, reward_env_2=0.014924541115760803
Rank-0: policy_step=964, reward_env_0=2.63313364982605
Rank-0: policy_step=996, reward_env_1=1.226755142211914
Rank-0: policy_step=1020, reward_env_2=1.3471245765686035
Rank-0: policy_step=1024, reward_env_0=-1.6578210592269897
Rank-0: policy_step=1024, reward_env_3=-6.501708507537842

It got stuck at policy_step=1024, although there were no errors. No videos were saved and no checkpoint was produced either. I didn't change anything; I just cloned the repo and ran it.

By the way, what is the difference between the policy step and the environment step in the original Dreamer V3 paper? How do I convert between them?

Thanks very much.


michele-milesi commented Jul 27, 2024

Hi @ruiiu, by default the selected accelerator is the CPU. If you did not change it and you have a GPU on which to train your agent, I suggest running the following command:

python sheeprl.py exp=dreamer_v3 env=mujoco env.id=Walker2d-v4 algo.cnn_keys.encoder=[rgb] fabric.accelerator=cuda

The difference is the following: the policy steps count how many times the actor selects actions during the environment interaction. At each iteration, the policy steps are incremented by num_envs * world_size, where num_envs is the number of environments and world_size is the number of devices you are using for training (you can set the latter with the fabric.devices=<world_size> parameter). The environment steps, instead, are the steps actually performed by the environments; they can differ from the policy steps because of, for example, the action_repeat parameter, which specifies that every time the actor selects an action, that action is repeated action_repeat times in the environment.
For example, suppose you are using a single device, a single environment for training, and action_repeat = 2. After running 500 policy steps, the number of environment steps will be 1000.
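If it helps, here is a minimal Python sketch of that relationship. This is only illustrative (the helper functions are made up for this comment, they are not part of SheepRL's API), assuming num_envs, world_size and action_repeat have the meanings described above:

```python
# Illustrative sketch only, not SheepRL code: how policy steps relate to
# environment steps, given num_envs, world_size and action_repeat.

def policy_steps_per_iteration(num_envs: int, world_size: int) -> int:
    # At each iteration, every environment on every device receives one action.
    return num_envs * world_size

def env_steps_from_policy_steps(policy_steps: int, action_repeat: int) -> int:
    # Every selected action is repeated `action_repeat` times in the environment.
    return policy_steps * action_repeat

# Example from above: 1 device, 1 environment, action_repeat = 2.
assert policy_steps_per_iteration(num_envs=1, world_size=1) == 1
assert env_steps_from_policy_steps(policy_steps=500, action_repeat=2) == 1000
```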

I hope the difference between policy and environment steps is clearer now.
