
Dreamer V3 Performance #218

Closed
michele-milesi opened this issue Feb 27, 2024 · 10 comments
Labels: algorithm, question (Further information is requested)

Comments

@michele-milesi
Member

Hi @LYK-love, I am bringing your question on the performance of Dreamer v3 back here so that we can continue the conversation.

"I'm worried that the performance of your implementation of DreamerV3 is not as good as the original one (by Hafner). Can you show some evaluation scores of DreamerV3?"

michele-milesi added the "question" and "algorithm" labels on Feb 27, 2024
@michele-milesi
Member Author

@LYK-love, I will show you three experiments whose results we compared with those reported in the Dreamer V3 paper (https://arxiv.org/abs/2301.04104).

  1. Crafter
[plot: Crafter training reward]

The plot above shows the reward we obtained in Crafter with these configs. The paper claims a reward of 11.7 ± 1.9, which is in line with our results (12.1 reward at test time). In addition, our training-reward curve is almost identical to the one obtained by Hafner.

  2. MsPacman 100K
[plot: MsPacman training reward]

We used these configs for training (+ fabric.accelerator=cuda).
The paper reports a score of 1327; we evaluated the trained agent with 6 seeds and obtained the following results (1911.67 ± 505.78):

MsPacman Test Reward
2020.0 (seed 5)
1070.0 (seed 1024)
2050.0 (seed 42)
1940.0 (seed 1337)
2630.0 (seed 8)
1760.0 (seed 2)
  3. Boxing 100K
[plot: Boxing training reward]

We used these configs for training (+ fabric.accelerator=cuda).
The paper reports a score of 78; we evaluated the trained agent with 6 seeds and obtained the following results (94 ± 2.53):

Boxing Test Reward
96.0 (seed 5)
92.0 (seed 1024)
96.0 (seed 42)
90.0 (seed 1337)
94.0 (seed 8)
96.0 (seed 2)
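
For reference, a multi-seed evaluation like the ones above can be run with the sheeprl-eval entry point; a minimal sketch (the checkpoint path is a placeholder for the checkpoint produced by your own run):

# Evaluate one checkpoint over the six seeds used above
export CKPT="path/to/your/checkpoint.ckpt"

for seed in 5 1024 42 1337 8 2; do
  sheeprl-eval checkpoint_path=$CKPT fabric.accelerator=gpu seed=$seed
done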

Let me know if you have other questions regarding the performance of Dreamer V3.
Thanks

@LYK-love

LYK-love commented Mar 15, 2024

Great. Currently I have 8 GPUs, and I'm reproducing your results with:

# Boxing
python sheeprl.py exp=dreamer_v3_100k_boxing fabric.strategy=ddp fabric.devices=8 fabric.accelerator=cuda

# Crafter
python sheeprl.py exp=dreamer_v3_XL_crafter fabric.strategy=ddp fabric.devices=8 fabric.accelerator=cuda

# MsPacman
python sheeprl.py exp=dreamer_v3_100k_ms_pacman fabric.strategy=ddp fabric.devices=8 fabric.accelerator=cuda

I will comment here once I get the results. Meanwhile, I also want to reproduce the performance on other envs, like Atari Video Pinball and Star Gunner. Have you reproduced them?

@michele-milesi
Member Author

Hi @LYK-love, we have never tried those two environments.

@LYK-love

LYK-love commented Mar 15, 2024

Hello, I got the rewards, but I think I made some mistakes.

Crafter reward

This is my training reward for crafter_reward. It only ran for 215,000 steps, instead of the 1,000,000 steps in your config.
The reward value is lower: only 5.1, compared to your 12.1.


I evaluated this trained agent with the checkpoint at 200,000 steps:

export CKPT="logs/runs/dreamer_v3/crafter_reward/2024-03-15_02-26-07_dreamer_v3_crafter_reward_5/version_0/checkpoint/ckpt_200000_0.ckpt"
sheeprl-eval checkpoint_path=$CKPT fabric.accelerator=gpu env.capture_video=True

I got an evaluation reward of 5.09.

MsPacman 100K

This is my reward for MsPacman. It ran for 100,000 steps, which is the same as the number in your config.

However, the reward value is also lower: only 570, compared to your 1327.

Meanwhile, at step 90,000 I did observe a reward of 1300, which is similar to 1327.

I evaluated this trained agent with the checkpoint at 100,000 steps, using 6 seeds. The commands are:

export CKPT="logs/runs/dreamer_v3/MsPacmanNoFrameskip-v4/2024-03-15_02-20-34_dreamer_v3_MsPacmanNoFrameskip-v4_5/version_0/checkpoint/ckpt_100000_0.ckpt"

seeds=(5 1024 42 1337 8 2)

for seed in "${seeds[@]}"; do
  sheeprl-eval checkpoint_path=$CKPT fabric.accelerator=gpu env.capture_video=True seed=$seed
done

The evaluation rewards are

Test - Reward: 1730.0 (seed=5)
Test - Reward: 570.0 (seed=1024)
Test - Reward: 810.0 (seed=42)
Test - Reward: 640.0 (seed=1337)
Test - Reward: 540.0 (seed=8)
Test - Reward: 580.0 (seed=2)

The average evaluation reward is 811.67

Boxing

This is my training reward for Boxing. The training ran for 100,000 steps, which is the same as the number in your config.

However, the reward plot only covers 85,000 steps, instead of 100,000 steps; I don't know why. The reward value is lower as well: only 18, instead of your 94 ± 2.53.
I evaluated this trained agent with the checkpoint at 100,000 steps, using 6 seeds. The commands are:

export CKPT="logs/runs/dreamer_v3/BoxingNoFrameskip-v4/2024-03-15_02-28-28_dreamer_v3_BoxingNoFrameskip-v4_5/version_0/checkpoint/ckpt_100000_0.ckpt"

seeds=(5 1024 42 1337 8 2)

for seed in "${seeds[@]}"; do
  sheeprl-eval checkpoint_path=$CKPT fabric.accelerator=gpu env.capture_video=True seed=$seed
done

The evaluation rewards are

Test - Reward: 32.0 (seed=5)
Test - Reward: 437.0 (seed=1024)
Test - Reward: 23.0 (seed=42)
Test - Reward: 28.0 (seed=1337)
Test - Reward: 32.0 (seed=8)
Test - Reward: 44.0 (seed=2)

The average evaluation reward is 29.33

Conclusion

I have two questions:

  1. In crafter_reward, why did the training process only run for 215,000 steps, instead of 1,000,000 steps? I used 8 GPUs for training; maybe that is one reason?
  2. Why were my training rewards and evaluation rewards lower than yours? It looks like my training script was not correct.

@michele-milesi
Member Author

Hi @LYK-love,

  1. This is strange: if you set 1M total (policy) steps in the config, then 1M policy steps are performed, as happened in Atari 100K, where you used 8 GPUs but the number of steps was still 100k. So it should not be related to the number of GPUs you used for training.
  2. With distributed training, some considerations must be made. First, I suggest reading this discussion: How to scale learning rate with batch size for DDP training? Lightning-AI/pytorch-lightning#3706 (comment), in which it is suggested to divide the batch size by the number of GPUs (see the sketch after this list).
    Another thing we are working on is how to manage how often to update the model (train_every) and how often to update the parameters (per_rank_gradient_steps). In distributed training, these parameters should be modified according to the number of GPUs used. You can find an explanation of how we would like to replace train_every and per_rank_gradient_steps in issue About Hafner train_ratio and general replay_ratio #223.
    This means that your training is different from ours.
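
A minimal sketch of what that scaling could look like as command-line overrides, assuming 2 GPUs (the algo.per_rank_batch_size key and the single-GPU default of 16 are assumptions for illustration, not necessarily the actual config):

# Hypothetical example: with 2 GPUs, halve the per-rank batch size so the
# global batch size matches the single-GPU run.
python sheeprl.py exp=dreamer_v3_100k_boxing \
  fabric.strategy=ddp fabric.devices=2 fabric.accelerator=cuda \
  algo.per_rank_batch_size=8
# train_every and per_rank_gradient_steps would likewise need to be adjusted
# to the number of GPUs (see issue #223).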

In the meantime, I advise you not to distribute the training, at least not until we fix this.
Sorry for that.

@LYK-love

LYK-love commented Mar 18, 2024

Sure. I wonder what the commands for training are. When I use

python sheeprl.py exp=dreamer_v3_XL_crafter fabric.accelerator=cuda

I got an error:

Seed set to 5
Log dir: logs/runs/dreamer_v3/crafter_reward/2024-03-18_04-11-30_dreamer_v3_crafter_reward_5/version_0
Error executing job with overrides: ['exp=dreamer_v3_XL_crafter']
Error locating target 'sheeprl.envs.crafter.CrafterWrapper', set env var HYDRA_FULL_ERROR=1 to see chained exception.

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
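
For reference, the full chained exception can be printed by re-running the same command with that variable set:

HYDRA_FULL_ERROR=1 python sheeprl.py exp=dreamer_v3_XL_crafter fabric.accelerator=cuda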

Meanwhile, I didn't get any error when running:

python sheeprl.py exp=dreamer_v3_100k_boxing fabric.accelerator=cuda

@michele-milesi
Member Author

Can you share your environment? There may be a problem with the ruamel.yaml package: we fixed it in PR #230.
Which commit are you using?
I suggest updating the repo and installing Crafter with pip install -e .[crafter].
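
A minimal sketch, assuming sheeprl was installed from a local clone of the repo:

# Update the clone to a commit that includes the ruamel.yaml fix (PR #230),
# then reinstall with the Crafter extra.
git pull
pip install -e ".[crafter]"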

Let me know, thanks.

@michele-milesi
Member Author

Hi @LYK-love,
over the last period we have made a number of improvements to DreamerV3 and the repo; I report below the results on Walker Walk.
[plot: Walker Walk reward]

The grey line is DreamerV3 trained with a single GPU, whereas the orange line is DreamerV3 trained on 2 GPUs. You can find the configs here.
These experiments were run with the new improvements/fixes made: #247, #252, #253, #255, #256, #257, #258.

@belerico
Member

belerico commented Apr 10, 2024

Hi @LYK-love, this is an experiment that I've run on Ms-PacMan: #261 (comment).

It has been run with the model compiled via torch.compile, and it contains all the improvements @michele-milesi listed here.

@belerico belerico reopened this Apr 10, 2024
@belerico
Member

belerico commented May 5, 2024

@LYK-love I'm closing this due to inactivity and because it seems to have been resolved. Re-open it if you have more evidence on your side.

@belerico belerico closed this as completed May 5, 2024