
Dreamer V3 Performance #218

Closed
michele-milesi opened this issue Feb 27, 2024 · 10 comments
Labels: algorithm, question (Further information is requested)

Comments

@michele-milesi
Member

Hi @LYK-love, I am bringing your question on the performance of Dreamer v3 back here so that we can continue the conversation.

"I'm worried that the performance of your implementation of DreamerV3 is not as good as the original one (by Hafner). Can you show some evaluation scores of DreamerV3?"

michele-milesi added the "question" and "algorithm" labels on Feb 27, 2024
@michele-milesi
Member Author

@LYK-love, I will show you three experiments whose results we compared with those reported in the Dreamer V3 paper (https://arxiv.org/abs/2301.04104).

  1. Crafter
[plot: Crafter training reward]

The plot above shows the reward we obtained in Crafter with these configs. The paper claims a reward of 11.7 ± 1.9, which is in line with our results (12.1 reward at test time). In addition, our training-reward curve is almost identical to the one obtained by Hafner.

  2. MsPacman 100K
[plot: MsPacman training reward]

We used these configs for training (+ fabric.accelerator=cuda).
The paper reports a score of 1327; we evaluated the trained agent with 6 seeds and obtained the following results (1911.67 ± 505.78):

MsPacman Test Reward
2020.0 (seed 5)
1070.0 (seed 1024)
2050.0 (seed 42)
1940.0 (seed 1337)
2630.0 (seed 8)
1760.0 (seed 2)
  3. Boxing 100K
[plot: Boxing training reward]

We used these configs for training (+ fabric.accelerator=cuda).
The paper reports a score of 78; we evaluated the trained agent with 6 seeds and obtained the following results (94 ± 2.53):

Boxing Test Reward
96.0 (seed 5)
92.0 (seed 1024)
96.0 (seed 42)
90.0 (seed 1337)
94.0 (seed 8)
96.0 (seed 2)
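
For reference, a multi-seed evaluation like the ones above can be run with the sheeprl-eval entry point; a minimal sketch (the checkpoint path is a placeholder for the checkpoint produced by your own run):

# Evaluate one checkpoint over the six seeds used above
export CKPT="path/to/your/checkpoint.ckpt"

for seed in 5 1024 42 1337 8 2; do
  sheeprl-eval checkpoint_path=$CKPT fabric.accelerator=gpu seed=$seed
done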

Let me know if you have other questions regarding the performance of Dreamer V3.
Thanks

@LYK-love

LYK-love commented Mar 15, 2024

Great. Currently I have 8 GPUs, and I'm reproducing your results with:

# Boxing
python sheeprl.py exp=dreamer_v3_100k_boxing fabric.strategy=ddp fabric.devices=8 fabric.accelerator=cuda

# Crafter
python sheeprl.py exp=dreamer_v3_XL_crafter fabric.strategy=ddp fabric.devices=8 fabric.accelerator=cuda

# MsPacman
python sheeprl.py exp=dreamer_v3_100k_ms_pacman fabric.strategy=ddp fabric.devices=8 fabric.accelerator=cuda

I will comment here once I get the results. Meanwhile, I also want to reproduce the performance on other envs, like Atari Video Pinball and Star Gunner. Have you reproduced them?

@michele-milesi
Member Author

Hi @LYK-love, we have never tried those two environments.

@LYK-love

LYK-love commented Mar 15, 2024

Hello, I got the rewards, but I think I made some mistakes.

Crafter reward

This is my training reward for crafter_reward. It only ran for 215,000 steps, instead of the 1,000,000 steps in your config.
The reward value is lower: only 5.1, compared to your 12.1.


I evaluated this trained agent with the checkpoint at 200,000 steps:

export CKPT="logs/runs/dreamer_v3/crafter_reward/2024-03-15_02-26-07_dreamer_v3_crafter_reward_5/version_0/checkpoint/ckpt_200000_0.ckpt"
sheeprl-eval checkpoint_path=$CKPT fabric.accelerator=gpu env.capture_video=True

I got an evaluation reward of 5.09.

MsPacman 100K

This is my reward for MsPacman. It ran for 100,000 steps, which is the same as the number in your config.

However, the reward value is also lower: only 570, compared to your 1327.

Meanwhile, at step 90,000 I did observe a reward of 1300, which is similar to 1327.

I evaluated this trained agent with the checkpoint at 100,000 steps, using 6 seeds. The commands are:

export CKPT="logs/runs/dreamer_v3/MsPacmanNoFrameskip-v4/2024-03-15_02-20-34_dreamer_v3_MsPacmanNoFrameskip-v4_5/version_0/checkpoint/ckpt_100000_0.ckpt"

seeds=(5 1024 42 1337 8 2)

for seed in "${seeds[@]}"; do
  sheeprl-eval checkpoint_path=$CKPT fabric.accelerator=gpu env.capture_video=True seed=$seed
done

The evaluation rewards are

Test - Reward: 1730.0 (seed=5)
Test - Reward: 570.0 (seed=1024)
Test - Reward: 810.0 (seed=42)
Test - Reward: 640.0 (seed=1337)
Test - Reward: 540.0 (seed=8)
Test - Reward: 580.0 (seed=2)

The average evaluation reward is 811.67

Boxing

This is my training reward for Boxing. The training ran for 100,000 steps, which is the same as the number in your config.

However, the reward plot only covers 85,000 steps, instead of 100,000 steps; I don't know why. The reward value is lower as well: only 18, instead of your 94 ± 2.53.
I evaluated this trained agent with the checkpoint at 100,000 steps, using 6 seeds. The commands are:

export CKPT="logs/runs/dreamer_v3/BoxingNoFrameskip-v4/2024-03-15_02-28-28_dreamer_v3_BoxingNoFrameskip-v4_5/version_0/checkpoint/ckpt_100000_0.ckpt"

seeds=(5 1024 42 1337 8 2)

for seed in "${seeds[@]}"; do
  sheeprl-eval checkpoint_path=$CKPT fabric.accelerator=gpu env.capture_video=True seed=$seed
done

The evaluation rewards are

Test - Reward: 32.0 (seed=5)
Test - Reward: 437.0 (seed=1024)
Test - Reward: 23.0 (seed=42)
Test - Reward: 28.0 (seed=1337)
Test - Reward: 32.0 (seed=8)
Test - Reward: 44.0 (seed=2)

The average evaluation reward is 29.33

Conclusion

I have two questions:

  1. In crafter_reward, why did the training process only run for 215,000 steps, instead of 1,000,000 steps? I used 8 GPUs for training; maybe that is one reason?
  2. Why were my training rewards and evaluation rewards lower than yours? It looks like my training script was not correct.

@michele-milesi
Member Author

Hi @LYK-love,

  1. This is strange: if you set 1M total (policy) steps in the config, then 1M policy steps are performed, as happened in Atari 100K, where you used 8 GPUs but the number of steps was still 100k. So it should not be related to the number of GPUs you used for training.
  2. With distributed training, some considerations must be made. First, I suggest reading this discussion: How to scale learning rate with batch size for DDP training? Lightning-AI/pytorch-lightning#3706 (comment), in which it is suggested to divide the batch size by the number of GPUs (see the sketch after this list).
    Another thing we are working on is how to manage how often to update the model (train_every) and how often to update the parameters (per_rank_gradient_steps). In distributed training, these parameters should be modified according to the number of GPUs used. You can find an explanation of how we would like to replace train_every and per_rank_gradient_steps in issue About Hafner train_ratio and general replay_ratio #223.
    This means that your training is different from ours.
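
A minimal sketch of what that scaling could look like as command-line overrides, assuming 2 GPUs (the algo.per_rank_batch_size key and the single-GPU default of 16 are assumptions for illustration, not necessarily the actual config):

# Hypothetical example: with 2 GPUs, halve the per-rank batch size so the
# global batch size matches the single-GPU run.
python sheeprl.py exp=dreamer_v3_100k_boxing \
  fabric.strategy=ddp fabric.devices=2 fabric.accelerator=cuda \
  algo.per_rank_batch_size=8
# train_every and per_rank_gradient_steps would likewise need to be adjusted
# to the number of GPUs (see issue #223).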

In the meantime, I advise you not to distribute the training, at least not until we fix this.
Sorry for that.

@LYK-love

LYK-love commented Mar 18, 2024

Sure. I wonder what the commands for training are. When I use

python sheeprl.py exp=dreamer_v3_XL_crafter fabric.accelerator=cuda

I got an error:

Seed set to 5
Log dir: logs/runs/dreamer_v3/crafter_reward/2024-03-18_04-11-30_dreamer_v3_crafter_reward_5/version_0
Error executing job with overrides: ['exp=dreamer_v3_XL_crafter']
Error locating target 'sheeprl.envs.crafter.CrafterWrapper', set env var HYDRA_FULL_ERROR=1 to see chained exception.

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
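
For reference, the full chained exception can be printed by re-running the same command with that variable set:

HYDRA_FULL_ERROR=1 python sheeprl.py exp=dreamer_v3_XL_crafter fabric.accelerator=cuda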

Meanwhile, I didn't get any error when running:

python sheeprl.py exp=dreamer_v3_100k_boxing fabric.accelerator=cuda

@michele-milesi
Member Author

Can you share your environment? There may be a problem with the ruamel.yaml package: we fixed it in PR #230.
Which commit are you using?
I suggest updating the repo and installing Crafter with pip install -e .[crafter].
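
A minimal sketch, assuming sheeprl was installed from a local clone of the repo:

# Update the clone to a commit that includes the ruamel.yaml fix (PR #230),
# then reinstall with the Crafter extra.
git pull
pip install -e ".[crafter]"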

Let me know, thanks.

@michele-milesi
Member Author

Hi @LYK-love,
over the last period we have made a number of improvements to DreamerV3 and the repo; I report below the results on Walker Walk.
[plot: Walker Walk reward]

The grey line is DreamerV3 trained with a single GPU, whereas the orange line is DreamerV3 trained on 2 GPUs. You can find the configs here.
These experiments were run with the new improvements/fixes made: #247, #252, #253, #255, #256, #257, #258.

@belerico
Member

belerico commented Apr 10, 2024

Hi @LYK-love, this is an experiment that I've run on Ms-PacMan: #261 (comment).

It has been run with the model compiled via torch.compile, and it contains all the improvements @michele-milesi listed here.

@belerico belerico reopened this Apr 10, 2024
@belerico
Member

belerico commented May 5, 2024

@LYK-love I'm closing this due to inactivity and because it seems to have been resolved. Re-open it if you have more evidence on your side.

@belerico belerico closed this as completed May 5, 2024