Parallel collection and evaluation #143

Open
gliese876b opened this issue Nov 18, 2024 · 9 comments

@gliese876b

Can _evaluation_loop use SyncDataCollector for non-vectorized envs so that the evaluation is also parallel?

While running on Melting Pot envs, increasing n_envs_per_worker definitely improves execution time, but the evaluation steps take almost 3 times longer than a regular iteration (I have evaluation_episodes: 10) since the evaluation is sequential.

Making test_env a SerialEnv could solve the issue.
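
For illustration, here is a minimal sketch of that idea (untested against the current BenchMARL code; env_func, policy and max_episode_steps are placeholders for the task factory, the trained actor and the episode length): batch the evaluation episodes into one vectorized test env and run a single rollout, so the episodes step together instead of one after another.

import torch
from torchrl.envs import SerialEnv  # or ParallelEnv

evaluation_episodes = 10
test_env = SerialEnv(evaluation_episodes, env_func)  # env_func: placeholder one-task factory

with torch.no_grad():
    eval_td = test_env.rollout(
        max_steps=max_episode_steps,  # placeholder, e.g. 1000 for Harvest
        policy=policy,                # placeholder for the trained actor
        break_when_any_done=False,    # keep stepping until every sub-env finishes
    )
# eval_td has a leading batch dimension of size evaluation_episodes,
# one sub-env per evaluation episode.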

@matteobettini
Collaborator

matteobettini commented Nov 24, 2024

Hello!

Thanks for opening this and sorry for the delay in answering.
It gives me the chance to talk about something I have been wanting to address.

For vectorized environments, both collection and evaluation are already done on a batch of environments.

In other environments, right now, both collection and evaluation are sequential in the number of environments.

Collection:

SerialEnv(self.config.n_envs_per_worker(self.on_policy), env_func),

Evaluation:
for eval_episode in range(self.config.evaluation_episodes):

Allowing both of these to be switched to parallel has long been on the TODO list: #94

This could be as simple as changing SerialEnv to ParallelEnv, but it also has certain implications that have to be checked.
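
For concreteness, a rough sketch of that change (untested; env_func stands for the per-task factory that is already built for collection):

from torchrl.envs import ParallelEnv, SerialEnv

def make_collection_env(n_envs_per_worker, env_func, use_parallel=False):
    # SerialEnv steps its sub-envs one after another in the caller's process;
    # ParallelEnv launches one worker process per sub-env and steps them together.
    env_cls = ParallelEnv if use_parallel else SerialEnv
    return env_cls(n_envs_per_worker, env_func)

The evaluation loop could similarly be collapsed into a single rollout over a vectorized test env, as sketched in the issue description above.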

This is at the top of the todo list, so I think I will get to it when I have time.

Regarding your specific case: in Melting Pot, changing n_envs_per_worker should not change much, as it will collect sequentially anyway. Maybe the reason evaluation takes so much longer is rendering? Try testing with rendering disabled (it is in the experiment config).
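
(For reference, a hedged sketch of disabling it programmatically; I am assuming the field is called render on ExperimentConfig, so double-check the name in your version.)

from benchmarl.experiment import ExperimentConfig

experiment_config = ExperimentConfig.get_from_yaml()  # load the default experiment config
experiment_config.render = False  # assumed field name; disables rendering during evaluation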

@matteobettini matteobettini pinned this issue Nov 24, 2024
@matteobettini matteobettini changed the title Parallel Evaluation Parallel collection and evaluation Nov 24, 2024
@gliese876b
Author

Hello!

Thanks for the response. It is good to know that the issue is at the top of the todo list.

You are right that the collection and evaluation are done sequentially.

Just to follow your suggestion, I changed SerialEnv to ParallelEnv, but it led to many errors, so I stopped.

Also, I definitely see execution time improvements when I increase n_envs_per_worker from 2 to 20, but I guess that has something to do with the reset method of the Melting Pot envs.

Here is an example run of IQN on the Harvest env with 10 agents, with off_policy_n_envs_per_worker: 20, evaluation_interval: 50_000, evaluation_episodes: 10, off_policy_collected_frames_per_batch: 2000.

0%|          | 0/2500 [00:00<?, ?it/s].../logger.py:100: UserWarning: No episode terminated this iteration and thus the episode rewards will be NaN, this is normal if your horizon is longer then one iteration. Learning is proceeding fine.The episodes will probably terminate in a future iteration.
  warnings.warn(

mean return = nan:   0%|          | 1/2500 [05:48<241:38:32, 348.10s/it].../logger.py:100: UserWarning: No episode terminated this iteration and thus the episode rewards will be NaN, this is normal if your horizon is longer then one iteration. Learning is proceeding fine.The episodes will probably terminate in a future iteration.
  warnings.warn(

mean return = nan:   0%|          | 2/2500 [06:38<120:12:07, 173.23s/it]
mean return = nan:   0%|          | 3/2500 [07:29<81:18:35, 117.23s/it] 
mean return = nan:   0%|          | 4/2500 [08:20<63:04:10, 90.97s/it] 
mean return = nan:   0%|          | 5/2500 [09:11<53:00:30, 76.49s/it]
mean return = nan:   0%|          | 6/2500 [10:02<46:59:21, 67.83s/it]
mean return = nan:   0%|          | 7/2500 [10:52<43:04:41, 62.21s/it]
mean return = nan:   0%|          | 8/2500 [11:43<40:39:06, 58.73s/it]
mean return = nan:   0%|          | 9/2500 [12:36<39:21:00, 56.87s/it]
mean return = -88.84220123291016:   0%|          | 10/2500 [13:58<44:37:06, 64.51s/it]
mean return = nan:   0%|          | 11/2500 [14:49<41:44:54, 60.38s/it]               
mean return = nan:   0%|          | 12/2500 [15:41<39:56:15, 57.79s/it]
mean return = nan:   1%|          | 13/2500 [16:32<38:38:37, 55.94s/it]
mean return = nan:   1%|          | 14/2500 [17:24<37:44:07, 54.65s/it]
mean return = nan:   1%|          | 15/2500 [18:16<37:08:39, 53.81s/it]
mean return = nan:   1%|          | 16/2500 [19:08<36:42:25, 53.20s/it]
mean return = nan:   1%|          | 17/2500 [20:00<36:24:12, 52.78s/it]
mean return = nan:   1%|          | 18/2500 [20:52<36:17:42, 52.64s/it]
mean return = nan:   1%|          | 19/2500 [21:45<36:22:00, 52.77s/it]
mean return = -108.3295669555664:   1%|          | 20/2500 [23:08<42:33:57, 61.79s/it]
mean return = nan:   1%|          | 21/2500 [23:59<40:23:30, 58.66s/it]               
mean return = nan:   1%|          | 22/2500 [24:51<39:01:50, 56.70s/it]
mean return = nan:   1%|          | 23/2500 [25:43<38:02:08, 55.28s/it]
mean return = nan:   1%|          | 24/2500 [26:35<37:23:44, 54.37s/it]
mean return = nan:   1%|          | 25/2500 [32:32<99:44:40, 145.08s/it]  ------------------> evaluation
mean return = nan:   1%|          | 26/2500 [33:23<80:13:32, 116.74s/it]
mean return = nan:   1%|          | 27/2500 [34:13<66:28:06, 96.76s/it] 
mean return = nan:   1%|          | 28/2500 [35:04<57:02:45, 83.08s/it]
mean return = nan:   1%|          | 29/2500 [35:55<50:19:59, 73.33s/it]
mean return = -130.4561309814453:   1%|          | 30/2500 [37:15<51:42:54, 75.37s/it]

There is also an increase in execution time when episodes end. I guess that, in the end, it cancels out the improvement on regular iterations.

@matteobettini
Collaborator

Just to follow your suggestion, I changed SerialEnv to ParallelEnv, but it led to many errors, so I stopped.

Ok, that is what I was afraid of. In theory they should be interchangeable, but in practice they are my first cause of migraines (hence why we only have serial for now). When I gather some courage I'll look into it.

Regarding the other part of the message: is there anything different from what you expected / anything I can help with?

@gliese876b
Author

Nope. Thanks for the quick responses.
Good luck!

@matteobettini
Collaborator

I'll just keep this open until the feature lands.

@matteobettini matteobettini reopened this Nov 25, 2024
@gliese876b
Author

I revisited the issue and you were right that switching from SerialEnv to ParallelEnv works!

Apparently, the problem was with how I pass some env config params to the env creator function. I guess ParallelEnv does not copy the task config the way SerialEnv does. I changed the way I pass the args, removed the hydra option, and now it works.
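
For anyone hitting the same thing, a minimal sketch of the kind of change that worked (names here are illustrative, not the actual BenchMARL code): pass the task parameters to the env factory as plain Python data, for example via functools.partial, so the callable carries everything it needs into each ParallelEnv worker process instead of relying on a captured Hydra config.

from functools import partial
from torchrl.envs import ParallelEnv

def make_task_env(task_params):
    # Illustrative factory: build the env inside the worker process,
    # using only the plain-Python arguments passed in.
    from torchrl.envs.libs.meltingpot import MeltingpotEnv
    return MeltingpotEnv(task_params["substrate"])

# Each ParallelEnv worker calls env_func in its own process, so the factory
# must carry everything it needs explicitly.
env_func = partial(make_task_env, {"substrate": "commons_harvest__open"})
collection_env = ParallelEnv(20, env_func)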

@matteobettini
Collaborator

Nice! Would you be able to share your solution in a PR? Also, maybe you could open an issue in torchrl outlining where the serial and parallel envs differ in a way you did not expect.

@gliese876b
Author

gliese876b commented Dec 1, 2024

Well, in terms of collection time, ParallelEnv improves things a lot.

However, after checking the results, I can see a big change in learning performance. I ran some more tests with the config below (on IQL), changing only SerialEnv vs. ParallelEnv, and somehow learning is very poor when I use ParallelEnv.

off_policy_collected_frames_per_batch: 2000
off_policy_n_envs_per_worker: 20
off_policy_n_optimizer_steps: 20
off_policy_train_batch_size: 128
off_policy_memory_size: 20000
off_policy_init_random_frames: 0

I thought the only difference was that SerialEnv steps the 20 envs in sequence whereas ParallelEnv steps them in separate processes. Note that an episode ends only after 1000 steps are taken.

I am not sure whether this originates from Melting Pot and is due to asynchronous collection from the envs.

@matteobettini
Collaborator

matteobettini commented Dec 1, 2024

Oh no, that does not sound good. I feared something like this. I'll need to take a look.

We need to identify where this deviation first occurs.

Maybe the first approach would be to test with a non-learned deterministic policy and see if the result differs between the two envs.
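
A minimal sketch of that check (untested; env_func is a placeholder for the Melting Pot task factory, and the ("agents", ...) keys are assumptions, so read the real keys off the env specs): roll out the same constant all-zeros policy in a SerialEnv and a ParallelEnv and compare the returns.

from torchrl.envs import ParallelEnv, SerialEnv

def make_constant_policy(env):
    zero_action = env.full_action_spec.zero()  # fixed "no-op" action for every agent
    def policy(td):
        # Write the same constant action at every step, so any difference
        # between the two rollouts comes from the envs rather than the policy.
        return td.update(zero_action.clone())
    return policy

for env_cls in (SerialEnv, ParallelEnv):
    env = env_cls(2, env_func)  # env_func: placeholder task factory
    env.set_seed(0)
    td = env.rollout(
        max_steps=1000,
        policy=make_constant_policy(env),
        break_when_any_done=False,
    )
    # Reward key is an assumption; adjust it to your env's group/reward keys.
    returns = td.get(("next", "agents", "reward")).sum(dim=1)  # sum over time
    print(env_cls.__name__, returns.mean().item())
    env.close()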
