Parallel collection and evaluation #143
Comments
Hello! Thanks for opening this and sorry for the delay in answering. In vectorized environments, both collection and evaluation are done using a batch of vectorized environments. In other environments, right now, both collection and evaluation are done sequentially in the number of environments.
Collection: BenchMARL/benchmarl/experiment/experiment.py, line 439 in 9813807
Evaluation: BenchMARL/benchmarl/experiment/experiment.py, line 833 in 9813807
Allowing both of these to be changed to parallel has long been on the TODO list: #94. This could be as simple as changing SerialEnv to ParallelEnv in the lines above. This is at the top of the todo list, so I think I will get to it when I have time. Re your specific case: in meltingpot, changing the …
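For reference, a minimal sketch of the swap being discussed, assuming `create_env_fn` is the per-environment factory the experiment already builds and `n_envs` is the configured n_envs_per_worker (the names are illustrative, not BenchMARL's actual variables):

```python
from torchrl.envs import ParallelEnv, SerialEnv

# SerialEnv steps all env copies in a loop inside one process;
# ParallelEnv steps each copy in its own worker process.
env = SerialEnv(n_envs, create_env_fn)      # current (sequential) behaviour
# env = ParallelEnv(n_envs, create_env_fn)  # proposed drop-in replacement
```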
Hello! Thanks for the response. It is good to know that the issue is at the top of the todo list, and you are right that collection and evaluation are done sequentially. Following your suggestion, I changed SerialEnv to ParallelEnv, but it led to many errors so I stopped. Also, I definitely see execution-time improvements when I increase n_envs_per_worker from 2 to 20, but I guess something is going on with the reset method of the meltingpot envs. Here is an example run of IQL on the Harvest env with 10 agents and off_policy_n_envs_per_worker: 20, …
There is also an increase in execution time when episodes end. I guess, in the end, it cancels out the improvement on regular iterations.
Ok, that is what I was afraid of. In theory they should be interchangeable, but in practice they are my first cause of migraines (hence why we only have serial for now). But when I gather some courage I'll look into it. Regarding the other part of the message: anything out of what you expected / something I can help with?
Nope. Thanks for the quick responses.
I'll just keep this open until the feature lands.
I revisited the issue and you were right that switching from SerialEnv to ParallelEnv works! Apparently, the problem was in how I pass some env config params to the env creator function. I guess ParallelEnv does not copy the task config the way SerialEnv does. I changed the way I pass the args and removed the hydra option, and now it works.
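A minimal sketch of the kind of change described here, assuming a hypothetical `make_task_env` factory; the point is that ParallelEnv ships the env-creation callable to worker processes, so any task config should be bound explicitly (e.g. with functools.partial and plain Python values) rather than captured implicitly from surrounding Hydra objects:

```python
from functools import partial
from torchrl.envs import ParallelEnv

def make_task_env(task_cfg: dict):
    # Hypothetical factory: builds one Melting Pot env from a plain config dict.
    ...

# Bind the config explicitly so the callable transfers cleanly to the worker
# processes; a closure over live Hydra/OmegaConf state may not behave the
# same way it does in the single-process SerialEnv case.
plain_cfg = {"substrate": "commons_harvest__open", "max_steps": 1000}  # illustrative values
env = ParallelEnv(num_workers=20, create_env_fn=partial(make_task_env, task_cfg=plain_cfg))
```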
Nice! Would you be able to share your solution in a PR? Also, maybe you could open an issue in torchrl outlining where SerialEnv and ParallelEnv differ in ways you did not expect.
Well, in terms of collection time, ParallelEnv improves things a lot. However, after checking the results, I can see a big change in learning performance. I ran some more tests with the config below (on IQL), changing only SerialEnv vs ParallelEnv, and somehow the learning is very poor when I use ParallelEnv.
I thought the only difference was that SerialEnv steps the 20 envs in sequence whereas ParallelEnv steps them in separate processes. Note that an episode ends only after 1000 steps are taken. I am not sure whether this originates from Melting Pot or is due to async collection from the envs.
Oh no, that does not sound good. I feared something like this. I'll need to take a look. We need to identify where this deviation first occurs. Maybe the first approach would be to test with a non-learned deterministic policy and see whether the results differ between the two envs.
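A rough sketch of such a check, under the assumption that seeding the batched env and torch's global RNG the same way makes the random-action rollouts comparable (exact equality may still depend on how the underlying env consumes seeds); the reward key is also an assumption, since multi-agent envs often nest it under a group key:

```python
import torch
from torchrl.envs import ParallelEnv, SerialEnv

def compare_batched_envs(make_env, n_envs=4, max_steps=100, seed=0):
    """Roll out random actions in SerialEnv and ParallelEnv and compare rewards."""
    rollouts = []
    for cls in (SerialEnv, ParallelEnv):
        env = cls(n_envs, make_env)
        env.set_seed(seed)
        torch.manual_seed(seed)  # random actions are drawn from the action spec
        # break_when_any_done=False keeps both rollouts the same length
        rollouts.append(env.rollout(max_steps, break_when_any_done=False))
        env.close()
    # Assumed reward location; adapt the key if rewards live under a group
    # (e.g. ("next", "agents", "reward")) in your multi-agent env.
    r0 = rollouts[0].get(("next", "reward"))
    r1 = rollouts[1].get(("next", "reward"))
    print("max reward deviation:", (r0 - r1).abs().max().item())
```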
Can _evaluation_loop use SyncDataCollector for non-vectorized envs so that the evaluation is also parallel?
While running on Melting Pot envs, increasing n_envs_per_worker definitely improves execution time, but the evaluation steps take almost 3 times longer than a regular iteration (I have evaluation_episodes: 10), since the evaluation is sequential.
Making test_env a SerialEnv could solve the issue, as in the sketch below.
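A minimal sketch of that idea, assuming `make_test_env` stands in for whatever factory the experiment already uses to build its evaluation env and `policy` for the trained policy module (both names are illustrative); wrapping the env in a SerialEnv with evaluation_episodes workers lets all evaluation episodes be rolled out in one batched call instead of a Python loop:

```python
from torchrl.envs import SerialEnv  # or ParallelEnv for true multiprocessing

n_eval_episodes = 10  # matches evaluation_episodes in the config
test_env = SerialEnv(n_eval_episodes, make_test_env)

# One batched rollout produces n_eval_episodes evaluation episodes at once;
# max_steps matches the 1000-step episode length mentioned above.
eval_td = test_env.rollout(max_steps=1000, policy=policy, break_when_any_done=False)
```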