How to update Gym Environment after every episode #117

Open
rahuldwivedi1112 opened this issue Jul 18, 2023 · 1 comment

Comments

@rahuldwivedi1112

I am looking for a way to update the Gym environment after an episode (or game) ends. I looked at the code in reset() and step_wait(), but even after adding some logging I can't figure out when a game ends.

@DennisSoemers
Collaborator

Note that MicroRTSGridModeVecEnv (usually) runs multiple episodes "in parallel". Not so much in the sense of truly running in parallel on multiple threads, but in the sense that it doesn't fully play out one game before starting another. Every time you call step(), it takes one step in each of potentially many episodes. This allows for more efficient usage of GPUs, because we can batch up inputs and outputs. Instead of having a single state as input, we can have a larger batch with one state per episode, and the GPU can do a forward pass of a neural network for all of them in parallel. It then also produces actions for all the different episodes as outputs in parallel, and these are passed to the game engine to take one step in each episode.
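To make the batching idea concrete, here is a toy sketch in plain Python. It is not the gym-microrts implementation: `batched_step`, `toy_policy`, and the integer "states" are hypothetical stand-ins, chosen only to show that the policy is called once per step for the whole batch rather than once per episode.

```python
from typing import Callable, List

def batched_step(states: List[int],
                 policy: Callable[[List[int]], List[int]]) -> List[int]:
    """Advance every episode slot by one step (toy dynamics: state + action)."""
    actions = policy(states)  # a single policy call covers all episodes
    if len(actions) != len(states):
        raise ValueError("policy must return one action per episode")
    return [s + a for s, a in zip(states, actions)]

# Hypothetical policy that counts how often it is invoked.
call_count = {"n": 0}

def toy_policy(states: List[int]) -> List[int]:
    call_count["n"] += 1
    # Stand-in for a batched neural network forward pass.
    return [1 if s % 2 == 0 else 2 for s in states]

states = [0, 1, 2, 3]            # one state per parallel episode
states = batched_step(states, toy_policy)
```

With four episode slots, the policy still runs only once per call to `batched_step`, which is the property that makes batching worthwhile on a GPU.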

Of course, different episodes may end after different numbers of time steps. So, while at the very beginning all episodes are "synchronised" in the sense that they all start at time = 0, this will gradually become desynchronised. Some episodes will end early (and get reset such that new episodes start in those slots), while others are still ongoing.
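A small simulation may help to picture the desynchronisation. This is only an illustration under made-up assumptions: `run_slots` is a hypothetical helper, and each slot is given a fixed episode length, which real games of course do not have.

```python
from typing import List, Tuple

def run_slots(episode_lengths: List[int], num_steps: int) -> Tuple[List[int], List[int]]:
    """Step all slots together; a slot resets as soon as its episode ends.

    episode_lengths[i] is the (assumed fixed) length of every episode in slot i.
    Returns the current time step inside each slot and the number of
    episodes each slot has completed.
    """
    t = [0] * len(episode_lengths)
    completed = [0] * len(episode_lengths)
    for _ in range(num_steps):
        for i, length in enumerate(episode_lengths):
            t[i] += 1
            if t[i] == length:    # episode in slot i is over
                completed[i] += 1
                t[i] = 0          # a new episode starts in the same slot
    return t, completed

# All three slots start at time 0, but drift apart as episodes end.
t, completed = run_slots([3, 5, 7], num_steps=10)
```

After ten shared steps, the slots sit at different points inside their current episodes and have finished different numbers of games, even though they were all stepped together.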

In step_wait(), you should be able to figure out when individual episodes end though. The done variable there is not a single bool, it's actually a matrix. This matrix is first indexed by player (0 or 1), and then by episode index (ranging from 0-inclusive to number-of-parallel-episodes-exclusive). I suppose which player you use to index doesn't actually matter: if the game is over for one player, it's also over for the other.
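As a rough sketch of how that could look, assuming done really is indexed as done[player][episode] as described above: `finished_episodes` is a hypothetical helper, not part of gym-microrts, and the nested list stands in for whatever array type the environment actually uses.

```python
from typing import List, Sequence

def finished_episodes(done: Sequence[Sequence[bool]], player: int = 0) -> List[int]:
    """Return the indices of episode slots whose game ended on this step."""
    return [i for i, is_done in enumerate(done[player]) if is_done]

# Illustrative data only: 2 players x 4 parallel episodes.
done = [
    [False, True, False, True],   # player 0
    [False, True, False, True],   # player 1
]

ended = finished_episodes(done)
for episode_index in ended:
    # Place any per-episode update here, e.g. changing settings
    # for the next game that starts in this slot.
    pass
```

Calling something like this from inside step_wait() would give the slots that just finished, so any per-episode update can be applied only to those.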
