
Add PettingZoo Bindings #59

Open
vwxyzjn opened this issue Feb 7, 2022 · 8 comments
Comments

@vwxyzjn
Collaborator

vwxyzjn commented Feb 7, 2022

TL;DR: PettingZoo has become the standard library for multi-agent environments, and we want to support PettingZoo bindings in gym-microrts.

This project, https://github.com/vwxyzjn/gym-microrts, is an RL environment for an RTS game in which many units are constantly spawning and dying. Because of the multi-agent nature of RTS games, gym-microrts should fit PettingZoo's interface pretty seamlessly.

We currently need help on the following fronts:

  1. Support PettingZoo's API in gym-microrts. The current API is already similar to PettingZoo's (comparable observation space, action space, and support for action masks), but it would be nice to officially adopt PettingZoo (see the sketch after this list).
  2. Make SB3 work with PettingZoo. Recently, @kachayev contributed https://github.com/kachayev/gym-microrts-paper-sb3, which makes SB3 work with gym-microrts. It would be nice to have an SB3 demo that works with gym-microrts's PettingZoo API and ultimately supports all PettingZoo environments that have action masks, such as Chess or Go.
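
For illustration, here is a rough caller-side sketch of what an adopted PettingZoo AEC interface could look like. The MicroRTSAECEnv constructor, the policy function, and the observation-dict layout ("observation" plus "action_mask", mirroring PettingZoo's classic environments) are all placeholders, not existing gym-microrts API; the loop itself is PettingZoo's standard agent_iter / last / step pattern.

from gym_microrts import microrts_ai

# MicroRTSAECEnv and policy are hypothetical placeholders for the binding
# discussed in this issue.
env = MicroRTSAECEnv(opponent=microrts_ai.coacAI)
env.reset()
for agent in env.agent_iter():
    observation, reward, done, info = env.last()
    if done:
        action = None
    else:
        # the action mask keeps the policy from issuing invalid unit commands
        action = policy(observation["observation"], observation["action_mask"])
    env.step(action)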

Setting up an issue to track progress.

@BolunDai0216 has said he would like to take a stab at this.

@kachayev
Contributor

kachayev commented Feb 7, 2022

I have a working version of microrts integrated with PettingZoo that exposes each unit in the game as an independent agent :) I assume you are describing the less "extreme" API where agent = player, right?

Is the goal to make wrappers from PettingZoo to SB3 work (e.g., vectorization)?

@vwxyzjn
Collaborator Author

vwxyzjn commented Feb 7, 2022

Oh @kachayev that's awesome! @BolunDai0216 is interested in working on this. Would you mind sharing your version here?

@kachayev
Contributor

kachayev commented Feb 7, 2022

Absolutely! I'll dig it up tomorrow

@kachayev
Contributor

kachayev commented Feb 9, 2022

Okay, I completely blanked on this. Below is (part of) the code I'm using in my experiments; I tried to cherry-pick it without any dependencies on my implementation of the environment. I think the use case of an API for 2 players would be much easier: there wouldn't be any problems with a dynamic number of agents, the obs and action spaces are the same for both players, no problems with rewards/infos, etc. It would just take a little bit of index juggling when putting obs and actions in place. Having each unit as a separate agent, as you can see here, is more involved, and I certainly don't have a fully fledged solution that covers most use cases (this one is tied to my specific algorithm only). Also, note that this is the AEC API. Not sure if the goal here is to have only AEC or other APIs as well; support for parallel_env would be cool too.
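
For context, the parallel API would look roughly like this from the caller's side (env here is a hypothetical parallel gym-microrts binding and policy is a placeholder, not anything that exists yet):

observations = env.reset()
while env.agents:
    # one action per live agent, keyed by agent name
    actions = {agent: policy(observations[agent]) for agent in env.agents}
    observations, rewards, dones, infos = env.step(actions)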

@kachayev
Contributor

kachayev commented Feb 9, 2022

# Note: the imports below assume gym-microrts's vectorized env lives in
# gym_microrts.envs.vec_env (adjust the path to match your installed version).
import gym
import numpy as np
from pettingzoo import AECEnv
from pettingzoo.utils import agent_selector

from gym_microrts.envs.vec_env import MicroRTSGridModeSharedMemVecEnv


class MicroRTSAEC(AECEnv, MicroRTSGridModeSharedMemVecEnv):

    def __init__(
        self,
        opponent,
        agent_vision_patch=(5, 5),
        partial_obs=False,
        max_steps=2000,
        render_theme=2,
        frame_skip=0,
        map_path="maps/10x10/basesTwoWorkers10x10.xml",
        reward_weight=np.array([0.0, 1.0, 0.0, 0.0, 0.0, 5.0]),
    ):
        self.agent_vision_patch = agent_vision_patch
        # positional args follow the underlying vec env's constructor:
        # 0 self-play envs, 1 bot env played against `opponent`
        super(MicroRTSGridModeSharedMemVecEnv, self).__init__(
            0,
            1,
            partial_obs,
            max_steps,
            render_theme,
            frame_skip,
            [opponent],
            [map_path],
            reward_weight,
        )
        self._agent_selector = agent_selector([])  # empty before we start
        self.agent_selection = None
        self.agent_observation_space = gym.spaces.Box(
            low=0.0,
            high=1.0,
            shape=(self.agent_vision_patch[0], self.agent_vision_patch[1], sum(self.num_planes)),
            dtype=np.int32,
        )
        self.agent_action_space = gym.spaces.MultiDiscrete(np.array(self.action_space_dims))
        self._reset_actions = np.zeros_like(self.actions)

    def observation_space(self, agent):
        """All agents have the same obs space."""
        return self.agent_observation_space

    def action_space(self, agent):
        """All agents have the same action space."""
        return self.agent_action_space

    def reset(self):
        """Note that we don't return obs here as we do with Gym."""
        super(MicroRTSGridModeSharedMemVecEnv, self).reset()
        np.copyto(self.actions, self._reset_actions)
        all_agents = self.agents
        self._agent_selector.reinit(all_agents)
        self.agent_selection = self._agent_selector.next()
        self.infos = {agent: {} for agent in all_agents}
        self.dones = {agent: False for agent in all_agents}
        self._cumulative_rewards = {agent: 0.0 for agent in all_agents}

    def step(self, action):
        agent = self.agent_selection
        # fill in action for a given agent
        np.copyto(self.actions[0][agent], action)
        if self._agent_selector.is_last():
            all_agents = self.agents
            obs, rewards, dones, infos = self.step_wait()
            self.infos = {agent: infos[0].copy() for agent in all_agents}
            self.dones = {agent: dones[0].copy() for agent in all_agents}
            self._cumulative_rewards = {agent: rewards[0] / len(all_agents) for agent in all_agents}
            # reset actions now, as we already used them in the environment
            np.copyto(self.actions, self._reset_actions)
            # the set of live units may have changed, so rebuild the turn order
            self._agent_selector.reinit(self.agents)
        # advance to the next agent in the turn order
        self.agent_selection = self._agent_selector.next()

    def observe(self, agent):
        return self.obs[0][agent]

    @property
    def max_num_agents(self):
        return self.height * self.width

    @property
    def game_state(self):
        return self.vec_client[0].gs

    @property
    def agents(self):
        # one agent per live unit, keyed by its position on the grid
        return [u.getPosition() for u in self.game_state.getUnits()]

@BolunDai0216
Contributor

BolunDai0216 commented Feb 9, 2022

Thanks for sharing, this definitely gives me a nice place to start.

@vwxyzjn
Collaborator Author

vwxyzjn commented Feb 9, 2022

@kachayev thanks for sharing this!

I think the use case of an API for 2 players would be much easier

I agree. My first thought is that gym-microrts's PettingZoo API should be very similar to chess's PettingZoo API, which has only two players: https://www.pettingzoo.ml/classic/chess
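
To make that concrete, here is a minimal sketch of how the two-player buffering could work, assuming the wrapper sits on top of a vectorized self-play env. TwoPlayerMicroRTSAEC, self._vec_env, and self._pending are illustrative names, not existing gym-microrts code, and the reward/done indexing is just an assumption about how the underlying env orders the two players.

import numpy as np
from pettingzoo import AECEnv
from pettingzoo.utils import agent_selector

class TwoPlayerMicroRTSAEC(AECEnv):
    """Sketch only: two fixed agents, chess-style turn order."""

    possible_agents = ["player_0", "player_1"]

    def step(self, action):
        # buffer the current player's action; only step the underlying
        # vectorized env once both players have submitted theirs
        self._pending[self.agent_selection] = action
        if self._agent_selector.is_last():
            actions = np.stack([self._pending[a] for a in self.possible_agents])
            obs, rewards, dones, infos = self._vec_env.step(actions)
            self._last_obs = obs
            self.rewards = {a: rewards[i] for i, a in enumerate(self.possible_agents)}
            self.dones = {a: bool(dones[i]) for i, a in enumerate(self.possible_agents)}
            self.infos = {a: infos[i] for i, a in enumerate(self.possible_agents)}
        self.agent_selection = self._agent_selector.next()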

@kachayev
Contributor

kachayev commented Feb 9, 2022

@BolunDai0216 Absolutely!

@vwxyzjn If my memory doesn't fail me, chess is also implemented as an AEC env, so the API would look the same. I meant the implementation would be easier with a static number of agents.
