[Fix] Remove PettingZoo pursuit task
Signed-off-by: Matteo Bettini <[email protected]>
matteobettini committed Oct 8, 2023
1 parent 1eaf349 commit 8bec460
Showing 7 changed files with 26 additions and 79 deletions.
42 changes: 21 additions & 21 deletions README.md
@@ -189,35 +189,35 @@ See the [run](#run) section for more information.
**Algorithms**. Algorithms are an ensemble of components (e.g., loss, replay buffer) which
determine the training strategy. Here is a table with the currently implemented algorithms in BenchMARL.

-| Name | On/Off policy | Actor-critic | Full-observability in critic | Action compatibility | Probabilistic actor |
-|----------------------------------------|---------------|--------------|------------------------------|-------------------------------|---------------------|
-| [MAPPO](https://arxiv.org/abs/2103.01955) | On | Yes | Yes | Continuous + Discrete | Yes |
-| [IPPO](https://arxiv.org/abs/2011.09533) | On | Yes | No | Continuous + Discrete | Yes |
-| [MADDPG](https://arxiv.org/abs/1706.02275) | Off | Yes | Yes | Continuous | No |
-| [IDDPG](benchmarl/algorithms/iddpg.py) | Off | Yes | No | Continuous | No |
-| [MASAC](benchmarl/algorithms/masac.py) | Off | Yes | Yes | Continuous + Discrete | Yes |
-| [ISAC](benchmarl/algorithms/isac.py) | Off | Yes | No | Continuous + Discrete | Yes |
-| [QMIX](https://arxiv.org/abs/1803.11485) | Off | No | NA | Discrete | No |
-| [VDN](https://arxiv.org/abs/1706.05296) | Off | No | NA | Discrete | No |
-| [IQL](https://www.semanticscholar.org/paper/Multi-Agent-Reinforcement-Learning%3A-Independent-Tan/59de874c1e547399b695337bcff23070664fa66e) | Off | No | NA | Discrete | No |
+| Name | On/Off policy | Actor-critic | Full-observability in critic | Action compatibility | Probabilistic actor |
+|---------------------------------------------------------------------------------------------------------------------------------------------|---------------|--------------|------------------------------|-----------------------|---------------------|
+| [MAPPO](https://arxiv.org/abs/2103.01955) | On | Yes | Yes | Continuous + Discrete | Yes |
+| [IPPO](https://arxiv.org/abs/2011.09533) | On | Yes | No | Continuous + Discrete | Yes |
+| [MADDPG](https://arxiv.org/abs/1706.02275) | Off | Yes | Yes | Continuous | No |
+| [IDDPG](benchmarl/algorithms/iddpg.py) | Off | Yes | No | Continuous | No |
+| [MASAC](benchmarl/algorithms/masac.py) | Off | Yes | Yes | Continuous + Discrete | Yes |
+| [ISAC](benchmarl/algorithms/isac.py) | Off | Yes | No | Continuous + Discrete | Yes |
+| [QMIX](https://arxiv.org/abs/1803.11485) | Off | No | NA | Discrete | No |
+| [VDN](https://arxiv.org/abs/1706.05296) | Off | No | NA | Discrete | No |
+| [IQL](https://www.semanticscholar.org/paper/Multi-Agent-Reinforcement-Learning%3A-Independent-Tan/59de874c1e547399b695337bcff23070664fa66e) | Off | No | NA | Discrete | No |
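
As context for the table above (editorial note, not part of this diff): a minimal sketch of how one of these algorithms is typically selected and run in BenchMARL, assuming the `*Config.get_from_yaml()` loaders and the `Experiment` API described elsewhere in the README.

```python
from benchmarl.algorithms import MappoConfig
from benchmarl.environments import VmasTask
from benchmarl.experiment import Experiment, ExperimentConfig
from benchmarl.models.mlp import MlpConfig

# Each component is loaded from its YAML defaults and can be overridden in code.
experiment = Experiment(
    task=VmasTask.BALANCE.get_from_yaml(),
    algorithm_config=MappoConfig.get_from_yaml(),  # any algorithm from the table above
    model_config=MlpConfig.get_from_yaml(),
    critic_model_config=MlpConfig.get_from_yaml(),
    seed=0,
    config=ExperimentConfig.get_from_yaml(),
)
experiment.run()
```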


**Tasks**. Tasks are scenarios from a specific environment which constitute the MARL
challenge to solve.
They differ in many respects; here is a table of the environments currently in BenchMARL.

-| Enviromnent | Tasks | Cooperation | Global state | Reward function |
-|-------------|-------------------------------------|---------------------------|--------------|-------------------------------|
-| [VMAS](https://github.com/proroklab/VectorizedMultiAgentSimulator) | [5](benchmarl/conf/task/vmas) | Cooperative + Competitive | No | Shared + Independent + Global |
-| [SMACv2](https://github.com/oxwhirl/smacv2) | [15](benchmarl/conf/task/smacv2) | Cooperative | Yes | Global |
-| [MPE](https://github.com/openai/multiagent-particle-envs) | [8](benchmarl/conf/task/pettingzoo) | Cooperative + Competitive | Yes | Shared + Independent |
-| [SISL](https://github.com/sisl/MADRL) | [3](benchmarl/conf/task/pettingzoo) | Cooperative | No | Shared |
+| Environment | Tasks | Cooperation | Global state | Reward function | Action space |
+|--------------------------------------------------------------------|-------------------------------------|---------------------------|--------------|-------------------------------|-----------------------|
+| [VMAS](https://github.com/proroklab/VectorizedMultiAgentSimulator) | [5](benchmarl/conf/task/vmas) | Cooperative + Competitive | No | Shared + Independent + Global | Continuous + Discrete |
+| [SMACv2](https://github.com/oxwhirl/smacv2) | [15](benchmarl/conf/task/smacv2) | Cooperative | Yes | Global | Discrete |
+| [MPE](https://github.com/openai/multiagent-particle-envs) | [8](benchmarl/conf/task/pettingzoo) | Cooperative + Competitive | Yes | Shared + Independent | Continuous + Discrete |
+| [SISL](https://github.com/sisl/MADRL) | [2](benchmarl/conf/task/pettingzoo) | Cooperative | No | Shared | Continuous |
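
Again as an editorial illustration (not part of the diff), a sketch of loading one of the remaining PettingZoo tasks after this commit, assuming the task enums are exposed in `benchmarl.environments` and follow the same `get_from_yaml()` pattern:

```python
from benchmarl.environments import PettingZooTask

# PURSUIT is no longer a member of PettingZooTask after this commit;
# the remaining SISL tasks are MULTIWALKER and WATERWORLD.
task = PettingZooTask.MULTIWALKER.get_from_yaml()

# Hyperparameters come from benchmarl/conf/task/pettingzoo/multiwalker.yaml;
# for PettingZoo tasks the episode length is read from the "max_cycles" entry.
print(task.config["max_cycles"])
```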

> [!NOTE]
> BenchMARL uses the [TorchRL MARL API](https://github.com/pytorch/rl/issues/1463) for grouping agents.
> In competitive environments like MPE, for example, teams will be in different groups. Each group has its own loss,
> models, buffers, and so on. Parameter sharing options refer to sharing within the group. See the example on [creating
-> a custom algorithm](examples/extending/custom_algorithm.py) for more info.
+> a custom algorithm](examples/extending/algorithm/custom_algorithm.py) for more info.
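
To make the grouping idea concrete, here is a purely illustrative group map of the kind returned by the `group_map()` methods further down in this diff; the group and agent names are hypothetical:

```python
from typing import Dict, List

# In a competitive MPE scenario, opposing teams land in separate groups;
# each group then gets its own loss, models, and replay buffer.
group_map: Dict[str, List[str]] = {
    "agents": ["agent_0", "agent_1", "agent_2"],
    "adversaries": ["adversary_0"],
}
```
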
**Models**. Models are neural networks used to process data. They can be used as actors (policies) or,
when possible, as critics. We provide a set of base models (layers) and a SequenceModel to concatenate
@@ -257,11 +257,11 @@ For this reason we expose standard interfaces for [algorithms](benchmarl/algorit
To introduce your solution in the library, you just need to implement the abstract methods
exposed by these base classes which use objects from the [TorchRL](https://github.com/pytorch/rl) library.

-Here is an example on how you can create a custom algorithm [![Example](https://img.shields.io/badge/Example-blue.svg)](examples/extending/custom_algorithm.py).
+Here is an example on how you can create a custom algorithm [![Example](https://img.shields.io/badge/Example-blue.svg)](examples/extending/algorithm).

-Here is an example on how you can create a custom task [![Example](https://img.shields.io/badge/Example-blue.svg)](examples/extending/custom_task.py).
+Here is an example on how you can create a custom task [![Example](https://img.shields.io/badge/Example-blue.svg)](examples/extending/task).

-Here is an example on how you can create a custom model [![Example](https://img.shields.io/badge/Example-blue.svg)](examples/extending/custom_model.py).
+Here is an example on how you can create a custom model [![Example](https://img.shields.io/badge/Example-blue.svg)](examples/extending/model).
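
For readers who cannot open the linked examples, below is a rough, deliberately incomplete sketch of what a custom task wrapper can look like, modeled on the `PettingZooTask` methods touched further down in this diff. `MyEnvTask`, `MY_SCENARIO`, and the import path of the `Task` base class are assumptions, not the actual example code.

```python
from typing import Dict, List

from torchrl.envs import EnvBase

from benchmarl.environments.common import Task  # assumed location of the Task base class


class MyEnvTask(Task):
    # Each member maps to a YAML file, e.g. benchmarl/conf/task/myenv/my_scenario.yaml
    MY_SCENARIO = None

    def supports_continuous_actions(self) -> bool:
        return True

    def supports_discrete_actions(self) -> bool:
        return False

    def has_render(self, env: EnvBase) -> bool:
        return True

    def max_steps(self, env: EnvBase) -> int:
        # Same bool -> int return-annotation fix that this commit applies to the built-in wrappers.
        return self.config["max_steps"]

    def group_map(self, env: EnvBase) -> Dict[str, List[str]]:
        # Hypothetical single-group mapping; real wrappers build this from the env.
        return {"agents": ["agent_0", "agent_1"]}
```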


## Configuring
32 changes: 0 additions & 32 deletions benchmarl/conf/task/pettingzoo/pursuit.yaml

This file was deleted.

2 changes: 0 additions & 2 deletions benchmarl/environments/__init__.py
@@ -12,7 +12,6 @@


from .pettingzoo.multiwalker import TaskConfig as MultiwalkerConfig
-from .pettingzoo.pursuit import TaskConfig as PursuitConfig
from .pettingzoo.simple_adverasary import TaskConfig as SimpleAdversaryConfig
from .pettingzoo.simple_crypto import TaskConfig as SimpleCryptoConfig
from .pettingzoo.simple_push import TaskConfig as SimplePushConfig
@@ -37,7 +36,6 @@
    "vmas_transport_config": TransportConfig,
    "vmas_wheel_config": WheelConfig,
    "pettingzoo_multiwalker_config": MultiwalkerConfig,
-    "pettingzoo_pursuit_config": PursuitConfig,
    "pettingzoo_waterworld_config": WaterworldConfig,
    "pettingzoo_simple_adversary_config": SimpleAdversaryConfig,
    "pettingzoo_simple_crypto_config": SimpleCryptoConfig,
4 changes: 3 additions & 1 deletion benchmarl/environments/pettingzoo/common.py
@@ -10,6 +10,7 @@

class PettingZooTask(Task):
    MULTIWALKER = None
+    WATERWORLD = None
    SIMPLE_ADVERSARY = None
    SIMPLE_CRYPTO = None
    SIMPLE_PUSH = None
@@ -42,6 +43,7 @@ def get_env_fun(
    def supports_continuous_actions(self) -> bool:
        if self in {
            PettingZooTask.MULTIWALKER,
+            PettingZooTask.WATERWORLD,
            PettingZooTask.SIMPLE_TAG,
            PettingZooTask.SIMPLE_ADVERSARY,
            PettingZooTask.SIMPLE_CRYPTO,
@@ -88,7 +90,7 @@ def has_state(self) -> bool:
    def has_render(self, env: EnvBase) -> bool:
        return True

-    def max_steps(self, env: EnvBase) -> bool:
+    def max_steps(self, env: EnvBase) -> int:
        return self.config["max_cycles"]

    def group_map(self, env: EnvBase) -> Dict[str, List[str]]:
21 changes: 0 additions & 21 deletions benchmarl/environments/pettingzoo/pursuit.py

This file was deleted.

2 changes: 1 addition & 1 deletion benchmarl/environments/smacv2/common.py
@@ -47,7 +47,7 @@ def supports_discrete_actions(self) -> bool:
    def has_render(self, env: EnvBase) -> bool:
        return True

-    def max_steps(self, env: EnvBase) -> bool:
+    def max_steps(self, env: EnvBase) -> int:
        return env.episode_limit

    def group_map(self, env: EnvBase) -> Dict[str, List[str]]:
2 changes: 1 addition & 1 deletion benchmarl/environments/vmas/common.py
@@ -41,7 +41,7 @@ def supports_discrete_actions(self) -> bool:
    def has_render(self, env: EnvBase) -> bool:
        return True

-    def max_steps(self, env: EnvBase) -> bool:
+    def max_steps(self, env: EnvBase) -> int:
        return self.config["max_steps"]

    def group_map(self, env: EnvBase) -> Dict[str, List[str]]:
