* initial commit
* black
* unused imports
* fix this
* don't need this
* fix
* put import back
* import
* fix
* fix
* simplify
* name fix
* simplify
* fix
* introduce SinglesEnv
* fix tests
* unused import
* fix integration tests
* fix examples
* simplify
* update docs
* rename files
* format
* polish
* unused import
* condense code
* polish
* bugfix
* add strictness
* fix test
* format and parameterize strict
* add parameter
* fix
* format
* invalid causes default
* fix test
* fix test
* experiment
* debug
* debug
* fix bug
* cleanup
* put tests back to normal
* avoid returning DefaultBattleOrder if at all possible during conversions
* add "fake" parameter which will allow conversions to be fabricated if they are invalid, if at all possible
* add fake as parameter to env
* bugfix
* fix test
* bugfix
* add docstring
* up the timeout
* separate tests
* remove init
* new .rst docs
docs/source/examples/rl_with_gymnasium_wrapper.rst (+5 −7)
@@ -5,7 +5,7 @@ Reinforcement learning with the Gymnasium wrapper
 
 The corresponding complete source code can be found `here <https://github.com/hsahovic/poke-env/blob/master/examples/rl_with_new_gymnasium_wrapper.py>`__.
 
-The goal of this example is to demonstrate how to use the `farama gymnasium <https://gymnasium.farama.org/>`__ interface proposed by ``EnvPlayer``, and to train a simple deep reinforcement learning agent comparable in performance to the ``MaxDamagePlayer`` we created in :ref:`max_damage_player`.
+The goal of this example is to demonstrate how to use the `farama gymnasium <https://gymnasium.farama.org/>`__ interface proposed by ``PokeEnv``, and to train a simple deep reinforcement learning agent comparable in performance to the ``MaxDamagePlayer`` we created in :ref:`max_damage_player`.
 
 .. note:: This example necessitates `keras-rl <https://github.com/keras-rl/keras-rl>`__ (compatible with Tensorflow 1.X) or `keras-rl2 <https://github.com/wau/keras-rl2>`__ (Tensorflow 2.X), which implement numerous reinforcement learning algorithms and offer a simple API fully compatible with the Gymnasium API. You can install them by running ``pip install keras-rl`` or ``pip install keras-rl2``. If you are unsure, ``pip install keras-rl2`` is recommended.
 
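For orientation, the interface referred to above is the standard Gymnasium loop. The sketch below shows how any ``PokeEnv``-style environment would be driven once constructed; ``env`` is a placeholder, and its construction is covered further down in this file.

.. code-block:: python

    # Standard Gymnasium interaction loop; ``env`` is assumed to be an already
    # constructed PokeEnv-style environment following the gymnasium.Env API.
    obs, info = env.reset()
    terminated = truncated = False
    while not (terminated or truncated):
        action = env.action_space.sample()  # random policy, for illustration only
        obs, reward, terminated, truncated, info = env.step(action)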
@@ -33,7 +33,7 @@ for each component of the embedding vector and return them as a ``gymnasium.Space``
 Defining rewards
 ^^^^^^^^^^^^^^^^
 
-Rewards are signals that the agent will use in its optimization process (a common objective is optimizing a discounted total reward). ``EnvPlayer`` objects provide a helper method, ``reward_computing_helper``, that can help defining simple symmetric rewards that take into account fainted pokemons, remaining hp, status conditions and victory.
+Rewards are signals that the agent will use in its optimization process (a common objective is optimizing a discounted total reward). ``PokeEnv`` objects provide a helper method, ``reward_computing_helper``, that can help defining simple symmetric rewards that take into account fainted pokemons, remaining hp, status conditions and victory.
 
 We will use this method to define the following reward:
 
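A minimal sketch of the kind of reward this helper supports, assuming the ``calc_reward`` hook and the ``reward_computing_helper`` keyword arguments from the previous ``EnvPlayer``-based API carry over; the base class name ``SinglesEnv`` comes from this PR's commits, but its import path is an assumption.

.. code-block:: python

    # Sketch only: base class import path and hook signature are assumptions
    # based on the earlier EnvPlayer-based API.
    from poke_env.environment import SinglesEnv  # assumed import location


    class SimpleRLEnv(SinglesEnv):
        def calc_reward(self, last_battle, current_battle) -> float:
            # Symmetric reward: fainted pokemon, remaining HP and victory all
            # contribute, scaled by the values below.
            return self.reward_computing_helper(
                current_battle, fainted_value=2.0, hp_value=1.0, victory_value=30.0
            )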
@@ -135,8 +135,6 @@ Instantiating train environment and evaluation environment
 
 Normally, to ensure isolation between training and testing, two different environments are created.
 
-The base class ``EnvPlayer`` allows you to choose the opponent either when you instantiate it or replace it during training
-with the ``set_opponent`` method.
 If you don't want the player to start challenging the opponent you can set ``start_challenging=False`` when creating it.
 In this case, we want them to start challenging right away:
 
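To illustrate the ``start_challenging`` flag, a sketch of instantiating separate train and eval environments; the class name reuses the hypothetical ``SimpleRLEnv`` from the reward sketch above, and every constructor argument other than ``start_challenging`` is an assumption.

.. code-block:: python

    # Sketch only: everything except start_challenging is an assumption.
    train_env = SimpleRLEnv(battle_format="gen8randombattle", start_challenging=True)
    eval_env = SimpleRLEnv(battle_format="gen8randombattle", start_challenging=True)

    # Pass start_challenging=False instead if the environments should not begin
    # challenging their opponents immediately.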
@@ -270,7 +268,7 @@ This can be done with the following code:
 )
 ...
 
-The ``reset_env`` method of the ``EnvPlayer`` class allows you to reset the environment
+The ``reset_env`` method of the ``PokeEnv`` class allows you to reset the environment
 to a clean state, including internal counters for victories, battles, etc.
 
 It takes two optional parameters:
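For reference, a sketch of the reset described here. Only the ``restart`` parameter is shown, since it is the one this file later relies on; the second optional parameter is left out rather than guessed.

.. code-block:: python

    # Reset the environment to a clean state (victory/battle counters, etc.).
    # restart=False also leaves the challenge loop stopped, which the warnings
    # below require before background evaluation.
    eval_env.reset_env(restart=False)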
@@ -301,7 +299,7 @@ In order to evaluate the player with the provided method, we need to use a background
 
 The ``result`` method of the ``Future`` object will block until the task is done and will return the result.
 
-.. warning:: ``background_evaluate_player`` requires the challenge loop to be stopped. To ensure this use method ``reset_env(restart=False)`` of ``EnvPlayer``.
+.. warning:: ``background_evaluate_player`` requires the challenge loop to be stopped. To ensure this use method ``reset_env(restart=False)`` of ``PokeEnv``.
 
 .. warning:: If you call ``result`` before the task is finished, the main thread will be blocked. Only do that if the agent is operating on a different thread than the one asking for the result.
 
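A sketch of the background-evaluation pattern the warning refers to. The import path, the ``.agent`` attribute, and the call arguments are assumptions; the Future/``result`` behaviour and the ``reset_env(restart=False)`` precondition come from the text above.

.. code-block:: python

    # Sketch only: import location, arguments and the .agent attribute of
    # background_evaluate_player are assumptions; the Future-based flow is
    # what the documentation describes.
    from poke_env.player import background_evaluate_player  # assumed location

    eval_env.reset_env(restart=False)                  # challenge loop must be stopped
    task = background_evaluate_player(eval_env.agent)  # returns a concurrent Future
    # ... run the trained agent against the queued battles here ...
    print(task.result())                               # blocks until evaluation is done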
@@ -337,7 +335,7 @@ To use the ``cross_evaluate`` method, the strategy is the same to the one used for
 print(tabulate(table))
 ...
 
-.. warning:: ``background_cross_evaluate`` requires the challenge loop to be stopped. To ensure this use method ``reset_env(restart=False)`` of ``EnvPlayer``.
+.. warning:: ``background_cross_evaluate`` requires the challenge loop to be stopped. To ensure this use method ``reset_env(restart=False)`` of ``PokeEnv``.
 
 .. warning:: If you call ``result`` before the task is finished, the main thread will be blocked. Only do that if the agent is operating on a different thread than the one asking for the result.
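The cross-evaluation variant follows the same Future pattern. A sketch, with the import path, arguments, and result shape all being assumptions; only the helper name, ``reset_env(restart=False)`` and the ``tabulate`` rendering appear in the diff above.

.. code-block:: python

    # Sketch only: import location, arguments and the exact result structure
    # are assumptions; 'players' is a hypothetical list of participants.
    from tabulate import tabulate
    from poke_env.player import background_cross_evaluate  # assumed location

    eval_env.reset_env(restart=False)          # challenge loop must be stopped first
    task = background_cross_evaluate(players)  # returns a concurrent Future
    # ... let the agent play its queued battles here ...
    cross_results = task.result()              # blocks until all battles are finished

    # Render pairwise results as a table, as the example does.
    table = [["-"] + [p.username for p in players]]
    for p_1, results in cross_results.items():
        table.append([p_1] + [str(results[p_2]) for p_2 in results])
    print(tabulate(table))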