* initial commit
* black
* unused imports
* fix this
* don't need this
* fix
* put import back
* import
* fix
* fix
* simplify
* name fix
* simplify
* fix
* introduce SinglesEnv
* fix tests
* unused import
* fix integration tests
* fix examples
* simplify
* update docs
* rename files
* format
* polish
* unused import
* condense code
* polish
* bugfix
* add strictness
* fix test
* format and parameterize strict
* add parameter
* fix
* format
* invalid causes default
* fix test
* fix test
* experiment
* debug
* debug
* fix bug
* cleanup
* put tests back to normal
* avoid returning DefaultBattleOrder if at all possible during conversions
* add "fake" parameter which will allow conversions to be fabricated if they are invalid, if at all possible
* add fake as parameter to env
* bugfix
* fix test
* bugfix
* add docstring
* up the timeout
* separate tests
* remove init
* new .rst docs
docs/source/examples/rl_with_gymnasium_wrapper.rst (+5 −7)
@@ -5,7 +5,7 @@ Reinforcement learning with the Gymnasium wrapper
 
 The corresponding complete source code can be found `here <https://github.com/hsahovic/poke-env/blob/master/examples/rl_with_new_gymnasium_wrapper.py>`__.
 
-The goal of this example is to demonstrate how to use the `farama gymnasium <https://gymnasium.farama.org/>`__ interface proposed by ``EnvPlayer``, and to train a simple deep reinforcement learning agent comparable in performance to the ``MaxDamagePlayer`` we created in :ref:`max_damage_player`.
+The goal of this example is to demonstrate how to use the `farama gymnasium <https://gymnasium.farama.org/>`__ interface proposed by ``PokeEnv``, and to train a simple deep reinforcement learning agent comparable in performance to the ``MaxDamagePlayer`` we created in :ref:`max_damage_player`.
 
 .. note:: This example necessitates `keras-rl <https://github.com/keras-rl/keras-rl>`__ (compatible with Tensorflow 1.X) or `keras-rl2 <https://github.com/wau/keras-rl2>`__ (Tensorflow 2.X), which implement numerous reinforcement learning algorithms and offer a simple API fully compatible with the Gymnasium API. You can install them by running ``pip install keras-rl`` or ``pip install keras-rl2``. If you are unsure, ``pip install keras-rl2`` is recommended.
 
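For orientation, the interface referred to above is the standard Gymnasium loop. The sketch below shows how any ``PokeEnv``-style environment would be driven once constructed; ``env`` is a placeholder, and its construction is covered further down in this file.

.. code-block:: python

    # Standard Gymnasium interaction loop; ``env`` is assumed to be an already
    # constructed PokeEnv-style environment following the gymnasium.Env API.
    obs, info = env.reset()
    terminated = truncated = False
    while not (terminated or truncated):
        action = env.action_space.sample()  # random policy, for illustration only
        obs, reward, terminated, truncated, info = env.step(action)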
@@ -33,7 +33,7 @@ for each component of the embedding vector and return them as a ``gymnasium.Space``
 Defining rewards
 ^^^^^^^^^^^^^^^^
 
-Rewards are signals that the agent will use in its optimization process (a common objective is optimizing a discounted total reward). ``EnvPlayer`` objects provide a helper method, ``reward_computing_helper``, that can help defining simple symmetric rewards that take into account fainted pokemons, remaining hp, status conditions and victory.
+Rewards are signals that the agent will use in its optimization process (a common objective is optimizing a discounted total reward). ``PokeEnv`` objects provide a helper method, ``reward_computing_helper``, that can help defining simple symmetric rewards that take into account fainted pokemons, remaining hp, status conditions and victory.
 
 We will use this method to define the following reward:
 
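A minimal sketch of the kind of reward this helper supports, assuming the ``calc_reward`` hook and the ``reward_computing_helper`` keyword arguments from the previous ``EnvPlayer``-based API carry over; the base class name ``SinglesEnv`` comes from this PR's commits, but its import path is an assumption.

.. code-block:: python

    # Sketch only: base class import path and hook signature are assumptions
    # based on the earlier EnvPlayer-based API.
    from poke_env.environment import SinglesEnv  # assumed import location


    class SimpleRLEnv(SinglesEnv):
        def calc_reward(self, last_battle, current_battle) -> float:
            # Symmetric reward: fainted pokemon, remaining HP and victory all
            # contribute, scaled by the values below.
            return self.reward_computing_helper(
                current_battle, fainted_value=2.0, hp_value=1.0, victory_value=30.0
            )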
@@ -135,8 +135,6 @@ Instantiating train environment and evaluation environment
 
 Normally, to ensure isolation between training and testing, two different environments are created.
 
-The base class ``EnvPlayer`` allows you to choose the opponent either when you instantiate it or replace it during training
-with the ``set_opponent`` method.
 If you don't want the player to start challenging the opponent you can set ``start_challenging=False`` when creating it.
 In this case, we want them to start challenging right away:
 
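To illustrate the ``start_challenging`` flag, a sketch of instantiating separate train and eval environments; the class name reuses the hypothetical ``SimpleRLEnv`` from the reward sketch above, and every constructor argument other than ``start_challenging`` is an assumption.

.. code-block:: python

    # Sketch only: everything except start_challenging is an assumption.
    train_env = SimpleRLEnv(battle_format="gen8randombattle", start_challenging=True)
    eval_env = SimpleRLEnv(battle_format="gen8randombattle", start_challenging=True)

    # Pass start_challenging=False instead if the environments should not begin
    # challenging their opponents immediately.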
@@ -270,7 +268,7 @@ This can be done with the following code:
 )
 ...
 
-The ``reset_env`` method of the ``EnvPlayer`` class allows you to reset the environment
+The ``reset_env`` method of the ``PokeEnv`` class allows you to reset the environment
 to a clean state, including internal counters for victories, battles, etc.
 
 It takes two optional parameters:
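For reference, a sketch of the reset described here. Only the ``restart`` parameter is shown, since it is the one this file later relies on; the second optional parameter is left out rather than guessed.

.. code-block:: python

    # Reset the environment to a clean state (victory/battle counters, etc.).
    # restart=False also leaves the challenge loop stopped, which the warnings
    # below require before background evaluation.
    eval_env.reset_env(restart=False)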
@@ -301,7 +299,7 @@ In order to evaluate the player with the provided method, we need to use a background
 
 The ``result`` method of the ``Future`` object will block until the task is done and will return the result.
 
-.. warning:: ``background_evaluate_player`` requires the challenge loop to be stopped. To ensure this use method ``reset_env(restart=False)`` of ``EnvPlayer``.
+.. warning:: ``background_evaluate_player`` requires the challenge loop to be stopped. To ensure this use method ``reset_env(restart=False)`` of ``PokeEnv``.
 
 .. warning:: If you call ``result`` before the task is finished, the main thread will be blocked. Only do that if the agent is operating on a different thread than the one asking for the result.
 
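A sketch of the background-evaluation pattern the warning refers to. The import path, the ``.agent`` attribute, and the call arguments are assumptions; the Future/``result`` behaviour and the ``reset_env(restart=False)`` precondition come from the text above.

.. code-block:: python

    # Sketch only: import location, arguments and the .agent attribute of
    # background_evaluate_player are assumptions; the Future-based flow is
    # what the documentation describes.
    from poke_env.player import background_evaluate_player  # assumed location

    eval_env.reset_env(restart=False)                  # challenge loop must be stopped
    task = background_evaluate_player(eval_env.agent)  # returns a concurrent Future
    # ... run the trained agent against the queued battles here ...
    print(task.result())                               # blocks until evaluation is done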
@@ -337,7 +335,7 @@ To use the ``cross_evaluate`` method, the strategy is the same to the one used for
 print(tabulate(table))
 ...
 
-.. warning:: ``background_cross_evaluate`` requires the challenge loop to be stopped. To ensure this use method ``reset_env(restart=False)`` of ``EnvPlayer``.
+.. warning:: ``background_cross_evaluate`` requires the challenge loop to be stopped. To ensure this use method ``reset_env(restart=False)`` of ``PokeEnv``.
 
 .. warning:: If you call ``result`` before the task is finished, the main thread will be blocked. Only do that if the agent is operating on a different thread than the one asking for the result.
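The cross-evaluation variant follows the same Future pattern. A sketch, with the import path, arguments, and result shape all being assumptions; only the helper name, ``reset_env(restart=False)`` and the ``tabulate`` rendering appear in the diff above.

.. code-block:: python

    # Sketch only: import location, arguments and the exact result structure
    # are assumptions; 'players' is a hypothetical list of participants.
    from tabulate import tabulate
    from poke_env.player import background_cross_evaluate  # assumed location

    eval_env.reset_env(restart=False)          # challenge loop must be stopped first
    task = background_cross_evaluate(players)  # returns a concurrent Future
    # ... let the agent play its queued battles here ...
    cross_results = task.result()              # blocks until all battles are finished

    # Render pairwise results as a table, as the example does.
    table = [["-"] + [p.username for p in players]]
    for p_1, results in cross_results.items():
        table.append([p_1] + [str(results[p_2]) for p_2 in results])
    print(tabulate(table))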