
Commit

Merge branch 'develop' into toni/isaaclab_examples
Toni-SM committed Jan 17, 2025
2 parents 7eb8b16 + d57c8ea commit 5a08bc4
Showing 289 changed files with 16,213 additions and 7,591 deletions.
2 changes: 2 additions & 0 deletions .github/ISSUE_TEMPLATE/bug_report.yaml
@@ -30,6 +30,8 @@ body:
description: The skrl version can be obtained with the command `pip show skrl`.
options:
- ---
- 1.4.0
- 1.3.0
- 1.2.0
- 1.1.0
- 1.0.0
8 changes: 4 additions & 4 deletions .github/workflows/python-publish-manual.yml
@@ -15,7 +15,7 @@ jobs:

pypi:
name: Publish package to PyPI
runs-on: ubuntu-latest
runs-on: ubuntu-22.04
if: ${{ github.event.inputs.job == 'pypi'}}

steps:
@@ -24,7 +24,7 @@ jobs:
- name: Set up Python
uses: actions/setup-python@v3
with:
python-version: '3.7'
python-version: '3.10.16'

- name: Install dependencies
run: |
@@ -43,7 +43,7 @@

test-pypi:
name: Publish package to TestPyPI
runs-on: ubuntu-latest
runs-on: ubuntu-22.04
if: ${{ github.event.inputs.job == 'test-pypi'}}

steps:
@@ -52,7 +52,7 @@ jobs:
- name: Set up Python
uses: actions/setup-python@v3
with:
python-version: '3.7'
python-version: '3.10.16'

- name: Install dependencies
run: |
48 changes: 37 additions & 11 deletions .pre-commit-config.yaml
@@ -1,15 +1,41 @@
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.4.0
hooks:
- id: check-ast
- id: check-case-conflict
- id: check-docstring-first
- id: check-merge-conflict
- id: check-yaml
- id: end-of-file-fixer
- id: trailing-whitespace
rev: v4.6.0
hooks:
- id: check-ast
- id: check-case-conflict
- id: check-docstring-first
- id: check-json
- id: check-merge-conflict
- id: check-toml
- id: check-yaml
- id: debug-statements
- id: detect-private-key
- id: end-of-file-fixer
- id: name-tests-test
args: ["--pytest-test-first"]
exclude: ^(tests/strategies.py|tests/utils.py)
- id: trailing-whitespace
- repo: https://github.com/codespell-project/codespell
rev: v2.3.0
hooks:
- id: codespell
exclude: ^(docs/source/_static|docs/_build|pyproject.toml)
additional_dependencies:
- tomli
- repo: https://github.com/python/black
rev: 24.8.0
hooks:
- id: black
args: ["--line-length=120"]
exclude: ^(docs/)
- repo: https://github.com/pycqa/isort
rev: 5.12.0
rev: 5.13.2
hooks:
- id: isort
- repo: https://github.com/pre-commit/pygrep-hooks
rev: v1.10.0
hooks:
- id: isort
- id: rst-backticks
- id: rst-directive-colons
- id: rst-inline-touching-normal
58 changes: 51 additions & 7 deletions CHANGELOG.md
@@ -2,7 +2,48 @@

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).

## [1.3.0] - Unreleased
## [1.4.0] - 2025-01-16
### Added
- Utilities to operate on Gymnasium spaces (`Box`, `Discrete`, `MultiDiscrete`, `Tuple` and `Dict`) (see the sketch after this list)
- `parse_device` static method in ML framework configuration (used in library components to set up the device)
- Model instantiator support for different shared model structures in PyTorch
- Support for automatic mixed precision training in PyTorch
- `init_state_dict` method to initialize model's lazy modules in PyTorch
- Model instantiators `fixed_log_std` parameter to define immutable log standard deviations
- Define the `stochastic_evaluation` trainer config to allow using the actions returned by the agent's model
as-is (stochastic actions) during evaluation, instead of deterministic actions (mean actions in Gaussian-based models).
Returning deterministic actions is the default behavior.
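
For illustration, a minimal sketch, using plain `gymnasium` and `torch` rather than the new skrl utilities themselves, of the kind of operation the space utilities cover: tensorizing and flattening a sample from a composite `Dict` space (the space layout below is made up for the example):

```python
import gymnasium as gym
import numpy as np
import torch

# a composite observation space mixing fundamental and composite spaces
space = gym.spaces.Dict({
    "camera": gym.spaces.Box(low=0, high=255, shape=(8, 8, 3), dtype=np.uint8),
    "joints": gym.spaces.Box(low=-1.0, high=1.0, shape=(7,), dtype=np.float32),
    "mode": gym.spaces.Discrete(3),
})

sample = space.sample()

# tensorize each leaf and flatten everything into a single 1D tensor,
# which is what a flattened-storage memory keeps per environment step
flat = torch.cat([
    torch.as_tensor(sample["camera"], dtype=torch.float32).flatten(),
    torch.as_tensor(sample["joints"], dtype=torch.float32).flatten(),
    torch.as_tensor(np.asarray(sample["mode"]), dtype=torch.float32).flatten(),
])
assert flat.numel() == 8 * 8 * 3 + 7 + 1
```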

### Changed
- Call agent's `pre_interaction` method during evaluation
- Use spaces utilities to process states, observations and actions for all the library components
- Update model instantiators definitions to process supported fundamental and composite Gymnasium spaces
- Make flattened tensor storage in memory the default option (reverting the change introduced in version 1.3.0)
- Drop support for PyTorch versions prior to 1.10 (the previous supported version was 1.9)
- Update KL Adaptive learning rate scheduler implementation to match Optax's behavior in JAX
- Update AMP agent to use the environment's terminated and truncated data, and the KL Adaptive learning rate scheduler
- Update runner implementations to support definition of arbitrary agents and their models
- Speed up PyTorch implementation:
- Disable argument checking when instantiating distributions
- Replace PyTorch's `BatchSampler` with Python slices when sampling data from memory
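
As a rough sketch of the last point, compare driving indices through `BatchSampler` with plain Python slicing over the same contiguous storage (generic PyTorch, not skrl's memory implementation):

```python
import torch
from torch.utils.data import BatchSampler, SequentialSampler

storage = torch.randn(4096, 64)   # e.g. flattened transitions kept in memory
batch_size = 512

# BatchSampler yields lists of indices, forcing an index-based gather per mini-batch
for indices in BatchSampler(SequentialSampler(range(len(storage))), batch_size, drop_last=False):
    batch = storage[indices]

# plain Python slices view the same contiguous storage directly (no gather)
for start in range(0, len(storage), batch_size):
    batch = storage[start:start + batch_size]
```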

### Changed (breaking changes: style)
- Format code using Black code formatter (it's ugly, yes, but it does its job)

### Fixed
- Move the batch sampling inside gradient step loop for DQN, DDQN, DDPG (RNN), TD3 (RNN), SAC and SAC (RNN)
- Model state dictionary initialization for composite Gymnasium spaces in JAX
- Add missing `reduction` parameter to Gaussian model instantiator
- Optax's learning rate schedulers integration in JAX implementation
- Isaac Lab wrapper's multi-agent state retrieval with gymnasium 1.0
- Take the truncation signal into account when computing 'done' (environment reset)

### Removed
- Remove OpenAI Gym (`gym`) from dependencies and source code. **skrl** continues to support gym environments;
the package is simply no longer installed with the library and must be installed manually if needed.
Any gym-based environment wrapper must use the `convert_gym_space` space utility to operate
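
For gym-based environments, the conversion performed by `convert_gym_space` amounts to rebuilding the equivalent Gymnasium space. A hand-rolled sketch for the `Box` case only, assuming the legacy `gym` package has been installed separately (illustrative, not the library's code):

```python
import gym        # legacy package, installed manually if needed
import gymnasium
import numpy as np

def box_to_gymnasium(space: "gym.spaces.Box") -> "gymnasium.spaces.Box":
    # rebuild the equivalent Gymnasium Box from the legacy gym Box
    return gymnasium.spaces.Box(low=np.asarray(space.low),
                                high=np.asarray(space.high),
                                dtype=space.dtype)

legacy = gym.spaces.Box(low=-1.0, high=1.0, shape=(3,), dtype=np.float32)
print(box_to_gymnasium(legacy))  # Box(-1.0, 1.0, (3,), float32)
```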

## [1.3.0] - 2024-09-11
### Added
- Distributed multi-GPU and multi-node learning (JAX implementation)
- Utilities to start multiple processes from a single program invocation for distributed learning using JAX
@@ -14,8 +55,10 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
### Changed
- Move the KL reduction from the PyTorch `KLAdaptiveLR` class to each agent that uses it in distributed runs (see the sketch after this list)
- Move the PyTorch distributed initialization from the agent base class to the ML framework configuration
- Implement model instantiators using dynamic execution of Python code
- Upgrade model instantiator implementations to support CNN layers and complex network definitions,
and implement them using dynamic execution of Python code
- Update Isaac Lab environment loader argument parser options to match Isaac Lab version
- Allow storing tensors/arrays with their original dimensions in memory and make it the default option
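
For context on the `KLAdaptiveLR` item above: the adaptive rule shrinks the learning rate when the measured KL divergence overshoots a threshold and grows it when it undershoots. A standalone sketch of that rule, with illustrative placeholder thresholds and factors (not skrl's class):

```python
def kl_adaptive_lr(lr: float, kl: float, kl_threshold: float = 0.008,
                   factor: float = 1.5, lr_min: float = 1e-6, lr_max: float = 1e-2) -> float:
    """Scale the learning rate up/down according to the last measured KL divergence."""
    if kl > kl_threshold * 2.0:      # policy moved too far -> slow down
        return max(lr / factor, lr_min)
    if kl < kl_threshold * 0.5:      # policy barely moved -> speed up
        return min(lr * factor, lr_max)
    return lr

# in distributed runs the KL is averaged (reduced) across processes by the agent
# *before* this rule is applied, which is the reduction mentioned above
lr = 3e-4
for kl in (0.001, 0.020, 0.009):
    lr = kl_adaptive_lr(lr, kl)
    print(f"kl={kl:.3f} -> lr={lr:.6f}")
```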

### Changed (breaking changes)
- Decouple the observation and state spaces in single and multi-agent environment wrappers and add the `state`
@@ -24,9 +67,10 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).

### Fixed
- Catch TensorBoard summary iterator exceptions in `TensorboardFileIterator` postprocessing utils
- Fix automatic wrapper detection for Isaac Gym (previews), DeepMind and vectorized Gymnasium environments
- Fix automatic wrapper detection issue (introduced in previous version) for Isaac Gym (previews),
DeepMind and vectorized Gymnasium environments
- Fix vectorized/parallel environments `reset` method return values when called more than once
- IPPO and MAPPO `act` method return values when JAX-NumPy backend is enabled
- Fix IPPO and MAPPO `act` method return values when JAX-NumPy backend is enabled

## [1.2.0] - 2024-06-23
### Added
@@ -67,7 +111,7 @@ Summary of the most relevant features:
## [1.0.0-rc.2] - 2023-08-11
### Added
- Get truncation from `time_outs` info in Isaac Gym, Isaac Orbit and Omniverse Isaac Gym environments
- Time-limit (truncation) boostrapping in on-policy actor-critic agents
- Time-limit (truncation) bootstrapping in on-policy actor-critic agents
- Model instantiators `initial_log_std` parameter to set the log standard deviation's initial value

### Changed (breaking changes)
@@ -81,7 +125,7 @@ Summary of the most relevant features:
- `from skrl.envs.loaders.jax import load_omniverse_isaacgym_env`

### Changed
- Drop support for versions prior to PyTorch 1.9 (1.8.0 and 1.8.1)
- Drop support for PyTorch versions prior to 1.9 (the previous supported version was 1.8)

## [1.0.0-rc.1] - 2023-07-25
### Added
@@ -174,7 +218,7 @@ to allow storing samples in memories during evaluation
- Parameter `role` to model methods
- Wrapper compatibility with the new OpenAI Gym environment API
- Internal library colored logger
- Migrate checkpoints/models from other RL libraries to skrl models/agents
- Migrate checkpoints/models from other RL libraries to **skrl** models/agents
- Configuration parameter `store_separately` to agent configuration dict
- Save/load agent modules (models, optimizers, preprocessors)
- Set random seed and configure deterministic behavior for reproducibility
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -54,7 +54,7 @@ Read the code a little bit and you will understand it at first glance... Also
```ini
function annotation (e.g. typing)
# insert an empty line
python libraries and other libraries (e.g. gym, numpy, time, etc.)
python libraries and other libraries (e.g. gymnasium, numpy, time, etc.)
# insert an empty line
machine learning framework modules (e.g. torch, torch.nn)
# insert an empty line
2 changes: 0 additions & 2 deletions docs/source/api/agents.rst
@@ -119,7 +119,6 @@ API (PyTorch)
:private-members: _update, _empty_preprocessor, _get_internal_value
:members:

.. automethod:: __init__
.. automethod:: __str__

.. raw:: html
@@ -136,5 +135,4 @@ API (JAX)
:private-members: _update, _empty_preprocessor, _get_internal_value
:members:

.. automethod:: __init__
.. automethod:: __str__
14 changes: 6 additions & 8 deletions docs/source/api/agents/a2c.rst
@@ -25,7 +25,7 @@ Algorithm implementation

| Main notation/symbols:
| - policy function approximator (:math:`\pi_\theta`), value function approximator (:math:`V_\phi`)
| - states (:math:`s`), actions (:math:`a`), rewards (:math:`r`), next states (:math:`s'`), dones (:math:`d`)
| - states (:math:`s`), actions (:math:`a`), rewards (:math:`r`), next states (:math:`s'`), terminated (:math:`d_{_{end}}`), truncated (:math:`d_{_{timeout}}`)
| - values (:math:`V`), advantages (:math:`A`), returns (:math:`R`)
| - log probabilities (:math:`logp`)
| - loss (:math:`L`)
@@ -59,7 +59,7 @@ Learning algorithm
| :literal:`_update(...)`
| :green:`# compute returns and advantages`
| :math:`V_{_{last}}' \leftarrow V_\phi(s')`
| :math:`R, A \leftarrow f_{GAE}(r, d, V, V_{_{last}}')`
| :math:`R, A \leftarrow f_{GAE}(r, d_{_{end}} \lor d_{_{timeout}}, V, V_{_{last}}')`
| :green:`# sample mini-batches from memory`
| [[:math:`s, a, logp, V, R, A`]] :math:`\leftarrow` states, actions, log_prob, values, returns, advantages
| :green:`# mini-batches loop`
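
For reference, a compact PyTorch sketch of the GAE step (:math:`f_{GAE}`) referenced above, using the logical OR of terminated and truncated as the reset signal. This is an illustration of the computation, not skrl's implementation:

```python
import torch

def compute_gae(rewards, dones, values, last_values, discount=0.99, lam=0.95):
    # rewards, dones, values: (timesteps, num_envs); last_values: (num_envs,) = V(s') after the last step
    # dones is terminated OR truncated (d_end v d_timeout)
    num_steps = rewards.shape[0]
    advantages = torch.zeros_like(rewards)
    gae = torch.zeros_like(last_values)
    for t in reversed(range(num_steps)):
        next_values = last_values if t == num_steps - 1 else values[t + 1]
        not_done = 1.0 - dones[t].float()
        delta = rewards[t] + discount * next_values * not_done - values[t]
        gae = delta + discount * lam * not_done * gae
        advantages[t] = gae
    returns = advantages + values                                               # R
    advantages = (advantages - advantages.mean()) / (advantages.std() + 1e-8)   # normalized A
    return returns, advantages

T, N = 16, 4
R, A = compute_gae(torch.randn(T, N), torch.zeros(T, N, dtype=torch.bool),
                   torch.randn(T, N), torch.randn(N))
```
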
@@ -232,6 +232,10 @@ Support for advanced features is described in the next table
- RNN, LSTM, GRU and any other variant
- .. centered:: :math:`\blacksquare`
- .. centered:: :math:`\square`
* - Mixed precision
- Automatic mixed precision
- .. centered:: :math:`\blacksquare`
- .. centered:: :math:`\square`
* - Distributed
- Single Program Multi Data (SPMD) multi-GPU
- .. centered:: :math:`\blacksquare`
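
The "Automatic mixed precision" row added above refers to PyTorch's `torch.autocast` and gradient-scaler machinery. A generic sketch of such an update step (not the agent's actual code):

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(8, 1).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

x, y = torch.randn(32, 8, device=device), torch.randn(32, 1, device=device)

optimizer.zero_grad()
# forward pass and loss under autocast; backward/step through the gradient scaler
with torch.autocast(device_type=device, enabled=(device == "cuda")):
    loss = torch.nn.functional.mse_loss(model(x), y)
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
```
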
@@ -252,16 +256,12 @@ API (PyTorch)
:private-members: _update
:members:

.. automethod:: __init__

.. autoclass:: skrl.agents.torch.a2c.A2C_RNN
:undoc-members:
:show-inheritance:
:private-members: _update
:members:

.. automethod:: __init__

.. raw:: html

<br>
@@ -276,5 +276,3 @@ API (JAX)
:show-inheritance:
:private-members: _update
:members:

.. automethod:: __init__
10 changes: 6 additions & 4 deletions docs/source/api/agents/amp.rst
@@ -21,7 +21,7 @@ Algorithm implementation

| Main notation/symbols:
| - policy (:math:`\pi_\theta`), value (:math:`V_\phi`) and discriminator (:math:`D_\psi`) function approximators
| - states (:math:`s`), actions (:math:`a`), rewards (:math:`r`), next states (:math:`s'`), dones (:math:`d`)
| - states (:math:`s`), actions (:math:`a`), rewards (:math:`r`), next states (:math:`s'`), terminated (:math:`d_{_{end}}`), truncated (:math:`d_{_{timeout}}`)
| - values (:math:`V`), next values (:math:`V'`), advantages (:math:`A`), returns (:math:`R`)
| - log probabilities (:math:`logp`)
| - loss (:math:`L`)
@@ -57,7 +57,7 @@ Learning algorithm
| :math:`r_D \leftarrow -log(\text{max}( 1 - \hat{y}(D_\psi(s_{_{AMP}})), \, 10^{-4})) \qquad` with :math:`\; \hat{y}(x) = \dfrac{1}{1 + e^{-x}}`
| :math:`r' \leftarrow` :guilabel:`task_reward_weight` :math:`r \, +` :guilabel:`style_reward_weight` :guilabel:`discriminator_reward_scale` :math:`r_D`
| :green:`# compute returns and advantages`
| :math:`R, A \leftarrow f_{GAE}(r', d, V, V')`
| :math:`R, A \leftarrow f_{GAE}(r', d_{_{end}} \lor d_{_{timeout}}, V, V')`
| :green:`# sample mini-batches from memory`
| [[:math:`s, a, logp, V, R, A, s_{_{AMP}}`]] :math:`\leftarrow` states, actions, log_prob, values, returns, advantages, AMP states
| [[:math:`s_{_{AMP}}^{^M}`]] :math:`\leftarrow` AMP states from :math:`M`
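
A small PyTorch sketch of the style-reward combination shown above, from discriminator logits to :math:`r_D` to the combined reward. The weight values are illustrative placeholders, not the agent's defaults:

```python
import torch

def combined_reward(task_reward, disc_logits,
                    task_reward_weight=0.0, style_reward_weight=1.0,
                    discriminator_reward_scale=2.0):
    # r_D = -log(max(1 - sigmoid(D_psi(s_AMP)), 1e-4))
    style_reward = -torch.log(torch.clamp(1.0 - torch.sigmoid(disc_logits), min=1e-4))
    # r' = task_reward_weight * r + style_reward_weight * discriminator_reward_scale * r_D
    return task_reward_weight * task_reward + style_reward_weight * discriminator_reward_scale * style_reward

task_rewards = torch.zeros(8, 1)
logits = torch.randn(8, 1)                            # discriminator output D_psi(s_AMP)
print(combined_reward(task_rewards, logits).shape)    # torch.Size([8, 1])
```
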
@@ -237,6 +237,10 @@ Support for advanced features is described in the next table
- \-
- .. centered:: :math:`\square`
- .. centered:: :math:`\square`
* - Mixed precision
- Automatic mixed precision
- .. centered:: :math:`\blacksquare`
- .. centered:: :math:`\square`
* - Distributed
- Single Program Multi Data (SPMD) multi-GPU
- .. centered:: :math:`\blacksquare`
@@ -256,5 +260,3 @@ API (PyTorch)
:show-inheritance:
:private-members: _update
:members:

.. automethod:: __init__
12 changes: 6 additions & 6 deletions docs/source/api/agents/cem.rst
@@ -17,7 +17,7 @@ Algorithm implementation

| Main notation/symbols:
| - policy function approximator (:math:`\pi_\theta`)
| - states (:math:`s`), actions (:math:`a`), rewards (:math:`r`), next states (:math:`s'`), dones (:math:`d`)
| - states (:math:`s`), actions (:math:`a`), rewards (:math:`r`), next states (:math:`s'`), terminated (:math:`d_{_{end}}`), truncated (:math:`d_{_{timeout}}`)
| - loss (:math:`L`)
.. raw:: html
@@ -41,7 +41,7 @@ Learning algorithm
|
| :literal:`_update(...)`
| :green:`# sample all memory`
| :math:`s, a, r, s', d \leftarrow` states, actions, rewards, next_states, dones
| :math:`s, a, r \leftarrow` states, actions, rewards
| :green:`# compute discounted return threshold`
| :math:`[G] \leftarrow \sum_{t=0}^{E-1}` :guilabel:`discount_factor`:math:`^{t} \, r_t` for each episode
| :math:`G_{_{bound}} \leftarrow q_{th_{quantile}}([G])` at the given :guilabel:`percentile`
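
A short sketch of the return-threshold step above: per-episode discounted returns followed by the percentile cut that selects the elite episodes (illustrative values, not the agent's defaults):

```python
import torch

discount_factor, percentile = 0.99, 0.70

# rewards collected for E episodes (one 1D tensor per episode)
episode_rewards = [torch.rand(25), torch.rand(40), torch.rand(33)]

# [G] <- sum_t discount_factor^t * r_t for each episode
returns = torch.stack([
    (r * discount_factor ** torch.arange(len(r), dtype=torch.float32)).sum()
    for r in episode_rewards
])

# G_bound <- quantile of [G] at the given percentile; keep the elite episodes above it
return_bound = torch.quantile(returns, percentile)
elite_episodes = [i for i, G in enumerate(returns) if G >= return_bound]
print(return_bound.item(), elite_episodes)
```
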
@@ -175,6 +175,10 @@ Support for advanced features is described in the next table
- \-
- .. centered:: :math:`\square`
- .. centered:: :math:`\square`
* - Mixed precision
- Automatic mixed precision
- .. centered:: :math:`\blacksquare`
- .. centered:: :math:`\square`
* - Distributed
- \-
- .. centered:: :math:`\square`
@@ -195,8 +199,6 @@ API (PyTorch)
:private-members: _update
:members:

.. automethod:: __init__

.. raw:: html

<br>
@@ -211,5 +213,3 @@ API (JAX)
:show-inheritance:
:private-members: _update
:members:

.. automethod:: __init__