
Commit

Merge branch 'develop' into toni/isaaclab_examples
Toni-SM committed Jan 17, 2025
2 parents 7eb8b16 + d57c8ea commit 5a08bc4
Showing 289 changed files with 16,213 additions and 7,591 deletions.
2 changes: 2 additions & 0 deletions .github/ISSUE_TEMPLATE/bug_report.yaml
@@ -30,6 +30,8 @@ body:
description: The skrl version can be obtained with the command `pip show skrl`.
options:
- ---
- 1.4.0
- 1.3.0
- 1.2.0
- 1.1.0
- 1.0.0
8 changes: 4 additions & 4 deletions .github/workflows/python-publish-manual.yml
@@ -15,7 +15,7 @@ jobs:

pypi:
name: Publish package to PyPI
runs-on: ubuntu-latest
runs-on: ubuntu-22.04
if: ${{ github.event.inputs.job == 'pypi'}}

steps:
@@ -24,7 +24,7 @@ jobs:
- name: Set up Python
uses: actions/setup-python@v3
with:
python-version: '3.7'
python-version: '3.10.16'

- name: Install dependencies
run: |
@@ -43,7 +43,7 @@

test-pypi:
name: Publish package to TestPyPI
runs-on: ubuntu-latest
runs-on: ubuntu-22.04
if: ${{ github.event.inputs.job == 'test-pypi'}}

steps:
@@ -52,7 +52,7 @@ jobs:
- name: Set up Python
uses: actions/setup-python@v3
with:
python-version: '3.7'
python-version: '3.10.16'

- name: Install dependencies
run: |
48 changes: 37 additions & 11 deletions .pre-commit-config.yaml
@@ -1,15 +1,41 @@
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.4.0
hooks:
- id: check-ast
- id: check-case-conflict
- id: check-docstring-first
- id: check-merge-conflict
- id: check-yaml
- id: end-of-file-fixer
- id: trailing-whitespace
rev: v4.6.0
hooks:
- id: check-ast
- id: check-case-conflict
- id: check-docstring-first
- id: check-json
- id: check-merge-conflict
- id: check-toml
- id: check-yaml
- id: debug-statements
- id: detect-private-key
- id: end-of-file-fixer
- id: name-tests-test
args: ["--pytest-test-first"]
exclude: ^(tests/strategies.py|tests/utils.py)
- id: trailing-whitespace
- repo: https://github.com/codespell-project/codespell
rev: v2.3.0
hooks:
- id: codespell
exclude: ^(docs/source/_static|docs/_build|pyproject.toml)
additional_dependencies:
- tomli
- repo: https://github.com/python/black
rev: 24.8.0
hooks:
- id: black
args: ["--line-length=120"]
exclude: ^(docs/)
- repo: https://github.com/pycqa/isort
rev: 5.12.0
rev: 5.13.2
hooks:
- id: isort
- repo: https://github.com/pre-commit/pygrep-hooks
rev: v1.10.0
hooks:
- id: isort
- id: rst-backticks
- id: rst-directive-colons
- id: rst-inline-touching-normal
58 changes: 51 additions & 7 deletions CHANGELOG.md
@@ -2,7 +2,48 @@

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).

## [1.3.0] - Unreleased
## [1.4.0] - 2025-01-16
### Added
- Utilities to operate on Gymnasium spaces (`Box`, `Discrete`, `MultiDiscrete`, `Tuple` and `Dict`) (see the sketch after this list)
- `parse_device` static method in ML framework configuration (used in library components to set up the device)
- Model instantiator support for different shared model structures in PyTorch
- Support for automatic mixed precision training in PyTorch
- `init_state_dict` method to initialize model's lazy modules in PyTorch
- Model instantiators `fixed_log_std` parameter to define immutable log standard deviations
- Define the `stochastic_evaluation` trainer config to allow using the actions returned by the agent's model
as-is (stochastic actions) during evaluation, instead of deterministic actions (mean actions in Gaussian-based models).
Returning deterministic actions is the default behavior.
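
For illustration, a minimal sketch, using plain `gymnasium` and `torch` rather than the new skrl utilities themselves, of the kind of operation the space utilities cover: tensorizing and flattening a sample from a composite `Dict` space (the space layout below is made up for the example):

```python
import gymnasium as gym
import numpy as np
import torch

# a composite observation space mixing fundamental and composite spaces
space = gym.spaces.Dict({
    "camera": gym.spaces.Box(low=0, high=255, shape=(8, 8, 3), dtype=np.uint8),
    "joints": gym.spaces.Box(low=-1.0, high=1.0, shape=(7,), dtype=np.float32),
    "mode": gym.spaces.Discrete(3),
})

sample = space.sample()

# tensorize each leaf and flatten everything into a single 1D tensor,
# which is what a flattened-storage memory keeps per environment step
flat = torch.cat([
    torch.as_tensor(sample["camera"], dtype=torch.float32).flatten(),
    torch.as_tensor(sample["joints"], dtype=torch.float32).flatten(),
    torch.as_tensor(np.asarray(sample["mode"]), dtype=torch.float32).flatten(),
])
assert flat.numel() == 8 * 8 * 3 + 7 + 1
```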

### Changed
- Call agent's `pre_interaction` method during evaluation
- Use spaces utilities to process states, observations and actions for all the library components
- Update model instantiators definitions to process supported fundamental and composite Gymnasium spaces
- Make flattened tensor storage in memory the default option (reverting the change introduced in version 1.3.0)
- Drop support for PyTorch versions prior to 1.10 (the previous supported version was 1.9)
- Update KL Adaptive learning rate scheduler implementation to match Optax's behavior in JAX
- Update AMP agent to use the environment's terminated and truncated data, and the KL Adaptive learning rate scheduler
- Update runner implementations to support definition of arbitrary agents and their models
- Speed up PyTorch implementation:
- Disable argument checking when instantiating distributions
- Replace PyTorch's `BatchSampler` with Python slices when sampling data from memory
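
As a rough sketch of the last point, compare driving indices through `BatchSampler` with plain Python slicing over the same contiguous storage (generic PyTorch, not skrl's memory implementation):

```python
import torch
from torch.utils.data import BatchSampler, SequentialSampler

storage = torch.randn(4096, 64)   # e.g. flattened transitions kept in memory
batch_size = 512

# BatchSampler yields lists of indices, forcing an index-based gather per mini-batch
for indices in BatchSampler(SequentialSampler(range(len(storage))), batch_size, drop_last=False):
    batch = storage[indices]

# plain Python slices view the same contiguous storage directly (no gather)
for start in range(0, len(storage), batch_size):
    batch = storage[start:start + batch_size]
```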

### Changed (breaking changes: style)
- Format code using Black code formatter (it's ugly, yes, but it does its job)

### Fixed
- Move the batch sampling inside gradient step loop for DQN, DDQN, DDPG (RNN), TD3 (RNN), SAC and SAC (RNN)
- Model state dictionary initialization for composite Gymnasium spaces in JAX
- Add missing `reduction` parameter to Gaussian model instantiator
- Optax's learning rate schedulers integration in JAX implementation
- Isaac Lab wrapper's multi-agent state retrieval with gymnasium 1.0
- Take the truncation signal into account when computing 'done' (environment reset)

### Removed
- Remove OpenAI Gym (`gym`) from dependencies and source code. **skrl** continues to support gym environments;
the package is simply no longer installed with the library and must be installed manually if needed.
Any gym-based environment wrapper must use the `convert_gym_space` space utility to operate
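
For gym-based environments, the conversion performed by `convert_gym_space` amounts to rebuilding the equivalent Gymnasium space. A hand-rolled sketch for the `Box` case only, assuming the legacy `gym` package has been installed separately (illustrative, not the library's code):

```python
import gym        # legacy package, installed manually if needed
import gymnasium
import numpy as np

def box_to_gymnasium(space: "gym.spaces.Box") -> "gymnasium.spaces.Box":
    # rebuild the equivalent Gymnasium Box from the legacy gym Box
    return gymnasium.spaces.Box(low=np.asarray(space.low),
                                high=np.asarray(space.high),
                                dtype=space.dtype)

legacy = gym.spaces.Box(low=-1.0, high=1.0, shape=(3,), dtype=np.float32)
print(box_to_gymnasium(legacy))  # Box(-1.0, 1.0, (3,), float32)
```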

## [1.3.0] - 2024-09-11
### Added
- Distributed multi-GPU and multi-node learning (JAX implementation)
- Utilities to start multiple processes from a single program invocation for distributed learning using JAX
@@ -14,8 +55,10 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
### Changed
- Move the KL reduction from the PyTorch `KLAdaptiveLR` class to each agent that uses it in distributed runs (see the sketch after this list)
- Move the PyTorch distributed initialization from the agent base class to the ML framework configuration
- Implement model instantiators using dynamic execution of Python code
- Upgrade model instantiator implementations to support CNN layers and complex network definitions,
and implement them using dynamic execution of Python code
- Update Isaac Lab environment loader argument parser options to match Isaac Lab version
- Allow storing tensors/arrays with their original dimensions in memory and make it the default option
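
For context on the `KLAdaptiveLR` item above: the adaptive rule shrinks the learning rate when the measured KL divergence overshoots a threshold and grows it when it undershoots. A standalone sketch of that rule, with illustrative placeholder thresholds and factors (not skrl's class):

```python
def kl_adaptive_lr(lr: float, kl: float, kl_threshold: float = 0.008,
                   factor: float = 1.5, lr_min: float = 1e-6, lr_max: float = 1e-2) -> float:
    """Scale the learning rate up/down according to the last measured KL divergence."""
    if kl > kl_threshold * 2.0:      # policy moved too far -> slow down
        return max(lr / factor, lr_min)
    if kl < kl_threshold * 0.5:      # policy barely moved -> speed up
        return min(lr * factor, lr_max)
    return lr

# in distributed runs the KL is averaged (reduced) across processes by the agent
# *before* this rule is applied, which is the reduction mentioned above
lr = 3e-4
for kl in (0.001, 0.020, 0.009):
    lr = kl_adaptive_lr(lr, kl)
    print(f"kl={kl:.3f} -> lr={lr:.6f}")
```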

### Changed (breaking changes)
- Decouple the observation and state spaces in single and multi-agent environment wrappers and add the `state`
@@ -24,9 +67,10 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).

### Fixed
- Catch TensorBoard summary iterator exceptions in `TensorboardFileIterator` postprocessing utils
- Fix automatic wrapper detection for Isaac Gym (previews), DeepMind and vectorized Gymnasium environments
- Fix automatic wrapper detection issue (introduced in previous version) for Isaac Gym (previews),
DeepMind and vectorized Gymnasium environments
- Fix vectorized/parallel environments `reset` method return values when called more than once
- IPPO and MAPPO `act` method return values when JAX-NumPy backend is enabled
- Fix IPPO and MAPPO `act` method return values when JAX-NumPy backend is enabled

## [1.2.0] - 2024-06-23
### Added
@@ -67,7 +111,7 @@ Summary of the most relevant features:
## [1.0.0-rc.2] - 2023-08-11
### Added
- Get truncation from `time_outs` info in Isaac Gym, Isaac Orbit and Omniverse Isaac Gym environments
- Time-limit (truncation) boostrapping in on-policy actor-critic agents
- Time-limit (truncation) bootstrapping in on-policy actor-critic agents
- Model instantiators `initial_log_std` parameter to set the log standard deviation's initial value

### Changed (breaking changes)
@@ -81,7 +125,7 @@ Summary of the most relevant features:
- `from skrl.envs.loaders.jax import load_omniverse_isaacgym_env`

### Changed
- Drop support for versions prior to PyTorch 1.9 (1.8.0 and 1.8.1)
- Drop support for PyTorch versions prior to 1.9 (the previous supported version was 1.8)

## [1.0.0-rc.1] - 2023-07-25
### Added
@@ -174,7 +218,7 @@ to allow storing samples in memories during evaluation
- Parameter `role` to model methods
- Wrapper compatibility with the new OpenAI Gym environment API
- Internal library colored logger
- Migrate checkpoints/models from other RL libraries to skrl models/agents
- Migrate checkpoints/models from other RL libraries to **skrl** models/agents
- Configuration parameter `store_separately` to agent configuration dict
- Save/load agent modules (models, optimizers, preprocessors)
- Set random seed and configure deterministic behavior for reproducibility
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -54,7 +54,7 @@ Read the code a little bit and you will understand it at first glance... Also
```ini
function annotation (e.g. typing)
# insert an empty line
python libraries and other libraries (e.g. gym, numpy, time, etc.)
python libraries and other libraries (e.g. gymnasium, numpy, time, etc.)
# insert an empty line
machine learning framework modules (e.g. torch, torch.nn)
# insert an empty line
2 changes: 0 additions & 2 deletions docs/source/api/agents.rst
@@ -119,7 +119,6 @@ API (PyTorch)
:private-members: _update, _empty_preprocessor, _get_internal_value
:members:

.. automethod:: __init__
.. automethod:: __str__

.. raw:: html
@@ -136,5 +135,4 @@ API (JAX)
:private-members: _update, _empty_preprocessor, _get_internal_value
:members:

.. automethod:: __init__
.. automethod:: __str__
14 changes: 6 additions & 8 deletions docs/source/api/agents/a2c.rst
@@ -25,7 +25,7 @@ Algorithm implementation

| Main notation/symbols:
| - policy function approximator (:math:`\pi_\theta`), value function approximator (:math:`V_\phi`)
| - states (:math:`s`), actions (:math:`a`), rewards (:math:`r`), next states (:math:`s'`), dones (:math:`d`)
| - states (:math:`s`), actions (:math:`a`), rewards (:math:`r`), next states (:math:`s'`), terminated (:math:`d_{_{end}}`), truncated (:math:`d_{_{timeout}}`)
| - values (:math:`V`), advantages (:math:`A`), returns (:math:`R`)
| - log probabilities (:math:`logp`)
| - loss (:math:`L`)
@@ -59,7 +59,7 @@ Learning algorithm
| :literal:`_update(...)`
| :green:`# compute returns and advantages`
| :math:`V_{_{last}}' \leftarrow V_\phi(s')`
| :math:`R, A \leftarrow f_{GAE}(r, d, V, V_{_{last}}')`
| :math:`R, A \leftarrow f_{GAE}(r, d_{_{end}} \lor d_{_{timeout}}, V, V_{_{last}}')`
| :green:`# sample mini-batches from memory`
| [[:math:`s, a, logp, V, R, A`]] :math:`\leftarrow` states, actions, log_prob, values, returns, advantages
| :green:`# mini-batches loop`
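
For reference, a compact PyTorch sketch of the GAE step (:math:`f_{GAE}`) referenced above, using the logical OR of terminated and truncated as the reset signal. This is an illustration of the computation, not skrl's implementation:

```python
import torch

def compute_gae(rewards, dones, values, last_values, discount=0.99, lam=0.95):
    # rewards, dones, values: (timesteps, num_envs); last_values: (num_envs,) = V(s') after the last step
    # dones is terminated OR truncated (d_end v d_timeout)
    num_steps = rewards.shape[0]
    advantages = torch.zeros_like(rewards)
    gae = torch.zeros_like(last_values)
    for t in reversed(range(num_steps)):
        next_values = last_values if t == num_steps - 1 else values[t + 1]
        not_done = 1.0 - dones[t].float()
        delta = rewards[t] + discount * next_values * not_done - values[t]
        gae = delta + discount * lam * not_done * gae
        advantages[t] = gae
    returns = advantages + values                                               # R
    advantages = (advantages - advantages.mean()) / (advantages.std() + 1e-8)   # normalized A
    return returns, advantages

T, N = 16, 4
R, A = compute_gae(torch.randn(T, N), torch.zeros(T, N, dtype=torch.bool),
                   torch.randn(T, N), torch.randn(N))
```
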
@@ -232,6 +232,10 @@ Support for advanced features is described in the next table
- RNN, LSTM, GRU and any other variant
- .. centered:: :math:`\blacksquare`
- .. centered:: :math:`\square`
* - Mixed precision
- Automatic mixed precision
- .. centered:: :math:`\blacksquare`
- .. centered:: :math:`\square`
* - Distributed
- Single Program Multi Data (SPMD) multi-GPU
- .. centered:: :math:`\blacksquare`
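
The "Automatic mixed precision" row added above refers to PyTorch's `torch.autocast` and gradient-scaler machinery. A generic sketch of such an update step (not the agent's actual code):

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(8, 1).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

x, y = torch.randn(32, 8, device=device), torch.randn(32, 1, device=device)

optimizer.zero_grad()
# forward pass and loss under autocast; backward/step through the gradient scaler
with torch.autocast(device_type=device, enabled=(device == "cuda")):
    loss = torch.nn.functional.mse_loss(model(x), y)
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
```
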
@@ -252,16 +256,12 @@ API (PyTorch)
:private-members: _update
:members:

.. automethod:: __init__

.. autoclass:: skrl.agents.torch.a2c.A2C_RNN
:undoc-members:
:show-inheritance:
:private-members: _update
:members:

.. automethod:: __init__

.. raw:: html

<br>
@@ -276,5 +276,3 @@ API (JAX)
:show-inheritance:
:private-members: _update
:members:

.. automethod:: __init__
10 changes: 6 additions & 4 deletions docs/source/api/agents/amp.rst
@@ -21,7 +21,7 @@ Algorithm implementation

| Main notation/symbols:
| - policy (:math:`\pi_\theta`), value (:math:`V_\phi`) and discriminator (:math:`D_\psi`) function approximators
| - states (:math:`s`), actions (:math:`a`), rewards (:math:`r`), next states (:math:`s'`), dones (:math:`d`)
| - states (:math:`s`), actions (:math:`a`), rewards (:math:`r`), next states (:math:`s'`), terminated (:math:`d_{_{end}}`), truncated (:math:`d_{_{timeout}}`)
| - values (:math:`V`), next values (:math:`V'`), advantages (:math:`A`), returns (:math:`R`)
| - log probabilities (:math:`logp`)
| - loss (:math:`L`)
@@ -57,7 +57,7 @@ Learning algorithm
| :math:`r_D \leftarrow -log(\text{max}( 1 - \hat{y}(D_\psi(s_{_{AMP}})), \, 10^{-4})) \qquad` with :math:`\; \hat{y}(x) = \dfrac{1}{1 + e^{-x}}`
| :math:`r' \leftarrow` :guilabel:`task_reward_weight` :math:`r \, +` :guilabel:`style_reward_weight` :guilabel:`discriminator_reward_scale` :math:`r_D`
| :green:`# compute returns and advantages`
| :math:`R, A \leftarrow f_{GAE}(r', d, V, V')`
| :math:`R, A \leftarrow f_{GAE}(r', d_{_{end}} \lor d_{_{timeout}}, V, V')`
| :green:`# sample mini-batches from memory`
| [[:math:`s, a, logp, V, R, A, s_{_{AMP}}`]] :math:`\leftarrow` states, actions, log_prob, values, returns, advantages, AMP states
| [[:math:`s_{_{AMP}}^{^M}`]] :math:`\leftarrow` AMP states from :math:`M`
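
A small PyTorch sketch of the style-reward combination shown above, from discriminator logits to :math:`r_D` to the combined reward. The weight values are illustrative placeholders, not the agent's defaults:

```python
import torch

def combined_reward(task_reward, disc_logits,
                    task_reward_weight=0.0, style_reward_weight=1.0,
                    discriminator_reward_scale=2.0):
    # r_D = -log(max(1 - sigmoid(D_psi(s_AMP)), 1e-4))
    style_reward = -torch.log(torch.clamp(1.0 - torch.sigmoid(disc_logits), min=1e-4))
    # r' = task_reward_weight * r + style_reward_weight * discriminator_reward_scale * r_D
    return task_reward_weight * task_reward + style_reward_weight * discriminator_reward_scale * style_reward

task_rewards = torch.zeros(8, 1)
logits = torch.randn(8, 1)                            # discriminator output D_psi(s_AMP)
print(combined_reward(task_rewards, logits).shape)    # torch.Size([8, 1])
```
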
@@ -237,6 +237,10 @@ Support for advanced features is described in the next table
- \-
- .. centered:: :math:`\square`
- .. centered:: :math:`\square`
* - Mixed precision
- Automatic mixed precision
- .. centered:: :math:`\blacksquare`
- .. centered:: :math:`\square`
* - Distributed
- Single Program Multi Data (SPMD) multi-GPU
- .. centered:: :math:`\blacksquare`
@@ -256,5 +260,3 @@ API (PyTorch)
:show-inheritance:
:private-members: _update
:members:

.. automethod:: __init__
12 changes: 6 additions & 6 deletions docs/source/api/agents/cem.rst
@@ -17,7 +17,7 @@ Algorithm implementation

| Main notation/symbols:
| - policy function approximator (:math:`\pi_\theta`)
| - states (:math:`s`), actions (:math:`a`), rewards (:math:`r`), next states (:math:`s'`), dones (:math:`d`)
| - states (:math:`s`), actions (:math:`a`), rewards (:math:`r`), next states (:math:`s'`), terminated (:math:`d_{_{end}}`), truncated (:math:`d_{_{timeout}}`)
| - loss (:math:`L`)
.. raw:: html
@@ -41,7 +41,7 @@ Learning algorithm
|
| :literal:`_update(...)`
| :green:`# sample all memory`
| :math:`s, a, r, s', d \leftarrow` states, actions, rewards, next_states, dones
| :math:`s, a, r \leftarrow` states, actions, rewards
| :green:`# compute discounted return threshold`
| :math:`[G] \leftarrow \sum_{t=0}^{E-1}` :guilabel:`discount_factor`:math:`^{t} \, r_t` for each episode
| :math:`G_{_{bound}} \leftarrow q_{th_{quantile}}([G])` at the given :guilabel:`percentile`
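
A short sketch of the return-threshold step above: per-episode discounted returns followed by the percentile cut that selects the elite episodes (illustrative values, not the agent's defaults):

```python
import torch

discount_factor, percentile = 0.99, 0.70

# rewards collected for E episodes (one 1D tensor per episode)
episode_rewards = [torch.rand(25), torch.rand(40), torch.rand(33)]

# [G] <- sum_t discount_factor^t * r_t for each episode
returns = torch.stack([
    (r * discount_factor ** torch.arange(len(r), dtype=torch.float32)).sum()
    for r in episode_rewards
])

# G_bound <- quantile of [G] at the given percentile; keep the elite episodes above it
return_bound = torch.quantile(returns, percentile)
elite_episodes = [i for i, G in enumerate(returns) if G >= return_bound]
print(return_bound.item(), elite_episodes)
```
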
@@ -175,6 +175,10 @@ Support for advanced features is described in the next table
- \-
- .. centered:: :math:`\square`
- .. centered:: :math:`\square`
* - Mixed precision
- Automatic mixed precision
- .. centered:: :math:`\blacksquare`
- .. centered:: :math:`\square`
* - Distributed
- \-
- .. centered:: :math:`\square`
@@ -195,8 +199,6 @@ API (PyTorch)
:private-members: _update
:members:

.. automethod:: __init__

.. raw:: html

<br>
@@ -211,5 +213,3 @@ API (JAX)
:show-inheritance:
:private-members: _update
:members:

.. automethod:: __init__