From 3e6b7c6e62481addef0065812ab4317395910db3 Mon Sep 17 00:00:00 2001
From: Vincent Moens
Date: Tue, 17 Sep 2024 11:52:31 -0700
Subject: [PATCH] [Doc] Document losses in README.md

ghstack-source-id: 3a37a28c40e65b76ae50a1cd819474a58b94ae28
Pull Request resolved: https://github.com/pytorch/rl/pull/2408
---
 README.md | 286 +++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 273 insertions(+), 13 deletions(-)

diff --git a/README.md b/README.md
index 47189b758e0..8e9ea840d39 100644
--- a/README.md
+++ b/README.md
@@ -523,19 +523,279 @@ If you would like to contribute to new features, check our [call for contributio
 ## Examples, tutorials and demos
 A series of [examples](https://github.com/pytorch/rl/blob/main/examples/) are provided with an illustrative purpose:
-- [DQN](https://github.com/pytorch/rl/blob/main/sota-implementations/dqn)
-- [DDPG](https://github.com/pytorch/rl/blob/main/sota-implementations/ddpg/ddpg.py)
-- [IQL](https://github.com/pytorch/rl/blob/main/sota-implementations/iql/iql_offline.py)
-- [CQL](https://github.com/pytorch/rl/blob/main/sota-implementations/cql/cql_offline.py)
-- [TD3](https://github.com/pytorch/rl/blob/main/sota-implementations/td3/td3.py)
-- [TD3+BC](https://github.com/pytorch/rl/blob/main/sota-implementations/td3+bc/td3+bc.py)
-- [A2C](https://github.com/pytorch/rl/blob/main/examples/a2c_old/a2c.py)
-- [PPO](https://github.com/pytorch/rl/blob/main/sota-implementations/ppo/ppo.py)
-- [SAC](https://github.com/pytorch/rl/blob/main/sota-implementations/sac/sac.py)
-- [REDQ](https://github.com/pytorch/rl/blob/main/sota-implementations/redq/redq.py)
-- [Dreamer](https://github.com/pytorch/rl/blob/main/sota-implementations/dreamer/dreamer.py)
-- [Decision Transformers](https://github.com/pytorch/rl/blob/main/sota-implementations/decision_transformer)
-- [RLHF](https://github.com/pytorch/rl/blob/main/examples/rlhf)
+| Algorithm | Compile Support** | Tensordict-free API | Modular Losses | Continuous and Discrete |
+|---|---|---|---|---|
+| DQN | 1.9x | + | NA | + (through ActionDiscretizer transform) |
+| DDPG | 1.87x | + | + | - (continuous only) |
+| IQL | 3.22x | + | + | + |
+| CQL | 2.68x | + | + | + |
+| TD3 | 2.27x | + | + | - (continuous only) |
+| TD3+BC | untested | + | + | - (continuous only) |
+| A2C | 2.67x | + | - | + |
+| PPO | 2.42x | + | - | + |
+| SAC | 2.62x | + | - | + |
+| REDQ | 2.28x | + | - | - (continuous only) |
+| Dreamer v1 | untested | + | + (different classes) | - (continuous only) |
+| Decision Transformers | untested | + | NA | - (continuous only) |
+| CrossQ | untested | + | + | - (continuous only) |
+| Gail | untested | + | NA | + |
+| Impala | untested | + | - | + |
+| IQL (MARL) | untested | + | + | + |
+| DDPG (MARL) | untested | + | + | - (continuous only) |
+| PPO (MARL) | untested | + | - | + |
+| QMIX-VDN (MARL) | untested | + | NA | + |
+| SAC (MARL) | untested | + | - | + |
+| RLHF | NA | + | NA | NA |
+
+** The number indicates the expected speed-up compared to eager mode when executed on CPU. Numbers may vary depending on architecture and device.
 and many more to come!