v1.0.0b2 JAX Support and Hyperparameter Tuning
🎉 I am thrilled to announce the v1.0.0b2 CleanRL Beta Release. This release comes with exciting new features. First, we now support JAX-based learning algorithms, which are usually faster than their PyTorch equivalents! See the documentation for the new JAX-based DQN, TD3, and DDPG implementations.
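Much of that speedup comes from `jax.jit`, which compiles a pure update function once and then reuses the compiled version on every training step. As a hypothetical illustration (not code from CleanRL), here is a jitted TD(0) target computation of the kind these implementations build on:

```python
import jax
import jax.numpy as jnp

# Hypothetical illustration (not CleanRL code): jax.jit compiles this pure
# function to XLA on first call, so subsequent calls skip Python overhead.
@jax.jit
def td_target(rewards, next_q_values, dones, gamma=0.99):
    # Bellman target: r + gamma * Q(s', a'), masked out at episode ends
    return rewards + gamma * next_q_values * (1.0 - dones)

targets = td_target(
    jnp.array([1.0, 0.0]),  # rewards
    jnp.array([2.0, 3.0]),  # next-state Q-values
    jnp.array([0.0, 1.0]),  # done flags
)
```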
We also have preliminary support for hyperparameter tuning via Optuna (see docs), designed to help researchers find a single set of hyperparameters that works well across a family of games. The current API looks like this:
```python
import optuna
from cleanrl_utils.tuner import Tuner

tuner = Tuner(
    script="cleanrl/ppo.py",
    metric="charts/episodic_return",
    metric_last_n_average_window=50,
    direction="maximize",
    aggregation_type="average",
    target_scores={
        "CartPole-v1": [0, 500],
        "Acrobot-v1": [-500, 0],
    },
    params_fn=lambda trial: {
        "learning-rate": trial.suggest_loguniform("learning-rate", 0.0003, 0.003),
        "num-minibatches": trial.suggest_categorical("num-minibatches", [1, 2, 4]),
        "update-epochs": trial.suggest_categorical("update-epochs", [1, 2, 4, 8]),
        "num-steps": trial.suggest_categorical("num-steps", [5, 16, 32, 64, 128]),
        "vf-coef": trial.suggest_uniform("vf-coef", 0, 5),
        "max-grad-norm": trial.suggest_uniform("max-grad-norm", 0, 5),
        "total-timesteps": 100000,
        "num-envs": 16,
    },
    pruner=optuna.pruners.MedianPruner(n_startup_trials=5),
    sampler=optuna.samplers.TPESampler(),
)
tuner.tune(
    num_trials=100,
    num_seeds=3,
)
```
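To combine runs across games with very different return scales, `target_scores` gives each environment a `[low, high]` range. The idea, sketched here with a hypothetical min-max normalization (the exact rule lives in `cleanrl_utils.tuner`; the returns below are made-up numbers), is to map every return onto a comparable scale before aggregating into the single objective Optuna optimizes:

```python
# Hypothetical sketch of why target_scores takes a [low, high] range per game:
# returns are normalized to a common scale before they are aggregated into
# one objective. The actual rule is in cleanrl_utils.tuner; this only
# illustrates the idea with made-up returns.
target_scores = {"CartPole-v1": [0, 500], "Acrobot-v1": [-500, 0]}
raw_returns = {"CartPole-v1": 450.0, "Acrobot-v1": -100.0}

def min_max_normalize(score, low, high):
    return (score - low) / (high - low)

normalized = [
    min_max_normalize(raw_returns[env], *bounds)
    for env, bounds in target_scores.items()
]
objective = sum(normalized) / len(normalized)  # cf. aggregation_type="average"
```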
We also added support for new algorithms and environments:
- Isaac Gym support in PPO for GPU-accelerated robotics environments: `ppo_continuous_action_isaacgym.py`
- Random Network Distillation (RND) for hard-exploration environments: `ppo_rnd_envpool.py`
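RND drives exploration with an intrinsic reward: the prediction error between a frozen, randomly initialized target network and a predictor trained to imitate it, which stays high for states the agent has rarely visited. A toy NumPy sketch of the idea, using hypothetical linear "networks" rather than the actual `ppo_rnd_envpool.py` code:

```python
import numpy as np

rng = np.random.default_rng(0)
target_w = rng.normal(size=(4, 8))  # frozen random "target network"
predictor_w = np.zeros((4, 8))      # "predictor network", trained to match it

def intrinsic_reward(obs, pred_w):
    # Prediction error is large for novel observations the predictor has not
    # yet learned to reproduce, giving the agent an exploration bonus.
    target_feat = obs @ target_w
    pred_feat = obs @ pred_w
    return float(((target_feat - pred_feat) ** 2).mean())

obs = np.ones((1, 4))
novel_bonus = intrinsic_reward(obs, predictor_w)         # large: untrained predictor
familiar_bonus = intrinsic_reward(obs, target_w.copy())  # zero: perfectly trained
```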
I would like to cordially thank the core dev members @dosssman @yooceii @dipamc @kinalmehta for their efforts in helping maintain the CleanRL repository. I would also like to give a shout-out to our new contributors @cool-RR, @Howuhh, @jseppanen, @joaogui1, @kinalmehta, and @ALPH2H.
New CleanRL Supported Publications
Jiayi Weng, Min Lin, Shengyi Huang, Bo Liu, Denys Makoviichuk, Viktor Makoviychuk, Zichen Liu, Yufan Song, Ting Luo, Yukun Jiang, Zhongwen Xu, & Shuicheng YAN (2022). EnvPool: A Highly Parallel Reinforcement Learning Environment Execution Engine. In Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track. https://openreview.net/forum?id=BubxnHpuMbG
New Features PR
- prototype jax with ddpg by @vwxyzjn in #187
- Isaac Gym Envs PPO updates by @vwxyzjn in #233
- JAX TD3 prototype by @joaogui1 in #225
- prototype jax with dqn by @kinalmehta in #222
- Poetry 1.2 by @vwxyzjn in #271
- Add rnd_ppo.py documentation and refactor by @yooceii in #151
- Hyperparameter optimization by @vwxyzjn in #228
- Update the hyperparameter optimization example script by @vwxyzjn in #268
Bug Fixes PR
- Td3 ddpg action bound fix by @dosssman in #211
- added gamma to reward normalization wrappers by @Howuhh in #209
- Seed envpool environment explicitly by @jseppanen in #238
- Fix PPO + Isaac Gym Benchmark Script by @vwxyzjn in #243
- Fix for noise sampling for the TD3 exploration by @dosssman in #260
Documentation PR
- Add a note on PPG's performance by @vwxyzjn in #199
- Clarify CleanRL is a non-modular library by @vwxyzjn in #200
- Fix documentation link by @vwxyzjn in #213
- JAX + DDPG docs fix by @vwxyzjn in #229
- Fix links in docs for `ppo_continuous_action_isaacgym.py` by @vwxyzjn in #242
- Fix docs (badge, TD3 + JAX, and DQN + JAX) by @vwxyzjn in #246
- Fix typos by @ALPH2H in #282
- Fix docs links in README.md by @vwxyzjn in #254
- chore: remove unused parameters in jax implementations by @kinalmehta in #264
Misc PR
- Show correct exception cause by @cool-RR in #205
- Remove pettingzoo's pistonball example by @vwxyzjn in #214
- Leverage CI to speed up poetry lock by @vwxyzjn in #235
- Ubuntu runner for poetry lock by @vwxyzjn in #236
- Remove the github pages CI in favor of vercel by @vwxyzjn in #241
- Clarify LICENSE info by @vwxyzjn in #253
- Update published paper citation by @vwxyzjn in #284
- Refactor dqn word choice by @vwxyzjn in #257
New Contributors
- @cool-RR made their first contribution in #205
- @Howuhh made their first contribution in #209
- @jseppanen made their first contribution in #238
- @joaogui1 made their first contribution in #225
- @kinalmehta made their first contribution in #222
- @ALPH2H made their first contribution in #282
Full Changelog: v1.0.0b1...v1.0.0b2