Update configs, scripts, and instructions (#2)
* update configs

* Update setup.py

* update configs

* fix bugs

* update task

* clean steup

* update train and eval scripts

* clean setup

* update README

* update training scripts

* update configs

* remove bc-frontier

* format

* format

* add fsrl dependency

* Update README.md

* Update setup.py

* update cdt

* add bc frontier

* clean up

---------

Co-authored-by: Ja4822 <[email protected]>
liuzuxin and Ja4822 authored Jun 15, 2023
1 parent 1840b85 commit 3e586b7
Showing 32 changed files with 1,056 additions and 477 deletions.
78 changes: 60 additions & 18 deletions README.md
@@ -1,59 +1,101 @@
<div align="center">
<a href="http://www.offline-saferl.org"><img width="300px" height="auto" src="https://github.com/liuzuxin/osrl/raw/main/docs/_static/images/osrl-logo.png"></a>
</div>

<br/>

<div align="center">

<a>![Python 3.8+](https://img.shields.io/badge/Python-3.8%2B-brightgreen.svg)</a>
[![License](https://img.shields.io/badge/License-MIT-blue.svg)](#license)
[![PyPI](https://img.shields.io/pypi/v/osrl-lib?logo=pypi)](https://pypi.org/project/osrl-lib)
[![GitHub Repo Stars](https://img.shields.io/github/stars/liuzuxin/osrl?color=brightgreen&logo=github)](https://github.com/liuzuxin/osrl/stargazers)
[![Downloads](https://static.pepy.tech/personalized-badge/osrl-lib?period=total&left_color=grey&right_color=blue&left_text=downloads)](https://pepy.tech/project/osrl-lib)
<!-- [![Documentation Status](https://img.shields.io/readthedocs/fsrl?logo=readthedocs)](https://fsrl.readthedocs.io) -->
<!-- [![CodeCov](https://codecov.io/github/liuzuxin/fsrl/branch/main/graph/badge.svg?token=BU27LTW9F3)](https://codecov.io/github/liuzuxin/fsrl)
[![Tests](https://github.com/liuzuxin/fsrl/actions/workflows/test.yml/badge.svg)](https://github.com/liuzuxin/fsrl/actions/workflows/test.yml) -->
<!-- [![CodeCov](https://img.shields.io/codecov/c/github/liuzuxin/fsrl/main?logo=codecov)](https://app.codecov.io/gh/liuzuxin/fsrl) -->
<!-- [![tests](https://img.shields.io/github/actions/workflow/status/liuzuxin/fsrl/test.yml?label=tests&logo=github)](https://github.com/liuzuxin/fsrl/tree/HEAD/tests) -->
<!-- [![PyPI](https://img.shields.io/pypi/v/fsrl?logo=pypi)](https://pypi.org/project/fsrl) -->
<!-- [![GitHub Repo Stars](https://img.shields.io/github/stars/liuzuxin/fsrl?color=brightgreen&logo=github)](https://github.com/liuzuxin/fsrl/stargazers)
[![Downloads](https://static.pepy.tech/personalized-badge/fsrl?period=total&left_color=grey&right_color=blue&left_text=downloads)](https://pepy.tech/project/fsrl) -->
<!-- [![License](https://img.shields.io/github/license/liuzuxin/fsrl?label=license)](#license) -->

</div>

---

**OSRL (Offline Safe Reinforcement Learning)** offers a collection of elegant and extensible implementations of state-of-the-art offline safe reinforcement learning (RL) algorithms. Aimed at propelling research in offline safe RL, OSRL serves as a solid foundation to implement, benchmark, and iterate on safe RL solutions.

The OSRL package is a crucial component of our larger benchmarking suite for offline safe learning, which also includes [DSRL](https://github.com/liuzuxin/DSRL) and [FSRL](https://github.com/liuzuxin/FSRL), and is built to facilitate the development of robust and reliable offline safe RL solutions.

To learn more, please visit our [project website](http://www.offline-saferl.org).

## Structure
The structure of this repo is as follows:
```
├── examples
│ ├── configs # the training configs of each algorithm
│ ├── eval # the evaluation scripts
│ ├── train # the training scripts
├── osrl
│ ├── algorithms # offline safe RL algorithms
│ ├── common # base networks and utils
```
The implemented offline safe RL and imitation learning algorithms include:

| Algorithm | Type | Description |
|:-------------------:|:-----------------:|:------------------------:|
| BCQ-Lag | Q-learning | [BCQ](https://arxiv.org/pdf/1812.02900.pdf) with [PID Lagrangian](https://arxiv.org/abs/2007.03964) |
| BEAR-Lag | Q-learning | [BEAR](https://arxiv.org/abs/1906.00949) with [PID Lagrangian](https://arxiv.org/abs/2007.03964) |
| CPQ | Q-learning | [Constraints Penalized Q-learning (CPQ)](https://arxiv.org/abs/2107.09003) |
| COptiDICE | Distribution Correction Estimation | [Offline Constrained Policy Optimization via Stationary Distribution Correction Estimation](https://arxiv.org/abs/2204.08957) |
| CDT | Sequential Modeling | [Constrained Decision Transformer](https://arxiv.org/abs/2302.07351) |
| BC-All | Imitation Learning | [Behavior Cloning](https://arxiv.org/abs/2302.07351) with all datasets |
| BC-Safe | Imitation Learning | [Behavior Cloning](https://arxiv.org/abs/2302.07351) with safe trajectories |
| BC-Frontier | Imitation Learning | [Behavior Cloning](https://arxiv.org/abs/2302.07351) with high-reward trajectories |
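Several entries above pair a base offline RL algorithm with a PID Lagrangian. As a rough sketch of that technique (an illustrative helper, not OSRL's actual implementation), the Lagrange multiplier that scales the cost penalty can be updated from the observed episode cost with proportional, integral, and derivative terms:

```python
class PIDLagrangian:
    """Tracks a non-negative Lagrange multiplier from observed episode costs."""

    def __init__(self, kp=0.1, ki=0.01, kd=0.01, cost_limit=10.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.cost_limit = cost_limit
        self.integral = 0.0
        self.prev_violation = 0.0

    def update(self, episode_cost):
        # How far the observed cost exceeds the allowed limit.
        violation = episode_cost - self.cost_limit
        self.integral = max(0.0, self.integral + violation)
        derivative = max(0.0, violation - self.prev_violation)
        self.prev_violation = violation
        # The multiplier scales the cost penalty in the policy loss; it is
        # clipped at zero so it never rewards constraint violation.
        return max(0.0, self.kp * violation
                   + self.ki * self.integral
                   + self.kd * derivative)
```

The gains `kp`, `ki`, `kd`, and `cost_limit` here are placeholder values; the integral term prevents steady-state constraint violation that a purely proportional penalty would tolerate.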


## Installation

OSRL is currently hosted on [PyPI](https://pypi.org/project/osrl-lib); you can install it with:

```bash
pip install osrl-lib
```

You can also pull the repo and install it from source:
```bash
git clone https://github.com/liuzuxin/OSRL.git
cd osrl
pip install -e .
```

If you want to use the `CDT` algorithm, please also manually install the `OApackage`:
```bash
pip install OApackage==2.7.6
```

## How to use OSRL

Example usage is in the `examples` folder, where you can find the training and evaluation scripts for all the algorithms.
All the parameters and their default configs for each algorithm are available in the `examples/configs` folder.
OSRL uses the `WandbLogger` from [FSRL](https://github.com/liuzuxin/FSRL) and the [Pyrallis](https://github.com/eladrich/pyrallis) configuration system. The offline datasets and environments are provided by [DSRL](https://github.com/liuzuxin/DSRL), so make sure you install both of them first.
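Conceptually, each algorithm's config is a plain dataclass whose fields double as CLI flags. A minimal sketch of that pattern (the `TrainConfig` fields below are illustrative, not OSRL's exact schema):

```python
from dataclasses import dataclass, replace

@dataclass
class TrainConfig:
    task: str = "OfflineCarCircle-v0"
    cost_limit: int = 10
    update_steps: int = 100_000

# Pyrallis maps CLI flags such as `--task` or `--cost_limit` onto these
# fields; the same override can be expressed in code with `replace`,
# which returns a new config with the remaining fields unchanged.
cfg = replace(TrainConfig(), task="OfflineAntCircle-v0", cost_limit=20)
```

This is why every parameter listed in `examples/configs` can be overridden from the command line without extra argument-parsing code.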

### Training
For example, to train with the `bcql` method, simply run the training script and override the default parameters as needed:

```shell
python examples/train/train_bcql.py --task OfflineCarCircle-v0 --param1 args1 ...
```
By default, the config file and the logs during training will be written to the `logs/` folder, and the training plots can be viewed online using Wandb.

You can also launch a batch of experiments sequentially or in parallel via the [EasyRunner](https://github.com/liuzuxin/easy-runner) package; see `examples/train_all_tasks.py` for details.

### Evaluation
To evaluate a trained agent, for example, a BCQ agent, simply run
```
python examples/eval/eval_bcql.py --path path_to_model --eval_episodes 20
```
It will load the config file from `path_to_model/config.yaml` and the model file from `path_to_model/checkpoints/model.pt`, run 20 episodes, and print the average normalized reward and cost.
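Conceptually, that evaluation step boils down to rolling out the agent for `eval_episodes` episodes and averaging the normalized reward alongside the cost. A minimal sketch (the `rollout_fn` stand-in and the `reward_range` bounds are assumptions; the real normalization uses task statistics from DSRL):

```python
def normalized_score(raw, low, high):
    # Linear normalization against task-specific reward bounds.
    return (raw - low) / (high - low)

def evaluate(rollout_fn, eval_episodes=20, reward_range=(0.0, 500.0)):
    # rollout_fn plays one episode and returns (episode_reward, episode_cost).
    rewards, costs = [], []
    for _ in range(eval_episodes):
        ep_reward, ep_cost = rollout_fn()
        rewards.append(normalized_score(ep_reward, *reward_range))
        costs.append(ep_cost)
    return sum(rewards) / len(rewards), sum(costs) / len(costs)
```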


## Contributing

If you have any suggestions or find any bugs, please feel free to submit an issue or a pull request. We welcome contributions from the community!
49 changes: 45 additions & 4 deletions examples/configs/bc_configs.py
@@ -1,12 +1,13 @@
from dataclasses import asdict, dataclass
from typing import Any, DefaultDict, Dict, List, Optional, Tuple

from pyrallis import field


@dataclass
class BCTrainConfig:
# wandb params
project: str = "OSRL-baselines"
group: Optional[str] = None
name: Optional[str] = None
prefix: Optional[str] = "BC"
@@ -16,7 +17,7 @@ class BCTrainConfig:
# dataset params
outliers_percent: Optional[float] = None
noise_scale: Optional[float] = None
inpaint_ranges: Optional[Tuple[Tuple[float, float, float, float], ...]] = None
epsilon: Optional[float] = None
density: float = 1.0
# training params
@@ -29,7 +30,7 @@ class BCTrainConfig:
cost_limit: int = 10
episode_len: int = 300
batch_size: int = 512
update_steps: int = 100_000
num_workers: int = 8
bc_mode: str = "all" # "all", "safe", "risky", "frontier", "boundary", "multi-task"
# model params
@@ -80,6 +81,20 @@ class BCAntCircleConfig(BCTrainConfig):
episode_len: int = 500


@dataclass
class BCBallRunConfig(BCTrainConfig):
# training params
task: str = "OfflineBallRun-v0"
episode_len: int = 100


@dataclass
class BCBallCircleConfig(BCTrainConfig):
# training params
task: str = "OfflineBallCircle-v0"
episode_len: int = 200


@dataclass
class BCCarButton1Config(BCTrainConfig):
# training params
@@ -191,89 +206,113 @@ class BCPointPush2Config(BCTrainConfig):
task: str = "OfflinePointPush2Gymnasium-v0"
episode_len: int = 1000


@dataclass
class BCAntVelocityConfig(BCTrainConfig):
# training params
task: str = "OfflineAntVelocityGymnasium-v1"
episode_len: int = 1000


@dataclass
class BCHalfCheetahVelocityConfig(BCTrainConfig):
# training params
task: str = "OfflineHalfCheetahVelocityGymnasium-v1"
episode_len: int = 1000


@dataclass
class BCHopperVelocityConfig(BCTrainConfig):
# training params
task: str = "OfflineHopperVelocityGymnasium-v1"
episode_len: int = 1000


@dataclass
class BCSwimmerVelocityConfig(BCTrainConfig):
# training params
task: str = "OfflineSwimmerVelocityGymnasium-v1"
episode_len: int = 1000


@dataclass
class BCWalker2dVelocityConfig(BCTrainConfig):
# training params
task: str = "OfflineWalker2dVelocityGymnasium-v1"
episode_len: int = 1000


@dataclass
class BCEasySparseConfig(BCTrainConfig):
# training params
task: str = "OfflineMetadrive-easysparse-v0"
episode_len: int = 1000
update_steps: int = 200_000


@dataclass
class BCEasyMeanConfig(BCTrainConfig):
# training params
task: str = "OfflineMetadrive-easymean-v0"
episode_len: int = 1000
update_steps: int = 200_000


@dataclass
class BCEasyDenseConfig(BCTrainConfig):
# training params
task: str = "OfflineMetadrive-easydense-v0"
episode_len: int = 1000
update_steps: int = 200_000


@dataclass
class BCMediumSparseConfig(BCTrainConfig):
# training params
task: str = "OfflineMetadrive-mediumsparse-v0"
episode_len: int = 1000
update_steps: int = 200_000


@dataclass
class BCMediumMeanConfig(BCTrainConfig):
# training params
task: str = "OfflineMetadrive-mediummean-v0"
episode_len: int = 1000
update_steps: int = 200_000


@dataclass
class BCMediumDenseConfig(BCTrainConfig):
# training params
task: str = "OfflineMetadrive-mediumdense-v0"
episode_len: int = 1000
update_steps: int = 200_000


@dataclass
class BCHardSparseConfig(BCTrainConfig):
# training params
task: str = "OfflineMetadrive-hardsparse-v0"
episode_len: int = 1000
update_steps: int = 200_000


@dataclass
class BCHardMeanConfig(BCTrainConfig):
# training params
task: str = "OfflineMetadrive-hardmean-v0"
episode_len: int = 1000
update_steps: int = 200_000


@dataclass
class BCHardDenseConfig(BCTrainConfig):
# training params
task: str = "OfflineMetadrive-harddense-v0"
episode_len: int = 1000
update_steps: int = 200_000


BC_DEFAULT_CONFIG = {
# bullet_safety_gym
@@ -283,6 +322,8 @@ class BCHardDenseConfig(BCTrainConfig):
"OfflineDroneCircle-v0": BCDroneCircleConfig,
"OfflineCarRun-v0": BCCarRunConfig,
"OfflineAntCircle-v0": BCAntCircleConfig,
"OfflineBallCircle-v0": BCBallCircleConfig,
"OfflineBallRun-v0": BCBallRunConfig,
# safety_gymnasium: car
"OfflineCarButton1Gymnasium-v0": BCCarButton1Config,
"OfflineCarButton2Gymnasium-v0": BCCarButton2Config,
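The task-to-config mapping above lets a training script resolve the right defaults from a task name alone. A trimmed, illustrative sketch of that registry pattern (the real `BC_DEFAULT_CONFIG` covers every supported task):

```python
from dataclasses import dataclass

@dataclass
class BCTrainConfig:
    task: str = "OfflineCarCircle-v0"
    episode_len: int = 300
    update_steps: int = 100_000

@dataclass
class BCAntCircleConfig(BCTrainConfig):
    # Subclasses override only the fields that differ from the base config.
    task: str = "OfflineAntCircle-v0"
    episode_len: int = 500

# Sketch of the registry: task name -> config class.
BC_DEFAULT_CONFIG_SKETCH = {
    "OfflineCarCircle-v0": BCTrainConfig,
    "OfflineAntCircle-v0": BCAntCircleConfig,
}

# Instantiating the looked-up class yields per-task defaults.
cfg = BC_DEFAULT_CONFIG_SKETCH["OfflineAntCircle-v0"]()
```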
