An in silico Environment and Benchmark Platform for Reinforcement Learning Based Dynamic Treatment Regime
DTR-Bench is an expanding reinforcement learning simulation platform with a unified pipeline including hyperparameter search, training, evaluation, and visualisation.
These instructions will get you a copy of the project up and running on your local machine.
- Python 3.10: The project is developed using Python 3.10. It is recommended to use the same version to avoid compatibility issues.
- Install DTR-Gym and DTR-Bench

  ```bash
  pip install DTRGym
  ```

- Install the required packages

  ```bash
  cd DTR-Bench
  pip install -r requirements.txt
  ```

- Test the installation

  ```bash
  python test_installation.py
  ```
We provide a simple example showing how to use DTR-Bench in the get_start.py file.
You can run the example with:

```bash
python get_start.py
```
After running the example, you will see a plot like the one below. It shows the effect of the treatment given by the trained RL policy in the simulation environment.
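If you want a sense of what such a quick-start script does before opening it, the sketch below creates a DTRGym environment, rolls out one episode, and reports the cumulative reward. It is only an illustration (a random policy stands in for a trained one); it is not the contents of get_start.py.

```python
# Illustrative quick-start-style rollout; NOT the actual get_start.py.
import gymnasium as gym
import DTRGym  # noqa: F401  (importing registers the simulation environments)

env = gym.make("AhnChemoEnv-discrete", n_act=11)
obs, info = env.reset(seed=0)

total_reward = 0.0
terminated = truncated = False
while not (terminated or truncated):
    action = env.action_space.sample()  # stand-in for a trained RL policy
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward

print(f"Episode return: {total_reward:.2f}")
```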
DTRBench provides a set of off-policy RL algorithms for training policies in the simulation environments. These policies are implemented on top of Tianshou. The supported off-policy RL algorithms include:
| Discrete | RNN-based | Continuous |
|---|---|---|
| DDQN | DQN-rnn | DDPG |
| DDQN-dueling | DDQN-rnn | TD3 |
| DQN | C51-rnn | SAC |
| C51 | discrete-SAC-rnn | |
| discrete-SAC | | |
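Since the policies are built on Tianshou, an algorithm from the table can also be instantiated directly in your own script. The sketch below shows one plausible way to set up a DQN policy on a DTRGym environment; the network size, learning rate, and target-update frequency are placeholder assumptions, and the DQNPolicy constructor signature varies slightly across Tianshou versions, so this is not the exact configuration used by DTRBench.

```python
# Hedged sketch: instantiating one listed off-policy algorithm (DQN) with
# Tianshou on a DTRGym environment. Hyperparameters are placeholders.
import gymnasium as gym
import torch
import DTRGym  # noqa: F401  (registers the simulation environments)
from tianshou.policy import DQNPolicy
from tianshou.utils.net.common import Net

env = gym.make("AhnChemoEnv-discrete", n_act=11)
state_shape = env.observation_space.shape
action_shape = env.action_space.n

net = Net(state_shape, action_shape, hidden_sizes=[128, 128])
optim = torch.optim.Adam(net.parameters(), lr=1e-3)
policy = DQNPolicy(net, optim, discount_factor=0.99, target_update_freq=320)
```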
Running functions for reinforcement learning are provided, including hyperparameter grid search, policy training and evaluation, and baseline policy evaluation.
DTRBench provides a standard visualisation tool for inspecting the treatment effect of the trained RL policy in the simulation environments. It can visualise the environment states, observations, actions, and rewards.
The DTR-Bench provides a set of APIs to interact with the simulation environments and the benchmark policies.
It can be used to:
- Create a simulation environment.
- Optimize the hyperparameters of the RL policy training.
- Train an RL policy in the simulation environments.
- Visualize the treatment effect of the trained RL policy.
```python
import gymnasium as gym
import DTRGym  # this line is necessary!

env = gym.make('AhnChemoEnv-discrete', n_act=11)
```

Please remember to import DTRGym to register the simulation environments.
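The created environment follows the standard Gymnasium API, so it can be inspected and stepped like any other Gymnasium environment. The snippet below assumes the discrete variant exposes a Discrete(n_act) action space.

```python
import gymnasium as gym
import DTRGym  # registers the simulation environments

env = gym.make('AhnChemoEnv-discrete', n_act=11)
print(env.observation_space)  # observation layout of the simulator
print(env.action_space)       # assumed to be Discrete(11) when n_act=11

obs, info = env.reset(seed=42)
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
env.close()
```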
- Discrete policies

  ```bash
  cd DTR-Bench
  export PYTHONPATH="."
  python DTRBench/run_rl/online_discrete_search.py --policy_name=DQN --task SimGlucoseEnv --n_trials 100 --num_actions 11 --setting 1
  ```
- Continuous policies

  ```bash
  cd DTR-Bench
  export PYTHONPATH="."
  python DTRBench/run_rl/online_continuous_search.py --policy_name=DDPG --task OberstSepsisEnv --n_trials 100 --setting 1
  ```
- Discrete Policies

  ```bash
  cd DTR-Bench
  export PYTHONPATH="."
  conda activate torch
  python DTRBench/run_rl/online_discrete_retrain.py --policy_name=DQN --task SimGlucoseEnv --num_actions 11 --setting 1
  ```
- Continuous Policies

  ```bash
  cd DTR-Bench
  export PYTHONPATH="."
  conda activate torch
  python DTRBench/run_rl/online_continuous_retrain.py --policy_name=DDPG --task SimGlucoseEnv --setting 1
  ```
- Baseline Policies (RandomPolicy, MaxPolicy, MinPolicy)

  ```bash
  cd DTR-Bench
  export PYTHONPATH="."
  conda activate torch
  python DTRBench/run_rl/online_baseline.py --task OberstSepsisEnv
  ```
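Conceptually, these baselines correspond to uniformly random, always-maximum, and always-minimum dosing. The sketch below mirrors that idea on a discrete action space; it is an illustration, not the RandomPolicy/MaxPolicy/MinPolicy implementations shipped with DTRBench.

```python
# Conceptual sketch of the three baselines on a discrete action space;
# not the DTRBench RandomPolicy/MaxPolicy/MinPolicy classes.
import gymnasium as gym
import DTRGym  # noqa: F401

env = gym.make("AhnChemoEnv-discrete", n_act=11)

def random_action(obs):
    return env.action_space.sample()   # uniformly random dose

def max_action(obs):
    return env.action_space.n - 1      # always the largest dose index

def min_action(obs):
    return 0                           # always the smallest dose index

obs, info = env.reset(seed=0)
obs, reward, terminated, truncated, info = env.step(random_action(obs))
```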
- Visualisation

  ```bash
  cd DTR-Bench
  export PYTHONPATH="."
  python DTRBench/visual_fn/visual.py
  ```
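As a rough illustration of the kind of figure such a visualisation produces, the sketch below records one episode and plots observations, actions, and rewards with matplotlib. It is a generic example under the assumption of the 'AhnChemoEnv-discrete' environment and a random policy; it does not call DTRBench's visual_fn API.

```python
# Generic illustration of visualising one treatment episode
# (observations, actions, rewards over time); not the visual_fn API.
import gymnasium as gym
import matplotlib.pyplot as plt
import numpy as np
import DTRGym  # noqa: F401

env = gym.make("AhnChemoEnv-discrete", n_act=11)
obs, info = env.reset(seed=0)

observations, actions, rewards = [obs], [], []
terminated = truncated = False
while not (terminated or truncated):
    action = env.action_space.sample()  # replace with a trained policy
    obs, reward, terminated, truncated, info = env.step(action)
    observations.append(obs)
    actions.append(action)
    rewards.append(reward)

fig, axes = plt.subplots(3, 1, sharex=True, figsize=(6, 8))
axes[0].plot(np.asarray(observations))
axes[0].set_ylabel("observation")
axes[1].step(range(len(actions)), actions)
axes[1].set_ylabel("action")
axes[2].plot(rewards)
axes[2].set_ylabel("reward")
axes[2].set_xlabel("decision step")
plt.tight_layout()
plt.show()
```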
The hyperparameters and test results are stored on Kaggle.
If you use the DTR-Bench in your research, please cite the following paper:
to be updated
Special thanks to the following contributors who make DTR-Bench possible:
- @Mingcheng Zhu - who developed DTRGym and produced extensive DTRBench experiments.
- To be continued