Description

This is a small, simple collection of reinforcement learning algorithms. The core idea of this repo is to keep the structure minimal, so that each algorithm is easy to understand and modify. For this reason, each algorithm lives in its own folder, independent of the others. Only approximators (neural networks, linear functions, ...), policy classes, and auxiliary functions (for plotting or collecting data with gym-like environments) are shared.

Note that an algorithm can have different versions. For example, SPG can learn the critic from Monte Carlo estimates or by temporal-difference learning.

The repository has a modular structure and no installation is needed. To run an algorithm, from the root folder execute
python3 -m <ALG>.<RUN_SCRIPT> <ENV_NAME> <SEED>
(the seed is optional and defaults to 1). At each iteration, the most important statistics (average return, value function loss, entropy, ...) are saved in
data-trial/<ALG_NAME>/<ENV_NAME>/<DATE_TIME>.dat.
For example, running
python3 -m ddpg.ddpg Pendulum-v0 0
will generate
data-trial/ddpg/Pendulum-v0/180921_155842.dat.
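
The command-line convention and output path above can be sketched as follows. This is only an illustration, not the repository's code: the helper names `parse_run_args` and `trial_path` are hypothetical, and the timestamp format `%y%m%d_%H%M%S` is an assumption inferred from the example file name `180921_155842.dat`.

```python
from datetime import datetime


def parse_run_args(argv):
    """Return (env_name, seed) from the arguments that follow the module name.

    Hypothetical helper: mirrors `python3 -m <ALG>.<RUN_SCRIPT> <ENV_NAME> <SEED>`.
    """
    if not argv:
        raise ValueError("usage: python3 -m <ALG>.<RUN_SCRIPT> <ENV_NAME> [<SEED>]")
    env_name = argv[0]
    # The seed is optional; the README states that it defaults to 1.
    seed = int(argv[1]) if len(argv) > 1 else 1
    return env_name, seed


def trial_path(alg_name, env_name, now=None, root="data-trial"):
    """Build data-trial/<ALG_NAME>/<ENV_NAME>/<DATE_TIME>.dat as a string."""
    now = now or datetime.now()
    # Assumed format (yymmdd_HHMMSS), inferred from the example file name.
    stamp = now.strftime("%y%m%d_%H%M%S")
    return "/".join([root, alg_name, env_name, stamp + ".dat"])
```

For instance, under these assumptions `trial_path("ddpg", "Pendulum-v0", datetime(2018, 9, 21, 15, 58, 42))` reproduces the path shown in the example above.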

You can also save/load the learned model and visualize the graph. For more info, check demo.py. The demo also shows how to use the LQR environment and how to plot value functions.

Finally, use any of the run scripts in the root folder to run several trials of the same algorithm in parallel (see the scripts for instructions).
With the data generated from the runs, you can plot the average results with a 95% confidence interval using plot_shaded.py, or plot all learning curves together with plot_all.py (see the scripts for instructions).
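
As a rough illustration of what "average results with a 95% confidence interval" means, the sketch below computes a per-iteration mean and interval across trials. It assumes a normal approximation (mean ± 1.96 standard errors) and equal-length curves; plot_shaded.py may compute its interval differently, and the function name `mean_and_ci95` is hypothetical.

```python
import math
import statistics


def mean_and_ci95(curves):
    """Per-iteration mean and 95% confidence bounds across trials.

    curves: list of equal-length lists, one learning curve per trial.
    Returns three lists: (means, lower_bounds, upper_bounds).
    Assumes a normal approximation, i.e. mean +/- 1.96 * std / sqrt(n).
    """
    n = len(curves)
    if n < 2:
        raise ValueError("need at least two trials to estimate an interval")
    means, lowers, uppers = [], [], []
    for values in zip(*curves):  # iterate over iterations, not over trials
        m = statistics.mean(values)
        half_width = 1.96 * statistics.stdev(values) / math.sqrt(n)
        means.append(m)
        lowers.append(m - half_width)
        uppers.append(m + half_width)
    return means, lowers, uppers
```

The lower and upper lists are what a shaded plot would typically fill between, with the mean drawn as a line on top.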

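Note that all scripts use flexible memory (GPU memory is allocated on demand rather than reserved up front), i.e.,

import tensorflow as tf  # TensorFlow 1.x API

config_tf = tf.ConfigProto()
config_tf.gpu_options.allow_growth = True  # grow GPU memory usage as needed
session = tf.Session(config=config_tf)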

Requirements

Later versions of tensorflow may raise warnings.

You can also use other physics simulators, such as Roboschool, PyBullet and MuJoCo.

Common files

  • approximators.py : neural network, random Fourier features, polynomial features
  • average_env.py : introduces state resets to consider average return MDPs
  • cross_validation.py : function to minimize a loss function with cross-validation
  • data_collection.py : functions for sampling MDP transitions and getting mini-batches
  • filter_env.py : modifies a gym environment to have states and actions normalized in [-1,1]
  • logger.py : creates folders for saving data
  • noise.py : noise functions
  • plotting.py : to plot value functions
  • policy.py : implementation of common policies
  • rl_utils.py : RL functions, such as generalized advantage estimation and retrace
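
To give a flavor of the RL functions mentioned for rl_utils.py, here is a minimal sketch of generalized advantage estimation for a single trajectory. It is not the repository's implementation: the function name `gae`, its signature, and the default values of `gamma` and `lam` are assumptions made for illustration.

```python
def gae(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Generalized advantage estimation for one trajectory.

    rewards:    r_0, ..., r_{T-1}
    values:     V(s_0), ..., V(s_{T-1})
    last_value: V(s_T), use 0.0 if s_T is terminal
    Returns the list of advantages A_0, ..., A_{T-1}.
    """
    if len(rewards) != len(values):
        raise ValueError("rewards and values must have the same length")
    advantages = [0.0] * len(rewards)
    next_value = last_value
    running = 0.0
    # Work backwards: A_t = delta_t + gamma * lam * A_{t+1}
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]  # TD error
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages
```

With `lam=0` this reduces to the one-step TD error, and with `lam=1` it becomes the discounted return minus the value baseline, which is the usual bias/variance trade-off the parameter controls.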

Algorithm-specific files

  • solver.py : (optional) defines optimization routines required by the algorithm
  • hyperparameters.py : defines the hyperparameters (e.g., number of transitions per iteration, network sizes and learning rates)
  • <NAME>.py : script to run the algorithm (e.g., ppo.py or ddpg.py)

Implemented algorithms


All implementations are very basic: there is no reward/gradient clipping, hyperparameter tuning, decaying KL/entropy coefficient, batch normalization, standardization with running mean and std, ...