rl

This repo is for learning RL basics. I intend to keep things simple, for example:

- Focus on simple Gym environments
- Minimal dependencies; avoid advanced PyTorch implementations
- Only use implementation tricks that are necessary to make things work
- Tensorize as much as possible, e.g. episode masks to handle variable sequence lengths
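
As a rough illustration of the episode-mask idea: episodes of different lengths are padded to a common length, and a 0/1 mask keeps the padded steps out of any average. The sketch below uses NumPy rather than PyTorch to stay small. The names `episode_mask` and `masked_mean` and the example numbers are made up for illustration and are not taken from this repo.

```python
import numpy as np

def episode_mask(lengths, max_len):
    """Return a (batch, max_len) mask: 1.0 for real steps, 0.0 for padding."""
    lengths = np.asarray(lengths)
    steps = np.arange(max_len)  # shape (max_len,)
    # Broadcasting compares every step index against every episode length.
    return (steps[None, :] < lengths[:, None]).astype(np.float64)

def masked_mean(values, mask):
    """Average `values` over real steps only, ignoring padded positions."""
    return (values * mask).sum() / mask.sum()

# Three episodes of length 3, 1 and 2, padded to a common length of 3.
lengths = [3, 1, 2]
mask = episode_mask(lengths, max_len=3)

# Hypothetical per-step losses; padded slots hold junk values (99.0).
per_step_loss = np.array([
    [1.0, 2.0, 3.0],
    [4.0, 99.0, 99.0],
    [5.0, 6.0, 99.0],
])
loss = masked_mean(per_step_loss, mask)  # averages the six real steps only
```

The same comparison-by-broadcasting pattern carries over to tensors, so the whole batch can be processed without a Python loop over episodes.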

Implementations

A2C suffers from instability issues. A few helpful tricks:
- advantage normalization
- use a learning rate schedule
- gradient norm clipping
- carefully tune the learning rate and other hyperparameters
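
The first three tricks above can be sketched in plain Python as follows. This is a dependency-free illustration, not the code used in this repo: the function names, the linear shape of the schedule, and the default values (`lr_start=3e-4`, `eps=1e-8`) are all assumptions. In PyTorch, gradient clipping would normally be done with `torch.nn.utils.clip_grad_norm_` instead of a hand-written helper.

```python
import math
import statistics

def normalize_advantages(advantages, eps=1e-8):
    """Shift advantages to zero mean and scale them to unit standard deviation."""
    mean = statistics.fmean(advantages)
    std = statistics.pstdev(advantages)
    return [(a - mean) / (std + eps) for a in advantages]

def linear_lr(step, total_steps, lr_start=3e-4, lr_end=0.0):
    """Linearly anneal the learning rate from lr_start to lr_end (assumed shape)."""
    frac = min(step, total_steps) / total_steps
    return lr_start + frac * (lr_end - lr_start)

def clip_grad_norm(grads, max_norm):
    """Rescale a flat list of gradients so its global L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    # Gradients that are already small enough are left unchanged (scale == 1).
    scale = min(1.0, max_norm / (total_norm + 1e-6))
    return [g * scale for g in grads]
```

Normalizing advantages keeps the policy-gradient scale roughly constant across batches, while clipping bounds the size of any single update; both reduce the chance that one bad batch destabilizes training.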
PPO makes training much more stable, as shown by a comparison of A2C and PPO (both using GAE with the same lambda to compute advantages) on CartPole-v1, run at commit 8181d9f.
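
For readers new to the two ingredients named here, the sketch below shows GAE and the PPO clipped surrogate loss in plain Python. It is a minimal illustration under stated assumptions, not this repo's implementation: the function names, the list-based inputs, and the defaults `gamma=0.99`, `lam=0.95` and `clip_eps=0.2` are chosen for the example only.

```python
import math

def gae(rewards, values, last_value, dones, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout.

    values[t] is V(s_t), last_value is V of the state after the final step,
    and dones[t] is True when step t ended the episode.
    """
    advantages = [0.0] * len(rewards)
    next_value, next_adv = last_value, 0.0
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error; no bootstrapping across an episode boundary.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Exponentially weighted sum of future TD errors.
        next_adv = delta + gamma * lam * not_done * next_adv
        advantages[t] = next_adv
        next_value = values[t]
    return advantages

def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """PPO clipped surrogate loss (negated objective), averaged over samples."""
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)
```

The clipping removes the incentive to move the policy ratio far from 1 in a single update, which is the usual explanation for PPO being more stable than a plain A2C update on the same advantages.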

Other approaches to making actor-critic methods more stable, e.g. A3C and SAC, are not implemented in this repo.

Reinforcement learning materials:
- Sergey Levine's CS285 open course
- OpenAI RL Spinning Up
