Reinforcement Learning with Policy Gradient

The goal is to build a deep policy network that is general enough to learn most games in OpenAI's Gym.
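As a rough illustration of what a policy network does, here is a minimal NumPy sketch of a forward pass that maps a game state to action probabilities and samples an action from them. The function names, the single hidden layer, and the layer sizes are assumptions made for this example; they are not taken from policy_gradient.py.

```python
import numpy as np

def init_policy(n_inputs, n_hidden, n_actions, seed=0):
    """Create random weights for a tiny one-hidden-layer policy (hypothetical sizes)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 1.0 / np.sqrt(n_inputs), (n_inputs, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 1.0 / np.sqrt(n_hidden), (n_hidden, n_actions)),
        "b2": np.zeros(n_actions),
    }

def action_probabilities(params, state):
    """Forward pass: state vector -> softmax distribution over actions."""
    hidden = np.maximum(0.0, state @ params["W1"] + params["b1"])  # ReLU
    logits = hidden @ params["W2"] + params["b2"]
    logits = logits - logits.max()  # shift for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum()

def choose_action(params, state, rng):
    """Sample an action index according to the policy's probabilities."""
    probs = action_probabilities(params, state)
    return int(rng.choice(len(probs), p=probs))
```

Sampling from the distribution, rather than always taking the most likely action, is what lets an untrained agent explore: with near-uniform probabilities it behaves much like a random player.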

To run this code, first install OpenAI's Gym: https://github.com/openai/gym
Then download this repo and run python run_cartpole.py to start the agent (or any other game script in this repo, such as python run_lunarlander.py) and watch it improve over time.
To run a Box2D game such as LunarLander, you also need to install the Box2D physics engine. From the root of your Gym checkout, run: pip install -e '.[box2d]'

Lunar Lander

Initially, the agent is as good as randomly picking the next action:

After several hundred episodes, the agent starts learning how to fly and hover around:

Finally, after about 3,000 episodes, the agent can land pretty well:

Initially, the agent is quite dumb, but it's exploring the state/action/reward space:

As more episodes go by, it starts to get better by learning from experience (using a reward-guided loss):

Eventually, the agent masters the game (trained on my MacBook Pro for ~10 minutes):

After 297 episodes the agent scored 617,332!
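
The reward-guided loss mentioned above can be sketched roughly as follows. This is a generic policy-gradient (REINFORCE-style) formulation in NumPy, not a copy of the code in policy_gradient.py: the function names, the discount factor gamma=0.99, and the normalization step are assumptions made for illustration. Each action's negative log-probability is weighted by the discounted, normalized reward that followed it, so actions that led to better-than-average outcomes become more likely.

```python
import numpy as np

def discount_and_normalize(rewards, gamma=0.99):
    """Turn per-step rewards into discounted returns with zero mean, unit variance."""
    rewards = np.asarray(rewards, dtype=np.float64)
    discounted = np.zeros_like(rewards)
    running = 0.0
    # Walk backwards so each step accumulates all future rewards.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        discounted[t] = running
    discounted -= discounted.mean()
    std = discounted.std()
    if std > 0:  # skip the division when every return is identical
        discounted /= std
    return discounted

def reward_guided_loss(action_probs, actions, rewards, gamma=0.99):
    """Mean of -log pi(a_t | s_t) weighted by the normalized return at step t."""
    action_probs = np.asarray(action_probs, dtype=np.float64)
    actions = np.asarray(actions)
    # Probability the policy assigned to the action actually taken at each step.
    chosen = action_probs[np.arange(len(actions)), actions]
    neg_log_prob = -np.log(chosen + 1e-12)  # small epsilon avoids log(0)
    return float(np.mean(neg_log_prob * discount_and_normalize(rewards, gamma)))
```

In a full training loop, the agent would record states, actions, and rewards for one episode, compute this loss, and take a gradient step on the network weights with a deep learning framework; that part is omitted here.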
