The LunarLanderContinuous environment is described here.
The first part of the video shows the behaviour of the untrained agent; the second part shows, for comparison, the behaviour of the trained agent.
My learning algorithm is Deep Deterministic Policy Gradient (DDPG).
DDPG is an actor-critic algorithm built around two neural networks, one for the actor and one for the critic. At each time step the actor computes an action vector for the current state and the critic produces a temporal-difference (TD) error signal.
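As an illustration only (the notebook's exact architecture may differ), the two networks could be defined in PyTorch roughly like this; the hidden layer sizes and the tanh output that bounds actions to [-1, 1] are assumptions:

```python
import torch
import torch.nn as nn


class Actor(nn.Module):
    """Maps a state to a deterministic action vector."""
    def __init__(self, state_size, action_size, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_size, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_size), nn.Tanh(),  # bound actions to [-1, 1]
        )

    def forward(self, state):
        return self.net(state)


class Critic(nn.Module):
    """Maps a (state, action) pair to an estimated Q-value."""
    def __init__(self, state_size, action_size, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_size + action_size, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))
```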
DDPG uses a stochastic behaviour policy for good exploration and a deterministic target policy for estimation.
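A common way to obtain the stochastic behaviour policy, following the original DDPG paper, is to add Ornstein-Uhlenbeck noise to the deterministic actor output. A minimal sketch (the noise parameters are assumptions, not values taken from the notebook):

```python
import numpy as np


class OUNoise:
    """Ornstein-Uhlenbeck process for temporally correlated exploration noise."""
    def __init__(self, size, mu=0.0, theta=0.15, sigma=0.2):
        self.mu = mu * np.ones(size)
        self.theta = theta
        self.sigma = sigma
        self.reset()

    def reset(self):
        self.state = self.mu.copy()

    def sample(self):
        dx = self.theta * (self.mu - self.state) + self.sigma * np.random.randn(len(self.state))
        self.state = self.state + dx
        return self.state


# Behaviour policy (sketch): deterministic actor output plus noise,
# clipped to the environment's valid action range.
# action = np.clip(actor_output + noise.sample(), -1.0, 1.0)
```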
The current state is the input to the actor network, and its output is the action vector (for LunarLanderContinuous, two continuous engine throttle values). The deterministic policy gradient theorem provides the update rule for the weights of the actor network.
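In code, this update typically ascends the critic's estimate of the actor's own actions. A hedged sketch, assuming the Actor/Critic modules above and an actor_optimizer that are not taken from the notebook:

```python
# Deterministic policy gradient step (sketch): maximise Q(s, mu(s)) w.r.t. the actor weights,
# i.e. minimise the negative mean Q-value over a replay batch of states.
actor_loss = -critic(states, actor(states)).mean()

actor_optimizer.zero_grad()
actor_loss.backward()
actor_optimizer.step()
```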
The critic's output is the estimated Q-value of the current state and the action chosen by the actor. The critic network is updated using the gradients obtained from this TD error signal.
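A sketch of that critic update, assuming hypothetical target networks actor_target/critic_target, a replay batch (states, actions, rewards, next_states, dones), and a discount factor gamma:

```python
import torch
import torch.nn.functional as F

# TD target uses the target networks for stability.
with torch.no_grad():
    next_actions = actor_target(next_states)
    q_targets = rewards + gamma * (1 - dones) * critic_target(next_states, next_actions)

# The TD error (q_expected - q_targets) drives the critic update via an MSE loss.
q_expected = critic(states, actions)
critic_loss = F.mse_loss(q_expected, q_targets)

critic_optimizer.zero_grad()
critic_loss.backward()
critic_optimizer.step()
```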
More general information about DDPG can be found in this paper.
Start the Jupyter notebook LunarLanderContinuous-v2 (DDPG).ipynb and follow the instructions.