This collection showcases various projects focused on Deep Reinforcement Learning techniques. The projects are organized in a matrix structure: [environment x algorithm], where environment represents the challenge to be tackled, and algorithm denotes the method employed to solve it. In certain instances, multiple algorithms are applied to the same environment. Each project is presented as a Jupyter notebook, complete with a comprehensive training log.
The collection encompasses the following environments:
AntBulletEnv, BipedalWalker, BipedalWalkerHardcore, CarRacing, CartPole, Crawler, HalfCheetahBulletEnv,
HopperBulletEnv, LunarLander, LunarLanderContinuous, Markov Decision 6x6, Minitaur, Minitaur with Duck,
MountainCar, MountainCarContinuous, Pong, Navigation, Reacher, Snake, Tennis, Walker2DBulletEnv.
Four environments (Navigation, Crawler, Reacher, Tennis) are solved in the framework of the
Udacity Deep Reinforcement Learning Nanodegree Program.
-
Monte-Carlo Methods
In Monte Carlo (MC) methods, we play episodes of the game through to completion, collecting the rewards along the way, and then work backward through the episode to compute the return for every state visited. Repeating this over many episodes and averaging the returns gives an estimate of each state's value. -
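Below is a minimal, self-contained sketch of first-visit MC prediction on a toy 1-D random walk; the environment, the random policy, and all constants are illustrative and are not taken from the notebooks.

```python
# First-visit Monte Carlo prediction on a toy 1-D random walk (illustrative only).
import random
from collections import defaultdict

N_STATES = 6          # states 0..5; 0 and 5 are terminal
GAMMA = 1.0

def play_episode():
    """Follow a random policy from the middle state; return [(state, reward), ...]."""
    state, trajectory = N_STATES // 2, []
    while state not in (0, N_STATES - 1):
        next_state = state + random.choice((-1, 1))
        reward = 1.0 if next_state == N_STATES - 1 else 0.0
        trajectory.append((state, reward))
        state = next_state
    return trajectory

returns, V = defaultdict(list), defaultdict(float)
for _ in range(5000):
    trajectory = play_episode()
    visited_states = [s for s, _ in trajectory]
    G = 0.0
    # Walk backward through the episode, accumulating the return G.
    for t in range(len(trajectory) - 1, -1, -1):
        state, reward = trajectory[t]
        G = reward + GAMMA * G
        if state not in visited_states[:t]:       # first visit of this state
            returns[state].append(G)
            V[state] = sum(returns[state]) / len(returns[state])

print({s: round(V[s], 2) for s in sorted(V)})
```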
Function Approximation and Neural Network
The Universal Approximation Theorem (UAT) states that a feed-forward neural network with a single hidden layer and a finite number of nodes can approximate any continuous function, provided mild assumptions about the activation function are met. -
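As a small illustration of this idea, the sketch below fits a single-hidden-layer network to a continuous function; the framework (PyTorch), layer width, activation, and training settings are illustrative choices.

```python
# Minimal sketch: a single-hidden-layer network approximating a continuous
# function (sin here), in the spirit of the UAT. Hyperparameters are illustrative.
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Linear(1, 64),   # one hidden layer with a finite number of units
    nn.Tanh(),          # a non-polynomial activation satisfies the UAT assumptions
    nn.Linear(64, 1),
)

x = torch.linspace(-3.14, 3.14, 256).unsqueeze(1)
y = torch.sin(x)
optimizer = torch.optim.Adam(net.parameters(), lr=1e-2)

for step in range(2000):
    loss = nn.functional.mse_loss(net(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f"final MSE: {loss.item():.5f}")
```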
Policy-Based Methods, Hill-Climbing, Simulated Annealing
Random-restart hill-climbing is often surprisingly effective. Simulated annealing is a useful probabilistic technique because it can escape local extrema instead of mistaking them for global ones. -
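The sketch below illustrates simulated annealing on a toy 1-D multimodal function; the objective, noise scale, and cooling schedule are illustrative assumptions rather than anything used in the notebooks.

```python
# Simulated annealing on a 1-D multimodal objective (illustrative only).
import math
import random

def objective(x):
    # Global maximum near x = 0, plus local maxima that trap plain hill-climbing.
    return math.exp(-x * x) + 0.3 * math.cos(5 * x)

x, best_x = 3.0, 3.0
temperature = 1.0
for step in range(20000):
    candidate = x + random.gauss(0.0, 0.3)
    delta = objective(candidate) - objective(x)
    # Always accept improvements; accept worse moves with probability exp(delta / T),
    # which lets the search escape local maxima while the temperature is high.
    if delta > 0 or random.random() < math.exp(delta / temperature):
        x = candidate
        if objective(x) > objective(best_x):
            best_x = x
    temperature = max(1e-3, temperature * 0.9995)   # gradual cooling

print(f"best x = {best_x:.3f}, objective = {objective(best_x):.3f}")
```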
Policy-Gradient Methods, REINFORCE, PPO
Define a performance measure J(\theta) to maximize. Learn the policy parameters \theta through approximate gradient ascent.
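A minimal sketch of the REINFORCE update follows: gradient ascent on J(\theta) is implemented by descending the negated objective, the sum of log-probabilities of the taken actions weighted by their returns. The policy network and the trajectory data below are placeholder assumptions.

```python
# REINFORCE update step with placeholder trajectory data (illustrative only).
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 2))  # 4-dim state, 2 actions
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)

states = torch.randn(10, 4)              # stand-in for states collected in one episode
actions = torch.randint(0, 2, (10,))     # stand-in for the actions that were taken
returns = torch.linspace(10.0, 1.0, 10)  # stand-in discounted returns G_t

log_probs = torch.distributions.Categorical(logits=policy(states)).log_prob(actions)
loss = -(log_probs * returns).sum()      # minimizing this is ascent on J(theta)

optimizer.zero_grad()
loss.backward()
optimizer.step()
```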
-
Actor-Critic Methods, A3C, A2C, DDPG, TD3, SAC
The key difference between A3C and A2C is the asynchronous aspect. A3C involves multiple independent agents (networks) with their own weights, interacting with different copies of the environment in parallel, thus exploring a larger part of the state-action space more quickly. -
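The sketch below shows a single A2C-style update (the synchronous counterpart of A3C): the critic regresses toward bootstrapped targets and the actor follows the advantage-weighted policy gradient. Network sizes, coefficients, and the batch of transitions are placeholder assumptions.

```python
# One A2C-style update on a placeholder batch of transitions (illustrative only).
import torch
import torch.nn as nn

actor = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))   # action logits
critic = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 1))  # state value V(s)
optimizer = torch.optim.Adam(list(actor.parameters()) + list(critic.parameters()), lr=7e-4)

gamma = 0.99
states = torch.randn(8, 4)
actions = torch.randint(0, 2, (8,))
rewards = torch.rand(8)
next_states = torch.randn(8, 4)
dones = torch.zeros(8)

values = critic(states).squeeze(-1)
with torch.no_grad():
    targets = rewards + gamma * (1 - dones) * critic(next_states).squeeze(-1)
advantages = (targets - values).detach()

dist = torch.distributions.Categorical(logits=actor(states))
actor_loss = -(dist.log_prob(actions) * advantages).mean()
critic_loss = nn.functional.mse_loss(values, targets)
loss = actor_loss + 0.5 * critic_loss - 0.01 * dist.entropy().mean()

optimizer.zero_grad()
loss.backward()
optimizer.step()
```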
Forward-Looking Actor or FORK
Model-based reinforcement learning leverages the model in a sophisticated manner, often utilizing deterministic or stochastic optimal control theory to optimize the policy based on the model. FORK uses the system network only as a black box to predict future states, not as a mathematical model for optimizing control actions. This distinction allows any model-free Actor-Critic algorithm augmented with FORK to remain model-free.
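A rough sketch of the forward-looking idea under illustrative assumptions: a system network F(s, a) that predicts the next state is fit by regression on observed transitions and then used as a black box to look one step ahead in the actor update. Dimensions, network sizes, and the extra-term weighting are hypothetical, and details of the actual FORK objective (for example, how many predicted steps it looks ahead) are omitted.

```python
# Sketch of a FORK-style system network and a forward-looking actor term (illustrative only).
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 8, 2

system = nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(),
                       nn.Linear(64, STATE_DIM))            # system network: (s, a) -> predicted s'
actor = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                      nn.Linear(64, ACTION_DIM), nn.Tanh()) # deterministic policy pi(s)
critic = nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(),
                       nn.Linear(64, 1))                    # Q(s, a); its own training is omitted here

system_opt = torch.optim.Adam(system.parameters(), lr=1e-3)
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)

# Placeholder batch of transitions (s, a, s'), standing in for a replay-buffer sample.
s = torch.randn(32, STATE_DIM)
a = torch.rand(32, ACTION_DIM) * 2 - 1
s_next = torch.randn(32, STATE_DIM)

# 1) Fit the system network by plain supervised regression on observed transitions.
system_loss = nn.functional.mse_loss(system(torch.cat([s, a], dim=1)), s_next)
system_opt.zero_grad()
system_loss.backward()
system_opt.step()

# 2) Actor update: the usual -Q(s, pi(s)) term plus a forward-looking term that
#    evaluates the policy in the state the system network predicts will follow.
#    The system network is used only as a black-box predictor; only actor weights
#    are updated in this step.
pi_s = actor(s)
predicted_next = system(torch.cat([s, pi_s], dim=1))
actor_loss = (-critic(torch.cat([s, pi_s], dim=1)).mean()
              - 0.5 * critic(torch.cat([predicted_next, actor(predicted_next)], dim=1)).mean())
actor_opt.zero_grad()
actor_loss.backward()
actor_opt.step()
```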