This project applies Deep Reinforcement Learning to the Unity ML-Agents Banana environment.
More information on the algorithm, NN architecture and hyper-parameters can be found in this report.
The environment consists of a closed room containing yellow and blue bananas.
The goal is to find yellow bananas and avoid blue bananas. It is an episodic environment: the agent gets a fixed number of steps in which to maximise its reward.
The environment is considered to be solved if an agent gets an average reward of at least 13 over 100 episodes.
A reward of +1 is earned by catching a yellow banana. A penalty of -1 is given for catching a blue banana. Reward at all other times is 0.
This embodies the goal of catching as many yellow bananas as possible and avoiding the blue ones.
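As a concrete illustration of the solving criterion, here is a minimal sketch of a rolling-average check; the variable and function names are hypothetical and not taken from this project's code.

```python
from collections import deque

import numpy as np

# Rolling window of the last 100 episode scores (hypothetical variable name).
recent_scores = deque(maxlen=100)

def record_and_check(episode_score):
    """Append an episode's total reward and report whether the environment counts as solved."""
    recent_scores.append(episode_score)
    return len(recent_scores) == 100 and np.mean(recent_scores) >= 13.0
```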
The state space is continuous. It consists of vectors of size 37, encoding the agent's velocity together with a ray-based representation of the agent's local field of vision: the rays indicate the presence of objects at a number of fixed angles in front of the agent.
The action space is discrete and consists of four options (a short interaction sketch follows the list):
- go forward (0)
- go backward (1)
- go left (2)
- go right (3)
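The sketch below shows how these spaces look when interacting with the environment through the `unityagents` Python API used in the course; the file path is an assumption for the macOS build, and the variable names are illustrative.

```python
import numpy as np
from unityagents import UnityEnvironment

# Path is an assumption for the macOS build unzipped into bin/ (see setup below).
env = UnityEnvironment(file_name="bin/Banana.app")
brain_name = env.brain_names[0]

env_info = env.reset(train_mode=False)[brain_name]
state = env_info.vector_observations[0]                          # vector of size 37
action_size = env.brains[brain_name].vector_action_space_size   # 4 discrete actions

action = np.random.randint(action_size)                          # e.g. 0 = go forward
env_info = env.step(action)[brain_name]
reward = env_info.rewards[0]                                     # +1 yellow, -1 blue, 0 otherwise
next_state = env_info.vector_observations[0]
done = env_info.local_done[0]                                    # True when the episode ends

env.close()
```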
The project requires Python 3, PyTorch 0.4, the Unity ML-Agents API, and the Unity ML-Agents environment, which you can unzip in this project's bin directory (for Mac OS) or download from Udacity.
To install the requirements, first create an Anaconda environment (or another virtual env of your choice) for the project using Python 3.6: `conda create --name env_name python=3.6 -y`
Activate the environment: `conda activate env_name`
Then go to the project's `python` directory and install the requirements in your environment: `pip install .`
Make sure the Unity environment is present in the `bin/` directory and the corresponding name has been set in the `ENV_APP` constant in `config.py`.
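For example, `config.py` might then contain something along these lines; the constant name comes from the note above, while the path is an assumption for the macOS build.

```python
# config.py (illustrative excerpt): point ENV_APP at the unzipped Unity app.
ENV_APP = "bin/Banana.app"
```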
The project is run from the command line. There are two entry points: one trains an agent from scratch, the other shows an agent performing a single episode.
To train a new agent, run the `train.py` script. It currently only supports training a DQN agent, so no command-line arguments are necessary. (Training parameters are set in `config.py` and the `agent/factory.py` module.)
When the environment is solved, the training script saves a checkpoint in the `saved_models` directory. During training, a checkpoint is also saved every 100 iterations. Both can be loaded with the watch script (see next point).
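A checkpoint of this kind is typically just the Q-network's weights saved with PyTorch. The sketch below uses a stand-in network and a hypothetical filename; only the `saved_models` directory comes from the text above, and the real architecture is described in the report.

```python
import os

import torch
import torch.nn as nn

# Stand-in Q-network: 37 state inputs, 4 action values (illustrative only).
q_network = nn.Sequential(nn.Linear(37, 64), nn.ReLU(), nn.Linear(64, 4))

# Save a checkpoint (the filename is an assumption).
os.makedirs("saved_models", exist_ok=True)
torch.save(q_network.state_dict(), "saved_models/checkpoint.pth")

# The watch script can later rebuild the same network and restore the weights.
q_network.load_state_dict(torch.load("saved_models/checkpoint.pth"))
```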
To watch an agent perform a single episode, run the `watch.py` script and specify which agent you would like to see using the `--agent` option. Available choices are:
- `random`: shows a perfectly stupid agent (actions are picked at random).
- `dqn_pretrained`: shows a pre-trained agent.
- `dqn_checkpoint`: shows the last checkpoint saved during training.
- `dqn_solved`: shows the last solution reached by the training script.
Example: `python watch.py --agent=dqn_pretrained`
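Internally, an option like this can be handled with standard `argparse` choices; the snippet below is a rough sketch, not the project's actual code.

```python
import argparse

# Illustrative parser for watch.py's --agent option.
parser = argparse.ArgumentParser(description="Watch an agent play one episode.")
parser.add_argument(
    "--agent",
    choices=["random", "dqn_pretrained", "dqn_checkpoint", "dqn_solved"],
    default="dqn_pretrained",
    help="Which agent to load and watch.",
)
args = parser.parse_args()
print(f"Watching agent: {args.agent}")
```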
This project is part of my Udacity DRL Nanodegree.