Trains an agent to collect yellow bananas and avoid purple bananas.
The environment provides the state as a 37-dimensional vector containing the agent's velocity and a ray-based perception of objects around the agent's forward direction. The reward is +1 for collecting a yellow banana and -1 for collecting a purple banana.
The agent returns an integer in [0, 3] representing one of the following actions:
- 0: move forward.
- 1: move backward.
- 2: turn left.
- 3: turn right.
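
The action encoding above could be captured in a small enumeration so that the integers do not appear as bare literals in the code. This is only a sketch: the names `Action` and `describe` are hypothetical and are not taken from this repository.

```python
from enum import IntEnum


class Action(IntEnum):
    """Hypothetical names for the four discrete actions listed above."""
    FORWARD = 0
    BACKWARD = 1
    TURN_LEFT = 2
    TURN_RIGHT = 3


def describe(action_id: int) -> str:
    """Return a readable label for an action index in [0, 3]."""
    # Action(action_id) raises ValueError for indices outside [0, 3].
    return Action(action_id).name.lower().replace("_", " ")
```

Because `Action` is an `IntEnum`, its members compare equal to plain integers, so they can be passed wherever an integer action is expected.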
The environment is considered solved when the average cumulative reward over 100 consecutive episodes is above 13. The current agent solves the environment after around 550 episodes.
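
One way to check the solved criterion is a rolling mean over the last 100 episode scores. The sketch below assumes a list of per-episode cumulative rewards; the function name `first_solved_episode` and its parameters are hypothetical, and the actual check in main.py may differ.

```python
from collections import deque


def first_solved_episode(scores, target=13.0, window=100):
    """Return the 1-based episode at which the rolling mean first exceeds
    `target`, or None if that never happens."""
    recent = deque(maxlen=window)  # keeps only the last `window` scores
    for episode, score in enumerate(scores, start=1):
        recent.append(score)
        # Require a full window so early lucky episodes do not count.
        if len(recent) == window and sum(recent) / window > target:
            return episode
    return None
```

The returned value is the episode at which the 100-episode window ends. Some reports subtract the window length and quote the episode at which the window starts, so it is worth stating which convention a figure uses.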
The agent uses Double DQN with a 3-layer fully connected network. See Report.md for more details.
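
For orientation, Double DQN picks the next action with the online network and evaluates that action with the target network. The sketch below shows the target for a single transition in plain Python, without PyTorch. The function name, the list-based Q-values, and the discount `gamma=0.99` are illustrative assumptions, not values taken from this repository; see Report.md for the actual hyperparameters.

```python
def double_dqn_target(reward, next_q_online, next_q_target, done, gamma=0.99):
    """Compute the Double DQN target for one transition.

    next_q_online / next_q_target: Q-values of the next state for each
    action, from the online and target networks respectively.
    gamma is an assumed discount factor.
    """
    if done:
        # No bootstrapping from a terminal state.
        return reward
    # Select the action with the online network...
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    # ...but evaluate it with the target network.
    return reward + gamma * next_q_target[best_action]
```

Decoupling action selection from action evaluation in this way is what distinguishes Double DQN from standard DQN, which takes the maximum over the target network's own Q-values.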
- Download the environment for your operating system below.
  - Linux: click here
  - Mac OSX: click here
  - Windows (32-bit): click here
  - Windows (64-bit): click here
- Extract the contents into banana_app/
- Install Anaconda
- Install PyTorch and unityagents
Run main.py, or open navigation.ipynb, to run the environment with the existing model or to retrain it.