This repository contains the code used in the experiments of our paper "Feature Control as Intrinsic Motivation for Hierarchical Reinforcement Learning" by Nat Dilokthanakul, Christos Kaplanis, Nick Pawlowski, and Murray Shanahan (https://arxiv.org/abs/1705.06769).
The code is adapted from an open-source implementation of A3C, "universe-starter-agent" (https://github.com/openai/universe-starter-agent).
We later found that a flat A3C agent trained with a reward shaped according to Equation 3 can perform as well as our feature-control agent on Montezuma's Revenge. This result supports the claim that additional auxiliary rewards or loss signals can be beneficial in sparse-reward environments, even though such rewards can skew the definition of the task.
Importantly, this raises a question about the benefit of the hierarchical elements proposed in the paper: it appears that the decisions made by the meta-controller do not contribute significantly to the success of the feature-control agent.
- Python 2.7
- six (for py2/3 compatibility)
- TensorFlow
- tmux (the start script opens up a tmux session with multiple windows)
- htop (shown in one of the tmux windows)
- gym
- gym[atari]
- universe
- opencv-python
- numpy
- scipy
conda create --name env python=2.7
source activate env
brew install tmux htop cmake # On Linux use sudo apt-get install -y tmux htop cmake
pip install "gym[atari]"
pip install universe
pip install six
pip install tensorflow
conda install -y -c https://conda.binstar.org/menpo opencv3
conda install -y numpy
conda install -y scipy
Add the following to your .bashrc so that you'll have the correct environment when the train.py script spawns new bash shells:
source activate env
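For example, one way to append that line (the echo command below is just an illustration; you can also edit .bashrc by hand) is:

```
echo "source activate env" >> ~/.bashrc
```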
The problem of sparse rewards is one of the hardest challenges in contemporary reinforcement learning. Hierarchical reinforcement learning (HRL) tackles this problem by using a set of temporally-extended actions, or options, each of which has its own subgoal. These subgoals are normally handcrafted for specific tasks. Here, though, we introduce a generic class of subgoals with broad applicability in the visual domain. Underlying our approach (in common with work using "auxiliary tasks") is the hypothesis that the ability to control aspects of the environment is an inherently useful skill to have. We incorporate such subgoals in an end-to-end hierarchical reinforcement learning system and test two variants of our algorithm on a number of games from the Atari suite. We highlight the advantage of our approach in one of the hardest games -- Montezuma's Revenge -- for which the ability to handle sparse rewards is key. Our agent learns several times faster than the current state-of-the-art HRL agent in this game, reaching a similar level of performance.
In this experiment, we evaluated the performance of the pixel-control agent (top row) and the feature-control agent (bottom row). To reproduce the results, check out the branch pixel_control for the pixel-control agent or feature_control for the feature-control agent. For example, use the following command for Montezuma's Revenge:
python train.py -w 8 -e MontezumaRevenge-v0 -l ~/experiments/montezuma_experiment
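Putting the two steps together, a full run for the pixel-control agent might look like the following (a sketch; the log directory name is arbitrary):

```
# select the pixel-control agent, then launch 8 asynchronous workers in a tmux session
git checkout pixel_control
python train.py -w 8 -e MontezumaRevenge-v0 -l ~/experiments/montezuma_pixel_control
```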
To change the value of beta, edit line 136 of a3c.py to the desired value. For example, for beta = 0.75:
self.beta = 0.75
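If you want to script a sweep over beta instead of editing the file by hand, one option (a sketch, assuming GNU sed and that the assignment appears exactly once in a3c.py in the form shown above) is to patch the line before launching training:

```
# hypothetical helper: overwrite the beta assignment in a3c.py, then train
sed -i 's/self.beta = .*/self.beta = 0.75/' a3c.py
python train.py -w 8 -e MontezumaRevenge-v0 -l ~/experiments/montezuma_beta075
```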
In this experiment, we improve performance by changing the BPTT length from 20 to 100. To run experiments with BPTT = 100, check out the branch feature_control_bptt100. For the baseline agent, check out the branch baseline. All experiments used 8 asynchronous workers.
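For example, reusing the Montezuma's Revenge command from above (the log directory names are arbitrary):

```
# feature-control agent with BPTT = 100, 8 asynchronous workers
git checkout feature_control_bptt100
python train.py -w 8 -e MontezumaRevenge-v0 -l ~/experiments/montezuma_bptt100

# baseline A3C agent for comparison
git checkout baseline
python train.py -w 8 -e MontezumaRevenge-v0 -l ~/experiments/montezuma_baseline
```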
In this experiment, we further improve the stability of training by using a target network, similar to DQN, to calculate the intrinsic reward of the feature-control agent. To run the experiment, check out the branch target and launch training as above.
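The exact definition of the intrinsic reward is given in the paper; the snippet below is only a minimal, self-contained sketch of the target-network mechanic, assuming a generic feature extractor and an intrinsic reward proportional to the change in target-network features between consecutive frames. All names, shapes, and the sync period are hypothetical and do not correspond to the repository's code.

```python
import numpy as np

def features(obs, params):
    # Stand-in for the conv-layer features used by the feature-control agent.
    return np.maximum(0.0, obs.dot(params["w"]) + params["b"])

def intrinsic_reward(prev_obs, obs, target_params):
    # Reward changes in the *target-network* features, so the reward signal
    # stays fixed between syncs instead of drifting with every gradient step.
    return np.abs(features(obs, target_params) - features(prev_obs, target_params)).mean()

rng = np.random.RandomState(0)
online_params = {"w": 0.1 * rng.randn(64, 32), "b": np.zeros(32)}
target_params = {k: v.copy() for k, v in online_params.items()}

SYNC_EVERY = 1000  # hypothetical sync period, in the spirit of DQN target networks

prev_obs = rng.randn(64)
for step in range(1, 3001):
    obs = rng.randn(64)                       # stand-in for a preprocessed frame
    r_int = intrinsic_reward(prev_obs, obs, target_params)
    # ... the A3C gradient step would update online_params here ...
    if step % SYNC_EVERY == 0:                # periodically copy online -> target
        target_params = {k: v.copy() for k, v in online_params.items()}
    prev_obs = obs
```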