Feature Control as Intrinsic Motivation for Hierarchical Reinforcement Learning

This repository contains the code used in the experiments of our paper, "Feature Control as Intrinsic Motivation for Hierarchical Reinforcement Learning" by Nat Dilokthanakul, Christos Kaplanis, Nick Pawlowski and Murray Shanahan (https://arxiv.org/abs/1705.06769).

We adapted the code from an open-source implementation of A3C, namely "Universe-StarterAgent" (https://github.com/openai/universe-starter-agent).

UPDATE NOTE:

We later found that a flat A3C agent trained with a shaped reward according to equation 3 can perform as well as our feature-control agent in Montezuma's Revenge. This result supports the claim that additional auxiliary rewards or loss signals can be beneficial in sparse-reward environments, even though such rewards can skew the definition of the task.

Importantly, this raises a question about the benefit of the hierarchical elements proposed in the paper. It appears that the decisions made by the meta-controller do not contribute significantly to the success of the feature-control agent.
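
As a loose illustration of the kind of reward shaping mentioned above (not the code used in the paper), a flat agent can be given the environment reward plus a bonus for changing learned features between consecutive frames. The function and coefficient names below are placeholders; equation 3 in the paper gives the exact definition.

import numpy as np

def shaped_reward(extrinsic_reward, features_t, features_tp1, eta=0.05):
    # Placeholder bonus: mean absolute change of a vector of learned
    # features between consecutive timesteps, rewarding the agent for
    # affecting those features.
    intrinsic_bonus = float(np.mean(np.abs(np.asarray(features_tp1) - np.asarray(features_t))))
    return extrinsic_reward + eta * intrinsic_bonus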

Dependencies

All dependencies (tmux, htop, cmake, gym[atari], universe, six, TensorFlow, OpenCV 3, NumPy and SciPy) are installed by the commands in Getting Started below.

Getting Started

conda create --name env
source activate env

brew install tmux htop cmake      # On Linux use sudo apt-get install -y tmux htop cmake

pip install gym[atari]
pip install universe
pip install six
pip install tensorflow
conda install -y -c https://conda.binstar.org/menpo opencv3
conda install -y numpy
conda install -y scipy

Add the following to your .bashrc so that you'll have the correct environment when the train.py script spawns new bash shells:

source activate env
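
As an optional sanity check of the setup (not part of the original instructions), you can confirm that the Atari environment used below loads correctly:

import gym

env = gym.make("MontezumaRevenge-v0")   # the same environment name passed to train.py below
observation = env.reset()
print(env.action_space, observation.shape)   # prints the discrete action space and the raw frame shape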

Abstract

The problem of sparse rewards is one of the hardest challenges in contemporary reinforcement learning. Hierarchical reinforcement learning (HRL) tackles this problem by using a set of temporally-extended actions, or options, each of which has its own subgoal. These subgoals are normally handcrafted for specific tasks. Here, though, we introduce a generic class of subgoals with broad applicability in the visual domain. Underlying our approach (in common with work using "auxiliary tasks") is the hypothesis that the ability to control aspects of the environment is an inherently useful skill to have. We incorporate such subgoals in an end-to-end hierarchical reinforcement learning system and test two variants of our algorithm on a number of games from the Atari suite. We highlight the advantage of our approach in one of the hardest games -- Montezuma's revenge -- for which the ability to handle sparse rewards is key. Our agent learns several times faster than the current state-of-the-art HRL agent in this game, reaching a similar level of performance.

Reproducing the results

Experiment 1: Influence of the meta-controller on performance

[Figure ex1: Experiment 1 results]

In this experiment, we evaluated the performance of the pixel-control agent (top row) and the feature-control agent (bottom row). To reproduce the results, check out the branch pixel_control for the pixel-control agent or feature_control for the feature-control agent. For example, use the following command for Montezuma's Revenge:

python train.py -w 8 -e MontezumaRevenge-v0 -l ~/experiments/montezuma_experiment

To change the value of beta, edit line 136 of a3c.py to the desired value. For example, for beta = 0.75:

self.beta = 0.75
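
Beta controls how strongly the extrinsic and intrinsic rewards are mixed for the agent. A hypothetical illustration of such a mixing coefficient is sketched below; the actual expression is the one in a3c.py, not this sketch.

def mixed_reward(extrinsic_reward, intrinsic_reward, beta=0.75):
    # Hypothetical convex combination of the two reward signals,
    # weighted by beta; consult a3c.py for the exact formula used.
    return beta * extrinsic_reward + (1.0 - beta) * intrinsic_reward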

Experiments 2 and 3: Different backpropagation-through-time (BPTT) lengths

[Figures ex2, ex3: Experiment 2 and 3 results]

In these experiments, we improved performance by increasing the BPTT length from 20 to 100. To run the experiments with BPTT = 100, check out the branch feature_control_bptt100. For the baseline agent, check out the branch baseline. All experiments used 8 asynchronous workers.
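
Here the BPTT length is the number of environment steps collected per rollout before the recurrent state is truncated and one gradient update is made. A minimal, generic sketch of such a rollout loop is given below; the real logic lives in the A3C runner code, and policy.act is a placeholder.

def collect_rollout(env, policy, state, bptt_length=100):
    # Run the policy for up to `bptt_length` steps, then truncate BPTT
    # and hand the partial trajectory to the learner for one update.
    rollout = []
    for _ in range(bptt_length):
        action = policy.act(state)                     # placeholder policy interface
        next_state, reward, done, _ = env.step(action)
        rollout.append((state, action, reward))
        state = next_state
        if done:
            state = env.reset()
            break
    return rollout, state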

Extra: Improving stability with a target network

[Figure ex4: target-network results]

In this experiment, we further improved the stability of training by using a target network, similar to DQN, to calculate the intrinsic reward of the feature-control agent. To run the experiment, check out the branch target.
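
The idea, sketched below with placeholder class and method names, is to compute the features used for the intrinsic reward from a periodically synchronised copy of the network, so that the reward signal does not chase the constantly changing online parameters.

class TargetFeatureNetwork:
    # Illustrative target network for intrinsic-reward computation;
    # `online_net`, `get_params` and `features` are placeholders for the real model.
    def __init__(self, online_net, sync_every=10000):
        self.online_net = online_net
        self.target_params = online_net.get_params()   # frozen snapshot
        self.sync_every = sync_every
        self.steps = 0

    def features(self, observation):
        # Intrinsic-reward features come from the frozen snapshot,
        # not from the constantly updated online parameters.
        return self.online_net.features(observation, params=self.target_params)

    def after_update(self):
        # Periodically re-synchronise the snapshot with the online network.
        self.steps += 1
        if self.steps % self.sync_every == 0:
            self.target_params = self.online_net.get_params()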
