Fork of the original Meta-Learning Shared Hierarchies repository.
Code for Meta-Learning Shared Hierarchies.
Includes pre-trained checkpoints for task AntBandits-v1 (up to 2000 epochs) in mlsh_code/savedir/.
Install mujoco-py and MuJoCo by following the installation instructions.
To test whether MuJoCo works, execute the following command in the mjpro*/bin folder:
./simulate ../model/humanoid.xml
Add the following to your .bashrc (replace ... with the path to the directory):
export PYTHONPATH=$PYTHONPATH:/.../mlsh/gym;
export PYTHONPATH=$PYTHONPATH:/.../mlsh/rl-algs;
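
After reloading the shell, it can help to confirm that both directories actually ended up on PYTHONPATH. The following is a minimal sketch, not part of the repository; the helper name missing_pythonpath_entries is hypothetical, and it only checks that some PYTHONPATH entry ends with each expected suffix.

```python
import os


def missing_pythonpath_entries(required_suffixes, pythonpath=None):
    """Return the required suffixes that no PYTHONPATH entry ends with.

    Hypothetical helper for illustration; it does not check that the
    directories exist on disk.
    """
    if pythonpath is None:
        pythonpath = os.environ.get("PYTHONPATH", "")
    # Split on the platform separator and ignore empty entries.
    entries = [e.rstrip("/") for e in pythonpath.split(os.pathsep) if e]
    return [s for s in required_suffixes
            if not any(e.endswith(s) for e in entries)]


if __name__ == "__main__":
    missing = missing_pythonpath_entries(["mlsh/gym", "mlsh/rl-algs"])
    print("missing:", missing if missing else "none")
```

If the script reports a missing entry, re-check the export lines above and open a new shell.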
Install MovementBandits environments:
cd test_envs
pip install -e .
Use pip3 if pip links to an older version of Python (as on the university computers).
If you are installing on the university computers, install mujoco-py version 0.5.7 with MuJoCo version 1.31, as newer versions will not work.
You may also want to install TensorFlow 1.5.0 so that the code runs on any university computer, although this is less important.
Also remember to always use python3 or pip3 instead of python or pip.
pip3 install tensorflow==1.5.0 mujoco-py==0.5.7
cd mlsh_code
The MLSH script works on any Gym environment that implements the randomizeCorrect() function. See the envs/ folder for examples of such environments.
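
To illustrate the shape of such an environment, here is a minimal sketch. It is a hypothetical toy class, not one of the environments in envs/, and it is written as plain Python that mimics the Gym reset()/step() interface so that it runs without Gym installed. It assumes that randomizeCorrect() is meant to resample the hidden task (here, which goal gives reward); the class name, attributes, and reward rule are all invented for illustration.

```python
import random


class TwoGoalBanditSketch:
    """Hypothetical toy environment; not taken from the repository."""

    def __init__(self, num_goals=2, seed=None):
        self.num_goals = num_goals
        self.rng = random.Random(seed)
        self.correct_goal = 0
        self.position = 0

    def randomizeCorrect(self):
        # Assumption: this hook resamples the hidden task, here the
        # index of the goal that yields reward.
        self.correct_goal = self.rng.randrange(self.num_goals)

    def reset(self):
        # Start a new episode from the initial position.
        self.position = 0
        return self.position

    def step(self, action):
        # The action is the index of the goal the agent moves to.
        self.position = action
        reward = 1.0 if action == self.correct_goal else 0.0
        done = True  # one-step episodes keep the sketch short
        return self.position, reward, done, {}
```

A real environment would subclass gym.Env and define observation and action spaces; the files in envs/ remain the authoritative examples.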
To run on multiple cores:
mpirun -np 12 python main.py ...
Example training command:
python main.py --task AntBandits-v1 --num_subs 2 --num_epochs 10000 --macro_duration 1000 --num_rollouts 2000 --warmup_time 20 --train_time 30 --replay False AntAgent
Use python3 if python links to an older version of Python (as on the university computers).
The --save_every X parameter controls how often checkpoints are saved (every X epochs). The default is 500.
Once you've trained your agent, view it by running:
python main.py [...] --replay True --continue_iter [your iteration] AntAgent
[your iteration] should be 2000 to view the latest uploaded checkpoint.
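
The replay command can also be assembled programmatically. The sketch below assumes that [...] stands for the same flags used during training and that the final positional argument (AntAgent above) is the save name; the helper replay_command and its parameter names are hypothetical and not part of main.py.

```python
def replay_command(train_args, continue_iter, save_name="AntAgent",
                   python="python"):
    """Build a replay argv list from the training flags.

    Hypothetical helper: drops any existing --replay / --continue_iter
    flag (with its value) and appends the replay settings.
    """
    kept = []
    skip_next = False
    for token in train_args:
        if skip_next:
            # This token is the value of a flag we dropped.
            skip_next = False
            continue
        if token in ("--replay", "--continue_iter"):
            skip_next = True
            continue
        kept.append(token)
    return ([python, "main.py"] + kept +
            ["--replay", "True",
             "--continue_iter", str(continue_iter),
             save_name])


if __name__ == "__main__":
    cmd = replay_command(["--task", "AntBandits-v1", "--num_subs", "2",
                          "--replay", "False"], 2000)
    print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run from inside the mlsh_code directory; pass python="python3" where python links to an older version.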