By Yuanchu Liang and Hanna Kurniawati
School of Computing, The Australian National University, Canberra, Australia
We propose an encoder and a recurrent generator, conditioned on the encoded environment context, to produce macro-actions for a planner.
The encoder and the recurrent generator are trained end-to-end with the following architecture:
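The exact network is specified in the paper and in the `models/` folder; as a rough, hypothetical illustration of the encoder + recurrent-generator idea (the layer sizes, names, and dimensions below are ours, not the published architecture), here is a minimal PyTorch sketch:

```python
# Hypothetical sketch of a context encoder + recurrent macro-action generator.
# All dimensions and layer choices are illustrative only.
import torch
import torch.nn as nn

class ContextEncoder(nn.Module):
    """Encodes the environment context (e.g. a belief summary) into a latent vector."""
    def __init__(self, context_dim=64, latent_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(context_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )

    def forward(self, context):
        return self.net(context)

class RecurrentMacroActionGenerator(nn.Module):
    """Unrolls a GRU conditioned on the encoded context, emitting one
    primitive action per step to form a macro-action."""
    def __init__(self, latent_dim=128, action_dim=2, macro_length=8):
        super().__init__()
        self.macro_length = macro_length
        self.gru = nn.GRUCell(action_dim, latent_dim)
        self.head = nn.Linear(latent_dim, action_dim)

    def forward(self, latent):
        batch = latent.shape[0]
        action = torch.zeros(batch, self.head.out_features, device=latent.device)
        hidden = latent  # condition the recurrence on the encoded context
        actions = []
        for _ in range(self.macro_length):
            hidden = self.gru(action, hidden)
            action = torch.tanh(self.head(hidden))
            actions.append(action)
        return torch.stack(actions, dim=1)  # (batch, macro_length, action_dim)

encoder = ContextEncoder()
generator = RecurrentMacroActionGenerator()
macro = generator(encoder(torch.randn(16, 64)))  # 16 contexts -> 16 macro-actions
```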
When tested on four different environments, we obtain the following promising results:
Please refer to the paper for a more detailed discussion of the methodology and experiments.
The source code is built on top of MAGIC. The simulators are written in C++ (the `cpp` folder) and the training schemes are written in Python (the `python` folder).
If you are looking for the specifics of each task (e.g. parameters, constants, dynamics), jump ahead to:
- `cpp/include/core/simulations/` for parameters and constants
- `cpp/src/core/simulations/` for task dynamics and observation models
You will need to compile the C++ binaries to run the Python experiments.
For compiling the binaries, you will need to have:
- Boost library
- OpenCV library
- At least GCC-7.0
- At least CMake 3.14
```sh
mkdir cpp/build; cd cpp/build;
cmake ..
make
```
Ensure that the Boost and OpenCV headers and libraries are accessible during compilation (`cmake` and `make`). If they are installed in a custom location, you may find the CMake flag `cmake .. -DCMAKE_PREFIX_PATH=<custom location>` useful.
To run the (virtual) experiments, you will need to have:
- C++ binaries compiled (see previous C++ section)
- At least Python 3.6.8
- A CUDA-enabled GPU
- Dependent PIP packages: `pip3 install numpy torch pyzmq gym tensorboard`
- Additional dependent packages: `power_spherical`
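Before running anything, a quick optional sanity check that the dependencies and the GPU are visible to Python (a minimal sketch; adjust as needed):

```python
# Minimal dependency / GPU sanity check. An import error or a failed assert
# indicates a missing package or an unusable CUDA setup.
import numpy, zmq, gym, tensorboard  # noqa: F401 -- import check only
import power_spherical               # noqa: F401
import torch

assert torch.cuda.is_available(), "Experiments require a CUDA-enabled GPU."
print("torch", torch.__version__, "| GPU:", torch.cuda.get_device_name(0))
```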
The `python` folder contains all scripts to run experiments. It is split into multiple subfolders, each serving a different purpose.
Note that in the code we name MAGIC the Vanilla model, MAE the Encoder model, and RMAG the RNN model.
- `mdespot_handcrafted/`: Scripts to run (Macro-)DESPOT using handcrafted actions/macro-actions on our tasks.
  - `evaluate.py`: to visualize the approach.
    - e.g. `python3 evaluate.py --task=LightDark --macro-length=4`
  - `benchmark.py`: to test performance (a sweep sketch follows this list).
    - e.g. `python3 benchmark.py --task=LightDark --macro-length=4 --num-env=16`
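To benchmark several macro-action lengths in one go, a small hypothetical wrapper (run from inside `mdespot_handcrafted/`; the flags mirror the examples above) might look like:

```python
# Hypothetical sweep over macro-action lengths for benchmark.py.
# Assumes it is run from inside the mdespot_handcrafted/ folder.
import subprocess

TASK = "LightDark"
for macro_length in (4, 8):
    subprocess.run(
        ["python3", "benchmark.py",
         f"--task={TASK}",
         f"--macro-length={macro_length}",
         "--num-env=16"],
        check=True,  # stop the sweep if any run fails
    )
```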
- `mdespot_magic/`: Scripts to run MAGIC, MAE, and RMAG. Change the argument `gen-model-name` accordingly (Vanilla for MAGIC, Encoder for MAE, RNN for RMAG).
  - `evaluate.py`: to visualize the approach using a trained Generator.
    - e.g. `python3 evaluate.py --task=LightDark --macro-length=8 --model-path=../models/learned_LightDark_8 --model-index=500000 --gen-model-name=Vanilla`
  - `benchmark.py`: to test performance using a trained Generator.
    - e.g. `python3 benchmark.py --task=LightDark --macro-length=8 --num-env=16 --models-folder=../models/learned_LightDark_8 --model-index=500000 --gen-model-name=Vanilla`
  - `train.py`: to train both Generator + Critic (a train-then-benchmark sketch follows this list).
    - e.g. `python3 train.py --task=LightDark --macro-length=8 --num-env=16 --num-iterations=500000 --output-dir=../models/learned_LightDark_8 --gen-model-name=Vanilla`
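To go from training straight to evaluation, a hypothetical train-then-benchmark pipeline (run from inside `mdespot_magic/`; all flags are taken from the examples above, using the Vanilla/MAGIC generator) could be:

```python
# Hypothetical pipeline: train a generator, then benchmark its final checkpoint.
# Assumes it is run from inside the mdespot_magic/ folder.
import subprocess

task, macro_len, iters = "LightDark", 8, 500000
model_dir = f"../models/learned_{task}_{macro_len}"

# Train the Generator + Critic.
subprocess.run(
    ["python3", "train.py", f"--task={task}", f"--macro-length={macro_len}",
     "--num-env=16", f"--num-iterations={iters}",
     f"--output-dir={model_dir}", "--gen-model-name=Vanilla"],
    check=True,
)

# Benchmark the checkpoint written at the last iteration.
subprocess.run(
    ["python3", "benchmark.py", f"--task={task}", f"--macro-length={macro_len}",
     "--num-env=16", f"--models-folder={model_dir}",
     f"--model-index={iters}", "--gen-model-name=Vanilla"],
    check=True,
)
```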
- `models/`: Contains the neural networks used, as well as the trained models for each task.
- `pomcpow/`: Scripts to run POMCPOW for our tasks.
  - `evaluate.py`: to visualize the approach.
    - e.g. `python3 evaluate.py --task=LightDark`
  - `benchmark.py`: to test performance.
    - e.g. `python3 benchmark.py --task=LightDark --num-env=16`
  - `tune_params.py`: to tune the POMCPOW hyperparameters via grid search.
    - e.g. `python3 tune_params.py --task=LightDark --trials=30 --num-env=16`