Code for Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration. Paper accepted at NeurIPS 2023!
Authors: Zhihan Liu*, Miao Lu*, Wei Xiong*, Han Zhong, Hao Hu, Shenao Zhang, Sirui Zheng, Zhuoran Yang, Zhaoran Wang (* indicates equal contribution)
The code can be set up by:
git clone https://github.com/agentification/MEX.git
cd MEX/MEX_MB
pip install -e ".[dev]"
Below we provide an example of training MEX-MB in a single environment, e.g., Ant-v2:
python ./mbrl/examples/main.py algorithm=mbpo overrides=mbpo_ant comment=mbpo device=cuda:0 seed=0
To train MEX-MB in other environments, change the overrides argument to one of the configuration files in MEX_MB/mbrl/examples/conf/overrides.
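For instance, assuming that folder contains a HalfCheetah configuration named mbpo_halfcheetah (the exact file names there may differ), the corresponding run would look like:
python ./mbrl/examples/main.py algorithm=mbpo overrides=mbpo_halfcheetah comment=mbpo device=cuda:0 seed=0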
The sparse environments are implemented in the MEX_MB/mujoco folder, which can replace the original gym/envs/mujoco folder to enable training on sparse-reward tasks, as sketched below.
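As a rough sketch of that replacement (assuming gym is installed in the active Python environment; the exact paths may differ on your setup), one way to do it is:
# Locate the installed gym mujoco envs, back them up, and copy in the sparse-reward versions (illustrative only)
GYM_MUJOCO=$(python -c "import gym, os; print(os.path.join(os.path.dirname(gym.__file__), 'envs/mujoco'))")
cp -r "$GYM_MUJOCO" "${GYM_MUJOCO}_backup"
cp -r MEX_MB/mujoco/. "$GYM_MUJOCO"/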
MEX-MF is trained using Python 3.7 and PyTorch 1.2.
Other dependencies can be set up by:
cd MEX/MEX_MF
pip install -r env.txt
The results in the paper can be reproduced by running:
./run_experiments.sh
Below we provide an example of training MEX-MF in a single environment, e.g., HalfCheetah-v2:
python main.py --env HalfCheetah-v2 --policy IO
For sparse tasks, please specify --sparse, e.g.,
python main.py --env walker-vel-sparse --sparse --policy IO
If you find this work useful, please consider citing our paper:
@article{liu2023one,
  title={One Objective to Rule Them All: A Maximization Objective Fusing Estimation and Planning for Exploration},
  author={Liu, Zhihan and Lu, Miao and Xiong, Wei and Zhong, Han and Hu, Hao and Zhang, Shenao and Zheng, Sirui and Yang, Zhuoran and Wang, Zhaoran},
  journal={arXiv preprint arXiv:2305.18258},
  year={2023}
}