This repository contains a PyTorch implementation of our CVPR 2022 paper:
PONI: Potential Functions for ObjectGoal Navigation with Interaction-free Learning
Santhosh Kumar Ramakrishnan, Devendra Singh Chaplot, Ziad Al-Halah, Jitendra Malik, Kristen Grauman
Meta AI, UT Austin, UC Berkeley
Project website: https://vision.cs.utexas.edu/projects/poni/
State-of-the-art approaches to ObjectGoal navigation rely on reinforcement learning and typically require significant computational resources and time for learning. We propose Potential functions for ObjectGoal Navigation with Interaction-free learning (PONI), a modular approach that disentangles the skills of 'where to look?' for an object and 'how to navigate to (x, y)?'. Our key insight is that 'where to look?' can be treated purely as a perception problem, and learned without environment interactions. To address this, we propose a network that predicts two complementary potential functions conditioned on a semantic map and uses them to decide where to look for an unseen object. We train the potential function network using supervised learning on a passive dataset of top-down semantic maps, and integrate it into a modular framework to perform ObjectGoal navigation. Experiments on Gibson and Matterport3D demonstrate that our method achieves the state-of-the-art for ObjectGoal navigation while incurring up to 1,600x less computational cost for training.
Clone the current repo and required submodules:
git clone git@github.com:srama2512/PONI.git
cd PONI
git submodule init
git submodule update
export PONI_ROOT=<PATH TO PONI/>
Create a conda environment:
conda create --name poni python=3.8.5
conda activate poni
Install PyTorch (assuming CUDA 10.2):
conda install pytorch==1.9.1 torchvision==0.10.1 torchaudio==0.9.1 cudatoolkit=10.2 -c pytorch
Install dependencies:
cd $PONI_ROOT/dependencies/habitat-lab
pip install -r requirements.txt
python setup.py develop --all
cd $PONI_ROOT/dependencies/habitat-sim
pip install -r requirements.txt
python setup.py install --headless --with-cuda
python -m pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu102/torch1.9/index.html
pip install torch-scatter -f https://data.pyg.org/whl/torch-1.9.0+cu102.html
cd $PONI_ROOT/dependencies/astar_pycpp && make
Install requirements for PONI:
cd $PONI_ROOT
pip install -r requirements.txt
Add the repository to the Python path:
export PYTHONPATH=$PYTHONPATH:$PONI_ROOT
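Before moving on, it can help to confirm that the environment resolves the packages installed above. The following is a minimal sketch, not part of the PONI codebase: the import names in `REQUIRED_MODULES` are assumptions based on the packages listed in the installation steps, and `check_environment` is a hypothetical helper.

```python
import importlib.util
import os

# Import names assumed from the packages installed above; adjust them if
# your installation exposes different module names.
REQUIRED_MODULES = [
    "torch",
    "torchvision",
    "habitat",
    "habitat_sim",
    "detectron2",
    "torch_scatter",
]


def check_environment(modules=REQUIRED_MODULES, env=os.environ):
    """Report whether PONI_ROOT is set and which modules can be located."""
    report = {"PONI_ROOT": env.get("PONI_ROOT")}
    for name in modules:
        try:
            # find_spec only locates the module; it does not import it.
            report[name] = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            report[name] = False
    return report


for key, value in check_environment().items():
    print(f"{key}: {value}")
```

A `False` entry only means the module could not be located in the active environment; it does not check versions or CUDA support.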
- Download Gibson and Matterport3D scenes following the instructions here.
- Extract Gibson and MP3D semantic maps.
cd $PONI_ROOT
ACTIVE_DATASET="gibson" python scripts/create_semantic_maps.py
ACTIVE_DATASET="mp3d" python scripts/create_semantic_maps.py
- Create dataset for PONI training.
a. First extract FMM distances for all objects in each map.
cd $PONI_ROOT
ACTIVE_DATASET="gibson" python scripts/precompute_fmm_dists.py
ACTIVE_DATASET="mp3d" python scripts/precompute_fmm_dists.py
b. Extract training and validation data for PONI.
ACTIVE_DATASET="gibson" python scripts/create_poni_dataset.py --split "train"
ACTIVE_DATASET="gibson" python scripts/create_poni_dataset.py --split "val"
ACTIVE_DATASET="mp3d" python scripts/create_poni_dataset.py --split "train"
ACTIVE_DATASET="mp3d" python scripts/create_poni_dataset.py --split "val"
- The extracted data can be visualized using notebooks/visualize_pfs.ipynb.
- The create_poni_dataset.py script also supports parallelized dataset creation. The --map-id argument can be used to limit the data generation to one specific map. The --map-id-range argument can be used to limit the data generation to maps in range i to j as follows: --map-id-range i j. These arguments can be used to divide the data generation across multiple processes within a node or on a cluster with SLURM by passing the appropriate map ids to each job.
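One way to divide the map ids across jobs is sketched below. This is an illustration under stated assumptions, not part of the repository: `split_map_ranges` and `build_commands` are hypothetical helpers, the total number of maps (`num_maps`) must be supplied by you, and the ranges are emitted as half-open `[i, j)` pairs. Check how `create_poni_dataset.py` interprets `--map-id-range` before relying on this; if the script treats `j` as inclusive, adjacent jobs would overlap by one map.

```python
def split_map_ranges(num_maps, num_workers):
    """Split map ids 0..num_maps-1 into contiguous half-open (start, end) chunks."""
    base, extra = divmod(num_maps, num_workers)
    ranges, start = [], 0
    for worker in range(num_workers):
        # The first `extra` workers take one additional map.
        size = base + (1 if worker < extra else 0)
        if size == 0:
            continue  # more workers than maps: skip empty chunks
        ranges.append((start, start + size))
        start += size
    return ranges


def build_commands(dataset, split, num_maps, num_workers):
    """Build one create_poni_dataset.py command line per worker."""
    return [
        f'ACTIVE_DATASET="{dataset}" python scripts/create_poni_dataset.py '
        f'--split "{split}" --map-id-range {start} {end}'
        for start, end in split_map_ranges(num_maps, num_workers)
    ]


# num_maps=10 is a placeholder, not the real number of maps in any dataset.
for command in build_commands("gibson", "train", num_maps=10, num_workers=4):
    print(command)
```

Each printed line can then be launched as a separate process or submitted as a separate SLURM job.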
To train models for the PONI, predict-xy, predict-theta, and predict-action methods, copy the corresponding script from $PONI_ROOT/experiment_scripts/<DATASET_NAME>/train_<METHOD_NAME>.sh to an experiment directory and execute it. For example, to train PONI on Gibson:
mkdir -p $PONI_ROOT/experiments/poni/
cd $PONI_ROOT/experiments/poni
cp $PONI_ROOT/experiment_scripts/gibson/train_poni.sh .
chmod +x train_poni.sh
./train_poni.sh
We release pre-trained models from the experiments in our paper:
Method | Dataset | Checkpoints | | |
---|---|---|---|---|
PONI | Gibson | poni_123.ckpt | poni_234.ckpt | poni_345.ckpt |
Predict-XY | Gibson | pred_xy_123.ckpt | pred_xy_234.ckpt | pred_xy_345.ckpt |
Predict-theta | Gibson | pred_theta_123.ckpt | pred_theta_234.ckpt | pred_theta_345.ckpt |
Predict-action | Gibson | pred_act_123.ckpt | pred_act_234.ckpt | pred_act_345.ckpt |
PONI | MP3D | poni_123.ckpt | poni_234.ckpt | poni_345.ckpt |
Predict-XY | MP3D | pred_xy_123.ckpt | pred_xy_234.ckpt | pred_xy_345.ckpt |
Predict-theta | MP3D | pred_theta_123.ckpt | pred_theta_234.ckpt | pred_theta_345.ckpt |
Predict-action | MP3D | pred_act_123.ckpt |
You can also download all models from here:
mkdir $PONI_ROOT/pretrained_models && cd $PONI_ROOT/pretrained_models
wget -O pretrained_models.tar.gz https://utexas.box.com/shared/static/0v59eqktjs7hicbd16p2etlz2cn3w6g9.gz
tar -xvzf pretrained_models.tar.gz && rm pretrained_models.tar.gz
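After extraction, a quick listing can confirm which checkpoints are present. The sketch below is a hypothetical helper, not part of the repository. It makes no assumption about the directory layout inside the archive: it searches recursively for `*.ckpt` files and groups them by filename prefix, following the naming pattern in the table above (for example, `pred_xy_123.ckpt` has prefix `pred_xy`).

```python
import os
from collections import defaultdict
from pathlib import Path


def checkpoint_prefix(filename):
    """Return the part of a checkpoint name before its last underscore."""
    stem = Path(filename).stem  # 'pred_xy_123.ckpt' -> 'pred_xy_123'
    prefix, _, _ = stem.rpartition("_")
    return prefix or stem  # fall back to the full stem if there is no '_'


def group_checkpoints(root):
    """Group all *.ckpt files found under root by filename prefix."""
    root = Path(root)
    if not root.is_dir():
        return {}
    groups = defaultdict(list)
    for path in sorted(root.rglob("*.ckpt")):
        groups[checkpoint_prefix(path.name)].append(path)
    return dict(groups)


# Falls back to the current directory if PONI_ROOT is not set.
models_dir = Path(os.environ.get("PONI_ROOT", ".")) / "pretrained_models"
for prefix, paths in group_checkpoints(models_dir).items():
    print(prefix)
    for path in paths:
        print("  ", path)
```

Because the table lists the same filenames for Gibson and MP3D, the full paths are printed so the two sets can be told apart.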
We use a modified version of the Gibson ObjectNav evaluation setup from SemExp.
- Download the Gibson ObjectNav dataset to $PONI_ROOT/data/datasets/objectnav/gibson.
cd $PONI_ROOT/data/datasets/objectnav
wget -O gibson_objectnav_episodes.tar.gz https://utexas.box.com/shared/static/tss7udt3ralioalb6eskj3z3spuvwz7v.gz
tar -xvzf gibson_objectnav_episodes.tar.gz && rm gibson_objectnav_episodes.tar.gz
- Download the image segmentation model [URL] to $PONI_ROOT/pretrained_models.
- Copy the evaluation script corresponding to the model of interest from $PONI_ROOT/experiment_scripts/gibson/eval_<METHOD_NAME>.sh to the required experiment directory.
- Set the MODEL_PATH variable in the script to the saved checkpoint. By default, it points to the path of a pre-trained model (see previous section).
- To reproduce results from the paper, download the pre-trained models and evaluate them using the evaluation scripts.
- To visualize episodes with the semantic map and potential function predictions, add the arguments --print_images 1 --num_pf_maps 3 in the evaluation script.
We use the ObjectNav evaluation setup from Habitat-Lab for the MP3D dataset.
- Download the MP3D ObjectNav dataset [URL] to $PONI_ROOT/data/datasets/objectnav/mp3d/v1.
- Download the image segmentation model [URL] to $PONI_ROOT/pretrained_models.
- Copy the evaluation script corresponding to the model of interest from $PONI_ROOT/experiment_scripts/mp3d/eval_<METHOD_NAME>.sh to the required experiment directory (say, $EXPT_ROOT).
- Set the MODEL_PATH variable in the script to the saved checkpoint. By default, it points to the path of a pre-trained model.
- Execute the eval script, specifying the ids of 2 GPUs to evaluate on (0, 1 in this example). Note: In general, we found MP3D evaluation to be very slow on a single thread. The current MP3D evaluation code does not support multi-threaded evaluation. Instead, we split the MP3D val episode dataset into 11 parts (one for each scene), and run 11 single-threaded evaluations in parallel. By default, the first GPU evaluates on 6 parts (requiring ~20GB memory), and the second GPU evaluates on 5 parts (requiring ~16GB memory) simultaneously. If this exceeds the memory available on your GPU, please reduce the number of parts per GPU and increase the number of GPUs (i.e., modify eval_<METHOD_NAME>.sh).
./eval_<METHOD_NAME>.sh 0 1
- Merge results from the 11 splits.
python $PONI_ROOT/hlab/merge_results --path_format "$EXPT_ROOT/mp3d_objectnav/tb_seed_100_val_part_*/stats.json"
- To reproduce results from the paper, download the pre-trained models and evaluate them using the evaluation scripts.
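If the default 6/5 split does not fit in GPU memory, the 11 parts need to be redistributed over more GPUs. The sketch below shows one balanced assignment. `assign_parts` is a hypothetical helper, and the part indices 0 to 10 are placeholders: how parts are actually named and mapped to GPUs is defined in eval_<METHOD_NAME>.sh, which you would still need to edit by hand.

```python
def assign_parts(num_parts, gpu_ids):
    """Distribute part indices 0..num_parts-1 over GPUs in contiguous blocks."""
    base, extra = divmod(num_parts, len(gpu_ids))
    assignment, start = {}, 0
    for rank, gpu in enumerate(gpu_ids):
        # The first `extra` GPUs take one additional part.
        size = base + (1 if rank < extra else 0)
        assignment[gpu] = list(range(start, start + size))
        start += size
    return assignment


# Two GPUs reproduce the default 6 + 5 split described above.
print(assign_parts(11, [0, 1]))

# Four GPUs bring the load down to at most 3 parts per GPU.
print(assign_parts(11, [0, 1, 2, 3]))
```

Going by the figures quoted above (about 20GB for 6 parts and 16GB for 5 parts), each part needs a little over 3GB, so 3 parts per GPU would need roughly 10GB. Treat this as a rough estimate rather than a measured value.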
In our work, we used parts of Semantic-MapNet, Habitat-Lab, Object-Goal-Navigation, and astar_pycpp repos and extended them.
If you find this codebase useful, please cite us:
@inproceedings{ramakrishnan2022poni,
author = {Ramakrishnan, Santhosh K. and Chaplot, Devendra Singh and Al-Halah, Ziad and Malik, Jitendra and Grauman, Kristen},
booktitle = {Computer Vision and Pattern Recognition (CVPR), 2022 IEEE Conference on},
title = {PONI: Potential Functions for ObjectGoal Navigation with Interaction-free Learning},
year = {2022},
organization = {IEEE},
}
This project is released under the MIT license, as found in the LICENSE file.