Official repository of "Mind the Error! Detection and Localization of Instruction Errors in Vision-and-Language Navigation". We present the first dataset, R2RIE-CE, to benchmark instruction errors in VLN. We then propose a method, IEDL.

intelligolabs/R2RIE-CE

Mind the Error! Detection and Localization of Instruction Errors in Vision-and-Language Navigation

Francesco Taioli, Stefano Rosa, Alberto Castellini, Lorenzo Natale, Alessio Del Bue, Alessandro Farinelli, Marco Cristani, Yiming Wang

Accepted to IROS 2024

contact: [email protected]


Important

Consider citing our paper:

  @article{taioli2024mind,
  title={{Mind the Error! Detection and Localization of Instruction Errors in Vision-and-Language Navigation}},
  author={Taioli, Francesco and Rosa, Stefano and Castellini, Alberto and Natale, Lorenzo and Del Bue, Alessio and Farinelli, Alessandro and Cristani, Marco and Wang, Yiming},
  journal={arXiv preprint arXiv:2403.10700},
  year={2024},
  url={https://arxiv.org/abs/2403.10700}
  }

Abstract

Vision-and-Language Navigation in Continuous Environments (VLN-CE) is one of the most intuitive yet challenging embodied AI tasks. Agents are tasked to navigate towards a target goal by executing a set of low-level actions, following a series of natural language instructions. All VLN-CE methods in the literature assume that language instructions are exact. However, in practice, instructions given by humans can contain errors when describing a spatial environment due to inaccurate memory or confusion. Current VLN-CE benchmarks do not address this scenario, making the state-of-the-art methods in VLN-CE fragile in the presence of erroneous instructions from human users. For the first time, we propose a novel benchmark dataset that introduces various types of instruction errors considering potential human causes. This benchmark provides valuable insight into the robustness of VLN systems in continuous environments. We observe a noticeable performance drop (up to -25%) in Success Rate when evaluating the state-of-the-art VLN-CE methods on our benchmark. Moreover, we formally define the task of Instruction Error Detection and Localization, and establish an evaluation protocol on top of our benchmark dataset. We also propose an effective method, based on a cross-modal transformer architecture, that achieves the best performance in error detection and localization, compared to baselines. Surprisingly, our proposed method has revealed errors in the validation set of the two commonly used datasets for VLN-CE, i.e., R2R-CE and RxR-CE, demonstrating the utility of our technique in other tasks.

Setup

Install dependencies

  1. Create a virtual environment and install the base dependencies (tested with Python 3.7, torch 1.9.1+cu111, and torch-scatter 2.0.9+cu11).

    conda create --name r2r_ie_ce python=3.7.12 -c conda-forge
    conda activate r2r_ie_ce
  2. Download the Matterport3D scene meshes. download_mp.py must be obtained from the Matterport3D project webpage.

    # run with python 2.7
    python download_mp.py --task habitat -o data/scene_datasets/mp3d/
    # Extract to: ./data/scene_datasets/mp3d/{scene}/{scene}.glb

Extract the archives so that each scene sits at data/scene_datasets/mp3d/{scene}/{scene}.glb. There should be 90 scenes in total, and the scene_datasets folder must be inside data.
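A quick way to confirm the layout described above is to count the scenes that follow the {scene}/{scene}.glb pattern. The sketch below is a minimal check, not part of the repository; the function name find_scenes is hypothetical, and the default path is the one quoted in this README.

```python
from pathlib import Path


def find_scenes(mp3d_root):
    """Return sorted scene IDs that follow the {scene}/{scene}.glb layout."""
    root = Path(mp3d_root)
    if not root.is_dir():
        return []
    return sorted(
        d.name
        for d in root.iterdir()
        if d.is_dir() and (d / f"{d.name}.glb").is_file()
    )


if __name__ == "__main__":
    scenes = find_scenes("data/scene_datasets/mp3d")
    # The README states that there should be 90 scenes.
    print(f"found {len(scenes)} scenes (expected 90)")
```

Run it from the repository root; a count below 90 usually means an archive was not extracted or was extracted one directory level too deep.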

  3. Follow the Habitat Installation Guide to install habitat-sim and habitat-lab. We use version v0.1.7 in our experiments. In brief:
  • Install habitat-sim for a machine with multiple GPUs or without an attached display (i.e. a cluster):
    # option 1 - faster
    wget https://anaconda.org/aihabitat/habitat-sim/0.1.7/download/linux-64/habitat-sim-0.1.7-py3.7_headless_linux_856d4b08c1a2632626bf0d205bf46471a99502b7.tar.bz2
    conda install --use-local habitat-sim-0.1.7-py3.7_headless_linux_856d4b08c1a2632626bf0d205bf46471a99502b7.tar.bz2
    
    # option 2 - slower
    conda install -c aihabitat -c conda-forge habitat-sim=0.1.7 headless
  4. Install our project dependencies:

    pip install --ignore-installed -r requirements.txt
  5. Clone habitat-lab from the GitHub repository and install it. The command below installs the core of Habitat Lab as well as habitat_baselines.

    git clone --branch v0.1.7 https://github.com/facebookresearch/habitat-lab.git
    cd habitat-lab
    python setup.py develop --all # install habitat and habitat_baselines
  6. Install the tested version of torch (torch==1.9.1+cu111) and the other dependencies:

    pip install torch==1.9.1+cu111 torchvision==0.10.1+cu111 torchaudio==0.9.1 -f https://download.pytorch.org/whl/torch_stable.html
    pip install tensorboard==1.15.0 #  TensorBoard logging requires TensorBoard version 1.15 or above
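Since the setup pins several exact versions, it can help to compare what is installed against the versions quoted above before training. The following is a minimal sketch under stated assumptions: the helper names installed_version and version_mismatches are hypothetical, only pip-installed distributions are checked (habitat-sim is installed through conda and is left out), and versions are compared as exact strings.

```python
# Versions quoted in this README, keyed by pip distribution name.
EXPECTED = {
    "torch": "1.9.1+cu111",
    "torchvision": "0.10.1+cu111",
    "torchaudio": "0.9.1",
    "tensorboard": "1.15.0",
}


def installed_version(name):
    """Return the installed version of a pip distribution, or None."""
    try:
        from importlib.metadata import PackageNotFoundError, version
    except ImportError:
        # Python 3.7 (the tested interpreter) has no importlib.metadata.
        import pkg_resources

        try:
            return pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def version_mismatches(expected, lookup=installed_version):
    """Map each package whose version differs to (expected, found)."""
    return {
        name: (want, lookup(name))
        for name, want in expected.items()
        if lookup(name) != want
    }


if __name__ == "__main__":
    for name, (want, found) in version_mismatches(EXPECTED).items():
        print(f"{name}: expected {want}, found {found}")
```

No output means every listed package matches; a found value of None means the package is not installed in the active environment.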

Download models and task dataset

  1. Download the BEVBert weights ckpt.iter9600.pth [link] into the ckpt folder. This is the best BEVBert checkpoint and is needed only if you want to train IEDL from scratch; otherwise, skip this step and download IEDL instead. The download can also be done with gdown (install it with pip install gdown):

    gdown --fuzzy [link]
  2. Download IEDL (TODO)

    gdown --fuzzy [link]
  3. Download the waypoint predictor check_cwp_bestdist_hfov90 [link] for CE (continuous environment) and place it in data/wp_pred

    gdown --fuzzy [link]
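  4. Download the task dataset, R2RIE-CE, from Google Drive and place it under data/datasets/

    cd data/datasets
    # Quote the URL so that the shell does not interpret "?" as a glob character
    gdown --fuzzy "https://drive.google.com/file/d/1GbypzvkiQ-e8M2I77UU5YDIZXi1sHkC3/view?usp=sharing"
    # Delete the archive only if extraction succeeded
    unzip R2RIE_CE_1_3_v1.zip && rm -f R2RIE_CE_1_3_v1.zip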
  5. Download gibson-2plus-resnet50.pth [link] and place it in a folder of your choice.

    wget [link]

Then, set the path of this .pth file in MODEL.DEPTH_ENCODER.ddppo_checkpoint in the eval and train scripts.
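With several downloads going to different folders, a small pre-flight check can catch a missing file before a long run starts. This is a sketch only: the function name missing_paths is hypothetical, and the list contains just the locations named in the steps above. The depth encoder lives in a folder of your choice, so its path is passed through the extra argument.

```python
from pathlib import Path

# Locations named in the steps above, relative to the repository root.
REQUIRED = [
    "ckpt/ckpt.iter9600.pth",    # BEVBert weights (training from scratch only)
    "data/wp_pred",              # waypoint predictor folder
    "data/datasets",             # R2RIE-CE task dataset
    "data/scene_datasets/mp3d",  # Matterport3D scenes
]


def missing_paths(repo_root, required=REQUIRED, extra=()):
    """Return the entries that do not exist under repo_root.

    Entries in `extra` may be absolute (for example the depth encoder
    checkpoint); pathlib keeps an absolute path as-is when it is joined.
    """
    root = Path(repo_root)
    candidates = list(required) + list(extra)
    return [p for p in candidates if not (root / p).exists()]


if __name__ == "__main__":
    for path in missing_paths("."):
        print(f"missing: {path}")
```

If you only plan to evaluate a downloaded IEDL model, the BEVBert entry can be dropped from the list.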

How to run

For training: open run_R2RIE-CE/train.bash and set the variable WANDB_RUN_NAME to the name of the folder where your checkpoints will be saved. Then copy the original BEVBert checkpoint, ckpt/ckpt.iter9600.pth, into that folder and run the following command:

CUDA_VISIBLE_DEVICES="0,1" bash run_R2RIE-CE/train.bash 2333
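The copy step that precedes training can be scripted so that it is not forgotten on a fresh run. The sketch below assumes you already know the full path of the run folder; this README does not say where the folder named by WANDB_RUN_NAME is created, so run_folder is a placeholder to fill in, and seed_run_folder is a hypothetical helper name.

```python
import shutil
from pathlib import Path


def seed_run_folder(run_folder, src_ckpt="ckpt/ckpt.iter9600.pth"):
    """Copy the BEVBert checkpoint into run_folder and return the new path."""
    src = Path(src_ckpt)
    if not src.is_file():
        raise FileNotFoundError(f"checkpoint not found: {src}")
    dst_dir = Path(run_folder)
    dst_dir.mkdir(parents=True, exist_ok=True)
    dst = dst_dir / src.name
    # Leave an existing copy untouched so a resumed run is not overwritten.
    if not dst.exists():
        shutil.copy2(src, dst)
    return dst
```

For example, seed_run_folder("path/to/my_run") creates the folder if needed and places ckpt.iter9600.pth inside it.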

For evaluation:

CUDA_VISIBLE_DEVICES="0,1" bash run_R2RIE-CE/eval.bash 2333

Docs

See the documentation in the docs folder for how to use the dataset (changing sensors, updating the task definition, etc.).

Acknowledgements

Our implementation is inspired by BEVBert.

Thanks for open sourcing this great work!
