Awesome-Vision-and-Language-Navigation

Updated on 2024.10.26

This repo tracks research papers and code related to Vision-and-Language Navigation (VLN). It will be continuously updated to keep up with advances in the field. Feel free to follow and star!

Table of Contents

  • News
  • Surveys
  • Datasets and Simulators
  • Papers and Codes
  • Foundation Models
  • Tools and Libraries
  • Acknowledgements
  • Contact

News

  • The AI Meets Autonomy: Vision, Language, and Autonomous Systems Workshop was held at the 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2024) in Abu Dhabi, UAE, on Oct. 14, 2024.
    [Official Website]

Surveys

  • Vision-and-Language Navigation Today and Tomorrow: A Survey in the Era of Foundation Models
    Yue Zhang, Ziqiao Ma, Jialu Li, Yanyuan Qiao, Zun Wang, Joyce Chai, Qi Wu, Mohit Bansal, Parisa Kordjamshidi
    arXiv, 2024. [Paper]

  • Large Language Models for Robotics: Opportunities, Challenges, and Perspectives
    Jiaqi Wang, Zihao Wu, Yiwei Li, Hanqi Jiang, Peng Shu, Enze Shi, Huawen Hu, Chong Ma, Yiheng Liu, Xuhui Wang, Yincheng Yao, Xuan Liu, Huaqin Zhao, Zhengliang Liu, Haixing Dai, Lin Zhao, Bao Ge, Xiang Li, Tianming Liu, Shu Zhang
    arXiv, 2024. [Paper]

Datasets and Simulators

In VLN tasks, datasets provide the visual assets and scenes, while simulators render those assets and expose an interactive environment to the VLN agent. This section introduces datasets and simulators commonly used in VLN research; a minimal usage sketch follows the simulator list below.

Datasets

  • [Matterport3D dataset] Learning from RGB-D Data in Indoor Environments
    Angel Chang, Angela Dai, Thomas Funkhouser, Maciej Halber, Matthias Nießner, Manolis Savva, Shuran Song, Andy Zeng, Yinda Zhang
    3DV, 2017. [Paper] [GitHub] [Project Page]

  • [R2R dataset] Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments
    Peter Anderson, Qi Wu, Damien Teney, Jake Bruce, Mark Johnson, Niko Sünderhauf, Ian Reid, Stephen Gould, Anton van den Hengel
    CVPR, 2018. [Paper] [GitHub]

  • [RxR dataset] Multilingual Vision-and-Language Navigation with Dense Spatiotemporal Grounding
    Alexander Ku, Peter Anderson, Roma Patel, Eugene Ie, Jason Baldridge
    EMNLP, 2020. [Paper] [GitHub]

  • HM3D-OVON: A Dataset and Benchmark for Open-Vocabulary Object Goal Navigation
    Naoki Yokoyama, Ram Ramrakhya, Abhishek Das, Dhruv Batra, Sehoon Ha
    IROS, 2024. [Paper] [GitHub]

Simulators

  • [Habitat3.0 simulator] Habitat 3.0: A Co-Habitat for Humans, Avatars and Robots
    Xavi Puig, Eric Undersander, Andrew Szot, Mikael Dallaire Cote, Tsung-Yen Yang, Ruslan Partsey, Ruta Desai, Alexander William Clegg, Michal Hlavac, So Yeon Min, Vladimír Vondruš, Theophile Gervet, Vincent-Pierre Berges, John M. Turner, Oleksandr Maksymets, Zsolt Kira, Mrinal Kalakrishnan, Jitendra Malik, Devendra Singh Chaplot, Unnat Jain, Dhruv Batra, Akshara Rai, Roozbeh Mottaghi
    arXiv, 2023. [Paper] [GitHub] [Project Page]
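
To make the dataset/simulator relationship above concrete, here is a minimal sketch (not taken from any of the listed papers) of loading a Matterport3D scene in habitat-sim and stepping an agent to obtain RGB observations. It assumes habitat-sim is installed and a Matterport3D .glb scene file has been downloaded; the scene path, sensor resolution, and action below are illustrative placeholders.

```python
# Minimal habitat-sim sketch: load a Matterport3D scene and step an agent.
# Assumes habitat-sim is installed and an MP3D scene file is available locally
# (the path below is a placeholder).
import habitat_sim

# Simulator configuration: which scene (dataset asset) to render.
sim_cfg = habitat_sim.SimulatorConfiguration()
sim_cfg.scene_id = "data/scene_datasets/mp3d/EXAMPLE_SCAN/EXAMPLE_SCAN.glb"

# Agent configuration: one RGB camera sensor.
rgb_spec = habitat_sim.CameraSensorSpec()
rgb_spec.uuid = "rgb"
rgb_spec.sensor_type = habitat_sim.SensorType.COLOR
rgb_spec.resolution = [480, 640]

agent_cfg = habitat_sim.agent.AgentConfiguration()
agent_cfg.sensor_specifications = [rgb_spec]

# The simulator renders the dataset's assets and exposes an action interface.
sim = habitat_sim.Simulator(habitat_sim.Configuration(sim_cfg, [agent_cfg]))

# A VLN policy would pick actions from the instruction; here we step once.
observations = sim.step("move_forward")
rgb_frame = observations["rgb"]  # H x W x 4 RGBA array rendered by the simulator
print(rgb_frame.shape)

sim.close()
```

In a full VLN setup, a navigation policy would consume the instruction together with such observations to choose each action; see the Matterport3D and Habitat GitHub pages above for the complete APIs and data download steps.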

Papers and Codes

2024

  • ReMEmbR: Building and Reasoning Over Long-Horizon Spatio-Temporal Memory for Robot Navigation
    Abrar Anwar, John Welsh, Joydeep Biswas, Soha Pouya, Yan Chang
    arXiv, 2024. [Paper] [GitHub]

  • Tag Map: A Text-Based Map for Spatial Reasoning and Navigation with Large Language Models
    Mike Zhang, Kaixian Qu, Vaishakh Patil, Cesar Cadena, Marco Hutter
    arXiv, 2024. [Paper] [Project Page]

  • Adaptive Zone-aware Hierarchical Planner for Vision-Language Navigation
    Chen Gao, Xingyu Peng, Mi Yan, He Wang, Lirong Yang, Haibing Ren, Hongsheng Li, Si Liu
    CVPR, 2023. [Paper] [GitHub]

  • MapGPT: Map-Guided Prompting with Adaptive Path Planning for Vision-and-Language Navigation
    Jiaqi Chen, Bingqian Lin, Ran Xu, Zhenhua Chai, Xiaodan Liang, Kwan-Yee K. Wong
    ACL, 2024. [Paper] [GitHub]

  • CANVAS: Commonsense-Aware Navigation System for Intuitive Human-Robot Interaction
    Suhwan Choi, Yongjun Cho, Minchan Kim, Jaeyoon Jung, Myunchul Joe, Yubeen Park, Minseo Kim, Sungwoong Kim, Sungjae Lee, Hwiseong Park, Jiwan Chung, Youngjae Yu
    arXiv, 2024. [Paper] [GitHub]

  • VLFM: Vision-Language Frontier Maps for Zero-Shot Semantic Navigation
    Naoki Yokoyama, Sehoon Ha, Dhruv Batra, Jiuguang Wang, Bernadette Bucher
    ICRA, 2024. [Paper] [GitHub]

  • Mind the Error! Detection and Localization of Instruction Errors in Vision-and-Language Navigation
    Francesco Taioli, Stefano Rosa, Alberto Castellini, Lorenzo Natale, Alessio Del Bue, Alessandro Farinelli, Marco Cristani, Yiming Wang
    IROS, 2024. [Paper] [GitHub]

  • MC-GPT: Empowering Vision-and-Language Navigation with Memory Map and Reasoning Chains
    Zhaohuan Zhan, Lisha Yu, Sijie Yu, Guang Tan
    arXiv, 2024. [Paper]

  • Continual Vision-and-Language Navigation
    Seongjun Jeong, Gi-Cheon Kang, Seongho Choi, Joochan Kim, Byoung-Tak Zhang
    arXiv, 2024. [Paper]

  • Open-Nav: Exploring Zero-Shot Vision-and-Language Navigation in Continuous Environment with Open-Source LLMs
    Yanyuan Qiao, Wenqi Lyu, Hui Wang, Zixu Wang, Zerui Li, Yuan Zhang, Mingkui Tan, Qi Wu
    arXiv, 2024. [Paper]

  • Find Everything: A General Vision Language Model Approach to Multi-Object Search
    Daniel Choi, Angus Fung, Haitong Wang, Aaron Hao Tan
    arXiv, 2024. [Paper] [GitHub]

  • NavGPT: Explicit Reasoning in Vision-and-Language Navigation with Large Language Models
    Gengze Zhou, Yicong Hong, Qi Wu
    AAAI, 2024. [Paper] [GitHub]

  • NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models
    Gengze Zhou, Yicong Hong, Zun Wang, Xin Eric Wang, Qi Wu
    arXiv, 2024. [Paper] [GitHub]

  • Lookahead Exploration with Neural Radiance Representation for Continuous Vision-Language Navigation
    Zihan Wang, Xiangyang Li, Jiahao Yang, Yeqi Liu, Junjie Hu, Ming Jiang, Shuqiang Jiang
    CVPR, 2024. [Paper] [GitHub]

  • Sim-to-Real Transfer via 3D Feature Fields for Vision-and-Language Navigation
    Zihan Wang, Xiangyang Li, Jiahao Yang, Yeqi Liu, Shuqiang Jiang
    arXiv, 2024. [Paper] [GitHub]

  • LangNav: Language as a Perceptual Representation for Navigation
    Bowen Pan, Rameswar Panda, SouYoung Jin, Rogerio Feris, Aude Oliva, Phillip Isola, Yoon Kim
    arXiv, 2024. [Paper] [GitHub]

  • Building Cooperative Embodied Agents Modularly with Large Language Models
    Hongxin Zhang, Weihua Du, Jiaming Shan, Qinhong Zhou, Yilun Du, Joshua B. Tenenbaum, Tianmin Shu, Chuang Gan
    ICLR, 2024. [Paper] [GitHub]

2023

  • LANA: A Language-Capable Navigator for Instruction Following and Generation
    Xiaohan Wang, Wenguan Wang, Jiayi Shao, Yi Yang
    CVPR, 2023. [Paper] [GitHub]

  • DREAMWALKER: Mental Planning for Continuous Vision-Language Navigation
    Hanqing Wang, Wei Liang, Luc Van Gool, Wenguan Wang
    ICCV, 2023. [Paper] [GitHub]

  • A2Nav: Action-Aware Zero-Shot Robot Navigation by Exploiting Vision-and-Language Ability of Foundation Models
    Peihao Chen, Xinyu Sun, Hongyan Zhi, Runhao Zeng, Thomas H. Li, Gaowen Liu, Mingkui Tan, Chuang Gan
    arXiv, 2023. [Paper]

2022

  • Sim-2-Sim for Vision-and-Language Navigation in Continuous Environments
    Jacob Krantz, Stefan Lee
    ECCV, 2022. [Paper] [GitHub]

  • Bridging the Gap Between Learning in Discrete and Continuous Environments for Vision-and-Language Navigation
    Yicong Hong, Zun Wang, Qi Wu, Stephen Gould
    CVPR, 2022. [Paper] [GitHub]

  • ETPNav: Evolving Topological Planning for Vision-Language Navigation in Continuous Environments
    Dong An, Hanqing Wang, Wenguan Wang, Zun Wang, Yan Huang, Keji He, Liang Wang
    CVPR, 2022. [Paper] [GitHub] Winner of the RxR-Habitat Challenge in CVPR 2022

  • Cross-modal map learning for vision and language navigation
    Georgios Georgakis, Karl Schmeckpeper, Karan Wanchoo, Soham Dan, Eleni Miltsakaki, Dan Roth, Kostas Daniilidis
    CVPR, 2022. [Paper] [GitHub]

2021

  • SOON: Scenario Oriented Object Navigation with Graph-based Exploration
    Fengda Zhu, Xiwen Liang, Yi Zhu, Qizhi Yu, Xiaojun Chang, Xiaodan Liang
    CVPR, 2021. [Paper] [GitHub] [Project Page]

  • Waypoint Models for Instruction-guided Navigation in Continuous Environments
    Jacob Krantz, Aaron Gokaslan, Dhruv Batra, Stefan Lee, Oleksandr Maksymets
    ICCV, 2021. [Paper] [GitHub] [Project Page]

  • Think Global, Act Local: Dual-scale Graph Transformer for Vision-and-Language Navigation
    Shizhe Chen, Pierre-Louis Guhur, Makarand Tapaswi, Cordelia Schmid, Ivan Laptev
    CVPR, 2022. [Paper] [GitHub] Winner of the ICCV 2021 Workshop Human Interaction for Robotic Navigation REVERIE & SOON Challenges

  • History Aware Multimodal Transformer for Vision-and-Language Navigation
    Shizhe Chen, Pierre-Louis Guhur, Cordelia Schmid, Ivan Laptev
    NeurIPS, 2021. [Paper] [GitHub]

Foundation Models

  • VILA: On Pre-training for Visual Language Models
    Ji Lin, Hongxu Yin, Wei Ping, Pavlo Molchanov, Mohammad Shoeybi, Song Han
    CVPR, 2024. [Paper] [GitHub] [HuggingFace]

  • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
    Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
    arXiv, 2018. [Paper] [GitHub] [Official Website]

  • CLIP: Learning Transferable Visual Models From Natural Language Supervision
    Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever
    arXiv, 2021. [Paper] [GitHub] [Official Website]

  • InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
    Wenliang Dai, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale Fung, Steven Hoi
    arXiv, 2023. [Paper] [GitHub] [Project Page]

  • BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
    Junnan Li, Dongxu Li, Silvio Savarese, Steven Hoi
    arXiv, 2023. [Paper] [GitHub] [Project Page]

  • ViNT: A Foundation Model for Visual Navigation
    Dhruv Shah, Ajay Sridhar, Nitish Dashora, Kyle Stachowicz, Kevin Black, Noriaki Hirose, Sergey Levine
    CoRL, 2023. [Paper] [GitHub] [Project Page]

Tools and Libraries

Acknowledgements

I would like to thank all the researchers and developers who have contributed to the field of Vision-and-Language Navigation.

Contact

If you have any suggestions for this repository, please create an issue or email [email protected].
