Updated on 2024.10.26
This repo tracks research papers and code related to Vision-and-Language Navigation (VLN). It will be continuously updated to keep up with advances in VLN. Feel free to follow and star!
- News
- Surveys
- Datasets and Simulators
- Papers and Codes
- Foundation Models
- Tools and Libraries
- Acknowledgements
- Contact
## News

- AI Meets Autonomy: Vision, Language, and Autonomous Systems Workshop was held at the 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2024) in Abu Dhabi, UAE, on Oct. 14, 2024.
[Official Website]
## Surveys

- Vision-and-Language Navigation Today and Tomorrow: A Survey in the Era of Foundation Models
Yue Zhang, Ziqiao Ma, Jialu Li, Yanyuan Qiao, Zun Wang, Joyce Chai, Qi Wu, Mohit Bansal, Parisa Kordjamshidi
arXiv, 2024. [Paper]
- Large Language Models for Robotics: Opportunities, Challenges, and Perspectives
Jiaqi Wang, Zihao Wu, Yiwei Li, Hanqi Jiang, Peng Shu, Enze Shi, Huawen Hu, Chong Ma, Yiheng Liu, Xuhui Wang, Yincheng Yao, Xuan Liu, Huaqin Zhao, Zhengliang Liu, Haixing Dai, Lin Zhao, Bao Ge, Xiang Li, Tianming Liu, Shu Zhang
arXiv, 2024. [Paper]
## Datasets and Simulators

In VLN tasks, datasets provide the visual assets and scenes, while simulators render these assets and expose an interactive environment to the VLN agent. This section introduces datasets and simulators commonly used in VLN research; a minimal sketch of the resulting agent-simulator interaction loop follows the list below.
- [Matterport3D dataset] Learning from RGB-D Data in Indoor Environments
Angel Chang, Angela Dai, Thomas Funkhouser, Maciej Halber, Matthias Nießner, Manolis Savva, Shuran Song, Andy Zeng, Yinda Zhang
3DV, 2017. [Paper] [GitHub] [Project Page]
- [R2R dataset] Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments
Peter Anderson, Qi Wu, Damien Teney, Jake Bruce, Mark Johnson, Niko Sunderhauf, Ian Reid, Stephen Gould, Anton van den Hengel
CVPR, 2018. [Paper] [GitHub]
- [RxR dataset] Multilingual Vision-and-Language Navigation with Dense Spatiotemporal Grounding
Alexander Ku, Peter Anderson, Roma Patel, Eugene Ie, Jason Baldridge
EMNLP, 2020. [Paper] [GitHub]
- HM3D-OVON: A Dataset and Benchmark for Open-Vocabulary Object Goal Navigation
Naoki Yokoyama, Ram Ramrakhya, Abhishek Das, Dhruv Batra, Sehoon Ha
IROS, 2024. [Paper] [GitHub]
- [Habitat 3.0 simulator] Habitat 3.0: A Co-Habitat for Humans, Avatars and Robots
Xavi Puig, Eric Undersander, Andrew Szot, Mikael Dallaire Cote, Tsung-Yen Yang, Ruslan Partsey, Ruta Desai, Alexander William Clegg, Michal Hlavac, So Yeon Min, Vladimír Vondruš, Theophile Gervet, Vincent-Pierre Berges, John M. Turner, Oleksandr Maksymets, Zsolt Kira, Mrinal Kalakrishnan, Jitendra Malik, Devendra Singh Chaplot, Unnat Jain, Dhruv Batra, Akshara Rai, Roozbeh Mottaghi
arXiv, 2023. [Paper] [GitHub] [Project Page]
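To make the dataset/simulator split above concrete, here is a minimal sketch of the agent-simulator interaction loop. The `run_episode` helper, the `sim`/`agent` interfaces, and the episode fields are illustrative assumptions, not the API of the Matterport3D Simulator, Habitat, or any other simulator listed here.

```python
# Illustrative VLN episode loop (hypothetical interface, not a real simulator API).
# The dataset supplies the episode (scene ID, start pose, instruction, goal);
# the simulator renders the scene and exposes the environment to the agent.

def run_episode(sim, agent, episode, max_steps=20, success_radius=3.0):
    """Roll out one instruction-following episode and report success.

    Assumes `sim` exposes new_episode / get_observation / move_to / distance_to,
    and `agent` maps (instruction, observation) to the ID of the next navigable
    viewpoint or the string "STOP".
    """
    sim.new_episode(episode["scan"], episode["start_viewpoint"], episode["heading"])
    for _ in range(max_steps):
        obs = sim.get_observation()           # e.g. RGB(-D) views plus navigable viewpoints
        action = agent(episode["instruction"], obs)
        if action == "STOP":                  # the agent decides it has reached the goal
            break
        sim.move_to(action)                   # step along the navigation graph
    # R2R-style success criterion: stop within `success_radius` metres of the goal.
    return sim.distance_to(episode["goal_viewpoint"]) <= success_radius
```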
## Papers and Codes

- ReMEmbR: Building and Reasoning Over Long-Horizon Spatio-Temporal Memory for Robot Navigation
Abrar Anwar, John Welsh, Joydeep Biswas, Soha Pouya, Yan Chang
arXiv, 2024. [Paper] [GitHub]
- Tag Map: A Text-Based Map for Spatial Reasoning and Navigation with Large Language Models
Mike Zhang, Kaixian Qu, Vaishakh Patil, Cesar Cadena, Marco Hutter
arXiv, 2024. [Paper] [Project Page]
- Adaptive Zone-aware Hierarchical Planner for Vision-Language Navigation
Chen Gao, Xingyu Peng, Mi Yan, He Wang, Lirong Yang, Haibing Ren, Hongsheng Li, Si Liu
CVPR, 2023. [Paper] [GitHub]
- MapGPT: Map-Guided Prompting with Adaptive Path Planning for Vision-and-Language Navigation
Jiaqi Chen, Bingqian Lin, Ran Xu, Zhenhua Chai, Xiaodan Liang, Kwan-Yee K. Wong
ACL, 2024. [Paper] [GitHub]
- CANVAS: Commonsense-Aware Navigation System for Intuitive Human-Robot Interaction
Suhwan Choi, Yongjun Cho, Minchan Kim, Jaeyoon Jung, Myunchul Joe, Yubeen Park, Minseo Kim, Sungwoong Kim, Sungjae Lee, Hwiseong Park, Jiwan Chung, Youngjae Yu
arXiv, 2024. [Paper] [GitHub]
- VLFM: Vision-Language Frontier Maps for Zero-Shot Semantic Navigation
Naoki Yokoyama, Sehoon Ha, Dhruv Batra, Jiuguang Wang, Bernadette Bucher
ICRA, 2024. [Paper] [GitHub]
- Mind the Error! Detection and Localization of Instruction Errors in Vision-and-Language Navigation
Francesco Taioli, Stefano Rosa, Alberto Castellini, Lorenzo Natale, Alessio Del Bue, Alessandro Farinelli, Marco Cristani, Yiming Wang
IROS, 2024. [Paper] [GitHub]
- MC-GPT: Empowering Vision-and-Language Navigation with Memory Map and Reasoning Chains
Zhaohuan Zhan, Lisha Yu, Sijie Yu, Guang Tan
arXiv, 2024. [Paper]
- Continual Vision-and-Language Navigation
Seongjun Jeong, Gi-Cheon Kang, Seongho Choi, Joochan Kim, Byoung-Tak Zhang
arXiv, 2024. [Paper]
- Open-Nav: Exploring Zero-Shot Vision-and-Language Navigation in Continuous Environment with Open-Source LLMs
Yanyuan Qiao, Wenqi Lyu, Hui Wang, Zixu Wang, Zerui Li, Yuan Zhang, Mingkui Tan, Qi Wu
arXiv, 2024. [Paper]
- Find Everything: A General Vision Language Model Approach to Multi-Object Search
Daniel Choi, Angus Fung, Haitong Wang, Aaron Hao Tan
arXiv, 2024. [Paper] [GitHub]
- NavGPT: Explicit Reasoning in Vision-and-Language Navigation with Large Language Models
Gengze Zhou, Yicong Hong, Qi Wu
AAAI, 2024. [Paper] [GitHub]
- NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models
Gengze Zhou, Yicong Hong, Zun Wang, Xin Eric Wang, Qi Wu
arXiv, 2024. [Paper] [GitHub]
- Lookahead Exploration with Neural Radiance Representation for Continuous Vision-Language Navigation
Zihan Wang, Xiangyang Li, Jiahao Yang, Yeqi Liu, Junjie Hu, Ming Jiang, Shuqiang Jiang
CVPR, 2024. [Paper] [GitHub]
- Sim-to-Real Transfer via 3D Feature Fields for Vision-and-Language Navigation
Zihan Wang, Xiangyang Li, Jiahao Yang, Yeqi Liu, Shuqiang Jiang
arXiv, 2024. [Paper] [GitHub]
- LangNav: Language as a Perceptual Representation for Navigation
Bowen Pan, Rameswar Panda, SouYoung Jin, Rogerio Feris, Aude Oliva, Phillip Isola, Yoon Kim
arXiv, 2024. [Paper] [GitHub]
- Building Cooperative Embodied Agents Modularly with Large Language Models
Hongxin Zhang, Weihua Du, Jiaming Shan, Qinhong Zhou, Yilun Du, Joshua B. Tenenbaum, Tianmin Shu, Chuang Gan
ICLR, 2024. [Paper] [GitHub]
- LANA: A Language-Capable Navigator for Instruction Following and Generation
Xiaohan Wang, Wenguan Wang, Jiayi Shao, Yi Yang
CVPR, 2023. [Paper] [GitHub]
- Dreamwalker: Mental Planning for Continuous Vision-Language Navigation
Hanqing Wang, Wei Liang, Luc Van Gool, Wenguan Wang
ICCV, 2023. [Paper] [GitHub]
- A2Nav: Action-Aware Zero-Shot Robot Navigation by Exploiting Vision-and-Language Ability of Foundation Models
Peihao Chen, Xinyu Sun, Hongyan Zhi, Runhao Zeng, Thomas H. Li, Gaowen Liu, Mingkui Tan, Chuang Gan
arXiv, 2023. [Paper]
- Sim-2-Sim for Vision-and-Language Navigation in Continuous Environments
Jacob Krantz, Stefan Lee
ECCV, 2022. [Paper] [GitHub]
- Bridging the Gap Between Learning in Discrete and Continuous Environments for Vision-and-Language Navigation
Yicong Hong, Zun Wang, Qi Wu, Stephen Gould
CVPR, 2022. [Paper] [GitHub]
- ETPNav: Evolving Topological Planning for Vision-Language Navigation in Continuous Environments
Dong An, Hanqing Wang, Wenguan Wang, Zun Wang, Yan Huang, Keji He, Liang Wang
CVPR, 2022. [Paper] [GitHub] Winner of the RxR-Habitat Challenge in CVPR 2022
- Cross-modal Map Learning for Vision and Language Navigation
Georgios Georgakis, Karl Schmeckpeper, Karan Wanchoo, Soham Dan, Eleni Miltsakaki, Dan Roth, Kostas Daniilidis
CVPR, 2022. [Paper] [GitHub]
- SOON: Scenario Oriented Object Navigation with Graph-based Exploration
Fengda Zhu, Xiwen Liang, Yi Zhu, Qizhi Yu, Xiaojun Chang, Xiaodan Liang
CVPR, 2021. [Paper] [GitHub] [Project Page]
- Waypoint Models for Instruction-guided Navigation in Continuous Environments
Jacob Krantz, Aaron Gokaslan, Dhruv Batra, Stefan Lee, Oleksandr Maksymets
ICCV, 2021. [Paper] [GitHub] [Project Page]
- Think Global, Act Local: Dual-scale Graph Transformer for Vision-and-Language Navigation
Shizhe Chen, Pierre-Louis Guhur, Makarand Tapaswi, Cordelia Schmid, Ivan Laptev
CVPR, 2022. [Paper] [GitHub] Winner of the ICCV 2021 Workshop Human Interaction for Robotic Navigation REVERIE & SOON Challenges
- History Aware Multimodal Transformer for Vision-and-Language Navigation
Shizhe Chen, Pierre-Louis Guhur, Cordelia Schmid, Ivan Laptev
NeurIPS, 2021. [Paper] [GitHub]
## Foundation Models

- VILA: On Pre-training for Visual Language Models
Ji Lin, Hongxu Yin, Wei Ping, Pavlo Molchanov, Mohammad Shoeybi, Song Han
CVPR, 2024. [Paper] [GitHub] [HuggingFace]
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
arXiv, 2018. [Paper] [GitHub] [Official Website]
- CLIP: Learning Transferable Visual Models From Natural Language Supervision
Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever
arXiv, 2021. [Paper] [GitHub] [Official Website]
- InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
Wenliang Dai, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale Fung, Steven Hoi
arXiv, 2023. [Paper] [GitHub] [Project Page]
- BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li, Dongxu Li, Silvio Savarese, Steven Hoi
arXiv, 2023. [Paper] [GitHub] [Project Page]
- ViNT: A Foundation Model for Visual Navigation
Dhruv Shah, Ajay Sridhar, Nitish Dashora, Kyle Stachowicz, Kevin Black, Noriaki Hirose, Sergey Levine
CoRL, 2023. [Paper] [GitHub] [Project Page]
## Tools and Libraries

- [Transformers] State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow
[GitHub] [Official Website]
- [LangChain] A framework for developing applications powered by large language models (LLMs)
[GitHub] [Official Website]
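As a small example of how such libraries appear in VLN pipelines, the snippet below uses the Transformers CLIP classes to score a few candidate sub-instructions against the current view. The checkpoint name, placeholder image, and prompt strings are illustrative choices; real VLN systems typically feed such features into a learned policy rather than acting on the scores directly.

```python
# Sketch: scoring candidate sub-instructions against the current view with CLIP
# via Hugging Face Transformers. Checkpoint, image, and prompts are illustrative.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

view = Image.new("RGB", (224, 224))                        # placeholder for an egocentric view
texts = ["go up the stairs", "enter the kitchen", "stop"]  # hypothetical candidate actions

inputs = processor(text=texts, images=view, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image              # image-text similarity scores
print(logits.softmax(dim=-1))                              # distribution over the candidates
```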
## Acknowledgements

I would like to thank all the researchers and developers who have contributed to the field of Vision-and-Language Navigation.
## Contact

If you have any suggestions for this repository, please create an issue or email [email protected].