(Demo video: demo.mp4)
We present OmniDrive, a holistic LLM-agent framework for end-to-end autonomous driving. Our main contributions include novel solutions in both the model (OmniDrive-Agent) and the benchmark (OmniDrive-nuScenes). The former features a novel 3D multimodal LLM design that uses sparse queries to lift and compress visual representations into 3D. The latter comprises comprehensive VQA tasks for reasoning and planning, including scene description, traffic regulation, 3D grounding, counterfactual reasoning, decision making, and planning.
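The sparse-query design can be illustrated with a toy cross-attention step: a small set of learnable queries attends to the flattened multi-view image tokens, compressing them into a compact set of 3D-aware tokens for the LLM. This is a minimal NumPy sketch, not the actual OmniDrive implementation; all shapes, names, and the single-head attention form are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def lift_with_sparse_queries(img_feats, queries, w_q, w_k, w_v):
    """One cross-attention step: sparse queries (num_queries, d) attend to
    flattened multi-view image tokens (num_tokens, d). Hypothetical sketch."""
    q = queries @ w_q                                # project queries
    k = img_feats @ w_k                              # project image tokens
    v = img_feats @ w_v
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))   # (num_queries, num_tokens)
    return attn @ v                                  # compressed 3D-aware tokens

rng = np.random.default_rng(0)
d = 64
num_views, tokens_per_view = 6, 100                  # e.g. 6 surround cameras
img_feats = rng.standard_normal((num_views * tokens_per_view, d))
queries = rng.standard_normal((256, d))              # sparse learnable queries
w_q, w_k, w_v = (rng.standard_normal((d, d)) * 0.05 for _ in range(3))

out = lift_with_sparse_queries(img_feats, queries, w_q, w_k, w_v)
print(out.shape)  # (256, 64): 600 image tokens compressed into 256 query tokens
```

The point of the sketch is the compression: the LLM only ever sees the fixed, small query set rather than the full multi-view token grid.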
- [2025/04/16] Adding TensorRT support. [Link]
- [2025/02/26] OmniDrive is accepted to CVPR 2025.
- [2024/07/18] OmniDrive-nuScenes model release. [HF]
- [2024/05/02] OmniDrive-nuScenes dataset release. [Data]
- [2024/05/02] Technical report release. [arXiv]
Please follow Environment Setup step by step.
- OmniDrive Training Framework
- OmniDrive Dataset
- OmniDrive Checkpoint
- Evaluation
- Data Generation
- TensorRT Inference
- Tiny LLM
Joint End-to-end Planning and Reasoning
Interactive Conversation with Ego Vehicle
Counterfactual Reasoning of Planning Behaviors
If this work is helpful for your research, please consider citing:

```bibtex
@inproceedings{wang2025omnidrive,
  title={{OmniDrive}: A Holistic Vision-Language Dataset for Autonomous Driving with Counterfactual Reasoning},
  author={Shihao Wang and Zhiding Yu and Xiaohui Jiang and Shiyi Lan and Min Shi and Nadine Chang and Jan Kautz and Ying Li and Jose M. Alvarez},
  booktitle={CVPR},
  year={2025}
}
```
The team would like to give special thanks to the NVIDIA TSE Team, including Le An, Chengzhe Xu, Yuchao Jin, and Josh Park, for their exceptional work on the TensorRT deployment of OmniDrive.