
DriveLMM-o1: A Step-by-Step Reasoning Dataset and Large Multimodal Model for Driving Scenario Understanding


Ayesha Ishaq, Jean Lahoud, Ketan More, Omkar Thawakar, Ritesh Thawkar, Dinura Dissanayake, Noor Ahsan, Yuhao Li, Fahad Shahbaz Khan, Hisham Cholakkal, Ivan Laptev, Rao Muhammad Anwer and Salman Khan

Mohamed bin Zayed University of Artificial Intelligence, UAE

If you like our project, please give us a star ⭐ on GitHub for the latest updates.

📣 Latest Updates

  • March-13-2025: DriveLMM-o1 is released on arXiv. Paper
  • March-12-2025: Code, model & dataset released. Model checkpoint: HuggingFace. Dataset: HuggingFace. Code: GitHub. 🤗

🏆 Contributions

  • We introduce a dataset and benchmark specifically designed to assess the reasoning capabilities of models in autonomous driving, capturing diverse real-world scenarios.
  • We annotate driving scenes that inherently contain rich inputs, including multiview images, LiDAR point clouds, and temporal information, facilitating the integration of various modalities for future VQA solutions.
  • We propose a novel evaluation metric tailored to autonomous driving, measuring the logical coherence and accuracy of model-generated explanations.
  • We evaluate previous open-source and closed-source models on our proposed benchmark and introduce a model trained on our step-by-step reasoning dataset. Experimental results show that our model outperforms all compared models in both reasoning score and final-answer accuracy.

📂 Dataset Overview

The DriveLMM-o1 dataset covers diverse real-world driving scenarios with structured step-by-step reasoning annotations, providing a rich benchmark for evaluating autonomous-driving LMMs.
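
To inspect the annotations programmatically, the dataset can be loaded with the Hugging Face datasets library. The snippet below is a minimal sketch: the dataset id is a placeholder that must be replaced with the actual id from the Hugging Face dataset page linked above, and the splits and fields printed are whatever the hosted dataset defines.

from datasets import load_dataset

# Replace the placeholder with the dataset id from the Hugging Face dataset page linked above.
ds = load_dataset('<huggingface-dataset-id>')
print(ds)  # lists the available splits and annotation fields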

Dataset Examples


📊 Benchmark & Results

Results Overview

Table 1: Comparison of models based on Final Answer accuracy and Driving-Specific Reasoning Steps Metrics on DriveLMM-o1 Benchmark.


⚙️ Model Setup

Load Pretrained Model

from transformers import AutoModel, AutoTokenizer
import torch

path = 'ayeshaishaq/DriveLMMo1'

# Load the DriveLMM-o1 checkpoint in bfloat16 on the GPU.
# trust_remote_code is required because the checkpoint ships custom modeling code.
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    use_flash_attn=True,
    trust_remote_code=True
).eval().cuda()

# Load the matching tokenizer (the slow tokenizer is used with this checkpoint).
tokenizer = AutoTokenizer.from_pretrained(
    path,
    trust_remote_code=True,
    use_fast=False
)

🏃 Inference

To run inference on our model, add the relevant image path to evaluation/demo.py and run:

python evaluation/demo.py
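
For a quick programmatic check, a minimal single-image query might look like the sketch below. It reuses the model and tokenizer loaded above and assumes the checkpoint exposes an InternVL-style model.chat interface; the image path, question text, and preprocessing here are illustrative only, and evaluation/demo.py remains the authoritative example (including multiview image handling).

import torch
import torchvision.transforms as T
from PIL import Image

# Simple 448x448 preprocessing with ImageNet statistics (illustrative;
# demo.py performs the full preprocessing for multiview driving scenes).
transform = T.Compose([
    T.Resize((448, 448), interpolation=T.InterpolationMode.BICUBIC),
    T.ToTensor(),
    T.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])

image = Image.open('path/to/driving_scene.jpg').convert('RGB')  # hypothetical path
pixel_values = transform(image).unsqueeze(0).to(torch.bfloat16).cuda()

question = '<image>\nIs it safe for the ego vehicle to change lanes to the left? Reason step by step.'
generation_config = dict(max_new_tokens=1024, do_sample=False)

# model.chat is assumed from the InternVL-style remote code loaded with the checkpoint.
response = model.chat(tokenizer, pixel_values, question, generation_config)
print(response)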

📏 Evaluation

To reproduce our results, first set the relevant data paths in evaluation/prepare_data_internvl.py and convert our test set to JSONL format:

python evaluation/prepare_data_internvl.py

Then, obtain model outputs on our test set using evaluation/inference.py (adjust --nproc_per_node to the number of available GPUs):

torchrun --nnodes=1 \
         --node_rank=0 \
         --master_addr=127.0.0.1 \
         --nproc_per_node=4 \
         --master_port=63668 \
         evaluation/inference.py

Lastly, set the output file path and your OpenAI API key in evaluation/evaluation.py, then run:

python evaluation/evaluation.py

📚 Dataset & Model Links


📜 Citation

If you find this work useful, please cite our paper:

@misc{ishaq2025drivelmmo1stepbystepreasoningdataset,
      title={DriveLMM-o1: A Step-by-Step Reasoning Dataset and Large Multimodal Model for Driving Scenario Understanding}, 
      author={Ayesha Ishaq and Jean Lahoud and Ketan More and Omkar Thawakar and Ritesh Thawkar and Dinura Dissanayake and Noor Ahsan and Yuhao Li and Fahad Shahbaz Khan and Hisham Cholakkal and Ivan Laptev and Rao Muhammad Anwer and Salman Khan},
      year={2025},
      eprint={2503.10621},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.10621}, 
}
