DriveLMM-o1: A Step-by-Step Reasoning Dataset and Large Multimodal Model for Driving Scenario Understanding
Ayesha Ishaq, Jean Lahoud, Ketan More, Omkar Thawakar, Ritesh Thawkar, Dinura Dissanayake, Noor Ahsan, Yuhao Li, Fahad Shahbaz Khan, Hisham Cholakkal, Ivan Laptev, Rao Muhammad Anwer and Salman Khan
Mohamed bin Zayed University of Artificial Intelligence, UAE
- March-13-2025: DriveLMM-o1 is released on arXiv. Paper
- March-12-2025: Code, Model & Dataset release. Model Checkpoint: HuggingFace. Dataset: HuggingFace. Code is available at: GitHub. 🤗
- We introduce a dataset and benchmark specifically designed to assess the reasoning capabilities of models in autonomous driving, capturing diverse real-world scenarios.
- We annotate driving scenes that inherently contain rich inputs, including multiview images, LiDAR point clouds, and temporal information, facilitating the integration of various modalities for future VQA solutions.
- We propose a novel evaluation metric tailored to autonomous driving, measuring the logical coherence and accuracy of model-generated explanations.
- We evaluate previous open-source and closed-source models on our proposed benchmark and introduce a model trained on our step-by-step reasoning dataset. Experimental results show that our model outperforms all evaluated models in both reasoning score and final answer accuracy.
The DriveLMM-o1 dataset includes diverse real-world driving scenarios with structured step-by-step reasoning annotations, providing a rich benchmark for evaluating autonomous driving LMMs.
Table 1: Comparison of models based on Final Answer accuracy and Driving-Specific Reasoning Steps Metrics on DriveLMM-o1 Benchmark.
from transformers import AutoModel, AutoTokenizer
import torch

path = 'ayeshaishaq/DriveLMMo1'

# Load the model in bfloat16 with FlashAttention enabled; trust_remote_code is
# required because the checkpoint ships its own modeling code.
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    use_flash_attn=True,
    trust_remote_code=True
).eval().cuda()

tokenizer = AutoTokenizer.from_pretrained(
    path,
    trust_remote_code=True,
    use_fast=False
)
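The loading code above follows the InternVL-style remote-code interface. Assuming the checkpoint exposes the usual model.chat API (an assumption; evaluation/demo.py shows the exact preprocessing and prompt format), a minimal single-image query looks roughly like this, with a simple resize standing in for the multi-tile preprocessing:

import torch
import torchvision.transforms as T
from PIL import Image

# Minimal sketch, assuming an InternVL-style `model.chat` interface.
# The single 448x448 resize below is a simplified stand-in for the tiling
# used in evaluation/demo.py.
transform = T.Compose([
    T.Resize((448, 448)),
    T.ToTensor(),
    T.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])

image = Image.open('path/to/front_camera.jpg').convert('RGB')  # replace with your image path
pixel_values = transform(image).unsqueeze(0).to(torch.bfloat16).cuda()

question = '<image>\nIs it safe for the ego vehicle to change lanes to the left? Explain step by step.'
generation_config = dict(max_new_tokens=1024, do_sample=False)

response = model.chat(tokenizer, pixel_values, question, generation_config)
print(response)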
To run inference on our model, add the relevant image path to evaluation/demo.py and run:
python evaluation/demo.py
To reproduce our results, first convert our test set to jsonl format using:
python evaluation/prepare_data_internvl.py
Set the relevant data paths in the script before running.
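To sanity-check the conversion before running inference, you can inspect the first record of the generated JSONL file (the file name below is an assumption; use the path produced by prepare_data_internvl.py):

import json

# Quick sanity check of the converted test set; the file name is a placeholder.
with open('data/drivelmm_o1_test.jsonl') as f:
    first_sample = json.loads(f.readline())

print(first_sample.keys())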
Then, obtain model outputs on our test set with evaluation/inference.py, adjusting --nproc_per_node to the number of available GPUs:
torchrun --nnodes=1 \
--node_rank=0 \
--master_addr=127.0.0.1 \
--nproc_per_node=4 \
--master_port=63668 \
evaluation/inference.py
Lastly, set the output file path and your OpenAI API key in evaluation/evaluation.py, then run:
python evaluation/evaluation.py
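The evaluation scores both the final answer and the intermediate reasoning steps using a GPT-based judge. As a rough illustration only (the prompt wording, scoring criteria, and judge model below are assumptions; see evaluation/evaluation.py for the actual metric), such a scoring call looks like:

from openai import OpenAI

# Illustrative LLM-as-judge scoring sketch; the prompt, criteria, and judge model
# are assumptions. The actual metric is implemented in evaluation/evaluation.py.
client = OpenAI()  # reads OPENAI_API_KEY from the environment

def score_reasoning(question, reference_steps, predicted_steps):
    prompt = (
        "You are evaluating step-by-step reasoning for an autonomous-driving question.\n"
        f"Question: {question}\n"
        f"Reference reasoning: {reference_steps}\n"
        f"Model reasoning: {predicted_steps}\n"
        "Rate the logical coherence and factual accuracy of the model reasoning "
        "from 0 to 100 and reply with the number only."
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content.strip()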
- Model on Hugging Face: DriveLMM-o1 Model
- Dataset on Hugging Face: DriveLMM-o1 Dataset
If you find this work useful, please cite our paper:
@misc{ishaq2025drivelmmo1stepbystepreasoningdataset,
title={DriveLMM-o1: A Step-by-Step Reasoning Dataset and Large Multimodal Model for Driving Scenario Understanding},
author={Ayesha Ishaq and Jean Lahoud and Ketan More and Omkar Thawakar and Ritesh Thawkar and Dinura Dissanayake and Noor Ahsan and Yuhao Li and Fahad Shahbaz Khan and Hisham Cholakkal and Ivan Laptev and Rao Muhammad Anwer and Salman Khan},
year={2025},
eprint={2503.10621},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2503.10621},
}