📌 Enhancing autonomous driving with tracking-powered multimodal understanding!
This repository presents an innovative approach that integrates 3D object tracking into Large Multimodal Models (LMMs) to enhance spatiotemporal understanding in autonomous driving. 🚘⚡ By leveraging tracking information, we significantly improve perception, planning, and prediction compared to baseline models.
🔹 Key Benefits:
- 📸 Vision + Tracking: We enhance visual question answering (VQA) in autonomous driving by integrating tracking-based embeddings.
- 🚀 3D Object Tracking: We use 3DMOTFormer for robust multi-object tracking, improving contextual understanding.
- 🔗 Multimodal Fusion: Images and tracking features are jointly processed to enhance reasoning and predictions (see the fusion sketch after this list).
- 🧠 Self-Supervised Pretraining: Our tracking encoder boosts model comprehension.
- 🏆 Benchmark Success: We achieve a 9.5% accuracy gain and 7.04-point ChatGPT score improvement on DriveLM-nuScenes, and a 3.7% final score increase on DriveLM-CARLA. 📊🔥
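The actual fusion module is part of the training code; purely as an illustration of the idea, here is a minimal PyTorch-style sketch of projecting per-object tracking features into the visual token space and concatenating them with the image tokens. The module name, dimensions, and layer choices are assumptions, not the repository's implementation.

```python
# Hypothetical sketch: fuse per-object tracking features with visual tokens.
# Names and dimensions are illustrative; the training code is authoritative.
import torch
import torch.nn as nn

class TrackFusion(nn.Module):
    def __init__(self, track_dim: int = 64, embed_dim: int = 768):
        super().__init__()
        # Project raw track features (e.g., positions, velocities)
        # into the same embedding space as the visual tokens.
        self.track_proj = nn.Sequential(
            nn.Linear(track_dim, embed_dim),
            nn.GELU(),
            nn.Linear(embed_dim, embed_dim),
        )

    def forward(self, visual_tokens: torch.Tensor, track_feats: torch.Tensor) -> torch.Tensor:
        # visual_tokens: (B, N_img, embed_dim), track_feats: (B, N_obj, track_dim)
        track_tokens = self.track_proj(track_feats)              # (B, N_obj, embed_dim)
        return torch.cat([visual_tokens, track_tokens], dim=1)   # (B, N_img + N_obj, embed_dim)

# Example with random tensors:
fusion = TrackFusion()
fused = fusion(torch.randn(2, 196, 768), torch.randn(2, 10, 64))
print(fused.shape)  # torch.Size([2, 206, 768])
```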
🔹 VQA Datasets: Obtain datasets following instructions from DriveLM.
🔹 Tracking Data:
- 📌 Step 1: Generate 3D object and ego-vehicle tracks using 3DMOTFormer.
- 📌 Step 2: Process these tracks to map the key-object and ego-vehicle trajectories to each question (an illustrative sketch follows this list).
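As a rough sketch of what Step 2 could look like, the snippet below pairs each VQA entry with the trajectories of its key objects and the ego vehicle. The field names (`sample_token`, `key_object_ids`, `ego`) are placeholders, not the actual schema used by this repository.

```python
# Illustrative sketch of Step 2: attach per-question object and ego tracks.
# Field names (sample_token, key_object_ids, ego) are placeholders, not the real schema.
import json

def attach_tracks(questions_path: str, tracks_path: str, out_path: str) -> None:
    with open(questions_path) as f:
        questions = json.load(f)   # list of VQA entries
    with open(tracks_path) as f:
        tracks = json.load(f)      # {sample_token: {object_id: trajectory, ...}}

    for q in questions:
        sample_tracks = tracks.get(q["sample_token"], {})
        # Keep only trajectories of the key objects referenced by this question,
        # plus the ego-vehicle trajectory.
        q["object_tracks"] = {
            obj_id: sample_tracks[obj_id]
            for obj_id in q.get("key_object_ids", [])
            if obj_id in sample_tracks
        }
        q["ego_track"] = sample_tracks.get("ego", [])

    with open(out_path, "w") as f:
        json.dump(questions, f, indent=2)
```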
💡 To set up and fine-tune the model, refer to `llama_adapter_v2_multimodal7b/README.md` in this repository.
🔧 Before running inference, extract the adapter weights with `save_weights.py`: inside the script, set the path to your trained weights and the desired output path (a rough sketch of this step is shown below).
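As a rough idea of what such an extraction typically involves (the checkpoint layout and the `"adapter"` key filter below are assumptions; `save_weights.py` in this repository is the authoritative version):

```python
# Rough idea of adapter-weight extraction; the repo's save_weights.py is authoritative.
# The checkpoint structure and the "adapter" key filter below are assumptions.
import torch

TRAINED_CKPT = "/path/to/trained/checkpoint.pth"      # set to your trained weights
OUTPUT_PATH = "/path/to/pre-trained/checkpoint.pth"   # adapter-only output

ckpt = torch.load(TRAINED_CKPT, map_location="cpu")
state_dict = ckpt.get("model", ckpt)  # unwrap if weights are stored under "model"

# Keep only the lightweight adapter parameters, dropping the frozen LLaMA backbone.
adapter_state = {k: v for k, v in state_dict.items() if "adapter" in k}

torch.save({"model": adapter_state}, OUTPUT_PATH)
print(f"Saved {len(adapter_state)} adapter tensors to {OUTPUT_PATH}")
```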
Run the following command to perform inference on the test data:

```bash
cd llama_adapter_v2_multimodal7b/
python demo.py --llama_dir /path/to/llama_model_weights \
    --checkpoint /path/to/pre-trained/checkpoint.pth \
    --data ../test_llama.json \
    --output ../output.json \
    --batch_size 4 \
    --num_processes 8
```
🔍 To evaluate the model's performance:
1️⃣ Set up the evaluation package by following the instructions in the DriveLM Challenge README.
2️⃣ Run the evaluation script:
```bash
python evaluation/evaluation.py --root_path1 ./output.json --root_path2 ./test_eval.json
```
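Before launching the evaluation, a quick sanity check that the prediction file covers every ground-truth entry can save a failed run. The snippet below assumes both files are JSON lists of dicts with an `id` field, which may differ from the actual DriveLM format; adjust the keys to match your files.

```python
# Quick sanity check: do the predictions cover all ground-truth ids?
# Assumes both files are JSON lists of dicts with an "id" field (an assumption).
import json

with open("output.json") as f:
    preds = json.load(f)
with open("test_eval.json") as f:
    gts = json.load(f)

pred_ids = {p["id"] for p in preds}
gt_ids = {g["id"] for g in gts}
missing = gt_ids - pred_ids

print(f"{len(preds)} predictions, {len(gts)} ground-truth entries, {len(missing)} missing ids")
```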
🔹 TODO:
- 📢 Release pretrained weights
- 🎯 Release fine-tuned checkpoint
- 📊 Release nuScenes train and test VQA with tracks
We sincerely appreciate the contributions and resources from the following projects:
- 🚗 DriveLM – Benchmark datasets & evaluation.
- 🦙 LLaMA Adapter – Large Multimodal Model foundation.
- 🎯 3DMOTFormer – 3D multi-object tracking.
- 🌍 nuScenes Dataset – Real-world autonomous driving dataset.
🚀 If you like this project, drop a ⭐ on GitHub! 💙