
🚗💡 Tracking Meets Large Multimodal Models for Driving Scenario Understanding

📌 Enhancing autonomous driving with tracking-powered multimodal understanding!


Overview

This repository presents an approach that integrates 3D object tracking into Large Multimodal Models (LMMs) to enhance spatiotemporal understanding in autonomous driving. 🚘⚡ By leveraging tracking information, we improve performance on perception, planning, and prediction tasks compared to baseline models.

[Figure: Model]

🔹 Key Benefits:

  • 📸 Vision + Tracking: We enhance VQA for autonomous driving by integrating tracking-based embeddings alongside image features.
  • 🚀 3D Object Tracking: We use 3DMOTFormer for robust multi-object tracking, improving contextual understanding of the scene.
  • 🔗 Multimodal Fusion: Image and tracking features are jointly processed to enhance reasoning and prediction (a conceptual fusion sketch follows the model figure below).
  • 🧠 Self-Supervised Pretraining: The tracking encoder is pretrained with a self-supervised objective, which boosts the model's comprehension.
  • 🏆 Benchmark Success: We achieve a 9.5% accuracy gain and a 7.04-point ChatGPT score improvement on DriveLM-nuScenes, and a 3.7% final score increase on DriveLM-CARLA. 📊🔥

[Figure: Model]
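
To make the fusion idea concrete, here is a minimal, illustrative sketch. The module name, dimensions, and simple token concatenation are assumptions for illustration only; the repository's actual implementation builds on llama_adapter_v2_multimodal7b and may fuse features differently.

# Illustrative sketch of image/track fusion (assumed design, not the
# repository's actual architecture).
import torch
import torch.nn as nn

class TrackFusion(nn.Module):
    def __init__(self, track_dim=64, visual_dim=768, llm_dim=4096):
        super().__init__()
        # Project per-object track features (e.g. from 3DMOTFormer) into the LLM embedding space.
        self.track_proj = nn.Linear(track_dim, llm_dim)
        # Project visual tokens (e.g. image patch features) into the same space.
        self.visual_proj = nn.Linear(visual_dim, llm_dim)

    def forward(self, visual_tokens, track_feats):
        # visual_tokens: (B, N_v, visual_dim); track_feats: (B, N_t, track_dim)
        v = self.visual_proj(visual_tokens)
        t = self.track_proj(track_feats)
        # Simple fusion: concatenate along the token dimension so the LLM
        # attends jointly over image and trajectory tokens.
        return torch.cat([v, t], dim=1)

# Example: fuse 256 visual tokens with 8 tracked-object tokens.
fusion = TrackFusion()
tokens = fusion(torch.randn(2, 256, 768), torch.randn(2, 8, 64))
print(tokens.shape)  # torch.Size([2, 264, 4096])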

📂 Data Preparation

🔹 VQA Datasets: Obtain the datasets by following the instructions from DriveLM.

🔹 Tracking Data:

  • 📌 Step 1: Generate 3D object and ego-vehicle tracks using 3DMOTFormer.
  • 📌 Step 2: Process these tracks to map the key-object and ego-vehicle trajectories to each question (see the sketch below).
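
A minimal sketch of Step 2 is shown below. The file layout and field names (scene_token, key_object_ids, the "ego" track id) are hypothetical placeholders; the real preprocessing depends on the DriveLM annotation format and on the 3DMOTFormer output produced in Step 1.

# Hypothetical sketch: attach ego and key-object tracks to each VQA question.
# Field names are placeholders, not the repository's actual schema.
import json

def attach_tracks(questions_path, tracks_path, out_path):
    with open(questions_path) as f:
        questions = json.load(f)   # list of VQA entries
    with open(tracks_path) as f:
        tracks = json.load(f)      # {scene_token: {track_id: [poses, ...]}}

    for q in questions:
        scene_tracks = tracks.get(q["scene_token"], {})
        # Keep only the trajectories of objects referenced by the question,
        # plus the ego-vehicle trajectory.
        q["tracks"] = {tid: scene_tracks[tid]
                       for tid in q.get("key_object_ids", [])
                       if tid in scene_tracks}
        q["ego_track"] = scene_tracks.get("ego", [])

    with open(out_path, "w") as f:
        json.dump(questions, f, indent=2)

# Example call with hypothetical file names:
attach_tracks("drivelm_train.json", "3dmotformer_tracks.json", "train_with_tracks.json")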

🏆 Results

🚘 DriveLM-nuScenes

[Table: results on DriveLM-nuScenes]

🌍 DriveLM-CARLA

[Table: results on DriveLM-CARLA]


⚙️ Setup & Fine-Tuning

💡 To set up and fine-tune the model, refer to llama_adapter_v2_multimodal7b/README.md in this repository.


🚀 Inference

🔧 Before running inference, extract the adapter weights using save_weights.py. Inside this script, set the trained weights path and output path accordingly.
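
A rough sketch of what this extraction step does is shown below. The checkpoint key names and the filtering rule are assumptions; consult save_weights.py for the actual logic and set the paths to match your training run.

# Sketch of extracting adapter-only weights from a full training checkpoint.
# Key names ("adapter", "gate", "visual") and paths are assumptions.
import torch

ckpt = torch.load("/path/to/trained/checkpoint.pth", map_location="cpu")
state = ckpt.get("model", ckpt)

# Keep only the lightweight adapter parameters, dropping the frozen LLaMA weights.
adapter_state = {k: v for k, v in state.items()
                 if "adapter" in k or "gate" in k or "visual" in k}

torch.save({"model": adapter_state}, "/path/to/output/adapter_checkpoint.pth")
print(f"Saved {len(adapter_state)} adapter tensors")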

Run the following command to perform inference on test data:

cd llama_adapter_v2_multimodal7b/
python demo.py --llama_dir /path/to/llama_model_weights \
               --checkpoint /path/to/pre-trained/checkpoint.pth \
               --data ../test_llama.json  \
               --output ../output.json \
               --batch_size 4 \
               --num_processes 8

📊 Evaluation

🔍 To evaluate the model's performance:

1️⃣ Set up the evaluation package by following the instructions in the DriveLM Challenge README.

2️⃣ Run the evaluation script:

python evaluation/evaluation.py --root_path1 ./output.json --root_path2 ./test_eval.json
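
If you want a quick sanity check before running the full DriveLM evaluation (which also computes ChatGPT and language metrics), a minimal exact-match comparison could look like the sketch below. The "id" and "answer" field names are assumptions about the JSON layout; evaluation/evaluation.py remains the reference implementation.

# Hypothetical sanity check: exact-match accuracy between predictions and ground truth.
import json

with open("./output.json") as f:
    preds = {item["id"]: item["answer"] for item in json.load(f)}
with open("./test_eval.json") as f:
    gts = {item["id"]: item["answer"] for item in json.load(f)}

matches = sum(1 for qid, ans in gts.items() if preds.get(qid, "").strip() == ans.strip())
print(f"Exact-match accuracy: {matches / len(gts):.3f}")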

🚀 TODO List

  • 📢 Release pretrained weights
  • 🎯 Release finetuned checkpoint
  • 📊 Release nuScenes train and test VQA with tracks

🙏 Acknowledgments

We sincerely appreciate the contributions and resources from the projects this work builds on, including DriveLM, 3DMOTFormer, and LLaMA-Adapter V2.


🚀 If you like this project, drop a ⭐ on GitHub! 💙
