📌 Enhancing autonomous driving with tracking-powered multimodal understanding!
This repository presents an innovative approach that integrates 3D object tracking into Large Multimodal Models (LMMs) to enhance spatiotemporal understanding in autonomous driving. 🚘⚡ By leveraging tracking information, we significantly improve perception, planning, and prediction compared to baseline models.
🔹 Key Benefits:
- 📸 Vision + Tracking: We enhance visual question answering (VQA) in autonomous driving by integrating tracking-based embeddings.
- 🚀 3D Object Tracking: We use 3DMOTFormer for robust multi-object tracking, improving contextual understanding.
- 🔗 Multimodal Fusion: Images and tracking features are jointly processed to enhance reasoning and predictions (see the fusion sketch after this list).
- 🧠 Self-Supervised Pretraining: Our tracking encoder boosts model comprehension.
- 🏆 Benchmark Success: We achieve a 9.5% accuracy gain and 7.04-point ChatGPT score improvement on DriveLM-nuScenes, and a 3.7% final score increase on DriveLM-CARLA. 📊🔥
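The actual fusion module is part of the training code; purely as an illustration of the idea, here is a minimal PyTorch-style sketch of projecting per-object tracking features into the visual token space and concatenating them with the image tokens. The module name, dimensions, and layer choices are assumptions, not the repository's implementation.

```python
# Hypothetical sketch: fuse per-object tracking features with visual tokens.
# Names and dimensions are illustrative; the training code is authoritative.
import torch
import torch.nn as nn

class TrackFusion(nn.Module):
    def __init__(self, track_dim: int = 64, embed_dim: int = 768):
        super().__init__()
        # Project raw track features (e.g., positions, velocities)
        # into the same embedding space as the visual tokens.
        self.track_proj = nn.Sequential(
            nn.Linear(track_dim, embed_dim),
            nn.GELU(),
            nn.Linear(embed_dim, embed_dim),
        )

    def forward(self, visual_tokens: torch.Tensor, track_feats: torch.Tensor) -> torch.Tensor:
        # visual_tokens: (B, N_img, embed_dim), track_feats: (B, N_obj, track_dim)
        track_tokens = self.track_proj(track_feats)              # (B, N_obj, embed_dim)
        return torch.cat([visual_tokens, track_tokens], dim=1)   # (B, N_img + N_obj, embed_dim)

# Example with random tensors:
fusion = TrackFusion()
fused = fusion(torch.randn(2, 196, 768), torch.randn(2, 10, 64))
print(fused.shape)  # torch.Size([2, 206, 768])
```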
🔹 VQA Datasets: Obtain datasets following instructions from DriveLM.
🔹 Tracking Data:
- 📌 Step 1: Generate 3D object and ego-vehicle tracks using 3DMOTFormer.
- 📌 Step 2: Process these tracks to map the key-object and ego-vehicle trajectories to each question (an illustrative sketch follows this list).
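As a rough sketch of what Step 2 could look like, the snippet below pairs each VQA entry with the trajectories of its key objects and the ego vehicle. The field names (`sample_token`, `key_object_ids`, `ego`) are placeholders, not the actual schema used by this repository.

```python
# Illustrative sketch of Step 2: attach per-question object and ego tracks.
# Field names (sample_token, key_object_ids, ego) are placeholders, not the real schema.
import json

def attach_tracks(questions_path: str, tracks_path: str, out_path: str) -> None:
    with open(questions_path) as f:
        questions = json.load(f)   # list of VQA entries
    with open(tracks_path) as f:
        tracks = json.load(f)      # {sample_token: {object_id: trajectory, ...}}

    for q in questions:
        sample_tracks = tracks.get(q["sample_token"], {})
        # Keep only trajectories of the key objects referenced by this question,
        # plus the ego-vehicle trajectory.
        q["object_tracks"] = {
            obj_id: sample_tracks[obj_id]
            for obj_id in q.get("key_object_ids", [])
            if obj_id in sample_tracks
        }
        q["ego_track"] = sample_tracks.get("ego", [])

    with open(out_path, "w") as f:
        json.dump(questions, f, indent=2)
```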
💡 To set up and fine-tune the model, refer to `llama_adapter_v2_multimodal7b/README.md` in this repository.
🔧 Before running inference, extract the adapter weights with `save_weights.py`: inside the script, set the path to your trained weights and the desired output path (a rough sketch of this step is shown below).
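As a rough idea of what such an extraction typically involves (the checkpoint layout and the `"adapter"` key filter below are assumptions; `save_weights.py` in this repository is the authoritative version):

```python
# Rough idea of adapter-weight extraction; the repo's save_weights.py is authoritative.
# The checkpoint structure and the "adapter" key filter below are assumptions.
import torch

TRAINED_CKPT = "/path/to/trained/checkpoint.pth"      # set to your trained weights
OUTPUT_PATH = "/path/to/pre-trained/checkpoint.pth"   # adapter-only output

ckpt = torch.load(TRAINED_CKPT, map_location="cpu")
state_dict = ckpt.get("model", ckpt)  # unwrap if weights are stored under "model"

# Keep only the lightweight adapter parameters, dropping the frozen LLaMA backbone.
adapter_state = {k: v for k, v in state_dict.items() if "adapter" in k}

torch.save({"model": adapter_state}, OUTPUT_PATH)
print(f"Saved {len(adapter_state)} adapter tensors to {OUTPUT_PATH}")
```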
Run the following command to perform inference on the test data:

```bash
cd llama_adapter_v2_multimodal7b/
python demo.py --llama_dir /path/to/llama_model_weights \
    --checkpoint /path/to/pre-trained/checkpoint.pth \
    --data ../test_llama.json \
    --output ../output.json \
    --batch_size 4 \
    --num_processes 8
```
🔍 To evaluate the model's performance:
1️⃣ Set up the evaluation package by following the instructions in the DriveLM Challenge README.
2️⃣ Run the evaluation script:
```bash
python evaluation/evaluation.py --root_path1 ./output.json --root_path2 ./test_eval.json
```
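Before launching the evaluation, a quick sanity check that the prediction file covers every ground-truth entry can save a failed run. The snippet below assumes both files are JSON lists of dicts with an `id` field, which may differ from the actual DriveLM format; adjust the keys to match your files.

```python
# Quick sanity check: do the predictions cover all ground-truth ids?
# Assumes both files are JSON lists of dicts with an "id" field (an assumption).
import json

with open("output.json") as f:
    preds = json.load(f)
with open("test_eval.json") as f:
    gts = json.load(f)

pred_ids = {p["id"] for p in preds}
gt_ids = {g["id"] for g in gts}
missing = gt_ids - pred_ids

print(f"{len(preds)} predictions, {len(gts)} ground-truth entries, {len(missing)} missing ids")
```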
🔹 TODO:
- 📢 Release pretrained weights
- 🎯 Release fine-tuned checkpoint
- 📊 Release nuScenes train and test VQA with tracks
We sincerely appreciate the contributions and resources from the following projects:
- 🚗 DriveLM – Benchmark datasets & evaluation.
- 🦙 LLaMA Adapter – Large Multimodal Model foundation.
- 🎯 3DMOTFormer – 3D multi-object tracking.
- 🌍 nuScenes Dataset – Real-world autonomous driving dataset.
🚀 If you like this project, drop a ⭐ on GitHub! 💙