DriveMLLM: A Benchmark for Spatial Understanding with Multimodal Large Language Models in Autonomous Driving
Xianda Guo*, Ruijun Zhang*, Yiqun Duan*, Yuhang He, Chenming Zhang, Long Chen.
- [2024/11] Paper released on arXiv.
We use the Hugging Face dataset MLLM_eval_dataset for evaluation. The images are sourced from the CAM_FRONT camera in the nuScenes validation set. A metadata.jsonl file is provided for all images, allowing users to easily access properties such as location2D.
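As a minimal sketch, you can load the dataset and inspect its fields with the datasets library (the split name and any metadata fields beyond location2D are assumptions; see the dataset card for the exact schema):

```python
from datasets import load_dataset

# Load the benchmark from the Hugging Face Hub and inspect its structure.
dataset = load_dataset("bonbon-rj/MLLM_eval_dataset")
print(dataset)  # shows the available splits and columns

# Peek at the first sample of the first split; the exact metadata
# columns (e.g. location2D) follow the description above.
split_name = next(iter(dataset))
sample = dataset[split_name][0]
print(sample.keys())
```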
Run inference according to your requirements:
- For GPT API calls:
  ```bash
  export OPENAI_API_KEY=your_api_key
  python inference/get_MLLM_output.py \
      --model_type gpt \
      --model gpt-4o \
      --hf_dataset bonbon-rj/MLLM_eval_dataset \
      --prompts_dir prompt/prompts \
      --save_dir inference/mllm_outputs
  ```
- For Gemini API calls:
  ```bash
  export GOOGLE_API_KEY=your_api_key
  python inference/get_MLLM_output.py \
      --model_type gemini \
      --model models/gemini-1.5-flash \
      --hf_dataset bonbon-rj/MLLM_eval_dataset \
      --prompts_dir prompt/prompts \
      --save_dir inference/mllm_outputs
  ```
- For Local LLaVA-Next inference:
  ```bash
  python inference/get_MLLM_output.py \
      --model_type llava \
      --model lmms-lab/llava-onevision-qwen2-7b-si \
      --hf_dataset bonbon-rj/MLLM_eval_dataset \
      --prompts_dir prompt/prompts \
      --save_dir inference/mllm_outputs
  ```
- For Local QWen2-VL inference:
  ```bash
  python inference/get_MLLM_output.py \
      --model_type qwen \
      --model Qwen/Qwen2-VL-7B-Instruct \
      --hf_dataset bonbon-rj/MLLM_eval_dataset \
      --prompts_dir prompt/prompts \
      --save_dir inference/mllm_outputs
  ```
Run the following script to generate random outputs for the prompts, which serve as a simple baseline:
```bash
python inference/get_random_output.py \
    --hf_dataset bonbon-rj/MLLM_eval_dataset \
    --prompts_dir prompt/prompts \
    --save_dir inference/mllm_outputs
```
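Conceptually, the random baseline answers each prompt without looking at the image; the sketch below illustrates the idea only (the actual sampling logic lives in inference/get_random_output.py, and the candidate answers here are hypothetical):

```python
import random

# Hypothetical random baseline: pick an answer uniformly at random,
# ignoring the image and the question content entirely.
def random_answer(candidates: list[str]) -> str:
    return random.choice(candidates)

print(random_answer(["yes", "no"]))
```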
After executing the script, the results will be saved in the directory: {save_dir}/{model_type}/{model}.
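For example, the saved outputs can be enumerated with a short script like the one below (the layout beyond the {save_dir}/{model_type}/{model} pattern, and the file names themselves, are assumptions determined by the inference script):

```python
from pathlib import Path

# List everything the inference step wrote, grouped by model type and model.
save_dir = Path("inference/mllm_outputs")
for result in sorted(p for p in save_dir.rglob("*") if p.is_file()):
    print(result.relative_to(save_dir))
```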
You can execute the script below to evaluate all results located in eval_root_dir:
```bash
python evaluation/eval_from_json.py \
    --hf_dataset bonbon-rj/MLLM_eval_dataset \
    --eval_root_dir inference/mllm_outputs \
    --save_dir evaluation/eval_result \
    --eval_model_path all
```
Alternatively, you can evaluate a specific result under eval_root_dir by passing its model path as eval_model_path:
```bash
python evaluation/eval_from_json.py \
    --hf_dataset bonbon-rj/MLLM_eval_dataset \
    --eval_root_dir inference/mllm_outputs \
    --save_dir evaluation/eval_result \
    --eval_model_path gemini/gemini-1.5-flash
```
After running the scripts, the evaluation results will be stored in the directory: {save_dir}.
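You can then inspect the stored results programmatically; a minimal sketch, assuming the evaluation script writes JSON files under the save directory (the exact file names and schema are assumptions; check evaluation/eval_from_json.py for the authoritative format):

```python
import json
from pathlib import Path

# Walk the evaluation output directory and print each result file.
for path in sorted(Path("evaluation/eval_result").rglob("*.json")):
    with open(path) as f:
        print(path, json.load(f))
```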
If you find DriveMLLM useful in your research, please cite:

```bibtex
@article{DriveMLLM,
  title={DriveMLLM: A Benchmark for Spatial Understanding with Multimodal Large Language Models in Autonomous Driving},
  author={Guo, Xianda and Zhang, Ruijun and Duan, Yiqun and He, Yuhang and Zhang, Chenming and Chen, Long},
  journal={arXiv preprint arXiv:2411.13112},
  year={2024}
}
```