DriveMLLM: A Benchmark for Spatial Understanding with Multimodal Large Language Models in Autonomous Driving

Xianda Guo*, Ruijun Zhang*, Yiqun Duan*, Yuhang He, Chenming Zhang, Long Chen.

News

  • [2024/11] Paper released on arXiv.

Overview

[Overview visualization]

Getting Started

1. Prepare Dataset

We use the Hugging Face dataset MLLM_eval_dataset for evaluation. The images are sourced from the CAM_FRONT camera in the nuScenes validation set. We provide a metadata.jsonl file covering all images, so users can easily access properties such as location2D.
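
As a minimal sketch of loading the dataset with the standard datasets library (the split name "train" and the exact field names are assumptions; check the dataset card for the actual schema):

from datasets import load_dataset

# Load the evaluation dataset from the Hugging Face Hub.
# Split name "train" is an assumption; consult the dataset card for the real splits.
dataset = load_dataset("bonbon-rj/MLLM_eval_dataset", split="train")

sample = dataset[0]
print(sample.keys())  # available fields, e.g. the image and metadata properties such as location2D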

2. Inference

Run inference with the command that matches your model backend:

  • For GPT API calls:
export OPENAI_API_KEY=your_api_key

python inference/get_MLLM_output.py \
--model_type gpt \
--model gpt-4o \
--hf_dataset bonbon-rj/MLLM_eval_dataset \
--prompts_dir prompt/prompts \
--save_dir inference/mllm_outputs
  • For Gemini API calls:
export GOOGLE_API_KEY=your_api_key

python inference/get_MLLM_output.py \
--model_type gemini \
--model models/gemini-1.5-flash \
--hf_dataset bonbon-rj/MLLM_eval_dataset \
--prompts_dir prompt/prompts \
--save_dir inference/mllm_outputs
  • For local LLaVA-NeXT inference:
python inference/get_MLLM_output.py \
--model_type llava \
--model lmms-lab/llava-onevision-qwen2-7b-si \
--hf_dataset bonbon-rj/MLLM_eval_dataset \
--prompts_dir prompt/prompts \
--save_dir inference/mllm_outputs
  • For local Qwen2-VL inference:
python inference/get_MLLM_output.py \
--model_type qwen \
--model Qwen/Qwen2-VL-7B-Instruct \
--hf_dataset bonbon-rj/MLLM_eval_dataset \
--prompts_dir prompt/prompts \
--save_dir inference/mllm_outputs

Run the following script to generate random outputs for the prompts:

python inference/get_random_output.py \
--hf_dataset bonbon-rj/MLLM_eval_dataset \
--prompts_dir prompt/prompts \
--save_dir inference/mllm_outputs

After running any of the scripts above, the results are saved under {save_dir}/{model_type}/{model}.
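
For example, with the GPT command above the outputs land under inference/mllm_outputs/gpt/gpt-4o. A small sketch for listing whatever was written there (the exact file names and layout are defined by the inference script, not by this README):

from pathlib import Path

# List everything the inference script wrote for one model.
# The gpt/gpt-4o subpath follows the {model_type}/{model} pattern from the command above.
out_dir = Path("inference/mllm_outputs") / "gpt" / "gpt-4o"
for f in sorted(out_dir.rglob("*")):
    print(f)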

3. Evaluation

You can execute the script below to evaluate all results located in eval_root_dir:

python evaluation/eval_from_json.py \
--hf_dataset bonbon-rj/MLLM_eval_dataset \
--eval_root_dir inference/mllm_outputs \
--save_dir evaluation/eval_result \
--eval_model_path all 

Alternatively, you can evaluate a specific result under eval_root_dir by specifying eval_model_path:

python evaluation/eval_from_json.py \
--hf_dataset bonbon-rj/MLLM_eval_dataset \
--eval_root_dir inference/mllm_outputs \
--save_dir evaluation/eval_result \
--eval_model_path gemini/gemini-1.5-flash

After running the scripts, the evaluation results will be stored in the directory: {save_dir}.
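
The internal format of the result files is defined by eval_from_json.py; as a rough sketch (assuming JSON output, which the script name suggests but this README does not specify), you can inspect whatever lands in save_dir like this:

import json
from pathlib import Path

# Print the top-level keys of every JSON file the evaluation produced.
for path in sorted(Path("evaluation/eval_result").rglob("*.json")):
    with open(path) as f:
        result = json.load(f)
    print(path, list(result) if isinstance(result, dict) else type(result).__name__)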

Citation

@article{DriveMLLM,
        title={DriveMLLM: A Benchmark for Spatial Understanding with Multimodal Large Language Models in Autonomous Driving},
        author={Guo, Xianda and Zhang, Ruijun and Duan, Yiqun and He, Yuhang and Zhang, Chenming and Chen, Long},
        journal={arXiv preprint arXiv:2411.13112},
        year={2024}
}
