The results of LLaVA-Video-7B-Qwen2's captioned test on VideoMME were quite different #452

ghost · 2024-12-10T16:18:24Z

I ran the with-subtitles test on VideoMME using LLaVA-Video-7B-Qwen2, and the result differs noticeably from the paper: the paper reports 69.7, but my result was 66.7. Here's my run command:

accelerate launch --num_processes=3 -m lmms_eval --model llava_onevision --model_args pretrained=/root/autodl-tmp/models/LLaVA-Video-7B-Qwen2,conv_template=qwen_1_5,model_name=llava_qwen --tasks videomme_w_subtitle --batch_size 1 --log_samples --log_samples_suffix llava_onevision --output_path ./logs/

ZhangYuanhan-AI · 2024-12-13T01:20:48Z

The model should be llava_vid.

Please carefully read the guideline here: https://github.com/LLaVA-VL/LLaVA-NeXT/blob/main/docs/LLaVA_Video_1003.md
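
For anyone landing here later, a minimal sketch of what the corrected invocation might look like, assembled in Python so the pieces are easy to vary. The llava_vid model arguments (max_frames_num=64, mm_spatial_pool_mode=average) are an assumption borrowed from the 72B command posted further down this thread, and build_command is a hypothetical helper, not part of lmms_eval; the linked guideline remains the authoritative source.

```python
import shlex


def build_command(pretrained, num_processes=3):
    """Assemble the launch command as an argv list (hypothetical helper)."""
    # Assumption: these llava_vid arguments mirror the 72B command in this thread.
    model_args = {
        "pretrained": pretrained,
        "conv_template": "qwen_1_5",
        "max_frames_num": 64,
        "mm_spatial_pool_mode": "average",
    }
    return [
        "accelerate", "launch", f"--num_processes={num_processes}",
        "-m", "lmms_eval",
        "--model", "llava_vid",  # the fix: llava_vid instead of llava_onevision
        "--model_args", ",".join(f"{k}={v}" for k, v in model_args.items()),
        "--tasks", "videomme_w_subtitle",
        "--batch_size", "1",
        "--log_samples",
        "--log_samples_suffix", "llava_vid",
        "--output_path", "./logs/",
    ]


cmd = build_command("/root/autodl-tmp/models/LLaVA-Video-7B-Qwen2")
print(shlex.join(cmd))  # paste the printed line into a shell to run it
```

Only the --model value and its --model_args differ from the original command; the task, batch size, and logging flags are unchanged.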

HulkZh · 2024-12-17T06:44:23Z

OK. We were able to reproduce the results from the original paper thanks to your advice. I would also like to ask for your guidance on the run command for testing Qwen/Qwen2-VL-7B-Instruct on VideoMME. I used the command provided by lmms, but I did not get the results reported in the original paper. Could that be because I set max_pixels to 256 * 28 * 28?

kcz358 · 2024-12-25T00:40:58Z

Max pixels could be one problem. Another is that we don't know the exact prompt and number of frames they used, so we are unable to reproduce the result exactly.
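
To make the max_pixels point concrete, here is a rough sketch of how tight a 256 * 28 * 28 budget is. It assumes each Qwen2-VL visual token covers a 28x28-pixel area (14-pixel patches merged 2x2) and ignores any temporal merging across video frames; the 1280x720 frame size is purely illustrative and not taken from VideoMME.

```python
import math

PIXELS_PER_TOKEN = 28 * 28  # assumption: one visual token per 28x28-pixel area


def token_budget(max_pixels):
    """Upper bound on visual tokens per frame under a pixel cap."""
    return max_pixels // PIXELS_PER_TOKEN


def downscale_factor(width, height, max_pixels):
    """Linear scale needed to fit a frame under the cap (1.0 if it already fits)."""
    return min(1.0, math.sqrt(max_pixels / (width * height)))


max_pixels = 256 * 28 * 28
print(max_pixels)                # 200704 pixels per frame
print(token_budget(max_pixels))  # 256 tokens per frame

# Illustrative frame size only; real VideoMME videos vary.
print(round(downscale_factor(1280, 720, max_pixels), 3))
```

Under these assumptions a 720p frame has to shrink to under half its linear size before encoding, which could plausibly cost accuracy on questions that depend on fine visual detail.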

HulkZh · 2024-12-26T07:26:51Z

I tested lmms-lab/LLaVA-Video-72B-Qwen2 on VideoMME on a 4-GPU machine with 96GB of video memory and ran out of memory (OOM). Only 20 weight files had been loaded when the OOM occurred, yet the same configuration runs without OOM using code from other repositories. This is the run command:

accelerate launch --num_processes=4 -m lmms_eval --model llava_vid --model_args pretrained=/root/autodl-tmp/models/LLaVA-Video-72B-Qwen2,conv_template=qwen_1_5,max_frames_num=64,mm_spatial_pool_mode=average --tasks videomme_w_subtitle --batch_size 1 --log_samples --log_samples_suffix llava_vid --output_path ./logs/
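
One possible explanation, sketched as back-of-the-envelope arithmetic: with --num_processes=4, accelerate normally starts four data-parallel processes, and each one would try to load its own full copy of the model. The numbers below assume a nominal 72e9 parameters stored in 16-bit precision and count weights only (no activations or KV cache); whether 96GB is per card or in total is not stated above, so both readings are checked.

```python
GIB = 2**30


def weight_gib(num_params, bytes_per_param=2):
    """Approximate weight memory in GiB (default: 16-bit weights)."""
    return num_params * bytes_per_param / GIB


def fits_per_process(copy_gib, card_gib):
    """True if one full model copy fits on the single card a process owns."""
    return copy_gib <= card_gib


# Assumption: "72B" taken as exactly 72e9 parameters.
per_copy = weight_gib(72e9)
print(f"weights per copy: {per_copy:.1f} GiB")

# Reading 1: 96 GB per card. Reading 2: 96 GB total, i.e. 24 GB per card.
print(fits_per_process(per_copy, 96))      # False
print(fits_per_process(per_copy, 96 / 4))  # False
```

Under either reading a single card cannot hold a full copy, which would be consistent with the OOM appearing partway through loading the weight files. Running a single process that shards one copy across all the cards is the usual workaround, if the model wrapper supports it; that is an assumption to verify against the lmms_eval code rather than a confirmed fix.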
