The results of LLaVA-Video-7B-Qwen2's captioned test on VideoMME were quite different #452

Open
ghost opened this issue Dec 10, 2024 · 4 comments


ghost commented Dec 10, 2024

I ran the with-subtitles VideoMME evaluation on LLaVA-Video-7B-Qwen2, and the result differs noticeably from the paper: the paper reports 69.7, but we got 66.7. Here is my run command:

accelerate launch --num_processes=3 -m lmms_eval --model llava_onevision --model_args pretrained=/root/autodl-tmp/models/LLaVA-Video-7B-Qwen2,conv_template=qwen_1_5,model_name=llava_qwen --tasks videomme_w_subtitle --batch_size 1 --log_samples --log_samples_suffix llava_onevision --output_path ./logs/

@ZhangYuanhan-AI
Contributor

The model should be llava_vid.

Please carefully read the guideline here: https://github.com/LLaVA-VL/LLaVA-NeXT/blob/main/docs/LLaVA_Video_1003.md
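As a rough illustration of the suggested change, the sketch below assembles a launch command that swaps llava_onevision for llava_vid. The exact command that reproduced the paper's number is not shown in this thread, so the model_args here are an assumption borrowed from the 72B command posted further down, and build_command is a hypothetical helper, not part of lmms_eval.

```python
import shlex

# Assumption: model_args borrowed from the llava_vid command posted later in
# this thread; only the pretrained path comes from the original 7B command.
model_args = {
    "pretrained": "/root/autodl-tmp/models/LLaVA-Video-7B-Qwen2",
    "conv_template": "qwen_1_5",
    "max_frames_num": 64,
    "mm_spatial_pool_mode": "average",
}


def build_command(model: str, args: dict, num_processes: int = 3) -> str:
    """Hypothetical helper: assemble an lmms_eval launch command string."""
    # Join key=value pairs with commas and no spaces, as --model_args expects.
    joined = ",".join(f"{key}={value}" for key, value in args.items())
    argv = [
        "accelerate", "launch", f"--num_processes={num_processes}",
        "-m", "lmms_eval",
        "--model", model,
        "--model_args", joined,
        "--tasks", "videomme_w_subtitle",
        "--batch_size", "1",
        "--log_samples",
        "--log_samples_suffix", model,
        "--output_path", "./logs/",
    ]
    return shlex.join(argv)


command = build_command("llava_vid", model_args)
print(command)
```

Building model_args from a dictionary also guards against stray whitespace inside the comma-separated string, which would otherwise split it into separate shell arguments.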


HulkZh commented Dec 17, 2024

> The model should be llava_vid.
>
> Please carefully read the guideline here: https://github.com/LLaVA-VL/LLaVA-NeXT/blob/main/docs/LLaVA_Video_1003.md

OK. Thanks to your advice, we were able to reproduce the results from the original paper. I would also like to ask about the run command for evaluating Qwen/Qwen2-VL-7B-Instruct on VideoMME. I used the command provided by lmms-eval, but I did not get the results reported in the original paper. Could that be because I set max_pixels to 256 * 28 * 28?

Collaborator

kcz358 commented Dec 25, 2024

max_pixels could be one cause. Another is that we don't know the exact prompt or the number of frames they used, so we cannot reproduce the result exactly.
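For reference, the max_pixels value under discussion can be worked out with a little arithmetic. The sketch below assumes that Qwen2-VL spends one visual token per 28 x 28 pixel block, which is how its model card describes the min_pixels and max_pixels settings. How video frames are counted against this budget is not covered here, and whether the cap explains the score gap is not established in this thread.

```python
import math

# Assumption: one visual token per 28 x 28 pixel block (Qwen2-VL convention).
PIXELS_PER_TOKEN = 28 * 28


def max_visual_tokens(max_pixels: int) -> int:
    """Upper bound on visual tokens for one image under a pixel budget."""
    return max_pixels // PIXELS_PER_TOKEN


max_pixels = 256 * 28 * 28
tokens = max_visual_tokens(max_pixels)
# Side length of the largest square image that fits in the pixel budget.
square_side = math.isqrt(max_pixels)

print(max_pixels)   # 200704
print(tokens)       # 256
print(square_side)  # 448
```

Under that assumption, the setting caps each image at 256 visual tokens, roughly a 448 x 448 input, so fine detail in higher-resolution frames would be downsampled away.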


HulkZh commented Dec 26, 2024

I tested lmms-lab/LLaVA-Video-72B-Qwen2 on VideoMME on a 4-GPU machine with 96GB of video memory and ran out of memory (OOM). Only 20 weight files had loaded before the OOM occurred, yet the same configuration does not OOM with the code from other repositories. This is the run command:

accelerate launch --num_processes=4 -m lmms_eval --model llava_vid --model_args pretrained=/root/autodl-tmp/models/LLaVA-Video-72B-Qwen2,conv_template=qwen_1_5,max_frames_num=64,mm_spatial_pool_mode=average --tasks videomme_w_subtitle --batch_size 1 --log_samples --log_samples_suffix llava_vid --output_path ./logs/
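A back-of-the-envelope estimate may help frame the OOM. The sketch below rests on assumptions not confirmed in this thread: a nominal 72 billion parameters (taken from the model name), 16-bit weights at 2 bytes per parameter, and each of the 4 launched processes loading its own full copy of the model, as is typical of data-parallel launches. It counts weights only and ignores activations and the KV cache.

```python
# All constants below are assumptions for illustration, not measured values.
NOMINAL_PARAMS = 72e9   # from the "72B" in the model name
BYTES_PER_PARAM = 2     # fp16 / bf16 weights
GPU_MEMORY_GB = 96      # reading "96GB" as the memory of a single GPU
NUM_GPUS = 4


def weights_size_gb(params: float, bytes_per_param: int) -> float:
    """Memory needed for the weights alone, in GB (10**9 bytes)."""
    return params * bytes_per_param / 1e9


weights_gb = weights_size_gb(NOMINAL_PARAMS, BYTES_PER_PARAM)

# Case 1: every process holds a full copy on its own GPU.
full_copy_fits = weights_gb <= GPU_MEMORY_GB

# Case 2: one copy split evenly across all GPUs.
per_gpu_shard_gb = weights_gb / NUM_GPUS
sharded_fits = per_gpu_shard_gb <= GPU_MEMORY_GB

print(weights_gb)        # 144.0
print(full_copy_fits)    # False
print(per_gpu_shard_gb)  # 36.0
print(sharded_fits)      # True
```

If those assumptions hold, a full copy per process cannot fit on one 96GB GPU, while a single copy sharded across the four GPUs would. That would be consistent with loading stopping partway through the weight files, but the thread does not confirm the cause.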
