Qwen2VLProcessor cannot handle odd number of video frames #35412

DarkLight1337 · 2024-12-25T06:58:11Z

System Info

- `transformers` version: 4.47.1
- Platform: Linux-5.4.0-174-generic-x86_64-with-glibc2.31
- Python version: 3.9.20
- Huggingface_hub version: 0.26.2
- Safetensors version: 0.4.5
- Accelerate version: 1.0.1
- Accelerate config:    not found
- PyTorch version (GPU?): 2.5.1+cu124 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using distributed or parallel set-up in script?: No
- Using GPU in script?: Yes
- GPU type: NVIDIA A10

Who can help?

@ArthurZucker @zucchini-nlp

Information

The official example scripts
My own modified scripts

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)

Reproduction

I found that the processor for Qwen2-VL cannot handle input videos with an odd number of frames (except for videos with a single frame). This occurs regardless of the channel format and image dimensions of each frame.

import numpy as np
from transformers import AutoProcessor

# The processor fails when num_frames = 3, 5, 7, ...
num_frames = 3
video = np.random.randint(0, 255, size=(num_frames, 256, 256, 3), dtype=np.uint8)

processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")
processor(text="<|vision_start|><|video_pad|><|vision_end|>", videos=[video])

Error when num_frames = 3

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/cyrus/miniconda3/envs/vllm/lib/python3.9/site-packages/transformers/models/qwen2_vl/processing_qwen2_vl.py", line 124, in __call__
    videos_inputs = self.image_processor(images=None, videos=videos, **output_kwargs["videos_kwargs"])
  File "/home/cyrus/miniconda3/envs/vllm/lib/python3.9/site-packages/transformers/image_processing_utils.py", line 41, in __call__
    return self.preprocess(images, **kwargs)
  File "/home/cyrus/miniconda3/envs/vllm/lib/python3.9/site-packages/transformers/models/qwen2_vl/image_processing_qwen2_vl.py", line 439, in preprocess
    patches, video_grid_thw = self._preprocess(
  File "/home/cyrus/miniconda3/envs/vllm/lib/python3.9/site-packages/transformers/models/qwen2_vl/image_processing_qwen2_vl.py", line 299, in _preprocess
    patches = patches.reshape(
ValueError: cannot reshape array of size 571536 into shape (1,2,3,9,2,14,9,2,14)
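
For anyone reading the traceback, the size mismatch can be reproduced with plain arithmetic. The sketch below rests on my reading of the target shape as `(grid_t, temporal_patch_size, channels, ...)` with `grid_t` computed by floor division of the frame count. That interpretation is an assumption, and the variable names are only illustrative. The 252x252 frame size is taken from the spatial factors 9 * 2 * 14 in the error message.

```python
import math

# Target shape copied from the ValueError above.
target_shape = (1, 2, 3, 9, 2, 14, 9, 2, 14)

# Assumed meaning of the leading entries: (grid_t, temporal_patch_size, channels, ...).
temporal_patch_size = target_shape[1]   # 2
channels = target_shape[2]              # 3
height = width = 9 * 2 * 14             # 252, from the spatial factors in the shape

num_frames = 3
# Floor division drops the unpaired last frame from the temporal grid.
grid_t = num_frames // temporal_patch_size

actual_size = num_frames * channels * height * width   # elements actually present
expected_size = math.prod(target_shape)                # elements the reshape asks for

print(grid_t, actual_size, expected_size)
```

The array holds three frames of data (571536 elements), while the target shape only has room for `grid_t * temporal_patch_size = 2` frames (381024 elements), so the reshape cannot succeed for any odd frame count above one.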

Expected behavior

The processor should be able to handle videos with an odd number of frames.


zucchini-nlp · 2025-01-06T09:15:56Z

Nice catch, I hadn't noticed this before. I think we can either repeat the last frame until the number of frames is a multiple of temporal_patch_dim, similar to how images are repeated along the time dimension, or raise a ValueError if the number of frames is not divisible by it.

Let me ask the authors whether there are any possible issues with replicating the last frame, and I will submit a PR.
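
As a rough illustration of the first option, here is a minimal sketch of padding a video by repeating its last frame. The helper name `pad_frames_to_multiple` and its `multiple` parameter are hypothetical and not part of `transformers`; this is a possible user-side workaround under those assumptions, not the fix that was merged.

```python
import numpy as np


def pad_frames_to_multiple(video: np.ndarray, multiple: int = 2) -> np.ndarray:
    """Repeat the last frame until the frame count is divisible by `multiple`.

    `video` is assumed to have shape (num_frames, height, width, channels).
    Hypothetical helper for illustration only.
    """
    remainder = video.shape[0] % multiple
    if remainder == 0:
        return video
    # Slice with [-1:] to keep the frame axis, then repeat along it.
    padding = np.repeat(video[-1:], multiple - remainder, axis=0)
    return np.concatenate([video, padding], axis=0)


# Same kind of input as in the reproduction: 3 RGB frames of 256x256.
video = np.random.randint(0, 255, size=(3, 256, 256, 3), dtype=np.uint8)
padded = pad_frames_to_multiple(video)
print(video.shape, "->", padded.shape)
```

The padded array could then be passed as `videos=[padded]` in the reproduction script. Whether duplicating the last frame affects model quality is exactly the open question for the authors.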

DarkLight1337 · 2025-01-06T09:20:00Z

Thanks for looking into this! It looks like @jla524 has already opened a PR.

zucchini-nlp · 2025-01-06T09:51:34Z

Oh cool, will review that one then, thanks!

DarkLight1337 added the bug label Dec 25, 2024

This was referenced Dec 25, 2024

[New Model]: QVQ-72B-Preview vllm-project/vllm#11479

Open

[Usage]: Qwen/Qwen2-VL-7B-Instruct vllm-project/vllm#10994

Closed

jla524 mentioned this issue Dec 27, 2024

Fix Qwen2VL processor to handle odd number of frames #35431

Merged

5 tasks

LysandreJik added the VLM label Dec 29, 2024

zucchini-nlp self-assigned this Jan 6, 2025

ArthurZucker closed this as completed in #35431 Jan 8, 2025
