Hi, when I was using the following inference code (the same as provided on Hugging Face) to evaluate LLaVA-Video-7B-Qwen2 on the MSVD-QA dataset, it gave weird output, such as an endlessly repeating "0" or two "is" in the same sentence. How can I solve this problem?
# Imports and setup follow the Hugging Face model card for LLaVA-Video-7B-Qwen2 (omitted in the original post)
from llava.model.builder import load_pretrained_model
from llava.mm_utils import tokenizer_image_token
from llava.constants import IMAGE_TOKEN_INDEX, DEFAULT_IMAGE_TOKEN
from llava.conversation import conv_templates
import copy

pretrained = "lmms-lab/LLaVA-Video-7B-Qwen2"
model_name = "llava_qwen"
device = "cuda"
device_map = "auto"
tokenizer, model, image_processor, max_length = load_pretrained_model(pretrained, None, model_name, torch_dtype="bfloat16", device_map=device_map)  # Add any other thing you want to pass in llava_model_args
model.eval()
video_path = "XXXX"
max_frames_num = 64
video,frame_time,video_time = load_video(video_path, max_frames_num, 1, force_sample=True)
video = image_processor.preprocess(video, return_tensors="pt")["pixel_values"].cuda().half()
video = [video]
conv_template = "qwen_1_5" # Make sure you use correct chat template for different models
time_instruciton = f"The video lasts for {video_time:.2f} seconds, and {len(video[0])} frames are uniformly sampled from it. These frames are located at {frame_time}.Please answer the following questions related to this video."
question = DEFAULT_IMAGE_TOKEN + f"\n{time_instruciton}\nPlease describe this video in detail."
conv = copy.deepcopy(conv_templates[conv_template])
conv.append_message(conv.roles[0], question)
conv.append_message(conv.roles[1], None)
prompt_question = conv.get_prompt()
input_ids = tokenizer_image_token(prompt_question, tokenizer, IMAGE_TOKEN_INDEX, return_tensors="pt").unsqueeze(0).to(device)
cont = model.generate(
    input_ids,
    images=video,
    modalities=["video"],
    do_sample=False,
    temperature=0,
    max_new_tokens=4096,
)
# Decoding as in the model card snippet
text_outputs = tokenizer.batch_decode(cont, skip_special_tokens=True)[0].strip()
print(text_outputs)
By the way, I observe similar weird output on the Vstream-QA dataset. Do I need to change the prompt, and if so, how?
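In case it helps, this is the variation of the generate call I was planning to try next. I am only guessing that a repetition penalty might suppress the endless zeros; repetition_penalty and no_repeat_ngram_size are standard Hugging Face generate() arguments, and I am assuming model.generate forwards them, so please correct me if the real fix is elsewhere.

# Hypothetical variant, not from the model card: same greedy decoding, plus repetition controls.
# Assumes repetition_penalty / no_repeat_ngram_size are passed through to Hugging Face generate().
cont = model.generate(
    input_ids,
    images=video,
    modalities=["video"],
    do_sample=False,
    temperature=0,
    max_new_tokens=4096,
    repetition_penalty=1.1,
    no_repeat_ngram_size=3,
)
print(tokenizer.batch_decode(cont, skip_special_tokens=True)[0].strip())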