Multi-image conversation does not work for more than 2 images? #296

console-beaver · 2024-06-21T17:24:45Z

Hello, I tried using the multi-image conversation as outlined on

Lines 627 to 634 in 764fdc9

    # multi-round multi-image conversation
    pixel_values1 = load_image('./examples/image1.jpg', max_num=6).to(torch.bfloat16).cuda()
    pixel_values2 = load_image('./examples/image2.jpg', max_num=6).to(torch.bfloat16).cuda()
    pixel_values = torch.cat((pixel_values1, pixel_values2), dim=0)
    question = "详细描述这两张图片" # Describe the two pictures in detail
    response, history = model.chat(tokenizer, pixel_values, question, generation_config, history=None, return_history=True)
    print(question, response)

With the two-image example, I am able to reproduce the results shown at https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5#model-usage. However, when I try three or more images, the model seems to ignore all but the last two. For example, if I try:

# pixel_values1, 2, 3, correspond to images 1, 2, 3 in the examples folder
pixel_values = torch.cat((pixel_values1, pixel_values2, pixel_values3), dim=0)
question = "Can you construct a story from all the images?"
response, history = model.chat(tokenizer, pixel_values, question, generation_config, history=None, return_history=True)
print(question, ': ', response)

and I see the following response:

Can you construct a story from all the images? :  The story begins with a young girl named Lily who has always been fascinated by animals. She dreams of one day having a pet panda, just like the ones she sees in the zoo. One day, her wish comes true when she finds a baby panda in the forest. She takes the panda home and names it Panda.

As Panda grows, Lily realizes that she needs a friend for Panda. She decides to adopt a cat from the local animal shelter. She brings home a beautiful calico cat and names her Luna. Luna and Panda become the best of friends, spending their days playing and exploring.

One day, Lily decides to take Luna and Panda to the zoo to see the other animals. They visit the panda exhibit and see a majestic adult panda. Panda is amazed by the size and beauty of the adult panda and wishes to be as big and strong as it.

Luna, being the wise cat that she is, tells Panda that size and strength are not the most important things. She tells Panda that what truly matters is the love and friendship they share with Lily and each other.

It seems that there is no mention of the red panda in image 1. Is there anything I can do to run multi-image conversation for more than 2 images? Or is this a restriction the model currently has?
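
For anyone debugging the same behaviour: `torch.cat(..., dim=0)` stacks the tiles of all images into one tensor, so the boundaries between images are no longer visible from `pixel_values` alone. Below is a minimal sketch for recording which rows belong to which image before concatenating. The helper `tile_slices` and the tile counts are hypothetical and for illustration only. It assumes the first dimension of each `load_image` result is that image's tile count. Whether `model.chat` at this revision can make use of per-image counts is not confirmed anywhere in this thread.

```python
from itertools import accumulate


def tile_slices(tile_counts):
    """Return (start, stop) row ranges, one per image, for a tensor built by
    concatenating per-image tile tensors along dim 0 in the given order."""
    stops = list(accumulate(tile_counts))
    starts = [0] + stops[:-1]
    return list(zip(starts, stops))


# Hypothetical tile counts. In the snippet above they would be read from
# pixel_values1.size(0), pixel_values2.size(0) and pixel_values3.size(0).
tile_counts = [7, 5, 7]

for index, (start, stop) in enumerate(tile_slices(tile_counts), start=1):
    # stop is exclusive, so the last row of this image is stop - 1
    print(f"image {index}: rows {start}..{stop - 1} of pixel_values")
```

Checking that the last `stop` value equals `pixel_values.size(0)` at least confirms that all three images reached the model input, which separates a data-loading problem from a model limitation.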


czczup · 2024-08-26T05:34:11Z

Hello, thank you for your interest. We have noticed that generating captions for multiple images is indeed challenging. We will continue to improve the training data in this area and hope to make progress in future updates.

czczup closed this as completed Aug 26, 2024
