
Multi-image conversation does not work for more than 2 images? #296

Closed
console-beaver opened this issue Jun 21, 2024 · 1 comment

@console-beaver

Hello, I tried the multi-image conversation example from InternVL/README.md (lines 627 to 634 at commit 764fdc9):

```python
import torch

# load_image, model, tokenizer and generation_config are set up earlier in the README example
# multi-round multi-image conversation
pixel_values1 = load_image('./examples/image1.jpg', max_num=6).to(torch.bfloat16).cuda()
pixel_values2 = load_image('./examples/image2.jpg', max_num=6).to(torch.bfloat16).cuda()
pixel_values = torch.cat((pixel_values1, pixel_values2), dim=0)
question = "详细描述这两张图片"  # Describe these two images in detail
response, history = model.chat(tokenizer, pixel_values, question, generation_config, history=None, return_history=True)
print(question, response)
```
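The `torch.cat(..., dim=0)` call in this snippet simply stacks each image's tiles along the first dimension. The plain-Python sketch below mirrors that shape arithmetic. It assumes that `load_image` returns a tensor of shape `(num_tiles, 3, 448, 448)`, and the tile counts are hypothetical. It shows that the concatenated tensor carries no record of where one image ends and the next begins, so those boundaries have to be tracked separately (here with the hypothetical helper `tile_ranges`).

```python
from typing import List, Sequence, Tuple

Shape = Tuple[int, ...]


def concat_dim0_shape(shapes: Sequence[Shape]) -> Shape:
    """Shape torch.cat(tensors, dim=0) would produce for tensors of these shapes."""
    if not shapes:
        raise ValueError("need at least one tensor shape")
    trailing = shapes[0][1:]
    for s in shapes[1:]:
        # torch.cat requires every dimension except the concatenated one to match.
        if s[1:] != trailing:
            raise ValueError(f"trailing dims differ: {s[1:]} vs {trailing}")
    return (sum(s[0] for s in shapes),) + trailing


def tile_ranges(tile_counts: Sequence[int]) -> List[Tuple[int, int]]:
    """Half-open [start, end) row ranges each image occupies after concatenation."""
    ranges, start = [], 0
    for n in tile_counts:
        ranges.append((start, start + n))
        start += n
    return ranges


# Hypothetical tile counts for three images (7, 7 and 5 tiles of 3x448x448).
shapes = [(7, 3, 448, 448), (7, 3, 448, 448), (5, 3, 448, 448)]
print(concat_dim0_shape(shapes))            # (19, 3, 448, 448)
print(tile_ranges([s[0] for s in shapes]))  # [(0, 7), (7, 14), (14, 19)]
```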
With the two-image example, I can reproduce the results shown in the model card (https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5#model-usage). However, when I use three or more images, the model seems to ignore all but the last two. For example, if I try:

```python
# pixel_values1, pixel_values2 and pixel_values3 hold images 1, 2 and 3 from the examples folder,
# each loaded with load_image as above
pixel_values = torch.cat((pixel_values1, pixel_values2, pixel_values3), dim=0)
question = "Can you construct a story from all the images?"
response, history = model.chat(tokenizer, pixel_values, question, generation_config, history=None, return_history=True)
print(question, ': ', response)
```

I get the following response:

Can you construct a story from all the images? :  The story begins with a young girl named Lily who has always been fascinated by animals. She dreams of one day having a pet panda, just like the ones she sees in the zoo. One day, her wish comes true when she finds a baby panda in the forest. She takes the panda home and names it Panda.

As Panda grows, Lily realizes that she needs a friend for Panda. She decides to adopt a cat from the local animal shelter. She brings home a beautiful calico cat and names her Luna. Luna and Panda become the best of friends, spending their days playing and exploring.

One day, Lily decides to take Luna and Panda to the zoo to see the other animals. They visit the panda exhibit and see a majestic adult panda. Panda is amazed by the size and beauty of the adult panda and wishes to be as big and strong as it.

Luna, being the wise cat that she is, tells Panda that size and strength are not the most important things. She tells Panda that what truly matters is the love and friendship they share with Lily and each other.

The story never mentions the red panda from image 1. Is there anything I can do to run a multi-image conversation with more than two images, or is this a current limitation of the model?
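To get a feel for how much visual input a three-image prompt adds, here is a rough token-count sketch. It rests on assumptions not confirmed in this thread: that the README's `load_image` splits each image into at most `max_num` 448×448 tiles and appends one thumbnail tile whenever an image is split into more than one tile, and that each tile becomes 256 image tokens in InternVL-Chat-V1-5. `tiles_for_image` and `image_token_count` are hypothetical helpers. The sketch does not explain why the earlier images are ignored; it only shows that the image-token count grows linearly with the number of images, so lowering `max_num` per image is one knob worth experimenting with.

```python
from typing import Sequence


def tiles_for_image(num_blocks: int, use_thumbnail: bool = True) -> int:
    """Tiles sent to the vision encoder for one image split into num_blocks tiles.

    Assumption: a thumbnail tile is added only when the image was actually split.
    """
    return num_blocks + 1 if use_thumbnail and num_blocks > 1 else num_blocks


def image_token_count(blocks_per_image: Sequence[int], tokens_per_tile: int = 256) -> int:
    """Total image tokens, assuming a fixed token count per 448x448 tile."""
    return sum(tiles_for_image(b) for b in blocks_per_image) * tokens_per_tile


# Worst case for max_num=6: every image is split into 6 tiles (+1 thumbnail each).
print(image_token_count([6, 6]))     # 3584 for two images
print(image_token_count([6, 6, 6]))  # 5376 for three images
```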

@czczup
Member

czczup commented Aug 26, 2024

Hello, thank you for your interest. We have noticed that generating captions for multiple images is indeed challenging. We will continue to improve the training data in this area and hope to make progress in future updates.

@czczup closed this as completed on Aug 26, 2024