question = "详细描述这两张图片"  # Describe the two pictures in detail
response, history = model.chat(tokenizer, pixel_values, question, generation_config, history=None, return_history=True)
print(question, response)
With the two-image example, I am able to reproduce the results shown at https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5#model-usage. However, when I try three or more images, the model seems to ignore all but the last two. For example, if I try:
# pixel_values1, pixel_values2, and pixel_values3 correspond to images 1, 2, and 3 in the examples folder
pixel_values = torch.cat((pixel_values1, pixel_values2, pixel_values3), dim=0)
question = "Can you construct a story from all the images?"
response, history = model.chat(tokenizer, pixel_values, question, generation_config, history=None, return_history=True)
print(question, ': ', response)
and I see the following response:
Can you construct a story from all the images? : The story begins with a young girl named Lily who has always been fascinated by animals. She dreams of one day having a pet panda, just like the ones she sees in the zoo. One day, her wish comes true when she finds a baby panda in the forest. She takes the panda home and names it Panda.
As Panda grows, Lily realizes that she needs a friend for Panda. She decides to adopt a cat from the local animal shelter. She brings home a beautiful calico cat and names her Luna. Luna and Panda become the best of friends, spending their days playing and exploring.
One day, Lily decides to take Luna and Panda to the zoo to see the other animals. They visit the panda exhibit and see a majestic adult panda. Panda is amazed by the size and beauty of the adult panda and wishes to be as big and strong as it.
Luna, being the wise cat that she is, tells Panda that size and strength are not the most important things. She tells Panda that what truly matters is the love and friendship they share with Lily and each other.
It seems that there is no mention of the red panda in image 1. Is there anything I can do to run multi-image conversation for more than 2 images? Or is this a restriction the model currently has?
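One thing that may be worth ruling out first is whether all three images actually reach the model in the concatenated batch. The sketch below is only a bookkeeping aid, not part of the InternVL API: `tile_ranges` is a hypothetical helper, and it assumes each `pixel_values` tensor stacks one image's tiles along dimension 0, so that `pixel_values1.size(0)` gives that image's tile count.

```python
from itertools import accumulate


def tile_ranges(num_patches_list):
    """Return the (start, end) row range each image occupies after torch.cat(..., dim=0).

    num_patches_list holds the number of tiles per image, in concatenation order.
    The helper name is hypothetical and only illustrates the bookkeeping.
    """
    ends = list(accumulate(num_patches_list))
    starts = [0] + ends[:-1]
    return list(zip(starts, ends))


# Hypothetical tile counts for three images; with real tensors one could use
# [pv.size(0) for pv in (pixel_values1, pixel_values2, pixel_values3)].
ranges = tile_ranges([7, 13, 5])
for index, (start, end) in enumerate(ranges, start=1):
    print(f"image {index}: rows {start}..{end - 1}")
```

If the last range ends at `pixel_values.size(0)`, the tiles of every image are present in the input, and the missing red panda is more likely a model limitation than a preprocessing problem.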
Hello, thank you for your interest. We have noticed that generating captions for multiple images is indeed challenging. We will continue to improve the training data in this area and hope to make progress in future updates.
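Until multi-image captioning improves, one possible workaround is to describe each image in its own call and then build the story from the collected descriptions. The following is a minimal sketch under stated assumptions, not a method confirmed by the maintainers: `caption_each` and `story_prompt` are hypothetical helpers, and `chat_fn` is any callable that wraps the `model.chat` call shown in the question.

```python
def caption_each(chat_fn, pixel_values_list, question="Describe this picture in detail."):
    """Ask the same question about each image separately and label the answers.

    chat_fn(pixel_values, question) must return the response text; it is an
    assumed wrapper around model.chat, not part of the InternVL API.
    """
    captions = []
    for index, pixel_values in enumerate(pixel_values_list, start=1):
        response = chat_fn(pixel_values, question)
        captions.append(f"Image {index}: {response}")
    return captions


def story_prompt(captions):
    """Combine the per-image descriptions into a single text prompt."""
    joined = "\n".join(captions)
    return (
        f"Here are descriptions of {len(captions)} images:\n{joined}\n"
        "Can you construct a story from all the images?"
    )


# Example wiring with the call signature used in the question (not run here):
# chat_fn = lambda pv, q: model.chat(
#     tokenizer, pv, q, generation_config, history=None, return_history=True
# )[0]
# captions = caption_each(chat_fn, [pixel_values1, pixel_values2, pixel_values3])
# print(story_prompt(captions))
```

How the combined prompt is then sent to the model is left open here, since the thread does not say whether `model.chat` accepts a text-only request. The trade-off is one extra call per image in exchange for every image being described explicitly.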
Hello, I tried using the multi-image conversation as outlined in InternVL/README.md (lines 627 to 634 in 764fdc9).