Confusion about PCA step #16
Thank you for your interest in our work. For more information, please refer to #13 (comment).
Thank you so much for your reply. I still have a few points of confusion that I'd like to discuss with you. The paper mentions that a modality gap appears in the plotted graph when the "in one word:" method is not used. What prompt is used in that case? I also tried the "in one word:" method on other MLLMs: although the output texts for both modalities are similar words, a modality gap still appears when I extract the hidden states for plotting. However, when I directly evaluate Recall with the embeddings of the two modalities, the results are good. Have you tried this prompt on other MLLMs in your experiments, and does it also eliminate the modality gap there? I have tried many prompts, but the modality gap persists. Thank you very much for your time, and wishing you a good day.
For the method without the prompt, we use the same chat template as Llama 3, as follows:

```python
template = '<|start_header_id|>user<|end_header_id|>\n\n{}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n \n'
template.format('<image>')  # image
template.format('<sent>')   # sentence
```

We have not plotted the gap on other MLLMs. Could you share the MLLMs and prompts that you used?
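For comparison, the two prompt conditions discussed in this thread can be sketched side by side. The exact wording of the one-word instruction below is an assumption (PromptEOL-style), not the paper's verbatim prompt:

```python
# Llama 3 chat template for the "without prompt" baseline (from the reply above).
template = ('<|start_header_id|>user<|end_header_id|>\n\n{}'
            '<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n \n')

# Hypothetical variant adding the one-word summarization instruction;
# the exact phrasing is an assumption, not taken from the paper.
one_word = ('<|start_header_id|>user<|end_header_id|>\n\n'
            '{} Summarize the above in one word:'
            '<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n')

image_prompt = template.format('<image>')   # baseline, image input
text_prompt = one_word.format('<sent>')     # one-word variant, sentence input
```

In both conditions the last-token hidden state of the formatted prompt would then be used as the embedding for plotting.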
Thanks for your reply! I used a prompt similar to yours and solved my problem today. The main issue for me was the one-word summary: for multimodal data and their corresponding captions, my MLLM summarized them as verbs and nouns respectively, and the semantic distance between the two is quite large. After applying in-context learning to the sentence-summary step, I obtained a graph similar to yours. I think that when using this prompt for semantic compression, in-context learning may also be necessary so that more MLLMs can reproduce the results described in your paper. Looking forward to further communication with you.
Thank you for your advice. We also found that in-context learning is useful for this prompt in PromptEOL. We conducted a more in-depth analysis of in-context learning, focusing on how to select examples and how it scales with model size.
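The in-context-learning fix described above could be sketched as prepending a few caption-to-noun demonstrations before the query, steering the model away from verb summaries. The demonstration pairs and helper below are illustrative assumptions, not taken from the paper:

```python
# Hypothetical demonstration pairs for the one-word summarization step;
# these examples are illustrative, not from the paper.
demos = [
    ('A brown dog runs across a grassy field.', 'dog'),
    ('Two people ride bicycles down a city street.', 'bicycle'),
]

def build_icl_prompt(sentence, demos):
    """Prepend demonstration pairs so the model tends to answer with a noun,
    mitigating the verb/noun mismatch described above."""
    lines = [f'Sentence: "{s}" In one word: {w}' for s, w in demos]
    lines.append(f'Sentence: "{sentence}" In one word:')
    return '\n'.join(lines)

prompt = build_icl_prompt('A cat sleeps on a sofa.', demos)
```

The resulting string would then be wrapped in the model's chat template before extracting hidden states.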
I hope this message finds you well. I have a couple of questions regarding your methodology for the PCA step:
(1) When performing PCA, did you apply any additional processing steps to the embedding data before conducting PCA?
(2) Regarding the elimination of the modality gap, is it possible to obtain Figure 3(b) by simply using the "one word:" method, or is further fine-tuning required to achieve this outcome?
Thank you very much for your time.
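For question (1), a minimal sketch of the common default is below: mean-center the stacked image and text embeddings and project onto the top two principal components via SVD. Whether the authors apply further normalization is exactly what the question asks; the random embeddings here are placeholders:

```python
import numpy as np

def pca_2d(X):
    """Project rows of X onto the first two principal components.
    Only mean-centering is applied before the decomposition (a common
    default; any additional preprocessing is what question (1) asks about)."""
    Xc = X - X.mean(axis=0, keepdims=True)
    # SVD of the centered data matrix gives the principal directions in Vt.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T

rng = np.random.default_rng(0)
img_emb = rng.normal(size=(100, 64))         # placeholder image embeddings
txt_emb = rng.normal(size=(100, 64)) + 2.0   # offset simulates a modality gap
proj = pca_2d(np.vstack([img_emb, txt_emb]))
# proj[:100] and proj[100:] can then be scattered in two colors to
# visualize whether the two modalities separate, as in the paper's Figure 3.
```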