Unnormalized image and text projected vectors #10

Open
HareshKarnan opened this issue Jun 26, 2023 · 1 comment

Comments

@HareshKarnan

Hi, thanks for open-sourcing your code. I noticed that the text and image vectors used to compute the logits are not unit-normalized: https://github.com/moein-shariatnia/OpenAI-CLIP/blob/e2c5bb3859d7478752af8c69862f63b1afe4a9cb/modules.py#L68

Without normalization, the two vectors can have arbitrary lengths, so their dot product is not the cosine similarity that OpenAI's CLIP implementation computes. Is there a reason you chose LayerNorm over L2 normalization?
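
To make the concern concrete, here is a small pure-Python sketch (the vectors and helper names are made up for illustration and are not taken from the repository). It shows a case where the raw dot product ranks a longer, less aligned vector above a perfectly aligned one, while cosine similarity ranks them the other way around.

```python
import math

def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def l2_normalize(v):
    """Scale v to unit Euclidean length."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def cosine(u, v):
    """Cosine similarity: dot product of the unit-normalized vectors."""
    return dot(l2_normalize(u), l2_normalize(v))

# Hypothetical embeddings, chosen only to illustrate the effect.
image_vec = [3.0, 4.0]
aligned_text = [3.0, 4.0]    # same direction as image_vec
long_off_text = [10.0, 0.0]  # different direction, but longer

# The raw dot product rewards length, not just direction.
raw_aligned = dot(image_vec, aligned_text)
raw_long_off = dot(image_vec, long_off_text)

# Cosine similarity depends on direction only.
cos_aligned = cosine(image_vec, aligned_text)
cos_long_off = cosine(image_vec, long_off_text)

print(raw_aligned, raw_long_off)
print(cos_aligned, cos_long_off)
```

Here the unnormalized score prefers `long_off_text`, whereas the cosine score prefers `aligned_text`.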

@moein-shariatnia
Owner

Hey Haresh,
Sorry for my late reply.

Yes, you're right. L2-normalizing the features before calculating the loss is a better option than relying on LayerNorm to compensate for this. I'll update the code to add it. Contributions are also welcome! Thanks a lot.
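
A minimal sketch of what such a change could look like, assuming PyTorch. The tensor names, sizes, and temperature are hypothetical, and the symmetric cross-entropy shown is the standard CLIP-style objective rather than necessarily the loss used in this repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

batch_size, dim = 4, 256
temperature = 1.0  # assumed value, for illustration only

# Stand-ins for projection-head outputs that end in LayerNorm.
image_embeddings = nn.LayerNorm(dim)(torch.randn(batch_size, dim))
text_embeddings = nn.LayerNorm(dim)(torch.randn(batch_size, dim))

# LayerNorm standardizes each row but does not give it unit length:
# with default parameters the norm is roughly sqrt(dim).
layernorm_norms = image_embeddings.norm(dim=-1)

# L2-normalize along the feature dimension so each row has unit length.
image_unit = F.normalize(image_embeddings, p=2, dim=-1)
text_unit = F.normalize(text_embeddings, p=2, dim=-1)

# Each logit is now a cosine similarity scaled by 1 / temperature.
logits = (text_unit @ image_unit.T) / temperature

# Matching pairs sit on the diagonal; average both directions.
targets = torch.arange(batch_size)
loss = (F.cross_entropy(logits, targets)
        + F.cross_entropy(logits.T, targets)) / 2

print(layernorm_norms)
print(loss.item())
```

With unit-length rows, the logits are bounded by `1 / temperature` in absolute value, so the temperature alone controls how sharp the softmax is.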

2 participants