Unnormalized image and text projected vectors #10

HareshKarnan · 2023-06-26T14:57:49Z

Hi, thanks for open-sourcing your code. I noticed that the text and image vectors you use to compute the logits are not unit-normalized (see https://github.com/moein-shariatnia/OpenAI-CLIP/blob/e2c5bb3859d7478752af8c69862f63b1afe4a9cb/modules.py#L68).

Without that normalization, the two vectors can have arbitrary lengths, so their dot product is not their cosine similarity, which is what OpenAI's CLIP implementation computes. Do you have any intuition for why you used LayerNorm instead of L2 normalization, i.e., why LayerNorm was your preferred choice?
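To make the point concrete, here is a minimal PyTorch sketch (not code from the repository). It assumes a projection head that ends in `nn.LayerNorm` with its default learnable affine parameters; the batch size, the dimension 256, and the randomly drawn LayerNorm weights are illustrative stand-ins for a trained model. LayerNorm standardizes each vector's mean and variance, so without the affine parameters every output would have an L2 norm close to sqrt(d). It is the learned scale (and bias) that lets lengths vary, and then rescaling a vector changes its raw dot products but not its cosine similarities.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
batch, dim = 4, 256  # hypothetical batch size and projection dimension

# Stand-in for a projection head that ends in LayerNorm (elementwise_affine=True by default).
ln = nn.LayerNorm(dim)
with torch.no_grad():
    ln.weight.uniform_(0.5, 2.0)  # mimic learned, non-uniform scale parameters
    image_emb = ln(torch.randn(batch, dim))
    text_emb = ln(torch.randn(batch, dim))

    # LayerNorm fixes each vector's mean and variance, not its L2 norm.
    print(image_emb.norm(dim=-1))

    # Raw dot products mix direction with vector length...
    dot_logits = text_emb @ image_emb.T
    # ...while L2-normalizing first gives true cosine similarities in [-1, 1].
    cos_logits = F.normalize(text_emb, dim=-1) @ F.normalize(image_emb, dim=-1).T

    # Rescaling one text vector scales its raw logits but leaves its cosine logits unchanged.
    scaled = text_emb.clone()
    scaled[0] *= 3.0
    print(torch.allclose((scaled @ image_emb.T)[0], 3.0 * dot_logits[0], atol=1e-3))
    print(torch.allclose(F.normalize(scaled, dim=-1) @ F.normalize(image_emb, dim=-1).T, cos_logits, atol=1e-5))
```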

moein-shariatnia · 2023-07-03T17:31:56Z

Hey Haresh,
Sorry for the late reply.

Yes, you're right. Normalizing the features before calculating the loss is a better option than relying on LayerNorm to fix this. I will update the code to add it. Contributions are also welcome! Thanks a lot.
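One possible shape for that change is sketched below in PyTorch. It is a hypothetical illustration, not the repository's code: `NormalizedContrastiveLoss` is an invented name, and it swaps in the hard-label symmetric cross-entropy described in the CLIP paper rather than this repo's existing loss. Following OpenAI's CLIP, it L2-normalizes both embeddings, learns the log of the inverse temperature (initialized so the temperature is 0.07), and caps the logit multiplier at 100; here the cap is applied to the exponentiated scale, which is one of several ways to enforce it.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class NormalizedContrastiveLoss(nn.Module):
    """Hypothetical CLIP-style loss: L2-normalize, scale by a learned temperature,
    then apply symmetric cross-entropy over matching image/text pairs."""

    def __init__(self, init_temperature: float = 0.07, max_logit_scale: float = 100.0):
        super().__init__()
        # Learn log(1 / T) so the multiplier stays positive.
        self.logit_scale = nn.Parameter(torch.tensor(math.log(1.0 / init_temperature)))
        self.max_logit_scale = max_logit_scale

    def forward(self, image_emb: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        # Unit-normalize so each logit is a cosine similarity before scaling.
        image_emb = F.normalize(image_emb, dim=-1)
        text_emb = F.normalize(text_emb, dim=-1)
        # Cap the multiplier so the logits cannot grow without bound.
        scale = self.logit_scale.exp().clamp(max=self.max_logit_scale)
        logits_per_image = scale * image_emb @ text_emb.T
        logits_per_text = logits_per_image.T
        # The i-th image is assumed to match the i-th text in the batch.
        targets = torch.arange(image_emb.size(0), device=image_emb.device)
        return (F.cross_entropy(logits_per_image, targets)
                + F.cross_entropy(logits_per_text, targets)) / 2


# Usage with random stand-ins for the projection-head outputs.
torch.manual_seed(0)
loss_fn = NormalizedContrastiveLoss()
image_emb = torch.randn(8, 256, requires_grad=True)
text_emb = torch.randn(8, 256, requires_grad=True)
loss = loss_fn(image_emb, text_emb)
loss.backward()
print(loss.item())
```

Because both inputs are normalized inside the loss, the projection heads can keep or drop their final LayerNorm without changing what the logits measure; the temperature alone controls how sharp the softmax is.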
