[Bug/Model Request]: Qdrant/clip-ViT-B-32-vision model image embeddings are not the same as Hugging Face or openai-clip #357

Open
onurtunali opened this issue Oct 6, 2024 · 1 comment
What happened?

I am not a vision expert, so apologies in advance if my interpretation of the situation is incorrect. While working with Fastembed, I have observed that the image embeddings generated by the "Qdrant/clip-ViT-B-32" model are not the same as those from the Hugging Face "openai/clip-vit-base-patch32" model or the OpenAI "ViT-B/32" model.

Here is a minimal reproducible example:


Versions:
fastembed: 0.3.6
transformers: 4.42.3
openai-clip: 1.0.1
python: 3.10.14

import io
import urllib.request

import clip
import numpy as np
import torch
from fastembed import ImageEmbedding
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

N_TRIALS = 5
fastembed_trial_results = []
hf_trial_results = []

fastembed_model = ImageEmbedding(model_name="Qdrant/clip-ViT-B-32-vision")
openai_clip_model, openai_preprocess = clip.load("ViT-B/32")
hf_model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
hf_processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")


def generate_sample_data(sample_size=5, image_size=600):
    """Get random images with given sample_size."""
    images = []
    for _ in range(sample_size):
        response = urllib.request.urlopen(f"https://picsum.photos/{image_size}")
        image = Image.open(io.BytesIO(response.read()))
        images.append(image)
    return images


def get_fastembed_embeddings(images):
    fastembed_embeddings = list(fastembed_model.embed(images))
    fastembed_embeddings = np.vstack(fastembed_embeddings)
    return fastembed_embeddings


def get_openai_embeddings(images):
    image_tensors = torch.vstack([openai_preprocess(i).unsqueeze(0) for i in images])

    with torch.no_grad():
        openai_embeddings = openai_clip_model.encode_image(image_tensors).numpy()
    return openai_embeddings


def get_hf_embeddings(images):
    hf_model.eval()
    input_dict = hf_processor(images=images, return_tensors="pt")
    with torch.no_grad():
        hf_embeddings = hf_model.get_image_features(**input_dict).numpy()
    return hf_embeddings


for t in range(N_TRIALS):
    images = generate_sample_data()

    fastembed_embeddings = get_fastembed_embeddings(images)
    openai_embeddings = get_openai_embeddings(images)
    hf_embeddings = get_hf_embeddings(images)

    if np.allclose(fastembed_embeddings, openai_embeddings, atol=0.001):
        print(f"Trial {t} Fastembed same with openai")
        fastembed_trial_results.append(True)
    else:
        print(f"Trial {t} Fastembed is NOT the same with openai")
        fastembed_trial_results.append(False)

    if np.allclose(openai_embeddings, hf_embeddings, atol=0.001):
        print(f"Trial {t} Hf same with openai")
        hf_trial_results.append(True)
    else:
        print(f"Trial {t} Hf is NOT the same with openai")
        hf_trial_results.append(False)

print(f"Out of {N_TRIALS}, {sum(fastembed_trial_results)} are the same for Fastembed")
print(f"Out of {N_TRIALS}, {sum(hf_trial_results)} are the same for HF")

Here is the Colab version: Open In Colab

What Python version are you on? e.g. python --version

Python 3.10.14

Version

0.2.7 (Latest)

What OS are you seeing the problem on?

Linux, MacOS

Relevant stack traces and/or logs

Trial 0 Fastembed is NOT the same with openai
Trial 0 Hf same with openai
Trial 1 Fastembed is NOT the same with openai
Trial 1 Hf same with openai
Trial 2 Fastembed is NOT the same with openai
Trial 2 Hf same with openai
Trial 3 Fastembed is NOT the same with openai
Trial 3 Hf same with openai
Trial 4 Fastembed is NOT the same with openai
Trial 4 Hf same with openai
Out of 5, 0 are the same for Fastembed
Out of 5, 5 are the same for HF
hh-space-invader (Contributor) commented Jan 14, 2025

Thank you for your observation and for reaching out! You're correct that embeddings from Qdrant/clip-ViT-B-32 in Fastembed do not match those from Hugging Face's openai/clip-vit-base-patch32 or OpenAI's ViT-B/32 directly. The difference lies in normalization: Fastembed normalizes the embeddings by default, so it returns unit-length vectors.
You can verify this using the following code snippet, which demonstrates the comparison:

import io
import urllib.request

import torch
from PIL import Image
import numpy as np
from fastembed import ImageEmbedding
from transformers import CLIPProcessor, CLIPModel


def normalize(embedding: np.ndarray):
    norm = np.linalg.norm(embedding)
    return embedding / norm

response = urllib.request.urlopen("http://images.cocodataset.org/val2017/000000039769.jpg")
image = Image.open(io.BytesIO(response.read()))

fe_model = ImageEmbedding("Qdrant/clip-ViT-B-32-vision", cache_dir="models")
hf_model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
hf_preprocess = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

hf_image_input = hf_preprocess(images=image, return_tensors="pt")
with torch.no_grad():
    hf_image_embeddings = hf_model.get_image_features(**hf_image_input).numpy()

fe_image_embeddings = list(fe_model.embed(images=[image]))

normalized_hf_image_embeddings = normalize(hf_image_embeddings)
print(np.allclose(normalized_hf_image_embeddings, fe_image_embeddings, atol=1e-3)) # True

Running this snippet will show that after normalizing the HuggingFace embeddings, they match the Fastembed output (up to a small numerical tolerance). I hope this clarifies the behavior! Let me know if you have further questions.
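For reference, here is a quick sanity check (a minimal sketch reusing the variables from the snippet above, not part of the original report) that makes the normalization difference explicit by comparing vector norms:

# Sketch: Fastembed output is already unit-norm, while the raw Hugging Face
# image features are not; normalizing the latter makes the two match.
print(np.linalg.norm(fe_image_embeddings[0]))  # approximately 1.0
print(np.linalg.norm(hf_image_embeddings[0]))  # noticeably larger than 1.0
print(np.allclose(normalize(hf_image_embeddings[0]), fe_image_embeddings[0], atol=1e-3))  # True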
