[Bug/Model Request]: Qdrant/clip-ViT-B-32-vision model image embeddings are not the same as Hugging Face or openai-clip #357

Open
onurtunali opened this issue Oct 6, 2024 · 1 comment
What happened?

I am not a vision expert, so apologies in advance if my interpretation of the situation is incorrect. While working with Fastembed, I have observed that the image embeddings generated by the "Qdrant/clip-ViT-B-32" model are not the same as those from the Hugging Face "openai/clip-vit-base-patch32" model or the OpenAI "ViT-B/32" model.

Here is a minimal reproducible example:


Versions:
fastembed: 0.3.6
transformers: 4.42.3
openai-clip: 1.0.1
python: 3.10.14

import io
import urllib.request

import clip
import numpy as np
import torch
from fastembed import ImageEmbedding
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

N_TRIALS = 5
fastembed_trial_results = []
hf_trial_results = []

fastembed_model = ImageEmbedding(model_name="Qdrant/clip-ViT-B-32-vision")
openai_clip_model, openai_preprocess = clip.load("ViT-B/32")
hf_model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
hf_processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")


def generate_sample_data(sample_size=5, image_size=600):
    """Get random images with given sample_size."""
    images = []
    for _ in range(sample_size):
        response = urllib.request.urlopen(f"https://picsum.photos/{image_size}")
        image = Image.open(io.BytesIO(response.read()))
        images.append(image)
    return images


def get_fastembed_embeddings(images):
    fastembed_embeddings = list(fastembed_model.embed(images))
    fastembed_embeddings = np.vstack(fastembed_embeddings)
    return fastembed_embeddings


def get_openai_embeddings(images):
    image_tensors = torch.vstack([openai_preprocess(i).unsqueeze(0) for i in images])

    with torch.no_grad():
        openai_embeddings = openai_clip_model.encode_image(image_tensors).numpy()
    return openai_embeddings


def get_hf_embeddings(images):
    hf_model.eval()
    input_dict = hf_processor(images=images, return_tensors="pt")
    with torch.no_grad():
        hf_embeddings = hf_model.get_image_features(**input_dict).numpy()
    return hf_embeddings


for t in range(N_TRIALS):
    images = generate_sample_data()

    fastembed_embeddings = get_fastembed_embeddings(images)
    openai_embeddings = get_openai_embeddings(images)
    hf_embeddings = get_hf_embeddings(images)

    if np.allclose(fastembed_embeddings, openai_embeddings, atol=0.001):
        print(f"Trial {t} Fastembed same with openai")
        fastembed_trial_results.append(True)
    else:
        print(f"Trial {t} Fastembed is NOT the same with openai")
        fastembed_trial_results.append(False)

    if np.allclose(openai_embeddings, hf_embeddings, atol=0.001):
        print(f"Trial {t} Hf same with openai")
        hf_trial_results.append(True)
    else:
        print(f"Trial {t} Hf is NOT the same with openai")
        hf_trial_results.append(False)

print(f"Out of {N_TRIALS}, {sum(fastembed_trial_results)} are the same for Fastembed")
print(f"Out of {N_TRIALS}, {sum(hf_trial_results)} are the same for HF")

Here is the Colab version: Open In Colab

What Python version are you on? e.g. python --version

Python 3.10.14

Version

0.2.7 (Latest)

What OS are you seeing the problem on?

Linux, MacOS

Relevant stack traces and/or logs

Trial 0 Fastembed is NOT the same with openai
Trial 0 Hf same with openai
Trial 1 Fastembed is NOT the same with openai
Trial 1 Hf same with openai
Trial 2 Fastembed is NOT the same with openai
Trial 2 Hf same with openai
Trial 3 Fastembed is NOT the same with openai
Trial 3 Hf same with openai
Trial 4 Fastembed is NOT the same with openai
Trial 4 Hf same with openai
Out of 5, 0 are the same for Fastembed
Out of 5, 5 are the same for HF
hh-space-invader (Contributor) commented Jan 14, 2025

Thank you for your observation and for reaching out! You're correct that embeddings from Qdrant/clip-ViT-B-32 in Fastembed do not match those from Hugging Face's openai/clip-vit-base-patch32 or OpenAI's ViT-B/32 directly. The difference lies in normalization: Fastembed normalizes the embeddings by default, so it returns unit-length vectors.
You can verify this using the following code snippet, which demonstrates the comparison:

import io
import urllib.request

import torch
from PIL import Image
import numpy as np
from fastembed import ImageEmbedding
from transformers import CLIPProcessor, CLIPModel


def normalize(embedding: np.ndarray):
    norm = np.linalg.norm(embedding)
    return embedding / norm

response = urllib.request.urlopen("http://images.cocodataset.org/val2017/000000039769.jpg")
image = Image.open(io.BytesIO(response.read()))

fe_model = ImageEmbedding("Qdrant/clip-ViT-B-32-vision", cache_dir="models")
hf_model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
hf_preprocess = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

hf_image_input = hf_preprocess(images=image, return_tensors="pt")
with torch.no_grad():
    hf_image_embeddings = hf_model.get_image_features(**hf_image_input).numpy()

fe_image_embeddings = list(fe_model.embed(images=[image]))

normalized_hf_image_embeddings = normalize(hf_image_embeddings)
print(np.allclose(normalized_hf_image_embeddings, fe_image_embeddings, atol=1e-3)) # True

Running this snippet will show that after normalizing the HuggingFace embeddings, they match the Fastembed output (up to a small numerical tolerance). I hope this clarifies the behavior! Let me know if you have further questions.
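For reference, here is a quick sanity check (a minimal sketch reusing the variables from the snippet above, not part of the original report) that makes the normalization difference explicit by comparing vector norms:

# Sketch: Fastembed output is already unit-norm, while the raw Hugging Face
# image features are not; normalizing the latter makes the two match.
print(np.linalg.norm(fe_image_embeddings[0]))  # approximately 1.0
print(np.linalg.norm(hf_image_embeddings[0]))  # noticeably larger than 1.0
print(np.allclose(normalize(hf_image_embeddings[0]), fe_image_embeddings[0], atol=1e-3))  # True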
