What happened?
I am not a vision expert, so apologies in advance if my interpretation of the situation is incorrect. While working with Fastembed, I have observed that the image embeddings generated by the "Qdrant/clip-ViT-B-32" model are not the same as those from the Hugging Face "openai/clip-vit-base-patch32" model or the OpenAI "ViT-B/32" model.
I am adding an MRE:
Versions:
fastembed: 0.3.6
transformers: 4.42.3
openai-clip: 1.0.1
python: 3.10.14
import io
import urllib.request

import clip
import numpy as np
import torch
from fastembed import ImageEmbedding
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

N_TRIALS = 5
fastembed_trial_results = []
hf_trial_results = []

# Load the same CLIP ViT-B/32 checkpoint through three different libraries.
fastembed_model = ImageEmbedding(model_name="Qdrant/clip-ViT-B-32-vision")
openai_clip_model, openai_preprocess = clip.load("ViT-B/32")
hf_model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
hf_processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")


def generate_sample_data(sample_size=5, image_size=600):
    """Get random images with given sample_size."""
    images = []
    for _ in range(sample_size):
        response = urllib.request.urlopen(f"https://picsum.photos/{image_size}")
        image = Image.open(io.BytesIO(response.read()))
        images.append(image)
    return images


def get_fastembed_embeddings(images):
    fastembed_embeddings = list(fastembed_model.embed(images))
    fastembed_embeddings = np.vstack(fastembed_embeddings)
    return fastembed_embeddings


def get_openai_embeddings(images):
    image_tensors = torch.vstack([openai_preprocess(i).unsqueeze(0) for i in images])
    with torch.no_grad():
        openai_embeddings = openai_clip_model.encode_image(image_tensors).numpy()
    return openai_embeddings


def get_hf_embeddings(images):
    hf_model.eval()
    input_dict = hf_processor(images=images, return_tensors="pt")
    with torch.no_grad():
        hf_embeddings = hf_model.get_image_features(**input_dict).numpy()
    return hf_embeddings


for t in range(N_TRIALS):
    images = generate_sample_data()
    fastembed_embeddings = get_fastembed_embeddings(images)
    openai_embeddings = get_openai_embeddings(images)
    hf_embeddings = get_hf_embeddings(images)

    # Compare each pair of outputs with a small absolute tolerance.
    if np.allclose(fastembed_embeddings, openai_embeddings, atol=0.001):
        print(f"Trial {t} Fastembed same with openai")
        fastembed_trial_results.append(True)
    else:
        print(f"Trial {t} Fastembed is NOT the same with openai")
        fastembed_trial_results.append(False)

    if np.allclose(openai_embeddings, hf_embeddings, atol=0.001):
        print(f"Trial {t} Hf same with openai")
        hf_trial_results.append(True)
    else:
        print(f"Trial {t} Hf is NOT the same with openai")
        hf_trial_results.append(False)

print(f"Out of {N_TRIALS}, {sum(fastembed_trial_results)} are the same for Fastembed")
print(f"Out of {N_TRIALS}, {sum(hf_trial_results)} are the same for HF")
Here is the colab version:
What Python version are you on? e.g. python --version
Python 3.10.14
Version
0.2.7 (Latest)
What OS are you seeing the problem on?
Linux, MacOS
Relevant stack traces and/or logs
Trial 0 Fastembed is NOT the same with openai
Trial 0 Hf same with openai
Trial 1 Fastembed is NOT the same with openai
Trial 1 Hf same with openai
Trial 2 Fastembed is NOT the same with openai
Trial 2 Hf same with openai
Trial 3 Fastembed is NOT the same with openai
Trial 3 Hf same with openai
Trial 4 Fastembed is NOT the same with openai
Trial 4 Hf same with openai
Out of 5, 0 are the same for Fastembed
Out of 5, 5 are the same for HF
Thank you for your observation and for reaching out! You're correct that embeddings from Qdrant/clip-ViT-B-32 in Fastembed may not match those from HuggingFace openai/clip-vit-base-patch32 or OpenAI's ViT-B/32 directly. The difference lies in normalization: Fastembed applies normalization to the embeddings by default, ensuring unit vector outputs.
You can verify this using the following code snippet, which demonstrates the comparison:
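A minimal sketch of such a check, reusing generate_sample_data, get_fastembed_embeddings, and get_hf_embeddings from the MRE above (the normalization step simply reflects the unit-vector behavior described here, not any Fastembed internals):

import numpy as np

images = generate_sample_data()

fastembed_embeddings = get_fastembed_embeddings(images)  # unit-normalized by Fastembed by default
hf_embeddings = get_hf_embeddings(images)                # raw CLIP image features

# L2-normalize the HuggingFace features so both outputs are unit vectors.
hf_normalized = hf_embeddings / np.linalg.norm(hf_embeddings, axis=1, keepdims=True)

print(np.allclose(fastembed_embeddings, hf_normalized, atol=1e-3))  # expected: True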
Running this snippet will show that after normalizing the HuggingFace embeddings, they match the Fastembed output (up to a small numerical tolerance). I hope this clarifies the behavior! Let me know if you have further questions.