[Bug/embeddings]: big difference between embeddings in sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 #368

Open
SardorLut opened this issue Oct 17, 2024 · 0 comments

What happened?

import numpy as np
from numpy import dot
from numpy.linalg import norm
from fastembed import TextEmbedding
from sentence_transformers import SentenceTransformer

# A single document: the first stanza of a Russian poem, with embedded newlines.
text = ["Я помню чудное мгновенье:\nПередо мной явилась ты,\nКак мимолетное виденье,\nКак гений чистой красоты."]

# Embed with fastembed (ONNX runtime, CPU provider).
embedding_model = TextEmbedding(
    model_name="sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
    providers=["CPUExecutionProvider"],
)
embed_from_fastembed = np.array(list(embedding_model.embed(documents=text)))

# Embed the same text with the reference sentence-transformers model.
model = SentenceTransformer('sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2')
embeddings_from_sentence_transformer = np.array(model.encode(text))

# Take the first (and only) row from each result so both vectors are 1-D.
a = embeddings_from_sentence_transformer[0]
b = embed_from_fastembed[0]
cos_sim = dot(a, b) / (norm(a) * norm(b))
print(float(cos_sim))  # 0.6093958020210266

The difference is even greater when I pass longer text.
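For repeating this comparison on texts of different lengths, a small helper that flattens both inputs and checks their shapes may be useful. This is only a sketch: `cosine_similarity` is a hypothetical helper name, not part of fastembed or sentence-transformers, and it assumes each input holds exactly one embedding.

```python
import numpy as np


def cosine_similarity(a, b) -> float:
    """Cosine similarity between two single embeddings.

    Inputs may be 1-D vectors or (1, dim) arrays; both are flattened first.
    """
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    if a.shape != b.shape:
        raise ValueError(f"dimension mismatch: {a.shape} vs {b.shape}")
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return float(np.dot(a, b) / denom)
```

With the variables from the snippet above, `cosine_similarity(embeddings_from_sentence_transformer, embed_from_fastembed)` should give the same number as the manual computation, and it raises an error instead of silently broadcasting if the two embeddings have different dimensions.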

What Python version are you on? e.g. python --version

manager: poetry
Python=3.10
fastembed-gpu="^0.3.6"
onnxruntime-gpu==1.18.0
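
The `^0.3.6` entry above is a poetry version constraint, not necessarily the exact version installed. A minimal sketch for printing the versions that are actually present in the environment, using only the standard library (`installed_versions` is a hypothetical helper name, and the package list is an assumption based on the imports in the snippet above):

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(
        ["fastembed", "fastembed-gpu", "onnxruntime-gpu", "sentence-transformers"]
    ).items():
        print(f"{name}: {ver or 'not installed'}")
```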

Version

0.2.7 (Latest)

What os are you seeing the problem on?

No response

Relevant stack traces and/or logs

No response
