Sentence Transformer encodings #1168
Hi,
We fine-tuned Sentence Transformers on our domain-specific data (similar to NLI data). It gives high cosine scores for irrelevant suggestions. We used the labels good, bad, and ok when annotating the data.

Comments
Adding a question to your issue would be quite helpful.
Yes. After extracting embeddings from SBERT, we use the cosine score to sort the results. The issue is that results with a high cosine score are irrelevant, while similar results get lower scores. We are unable to figure out why this is happening.
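For reference, a minimal sketch of the kind of cosine-score ranking described above, using `sentence_transformers.util.cos_sim`. The model name and example texts are placeholders for illustration, not the poster's actual setup:

```python
from sentence_transformers import SentenceTransformer, util

# Placeholder model name; the fine-tuned model would be loaded here instead.
model = SentenceTransformer("paraphrase-MiniLM-L6-v2")

query = "how do I reset my password"
candidates = [
    "steps to change your account password",
    "pricing for the enterprise plan",
]

# Encode query and candidates, then rank candidates by cosine similarity.
query_emb = model.encode(query, convert_to_tensor=True)
cand_embs = model.encode(candidates, convert_to_tensor=True)
scores = util.cos_sim(query_emb, cand_embs)[0]  # shape: (len(candidates),)
ranked = sorted(zip(candidates, scores.tolist()), key=lambda x: x[1], reverse=True)
print(ranked)
```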
Likely due to training the model incorrectly.
Thank you. Does performance depend on the batch size? Could you please elaborate on what you mean by training incorrectly: the epochs, the batch size, or the data? We trained for 4 epochs with batch size 16 and used SoftmaxLoss.
SoftmaxLoss is the wrong loss. Have a look at the other loss functions.
Thanks for your reply. Could you please suggest a preferable loss for training SBERT?
MultipleNegativesRankingLoss or one of the triplet losses
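A minimal training sketch with MultipleNegativesRankingLoss using the library's fit API; the base model name and the (anchor, positive) pairs below are assumptions for illustration, not the poster's data:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Assumed base model for illustration.
model = SentenceTransformer("distilbert-base-nli-mean-tokens")

# MultipleNegativesRankingLoss expects (anchor, positive) pairs; the other
# positives in the same batch serve as in-batch negatives for each anchor.
train_examples = [
    InputExample(texts=["how do I reset my password", "steps to change your account password"]),
    InputExample(texts=["cancel my subscription", "how to end a paid plan"]),
    # ... more pairs
]

train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)
train_loss = losses.MultipleNegativesRankingLoss(model)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,
    warmup_steps=100,
)
```

A practical note from the loss's documentation: larger batch sizes usually help here, since every additional in-batch example is an extra negative.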
Thank you @nreimers. Could you please explain why SoftmaxLoss is the wrong loss? On the SBERT website you mention that you used SoftmaxLoss to train SBERT on NLI data, and our data labels are similar to NLI data.
That it works on NLI is rather a coincidence; there is no good logic behind it.
Thank you for your suggestion. For MultipleNegativesRankingLoss with hard negatives, the team stated: "You can also provide one or multiple hard negatives per anchor-positive pair by structuring the data like this: (a_1, p_1, n_1), (a_2, p_2, n_2)". Could you please elaborate on this statement? Does it mean the loss uses p_j and n_j as negatives for a_i?
Yes
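As a sketch of that data layout (the texts below are made up for illustration): each InputExample holds an (anchor, positive, hard negative) triple, and within a batch the positives and hard negatives of the other triples also act as negatives for a_i.

```python
from sentence_transformers import InputExample

# Hypothetical (anchor, positive, hard_negative) triples.
triples = [
    ("how do I reset my password",
     "steps to change your account password",
     "how do I reset my router"),
    ("cancel my subscription",
     "how to end a paid plan",
     "how to renew a paid plan"),
]

# With MultipleNegativesRankingLoss, example i uses n_i as an explicit hard
# negative, and every p_j and n_j with j != i in the batch as in-batch negatives.
train_examples = [InputExample(texts=[a, p, n]) for a, p, n in triples]
```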
We did synonym expansion on the data, so in our case most a_i and p_j are actually positives of each other. How does the loss work in this case? Won't it affect the embeddings?
Then you have to create a custom DataLoader that ensures that a batch does not contain two entries of the same type
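One way to approximate such a DataLoader is a greedy batcher that never puts two examples from the same group into one batch. The `group_key` callable here is a hypothetical hook you would have to supply, e.g. mapping each example back to the concept its synonyms were expanded from:

```python
import random


def iter_batches_without_same_group(examples, batch_size, group_key):
    """Greedily build batches in which no two examples share group_key(example).

    `group_key` is a hypothetical callable, e.g. returning the concept ID that
    an (anchor, positive) pair was synonym-expanded from.
    """
    pool = list(examples)
    random.shuffle(pool)
    while pool:
        batch, seen_keys, leftover = [], set(), []
        for example in pool:
            key = group_key(example)
            if len(batch) < batch_size and key not in seen_keys:
                batch.append(example)
                seen_keys.add(key)
            else:
                leftover.append(example)
        yield batch
        pool = leftover
```

The library also ships a NoDuplicatesDataLoader, but it deduplicates by exact text rather than by a semantic group, so something like the custom hook above would still be needed for the synonym-expansion case.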
It is very difficult for us to separate out entries of the same type. Is it okay to go with a triplet loss instead?
Sure
Thank you. Does the distance_metric in the triplet loss have any impact on performance? We tried the default Euclidean distance and performance was not good, so we are now trying cosine.
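For reference, a sketch of switching the triplet loss to the cosine distance metric. The model name, texts, and margin are illustrative, and the margin usually needs re-tuning when the metric changes, since the library's default margin is sized for Euclidean distance:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("distilbert-base-nli-mean-tokens")  # assumed base model

# (anchor, positive, negative) triples; texts are placeholders.
train_examples = [
    InputExample(texts=["how do I reset my password",
                        "steps to change your account password",
                        "how do I reset my router"]),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)

# Default distance_metric is EUCLIDEAN; COSINE bounds distances to [0, 2],
# so the margin should be much smaller than the Euclidean default.
train_loss = losses.TripletLoss(
    model=model,
    distance_metric=losses.TripletDistanceMetric.COSINE,
    triplet_margin=0.5,  # illustrative value, tune on a dev set
)

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=100)
```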