Asymmetric semantic search with multilingual-MiniLM-L12-v2? #1463
Yes, this is currently work in progress. We hope to start the training process soon. For German & English, there are some MSMARCO English-German models on the hub:
Interesting. To avoid double training, I used
and got the 2,000-step result at https://drive.google.com/drive/folders/1--U-RQJscmfiZ7BxCzayLBLImO10HsRc?usp=sharing, which can be reused.
Eagerly awaiting the results of this training!
I just made this comment in another similar issue - it should solve this problem.
I have a corpus of 144,491 entries, each around 2,000 characters long, consisting of phrases in English and German.
Each entry is monolingual.
My goal is to enter a query like a question or a set of keywords for it to output the best fitting index in the corpus.
I am currently using sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 with a
This gives reasonable results, but is there a better approach?
I am asking because this is an asymmetric semantic search, which according to your documentation should use the MSMARCO models, yet those are English-only and https://www.sbert.net/examples/training/ms_marco/multilingual/README.html seems unfinished.
Is the idea to
Which approach using sbert do you suggest?