How to Deploy an SBERT model? #245

Open
ar3717 opened this issue May 23, 2020 · 17 comments

Comments

@ar3717

ar3717 commented May 23, 2020

Hi, I am building a semantic search application and I want to deploy (put into production) my fine-tuned, domain-adapted SBERT model. Any ideas or recommendations for doing that?

@ar3717 changed the title from "Deploy SBERT" to "How to Deploy an SBERT model?" on May 23, 2020
@nreimers
Member

Do you need any specific help?

@ar3717
Author

ar3717 commented May 23, 2020

Yes. Let's say I want a server to which I can send an HTTP request containing a sentence (query) and get back its embedding from your pre-trained SBERT-NLI model. Do you have any suggestions on how I can do that?

@nreimers
Member

You would need two components: The sentence embeddings service and the index server.

For the sentence embeddings service, I would use FastAPI. It should take only a few lines of Python code to run sentence-transformers there and return the embedding for a submitted sentence.

Then for the corpus you would need to index your sentence embeddings so that you can search them. If you have a small corpus, you could use ElasticSearch.
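For the Elasticsearch route, one option available in Elasticsearch 7.x is a `script_score` query over a `dense_vector` field. The sketch below only builds the request body; the field name `embedding` and the helper name `build_cosine_query` are hypothetical, and the script syntax shown assumes Elasticsearch 7.6 or later.

```python
import json
from typing import Dict, List


def build_cosine_query(query_vector: List[float],
                       field: str = "embedding",
                       size: int = 10) -> Dict:
    """Build a script_score request body that ranks all docs by cosine similarity."""
    return {
        "size": size,
        "query": {
            "script_score": {
                # match_all means every document in the index gets scored.
                "query": {"match_all": {}},
                "script": {
                    # +1.0 keeps the score non-negative, as Elasticsearch requires.
                    "source": f"cosineSimilarity(params.query_vector, '{field}') + 1.0",
                    "params": {"query_vector": query_vector},
                },
            }
        },
    }


body = build_cosine_query([0.1, 0.2, 0.3], size=5)
print(json.dumps(body, indent=2))
```

The resulting dictionary could then be passed as the search body to an Elasticsearch client. Because the inner query is `match_all`, every stored vector is scored on each request.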

For larger corpora, I can recommend faiss.

@ar3717
Author

ar3717 commented May 23, 2020

Thanks a lot. Would you recommend Flask as an alternative for the sentence embedding service?

@nreimers
Member

Flask also works

@FantasyCheese

Hi @nreimers, could you kindly elaborate on the Faiss vs ElasticSearch question? How large a corpus are we talking about before Faiss is recommended, and what problems might ElasticSearch have? Our backend team has a strong AWS background and it would be hard to convince them not to use the AWS ElasticSearch Service.

@nreimers
Member

nreimers commented May 27, 2020

Hi @FantasyCheese
The issue is that ES performs a full (exhaustive) nearest neighbor search. If you have 1 million documents (vectors) in your index, the query vector is compared against all 1 million of them. Hence, the runtime is linear in the number of docs in your index.
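To make the linear cost concrete, here is a small NumPy sketch of exhaustive search. The random vectors stand in for real sentence embeddings, and the function name `exact_top_k` is made up for this example.

```python
import numpy as np


def exact_top_k(query: np.ndarray, corpus: np.ndarray, k: int = 5):
    """Return indices and cosine scores of the k corpus rows closest to query."""
    # Normalise so that a dot product equals cosine similarity.
    corpus_norm = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    query_norm = query / np.linalg.norm(query)
    # One dot product per corpus vector: the work grows linearly with corpus size.
    scores = corpus_norm @ query_norm
    top = np.argsort(-scores)[:k]
    return top, scores[top]


rng = np.random.default_rng(0)
corpus = rng.normal(size=(10_000, 768)).astype(np.float32)  # stand-in embeddings
# A query that is a slightly perturbed copy of corpus row 42.
query = corpus[42] + 0.01 * rng.normal(size=768).astype(np.float32)

ids, scores = exact_top_k(query, corpus, k=3)
print(ids, scores)
```

Doubling the number of rows in `corpus` roughly doubles the work per query, which is the behaviour described above.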

In our experiments, the latency was OK up to around 100k vectors. But this of course depends on your setup and how time-critical your task is.

Faiss, on the other hand, uses approximate nearest neighbor (ANN) search and is able to index the embeddings. There, you can retrieve the results within milliseconds, regardless of how many vectors you have indexed. Even when you have millions or billions of docs (vectors), you can find the nearest neighbors efficiently.

ES has been working on ANN for about a year, but I think it has not been finished yet:
elastic/elasticsearch#42326

@FantasyCheese

@nreimers Wow that was fast and clear! Thanks a lot for your clarification!

@pistocop

pistocop commented May 28, 2020

Hi @ar3717 ,
I'm actually working on a very similar task and I have a question for you:

When you say

my fine-tuned domain-adapted SBERT model

what do you mean?

Did you fine-tune some BERT-like model on a domain-specific dataset in an unsupervised manner, and then use this domain-specific BERT model to train an SBERT version on the same datasets used by @nreimers?

Or have you taken another approach?

Thanks in advance for any information you can share.

@ar3717
Author

ar3717 commented May 28, 2020

@GuardatiSimone Hi, yes, that is basically what I have done so far: fine-tune BERT on my domain, then feed that fine-tuned BERT into SBERT and use one of the training datasets (e.g. NLI or STS) to retrain SBERT. But I am planning to use my own corpus to retrain my SBERT based on the Wikipedia task in @nreimers's paper. What is your approach?

@pistocop

Hi @ar3717 and thanks for the reply.

At the moment, I'm using pretrained SBERT to calculate the sentence embeddings, then feeding them into a Faiss index and building a backend (FastAPI) to get the top-K neighbors from it.

Although pretrained SBERT is working well, I need to build an embedding system that can adapt to a domain-specific context in an unsupervised manner.

Does the approach you are using [1] produce better embeddings? And is the overall effort [1] high in computational terms?

[1] BERT fine-tuning on a specific corpus + SBERT training on NLI/STS

@ar3717
Author

ar3717 commented May 29, 2020

It really depends on your data size. So far, I have tried fine-tuning BERT on a small domain-specific corpus and I have seen some improvements, but I think that if I increase the size of my domain-specific corpus, I will get much better results. The corpus that I used for fine-tuning is around 6 MB, and fine-tuning took 15 minutes on an AWS ml.p3.2xlarge GPU instance. SBERT training on NLI data with the fine-tuned model took about 1.5 hours on the same instance. Let me know if that helps.

@pistocop

Hi, many thanks for sharing; it is very useful to me.

I was trying to gather as much information as possible before starting my work, mainly to get a rough idea of whether the goal (better embeddings) can be reached and of the likely price range of the machines needed for training.

So many thanks @ar3717 for the info and nreimers for the amazing repository.

@cabhijith

@ar3717 How did you fine-tune BERT on your domain-specific dataset and then feed it to SBERT? I guess you used this script?

@ar3717
Author

ar3717 commented Jun 3, 2020

@cabhijith For SBERT, I used that script. For fine-tuning BERT, I used this one: https://github.com/huggingface/transformers/blob/v2.9.1/examples/language-modeling/run_language_modeling.py. Are you doing the same thing? If your approach is different from mine, could you please let me know what it is?

@threefoldo

@jobergum

jobergum commented Sep 9, 2020

https://vespa.ai/ (https://github.com/vespa-engine/vespa) supports fast ANN tensor search/embedding retrieval using HNSW, and one can combine regular sparse retrieval with embedding-based retrieval in the same query. Our cord19.vespa.ai app uses sentence-BERT embeddings for "Related articles". Example: https://cord19.vespa.ai/article/58938.

Resources:


7 participants