Experiencing the https://github.com/huggingface/tokenizers/issues/537 issue when sentence-transformers is used for generating embeddings #794
Comments
Sentence-Transformers uses the Hugging Face tokenizers code. Check the linked issue for solutions on how to solve it:
In the model you are using, there is a folder 0_Transformer, which contains a tokenizer_config.json. Add a new entry to that JSON: `"use_fast": false`. This should load the slower Python tokenizer.
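The edited tokenizer_config.json would then look roughly like this (the `do_lower_case` key is only illustrative; keep whatever entries your model's file already contains):

```json
{
  "do_lower_case": true,
  "use_fast": false
}
```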
Thank you @nreimers, I will try using "use_fast": false in the tokenizer_config.json. Handling this on the multi-threaded consumer side could certainly be looked into; however, I do not have control there. In my service, a single model is used for generating embeddings, so I load it once and serve embeddings from that same model. Some of the clients are multi-threaded, and that is where the issue occurs.
I am in a similar situation to @MyBruso, and I am encountering the same problem. @nreimers' suggestion of setting `"use_fast": false` in tokenizer_config.json did not work for me.
@nreimers Excuse me, I have tried use_fast = false as you suggested, but the error still happens.
I have used no library other than tensorflow and keras, so I have no idea what's wrong!
I spent some time inspecting the code, and I figured out a solution similar to the one @nreimers suggested that actually works for me: add `"tokenizer_args": {"use_fast": false}` to `sentence_bert_config.json`.
@aphedges Great, thanks for posting the solution here.
You're welcome! I probably wouldn't have figured it out without your initial comment, so thank you.
Thank you for sharing your findings, @aphedges. By the way, I am using the pretrained distilbert-base-nli-stsb-mean-tokens model from sentence-transformers. I will try adding `"tokenizer_args": {"use_fast": false}` to sentence_bert_config.json.
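For reference, sentence_bert_config.json with that addition would look roughly like this (the `max_seq_length` value shown is illustrative; keep the model's existing settings):

```json
{
  "max_seq_length": 128,
  "tokenizer_args": {"use_fast": false}
}
```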
Thank you @nreimers and @aphedges. I am not experiencing this 'Already borrowed' error now. Since we have touched on the topic of using a SentenceTransformer object from a multi-threaded program, I am checking here: is there any way to improve the response time of the encode() (get embeddings) API when it is used from a multi-threaded server?
Hi @MyBruso, instead of threads you should use processes. Each process should have the model loaded, and for each process you should limit the number of threads it can use.
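A minimal sketch of that setup, assuming the numerical backend respects the standard OMP/MKL thread-count environment variables; `encode_batch` here is a hypothetical stand-in for loading a SentenceTransformer once per process and calling `model.encode()`:

```python
import os
from concurrent.futures import ProcessPoolExecutor


def init_worker(num_threads: int = 1) -> None:
    """Run once in each worker process: cap the threads that process may use.

    Assumption: the numerical backend (e.g. OpenMP/MKL, as used by PyTorch)
    reads these environment variables when it initializes afterwards.
    """
    os.environ["OMP_NUM_THREADS"] = str(num_threads)
    os.environ["MKL_NUM_THREADS"] = str(num_threads)


def encode_batch(sentences):
    """Stand-in for per-process work.

    In a real service, each process would hold its own SentenceTransformer
    (loaded once, e.g. inside init_worker) and call model.encode(sentences)
    here. This stub just counts words so the sketch stays self-contained.
    """
    return [len(s.split()) for s in sentences]


def serve(batches, workers: int = 2):
    # One model per process, a bounded number of threads per process.
    with ProcessPoolExecutor(max_workers=workers, initializer=init_worker) as pool:
        return list(pool.map(encode_batch, batches))
```

Calling `serve([["hello world"], ["a b c"]])` from a `__main__` guard would fan the batches out across the worker processes, each of which keeps its own model and never shares a tokenizer between threads.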
Any news?
@jkhalsa-arabesque I am assuming you are curious to know whether I could use the model in a multi-process environment. The answer is that there is a limitation on using a multi-process server at my end.
I was actually curious if the underlying issue has been resolved. |
If you are referring to the 'Already borrowed' exception: I am not observing it anymore, and the solution is as per this comment above. As for whether it has been fixed in the transformers library, I do not have any recent updates on it.
If the pretrained model's name is known, it can be done programmatically:

```python
from sentence_transformers import SentenceTransformer
from transformers import AutoTokenizer

model_name = "paraphrase-distilroberta-base-v2"
model = SentenceTransformer(model_name)
model.tokenizer = AutoTokenizer.from_pretrained(
    f"sentence-transformers/{model_name}", use_fast=False
)
```
Recently, there was a fix in transformers that reduces how often this error occurs. The only way to truly prevent it is to not use the Rust-based tokenizer, but the workarounds that have been posted in this thread are annoying to use. I have just created a PR (#1103) to make disabling the fast tokenizer easier. If my PR is merged, disabling it will become much simpler.
I am just curious which one is faster: disabling the fast tokenizer, or the fix in transformers that reduces the borrowing problem?
@CaptXiong, I'm not sure what you are asking. Is it whether the slow tokenizer (with the fast tokenizer disabled) is faster than the fast tokenizer (with the fix that reduces the borrowing problem)? I would hope the fast tokenizer would still be much faster, but I have not benchmarked it myself.
@aphedges Thank you! I referenced your code to rewrite the SentenceTransformer class, and the bug didn't come up again.
Is there still interest in this? If so, I can post a PR fixing it by storing the tokenizer in thread-local storage.
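The thread-local idea can be sketched like this; it is a generic cache, not the actual PR, and the factory passed in (e.g. a lambda wrapping `AutoTokenizer.from_pretrained(...)`) is an assumption for illustration:

```python
import threading


class ThreadLocalTokenizer:
    """Give each thread its own tokenizer instance.

    The Rust-backed fast tokenizer raises "Already borrowed" when two
    threads use one instance concurrently; keeping one instance per
    thread avoids sharing entirely, at the cost of extra memory.
    """

    def __init__(self, factory):
        # factory: zero-argument callable that builds a fresh tokenizer,
        # e.g. lambda: AutoTokenizer.from_pretrained(model_name)
        self._factory = factory
        self._local = threading.local()

    def get(self):
        # Lazily build one tokenizer per calling thread.
        if not hasattr(self._local, "tokenizer"):
            self._local.tokenizer = self._factory()
        return self._local.tokenizer
```

Repeated `get()` calls on the same thread return the same object, while a different thread triggers a fresh `factory()` call, so no tokenizer is ever borrowed from two threads at once.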
Hello,
I am using sentence-transformers to get text embeddings via SentenceTransformer.encode().
This function is invoked from a multi-threaded program.
Sometimes this encode method fails with an 'Already borrowed' exception.
The issue is the same as, or similar to, huggingface/tokenizers#537.
My sentence-transformers version is 0.4.1.2.
Which version of sentence-transformers will not have this issue? Or what is the alternative for resolving it?
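The failing pattern described in this issue can be sketched as follows. `StubModel` is a self-contained stand-in for `sentence_transformers.SentenceTransformer` (not imported here); with a real model and its default fast (Rust) tokenizer, concurrent calls like these are what intermittently raise the 'Already borrowed' RuntimeError:

```python
from concurrent.futures import ThreadPoolExecutor


class StubModel:
    """Stand-in for sentence_transformers.SentenceTransformer."""

    def encode(self, sentences):
        # The real encode() tokenizes with a single shared fast tokenizer;
        # concurrent borrows of that tokenizer are the source of the error.
        return [[float(len(s))] for s in sentences]


model = StubModel()  # loaded once at startup, shared by all request threads


def handle_request(batch):
    # Each client thread calls encode() on the same shared model.
    return model.encode(batch)


with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(handle_request, [["a"], ["bb"], ["ccc"]]))
# results == [[[1.0]], [[2.0]], [[3.0]]]
```

With the stub this always succeeds; the workarounds in this thread (slow tokenizer, per-thread tokenizers, or processes instead of threads) all aim to make the real version of this pattern safe.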