I have a very simple application that exposes Sentence Transformers using the model "distiluse-base-multilingual-cased-v1". The only thing it does is expose an endpoint that encodes a list of sentences:
from fastapi import FastAPI
from typing import List
import os
from sentence_transformers import SentenceTransformer
import json

app = FastAPI()

modelUrl = "https://modelhost/models/distiluse-base-multilingual-cased-v1.zip"
model = SentenceTransformer(modelUrl)

@app.get('/healthz')
def healthCheck():
    return "OK"

@app.post('/')
def encodeSentences(sentences: List[str]):
    embeddings = model.encode(sentences).tolist()
    return embeddings
It works perfectly until I increase the request load, at which point I start getting the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/uvicorn/protocols/http/httptools_impl.py", line 398, in run_asgi
    result = await app(self.scope, self.receive, self.send)
  File "/usr/local/lib/python3.6/site-packages/uvicorn/middleware/proxy_headers.py", line 45, in __call__
    return await self.app(scope, receive, send)
  File "/usr/local/lib/python3.6/site-packages/fastapi/applications.py", line 201, in __call__
    await super().__call__(scope, receive, send)  # pragma: no cover
  File "/usr/local/lib/python3.6/site-packages/starlette/applications.py", line 111, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.6/site-packages/starlette/middleware/errors.py", line 181, in __call__
    raise exc from None
  File "/usr/local/lib/python3.6/site-packages/starlette/middleware/errors.py", line 159, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.6/site-packages/starlette/exceptions.py", line 82, in __call__
    raise exc from None
  File "/usr/local/lib/python3.6/site-packages/starlette/exceptions.py", line 71, in __call__
    await self.app(scope, receive, sender)
  File "/usr/local/lib/python3.6/site-packages/starlette/routing.py", line 566, in __call__
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.6/site-packages/starlette/routing.py", line 227, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.6/site-packages/starlette/routing.py", line 41, in app
    response = await func(request)
  File "/usr/local/lib/python3.6/site-packages/fastapi/routing.py", line 202, in app
    dependant=dependant, values=values, is_coroutine=is_coroutine
  File "/usr/local/lib/python3.6/site-packages/fastapi/routing.py", line 150, in run_endpoint_function
    return await run_in_threadpool(dependant.call, **values)
  File "/usr/local/lib/python3.6/site-packages/starlette/concurrency.py", line 34, in run_in_threadpool
    return await loop.run_in_executor(None, func, *args)
  File "/usr/local/lib/python3.6/concurrent/futures/thread.py", line 56, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/app/app.py", line 19, in encodeSentences
    embeddings = model.encode(sentences).tolist()
  File "/usr/local/lib/python3.6/site-packages/sentence_transformers/SentenceTransformer.py", line 190, in encode
    features = self.tokenize(sentences_batch)
  File "/usr/local/lib/python3.6/site-packages/sentence_transformers/SentenceTransformer.py", line 342, in tokenize
    return self._first_module().tokenize(text)
  File "/usr/local/lib/python3.6/site-packages/sentence_transformers/models/Transformer.py", line 87, in tokenize
    output.update(self.tokenizer(*to_tokenize, padding=True, truncation='longest_first', return_tensors="pt", max_length=self.max_seq_length))
  File "/usr/local/lib/python3.6/site-packages/transformers/tokenization_utils_base.py", line 2271, in __call__
    **kwargs,
  File "/usr/local/lib/python3.6/site-packages/transformers/tokenization_utils_base.py", line 2456, in batch_encode_plus
    **kwargs,
  File "/usr/local/lib/python3.6/site-packages/transformers/tokenization_utils_fast.py", line 382, in _batch_encode_plus
    pad_to_multiple_of=pad_to_multiple_of,
  File "/usr/local/lib/python3.6/site-packages/transformers/tokenization_utils_fast.py", line 335, in set_truncation_and_padding
    self._tokenizer.enable_truncation(max_length, stride=stride, strategy=truncation_strategy.value)
RuntimeError: Already borrowed
I see the same error already discussed in huggingface/tokenizers#537, but I was wondering whether there is already a solution specific to the sentence-transformers module.
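Not a sentence-transformers-specific fix, but since FastAPI runs plain def endpoints in a threadpool, several worker threads can end up inside the shared fast (Rust) tokenizer at once, which can trigger "Already borrowed". One generic workaround is to serialize calls to model.encode with a threading.Lock; this is only a minimal sketch (the encode_lock name is just illustrative), and it trades throughput for safety because requests are then encoded one at a time:

from fastapi import FastAPI
from typing import List
import threading
from sentence_transformers import SentenceTransformer

app = FastAPI()

modelUrl = "https://modelhost/models/distiluse-base-multilingual-cased-v1.zip"
model = SentenceTransformer(modelUrl)

# Guards the shared tokenizer inside the model (illustrative name, not part of any API).
encode_lock = threading.Lock()

@app.post('/')
def encodeSentences(sentences: List[str]):
    # Only one threadpool worker may call encode, and therefore mutate the
    # tokenizer's padding/truncation state, at a time.
    with encode_lock:
        embeddings = model.encode(sentences).tolist()
    return embeddings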