The embedding service transforms text into a vector representation. This is useful, for example, for semantic textual similarity or semantic search.
An embedding captures some of the semantics of the input by placing semantically similar inputs close together in the embedding space.
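Closeness in the embedding space is commonly measured with cosine similarity. The following is a minimal sketch in plain Python, with no dependencies, that compares two embedding vectors such as those returned by the service. The function name `cosine_similarity` and the toy vectors are illustrative only; real embeddings have many more dimensions.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings: values close to 1 mean
# "semantically similar", values close to 0 mean "unrelated".
print(cosine_similarity([1.0, 0.0, 1.0], [1.0, 0.0, 1.0]))
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
```

For semantic search, the same function can rank stored document embeddings by their similarity to the embedding of a query.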
The service can use sentence transformer models available on Hugging Face. The default model is `T-Systems-onsite/cross-en-de-roberta-sentence-transformer`, which supports English and German.
The Embedding service is available as a Docker image.
```bash
docker pull ghcr.io/data-house/embedding-service:main
```
The model is downloaded at startup, so depending on the model size the first start can take several minutes. We suggest mounting a persistent volume to cache the downloaded models (folder: `/root/.cache/torch/sentence_transformers`).
A sample `docker-compose.yaml` file is available within the repository.
Please refer to Releases and Packages for the available tags.
Available environment variables
variable | default | description |
---|---|---|
`MODEL_NAME` | `T-Systems-onsite/cross-en-de-roberta-sentence-transformer` | The name of a sentence transformer model published on Hugging Face |
`STRICT_MODE` | `false` | Whether to return an error if the text to embed is longer than the maximum context length of the model |
`WORKERS` | `2` | The number of Gunicorn sync workers |
`WORKERS_TIMEOUT` | `600` | The timeout, in seconds, of each worker |
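The variables are passed to the container as ordinary environment variables. As a sketch, assuming the Docker CLI is installed, the following Python helper assembles a `docker run` command that sets them, publishes the service port 5000 and mounts a named volume on the model cache folder. The helper `docker_run_command` and the volume name `embedding-models` are hypothetical; the image reference is the `main` tag used in the pull command.

```python
import shlex
from typing import Dict, List, Optional

IMAGE = "ghcr.io/data-house/embedding-service:main"
# Model cache folder inside the container, as documented.
CACHE_DIR = "/root/.cache/torch/sentence_transformers"


def docker_run_command(env: Optional[Dict[str, str]] = None,
                       volume: str = "embedding-models",  # hypothetical volume name
                       host_port: int = 5000) -> List[str]:
    """Build a `docker run` argument list for the Embedding service."""
    cmd = ["docker", "run", "--rm",
           "-p", f"{host_port}:5000",          # service listens on port 5000
           "-v", f"{volume}:{CACHE_DIR}"]      # persist downloaded models
    # Only the variables passed here are set; the others keep their defaults.
    for name, value in (env or {}).items():
        cmd += ["-e", f"{name}={value}"]
    cmd.append(IMAGE)
    return cmd


args = docker_run_command({"WORKERS": "4", "WORKERS_TIMEOUT": "900"})
print(shlex.join(args))
# To actually start the container: subprocess.run(args, check=True)
```

Building an argument list rather than a single shell string avoids quoting problems when values such as `MODEL_NAME` contain special characters.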
The Embedding service exposes a web application on port `5000`. The API receives the text and returns its vector representation as a JSON response.
The exposed service is unauthenticated, so consider exposing it only within a trusted network. If you plan to make it publicly available, consider placing a reverse proxy with authentication in front of it.
POST /embed
The `/embed` endpoint accepts a `POST` request with the following input as a JSON body:
- `corpus`: the text to transform
It returns a JSON object with the following fields:
- `embedding`: the array representing the embedding of the given text
Warning: the processing is performed synchronously, so each request waits until its embedding has been computed.
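A minimal client sketch using only the Python standard library, assuming the service is reachable at `http://localhost:5000` (for example, a local container with the port published). The helper names `build_embed_request` and `embed` are hypothetical; only the `/embed` path, the `corpus` input field and the `embedding` output field come from this document. Because processing is synchronous, a generous timeout is advisable for long texts.

```python
import json
import urllib.request
from typing import List

BASE_URL = "http://localhost:5000"  # assumption: local deployment


def build_embed_request(text: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Prepare a POST /embed request carrying the text in the `corpus` field."""
    body = json.dumps({"corpus": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/embed",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(text: str, base_url: str = BASE_URL, timeout: float = 60.0) -> List[float]:
    """Send the text to the service and return its embedding vector."""
    request = build_embed_request(text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["embedding"]


# Usage (requires a running service):
# vector = embed("The quick brown fox jumps over the lazy dog")
```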
GET /embed
To obtain the maximum embedding size, the `/embed` endpoint accepts a `GET` request with no parameters.
It returns a JSON object with the following fields:
- `embedding_size`: the maximum length of the embedding
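A short sketch, again assuming the service runs at `http://localhost:5000`, that reads `embedding_size` with a plain `GET` request; this is handy, for instance, when creating a vector column of the right dimension before indexing. The function names are hypothetical, and parsing is kept separate from the network call so it can be checked on its own.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # assumption: local deployment


def parse_embedding_size(payload: bytes) -> int:
    """Extract `embedding_size` from the JSON body returned by GET /embed."""
    return int(json.loads(payload)["embedding_size"])


def embedding_size(base_url: str = BASE_URL, timeout: float = 10.0) -> int:
    """Ask the service for the embedding size of the loaded model."""
    # urlopen issues a GET when no request body is supplied.
    with urllib.request.urlopen(f"{base_url}/embed", timeout=timeout) as response:
        return parse_embedding_size(response.read())


# Usage (requires a running service):
# dim = embedding_size()
```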
The service can return the following errors:
code | message | description |
---|---|---|
422 | Missing corpus in json body | No text was passed to the API |
422 | Corpus too long | The text to embed is too long for the model and `STRICT_MODE` is enabled |
The response body may contain a JSON object with the following field:
- `error`: the error description
```json
{
  "error": "Missing corpus in json body"
}
```
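A sketch of client-side error handling, assuming a helper that receives the HTTP status code and the raw body of an `/embed` response. Any error status is turned into an exception carrying the `error` message; since the body only may contain JSON, the helper falls back to the raw text. The names `EmbeddingServiceError` and `check_response` are hypothetical.

```python
import json
from typing import Any


class EmbeddingServiceError(Exception):
    """Raised when the Embedding service rejects a request."""

    def __init__(self, status: int, message: str) -> None:
        super().__init__(f"{status}: {message}")
        self.status = status
        self.message = message


def check_response(status: int, body: bytes) -> Any:
    """Return the decoded JSON body, or raise EmbeddingServiceError on errors."""
    text = body.decode("utf-8", errors="replace")
    try:
        payload = json.loads(text) if text else {}
    except json.JSONDecodeError:
        payload = {}  # the body is not JSON; fall back to the raw text below
    if status >= 400:
        if isinstance(payload, dict) and "error" in payload:
            message = payload["error"]
        else:
            message = text
        raise EmbeddingServiceError(status, message)
    return payload
```

With `urllib`, a 4xx response raises `urllib.error.HTTPError`; its `code` attribute and `read()` result can be passed to this helper.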
The Embedding service is built using Flask on Python 3.9.
Given the selected stack, development requires:
- Python 3.9 with pip
- Docker (optional) to test the build
Install all the required dependencies:
```bash
pip install -r requirements.txt
```
Run the local development application using:
```bash
python -m flask --app embedding_service run
```
to be documented
Thank you for considering contributing to the Embedding service! The contribution guide can be found in the CONTRIBUTING.md file.
The project is supported by OneOff-Tech (UG) and Oaks S.r.l.
If you discover a security vulnerability within the Embedding service, please send an e-mail to the OneOff-Tech team via [email protected]. All security vulnerabilities will be promptly addressed.