README.md.gotmpl

# magda-embedding-api

![CI Workflow](https://github.com/magda-io/magda-embedding-api/actions/workflows/main.yml/badge.svg?branch=main) [![Release](https://img.shields.io/github/release/magda-io/magda-embedding-api.svg)](https://github.com/magda-io/magda-embedding-api/releases)

An [OpenAI's `embeddings` API](https://platform.openai.com/docs/api-reference/embeddings/create) compatible microservice for Magda.

> See [this test case](./test/integration.test.ts) for an example of how to use this API with [@langchain/openai](https://www.npmjs.com/package/@langchain/openai).

Text embeddings evaluate how closely related text strings are. They are commonly utilized for:

- Search (ranking results based on their relevance to a query)
- Clustering (grouping text strings by similarity)
- Recommendations (suggesting items with similar text strings)
- Anomaly detection (identifying outliers with minimal relatedness)
- Diversity measurement (analyzing similarity distributions)
- Classification (categorizing text strings by their most similar label)

An embedding is a vector, or a list, of floating-point numbers. The distance between two vectors indicates their relatedness, with smaller distances suggesting higher relatedness and larger distances indicating lower relatedness.

This embedding API is created for [Magda](https://github.com/magda-io/magda)'s vector / hybrid search solution. The API interface is compatible with OpenAI's `embeddings` API to make it easier to reuse existing tools & libraries.

> the diagram below shows Magda's use case

![architecture](docs/architecture.png)

### Model Selection & Resources requirements

> Only the default mode, will be included in the docker image to speed up the starting up.
> If you want to use a different model (via `appConfig.modelList`), besides the resources requirements consideration here, you might also want to increase `pluginTimeout` and adjust `startupProbe` to allow the longer starting up time introduced by the model downloading.

Due to [this issue of ONNX runtime](https://github.com/microsoft/onnxruntime/issues/15080), the peak memory usage of the service is much higher than the model file size (2 times higher).
e.g. For the default 500MB model file, the peak memory usage could up to 1.8GB - 2GB.
However, the memory usage will drop back to much lower (for default model, it's around 800MB-900MB) after the model is loaded.
Please make sure your Kubernetes cluster has enough resources to run the service.

When specify `appConfig.modelList`, you can set the value of `dtype` field to select precision (quantizations) of the model. The possible value are: full-precision ("fp32"), half-precision ("fp16"), 8-bit ("q8", "int8", "uint8"), and 4-bit ("q4", "bnb4", "q4f16"). Please refer to helm chart document below for more information. You can also find an example from this config file [here](./deploy/test-deploy.yaml).

> Memory consumption test result for a few selected models can be found: https://github.com/magda-io/magda-embedding-api/issues/2

{{ template "chart.maintainersSection" . }}

{{ template "chart.requirementsSection" . }}

{{ template "chart.valuesHeader" . }}

{{ template "chart.valuesTable" . }}

### Build & Run for Local Development

> Please note: for production deployment, please use the released [Docker images](https://github.com/magda-io/magda-embedding-api/pkgs/container/magda-embedding-api) & [helm charts](https://github.com/magda-io/magda-embedding-api/pkgs/container/charts%2Fmagda-embedding-api).

#### Prerequisites

- Node.js 18.x
- [Minikube](https://minikube.sigs.k8s.io/) (for local Kubernetes development)
  - [Minikube addons Registry & Registry Aliases are requried](https://minikube.sigs.k8s.io/docs/handbook/addons/registry-aliases/)

#### Install dependencies

```bash
yarn install
```

#### Run the service locally

```bash
yarn start
```

#### Build Docker Image, Push into local Registry & Deploy to minikube Cluster


```bash
yarn build
yarn docker-build-local
```

Deploy to minikube Cluster

```bash
helm -n test upgrade --install test ./deploy/magda-embedding-api -f ./deploy/test-deploy.yaml
```