infinity embedding server example (#378)
Add truss example to deploy https://github.com/michaelfeil/infinity

---------

Co-authored-by: Tianshu Cheng <[email protected]>
tianshuc0731 and Tianshu Cheng authored Nov 16, 2024
1 parent 45d0065 commit 239a568
Showing 7 changed files with 98 additions and 2 deletions.
File renamed without changes.
80 changes: 80 additions & 0 deletions custom-server/infinity-embedding-server/README.md
@@ -0,0 +1,80 @@
# Infinity Embedding Server Truss

This is a [Truss](https://truss.baseten.co/) to deploy the [infinity embedding server](https://github.com/michaelfeil/infinity), a high-throughput, low-latency REST API server for serving vector embeddings.

## Deployment

Before deployment:

1. Make sure you have a [Baseten account](https://app.baseten.co/signup) and [API key](https://app.baseten.co/settings/account/api_keys).
2. Install the latest version of Truss: `pip install --upgrade truss`
3. [Required for gated models] Retrieve your Hugging Face token from your [settings](https://huggingface.co/settings/tokens) and set it as a Baseten secret [here](https://app.baseten.co/settings/secrets) with the key `hf_access_key`.

First, clone this repository:

```sh
git clone https://github.com/basetenlabs/truss-examples.git
cd truss-examples/custom-server/infinity-embedding-server
```

With `infinity-embedding-server` as your working directory, you can deploy the model with the following command. Paste your Baseten API key if prompted.

```sh
truss push --publish --trusted
```

## Call your model

### curl

```bash
curl -X POST https://model-xxx.api.baseten.co/development/predict \
-H "Authorization: Api-Key YOUR_API_KEY" \
-d '{"input": "text string"}'
```

### requests Python library

```python
import requests

resp = requests.post(
    "https://model-xxx.api.baseten.co/development/predict",
    headers={"Authorization": "Api-Key YOUR_API_KEY"},
    json={"input": "text string"},
)

print(resp.json())
```
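If you need the raw vectors, you can pull them out of the response body. A minimal sketch, assuming the server returns infinity's OpenAI-compatible embeddings schema (a `data` list of objects with an `embedding` field):

```python
# Assumes the OpenAI-compatible response shape: {"data": [{"embedding": [...]}, ...]}
embeddings = [item["embedding"] for item in resp.json()["data"]]
print(len(embeddings), len(embeddings[0]))  # number of inputs, embedding dimension
```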

### OpenAI Python SDK

```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["YOUR_API_KEY"],
    base_url="https://bridge.baseten.co/v1/direct"
)

model_id = "xxx"
deployment_id = "xxx"

response = client.embeddings.create(
    input="text string",
    model="BAAI/bge-small-en-v1.5",
    extra_body={
        "baseten": {
            "model_id": model_id,
            "deployment_id": deployment_id
        }
    }
)

print(response.data[0].embedding)
```

## Support

If you have any questions or need assistance, please open an issue in this repository or contact our support team.
16 changes: 16 additions & 0 deletions custom-server/infinity-embedding-server/config.yaml
@@ -0,0 +1,16 @@
base_image:
  image: python:3.11-slim
docker_server:
  start_command: sh -c "infinity_emb v2 --model-id BAAI/bge-small-en-v1.5"
  readiness_endpoint: /health
  liveness_endpoint: /health
  predict_endpoint: /embeddings
  server_port: 7997
resources:
  accelerator: L4
  use_gpu: true
model_name: infinity-embedding-server
requirements:
  - infinity-emb[all]
environment_variables:
  hf_access_token: null
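
For context, the `docker_server` block runs infinity as a plain HTTP server inside the container: Baseten probes `/health` for readiness and liveness and routes predict calls to infinity's `/embeddings` route on port 7997. A minimal local smoke-test sketch, assuming `infinity-emb[all]` is installed on your machine and the `start_command` above is already running (the `model` field follows the OpenAI embeddings schema and is included here as an assumption):

```python
import requests

base = "http://localhost:7997"  # server_port from the config above

# Readiness/liveness probe target from the config.
print(requests.get(f"{base}/health").status_code)

# The predict_endpoint maps to infinity's OpenAI-compatible /embeddings route.
resp = requests.post(
    f"{base}/embeddings",
    json={"input": ["text string"], "model": "BAAI/bge-small-en-v1.5"},
)
print(len(resp.json()["data"][0]["embedding"]))  # embedding dimension
```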
@@ -9,7 +9,7 @@ First, clone this repository:

```sh
git clone https://github.com/basetenlabs/truss-examples/
-cd trussless/pixtral-12b
+cd custom-server/pixtral-12b
```

Before deployment:
File renamed without changes.
@@ -19,7 +19,7 @@ First, clone this repository:

```sh
git clone https://github.com/basetenlabs/truss-examples.git
-cd trussless/ultravox-0.4
+cd custom-server/ultravox-0.4
```

Before deployment:
File renamed without changes.
