text-embeddings-inference updated example trussless #386
base: main
Conversation
A few nits -- I think the biggest one is we should make the README a little more embeddings-specific
@@ -0,0 +1,28 @@
#!/bin/bash
do we want to keep this?
Not sure, that's why I left it in. Do we want to keep this?
text-embeddings-inference/custom_server/.maintain/roll_out_docker.sh
Okay, worked on it! @vshulman
One stray file (I think) and two copy changes requested. Rest LGTM.
internal/config.yaml
stray file?
- which model is deployed
- how many concurrent requests users are sending

The deployment example is for Bert-large and an NVIDIA L4. Bert-large has a maximum sequence length of 512 tokens per sentence.
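To make the 512-token limit concrete, here is a minimal sketch of an embedding request against a running text-embeddings-inference server; the port, the sample payload, and the use of the `truncate` request field are illustrative assumptions, not taken from this PR:

```bash
# Hedged sketch: assumes a text-embeddings-inference server listening on
# localhost:8080. Inputs longer than the model's 512-token limit are
# rejected unless truncation is requested in the payload.
curl http://localhost:8080/embed \
  -H "Content-Type: application/json" \
  -d '{"inputs": ["An example sentence to embed."], "truncate": true}'
```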
Bert=>BERT
text-embeddings-inference/README.md
--max-client-batch-size 256
```
This determines the number of sentences/items in a single request.
For optimal autoscaling, which Baseten's infrastructure regulates by metrics such as requests per second, you want to set this as low as possible. The OpenAI API historically set this to 32 (`--max-client-batch-size 32`), which can enable more aggressive autoscaling and thus better latency. On the other hand, frameworks such as LlamaIndex, LangChain, or Haystack may prefer or even require higher batch sizes, especially if the user code is old-fashioned and sends requests one by one in a for loop. The right value depends on your users and how you plan to use your deployment.
this is a little hard to follow
Updated it, and just set it to 32.
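For reference, a minimal sketch of what the resulting launch command could look like; the image tag and model id are illustrative assumptions, not taken from this PR:

```bash
# Hedged sketch: runs text-embeddings-inference with the batch-size flag
# discussed above. The model id and image tag are placeholders.
docker run --gpus all -p 8080:80 \
  ghcr.io/huggingface/text-embeddings-inference:latest \
  --model-id BAAI/bge-large-en-v1.5 \
  --max-client-batch-size 32
```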
Adding a trussless example for text-embeddings-inference.