
text-embeddings-inference updated example trussless #386

Open · michaelfeil wants to merge 9 commits into base: main

Conversation

michaelfeil (Contributor):

Adding a trusses example for text-embeddings-inference.

@michaelfeil marked this pull request as ready for review on December 6, 2024 at 18:07.
@michaelfeil changed the title from "text-embeddings-inference updated example trusses" to "text-embeddings-inference updated example trussless" on Dec 7, 2024.
@vshulman (Contributor) left a comment:

A few nits -- I think the biggest one is that we should make the README a little more embeddings-specific.

text-embeddings-inference/custom_server/README.md (outdated, resolved)
@@ -0,0 +1,28 @@
#!/bin/bash

Contributor: Do we want to keep this?

michaelfeil (Contributor, Author): Not sure, that's why I left it in. Do we want to keep this?

text-embeddings-inference/custom_server/README.md (outdated, resolved)
text-embeddings-inference/custom_server/config.yaml (outdated, resolved)

michaelfeil (Contributor, Author) commented Dec 19, 2024:

Okay, worked on it! @vshulman

@vshulman (Contributor) left a comment:

One stray file (I think) and two copy changes requested. Rest LGTM.

Contributor: Stray file?

text-embeddings-inference/README.md (outdated, resolved)
- which model is deployed
- how many concurrent requests users are sending

The deployment example is for Bert-large on an Nvidia L4. Bert-large has a maximum sequence length of 512 tokens per sentence.

Contributor: Bert => BERT
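
For context, this is what a request against the deployed embedding server looks like. TEI exposes a POST /embed route; the host below is a placeholder and the example sentences are purely illustrative:

```bash
# Placeholder host; in a Baseten deployment you would use the model's
# endpoint URL instead. Each input counts against the model's
# 512-token maximum sequence length.
curl http://localhost:8080/embed \
    -X POST \
    -H "Content-Type: application/json" \
    -d '{"inputs": ["What is Deep Learning?", "What is Machine Learning?"]}'
```

The response is a JSON array with one embedding vector per input sentence.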

text-embeddings-inference/README.md (resolved)
```
--max-client-batch-size 256
```
This determines the maximum number of sentences/items allowed in a single request.
For optimal autoscaling on Baseten's infrastructure, which is regulated by metrics such as requests per second, you want to set this as low as possible. The OpenAI API historically used the equivalent of `--max-client-batch-size 32`, which can enable more aggressive autoscaling and thus better latency. On the other hand, frameworks such as LlamaIndex, LangChain, or Haystack might prefer or even require higher batch sizes, especially if the user code is old-fashioned and sends requests one by one in a for loop. The right setting depends on your users and how you plan to use your deployment.

Contributor: This is a little hard to follow.

michaelfeil (Contributor, Author): Updated it, and just set it to 32.
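
As a rough sketch of what that final setting looks like in practice (the container tag and model below are assumptions for illustration, not taken from this PR), the TEI server would be launched with the lower cap:

```bash
# Minimal sketch: official TEI container image, with an example
# BERT-style embedding model chosen purely for illustration.
docker run --gpus all -p 8080:80 \
    ghcr.io/huggingface/text-embeddings-inference:latest \
    --model-id BAAI/bge-large-en-v1.5 \
    --max-client-batch-size 32  # cap items per request; favors aggressive autoscaling
```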
