[API] add support for embeddings api #1208
Conversation
Summary of Changes
Hello @gcalmettes, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly enhances the AIBrix API gateway by introducing support for the OpenAI `/v1/embeddings` endpoint. It also implements a more flexible, configurable routing mechanism that lets administrators specify which API endpoints a deployed model exposes via a new Kubernetes label.
Highlights
- API Endpoint Expansion: I've added full support for the `/v1/embeddings` OpenAI API endpoint, enabling AIBrix to handle embedding generation requests.
- Configurable Model Routing: I've introduced a new optional deployment label, `model.aibrix.ai/supported-routes`, which allows specifying a comma-separated list of supported API routes (e.g. "embeddings", "chat-completions"). This provides granular control over which endpoints are exposed for a given model.
- Default Route Behavior: If the `model.aibrix.ai/supported-routes` label is not present, models will default to supporting the existing `/v1/completions` and `/v1/chat/completions` endpoints, maintaining backward compatibility.
- Embeddings Input Validation: I've implemented validation logic for `/v1/embeddings` requests to check the input sequence length, ensuring it's within acceptable bounds (0 to 1024 characters/tokens).
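For illustration, the new label could be attached to a model Deployment along these lines. This is a sketch: the Deployment name and metadata are assumptions; only the `model.aibrix.ai/supported-routes` key and its example values come from this PR.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: embedding-model            # illustrative name
  labels:
    # Comma-separated list of API routes this model should expose.
    # Omit the label to keep the default completions/chat-completions routes.
    model.aibrix.ai/supported-routes: "embeddings,chat-completions"
```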
Code Review
This PR adds support for the `/v1/embeddings` API endpoint by introducing configurable routes via labels and updating the request validation logic. The core changes appear sound. A few minor improvements regarding code clarity and robustness have been suggested.
pkg/plugins/gateway/util.go
Outdated
switch input := embeddingNewParamsInputUnionAsAny(&inputParam).(type) {
case *string:
	size = len(*input)
case *[]string:
	size = len(*input)
case *[]int64:
	size = len(*input)
case *[][]int64:
	size = len(*input)
default:
}
The `default` case in this `switch` statement currently does nothing, meaning `size` will remain `0` if an unknown or unhandled input type is returned from `embeddingNewParamsInputUnionAsAny`. While this will subsequently trigger the `if size == 0` error, the error message might be misleading.

Consider explicitly handling the `default` case by logging a warning or returning an error indicating an unsupported/unknown input type. This would be particularly helpful if the `openai-go` library introduces new input types for embeddings in the future.
switch input := embeddingNewParamsInputUnionAsAny(&inputParam).(type) {
case *string:
	size = len(*input)
case *[]string:
	size = len(*input)
case *[]int64:
	size = len(*input)
case *[][]int64:
	size = len(*input)
default:
	// Log unknown input types explicitly.
	// If input is nil (e.g. the union was empty), size remains 0, which is handled below.
	// If input is of an unexpected non-nil type, this log helps identify it.
	if input != nil {
		// klog/v2 has no WarningS; Warningf is the unstructured warning-level call.
		klog.Warningf("unhandled embedding input type (requestID %s): %T", requestID, input)
	}
	// size remains 0 and will be caught by the check below.
}
	return nil
}

// TODO: make asAny method publicly available on OpenAI go
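The type-switch pattern used in `util.go` can be exercised in isolation. The sketch below is a hypothetical stand-alone version of the size computation (the function name and the plain value cases are illustrative; the gateway code switches on pointer types returned by the union accessor):

```go
package main

import "fmt"

// sizeOfEmbeddingInput is a hypothetical, self-contained sketch of the
// validation discussed above: the OpenAI embeddings "input" union may be a
// string, a list of strings, a list of token IDs, or a list of token-ID
// lists, and each variant yields a different notion of size.
func sizeOfEmbeddingInput(input any) (int, error) {
	switch v := input.(type) {
	case string:
		return len(v), nil // number of characters
	case []string:
		return len(v), nil // number of strings
	case []int64:
		return len(v), nil // number of tokens
	case [][]int64:
		return len(v), nil // number of token sequences
	default:
		// Surface unknown variants instead of silently returning 0.
		return 0, fmt.Errorf("unsupported embedding input type %T", input)
	}
}

func main() {
	size, err := sizeOfEmbeddingInput([]string{"first doc", "second doc"})
	fmt.Println(size, err) // prints: 2 <nil>
}
```

Returning an error from the `default` case, as sketched here, avoids the misleading zero-size error path discussed in the review comment above.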
Signed-off-by: Guillaume Calmettes <[email protected]>
@Jeffwan @varungup90 what would be the best way to discuss implementation details regarding the changes introduced by this PR? The current code of the aibrix plugins is tightly coupled to the chat-completions/completions API of OpenAI (e.g. the response bodies are directly unmarshalled into … I made an implementation proposition in this PR (I still have to work on it, as right now the …
@gcalmettes are you in the slack channel? We can talk about more implementation details there. A google doc also works.
@varungup90 Since we're aiming to support API compatibility, could you also help review the change proposed by @gcalmettes? The coupling issue raised is indeed a concern for future extensibility. /cc @Xunzhuo
Assigning myself to track the review; will schedule some bandwidth soon.
@Jeffwan yes, I am in the slack channel (I believe you're talking about the vLLM slack, correct? Or is there another slack I should join?)
Pull Request Description
This PR adds support for the `/v1/embeddings` OpenAI API endpoint, in addition to the currently supported `/v1/chat/completions` and `/v1/completions` endpoints.

Implementation considerations to be discussed:
- Introduction of the notion of `OpenAiRequestType`, used to select the `HandleResponseBody` method based on the detected request type. This would also make it easy to later add support for the other OpenAI APIs (e.g. audio, images, etc.)
- `HTTPRouteMatch`: currently only the `/v1/chat/completions` and `/v1/completions` path prefixes are created. A new label, `model.aibrix.ai/supported-request-types`, on the deployment allows selectively defining the `HTTPRouteMatch` to be created. If the label is not present, the default routes will be created (`/v1/chat/completions` and `/v1/completions`).

Todo:
- [ ] Tests to be added
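For context, the per-model route matches described above would correspond to Gateway API `HTTPRouteMatch` entries roughly like the following. This is a sketch: the resource name, backend Service, and port are illustrative, and the exact shape of the generated route is up to the controller.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: embedding-model-router      # illustrative name
spec:
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /v1/embeddings   # created only when enabled for the model
      backendRefs:
        - name: embedding-model     # illustrative backend Service
          port: 8000
```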
Related Issues
Resolves: #1205
Important: Before submitting, please complete the description above and review the checklist below.
Contribution Guidelines (Expand for Details)
We appreciate your contribution to aibrix! To ensure a smooth review process and maintain high code quality, please adhere to the following guidelines:
Pull Request Title Format
Your PR title should start with one of these prefixes to indicate the nature of the change:
- `[Bug]`: Corrections to existing functionality
- `[CI]`: Changes to build process or CI pipeline
- `[Docs]`: Updates or additions to documentation
- `[API]`: Modifications to aibrix's API or interface
- `[CLI]`: Changes or additions to the Command Line Interface
- `[Misc]`: For changes not covered above (use sparingly)

Note: For changes spanning multiple categories, use multiple prefixes in order of importance.
Submission Checklist
By submitting this PR, you confirm that you've read these guidelines and your changes align with the project's contribution standards.