Support k8s gateway API inference extensions #423
@yanavlasov @envoyproxy/ai-gateway-maintainers I see EG also wants to implement this (see https://gateway-api-inference-extension.sigs.k8s.io/implementations/#envoy-gateway), so will we have two implementations of this?
@daixiang0 the preference is to implement this in Envoy AI Gateway first, get user feedback, and iterate on the API until it becomes standard, and then also directly support the implementation in Envoy Gateway.
We've worked with Envoy GW quite a bit, and it's been great. Happy to continue working with y'all to get these projects working together! Is it easiest if I just pop in on the community meeting on Thursday?
Thanks for picking this one up @kfswain! You're familiar with the translation, so this should be easy for you 😄 Hey @mathetake, can you point Kellen to the reconciliation and translation areas?
@yanavlasov said he will start working on this next week, so could you coordinate with Yan, @kfswain? (I assume you both work at Google.)
SGTM!
Hi @yanavlasov @kfswain, luckily I have cycles to work on this starting tomorrow, so I will do the initial work soon, and I might ask you for help whenever I need it. SG?
That sounds great. Happy to help.
Now I get the big picture of how to implement this here... I have one question, @kfswain: may I ask where the conformance test suite is? I couldn't find one in the repo.
Let's have a design doc first; this certainly changes the scope of the Envoy AI Gateway project.
@yuzisun certainly, I will write it up by noon tomorrow.
I opened a PR for the high-level doc: #492. Here are a couple of notes from my end and the offline discussion:
That should be fine; a user will still need an InferenceModel (the EPP reads and uses InferenceModels). Other gateway implementations are not reconciling on the InferenceModel either.
We should have a doc detailing our conformance testing available next week.
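For context, here is a minimal sketch of how an endpoint picker (EPP) might use InferenceModels: it indexes the models bound to its InferencePool by `modelName` so incoming requests can be matched. The types below are hypothetical, trimmed-down mirrors for illustration only, not the real generated client from the Inference Extension API.

```go
package main

import "fmt"

// InferenceModel is a hypothetical, trimmed-down mirror of the
// Inference Extension type; the real API lives in
// sigs.k8s.io/gateway-api-inference-extension.
type InferenceModel struct {
	Name      string // metadata.name
	ModelName string // spec.modelName: the model name requests ask for
	PoolRef   string // spec.poolRef: the InferencePool it binds to
}

// indexModelsForPool keeps only the InferenceModels bound to the given
// pool and indexes them by the model name used in requests.
func indexModelsForPool(models []InferenceModel, pool string) map[string]InferenceModel {
	idx := make(map[string]InferenceModel)
	for _, m := range models {
		if m.PoolRef == pool {
			idx[m.ModelName] = m
		}
	}
	return idx
}

func main() {
	models := []InferenceModel{
		{Name: "llama-crit", ModelName: "llama-3-8b", PoolRef: "my-pool"},
		{Name: "mistral", ModelName: "mistral-7b", PoolRef: "other-pool"},
	}
	idx := indexModelsForPool(models, "my-pool")
	fmt.Println(idx["llama-3-8b"].Name) // llama-crit
}
```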
It looks like I need to read the reference implementation more.
OK, I was misunderstanding the InferenceModel's role, and I think it makes sense. Sorry @kfswain, ignore my comment.
I discussed this with @yuzisun offline, and we are putting the development on hold until we see the actual Envoy-side implementation of the LB policy by @yanavlasov, so we know whether Bloomberg's use case is covered by it.
**Commit Message**

This adds a proposal doc on support for the Gateway API Inference Extension in the Envoy AI Gateway project. This involves a change of the project scope, and we need to make sure that the existing API layer co-exists nicely with the GAIE.

**Related Issues/PRs (if applicable)**

Preliminary to #423

---------

Signed-off-by: Takeshi Yoneda <[email protected]>
Signed-off-by: Erica Hughberg <[email protected]>
Co-authored-by: Erica Hughberg <[email protected]>
Update on the implementation: control plane 100% done; extproc 80% done for the MVP.
**Commit Message**

This commit scaffolds the foundation for the Inference Extension API [1]. The design documentation was merged in #492. The controller needs to be started with `--enableInferenceExtension=true` so as not to break existing controller deployments where the Inference Extension CRDs are not installed. This commit doesn't implement the actual "metrics-aware" load balancing; instead it just does random routing over the given (resolved) endpoints. Follow-up implementations will add more advanced algorithms while expanding the metrics interface, which currently only provides the setter APIs.

The summary of the implementation is:

* Added a `kind` field to AIGatewayRouteRuleBackendRef so that it can reference an InferencePool.
* InferencePool.Spec.Selector is allowed to select multiple AIServiceBackends.
* When building up the extproc config via filterapi.Config, the controller reads the referenced InferencePool and its bound InferenceModels, and groups them together into a single filterapi.DynamicLoadBalancing configuration.
* When the extproc loads a configuration containing DynamicLoadBalancing, it resolves all the IP addresses for the hostnames belonging to the DynamicLoadBalancing. The presence of DynamicLoadBalancing in the config forces the config watcher to reload and refresh the config regardless of updates. That way, the list of IP addresses is always kept up to date (it is eventually consistent anyway) outside the hot path.
* On the request path, the ChatCompletionProcessor checks for the existence of a DynamicLoadBalancing config for the backend selected by the router. If present, it further tries to resolve the endpoint selection at the ip:port level.
* The selected ip:port is set in a special header that is routed to ORIGINAL_DST.
* The ORIGINAL_DST cluster is added by the EG extension server implementation. The extension server also modifies some routes to properly route to that cluster.

1: https://github.com/kubernetes-sigs/gateway-api-inference-extension

**Related Issues/PRs (if applicable)**

Built on #492
Contributes to #423

---------

Signed-off-by: Takeshi Yoneda <[email protected]>
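To make the request path concrete, here is a minimal sketch of the random endpoint selection described above: pick one resolved ip:port and expose it via a header that an ORIGINAL_DST cluster can consume (Envoy's ORIGINAL_DST cluster reads `x-envoy-original-dst-host` when `use_http_header` is enabled; the type names below and the exact header the extproc uses are assumptions for illustration).

```go
package main

import (
	"fmt"
	"math/rand"
	"net/http"
)

// dynamicLB is a hypothetical mirror of the state kept by the extproc
// for one DynamicLoadBalancing config: endpoints already resolved from
// hostnames by the config watcher.
type dynamicLB struct {
	endpoints []string // "ip:port" pairs, refreshed on config reload
}

// Header consumed by an Envoy ORIGINAL_DST cluster configured with
// use_http_header: true. The actual header used may differ.
const originalDstHeader = "x-envoy-original-dst-host"

// pickEndpoint implements the placeholder "random routing" the commit
// describes: choose one endpoint and set the ORIGINAL_DST header.
func (d *dynamicLB) pickEndpoint(headers http.Header) (string, error) {
	if len(d.endpoints) == 0 {
		return "", fmt.Errorf("no resolved endpoints")
	}
	ep := d.endpoints[rand.Intn(len(d.endpoints))]
	headers.Set(originalDstHeader, ep)
	return ep, nil
}

func main() {
	lb := &dynamicLB{endpoints: []string{"10.0.0.1:8080", "10.0.0.2:8080"}}
	h := http.Header{}
	ep, _ := lb.pickEndpoint(h)
	fmt.Println("selected:", ep, "header:", h.Get(originalDstHeader))
}
```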
I wanted to post an update on the current status of the implementation. Our initial foundational work landed in #493 a few weeks ago. With that, it works at the level of consuming InferencePool/InferenceModel resources. On the other hand, the initial PR left lots of TODOs; in particular, the most important load-balancing logic is missing. At the community meeting two weeks ago, @yuzisun and @sivanantha321 shared a proposal to address these TODOs, "Metrics-Based Load Balancing for Envoy AI Gateway Inference Extension". The suggested approach was to implement the load-balancing / metrics-scraping logic directly in our extproc. After the discussion, we agreed that instead of a direct implementation we should decouple the "endpoint picker" logic, i.e. allow users to bring in their own extensions to perform the load balancing. To do so, we need:
My only remaining concern about this direction is how well it will play with existing features such as transformation, auth, metrics, etc. Personally, I am not currently working on this implementation; I believe @sivanantha321 and @yuzisun will take on the task. cc @envoyproxy/ai-gateway-maintainers
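A minimal sketch of what the decoupled, bring-your-own endpoint picker boundary could look like (the interface name and signature are assumptions for illustration, not the agreed-upon API):

```go
package main

import (
	"context"
	"fmt"
	"sync/atomic"
)

// Request carries the attributes an endpoint picker may need.
// Hypothetical shape for illustration only.
type Request struct {
	Model string // model name extracted from the OpenAI-schema request
}

// EndpointPicker is an assumed extension point: given the candidate
// "ip:port" endpoints resolved for an InferencePool, pick one. A user
// could plug in a metrics-aware implementation (e.g. least queue depth)
// instead of the built-in random picker.
type EndpointPicker interface {
	Pick(ctx context.Context, req *Request, endpoints []string) (string, error)
}

// roundRobin is a trivial example implementation of the interface.
type roundRobin struct{ n atomic.Uint64 }

func (r *roundRobin) Pick(_ context.Context, _ *Request, endpoints []string) (string, error) {
	if len(endpoints) == 0 {
		return "", fmt.Errorf("no endpoints")
	}
	return endpoints[int(r.n.Add(1)-1)%len(endpoints)], nil
}

func main() {
	var p EndpointPicker = &roundRobin{}
	eps := []string{"10.0.0.1:8080", "10.0.0.2:8080"}
	for i := 0; i < 3; i++ {
		ep, _ := p.Pick(context.Background(), &Request{Model: "llama-3-8b"}, eps)
		fmt.Println(ep)
	}
}
```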
So now I am doing a relatively large (internal) refactoring in #599, primarily to add support for Envoy Gateway's fallback/priority/retry compatibility, which means we can allow AIServiceBackend-level fallbacks using EG's native API instead of our own. That will affect the initial design we landed in #493, but in a good way: now I have an idea for how to solve this BYO endpoint picker matter. I will work on it after #599, which temporarily disables Inference Extension support (but on the main branch, so I can fix it before the next release).
Support k8s gateway API inference extensions

Design notes:
- Route based on the `model` value from the request, assuming requests use the OpenAI schema (see the sketch below).
- Reference implementation for the external endpoint picker: https://github.com/kubernetes-sigs/gateway-api-inference-extension/
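A minimal sketch of the model-based routing assumption above: extract the `model` field from an OpenAI-schema chat completion body (the field names follow the OpenAI API; the surrounding routing plumbing is omitted):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// chatCompletionRequest captures only the field routing needs from an
// OpenAI-schema /v1/chat/completions body.
type chatCompletionRequest struct {
	Model string `json:"model"`
}

// modelFromBody returns the model name a request asks for, which the
// gateway can then match against the configured inference models.
func modelFromBody(body []byte) (string, error) {
	var req chatCompletionRequest
	if err := json.Unmarshal(body, &req); err != nil {
		return "", err
	}
	if req.Model == "" {
		return "", fmt.Errorf("request has no model field")
	}
	return req.Model, nil
}

func main() {
	body := []byte(`{"model":"llama-3-8b","messages":[{"role":"user","content":"hi"}]}`)
	m, err := modelFromBody(body)
	fmt.Println(m, err) // llama-3-8b <nil>
}
```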
Implementation details and questions:
Possible iteration steps: