Initial Gateway API Inference Extension Blog Post #49898
base: main
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. Needs approval from an approver in each of the listed files. The full list of commands accepted by this bot can be found here.
✅ Pull request preview available for checking. Built without sensitive environment variables.
@robscott PTAL and let me know if you would like any modifications.
TODO [danehans]: Add benchmarks and ref to: kubernetes-sigs/gateway-api-inference-extension#480 (when merged).
@@ -23,12 +23,13 @@ is missing.

## Enter Gateway API Inference Extension

[Gateway API Inference Extension](https://gateway-api-inference-extension.sigs.k8s.io/) was created to address this gap by building on the existing [Gateway API](https://gateway-api.sigs.k8s.io/),
On line 19 above, "... focused on HTTP path routing or ..."?
@smarterclayton I resolved all your feedback in the latest commit other than this comment. Feel free to rereview and/or elaborate. Thanks again for your review.
standardize routing to inference workloads across the ecosystem. Key objectives include enabling model-aware
routing, supporting per-request criticalities, facilitating safe model roll-outs, and optimizing load balancing
based on real-time model metrics. By achieving these, the project aims to reduce latency and improve accelerator
(GPU) utilization for AI workloads.
I'd love if you could work in
"Adding the inference extension to your existing gateway makes it an Inference Gateway - enabling you to self-host large language models with a model as a service mindset"
or similar. Roughly hitting the two points "inference extends gateway = inference gateway", and "inference gateway = self-host genai/large models as model as a service"
+1, I really like this framing, and we should use it as much as we can throughout this post and our docs.
I started to go a bit farther with this theme and realized that we could write a very compelling blog post with this theme after KubeCon when we have more Gateway implementations ready. That post could be titled "Introducing Kubernetes Inference Gateways", and have a section describing that an Inference Gateway is an "existing gateway + inference extension". To really sell that though, I think we need to have a variety of "Inference Gateways" ready to play with.
So if we think we'll end up with two separate blog posts here, maybe this initial one is focused on the project goal of extending any existing Gateway with specialized Inference routing capabilities, and then in a follow up blog post we can focus more on the "Inference Gateway" term when we have more examples to work with.
Or maybe we should just hold off on this post until we have more Inference Gateway examples. I'm not sure, open to ideas here.
I like planting the "gateway + inference extension = inference gateway" seed here and using a follow-up post to drive the messaging.
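To make the objectives quoted above more concrete, here is a minimal sketch of an InferenceModel resource, which is how the project expresses model-aware routing, per-request criticality, and weighted roll-outs. The apiVersion, field names, and values shown are assumptions based on the project's alpha API and may differ from the released schema; see the project docs for the authoritative definition.

```yaml
# Hypothetical sketch only: apiVersion and field names are assumptions based on
# the project's alpha API and may not match the released schema exactly.
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferenceModel
metadata:
  name: tweet-summary
spec:
  modelName: tweet-summary       # model name clients reference in their requests
  criticality: Critical          # per-request criticality honored under load
  poolRef:
    name: vllm-llama2-7b-pool    # InferencePool of model-server pods serving this model
  targetModels:                  # weighted split for a safe roll-out of a new version
  - name: tweet-summary-v1
    weight: 90
  - name: tweet-summary-v2
    weight: 10
```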
@@ -64,7 +65,7 @@ steps, e.g. extensions, in the middle. Here’s a high-level example of the requ

and identifies the matching InferencePool backend.

2. **Endpoint Selection**
-  Instead of simply forwarding to any pod, the Gateway consults an inference-specific routing extension. This
+  Instead of simply forwarding to any pod, the Gateway consults an inference-specific routing extension, e.g. endpoint selection extension. This
maybe instead of 'e.g.'
Instead of simply forwarding to any available pod, the Gateway consults an inference-specific routing extension - an endpoint selection extension - to pick the best of the available pods.
?
Making the ^ change with one minor difference s/an endpoint selection extension/the Endpoint Selection Extension/
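For reference, this is roughly how the first two steps of that request flow look in configuration: a standard HTTPRoute forwards matching traffic to an InferencePool instead of a plain Service, and the Gateway then asks the pool's Endpoint Selection Extension which model-server pod should receive each request. Resource names and the InferencePool group/version below are assumptions; check the project guides for exact values.

```yaml
# Hypothetical sketch: route LLM traffic to an InferencePool rather than a Service.
# Names and the InferencePool group/version are assumptions.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llm-route
spec:
  parentRefs:
  - name: inference-gateway          # an existing Gateway with the inference extension enabled
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /
    backendRefs:
    - group: inference.networking.x-k8s.io
      kind: InferencePool            # endpoints resolved via the Endpoint Selection Extension
      name: vllm-llama2-7b-pool
```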
more, it helps ops teams deliver the right LLM services to the right users—smoothly and efficiently.

**Ready to learn more?** Visit the [project docs](https://gateway-api-inference-extension.sigs.k8s.io/) to dive deeper,
give Inference Extension a try with a few [simple steps](https://gateway-api-inference-extension.sigs.k8s.io/guides/),
I would suggest saying "... give the Inference Gateway extension a try with a few ...".
We probably want to hold off on publishing this until we've updated our guides to use proper "Inference Gateways" instead of Envoy patches. Maybe that's actually an argument for saving this until after KubeCon?
Initial inference extension support landed in kgateway, and I plan on adding an inference extension docs PR in the next few days.
Force-pushed from 942afc7 to 6bdf890 (compare).
Thanks for the work on this @danehans!
---
layout: blog
title: "Introducing Gateway API Inference Extension"
date: 2025-02-21 |
@danehans can we aim for a day that hasn't been claimed yet next week?
This extra step provides a smarter, model-aware routing mechanism that still feels like a normal single request to
the client.
Somewhere in this section I think it would be useful to mention the extensible nature of this model, and that new extensions can be developed that will be compatible with any Inference Gateway.
I updated this section based on ^ feedback, PTAL.
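As a rough illustration of that extensibility, an InferencePool can reference the endpoint-picker extension it should consult, so a different extension can be swapped in without touching routes. The extensionRef and other field names below are assumptions based on the project's alpha API, not a definitive schema.

```yaml
# Hypothetical sketch: an InferencePool that selects model-server pods and points at a
# pluggable endpoint-picker extension deployed as a regular Service. Field names such as
# extensionRef and targetPortNumber are assumptions based on the project's alpha API.
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferencePool
metadata:
  name: vllm-llama2-7b-pool
spec:
  selector:
    app: vllm-llama2-7b          # labels identifying the model-server pods in this pool
  targetPortNumber: 8000         # port the model servers listen on
  extensionRef:
    name: vllm-llama2-7b-epp     # Service for the endpoint-picker extension; swappable
```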
Signed-off-by: Daneyon Hansen <[email protected]>
Description
Adds a blog post introducing the Gateway API inference extension project.
cc: @robscott @kfswain