
Initial Gateway API Inference Extension Blog Post #49898

Open · danehans wants to merge 5 commits into main from gie_kcon_blog

Conversation

danehans (Contributor)

Description

Adds a blog post introducing the Gateway API inference extension project.

cc: @robscott @kfswain

k8s-ci-robot added the area/blog label (Issues or PRs related to the Kubernetes Blog subproject) on Feb 25, 2025
k8s-ci-robot (Contributor) commented:

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign nate-double-u for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot added the language/en (Issues or PRs related to English language), cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA), and size/M (Denotes a PR that changes 30-99 lines, ignoring generated files) labels on Feb 25, 2025

netlify bot commented Feb 25, 2025

Pull request preview available for checking

Built without sensitive environment variables

| Name | Link |
|------|------|
| 🔨 Latest commit | 7ee2d03 |
| 🔍 Latest deploy log | https://app.netlify.com/sites/kubernetes-io-main-staging/deploys/67df0b4d5c35e80008f0cde8 |
| 😎 Deploy Preview | https://deploy-preview-49898--kubernetes-io-main-staging.netlify.app |

danehans changed the title from "Initial Gateway API Inference Extension Blog Post" to "[WIP] Initial Gateway API Inference Extension Blog Post" on Feb 25, 2025
k8s-ci-robot added the do-not-merge/work-in-progress label (Indicates that a PR should not merge because it is a work in progress) on Feb 25, 2025
danehans (Contributor, Author) commented on Mar 7, 2025:

@robscott PTAL and let me know if you would like any modifications.

danehans (Contributor, Author) commented:

TODO [danehans]: Add benchmarks and ref to: kubernetes-sigs/gateway-api-inference-extension#480 (when merged).

k8s-ci-robot added the size/L label (Denotes a PR that changes 100-499 lines, ignoring generated files) and removed the size/M label on Mar 20, 2025
@@ -23,12 +23,13 @@ is missing.

## Enter Gateway API Inference Extension

[Gateway API Inference Extension](https://gateway-api-inference-extension.sigs.k8s.io/) was created to address this gap by building on the existing [Gateway API](https://gateway-api.sigs.k8s.io/),
smarterclayton (Contributor) commented:

On line 19 above, "... focused on HTTP path routing or ..."?

danehans (Contributor, Author) replied:

@smarterclayton I resolved all your feedback in the latest commit other than this comment. Feel free to rereview and/or elaborate. Thanks again for your review.

standardize routing to inference workloads across the ecosystem. Key objectives include enabling model-aware
routing, supporting per-request criticalities, facilitating safe model roll-outs, and optimizing load balancing
based on real-time model metrics. By achieving these, the project aims to reduce latency and improve accelerator
(GPU) utilization for AI workloads.
A Contributor commented:

I'd love if you could work in

"Adding the inference extension to your existing gateway makes it an Inference Gateway - enabling you to self-host large language models with a model as a service mindset"

or similar. Roughly hitting the two points "inference extends gateway = inference gateway", and "inference gateway = self-host genai/large models as model as a service"

robscott (Member) replied:

+1, I really like this framing, and we should use it as much as we can throughout this post and our docs.

I started to go a bit farther with this theme and realized that we could write a very compelling blog post with this theme after KubeCon when we have more Gateway implementations ready. That post could be titled "Introducing Kubernetes Inference Gateways", and have a section describing that an Inference Gateway is an "existing gateway + inference extension". To really sell that though, I think we need to have a variety of "Inference Gateways" ready to play with.

So if we think we'll end up with two separate blog posts here, maybe this initial one is focused on the project goal of extending any existing Gateway with specialized Inference routing capabilities, and then in a follow up blog post we can focus more on the "Inference Gateway" term when we have more examples to work with.

Or maybe we should just hold off on this post until we have more Inference Gateway examples. I'm not sure, open to ideas here.

danehans (Contributor, Author) replied:

I like planting the "gateway + inference extension = inference gateway" seed here and using a follow-up post to drive the messaging.
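
For concreteness, the per-request criticality named in the objectives quoted above is expressed through the project's InferenceModel resource. A minimal sketch, assuming the v1alpha2 API version and hypothetical names throughout; consult the project docs for the current schema:

```yaml
apiVersion: inference.networking.x-k8s.io/v1alpha2  # illustrative API version
kind: InferenceModel
metadata:
  name: chatbot                      # hypothetical name
spec:
  modelName: llama3-8b-instruct      # model name that clients send in request bodies
  criticality: Critical              # Critical requests are favored over Sheddable ones under accelerator pressure
  poolRef:
    name: llama3-pool                # hypothetical InferencePool serving this model
```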

@@ -64,7 +65,7 @@ steps, e.g. extensions, in the middle. Here’s a high-level example of the requ
and identifies the matching InferencePool backend.

2. **Endpoint Selection**
- Instead of simply forwarding to any pod, the Gateway consults an inference-specific routing extension. This
+ Instead of simply forwarding to any pod, the Gateway consults an inference-specific routing extension, e.g. endpoint selection extension. This
A Contributor commented:

maybe instead of 'e.g.'

Instead of simply forwarding to any available pod, the Gateway consults an inference-specific routing extension - an ​endpoint selection extension - to pick the best of the available pods.

?

danehans (Contributor, Author) replied:

Making the ^ change with one minor difference s/an ​endpoint selection extension/the Endpoint Selection Extension/
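
To ground the request flow discussed above: the matching InferencePool backend is referenced from an ordinary HTTPRoute, so existing Gateway API routing is reused before the Endpoint Selection Extension picks a pod. A minimal sketch with hypothetical names:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llm-route                    # hypothetical
spec:
  parentRefs:
    - name: inference-gateway        # hypothetical Gateway
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /
      backendRefs:
        - group: inference.networking.x-k8s.io
          kind: InferencePool        # an InferencePool stands in for a plain Service
          name: llama3-pool          # hypothetical pool from the earlier sketch
```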

more, it helps ops teams deliver the right LLM services to the right users—smoothly and efficiently.

**Ready to learn more?** Visit the [project docs](https://gateway-api-inference-extension.sigs.k8s.io/) to dive deeper,
give Inference Extension a try with a few [simple steps](https://gateway-api-inference-extension.sigs.k8s.io/guides/),
A Contributor commented:

I would suggest saying "... give the Inference Gateway extension a try with a few ...".

robscott (Member) replied:

We probably want to hold off on publishing this until we've updated our guides to use proper "Inference Gateways" instead of Envoy patches. Maybe that's actually an argument for saving this until after KubeCon?

danehans (Contributor, Author) replied:

The initial inference extension support landed in kgateway, and I plan on adding an inference extension docs PR in the next few days.

danehans force-pushed the gie_kcon_blog branch 2 times, most recently from 942afc7 to 6bdf890, on March 21, 2025
danehans changed the title from "[WIP] Initial Gateway API Inference Extension Blog Post" to "Initial Gateway API Inference Extension Blog Post" on Mar 21, 2025
k8s-ci-robot removed the do-not-merge/work-in-progress label on Mar 21, 2025
robscott (Member) left a comment:

Thanks for the work on this @danehans!

---
layout: blog
title: "Introducing Gateway API Inference Extension"
date: 2025-02-21
robscott commented:

@danehans can we aim for a day that hasn't been claimed yet next week?




This extra step provides a smarter, model-aware routing mechanism that still feels like a normal single request to
the client.

robscott (Member) commented:

Somewhere in this section I think it would be useful to mention the extensible nature of this model, and that new extensions can be developed that will be compatible with any Inference Gateway.

danehans (Contributor, Author) replied:

I updated this section based on ^ feedback, PTAL.
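
The pluggability discussed in this thread is visible in the API itself: an InferencePool names its routing extension via extensionRef, so a differently implemented endpoint-selection extension can be substituted without changing routes. A rough sketch, again assuming the v1alpha2 API version and hypothetical names:

```yaml
apiVersion: inference.networking.x-k8s.io/v1alpha2  # illustrative API version
kind: InferencePool
metadata:
  name: llama3-pool                  # hypothetical, matches the sketches above
spec:
  targetPortNumber: 8000             # port the model-server pods listen on
  selector:
    app: llama3-server               # hypothetical label on the model-server pods
  extensionRef:
    name: llama3-epp                 # hypothetical Endpoint Selection Extension Service
```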

Labels: area/blog · cncf-cla: yes · language/en · size/L

6 participants