Add second vLLM-sim instance to allow for routing #41


Open

wants to merge 70 commits into dev

Conversation

@oglok commented Apr 21, 2025

Add a second vLLM-sim instance to test routing policies.
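For illustration, a minimal sketch of what the added second simulator instance might look like (the Deployment name, image reference, extra instance label, and port below are hypothetical, not the exact manifests in this PR):

```yaml
# Hypothetical sketch only: a second vLLM-sim Deployment next to the existing one,
# distinguished by an extra label so routing policies have two distinct targets.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-sim-2                        # hypothetical name
  labels:
    app: vllm-llama3-8b-instruct
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vllm-llama3-8b-instruct
      instance: vllm-sim-2
  template:
    metadata:
      labels:
        app: vllm-llama3-8b-instruct
        instance: vllm-sim-2
    spec:
      containers:
        - name: vllm-sim
          image: vllm-sim:latest          # hypothetical image reference
          ports:
            - containerPort: 8000         # assumed serving port
```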

mayabar and others added 30 commits April 10, 2025 14:50
…ill be the target for a request. Session affinity scorer added
- Rename SessionId to SessionID
- Remove datastore from scoreTargets, add datastore to SessionAffinityScorer
- Rename ScoredPod to PodScore
…f ScoreMng

- If a specific scorer fails to score pods, just log the problem, skip it, and continue to the next scorer
Bumps [golang.org/x/net](https://github.com/golang/net) from 0.37.0 to 0.38.0.
- [Commits](golang/net@v0.37.0...v0.38.0)

---
updated-dependencies:
- dependency-name: golang.org/x/net
  dependency-version: 0.38.0
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <[email protected]>
…es/golang.org/x/net-0.38.0

Bump golang.org/x/net from 0.37.0 to 0.38.0
Add scorers support in scheduler
…eployments

First iteration of development deployments & environments
Signed-off-by: Shane Utt <[email protected]>
…ilds

fix: basic container image builds for linux
empty top level kustomization.yaml - make CICD happy
shaneutt and others added 27 commits April 18, 2025 09:31
upgrade golang.org/x/oauth2 to v0.27.0
This is required for full GIE support

Signed-off-by: Shane Utt <[email protected]>
Patch Istio deployment to use 1Gi of memory
Set up the Istio service as a NodePort service rather than a ClusterIP service
  labels:
    app: vllm-llama3-8b-instruct
spec:
  replicas: 1
Collaborator

@oglok could you increase the replica count on the existing deployment to get multiple simulator instances for routing or do you see value in having a completely separate deployment?
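For reference, a sketch of the alternative being suggested here, scaling the existing Deployment rather than adding a second one (assuming the Deployment from the diff context above):

```yaml
# Sketch of the suggested alternative: bump replicas on the existing simulator
# Deployment so routing tests have multiple endpoints behind one pool.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-llama3-8b-instruct   # assumed name of the Deployment shown in the diff above
spec:
  replicas: 2                     # was 1
```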

Author

I thought about it, but with a higher number of replicas, the request will always hit the same Kubernetes service, won't it? How would the inference extension route it to a specific pod?

@shaneutt (Collaborator) commented Apr 22, 2025

Isn't the GIE bypassing the Service and pulling pod IPs via the Deployment?
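A sketch of why the Service would not matter here, assuming the upstream Gateway API Inference Extension (GIE) InferencePool schema (field names from GIE v1alpha2; the version pinned in this repo may differ): the endpoint picker selects Pods by label and forwards to a chosen pod IP directly, so the ClusterIP Service is not in the request path.

```yaml
# Assumed GIE v1alpha2 schema, for illustration: the InferencePool selects pods
# by label, and the endpoint-picker extension routes to a specific pod IP,
# bypassing the Kubernetes Service.
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferencePool
metadata:
  name: vllm-llama3-8b-instruct
spec:
  targetPortNumber: 8000                # assumed simulator port
  selector:
    app: vllm-llama3-8b-instruct        # matches the pod labels in the diff context
  extensionRef:
    name: vllm-llama3-8b-instruct-epp   # endpoint picker Service; name illustrative
```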

metadata:
  labels:
    app: vllm-llama3-8b-instruct
    ai-aware-router-pod: "true"
Collaborator

This is carried over from the previous design and can be removed without affecting what we need (i.e., the InferencePool should be selecting on app: vllm-llama3-8b-instruct).
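In other words, the pod template would keep only the label the InferencePool selects on; a sketch of the suggested cleanup:

```yaml
# Sketch: drop the carried-over label and keep only what the InferencePool selector matches.
metadata:
  labels:
    app: vllm-llama3-8b-instruct   # ai-aware-router-pod: "true" removed
```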

@elevran (Collaborator) commented Apr 27, 2025

@oglok it would be good to see a revised version, rebased on upstream/dev, with the agreed changes (assuming the replica count was not already bumped up and the label cleared in another PR). If this was already done elsewhere, please close.
