Add second vLLM-sim instance to allow for routing #41


Open

wants to merge 70 commits into dev

Conversation

@oglok commented Apr 21, 2025

Add a second vLLM-sim instance to test routing policies.
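For illustration, a minimal sketch of what the added second simulator instance might look like (the Deployment name, image reference, extra instance label, and port below are hypothetical, not the exact manifests in this PR):

```yaml
# Hypothetical sketch only: a second vLLM-sim Deployment next to the existing one,
# distinguished by an extra label so routing policies have two distinct targets.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-sim-2                        # hypothetical name
  labels:
    app: vllm-llama3-8b-instruct
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vllm-llama3-8b-instruct
      instance: vllm-sim-2
  template:
    metadata:
      labels:
        app: vllm-llama3-8b-instruct
        instance: vllm-sim-2
    spec:
      containers:
        - name: vllm-sim
          image: vllm-sim:latest          # hypothetical image reference
          ports:
            - containerPort: 8000         # assumed serving port
```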

mayabar and others added 30 commits April 10, 2025 14:50
…ill be the target for a request. Session affinity scorer added
- Rename SessionId to SessionID
- Remove datastore from scoreTargets, add datastore to SessionAffinityScorer
- Rename ScoredPod to PodScore
…f ScoreMng

- If a specific scorer fails to score pods, just log the problem, skip it, and continue to the next scorer
Bumps [golang.org/x/net](https://github.com/golang/net) from 0.37.0 to 0.38.0.
- [Commits](golang/net@v0.37.0...v0.38.0)

---
updated-dependencies:
- dependency-name: golang.org/x/net
  dependency-version: 0.38.0
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <[email protected]>
…es/golang.org/x/net-0.38.0

Bump golang.org/x/net from 0.37.0 to 0.38.0
Add scorers support in scheduler
…eployments

First iteration of development deployments & environments
Signed-off-by: Shane Utt <[email protected]>
…ilds

fix: basic container image builds for linux
empty top level kustomization.yaml - make CICD happy
shaneutt and others added 27 commits April 18, 2025 09:31
upgrade golang.org/x/oauth2 to v0.27.0
This is required for full GIE support

Signed-off-by: Shane Utt <[email protected]>
Patch Istio deployment to use 1Gi of memory
Set up the Istio service as a NodePort service rather than a ClusterIP service
  labels:
    app: vllm-llama3-8b-instruct
spec:
  replicas: 1
Collaborator

@oglok could you increase the replica count on the existing deployment to get multiple simulator instances for routing or do you see value in having a completely separate deployment?
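For reference, a sketch of the alternative being suggested here, scaling the existing Deployment rather than adding a second one (assuming the Deployment from the diff context above):

```yaml
# Sketch of the suggested alternative: bump replicas on the existing simulator
# Deployment so routing tests have multiple endpoints behind one pool.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-llama3-8b-instruct   # assumed name of the Deployment shown in the diff above
spec:
  replicas: 2                     # was 1
```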

Author

I thought about it, but with a higher number of replicas, the request will always hit the same Kubernetes service, won't it? How would the inference extension route it to a specific pod?

@shaneutt (Collaborator) commented Apr 22, 2025

Isn't the GIE bypassing the Service and pulling pod IPs via the Deployment?
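A sketch of why the Service would not matter here, assuming the upstream Gateway API Inference Extension (GIE) InferencePool schema (field names from GIE v1alpha2; the version pinned in this repo may differ): the endpoint picker selects Pods by label and forwards to a chosen pod IP directly, so the ClusterIP Service is not in the request path.

```yaml
# Assumed GIE v1alpha2 schema, for illustration: the InferencePool selects pods
# by label, and the endpoint-picker extension routes to a specific pod IP,
# bypassing the Kubernetes Service.
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferencePool
metadata:
  name: vllm-llama3-8b-instruct
spec:
  targetPortNumber: 8000                # assumed simulator port
  selector:
    app: vllm-llama3-8b-instruct        # matches the pod labels in the diff context
  extensionRef:
    name: vllm-llama3-8b-instruct-epp   # endpoint picker Service; name illustrative
```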

metadata:
  labels:
    app: vllm-llama3-8b-instruct
    ai-aware-router-pod: "true"
Collaborator

This is carried over from the previous design and can be removed without affecting what we need (i.e., the InferencePool should be selecting on app: vllm-llama3-8b-instruct).
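In other words, the pod template would keep only the label the InferencePool selects on; a sketch of the suggested cleanup:

```yaml
# Sketch: drop the carried-over label and keep only what the InferencePool selector matches.
metadata:
  labels:
    app: vllm-llama3-8b-instruct   # ai-aware-router-pod: "true" removed
```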

@elevran (Collaborator) commented Apr 27, 2025

@oglok it would be good to see a revised version, rebased on upstream/dev, with the agreed changes (assuming the replica count was not already bumped up and the label cleared in another PR). If this was already done elsewhere, please close.
