Add second vLLM-sim instance to allow for routing #41
base: dev
…ill be the target for a request. Session affinity scorer added
- Rename SessionId to SessionID
- Remove datastore from scoreTargets, add datastore to SessionAffinityScorer
- Rename ScoredPod to PodScore
…f ScoreMng - If some specific scorer failed to score pods - just log the problem, skip it and continue to the next scorer
Signed-off-by: Shane Utt <[email protected]>
Bumps [golang.org/x/net](https://github.com/golang/net) from 0.37.0 to 0.38.0.
- [Commits](golang/net@v0.37.0...v0.38.0)
---
updated-dependencies:
- dependency-name: golang.org/x/net
  dependency-version: 0.38.0
  dependency-type: indirect
...
Signed-off-by: dependabot[bot] <[email protected]>
…es/golang.org/x/net-0.38.0 Bump golang.org/x/net from 0.37.0 to 0.38.0
Add scorers support in scheduler
…eployments First iteration of development deployments & environments
…ilds fix: basic container image builds for linux
Signed-off-by: Etai Lev Ran <[email protected]>
empty top level kustomization.yaml - make CICD happy
Fix kustomize envs
Add CRDs deployments
upgrade golang.org/x/oauth2 to v0.27.0
This is required for full GIE support Signed-off-by: Shane Utt <[email protected]>
Add full stack deployment to Kind dev env
Patch Istio deployment to use 1Gi of mem
Setup the Istio service to be a NodePort service and not a ClusterIP service
Signed-off-by: Ricardo Noriega De Soto <[email protected]>
  labels:
    app: vllm-llama3-8b-instruct
spec:
  replicas: 1
@oglok could you increase the replica count on the existing deployment to get multiple simulator instances for routing or do you see value in having a completely separate deployment?
I thought about it, but with a higher number of replicas, the request will always hit the same Kubernetes service, won't it? How would the inference extension route it to a specific pod?
Isn't the GIE bypassing the Service and pulling pod IPs via the Deployment?
I'll modify this to increase the number of replicas! thx
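For reference, a minimal sketch of the outcome agreed above: the existing Deployment scaled to two simulator replicas rather than a second Deployment. Only the `app` label and the `replicas` field come from the diff; the Deployment name, container name, image, and port are placeholders and not taken from this PR.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-sim                      # placeholder name (assumption)
  labels:
    app: vllm-llama3-8b-instruct
spec:
  replicas: 2                         # was 1; two pods give the router a choice of targets
  selector:
    matchLabels:
      app: vllm-llama3-8b-instruct
  template:
    metadata:
      labels:
        app: vllm-llama3-8b-instruct
    spec:
      containers:
        - name: vllm-sim              # placeholder container name (assumption)
          image: example.invalid/vllm-sim:latest   # placeholder image (assumption)
          ports:
            - containerPort: 8000     # assumed serving port
```

Each replica gets its own pod IP, so if endpoints are picked at the pod level, as suggested in this thread, the two instances stay individually addressable even though they share one Deployment.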
metadata:
  labels:
    app: vllm-llama3-8b-instruct
    ai-aware-router-pod: "true"
This is a carry-over from the previous design and can be removed without affecting what we need (i.e., InferencePool should be selecting on app: vllm-llama3-8b-instruct).
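A minimal sketch of the selector this comment describes, assuming the `v1alpha2` InferencePool API of the Gateway API Inference Extension. The API version, pool name, target port, and `extensionRef` name are assumptions and may differ in this repository; only the `app` label comes from the diff.

```yaml
apiVersion: inference.networking.x-k8s.io/v1alpha2   # assumed API version
kind: InferencePool
metadata:
  name: vllm-llama3-8b-instruct       # assumed pool name
spec:
  targetPortNumber: 8000              # assumed serving port
  selector:
    app: vllm-llama3-8b-instruct      # the only label the pool needs to match
  extensionRef:
    name: endpoint-picker             # placeholder endpoint picker name (assumption)
```

With a selector like this, pods are matched on the `app` label alone, so dropping `ai-aware-router-pod: "true"` from the pod template should not change which pods belong to the pool.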
@oglok it would be good to see a revised version, rebased on upstream/dev with the agreed changes (assuming the replica count was not already bumped and the label cleared in another PR). If that was already done elsewhere, please close.
Add a second vLLM-sim instance to test routing policies.