-
Notifications
You must be signed in to change notification settings - Fork 166
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deploy TPU Webhook to ray-system Namespace and Remove RayCluster Label Selectors #354
Conversation
…RayCluster interception
applications/ray/kuberay-tpu-webhook/deployments/mutate-pod-webhook-cfg.yaml
Outdated
Show resolved
Hide resolved
applications/ray/kuberay-tpu-webhook/deployments/validate-raycluster-webhook-cfg.yaml
Outdated
Show resolved
Hide resolved
applications/ray/kuberay-tpu-webhook/deployments/mutate-pod-webhook-cfg.yaml
Outdated
Show resolved
Hide resolved
@ryanaoleary can we follow some of the guidance around webhooks mentioned here: https://cloud.google.com/kubernetes-engine/docs/how-to/optimize-webhooks Specifically this section about excluding |
Done, I changed it to exclude those two namespaces. |
/gcbrun |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@ryanaoleary since we don't have automated tests, can you document manual steps you took to test this change?
Done. |
- kuberay-tpu-webhook.default.svc | ||
- kuberay-tpu-webhook.default.svc.cluster.local | ||
- kuberay-tpu-webhook.ray-system.svc | ||
- kuberay-tpu-webhook.ray-system.svc.cluster.local | ||
issuerRef: | ||
name: selfsigned-issuer | ||
secretTemplate: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(the reflection annotations below can be remnoved now)
This PR changes the namespace used to deploy the webhook, service, and certificate to
ray-system
rather thandefault
, creating theray-system
namespace withkubectl create namespace
if it does not already exist. This PR also separates the/validate
operations the webhook performs into two configs:validate-pod-
andvalidate-raycluster-
.validate-pod-
continues to select objects based on the automatically appliedapp.kubernetes.io/name: kuberay
label, whilevalidate-raycluster-
now intercepts any RayCluster creation requests to check whether they request TPUs.In a follow-up PR:
I plan to create a helm-chart for the webhook and have it installed in the same namespace alongside
kuberay-operator
in modules/kuberay-operator/kuberay.tf. Users will have the option of manually deploying the webhook using the Makefile or creating it through the Terraform in applications/ray whenenable_tpu = true
.Related Issues:
#306
#307
#308
Testing Strategy:
ray-system
, deploying a multi-host2x2x2
RayCluster with 2 replicas (causing 4 pods to scale up), and checking that the webhook correctly injected all pod affinities, labels, and env vars usingkubectl describe
.