Deployment e2e tests for MultiKueue #4103

Open · wants to merge 1 commit into base: main

Conversation

@Bobbins228
Contributor

Bobbins228 commented Jan 30, 2025

What type of PR is this?

/kind feature

What this PR does / why we need it:

This PR is the second part of enabling Deployments for MultiKueue, with #4034 being part 1.

Which issue(s) this PR fixes:

Part of #3802

Special notes for your reviewer:

@mimowo Seeing as Deployments do not have workload support, we are unable to create an adapter and utilize SyncJob to create a remote Deployment and sync its status that way.

On the other hand, as the Pods progress on the manager cluster, the Deployment's status is updated as if they were running on a single cluster.
Another note is a concern about which cluster the Pods are created in. In this implementation the Pods can run on any cluster and are not run as a group on a single cluster. Is this something we should be wary of?
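
For illustration, a minimal sketch of that status check, assuming the suite's existing deployment, ctx and k8sManagerClient variables:

ginkgo.By("Waiting for the Deployment status on the manager cluster to reflect the running Pods", func() {
	gomega.Eventually(func(g gomega.Gomega) {
		createdDeployment := &appsv1.Deployment{}
		// The Deployment object lives only on the manager cluster; its status should
		// converge once the pod-integration Workloads are admitted and the Pods run.
		g.Expect(k8sManagerClient.Get(ctx, client.ObjectKeyFromObject(deployment), createdDeployment)).To(gomega.Succeed())
		g.Expect(createdDeployment.Status.ReadyReplicas).To(gomega.Equal(*deployment.Spec.Replicas))
	}).Should(gomega.Succeed())
})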

If we are happy with this implementation I can remove the WIP.

Does this PR introduce a user-facing change?

NONE

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. labels Jan 30, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Bobbins228
Once this PR has been reviewed and has the lgtm label, please assign mimowo for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 30, 2025
@k8s-ci-robot
Contributor

Hi @Bobbins228. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jan 30, 2025

netlify bot commented Jan 30, 2025

Deploy Preview for kubernetes-sigs-kueue canceled.

🔨 Latest commit: 00a6bc6
🔍 Latest deploy log: https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/679cac69bfe3a10009ec2b16

@mimowo
Contributor

mimowo commented Jan 30, 2025

@Bobbins228 being able to execute the whole Deployment on a dedicated worker cluster would be great. One con I see is that scaling will probably not work.

One question that comes to my mind first: how do we discriminate which mode of operation a user wants to use (via Pods or via the full workload)? Should we discriminate based on a Deployment annotation, or do you have some alternative ideas?

@Bobbins228
Contributor Author

@mimowo

One question that comes to my mind first: how do we discriminate which mode of operation a user wants to use (via Pods or via the full workload)? Should we discriminate based on a Deployment annotation, or do you have some alternative ideas?

Can you elaborate a bit more on this point? Thanks

@mimowo
Contributor

mimowo commented Jan 30, 2025

Let me try - I might also be missing something here :) .

So, with #4034 I believe Deployments are sort of supported, with workloads at the level of Pods. On worker clusters we only have the Pods created. This is the "first" mode of operation. With this mode of operation I assume scaling the Deployment works OOTB.

The "second" mode of operation is to create a workload at the Deployment level on the management cluster. Then the entire Deployment could be copied onto the worker cluster and manage the Pods there. IIUC, in this mode of operation scaling would not work OOTB (but we may check, and add an e2e test). I suppose we would need extra code to update the Deployment on a worker.

Now, IIUC, with this PR you enable the "second" mode of operation. However, it may have a downside compared to (1). So I'm wondering if a user could still have an option to choose (1) if they need scaling? However, we could probably make scaling work with (2) as well, so maybe we don't need two modes...

@Bobbins228
Contributor Author

@mimowo
For now we are looking into mode 1; we can create a follow-on PR for mode 2 if that is acceptable?

Just off the top of my head, for the mode 2 PR we could check the owner references of Pods and ensure all Pods created through Deployments are scheduled on a specific cluster (sketched below). We could potentially allow a user the option to change which mode they want to use through a "MultiKueue" configuration parameter in the Kueue manager config. WDYT?
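
For the owner-reference idea, a hypothetical sketch (not code from this PR) of how Pods created through a Deployment could be detected:

// Hypothetical helper (imports: context, appsv1 "k8s.io/api/apps/v1",
// corev1 "k8s.io/api/core/v1", metav1 "k8s.io/apimachinery/pkg/apis/meta/v1",
// "sigs.k8s.io/controller-runtime/pkg/client").
// Walks Pod -> ReplicaSet -> Deployment via controller owner references.
func ownedByDeployment(ctx context.Context, c client.Client, pod *corev1.Pod) (bool, error) {
	owner := metav1.GetControllerOf(pod)
	if owner == nil || owner.Kind != "ReplicaSet" {
		return false, nil
	}
	rs := &appsv1.ReplicaSet{}
	if err := c.Get(ctx, client.ObjectKey{Namespace: pod.Namespace, Name: owner.Name}, rs); err != nil {
		return false, err
	}
	rsOwner := metav1.GetControllerOf(rs)
	return rsOwner != nil && rsOwner.Kind == "Deployment", nil
}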

@mimowo
Contributor

mimowo commented Jan 30, 2025

For now we are looking into mode 1,

I see, I was confused by this line https://github.com/kubernetes-sigs/kueue/pull/4103/files#diff-5c2c085d3754c50f6e899f937ef8113a2fc81685d7130b5bf88d1a37d2f26d67R58. Can you elaborate on why we need support for prebuilt workloads for Deployments to make it work in "mode 1"?

we can create a follow on PR for mode 2 if that is acceptable?

I think mode 2 would make sense too, but we need a bit more detail, so I would suggest a KEP to discuss the details, such as the configuration for mode 1 vs mode 2. It would be ideal if you could support it with some use cases you have. One thing is making sure that all Pods land on the same worker, I suppose. Also, scalability could be better, as we don't need to create the dummy Pods in the management cluster.

We could potentially allow a user the option to change which mode they want to use through a "MultiKueue" configuration parameter in the Kueue manager config. WDYT?

Yeah, I think we could start with the global config option. Making it granular per workload seems possible, but could get messy.

@Bobbins228
Contributor Author

@mimowo

I see, I was confused by this line https://github.com/kubernetes-sigs/kueue/pull/4103/files#diff-5c2c085d3754c50f6e899f937ef8113a2fc81685d7130b5bf88d1a37d2f26d67R58. Can you elaborate on why we need support for prebuilt workloads for Deployments to make it work in "mode 1"?

My bad, I have removed the added validation.

I think mode 2 would make sense too, but we may get a bit more details, so I would suggest a KEP to discuss the details, such as configuration for mode1 vs mode2. And ideal if you could support it with some use-cases you have. One thing is making sure that all Pods land on the same worker I suppose. Also, scalability could be better as we don't need to create the dummy pods in the management cluster.

A KEP would be great; I can make an issue for it.

Yeah, I think we could start with the global config option. Making it granular per workload seems possible, but could get messy.

+1, we would also need a better naming convention than "mode 1", "mode 2" :)

Updating this PR to cover just the e2e tests for mode 1. I am going to make it more robust by also including a test for scaling up the Deployment and ensuring more Workloads are created, along the lines of the sketch below.
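
A rough sketch of that scale-up step, assuming the suite's existing k8sManagerClient, deployment and managerNs variables plus the kueue and ptr imports; the replica and Workload counts are illustrative:

ginkgo.By("Scaling up the Deployment and checking that additional Workloads are created", func() {
	gomega.Eventually(func(g gomega.Gomega) {
		createdDeployment := &appsv1.Deployment{}
		g.Expect(k8sManagerClient.Get(ctx, client.ObjectKeyFromObject(deployment), createdDeployment)).To(gomega.Succeed())
		createdDeployment.Spec.Replicas = ptr.To[int32](4)
		g.Expect(k8sManagerClient.Update(ctx, createdDeployment)).To(gomega.Succeed())
	}).Should(gomega.Succeed())

	gomega.Eventually(func(g gomega.Gomega) {
		workloads := &kueue.WorkloadList{}
		g.Expect(k8sManagerClient.List(ctx, workloads, client.InNamespace(managerNs.Name))).To(gomega.Succeed())
		// In "mode 1" there is one pod-integration Workload per Deployment Pod,
		// so scaling to 4 replicas should eventually yield 4 Workloads in this namespace.
		g.Expect(workloads.Items).To(gomega.HaveLen(4))
	}).Should(gomega.Succeed())
})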

@Bobbins228 Bobbins228 changed the title [WIP] Deployment support for MultiKueue [WIP] Deployment e2e tests for MultiKueue Jan 30, 2025
@Bobbins228 Bobbins228 changed the title [WIP] Deployment e2e tests for MultiKueue Deployment e2e tests for MultiKueue Jan 30, 2025
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. and removed do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Jan 30, 2025
@mimowo
Contributor

mimowo commented Jan 30, 2025

cc @mszadkow Could you make the first pass?

@mszadkow
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 30, 2025

for _, pod := range pods.Items {
// Ensure all 4 local pods should be in "Running" phase
gomega.Eventually(func(g gomega.Gomega) {
Contributor

I think the Pods shouldn't be scheduled on the management cluster,
thus you need to observe their status on the worker.

Contributor

In order to know which worker they will land on, it's good to request resources of a certain type and amount that are available on only one of the workers.

Contributor Author

Unfortunately this does not work with Pods created through Deployments.
I tested this by setting the Deployment resources to 1.5 CPUs, expecting that all Pods would be admitted to worker1, but this was not the case.

Kueue will schedule the Pods on any cluster with the necessary quota available, per mode 1 discussed above.
Luckily there is another way to know which Pods are scheduled where, as the Pods are assigned a nodeName of either kind-worker1-control-plane or kind-worker2-control-plane on the remote clusters. This method would be a bit convoluted, so I am looking for a better option.
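
Roughly what that would look like (a sketch only, assuming the suite's k8sWorker1Client/k8sWorker2Client and that the Pod keeps the same name and namespace on the remote cluster):

// Find which worker cluster a Deployment Pod actually landed on by asking each
// worker client for it; on the remote kind clusters the scheduled Pod carries a
// nodeName like "kind-worker1-control-plane" or "kind-worker2-control-plane".
findWorkerForPod := func(key client.ObjectKey) (client.Client, string) {
	for _, wc := range []client.Client{k8sWorker1Client, k8sWorker2Client} {
		remotePod := &corev1.Pod{}
		if err := wc.Get(ctx, key, remotePod); err == nil {
			return wc, remotePod.Spec.NodeName
		}
	}
	return nil, ""
}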

})

pods := &corev1.PodList{}
gomega.Expect(k8sManagerClient.List(ctx, pods, client.InNamespace(managerNs.Namespace),
Contributor

this should happen on the worker cluster

// Local pods should be in "Running" phase
gomega.Eventually(func(g gomega.Gomega) {
createdPod := &corev1.Pod{}
g.Expect(k8sManagerClient.Get(ctx, client.ObjectKey{Namespace: pod.Namespace, Name: pod.Name}, createdPod)).To(gomega.Succeed())
Contributor

this should happen on the worker cluster

ginkgo.By("Waiting for the deployment to get status updates", func() {
gomega.Eventually(func(g gomega.Gomega) {
createdDeployment := appsv1.Deployment{}
g.Expect(k8sManagerClient.Get(ctx, client.ObjectKeyFromObject(deployment), &createdDeployment)).To(gomega.Succeed())
Contributor

that's correct, we should see the status update on the Deployment in the manager cluster

@Bobbins228
Contributor Author

@mszadkow Thanks for the review, I'll get right on it

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Jan 31, 2025
@mszadkow
Contributor

I understand that there is no "finish line" for a Deployment, the Pods just keep running.
That's a new type of testing in e2e; I guess it's fine.
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 31, 2025
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: a4fd4cdbbdea1f8ffe5c30a480274251ebbc63eb

Comment on lines +347 to +352
pods = &corev1.PodList{}
gomega.Expect(k8sManagerClient.List(ctx, pods, client.InNamespace(managerNs.Namespace),
client.MatchingLabels(deployment.Spec.Selector.MatchLabels))).To(gomega.Succeed())

// Check all worker pods are in a running phase.
ensurePodWorkloadsRunning(pods, *managerNs, multiKueueAc)
Contributor

nit: actually, instead of the comment, could you wrap this in a ginkgo.By("Check all worker pods are in a running phase")?

Contributor Author

Within the function there is a ginkgo.By("Ensure Pod Workloads are created and Pods are Running on the worker cluster").
The comment is just for dev readability.

Contributor

Ah, I see, but still something feels off with the code being split outside and inside the helper function.

I would like to consider the following: move this code inside the function:

	pods = &corev1.PodList{}
	gomega.Expect(k8sManagerClient.List(ctx, pods, client.InNamespace(managerNs.Namespace),
		client.MatchingLabels(deployment.Spec.Selector.MatchLabels))).To(gomega.Succeed())

but keep the ginkgo.By outside. WDYT?

Contributor

(or make the ginkgo.By as the first line of the helper function)

Contributor Author

I like the first option better; it will make things more uniform. Thanks @mimowo
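
For reference, a sketch of that first option; the helper signature and parameter types are guesses based on the surrounding test code:

ginkgo.By("Check all worker pods are in a running phase", func() {
	ensurePodWorkloadsRunning(deployment, *managerNs, multiKueueAc)
})

// The helper now lists the Deployment's Pods itself; the ginkgo.By stays at the call site.
func ensurePodWorkloadsRunning(deployment *appsv1.Deployment, ns corev1.Namespace, ac *kueue.AdmissionCheck) {
	pods := &corev1.PodList{}
	gomega.Expect(k8sManagerClient.List(ctx, pods, client.InNamespace(ns.Name),
		client.MatchingLabels(deployment.Spec.Selector.MatchLabels))).To(gomega.Succeed())
	// ... existing per-Pod Workload and Running-phase checks on the worker cluster ...
}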

client.MatchingLabels(deployment.Spec.Selector.MatchLabels))).To(gomega.Succeed())

// Check all worker pods are in a running phase.
ensurePodWorkloadsRunning(pods, *managerNs, multiKueueAc)
Contributor

Maybe you could also add one more ginkgo.By - delete the Deployment and verify the Pods are gone?
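
For example (a sketch with the same assumptions as the rest of the test):

ginkgo.By("Deleting the Deployment and verifying the Pods are gone", func() {
	gomega.Expect(k8sManagerClient.Delete(ctx, deployment)).To(gomega.Succeed())
	gomega.Eventually(func(g gomega.Gomega) {
		pods := &corev1.PodList{}
		g.Expect(k8sManagerClient.List(ctx, pods, client.InNamespace(managerNs.Name),
			client.MatchingLabels(deployment.Spec.Selector.MatchLabels))).To(gomega.Succeed())
		g.Expect(pods.Items).To(gomega.BeEmpty())
	}).Should(gomega.Succeed())
})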

Comment on lines +887 to +892
// By checking the assigned cluster we can discern which client to use
admissionCheckMessage := workload.FindAdmissionCheck(createdLeaderWorkload.Status.AdmissionChecks, multiKueueAc.Name).Message
workerCluster := k8sWorker1Client
if strings.Contains(admissionCheckMessage, "worker2") {
workerCluster = k8sWorker2Client
}
Contributor

@mimowo Jan 31, 2025

nit: Right, this will work, but it relies on the assumption that we have only 2 workers. I would suggest introducing a helper function in the utils which gets the worker name, for reusability.

EDIT: for now maybe we can use a regex? Later I would suggest we add a dedicated API field, but that would be a separate enhancement.

Contributor Author

Regex sounds good, but in terms of pairing up the retrieved worker name with the appropriate worker client I am at a bit of a loss.

Unless we have a map of cluster names to clients that we can compare the given cluster name against?

Contributor

Ah, I see, yeah, we can add a map which we initialize along with the clients.
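
A sketch of how that could look; the map keys and the message matching below are assumptions based on the current two-worker e2e setup:

// Initialized alongside the worker clients in the e2e suite setup.
var workerClients = map[string]client.Client{
	"worker1": k8sWorker1Client,
	"worker2": k8sWorker2Client,
}

// The admission check message currently mentions the worker cluster name, so a
// simple pattern is enough for now; a dedicated API field would replace this later.
var workerNameRe = regexp.MustCompile(`worker\d+`)

// Usage: workerCluster := workerClientForAdmissionCheckMessage(admissionCheckMessage)
func workerClientForAdmissionCheckMessage(message string) client.Client {
	// Returns nil if the message does not mention a known worker.
	return workerClients[workerNameRe.FindString(message)]
}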

Comment on lines +299 to +304
pods := &corev1.PodList{}
gomega.Expect(k8sManagerClient.List(ctx, pods, client.InNamespace(managerNs.Namespace),
client.MatchingLabels(deployment.Spec.Selector.MatchLabels))).To(gomega.Succeed())

// Check all worker pods are in a running phase.
ensurePodWorkloadsRunning(pods, *managerNs, multiKueueAc)
Contributor

Also wrap this in a ginkgo.By step.

for _, pod := range pods.Items { // We want to test that all deployment pods have workloads.
createdLeaderWorkload := &kueue.Workload{}
wlLookupKey := types.NamespacedName{Name: workloadpod.GetWorkloadNameForPod(pod.Name, pod.UID), Namespace: managerNs.Name}
gomega.Eventually(func(g gomega.Gomega) {
Contributor

This Eventually does not seem needed, right?

Contributor

@mimowo left a comment

LGTM, just nits
