Add e2e tests for manageJobsWithoutQueueName #4112
base: main
Conversation
✅ Deploy Preview for kubernetes-sigs-kueue canceled.
/assign @mimowo PTAL! Thanks! 😊
Force-pushed from 8cbae20 to 49bd6c4
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: kaisoz
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Force-pushed from 49bd6c4 to 7b42d2f
Thank you for the PR! The new testing suite will be great for making sure the configuration works, which is really tricky and has required lots of manual testing until now.
My main hesitation is about the way we run the new suite; as pointed out in the comment, I would be leaning towards a dedicated CI job. However, let me collect more feedback. I'm ok with the (2.) or (3.) approaches (from the comment) as interim.
func ApplyKueueConfiguration(ctx context.Context, k8sClient client.Client, kueueCfg *configapi.Configuration) {
	configMap := &corev1.ConfigMap{}
	kcmKey := types.NamespacedName{Namespace: "kueue-system", Name: "kueue-manager-config"}
	config, _ := yaml.Marshal(kueueCfg)
don't silence errors
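For example, a minimal sketch of surfacing the error instead (assuming gomega is already available in this test/util helper, as it is elsewhere in the package):

	config, err := yaml.Marshal(kueueCfg)
	gomega.ExpectWithOffset(1, err).NotTo(gomega.HaveOccurred())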
// Ensure that the rollout has started by waiting for the deployment to be unavailable
g.Expect(deployment.Status.Conditions).To(gomega.ContainElement(gomega.BeComparableTo(
	appsv1.DeploymentCondition{Type: appsv1.DeploymentAvailable, Status: corev1.ConditionFalse},
	cmpopts.IgnoreFields(appsv1.DeploymentCondition{}, "Reason", "Message", "LastUpdateTime", "LastTransitionTime")),
))
I would suggest not checking Available=False, because it could get racy if the restart takes very little time.
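One race-free alternative would be to wait until the Deployment has observed the new spec generation and is fully updated and available again, rather than trying to catch the transient Available=False. A sketch only (deploymentKey is an assumed name; not what the PR currently does):

gomega.Eventually(func(g gomega.Gomega) {
	g.Expect(k8sClient.Get(ctx, deploymentKey, deployment)).To(gomega.Succeed())
	// The deployment controller has seen the updated spec...
	g.Expect(deployment.Status.ObservedGeneration).To(gomega.BeNumerically(">=", deployment.Generation))
	// ...and all replicas have been replaced and are serving again.
	g.Expect(deployment.Status.UpdatedReplicas).To(gomega.Equal(ptr.Deref(deployment.Spec.Replicas, 1)))
	g.Expect(deployment.Status.Conditions).To(gomega.ContainElement(gomega.BeComparableTo(
		appsv1.DeploymentCondition{Type: appsv1.DeploymentAvailable, Status: corev1.ConditionTrue},
		cmpopts.IgnoreFields(appsv1.DeploymentCondition{}, "Reason", "Message", "LastUpdateTime", "LastTransitionTime")),
	))
}, util.LongTimeout, util.Interval).Should(gomega.Succeed())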
"sigs.k8s.io/kueue/test/util" | ||
) | ||
|
||
var _ = ginkgo.Describe("ManageJobsWithoutQueueName", ginkgo.Ordered, func() { |
We need to make sure this is part of a suite.
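For instance, a minimal suite entry point could look like this sketch (the package, suite, and function names are assumptions, following the pattern of the other e2e suites):

package customconfig

import (
	"testing"

	"github.com/onsi/ginkgo/v2"
	"github.com/onsi/gomega"
)

// TestCustomConfig wires the Describe blocks of this package into a runnable ginkgo suite.
func TestCustomConfig(t *testing.T) {
	gomega.RegisterFailHandler(ginkgo.Fail)
	ginkgo.RunSpecs(t, "CustomConfig e2e suite")
}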
config := util.GetKueueConfiguration(ctx, k8sClient)
config.ManageJobsWithoutQueueName = true
util.ApplyKueueConfiguration(ctx, k8sClient, config)
util.RestartKueueController(ctx, k8sClient)
Actually, I'm not sure what the best approach is here: restarting Kueue adds overhead, and the tests need to be sequential.
Generally, I see three approaches to test different configurations:
- Have a dedicated CI job (as we currently have for singlecluster and multicluster).
- Have two e2e targets which create independent kind clusters, also sequentially. Here is a good example of how the first setup for kueue and multikueue was done: https://github.com/kubernetes-sigs/kueue/pull/1360/files#diff-76ed074a9305c04054cdebb9e9aad2d818052b07091de1f20cad0bbac34ffb52
- Restart Kueue (as done here).
Also, (3.) is not great when we want to run the tests in isolation, because it would always restart Kueue on the cluster, which adds extra overhead.
So, my preference is ultimately (1.) I think, with (2.) and potentially (3.) as interim steps. Let me check if others have an opinion: @mbobrovskyi @mszadkow @dgrove-oss @PBundyra ?
I think (1.) + (2.) is a better solution. We just need to create a separate kustomize file for this case, with a patch that changes the manageJobsWithoutQueueName field.
On the other hand, we may want to test more e2e configurations in the future. So I have an idea to mix the strategies and have a new suite, customconfigs, where we could test custom configurations and restart Kueue between them. I believe that in practice we will have only a handful of tests specific to each configuration, and the main body of tests will still be in the singlecluster suite. This way we would avoid creating a CI job per config, while also offloading the main run. wdyt? For now I would suggest putting the e2e tests in this PR under e2e/customconfig/managedwithoutqueuename. wdyt?
@@ -151,6 +154,15 @@ run-test-tas-e2e-%: FORCE
	./hack/e2e-test.sh
	$(PROJECT_DIR)/bin/ginkgo-top -i $(ARTIFACTS)/$@/e2e.json > $(ARTIFACTS)/$@/e2e-top.yaml

run-test-queuename-e2e-%: K8S_VERSION = $(@:run-test-queuename-e2e-%=%)
run-test-queuename-e2e-%: FORCE
	@echo Running e2e for k8s ${K8S_VERSION}
Suggested change:
-	@echo Running e2e for k8s ${K8S_VERSION}
+	@echo Running without queue name e2e for k8s ${K8S_VERSION}
@@ -151,6 +154,15 @@ run-test-tas-e2e-%: FORCE
	./hack/e2e-test.sh
	$(PROJECT_DIR)/bin/ginkgo-top -i $(ARTIFACTS)/$@/e2e.json > $(ARTIFACTS)/$@/e2e-top.yaml

run-test-queuename-e2e-%: K8S_VERSION = $(@:run-test-queuename-e2e-%=%)
Suggested change:
-	run-test-queuename-e2e-%: K8S_VERSION = $(@:run-test-queuename-e2e-%=%)
+	run-test-withoutqueuename-e2e-%: K8S_VERSION = $(@:run-test-queuename-e2e-%=%)
Maybe like this?
	@echo Running e2e for k8s ${K8S_VERSION}
	E2E_KIND_VERSION="kindest/node:v$(K8S_VERSION)" KIND_CLUSTER_NAME=$(KIND_CLUSTER_NAME) CREATE_KIND_CLUSTER=$(CREATE_KIND_CLUSTER) \
		ARTIFACTS="$(ARTIFACTS)/$@" IMAGE_TAG=$(IMAGE_TAG) GINKGO_ARGS="$(GINKGO_ARGS)" \
		JOBSET_VERSION=$(JOBSET_VERSION) \
Do we need to prepare jobset?
visibilityClient             visibilityv1beta1.VisibilityV1beta1Interface
impersonatedVisibilityClient visibilityv1beta1.VisibilityV1beta1Interface
visibilityClient = util.CreateVisibilityClient("")
impersonatedVisibilityClient = util.CreateVisibilityClient("system:serviceaccount:kueue-system:default")
	util.ExpectObjectToBeDeleted(ctx, k8sClient, defaultRf, true)
})

ginkgo.It("should suspend it", func() {
Can we join it with "should unsuspend it"? Looks like the logic is the same.
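Something along these lines, for example (a sketch only; step bodies are elided and names are assumed):

ginkgo.It("should suspend the job and unsuspend it once a queue name is set", func() {
	var testJob *batchv1.Job
	ginkgo.By("creating an unsuspended job without a queue name", func() {
		// create testJob and expect Kueue to suspend it
	})
	ginkgo.By("setting the queue-name label", func() {
		// add kueue.x-k8s.io/queue-name=main and update the job
	})
	ginkgo.By("checking the job is unsuspended and its workload admitted", func() {
		// assert Suspend=false and that the workload has a non-nil Admission
	})
})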
ginkgo.It("should suspend it", func() {
	var testJob *batchv1.Job
	ginkgo.By("creating a job without queue name", func() {
Could you please emphasize that the job is unsuspended?
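For example (a sketch; the job wrapper builder and its Suspend method are assumed to be the ones used elsewhere in the e2e tests):

ginkgo.By("creating an unsuspended job without a queue name", func() {
	testJob = testingjob.MakeJob("test-job", ns.Name).Suspend(false).Obj() // names assumed
	gomega.Expect(k8sClient.Create(ctx, testJob)).To(gomega.Succeed())
})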
	}
	createdJob.Labels["kueue.x-k8s.io/queue-name"] = "main"
	return k8sClient.Update(ctx, createdJob)
}, util.LongTimeout, util.Interval).Should(gomega.Succeed())
Do we need long timeouts here and below?
gomega.Eventually(func(g gomega.Gomega) {
	g.Expect(k8sClient.Get(ctx, jobLookupKey, createdJob)).Should(gomega.Succeed())
	g.Expect(ptr.Deref(createdJob.Spec.Suspend, false)).To(gomega.BeFalse())
}, util.LongTimeout, util.Interval).Should(gomega.Succeed())
ditto
gomega.Eventually(func(g gomega.Gomega) {
	g.Expect(k8sClient.Get(ctx, wlLookupKey, createdWorkload)).Should(gomega.Succeed())
	g.Expect(createdWorkload.Status.Admission).ShouldNot(gomega.BeNil())
}, util.LongTimeout, util.Interval).Should(gomega.Succeed())
ditto
waitForAvailableStart := time.Now()
util.WaitForKueueAvailability(ctx, k8sClient)
util.WaitForJobSetAvailability(ctx, k8sClient)
Do we need JobSet in this suite?
If we need to create custom configurations with the ability to run different setups, I suggest moving the creation and deletion logic of Kueue to the Go part instead of running it in the bash script. We can start Kueue in BeforeSuite and delete it in AfterSuite, allowing each test suite to run with the required configuration.
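A rough sketch of that idea (the install/uninstall helpers and the suiteConfiguration variable below are hypothetical, not existing util functions):

var _ = ginkgo.BeforeSuite(func() {
	// Deploy Kueue with the configuration this suite needs (hypothetical helper).
	util.InstallKueueWithConfiguration(ctx, k8sClient, suiteConfiguration)
	util.WaitForKueueAvailability(ctx, k8sClient)
})

var _ = ginkgo.AfterSuite(func() {
	// Remove Kueue so the next suite can deploy its own configuration (hypothetical helper).
	util.UninstallKueue(ctx, k8sClient)
})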
Can you elaborate a bit on why? It might be a good idea, but I'm not sure how this relates to the PR. Also, would it affect the singlecluster and multicluster suites? And is this an alternative to the (1.)-(3.) ideas, or within one of them? EDIT: IIUC, moving the kubectl apply to BeforeSuite introduces overhead, so I would prefer to avoid it for the main singlecluster suite, which is often used against an already running cluster with Kueue.
+1
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
This PR adds e2e tests for the manageJobsWithoutQueueName: true configuration. This is one of the tasks required to close #3767.

Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
The test suite sets manageJobsWithoutQueueName: true during the setup phase of the description block (just before running the test specs). It then initiates a rollout of the controller Deployment to ensure that the new configuration is picked up by the pods. To confirm that the rollout has started, the setup function waits for the Deployment to have the condition DeploymentAvailable=False. To ensure that this condition is met, the rollout strategy is set to Recreate, which guarantees that the pods are terminated before being recreated, so the condition is temporarily not satisfied (see the DeploymentConditionTypes docs).

Does this PR introduce a user-facing change?
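For illustration, the rollout-detection part of that flow might look roughly like the sketch below (the deployment name, lookup key, and timeouts are assumptions, and the Recreate strategy is presumed to be set in the manifests rather than in test code):

deployment := &appsv1.Deployment{}
key := types.NamespacedName{Namespace: "kueue-system", Name: "kueue-controller-manager"} // name assumed
// With strategy=Recreate the old pod is terminated first, so Available briefly flips to False.
gomega.Eventually(func(g gomega.Gomega) {
	g.Expect(k8sClient.Get(ctx, key, deployment)).To(gomega.Succeed())
	g.Expect(deployment.Status.Conditions).To(gomega.ContainElement(gomega.BeComparableTo(
		appsv1.DeploymentCondition{Type: appsv1.DeploymentAvailable, Status: corev1.ConditionFalse},
		cmpopts.IgnoreFields(appsv1.DeploymentCondition{}, "Reason", "Message", "LastUpdateTime", "LastTransitionTime")),
	))
}, util.LongTimeout, util.Interval).Should(gomega.Succeed())

// Once the rollout has started, wait for Kueue to become available again before running the specs.
util.WaitForKueueAvailability(ctx, k8sClient)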