[YUNIKORN-2068] E2E Test for Preemption #705

Status: Closed · wants to merge 27 commits

Commits (27) — changes shown from 1 commit:
b83bdd5  [YUNIKORN-2068] E2E Test for Preemption (rrajesh-cloudera, Oct 26, 2023)
3830426  [YUNIKORN-2068] E2E Test for Preemption (rrajesh-cloudera, Oct 26, 2023)
72ef0d8  [YUNIKORN-2068] E2E Test for Preemption (rrajesh-cloudera, Oct 30, 2023)
f3e8b57  [YUNIKORN-2068] E2E Test for Preemption (rrajesh-cloudera, Oct 30, 2023)
46eb327  [YUNIKORN-2068] E2E Test for Preemption (rrajesh-cloudera, Oct 30, 2023)
856b629  [YUNIKORN-2068] E2E Test for Preemption (rrajesh-cloudera, Oct 31, 2023)
7e22832  [YUNIKORN-2068] E2E Test for Preemption (rrajesh-cloudera, Nov 2, 2023)
353bf51  [YUNIKORN-2068] E2E Test for Preemption (rrajesh-cloudera, Nov 2, 2023)
2206722  [YUNIKORN-2068] E2E Test for Preemption (rrajesh-cloudera, Nov 2, 2023)
c972734  [YUNIKORN-2068] E2E Test for Preemption (rrajesh-cloudera, Nov 2, 2023)
daac46d  [YUNIKORN-2068] E2E Test for Preemption (rrajesh-cloudera, Nov 2, 2023)
93ab76e  [YUNIKORN-2068] E2E Test for Preemption (rrajesh-cloudera, Nov 3, 2023)
d916cd9  [YUNIKORN-2068] E2E Test for Preemption (rrajesh-cloudera, Nov 3, 2023)
bca7152  [YUNIKORN-2068] E2E Test for Preemption (rrajesh-cloudera, Nov 3, 2023)
b30f5ec  [YUNIKORN-2068] E2E Test for Preemption (rrajesh-cloudera, Nov 3, 2023)
726c154  [YUNIKORN-2068] E2E Test for Preemption (rrajesh-cloudera, Nov 6, 2023)
ad8e7eb  [YUNIKORN-2068] E2E Test for Preemption (rrajesh-cloudera, Nov 7, 2023)
c4f17b0  [YUNIKORN-2068] E2E Test for Preemption (rrajesh-cloudera, Nov 8, 2023)
ce0351b  [YUNIKORN-2068] E2E Test for Preemption (rrajesh-cloudera, Nov 8, 2023)
fae4a0f  [YUNIKORN-2068] E2E Test for Preemption (rrajesh-cloudera, Nov 8, 2023)
d2d1130  [YUNIKORN-2068] E2E Test for Preemption (rrajesh-cloudera, Nov 8, 2023)
5fbc97f  [YUNIKORN-2068] E2E Test for Preemption (rrajesh-cloudera, Nov 8, 2023)
c86acb1  [YUNIKORN-2068] E2E Test for Preemption (rrajesh-cloudera, Nov 9, 2023)
d85fa9a  cdh kli (rrajesh-cloudera, Nov 9, 2023)
87494ce  [YUNIKORN-2068] E2E Preemption tests (rrajesh-cloudera, Nov 21, 2023)
7a30f8e  [YUNIKORN-2068] E2E Preemption tests (rrajesh-cloudera, Nov 21, 2023)
461cff1  [YUNIKORN-2068] E2E Preemption tests (rrajesh-cloudera, Nov 27, 2023)
Showing commit 87494ce9102b89be7aec2d4cad432ba5b6d187b4: [YUNIKORN-2068] E2E Preemption tests
rrajesh-cloudera committed Nov 21, 2023
29 changes: 18 additions & 11 deletions test/e2e/preemption/preemption_test.go
@@ -23,24 +23,24 @@ import (
"strings"
"time"

"github.com/onsi/ginkgo/v2"
"github.com/onsi/gomega"
v1 "k8s.io/api/core/v1"
schedulingv1 "k8s.io/api/scheduling/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

"github.com/apache/yunikorn-core/pkg/common/configs"
"github.com/apache/yunikorn-k8shim/pkg/common/constants"
tests "github.com/apache/yunikorn-k8shim/test/e2e"
"github.com/apache/yunikorn-k8shim/test/e2e/framework/helpers/common"
"github.com/apache/yunikorn-k8shim/test/e2e/framework/helpers/k8s"
"github.com/apache/yunikorn-k8shim/test/e2e/framework/helpers/yunikorn"
"github.com/onsi/ginkgo/v2"
"github.com/onsi/gomega"
v1 "k8s.io/api/core/v1"
schedulingv1 "k8s.io/api/scheduling/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

var kClient k8s.KubeCtl
var restClient yunikorn.RClient
var ns *v1.Namespace
var dev = "dev" + common.RandSeq(5)
+var devNew = "dev" + common.RandSeq(5)
var oldConfigMap = new(v1.ConfigMap)
var annotation = "ann-" + common.RandSeq(10)

@@ -587,20 +587,23 @@ var _ = ginkgo.Describe("Preemption", func() {
return nil
})

+		newNamespace, err := kClient.CreateNamespace(devNew, nil)
+		gomega.Ω(err).NotTo(gomega.HaveOccurred())
+		gomega.Ω(newNamespace.Status.Phase).To(gomega.Equal(v1.NamespaceActive))
// Define sleepPod
sleepPodConfigs := createSandbox1SleepPodCofigsWithStaticNode(3, 600)
sleepPod4Config := k8s.SleepPodConfig{Name: "sleepjob4", NS: dev, Mem: sleepPodMemLimit, Time: 600, Optedout: k8s.Allow, Labels: map[string]string{"queue": "root.sandbox2"}, RequiredNode: nodeName}
sleepPod4Config := k8s.SleepPodConfig{Name: "sleepjob4", NS: devNew, Mem: sleepPodMemLimit, Time: 600, Optedout: k8s.Allow, Labels: map[string]string{"queue": "root.sandbox2"}, RequiredNode: nodeName}
sleepPodConfigs = append(sleepPodConfigs, sleepPod4Config)

for _, config := range sleepPodConfigs {
ginkgo.By("Deploy the sleep pod " + config.Name + " to the development namespace")
sleepObj, podErr := k8s.InitSleepPod(config)
Ω(podErr).NotTo(gomega.HaveOccurred())
-			sleepRespPod, podErr := kClient.CreatePod(sleepObj, dev)
+			sleepRespPod, podErr := kClient.CreatePod(sleepObj, devNew)
gomega.Ω(podErr).NotTo(gomega.HaveOccurred())

// Wait for pod to move to running state
-			podErr = kClient.WaitForPodBySelectorRunning(dev,
+			podErr = kClient.WaitForPodBySelectorRunning(devNew,
fmt.Sprintf("app=%s", sleepRespPod.ObjectMeta.Labels["app"]),
120)
gomega.Ω(podErr).NotTo(gomega.HaveOccurred())
@@ -609,7 +612,7 @@ var _ = ginkgo.Describe("Preemption", func() {
// assert one of the pods in root.sandbox1 is preempted
ginkgo.By("One of the pods in root.sanbox1 is preempted")
sandbox1RunningPodsCnt := 0
-		pods, err := kClient.ListPodsByLabelSelector(dev, "queue=root.sandbox1")
+		pods, err := kClient.ListPodsByLabelSelector(devNew, "queue=root.sandbox1")
gomega.Ω(err).NotTo(gomega.HaveOccurred())
for _, pod := range pods.Items {
if pod.DeletionTimestamp != nil {
@@ -620,6 +623,10 @@ var _ = ginkgo.Describe("Preemption", func() {
}
}
Ω(sandbox1RunningPodsCnt).To(gomega.Equal(2), "One of the pods in root.sandbox1 should be preempted")
+		errNew := kClient.DeletePods(newNamespace.Name)
Contributor:
Already performed inside AfterEach().

Contributor (Author):
AfterEach() only cleans up the global namespace that is created for all the tests. This test creates its own namespace, so cleanup is done at the end of the test function to ensure no resources are left behind. I tried using the global namespace but ran into issues, so the namespace is created inside the test itself and cleaned up once execution is done.

Contributor:
If you need a separate namespace for each test case, that should be done in BeforeEach/AfterEach. If something fails before this code runs, the namespace will never be cleaned up.

Example:

ginkgo.BeforeEach(func() {
		kubeClient = k8s.KubeCtl{}
		gomega.Expect(kubeClient.SetClient()).To(gomega.BeNil())
		ns = "ns-" + common.RandSeq(10)
		ginkgo.By(fmt.Sprintf("Creating namespace: %s for admission controller tests", ns))
		var ns1, err1 = kubeClient.CreateNamespace(ns, nil)
		gomega.Ω(err1).NotTo(gomega.HaveOccurred())
		gomega.Ω(ns1.Status.Phase).To(gomega.Equal(v1.NamespaceActive))
	})

...

	ginkgo.AfterEach(func() {
		ginkgo.By("Tear down namespace: " + ns)
		err := kubeClient.TearDownNamespace(ns)
		gomega.Ω(err).NotTo(gomega.HaveOccurred())
		// call the healthCheck api to check scheduler health
		ginkgo.By("Check YuniKorn's health")
		checks, err2 := yunikorn.GetFailedHealthChecks()
		gomega.Ω(err2).ShouldNot(gomega.HaveOccurred())
		gomega.Ω(checks).Should(gomega.Equal(""), checks)
	})

+		if errNew != nil {
+			fmt.Fprintf(ginkgo.GinkgoWriter, "Failed to delete pods in namespace %s - reason is %s\n", newNamespace.Name, err.Error())
+		}
})

ginkgo.AfterEach(func() {
@@ -652,7 +659,7 @@ func createSandbox1SleepPodCofigs(cnt, time int) []k8s.SleepPodConfig {
func createSandbox1SleepPodCofigsWithStaticNode(cnt, time int) []k8s.SleepPodConfig {
Contributor:
"WithRequiredNode" or "WithNodeSelector" sounds better.

Contributor (Author):
Acknowledged.

sandbox1Configs := make([]k8s.SleepPodConfig, 0, cnt)
for i := 0; i < cnt; i++ {
-		sandbox1Configs = append(sandbox1Configs, k8s.SleepPodConfig{Name: fmt.Sprintf("sleepjob%d", i+1), NS: dev, Mem: sleepPodMemLimit2, Time: time, Optedout: k8s.Allow, Labels: map[string]string{"queue": "root.sandbox1"}, RequiredNode: nodeName})
+		sandbox1Configs = append(sandbox1Configs, k8s.SleepPodConfig{Name: fmt.Sprintf("sleepjob%d", i+1), NS: devNew, Mem: sleepPodMemLimit2, Time: time, Optedout: k8s.Allow, Labels: map[string]string{"queue": "root.sandbox1"}, RequiredNode: nodeName})
Contributor:
This part is very suspicious. You're using the "RequiredNode" setting here, which will trigger the "simple" (aka. RequiredNode) preemption logic inside Yunikorn. That is not the generic preemption logic that Craig worked on.

Please check if the console log of Yunikorn contains this:

log.Log(log.SchedApplication).Info("Triggering preemption process for daemon set ask",
		zap.String("ds allocation key", ask.GetAllocationKey()))

If this is the case (which is VERY likely), changes are necessary.

A trivial solution is to enhance the RequiredNode field: it has to become a struct like:

type RequiredNode struct {
  Node      string
  DaemonSet bool
}

type SleepPodConfig struct {
...
	Mem          int64
	RequiredNode RequiredNode 
	Optedout     AllowPreemptOpted
}

If DaemonSet == false, then this code doesn't run:

owner := metav1.OwnerReference{APIVersion: "v1", Kind: constants.DaemonSetType, Name: "daemonset job", UID: "daemonset"}
owners = []metav1.OwnerReference{owner}

So this line effectively becomes:

sandbox1Configs = append(sandbox1Configs,
	k8s.SleepPodConfig{
		Name:         fmt.Sprintf("sleepjob%d", i+1),
		NS:           devNew,
		Mem:          sleepPodMemLimit2,
		Time:         time,
		Optedout:     k8s.Allow,
		Labels:       map[string]string{"queue": "root.sandbox1"},
		RequiredNode: RequiredNode{Node: nodeName, DaemonSet: false},
	},
)

The existing tests inside simple_preemptor_test.go are affected.
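
For illustration, a minimal sketch of how the pod-init helper could branch on the proposed field. This is hypothetical: it assumes the RequiredNode struct above is added to the k8s helpers package and that the helper (here a parameter named conf) builds the owner references, following the owner-reference snippet quoted above:

var owners []metav1.OwnerReference
if conf.RequiredNode.DaemonSet {
	// Only daemon-set-owned pods should exercise the "simple"
	// (RequiredNode) preemption path inside YuniKorn.
	owner := metav1.OwnerReference{APIVersion: "v1", Kind: constants.DaemonSetType, Name: "daemonset job", UID: "daemonset"}
	owners = []metav1.OwnerReference{owner}
}
// With DaemonSet == false, owners stays nil, the pod is treated as a
// regular workload, and the generic preemption logic applies instead.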

Contributor (@pbacsko, Dec 5, 2023):

Have you looked at this? Again, the tests should address YUNIKORN-2068. RequiredNodePreemptor does not use the predicates.

}
return sandbox1Configs
}