Support draining DaemonSet pods using sriov devices #840

SchSeba · 2025-02-09T18:53:29Z

With this commit we also take care of removing DaemonSet-owned pods that use SR-IOV devices.

We only do this when a drain is requested; we don't do it for reboot requests.

github-actions · 2025-02-09T18:53:42Z

Thanks for your PR,
To run vendors CIs, Maintainers can use one of:

/test-all: To run all tests for all vendors.
/test-e2e-all: To run all E2E tests for all vendors.
/test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs, Maintainers can use one of:

/skip-all: To skip all tests for all vendors.
/skip-e2e-all: To skip all E2E tests for all vendors.
/skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
Best regards.

coveralls · 2025-02-09T19:00:23Z

Pull Request Test Coverage Report for Build 13370907002

Details

20 of 54 (37.04%) changed or added relevant lines in 1 file are covered.
2 unchanged lines in 1 file lost coverage.
Overall coverage increased (+0.05%) to 47.344%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
pkg/drain/drainer.go	20	54	37.04%

Files with Coverage Reduction	New Missed Lines	%
pkg/drain/drainer.go	2	59.33%

Totals
Change from base Build 13109399258:	0.05%
Covered Lines:	7282
Relevant Lines:	15381

💛 - Coveralls

bn222 · 2025-02-10T09:07:37Z

pkg/drain/drainer.go

+		// remove pods that are owned by a DaemonSet and use SR-IOV devices
+		dsPodsList := getDsPodsToRemove(podList)
+		for _, pod := range dsPodsList {
+			err = d.kubeClient.CoreV1().Pods(pod.Namespace).Delete(ctx, pod.Name, metav1.DeleteOptions{})


Shouldn't we block here, waiting to ensure that the pod is fully removed before continuing? A pod that's slow to delete might cause a race condition.

Yes, I switched to the drain from the Kubernetes drain helper; it does that :)
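
To illustrate the two steps discussed in this thread (selecting DaemonSet-owned pods that use SR-IOV devices, then blocking until a deleted pod is really gone), here is a minimal standard-library sketch. It is not the code from this PR: the `Pod` type is a simplified stand-in for `corev1.Pod`, the `openshift.io/` resource prefix is an assumption, and `dsPodsToRemove`, `waitForDeletion`, and `errTimeout` are hypothetical names. The real change relies on the Kubernetes drain helper for the waiting part.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"time"
)

// Pod is a simplified, hypothetical stand-in for corev1.Pod.
type Pod struct {
	Namespace, Name string
	OwnerKinds      []string // kinds taken from the pod's owner references
	Resources       []string // resource names requested by its containers
}

// sriovPrefix is an assumed resource-name prefix, used only for illustration.
const sriovPrefix = "openshift.io/"

// errTimeout is returned when the pod is still present after the timeout.
var errTimeout = errors.New("timed out waiting for pod deletion")

// dsPodsToRemove returns the pods that are owned by a DaemonSet and request
// at least one resource whose name starts with sriovPrefix.
func dsPodsToRemove(pods []Pod) []Pod {
	var out []Pod
	for _, p := range pods {
		if ownedByDaemonSet(p) && usesSriov(p) {
			out = append(out, p)
		}
	}
	return out
}

func ownedByDaemonSet(p Pod) bool {
	for _, kind := range p.OwnerKinds {
		if kind == "DaemonSet" {
			return true
		}
	}
	return false
}

func usesSriov(p Pod) bool {
	for _, name := range p.Resources {
		if strings.HasPrefix(name, sriovPrefix) {
			return true
		}
	}
	return false
}

// waitForDeletion polls exists until it reports false, returns an error,
// or the timeout elapses. It blocks the caller, which is the behaviour
// requested in the review comment.
func waitForDeletion(timeout, interval time.Duration, exists func() (bool, error)) error {
	deadline := time.Now().Add(timeout)
	for {
		found, err := exists()
		if err != nil {
			return err
		}
		if !found {
			return nil
		}
		if time.Now().After(deadline) {
			return errTimeout
		}
		time.Sleep(interval)
	}
}

func main() {
	pods := []Pod{
		{Namespace: "default", Name: "ds-sriov", OwnerKinds: []string{"DaemonSet"}, Resources: []string{"openshift.io/nic"}},
		{Namespace: "default", Name: "ds-plain", OwnerKinds: []string{"DaemonSet"}, Resources: []string{"cpu"}},
		{Namespace: "default", Name: "rs-sriov", OwnerKinds: []string{"ReplicaSet"}, Resources: []string{"openshift.io/nic"}},
	}
	for _, p := range dsPodsToRemove(pods) {
		remaining := 2 // pretend the pod stays visible for two polls
		err := waitForDeletion(time.Second, time.Millisecond, func() (bool, error) {
			remaining--
			return remaining >= 0, nil
		})
		fmt.Println(p.Namespace+"/"+p.Name, "wait finished, err =", err)
	}
}
```

In a real controller the `exists` callback would query the API server for the pod, and a "not found" response would count as deleted.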

adrianchiris · 2025-02-18T11:59:19Z

pkg/drain/drainer.go

+
+		// on full drain there is no need to try and remove pods that are owned by DaemonSets
+		// as we are going to reboot the node in any case.
+		if fullNodeDrain {


Do we care? (Just thinking about how to simplify.) Why not always remove DS pods from the node if they have SR-IOV resources?

ykulazhenkov · 2025-02-20T13:52:31Z

test/conformance/tests/test_sriov_operator.go

@@ -2376,6 +2538,18 @@ func waitForPodRunning(p *corev1.Pod) *corev1.Pod {
 	return ret
 }

+func waitForDaemonReady(d *appsv1.DaemonSet) *appsv1.DaemonSet {


nit: waitForDaemonSetReady

ykulazhenkov · 2025-02-20T13:56:05Z

pkg/drain/drainer.go

-		reqLogger.Info("drainNode(): Draining failed, retrying", "error", err)
-		return false, nil
+
+		err = d.removeDaemonSetsFromNode(ctx, node.Name)


I'm not entirely clear on how this works. I understand the process of selecting and removing DS pods, but I'm unsure how we prevent the DS controller from restarting pods on the node we intend to drain. Could you clarify?

bn222 reviewed Feb 10, 2025

SchSeba force-pushed the drain_daemon branch 2 times, most recently from 2140f89 to c20fba0 Compare February 17, 2025 13:03

adrianchiris reviewed Feb 18, 2025

adrianchiris reviewed Feb 18, 2025

Support draining daemonset pods that use sriov devices

cd2fb78

With this commit we also take care of removing DaemonSet-owned pods using SR-IOV devices. We only do it when a drain is requested; we don't do it for reboot requests.

Signed-off-by: Sebastian Sch <[email protected]>

SchSeba force-pushed the drain_daemon branch from c20fba0 to cd2fb78 Compare February 19, 2025 12:15

ykulazhenkov reviewed Feb 20, 2025