CA: refactor utils related to NodeInfos #7479

towca · 2024-11-07T20:51:59Z

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

This is a part of Dynamic Resource Allocation (DRA) support in Cluster Autoscaler. There were multiple very similar utils related to copying and sanitizing NodeInfos scattered around the CA codebase. Instead of adding similar DRA handling to all of them separately, they're consolidated into a single location that will be later adapted to handle DRA.

Which issue(s) this PR fixes:

The CA/DRA integration is tracked in kubernetes/kubernetes#118612, this is just part of the implementation.

Special notes for your reviewer:

The first commit in the PR is just a squash of #7466, and it shouldn't be a part of this review. The PR will be rebased on top of master after #7466 is merged.

This is intended to be a no-op refactor. It was extracted from #7350 after #7447, and #7466.

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

https://github.com/kubernetes/enhancements/blob/9de7f62e16fc5c1ea3bd40689487c9edc7fa5057/keps/sig-node/4381-dra-structured-parameters/README.md

towca · 2024-11-07T20:52:36Z

/assign @MaciekPytel
/assign @jackfrancis
/hold

DONOTSUBMIT

towca · 2024-11-19T14:14:22Z

/assign @BigDarkClown

DONOTSUBMIT

jackfrancis · 2024-11-19T20:28:00Z

cluster-autoscaler/core/static_autoscaler.go

@@ -34,7 +34,7 @@ import (
 	"k8s.io/autoscaler/cluster-autoscaler/core/scaledown/planner"
 	scaledownstatus "k8s.io/autoscaler/cluster-autoscaler/core/scaledown/status"
 	"k8s.io/autoscaler/cluster-autoscaler/core/scaleup"
-	orchestrator "k8s.io/autoscaler/cluster-autoscaler/core/scaleup/orchestrator"


DONOTSUBMIT

jackfrancis · 2024-11-21T19:55:08Z

cluster-autoscaler/processors/nodeinfosprovider/mixed_nodeinfos_processor.go

 			}
-			nodeInfo, err := simulator.BuildNodeInfoForNode(sanitizedNode, podsForNodes[node.Name], daemonsets, p.forceDaemonSets)
+			templateNodeInfo, caErr := simulator.TemplateNodeInfoFromExampleNodeInfo(nodeInfo, id, daemonsets, p.forceDaemonSets, taintConfig)
 			if err != nil {


should be if caErr != nil { here?

(also do we need to define a new caErr variable here instead of just re-using err?)

Ugh my bad, good catch!

As to why we need 2 variables:

GetNodeInfo returns the error interface, so err has type error.

TemplateNodeInfoFromExampleNodeInfo returns the errors.AutoscalerError interface.

We need to return the errors.AutoscalerError interface from Process().

We could technically assign the TemplateNodeInfoFromExampleNodeInfo error of type errors.AutoscalerError to err, since errors.AutoscalerError is a superset of error. But then we'd have to wrap it in errors.AutoscalerError right back to return it, which is IMO confusing.

ACK, all makes sense!

jackfrancis · 2024-11-21T20:36:16Z

cluster-autoscaler/simulator/framework/infos.go

+	for _, slice := range n.LocalResourceSlices {
+		newSlices = append(newSlices, slice.DeepCopy())
+	}
+	return NewNodeInfo(n.Node().DeepCopy(), newSlices, newPods...)


Because the NewNodeInfo constructor only sets a node object if the passed in node is not nil:

if node != nil { result.schedNodeInfo.SetNode(node) }

... invoking n.Node().DeepCopy(), inline like this might be (theoretically) subject to a nil pointer exception

nevermind, you can ignore this comment

// Node returns overall information about this node. func (n *NodeInfo) Node() *v1.Node { if n == nil { return nil } return n.node }

Yeah this should be okay because DeepCopy() gracefully handles nil receivers. Added a comment for clarity and more test cases to cover this scenario.

jackfrancis · 2024-11-21T20:38:10Z