We're trying to reduce AWS EC2 costs by automatically scaling the number of nodes down to 0 during times when the entire cluster is not needed, using an autoscaling_schedule (see e.g. https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/autoscaling_schedule). We have observed that, when a single node is started after no nodes at all have been available, the kube-scheduler may place the pods of a large number of Deployments on that node, leaving insufficient resources (in our case memory) for the pods of all DaemonSets to be started on that node. The situation was not rectified automatically after adding more nodes to the cluster. Instead, I had to cordon the first node and terminate some of its pods so that they were moved to another node.
It would be preferable if resources for DaemonSets were reserved before workloads that are not tied to specific nodes (Deployments, ReplicaSets, ...) are scheduled.
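To make the ordering problem concrete, here is a toy model of a single node's memory. It is only a sketch and not how kube-scheduler actually works: the `schedule` function, the pod names, and all memory figures are hypothetical, and it assumes the Deployment pods are placed before the DaemonSet pods, as we observed.

```python
def schedule(node_mib, deployment_pods, daemonset_pods, reserve_for_daemonsets):
    """Greedily place pods on one node by memory request (MiB).

    Toy model only: Deployment pods are placed first, DaemonSet pods second.
    With reserve_for_daemonsets=True, the sum of all DaemonSet requests is
    held back while Deployment pods are being placed.
    """
    reserved = sum(daemonset_pods.values()) if reserve_for_daemonsets else 0
    free = node_mib               # memory actually still free on the node
    budget = node_mib - reserved  # memory Deployment pods are allowed to use

    placed_deployments = []
    for name, request in deployment_pods.items():
        if request <= budget:
            budget -= request
            free -= request
            placed_deployments.append(name)

    placed_daemonsets = []
    for name, request in daemonset_pods.items():
        if request <= free:
            free -= request
            placed_daemonsets.append(name)

    return placed_deployments, placed_daemonsets


# Hypothetical figures: one 4 GiB node, five 1 GiB Deployment pods,
# and two 256 MiB DaemonSet pods.
deployments = {f"app-{i}": 1024 for i in range(5)}
daemonsets = {"log-agent": 256, "metrics-agent": 256}

without_reservation = schedule(4096, deployments, daemonsets, False)
with_reservation = schedule(4096, deployments, daemonsets, True)
```

In this model, the run without a reservation fills the node with four Deployment pods and leaves no room for either DaemonSet pod, which matches what we saw. With the reservation, only three Deployment pods fit on the first node and both DaemonSet pods start; the remaining Deployment pods would have to wait for additional nodes.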