Starting process should ignore Pending state pods and start the next one in the rack #696
Comments
There are no separate seed nodes in our system; the seed labels are assigned by an algorithm to available nodes, such as the first one to start. The system should always have a minimum of 3 seed nodes (if there are 3 or more pods available), and they are set when the pod has reached the Started state. How did you resume the stopped system so that seed labels ended up on nodes that are in the Pending state?
@burmanm Thanks for the explanation. I double checked and there are no seed-node labels on the unready Pods. Only one Pod came up and that's the only Pod with the seed-node label. This may not be a scheduling issue. I just set the Running the
while the currently scheduled Pods have the following IPs:
It seems that the old Pods' IPs still exist in the membership list?
This is as it should be: we only label seeds when they come up. The definition of a seed isn't strict, so it can be any node; we use a service with a label selector to find the nodes that we consider suitable as seeds at that point. They can change as pods go down and come up.

The situation you have here is probably not related to the seed nodes themselves, but to the fact that you happen to have Pending pods in a certain order. Our starting process keeps a running index with which it tries to balance the racks so that each has an equal number of nodes up, in the order 0->n during the starting phase. Because some of your racks have Pending pods at arbitrary positions in that order, the process waits for those pods to be scheduled (the operator can't know from the scheduler that they're not coming up). Solving this correctly would require information from the scheduler, or a workaround to ignore Pending pods.

I'll change the topic of this issue to indicate that we need to ignore Pending pods in our starting scheduler. However, I think for now the other workaround is to use

As for Stopped processing (I'm not sure if this was clear), the cluster membership states are not changed. Cassandra will remember all the nodes it had before Stopped was set to True. IPs might change, but the nodeIDs will remain the same.
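To make the ordering problem described above concrete, here is a minimal Python sketch of a rack-balanced start order with and without skipping Pending pods. It is not the operator's actual code (cass-operator is written in Go); the `Pod` type, its fields, and `next_pod_to_start` are hypothetical names invented for illustration, and the balancing rule is a simplified reading of the explanation in this thread.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Pod:
    """Hypothetical, simplified view of a Cassandra pod (not an operator type)."""
    name: str
    rack: str
    index: int             # ordinal of the pod within its rack
    phase: str             # Kubernetes pod phase, e.g. "Pending" or "Running"
    started: bool = False  # True once Cassandra has been started on the pod


def next_pod_to_start(pods: List[Pod], skip_pending: bool = True) -> Optional[Pod]:
    """Pick the next pod to start, balancing started nodes across racks.

    Racks with the fewest started nodes are considered first, and pods inside
    a rack are visited in index order 0 -> n. With skip_pending=False, a
    Pending pod at the head of the order blocks the whole process, which is
    the behaviour reported in this issue. With skip_pending=True, Pending
    pods are passed over and the next schedulable pod is chosen.
    """
    racks: Dict[str, List[Pod]] = {}
    for pod in pods:
        racks.setdefault(pod.rack, []).append(pod)

    # Least-started rack first; rack name breaks ties deterministically.
    ordered = sorted(
        racks.items(),
        key=lambda item: (sum(p.started for p in item[1]), item[0]),
    )

    for _, rack_pods in ordered:
        for pod in sorted(rack_pods, key=lambda p: p.index):
            if pod.started:
                continue
            if pod.phase == "Pending":
                if skip_pending:
                    continue  # ignore unscheduled pods and keep looking
                return None   # strict order: wait for this pod to be scheduled
            return pod
    return None


pods = [
    Pod("r1-0", "r1", 0, "Pending"),
    Pod("r1-1", "r1", 1, "Running"),
    Pod("r2-0", "r2", 0, "Running", started=True),
]
strict_choice = next_pod_to_start(pods, skip_pending=False)
relaxed_choice = next_pod_to_start(pods, skip_pending=True)
```

In this example the strict order returns nothing because `r1-0` is Pending and sits first in the least-started rack, while the relaxed order moves on to `r1-1`. A real fix would also have to decide when to revisit the skipped pods once the scheduler places them.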
What happened?
We are trying to resume a stopped Cassandra cluster. The cluster may not have enough resources to schedule all the Pods, and we found that in some cases the seed nodes in racks may be in the pending state while other nodes are scheduled but cannot start correctly because of the lack of a seed node.
What did you expect to happen?
The operator should prioritize seed nodes when creating the Cassandra pods. If the seed node is not scheduled, the entire rack is defunct.
How can we reproduce it (as minimally and precisely as possible)?
cass-operator version
1.22.0
Kubernetes version
1.29.1
Method of installation
Helm
Anything else we need to know?
No response
┆Issue is synchronized with this Jira Story by Unito
┆Issue Number: CASS-1