add validation code to check replicas for quorum loss #102
base: main
Conversation
Could we instead be relying on Redpanda to apply back pressure to …
Oh interesting, that sounds like another problem, one associated with replication of partitions of topics. My understanding of the problem was losing quorum of the admin API. But this also sounds legit, though I think Raft is also implemented on the partitions, so you could replicate to 5 and scale down to 4 and still have quorum over your partitions. I suggest that if we also want to account for replication of topics, we treat it as a separate check to avoid conflating the two issues. The latter check probably belongs in the decommission controller, though, since there you have visibility over the scaling. Note that the decommission controller is a clean-up process: it does not stop you from scaling Redpanda, but cleans up after the scaling by decommissioning, on your behalf, brokers that have already been turned off. Here, the best we can do for partitions is try to ensure that quorum can be recovered based on the desired partition replicas (the largest of the minimums required for quorum wins), essentially avoiding shooting yourself in the foot.
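For illustration only (not code from this PR), a minimal Go sketch of the "largest of the min required for quorum wins" rule: the partition with the highest replication factor determines how many brokers must remain after a scale-down. The function name and inputs are hypothetical.

```go
package main

import "fmt"

// minBrokersForPartitionQuorum returns the smallest broker count that still
// lets every partition reach a Raft majority, given each partition's
// replication factor: a group of R voters needs floor(R/2)+1 of them alive,
// so the partition with the largest replication factor sets the floor.
func minBrokersForPartitionQuorum(replicationFactors []int) int {
	minNeeded := 0
	for _, r := range replicationFactors {
		if needed := r/2 + 1; needed > minNeeded {
			minNeeded = needed
		}
	}
	return minNeeded
}

func main() {
	// A topic replicated to 5 brokers keeps quorum with 3 of them alive,
	// so scaling from 5 to 4 brokers is fine, but scaling to 2 is not.
	fmt.Println(minBrokersForPartitionQuorum([]int{3, 5})) // 3
}
```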
Gotcha! As long as the decommission controller is a clean-up process, we'll have to lean on validation. Ideally, we'd be able to run decommission on however many nodes ahead of changing the StatefulSet's number of replicas. My fear with validation is that we'll accidentally lock ourselves out of some operations in a disaster-recovery scenario. We can always add a backdoor in the future if need be, though.
I agree with @chrisseto. The decommission process is the right place to get a signal from core Redpanda that scaling down is not possible. In the operator v1 logic, decommission is performed before scaling down, not after. That way we never break producers that need to satisfy the topic replication factor.
That seems outside the scope of what was said on the ticket. I can add the logic here if you want and have two places where we take care of quorum issues, no problem. The first part, already in this PR, basically ensures that the admin API does not lose quorum; this was discussed in a private conversation with @RafalKorepta. The second part will consider quorum based on the partitions of topics and will live in the decommission controller; there, the best we can do is not decommission nodes if doing so would push any partition out of quorum, and requeue instead, giving the user the opportunity to fix the issue.
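A hedged sketch of the requeue-instead-of-decommission flow described above; `decideDecommission`, its inputs, and the stand-in result type are hypothetical, and a real reconciler would return controller-runtime's `ctrl.Result` and query the admin API for partition state instead.

```go
package main

import (
	"fmt"
	"time"
)

// reconcileDecision is a stand-in for controller-runtime's ctrl.Result:
// either proceed with decommissioning, or requeue and try again later.
type reconcileDecision struct {
	requeueAfter time.Duration
	reason       string
}

// decideDecommission checks every partition's replication factor against the
// broker count that would remain after the scale-down. If any partition would
// be pushed out of quorum, it asks for a requeue instead of decommissioning,
// so the user can fix the replica count or topic configuration first.
func decideDecommission(brokersAfter int, replicationFactors []int) reconcileDecision {
	for _, r := range replicationFactors {
		if needed := r/2 + 1; brokersAfter < needed {
			return reconcileDecision{
				requeueAfter: 30 * time.Second,
				reason:       fmt.Sprintf("replication factor %d needs at least %d brokers", r, needed),
			}
		}
	}
	return reconcileDecision{} // safe to decommission
}

func main() {
	// Scaling down to 2 brokers would break the 5-replica partition.
	if d := decideDecommission(2, []int{3, 5}); d.requeueAfter > 0 {
		fmt.Println("requeue:", d.reason)
	}
}
```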
I'm not sure I understand what you are trying to say. By "here" do you mean the Redpanda controller? P.S. Could you add a test for this feature?
"Here" means this PR; for the P.S., read the TODO.
The tests validate that the Redpanda custom resource is working. I'm afraid that, due to the very simple values, the post-install or post-upgrade Helm jobs would not give us confidence that the Redpanda cluster itself is working correctly. The Redpanda controller does not verify the health of the cluster; it only cares about Helm chart install/upgrade success.
In the Buildkite CI I saw a few errors:
The K8S Operator v2 Helm test suite is failing all over the place, as it cannot install the Redpanda Helm chart.
Checking if quorum will be lost if the replica count changes.
TODO:
https://github.com/orgs/redpanda-data/projects/54/views/8?pane=issue&itemId=56191515
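For illustration, a hedged sketch (an assumed shape, not the actual PR code) of the replica-count validation described above: a scale-down is rejected if the desired broker count would no longer form a Raft majority of the current voter set, which would leave the controller/admin API without quorum.

```go
package main

import "fmt"

// quorumWouldBeLost reports whether dropping from currentReplicas to
// desiredReplicas leaves fewer brokers than a Raft majority of the current
// voter set, which would make the controller/admin API unavailable.
func quorumWouldBeLost(currentReplicas, desiredReplicas int32) bool {
	if desiredReplicas >= currentReplicas {
		return false // scaling up or no change never loses quorum
	}
	majority := currentReplicas/2 + 1
	return desiredReplicas < majority
}

func main() {
	fmt.Println(quorumWouldBeLost(5, 3)) // false: 3 of 5 is still a majority
	fmt.Println(quorumWouldBeLost(5, 2)) // true: 2 of 5 cannot form quorum
}
```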