-
Notifications
You must be signed in to change notification settings - Fork 56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[LI-HOTFIX] Avoid assigning replicas to the preferred controllers or maintenance brokers #111
base: 2.4-li
Are you sure you want to change the base?
Conversation
val topicsToBeRearranged = zkClient.getPartitionAssignmentForTopics(topicsToCheck.toSet).filter { | ||
case (topic, partitionMap) => | ||
val existingAssignment = controllerContext.partitionAssignments.getOrElse(topic, mutable.Map.empty) | ||
val newPartitions = partitionMap.filter{case (partitionId, _) => partitionId >= existingAssignment.size} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we add some safe check here to ensure the partition state znode doesn't exist for these new partitions. Although I don't see an issue in the current implementation, if this function is used incorrectly, it would be very dangerous. For example, if we use "rearrangePartitionReplicaAssignmentForNewPartitions(topics, false)" when initializing controller context, this may cause all topics to get reassigned since controllerContext.partitionAssignments is an empty set (this will also result in orphan partitions).
Again, I don't see the issue in this implementation, but I think it is worth checking.
In addition, this doesn't address the case when there is controller move right after partition expansion (before the rearrange actually happened). I think it is ok since we are not trying to handle 100% no replicas in the preferred controller. Maybe we can add some comments for this unhandled situations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the comments. It's true that they are all newPartitions given controllerContext.partitionAssignments is an empty set, but they shouldn't have a replica on the noNewPartitionBrokers, thus they shouldn't be rearranged.
It's a good point that the current implementation cannot handle controller switches. Given it's safe to scan all topics' partitions on controller switch, I think it should be done during a controller switch. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
manual assignment for existing topics can still assign partitions to noNewPartitionBrokers. I think we cannot guarantee that noNewPartitionBrokers won't get replicas. In addition, there are some small-time window that new replicas can still get assigned to preferred controllers due to the fact that preferred controller znode is emphermal znode (see the design doc for more detail).
I think a safer way is to rely on the existence of partition state znode when performing rearrangement.
scan all topics' partitions on controller switch => If we cannot guarantee 100% no replica in the preferred controllers, I think it is ok to not performing special handing during controller switch given the overhead/additional hacky code needed (up to you to make a decision).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Upon closer look, I find that there is already logic to handle the controller switch over by calling the rearrangePartitionReplicaAssignmentForNewPartitions
method inside initializeControllerContext
.
Regarding the preferred controller znodes being ephemeral, it's kinda an orthogonal design issue that we could address independently.
…or preferred controllers
…ng partition expansion
0126dbe
to
38e9116
Compare
@@ -1755,6 +1758,7 @@ class KafkaController(val config: KafkaConfig, | |||
} | |||
|
|||
if (!isActive) return | |||
rearrangePartitionReplicaAssignmentForNewPartitions(immutable.Set(topic)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does it conflict with partition reassignment?
say if a partition gets reassigned to replica ( 1, 2, 5, 6 ) ==> if one of this replica is maintenance brokers, would this reassignment complete if we automatically changing zk node to disallow placing replicas on maintenance brokers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point. I've updated the PR to avoid assignment replicas to the undesirable hosts during partition reassignment. Please take another look. Thanks!
reassignments.foreach { case (tp, targetReplicas) => | ||
if (replicasAreValid(tp, targetReplicas)) { | ||
if (replicasAreValid(tp, targetReplicas, noNewPartitionBrokers)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still have some race condition here, because a broker can be marked as maintenance brokers after partition reassignment request is received before the reassignment request is completed
Quick Update on Review: Had an offline chat with @gitlw regarding whether we may want to address the second point (i.e. |
If we expand partitions using the AdminZkClient, currently there is no logic to avoid assigning replicas to preferred controllers.
This PR fixes the issue in the following two places:
Testing Stretegy:
An unit test is added to ensure the problem doesn't happen again.
Committer Checklist (excluded from commit message)