[LI-HOTFIX] Avoid assigning replicas to the preferred controllers or maintenance brokers #111

gitlw · 2021-01-07T17:52:16Z

If we expand partitions using the AdminZkClient, there is currently no logic to avoid assigning replicas to preferred controllers.
This PR fixes the issue in the following two places:

1. It avoids assigning replicas to preferred controllers and maintenance brokers within the AdminZkClient.
2. In case someone is using a non-patched client and still assigns replicas to preferred controllers, the KafkaController will call rearrangePartitionReplicaAssignmentForNewPartitions to remove the preferred controllers from the assignment.
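
For readers who want a concrete picture of the two places described above, here is a minimal sketch. It is not the code in this PR: `AssignmentSketch`, `assignReplicas`, and `rearrange` are hypothetical names, brokers are plain integer ids, and simple round-robin placement stands in for whatever placement logic the AdminZkClient actually uses.

```scala
object AssignmentSketch {
  // Client-side idea: drop excluded brokers before placing any replica.
  // Hypothetical helper, not the AdminZkClient API.
  def assignReplicas(
      allBrokers: Seq[Int],
      preferredControllers: Set[Int],
      maintenanceBrokers: Set[Int],
      numPartitions: Int,
      replicationFactor: Int): Map[Int, Seq[Int]] = {
    val excluded = preferredControllers ++ maintenanceBrokers
    val eligible = allBrokers.filterNot(excluded.contains).sorted
    require(eligible.size >= replicationFactor,
      s"Only ${eligible.size} eligible brokers for replication factor $replicationFactor")
    // Round-robin placement over the eligible brokers only.
    (0 until numPartitions).map { p =>
      p -> (0 until replicationFactor).map(r => eligible((p + r) % eligible.size))
    }.toMap
  }

  // Controller-side fallback idea: replace excluded replicas in an assignment
  // that an unpatched client already wrote. Replacements are appended at the end.
  def rearrange(replicas: Seq[Int], allBrokers: Seq[Int], excluded: Set[Int]): Seq[Int] = {
    val kept = replicas.filterNot(excluded.contains)
    val spares = allBrokers.filterNot(b => excluded.contains(b) || kept.contains(b))
    kept ++ spares.take(replicas.size - kept.size)
  }
}
```

Because `rearrange` appends replacements at the end of the list, the first replica (the preferred leader) can change when the original first replica was excluded. That is a choice of this sketch, not a statement about how the PR orders replicas.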

Testing Strategy:
A unit test is added to ensure the problem doesn't happen again.

Committer Checklist (excluded from commit message)

Verify design and implementation
Verify test coverage and CI build status
Verify documentation (including upgrade notes)

xiowu0 · 2021-01-07T20:02:00Z

core/src/main/scala/kafka/controller/KafkaController.scala

+        val topicsToBeRearranged = zkClient.getPartitionAssignmentForTopics(topicsToCheck.toSet).filter {
+          case (topic, partitionMap) =>
+            val existingAssignment = controllerContext.partitionAssignments.getOrElse(topic, mutable.Map.empty)
+            val newPartitions = partitionMap.filter{case (partitionId, _) => partitionId >= existingAssignment.size}


Could we add a safety check here to ensure the partition state znode doesn't exist for these new partitions? Although I don't see an issue in the current implementation, it would be very dangerous if this function were used incorrectly. For example, if we call "rearrangePartitionReplicaAssignmentForNewPartitions(topics, false)" when initializing the controller context, all topics may get reassigned since controllerContext.partitionAssignments is empty (this would also result in orphan partitions).
Again, I don't see the issue in this implementation, but I think it is worth checking.

In addition, this doesn't address the case where the controller moves right after partition expansion (before the rearrangement actually happens). I think that is OK since we are not trying to guarantee that preferred controllers never hold replicas. Maybe we can add a comment about this unhandled situation.

Thanks for the comments. It's true that they would all be treated as new partitions when controllerContext.partitionAssignments is empty, but they shouldn't have a replica on the noNewPartitionBrokers, so they shouldn't be rearranged.

It's a good point that the current implementation cannot handle controller switches. Given it's safe to scan all topics' partitions on controller switch, I think it should be done during a controller switch. Thoughts?

Manual assignment for existing topics can still assign partitions to noNewPartitionBrokers, so I think we cannot guarantee that noNewPartitionBrokers won't get replicas. In addition, there is a small time window in which new replicas can still get assigned to preferred controllers, because the preferred controller znode is an ephemeral znode (see the design doc for more detail).

I think a safer way is to rely on the existence of partition state znode when performing rearrangement.
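
To make the difference between the two rules in this thread concrete, here is a small sketch. It assumes partition ids are integers, and `hasStateZnode` is a hypothetical stand-in for a ZooKeeper existence check on the partition state znode; neither function is code from this PR.

```scala
object NewPartitionSketch {
  // Index-based rule from the diff: a partition counts as "new" when its id
  // is not below the number of partitions the controller already tracks.
  def newByIndex(fromZk: Map[Int, Seq[Int]], existingCount: Int): Set[Int] =
    fromZk.keySet.filter(_ >= existingCount)

  // Guarded rule suggested in review: additionally require that the partition
  // has no state znode yet. `hasStateZnode` is a hypothetical lookup.
  def newByIndexAndState(
      fromZk: Map[Int, Seq[Int]],
      existingCount: Int,
      hasStateZnode: Int => Boolean): Set[Int] =
    newByIndex(fromZk, existingCount).filterNot(hasStateZnode)
}
```

With an empty controller context (`existingCount = 0`), `newByIndex` returns every partition of the topic, which is the misuse scenario raised above. The guarded variant keeps only partitions that have never had state written for them.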

scan all topics' partitions on controller switch => If we cannot guarantee 100% no replicas on the preferred controllers, I think it is OK not to perform special handling during a controller switch, given the overhead and additional hacky code needed (up to you to make a decision).

On closer look, I found that there is already logic to handle the controller switchover by calling the rearrangePartitionReplicaAssignmentForNewPartitions method inside initializeControllerContext.

Regarding the preferred controller znodes being ephemeral, it's kinda an orthogonal design issue that we could address independently.

xiowu0 · 2021-05-27T21:24:09Z

core/src/main/scala/kafka/controller/KafkaController.scala

@@ -1755,6 +1758,7 @@ class KafkaController(val config: KafkaConfig,
    }

    if (!isActive) return
+    rearrangePartitionReplicaAssignmentForNewPartitions(immutable.Set(topic))


Does it conflict with partition reassignment?
Say a partition gets reassigned to replicas (1, 2, 5, 6). If one of these replicas is a maintenance broker, would the reassignment complete if we automatically change the ZK node to disallow placing replicas on maintenance brokers?

That's a good point. I've updated the PR to avoid assigning replicas to the undesirable hosts during partition reassignment. Please take another look. Thanks!

xiowu0 · 2021-05-28T21:37:58Z

core/src/main/scala/kafka/controller/KafkaController.scala

      reassignments.foreach { case (tp, targetReplicas) =>
-        if (replicasAreValid(tp, targetReplicas)) {
+        if (replicasAreValid(tp, targetReplicas, noNewPartitionBrokers)) {


There is still a race condition here, because a broker can be marked as a maintenance broker after the partition reassignment request is received but before the reassignment completes.
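
The race can be illustrated with a point-in-time check. This is a sketch only: `targetIsAcceptable` is a hypothetical function, not the `replicasAreValid` from the diff, and the rules it applies (non-empty, no duplicates, no excluded broker) are assumptions made for illustration.

```scala
object ReassignmentCheckSketch {
  // Hypothetical validation of a reassignment target against the set of
  // brokers that should not receive replicas at the moment of the check.
  def targetIsAcceptable(targetReplicas: Seq[Int], noNewPartitionBrokers: Set[Int]): Boolean =
    targetReplicas.nonEmpty &&
      targetReplicas.distinct.size == targetReplicas.size &&
      !targetReplicas.exists(noNewPartitionBrokers.contains)

  def main(args: Array[String]): Unit = {
    val target = Seq(1, 2, 5, 6)
    val atRequestTime = Set.empty[Int] // no broker is in maintenance yet
    val beforeCompletion = Set(5)      // broker 5 is marked while the move is in flight
    println(targetIsAcceptable(target, atRequestTime))    // true
    println(targetIsAcceptable(target, beforeCompletion)) // false
  }
}
```

The same target passes when the request arrives and would fail if checked again before completion, so validating only at request time cannot close the window described above.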

efeg · 2021-05-29T01:04:02Z

Quick Update on Review: Had an offline chat with @gitlw regarding whether we may want to address the second point (i.e. 2. In case someone is using a non-patched client and still assigns replicas to preferred controllers, the KafkaController will call rearrangePartitionReplicaAssignmentForNewPartitions to remove the preferred controllers from the assignment.) in another way. We will chat more about the pros and cons (cc @xiowu0).

gitlw requested a review from xiowu0 January 7, 2021 17:54

xiowu0 reviewed Jan 7, 2021

gitlw added 4 commits May 26, 2021 15:49

Fixing bug so that replicas won't be assigned to maintenance brokers or preferred controllers

0536dc7

Adding test to ensure the bug doesn't happen again

3f16983

Added test and fixes on the broker side to rearrange assignments during partition expansion

caddfe1

Check supplied topics without considering they are new or not

38e9116

gitlw force-pushed the no_replica_on_preferred_controller branch from 0126dbe to 38e9116 Compare May 26, 2021 23:14

xiowu0 reviewed May 27, 2021

gitlw added 2 commits May 27, 2021 15:21

Reject reassigning partitions to unassignable brokers

0597342

Adding test to ensure partition reassignment don't like on maintenance brokers

821a23e

gitlw changed the title ~~[LI-HOTFIX] Avoid assigning replicas to the preferred controllers~~ [LI-HOTFIX] Avoid assigning replicas to the preferred controllers or maintenance brokers May 28, 2021

xiowu0 reviewed May 28, 2021

gitlw commented Jan 7, 2021

xiowu0 Jan 7, 2021

gitlw Jan 8, 2021

xiowu0 Jan 8, 2021

gitlw May 26, 2021

xiowu0 May 27, 2021

gitlw May 28, 2021

xiowu0 May 28, 2021

efeg commented May 29, 2021
