Controller endlessly creates BEST_POSSIBLE nodes due to handleDelayedRebalanceMinActiveReplica #2971

Open
GrantPSpencer opened this issue Nov 26, 2024 · 0 comments
Labels
bug Something isn't working

Comments

@GrantPSpencer
Contributor

Describe the bug

Under certain circumstances the controller endlessly creates best possible nodes because handleDelayedRebalanceMinActiveReplica modifies the same underlying ResourceAssignments that back the in-memory bestPossibleAssignment. We persist a best possible assignment to ZK and the cache, and then handleDelayedRebalanceMinActiveReplica changes the cached mapping. As a result, the isBestPossibleChanged call in doPartialRebalance evaluates to true, because the cached map has been changed. Partial rebalance then persists its calculated assignment to memory and updates _bestPossibleVersion. The next emergencyRebalance run calculates the same assignment that is already stored in memory, but because _bestPossibleVersion was incremented, it still persists the mapping to ZK even though the assignments are identical.
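
The aliasing problem described above can be shown with a minimal, self-contained sketch. It uses plain Java maps and not the Helix classes. `AssignmentCache`, `computeAssignment`, and `adjustInPlace` are hypothetical stand-ins (the last one plays the role of handleDelayedRebalanceMinActiveReplica), and the version counter is left out. The defensive copy is only one possible mitigation, shown for contrast, not necessarily the fix that should be applied.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; none of these names are Helix classes.
class SharedAssignmentSketch {

    // Stand-in for the in-memory best possible cache.
    static final class AssignmentCache {
        private Map<String, Map<String, String>> cached = new HashMap<>();

        // Stand-in for isBestPossibleChanged: compares the cache to a candidate.
        boolean isChanged(Map<String, Map<String, String>> candidate) {
            return !cached.equals(candidate);
        }

        // Stores either the caller's map itself (aliasing) or a deep copy of it.
        void persist(Map<String, Map<String, String>> assignment, boolean copy) {
            cached = copy ? deepCopy(assignment) : assignment;
        }
    }

    static Map<String, Map<String, String>> deepCopy(Map<String, Map<String, String>> source) {
        Map<String, Map<String, String>> copy = new HashMap<>();
        source.forEach((partition, replicas) -> copy.put(partition, new HashMap<>(replicas)));
        return copy;
    }

    // Deterministic "rebalance": always yields the same partition -> replica map.
    static Map<String, Map<String, String>> computeAssignment() {
        Map<String, String> replicas = new HashMap<>();
        replicas.put("instance_1", "MASTER");
        replicas.put("instance_2", "SLAVE");
        Map<String, Map<String, String>> assignment = new HashMap<>();
        assignment.put("partition_0", replicas);
        return assignment;
    }

    // Mutates its argument in place, like the step that adds min-active replicas.
    static void adjustInPlace(Map<String, Map<String, String>> assignment) {
        assignment.get("partition_0").put("instance_3", "SLAVE");
    }

    // Returns true if an identical recomputed assignment is reported as changed.
    static boolean spuriousChangeDetected(boolean copyOnPersist) {
        AssignmentCache cache = new AssignmentCache();
        Map<String, Map<String, String>> first = computeAssignment();
        cache.persist(first, copyOnPersist);
        adjustInPlace(first); // silently alters the cache when it holds the same object
        return cache.isChanged(computeAssignment());
    }

    public static void main(String[] args) {
        System.out.println("shared reference: " + spuriousChangeDetected(false));
        System.out.println("defensive copy:   " + spuriousChangeDetected(true));
    }
}
```

With the shared reference, the recomputed assignment is reported as changed even though the rebalance output is identical. With the copy, it is not.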

To Reproduce

#2970

  1. Add a WAGED resource and set max offline instances allowed
  2. Disable instances in the cluster until it automatically enters maintenance mode
  3. Re-enable enough instances that the cluster would no longer automatically enter maintenance mode, and add new instances to the cluster
  4. Exit maintenance mode
  5. Pipelines will succeed, but new best_possible versions keep being created

Expected behavior

The controller should not endlessly create BEST_POSSIBLE nodes when the calculated assignment has not changed.

@GrantPSpencer GrantPSpencer added the bug Something isn't working label Nov 26, 2024