
Fix Huge Number of Watches in ZooKeeper #17482

Open: wants to merge 31 commits into base: master


GWphua (Contributor) commented Nov 15, 2024

Fixes #6647

Description

This PR builds on #6683 and #9172 and aims to reduce the number of ZooKeeper watches.

Fixed Huge Number of Watches in ZooKeeper

The current Announcer.java uses Apache Curator's PathChildrenCache. In its present form, the announcement mechanism watches the immediate parent of the specified path. As a result, all child nodes under the parent path are watched, including the sibling nodes of the specified path, not just the announced node itself. This produces an unnecessarily large number of ZooKeeper watches.

The new NodeAnnouncer.java class mirrors Announcer.java but uses NodeCache instead, so that only a single node is watched during announcement. By eliminating the watches on sibling nodes, this approach significantly reduces the total number of watches in ZooKeeper.
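To make the difference concrete, here is a back-of-the-envelope Java model rather than Curator code. It assumes (a simplification, not a measurement) that each PathChildrenCache on a parent holds one watch on the parent plus one watch per child, that each NodeCache holds one watch on its single node, and that every one of `n` processes announces exactly one node under the same shared parent. The class and method names are hypothetical.

```java
// Hypothetical, simplified watch-count model; not Curator or Druid code.
// Assumptions: PathChildrenCache on a parent = 1 parent watch + 1 watch per child;
// NodeCache = 1 watch on the single tracked node; n processes each announce
// one node under the same shared parent.
final class WatchCountModel {
    private WatchCountModel() {}

    /** Total watches when every process caches the shared parent (Announcer style). */
    static long parentCacheWatches(int processes) {
        // Each process watches the parent plus all n children (its own node and its siblings).
        return (long) processes * (processes + 1);
    }

    /** Total watches when every process watches only its own node (NodeAnnouncer style). */
    static long nodeCacheWatches(int processes) {
        return processes;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 100, 1000}) {
            System.out.printf("processes=%d parent-cache=%d node-cache=%d%n",
                n, parentCacheWatches(n), nodeCacheWatches(n));
        }
    }
}
```

Under these assumptions, the parent-cache approach grows quadratically with the number of processes sharing a parent while the node-cache approach grows linearly, which is consistent with the reduction described above.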

Tests on the production server also show a decrease in the ZooKeeper watch count after this change.

[Image: ZK watch count]

In addition, while the previous two PRs only partially replaced Announcer.java, this PR replaces all usages of Announcer.java (see the reason below). If the reviewers feel it is fine to remove the Announcer class entirely, I would be happy to do so.

Using the two announcer classes simultaneously can result in a KeeperException.NotEmptyException. This happens when nodes announced by the two classes share the same parent: since neither announcer has a full picture of that parent's children, the exception is thrown in the following sequence:

  1. Announcer removes all of the child nodes it tracks.
  2. Believing that the parent node now has no children, Announcer tries to remove the parent node.
  3. If NodeAnnouncer is still watching one or more child nodes under that parent, Announcer's attempt to remove the parent node fails with the exception.
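The failure sequence above can be reproduced with a small in-memory model. `FakeTree`, `TrackingAnnouncer`, and the local `NotEmptyException` are hypothetical stand-ins, not ZooKeeper, Curator, or Druid classes; the only ZooKeeper behaviour mirrored is that deleting a node which still has children fails. The parent path used is illustrative.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for KeeperException.NotEmptyException.
final class NotEmptyException extends RuntimeException {
    NotEmptyException(String path) { super("node still has children: " + path); }
}

// Minimal in-memory tree (parent path -> child names); not a real ZooKeeper client.
final class FakeTree {
    private final Map<String, Set<String>> children = new HashMap<>();

    void create(String parent, String child) {
        children.computeIfAbsent(parent, p -> new HashSet<>()).add(child);
    }

    void deleteChild(String parent, String child) {
        Set<String> kids = children.get(parent);
        if (kids != null) kids.remove(child);
    }

    // Like ZooKeeper, refuse to delete a node that still has children.
    void deleteParent(String parent) {
        if (!children.getOrDefault(parent, Set.of()).isEmpty()) {
            throw new NotEmptyException(parent);
        }
        children.remove(parent);
    }

    boolean exists(String parent) { return children.containsKey(parent); }
}

// An announcer that only knows about the children it created itself.
final class TrackingAnnouncer {
    private final FakeTree tree;
    private final Set<String> tracked = new HashSet<>();

    TrackingAnnouncer(FakeTree tree) { this.tree = tree; }

    void announce(String parent, String child) {
        tree.create(parent, child);
        tracked.add(child);
    }

    void stop(String parent) {
        for (String child : tracked) tree.deleteChild(parent, child); // step 1
        tracked.clear();
        tree.deleteParent(parent); // steps 2 and 3: fails if another announcer left a child
    }
}

final class RaceDemo {
    public static void main(String[] args) {
        FakeTree tree = new FakeTree();
        TrackingAnnouncer announcer = new TrackingAnnouncer(tree);     // plays Announcer
        TrackingAnnouncer nodeAnnouncer = new TrackingAnnouncer(tree); // plays NodeAnnouncer
        announcer.announce("/example/parent", "a");
        nodeAnnouncer.announce("/example/parent", "b");
        try {
            announcer.stop("/example/parent");
        } catch (NotEmptyException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

Because each announcer tracks only its own children, neither can safely decide that the shared parent is empty, which is why this PR moves every caller to NodeAnnouncer rather than mixing the two classes.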

Documentation

  • Remove humor from error logs.
  • Add Javadocs and comments within the code to better describe the process.

Refactoring

  • Move the Announceable class out of Announcer.java into its own file, Announceable.java.
  • Refactor long methods by creating helper functions.
  • Add ZKPathsUtils.java to abstract the retrieval of the ZooKeeper path and the ZooKeeper node.
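As an illustration of the kind of helper described in the last item, here is a minimal sketch that assumes the helper splits an absolute ZooKeeper path into its parent path and its last node name. The class and method names are hypothetical and do not reflect the PR's actual ZKPathsUtils API; Apache Curator's `ZKPaths.getPathAndNode` provides a comparable split.

```java
// Hypothetical path helper; not the PR's ZKPathsUtils.
final class ZkPathSketch {
    private ZkPathSketch() {}

    private static int lastSlash(String path) {
        if (path == null || !path.startsWith("/")) {
            throw new IllegalArgumentException("not an absolute ZooKeeper path: " + path);
        }
        return path.lastIndexOf('/');
    }

    /** Parent path, e.g. "/a/b/c" -> "/a/b"; direct children of the root map to "/". */
    static String parentPath(String path) {
        int i = lastSlash(path);
        return i == 0 ? "/" : path.substring(0, i);
    }

    /** Last path element, e.g. "/a/b/c" -> "c". */
    static String nodeName(String path) {
        return path.substring(lastSlash(path) + 1);
    }

    public static void main(String[] args) {
        String path = "/example/announcements/host:8083";
        System.out.println(parentPath(path) + " | " + nodeName(path));
    }
}
```

Per the description above, this split is what distinguishes the two announcers: Announcer caches the parent path, whereas NodeAnnouncer watches the full path of the announced node.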

Release note

Improved: ZooKeeper no longer spins up an unnecessarily large number of watches when running realtime tasks.


Key changed/added classes in this PR
  • Announcer.java
  • NodeAnnouncer.java
  • Announceable.java
  • AnnouncerModule.java
  • ZKPathsUtils.java
Classes that use NodeAnnouncer in favour of Announcer.java

The following classes have switched to using NodeAnnouncer.java instead of Announcer.java:

  • CuratorDruidNodeAnnouncer.java
  • WorkerCuratorCoordinator.java
  • BatchDataSegmentAnnouncer.java
  • CuratorDataSegmentServerAnnouncer.java

This PR has:

  • been self-reviewed.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious to an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • been tested in a test Druid cluster.

GWphua marked this pull request as draft November 21, 2024 01:24
GWphua marked this pull request as ready for review December 3, 2024 03:34
GWphua (Contributor, Author) commented Dec 11, 2024

Ready for review, PTAL @kaijianding @leventov @jihoonson @asdf2014 @gianm

GWphua (Contributor, Author) commented Dec 19, 2024

I ran some benchmarks on a single cluster, deploying two Druid instances into two namespaces. ZooKeeper memory usage is lower after the change.

I ran the benchmark by submitting 5 instances of the trips_xaa datasource (3 files). Because my cluster is not very large, the watch count does not reach the hundreds of millions, but the results at least give an idea of what the count will look like for larger clusters:

Ingestion

[Image: ingestion benchmark results]

Querying

I ran a Python script that issues 10 groups of queries against the ingested datasources, repeating each group 500 times. To simulate a concurrent querying environment, the script sends these queries from 9 threads.
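The query load described above amounts to 10 × 500 = 5,000 group executions spread over 9 worker threads. The original benchmark used a Python script; the Java sketch below only mirrors its shape. `runGroup` is a hypothetical callback standing in for sending one group of queries to Druid, and submitting every repetition of every group as a separate task to a fixed pool is an assumption about how the original script divided the work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.IntConsumer;

// Hypothetical load-generator shape; not the script used for the benchmark.
final class QueryLoadSketch {
    static final int GROUPS = 10;
    static final int REPETITIONS = 500;
    static final int THREADS = 9;

    /** Runs every group REPETITIONS times on a fixed pool; returns the number of group executions. */
    static long run(IntConsumer runGroup) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(THREADS);
        AtomicLong executed = new AtomicLong();
        List<Future<?>> futures = new ArrayList<>();
        try {
            for (int rep = 0; rep < REPETITIONS; rep++) {
                for (int group = 0; group < GROUPS; group++) {
                    final int g = group;
                    futures.add(pool.submit(() -> {
                        runGroup.accept(g); // hypothetical: send group g's queries
                        executed.incrementAndGet();
                    }));
                }
            }
            for (Future<?> f : futures) {
                f.get(); // wait for completion and surface any task failure
            }
        } finally {
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
        }
        return executed.get();
    }
}
```

A fixed pool keeps at most 9 groups in flight at once, which matches the concurrency level stated above regardless of how long individual queries take.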

[Images: querying benchmark results (3 screenshots)]

asdf2014 requested a review from gianm December 20, 2024 09:13
Development

Successfully merging this pull request may close this issue: huge number of watch in zookeeper cause zookeeper full gc