This repository has been archived by the owner on Sep 12, 2024. It is now read-only.
Set cluster ID when ZK connection state is connected
The master and agent processes check that they can connect to ZooKeeper with a cluster ID value. This cluster ID value is passed in as a CLI switch on startup. The code also checks that the cluster ID ZK node exists and stores the result in an `AtomicBoolean`. The `AtomicBoolean` is updated by an `org.apache.zookeeper.Watcher` on the node itself and by a `ConnectionStateListener` that fires when the connection state to ZK changes.

My hypothesis is that when agents lose their connection to the ZKs entirely and the connection then comes back, the `ConnectionState` is `CONNECTED` instead of `RECONNECTED`. This would skip the conditional, and the `AtomicBoolean` wouldn't get updated.

Helios agents that do not automatically recover have these logs for `ConnectionState`. Notice that the last state is `CONNECTED`.

```
dxia@bad-host:~$ grep 'DefaultZooKeeperClient connection state change' /path/to/helios/info.log
2019-09-19T01:53:14.391+00:00 bad-host helios[21939]: DefaultZooKeeperClient connection state change - SUSPENDED
2019-09-19T01:53:33.511+00:00 bad-host helios[21939]: DefaultZooKeeperClient connection state change - LOST
2019-09-19T01:54:15.333+00:00 bad-host helios[21939]: DefaultZooKeeperClient connection state change - RECONNECTED
2019-09-19T03:15:16.298+00:00 bad-host helios[21939]: DefaultZooKeeperClient connection state change - SUSPENDED
2019-09-19T03:15:39.592+00:00 bad-host helios[21939]: DefaultZooKeeperClient connection state change - LOST
2019-09-19T08:11:09.721+00:00 bad-host helios[15902]: DefaultZooKeeperClient connection state change - CONNECTED
```

Some Helios agents that were fine also had `CONNECTED` as the last state in their logs, so this is not conclusive. Still, it seems like we should update the `AtomicBoolean` in the cases of `CONNECTED`, `RECONNECTED`, and `READ_ONLY`.
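The proposed behavior can be sketched roughly as follows. This is not the actual Helios code: `ConnState` is a local stand-in that mirrors the constants of Curator's `ConnectionState` enum so the example compiles without dependencies, and `ClusterIdTracker`, `onStateChanged`, and the `nodeCheck` supplier are hypothetical names. The sketch also assumes the flag is left untouched on `SUSPENDED` and `LOST`.

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

// Local stand-in mirroring Curator's ConnectionState constants (assumption:
// real code would use org.apache.curator.framework.state.ConnectionState).
enum ConnState { CONNECTED, SUSPENDED, RECONNECTED, LOST, READ_ONLY }

final class ClusterIdTracker {
  // States in which the client can talk to ZooKeeper again. Handling only
  // RECONNECTED would miss a connection that comes back as CONNECTED.
  private static final Set<ConnState> USABLE =
      EnumSet.of(ConnState.CONNECTED, ConnState.RECONNECTED, ConnState.READ_ONLY);

  private final AtomicBoolean clusterIdExists = new AtomicBoolean(false);

  // Hypothetical hook standing in for "does the cluster ID ZK node exist?".
  private final BooleanSupplier nodeCheck;

  ClusterIdTracker(BooleanSupplier nodeCheck) {
    this.nodeCheck = nodeCheck;
  }

  // In real code this would be driven by a ConnectionStateListener callback.
  void onStateChanged(ConnState newState) {
    if (USABLE.contains(newState)) {
      // Re-check the node whenever the connection becomes usable.
      clusterIdExists.set(nodeCheck.getAsBoolean());
    }
    // SUSPENDED and LOST leave the flag unchanged in this sketch.
  }

  boolean clusterIdExists() {
    return clusterIdExists.get();
  }
}
```

Replaying the `SUSPENDED`, `LOST`, `CONNECTED` sequence from the log against this tracker refreshes the flag on the final `CONNECTED` event, which a `RECONNECTED`-only conditional would skip.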