This repository has been archived by the owner on Sep 12, 2024. It is now read-only.

Set cluster ID when ZK connection state is connected
The master and agent processes check that they can connect to ZooKeeper with a
cluster ID value, which is passed in as a CLI switch on startup. The code also
checks that the cluster ID ZK node exists and stores the result in an
`AtomicBoolean`.

The `AtomicBoolean` is updated by an `org.apache.zookeeper.Watcher` on the
node itself and by a `ConnectionStateListener` that fires when the
connection state to ZK changes.

My hypothesis is that when agents lose their connection to ZK entirely and the
connection then comes back, the `ConnectionState` is `CONNECTED` instead of
`RECONNECTED`. The listener's `RECONNECTED` conditional would then be skipped
and the `AtomicBoolean` wouldn't get updated.

Helios agents that do not recover automatically log these `ConnectionState`
changes. Notice that the last state is `CONNECTED`.

```
dxia@bad-host:~$ grep 'DefaultZooKeeperClient connection state change' /path/to/helios/info.log
2019-09-19T01:53:14.391+00:00 bad-host helios[21939]: DefaultZooKeeperClient connection state change - SUSPENDED
2019-09-19T01:53:33.511+00:00 bad-host helios[21939]: DefaultZooKeeperClient connection state change - LOST
2019-09-19T01:54:15.333+00:00 bad-host helios[21939]: DefaultZooKeeperClient connection state change - RECONNECTED
2019-09-19T03:15:16.298+00:00 bad-host helios[21939]: DefaultZooKeeperClient connection state change - SUSPENDED
2019-09-19T03:15:39.592+00:00 bad-host helios[21939]: DefaultZooKeeperClient connection state change - LOST
2019-09-19T08:11:09.721+00:00 bad-host helios[15902]: DefaultZooKeeperClient connection state change - CONNECTED
```

Some Helios agents that were fine also had `CONNECTED` as the last state in
their logs, though.

But it seems like we should update the `AtomicBoolean` in cases of
`CONNECTED`, `RECONNECTED`, and `READ_ONLY`.
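To illustrate the hypothesis, here is a minimal, dependency-free sketch. It replays the state sequence from the bad-host log above and lists which state changes would trigger the cluster ID check under the old condition (`== RECONNECTED`) and under the proposed one (`isConnected()`). The `State` enum is a hypothetical stand-in that mirrors Curator's `ConnectionState.isConnected()` values, and `ConnectionStateSketch`, `statesTriggeringCheck`, `OLD_CHECK`, and `NEW_CHECK` are names made up for this example. None of this is Helios code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class ConnectionStateSketch {

  // Hypothetical stand-in for Curator's ConnectionState so the sketch
  // compiles without any dependency. The isConnected() values mirror Curator.
  enum State {
    CONNECTED(true), SUSPENDED(false), RECONNECTED(true), LOST(false), READ_ONLY(true);

    private final boolean connected;

    State(boolean connected) {
      this.connected = connected;
    }

    boolean isConnected() {
      return connected;
    }
  }

  // The state changes logged on bad-host, in order.
  static final List<State> BAD_HOST_SEQUENCE = Arrays.asList(
      State.SUSPENDED, State.LOST, State.RECONNECTED,
      State.SUSPENDED, State.LOST, State.CONNECTED);

  // Condition before this commit: only RECONNECTED triggers the check.
  static final Predicate<State> OLD_CHECK = s -> s == State.RECONNECTED;

  // Condition after this commit: any connected state triggers the check.
  static final Predicate<State> NEW_CHECK = State::isConnected;

  // Returns the observed states for which the listener would have called
  // checkClusterIdExists(...) under the given condition.
  static List<State> statesTriggeringCheck(List<State> observed, Predicate<State> condition) {
    List<State> hits = new ArrayList<>();
    for (State s : observed) {
      if (condition.test(s)) {
        hits.add(s);
      }
    }
    return hits;
  }

  public static void main(String[] args) {
    // Old condition: the final CONNECTED event is ignored.
    System.out.println(statesTriggeringCheck(BAD_HOST_SEQUENCE, OLD_CHECK)); // [RECONNECTED]
    // New condition: the final CONNECTED event triggers the check too.
    System.out.println(statesTriggeringCheck(BAD_HOST_SEQUENCE, NEW_CHECK)); // [RECONNECTED, CONNECTED]
  }
}
```

Under the old condition the final `CONNECTED` event never triggers a check, which matches the hypothesis. The sketch only shows which events are seen. It does not show that this is the actual cause of the stuck agents.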
davidxia committed Sep 20, 2019
1 parent b311b79 commit a53832a
Showing 1 changed file with 1 addition and 1 deletion.
@@ -96,7 +96,7 @@ public void process(WatchedEvent event) {
       @Override
       public void stateChanged(CuratorFramework client, ConnectionState newState) {
         log.info("DefaultZooKeeperClient connection state change - {}", newState);
-        if (newState == ConnectionState.RECONNECTED) {
+        if (newState.isConnected()) {
           checkClusterIdExists(clusterId, "connectionStateListener");
         }
       }