-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/more reliable zookeeper #992
base: master
Are you sure you want to change the base?
Conversation
Rather than opening and closing connections for each Zookeeper interaction use the functionality in the KazooClient which maintains a connection with Zookeeper. This lowers resource consumption and improves latency.
KazooClient was mocked for some tests make sure this mocking is maintained with the new way of using KazooClient.
The IDE will get confused because the client will create methods at runtime. By referencing the KazooClient methods in the class the IDE can use hints based on the downstream methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(apparently I still had these comments from some time ago)
from time import sleep | ||
from typing import TYPE_CHECKING | ||
|
||
if TYPE_CHECKING: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these are all stdlib/thirdparty (based) symbols, so I don't think it's necessary to guard this with if TYPE_CHECKING
(that's typically just a trick to dig you out of circular import holes)
from kazoo.protocol.states import ZnodeStat | ||
|
||
# Helper type which corresponds to what Zookeeper for callbacks as used by watches | ||
ZEvent = tuple[bytes, ZnodeStat] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as we're still on python 3.8, I don't think we can already use type annotations like tuple[...]
and still have to stick to Tuple[...]
FYI: I just merged latest master in this feature branch to make sure this PR doesn't get stale |
(improve signal-to-noise ratio of PR's diff)
|
||
@contextlib.contextmanager | ||
def _zk_client(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we can't keep this usage pattern for now:
with self._zk_client() as zk:
...
but return the new wrapper instead there.
That would make the introduced change of this PR smaller (in LoC).
And it would even allow to gradually roll this out (e.g. with a feature flag or config)
exceptions=(NoActiveConnectionException,), | ||
tries=600, | ||
logger=None, | ||
)(_wait_100ms_for_zookeeper_connection) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
while reviewing this I got a bit lost in this _wait_100ms_for_zookeeper_connection
/_wait_up_to_1minute_for_zookeeper_connection
construction.
Can't it be simplified to a single helper like
def _wait_on_connected(self, timeout: float=60, sleep=0.1):
end = time.time() + timeout
while time.time() < end:
if self._zk.connected:
return
if self._logger is not None:
self._logger.debug(f"No active connection for zookeeper client ({self._zkhosts})")
time.sleep(sleep)
raise ZookeeperClient.NoActiveConnectionException()
(PS: I would avoid hardcoding implementation details like 100ms
and 1minute
in the function names to make it easier to fine-tune these when necessary)
while reviewing I also started wondering if it wouldn't be easier to just add an option for reuse to openeo-geopyspark-driver/openeogeotrellis/user_defined_process_repository.py Lines 83 to 94 in 4f57b9b
e.g. PoC: that approach would also allow to roll client reuse out gradually (e.g. with config or feature flags), instead of a a complete KazooClient replacement as proposed in this PR #992 |
This change will just create 1 KazooClient and keeps it running in the background. The KazooClient itself has logic to recover connectivity.
In order to deal with intermittent failure we also retry operations if they fail due to connectivity related issues. At this time the once that came from test scenario's were
kazoo.exceptions.SessionExpiredError
andkazoo.exceptions.ConnectionLoss
but if other retryable exceptions are identified they could be easily added.