Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support more granular timeouts for queries #192

Open
matthax opened this issue May 26, 2023 · 4 comments
Open

Support more granular timeouts for queries #192

matthax opened this issue May 26, 2023 · 4 comments
Labels
enhancement New feature or request

Comments

@matthax
Copy link

matthax commented May 26, 2023

Is your feature request related to a problem? Please describe.
Feature request / enhancement

Describe the solution you'd like
I'd like to have more granular control over timeouts in the client. Specifically, the ping method has a hardcoded timeout of 3 but in our use case we'd like to be able to configure that to something like 1 for example.

  • The clickhouse-driver library solves this using a sync_request_timeout argument. Perhaps something similar could be added to the client constructor here.

The _raw_request method uses the timeout on the client instance, which is more flexible, but our use case requires a per-query timeout setting. Currently, I'm working around this with a solution I don't love

    @contextmanager
    def with_timeout(
        self,
        connect_timeout: Optional[int] = None,
        send_receive_timeout: Optional[int] = None,
    ):
        _old_timeout = self._delegate.timeout
        self._delegate.timeout = Timeout(
            connect=coerce_int(connect_timeout or _old_timeout.connect_timeout),
            read=coerce_int(send_receive_timeout or _old_timeout.read_timeout),
        )
        yield self._delegate
        self._delegate.timeout = _old_timeout

This approach is similar to that in clickhouse-driver which offers a timeout_setter on the Connection instance it provides. A builtin way to supply a timeout either through a context manager or with a sort of per-query setting e.g. client.query(..., connect_timeout=1, send_receive_timeout=3) would be great.

Describe alternatives you've considered
Right now I'm composing a class that adds some workarounds (the with_timeout above) and for ping I override the method

    def ping(self):
        """
        Ping the host, sync_request_timeout is supplied to the call
        TODO: Suggest this as a PR to clickhouse-connect
        """
        try:
            response = self._delegate.http.request(
                "GET", f"{self._delegate.url}/ping", timeout=self._sync_request_timeout
            )
            return 200 <= response.status < 300
        except HTTPError:
            return False

I suppose query context or settings could also be a feasible way to supply the timeout. I don't have a strong opinion on how it should be accomplished and would love to hear your thoughts.

Additional context

I'm happy to PR any of these changes, just wanted to start a discussion on the approach before starting any implementation.

@matthax matthax added the enhancement New feature or request label May 26, 2023
@genzgd
Copy link
Collaborator

genzgd commented May 26, 2023

Ping is obviously an easy fix, adding an optional defaulted timeout parameter (which takes either an float or a urllib3 Timeout) is trivial -- it's also the only method that doesn't go through the _raw_request method.

Given that a single timeout setting for the client seems to work for the vast majority of use cases I'm hesitant to add yet more constructor arguments to either the Query/InsertContext or the Client. Honestly your approach seems fine for your advanced use case -- with the caveat that you are making the client "less thread safe" by modifying the Client timeout property -- but I think most people who get to that level of control would be aware of that risk.

As for other approaches, I don't particularly like the query context as a place to handle this, since inserts and "commands" don't utilize it (and it also is suffering from having too many options/properties.

What do you think about making the connect_timeout and send_receive_timeout take either the current ints or a "provider function" that return an int (or I suppose a float, although fractional second timeouts don't seem necessary).

@matthax
Copy link
Author

matthax commented May 26, 2023

@genzgd I definitely hear you on the argument overload. For the provider option, what would you envision that function receiving as arguments?

Adding a default to ping would be great, that's the bigger pain point of the two.

And yes, the primary concern I had with the approach I demonstrated above was thread safety, but we do have locks around the client connection pulling that should prevent issues there.

@genzgd
Copy link
Collaborator

genzgd commented May 28, 2023

@matthax Thinking through this if you really need per query timeouts they should be added into the QueryContext (even though it has so many properties already). My provider function idea is just a little too fancy.

Since it's an advanced feature I think using a union of float/int/urllib3 Timeout should be okay, since it will just get passed to the urllib3 library anyway. Any of the methods (inserts, commands, raw*) that need such a timeout should accept that additional parameter.

If you want to take a stab at a PR along those lines that would be most welcome, otherwise it will probably be a week or so before I get to it (in connection with a few other enhancements I'm working on as I have time.)

@matthax
Copy link
Author

matthax commented May 30, 2023

@genzgd I can open a PR for this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants