Connection exception during topology update #3101
Comments
Hey @suxb201, could you comment on how you expect the driver to handle these cases?
@tishun I want the client to handle cluster topology changes without throwing any exceptions, as exceptions are difficult to handle and retrying non-idempotent commands on exceptions can lead to data inconsistencies. We can achieve this by fully utilizing the MOVED response during topology changes:
In practice, most clients (Jedis, redis-py, etc.) remain unaffected during cluster topology changes, experiencing only slight increases in latency. Lettuce is an exception: even with the correct configuration it sometimes throws exceptions, which I believe is a bug in how Lettuce handles connection closures.
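To make the MOVED-based approach concrete, here is a minimal sketch of how a client can consume a redirect. It is not Lettuce's implementation. It relies only on the Redis Cluster error format `MOVED <slot> <host>:<port>`; the class names `MovedRedirect` and `SlotTable` are hypothetical and exist only for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper (not a Lettuce class): a parsed "MOVED <slot> <host>:<port>" error. */
final class MovedRedirect {
    final int slot;
    final String host;
    final int port;

    private MovedRedirect(int slot, String host, int port) {
        this.slot = slot;
        this.host = host;
        this.port = port;
    }

    /** Returns the redirect target, or empty if the message is not a well-formed MOVED error. */
    static Optional<MovedRedirect> parse(String errorMessage) {
        if (errorMessage == null || !errorMessage.startsWith("MOVED ")) {
            return Optional.empty();
        }
        String[] parts = errorMessage.trim().split(" ");
        if (parts.length != 3) {
            return Optional.empty();
        }
        // Use the last colon so that hosts containing colons keep them.
        int colon = parts[2].lastIndexOf(':');
        if (colon <= 0) {
            // This sketch requires an explicit, non-empty host.
            return Optional.empty();
        }
        try {
            int slot = Integer.parseInt(parts[1]);
            int port = Integer.parseInt(parts[2].substring(colon + 1));
            return Optional.of(new MovedRedirect(slot, parts[2].substring(0, colon), port));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }
}

/** Hypothetical client-side slot table: remembers which node owns each slot. */
final class SlotTable {
    private final Map<Integer, String> owners = new HashMap<>();

    /** Records the new owner so later commands for this slot go to the right node. */
    void apply(MovedRedirect redirect) {
        owners.put(redirect.slot, redirect.host + ":" + redirect.port);
    }

    /** Returns "host:port" of the known owner, or null if the slot is unknown. */
    String ownerOf(int slot) {
        return owners.get(slot);
    }
}
```

A client following this pattern would parse the error, update its slot table, and resend the command to the new owner. Because a MOVED reply means the command was not executed on the old node, resending it is safe even for non-idempotent commands, which is not true of a retry after a closed connection or a timeout.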
Let me try to address all the things you said separately.
I am really confused by your example, because Lettuce always reacts to a ...
This is why I asked what you think we should do: if the command was sent during a transitional state of the system and we can't know if ...
Important: Please keep in mind that topology updates are sent on a separate connection, asynchronously from any other connections that the user application might have established. Lettuce is also a non-blocking client. When you use this pattern you need to accept that some events will happen at the same time, for example a cluster topology refresh and command execution.
Jedis is a synchronous client, while Lettuce is asynchronous. Lettuce also allows several threads to multiplex the same connection. So it is a bit of an odd comparison to make, because the problem you are describing does not really exist in Jedis. Depending on how you use redis-py this might be the case there too.
In conclusion
All that said, if there is a reasonable way we can improve the resilience of the driver during topology refresh, I am more than happy to discuss it. We need to identify the specific problem and discuss a solution to it that would fit the common use cases of the driver. If you have evidence that the driver is not reacting to a
Let me see if we can spend some more time thinking about how we can improve driver resilience here. PRs / suggestions are welcome.
Currently unable to reproduce, using the code provided.
What steps do you take in order to achieve that? Calling |
Understood.
This is completely fine. Sometimes it is better to communicate ideas with code, but in this case we can try and figure this out another way.
This is where my problem currently stands. To decide how to handle these I need to reproduce the problem and analyse possible solutions, otherwise I would be making changes that might or might not help.
I agree. However, I am not sure why the connection is closed. I was unable to reproduce it with the sample code you sent, because something in the way you do the failover causes it. I will try out a few other things, but until we reproduce it this will be hard to fix properly.
Bug Report
I encountered an issue with Lettuce when sending commands while it updates the cluster topology, leading to exceptions. In my test, I continuously called Lettuce's read and write commands while performing a master-slave switch in a Redis cluster. Occasionally, Lettuce throws a connection exception, which appears as if the connection for normal commands was forcibly terminated.
Current Behavior
In most cases, no exceptions are thrown. However, there is a low probability of encountering the following exceptions:
io.lettuce.core.RedisException: Connection closed
io.lettuce.core.RedisCommandTimeoutException: Command timed out after 1 minute(s)
Packet Capture Analysis
io.lettuce.core.RedisCommandTimeoutException: Command timed out after 1 minute(s):
Additionally, it is speculated that another exception, io.lettuce.core.RedisException: Connection closed, may follow this sequence:
Steps to Reproduce
Input Code
Expected behavior/code
No exceptions.
Environment