[BUG] socket_timeout parameter has no effect if the link is broken between redis instance and the client. Outage towards a slot range for several minutes. #579

jzkiss · 2024-07-05T10:35:37Z

Describe the bug
socket_timeout parameter has no effect if the link is broken between redis instance and the client. Outage towards a slot range for several minutes.

To Reproduce

Async client is used with cluster mode. Redis cluster is used with 3 masters and 3 slaves.
generate continuous traffic
during the traffic, select a master redis instance, apply the following iptables rule in the container/vm of that server:
iptables -A OUTPUT -p tcp --sport redis_port -s redis_ip -j DROP
kill that redis server (kill -9 redis_server_pid) [-> new master election will happen for that slot range]

Expected behavior
After socket_timeout, redis-plus-plus discover the new elected master / broken connection, traffic is redirected to that master

Unexpected result: old connection is used, continuous TCP packet retransmissions, no response to users for the given slot range for several minutes

Environment:
OS: Rocky Linux 8.2-20.el8.0.1
Compiler: gcc version 8.5.0
hiredis version: hiredis 1.2.0
redis-plus-plus version: 1.3.12

Additional context
Correction proposal:

introduce a new property per connection: last_response_received
failure detection: whenever a request is sent in a connection, check if (t_now - last_response_received) is under socket_timeout + TOLERATION (some milliseconds).
when the failure is detected, be careful with CLUSTER SLOTS, do not target the problematic master instance, select another (I see issues like that)
check if mastership was changed, and reconnect to the new master if needed. Abort the ongoing requests towards the redis-plus-plus user, redis-plus-plus user should retransmit the request, maybe with some delay.

As a workaround, we started guard timer and reset the AsyncRedisClient at timeouts to force redis-plus-plus to discover the mastership changes / broken links. But this solution also caused some issues, see:
#577
#578

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BUG] socket_timeout parameter has no effect if the link is broken between redis instance and the client. Outage towards a slot range for several minutes. #579

[BUG] socket_timeout parameter has no effect if the link is broken between redis instance and the client. Outage towards a slot range for several minutes. #579

jzkiss commented Jul 5, 2024 •

edited

Loading

[BUG] socket_timeout parameter has no effect if the link is broken between redis instance and the client. Outage towards a slot range for several minutes. #579

[BUG] socket_timeout parameter has no effect if the link is broken between redis instance and the client. Outage towards a slot range for several minutes. #579

Comments

jzkiss commented Jul 5, 2024 • edited Loading

jzkiss commented Jul 5, 2024 •

edited

Loading