Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

THREESCALE-11528: Redis connection re-establishment in async mode #412

Merged
merged 5 commits into from
Jan 9, 2025

Conversation

jlledom
Copy link
Contributor

@jlledom jlledom commented Dec 17, 2024

This is a proposal on how to implement reconnection to Redis when the link is broken. The idea is taken from redis-client gem (code). Before every request to Redis, ensure_connected is called, this way:

  • When a request fails because a broken link, that doesn't affect the next requests.
  • Next requests will also fail while the link is broken.
  • As soon as the link is restored, the next request will get a valid connection and will work.

This PR also implements an implementation for the reconnect_attempts option:

reconnect_attempts: 1,

Which was being ignored in async mode. It just retries to connect to Redis once (by default) before raising the exception.

In the case the exception is raised:

  • The listener just returns a 403 and keep going. Requests will work again as soon as the link is restored
  • The worker doesn't crash anymore, it waits for 5 seconds and try again, forever, until the link is restored.

Jira:

https://issues.redhat.com/browse/THREESCALE-11528

Notes:

This PR only fixes the async mode. Sync mode uses redis and redis-client gems which already handle connection loss.

It was returning a network error as a NoMethodException
@jlledom jlledom self-assigned this Dec 17, 2024
lib/3scale/backend/errors.rb Show resolved Hide resolved
lib/3scale/backend/errors.rb Show resolved Hide resolved
lib/3scale/backend/storage_async/client.rb Outdated Show resolved Hide resolved
lib/3scale/backend/storage_async/client.rb Show resolved Hide resolved
lib/3scale/backend/storage_async/pipeline.rb Show resolved Hide resolved
@jlledom jlledom requested a review from akostadinov December 19, 2024 12:10
@jlledom
Copy link
Contributor Author

jlledom commented Dec 19, 2024

@akostadinov I implemented the changes you suggested in our call yesterday

Copy link
Contributor

@akostadinov akostadinov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks awesome. Not sure about the sync client. But if my concern is unfounded, good to merge!

lib/3scale/backend/storage_async/client.rb Outdated Show resolved Hide resolved
lib/3scale/backend/storage_helpers.rb Outdated Show resolved Hide resolved
@jlledom jlledom requested a review from akostadinov December 20, 2024 09:16
Comment on lines 291 to 295
options[:reconnect_attempts] = if ThreeScale::Backend.listener?
1
else
ThreeScale::Backend.configuration.redis.async ? Float::INFINITY : 1
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
options[:reconnect_attempts] = if ThreeScale::Backend.listener?
1
else
ThreeScale::Backend.configuration.redis.async ? Float::INFINITY : 1
end
options[:reconnect_attempts] = ThreeScale::Backend.listener? ? 1 : 2**30

Your original approach to make both sync and async wait a long time seemed good to me. So it is fine for me to set just a big value, not necessarily INFINITY.

But if you like to keep original sync behavior to 1 that's also fine. Everything looks good either way. Thank you!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried your suggestion but it doesn't work because the sync mode doesn't accept a :reconnect_wait_seconds parameter. Instead it accepts an array like eg: [1,5,15,30]. This would mean: four attempts, wait 1 second after the first attempt; wait five seconds after the second attempt, etc. It also accepts a single number, but then it doesn't wait between attempts. If we pass it a big number like 2**30 it will enter loop that attempts to connect to redis 2**30 times with not wait time, and it takes all the resources in the box.

Copy link
Contributor Author

@jlledom jlledom Jan 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the end I came up with this b2130c9

This will make sync and async modes behave equal. Passing the sync client a Hash(5) behaves like an infinite array arr = [5,5,5,...]. No matter the index you send to it, e.g. arr[1000] or arr[1000000] it will always return 5 so the client is tricked to retry forever and wait 5 seconds between attempts.

@@ -291,7 +291,7 @@ def cfg_reconnect_handler(options)
options[:reconnect_attempts] = if ThreeScale::Backend.listener?
1
else
ThreeScale::Backend.configuration.redis.async ? Float::INFINITY : 1
ThreeScale::Backend.configuration.redis.async ? Float::INFINITY : Hash.new(5)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this documented? I understand we are going to remove sync anyway so if you tested it working, I guess it's alright. Thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's documented here. It expects an array, but I wrote a hack to behave like an infinite array of 5. Ugly but effective to match the behavior between sync and async modes.

@jlledom jlledom force-pushed the THREESCALE-11528-redis-reconnect branch from b2130c9 to 5fab2f6 Compare January 9, 2025 09:08
@jlledom jlledom merged commit 0537b18 into 3scale:master Jan 9, 2025
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants