Intermittent NetworkError #378

Open
phillipadsmith opened this issue Mar 24, 2020 · 32 comments
Labels: bug? V3 v2019-10-10 Client


phillipadsmith commented Mar 24, 2020

Warning: Github issues are not an official support channel for Recurly. If you have an urgent request we suggest you contact us through an official channel: [email protected] or https://recurly.zendesk.com

Describe the bug

I've built a simple integration using this library and I am seeing an intermittent recurly.NetworkError exception with the message "Remote end closed connection without response"

This appears to be happening most often for this API call: client.list_accounts()

To Reproduce

You can see the call in context here:
https://github.com/TheTyee/manage-account.thetyee.ca/blob/master/app.py#L56-L79

Expected behavior

For it to work consistently! :)

Your Environment

  • Which version of this library are you using?
    Recurly Python client 3.0

  • Which version of the language are you using?
    Python 3.8.2

bhelx self-assigned this Mar 24, 2020

bhelx commented Mar 24, 2020

Thanks for the issue @phillipadsmith . We are currently working on making the clients more resilient and pushed out some changes in ruby and node this week to address this problem. The client should be able to gracefully handle this by retrying on "open" timeouts. Apologies for the problems this is causing. We can prioritize fixing python next.


bhelx commented Mar 24, 2020

Any other details you are seeing in the response? I want to make 100% sure we're gonna solve the right problem.


phillipadsmith commented Mar 24, 2020

Any other details you are seeing in the response? I want to make 100% sure we're gonna solve the right problem.

Hi @bhelx -- thanks for the quick reply. That's all of the information I'm getting back from the exception. If you have a suggestion for how I can get a more verbose error message, I'd be happy to try it out.


bhelx commented Mar 24, 2020

Sure, let me know if you have any more info. We'll keep moving forward with the expected fix in the meantime.

bhelx commented Mar 24, 2020

@phillipadsmith what's the purpose of limit=1 here? https://github.com/TheTyee/manage-account.thetyee.ca/blob/master/app.py#L57

bhelx commented Mar 24, 2020

If you're just trying to get the first / only account that matches, this will help you when it lands: #367

@phillipadsmith (Author)

@phillipadsmith what's the purpose of limit=1 here? https://github.com/TheTyee/manage-account.thetyee.ca/blob/master/app.py#L57

I was hoping it would shorten the response time to only request one record. And, I'm only searching for one record (and only one should ever be returned).

If you're just trying to get the first / only account that matches, this will help you when it lands: #367

Ah-ha. Yes, that's what I'm trying to do. I'll keep an eye out for that release.


bhelx commented Mar 24, 2020

Great, just checking. That's an appropriate way to use it. There has been a little confusion over what limit means, so I wanted to be sure.

@phillipadsmith (Author)

@bhelx What was the fix implemented in the Ruby and Node modules? Is there a similar tactic I could implement in my version of the module, or in the application itself, e.g., catching the exception and trying the request again some limited number of times?

@phillipadsmith (Author)

@bhelx And what is the root cause of the timeouts? Is it the Recurly API not responding? I've never experienced this timeout when using the API endpoints directly with Postman or CURL or similar.


bhelx commented Mar 24, 2020

@phillipadsmith I can't say what the root cause is just yet. At scale though, you're gonna have some random TCP-level blips. Most of the time, it's something that can be recovered from. Some HTTP libraries handle this for you under the hood, and Postman may be doing that as well. We are trying our best to keep these clients dependency-free for security and maintainability reasons, so we are implementing some of this ourselves.
Here is the ruby change that went out: recurly/recurly-client-ruby#568
There isn't a clean way to do this with what we have in master, but keep in mind that you can safely retry GET requests as many times as you want. So you could catch a network error, sleep a few milliseconds, and retry some number of times.
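
Roughly something like this, as an untested sketch (it only assumes that recurly.NetworkError is the exception you catch and that the wrapped call is a GET):

import time
import recurly

def with_retries(fn, attempts=3, delay=0.25):
    # Retry an idempotent (GET) call a few times on transient network errors.
    for attempt in range(attempts):
        try:
            return fn()
        except recurly.NetworkError:
            if attempt == attempts - 1:
                raise  # out of attempts; re-raise the last error
            time.sleep(delay)

# e.g. account = with_retries(lambda: next(client.list_accounts(limit=1).items(), None))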


bhelx commented Mar 24, 2020

Again sorry about this, I'll be working on implementing and testing this the next couple days.

@phillipadsmith (Author)

Again sorry about this, I'll be working on implementing and testing this the next couple days.

@bhelx No need to apologize. I appreciate the quick replies and it's helpful just to hear that I'm not crazy and that this is a known issue. :)


phillipadsmith commented Mar 25, 2020

Couple of quick questions, @bhelx:

At scale though, you're gonna have some random TCP-level blips. Most of the time, it's something that can be recovered from.

True enough, but I've only got this on a test server at the moment, so we're not running any traffic against this yet -- it's just me and one other person testing, and the exception happens quite often (almost too often, IMHO, to be a blip -- almost like there's another issue at play).

Some http libraries handle this for you under the hood and postman may be doing that as well.

Yes, interestingly, I've never had this happen when running the app from Flask, and I've never seen it in Postman.

And it never happens immediately after I start the app with Gunicorn, only after some time has passed, perhaps hours. Then it happens very frequently and very consistently until I restart the application.

Is there anything about running this library in a pre-fork WSGI server that might be introducing this network error?


bhelx commented Mar 25, 2020

it's just me and one other person testing, and the exception happens quite often (almost too often, IMHO, to be a blip -- almost like there's another issue at play).

This is something I personally see more on the order of 1 out of ~10,000 requests. If you are able to reproduce, could you try to print out some more information from the underlying socket error?
From the message you pasted, it sounds like a RemoteDisconnected error but there may be some more info in the message we can use to diagnose.
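
One way to get at the wrapped error, as a rough sketch (it relies only on the fact that the client raises NetworkError from inside an except block, so the original http.client/socket error is still chained to it; the list_accounts call is just your existing one):

import traceback
import recurly

try:
    account = next(client.list_accounts(limit=1).items())
except recurly.NetworkError as e:
    traceback.print_exc()  # prints the full chained traceback, including the wrapped error
    print("underlying error:", repr(e.__cause__ or e.__context__))
    raise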

Is there anything about running this library in a pre-fork WSGI server that might be introducing this network error?

So your webserver is multiprocess and not multithreaded, right? It is possible, I suppose. I'm not sure how Python would handle forking an instance of a Client with an already open connection: https://github.com/TheTyee/manage-account.thetyee.ca/blob/master/app.py#L44

Since forking is copy-on-write, my gut would tell me that you want to create the connection after you fork. You could test this theory by trying to create the client inside of the accounts_search method. This wouldn't be the recommended way for production because you want to re-use the connection where possible, but if the problem goes away you'll have more evidence that forking is causing the issue.
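
As a sketch of that experiment (make_recurly_client() is just a stand-in for however app.py builds its client today, and the route, env var name, and email filter are illustrative):

import os
import recurly
from flask import Flask, request

app = Flask(__name__)

def make_recurly_client():
    # Stand-in: construct the client the same way app.py does now.
    return recurly.Client(os.environ["RECURLY_API_KEY"])

@app.route("/account-search", methods=["POST"])
def accounts_search():
    # Build the client per request so a forked worker never reuses a
    # connection that was opened in the parent process.
    client = make_recurly_client()
    account = next(client.list_accounts(limit=1, email=request.form.get("email")).items(), None)
    return {"found": account is not None}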


bhelx commented Mar 25, 2020

@phillipadsmith I'm not sure if Flask has a special way to deal with this, but also consider using a "post-fork" hook of some kind: https://docs.pylonsproject.org/projects/pyramid_cookbook/en/latest/deployment/forked_threaded_servers.html?highlight=sqlalchemy

Perhaps Flask has a way of letting you run some code after a fork happens.

@phillipadsmith (Author)

Thanks for those thoughts, @bhelx, appreciated.

This is something I personally see more on the order of 1 out of ~10,000 requests.

Oh, yes, definitely able to get it 8/10 times right now.

could you try to print out some more information from the underlying socket error?

Is there an easy way to get more verbose output from the exception? My assumption was that it was coming from the Recurly Python client.

Many thanks in advance!


bhelx commented Mar 25, 2020

Oh, yes, definitely able to get it 8/10 times right now.

Oh, yeah, that suggests a different problem entirely. The fix I'm working on might help, but I think we need to address the root problem.

Is there an easy way to get more verbose output from the exception? My assumption was that it was coming from the Recurly Python client.

A stack trace would be helpful. Let it explode. When I trigger one, it looks like this:

Traceback (most recent call last):
  File "./clients/python/recurly/base_client.py", line 42, in _make_request
    http_resp = self.__conn.request(method, path, body, headers=headers).getresponse()
  File "/usr/lib/python3.7/http/client.py", line 1244, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.7/http/client.py", line 1290, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.7/http/client.py", line 1239, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.7/http/client.py", line 1026, in _send_output
    self.send(msg)
  File "/usr/lib/python3.7/http/client.py", line 966, in send
    self.connect()
  File "/usr/lib/python3.7/http/client.py", line 1399, in connect
    super().connect()
  File "/usr/lib/python3.7/http/client.py", line 938, in connect
    (self.host,self.port), self.timeout, self.source_address)
  File "/usr/lib/python3.7/socket.py", line 707, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
  File "/usr/lib/python3.7/socket.py", line 748, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "scripts/v2019-10-10/python/create_account.py", line 33, in <module>
    account = client.create_account(account_create)
  File "./clients/python/recurly/client.py", line 133, in create_account
    return self._make_request("POST", path, body, None)
  File "./clients/python/recurly/base_client.py", line 71, in _make_request
    raise NetworkError(e)
recurly.NetworkError: [Errno -2] Name or service not known

@phillipadsmith (Author)

👍 Let me see if I can get a stack trace.

@phillipadsmith (Author)

I've imported traceback and I'm trying to get the error again 🤞

@phillipadsmith (Author)

@bhelx Hey, I'm not getting any output from traceback when an exception occurs: https://github.com/TheTyee/manage-account.thetyee.ca/blob/master/app.py#L79-L80

Any idea what I've got wrong here?


bhelx commented Mar 25, 2020

I'm not really familiar with its usage. What happens if you just remove the whole try/except block for network exceptions? That's what I did in my script; Python dumps the stack trace for you.

@phillipadsmith (Author)

🤦‍♂ Didn't even think of that.


bhelx commented Mar 25, 2020

@phillipadsmith did you look into my earlier comment about creating the client after fork? I'm reading that some people recommend using the before_first_request decorator to make sure that a database or HTTP client connection is created in each fork.

I'm not sure how to reproduce your problem so I can't test this assumption on your app.
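
If Flask's before_first_request hook works for you, the shape of it would be roughly this (a sketch only; the env var name is illustrative):

import os
import recurly
from flask import Flask

app = Flask(__name__)
client = None  # created lazily, once per worker process

@app.before_first_request
def create_recurly_client():
    # Runs in each Gunicorn worker the first time it serves a request,
    # i.e. after the fork, so the HTTPS connection is never inherited
    # from the master process.
    global client
    client = recurly.Client(os.environ["RECURLY_API_KEY"])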

@phillipadsmith (Author)

did you look into my earlier comment about creating the client after fork? I'm reading that some people recommend using the before_first_request decorator to make sure that a database or HTTP client connection is created in each fork.

I will read up on those and give them a try also. I'm still trying to get a useful stack trace!

Thanks @bhelx, appreciated.

@phillipadsmith (Author)

did you look into my earlier comment about creating the client after fork? I'm reading that some people recommend using the before_first_request decorator to make sure that a database or HTTP client connection is created in each fork.

Thanks for your help, @bhelx.

I've made an update to the app so that Gunicorn will receive a post_fork hook that creates a Recurly client.

I haven't tested extensively, but I'm currently not able to trigger the error. I'll keep you updated.
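
For reference, the hook looks roughly like this (a sketch rather than the exact code in the repo; the config filename and the idea of rebinding app.client are assumptions):

# gunicorn.conf.py (assumed filename)
import os
import recurly
import app  # the Flask application module

def post_fork(server, worker):
    # Runs in each worker right after it forks from the master, so the new
    # client's HTTPS connection belongs to this process only. This assumes
    # the Flask views look the client up on the app module at request time.
    app.client = recurly.Client(os.environ["RECURLY_API_KEY"])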

@phillipadsmith (Author)

Hey @bhelx!

Quick update: so the post_fork hook didn't fix the issue; I continued to get the "Remote end closed connection without response" network errors.

I'm now trying the approach of creating the client inside the account_search method.

I'll read up on the before_first_request hook also.

🤞

@phillipadsmith (Author)

Happy Saturday @bhelx,

Just a quick update on this:

  • As crazy as it seemed, I tested your idea of [creating the client inside of each method](https://github.com/TheTyee/manage-account.thetyee.ca/blob/master/app.py).
  • I then ran automated tests overnight, testing the complete flow that triggers each method.
  • After 12 hours of testing (~600 tests), no errors have been reported. So, it seems like that resolved the intermittent network error we were experiencing.

As you mention, this is not an ideal way to have something set up for a production application, and yet this is a very low-use application, so I believe we're going to try it out in production.

I was never able to trigger a useful stack trace in part because Flask doesn't surface its errors up to Gunicorn, and the network errors only happened when running the application using Gunicorn, so it was a bit of a rabbit hole trying to figure out how to work around that while also hoping for the error to surface again.

All that said, I hope it's useful to know that there appears to be some kind of intermittent network error being thrown by the Recurly client in this rather common configuration of Flask + Gunicorn.

Let me know if you have any questions. Now that we've got a workaround that I can deploy for this project, I'm happy to go back and re-test things in my development environment.

Many thanks for your help & have a great weekend,
Phillip.


bhelx commented Apr 4, 2020

@phillipadsmith I appreciate you digging so deeply into it. It does sound like there is some kind of issue with Gunicorn and the open connection. I'd like to keep this issue open until we fix it. When my team gets a free moment, we'll try to make a minimal working example and see if we can reproduce. If it does turn out that there is a change that people will need to make in their Gunicorn apps, then we'll document it.

joannasese added the V3 v2019-10-10 Client label Feb 11, 2021
@declaresub

I encountered this same problem. In my case, the problem was using a single recurly.Client instance for multiple connections. Probably the recurly requests were stomping on each other. Using a recurly.Client instance per connection resolved the "Remote end closed connection without response" problems.

bhelx removed their assignment Jun 11, 2021
@ridersofrohan

@bhelx Can you fix this using the requests library and just creating a session?


bhelx commented Aug 21, 2022

@ridersofrohan I'm no longer at Recurly. I think the best person to contact would be @douglasmiller .

Regarding this code though: it doesn't use requests, so you can't just set a session anywhere. It uses the native http.client module. I suspect a problem here is that we are creating one connection per client:

self.__conn = http.client.HTTPSConnection(

It might make more sense to use a connection pool of some kind in the base client rather than a single connection like we do in the other clients.
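
For illustration only (not the library's actual code), a pooled version could look something like this with urllib3, assuming v3.recurly.com is the API host:

import urllib3

# One pool per process; connections are created lazily, reused across
# requests, and re-established if the server closes an idle socket.
pool = urllib3.HTTPSConnectionPool("v3.recurly.com", maxsize=10)

def make_request(method, path, body=None, headers=None):
    # retries=3 lets urllib3 transparently retry connection-level failures.
    return pool.request(method, path, body=body, headers=headers, retries=3)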
