Hackney checkout_failure which was working before 1.16.0 #689
Comments
did you try latest master?
Yes, tried.
Can you send a trace of the request? Can you share a snippet? Also, how many connections are in the pool?
We started seeing this too, with a connection pool size set to 400. We've been running f876781 since it was merged into master. Unfortunately I don't have much useful information to share yet, but if we come up with something useful I will post it here.

So far the only interesting thing was noticing that the message queue of the hackney_pool process started growing quickly at a linear rate, so something was stopping it from responding to incoming messages. We saw memory usage on the node increase as well, but that might just be the message queue size increasing.

[Graph omitted: message queue length (y axis) over time (x axis).]

The only other piece of vague data we have is that this might correlate with having very slow hosts responding to us. It's not consistent though; it only happens in select cases, and not on all nodes. If we learn more we will let you know!

Thanks for all your hard work on this library @benoitc, I know being a sole maintainer is a very hard job. Hopefully I will have something helpful after we try to reproduce the issue.
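For anyone wanting to reproduce this kind of observation, a one-off check from a remote shell might look roughly like the following. This is only a sketch: the registered name hackney_pool is taken from the comment above and is an assumption; if whereis/1 returns undefined on your node, locate the pid via observer instead.

```erlang
%% From a remote shell on the affected node. Adjust the registered name to
%% whatever your pool process is actually registered as; process_info/2 will
%% badarg if Pid is undefined.
1> Pid = erlang:whereis(hackney_pool).
2> erlang:process_info(Pid, [message_queue_len, memory, current_function]).
```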
I was facing this issue with the default hackney settings.
Yeah, unfortunately we need the connection pool.
@mattbaker Can you share a trace and a snippet of the code doing the request?
Yeah, I'll work on finding an example; there's some abstraction that makes a simple copy-paste example hard.
@SoniCoder @mattbaker If you've not already solved this, you may want to check whether you have somehow pinned an older version of hackney. I was running into exactly this issue when upgrading hackney on a project, and after debugging I discovered I had a manual include of an older version, which was causing exactly this. I temporarily added an extra trace point on the try/catch in hackney_pool:checkout/4, and the stack trace showed the missing module.
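For anyone else who wants the same kind of visibility without editing the source, OTP's dbg can trace calls to hackney_pool:checkout/4 directly. A minimal sketch; note that dbg:p(all, call) traces every process and can be noisy on a busy node, so stop the tracer as soon as you've reproduced the failure:

```erlang
%% Trace calls to hackney_pool:checkout/4, including return values and
%% exceptions, using OTP's dbg. Run in a (remote) shell on the node.
1> dbg:tracer().
2> dbg:p(all, call).
3> dbg:tpl(hackney_pool, checkout, 4, [{'_', [], [{exception_trace}]}]).
%% ...reproduce the failing request, then stop the tracer and clear patterns:
4> dbg:stop_clear().
```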
I don't :/ Thanks though! We ended up switching to Finch.
We also saw checkout_failure when traffic goes through a proxy.
Just bumping this up. Started encountering this after an Elixir upgrade.
Seeing this now after upgrading to OTP 24.3. Maybe related to erlang/otp#5783?
I have been looking into this for the last couple of weeks for a project at work. We are using hackney, and the issue seems to be related to the hackney_connections process. I can start a new ticket if needed, but can you, @benoitc, describe what exactly this process is meant for? I can't seem to find any reference to it. We are compiling against OTP 25.1.2. I would be more than happy to run any tests in order to help find and fix this issue. The problem is that I do not know how to trigger it.
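As a general sanity check (nothing hackney-specific, just a sketch using only the stdlib), you can confirm which process on the node is actually accumulating a backlog before pointing at any particular one:

```erlang
%% Top 5 processes by message queue length, with registered name if any.
%% Processes that die while being inspected are skipped by the pattern match.
1> lists:sublist(
       lists:reverse(lists:sort(
           [{Len, Pid, erlang:process_info(Pid, registered_name)}
            || Pid <- erlang:processes(),
               {message_queue_len, Len} <-
                   [erlang:process_info(Pid, message_queue_len)]])),
       5).
```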
I believe we fixed our issue. We had a function that looked something like this:

```erlang
foo(A, B) ->
    case parse_string(A, B) of
        {ok, Val} ->
            Val;
        {error, {_, reason}} ->
            hackney:request(...),
            parse_string(?DEFAULT_VALUE, B)
    end.
```

Essentially, if the function returned the error variant, we were not matching on the actual error we cared about for the retry. So when we received a different error message, we were calling this function infinitely, and because of that we were sending an infinite number of requests to hackney. Once we had multiple processes calling this function, it was filling up the hackney_connections mailbox.

For anyone that may be having this issue, the following is how I was able to find out where it was coming from. To start, I needed to crawl through hackney's code in order to figure out what was actually calling the hackney_connections process:

```erlang
1> Pid = erlang:whereis(hackney_connections).
2> erlang:process_info(Pid, message_queue_len).
```

Here I found out the process had over 500k messages in the mailbox. So I took a look into the mailbox to see what those messages were:

```erlang
3> {messages, Messages} = erlang:process_info(Pid, messages).
```

That dumped out the mailbox at that particular point in time. I then used the following (this may not be exact, I don't have the code on my machine and I may be getting the format slightly wrong here):

```erlang
4> Domains = lists:foldl(
       fun({'$gen_call', _, {lookup, Domain, _, _}}, Acc) ->
               case maps:find(Domain, Acc) of
                   error -> maps:put(Domain, 1, Acc);
                   {ok, Val} -> maps:put(Domain, Val + 1, Acc)
               end;
          (_, Acc) ->
               Acc
       end, #{}, Messages).
```

This gave me a map with the domains as keys and the number of requests in the mailbox for each domain as values. At this point I had an idea of where the requests were going. So I wrote a little gen server that could run and constantly check the message queue length of the process. I hope this can help someone else find issues in the future.
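A monitor like the one described can fit in a few lines. Below is a minimal sketch, not the poster's actual code: the module name queue_len_monitor, the target process name, and the sampling interval are all placeholders, and it simply logs via logger.

```erlang
%% Minimal sketch of a periodic mailbox monitor (not the poster's actual code).
%% Logs the message queue length of a registered process every IntervalMs.
-module(queue_len_monitor).
-behaviour(gen_server).

-export([start_link/2]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

start_link(TargetName, IntervalMs) ->
    gen_server:start_link(?MODULE, {TargetName, IntervalMs}, []).

init({TargetName, IntervalMs}) ->
    erlang:send_after(IntervalMs, self(), sample),
    {ok, #{target => TargetName, interval => IntervalMs}}.

handle_info(sample, #{target := Target, interval := Interval} = State) ->
    case erlang:whereis(Target) of
        undefined ->
            logger:warning("~p is not registered", [Target]);
        Pid ->
            case erlang:process_info(Pid, message_queue_len) of
                {message_queue_len, Len} ->
                    logger:info("~p message_queue_len=~p", [Target, Len]);
                undefined ->
                    ok   %% process died between whereis/1 and process_info/2
            end
    end,
    erlang:send_after(Interval, self(), sample),
    {noreply, State};
handle_info(_Other, State) ->
    {noreply, State}.

handle_call(_Request, _From, State) ->
    {reply, ok, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.
```

It could be started as, for example, queue_len_monitor:start_link(hackney_connections, 5000). to sample every five seconds.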
```
2021-06-14 13:47:38.633 [warning] <0.1002.0>@butler_client_hackney:request:121 HTTP Request unsuccessful: Reason checkout_failure, Retrying after 4000
Request: {"http://192.168.5.107:8080/api-gateway/auth-service/platform-auth/oauth/token",
          [{"Authorization","Basic ABCD"},{"Cache-Control","no-cache"}],
          "application/x-www-form-urlencoded",
          "grant_type=client_credentials&client_id=butlerx&client_secret=butlerx"}
Method: post
```
This error goes away when I use: {use_default_pool, false}
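For reference, that tuple is normally set as a hackney application environment setting; pooling can also be skipped for a single request via the pool request option, which, as I understand it, accepts false. The sketch below assumes a plain hackney:request/5 call; the URL, headers, and body are placeholders.

```erlang
%% In sys.config (or via application:set_env/3 at startup), disable hackney's
%% default pool globally -- this is the {use_default_pool, false} setting above:
%%   [{hackney, [{use_default_pool, false}]}].

%% Or bypass the pool for one request with the pool option (placeholder data):
{ok, Status, RespHeaders, ClientRef} =
    hackney:request(post,
                    <<"http://example.com/oauth/token">>,
                    [{<<"Content-Type">>, <<"application/x-www-form-urlencoded">>}],
                    <<"grant_type=client_credentials">>,
                    [{pool, false}]).
```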