
Zuul/Hystrix: HTTP connection leak in case of timeout #127

Open
brenuart opened this issue Jun 11, 2015 · 9 comments

@brenuart

See spring-cloud/spring-cloud-netflix#327 for a complete description of the issue and scenarios to reproduce it.

The bottom line is:

It turns out that Zuul leaks HTTP connections if a service takes longer than the Hystrix execution timeout but less than the ReadTimeout. To summarise:

execTime < hystrixTimeout < readTimeout --> OK (no leak)
hystrixTimeout < execTime < readTimeout --> LEAK
hystrixTimeout < readTimeout < execTime --> OK (no leak)
readTimeout < execTime < hystrixTimeout --> OK (no leak)
readTimeout < hystrixTimeout < execTime --> LEAK
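
For reference, a minimal sketch of the concrete settings behind these names, assuming Archaius-style configuration and a Ribbon client called myservice (the client name and values are placeholders, not taken from this issue). This combination reproduces the second row above (hystrixTimeout < execTime < readTimeout):

    import com.netflix.config.ConfigurationManager;

    public class ReproduceLeakTimeouts {
        public static void main(String[] args) {
            // Hystrix execution timeout for all commands: 1 second.
            ConfigurationManager.getConfigInstance().setProperty(
                    "hystrix.command.default.execution.isolation.thread.timeoutInMilliseconds", 1000);
            // Ribbon read timeout for the "myservice" client: 10 seconds.
            ConfigurationManager.getConfigInstance().setProperty(
                    "myservice.ribbon.ReadTimeout", 10000);
            // With the origin taking ~5 seconds to respond, Hystrix times out first,
            // the Ribbon/HTTP call keeps running, and the pooled connection is never released.
        }
    }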

The problem has been identified in the RibbonCommand returning an HttpResponse that will never be closed in case of a Hystrix timeout, which causes connection leaks in the HTTP connection pool.

Although the issue affects SpringCloud's version of the Zuul server, it seems that com.netflix.zuul.dependency.ribbon.hystrix.RibbonCommand would be affected as well.

@NiteshKant
Contributor

@brenuart sounds like a bug. Since you are aware of the source of the issue, do you want to send a PR?

@NiteshKant NiteshKant added the bug label Aug 25, 2015
@brenuart
Author

As you say, I'm aware of the source of the issue, but I have no clue how to solve it "nicely".
Here is how I understand the problem:

1/ zuul-netflix-webapp/src/main/groovy/filters/route/ZuulNFRequest.groovy creates a new com.netflix.zuul.dependency.ribbon.hystrix.RibbonCommand for every incoming request

2/ ribbonCommand.execute() is invoked inside a try/catch block. The command returns an HttpResponse instance upon successful completion that holds the used HTTP connection. This connection is automatically closed when the stream is fully read or the response explicitly closed.

If the command times out, a Hystrix timeout exception is thrown and no HttpResponse is ever returned, so there is no way to release/close it. The connection is never returned to the pool, which causes the leak.
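
To make the mechanism concrete, here is a minimal standalone sketch (a made-up command, not the actual RibbonCommand) of why the caller never gets a reference it could close once Hystrix times out first:

    import com.netflix.hystrix.HystrixCommand;
    import com.netflix.hystrix.HystrixCommandGroupKey;
    import com.netflix.hystrix.HystrixCommandProperties;
    import com.netflix.hystrix.exception.HystrixRuntimeException;

    import java.io.Closeable;
    import java.io.IOException;

    public class LeakDemoCommand extends HystrixCommand<Closeable> {

        LeakDemoCommand() {
            super(Setter.withGroupKey(HystrixCommandGroupKey.Factory.asKey("demo"))
                    .andCommandPropertiesDefaults(HystrixCommandProperties.Setter()
                            .withExecutionTimeoutInMilliseconds(100)
                            // blocking HTTP I/O typically ignores interrupts, so simulate that
                            .withExecutionIsolationThreadInterruptOnTimeout(false)));
        }

        @Override
        protected Closeable run() throws Exception {
            // Stands in for the pooled HTTP connection held by the HttpResponse.
            Closeable connection = () -> System.out.println("connection released");
            Thread.sleep(500); // execTime > hystrixTimeout: execute() has already thrown by now...
            return connection; // ...so this is handed back to nobody and never closed -> leak
        }

        public static void main(String[] args) {
            try (Closeable response = new LeakDemoCommand().execute()) {
                // Happy path: the caller closes the response, releasing the connection.
            } catch (HystrixRuntimeException | IOException e) {
                // Timeout path: no response reference exists here, so nothing can be closed.
                System.out.println("timed out, nothing to close: " + e.getMessage());
            }
        }
    }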

In my opinion, the RibbonCommand.execute() method should be responsible for releasing its resources in case of an exception. I'm not a Hystrix expert, so I wonder if there is a callback or other mechanism for a command to be notified when it times out?

Our (temporary) approach was to slightly modify the RibbonCommand.forward() method to test for execution timeout before returning the HttpResponse and eagerly close it if needed.

But again, someone with better Hystrix and/or Zuul experience may find a better and cleaner approach.

@z0mb1ek

z0mb1ek commented Apr 13, 2016

any news?

@dorkolog

dorkolog commented May 29, 2016

@brenuart @NiteshKant @z0mb1ek

We were also experiencing connection leaks with Zuul.
Unfortunately, setting the hystrixTimeout or readTimeout didn't help much.

The only thing that did help was upgrading the apache-httpclient module and using org.apache.http.impl.conn.PoolingHttpClientConnectionManager, since all the rest were deprecated (com.netflix.http4.NFHttpClientFactory creates a com.netflix.http4.NFHttpClient that uses a hardcoded org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager).

Most of this factory code is pretty much sealed and cannot be changed easily, so we had to do many tricks to make this happen (mainly re-writing the NFHttpClientFactory and the NFHttpClient, and injecting our factory).
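
For anyone attempting the same route, here is a rough sketch of the kind of client that can be wired in (illustrative only; pool sizes and timeouts are placeholders, and the actual NFHttpClientFactory/NFHttpClient rewrite is not shown):

    import org.apache.http.client.config.RequestConfig;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClients;
    import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;

    public class PoolingClientFactory {

        public static CloseableHttpClient newClient() {
            // Non-deprecated pooling manager, replacing the hardcoded ThreadSafeClientConnManager.
            PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager();
            cm.setMaxTotal(200);            // illustrative pool sizing
            cm.setDefaultMaxPerRoute(50);

            RequestConfig requestConfig = RequestConfig.custom()
                    .setConnectTimeout(1000)
                    .setSocketTimeout(10000)  // plays the role of the ReadTimeout discussed above
                    .build();

            return HttpClients.custom()
                    .setConnectionManager(cm)
                    .setDefaultRequestConfig(requestConfig)
                    .evictExpiredConnections() // reap expired connections instead of leaving them in CLOSE_WAIT
                    .build();
        }
    }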

So far - so good. Connection leaks are gone, and now we don't see any CLOSE_WAIT connections in netstat.

@kerumai
Contributor

kerumai commented Jun 7, 2016

@brenuart were you doing something like this in RibbonCommand? :

    HttpResponse forward() throws Exception {
        NFRequestContext context = NFRequestContext.getCurrentContext();

        HttpRequest.Builder builder = HttpRequest.newBuilder().
                verb(verb).
                uri(uri).
                entity(requestEntity);

        for (String name : headers.keySet()) {
            List<String> values = headers.get(name);
            for (String value : values) {
                builder.header(name, value);
            }
        }

        for (String name : params.keySet()) {
            List<String> values = params.get(name);
            for (String value : values) {
                builder.queryParams(name, value);
            }
        }

        HttpRequest httpClientRequest = builder.build();

        HttpResponse response = restClient.executeWithLoadBalancer(httpClientRequest);
        context.setZuulResponse(response);

        // Here we want to handle the case where this hystrix command timed-out before the
        // ribbon client received a response from origin. So in this situation we want to
        // cleanup this response (release connection) now, as we know that the zuul filter
        // chain has already continued without us and therefore won't cleanup itself.
        if (isResponseTimedOut()) {
            response.close();
        }

        return response;
    }

That seems like maybe the simplest solution to this - at least I'm not seeing a better way to solve it without significant work.

If this sounds ok, I'll merge this in.

@brenuart
Author

brenuart commented Jun 7, 2016

Yes. At least for the part:

        if (isResponseTimedOut()) {
            response.close();
        }

kerumai added a commit that referenced this issue Jun 8, 2016
Handle the case where the hystrix RibbonCommand timed-out before the ribbon client received a response from origin.
So in this situation we want to release the connection now, as we know that the zuul filter chain has already continued
without us and therefore won't cleanup itself.
kerumai added a commit that referenced this issue Jun 8, 2016
@kerumai
Contributor

kerumai commented Jun 8, 2016

Ok, I've merged that change (#225), and done a new patch release - v.1.2.2

@DaveBender

This issue is still marked 'Open', but the comments suggest that a bug fix was merged nearly a year ago. Is this fixed, or is it still hanging out there open?


github-actions bot commented Dec 7, 2024

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the Stale label Dec 7, 2024