Bug fix in Netflix / zuul not in spring-cloud-starter-zuul... hoping it can be added. #1871
Comments
It looks like the fix is present in Line 112 in f3deb04 |
Hm... so it is. Thanks for finding that, ryanjbaxter. Which begs the question, then: why are we still seeing the CLOSE_WAIT connection leak issue while using a version of Zuul that has the fix? We'll pursue further investigation on our side for now, having verified where we stand re: that fix. Thanks again for your help. |
So #1372 deals with that and is in the Dalston release (1.3.0.RELEASE) and 1.2.7.BUILD-SNAPSHOT |
Thanks much Spencer. Time to try another upgrade, then. |
if you try the 1.2.7 snapshots and that works, we could work on prioritizing releasing 1.2.7 |
Cool, I appreciate that. I'll respond again when we've tested the snapshot version and know more. Thanks again. |
No luck. Running Zuul with Dalston and spring-cloud-starter-zuul:1.2.7.BUILD-SNAPSHOT did not resolve the problem for us: CLOSE_WAIT connections still eventually tipped over the app instance. The instance became unresponsive to incoming requests after ~4000 CLOSE_WAIT connections accumulated, in about 12 hours. I also found out yesterday that we saw similar behavior recently with Spinnaker's Front50 service, which became unresponsive to incoming requests after ~10000 CLOSE_WAIT connections accumulated. |
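For anyone wanting to track the CLOSE_WAIT count described above over time, here is a minimal sketch that tallies TCP states from `netstat -ant`-style output. The class name `TcpStateCounter` and the assumed column layout (state in the sixth whitespace-separated column, as on typical Linux output) are illustrative assumptions, not something taken from this thread.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: counts TCP connection states in netstat-style lines.
class TcpStateCounter {

    // Assumes Linux `netstat -ant` layout:
    // Proto Recv-Q Send-Q Local-Address Foreign-Address State
    static Map<String, Integer> countStates(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String[] cols = line.trim().split("\\s+");
            // Skip headers and anything that is not a tcp/tcp6 row with a state column.
            if (cols.length < 6 || !cols[0].startsWith("tcp")) {
                continue;
            }
            counts.merge(cols[5], 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample lines with made-up addresses, for illustration only.
        List<String> sample = Arrays.asList(
            "Proto Recv-Q Send-Q Local Address Foreign Address State",
            "tcp 1 0 10.0.0.5:8080 10.0.0.9:51234 CLOSE_WAIT",
            "tcp 0 0 10.0.0.5:8080 10.0.0.9:51235 ESTABLISHED",
            "tcp 1 0 10.0.0.5:8080 10.0.0.7:40210 CLOSE_WAIT");
        System.out.println(countStates(sample));
    }
}
```

Feeding it periodic snapshots of the real `netstat` output would show whether the CLOSE_WAIT count grows steadily or plateaus, which is the distinction several of the reports in this thread rely on.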
It sounds like you used both |
My mistake. I misunderstood what Spencer meant. Thank you for clarifying. I'll give it another shot using Camden and the 1.2.7 snapshot. |
One more update: Running with

The number of long-running CLOSE_WAIT connections the instance accumulated before becoming unresponsive increased significantly, as did the amount of time it took to get there, but the problem reoccurred. On an instance using our previous configuration, it usually takes around 50 connections and about an hour to tip over; the updated test instance plateaued around 230 long-running connections, and lasted much of the weekend before we saw the issue.

Here is the Gradle dependency tree for our updated setup... wondering if someone can take a look and verify we're pulling in the correct version(s) of relevant dependencies to use the fix?

Dependency Tree
+--- org.springframework.cloud:spring-cloud-starter-eureka: -> 1.2.6.RELEASE [Sorry for the messy formatting... haven't been able to figure out how to include code formatting in a collapsible block...] |
Can you show us your |
Is there a specific subsection I can post? |
I just want to see how you have setup your dependencies. |
ok Let me know if this isn't sufficient info... build.gradle
buildscript { Your help with this is very much appreciated. |
I have little hope this will actually make a difference but I meant to use
Then you wouldn't need to do you should just need The BOM will pull in |
Gotcha, I'll give that a shot. |
Ah, I do see a change in the zuul portion of the dependency tree after that change... before
+--- org.springframework.cloud:spring-cloud-starter-zuul:1.2.7.BUILD-SNAPSHOT after
+--- org.springframework.cloud:spring-cloud-starter-zuul: -> 1.2.7.BUILD-SNAPSHOT Hopefully that will do it. |
Once again, unfortunately, we saw the issue reoccur when using the The behavior of the long-running CLOSE_WAIT connections changed somewhat, with regard to the rate at which the number of such connections increased, but the app still ended up unresponsive after about a day running in our environment. Any other suggestions would be appreciated. Is there debug or trace logging around the bug fix code in Zuul that we can use to verify that our requests are actually using the fixed logic? |
I don't see any logging around it. You are using Ribbon in your Zuul proxy and not specifying URLs for the routes, correct? |
There's a mix... We use ribbon for the majority of routes, but we do have a few that forward requests to AWS S3 for static content using a bucket URL. |
So the fix above would only apply to requests routed through Ribbon, not to the ones using URL routes. Maybe it is the URL routes that are causing the issue... |
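To check which routes fall on each side of that distinction, one could scan the Zuul route properties. The sketch below is a simplified illustration: it assumes routes are declared with explicit `zuul.routes.<name>.url` or `zuul.routes.<name>.serviceId` keys, and it does not handle shorthand route definitions or `forward:` URLs. The class name `ZuulRouteKinds` and the sample values are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: labels each configured Zuul route as URL-based or Ribbon-based.
class ZuulRouteKinds {

    // Matches zuul.routes.<name>.url and zuul.routes.<name>.serviceId (or service-id).
    private static final Pattern ROUTE_KEY =
        Pattern.compile("^zuul\\.routes\\.([^.]+)\\.(url|serviceId|service-id)$");

    static Map<String, String> classify(Map<String, String> props) {
        Map<String, String> kinds = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : props.entrySet()) {
            Matcher m = ROUTE_KEY.matcher(e.getKey());
            if (!m.matches()) {
                continue; // not a route target property (e.g. a .path key)
            }
            // Routes with a url bypass Ribbon; routes with a service id go through it.
            String kind = m.group(2).equals("url") ? "URL" : "RIBBON";
            kinds.put(m.group(1), kind);
        }
        return kinds;
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        // Made-up example values, for illustration only.
        props.put("zuul.routes.static.path", "/static/**");
        props.put("zuul.routes.static.url", "https://example-bucket.s3.amazonaws.com");
        props.put("zuul.routes.users.path", "/users/**");
        props.put("zuul.routes.users.serviceId", "users-service");
        System.out.println(classify(props));
    }
}
```

Cross-referencing the remote addresses of the stuck CLOSE_WAIT connections against the routes labeled `URL` would indicate whether the leak is confined to the non-Ribbon routes.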
It's possible... We'll try to determine whether the connections with the issue when running the snapshot version of Camden are all non-Ribbon connections. In the meantime, any word on an ETA for the Camden SR7 release (containing the fix)? |
SR7 went live a day or two ago |
Thanks Spencer |
@ryanjbaxter Based on the reporting of long-running CLOSE_WAIT connections we have on our Zuul instances, running with the Camden.BUILD-SNAPSHOT version still resulted in connections to EC2 instances (via Ribbon) in the long-running CLOSE_WAIT state, and the Zuul instances becoming unresponsive. We'll continue our testing with the Camden.SR7 release and see if anything changes... I'll respond again if/when we have more information that might be helpful. |
Hello
There is a bug fix in Netflix / zuul that I'm interested in, but it doesn't appear to be available in spring-cloud-netflix (at least, not in the version we use or any other more recent version that I can find).
The version of spring-cloud-starter-zuul we use is:
org.springframework.cloud:spring-cloud-starter-zuul:1.2.6.RELEASE
... which uses:
com.netflix.zuul:zuul-core:1.3.0
Specifically, #127 and #327 discuss the issue, and the fix was added to zuul-netflix as part of this change. It was not added to zuul-core, however.
It looks like the fix was present in this version of RibbonCommand.java in spring-cloud-netflix, but that file is not present in the version we use. It also looks like the RibbonCommand.java file was replaced with RestClientRibbonCommand.java in the 1.2.6-RELEASE version, which is missing the fix.
Please let me know if anything I've said is inaccurate.
Is there a current version of spring-cloud-starter-zuul which contains the fix?
If not, can it be added?