Processes with unterminated TCP connections are non reusable. #2592

tubsandcans · 2025-02-19T21:07:40Z

In our multi-process Passenger+Nginx service, we are seeing available RAM steadily shrink until the service crashes. This only began after we integrated a feature that reads from and writes to AWS S3. The gem for that uses a connection pool, which leaves used connections in CLOSE_WAIT state for future resumption.

The problem, however, is that once a Passenger RubyApp instance/process is linked to one of these CLOSE_WAIT TCP connections, it appears to no longer have requests routed to it, thus locking up system resources. This causes Passenger to eventually create many more RubyApp instances than our max process count, which inevitably crashes the app.

I brought this issue up with ruby-aws-sdk gem maintainers, and I am able to work around this issue by monkey patching that gem. However, it seems worth investigating this issue from the perspective of Passenger and it (possibly) not being able to reuse processes with CLOSE_WAIT TCP connections.

mullermp · 2025-03-03T14:12:47Z

@CamJN @FooBarWidget I notice you both are maintainers of this repo. Do you mind taking a look at this issue and providing any explanation of how this might be blocking processes? This impacts this particular customer's usage of AWS SDK for Ruby.

FooBarWidget · 2025-03-04T16:49:10Z

Hi @mullermp and @tubsandcans, Passenger doesn't inherently have any interactions with how the app deals with TCP sockets, so I don't really see how Passenger can be a cause here. However, you can use this trick to debug things: when a process is stuck, send SIGQUIT to it, and it'll dump the backtraces of all threads. This will allow you to see whether it's blocked on anything.

If you still suspect it's Passenger-related, please provide the output of passenger-status during a problematic time. It would be even more helpful if you can provide a reproducible case.

tubsandcans · 2025-03-04T17:35:54Z

Hi @FooBarWidget, thanks for replying! The issue here is that I cannot safely reproduce this error, as it only happens in production under heavy load. Reproducing it would require my production service to crash, which I'm not willing to allow.

All I know is that Passenger had no problem re-using its supervised processes before they had any TCP sockets in CLOSE_WAIT (this behavior being introduced by the aws-sdk gem). Since then, I have monkey-patched this gem to not leave sockets in CLOSE_WAIT and instead destroy them. Everything has been working as it had before the aws-sdk integration, so there is definitely some interplay going on between Passenger and its supervised processes with CLOSE_WAIT sockets.
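One low-risk way to gather evidence without reproducing the crash is to track how many sockets sit in CLOSE_WAIT over time. The following is a minimal sketch, assuming a Linux host where `/proc/net/tcp` is readable; the helper name `count_tcp_states` is hypothetical and not part of Passenger or the AWS SDK. Note that the counts cover the whole network namespace, not a single process.

```ruby
# Linux-only sketch: tally TCP sockets by state from /proc/net/tcp-style text.
# The state codes below are the hex values the Linux kernel prints in the
# "st" column of /proc/net/tcp.
TCP_STATES = {
  "01" => "ESTABLISHED",
  "06" => "TIME_WAIT",
  "08" => "CLOSE_WAIT",
  "0A" => "LISTEN"
}.freeze

# Hypothetical helper: takes the file contents as a string and returns a
# Hash mapping state name (or raw hex code if unmapped) to a socket count.
def count_tcp_states(proc_net_tcp_text)
  counts = Hash.new(0)
  # The first line is the column header, so skip it.
  proc_net_tcp_text.each_line.drop(1).each do |line|
    fields = line.split
    next if fields.length < 4
    code = fields[3].upcase
    counts[TCP_STATES.fetch(code, code)] += 1
  end
  counts
end

if __FILE__ == $PROGRAM_NAME
  %w[/proc/net/tcp /proc/net/tcp6].each do |path|
    next unless File.readable?(path)
    puts "#{path}: #{count_tcp_states(File.read(path))}"
  end
end
```

Logging this output periodically would show whether the CLOSE_WAIT count grows alongside the memory usage.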

FooBarWidget · 2025-03-05T10:08:23Z

The only thing I can think of is that, by keeping so many sockets open, you reach the Ruby process's file descriptor limit. Sometimes, when Passenger's Ruby side accepts a new request, it may have to open a new socket connection (= new file descriptor). That would fail if the limit was already reached at that point. But if that's the case, then you should see error messages in your web server error log file.
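To check the file descriptor theory from inside the app, something like the following could be called from a health check or a periodic log line. It is a rough sketch, not anything Passenger provides: `fd_usage` and `near_fd_limit?` are hypothetical helper names, and the open count relies on `/proc/self/fd` (Linux) or `/dev/fd` being available. The count may be off by one, since listing the directory briefly opens a descriptor itself.

```ruby
# Sketch: report how close the current Ruby process is to its
# file descriptor limit.
def fd_usage
  soft_limit, hard_limit = Process.getrlimit(Process::RLIMIT_NOFILE)
  # /proc/self/fd exists on Linux; /dev/fd is a common fallback elsewhere.
  fd_dir = ["/proc/self/fd", "/dev/fd"].find { |dir| File.directory?(dir) }
  open_count = fd_dir ? Dir.children(fd_dir).size : nil
  { open: open_count, soft_limit: soft_limit, hard_limit: hard_limit }
end

# Returns true when the open count is at or above the given fraction
# of the soft limit. The 0.9 default is an arbitrary assumption.
def near_fd_limit?(usage, threshold: 0.9)
  return false if usage[:open].nil?
  return false if usage[:soft_limit] == Process::RLIM_INFINITY
  usage[:open] >= usage[:soft_limit] * threshold
end

if __FILE__ == $PROGRAM_NAME
  usage = fd_usage
  puts "open=#{usage[:open]} soft=#{usage[:soft_limit]} hard=#{usage[:hard_limit]}"
  warn "close to the file descriptor limit" if near_fd_limit?(usage)
end
```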

FooBarWidget · 2025-03-05T10:09:46Z

And since you can't safely reproduce this issue, then the next best thing you can do is to automatically run diagnostics next time the issue does happen. You could create a cron job that sends SIGQUIT to all your application processes once every few minutes, so that if any of them does freeze, then at least you have backtraces.
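As a rough illustration of the cron idea, a small script could send SIGQUIT to a list of PIDs passed on the command line. This is a sketch under assumptions: the helper name `signal_pids` and the script path in the comment are hypothetical, and how you collect the application PIDs (for example from `passenger-status` output) is left to you.

```ruby
# Sketch: send a signal (SIGQUIT by default) to each given PID so that
# Passenger-managed Ruby processes dump their thread backtraces.
#
# Hypothetical cron usage, with PIDs supplied by your own tooling:
#   */5 * * * * ruby /path/to/dump_backtraces.rb <pid> <pid> ...
def signal_pids(pids, signal: "QUIT")
  pids.each_with_object({}) do |pid, results|
    begin
      Process.kill(signal, pid)
      results[pid] = :sent
    rescue Errno::ESRCH
      # The process exited before we could signal it.
      results[pid] = :not_found
    rescue Errno::EPERM
      # The process belongs to another user and we lack permission.
      results[pid] = :not_permitted
    end
  end
end

if __FILE__ == $PROGRAM_NAME
  pids = ARGV.map { |arg| Integer(arg) }
  signal_pids(pids).each do |pid, outcome|
    puts "#{Time.now.utc} pid=#{pid} #{outcome}"
  end
end
```

Run it as the same user as the application processes (or as root), otherwise the signals are rejected and reported as `:not_permitted`.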

mullermp · 2025-03-21T13:57:30Z

@tubsandcans Did you investigate @FooBarWidget's response?

tubsandcans · 2025-03-21T14:05:35Z

@mullermp I don't have a reasonable way to test it. I'd have to roll back the patch and wait until our main production service falls over and then hopefully gather enough debug information to find a solution.

Neither I nor my boss is willing to do that. I was hoping someone would give me a concrete answer about whether CLOSE_WAIT sockets in supervised processes affect the Passenger supervisor's ability to re-use them. I think our long-term solution is to abandon Passenger; unfortunately, it's a form of tech debt.

mullermp · 2025-03-21T14:07:50Z

I don't have a dog in that fight, but I would recommend just using Puma, as it's pretty much the default for Rails applications. You should not see issues with the Ruby SDK's connection pooling.

tubsandcans mentioned this issue Feb 19, 2025

S3 access results in improperly terminated TCP sockets. aws/aws-sdk-ruby#3193
