Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

'Leak' of channels (IModel instances) in case where connection is blocked #1573

Open
lukebakken opened this issue May 16, 2024 Discussed in #1131 · 11 comments
Open

'Leak' of channels (IModel instances) in case where connection is blocked #1573

lukebakken opened this issue May 16, 2024 Discussed in #1131 · 11 comments
Assignees
Milestone

Comments

@lukebakken
Copy link
Contributor

Discussed in #1131

Originally posted by neilgreatorex December 24, 2021
Hi,

I think I've come across an issue whereby channels can be 'leaked' during a connection that has been blocked by the broker due to resource constraints. The issue occurs if you try to create new channels while the connection is blocked. The calls to CreateModel() throw a System.TimeoutException and you (quite reasonably) don't get a reference to the channel.

The problem is that once the connection is unblocked, suddenly the channels that you tried to create appear on the broker end and are not cleaned up by any garbage collection in the client. Looking at the heap at this point, the problem seems to be that the sessions created in the SessionManager are not cleaned up. As far as I can see, there's no way for a user of the client library to access the SessionManager via public API.

I've added my sample code to this gist. In the gist is also the results that I get on the console when running the code against a resource constrained (blocked) broker (along with the rabbitmqctl commands that were run and their output).

Please let me know if you want me to raise an issue here on GitHub for this, or if I should post this elsewhere.

Thanks,
Neil

PS. Merry Christmas to those who celebrate 😃🎅

@lukebakken lukebakken added this to the 7.0.0 milestone May 16, 2024
@lukebakken lukebakken self-assigned this May 16, 2024
@michaelklishin
Copy link
Member

In the Java client, the behavior of a Connection#createChannel call is not to return unless a channel.open-ok frame is received. When the connection is blocked by a resource alarm, that will not happen until the operation times out or the alarm clears. So I guess that avoids the leak.

In Bunny, in addition to the behavior above, there's a predicate on Bunny::Connection that returns true when the connection is blocked.

If the Java client option would be too difficult or impractical to pull off, the Bunny solution plus throwing an exception might be a good enough approximation.

@lukebakken
Copy link
Contributor Author

Thanks @michaelklishin. I moved this to an issue to try to reproduce it.

@neilgreatorex
Copy link

I was the original author of the discussion post. I did try this again recently and the behaviour still seems to be there.

@lukebakken
Copy link
Contributor Author

Hi @neilgreatorex, thanks for following up.

I took you code and added it to a test to demonstrate that this issue is fixed in the latest code -
#1587

Note that the TimeoutException when creating channels on a blocked connection does not happen anymore. I'm not sure why to be honest but it's good enough for me.

lukebakken added a commit that referenced this issue Jun 3, 2024
lukebakken added a commit that referenced this issue Jun 3, 2024
@neilgreatorex
Copy link

neilgreatorex commented Jun 3, 2024

Hi @lukebakken. Unfortunately I can still reproduce the issue using the code in master. I have shared my test class at https://gist.github.com/neilgreatorex/63e163c933ca8b2836b987eeac85a7b4.

To test:

  1. Startup a broker that is resource constrained (I used rabbitmqctl set_vm_memory_high_watermark absolute 10).
  2. Run the class I linked, following the console instructions.
  3. After clearing the resource constraint 5 channels appear on the broker (seen with rabbitmqctl list_channels) that the client has no reference to.

@lukebakken
Copy link
Contributor Author

@neilgreatorex do you get TimeoutException thrown when you create the five channels in your env?

@lukebakken lukebakken reopened this Jun 3, 2024
@neilgreatorex
Copy link

@neilgreatorex do you get TimeoutException thrown when you create the five channels in your env?

@lukebakken I get TaskCancelledExceptions now with the async methods

@lukebakken
Copy link
Contributor Author

Thanks for the updated code, I had a bug in my test. I can reproduce this now.

@lukebakken
Copy link
Contributor Author

Here is what is happening - when those CreateChannelAsync methods time out (due to not receiving a response from RabbitMQ), they should NOT be considered valid and added to the channelTasks array, even though the channels are considered open in RabbitMQ.

When the connection is unblocked, RabbitMQ sends the channel.open-ok response, but the task continuation already timed out (TaskCanceledException), so the client library can't do anything with them. The channels are "zombies" within RabbitMQ that will be closed when the connection closes.

Maaybe the next version of this library can keep track of "stale, timed-out" continuations for some time (perhaps when the connection is blocked) to see if a response is sent back, but version 7 will not do that.

cc @michaelklishin @Gsantomaggio I thought you'd find this case interesting.

@lukebakken lukebakken modified the milestones: 7.0.0, 8.0.0 Jun 3, 2024
@neilgreatorex
Copy link

@lukebakken Would it be possible that if the library received a channel.open-ok for a channel ID that it wasn't expecting (i.e. has no continuation for), it could immediately close the channel? This is just a suggestion without knowledge of how it all works and maybe it's not helpful. I don't even know if the library can tell if the channel ID is valid or not 😃

@lukebakken
Copy link
Contributor Author

lukebakken commented Jun 4, 2024

@neilgreatorex that's a good idea.

@lukebakken lukebakken modified the milestones: 8.0.0, 7.0.0 Jun 4, 2024
@lukebakken lukebakken modified the milestones: 8.0.0, 7.1.0 Sep 10, 2024
@lukebakken lukebakken removed this from the 7.1.0 milestone Dec 11, 2024
@lukebakken lukebakken added this to the 8.0.0 milestone Dec 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants