Synchronous operations like CreateModel block their thread waiting for a response to an async Write that they could be blocking. #1354
Comments
That is fair enough, and like I said above I do have a workaround. It does seem like this issue has been cropping up in the wild though since 6.1 (see #1078 #860 #1127), and I hope you'd at least consider a setting like "DedicatedWriteLoop" as a patch to 6.5, since the change could be isolated to a single file and turned off by default, then marked obsolete for 7.0. If you're not planning any minor releases before 7.0.0, it's a moot point. Also, an obligatory thank you for the work you all put in on this library!
Yes, please feel free to open a PR with this change.
@michac - I plan on releasing version 6.6.0 this month if you'd like to prepare a PR in time for me to review and test it. Thanks.
I have refactored my project to no longer create and initialize hundreds of channels simultaneously, so I don't personally need this change anymore. It's a nice addition in case anyone else runs across this problem, but it is a setting for an edge case that's going to have to be obsoleted and phased out in the future. I have submitted a pull request, but I definitely understand if you opt not to include it.
This is also an issue for us. We're using NServiceBus, and the NServiceBus.RabbitMQ project also creates more channels as concurrency increases. Point C from michac is totally spot on: this took three days to track down. Since the ThreadPool seems to set the minimum thread count low (8 in my testing), it only takes a relatively small amount of concurrency before ThreadPool starvation sets in (which leads to timeouts in the RabbitMQ .NET Client, but also blocks the ThreadPool for other async continuations as well). Basically, I really hope this fix gets released soon.
There is a workaround - increase the size of the thread pool.
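A minimal sketch of that workaround: raise the thread pool's minimum thread counts at application startup so a burst of blocking operations doesn't have to wait on the pool's slow thread-injection rate. The figure of 200 worker threads below is illustrative, not a recommendation; tune it to the number of operations that may block simultaneously.

```csharp
using System;
using System.Threading;

// Raise the minimum worker-thread count so the pool injects threads
// immediately up to this limit instead of throttling injection once
// the default minimum (often the CPU count) is exceeded.
// 200 is an illustrative value, not a recommendation.
ThreadPool.GetMinThreads(out int workerThreads, out int completionPortThreads);
ThreadPool.SetMinThreads(Math.Max(workerThreads, 200), completionPortThreads);
```

The trade-off is that a larger minimum keeps more threads eligible to start immediately, which shrinks the window for starvation but does not remove the underlying sync-over-async blocking.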
@brad-wright-512 - what would help us immensely is if you can test @michac's enhancement in your environment - #1392
@lukebakken did a bit of testing and things are certainly a lot better. The blocking on the connection itself looks to be entirely gone. However, in my most intense test script I can still see issues due to ThreadPool starvation. It is only happening when trying to create new connections. I have some quick and dirty changes I've made that fix the issue, but I'm struggling to get them formatted neatly in a comment and my company blocks GitHub access. I'll see about getting it somewhere accessible soon. Until then you can reproduce the connection blocking with this script:

```csharp
using System.Diagnostics;
using RabbitMQ.Client;

var done = false;

var threadPoolMonitor = new Thread(MonitorThreadPool) { IsBackground = true };
threadPoolMonitor.Start();

void MonitorThreadPool()
{
    while (!done)
    {
        var queuedItems = ThreadPool.PendingWorkItemCount;
        var threadCount = ThreadPool.ThreadCount;
        // uncomment to watch the ThreadPool queue drain
        //Console.WriteLine($"Queued: {queuedItems}, Threads: {threadCount}");
        Thread.Sleep(1000);
    }
}

// simulate thread pool contention
var tasks = Enumerable.Range(0, 50)
    .Select(n => Task.Run(() => Thread.Sleep(5000)))
    .ToList();

var (hostname, virtualHost, username, password) =
(
    "localhost",
    "/",
    "guest",
    "guest"
);

var factory = new ConnectionFactory()
{
    HostName = hostname,
    VirtualHost = virtualHost,
    UserName = username,
    Password = password,
    UseBackgroundThreadsForIO = true,
    Ssl = new SslOption(hostname)
    {
        Enabled = false,
    }
};
factory.Ssl.CertificateValidationCallback += (sender, cert, chain, error) => true;

var sw = Stopwatch.StartNew();
using (var connection = factory.CreateConnection("RMQ Client Test"))
{
    Console.WriteLine($"Connection created in {sw.Elapsed}");
    sw.Restart();
}
Console.WriteLine($"Connection closed in {sw.Elapsed}");

done = true;
threadPoolMonitor.Join();
Console.WriteLine("Joined");
```
@brad-wright-512 It's a shame GitHub is blocked. We really appreciate your efforts here! I'm assuming that increasing the thread pool size in advance (i.e. the "usual workaround") does fix these issues in your tests... is that correct?
I think the test that brad is running has diverged a little from the original issue. The problem I was trying to solve with the dedicated write loop was the case where Rabbit operations were artificially blocking thread pool threads because the request (socket write) needed to trigger the response they were blocking on couldn't be sent due to lack of threads. It felt like something the Rabbit library needed to do was blocked by something else the Rabbit library was doing, which warranted a library solution, like putting the write loop on a dedicated thread.
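The dedicated-thread idea can be sketched roughly as follows. This is a hypothetical standalone type, not the client's actual internals; it assumes frames arrive as byte arrays and writes them from a plain (non-pool) thread, so a saturated ThreadPool cannot prevent queued frames from reaching the socket.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading;

// Sketch of a dedicated write loop: producers enqueue frames from any
// thread; a single plain Thread drains the queue and writes to the stream.
class DedicatedWriteLoop : IDisposable
{
    private readonly BlockingCollection<byte[]> _queue = new BlockingCollection<byte[]>();
    private readonly Stream _stream;
    private readonly Thread _thread;

    public DedicatedWriteLoop(Stream stream)
    {
        _stream = stream;
        _thread = new Thread(WriteLoop) { IsBackground = true, Name = "rmq-write-loop" };
        _thread.Start();
    }

    public void Write(byte[] frame) => _queue.Add(frame);

    private void WriteLoop()
    {
        // GetConsumingEnumerable blocks this dedicated thread (not a pool
        // thread) until a frame arrives or CompleteAdding is called.
        foreach (byte[] frame in _queue.GetConsumingEnumerable())
        {
            _stream.Write(frame, 0, frame.Length);
            if (_queue.Count == 0)
            {
                _stream.Flush(); // flush once the backlog is drained
            }
        }
    }

    public void Dispose()
    {
        _queue.CompleteAdding();
        _thread.Join();
    }
}
```

Because `Write` never blocks on the socket, a caller that is itself holding a pool thread while waiting for a broker response cannot be deadlocked by its own pending write.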
Well, there are still some places in the .NET RMQ Client where sync/async code is mixed, and it does cause the same sort of pressure on the thread pool. Admittedly, the circumstances where this would occur (creating new connections, recovering an existing connection, etc.) are significantly less likely than creating channels and publishing messages, but they do still exist. One interesting thing of note: the write loop can still hang when there is thread pool contention, even after these changes. To fix this, I had to make it truly fully synchronous.

```csharp
private readonly AutoResetEvent _channelChangedEvent = new AutoResetEvent(false);

// ...

public void Write(ReadOnlyMemory<byte> memory)
{
    _channelWriter.TryWrite(memory);
    _channelChangedEvent.Set();
}

private void SynchronousWriteLoop()
{
    while (WaitToRead(_channelReader, out ReadOnlyMemory<byte> memory))
    {
        _socket.Client.Poll(_writeableStateTimeoutMicroSeconds, SelectMode.SelectWrite);
        do
        {
            if (MemoryMarshal.TryGetArray(memory, out ArraySegment<byte> segment))
            {
                if (segment.Array != null)
                {
                    _writer.Write(segment.Array, segment.Offset, segment.Count);
                    MemoryPool.Return(segment.Array);
                }
            }
        }
        while (_channelReader.TryRead(out memory));
        _writer.Flush();
    }
}

private bool WaitToRead(ChannelReader<ReadOnlyMemory<byte>> reader, out ReadOnlyMemory<byte> memory)
{
    while (!reader.TryRead(out memory) && !reader.Completion.IsCompleted)
    {
        _channelChangedEvent.WaitOne(TimeSpan.FromMilliseconds(50));
    }

    if (reader.Completion.IsCompleted)
    {
        memory = ReadOnlyMemory<byte>.Empty;
        return false;
    }

    return true;
}
```
So you're saying that PR #1392 is not complete? I can see the difference... though I'm not thrilled about polling. @brad-wright-512 - did you see my comment about the workaround of expanding the size of the thread pool? It would be great to confirm that doing so solves your issues using the already-released 6.5.0 version of this library.
@lukebakken it's tricky. Yes, it does help, because it makes the scenario where deadlock can occur smaller and smaller as you increase the number of threads... but I wonder if that itself doesn't come at some sort of cost? After a bit of fiddling, I managed to get the changes I made out on GitHub here: https://github.com/brad-wright-512/rabbitmq-dotnet-client/tree/Fix-thread-pool-contention This shows all the places where sync/async code mixing can cause issues. (Although as previously stated, the changes made by @michac do alleviate most of them.)
I am experiencing a similar problem related to thread pool contention and intermittent hangs in ASP.NET 4.8 applications. I create the connection and model synchronously and subscribe to events. How can I avoid the context capture?
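One common mitigation, sketched below and not specific to this client: keep the broker setup off the request's synchronization context by running the blocking work on the thread pool and awaiting it with `ConfigureAwait(false)`. `InitializeRabbitAsync` is a hypothetical helper, not part of the client API.

```csharp
using System.Threading.Tasks;
using RabbitMQ.Client;

public static class RabbitSetup
{
    // Hypothetical helper: run the blocking connection/model setup on a
    // pool thread; ConfigureAwait(false) resumes the continuation on a
    // pool thread instead of the captured ASP.NET synchronization
    // context, avoiding sync-over-async deadlocks.
    public static async Task InitializeRabbitAsync(ConnectionFactory factory)
    {
        await Task.Run(() =>
        {
            var connection = factory.CreateConnection();
            var channel = connection.CreateModel();
            // ... declare queues, attach event handlers ...
        }).ConfigureAwait(false);
        // Code after this point runs without the request context.
    }
}
```

Note that `ConfigureAwait(false)` must be applied throughout the call chain; a single missed await that blocks on the captured context can still deadlock.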
Describe the bug
CreateModel writes to the async write channel in the SocketFrameHandler, then blocks waiting for a response. If you queue up enough of these operations it uses up all the available threads in the thread pool. The thread pool will start provisioning more threads in response, but once you're over your min thread count it does so relatively slowly, and you can find yourself timing out before it's gotten around to processing the write queue.
I encountered this issue upgrading from 5.2 to 6.5. In my case I am using a library called Akka.NET to manage background jobs. I create an actor (aka thread with message pump) per job, and based on the documentation I figured I needed a channel per job. Over time jobs accumulated in my test environment and now when my app starts up I am attempting to create over 100 channels simultaneously.
Note: This applies to all synchronous operations, not just CreateModel. I can reproduce this with QueueDeclare for example.
In retrospect I would not have implemented my app this way, for several reasons, and in the medium term I plan to centralize my Rabbit interactions more and not have so many threads interacting with the library directly. However, I do think it's worth considering fixing this issue because a) I can see how someone could end up here, b) I feel like the library is not being a good ThreadPool citizen by locking down threads and relying on thread pool headroom to get unlocked, and c) it's kind of a bear to figure out what's going on when you encounter the problem.
Reproduction steps
https://github.com/michac/RabbitThreadPoolTest
Expected behavior
Operations like CreateModel are able to complete even if the ThreadPool is too busy to provide a second thread.
I have a few ideas for fixing the problem:
Additional context
Problems like this should go away once the library is async across the board (which sounds like the direction you are headed).