You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
We are in the process of migrating our .NET project from Framework to Core (and from Windows to Linux) and, during peak hours and thread starvation, we are getting a lot of Redis timeouts.
It's best illustrated with an MRE.
Here's what the script does:
Creates and connects a Redis multiplexer.
Saturates the thread pool with lots of units of work that do nothing except sleeping.
Runs and awaits a single GET from Redis.
Run it by installing dotnet tool install -g dotnet-script and then running dotnet script mre.csx
#nullable enable
#r "nuget: StackExchange.Redis, 2.7.33"usingSystem.Runtime.CompilerServices;usingSystem.Threading;usingStackExchange.Redis;usingStackExchange.Redis.Profiling;constintsleepTimeMilliseconds=7000;constintextraSleepingThreads=10;conststringredisConnectionString="127.0.0.1:7200";ThreadPool.GetMinThreads(outvarworkerThreads,outvarcompletionPortThreads);WriteLine($"Min worker threads: {workerThreads}, min completion port threads: {completionPortThreads}");varsleepingThreadCount=workerThreads+extraSleepingThreads;varconnectionMultiplexer=awaitConnectionMultiplexer.ConnectAsync(redisConnectionString, options =>{options.ClientName="redis-timeout-test";},Error);awaitconnectionMultiplexer.GetDatabase(0).StringGetAsync("test");varprofilingSession=newProfilingSession();connectionMultiplexer.RegisterProfiler(()=>profilingSession);voidPrintStats(stringstage,intid,[CallerMemberName]string?caller=null){WriteLine($"{caller}\t{id}\t{stage}\t{Thread.CurrentThread.ManagedThreadId}");}voidSleep(object?state){vari=(int)state!;try{PrintStats("start",i);Thread.Sleep(sleepTimeMilliseconds);PrintStats("end",i);}catch{PrintStats("exception",i);throw;}}asyncTaskRedis(inti){try{PrintStats("start",i);awaitconnectionMultiplexer.GetDatabase().StringGetAsync("test");PrintStats("end",i);}catch{PrintStats("exception",i);throw;}}for(vari=0;i<sleepingThreadCount;i++){ThreadPool.QueueUserWorkItem(Sleep,i);}awaitRedis(0);foreach(varprofiledCommandinprofilingSession.FinishProfiling()){WriteLine(profiledCommand);}
Our expectation is that thread starvation should not affect the inner workings of Redis client, and that the request should succeed.
It works on Windows (both in .NET 4.7.2 and .NET 8), but not on Linux.
On Linux, it fails with a timeout:
Sleep 0 start 14
Sleep 3 start 8
Sleep 4 start 11
Sleep 5 start 7
Sleep 1 start 12
Redis 0 start 5
Sleep 6 start 13
Sleep 2 start 10
Sleep 7 start 9
Sleep 8 start 15
Sleep 9 start 29
Sleep 10 start 30
Sleep 11 start 31
Sleep 12 start 5
Sleep 13 start 32
Sleep 14 start 33
Sleep 15 start 34
Sleep 16 start 35
Sleep 17 start 36
Redis 0 exception 37
StackExchange.Redis.RedisTimeoutException: Timeout awaiting response (outbound=0KiB, inbound=0KiB, 5461ms elapsed, timeout is 5000ms), command=GET, next: GET test, inst: 0, qu: 0, qs: 1, aw: False, bw: Inactive, rs: ReadAsync, ws: Idle, in: 5, in-pipe: 0, out-pipe: 0, last-in: 0, cur-in: 0, sync-ops: 0, async-ops: 3, serverEndpoint: 127.0.0.1:7201, conn-sec: 5,49, aoc: 1, mc: 1/1/0, mgr: 10 of 10 available, clientName: redis-timeout-test, PerfCounterHelperkeyHashSlot: 6918, IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=19,Free=32748,Min=12,Max=32767), POOL: (Threads=19,QueuedItems=5,CompletedItems=101,Timers=1), v: 2.7.33.41805 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)
at Submission#0.<Redis>d__11.MoveNext() in /mnt/c/Users/******/Projects/redis-test/main.csx:line 54
--- End of stack trace from previous location ---
at Submission#0.<<Initialize>>d__0.MoveNext() in /mnt/c/Users/******/Projects/redis-test/main.csx:line 69
--- End of stack trace from previous location ---
at Dotnet.Script.Core.ScriptRunner.Execute[TReturn](String dllPath, IEnumerable`1 commandLineArgs) in C:\Users\runneradmin\AppData\Local\Temp\tmp8BAF\Dotnet.Script.Core\ScriptRunner.cs:line 110
The reason for this is that even though the library creates a dedicated thread pool and does all the work inside this pool, some continuations still run on the main thread pool (which kinda defeats the purpose).
The problem seems to be here:
Breakpoint reached: /home/******/.config/JetBrains/Rider2024.1/resharper-host/SourcesCache/37cf93917ec38cc6a96176b4240fe2fe05f294df6d9955a4c023fd9b30aa58/SocketAwaitableEventArgs.cs:142,21
at Pipelines.Sockets.Unofficial.SocketAwaitableEventArgs.OnCompleted(SocketAsyncEventArgs e) in /_/src/Pipelines.Sockets.Unofficial/SocketAwaitableEventArgs.cs:line 145
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Net.Sockets.SocketAsyncEventArgs.TransferCompletionCallbackCore(Int32 bytesTransferred, Memory`1 socketAddress, SocketFlags receivedFlags, SocketError socketError)
at System.Net.Sockets.SocketAsyncEngine.System.Threading.IThreadPoolWorkItem.Execute()
at System.Threading.ThreadPoolWorkQueue.Dispatch()
at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()
The breakpoint above shows that there is a unit of work within Pipelines.Sockets.Unofficial that is being run on the main thread pool. This unit of work is being scheduled from here:
Apparently, on Linux, Socket.ReceiveAsync(SocketAsyncEventArgs), when running asyncronously, always schedules the event callback on the main thread pool. If the main thread pool is backlogged, the continuation can't go through in time, and Redis client throws a timeout error.
This particular problem can be fixed by having the socket process its continuations on the socket thread. It can be done by setting the environment library DOTNET_SYSTEM_NET_SOCKETS_INLINE_COMPLETIONS=1. However, this setting causes more problems with third-party libraries than it solves. Apparently, there is no way in .NET to set it for a particular socket (or I couldn't find it).
Is there any other way to have Pipelines.Sockets.Unofficial do all its work on the dedicated thread pool (and, ultimately, make Redis not time out if the process is thread starved)?
The text was updated successfully, but these errors were encountered:
We are in the process of migrating our .NET project from Framework to Core (and from Windows to Linux) and, during peak hours and thread starvation, we are getting a lot of Redis timeouts.
It's best illustrated with an MRE.
Here's what the script does:
Run it by installing
dotnet tool install -g dotnet-script
and then runningdotnet script mre.csx
Our expectation is that thread starvation should not affect the inner workings of Redis client, and that the request should succeed.
It works on Windows (both in .NET 4.7.2 and .NET 8), but not on Linux.
On Linux, it fails with a timeout:
The reason for this is that even though the library creates a dedicated thread pool and does all the work inside this pool, some continuations still run on the main thread pool (which kinda defeats the purpose).
The problem seems to be here:
The breakpoint above shows that there is a unit of work within Pipelines.Sockets.Unofficial that is being run on the main thread pool. This unit of work is being scheduled from here:
Pipelines.Sockets.Unofficial/src/Pipelines.Sockets.Unofficial/SocketConnection.Receive.cs
Line 221 in 6703efc
Apparently, on Linux,
Socket.ReceiveAsync(SocketAsyncEventArgs)
, when running asyncronously, always schedules the event callback on the main thread pool. If the main thread pool is backlogged, the continuation can't go through in time, and Redis client throws a timeout error.This particular problem can be fixed by having the socket process its continuations on the socket thread. It can be done by setting the environment library
DOTNET_SYSTEM_NET_SOCKETS_INLINE_COMPLETIONS=1
. However, this setting causes more problems with third-party libraries than it solves. Apparently, there is no way in .NET to set it for a particular socket (or I couldn't find it).Is there any other way to have
Pipelines.Sockets.Unofficial
do all its work on the dedicated thread pool (and, ultimately, make Redis not time out if the process is thread starved)?The text was updated successfully, but these errors were encountered: