
Unable to reconnect to the Redis server after RedisConnectionException. #2829

Open
SuperCatss opened this issue Dec 24, 2024 · 2 comments

Comments

@SuperCatss

Hi everyone, I have a .NET 8 API project that runs locally and is deployed through Docker. The Redis server is also deployed in Docker on the same host.

StackExchange.Redis version: 2.8.22

The connection string in the code is as follows:

"127.0.0.1:6379,DefaultDatabase=0,allowAdmin=true,abortConnect=false,connectRetry=5,connectTimeout=3000,syncTimeout=3000,asyncTimeout=3000,name=MesBus_Api,keepAlive=30"
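To double-check which options are actually in effect, the string can be split mechanically. The following is a minimal Python sketch (Python only because it is self-contained; it is not StackExchange.Redis's own parser). It separates the endpoint from the key=value options. `parse_config` is a hypothetical helper, and lower-casing the keys is an assumption made purely for convenient lookup.

```python
# Sketch: tabulate a StackExchange.Redis-style configuration string.
# Assumption: tokens are comma-separated; tokens containing '=' are
# key=value options, bare tokens are endpoints. Keys are lower-cased
# here only so this sketch can look them up easily.
CONFIG = (
    "127.0.0.1:6379,DefaultDatabase=0,allowAdmin=true,abortConnect=false,"
    "connectRetry=5,connectTimeout=3000,syncTimeout=3000,asyncTimeout=3000,"
    "name=MesBus_Api,keepAlive=30"
)


def parse_config(config: str) -> tuple[list[str], dict[str, str]]:
    """Split a configuration string into (endpoints, options)."""
    endpoints: list[str] = []
    options: dict[str, str] = {}
    for token in config.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate trailing or doubled commas
        key, sep, value = token.partition("=")
        if sep:
            options[key.strip().lower()] = value.strip()
        else:
            endpoints.append(token)
    return endpoints, options


endpoints, options = parse_config(CONFIG)
print("endpoints:", endpoints)
for key, value in sorted(options.items()):
    print(f"{key:>16} = {value}")
```

In StackExchange.Redis the three timeout values are in milliseconds and `keepAlive` is in seconds, so this configuration allows 3 s for connects and operations and sends a keep-alive every 30 s.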

Under normal circumstances, I see two client connections named MesBus_Api on the Redis server. They look like this:

id=48 addr=127.0.0.1:35494 laddr=127.0.0.1:6379 fd=14 name=MesBus_Api age=8 idle=0 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=20474 argv-mem=0 multi-mem=0 rbs=1024 rbp=60 obl=0 oll=0 omem=0 tot-mem=22272 events=r cmd=setex user=default redir=-1 resp=2
id=49 addr=127.0.0.1:35496 laddr=127.0.0.1:6379 fd=15 name=MesBus_Api age=8 idle=7 flags=P db=0 sub=1 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=0 omem=0 tot-mem=1856 events=r cmd=subscribe user=default redir=-1 resp=2
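These two lines are consistent with StackExchange.Redis keeping one interactive connection (`flags=N`, last command `setex`) plus a dedicated subscription connection (`flags=P`, `sub=1`). A small parser makes it easy to record that state before and after a failure and compare the two. This is a hedged Python sketch: `parse_client_list` and `summarize` are hypothetical helpers, and `SAMPLE` is an excerpt of the output above with some fields omitted.

```python
from collections import Counter

# Excerpt of the CLIENT LIST output above (some fields omitted for brevity).
SAMPLE = """\
id=48 addr=127.0.0.1:35494 fd=14 name=MesBus_Api age=8 idle=0 flags=N db=0 sub=0 cmd=setex
id=49 addr=127.0.0.1:35496 fd=15 name=MesBus_Api age=8 idle=7 flags=P db=0 sub=1 cmd=subscribe
"""


def parse_client_list(text: str) -> list[dict[str, str]]:
    """Turn each CLIENT LIST line into a dict of its key=value fields.

    Redis client names cannot contain spaces, so splitting on whitespace
    is sufficient for the fields used here.
    """
    clients = []
    for line in text.splitlines():
        fields = {}
        for pair in line.split():
            key, sep, value = pair.partition("=")
            if sep:
                fields[key] = value
        if fields:
            clients.append(fields)
    return clients


def summarize(clients: list[dict[str, str]], name: str) -> dict[str, int]:
    """Count connections with a given name, split by the Redis 'P' (Pub/Sub) flag."""
    kinds = Counter(
        "pubsub" if "P" in c.get("flags", "") else "interactive"
        for c in clients
        if c.get("name") == name
    )
    return {
        "total": sum(kinds.values()),
        "interactive": kinds["interactive"],
        "pubsub": kinds["pubsub"],
    }


print(summarize(parse_client_list(SAMPLE), "MesBus_Api"))
```

Feeding it the output of `CLIENT LIST` captured while the exception is occurring shows directly whether both connection kinds are still registered on the server.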

But from time to time I encounter the following error:

StackExchange.Redis.RedisConnectionException: No connection is active/available to service this operation: HMGET MesBus:TokenBLK:2b290cab2815ab802a7e18e01bf679e1, inst: 0, qu: 0, qs: 0, aw: False, bw: SpinningDown, rs: ReadAsync, ws: Idle, in: 0, in-pipe: 0, out-pipe: 0, last-in: 2, cur-in: 0, sync-ops: 72, async-ops: 227416, serverEndpoint: 127.0.0.1:6379, conn-sec: 4370.5, aoc: 1, mc: 1/1/0, mgr: 10 of 10 available, clientName: hc-phytiumft20004(SE.Redis-v2.7.10.12442), IOCP: (Busy=0,Free=1000,Min=8,Max=1000), WORKER: (Busy=3,Free=32764,Min=16,Max=32767), POOL: (Threads=12,QueuedItems=0,CompletedItems=1155132,Timers=11), v: 2.7.10.12442
at StackExchange.Redis.ConnectionMultiplexer.ThrowFailed[T](TaskCompletionSource`1 source, Exception unthrownException) in /_/src/StackExchange.Redis/ConnectionMultiplexer.cs:line 2037
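The exception message packs its diagnostics into comma-separated `key: value` pairs. When several occurrences are collected, a parser like the hypothetical `parse_diagnostics` below lets them be compared side by side. The split rules are assumptions inferred from this one message, not a documented format. Fields such as `v` and `clientName` can also be checked against the expected package version and the `name=` setting in the connection string.

```python
# Sketch: split the diagnostic part of a StackExchange.Redis exception
# message into a dict. Assumption: fields are separated by ", " and each
# field is "key: value". Values like IOCP's "(Busy=0,Free=1000,...)" use
# commas without spaces, so they survive the split intact.
MESSAGE = (
    "No connection is active/available to service this operation: "
    "HMGET MesBus:TokenBLK:2b290cab2815ab802a7e18e01bf679e1, inst: 0, qu: 0, qs: 0, "
    "aw: False, bw: SpinningDown, rs: ReadAsync, ws: Idle, in: 0, in-pipe: 0, "
    "out-pipe: 0, last-in: 2, cur-in: 0, sync-ops: 72, async-ops: 227416, "
    "serverEndpoint: 127.0.0.1:6379, conn-sec: 4370.5, aoc: 1, mc: 1/1/0, "
    "mgr: 10 of 10 available, clientName: hc-phytiumft20004(SE.Redis-v2.7.10.12442), "
    "IOCP: (Busy=0,Free=1000,Min=8,Max=1000), "
    "WORKER: (Busy=3,Free=32764,Min=16,Max=32767), "
    "POOL: (Threads=12,QueuedItems=0,CompletedItems=1155132,Timers=11), "
    "v: 2.7.10.12442"
)


def parse_diagnostics(message: str) -> dict[str, str]:
    """Map each 'key: value' segment of the message to a dict entry."""
    fields: dict[str, str] = {}
    for segment in message.split(", "):
        # Split on the first ": " only, so values like "127.0.0.1:6379" stay whole.
        key, sep, value = segment.partition(": ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


diag = parse_diagnostics(MESSAGE)
for key in ("bw", "rs", "ws", "serverEndpoint", "mgr", "clientName", "v"):
    print(f"{key}: {diag[key]}")
```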

When I get this exception, my project cannot reconnect to the Redis server on its own. I used redis-cli to check: the server had no slowlog entries and was working normally. (I also have two programs written in Rust that use the same Redis server, both on db0.)
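The redis-cli check can also be scripted so that it runs from the same host as the API at the moment the exception appears. This minimal Python sketch is a library-independent stand-in for redis-cli and not part of StackExchange.Redis. It opens a fresh TCP connection, sends a RESP-encoded `PING` and reports whether `+PONG` comes back. `encode_command` and `probe` are hypothetical helpers. Their defaults mirror the connection string (127.0.0.1:6379, 3000 ms connect timeout), and they assume no password is configured.

```python
import socket


def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings (the standard request format)."""
    parts = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg.encode()
        parts.append(f"${len(data)}\r\n".encode() + data + b"\r\n")
    return b"".join(parts)


def probe(host: str = "127.0.0.1", port: int = 6379, timeout: float = 3.0) -> bool:
    """Open a brand-new TCP connection, send PING and check for +PONG.

    Assumes no AUTH is required; a server that requires AUTH replies
    with an error instead of +PONG, so this would return False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(encode_command("PING"))
            # "+PONG\r\n" is only 7 bytes; a single recv is enough for a sketch.
            reply = sock.recv(64)
    except OSError:  # e.g. connection refused, timed out or unreachable
        return False
    return reply.startswith(b"+PONG")
```

If `probe()` returns `True` while the multiplexer is still throwing, a brand-new connection from the same host works, which points at the existing multiplexer's state rather than at the server.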

Moreover, when I checked the client list, there were still two MesBus_Api connections. When I restart the Redis server, I can see one MesBus_Api connection, and my API project still tries to connect to Redis: I can see TCP connections from my server to the Redis server, but they all end up in TIME_WAIT and never establish a normal connection. Only when I restart my API project do the two MesBus_Api connections reappear in Redis; the exception then disappears and all code works normally.
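To put numbers on the TIME_WAIT observation, it can help to count socket states toward port 6379 while the problem is occurring. The following is a hedged Python sketch. It assumes Linux `ss -tan` output, where the state is printed as `TIME-WAIT` (netstat writes `TIME_WAIT`). `count_states` is a hypothetical helper.

```python
from collections import Counter


def count_states(ss_output: str, peer_port: int = 6379) -> Counter:
    """Count TCP states of client-side sockets whose peer port is `peer_port`.

    Assumes `ss -tan` column order: State, Recv-Q, Send-Q,
    Local Address:Port, Peer Address:Port. Server-side sockets (local
    port 6379) are not counted, because their peer is the client's
    ephemeral port.
    """
    states: Counter = Counter()
    for line in ss_output.splitlines():
        parts = line.split()
        if len(parts) < 5 or parts[0] == "State":
            continue  # skip the header and blank or short lines
        # rpartition keeps IPv6 peers like "[::1]:6379" working.
        _, _, port = parts[4].rpartition(":")
        if port == str(peer_port):
            states[parts[0]] += 1
    return states
```

Passing it the text captured from `ss -tan` at the moment of failure gives a per-state tally (for example how many `TIME-WAIT` versus `ESTAB` sockets point at Redis), which can be compared with a capture taken after the API is restarted.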

How should I troubleshoot or check this issue? Thank you for your help.
P.S. I checked the server's CPU usage and it wasn't particularly high.

@mgravell
Collaborator

Nothing leaps out here as being obviously causal. It looks fairly healthy (other than not working): the time and ops counts show you aren't spinning up a multiplexer per context, and you're making good use of async. The fact that it hasn't fixed itself suggests something networking-related, but this seems to be loopback, which rules out a lot of things there. Is this reproducible on demand, or at least reasonably, i.e. "leave it for 12 hours and it'll start happening around then" or something like that? One idea: when this happens, try connecting a second multiplexer with the "log" parameter, which gives us a bit more info. There are also some connection-failed events on the multiplexer that might provide context.

@SuperCatss
Author


This problem is completely random and I haven't found a way to reproduce it yet. I'm enabling the "log" parameter so I can get more information the next time I encounter it.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants