LettuceConnectionProvider getConnection hang forever  #2289

Open
@heoYH

We use spring-data-redis and lettuce to access the redis cluster.

We have not found the root cause yet, but at some point a connection could not finish initializing.
As a result, the shared connection was never initialized, and the thread creating it waited forever.
After that, every request blocked.

### Thread dump

"lettuce-epollEventLoop-5-6" tid=0x3f native=false suspended=false
   java.lang.Thread.State: WAITING
	at jdk.internal.misc.Unsafe.park([email protected]/Native Method)
	at java.util.concurrent.locks.LockSupport.park([email protected]/LockSupport.java:194)
	at java.util.concurrent.CompletableFuture$Signaller.block([email protected]/CompletableFuture.java:1796)
	at java.util.concurrent.ForkJoinPool.managedBlock([email protected]/ForkJoinPool.java:3128)
	at java.util.concurrent.CompletableFuture.waitingGet([email protected]/CompletableFuture.java:1823)
	at java.util.concurrent.CompletableFuture.get([email protected]/CompletableFuture.java:1998)
	at io.lettuce.core.cluster.RedisClusterClient.get(RedisClusterClient.java:937)
	at io.lettuce.core.cluster.RedisClusterClient.getPartitions(RedisClusterClient.java:329)
	at org.springframework.data.redis.connection.lettuce.ClusterConnectionProvider.getConnectionAsync(ClusterConnectionProvider.java:92)
	at org.springframework.data.redis.connection.lettuce.ClusterConnectionProvider.getConnectionAsync(ClusterConnectionProvider.java:40)
	at org.springframework.data.redis.connection.lettuce.LettuceConnectionProvider.getConnection(LettuceConnectionProvider.java:53)
	at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory$ExceptionTranslatingConnectionProvider.getConnection(LettuceConnectionFactory.java:1527)
	at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory$SharedConnection.getNativeConnection(LettuceConnectionFactory.java:1315)
	at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory$SharedConnection.getConnection(LettuceConnectionFactory.java:1298)
	at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory.getSharedReactiveConnection(LettuceConnectionFactory.java:1049)
	at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory.getReactiveClusterConnection(LettuceConnectionFactory.java:481)
	at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory.getReactiveConnection(LettuceConnectionFactory.java:457)
	at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory.getReactiveConnection(LettuceConnectionFactory.java:101)
	at org.springframework.data.redis.core.ReactiveRedisTemplate.lambda$doInConnection$0(ReactiveRedisTemplate.java:198)
	at org.springframework.data.redis.core.ReactiveRedisTemplate$$Lambda$1725/0x0000000100cadc40.get(Unknown Source)
	at reactor.core.publisher.MonoSupplier.call(MonoSupplier.java:85)
"lettuce-epollEventLoop-5-5" tid=0x3e native=false suspended=false
   java.lang.Thread.State: BLOCKED
	at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory$SharedConnection.getConnection(LettuceConnectionFactory.java:1297)
	at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory.getSharedReactiveConnection(LettuceConnectionFactory.java:1049)
	at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory.getReactiveClusterConnection(LettuceConnectionFactory.java:481)
	at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory.getReactiveConnection(LettuceConnectionFactory.java:457)
	at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory.getReactiveConnection(LettuceConnectionFactory.java:101)
	at org.springframework.data.redis.core.ReactiveRedisTemplate.lambda$doInConnection$0(ReactiveRedisTemplate.java:198)
	at org.springframework.data.redis.core.ReactiveRedisTemplate$$Lambda$1725/0x0000000100cadc40.get(Unknown Source)
	at reactor.core.publisher.MonoSupplier.call(MonoSupplier.java:85)
	at reactor.core.publisher.FluxUsingWhen.subscribe(FluxUsingWhen.java:80)
	at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:64)
	at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.subscribeNext(MonoIgnoreThen.java:236)
	at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.onComplete(MonoIgnoreThen.java:203)
	at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.onComplete(ScopePassingSpanSubscriber.java:102)
	at reactor.core.publisher.FluxSwitchIfEmpty$SwitchIfEmptySubscriber.onComplete(FluxSwitchIfEmpty.java:84)
	at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.onComplete(ScopePassingSpanSubscriber.java:102)
	at reactor.core.publisher.MonoIgnoreElements$IgnoreElementsSubscriber.onComplete(MonoIgnoreElements.java:88)
	at reactor.core.publisher.MonoIgnoreElements$IgnoreElementsSubscriber.onComplete(MonoIgnoreElements.java:88)
	at reactor.core.publisher.Operators.complete(Operators.java:136)
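
The two thread states in the dump (one thread WAITING inside `SharedConnection.getConnection`, another BLOCKED on entry to it) match a simple pattern: one caller holds a monitor while waiting on a future that never completes, and every later caller queues up behind that monitor. The following is a minimal JDK-only sketch of that pattern, not the Spring Data Redis code. The class name `SharedInitHang`, the thread names, and the local `lock` and `never` objects are hypothetical stand-ins.

```java
import java.util.concurrent.CompletableFuture;

public class SharedInitHang {

    // Polls until the thread reaches the expected state, or gives up after about 5 seconds.
    private static void awaitState(Thread t, Thread.State expected) {
        long deadline = System.nanoTime() + 5_000_000_000L;
        try {
            while (t.getState() != expected && System.nanoTime() < deadline) {
                Thread.sleep(10);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while polling thread state", e);
        }
    }

    private static Thread start(Runnable task, String name) {
        Thread t = new Thread(task, name);
        t.setDaemon(true); // the threads never finish, so they must not keep the JVM alive
        t.start();
        return t;
    }

    /** Starts two callers and returns their states once both have settled. */
    public static Thread.State[] reproduce() {
        Object lock = new Object();                                    // stand-in for the shared-connection monitor
        CompletableFuture<String> never = new CompletableFuture<>();   // stand-in for a connect future that never completes

        Runnable getConnection = () -> {
            synchronized (lock) {
                try {
                    never.get(); // no timeout: parks while still holding the monitor
                } catch (Exception ignored) {
                    // not reached in this sketch
                }
            }
        };

        Thread first = start(getConnection, "caller-1");
        awaitState(first, Thread.State.WAITING);
        Thread second = start(getConnection, "caller-2");
        awaitState(second, Thread.State.BLOCKED);
        return new Thread.State[] { first.getState(), second.getState() };
    }

    public static void main(String[] args) {
        Thread.State[] states = reproduce();
        System.out.println("caller-1: " + states[0] + ", caller-2: " + states[1]);
    }
}
```

In this sketch the first caller ends up WAITING and the second BLOCKED, the same pair of states as `lettuce-epollEventLoop-5-6` and `lettuce-epollEventLoop-5-5` above.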

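We are still investigating why the connection failed to initialize.
Whatever the cause, I think `LettuceConnectionProvider.getConnection` should have a timeout. At the moment it waits on the connection future without one: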

return LettuceFutureUtils.join(getConnectionAsync(connectionType));


Shared connection initialization can fail for various reasons, and when it does, getConnection should fail with an error instead of hanging.
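
To make the suggestion concrete, here is a minimal sketch of a bounded wait that uses only the JDK. It does not show how `LettuceFutureUtils.join` is implemented. The class `BoundedJoin`, its `join(future, timeout)` signature, and the exception messages are hypothetical and only illustrate the idea of turning an endless wait into an error.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedJoin {

    /** Hypothetical helper: waits for the future, but never longer than the given timeout. */
    public static <T> T join(CompletableFuture<T> future, Duration timeout) {
        try {
            return future.get(timeout.toNanos(), TimeUnit.NANOSECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // mark the pending attempt as cancelled
            throw new IllegalStateException("Connection not established within " + timeout, e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag for the caller
            throw new IllegalStateException("Interrupted while waiting for connection", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Connection attempt failed", e.getCause());
        }
    }

    public static void main(String[] args) {
        // A completed future returns its value immediately.
        System.out.println(join(CompletableFuture.completedFuture("connected"), Duration.ofSeconds(1)));

        // A future that never completes now fails after the timeout instead of hanging.
        try {
            join(new CompletableFuture<String>(), Duration.ofMillis(100));
        } catch (IllegalStateException e) {
            System.out.println("failed fast: " + e.getMessage());
        }
    }
}
```

With a bound like this, the caller that holds the shared connection lock would get an exception and release the lock, so later callers could retry instead of queuing behind it indefinitely.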
