Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Infinite retries when unable to open a Connection #194

Open
gaeljw opened this issue Feb 9, 2024 · 5 comments
Open

Infinite retries when unable to open a Connection #194

gaeljw opened this issue Feb 9, 2024 · 5 comments

Comments

@gaeljw
Copy link

gaeljw commented Feb 9, 2024

Context

I've noticed a strange behavior which I'm attributing to zio-jdbc but it could actually be caused by ZIO core. I don't know enough of the details to be affirmative here.

Versions

"dev.zio" %% "zio" % "2.0.21",
"dev.zio" %% "zio-jdbc" % "0.1.2",

Observed behaviour

When a Connection cannot be opened with the database due to a connection timeout for instance (or any other error blocking long enough, in my reproduction scenario it's 30s but on the real case I observed it was 2 minutes), the retry logic seems to be misbehaving and the application retries infinitely instead of crashing.

Reproduction scenario

Here's a reproduction test using Scalatest:

package com.kk.p.e.services

import org.scalatest.flatspec.AnyFlatSpec
import org.scalatest.matchers.should.Matchers
import org.slf4j.{Logger, LoggerFactory}
import zio.Unsafe.unsafe
import zio.jdbc.{ZConnectionPool, ZConnectionPoolConfig, stringToSql, transaction}
import zio.{Chunk, RIO, Task, ULayer, ZEnvironment, ZIO, ZLayer}

import java.sql.SQLException

object MyService {

  val live: ZLayer[ZConnectionPool, Nothing, MyService] = ZLayer.fromFunction { pool: ZConnectionPool => new MyService(pool) }

}

class MyService(pool: ZConnectionPool) {

  def getResults(): Task[Chunk[Long]] = {
    _getResults().provideEnvironment(ZEnvironment(pool))
  }

  private def _getResults(): RIO[ZConnectionPool, Chunk[Long]] = transaction[ZConnectionPool, Throwable, Chunk[Long]] {
    stringToSql("""SELECT 1""")
      .query[Long]
      .selectAll
      .orDie
  }

}

object MyConnectionPool {

  private val logger: Logger = LoggerFactory.getLogger(getClass)

  private val poolConfig: ULayer[ZConnectionPoolConfig] = {
    val retryPolicy = ZConnectionPoolConfig.defaultRetryPolicy
      .tapOutput { case (i, d) => ZIO.succeed(logger.info(s"Retrying pool #$i for duration $d")) }
    ZLayer.succeed(ZConnectionPoolConfig.default.copy(minConnections = 0, retryPolicy = retryPolicy))
  }

  private val connectionPool: ZLayer[ZConnectionPoolConfig, Throwable, ZConnectionPool] = {
    ZConnectionPool.make {
      ZIO
        .attempt {
          logger.info("Opening connection...")
          // DriverManager.getConnection(jdbcUrl)
          Thread.sleep(1000 * 30) // Simulate a long operation
          throw new SQLException("BIM")
        }
        .tapError { ex =>
          ZIO.attempt {
            logger.error(s"Failed to create connection: ${ex.getMessage}")
          }.ignore
        }
    }
  }

  val live: ZLayer[Any, Throwable, ZConnectionPool] = poolConfig >>> connectionPool

}

class BoumSpec extends AnyFlatSpec with Matchers {

  it should "work" in {
    val theProgram: ZIO[MyService, Throwable, Chunk[Long]] = ZIO.serviceWithZIO[MyService](_.getResults())

    val services: ZLayer[Any, Throwable, MyService] = ZLayer.make[MyService](MyService.live, MyConnectionPool.live)

    val theProgramWithDeps = theProgram.provide(services)

    val result: Chunk[_] =
      unsafe { implicit u =>
        zio.Runtime.default.unsafe.run(theProgramWithDeps).getOrThrow()
      }

    result.size shouldEqual 5 // should fail
  }
}

When running this test, here's the output:

2024-02-09 17:54:42,561 [ZScheduler-Worker-0] INFO  c.k.p.e.services.MyConnectionPool$ - Opening connection...
2024-02-09 17:55:12,605 [ZScheduler-Worker-0] ERROR c.k.p.e.services.MyConnectionPool$ - Failed to create connection: BIM
2024-02-09 17:55:12,622 [ZScheduler-Worker-0] INFO  c.k.p.e.services.MyConnectionPool$ - Retrying pool #0 for duration PT0.015S
2024-02-09 17:55:12,639 [ZScheduler-Worker-6] INFO  c.k.p.e.services.MyConnectionPool$ - Opening connection...
2024-02-09 17:55:42,640 [ZScheduler-Worker-6] ERROR c.k.p.e.services.MyConnectionPool$ - Failed to create connection: BIM
2024-02-09 17:55:42,653 [ZScheduler-Worker-6] INFO  c.k.p.e.services.MyConnectionPool$ - Retrying pool #1 for duration PT0.03S
2024-02-09 17:55:42,685 [ZScheduler-Worker-4] INFO  c.k.p.e.services.MyConnectionPool$ - Opening connection...
2024-02-09 17:56:12,686 [ZScheduler-Worker-4] ERROR c.k.p.e.services.MyConnectionPool$ - Failed to create connection: BIM
2024-02-09 17:56:12,687 [ZScheduler-Worker-4] INFO  c.k.p.e.services.MyConnectionPool$ - Retrying pool #2 for duration PT0.06S
2024-02-09 17:56:12,748 [ZScheduler-Worker-3] INFO  c.k.p.e.services.MyConnectionPool$ - Opening connection...
2024-02-09 17:56:42,750 [ZScheduler-Worker-3] ERROR c.k.p.e.services.MyConnectionPool$ - Failed to create connection: BIM
2024-02-09 17:56:42,763 [ZScheduler-Worker-3] INFO  c.k.p.e.services.MyConnectionPool$ - Retrying pool #3 for duration PT0.12S
2024-02-09 17:56:42,885 [ZScheduler-Worker-5] INFO  c.k.p.e.services.MyConnectionPool$ - Opening connection...
2024-02-09 17:57:12,887 [ZScheduler-Worker-5] ERROR c.k.p.e.services.MyConnectionPool$ - Failed to create connection: BIM
2024-02-09 17:57:12,889 [ZScheduler-Worker-5] INFO  c.k.p.e.services.MyConnectionPool$ - Retrying pool #4 for duration PT0.24S
2024-02-09 17:57:13,131 [ZScheduler-Worker-7] INFO  c.k.p.e.services.MyConnectionPool$ - Opening connection...
2024-02-09 17:57:43,133 [ZScheduler-Worker-7] ERROR c.k.p.e.services.MyConnectionPool$ - Failed to create connection: BIM
2024-02-09 17:57:43,137 [ZScheduler-Worker-7] INFO  c.k.p.e.services.MyConnectionPool$ - Retrying pool #5 for duration PT0.48S
2024-02-09 17:57:43,618 [ZScheduler-Worker-2] INFO  c.k.p.e.services.MyConnectionPool$ - Opening connection...
2024-02-09 17:58:13,620 [ZScheduler-Worker-2] ERROR c.k.p.e.services.MyConnectionPool$ - Failed to create connection: BIM
2024-02-09 17:58:13,634 [ZScheduler-Worker-2] INFO  c.k.p.e.services.MyConnectionPool$ - Retrying pool #6 for duration PT0.96S
2024-02-09 17:58:14,596 [ZScheduler-Worker-1] INFO  c.k.p.e.services.MyConnectionPool$ - Opening connection...
2024-02-09 17:58:44,598 [ZScheduler-Worker-1] ERROR c.k.p.e.services.MyConnectionPool$ - Failed to create connection: BIM
2024-02-09 17:58:44,611 [ZScheduler-Worker-1] INFO  c.k.p.e.services.MyConnectionPool$ - Retrying pool #7 for duration PT1.92S
2024-02-09 17:58:46,533 [ZScheduler-Worker-0] INFO  c.k.p.e.services.MyConnectionPool$ - Opening connection...
2024-02-09 17:59:16,535 [ZScheduler-Worker-0] ERROR c.k.p.e.services.MyConnectionPool$ - Failed to create connection: BIM
2024-02-09 17:59:16,536 [ZScheduler-Worker-0] INFO  c.k.p.e.services.MyConnectionPool$ - Retrying pool #8 for duration PT3.84S
2024-02-09 17:59:20,377 [ZScheduler-Worker-6] INFO  c.k.p.e.services.MyConnectionPool$ - Opening connection...
2024-02-09 17:59:50,380 [ZScheduler-Worker-6] ERROR c.k.p.e.services.MyConnectionPool$ - Failed to create connection: BIM
2024-02-09 17:59:50,382 [ZScheduler-Worker-6] INFO  c.k.p.e.services.MyConnectionPool$ - Retrying pool #9 for duration PT7.68S
2024-02-09 17:59:58,065 [ZScheduler-Worker-3] INFO  c.k.p.e.services.MyConnectionPool$ - Opening connection...
2024-02-09 18:00:28,067 [ZScheduler-Worker-3] ERROR c.k.p.e.services.MyConnectionPool$ - Failed to create connection: BIM
2024-02-09 18:00:28,069 [ZScheduler-Worker-3] INFO  c.k.p.e.services.MyConnectionPool$ - Retrying pool #10 for duration PT15.36S
2024-02-09 18:00:28,080 [ZScheduler-Worker-3] INFO  c.k.p.e.services.MyConnectionPool$ - Opening connection...
2024-02-09 18:00:58,082 [ZScheduler-Worker-3] ERROR c.k.p.e.services.MyConnectionPool$ - Failed to create connection: BIM
2024-02-09 18:00:58,084 [ZScheduler-Worker-3] INFO  c.k.p.e.services.MyConnectionPool$ - Retrying pool #0 for duration PT0.015S
2024-02-09 18:00:58,101 [ZScheduler-Worker-7] INFO  c.k.p.e.services.MyConnectionPool$ - Opening connection...
2024-02-09 18:01:28,103 [ZScheduler-Worker-7] ERROR c.k.p.e.services.MyConnectionPool$ - Failed to create connection: BIM
2024-02-09 18:01:28,109 [ZScheduler-Worker-7] INFO  c.k.p.e.services.MyConnectionPool$ - Retrying pool #1 for duration PT0.03S
2024-02-09 18:01:28,140 [ZScheduler-Worker-2] INFO  c.k.p.e.services.MyConnectionPool$ - Opening connection...
2024-02-09 18:01:58,141 [ZScheduler-Worker-2] ERROR c.k.p.e.services.MyConnectionPool$ - Failed to create connection: BIM
2024-02-09 18:01:58,142 [ZScheduler-Worker-2] INFO  c.k.p.e.services.MyConnectionPool$ - Retrying pool #2 for duration PT0.06S
2024-02-09 18:01:58,203 [ZScheduler-Worker-1] INFO  c.k.p.e.services.MyConnectionPool$ - Opening connection...
2024-02-09 18:02:28,203 [ZScheduler-Worker-1] ERROR c.k.p.e.services.MyConnectionPool$ - Failed to create connection: BIM
2024-02-09 18:02:28,204 [ZScheduler-Worker-1] INFO  c.k.p.e.services.MyConnectionPool$ - Retrying pool #3 for duration PT0.12S
2024-02-09 18:02:28,325 [ZScheduler-Worker-0] INFO  c.k.p.e.services.MyConnectionPool$ - Opening connection...
2024-02-09 18:02:58,326 [ZScheduler-Worker-0] ERROR c.k.p.e.services.MyConnectionPool$ - Failed to create connection: BIM
2024-02-09 18:02:58,329 [ZScheduler-Worker-0] INFO  c.k.p.e.services.MyConnectionPool$ - Retrying pool #4 for duration PT0.24S
2024-02-09 18:02:58,570 [ZScheduler-Worker-6] INFO  c.k.p.e.services.MyConnectionPool$ - Opening connection...

You can notice that after reaching the 10 retries defined in the retry policy, it starts again to retry back at iteration 0.

Other

If the "long operation" is not that long (1s instead of 30s for instance), the behaviour is not the same: the app crashes as expected after 10 retries.

@gaeljw
Copy link
Author

gaeljw commented Feb 9, 2024

Adding that using ZIO.attemptBlocking instead ofZIO.attempt when opening the connection does not change the behavior. I thought it could be related but it's not.

@gaeljw
Copy link
Author

gaeljw commented Feb 12, 2024

The piece of code that relates to this in zio-jdbc is

def make(
acquire: IO[ConnectionException, Connection]
): ZLayer[ZConnectionPoolConfig, ConnectionException, ZConnectionPool] =
ZLayer.scoped {
for {
config <- ZIO.service[ZConnectionPoolConfig]
getConn = ZIO.acquireRelease(acquire.retry(config.retryPolicy).flatMap(ZConnection.make))(_.close.ignoreLogged)
pool <- ZPool.make(getConn, Range(config.minConnections, config.maxConnections), config.timeToLive)

And the default retry config is supposed to be max 10 times:

lazy val defaultRetryPolicy: Schedule.WithState[(Long, Long), Any, Any, (Long, Duration)] =
Schedule.recurs(10) && Schedule.exponential(15.millis)

@maxstreese
Copy link

For anyone stumbling across this, I would like to offer some more insights. From my understanding, what is actually happening are not actually infinite retries, but instead the retry policy of the connection pool is applied to every potentially available connection from the pool. To illustrate what I mean consider the below scala-cli script:

#!/usr/bin/env -S scala-cli shebang

//> using dep "dev.zio::zio-jdbc:0.1.2"
//> using dep "dev.zio::zio-streams:2.1.6"
//> using dep "dev.zio::zio:2.1.6"
//> using dep "org.postgresql:postgresql:42.7.3"

import zio.*
import zio.jdbc.*
import zio.stream.*

object App extends ZIOAppDefault:

  val retryPolicy = Schedule.stop.tapOutput(_ => ZIO.logInfo("failed"))

  val connectionPoolConfig =
    ZLayer.succeed:
      ZConnectionPoolConfig
        .default
        .copy(
          minConnections = 0,
          maxConnections = 1,
          retryPolicy = retryPolicy,
        )

  val connectionPool =
    val props = Map("user" -> "postgres", "password" -> "password")
    ZConnectionPool.postgres("postgres", 5432, "postgres", props)

  val query = sql"select count(*) from test".query[Int]

  val logic =
    for
      count <- transaction(query.selectOne).flatMap(ZIO.fromOption)
      _ <- ZIO.logInfo(s"count is $count")
    yield ()

  override def run = logic.provide(connectionPoolConfig, connectionPool)

App.main(args)

This script will log the failed message once. However, if you adjust minConnections to 3 and maxConnections to 4, the failed message will be logged 4 times.

@guizmaii
Copy link
Member

guizmaii commented Aug 3, 2024

I didn't read in details the issue but I fixed something similar some times ago here: #174

@gaeljw
Copy link
Author

gaeljw commented Aug 3, 2024

I'm pretty sure #174 was already included when I reported the issue. I mean the issue is still there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants