Coordinator crashes after upgrading to 29.0.0 #15942
Comments
@jreyeshdez - There is an explanation in the ticket you linked, and it has a workaround too. I am going to close this bug, as there is nothing we can do from the Druid side. If a new client library becomes available with the fix, we can upgrade to that version.
Thanks @abhishekagarwal87, however the workaround provided in the link did not work in my case, as described above: I followed the instructions in the comments and the number of errors increased across all nodes. Other users were also impacted and commented in the Slack thread we started: https://apachedruidworkspace.slack.com/archives/C0309C9L90D/p1708510329253319 Is there any documentation on Druid's release page describing what steps need to be taken by users upgrading to 29.0.0 who encounter this error? I think the guidance should come from Druid, rather than from a comment on some other PR or issue.
The issue is caused by a change in the date format of the leader-election config maps that are stored in Kubernetes. The steps to fix it are as follows (a hedged kubectl sketch of these steps is shown after the list):
1. You should see two config maps of the form:
2. Verify that the date from your error message appears in the annotations:
3. Remove these by:
These will be recreated by the services, which will then start successfully.
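A minimal sketch of those steps with kubectl; the `druid` namespace and the `<cluster-id>-leaderelection-*` config map names are assumptions, so match them against your own cluster before deleting anything:

```sh
# Sketch only: the namespace and config map names below are placeholders/assumptions.

# 1. List the leader-election config maps created by the Druid k8s extension
#    (you should see two, one per elected role, e.g. Coordinator and Overlord).
kubectl -n druid get configmaps | grep -i leaderelection

# 2. Inspect the annotations and confirm the date from your error message appears there.
kubectl -n druid get configmap <cluster-id>-leaderelection-coordinator -o yaml

# 3. Delete both config maps; the services recreate them in the new format on startup.
kubectl -n druid delete configmap \
  <cluster-id>-leaderelection-coordinator \
  <cluster-id>-leaderelection-overlord
```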
Likely introduced by #15449
Thanks @m17kea for the explanation. I actually attempted your solution a while ago and unfortunately it did not work either. Worth saying I did it through k9s, but somehow the config maps were not recreated with an updated time. Link to the conversation: https://apachedruidworkspace.slack.com/archives/C0309C9L90D/p1712231718374279?thread_ts=1708510329.253319&cid=C0309C9L90D In case you can't see the Slack link above, this is what the message says:
and
I deploy Druid via Helm, so I am not sure if there is a different procedure.
The code that creates the config map lives in a downstream Kubernetes Java client library that was updated in #15449. We use the druid-operator to deploy our clusters, but this issue is not related to how Druid is deployed; it is controlled exclusively by the k8s extension. If the config maps get recreated in the old format again, then you must still have a version 28 instance of one or more of the processes running somewhere. If you have 3 replicas, perhaps the Helm rolling update replaces them one at a time, leaving old code running. The other thing you could try is manually editing the date in the config map to the new format, which would move the error to the older versions while the new ones roll out. If you want a hand, reach out to me.
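If you want to try the manual-edit route, a rough sketch is below. The config map name is a placeholder, and the leader-election record is typically kept in the `control-plane.alpha.kubernetes.io/leader` annotation, but check your own config map and the exact date format from the error message before changing anything:

```sh
# Sketch only: the config map name is a placeholder and the annotation layout may differ.
# Open the config map and change the acquireTime/renewTime values inside the
# control-plane.alpha.kubernetes.io/leader annotation to the format the new client expects.
kubectl -n druid edit configmap <cluster-id>-leaderelection-coordinator
```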
In that EKS cluster there is only one Druid cluster running; it is isolated in an AWS dev account and EKS. However, there are replicas for the Broker (3), Coordinator (3) and Router (3), and only one Historical. Also worth noting it is running MM-less and ZK-less, not sure if that overcomplicates things. I think it might be related to the rolling update: if the newer incoming pod fails to start, the older pods with the old config won't be terminated. So I'm not sure what the best approach is here, especially when I have to do it on the production cluster.
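One quick way to check the "old code still running" theory is to list the image of every Druid pod and confirm nothing is still on 28.x; a sketch follows, where the namespace is an assumption to adjust for your deployment:

```sh
# Sketch only: the namespace is a placeholder — adjust for your deployment.
# Show each Druid pod alongside the container image it is actually running.
kubectl -n druid get pods \
  -o custom-columns=NAME:.metadata.name,IMAGE:.spec.containers[*].image
```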
Description
Upgraded a Druid cluster from 28.0.1 to 29.0.0, configured MM-less and ZK-less. The Coordinator fails to start up.
Went ahead and deleted the Coordinator endpoint; it got re-created, but the issue became worse since after that all nodes started failing.
The Historical log is very similar to the above.