Add data-index and job service startupProbes to the workflow Deployment #361

Closed
wmedvede opened this issue Jan 23, 2024 · 2 comments

@wmedvede
Contributor

Description

When the Data Index (DI) or Jobs Service (JS) is detected in the current sonataflow-platform, a startupProbe can be added to the workflow Deployment to query the kogito-runtimes /q/health/started check.
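For illustration, here is a minimal Python sketch of what a single Kubernetes httpGet probe attempt against /q/health/started evaluates. It assumes the documented Kubernetes rule that any status in [200, 400) counts as success, and that SmallRye Health answers 503 while a startup check (such as the Data Index one) is DOWN. The function name http_probe is hypothetical; this is not the kubelet's implementation.

```python
import urllib.error
import urllib.request


def http_probe(url: str, timeout: float = 3.0) -> bool:
    """Approximate one Kubernetes httpGet probe attempt.

    Success means the endpoint answered with 200 <= status < 400;
    an error status, a refused connection or a timeout is a failure.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except urllib.error.HTTPError as err:
        # urlopen raises for 4xx/5xx, e.g. the 503 returned while a
        # startup check is still reporting DOWN.
        return 200 <= err.code < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure or timeout: the probe fails.
        return False
```

From inside the pod, something like http_probe("http://localhost:8080/q/health/started") would mirror the port and path configured in the startupProbe; the timeout default mirrors timeoutSeconds: 3.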

This development is related to: apache/incubator-kie-kogito-runtimes#3365

Implementation ideas

No response

@wmedvede wmedvede self-assigned this Jan 23, 2024
@wmedvede wmedvede changed the title Add startupProbes to workflow Deployment Add data-index and job service startupProbes to the workflow Deployment Jan 29, 2024
@wmedvede wmedvede moved this from 📋 Backlog to ⏳ In Progress in 🦉 KIE Podling Board Feb 1, 2024
@wmedvede
Contributor Author

wmedvede commented Feb 7, 2024

Resolved here: #377

@wmedvede wmedvede closed this as completed Feb 7, 2024
@github-project-automation github-project-automation bot moved this from ⏳ In Progress to 🎯 Done in 🦉 KIE Podling Board Feb 7, 2024
@gabriel-farache

I am reproducing this right now and I have the logs:

2024-02-15 08:28:04,460 INFO  [io.sma.health] (executor-thread-1) SRHCK01001: Reporting health down status: {"status":"DOWN","checks":[{"name":"Data Index Availability - startup check","status":"DOWN","data":{"error":"[unknown] - io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: sonataflow-platform-data-index-service.sonataflow-infra/172.31.200.9:80"}},{"name":"SmallRye Reactive Messaging - startup check","status":"UP"}]}
2024-02-15 08:28:19,371 INFO  [io.sma.health] (executor-thread-1) SRHCK01001: Reporting health down status: {"status":"DOWN","checks":[{"name":"Data Index Availability - startup check","status":"DOWN","data":{"error":"[unknown] - io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: sonataflow-platform-data-index-service.sonataflow-infra/172.31.200.9:80"}},{"name":"SmallRye Reactive Messaging - startup check","status":"UP"}]}
2024-02-15 08:28:34,375 INFO  [io.sma.health] (executor-thread-1) SRHCK01001: Reporting health down status: {"status":"DOWN","checks":[{"name":"Data Index Availability - startup check","status":"DOWN","data":{"error":"[unknown] - io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: sonataflow-platform-data-index-service.sonataflow-infra/172.31.200.9:80"}},{"name":"SmallRye Reactive Messaging - startup check","status":"UP"}]}
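A small Python sketch can pull the timestamp and the failing check names out of these SmallRye Health lines, which makes the spacing between reports easy to compare with the probe's periodSeconds. The regular expression is written against the three lines quoted above and is an assumption about the log format, not a documented contract; parse_health_down and intervals_seconds are hypothetical helpers.

```python
import json
import re
from datetime import datetime
from typing import List, Optional, Tuple

# Matches the "Reporting health down status" lines quoted above.
HEALTH_DOWN_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*?"
    r"SRHCK01001: Reporting health down status: (\{.*\})\s*$"
)


def parse_health_down(line: str) -> Optional[Tuple[datetime, List[str]]]:
    """Return (timestamp, names of DOWN checks), or None for other lines."""
    match = HEALTH_DOWN_RE.match(line)
    if match is None:
        return None
    ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
    report = json.loads(match.group(2))
    down = [c["name"] for c in report.get("checks", []) if c.get("status") == "DOWN"]
    return ts, down


def intervals_seconds(timestamps: List[datetime]) -> List[float]:
    """Gaps between consecutive reports, in seconds."""
    return [(b - a).total_seconds() for a, b in zip(timestamps, timestamps[1:])]
```

Applied to the three lines above, only "Data Index Availability - startup check" is DOWN each time, and the gaps are 14.911 s and 15.004 s, in line with periodSeconds: 15.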

No restarts

oc -n sonataflow-infra get pods
NAME                                                      READY   STATUS    RESTARTS   AGE
greeting-64c66ccdb7-ldmdr                                 1/1     Running   0          7m42s
sonataflow-platform-data-index-service-6676f74b48-258wf   1/1     Running   0          7m42s
sonataflow-platform-jobs-service-d9455b6f7-2v8c9          1/1     Running   0          7m42s
sonataflow-psql-postgresql-0                              1/1     Running   0          10m
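To check the "no restarts" observation in a script, the default table printed by oc get pods can be parsed as sketched below. This is a quick hypothetical helper (parse_get_pods), assuming the column layout shown above; newer clients may print RESTARTS as "2 (3m ago)", so AGE is taken as the last token. For anything beyond a quick check, the -o json output is more robust than parsing the table.

```python
from typing import Dict, List


def parse_get_pods(output: str) -> List[Dict[str, object]]:
    """Parse the default NAME/READY/STATUS/RESTARTS/AGE table."""
    lines = [line for line in output.strip().splitlines() if line.strip()]
    pods = []
    for line in lines[1:]:  # skip the header row
        parts = line.split()
        pods.append({
            "name": parts[0],
            "ready": parts[1],
            "status": parts[2],
            "restarts": int(parts[3]),  # "2 (3m ago)" -> first token "2"
            "age": parts[-1],
        })
    return pods
```

Applied to the listing above, all four pods report 0 restarts.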

and no greeting workflow appears in our UI, which reads from the Data Index.

Here is the dump of the DI database: DI_dump.zip

From what I understand from the describe output, the startupProbe is:

startupProbe:
  failureThreshold: 5
  httpGet:
    path: /q/health/started
    port: 8080
    scheme: HTTP
  initialDelaySeconds: 10
  periodSeconds: 15
  successThreshold: 1
  timeoutSeconds: 3

With this configuration the pod is only restarted after 5 consecutive failed probes, and here we only have 3.
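The reasoning about 5 allowed failures versus 3 observed ones can be replayed with a simplified Python model of the startupProbe above. It assumes probes fire at initialDelaySeconds, then every periodSeconds, and it ignores timeoutSeconds and kubelet scheduling jitter; StartupProbe and evaluate are hypothetical names, not Kubernetes APIs.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class StartupProbe:
    initial_delay_seconds: int = 10
    period_seconds: int = 15
    failure_threshold: int = 5


def evaluate(probe: StartupProbe, results: Sequence[bool]) -> Tuple[str, Optional[int]]:
    """Replay probe outcomes in order (True = success).

    Returns ("started", t) on the first success, ("restarted", t) once
    failure_threshold consecutive failures are reached, otherwise
    ("pending", None). t is seconds after container start.
    """
    failures = 0
    for attempt, ok in enumerate(results):
        t = probe.initial_delay_seconds + attempt * probe.period_seconds
        if ok:
            return "started", t
        failures += 1
        if failures >= probe.failure_threshold:
            return "restarted", t
    return "pending", None
```

With three failures followed by a success, evaluate(StartupProbe(), [False, False, False, True]) returns ("started", 55), so the container is never restarted; that is consistent with READY 1/1 and RESTARTS 0 above. Under this model a restart only happens if the fifth probe (around 70 s) also fails.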

The full log (see below) shows 2 errors related to publishing events to the DI when the workflow starts. It seems the workflow registers itself only at startup and never afterwards, so without a restart there is no registration.
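The suspected failure mode can be illustrated with a toy Python model: an event published once at startup is simply lost if the Data Index refuses the connection, whereas a retrying publisher would eventually deliver it. This is purely illustrative and not Kogito's ReactiveMessagingEventPublisher; FakeDataIndex, publish_once and publish_with_retry are hypothetical, and the retry variant is only one conceivable mitigation.

```python
from typing import Callable, List


class FakeDataIndex:
    """Hypothetical Data Index stand-in: refuses the first
    `refuse_first` connections, then accepts events."""

    def __init__(self, refuse_first: int) -> None:
        self.refuse_first = refuse_first
        self.attempts = 0
        self.received: List[str] = []

    def receive(self, event: str) -> None:
        self.attempts += 1
        if self.attempts <= self.refuse_first:
            raise ConnectionRefusedError("Connection refused")
        self.received.append(event)


def publish_once(di: FakeDataIndex, event: str) -> bool:
    """Publish a single time at startup; a refused connection loses the event."""
    try:
        di.receive(event)
        return True
    except ConnectionRefusedError:
        return False


def publish_with_retry(
    di: FakeDataIndex,
    event: str,
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = lambda _seconds: None,  # pass time.sleep in real use
) -> bool:
    """Retry with exponential backoff (1 s, 2 s, 4 s, ...) between attempts."""
    for attempt in range(max_attempts):
        try:
            di.receive(event)
            return True
        except ConnectionRefusedError:
            if attempt < max_attempts - 1:
                sleep(base_delay * 2 ** attempt)
    return False
```

With a Data Index that is down for the first connection, publish_once returns False and nothing is ever received, which matches the greeting workflow never showing up even after the DI pod is re-created.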

I tried deleting the DI pod to see whether anything changes after it is re-created, but nothing did: greeting still does not appear, while other workflows created after the DI started are there.

Full log of greeting:

oc -n sonataflow-infra logs  greeting-64c66ccdb7-ldmdr 
Starting the Java application using /opt/jboss/container/java/run/run-java.sh ...
INFO exec -a "java" java -Dquarkus.http.host=0.0.0.0 -Djava.util.logging.manager=org.jboss.logmanager.LogManager -cp "." -jar /deployments/quarkus-run.jar 
INFO running in /deployments
__  ____  __  _____   ___  __ ____  ______ 
 --/ __ \/ / / / _ | / _ \/ //_/ / / / __/ 
 -/ /_/ / /_/ / __ |/ , _/ ,< / /_/ /\ \   
--\___\_\____/_/ |_/_/|_/_/|_|\____/___/   
2024-02-15 08:27:48,983 WARN  [io.qua.config] (main) Unrecognized configuration key "kogito.data-index.health-enabled" was provided; it will be ignored; verify that the dependency extension for this configuration is set or that you did not make a typo
2024-02-15 08:27:48,984 WARN  [io.qua.config] (main) Unrecognized configuration key "kogito.jobs-service.health-enabled" was provided; it will be ignored; verify that the dependency extension for this configuration is set or that you did not make a typo
2024-02-15 08:27:48,984 WARN  [io.qua.config] (main) Unrecognized configuration key "kogito.data-index.url" was provided; it will be ignored; verify that the dependency extension for this configuration is set or that you did not make a typo
2024-02-15 08:27:48,984 WARN  [io.qua.config] (main) Unrecognized configuration key "kogito.jobs-service.url" was provided; it will be ignored; verify that the dependency extension for this configuration is set or that you did not make a typo
2024-02-15 08:27:49,846 WARN  [org.kie.kog.add.qua.kna.eve.KnativeEventingConfigSourceFactory] (main) K_SINK variable is empty or doesn't exist. Please make sure that this service is a Knative Source or has a SinkBinding bound to it.
2024-02-15 08:27:49,941 WARN  [io.qua.run.con.ConfigRecorder] (main) Build time property cannot be changed at runtime:
 - quarkus.devservices.enabled is set to 'false' but it is build time fixed to 'true'. Did you change the property quarkus.devservices.enabled after building the application?
2024-02-15 08:27:50,623 INFO  [org.kie.kog.add.qua.mes.com.QuarkusKogitoExtensionInitializer] (main) Registered Kogito CloudEvent extension
2024-02-15 08:27:50,673 INFO  [io.quarkus] (main) serverless-workflow-project 1.0.0-SNAPSHOT on JVM (powered by Quarkus 3.2.9.Final) started in 2.157s. Listening on: http://0.0.0.0:8080
2024-02-15 08:27:50,673 INFO  [io.quarkus] (main) Profile prod activated. 
2024-02-15 08:27:50,673 INFO  [io.quarkus] (main) Installed features: [cache, cdi, jackson-jq, kogito-addon-events-process-extension, kogito-addon-jobs-knative-eventing-extension, kogito-addon-knative-eventing-extension, kogito-addon-kubernetes-extension, kogito-addon-messaging-extension, kogito-addon-microprofile-config-service-catalog-extension, kogito-addon-process-management-extension, kogito-addon-source-files-extension, kogito-addons-quarkus-knative-serving, kogito-serverless-workflow, kubernetes, kubernetes-client, qute, reactive-routes, rest-client, rest-client-jackson, resteasy, resteasy-jackson, security, security-properties-file, smallrye-context-propagation, smallrye-health, smallrye-openapi, smallrye-reactive-messaging, smallrye-reactive-messaging-http, vertx]
2024-02-15 08:27:50,675 WARN  [io.sma.rea.mes.provider] (vert.x-eventloop-thread-7) SRMSG00234: Failed to emit a Message to the channel: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: sonataflow-platform-data-index-service.sonataflow-infra/172.31.200.9:80
Caused by: java.net.ConnectException: Connection refused
	at java.base/sun.nio.ch.Net.pollConnect(Native Method)
	at java.base/sun.nio.ch.Net.pollConnectNow(Net.java:672)
	at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:946)
	at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:337)
	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:776)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:840)

2024-02-15 08:27:50,676 ERROR [org.kie.kog.eve.pro.ReactiveMessagingEventPublisher] (vert.x-eventloop-thread-7) Error while publishing message org.eclipse.microprofile.reactive.messaging.Message$8@7f469c1a: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: sonataflow-platform-data-index-service.sonataflow-infra/172.31.200.9:80
Caused by: java.net.ConnectException: Connection refused
	at java.base/sun.nio.ch.Net.pollConnect(Native Method)
	at java.base/sun.nio.ch.Net.pollConnectNow(Net.java:672)
	at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:946)
	at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:337)
	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:776)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:840)

2024-02-15 08:28:04,460 INFO  [io.sma.health] (executor-thread-1) SRHCK01001: Reporting health down status: {"status":"DOWN","checks":[{"name":"Data Index Availability - startup check","status":"DOWN","data":{"error":"[unknown] - io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: sonataflow-platform-data-index-service.sonataflow-infra/172.31.200.9:80"}},{"name":"SmallRye Reactive Messaging - startup check","status":"UP"}]}
2024-02-15 08:28:19,371 INFO  [io.sma.health] (executor-thread-1) SRHCK01001: Reporting health down status: {"status":"DOWN","checks":[{"name":"Data Index Availability - startup check","status":"DOWN","data":{"error":"[unknown] - io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: sonataflow-platform-data-index-service.sonataflow-infra/172.31.200.9:80"}},{"name":"SmallRye Reactive Messaging - startup check","status":"UP"}]}
2024-02-15 08:28:34,375 INFO  [io.sma.health] (executor-thread-1) SRHCK01001: Reporting health down status: {"status":"DOWN","checks":[{"name":"Data Index Availability - startup check","status":"DOWN","data":{"error":"[unknown] - io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: sonataflow-platform-data-index-service.sonataflow-infra/172.31.200.9:80"}},{"name":"SmallRye Reactive Messaging - startup check","status":"UP"}]}
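The two startup publishing errors mentioned above can be picked out of such a log with a short Python filter. The pattern is an assumption based on the lines in this log: it keeps WARN/ERROR entries that report a refused connection to sonataflow-platform-data-index-service and skips the INFO health-check reports and stack-trace lines. data_index_publish_failures is a hypothetical helper.

```python
import re
from typing import List, Tuple

# WARN/ERROR entries reporting a refused connection to the Data Index.
DI_REFUSED_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2} [\d:,]+ (WARN|ERROR)\s+\[([^\]]+)\]"
    r".*Connection refused: sonataflow-platform-data-index-service"
)


def data_index_publish_failures(log_text: str) -> List[Tuple[str, str]]:
    """Return (level, logger category) for each matching log entry."""
    failures = []
    for line in log_text.splitlines():
        match = DI_REFUSED_RE.match(line)
        if match:
            failures.append((match.group(1), match.group(2)))
    return failures
```

On this log it finds exactly the SRMSG00234 warning and the ReactiveMessagingEventPublisher error, both at 08:27:50, right after Quarkus reports it started and roughly 14 s before the first startup-check report at 08:28:04.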

Labels
None yet
Projects
Archived in project
Development

No branches or pull requests

2 participants