Monitoring Horreum in prod environment #365
johnaohara
started this conversation in
Ideas
Replies: 1 comment
-
OpenTelemetry might be a CNCF project (incubating), but we need to follow the industry standard, which is Prometheus. We need to provide Prometheus metrics that highlight issues in the application, which can then be explored by looking in the logs. We also need log entries for critical issues so that we can investigate them.
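To make the "metrics point at a problem, logs explain it" idea concrete, here is a minimal, dependency-free sketch of a labelled counter rendered in the Prometheus text exposition format. The class name `ErrorCounter`, the metric name `horreum_errors_total` and the `component` label are all hypothetical, and a real Quarkus deployment would more likely use a metrics extension than hand-rolled code like this.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical, dependency-free counter; names here are illustrative only. */
public class ErrorCounter {
    private final String name;
    private final String help;
    // One adder per value of the single "component" label.
    private final Map<String, LongAdder> counts = new ConcurrentHashMap<>();

    public ErrorCounter(String name, String help) {
        this.name = name;
        this.help = help;
    }

    /** Counts one error for the given component. */
    public void increment(String component) {
        counts.computeIfAbsent(component, k -> new LongAdder()).increment();
    }

    /** Escapes backslash, double quote and newline, as the text format requires for label values. */
    private static String escape(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }

    /** Renders the counter in the Prometheus text exposition format. */
    public String scrape() {
        StringBuilder sb = new StringBuilder();
        sb.append("# HELP ").append(name).append(' ').append(help).append('\n');
        sb.append("# TYPE ").append(name).append(" counter\n");
        // Sort by label value so the output is stable between scrapes.
        for (Map.Entry<String, LongAdder> e : new TreeMap<>(counts).entrySet()) {
            sb.append(name)
              .append("{component=\"").append(escape(e.getKey())).append("\"} ")
              .append(e.getValue().sum())
              .append('\n');
        }
        return sb.toString();
    }
}
```

An alert on a rising count for one component would then tell us which part of the logs to look at, which is the workflow described above.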
-
I was thinking about #342 and observability in general. I was able to identify the root cause of the issue using custom OpenTelemetry spans to track the tasks placed in the taskqueue.
How do we want to observe a running prod instance of Horreum, what info do we want to track, and how do we want to access it? For example, there is not enough information in the logs, even at debug level, to understand what is happening in #342.
Do we want to rely on grepping debug logs to find information, or do we want some form of observability tool that lets us query recorded events and obtain the information needed to understand what is happening in the running instance?
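As a rough illustration of the difference between grepping logs and querying recorded events, the sketch below keeps structured timing records for queued tasks and answers questions such as "which tasks waited too long in the queue?". It is a plain-Java stand-in, not the OpenTelemetry API; `TaskEventLog`, `TaskEvent` and the task names in the usage note are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical in-memory event log for queued tasks; not the OpenTelemetry API. */
public class TaskEventLog {

    /** One finished task, with timestamps in milliseconds. */
    public static final class TaskEvent {
        public final String taskName;
        public final long queuedAt;
        public final long startedAt;
        public final long finishedAt;
        public final boolean failed;

        public TaskEvent(String taskName, long queuedAt, long startedAt,
                         long finishedAt, boolean failed) {
            this.taskName = taskName;
            this.queuedAt = queuedAt;
            this.startedAt = startedAt;
            this.finishedAt = finishedAt;
            this.failed = failed;
        }

        /** Time spent waiting in the queue before execution started. */
        public long waitMillis() { return startedAt - queuedAt; }

        /** Time spent executing. */
        public long runMillis() { return finishedAt - startedAt; }
    }

    private final List<TaskEvent> events = new ArrayList<>();

    public synchronized void record(TaskEvent event) {
        events.add(event);
    }

    /** Tasks that sat in the queue for longer than the given threshold. */
    public synchronized List<TaskEvent> waitedLongerThan(long millis) {
        return events.stream()
                .filter(e -> e.waitMillis() > millis)
                .collect(Collectors.toList());
    }

    /** Tasks that ended in failure. */
    public synchronized List<TaskEvent> failures() {
        return events.stream()
                .filter(e -> e.failed)
                .collect(Collectors.toList());
    }
}
```

For example, after recording `new TaskEventLog.TaskEvent("recalculate", 0, 5000, 5200, false)`, a call to `waitedLongerThan(1000)` returns that task, which is the kind of question that is hard to answer from unstructured debug output. A tracing backend would provide the same sort of query over spans without us maintaining the storage ourselves.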
It is relatively simple to add OpenTelemetry as a telemetry backend, but the necessary infrastructure is required to process the telemetry data (OTel Collector, Prometheus, Jaeger, etc.). The flip side is that the Quarkus OpenTelemetry plugin provides insight and error tracing in some of the subsystems in Quarkus, including parts of the system we would not naturally think to instrument, e.g.;