Monitoring Horreum in prod environment #365

johnaohara · 2023-02-07T19:01:53Z · Maintainer

I have been thinking about #342 and about observability in general. I was able to identify the root cause of that issue by using custom OpenTelemetry spans to track the tasks placed in the task queue.

How do we want to observe a running production instance of Horreum? What information do we want to track, and how do we want to access it? For example, the logs do not contain enough information to understand what is happening in #342, even at debug level.

Do we want to rely on grepping debug logs to find information, or do we want some form of observability tool that lets us query recorded events and obtain the information needed to understand what is happening in the running instance?

It is relatively simple to add OpenTelemetry as a telemetry backend, but it requires additional infrastructure to process the telemetry data (OpenTelemetry Collector, Prometheus, Jaeger, etc.). On the other hand, the Quarkus OpenTelemetry extension provides insight and error tracing for several Quarkus subsystems, including parts of the system we would not naturally think to instrument, e.g.:

jesperpedersen · 2023-02-07T19:55:00Z · Maintainer

OpenTelemetry may be a CNCF project (incubating), but we need to follow the industry standard, which is Prometheus.

We need to provide Prometheus metrics that highlight issues in the application, which can then be explored by looking in the logs. We also need logging for critical issues so that we can investigate them.
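To make "Prometheus metrics" concrete, here is a minimal sketch, using only the JDK, of what a task-queue gauge looks like in the Prometheus text exposition format. The metric name `horreum_task_queue_depth`, the class `QueueDepthMetric` and its `render` helper are hypothetical and not part of Horreum; in a Quarkus application such a metric would more likely be registered through the Micrometer extension with its Prometheus registry rather than rendered by hand.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class QueueDepthMetric {

    // Hypothetical metric name; not an existing Horreum metric.
    static final String METRIC = "horreum_task_queue_depth";

    // Render the current queue depth as a Prometheus gauge in the
    // text exposition format: a HELP line, a TYPE line, then the sample.
    static String render(BlockingQueue<?> queue) {
        return "# HELP " + METRIC + " Number of tasks waiting in the task queue.\n"
             + "# TYPE " + METRIC + " gauge\n"
             + METRIC + " " + queue.size() + "\n";
    }

    public static void main(String[] args) {
        // Simulate a task queue with two pending tasks.
        BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
        queue.add(() -> {});
        queue.add(() -> {});
        System.out.print(render(queue));
    }
}
```

Running it prints the `# HELP` and `# TYPE ... gauge` lines followed by the sample `horreum_task_queue_depth 2`, which is the form a Prometheus server scrapes. A gauge like this could surface a growing backlog, with the logs then used to investigate the cause.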
