Monitoring and Observability is an ACES kernel component that provides monitoring and observability across the layers of the software stack, including edge, application, network, and cloud. It supports proactive identification and resolution of issues, which helps keep operations smooth and resource utilization efficient. The component uses open-source tools to collect telemetry from ACES assets and instrumented applications. These assets, which include workloads, clusters, infrastructure elements, and applications, produce metrics, logs, and traces. The collected data is analyzed to detect anomalies and generate alerts. The processed information is then distributed to the other ACES components, giving them real-time, system-wide visibility for decision-making.
As shown in the architecture, the component is composed of the following functionalities and subcomponents:
| Component | Functionality / Sub-Component | Functionality Description | Technologies Used |
|---|---|---|---|
| Monitoring and Observability component | Instrumentation & export | Instruments the applications and exports metrics, logs, and other telemetry. | OpenTelemetry API/SDK and collector (instrumentation, exporter), KumuluzEE, KumuluzEE Metrics Extension |
| | Telemetry collector | A proxy that receives, processes, and exports telemetry data to the monitoring backend. | OpenTelemetry collector |
| | Monitoring system | Monitoring backend: ingestion of metrics, anomaly detection, alerting, analysis, and visualization. | Prometheus |
| | Forwarder | Ingests and converts monitoring and telemetry data and dispatches it to the event store/processing. | prometheus-kafka-adapter |
| | ETL/stream aggregation | Data aggregation and transformation service. | KumuluzEE, Kafka Java Client |
| | Visualization | Visualization and analysis of monitoring and observability data. | Grafana |
| | Event store and stream processing | Dispatches raw and aggregated events and metrics/telemetry to other ACES components. | Kafka, ZooKeeper, Kafka UI, NATS JetStream |
You can deploy the entire application using docker-compose. All Docker images are already built and pushed to Docker Hub.
# start the services
docker-compose up -d
# stop the services
docker-compose down
You can deploy the application to minikube using the following commands:
minikube delete --all
minikube start
# create the namespace ul
kubectl create namespace ul
# switch to the minikube context
kubectl config use-context minikube
# set ul as the default namespace for the current context
kubectl config set-context --current --namespace=ul
# deploy the rest of the services
kubectl apply -f deployment/k8s -n ul
# wait for the services to be ready
kubectl wait pod --for=condition=Ready --all --timeout=300s -n ul
# open the Grafana web page (username: admin, password: admin)
minikube service grafana -n ul
# check logs of a service
kubectl logs -f <service-name> -n ul
Alternatively, you can deploy the application to minikube using the Helm chart:
minikube delete --all
minikube start
# create the namespace ul
kubectl create namespace ul
# switch to the minikube context
kubectl config use-context minikube
# set ul as the default namespace for the current context
kubectl config set-context --current --namespace=ul
# deploy the rest of the services
helm install monitoring-observability charts/app-0.1.0.tgz -n ul
# wait for the services to be ready
kubectl wait pod --for=condition=Ready --all --timeout=300s -n ul
# open the Grafana web page (username: admin, password: admin)
minikube service grafana -n ul
# check logs of a service
kubectl logs -f <service-name> -n ul
For evaluation purposes, we created a system consisting of three Quarkus microservices, an OpenTelemetry collector, a Prometheus instance, a Kafka instance, and a KumuluzEE Java aggregation microservice, together with a Grafana dashboard. Prometheus collects data from the microservices and the OpenTelemetry collector and feeds it into Kafka using the Prometheus Kafka Adapter. This data is then read by the KumuluzEE Java microservice, which produces four topics that aggregate the collected data. Additionally, a Kafka UI is available for easy management of the Kafka system.
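To make the data flow more concrete, the following is a minimal Python sketch of how a consumer might decode one raw sample forwarded by the Prometheus Kafka Adapter. It assumes the adapter is using its JSON serialization, where each message carries `timestamp`, `value`, `name`, and `labels` fields; the function name `parse_sample` and the example payload are hypothetical, and the actual consumer in this project is the KumuluzEE Java microservice, whose code is not shown here.

```python
import json
from datetime import datetime, timezone


def parse_sample(raw: bytes) -> dict:
    """Decode one JSON-encoded metric sample into typed Python values.

    Assumes the message has the fields "timestamp" (UTC, e.g.
    "1970-01-01T00:00:00Z"), "value" (a number encoded as a string),
    "name", and optionally "labels".
    """
    msg = json.loads(raw)
    timestamp = datetime.strptime(
        msg["timestamp"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    return {
        "name": msg["name"],
        "value": float(msg["value"]),
        "timestamp": timestamp,
        "labels": msg.get("labels", {}),
    }


if __name__ == "__main__":
    # Hypothetical payload, for illustration only.
    example = (
        b'{"timestamp": "1970-01-01T00:00:00Z", "value": "0.75", '
        b'"name": "demo_metric", "labels": {"job": "asset-demo-1"}}'
    )
    print(parse_sample(example))
```

Converting the value to `float` and the timestamp to a timezone-aware `datetime` at the edge keeps any downstream aggregation code free of string handling.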
We have created five Kafka topics: one with raw data and four with aggregated data.
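The WMA suffix in one of the aggregated topic names suggests a weighted moving average. As an illustration only, here is a small Python sketch of a linearly weighted moving average over a sliding window. The window size, the linear weighting scheme, and the class name `WeightedMovingAverage` are assumptions made for this example; the aggregation actually performed by the KumuluzEE service may differ.

```python
from collections import deque


class WeightedMovingAverage:
    """Linearly weighted moving average over the last `window_size` values.

    The oldest value in the window gets weight 1 and the newest gets
    weight n, where n is the number of values currently in the window.
    """

    def __init__(self, window_size: int = 5):
        if window_size < 1:
            raise ValueError("window_size must be at least 1")
        # deque with maxlen drops the oldest value automatically
        self.values = deque(maxlen=window_size)

    def update(self, value: float) -> float:
        """Add a new sample and return the current weighted average."""
        self.values.append(float(value))
        weights = range(1, len(self.values) + 1)
        weighted_sum = sum(w * v for w, v in zip(weights, self.values))
        return weighted_sum / sum(weights)


if __name__ == "__main__":
    wma = WeightedMovingAverage(window_size=3)
    for sample in [1, 2, 3, 4]:
        print(sample, wma.update(sample))
```

Because newer samples carry larger weights, this kind of average reacts to recent changes faster than a simple moving average of the same window length.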
This is what the stream of metric_values_WMA looks like:
Finally, a Grafana dashboard with alerts is set up (assets available in the /demo-resources directory).
The deployment includes the following services:
- Asset demo 1: A dummy Quarkus service that produces random metrics. It's accessible on port 8082.
- Asset demo 2: A dummy Quarkus service that produces random metrics. It's accessible on port 8083.
- Asset demo 3: A dummy Quarkus service that produces random metrics. It's accessible on port 8084.
- Prometheus: Monitoring and alerting toolkit. It's configured with a custom configuration file and accessible on port 9090. It's set up to scrape metrics from the dummy services.
- NATS JetStream: A NATS streaming server used for event dispatching. It's accessible on port 4222.
- Prometheus NATS Adapter: A service that reads metrics from Prometheus and dispatches them to NATS. It's accessible on port 5000.
- Aggregation Service: A service that produces to and consumes from NATS JetStream. It's accessible on port 8085.
- OpenTelemetry collector: Exposes a telemetry collection endpoint that is fed into the system. It's exposed on port 8888.
- Grafana: A visualization and dashboarding tool. It's accessible on port 3000.
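After starting the stack, it can be handy to confirm that each listed port accepts connections. The sketch below does this with plain TCP connection attempts from Python's standard library. The ports are taken from the list above; the host `localhost`, the service keys in `SERVICES`, and the function names are assumptions for illustration. An open port only shows that something is listening, not that the service is healthy.

```python
import socket

# Ports as listed above; the keys are illustrative labels, not
# necessarily the service names used in the deployment files.
SERVICES = {
    "asset-demo-1": 8082,
    "asset-demo-2": 8083,
    "asset-demo-3": 8084,
    "prometheus": 9090,
    "nats-jetstream": 4222,
    "prometheus-nats-adapter": 5000,
    "aggregation-service": 8085,
    "otel-collector": 8888,
    "grafana": 3000,
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_services(host: str = "localhost", services=None) -> dict:
    """Probe every service port and return a name -> reachable mapping."""
    if services is None:
        services = SERVICES
    return {name: is_port_open(host, port) for name, port in services.items()}


if __name__ == "__main__":
    for name, reachable in check_services().items():
        status = "reachable" if reachable else "NOT reachable"
        print(f"{name:26s} port {SERVICES[name]:5d}  {status}")
```

When running on minikube rather than docker-compose, the services are not necessarily exposed on localhost, so the host (and possibly the ports) would need to be adjusted accordingly.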
This project is licensed under the terms of the GNU General Public License v3.0. See the LICENSE file for details.
© 2024 Faculty of Computer and Information Science, University of Ljubljana