From d9a142e136423a0679c39c9d78aa941b3fa44aaf Mon Sep 17 00:00:00 2001 From: Stefan Prodan Date: Thu, 24 Aug 2023 12:08:42 +0300 Subject: [PATCH] Add Flux alerts to monitoring section Signed-off-by: Stefan Prodan --- content/en/flux/guides/monitoring.md | 247 ------------------ .../notifications.md => monitoring/alerts.md} | 120 +++++---- content/en/flux/monitoring/custom-metrics.md | 2 +- content/en/flux/monitoring/events.md | 2 +- content/en/flux/monitoring/logs.md | 2 +- content/en/flux/monitoring/metrics.md | 2 +- static/_redirects | 2 + 7 files changed, 82 insertions(+), 295 deletions(-) delete mode 100644 content/en/flux/guides/monitoring.md rename content/en/flux/{guides/notifications.md => monitoring/alerts.md} (80%) diff --git a/content/en/flux/guides/monitoring.md b/content/en/flux/guides/monitoring.md deleted file mode 100644 index 6cb8ccb43..000000000 --- a/content/en/flux/guides/monitoring.md +++ /dev/null @@ -1,247 +0,0 @@ ---- -title: "Monitoring with Prometheus" -linkTitle: "Monitoring with Prometheus" -description: "Monitoring Flux with Prometheus Operator and Grafana." -weight: 50 ---- - -This guide walks you through configuring monitoring for the Flux control plane. 
- -Flux uses [kube-prometheus-stack](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack) -to provide a monitoring stack made out of: - -* **Prometheus Operator** - manages Prometheus clusters atop Kubernetes -* **Prometheus** - collects metrics from the Flux controllers and Kubernetes API -* **Grafana** dashboards - displays the Flux control plane resource usage and reconciliation stats -* **kube-state-metrics** - generates metrics about the state of the Kubernetes objects - -## Install the Prometheus stack - -To install the monitoring stack with `flux`, first register the Git repository on your cluster: - -```sh -flux create source git flux-monitoring \ - --interval=30m \ - --url=https://github.com/fluxcd/flux2 \ - --branch=main -``` - -Then apply the [manifests/monitoring/kube-prometheus-stack](https://github.com/fluxcd/flux2/tree/main/manifests/monitoring/kube-prometheus-stack) -kustomization: - -```sh -flux create kustomization kube-prometheus-stack \ - --interval=1h \ - --prune \ - --source=flux-monitoring \ - --path="./manifests/monitoring/kube-prometheus-stack" \ - --health-check-timeout=5m \ - --wait -``` - -The above Kustomization will install the kube-prometheus-stack Helm release in the `monitoring` namespace. - -{{% alert color="warning" title="Prometheus Configuration" %}} -Note that the above configuration is not suitable for production. -In order to configure long term storage for metrics -and highly availability for Prometheus consult the Helm -chart [documentation](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack). 
-{{% /alert %}} - -## Install the Loki stack (optional) - -To install Grafana Loki and Promtail in the `monitoring` namespace, apply the -[manifests/monitoring/loki-stack](https://github.com/fluxcd/flux2/tree/main/manifests/monitoring/loki-stack) -kustomization: - -```sh -flux create kustomization loki-stack \ - --depends-on=kube-prometheus-stack \ - --interval=1h \ - --prune \ - --source=flux-monitoring \ - --path="./manifests/monitoring/loki-stack" \ - --health-check-timeout=5m \ - --wait -``` - -## Install Flux Grafana dashboards - -Note that the Flux controllers expose the `/metrics` endpoint on port `8080`. -When using Prometheus Operator you need a `PodMonitor` object to configure scraping for the controllers. - -Apply the [manifests/monitoring/monitoring-config](https://github.com/fluxcd/flux2/tree/main/manifests/monitoring/monitoring-config) -containing the `PodMonitor` and the `ConfigMap` with Flux's Grafana dashboards: - -```sh -flux create kustomization monitoring-config \ - --depends-on=kube-prometheus-stack \ - --interval=1h \ - --prune=true \ - --source=flux-monitoring \ - --path="./manifests/monitoring/monitoring-config" \ - --health-check-timeout=1m \ - --wait -``` - -You can access Grafana using port forwarding: - -```sh -kubectl -n monitoring port-forward svc/kube-prometheus-stack-grafana 3000:80 -``` - -To log in to the Grafana dashboard, you can use the default credentials from the -[kube-prometheus-stack chart](https://github.com/prometheus-community/helm-charts/blob/main/charts/kube-prometheus-stack/values.yaml): - -```yaml -username: admin -password: prom-operator -``` - -## Flux dashboards - -Control plane dashboard [http://localhost:3000/d/flux-control-plane](http://localhost:3000/d/flux-control-plane/flux-control-plane): - -![Control Plane Dashboard - Part 1](/img/cp-dashboard-p1.png) - -![Control Plane Dashboard - Part 2](/img/cp-dashboard-p2.png) - -Cluster reconciliation dashboard 
[http://localhost:3000/d/flux-cluster](http://localhost:3000/d/flux-cluster/flux-cluster-stats): - -![Cluster reconciliation dashboard](/img/cluster-dashboard.png) - -Control plane logs [http://localhost:3000/d/flux-logs](http://localhost:3000/d/flux-logs/flux-logs): - -![Control plane logs dashboard](/img/logs-dashboard.png) - -If you wish to use your own Prometheus and Grafana instances, then you can import the dashboards from -[GitHub](https://github.com/fluxcd/flux2/tree/main/manifests/monitoring/monitoring-config/dashboards). - -## Grafana annotations - -![Annotations Dashboard](/img/grafana-annotation.png) - -To display the Flux notifications on Grafana dashboards -you can configure Flux to push events to Grafana annotations API: - -```yaml -apiVersion: notification.toolkit.fluxcd.io/v1beta2 -kind: Alert -metadata: - name: grafana - namespace: monitoring -spec: - providerRef: - name: grafana - eventSeverity: info - eventSources: - - kind: GitRepository - name: '*' - namespace: flux-system ---- -apiVersion: notification.toolkit.fluxcd.io/v1beta2 -kind: Provider -metadata: - name: grafana - namespace: monitoring -spec: - type: grafana - address: "http://kube-prometheus-stack-grafana.monitoring/api/annotations" - secretRef: - name: grafana-auth -``` - -For more details on how to integrate Flux with Grafana API please see the -[Grafana provider documentation](/flux/components/notification/provider/#grafana). - -## Metrics - -For each `toolkit.fluxcd.io` kind, -the controllers expose a gauge metric to track the Ready condition status, -and a histogram with the reconciliation duration in seconds. 
- -Ready status metrics: - -```sh -gotk_reconcile_condition{kind, name, namespace, type="Ready", status="True"} -gotk_reconcile_condition{kind, name, namespace, type="Ready", status="False"} -gotk_reconcile_condition{kind, name, namespace, type="Ready", status="Unknown"} -gotk_reconcile_condition{kind, name, namespace, type="Ready", status="Deleted"} -``` - -Suspend status metrics: - -```sh -gotk_suspend_status{kind, name, namespace} -``` - -Time spent reconciling: - -```sh -gotk_reconcile_duration_seconds_bucket{kind, name, namespace, le} -gotk_reconcile_duration_seconds_sum{kind, name, namespace} -gotk_reconcile_duration_seconds_count{kind, name, namespace} -``` - -Alert manager example: - -```yaml -groups: - - name: Flux - rules: - - alert: ReconciliationFailure - expr: max(gotk_reconcile_condition{status="False",type="Ready"}) by (exported_namespace, name, kind) + on(exported_namespace, name, kind) (max(gotk_reconcile_condition{status="Deleted"}) by (exported_namespace, name, kind)) * 2 == 1 - for: 10m - labels: - severity: page - annotations: - summary: '{{ $labels.kind }} {{ $labels.exported_namespace }}/{{ $labels.name }} reconciliation has been failing for more than ten minutes.' -``` - -## Logs - -The Flux controllers follow the Kubernetes structured logging conventions. 
-The logs are written to `stderr` in JSON format, with the following common tags: - -- `logger` controller reconciler name -- `ts` timestamp in the ISO 8601 format -- `level` can be `debug`, `info` or `error` -- `msg` info or error description -- `error` error details - -Example of a `info` log: - -```json -{ - "level": "info", - "ts": "2022-06-03T11:42:49.159Z", - "logger": "controller.kustomization", - "msg": "server-side apply completed", - "name": "demo-frontend", - "namespace": "msdemo", - "revision": "main@sha1:30081ad7170fb8168536768fe399493dd43160d7", - "output": { - "ConfigMap/msdemo/demo-frontend-redis": "created", - "Deployment/msdemo/demo-frontend-app": "configured", - "Deployment/msdemo/demo-frontend-redis": "created", - "HorizontalPodAutoscaler/msdemo/demo-frontend-app": "deleted", - "Service/msdemo/demo-frontend-app": "unchanged", - "Service/msdemo/demo-frontend-redis": "created" - } -} -``` - -Example of an `error` log: - -```json -{ - "level": "error", - "ts": "2022-06-03T12:42:05.849Z", - "logger": "controller.kustomization", - "msg": "Reconciliation failed after 1.864823186s, next try in 5m0s", - "name": "demo-frontend", - "namespace": "msdemo", - "revision": "main@sha1:f68c334e0f5fae791d1e47dbcabed256f4f89e68", - "error": "Service/msdemo/frontend dry-run failed, reason: Invalid, error: Service frontend is invalid: spec.type: Unsupported value: Ingress" -} -``` diff --git a/content/en/flux/guides/notifications.md b/content/en/flux/monitoring/alerts.md similarity index 80% rename from content/en/flux/guides/notifications.md rename to content/en/flux/monitoring/alerts.md index 141fbda50..baf7d7895 100644 --- a/content/en/flux/guides/notifications.md +++ b/content/en/flux/monitoring/alerts.md @@ -1,11 +1,8 @@ --- -title: "Setup Notifications" -linkTitle: "Setup Notifications" -description: "Configure alerting for Slack, Teams, Discord and others using Flux notification controller." 
-weight: 30 -card: - name: tasks - weight: 50 +title: "Flux alerts" +linkTitle: "Alerts" +description: "Configure alerting for Slack, Teams, Discord and others using Flux notification controller" +weight: 1 --- When operating a cluster, different teams may wish to receive notifications about @@ -16,48 +13,44 @@ of an app was deployed and if the deployment is healthy. ## Prerequisites -To follow this guide you'll need a Kubernetes cluster with Flux installed on it. -Please see the [get started guide](../get-started/index.md) -or the [installation guide](../installation/). +To follow this guide you'll need a Kubernetes cluster bootstrapped with Flux. +Please see the [get started guide](/flux/get-started/) +or the [installation guide](/flux/installation/). The Flux controllers emit Kubernetes events whenever a resource status changes. -You can use the [notification-controller](../components/notification/_index.md) +You can use the [notification-controller](/flux/components/notification/) to forward these events to Slack, Microsoft Teams, Discord and others. The notification controller is part of the default Flux installation. ## Define a provider -First create a secret with your Slack incoming webhook: +First create a secret with your Slack bot token: ```sh -kubectl -n flux-system create secret generic slack-url \ ---from-literal=address=https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK +kubectl -n flagger-system create secret generic slack-bot-token \ +--from-literal=token=xoxb-YOUR-TOKEN ``` -Note that the secret must contain an `address` field, -it can be a Slack, Microsoft Teams, Discord or Rocket webhook URL. 
- Create a notification provider for Slack by referencing the above secret: ```yaml apiVersion: notification.toolkit.fluxcd.io/v1beta2 kind: Provider metadata: - name: slack - namespace: flux-system + name: slack-bot + namespace: flagger-system spec: type: slack - channel: some-channel-name + channel: general + address: https://slack.com/api/chat.postMessage secretRef: - name: slack-url + name: slack-bot-token ``` -The provider type can be `slack`, `msteams`, `discord`, `rocket`, `googlechat`, `webex`, `sentry` or `generic`. - -When type `generic` is specified, the notification controller will post the incoming -[event](../components/notification/event.md) in JSON format to the webhook address. -This way you can create custom handlers that can store the events in -Elasticsearch, CloudWatch, Stackdriver, etc. +{{% alert color="info" title="Providers" %}} +Flux supports various providers such as Discord, PagerDuty, Teams, Telegram, Sentry and many others. +For a complete list please see the [Provider `.spec.type` documentation](/flux/components/notification/provider/#type). +{{% /alert %}} ## Define an alert @@ -70,9 +63,13 @@ metadata: name: on-call-webapp namespace: flux-system spec: - summary: "production cluster" + summary: "cluster addons" + eventMetadata: + env: "production" + cluster: "my-cluster" + region: "us-east-2" providerRef: - name: slack + name: slack-bot eventSeverity: info eventSources: - kind: GitRepository @@ -81,15 +78,12 @@ spec: name: '*' ``` -Apply the above files or commit them to the `fleet-infra` repository. +Apply the above files or commit them to the bootstrap repository. 
-To verify that the alert has been acknowledge by the notification controller do: +To verify that the alert has been acknowledged by the notification controller do: ```sh -$ kubectl -n flux-system get alerts - -NAME READY STATUS AGE -on-call-webapp True Initialized 1m +flux get alerts ``` Multiple alerts can be used to send notifications to different channels or Slack organizations. @@ -113,17 +107,13 @@ When the verbosity is set to `info`, the controller will alert if: ## Git commit status -The GitHub, GitLab, Bitbucket, and Azure DevOps providers are slightly different to the other providers. Instead of -a stateless stream of events, the git notification providers will link the event with accompanying git commit which +The GitHub, GitLab, Gitea, Bitbucket, and Azure DevOps providers are slightly different to the other providers. Instead of +a stateless stream of events, the Git notification providers will link the event with accompanying Git commit which triggered the event. The linking is done by updating the commit status of a specific commit. - - [GitHub](https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/about-status-checks) - - [GitLab](https://docs.gitlab.com/ee/api/commits.html) - - [Bitbucket](https://developer.atlassian.com/server/bitbucket/how-tos/updating-build-status-for-commits/) - - [Azure DevOps](https://docs.microsoft.com/en-us/rest/api/azure/devops/git/statuses?view=azure-devops-rest-6.0) - In GitHub the commit status set by notification-controller will result in a green checkmark or red cross next to the commit hash. Clicking the icon will show more detailed information about the status. + ![commit status GitHub overview](/img/commit-status-github-overview.png) Receiving an event in the form of a commit status rather than a message in a chat conversation has the benefit @@ -138,11 +128,12 @@ When a new commit is pushed to the repository, source-controller will sync the c to reconcile the new commit. 
After this is done the kustomize-controller sends an event to the notification-controller with the result and the commit hash it reconciled. Then notification-controller can update the correct commit and repository when receiving the event. + ![commit status flow](/img/commit-status-flow.png) {{% alert color="info" title="Limitations" %}} -The git notification providers require that a commit hash present in the meta data -of the event. Therefore the providers will only work with `Kustomization` as an +The git notification providers require that a commit hash be present in the metadata +of the event. Therefore, the providers will only work with `Kustomization` as an event source, as it is the only resource which includes this data. {{% /alert %}} @@ -152,6 +143,7 @@ the git provider used, refer to the [Provider CRD](/flux/components/notification for details about how to get the correct token. The guide will use GitHub, but the other providers will work in a very similar manner. The token will need to have write access to the repository it is going to update the commit status in. Store the generated token in a Secret with the following data format in the cluster. + ```yaml apiVersion: v1 kind: Secret @@ -170,6 +162,7 @@ if the manifests comes from a repository which the API token is not allowed to w Copy the manifest content in the "[kustomize](https://github.com/stefanprodan/podinfo/tree/master/kustomize)" directory into the directory "./clusters/my-cluster/podinfo" in your fleet-infra repository. Make sure that you also add the namespace podinfo. + ```yaml apiVersion: v1 kind: Namespace @@ -178,6 +171,7 @@ metadata: ``` Then create a Kustomization to deploy podinfo. + ```yaml apiVersion: kustomize.toolkit.fluxcd.io/v1 kind: Kustomization @@ -203,6 +197,7 @@ spec: Creating a git provider is very similar to creating other types of providers. The only caveat being that the provider address needs to point to the same git repository as the event source originates from. 
+ ```yaml apiVersion: notification.toolkit.fluxcd.io/v1beta2 kind: Provider @@ -262,6 +257,7 @@ Clicking the check-mark should show a detailed view. Generate error A deployment failure can be forced by setting an invalid image tag in the podinfo deployment. + ```yaml apiVersion: apps/v1 kind: Deployment @@ -302,3 +298,39 @@ It is important to keep this in mind when building any automation tools that dea status, and consider the fact that receiving a successful status once does not mean it will always be successful. +## Grafana annotations + +![Annotations Dashboard](/img/grafana-annotation.png) + +To display the Flux notifications on Grafana dashboards +you can configure Flux to push events to Grafana annotations API: + +```yaml +apiVersion: notification.toolkit.fluxcd.io/v1beta2 +kind: Alert +metadata: + name: grafana + namespace: monitoring +spec: + providerRef: + name: grafana + eventSeverity: info + eventSources: + - kind: GitRepository + name: '*' + namespace: flux-system +--- +apiVersion: notification.toolkit.fluxcd.io/v1beta2 +kind: Provider +metadata: + name: grafana + namespace: monitoring +spec: + type: grafana + address: "http://kube-prometheus-stack-grafana.monitoring/api/annotations" + secretRef: + name: grafana-auth +``` + +For more details on how to integrate Flux with Grafana API please see the +[Grafana provider documentation](/flux/components/notification/provider/#grafana). 
diff --git a/content/en/flux/monitoring/custom-metrics.md b/content/en/flux/monitoring/custom-metrics.md index 8899162ec..eba818ff2 100644 --- a/content/en/flux/monitoring/custom-metrics.md +++ b/content/en/flux/monitoring/custom-metrics.md @@ -2,7 +2,7 @@ title: "Flux custom Prometheus metrics" linkTitle: "Custom metrics" description: "How to extend the Flux Prometheus metrics with kube-state-metrics" -weight: 2 +weight: 3 --- By default, the standard installation of Flux exports a specific set of metrics diff --git a/content/en/flux/monitoring/events.md b/content/en/flux/monitoring/events.md index 82c56e7fa..d4ca8df1d 100644 --- a/content/en/flux/monitoring/events.md +++ b/content/en/flux/monitoring/events.md @@ -2,7 +2,7 @@ title: "Flux events" linkTitle: "Events" description: "How to monitor the Flux events" -weight: 4 +weight: 5 --- The Flux controllers emit [Kubernetes events][kubernetes-events] during the diff --git a/content/en/flux/monitoring/logs.md b/content/en/flux/monitoring/logs.md index c3cfc2b43..0462d0c8f 100644 --- a/content/en/flux/monitoring/logs.md +++ b/content/en/flux/monitoring/logs.md @@ -2,7 +2,7 @@ title: "Flux logs" linkTitle: "Logs" description: "How to monitor the Flux logs with Loki and Grafana" -weight: 3 +weight: 4 --- The Flux controllers follow the Kubernetes structured logging conventions. 
These diff --git a/content/en/flux/monitoring/metrics.md b/content/en/flux/monitoring/metrics.md index 6e43ec64a..50f7276e8 100644 --- a/content/en/flux/monitoring/metrics.md +++ b/content/en/flux/monitoring/metrics.md @@ -2,7 +2,7 @@ title: "Flux Prometheus metrics" linkTitle: "Metrics" description: "How to monitor Flux with Prometheus Operator and Grafana" -weight: 1 +weight: 2 --- Flux has native support for [Prometheus][prometheus] metrics to provide insights diff --git a/static/_redirects b/static/_redirects index 1c55b48c7..d3f9fe2b6 100644 --- a/static/_redirects +++ b/static/_redirects @@ -32,6 +32,8 @@ /flux/use-cases/openshift /flux/installation/configuration/openshift 301! /flux/cheatsheets/sharding /flux/installation/configuration/sharding 301! /flux/cheatsheets/bootstrap /flux/installation/configuration 301! +/flux/guides/monitoring /flux/monitoring/metrics 301! +/flux/guides/notifications /flux/monitoring/alerts 301! /docs/contributing/* /contributing/:splat 301! /docs/* /flux/:splat 301!