diff --git a/content/en/flux/guides/monitoring.md b/content/en/flux/guides/monitoring.md deleted file mode 100644 index 6cb8ccb43..000000000 --- a/content/en/flux/guides/monitoring.md +++ /dev/null @@ -1,247 +0,0 @@ ---- -title: "Monitoring with Prometheus" -linkTitle: "Monitoring with Prometheus" -description: "Monitoring Flux with Prometheus Operator and Grafana." -weight: 50 ---- - -This guide walks you through configuring monitoring for the Flux control plane. - -Flux uses [kube-prometheus-stack](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack) -to provide a monitoring stack made out of: - -* **Prometheus Operator** - manages Prometheus clusters atop Kubernetes -* **Prometheus** - collects metrics from the Flux controllers and Kubernetes API -* **Grafana** dashboards - displays the Flux control plane resource usage and reconciliation stats -* **kube-state-metrics** - generates metrics about the state of the Kubernetes objects - -## Install the Prometheus stack - -To install the monitoring stack with `flux`, first register the Git repository on your cluster: - -```sh -flux create source git flux-monitoring \ - --interval=30m \ - --url=https://github.com/fluxcd/flux2 \ - --branch=main -``` - -Then apply the [manifests/monitoring/kube-prometheus-stack](https://github.com/fluxcd/flux2/tree/main/manifests/monitoring/kube-prometheus-stack) -kustomization: - -```sh -flux create kustomization kube-prometheus-stack \ - --interval=1h \ - --prune \ - --source=flux-monitoring \ - --path="./manifests/monitoring/kube-prometheus-stack" \ - --health-check-timeout=5m \ - --wait -``` - -The above Kustomization will install the kube-prometheus-stack Helm release in the `monitoring` namespace. - -{{% alert color="warning" title="Prometheus Configuration" %}} -Note that the above configuration is not suitable for production. 
-In order to configure long term storage for metrics -and highly availability for Prometheus consult the Helm -chart [documentation](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack). -{{% /alert %}} - -## Install the Loki stack (optional) - -To install Grafana Loki and Promtail in the `monitoring` namespace, apply the -[manifests/monitoring/loki-stack](https://github.com/fluxcd/flux2/tree/main/manifests/monitoring/loki-stack) -kustomization: - -```sh -flux create kustomization loki-stack \ - --depends-on=kube-prometheus-stack \ - --interval=1h \ - --prune \ - --source=flux-monitoring \ - --path="./manifests/monitoring/loki-stack" \ - --health-check-timeout=5m \ - --wait -``` - -## Install Flux Grafana dashboards - -Note that the Flux controllers expose the `/metrics` endpoint on port `8080`. -When using Prometheus Operator you need a `PodMonitor` object to configure scraping for the controllers. - -Apply the [manifests/monitoring/monitoring-config](https://github.com/fluxcd/flux2/tree/main/manifests/monitoring/monitoring-config) -containing the `PodMonitor` and the `ConfigMap` with Flux's Grafana dashboards: - -```sh -flux create kustomization monitoring-config \ - --depends-on=kube-prometheus-stack \ - --interval=1h \ - --prune=true \ - --source=flux-monitoring \ - --path="./manifests/monitoring/monitoring-config" \ - --health-check-timeout=1m \ - --wait -``` - -You can access Grafana using port forwarding: - -```sh -kubectl -n monitoring port-forward svc/kube-prometheus-stack-grafana 3000:80 -``` - -To log in to the Grafana dashboard, you can use the default credentials from the -[kube-prometheus-stack chart](https://github.com/prometheus-community/helm-charts/blob/main/charts/kube-prometheus-stack/values.yaml): - -```yaml -username: admin -password: prom-operator -``` - -## Flux dashboards - -Control plane dashboard 
[http://localhost:3000/d/flux-control-plane](http://localhost:3000/d/flux-control-plane/flux-control-plane): - -![Control Plane Dashboard - Part 1](/img/cp-dashboard-p1.png) - -![Control Plane Dashboard - Part 2](/img/cp-dashboard-p2.png) - -Cluster reconciliation dashboard [http://localhost:3000/d/flux-cluster](http://localhost:3000/d/flux-cluster/flux-cluster-stats): - -![Cluster reconciliation dashboard](/img/cluster-dashboard.png) - -Control plane logs [http://localhost:3000/d/flux-logs](http://localhost:3000/d/flux-logs/flux-logs): - -![Control plane logs dashboard](/img/logs-dashboard.png) - -If you wish to use your own Prometheus and Grafana instances, then you can import the dashboards from -[GitHub](https://github.com/fluxcd/flux2/tree/main/manifests/monitoring/monitoring-config/dashboards). - -## Grafana annotations - -![Annotations Dashboard](/img/grafana-annotation.png) - -To display the Flux notifications on Grafana dashboards -you can configure Flux to push events to Grafana annotations API: - -```yaml -apiVersion: notification.toolkit.fluxcd.io/v1beta2 -kind: Alert -metadata: - name: grafana - namespace: monitoring -spec: - providerRef: - name: grafana - eventSeverity: info - eventSources: - - kind: GitRepository - name: '*' - namespace: flux-system ---- -apiVersion: notification.toolkit.fluxcd.io/v1beta2 -kind: Provider -metadata: - name: grafana - namespace: monitoring -spec: - type: grafana - address: "http://kube-prometheus-stack-grafana.monitoring/api/annotations" - secretRef: - name: grafana-auth -``` - -For more details on how to integrate Flux with Grafana API please see the -[Grafana provider documentation](/flux/components/notification/provider/#grafana). - -## Metrics - -For each `toolkit.fluxcd.io` kind, -the controllers expose a gauge metric to track the Ready condition status, -and a histogram with the reconciliation duration in seconds. 
- -Ready status metrics: - -```sh -gotk_reconcile_condition{kind, name, namespace, type="Ready", status="True"} -gotk_reconcile_condition{kind, name, namespace, type="Ready", status="False"} -gotk_reconcile_condition{kind, name, namespace, type="Ready", status="Unknown"} -gotk_reconcile_condition{kind, name, namespace, type="Ready", status="Deleted"} -``` - -Suspend status metrics: - -```sh -gotk_suspend_status{kind, name, namespace} -``` - -Time spent reconciling: - -```sh -gotk_reconcile_duration_seconds_bucket{kind, name, namespace, le} -gotk_reconcile_duration_seconds_sum{kind, name, namespace} -gotk_reconcile_duration_seconds_count{kind, name, namespace} -``` - -Alert manager example: - -```yaml -groups: - - name: Flux - rules: - - alert: ReconciliationFailure - expr: max(gotk_reconcile_condition{status="False",type="Ready"}) by (exported_namespace, name, kind) + on(exported_namespace, name, kind) (max(gotk_reconcile_condition{status="Deleted"}) by (exported_namespace, name, kind)) * 2 == 1 - for: 10m - labels: - severity: page - annotations: - summary: '{{ $labels.kind }} {{ $labels.exported_namespace }}/{{ $labels.name }} reconciliation has been failing for more than ten minutes.' -``` - -## Logs - -The Flux controllers follow the Kubernetes structured logging conventions. 
-The logs are written to `stderr` in JSON format, with the following common tags: - -- `logger` controller reconciler name -- `ts` timestamp in the ISO 8601 format -- `level` can be `debug`, `info` or `error` -- `msg` info or error description -- `error` error details - -Example of a `info` log: - -```json -{ - "level": "info", - "ts": "2022-06-03T11:42:49.159Z", - "logger": "controller.kustomization", - "msg": "server-side apply completed", - "name": "demo-frontend", - "namespace": "msdemo", - "revision": "main@sha1:30081ad7170fb8168536768fe399493dd43160d7", - "output": { - "ConfigMap/msdemo/demo-frontend-redis": "created", - "Deployment/msdemo/demo-frontend-app": "configured", - "Deployment/msdemo/demo-frontend-redis": "created", - "HorizontalPodAutoscaler/msdemo/demo-frontend-app": "deleted", - "Service/msdemo/demo-frontend-app": "unchanged", - "Service/msdemo/demo-frontend-redis": "created" - } -} -``` - -Example of an `error` log: - -```json -{ - "level": "error", - "ts": "2022-06-03T12:42:05.849Z", - "logger": "controller.kustomization", - "msg": "Reconciliation failed after 1.864823186s, next try in 5m0s", - "name": "demo-frontend", - "namespace": "msdemo", - "revision": "main@sha1:f68c334e0f5fae791d1e47dbcabed256f4f89e68", - "error": "Service/msdemo/frontend dry-run failed, reason: Invalid, error: Service frontend is invalid: spec.type: Unsupported value: Ingress" -} -``` diff --git a/content/en/flux/migration/flux-v1-migration.md b/content/en/flux/migration/flux-v1-migration.md index 8e1071274..04151afb1 100644 --- a/content/en/flux/migration/flux-v1-migration.md +++ b/content/en/flux/migration/flux-v1-migration.md @@ -284,7 +284,7 @@ flux create alert app \ ``` For more details, read the guides on how to configure -[notifications]({{< relref "../guides/notifications.md" >}}) and +[notifications]({{< relref "../monitoring/alerts.md" >}}) and [webhooks]({{< relref "../guides/webhook-receivers.md" >}}). 
### Flux debugging diff --git a/content/en/flux/monitoring/_index.md b/content/en/flux/monitoring/_index.md new file mode 100644 index 000000000..e27c140a6 --- /dev/null +++ b/content/en/flux/monitoring/_index.md @@ -0,0 +1,6 @@ +--- +title: "Flux monitoring" +linkTitle: "Monitoring" +description: "How to configure monitoring for Flux." +weight: 50 +--- diff --git a/content/en/flux/guides/notifications.md b/content/en/flux/monitoring/alerts.md similarity index 80% rename from content/en/flux/guides/notifications.md rename to content/en/flux/monitoring/alerts.md index 141fbda50..baf7d7895 100644 --- a/content/en/flux/guides/notifications.md +++ b/content/en/flux/monitoring/alerts.md @@ -1,11 +1,8 @@ --- -title: "Setup Notifications" -linkTitle: "Setup Notifications" -description: "Configure alerting for Slack, Teams, Discord and others using Flux notification controller." -weight: 30 -card: - name: tasks - weight: 50 +title: "Flux alerts" +linkTitle: "Alerts" +description: "Configure alerting for Slack, Teams, Discord and others using Flux notification controller" +weight: 1 --- When operating a cluster, different teams may wish to receive notifications about @@ -16,48 +13,44 @@ of an app was deployed and if the deployment is healthy. ## Prerequisites -To follow this guide you'll need a Kubernetes cluster with Flux installed on it. -Please see the [get started guide](../get-started/index.md) -or the [installation guide](../installation/). +To follow this guide you'll need a Kubernetes cluster bootstrapped with Flux. +Please see the [get started guide](/flux/get-started/) +or the [installation guide](/flux/installation/). The Flux controllers emit Kubernetes events whenever a resource status changes. -You can use the [notification-controller](../components/notification/_index.md) +You can use the [notification-controller](/flux/components/notification/) to forward these events to Slack, Microsoft Teams, Discord and others. 
The notification controller is part of the default Flux installation. ## Define a provider -First create a secret with your Slack incoming webhook: +First create a secret with your Slack bot token: ```sh -kubectl -n flux-system create secret generic slack-url \ ---from-literal=address=https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK +kubectl -n flux-system create secret generic slack-bot-token \ +--from-literal=token=xoxb-YOUR-TOKEN ``` -Note that the secret must contain an `address` field, -it can be a Slack, Microsoft Teams, Discord or Rocket webhook URL. - Create a notification provider for Slack by referencing the above secret: ```yaml apiVersion: notification.toolkit.fluxcd.io/v1beta2 kind: Provider metadata: - name: slack - namespace: flux-system + name: slack-bot + namespace: flux-system spec: type: slack - channel: some-channel-name + channel: general + address: https://slack.com/api/chat.postMessage secretRef: - name: slack-url + name: slack-bot-token ``` -The provider type can be `slack`, `msteams`, `discord`, `rocket`, `googlechat`, `webex`, `sentry` or `generic`. - -When type `generic` is specified, the notification controller will post the incoming -[event](../components/notification/event.md) in JSON format to the webhook address. -This way you can create custom handlers that can store the events in -Elasticsearch, CloudWatch, Stackdriver, etc. +{{% alert color="info" title="Providers" %}} +Flux supports various providers such as Discord, PagerDuty, Teams, Telegram, Sentry and many others. +For a complete list please see the [Provider `.spec.type` documentation](/flux/components/notification/provider/#type). 
+{{% /alert %}} ## Define an alert @@ -70,9 +63,13 @@ metadata: name: on-call-webapp namespace: flux-system spec: - summary: "production cluster" + summary: "cluster addons" + eventMetadata: + env: "production" + cluster: "my-cluster" + region: "us-east-2" providerRef: - name: slack + name: slack-bot eventSeverity: info eventSources: - kind: GitRepository @@ -81,15 +78,12 @@ spec: name: '*' ``` -Apply the above files or commit them to the `fleet-infra` repository. +Apply the above files or commit them to the bootstrap repository. -To verify that the alert has been acknowledge by the notification controller do: +To verify that the alert has been acknowledged by the notification controller do: ```sh -$ kubectl -n flux-system get alerts - -NAME READY STATUS AGE -on-call-webapp True Initialized 1m +flux get alerts ``` Multiple alerts can be used to send notifications to different channels or Slack organizations. @@ -113,17 +107,13 @@ When the verbosity is set to `info`, the controller will alert if: ## Git commit status -The GitHub, GitLab, Bitbucket, and Azure DevOps providers are slightly different to the other providers. Instead of -a stateless stream of events, the git notification providers will link the event with accompanying git commit which +The GitHub, GitLab, Gitea, Bitbucket, and Azure DevOps providers are slightly different to the other providers. Instead of +a stateless stream of events, the Git notification providers will link the event with accompanying Git commit which triggered the event. The linking is done by updating the commit status of a specific commit. 
- - [GitHub](https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/about-status-checks) - - [GitLab](https://docs.gitlab.com/ee/api/commits.html) - - [Bitbucket](https://developer.atlassian.com/server/bitbucket/how-tos/updating-build-status-for-commits/) - - [Azure DevOps](https://docs.microsoft.com/en-us/rest/api/azure/devops/git/statuses?view=azure-devops-rest-6.0) - In GitHub the commit status set by notification-controller will result in a green checkmark or red cross next to the commit hash. Clicking the icon will show more detailed information about the status. + ![commit status GitHub overview](/img/commit-status-github-overview.png) Receiving an event in the form of a commit status rather than a message in a chat conversation has the benefit @@ -138,11 +128,12 @@ When a new commit is pushed to the repository, source-controller will sync the c to reconcile the new commit. After this is done the kustomize-controller sends an event to the notification-controller with the result and the commit hash it reconciled. Then notification-controller can update the correct commit and repository when receiving the event. + ![commit status flow](/img/commit-status-flow.png) {{% alert color="info" title="Limitations" %}} -The git notification providers require that a commit hash present in the meta data -of the event. Therefore the providers will only work with `Kustomization` as an +The Git notification providers require that a commit hash be present in the metadata +of the event. Therefore, the providers will only work with `Kustomization` as an event source, as it is the only resource which includes this data. {{% /alert %}} @@ -152,6 +143,7 @@ the git provider used, refer to the [Provider CRD](/flux/components/notification for details about how to get the correct token. The guide will use GitHub, but the other providers will work in a very similar manner. 
The token will need to have write access to the repository it is going to update the commit status in. Store the generated token in a Secret with the following data format in the cluster. + ```yaml apiVersion: v1 kind: Secret @@ -170,6 +162,7 @@ if the manifests comes from a repository which the API token is not allowed to w Copy the manifest content in the "[kustomize](https://github.com/stefanprodan/podinfo/tree/master/kustomize)" directory into the directory "./clusters/my-cluster/podinfo" in your fleet-infra repository. Make sure that you also add the namespace podinfo. + ```yaml apiVersion: v1 kind: Namespace @@ -178,6 +171,7 @@ metadata: ``` Then create a Kustomization to deploy podinfo. + ```yaml apiVersion: kustomize.toolkit.fluxcd.io/v1 kind: Kustomization @@ -203,6 +197,7 @@ spec: Creating a git provider is very similar to creating other types of providers. The only caveat being that the provider address needs to point to the same git repository as the event source originates from. + ```yaml apiVersion: notification.toolkit.fluxcd.io/v1beta2 kind: Provider @@ -262,6 +257,7 @@ Clicking the check-mark should show a detailed view. Generate error A deployment failure can be forced by setting an invalid image tag in the podinfo deployment. + ```yaml apiVersion: apps/v1 kind: Deployment @@ -302,3 +298,39 @@ It is important to keep this in mind when building any automation tools that dea status, and consider the fact that receiving a successful status once does not mean it will always be successful. 
+## Grafana annotations + +![Annotations Dashboard](/img/grafana-annotation.png) + +To display the Flux notifications on Grafana dashboards +you can configure Flux to push events to Grafana annotations API: + +```yaml +apiVersion: notification.toolkit.fluxcd.io/v1beta2 +kind: Alert +metadata: + name: grafana + namespace: monitoring +spec: + providerRef: + name: grafana + eventSeverity: info + eventSources: + - kind: GitRepository + name: '*' + namespace: flux-system +--- +apiVersion: notification.toolkit.fluxcd.io/v1beta2 +kind: Provider +metadata: + name: grafana + namespace: monitoring +spec: + type: grafana + address: "http://kube-prometheus-stack-grafana.monitoring/api/annotations" + secretRef: + name: grafana-auth +``` + +For more details on how to integrate Flux with Grafana API please see the +[Grafana provider documentation](/flux/components/notification/provider/#grafana). diff --git a/content/en/flux/monitoring/custom-metrics.md b/content/en/flux/monitoring/custom-metrics.md new file mode 100644 index 000000000..eba818ff2 --- /dev/null +++ b/content/en/flux/monitoring/custom-metrics.md @@ -0,0 +1,203 @@ +--- +title: "Flux custom Prometheus metrics" +linkTitle: "Custom metrics" +description: "How to extend the Flux Prometheus metrics with kube-state-metrics" +weight: 3 +--- + +By default, the standard installation of Flux exports a specific set of metrics +about the controllers and their inner workings that may not serve the needs of +all the users. Some of these metrics are common across all the Flux controllers, +and some are very specific to a few controllers. It's not feasible to +add all the possible informational labels to these metrics, as that may increase +the cardinality of the metrics. Since these metrics are about the operations of +Flux, it may not be much of a benefit to add custom labels to these metrics, +as these are more useful for the people administering and maintaining Flux. 
Most +of the time, the users of Flux who interact with Flux through the Flux custom +resources want to know about the resources they work with. For example, the +state of GitRepositories and their branch or tag references. These metrics can +be scraped by using [kube-state-metrics (KSM)][kube-state-metrics], which is +part of the [kube-prometheus-stack][kube-prometheus-stack]. KSM can be +configured to add custom labels to the resource metrics, for example, some value +from the status of a resource or some arbitrary value like a team name, department name, etc. + +## Set up kube-state-metrics + +Kube-state-metrics can be installed along with the whole monitoring stack using +kube-prometheus-stack. The +[fluxcd/flux2-monitoring-example][monitoring-example-repo] repository contains +example configurations for deploying and configuring kube-prometheus-stack to +monitor Flux. These configurations will be discussed in detail in the following +sections to show how they can be customized. + +The kube-prometheus-stack Helm chart is used to install the monitoring stack. +The kube-state-metrics related configuration in the chart values exists in a +separate file called +[kube-state-metrics-config.yaml](https://github.com/fluxcd/flux2-monitoring-example/blob/main/monitoring/controllers/kube-prometheus-stack/kube-state-metrics-config.yaml). +It configures KSM to run in `custom-resource-state-only` mode. In this mode, +KSM will not collect metrics for any of the Kubernetes core resources. The +`rbac` section provides KSM access to list and watch Flux custom resources. If +image-reflector-controller and image-automation-controller are not used, the +API group (`image.toolkit.fluxcd.io`) and resources (`imagerepositories`, +`imagepolicies`, `imageupdateautomations`) can be removed. The +`customResourceState` section configures how the Flux metrics are composed. 
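+
+As a rough sketch of how those pieces fit together, the chart values might look
+like the following. It assumes that the kube-state-metrics subchart of
+kube-prometheus-stack accepts the `collectors`, `extraArgs`, `rbac.extraRules`
+and `customResourceState` values, and it is cut down to a single resource kind
+for brevity. The linked file remains the authoritative configuration.
+
+```yaml
+kube-state-metrics:
+  # Assumption: an empty list disables the collectors for core Kubernetes resources.
+  collectors: []
+  extraArgs:
+    - --custom-resource-state-only=true
+  rbac:
+    extraRules:
+      # Lets KSM list and watch the Flux custom resources.
+      - apiGroups:
+          - source.toolkit.fluxcd.io
+          - image.toolkit.fluxcd.io # can be removed if the image controllers are not used
+        resources:
+          - gitrepositories
+          - imagerepositories
+          - imagepolicies
+          - imageupdateautomations
+        verbs: ["list", "watch"]
+  customResourceState:
+    enabled: true
+    config:
+      spec:
+        resources:
+          - groupVersionKind:
+              group: source.toolkit.fluxcd.io
+              version: "v1"
+              kind: GitRepository
+            metricNamePrefix: gotk
+            metrics: [] # placeholder: the metric definitions go here
+```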
+ +Once deployed, KSM will start collecting the Flux resource metrics from the +kube-apiserver and exporting them as configured. + +## Adding custom metrics + +The example `customResourceState` values used in the above setup add a metric +called `gotk_resource_info` with labels `name`, `exported_namespace`, +`suspended`, `ready`, etc. + +```yaml +- name: "resource_info" + help: "The current state of a GitOps Toolkit resource." + each: + type: Info + info: + labelsFromPath: + name: [metadata, name] + labelsFromPath: + exported_namespace: [metadata, namespace] + suspended: [spec, suspend] + ready: [status, conditions, "[type=Ready]", status] + ... +``` + +This provides the current state of the Flux resources. It can be used to monitor +the readiness of Flux resources. This can be edited to add more labels; more +about that in the next section. + +Similarly, more custom metrics can be added by appending them to the `metrics` +list. For example, to create a metric about the HelmRelease last applied +revision, append the HelmRelease resource metrics section: + +```yaml +... +customResourceState: + config: + spec: + resources: + - groupVersionKind: + group: helm.toolkit.fluxcd.io + version: "v2beta1" + kind: HelmRelease + metricNamePrefix: gotk + metrics: + - name: "resource_info" + help: "The current state of a GitOps Toolkit resource." + each: + type: Info + info: + labelsFromPath: + name: [metadata, name] + labelsFromPath: + exported_namespace: [metadata, namespace] + suspended: [spec, suspend] + ready: [status, conditions, "[type=Ready]", status] + - name: "helmrelease_version_info" + help: "The version information of helm release resource." + each: + type: Info + info: + labelsFromPath: + version: [status, lastAppliedRevision] + labelsFromPath: + name: [metadata, name] + exported_namespace: [metadata, namespace] + chartName: [spec, chart, spec, chart] +... 
+``` + +In the above, `gotk_resource_info` and `gotk_helmrelease_version_info` metrics +will be exported for HelmReleases. + +``` +# HELP gotk_resource_info The current state of a GitOps Toolkit resource. +# TYPE gotk_resource_info info +gotk_resource_info{customresource_group="helm.toolkit.fluxcd.io",customresource_kind="HelmRelease",customresource_version="v2beta1",exported_namespace="monitoring",name="kube-prometheus-stack",ready="True"} 1 +gotk_resource_info{customresource_group="helm.toolkit.fluxcd.io",customresource_kind="HelmRelease",customresource_version="v2beta1",exported_namespace="monitoring",name="loki-stack",ready="True"} 1 +# HELP gotk_helmrelease_version_info The version information of helm release resource. +# TYPE gotk_helmrelease_version_info info +gotk_helmrelease_version_info{chartName="kube-prometheus-stack",customresource_group="helm.toolkit.fluxcd.io",customresource_kind="HelmRelease",customresource_version="v2beta1",exported_namespace="monitoring",name="kube-prometheus-stack",version="48.3.1"} 1 +gotk_helmrelease_version_info{chartName="loki-stack",customresource_group="helm.toolkit.fluxcd.io",customresource_kind="HelmRelease",customresource_version="v2beta1",exported_namespace="monitoring",name="loki-stack",version="2.9.11"} 1 +``` + +## Adding custom metric labels + +Custom labels can be added to metrics to create more meaningful monitoring +metrics. For example, a common `ownedBy` label across all the resources in a +cluster, `businessUnit` or `department` name from the labels of objects, etc. 
+ +For example, if the GitRepository objects are labelled with `department`: + +```yaml +apiVersion: source.toolkit.fluxcd.io/v1 +kind: GitRepository +metadata: + name: foo + namespace: bar + labels: + department: baz +spec: + interval: 1h + ref: + branch: main + url: https://github.com/fluxcd/flux2-monitoring-example +``` + +The KSM `customResourceState` value can be configured to extract the +department name, owned by team name and Git branch name, and include them as +labels in the `gotk_resource_info` metric of GitRepositories. + +```yaml +... +customResourceState: + config: + spec: + resources: + - groupVersionKind: + group: source.toolkit.fluxcd.io + version: "v1" + kind: GitRepository + metricNamePrefix: gotk + commonLabels: + ownedBy: teamA + labelsFromPath: + department: [metadata, labels, department] + metrics: + - name: "resource_info" + help: "The current state of a GitOps Toolkit resource." + each: + type: Info + info: + labelsFromPath: + name: [metadata, name] + labelsFromPath: + exported_namespace: [metadata, namespace] + suspended: [spec, suspend] + ready: [status, conditions, "[type=Ready]", status] + branch: [spec, ref, branch] +... +``` + +The above configuration will result in the following metric + +``` +gotk_resource_info{branch="main",customresource_group="source.toolkit.fluxcd.io",customresource_kind="GitRepository",customresource_version="v1",department="baz",exported_namespace="bar",name="foo",ownedBy="teamA",ready="True"} 1 +``` + +It contains the `ownedBy="teamA"`, `department="baz"` and `branch="main"` +labels. Similarly, more custom labels can be added depending on the need. + +Refer to the [kube-state-metrics custom-resource state configuration +docs][ksm-customresourcestate-metrics] to learn more about customizing the +metrics. 
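+
+To illustrate how such labels could be put to use, the following is a
+hypothetical Prometheus Operator rule that alerts on the `gotk_resource_info`
+metric configured above and includes the custom labels in its summary. The rule
+name, the `for` duration and the severity are assumptions rather than part of
+the example repository. Depending on the `ruleSelector` of the Prometheus
+instance, extra metadata labels may be needed for the rule to be picked up.
+
+```yaml
+apiVersion: monitoring.coreos.com/v1
+kind: PrometheusRule
+metadata:
+  name: flux-resource-not-ready # hypothetical name
+  namespace: monitoring
+spec:
+  groups:
+    - name: flux
+      rules:
+        - alert: GitRepositoryNotReady
+          # gotk_resource_info is an info metric, so every exported series has the value 1.
+          expr: gotk_resource_info{customresource_kind="GitRepository", ready="False"} == 1
+          for: 10m
+          labels:
+            severity: warning
+          annotations:
+            summary: "GitRepository {{ $labels.exported_namespace }}/{{ $labels.name }} owned by {{ $labels.ownedBy }} ({{ $labels.department }}) is not ready"
+```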
+ + +[kube-state-metrics]: https://github.com/kubernetes/kube-state-metrics +[monitoring-example-repo]: https://github.com/fluxcd/flux2-monitoring-example +[kube-prometheus-stack]: https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack +[ksm-customresourcestate-metrics]: https://github.com/kubernetes/kube-state-metrics/blob/main/docs/customresourcestate-metrics.md diff --git a/content/en/flux/monitoring/events.md b/content/en/flux/monitoring/events.md new file mode 100644 index 000000000..d4ca8df1d --- /dev/null +++ b/content/en/flux/monitoring/events.md @@ -0,0 +1,178 @@ +--- +title: "Flux events" +linkTitle: "Events" +description: "How to monitor the Flux events" +weight: 5 +--- + +The Flux controllers emit [Kubernetes events][kubernetes-events] during the +reconciliation operation to provide information about the object being +reconciled. Unlike logs, events are always associated with an object, which is a +Flux resource in this case. Events are supplemental data that can be used along +with logs to provide a complete picture of controllers' operations. Some of +the events emitted by Flux controllers are also used to send notifications. +See the [Alerts docs](/flux/monitoring/alerts/) to learn more about the Flux +Alerts based on events from controllers. In the following sections, we will go +through the Flux events and how to interpret them. + +## Kubernetes events + +The Flux controller events about a resource contain the following fields: + +- `type` can be `Normal` or `Warning` +- `firstTimestamp` timestamp in the ISO 8601 format +- `lastTimestamp` timestamp in the ISO 8601 format +- `message` info or warning description +- `reason` short machine-readable string +- `involvedObject` the API version, kind, name and namespace of the Flux object +- `metadata.annotations` the Flux-specific metadata e.g. source revision +- `source.component` the Flux controller name where the event originated from. 
+ +### Examples + +Example of a `Normal` event produced by kustomize-controller: + +```json +{ + "kind": "Event", + "apiVersion": "v1", + "metadata": { + "name": "flux-system.177bd633e296a292", + "namespace": "flux-system", + "annotations": { + "kustomize.toolkit.fluxcd.io/revision": "main@sha1:802723078affd3eb2a3898630261ab3ca5d6dd40" + } + }, + "involvedObject": { + "kind": "Kustomization", + "namespace": "flux-system", + "name": "flux-system", + "apiVersion": "kustomize.toolkit.fluxcd.io/v1" + }, + "reason": "ReconciliationSucceeded", + "message": "Reconciliation finished in 436.493292ms, next run in 10m0s", + "source": { + "component": "kustomize-controller" + }, + "firstTimestamp": "2023-08-16T10:26:43Z", + "lastTimestamp": "2023-08-16T10:26:43Z", + "type": "Normal" +} +``` + +In the above example: +- The event is about a `Kustomization` named `flux-system` in the `flux-system` + namespace, indicated by the `involvedObject` field. +- The event originates from `kustomize-controller`, indicated by the + `source.component` field. +- The event is a `Normal` type event about a successful reconciliation, + indicated by the `reason` and `message` fields. +- The `metadata.annotations` field `kustomize.toolkit.fluxcd.io/revision` + contains information about the source revision that was successfully applied + as a result of successful reconciliation of the Kustomization. 
+ +Example of a `Warning` event produced by source-controller: + +```json +{ + "apiVersion": "v1", + "count": 4, + "eventTime": null, + "firstTimestamp": "2023-08-22T20:24:06Z", + "involvedObject": { + "apiVersion": "source.toolkit.fluxcd.io/v1", + "kind": "GitRepository", + "name": "podinfo", + "namespace": "default", + "resourceVersion": "1284973", + "uid": "2c2ed1da-556f-4793-863d-7d96e8bab3f5" + }, + "kind": "Event", + "lastTimestamp": "2023-08-22T20:24:18Z", + "message": "failed to checkout and determine revision: unable to clone 'https://github.com/stefanprodan/podinfo': couldn't find remote ref \"refs/tags/v1.8.9\"", + "metadata": { + "creationTimestamp": "2023-08-22T20:24:06Z", + "name": "podinfo.177dce48bc7db3a4", + "namespace": "default", + "resourceVersion": "1285016", + "uid": "3c8f568a-c99b-4279-8093-6ef08fae325b" + }, + "reason": "GitOperationFailed", + "reportingComponent": "", + "reportingInstance": "", + "source": { + "component": "source-controller" + }, + "type": "Warning" +} +``` + +In the above example: +- The event is about a `GitRepository` named `podinfo` in the `default` + namespace, indicated by the `involvedObject` field. +- The event originates from `source-controller`, indicated by the + `source.component` field. +- The event is a `Warning` type event about a failed Git operation, indicated by + the `reason` and `message` fields. 
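+
+Since some of these events are also used to send notifications, a failure like
+the one above could be forwarded to a chat channel. The following is a minimal
+sketch, assuming that `Warning` events are reported with the `error` severity
+and that a Provider named `slack-bot` exists in the same namespace as the
+Alert. The Alert name is hypothetical.
+
+```yaml
+apiVersion: notification.toolkit.fluxcd.io/v1beta2
+kind: Alert
+metadata:
+  name: podinfo-git-failures # hypothetical name
+  namespace: flux-system
+spec:
+  providerRef:
+    name: slack-bot # assumed to exist in the flux-system namespace
+  # Assumption: only failures are forwarded, informational events are skipped.
+  eventSeverity: error
+  eventSources:
+    - kind: GitRepository
+      name: podinfo
+      namespace: default
+```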
+ +## Events inspection with kubectl + +The events associated with a Flux resource can be queried using the `kubectl events` +command: + +```console +$ kubectl events -n flux-system --for kustomization/flux-system +LAST SEEN TYPE REASON OBJECT MESSAGE +58m Normal ReconciliationSucceeded Kustomization/flux-system Reconciliation finished in 448.00332ms, next run in 10m0s +48m Normal ReconciliationSucceeded Kustomization/flux-system Reconciliation finished in 486.826649ms, next run in 10m0s +38m Normal ReconciliationSucceeded Kustomization/flux-system Reconciliation finished in 502.282127ms, next run in 10m0s +28m Normal ReconciliationSucceeded Kustomization/flux-system Reconciliation finished in 543.745587ms, next run in 10m0s +18m Normal ReconciliationSucceeded Kustomization/flux-system Reconciliation finished in 465.177441ms, next run in 10m0s +8m27s Normal ReconciliationSucceeded Kustomization/flux-system Reconciliation finished in 494.543068ms, next run in 10m0s +``` + +This shows all the events recorded for the queried resource within the last hour. 
+ +## Events inspection with flux CLI + +The events associated with a Flux resource can be queried using the `flux +events` CLI command: + +```console +$ flux events --for Kustomization/flux-system +LAST SEEN TYPE REASON OBJECT MESSAGE +52m Normal ReconciliationSucceeded Kustomization/flux-system Reconciliation finished in 506.467ms, next run in 10m0s +42m Normal ReconciliationSucceeded Kustomization/flux-system Reconciliation finished in 531.072726ms, next run in 10m0 +32m Normal ReconciliationSucceeded Kustomization/flux-system Reconciliation finished in 506.673992ms, next run in 10m0 +22m Normal ReconciliationSucceeded Kustomization/flux-system Reconciliation finished in 512.255817ms, next run in 10m0 +12m Normal ReconciliationSucceeded Kustomization/flux-system Reconciliation finished in 507.521248ms, next run in 10m0 +2m31s Normal ReconciliationSucceeded Kustomization/flux-system Reconciliation finished in 448.00332ms, next run in 10m0s +``` + +This can also be used to watch all the events issued by the Flux controllers +across all the namespaces: + +```console +$ flux events --all-namespaces --watch +NAMESPACE LAST SEEN TYPE REASON OBJECT MESSAGE +flux-system 34m (x3 over 154m) Normal GitOperationSucceeded GitRepository/flux-system no changes since last reconcilation: observed revision 'main@sha1:4d768edba5d409feb60870dd3b0ac0d307299898' +flux-system 54m Normal ReconciliationSucceeded Kustomization/flux-system Reconciliation finished in 486.814878ms, next run in 10m0s +flux-system 44m Normal ReconciliationSucceeded Kustomization/flux-system Reconciliation finished in 486.203813ms, next run in 10m0s +flux-system 34m Normal ReconciliationSucceeded Kustomization/flux-system Reconciliation finished in 512.160373ms, next run in 10m0s +flux-system 24m Normal ReconciliationSucceeded Kustomization/flux-system Reconciliation finished in 543.806383ms, next run in 10m0s +flux-system 14m Normal ReconciliationSucceeded Kustomization/flux-system Reconciliation finished in
524.293527ms, next run in 10m0s +flux-system 4m5s Normal ReconciliationSucceeded Kustomization/flux-system Reconciliation finished in 522.671955ms, next run in 10m0s +flux-system 47s Normal ReconciliationSucceeded Kustomization/flux-system Reconciliation finished in 523.892245ms, next run in 10m0s +flux-system 34m Normal ReconciliationSucceeded Kustomization/monitoring-configs Reconciliation finished in 104.609707ms, next run in 1h0m0s +flux-system 42s Normal ReconciliationSucceeded Kustomization/monitoring-configs Reconciliation finished in 90.70521ms, next run in 1h0m0s +flux-system 34m Normal ReconciliationSucceeded Kustomization/monitoring-controllers Reconciliation finished in 118.651968ms, next run in 1h0m0s +flux-system 39s Normal ReconciliationSucceeded Kustomization/monitoring-controllers Reconciliation finished in 132.34839ms, next run in 1h0m0s +monitoring 34m (x3 over 154m) Normal ArtifactUpToDate HelmChart/monitoring-kube-prometheus-stack artifact up-to-date with remote revision: '48.3.3' +monitoring 34m (x3 over 154m) Normal ArtifactUpToDate HelmChart/monitoring-loki-stack artifact up-to-date with remote revision: '2.9.11' +``` + +Refer to the [`flux events`](/flux/cmd/flux_events/) CLI docs to learn more +about it. + + +[kubernetes-events]: https://kubernetes.io/docs/reference/kubernetes-api/cluster-resources/event-v1/ diff --git a/content/en/flux/monitoring/logs.md b/content/en/flux/monitoring/logs.md new file mode 100644 index 000000000..0462d0c8f --- /dev/null +++ b/content/en/flux/monitoring/logs.md @@ -0,0 +1,161 @@ +--- +title: "Flux logs" +linkTitle: "Logs" +description: "How to monitor the Flux logs with Loki and Grafana" +weight: 4 +--- + +The Flux controllers follow the Kubernetes structured logging conventions. These +logs can be collected and analyzed to monitor the operations of the controllers. 
+ +The [fluxcd/flux2-monitoring-example][monitoring-example-repo] repository +provides a ready-made example setup to get started with monitoring Flux, which +includes [Loki-stack][loki-stack] to collect logs from all the Flux controllers +and explore them using Grafana. It is recommended to set up the monitoring +example before continuing with this document to follow along. Before getting +into Loki and Grafana setup, the following sections will describe the Flux logs +and how to interpret them. + +## Controller logs + +The default installation of Flux controllers writes logs to `stderr` in JSON +format at the `info` log level. This can be configured using the +`--log-encoding` and `--log-level` flags in the controllers. Refer to the +[flux-system +kustomization](https://github.com/fluxcd/flux2-monitoring-example/blob/main/clusters/test/flux-system/kustomization.yaml) +for an example of how to patch the Flux controllers with flags. The following +example patch snippet can be appended to the existing set of patches to add a +log level flag and change the log level of the controller to `debug`. + +```yaml +- op: add + path: /spec/template/spec/containers/0/args/- + value: --log-level=debug +``` + +{{< note >}} +The patch configuration in the example only applies to a few targeted +controllers. Update the patch target to apply this to other controllers too.
+{{< /note >}} + +### Structured logging + +The Flux controllers support structured logging with the following common +labels: + +- `level` can be `debug`, `info` or `error` +- `ts` timestamp in the ISO 8601 format +- `msg` info or error description +- `error` error details (present when `level` is `error`) +- `controllerGroup` the Flux CR group +- `controllerKind` the Flux CR kind +- `name` The Flux CR name +- `namespace` The Flux CR namespace +- `reconcileID` the UID of the Flux reconcile operation + +Sample of an `info` log produced by kustomize-controller: + +```json +{ + "level": "info", + "ts": "2023-08-16T09:36:41.286Z", + "controllerGroup": "kustomize.toolkit.fluxcd.io", + "controllerKind": "Kustomization", + "name": "redis", + "namespace": "apps", + "msg": "server-side apply completed", + "revision": "main@sha1:30081ad7170fb8168536768fe399493dd43160d7", + "output": { + "ConfigMap/apps/redis": "created", + "Deployment/apps/redis": "configured", + "HorizontalPodAutoscaler/apps/redis": "deleted", + "Service/apps/redis": "unchanged", + "Secret/apps/redis": "skipped" + } +} +``` + +Sample of an `error` log produced by kustomize-controller: + +```json +{ + "level": "error", + "ts": "2023-08-16T09:36:41.286Z", + "controllerGroup": "kustomize.toolkit.fluxcd.io", + "controllerKind": "Kustomization", + "name": "redis", + "namespace": "apps", + "msg": "Reconciliation failed after 2s, next try in 5m0s", + "revision": "main@sha1:f68c334e0f5fae791d1e47dbcabed256f4f89e68", + "error": "Service/apps/redis dry-run failed, reason: Invalid, error: Service redis is invalid: spec.type: Unsupported value: Ingress" +} +``` + +The log labels shown above can be used to query for specific types of logs. For +example, error logs can be queried using the `error` label, the output of +successful reconciliation of Kustomization can be queried using the `output` +label, the logs about a specific controller can be queried using the +`controllerKind` label. 
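As a minimal sketch of such a query outside Loki, the Python snippet below filters JSON log lines by the labels described above. It assumes the logs were captured as one JSON object per line (for example with `kubectl logs`); the `filter_logs` helper and the shortened sample lines are hypothetical.

```python
import json

# Shortened versions of the two kustomize-controller samples shown above,
# plus one non-JSON line to show that such lines are skipped.
LOG_LINES = [
    '{"level": "info", "controllerKind": "Kustomization", "name": "redis", '
    '"namespace": "apps", "msg": "server-side apply completed"}',
    '{"level": "error", "controllerKind": "Kustomization", "name": "redis", '
    '"namespace": "apps", "msg": "Reconciliation failed after 2s, next try in 5m0s", '
    '"error": "Service/apps/redis dry-run failed"}',
    'not a JSON line',
]


def filter_logs(lines, **labels):
    """Yield parsed entries whose labels all match; skip lines that are not JSON objects."""
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(entry, dict):
            continue
        if all(entry.get(key) == value for key, value in labels.items()):
            yield entry


errors = list(filter_logs(LOG_LINES, level="error", controllerKind="Kustomization"))
for entry in errors:
    print(entry["name"], entry["namespace"], entry["error"])
```

Matching on exact label values keeps the sketch simple; a log aggregation system such as Loki offers richer matching through its own query language.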
+ +## Querying logs associated with resources + +For querying logs associated with particular resources, the `flux logs` CLI +command can be used. It connects to the cluster, fetches the relevant Flux logs, +and filters them based on the query request. For example, to list the logs +associated with Kustomization `monitoring-configs`: + +```console +$ flux logs --kind=Kustomization --name=monitoring-configs --namespace=flux-system --since=1m +... +2023-08-22T18:35:45.292Z info Kustomization/monitoring-configs.flux-system - All dependencies are ready, proceeding with reconciliation +2023-08-22T18:35:45.348Z info Kustomization/monitoring-configs.flux-system - server-side apply completed +2023-08-22T18:35:45.380Z info Kustomization/monitoring-configs.flux-system - Reconciliation finished in 88.208385ms, next run in 1h0m0s +``` + +Refer to the [`flux logs`](/flux/cmd/flux_logs/) CLI docs to learn more about +it. + +## Log aggregation with Grafana Loki + +In the [monitoring example repository][monitoring-example-repo], the monitoring +configurations can be found in the +[`monitoring/`](https://github.com/fluxcd/flux2-monitoring-example/tree/main/monitoring) +directory. `monitoring/controllers/` directory contains the configurations for +deploying kube-prometheus-stack and loki-stack. We'll discuss loki-stack below. +For Flux metrics collection using Prometheus, refer to the [Flux Prometheus +metrics](/flux/monitoring/metrics/) docs. + +The configuration in the +[`monitoring/controllers/loki-stack`](https://github.com/fluxcd/flux2-monitoring-example/tree/main/monitoring/controllers/loki-stack) +directory creates a HelmRepository for the [Grafana +helm-charts](https://github.com/grafana/helm-charts) and a HelmRelease to +deploy the `loki-stack` chart in the `monitoring` namespace. Please see the +[values](https://github.com/fluxcd/flux2-monitoring-example/blob/main/monitoring/controllers/loki-stack/release.yaml) +used for the chart and modify them accordingly. 
+ +Once deployed, [Loki][loki] and [Promtail][promtail] Pods get created, and Loki +is added as a data source in Grafana. Promtail aggregates the logs from all the +Pods on every node and sends them to Loki. Grafana can be used to query the logs +from Loki and analyze them. Refer to the [LogQL docs][logql] to see examples of +queries and learn more about querying logs. + +### Grafana dashboard + +The [example monitoring setup][monitoring-example-repo] provides a Grafana +dashboard in +[`monitoring/configs/dashboards/logs.json`](https://github.com/fluxcd/flux2-monitoring-example/tree/main/monitoring/configs/dashboards/logs.json) +that queries and shows logs from all the Flux controllers. + +Control plane logs: + +![Control plane logs dashboard](/img/grafana-logs-dashboard.png) + +This can be used to browse logs from all the Flux controllers in a centralized +manner. + + +[monitoring-example-repo]: https://github.com/fluxcd/flux2-monitoring-example +[loki-stack]: https://github.com/grafana/helm-charts/tree/main/charts/loki-stack +[loki]: https://grafana.com/docs/loki/latest/ +[promtail]: https://grafana.com/docs/loki/latest/clients/promtail/ +[logql]: https://grafana.com/docs/loki/latest/logql/ diff --git a/content/en/flux/monitoring/metrics.md b/content/en/flux/monitoring/metrics.md new file mode 100644 index 000000000..5dbe0d729 --- /dev/null +++ b/content/en/flux/monitoring/metrics.md @@ -0,0 +1,222 @@ +--- +title: "Flux Prometheus metrics" +linkTitle: "Metrics" +description: "How to monitor Flux with Prometheus Operator and Grafana" +weight: 2 +--- + +{{% alert color="info" title="Metrics Deprecation" %}} +Some of the Flux controller metrics prior to v2.1.0 have been deprecated. Please +see the [Deprecated Resource Metrics](#-deprecated-resource-metrics) +section below to learn more about it. +{{% /alert %}} + +Flux has native support for [Prometheus][prometheus] metrics to provide insights +into the state of the Flux components.
These can be used to set up monitoring +for the Flux controllers. In addition, Flux Custom Resource metrics can also +be collected leveraging tools like [kube-state-metrics][kube-state-metrics]. +This document provides information about Flux metrics that can be used to set up +monitoring, with some examples. + +The [fluxcd/flux2-monitoring-example][monitoring-example-repo] repository +provides a ready-made example setup to get started with monitoring Flux. It is +recommended to set up the monitoring example before continuing with this +document to follow along. Before getting into the monitoring setup, the +following sections will describe the kinds of metrics that can be collected for +Flux. + +## Controller metrics + +The default installation of Flux controllers exports Prometheus metrics on +port `8080` at the standard `/metrics` path. These metrics are about the inner +workings of the controllers. + +Flux resource reconciliation duration metrics: + +``` +gotk_reconcile_duration_seconds_bucket{kind, name, namespace, le} +gotk_reconcile_duration_seconds_sum{kind, name, namespace} +gotk_reconcile_duration_seconds_count{kind, name, namespace} +``` + +Cache event metrics: + +``` +gotk_cache_events_total{event_type, name, namespace} +``` + +Controller CPU and memory usage: + +``` +process_cpu_seconds_total{namespace, pod} +container_memory_working_set_bytes{namespace, pod} +``` + +Kubernetes API usage: + +``` +rest_client_requests_total{namespace, pod} +``` + +Controller runtime: + +``` +workqueue_longest_running_processor_seconds{name} +controller_runtime_reconcile_total{controller, result} +``` + +In addition, many other Go runtime and [controller-runtime +metrics][controller-runtime-metrics] are also exported. + +## Resource metrics + +Metrics for the Flux custom resources can be used to monitor the deployment of +workloads.
Since the use case for these metrics may vary depending on the +needs, it's hard to decide which fields of the resources would be useful to the +users. Hence, these metrics are not exported by the Flux controllers themselves +but can be collected and exported by using other tools that can read the custom +resource state from the kube-apiserver. One such tool is [kube-state-metrics +(KSM)][kube-state-metrics]. KSM is also deployed as part of +[kube-prometheus-stack][kube-prometheus-stack] and is used to export the metrics +of Kubernetes core resources. It can be configured to also collect custom +resource metrics. The monitoring setup in +[flux2-monitoring-example][monitoring-example-repo] uses KSM to collect and +export Flux custom resource metrics. + +In the [example monitoring setup][monitoring-example-repo], the metric +`gotk_resource_info` provides information about the current state of Flux +resources. + +``` +gotk_resource_info{customresource_group, customresource_kind, customresource_version, exported_namespace, name, ready, suspended, ...} +``` + +- `customresource_group` is the API group of the resource, for example + `source.toolkit.fluxcd.io` for the Flux source API. +- `customresource_kind` is the kind of the resource, for example a + `GitRepository` source. +- `customresource_version` is the API version of the resource, for example `v1`. +- `exported_namespace` is the namespace of the resource. +- `name` is the name of the resource. +- `ready` shows the readiness of the resource. +- `suspended` shows if the resource's reconciliation is suspended. + +These are some of the common labels that are present in metrics for all the +kinds of resources. In addition, there are a few resource kind specific labels. +See the following table for a list of labels associated with each resource +kind.
+ +| Resource Kind | Labels | +| --- | --- | +| Kustomization | `revision`, `source_name` | +| HelmRelease | `revision`, `chart_name`, `chart_source_name` | +| GitRepository | `revision`, `url` | +| Bucket | `revision`, `endpoint`, `bucket_name` | +| HelmRepository | `revision`, `url` | +| HelmChart | `revision`, `chart_name`, `chart_version` | +| OCIRepository | `revision`, `url` | +| Receiver | `webhook_path` | +| ImageRepository | `image` | +| ImagePolicy | `source_name` | +| ImageUpdateAutomation | `source_name` | + +{{< note >}} +The above metric may have extra labels after being collected in Prometheus. This +may be due to the default Prometheus scrape configuration used by +kube-prometheus-stack. Since they are about the kube-state-metrics service and +not about Flux itself, they can be ignored. +{{< /note >}} + +`gotk_resource_info` is an example of a metric used to collect information about +the Flux resources. This metric can be customized to add more labels, or more +such metrics can also be created by changing the kube-state-metrics custom +resource state configuration. Please see [Flux custom Prometheus +metrics][custom-metrics] for details about them. + +### :warning: Deprecated resource metrics + +Prior to Flux v2.1.0, the individual Flux controllers used to export metrics +for the resources that they managed. These metrics have been deprecated in +favor of custom metrics collected using kube-state-metrics. + +Users of the deprecated metrics `gotk_reconcile_condition` and +`gotk_suspend_status` can find the same information in the new +`gotk_resource_info` metric exported using kube-state-metrics. If needed, an +equivalent of `gotk_reconcile_condition` and `gotk_suspend_status` can be +created as a custom metric using the kube-state-metrics custom resource state +configuration. Please see [Flux custom Prometheus +metrics][custom-metrics] for details.
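To illustrate how the `ready` and `suspended` labels of `gotk_resource_info` can stand in for the deprecated metrics, the following is a minimal Python sketch that reads samples in the Prometheus text exposition format and flags resources that are not ready or are suspended. The sample lines, their label values, and the helper names are assumptions made for illustration, not output captured from a real cluster, and the simple regular expression does not handle escaped quotes inside label values.

```python
import re

# Hypothetical samples; label values are assumptions for illustration only.
SAMPLE = '''
gotk_resource_info{customresource_kind="GitRepository",exported_namespace="flux-system",name="flux-system",ready="True",suspended="false"} 1
gotk_resource_info{customresource_kind="Kustomization",exported_namespace="apps",name="redis",ready="False",suspended="false"} 1
gotk_resource_info{customresource_kind="HelmRelease",exported_namespace="apps",name="cache",ready="True",suspended="true"} 1
'''

# Matches label pairs such as name="redis" (no support for escaped quotes).
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_resource_info(text):
    """Return one dict of labels per gotk_resource_info sample line."""
    resources = []
    for line in text.splitlines():
        if line.startswith("gotk_resource_info{"):
            resources.append(dict(LABEL_RE.findall(line)))
    return resources


def needs_attention(resource):
    """Flag a resource that is not ready or whose reconciliation is suspended."""
    return resource.get("ready") != "True" or resource.get("suspended") == "true"


flagged = [
    f'{r["customresource_kind"]}/{r["exported_namespace"]}/{r["name"]}'
    for r in parse_resource_info(SAMPLE)
    if needs_attention(r)
]
print(flagged)
```

In a real setup the same check would more naturally be expressed as a Prometheus query or alert rule over `gotk_resource_info`; the script only shows which labels carry the information.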
+ +## Monitoring setup + +In the [monitoring example repository][monitoring-example-repo], the monitoring configurations can be found in the +[`monitoring/`](https://github.com/fluxcd/flux2-monitoring-example/tree/main/monitoring) +directory. `monitoring/controllers/` directory contains the configurations for +deploying kube-prometheus-stack and loki-stack. We'll discuss +kube-prometheus-stack below. For Flux log collection using Loki, refer to the +[Flux logs](/flux/monitoring/logs/) docs. + +The configuration in the `monitoring/controllers/kube-prometheus-stack/` +directory creates a HelmRepository of type OCI for the [prometheus-community +helm charts](https://github.com/prometheus-community/helm-charts) and a +HelmRelease to deploy the `kube-prometheus-stack` chart in the `monitoring` +namespace. This installs all the monitoring components in the `monitoring` +namespace. Please see the +[values](https://github.com/fluxcd/flux2-monitoring-example/blob/main/monitoring/controllers/kube-prometheus-stack/release.yaml) +used for the chart deployment and modify them accordingly. + +The chart values used for configuring kube-state-metrics are in the file +[`kube-state-metrics-config.yaml`](https://github.com/fluxcd/flux2-monitoring-example/blob/main/monitoring/controllers/kube-prometheus-stack/kube-state-metrics-config.yaml), +as seen in the +[kustomization.yaml](https://github.com/fluxcd/flux2-monitoring-example/blob/main/monitoring/controllers/kube-prometheus-stack/kustomization.yaml), +which uses a kustomize ConfigMap generator to put the configurations in a +ConfigMap and use the chart values from the ConfigMap. +These values are merged with the inline chart values in the HelmRelease. +Kube-state-metrics values are in a separate file to make it easier to customize +the metrics it collects; refer to the [Flux custom Prometheus +metrics][custom-metrics] docs to see how they are used. 
Once +deployed with these values, the kube-state-metrics starts collecting and +exporting the Flux resource metrics. + +To configure Prometheus to scrape Flux controller metrics, a +[PodMonitor](https://github.com/fluxcd/flux2-monitoring-example/blob/main/monitoring/configs/podmonitor.yaml) +is used that selects all the Flux controller Pods and sets the metrics endpoint +to the `http-prom` port. Once created, the prometheus-operator will +automatically configure Prometheus to scrape the Flux controller metrics. + +### Flux Grafana dashboards + +The [example monitoring setup][monitoring-example-repo] provides two example +Grafana dashboards in +[`monitoring/configs/dashboards`](https://github.com/fluxcd/flux2-monitoring-example/tree/main/monitoring/configs/dashboards) +that use the Flux controller and resource metrics. The Flux Cluster Stats +dashboard shows the overall state of the Flux Sources and Cluster Reconcilers. +The Flux Control Plane dashboard shows the statistics of the various components +that constitute the Flux Control Plane and their operational metrics. + +Control plane dashboard: + +![Control Plane Dashboard - Part 1](/img/grafana-cp-dashboard-p1.png) + +![Control Plane Dashboard - Part 2](/img/grafana-cp-dashboard-p2.png) + +![Control Plane Dashboard - Part 3](/img/grafana-cp-dashboard-p3.png) + +![Control Plane Dashboard - Part 4](/img/grafana-cp-dashboard-p4.png) + +Cluster reconciliation dashboard: + +![Cluster reconciliation dashboard - Part 1](/img/grafana-cluster-dashboard-p1.png) + +![Cluster reconciliation dashboard - Part 2](/img/grafana-cluster-dashboard-p2.png) + +More custom metrics can be created and used in the dashboards for monitoring +Flux. 
+ + +[kube-state-metrics]: https://github.com/kubernetes/kube-state-metrics +[prometheus]: https://prometheus.io/ +[monitoring-example-repo]: https://github.com/fluxcd/flux2-monitoring-example +[kube-prometheus-stack]: https://github.com/prometheus-operator/kube-prometheus +[controller-runtime-metrics]: https://book.kubebuilder.io/reference/metrics-reference +[custom-metrics]: /flux/monitoring/custom-metrics/ diff --git a/static/_redirects b/static/_redirects index 1c55b48c7..d3f9fe2b6 100644 --- a/static/_redirects +++ b/static/_redirects @@ -32,6 +32,8 @@ /flux/use-cases/openshift /flux/installation/configuration/openshift 301! /flux/cheatsheets/sharding /flux/installation/configuration/sharding 301! /flux/cheatsheets/bootstrap /flux/installation/configuration 301! +/flux/guides/monitoring /flux/monitoring/metrics 301! +/flux/guides/notifications /flux/monitoring/alerts 301! /docs/contributing/* /contributing/:splat 301! /docs/* /flux/:splat 301! diff --git a/static/img/grafana-cluster-dashboard-p1.png b/static/img/grafana-cluster-dashboard-p1.png new file mode 100644 index 000000000..ece303128 Binary files /dev/null and b/static/img/grafana-cluster-dashboard-p1.png differ diff --git a/static/img/grafana-cluster-dashboard-p2.png b/static/img/grafana-cluster-dashboard-p2.png new file mode 100644 index 000000000..a1ab08ed0 Binary files /dev/null and b/static/img/grafana-cluster-dashboard-p2.png differ diff --git a/static/img/grafana-cp-dashboard-p1.png b/static/img/grafana-cp-dashboard-p1.png new file mode 100644 index 000000000..cf3cfc6aa Binary files /dev/null and b/static/img/grafana-cp-dashboard-p1.png differ diff --git a/static/img/grafana-cp-dashboard-p2.png b/static/img/grafana-cp-dashboard-p2.png new file mode 100644 index 000000000..2602370dc Binary files /dev/null and b/static/img/grafana-cp-dashboard-p2.png differ diff --git a/static/img/grafana-cp-dashboard-p3.png b/static/img/grafana-cp-dashboard-p3.png new file mode 100644 index 000000000..fb125cad3 
Binary files /dev/null and b/static/img/grafana-cp-dashboard-p3.png differ diff --git a/static/img/grafana-cp-dashboard-p4.png b/static/img/grafana-cp-dashboard-p4.png new file mode 100644 index 000000000..df28db9ee Binary files /dev/null and b/static/img/grafana-cp-dashboard-p4.png differ diff --git a/static/img/grafana-logs-dashboard.png b/static/img/grafana-logs-dashboard.png new file mode 100644 index 000000000..bb9509703 Binary files /dev/null and b/static/img/grafana-logs-dashboard.png differ