Commit

Add runbooks for Argo and Vector
Leonhardt Wille committed Aug 28, 2023
1 parent 7cd37eb commit 7429f4f
Showing 3 changed files with 47 additions and 0 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -1,3 +1,4 @@
public/
resources/
.hugo_build.lock
.vscode
23 changes: 23 additions & 0 deletions content/runbooks/argo/ArgoAppNotSynced.md
@@ -0,0 +1,23 @@
---
title: Argo App Not Synced
weight: 20
---

# ArgoAppNotSynced

## Meaning

At least one Application in Argo CD is out of sync: the live state running in Kubernetes differs from the latest (target) state in Git.

## Impact

A change that was supposed to be deployed may not have been. The impact on the system depends on the change itself.

## Diagnosis

Check the application's diff in Argo CD, either with the App Diff view in the web UI or with `argocd app diff <app>` on the command line. The diff shows how the state in Git differs from the state in Kubernetes.
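
If it is not obvious which Application triggered the alert, its sync status can also be read directly from the Application resources. The following is a minimal Python sketch, assuming `kubectl` is configured for the cluster running Argo CD and Python 3.9 or newer; the function names are hypothetical and not part of any Argo CD tooling. It relies on the `status.sync.status` field of the `Application` custom resource, which Argo CD sets to `Synced`, `OutOfSync` or `Unknown`.

```python
import json
import subprocess


def out_of_sync_apps(applications: dict) -> list[str]:
    """Return "namespace/name" for every Application whose sync status is not Synced."""
    result = []
    for app in applications.get("items", []):
        meta = app.get("metadata", {})
        # A missing status is treated as Unknown, which also needs a look.
        status = app.get("status", {}).get("sync", {}).get("status", "Unknown")
        if status != "Synced":
            result.append(f"{meta.get('namespace', '')}/{meta.get('name', '')}")
    return result


def fetch_applications() -> dict:
    """Read all Application resources as JSON (assumes kubectl access to the cluster)."""
    out = subprocess.run(
        ["kubectl", "get", "applications.argoproj.io", "--all-namespaces", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return json.loads(out)
```

Calling `out_of_sync_apps(fetch_applications())` returns the affected Applications, whose diffs can then be opened as described above. Parsing is kept separate from the `kubectl` call so it can be checked against saved JSON.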

## Mitigation

To address this alert, either sync the application in Argo CD or revert the change in Git.
Review the diff before syncing or reverting, and consider talking to the person who made the change.
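
The review-then-sync order can be sketched as a small Python wrapper around the Argo CD CLI. It assumes the `argocd` CLI is installed and logged in, and relies on the documented exit codes of `argocd app diff` (0 when no diff is found, 1 when a diff is found, 2 on general errors). The helper names and the example application name `guestbook` are hypothetical. Reverting the change in Git is the manual alternative and is not covered by the sketch.

```python
import subprocess
from typing import Callable

# Documented exit codes of `argocd app diff`.
NO_DIFF = 0
DIFF_FOUND = 1


def has_diff(app: str, run: Callable = subprocess.run) -> bool:
    """Run `argocd app diff` (output goes to the terminal) and report whether a diff exists."""
    proc = run(["argocd", "app", "diff", app])
    if proc.returncode == NO_DIFF:
        return False
    if proc.returncode == DIFF_FOUND:
        return True
    # Exit code 2 signals a general error, so the sync state is unknown.
    raise RuntimeError(f"argocd app diff {app} failed with exit code {proc.returncode}")


def sync_after_review(app: str, confirm: Callable = input, run: Callable = subprocess.run) -> bool:
    """Sync `app` only after its diff was shown and a human confirmed it."""
    if not has_diff(app, run):
        print(f"{app}: no diff, nothing to sync")
        return False
    answer = confirm(f"Sync {app} to the state in Git? [y/N] ")
    if answer.strip().lower() != "y":
        return False
    run(["argocd", "app", "sync", app], check=True)
    return True
```

For example, `sync_after_review("guestbook")` prints the diff, asks for confirmation, and only then runs `argocd app sync guestbook`. The `run` and `confirm` parameters exist only so the sketch can be exercised without a real cluster.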
23 changes: 23 additions & 0 deletions content/runbooks/vector/VectorDiscardEvents.md
@@ -0,0 +1,23 @@
---
title: Vector Discarded Events
weight: 20
---

# VectorDiscardEvents

## Meaning

Vector discarded events because it was unable to keep up with the rate of incoming events.

## Impact

Some events were not processed by Vector. The corresponding logs were never sent to Loki, so they are not available in Grafana.

## Diagnosis

Check the Vector logs for errors. If there are none, check the `vector_events_discarded_total` metric. If this counter is increasing, Vector is unable to keep up with the rate of incoming events.
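
Whether the counter is increasing is best judged over a time window rather than from a single reading. The following Python sketch computes a counter's total increase from a series of samples, treating any drop in value as a counter reset (for example, after a Vector pod restarts). It is a simplified version of what PromQL's `increase()` computes, without its extrapolation to the window boundaries. The function names are hypothetical, and the samples are assumed to be successive readings of `vector_events_discarded_total` for a single Vector instance.

```python
def counter_increase(samples: list[float]) -> float:
    """Total increase of a monotonic counter across consecutive samples.

    A drop in value is treated as a counter reset, in which case the new
    value itself counts as the increase since the reset.
    """
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        total += cur - prev if cur >= prev else cur
    return total


def is_discarding(samples: list[float]) -> bool:
    """True if the discarded-events counter grew during the sampled window."""
    return counter_increase(samples) > 0
```

For example, the readings `[0, 5, 12, 3, 10]` give an increase of 22 (+5, +7, a reset counted as +3, then +7). In Prometheus, a comparable check is `increase(vector_events_discarded_total[5m]) > 0`.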

## Mitigation

To address this alert, either give Vector more resources or reduce the rate of incoming events.
Because we deploy Vector as a DaemonSet (one pod per node), it cannot be scaled out by adding replicas; raising its CPU and memory requests and limits scales up Vector on all nodes.
