feat: use defrag community script (#73)
bsctl committed Aug 21, 2024
1 parent 9766d62 commit ab2fc55
Showing 6 changed files with 136 additions and 75 deletions.
6 changes: 4 additions & 2 deletions README.md
@@ -11,9 +11,11 @@ A multi-tenant deployment for `etcd` is not common practice. However, `etcd` pro
## Documentation
Refer to the [etcd documentation](https://etcd.io/docs/v3.5/op-guide). The following sections provide additional procedures for the specific setup used in the [Kamaji](https://github.com/clastix/kamaji) project.

- [Backup and restore from snapshot](docs/snapshot-recovery.md)
- [Disaster Recovery with Velero](docs/velero.md)
- [Taking Snapshots](docs/snapshot.md)
- [Recover from Snapshot](docs/snapshot-recovery.md)
- [Velero](docs/velero.md)
- [Rotate Certificates](docs/rotate-certificates.md)
- [Defragmenting Data](docs/defragmentation.md)
- [Performance and Optimization](docs/performance-and-optimization.md)

## Roadmap
2 changes: 0 additions & 2 deletions charts/kamaji-etcd/README.md
@@ -67,8 +67,6 @@ Here the values you can override:
| clusterDomain | string | `"cluster.local"` | Domain of the Kubernetes cluster. |
| datastore.enabled | bool | `false` | Create a datastore custom resource for Kamaji |
| datastore.name | string | `""` | Name of Kamaji datastore, set to fully qualified etcd name when null or not provided |
| defragmentation | object | `{"schedule":"0 0 * * *"}` | Enable storage defragmentation |
| defragmentation.schedule | string | `"0 0 * * *"` | The job scheduled maintenance time for defrag (empty to disable) |
| extraArgs | list | `[]` | A list of extra arguments to add to the etcd default ones |
| fullnameOverride | string | `""` | |
| image.pullPolicy | string | `"IfNotPresent"` | Pull policy to use |
66 changes: 0 additions & 66 deletions charts/kamaji-etcd/templates/etcd_cronjob_defrag.yaml

This file was deleted.

5 changes: 0 additions & 5 deletions charts/kamaji-etcd/values.yaml
@@ -76,11 +76,6 @@ persistentVolumeClaim:
  customAnnotations: {}
  # volumeType: local

# -- Enable storage defragmentation
defragmentation:
  # -- The job scheduled maintenance time for defrag (empty to disable)
  schedule: "0 0 * * *" # Default cron schedule (daily at midnight), see https://crontab.guru/

# -- Labels to add to all etcd pods
podLabels:
  application: kamaji-etcd
49 changes: 49 additions & 0 deletions docs/defragmentation.md
@@ -0,0 +1,49 @@
# Defragmenting Data
For dense Kubernetes clusters, `etcd` can suffer from poor performance if the keyspace grows too large and exceeds the space quota. Periodically maintain and defragment `etcd` to free up space in the data store. See the [etcd maintenance guide](https://etcd.io/docs/v3.5/op-guide/maintenance/) for details.

Monitor the `etcd` metrics in Prometheus and defragment when required. Otherwise, `etcd` can raise a cluster-wide alarm that puts the cluster into a maintenance mode that accepts only key reads and deletes.

To keep track of defragmentation requirements, monitor these key metrics:

- `etcd_server_quota_backend_bytes`: the current quota limit
- `etcd_mvcc_db_total_size_in_use_in_bytes`: the actual database usage after a history compaction
- `etcd_mvcc_db_total_size_in_bytes`: the database size, including free space waiting for defragmentation

You can also determine whether defragmentation is needed by estimating how much space (in MiB) it would free, using the PromQL expression:

- `(etcd_mvcc_db_total_size_in_bytes - etcd_mvcc_db_total_size_in_use_in_bytes)/1024/1024`
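
The same arithmetic can be reproduced outside Prometheus, for example on values copied from a dashboard. The following is a minimal sketch; the function name `reclaimable_mib` and the sample byte counts are illustrative assumptions, not part of this repository.

```bash
#!/bin/bash
# Hypothetical helper: reclaimable space in MiB, mirroring the PromQL expression.
#   $1 = value of etcd_mvcc_db_total_size_in_bytes
#   $2 = value of etcd_mvcc_db_total_size_in_use_in_bytes
reclaimable_mib() {
    local total_bytes=$1
    local in_use_bytes=$2
    # Integer division: the result is rounded down to whole MiB.
    echo $(( (total_bytes - in_use_bytes) / 1024 / 1024 ))
}

# Sample values (assumed): a 500 MiB database with 200 MiB in use.
reclaimable_mib 524288000 209715200   # prints 300
```

Bash arithmetic is integer-only, so fractions of a MiB are dropped, which is usually acceptable for a threshold check.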

Defragmentation is an expensive operation, so it should be executed as infrequently as possible. On the other hand, you must also make sure that no `etcd` member exceeds its storage quota. The Kubernetes project recommends using a tool such as [etcd-defrag](https://github.com/ahrtr/etcd-defrag) to perform defragmentation.

The `defrag.sh` script creates and schedules jobs that periodically defragment data on a `kamaji-etcd` instance. The script generates a Kubernetes CronJob manifest and applies it to the specified namespace. Make sure you set the defragmentation criteria according to your environment's needs.
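
To reason about the criteria before changing them, it can help to evaluate the rule by hand. The sketch below mirrors the rule shipped in `defrag.sh` (`dbQuotaUsage > 0.8 || dbSize - dbSizeInUse > 200*1024*1024`). It assumes that `dbQuotaUsage` means `dbSize / dbQuota`; the function name `should_defrag` and the sample numbers are hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: succeeds (exit 0) when the default rule would match.
#   $1 = dbSize (bytes), $2 = dbSizeInUse (bytes), $3 = dbQuota (bytes)
should_defrag() {
    local db_size=$1
    local db_size_in_use=$2
    local db_quota=$3
    # dbQuotaUsage > 0.8, rewritten for integer arithmetic (assumes dbQuotaUsage = dbSize / dbQuota).
    if (( db_size * 10 > db_quota * 8 )); then
        return 0
    fi
    # dbSize - dbSizeInUse > 200 MiB
    if (( db_size - db_size_in_use > 200 * 1024 * 1024 )); then
        return 0
    fi
    return 1
}

# Sample values (assumed): 1 GB database, 700 MB in use, 2 GiB quota.
if should_defrag 1000000000 700000000 2147483648; then
    echo "defrag needed"
else
    echo "no defrag needed"
fi
```

In this sample the quota usage is below 80%, but roughly 286 MiB is reclaimable, so the second condition matches.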


## Usage
To run the script, use the following command:

```bash
./defrag.sh [-e etcd_name] [-s etcd_service] [-n etcd_namespace] [-j schedule]
```

## Parameters

- `-e etcd_name`: Name of the etcd StatefulSet (default: `kamaji-etcd`)
- `-s etcd_service`: Name of the etcd service (default: `kamaji-etcd`)
- `-n etcd_namespace`: Namespace of the etcd StatefulSet (default: `kamaji-system`)
- `-j schedule`: Cron schedule for the defrag job (default: `"0 0 * * *"`, which means daily at midnight)
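
The first three parameters are combined into the `--endpoints` argument of the generated CronJob. The script hardcodes members `0` to `2` and the `cluster.local` domain. The sketch below shows the same naming scheme for an arbitrary member count; the `build_endpoints` function and its `replicas` and `cluster_domain` arguments are hypothetical and not options of `defrag.sh`.

```bash
#!/bin/bash
# Hypothetical helper: builds the comma-separated endpoint list used by the defrag job.
#   $1 = etcd StatefulSet name, $2 = service name, $3 = namespace
#   $4 = number of members (assumed default: 3)
#   $5 = cluster domain (assumed default: cluster.local)
build_endpoints() {
    local etcd_name=$1
    local etcd_service=$2
    local etcd_namespace=$3
    local replicas=${4:-3}
    local cluster_domain=${5:-cluster.local}
    local endpoints=""
    local i
    for (( i = 0; i < replicas; i++ )); do
        # Prepend a comma to every entry except the first one.
        endpoints+="${endpoints:+,}https://${etcd_name}-${i}.${etcd_service}.${etcd_namespace}.svc.${cluster_domain}:2379"
    done
    echo "$endpoints"
}

build_endpoints kamaji-etcd kamaji-etcd kamaji-system
```

If your cluster uses a different domain or member count, the endpoints in the generated manifest need to be adjusted accordingly.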

## Example

To run the script with custom parameters:

```bash
./defrag.sh -e kamaji-etcd -s kamaji-etcd -n kamaji-system -j "14 9 * * 1-5"
```
This creates a Kubernetes CronJob manifest with the specified parameters and applies it to the cluster. The schedule `14 9 * * 1-5` runs the job at 09:14, Monday through Friday.
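
After the script has run, you may want to confirm that the CronJob exists and trigger a one-off run. The sketch below only prints the `kubectl` commands so you can review them first; the helper name `print_verify_commands` and the `-manual` job suffix are illustrative assumptions. The CronJob name follows the `<etcd_name>-defrag-job` pattern used in the generated manifest.

```bash
#!/bin/bash
# Hypothetical helper: prints commands to inspect the CronJob and start a manual run.
#   $1 = etcd StatefulSet name, $2 = namespace
print_verify_commands() {
    local etcd_name=${1:-kamaji-etcd}
    local etcd_namespace=${2:-kamaji-system}
    local cronjob="${etcd_name}-defrag-job"   # name used in the generated manifest
    echo "kubectl get cronjob ${cronjob} -n ${etcd_namespace}"
    echo "kubectl create job ${cronjob}-manual --from=cronjob/${cronjob} -n ${etcd_namespace}"
}

print_verify_commands kamaji-etcd kamaji-system
```

Printing instead of executing keeps the example safe to run on a machine without cluster access.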

## Debug mode
To run the script in debug mode, set the environment variable `DEBUG` to `1`:

```bash
export DEBUG=1
```
83 changes: 83 additions & 0 deletions scripts/defrag.sh
@@ -0,0 +1,83 @@
#!/bin/bash

# Trace commands when DEBUG=1; exit on errors and unset variables, and fail if any command in a pipeline fails
if [ "${DEBUG:-}" = 1 ]; then
    set -x
fi
set -eu -o pipefail

# Default values for the parameters
ETCD_NAME="kamaji-etcd"
ETCD_SERVICE="kamaji-etcd"
ETCD_NAMESPACE="kamaji-system"
SCHEDULE="0 0 * * *" # every day at midnight

# Parse script parameters
while getopts "e:s:n:j:" opt; do
    case ${opt} in
        e ) ETCD_NAME=$OPTARG ;;
        s ) ETCD_SERVICE=$OPTARG ;;
        n ) ETCD_NAMESPACE=$OPTARG ;;
        j ) SCHEDULE=$OPTARG ;;
        \? ) echo "Usage: ./defrag.sh [-e etcd_name] [-s etcd_service] [-n etcd_namespace] [-j schedule]" >&2
             exit 1 ;;
    esac
done

# Function to create the CronJob manifest that defragments etcd
create_defrag_cronjob() {
    local etcd_name=$1
    local etcd_service=$2
    local etcd_namespace=$3
    local schedule=$4 # Cron schedule for the job

    # The endpoints below assume a three-member StatefulSet and the cluster.local domain
    cat <<EOF > "${etcd_name}-defrag-job.yaml"
apiVersion: batch/v1
kind: CronJob
metadata:
  name: ${etcd_name}-defrag-job
  namespace: $etcd_namespace
spec:
  schedule: "$schedule" # Use the provided schedule
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: etcd-defrag
            image: ghcr.io/ahrtr/etcd-defrag:v0.15.0 # Please replace the version with the latest version.
            args:
            - --endpoints=https://${etcd_name}-0.${etcd_service}.${etcd_namespace}.svc.cluster.local:2379,https://${etcd_name}-1.${etcd_service}.${etcd_namespace}.svc.cluster.local:2379,https://${etcd_name}-2.${etcd_service}.${etcd_namespace}.svc.cluster.local:2379
            - --cacert=/opt/certs/ca/ca.crt
            - --cert=/opt/certs/root-client-certs/tls.crt
            - --key=/opt/certs/root-client-certs/tls.key
            - --cluster
            - --defrag-rule
            - "dbQuotaUsage > 0.8 || dbSize - dbSizeInUse > 200*1024*1024"
            volumeMounts:
            - mountPath: /opt/certs/root-client-certs
              name: root-client-certs
            - mountPath: /opt/certs/ca
              name: certs
          restartPolicy: OnFailure
          securityContext:
            runAsUser: 0
          volumes:
          - name: root-client-certs
            secret:
              secretName: ${etcd_name}-root-client-certs
          - name: certs
            secret:
              secretName: ${etcd_name}-certs
EOF
}

# Main script to defrag etcd
main() {
    # Create and apply defrag CronJob
    create_defrag_cronjob "$ETCD_NAME" "$ETCD_SERVICE" "$ETCD_NAMESPACE" "$SCHEDULE"
    kubectl apply -f "${ETCD_NAME}-defrag-job.yaml"
}

# Execute the main script
main
