Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use defrag community script #73

Merged
merged 1 commit into from
Aug 21, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 4 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,9 +11,11 @@ A multi-tenant deployment for `etcd` is not common practice. However, `etcd` pro
## Documentation
Refer to the [etcd documentation](https://etcd.io/docs/v3.5/op-guide). Following sections provide additional procedures to help with a specific setup as it is used into project [Kamaji](https://github.com/clastix/kamaji).

- [Backup and restore from snapshot](docs/snapshot-recovery.md)
- [Disaster Recovery with Velero](docs/velero.md)
- [Taking Snapshots](docs/snapshot.md)
- [Recover from Snapshot](docs/snapshot-recovery.md)
- [Velero](docs/velero.md)
- [Rotate Certificates](docs/rotate-certificates.md)
- [Defragmenting Data](docs/defragmentation.md)
- [Performance and Optimization](docs/performance-and-optimization.md)

## Roadmap
Expand Down
2 changes: 0 additions & 2 deletions charts/kamaji-etcd/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -67,8 +67,6 @@ Here the values you can override:
| clusterDomain | string | `"cluster.local"` | Domain of the Kubernetes cluster. |
| datastore.enabled | bool | `false` | Create a datastore custom resource for Kamaji |
| datastore.name | string | `""` | Name of Kamaji datastore, set to fully qualified etcd name when null or not provided |
| defragmentation | object | `{"schedule":"0 0 * * *"}` | Enable storage defragmentation |
| defragmentation.schedule | string | `"0 0 * * *"` | The job scheduled maintenance time for defrag (empty to disable) |
| extraArgs | list | `[]` | A list of extra arguments to add to the etcd default ones |
| fullnameOverride | string | `""` | |
| image.pullPolicy | string | `"IfNotPresent"` | Pull policy to use |
Expand Down
66 changes: 0 additions & 66 deletions charts/kamaji-etcd/templates/etcd_cronjob_defrag.yaml

This file was deleted.

5 changes: 0 additions & 5 deletions charts/kamaji-etcd/values.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -76,11 +76,6 @@ persistentVolumeClaim:
customAnnotations: {}
# volumeType: local

# -- Enable storage defragmentation
defragmentation:
# -- The job scheduled maintenance time for defrag (empty to disable)
schedule: "0 0 * * *" # Default cron schedule (daily at midnight), see https://crontab.guru/

# -- Labels to add to all etcd pods
podLabels:
application: kamaji-etcd
Expand Down
49 changes: 49 additions & 0 deletions docs/defragmentation.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,49 @@
# Defragmenting Data
For dense Kubernetes clusters, `etcd` can suffer from poor performance if the keyspace grows too large and exceeds the space quota. Periodically maintain and defragment `etcd` to free up space in the data store. See details [here](https://etcd.io/docs/v3.5/op-guide/maintenance/).

Monitor Prometheus for `etcd` metrics and defragment it when required, otherwise, `etcd` can raise a cluster-wide alarm that puts the cluster into a maintenance mode accepting only key reads and deletes.

To keep track of defragmentation requirements, monitor these key metrics:

- `etcd_server_quota_backend_bytes`: which is the current quota limit
- `etcd_mvcc_db_total_size_in_use_in_bytes`: which indicates the actual database usage after a history compaction
- `etcd_mvcc_db_total_size_in_bytes`, which shows the database size, including free space waiting for defragmentation

You can also determine whether defragmentation is needed by checking the `etcd` database size in MB that will be freed by defragmentation with the PromQL expression:

- `(etcd_mvcc_db_total_size_in_bytes - etcd_mvcc_db_total_size_in_use_in_bytes)/1024/1024`

Defragmentation is an expensive operation, so it should be executed as infrequently as possible. On the other hand, it's also necessary to make sure any `etcd` member will not exceed the storage quota. The Kubernetes project recommends that when you perform defragmentation, you use a tool such as [etcd-defrag](https://github.com/ahrtr/etcd-defrag).

The `defrag.sh` script is designed to create and schedule jobs for periodically defragment data on a `kamaji-etcd` instance. The script generates Kubernetes CronJob manifests and applies them to the specified namespace. Make sure you set the defragmentation criteria according to your environment needs.


## Usage
To run the script, use the following command:

```bash
./defrag.sh [-e etcd_name] [-s etcd_service] [-n etcd_namespace] [-j schedule]
```

## Parameters

- `-e etcd_name`: Name of the etcd StatefulSet (default: `kamaji-etcd`)
- `-s etcd_service`: Name of the etcd service (default: `kamaji-etcd`)
- `-n etcd_namespace`: Namespace of the etcd StatefulSet (default: `kamaji-system`)
- `-j schedule`: Cron schedule for the defrag job (default: `"0 0 * * *"`, which means daily at midnight)

## Example

To run the script with custom parameters:

```bash
./defrag.sh -e kamaji-etcd -s kamaji-etcd -n kamaji-system -j "14 9 * * 1-5"
```
This will create a Kubernetes CronJob manifest with the specified parameters and apply it to the cluster.

## Debug mode
To run the script in debug mode set the environment variable `DEBUG`:

``` bash
export DEBUG=1
```
83 changes: 83 additions & 0 deletions scripts/defrag.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,83 @@
#!/bin/bash

# Enable debugging, exit on errors, and ensure the script fails if any command in a pipeline fails
if [ "${DEBUG}" = 1 ]; then
set -x
fi
set -eu -o pipefail

# Default values for the parameters
ETCD_NAME="kamaji-etcd"
ETCD_SERVICE="kamaji-etcd"
ETCD_NAMESPACE="kamaji-system"
SCHEDULE="0 0 * * *" # every day at midnight

# Parse script parameters
while getopts "e:s:n:j:" opt; do
case ${opt} in
e ) ETCD_NAME=$OPTARG ;;
s ) ETCD_SERVICE=$OPTARG ;;
n ) ETCD_NAMESPACE=$OPTARG ;;
j ) SCHEDULE=$OPTARG ;;
\? ) echo "Usage: ./defrag.sh [-e etcd_name] [-s etcd_service] [-n etcd_namespace] [-j schedule]"
exit 1 ;;
esac
done

# Function to create the CronJob manifest for defrag etcd
create_defrag_cronjob() {
local etcd_name=$1
local etcd_service=$2
local etcd_namespace=$3
local schedule=$4 # Add a parameter for the cron schedule

cat <<EOF > ${etcd_name}-defrag-job.yaml
apiVersion: batch/v1
kind: CronJob
metadata:
name: ${etcd_name}-defrag-job
namespace: $etcd_namespace
spec:
schedule: "$schedule" # Use the provided schedule
jobTemplate:
spec:
template:
spec:
containers:
- name: etcd-defrag
image: ghcr.io/ahrtr/etcd-defrag:v0.15.0 # Please replace the version with the latest version.
args:
- --endpoints=https://${etcd_name}-0.${etcd_service}.${etcd_namespace}.svc.cluster.local:2379,https://${etcd_name}-1.${etcd_service}.${etcd_namespace}.svc.cluster.local:2379,https://${etcd_name}-2.${etcd_service}.${etcd_namespace}.svc.cluster.local:2379
- --cacert=/opt/certs/ca/ca.crt
- --cert=/opt/certs/root-client-certs/tls.crt
- --key=/opt/certs/root-client-certs/tls.key
- --cluster
- --defrag-rule
- "dbQuotaUsage > 0.8 || dbSize - dbSizeInUse > 200*1024*1024"
volumeMounts:
- mountPath: /opt/certs/root-client-certs
name: root-client-certs
- mountPath: /opt/certs/ca
name: certs
restartPolicy: OnFailure
securityContext:
runAsUser: 0
volumes:
- name: root-client-certs
secret:
secretName: ${etcd_name}-root-client-certs
- name: certs
secret:
secretName: ${etcd_name}-certs
EOF
}

# Main script to defrag etcd
main() {
# Create and apply defrag CronJob
create_defrag_cronjob "$ETCD_NAME" "$ETCD_SERVICE" "$ETCD_NAMESPACE" "$SCHEDULE"
kubectl apply -f $ETCD_NAME-defrag-job.yaml
}

# Execute the main script
main
Loading