Cain is a backup and restore tool for Cassandra on Kubernetes. It is named after the DC Comics superhero Cassandra Cain.
Cain supports the following cloud storage services:
- AWS S3
- Minio S3
- Azure Blob Storage
Cain is now an official part of the Helm incubator/cassandra chart!
Download the latest release from the Releases page, or use it with a Docker image.
Building from source requires:
- git
- dep
mkdir -p $GOPATH/src/github.com/nuvo && cd $_
git clone https://github.com/nuvo/cain.git && cd cain
make
Cain performs a backup in the following way:
- Backup the keyspace schema (using cqlsh).
- Get backup data using nodetool snapshot - it creates a snapshot of the keyspace in all Cassandra pods in the given namespace (according to selector).
- Copy the files in parallel to cloud storage using Skbn - it copies the files to the specified dst, under namespace/<cassandrClusterName>/keyspace/<keyspaceSchemaHash>/tag/.
- Clear all snapshots.
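The destination layout used by the copy step can be sketched as a small shell function. This only illustrates the path convention described above and is not Cain's actual code: backup_prefix is a hypothetical helper, and the cluster name, schema hash, and tag values are made up.

```shell
# Sketch only: compose the prefix under which a backup would land,
# following dst/namespace/<cluster name>/keyspace/<schema hash>/tag/.
# backup_prefix is a hypothetical helper, not part of Cain.
backup_prefix() {
  dst=$1 namespace=$2 cluster=$3 keyspace=$4 schema_hash=$5 tag=$6
  # Strip one trailing slash from dst so the result never contains "//"
  # after the bucket path.
  printf '%s/%s/%s/%s/%s/%s/\n' \
    "${dst%/}" "$namespace" "$cluster" "$keyspace" "$schema_hash" "$tag"
}

# Example call with made-up values.
backup_prefix s3://db-backup/cassandra default ring01 keyspace 0123abcd 20180903091624
```

Because the schema hash is part of the path, backups taken before and after a schema change end up under different prefixes.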
$ cain backup --help
backup cassandra cluster to cloud storage
Usage:
cain backup [flags]
Flags:
-a, --authentication use authentication for nodetool and clqsh. Overrides $CAIN_AUTHENTICATION
-b, --buffer-size float in memory buffer size (MB) to use for files copy (buffer per file). Overrides $CAIN_BUFFER_SIZE (default 6.75)
--cassandra-data-dir string cassandra data directory. Overrides $CAIN_CASSANDRA_DATA_DIR (default "/var/lib/cassandra/data")
-u, --cassandra-username string cassandra username. Overrides $CAIN_CASSANDRA_USERNAME (default "cain")
-c, --container string container name to act on. Overrides $CAIN_CONTAINER (default "cassandra")
--dst string destination to backup to. Example: s3://bucket/cassandra. Overrides $CAIN_DST
-h, --help help for backup
-k, --keyspace string keyspace to act on. Overrides $CAIN_KEYSPACE
-n, --namespace string namespace to find cassandra cluster. Overrides $CAIN_NAMESPACE (default "default")
--nodetool-credentials-file string path to nodetool credentials file. Overrides $CAIN_NODETOOL_CREDENTIALS_FILE (default "/home/cassandra/.nodetool/credentials")
-p, --parallel int number of files to copy in parallel. set this flag to 0 for full parallelism. Overrides $CAIN_PARALLEL (default 1)
-m, --s3-max-upload-parts int maximum number of parts to upload in parallel for s3 multipart upload. Overrides $CAIN_S3_MAX_UPLOAD_PARTS (default 10000)
-s, --s3-part-size int size of each part in bytes for s3 multipart upload. Overrides $CAIN_S3_PART_SIZE (default 134217728)
-l, --selector string selector to filter on. Overrides $CAIN_SELECTOR (default "app=cassandra")
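As a rough sanity check on the two S3 defaults above, the arithmetic below converts the part size to MiB and multiplies it by the part count. It assumes (this is not stated in the help text) that a single file is uploaded as at most --s3-max-upload-parts parts of --s3-part-size bytes each, so the product is an upper bound on the size of one uploaded file.

```shell
# Sketch only: back-of-the-envelope arithmetic on the default values.
part_size=134217728   # --s3-part-size default, in bytes
max_parts=10000       # --s3-max-upload-parts default

# Convert bytes to MiB first so the later product stays small.
part_mib=$((part_size / 1024 / 1024))
# Assumed upper bound for one file, in GiB.
total_gib=$((part_mib * max_parts / 1024))

echo "part size: ${part_mib} MiB"
echo "assumed upper bound per file: ${total_gib} GiB"
```

If individual SSTable files could exceed that bound under this assumption, a larger part size would be the value to raise.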
Backup to AWS S3
cain backup \
-n default \
-l release=cassandra \
-k keyspace \
--dst s3://db-backup/cassandra
Backup to AWS S3 with Cassandra authentication enabled
cain backup \
-n default \
-l release=cassandra \
-k keyspace \
--dst s3://db-backup/cassandra \
-a \
-u cain \
--nodetool-credentials-file /home/cassandra/.nodetool/credentials
Backup to Azure Blob Storage
cain backup \
-n default \
-l release=cassandra \
-k keyspace \
--dst abs://my-account/db-backup-container/cassandra
Cain performs a restore in the following way:
- Restore schema if schema is specified.
- Truncate all tables in keyspace.
- Copy files from the specified src (under keyspace/<keyspaceSchemaHash>/tag/) - restore is only possible for the same keyspace schema.
- Load new data using nodetool refresh.
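To make the copy step more concrete, the sketch below shows where a restore would look for files, given the src, keyspace, schema checksum, and tag. It is an illustration of the convention described above, not Cain's implementation: restore_prefix is a hypothetical helper and the checksum value is made up.

```shell
# Sketch only: compose the prefix a restore would read from,
# following src/keyspace/<schema hash>/tag/.
# restore_prefix is a hypothetical helper, not part of Cain.
restore_prefix() {
  src=$1 keyspace=$2 schema_hash=$3 tag=$4
  # Strip one trailing slash from src before appending the suffix.
  printf '%s/%s/%s/%s/\n' "${src%/}" "$keyspace" "$schema_hash" "$tag"
}

# Example call; the schema hash here is invented for illustration.
restore_prefix s3://db-backup/cassandra/default/ring01 keyspace 0123abcd 20180903091624
```

A backup taken under a different schema lives under a different schema hash, which is why a restore only finds data for the same keyspace schema.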
$ cain restore --help
restore cassandra cluster from cloud storage
Usage:
cain restore [flags]
Flags:
-a, --authentication use authentication for nodetool and clqsh. Overrides $CAIN_AUTHENTICATION
-b, --buffer-size float in memory buffer size (MB) to use for files copy (buffer per file). Overrides $CAIN_BUFFER_SIZE (default 6.75)
--cassandra-data-dir string cassandra data directory. Overrides $CAIN_CASSANDRA_DATA_DIR (default "/var/lib/cassandra/data")
-u, --cassandra-username string cassandra username. Overrides $CAIN_CASSANDRA_USERNAME (default "cain")
-c, --container string container name to act on. Overrides $CAIN_CONTAINER (default "cassandra")
-h, --help help for restore
-k, --keyspace string keyspace to act on. Overrides $CAIN_KEYSPACE
-n, --namespace string namespace to find cassandra cluster. Overrides $CAIN_NAMESPACE (default "default")
-f, --nodetool-credentials-file string path to nodetool credentials file. Overrides $CAIN_NODETOOL_CREDENTIALS_FILE (default "/home/cassandra/.nodetool/credentials")
-p, --parallel int number of files to copy in parallel. set this flag to 0 for full parallelism. Overrides $CAIN_PARALLEL (default 1)
-s, --schema string schema version to restore (optional). Overrides $CAIN_SCHEMA
-l, --selector string selector to filter on. Overrides $CAIN_SELECTOR (default "app=cassandra")
--src string source to restore from. Example: s3://bucket/cassandra/namespace/cluster-name. Overrides $CAIN_SRC
-t, --tag string tag to restore. Overrides $CAIN_TAG
--user-group string user and group who should own restored files. Overrides $CAIN_USER_GROUP (default "cassandra:cassandra")
Restore from S3
cain restore \
--src s3://db-backup/cassandra/default/ring01 \
-n default \
-k keyspace \
-l release=cassandra \
-t 20180903091624
Restore from Azure Blob Storage
cain restore \
--src abs://my-account/db-backup-container/cassandra/default/ring01 \
-n default \
-k keyspace \
-l release=cassandra \
-t 20180903091624
Cain describes the keyspace schema using cqlsh. It can return the schema itself, or a checksum of the schema file (used by backup and restore).
$ cain schema --help
get schema of cassandra cluster
Usage:
cain schema [flags]
Flags:
-c, --container string container name to act on. Overrides $CAIN_CONTAINER (default "cassandra")
-k, --keyspace string keyspace to act on. Overrides $CAIN_KEYSPACE
-n, --namespace string namespace to find cassandra cluster. Overrides $CAIN_NAMESPACE (default "default")
-l, --selector string selector to filter on. Overrides $CAIN_SELECTOR (default "app=cassandra")
--sum print only checksum. Overrides $CAIN_SUM
cain schema \
-n default \
-l release=cassandra \
-k keyspace
cain schema \
-n default \
-l release=cassandra \
-k keyspace \
--sum
Cain commands support the usage of environment variables instead of flags. For example, the backup command can be executed with flags:
cain backup \
-n default \
-l release=cassandra \
-k keyspace \
--dst s3://db-backup/cassandra
You can also set the appropriate environment variables (CAIN_FLAG, with _ instead of -):
export CAIN_NAMESPACE=default
export CAIN_SELECTOR=release=cassandra
export CAIN_KEYSPACE=keyspace
export CAIN_DST=s3://db-backup/cassandra
cain backup
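The naming rule can be sketched as a small shell function that maps a long flag to its environment variable. This is an illustration of the convention only: flag_to_env is a hypothetical helper and is not shipped with Cain.

```shell
# Sketch only: derive the environment variable name for a long flag,
# following the CAIN_FLAG convention (upper case, "_" instead of "-").
# flag_to_env is a hypothetical helper, not part of Cain.
flag_to_env() {
  name=${1#--}   # drop the leading "--"
  # Upper-case the letters and turn every "-" into "_".
  printf 'CAIN_%s\n' "$(printf '%s' "$name" | tr 'a-z-' 'A-Z_')"
}

flag_to_env --namespace                  # prints CAIN_NAMESPACE
flag_to_env --nodetool-credentials-file  # prints CAIN_NODETOOL_CREDENTIALS_FILE
```

The results match the variable names listed in the help output of each command.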
Since Cain uses Skbn, adding support for additional storage services is simple. Read this post for more information.
Cain version | Skbn version |
---|---|
0.5.3 | 0.5.1 |
0.5.2 | 0.4.2 |
0.5.1 | 0.4.2 |
0.5.0 | 0.4.1 |
0.4.2 | 0.4.1 |
0.4.1 | 0.4.1 |
0.4.0 | 0.4.0 |
0.3.0 | 0.3.0 |
0.2.0 | 0.2.0 |
0.1.0 | 0.1.1 |
Cain tries to get credentials in the following order:
- If the KUBECONFIG environment variable is set - Skbn will use the current context from that config file
- If ~/.kube/config exists - Skbn will use the current context from that config file with an out-of-cluster client configuration
- If ~/.kube/config does not exist - Skbn will assume it is working from inside a pod and will use an in-cluster client configuration
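The lookup order above can be mirrored in a few lines of shell, which may help when checking which configuration a given environment would select. This is a sketch of the described order only, not the code Cain or Skbn runs: kube_client_mode and the labels it prints are hypothetical.

```shell
# Sketch only: report which client configuration the lookup order
# described above would choose in the current environment.
# kube_client_mode and its output labels are hypothetical.
kube_client_mode() {
  if [ -n "${KUBECONFIG:-}" ]; then
    # KUBECONFIG is set: the config file it points to wins.
    echo "kubeconfig-env"
  elif [ -f "$HOME/.kube/config" ]; then
    # Default kubeconfig exists: out-of-cluster client configuration.
    echo "out-of-cluster"
  else
    # No kubeconfig found: assume the process runs inside a pod.
    echo "in-cluster"
  fi
}

kube_client_mode
```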
Skbn uses the default AWS credentials chain.
Skbn uses the AZURE_STORAGE_ACCOUNT and AZURE_STORAGE_ACCESS_KEY environment variables for authentication.
When authentication is enabled, Cain looks for the default cqlsh credentials in /home/cassandra/.cassandra/credentials. If you use authentication, make sure the cassandra container has this file and that the username and password in it are correct.
For nodetool authentication, the default credentials file is /home/cassandra/.nodetool/credentials. This can be overridden by setting the --nodetool-credentials-file flag. When this flag is used, the username for the nodetool authentication must be provided as well.