This guide explains how to configure backup and restore for KUDO Cassandra.
- AWS S3 bucket for storing backups
- AWS CLI for local checks (not required for the actual operator)
- A cqlsh binary available in your PATH
Verify that the AWS CLI can access your S3 buckets:
aws s3 ls
This should return a list of the buckets accessible with your current AWS credentials.
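If you want to double-check the cqlsh prerequisite as well, printing its version is a quick way to do so:
cqlsh --version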
The examples below assume that the instance and namespace names are stored in the following shell variables. With these variables set, you should be able to copy and paste the commands directly, which also prevents typos in reused values.
INSTANCE_NAME=cassandra
NAMESPACE=backup-test
SECRET_NAME=aws-credentials
BACKUP_BUCKET_NAME=<my-aws-s3-bucket>
BACKUP_PREFIX=cluster1
BACKUP_NAME=Backup1
Create a file containing a Kubernetes Secret with your AWS credentials:
cat <<EOF > aws-credentials.yaml
kind: Secret
apiVersion: v1
metadata:
  name: ${SECRET_NAME}
  namespace: ${NAMESPACE}
stringData:
  access-key: <YOUR AWS ACCESS KEY>
  secret-key: <YOUR AWS SECRET KEY>
EOF
Replace the values for access-key and secret-key with the actual values for your AWS account.
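As an alternative to writing the YAML file by hand, an equivalent Secret could be created directly with kubectl (a sketch for the non-token case; fill in your own values):
kubectl create secret generic $SECRET_NAME --namespace $NAMESPACE \
  --from-literal=access-key=<YOUR AWS ACCESS KEY> \
  --from-literal=secret-key=<YOUR AWS SECRET KEY>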
If you are using temporary AWS credentials with a security token, the file should look like this:
cat <<EOF > aws-credentials.yaml
kind: Secret
apiVersion: v1
metadata:
  name: ${SECRET_NAME}
  namespace: ${NAMESPACE}
stringData:
  access-key: <YOUR AWS ACCESS KEY>
  secret-key: <YOUR AWS SECRET ACCESS KEY>
  security-token: <YOUR AWS SECURITY TOKEN>
EOF
You can find these values by looking at ~/.aws/credentials:
cat ~/.aws/credentials
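For reference, a typical AWS credentials file has roughly this shape; the aws_session_token entry only appears for temporary credentials (placeholders shown instead of real values):
[default]
aws_access_key_id = <YOUR AWS ACCESS KEY>
aws_secret_access_key = <YOUR AWS SECRET ACCESS KEY>
aws_session_token = <YOUR AWS SECURITY TOKEN>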
Apply the secret to your Kubernetes cluster:
kubectl apply --namespace $NAMESPACE -f aws-credentials.yaml
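As an optional check, confirm that the Secret now exists in the namespace:
kubectl get secret $SECRET_NAME --namespace $NAMESPACE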
To allow the backup plan to run, the cluster must be installed with a specific configuration:
- BACKUP_RESTORE_ENABLED: Enables the backup functionality in general.
- BACKUP_AWS_CREDENTIALS_SECRET: Allows the instances to access the AWS credentials without storing them in the operator itself.
- BACKUP_AWS_S3_BUCKET_NAME: Defines the AWS S3 bucket where the backups will be stored.
- BACKUP_PREFIX: Prepends a prefix to the backups in the S3 bucket, which allows multiple KUDO Cassandra clusters to share the same bucket.
- EXTERNAL_NATIVE_TRANSPORT: Setting this to true is not required for the backup, but allows us to access the cluster with cqlsh from the local machine.
Install the operator with these parameters:
kubectl kudo install cassandra \
--instance $INSTANCE_NAME \
--namespace $NAMESPACE \
-p EXTERNAL_NATIVE_TRANSPORT=true \
-p BACKUP_RESTORE_ENABLED=true \
-p BACKUP_AWS_CREDENTIALS_SECRET=$SECRET_NAME \
-p BACKUP_AWS_S3_BUCKET_NAME=$BACKUP_BUCKET_NAME \
-p BACKUP_PREFIX=$BACKUP_PREFIX
kubectl kudo plan status --instance=$INSTANCE_NAME -n $NAMESPACE
In the output, check that:
- the current Operator-Version matches your expectation, and
- the deploy plan is COMPLETE (this may take a couple of minutes)
Example output:
Plan(s) for "cassandra" in namespace "default":
.
└── cassandra (Operator-Version: "cassandra-1.0.0" Active-Plan: "deploy")
├── Plan deploy (serial strategy) [COMPLETE]
│ └── Phase nodes (parallel strategy) [COMPLETE]
│ └── Step node [COMPLETE]
├── Plan upgrade (serial strategy) [NOT ACTIVE]
│ ├── Phase cleanup (serial strategy) [NOT ACTIVE]
│ │ └── Step cleanup-stateful-set [NOT ACTIVE]
│ └── Phase reinstall (serial strategy) [NOT ACTIVE]
│ └── Step node [NOT ACTIVE]
└── Plan backup (serial strategy) [NOT ACTIVE]
└── Phase backup (serial strategy) [NOT ACTIVE]
├── Step cleanup [NOT ACTIVE]
└── Step backup [NOT ACTIVE]
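Before continuing, you can also confirm that all three Cassandra pods are Running and Ready (each pod should report 3/3 containers, as in the pod listing shown further below):
kubectl get pods --namespace $NAMESPACE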
Save the DNS of the external service of the cluster:
CASSANDRA_CLUSTER=`kubectl get service --namespace $NAMESPACE --field-selector metadata.name=$INSTANCE_NAME-svc-external -o jsonpath='{.items[*].status.loadBalancer.ingress[*].hostname}'`
echo "Cassandra Cluster DNS: $CASSANDRA_CLUSTER"
Create a test schema and write some data to the cluster:
cqlsh $CASSANDRA_CLUSTER <<EOF
CREATE SCHEMA schema1 WITH replication = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
USE schema1;
CREATE TABLE users (user_id varchar PRIMARY KEY, first varchar, last varchar, age int);
INSERT INTO users (user_id, first, last, age) VALUES ('jsmith', 'John', 'Smith', 42);
EOF
Verify that the data was written correctly:
cqlsh $CASSANDRA_CLUSTER -e "USE schema1; SELECT * FROM users;"
Expected output:
user_id | age | first | last
---------+-----+-------+-------
jsmith | 42 | John | Smith
(1 rows)
TODO: Use plan trigger in KUDO v0.11
The following command will trigger the creation of a backup with the name "Backup1":
kubectl kudo update --instance=$INSTANCE_NAME -n $NAMESPACE -p BACKUP_NAME=$BACKUP_NAME -p BACKUP_TRIGGER=3
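While the backup runs, you can follow the backup plan with the same plan status command used earlier; the backup plan should move from IN_PROGRESS to COMPLETE:
kubectl kudo plan status --instance=$INSTANCE_NAME -n $NAMESPACE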
To inspect the backup in more detail, first get a list of the jobs that were started:
kubectl get jobs --namespace=$NAMESPACE
This should produce a list of the started backup jobs, one for each active node:
NAME COMPLETIONS DURATION AGE
backup-node-0 1/1 25s 33s
backup-node-1 1/1 21s 33s
backup-node-2 1/1 23s 33s
Next, let's view a list of all the pods:
kubectl get pods --namespace=$NAMESPACE
This shows the Cassandra nodes and the temporary backup pods created by the jobs:
NAME READY STATUS RESTARTS AGE
backup-node-0-bq228 0/1 Completed 0 83s
backup-node-1-fm7dq 0/1 Completed 0 83s
backup-node-2-tdzfm 0/1 Completed 0 83s
cassandra-node-0 3/3 Running 0 27m
cassandra-node-1 3/3 Running 0 27m
cassandra-node-2 3/3 Running 0 27m
The backup pods remain after they complete, but they are cleaned up before the next backup job starts.
Let's have a look at the logs of one of the backup pods:
FIRST_BACKUP_POD_NAME=`kubectl get pods --namespace=$NAMESPACE | awk '{ if(NR==2) print $1}'`
echo "Name of first backup pod is $FIRST_BACKUP_POD_NAME"
kubectl logs $FIRST_BACKUP_POD_NAME --namespace=$NAMESPACE
This should show the logs of the backup job:
Starting medusa: python3 /usr/local/bin/medusa backup --backup-name Backup1
[2020-03-06 12:00:22,036] INFO: Monitoring provider is noop
[2020-03-06 12:00:23,605] WARNING: is ccm : 0
[2020-03-06 12:00:23,889] INFO: Creating snapshot
[2020-03-06 12:00:23,889] INFO: Saving tokenmap and schema
[2020-03-06 12:00:25,349] INFO: Node cassandra-node-0.cassandra-svc.backup-test.svc.cluster.local does not have latest backup
[2020-03-06 12:00:25,349] INFO: Starting backup
[2020-03-06 12:00:28,406] INFO: Uploading /var/lib/cassandra/data/system_auth/roles-5bc52802de2535edaeab188eecebb090/snapshots/medusa-58cc941f-7d94-4f65-8d51-0a253bafb147/md-1-big-TOC.txt (92.000B)
[2020-03-06 12:00:28,456] INFO: Uploading /var/lib/cassandra/data/system_auth/roles-5bc52802de2535edaeab188eecebb090/snapshots/medusa-58cc941f-7d94-4f65-8d51-0a253bafb147/md-1-big-Summary.db (71.000B)
[2020-03-06 12:00:28,472] INFO: Uploading /var/lib/cassandra/data/system_auth/roles-5bc52802de2535edaeab188eecebb090/snapshots/medusa-58cc941f-7d94-4f65-8d51-0a253bafb147/md-1-big-Data.db (102.000B)
...
[2020-03-06 12:00:37,783] INFO: Uploading /var/lib/cassandra/data/system/peers-37f71aca7dc2383ba70672528af04d4f/snapshots/medusa-58cc941f-7d94-4f65-8d51-0a253bafb147/md-1-big-Filter.db (16.000B)
[2020-03-06 12:00:37,804] INFO: Uploading /var/lib/cassandra/data/system/peers-37f71aca7dc2383ba70672528af04d4f/snapshots/medusa-58cc941f-7d94-4f65-8d51-0a253bafb147/md-2-big-Data.db (69.000B)
[2020-03-06 12:00:40,304] INFO: Updating backup index
[2020-03-06 12:00:41,352] INFO: Backup done
[2020-03-06 12:00:41,352] INFO: - Started: 2020-03-06 12:00:22
- Started extracting data: 2020-03-06 12:00:25
- Finished: 2020-03-06 12:00:41
[2020-03-06 12:00:41,352] INFO: - Real duration: 0:00:16.095052 (excludes time waiting for other nodes)
[2020-03-06 12:00:41,353] INFO: - 281 files, 291.20 KB
[2020-03-06 12:00:41,353] INFO: - 281 files copied from host
Done: 0
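If you script this workflow, you can wait for all backup jobs to finish before checking the bucket; kubectl wait is a generic Kubernetes command, not something specific to KUDO Cassandra:
kubectl wait --for=condition=complete job --all --namespace $NAMESPACE --timeout=10m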
These files should now show up in the S3 bucket as well:
aws s3 ls $BACKUP_BUCKET_NAME/$BACKUP_PREFIX --recursive
2020-03-06 13:00:25 12 cluster1/cassandra-node-0.cassandra-svc.backup-test.svc.cluster.local/Backup1/meta/differential
2020-03-06 13:00:41 65831 cluster1/cassandra-node-0.cassandra-svc.backup-test.svc.cluster.local/Backup1/meta/manifest.json
2020-03-06 13:00:24 35214 cluster1/cassandra-node-0.cassandra-svc.backup-test.svc.cluster.local/Backup1/meta/schema.cql
2020-03-06 13:00:25 16677 cluster1/cassandra-node-0.cassandra-svc.backup-test.svc.cluster.local/Backup1/meta/tokenmap.json
2020-03-06 13:00:29 43 cluster1/cassandra-node-0.cassandra-svc.backup-test.svc.cluster.local/data/schema1/users-edfe97c05fa111eab6374b40d92765b8/md-1-big-CompressionInfo.db
2020-03-06 13:00:29 50 cluster1/cassandra-node-0.cassandra-svc.backup-test.svc.cluster.local/data/schema1/users-edfe97c05fa111eab6374b40d92765b8/md-1-big-Data.db
...
2020-03-06 13:00:38 7 cluster1/index/latest_backup/cassandra-node-1.cassandra-svc.backup-test.svc.cluster.local/backup_name.txt
2020-03-06 13:00:38 16677 cluster1/index/latest_backup/cassandra-node-1.cassandra-svc.backup-test.svc.cluster.local/tokenmap.json
2020-03-06 13:00:39 7 cluster1/index/latest_backup/cassandra-node-2.cassandra-svc.backup-test.svc.cluster.local/backup_name.txt
2020-03-06 13:00:39 16677 cluster1/index/latest_backup/cassandra-node-2.cassandra-svc.backup-test.svc.cluster.local/tokenmap.json
KUDO Cassandra currently only supports a full restore into a new cluster.
Uninstalling the existing cluster is not required, but may help if your cluster does not have enough resources to run two Cassandra clusters at the same time.
kubectl kudo uninstall --instance $INSTANCE_NAME --namespace $NAMESPACE
You may have noticed that uninstalling the operator does not delete the persistent volume claims:
kubectl get pvc --namespace $NAMESPACE
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
var-lib-cassandra-cassandra-node-0 Bound pvc-f4048fa7-23da-4410-b7b0-b7c82729fc7d 20Gi RWO awsebscsiprovisioner 64m
var-lib-cassandra-cassandra-node-1 Bound pvc-2c85752f-6438-4c8d-8e87-9f4c91df6452 20Gi RWO awsebscsiprovisioner 64m
var-lib-cassandra-cassandra-node-2 Bound pvc-56123519-50c8-4ef7-83ab-302655101497 20Gi RWO awsebscsiprovisioner 64m
This is intentional, so that the data can still be accessed if the KUDO operator is deleted accidentally. In this case we don't need the old data, so make sure to delete the claims before reinstalling the cluster from the S3 backup:
kubectl delete pvc --all --namespace $NAMESPACE
persistentvolumeclaim "var-lib-cassandra-cassandra-node-0" deleted
persistentvolumeclaim "var-lib-cassandra-cassandra-node-1" deleted
persistentvolumeclaim "var-lib-cassandra-cassandra-node-2" deleted
Now we have a clean namespace and can start the restore process.
To create a full restore, the operator needs to create a new cluster.
Important: At the moment, only a restore with the very same settings is supported. This is especially true for:
- NODE_COUNT: A different node count will prevent a successful restore.
- NUM_TOKENS: This setting is ignored on the restored cluster, as it takes over the tokens from the backup.
The namespace and instance name of the new cluster can differ, but the namespace and instance name of the backed-up cluster must be known, as the restore parameters below refer to them.
In addition to the backup parameters used when installing the first cluster, the following new parameters are used:
- RESTORE_FLAG: If true, an initContainer is created that restores a backup before the cluster is started.
- RESTORE_OLD_NAMESPACE: The namespace of the cluster that was backed up.
- RESTORE_OLD_NAME: The instance name of the cluster that was backed up. Both RESTORE_OLD_* parameters need to be set so the operator can identify the correct data to restore.
- BACKUP_NAME: The name of the backup to restore.
kubectl kudo install cassandra \
--instance $INSTANCE_NAME \
--namespace $NAMESPACE \
-p EXTERNAL_NATIVE_TRANSPORT=true \
-p BACKUP_RESTORE_ENABLED=true \
-p BACKUP_AWS_CREDENTIALS_SECRET=$SECRET_NAME \
-p BACKUP_AWS_S3_BUCKET_NAME=$BACKUP_BUCKET_NAME \
-p BACKUP_PREFIX=$BACKUP_PREFIX \
-p RESTORE_FLAG=true \
-p RESTORE_OLD_NAMESPACE=$NAMESPACE \
-p RESTORE_OLD_NAME=$INSTANCE_NAME \
-p BACKUP_NAME=$BACKUP_NAME
Again, we should check that the plan executes correctly and no errors show up:
kubectl kudo plan status --instance=$INSTANCE_NAME -n $NAMESPACE
Plan(s) for "cassandra" in namespace "default":
.
└── cassandra (Operator-Version: "cassandra-1.0.0" Active-Plan: "deploy")
├── Plan deploy (serial strategy) [IN_PROGRESS]
│ └── Phase nodes (parallel strategy) [IN_PROGRESS]
│ └── Step node [IN_PROGRESS]
├── Plan upgrade (serial strategy) [NOT ACTIVE]
│ ├── Phase cleanup (serial strategy) [NOT ACTIVE]
│ │ └── Step cleanup-stateful-set [NOT ACTIVE]
│ └── Phase reinstall (serial strategy) [NOT ACTIVE]
│ └── Step node [NOT ACTIVE]
└── Plan backup (serial strategy) [NOT ACTIVE]
└── Phase backup (serial strategy) [NOT ACTIVE]
├── Step cleanup [NOT ACTIVE]
└── Step backup [NOT ACTIVE]
We can look into the restore process that happens in the init container before the actual pod starts:
kubectl logs $INSTANCE_NAME-node-0 -c medusa-restore --namespace $NAMESPACE
Start Restore for node 'cassandra' from backup 'Backup1' in prefix 'cluster1'
[2020-03-06 12:48:06,591] WARNING: is ccm : 0
[2020-03-06 12:48:06,795] INFO: Downloading data from backup to /tmp/medusa-restore-5aa6b9e3-831d-4d80-9ed1-1ca88f169f5b
[2020-03-06 12:48:08,226] INFO: Downloading backup data
...
[2020-03-06 12:48:17,947] INFO: Downloading backup data
[2020-03-06 12:48:18,522] INFO: Downloading backup metadata...
[2020-03-06 12:48:18,696] INFO: Stopping Cassandra
[2020-03-06 12:48:18,702] INFO: Moving backup data to Cassandra data directory
[2020-03-06 12:48:21,028] INFO: No --seeds specified so we will not wait for any
[2020-03-06 12:48:21,028] INFO: Starting Cassandra
This step can take quite a while for bigger backups.
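To follow the progress, you can watch the pods until all Cassandra nodes report Running and Ready (a plain kubectl watch; press Ctrl+C to stop it):
kubectl get pods --namespace $NAMESPACE --watch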
As we have created a new cluster, and therefore a new external service, we need to look up the DNS name of the external service again:
CASSANDRA_CLUSTER=`kubectl get service --namespace $NAMESPACE --field-selector metadata.name=$INSTANCE_NAME-svc-external -o jsonpath='{.items[*].status.loadBalancer.ingress[*].hostname}'`
echo "Cassandra Cluster DNS: $CASSANDRA_CLUSTER"
Now we can verify that we get the same results as before the backup:
cqlsh $CASSANDRA_CLUSTER -e "USE schema1; SELECT * FROM users;"
user_id | age | first | last
---------+-----+-------+-------
jsmith | 42 | John | Smith
(1 rows)