Automating Database Cloning Using AWS and Kubernetes

Posted on July 9, 2020

At CloudHero, we face challenging situations every day while helping our customers on their digitalization and automation journey. One such challenge was automating the process of cloning the production database and anonymizing the data for development use. Typically, maintenance is done only on the production database, so the staging database ends up with stale data and drifts away from production. Here, we are going to generalize the problem so that you can adapt these methods to your own use case.

For the following examples, the basic sequence of actions is snapshot/restore (or dump/restore), adapted to two main use cases. The recommendations assume you use Aurora MySQL for your production database, but the commands can be adapted to other database engines as well.

Cloning Your Database

1. Aurora to Aurora

If your development database will also use Aurora, we can make use of the snapshot and restore primitives that AWS provides. Assuming that you have activated automatic snapshots for your production database, identify the latest snapshot available. But first, we need to define some environment variables:

export PROD_CLUSTER_NAME=<your production cluster name>
export CLONE_INSTANCE_TYPE=<the instance type of the restored database, for example db.t3.medium>
export CLONE_MYSQL_PASS=<new mysql master password for the restored database>

Look for the latest snapshot:

LAST_SNAPSHOT_ARN=$(aws rds describe-db-cluster-snapshots \
	--filters Name=db-cluster-id,Values=$PROD_CLUSTER_NAME \
	--snapshot-type automated \
	| jq .DBClusterSnapshots[-1].DBClusterSnapshotArn | tr -d "\"")

Restore the snapshot into a new cluster:

aws rds restore-db-cluster-from-snapshot \
    --db-cluster-identifier stage-cluster \
    --snapshot-identifier $LAST_SNAPSHOT_ARN \
    --engine aurora \
    --tags Key=env,Value=staging

Then create a database instance inside the restored cluster:

aws rds create-db-instance \
    --db-instance-identifier stage-instance \
    --db-instance-class $CLONE_INSTANCE_TYPE \
    --engine aurora \
    --db-cluster-identifier stage-cluster \
    --tags Key=env,Value=staging

For security reasons, we recommend changing the master password on the restored database. Keep in mind that you may have to wait for your cluster to finish restoring.

aws rds modify-db-cluster \
	--db-cluster-identifier stage-cluster \
	--master-user-password $CLONE_MYSQL_PASS \
	--apply-immediately
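If you prefer not to check the status by hand, a minimal polling sketch like the one below (run before the modify command above) waits until the cluster status reported by the API is available; the identifier and sleep interval are just examples:

until [ "$(aws rds describe-db-clusters \
    --db-cluster-identifier stage-cluster \
    | jq -r '.DBClusters[0].Status')" = "available" ]
do
  echo "Cluster is still restoring..."
  sleep 60
done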

Also, we recommend creating new passwords for your MySQL users and anonymizing your data.
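As a rough illustration of the anonymization step, a few UPDATE statements against the restored cluster are usually enough; the database, table, and column names below are purely hypothetical:

# Hypothetical example: overwrite personally identifiable fields on the clone.
mysql -u admin -h <restored cluster endpoint> -p$CLONE_MYSQL_PASS -e "
UPDATE myapp.users
SET email = CONCAT('user', id, '@example.com'),
    phone = NULL;"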

2. Aurora to MySQL

In this case, instead of restoring snapshots, you dump the data from the production cluster and push it to S3. The fastest way to clone the database afterwards is to create your own snapshot of the MySQL data directory and push that to S3, which means you need to dump and restore your data on a server running a compatible version of MySQL. But first, let’s export some variables:

export PROD_CLUSTER_ENDPOINT=<your production cluster endpoint>
export PROD_MYSQL_USER=<your production cluster user>
export PROD_MYSQL_PASS=<your production cluster password>
export LOCAL_MYSQL_USER=<your local mysql user>
export LOCAL_MYSQL_PASSWORD=<your local mysql password>
export S3_BUCKET=<your s3 bucket for mysql snapshots>

Next, dump the data from your production database. You can also use the reader endpoint if you have multiple instances in your Aurora cluster.

mysqldump -h $PROD_CLUSTER_ENDPOINT -u $PROD_MYSQL_USER -p$PROD_MYSQL_PASS --all-databases --single-transaction --skip-lock-tables > dump.sql

To restore the data from the dump:

mysql -u $LOCAL_MYSQL_USER -p$LOCAL_MYSQL_PASSWORD < dump.sql

After the restore process has finished, make sure to shut down the local MySQL server. This way, the files on disk remain unchanged while they are archived.
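For example, mirroring the check used later in the automation scripts (the exact stop command depends on your init system; it may be systemctl stop mysql instead):

# Stop the local MySQL server and make sure it is really down before archiving.
service mysql stop

while mysql -u $LOCAL_MYSQL_USER -p$LOCAL_MYSQL_PASSWORD -e "show databases;" &> /dev/null
do
  echo "MySQL is still running"
  sleep 5
done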

To create a snapshot of the data and push it to S3 for cloning:

cd /var/lib/mysql && tar -czf mysql_data.tar.gz *

aws s3 cp /var/lib/mysql/mysql_data.tar.gz s3://$S3_BUCKET/mysql_data.tar.gz

rm -rf /var/lib/mysql/*

Automating the Database Cloning Process

Now it’s time to integrate the commands above into scripts. To do that, we need to make some small adjustments so that no human interaction or decisions are needed.

1. Aurora to Aurora

Looking above, we can immediately spot two problems:

  1. The name for the restored cluster is the same every time.
  2. The master password change needs a human to check when the restore is complete.

For the first issue, we recommend putting a timestamp in the name of the restored cluster. This way, there are no naming conflicts when the job runs again. The commands will look like this:

To restore the snapshot into a new cluster:

aws rds restore-db-cluster-from-snapshot \
    --db-cluster-identifier staging-$(date '+%Y%m%d')-cluster \
    --snapshot-identifier $LAST_SNAPSHOT_ARN \
    --engine aurora \
    --tags Key=env,Value=staging

Now create a database instance inside the restored cluster:

aws rds create-db-instance \
    --db-instance-identifier stage-$(date '+%Y%m%d')-instance \
    --db-instance-class $CLONE_INSTANCE_TYPE \
    --engine aurora \
    --db-cluster-identifier staging-$(date '+%Y%m%d')-cluster \
    --tags Key=env,Value=staging

For the second one, we should programmatically wait for the Aurora cluster to be available:

STAGE_CLUSTER_ARN=$(aws resourcegroupstaggingapi get-resources \
	--resource-type-filters rds:cluster \
	--tag-filters Key=env,Values=staging \
	| jq .ResourceTagMappingList[-1].ResourceARN \
	| tr -d "\"")

STAGE_CLUSTER_ENDPOINT=$(aws rds describe-db-clusters \
	--filters Name=db-cluster-id,Values=$STAGE_CLUSTER_ARN \
	| jq .DBClusters[0].Endpoint \
	| tr -d "\"")

until mysql -u admin -h $STAGE_CLUSTER_ENDPOINT -p$MYSQL_ADMIN_PASS -e "show databases;" &> /dev/null
do
  echo "MySQL is not ready"
  echo "Trying to set MySQL master password."
  aws rds modify-db-cluster --db-cluster-identifier staging-$(date '+%Y%m%d')-cluster --master-user-password $CLONE_MYSQL_PASS --apply-immediately
  sleep 300
done

Lastly, notice that you also need to export your existing Aurora admin password as MYSQL_ADMIN_PASS, since the master password is copied over when you restore from a snapshot.
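For example:

export MYSQL_ADMIN_PASS=<your existing production master password>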

2. Aurora to MySQL

Here, we can also spot two problems:

  1. The dump name is always the same locally.
  2. The snapshot name on S3 is always the same and there is no latest one.

For the first issue, the fix is again to add a timestamp to your dump name.

To dump data from your production database:

mysqldump -h $PROD_CLUSTER_ENDPOINT -u $PROD_MYSQL_USER -p$PROD_MYSQL_PASS --all-databases --single-transaction --skip-lock-tables > dump-$(date '+%Y%m%d').sql

To restore the data from the dump:

mysql -u $LOCAL_MYSQL_USER -p$LOCAL_MYSQL_PASSWORD < dump-$(date '+%Y%m%d').sql

For the second one, we should keep a snapshot on S3 that is always the latest, as well as historical snapshots. We should also store the MD5 hash of the latest dump. This way, a setup that always queries S3 for the latest dump can download it only when it is newer than the local snapshot.

cd /var/lib/mysql && tar -czf mysql_data.tar.gz *

md5sum mysql_data.tar.gz | cut -d' ' -f1 > mysql_data-$(date '+%Y%m%d').tar.gz.md5

aws s3 cp /var/lib/mysql/mysql_data.tar.gz s3://$S3_BUCKET/mysql_data-$(date '+%Y%m%d').tar.gz

aws s3 cp s3://$S3_BUCKET/mysql_data-$(date '+%Y%m%d').tar.gz s3://$S3_BUCKET/mysql_data-latest.tar.gz

aws s3 cp /var/lib/mysql/mysql_data-$(date '+%Y%m%d').tar.gz.md5 s3://$S3_BUCKET/mysql_data-$(date '+%Y%m%d').tar.gz.md5

aws s3 cp s3://$S3_BUCKET/mysql_data-$(date '+%Y%m%d').tar.gz.md5 s3://$S3_BUCKET/mysql_data-latest.tar.gz.md5

rm -rf /var/lib/mysql/*
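On the consumer side, a minimal sketch of that check could look like this; it compares the MD5 of the latest dump on S3 with a locally stored copy and downloads the archive only when it changed (the local paths are just examples):

# Fetch the MD5 of the latest snapshot and compare it with the one used last time.
aws s3 cp s3://$S3_BUCKET/mysql_data-latest.tar.gz.md5 /tmp/latest.md5

if ! cmp -s /tmp/latest.md5 /opt/mysql-clone/current.md5; then
  aws s3 cp s3://$S3_BUCKET/mysql_data-latest.tar.gz /tmp/mysql_data.tar.gz
  # ...stop MySQL, extract the archive into /var/lib/mysql, start MySQL again...
  cp /tmp/latest.md5 /opt/mysql-clone/current.md5
fi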

Wrapping Cloning Automation in Kubernetes

The icing on the cake of automating database cloning is using Kubernetes to manage and deploy your CronJobs. This way, you get much better visibility into how the jobs are running, and it’s easy to schedule them on any type of machine you like.

First, we pack all the commands into a single shell script named create_stage_db.sh and build a Docker image with MySQL and awscli installed. Below is a Dockerfile, based on the MySQL 5.6 image, with all the tools you need:

FROM mysql:5.6

RUN apt-get update && apt-get install python-pip jq curl -y && \
    pip install awscli

COPY create_stage_db.sh /usr/local/bin/

RUN chmod 755 /usr/local/bin/create_stage_db.sh && \
    rm -rf /var/lib/apt/lists/*

After building the image, you should push it to an image registry. In the YAML examples below, we will assume that your image is named myorganisation/mysql-clone (your image name will, of course, be different).
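For example, assuming you are already logged in to the registry:

docker build -t myorganisation/mysql-clone .
docker push myorganisation/mysql-clone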

Let’s dive into two solutions that can help you automate the process of cloning the production database in Kubernetes:

1. Aurora to Aurora

The create_stage_db.sh script will look like this:

#!/bin/bash

LAST_SNAPSHOT_ARN=$(aws rds describe-db-cluster-snapshots \
	--filters Name=db-cluster-id,Values=$PROD_CLUSTER_NAME \
	--snapshot-type automated \
	| jq .DBClusterSnapshots[-1].DBClusterSnapshotArn | tr -d "\"")

aws rds restore-db-cluster-from-snapshot \
    --db-cluster-identifier staging-$(date '+%Y%m%d')-cluster \
    --snapshot-identifier $LAST_SNAPSHOT_ARN \
    --engine aurora \
    --tags Key=env,Value=staging

aws rds create-db-instance \
    --db-instance-identifier stage-$(date '+%Y%m%d')-instance \
    --db-instance-class $CLONE_INSTANCE_TYPE \
    --engine aurora \
    --db-cluster-identifier staging-$(date '+%Y%m%d')-cluster \
    --tags Key=env,Value=staging

STAGE_CLUSTER_ARN=$(aws resourcegroupstaggingapi get-resources \
	--resource-type-filters rds:cluster \
	--tag-filters Key=env,Values=staging \
	| jq .ResourceTagMappingList[-1].ResourceARN \
	| tr -d "\"")

STAGE_CLUSTER_ENDPOINT=$(aws rds describe-db-clusters \
	--filters Name=db-cluster-id,Values=$STAGE_CLUSTER_ARN \
	| jq .DBClusters[0].Endpoint \
	| tr -d "\"")

until mysql -u admin -h $STAGE_CLUSTER_ENDPOINT -p$MYSQL_ADMIN_PASS -e "show databases;" &> /dev/null
do
  echo "MySQL is not ready"
  echo "Trying to set MySQL master password."
  aws rds modify-db-cluster --db-cluster-identifier staging-$(date '+%Y%m%d')-cluster --master-user-password $CLONE_MYSQL_PASS --apply-immediately
  sleep 300
done

and the YAML manifest to run it looks like this:

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: create-stage-db
spec:
  schedule: "0 0 * * SAT"
  concurrencyPolicy: "Forbid"
  jobTemplate:
    spec:
      backoffLimit: 0
      template:
        metadata:
          labels:
            component: create-stage-db
        spec:
          containers:
          - name: create-stage-db
            image: myorganisation/mysql-clone
            imagePullPolicy: Always
            command: ["create_stage_db.sh"]
            env:
            - name: PROD_CLUSTER_NAME
              value: <your production cluster name>
            - name: CLONE_INSTANCE_TYPE
              value: <the instance type of the restored database, for example db.t3.medium>
            - name: CLONE_MYSQL_PASS
              value: <new mysql master password for the restored database>
            - name: MYSQL_ADMIN_PASS
              value: <your mysql admin password>
          restartPolicy: Never

This CronJob will run every week on Saturday at midnight and create new Aurora staging instances for you.
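If you want to test the job without waiting until Saturday, you can trigger a one-off run from the CronJob and follow its logs (assuming you have kubectl access to the cluster):

kubectl create job --from=cronjob/create-stage-db create-stage-db-manual
kubectl logs -f job/create-stage-db-manual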

2. Aurora to MySQL

The create_stage_db.sh script will look like this:

#!/bin/bash

service mysql start

until mysql -e "show databases;" &> /dev/null
do
  echo "MySQL is not ready"
  sleep 10
done

mysqldump -h $PROD_CLUSTER_ENDPOINT \
    -u $PROD_MYSQL_USER \
    -p$PROD_MYSQL_PASS \
    --all-databases \
    --single-transaction \
    --skip-lock-tables > dump-$(date '+%Y%m%d').sql

mysql -u $LOCAL_MYSQL_USER \
    -p$LOCAL_MYSQL_PASSWORD < dump-$(date '+%Y%m%d').sql

service mysql stop

while mysql -e "show databases;" &> /dev/null
do
  echo "MySQL is still running"
  sleep 10
done

cd /var/lib/mysql && tar -czf mysql_data.tar.gz *
md5sum mysql_data.tar.gz | cut -d' ' -f1 > mysql_data-$(date '+%Y%m%d').tar.gz.md5
aws s3 cp /var/lib/mysql/mysql_data.tar.gz s3://$S3_BUCKET/mysql_data-$(date '+%Y%m%d').tar.gz
aws s3 cp s3://$S3_BUCKET/mysql_data-$(date '+%Y%m%d').tar.gz s3://$S3_BUCKET/mysql_data-latest.tar.gz
aws s3 cp /var/lib/mysql/mysql_data-$(date '+%Y%m%d').tar.gz.md5 s3://$S3_BUCKET/mysql_data-$(date '+%Y%m%d').tar.gz.md5
aws s3 cp s3://$S3_BUCKET/mysql_data-$(date '+%Y%m%d').tar.gz.md5 s3://$S3_BUCKET/mysql_data-latest.tar.gz.md5
rm -rf /var/lib/mysql/*

and the YAML manifest to run it looks like this:

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: create-stage-db
spec:
  schedule: "0 0 * * SAT"
  concurrencyPolicy: "Forbid"
  jobTemplate:
    spec:
      backoffLimit: 0
      template:
        metadata:
          labels:
            component: create-stage-db
        spec:
          containers:
          - name: create-stage-db
            image: myorganisation/mysql-clone
            imagePullPolicy: Always
            command: ["create_stage_db.sh"]
            env:
            - name: PROD_CLUSTER_ENDPOINT
              value: <your production cluster endpoint>
            - name: PROD_MYSQL_USER
              value: <your production cluster user>
            - name: PROD_MYSQL_PASS
              value: <your production cluster password>
            - name: LOCAL_MYSQL_USER
              value: <your local mysql user>
            - name: LOCAL_MYSQL_PASSWORD
              value: <your local mysql password>
            - name: S3_BUCKET
              value: <your s3 bucket for mysql snapshots>
          restartPolicy: Never

This CronJob will run every week on Saturday at midnight, creating MySQL snapshots and pushing them to S3.

We hope that these instructions helped you gain more context on automating database cloning with Kubernetes and AWS.

Last but not least, if you wish to go the extra mile, here are two improvement ideas you could implement in order to take your Kubernetes CronJobs to the next level:

  1. Store your passwords in Kubernetes Secrets instead of leaving them in the main YAML manifest (see the sketch after this list).
  2. If your production database is big, you should mount a PVC in your CronJob when doing the Aurora to MySQL setup. Having a bigger PVC will also speed up the dump/restore process, as AWS assigns IOPS capacity to disks based on their size.
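As a starting point for the first idea, here is a minimal sketch of a Secret; the Secret name and key are our own choices, not something from the manifests above:

apiVersion: v1
kind: Secret
metadata:
  name: mysql-clone-secrets
type: Opaque
stringData:
  CLONE_MYSQL_PASS: <new mysql master password for the restored database>

In the CronJob container spec, the plain value is then replaced with a reference to the Secret:

env:
- name: CLONE_MYSQL_PASS
  valueFrom:
    secretKeyRef:
      name: mysql-clone-secrets
      key: CLONE_MYSQL_PASS

For the second idea, you would add a persistentVolumeClaim volume to the Pod spec and mount it at /var/lib/mysql, so the dump and the MySQL data directory live on a dedicated, appropriately sized disk.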

 
