Automate backups for Amazon RDS for PostgreSQL DB instances by using AWS Batch - AWS Prescriptive Guidance

Automate backups for Amazon RDS for PostgreSQL DB instances by using AWS Batch

Created by Kirankumar Chandrashekar (AWS)

Environment: PoC or pilot

Technologies: Containers & microservices; Databases; DevOps

Workload: All other workloads

AWS services: Amazon RDS; AWS Batch; Amazon CloudWatch; AWS Lambda; Amazon S3

Summary

Backing up your PostgreSQL databases is an important task that you can typically complete with the pg_dump utility, which uses the COPY command by default to create a schema and data dump of a PostgreSQL database. However, this process can become repetitive if you require regular backups for multiple PostgreSQL databases. If your PostgreSQL databases are hosted in the cloud, you can also take advantage of the automated backup feature provided by Amazon Relational Database Service (Amazon RDS) for PostgreSQL. This pattern describes how to automate regular backups for Amazon RDS for PostgreSQL DB instances by using the pg_dump utility.

Note: The instructions assume that you're using Amazon RDS. However, you can also use this approach for PostgreSQL databases that are hosted outside Amazon RDS. To take backups, the AWS Lambda function must be able to access your databases.

A time-based Amazon CloudWatch Events event initiates a Lambda function that searches for specific backup tags applied to the metadata of the PostgreSQL DB instances on Amazon RDS. If the PostgreSQL DB instances have the bkp:AutomatedDBDump = Active tag and other required backup tags, the Lambda function submits individual jobs for each database backup to AWS Batch. 

AWS Batch processes these jobs and uploads the backup data to an Amazon Simple Storage Service (Amazon S3) bucket. This pattern uses a Dockerfile and an entrypoint.sh file to build a Docker container image that is used to make backups in the AWS Batch job. After the backup process is complete, AWS Batch records the backup details to an inventory table on Amazon DynamoDB. As an additional safeguard, a CloudWatch Events event initiates an Amazon Simple Notification Service (Amazon SNS) notification if a job fails in AWS Batch. 

Prerequisites and limitations

Prerequisites 

Architecture

Architecture diagram: backing up Amazon RDS for PostgreSQL DB instances by using the pg_dump utility.

Technology stack  

  • Amazon CloudWatch Events

  • Amazon DynamoDB

  • Amazon Elastic Container Registry (Amazon ECR)

  • Amazon RDS

  • Amazon SNS

  • Amazon S3

  • AWS Batch

  • AWS Key Management Service (AWS KMS)

  • AWS Lambda

  • AWS Secrets Manager

  • Docker

Tools

  • Amazon CloudWatch Events – CloudWatch Events delivers a near real-time stream of system events that describe changes in AWS resources.

  • Amazon DynamoDB – DynamoDB is a fully managed NoSQL database service that provides fast and predictable performance with seamless scalability.

  • Amazon ECR – Amazon Elastic Container Registry (Amazon ECR) is a managed AWS container image registry service that is secure, scalable, and reliable.

  • Amazon RDS – Amazon Relational Database Service (Amazon RDS) is a web service that makes it easier to set up, operate, and scale a relational database in the AWS Cloud.

  • Amazon SNS – Amazon Simple Notification Service (Amazon SNS) is a managed service that provides message delivery from publishers to subscribers.

  • Amazon S3 – Amazon Simple Storage Service (Amazon S3) is storage for the internet.

  • AWS Batch – AWS Batch helps you run batch computing workloads on the AWS Cloud.

  • AWS KMS – AWS Key Management Service (AWS KMS) is a managed service that makes it easy for you to create and control the encryption keys used to encrypt your data.

  • AWS Lambda – Lambda is a compute service that helps you run code without provisioning or managing servers.

  • AWS Secrets Manager – Secrets Manager helps you replace hardcoded credentials in your code, including passwords, with an API call to Secrets Manager to retrieve the secret programmatically.

  • Docker – Docker helps developers easily pack, ship, and run any application as a lightweight, portable, and self-sufficient container.

Your PostgreSQL DB instances on Amazon RDS must have tags applied to their metadata. The Lambda function searches for these tags to identify the DB instances that should be backed up. The solution uses the following tags. For an example command that applies them, see the sketch that follows the table.

Tag | Description

bkp:AutomatedDBDump = Active | Identifies an Amazon RDS DB instance as a candidate for backups.

bkp:AutomatedBackupSecret = <secret_name> | Identifies the Secrets Manager secret that contains the Amazon RDS login credentials.

bkp:AutomatedDBDumpS3Bucket = <s3_bucket_name> | Identifies the S3 bucket to send backups to.

bkp:AutomatedDBDumpFrequency, bkp:AutomatedDBDumpTime | Identify the frequency and times when databases should be backed up.

bkp:pgdumpcommand = <pgdump_command> | Identifies the databases for which the backups need to be taken.
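
The following AWS CLI command is a minimal sketch of how these tags might be applied to a DB instance. The account ID, Region, DB instance name, secret name, bucket name, and frequency and time values are placeholder assumptions; replace them with your own values.

# Apply the backup tags to an existing Amazon RDS for PostgreSQL DB instance (example values).
aws rds add-tags-to-resource \
    --resource-name arn:aws:rds:us-east-1:111122223333:db:mydbinstance \
    --tags 'Key=bkp:AutomatedDBDump,Value=Active' \
           'Key=bkp:AutomatedBackupSecret,Value=mydb-credentials' \
           'Key=bkp:AutomatedDBDumpS3Bucket,Value=my-backup-bucket' \
           'Key=bkp:AutomatedDBDumpFrequency,Value=Daily' \
           'Key=bkp:AutomatedDBDumpTime,Value=01:00' \
           'Key=bkp:pgdumpcommand,Value=-d test:-d test1'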

Epics

Task | Description | Skills required

Create a table in DynamoDB.

Sign in to the AWS Management Console, open the Amazon DynamoDB console, and create a table. For help with this and other stories, see the Related resources section. For an example AWS CLI command, see the sketch that follows this epic.

Cloud administrator, Database administrator

Confirm that the table was created.

Run the aws dynamodb describe-table --table-name <table-name> | grep TableStatus command. If the table exists, the command returns "TableStatus": "ACTIVE".

Cloud administrator, Database administrator
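
If you prefer the AWS CLI, the following commands are a minimal sketch of this epic. The table name rds-postgres-dump-inventory and the key schema are assumptions; the key attributes are chosen to match the accountId and dumpFileUrl attributes that the sample entrypoint.sh file writes with dynamodb put-item.

# Create a hypothetical inventory table keyed on the attributes that the backup job writes.
aws dynamodb create-table \
    --table-name rds-postgres-dump-inventory \
    --attribute-definitions AttributeName=accountId,AttributeType=S AttributeName=dumpFileUrl,AttributeType=S \
    --key-schema AttributeName=accountId,KeyType=HASH AttributeName=dumpFileUrl,KeyType=RANGE \
    --billing-mode PAY_PER_REQUEST

# Confirm that the table reaches ACTIVE status.
aws dynamodb describe-table --table-name rds-postgres-dump-inventory | grep TableStatus
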
Task | Description | Skills required

Create an SNS topic.

Open the Amazon SNS console, choose Topics, and create an SNS topic with the name JobFailedAlert. Subscribe an active email address to the topic, and check your email inbox to confirm the SNS subscription email from AWS Notifications.

Cloud administrator

Create a failed job event rule for AWS Batch.

Open the Amazon CloudWatch console, choose Events, and then choose Create rule. Choose Show advanced options, and choose Edit. For Build a pattern that selects events for processing by your targets, replace the existing text with the “Failed job event” code from the Additional information section. This code defines a CloudWatch Events rule that initiates when an AWS Batch job enters the FAILED state.

Cloud administrator

Add event rule target.

In Targets, choose Add targets, and choose the JobFailedAlert SNS topic. Configure the remaining details and create the CloudWatch Events rule. For an AWS CLI sketch of the steps in this epic, see the example that follows it.

Cloud administrator
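
The following commands are a rough CLI equivalent of the console steps in this epic, assuming the rule name batch-job-failed and a placeholder email address. Note that when you create the rule from the CLI instead of the console, the SNS topic's access policy must also allow events.amazonaws.com to publish to the topic.

# Create the SNS topic and subscribe a monitored email address.
TOPIC_ARN=$(aws sns create-topic --name JobFailedAlert --query TopicArn --output text)
aws sns subscribe --topic-arn "$TOPIC_ARN" --protocol email --notification-endpoint ops@example.com

# Create a rule that matches the "Failed job event" pattern from the Additional information section.
aws events put-rule \
    --name batch-job-failed \
    --event-pattern '{"detail-type":["Batch Job State Change"],"source":["aws.batch"],"detail":{"status":["FAILED"]}}'

# Send matching events to the SNS topic.
aws events put-targets --rule batch-job-failed --targets "Id"="1","Arn"="$TOPIC_ARN"
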
Task | Description | Skills required

Create an Amazon ECR repository.

Open the Amazon ECR console and choose the AWS Region in which you want to create your repository. Choose Repositories, and then choose Create repository. Configure the repository according to your requirements.

Cloud administrator

Write a Dockerfile.

Sign in to Docker and use the “Sample Dockerfile” and “Sample entrypoint.sh file” from the Additional information section to create your Dockerfile and entrypoint.sh files.

DevOps engineer

Create a Docker image and push it to the Amazon ECR repository.

Build the Dockerfile into a Docker image and push it to the Amazon ECR repository. For help with this story, see the Related resources section. For an example sequence of commands, see the sketch that follows this epic.

DevOps engineer
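
The following sequence is a sketch of the build-and-push flow for this epic. The repository name rds-postgres-backup, the Region us-east-1, and the account ID 111122223333 are assumptions; substitute your own values.

# Create the Amazon ECR repository (one time).
aws ecr create-repository --repository-name rds-postgres-backup --region us-east-1

# Build the image from the sample Dockerfile and entrypoint.sh file.
docker build -t rds-postgres-backup .

# Authenticate Docker to Amazon ECR, then tag and push the image.
aws ecr get-login-password --region us-east-1 | \
    docker login --username AWS --password-stdin 111122223333.dkr.ecr.us-east-1.amazonaws.com
docker tag rds-postgres-backup:latest 111122223333.dkr.ecr.us-east-1.amazonaws.com/rds-postgres-backup:latest
docker push 111122223333.dkr.ecr.us-east-1.amazonaws.com/rds-postgres-backup:latest
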
Task | Description | Skills required

Create an AWS Batch job definition.

Open the AWS Batch console and create a job definition that includes the Amazon ECR repository’s Uniform Resource Identifier (URI) as the Image property. For an example job definition registered through the AWS CLI, see the sketch that follows this epic.

Cloud administrator

Configure the AWS Batch job queue.

On the AWS Batch console, choose Job queues, and then choose Create queue. Create a job queue that will store jobs until AWS Batch runs them on the resources within your compute environment. Important: Make sure you write logic for AWS Batch to record the backup details to the DynamoDB inventory table.

Cloud administrator
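
The following is a minimal sketch of a job definition registered through the AWS CLI, assuming the image pushed in the previous epic and an EC2 compute environment. The environment variable names are the ones that the sample entrypoint.sh file reads; the values shown are placeholders, and in practice your Lambda function can override them per job when it submits jobs. You also need to attach an appropriate job role so that the container can use the ECS container credentials that the sample entrypoint.sh file relies on.

# Register a job definition that points at the backup container image.
aws batch register-job-definition \
    --job-definition-name rds-postgres-dump \
    --type container \
    --container-properties '{
        "image": "111122223333.dkr.ecr.us-east-1.amazonaws.com/rds-postgres-backup:latest",
        "resourceRequirements": [
            {"type": "VCPU", "value": "1"},
            {"type": "MEMORY", "value": "2048"}
        ],
        "environment": [
            {"name": "RDS_INSTANCE_NAME", "value": "mydbinstance"},
            {"name": "PGSQL_HOST", "value": "mydbinstance.example.us-east-1.rds.amazonaws.com"},
            {"name": "SECRETID", "value": "mydb-credentials"},
            {"name": "BUCKET", "value": "my-backup-bucket"},
            {"name": "REGION", "value": "us-east-1"},
            {"name": "EXECUTE_COMMAND", "value": "-d test:-d test1"},
            {"name": "RDS_POSTGRES_DUMP_INVENTORY_TABLE", "value": "rds-postgres-dump-inventory"},
            {"name": "TargetAccountId", "value": "111122223333"},
            {"name": "TargetAccountRoleName", "value": "rds-backup-role"}
        ]
    }'

# The job queue requires an existing compute environment (replace the placeholder ARN).
# aws batch create-job-queue --job-queue-name rds-backup-queue --priority 1 \
#     --compute-environment-order order=1,computeEnvironment=<compute-environment-arn>
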
Task | Description | Skills required

Create a Lambda function to search for tags.

Create a Lambda function that searches for tags on your PostgreSQL DB instances and identifies backup candidates. Make sure your Lambda function can identify the bkp:AutomatedDBDump = Active tag and all other required tags. Important: The Lambda function must also be able to add jobs to the AWS Batch job queue.

DevOps engineer

Create a time-based CloudWatch Events event.

Open the Amazon CloudWatch console and create a CloudWatch Events event that uses a cron expression to run your Lambda function on a regular schedule. Important: All scheduled events use the UTC time zone. For an example schedule rule, see the CLI sketch that follows this epic.

Cloud administrator
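
This pattern doesn't supply the Lambda function code, so the following sketch only illustrates, as CLI calls, the lookups and job submission that the function needs to perform, along with a scheduled rule that invokes it. The function name rds-backup-dispatcher, the rule name, the queue and job definition names, and the cron expression are assumptions.

# What the Lambda function does, expressed as CLI calls: list DB instances, read their backup
# tags, and submit one AWS Batch job per database on instances tagged bkp:AutomatedDBDump = Active.
aws rds describe-db-instances --query 'DBInstances[].DBInstanceArn' --output text
aws rds list-tags-for-resource --resource-name arn:aws:rds:us-east-1:111122223333:db:mydbinstance
aws batch submit-job \
    --job-name mydbinstance-dump \
    --job-queue rds-backup-queue \
    --job-definition rds-postgres-dump

# Schedule the Lambda function with a time-based rule (cron expressions are evaluated in UTC).
aws events put-rule --name rds-backup-schedule --schedule-expression 'cron(0 1 * * ? *)'
aws events put-targets --rule rds-backup-schedule \
    --targets "Id"="1","Arn"="arn:aws:lambda:us-east-1:111122223333:function:rds-backup-dispatcher"
aws lambda add-permission --function-name rds-backup-dispatcher \
    --statement-id allow-events --action lambda:InvokeFunction \
    --principal events.amazonaws.com \
    --source-arn arn:aws:events:us-east-1:111122223333:rule/rds-backup-schedule
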
Task | Description | Skills required

Create an AWS KMS key.

Open the AWS KMS console and create a KMS key that can be used to encrypt the Amazon RDS credentials stored in AWS Secrets Manager. For an example of this and the following stories, see the CLI sketch that follows this epic.

Cloud administrator

Create an AWS Secrets Manager secret.

Open the AWS Secrets Manager console and store your Amazon RDS for PostgreSQL database credentials as a secret.

Cloud administrator

Add the required tags to the PostgreSQL DB instances.

Open the Amazon RDS console and add tags to the PostgreSQL DB instances that you want to automatically back up. You can use the tags from the table in the Tools section. If you require backups for multiple PostgreSQL databases within the same Amazon RDS DB instance, use -d test:-d test1 as the value for the bkp:pgdumpcommand tag. Important: test and test1 are database names. Make sure that there is no space after the colon (:).

Cloud administrator

Verify the backup automation.

To verify the backup automation, you can either invoke the Lambda function manually or wait for the backup schedule to begin. After the backup process is complete, check that the DynamoDB inventory table has a valid backup entry for each of your PostgreSQL DB instances. If it does, the backup automation process is successful.

Cloud administrator
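
The following commands are a rough sketch of the credential setup and the verification step, with placeholder names. The secret's JSON keys must be DB_USERNAME and DB_PASSWORD, because those are the keys that the sample entrypoint.sh file reads.

# Create a KMS key and use it to encrypt the database credentials stored in Secrets Manager.
KEY_ID=$(aws kms create-key --description "RDS backup credentials" --query KeyMetadata.KeyId --output text)
aws secretsmanager create-secret \
    --name mydb-credentials \
    --kms-key-id "$KEY_ID" \
    --secret-string '{"DB_USERNAME":"postgres","DB_PASSWORD":"replace-me"}'

# After a scheduled run (or a manual Lambda invocation), confirm that the inventory table
# received a backup entry.
aws dynamodb scan --table-name rds-postgres-dump-inventory --max-items 5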

Related resources

  • Create an inventory table in DynamoDB

  • Create an SNS topic for failed job events in AWS Batch

  • Build a Docker image and push it to an Amazon ECR repository

  • Create the AWS Batch components

  • Create a Lambda function

  • Create a CloudWatch Events event

  • Test the backup automation

Additional information

Failed job event:

{ "detail-type": [ "Batch Job State Change" ], "source": [ "aws.batch" ], "detail": { "status": [ "FAILED" ] } }

Sample Dockerfile:

FROM alpine:latest

RUN apk --update add py-pip postgresql-client jq bash && \
    pip install awscli && \
    rm -rf /var/cache/apk/*

ADD entrypoint.sh /usr/bin/
RUN chmod +x /usr/bin/entrypoint.sh

ENTRYPOINT ["entrypoint.sh"]

Sample entrypoint.sh file:

#!/bin/bash
set -e

# Expected environment variables (supplied through the AWS Batch job definition or container
# overrides): RDS_INSTANCE_NAME, PGSQL_HOST, SECRETID, BUCKET, REGION, EXECUTE_COMMAND,
# RDS_POSTGRES_DUMP_INVENTORY_TABLE, TargetAccountId, TargetAccountRoleName.

DATETIME=`date +"%Y-%m-%d_%H_%M"`
FILENAME=RDS_PostGres_dump_${RDS_INSTANCE_NAME}
FILE=${FILENAME}_${DATETIME}

# Configure a profile that assumes the target account role by using the ECS container credentials.
aws configure --profile new-profile set role_arn arn:aws:iam::${TargetAccountId}:role/${TargetAccountRoleName}
aws configure --profile new-profile set credential_source EcsContainer

echo "Central Account access provider IAM role is: "
aws sts get-caller-identity

echo "Target Customer Account access provider IAM role is: "
aws sts get-caller-identity --profile new-profile

# Retrieve the database credentials from Secrets Manager.
securestring=$(aws secretsmanager get-secret-value --secret-id $SECRETID --output json --query 'SecretString' --region=$REGION --profile new-profile)

if [[ ${securestring} ]]; then
    echo "successfully accessed secrets manager and got the credentials"
    export PGPASSWORD=$(echo $securestring | jq --raw-output '.' | jq -r '.DB_PASSWORD')
    PGSQL_USER=$(echo $securestring | jq --raw-output '.' | jq -r '.DB_USERNAME')

    echo "Executing pg_dump for the PostGres endpoint ${PGSQL_HOST}"
    # pg_dump -h $PGSQL_HOST -U $PGSQL_USER -n dms_sample | gzip -9 -c | aws s3 cp - --region=$REGION --profile new-profile s3://$BUCKET/$FILE
    # in="-n public:-n private"

    # EXECUTE_COMMAND holds colon-separated pg_dump arguments, one entry per database (for example, "-d test:-d test1").
    IFS=':' list=($EXECUTE_COMMAND)
    for command in "${list[@]}";
    do
        echo $command
        pg_dump -h $PGSQL_HOST -U $PGSQL_USER ${command} | gzip -9 -c | aws s3 cp - --region=$REGION --profile new-profile s3://${BUCKET}/${FILE}-${command}".sql.gz"
        dump_status=$?
        echo $dump_status
        if [[ $dump_status -ne 0 ]]; then
            echo "Error occurred in database backup process. Exiting now....."
            exit 1
        else
            echo "Postgresql dump was successfully taken for the RDS endpoint ${PGSQL_HOST} and is uploaded to the following S3 location s3://${BUCKET}/${FILE}-${command}.sql.gz"
            # Write the details into the inventory table in the central account.
            echo "Writing to DynamoDB inventory table"
            aws dynamodb put-item --table-name ${RDS_POSTGRES_DUMP_INVENTORY_TABLE} --region=$REGION --item '{ "accountId": { "S": "'"${TargetAccountId}"'" }, "dumpFileUrl": {"S": "'"s3://${BUCKET}/${FILE}-${command}.sql.gz"'" }, "DumpAvailableTime": {"S": "'"`date +"%Y-%m-%d::%H::%M::%S"` UTC"'"}}'
            put_status=$?
            echo $put_status
            if [[ $put_status -ne 0 ]]; then
                echo "Error occurred while putting item to DynamoDb Inventory Table. Exiting now....."
                exit 1
            else
                echo "Successfully written to DynamoDb Inventory Table ${RDS_POSTGRES_DUMP_INVENTORY_TABLE}"
            fi
        fi
    done
else
    echo "Something went wrong {$?}"
    exit 1
fi

exec "$@"