Exporting DB snapshot data to Amazon S3

You can export DB snapshot data to an Amazon S3 bucket. After the data is exported, you can analyze the exported data directly through tools like Amazon Athena or Amazon Redshift Spectrum. The export process runs in the background and doesn't affect the performance of your active DB cluster.

When you export a DB snapshot, Amazon Aurora extracts data from the snapshot and stores it in an Amazon S3 bucket in your account. The data is stored in an Apache Parquet format that is compressed and consistent.

You can export manual snapshots and automated system snapshots. By default, all data in the snapshot is exported. However, you can choose to export specific sets of databases, schemas, or tables.

Amazon RDS supports exporting snapshots in all AWS Regions except the following:

  • AWS GovCloud (US-East)

  • AWS GovCloud (US-West)

The following table shows the Aurora MySQL engine versions that are supported for exporting snapshot data to Amazon S3. For more information about Aurora MySQL engine versions, see Database engine updates for Amazon Aurora MySQL.

Aurora MySQL version MySQL-compatible version
2.04.6 5.7
2.04.5 5.7
2.04.4 5.7
1.19.5 5.6
1.19.4 5.6
1.19.3 5.6
1.19.2 5.6

The following table shows the Aurora PostgreSQL engine versions that are supported for exporting snapshot data to Amazon S3. For more information about Aurora PostgreSQL engine versions, see Engine versions for Amazon Aurora PostgreSQL.

Aurora PostgreSQL version PostgreSQL-compatible version
2.3 10.7
2.2 10.6
1.5 9.6.12
1.4 9.6.11
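
To confirm that your cluster runs one of these versions before you start an export, you can check the engine version with the AWS CLI. The following is a minimal sketch; my-aurora-cluster is a placeholder cluster identifier.

aws rds describe-db-clusters \
    --db-cluster-identifier my-aurora-cluster \
    --query 'DBClusters[0].EngineVersion'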

Overview of exporting snapshot data

The following procedure provides a high-level view of how to export DB snapshot data to an Amazon S3 bucket. For more details, see the following sections.

To export DB snapshot data to Amazon S3

  1. Identify the snapshot to export.

    Use an existing automated or manual snapshot, or create a manual snapshot of a DB cluster, as shown in the sketch following this procedure.

  2. Set up access to the Amazon S3 bucket.

    A bucket is a container for Amazon S3 objects or files. To provide the information to access a bucket, take the following steps:

    1. Identify the S3 bucket that the snapshot is to be exported to. The S3 bucket must be in the same AWS Region as the snapshot. For more information, see Identifying the Amazon S3 bucket for export.

    2. Create an AWS Key Management Service (AWS KMS) customer master key (CMK) for the server-side encryption. The AWS KMS CMK is used by the snapshot export task to set up AWS KMS server-side encryption when writing the export data to S3. For more information, see Encrypting Amazon Aurora resources.

    3. Create an AWS Identity and Access Management (IAM) role that grants the snapshot export task access to the S3 bucket. For more information, see Providing access to an Amazon S3 bucket using an IAM role.

  3. Export the snapshot to Amazon S3 using the console or the start-export-task CLI command. For more information, see Exporting a snapshot to an Amazon S3 bucket.

  4. To access your exported data in the Amazon S3 bucket, see Uploading, downloading, and managing objects in the Amazon Simple Storage Service Console User Guide.
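
The following AWS CLI sketch illustrates steps 1 and 2b under assumed placeholder names (my-aurora-cluster, my-cluster-snapshot); the remaining setup steps are covered in detail in the sections below.

# Step 1 (if needed): create a manual snapshot of the DB cluster.
aws rds create-db-cluster-snapshot \
    --db-cluster-identifier my-aurora-cluster \
    --db-cluster-snapshot-identifier my-cluster-snapshot

# Step 2b: create an AWS KMS CMK for server-side encryption of the export.
# Note the key ARN in the output; you supply it when you start the export task.
aws kms create-key --description "Aurora snapshot export key"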

Setting up access to an Amazon S3 bucket

To export DB snapshot data to an Amazon S3 bucket, you first give the snapshot permission to access the bucket. You then create an IAM role to allow the Amazon Aurora service to write to the Amazon S3 bucket.

Identifying the Amazon S3 bucket for export

Identify the Amazon S3 bucket to export the DB snapshot to. Use an existing S3 bucket or create a new S3 bucket.

Note

The S3 bucket to export to must be in the same AWS Region as the snapshot.

For more information about working with Amazon S3 buckets, see the Amazon Simple Storage Service Console User Guide.
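
If you need a new bucket, you can create one with the AWS CLI. The following is a minimal sketch; your-s3-bucket and us-west-2 are placeholders, and the --create-bucket-configuration option is required for all Regions except us-east-1.

aws s3api create-bucket \
    --bucket your-s3-bucket \
    --region us-west-2 \
    --create-bucket-configuration LocationConstraint=us-west-2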

Providing access to an Amazon S3 bucket using an IAM role

Before you export DB snapshot data to Amazon S3, give the snapshot export tasks write-access permission to the Amazon S3 bucket.

To do this, create an IAM policy that provides access to the bucket. Then create an IAM role and attach the policy to the role. You later assign the IAM role to your snapshot export task.

Important

If you plan to use the AWS Management Console to export your snapshot, you can choose to create the IAM policy and the role automatically when you export the snapshot. For instructions, see Exporting a snapshot to an Amazon S3 bucket.

To give DB snapshot tasks access to Amazon S3

  1. Create an IAM policy. This policy provides the bucket and object permissions that allow your snapshot export task to access Amazon S3.

    Include in the policy the following required actions to allow the transfer of files from Amazon Aurora to an S3 bucket:

    • s3:PutObject*

    • s3:GetObject*

    • s3:ListBucket

    • s3:DeleteObject*

    • s3:GetBucketLocation

    Include in the policy the following resources to identify the S3 bucket and objects in the bucket. The following list of resources shows the Amazon Resource Name (ARN) format for accessing Amazon S3.

    • arn:aws:s3:::your-s3-bucket

    • arn:aws:s3:::your-s3-bucket/*

    For more information on creating an IAM policy for Amazon Aurora, see Creating and using an IAM policy for IAM database access. See also Tutorial: Create and attach your first customer managed policy in the IAM User Guide.

    The following AWS CLI command creates an IAM policy named ExportPolicy with these options. It grants access to a bucket named your-s3-bucket.

    Note

    After you create the policy, note the ARN of the policy. You need the ARN for a subsequent step when you attach the policy to an IAM role.

    aws iam create-policy --policy-name ExportPolicy --policy-document '{
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "s3:ListBucket",
                    "s3:GetBucketLocation"
                ],
                "Resource": [
                    "arn:aws:s3:::*"
                ]
            },
            {
                "Effect": "Allow",
                "Action": [
                    "s3:PutObject*",
                    "s3:GetObject*",
                    "s3:DeleteObject*"
                ],
                "Resource": [
                    "arn:aws:s3:::your-s3-bucket",
                    "arn:aws:s3:::your-s3-bucket/*"
                ]
            }
        ]
    }'
  2. Create an IAM role. You do this so that Aurora can assume this IAM role on your behalf to access your Amazon S3 buckets. For more information, see Creating a role to delegate permissions to an IAM user in the IAM User Guide.

    The following example shows how to use the AWS CLI to create a role named rds-s3-export-role.

    aws iam create-role --role-name rds-s3-export-role --assume-role-policy-document '{
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Service": "export.rds.amazonaws.com"
                },
                "Action": "sts:AssumeRole"
            }
        ]
    }'
  3. Attach the IAM policy that you created to the IAM role that you created.

    The following AWS CLI command attaches the policy created earlier to the role named rds-s3-export-role. Replace your-policy-arn with the policy ARN that you noted in an earlier step.

    aws iam attach-role-policy --policy-arn your-policy-arn --role-name rds-s3-export-role
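
To confirm the setup, you can inspect the role with the AWS CLI. A quick sketch, using the names from the previous steps:

# The trust policy in the output should allow export.rds.amazonaws.com.
aws iam get-role --role-name rds-s3-export-role

# The attached policies in the output should include ExportPolicy.
aws iam list-attached-role-policies --role-name rds-s3-export-role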

Exporting a snapshot to an Amazon S3 bucket

You can have up to five concurrent DB snapshot export tasks in progress per account.

Note

Exporting RDS snapshots can take a while depending on your database type and size. The export task first restores and scales the entire database before extracting the data to Amazon S3. The task's progress during this phase displays as Starting. When the task switches to exporting data to S3, progress displays as In progress.

The time it takes for the export to complete depends on the data stored in the database. For example, tables with well-distributed numeric primary key or index columns export the fastest. Tables that don't contain a column suitable for partitioning, and tables with only one index on a string-based column, take longer because the export uses a slower, single-threaded process.

You can export a DB snapshot to Amazon S3 using the AWS Management Console, the AWS CLI, or the RDS API.

The Export to Amazon S3 console option appears only for snapshots that can be exported to Amazon S3. A snapshot might not be available for export because of the following reasons:

  • The DB engine isn't supported for S3 export.

  • The DB instance version isn't supported for S3 export.

  • S3 export isn't supported in the AWS Region where the snapshot was created.

To export a DB snapshot

  1. Sign in to the AWS Management Console and open the Amazon RDS console at https://console.aws.amazon.com/rds/.

  2. In the navigation pane, choose Snapshots.

  3. From the tabs, choose the type of snapshot that you want to export.

  4. In the list of snapshots, choose the snapshot that you want to export.

  5. For Actions, choose Export to Amazon S3.

    The Export to Amazon S3 window appears.

  6. For Export identifier, enter a name to identify the export task. This value is also used for the name of the file created in the S3 bucket.

  7. Choose the amount of data to be exported:

    • Choose All to export all data in the snapshot.

    • Choose Partial to export specific parts of the snapshot. To identify which parts of the snapshot to export, enter one or more tables for Identifiers.

  8. For S3 bucket, choose the bucket to export to.

    To assign the exported data to a folder path in the S3 bucket, enter the optional path for S3 prefix.

  9. For IAM role, either choose a role that grants you write access to your chosen S3 bucket, or create a new role.

    • If you created a role by following the steps in Providing access to an Amazon S3 bucket using an IAM role, choose that role.

    • If you didn't create a role that grants you write access to your chosen S3 bucket, choose Create a new role to create the role automatically. Next, enter a name for the role in IAM role name.

  10. For Master key, enter the ARN for the key to use for encrypting the exported data.

  11. Choose Export to Amazon S3.

To export a DB snapshot to Amazon S3 using the AWS CLI, use the start-export-task command with the following required options:

  • --export-task-identifier

  • --source-arn

  • --s3-bucket-name

  • --iam-role-arn

  • --kms-key-id

In the following examples, the snapshot export task is named my_snapshot_export and exports a snapshot to an S3 bucket named my_export_bucket.

For Linux, macOS, or Unix:

aws rds start-export-task \
    --export-task-identifier my_snapshot_export \
    --source-arn arn:aws:rds:AWS_Region:123456789012:snapshot:snapshot_name \
    --s3-bucket-name my_export_bucket \
    --iam-role-arn iam_role \
    --kms-key-id master_key

For Windows:

aws rds start-export-task ^
    --export-task-identifier my_snapshot_export ^
    --source-arn arn:aws:rds:AWS_Region:123456789012:snapshot:snapshot_name ^
    --s3-bucket-name my_export_bucket ^
    --iam-role-arn iam_role ^
    --kms-key-id master_key

Sample output follows.

{ "Status": "STARTING", "IamRoleArn": "iam_role", "ExportTime": "2019-08-12T01:23:53.109Z", "S3Bucket": "my_export_bucket", "PercentProgress": 0, "KmsKeyId": "master_key", "ExportTaskIdentifier": "my_snapshot_export", "TotalExtractedDataInGB": 0, "TaskStartTime": "2019-11-13T19:46:00.173Z", "SourceArn": "arn:aws:rds:AWS_Region:123456789012:snapshot:snapshot_name" }

To provide a folder path in the S3 bucket for the snapshot export, include the --s3-prefix option in the start-export-task command.
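
For example, the following variant of the earlier command writes the export under the folder path my_prefix and uses the optional --export-only option to limit the export to a single database. The name my_database is a placeholder; you can also pass more specific identifiers, such as database.schema or database.schema.table.

aws rds start-export-task \
    --export-task-identifier my_snapshot_export \
    --source-arn arn:aws:rds:AWS_Region:123456789012:snapshot:snapshot_name \
    --s3-bucket-name my_export_bucket \
    --s3-prefix my_prefix \
    --iam-role-arn iam_role \
    --kms-key-id master_key \
    --export-only my_database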

To export a DB snapshot to Amazon S3 using the Amazon RDS API, use the StartExportTask operation with the following required parameters:

  • ExportTaskIdentifier

  • SourceArn

  • S3BucketName

  • IamRoleArn

  • KmsKeyId

Monitoring snapshot exports

You can monitor DB snapshot exports using the AWS Management Console, the AWS CLI, or the RDS API.

To monitor DB snapshot exports

  1. Sign in to the AWS Management Console and open the Amazon RDS console at https://console.aws.amazon.com/rds/.

  2. In the navigation pane, choose Snapshots.

  3. To view the list of snapshot exports, choose the Exports in Amazon S3 tab.

  4. To view information about a specific snapshot export, choose the export task.

To monitor DB snapshot exports using the AWS CLI, use the describe-export-tasks command.

The following example shows how to display current information about all of your snapshot exports.

aws rds describe-export-tasks

{
    "ExportTasks": [
        {
            "Status": "CANCELED",
            "TaskEndTime": "2019-11-01T17:36:46.961Z",
            "S3Prefix": "something",
            "ExportTime": "2019-10-24T20:23:48.364Z",
            "S3Bucket": "examplebucket",
            "PercentProgress": 0,
            "KmsKeyId": "arn:aws:kms:AWS_Region:123456789012:key/K7MDENG/bPxRfiCYEXAMPLEKEY",
            "ExportTaskIdentifier": "anewtest",
            "IamRoleArn": "arn:aws:iam::123456789012:role/export-to-s3",
            "TotalExtractedDataInGB": 0,
            "TaskStartTime": "2019-10-25T19:10:58.885Z",
            "SourceArn": "arn:aws:rds:AWS_Region:123456789012:snapshot:parameter-groups-test"
        },
        {
            "Status": "COMPLETE",
            "TaskEndTime": "2019-10-31T21:37:28.312Z",
            "WarningMessage": "{\"skippedTables\":[],\"skippedObjectives\":[],\"general\":[{\"reason\":\"FAILED_TO_EXTRACT_TABLES_LIST_FOR_DATABASE\"}]}",
            "S3Prefix": "",
            "ExportTime": "2019-10-31T06:44:53.452Z",
            "S3Bucket": "examplebucket1",
            "PercentProgress": 100,
            "KmsKeyId": "arn:aws:kms:AWS_Region:123456789012:key/2Zp9Utk/h3yCo8nvbEXAMPLEKEY",
            "ExportTaskIdentifier": "thursday-events-test",
            "IamRoleArn": "arn:aws:iam::123456789012:role/export-to-s3",
            "TotalExtractedDataInGB": 263,
            "TaskStartTime": "2019-10-31T20:58:06.998Z",
            "SourceArn": "arn:aws:rds:AWS_Region:123456789012:snapshot:rds:example-1-2019-10-31-06-44"
        },
        {
            "Status": "FAILED",
            "TaskEndTime": "2019-10-31T02:12:36.409Z",
            "FailureCause": "The S3 bucket edgcuc-export isn't located in the current AWS Region. Please, review your S3 bucket name and retry the export.",
            "S3Prefix": "",
            "ExportTime": "2019-10-30T06:45:04.526Z",
            "S3Bucket": "examplebucket2",
            "PercentProgress": 0,
            "KmsKeyId": "arn:aws:kms:AWS_Region:123456789012:key/2Zp9Utk/h3yCo8nvbEXAMPLEKEY",
            "ExportTaskIdentifier": "wednesday-afternoon-test",
            "IamRoleArn": "arn:aws:iam::123456789012:role/export-to-s3",
            "TotalExtractedDataInGB": 0,
            "TaskStartTime": "2019-10-30T22:43:40.034Z",
            "SourceArn": "arn:aws:rds:AWS_Region:123456789012:snapshot:rds:example-1-2019-10-30-06-45"
        }
    ]
}

To display information about a specific snapshot export, include the --export-task-identifier option with the describe-export-tasks command. To filter the output, include the --filters option. For more options, see the describe-export-tasks command.
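
For example, the following sketch checks the status and progress of the export task started earlier, using the global --query option to trim the output:

aws rds describe-export-tasks \
    --export-task-identifier my_snapshot_export \
    --query 'ExportTasks[0].[Status,PercentProgress]'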

To display information about DB snapshot exports using the Amazon RDS API, use the DescribeExportTasks operation.

To track completion of the export workflow or to trigger another workflow, you can subscribe to Amazon Simple Notification Service topics. For more information on Amazon SNS, see Using Amazon RDS event notification.

Canceling a snapshot export task

You can cancel a DB snapshot export task using the AWS Management Console, the AWS CLI, or the RDS API.

Note

Canceling a snapshot export task doesn't remove any data that was exported to Amazon S3. For information about how to delete the data using the console, see How do I delete objects from an S3 bucket? To delete the data using the CLI, use the delete-object command.
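
For example, the following sketch removes everything a canceled export wrote under an assumed prefix my_prefix; the higher-level aws s3 rm command shown here deletes recursively, while delete-object removes one object at a time.

aws s3 rm s3://your-s3-bucket/my_prefix/ --recursive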

To cancel a snapshot export task

  1. Sign in to the AWS Management Console and open the Amazon RDS console at https://console.aws.amazon.com/rds/.

  2. In the navigation pane, choose Snapshots.

  3. Choose the Exports in Amazon S3 tab.

  4. Choose the snapshot export task that you want to cancel.

  5. Choose Cancel.

  6. Choose Cancel export task on the confirmation page.

To cancel a snapshot export task using the AWS CLI, use the cancel-export-task command. The command requires the --export-task-identifier option.

aws rds cancel-export-task --export-task-identifier my_export

{
    "Status": "CANCELING",
    "S3Prefix": "",
    "ExportTime": "2019-08-12T01:23:53.109Z",
    "S3Bucket": "examplebucket",
    "PercentProgress": 0,
    "KmsKeyId": "arn:aws:kms:AWS_Region:123456789012:key/K7MDENG/bPxRfiCYEXAMPLEKEY",
    "ExportTaskIdentifier": "my_export",
    "IamRoleArn": "arn:aws:iam::123456789012:role/export-to-s3",
    "TotalExtractedDataInGB": 0,
    "TaskStartTime": "2019-11-13T19:46:00.173Z",
    "SourceArn": "arn:aws:rds:AWS_Region:123456789012:snapshot:export-example-1"
}

To cancel a snapshot export task using the Amazon RDS API, use the CancelExportTask operation with the ExportTaskIdentifier parameter.

Troubleshooting PostgreSQL permissions errors

When exporting PostgreSQL databases to Amazon S3, you might see a PERMISSIONS_DO_NOT_EXIST error stating that certain tables were skipped. This error usually occurs because the superuser, which you specify when creating the DB instance, doesn't have permissions to access those tables.

To fix this error, run the following command:

GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA schema_name TO superuser_name

For more information on superuser privileges, see Master user account privileges.
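
If you prefer to run the fix from a shell, the following is a minimal sketch using psql; the endpoint, database, schema, and user names are placeholders.

psql --host=mycluster.cluster-abc123.us-west-2.rds.amazonaws.com \
     --username=master_username \
     --dbname=my_database \
     --command="GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA schema_name TO master_username;"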

Data conversion when exporting to an Amazon S3 bucket

When you export a DB snapshot to an Amazon S3 bucket, Amazon Aurora converts the data to the Parquet format, then exports and stores it in that format. For more information about Parquet, see the Apache Parquet website.

Parquet stores all data as one of the following primitive types:

  • BOOLEAN

  • INT32

  • INT64

  • INT96

  • FLOAT

  • DOUBLE

  • BYTE_ARRAY – A variable-length byte array, also known as binary

  • FIXED_LEN_BYTE_ARRAY – A fixed-length byte array used when the values have a constant size

Parquet keeps its set of primitive data types small to reduce the complexity of reading and writing the format. Parquet provides logical types for extending primitive types. A logical type is implemented as an annotation with the data in a LogicalType metadata field. The logical type annotation explains how to interpret the primitive type.

When the STRING logical type annotates a BYTE_ARRAY type, it indicates that the byte array should be interpreted as a UTF-8 encoded character string. After an export task completes, Amazon Aurora notifies you if any string conversion occurred. The underlying data exported is always the same as the data from the source. However, due to encoding differences in UTF-8, some characters might appear different from the source when read in tools such as Athena.

For more information, see Parquet logical type definitions in the Parquet documentation.

Note
  • Some characters aren't supported in database table column names. Tables with the following characters in column names are skipped during export.

    ,;{}()\n\t=
  • If the data contains a value close to or greater than 500 MB, the export fails.

  • If the database, schema, or table name contains spaces, partial export isn't supported. However, you can export the entire DB snapshot.

MySQL data type mapping to Parquet

The following table shows the mapping from MySQL data types to Parquet data types when data is converted and exported to Amazon S3.

Source data type Parquet primitive type Logical type annotation Conversion notes
Numeric data types
BIGINT INT64
BIGINT UNSIGNED FIXED_LEN_BYTE_ARRAY(9) DECIMAL(20,0) Parquet supports only signed types, so the mapping requires an additional byte (8 plus 1) to store the BIGINT_UNSIGNED type.
BIT BYTE_ARRAY
DECIMAL INT32 DECIMAL(p,s) If the source value is less than 2^31, it's stored as INT32.
INT64 DECIMAL(p,s) If the source value is 2^31 or greater, but less than 2^63, it's stored as INT64.
FIXED_LEN_BYTE_ARRAY(N) DECIMAL(p,s) If the source value is 2^63 or greater, it's stored as FIXED_LEN_BYTE_ARRAY(N).
BYTE_ARRAY STRING Parquet doesn't support Decimal precision greater than 38. The Decimal value is converted to a string in a BYTE_ARRAY type and encoded as UTF8.
DOUBLE DOUBLE
FLOAT DOUBLE
INT INT32
INT UNSIGNED INT64
MEDIUMINT INT32
MEDIUMINT UNSIGNED INT64
NUMERIC INT32 DECIMAL(p,s) If the source value is less than 2^31, it's stored as INT32.
INT64 DECIMAL(p,s) If the source value is 2^31 or greater, but less than 2^63, it's stored as INT64.
FIXED_LEN_BYTE_ARRAY(N) DECIMAL(p,s) If the source value is 2^63 or greater, it's stored as FIXED_LEN_BYTE_ARRAY(N).
BYTE_ARRAY STRING Parquet doesn't support Numeric precision greater than 38. This Numeric value is converted to a string in a BYTE_ARRAY type and encoded as UTF8.
SMALLINT INT32
SMALLINT UNSIGNED INT32
TINYINT INT32
TINYINT UNSIGNED INT32
String data types
BINARY BYTE_ARRAY
BLOB BYTE_ARRAY
CHAR BYTE_ARRAY
ENUM BYTE_ARRAY STRING
LINESTRING BYTE_ARRAY
LONGBLOB BYTE_ARRAY
LONGTEXT BYTE_ARRAY STRING
MEDIUMBLOB BYTE_ARRAY
MEDIUMTEXT BYTE_ARRAY STRING
MULTILINESTRING BYTE_ARRAY
SET BYTE_ARRAY STRING
TEXT BYTE_ARRAY STRING
TINYBLOB BYTE_ARRAY
TINYTEXT BYTE_ARRAY STRING
VARBINARY BYTE_ARRAY
VARCHAR BYTE_ARRAY STRING
Date and time data types
DATE BYTE_ARRAY STRING A date is converted to a string in a BYTE_ARRAY type and encoded as UTF8.
DATETIME INT64 TIMESTAMP_MICROS
TIME BYTE_ARRAY STRING A TIME type is converted to a string in a BYTE_ARRAY and encoded as UTF8.
TIMESTAMP INT64 TIMESTAMP_MICROS
YEAR INT32
Geometric data types
GEOMETRY BYTE_ARRAY
GEOMETRYCOLLECTION BYTE_ARRAY
MULTIPOINT BYTE_ARRAY
MULTIPOLYGON BYTE_ARRAY
POINT BYTE_ARRAY
POLYGON BYTE_ARRAY
JSON data type
JSON BYTE_ARRAY STRING

PostgreSQL data type mapping to Parquet

The following table shows the mapping from PostgreSQL data types to Parquet data types when data is converted and exported to Amazon S3.

PostgreSQL data type Parquet primitive type Logical type annotation Mapping notes
Numeric data types
BIGINT INT64
BIGSERIAL INT64
DECIMAL BYTE_ARRAY STRING A DECIMAL type is converted to a string in a BYTE_ARRAY type and encoded as UTF8. This conversion is to avoid complications due to data precision and data values that are not a number (NaN).
DOUBLE PRECISION DOUBLE
INTEGER INT32
MONEY BYTE_ARRAY STRING
REAL FLOAT
SERIAL INT32
SMALLINT INT32 INT_16
SMALLSERIAL INT32 INT_16
String and related data types
ARRAY BYTE_ARRAY STRING An array is converted to a string and encoded as BINARY (UTF8). This conversion is to avoid complications due to data precision, data values that are not a number (NaN), and time data values.
BIT BYTE_ARRAY STRING
BIT VARYING BYTE_ARRAY STRING
BYTEA BINARY
CHAR BYTE_ARRAY STRING
CHAR(N) BYTE_ARRAY STRING
ENUM BYTE_ARRAY STRING
NAME BYTE_ARRAY STRING
TEXT BYTE_ARRAY STRING
TEXT SEARCH BYTE_ARRAY STRING
VARCHAR(N) BYTE_ARRAY STRING
XML BYTE_ARRAY STRING
Date and time data types
DATE BYTE_ARRAY STRING
INTERVAL BYTE_ARRAY STRING
TIME BYTE_ARRAY STRING
TIME WITH TIME ZONE BYTE_ARRAY STRING
TIMESTAMP BYTE_ARRAY STRING
TIMESTAMP WITH TIME ZONE BYTE_ARRAY STRING
Geometric data types
BOX BYTE_ARRAY STRING
CIRCLE BYTE_ARRAY STRING
LINE BYTE_ARRAY STRING
LINESEGMENT BYTE_ARRAY STRING
PATH BYTE_ARRAY STRING
POINT BYTE_ARRAY STRING
POLYGON BYTE_ARRAY STRING
JSON data types
JSON BYTE_ARRAY STRING
JSONB BYTE_ARRAY STRING
Other data types
BOOLEAN BOOLEAN
CIDR BYTE_ARRAY STRING Network data type
COMPOSITE BYTE_ARRAY STRING
DOMAIN BYTE_ARRAY STRING
INET BYTE_ARRAY STRING Network data type
MACADDR BYTE_ARRAY STRING
OBJECT IDENTIFIER N/A
PG_LSN BYTE_ARRAY STRING
RANGE BYTE_ARRAY STRING
UUID BYTE_ARRAY STRING