Exporting Data from an Aurora PostgreSQL DB Cluster to Amazon S3

You can query data from an Aurora PostgreSQL DB cluster and export it directly into files stored in an Amazon S3 bucket. To do this, you use the aws_s3 PostgreSQL extension that Aurora PostgreSQL provides.

Note

To export data to Amazon S3 from Aurora PostgreSQL, your database must be running one of the following PostgreSQL engine versions:

  • PostgreSQL version 10.11 or higher

  • PostgreSQL version 11.6 or higher
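
You can confirm the engine version that your cluster reports by running the following in psql. This is a minimal check; it shows only the PostgreSQL-compatible version.

psql=> SHOW server_version;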

For more information on storing data with Amazon S3, see Create a Bucket in the Amazon Simple Storage Service Getting Started Guide.

Overview of Exporting Data to Amazon S3

To export data stored in an Aurora PostgreSQL database to an Amazon S3 bucket, use the following procedure.

To export Aurora PostgreSQL data to S3

  1. Install the required PostgreSQL extensions. These include the aws_s3 and aws_commons extensions. To do so, start psql and use the following commands.

    CREATE EXTENSION IF NOT EXISTS aws_s3 CASCADE;

    The aws_s3 extension provides the aws_s3.query_export_to_s3 function that you use to export data to Amazon S3. The aws_commons extension is included to provide additional helper functions. For a quick way to confirm that both extensions are installed, see the sketch following this procedure.

  2. Identify an Amazon S3 file path to use for exporting data. For details about this process, see Specifying the Amazon S3 File Path to Export to.

  3. Provide permission to access the Amazon S3 bucket.

    To export data to an Amazon S3 file, give the Aurora PostgreSQL DB cluster permission to access the Amazon S3 bucket that the export will use for storage. Doing this includes the following steps:

    1. Create an IAM policy that provides access to an Amazon S3 bucket that you want to export to.

    2. Create an IAM role.

    3. Attach the policy you created to the role you created.

    4. Add this IAM role to your DB cluster.

    For details about this process, see Setting Up Access to an Amazon S3 Bucket.

  4. Identify a database query to get the data. Export the query data by calling the aws_s3.query_export_to_s3 function.

    After you complete the preceding preparation tasks, use the aws_s3.query_export_to_s3 function to export query results to Amazon S3. For details about this process, see Exporting Query Data Using the aws_s3.query_export_to_s3 Function.
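
As a quick check after step 1, you can confirm that both extensions are installed by querying the pg_extension catalog from psql. This is a minimal sketch; the versions reported depend on your engine release.

psql=> SELECT extname, extversion FROM pg_extension WHERE extname IN ('aws_s3', 'aws_commons');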

Specifying the Amazon S3 File Path to Export to

Specify the following information to identify the location in Amazon S3 where you want to export data to:

  • Bucket name – A bucket is a container for Amazon S3 objects or files.

    For more information on storing data with Amazon S3, see Create a Bucket and View an Object in the Amazon Simple Storage Service Getting Started Guide.

  • File path – The file path identifies where the export is stored in the Amazon S3 bucket. The file path consists of the following:

    • An optional path prefix that identifies a virtual folder path.

    • A file prefix that identifies one or more files to be stored. Larger exports are stored in multiple files, each with a maximum size of approximately 6 GB. The additional file names have the same file prefix but with _partXX appended. The XX represents 2, then 3, and so on.

    For example, a file path with an exports folder and a query-1-export file prefix is /exports/query-1-export.

  • AWS Region (optional) – The AWS Region where the Amazon S3 bucket is located. If you don't specify an AWS Region value, then Aurora saves your files into Amazon S3 in the same AWS Region as the exporting DB cluster.

    Note

    Currently, the AWS Region must be the same as the region of the exporting DB cluster.

    For a listing of AWS Region names and associated values, see Regions and Availability Zones.

To hold the Amazon S3 file information about where the export is to be stored, you can use the aws_commons.create_s3_uri function to create an aws_commons._s3_uri_1 composite structure as follows.

psql=> SELECT aws_commons.create_s3_uri( 'sample-bucket', 'sample-filepath', 'us-west-2' ) AS s3_uri_1 \gset

You later provide this s3_uri_1 value as a parameter in the call to the aws_s3.query_export_to_s3 function. For examples, see Exporting Query Data Using the aws_s3.query_export_to_s3 Function.
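
For example, to export to the exports/query-1-export path described earlier, you might build the structure as follows. This is a sketch only; sample-bucket is a placeholder bucket name, and the file path is given without a leading slash.

psql=> SELECT aws_commons.create_s3_uri( 'sample-bucket', 'exports/query-1-export', 'us-west-2' ) AS s3_uri_1 \gset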

Setting Up Access to an Amazon S3 Bucket

To export data to Amazon S3, give your Aurora PostgreSQL DB cluster permission to access the Amazon S3 bucket that the files will be stored in.

To do this, use the following procedure.

To give an Aurora PostgreSQL DB cluster access to Amazon S3 through an IAM role

  1. Create an IAM policy.

    This policy provides the bucket and object permissions that allow your Aurora PostgreSQL DB cluster to access Amazon S3.

    As part of creating this policy, take the following steps:

    1. Include in the policy the following required actions to allow the transfer of files from your Aurora PostgreSQL cluster to an Amazon S3 bucket:

      • s3:PutObject

      • s3:AbortMultipartUpload

    2. Include the Amazon Resource Name (ARN) that identifies the Amazon S3 bucket and objects in the bucket. The ARN format for accessing Amazon S3 is: arn:aws:s3:::your-s3-bucket/*

    For more information on creating an IAM policy for Aurora PostgreSQL, see Creating and Using an IAM Policy for IAM Database Access. See also Tutorial: Create and Attach Your First Customer Managed Policy in the IAM User Guide.

    The following AWS CLI command creates an IAM policy named rds-s3-export-policy with these options. It grants access to a bucket named your-s3-bucket.

    Warning

    We recommend that you set up your database within a private VPC that has endpoint policies configured for accessing specific buckets. For more information, see Using Endpoint Policies for Amazon S3 in the Amazon VPC User Guide.

    We strongly recommend that you do not create a policy with all-resource access. This access can pose a threat to data security. If you create a policy that gives s3:PutObject access to all resources using "Resource": "*", then a user with export privileges can export data to all buckets in your account. In addition, the user can export data to any publicly writable bucket within your AWS Region.

    After you create the policy, note the Amazon Resource Name (ARN) of the policy. You need the ARN for a subsequent step when you attach the policy to an IAM role.

    aws iam create-policy  --policy-name rds-s3-export-policy  --policy-document '{
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "s3export",
                "Action": [
                    "s3:PutObject",
                    "s3:AbortMultipartUpload"
                ],
                "Effect": "Allow",
                "Resource": [
                    "arn:aws:s3:::your-s3-bucket/*"
                ]
            }
        ]
    }'
  2. Create an IAM role.

    You do this so Aurora PostgreSQL can assume this IAM role on your behalf to access your Amazon S3 buckets. For more information, see Creating a Role to Delegate Permissions to an IAM User in the IAM User Guide.

    The following example shows using the AWS CLI command to create a role named rds-s3-export-role.

    aws iam create-role  --role-name rds-s3-export-role  --assume-role-policy-document '{
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Service": "rds.amazonaws.com"
                },
                "Action": "sts:AssumeRole"
            }
        ]
    }'
  3. Attach the IAM policy that you created to the IAM role that you created.

    The following AWS CLI command attaches the policy created earlier to the role named rds-s3-export-role. Replace your-policy-arn with the policy ARN that you noted in an earlier step.

    aws iam attach-role-policy --policy-arn your-policy-arn --role-name rds-s3-export-role
  4. Add the IAM role to the DB cluster. You do so by using the AWS Management Console or AWS CLI, as described following.

To add an IAM role for a PostgreSQL DB cluster using the console

  1. Sign in to the AWS Management Console and open the Amazon RDS console at https://console.aws.amazon.com/rds/.

  2. Choose the PostgreSQL DB cluster name to display its details.

  3. On the Connectivity & security tab, in the Manage IAM roles section, choose the role to add under Add IAM roles to this instance.

  4. Under Feature, choose s3Export.

  5. Choose Add role.

To add an IAM role for a PostgreSQL DB cluster using the CLI

  • Use the following command to add the role to the PostgreSQL DB cluster named my-db-cluster. Replace your-role-arn with the role ARN that you noted in a previous step. Use s3Export for the value of the --feature-name option.

    Example

    For Linux, macOS, or Unix:

    aws rds add-role-to-db-cluster \
       --db-cluster-identifier my-db-cluster \
       --feature-name s3Export \
       --role-arn your-role-arn \
       --region your-region

    For Windows:

    aws rds add-role-to-db-cluster ^
       --db-cluster-identifier my-db-cluster ^
       --feature-name s3Export ^
       --role-arn your-role-arn ^
       --region your-region
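
    To confirm that the role is associated with the DB cluster, one option is to list the cluster's associated roles with the AWS CLI, as in the following sketch. The my-db-cluster identifier is the same placeholder used above.

    aws rds describe-db-clusters --db-cluster-identifier my-db-cluster --query 'DBClusters[0].AssociatedRoles'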

Exporting Query Data Using the aws_s3.query_export_to_s3 Function

Export your PostgreSQL data to Amazon S3 by calling the aws_s3.query_export_to_s3 function.

Prerequisites

Before you use the aws_s3.query_export_to_s3 function, be sure to complete the prerequisites described in Overview of Exporting Data to Amazon S3: install the required PostgreSQL extensions, identify an Amazon S3 file path to export to, and give the DB cluster permission to access the Amazon S3 bucket.

The following examples use a database table called sample_table. These examples export the data into a bucket called sample-bucket. The example table and data are created with the following SQL statements in psql.

psql=> CREATE TABLE sample_table (bid bigint PRIMARY KEY, name varchar(80));
psql=> INSERT INTO sample_table (bid,name) VALUES (1, 'Monday'), (2,'Tuesday'), (3, 'Wednesday');

Calling aws_s3.query_export_to_s3

The following shows the basic ways of calling the aws_s3.query_export_to_s3 function. Although the parameters vary, the results are the same for these examples. All rows of the sample_table table are exported into a bucket called sample-bucket.

psql=> SELECT * from aws_s3.query_export_to_s3('select * from sample_table', :'s3_uri_1');
psql=> SELECT * from aws_s3.query_export_to_s3('select * from sample_table', :'s3_uri_1', options :='format text');

The parameters are described as follows:

  • 'select * from sample_table' – The first parameter is a required text string containing an SQL query. The PostgreSQL engine runs this query. The results of the query are copied to the S3 bucket identified in other parameters.

  • :'s3_uri_1' – This parameter is a structure that contains the information identifying the Amazon S3 file. Use the aws_commons.create_s3_uri function to create the structure as shown following.

    psql=> SELECT aws_commons.create_s3_uri( 'sample-bucket', 'sample-filepath', 'us-west-2' ) AS s3_uri_1 \gset

    Or you can include the aws_commons.create_s3_uri function call inline within the aws_s3.query_export_to_s3 function call.

    SELECT * from aws_s3.query_export_to_s3('select * from sample_table', aws_commons.create_s3_uri('sample-bucket', 'sample-filepath', 'us-west-2') );
  • options :='format text' – The options parameter is an optional text string containing PostgreSQL COPY arguments. The copy process uses the arguments and format of the PostgreSQL COPY command.

If the file specified doesn't exist in the Amazon S3 bucket, it's created. If the file already exists, it's overwritten. The syntax for accessing the exported data in Amazon S3 is the following.

s3-region://bucket-name[/path-prefix]/file-prefix

Larger exports are stored in multiple files, each with a maximum size of approximately 6 GB. The additional file names have the same file prefix but with _partXX appended. The XX represents 2, then 3, and so on. For example, suppose that you specify the path where you store data files as the following.

s3-us-west-2://my-bucket/my-prefix

If the export has to create three data files, the Amazon S3 bucket contains the following data files.

s3-us-west-2://my-bucket/my-prefix
s3-us-west-2://my-bucket/my-prefix_part2
s3-us-west-2://my-bucket/my-prefix_part3

For the full reference for this function and additional ways to call it, see aws_s3.query_export_to_s3. For more about accessing files in Amazon S3, see View an Object in the Amazon Simple Storage Service Getting Started Guide.

Exporting to a CSV File That Uses a Custom Delimiter

The following example shows how to call the aws_s3.query_export_to_s3 function to export data to a file that uses a custom delimiter. The example uses arguments of the PostgreSQL COPY command to specify the comma-separated value (CSV) format and a colon (:) delimiter.

SELECT * from aws_s3.query_export_to_s3('select * from sample_table', :'s3_uri_1', options :='format csv, delimiter $$:$$');
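
Because the options parameter is passed through as PostgreSQL COPY arguments, you can combine COPY options. The following sketch is an assumption based on standard COPY behavior rather than an example from this guide; it adds a header row to the CSV output.

SELECT * from aws_s3.query_export_to_s3('select * from sample_table', :'s3_uri_1', options :='format csv, header true');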

Exporting to a Binary File with Encoding

The following example shows how to call the aws_s3.query_export_to_s3 function to export data to a binary file that has Windows-1253 encoding.

SELECT * from aws_s3.query_export_to_s3('select * from sample_table', :'s3_uri_1', options :='format binary, encoding WIN1253');

Troubleshooting Access to Amazon S3

If you encounter connection problems when attempting to export data to Amazon S3, first verify that the IAM policy, IAM role, and any VPC endpoint policies described in Setting Up Access to an Amazon S3 Bucket are configured correctly.

Function Reference

aws_s3.query_export_to_s3

Exports a PostgreSQL query result to an Amazon S3 bucket. The aws_s3 extension provides the aws_s3.query_export_to_s3 function.

The two required parameters are query and s3_info. These define the query to be exported and identify the Amazon S3 bucket to export to. An optional parameter called options provides for defining various export parameters. For examples of using the aws_s3.query_export_to_s3 function, see Exporting Query Data Using the aws_s3.query_export_to_s3 Function.

Syntax

aws_s3.query_export_to_s3( query text, s3_info aws_commons._s3_uri_1, options text )

Input Parameters

query

A required text string containing an SQL query that the PostgreSQL engine runs. The results of this query are copied to an S3 bucket identified in the s3_info parameter.

s3_info

An aws_commons._s3_uri_1 composite type containing the following information about the S3 object:

  • bucket – The name of the Amazon S3 bucket to contain the file.

  • file_path – The Amazon S3 path of the file.

  • region – The AWS Region that the bucket is in. For a listing of AWS Region names and associated values, see Regions and Availability Zones.

    Currently, this value must be the same AWS Region as that of the exporting DB cluster. The default is the AWS Region of the exporting DB cluster.

To create an aws_commons._s3_uri_1 composite structure, see the aws_commons.create_s3_uri function.

options

An optional text string containing arguments for the PostgreSQL COPY command. These arguments specify how the data is to be copied when exported. For more details, see the PostgreSQL COPY documentation.

Alternate Input Parameters

To help with testing, you can use an expanded set of parameters instead of the s3_info parameter. Following are additional syntax variations for the aws_s3.query_export_to_s3 function.

Instead of using the s3_info parameter to identify an Amazon S3 file, use the combination of the bucket, file_path, and region parameters.

aws_s3.query_export_to_s3( query text, bucket text, file_path text, region text, options text )
query

A required text string containing an SQL query that the PostgreSQL engine runs. The results of this query are copied to the S3 bucket identified by the bucket and file_path parameters.

bucket

A required text string containing the name of the Amazon S3 bucket that contains the file.

file_path

A required text string containing the Amazon S3 path of the file.

region

An optional text string containing the AWS Region that the bucket is in. For a listing of AWS Region names and associated values, see Regions and Availability Zones.

Currently, this value must be the same AWS Region as that of the exporting DB cluster. The default is the AWS Region of the exporting DB cluster.

options

An optional text string containing arguments for the PostgreSQL COPY command. These arguments specify how the data is to be copied when exported. For more details, see the PostgreSQL COPY documentation.

Output Parameters

aws_s3.query_export_to_s3( OUT rows_uploaded bigint, OUT files_uploaded bigint, OUT bytes_uploaded bigint )
rows_uploaded

The number of table rows that were successfully uploaded to Amazon S3 for the given query.

files_uploaded

The number of files uploaded to Amazon S3. Files are created with a maximum size of approximately 6 GB each. Each additional file created has _partXX appended to the name. The XX represents 2, then 3, and so on as needed.

bytes_uploaded

The total number of bytes uploaded to Amazon S3.

Examples

psql=> SELECT * from aws_s3.query_export_to_s3('select * from sample_table', 'sample-bucket', 'sample-filepath');
psql=> SELECT * from aws_s3.query_export_to_s3('select * from sample_table', 'sample-bucket', 'sample-filepath','us-west-2');
psql=> SELECT * from aws_s3.query_export_to_s3('select * from sample_table', 'sample-bucket', 'sample-filepath','us-west-2','format text');
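
You can also select the output columns by name if you want to capture only the counts, as in the following sketch. It reuses the s3_uri_1 variable created earlier with \gset.

psql=> SELECT rows_uploaded, files_uploaded, bytes_uploaded FROM aws_s3.query_export_to_s3('select * from sample_table', :'s3_uri_1');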

aws_commons.create_s3_uri

Creates an aws_commons._s3_uri_1 structure to hold Amazon S3 file information. You use the results of the aws_commons.create_s3_uri function in the s3_info parameter of the aws_s3.query_export_to_s3 function. For an example of using the aws_commons.create_s3_uri function, see Specifying the Amazon S3 File Path to Export to.

Syntax

aws_commons.create_s3_uri( bucket text, file_path text, region text )

Input Parameters

bucket

A required text string containing the Amazon S3 bucket name for the file.

file_path

A required text string containing the Amazon S3 path of the file.

region

A required text string containing the AWS Region that the file is in. For a listing of AWS Region names and associated values, see Regions and Availability Zones.
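
As a quick sanity check, you can expand the composite value that aws_commons.create_s3_uri returns directly in psql. This is a sketch only; the exact field names come from the aws_commons._s3_uri_1 type definition.

psql=> SELECT (aws_commons.create_s3_uri('sample-bucket', 'sample-filepath', 'us-west-2')).*;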