Exporting data from an Aurora PostgreSQL DB cluster to Amazon S3 - Amazon Aurora

Exporting data from an Aurora PostgreSQL DB cluster to Amazon S3

You can query data from an Aurora PostgreSQL DB cluster and export it directly into files stored in an Amazon S3 bucket. To do this, you first install the Aurora PostgreSQL aws_s3 extension. This extension provides you with the functions that you use to export the results of queries to Amazon S3. Following, you can find out how to install the extension and how to export data to Amazon S3.

You can export from a provisioned or an Aurora Serverless v2 DB instance. These steps aren't supported for Aurora Serverless v1.

All currently available versions of Aurora PostgreSQL support exporting data to Amazon Simple Storage Service. For detailed version information, see Amazon Aurora PostgreSQL updates in the Release Notes for Aurora PostgreSQL.

If you don't have a bucket set up for your export, see the following topics the Amazon Simple Storage Service User Guide.

The upload to Amazon S3 uses server-side encryption by default. If you are using encryption, the Amazon S3 bucket must be encrypted with an AWS managed key. Currently, you can't export data to a bucket that is encrypted with a customer managed key.

Note

You can save DB and DB cluster snapshot data to Amazon S3 using the AWS Management Console, AWS CLI, or Amazon RDS API. For more information, see Exporting DB cluster snapshot data to Amazon S3.

Installing the aws_s3 extension

Before you can use Amazon Simple Storage Service with your Aurora PostgreSQL DB cluster, you need to install the aws_s3 extension. This extension provides functions for exporting data from the writer instance of an Aurora PostgreSQL DB cluster to an Amazon S3 bucket. It also provides functions for importing data from an Amazon S3. For more information, see Importing data from Amazon S3 into an Aurora PostgreSQL DB cluster. The aws_s3 extension depends on some of the helper functions in the aws_commons extension, which is installed automatically when needed.

To install the aws_s3 extension

  1. Use psql (or pgAdmin) to connect to the writer instance of your Aurora PostgreSQL DB cluster as a user that has rds_superuser privileges. If you kept the default name during the setup process, you connect as postgres.

    psql --host=111122223333.aws-region.rds.amazonaws.com --port=5432 --username=postgres --password
  2. To install the extension, run the following command.

    postgres=> CREATE EXTENSION aws_s3 CASCADE; NOTICE: installing required extension "aws_commons" CREATE EXTENSION
  3. To verify that the extension is installed, you can use the psql \dx metacommand.

    postgres=> \dx List of installed extensions Name | Version | Schema | Description -------------+---------+------------+--------------------------------------------- aws_commons | 1.2 | public | Common data types across AWS services aws_s3 | 1.1 | public | AWS S3 extension for importing data from S3 plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language (3 rows)

The functions for importing data from Amazon S3 and exporting data to Amazon S3 are now available to use.

Verify that your Aurora PostgreSQL version supports exports to Amazon S3

You can verify that your Aurora PostgreSQL version supports export to Amazon S3 by using the describe-db-engine-versions command. The following example checks to see if version 10.14 can export to Amazon S3.

aws rds describe-db-engine-versions --region us-east-1 \ --engine aurora-postgresql --engine-version 10.14 | grep s3Export

If the output includes the string "s3Export", then the engine supports Amazon S3 exports. Otherwise, the engine doesn't support them.

Overview of exporting data to Amazon S3

To export data stored in an Aurora PostgreSQL database to an Amazon S3 bucket, use the following procedure.

To export Aurora PostgreSQL data to S3

  1. Identify an Amazon S3 file path to use for exporting data. For details about this process, see Specifying the Amazon S3 file path to export to.

  2. Provide permission to access the Amazon S3 bucket.

    To export data to an Amazon S3 file, give the Aurora PostgreSQL DB cluster permission to access the Amazon S3 bucket that the export will use for storage. Doing this includes the following steps:

    1. Create an IAM policy that provides access to an Amazon S3 bucket that you want to export to.

    2. Create an IAM role.

    3. Attach the policy you created to the role you created.

    4. Add this IAM role to your DB cluster .

    For details about this process, see Setting up access to an Amazon S3 bucket.

  3. Identify a database query to get the data. Export the query data by calling the aws_s3.query_export_to_s3 function.

    After you complete the preceding preparation tasks, use the aws_s3.query_export_to_s3 function to export query results to Amazon S3. For details about this process, see Exporting query data using the aws_s3.query_export_to_s3 function.

Specifying the Amazon S3 file path to export to

Specify the following information to identify the location in Amazon S3 where you want to export data to:

  • Bucket name – A bucket is a container for Amazon S3 objects or files.

    For more information on storing data with Amazon S3, see Create a bucket and View an object in the Amazon Simple Storage Service User Guide.

  • File path – The file path identifies where the export is stored in the Amazon S3 bucket. The file path consists of the following:

    • An optional path prefix that identifies a virtual folder path.

    • A file prefix that identifies one or more files to be stored. Larger exports are stored in multiple files, each with a maximum size of approximately 6 GB. The additional file names have the same file prefix but with _partXX appended. The XX represents 2, then 3, and so on.

    For example, a file path with an exports folder and a query-1-export file prefix is /exports/query-1-export.

  • AWS Region (optional) – The AWS Region where the Amazon S3 bucket is located. If you don't specify an AWS Region value, then Aurora saves your files into Amazon S3 in the same AWS Region as the exporting DB cluster.

    Note

    Currently, the AWS Region must be the same as the region of the exporting DB cluster.

    For a listing of AWS Region names and associated values, see Regions and Availability Zones.

To hold the Amazon S3 file information about where the export is to be stored, you can use the aws_commons.create_s3_uri function to create an aws_commons._s3_uri_1 composite structure as follows.

psql=> SELECT aws_commons.create_s3_uri( 'sample-bucket', 'sample-filepath', 'us-west-2' ) AS s3_uri_1 \gset

You later provide this s3_uri_1 value as a parameter in the call to the aws_s3.query_export_to_s3 function. For examples, see Exporting query data using the aws_s3.query_export_to_s3 function.

Setting up access to an Amazon S3 bucket

To export data to Amazon S3, give your PostgreSQL DB cluster permission to access the Amazon S3 bucket that the files are to go in.

To do this, use the following procedure.

To give a PostgreSQL DB cluster access to Amazon S3 through an IAM role

  1. Create an IAM policy.

    This policy provides the bucket and object permissions that allow your PostgreSQL DB cluster to access Amazon S3.

    As part of creating this policy, take the following steps:

    1. Include in the policy the following required actions to allow the transfer of files from your PostgreSQL DB cluster to an Amazon S3 bucket:

      • s3:PutObject

      • s3:AbortMultipartUpload

    2. Include the Amazon Resource Name (ARN) that identifies the Amazon S3 bucket and objects in the bucket. The ARN format for accessing Amazon S3 is: arn:aws:s3:::your-s3-bucket/*

    For more information on creating an IAM policy for Aurora PostgreSQL, see Creating and using an IAM policy for IAM database access. See also Tutorial: Create and attach your first customer managed policy in the IAM User Guide.

    The following AWS CLI command creates an IAM policy named rds-s3-export-policy with these options. It grants access to a bucket named your-s3-bucket.

    Warning

    We recommend that you set up your database within a private VPC that has endpoint policies configured for accessing specific buckets. For more information, see Using endpoint policies for Amazon S3 in the Amazon VPC User Guide.

    We strongly recommend that you do not create a policy with all-resource access. This access can pose a threat for data security. If you create a policy that gives S3:PutObject access to all resources using "Resource":"*", then a user with export privileges can export data to all buckets in your account. In addition, the user can export data to any publicly writable bucket within your AWS Region.

    After you create the policy, note the Amazon Resource Name (ARN) of the policy. You need the ARN for a subsequent step when you attach the policy to an IAM role.

    aws iam create-policy --policy-name rds-s3-export-policy --policy-document '{ "Version": "2012-10-17", "Statement": [ { "Sid": "s3export", "Action": [ "S3:PutObject" ], "Effect": "Allow", "Resource": [ "arn:aws:s3:::your-s3-bucket/*" ] } ] }'
  2. Create an IAM role.

    You do this so Aurora PostgreSQL can assume this IAM role on your behalf to access your Amazon S3 buckets. For more information, see Creating a role to delegate permissions to an IAM user in the IAM User Guide.

    We recommend using the aws:SourceArn and aws:SourceAccount global condition context keys in resource-based policies to limit the service's permissions to a specific resource. This is the most effective way to protect against the confused deputy problem.

    If you use both global condition context keys and the aws:SourceArn value contains the account ID, the aws:SourceAccount value and the account in the aws:SourceArn value must use the same account ID when used in the same policy statement.

    • Use aws:SourceArn if you want cross-service access for a single resource.

    • Use aws:SourceAccount if you want to allow any resource in that account to be associated with the cross-service use.

    In the policy, be sure to use the aws:SourceArn global condition context key with the full ARN of the resource. The following example shows how to do so using the AWS CLI command to create a role named rds-s3-export-role.

    Example

    For Linux, macOS, or Unix:

    aws iam create-role \ --role-name rds-s3-export-role \ --assume-role-policy-document '{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "Service": "rds.amazonaws.com" }, "Action": "sts:AssumeRole", "Condition": { "StringEquals": { "aws:SourceAccount": "111122223333", "aws:SourceArn": "arn:aws:rds:us-east-1:111122223333:db:dbname" } } } ] }'

    For Windows:

    aws iam create-role ^ --role-name rds-s3-export-role ^ --assume-role-policy-document '{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "Service": "rds.amazonaws.com" }, "Action": "sts:AssumeRole", "Condition": { "StringEquals": { "aws:SourceAccount": "111122223333", "aws:SourceArn": "arn:aws:rds:us-east-1:111122223333:db:dbname" } } } ] }'
  3. Attach the IAM policy that you created to the IAM role that you created.

    The following AWS CLI command attaches the policy created earlier to the role named rds-s3-export-role. Replace your-policy-arn with the policy ARN that you noted in an earlier step.

    aws iam attach-role-policy --policy-arn your-policy-arn --role-name rds-s3-export-role
  4. Add the IAM role to the DB cluster. You do so by using the AWS Management Console or AWS CLI, as described following.

To add an IAM role for a PostgreSQL DB cluster using the console

  1. Sign in to the AWS Management Console and open the Amazon RDS console at https://console.aws.amazon.com/rds/.

  2. Choose the PostgreSQL DB cluster name to display its details.

  3. On the Connectivity & security tab, in the Manage IAM roles section, choose the role to add under Add IAM roles to this instance.

  4. Under Feature, choose s3Export.

  5. Choose Add role.

To add an IAM role for a PostgreSQL DB cluster using the CLI

  • Use the following command to add the role to the PostgreSQL DB cluster named my-db-cluster. Replace your-role-arn with the role ARN that you noted in a previous step. Use s3Export for the value of the --feature-name option.

    Example

    For Linux, macOS, or Unix:

    aws rds add-role-to-db-cluster \ --db-cluster-identifier my-db-cluster \ --feature-name s3Export \ --role-arn your-role-arn \ --region your-region

    For Windows:

    aws rds add-role-to-db-cluster ^ --db-cluster-identifier my-db-cluster ^ --feature-name s3Export ^ --role-arn your-role-arn ^ --region your-region

Exporting query data using the aws_s3.query_export_to_s3 function

Export your PostgreSQL data to Amazon S3 by calling the aws_s3.query_export_to_s3 function.

Prerequisites

Before you use the aws_s3.query_export_to_s3 function, be sure to complete the following prerequisites:

The examples following use a database table called sample_table. These examples export the data into a bucket called sample-bucket. The example table and data are created with the following SQL statements in psql.

psql=> CREATE TABLE sample_table (bid bigint PRIMARY KEY, name varchar(80)); psql=> INSERT INTO sample_table (bid,name) VALUES (1, 'Monday'), (2,'Tuesday'), (3, 'Wednesday');

Calling aws_s3.query_export_to_s3

The following shows the basic ways of calling the aws_s3.query_export_to_s3 function.

These examples use the variable s3_uri_1 to identify a structure that contains the information identifying the Amazon S3 file. Use the aws_commons.create_s3_uri function to create the structure.

psql=> SELECT aws_commons.create_s3_uri( 'sample-bucket', 'sample-filepath', 'us-west-2' ) AS s3_uri_1 \gset

Although the parameters vary for the following two aws_s3.query_export_to_s3 function calls, the results are the same for these examples. All rows of the sample_table table are exported into a bucket called sample-bucket.

psql=> SELECT * FROM aws_s3.query_export_to_s3('SELECT * FROM sample_table', :'s3_uri_1'); psql=> SELECT * FROM aws_s3.query_export_to_s3('SELECT * FROM sample_table', :'s3_uri_1', options :='format text');

The parameters are described as follows:

  • 'SELECT * FROM sample_table' – The first parameter is a required text string containing an SQL query. The PostgreSQL engine runs this query. The results of the query are copied to the S3 bucket identified in other parameters.

  • :'s3_uri_1' – This parameter is a structure that identifies the Amazon S3 file. This example uses a variable to identify the previously created structure. You can instead create the structure by including the aws_commons.create_s3_uri function call inline within the aws_s3.query_export_to_s3 function call as follows.

    SELECT * from aws_s3.query_export_to_s3('select * from sample_table', aws_commons.create_s3_uri('sample-bucket', 'sample-filepath', 'us-west-2') );
  • options :='format text' – The options parameter is an optional text string containing PostgreSQL COPY arguments. The copy process uses the arguments and format of the PostgreSQL COPY command.

If the file specified doesn't exist in the Amazon S3 bucket, it's created. If the file already exists, it's overwritten. The syntax for accessing the exported data in Amazon S3 is the following.

s3-region://bucket-name[/path-prefix]/file-prefix

Larger exports are stored in multiple files, each with a maximum size of approximately 6 GB. The additional file names have the same file prefix but with _partXX appended. The XX represents 2, then 3, and so on. For example, suppose that you specify the path where you store data files as the following.

s3-us-west-2://my-bucket/my-prefix

If the export has to create three data files, the Amazon S3 bucket contains the following data files.

s3-us-west-2://my-bucket/my-prefix s3-us-west-2://my-bucket/my-prefix_part2 s3-us-west-2://my-bucket/my-prefix_part3

For the full reference for this function and additional ways to call it, see aws_s3.query_export_to_s3. For more about accessing files in Amazon S3, see View an object in the Amazon Simple Storage Service User Guide.

Exporting to a CSV file that uses a custom delimiter

The following example shows how to call the aws_s3.query_export_to_s3 function to export data to a file that uses a custom delimiter. The example uses arguments of the PostgreSQL COPY command to specify the comma-separated value (CSV) format and a colon (:) delimiter.

SELECT * from aws_s3.query_export_to_s3('select * from basic_test', :'s3_uri_1', options :='format csv, delimiter $$:$$');

Exporting to a binary file with encoding

The following example shows how to call the aws_s3.query_export_to_s3 function to export data to a binary file that has Windows-1253 encoding.

SELECT * from aws_s3.query_export_to_s3('select * from basic_test', :'s3_uri_1', options :='format binary, encoding WIN1253');

Troubleshooting access to Amazon S3

If you encounter connection problems when attempting to export data to Amazon S3, first confirm that the outbound access rules for the VPC security group associated with your DB instance permit network connectivity. Specifically, the security group must have a rule that allows the DB instance to send TCP traffic to port 443 and to any IPv4 addresses (0.0.0.0/0). For more information, see Provide access to the DB cluster in the VPC by creating a security group.

See also the following for recommendations:

Function reference

aws_s3.query_export_to_s3

Exports a PostgreSQL query result to an Amazon S3 bucket. The aws_s3 extension provides the aws_s3.query_export_to_s3 function.

The two required parameters are query and s3_info. These define the query to be exported and identify the Amazon S3 bucket to export to. An optional parameter called options provides for defining various export parameters. For examples of using the aws_s3.query_export_to_s3 function, see Exporting query data using the aws_s3.query_export_to_s3 function.

Syntax

aws_s3.query_export_to_s3( query text, s3_info aws_commons._s3_uri_1, options text )

Input parameters

query

A required text string containing an SQL query that the PostgreSQL engine runs. The results of this query are copied to an S3 bucket identified in the s3_info parameter.

s3_info

An aws_commons._s3_uri_1 composite type containing the following information about the S3 object:

  • bucket – The name of the Amazon S3 bucket to contain the file.

  • file_path – The Amazon S3 file name and path.

  • region – The AWS Region that the bucket is in. For a listing of AWS Region names and associated values, see Regions and Availability Zones.

    Currently, this value must be the same AWS Region as that of the exporting DB cluster . The default is the AWS Region of the exporting DB cluster .

To create an aws_commons._s3_uri_1 composite structure, see the aws_commons.create_s3_uri function.

options

An optional text string containing arguments for the PostgreSQL COPY command. These arguments specify how the data is to be copied when exported. For more details, see the PostgreSQL COPY documentation.

Alternate input parameters

To help with testing, you can use an expanded set of parameters instead of the s3_info parameter. Following are additional syntax variations for the aws_s3.query_export_to_s3 function.

Instead of using the s3_info parameter to identify an Amazon S3 file, use the combination of the bucket, file_path, and region parameters.

aws_s3.query_export_to_s3( query text, bucket text, file_path text, region text, options text )
query

A required text string containing an SQL query that the PostgreSQL engine runs. The results of this query are copied to an S3 bucket identified in the s3_info parameter.

bucket

A required text string containing the name of the Amazon S3 bucket that contains the file.

file_path

A required text string containing the Amazon S3 file name including the path of the file.

region

An optional text string containing the AWS Region that the bucket is in. For a listing of AWS Region names and associated values, see Regions and Availability Zones.

Currently, this value must be the same AWS Region as that of the exporting DB cluster . The default is the AWS Region of the exporting DB cluster .

options

An optional text string containing arguments for the PostgreSQL COPY command. These arguments specify how the data is to be copied when exported. For more details, see the PostgreSQL COPY documentation.

Output parameters

aws_s3.query_export_to_s3( OUT rows_uploaded bigint, OUT files_uploaded bigint, OUT bytes_uploaded bigint )
rows_uploaded

The number of table rows that were successfully uploaded to Amazon S3 for the given query.

files_uploaded

The number of files uploaded to Amazon S3. Files are created in sizes of approximately 6 GB. Each additional file created has _partXX appended to the name. The XX represents 2, then 3, and so on as needed.

bytes_uploaded

The total number of bytes uploaded to Amazon S3.

Examples

psql=> SELECT * from aws_s3.query_export_to_s3('select * from sample_table', 'sample-bucket', 'sample-filepath'); psql=> SELECT * from aws_s3.query_export_to_s3('select * from sample_table', 'sample-bucket', 'sample-filepath','us-west-2'); psql=> SELECT * from aws_s3.query_export_to_s3('select * from sample_table', 'sample-bucket', 'sample-filepath','us-west-2','format text');

aws_commons.create_s3_uri

Creates an aws_commons._s3_uri_1 structure to hold Amazon S3 file information. You use the results of the aws_commons.create_s3_uri function in the s3_info parameter of the aws_s3.query_export_to_s3 function. For an example of using the aws_commons.create_s3_uri function, see Specifying the Amazon S3 file path to export to.

Syntax

aws_commons.create_s3_uri( bucket text, file_path text, region text )

Input parameters

bucket

A required text string containing the Amazon S3 bucket name for the file.

file_path

A required text string containing the Amazon S3 file name including the path of the file.

region

A required text string containing the AWS Region that the file is in. For a listing of AWS Region names and associated values, see Regions and Availability Zones.