Importing Amazon S3 data into an Aurora PostgreSQL DB cluster - Amazon Aurora

Importing Amazon S3 data into an Aurora PostgreSQL DB cluster

You can import data from Amazon S3 into a table belonging to an Aurora PostgreSQL DB cluster. To do this, you use the aws_s3 PostgreSQL extension that Aurora PostgreSQL provides. Your database must be running PostgreSQL version 10.7 or higher to import from Amazon S3 into Aurora PostgreSQL.

If you are using encryption, the Amazon S3 bucket must be encrypted with an AWS managed key. Currently, you can't import data from a bucket that is encrypted with a customer managed key.

For more information on storing data with Amazon S3, see Create a bucket in the Amazon Simple Storage Service User Guide. For instructions on how to upload a file to an Amazon S3 bucket, see Add an object to a bucket in the Amazon Simple Storage Service User Guide.

Overview of importing Amazon S3 data

To import data stored in an Amazon S3 bucket to a PostgreSQL database table, follow these steps.

To import S3 data into Aurora PostgreSQL

  1. Install the required PostgreSQL extensions. These include the aws_s3 and aws_commons extensions. To do so, start psql and use the following command.

    psql=> CREATE EXTENSION aws_s3 CASCADE;
    NOTICE: installing required extension "aws_commons"

    The aws_s3 extension provides the aws_s3.table_import_from_s3 function that you use to import Amazon S3 data. The aws_commons extension provides additional helper functions.

  2. Identify the database table and Amazon S3 file to use.

    The aws_s3.table_import_from_s3 function requires the name of the PostgreSQL database table that you want to import data into. The function also requires that you identify the Amazon S3 file to import. To provide this information, take the following steps.

    1. Identify the PostgreSQL database table to put the data in. For example, the following is a sample t1 database table used in the examples for this topic.

      psql=> CREATE TABLE t1 (col1 varchar(80), col2 varchar(80), col3 varchar(80));
    2. Get the following information to identify the Amazon S3 file that you want to import:

      • Bucket name – A bucket is a container for Amazon S3 objects or files.

      • File path – The file path locates the file in the Amazon S3 bucket.

      • AWS Region – The AWS Region is the location of the Amazon S3 bucket. For example, if the S3 bucket is in the US East (N. Virginia) Region, use us-east-1. For a listing of AWS Region names and associated values, see Regions and Availability Zones.

      For details on how to get this information, see View an object in the Amazon Simple Storage Service User Guide. You can confirm the information by using the AWS CLI command aws s3 cp. If the information is correct, this command downloads a copy of the Amazon S3 file.

      aws s3 cp s3://sample_s3_bucket/sample_file_path ./
    3. Use the aws_commons.create_s3_uri function to create an aws_commons._s3_uri_1 structure to hold the Amazon S3 file information. You provide this aws_commons._s3_uri_1 structure as a parameter in the call to the aws_s3.table_import_from_s3 function.

      For a psql example, see the following.

      psql=> SELECT aws_commons.create_s3_uri(
         'sample_s3_bucket',
         'sample.csv',
         'us-east-1'
      ) AS s3_uri \gset
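      When generating this statement from application code, the three values should be quoted as PostgreSQL string literals. The following Python sketch shows one way to do that; the helper names quote_literal and s3_uri_sql are hypothetical, not part of the aws_s3 or aws_commons extensions.

      ```python
      def quote_literal(value: str) -> str:
          # Escape embedded single quotes by doubling them (PostgreSQL rule).
          return "'" + value.replace("'", "''") + "'"

      def s3_uri_sql(bucket: str, file_path: str, region: str) -> str:
          # Build the SELECT that creates an aws_commons._s3_uri_1 structure.
          args = ", ".join(quote_literal(v) for v in (bucket, file_path, region))
          return f"SELECT aws_commons.create_s3_uri({args}) AS s3_uri \\gset"

      print(s3_uri_sql("sample_s3_bucket", "sample.csv", "us-east-1"))
      ```

      Note that proper literal quoting matters even for trusted input: a file path containing a single quote would otherwise break the statement.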
  3. Provide permission to access the Amazon S3 file.

    To import data from an Amazon S3 file, give the Aurora PostgreSQL DB cluster permission to access the Amazon S3 bucket the file is in. To do this, you use either an AWS Identity and Access Management (IAM) role or security credentials. For more information, see Setting up access to an Amazon S3 bucket.

  4. Import the Amazon S3 data by calling the aws_s3.table_import_from_s3 function.

    After you complete the previous preparation tasks, use the aws_s3.table_import_from_s3 function to import the Amazon S3 data. For more information, see Using the aws_s3.table_import_from_s3 function to import Amazon S3 data.

Setting up access to an Amazon S3 bucket

To import data from an Amazon S3 file, give the Aurora PostgreSQL DB cluster permission to access the Amazon S3 bucket containing the file. You provide access to an Amazon S3 bucket in one of two ways, as described in the following topics.

Using an IAM role to access an Amazon S3 bucket

Before you load data from an Amazon S3 file, give your Aurora PostgreSQL DB cluster permission to access the Amazon S3 bucket the file is in. This way, you don't have to manage additional credential information or provide it in the aws_s3.table_import_from_s3 function call.

To do this, create an IAM policy that provides access to the Amazon S3 bucket. Create an IAM role and attach the policy to the role. Then assign the IAM role to your DB cluster.

To give an Aurora PostgreSQL DB cluster access to Amazon S3 through an IAM role

  1. Create an IAM policy.

    This policy provides the bucket and object permissions that allow your Aurora PostgreSQL DB cluster to access Amazon S3.

    Include in the policy the following required actions to allow the transfer of files from an Amazon S3 bucket to Aurora PostgreSQL:

    • s3:GetObject

    • s3:ListBucket

    Include in the policy the following resources to identify the Amazon S3 bucket and objects in the bucket. This shows the Amazon Resource Name (ARN) format for accessing Amazon S3.

    • arn:aws:s3:::your-s3-bucket

    • arn:aws:s3:::your-s3-bucket/*

    For more information on creating an IAM policy for Aurora PostgreSQL, see Creating and using an IAM policy for IAM database access. See also Tutorial: Create and attach your first customer managed policy in the IAM User Guide.

    The following AWS CLI command creates an IAM policy named rds-s3-import-policy with these options. It grants access to a bucket named your-s3-bucket.

    Note

    Note the Amazon Resource Name (ARN) of the policy returned by this command. You need the ARN when you attach the policy to an IAM role, in a subsequent step.

    Example

    For Linux, macOS, or Unix:

    aws iam create-policy \
       --policy-name rds-s3-import-policy \
       --policy-document '{
         "Version": "2012-10-17",
         "Statement": [
           {
             "Sid": "s3import",
             "Action": [
               "s3:GetObject",
               "s3:ListBucket"
             ],
             "Effect": "Allow",
             "Resource": [
               "arn:aws:s3:::your-s3-bucket",
               "arn:aws:s3:::your-s3-bucket/*"
             ]
           }
         ]
       }'

    For Windows:

    aws iam create-policy ^
       --policy-name rds-s3-import-policy ^
       --policy-document '{"Version": "2012-10-17", "Statement": [{"Sid": "s3import", "Action": ["s3:GetObject", "s3:ListBucket"], "Effect": "Allow", "Resource": ["arn:aws:s3:::your-s3-bucket", "arn:aws:s3:::your-s3-bucket/*"]}]}'
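    If you generate the policy document programmatically rather than pasting the JSON, a small sketch like the following keeps the bucket name in one place. The helper name s3_import_policy is hypothetical; the actions and ARN formats are the ones required above.

    ```python
    import json

    def s3_import_policy(bucket: str) -> str:
        # Minimal policy granting the two actions the S3 import feature needs.
        doc = {
            "Version": "2012-10-17",
            "Statement": [{
                "Sid": "s3import",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{bucket}",    # bucket-level ARN (s3:ListBucket)
                    f"arn:aws:s3:::{bucket}/*",  # object-level ARN (s3:GetObject)
                ],
            }],
        }
        return json.dumps(doc, indent=2)

    print(s3_import_policy("your-s3-bucket"))
    ```

    The bucket-level ARN (no trailing /*) is what s3:ListBucket checks against, while the object-level ARN covers s3:GetObject; both are required.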
  2. Create an IAM role.

    You do this so Aurora PostgreSQL can assume this IAM role to access your Amazon S3 buckets. For more information, see Creating a role to delegate permissions to an IAM user in the IAM User Guide.

    We recommend using the aws:SourceArn and aws:SourceAccount global condition context keys in resource-based policies to limit the service's permissions to a specific resource. This is the most effective way to protect against the confused deputy problem.

    If you use both global condition context keys and the aws:SourceArn value contains the account ID, the aws:SourceAccount value and the account in the aws:SourceArn value must use the same account ID when used in the same policy statement.

    • Use aws:SourceArn if you want cross-service access for a single resource.

    • Use aws:SourceAccount if you want to allow any resource in that account to be associated with the cross-service use.

    In the policy, be sure to use the aws:SourceArn global condition context key with the full ARN of the resource. The following example shows how to do so using the AWS CLI command to create a role named rds-s3-import-role.

    Example

    For Linux, macOS, or Unix:

    aws iam create-role \
       --role-name rds-s3-import-role \
       --assume-role-policy-document '{
         "Version": "2012-10-17",
         "Statement": [
           {
             "Effect": "Allow",
             "Principal": {
               "Service": "rds.amazonaws.com"
             },
             "Action": "sts:AssumeRole",
             "Condition": {
               "StringEquals": {
                 "aws:SourceAccount": "111122223333",
                 "aws:SourceArn": "arn:aws:rds:us-east-1:111122223333:db:dbname"
               }
             }
           }
         ]
       }'

    For Windows:

    aws iam create-role ^
       --role-name rds-s3-import-role ^
       --assume-role-policy-document '{"Version": "2012-10-17", "Statement": [{"Effect": "Allow", "Principal": {"Service": "rds.amazonaws.com"}, "Action": "sts:AssumeRole", "Condition": {"StringEquals": {"aws:SourceAccount": "111122223333", "aws:SourceArn": "arn:aws:rds:us-east-1:111122223333:db:dbname"}}}]}'
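    The trust policy can likewise be built in code so the account ID and DB instance ARN stay consistent with each other, which is exactly what the aws:SourceAccount/aws:SourceArn check requires. The helper name rds_trust_policy in this sketch is hypothetical.

    ```python
    import json

    def rds_trust_policy(account_id: str, db_arn: str) -> str:
        # Trust policy letting the RDS service assume the role, scoped with
        # aws:SourceAccount and aws:SourceArn to guard against the
        # confused deputy problem.
        doc = {
            "Version": "2012-10-17",
            "Statement": [{
                "Effect": "Allow",
                "Principal": {"Service": "rds.amazonaws.com"},
                "Action": "sts:AssumeRole",
                "Condition": {
                    "StringEquals": {
                        "aws:SourceAccount": account_id,
                        "aws:SourceArn": db_arn,
                    }
                },
            }],
        }
        return json.dumps(doc, indent=2)

    print(rds_trust_policy("111122223333",
                           "arn:aws:rds:us-east-1:111122223333:db:dbname"))
    ```

    Because both condition keys are literal strings, building them from the same account_id variable avoids the mismatch described above, where the account in aws:SourceArn differs from aws:SourceAccount.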
  3. Attach the IAM policy that you created to the IAM role that you created.

    The following AWS CLI command attaches the policy created in the previous step to the role named rds-s3-import-role. Replace your-policy-arn with the policy ARN that you noted in an earlier step.

    Example

    For Linux, macOS, or Unix:

    aws iam attach-role-policy \
       --policy-arn your-policy-arn \
       --role-name rds-s3-import-role

    For Windows:

    aws iam attach-role-policy ^
       --policy-arn your-policy-arn ^
       --role-name rds-s3-import-role
  4. Add the IAM role to the DB cluster.

    You do so by using the AWS Management Console or AWS CLI, as described following.

    Note

    You can't associate an IAM role with an Aurora Serverless DB cluster. For more information, see Using Amazon Aurora Serverless v1.

    Also, be sure the database you use doesn't have any restrictions noted in Importing Amazon S3 data into an Aurora PostgreSQL DB cluster.

To add an IAM role for a PostgreSQL DB cluster using the console

  1. Sign in to the AWS Management Console and open the Amazon RDS console at https://console.aws.amazon.com/rds/.

  2. Choose the PostgreSQL DB cluster name to display its details.

  3. On the Connectivity & security tab, in the Manage IAM roles section, choose the role to add under Add IAM roles to this cluster.

  4. Under Feature, choose s3Import.

  5. Choose Add role.

To add an IAM role for a PostgreSQL DB cluster using the CLI

  • Use the following command to add the role to the PostgreSQL DB cluster named my-db-cluster. Replace your-role-arn with the role ARN that you noted in a previous step. Use s3Import for the value of the --feature-name option.

    Example

    For Linux, macOS, or Unix:

    aws rds add-role-to-db-cluster \
       --db-cluster-identifier my-db-cluster \
       --feature-name s3Import \
       --role-arn your-role-arn \
       --region your-region

    For Windows:

    aws rds add-role-to-db-cluster ^
       --db-cluster-identifier my-db-cluster ^
       --feature-name s3Import ^
       --role-arn your-role-arn ^
       --region your-region

To add an IAM role for a PostgreSQL DB cluster using the Amazon RDS API, call the AddRoleToDBCluster operation.

Using security credentials to access an Amazon S3 bucket

If you prefer, you can use security credentials to provide access to an Amazon S3 bucket instead of providing access with an IAM role. To do this, use the credentials parameter in the aws_s3.table_import_from_s3 function call.

The credentials parameter is a structure of type aws_commons._aws_credentials_1, which contains AWS credentials. Use the aws_commons.create_aws_credentials function to set the access key and secret key in an aws_commons._aws_credentials_1 structure, as shown following.

psql=> SELECT aws_commons.create_aws_credentials(
   'sample_access_key',
   'sample_secret_key',
   ''
) AS creds \gset

After creating the aws_commons._aws_credentials_1 structure, use the aws_s3.table_import_from_s3 function with the credentials parameter to import the data, as shown following.

psql=> SELECT aws_s3.table_import_from_s3(
   't', '', '(format csv)',
   :'s3_uri',
   :'creds'
);

Or you can include the aws_commons.create_aws_credentials function call inline within the aws_s3.table_import_from_s3 function call.

psql=> SELECT aws_s3.table_import_from_s3(
   't', '', '(format csv)',
   :'s3_uri',
   aws_commons.create_aws_credentials('sample_access_key', 'sample_secret_key', '')
);

Troubleshooting access to Amazon S3

If you encounter connection problems when attempting to import Amazon S3 file data, first verify that the IAM role or security credentials are set up as described in Setting up access to an Amazon S3 bucket, and that the bucket name, file path, and AWS Region you specified are correct.

Using the aws_s3.table_import_from_s3 function to import Amazon S3 data

Import your Amazon S3 data by calling the aws_s3.table_import_from_s3 function.

Note

The following examples use the IAM role method for providing access to the Amazon S3 bucket. Thus, the aws_s3.table_import_from_s3 function calls don't include credential parameters.

The following shows a typical PostgreSQL example using psql.

psql=> SELECT aws_s3.table_import_from_s3(
   't1',
   '',
   '(format csv)',
   :'s3_uri'
);

The parameters are the following:

  • t1 – The name for the table in the PostgreSQL DB cluster to copy the data into.

  • '' – An optional list of columns in the database table. You can use this parameter to indicate which columns of the S3 data go in which table columns. If no columns are specified, all the columns are copied to the table. For an example of using a column list, see Importing an Amazon S3 file that uses a custom delimiter.

  • (format csv) – PostgreSQL COPY arguments. The copy process uses the arguments and format of the PostgreSQL COPY command. In the preceding example, the COPY command uses the comma-separated value (CSV) file format to copy the data.

  • s3_uri – A structure that contains the information identifying the Amazon S3 file. For an example of using the aws_commons.create_s3_uri function to create an s3_uri structure, see Overview of importing Amazon S3 data.

The return value is text. For the full reference of this function, see aws_s3.table_import_from_s3.

The following examples show how to specify different kinds of files when importing Amazon S3 data.

Importing an Amazon S3 file that uses a custom delimiter

The following example shows how to import a file that uses a custom delimiter. It also shows how to control where to put the data in the database table using the column_list parameter of the aws_s3.table_import_from_s3 function.

For this example, assume that the following information is organized into pipe-delimited columns in the Amazon S3 file.

1|foo1|bar1|elephant1
2|foo2|bar2|elephant2
3|foo3|bar3|elephant3
4|foo4|bar4|elephant4
...

To import a file that uses a custom delimiter

  1. Create a table in the database for the imported data.

    psql=> CREATE TABLE test (a text, b text, c text, d text, e text);
  2. Use the following form of the aws_s3.table_import_from_s3 function to import data from the Amazon S3 file.

    You can include the aws_commons.create_s3_uri function call inline within the aws_s3.table_import_from_s3 function call to specify the file.

    psql=> SELECT aws_s3.table_import_from_s3(
       'test',
       'a,b,d,e',
       'DELIMITER ''|''',
       aws_commons.create_s3_uri('sampleBucket', 'pipeDelimitedSampleFile', 'us-east-2')
    );

The data is now in the table in the following columns.

psql=> SELECT * FROM test;
 a |  b   | c |  d   |     e
---+------+---+------+-----------
 1 | foo1 |   | bar1 | elephant1
 2 | foo2 |   | bar2 | elephant2
 3 | foo3 |   | bar3 | elephant3
 4 | foo4 |   | bar4 | elephant4
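The effect of the 'a,b,d,e' column list can be sketched in plain Python: file fields are routed, in order, to the named table columns, and any column not listed (c here) is left NULL. This simulation is illustrative only; the actual mapping is performed server-side by the PostgreSQL COPY machinery.

```python
def route_columns(table_columns, column_list, row):
    # Assign file fields to the listed table columns, in order;
    # columns not named in column_list stay None (NULL).
    mapped = dict.fromkeys(table_columns)  # every column NULL by default
    for col, value in zip(column_list, row):
        mapped[col] = value
    return mapped

row = "1|foo1|bar1|elephant1".split("|")
print(route_columns(["a", "b", "c", "d", "e"], ["a", "b", "d", "e"], row))
# {'a': '1', 'b': 'foo1', 'c': None, 'd': 'bar1', 'e': 'elephant1'}
```

This is why the sample file needs exactly as many fields per line as there are names in the column list: COPY matches them positionally.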

Importing an Amazon S3 compressed (gzip) file

The following example shows how to import a file from Amazon S3 that is compressed with gzip.

Ensure that the file contains the following Amazon S3 metadata:

  • Key: Content-Encoding

  • Value: gzip

For more about adding these values to Amazon S3 metadata, see How do I add metadata to an S3 object? in the Amazon Simple Storage Service User Guide.

Import the gzip file into your Aurora PostgreSQL DB cluster as shown following.

psql=> CREATE TABLE test_gzip(id int, a text, b text, c text, d text);
psql=> SELECT aws_s3.table_import_from_s3(
   'test_gzip', '', '(format csv)',
   'myS3Bucket', 'test-data.gz', 'us-east-2'
);
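Preparing such a file locally can be done with the Python standard library, as in the sketch below. The upload step, which must set the Content-Encoding: gzip metadata on the object, would typically use the AWS CLI or an SDK and is shown only as a comment; the bucket and key names are the sample values from above.

```python
import csv
import gzip
import io

rows = [[1, "foo1", "bar1", "elephant1", "x"],
        [2, "foo2", "bar2", "elephant2", "y"]]

# Write the CSV rows into an in-memory gzip stream.
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
    text = io.TextIOWrapper(gz, encoding="utf-8", newline="")
    csv.writer(text).writerows(rows)
    text.detach()  # flush and release gz so the with-block can finalize it

payload = buf.getvalue()
# Write payload to test-data.gz, then upload with, for example:
#   aws s3 cp test-data.gz s3://myS3Bucket/test-data.gz --content-encoding gzip
```

Without the Content-Encoding metadata, the import function treats the object as plain text and the COPY fails on the compressed bytes.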

Importing an encoded Amazon S3 file

The following example shows how to import a file from Amazon S3 that has Windows-1252 encoding.

psql=> SELECT aws_s3.table_import_from_s3(
   'test_table', '', 'encoding ''WIN1252''',
   aws_commons.create_s3_uri('sampleBucket', 'SampleFile', 'us-east-2')
);

Function reference

aws_s3.table_import_from_s3

Imports Amazon S3 data into an Aurora PostgreSQL table. The aws_s3 extension provides the aws_s3.table_import_from_s3 function. The return value is text.

Syntax

The required parameters are table_name, column_list, and options. These identify the database table and specify how the data is copied into the table.

You can also use the following parameters:

  • The s3_info parameter specifies the Amazon S3 file to import. When you use this parameter, access to Amazon S3 is provided by an IAM role for the PostgreSQL DB cluster.

    aws_s3.table_import_from_s3 ( table_name text, column_list text, options text, s3_info aws_commons._s3_uri_1 )
  • The credentials parameter specifies the credentials to access Amazon S3. When you use this parameter, you don't use an IAM role.

    aws_s3.table_import_from_s3 ( table_name text, column_list text, options text, s3_info aws_commons._s3_uri_1, credentials aws_commons._aws_credentials_1 )

Parameters

table_name

A required text string containing the name of the PostgreSQL database table to import the data into.

column_list

A required text string containing an optional list of the PostgreSQL database table columns in which to copy the data. If the string is empty, all columns of the table are used. For an example, see Importing an Amazon S3 file that uses a custom delimiter.

options

A required text string containing arguments for the PostgreSQL COPY command. These arguments specify how the data is to be copied into the PostgreSQL table. For more details, see the PostgreSQL COPY documentation.

s3_info

An aws_commons._s3_uri_1 composite type containing the following information about the S3 object:

  • bucket – The name of the Amazon S3 bucket containing the file.

  • file_path – The Amazon S3 file name including the path of the file.

  • region – The AWS Region that the file is in. For a listing of AWS Region names and associated values, see Regions and Availability Zones.

credentials

An aws_commons._aws_credentials_1 composite type containing the following credentials to use for the import operation:

  • Access key

  • Secret key

  • Session token

For information about creating an aws_commons._aws_credentials_1 composite structure, see aws_commons.create_aws_credentials.

Alternate syntax

To help with testing, you can use an expanded set of parameters instead of the s3_info and credentials parameters. Following are additional syntax variations for the aws_s3.table_import_from_s3 function:

  • Instead of using the s3_info parameter to identify an Amazon S3 file, use the combination of the bucket, file_path, and region parameters. With this form of the function, access to Amazon S3 is provided by an IAM role on the PostgreSQL DB cluster.

    aws_s3.table_import_from_s3 ( table_name text, column_list text, options text, bucket text, file_path text, region text )
  • Instead of using the credentials parameter to specify Amazon S3 access, use the combination of the access_key, secret_key, and session_token parameters.

    aws_s3.table_import_from_s3 ( table_name text, column_list text, options text, bucket text, file_path text, region text, access_key text, secret_key text, session_token text )
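For ad-hoc testing, the six-parameter form can be assembled in application code with the same PostgreSQL literal quoting used throughout this topic. The helper names in this Python sketch (quote_literal, build_import_sql) are hypothetical conveniences, not part of the extension.

```python
def quote_literal(value: str) -> str:
    # Double embedded single quotes (PostgreSQL string-literal rule).
    return "'" + value.replace("'", "''") + "'"

def build_import_sql(table, column_list, options, bucket, file_path, region):
    # Compose the expanded-parameter form of aws_s3.table_import_from_s3.
    args = ", ".join(quote_literal(v) for v in
                     (table, column_list, options, bucket, file_path, region))
    return f"SELECT aws_s3.table_import_from_s3({args});"

print(build_import_sql("t1", "", "(format csv)",
                       "sample_s3_bucket", "sample.csv", "us-east-1"))
# SELECT aws_s3.table_import_from_s3('t1', '', '(format csv)', 'sample_s3_bucket', 'sample.csv', 'us-east-1');
```

The same quoting applies if you append the access_key, secret_key, and session_token parameters for the nine-parameter form.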

Alternate parameters

bucket

A text string containing the name of the Amazon S3 bucket that contains the file.

file_path

A text string containing the Amazon S3 file name including the path of the file.

region

A text string containing the AWS Region that the file is in. For a listing of AWS Region names and associated values, see Regions and Availability Zones.

access_key

A text string containing the access key to use for the import operation. The default is NULL.

secret_key

A text string containing the secret key to use for the import operation. The default is NULL.

session_token

(Optional) A text string containing the session token to use for the import operation. The default is NULL.

aws_commons.create_s3_uri

Creates an aws_commons._s3_uri_1 structure to hold Amazon S3 file information. Use the results of the aws_commons.create_s3_uri function in the s3_info parameter of the aws_s3.table_import_from_s3 function.

Syntax

aws_commons.create_s3_uri( bucket text, file_path text, region text )

Parameters

bucket

A required text string containing the Amazon S3 bucket name for the file.

file_path

A required text string containing the Amazon S3 file name including the path of the file.

region

A required text string containing the AWS Region that the file is in. For a listing of AWS Region names and associated values, see Regions and Availability Zones.

aws_commons.create_aws_credentials

Sets an access key and secret key in an aws_commons._aws_credentials_1 structure. Use the results of the aws_commons.create_aws_credentials function in the credentials parameter of the aws_s3.table_import_from_s3 function.

Syntax

aws_commons.create_aws_credentials( access_key text, secret_key text, session_token text )

Parameters

access_key

A required text string containing the access key to use for importing an Amazon S3 file. The default is NULL.

secret_key

A required text string containing the secret key to use for importing an Amazon S3 file. The default is NULL.

session_token

An optional text string containing the session token to use for importing an Amazon S3 file. The default is NULL. If you provide an optional session_token, you can use temporary credentials.