
Migrating Data to Amazon Aurora with PostgreSQL Compatibility

You have several options for migrating data from your existing database to an Amazon Aurora with PostgreSQL compatibility DB cluster. Which options are available depends on the database that you are migrating from and the size of the data. Following are your options:

Migrating from an RDS PostgreSQL DB instance

You can migrate data directly from an Amazon RDS PostgreSQL DB snapshot to an Aurora PostgreSQL DB cluster. For more information, see Migrating an RDS PostgreSQL DB Snapshot to an Aurora PostgreSQL DB Cluster.

You can also migrate from an RDS PostgreSQL DB instance by creating an Aurora PostgreSQL Read Replica of a PostgreSQL DB instance. When the replica lag between the PostgreSQL DB instance and the Aurora PostgreSQL Read Replica is zero, you can stop replication. At this point, you can make the Aurora Read Replica a standalone Aurora PostgreSQL DB cluster for reading and writing. For more information, see Migrating Data from an RDS PostgreSQL DB Instance to an Aurora PostgreSQL DB Cluster by Using an Aurora Read Replica.

Migrating from a database that is not PostgreSQL-compatible

You can use AWS Database Migration Service (AWS DMS) to migrate data from a database that is not PostgreSQL-compatible. For more information on AWS DMS, see What Is AWS Database Migration Service?

Importing Amazon S3 data

You can migrate data by importing it from Amazon S3 into a table belonging to an Aurora PostgreSQL DB cluster. For more information, see Importing Amazon S3 Data into an Aurora PostgreSQL DB Cluster.

For a list of AWS Regions where Aurora is available, see Amazon Aurora in the AWS General Reference.

Migrating an RDS PostgreSQL DB Snapshot to an Aurora PostgreSQL DB Cluster

To create an Aurora PostgreSQL DB cluster, you can migrate a DB snapshot of an RDS PostgreSQL DB instance. The new Aurora PostgreSQL DB cluster is populated with the data from the original RDS PostgreSQL DB instance. The DB snapshot must be from an RDS DB instance running PostgreSQL 9.6.1 or 9.6.3. For information about creating a DB snapshot, see Creating a DB Snapshot.

In some cases, the DB snapshot might not be in the AWS Region where you want to locate your data. If so, use the Amazon RDS console to copy the DB snapshot to that AWS Region. For information about copying a DB snapshot, see Copying a DB Snapshot.
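
You can also copy the snapshot with the AWS CLI. The following copy-db-snapshot command is a minimal sketch; the snapshot ARN, target name, and Regions are placeholder values that you replace with your own, and you run the command in the destination AWS Region.

For Linux, OS X, or Unix:

aws rds copy-db-snapshot \
    --source-db-snapshot-identifier arn:aws:rds:us-east-1:123456789012:snapshot:my-postgresql-snapshot \
    --target-db-snapshot-identifier my-postgresql-snapshot-copy \
    --region us-west-2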

When you migrate the DB snapshot by using the console, the console takes the actions necessary to create both the DB cluster and the primary instance.

You can also choose for your new Aurora PostgreSQL DB cluster to be encrypted at rest by using an AWS Key Management Service (AWS KMS) encryption key. This option is available only for unencrypted DB snapshots.

To migrate a PostgreSQL DB snapshot by using the RDS console

  1. Sign in to the AWS Management Console and open the Amazon RDS console at https://console.aws.amazon.com/rds/.

  2. Choose Snapshots.

  3. On the Snapshots page, choose the snapshot that you want to migrate into an Aurora PostgreSQL DB cluster.

  4. Choose Migrate Database.

  5. Set the following values on the Migrate Database page:

    • DB Instance Class: Choose a DB instance class that has the required storage and capacity for your database, for example db.r3.large. Aurora cluster volumes automatically grow as the amount of data in your database increases, up to a maximum size of 64 tebibytes (TiB). So you only need to choose a DB instance class that meets your current storage requirements. For more information, see Overview of Aurora Storage.

    • DB Instance Identifier: Enter a name for the DB cluster that is unique for your account in the AWS Region that you chose. This identifier is used in the endpoint addresses for the instances in your DB cluster. You might choose to add some intelligence to the name, such as including the AWS Region and DB engine that you chose, for example aurora-cluster1.

      The DB instance identifier has the following constraints:

      • It must contain 1–63 alphanumeric characters or hyphens.

      • Its first character must be a letter.

      • It can't end with a hyphen or contain two consecutive hyphens.

      • It must be unique for all DB instances per AWS account, per AWS Region.

    • VPC: If you have an existing VPC, then you can use that VPC with your Aurora PostgreSQL DB cluster by choosing your VPC identifier, for example vpc-a464d1c1. For information on using an existing VPC, see How to Create a VPC for Use with Amazon Aurora.

      Otherwise, you can choose to have Amazon RDS create a VPC for you by choosing Create a new VPC.

    • Subnet Group: If you have an existing subnet group, then you can use that subnet group with your Aurora PostgreSQL DB cluster by choosing your subnet group identifier, for example gs-subnet-group1.

      Otherwise, you can choose to have Amazon RDS create a subnet group for you by choosing Create a new subnet group.

    • Publicly Accessible: Choose No to specify that instances in your DB cluster can only be accessed by resources inside of your VPC. Choose Yes to specify that instances in your DB cluster can be accessed by resources on the public network. The default is Yes.

      Note

      Your production DB cluster might not need to be in a public subnet, because only your application servers require access to your DB cluster. If your DB cluster doesn't need to be in a public subnet, set Publicly Accessible to No.

    • Availability Zone: Choose the Availability Zone to host the primary instance for your Aurora PostgreSQL DB cluster. To have Amazon RDS choose an Availability Zone for you, choose No Preference.

    • Database Port: Enter the default port to be used when connecting to instances in the Aurora PostgreSQL DB cluster. The default is 5432.

      Note

      You might be behind a corporate firewall that doesn't allow access to default ports such as the PostgreSQL default port, 5432. In this case, provide a port value that your corporate firewall allows. Remember that port value later when you connect to the Aurora PostgreSQL DB cluster.

    • Enable Encryption: Choose Yes for your new Aurora PostgreSQL DB cluster to be encrypted at rest. If you choose Yes, also choose an AWS KMS encryption key as the Master Key value.

    • Auto minor version upgrade: Choose Enable auto minor version upgrade to enable your Aurora PostgreSQL DB cluster to receive minor PostgreSQL DB engine version upgrades automatically when they become available.

      The Auto Minor Version Upgrade option only applies to upgrades to PostgreSQL minor engine versions for your Aurora PostgreSQL DB cluster. It doesn't apply to regular patches applied to maintain system stability.

  6. Choose Migrate to migrate your DB snapshot.

  7. Choose Instances, and then choose the arrow icon to show the DB cluster details and monitor the progress of the migration. On the details page, you can find the cluster endpoint used to connect to the primary instance of the DB cluster. For more information on connecting to an Aurora PostgreSQL DB cluster, see Connecting to an Amazon Aurora DB Cluster.
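
You can also perform a snapshot migration from the AWS CLI by restoring the RDS PostgreSQL DB snapshot as an Aurora PostgreSQL DB cluster and then creating the primary instance. The following is a sketch only; the cluster, instance, and snapshot identifiers and the DB instance class are placeholder values.

For Linux, OS X, or Unix:

aws rds restore-db-cluster-from-snapshot \
    --db-cluster-identifier aurora-cluster1 \
    --snapshot-identifier my-postgresql-snapshot \
    --engine aurora-postgresql

aws rds create-db-instance \
    --db-cluster-identifier aurora-cluster1 \
    --db-instance-identifier aurora-cluster1-instance \
    --db-instance-class db.r4.large \
    --engine aurora-postgresql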

Migrating Data from an RDS PostgreSQL DB Instance to an Aurora PostgreSQL DB Cluster by Using an Aurora Read Replica

You can migrate from an RDS PostgreSQL DB instance to an Aurora PostgreSQL DB cluster by using an Aurora Read Replica. We recommend this approach when you are migrating from an RDS PostgreSQL DB instance.

In this case, Amazon RDS uses the PostgreSQL DB engine's streaming replication functionality to create a special type of DB cluster for the source PostgreSQL DB instance. This type of DB cluster is called an Aurora Read Replica. Updates made to the source PostgreSQL DB instance are asynchronously replicated to the Aurora Read Replica.

Overview of Migrating Data by Using an Aurora Read Replica

To migrate from an RDS PostgreSQL DB instance to an Aurora PostgreSQL DB cluster, we recommend creating an Aurora Read Replica of your source PostgreSQL DB instance. When the replica lag between the PostgreSQL DB instance and the Aurora PostgreSQL Read Replica is zero, you can stop replication. At this point, you can promote the Aurora Read Replica to be a standalone Aurora PostgreSQL DB cluster. This standalone DB cluster can then accept write loads.

Be prepared for migration to take a while, roughly several hours per tebibyte (TiB) of data. While the migration is in progress, your Amazon RDS PostgreSQL instance accumulates write ahead log (WAL) segments. Make sure that your Amazon RDS instance has sufficient storage capacity for these segments.

When you create an Aurora Read Replica of a PostgreSQL DB instance, Amazon RDS creates a DB snapshot of your source PostgreSQL DB instance. This snapshot is private to Amazon RDS and incurs no charges. Amazon RDS then migrates the data from the DB snapshot to the Aurora Read Replica. After the DB snapshot data is migrated to the new Aurora PostgreSQL DB cluster, RDS starts replication between your PostgreSQL DB instance and the Aurora PostgreSQL DB cluster.

You can have only one Aurora Read Replica for a PostgreSQL DB instance. If you try to create an Aurora Read Replica for your Amazon RDS PostgreSQL instance and you already have one, the request is rejected.

Note

Replication issues can arise due to feature differences between Aurora PostgreSQL and the PostgreSQL engine version of your RDS PostgreSQL DB instance that is the replication master. You can replicate only from an Amazon RDS PostgreSQL instance that is compatible with the Aurora PostgreSQL version in question. For example, if the supported Aurora PostgreSQL version is 9.6.3, the Amazon RDS PostgreSQL DB instance must be running version 9.6.1 or greater. If you encounter an error, you can find help in the Amazon RDS Community Forum or by contacting AWS Support.

For more information on PostgreSQL Read Replicas, see Working with Read Replicas in the Amazon RDS User Guide.

Preparing to Migrate Data by Using an Aurora Read Replica

Before you migrate data from your RDS PostgreSQL instance to an Aurora PostgreSQL cluster, make sure that your instance has sufficient storage capacity. This storage capacity is for the write ahead log (WAL) segments that accumulate during the migration. There are several metrics to check for this, described following.

Metric Description

FreeStorageSpace

The available storage space.

Units: Bytes

OldestReplicationSlotLag

The size of the lag for WAL data in the replica that is lagging the most.

Units: Megabytes

RDSToAuroraPostgreSQLReplicaLag

The amount of time that an Aurora PostgreSQL DB cluster lags behind the source RDS DB instance.

Units: Seconds

TransactionLogsDiskUsage

The disk space used by the transaction logs.

Units: Megabytes

For more information about monitoring your RDS instance, see Monitoring in the Amazon RDS User Guide.
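
For example, you can watch FreeStorageSpace on the source DB instance from the AWS CLI while the migration runs. This is a sketch only; the instance identifier and time range are placeholders.

For Linux, OS X, or Unix:

aws cloudwatch get-metric-statistics \
    --namespace AWS/RDS \
    --metric-name FreeStorageSpace \
    --dimensions Name=DBInstanceIdentifier,Value=master-postgresql-instance \
    --start-time 2019-06-01T00:00:00Z \
    --end-time 2019-06-01T06:00:00Z \
    --period 300 \
    --statistics Minimum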

Creating an Aurora Read Replica

You can create an Aurora Read Replica for a PostgreSQL DB instance by using the console or the AWS CLI.

Console

To create an Aurora Read Replica from a source PostgreSQL DB instance

  1. Sign in to the AWS Management Console and open the Amazon RDS console at https://console.aws.amazon.com/rds/.

  2. In the navigation pane, choose Databases.

  3. Choose the PostgreSQL DB instance that you want to use as the source for your Aurora Read Replica, and then, for Actions, choose Create Aurora read replica.

    
  4. Choose the DB cluster specifications that you want to use for the Aurora Read Replica, as described in the following table.

    Option Description

    DB instance class

    Choose a DB instance class that defines the processing and memory requirements for the primary instance in the DB cluster. For more information about DB instance class options, see Choosing the DB Instance Class.

    Multi-AZ deployment

    Choose Create Replica in Different Zone to create the writer instance of the new DB cluster in another Availability Zone in the target AWS Region. For more information about multiple Availability Zones, see Choosing the Regions and Availability Zones.

    DB instance identifier

    Enter a name for the primary instance in your Aurora Read Replica DB cluster. This identifier is used in the endpoint address for the primary instance of the new DB cluster.

    The DB instance identifier has the following constraints:

    • It must contain 1–63 alphanumeric characters or hyphens.

    • Its first character must be a letter.

    • It can't end with a hyphen or contain two consecutive hyphens.

    • It must be unique for all DB instances for each AWS account, for each AWS Region.

    The Aurora Read Replica DB cluster is created from a snapshot of the source DB instance. Thus, the master user name and master password for the Aurora Read Replica are the same as the master user name and master password for the source DB instance.

    Virtual Private Cloud (VPC)

    Choose the VPC to host the DB cluster. Choose Create new VPC to have Amazon RDS create a VPC for you. For more information, see DB Cluster Prerequisites.

    Subnet group

    Choose the DB subnet group to use for the DB cluster. Choose Create new DB Subnet Group to have Amazon RDS create a DB subnet group for you. For more information, see DB Cluster Prerequisites.

    Public accessibility

    Choose Yes to give the DB cluster a public IP address; otherwise, choose No. The instances in your DB cluster can be a mix of both public and private DB instances. For more information about hiding instances from public access, see Hiding a DB Instance in a VPC from the Internet.

    Availability zone

    Determine if you want to specify a particular Availability Zone. For more information about Availability Zones, see Choosing the Regions and Availability Zones.

    VPC security groups

    Choose one or more VPC security groups to secure network access to the DB cluster. Choose Create new VPC security group to have Amazon RDS create a VPC security group for you. For more information, see DB Cluster Prerequisites.

    Database port

    Specify the port for applications and utilities to use to access the database. Aurora PostgreSQL DB clusters default to the default PostgreSQL port, 5432. Firewalls at some companies block connections to this port. If your company firewall blocks the default port, choose another port for the new DB cluster.

    DB parameter group

    Choose a DB parameter group for the Aurora PostgreSQL DB cluster. Aurora has a default DB parameter group you can use, or you can create your own DB parameter group. For more information about DB parameter groups, see Working with DB Parameter Groups and DB Cluster Parameter Groups.

    DB cluster parameter group

    Choose a DB cluster parameter group for the Aurora PostgreSQL DB cluster. Aurora has a default DB cluster parameter group you can use, or you can create your own DB cluster parameter group. For more information about DB cluster parameter groups, see Working with DB Parameter Groups and DB Cluster Parameter Groups.

    Encryption

    Choose Enable encryption for your new Aurora DB cluster to be encrypted at rest. If you choose Enable encryption, also choose an AWS KMS encryption key as the Master key value.

    Priority

    Choose a failover priority for the DB cluster. If you don't choose a value, the default is tier-1. This priority determines the order in which Aurora Replicas are promoted when recovering from a primary instance failure. For more information, see Fault Tolerance for an Aurora DB Cluster.

    Backup retention period

    Choose the length of time, 1–35 days, for Aurora to retain backup copies of the database. Backup copies can be used for point-in-time restores (PITR) of your database down to the second.

    Enhanced monitoring

    Choose Enable enhanced monitoring to enable gathering metrics in real time for the operating system that your DB cluster runs on. For more information, see Enhanced Monitoring.

    Monitoring Role

    Only available if you chose Enable enhanced monitoring. The AWS Identity and Access Management (IAM) role to use for Enhanced Monitoring. For more information, see Setting Up for and Enabling Enhanced Monitoring.

    Granularity

    Only available if you chose Enable enhanced monitoring. Set the interval, in seconds, between when metrics are collected for your DB cluster.

    Auto minor version upgrade

    Choose Yes to enable your Aurora PostgreSQL DB cluster to receive minor PostgreSQL DB engine version upgrades automatically when they become available.

    The Auto minor version upgrade option only applies to upgrades to PostgreSQL minor engine versions for your Aurora PostgreSQL DB cluster. It doesn't apply to regular patches applied to maintain system stability.

    Maintenance window

    Choose the weekly time range during which system maintenance can occur.

  5. Choose Create read replica.

AWS CLI

To create an Aurora Read Replica from a source PostgreSQL DB instance, use the create-db-cluster and create-db-instance AWS CLI commands to create a new Aurora PostgreSQL DB cluster. When you call the create-db-cluster command, include the --replication-source-identifier parameter to identify the Amazon Resource Name (ARN) for the source PostgreSQL DB instance. For more information about Amazon RDS ARNs, see Amazon Relational Database Service (Amazon RDS) in the AWS General Reference.

Don't specify the master user name, master password, or database name. The Aurora Read Replica uses the same master user name, master password, and database name as the source PostgreSQL DB instance.

For Linux, OS X, or Unix:

aws rds create-db-cluster --db-cluster-identifier sample-replica-cluster --engine aurora-postgresql \
    --db-subnet-group-name mysubnetgroup --vpc-security-group-ids sg-c7e5b0d2 \
    --replication-source-identifier arn:aws:rds:us-west-2:123456789012:db:master-postgresql-instance

For Windows:

aws rds create-db-cluster --db-cluster-identifier sample-replica-cluster --engine aurora-postgresql ^
    --db-subnet-group-name mysubnetgroup --vpc-security-group-ids sg-c7e5b0d2 ^
    --replication-source-identifier arn:aws:rds:us-west-2:123456789012:db:master-postgresql-instance

If you use the console to create an Aurora Read Replica, then RDS automatically creates the primary instance for your Aurora Read Replica DB cluster. If you use the CLI to create an Aurora Read Replica, you must explicitly create the primary instance for your DB cluster. The primary instance is the first instance that is created in a DB cluster.

You can create a primary instance for your DB cluster by using the create-db-instance CLI command with the following parameters:

  • --db-cluster-identifier

    The name of your DB cluster.

  • --db-instance-class

    The name of the DB instance class to use for your primary instance.

  • --db-instance-identifier

    The name of your primary instance.

  • --engine aurora-postgresql

    The database engine to use.

In the following example, you create a primary instance named myreadreplicainstance for the DB cluster named myreadreplicacluster. You do this using the DB instance class specified in myinstanceclass.

Example

For Linux, OS X, or Unix:

aws rds create-db-instance \
    --db-cluster-identifier myreadreplicacluster \
    --db-instance-class myinstanceclass \
    --db-instance-identifier myreadreplicainstance \
    --engine aurora-postgresql

For Windows:

aws rds create-db-instance ^
    --db-cluster-identifier myreadreplicacluster ^
    --db-instance-class myinstanceclass ^
    --db-instance-identifier myreadreplicainstance ^
    --engine aurora-postgresql

RDS API

To create an Aurora Read Replica from a source PostgreSQL DB instance, use the RDS API operations CreateDBCluster and CreateDBInstance to create a new Aurora DB cluster and primary instance. Don't specify the master user name, master password, or database name. The Aurora Read Replica uses the same master user name, master password, and database name as the source PostgreSQL DB instance.

You can create a new Aurora DB cluster for an Aurora Read Replica from a source PostgreSQL DB instance. To do so, use the RDS API operation CreateDBCluster with the following parameters:

  • DBClusterIdentifier

    The name of the DB cluster to create.

  • DBSubnetGroupName

    The name of the DB subnet group to associate with this DB cluster.

  • Engine=aurora-postgresql

    The name of the engine to use.

  • ReplicationSourceIdentifier

    The Amazon Resource Name (ARN) for the source PostgreSQL DB instance. For more information about Amazon RDS ARNs, see Amazon Relational Database Service (Amazon RDS) in the Amazon Web Services General Reference.

  • VpcSecurityGroupIds

    The list of Amazon EC2 VPC security groups to associate with this DB cluster.

In the following example, you create a DB cluster named myreadreplicacluster from a source PostgreSQL DB instance whose ARN is shown here as mysqlmasterARN. The cluster is associated with a DB subnet group named mysubnetgroup and a VPC security group named mysecuritygroup.

Example

https://rds.us-east-1.amazonaws.com/
    ?Action=CreateDBCluster
    &DBClusterIdentifier=myreadreplicacluster
    &DBSubnetGroupName=mysubnetgroup
    &Engine=aurora-postgresql
    &ReplicationSourceIdentifier=mysqlmasterARN
    &SignatureMethod=HmacSHA256
    &SignatureVersion=4
    &Version=2014-10-31
    &VpcSecurityGroupIds=mysecuritygroup
    &X-Amz-Algorithm=AWS4-HMAC-SHA256
    &X-Amz-Credential=AKIADQKE4SARGYLE/20150927/us-east-1/rds/aws4_request
    &X-Amz-Date=20150927T164851Z
    &X-Amz-SignedHeaders=content-type;host;user-agent;x-amz-content-sha256;x-amz-date
    &X-Amz-Signature=6a8f4bd6a98f649c75ea04a6b3929ecc75ac09739588391cd7250f5280e716db

If you use the console to create an Aurora Read Replica, then Amazon RDS automatically creates the primary instance for your Aurora Read Replica DB cluster. If you use the RDS API to create an Aurora Read Replica, you must explicitly create the primary instance for your DB cluster. The primary instance is the first instance that is created in a DB cluster.

You can create a primary instance for your DB cluster by using the RDS API operation CreateDBInstance with the following parameters:

  • DBClusterIdentifier

    The name of your DB cluster.

  • DBInstanceClass

    The name of the DB instance class to use for your primary instance.

  • DBInstanceIdentifier

    The name of your primary instance.

  • Engine=aurora-postgresql

    The name of the engine to use.

In this example, you create a primary instance named myreadreplicainstance for the DB cluster named myreadreplicacluster. You do this using the DB instance class specified in myinstanceclass.

Example

https://rds.us-east-1.amazonaws.com/
    ?Action=CreateDBInstance
    &DBClusterIdentifier=myreadreplicacluster
    &DBInstanceClass=myinstanceclass
    &DBInstanceIdentifier=myreadreplicainstance
    &Engine=aurora-postgresql
    &SignatureMethod=HmacSHA256
    &SignatureVersion=4
    &Version=2014-09-01
    &X-Amz-Algorithm=AWS4-HMAC-SHA256
    &X-Amz-Credential=AKIADQKE4SARGYLE/20140424/us-east-1/rds/aws4_request
    &X-Amz-Date=20140424T194844Z
    &X-Amz-SignedHeaders=content-type;host;user-agent;x-amz-content-sha256;x-amz-date
    &X-Amz-Signature=bee4aabc750bf7dad0cd9e22b952bd6089d91e2a16592c2293e532eeaab8bc77

Promoting an Aurora Read Replica

After migration completes, you can promote the Aurora Read Replica to a standalone DB cluster. You then direct your client applications to the endpoint for the Aurora Read Replica. For more information on the Aurora endpoints, see Amazon Aurora Connection Management. Promotion should complete fairly quickly. You can't delete the master PostgreSQL DB instance or unlink the DB instance and the Aurora Read Replica until the promotion is complete.

Before you promote your Aurora Read Replica, stop any transactions from being written to the source PostgreSQL DB instance. Then wait for the replica lag on the Aurora Read Replica to reach zero.
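
One way to watch the lag is through the RDSToAuroraPostgreSQLReplicaLag CloudWatch metric described earlier. The following sketch assumes the metric is reported with a DBClusterIdentifier dimension for the Aurora Read Replica cluster; verify the dimension in your account, and replace the identifiers and time range with your own values.

For Linux, OS X, or Unix:

aws cloudwatch get-metric-statistics \
    --namespace AWS/RDS \
    --metric-name RDSToAuroraPostgreSQLReplicaLag \
    --dimensions Name=DBClusterIdentifier,Value=myreadreplicacluster \
    --start-time 2019-06-01T00:00:00Z \
    --end-time 2019-06-01T01:00:00Z \
    --period 60 \
    --statistics Maximum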

After you promote your Read Replica, confirm that the promotion has completed. To do this, choose Instances in the navigation pane and confirm that there is a Promoted Read Replica cluster to stand-alone database cluster event for your Read Replica. After promotion is complete, the master PostgreSQL DB Instance and the Aurora Read Replica are unlinked. At this point, you can safely delete the DB instance if you want to.

Console

To promote an Aurora Read Replica to an Aurora DB cluster

  1. Sign in to the AWS Management Console and open the Amazon RDS console at https://console.aws.amazon.com/rds/.

  2. In the navigation pane, choose Instances.

  3. Choose the DB instance for the Aurora Read Replica, and then, for Actions, choose Promote Read Replica.

  4. Choose Promote Read Replica.

AWS CLI

To promote an Aurora Read Replica to a stand-alone DB cluster, use the promote-read-replica-db-cluster AWS CLI command.

Example

For Linux, OS X, or Unix:

aws rds promote-read-replica-db-cluster \ --db-cluster-identifier myreadreplicacluster

For Windows:

aws rds promote-read-replica-db-cluster ^ --db-cluster-identifier myreadreplicacluster
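
To confirm from the AWS CLI that the promotion finished, you can describe the DB cluster and check that it no longer reports a replication source. This is a sketch; the --query expression is just one way to filter the output.

For Linux, OS X, or Unix:

aws rds describe-db-clusters \
    --db-cluster-identifier myreadreplicacluster \
    --query 'DBClusters[0].[Status,ReplicationSourceIdentifier]'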

Importing Amazon S3 Data into an Aurora PostgreSQL DB Cluster

You can import data from Amazon S3 into a table belonging to an Aurora PostgreSQL DB cluster. To do this, you use the aws_s3 PostgreSQL extension that Aurora PostgreSQL provides.

For more information on storing data with Amazon S3, see Create a Bucket in the Amazon Simple Storage Service Getting Started Guide. For instructions on how to upload a file to an Amazon S3 bucket, see Add an Object to a Bucket in the Amazon Simple Storage Service Getting Started Guide.

Overview of Importing Amazon S3 Data

To import data stored in an Amazon S3 bucket to a PostgreSQL database table, follow these steps.

To import S3 data into Aurora PostgreSQL

  1. Install the required PostgreSQL extensions. These include the aws_s3 and aws_commons extensions. To do so, start psql and use the following command.

    psql=> CREATE EXTENSION aws_s3 CASCADE;
    NOTICE: installing required extension "aws_commons"

    The aws_s3 extension provides the aws_s3.table_import_from_s3 function that you use to import Amazon S3 data. The aws_commons extension provides additional helper functions.

  2. Identify the database table and Amazon S3 file to use.

    The aws_s3.table_import_from_s3 function requires the name of the PostgreSQL database table that you want to import data into. The function also requires that you identify the Amazon S3 file to import. To provide this information, take the following steps.

    1. Identify the PostgreSQL database table to put the data in. For example, the following is a sample t1 database table used in the examples for this topic.

      psql=> CREATE TABLE t1 (col1 varchar(80), col2 varchar(80), col3 varchar(80));
    2. Get the following information to identify the Amazon S3 file that you want to import:

      • Bucket name – A bucket is a container for Amazon S3 objects or files.

      • File path – The file path locates the file in the Amazon S3 bucket.

      • AWS Region – The AWS Region is the location of the Amazon S3 bucket. For example, if the S3 bucket is in the US East (N. Virginia) Region, use us-east-1. For a listing of AWS Region names and associated values, see Choosing the Regions and Availability Zones.

      To find how to get this information, see View an Object in the Amazon Simple Storage Service Getting Started Guide. You can confirm the information by using the AWS CLI command aws s3 cp. If the information is correct, this command downloads a copy of the Amazon S3 file.

      aws s3 cp s3://sample_s3_bucket/sample_file_path ./
    3. Use the aws_commons.create_s3_uri function to create an aws_commons._s3_uri_1 structure to hold the Amazon S3 file information. You provide this aws_commons._s3_uri_1 structure as a parameter in the call to the aws_s3.table_import_from_s3 function.

      For a psql example, see the following.

      psql=> SELECT aws_commons.create_s3_uri( 'sample_s3_bucket', 'sample.csv', 'us-east-1' ) AS s3_uri \gset
  3. Provide permission to access the Amazon S3 file.

    To import data from an Amazon S3 file, you need to give the Aurora PostgreSQL DB cluster permission to access the Amazon S3 bucket the file is in. To do this, you use either an AWS Identity and Access Management (IAM) role or security credentials. For more information, see Setting Up Access to an Amazon S3 Bucket.

  4. Import the Amazon S3 data by calling the aws_s3.table_import_from_s3 function.

    After you complete the previous preparation tasks, use the aws_s3.table_import_from_s3 function to import the Amazon S3 data. For more information, see Using the aws_s3.table_import_from_s3 Function to Import Amazon S3 Data.
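
Putting these steps together, a minimal psql session might look like the following sketch. It assumes the sample t1 table and the sample bucket, file, and AWS Region shown in the preceding steps, and that the DB cluster already has permission to access the bucket.

psql=> CREATE EXTENSION aws_s3 CASCADE;
psql=> CREATE TABLE t1 (col1 varchar(80), col2 varchar(80), col3 varchar(80));
psql=> SELECT aws_commons.create_s3_uri('sample_s3_bucket', 'sample.csv', 'us-east-1') AS s3_uri \gset
psql=> SELECT aws_s3.table_import_from_s3('t1', '', '(format csv)', :'s3_uri');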

Setting Up Access to an Amazon S3 Bucket

To import data from an Amazon S3 file, you need to give the Aurora PostgreSQL DB cluster permission to access the Amazon S3 bucket the file is in. You provide access to an Amazon S3 bucket in one of two ways, as described in the following topics.

Using an IAM Role to Access an Amazon S3 Bucket

Before you load data from an Amazon S3 file, give your Aurora PostgreSQL DB cluster permission to access the Amazon S3 bucket the file is in. This way, you don't have to manage additional credential information or provide it in the aws_s3.table_import_from_s3 function call.

To do this, create an IAM policy that provides access to the Amazon S3 bucket. Create an IAM role and attach the policy to the role. Then assign the IAM role to your DB cluster.

To give an Aurora PostgreSQL DB cluster access to Amazon S3 through an IAM role

  1. Create an IAM policy. This policy provides the bucket and object permissions that allow your Aurora PostgreSQL DB cluster to access Amazon S3.

    Include in the policy the following required actions to allow the transfer of files from an Amazon S3 bucket to Aurora PostgreSQL:

    • s3:GetObject

    • s3:ListBucket

    Include in the policy the following resources to identify the Amazon S3 bucket and objects in the bucket. This shows the Amazon Resource Name (ARN) format for accessing Amazon S3.

    • arn:aws:s3:::your-s3-bucket

    • arn:aws:s3:::your-s3-bucket/*

    For more information on creating an IAM policy for Aurora PostgreSQL, see Creating and Using an IAM Policy for IAM Database Access. See also Tutorial: Create and Attach Your First Customer Managed Policy in the IAM User Guide.

    The following AWS CLI command creates an IAM policy named rds-s3-import-policy with these options. It grants access to a bucket named your-s3-bucket.

    Note

    After you create the policy, note the Amazon Resource Name (ARN) of the policy. You need the ARN for a subsequent step when you attach the policy to an IAM role.

    Example

    For Linux, OS X, or Unix:

    aws iam create-policy \
       --policy-name rds-s3-import-policy \
       --policy-document '{
         "Version": "2012-10-17",
         "Statement": [
           {
             "Sid": "s3import",
             "Action": [
               "s3:GetObject",
               "s3:ListBucket"
             ],
             "Effect": "Allow",
             "Resource": [
               "arn:aws:s3:::your-s3-bucket",
               "arn:aws:s3:::your-s3-bucket/*"
             ]
           }
         ]
       }'

    For Windows:

    aws iam create-policy ^
       --policy-name rds-s3-import-policy ^
       --policy-document '{"Version": "2012-10-17", "Statement": [{"Sid": "s3import", "Action": ["s3:GetObject", "s3:ListBucket"], "Effect": "Allow", "Resource": ["arn:aws:s3:::your-s3-bucket", "arn:aws:s3:::your-s3-bucket/*"]}]}'
  2. Create an IAM role. You do this so Aurora PostgreSQL can assume this IAM role on your behalf to access your Amazon S3 buckets. For more information, see Creating a Role to Delegate Permissions to an IAM User in the IAM User Guide.

    The following example shows using the AWS CLI command to create a role named rds-s3-import-role.

    Example

    For Linux, OS X, or Unix:

    aws iam create-role \
       --role-name rds-s3-import-role \
       --assume-role-policy-document '{
         "Version": "2012-10-17",
         "Statement": [
           {
             "Effect": "Allow",
             "Principal": {
               "Service": "rds.amazonaws.com"
             },
             "Action": "sts:AssumeRole"
           }
         ]
       }'

    For Windows:

    aws iam create-role ^
       --role-name rds-s3-import-role ^
       --assume-role-policy-document '{"Version": "2012-10-17", "Statement": [{"Effect": "Allow", "Principal": {"Service": "rds.amazonaws.com"}, "Action": "sts:AssumeRole"}]}'
  3. Attach the IAM policy that you created to the IAM role that you created.

    The following AWS CLI command attaches the policy created earlier to the role named rds-s3-import-role. Replace your-policy-arn with the policy ARN that you noted in an earlier step.

    Example

    For Linux, OS X, or Unix:

    aws iam attach-role-policy \
       --policy-arn your-policy-arn \
       --role-name rds-s3-import-role

    For Windows:

    aws iam attach-role-policy ^
       --policy-arn your-policy-arn ^
       --role-name rds-s3-import-role
  4. Add the IAM role to the DB cluster. You do so by using the AWS Management Console or AWS CLI, as described following.

Console

To add an IAM role for a PostgreSQL DB cluster using the console

  1. Sign in to the AWS Management Console and open the Amazon RDS console at https://console.aws.amazon.com/rds/.

  2. Choose the PostgreSQL DB cluster name to display its details.

  3. On the Connectivity & security tab, in the Manage IAM roles section, choose the role to add under Add IAM roles to this instance.

  4. Under Feature, choose s3Import.

  5. Choose Add role.

AWS CLI

To add an IAM role for a PostgreSQL DB cluster using the CLI

  • Use the following command to add the role to the PostgreSQL DB cluster named my-db-cluster. Replace your-role-arn with the role ARN that you noted in a previous step. Use s3Import for the value of the --feature-name option.

    Example

    For Linux, OS X, or Unix:

    aws rds add-role-to-db-cluster \
       --db-cluster-identifier my-db-cluster \
       --feature-name s3Import \
       --role-arn your-role-arn \
       --region your-region

    For Windows:

    aws rds add-role-to-db-cluster ^
       --db-cluster-identifier my-db-cluster ^
       --feature-name s3Import ^
       --role-arn your-role-arn ^
       --region your-region
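
    To check that the role is now associated with the DB cluster for the s3Import feature, you can describe the cluster and look at its associated roles. This is a sketch; the --query expression is one possible way to filter the output.

    For Linux, OS X, or Unix:

    aws rds describe-db-clusters \
       --db-cluster-identifier my-db-cluster \
       --query 'DBClusters[0].AssociatedRoles'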

Using Security Credentials to Access an Amazon S3 Bucket

If you prefer, you can use security credentials to provide access to an Amazon S3 bucket instead of providing access with an IAM role. To do this, use the credentials parameter in the aws_s3.table_import_from_s3 function call.

The credentials parameter is a structure of type aws_commons._aws_credentials_1, which contains AWS credentials. Use the aws_commons.create_aws_credentials function to set the access key and secret key in an aws_commons._aws_credentials_1 structure, as shown following.

psql=> SELECT aws_commons.create_aws_credentials( 'sample_access_key', 'sample_secret_key', '') AS creds \gset

After creating the aws_commons._aws_credentials_1 structure, use the aws_s3.table_import_from_s3 function with the credentials parameter to import the data, as shown following.

psql=> SELECT aws_s3.table_import_from_s3( 't', '', '(format csv)', :'s3_uri', :'creds' );

Or you can include the aws_commons.create_aws_credentials function call inline within the aws_s3.table_import_from_s3 function call.

psql=> SELECT aws_s3.table_import_from_s3( 't', '', '(format csv)', :'s3_uri', aws_commons.create_aws_credentials('sample_access_key', 'sample_secret_key', '') );

Troubleshooting Access to Amazon S3

If you encounter connection problems when attempting to import Amazon S3 file data, first confirm the access settings described in Setting Up Access to an Amazon S3 Bucket.

Using the aws_s3.table_import_from_s3 Function to Import Amazon S3 Data

Import your Amazon S3 data by calling the aws_s3.table_import_from_s3 function.

Note

The following examples use the IAM role method for providing access to the Amazon S3 bucket. Thus, there are no credential parameters in the aws_s3.table_import_from_s3 function calls.

The following shows a typical PostgreSQL example using psql.

psql=> SELECT aws_s3.table_import_from_s3( 't1', '', '(format csv)', :'s3_uri' );

The parameters are the following:

  • t1 – The name for the table in the PostgreSQL DB cluster to copy the data into.

  • '' – An optional list of columns in the database table. You can use this parameter to indicate which columns of the S3 data go in which table columns. If no columns are specified, all the columns are copied to the table. For an example of using a column list, see Importing an Amazon S3 File That Uses a Custom Delimiter.

  • (format csv) – PostgreSQL COPY arguments. The copy process uses the arguments and format of the PostgreSQL COPY command. In the preceding example, the COPY command uses the comma-separated value (CSV) file format to copy the data.

  • s3_uri – A structure that contains the information identifying the Amazon S3 file. For an example of using the aws_commons.create_s3_uri function to create an s3_uri structure, see Overview of Importing Amazon S3 Data.

For the full reference of this function, see aws_s3.table_import_from_s3.

The following examples show how to specify different kinds of files when importing Amazon S3 data.

Importing an Amazon S3 File That Uses a Custom Delimiter

The following example shows how to import a file that uses a custom delimiter. It also shows how to control where to put the data in the database table using the column_list parameter of the aws_s3.table_import_from_s3 function.

For this example, assume that the following information is organized into pipe-delimited columns in the Amazon S3 file.

1|foo1|bar1|elephant1
2|foo2|bar2|elephant2
3|foo3|bar3|elephant3
4|foo4|bar4|elephant4
...

To import a file that uses a custom delimiter

  1. Create a table in the database for the imported data.

    psql=> CREATE TABLE test (a text, b text, c text, d text, e text);
    CREATE TABLE
  2. Use the following form of the aws_s3.table_import_from_s3 function to import data from the Amazon S3 file.

    You can include the aws_commons.create_s3_uri function call inline within the aws_s3.table_import_from_s3 function call to specify the file.

    psql=> SELECT aws_s3.table_import_from_s3(
       'test',
       'a,b,d,e',
       'DELIMITER ''|''',
       aws_commons.create_s3_uri('sampleBucket', 'pipeDelimitedSampleFile', 'us-east-2')
    );

The data is now in the table in the following columns.

psql=> SELECT * FROM test;
 a |  b   | c |  d   |     e
---+------+---+------+-----------
 1 | foo1 |   | bar1 | elephant1
 2 | foo2 |   | bar2 | elephant2
 3 | foo3 |   | bar3 | elephant3
 4 | foo4 |   | bar4 | elephant4

Importing an Amazon S3 Compressed (gzip) File

The following example shows how to import a file from Amazon S3 that is compressed with gzip.

Ensure that the file contains the following Amazon S3 metadata:

  • Key: Content-Encoding

  • Value: gzip

For more about adding these values to Amazon S3 metadata, see How Do I Add Metadata to an S3 Object? in the Amazon Simple Storage Service Console User Guide.
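
One way to set this metadata is at upload time with the AWS CLI; the bucket and file names here are placeholders.

aws s3 cp test-data.gz s3://myS3Bucket/test-data.gz --content-encoding gzip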

Import the gzip file into your Aurora PostgreSQL DB cluster as shown following.

psql=> CREATE TABLE test_gzip(id int, a text, b text, c text, d text);
CREATE TABLE
psql=> SELECT aws_s3.table_import_from_s3(
   'test_gzip', '', '(format csv)',
   'myS3Bucket', 'test-data.gz', 'us-east-2'
);

Importing an Encoded Amazon S3 File

The following example shows how to import a file from Amazon S3 that has Windows-1252 encoding.

psql=> SELECT aws_s3.table_import_from_s3(
   'test_table', '', 'encoding ''WIN1252''',
   aws_commons.create_s3_uri('sampleBucket', 'SampleFile', 'us-east-2')
);

Function Reference

aws_s3.table_import_from_s3

Imports Amazon S3 data into an Aurora PostgreSQL table. The aws_s3 extension provides the aws_s3.table_import_from_s3 function.

The three required parameters are table_name, column_list and options. These identify the database table and specify how the data is copied into the table.

You can also use these parameters:

  • The s3_info parameter specifies the Amazon S3 file to import. When you use this parameter, access to Amazon S3 is provided by an IAM role for the PostgreSQL DB cluster.

    aws_s3.table_import_from_s3 ( table_name text, column_list text, options text, s3_info aws_commons._s3_uri_1 )
  • The credentials parameter specifies the credentials to access Amazon S3. When you use this parameter, you don't use an IAM role.

    aws_s3.table_import_from_s3 ( table_name text, column_list text, options text, s3_info aws_commons._s3_uri_1, credentials aws_commons._aws_credentials_1 )

The aws_s3.table_import_from_s3 parameters are described in the following table.

Parameter Description

table_name

A required text string containing the name of the PostgreSQL database table to import the data into.

column_list

A required text string containing an optional list of the PostgreSQL database table columns in which to copy the data. If the string is empty, all columns of the table are used. For an example, see Importing an Amazon S3 File That Uses a Custom Delimiter.

options

A required text string containing arguments for the PostgreSQL COPY command. These arguments specify how the data is to be copied into the PostgreSQL table. For more details, see the PostgreSQL COPY documentation.

s3_info

An aws_commons._s3_uri_1 composite type containing the following information about the S3 object:

  • bucket – The Amazon S3 bucket name containing the file.

  • file_path – The Amazon S3 path of the file.

  • region – The AWS Region that the file is in. For a listing of AWS Region names and associated region values, see Choosing the Regions and Availability Zones.

To create an aws_commons._s3_uri_1 composite structure, see aws_commons.create_s3_uri.

credentials

An aws_commons._aws_credentials_1 composite type containing the following credentials to use for the import operation:

  • Access key

  • Secret key

  • Session token

To create an aws_commons._aws_credentials_1 composite structure, see aws_commons.create_aws_credentials.

Alternate Parameters

To help with testing, you can use an expanded set of parameters instead of the s3_info and credentials parameters. Following are additional syntax variations for the aws_s3.table_import_from_s3 function.

  • Instead of using the s3_info parameter to identify an Amazon S3 file, use the combination of the bucket, file_path, and region parameters. With this form of the function, access to Amazon S3 is provided by an IAM role on the PostgreSQL DB instance.

    aws_s3.table_import_from_s3 ( table_name text, column_list text, options text, bucket text, file_path text, region text )
  • Instead of using the credentials parameter to specify Amazon S3 access, use the combination of the access_key, session_key, and session_token parameters.

    aws_s3.table_import_from_s3 ( table_name text, column_list text, options text, bucket text, file_path text, region text, access_key text, secret_key text, session_token text )

Find descriptions for these alternate parameters in the following table.

Parameter Description
bucket

A text string containing the name of the Amazon S3 bucket that contains the file.

file_path

A text string containing the Amazon S3 path of the file.

region

A text string containing the AWS Region that the file is in. For a listing of AWS Region names and associated values, see Choosing the Regions and Availability Zones.

access_key

A text string containing the access key to use for the import operation. The default is NULL.

secret_key

A text string containing the secret key to use for the import operation. The default is NULL.

session_token

(Optional) A text string containing the session token to use for the import operation. The default is NULL.

aws_commons.create_s3_uri

Creates an aws_commons._s3_uri_1 structure to hold Amazon S3 file information. You use the results of the aws_commons.create_s3_uri function in the s3_info parameter of the aws_s3.table_import_from_s3 function. The function syntax is as follows.

aws_commons.create_s3_uri( bucket text, file_path text, region text )

The aws_commons.create_s3_uri function parameters are described in the following table.

Parameter Description
bucket

A required text string containing the Amazon S3 bucket name for the file.

file_path

A required text string containing the Amazon S3 path of the file.

region

A required text string containing the AWS Region the file is in. For a listing of AWS Region names and associated region values, see Choosing the Regions and Availability Zones.

aws_commons.create_aws_credentials

Sets an access key and secret key in an aws_commons._aws_credentials_1 structure. Use the results of the aws_commons.create_aws_credentials function in the credentials parameter of the aws_s3.table_import_from_s3 function. The function syntax is as follows.

aws_commons.create_aws_credentials( access_key text, secret_key text, session_token text )

The aws_commons.create_aws_credentials function parameters are described in the following table.

Parameter Description
access_key

A required text string containing the access key to use for importing an Amazon S3 file. The default is NULL.

secret_key

A required text string containing the secret key to use for importing an Amazon S3 file. The default is NULL.

session_token

An optional text string containing the session token to use for importing an Amazon S3 file. The default is NULL. If you provide a session_token, you can use temporary credentials.
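
For example, to import using temporary credentials, you might pass the session token as the third argument to aws_commons.create_aws_credentials. The key and token values in this sketch are placeholders.

psql=> SELECT aws_s3.table_import_from_s3(
   't', '', '(format csv)', :'s3_uri',
   aws_commons.create_aws_credentials('sample_access_key', 'sample_secret_key', 'sample_session_token')
);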