Importing data into PostgreSQL on Amazon RDS
Suppose that you have an existing PostgreSQL deployment that you want to move to Amazon RDS. The complexity of your task depends on the size of your database and the types of database objects that you're transferring. For example, a database that contains datasets on the order of gigabytes, along with stored procedures and triggers, is more complicated to migrate than a simple database with only a few megabytes of test data and no triggers or stored procedures.
We recommend that you use native PostgreSQL database migration tools under the following conditions:
- You have a homogeneous migration, where you are migrating from a database with the same database engine as the target database.
- You are migrating an entire database.
- The native tools allow you to migrate your system with minimal downtime.
In most other cases, performing a database migration using AWS Database Migration Service (AWS DMS) is the best approach. AWS DMS can migrate databases without downtime and, for many database engines, continue ongoing replication until you are ready to switch over to the target database. You can migrate to either the same database engine or a different database engine using AWS DMS. If you are migrating to a different database engine than your source database, you can use the AWS Schema Conversion Tool (AWS SCT) to migrate the schema objects that AWS DMS doesn't migrate. For more information about AWS DMS, see What is AWS Database Migration Service?
Modify your DB instance settings to the following:
- Disable DB instance backups (set backup_retention to 0).
- Disable Multi-AZ.
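For example, one way to apply these instance settings temporarily is with the AWS CLI; the instance identifier below is a placeholder, and you should restore your usual backup retention and Multi-AZ settings after the import completes.
# Temporarily disable automated backups and Multi-AZ for a faster import (placeholder identifier).
aws rds modify-db-instance \
    --db-instance-identifier mypginstance \
    --backup-retention-period 0 \
    --no-multi-az \
    --apply-immediately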
Modify your DB parameter group to include the following settings. Use these settings only when importing data. Test the parameter settings to find the most efficient values for your DB instance size, and revert to your production values after the import completes.
Parameter | Recommended value when importing | Description
---|---|---
maintenance_work_mem | 524288, 1048576, 2097152, or 4194304 (in KB). These settings are comparable to 512 MB, 1 GB, 2 GB, and 4 GB. | The value for this setting depends on the size of your host. This parameter is used during CREATE INDEX statements, and each parallel command can use this much memory. Calculate the best value so that you don't set this value so high that you run out of memory.
checkpoint_segments | 256 | The value for this setting consumes more disk space, but gives you less contention on the write-ahead log (WAL). This setting is only supported for PostgreSQL versions 9.5 and earlier. For versions 9.6 and later, use max_wal_size.
max_wal_size | 256 (for version 9.6), 4096 (for versions 10 and later) | The maximum size that the WAL is allowed to grow to during automatic checkpoints. Increasing this parameter can increase the amount of time needed for crash recovery. This parameter replaces checkpoint_segments. For PostgreSQL version 9.6, this value is in 16 MB units. For later versions, the value is in 1 MB units. For example, in version 9.6, 128 means 128 chunks that are each 16 MB in size. In version 12.4, 2048 means 2048 chunks that are each 1 MB in size.
checkpoint_timeout | 1800 | The value for this setting allows for less frequent WAL rotation.
synchronous_commit | Off | Disable this setting to speed up writes. Turning this parameter off can increase the risk of data loss in the event of a server crash. Do not turn off fsync.
wal_buffers | 8192 | This value is in 8 KB units. This again helps your WAL generation speed.
autovacuum | Off | Disable the PostgreSQL autovacuum parameter while you are loading data so that it doesn't use resources.
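As a sketch of applying the import-time values, the following AWS CLI command sets two of these parameters in a hypothetical custom DB parameter group named my-import-params; choose values appropriate for your instance size and revert them after the import.
# Set import-time values in a custom DB parameter group (placeholder group name and values).
aws rds modify-db-parameter-group \
    --db-parameter-group-name my-import-params \
    --parameters "ParameterName=maintenance_work_mem,ParameterValue=1048576,ApplyMethod=immediate" \
                 "ParameterName=synchronous_commit,ParameterValue=off,ApplyMethod=immediate"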
Use the pg_dump -Fc (compressed) or pg_restore -j (parallel) commands with these settings.
The PostgreSQL command pg_dumpall requires super_user permissions that are not granted when you create a DB instance, so it can't be used for importing data.
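As an illustration of those flags (the host names, user names, and file names below are placeholders, not part of the procedure that follows), a compressed dump can be created and then restored with parallel jobs:
# Create a compressed (custom-format) dump, then restore it using 4 parallel jobs.
pg_dump -Fc -h source-host -U source-user mydb > mydb.dump
pg_restore -j 4 -h target-endpoint -U master-user -d mydb mydb.dump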
Importing a PostgreSQL database from an Amazon EC2 instance
If you have data in a PostgreSQL server on an Amazon EC2 instance and want to move it to a PostgreSQL DB instance, you can use the following process. Each step is discussed in more detail in the following sections.
- Create a file using pg_dump that contains the data to be loaded.
- Create the target DB instance.
- Use psql to create the database on the DB instance and load the data.
- Create a DB snapshot of the DB instance.
Step 1: Create a file using pg_dump that contains the data to load
The pg_dump utility uses the COPY command to create a schema and data dump of a PostgreSQL database. The dump script generated by pg_dump loads data into a database with the same name and recreates the tables, indexes, and foreign keys. You can use the pg_restore command and the -d parameter to restore the data to a database with a different name.
Before you create the data dump, you should query the tables to be dumped to get a row count so you can confirm the count on the target DB instance.
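For example, one simple way to capture counts from psql before you run pg_dump (the table name mytable below is a placeholder):
-- Approximate per-table row counts from PostgreSQL statistics.
SELECT relname, n_live_tup FROM pg_stat_user_tables ORDER BY relname;
-- Exact count for a specific table (placeholder name).
SELECT count(*) FROM mytable;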
The following command creates a dump file called mydb2dump.sql for a database called mydb2.
prompt> pg_dump dbname=mydb2 -f mydb2dump.sql
Step 2: Create the target DB instance
Create the target PostgreSQL DB instance using either the Amazon RDS console, the AWS CLI, or the API. Create the instance with the backup retention setting set to 0 and disable Multi-AZ. Doing so allows faster data import. You must create a database on the instance before you can restore the dumped data into it. The database can have the same name as the database that contained the dumped data. Alternatively, you can create a database with a different name. In this case, you use the pg_restore command and the -d parameter to restore the data into the newly named database.
For example, the following commands can be used to dump, restore, and rename a database.
pg_dump -Fc -v -h [endpoint of instance] -U [master username] [database] > [database].dump
createdb [new database name]
pg_restore -v -h [endpoint of instance] -U [master username] -d [new database name] [database].dump
Step 3: Use psql to create the database on the DB instance and load data
You can use the same connection that you used to run the pg_dump command to connect to the target DB instance and recreate the database. Using psql, you can use the master user name and master password to create the database on the DB instance.
The following example uses psql and a dump file named mydb2dump.sql to create a database called mydb2 on a PostgreSQL DB instance called mypginstance:
For Linux, macOS, or Unix:
psql \
    -f mydb2dump.sql \
    --host mypginstance.c6c8mntzhgv0.us-west-2.rds.amazonaws.com \
    --port 8199 \
    --username myawsuser \
    --password \
    --dbname mydb2
For Windows:
psql ^
    -f mydb2dump.sql ^
    --host mypginstance.c6c8mntzhgv0.us-west-2.rds.amazonaws.com ^
    --port 8199 ^
    --username myawsuser ^
    --password ^
    --dbname mydb2
Step 4: Create a DB snapshot of the DB instance
Once you have verified that the data was loaded into your DB instance, we recommend that you create a DB snapshot of the target PostgreSQL DB instance. DB snapshots are complete backups of your DB instance that can be used to restore your DB instance to a known state. A DB snapshot taken immediately after the load protects you from having to load the data again in case of a mishap. You can also use such a snapshot to seed new DB instances. For information about creating a DB snapshot, see Creating a DB snapshot.
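For example, you can take the snapshot with the AWS CLI; the instance and snapshot identifiers below are placeholders.
# Snapshot the DB instance after the imported data has been verified.
aws rds create-db-snapshot \
    --db-instance-identifier mypginstance \
    --db-snapshot-identifier mypginstance-after-import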
Using the \copy command to import data to a table on a PostgreSQL DB instance
You can run the \copy command from the psql prompt to import data into a table on a PostgreSQL DB instance. The table must already exist on the DB instance. For more information on the \copy command, see the PostgreSQL documentation.
The \copy command doesn't provide confirmation of actions, such as a count of rows inserted. PostgreSQL does provide error messages if the \copy command fails due to an error.
Create a .csv file from the data in the source table, log on to the target database on the PostgreSQL instance using psql, and then run the following command. This example uses source-table as the source table name, source-table.csv as the .csv file, and target-db as the target database:
target-db=> \copy source-table from 'source-table.csv' with DELIMITER ',';
You can also run the following command from your client computer command prompt. This example uses source-table as the source table name, source-table.csv as the .csv file, and target-db as the target database:
For Linux, macOS, or Unix:
psql target-db \
    -U <admin user> \
    -p <port> \
    -h <DB instance name> \
    -c "\copy source-table from 'source-table.csv' with DELIMITER ','"
For Windows:
psql target-db ^
    -U <admin user> ^
    -p <port> ^
    -h <DB instance name> ^
    -c "\copy source-table from 'source-table.csv' with DELIMITER ','"
Importing Amazon S3 data into an RDS for PostgreSQL DB instance
You can import data from Amazon S3 into a table belonging to an RDS for PostgreSQL DB instance. To do this, you use the aws_s3 PostgreSQL extension that Amazon RDS provides.
To import from Amazon S3 into RDS for PostgreSQL, your database must be running PostgreSQL version 10.7 or later.
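To confirm the engine version of your DB instance, you can, for example, run the following query from psql.
psql=> SHOW server_version;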
For more information on storing data with Amazon S3, see Create a bucket in the Amazon Simple Storage Service Getting Started Guide. For instructions on how to upload a file to an Amazon S3 bucket, see Add an object to a bucket in the Amazon Simple Storage Service Getting Started Guide.
Overview of importing Amazon S3 data
To import data stored in an Amazon S3 bucket to a PostgreSQL database table, follow these steps.
To import S3 data into Amazon RDS
- Install the required PostgreSQL extensions. These include the aws_s3 and aws_commons extensions. To do so, start psql and use the following command.
  psql=> CREATE EXTENSION aws_s3 CASCADE;
  NOTICE: installing required extension "aws_commons"
  The aws_s3 extension provides the aws_s3.table_import_from_s3 function that you use to import Amazon S3 data. The aws_commons extension provides additional helper functions.
- Identify the database table and Amazon S3 file to use.
  The aws_s3.table_import_from_s3 function requires the name of the PostgreSQL database table that you want to import data into. The function also requires that you identify the Amazon S3 file to import. To provide this information, take the following steps.
  - Identify the PostgreSQL database table to put the data in. For example, the following is a sample t1 database table used in the examples for this topic.
    psql=> CREATE TABLE t1 (col1 varchar(80), col2 varchar(80), col3 varchar(80));
  - Get the following information to identify the Amazon S3 file that you want to import:
    - Bucket name – A bucket is a container for Amazon S3 objects or files.
    - File path – The file path locates the file in the Amazon S3 bucket.
    - AWS Region – The AWS Region is the location of the Amazon S3 bucket. For example, if the S3 bucket is in the US East (N. Virginia) Region, use us-east-1. For a listing of AWS Region names and associated values, see Regions, Availability Zones, and Local Zones.
    To find how to get this information, see View an object in the Amazon Simple Storage Service Getting Started Guide. You can confirm the information by using the AWS CLI command aws s3 cp. If the information is correct, this command downloads a copy of the Amazon S3 file.
    aws s3 cp s3://sample_s3_bucket/sample_file_path ./
  - Use the aws_commons.create_s3_uri function to create an aws_commons._s3_uri_1 structure to hold the Amazon S3 file information. You provide this aws_commons._s3_uri_1 structure as a parameter in the call to the aws_s3.table_import_from_s3 function. For a psql example, see the following.
    psql=> SELECT aws_commons.create_s3_uri( 'sample_s3_bucket', 'sample.csv', 'us-east-1' ) AS s3_uri \gset
- Provide permission to access the Amazon S3 file.
  To import data from an Amazon S3 file, give the RDS for PostgreSQL DB instance permission to access the Amazon S3 bucket the file is in. To do this, you use either an AWS Identity and Access Management (IAM) role or security credentials. For more information, see Setting up access to an Amazon S3 bucket.
- Import the Amazon S3 data by calling the aws_s3.table_import_from_s3 function.
  After you complete the previous preparation tasks, use the aws_s3.table_import_from_s3 function to import the Amazon S3 data. For more information, see Using the aws_s3.table_import_from_s3 function to import Amazon S3 data.
Setting up access to an Amazon S3 bucket
To import data from an Amazon S3 file, give the RDS for PostgreSQL DB instance permission to access the Amazon S3 bucket the file is in. You provide access to an Amazon S3 bucket in one of two ways, as described in the following topics.
Using an IAM role to access an Amazon S3 bucket
Before you load data from an Amazon S3 file, give your RDS for PostgreSQL DB instance permission to access the Amazon S3 bucket the file is in. This way, you don't have to manage additional credential information or provide it in the aws_s3.table_import_from_s3 function call.
To do this, create an IAM policy that provides access to the Amazon S3 bucket. Create an IAM role and attach the policy to the role. Then assign the IAM role to your DB instance.
To give an RDS for PostgreSQL DB instance access to Amazon S3 through an IAM role
- Create an IAM policy.
  This policy provides the bucket and object permissions that allow your RDS for PostgreSQL DB instance to access Amazon S3.
  Include in the policy the following required actions to allow the transfer of files from an Amazon S3 bucket to Amazon RDS:
  - s3:GetObject
  - s3:ListBucket
  Include in the policy the following resources to identify the Amazon S3 bucket and objects in the bucket. This shows the Amazon Resource Name (ARN) format for accessing Amazon S3.
  - arn:aws:s3:::your-s3-bucket
  - arn:aws:s3:::your-s3-bucket/*
  For more information on creating an IAM policy for Amazon RDS for PostgreSQL, see Creating and using an IAM policy for IAM database access. See also Tutorial: Create and attach your first customer managed policy in the IAM User Guide.
  The following AWS CLI command creates an IAM policy named rds-s3-import-policy with these options. It grants access to a bucket named your-s3-bucket.
  Note: After you create the policy, note the Amazon Resource Name (ARN) of the policy. You need the ARN for a subsequent step when you attach the policy to an IAM role.
  Example
  For Linux, macOS, or Unix:
  aws iam create-policy \
      --policy-name rds-s3-import-policy \
      --policy-document '{"Version": "2012-10-17", "Statement": [{"Sid": "s3import", "Action": ["s3:GetObject", "s3:ListBucket"], "Effect": "Allow", "Resource": ["arn:aws:s3:::your-s3-bucket", "arn:aws:s3:::your-s3-bucket/*"]}]}'
  For Windows:
  aws iam create-policy ^
      --policy-name rds-s3-import-policy ^
      --policy-document '{"Version": "2012-10-17", "Statement": [{"Sid": "s3import", "Action": ["s3:GetObject", "s3:ListBucket"], "Effect": "Allow", "Resource": ["arn:aws:s3:::your-s3-bucket", "arn:aws:s3:::your-s3-bucket/*"]}]}'
- Create an IAM role.
  You do this so Amazon RDS can assume this IAM role on your behalf to access your Amazon S3 buckets. For more information, see Creating a role to delegate permissions to an IAM user in the IAM User Guide.
  The following example shows using the AWS CLI command to create a role named rds-s3-import-role.
  Example
  For Linux, macOS, or Unix:
  aws iam create-role \
      --role-name rds-s3-import-role \
      --assume-role-policy-document '{"Version": "2012-10-17", "Statement": [{"Effect": "Allow", "Principal": {"Service": "rds.amazonaws.com"}, "Action": "sts:AssumeRole"}]}'
  For Windows:
  aws iam create-role ^
      --role-name rds-s3-import-role ^
      --assume-role-policy-document '{"Version": "2012-10-17", "Statement": [{"Effect": "Allow", "Principal": {"Service": "rds.amazonaws.com"}, "Action": "sts:AssumeRole"}]}'
- Attach the IAM policy that you created to the IAM role that you created.
  The following AWS CLI command attaches the policy created earlier to the role named rds-s3-import-role. Replace your-policy-arn with the policy ARN that you noted in an earlier step.
  Example
  For Linux, macOS, or Unix:
  aws iam attach-role-policy \
      --policy-arn your-policy-arn \
      --role-name rds-s3-import-role
  For Windows:
  aws iam attach-role-policy ^
      --policy-arn your-policy-arn ^
      --role-name rds-s3-import-role
- Add the IAM role to the DB instance.
  You do so by using the AWS Management Console or AWS CLI, as described following.
  Note: Also, be sure the database you use doesn't have any of the restrictions noted in Importing Amazon S3 data into an RDS for PostgreSQL DB instance.
To add an IAM role for a PostgreSQL DB instance using the console
- Sign in to the AWS Management Console and open the Amazon RDS console at https://console.aws.amazon.com/rds/.
- Choose the PostgreSQL DB instance name to display its details.
- On the Connectivity & security tab, in the Manage IAM roles section, choose the role to add under Add IAM roles to this instance.
- Under Feature, choose s3Import.
- Choose Add role.
To add an IAM role for a PostgreSQL DB instance using the CLI
- Use the following command to add the role to the PostgreSQL DB instance named my-db-instance. Replace your-role-arn with the role ARN that you noted in a previous step. Use s3Import for the value of the --feature-name option.
  Example
  For Linux, macOS, or Unix:
  aws rds add-role-to-db-instance \
      --db-instance-identifier my-db-instance \
      --feature-name s3Import \
      --role-arn your-role-arn \
      --region your-region
  For Windows:
  aws rds add-role-to-db-instance ^
      --db-instance-identifier my-db-instance ^
      --feature-name s3Import ^
      --role-arn your-role-arn ^
      --region your-region
To add an IAM role for a PostgreSQL DB instance using the Amazon RDS API, call the AddRoleToDBInstance operation.
Using security credentials to access an Amazon S3 bucket
If you prefer, you can use security credentials to provide access to an Amazon S3 bucket instead of providing access with an IAM role. To do this, use the credentials parameter in the aws_s3.table_import_from_s3 function call.
The credentials parameter is a structure of type aws_commons._aws_credentials_1, which contains AWS credentials. Use the aws_commons.create_aws_credentials function to set the access key and secret key in an aws_commons._aws_credentials_1 structure, as shown following.
psql=> SELECT aws_commons.create_aws_credentials( 'sample_access_key', 'sample_secret_key', '') AS creds \gset
After creating the aws_commons._aws_credentials_1 structure, use the aws_s3.table_import_from_s3 function with the credentials parameter to import the data, as shown following.
psql=> SELECT aws_s3.table_import_from_s3( 't', '', '(format csv)', :'s3_uri', :'creds' );
Or you can include the aws_commons.create_aws_credentials function call inline within the aws_s3.table_import_from_s3 function call.
psql=> SELECT aws_s3.table_import_from_s3( 't', '', '(format csv)', :'s3_uri', aws_commons.create_aws_credentials('sample_access_key', 'sample_secret_key', '') );
Troubleshooting access to Amazon S3
If you encounter connection problems when attempting to import Amazon S3 file data, see the following for recommendations:
Using the aws_s3.table_import_from_s3 function to import Amazon S3 data
Import your Amazon S3 data by calling the aws_s3.table_import_from_s3 function.
The following examples use the IAM role method for providing access to the Amazon S3 bucket. Thus, the aws_s3.table_import_from_s3 function calls don't include credential parameters.
The following shows a typical PostgreSQL example using psql.
psql=> SELECT aws_s3.table_import_from_s3( 't1', '', '(format csv)', :'s3_uri' );
The parameters are the following:
- t1 – The name for the table in the PostgreSQL DB instance to copy the data into.
- '' – An optional list of columns in the database table. You can use this parameter to indicate which columns of the S3 data go in which table columns. If no columns are specified, all the columns are copied to the table. For an example of using a column list, see Importing an Amazon S3 file that uses a custom delimiter.
- (format csv) – PostgreSQL COPY arguments. The copy process uses the arguments and format of the PostgreSQL COPY command. In the preceding example, the COPY command uses the comma-separated value (CSV) file format to copy the data.
- s3_uri – A structure that contains the information identifying the Amazon S3 file. For an example of using the aws_commons.create_s3_uri function to create an s3_uri structure, see Overview of importing Amazon S3 data.
The return value is text. For the full reference of this function, see aws_s3.table_import_from_s3.
The following examples show how to specify different kinds of files when importing Amazon S3 data.
Importing an Amazon S3 file that uses a custom delimiter
The following example shows how to import a file that uses a custom delimiter. It also shows how to control where to put the data in the database table using the column_list parameter of the aws_s3.table_import_from_s3 function.
For this example, assume that the following information is organized into pipe-delimited columns in the Amazon S3 file.
1|foo1|bar1|elephant1
2|foo2|bar2|elephant2
3|foo3|bar3|elephant3
4|foo4|bar4|elephant4
...
To import a file that uses a custom delimiter
- Create a table in the database for the imported data.
  psql=> CREATE TABLE test (a text, b text, c text, d text, e text);
  CREATE TABLE
- Use the following form of the aws_s3.table_import_from_s3 function to import data from the Amazon S3 file. You can include the aws_commons.create_s3_uri function call inline within the aws_s3.table_import_from_s3 function call to specify the file.
  psql=> SELECT aws_s3.table_import_from_s3( 'test', 'a,b,d,e', 'DELIMITER ''|''', aws_commons.create_s3_uri('sampleBucket', 'pipeDelimitedSampleFile', 'us-east-2') );
The data is now in the table in the following columns.
psql=> SELECT * FROM test;
 a |  b   | c |  d   |     e
---+------+---+------+-----------
 1 | foo1 |   | bar1 | elephant1
 2 | foo2 |   | bar2 | elephant2
 3 | foo3 |   | bar3 | elephant3
 4 | foo4 |   | bar4 | elephant4
Importing an Amazon S3 compressed (gzip) file
The following example shows how to import a file from Amazon S3 that is compressed with gzip.
Ensure that the file contains the following Amazon S3 metadata:
- Key: Content-Encoding
- Value: gzip
For more about adding these values to Amazon S3 metadata, see How do I add metadata to an S3 object? in the Amazon Simple Storage Service Console User Guide.
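One way to set this metadata when you upload the file is the --content-encoding option of the AWS CLI; the bucket and file names below are placeholders.
# Upload the gzip file with Content-Encoding metadata set to gzip.
aws s3 cp test-data.gz s3://myS3Bucket/test-data.gz --content-encoding gzip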
Import the gzip file into your RDS for PostgreSQL DB instance as shown following.
psql=> CREATE TABLE test_gzip(id int, a text, b text, c text, d text);
CREATE TABLE
psql=> SELECT aws_s3.table_import_from_s3( 'test_gzip', '', '(format csv)', 'myS3Bucket', 'test-data.gz', 'us-east-2' );
Importing an encoded Amazon S3 file
The following example shows how to import a file from Amazon S3 that has Windows-1252 encoding.
psql=> SELECT aws_s3.table_import_from_s3( 'test_table', '', 'encoding ''WIN1252''', aws_commons.create_s3_uri('sampleBucket', 'SampleFile', 'us-east-2') );
Function reference
aws_s3.table_import_from_s3
Imports Amazon S3 data into an Amazon RDS table. The aws_s3 extension provides the aws_s3.table_import_from_s3 function. The return value is text.
Syntax
The required parameters are table_name, column_list, and options. These identify the database table and specify how the data is copied into the table.
You can also use the following parameters:
- The s3_info parameter specifies the Amazon S3 file to import. When you use this parameter, access to Amazon S3 is provided by an IAM role for the PostgreSQL DB instance.
  aws_s3.table_import_from_s3 ( table_name text, column_list text, options text, s3_info aws_commons._s3_uri_1 )
- The credentials parameter specifies the credentials to access Amazon S3. When you use this parameter, you don't use an IAM role.
  aws_s3.table_import_from_s3 ( table_name text, column_list text, options text, s3_info aws_commons._s3_uri_1, credentials aws_commons._aws_credentials_1 )
Parameters
- table_name – A required text string containing the name of the PostgreSQL database table to import the data into.
- column_list – A required text string containing an optional list of the PostgreSQL database table columns in which to copy the data. If the string is empty, all columns of the table are used. For an example, see Importing an Amazon S3 file that uses a custom delimiter.
- options – A required text string containing arguments for the PostgreSQL COPY command. These arguments specify how the data is to be copied into the PostgreSQL table. For more details, see the PostgreSQL COPY documentation.
- s3_info – An aws_commons._s3_uri_1 composite type containing the following information about the S3 object:
  - bucket – The name of the Amazon S3 bucket containing the file.
  - file_path – The Amazon S3 file name including the path of the file.
  - region – The AWS Region that the file is in. For a listing of AWS Region names and associated values, see Regions, Availability Zones, and Local Zones.
- credentials – An aws_commons._aws_credentials_1 composite type containing the following credentials to use for the import operation:
  - Access key
  - Secret key
  - Session token
  For information about creating an aws_commons._aws_credentials_1 composite structure, see aws_commons.create_aws_credentials.
Alternate syntax
To help with testing, you can use an expanded set of parameters instead of the s3_info and credentials parameters. Following are additional syntax variations for the aws_s3.table_import_from_s3 function:
- Instead of using the s3_info parameter to identify an Amazon S3 file, use the combination of the bucket, file_path, and region parameters. With this form of the function, access to Amazon S3 is provided by an IAM role on the PostgreSQL DB instance.
  aws_s3.table_import_from_s3 ( table_name text, column_list text, options text, bucket text, file_path text, region text )
- Instead of using the credentials parameter to specify Amazon S3 access, use the combination of the access_key, secret_key, and session_token parameters.
  aws_s3.table_import_from_s3 ( table_name text, column_list text, options text, bucket text, file_path text, region text, access_key text, secret_key text, session_token text )
Alternate parameters
- bucket – A text string containing the name of the Amazon S3 bucket that contains the file.
- file_path – A text string containing the Amazon S3 file name including the path of the file.
- region – A text string containing the AWS Region that the file is in. For a listing of AWS Region names and associated values, see Regions, Availability Zones, and Local Zones.
- access_key – A text string containing the access key to use for the import operation. The default is NULL.
- secret_key – A text string containing the secret key to use for the import operation. The default is NULL.
- session_token – (Optional) A text string containing the session token to use for the import operation. The default is NULL.
aws_commons.create_s3_uri
Creates an aws_commons._s3_uri_1 structure to hold Amazon S3 file information. Use the results of the aws_commons.create_s3_uri function in the s3_info parameter of the aws_s3.table_import_from_s3 function.
Syntax
aws_commons.create_s3_uri( bucket text, file_path text, region text )
Parameters
- bucket – A required text string containing the Amazon S3 bucket name for the file.
- file_path – A required text string containing the Amazon S3 file name including the path of the file.
- region – A required text string containing the AWS Region that the file is in. For a listing of AWS Region names and associated values, see Regions, Availability Zones, and Local Zones.
aws_commons.create_aws_credentials
Sets an access key and secret key in an aws_commons._aws_credentials_1 structure. Use the results of the aws_commons.create_aws_credentials function in the credentials parameter of the aws_s3.table_import_from_s3 function.
Syntax
aws_commons.create_aws_credentials( access_key text, secret_key text, session_token text )
Parameters
- access_key – A required text string containing the access key to use for importing an Amazon S3 file. The default is NULL.
- secret_key – A required text string containing the secret key to use for importing an Amazon S3 file. The default is NULL.
- session_token – An optional text string containing the session token to use for importing an Amazon S3 file. The default is NULL. If you provide an optional session_token, you can use temporary credentials.
Transporting PostgreSQL databases between DB instances
By using PostgreSQL Transportable Databases for Amazon RDS, you can transport a PostgreSQL database between two DB instances. This provides an extremely fast method of migrating large databases between separate DB instances. To transport databases using this method, your DB instances must both run the same major version of PostgreSQL.
To use transportable databases, install the pg_transport extension. This extension provides a physical transport mechanism to move each database. By streaming the database files with minimal processing, physical transport moves data much faster than traditional dump and load processes and takes minimal downtime. PostgreSQL transportable databases use a pull model where the destination DB instance imports the database from the source DB instance.
PostgreSQL transportable databases are available in RDS for PostgreSQL versions 10.10 and later, and 11.5 and later.
Topics
- Limitations for using PostgreSQL transportable databases
- Setting up to transport PostgreSQL databases
- Transporting a PostgreSQL database using the transport.import_from_server function
- What happens during database transport
- transport.import_from_server function reference
- Configuration parameters for the pg_transport extension
Limitations for using PostgreSQL transportable databases
Transportable databases have the following limitations:
- Read replicas – You can't use transportable databases on read replicas or parent instances of read replicas.
- Unsupported column types – You can't use the reg data types in any database tables that you plan to transport with this method. These types depend on system catalog object IDs (OIDs), which often change during transport.
- Tablespaces – All source database objects must be in the default pg_default tablespace.
- Compatibility – Both the source and destination DB instances must run the same major version of PostgreSQL.
  Before transport begins, the transport.import_from_server function compares the source and destination DB instances to ensure database compatibility. This includes verifying PostgreSQL major version compatibility. Also, the function verifies that the destination DB instance likely has enough space to receive the source database. The function performs several additional checks to make sure that the transport is smooth.
- Extensions – The only extension that you can install on the source DB instance during transport is pg_transport.
- Roles and ACLs – The source database's access privileges and ownership information aren't carried over to the destination database. All database objects are created and owned by the local destination user of the transport.
- Concurrent transports – You can run up to 32 total transports at the same time on a DB instance, including both imports and exports. To define the worker processes used for each transport, use the pg_transport.work_mem and pg_transport.num_workers parameters. To accommodate concurrent transports, you might need to increase the max_worker_processes parameter quite a bit. For more information, see Configuration parameters for the pg_transport extension.
Setting up to transport PostgreSQL databases
To prepare to transport a PostgreSQL database from one DB instance to another, take the following steps.
To set up for transporting a PostgreSQL database
- Make sure that the source DB instance's security group allows inbound traffic from the destination DB instance. This is required because the destination DB instance starts the database transport with an import call to the source DB instance. For information about how to use security groups, see Controlling access with security groups.
- For both the source and destination DB instances, add pg_transport to the shared_preload_libraries parameter for each parameter group. The shared_preload_libraries parameter is static and requires a database restart for changes to take effect. For information about parameter groups, see Working with DB parameter groups. A sketch of one way to set this parameter with the AWS CLI follows this list.
- For both the source and destination DB instances, install the required pg_transport PostgreSQL extension.
  To do so, start psql as a user with the rds_superuser role for each DB instance, and then run the following command.
  psql=> CREATE EXTENSION pg_transport;
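The following is a sketch of one way to set shared_preload_libraries with the AWS CLI, assuming a custom DB parameter group named my-transport-params (a placeholder). Because the parameter is static, the new value takes effect only after the DB instance reboots.
# Add pg_transport to shared_preload_libraries in a custom parameter group (placeholder name).
# If the parameter already lists other libraries, keep them in ParameterValue as a comma-separated list.
aws rds modify-db-parameter-group \
    --db-parameter-group-name my-transport-params \
    --parameters "ParameterName=shared_preload_libraries,ParameterValue=pg_transport,ApplyMethod=pending-reboot"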
Transporting a PostgreSQL database using the transport.import_from_server function
After you complete the process described in Setting up to transport PostgreSQL databases, you can start the transport. To do so, run the transport.import_from_server function on the destination DB instance.
Both the destination user for transport and the source user for the connection must be members of the rds_superuser role.
The destination DB instance can't already contain a database with the same name as the source database to be transported, or the transport fails.
The following shows an example transport.
SELECT transport.import_from_server(
   'source-db-instance-endpoint',
   source-db-instance-port,
   'source-db-instance-user',
   'source-user-password',
   'source-database-name',
   'destination-user-password',
   false);
This function requires that you provide database user passwords. Thus, we recommend that you change the passwords of the user roles you used after transport is complete. Or, you can use SQL bind variables to create temporary user roles. Use these temporary roles for the transport and then discard the roles afterwards.
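A minimal sketch of the temporary-role approach follows; the role name and password are placeholders, it doesn't show SQL bind variables, and you should adapt it to your own security practices.
-- Create a temporary role for the transport and grant it the required rds_superuser membership.
CREATE ROLE transport_tmp WITH LOGIN PASSWORD 'temporary-password';
GRANT rds_superuser TO transport_tmp;
-- ...run the transport as transport_tmp, then remove the role...
DROP ROLE transport_tmp;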
For details of the transport.import_from_server function and its parameters, see transport.import_from_server function reference.
What happens during database transport
The transport.import_from_server function creates the in-transit database on the destination DB instance. The in-transit database is inaccessible on the destination DB instance for the duration of the transport.
When transport begins, all current sessions on the source database are ended. Any databases other than the source database on the source DB instance aren't affected by the transport.
The source database is put into a special read-only mode. While it's in this mode, you can connect to the source database and run read-only queries. However, write-enabled queries and some other types of commands are blocked. Only the specific source database that is being transported is affected by these restrictions.
During transport, you can't restore the destination DB instance to a point in time. This is because the transport isn't transactional and doesn't use the PostgreSQL write-ahead log to record changes. If the destination DB instance has automatic backups enabled, a backup is automatically taken after transport completes. Point-in-time restores are available for times after the backup finishes.
If the transport fails, the pg_transport extension attempts to undo all changes to the source and destination DB instances. This includes removing the destination's partially transported database. Depending on the type of failure, the source database might continue to reject write-enabled queries. If this happens, use the following command to allow write-enabled queries.
ALTER DATABASE my-database SET default_transaction_read_only = false;
transport.import_from_server function reference
The transport.import_from_server function transports a PostgreSQL database by importing it from a source DB instance to a destination DB instance. It does this by using a physical database connection transport mechanism.
Syntax
transport.import_from_server( host text, port int, username text, password text, database text, local_password text, dry_run bool )
Return Value
None.
Parameters
You can find descriptions of the transport.import_from_server function parameters in the following table.
Parameter | Description
---|---
host | The endpoint of the source DB instance.
port | An integer representing the port of the source DB instance. PostgreSQL DB instances often use port 5432.
username | The user of the source DB instance. This user must be a member of the rds_superuser role.
password | The user password of the source DB instance.
database | The name of the database in the source DB instance to transport.
local_password | The local password of the current user for the destination DB instance. This user must be a member of the rds_superuser role.
dry_run | An optional Boolean value specifying whether to perform a dry run. The default is false. To check compatibility between the source and destination DB instances without transporting the database, set dry_run to true.
Example
For an example, see Transporting a PostgreSQL database using the transport.import_from_server function.
Configuration parameters for the pg_transport extension
Use the following parameters to configure the pg_transport extension behavior.
SET pg_transport.num_workers = integer;
SET pg_transport.work_mem = kilobytes;
SET pg_transport.timing = Boolean;
You can find descriptions of these parameters in the following table.
Parameter | Description
---|---
pg_transport.num_workers | The number of workers to use for a physical transport. The default is 3. Valid values are 1–32. Even large transports typically reach their maximum throughput with fewer than 8 workers. A related parameter is the PostgreSQL max_worker_processes parameter. To accommodate the transport workers, you might need to increase its value, as noted in Limitations for using PostgreSQL transportable databases.
pg_transport.timing | A Boolean value that specifies whether to report timing information during the transport. The default is true, which reports timing information. We don't recommend that you set this parameter to false.
pg_transport.work_mem | The maximum amount of memory to allocate for each worker. The default is 131,072 kilobytes (KB). The minimum value is 64 megabytes (65,536 KB). Valid values are in kilobytes (KB) as binary base-2 units, where 1 KB = 1,024 bytes. The transport might use less memory than is specified in this parameter. Even large transports typically reach their maximum throughput with less than 256 MB (262,144 KB) of memory per worker.