Transport PostgreSQL databases between two Amazon RDS DB instances using pg_transport - AWS Prescriptive Guidance

Transport PostgreSQL databases between two Amazon RDS DB instances using pg_transport

Created by Raunak Rishabh (AWS) and Jitender Kumar (AWS)

Environment: PoC or pilot

Source: Databases: Relational

Target: Amazon RDS for PostgreSQL

R Type: Relocate

Workload: Open-source

Technologies: Migration; Databases

AWS services: Amazon RDS

Summary

This pattern describes the steps for migrating extremely large databases between two Amazon Relational Database Service (Amazon RDS) for PostgreSQL DB instances by using the pg_transport extension. This extension provides a physical transport mechanism to move each database. By streaming the database files with minimal processing, it provides an extremely fast method for migrating large databases between DB instances with minimal downtime. This extension uses a pull model where the target DB instance imports the database from the source DB instance.

Prerequisites and limitations

Prerequisites 

  • Both DB instances must run the same major version of PostgreSQL.

  • The database must not exist on the target. Otherwise, the transport fails.

  • No extension other than pg_transport must be enabled in the source database.

  • All source database objects must be in the default pg_default tablespace.

  • The security group of the source DB instance should allow traffic from the target DB instance.

  • Install a PostgreSQL client like psql or PgAdmin to work with the Amazon RDS PostgreSQL DB instance. You can install the client either in your local system or use an Amazon Elastic Compute Cloud (Amazon EC2) instance. In this pattern, we use psql on an EC2 instance.

Limitations 

  • You can't transport databases between different major versions of Amazon RDS for PostgreSQL.

  • The access privileges and ownership from the source database are not transferred to the target database.

  • You can't transport databases on read replicas or on parent instances of read replicas.

  • You can't use reg data types in any database tables that you plan to transport with this method.

  • You can run up to 32 total transports (including both imports and exports) at the same time on a DB instance.

  • You cannot rename or include/exclude tables. Everything is migrated as is.

Caution

  • Make backups before removing the extension, because removing the extension also removes dependent objects and some data that's critical to the operation of the database.

  • Consider the instance class and processes running on other databases on the source instance when you determine the number of workers and work_mem values for pg_transport.

  • When the transport starts, all connections on the source database are ended and the database is put into read-only mode.

Note: When the transport is running on one database, it doesn’t affect other databases on the same server.

Product versions

Architecture

Transporting PostgreSQL databases between Amazon RDS DB instances

Tools

  • pg_transport provides a physical transport mechanism to move each database. By streaming the database files with minimal processing, physical transport moves data much faster than traditional dump and load processes and requires minimal downtime. PostgreSQL transportable databases use a pull model where the destination DB instance imports the database from the source DB instance. You install this extension on your DB instances when you prepare the source and target environments, as explained in this pattern.

  • psql enables you to connect to, and work with, your PostgreSQL DB instances. To install psql on your system, see the PostgreSQL Downloads page.

Epics

TaskDescriptionSkills required

Create a parameter group for the target system.

Specify a group name that identifies it as a target parameter group; for example, pgtarget-param-group. For instructions, see the Amazon RDS documentation.

DBA

Modify the parameters for the parameter group.

Set the following parameters:

  1. Add pg_transport to the shared_preload_libraries parameter.

    shared_preload_libraries = pg_stat_statements, pg_transport
  2. Set the pg_transport.num_workers parameter. Choose the number of workers you want to run the transport with. The value you set determines the number of transport.send_file workers that will be created in the source.

  3. Increase the the value of max_worker_processes to more than three times the value of pg_transport.num_workers. For example, if you set the value of pg_transport.num_workers to 4, the max_worker_processes value should be at least 13. If this fails, pg_transport recommends a minimum value. 

  4. Set pg_transport.timing to 1. This setting enables the reporting of timing information during the transport.

  5. Set the pg_transport.work_mem parameter. This parameter specifies the maximum memory to allocate to each worker. The default value is 128 MB.

For more information about these parameters, see the Amazon RDS documentation.

DBA
TaskDescriptionSkills required

Create a parameter group for the source system.

Specify a group name that identifies it as a source parameter group; for example, pgsource-param-group. For instructions, see the Amazon RDS documentation.

DBA

Modify the parameters for the parameter group.

Set the following parameters:

  1. Add pg_transport to the shared_preload_libraries parameter.

    shared_preload_libraries = pg_stat_statements, pg_transport
  2. Set the pg_transport.num_workers parameter. The value of this parameter defined in the target determines the number of transport.send_file workers to be used. If you have an import running on this instance, increase this value, but consider the number of workers that are already running.

  3. Increase the the value of max_worker_processes to more than three times the value of pg_transport.num_workers on the target. For example, if you set the value of pg_transport.num_workers to 4 on the target, the max_worker_processes value on the source should be at least 13. If this fails, pg_transport recommends a minimum value. 

  4. Set the pg_transport.work_mem parameter. This parameter specifies the maximum memory to allocate to each worker. The default value is 128 MB.

For more information about these parameters, see the Amazon RDS documentation.

DBA
TaskDescriptionSkills required

Create a new Amazon RDS for PostgreSQL DB instance to transport your source database to.

Determine the instance class and PostgreSQL version based on your business requirements.

DBA, Systems administrator, Database architect

Modify the security group of the target to allow connections on the DB instance port from the EC2 instance.

By default, the port for the PostgreSQL instance is 5432. If you're using another port, connections to that port must be open for the EC2 instance.

DBA, Systems administrator

Modify the instance, and assign the new target parameter group.

For example, pgtarget-param-group.

DBA

Restart the target Amazon RDS DB instance.

The parameters shared_preload_libraries and max_worker_processes are static parameters and require a reboot of the instance.

DBA, Systems administrator

Connect to the database from the EC2 instance using psql.

Use the command: 

psql -h <rds_end_point> -p PORT -U username -d database -W
DBA

Create the pg_transport extension.

Run the following query as a user with the rds_superuser role:

create extension pg_transport;
DBA
TaskDescriptionSkills required

Modify the security group of the source to allow connections on the DB instance port from the Amazon EC2 instance and target DB instance

By default, the port for PostgreSQL instance is 5432. If you're using another port, connections to that port must be open for the EC2 instance.

DBA, Systems administrator

Modify the instance and assign the new source parameter group.

For example, pgsource-param-group.

DBA

Restart the source Amazon RDS DB instance.

The parameters shared_preload_libraries and max_worker_processes are static parameters and require a reboot of the instance.

DBA

Connect to the database from the EC2 instance using psql.

Use the command: 

psql -h <rds_end_point> -p PORT -U username -d database -W
DBA

Create the pg_transport extension and remove all other extensions from the databases to be transported.

The transport will fail if there are any extensions other than pg_transport installed on the source database. This command must by run by a user with the rds_superuser role.

DBA
TaskDescriptionSkills required

Perform a dry run.

Use the transport.import_from_server function to perform a dry run first:

SELECT transport.import_from_server( 'source-db-instance-endpoint', source-db-instance-port, 'source-db-instance-user', 'source-user-password', 'source-database-name', 'destination-user-password', 'true');

The last parameter of this function (set to true) defines the dry run.

This function displays any errors that you would see when you run the main transport. Resolve the errors before you run the main transport.

DBA

If the dry run is successful, initiate the database transport.

Run the transport.import_from_server function to perform the transport. It connects to the source and imports the data. 

SELECT transport.import_from_server( 'source-db-instance-endpoint', source-db-instance-port, 'source-db-instance-user', 'source-user-password', 'source-database-name', 'destination-user-password', false);

The last parameter of this function (set to false) indicates that this isn’t a dry run.

DBA

Perform post-transport steps.

After the database transport is complete:

  • Validate the data in the target environment.

  • Add all the roles and permissions to the target.

  • Enable all required extensions in the target and source, if required.

  • Revert the value of the max_worker_processes parameter.

DBA

Related resources