Menu
AWS Database Migration Service
User Guide (Version API Version 2016-01-01)

Using a PostgreSQL Database as a Source for AWS DMS

You can migrate data from one or many PostgreSQL databases using AWS DMS (AWS DMS). With a PostgreSQL database as a source, you can migrate data to either another PostgreSQL database or one of the other supported databases. AWS DMS supports a PostgreSQL version 9.4 database as a source for on-premises databases, databases on an EC2 instance, and databases on an Amazon RDS DB instance.

You can use SSL to encrypt connections between your PostgreSQL endpoint and the replication instance. For more information on using SSL with a PostgreSQL endpoint, see Using SSL With AWS Database Migration Service.

AWS DMS supports change data capture (CDC) for PostgreSQL tables with primary keys; if a table does not have a primary key, the WAL logs do not include a before image of the database row and AWS DMS cannot update the table.

AWS DMS supports CDC on Amazon RDS PostgreSQL databases when the DB instance is configured to use logical replication. Amazon RDS supports logical replication for a PostgreSQL DB instance version 9.4.9 and higher and 9.5.4 and higher.

For additional details on working with PostgreSQL databases and AWS DMS, see the following sections.

Prerequisites for Using a PostgreSQL Database as a Source for AWS DMS

For a PostgreSQL database to be a source for AWS DMS, you should do the following:

  • Use a PostgreSQL database that is version 9.4.x or later.

  • Grant superuser permissions for the user account specified for the PostgreSQL source database.

  • Add the IP address of the AWS DMS replication server to the pg_hba.conf configuration file.

  • Set the following parameters and values in the postgresql.conf configuration file:

    • Set wal_level = logical

    • Set max_replication_slots >=1

      The max_replication_slots value should be set according to the number of tasks that you want to run. For example, to run five tasks you need to set a minimum of five slots. Slots open automatically as soon as a task starts and remain open even when the task is no longer running. You need to manually delete open slots.

    • Set max_wal_senders >=1

      The max_wal_senders parameter sets the number of concurrent tasks that can run.

    • Set wal_sender_timeout =0

      The wal_sender_timeout parameter terminates replication connections that are inactive longer than the specified number of milliseconds. Although the default is 60 seconds, we recommend that you set this parameter to zero, which disables the timeout mechanism.

Security Requirements When Using a PostgreSQL Database as a Source for AWS DMS

The only security requirement when using PostgreSQL as a source is that the user account specified must be a registered user in the PostgreSQL database.

Limitations on Using a PostgreSQL Database as a Source for AWS DMS

The following change data capture (CDC) limitations apply when using PostgreSQL as a source for AWS DMS:

  • A captured table must have a primary key. If a table doesn't have a primary key, AWS DMS ignores DELETE and UPDATE record operations for that table.

  • AWS DMS ignores an attempt to update a primary key segment. In these cases, the target identifies the update as one that didn't update any rows, but since the results of updating a primary key in PostgreSQL is unpredictable, no records are written to the exceptions table.

  • AWS DMS doesn't support the Start Process Changes from Timestamp run option.

  • AWS DMS supports full load and change processing on Amazon RDS for PostgreSQL. For information on how to prepare a PostgreSQL DB instance and to set it up for using CDC, see Setting Up an Amazon RDS PostgreSQL DB Instance as a Source.

  • AWS DMS doesn't map some PostgreSQL data types, including the JSON data type. The JSON is converted to CLOB.

  • Replication of multiple tables with the same name but where each name has a different case (for example table1, TABLE1, and Table1) can cause unpredictable behavior, and therefore AWS DMS doesn't support it.

  • AWS DMS supports change processing of CREATE, ALTER, and DROP DDL statements for tables unless the tables are held in an inner function or procedure body block or in other nested constructs.

    For example, the following change is not captured:

    Copy
    CREATE OR REPLACE FUNCTION attu.create_distributors1() RETURNS void LANGUAGE plpgsql AS $$ BEGIN create table attu.distributors1(did serial PRIMARY KEY,name varchar(40) NOT NULL); END; $$;
  • AWS DMS doesn't support change processing of TRUNCATE operations.

  • AWS DMS doesn't support replication of partitioned tables. When a partitioned table is detected, the following occurs:

    • The endpoint reports a list of parent and child tables.

    • AWS DMS creates the table on the target as a regular table with the same properties as the selected tables.

    • If the parent table in the source database has the same primary key value as its child tables, a "duplicate key" error is generated.

Note

To replicate partitioned tables from a PostgreSQL source to a PostgreSQL target, you first need to manually create the parent and child tables on the target. Then you define a separate task to replicate to those tables. In such a case, you set the task configuration to Truncate before loading.

Setting Up an Amazon RDS PostgreSQL DB Instance as a Source

You can use an Amazon RDS for PostgreSQL DB instance or Read Replica as a source for AWS DMS. A DB instance can be used for both full-load and CDC (ongoing replication); a Read Replica can only be used for full-load tasks and cannot be used for CDC.

You use the AWS master user account for the PostgreSQL DB instance as the user account for the PostgreSQL source endpoint for AWS DMS. The master user account has the required roles that allow it to set up change data capture (CDC). If you use an account other than the master user account, the account must have the rds_superuser role and the rds_replication role. The rds_replication role grants permissions to manage logical slots and to stream data using logical slots.

If you don't use the master user account for the DB instance, you must create several objects from the master user account for the account that you use. For information about creating the needed objects, see Migrating an Amazon RDS for PostgreSQL Database Without Using the Master User Account.

Using CDC with an RDS for PostgreSQL DB Instance

You can use PostgreSQL's native logical replication feature to enable CDC during a database migration of an Amazon RDS PostgreSQL DB instance. This approach reduces downtime and ensures that the target database is in sync with the source PostgreSQL database. Amazon RDS supports logical replication for a PostgreSQL DB instance version 9.4.9 and higher and 9.5.4 and higher.

Note

Amazon RDS for PostgreSQL Read Replicas cannot be used for CDC (ongoing replication).

To enable logical replication for an RDS PostgreSQL DB instance, do the following:

  • In general, use the AWS master user account for the PostgreSQL DB instance as the user account for the PostgreSQL source endpoint. The master user account has the required roles that allow the it to set up CDC. If you use an account other than the master user account, you must create several objects from the master account for the account that you use. For more information, see Migrating an Amazon RDS for PostgreSQL Database Without Using the Master User Account.

  • Set the rds.logical_replication parameter in your DB parameter group to 1. This is a static parameter that requires a reboot of the DB instance for the parameter to take effect. As part of applying this parameter, AWS DMS sets the wal_level, max_wal_senders, max_replication_slots, and max_connections parameters. Note that these parameter changes can increase WAL generation so you should only set the rds.logical_replication parameter when you are using logical slots.

Migrating an Amazon RDS for PostgreSQL Database Without Using the Master User Account

If you don't use the master user account for the Amazon RDS PostgreSQL DB instance that you are using as a source, you need to create several objects to capture data definition language (DDL) events. You create these objects in the account other than the master account and then create a trigger in the master user account.

Note

If you set the captureDDL parameter to N on the source endpoint, you don't have to create the following table and trigger on the source database.

Use the following procedure to create these objects. The user account other than the master account is referred to as the NoPriv account in this procedure.

To create objects

  1. Choose a schema where the objects will be created. The default schema is public. Ensure that the schema exists and is accessible by the NoPriv account.

  2. Log in to the PostgreSQL DB instance using the NoPriv account.

  3. Create the table awsdms_ddl_audit by running the following command, replacing <objects_schema> in the code following with the name of the schema to use.

    Copy
    create table <objects_schema>.awsdms_ddl_audit ( c_key bigserial primary key, c_time timestamp, -- Informational c_user varchar(64), -- Informational: current_user c_txn varchar(16), -- Informational: current transaction c_tag varchar(24), -- Either 'CREATE TABLE' or 'ALTER TABLE' or 'DROP TABLE' c_oid integer, -- For future use - TG_OBJECTID c_name varchar(64), -- For future use - TG_OBJECTNAME c_schema varchar(64), -- For future use - TG_SCHEMANAME. For now - holds current_schema c_ddlqry text -- The DDL query associated with the current DDL event )
  4. Create the function awsdms_intercept_ddl by running the following command, replacing <objects_schema> in the code following with the name of the schema to use.

    Copy
    CREATE OR REPLACE FUNCTION <objects_schema>.awsdms_intercept_ddl() RETURNS event_trigger LANGUAGE plpgsql SECURITY DEFINER AS $$ declare _qry text; BEGIN if (tg_tag='CREATE TABLE' or tg_tag='ALTER TABLE' or tg_tag='DROP TABLE') then SELECT current_query() into _qry; insert into <objects_schema>.awsdms_ddl_audit values ( default,current_timestamp,current_user,cast(TXID_CURRENT()as varchar(16)),tg_tag,0,'',current_schema,_qry ); delete from <objects_schema>.awsdms_ddl_audit; end if; END; $$;
  5. Log out of the NoPriv account and log in with an account that has the rds_superuser role assigned to it.

  6. Create the event trigger awsdms_intercept_ddl by running the following command.

    Copy
    CREATE EVENT TRIGGER awsdms_intercept_ddl ON ddl_command_end EXECUTE PROCEDURE <objects_schema>.awsdms_intercept_ddl();

When you have completed the procedure preceding, you can create the AWS DMS source endpoint using the NoPriv account.

Using CDC with an Amazon RDS for PostgreSQL DB Instance

You can use PostgreSQL's native logical replication feature to enable CDC during a database migration of an Amazon RDS PostgreSQL DB instance. This approach reduces downtime and ensures that the target database is in sync with the source PostgreSQL database. Amazon RDS supports logical replication for a PostgreSQL DB instance version 9.4.9 and higher and 9.5.4 and higher.

To enable logical replication for an RDS PostgreSQL DB instance, do the following:

  • In general, use the AWS master user account for the PostgreSQL DB instance as the user account for the PostgreSQL source endpoint. The master user account has the required roles that allow the it to set up CDC. If you use an account other than the master user account, you must create several objects from the master account for the account that you use. For more information, see Migrating an Amazon RDS for PostgreSQL Database Without Using the Master User Account.

  • Set the rds.logical_replication parameter in your DB parameter group to 1. This is a static parameter that requires a reboot of the DB instance for the parameter to take effect. As part of applying this parameter, AWS DMS sets the wal_level, max_wal_senders, max_replication_slots, and max_connections parameters. Note that these parameter changes can increase WAL generation so you should only set the rds.logical_replication parameter when you are using logical slots.

Removing AWS DMS Artifacts from a PostgreSQL Source Database

To capture DDL events, AWS DMS creates various artifacts in the PostgreSQL database when a migration task starts. When the task completes, you might want to remove these artifacts. To remove the artifacts, issue the following statements (in the order they appear), where {AmazonRDSMigration} is the schema in which the artifacts were created:

Copy
drop event trigger awsdms_intercept_ddl;

Note that the event trigger doesn't belong to a specific schema.

Copy
drop function {AmazonRDSMigration}.awsdms_intercept_ddl() drop table {AmazonRDSMigration}.awsdms_ddl_audit drop schema {AmazonRDSMigration}

Note

Dropping a schema should be done with extreme caution, if at all. Never drop an operational schema, especially not a public one.

Additional Configuration Settings When Using a PostgreSQL Database as a Source for AWS DMS

You can add additional configuration settings when migrating data from a PostgreSQL database in two way.

  • You can add values to the Extra Connection attribute to capture DDL events and to specify the schema in which the operational DDL database artifacts are created. For more information, see PostgreSQL.

  • You can override connection string parameters. Select this option if you need to do either of the following:

    • Specify internal AWS DMS parameters. Such parameters are rarely required and are therefore not exposed in the user interface.

    • Specify pass-through (passthru) values for the specific database client. AWS DMS includes pass-through parameters in the connection sting passed to the database client.