Best practices for AWS Database Migration Service - AWS Database Migration Service

Best practices for AWS Database Migration Service

To use AWS Database Migration Service (AWS DMS) most effectively, see this section's recommendations on the most efficient way to migrate your data.

Migration planning for AWS Database Migration Service

When planning a database migration using AWS Database Migration Service, consider the following:

  • You will need to configure a network that connects your source and target databases to a AWS DMS replication instance. This can be as simple as connecting two AWS resources in the same VPC as the replication instance to more complex configurations such as connecting an on-premises database to an Amazon RDS DB instance over VPN. For more information, see Network configurations for database migration

  • Source and target endpoints – You will need to know what information and tables in the source database need to be migrated to the target database. AWS DMS supports basic schema migration, including the creation of tables and primary keys. However, AWS DMS doesn't automatically create secondary indexes, foreign keys, user accounts, and so on in the target database. Note that, depending on your source and target database engine, you may need to set up supplemental logging or modify other settings for a source or target database. For more information, see Sources for data migration and Targets for data migration.

  • Schema/Code migration – AWS DMS doesn't perform schema or code conversion. You can use tools such as Oracle SQL Developer, MySQL Workbench, or pgAdmin III to convert your schema. If you want to convert an existing schema to a different database engine, you can use the AWS Schema Conversion Tool. It can create a target schema and also can generate and create an entire schema: tables, indexes, views, and so on. You can also use the tool to convert PL/SQL or TSQL to PgSQL and other formats. For more information on the AWS Schema Conversion Tool, see AWS Schema Conversion Tool .

  • Unsupported data types – Some source data types need to be converted into the equivalent data types for the target database. For more information on supported data types, see the source or target section for your data store.

Improving the performance of an AWS DMS migration

A number of factors affect the performance of your AWS DMS migration:

  • Resource availability on the source

  • The available network throughput

  • The resource capacity of the replication server

  • The ability of the target to ingest changes

  • The type and distribution of source data

  • The number of objects to be migrated

In our tests, we've migrated a terabyte of data in approximately 12–13 hours using a single AWS DMS task and under ideal conditions. These ideal conditions included using source databases running on Amazon EC2 and in Amazon RDS with target databases in Amazon RDS, all in the same Availability Zone. Our source databases contained a representative amount of relatively evenly distributed data with a few large tables containing up to 250 GB of data. The source data didn't contain complex data types, such as BLOB.

You can improve performance by using some or all of the best practices mentioned following. Whether you can use one of these practices or not depends in large part on your specific use case. We mention limitations as appropriate.

Loading multiple tables in parallel

By default, AWS DMS loads eight tables at a time. You might see some performance improvement by increasing this slightly when using a very large replication server, such as a dms.c4.xlarge or larger instance. However, at some point, increasing this parallelism reduces performance. If your replication server is relatively small, such as a dms.t2.medium, we recommend that you reduce the number of tables loaded in parallel.

To change this number in the AWS Management Console, open the console, choose Tasks, choose to create or modify a task, and then choose Advanced Settings. Under Tuning Settings, change the Maximum number of tables to load in parallel option.

To change this number using the AWS CLI, change the MaxFullLoadSubTasks parameter under TaskSettings.

Working with indexes, triggers, and referential integrity constraints

Indexes, triggers, and referential integrity constraints can affect your migration performance and cause your migration to fail. How these affect migration depends on whether your replication task is a full load task or an ongoing replication (CDC) task.

For a full load task, we recommend that you drop primary key indexes, secondary indexes, referential integrity constraints, and data manipulation language (DML) triggers. Alternatively, you can delay their creation until after the full load tasks are complete. You don't need indexes during a full load task, and indexes incur maintenance overhead if they are present. Because the full load task loads groups of tables at a time, referential integrity constraints are violated. Similarly, insert, update, and delete triggers can cause errors, for example, if a row insert is triggered for a previously bulk loaded table. Other types of triggers also affect performance due to added processing.

You can build primary key and secondary indexes before a full load task if your data volumes are relatively small and the additional migration time doesn't concern you. Referential integrity constraints and triggers should always be disabled.

For a full load + CDC task, we recommend that you add secondary indexes before the CDC phase. Because AWS DMS uses logical replication, secondary indexes that support DML operations should be in-place to prevent full table scans. You can pause the replication task before the CDC phase to build indexes, create triggers, and create referential integrity constraints before you restart the task.

Disable backups and transaction logging

When migrating to an Amazon RDS database, it's a good idea to disable backups and Multi-AZ on the target until you're ready to cut over. Similarly, when migrating to systems other than Amazon RDS, disabling any logging on the target until after cutover is usually a good idea.

Use multiple tasks

Sometimes using multiple tasks for a single migration can improve performance. If you have sets of tables that don't participate in common transactions, you might be able to divide your migration into multiple tasks. Transactional consistency is maintained within a task, so it's important that tables in separate tasks don't participate in common transactions. Additionally, each task independently reads the transaction stream, so be careful not to put too much stress on the source database.

You can use multiple tasks to create separate streams of replication. By doing this, you can parallelize the reads on the source, the processes on the replication instance, and the writes to the target database.

Optimizing change processing

By default, AWS DMS processes changes in a transactional mode, which preserves transactional integrity. If you can afford temporary lapses in transactional integrity, you can use the batch optimized apply option instead. This option efficiently groups transactions and applies them in batches for efficiency purposes. Using the batch optimized apply option almost always violates referential integrity constraints. So we recommend that you disable these during the migration process and enable them again as part of the cutover process.

Reducing the load on your source database

AWS DMS uses some resources on your source database. During a full load task, AWS DMS performs a full table scan of the source table for each table processed in parallel. Additionally, each task you create as part of a migration queries the source for changes as part of the CDC process. For AWS DMS to perform CDC for some sources, such as Oracle, you might need to increase the amount of data written to your database's change log.

If you find you are overburdening your source database, you can reduce the number of tasks or tables for each task for your migration. Each task gets source changes independently, so consolidating tasks can decrease the change capture workload.

Using the task log to troubleshoot migration issues

In some cases, AWS DMS can encounter issues for which warnings or error messages appear only in the task log. In particular, data truncation issues or row rejections due to foreign key violations are only written in the task log. Therefore, be sure to review the task log when migrating a database. To enable viewing of the task log, configure Amazon CloudWatch as part of task creation.

Converting schema

AWS DMS doesn't perform schema or code conversion. If you want to convert an existing schema to a different database engine, you can use the AWS Schema Conversion Tool (AWS SCT). AWS SCT converts your source objects, table, indexes, views, triggers, and other system objects into the target data definition language (DDL) format. You can also use AWS SCT to convert most of your application code, like PL/SQL or TSQL, to the equivalent target language.

You can get AWS SCT as a free download from AWS. For more information on AWS SCT, see the AWS Schema Conversion Tool User Guide.

If your source and target endpoints are on the same database engine, you can use tools such as Oracle SQL Developer, MySQL Workbench, or PgAdmin4 to move your schema.

Migrating large binary objects (LOBs)

In general, AWS DMS migrates LOB data in two phases:

  1. AWS DMS creates a new row in the target table and populates the row with all data except the associated LOB value.

  2. AWS DMS updates the row in the target table with the LOB data.

This migration process for LOBs requires that, during the migration, all LOB columns on the target table must be nullable. This is so even if the LOB columns aren't nullable on the source table. If AWS DMS creates the target tables, it sets LOB columns to nullable by default. In some cases, you might create the target tables using some other mechanism, such as import or export. In such cases, make sure that the LOB columns are nullable before you start the migration task.

This requirement has one exception. Suppose that you perform a homogeneous migration from an Oracle source to an Oracle target, and you choose Limited Lob mode. In this case, the entire row is populated at once, including any LOB values. For such a case, AWS DMS can create the target table LOB columns with not-nullable constraints, if needed.

Using limited LOB mode

AWS DMS uses two methods that balance performance and convenience when your migration contains LOB values:

  1. Limited LOB mode migrates all LOB values up to a user-specified size limit (default is 32 KB). LOB values larger than the size limit must be manually migrated. Limited LOB mode, the default for all migration tasks, typically provides the best performance. However you need to ensure that the Max LOB size parameter setting is correct. This parameter should be set to the largest LOB size for all your tables.

  2. Full LOB mode migrates all LOB data in your tables, regardless of size. Full LOB mode provides the convenience of moving all LOB data in your tables, but the process can have a significant impact on performance.

For some database engines, such as PostgreSQL, AWS DMS treats JSON data types like LOBs. Make sure that if you have chosen Limited LOB mode the Max LOB size option is set to a value that doesn't cause the JSON data to be truncated.

AWS DMS provides full support for using large object data types (BLOBs, CLOBs, and NCLOBs). The following source endpoints have full LOB support:

  • Oracle

  • Microsoft SQL Server

  • ODBC

The following target endpoints have full LOB support:

  • Oracle

  • Microsoft SQL Server

The following target endpoint has limited LOB support. You can't use an unlimited LOB size for this target endpoint.

  • Amazon Redshift

For endpoints that have full LOB support, you can also set a size limit for LOB data types.

Ongoing replication

AWS DMS provides ongoing replication of data, keeping the source and target databases in sync. It replicates only a limited amount of data definition language (DDL). AWS DMS doesn't propagate items such as indexes, users, privileges, stored procedures, and other database changes not directly related to table data.

If you plan to use ongoing replication, you should enable the Multi-AZ option when you create your replication instance. By choosing the Multi-AZ option, you get high availability and failover support for the replication instance. However, this option can have an impact on performance.

Improving performance when migrating large tables

If you want to improve the performance when migrating a large table, you can break the migration into more than one task. To break the migration into multiple tasks using row filtering, use a key or a partition key. For example, if you have an integer primary key ID from 1 to 8,000,000, you can create eight tasks using row filtering to migrate 1 million records each.

To apply row filtering in the AWS Management Console, open the console, choose Tasks, and create a new task. In the Table mappings section, add a value for Selection Rule. You can then add a column filter with either a less than or equal to, greater than or equal to, equal to, or range condition (between two values). For more information about column filtering, see Specifying table selection and transformations rules from the console.

Alternatively, if you have a large partitioned table that is partitioned by date, you can migrate data based on date. For example, suppose that you have a table partitioned by month, and only the current month's data is updated. In this case, you can create a full load task for each static monthly partition and create a full load + CDC task for the currently updated partition.

Using your own on-premises name server

Usually, an AWS DMS replication instance uses the Domain Name System (DNS) resolver in an Amazon EC2 instance to resolve domain endpoints. However, you can use your own on-premises name server to resolve certain endpoints if you use the Amazon Route 53 Resolver. With this tool, you can query between on-premises and AWS using inbound and outbound endpoints, forwarding rules, and a private connection. The benefits of using an on-premises name server include improved security and ease of use behind a firewall.

Inbound endpoints enable DNS queries that originate on-premises to resolve AWS hosted domains. To configure the endpoints, assign IP addresses in each subnet that you want to provide a resolver. To establish connectivity between your on-premises DNS infrastructure and AWS, use AWS Direct Connect or a virtual private network (VPN).

Outbound endpoints connect to your on-premises name server. The name server only grants access to IP addresses included in an allow list and set in an outbound endpoint. The IP address of your name server is the target IP address. When you choose a security group for an outbound endpoint, choose the same security group used by the replication instance.

To forward select domains to the name server, use forwarding rules. An outbound endpoint can handle multiple forwarding rules. The scope of the forwarding rule is your virtual private cloud (VPC). By using a forwarding rule associated with a VPC, you can provision a logically isolated section of the AWS Cloud. From this logically isolated section, you can launch AWS resources in a virtual network.

You can configure domains hosted within your on-premises DNS infrastructure as conditional forwarding rules that enable outbound DNS queries. When a query is made to one of those domains, rules trigger an attempt to forward DNS requests to servers that were configured with the rules. Again, a private connection over AWS Direct Connect or VPN is required.

The following diagram shows the Route 53 Resolver architecture.


                Route 53 Resolver Architecture

For more information about AWS Route-53 DNS Resolver, see Getting started with Route 53 Resolver in the Amazon Route 53 Developer Guide.

Using Amazon Route 53 Resolver with AWS DMS

You can create an on-premises name server for AWS DMS to resolve endpoints using Amazon Route 53 Resolver. A description of the process follows.

To create an on-premises name server for DMS based on Route 53

  1. Sign in to the AWS Management Console and open the Route 53 console at https://console.aws.amazon.com/route53/.

  2. On the Route 53 console, choose the AWS Region in which you want to configure your Route 53 Resolver. The Route 53 Resolver is specific to a Region.

  3. Choose the query direction—inbound, outbound, or both.

  4. Provide your inbound query configuration:

    1. Enter an endpoint name and choose a VPC.

    2. Assign one or more subnets from within the VPC (for example, choose two for availability).

    3. Assign specific IP addresses to use as endpoints, or have Route 53 Resolver assign them automatically.

  5. Create a rule for your on-premises domain so that workloads inside the VPC can route DNS queries to your DNS infrastructure.

  6. Enter one or more IP addresses for your on-premises DNS servers.

  7. Submit your rule.

When everything is created, your VPC is associated with your inbound and outbound rules and can start routing traffic.

For more information about Route 53 Resolver, see Getting started with Route 53 Resolver in the Amazon Route 53 Developer Guide.

Changing the user and schema for an Oracle target

When you use Oracle as a target, AWS DMS migrates the data to the schema owned by the target endpoint's user.

For example, suppose that you're migrating a schema named PERFDATA to an Oracle target endpoint, and that the target endpoint user name is MASTER. AWS DMS will connect to the Oracle target as MASTER, and populate the MASTER schema with database objects from PERFDATA.

To override this behavior, you need to provide a schema transformation. For example, if you want to migrate the PERFDATA schema objects to a PERFDATA schema at the target endpoint, you can use the following transformation:

{ "rule-type": "transformation", "rule-id": "2", "rule-name": "2", "object-locator": { "schema-name": "PERFDATA" }, "rule-target": "schema", "rule-action": "rename", "value": "PERFDATA" }

For more information about transformations, see Specifying table selection and transformations rules using JSON.

Changing table and index tablespaces for an Oracle target

When using Oracle as a target, AWS DMS migrates all tables and indexes to default table and index tablespaces in the target. For example, suppose that your source is a database engine other than Oracle. All of the target tables and indexes are migrated to the same default tablespaces.

To override this behavior, you need to provide corresponding tablespace transformations. For example, suppose that you want to migrate tables and indexes to table and index tablespaces in the Oracle target that are named after the schema in the source. In this case, you can use transformations similar to the following. Here, the schema in the source is named INVENTORY and corresponding table and index tablespaces in the target are named INVENTORYTBL and INVENTORYIDX, respectively.

{ "rule-type": "transformation", "rule-id": "3", "rule-name": "3", "rule-action": "rename", "rule-target": "table-tablespace", "object-locator": { "schema-name": "INVENTORY", "table-name": "%", "table-tablespace-name": "%" }, "value": "INVENTORYTBL" }, { "rule-type": "transformation", "rule-id": "4", "rule-name": "4", "rule-action": "rename", "rule-target": "index-tablespace", "object-locator": { "schema-name": "INVENTORY", "table-name": "%", "index-tablespace-name": "%" }, "value": "INVENTORYIDX" }

For more information about transformations, see Specifying table selection and transformations rules using JSON.

When Oracle is both source and target, you can preserve existing table or index tablespace assignments by setting the Oracle source extra connection attribute, enableHomogenousTablespace=true. For more information, see Extra connection attributes when using Oracle as a source for AWS DMS