How AWS Database Migration Service Works

AWS Database Migration Service (AWS DMS) is a web service that you can use to migrate data from a source data store to a target data store. These two data stores are called endpoints. You can migrate between source and target endpoints that use the same database engine, such as from an Oracle database to an Oracle database. You can also migrate between source and target endpoints that use different database engines, such as from an Oracle database to a PostgreSQL database. The only requirement to use AWS DMS is that one of your endpoints must be on an AWS service. You can't use AWS DMS to migrate from an on-premises database to another on-premises database.

For information on the cost of database migration, see the AWS Database Migration Service pricing page.

Use the following topics to better understand AWS DMS.

High-Level View of AWS DMS

To perform a database migration, AWS DMS connects to the source data store, reads the source data, and formats the data for consumption by the target data store. It then loads the data into the target data store. Most of this processing happens in memory, though large transactions might require some buffering to disk. Cached transactions and log files are also written to disk.

At a high level, when using AWS DMS you do the following:

  • Create a replication instance.

  • Create source and target endpoints that have connection information about your data stores.

  • Create one or more migration tasks to migrate data between the source and target data stores.

A task can consist of three major phases:

  • The full load of existing data

  • The application of cached changes

  • Ongoing replication

During a full load migration, where existing data from the source is moved to the target, AWS DMS loads data from tables on the source data store to tables on the target data store. While the full load is in progress, any changes made to the tables being loaded are cached on the replication server; these are the cached changes. It’s important to note that AWS DMS doesn't capture changes for a given table until the full load for that table is started. In other words, the point when change capture starts is different for each individual table.

When the full load for a given table is complete, AWS DMS immediately begins to apply the cached changes for that table. When all tables have been loaded, AWS DMS begins to collect changes as transactions for the ongoing replication phase. After AWS DMS applies all cached changes, tables are transactionally consistent. At this point, AWS DMS moves to the ongoing replication phase, applying changes as transactions.

At the start of the ongoing replication phase, a backlog of transactions generally causes some lag between the source and target databases. The migration eventually reaches a steady state after working through this backlog of transactions. At this point, you can shut down your applications, allow any remaining transactions to be applied to the target, and bring your applications up, now pointing at the target database.
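
The timing rules described above can be made concrete with a toy model. The following sketch is not how AWS DMS is implemented; it only classifies source changes by timestamp according to the description in this section. The names Change and classify_changes, the integer timestamps, and the boundary handling (a change at the exact moment a table's load starts counts as cached) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """A single change made on the source (hypothetical model, not a DMS type)."""
    time: int    # when the change was committed on the source
    table: str   # table the change belongs to


def classify_changes(load_windows, changes):
    """Group changes by the phase that would handle them.

    load_windows maps table name -> (full_load_start, full_load_end).
    """
    # Ongoing replication begins only after every table has finished loading.
    all_tables_loaded = max(end for _start, end in load_windows.values())
    phases = {"in_full_load": [], "cached": [], "ongoing": []}
    for change in changes:
        start, _end = load_windows[change.table]
        if change.time < start:
            # Change capture has not started for this table yet, so the
            # full load reads the data with this change already applied.
            phases["in_full_load"].append(change)
        elif change.time < all_tables_loaded:
            # Captured while the load phase is still running: a cached change.
            phases["cached"].append(change)
        else:
            # Every table is loaded: the change is replicated as a transaction.
            phases["ongoing"].append(change)
    return phases


windows = {"orders": (0, 10), "customers": (5, 20)}
changes = [
    Change(time=3, table="customers"),   # before the customers load starts
    Change(time=7, table="orders"),      # during the orders load
    Change(time=15, table="orders"),     # orders loaded, customers still loading
    Change(time=25, table="customers"),  # after all tables are loaded
]
for phase, items in classify_changes(windows, changes).items():
    print(phase, [(c.table, c.time) for c in items])
```

Because change capture starts per table, the change to customers at time 3 is never cached. It reaches the target as part of that table's full load.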

AWS DMS creates the target schema objects necessary to perform the migration. However, AWS DMS takes a minimalist approach and creates only those objects required to efficiently migrate the data. In other words, AWS DMS creates tables, primary keys, and in some cases unique indexes, but doesn't create any other objects that are not required to efficiently migrate the data from the source. For example, it doesn't create secondary indexes, nonprimary key constraints, or data defaults.

In most cases, when performing a migration, you also migrate most or all of the source schema. If you are performing a homogeneous migration (between two databases of the same engine type), you migrate the schema by using your engine’s native tools to export and import the schema itself, without any data.

If your migration is heterogeneous (between two databases that use different engine types), you can use the AWS Schema Conversion Tool (AWS SCT) to generate a complete target schema for you. If you use the tool, any dependencies between tables such as foreign key constraints need to be disabled during the migration's "full load" and "cached change apply" phases. If performance is an issue, removing or disabling secondary indexes during the migration process helps. For more information on the AWS SCT, see AWS Schema Conversion Tool in the AWS SCT documentation.

Components of AWS DMS

This section describes the internal components of AWS DMS and how they function together to accomplish your data migration. Understanding the underlying components of AWS DMS can help you migrate data more efficiently and provide better insight when troubleshooting or investigating issues.

An AWS DMS migration consists of three components: a replication instance, source and target endpoints, and a replication task. You create an AWS DMS migration by creating the necessary replication instance, endpoints, and tasks in an AWS Region.

Replication instance

At a high level, an AWS DMS replication instance is simply a managed Amazon Elastic Compute Cloud (Amazon EC2) instance that hosts one or more replication tasks.

The figure following shows an example replication instance running several associated replication tasks.


[Figure: a replication instance running several associated replication tasks]

A single replication instance can host one or more replication tasks, depending on the characteristics of your migration and the capacity of the replication server. AWS DMS provides a variety of replication instances so you can choose the optimal configuration for your use case. For more information about the various classes of replication instances, see Selecting the Right AWS DMS Replication Instance for Your Migration.

AWS DMS creates the replication instance on an Amazon EC2 instance. Some of the smaller instance classes are sufficient for testing the service or for small migrations. If your migration involves a large number of tables, or if you intend to run multiple concurrent replication tasks, you should consider using one of the larger instances. We recommend this approach because AWS DMS can consume a significant amount of memory and CPU.

Depending on the Amazon EC2 instance class you select, your replication instance comes with either 50 GB or 100 GB of data storage. This amount is usually sufficient for most customers. However, if your migration involves large transactions or a high volume of data changes, you might want to increase the base storage allocation. Change data capture (CDC) might cause data to be written to disk, depending on how fast the target can write the changes.

AWS DMS can provide high availability and failover support using a Multi-AZ deployment. In a Multi-AZ deployment, AWS DMS automatically provisions and maintains a standby replica of the replication instance in a different Availability Zone. The primary replication instance is synchronously replicated to the standby replica. If the primary replication instance fails or becomes unresponsive, the standby resumes any running tasks with minimal interruption. Because the primary is constantly replicating its state to the standby, Multi-AZ deployment does incur some performance overhead.
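
The choices described above (instance class, storage allocation, and Multi-AZ) can be collected into a single request. The following is a minimal sketch that only builds the request as a Python dictionary, using the parameter names of the AWS DMS CreateReplicationInstance API as exposed by boto3. The helper name replication_instance_request, the identifier, and the example instance class and storage values are assumptions for illustration, not recommendations.

```python
def replication_instance_request(identifier, instance_class,
                                 allocated_storage_gb=50, multi_az=False):
    """Build keyword arguments shaped like a CreateReplicationInstance request."""
    if allocated_storage_gb <= 0:
        raise ValueError("allocated_storage_gb must be positive")
    return {
        "ReplicationInstanceIdentifier": identifier,
        "ReplicationInstanceClass": instance_class,
        "AllocatedStorage": allocated_storage_gb,  # in GB
        "MultiAZ": multi_az,                       # standby in another AZ
    }


# Example values only: a larger class with extra storage for a migration
# that has many tables and a high volume of changes.
request = replication_instance_request(
    "example-instance", "dms.c4.large", allocated_storage_gb=200, multi_az=True
)
print(request)

# With boto3 installed and credentials configured, the request could be
# submitted like this (not executed here):
#   import boto3
#   boto3.client("dms").create_replication_instance(**request)
```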

For more detailed information about the AWS DMS replication instance, see Working with an AWS DMS Replication Instance.

Endpoints

AWS DMS uses an endpoint to access your source or target data store. The specific connection information is different, depending on your data store, but in general you supply the following information when you create an endpoint:

  • Endpoint type – Source or target.

  • Engine type – Type of database engine, such as Oracle or PostgreSQL.

  • Server name – Server name or IP address that AWS DMS can reach.

  • Port – Port number used for database server connections.

  • Encryption – Secure Sockets Layer (SSL) mode, if SSL is used to encrypt the connection.

  • Credentials – User name and password for an account with the required access rights.
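
The endpoint information listed above maps closely onto the parameters of the AWS DMS CreateEndpoint API. The following sketch builds such a request as a Python dictionary and performs a few basic checks before anything is sent. The helper name endpoint_request, the server name, and the credentials are hypothetical; the parameter names and the SSL mode values follow the boto3 create_endpoint call as we understand it.

```python
VALID_ENDPOINT_TYPES = {"source", "target"}
VALID_SSL_MODES = {"none", "require", "verify-ca", "verify-full"}


def endpoint_request(identifier, endpoint_type, engine_name, server_name,
                     port, username, password, ssl_mode="none"):
    """Build keyword arguments shaped like a CreateEndpoint request."""
    if endpoint_type not in VALID_ENDPOINT_TYPES:
        raise ValueError(f"endpoint_type must be one of {sorted(VALID_ENDPOINT_TYPES)}")
    if ssl_mode not in VALID_SSL_MODES:
        raise ValueError(f"ssl_mode must be one of {sorted(VALID_SSL_MODES)}")
    if not 1 <= port <= 65535:
        raise ValueError("port must be between 1 and 65535")
    return {
        "EndpointIdentifier": identifier,
        "EndpointType": endpoint_type,   # source or target
        "EngineName": engine_name,       # for example "oracle" or "postgres"
        "ServerName": server_name,       # name or IP address DMS can reach
        "Port": port,
        "SslMode": ssl_mode,
        "Username": username,
        "Password": password,
    }


# Hypothetical connection details for a PostgreSQL source.
source = endpoint_request(
    "example-source", "source", "postgres", "db.example.com",
    5432, "dms_user", "example-password", ssl_mode="require",
)
# Print everything except the password.
print({key: value for key, value in source.items() if key != "Password"})
```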

When you create an endpoint using the AWS DMS console, the console requires that you test the endpoint connection. The test must succeed before you can use the endpoint in a DMS task. Like the connection information, the specific test criteria are different for different engine types. In general, AWS DMS verifies that the database exists at the given server name and port, and that the supplied credentials can be used to connect to the database with the necessary privileges to perform a migration. If the connection test is successful, AWS DMS downloads and stores schema information to use later during task configuration. Schema information might include table definitions, primary key definitions, and unique key definitions, for example.

More than one replication task can use a single endpoint. For example, you might have two logically distinct applications hosted on the same source database that you want to migrate separately. In this case, you create two replication tasks, one for each set of application tables. You can use the same AWS DMS endpoint in both tasks.

You can customize the behavior of an endpoint by using extra connection attributes. Extra connection attributes can control various behavior such as logging detail, file size, and other parameters. Each data store engine type has different extra connection attributes available. You can find the specific extra connection attributes for each data store in the source or target section for that data store. For a list of supported source and target data stores, see Sources for AWS DMS and Targets for AWS DMS.

For more detailed information about AWS DMS endpoints, see Working with AWS DMS Endpoints.

Replication Tasks

You use an AWS DMS replication task to move a set of data from the source endpoint to the target endpoint. Creating a replication task is the last step you need to take before you start a migration.

When you create a replication task, you specify the following task settings:

  • Replication instance – the instance to host and run the task

  • Source endpoint

  • Target endpoint

  • Migration type options, as listed following. For a full explanation of the migration type options, see Creating a Task.

    • Full load (Migrate existing data) – If you can afford an outage long enough to copy your existing data, this option is a good one to choose. This option simply migrates the data from your source database to your target database, creating tables when necessary.

    • Full load + CDC (Migrate existing data and replicate ongoing changes) – This option performs a full data load while capturing changes on the source. After the full load is complete, captured changes are applied to the target. Eventually, the application of changes reaches a steady state. At this point, you can shut down your applications, let the remaining changes flow through to the target, and then restart your applications pointing at the target.

    • CDC only (Replicate data changes only) – In some situations, it might be more efficient to copy existing data using a method other than AWS DMS. For example, in a homogeneous migration, using native export and import tools might be more efficient at loading bulk data. In this situation, you can use AWS DMS to replicate changes from the point when you start your bulk load, to bring your source and target databases into sync and keep them there.

  • Target table preparation mode options, as listed following. For a full explanation of target table modes, see Creating a Task.

    • Do nothing – AWS DMS assumes that the target tables are pre-created on the target.

    • Drop tables on target – AWS DMS drops and recreates the target tables.

    • Truncate – If you created tables on the target, AWS DMS truncates them before the migration starts. If no tables exist and you select this option, AWS DMS creates any missing tables.

  • LOB mode options, as listed following. For a full explanation of LOB modes, see Setting LOB Support for Source Databases in an AWS DMS Task.

    • Don't include LOB columns – LOB columns are excluded from the migration.

    • Full LOB mode – Migrate complete LOBs regardless of size. AWS DMS migrates LOBs piecewise in chunks controlled by the LOB chunk size parameter. This mode is slower than using limited LOB mode.

    • Limited LOB mode – Truncate LOBs to the value specified by the Max LOB Size parameter. This mode is faster than using full LOB mode.

  • Table mappings – indicates the tables to migrate and how they are migrated. For more information, see Using Table Mapping to Specify Task Settings.

  • Data transformations, as listed following. For more information on data transformations, see Specifying Table Selection and Transformations by Table Mapping Using JSON.

    • Changing schema, table, and column names.

    • Changing tablespace names (for Oracle target endpoints).

    • Defining primary keys and unique indexes on the target.

  • Data validation

  • Amazon CloudWatch logging

You use the task to migrate data from the source endpoint to the target endpoint, and the task processing is done on the replication instance. You specify what tables and schemas to migrate and any special processing, such as logging requirements, control table data, and error handling.
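
Table mappings and transformations are expressed as a JSON document made up of rules. The following sketch builds one selection rule and one schema rename transformation in Python and prints the resulting JSON. The helper names selection_rule and rename_schema_rule, the HR schema, and the hr_app target name are hypothetical; the rule fields follow the table-mapping JSON format as we understand it.

```python
import json


def selection_rule(rule_id, schema, table="%", action="include"):
    """Select tables to migrate; '%' is a wildcard matching every table."""
    return {
        "rule-type": "selection",
        "rule-id": str(rule_id),
        "rule-name": str(rule_id),
        "object-locator": {"schema-name": schema, "table-name": table},
        "rule-action": action,
    }


def rename_schema_rule(rule_id, schema, new_name):
    """Rename a schema on the target."""
    return {
        "rule-type": "transformation",
        "rule-id": str(rule_id),
        "rule-name": str(rule_id),
        "rule-target": "schema",
        "object-locator": {"schema-name": schema},
        "rule-action": "rename",
        "value": new_name,
    }


# Migrate every table in the HR schema and rename the schema on the target.
table_mappings = {
    "rules": [
        selection_rule(1, "HR"),
        rename_schema_rule(2, "HR", "hr_app"),
    ]
}
print(json.dumps(table_mappings, indent=2))
```

When you create a task through the API rather than the console, you pass this document as a JSON string.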

Conceptually, an AWS DMS replication task performs two distinct functions as shown in the diagram following:


[Figure: the two functions of a replication task, full load and change data capture (CDC)]

The full load process is straightforward. AWS DMS extracts data from the source in bulk and loads it directly into the target. You can specify the number of tables to extract and load in parallel on the AWS DMS console under Advanced Settings.
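
Outside the console, the number of tables loaded in parallel, the target table preparation mode, and the LOB mode are fields of the task settings JSON document. The following sketch maps the console options described in this section to those fields. The helper name build_task_settings and the default sizes are assumptions; the field names (TargetTablePrepMode, MaxFullLoadSubTasks, FullLobMode, LimitedSizeLobMode, LobMaxSize, LobChunkSize) follow the task settings format as we understand it.

```python
import json

# Console labels from this section mapped to task-settings values.
PREP_MODES = {
    "Do nothing": "DO_NOTHING",
    "Drop tables on target": "DROP_AND_CREATE",
    "Truncate": "TRUNCATE_BEFORE_LOAD",
}


def build_task_settings(prep_mode, lob_mode, parallel_tables=8,
                        max_lob_size_kb=32, lob_chunk_size_kb=64):
    """Build a partial task-settings document.

    lob_mode is "none", "full", or "limited" (labels used only in this sketch).
    """
    if prep_mode not in PREP_MODES:
        raise ValueError(f"prep_mode must be one of {sorted(PREP_MODES)}")
    if lob_mode not in ("none", "full", "limited"):
        raise ValueError("lob_mode must be 'none', 'full', or 'limited'")
    target_metadata = {
        "SupportLobs": lob_mode != "none",
        "FullLobMode": lob_mode == "full",
        "LimitedSizeLobMode": lob_mode == "limited",
    }
    if lob_mode == "full":
        # Complete LOBs are moved piecewise in chunks of this size (KB).
        target_metadata["LobChunkSize"] = lob_chunk_size_kb
    if lob_mode == "limited":
        # LOBs larger than this size (KB) are truncated.
        target_metadata["LobMaxSize"] = max_lob_size_kb
    return {
        "TargetMetadata": target_metadata,
        "FullLoadSettings": {
            "TargetTablePrepMode": PREP_MODES[prep_mode],
            "MaxFullLoadSubTasks": parallel_tables,  # tables loaded in parallel
        },
    }


settings = build_task_settings("Truncate", "limited", parallel_tables=4)
print(json.dumps(settings, indent=2))
```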

For more information about AWS DMS tasks, see Working with AWS DMS Tasks.

Ongoing replication, or change data capture (CDC)

You can also use an AWS DMS task to capture ongoing changes to the source data store while you are migrating your data to a target. The change capture process that AWS DMS uses when replicating ongoing changes from a source endpoint collects changes to the database logs by using the database engine's native API.

In the CDC process, the replication task is designed to stream changes from the source to the target, using in-memory buffers to hold data in transit. If the in-memory buffers become exhausted for any reason, the replication task spills pending changes to the Change Cache on disk. This could occur, for example, if AWS DMS is capturing changes from the source faster than they can be applied on the target. In this case, you will see the task’s target latency exceed the task’s source latency.

You can check this by navigating to your task on the AWS DMS console and opening the Task Monitoring tab. The CDCLatencyTarget and CDCLatencySource graphs are shown at the bottom of the page. If a task shows target latency that is higher than its source latency, the target endpoint likely needs tuning to increase the rate at which changes are applied.
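
The comparison between the two latency metrics can be written as a small decision rule. The following sketch takes one reading of each metric, in seconds, and returns a rough diagnosis. The function name diagnose_cdc_latency, the 30-second threshold, and the interpretation of the case where both latencies are equally high are assumptions for illustration; only the rule that target latency exceeding source latency points at the target comes from the description above.

```python
def diagnose_cdc_latency(source_latency_s, target_latency_s, threshold_s=30.0):
    """Return a rough diagnosis from CDCLatencySource and CDCLatencyTarget readings."""
    if max(source_latency_s, target_latency_s) <= threshold_s:
        return "steady state"
    if target_latency_s > source_latency_s:
        # Changes are captured faster than the target can apply them, so
        # pending changes may be spilling to disk on the replication instance.
        return "target apply is behind"
    # Assumption: when source latency is high and target latency does not
    # exceed it, the delay is introduced while capturing from the source.
    return "source capture is behind"


for source_s, target_s in [(2, 5), (5, 600), (600, 600)]:
    print(source_s, target_s, "->", diagnose_cdc_latency(source_s, target_s))
```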

The replication task also uses storage for task logs as discussed above. The disk space that comes pre-configured with your replication instance is usually sufficient for logging and spilled changes. If you need additional disk space, for example, when using detailed debugging to investigate a migration issue, you can modify the replication instance to allocate more space.

Schema and code migration

AWS DMS doesn't perform schema or code conversion. You can use tools such as Oracle SQL Developer, MySQL Workbench, and pgAdmin III to move your schema if your source and target are the same database engine. If you want to convert an existing schema to a different database engine, you can use AWS SCT. It can generate and create an entire target schema, with tables, indexes, views, and so on. You can also use AWS SCT to convert PL/SQL or T-SQL to PL/pgSQL and other formats. For more information on AWS SCT, see AWS Schema Conversion Tool.

Whenever possible, AWS DMS attempts to create the target schema for you. Sometimes, AWS DMS can't create the schema—for example, AWS DMS doesn't create a target Oracle schema for security reasons. For MySQL database targets, you can use extra connection attributes to have DMS migrate all objects to the specified database and schema. Or you can use these attributes to have DMS create each database and schema for you as it finds the schema on the source.
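
Extra connection attributes are passed as a single string of name=value pairs separated by semicolons. The following sketch formats such a string for the two MySQL target behaviors described above. The helper name format_extra_connection_attributes is hypothetical. The targetDbType attribute with the values SPECIFIC_DATABASE and MULTIPLE_DATABASES, and the parallelLoadThreads attribute, are taken from the MySQL target settings as we understand them; check the MySQL target section before relying on them.

```python
def format_extra_connection_attributes(attributes):
    """Join name/value pairs into the 'name=value;name=value' form."""
    parts = []
    for name, value in attributes.items():
        text = f"{name}={value}"
        # Reject separators inside a name or value so the string stays parseable.
        if ";" in text or "=" in name or "=" in str(value):
            raise ValueError(f"invalid attribute: {name!r}={value!r}")
        parts.append(text)
    return ";".join(parts)


# Migrate all objects into the one database named on the endpoint.
single_database = format_extra_connection_attributes(
    {"targetDbType": "SPECIFIC_DATABASE"}
)

# Create each database on the target as it is found on the source.
per_source_schema = format_extra_connection_attributes(
    {"targetDbType": "MULTIPLE_DATABASES", "parallelLoadThreads": 1}
)

print(single_database)
print(per_source_schema)
```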

Sources for AWS DMS

You can use the following data stores as source endpoints for data migration using AWS DMS.

On-premises and EC2 instance databases

  • Oracle versions 10.2 and later (for versions 10.x), 11g and up to 12.2, 18c, and 19c for the Enterprise, Standard, Standard One, and Standard Two editions.

    Note

    • Support for Oracle version 19c as a source is available in AWS DMS versions 3.3.2 and later.

    • Support for Oracle version 18c as a source is available in AWS DMS versions 3.3.1 and later.

  • Microsoft SQL Server versions 2005, 2008, 2008R2, 2012, 2014, 2016, 2017, and 2019 for the Enterprise, Standard, Workgroup, and Developer editions. The Web and Express editions are not supported.

    Note

    Support for Microsoft SQL Server version 2019 as a source is available in AWS DMS versions 3.3.2 and later.

  • MySQL versions 5.5, 5.6, and 5.7.

  • MariaDB (supported as a MySQL-compatible data source) versions 10.0.24 to 10.0.28, 10.1, 10.2, and 10.3.

    Note

    Support for MariaDB as a source is available in all AWS DMS versions where MySQL is supported.

  • PostgreSQL version 9.4 and later (for versions 9.x), 10.x, and 11.x.

    Note

    PostgreSQL versions 11.x are supported as a source only in AWS DMS versions 3.3.1 and later. You can use PostgreSQL version 9.4 and later (for versions 9.x) and 10.x as a source in any DMS version.

  • MongoDB versions 2.6.x and 3.x and later.

  • SAP Adaptive Server Enterprise (ASE) versions 12.5, 15, 15.5, 15.7, 16 and later.

  • IBM Db2 for Linux, UNIX, and Windows (Db2 LUW) versions:

    • Version 9.7, all fix packs are supported.

    • Version 10.1, all fix packs are supported.

    • Version 10.5, all fix packs except for Fix Pack 5 are supported.

Microsoft Azure

  • Azure SQL Database.

Amazon RDS instance databases, and Amazon Simple Storage Service (Amazon S3)

  • Oracle versions 10.2 and later (for versions 10.x), 11g (versions 11.2.0.3.v1 and later) and up to 12.2, 18c, and 19c for the Enterprise, Standard, Standard One, and Standard Two editions.

    Note

    • Support for Oracle version 19c as a source is available in AWS DMS versions 3.3.2 and later.

    • Support for Oracle version 18c as a source is available in AWS DMS versions 3.3.1 and later.

  • Microsoft SQL Server versions 2008R2, 2012, 2014, 2016, 2017, and 2019 for the Enterprise, Standard, Workgroup, and Developer editions. The Web and Express editions are not supported.

    Note

    Support for Microsoft SQL Server version 2019 as a source is available in AWS DMS versions 3.3.2 and later.

  • MySQL versions 5.5, 5.6, and 5.7.

  • MariaDB (supported as a MySQL-compatible data source) versions 10.0.24 to 10.0.28, 10.1, 10.2, and 10.3.

    Note

    Support for MariaDB as a source is available in all AWS DMS versions where MySQL is supported.

  • PostgreSQL version 9.4 and later (for versions 9.x), 10.x, and 11.x. Change data capture (CDC) is only supported for versions 9.4.9 and later, 9.5.4 and later, 10.x, and 11.x. The rds.logical_replication parameter, which is required for CDC, is supported only in these versions and later.

    Note

    PostgreSQL versions 11.x are supported as a source only in AWS DMS versions 3.3.1 and later. You can use PostgreSQL version 9.4 and later (for versions 9.x) and 10.x as a source in any DMS version.

  • Amazon Aurora with MySQL compatibility (supported as a MySQL-compatible data source).

  • Amazon Aurora with PostgreSQL compatibility (supported as a PostgreSQL-compatible data source).

  • Amazon S3.

Targets for AWS DMS

You can use the following data stores as target endpoints for data migration using AWS DMS.

On-premises and Amazon EC2 instance databases

  • Oracle versions 10g, 11g, 12c, 18c, and 19c for the Enterprise, Standard, Standard One, and Standard Two editions.

    Note

    • Support for Oracle version 19c as a target is available in AWS DMS versions 3.3.2 and later.

    • Support for Oracle version 18c as a target is available in AWS DMS versions 3.3.1 and later.

  • Microsoft SQL Server versions 2005, 2008, 2008R2, 2012, 2014, 2016, 2017, and 2019 for the Enterprise, Standard, Workgroup, and Developer editions. The Web and Express editions are not supported.

    Note

    Support for Microsoft SQL Server version 2019 as a target is available in AWS DMS versions 3.3.2 and later.

  • MySQL versions 5.5, 5.6, and 5.7.

  • MariaDB (supported as a MySQL-compatible data target) versions 10.0.24 to 10.0.28, 10.1, 10.2, and 10.3.

    Note

    Support for MariaDB as a target is available in all AWS DMS versions where MySQL is supported.

  • PostgreSQL version 9.4 and later (for versions 9.x), 10.x, and 11.x.

    Note

    PostgreSQL versions 11.x are supported as a target only in AWS DMS versions 3.3.1 and later. You can use PostgreSQL version 9.4 and later (for versions 9.x) and 10.x as a target in any DMS version.

  • SAP Adaptive Server Enterprise (ASE) versions 15, 15.5, 15.7, 16 and later.

Amazon RDS instance databases, Amazon Redshift, Amazon DynamoDB, Amazon S3, Amazon Elasticsearch Service, Amazon Kinesis Data Streams, and Amazon DocumentDB

  • Oracle versions 11g (versions 11.2.0.3.v1 and later), 12c, 18c, and 19c for the Enterprise, Standard, Standard One, and Standard Two editions.

    Note

    • Support for Oracle version 19c as a target is available in AWS DMS versions 3.3.2 and later.

    • Support for Oracle version 18c as a target is available in AWS DMS versions 3.3.1 and later.

  • Microsoft SQL Server versions 2008R2, 2012, 2014, 2016, 2017, and 2019 for the Enterprise, Standard, Workgroup, and Developer editions. The Web and Express editions are not supported.

    Note

    Support for Microsoft SQL Server version 2019 as a target is available in AWS DMS versions 3.3.2 and later.

  • MySQL versions 5.5, 5.6, and 5.7.

  • MariaDB (supported as a MySQL-compatible data target) versions 10.0.24 to 10.0.28, 10.1, 10.2, and 10.3.

    Note

    Support for MariaDB as a target is available in all AWS DMS versions where MySQL is supported.

  • PostgreSQL version 9.4 and later (for versions 9.x), 10.x, and 11.x.

    Note

    PostgreSQL versions 11.x are supported as a target only in AWS DMS versions 3.3.1 and later. You can use PostgreSQL version 9.4 and later (for versions 9.x) and 10.x as a target in any DMS version.

  • Amazon Aurora with MySQL compatibility.

  • Amazon Aurora with PostgreSQL compatibility.

  • Amazon Redshift.

  • Amazon S3.

  • Amazon DynamoDB.

  • Amazon Elasticsearch Service.

  • Amazon Kinesis Data Streams.

  • Apache Kafka – Amazon Managed Streaming for Apache Kafka (Amazon MSK) and self-managed Apache Kafka.

  • Amazon DocumentDB (with MongoDB compatibility).
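
Several notes in the source and target lists tie a newer engine version to a minimum AWS DMS version. The following sketch collects those notes into a lookup table and checks a planned combination. The engine labels ("oracle", "sqlserver", "postgresql") and the function names are hypothetical, and the table covers only the minimums stated in the notes above. A combination that isn't in the table has no stated minimum, which is not the same as being supported.

```python
# Minimum AWS DMS version per (engine, engine version), from the notes above.
MIN_DMS_VERSION = {
    ("oracle", "19c"): (3, 3, 2),
    ("oracle", "18c"): (3, 3, 1),
    ("sqlserver", "2019"): (3, 3, 2),
    ("postgresql", "11"): (3, 3, 1),
}


def parse_version(text):
    """Turn a dotted version such as '3.3.2' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum_dms_version(engine, engine_version, dms_version):
    """Check a DMS version against the minimum noted for an engine version."""
    minimum = MIN_DMS_VERSION.get((engine, engine_version))
    if minimum is None:
        # No minimum is stated in the notes for this combination.
        return True
    return parse_version(dms_version) >= minimum


print(meets_minimum_dms_version("oracle", "19c", "3.3.1"))
print(meets_minimum_dms_version("oracle", "19c", "3.3.2"))
print(meets_minimum_dms_version("postgresql", "11", "3.3.1"))
```

Comparing tuples of integers rather than strings keeps the ordering correct when a version component has more than one digit.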

Using AWS DMS with Other AWS Services

You can use AWS DMS with several other AWS services:

  • You can use an Amazon EC2 instance or Amazon RDS DB instance as a target for a data migration.

  • You can use the AWS Schema Conversion Tool (AWS SCT) to convert your source schema and SQL code into an equivalent target schema and SQL code.

  • You can use Amazon S3 as a storage site for your data, or you can use it as an intermediate step when migrating large amounts of data.

  • You can use AWS CloudFormation to set up your AWS resources for infrastructure management or deployment. For example, you can provision AWS DMS resources such as replication instances, tasks, certificates, and endpoints. You create a template that describes all the AWS resources that you want, and AWS CloudFormation provisions and configures those resources for you.
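
As an illustration of the AWS CloudFormation option, the following sketch builds a small template in Python and prints it as JSON. It declares a replication instance and a source endpoint by using the AWS::DMS::ReplicationInstance and AWS::DMS::Endpoint resource types. The logical IDs, server name, database name, user name, and instance class are example values, not recommendations, and the template omits networking and the replication task to stay short.

```python
import json

template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Parameters": {
        # Supplied at deployment time so the password is not stored in the template.
        "SourcePassword": {"Type": "String", "NoEcho": True},
    },
    "Resources": {
        "ExampleReplicationInstance": {
            "Type": "AWS::DMS::ReplicationInstance",
            "Properties": {
                "ReplicationInstanceClass": "dms.t2.medium",
                "AllocatedStorage": 50,
                "MultiAZ": False,
            },
        },
        "ExampleSourceEndpoint": {
            "Type": "AWS::DMS::Endpoint",
            "Properties": {
                "EndpointType": "source",
                "EngineName": "postgres",
                "ServerName": "db.example.com",
                "Port": 5432,
                "DatabaseName": "exampledb",
                "Username": "dms_user",
                "Password": {"Ref": "SourcePassword"},
            },
        },
    },
}

print(json.dumps(template, indent=2))
```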