AWS Database Migration Service
User Guide (API Version 2016-01-01)

Introduction to AWS DMS

AWS Database Migration Service (AWS DMS) is a web service that you can use to migrate data from a source database to a target database. To work with AWS DMS, one of your databases must be on an AWS service. You can't migrate from an on-premises database to another on-premises database.

Migration: A High-Level View

To perform a database migration, AWS DMS connects to the source database, reads the source data, formats the data for consumption by the target database, and loads the data into the target database. Most of this processing happens in memory, though large transactions might require some buffering to disk. Cached transactions and log files are also written to disk.

At a high level, when using AWS DMS, you do the following:

  • Provision a replication server

  • Define source and target endpoints (databases)

  • Create one or more tasks to migrate data between the source and target databases
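
The three steps above can be sketched with the AWS SDK for Python (boto3). This is a minimal sketch, not a reference implementation: the helper names, the identifiers "example-instance" and "example-task", the instance class, and the catch-all table-mapping rule are assumptions chosen for illustration. The helpers only build request parameters; the hypothetical provision function expects a client you create yourself, for example with boto3.client("dms").

```python
import json


def replication_instance_params(identifier, instance_class="dms.t2.medium",
                                storage_gb=50, multi_az=False):
    """Keyword arguments for create_replication_instance (step 1)."""
    return {
        "ReplicationInstanceIdentifier": identifier,
        "ReplicationInstanceClass": instance_class,  # assumed class
        "AllocatedStorage": storage_gb,
        "MultiAZ": multi_az,
    }


def endpoint_params(identifier, endpoint_type, engine, host, port,
                    username, password, database=None):
    """Keyword arguments for create_endpoint (step 2)."""
    if endpoint_type not in ("source", "target"):
        raise ValueError("endpoint_type must be 'source' or 'target'")
    params = {
        "EndpointIdentifier": identifier,
        "EndpointType": endpoint_type,
        "EngineName": engine,
        "ServerName": host,
        "Port": port,
        "Username": username,
        "Password": password,
    }
    if database is not None:
        params["DatabaseName"] = database
    return params


def task_params(identifier, source_arn, target_arn, instance_arn, schema="%"):
    """Keyword arguments for create_replication_task (step 3)."""
    # Assumed selection rule: include every table in the chosen schema.
    table_mappings = {"rules": [{
        "rule-type": "selection",
        "rule-id": "1",
        "rule-name": "1",
        "object-locator": {"schema-name": schema, "table-name": "%"},
        "rule-action": "include",
    }]}
    return {
        "ReplicationTaskIdentifier": identifier,
        "SourceEndpointArn": source_arn,
        "TargetEndpointArn": target_arn,
        "ReplicationInstanceArn": instance_arn,
        "MigrationType": "full-load-and-cdc",
        "TableMappings": json.dumps(table_mappings),
    }


def provision(dms, source, target):
    """Run the three steps against a boto3 DMS client passed in as dms."""
    instance = dms.create_replication_instance(
        **replication_instance_params("example-instance"))
    instance_arn = instance["ReplicationInstance"]["ReplicationInstanceArn"]
    src_arn = dms.create_endpoint(**source)["Endpoint"]["EndpointArn"]
    tgt_arn = dms.create_endpoint(**target)["Endpoint"]["EndpointArn"]
    # The instance must be available before a task can be created on it.
    dms.get_waiter("replication_instance_available").wait(
        Filters=[{"Name": "replication-instance-arn", "Values": [instance_arn]}])
    task = dms.create_replication_task(
        **task_params("example-task", src_arn, tgt_arn, instance_arn))
    return task["ReplicationTask"]["ReplicationTaskArn"]
```

Keeping parameter construction separate from the API calls lets you inspect or log each request before anything is provisioned.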

A typical task consists of three major phases:

  • The full load of existing data

  • The application of cached changes

  • Ongoing replication

During the full load, AWS DMS loads data from tables on the source database to tables on the target database, eight tables at a time. While the full load is in progress, any changes made to the tables being loaded are cached on the replication server; these are the cached changes. Change capture for a given table doesn't begin until the full load for that table starts, so the point at which change capture starts differs from table to table.

When the full load for a given table is complete, AWS DMS immediately begins to apply the cached changes for that table. When all tables have been loaded, AWS DMS begins to collect changes as transactions for the ongoing replication phase. After AWS DMS applies all cached changes, tables are transactionally consistent. At this point, AWS DMS moves to the ongoing replication phase, applying changes as transactions.

At the start of the ongoing replication phase, a backlog of transactions generally causes some lag between the source and target databases. The migration eventually reaches a steady state after working through this backlog of transactions. At this point, you can shut down your applications, allow any remaining transactions to be applied to the target, and bring your applications up, now pointing at the target database.
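
The three phases can be illustrated with a small, self-contained toy model. It is only a sketch of the ordering described above, not of how AWS DMS is implemented: tables are plain dictionaries, every change is an upsert, and the function name migrate and its arguments are hypothetical.

```python
def migrate(source, cached_changes, ongoing_transactions, batch_size=8):
    """Toy model of full load, cached-change apply, and ongoing replication.

    source:               {table: {key: value}} snapshot of the source tables
    cached_changes:       {table: [(key, value), ...]} captured during the load
    ongoing_transactions: list of transactions, each [(table, key, value), ...]
    """
    target = {}
    log = []
    tables = sorted(source)

    # Load the tables in batches (eight at a time by default).
    for start in range(0, len(tables), batch_size):
        batch = tables[start:start + batch_size]

        # Phase 1: full load copies the existing rows of each table.
        for table in batch:
            target[table] = dict(source[table])
            log.append(("full-load", table))

        # Phase 2: once a table is loaded, apply the changes cached for it.
        for table in batch:
            for key, value in cached_changes.get(table, []):
                target[table][key] = value
            log.append(("cached-changes", table))

    # Phase 3: ongoing replication applies changes one transaction at a time.
    for index, transaction in enumerate(ongoing_transactions):
        for table, key, value in transaction:
            target[table][key] = value
        log.append(("ongoing", index))

    return target, log


source = {"orders": {1: "new"}, "users": {1: "ann"}}
cached = {"orders": [(1, "paid"), (2, "new")]}
ongoing = [[("users", 2, "bob"), ("orders", 2, "paid")]]
target, log = migrate(source, cached, ongoing)
```

In this model the target matches the source only after the last transaction is applied, which mirrors the advice to let remaining transactions drain before you repoint your applications.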

AWS DMS creates the target schema objects necessary to perform the migration. However, AWS DMS takes a minimalist approach and creates only those objects required to efficiently migrate the data. In other words, AWS DMS creates tables, primary keys, and in some cases unique indexes, but nothing else. For example, it doesn't create secondary indexes, non-primary key constraints, or data defaults.

In most cases, when performing a migration, you will also want to migrate most or all of the source schema. If you are performing a homogeneous migration (between two databases of the same engine type), you migrate the schema by using your engine’s native tools to export and import the schema itself, without any data. If your migration is heterogeneous (between two databases that use different engine types), you can use the AWS Schema Conversion Tool to generate a complete target schema for you. If you use the tool, any dependencies between tables such as foreign key constraints need to be disabled during the migration's "full load" and "cached change apply" phases. If performance is an issue, removing or disabling secondary indexes during the migration process will help. For more information on the AWS Schema Conversion Tool, see AWS Schema Conversion Tool.

AWS DMS Components

The components you work with when using AWS DMS include the following:

Replication instance

The AWS DMS replication instance runs on an Amazon Elastic Compute Cloud (Amazon EC2) instance. The replication instance provides high availability and failover support using a Multi-AZ deployment. In a Multi-AZ deployment, AWS DMS automatically provisions and maintains a standby replica of the replication instance in a different Availability Zone, and the primary replication instance is synchronously replicated to that standby. This approach provides data redundancy, eliminates I/O freezes, and minimizes latency spikes during system backups.

AWS DMS uses a replication server that connects to the source database, reads the source data, formats the data for consumption by the target database, and loads the data into the target database. Most of this processing happens in memory. However, large transactions might require some buffering on disk. Cached transactions and log files are also written to disk. When creating your replication server, you should consider the following:

  • EC2 instance class — Some of the smaller EC2 instance classes are sufficient for testing the service or for small migrations. If your migration involves a large number of tables, or if you intend to run multiple concurrent replication tasks, you should consider using one of the larger instances. We recommend this approach because AWS DMS consumes a fair amount of memory and CPU.

  • Storage — Depending on the EC2 instance class you select, your replication server comes with either 50 GB or 100 GB of data storage. This storage is used for log files and any cached changes collected during the load. If your source system is busy or takes large transactions, or if you’re running multiple tasks on the replication server, you might need to increase this amount of storage. Usually the default amount is sufficient.

Source endpoint

The change capture process that AWS DMS uses when replicating ongoing changes from a source endpoint collects changes to the database logs by using the database engine's native API. Each source engine has specific configuration requirements for exposing this change stream to a given user account. Most engines require some additional configuration to make the change data consumable in a meaningful way, without data loss, for the capture process. For example, Oracle requires the addition of supplemental logging, and MySQL requires row-level binary logging. When using Amazon RDS as a source, we recommend ensuring that backups are enabled and that the source database is configured to retain change logs for a sufficient time (24 hours is usually enough).
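
As an illustration of the MySQL requirement, the following sketch checks settings you have already read from the source, for example with SHOW VARIABLES. The function name and the retention argument are hypothetical; log_bin and binlog_format are standard MySQL system variables, and the 24-hour default simply mirrors the guidance above.

```python
def check_mysql_cdc_settings(variables, retention_hours=None,
                             min_retention_hours=24):
    """Return a list of problems that would hinder change capture.

    variables:       {name: value} as reported by SHOW VARIABLES
    retention_hours: how long change logs are kept, if known (caller-supplied)
    """
    problems = []
    if str(variables.get("log_bin", "")).upper() != "ON":
        problems.append("binary logging is not enabled (log_bin)")
    if str(variables.get("binlog_format", "")).upper() != "ROW":
        problems.append("binlog_format must be ROW for row-level logging")
    if retention_hours is not None and retention_hours < min_retention_hours:
        problems.append(
            "change logs are retained for less than %d hours" % min_retention_hours)
    return problems


ok = check_mysql_cdc_settings({"log_bin": "ON", "binlog_format": "ROW"}, 24)
bad = check_mysql_cdc_settings({"log_bin": "ON", "binlog_format": "MIXED"}, 6)
```

An empty list means none of the checked settings stands in the way of change capture. Other engines, such as Oracle with supplemental logging, need their own checks.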

Target endpoint

Whenever possible, AWS DMS attempts to create the target schema for you. Sometimes, AWS DMS can't create the schema—for example, AWS DMS won't create a target Oracle schema for security reasons. For MySQL database targets, you can use extra connection parameters to have AWS DMS migrate all objects to the specified database and schema or create each database and schema for you as it finds the schema on the source.

Task

You can create one of three possible types of migration tasks:

  • Migrate existing data — If you can afford an outage long enough to copy your existing data, this option is a good one to choose. This option simply migrates the data from your source database to your target database, creating tables when necessary.

  • Migrate existing data and replicate ongoing changes — This option performs a full data load while capturing changes on the source. Once the full load is complete, captured changes are applied to the target. Eventually the application of changes reaches a steady state. At this point you can shut down your applications, let the remaining changes flow through to the target, and then restart your applications pointing at the target.

  • Replicate data changes only — In some situations it might be more efficient to copy existing data using a method other than AWS DMS. For example, in a homogeneous migration, using native export/import tools might be more efficient at loading the bulk data. In this situation, you can use AWS DMS to replicate changes starting when you start your bulk load, to bring your source and target databases into sync and keep them there.

By default AWS DMS starts your task as soon as you create it. However, in some situations, you might want to postpone the start of the task. For example, when using the AWS Command Line Interface (AWS CLI), you might have a process that creates a task and a different process that starts the task based on some triggering event. As needed, you can postpone your task's start.
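
The following sketch shows one way to separate task creation from task start when scripting against the API. The mapping pairs the three options above with the MigrationType values that the API accepts. The function names and the trigger callable are hypothetical, and dms is assumed to be a boto3 DMS client that you create elsewhere.

```python
# Task options from the list above and the corresponding API MigrationType.
MIGRATION_TYPES = {
    "Migrate existing data": "full-load",
    "Migrate existing data and replicate ongoing changes": "full-load-and-cdc",
    "Replicate data changes only": "cdc",
}


def start_task_params(task_arn):
    """Keyword arguments for start_replication_task on a task's first start."""
    return {
        "ReplicationTaskArn": task_arn,
        "StartReplicationTaskType": "start-replication",
    }


def start_when_triggered(dms, task_arn, trigger):
    """Start an existing task only if trigger() reports the event occurred.

    trigger is any zero-argument callable that returns True or False; how it
    detects the triggering event is up to the caller.
    """
    if trigger():
        dms.start_replication_task(**start_task_params(task_arn))
        return True
    return False
```

A scheduler or event handler can call start_when_triggered repeatedly. The task stays in its created state until the trigger reports that the event has occurred.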

Schema and code migration

AWS DMS doesn't perform schema or code conversion. If your source and target use the same database engine, you can use tools such as Oracle SQL Developer, MySQL Workbench, or pgAdmin III to move your schema. If you want to convert an existing schema to a different database engine, you can use the AWS Schema Conversion Tool. It can generate and create an entire target schema: tables, indexes, views, and so on. You can also use the tool to convert PL/SQL or T-SQL to PgSQL and other formats. For more information on the AWS Schema Conversion Tool, see AWS Schema Conversion Tool.