Migrating data - Migrating Your Databases to Amazon Aurora

Migrating data

After the database schema has been copied from the source database to the target Aurora database, the next step is to migrate actual data from source to target. While data migration can be accomplished using different tools, AWS recommends moving data using the AWS Database Migration Service (AWS DMS) as it provides both the simplicity and the features needed for the task at hand.

Introduction and general approach to AWS DMS

The AWS Database Migration Service (AWS DMS) makes it easy for customers to migrate production databases to AWS with minimal downtime. You can keep your applications running while you are migrating your database. In addition, the AWS Database Migration Service ensures that data changes to the source database that occur during and after the migration are continuously replicated to the target. Migration tasks can be set up in minutes in the AWS Management Console. The AWS Database Migration Service can migrate your data to and from widely used database platforms, such as Oracle, SQL Server, MySQL, PostgreSQL, Amazon Aurora, MariaDB, and Amazon Redshift.

The service supports homogenous migrations such as Oracle to Oracle, as well as heterogeneous migrations between different database platforms, such as Oracle to Amazon Aurora or SQL Server to MySQL. You can perform one-time migrations, or you can maintain continuous replication between databases without a customer having to install or configure any complex software.

AWS DMS works with databases that are on premises, running on Amazon EC2, or running on Amazon RDS. However, AWS DMS does not work in situations where both the source database and the target database are on premises; one endpoint must be in AWS.

AWS DMS supports specific versions of Oracle, SQL Server, Amazon Aurora, MySQL, and PostgreSQL. For currently supported versions, refer to Sources for data migration. However, this whitepaper is just focusing on Amazon Aurora as a migration target.

Migration methods

AWS DMS provides three methods for migrating data:

  • Migrate existing data — This method creates the tables in the target database, automatically defines the metadata that is required at the target, and populates the tables with data from the source database (also referred to as a “full load”). The data from the tables is loaded in parallel for improved efficiency. Tables are only created in case of homogenous migrations, and secondary indexes aren’t created automatically by AWS DMS. Read further for details.

  • Migrate existing data and replicate ongoing changes — This method does a full load, as described above, and in addition captures any ongoing changes being made to the source database during the full load and stores them on the replication instance. Once the full load is complete, the stored changes are applied to the destination database until it has been brought up to date with the source database. Additionally, any ongoing changes being made to the source database continue to be replicated to the destination database to keep them in sync. This migration method is very useful when you want to perform a database migration with very little downtime.

  • Replicate data changes only — This method just reads changes from the recovery log file of the source database and applies these changes to the target database on an ongoing basis. If the target database is unavailable, these changes are buffered on the replication instance until the target becomes available.

  • When AWS DMS is performing a full load migration, the processing puts a load on the tables in the source database, which could affect the performance of applications that are hitting this database at the same time. If this is an issue, and you cannot shut down your applications during the migration, you can consider the following approaches:

  • Running the migration at a time when the application load on the database is at its lowest point.

  • Creating a read replica of your source database and then performing the AWS DMS migration from the read replica.

Migration procedure

The general outline for using AWS DMS is as follows:

  1. Create a target database.

  2. Copy the schema.

  3. Create an AWS DMS replication instance.

  4. Define the database source and target endpoints.

  5. Create and run a migration task.

Create target database

Create your target Amazon Aurora database cluster using the procedure outlined in Creating an Amazon Aurora DB Cluster. You should create the target database in the Region and with an instance type that matches your business requirements. Also, to improve the performance of the migration, verify that your target database does not have multi-AZ deployment enabled; you can enable that once the load has finished.

Copy schema

Additionally, you should create the schema in this target database. AWS DMS supports basic schema migration, including the creation of tables and primary keys. However, AWS DMS doesn't automatically create secondary indexes, foreign keys, stored procedures, users, and so on, in the target database. For full schema migration details, refer to the Migrating the database schema section of this document.

Create an AWS DMS replication instance

In order to use the AWS DMS service, you must create an AWS DMS replication instance, which runs in your VPC. This instance reads the data from the source database, performs the specified table mappings, and writes the data to the target database. In general, using a larger replication instance size speeds up the database migration (although the migration can also be gated by other factors such as the capacity of the source and target databases, connection latency, and so on.). Also, your replication instance can be stopped once your database migration is complete.

A diagram showing AWS Database Migration Service.

AWS Database Migration Service

AWS DMS currently supports burstable, compute and memory-optimized instance classes for replication instances. The burstable instance classes are low-cost standard instances designed to provide a baseline level of CPU performance with the ability to burst above the baseline. They are suitable for developing, configuring, and testing your database migration process as well as for periodic data migration tasks that can benefit from the CPU burst capability.

The compute-optimized instance classes are designed to deliver the highest level of processor performance and achieve significantly higher packet per second (PPS) performance, lower network jitter, and lower network latency. You should use this instance class if you are performing large heterogeneous migrations and want to minimize the migration time.

The memory-optimized instance classes are designed for migrations or replications of high-throughput transaction systems which can consume large amounts of CPU and memory.

AWS DMS Storage is primarily consumed by log files and cached transactions. Normally, doing a full load does not require a significant amount of instance storage on your AWS DMS replication instance. However, if you are doing replication along with your full load, then the changes to the source database are stored on the AWS DMS replication instance while the full load is taking place. If you are migrating a very large source database that is also receiving a lot of updates while the migration is in progress, then a significant amount of instance storage could be consumed.

The instances come with 50 GB of instance storage but can be scaled up as appropriate. Normally, this amount of storage should be more than adequate for most migration scenarios. However, it's always a good idea to pay attention to storage-related metrics. Make sure to scale up your storage if you find you are consuming more than the default allocation.

Also, in some extreme cases where very large databases with very high transaction rates are being migrated with replication enabled, it is possible that the AWS DMS replication may not be able to catch up in time. If you encounter this situation, it may be necessary to stop the changes to the source database for some number of minutes in order for the replication to catch up before you repoint your application to the target Aurora DB.

A screenshot of the Create replication instance page in the AWS DMS console.

Create replication instance page in the AWS DMS console

A screenshot of options on the create replication instance page in the AWS DMS console.

Options on the create replication instance page in the AWS DMS console

Define database source and target endpoints

A database endpoint is used by the replication instance to connect to a database. To perform a database migration, you must create both a source database endpoint and a target database endpoint. The specified database endpoints can be on premises, running on Amazon EC2, or running on Amazon RDS, but the source and target cannot both be on premises.

AWS highly recommendeds that you test your database endpoint connection after you define it. The same page used to create a database endpoint can also be used to test it, as explained later in this paper.

Note: If you have foreign key constraints in your source schema, when creating your target endpoint you need to enter the following for Extra connection attributes in the Advanced section:

initstmt=SET FOREIGN_KEY_CHECKS=0

This disables the foreign key checks while the target tables are being loaded. This in turn prevents the load from being interrupted by failed foreign key checks on partially loaded tables.

A screenshot of the create database endpoint page in the AWS DMS console .

Create database endpoint page in the AWS DMS console

Create and run a migration task

Now that you have created and tested your source database endpoint and your target database endpoint, you can create a task to do the data migration. When you create a task, you specify the replication instance that you have created, the database migration method type (discussed earlier), the source database endpoint, and your target database endpoint for your Amazon Aurora database cluster.

Also, under Task Settings, if you have already created the full schema in the target database, then you should change the Target table preparation mode to Do nothing rather than using the default value of Drop tables on target. The latter can cause you to lose aspects of your schema definition like foreign key constraints when it drops and recreates tables.

When creating a task, you can create table mappings that specify the source schema along with the corresponding tables to be migrated to the target endpoint. The default mapping method migrates all source tables to target tables of the same name if they exist. Otherwise, it creates the source table(s) on the target (depending on your task settings). Additionally, you can create custom mappings (using a JSON file) if you want to migrate only certain tables or if you want to have more control over the field and table mapping process. You can also choose to migrate only one schema or all schemas from your source endpoint.

A screenshot of the Create database migration task page in the AWS DMS console

Create database migration task page in the AWS DMS console

A screenshot of the Task settings page in the AWS DMS console.

Task settings page in the AWS DMS console

You can use the AWS Management Console to monitor the progress of your AWS Database Migration Service (AWS DMS) tasks. You can also monitor the resources and network connectivity used. The AWS DMS console shows basic statistics for each task, including the task status, percent complete, elapsed time, and table statistics, as the following image shows.

A screenshot of AWS DMS Console task statistics.

AWS DMS Console task statistics

Additionally, you can select a task and display performance metrics for that task, including throughput, records per second migrated, disk and memory use, and latency.