
Migrating an Amazon RDS for MySQL Database to an Amazon DynamoDB target

This walkthrough helps you to understand the process of migrating data from Amazon Relational Database Service (Amazon RDS) for MySQL to Amazon DynamoDB using AWS Database Migration Service (AWS DMS).

Amazon DynamoDB is a key-value and document database that delivers single-digit millisecond performance at any scale for modern applications. It’s a fully managed, multi-region, multi-master, durable database with built-in security, backup and restore, and in-memory caching for internet-scale applications. DynamoDB can handle more than 10 trillion requests per day and can support peaks of more than 20 million requests per second. Many of the world’s fastest growing businesses depend on the scale and performance of DynamoDB to support their mission-critical workloads.

Customers use DynamoDB for banking and finance, gaming, ad tech, retail, and media and entertainment workloads to build internet-scale applications that support user-content metadata and caches. These workloads demand high concurrency, millions of connections and requests, and stringent response times. With DynamoDB, you can use design patterns for deploying shopping carts, workflow engines, inventory tracking, customer profiles, fraud detection, and leaderboards, to name a few.

In this document, we discuss a use case where a customer runs an application that manages a COVID-19 vaccination drive and stores the drive's information in a data store. The customer currently uses RDS for MySQL to store vaccine data, but at a scale where records for millions of people can arrive at the same time, MySQL poses scalability challenges that show up as degraded response times. Because the business and application have strict response-time requirements for both writing data and reading it back, a relational database like MySQL cannot meet the SLA requirements. So, the customer decides to migrate to DynamoDB, which is purpose-built to be performant at scale and is specifically designed to handle such use cases. The business also requires that the initial transfer of data from RDS for MySQL to Amazon DynamoDB complete within a 15-hour window.

To illustrate the process, we use AWS DMS to migrate data from an example database. AWS DMS is a managed service that helps you migrate between heterogeneous sources and targets. In our case, we migrate an RDS for MySQL database to Amazon DynamoDB. AWS DMS not only migrates your existing data, but also keeps the source and target synchronized for ongoing transactions.

Why use AWS DMS?

When migrating from a relational database like MySQL to DynamoDB, there are multiple approaches that you can take. One is to export your data to CSV files and load them into Amazon DynamoDB tables from Amazon S3. However, this approach has challenges with large data sizes and requires extended downtime. AWS DMS supports binary log-based replication between MySQL-based engines and DynamoDB, which helps achieve such migrations with minimal downtime. Also, a relational database management system (RDBMS) stores data in a normalized way across multiple tables; using DMS, you can use the object mapping feature to denormalize the data into a single target table.

In this document, we guide you through the steps that you take to migrate the example MySQL database into Amazon DynamoDB. In the next sections, we describe the characteristics of the database. Then, we build the replication resources in AWS DMS that we use to migrate the database, paying close attention to matching the AWS DMS configuration with our particular use case.

Example data set

In this walkthrough, the following table is used to store the vaccine drive data. Note that the schema does not fully follow the relational model of normalization; all data is stored in a single table in a denormalized way.

(Figure: example vaccine drive data set)

Generally, relational tables are used to fetch a fixed data set based on the table definition. In this use case, however, we define the table in a denormalized manner, and based on business requirements the schema can grow at an exponential rate and change dynamically. For such use cases, services like Amazon DynamoDB help application developers and architects rethink the data model in a key-value format and move the data store to DynamoDB.

The “vaccine_drive_stats” table contains 1,022 million records (roughly 1 billion) with a size of 210 GB. This table collects information about people who participated in the vaccination program, including their vaccine status and user details.

Note that the table has a composite primary key on the “user_id” and “area_code” columns. In MySQL, application and admin users access the data through this composite key when reporting on and manipulating records in the table.

There are additional use cases for aggregate data, such as the total number of people who have received the first or second vaccination, state-wise vaccine numbers, the total percentage of the population vaccinated, and so on. All of these aggregate use cases can be handled using a DynamoDB schema designed for aggregations.

Migration of this use case can be handled using one-to-one mapping from RDBMS MySQL to a DynamoDB table.

Similarly, if you have the following types of tables, you can consider migrating to a DynamoDB target using DMS with less downtime.

  1. Table with non-relational data

  2. Logging tables

  3. User preference tables

  4. Application session state tables

Solution overview

The following diagram displays a high-level architecture of the solution, where we use AWS DMS to move data from a MySQL database hosted on RDS to Amazon DynamoDB.

(Figure: high-level architecture of the solution)

To connect to the source database where your data resides and to the target Amazon DynamoDB, you create two endpoint resources in AWS DMS. An “endpoint” is a resource that stores connection information such as hostname, username, and password. For DynamoDB, it stores the name of an IAM role that provides access to resources. Endpoint resources also store endpoint-specific settings that configure each endpoint's behavior.

The endpoint itself does not have a mechanism to connect to the source or target. A resource called a “replication task” connects to the source and target to migrate data. Each replication task is associated with one source endpoint and one target endpoint, but multiple tasks can share the same endpoints.

A replication instance is a resource where your replication task is running. It has a network interface connected to your VPC, through which AWS DMS tasks communicate with sources and targets.

In summary, in this walkthrough you will set up the following resources in AWS DMS:
  • Replication Instance — An AWS managed instance that hosts the AWS DMS engine. You control the type or size of the instance based on your workload.

  • Source Endpoint — A resource that provides connection details, data store type, and credentials to connect to a source database. For this use case, we will configure the source endpoint to point to the Amazon RDS for MySQL database.

  • Target table — A DynamoDB table used in this scenario to receive the data from the source database. We will create a DynamoDB table with settings customized for the migration.

  • Target Endpoint — AWS DMS supports several target systems including Amazon RDS, Amazon Aurora, Amazon Redshift, Amazon Kinesis Data Streams, Amazon S3, and more. For this use case, we will configure Amazon DynamoDB as the target endpoint.

  • Replication Task — A resource that runs on the replication instance and connects to endpoints to replicate data from the source to the target.

Prerequisites

The following prerequisites are required to complete this walkthrough:

  • An understanding of Amazon Relational Database Service (Amazon RDS), the applicable database technologies, and SQL.

  • A user with AWS Identity and Access Management (IAM) credentials that allows you to launch Amazon RDS and AWS Database Migration Service (AWS DMS) instances in your AWS Region. For information about IAM credentials, see Create an IAM user.

  • An understanding of the Amazon Virtual Private Cloud (Amazon VPC) service and security groups. For information about using Amazon VPC with Amazon RDS, see Amazon Virtual Private Cloud (VPCs) and Amazon RDS. For information about Amazon RDS security groups, see Controlling access with security groups.

  • An understanding of the supported features and limitations of AWS DMS. For information about AWS DMS, see What is Database Migration Service?

  • An understanding of how to work with MySQL as a source and Amazon DynamoDB as a target. For information about working with MySQL as a source, see Using a MySQL database as a source. For information about working with Amazon DynamoDB as a target, see Using Amazon DynamoDB as a target.

  • An understanding of the supported data type conversion options for MySQL and Amazon DynamoDB. For information about data types for MySQL as a source, see Source data types for MySQL. For information about data types for Amazon DynamoDB as a target, see Target data types for Amazon DynamoDB.

For more information about AWS DMS, see Getting started with Database Migration Service.

Step-by-step migration

The following steps provide instructions for migrating an Amazon RDS for MySQL database to DynamoDB. These steps assume that you have already prepared your source database as described previously.

Step 1: Create an AWS DMS Replication Instance

Step 2: Configure a Source Amazon RDS for MySQL Database

Step 3: Create an AWS DMS Source Endpoint

Step 4: Configure a Target Amazon DynamoDB table

Step 5: Configure an AWS DMS Target Endpoint

Step 6: Create an AWS DMS Task

Step 7: Run the AWS DMS Task

Step 1: Create replication instance

An AWS DMS replication instance hosts the software that migrates data between the source and target. The replication instance also caches the transaction logs during the migration. The CPU and memory capacity of the replication instance influences the overall time needed for the migration. Make sure that you consider the specifics of your particular use case when you determine the size of your replication instance. A full load task consumes a lot of memory if it is run multithreaded. For more information, see Choosing the right replication instance for your migration.

For our use case, we have a limited window of 15 hours to complete the full load of a table containing 210 GB of data. Therefore, we scale the replication instance to meet this requirement.

Each type of instance class has a different CPU, memory, and I/O capacity. Size the replication instance based on factors such as data volume, transaction frequency, large objects (LOBs) in the migrated data, and so on. We initially chose a dms.c5.large instance running the latest AWS DMS engine version with the default task configuration. We then upgraded to a dms.c5.12xlarge instance with a customized task configuration to compare performance. We discuss the performance and configuration details in an upcoming section.

We also upgraded the storage of the replication instance to 200 GB, and as a result, 600 IOPS were available for our replication instance. By default, DMS allocates 50 GB of storage to a replication instance. This may not be sufficient for use cases where multiple tasks run on the same replication instance or when running tasks with parallel load for large tables. With 600 IOPS, we saved several minutes of migration time. For more information about storage volume performance and burst I/O credits, see General Purpose SSD (gp2) volumes.

Because we replicate production data in this walkthrough, we use the Multi-AZ deployment option for our replication instance for high availability. Also, we didn’t make this replication instance publicly accessible for additional security. For information about best practices for using AWS DMS, see Database Migration Service Best Practices.

To create an AWS DMS replication instance, do the following:

  1. Sign in to the AWS Management Console, and open the AWS DMS console.

  2. If you are signed in as an AWS Identity and Access Management (IAM) user, you must have the appropriate permissions to access AWS DMS. For more information about the permissions required, see IAM permissions.

  3. On the Welcome page, choose Create replication instance to start a database migration.

  4. On the Create replication instance page, specify your replication instance information.

  • Name — Enter mysql-to-ddb-migration-ri. If you are using multiple replication servers or sharing an account, choose a name that helps you quickly differentiate between the servers.

  • Description — Enter Migrate MySQL to Amazon DynamoDB.

  • Instance class — Choose dms.c5.12xlarge. Each size and type of instance class has increasing CPU, memory, and I/O capacity.

  • Engine version — Leave the default value, which is the latest stable version of the AWS DMS replication engine.

  • Allocated storage (GiB) — Choose 200 GiB.

  • VPC — Choose the virtual private cloud (VPC) in which your replication instance will launch. Select the same VPC in which your source is placed.

  • Multi AZ — Choose Yes, so that AWS DMS creates a second replication server in a different Availability Zone for failover if there is a problem with the primary replication server. As noted earlier, we use Multi-AZ because this walkthrough replicates production data.

  • Publicly accessible — Choose No. You must make the replication server publicly accessible only if your source or target database resides outside of the VPC in which the replication server resides; in this walkthrough, the source is in the same VPC.

  5. Choose Create.
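
If you prefer to script the setup, you can create the same replication instance with the AWS CLI. The following is a minimal sketch; the subnet group identifier and security group ID are placeholders that you would replace with values from your environment.

aws dms create-replication-instance \
    --replication-instance-identifier mysql-to-ddb-migration-ri \
    --replication-instance-class dms.c5.12xlarge \
    --allocated-storage 200 \
    --multi-az \
    --no-publicly-accessible \
    --replication-subnet-group-identifier my-dms-subnet-group \
    --vpc-security-group-ids sg-0123456789abcdef0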

Step 2: Configure a Source Amazon RDS for MySQL Database

Before setting up AWS DMS resources, some configuration is required to prepare your MySQL DB instance as a source for AWS DMS. Because this walkthrough uses a MySQL database on Amazon RDS, the prerequisites that DMS requires for MySQL must be enabled in the instance's parameter group.

Binary logging and its retention

To use AWS DMS change data capture (CDC), enable binary logging on the source MySQL RDS instance. To enable binary logs for RDS for MySQL and for RDS for MariaDB, enable automatic backups at the instance level. For more information about setting up automatic backups, see Working with automated backups in the Amazon RDS User Guide.

Next, the following parameters must be configured in the parameter group used by the source database. You can't modify a default parameter group, so if the database instance is using a default parameter group, create a new parameter group and associate it with the database instance; associating a new parameter group requires a reboot of the instance to take effect. The parameters themselves are dynamic, so changing the values below in a custom parameter group that is already associated with the instance doesn't require a reboot.

binlog_format=ROW

binlog_checksum=NONE

binlog_row_image=FULL

To ensure that binary logs are available to AWS DMS, increase the length of time that the logs remain available on the database instance. For example, to increase log retention to 24 hours, run the following procedure call on the source database.

call mysql.rds_set_configuration('binlog retention hours', 24);

We also recommend retaining the binary logs at least until the task completes the full load phase and the CDC phase is running with low latency. In the planning phase, test your workload, and based on the results choose a retention period for the production migration.
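
To verify the retention value currently in effect, you can run the companion procedure from the same RDS procedure set on the source database:

call mysql.rds_show_configuration;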

Source User Permission

You must have an account for AWS DMS that has the Replication Admin role. The account needs the following privileges to run CDC tasks.

REPLICATION CLIENT – This privilege is required for CDC tasks only. In other words, full-load-only tasks don’t require this privilege.

REPLICATION SLAVE – This privilege is required for CDC tasks only. In other words, full-load-only tasks don’t require this privilege.

The AWS DMS user must also have SELECT privileges for the source tables designated for replication.
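
As a minimal sketch of these grants, the following statements create the account and assign the required privileges. The dms_user name matches the source endpoint configured later in this walkthrough, and the valis schema matches the table mappings used later; the host mask and password are placeholders.

CREATE USER 'dms_user'@'%' IDENTIFIED BY '<strong-password>';
-- Required for the CDC phase only
GRANT REPLICATION CLIENT, REPLICATION SLAVE ON *.* TO 'dms_user'@'%';
-- Required to read the tables selected for replication
GRANT SELECT ON valis.* TO 'dms_user'@'%';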

Network configuration

In this walkthrough, the DB instance and the replication instance are placed in the same VPC and the same subnet, so all you need to do is configure security groups, network ACLs, and route tables so that the Amazon RDS for MySQL DB instance and the AWS DMS replication instance can communicate within the subnet. If your source databases are in different subnets or different VPCs, configure your network to allow communication between the Amazon RDS for MySQL DB instance and the AWS DMS replication instance. For more information about network setup, see Network configurations for database migration in the AWS DMS User Guide.

Inbound connection rule

To ensure that the replication instance can access the server and port for the database, make changes to the relevant security groups and network access control lists. AWS DMS only requires access to the MySQL database listener port (3306), and the connection is always initiated from the AWS DMS replication instance to MySQL. Therefore, allow connections from the replication instance in the ingress rules of the security group attached to the DB instance. We recommend adding the complete subnet group range to the ingress rule, because the AWS DMS replication instance is a managed service and its IP address may change automatically.
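
For example, assuming the DB instance's security group is sg-0123456789abcdef0 and the replication instance's subnet group covers 10.0.0.0/24 (both placeholder values), an ingress rule like the following allows the connection:

aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol tcp \
    --port 3306 \
    --cidr 10.0.0.0/24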

Step 3: Create an AWS DMS Source Endpoint

After you complete the network configurations, you can create a source endpoint. To create a source endpoint, do the following:

  1. Open the AWS DMS console at https://console.aws.amazon.com/dms/v2/.

  2. Choose Endpoints.

  3. Choose Create endpoint.

  4. On the Create endpoint page, enter the following information.

  • Endpoint type — Choose Source endpoint.

  • Endpoint identifier — Enter mysql-source-dms-datastore.

  • Source engine — Choose MySQL.

  • Access to endpoint database — Choose Provide access information manually.

  • Server name — Enter the endpoint (hostname) of your Amazon RDS for MySQL DB instance.

  • Port — Enter 3306.

  • Secure Socket Layer (SSL) mode — Choose none.

  • User name — Enter dms_user.

  • Password — Enter the password that you created for the dms_user user.
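
Alternatively, you can create the same source endpoint with the AWS CLI. In this sketch, the server name and password are placeholders:

aws dms create-endpoint \
    --endpoint-identifier mysql-source-dms-datastore \
    --endpoint-type source \
    --engine-name mysql \
    --server-name <rds-mysql-endpoint> \
    --port 3306 \
    --username dms_user \
    --password <password> \
    --ssl-mode none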

Step 4: Configure a Target Amazon DynamoDB table

A DMS task can create a target DynamoDB table based on the source table definition. When AWS DMS sets DynamoDB parameter values for a migration task, the default Read Capacity Units (RCU) parameter value is set to 200. The Write Capacity Units (WCU) parameter value is also set, but its value depends on several other settings:

  • The default value for the WCU parameter is 200.

  • If the ParallelLoadThreads task setting is set greater than 1 (the default is 0), then the WCU parameter is set to 200 times the ParallelLoadThreads value.
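
For example, with ParallelLoadThreads set to 150 (the value we use later in this walkthrough), a table created by DMS would be provisioned with 200 × 150 = 30,000 WCU.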

In this case, DMS uses the default provisioned capacity, which is not sufficient to handle the workload from the source database. To avoid resource constraint issues, and to customize the key usage, consider creating the target DynamoDB table with the configuration that your workload requires.

In this walkthrough, use the following source MySQL table definition to create the target DynamoDB table. As you can see, the source table has a composite primary key (user_id, area_code), so you can use these columns as the partition key and sort key of the DynamoDB table.

Table: vaccine_drive_stats

CREATE TABLE `vaccine_drive_stats` (
  `user_id` int(11) NOT NULL AUTO_INCREMENT,
  `patient_name` varchar(1000) DEFAULT NULL,
  `phone_num` int(11) DEFAULT NULL,
  `date_of_birth` date DEFAULT NULL,
  `age` tinyint(4) DEFAULT NULL,
  `date_vacc1` date DEFAULT NULL,
  `date_vacc2` date DEFAULT NULL,
  `date_booster` date DEFAULT NULL,
  `fully_vaccinated` bit(64) DEFAULT NULL,
  `age_group` varchar(50) DEFAULT NULL,
  `state` varchar(1000) DEFAULT NULL,
  `zipcode` int(11) DEFAULT NULL,
  `gender` varchar(50) DEFAULT NULL,
  `city` varchar(50) DEFAULT NULL,
  `area_code` varchar(200) NOT NULL,
  `vaccine_type` varchar(300) DEFAULT NULL,
  `vaccine_name` varchar(100) DEFAULT NULL,
  `rural_or_urban` varchar(100) DEFAULT NULL,
  `certificate_link` varchar(100) DEFAULT NULL,
  `vaccinated_by` varchar(200) DEFAULT NULL,
  `vaccinated_at` varchar(100) DEFAULT NULL,
  `next_due_date` date DEFAULT NULL,
  PRIMARY KEY (`user_id`,`area_code`)
) ENGINE=InnoDB AUTO_INCREMENT=13462359 DEFAULT CHARSET=utf8mb4;

To create an Amazon DynamoDB table, do the following.

  1. Open the DynamoDB console at https://console.aws.amazon.com/dynamodb/.

  2. Choose Create Table. In the Create DynamoDB table screen, do the following:

  3. In the Table name box, enter “vaccine_drive_stats_tab” as the table name.

Note

You can rename the target table to suit your requirements, but make sure to map the table name using a DMS object mapping rule.

Pick the DynamoDB partition key and sort key for a table based on the table's access patterns. In the CDC phase, DMS is bound by the DynamoDB limitation that primary key attributes cannot be updated. This restriction is important when using ongoing replication with change data capture (CDC) because it can result in unwanted data on the target. Depending on your object mapping, a CDC operation that updates the primary key can do one of two things: it can fail, or it can insert a new item with the updated primary key and incomplete data. So, choose the partition key and sort key carefully to avoid issues in the migration.

For the Primary key, do the following:

DynamoDB query performance depends on the partition key and sort key selection for a table. Choose a high-cardinality column as the partition key to distribute the data across partitions of the DynamoDB table. The sort key sorts and orders items within a partition, so choose a sort key that keeps related information together in one partition area to improve query performance. In this use case, we chose "user_id" as the partition key and "area_code" as the sort key to distribute and organize the data based on the application access pattern. Refer to Choosing the Right DynamoDB Partition Key for more details.

  1. In the Partition key box, enter column name as “user_id” and set the data type to String.

  2. Choose Add sort key.

  3. In the Sort key box, enter column name as “area_code" and set the data type to String.

  4. In table settings, choose Customize settings, and then select On-demand read/write capacity.

  5. When the settings are as you want them, choose Create.

In this walkthrough, we pre-create the target table with On-demand capacity mode for the migration. Later, based on the traffic flow, you can change the capacity mode on the target to save costs after the migration completes. For more information, see Amazon DynamoDB create table.
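
You can also pre-create the same table from the AWS CLI. This sketch mirrors the console settings above; PAY_PER_REQUEST corresponds to On-demand capacity mode:

aws dynamodb create-table \
    --table-name vaccine_drive_stats_tab \
    --attribute-definitions \
        AttributeName=user_id,AttributeType=S \
        AttributeName=area_code,AttributeType=S \
    --key-schema \
        AttributeName=user_id,KeyType=HASH \
        AttributeName=area_code,KeyType=RANGE \
    --billing-mode PAY_PER_REQUEST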

Step 5: Configure an AWS DMS Target Endpoint

Before you begin to work with DynamoDB as a target for AWS DMS, make sure that you create an IAM role. This IAM role must allow AWS DMS to assume it, and it must grant access to the DynamoDB tables that you are migrating into. The following trust policy lets AWS DMS assume the role.

{ "Version": "2012-10-17", "Statement": [ { "Sid": "", "Effect": "Allow", "Principal": { "Service": "dms.amazonaws.com" }, "Action": "sts:AssumeRole" } ] }

DMS creates the control tables “awsdms_apply_exceptions” and “awsdms_full_load_exceptions” on the DynamoDB target to record failures in loading or applying records during the migration. Therefore, the role that you use for the migration to DynamoDB must have the following permissions, including on the control tables.

{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "dynamodb:PutItem", "dynamodb:CreateTable", "dynamodb:DescribeTable", "dynamodb:DeleteTable", "dynamodb:DeleteItem", "dynamodb:UpdateItem" ], "Resource": [ "arn:aws:dynamodb:us-west-2:account-id:table/name1", "arn:aws:dynamodb:us-west-2:account-id:table/OtherName*", "arn:aws:dynamodb:us-west-2:account-id:table/awsdms_apply_exceptions", "arn:aws:dynamodb:us-west-2:account-id:table/awsdms_full_load_exceptions" ] }, { "Effect": "Allow", "Action": [ "dynamodb:ListTables" ], "Resource": "*" } ] }

To create a target endpoint for Amazon DynamoDB, do the following:

  1. Open the AWS DMS console at https://console.aws.amazon.com/dms/v2/.

  2. Choose Endpoints.

  3. Choose Create endpoint.

  4. On the Create endpoint page, enter the following information.

  • Endpoint type — Choose Target endpoint.

  • Endpoint identifier — Enter dynamodb-target-dms-datastore.

  • Target engine — Choose Amazon DynamoDB.

  • Service access role ARN — Enter the ARN of the IAM role that you created earlier in this step.
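
Or, using the AWS CLI, where the role ARN is a placeholder for the role created earlier in this step:

aws dms create-endpoint \
    --endpoint-identifier dynamodb-target-dms-datastore \
    --endpoint-type target \
    --engine-name dynamodb \
    --service-access-role-arn arn:aws:iam::<account-id>:role/<dms-dynamodb-access-role>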

Step 6: Create an AWS DMS Task

Before you create the replication task, it is important to understand the workload on the source database, and the usage pattern of the tables being replicated. This helps plan an effective migration approach, and minimizes any configuration or workload related issues. In this section, we first review the important considerations, and then learn how to configure our walkthrough DMS task accordingly by applying table mappings and task settings.

Considerations Before Creating an AWS DMS Task

Size and number of records

The volume of migrated records affects the full load completion time. It is difficult to predict the full load time upfront, but testing with a replica of a production instance should provide a baseline. Use this estimate to decide whether you should parallelize full load by using multiple tasks or by using the parallel load option.

DMS supports parallel load threads for a target DynamoDB endpoint. However, other features, such as parallel-load table-level mappings, aren't supported for a DynamoDB target endpoint.

ParallelLoadThreads – Use this option to specify the number of threads that AWS DMS uses to load each table into its DynamoDB target table. The default value is 0 (single-threaded). The maximum value is 200. You can contact support to have this maximum limit increased.

ParallelLoadBufferSize – Use this option to specify the maximum number of records to store in the buffer that the parallel load threads use to load data into the DynamoDB target. The default value is 50. The maximum value is 1,000. Use this setting with ParallelLoadThreads; ParallelLoadBufferSize is valid only when there is more than one thread. Note that the ParallelLoadThreads settings only parallelize loading data into the target table; they don't parallelize unloading data from the source.

To speed up the full load of large tables, such as the “vaccine_drive_stats” table in our use case, we can increase the number of parallel load threads in a task.

Transactions per second

While full load is affected by the number of records, the ongoing replication performance relies on the number of transactions on the source MySQL database. Performance issues during change data capture (CDC) generally stem from resource constraints on the source database, replication instance, target database, and network bandwidth or throughput. Knowing average and peak TPS on the source and recording CDC throughput and latency metrics helps baseline AWS DMS performance and identify an optimal task configuration. For more information, see Replication task metrics.

In this walkthrough, the source database is an RDS for MySQL database whose transaction volume depends on the number of people attending the vaccination drive. So, a considerable amount of read and write traffic is expected on the source RDS for MySQL database during the day while the migration runs.

This approach requires a replication instance with higher compute capacity if the data volume is huge. We chose the compute-intensive c5 class replication instance to speed up the process.

If you are not sure about your data volumes or performance expectations for the migration task, start with a general-purpose t3-class instance, and then move to a c5-class instance for compute-intensive tasks or an r5-class instance for memory-intensive tasks. Monitor the task metrics continuously, and choose the instance class that best suits your needs.

Unsupported data types

Identify data types used in tables and check that AWS DMS supports these data types. For more information, see Source data types for MySQL.

Validate that the target DynamoDB has the corresponding data types. For more information, see Target data types for DynamoDB.

After you run the initial load test, validate that AWS DMS converted data as you expected. You can also initiate a pre-migration assessment to identify any unsupported data types in the migration scope. For more information, see Specifying individual assessments.

Source filtering in full load phase

Running AWS DMS replication tasks for large tables can add to the workload on the source database, especially during the full load phase, when AWS DMS reads whole tables from the source without any filters to restrict rows. When you use filters in the AWS DMS task table mapping, confirm that appropriate indexes exist on the source tables and that those indexes are actually used during the full load. Regularly monitor the source database to identify any workload-related issues. For more information, see Using table mapping to specify task settings.

In this walkthrough, we first migrated one large source table (1 billion records, 210 GB in size) with the DMS default configuration to check the performance of migrating the existing data. Based on the full load run time and the resource utilization metrics of the source MySQL database instance and the replication instance, we then used the AWS DMS parallel load option to further improve full load performance.

Task configuration

In this walkthrough, we migrate the existing and incremental changes to the target DynamoDB. To do so, we use the Full Load + CDC option. For more information about the AWS DMS task creation steps and available configuration options, see Creating a task.

We will first focus on the following settings.

LOB Settings

DMS treats source MySQL data types such as JSON, LONGTEXT, and MEDIUMTEXT as LOB fields during migration. AWS DMS handles large binary object (LOB) columns differently from other data types. For more information, see Migrating large binary objects (LOBs).

A detailed explanation of LOB handling by AWS DMS is out of scope for this walkthrough. However, remember that increasing the LOB Max Size increases the task's memory utilization. Because of that, we recommend that you don't set LOB Max Size to a large value. For more information about LOB settings, see Task Configuration.

For this use case, the source table doesn't contain any large object data types, so we disabled LOB support in the task's “TargetMetadata” configuration. Refer to the following task setting for details.

{ "TargetMetadata": { "TargetSchema": "", "SupportLobs": false, "FullLobMode": false, "LobChunkSize": 0, "LimitedSizeLobMode": false, "LobMaxSize": 0, "InlineLobMaxSize": 0, "LoadMaxFileSize": 0, "ParallelLoadThreads": 0, "ParallelLoadBufferSize": 0, "BatchApplyEnabled": false, "TaskRecoveryTableEnabled": false, "ParallelLoadQueuesPerThread": 0, "ParallelApplyThreads": 0, "ParallelApplyBufferSize": 0, "ParallelApplyQueuesPerThread": 0 }, }

DMS has the following limitation when migrating large objects to DynamoDB. If your source table contains large objects, check the DMS documentation for your source database for the scope of support, and configure the migration task accordingly.

  • AWS DMS doesn’t support LOB data unless it is a CLOB. AWS DMS converts CLOB data into a DynamoDB string when migrating the data.

Table Object mappings

DMS has the following limitations for a target DynamoDB endpoint.

  • AWS DMS only supports replication of tables with non-composite primary keys. The exception is if you specify an object mapping for the target table with a custom partition key or sort key, or both.

For this use case, the source MySQL table contains a composite primary key. Initially, we tried migrating it with a target preparation mode of “Drop and create” and only a DMS selection mapping rule. However, the table was suspended from the migration with the following error, consistent with the limitation described above:

00019383: 2023-03-14T08:48:33 [TARGET_LOAD ]E: Table 'vaccine_drive_stats' has composite primary key [1025900] (dynamodb_imp.c:368)
00019383: 2023-03-14T08:48:33 [TARGET_LOAD ]E: Unable to determine hash key for table 'vaccine_drive_stats' [1025900] (dynamodb_table_requests.c:399)
00019383: 2023-03-14T08:48:33 [TARGET_LOAD ]E: Failed to initialize create table request. [1020413] (dynamodb_table_requests.c:92)
00019383: 2023-03-14T08:48:33 [TARGET_LOAD ]E: Handling new table 'valis'.'vaccine_drive_stats' failed [1020413] (endpointshell.c:2712)
00019382: 2023-03-14T08:48:33 [SOURCE_UNLOAD ]I: Unload finished for table 'valis'.'vaccine_drive_stats' (Id = 1). 20970 rows sent. (streamcomponent.c:3543)
00019374: 2023-03-14T08:48:33 [TASK_MANAGER ]W: Table 'valis'.'vaccine_drive_stats' (subtask 1 thread 1) is suspended (replicationtask.c:2550)

To mitigate this issue, we created the target table as described in Step 4, and then configured the task with the following object mapping rule. In our case, we used the "map-record-to-record" option to restructure the target table and how its data is stored. Compare the source table "vaccine_drive_stats" definition with the following object mapping for clarity.

{ "rules": [ { "rule-type": "selection", "rule-id": "1", "rule-name": "1", "object-locator": { "schema-name": "valis", "table-name": "vaccine_drive_stats" }, "rule-action": "include" }, { "rule-type": "object-mapping", "rule-id": "2", "rule-name": "2", "rule-action": "map-record-to-record", "object-locator": { "schema-name": "valis", "table-name": "vaccine_drive_stats" }, "target-table-name": "vaccine_drive_stats_tab", "mapping-parameters": { "partition-key-name": "user_id", "sort-key-name": "area_code", "exclude-columns": [ "patient_name", "phone_num", "date_of_birth", "age", "date_vacc1", "date_vacc2", "date_booster", "fully_vaccinated", "age_group", "state", "zipcode", "gender", "city", "vaccine_type", "vaccine_name", "rural_or_urban", "certificate_link", "vaccinated_by", "vaccinated_at", "next_due_date" ], "attribute-mappings": [ { "target-attribute-name": "user_id", "attribute-type": "scalar", "attribute-sub-type": "string", "value": "${user_id}" }, { "target-attribute-name": "area_code", "attribute-type": "scalar", "attribute-sub-type": "string", "value": "${area_code}" }, { "target-attribute-name": "rural_or_urban", "attribute-type": "scalar", "attribute-sub-type": "string", "value": "${rural_or_urban}" }, { "target-attribute-name": "PatientDetails", "attribute-type": "scalar", "attribute-sub-type": "string", "value": "{\"patient_name\": \"${patient_name}\",\"phone_num\": \"${phone_num}\",\"date_of_birth\": \"${date_of_birth}\",\"age\": \"${age}\",\"gender\": \"${gender}\",\"state\": \"${state}\",\"zipcode\": \"${zipcode}\"}" }, { "target-attribute-name": "PatientVaccineinfo", "attribute-type": "scalar", "attribute-sub-type": "string", "value": "{\"date_vacc1\": \"${date_vacc1}\",\"date_vacc2\": \"${date_vacc2}\",\"date_booster\": \"${date_booster}\",\"fully_vaccinated\": \"${fully_vaccinated}\",\"vaccine_type\": \"${vaccine_type}\",\"vaccine_name\": \"${vaccine_name}\",\"certificate_link\": \"${certificate_link}\"}" }, { "target-attribute-name": "PatientVaclocation", "attribute-type": "scalar", "attribute-sub-type": "string", "value": "{\"vaccinated_by\": \"${vaccinated_by}\",\"vaccinated_at\": \"${vaccinated_at}\",\"next_due_date\": \"${next_due_date}\"}" } ] } } ] }

In this case, the source table contains 22 columns in total, but by using object mapping we restructured the target to 6 attributes, concatenating the remaining fields into new columns as shown in the mapping above. Similarly, you can restructure the target based on your requirements using the object mapping feature. For more information, see Using object mapping to migrate data to DynamoDB. The following DynamoDB console screenshot shows the records in the table; as you can see, DMS migrated the records based on the object mapping configuration.

(Figure: migrated items in the target DynamoDB table)

Parallel load configuration

High values for ParallelLoadThreads cause heavy write traffic on the target DynamoDB tables. In such a scenario, you might find an increase in throttling events even in On-demand capacity mode. While increasing the setting value, monitor the target table’s monitoring graph and make sure that no throttling events occur.

In our use case, the task was initially configured with ParallelLoadThreads set to 200. However, the task experienced the following DynamoDB throttling error. To avoid it, we reduced the value from 200 to 150, after which the number of write-throttling events on the target table dropped to zero. For more information about throttling, see Why is my on-demand DynamoDB table being throttled?

00143766: 2023-03-15T05:26:07 [SOURCE_CAPTURE ]E: PutItem failed with error: Throughput exceeds the current capacity of your table or index. DynamoDB is automatically scaling your table or index so please try again shortly. If exceptions persist, check if you have a hot key: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-partition-key-design.html. [1001788] (ddb_item_actions.cpp:78)
00143766: 2023-03-15T05:26:07 [TARGET_LOAD ]E: Encountered a non-data error. Thread is exiting. [1025906] (dynamodb_load.c:83)

Task settings used for the parallel load configuration:

{
  "TargetMetadata": {
    "TargetSchema": "",
    "SupportLobs": false,
    "FullLobMode": false,
    "LobChunkSize": 0,
    "LimitedSizeLobMode": false,
    "LobMaxSize": 0,
    "InlineLobMaxSize": 0,
    "LoadMaxFileSize": 0,
    "ParallelLoadThreads": 150,
    "ParallelLoadBufferSize": 1000,
    "BatchApplyEnabled": false,
    "TaskRecoveryTableEnabled": false,
    "ParallelLoadQueuesPerThread": 0,
    "ParallelApplyThreads": 0,
    "ParallelApplyBufferSize": 0,
    "ParallelApplyQueuesPerThread": 0
  }
}
Other task settings

Choose Enable CloudWatch Logs to upload the AWS DMS task run log to Amazon CloudWatch. You can use these logs to troubleshoot issues, because they include error and warning messages, start and end times of the run, configuration issues, and so on. To diagnose performance issues, you can change the task logging setting, such as to enable debugging or tracing.

Note

CloudWatch Logs usage is charged at standard rates. For more information, see Amazon CloudWatch pricing.

For Target table preparation mode, choose one of the following options: Do nothing, Truncate, or Drop. Use Truncate in data pipelines where downstream systems rely on a fresh dump of clean data and do not rely on historical data. Note, however, that the Truncate option isn't supported for a DynamoDB target. In this walkthrough, we choose Do nothing because we pre-created the target table to match the use case requirements.

For Maximum number of tables to load in parallel, enter the number of parallel threads that AWS DMS initiates during the full load. You can increase this value to improve the full load performance and minimize the load time when you have numerous tables. In this walkthrough, we use the default value of 8 because the task is only migrating one source table.

For Commit rate during full load, enter a value for the maximum number of records that can be transferred to the target together. The default value is 10000. In this walkthrough, we use 50000 for better performance.

Configuration used for FullLoadSettings:

"FullLoadSettings": {
  "TargetTablePrepMode": "DO_NOTHING",
  "CreatePkAfterFullLoad": false,
  "StopTaskCachedChangesApplied": false,
  "StopTaskCachedChangesNotApplied": false,
  "MaxFullLoadSubTasks": 8,
  "TransactionConsistencyTimeout": 600,
  "CommitRate": 50000
},

Note

Increasing the commit rate induces additional load on the source database, replication instance, and target database.

To create a database migration task
  1. Log in to the AWS Management Console, and open the AWS DMS console.

  2. Choose Database migration tasks, then choose Create task.

  3. On the Create database migration task page, enter the following information.

  • Task identifier — Enter mysql-to-dynamodb-data-migration.

  • Replication instance — Choose mysql-to-ddb-migration-ri. You configured this value in Step 1.

  • Source database endpoint — Choose mysql-source-dms-datastore. You configured this value in Step 3.

  • Target database endpoint — Choose dynamodb-target-dms-datastore. You configured this value in Step 5.

  • Migration type — Choose Migrate existing data and replicate ongoing changes.

  • Editing mode — Choose Wizard.

  • Custom CDC stop mode for source transactions — Choose Disable custom CDC stop mode.

  • Target table preparation mode — Choose Do nothing.

  • Stop task after full load completes — Choose Don't stop.

  • Include LOB columns in replication — Choose Don't include LOB columns.

  • Advanced task settings, Full load tuning settings, Maximum number of tables to load in parallel — Use the default value.

  • Enable validation — Turn off, because DynamoDB targets don't support validation.

  • Enable CloudWatch logs — Turn on.

  4. Keep the default values for other parameters, and choose Create task.

AWS DMS runs the task immediately. The Database migration tasks section displays the status of the migration task.

Step 7: Run the AWS DMS Task

After you create your AWS Database Migration Service (AWS DMS) task, do a test run to identify the full load run time and ongoing replication performance. You can validate that initial configurations work as expected. You can do this by monitoring and documenting resource utilization on the source database, replication instance, and target database. These details make up the initial baseline and help determine if you need further optimization.

After you start the task, the full load operation starts loading tables. You can see the table load completion status in the Table statistics section, and the corresponding records in the target DynamoDB table.
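
You can also retrieve the same per-table statistics from the AWS CLI; the task ARN is a placeholder:

aws dms describe-table-statistics \
    --replication-task-arn arn:aws:dms:<region>:<account-id>:task:<task-id>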

In our use case, the following image shows the table statistics for the dms.c5.12xlarge replication instance with the parallel-load threads option. The full load of 1 billion records completed in 14 hours, so we achieved our goal of completing the full load in less than 15 hours. If you want to reduce the full load time further, you can distribute the table workload across multiple tasks using DMS source filter conditions together with parallel load threads, and migrate the data in parallel for better performance.

(Figure: table statistics for the parallel-load run)

For comparison, a task using the dms.c5.large instance class with the default configuration migrated the same 1 billion records in 278 hours. That task later moved to the failed state because the source binary logs from the full load start time were no longer available. To avoid this issue, make sure that you retain binary logs long enough to cover the full load completion time. These statistics illustrate the benefit of using a parallel load configuration to speed up the migration phase. See the following screenshot for details.

(Figure: parallel load task statistics)

We also monitored CloudWatch metrics such as compute, memory, and network to identify the resource usage of the AWS DMS instance. Identify any resource constraints and scale up to the AWS DMS instance class that serves your workload better. You can also scale down the AWS DMS instance to a t3- or r5-class instance based on the transaction volume of your ongoing replication task.

Because we turned on the parallel-load option, the I/O load on the replication instance is expected to increase. We described in Step 1 that you should monitor the Write IOPS and Read IOPS metrics in CloudWatch to make sure that the total IOPS (write + read IOPS) doesn’t exceed the total IOPS available for your replication instance. If it does, make sure that you allocate more storage to scale for better I/O performance. For more information, see Monitoring replication tasks using Amazon CloudWatch.

We covered most of the prerequisites that help avoid errors related to configuration. If you observe issues when running the task, then see Troubleshooting migration tasks in Database Migration Service or Best practices for Database Migration Service, or reach out to AWS Support for further assistance.

Optionally, you can validate the successful completion of the data migration by querying the target DynamoDB table from the console. Use the “Get live item count” option to get the total record count for the table.

(Figure: items summary with live item count)

When you choose Start scan under Get live item count, DynamoDB performs a scan to determine the most recent item count. This scan can consume additional table read capacity units, so it is generally not recommended on very large tables or tables that serve critical production traffic. You can pause the action at any time to avoid consuming extra read capacity.
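
The same count is available from the AWS CLI with a COUNT-only scan, subject to the same read-capacity caveat on large tables:

aws dynamodb scan \
    --table-name vaccine_drive_stats_tab \
    --select COUNT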

After you complete the migration, validate that your data migrated successfully, and delete the cloud resources that you created.

Conclusion

We covered all of the steps that you need to migrate a table from RDS for MySQL to Amazon DynamoDB, using the available configuration options to complete the migration in less time. Once the data is completely migrated to the target, you can cut your application traffic over to the DynamoDB table. In this walkthrough, we met the crucial business requirements by using AWS DMS. Try out these steps to migrate your data to DynamoDB and explore how you can centralize your data with a low-cost solution. To learn more about AWS DMS, see the Database Migration Service User Guide.