Database migration considerations
A carefully planned data migration strategy is important to prevent a sub-optimal experience that ends up creating more problems than it solves. Be aware of the different phases of the migration process so that you can plan for and optimize each individual stage and avoid delays in the data migration. Accurately estimating the storage needs and choosing the right networking connectivity between components also helps prevent migration issues.
Migration phases
Following are the various phases typically seen during the data migration process:
Assess source database
As a first step in the data migration process, gather information about the tables in the source database that need to be migrated to the target database. Understand your organization's requirements: whether you need to perform a one-time migration (via full load) or migrate the data on an ongoing basis (via change data capture). It is important to be aware of the table sizes as well as the database size, and to verify whether any large objects (LOBs) are present in the migration scope. LOBs are generally migrated one at a time, piece by piece. They can significantly slow down the migration and require more processing resources than standard objects. For more information on how LOBs are handled by AWS DMS, refer to Setting LOB support for source databases in an AWS DMS task.
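For illustration, the following is a minimal boto3 sketch (not from this whitepaper) of creating a full-load task with limited LOB mode. The ARNs are placeholders, and the LobMaxSize value is an assumption that you should size to your largest LOB column:

```python
import json
import boto3

dms = boto3.client("dms")

# Limited LOB mode: migrate LOBs in pieces up to LobMaxSize (in KB).
# The 32 KB value below is only an illustrative assumption.
task_settings = {
    "TargetMetadata": {
        "SupportLobs": True,
        "FullLobMode": False,
        "LimitedSizeLobMode": True,
        "LobMaxSize": 32,  # KB; LOBs larger than this are truncated
    }
}

dms.create_replication_task(
    ReplicationTaskIdentifier="full-load-with-lobs",
    SourceEndpointArn="arn:aws:dms:...:endpoint:source",    # placeholder
    TargetEndpointArn="arn:aws:dms:...:endpoint:target",    # placeholder
    ReplicationInstanceArn="arn:aws:dms:...:rep:instance",  # placeholder
    MigrationType="full-load",
    TableMappings=json.dumps({
        "rules": [{
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-all",
            "object-locator": {"schema-name": "%", "table-name": "%"},
            "rule-action": "include",
        }]
    }),
    ReplicationTaskSettings=json.dumps(task_settings),
)
```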
Also understand the nature of the transactions (Data Manipulation Language (DML), Data Query Language (DQL), Data Definition Language (DDL), and so on) happening on your database, and calculate the transactions per second (TPS). Based on this information, you can isolate the busy tables that have a higher number of transactions to prevent them from impacting the migration of other tables, and you can allocate additional resources for migrating these tables. Be aware of indexes, triggers, and referential integrity database constraints, which can affect migration performance. It's a best practice to use native monitoring for your source database (DB) engine to be sure that your DB isn't experiencing a performance bottleneck; for example, CPU or memory contention, or input/output (IO) saturation.
If the source database is already constrained on resources, the data migration will be further slowed down. It is important to understand the behavior of your source database engine to come up with an optimal migration plan. For a comprehensive list of valid sources supported by AWS DMS, refer to Sources for AWS DMS.
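To illustrate isolating busy tables, the following hedged sketch shows table-mapping selection rules that give a hypothetical SALES.ORDERS table its own replication task, while a second task migrates the rest of the schema:

```python
import json

# Task 1: migrate only the busy table (hypothetical names).
busy_table_mappings = json.dumps({
    "rules": [{
        "rule-type": "selection",
        "rule-id": "1",
        "rule-name": "busy-table-only",
        "object-locator": {"schema-name": "SALES", "table-name": "ORDERS"},
        "rule-action": "include",
    }]
})

# Task 2: migrate everything else in the schema, excluding the busy table.
other_tables_mappings = json.dumps({
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-schema",
            "object-locator": {"schema-name": "SALES", "table-name": "%"},
            "rule-action": "include",
        },
        {
            "rule-type": "selection",
            "rule-id": "2",
            "rule-name": "exclude-busy-table",
            "object-locator": {"schema-name": "SALES", "table-name": "ORDERS"},
            "rule-action": "exclude",
        },
    ]
})
```

Each mapping document would be passed as the TableMappings parameter of its own replication task, so that the busy table gets dedicated replication resources.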
Since the target engine (Amazon Redshift) is different from the source database engine, you might need to convert the source database schema and a majority of the database code objects, including views, stored procedures, and functions, to a format compatible with the target. AWS Schema Conversion Tool (AWS SCT) can help with this requirement. When using SCT, you can create mapping rules, and any objects that cannot be converted automatically are clearly marked so that they can be converted manually to complete the migration. You can review the database migration assessment report, which summarizes all of the action items for schemas that can't be converted automatically to the target engine. Once schema conversion is complete, SCT can help migrate data from a range of data warehouses to Amazon Redshift using built-in data migration agents.
The following screenshot shows the list of action items shown by the database migration assessment report in SCT.

The list of action items shown by the database migration assessment report in SCT
Determine the target
AWS DMS can use many of the most popular databases as a target for data replication. For a comprehensive list of valid targets, refer to Targets for AWS DMS. It's important to be aware of your organization's latency requirements and business use cases when choosing a target for migrating your data. For example, Amazon Redshift is optimized for online analytical processing (OLAP), meaning it is used to aggregate large quantities of data across long time periods and may not require real-time updates. To ensure minimal disruption to those processes, consider loading in batches with larger latencies.
For real-time updates, consider using online transaction processing (OLTP) systems such as Amazon Relational Database Service (Amazon RDS).
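As a sketch of what loading in batches with larger latencies can look like on a Redshift target, the following DMS task settings enable batch apply for CDC changes. The timeout and memory values are illustrative assumptions, not recommendations:

```python
# Apply CDC changes in batches instead of one transaction at a time.
batch_apply_settings = {
    "TargetMetadata": {
        "BatchApplyEnabled": True,
    },
    "ChangeProcessingTuning": {
        "BatchApplyTimeoutMin": 60,    # wait at least 60 seconds between batches
        "BatchApplyTimeoutMax": 120,   # apply a batch after at most 120 seconds
        "BatchApplyMemoryLimit": 500,  # MB available for batch pre-processing
    },
}
```

Larger batch timeouts trade latency for fewer, more efficient loads into the OLAP target.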
Connectivity options
It is important to validate and monitor connectivity between the source database and AWS DMS, and between AWS DMS to the target, to avoid any performance issues. Connectivity options vary depending on where the different components are deployed.
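One way to validate connectivity is the DMS TestConnection API. The following boto3 sketch (with placeholder ARNs) starts a connection test between the replication instance and an endpoint, then checks the result:

```python
import boto3

dms = boto3.client("dms")

endpoint_arn = "arn:aws:dms:...:endpoint:source"  # placeholder

# Kick off a connectivity test from the replication instance to the endpoint.
dms.test_connection(
    ReplicationInstanceArn="arn:aws:dms:...:rep:instance",  # placeholder
    EndpointArn=endpoint_arn,
)

# Check the status of the test ("testing", "successful", or "failed").
response = dms.describe_connections(
    Filters=[{"Name": "endpoint-arn", "Values": [endpoint_arn]}]
)
for connection in response["Connections"]:
    print(connection["Status"])
```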
The location of the AWS DMS replication instance also varies based on the target chosen. For example, if an Amazon Redshift cluster is used as a target, it must be in the same AWS account and the same AWS Region as the replication instance. If the source endpoint, the replication instance, and the target endpoint are in the same VPC, traffic flows within the VPC. If different VPCs are used for deployment, you can link these VPCs together by using VPC peering or AWS Transit Gateway. If the source endpoint is an on-premises database in your corporate data center, it can be connected to AWS resources using AWS Direct Connect or a VPN connection.
However, it is highly recommended to choose the right option for connecting your resources, to avoid latency issues. AWS Direct Connect provides the shortest path to AWS resources, because it is a dedicated connection between your corporate data centers and AWS. While in transit, your network traffic remains on the AWS global network and never goes over the internet. This reduces the chance of hitting bottlenecks or unexpected increases in latency compared to using a VPN or the public internet.

VPN connection going over the public internet

Using AWS Direct Connect as a dedicated connection between an on-premises database and AWS
Full load (initial load) phase
Full load is used to load all of the data in the source tables to the target, and is typically used for one-time data migrations. During a full load task, AWS DMS performs a full table scan on each of the source tables in parallel. Because all of the data in the tables is being loaded, it is important to optimize the full load settings to accelerate the migration of large or partitioned tables. These configurations may vary depending on the type of target chosen. AWS SCT has agents that can help with the full load to Amazon Redshift. During a full load migration task, you can accelerate the migration of large or partitioned tables by splitting the table into segments and loading the segments in parallel in the same migration task. Table segmentation is discussed in more detail later in this whitepaper.
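As a sketch of table segmentation, the following table mapping adds a table-settings rule with a parallel-load definition of type ranges for a hypothetical SALES.ORDERS table; the column and boundary values are illustrative assumptions:

```python
import json

table_mappings = json.dumps({
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-orders",
            "object-locator": {"schema-name": "SALES", "table-name": "ORDERS"},
            "rule-action": "include",
        },
        {
            # Split the table into four segments on order_id and load them
            # in parallel within the same task.
            "rule-type": "table-settings",
            "rule-id": "2",
            "rule-name": "segment-orders",
            "object-locator": {"schema-name": "SALES", "table-name": "ORDERS"},
            "parallel-load": {
                "type": "ranges",
                "columns": ["order_id"],
                "boundaries": [["1000000"], ["2000000"], ["3000000"]],
            },
        },
    ]
})
```

For tables that are already partitioned at the source, parallel-load also supports the partitions-auto type, which loads each partition in its own thread.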
CDC phase (insert, update, delete)
If ongoing data migration from the source database to the target is required, you can use the CDC functionality provided by AWS DMS to capture incremental changes. This process works by collecting changes to the database logs using the database engine's native API. If the CDC process is not tuned properly, your workloads may face high source or target latencies.
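To spot high source or target latency during CDC, you can poll the CDCLatencySource and CDCLatencyTarget CloudWatch metrics that DMS publishes for each task. A minimal sketch, with placeholder identifiers (the task dimension expects the task's resource identifier):

```python
from datetime import datetime, timedelta

import boto3

cloudwatch = boto3.client("cloudwatch")

for metric_name in ("CDCLatencySource", "CDCLatencyTarget"):
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/DMS",
        MetricName=metric_name,
        Dimensions=[
            {"Name": "ReplicationInstanceIdentifier", "Value": "my-rep-instance"},
            {"Name": "ReplicationTaskIdentifier", "Value": "my-task-resource-id"},
        ],
        StartTime=datetime.utcnow() - timedelta(hours=1),
        EndTime=datetime.utcnow(),
        Period=300,                 # 5-minute granularity
        Statistics=["Average"],
    )
    for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
        print(metric_name, point["Timestamp"], round(point["Average"], 1), "seconds")
```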
Storage
It is a good idea to be aware of the amount of data that needs to be migrated from the source database, so that you can plan accordingly for the storage required by the AWS DMS replication instance as well as by Amazon Redshift (the target).
For example, if you are using an Oracle database as a source, be aware of the number of log switches and the amount of redo logs generated per day.

Example showing the number of log switches and redo generation per day
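A hedged sketch of how you might measure this yourself on the source, assuming the python-oracledb driver and read access to V$LOG_HISTORY (the connection details are placeholders):

```python
import oracledb

# Placeholder credentials and DSN for the Oracle source.
conn = oracledb.connect(user="dms_user", password="***",
                        dsn="source-db-host:1521/ORCL")

# Count log switches per day as a proxy for daily redo generation.
with conn.cursor() as cur:
    cur.execute("""
        SELECT TRUNC(first_time) AS day, COUNT(*) AS log_switches
        FROM v$log_history
        GROUP BY TRUNC(first_time)
        ORDER BY day
    """)
    for day, switches in cur:
        print(day, switches)
```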
In AWS DMS, most of the processing happens in memory. However, large transactions might require some buffering on disk. Cached transactions and log files are also written to disk. If your source system is busy or processes large transactions, you might need to increase the storage on DMS as well as on the target systems. If you're running multiple tasks on the replication server, you might also need a storage increase.
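If the disk does fill up, the allocated storage of a replication instance can be increased in place. A minimal boto3 sketch, with a placeholder ARN and an illustrative 200 GiB value:

```python
import boto3

dms = boto3.client("dms")

dms.modify_replication_instance(
    ReplicationInstanceArn="arn:aws:dms:...:rep:instance",  # placeholder
    AllocatedStorage=200,   # GiB; illustrative value
    ApplyImmediately=True,
)
```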
If you have detailed debug logging turned on for DMS, ensure that you turn it off when you don't need it, because detailed debug logging requires a large amount of storage space and can potentially fill up the disk, leading to DMS task failures. For more information about storage issues, refer to Why is my AWS DMS replication DB instance in the storage-full status?
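As a minimal sketch, the following resets a task's logging from detailed debug back to the default severity using boto3. The task ARN is a placeholder, and only two of the available log components are shown:

```python
import json

import boto3

dms = boto3.client("dms")

logging_settings = {
    "Logging": {
        "EnableLogging": True,
        "LogComponents": [
            # Back to default severity instead of LOGGER_SEVERITY_DETAILED_DEBUG.
            {"Id": "SOURCE_CAPTURE", "Severity": "LOGGER_SEVERITY_DEFAULT"},
            {"Id": "TARGET_APPLY", "Severity": "LOGGER_SEVERITY_DEFAULT"},
        ],
    }
}

dms.modify_replication_task(
    ReplicationTaskArn="arn:aws:dms:...:task:my-task",  # placeholder
    ReplicationTaskSettings=json.dumps(logging_settings),
)
```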
Based on the amount of data that needs to be migrated into your Amazon Redshift cluster, you can choose the right instance types. You can use the Amazon Redshift sizing calculator to estimate the size of the cluster you need.
Networking bandwidth
If you are using a VPN tunnel between your on-premises source database and AWS DMS, ensure that the bandwidth is sufficient for your workload. Each AWS Site-to-Site VPN connection consists of two tunnels, and each tunnel supports a maximum throughput of up to 1.25 Gbps.
To optimize performance, ensure that your network throughput is fast. When you address network performance, get a network sniffer trace to analyze it. In addition, use network monitoring tools to identify potential bottlenecks.
Another simple way to measure network speed is to transfer a file from your source database server to an Amazon Elastic Compute Cloud (Amazon EC2) instance in the same virtual private cloud (VPC) as the AWS DMS replication instance, and note how long the transfer takes. The transfer speed can then be calculated by dividing the amount of data transferred by the time taken:

Transfer speed = Amount of data transferred / Transfer time

Source database statistics
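A worked example of the calculation, with illustrative numbers:

```python
# Example: a 10 GB file transferred in 15 minutes.
data_transferred_mb = 10 * 1024      # 10 GB expressed in MB
transfer_time_seconds = 15 * 60      # 15 minutes in seconds

transfer_speed = data_transferred_mb / transfer_time_seconds
print(f"{transfer_speed:.1f} MB/s")  # ~11.4 MB/s
```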
Verify the NetworkReceiveThroughput metric on the DMS side, which indicates the incoming traffic on the replication instance.

An example of average network traffic of 11 MB/s on the DMS instance
Pricing for migration
AWS DMS pricing varies based on the type of replication instance chosen and any additional log storage. Each database migration instance includes storage sufficient for swap space, replication logs, and data cache for most replications. All data transfer into AWS DMS is free, and data transferred between AWS DMS and Amazon RDS, Amazon Redshift, and Amazon EC2 instances in the same Availability Zone is also free. Standard AWS data transfer rates apply when you migrate your source database to a target database in a different Availability Zone, a different Region, or outside of AWS. Standard AWS data transfer rates are listed on the EC2 instance pricing page.