Database migration considerations - Amazon Aurora MySQL Migration Handbook

Database migration considerations

A database represents a critical component in the architecture of most applications. Migrating the database to a new platform is a significant event in an application’s lifecycle and may have an impact on application functionality, performance, and reliability.

Migrations are among the most time-consuming and critical tasks handled by database administrators. Although the task has become easier with the advent of managed migration services such as AWS Database Migration Service, large-scale database migrations still require adequate planning to meet strict compatibility and performance requirements.

Migration phases

Because database migrations tend to be complex, AWS advocates taking a phased, iterative approach.

Diagram showing migration phases

Migration phases

This paper considers these major contributors to the success of every database migration project:

  • Factors that justify the migration to Amazon Aurora, such as compatibility, performance, cost, and high availability and durability

  • Best practices for choosing the optimal migration method

  • Best practices for planning and implementing a migration

  • Migration troubleshooting hints

This section discusses important considerations that apply to most database migration projects. For an extended discussion of related topics, see the Amazon Web Services (AWS) whitepaper Migrating Your Databases to Amazon Aurora.

Features and compatibility

Although most applications can be architected to work with many relational database engines, you should make sure that your application works with Amazon Aurora. Amazon Aurora is designed to be wire-compatible with MySQL 5.6,5.7 and 8.0 Therefore, most of the code, applications, drivers, and tools that are used today with MySQL databases can be used with Aurora with little or no change.

However, certain MySQL features, such as the MyISAM storage engine, are not available to use with Amazon Aurora.

Also, due to the managed nature of the Aurora service, SSH access to database nodes is restricted, which may affect your ability to install third-party tools or plugins on the database host.

For more details, refer to Aurora on Amazon RDS in the Amazon Relational Database Service (Amazon RDS) User Guide.

Performance

Performance is often the key motivation behind database migrations. However, deploying your database on Amazon Aurora can be beneficial even if your applications don’t have performance issues. For example, Amazon Aurora scalability features can greatly reduce the amount of engineering effort required to prepare your database platform for future traffic growth.

You should include benchmarks and performance evaluations in every migration project. Therefore, many successful database migration projects start with performance evaluations of the new database platform. Although the RDS Aurora Performance Assessment Benchmarking paper gives you a decent idea of overall database performance, these benchmarks do not emulate the data access patterns of your applications. For more useful results, test the database performance for time-sensitive workloads by running your queries (or subset of your queries) on the new platform directly.

Consider these strategies:

  • If your current database is MySQL, migrate to Amazon Aurora with downtime and performance test your database with a test or staging version of your application or by replaying the production workload.

  • If you are on a non-MySQL compliant engine, you can selectively copy the busiest tables to Amazon Aurora and test your queries for those tables. This gives you a good starting point. Of course, testing after complete data migration will provide a full picture of real-world performance of your application on the new platform.

Amazon Aurora delivers comparable performance with commercial engines and significant improvement over MySQL performance.

It does this by tightly integrating the database engine with an SSD-based virtualized storage layer designed for database workloads. This reduces writes to the storage system, minimizes lock contention, and eliminates delays created by database process threads.

One area where Amazon Aurora significantly improves upon traditional MySQL is highly concurrent workloads. In order to maximize your workload’s throughput on Amazon Aurora, AWS recommends architecting your applications to drive a large number of concurrent queries.

Cost

Amazon Aurora provides consistent high performance together with the security, availability, and reliability of a commercial database at one-tenth the cost.

Owning and running databases come with associated costs. Before planning a database migration, an analysis of the total cost of ownership (TCO) of the new database platform is imperative. Migration to a new database platform should ideally lower the total cost of ownership while providing your applications with similar or better features. If you are running an open-source database engine (MySQL, Postgres), your costs are largely related to hardware, server management, and database management activities. However, if you are running a commercial database engine (Oracle, Microsoft SQL Server, DB2 and so on), a significant portion of your cost is database licensing.

Amazon Aurora can even be more cost-efficient than open-source databases because its high scalability helps you reduce the number of database instances that are required to handle the same workload.

For more details, refer to the Amazon RDS for Aurora Pricing page.

Availability and durability

High-availability and disaster recovery are important considerations for databases. Your application may already have very strict recovery time objective (RTO) and recovery point objective (RPO) requirements. Amazon Aurora can help you meet or exceed your availability goals by having the following components:

  1. Read replicas — Increase read throughput to support high-volume application requests by creating up to 15 database Aurora replicas. Amazon Aurora Replicas share the same underlying storage as the primary instance, lowering costs and avoiding the need to perform writes at the replica nodes. This frees up more processing power to serve read requests and reduces the replica lag time, often down to single digit milliseconds. Aurora provides a reader endpoint so the application can connect without having to keep track of replicas as they are added and removed. Aurora also supports read replica auto-scaling, where it automatically adds and removes replicas in response to changes in performance metrics that you specify.

  2. Global DatabaseAmazon Aurora Global Database is designed for globally distributed applications, allowing a single Amazon Aurora database to span multiple AWS regions. It replicates data to the secondary AWS Regions using dedicated infrastructure with no impact on database performance, enables fast local reads with low latency in each region, and provides disaster recovery from region-wide outages. Unlike MySQL binary log replication, Aurora Global Database uses storage-based replication and it removes the overhead of replaying data-modifying statements at the secondary region clusters. This reduces replication overhead and leaves more capacity available for application workloads.

  3. Multi-AZ —Aurora stores copies of the data in a DB cluster across multiple Availability Zones in a single AWS Region, regardless of whether the instances in the DB cluster span multiple Availability Zones. For more information on Aurora, refer to Managing an Amazon Aurora DB Cluster. When data is written to the primary DB instance, Aurora synchronously replicates the data across Availability Zones to six storage nodes associated with your cluster volume. Doing so provides data redundancy, eliminates I/O freezes, and minimizes latency spikes during system backups. Running a DB instance with high availability can enhance availability during planned system maintenance, and help protect your databases against failure and Availability Zone disruption.

For more information about durability and availability features in Amazon Aurora, refer to Aurora on Amazon RDS in the Amazon RDS User Guide

Planning and testing a database migration

After you determine that Amazon Aurora is the right fit for your application, the next step is to decide on a migration approach and create a database migration plan. Here are the suggested high-level steps:

  1. Review the available migration techniques described in this document, and choose one that satisfies your requirements.

  2. Prepare a migration plan in the form of a step-by-step checklist. A checklist ensures that all migration steps are completed in the correct order and that the migration process flow can be controlled (for example, suspended or resumed) without the risk of important steps being missed.

  3. Prepare a checklist with rollback procedures. Ideally, you should be able to roll the migration back to a known, consistent state from any point in the migration checklist.

  4. Use the checklist to perform a test migration, and take note of the time required to complete each step. If any missing steps are identified, add them to the checklist. If any issues are identified during the test migration, address them and rerun the test migration.

  5. Test all rollback procedures. If any rollback procedure has not been tested successfully, assume that it will not work.

  6. After you complete the test migration and become fully comfortable with the migration plan, do the migration.

Summary of available migration methods

This section lists common migration sources and the migration methods available to them, in order of preference. Detailed descriptions, step-by-step instructions, and tips for advanced migration scenarios are available in subsequent sections.

Table 1 — Summary of available migration methods

Source database Target database Suggested migration tool options
Self-managed MySQL Aurora MySQL Percona XtraBackup - preferred
mysqldump or mydumper/myloader
AWS DMS
Amazon RDS for MySQL Aurora MySQL Aurora read replica - preferred
Amazon RDS snapshot restore
MySQL compatible (for example, MariaDB) Aurora MySQL Logical migration using native tools - preferred
AWS DMS
Commercial Database (for example, SQL Server or Oracle) Aurora MySQL AWS DMS - preferred