MyDumper

MyDumper (GitHub) is an open-source, logical migration tool that consists of two utilities:

  • mydumper exports a consistent backup of MySQL databases. It supports backing up the database by using multiple parallel threads, up to one thread per available CPU core.

  • myloader reads the backup files created by mydumper, connects to the target database instance, and then restores the database.

The following diagram shows the high-level steps involved in migrating a database by using a mydumper backup file. This architecture diagram includes three options for migrating the backup file from the on-premises data center to an EC2 instance in the AWS Cloud.

Diagram of migrating a mydumper backup file and using myloader to restore it on the AWS DB instance.

The following are the steps for using MyDumper to migrate a database to the AWS Cloud:

  1. Install mydumper and myloader. For instructions, see How to install mydumper/myloader (GitHub).
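     A minimal install sketch, assuming a Debian or Ubuntu host; package availability varies by distribution, and the project's GitHub releases page also provides packages:

       # Install both utilities from the distribution repositories
       sudo apt-get update && sudo apt-get install -y mydumper
       # Confirm that both binaries are on the PATH
       mydumper --version
       myloader --version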

  2. Use mydumper to create a backup of the source MySQL or MariaDB database. For instructions, see How to use MyDumper.
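     The following is a minimal sketch of a backup command; the host name and user are placeholders:

       # Export the databases with four parallel threads, compressing the output files
       mydumper --host source-db.example.com --user backup_user --ask-password \
         --outputdir /backups/mydumper --threads 4 --compress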

  3. Move the backup file to an EC2 instance in the AWS Cloud by using one of the following approaches:

    Approach 3A – Mount an Amazon FSx or Amazon Elastic File System (Amazon EFS) file system on the on-premises server that runs your database instance. You can use AWS Direct Connect or AWS VPN to establish the connection. You can back up the database directly to the mounted file share, or you can perform the backup in two steps by backing up the database to a local file system and then uploading the files to the mounted Amazon FSx or Amazon EFS file system. Then mount the same file system on an EC2 instance so that the backup is available in the AWS Cloud.
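     For Amazon EFS, a mount sketch follows; the file system ID, Region, and mount point are placeholders, and the same command is run on both the on-premises server and the EC2 instance:

       # Mount the EFS file system over NFSv4.1 (options from the standard EFS mount guidance)
       sudo mkdir -p /mnt/efs
       sudo mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 \
         fs-12345678.efs.us-east-1.amazonaws.com:/ /mnt/efs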

    Approach 3B – Use the AWS CLI, an AWS SDK, or the Amazon S3 REST API to move the backup file directly from the on-premises server to an S3 bucket. If the target S3 bucket is in an AWS Region that is far from the data center, you can use Amazon S3 Transfer Acceleration to transfer the file more quickly. Then use the s3fs-fuse file system to mount the S3 bucket on the EC2 instance.
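     A sketch of this approach with the AWS CLI and s3fs-fuse; the bucket name and paths are placeholders, and Transfer Acceleration must be enabled on the bucket before the acceleration setting has any effect:

       # On the on-premises server: upload the backup directory to Amazon S3
       aws configure set default.s3.use_accelerate_endpoint true   # optional: S3 Transfer Acceleration
       aws s3 cp /backups/mydumper s3://my-backup-bucket/mydumper/ --recursive

       # On the EC2 instance: mount the bucket with s3fs-fuse, using the instance's IAM role
       sudo mkdir -p /mnt/s3
       s3fs my-backup-bucket /mnt/s3 -o iam_role=auto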

    Approach 3C – Install the AWS DataSync agent at the on-premises data center, and then use AWS DataSync to move the backup file to an Amazon S3 bucket. Use the s3fs-fuse file system to mount the S3 bucket on the EC2 instance.
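     After the agent is activated and the DataSync locations and task are created (in the console or with the AWS CLI), you can start the transfer from the CLI; the task ARN below is a placeholder:

       # Run one execution of an existing DataSync task that copies the backup files to S3
       aws datasync start-task-execution \
         --task-arn arn:aws:datasync:us-east-1:111122223333:task/task-0123456789abcdef0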

    Note

    You can also use Amazon S3 File Gateway to transfer the large database backup files to an S3 bucket in the AWS Cloud. For more information, see Using Amazon S3 File Gateway to transfer backup files in this guide.

  4. Use myloader to restore the backup on the target database instance. For instructions, see myloader usage (GitHub).
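     A minimal restore sketch; the endpoint and backup directory are placeholders, and --overwrite-tables drops and re-creates tables that already exist on the target:

       # Restore the backup with four parallel threads
       myloader --host target-db.example.com --user admin --ask-password \
         --directory /mnt/s3/mydumper --threads 4 --overwrite-tables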

  5. (Optional) Set up replication between the source database and the target database instance. You can use binary log (binlog) replication to reduce downtime during cutover. For more information, see the binary log replication documentation for MySQL and for your target AWS database service.
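     A sketch for an Amazon RDS for MySQL or Aurora MySQL target, assuming you take the binlog file name and position from the metadata file that mydumper writes; the host, credentials, and binlog coordinates are placeholders:

       # Point the target at the source and start replication (RDS/Aurora stored procedures)
       mysql -h target-db.example.com -u admin -p -e "
         CALL mysql.rds_set_external_master('source-db.example.com', 3306,
              'repl_user', 'repl_password', 'mysql-bin.000123', 4, 0);
         CALL mysql.rds_start_replication;"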

Advantages

  • MyDumper supports parallelism by using multi-threading, which improves the speed of backup and restore operations.

  • MyDumper avoids expensive character set conversion routines, which helps keep backup and restore operations highly efficient.

  • MyDumper simplifies the data view and parsing by dumping tables and metadata to separate files.

  • MyDumper maintains a consistent snapshot across all threads and records accurate binary log positions for the source and its replicas.

  • You can use Perl Compatible Regular Expressions (PCRE) to specify whether to include or exclude tables or databases.
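  For example, a negative-lookahead pattern can exclude schemas from the dump; the connection details are placeholders:

    # Back up everything except the mysql and sys schemas
    mydumper --host source-db.example.com --user backup_user --ask-password \
      --outputdir /backups/mydumper --regex '^(?!(mysql\.|sys\.))'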

Limitations

  • You might choose a different tool if your data transformation processes require intermediate dump files in flat format instead of SQL format.

  • myloader doesn't import database user accounts automatically. If you are restoring the backup to Amazon RDS or Aurora, recreate the users with the required permissions. For more information, see Master user account privileges in the Amazon RDS documentation. If you are restoring the backup to an Amazon EC2 database instance, you can manually export the source database user accounts and import them into the EC2 instance.
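  A sketch of re-creating a user on the target; the user name, host, and privileges are placeholders and should mirror the grants on the source:

    # Re-create an application user and its grants on the target instance
    mysql -h target-db.example.com -u admin -p -e "
      CREATE USER 'app_user'@'%' IDENTIFIED BY 'change-me';
      GRANT SELECT, INSERT, UPDATE, DELETE ON appdb.* TO 'app_user'@'%';"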

Best practices

  • Configure mydumper to divide each table into segments, such as 10,000 rows in each segment, and write each segment in a separate file. This makes it possible to import the data in parallel later.
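  For example (placeholder connection details):

    # Split each table into chunks of about 10,000 rows, one output file per chunk
    mydumper --host source-db.example.com --user backup_user --ask-password \
      --outputdir /backups/mydumper --rows 10000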

  • If you are using the InnoDB engine, use the --trx-consistency-only option to minimize locking.
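  A sketch (placeholder connection details):

    # For InnoDB-only workloads, release the global lock as soon as the dump transactions start
    mydumper --host source-db.example.com --user backup_user --ask-password \
      --outputdir /backups/mydumper --trx-consistency-only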

  • Exporting the database with mydumper is read-intensive, and the process can affect the overall performance of the production database. If you have a replica database instance, run the export from the replica. Before you run the export, stop the replication SQL thread on the replica; this helps the export process run more quickly.
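  A sketch, assuming a MySQL 8.0.22 or later replica (older versions use STOP SLAVE SQL_THREAD); the host is a placeholder:

    # Pause the SQL applier thread on the replica, run the export, then resume it
    mysql -h replica-db.example.com -u admin -p -e "STOP REPLICA SQL_THREAD;"
    # ... run mydumper against the replica here ...
    mysql -h replica-db.example.com -u admin -p -e "START REPLICA SQL_THREAD;"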

  • Don't export the database during peak business hours. Avoiding peak hours can stabilize the performance of your primary production database during the database export.