Multi-threaded migration using mydumper and myloader

mydumper and myloader are popular open-source MySQL export/import tools designed to address performance issues associated with the legacy mysqldump program. They operate on SQL-format dumps and offer advanced features such as the following:

  • Dumping and loading data using multiple parallel threads

  • Creating dump files in a file-per-table fashion

  • Creating chunked dumps in a multiple-files-per-table fashion

  • Dumping data and metadata into separate files for easier parsing and management

  • Configurable transaction size during import

  • Ability to schedule dumps at regular intervals

For more details, refer to the MySQL Data Dumper project page.
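
For example, the chunking and transaction-size features map directly to command-line options. The following is an illustrative sketch only (option names as of mydumper/myloader 0.10.x; confirm them with the built-in help on your build):

# Dump, splitting each table into chunks of roughly 100,000 rows
# (produces multiple files per table)
mydumper -h <source_server_address> -u <source_user> -p <source_user_password> \
    -B <source_schema> -t 4 --rows 100000 -o <output_directory>

# Import, committing after every 500 queries instead of the default
myloader -h <target_endpoint> -u <target_user> -p <target_user_password> \
    -B <target_schema> -t 4 --queries-per-transaction 500 -d <dump_directory>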

Approach

The scenario uses the mydumper and myloader tools to perform a multi-threaded schema and data migration without the need to manually invoke any SQL commands or design custom migration scripts.

The migration is performed in two steps:

  1. Use the mydumper tool to create a schema and data dump using multiple parallel threads.

  2. Use the myloader tool to process the dump files and import them into an Aurora DB cluster, also in multi-threaded fashion.

Note that mydumper and myloader might not be readily available in the package repository of your Linux/Unix distribution. For your convenience, the scenario also shows how to build the tools from source code.

When to use

You can adopt this approach in most migration projects:

  • The utilities are easy to use and enable database users to perform multi-threaded dumps and imports without the need to develop custom migration scripts.

  • Both tools are highly flexible and have reasonable configuration defaults. You can adjust the default configuration to satisfy the requirements of both small- and large-scale migrations.

When to consider other options

You might decide not to use this approach if any of the following conditions are true:

  • You are migrating from an RDS MySQL DB instance or a self-managed MySQL database. In that case, you might get better results with Aurora Read Replica or Percona XtraBackup, respectively. Refer to the Migrating from Amazon RDS for MySQL and Percona XtraBackup sections for more details.

  • You can’t use third-party software because of operating system limitations.

  • Your data transformation processes require intermediate dump files in a flat-file format and not an SQL format.

Notes

To simplify the demonstration, this scenario assumes the following:

  • You run the migration commands from client instances running a Linux operating system:

    • Client instance A is located in the source server’s network

    • Client instance B is located in the same Amazon VPC, Availability Zone, and Subnet as the target Aurora cluster

  • The source server is a self-managed MySQL database (for example, running on Amazon EC2 or on premises) configured to allow connections from client instance A.

  • The target Aurora DB cluster already exists and is configured to allow connections from client instance B. If you don’t have an Aurora DB cluster yet, review the step-by-step cluster launch instructions in the Amazon RDS User Guide.

  • Communication is allowed between both client instances.

  • You perform the export from the source database using a privileged, superuser MySQL account. For simplicity, the example assumes that the user holds all permissions available in MySQL.

  • You perform the import into Amazon Aurora using the primary user account, that is, the account whose name and password were specified during the cluster launch process.

  • The Amazon Linux 2.0.20210701 operating system is used to demonstrate the configuration and compilation steps for mydumper and myloader.

Note

This migration approach requires application downtime while the dump and import are in progress. You can avoid application downtime by extending the scenario with MySQL binary log replication. Refer to the Self-Managed Migration with Near-Zero Downtime section for more details.
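
For orientation, extending the scenario typically means capturing the source server's binary log coordinates from the mydumper metadata file and starting replication on the Aurora cluster after the import completes. A minimal sketch using the Amazon RDS replication stored procedures (the replication user, host, and binary log coordinates are placeholders you must substitute with your own values):

[ec2-user@clientB ~]$ mysql -h <cluster_dns_endpoint> -u master -p <<'SQL'
-- Point the Aurora cluster at the source server's binary log position
CALL mysql.rds_set_external_master('<source_server_address>', 3306,
    'repl_user', 'repl_password', 'mysql-bin.000003', 107, 0);
-- Begin replicating changes made since the dump was taken
CALL mysql.rds_start_replication;
SQL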

Examples (preparing tools)

The first step is to install the dependencies of the mydumper and myloader tools. Refer to the MySQL Data Dumper project page for up-to-date download links, and make sure that the tools are prepared on both client instances.

  1. The utilities depend on several packages that you should install first.

    [ec2-user@clientA ~]$ sudo yum install gcc gcc-c++ git cmake make glib2-devel \
        mysql56-devel zlib-devel pcre-devel openssl-devel
  2. Install mydumper/myloader.

    [ec2-user@clientA ~]$ sudo yum install https://github.com/maxbube/mydumper/releases/download/v0.10.7-2/mydumper-0.10.7-2.el7.x86_64.rpm
  3. Confirm that both utilities are available in the system.

    [ec2-user@clientA ~]$ mydumper -V
    mydumper 0.10.7, built against MySQL 5.7.34-37
    [ec2-user@clientA ~]$ myloader -V
    myloader 0.10.7, built against MySQL 5.7.34-37
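
If prebuilt packages aren't available for your distribution, you can build the tools from source instead, which is what the dependencies installed in step 1 are for. A minimal sketch using the project's CMake build (adjust the release tag to the version you need):

    [ec2-user@clientA ~]$ git clone --branch v0.10.7-2 https://github.com/maxbube/mydumper.git
    [ec2-user@clientA ~]$ cd mydumper
    [ec2-user@clientA mydumper]$ cmake .
    [ec2-user@clientA mydumper]$ make
    [ec2-user@clientA mydumper]$ sudo make install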

Examples (migration)

After completing the preparation steps, you can perform the migration.

The mydumper command uses the following basic syntax:

mydumper -h <source_server_address> -u <source_user> \
    -p <source_user_password> -B <source_schema> \
    -t <thread_count> -o <output_directory>

Descriptions of the parameter values are as follows:

  • <source_server_address> — DNS name or IP address of the source server

  • <source_user> — MySQL user account name on the source server

  • <source_user_password> — MySQL user account password on the source server

  • <source_schema> — Name of the schema to dump

  • <thread_count> — Number of parallel threads used to dump the data

  • <output_directory> — Name of the directory where dump files should be placed

mydumper is a highly customizable data dumping tool. For a complete list of supported parameters and their default values, use the built-in help.

mydumper --help
  1. Create the dump file using mydumper. The example dump runs as follows:

    [ec2-user@clientA ~]$ mydumper -h 11.22.33.44 -u root \
        -p pAssw0rd -B myschema -t 4 -o myschema_dump/

    The operation results in the following files being created in the dump directory.

    [ec2-user@clientA ~]$ ls -sh1 myschema_dump
    total 733M
    4.0K metadata
    4.0K myschema-schema-create.sql
    4.0K myschema.t1-schema.sql
    184M myschema.t1.sql
    4.0K myschema.t2-schema.sql
    184M myschema.t2.sql
    4.0K myschema.t3-schema.sql
    184M myschema.t3.sql
    4.0K myschema.t4-schema.sql
    184M myschema.t4.sql

    The directory contains a collection of metadata files in addition to the schema and data dumps. You don't have to manipulate these files directly; it's enough that the myloader tool understands the directory structure. (The metadata file also records the source server's binary log coordinates; see the sketch after this list.)

  2. Compress the entire directory and transfer it to client instance B.

    [ec2-user@clientA ~]$ tar czf myschema_dump.tar.gz myschema_dump
    [ec2-user@clientA ~]$ scp -i ssh-key.pem myschema_dump.tar.gz \
        <clientB_ssh_user>@<clientB_address>:/home/ec2-user/
  3. When the transfer is complete, connect to client instance B and verify that the myloader utility is available.

    [ec2-user@clientB ~]$ myloader -V
    myloader 0.10.7, built against MySQL 5.7.34-37
  4. Now you can unpack the dump and import it. The syntax used for the myloader command is very similar to what you already used for mydumper. The only difference is the -d (source directory) parameter replacing the -o (output directory) parameter.

    [ec2-user@clientB ~]$ tar zxf myschema_dump.tar.gz
    [ec2-user@clientB ~]$ myloader -h <cluster_dns_endpoint> \
        -u master -p pAssw0rd -B myschema -t 4 -d myschema_dump/
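
As noted in step 1, the dump directory's metadata file records the source server's binary log coordinates at the time of the dump, which you need if you later extend the migration with binary log replication. An illustrative listing (timestamps and coordinates are placeholders; the exact layout may vary by mydumper version):

    [ec2-user@clientA ~]$ cat myschema_dump/metadata
    Started dump at: 2021-07-01 12:00:00
    SHOW MASTER STATUS:
            Log: mysql-bin.000003
            Pos: 107
    Finished dump at: 2021-07-01 12:09:30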

Useful tips

  • The concurrency level (thread count) does not have to be the same for the export and import operations. A good rule of thumb is to use one thread per server CPU core for dumps and one thread per two CPU cores for imports; see the first sketch after this list.

  • The schema and data dumps produced by mydumper use an SQL format and are compatible with MySQL 5.6. Although you will typically use the mydumper and myloader tools together for best results, technically you can import the dump files produced by mydumper by using any other MySQL-compatible client tool, as in the second sketch after this list.
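
To illustrate the thread-count rule of thumb, a hedged sketch (nproc reports the client instance's core count; substitute the database server's core count where that differs):

    # Dump using one thread per CPU core
    [ec2-user@clientA ~]$ mydumper -h 11.22.33.44 -u root -p pAssw0rd \
        -B myschema -t $(nproc) -o myschema_dump/

    # Import using one thread per two CPU cores
    [ec2-user@clientB ~]$ myloader -h <cluster_dns_endpoint> -u master \
        -p pAssw0rd -B myschema -t $(( $(nproc) / 2 )) -d myschema_dump/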
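
For example, a single table could be restored with the stock mysql client alone. A sketch assuming the file layout shown earlier (apply the schema files before the data file):

    # Create the schema, then the table, then load its data
    [ec2-user@clientB ~]$ mysql -h <cluster_dns_endpoint> -u master -p \
        < myschema_dump/myschema-schema-create.sql
    [ec2-user@clientB ~]$ mysql -h <cluster_dns_endpoint> -u master -p \
        myschema < myschema_dump/myschema.t1-schema.sql
    [ec2-user@clientB ~]$ mysql -h <cluster_dns_endpoint> -u master -p \
        myschema < myschema_dump/myschema.t1.sql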