Self-managed homogeneous migrations

This section provides examples of migration scenarios from self-managed MySQL-compatible databases to Amazon Aurora.

Note

If you are migrating from an Amazon RDS MySQL DB instance, you can use an Aurora read replica or the RDS snapshot migration feature instead of performing a self-managed migration. Refer to the Migrating from Amazon RDS for MySQL section for more details.

Migrating using Percona XtraBackup

The recommended way to migrate data from MySQL to Amazon Aurora is to use the Percona XtraBackup utility. For more information about using the Percona XtraBackup utility, refer to Migrating Data from an External MySQL Database in the Amazon Aurora User Guide.

Approach

This scenario uses the Percona XtraBackup utility to take a binary backup of the source MySQL database. The backup files are then uploaded to an Amazon S3 bucket and restored into a new Amazon Aurora DB cluster.

When to use

You can adopt this approach for small- to large-scale migrations when the following conditions are met:

  • The source database is MySQL version 5.6 or 5.7 (you can confirm the version as sketched after this list).

  • You have administrative, system-level access to the source database.

  • You are migrating database servers in a one-to-one fashion: one source MySQL server becomes one new Aurora DB cluster.
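
A quick way to confirm the source engine version is to query it from the client. The following command is a sketch only; the placeholders are the same source host and credentials used in the backup examples later in this section.

[ec2-user@client ~]$ # sketch: confirm the source is MySQL 5.6 or 5.7 before proceeding
[ec2-user@client ~]$ mysql --host=<source_server_address> --user=<source_user> \
    --password=<source_user_password> --execute="SELECT VERSION();"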

When to consider other options

This approach is not currently supported in the following scenarios:

  • Migrating into existing Aurora DB clusters.

  • Migrating multiple source MySQL servers into a single Aurora DB cluster.

Notes:

To simplify the demonstration, this scenario assumes the following:

  • Percona XtraBackup can be installed either on the source database server, which is a self-managed MySQL database (for example, running on Amazon EC2 or on-premises), or on a separate server. For simplicity, in this example XtraBackup is installed on the source database server.

  • The user creating the new Aurora DB cluster must have the permissions required to restore the backup from Amazon S3. Review the documentation for more information about the required permissions.

  • You can perform the backup of the source database using the admin user account or an account with the minimum privileges described in the Percona XtraBackup documentation. For simplicity, this example assumes that the user holds all permissions available in MySQL (a minimal-privilege alternative is sketched after this list).

  • The Amazon Linux 2.0.20210701 operating system is used to demonstrate the configuration and installation steps for Percona XtraBackup.
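
If you prefer not to run the backup as the admin user, you can instead create a dedicated backup account with a reduced privilege set on the source server. The statements below are a sketch only: the account name bkpuser and its password are placeholders, and the authoritative minimum privilege list for your XtraBackup version is in the Percona documentation.

mysql> -- placeholder account; confirm the grant list against the Percona XtraBackup documentation
mysql> CREATE USER 'bkpuser'@'localhost' IDENTIFIED BY '<backup_user_password>';
mysql> GRANT RELOAD, LOCK TABLES, PROCESS, REPLICATION CLIENT ON *.* TO 'bkpuser'@'localhost';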

Examples (preparing tools)

The first step is to install the percona-release configuration tool.

[ec2-user@client ~]$ sudo yum install https://repo.percona.com/yum/percona-release-latest.noarch.rpm

The next step is to enable the repository and install Percona XtraBackup.

[ec2-user@client ~]$ sudo percona-release enable-only tools
[ec2-user@client ~]$ sudo yum install percona-xtrabackup-24

As a final step, confirm that Percona XtraBackup version 2.4 is installed.

[ec2-user@client ~]$ xtrabackup --version

Examples (database backup)

After installing XtraBackup on your database server, you can perform the backup. The xtrabackup command uses the following basic syntax.

xtrabackup --backup --host=<source_server_address> \
    --user=<source_user> --password=<source_user_password> \
    --stream=<backup_format> --target-dir=<destination_directory> \
    | split -d --bytes=100MB - <backup_file_name>.<backup_format>

Descriptions of the parameter values are as follows:

  • <source_server_address> — DNS name or IP address of the source server

  • <source_user> — MySQL user account name on the source server

  • <source_user_password> — MySQL user account password on the source server

  • <backup_format> — Backup file format; supported formats are tar and xbstream

  • <destination_directory> — Destination directory for the backup

  • <backup_file_name> — Name of the backup file

Note

AWS recommends splitting the backup files into multiple smaller files to reduce the backup and restore time.

The example backup runs as follows.

[ec2-user@client ~]$ xtrabackup --backup --user=root --password=Passw0rd1 \
    --stream=xbstream --target-dir=s3-restore/backup \
    | split -d --bytes=100MB - s3-restore/backup/backup.xbstream

The operation results in the following files being created in the destination directory.

[ec2-user@client ~]$ ls -sh1 s3-restore/backup
total 204M
 96M backup.xbstream00
 96M backup.xbstream01
 13M backup.xbstream02

The directory contains the backup, split into files of at most 100 MB each. Next, upload these files to an S3 bucket. In this example, the files are copied from the EC2 instance to the S3 bucket using the AWS CLI. For more information about creating an Amazon S3 bucket and uploading files to it, refer to Getting started with Amazon Simple Storage Service in the Amazon S3 Getting Started Guide.

[ec2-user@client ~]$ aws s3 cp /s3-restore/backup/ s3://aurora-s3-restore --recursive
upload: ./backup.xbstream02 to s3://aurora-s3-restore/backup.xbstream02
upload: ./backup.xbstream01 to s3://aurora-s3-restore/backup.xbstream01
upload: ./backup.xbstream00 to s3://aurora-s3-restore/backup.xbstream00

Now you can restore the Aurora MySQL DB cluster from the S3 bucket using the Amazon RDS console or the AWS CLI.
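
If you use the AWS CLI, the cluster is created directly from the uploaded backup files with the restore-db-cluster-from-s3 command, and a DB instance is then added to the new cluster with create-db-instance. The following command is a sketch only: the cluster identifier, credentials, source engine version, and IAM role ARN are placeholder values, and the referenced IAM role must allow Aurora to read the backup objects from the S3 bucket. Use --engine aurora-mysql for a MySQL 5.7-compatible source, or aurora for MySQL 5.6.

[ec2-user@client ~]$ # sketch only: replace the identifier, credentials, engine version, and role ARN with your own values
[ec2-user@client ~]$ aws rds restore-db-cluster-from-s3 \
    --db-cluster-identifier aurora-migrated-cluster \
    --engine aurora-mysql \
    --master-username <admin_user> --master-user-password <admin_user_password> \
    --source-engine mysql --source-engine-version 5.7.34 \
    --s3-bucket-name aurora-s3-restore \
    --s3-ingestion-role-arn arn:aws:iam::<account_id>:role/<s3_restore_role>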

For step-by-step instructions, refer to Restoring an Amazon Aurora MySQL DB cluster from an Amazon S3 bucket. To set up replication between the source database and Aurora, refer to Synchronizing the Amazon Aurora MySQL DB cluster with the MySQL database using replication in the Amazon Aurora User Guide.

One-step migration using mysqldump

Another migration option uses the mysqldump utility to migrate data from MySQL to Amazon Aurora.

Approach

This scenario uses the mysqldump utility to export schema and data definitions from the source server and import them into the target Aurora DB cluster in a single step without creating any intermediate dump files.

When to use

You can adopt this approach for many small-scale migrations when the following conditions are met:

  • The data set is very small (10s of GB).

  • The network connection between source and target databases is fast and stable.

  • Migration performance is not critically important, and the cost of retrying the migration is very low.

  • There is no need to do any intermediate schema or data transformations.

When to consider other options

This approach might not be an optimal choice if any of the following conditions are true:

  • You are migrating from an RDS MySQL DB instance, or from a self-managed MySQL database with a larger data set. In that case, you might get better results with an Aurora read replica or with Percona XtraBackup, respectively. For more details, refer to the Migrating from Amazon RDS for MySQL and Percona XtraBackup sections.

  • It is impossible to establish a network connection from a single client instance to source and target databases due to network architecture or security considerations.

  • The network connection between source and target databases is unstable or very slow.

  • The data set is larger than 10 GB.

  • Migration performance is critically important.

  • An intermediate dump file is required in order to perform schema or data manipulations before you can import the schema/data.

Notes

For the sake of simplicity, this scenario assumes the following:

  • Migration commands are run from a client instance running a Linux operating system.

  • The source server is a self-managed MySQL database (for example, running on Amazon EC2 or on-premises) that is configured to allow connections from the client instance.

  • The target Aurora DB cluster already exists and is configured to allow connections from the client instance. If you don’t yet have an Aurora DB cluster, review the step-by-step cluster launch instructions in the Amazon RDS User Guide.

  • Export from the source database is performed using a privileged, super-user MySQL account. For simplicity, this scenario assumes that the user holds all permissions available in MySQL.

  • Import into Amazon Aurora is performed using the Aurora admin user account, that is, the account whose name and password were specified during the cluster launch process.

Examples

The following command, when filled with the source and target server and user information, migrates data and all objects in the named schema(s) between the source and target servers.

mysqldump --host=<source_server_address> \
    --user=<source_user> \
    --password=<source_user_password> \
    --databases <schema(s)> \
    --single-transaction \
    --compress | mysql --host=<target_cluster_endpoint> \
    --user=<target_user> \
    --password=<target_user_password>

Descriptions of the options and option values for the mysqldump command are as follows:

  • <source_server_address> — DNS name or IP address of the source server.

  • <source_user> — MySQL user account name on the source server.

  • <source_user_password> — MySQL user account password on the source server.

  • <schema(s)> — One or more schema names.

  • <target_cluster_endpoint> — Cluster DNS endpoint of the target Aurora cluster.

  • <target_user> — Aurora admin user name.

  • <target_user_password> — Aurora admin user password.

  • --single-transaction — Enforces a consistent dump from the source database. Can be skipped if the source database is not receiving any write traffic.

  • --compress — Enables network data compression.

Refer to the mysqldump documentation for more details.

Example:

mysqldump --host=source-mysql.example.com \
    --user=mysql_admin_user \
    --password=mysql_user_password \
    --databases schema1 \
    --single-transaction \
    --compress | mysql --host=aurora.cluster-xxx.xx.amazonaws.com \
    --user=aurora_admin_user \
    --password=aurora_user_password

Note

This migration approach requires application downtime while the dump and import are in progress. You can avoid application downtime by extending the scenario with MySQL binary log replication. Refer to the Self-Managed Migration with Near-Zero Downtime section for more details.

Flat-file migration using files in CSV format

This scenario demonstrates a schema and data migration using flat-file dumps, that is, dumps that do not encapsulate data in SQL statements. Many database administrators prefer to use flat files over SQL-format files for the following reasons:

  • Lack of SQL encapsulation results in smaller dump files and reduces processing overhead during import.

  • Flat-file dumps are easier to process using OS-level tools; they are also easier to manage (for example, split or combine).

  • Flat-file formats are compatible with a wide range of database engines, both SQL and NoSQL.

Approach

The scenario uses a hybrid migration approach:

  • Use the mysqldump utility to create a schema-only dump in SQL format. The dump describes the structure of schema objects (for example, tables, views, and functions) but does not contain data.

  • Use SELECT INTO OUTFILE SQL commands to create data-only dumps in CSV format. The dumps are created in a one-file-per-table fashion and contain table data only (no schema definitions).

The import phase can be conducted in two ways:

  • Traditional approach — Transfer all dump files to an Amazon EC2 instance located in the same AWS Region and Availability Zone as the target Aurora DB cluster. After transferring the dump files, import them into Amazon Aurora using the mysql command-line client for the SQL-format schema dump and LOAD DATA LOCAL INFILE commands for the flat-file data dumps.

This is the approach that is demonstrated later in this section.

  • Alternative approach — Transfer the SQL-format schema dumps to an Amazon EC2 client instance, and import them using the mysql command-line client. You can transfer the flat-file data dumps to an Amazon S3 bucket and then import them into Amazon Aurora using LOAD DATA FROM S3 SQL commands.

For more information, including an example of loading data from Amazon S3, refer to Migrating Data from MySQL by Using an Amazon S3 Bucket in the Amazon RDS User Guide.
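
For orientation, a data load in the alternative approach looks roughly like the following sketch, using the schema and file naming from the example later in this section. The bucket name is a placeholder, and the Aurora DB cluster must first be associated with an IAM role that allows it to read from the bucket (for example, through the aurora_load_from_s3_role or aws_default_s3_role cluster parameter).

mysql> -- sketch only: <bucket_name> is a placeholder; the cluster needs an S3 access role before this runs
mysql> LOAD DATA FROM S3 's3://<bucket_name>/myschema_dump_t1.csv'
    -> INTO TABLE myschema.t1
    -> FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    -> LINES TERMINATED BY '\n';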

When to use

You can adopt this approach for most migration projects where performance and flexibility are important:

  • You can dump small data sets and import them one table at a time. You can also run multiple SELECT INTO OUTFILE and LOAD DATA INFILE operations in parallel for best performance.

  • Data that is stored in flat-file dumps is not encapsulated in database-specific SQL statements. Therefore, it can be handled and processed easily by the systems participating in the data exchange.

When to consider other options

You might choose not to use this approach if any of the following conditions are true:

  • You are migrating from an RDS MySQL DB instance or a self-managed MySQL database. In that case, you might get better results with snapshot migration or Percona XtraBackup, respectively. Refer to the Migrating from Amazon RDS for MySQL and Percona XtraBackup sections for more details.

  • The data set is very small and does not require a high-performance migration approach.

  • You want the migration process to be as simple as possible and you don’t require any of the performance and flexibility benefits listed earlier.

Notes

To simplify the demonstration, this scenario assumes the following:

  • Migration commands are run from client instances running a Linux operating system:

    • Client instance A is located in the source server’s network

    • Client instance B is located in the same Amazon VPC, Availability Zone, and Subnet as the target Aurora DB cluster

  • The source server is a self-managed MySQL database (for example, running on Amazon EC2 or on premises) configured to allow connections from client instance A.

  • The target Aurora DB cluster already exists and is configured to allow connections from client instance B. If you don’t have an Aurora DB cluster yet, review the step-by-step cluster launch instructions in the Amazon RDS User Guide.

  • Communication is allowed between both client instances.

  • Export from the source database is performed using a privileged, super user MySQL account. For simplicity, this scenario assumes that the user holds all permissions available in MySQL.

  • Import into Amazon Aurora is performed using the admin user account, that is, the account whose name and password were specified during the cluster launch process.

Note that this migration approach requires application downtime while the dump and import are in progress. You can avoid application downtime by extending the scenario with MySQL binary log replication. Refer to the Self-Managed Migration with Near-Zero Downtime section for more details.

Examples

In this scenario, you migrate a MySQL schema named myschema. The first step of the migration is to create a schema-only dump of all objects.

mysqldump --host=<source_server_address> \
    --user=<source_user> \
    --password=<source_user_password> \
    --databases <schema(s)> \
    --single-transaction \
    --no-data > myschema_dump.sql

Descriptions of the options and option values for the mysqldump command are as follows:

  • <source_server_address> — DNS name or IP address of the source server.

  • <source_user> — MySQL user account name on the source server.

  • <source_user_password> — MySQL user account password on the source server.

  • <schema(s)> — One or more schema names.

  • --single-transaction — Enforces a consistent dump from the source database. Can be skipped if the source database is not receiving any write traffic.

  • --no-data — Creates a schema-only dump without row data.

For more details, refer to mysqldump in the MySQL 5.7 Reference Manual.

Example:

admin@clientA:~$ mysqldump --host=11.22.33.44 --user=root \
    --password=pAssw0rd --databases myschema \
    --single-transaction --no-data > myschema_dump_schema_only.sql

After completing the schema-only dump, obtain a data dump for each table. Log in to the source MySQL server and use the SELECT INTO OUTFILE statement to dump each table’s data into a separate CSV file.

admin@clientA:~$ mysql --host=11.22.33.44 --user=root --password=pAssw0rd

mysql> show tables from myschema;
+--------------------+
| Tables_in_myschema |
+--------------------+
| t1                 |
| t2                 |
| t3                 |
| t4                 |
+--------------------+
4 rows in set (0.00 sec)

mysql> SELECT * INTO OUTFILE '/home/admin/dump/myschema_dump_t1.csv'
    -> FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    -> LINES TERMINATED BY '\n'
    -> FROM myschema.t1;
Query OK, 4194304 rows affected (2.35 sec)

(repeat for all remaining tables)

For more information about SELECT INTO statement syntax, refer to SELECT ... INTO Syntax in the MySQL 5.6 Reference Manual.
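
The per-table dumps above are issued one at a time. When export time matters, the same SELECT INTO OUTFILE statements can be run in parallel, for example as backgrounded mysql invocations in a shell loop. The following is a sketch only, reusing the host, credentials, table names, and dump directory from this example.

admin@clientA:~$ # sketch: run the four table exports from the example above concurrently
admin@clientA:~$ for t in t1 t2 t3 t4; do
>   mysql --host=11.22.33.44 --user=root --password=pAssw0rd --execute="
>     SELECT * INTO OUTFILE '/home/admin/dump/myschema_dump_${t}.csv'
>     FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
>     LINES TERMINATED BY '\n'
>     FROM myschema.${t};" &
> done
admin@clientA:~$ wait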

After you complete all dump operations, the /home/admin/dump directory contains five files: one schema-only dump and four data dumps, one per table.

admin@clientA:~/dump$ ls -sh1
total 685M
4.0K myschema_dump_schema_only.sql
172M myschema_dump_t1.csv
172M myschema_dump_t2.csv
172M myschema_dump_t3.csv
172M myschema_dump_t4.csv

Next, you compress and transfer the files to client instance B located in the same AWS Region and Availability Zone as the target Aurora DB cluster. You can use any file transfer method available to you (for example, FTP or Amazon S3). This example uses SCP with SSH private key authentication.

admin@clientA:~/dump$ gzip myschema_dump_*.csv
admin@clientA:~/dump$ scp -i ssh-key.pem myschema_dump_* \
    <clientB_ssh_user>@<clientB_address>:/home/ec2-user/

After transferring all the files, you can decompress them and import the schema and data. Import the schema dump first because all relevant tables must exist before any data can be inserted into them.

admin@clientB:~/dump$ gunzip myschema_dump_*.csv.gz
admin@clientB:~$ mysql --host=<cluster_endpoint> --user=aurora_user_name \
    --password=pAssw0rd < myschema_dump_schema_only.sql

With the schema objects created, the next step is to connect to the Aurora DB cluster endpoint and import the data files.

Note the following:

  • The mysql client invocation includes a --local-infile parameter, which is required to enable support for LOAD DATA LOCAL INFILE commands.

  • Before importing data from dump files, use a SET command to disable foreign key constraint checks for the duration of the database session. Disabling foreign key checks not only improves import performance, but it also lets you import data files in arbitrary order.

admin@clientB:~$ mysql --local-infile --host=<cluster_endpoint> \
    --user=admin --password=pAssw0rd

mysql> SET foreign_key_checks = 0;
Query OK, 0 rows affected (0.00 sec)

mysql> LOAD DATA LOCAL INFILE '/home/ec2-user/myschema_dump_t1.csv'
    -> INTO TABLE myschema.t1
    -> FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    -> LINES TERMINATED BY '\n';
Query OK, 4194304 rows affected (1 min 2.66 sec)
Records: 4194304  Deleted: 0  Skipped: 0  Warnings: 0

(repeat for all remaining CSV files)

mysql> SET foreign_key_checks = 1;
Query OK, 0 rows affected (0.00 sec)

You have imported the schema and data dumps into the Aurora DB cluster.