Amazon Relational Database Service
User Guide (API Version 2014-09-01)
Did this page help you?  Yes | No |  Tell us about it...
« PreviousNext »
View the PDF for this guide.Go to the AWS Discussion Forum for this product.Go to the Kindle Store to download this guide in Kindle format.

Importing Data to an Amazon RDS MySQL DB Instance with Reduced Downtime

When importing data from a MySQL database that supports a live application to an Amazon RDS MySQL DB instance, you can use the following procedure to minimize the impact on application availability. This procedure can also help if you are working with a very large database, because you can reduce the cost of the import by reducing the amount of data that is passed across the network to AWS.

In this procedure, you will transfer a copy of your database data to an Amazon EC2 instance and import the data into a new Amazon RDS MySQL DB instance. You will then use replication to bring the Amazon RDS MySQL DB instance up-to-date with your live MySQL database, before directing your application to the Amazon RDS MySQL DB instance.

Note

We don't recommend that you use this procedure with source MySQL databases from MySQL versions earlier than version 5.1, due to potential replication issues. For more information, go to Replication Compatibility Between MySQL Versions in the MySQL documentation.

Create a Copy of Your Existing Database

The first step in the process of migrating a large amount of data to an Amazon RDS MySQL DB instance with minimal downtime is to create a copy of the source data.

You can use the mysqldump utility to create a database backup in either SQL or delimited-text format. You should do a test run with each format in a non-production environment to see which method minimizes the amount of time that mysqldump runs.

You should also weigh mysqldump performance against the benefit offered by using the delimited-text format for loading. A backup using delimited-text format creates a tab-separated text file for each table being dumped. You can load these files in parallel using the LOAD DATA INFILE command to reduce the amount of time required to import your database. For more information about choosing a mysqldump format and then loading the data, go to Using mysqldump For Backups in the MySQL documentation.

Before you start the backup operation, you must set the replication options on the MySQL database that you are copying to Amazon RDS. The replication options include enabling binary logging and setting a unique server ID. Setting these options will cause your server to start logging database transactions and prepare it to be a replication master later in this process.

Note

Your database needs to be stopped to set the replication options and be in read-only mode while the backup copy is created, so you will need to schedule a maintenance window for these operations.

To set replication options

  1. From a command shell, stop the mysql service:

    sudo service mysqld stop
  2. Edit the my.cnf file (this file is usually under /etc):

    sudo vi /etc/my.cnf

    Add the log_bin and server_id options to the [mysqld] section. The log_bin option provides a file name identifier for binary log files. The server_id option provides a unique identifier for the server in master-replica relationships.

    The following example shows the updated [mysqld] section of a my.cnf file:

    [mysqld]
    log-bin=mysql-bin
    server-id=1                                

    For more information, go to Setting the Replication Master Configuration in the MySQL documentation.

  3. Start the mysql service:

    sudo service mysqld start

To create a backup copy of your existing database

  1. Make the source MySQL instance read-only:

    mysql> FLUSH TABLES WITH READ LOCK;
    mysql> SET GLOBAL read_only = ON;                                
  2. Create a backup of your data using the mysqldump utility, specifying either SQL or delimited-text format.

    You must specify --master-data=2 in order to create a backup file that can be used to start replication between servers. For more information, go to the mysqldump documentation.

    To improve performance and ensure data integrity, use the --order-by-primary and --single-transaction options of mysqldump.

    To avoid including the MySQL system database in the backup, do not use the --all-databases option with mysqldump. For more information, go to Creating a Dump Snapshot Using mysqldump in the MySQL documentation.

    Use chmod if necessary to make sure that the directory where the backup file is being created is writeable.

    • To produce SQL output, use the following command:

      sudo mysqldump --databases <database_name> --master-data=2  --single-transaction 
      --order-by-primary -r backup.sql –u <local_user> -p
    • To produce delimited-text output, use the following command:

      sudo mysqldump --tab=<target_directory> --fields-terminated-by=, --fields-enclosed-by='"' 
      --lines-terminated-by=0x0d0a <database_name> --master-data=2 --single-transaction 
      --order-by-primary -p

      Note

      You must create any stored procedures, triggers, functions, or events manually in your Amazon RDS database. If you have any of these objects in the database that you are copying, exclude them when you run mysqldump by including the following arguments with your mysqldump command: --routines=0 --triggers=0 --events=0.

      When using the delimited-text format, a CHANGE MASTER TO comment will be returned when you run mysqldump. This comment contains the master log file name and position. Note the values for MASTER_LOG_FILE and MASTER_LOG_POS; you will need these values when setting up replication.

      -- Position to start replication or point-in-time recovery from
      --
      -- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin-changelog.000001', MASTER_LOG_POS=107;
  3. Compress the copied data to reduce the amount of network resources needed to copy your MySQL data to an Amazon RDS MySQL DB. Take note of the size of the backup file, you will need this information when determining how large an Amazon EC2 instance to create. When you are done, compress the backup file using GZIP or your preferred compression utility.

    • To compress SQL output, use the following command:

      gzip backup.sql
    • To compress delimited-text output, use the following command:

      tar -zcvf backup.tar.gz <target_directory>
  4. Make the source MySQL instance writeable again:

    mysql> SET GLOBAL read_only = OFF;
    mysql> UNLOCK TABLES;                                    

    For more information on making backups for use with replication, go to Backing Up a Master or Slave by Making It Read Only in the MySQL documentation.

Create an Amazon EC2 Instance and Copy the Compressed Database

Copying your compressed database backup file to an Amazon EC2 instance takes fewer network resources than doing a direct copy of uncompressed data between MySQL instances. Once your data is in Amazon EC2, you can copy it from there directly to your Amazon RDS MySQL DB instance. Note that, in order for you to save on the cost of network resources, your Amazon EC2 instance must be in the same region as your Amazon RDS MySQL DB instance. Having the Amazon EC2 instance in the same region as your Amazon RDS MySQL DB instance also lets you reduce network latency during the import.

To create an Amazon EC2 instance and copy your data

  1. In the region where you will create your Amazon RDS instance, create an Amazon Virtual Private Cloud (VPC), a VPC security group, and a VPC subnet. Ensure that the inbound rules for your VPC security group allow the IP addresses required for your application to connect to AWS. This can be a range of IP addresses (for example 203.0.113.0/24), or another VPC security group. You can use the Amazon VPC Console to create and manage VPCs, subnets, and security groups. For more information, go to Getting Started with Amazon VPC in the Amazon Virtual Private Cloud Getting Started Guide.

    Note

    Older AWS accounts can also launch instances in Amazon EC2-Classic mode. In this case, make sure that the inbound rules in the DB security group for your Amazon RDS instance allow access for your EC2-Classic instance using the Amazon EC2 private IP address. For more information, see Working with DB Security Groups.

  2. Open the Amazon EC2 Console and select the region that will contain both your Amazon EC2 instance and your Amazon RDS MySQL DB instance. Launch an Amazon EC2 instance using the VPC, subnet, and security group that you created in Step 1. Ensure that you select an instance type with enough storage for your database backup file when it is uncompressed. For details on Amazon EC2 instances, go to Getting Started with Amazon EC2 Linux Instances in the Amazon Elastic Compute Cloud User Guide for Linux.

  3. Edit the VPC security group and add the private IP address for your new Amazon EC2 instance. The private IP address will be used when connecting to your Amazon RDS MySQL DB instance. You can find the private IP address on the Details tab of the Instance pane in the Amazon EC2 Console. For more information on modifying a VPC security group, go to Security Groups for Your VPC in the Amazon Virtual Private Cloud User Guide.

  4. Copy your compressed database backup file from your local system to your Amazon EC2 instance. Use chmod if necessary to make sure you have write permission for the target directory of the Amazon EC2 instance. You can use scp or an SSH client to copy the file. The following is an example:

    $ scp -r -i <key pair>.pem backup.sql.gz ec2-user@<EC2 DNS>:/<target_directory>/backup.sql.gz

    Important

    Be sure to copy sensitive data using a secure network transfer protocol.

  5. Connect to your Amazon EC2 instance and install the latest updates and the MySQL client tools using the following commands:

    sudo yum update –y
    sudo yum install mysql-server –y

    For more information, go to Connect to Your Instance in the Amazon Elastic Compute Cloud User Guide for Linux.

  6. While connected to your Amazon EC2 instance, decompress your database backup file. For example:

    • To decompress SQL output, use the following command:

      gzip backup.sql.gz –d
    • To decompress delimited-text output, use the following command:

      tar xzvf backup.tar.gz

Create an Amazon RDS MySQL DB instance and Import Data from Your Amazon EC2 Instance

By creating an Amazon RDS MySQL DB instance in the same region as your Amazon EC2 instance, you can import the database backup file from Amazon EC2 faster than you can import it over the Internet.

To create an Amazon RDS MySQL DB instance and import your data

  1. Determine which DB instance class and what amount of storage space is required to support the expected workload for this Amazon RDS MySQL DB instance. This process should include deciding what is sufficient space and processing capacity for your data load procedures, and also what is required to handle the production workload. You can estimate this based on the size and resources of the source MySQL database. For more information, see DB Instance Class.

  2. Determine if Amazon RDS provisioned input/output operations per second (IOPS) is required to support the workloads. Provisioned IOPS storage delivers fast throughput for online transaction processing (OLTP) workloads, which are I/O intensive. For more information, see Amazon RDS Provisioned IOPS Storage to Improve Performance.

  3. Open the Amazon RDS Console. In the upper-right corner, select the region that contains your Amazon EC2 instance.

  4. Click Launch a DB Instance, and then go through the steps to select options for your DB instance:

    1. On the Select Engine page, click MySQL.

    2. On the Do you plan to use this database for production purposes? page, click No to skip configuring Multi-AZ deployment and provisioned IOPS storage.

    3. In the Instance Specifications section of the Specify DB Details page, specify the DB instance class and allocated storage size that you have determined are appropriate. Select No for Multi-AZ Deployment. Specify whether or not to use Provisioned IOPS as you determined in Step 2. For DB Engine Version, select the version that is compatible with your source MySQL instance, as follows:

      • If your source MySQL instance is 5.1.x, the Amazon RDS MySQL instance must be 5.5.x.

      • If your source MySQL instance is 5.5.x, the Amazon RDS MySQL instance can be 5.5.x or 5.6.x.

      • If your source MySQL instance is 5.6.x, the Amazon RDS MySQL instance must be 5.6.x.

      Accept the default values for all other boxes in this section.

      In the Settings section, specify the requested database and user information. Click Next when you are done.

    4. In the Network & Security section of the Configure Advanced Settings page, select the same VPC and VPC security group as for your Amazon EC2 instance. This approach will ensure that your Amazon EC2 instance and your Amazon RDS instance are visible to each other over the network. Accept the default values for all other boxes in this section.

      In the Database Options section, specify a database name. Accept the default values for all other boxes in this section.

      In the Backup section, set the backup retention period to 0. Accept the default values for all other boxes in this section.

      In the Maintenance section, accept the default values for all of the boxes. Click Launch Instance when you are done.

    Do not configure multiple Availability Zones, backup retention, or Read Replicas until after you have imported the database backup. When that import is done, you can set Multi-AZ and backup retention the way you want them for the production instance. For a detailed walkthrough of creating an Amazon RDS MySQL DB instance, see Creating a DB Instance Running the MySQL Database Engine.

  5. Review the default configuration options for the Amazon RDS MySQL DB instance. In the left navigation pane of the Amazon RDS Management Console, click on Parameter Groups , and then click on the magnifying glass icon next to the default.mysqlx.x parameter group. If this parameter group does not have the configuration options that you want, find a different one that does, or create a new parameter group. For more information on creating a parameter group, see Working with DB Parameter Groups. If you decide to use a different parameter group than the default, associate it with your Amazon RDS MySQL DB instance. For more information, see Modifying a DB Instance Running the MySQL Database Engine.

  6. Connect to the new Amazon RDS MySQL DB instance as the master user, and create the users required to support the administrators, applications, and services that will need to access the instance. The host name for the Amazon RDS MySQL DB instance will be the Endpoint value for this instance without including the port number, for example, mysampledb.claxc2oy9ak1.us-west-2.rds.amazonaws.com. You can find the endpoint value in the instance details in the Amazon RDS Management Console.

  7. Connect to your Amazon EC2 instance. For more information, go to Connect to Your Instance in the Amazon Elastic Compute Cloud User Guide for Linux.

  8. Connect to your Amazon RDS MySQL DB instance as a remote host from your Amazon EC2 instance using the mysql command. The following is an example:

    mysql –h <host_name> -port=3306 –u <db_master_user> -p

    The host name is the DNS name from the Amazon RDS MySQL DB instance endpoint.

  9. At the mysql prompt, run the source command and pass it the name of your database dump file to load the data into the Amazon RDS MySQL DB instance.

    • For SQL format, use the following command:

      mysql> source backup.sql;
    • For delimited-text format, first create the database (if it isn’t the default database you created when setting up the Amazon RDS MySQL DB instance):

      $ mysql> create database <database_name>;
      $ mysql> use <database_name>;

      Then create the tables:

      $ mysql> source <table1>.sql
      $ mysql> source <table2>.sql
      etc…

      Then import the data:

      $ mysql> LOAD DATA INFILE 'table1.txt' INTO TABLE table1 FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '0x0d0a';
      $ mysql> LOAD DATA INFILE 'table2.txt' INTO TABLE table2 FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '0x0d0a';
      etc…

      To improve performance, you can perform these operations in parallel from multiple connections so that all of your tables get created and then loaded at the same time.

      Note

      If you used any data-formatting options with mysqldump when you initially dumped the table, you must use the same options with mysqlimport or LOAD DATA INFILE to ensure proper interpretation of the data file contents.

  10. Run a simple SELECT query against one or two of the tables in the imported database to verify that the import was successful.

Replicate Between Your MySQL Database and New Amazon RDS MySQL DB Instance

Because your source database was likely updated during the time that it took to copy and transfer the data to the Amazon RDS MySQL DB instance, you will use replication to bring the copied database up-to-date with the source database.

Note

The permissions required to start replication on an Amazon RDS MySQL DB instance are restricted and not available to your Amazon RDS master user. Because of this, you must use the Amazon RDS mysql.rds_set_external_master and mysql.rds_start_replication commands to set up replication between your live database and your Amazon RDS MySQL database.

To start replication

Earlier, you enabled binary logging and set a unique server ID for your source MySQL database. Now you can set up your Amazon RDS MySQL DB instance as a replica with your live database as the replication master.

  1. In the Amazon RDS Management Console, add the IP address of the server that hosts the source MySQL database to the VPC security group for the Amazon RDS MySQL DB instance. For more information on modifying a VPC security group, go to Security Groups for Your VPC in the Amazon Virtual Private Cloud User Guide.

    You might also need to configure your local network to permit connections from the IP address of your Amazon RDS MySQL DB instance, so that it can communicate with your source MySQL instance. To find the IP address of the Amazon RDS MySQL DB instance, use the host command:

    host <RDS_MySQL_DB_host_name>

    The host name is the DNS name from the Amazon RDS MySQL DB instance endpoint, for example, myinstance.123456789012.us-east-1.rds.amazonaws.com. You can find the endpoint value in the instance details in the Amazon RDS Management Console.

  2. Using the client of your choice, connect to the source MySQL instance and create a MySQL user that will be used for replication. This account is used solely for replication and must be restricted to your domain to improve security. The following is an example:

    CREATE USER 'repl_user'@'mydomain.com' IDENTIFIED BY '<password>';
  3. For the source MySQL instance, grant REPLICATION CLIENT and REPLICATION SLAVE privileges to your replication user. For example, to grant the REPLICATION CLIENT and REPLICATION SLAVE privileges on all databases for the 'repl_user' user for your domain, issue the following command:

    GRANT REPLICATION CLIENT, REPLICATION SLAVE ON *.* TO 'repl_user'@'mydomain.com' IDENTIFIED BY '<password>';
  4. If you used SQL format to create your backup file, look at the contents of that file:

    cat backup.sql

    The file will include a CHANGE MASTER TO comment that contains the master log file name and position. This comment is included in the backup file when you use the -–master-date option with mysqldump. Note the values for MASTER_LOG_FILE and MASTER_LOG_POS.

    --
    -- Position to start replication or point-in-time recovery from
    --
    
    -- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin-changelog.000001', MASTER_LOG_POS=107;
  5. Make the Amazon RDS MySQL DB instance the replica. Connect to the Amazon RDS MySQL DB instance as the master user and identify the source MySQL database as the replication master by using the mysql.rds_set_external_master command. Use the master log file name and master log position that you determined in the previous step if you have a SQL format backup file, or that you determined when creating the backup files if you used delimited-text format. The following is an example:

    CALL mysql.rds_set_external_master ('mymasterserver.mydomain.com', 3306,
        'repl_user', '<password>', 'mysql-bin-changelog.000001', 107, 1); 
  6. On the Amazon RDS MySQL DB instance, issue the mysql.rds_start_replication command to start replication:

    CALL mysql.rds_start_replication;
  7. On the Amazon RDS MySQL DB instance, run the SHOW SLAVE STATUS command to determine when the replica is up-to-date with the replication master. The results of the SHOW SLAVE STATUS command include the Seconds_Behind_Master field. When the Seconds_Behind_Master field returns 0, then the replica is up-to-date with the master.

  8. After the Amazon RDS MySQL DB instance is up-to-date, enable automated backups so you can restore that database if needed. You can enable or modify automated backups for your Amazon RDS MySQL DB instance using the Amazon RDS Management Console. For more information, see Working With Automated Backups.

Redirect Your Live Application to Your Amazon RDS MySQL Instance

Once the Amazon RDS MySQL DB instance is up-to-date with the replication master, you can now update your live application to use the Amazon RDS instance.

To redirect your live application to your Amazon RDS MySQL instance and stop replication

  1. To add the VPC security group for the Amazon RDS MySQL DB instance, add the IP address of the server that hosts the application. For more information on modifying a VPC security group, go to Security Groups for Your VPC in the Amazon Virtual Private Cloud User Guide.

  2. Verify that the Seconds_Behind_Master field in the SHOW SLAVE STATUS command results is 0, which indicates that the replica is up-to-date with the replication master:

    SHOW SLAVE STATUS;
  3. Stop replication for the Amazon RDS instance using the mysql.rds_stop_replication command:

    CALL mysql.rds_stop_replication;
  4. Update your application to use the Amazon RDS MySQL DB instance. This update will typically involve changing the connection settings to identify the host name and port of the Amazon RDS MySQL DB instance, the user account and password to connect with, and the database to use.

  5. Run the mysql.rds_reset_external_master command on your Amazon RDS MySQL DB instance to reset the replication configuration so this instance is no longer identified as a replica:

    CALL mysql.rds_reset_external_master;
  6. Enable additional Amazon RDS MySQL features such as Multi-AZ support and Read Replicas. For more information, see High Availability (Multi-AZ) and Working with PostgreSQL and MySQL Read Replicas.