Configuring multi-source-replication for Amazon RDS for MySQL
With multi-source replication, you can set up an Amazon RDS for MySQL DB instance as a replica that receives binary log events from more than one RDS for MySQL source DB instance. Multi-source replication is supported for RDS for MySQL DB instances running the following engine versions:
-
All MySQL 8.4 versions
8.0.35 and higher minor versions
5.7.44 and higher minor versions
For information about MySQL multi-source replication, see MySQL Multi-Source Replication
Use cases for multi-source replication
The following cases are good candidates for using multi-source replication on RDS for MySQL:
-
Applications that need to merge or combine multiple shards on separate DB instances into a single shard.
-
Applications that need to generate reports from data consolidated from multiple sources.
-
Requirements to create consolidated long-term backups of data that's distributed among multiple RDS for MySQL DB instances.
Prerequisites for multi-source replication
Before you configure multi-source replication, complete the following prerequisites.
Make sure that each source RDS for MySQL DB instance has automatic backups enabled. Enabling automatic backups enables binary logging. To learn how to enable automatic backups, see Enabling automated backups.
To avoid replication errors, we recommended that you block write operations to the source DB instances. You can do so by setting the
read-only
parameter toON
in a custom parameter group attached to the RDS for MySQL source DB instance. You can use the AWS Management Console or the AWS CLI to create a new custom parameter group or to modify an existing one. For more information, see Creating a DB parameter group in Amazon RDS and Modifying parameters in a DB parameter group in Amazon RDS.For each source DB instance, add the IP address of the instance to the Amazon virtual private cloud (VPC) security group for the multi-source DB instance. To identify the IP address of a source DB instance, you can run the command
dig
. Run the command from an Amazon EC2 instance in the same VPC as the destination multi-source DB instance.RDS Endpoint
For each source DB instance, use a client to connect to the DB instance and create a database user with the required privileges for replication, as in the following example.
CREATE USER '
repl_user
' IDENTIFIED BY 'password
'; GRANT REPLICATION CLIENT, REPLICATION SLAVE ON *.* TO 'repl_user
';
Configuring multi-source replication channels on RDS for MySQL DB instances
Configuring multi-source replication channels is similar to configuring single source replication. For multi-source replication, you first turn on binary logging on the source instance. Then, you import data from the sources to the multi-source replica. Then, you start replication from each source by using the binary log coordinates or by using GTID auto-positioning.
To configure an RDS for MySQL DB instance as a multi-source replica of two or more RDS for MySQL DB instances, perform the following steps.
Topics
Step 1: Import data from the source DB instances to the multi-source replica
Perform the following steps on each source DB instance.
Before you import the data from a source to the multi-source replica, determine
the current binary log file and position by running the SHOW MASTER
STATUS
command. Take note of these details for use in the next step.
In this example output, the file is mysql-bin-changelog.000031
and the position is
107
.
File Position ----------------------------------- mysql-bin-changelog.000031 107 -----------------------------------
Now copy the database from the source DB instance to the multi-source replica by
using mysqldump
, as in the following example.
mysqldump --databases database_name \ --single-transaction \ --compress \ --order-by-primary \ -u
RDS_user_name
\ -pRDS_password
\ --host=RDS Endpoint
| mysql \ --host=RDS Endpoint
\ --port=3306 \ -uRDS_user_name
\ -pRDS_password
After copying the database, you can set the read-only parameter to
OFF
on the source DB instance.
Step 2: Start replication from the source DB instances to the multi-source replica
For each source DB instance, use the master user credentials to connect to the instance, and run the following two stored procedures. These stored procedures configure replication on a channel and start replication. This example uses the binlog file name and position from the example output in the previous step.
CALL mysql.rds_set_external_source_for_channel('
mysourcehost.example.com
', 3306, 'repl_user
', 'password
', 'mysql-bin-changelog.000031
',107
, 1, 'channel_1
'); CALL mysql.rds_start_replication_for_channel('channel_1
');
For more information about using these stored procedures and others to set up and manage your replication channels, see Managing multi-source replication.
Using filters with multi-source replication
You can use replication filters to specify which databases and tables are replicated with in multi-source replica. Replication filters can include databases and tables in replication or exclude them from replication. For more information on replication filters, see Configuring replication filters with MySQL.
With multi-source replication, you can configure replication filters globally or at the channel level. Channel-level filtering is available only with supported DB instances running version 8.0 or version 8.4. The following examples show how to configure filters globally or at the channel level.
Note the following requirements and behavior with filtering in multi-source replication:
Back quotes (``) around the channel names are required.
If you change replication filters in the parameter group, the multi-source replica's
sql_thread
for all channels with updates are restarted to apply the changes dynamically. If an update involves a global filter, then all replication channels in the running state are restarted.All global filters are applied before any channel-specific filters.
If a filter is applied globally and at the channel level, then only the channel-level filter is applied. For example, if the filters are
replicate_ignore_db="db1,`channel_22`:db2"
, thenreplicate_ignore_db
set todb1
is applied to all channels except forchannel_22
, and onlychannel_22
ignores changes fromdb2
.
Example 1: Setting a global filter
In the following example, the temp_data
database is excluded from replication in every channel.
For Linux, macOS, or Unix:
aws rds modify-db-parameter-group \ --db-parameter-group-name myparametergroup \ --parameters "ParameterName=replicate-ignore-db,ParameterValue='temp_data',ApplyMethod=immediate"
Example 2: Setting a channel-level filter
In the following example, changes from the sample22
database are only
included in channel channel_22
. Similarly, changes from the
sample99
database are only included in channel
channel_99
.
For Linux, macOS, or Unix:
aws rds modify-db-parameter-group \ --db-parameter-group-name myparametergroup \ --parameters "ParameterName=replicate-do-db,ParameterValue='\`channel_22\`:sample22,\`channel_99\`:sample99',ApplyMethod=immediate"
Monitoring multi-source replication channels
You can monitor individual channels in a multi-source replica by using the following methods:
To monitor the status of all channels or a specific channel, connect to the multi-source replica and run the
SHOW REPLICA STATUS
orSHOW REPLICA STATUS FOR CHANNEL '
command. For more information, see Checking Replication Statuschannel_name
'in the MySQL documentation. To receive notification when a replication channel is started, stopped, or removed, use RDS event notification. For more information, see Working with Amazon RDS event notification.
To monitor the lag for a specific channel, check the
ReplicationChannelLag
metric for it. Data points for this metric have a period of 60 seconds (1 minute) are available for 15 days. To locate the replication channel lag for a channel, use the instance identifier and the replication channel name. To receive notification when this lag exceeds a particular threshold, you can set up a CloudWatch alarm. For more information, see Monitoring Amazon RDS metrics with Amazon CloudWatch.
Considerations and best practices for multi-source replication
Before you use multi-source replication on RDS for MySQL, review the following considerations and best practices:
-
Make sure that a DB instance configured as a multi-source replica has sufficient resources such as throughput, memory, CPU, and IOPS to handle the workload from multiple source instances.
-
Regularly monitor resource utilization on your multi-source replica and adjust the storage or instance configuration to handle the workload without straining resources.
-
You can configure multi-threaded replication on a multi-source replica by setting the system variable
replica_parallel_workers
to a value greater than0
. In this case, the number of threads allocated to each channel is the value of this variable, plus one coordinator thread to manage the applier threads. -
Configure replication filters appropriately to avoid conflicts. To replicate an entire database to another database on a replica, you can use the
--replicate-rewrite-db
option. For example, you can replicate all tables in database A to database B on a replica instance. This approach can be helpful when all source instances are using the same schema naming convention. For information about the--replicate-rewrite-db
option, see Replica Server Options and Variablesin the MySQL documentation. -
To avoid replication errors, avoid writing to the replica. We recommended that you enable the
read_only
parameter on multi-source replicas to block write operations. Doing so helps to eliminate replication issues caused by conflicting write operations. -
To increase the performance of read operations such as sorts and high-load joins that are executed on the multi-source replica, consider using RDS Optimized Reads. This feature can help with queries that depend on large temporary tables or sort files. For more information, see Improving query performance for RDS for MySQL with Amazon RDS Optimized Reads.
-
To minimize replication lag and improve the performance of a multi-source replica, consider enabling optimized writes. For more information, see Improving write performance with RDS Optimized Writes for MySQL.
-
Perform management operations (such as changing configuration) on one channel at a time, and avoid performing changes to multiple channels from multiple connections. These practices can lead to conflicts in replication operations. For example, simultaneously executing
rds_skip_repl_error_for_channel
andrds_start_replication_for_channel
procedures from multiple connections can cause skipping of events on a different channel than intended. -
You can enable backups on a multi-source replication instance and export data from that instance to an Amazon S3 bucket to store it for long-term purposes. However, it's important to also configure backups with appropriate retention on the individual source instances. For information about exporting snapshot data to Amazon S3, see Exporting DB snapshot data to Amazon S3 for Amazon RDS.
-
To distribute the read workload on a multi-source replica, you can create read replicas from a multi-source replica. You can locate these read replicas in different AWS Regions based on your application's requirements. For more information about read replicas, see Working with MySQL read replicas.
Limitations for multi-source replication on RDS for MySQL
The following limitations apply to multi-source replication on RDS for MySQL:
-
Currently, RDS for MySQL supports configuring a maximum of 15 channels for a multi-source replica.
-
A read replica instance can't be configured as a multi-source replica.
-
To configure multi-source replication on RDS for MySQL running engine version 5.7, Performance Schema must be enabled on the replica instance. Enabling Performance Schema is optional on RDS for MySQL running engine version 8.0 or 8.4.
-
For RDS for MySQL running engine version 5.7, replication filters apply to all replication channels. For RDS for MySQL running engine version 8.0 or 8.4, you can configure filters that apply to all replication channels or to individual channels.
-
Restoring an RDS snapshot or performing a Point-in-time-Restore (PITR) doesn't restore multi-source replica channel configurations.
-
When you create a read replica of a multi-source replica, it only replicates data from the multi-source instance. It doesn't restore any channel configuration.
-
MySQL doesn't support setting up a different number of parallel workers for each channel. Every channel gets the same number of parallel workers based on the
replica_parallel_workers
value.
The following additional limitations apply if your multi-source replication target is a Multi-AZ DB cluster:
A channel must be configured for a source RDS for MySQL instance before any writes to that instance occur.
-
Each source RDS for MySQL instance must have GTID-based replication enabled.
-
A failover event on the DB cluster removes the multi-source replication configuration. Restoring that configuration requires repeating the configuration steps.