Multi-AZ DB cluster deployments
A Multi-AZ DB cluster deployment is a semisynchronous, high availability deployment mode of Amazon RDS with two readable replica DB instances. A Multi-AZ DB cluster has a writer DB instance and two reader DB instances in three separate Availability Zones in the same AWS Region. Multi-AZ DB clusters provide high availability, increased capacity for read workloads, and lower write latency when compared to Multi-AZ DB instance deployments.
You can import data from an on-premises database to a Multi-AZ DB cluster by following the instructions in Importing data to an Amazon RDS MariaDB or MySQL database with reduced downtime.
You can purchase reserved DB instances for a Multi-AZ DB cluster. For more information, see Reserved DB instances for a Multi-AZ DB cluster.
Feature availability and support vary across specific versions of each database engine, and across AWS Regions. For more information on version and Region availability of Amazon RDS with Multi-AZ DB clusters, see Multi-AZ DB clusters.
Topics
- Instance class availability for Multi-AZ DB clusters
- Overview of Multi-AZ DB clusters
- Managing a Multi-AZ DB cluster with the AWS Management Console
- Working with parameter groups for Multi-AZ DB clusters
- Upgrading the engine version of a Multi-AZ DB cluster
- Using RDS Proxy with Multi-AZ DB clusters
- Replica lag and Multi-AZ DB clusters
- Failover process for Multi-AZ DB clusters
- Creating a Multi-AZ DB cluster
- Connecting to a Multi-AZ DB cluster
- Automatically connecting an AWS compute resource and a Multi-AZ DB cluster
- Modifying a Multi-AZ DB cluster
- Renaming a Multi-AZ DB cluster
- Rebooting a Multi-AZ DB cluster and reader DB instances
- Working with Multi-AZ DB cluster read replicas
- Using PostgreSQL logical replication with Multi-AZ DB clusters
- Deleting a Multi-AZ DB cluster
- Limitations of Multi-AZ DB clusters
Important
Multi-AZ DB clusters aren't the same as Aurora DB clusters. For information about Aurora DB clusters, see the Amazon Aurora User Guide.
Instance class availability for Multi-AZ DB clusters
Multi-AZ DB cluster deployments are supported for the following DB instance classes: db.m5d, db.m6gd, db.m6id, db.m6idn, db.r5d, db.r6gd, db.x2iedn, db.r6id, db.r6idn, and db.c6gd.
Note
The c6gd instance classes are the only ones that support the medium instance size.
For more information about DB instance classes, see DB instance classes.
Overview of Multi-AZ DB clusters
With a Multi-AZ DB cluster, Amazon RDS replicates data from the writer DB instance to both of the reader DB instances using the DB engine's native replication capabilities. When a change is made on the writer DB instance, it's sent to each reader DB instance.
Multi-AZ DB cluster deployments use semisynchronous replication, which requires acknowledgment from at least one reader DB instance in order for a change to be committed. It doesn't require acknowledgment that events have been fully executed and committed on all replicas.
Reader DB instances act as automatic failover targets and also serve read traffic to increase application read throughput. If an outage occurs on your writer DB instance, RDS manages failover to one of the reader DB instances. RDS does this based on which reader DB instance has the most recent change record.
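The two rules in this overview (a change commits once at least one reader acknowledges it, and failover goes to the reader with the most recent change record) can be modeled in a few lines. The following is a minimal, hypothetical sketch in plain Java. The class name SemiSyncModel, its methods, and the use of a single numeric log position per reader are illustrative assumptions, not how Amazon RDS is implemented.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical model of the commit and failover rules described above.
// This is not RDS internals; positions are made-up numbers.
final class SemiSyncModel {
    private SemiSyncModel() {}

    // Semisynchronous rule: a change can commit once at least one
    // reader DB instance has acknowledged receiving it.
    static boolean canCommit(int readerAcks) {
        return readerAcks >= 1;
    }

    // Failover rule: choose the reader with the most recent change record,
    // modeled here (as an assumption) as the highest received log position.
    static Optional<String> chooseFailoverTarget(Map<String, Long> receivedPositionByReader) {
        return receivedPositionByReader.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }

    public static void main(String[] args) {
        System.out.println(canCommit(0)); // no reader acknowledged yet
        System.out.println(canCommit(1)); // one acknowledgment is enough
        System.out.println(chooseFailoverTarget(Map.of("reader-1", 1040L, "reader-2", 1055L)));
    }
}
```

Note that the model never waits for the second reader: semisynchronous replication only needs one acknowledgment, which is why the two readers can be at different positions when a failover starts.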
The following diagram shows a Multi-AZ DB cluster.
Multi-AZ DB clusters typically have lower write latency when compared to Multi-AZ DB instance deployments. They also allow read-only workloads to run on reader DB instances. The RDS console shows the Availability Zone of the writer DB instance and the Availability Zones of the reader DB instances. You can also use the describe-db-clusters CLI command or the DescribeDBClusters API operation to find this information.
Important
To prevent replication errors in RDS for MySQL Multi-AZ DB clusters, we strongly recommend that all tables have a primary key.
Managing a Multi-AZ DB cluster with the AWS Management Console
You can manage a Multi-AZ DB cluster with the console.
To manage a Multi-AZ DB cluster with the console
- Sign in to the AWS Management Console and open the Amazon RDS console at https://console.aws.amazon.com/rds/.
- In the navigation pane, choose Databases, and then choose the Multi-AZ DB cluster that you want to manage.
The following image shows a Multi-AZ DB cluster in the console.
The available actions in the Actions menu depend on whether the Multi-AZ DB cluster is selected or a DB instance in the cluster is selected.
- Choose the Multi-AZ DB cluster to view the cluster details and perform actions at the cluster level.
- Choose a DB instance in a Multi-AZ DB cluster to view the DB instance details and perform actions at the DB instance level.
Working with parameter groups for Multi-AZ DB clusters
In a Multi-AZ DB cluster, a DB cluster parameter group acts as a container for engine configuration values that are applied to every DB instance in the Multi-AZ DB cluster.
In a Multi-AZ DB cluster, a DB parameter group is set to the default DB parameter group for the DB engine and DB engine version. The settings in the DB cluster parameter group are used for all of the DB instances in the cluster.
For information about parameter groups, see Working with parameter groups.
Upgrading the engine version of a Multi-AZ DB cluster
Amazon RDS provides newer versions of each supported database engine so that you can keep your Multi-AZ DB cluster up to date. When Amazon RDS supports a new version of a database engine, you can choose how and when to upgrade your Multi-AZ DB cluster.
There are two kinds of upgrades that you can perform:
- Major version upgrades: A major engine version upgrade can introduce changes that aren't compatible with existing applications. When you initiate a major version upgrade, Amazon RDS upgrades the reader and writer instances simultaneously. Therefore, your DB cluster might not be available until the upgrade completes.
- Minor version upgrades: A minor version upgrade includes only changes that are backward-compatible with existing applications. When you initiate a minor version upgrade, Amazon RDS first upgrades the reader DB instances one at a time. Then, one of the reader DB instances switches to be the new writer DB instance. Amazon RDS then upgrades the old writer instance (which is now a reader instance).
Downtime during the upgrade is limited to the time it takes for one of the reader DB instances to become the new writer DB instance. This downtime acts like an automatic failover. For more information, see Failover process for Multi-AZ DB clusters. Note that the replica lag of your Multi-AZ DB cluster might affect the downtime. For more information, see Replica lag and Multi-AZ DB clusters.
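To make the order of a minor version upgrade concrete, the following hypothetical sketch lists the steps for a cluster with one writer and two readers. Amazon RDS performs these steps itself; this code only models the sequence described above. The instance names are placeholders, and picking the first reader as the new writer is an arbitrary assumption (RDS chooses the reader).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the minor version upgrade order; not an RDS API.
final class MinorUpgradePlan {
    private MinorUpgradePlan() {}

    static List<String> steps(String writer, List<String> readers) {
        if (readers.isEmpty()) {
            throw new IllegalArgumentException("a Multi-AZ DB cluster has reader DB instances");
        }
        List<String> plan = new ArrayList<>();
        // 1. Upgrade the reader DB instances one at a time.
        for (String reader : readers) {
            plan.add("upgrade " + reader);
        }
        // 2. An upgraded reader becomes the new writer. This switch is the
        //    only downtime, and it behaves like an automatic failover.
        //    Assumption: the first reader is chosen.
        String newWriter = readers.get(0);
        plan.add("switch writer " + writer + " -> " + newWriter);
        // 3. Upgrade the old writer, which is now a reader.
        plan.add("upgrade " + writer);
        return plan;
    }

    public static void main(String[] args) {
        steps("instance-1", List.of("instance-2", "instance-3")).forEach(System.out::println);
    }
}
```

The writer is upgraded last, after the role switch, which is why the downtime is limited to the switch itself.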
For RDS for PostgreSQL Multi-AZ DB cluster read replicas, Amazon RDS upgrades the cluster member instances one at a time. The reader and writer cluster roles don't switch during the upgrade. Therefore, your DB cluster might experience downtime while Amazon RDS upgrades the cluster writer instance.
Note
The downtime for a Multi-AZ DB cluster minor version upgrade is typically 35 seconds. When used with RDS Proxy, you can further reduce downtime to one second or less. For more information, see Using Amazon RDS Proxy. Alternatively, you can use an open source database proxy such as ProxySQL or PgBouncer, or the AWS JDBC Driver for MySQL.
Currently, Amazon RDS supports major version upgrades only for RDS for PostgreSQL Multi-AZ DB clusters. Amazon RDS supports minor version upgrades for all DB engines that support Multi-AZ DB clusters.
Amazon RDS doesn't automatically upgrade Multi-AZ DB cluster read replicas. For minor version upgrades, you must first manually upgrade all read replicas and then upgrade the cluster. Otherwise, the upgrade is blocked. When you perform a major version upgrade of a cluster, the replication state of all read replicas changes to terminated. You must delete and recreate the read replicas after the upgrade completes. For more information, see Monitoring read replication.
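Because a minor version upgrade is blocked until every read replica is upgraded, it can help to check replica versions before you start. The following hypothetical sketch compares each replica's engine version with the target version. The replica names, version strings, and the simple dotted-number comparison are assumptions for illustration; engine version strings that contain non-numeric parts would need different parsing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical pre-check; replica versions would come from your own inventory.
final class ReplicaUpgradeCheck {
    private ReplicaUpgradeCheck() {}

    // Compares dotted numeric versions such as "8.0.35" and "8.0.36".
    static int compareVersions(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    // Returns the read replicas (sorted by name) that still run an older
    // version than the target. A non-empty result means the cluster's minor
    // version upgrade would be blocked.
    static List<String> replicasBlockingUpgrade(Map<String, String> versionByReplica, String targetVersion) {
        List<String> blocking = new ArrayList<>();
        for (Map.Entry<String, String> e : new TreeMap<>(versionByReplica).entrySet()) {
            if (compareVersions(e.getValue(), targetVersion) < 0) {
                blocking.add(e.getKey());
            }
        }
        return blocking;
    }

    public static void main(String[] args) {
        Map<String, String> replicas = Map.of("replica-a", "8.0.36", "replica-b", "8.0.35");
        System.out.println(replicasBlockingUpgrade(replicas, "8.0.36"));
    }
}
```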
The process for upgrading the engine version of a Multi-AZ DB cluster is the same as the process for upgrading a DB instance engine version. For instructions, see Upgrading a DB instance engine version. The only difference is that when you use the AWS Command Line Interface (AWS CLI), you use the modify-db-cluster command and specify the --db-cluster-identifier parameter, along with the --engine-version parameter for the target version and, for a major version upgrade, the --allow-major-version-upgrade parameter.
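If you script upgrades from a Java tool, you can assemble the AWS CLI call as an argument list for ProcessBuilder. This sketch assumes that the AWS CLI is installed and configured. The cluster identifier and engine version are placeholders, and including --apply-immediately is an illustrative choice. The code only builds and prints the command; the start() call is commented out.

```java
import java.util.ArrayList;
import java.util.List;

// Builds an "aws rds modify-db-cluster" command; values are placeholders.
final class ModifyClusterCommand {
    private ModifyClusterCommand() {}

    static List<String> build(String clusterId, String engineVersion, boolean majorUpgrade) {
        List<String> cmd = new ArrayList<>(List.of(
                "aws", "rds", "modify-db-cluster",
                "--db-cluster-identifier", clusterId,
                "--engine-version", engineVersion));
        if (majorUpgrade) {
            // Needed only when the target is a new major version.
            cmd.add("--allow-major-version-upgrade");
        }
        // Illustrative choice: apply now instead of in the next maintenance window.
        cmd.add("--apply-immediately");
        return cmd;
    }

    public static void main(String[] args) {
        ProcessBuilder pb = new ProcessBuilder(build("mymultiazdbcluster", "16.1", true)).inheritIO();
        System.out.println(String.join(" ", pb.command()));
        // pb.start().waitFor(); // Runs the AWS CLI; requires configured credentials.
    }
}
```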
For more information about major and minor version upgrades, see the following documentation for your DB engine:
Using RDS Proxy with Multi-AZ DB clusters
You can use Amazon RDS Proxy to create a proxy for your Multi-AZ DB clusters. By using RDS Proxy, your applications can pool and share database connections to improve their ability to scale. Each proxy performs connection multiplexing, also known as connection reuse. With multiplexing, RDS Proxy performs all the operations for a transaction using one underlying database connection. RDS Proxy can also reduce the downtime for a minor version upgrade of a Multi-AZ DB cluster to one second or less. For more information about the benefits of RDS Proxy, see Using Amazon RDS Proxy.
To set up a proxy for a Multi-AZ DB cluster, choose Create an RDS Proxy when creating the cluster. For instructions to create and manage RDS Proxy endpoints, see Working with Amazon RDS Proxy endpoints.
Replica lag and Multi-AZ DB clusters
Replica lag is the difference in time between the latest transaction on the writer DB instance and the latest applied transaction on a reader DB instance. The Amazon CloudWatch metric ReplicaLag represents this time difference. For more information about CloudWatch metrics, see Monitoring Amazon RDS metrics with Amazon CloudWatch.
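As a worked example of this definition, the following sketch computes replica lag from two commit timestamps. The timestamps are made-up values, and clamping a negative result (for example, from clock skew) to zero is an assumption of this sketch. In practice, you read the ReplicaLag metric from CloudWatch instead of computing it yourself.

```java
import java.time.Duration;
import java.time.Instant;

// Illustrates the replica lag definition with hypothetical timestamps.
final class ReplicaLagExample {
    private ReplicaLagExample() {}

    // Lag = time of latest transaction on the writer
    //     - time of latest transaction applied on the reader.
    static Duration replicaLag(Instant latestOnWriter, Instant latestAppliedOnReader) {
        Duration lag = Duration.between(latestAppliedOnReader, latestOnWriter);
        // Assumption: a negative value (clock skew) is treated as no lag.
        return lag.isNegative() ? Duration.ZERO : lag;
    }

    public static void main(String[] args) {
        Instant writer = Instant.parse("2024-01-01T12:00:05Z");
        Instant reader = Instant.parse("2024-01-01T12:00:02Z");
        System.out.println(replicaLag(writer, reader).getSeconds() + " seconds"); // 3 seconds
    }
}
```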
Although Multi-AZ DB clusters allow for high write performance, replica lag can still occur due to the nature of engine-based replication. Because any failover must first resolve the replica lag before it promotes a new writer DB instance, replica lag directly affects failover time, so it's important to monitor and manage it.
For RDS for MySQL Multi-AZ DB clusters, failover time depends on the replica lag of both remaining reader DB instances. Both reader DB instances must apply unapplied transactions before one of them is promoted to the new writer DB instance.
For RDS for PostgreSQL Multi-AZ DB clusters, failover time depends on the lowest replica lag of the two remaining reader DB instances. The reader DB instance with the lowest replica lag must apply unapplied transactions before it is promoted to the new writer DB instance.
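The difference between the two engines can be expressed as a small function. In this hypothetical sketch, each remaining reader's replica lag stands in for the time it needs to apply its unapplied transactions, which is a simplification. RDS for MySQL waits for both readers, so the larger lag dominates; RDS for PostgreSQL waits only for the reader with the lowest lag.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.List;

// Simplified model: catch-up time is approximated by each reader's replica lag.
final class FailoverCatchUp {
    enum Engine { MYSQL, POSTGRESQL }

    private FailoverCatchUp() {}

    static Duration catchUpTime(Engine engine, List<Duration> readerLags) {
        if (readerLags.isEmpty()) {
            throw new IllegalArgumentException("no remaining reader DB instances");
        }
        if (engine == Engine.MYSQL) {
            // Both readers must apply their backlog, so the slower one dominates.
            return Collections.max(readerLags);
        }
        // PostgreSQL: only the reader with the lowest lag must catch up.
        return Collections.min(readerLags);
    }

    public static void main(String[] args) {
        List<Duration> lags = List.of(Duration.ofSeconds(2), Duration.ofSeconds(9));
        System.out.println("MySQL: " + catchUpTime(Engine.MYSQL, lags));           // PT9S
        System.out.println("PostgreSQL: " + catchUpTime(Engine.POSTGRESQL, lags)); // PT2S
    }
}
```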
For a tutorial that shows you how to create a CloudWatch alarm when replica lag exceeds a set amount of time, see Tutorial: Creating an Amazon CloudWatch alarm for Multi-AZ DB cluster replica lag.
Common causes of replica lag
In general, replica lag occurs when the write workload is too high for the reader DB instances to apply the transactions efficiently. Various workloads can incur temporary or continuous replica lag. Some examples of common causes are the following:
- High write concurrency or heavy batch updating on the writer DB instance, causing the apply process on the reader DB instances to fall behind.
- Heavy read workload that is using resources on one or more reader DB instances. Running slow or large queries can affect the apply process and can cause replica lag.
- Transactions that modify large amounts of data, or DDL statements, can sometimes cause a temporary increase in replica lag because the database must preserve commit order.
Mitigating replica lag
For Multi-AZ DB clusters for RDS for MySQL and RDS for PostgreSQL, you can mitigate replica lag by reducing the load on your writer DB instance. You can also use flow control to reduce replica lag. Flow control works by throttling writes on the writer DB instance, which prevents replica lag from growing without bound. Write throttling is accomplished by adding a delay at the end of a transaction, which decreases the write throughput on the writer DB instance. Although flow control doesn't guarantee lag elimination, it can help reduce overall lag in many workloads. The following sections provide information about using flow control with RDS for MySQL and RDS for PostgreSQL.
Mitigating replica lag with flow control for RDS for MySQL
When you use RDS for MySQL Multi-AZ DB clusters, flow control is turned on by default through the dynamic parameter rpl_semi_sync_master_target_apply_lag. This parameter specifies the upper limit that you want for replica lag. As replica lag approaches this configured limit, flow control throttles the write transactions on the writer DB instance to try to keep replica lag below the specified value. In some cases, replica lag can exceed the specified limit. By default, this parameter is set to 120 seconds. To turn off flow control, set this parameter to its maximum value of 86,400 seconds (one day).
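To illustrate the idea of throttling harder as lag approaches a target, the following sketch computes a per-transaction delay that grows as replica lag nears the value of rpl_semi_sync_master_target_apply_lag. The linear ramp, the ramp starting at 50% of the target, and the 10 ms cap are invented for illustration. This is not the algorithm that RDS for MySQL uses; you can observe the actual delay through Rpl_semi_sync_master_flow_control_current_delay.

```java
import java.time.Duration;

// Hypothetical throttling rule; the tuning constants are illustrative only.
final class FlowControlSketch {
    static final double RAMP_START_FRACTION = 0.5;
    static final Duration MAX_DELAY = Duration.ofMillis(10);

    private FlowControlSketch() {}

    // Returns the delay to add at the end of each write transaction.
    static Duration commitDelay(Duration currentLag, Duration targetLag) {
        if (targetLag.isZero() || targetLag.isNegative()) {
            throw new IllegalArgumentException("target lag must be positive");
        }
        double ratio = (double) currentLag.toNanos() / targetLag.toNanos();
        if (ratio <= RAMP_START_FRACTION) {
            return Duration.ZERO; // Lag is well below the target: no throttling.
        }
        // Scale linearly from 0 at the ramp start to MAX_DELAY at the target,
        // and stay at MAX_DELAY if lag already exceeds the target.
        double scale = Math.min(1.0, (ratio - RAMP_START_FRACTION) / (1.0 - RAMP_START_FRACTION));
        return Duration.ofNanos((long) (MAX_DELAY.toNanos() * scale));
    }

    public static void main(String[] args) {
        Duration target = Duration.ofSeconds(120); // default target apply lag
        for (long lagSeconds : new long[] {30, 90, 150}) {
            System.out.println(lagSeconds + "s lag -> " + commitDelay(Duration.ofSeconds(lagSeconds), target));
        }
    }
}
```

Because the delay is added per transaction, this kind of throttling slows many short transactions noticeably but does little against a single long-running one.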
To view the current delay injected by flow control, check the status variable Rpl_semi_sync_master_flow_control_current_delay by running the following query.
SHOW GLOBAL STATUS LIKE '%flow_control%';
Your output should look similar to the following.
+-------------------------------------------------+-------+
| Variable_name | Value |
+-------------------------------------------------+-------+
| Rpl_semi_sync_master_flow_control_current_delay | 2010 |
+-------------------------------------------------+-------+
1 row in set (0.00 sec)
Note
The delay is shown in microseconds.
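Because the value is reported in microseconds, converting it before you compare it with other timings avoids unit mistakes. The following sketch parses one row of the table-formatted output shown above and returns a java.time.Duration. Parsing the mysql client's text output is an assumption made to keep the example dependency-free; an application would more typically run the query over JDBC.

```java
import java.time.Duration;
import java.time.temporal.ChronoUnit;

// Parses a status row such as
// "| Rpl_semi_sync_master_flow_control_current_delay | 2010 |".
final class FlowControlDelay {
    private FlowControlDelay() {}

    static Duration parseRow(String row) {
        String[] cells = row.split("\\|");
        // cells[0] is the empty text before the first '|';
        // cells[1] is the variable name and cells[2] is the value.
        if (cells.length < 3) {
            throw new IllegalArgumentException("unexpected row: " + row);
        }
        long micros = Long.parseLong(cells[2].trim());
        return Duration.of(micros, ChronoUnit.MICROS); // value is in microseconds
    }

    public static void main(String[] args) {
        Duration d = parseRow("| Rpl_semi_sync_master_flow_control_current_delay | 2010 |");
        System.out.println(d.toNanos() / 1_000_000.0 + " ms");
    }
}
```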
When you have Performance Insights turned on for an RDS for MySQL Multi-AZ DB cluster, you can monitor the wait event that indicates that a SQL statement was delayed by flow control. When flow control introduces a delay, the wait event /wait/synch/cond/semisync/semi_sync_flow_control_delay_cond appears for the corresponding SQL statement on the Performance Insights dashboard. To view these metrics, make sure that the Performance Schema is turned on. For information about Performance Insights, see Monitoring DB load with Performance Insights on Amazon RDS.
Mitigating replica lag with flow control for RDS for PostgreSQL
When you use RDS for PostgreSQL Multi-AZ DB clusters, flow control is deployed as an extension. It turns on a background worker for all DB instances in the DB cluster. By default, the background workers on the reader DB instances communicate the current replica lag to the background worker on the writer DB instance. If the lag exceeds two minutes on any reader DB instance, the background worker on the writer DB instance adds a delay at the end of a transaction. To control the lag threshold, use the parameter flow_control.target_standby_apply_lag.
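The default decision rule described above (throttle when any reader reports lag above the threshold) can be sketched as follows. The reader names and lag values are hypothetical, the target argument stands in for flow_control.target_standby_apply_lag, and the code models only the decision, not the extension's background workers.

```java
import java.time.Duration;
import java.util.Map;

// Hypothetical model of the writer-side decision; not the extension's code.
final class PgFlowControlRule {
    // Default threshold described in the text: two minutes.
    static final Duration DEFAULT_TARGET = Duration.ofMinutes(2);

    private PgFlowControlRule() {}

    // True if the writer should add a delay at the end of transactions,
    // that is, if the lag on any reader exceeds the target.
    static boolean shouldThrottle(Map<String, Duration> reportedLagByReader, Duration target) {
        return reportedLagByReader.values().stream().anyMatch(lag -> lag.compareTo(target) > 0);
    }

    public static void main(String[] args) {
        Map<String, Duration> reported = Map.of(
                "reader-1", Duration.ofSeconds(30),
                "reader-2", Duration.ofSeconds(150));
        System.out.println(shouldThrottle(reported, DEFAULT_TARGET));
    }
}
```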
When flow control throttles a PostgreSQL process, the Extension wait event appears in pg_stat_activity and in Performance Insights. The function get_flow_control_stats displays details about how much delay is currently being added.
Flow control can benefit most online transaction processing (OLTP) workloads that have short but highly concurrent transactions. If the lag is caused by long-running transactions, such as batch operations, flow control doesn't provide as strong a benefit.
You can turn off flow control by removing the extension from the shared_preload_libraries parameter and rebooting your DB instance.
Failover process for Multi-AZ DB clusters
If there is a planned or unplanned outage of your writer DB instance in a Multi-AZ DB cluster, Amazon RDS automatically fails over to a reader DB instance in a different Availability Zone. The time it takes for the failover to complete depends on the database activity and other conditions when the writer DB instance became unavailable. Failover times are typically under 35 seconds. Failover completes when both reader DB instances have applied outstanding transactions from the failed writer. When the failover is complete, it can take additional time for the RDS console to reflect the new Availability Zone.
Automatic failovers
Amazon RDS handles failovers automatically so you can resume database operations as quickly as possible without administrative intervention. To fail over, the writer DB instance switches automatically to a reader DB instance.
Manually failing over a Multi-AZ DB cluster
If you manually fail over a Multi-AZ DB cluster, RDS first terminates the writer DB instance. Then, the internal monitoring system detects that the writer DB instance is unhealthy and promotes a reader DB instance. Failover times are typically under 35 seconds.
You can fail over a Multi-AZ DB cluster manually using the AWS Management Console, the AWS CLI, or the RDS API.
To fail over a Multi-AZ DB cluster manually
- Sign in to the AWS Management Console and open the Amazon RDS console at https://console.aws.amazon.com/rds/.
- In the navigation pane, choose Databases.
- Choose the Multi-AZ DB cluster that you want to fail over.
- For Actions, choose Failover.
  The Failover DB cluster page appears.
- Choose Failover to confirm the manual failover.
To fail over a Multi-AZ DB cluster manually, use the AWS CLI command failover-db-cluster.
aws rds failover-db-cluster --db-cluster-identifier mymultiazdbcluster
To fail over a Multi-AZ DB cluster manually, call the Amazon RDS API operation FailoverDBCluster and specify the DBClusterIdentifier parameter.
Determining whether a Multi-AZ DB cluster has failed over
To determine if your Multi-AZ DB cluster has failed over, you can do the following:
- Set up DB event subscriptions to notify you by email or SMS that a failover has been initiated. For more information about events, see Working with Amazon RDS event notification.
- View your DB events by using the Amazon RDS console or API operations.
- View the current state of your Multi-AZ DB cluster by using the Amazon RDS console, the AWS CLI, or the RDS API.
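One way to check the current state programmatically is to record which member is the writer and compare it later. In the output of describe-db-clusters, each entry in DBClusterMembers has a DBInstanceIdentifier and an IsClusterWriter flag. The following sketch assumes that you have already extracted those pairs into a map (JSON parsing is left out, and the instance names are hypothetical). A changed writer most likely means that a failover or a writer switch, such as the one in a minor version upgrade, took place.

```java
import java.util.Map;
import java.util.Optional;

// Compares two snapshots of DBClusterMembers, keyed by DBInstanceIdentifier,
// with the IsClusterWriter flag as the value.
final class WriterChangeCheck {
    private WriterChangeCheck() {}

    // Finds the member whose IsClusterWriter flag is true.
    static Optional<String> writer(Map<String, Boolean> isWriterByInstance) {
        return isWriterByInstance.entrySet().stream()
                .filter(e -> Boolean.TRUE.equals(e.getValue()))
                .map(Map.Entry::getKey)
                .findFirst();
    }

    // True if the writer DB instance differs between the two snapshots.
    static boolean writerChanged(Map<String, Boolean> before, Map<String, Boolean> after) {
        return !writer(before).equals(writer(after));
    }

    public static void main(String[] args) {
        Map<String, Boolean> before = Map.of("instance-1", true, "instance-2", false, "instance-3", false);
        Map<String, Boolean> after = Map.of("instance-1", false, "instance-2", true, "instance-3", false);
        System.out.println(writerChanged(before, after));
    }
}
```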
For information on how you can respond to failovers, reduce recovery time, and other best practices for Amazon RDS, see Best practices for Amazon RDS.
Setting the JVM TTL for DNS name lookups
The failover mechanism automatically changes the Domain Name System (DNS) record of the DB instance to point to the reader DB instance. As a result, you need to re-establish any existing connections to your DB instance. In a Java virtual machine (JVM) environment, due to how the Java DNS caching mechanism works, you might need to reconfigure JVM settings.
The JVM caches DNS name lookups. When the JVM resolves a host name to an IP address, it caches the IP address for a specified period of time, known as the time-to-live (TTL).
Because AWS resources use DNS name entries that occasionally change, we recommend that you configure your JVM with a TTL value of no more than 60 seconds. Doing this makes sure that when a resource's IP address changes, your application can receive and use the resource's new IP address by requerying the DNS.
On some Java configurations, the JVM default TTL is set so that it never refreshes DNS entries until the JVM is restarted. Thus, if the IP address for an AWS resource changes while your application is still running, it can't use that resource until you manually restart the JVM and the cached IP information is refreshed. In this case, it's crucial to set the JVM's TTL so that it periodically refreshes its cached IP information.
Note
The default TTL can vary according to the version of your JVM and whether a security manager is installed. Many JVMs provide a default TTL less than 60 seconds. If you're using such a JVM and not using a security manager, you can ignore the rest of this topic. For more information on security managers in Oracle, see The security manager.
To modify the JVM's TTL, set the networkaddress.cache.ttl property value. Use one of the following methods, depending on your needs:
- To set the property value globally for all applications that use the JVM, set networkaddress.cache.ttl in the $JAVA_HOME/jre/lib/security/java.security file (in Java 9 and later, $JAVA_HOME/conf/security/java.security).
  networkaddress.cache.ttl=60
- To set the property locally for your application only, set networkaddress.cache.ttl in your application's initialization code before any network connections are established.
  java.security.Security.setProperty("networkaddress.cache.ttl", "60");
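The following self-contained sketch applies the application-level setting at startup and reads it back to confirm it. The class and method names are hypothetical. The key point is ordering: the property is set before the first DNS lookup or network connection, because the JVM might read the value only once.

```java
import java.security.Security;

// Sets the JVM DNS cache TTL at application startup.
final class DnsTtlConfig {
    private DnsTtlConfig() {}

    // Call this before the application opens any network connection.
    static void configureDnsCacheTtl(int seconds) {
        Security.setProperty("networkaddress.cache.ttl", Integer.toString(seconds));
    }

    public static void main(String[] args) {
        configureDnsCacheTtl(60);
        // Prints networkaddress.cache.ttl=60
        System.out.println("networkaddress.cache.ttl=" + Security.getProperty("networkaddress.cache.ttl"));
        // Open database connections to your cluster endpoint only after this point.
    }
}
```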