Fast recovery after failover with cluster cache management for Aurora PostgreSQL - Amazon Aurora

Fast recovery after failover with cluster cache management for Aurora PostgreSQL

For fast recovery of the writer DB instance in your Aurora PostgreSQL clusters if there's a failover, use cluster cache management for Amazon Aurora PostgreSQL. Cluster cache management ensures that application performance is maintained if there's a failover.

In a typical failover situation, you might see a temporary but large performance degradation after failover. This degradation occurs because when the failover DB instance starts, the buffer cache is empty. An empty cache is also known as a cold cache. A cold cache degrades performance because the DB instance has to read from the slower disk, instead of taking advantage of values stored in the buffer cache.

With cluster cache management, you set a specific reader DB instance as the failover target. Cluster cache management ensures that the data in the designated reader's cache is kept synchronized with the data in the writer DB instance's cache. The designated reader's cache with prefilled values is known as a warm cache. If a failover occurs, the designated reader uses values in its warm cache immediately when it's promoted to the new writer DB instance. This approach provides your application much better recovery performance.

Cluster cache management requires that the designated reader instance have the same instance class type and size (db.r5.2xlarge or db.r5.xlarge, for example) as the writer. Keep this in mind when you create your Aurora PostgreSQL DB clusters so that your cluster can recover during a failover. For a listing of instance class types and sizes, see Hardware specifications for DB instance classes for Aurora.

Note

Cluster cache management is not supported for Aurora PostgreSQL DB clusters that are part of Aurora global databases. It is recommended that no workload should run on the designated tier-0 reader.

Configuring cluster cache management

To configure cluster cache management, do the following processes in order.

Note

Allow at least 1 minute after completing these steps for cluster cache management to be fully operational.

Enabling cluster cache management

To enable cluster cache management, take the steps described following.

To enable cluster cache management
  1. Sign in to the AWS Management Console and open the Amazon RDS console at https://console.aws.amazon.com/rds/.

  2. In the navigation pane, choose Parameter groups.

  3. In the list, choose the parameter group for your Aurora PostgreSQL DB cluster.

    The DB cluster must use a parameter group other than the default, because you can't change values in a default parameter group.

  4. For Parameter group actions, choose Edit.

  5. Set the value of the apg_ccm_enabled cluster parameter to 1.

  6. Choose Save changes.

To enable cluster cache management for an Aurora PostgreSQL DB cluster, use the AWS CLI modify-db-cluster-parameter-group command with the following required parameters:

  • --db-cluster-parameter-group-name

  • --parameters

Example

For Linux, macOS, or Unix:

aws rds modify-db-cluster-parameter-group \ --db-cluster-parameter-group-name my-db-cluster-parameter-group \ --parameters "ParameterName=apg_ccm_enabled,ParameterValue=1,ApplyMethod=immediate"

For Windows:

aws rds modify-db-cluster-parameter-group ^ --db-cluster-parameter-group-name my-db-cluster-parameter-group ^ --parameters "ParameterName=apg_ccm_enabled,ParameterValue=1,ApplyMethod=immediate"

Setting the promotion tier priority for the writer DB instance

For cluster cache management, make sure that the promotion priority is tier-0 for the writer DB instance of the Aurora PostgreSQL DB cluster. The promotion tier priority is a value that specifies the order in which an Aurora reader is promoted to the writer DB instance after a failure. Valid values are 0–15, where 0 is the first priority and 15 is the last priority. For more information about the promotion tier, see Fault tolerance for an Aurora DB cluster.

To set the promotion priority for the writer DB instance to tier-0
  1. Sign in to the AWS Management Console and open the Amazon RDS console at https://console.aws.amazon.com/rds/.

  2. In the navigation pane, choose Databases.

  3. Choose the Writer DB instance of the Aurora PostgreSQL DB cluster.

  4. Choose Modify. The Modify DB Instance page appears.

  5. On the Additional configuration panel, choose tier-0 for the Failover priority.

  6. Choose Continue and check the summary of modifications.

  7. To apply the changes immediately after you save them, choose Apply immediately.

  8. Choose Modify DB Instance to save your changes.

To set the promotion tier priority to 0 for the writer DB instance using the AWS CLI, call the modify-db-instance command with the following required parameters:

  • --db-instance-identifier

  • --promotion-tier

  • --apply-immediately

Example

For Linux, macOS, or Unix:

aws rds modify-db-instance \ --db-instance-identifier writer-db-instance \ --promotion-tier 0 \ --apply-immediately

For Windows:

aws rds modify-db-instance ^ --db-instance-identifier writer-db-instance ^ ---promotion-tier 0 ^ --apply-immediately

Setting the promotion tier priority for a reader DB instance

You must set only one reader DB instance for cluster cache management. To do so, choose a reader from the Aurora PostgreSQL cluster that is the same instance class and size as the writer DB instance. For example, if the writer uses db.r5.xlarge, choose a reader that uses this same instance class type and size. Then set its promotion tier priority to 0.

The promotion tier priority is a value that specifies the order in which an Aurora reader is promoted to the writer DB instance after a failure. Valid values are 0–15, where 0 is the first priority and 15 is the last priority.

To set the promotion priority of the reader DB instance to tier-0
  1. Sign in to the AWS Management Console and open the Amazon RDS console at https://console.aws.amazon.com/rds/.

  2. In the navigation pane, choose Databases.

  3. Choose a Reader DB instance of the Aurora PostgreSQL DB cluster that is the same instance class as the writer DB instance.

  4. Choose Modify. The Modify DB Instance page appears.

  5. On the Additional configuration panel, choose tier-0 for the Failover priority.

  6. Choose Continue and check the summary of modifications.

  7. To apply the changes immediately after you save them, choose Apply immediately.

  8. Choose Modify DB Instance to save your changes.

To set the promotion tier priority to 0 for the reader DB instance using the AWS CLI, call the modify-db-instance command with the following required parameters:

  • --db-instance-identifier

  • --promotion-tier

  • --apply-immediately

Example

For Linux, macOS, or Unix:

aws rds modify-db-instance \ --db-instance-identifier reader-db-instance \ --promotion-tier 0 \ --apply-immediately

For Windows:

aws rds modify-db-instance ^ --db-instance-identifier reader-db-instance ^ ---promotion-tier 0 ^ --apply-immediately

Monitoring the buffer cache

After setting up cluster cache management, you can monitor the state of synchronization between the writer DB instance's buffer cache and the designated reader's warm buffer cache. To examine the buffer cache contents on both the writer DB instance and the designated reader DB instance, use the PostgreSQL pg_buffercache module. For more information, see the PostgreSQL pg_buffercache documentation.

Using the aurora_ccm_status Function

Cluster cache management also provides the aurora_ccm_status function. Use the aurora_ccm_status function on the writer DB instance to get the following information about the progress of cache warming on the designated reader:

  • buffers_sent_last_minute – How many buffers have been sent to the designated reader in the last minute.

  • buffers_found_last_minute – The number of frequently accessed buffers identified during the past minute.

  • buffers_sent_last_scan – How many buffers have been sent to the designated reader during the last complete scan of the buffer cache.

  • buffers_found_last_scan – How many buffers have been identified as frequently accessed and needed to be sent during the last complete scan of the buffer cache. Buffers already cached on the designated reader aren't sent.

  • buffers_sent_current_scan – How many buffers have been sent so far during the current scan.

  • buffers_found_current_scan – How many buffers have been identified as frequently accessed in the current scan.

  • current_scan_progress – How many buffers have been visited so far during the current scan.

The following example shows how to use the aurora_ccm_status function to convert some of its output into a warm rate and warm percentage.

SELECT buffers_sent_last_minute*8/60 AS warm_rate_kbps, 100*(1.0-buffers_sent_last_scan::float/buffers_found_last_scan) AS warm_percent FROM aurora_ccm_status();

Troubleshooting CCM configuration

When you enable apg_ccm_enabled cluster parameter, cluster cache management is automatically turned on at the instance level on the writer DB instance and one reader DB instance on the Aurora PostgreSQL DB cluster. The writer and reader instance should use the same instance class type and size. Their promotion tier priority is set to 0. Other reader instances in the DB cluster should have a non-zero promotion tier and cluster cache management is disabled for those instances.

The following reasons may lead to issues in the configuration and disable cluster cache management:

  • When there is no single reader DB instance set to promotion tier 0.

  • When the writer DB instance is not set to promotion tier 0.

  • When more than one reader DB instances are set to promotion tier 0.

  • When the writer and one reader DB instances with promotion tier 0 doesn't have the same instance size.