Switching a blue/green deployment - Amazon Aurora

Switching a blue/green deployment

A switchover promotes the DB cluster, including its DB instances, in the green environment to be the production DB cluster. Before you switch over, production traffic is routed to the cluster in the blue environment. After you switch over, production traffic is routed to the DB cluster in the green environment.

Switchover timeout

You can specify a switchover timeout period between 30 seconds and 3,600 seconds (one hour). If the switchover takes longer than the specified duration, then any changes are rolled back and no changes are made to either environment. The default timeout period is 300 seconds (five minutes).
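The bounds above can be checked on the client side before a request is sent. The following is a minimal sketch of such a check. The function name resolve_switchover_timeout and the constant names are hypothetical and aren't part of any AWS SDK.

```python
# Bounds and default are taken from the text above; the names are illustrative.
MIN_TIMEOUT_SECONDS = 30
MAX_TIMEOUT_SECONDS = 3600
DEFAULT_TIMEOUT_SECONDS = 300


def resolve_switchover_timeout(timeout=None):
    """Return a valid switchover timeout in seconds, or raise ValueError."""
    if timeout is None:
        # No value supplied: fall back to the documented default.
        return DEFAULT_TIMEOUT_SECONDS
    if isinstance(timeout, bool) or not isinstance(timeout, int):
        raise ValueError("timeout must be a whole number of seconds")
    if not MIN_TIMEOUT_SECONDS <= timeout <= MAX_TIMEOUT_SECONDS:
        raise ValueError(
            f"timeout must be between {MIN_TIMEOUT_SECONDS} and "
            f"{MAX_TIMEOUT_SECONDS} seconds, got {timeout}"
        )
    return timeout
```

Rejecting an out-of-range value locally only saves a round trip. The service remains the authority on which values it accepts.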

Switchover guardrails

When you start a switchover, Amazon RDS runs some basic checks to test the readiness of the blue and green environments for switchover. These checks are known as switchover guardrails. They prevent a switchover if the environments aren't ready for it. In doing so, they help avoid longer-than-expected downtime and prevent the data loss between the blue and green environments that might result if the switchover started.

Amazon RDS runs the following guardrail checks on the green environment:

  • Replication health – Checks whether the replication status of the green DB cluster is healthy. The green DB cluster is a replica of the blue DB cluster.

  • Replication lag – Checks whether the replica lag of the green DB cluster is within allowable limits for switchover. The allowable limits are based on the specified timeout period. Replica lag indicates how far the green DB cluster is lagging behind the blue DB cluster. For more information, see Diagnosing and resolving lag between read replicas (for Aurora MySQL) and Monitoring Aurora PostgreSQL replication (for Aurora PostgreSQL).

  • Active writes – Makes sure there are no active writes on the green DB cluster.

Amazon RDS runs the following guardrail checks on the blue environment:

  • External replication – For Aurora PostgreSQL DB clusters, makes sure the blue DB cluster doesn't have logical replication slots, publications, or subscriptions. For Aurora MySQL DB clusters, makes sure the blue DB cluster doesn't have subscriptions.

  • Long-running active writes – Makes sure there are no long-running active writes on the blue DB cluster because they can increase replica lag.

  • Long-running DDL statements – Makes sure there are no long-running DDL statements on the blue DB cluster because they can increase replica lag.

  • Unsupported PostgreSQL changes – For Aurora PostgreSQL DB clusters, makes sure that no DDL changes and no additions or modifications of large objects have been performed on the blue environment. For more information, see PostgreSQL logical replication limitations for blue/green deployments.

    If Amazon RDS detects unsupported PostgreSQL changes, it changes the replication state to Replication degraded and notifies you that switchover is not available for the blue/green deployment. To proceed with switchover, we recommend that you delete and recreate the blue/green deployment and all green databases. To do so, choose Actions, Delete with green databases.

Switchover actions

When you switch over a blue/green deployment, RDS performs the following actions:

  1. Runs guardrail checks to verify whether the blue and green environments are ready for switchover.

  2. Stops new write operations on the DB cluster in both environments.

  3. Drops connections to the DB instances in both environments and doesn't allow new connections.

  4. Waits for replication to catch up in the green environment so that the green environment is in sync with the blue environment.

  5. Renames the DB cluster and DB instances in both environments.

    RDS renames the DB cluster and DB instances in the green environment to match the corresponding DB cluster and DB instances in the blue environment. For example, assume the name of a DB instance in the blue environment is mydb. Also assume the name of the corresponding DB instance in the green environment is mydb-green-abc123. During switchover, the name of the DB instance in the green environment is changed to mydb.

    RDS renames the DB cluster and DB instances in the blue environment by appending -oldn to the current name, where n is a number. For example, assume the name of a DB instance in the blue environment is mydb. After switchover, the DB instance name might be mydb-old1.

    RDS also renames the endpoints in the green environment to match the corresponding endpoints in the blue environment so that application changes aren't required.

  6. Allows connections to databases in both environments.

  7. Allows write operations on the DB cluster in the new production environment.

    After switchover, the previous production DB cluster only allows read operations. Even if you disable the read_only parameter on the DB cluster, it remains read-only until you delete the blue/green deployment.
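The renaming in step 5 can be illustrated with a small sketch that maps each current resource name to its name after switchover. The function plan_renames is hypothetical. RDS chooses the number n itself, so the default of 1 here is only an assumption that mirrors the mydb-old1 example.

```python
def plan_renames(blue_name, green_name, n=1):
    """Map each current resource name to its expected name after switchover.

    The green resource takes over the blue resource's name, and the blue
    resource gets "-old<n>" appended. RDS picks n; 1 is an illustrative guess.
    """
    return {
        green_name: blue_name,
        blue_name: f"{blue_name}-old{n}",
    }


# Example using the names from the text above.
renames = plan_renames("mydb", "mydb-green-abc123")
```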

You can monitor the status of a switchover using Amazon EventBridge. For more information, see Blue/green deployment events.

If you have tags configured in the blue environment, these tags are moved to the new production environment during switchover. The previous production environment also retains these tags. For more information about tags, see Tagging Amazon RDS resources.

If the switchover starts and then stops before finishing for any reason, then any changes are rolled back, and no changes are made to either environment.

Switchover best practices

Before you switch over, we strongly recommend that you adhere to best practices by completing the following tasks:

  • Thoroughly test the resources in the green environment. Make sure they function properly and efficiently.

  • Monitor relevant Amazon CloudWatch metrics. For more information, see Verifying CloudWatch metrics before switchover.

  • Identify the best time for the switchover.

    During the switchover, writes are cut off from databases in both environments. Identify a time when traffic is lowest on your production environment. Long-running transactions, such as active DDLs, can increase your switchover time, resulting in longer downtime for your production workloads.

    If there's a large number of connections on your DB cluster and DB instances, consider manually reducing them to the minimum number necessary for your application before you switch over the blue/green deployment. One way to achieve this is to create a script that monitors the status of the blue/green deployment and starts cleaning up connections when it detects that the status has changed to SWITCHOVER_IN_PROGRESS.

  • Make sure the DB cluster and DB instances in both environments are in Available state.

  • Make sure the DB cluster in the green environment is healthy and replicating.

  • Make sure that your network and client configurations don’t increase the DNS cache Time-To-Live (TTL) beyond five seconds, which is the default for Aurora DNS zones. Otherwise, applications will continue to send write traffic to the blue environment after switchover.

  • You can't roll back a blue/green deployment after switchover. For critical production workloads, consider provisioning a backup DB cluster before switching over.

  • For Aurora PostgreSQL DB clusters, review the logical replication limitations and take any required actions prior to switchover. For more information, see PostgreSQL logical replication limitations for blue/green deployments.
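The monitoring script mentioned in the best practices above might look like the following sketch. It polls the deployment status and returns when the status becomes SWITCHOVER_IN_PROGRESS. The helper names wait_for_status and make_status_fetcher, the polling interval, and the polling limit are assumptions for illustration. The boto3 call requires the AWS SDK for Python and valid credentials.

```python
import time


def wait_for_status(fetch_status, target="SWITCHOVER_IN_PROGRESS",
                    poll_seconds=5.0, max_polls=120, sleep=time.sleep):
    """Poll fetch_status() until it returns target.

    Returns True when the target status is seen, or False if max_polls
    attempts pass without seeing it.
    """
    for attempt in range(max_polls):
        if fetch_status() == target:
            return True
        if attempt < max_polls - 1:
            sleep(poll_seconds)
    return False


def make_status_fetcher(deployment_id):
    """Build a fetch_status callable backed by the RDS API."""
    import boto3  # imported lazily so the polling logic has no hard dependency

    rds = boto3.client("rds")

    def fetch_status():
        response = rds.describe_blue_green_deployments(
            BlueGreenDeploymentIdentifier=deployment_id
        )
        return response["BlueGreenDeployments"][0]["Status"]

    return fetch_status
```

A caller could pass make_status_fetcher("bgd-1234567890abcdef") to wait_for_status and, when it returns True, run its own connection cleanup routine. That cleanup routine depends on the application and isn't shown here.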

Note

During a switchover, you can't modify any DB cluster included in the switchover.

Verifying CloudWatch metrics before switchover

Before you switch over a blue/green deployment, we recommend that you check the values of the following metrics within Amazon CloudWatch.

  • AuroraBinlogReplicaLag (for Aurora MySQL) or AuroraReplicaLag (for Aurora PostgreSQL) – Use this metric to identify the current replication lag on the green environment. To reduce downtime, make sure that this value is close to zero before you switch over.

  • DatabaseConnections – Use this metric to estimate the level of activity on the blue/green deployment, and make sure that the value is at an acceptable level for your deployment before you switch over. If Performance Insights is turned on, DBLoad is a more accurate metric.

  • ActiveTransactions – If innodb_monitor_enable is set to all in the DB parameter group for any of your DB instances, use this metric to see if there's a high number of active transactions that might block switchover.

For more information about these metrics, see Amazon CloudWatch metrics for Amazon Aurora.
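As a rough illustration of the replication lag check, the following sketch retrieves recent datapoints for a metric and tests whether the highest value is close to zero. The helper names, the one-second threshold, and the use of the DBClusterIdentifier dimension are assumptions. Adjust them to match how the metric is published for your resources.

```python
from datetime import datetime, timedelta, timezone


def max_recent_value(datapoints, statistic="Maximum"):
    """Return the largest value among CloudWatch datapoints, or None if empty."""
    values = [point[statistic] for point in datapoints if statistic in point]
    return max(values) if values else None


def lag_is_acceptable(datapoints, threshold_seconds=1.0):
    """True only if data exists and every datapoint is at or below the threshold."""
    worst = max_recent_value(datapoints)
    return worst is not None and worst <= threshold_seconds


def fetch_metric_datapoints(metric_name, cluster_id, minutes=10):
    """Fetch recent datapoints (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helpers above work without the SDK

    cloudwatch = boto3.client("cloudwatch")
    end = datetime.now(timezone.utc)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/RDS",
        MetricName=metric_name,
        # Assumption: the metric is published with a DBClusterIdentifier dimension.
        Dimensions=[{"Name": "DBClusterIdentifier", "Value": cluster_id}],
        StartTime=end - timedelta(minutes=minutes),
        EndTime=end,
        Period=60,
        Statistics=["Maximum"],
    )
    return response["Datapoints"]
```

Treating an empty result as not acceptable is a deliberate choice in this sketch. Missing data says nothing about the lag, so it shouldn't be read as zero lag.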

Switching over a blue/green deployment

You can switch over a blue/green deployment using the AWS Management Console, the AWS CLI, or the RDS API.

To switch over a blue/green deployment
  1. Sign in to the AWS Management Console and open the Amazon RDS console at https://console.aws.amazon.com/rds/.

  2. In the navigation pane, choose Databases, and then choose the blue/green deployment that you want to switch over.

  3. For Actions, choose Switch over.

    The Switch over page appears.

    
  4. On the Switch over page, review the switchover summary. Make sure the resources in both environments match what you expect. If they don't, choose Cancel.

  5. For Timeout settings, enter the time limit for switchover.

  6. If your cluster is running Aurora PostgreSQL, review and acknowledge the pre-switchover recommendations. For more information, see PostgreSQL logical replication limitations for blue/green deployments.

  7. Choose Switch over.

To switch over a blue/green deployment by using the AWS CLI, use the switchover-blue-green-deployment command with the following options:

  • --blue-green-deployment-identifier – Specify the identifier of the blue/green deployment.

  • --switchover-timeout – Specify the time limit for the switchover, in seconds. The default is 300.

Example Switch over a blue/green deployment

For Linux, macOS, or Unix:

aws rds switchover-blue-green-deployment \
    --blue-green-deployment-identifier bgd-1234567890abcdef \
    --switchover-timeout 600

For Windows:

aws rds switchover-blue-green-deployment ^
    --blue-green-deployment-identifier bgd-1234567890abcdef ^
    --switchover-timeout 600

To switch over a blue/green deployment by using the Amazon RDS API, use the SwitchoverBlueGreenDeployment operation with the following parameters:

  • BlueGreenDeploymentIdentifier – Specify the identifier of the blue/green deployment.

  • SwitchoverTimeout – Specify the time limit for the switchover, in seconds. The default is 300.
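The same operation is available through the AWS SDK for Python (Boto3) as switchover_blue_green_deployment. The following is a minimal sketch. The wrapper functions build_switchover_request and switch_over are hypothetical, and the call requires valid AWS credentials.

```python
def build_switchover_request(deployment_id, timeout_seconds=None):
    """Build keyword arguments for the SwitchoverBlueGreenDeployment operation."""
    request = {"BlueGreenDeploymentIdentifier": deployment_id}
    if timeout_seconds is not None:
        # Omit the parameter entirely to accept the service default.
        request["SwitchoverTimeout"] = timeout_seconds
    return request


def switch_over(deployment_id, timeout_seconds=None):
    """Start a switchover and return the deployment status from the response."""
    import boto3  # imported lazily so the request builder works without the SDK

    rds = boto3.client("rds")
    response = rds.switchover_blue_green_deployment(
        **build_switchover_request(deployment_id, timeout_seconds)
    )
    return response["BlueGreenDeployment"]["Status"]
```

For example, switch_over("bgd-1234567890abcdef", 600) sends the same request as the CLI example above.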

After switchover

After a switchover, the DB cluster and DB instances in the previous blue environment are retained. Standard costs apply to these resources. Replication and binary logging between the blue and green environments stop.

RDS renames the DB cluster and DB instances in the blue environment by appending -oldn to the current resource name, where n is a number. The DB cluster is forced into a read-only state. Even if you disable the read_only parameter on the DB cluster, it remains read-only until you delete the blue/green deployment.



Updating the parent node for consumers

After you switch over an Aurora MySQL blue/green deployment, if the blue DB cluster had any external replicas or binary log consumers prior to switchover, you must update their parent node after switchover in order to maintain replication continuity.

After switchover, the writer DB instance that was previously in the green environment emits an event that contains the master log file name and master log position. For example:

aws rds describe-events --output json --source-type db-instance --source-identifier db-instance-identifier

{
    "Events": [
        ...
        {
            "SourceIdentifier": "db-instance-identifier",
            "SourceType": "db-instance",
            "Message": "Binary log coordinates in green environment after switchover: file mysql-bin-changelog.000003 and position 804",
            "EventCategories": [],
            "Date": "2023-11-10T01:33:41.911Z",
            "SourceArn": "arn:aws:rds:us-east-1:123456789012:db:db-instance-identifier"
        }
    ]
}

First, make sure that the consumer or replica has applied all binary logs from the old blue environment. Then, use the provided binary log coordinates to resume applying binary logs on the consumers. For example, if you're running a MySQL replica on EC2, you can use the CHANGE MASTER TO command:

CHANGE MASTER TO MASTER_HOST='{new-writer-endpoint}', MASTER_LOG_FILE='mysql-bin-changelog.000003', MASTER_LOG_POS=804;
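The binary log coordinates in the event message can also be extracted programmatically instead of being copied by hand. The following sketch assumes that the message text follows the exact wording of the example event shown earlier. The function names are hypothetical.

```python
import re

# Assumption: the event message matches the wording of the example event.
_COORDINATES = re.compile(
    r"Binary log coordinates in green environment after switchover: "
    r"file (?P<file>\S+) and position (?P<position>\d+)"
)


def parse_binlog_coordinates(message):
    """Return (log_file, log_position) from one event message, or None."""
    match = _COORDINATES.search(message)
    if match is None:
        return None
    return match.group("file"), int(match.group("position"))


def find_binlog_coordinates(events):
    """Scan the "Events" list from describe-events for the coordinates."""
    for event in events:
        coordinates = parse_binlog_coordinates(event.get("Message", ""))
        if coordinates is not None:
            return coordinates
    return None
```

The returned file name and position correspond to the MASTER_LOG_FILE and MASTER_LOG_POS values in the CHANGE MASTER TO command.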