Overview of managing clusters in Amazon Redshift

After your cluster is created, there are several operations you can perform on it. The operations include resizing, pausing, resuming, renaming, and deleting.

Resizing clusters in Amazon Redshift

As your data warehousing capacity and performance needs change, you can resize your cluster to make the best use of Amazon Redshift's computing and storage options.

A resize operation comes in two types:

  • Elastic resize – You can add nodes to or remove nodes from your cluster. You can also change the node type, such as from DS2 nodes to RA3 nodes. Elastic resize is a fast operation, typically completing in minutes. For this reason, we recommend it as a first option. When you perform an elastic resize, it redistributes data slices, which are partitions that are allocated memory and disk space in each node. Elastic resize is appropriate when you:

    • Add or reduce nodes in an existing cluster, but you don't change the node type – This is commonly called an in-place resize. When you perform this type of resize, some running queries complete successfully, but others can be dropped as part of the operation. An elastic resize completes within a few minutes.

    • Change the node type for a cluster – When you change the node type, a snapshot is created and data is redistributed from the source cluster to a cluster made up of the new node type. On completion, running queries are dropped. Like the in-place resize, it completes quickly.

  • Classic resize – You can change the node type, number of nodes, or both, in a similar manner to elastic resize. Classic resize takes more time to complete, but it can be useful when the target node count or node type doesn't fall within the bounds for elastic resize, for instance when the change in node count is very large. You can also use classic resize to change the cluster encryption. For example, you can use it to modify your unencrypted cluster to use AWS KMS encryption.

Scheduling a resize – You can schedule resize operations for your cluster to scale up to anticipate high use or to scale down for cost savings. Scheduling works for both elastic resize and classic resize. You can set up a schedule on the Amazon Redshift console. For more information, see Resizing a cluster, under Managing clusters using the console. You can also use AWS CLI or Amazon Redshift API operations to schedule a resize. For more information, see create-scheduled-action in the AWS CLI Command Reference or CreateScheduledAction in the Amazon Redshift API Reference.
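As a sketch of the CLI approach, the following command creates a recurring scheduled resize. The scheduled action name, cluster name, schedule, role ARN, and node count are placeholders, not values from this guide:

```shell
# Schedule a nightly elastic resize to 4 nodes at 22:00 UTC (placeholder names and ARN).
aws redshift create-scheduled-action \
  --scheduled-action-name scale-up-nightly \
  --schedule "cron(0 22 * * ? *)" \
  --iam-role arn:aws:iam::123456789012:role/RedshiftSchedulerRole \
  --target-action '{"ResizeCluster":{"ClusterIdentifier":"mycluster","NumberOfNodes":4,"Classic":false}}'
```

The IAM role must allow the Redshift scheduler to act on the cluster on your behalf.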

Elastic resize

An elastic resize operation, when you add or remove nodes of the same type, has the following stages:

  1. Elastic resize takes a cluster snapshot. This snapshot always includes no-backup tables for nodes where it's applicable. (Some node types, like RA3, don't have no-backup tables.) If your cluster doesn't have a recent snapshot, because you disabled automated snapshots, the backup operation can take longer. (To minimize the time before the resize operation begins, we recommend that you enable automated snapshots or create a manual snapshot before starting the resize.) When you start an elastic resize and a snapshot operation is in progress, the resize can fail if the snapshot operation doesn't complete within a few minutes. For more information, see Amazon Redshift snapshots and backups.

  2. The operation migrates cluster metadata. The cluster is unavailable for a few minutes. The majority of queries are temporarily paused and connections are held open. It is possible, however, for some queries to be dropped. This stage is short.

  3. Session connections are reinstated and queries resume.

  4. Elastic resize redistributes data to node slices, in the background. The cluster is available for read and write operations, but some queries can take longer to run.

  5. After the operation completes, Amazon Redshift sends an event notification.
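The stages above are started by a single resize request. As a sketch, an in-place elastic resize that only changes the node count might look like the following, where the cluster name and node count are placeholders:

```shell
# Elastic resize (the default) to 4 nodes of the same node type.
aws redshift resize-cluster \
  --cluster-identifier mycluster \
  --number-of-nodes 4

# Check progress of the most recent resize.
aws redshift describe-resize --cluster-identifier mycluster
```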

When you use elastic resize to change the node type, it works similarly to when you add or subtract nodes of the same type. First, a snapshot is created. A new target cluster is provisioned with the latest data from the snapshot, and data is transferred to the new cluster in the background. During this period, data is read-only. When the resize nears completion, Amazon Redshift updates the endpoint to point to the new cluster and all connections to the source cluster are dropped.

If you have reserved nodes, for example DS2 reserved nodes, you can upgrade to RA3 reserved nodes when you perform a resize. You can do this when you perform an elastic resize or use the console to restore from a snapshot. The console guides you through this process. For more information about upgrading to RA3 nodes, see Upgrading to RA3 node types.

To monitor the progress of a resize operation using the Amazon Redshift console, choose CLUSTERS, then choose the cluster being resized to see the details.

Elastic resize doesn't sort tables or reclaim disk space, so it isn't a substitute for a vacuum operation. For more information, see Vacuuming tables.

Elastic resize has the following constraints:

  • Elastic resize and data sharing clusters - When you add or subtract nodes on a cluster that's a producer for data sharing, you can’t connect to it from consumers while Amazon Redshift migrates cluster metadata. Similarly, if you perform an elastic resize and choose a new node type, data sharing is unavailable while connections are dropped and transferred to the new target cluster. In both types of elastic resize, the producer is unavailable for several minutes.

  • Single-node clusters - You can't use elastic resize to resize from or to a single-node cluster.

  • Data transfer from a shared snapshot - To run an elastic resize on a cluster that is transferring data from a shared snapshot, at least one backup must be available for the cluster. You can view your backups on the Amazon Redshift console snapshots list, the describe-cluster-snapshots CLI command, or the DescribeClusterSnapshots API operation.

  • Platform restriction - Elastic resize is available only for clusters that use the EC2-VPC platform. For more information, see Use EC2-VPC when you create your cluster.

  • Storage considerations - Make sure that your new node configuration has enough storage for your existing data. You might need to add nodes or change the configuration.

  • Source vs target cluster size - The node count and node type that you can resize to with elastic resize are determined by the number of nodes in the source cluster and the node type chosen for the resized cluster. To determine the possible configurations available, you can use the console. Or you can use the describe-node-configuration-options AWS CLI command with the action-type resize-cluster option. For more information about resizing using the Amazon Redshift console, see Resizing a cluster.

    The following example CLI command describes the configuration options available. In this example, the cluster named mycluster is a dc2.large 8-node cluster.

    aws redshift describe-node-configuration-options --cluster-identifier mycluster --region eu-west-1 --action-type resize-cluster

    This command returns an option list with recommended node types, number of nodes, and disk utilization for each option. The configurations returned can vary based on the specific input cluster. You can choose one of the returned configurations when you specify the options of the resize-cluster CLI command.
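For example, assuming the command above returned an ra3.xlplus configuration as one of the options, you might apply it as follows; the node type and count here are illustrative, not a recommendation:

```shell
# Apply one of the configurations returned by describe-node-configuration-options.
aws redshift resize-cluster \
  --cluster-identifier mycluster \
  --node-type ra3.xlplus \
  --number-of-nodes 4 \
  --region eu-west-1
```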

  • Ceiling on additional nodes - Elastic resize has limits on the nodes that you can add to a cluster. For example, a dc2 cluster supports elastic resize up to double the number of nodes. To illustrate, you can add a node to a 4-node dc2.8xlarge cluster to make it a five-node cluster, or add more nodes until you reach eight.

    With some ra3 node types, you can increase the number of nodes up to four times the existing count. Specifically, suppose that your cluster consists of ra3.4xlarge or ra3.16xlarge nodes. You can then use elastic resize to increase the number of nodes in an 8-node cluster to 32. Or you can pick a value below the limit. (Keep in mind that the ability to grow the cluster by 4x depends on the source cluster size.) If your cluster has ra3.xlplus nodes, the limit is double.

    All ra3 node types support a decrease in the number of nodes to a quarter of the existing count. For example, you can decrease the size of a cluster with ra3.4xlarge nodes from 12 nodes to 3, or to a number above the minimum.

    The following table lists growth and reduction limits for each node type that supports elastic resize.

    Node type      Growth limit                           Reduction limit
    ra3.16xlarge   4x (from 4 to 16 nodes, for example)   To one quarter of the number (from 16 to 4 nodes, for example)
    ra3.4xlarge    4x                                     To one quarter of the number
    ra3.xlplus     2x (from 4 to 8 nodes, for example)    To one quarter of the number
    dc2.8xlarge    2x                                     To one half of the number (from 16 to 8 nodes, for example)
    dc2.large      2x                                     To one half of the number
    ds2.8xlarge    2x                                     To one half of the number
    ds2.xlarge     2x                                     To one half of the number

Classic resize

Classic resize handles cases where the change in cluster size or node type isn't within the specifications supported by elastic resize. Classic resize has been improved so that migrating large data volumes, which could take hours or days in the past, completes much more quickly. It does this by using a backup and restore operation between the source and target cluster, and by merging data to the target cluster with a more efficient distribution scheme.

As a preliminary step, it's important to have a backup snapshot of the source cluster, or take a snapshot, prior to initiating the resize.

Classic resize has the following stages:

  1. Initial migration from source cluster to target cluster. When the new, target cluster is provisioned, Amazon Redshift sends an event notification that the resize has started. It restarts your existing cluster, which closes all connections. This includes connections from consumers, if the cluster is a producer for data sharing. After the restart, the cluster is in read-only mode, and data sharing resumes. These actions take a few minutes. Next, data is migrated to the target cluster, and both reads and writes are available.

  2. Distribution Key tables that were migrated as Distribution Even are converted back to their original distribution style, using background workers. The duration of this phase depends on the dataset size. For more information, see Distribution styles.

    Both reads and writes to the database work during this process, but there can be some degradation in query performance.

  3. When the resize process nears completion, Amazon Redshift updates the endpoint to the target cluster, and all connections to the source cluster are dropped. The target cluster takes on the producer role for data sharing.

  4. After the resize completes, Amazon Redshift sends an event notification.

You can view the resize progress on the Amazon Redshift console. The time it takes to resize a cluster depends on the amount of data.

Note

If you perform a classic resize on a cluster with a large volume of data, and the nodes are not RA3, data migration can be very slow. It can take several days to migrate a cluster with multiple terabytes of data. Data transfer for RA3 nodes completes much more quickly.
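When the target configuration falls outside the elastic resize bounds, you can request a classic resize explicitly with the --classic flag. The identifiers and sizes below are placeholders:

```shell
# Force a classic resize; without --classic, Amazon Redshift attempts an elastic resize.
aws redshift resize-cluster \
  --cluster-identifier mycluster \
  --node-type ra3.4xlarge \
  --number-of-nodes 2 \
  --classic
```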

Elastic resize vs classic resize

The following table compares behavior between the two resize types.

  • System data retention

    Elastic resize: Retains system log data.

    Classic resize: Doesn't retain system tables and data.

    Comments: If you have audit logging enabled in your source cluster, you can continue to access the logs in Amazon S3 or in CloudWatch following a resize. You can keep or delete these logs as your data policies specify.

  • Changing node types

    Elastic resize, when the node type doesn't change: In-place resize, and most queries are held.

    Elastic resize, with a new node type selected: A new cluster is created. Queries are dropped as the resize process completes.

    Classic resize: A new cluster is created. Queries are dropped during the resize process.

  • Session and query retention

    Elastic resize: Retains sessions and queries when the node type is the same in the source cluster and target. If you choose a new node type, queries are dropped.

    Classic resize: Doesn't retain sessions and queries. Queries are dropped.

    Comments: When queries are dropped, you can expect some performance degradation. It's best to perform a resize operation during a period of light use.

  • Table sort

    Elastic resize: Tables are not sorted as part of the resize operation.

    Classic resize: Tables are not sorted as part of the resize operation.

    Comments: Amazon Redshift doesn't sort tables during a resize operation, so the existing sort order is maintained. When you resize a cluster, Amazon Redshift distributes the database tables to the new nodes and runs an ANALYZE command to update statistics. Rows that are marked for deletion aren't transferred, so you need to run a VACUUM command only if your tables need to be resorted. For more information about a vacuum operation, see Vacuuming tables in the Amazon Redshift Database Developer Guide.

  • Cancelling a resize operation

    Elastic resize: You can't cancel an elastic resize.

    Classic resize: You can cancel a classic resize operation before it completes by choosing Cancel resize from the cluster details in the Amazon Redshift console. The time it takes to cancel depends on the stage of the resize operation when you cancel; the cluster isn't available until the cancel operation completes. If the resize operation is in the final stage, you can't cancel.
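Cancelling a classic resize is also available from the AWS CLI; the cluster name here is a placeholder:

```shell
# Cancel an in-progress classic resize. Elastic resizes can't be cancelled.
aws redshift cancel-resize --cluster-identifier mycluster
```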

Snapshot, restore, and resize

Elastic resize is the fastest method to resize an Amazon Redshift cluster. If elastic resize isn't an option for you and you require near-constant write access to your cluster, use the snapshot and restore operations with classic resize, as described in this section. With this approach, any data written to the source cluster after the snapshot is taken must be copied manually to the target cluster after the switch. Depending on how long the copy takes, you might need to repeat this several times until you have the same data in both clusters. Then you can make the switch to the target cluster. This process might have a negative impact on existing queries until the full set of data is available in the target cluster. However, it minimizes the amount of time that you can't write to the database.

The snapshot, restore, and classic resize approach uses the following process:

  1. Take a snapshot of your existing cluster. The existing cluster is the source cluster.

  2. Note the time that the snapshot was taken. Doing this means that you can later identify the point when you need to rerun extract, transform, load (ETL) processes to load any post-snapshot data into the target database.

  3. Restore the snapshot into a new cluster. This new cluster is the target cluster. Verify that your data exists in the target cluster.

  4. Resize the target cluster. Choose the new node type, number of nodes, and other settings for the target cluster.

  5. Review the loads from your ETL processes that occurred after you took a snapshot of the source cluster. Be sure to reload the same data in the same order into the target cluster. If you have ongoing data loads, repeat this process several times until the data is the same in both the source and target clusters.

  6. Stop all queries running on the source cluster. To do this, you can reboot the cluster, or you can log on as a superuser and use the PG_CANCEL_BACKEND and the PG_TERMINATE_BACKEND commands. Rebooting the cluster is the easiest way to make sure that the cluster is unavailable.

  7. Rename the source cluster. For example, rename it from examplecluster to examplecluster-source.

  8. Rename the target cluster to use the name of the source cluster before the rename. For example, rename the target cluster from examplecluster-target to examplecluster. From this point on, any applications that use the endpoint containing examplecluster connect to the target cluster.

  9. Delete the source cluster after you switch to the target cluster, and verify that all processes work as expected.
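A minimal CLI sketch of steps 1, 3, 4, 7, and 8 follows. All identifiers are placeholders, and the ETL replay and verification steps are intentionally omitted:

```shell
# 1. Snapshot the source cluster.
aws redshift create-cluster-snapshot \
  --cluster-identifier examplecluster \
  --snapshot-identifier examplecluster-resize-snap

# 3. Restore the snapshot into a new target cluster.
aws redshift restore-from-cluster-snapshot \
  --cluster-identifier examplecluster-target \
  --snapshot-identifier examplecluster-resize-snap

# 4. Resize the target cluster.
aws redshift resize-cluster \
  --cluster-identifier examplecluster-target \
  --node-type ra3.4xlarge \
  --number-of-nodes 4

# 7-8. Swap the names so applications keep using the same endpoint.
aws redshift modify-cluster \
  --cluster-identifier examplecluster \
  --new-cluster-identifier examplecluster-source
aws redshift modify-cluster \
  --cluster-identifier examplecluster-target \
  --new-cluster-identifier examplecluster
```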

Alternatively, you can rename the source and target clusters before reloading data into the target cluster. This approach works if you don't have a requirement that any dependent systems and reports be immediately up to date with those for the target cluster. In this case, step 6 moves to the end of the process.

The rename process is only required if you want applications to continue using the same endpoint to connect to the cluster. If you don't require this, you can instead update any applications that connect to the cluster to use the endpoint of the target cluster without renaming the cluster.

There are a couple of benefits to reusing a cluster name. First, you don't need to update application connection strings because the endpoint doesn't change, even though the underlying cluster changes. Second, related items such as Amazon CloudWatch alarms and Amazon Simple Notification Service (Amazon SNS) notifications are tied to the cluster name. This tie means that you can continue using the same alarms and notifications that you set up for the cluster. This continued use is primarily a concern in production environments where you want the flexibility to resize the cluster without reconfiguring related items, such as alarms and notifications.

Getting the leader node IP address

If your cluster is public and is in a VPC, it keeps the same Elastic IP address (EIP) for the leader node after resizing. If your cluster is private and is in a VPC, it keeps the same private IP address for the leader node after resizing. If your cluster isn't in a VPC, a new public IP address is assigned for the leader node as part of the resize operation.

To get the leader node IP address for a cluster, use the dig utility, as shown following.

dig mycluster.abcd1234.us-west-2.redshift.amazonaws.com

The leader node IP address is at the end of the ANSWER SECTION in the results.
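The output resembles the following. The DNS name and IP address here are illustrative, and a cluster with an Elastic IP address may show a CNAME chain before the final A record:

```shell
;; ANSWER SECTION:
mycluster.abcd1234.us-west-2.redshift.amazonaws.com. 60 IN A 192.0.2.10
```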

Pausing and resuming clusters

If you have a cluster that only needs to be available at specific times, you can pause the cluster and later resume it. While the cluster is paused, on-demand billing is suspended. Only the cluster's storage incurs charges. For more information about pricing, see the Amazon Redshift pricing page.

When you pause a cluster, Amazon Redshift creates a snapshot, begins terminating queries, and puts the cluster in a pausing state. If you delete a paused cluster without requesting a final snapshot, then you can't restore the cluster. You can't cancel or roll back a pause or resume operation after it's initiated.

You can pause and resume a cluster on the Amazon Redshift console, with the AWS CLI, or with Amazon Redshift API operations.

You can schedule actions to pause and resume a cluster. When you use the new Amazon Redshift console to create a recurring schedule to pause and resume, then two scheduled actions are created for the date range that you choose. The scheduled action names are suffixed with -pause and -resume. The total length of the name must fit within the maximum size of a scheduled action name.
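As a sketch, pausing and resuming on demand, or on a recurring schedule, looks like the following from the CLI. The names, schedule, and role ARN are placeholders:

```shell
# Pause and resume on demand.
aws redshift pause-cluster --cluster-identifier mycluster
aws redshift resume-cluster --cluster-identifier mycluster

# Or schedule a weekday pause at 20:00 UTC; a resume would be a second scheduled action.
aws redshift create-scheduled-action \
  --scheduled-action-name mycluster-pause \
  --schedule "cron(0 20 ? * MON-FRI *)" \
  --iam-role arn:aws:iam::123456789012:role/RedshiftSchedulerRole \
  --target-action '{"PauseCluster":{"ClusterIdentifier":"mycluster"}}'
```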

You can't pause the following types of clusters:

  • EC2-Classic clusters.

  • Clusters that are not active, for example a cluster that is currently being modified.

  • Hardware security module (HSM) clusters.

  • Clusters that have automated snapshots disabled.

When deciding to pause a cluster, consider the following:

  • Connections or queries to the cluster aren't available.

  • You can't see query monitoring information of a paused cluster on the Amazon Redshift console.

  • You can't modify a paused cluster. Any scheduled actions on the cluster aren't performed. These include creating snapshots, resizing clusters, and cluster maintenance operations.

  • Hardware metrics aren't created. Update your CloudWatch alarms if you have alarms set on missing metrics.

  • You can't copy the latest automated snapshots of a paused cluster to manual snapshots.

  • While a cluster is pausing, it can't be resumed until the pause operation is complete.

  • When you pause a cluster, billing is suspended. However, the pause operation typically completes within 15 minutes, depending upon the size of the cluster.

  • Audit logs are archived and not restored on resume.

  • After a cluster is paused, traces and logs might not be available for troubleshooting problems that occurred before the pause.

  • No-backup tables on the cluster are not restored on resume. For more information about no-backup tables, see Excluding tables from snapshots.

When you resume a cluster, consider the following:

  • The cluster version of the resumed cluster is updated to the maintenance version based on the maintenance window of the cluster.

  • If you delete the subnet associated with a paused cluster, you might have an incompatible network. In this case, restore your cluster from the latest snapshot.

  • If you delete an Elastic IP address while the cluster is paused, then a new Elastic IP address is requested.

  • If Amazon Redshift can't resume the cluster with its previous elastic network interface, then Amazon Redshift tries to allocate a new one.

  • When you resume a cluster, your node IP addresses might change. You might need to update your VPC settings to support these new IP addresses for features like COPY from Secure Shell (SSH) or COPY from Amazon EMR.

  • If you try to resume a cluster that isn't paused, the resume operation returns an error. If the resume operation is part of a scheduled action, modify or delete the scheduled action to prevent future errors.

  • Depending upon the size of the cluster, it can take several minutes to resume a cluster before queries can be processed. In addition, query performance can be impacted for some period of time while the cluster is being re-hydrated after resume completes.

Renaming clusters

You can rename a cluster if you want the cluster to use a different name. Because the endpoint to your cluster includes the cluster name (also referred to as the cluster identifier), the endpoint changes to use the new name after the rename finishes. For example, if you have a cluster named examplecluster and rename it to newcluster, the endpoint changes to use the newcluster identifier. Any applications that connect to the cluster must be updated with the new endpoint.

You may rename a cluster if you want to change the cluster your applications connect to without having to change the endpoint in those applications. In this case, you must first rename the original cluster and then change the second cluster to reuse the name of the original cluster before the rename. Doing this is necessary because the cluster identifier must be unique within your account and region, so the original cluster and second cluster cannot have the same name. You might do this if you restore a cluster from a snapshot and don't want to change the connection properties of any dependent applications.
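From the CLI, a rename is a cluster modification. The identifiers below are placeholders:

```shell
# Rename examplecluster to newcluster; the endpoint changes to match the new identifier.
aws redshift modify-cluster \
  --cluster-identifier examplecluster \
  --new-cluster-identifier newcluster
```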

Note

If you delete the original cluster, you are responsible for deleting any unwanted cluster snapshots.

When you rename a cluster, the cluster status changes to renaming until the process finishes. The old DNS name that was used by the cluster is immediately deleted, although it could remain cached for a few minutes. The new DNS name for the renamed cluster becomes effective within about 10 minutes. The renamed cluster is not available until the new name becomes effective. The cluster is rebooted and any existing connections to the cluster are dropped. After the reboot completes, the endpoint changes to use the new name. For this reason, stop queries from running before you start the rename, and restart them after the rename finishes.

Cluster snapshots are retained, and all snapshots associated with a cluster remain associated with that cluster after it is renamed. For example, suppose that you have a cluster that serves your production database and the cluster has several snapshots. If you rename the cluster and then replace it in the production environment with a snapshot, the cluster that you renamed still has those existing snapshots associated with it.

Amazon CloudWatch alarms and Amazon Simple Notification Service (Amazon SNS) event notifications are associated with the name of the cluster. If you rename the cluster, you need to update these accordingly. You can update the CloudWatch alarms in the CloudWatch console, and you can update the Amazon SNS event notifications in the Amazon Redshift console on the Events pane. The load and query data for the cluster continues to display data from before the rename and after the rename. However, performance data is reset after the rename process finishes.

For more information, see Modifying a cluster.

Shutting down and deleting clusters

You can shut down your cluster if you want to stop it from running and incurring charges. When you shut it down, you can optionally create a final snapshot. If you create a final snapshot, Amazon Redshift creates a manual snapshot of your cluster before shutting it down. You can later restore that snapshot if you want to resume running the cluster and querying data.

If you no longer need your cluster and its data, you can shut it down without creating a final snapshot. In this case, the cluster and data are deleted permanently. For more information about shutting down and deleting clusters, see Deleting a cluster.
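Both variants can be sketched from the CLI. The cluster and snapshot identifiers are placeholders:

```shell
# Shut down with a final manual snapshot that you can restore later.
aws redshift delete-cluster \
  --cluster-identifier mycluster \
  --final-cluster-snapshot-identifier mycluster-final-snap

# Or delete permanently, with no final snapshot.
aws redshift delete-cluster \
  --cluster-identifier mycluster \
  --skip-final-cluster-snapshot
```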

Regardless of whether you shut down your cluster with a final manual snapshot, all automated snapshots associated with the cluster are deleted after the cluster is shut down. Any manual snapshots associated with the cluster are retained. Retained manual snapshots, including the optional final snapshot, are charged at the Amazon Simple Storage Service storage rate if you have no other clusters running when you shut down the cluster, or if you exceed the available free storage provided for your running Amazon Redshift clusters. For more information about snapshot storage charges, see the Amazon Redshift pricing page.