Amazon Redshift Clusters
An Amazon Redshift data warehouse is a collection of computing resources called nodes, which are organized into a group called a cluster. Each cluster runs an Amazon Redshift engine and contains one or more databases.
At this time, Amazon Redshift version 1.0 engine is available. However, as the engine is updated, multiple Amazon Redshift engine versions might be available for selection.
You can determine the Amazon Redshift engine and database versions for your cluster in the Cluster Version field in the console. The first two sections of the number are the cluster version, and the last section is the specific revision number of the database in the cluster. In the following example, the cluster version is 1.0 and the database revision number is 884.
Although the console displays this information in one field, it is two parameters in the Amazon Redshift API: ClusterVersion and ClusterRevisionNumber. For more information, go to Cluster in the Amazon Redshift API Reference.
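The split between cluster version and revision number can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and the dotted three-part value 1.0.884 is assumed from the example described above rather than taken from the console.

```python
def split_cluster_version(version_field):
    """Split a console-style version string into (cluster version, revision number).

    Assumes exactly three dot-separated sections, for example "1.0.884".
    """
    major, minor, revision = version_field.strip().split(".")
    return f"{major}.{minor}", revision


print(split_cluster_version("1.0.884"))  # ('1.0', '884')
```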
Amazon Redshift provides a setting, Allow Version Upgrade, to specify whether to automatically upgrade the Amazon Redshift engine in your cluster if a new version of the engine becomes available. This setting does not affect the database version upgrades, which are applied during the maintenance window that you specify for your cluster. Amazon Redshift engine upgrades are major version upgrades, and Amazon Redshift database upgrades are minor version upgrades. You can disable automatic version upgrades for major versions only. For more information about maintenance windows for minor version upgrades, see Maintenance Windows.
About Clusters and Nodes
An Amazon Redshift cluster consists of nodes. Each cluster has a leader node and one or more compute nodes. The leader node receives queries from client applications, parses the queries, and develops query execution plans. The leader node then coordinates the parallel execution of these plans with the compute nodes, aggregates the intermediate results from these nodes, and finally returns the results back to the client applications. Compute nodes execute the query execution plans and transmit data among themselves to serve these queries. The intermediate results are sent back to the leader node for aggregation before being sent back to the client applications. For more information about leader nodes and compute nodes, go to Data warehouse system architecture in the Amazon Redshift Database Developer Guide.
When you launch a cluster, one of the options you specify is the node type. The node type determines the CPU, RAM, storage capacity, and storage drive type for each node. Dense storage node types, DS1 and DS2, are optimized for large data workloads and use hard disk drive (HDD) storage. Dense compute node types, DC1, are optimized for performance-intensive workloads and use solid state drive (SSD) storage.
The DS2 node types have the same storage capabilities as their DS1 counterparts, but they offer better performance due to an increase in available RAM and CPU over the corresponding DS1 node types. However, the DC1 node type offers the best performance of all the node types. The node type that you choose depends heavily on the amount of data you import into Amazon Redshift, the complexity of the queries and operations that you run in the database, and the needs of downstream systems that depend on the results from those queries and operations.
Node types are available in different sizes. DS1 and DS2 are available in xlarge and 8xlarge sizes. DC1 is available in large and 8xlarge sizes. Node size and the number of nodes determine the total storage for a cluster.
Some node types allow one node (single-node) or two or more nodes (multi-node). The minimum for 8xlarge clusters is two nodes. On a single-node cluster, the node is shared for leader and compute functionality. On a multi-node cluster, the leader node is separate from the compute nodes.
Amazon Redshift applies quotas to resources for each AWS account in each region. A quota restricts the number of resources that your account can create for a given resource type, such as nodes or snapshots, within a region. For more information about the default quotas that apply to Amazon Redshift resources, go to Amazon Redshift Limits in the Amazon Web Services General Reference. To request an increase, submit an Amazon Redshift Limit Increase Form.
Node Type Details
The following tables summarize the node specifications for each node type and size using the following values.
vCPU is the number of virtual CPUs for each node.
ECU is the number of Amazon EC2 compute units for each node.
RAM is the amount of memory in gibibytes (GiB) for each node.
Slices per Node is the number of slices into which a compute node is partitioned.
Storage is the capacity and type of storage for each node.
Node Range is the minimum and maximum number of nodes that Amazon Redshift supports for the node type and size.
You might be restricted to fewer nodes depending on the quota that is applied to your AWS account in the selected region, as discussed earlier in this section.
Total Capacity is the total storage capacity for the cluster if you deploy the maximum number of nodes that is specified in the node range.
The cost of your cluster depends on the region, node type, number of nodes, and whether the nodes are reserved in advance. For more information about the cost of nodes, go to the Amazon Redshift pricing page.
Dense Storage Node Types
|Node Size|vCPU|ECU|RAM (GiB)|Slices Per Node|Storage Per Node|Node Range|Total Capacity|
|ds1.xlarge|2|4.4|15|2|2 TB HDD|1–32|64 TB|
|ds1.8xlarge|16|35|120|16|16 TB HDD|2–128|2 PB|
|ds2.xlarge|4|13|31|2|2 TB HDD|1–32|64 TB|
|ds2.8xlarge|36|119|244|16|16 TB HDD|2–128|2 PB|
Dense Compute Node Types
|Node Size|vCPU|ECU|RAM (GiB)|Slices Per Node|Storage Per Node|Node Range|Total Capacity|
|dc1.large|2|7|15|2|160 GB SSD|1–32|5.12 TB|
|dc1.8xlarge|32|104|244|32|2.56 TB SSD|2–128|326 TB|
In previous releases of Amazon Redshift, the node types had different names. You can use the old names in the Amazon Redshift API and AWS Command Line Interface (AWS CLI), though we recommend that you update any scripts that reference those names to use the current names instead. The current and previous names are as follows.
Previous Node Type Names
|Current Name|Previous Name(s)|
Determining the Number of Nodes
The number of nodes that you choose depends on the size of your dataset and your desired query performance. Using the dense storage node types as an example, if you have 32 TB of data, you can choose either 16 ds1.xlarge nodes or 2 ds1.8xlarge nodes. If your data grows in small increments, choosing the ds1.xlarge node size will allow you to scale in increments of 2 TB. If you typically see data growth in larger increments, a ds1.8xlarge node size might be a better choice.
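The arithmetic behind the 32 TB example can be sketched as follows. The per-node storage and node ranges are copied from the dense storage table; the function name and dictionary layout are hypothetical, and the calculation considers raw capacity only, with no allowance for headroom or performance.

```python
import math

# Storage per node (TB) and node range, copied from the dense storage table.
DENSE_STORAGE = {
    "ds1.xlarge": {"tb_per_node": 2, "min_nodes": 1, "max_nodes": 32},
    "ds1.8xlarge": {"tb_per_node": 16, "min_nodes": 2, "max_nodes": 128},
}


def nodes_needed(node_size, data_tb):
    """Return the smallest node count whose raw capacity holds data_tb terabytes."""
    spec = DENSE_STORAGE[node_size]
    count = max(spec["min_nodes"], math.ceil(data_tb / spec["tb_per_node"]))
    if count > spec["max_nodes"]:
        raise ValueError(f"{data_tb} TB exceeds the capacity of {node_size}")
    return count


print(nodes_needed("ds1.xlarge", 32))   # 16
print(nodes_needed("ds1.8xlarge", 32))  # 2
```

Growing the dataset by 2 TB adds one ds1.xlarge node in this sketch, whereas a ds1.8xlarge cluster absorbs growth in 16 TB steps.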
Because Amazon Redshift distributes and executes queries in parallel across all of a cluster’s compute nodes, you can increase query performance by adding nodes to your cluster. Amazon Redshift also distributes your data across all compute nodes in a cluster. When you run a cluster with at least two compute nodes, data on each node will always be mirrored on disks on another node and you reduce the risk of incurring data loss.
Regardless of the choice you make, you can monitor query performance in the Amazon Redshift console and with Amazon CloudWatch metrics. You can also add or remove nodes as needed to achieve the balance between storage and performance that works best for you. When you request an additional node, Amazon Redshift takes care of all the details of deployment, load balancing, and data maintenance. For more information about cluster performance, see Monitoring Amazon Redshift Cluster Performance.
If you intend to keep your cluster running continuously for a prolonged period, say, one year or more, you can pay considerably less by reserving the compute nodes for a one-year or three-year period. To reserve compute nodes, you purchase what are called reserved node offerings. You purchase one offering for each compute node that you want to reserve. When you reserve a compute node, you pay a fixed up-front charge and then an hourly recurring charge, whether your cluster is running or not. The hourly charges, however, are significantly lower than those for on-demand usage. For more information, see Purchasing Amazon Redshift Reserved Nodes.
Resizing a Cluster
If your storage and performance needs change after you initially provision your cluster, you can resize your cluster. You can scale the cluster in or out by adding or removing nodes. Additionally, you can scale the cluster up or down by specifying a different node type.
For example, you can add more nodes, change node types, change a single-node cluster to a multi-node cluster, or change a multi-node cluster to a single-node cluster. However, you must ensure that the resulting cluster is large enough to hold the data that you currently have, or else the resize will fail. When using the API, you must specify both the node type (which includes the node size) and the number of nodes, even if you change only one of them.
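As a rough sketch of the rule that a resize request always carries both values, the following Python helper builds a request dictionary. The function name is hypothetical, and the key names are an assumption modeled on the ModifyCluster request parameters; check the Amazon Redshift API Reference before relying on them. The sketch does not call any AWS service.

```python
def build_resize_request(cluster_id, node_type, number_of_nodes):
    """Build resize parameters that always include node type and node count."""
    if number_of_nodes < 1:
        raise ValueError("a cluster needs at least one node")
    # The minimum for 8xlarge clusters is two nodes.
    if node_type.endswith("8xlarge") and number_of_nodes < 2:
        raise ValueError("8xlarge node sizes require at least two nodes")
    return {
        "ClusterIdentifier": cluster_id,    # assumed parameter name
        "NodeType": node_type,              # sent even if unchanged
        "NumberOfNodes": number_of_nodes,   # sent even if unchanged
    }


# Scale out from 16 to 20 nodes while keeping the same node type.
print(build_resize_request("examplecluster", "ds1.xlarge", 20))
```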
The following describes the resize process:
When you initiate the resize process, Amazon Redshift restarts your existing (source) cluster. The restart terminates all existing connections to the cluster. All uncommitted transactions (including COPY) are rolled back.
The source cluster starts again in read-only mode. While the cluster is in this mode, you can run read queries but not write queries.
Amazon Redshift provisions the new (target) cluster as requested, and copies data from the source cluster to the target cluster.
When the resize process nears completion, the endpoint of the target cluster is updated and all connections to the source cluster are terminated.
After the resize completes, you can connect to the target cluster and resume running read and write queries.
When you resize your cluster, it will remain in read-only mode until the resize completes. You can view the resize progress on the cluster's Status tab in the Amazon Redshift console. The time it takes to resize a cluster depends on the amount of data in each node. Typically, the resize process varies from a couple of hours to a day, although clusters with larger amounts of data might take even longer. This is because the data is copied in parallel from each node on the source cluster to the nodes in the target cluster. For more information about resizing clusters, see Tutorial: Resizing Clusters in Amazon Redshift and Resizing a Cluster.
Amazon Redshift does not sort tables during a resize operation. When you resize a cluster, Amazon Redshift distributes the database tables to the new compute nodes based on their distribution styles and runs an ANALYZE to update statistics. Rows that are marked for deletion are not transferred, so you will only need to run a VACUUM if your tables need to be resorted. For more information, see Vacuuming tables in the Amazon Redshift Database Developer Guide.
If your cluster is public and is in a VPC, it keeps the same elastic IP address (EIP) for the leader node after resizing. If your cluster is private and is in a VPC, it keeps the same private IP address for the leader node after resizing. If your cluster is not in a VPC, a new public IP address is assigned for the leader node as part of the resize operation.
To get the leader node IP address for a cluster, use the dig utility with the cluster endpoint as its argument.
The leader node IP address is at the end of the ANSWER SECTION in the results.
You can get the dig utility as part of the BIND software download. For more information on BIND, go to BIND in the Internet Systems Consortium documentation.
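If dig is not available, a similar lookup can be sketched with Python's standard resolver. This is an alternative to dig, not the method described above; the function name is hypothetical and you supply your own cluster endpoint host name. The resolver argument exists only so the function can be exercised without network access.

```python
import socket


def leader_node_addresses(endpoint_host, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated IP addresses that the endpoint resolves to."""
    results = resolver(endpoint_host, None)
    # Each result is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in results})
```

Calling leader_node_addresses with the endpoint shown for your cluster in the console returns the address or addresses that clients currently connect to.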
Supported Platforms to Launch Your Cluster
Amazon Redshift clusters run in Amazon Elastic Compute Cloud (Amazon EC2) instances that are configured for the Amazon Redshift node type and size that you select. You can launch an Amazon Redshift cluster in one of two platforms: EC2-Classic or EC2-VPC, which are the supported platforms for Amazon EC2 instances. For more information about these platforms, go to Supported Platforms in the Amazon EC2 User Guide for Linux Instances. The platform or platforms available to you depend on your AWS account settings.
To prevent connection issues between SQL client tools and the Amazon Redshift database, we recommend that you either configure an inbound rule that enables the hosts to negotiate packet size or disable TCP/IP jumbo frames by setting the maximum transmission unit (MTU) to 1500 on the network interface (NIC) of your Amazon EC2 instances. For more information about these approaches, see Queries Appear to Hang and Sometimes Fail to Reach the Cluster.
In the EC2-Classic platform, your cluster runs in a single, flat network that you share with other AWS customers. If you provision your cluster in the EC2-Classic platform, you control access to your cluster by associating one or more Amazon Redshift cluster security groups with the cluster. For more information, see Amazon Redshift Cluster Security Groups.
In the EC2-VPC platform, your cluster runs in a virtual private cloud (VPC) that is logically isolated to your AWS account. If you provision your cluster in the EC2-VPC platform, you control access to your cluster by associating one or more VPC security groups with the cluster. For more information, go to Security Groups for Your VPC in the Amazon VPC User Guide.
To create a cluster in a VPC, you must first create an Amazon Redshift cluster subnet group by providing subnet information of your VPC, and then provide the subnet group when launching the cluster. For more information, see Amazon Redshift Cluster Subnet Groups.
For more information about Amazon Virtual Private Cloud (Amazon VPC), go to the Amazon VPC product detail page.
Choose a Platform
Your AWS account is capable of launching instances either into both platforms, or only into EC2-VPC, on a region-by-region basis. To determine which platform your account supports, and then launch a cluster, do the following:
Decide on the AWS region in which you want to deploy a cluster. For a list of AWS regions in which Amazon Redshift is available, go to Regions and Endpoints in the Amazon Web Services General Reference.
Find out which Amazon EC2 platforms your account supports in the chosen AWS region. You can find this information in the Amazon EC2 console. For step-by-step instructions, go to Supported Platforms in the Amazon EC2 User Guide for Linux Instances.
If your account supports both of the platforms, choose the one on which you want to deploy your Amazon Redshift cluster. If your account supports only EC2-VPC, you must deploy your cluster in a VPC.
Deploy your Amazon Redshift cluster. You can deploy a cluster by using the Amazon Redshift console, or programmatically by using the Amazon Redshift API, CLI, or SDK libraries. For more information about these options and links to the related documentation, see What Is Amazon Redshift?.
Regions and Availability Zone Considerations
Amazon Redshift is available in several AWS regions. By default, Amazon Redshift provisions your cluster in a randomly selected Availability Zone (AZ) within the AWS region that you select. All the cluster nodes are provisioned in the same AZ.
You can optionally request a specific AZ if Amazon Redshift is available in that AZ. For example, if you already have an Amazon EC2 instance running in one AZ, you might want to create your Amazon Redshift cluster in the same AZ to reduce latency. On the other hand, you might want to choose another AZ for higher availability. Amazon Redshift might not be available in all AZs within a region.
For a list of supported AWS regions where you can provision an Amazon Redshift cluster, go to Regions and Endpoints in the Amazon Web Services General Reference.
Maintenance Windows
Amazon Redshift periodically performs maintenance to apply upgrades to your cluster. During these updates, your Amazon Redshift cluster is not available for normal operations.
Amazon Redshift assigns a 30-minute maintenance window at random from an 8-hour block of time per region, occurring on a random day of the week (Monday through Sunday, inclusive). The following list shows the time blocks for each region from which the default maintenance windows are assigned:
US East (N. Virginia) region: 03:00–11:00 UTC
US West (N. California) region: 06:00–14:00 UTC
US West (Oregon) region: 06:00–14:00 UTC
Asia Pacific (Mumbai) region: 16:30–00:30 UTC
Asia Pacific (Seoul) region: 13:00–21:00 UTC
Asia Pacific (Singapore) region: 14:00–22:00 UTC
Asia Pacific (Sydney) region: 12:00–20:00 UTC
Asia Pacific (Tokyo) region: 13:00–21:00 UTC
EU (Frankfurt) region: 06:00–14:00 UTC
China (Beijing) region: 13:00–21:00 UTC
EU (Ireland) region: 22:00–06:00 UTC
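Several of these blocks cross midnight UTC, which is easy to get wrong when checking whether a given time falls inside a block. The following Python sketch shows one way to handle the wrap-around for three of the regions listed above. The function name is hypothetical, and treating the end of the block as exclusive is an assumption.

```python
from datetime import time

# Time blocks (UTC) copied from the list above; only three regions are shown.
BLOCKS_UTC = {
    "US East (N. Virginia)": (time(3, 0), time(11, 0)),
    "Asia Pacific (Mumbai)": (time(16, 30), time(0, 30)),
    "EU (Ireland)": (time(22, 0), time(6, 0)),
}


def in_default_block(region, t):
    """Return True if UTC time t falls inside the region's 8-hour block."""
    start, end = BLOCKS_UTC[region]
    if start <= end:
        return start <= t < end
    # The block crosses midnight, so it covers [start, 24:00) and [00:00, end).
    return t >= start or t < end


print(in_default_block("EU (Ireland)", time(23, 30)))         # True
print(in_default_block("US East (N. Virginia)", time(2, 0)))  # False
```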
If a maintenance event is scheduled for a given week, it will start during the assigned 30-minute maintenance window. While Amazon Redshift is performing maintenance, it terminates any queries or other operations that are in progress. Most maintenance completes during the 30-minute maintenance window, but some maintenance tasks might continue running after the window closes. If there are no maintenance tasks to perform during the scheduled maintenance window, your cluster continues to operate normally until the next scheduled maintenance window.
You can change the scheduled maintenance window by modifying the cluster, either programmatically or by using the Amazon Redshift console. The window must be at least 30 minutes and not longer than 24 hours. For more information, see Managing Clusters Using the Console.
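The 30-minute and 24-hour bounds can be checked before you submit a change. The sketch below assumes, for illustration only, that a window is written as day:hour:minute-day:hour:minute in UTC (for example sun:04:00-sun:04:30); the function names are hypothetical, and the exact format that the service accepts should be confirmed in the API reference.

```python
DAYS = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]
WEEK_MINUTES = 7 * 24 * 60


def _minute_of_week(stamp):
    """Convert 'ddd:hh:mm' to minutes since Monday 00:00."""
    day, hour, minute = stamp.lower().split(":")
    return DAYS.index(day) * 24 * 60 + int(hour) * 60 + int(minute)


def window_minutes(window):
    """Return the window length in minutes, allowing a wrap past Sunday midnight."""
    start, end = window.split("-")
    return (_minute_of_week(end) - _minute_of_week(start)) % WEEK_MINUTES


def is_valid_window(window):
    """Check the documented bounds: at least 30 minutes, at most 24 hours."""
    return 30 <= window_minutes(window) <= 24 * 60


print(is_valid_window("sun:04:00-sun:04:30"))  # True
print(is_valid_window("mon:00:00-mon:00:15"))  # False
```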
Default Disk Space Alarm
When you create an Amazon Redshift cluster, you can optionally configure an Amazon CloudWatch alarm to monitor the average percentage of disk space that is used across all of the nodes in your cluster. We’ll refer to this alarm as the default disk space alarm.
The purpose of the default disk space alarm is to help you monitor the storage capacity of your cluster. You can configure this alarm based on the needs of your data warehouse. For example, you can use the warning as an indicator that you might need to resize your cluster, either to a different node type or to add nodes, or perhaps to purchase reserved nodes for future expansion.
The default disk space alarm triggers when disk usage reaches or exceeds a specified percentage for a certain number of times and at a specified duration. By default, this alarm triggers when the percentage that you specify is reached, and stays at or above that percentage for five minutes or longer. You can edit the default values after you launch the cluster.
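The default behavior can be approximated in a few lines of Python. This is a simplified model, not how CloudWatch evaluates alarms internally: it assumes one average disk-usage sample per minute, and the function and parameter names are hypothetical.

```python
def alarm_triggers(samples, threshold_pct, required_minutes=5):
    """Return True if usage stays at or above the threshold for the required time.

    samples holds one average disk-usage percentage per minute, oldest first.
    """
    streak = 0
    for pct in samples:
        # A sample below the threshold resets the run of consecutive breaches.
        streak = streak + 1 if pct >= threshold_pct else 0
        if streak >= required_minutes:
            return True
    return False


print(alarm_triggers([80, 91, 92, 93, 94, 95], threshold_pct=90))      # True
print(alarm_triggers([91, 92, 89, 93, 94, 95, 96], threshold_pct=90))  # False
```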
When the CloudWatch alarm triggers, Amazon Simple Notification Service (Amazon SNS) sends a notification to specified recipients to warn them that the percentage threshold is reached. Amazon SNS uses a topic to specify the recipients and message that are sent in a notification. You can use an existing Amazon SNS topic; otherwise, a topic is created based on the settings that you specify when you launch the cluster. You can edit the topic for this alarm after you launch the cluster. For more information about creating Amazon SNS topics, see Getting Started with Amazon Simple Notification Service.
After you launch the cluster, you can view and edit the alarm from the cluster’s Status window under CloudWatch Alarms. The alarm name begins with percentage-disk-space-used-default-. You can open the alarm to view the Amazon SNS topic that it is associated with and edit alarm settings. If you did not select an existing Amazon SNS topic to use, one is created for you and named with the recipient in parentheses; for example, examplecluster-default-alarms (email@example.com).
If you delete your cluster, the alarm associated with the cluster will not be deleted but it will not trigger. You can delete the alarm from the CloudWatch console if you no longer need it.
Renaming Clusters
You can rename a cluster if you want the cluster to use a different name. Because the endpoint to your cluster includes the cluster name (also referred to as the cluster identifier), the endpoint will change to use the new name after the rename finishes. For example, if you have a cluster named examplecluster and rename it to newcluster, the endpoint will change to use the newcluster identifier. Any applications that connect to the cluster must be updated with the new endpoint.
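The effect of a rename on the endpoint can be sketched as follows. The endpoint host name in the example is made up for illustration, and the sketch assumes that the cluster identifier is the first dot-separated label of the endpoint; the function name is hypothetical.

```python
def endpoint_after_rename(endpoint, new_identifier):
    """Replace the leading cluster identifier in an endpoint host name."""
    _old_identifier, rest = endpoint.split(".", 1)
    return f"{new_identifier}.{rest}"


# Hypothetical endpoint; substitute the one shown for your cluster.
old = "examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com"
print(endpoint_after_rename(old, "newcluster"))
# newcluster.abc123xyz789.us-west-2.redshift.amazonaws.com
```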
You might rename a cluster if you want to change the cluster to which your applications connect without having to change the endpoint in those applications. In this case, you must first rename the original cluster and then change the second cluster to reuse the name of the original cluster prior to the rename. Doing this is necessary because the cluster identifier must be unique within your account and region, so the original cluster and second cluster cannot have the same name. You might do this if you restore a cluster from a snapshot and don’t want to change the connection properties of any dependent applications.
If you delete the original cluster, you are responsible for deleting any unwanted cluster snapshots.
When you rename a cluster, the cluster status changes to renaming until the process finishes. The old DNS name that was used by the cluster is immediately deleted, although it could remain cached for a few minutes. The new DNS name for the renamed cluster becomes effective within about 10 minutes. The renamed cluster is not available until the new name becomes effective. The cluster will be rebooted and any existing connections to the cluster will be dropped. After this completes, the endpoint will change to use the new name. For this reason, you should stop queries from running before you start the rename and restart them after the rename finishes.
Cluster snapshots are retained, and all snapshots associated with a cluster remain associated with that cluster after it is renamed. For example, suppose you have a cluster that serves your production database and the cluster has several snapshots. If you rename the cluster and then replace it in the production environment with a snapshot, the cluster that you renamed will still have those existing snapshots associated with it.
Amazon CloudWatch alarms and Amazon Simple Notification Service (Amazon SNS) event notifications are associated with the name of the cluster. If you rename the cluster, you need to update these accordingly. You can update the CloudWatch alarms in the CloudWatch console, and you can update the Amazon SNS event notifications in the Amazon Redshift console on the Events pane. The load and query data for the cluster continues to display data from before the rename and after the rename. However, performance data is reset after the rename process finishes.
For more information, see Modifying a Cluster.
Shutting Down and Deleting Clusters
You can shut down your cluster if you want to stop it from running and incurring charges. When you shut it down, you can optionally create a final snapshot. If you create a final snapshot, Amazon Redshift will create a manual snapshot of your cluster before shutting it down. You can later restore that snapshot if you want to resume running the cluster and querying data.
If you no longer need your cluster and its data, you can shut it down without creating a final snapshot. In this case, the cluster and data are deleted permanently. For more information about shutting down and deleting clusters, see Shutting Down or Deleting a Cluster.
Regardless of whether you shut down your cluster with a final manual snapshot, all automated snapshots associated with the cluster will be deleted after the cluster is shut down. Any manual snapshots associated with the cluster are retained. Any manual snapshots that are retained, including the optional final snapshot, are charged at the Amazon Simple Storage Service storage rate if you have no other clusters running when you shut down the cluster, or if you exceed the available free storage that is provided for your running Amazon Redshift clusters. For more information about snapshot storage charges, go to the Amazon Redshift pricing page.
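The snapshot retention rule can be summarized in a short Python sketch. The data layout (dictionaries with name and type keys) and the function name are hypothetical and do not reflect the shape of any API response.

```python
def snapshots_after_shutdown(snapshots, final_snapshot_name=None):
    """Return the snapshots that remain after a cluster is shut down.

    Automated snapshots are deleted; manual snapshots are retained. An optional
    final snapshot is kept as an additional manual snapshot.
    """
    kept = [s for s in snapshots if s["type"] == "manual"]
    if final_snapshot_name is not None:
        kept.append({"name": final_snapshot_name, "type": "manual"})
    return kept


before = [
    {"name": "nightly-1", "type": "automated"},
    {"name": "pre-migration", "type": "manual"},
]
print(snapshots_after_shutdown(before, final_snapshot_name="final"))
# [{'name': 'pre-migration', 'type': 'manual'}, {'name': 'final', 'type': 'manual'}]
```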
The cluster status displays the current state of the cluster. The following table provides a description for each cluster status.
|The cluster is running and available.|
|Amazon Redshift is creating the cluster. For more information, see Creating a Cluster.|
|Amazon Redshift is deleting the cluster. For more information, see Shutting Down or Deleting a Cluster.|
|Amazon Redshift is taking a final snapshot of the cluster before deleting it. For more information, see Shutting Down or Deleting a Cluster.|
|The cluster suffered a hardware failure. If you have a single-node cluster, the node cannot be replaced. To recover your cluster, restore a snapshot. For more information, see Amazon Redshift Snapshots.|
|Amazon Redshift cannot connect to the hardware security module (HSM). Check the HSM configuration between the cluster and HSM. For more information, see About Encryption for Amazon Redshift Using Hardware Security Modules.|
|There is an issue with the underlying network configuration. Make sure that the VPC in which you launched the cluster exists and its settings are correct. For more information, see Managing Clusters in an Amazon Virtual Private Cloud (VPC).|
|There is an issue with one or more parameter values in the associated parameter group, and the parameter value or values cannot be applied. Modify the parameter group and update any invalid values. For more information, see Amazon Redshift Parameter Groups.|
|There was an issue restoring the cluster from the snapshot. Try restoring the cluster again with a different snapshot. For more information, see Amazon Redshift Snapshots.|
|Amazon Redshift is applying changes to the cluster. For more information, see Modifying a Cluster.|
|Amazon Redshift is rebooting the cluster. For more information, see Rebooting a Cluster.|
|Amazon Redshift is applying a new name to the cluster. For more information, see Renaming Clusters.|
|Amazon Redshift is resizing the cluster. For more information, see Resizing a Cluster.|
|Amazon Redshift is rotating encryption keys for the cluster. For more information, see About Rotating Encryption Keys in Amazon Redshift.|
|The cluster has reached its storage capacity. Resize the cluster to add nodes or to choose a different node size. For more information, see Resizing a Cluster.|
|Amazon Redshift is updating the HSM configuration. For more information, see About Encryption for Amazon Redshift Using Hardware Security Modules.|